Why everyone should pay attention to Stable Diffusion

Many of the people in my circles hadn’t heard of Stable Diffusion until I told them, and I was already two days late. Heralds of new technologies have a tendency to play up every new thing, however incremental, as the dawn of a new revolution – but in this case, their cries of wolf may be real for once.

Stable Diffusion is an AI tool produced by Stability.ai with help from researchers at the Ludwig Maximilian University of Munich and the Large-scale AI Open Network (LAION). It accepts text or image prompts and converts them into artwork based on, though without necessarily understanding, what it ‘sees’ in the input. It created the image below with my prompt “desk in the middle of the ocean vaporwave”. You can create your own here.

But it strayed into gross territory with a different prompt: “beautiful person floating through a colourful nebula”.

Stable Diffusion is like OpenAI’s DALL-E 1/2 and Google’s Imagen and Parti but with two crucial differences: it’s capable of image-to-image (img2img) generation as well, and it’s open source.

The img2img feature is particularly mind-blowing because it allows users to describe the scene using text and then guide the Stable Diffusion AI with a little bit of their own art. Even a drawing on MS Paint with a few colours will do. And while OpenAI and Google hold their cards very close to their chests, with the latter even refusing to release Imagen or Parti in private betas, Stability.ai has – in keeping with its vision to democratise AI – opened Stable Diffusion for tinkering and augmentation by developers en masse. Even the ways in which Stable Diffusion has been released are important: trained developers can work directly with the code while untrained users can access the model in their browsers, without any code, and start producing images. In fact, you can download and run the underlying model on your own system, provided it has somewhat higher-end specs. Users have already created ways to plug it into photo-editing software like Photoshop.
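
For the curious, here’s a minimal sketch of what running the released model locally might look like, using the open-source diffusers library. The model identifier, precision and generation settings below are assumptions on my part; check the official model card for the current names and hardware requirements.

```python
# A minimal sketch of running Stable Diffusion locally with the open-source
# diffusers library. The model ID, dtype and generation settings below are
# illustrative assumptions; a GPU with a reasonable amount of VRAM helps.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # assumed model ID from the initial release
    torch_dtype=torch.float16,         # half precision to fit consumer GPUs
)
pipe = pipe.to("cuda")  # use "cpu" if you have no GPU, at a heavy speed penalty

image = pipe(
    "desk in the middle of the ocean vaporwave",  # the prompt from earlier
    num_inference_steps=50,   # more de-noising steps: slower, often cleaner
    guidance_scale=7.5,       # how strongly the output should follow the prompt
).images[0]

image.save("desk_ocean_vaporwave.png")
```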

Stable Diffusion uses a diffusion model: a filter (essentially an algorithm) that takes noisy data and progressively de-noises it. In incredibly simple terms, researchers take an image and in a step-wise process add more and more noise to it. Next they feed this noisy image to the filter, which then removes the noise from the image in a similar step-wise process. You can think of the image as a signal, like the images you see on your TV, which receives broadcast signals from a transmitter located somewhere else. These broadcast signals are basically bundles of electromagnetic waves with information encoded into the waves’ properties, like their frequency, amplitude and phase. Sometimes the visuals aren’t clear because some other undesirable signal has become mixed up with the broadcast signal, leading to grainy images on your TV screen. This undesirable information is called noise.

When the noise waveform resembles that of a bell curve, a.k.a. a Gaussian function, it’s called Gaussian noise. Now, if we know the manner in which noise has been added to the image in each step, we can figure out what the filter needs to do to de-noise the image. Every Gaussian function can be characterised by two parameters, the mean and the variance. Put another way, you can generate different bell-curve-shaped signals by changing the mean and the variance in each case. So the filter effectively only needs to figure out what the mean and the variance in the noise of the input image are, and once it does, it can start de-noising. That is, Stable Diffusion is (partly) the filter here. The input you provide is the noisy image. Its output is the de-noised image. So when you supply a text prompt and/or an accompanying ‘seed’ image, Stable Diffusion just shows off how well it has learnt to de-noise your inputs.
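
To make the step-wise picture concrete, here’s a toy sketch – emphatically not Stable Diffusion’s actual code – of how Gaussian noise with a known schedule is mixed into an image, and why undoing it boils down to predicting that noise. The schedule values are made up.

```python
# A toy sketch of the step-wise noising described above (not actual Stable
# Diffusion code). At each step a little Gaussian noise with a known mean and
# variance is mixed into the image; the trained model learns to reverse this.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))          # stand-in for a real image, values in [0, 1]
betas = np.linspace(1e-4, 0.02, 1000)    # assumed noise schedule: variance added per step

x = image
for beta in betas:
    noise = rng.normal(loc=0.0, scale=1.0, size=x.shape)  # zero-mean Gaussian noise
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise   # mix a little noise in

# After enough steps, x is nearly pure Gaussian noise. The de-noising model is
# trained to predict the noise added at each step, so that – starting from
# random noise plus a text prompt as guidance – it can walk the chain backwards
# and arrive at a clean image.
```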

Obviously, when millions of people use Stable Diffusion, the filter is going to be confronted with too many mean-variance combinations for it to be able to directly predict them. This is where an artificial neural network (ANN) helps. ANNs are data-processing systems set up to mimic the way neurons work in our brain: they combine different pieces of information and manipulate them according to their knowledge of older information. The team that built Stable Diffusion trained its model on 5.8 billion image-text pairs found around the internet. An ANN is then programmed to learn from this dataset how texts and images correlate, as well as how images correlate with each other.

To keep this exercise from getting out of hand, each image and text input is broken down into certain components, and the machine is instructed to learn correlations only between these components. Further, the researchers used an ANN model called an autoencoder. Here, the ANN encodes the input in its own representation, using only the information that it has been taught to consider important. This intermediate representation is called the bottleneck layer. The network then decodes only the information present in this layer to produce the de-noised output. This way, the network also learns what about the input is most important. Finally, researchers also guide the ANN by attaching weights to different pieces of information: that is, the system is informed that some pieces are to be emphasised more than others, so that it acquires a ‘sense’ of less and more desirable.
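
For illustration, here’s what a bare-bones autoencoder with a bottleneck layer might look like – a generic PyTorch sketch, not Stable Diffusion’s actual (much larger, variational) autoencoder.

```python
# A bare-bones autoencoder, to illustrate the bottleneck idea described above.
# This is a generic sketch, not Stable Diffusion's actual architecture.
import torch
from torch import nn

class TinyAutoencoder(nn.Module):
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # The encoder squeezes the input down to the bottleneck layer,
        # keeping only what it learns to consider important.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # The decoder reconstructs the input from the bottleneck alone.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed representation (the bottleneck)
        return self.decoder(z)   # reconstruction from the compressed code

model = TinyAutoencoder()
x = torch.rand(8, 784)                       # a batch of made-up inputs
loss = nn.functional.mse_loss(model(x), x)   # training minimises reconstruction error
```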

By snacking on all those text-image pairs, the ANN effectively acquires its own basis to decide, when it’s presented with a new bit of text and/or a new image, what the mean and variance might be. Combine this with the filter and you get Stable Diffusion. (I should point out again that this is a very simple explanation and that parts of it may well be simplistic.)

Stable Diffusion also comes with an NSFW filter built-in, a component called Safety Classifier, which will stop the model from producing an output that it deems harmful in some way. Will it suffice? Probably not, given the ingenuity of trolls, goblins and other bad-faith actors on the internet. More importantly, it can be turned off, meaning Stable Diffusion can be run without the Safety Classifier to produce deepfakes that are various degrees of disturbing.

Recommended here: Deepfakes for all: Uncensored AI art model prompts ethics questions.

But the problems with Stable Diffusion don’t lie only in the future, immediate or otherwise. As I mentioned earlier, to create the model, Stability.ai & co. fed their machine 5.8 billion text-image pairs scraped from the internet – without the consent of the people who created those texts and images. Because Stability.ai released Stable Diffusion in toto into the public domain, it has been experimented with by tens of thousands of people, at least, and developers have plugged it into a rapidly growing number of applications. This is to say that even if Stability.ai is forced to pull the software because it didn’t have the license to those text-image pairs, the cat is out of the bag. There’s no going back. A blog post by LAION only says that the pairs were publicly available and that models built on the dataset should thus be restricted to research. Do you think the creeps on 4chan care? Worse yet, the jobs of the very people who created those text-image pairs are now threatened by Stable Diffusion, which can – with some practice to get your prompts right – produce exactly what you need, no illustrator or photographer required.

Recommended here: Stable Diffusion is a really big deal.

The third interesting thing about Stable Diffusion, after its img2img feature + “deepfakes for all” promise and the questionable legality of its input data, is the license under which Stability.ai has released it. AI analyst Alberto Romero wrote that “a state-of-the-art AI model” like Stable Diffusion “available for everyone through a safety-centric open-source license is unheard of”. This is the CreativeML Open RAIL-M license. Its preamble says, “We believe in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI.” Attachment A of the license spells out the restrictions – that is, what you can’t do if you agree to use Stable Diffusion according to the terms of the license (quoted verbatim):

“You agree not to use the Model or Derivatives of the Model:

  • In any way that violates any applicable national, federal, state, local or international law or regulation;
  • For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
  • To generate or disseminate verifiably false information and/or content with the purpose of harming others;
  • To generate or disseminate personal identifiable information that can be used to harm an individual;
  • To defame, disparage or otherwise harass others;
  • For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
  • For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
  • To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
  • For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
  • To provide medical advice and medical results interpretation;
  • To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).”

These restrictions place a heavy burden on law-enforcement agencies around the world, and I don’t think Stability.ai took the corresponding stakeholders into confidence before releasing Stable Diffusion. It should also go without saying that because the license chooses to colour within the lines of each country’s laws, a country that doesn’t recognise X as a crime will also fail to recognise harm in the harassment of victims of X – now with the help of Stable Diffusion. And the vast majority of these victims are women and children, already disempowered by economic, social and political inequities. Is Stability.ai going to deal with these people and their problems? I think not. But as I said, the cat’s already out of the bag.

When a teenager wants to solve poaching with machine-learning…

We always need more feel-good stories, but we especially need feel-good stories that withstand closer scrutiny instead of falling apart, and that are framed the right way.

For example, Smithsonian magazine published an article with the headline ‘This Teenager Invented a Low-Cost Tool to Spot Elephant Poachers in Real Time’ on August 4. It’s a straightforward feel-good story at first glance: Anika Puri is a 17-year-old in New York who created a machine-learning model (based on an existing dataset) “that analyses the movement patterns of humans and elephants”. The visual input for the model comes from a $250 thermal camera attached to an iPhone, which is in turn mounted on a drone; the drone flies over problem areas and collects footage, which the model then sifts through to pick out the presence of humans. One caveat: the machine-learning model can detect people, not poachers.

Nonetheless, this is clearly laudable work by a 17-year-old – but the article is an affront to people working in India because it plainly overlooks everything that makes elephant poaching tenacious enough to have caught Puri’s attention in the first place. A 17-year-old did this and we should celebrate her, you say, and that’s fair. But we can do that without making what she did sound like a bigger deal than it is, which would also provide a better sense of how much work she has left to do, while expressing our belief – this is important – that we look forward to her and others like her applying their minds to really doing something about the problem. This way, we may also be able to salvage two victims of the Smithsonian article.

The first is the question of why elephant poaching persists. The article gives the impression that it does for want of a way to tell when humans walk among elephants in the wild. The first red flag in the article, to me at least, is related to this issue and turns up in the opening itself:

When Anika Puri visited India with her family four years ago, she was surprised to come across a market in Bombay filled with rows of ivory jewelry and statues. Globally, ivory trade has been illegal for more than 30 years, and elephant hunting has been prohibited in India since the 1970s. “I was quite taken aback,” the 17-year-old from Chappaqua, New York, recalls. “Because I always thought, ‘well, poaching is illegal, how come it really is still such a big issue?'”

I admit I take a cynical view of people who remain ignorant in this day and age of the bigger problems assailing the major realms of human enterprise – but a 17-year-old being surprised by the availability of ivory ornaments in India is pushing it, and more so by being surprised that there’s a difference between the existence of a law and its proper enforcement. Smithsonian also presents Puri’s view as an outsider, which she is in more than the geographical sense, followed by her resolving to do something about it from the outside. That was the bigger issue and a clear sign of the narrative to come.

Poaching and animal-product smuggling persist in India, among other countries, sensu lato because of a lack of money, a lack of personnel, misplaced priorities, and malgovernance and incompetence. The first and the third reasons are related: the Indian government’s conception of how the country’s forests ought to be protected regularly excludes the welfare of the people living in and dependent on those forests, and thus socially and financially alienates them. As a result, some of those affected see a strong incentive in animal poaching and smuggling. (There are famous exceptions to this trend, like the black-necked crane of Arunachal Pradesh, the law kyntang forests of Meghalaya or the whale sharks off Gujarat, but they’re almost always rooted in spiritual beliefs – something the IUCN wants to press to the cause of conservation.)

Similarly, forest rangers are underpaid, overworked, use dysfunctional or outdated equipment and, importantly, are often caught between angry locals and an insensitive local government. In India, theirs is a dispiriting vocation. In this context, the use of drones plus infrared cameras that each cost Rs 20,000 is laughable.

The ‘lack of personnel’ is a two-part issue: first, it helps the cause of animal conservation if the personnel include members of local communities, but they seldom do; second, India is a very large country, so we need more rangers (and more drones!) to patrol all areas, without any blind spots. Anika Puri’s solution touches none of these problems – and I don’t blame her. I blame the Smithsonian for its lazy framing of the story, and in fact for telling us nothing of whether she’s already aware of these issues.

The second problem with the framing has to do with ‘encouraging a smart person to do more’ on the one hand and the type of solution being offered to a problem on the other. This one really gets my goat. When Smithsonian played up Puri’s accomplishment, such as it is, it effectively championed techno-optimism: the belief that technology is a moral good and that technological solutions can solve our principal crises (crises that techno-optimists like to play up so that they seem more pressing, and thus more in need of the sort of fixes that machine-centric governance can provide). In the course of this narrative, however, the sociological and political solutions that poaching desperately requires fall by the wayside, even as the trajectories of the tech and its developer are celebrated as a feel-good story.

In this way, the Smithsonian article has effectively created a false achievement, a red herring that showcases its subject’s technical acumen instead of a meaningful development towards solving poaching. On the other hand, how often do you read profiles of people, young or old, whose insights have been concerned less with ‘hardware’ solutions (technological innovation, infrastructure, etc.) and more with improving and implementing the ‘software’ – that is, changing people’s behaviour, deliberating on society’s aspirations and effecting good governance? How often do you also encounter grants and contests of the sort that Puri won with her idea but which are dedicated to the ‘software’ issues?

Getting ahead of theory, experiment, ourselves

Science journalist Laura Spinney wrote an article in The Guardian on January 9, 2022, entitled ‘Are we witnessing the dawn of post-theory science?’. This excerpt from the article captures its points well, I thought:

Or take protein structures. A protein’s function is largely determined by its structure, so if you want to design a drug that blocks or enhances a given protein’s action, you need to know its structure. AlphaFold was trained on structures that were derived experimentally, using techniques such as X-ray crystallography and at the moment its predictions are considered more reliable for proteins where there is some experimental data available than for those where there is none. But its reliability is improving all the time, says Janet Thornton, former director of the EMBL European Bioinformatics Institute (EMBL-EBI) near Cambridge, and it isn’t the lack of a theory that will stop drug designers using it. “What AlphaFold does is also discovery,” she says, “and it will only improve our understanding of life and therapeutics.”

Essentially, the article is concerned with machine-learning’s ability to parse large amounts of data, find patterns in them and use them to generate theories – taking over an important realm of human endeavour. In keeping with tradition, it doesn’t answer the question in its headline with a definitive ‘yes’ but with a hard ‘maybe’ to a soft ‘no’. Spinney herself ends by quoting Picasso: “Computers are useless. They can only give you answers” – although the para right before belies the painter’s confidence with a prayer that the human way to think about theories is still meaningful and useful:

The final objection to post-theory science is that there is likely to be useful old-style theory – that is, generalisations extracted from discrete examples – that remains to be discovered and only humans can do that because it requires intuition. In other words, it requires a kind of instinctive homing in on those properties of the examples that are relevant to the general rule. One reason we consider Newton brilliant is that in order to come up with his second law he had to ignore some data. He had to imagine, for example, that things were falling in a vacuum, free of the interfering effects of air resistance.

I’m personally cynical about such claims. If we think we are going to be obsolete, there must be a part of the picture we’re missing.

There was an idea partly similar to this ‘post-theory hypothesis’ a few years ago, pointing the other way. In 2013, philosopher Richard Dawid wrote a 190-page essay attempting to make the case that string theory shouldn’t be held back by the lack of experimental evidence, i.e. that it was post-empirical. Of course, Spinney is writing about machines taking over the responsibility of, but not precluding the need for, theorising – whereas Dawid and others have argued that string theory doesn’t need experimental data to be considered true.

The idea of falsifiability is important here. If a theory is flawed and if you can design an experiment that would reveal that flaw, the theory is said to be falsifiable. A theory can be flawless but still falsifiable: Newton’s theory of gravity, for example, is complete and useful in a limited context but can’t explain the precession of the perihelion of Mercury’s orbit. An example of an unfalsifiable theory is the one underlying astrology. In science, falsifiable theories are said to be better than unfalsifiable ones.

I don’t know what impact Dawid’s book-length effort had, although others before and after him have supported the view that scientific theories need no longer be falsifiable in order to be legitimate – Sean Carroll, for one. While I’m not familiar enough with criticisms of the philosophy of falsifiability, I found a better reason to consider trusting the validity of string theory sans experimental evidence in a June 2017 preprint paper by Eva Silverstein:

It is sometimes said that theory has strayed too far from experiment/observation. Historically, there are classic cases with long time delays between theory and experiment – Maxwell’s and Einstein’s waves being prime examples, at 25 and 100 years respectively. These are also good examples of how theory is constrained by serious mathematical and thought-experimental consistency conditions.

Of course electromagnetism and general relativity are not representative of most theoretical ideas, but the point remains valid. When it comes to the vast theory space being explored now, most testable ideas will be constrained or falsified. Even there I believe there is substantial scientific value to this: we learn something significant by ruling out a valid theoretical possibility, as long as it is internally consistent and interesting. We also learn important lessons in excluding potential alternative theories based on theoretical consistency criteria.

This said, Dawid’s book, entitled String Theory and the Scientific Method, was perhaps the most popular pronouncement of his views in recent years (at least in terms of coverage in the non-technical press), even if by then he’d been propounding them for nine years and his supporters included a bevy of influential physicists. Very simply put, an important part of Dawid’s argument was that string theory, as a theory, has certain characteristics that make it the only possible theory for all the epistemic niches it fills, so as long as we expect all those niches to be filled by a single theory, string theory may be true by virtue of being the sole possible option.

It’s not hard to see the holes in this line of reasoning, but again, I’ve considerably simplified his idea. This said, physicist Peter Woit has been (from what little I’ve seen) the most vocal critic of string theorists’ appeals to ‘post-empirical realism’ and has often directed his ire against the uniqueness hypothesis, significantly because accepting it would endanger, for the sake of just one theory’s survival, the foundation upon which almost every other valid scientific theory stands. You must admit this is a powerful argument, and to my mind more persuasive than Silverstein’s.

In the words of another physicist, Carlo Rovelli, from September 2016:

String theory is a proof of the dangers of relying excessively on non-empirical arguments. It raised great expectations thirty years ago, when it promised to [solve a bunch of difficult problems in physics]. Nothing of this has come true. String theorists, instead, have [made a bunch of other predictions to explain why it couldn’t solve what it set out to solve]. All this was false.

From a Popperian point of view, these failures do not falsify the theory, because the theory is so flexible that it can be adjusted to escape failed predictions. But from a Bayesian point of view, each of these failures decreases the credibility in the theory, because a positive result would have increased it. The recent failure of the prediction of supersymmetric particles at LHC is the most flagrant example. By Bayesian standards, it lowers the degree of belief in string theory dramatically. This is an empirical argument. Still, Joe Polchinski, a prominent string theorist, writes that he evaluates the probability of string theory to be correct at 98.5% (!).

Scientists that devoted their life to a theory have difficulty to let it go, hanging on non-empirical arguments to save their beliefs, in the face of empirical results that Bayes confirmation theory counts as negative. This is human. A philosophy that takes this as an exemplar scientific attitude is a bad philosophy of science.
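
To make the Bayesian part of Rovelli’s argument concrete, here’s a toy calculation with entirely made-up numbers: each failed prediction leaves the theory unfalsified but drags the credence down.

```python
# A toy Bayesian update, with entirely made-up numbers, to illustrate Rovelli's
# point: every failed prediction lowers the credence in a theory, even if it
# doesn't falsify the theory outright.
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: posterior probability of the theory given the evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator

credence = 0.5   # assumed starting credence in the theory
# Suppose the theory made a negative result (e.g. "no new particles at the LHC")
# only 40% likely, while rival explanations made it 90% likely.
for _ in range(3):   # three such failed predictions in a row
    credence = update(credence, p_evidence_if_true=0.4, p_evidence_if_false=0.9)
    print(round(credence, 3))
# Prints 0.308, 0.165, 0.081 - the credence drops with each disappointment.
```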

Google Docs: A New Hope

I suspect the Google Docs grammar bot is the least useful bot there is. After hundreds of suggestions, I can think of only one instance in which it was right. Is its failure rate so high because it learns from how other people use English, instead of drawing from a basic ruleset?

I’m not saying my grammar is better than everyone else’s, but if the bot is learning from how non-native users of the English language construct their sentences, I can see how it would make the suggestions it does, especially about the use of commas and singular/plural referents.

Then again, what I see as failure might be entirely invisible to someone not familiar with, or even interested in, punctuation pedantry. This is where Google Docs’s bot presents an interesting opportunity.

The rules of grammar and punctuation exist to assist the construction and inference of meaning, not to railroad them. However, this definition doesn’t say whether good grammar is simply what most people use and are familiar with or what is derived from a foundational set of rules and guidelines.

Thanks to colonialism, imperialism and industrialism, English has become the world’s de facto common language, but thanks to their inherent political structures, English is also the language of the elite in postcolonial societies that exhibit significant economic inequality.

So those who wield English ‘properly’ – by deploying the rules of grammar and punctuation the way they’re ‘supposed’ to – are also those who have been able to afford a good education. Ergo, deferring to the fundamental ruleset is to flaunt one’s class privilege, and to expect others to do so could play out as a form of linguistic subjugation (think The New Yorker).

On the other hand, the problem with the populist ontology is that it encourages everyone to develop their own styles and patterns based on what they’ve read – after all, there is no one corpus of popular literature – that are very weakly guided by the same logic, if they’re guided by any logic at all. This could render individual pieces difficult to read (or edit).

Now, a question automatically arises: So what? What does each piece employing a different grammar and punctuation style matter as long as you understand what the author is saying? The answer, to me at least, depends on how the piece is going to find itself in the public domain and who is going to read it.

For example, I don’t think anyone would notice if I published such erratic pieces on my blog (although I don’t) – but people will if such pieces show up in a newspaper or a magazine, because newsrooms enforce certain grammatical styles for consistency. Such consistency ensures that:

  1. Insofar as grammar must assist inference, consistent patterns ensure a regular reader is familiar with the purpose the publication’s styleguide serves in the construction of sentences and paragraphs, which in turn renders the symbols more useful and invisible at the same time;
  2. The writers, while bringing to bear their own writing styles and voices, still use a ‘minimum common’ style unique to and associated with the publication (and which could ease decision-making for some writers); and
  3. The publication can reduce the amount of resources expended to train each new member of its copy-editing team

Indeed, I imagine grammatical consistency matters to any professional publication because of the implicit superiority of perfect evenness. But where it gets over the top and unbearable is when its purpose is forgotten, or when it is effected as a display of awareness of, or affiliation to, some elite colonial practice.

Now, while we can agree that the populist definition is less problematic on average, we must also be able to recognise that the use of a ‘minimum common’ remains a good idea if only to protect against the complete dilution of grammatical rules with time. For example, despite the frequency with which it is abused, the comma still serves at least one specific purpose: to demarcate clauses.

In this regard, the Google Docs bot could help streamline the chaos. According to the service’s support documentation, the bot learns its spelling instead of banking exclusively on a dictionary; it’s not hard to extrapolate this behaviour to grammar and syntactic rules as well.

Further, every time you reject the bot’s suggested change, the doc displays the following message: “Thanks for submitting feedback! The suggestion has been automatically ignored.” This isn’t sufficient evidence to conclude that the bot doesn’t learn. For one, the doc doesn’t display a similar message when a suggestion is accepted. For another, Google tracks the following parameters when you’re editing a doc:

customer-type, customer-id, customer-name, storageProvider, isOwner, editable, commentable, isAnonymousUser, offlineOptedIn, serviceWorkerControlled, zoomFactor, wasZoomed, docLocale, locale, docsErrorFatal, isIntegrated, companion-guest-Keep-status, companion-guest-Keep-buildLabel, companion-guest-Tasks-status, companion-guest-Tasks-buildLabel, companion-guest-Calendar-status, companion-guest-Calendar-buildLabel, companion-expanded, companion-overlaying-host-content, spellGrammar, spellGrammarDetails, spellGrammarGroup, spellGrammarFingerprint

Of them, spellGrammar is set to true and I assume spellGrammarFingerprint corresponds to a unique ID.

So assuming further that it learns through individual feedback, the bot must be assimilating a dataset in the background within whose rows and columns an ‘average modal pattern’ could be taking shape. As more and more users accept or reject its suggestions, the mode could become correspondingly more significant and form more of the basis for the bot’s future suggestions.
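
If that is indeed how it works – and this is pure speculation on my part – the bookkeeping might look something like the hypothetical sketch below, in which each suggestion type accumulates accept/reject counts and the majority decision shapes what gets suggested next.

```python
# A purely hypothetical sketch of the 'average modal pattern' idea: aggregate
# accept/reject feedback per suggestion type and prefer the majority choice in
# future. Nothing here reflects how Google Docs actually works.
from collections import Counter, defaultdict

feedback = defaultdict(Counter)   # suggestion type -> counts of user decisions

def record(suggestion_type, accepted):
    """Log one user's decision on one suggestion."""
    feedback[suggestion_type]["accepted" if accepted else "rejected"] += 1

def should_keep_suggesting(suggestion_type):
    """Keep offering a suggestion only if most users have accepted it so far."""
    counts = feedback[suggestion_type]
    return counts["accepted"] >= counts["rejected"]

# A few made-up data points:
record("comma_before_conjunction", accepted=False)
record("comma_before_conjunction", accepted=False)
record("comma_before_conjunction", accepted=True)
record("singular_plural_agreement", accepted=True)

print(should_keep_suggesting("comma_before_conjunction"))   # False
print(should_keep_suggesting("singular_plural_agreement"))  # True
```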

There are three problems, however.

First, if individual preferences have diverged to such an extent as to disfavour the formation of a single most significant modal style, the bot is unlikely to become useful in a reasonable amount of time unless it combines user feedback with the preexisting rules of grammar and composition.

Second, Google could have designed each bot to personalise its suggestions according to each account-holder’s writing behaviour. This is quite possible because the more the bot is perceived to be helpful, the likelier its suggestions are to be accepted, and the likelier the user is to continue using Google Docs to compose their pieces.

However, I doubt the bot I encounter on my account is learning from my feedback alone, and that gives me… hope?

Third: if the bot learns only spelling but not grammar and punctuation use, it would be – as I first suspected – the least useful bot there is.

Injustice ex machina

There are some things I think about but struggle to articulate, especially in the heat of an argument with a friend. Cory Doctorow succinctly captures one such idea here:

Empiricism-washing is the top ideological dirty trick of technocrats everywhere: they assert that the data “doesn’t lie,” and thus all policy prescriptions based on data can be divorced from “politics” and relegated to the realm of “evidence.” This sleight of hand pretends that data can tell you what a society wants or needs — when really, data (and its analysis or manipulation) helps you to get what you want.

If you live in a country ruled by a nationalist government tending towards the ultra-nationalist, you’ve probably already encountered the first half of what Doctorow describes: the championing of data, and of quantitative metrics in general, the conflation of objectivity with quantification, and the overbearing focus on logic and mathematics to the point of eliding cultural and sociological influences.

Material evidence of the latter is somewhat more esoteric, yet more common in developing countries, where the capitalist West’s influence vis-à-vis consumption and the (non-journalistic) media is distinctly more apparent – and it is impossible to unsee once you’ve seen it.

Notwithstanding the practically unavoidable consequences of consumerism and globalisation, the aspirations of the Indian middle and upper classes are propped up chiefly by American and European lifestyles. As a result, it becomes harder to tell the “what society needs” and the “get what you want” tendencies apart. Those developing new technologies to (among other things) enhance their profits arising from this conflation are obviously going to have a harder time seeing it and an even harder time solving for it.

Put differently, AI/ML systems – at least those in Doctorow’s conception, in the form of machines adept at “finding things that are similar to things the ML system can already model” – born in Silicon Valley have no reason to assume a history of imperialism and oppression, so the problems they are solving for are off-target by default.

But there is indeed a difference, and not infrequently the simplest way to uncover it is to check what the lower classes want. More broadly, what do the actors with the fewest degrees of freedom in your organisational system want, assuming all actors already want more freedom?

They – as much as others, and at the risk of treating them as a monolithic group – may not agree that roads need to be designed for public transportation (instead of cars), that the death penalty should be abolished or that fragmenting a forest is wrong, but they are likely to determine how a public distribution system, a social security system or a neighbourhood policing system can work better.

What they want is often what society needs – and although this might predict the rise of populism, and even anti-intellectualism, it is nonetheless a sort of pragmatic final check when it has become entirely impossible to distinguish between the just and the desirable courses of action. I wish I didn’t have to hedge my position with the “often” but I remain unable with my limited imagination to design a suitable workaround.

Then again, I am also (self-myopically) alert to the temptation of technological solutionism, and acknowledge that discussions and negotiations are likely easier, even if messier, to govern with than ‘one principle to rule them all’.

Hey, is anybody watching Facebook?

The Boston Marathon bombings in April 2013 kicked off a flurry of social media activity that was equal parts well-meaning and counterproductive. Users on Facebook and Twitter shared reports, updates and photos of victims, spending little time on verifying them before sharing them with thousands of people.

Others on forums like Reddit and 4chan started to zero in on ‘suspects’ in photos of people seen with backpacks. Despite the amount of distress and disruption these activities caused, social media broadly also served to channel grief and help, and became a notable part of the Boston Marathon bombings story.

In our daily lives, these platforms serve as news forums. With each person connected to hundreds of others, there is a strong magnification of information, especially once it crosses a threshold. They make it easier for everybody to be news-mongers (not journalists). Add this to the idea that using a social network can just as easily be a social performance, and you realize how the sharing of news can also be part of the performance.

Consider Facebook: Unlike Twitter, it enables users to share information in a variety of forms – status updates, questions, polls, videos, galleries, pages, groups, etc. – allowing whatever news is being shared to retain its multifaceted character, and imposing no character limit on what you have to say about it.

Facebook v. Twitter

So you’d think people who want the best updates on breaking news would go to Facebook, and that’s where you might be wrong. ‘Might’ because, on the one hand, Twitter has a lower response time, keeps news very accessible, encourages a more non-personal social media performance, and has a high global reach. These reasons have also made Twitter a favorite among researchers who want to study how information behaves on a social network.

On the other hand, almost 30% of the American general population gets its news from Facebook, with Twitter and YouTube roughly on par with each other at about 10%, if a Pew Research Center technical report is to be believed. Other surveys have also shown that there are more people from India on Facebook than on Twitter. At this point, it would seem remiss to dismiss Facebook when you realize it has 1.28 billion monthly active users from around the world.

A screenshot of Facebook Graph Search.

Since 2013, Facebook has made it easier for users to find news in its pages. In June that year, it introduced the #hashtagging facility to let users track news updates across various conversations. In September, it debuted Graph Search, making it easier for people to locate topics they wanted to know more about. Even though the platform’s allowance for privacy settings stunts the kind of free propagation of information that’s possible on Twitter (and only 28% of Facebook users made any of their content publicly available), Facebook’s volume of updates enables its fraction of public updates to rise to levels comparable with those of Twitter.

Ponnurangam Kumaraguru and Prateek Dewan, from the Indraprastha Institute of Information Technology, New Delhi (IIIT-D), leveraged this to investigate how Facebook and Twitter compared when sharing information on real-world events. Kumaraguru explained his motivation: “Facebook is so famous, especially in India. It’s much bigger in terms of the number of users. Also, having seen so many studies on Twitter, we were curious to know if the same outcomes as from work done on Twitter would hold for Facebook.”

The duo used the social networks’ respective APIs to query for keywords related to 16 events that occurred during 2013. They explain, “Eight out of the 16 events we selected had more than 100,000 posts on both Facebook and Twitter; six of these eight events saw over 1 million tweets.” Their pre-print paper was submitted to arXiv on May 19.
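
As an aside, the keyword-matching step of a study like this might look something like the sketch below; the event names and keywords are invented for illustration, and the actual API calls (and their rate limits) are omitted.

```python
# A hypothetical sketch of the keyword-matching step of such a study: given a
# stream of posts already fetched from an API, keep those that mention an
# event's keywords. The event names and keywords are made up for illustration.
EVENT_KEYWORDS = {
    "boston_marathon_blasts": {"boston marathon", "boston blast", "marathon bombing"},
    "cyclone_phailin": {"cyclone phailin", "phailin"},
}

def matching_events(post_text, event_keywords=EVENT_KEYWORDS):
    """Return the events whose keywords appear in a post."""
    text = post_text.lower()
    return [event for event, keywords in event_keywords.items()
            if any(keyword in text for keyword in keywords)]

posts = [
    {"source": "facebook", "text": "Praying for everyone at the Boston Marathon"},
    {"source": "twitter", "text": "Cyclone Phailin expected to make landfall tonight"},
]
for post in posts:
    print(post["source"], matching_events(post["text"]))
```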

An upper hand

In all, they found that news of an unprecedented event appeared on Facebook in a little over 11 minutes, while on Twitter, according to a 2014 study from the Association for the Advancement of Artificial Intelligence (AAAI), it took more than ten times as long. Specifically, after the Boston Marathon bombings, “the first [relevant] Facebook post occurred just 1 minute 13 seconds after the first blast, which was 2 minutes 44 seconds before the first tweet”.

However, this order-of-magnitude difference could be restricted to Kumaraguru’s choice of events because the AAAI study claims breaking news was broken fastest during 29 major events on Twitter, although it considered only updates on trending topics (and the first update on Twitter, according to them, appeared after two hours).

The data-mining technique could also have played a role in offsetting the time taken for an event to be detected, because it requires the keywords being searched for to be entered manually. Finally, the Facebook API is known to be more rigorous than Twitter’s, whose ability to return older tweets is restricted. On the downside, the output from the Facebook API is restricted by users’ privacy settings.

Nevertheless, Kumaraguru’s conclusions paint a picture of Facebook being just as resourceful as Twitter when tracking real-world events – especially in India – leaving news discoverability to take the blame. Three of the 16 chosen events were completely local to India, and they were all accompanied by more activity on Facebook than on Twitter.


Even after the duo corrected for URLs shared on both social networks simultaneously (through clients like Buffer and HootSuite) – 0.6% of the total – Facebook had the upper hand not just in primacy but also origin. According to Kumaraguru and Dewan, “2.5% of all URLs shared on Twitter belonged to the facebook.com domain, but only 0.8% of all URLs shared on Facebook belonged to the twitter.com domain.”

Facebook also seemed qualitatively better because spam was present in only five events; on Twitter, spam was found in 13. This disparity could be factored in by programs built to filter spam from social media timelines in real time – the sort of service that journalists will find very useful.

Kumaraguru and Dewan resorted to picking out spam based on differences in sentence styles. This way, they were also able to catch spam that was stylistically conventional but irrelevant in terms of content. A machine wouldn’t have been able to do this as well, and in real time, unless it was taught to – in much the same way you teach your Google Mail inbox to sort email automatically.
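
For a sense of what ‘teaching’ such a machine might involve, here’s a small sketch of a text classifier trained on invented, labelled examples – not the method Kumaraguru and Dewan actually used, which was manual and style-based.

```python
# A small sketch of how a machine could be taught to flag spam in a stream of
# posts: a simple text classifier trained on labelled examples. The training
# data here is invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = [
    "Explosions reported near the marathon finish line",
    "Police have cordoned off the area, avoid Boylston Street",
    "Click here to win a free iPhone!!!",
    "Follow us for hot deals on sunglasses",
]
labels = ["news", "news", "spam", "spam"]   # invented labels for illustration

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(posts, labels)

# Classify new, unseen posts; with real training data this would run over a
# live timeline to separate situational updates from junk.
print(classifier.predict(["Free giveaway, click now!!!",
                          "Authorities ask residents to stay indoors"]))
```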

Digital information forensics

A screenshot of TweetCred at work. Image: Screenshot of TweetCred Chrome Extension

Patrick Meier, a self-proclaimed – but reasonably so – pioneer in the emerging field of humanitarian technologies, wrote a blog post on April 28 describing a browser extension called TweetCred, which is just this sort of learning machine. Install it and open Twitter in your browser. Above each tweet, you will now see a credibility rating bar that grades the tweet out of 7 points, with 7 denoting the highest credibility.

If you agree with a rating, you can bolster it with a thumbs-up that appears on hover. If you disagree, you can give the shown TweetCred rating a thumbs-down and mark what you think is correct. Meier makes it clear that, in its first avatar, the app is geared toward rating disaster/crisis tweets. A paper describing the app was submitted to arXiv on May 21, co-authored by Kumaraguru, Meier, Aditi Gupta (IIIT-D) and Carlos Castillo (Qatar Computing Research Institute).

Between the two papers, a common theme is the origin and development of situational awareness. We stick to Twitter for our breaking news because it’s conceptually similar to Facebook, fast and, importantly, cuts to the chase, so to speak. In parallel, we’re also aware that Facebook is similarly equipped to reconstruct details because of its multimedia options and timeline. Even if Facebook and Twitter the organizations believe they are designed to accomplish different things, the distinction blurs in the event of a real-world crisis.

“Both these networks spread situational awareness, and both do it fairly quickly, as we found in our analysis,” Kumaraguru said. “We’d like to explore the credibility of content on Facebook next.” But as far as establishing a mechanism to study the impact of Facebook and Twitter on the flow of information is concerned, the authors have exposed a facet of Facebook that Facebook, Inc., could help leverage.