Neuromorphic hype

We all know there’s a difference between operating an Indica Diesel car and a WDP 4 diesel locomotive. The former has two cylinders and the latter 16. But that doesn’t mean the WDP 4 simply has eight times as many components as the Indica. This is what comes to mind when I come across articles that trumpet an achievement without paying any attention to its context.

In an example from yesterday, IEEE Spectrum published an article with the headline ‘Nanowire Synapses 30,000x Faster Than Nature’s’. An artificial neural network is a network of small data-processing components called neurons. Once the neurons are fed data, they work together to analyse it and solve problems (like spotting the light from one star in a picture of a galaxy). The network also iteratively adjusts the connections between neurons, called synapses, so that the neurons cooperate more efficiently. The architecture and the process broadly mimic the way the human brain works, so they’re also collected under the label ‘neuromorphic computing’.

Now consider this excerpt:

“… a new superconducting photonic circuit … mimics the links between brain cells—burning just 0.3 percent of the energy of its human counterparts while operating some 30,000 times as fast. … the synapses are capable of [producing output signals at a rate] exceeding 10 million hertz while consuming roughly 33 attojoules of power per synaptic event (an attojoule is 10⁻¹⁸ of a joule). In contrast, human neurons have a maximum average [output] rate of about 340 hertz and consume roughly 10 femtojoules per synaptic event (a femtojoule is 10⁻¹⁵ of a joule).”
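Those headline figures follow directly from the numbers in the excerpt. Here is a quick, purely illustrative back-of-the-envelope check of the arithmetic (in Python):

```python
# Back-of-the-envelope check of the figures quoted above (illustrative only).
synapse_rate_hz = 10e6      # artificial synapse: output events exceeding 10 million per second
neuron_rate_hz = 340        # human neuron: maximum average rate of ~340 per second
synapse_energy_j = 33e-18   # ~33 attojoules per synaptic event
neuron_energy_j = 10e-15    # ~10 femtojoules per synaptic event

print(synapse_rate_hz / neuron_rate_hz)          # ~29,400 -> the "30,000 times as fast" claim
print(100 * synapse_energy_j / neuron_energy_j)  # ~0.33 -> the "0.3 percent of the energy" claim
```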

The article, however, skips the fact that the researchers operated only four circuit blocks in their experiment – while there are 86 billion neurons on average in the human brain working at the ‘lower’ efficiency. When such a large assemblage functions together, there are emergent problems that aren’t present when a smaller assemblage is at work, like removing heat and clearing cellular waste. (The human brain also contains “85 billion non-neuronal cells”, including the glial cells that support neurons.) The energy efficiency of the neurons must be seen in this context, instead of being directly compared to a bespoke laboratory setup.

Philip W. Anderson’s ‘more is different’ argument provides a more insightful case against such reductive thinking. In a 1972 essay, Anderson, a theoretical physicist, wrote:

“The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe. In fact, the more the elementary particle physicists tell us about the nature of the fundamental laws the less relevance they seem to have to the very real problems of the rest of science, much less to those of society.”

He contended that the constructionist hypothesis – that you can start from first principles and arrive straightforwardly at the cutting edge of any field built on them – “breaks down” when confronted with “the twin difficulties of scale and complexity”. That is, things that operate on a larger scale and with more individual parts are physically greater than the sum of those parts. (I like to think of Anderson’s insight as the spatial analogue of L.P. Hartley’s time-related statement of the same nature: “The past is a foreign country; they do things differently there.”)

So let’s not celebrate something because it’s “30,000x faster than” the same thing in nature – as the Spectrum article’s headline goes – but because it represents good innovation in and of itself. Indeed, the researchers who conducted the new study and are quoted in the article don’t make the comparison themselves but focus on the leap forward their innovation portends in the field of neuromorphic computing.

Faulty comparisons, on the other hand, could inflate readers’ expectations of what future innovation might deliver, and when those innovations (almost) inevitably start to fall short of nature’s achievements, the unmet expectations could seed disillusionment. We’ve already had this happen with quantum computing. Spectrum’s choice could have been motivated by wanting to pique readers’ interest, which is a fair thing to aspire to, but the fact remains that the headline leaned on a clichéd comparison with nature instead of expending more effort to frame the idea right.

Why everyone should pay attention to Stable Diffusion

Many of the people in my circles hadn’t heard of Stable Diffusion until I told them, and I was already two days late. Heralds of new technologies have a tendency to play up every new thing, however incremental, as the dawn of a new revolution – but in this case, their cries of wolf may be real for once.

Stable Diffusion is an AI tool produced by Stability.ai with help from researchers at the Ludwig Maximilian University of Munich and the Large-scale AI Open Network (LAION). It accepts text or image prompts and converts them into artwork based on, but not necessarily understanding, what it ‘sees’ in the input. It created the image below with my prompt “desk in the middle of the ocean vaporwave”. You can create your own here.

But it strayed into gross territory with a different prompt: “beautiful person floating through a colourful nebula”.

Stable Diffusion is like OpenAI’s DALL-E 1/2 and Google’s Imagen and Parti but with two crucial differences: it’s capable of image-to-image (img2img) generation as well and it’s open source.

The img2img feature is particularly mind-blowing because it allows users to describe a scene in text and then guide the Stable Diffusion AI with a little bit of their own art. Even a drawing on MS Paint with a few colours will do. And while OpenAI and Google hold their cards very close to their chests, with the latter refusing to release Imagen or Parti even in private betas, Stability.ai has – in keeping with its vision to democratise AI – opened Stable Diffusion for tinkering and augmentation by developers en masse. Even the ways in which Stable Diffusion has been released are important: trained developers can work directly with the code while untrained users can access the model in their browsers, without any code, and start producing images. In fact, you can download and run the underlying model on your own system, though it requires slightly higher-end specs. Users have already created ways to plug it into photo-editing software like Photoshop.
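To give a sense of how accessible the release is, here’s a minimal sketch of generating an image locally with Hugging Face’s diffusers library, one popular way of running the model at the time of writing. The checkpoint name and arguments are assumptions based on the library’s documentation, and your mileage will vary with your hardware:

```python
# A minimal, illustrative sketch of text-to-image generation with Stable Diffusion
# through the Hugging Face 'diffusers' library. The checkpoint name and arguments
# are assumptions; check the library's documentation for the current interface.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # assumed checkpoint on the Hugging Face Hub
    torch_dtype=torch.float16,         # half precision, to fit on consumer GPUs
)
pipe = pipe.to("cuda")                 # a reasonably capable GPU is the 'higher-end spec'

image = pipe("desk in the middle of the ocean vaporwave").images[0]
image.save("vaporwave_desk.png")
```

The same library also ships an image-to-image pipeline (StableDiffusionImg2ImgPipeline) that takes a rough seed image alongside the text prompt – the img2img workflow described above.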

Stable Diffusion uses a diffusion model: a filter (essentially an algorithm) that takes noisy data and progressively de-noises it. In incredibly simple terms, researchers take an image and in a step-wise process add more and more noise to it. Next they feed this noisy image to the filter, which then removes the noise from the image in a similar step-wise process. You can think of the image as a signal, like the images you see on your TV, which receives broadcast signals from a transmitter located somewhere else. These broadcast signals are basically bundles of electromagnetic waves with information encoded into the waves’ properties, like their frequency, amplitude and phase. Sometimes the visuals aren’t clear because some other undesirable signal has become mixed up with the broadcast signal, leading to grainy images on your TV screen. This undesirable information is called noise.

When the noise values are distributed in the shape of a bell curve, a.k.a. a Gaussian function, the noise is called Gaussian noise. Now, if we know the manner in which noise has been added to the image in each step, we can figure out what the filter needs to do to de-noise the image. Every Gaussian function can be characterised by two parameters, the mean and the variance. Put another way, you can generate different bell-curve-shaped signals by changing the mean and the variance in each case. So the filter effectively only needs to figure out what the mean and the variance of the noise in the input image are, and once it does, it can start de-noising. That is, Stable Diffusion is (partly) the filter here. The input you provide is the noisy image. Its output is the de-noised image. So when you supply a text prompt and/or an accompanying ‘seed’ image, Stable Diffusion just shows off how well it has learnt to de-noise your inputs.
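Here’s a toy sketch of that idea in NumPy – not Stable Diffusion’s actual code, just the principle: add Gaussian noise of known mean and variance to an image, and note that a filter that can estimate the noise can also undo it:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))           # stand-in for a clean image, values in [0, 1]

# Forward process: add Gaussian noise with a known mean and variance.
mean, std = 0.0, 0.1
noise = rng.normal(loc=mean, scale=std, size=image.shape)
noisy = image + noise

# Reverse process (idealised): if the filter estimates the noise correctly, it can
# subtract it back out. In a real diffusion model a neural network predicts the
# noise; here we cheat and reuse the noise we added, just to show the principle.
estimated_noise = noise
denoised = noisy - estimated_noise

print(np.allclose(denoised, image))    # True in this toy example
```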

Obviously, when millions of people use Stable Diffusion, the filter is going to be confronted with too many mean-variance combinations for it to predict them directly. This is where an artificial neural network (ANN) helps. ANNs are data-processing systems set up to mimic the way neurons work in our brain, by combining different pieces of information and manipulating them according to their knowledge of older information. The team that built Stable Diffusion trained its model on 5.8 billion image-text pairs found around the internet. An ANN is then trained on this dataset to learn how texts and images correlate, as well as how images correlate with other images.

To keep this exercise from getting out of hand, each image and text input is broken down into certain components, and the machine is instructed to learn correlations only between these components. Further, the researchers used an ANN model called an autoencoder. Here, the ANN encodes the input in its own representation, using only the information it has been taught to consider important. This intermediate representation is called the bottleneck layer. The network then decodes only the information present in this layer to produce the de-noised output. This way, the network also learns what about the input is most important. Finally, the researchers also guide the ANN by attaching weights to different pieces of information: that is, the system is informed that some pieces are to be emphasised more than others, so that it acquires a ‘sense’ of what is less and more desirable.
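Here’s a minimal sketch of the autoencoder pattern in PyTorch – nothing like Stable Diffusion’s actual architecture, which works on images and is vastly larger, but it shows the encode-to-a-bottleneck-then-decode idea described above:

```python
import torch
from torch import nn

class TinyAutoencoder(nn.Module):
    """Toy autoencoder: squeeze the input through a narrow bottleneck, then rebuild it."""
    def __init__(self, input_dim=784, bottleneck_dim=32):
        super().__init__()
        # Encoder: compress the input into the bottleneck representation.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # Decoder: reconstruct the input from only what survives in the bottleneck.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        bottleneck = self.encoder(x)   # only the 'important' information lives here
        return self.decoder(bottleneck)

model = TinyAutoencoder()
x = torch.rand(8, 784)                             # a batch of flattened toy 'images'
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)   # training would minimise this loss
print(loss.item())
```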

By snacking on all those text-image pairs, the ANN effectively acquires its own basis for deciding, when it’s presented with a new bit of text and/or image, what the mean and variance might be. Combine this with the filter and you get Stable Diffusion. (I should point out again that this is a very simple explanation and that parts of it may well be simplistic.)

Stable Diffusion also comes with a built-in NSFW filter, a component called the Safety Classifier, which stops the model from producing an output it deems harmful in some way. Will it suffice? Probably not, given the ingenuity of trolls, goblins and other bad-faith actors on the internet. More importantly, it can be turned off, meaning Stable Diffusion can be run without the Safety Classifier to produce deepfakes that are disturbing to various degrees.

Recommended here: Deepfakes for all: Uncensored AI art model prompts ethics questions.

But the problems with Stable Diffusion don’t lie only in the future, immediate or otherwise. As I mentioned earlier, to create the model, Stability.ai & co. fed their machine 5.8 billion text-image pairs scraped from the internet – without the consent of the people who created those texts and images. Because Stability.ai released Stable Diffusion in toto to the public, it has been experimented with by tens of thousands of people, at least, and developers have plugged it into a rapidly growing number of applications. This is to say that even if Stability.ai is forced to pull the software because it didn’t have the license to use those text-image pairs, the cat is out of the bag. There’s no going back. A blog post by LAION only says that the pairs were publicly available and that models built on the dataset should thus be restricted to research. Do you think the creeps on 4chan care? Worse yet, the jobs of the very people who created those text-image pairs are now threatened by Stable Diffusion, which can – with some practice to get your prompts right – produce exactly what you need, no illustrator or photographer required.

Recommended here: Stable Diffusion is a really big deal.

The third interesting thing about Stable Diffusion, after its img2img feature + “deepfakes for all” promise and the questionable legality of its input data, is the license under which Stability.ai has released it. AI analyst Alberto Romero wrote that “a state-of-the-art AI model” like Stable Diffusion “available for everyone through a safety-centric open-source license is unheard of”. This is the CreativeML Open RAIL-M license. Its preamble says, “We believe in the intersection between open and responsible AI development; thus, this License aims to strike a balance between both in order to enable responsible open-science in the field of AI.” Attachment A of the license spells out the restrictions – that is, what you can’t do if you agree to use Stable Diffusion according to the terms of the license (quoted verbatim):

“You agree not to use the Model or Derivatives of the Model:

  • In any way that violates any applicable national, federal, state, local or international law or regulation;
  • For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
  • To generate or disseminate verifiably false information and/or content with the purpose of harming others;
  • To generate or disseminate personal identifiable information that can be used to harm an individual;
  • To defame, disparage or otherwise harass others;
  • For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
  • For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
  • To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
  • For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
  • To provide medical advice and medical results interpretation;
  • To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).”

These restrictions effectively hand a heavy enforcement burden to law enforcement agencies around the world, and I don’t think Stability.ai took these stakeholders into confidence before releasing Stable Diffusion. It should also go without saying that because the license chooses to colour within the lines of each country’s laws, a country that doesn’t recognise X as a crime will also fail to recognise the harm in the harassment of victims of X – now with the help of Stable Diffusion. And the vast majority of these victims are women and children, already disempowered by economic, social and political inequities. Is Stability.ai going to deal with these people and their problems? I think not. But as I said, the cat’s already out of the bag.