I took six weeks off to raise a baby and everyone decided it was the time to declare the AI revolution imminent. It’s hard not to take it personally.
The tick-tock of new developments, each more impressive than the last – and each arriving on the scene faster than the last – hit its apogee last week with the near-simultaneous announcement of Google’s Bard and Microsoft’s Bing Chat. Since then, there’s been every possible permutation of the discourse, from millenarian claims of an imminent AI eschaton to dismissal of the entire field as glorified autocomplete.
I’m not here to settle that debate. Instead, if 2023 is the year AI changes everything, then early in that year is the time to dig a little deeper into what it is, how it works and why it is what it is. And the best way to do that is to start talking about all those little terms that get left out of mainstream coverage because they’re “too techy”.
What the key AI acronyms and jargon really mean
Neural network
Neural networks are the fundamental technology at the heart of the AI boom. Think of them as the equivalent of the steam engine in the first Industrial Revolution: a general-purpose technology that can reach out into myriad different industries and use cases and transform them.
First conceived in the 1940s, neural networks began as efforts to model animal brains, which are made of millions of simple neurons each connected to a few others. Each individual neuron is extremely simple, but quantity begets quality, and enough of them together can learn to perform complex tasks. And the same is true of artificial neural networks, though those neurons are mathematical abstractions in software rather than physical cells.
Like the steam engine, it took decades for the true power of the invention to be understood. Neural networks only work with enormous quantities of computing power and data, so for most of the last 70 years they remained curiosities. That changed at the turn of the millennium, and the age of AI began sputtering slowly into existence.
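As a rough sketch of what “simple neurons, lots of them” means in practice: each artificial neuron is just a weighted sum of its inputs plus a threshold, and a network is layers of them feeding into one another. The toy below (plain Python with NumPy, with random weights rather than learned ones, and layer sizes picked arbitrarily) is purely illustrative – a real network learns its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, pushed
    through a simple non-linearity ("fire only if positive")."""
    return max(0.0, float(np.dot(inputs, weights)) + bias)

def layer(inputs, n_neurons):
    """A layer is just many neurons looking at the same inputs."""
    weights = rng.normal(size=(n_neurons, len(inputs)))
    biases = rng.normal(size=n_neurons)
    return np.array([neuron(inputs, w, b) for w, b in zip(weights, biases)])

# A (very) small network: four inputs, eight hidden neurons, one output.
x = rng.normal(size=4)
hidden = layer(x, 8)
output = layer(hidden, 1)
print(output)
```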
LLM
A “large language model”, or LLM, is one of the two major AI approaches that have led to the latest burst of progress in the sector. It describes neural networks that are trained using huge collections of text data, like OpenAI’s GPT series, Google’s PaLM or Meta’s LLaMa. For instance, PaLM uses “high-quality web documents, books, Wikipedia, conversations and GitHub code” to develop an understanding of language.
The question an LLM is trying to answer is simple: given a short section of text, what comes next? But performing that task well is incredibly powerful. For one thing, it’s recursive. Once you’ve predicted what comes next, you have a new, slightly longer section of text, which you can feed back into the LLM and repeat the question, generating whole sentences, paragraphs, articles or books.
The question is also general purpose. Predicting what comes next for a short chunk of factual English text is different from predicting what comes next for a short chunk of code, or a question, or a poem, or a pair of translated sentences, or a logic puzzle – but the same approach seems to work quite well for all of those tasks. The larger the language model, the better the result: GPT-3 is 1,500 times bigger than GPT-1, and we don’t seem to be close to discovering the limit.
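To make that recursion concrete, here is a toy version of the feed-the-output-back-in loop. The `predict_next_word` function and its tiny word table are pure invention standing in for the LLM itself – a real model works on “tokens” and weighs billions of parameters – but the generation loop has the same shape.

```python
import random

# Stand-in for an LLM: given the text so far, suggest a plausible next word.
# A real model is a huge neural network; this toy picks from a hand-written table.
CONTINUATIONS = {
    "the": ["cat", "dog", "engine"],
    "cat": ["sat", "slept"],
    "sat": ["on"],
    "on": ["the"],
}

def predict_next_word(text: str) -> str:
    last_word = text.split()[-1]
    return random.choice(CONTINUATIONS.get(last_word, ["the"]))

def generate(prompt: str, n_words: int) -> str:
    text = prompt
    for _ in range(n_words):
        # Predict one word, append it, then ask the same question again
        # about the new, slightly longer text.
        text = text + " " + predict_next_word(text)
    return text

print(generate("the", 8))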
GAN
What LLMs have done for text, “generative adversarial networks” have done for images, films, music and more. Strictly speaking, a GAN is two neural networks: one built to label, categorise and rate, and the other built to create from scratch. By pairing them together, you can create an AI that can generate content on command.
Say you want an AI that can make pictures. First, you do the hard work of creating the labelling AI, one that can see an image and tell you what is in it, by showing it millions of images that have already been labelled, until it learns to recognise and describe “a dog”, “a bird”, or “a photograph of an orange cut in half, showing that its inside is that of an apple”. Then, you take that program and use it to train a second AI to trick it. That second AI “wins” if it can create an image to which the first AI will give the desired label.
Once you’ve trained that second AI, you’ve got what you set out to build: an AI that you can give a label and get a picture that it thinks matches the label. Or a song. Or a video. Or a 3D model.
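Here is a heavily compressed sketch of that two-network tug-of-war in PyTorch, using the classic form of the adversarial game in which the first network’s only job is to rate whether a sample looks real. The “images” are just numbers drawn from a distribution, and every size and learning rate is arbitrary – this is an illustration of the training loop, not production code.

```python
import torch
import torch.nn as nn

# Two networks: a discriminator that rates whether a sample looks "real",
# and a generator that tries to produce samples the discriminator accepts.
# Toy task: imitate numbers drawn from a normal distribution centred on 4.0.
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0          # "real" data
    fake = generator(torch.randn(64, 8))     # the generator's attempt

    # 1. Train the discriminator to tell real from fake.
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    d_opt.step()

    # 2. Train the generator to fool the discriminator ("winning" = being rated real).
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()

print(generator(torch.randn(5, 8)).detach().flatten())  # should drift towards 4.0
```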
Compute
Training a new AI model can be expensive. The final training run of GPT-3 cost around $10m of computing time, based on OpenAI’s research papers, and that figure leaves unsaid how many abortive attempts preceded the run that came out as intended. That hurdle – access to “compute”, or computing power – means that big general-purpose tools like LLMs tend to be the purview of massive companies. As far back as 2018, OpenAI was warning that the amount of compute used in AI training runs was doubling every three-and-a-half months. A year later, for that reason, the company announced it would shift away from its nonprofit model, citing the need “to invest billions of dollars in upcoming years into large-scale cloud compute”.
The UK is a world leader in AI research, thanks to the “golden triangle” of Oxford, Cambridge and London. But academics are often limited in their access to the amount of compute they need to work at the cutting edge, which has led to the commercial gains being captured by the American and Chinese corporate giants with billions to invest. That has led to calls for a government-owned “BritGPT”, built with public funds to provide the compute that UK researchers lack.
Black box
Neural networks are often described as a “black box”: the more competent they get, the harder it is to work out how they do what they do. GPT-3 contains 175bn “parameters”, each of which describes how strongly or weakly one neuron affects another. But it’s almost impossible to say what any given parameter does for the LLM as a whole.
Even the overall structure of the neural networks is something of a mystery. Sometimes, we can get a glimpse of order. The “T” in GPT stands for “Transformer”, a way of wiring up the neural network to allow it to mimic short-term memory, which obviously makes sense for something that involves reading a sentence a word at a time. But other aspects of neural network design are more trial and error: for instance, it seems that forcing a neural network to “squeeze” its thinking through a bottleneck of just a few neurons can improve the quality of the output. Why? We don’t really know. It just … does.
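To make “parameter” and “bottleneck” slightly more tangible, here is a toy PyTorch model with an autoencoder-style squeeze in the middle. The layer sizes are invented for illustration; the point is simply that every weight and bias is one parameter, a single number saying how strongly one neuron affects another.

```python
import torch.nn as nn

# A toy "bottleneck" network: 128 inputs squeezed through just 4 neurons
# before being expanded back out again. The sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 4),   nn.ReLU(),   # the bottleneck: only 4 neurons wide
    nn.Linear(4, 64),   nn.ReLU(),
    nn.Linear(64, 128),
)

# Every weight and bias is one parameter. GPT-3 has 175bn of them;
# this toy has 17,156 for the layer sizes above.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```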
Fine tuning
Not everything requires training an AI model from scratch. You can think of the $10m spent on GPT-3 as the cost of teaching an AI to read and write perfect English. But if all you want to do is develop an AI that can, say, write good scientific articles, you don’t need to start from scratch when AIs that can read English already exist: instead, you can “fine tune” those AIs on the specific data you want them to learn from, teaching them hyper-specific skills for a fraction of the cost. But there’s a risk in doing so: such fine tuning inevitably relies on the initial training, which may not have been under your control.
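A minimal sketch of the simplest flavour of fine-tuning, assuming a PyTorch-style setup: freeze what the big pretrained model already knows and train only a small new piece on your own data. The “pretrained” model and the training data here are invented stand-ins (in reality the model would be loaded from disk or a model hub), and fine-tuning a real LLM usually updates far more than one new layer.

```python
import torch
import torch.nn as nn

# Stand-in for an expensively pretrained model (in practice, loaded, not built here).
pretrained = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU())

# Fine-tuning, in its simplest form: freeze what the big model already knows...
for param in pretrained.parameters():
    param.requires_grad = False

# ...and bolt on a small new "head" that learns the hyper-specific task,
# e.g. sorting scientific articles into 5 topics.
head = nn.Linear(64, 5)
optimiser = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake domain-specific training data, just to make the loop runnable.
features = torch.randn(32, 300)
labels = torch.randint(0, 5, (32,))

for step in range(100):
    optimiser.zero_grad()
    logits = head(pretrained(features))   # only the head's weights change
    loss = loss_fn(logits, labels)
    loss.backward()
    optimiser.step()
```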
Alignment
At one level, AI “alignment” is a simple question: have we actually trained the AI to do what we want it to do? If we want an AI that can predict which prisoners are likely to reoffend but the AI is using racial profiling as a core part of its decision, we might describe it as “unaligned” with our desires.
Sometimes AI can be unaligned because of bad training data, which embeds within it biases and inaccuracies. If an AI is trained to spot reoffenders based on a dataset of prisoners, for instance, it will never know about those who aren’t sent to prison; if it’s trained to speak English with a dataset that includes all of Twitter, it might start spouting idiosyncratic beliefs about the links between Bill Gates, 5G and Covid vaccines.
Other times, AI can be unaligned because we’ve asked it the wrong question. An LLM is designed to predict what text comes next, but sometimes that isn’t really what we want: sometimes we would rather have “true” answers than “likely” ones. Sometimes we would rather have answers that don’t repeat racial slurs, or threaten the user, or provide instructions to build bombs. But that isn’t the question we asked the AI.
And sometimes alignment is used to mean something more existential. Say you ask an AI to optimise your factory floor to maximise hourly output, and it decides the most important thing to do is ensure no one interrupts production for the next billion years, so it hides in its plans technology that would kill every organic life form on the planet – that would also be an unaligned AI.