I’d like to talk about a distinction between two kinds of artificial intelligence (AI): “Strong” AI and “Weak” AI. Strong AI is AI that has attained artificial consciousness: the machine has become sentient and thinks for itself in a way comparable to a human being. Strong AI has been the subject of numerous films and works of fiction since Czech writer Karel Čapek’s 1920 play R.U.R. That play coined the word “robot,” derived from the Czech word “robota,” which means “forced labor” or “drudgery.” Strong AI will be the subject of my next post. I’m going to write about Weak AI here.
Weak AI is artificial intelligence that produces output resembling something a human being would produce, but without, of course, any kind of human consciousness. So you can multiply 6 x 7 in your head and produce an answer (“56,” right?), and so can a computer (but it’d probably say “42”). There’s no question that we’re dealing with an inert object in the case of Weak AI, no matter how much processing power is behind it or how complex the tasks it can perform. It’s just a machine running a program. And the most important thing about AI text generators such as ChatGPT is that the machine is literally incapable of reading.
I’d like to preface my remaining comments by saying that I don’t understand how Large Language Models such as ChatGPT work. I can code in HTML and XML, and I understand some of the basics of computer hardware and software, but I couldn’t write or debug the code for any Large Language Model, so I prefer to say that I don’t understand how the technology works. I think more people writing about ChatGPT need to preface their posts with that caveat. What I will do is summarize what other people with that kind of competence have said about how Large Language Models work, some of which I’m taking from the Digital Humanist listserv, a Google Group dedicated to Digital Humanities, which is the use of computer technology in humanities scholarship. Most of what I’m going to say, however, is widely known. I should also add that what I say below is a very simplified account of technology that has developed in complicated ways, but I believe it still presents an accurate general picture.
Almost all computers work by processing bits, which are basic units of information represented by a 1 or a 0. The 1 and 0 signify on and off positions within the computer’s hardware, which are used to function as true/false or yes/no states. Moving up the chain, a byte is generally made up of 8 bits, and it was originally defined as the number of bits needed to represent a single character of text (say, a letter or number) in a computer. So a single letter is made up of a series of eight 1s and 0s, and words are made up of strings of these 1s and 0s put together. You can see the strings in the “bin” column of an ASCII table: capital A, for example, is 01000001. The word “dog” would be 011001000110111101100111.
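If you want to see that for yourself, here’s a tiny Python sketch (my own toy illustration, not code from any AI system) that converts characters into their 8-bit ASCII patterns:

```python
def to_bits(text):
    """Return the 8-bit ASCII pattern for each character, joined into one string."""
    return "".join(format(ord(ch), "08b") for ch in text)

print(to_bits("A"))    # 01000001
print(to_bits("dog"))  # 011001000110111101100111
```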
Now you can imagine the kind of processing power needed if a computer were to analyze millions of words character by character, especially including every punctuation mark and space. To simplify the process and reduce the processing power needed, computer programs that analyze text break it into smaller units, or tokens (such as whole words, or pieces of words like -ing and -ly endings in English), and assign numerical values to them. Short sequences of these units are called n-grams: “the cat sat on the mat,” for example, contains two-word n-grams such as “the cat,” “cat sat,” “sat on,” “on the,” and “the mat.” Large Language Models are trained on a very large collection of text (say, all of Wikipedia), associating each unit with a numerical value and then calculating the statistical probability of the next word (or token) based on the ones before it. What’s really advanced about ChatGPT and other Large Language Models is that they calculate the statistical probability not just of the next word, but of the next sentence or sentences, based on their training. These are massively powerful programs.
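To make “predicting the next word from the ones before it” concrete, here’s a toy Python sketch of a two-word (bigram) model. It’s my own drastically simplified stand-in for what a real Large Language Model does: it just counts which word follows which in a tiny sample text, instead of learning billions of parameters from a huge one.

```python
from collections import Counter, defaultdict

# A tiny "training" text; a real model is trained on billions of words.
words = "the cat sat on the mat the cat slept on the mat".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    following[prev][nxt] += 1

def next_word_probabilities(word):
    """Turn the raw counts into probabilities for the word that comes next."""
    counts = following[word]
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

print(next_word_probabilities("the"))  # {'cat': 0.5, 'mat': 0.5}
print(next_word_probabilities("cat"))  # {'sat': 0.5, 'slept': 0.5}
```

A real Large Language Model does something far more sophisticated with that basic idea, calculating probabilities over whole sequences rather than single next words, but the nature of the operation is the same: counting and probability, not comprehension.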
Now take a step back and look at the entire process: words, or pieces of words, or small groups of words are turned into numbers; numbers are turned into sequences of 1s and 0s; and the rest is calculation.
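Sketched in Python with a made-up miniature vocabulary (the numbers here are purely illustrative, not the values any real system uses), the whole chain looks something like this:

```python
# A made-up miniature vocabulary: each word gets an arbitrary numeric ID.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

sentence = "the cat sat on the mat".split()

# Step 1: words become numbers.
ids = [vocab[word] for word in sentence]
print(ids)   # [0, 1, 2, 3, 0, 4]

# Step 2: numbers become sequences of 1s and 0s.
bits = [format(i, "08b") for i in ids]
print(bits)  # ['00000000', '00000001', '00000010', '00000011', '00000000', '00000100']

# Step 3: everything from here on is calculation on those numbers.
```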
At what point does any of this resemble reading in the normal human sense, where words are associated with physical things in the world, or with sensations, emotions, or concepts? Or with memory, or some combination of all of these? Actually, nowhere. ChatGPT is a big calculator trained to convert words into numbers, run a statistical calculation to find the most probable numbers coming next, and then spit out numbers that are translated back into text output. At no point does it have any concept of meaning in any human sense of the word. As a powerful iteration of Weak AI, though, it does a great job of resembling a human speaker.
And that is why computers can’t read. There is literally no understanding of the text in a human sense, because words as words, as language, don’t exist for computers.
This post is part two of a three-part series. You can read Parts I and III here as well.