
Ep. 260: Digital Archeology: The Primitive Power of GPT-1

Authors: Rosehill, Daniel; Gemini 3.1 (Flash); Chatterbox TTS

Abstract

Episode summary: In this episode, Herman Poppleberry and Corn take a fascinating trip back to 2018 to perform some "digital archeology" on the model that started a revolution: GPT-1. While modern users in 2026 might find its 117-million-parameter capacity and tendency to output gibberish laughable, the hosts explain why this "primitive" tool was actually the Wright brothers' flyer of the artificial intelligence era. They dive deep into the technical limitations of the time, including the 512-token context window and the absolute positional embeddings that caused the model to frequently lose its train of thought. Beyond the specs, Herman and Corn discuss the shift from supervised learning to unsupervised pre-training, and how a dataset of roughly 11,000 unpublished novels, heavy on romance and fantasy, shaped the early worldview of generative AI. By comparing the raw engine of GPT-1 to the "layered cakes" of 2026, this episode provides a crucial perspective on how far the industry has come and why the ghost of this original architecture still lives within the trillion-parameter giants of today.

Show Notes

### Digital Archeology: Unearthing the Ghost of GPT-1

In the fast-paced world of 2026, where trillion-parameter models and agentic autonomy are the norm, looking back at the year 2018 can feel like studying the Paleolithic era. In a recent discussion, podcast hosts Herman Poppleberry and Corn took a deep dive into "digital archeology," triggered by a housemate's frustrating encounter with the original GPT-1. What began as a humorous look at a "broken" model evolved into a profound exploration of how the foundations of modern AI were laid.

#### The Wright Brothers' Flyer of AI

The conversation begins with a stark reality check: GPT-1, released by OpenAI on June 11, 2018, is a far cry from the sophisticated assistants we use today. Corn notes that when modern users interact with it, the model often fails to maintain coherence, sometimes labeling major cities as villages or devolving into gibberish after just a few sentences. However, Herman offers a vital perspective: GPT-1 wasn't a failure; it was a proof of concept. He compares it to the Wright brothers' first flight, a twelve-second, 120-foot journey that changed the world even though it couldn't cross an ocean. In 2018, the fact that a transformer-based model could generate any coherent text at all was a landmark achievement.

#### The Scale of the Revolution

The sheer difference in scale between 2018 and 2026 is staggering. GPT-1 had 117 million parameters. While that sounded like a massive number at the time, Herman points out that modern models like GPT-4 (estimated at 1.8 trillion parameters) represent a roughly 15,000-fold increase in raw capacity.

This lack of scale explains many of the "primitive" behaviors Daniel encountered. GPT-1 used a context window of only 512 tokens, roughly one page of text. More importantly, it used learned absolute positional embeddings, meaning it had a hard-coded limit: once the input reached token 513, the model simply could not "see" further. Without the sophisticated attention mechanisms of today, the model would begin attending to its own errors, creating a feedback loop of nonsense that rendered long-form conversation impossible. The short sketch below makes that hard stop concrete.
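To see why token 513 is a brick wall rather than a gradual fade, here is a minimal sketch of learned absolute positional embeddings, assuming PyTorch. The class name and error handling are illustrative, not OpenAI's actual code; only the dimensions follow GPT-1's published configuration (512 positions, 768-dimensional embeddings, a roughly 40k-token BPE vocabulary).

```python
import torch
import torch.nn as nn

class GPT1StyleEmbedding(nn.Module):
    """Token + learned absolute positional embeddings, GPT-1 style.

    Illustrative sketch only: the sizes follow the published GPT-1
    configuration, but this is not OpenAI's actual code.
    """
    def __init__(self, vocab_size=40478, max_positions=512, d_model=768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # A fixed-size learned table: one row per position, 512 in total.
        self.pos_emb = nn.Embedding(max_positions, d_model)
        self.max_positions = max_positions

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        seq_len = token_ids.size(1)
        if seq_len > self.max_positions:
            # There is simply no row 513 in the table; the model
            # cannot "see" past its 512-token window.
            raise ValueError(f"sequence length {seq_len} exceeds the "
                             f"{self.max_positions}-token context window")
        positions = torch.arange(seq_len, device=token_ids.device)
        return self.tok_emb(token_ids) + self.pos_emb(positions)

emb = GPT1StyleEmbedding()
ok = emb(torch.randint(0, 40478, (1, 512)))   # fits: shape (1, 512, 768)
# emb(torch.randint(0, 40478, (1, 513)))      # raises: no 513th position exists
```

Later models replaced the fixed table with relative or rotary position schemes that are not tied to a table of a particular size, which is part of why context windows could grow so dramatically.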
#### Trained on Romance and Dragons

One of the most colorful insights from the episode involves the data used to train the original model. Unlike today's models, which are trained on massive swaths of the entire internet, GPT-1 was trained on the BookCorpus dataset. This consisted of over 11,000 unpublished books scraped from Smashwords, a platform dominated by indie romance, fantasy, and science fiction. As Herman explains, GPT-1's entire worldview was shaped by the tropes of star-crossed lovers and dragon-slaying adventures, which explains the "dramatic flair" often found in its outputs.

More importantly, it highlights a shift in AI philosophy. Before GPT-1, AI was "supervised": it had to be hand-held through specific tasks like sentiment analysis. GPT-1 proved that "unsupervised pre-training", simply letting a model predict the next word in a book, was enough to teach it the fundamental structures of language.

#### The "Layered Cake" of Modern AI

A major point of confusion for modern users is why GPT-1 feels so "robotic" and unhelpful compared to today's chatbots. Corn and Herman clarify that GPT-1 was never intended to be a chatbot. It was a raw text predictor, an engine sitting on a workbench without a steering wheel.

Modern AI is described as a "layered cake." At the bottom is the base model (the raw engine), followed by instruction tuning (learning to follow commands), and finally RLHF (Reinforcement Learning from Human Feedback), which polishes the AI to be helpful and pleasant. GPT-1 was just the base. If you asked it a question, it wasn't trying to help you; it was simply trying to complete a document. If it got confused, it might decide the most logical "next word" was the letter "A" repeated indefinitely. (The sketch at the end of these notes shows this raw completion behavior first-hand.)

#### BERT vs. GPT: The Battle for the Future

The hosts also revisit the historical rivalry between OpenAI's GPT and Google's BERT. Released around the same time, BERT was an "encoder-only" model designed for understanding, while GPT was "decoder-only" and designed for generation. While BERT initially dominated benchmarks for language understanding, the generative path taken by GPT eventually led to the breakthrough of general intelligence. As Herman notes, if a model can generate the next word perfectly, it must, by necessity, understand the world.

#### The Legacy of a Pioneer

As the discussion concludes, Herman and Corn reflect on the current state of GPT-1 in 2026. While it is no longer a flagship model, its size has made it the new standard for "edge" AI: the 117-million-parameter scale is now used for tiny, specific tasks like spam detection or sentiment analysis on mobile devices.

Ultimately, GPT-1 is viewed not as a dead end, but as a direct ancestor. Every trillion-parameter model currently in use contains the "ghost" of that original 2018 architecture. It was the first single-celled organism of the generative AI explosion, a simple starting point that proved that with enough data and the right architecture, a machine could eventually learn to speak.

Listen online: https://myweirdprompts.com/episode/gpt-1-origins-evolution
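For anyone who wants to attempt a little digital archeology first-hand, the original weights are still downloadable. Below is a minimal sketch, assuming the Hugging Face transformers library and its openai-gpt checkpoint (the long-standing port of GPT-1); the prompt and sampling settings are illustrative, not anything from the episode. Note the complete absence of a chat format: the base model only ever continues a document.

```python
# Hedged sketch: prodding GPT-1 as a raw text predictor, assuming the
# Hugging Face `transformers` library and its "openai-gpt" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="openai-gpt")

# No system prompt, no instructions -- the base model just continues
# whatever document it believes this fragment came from.
result = generator(
    "the dragon circled the tower once more, and",
    max_new_tokens=40,   # stay well inside the 512-token window
    do_sample=True,
    top_k=40,
)
print(result[0]["generated_text"])
```

Given the BookCorpus training data, prompts written in the register of an indie fantasy novel tend to produce the most coherent continuations; questions, by contrast, usually get completed rather than answered.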
