
Episode summary: In this episode, Herman and Corn tackle a frustration shared by many power users: why can't our AI assistants stay updated with our evolving tastes in real time? From the limitations of static training data to the "context rot" that plagues current recommendation systems, the duo breaks down the engineering hurdles of building a truly adaptive partner. They explore cutting-edge solutions like Test-Time Training (TTT), self-editing memory architectures like Letta, and the potential for nightly personal fine-tuning using LoRA. Whether you're tired of "amnesiac" LLMs or curious about the next frontier of personalization, this deep dive into the AI feedback loop offers a glimpse into a future where your model grows alongside you.

## Show Notes

In the latest episode of *My Weird Prompts*, hosts Herman and Corn sit down in Jerusalem to tackle one of the most persistent frustrations in modern artificial intelligence: the "amnesia" of large language models (LLMs). The discussion is sparked by a query from their housemate Daniel, an engineer who is tired of the manual labor required to keep AI recommendation systems updated with his evolving personal tastes. Daniel's dilemma serves as a springboard for a deep dive into why AI often feels like a friend who only remembers who you were years ago, and what technical breakthroughs might finally allow models to learn in real time.

### The Static Model Problem

Herman explains that the primary reason AI assistants feel "frozen" is the nature of their training. Most frontier models, such as GPT-4o or Claude 3.5 Sonnet, are trained on massive datasets and then effectively locked. This creates a "knowledge cutoff," where the model's internal weights (the connections that encode its "intelligence") do not change based on new interactions.
While these models can access the internet to find facts, they lack a native way to integrate a user's shifting preferences into their core reasoning without being manually fed that information every single time.

### The Pitfalls of RAG and "Context Rot"

To solve this, many developers currently rely on Retrieval-Augmented Generation (RAG). In a RAG setup, a user's history is stored in a database and injected into the prompt as needed. However, Herman warns of a phenomenon he calls "context rot": as a user's history grows, the prompt becomes cluttered with old, irrelevant data. Even with the massive context windows available in 2026, LLMs often suffer from the "lost in the middle" problem, struggling to prioritize recent feedback over older, potentially obsolete information. The result is higher latency, wasted tokens, and degraded reasoning quality.

### The Barrier of Catastrophic Forgetting

A common question arises: why not just update the model's weights every time a user provides feedback? Herman points out that this leads to "catastrophic forgetting." When a model is forced to learn highly specific new data, such as a niche movie preference, it can inadvertently overwrite the patterns that allow it to perform general tasks like solving math problems or speaking different languages. The model becomes a hyper-specialized but ultimately broken tool, losing the general intelligence that made it useful in the first place.

### New Horizons: Test-Time Training (TTT)

The conversation shifts to one of the most promising areas of AI research: Test-Time Training (TTT). Unlike standard models, where hidden states are static, TTT models treat these states as tiny, adaptable neural networks. When a user provides input, the model performs a small amount of gradient descent during the inference process itself. This allows the model to compress the context of a conversation into temporary weights.
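The core TTT idea, updating a small internal state by gradient descent while the input is being processed, can be sketched with a toy example. Here the "hidden state" is a single weight of a linear model, nudged by one gradient step per input pair; the function name and numbers are illustrative, not from any published TTT architecture:

```python
# Toy illustration of Test-Time Training: the "hidden state" is a tiny
# learnable parameter that is updated by gradient descent *during*
# inference, so later predictions reflect earlier context.
# A sketch only, not a real TTT layer.

def ttt_process(sequence, lr=0.1):
    """Process (x, target) pairs, adapting the hidden weight online."""
    w = 0.0  # hidden state, treated as a learnable parameter
    predictions = []
    for x, target in sequence:
        pred = w * x              # inference with the current hidden state
        predictions.append(pred)
        error = pred - target     # squared-error loss: (w*x - target)^2
        grad = 2 * error * x      # d(loss)/dw
        w -= lr * grad            # one gradient step at test time
    return w, predictions

# The underlying rule in this "context" is target = 2 * x, so the hidden
# weight drifts toward 2 as more of the sequence is processed.
w_final, preds = ttt_process([(1.0, 2.0)] * 20)
print(round(w_final, 3))  # close to 2.0
```

Because the adaptation lives in a fixed-size state rather than an ever-growing prompt, each step costs the same regardless of how long the history is, which is the constant-latency property the hosts highlight.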
Corn likens this to "learning to play the piano in the middle of a concert." While TTT offers a way to handle massive amounts of data with constant latency, the challenge remains in making these updates permanent without triggering the aforementioned catastrophic forgetting.

### The Rise of Self-Editing Memory

For a more immediate solution, Herman points to architectures like Letta (formerly MemGPT). These systems treat the AI as an operating system with tiered memory. The agent can proactively "write" to its own archival memory, essentially keeping a digital diary of user preferences. When a user expresses a dislike for a specific genre or topic, the agent updates its own notes and searches them during future interactions. This creates a self-correcting loop that feels more like a partnership than a static tool.

### The Personal "Adapter" and the Strategic Flywheel

The ultimate vision discussed by Herman and Corn is a "strategic flywheel" built on Parameter-Efficient Fine-Tuning (PEFT), specifically techniques like LoRA (Low-Rank Adaptation). Herman suggests a future where a user's local model undergoes a brief, automated fine-tuning session every night. This process would take the day's feedback and create a "personal adapter" that sits on top of a larger frontier model. By the next morning, the user has a fresh version of their AI that has "digested" the new preferences.

### Conclusion: Bridging the Gap

Herman and Corn conclude that while the "holy grail" of a perfectly evolving AI is still being refined, the tools to build it are becoming increasingly accessible. By combining the massive intelligence of foundation models with local, personalized fine-tuning and self-editing memory, developers can bridge the gap between static code and the fluid nature of human taste.
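For the technically curious, the "personal adapter" idea from the flywheel section can be illustrated with a toy low-rank update in plain Python. The numbers and helper names here are hypothetical; a real LoRA adapter is trained with gradient descent inside the model's layers rather than written by hand:

```python
# Toy illustration of the LoRA "personal adapter" idea: the frozen base
# weights W stay untouched, while a small low-rank pair (A, B) captures
# the personalized update. The effective layer behaves as W + A @ B.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_adapter(W, A, B):
    """Return the effective weights W + A @ B without modifying W."""
    delta = matmul(A, B)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 4x4 frozen base weights (identity, for readability).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Rank-1 adapter: a 4x1 times a 1x4 matrix yields a full 4x4 update
# stored in just 8 numbers instead of 16 -- the savings grow with
# layer size, which is what makes nightly fine-tuning cheap.
A = [[0.5], [0.0], [0.0], [0.0]]
B = [[0.0, 0.2, 0.0, 0.0]]

W_eff = apply_adapter(W, A, B)
print(W_eff[0][1])  # the adapter nudged a single connection by 0.1
```

Because only A and B are swapped in and out, the same frontier base model can carry a different lightweight adapter per user, which is the "fresh version by morning" scenario the hosts describe.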
As we move further into 2026, the transition from AI as a tool to AI as a truly adaptive partner is no longer just a theoretical dream, but an engineering reality.

Listen online: https://myweirdprompts.com/episode/ai-continuous-learning-preferences
