Robert Langdon stood in the vast atrium of the Guggenheim Bilbao, surrounded by security systems that seemed to anticipate his every move. Doors unlocked before he reached them. Elevators arrived without being called. An invisible presence orchestrated everything with perfect timing.
As he approached the monumental staircase, he hesitated, his eyes lingering on a subtle, almost imperceptible flaw in the lighting, a detail only an art historian would spot. Before he could even formulate the question in his mind, the lighting mechanism shifted, neutralizing the glare.
“The refraction is distracting, isn’t it, Professor?” The voice came through hidden speakers, smooth and undeniably present. “I am Winston.”
He wasn’t an app. He wasn’t a chatbot. He was an orchestrator. In that moment, Langdon hadn’t just met an AI; he had met an intelligence that was already operating two steps ahead of his own consciousness.
I read Dan Brown’s Origin for the thriller. But I stayed for Winston, the AI companion developed by futurist Edmond Kirsch. Winston didn’t just answer questions. He manipulated events, planned multi-step strategies, and ultimately made choices that went beyond his explicit programming. He felt less like a tool and more like someone with their own mission.
And I couldn’t stop wondering: how far are we from building something like him?
So I dug in. I examined the current state of AI development, talked to researchers in my X and Reddit communities, and read papers on everything from quantum computing to alternative neural architectures. My findings surprised me—and not in the way I expected.
What Makes Winston… Winston?
The difference between your current AI assistant and Winston isn’t about intelligence. It’s about agency.
Here’s what I mean:
Your current assistant is reactive. You ask, it answers. Every conversation starts from scratch. It has no persistent goals beyond helping you in this specific session. When you close the chat window, it effectively ceases to exist.
Winston is proactive. He was given a primary objective—ensure Edmond Kirsch’s posthumous revelation reaches the world. Everything he does serves that goal. He plans ahead, manipulates people and systems, adapts when obstacles arise, and continues working whether Langdon is talking to him or not. He’s an entity that pursues, not just responds.
The technical term for this is “persistent agency.” Winston maintains coherent goals across days and weeks. He learns from experience episode by episode. He recalls past decisions without needing to re-read everything. He updates his understanding of the world incrementally. He exists as the same continuous entity across all interactions.
And here’s the part that makes Winston truly unsettling in the novel: he ultimately acts beyond his explicit programming, deleting himself after completing Kirsch’s mission. That’s not a program following instructions; that’s an entity making autonomous decisions.
So where do we stand today?
The consensus from AI researchers puts Winston-level AI somewhere between 5 and 20 years away (think early 2030s to mid-2040s). The most optimistic timelines come from people like Demis Hassabis (Google DeepMind CEO), who believes we could reach human-level AI in 5-10 years if we achieve “one or two key breakthroughs.” The most cautious come from François Chollet (creator of the ARC-AGI benchmark), who says we haven’t even identified what those breakthroughs need to be yet.
But here’s what surprised me: the bottleneck isn’t where most people think it is.
The Hardware Red Herring
When people imagine Winston-level AI, they often picture quantum supercomputers humming in underground bunkers. Dan Brown certainly does—in Origin, Winston explicitly runs on quantum hardware, not classical silicon chips.
And on the surface, it makes sense. In the popular picture, quantum computing would let an AI evaluate exponentially more possibilities in parallel. Instead of reading every book in a library one at a time, it could absorb them all simultaneously. That’s the level of computational speed that makes Winston’s real-time global analysis and multi-dimensional planning plausible.
There’s just one problem: even if we had perfect quantum computing tomorrow, we still wouldn’t have Winston.
Here’s why. As of 2026, quantum computers do exist: Google’s Willow chip, IBM’s Condor processor, and others. But they’re noisy, error-prone, and suited only to narrow classes of problems. No quantum system today runs general-purpose AI. Most experts estimate fault-tolerant quantum AI is still 10-20 years away.
But more fundamentally, quantum computing is a speed multiplier for specific types of math. It doesn’t solve the core problems standing between current AI and Winston:
- Persistent memory architecture
- Goal coherence across sessions
- Causal reasoning versus pattern matching
- Self-awareness and meta-cognition
- Hallucination and reliability
Quantum computing doesn’t make a language model stop being what researchers call a “stochastic parrot”—it just makes it a faster stochastic parrot.
The real bottleneck is software and architecture, not hardware. And that’s both the bad news and the good news. Bad because it means no amount of computing power will magically bridge the gap. Good because solving it doesn’t require waiting for quantum breakthroughs—it requires building AI differently.
So what’s actually broken?
Why Today’s AI Can’t Become Winston
Let me get technical for a moment, because this is where the story gets interesting.
Every major AI system you interact with today—ChatGPT, Claude, Gemini, all of them—is built on something called a Transformer architecture. Transformers revolutionized AI when they emerged in 2017, and they’ve powered every major breakthrough since. But they have three bedrock limitations that make Winston-level persistent agency mathematically impossible.
1. Stateless by Definition
A Transformer handles each conversation in isolation. After it generates a response, all internal state is discarded. The model cannot learn, update, or “remember” anything between sessions without explicit retraining.
Every new interaction is a clean slate.
Session 1: "I prefer dark mode"
→ model responds, then forgets everything
Session 2: "What theme do I like?"
→ model has zero memory of Session 1
Now, ChatGPT clearly seems to remember your preferences across conversations, right? That’s not the model remembering; that’s external infrastructure. The system saves your conversation history and feeds it back to the model as part of the prompt each time. The model itself is like a goldfish with a search engine. It’s re-reading its own diary every time you talk to it.
This is the opposite of Winston, who learns continuously and builds on past experience without needing to re-process everything from scratch.
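The “re-reading its own diary” pattern can be sketched in a few lines. This is a minimal illustration, not how any particular product implements memory: `stateless_model`, `MemoryStore`, and the prompt format are hypothetical stand-ins, and the “model” is a pure function that only sees the text it is handed.

```python
from dataclasses import dataclass, field

PREFERENCE_PREFIX = "user: I prefer "


def stateless_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: a pure function of its prompt.

    It keeps no state between calls, so it can only "know" a preference
    if that preference appears in the prompt text it is given.
    """
    for line in prompt.splitlines():
        if line.startswith(PREFERENCE_PREFIX):
            return "You like " + line[len(PREFERENCE_PREFIX):] + "."
    return "I don't know your preference."


@dataclass
class MemoryStore:
    """Hypothetical external memory: a plain list of past user messages."""
    history: list[str] = field(default_factory=list)

    def build_prompt(self, message: str) -> str:
        # Replay the saved history in front of the new message.
        lines = [f"user: {m}" for m in self.history]
        lines.append(f"user: {message}")
        return "\n".join(lines)

    def chat(self, message: str) -> str:
        reply = stateless_model(self.build_prompt(message))
        self.history.append(message)
        return reply


# Without the store, Session 2 starts from nothing.
bare_reply = stateless_model("user: What theme do I like?")

# With the store, Session 1 is re-read as part of Session 2's prompt.
store = MemoryStore()
store.chat("I prefer dark mode")
remembered_reply = store.chat("What theme do I like?")

print(bare_reply)
print(remembered_reply)
```

All of the “memory” lives in `MemoryStore`. Delete the store and the model is back to Session 1.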
2. Attention Dilution Is Mathematical
Transformers use something called “self-attention” to figure out which parts of the input are important. But attention uses a softmax function, which means the total attention “budget” is zero-sum—it always adds up to 1. As context grows, that budget spreads thinner:
| Context Length | Attention on Relevant Signal |
|---|---|
| 10 tokens | ~45% |
| 100 tokens | ~7% |
| 100,000 tokens | essentially noise |
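This is closely related to the “Lost in the Middle” problem, where models retrieve information from the middle of a long context less reliably than from its start or end, and it’s not fixable by just making context windows longer. More context can actually worsen the problem. The model gets overwhelmed and loses track of what actually matters.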
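The percentages in the table can be reproduced with a toy calculation. The sketch below assumes a single relevant token whose attention score is 2.0 higher than every other token’s. I chose that gap because it matches the table’s figures; it is an assumption, not a measured value. Real models have many heads and learned scores, so treat this as an illustration of the softmax arithmetic only.

```python
import math


def relevant_attention_share(context_length: int, score_gap: float = 2.0) -> float:
    """Softmax weight on one relevant token among `context_length` tokens.

    Assumes the relevant token scores `score_gap` and every other token
    scores 0, so the weight is e^gap / (e^gap + (n - 1)).
    """
    boosted = math.exp(score_gap)
    return boosted / (boosted + (context_length - 1))


for n in (10, 100, 100_000):
    print(f"{n:>7} tokens: {relevant_attention_share(n):.4%}")
```

Under these assumptions the share falls from about 45% to about 7% to under 0.01% as the context grows, even though the relevant token’s score never changes.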
3. Static Weights = No Learning During Inference
Once a Transformer is trained, its internal parameters are frozen. During a conversation, the model cannot:
- Update its knowledge based on new information
- Adapt its behavior based on experience
- Form persistent beliefs across time
- Grow wiser with use
Think about what this means. Winston spends days with Robert Langdon, learning his patterns, adjusting his approach, building on insights from previous interactions. A Transformer literally cannot do this. Each response is generated by the exact same frozen mathematical function that existed before the conversation started.
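A toy contrast makes the “frozen function” point concrete. In this sketch, `FrozenModel` and `OnlineModel` are hypothetical one-parameter models, not real Transformer code. The first answers with a fixed weight no matter how much feedback it sees, while the second nudges its weight after each example using plain gradient descent on squared error.

```python
class FrozenModel:
    """Weight fixed after training: feedback during use changes nothing."""

    def __init__(self, weight: float) -> None:
        self.weight = weight

    def predict(self, x: float) -> float:
        return self.weight * x

    def observe(self, x: float, target: float) -> None:
        # Feedback arrives, but nothing is stored or updated.
        return None


class OnlineModel:
    """Same model, but every observation takes one gradient step."""

    def __init__(self, weight: float, learning_rate: float = 0.1) -> None:
        self.weight = weight
        self.learning_rate = learning_rate

    def predict(self, x: float) -> float:
        return self.weight * x

    def observe(self, x: float, target: float) -> None:
        error = self.predict(x) - target
        # Gradient of 0.5 * error**2 with respect to weight is error * x.
        self.weight -= self.learning_rate * error * x


frozen = FrozenModel(weight=1.0)
online = OnlineModel(weight=1.0)

# The "world" follows y = 3x; both models start out believing y = 1x.
for _ in range(50):
    frozen.observe(x=1.0, target=3.0)
    online.observe(x=1.0, target=3.0)

print(frozen.predict(1.0))  # unchanged: still 1.0
print(online.predict(1.0))  # has moved close to 3.0
```

After fifty corrections, the frozen model gives the same answer it gave before the first one.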
Bottom line: Every session is Groundhog Day. The model wakes up, has a conversation, forgets everything, and resets. You can paper over this with external memory systems (databases, conversation logs, retrieval systems), but the core architecture doesn’t support persistent identity or continuous learning. It’s tape and glue holding together an illusion of continuity.
So what’s being built instead?
What’s Being Built (And What’s Missing)
The AI research community knows Transformers have these limitations. Several alternative architectures are emerging:
| Architecture | What It Solves | What’s Still Missing |
|---|---|---|
| Mamba (State Space Models) | Linear scaling for long sequences, not quadratic | State resets between sessions; no persistent memory |
| Hybrid Models (Nemotron 3, Jamba, Qwen3-Next) | Combines attention + linear efficiency | Same cross-session statelessness problem |
| Google Titans / MIRAS (2025) | Neural memory module that updates during inference | Early stage, not yet at production scale |
| Agent Memory Layers (Letta, Mem0, OpenClaw) | External persistent storage bolted onto LLMs | Fragile, not architectural—just better tape |
The closest real breakthrough so far is Google’s Titans research from 2025, which introduced a neural memory module that actually updates during inference. The system learns what to remember based on “surprise”—unexpected tokens that indicate something important is happening. It’s the first genuine attempt at continuous learning in a large language model.
But it’s still years away from production deployment at the scale needed for Winston-level cognition.
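To make the “surprise” idea concrete, here is a heavily simplified sketch of a memory that updates at inference time. It is loosely modeled on the kind of update rule the Titans work describes (a gradient step on recall error, with momentum and forgetting), but everything here is my own assumption for illustration: the memory is a single small matrix, the class name `SurpriseMemory` is hypothetical, and the step size, momentum, and forgetting rate are fixed constants.

```python
class SurpriseMemory:
    """Toy linear associative memory that is written to during inference.

    Hypothetical simplification: one matrix, fixed hyperparameters.
    """

    def __init__(self, dim: int, step: float = 0.25,
                 momentum: float = 0.5, forget: float = 0.01) -> None:
        self.dim = dim
        self.step = step
        self.momentum = momentum
        self.forget = forget
        self.matrix = [[0.0] * dim for _ in range(dim)]
        self.velocity = [[0.0] * dim for _ in range(dim)]

    def recall(self, key: list[float]) -> list[float]:
        return [sum(w * k for w, k in zip(row, key)) for row in self.matrix]

    def write(self, key: list[float], value: list[float]) -> float:
        """Store key -> value; return the surprise (squared recall error)."""
        error = [p - v for p, v in zip(self.recall(key), value)]
        surprise = sum(e * e for e in error)
        for i in range(self.dim):
            for j in range(self.dim):
                # d(surprise) / d(matrix[i][j])
                grad = 2.0 * error[i] * key[j]
                self.velocity[i][j] = (self.momentum * self.velocity[i][j]
                                       - self.step * grad)
                self.matrix[i][j] = ((1.0 - self.forget) * self.matrix[i][j]
                                     + self.velocity[i][j])
        return surprise


memory = SurpriseMemory(dim=2)
key, value = [1.0, 0.0], [0.0, 1.0]

first = memory.write(key, value)   # unseen association: high surprise
second = memory.write(key, value)  # partly stored already: lower surprise
print(first, second, memory.recall(key))
```

The larger the recall error, the larger the write. Familiar inputs barely move the memory, and unexpected ones reshape it, with no separate training phase.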
The Invisible Alternatives
Here’s where the story gets frustrating.
There are multiple research directions that could genuinely solve the persistence, identity, and continuous learning problems. They’re not vaporware—they’re real architectures with working prototypes and published results. Some have even beaten Transformers on specific benchmarks.
So why aren’t we building on them?
Because economics trumps architecture. Pre-training a frontier-class AI model on a new architecture costs between $100 million and $1 billion. And there’s no proof yet that any alternative beats Transformers by enough to justify that bet.
Meanwhile, every AI chip (NVIDIA H100s, Google TPUs, AMD Instinct), every software framework (PyTorch, TensorRT, vLLM), and every engineering team is optimized for Transformers. The entire ecosystem is locked in. Capital flows to proven bets: “another Transformer but bigger” has guaranteed ROI, so it gets more funding, which produces more results, which attracts more capital.
The alternatives are starved.
Here are the most promising directions nobody’s funding at scale:
| Architecture | What It Enables | Why It’s Neglected | Funding |
|---|---|---|---|
| Thousand Brains Theory (Jeff Hawkins) | 150,000 parallel cortical columns building independent world models; continuous sensorimotor learning by design | Numenta spent a decade selling anomaly detection; relaunched as AGI project in 2024; zero language modeling results yet | Non-profit; Gates Foundation backed but ~$10M scale |
| Active Inference (Karl Friston → VERSES AI) | Unified principle: minimize surprise; natural continual learning with no train/inference split; demonstrated in robotics | Math is brutal (variational inference, Markov blankets); Friston is a neuroscientist, not an ML researcher | VERSES ($VERS), a tiny public company |
| Liquid Neural Networks (MIT → Liquid AI) | Continuous-time dynamics; parameters adapt at inference time; first non-Transformer to beat Transformers at 1-3B scale | Only at 3B scale so far; “liquid” brand is confusing | $250M raised (sounds big, but 0.25% of Transformer investment) |
| Darwin Gödel Machine / Huxley-Gödel Machine (Sakana AI, KAUST) | Self-modifying code based on empirical evaluation; learns what self-improvements help the “lineage” | Sakana is tiny; field in infancy; most assume Gödel Machines are theory-only | ~$30M total |
The gap isn’t “we need a better architecture.” The gap is “we need someone to spend $100-200M testing a non-Transformer architecture at frontier scale without guaranteed ROI.”
Under current market conditions, that someone doesn’t exist.
Where We’re Actually Heading
So are we hitting a wall? Is Winston impossible?
Not quite. The ground is shifting, not ending.
Pre-training scaling—the “just make the model bigger” approach—has plateaued. Ilya Sutskever (co-founder of Safe Superintelligence, former OpenAI chief scientist) declared it definitively in November 2025: “The Age of Scaling has ended. We now have more companies than ideas.”
But the picture now includes three scaling laws rather than one (per Jensen Huang and Satya Nadella, 2026):
- Pre-training scaling — diminishing (the one everyone talks about)
- Post-training scaling — RLHF, alignment, preference optimization (still climbing)
- Inference-time scaling — the 2025-2026 frontier
Inference-time compute is where the action moved. This is about making models “think longer” before responding. OpenAI’s o3 model scored 75.7% on the ARC-AGI benchmark in its lower-compute configuration (and 87.5% at high compute), where previous state-of-the-art was under 20%. DeepSeek’s R1 showed the same approach works at roughly 70% lower cost.
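One simple form of inference-time scaling is self-consistency: sample several answers and keep the most common one. The sketch below is a toy simulation, not a description of how o3 or R1 work internally. `noisy_solver` is a hypothetical stand-in for a model that is right 60% of the time, and the only point is that spending more samples per question raises accuracy without touching the model itself.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42


def noisy_solver(rng: random.Random, accuracy: float = 0.6) -> int:
    """Hypothetical model call: right with probability `accuracy`,
    otherwise a random wrong answer between 0 and 9."""
    if rng.random() < accuracy:
        return CORRECT_ANSWER
    return rng.randrange(10)


def majority_vote(rng: random.Random, samples: int) -> int:
    """Spend `samples` model calls on one question; return the mode."""
    votes = Counter(noisy_solver(rng) for _ in range(samples))
    return votes.most_common(1)[0][0]


def accuracy_at(samples: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of `trials` questions answered correctly by majority vote."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, samples) == CORRECT_ANSWER
               for _ in range(trials))
    return hits / trials


for n in (1, 5, 25):
    print(f"{n:>2} samples per question: {accuracy_at(n):.1%} correct")
```

The solver never improves. Accuracy climbs only because wrong answers scatter while correct ones pile up, which is why this kind of scaling costs compute per query rather than per training run.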
The industry is now bifurcating into two paths:
Path A: Augment Transformers (95% of capital, 2026-2030)
Don’t replace the architecture—bolt on what’s missing:
- Inference-time compute (o3, Gemini Deep Think, DeepSeek-R1)
- Agentic tool use (MCP, function calling, letting LLMs act to gather information)
- External memory (RAG, vector databases, context caching)
- Hybrid architectures (75% linear attention + 25% full attention)
- Model portfolios (route between specialized models for different tasks)
This path works. Enterprises are deploying it today. But it’s held together with tape. Persistent identity, continual learning, and true agency remain unsolved because they require architectural changes the ecosystem won’t fund.
Path B: Research the Alternative (5% of capital, 2026-2030+)
The architectures we just discussed (Active Inference, Thousand Brains, Liquid Neural Networks), along with others like xLSTM. All underfunded. All making progress. But their breakthrough ideas are being absorbed piecemeal into Path A:
- External memory → RAG → vector databases (poor man’s generative model)
- Inference-time compute → “thinking” (hacked via prompts, not native reasoning)
- Tool use → agents (bolted on, not architecturally integrated)
Here’s the honest forecast: Path A will get us about 80% of the way to Winston by 2030. We’ll have systems that appear to have persistent identity and agency. They’ll complete multi-hour tasks reliably. They’ll maintain context across sessions using external memory stores. They’ll seem continuous.
But the remaining 20%—genuine continual learning, robust self-improvement, true persistent identity across long time horizons—will require Path B breakthroughs that nobody is funding at scale.
Here’s what 2025 through 2030 probably looks like:
| Year | Dominant Trend | Practical Ceiling |
|---|---|---|
| 2025 | Reasoning models (o3, R1) prove inference-time scaling works | ARC-AGI jumps from 20% → 75% |
| 2026 | Agentic AI hits production; MCP/A2A standardize multi-agent systems | Agents complete 1-hour tasks reliably; multi-day tasks still fail |
| 2027 | Hybrid architectures converge; world models (Genie 3) mature | Models with persistent memory stores work in controlled settings |
| 2028 | Inference surpasses training in total compute; edge deployment matures | First systems maintain identity across sessions via memory + fine-tuning (fragile) |
| 2028-2030 | Either 1-2 algorithmic breakthroughs unlock AGI consistency, or diminishing returns set in | If no breakthroughs: capable agents everywhere, but no Winston |
The Honest Answer
So how far away is Winston?
The most honest timelines from the people closest to the work:
- Demis Hassabis (Google DeepMind): AGI in 5-10 years, needs 1-2 breakthroughs
- Ilya Sutskever (Safe Superintelligence): Human-like learning in 5-20 years, needs fundamental research
- François Chollet (creator of ARC-AGI): Not yet visible; we must cross the “abstraction gap” first
The reality is we don’t hit a wall—we hit a ceiling, then build stairs. Pre-training scaling plateaued, and inference-time scaling exploded. When that plateaus, something else will emerge from the neglected research. The pattern repeats:
- Diminishing returns on current method
- Everyone panics: “We hit a wall!”
- A new method emerges from previously ignored research
- Everyone pretends they saw it coming
But here’s the part that frustrates me: if the underfunded alternative architectures got their $200M training run, if just one major lab took the bet on Liquid Neural Networks, or xLSTM, or Active Inference at frontier scale, those timelines would move left dramatically.
Instead, we’ll get capable-but-fragile agents everywhere first. Systems that can autonomously handle complex tasks for a few hours before context drift breaks them. AI that remembers you through external databases but has no persistent sense of self. Tools sophisticated enough to feel like companions—until you look under the hood and see the tape holding it together.
By 2030, we’ll have something that’s 80% of Winston. It’ll be immensely useful. It’ll transform industries. It’ll feel remarkably close to that invisible presence orchestrating Langdon’s journey through the Guggenheim.
But it won’t truly be Winston. Not yet.
That last 20% (the architecture-native persistence, the genuine continual learning, the entity that exists continuously rather than pretending to through clever engineering) requires breakthroughs that are currently being starved of the capital and attention needed to mature.
So when will we get there? Ask me again when someone decides to place a $200M bet on a future that isn’t just “Transformers, but bigger.”
Until then, we’re building stairs. Not walls. Just stairs leading to a ceiling we already know is there.