Hey tech enthusiasts, futurists, and AI skeptics—buckle up! It's August 2025, and the AI world is buzzing with a provocative question: What if AI doesn't get much better than this? In a recent New Yorker piece, Cal Newport dives into this head-scratcher, arguing that large language models (LLMs) might be hitting a wall, with scaling laws fizzling out like the end of Moore's Law in semiconductors. Remember how we thought throwing more data and GPUs at the problem would unlock god-like intelligence? Turns out, the latest releases, like GPT-5, are more "meh" than mind-blowing, with users calling it "overdue, overhyped, and underwhelming." No quantum leaps here, folks, just incremental tweaks that feel like software updates rather than revolutionary breakthroughs.
But here's the exciting twist: This plateau isn't the end of AI innovation. It's the dawn of a new era! Drawing from patterns in tech history, we're seeing a shift from massive, all-knowing LLMs to nimble, specialized small language models (SLMs). Think of it as evolving from clunky mainframes to sleek personal computers—or from monolithic CPUs to multi-core powerhouses. Let's unpack this tech trend and why it could democratize AI for everyone.
The LLM Plateau: Is the Rate of Improvement Stalling?
We've all marveled at the rapid rise of LLMs. From GPT-3's jaw-dropping creativity to GPT-4's problem-solving prowess, each version felt like a sci-fi upgrade. But as Newport points out, the magic of "scaling laws" (the 2020 OpenAI paper promising endless gains from bigger models and more data) seems to be wearing thin. GPT-5's marginal improvements echo this: It's better, sure, but not the game-changer we expected. Elon Musk's xAI threw 100,000 GPUs at Grok 3, and the results? Solid, but no "all-knowing digital god."
This mirrors Moore's Law hitting physical limits: transistors can't shrink forever without quantum weirdness kicking in. In AI, we're asymptotically approaching a cap where "more is better" stops delivering exponential wins. Instead, the industry is pivoting to "post-training" tricks like reinforcement learning to soup up existing models. But what if the real revolution lies in breaking things down, not bulking them up?
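For the curious, here's a quick back-of-the-envelope sketch of why "just make it bigger" runs out of steam, using the power-law shape from that 2020 scaling-laws work. The constants below are illustrative approximations, not the paper's exact fits; the takeaway is that each similar slice of improvement costs roughly ten times more parameters.

```python
# Back-of-the-envelope look at diminishing returns from parameter scaling.
# Uses the power-law form L(N) ~ (N_c / N)**alpha from the 2020 scaling-laws
# paper; the constants below are illustrative approximations, not exact fits.

ALPHA = 0.076        # how fast loss falls as parameters grow (approximate)
N_C   = 8.8e13       # reference constant (approximate)

def loss(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA

for n in [1e9, 1e10, 1e11, 1e12, 1e13]:
    gain = (loss(n) - loss(n * 10)) / loss(n) * 100
    print(f"{n:.0e} -> {n * 10:.0e} params: loss drops only {gain:.1f}%")

# Every 10x in parameters buys roughly the same ~16% relative improvement,
# so each new "wow" moment costs an order of magnitude more compute.
```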
Enter the Mixture of Experts: AI's Classroom of Specialists
Picture this: A massive LLM as a wise teacher in a classroom full of super-smart students, each an expert in one niche—like math whizzes, physics pros, or even linear algebra legends. The teacher doesn't solve every problem solo; it routes the query to a small squad of top experts, who collaborate and report back. Boom—efficient, targeted genius!
This isn't fantasy; it's the Mixture-of-Experts (MoE) approach powering models like DeepSeek-R1. With a whopping 671 billion parameters but only about 37 billion activated per token, it's like having a giant toolbox but grabbing just the right tools. DeepSeek-R1 picks a "shared expert" (always on) plus the top 8 routed specialists for each token, slashing compute needs while crushing benchmarks in math (97.3% on MATH-500) and coding (96.3rd percentile on Codeforces). It's reminiscent of the old doctor joke: Specialists learn more and more about less and less, until they know everything about nothing, while generalists learn less and less about more and more, until they know nothing about everything. In AI, MoE bridges that gap, hinting at a future where expertise is modular and on-demand.
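For the hands-on crowd, here's a toy sketch of that routing idea, loosely in the spirit of a DeepSeek-style MoE layer: one always-on shared expert plus the top-k highest-scoring routed experts per token. The experts, sizes, and names here are made up for illustration; this isn't DeepSeek's actual code.

```python
import numpy as np

# Toy Mixture-of-Experts routing: one always-active "shared expert" plus the
# top-k highest-scoring routed experts per token. Names, sizes, and the
# experts themselves are invented for illustration.

rng = np.random.default_rng(0)
D_MODEL, N_ROUTED, TOP_K = 64, 16, 8

shared_expert = rng.normal(size=(D_MODEL, D_MODEL))            # always on
routed_experts = rng.normal(size=(N_ROUTED, D_MODEL, D_MODEL)) # specialists
router_weights = rng.normal(size=(D_MODEL, N_ROUTED))          # scores experts

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through the shared expert plus its top-k routed experts."""
    scores = token @ router_weights                 # affinity with each expert
    top_k = np.argsort(scores)[-TOP_K:]             # pick the best k experts
    gates = np.exp(scores[top_k])
    gates /= gates.sum()                            # normalize gate weights
    out = token @ shared_expert                     # shared expert: always on
    for gate, idx in zip(gates, top_k):
        out += gate * (token @ routed_experts[idx]) # only k of 16 experts run
    return out

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (64,) -- same output shape, a fraction of the compute
```

The design win is in that last comment: the model carries a huge library of experts, but any single token only pays for a handful of them.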
Distilling Genius: How LLMs "Teach" Lightning-Fast SLMs
Now the plot thickens: Can we leverage LLMs to train an army of specialized SLMs (SSLMs)? Absolutely, through knowledge distillation! This approach lets a big "teacher" LLM transfer its wisdom to compact "student" SLMs, which learn to mimic the teacher's outputs, features, or data relationships. The result? SLMs that are 40-60% smaller, up to 60% faster, and a fraction of the cost (sometimes just 1% of LLM token prices) while retaining 90-97% of the performance.
Take DistilBERT: It's a distilled version of BERT that is 40% smaller, 60% faster, and hits 97% of BERT's benchmark performance. Or compare Phi-3 to GPT-4o: Phi-3 tokens cost roughly 1% of GPT-4o's, without sacrificing much performance punch. Focus the training on domain-specific data, maybe proprietary company info, and these SLMs become hyper-experts, much like the routed specialists in DeepSeek's MoE approach. Train them on stock market chats, medical records, or legal docs, and voilà: Better-than-LLM results in narrow fields.
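If you're wondering what "teacher trains student" looks like in practice, here's a minimal sketch of response-based distillation, assuming a PyTorch-style setup: the student learns to match the teacher's softened output distribution alongside the usual hard-label loss. The logits are random stand-ins for real model outputs, and the temperature and mixing weight are assumed values, not anyone's published recipe.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of response-based knowledge distillation: the small "student"
# is nudged to match the big "teacher's" softened output distribution, blended
# with the ordinary hard-label loss. Logits here are random stand-ins for real
# teacher/student model outputs.

temperature, alpha = 2.0, 0.5          # softening factor and loss mix (assumed values)
vocab_size, batch = 1000, 4

teacher_logits = torch.randn(batch, vocab_size)   # from the frozen teacher LLM
student_logits = torch.randn(batch, vocab_size, requires_grad=True)
labels = torch.randint(0, vocab_size, (batch,))   # ground-truth next tokens

# Soft targets: KL divergence between temperature-scaled teacher and student.
soft_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

# Hard targets: the usual cross-entropy against the real labels.
hard_loss = F.cross_entropy(student_logits, labels)

loss = alpha * soft_loss + (1 - alpha) * hard_loss
loss.backward()                         # gradients flow only into the student
print(f"distillation loss: {loss.item():.3f}")
```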
This echoes the teacher analogy above: an LLM acts as the overarching router, dispatching problems to a fleet of SLM specialists. Need kidney advice? Route to the "kidney squad" (an infection expert, an MRI expert, transplant pros) who collaborate like a medical team. Aggregate, polish, and present. It's efficient, personalized, and scalable! SLMs may become the specialized workhorses of the AI industry that capture the bulk of the market.
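To make the "kidney squad" concrete, here's a hypothetical orchestration sketch: a router picks the relevant specialists, each drafts an answer, and an aggregator stitches the drafts together. The specialist names and the ask() stub are placeholders for whatever SLM runtime you'd actually use.

```python
# Hypothetical orchestration of specialist SLMs: a router picks the relevant
# experts, each drafts an answer, and an aggregator merges them. The specialist
# names and the ask() stub are placeholders for a real on-device SLM runtime.

SPECIALISTS = {
    "infection": "renal infections and antibiotics",
    "imaging": "MRI and ultrasound interpretation",
    "transplant": "kidney transplant eligibility and aftercare",
}

def ask(model_name: str, query: str) -> str:
    # Stand-in for an actual SLM inference call (e.g., a local quantized model).
    return f"[{model_name} draft answer to: {query}]"

def route(query: str) -> list[str]:
    """Pick which specialists should see the query (toy keyword routing)."""
    keywords = {"infection": ["infection", "fever"], "imaging": ["mri", "scan"],
                "transplant": ["transplant", "donor"]}
    picks = [name for name, words in keywords.items()
             if any(w in query.lower() for w in words)]
    return picks or list(SPECIALISTS)          # fall back to the whole squad

def answer(query: str) -> str:
    drafts = [ask(name, query) for name in route(query)]
    # Aggregate: in practice a generalist model would polish these drafts.
    return "\n".join(drafts)

print(answer("Is an MRI needed before a kidney transplant?"))
```

In a real system the keyword router would itself be a small classifier, and the aggregation step would be another model call, but the shape of the pipeline is the same: route, draft, merge.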
Pattern Matching: AI's Evolution Mirrors Tech History
Tech patterns tend to repeat, and AI's trajectory screams "mainframes to PCs." Back in the day, hulking mainframes were expensive, centralized, and offered as a time-shared resource to anyone with a compute problem to solve. Sound familiar? Then came personal computers: Affordable, democratized power in every home.
LLMs are today's mainframes—massive, pricey, shared resources. But SLMs? They're the PCs (or even smartphones) of AI: Run them on your phone, laptop, or edge device. Distill knowledge from LLMs via response-based, feature-based, or relation-based training, then refine the results with synthetic data, RAG (retrieval-augmented generation), and real-world interactions. It's like CPUs going multi-core: Problems get parallelized across specialized cores (SLMs), boosting speed without the bloat and at MUCH lower cost.
And CPUs aren't the only parallel—think cell phones evolving from bulky bricks to pocket supercomputers. In AI, this could mean "personal agents": SLMs tuned to your life, collaborating with company-wide experts or even LLMs for heavy lifts.
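Before moving on, here's a rough sketch of the RAG piece of that recipe, assuming a small local model and a handful of your own documents: retrieve the closest snippets, stuff them into the prompt, and let the SLM answer. The embed() and generate() stubs stand in for a real embedding model and SLM; this isn't any specific library's API.

```python
# Rough sketch of retrieval-augmented generation (RAG) for an on-device SLM:
# embed your documents, retrieve the closest matches to the question, and feed
# them into the prompt. The embed() and generate() stubs stand in for whatever
# local embedding model and SLM you actually run.

from collections import Counter
import math

DOCS = [
    "Q3 revenue grew 12% driven by the enterprise segment.",
    "The on-call rotation changes every Monday at 9am.",
    "Refunds over $500 require written manager approval.",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real setup would use a small embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate(prompt: str) -> str:
    # Stand-in for a local SLM call.
    return f"[SLM answer grounded in:\n{prompt}]"

def rag_answer(question: str, k: int = 2) -> str:
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
    prompt = "Context:\n" + "\n".join(ranked) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(rag_answer("Who approves refunds above $500?"))
```

The toy code isn't the point; the point is that retrieval plus a small local model keeps your proprietary data on the device instead of shipping it to a giant shared LLM.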
Who Wins? The SLM Leaders Might Surprise You
Ironically, the AI titans like OpenAI might not dominate forever. Companies like Microsoft, Google, and Apple are leaning into SLMs (e.g., the Phi series, Gemma, OpenELM). Imagine the installed base when the iOS and Android operating systems come standard with an SLM. If SLMs become the mass-market darlings, running offline on devices and trained via LLM interactions, the winners could be those mastering distillation and specialization. Small is beautiful…and cheap!
Let's zoom in on the disruption: I actually believe that innovations in AI will keep delivering a continuous series of improvements in general AI; we haven't hit a technical limit. But the real leap forward may be leveraging a collection of post-trained experts packaged as SLMs. What does this disruption mean for the LLM leaders? It could shake up the hierarchy, forcing giants like OpenAI to pivot or risk being outmaneuvered by more agile, cost-effective ecosystems. Apple and Google have the numbers (distribution) thanks to their phone operating systems, while Microsoft can leverage Windows. SLMs distributed on phones and PCs could democratize AI, crush costs, and improve with usage, enabling them to rival big, slow, expensive LLMs.
There is always the risk of upstarts leaping past Apple, Google, and Microsoft, because innovation often thrives in small companies. In AI, however, the big three have shown a willingness to buy companies and even raw talent, paying top dollar. Google recently acqui-hired the Windsurf team for a reported $2.4 billion, essentially for the people rather than the product or customers. This pattern of big players vacuuming up startup talent suggests they'll consolidate power, leverage their distribution, and own AI from the bottom up, while the top-down LLM approach is relegated to teaching the SLMs.
What about those mega AI data centers? If SLMs handle inference on the phone and laptop, do we need endless GPU farms? The hype-driven buildout (tech firms dumped $560 billion into AI capex in 18 months, versus $35 billion in revenue) might cool off. Would this trigger a market drop like the 2000 dot-com bust? Things could get interesting if the AI market evolves as I expect.
The Viral Verdict: Small is the New Big!
So, AI's great leaps forward reduced to smaller steps? Could be, but that's not doom; it's evolution! From DeepSeek-R1's expert squads to distilled SLMs as domain experts, we're heading into an "age of experts" where AI is faster, cheaper, and more personal. Pattern matching tells us: Just as mainframes gave way to PCs, LLMs could train the SLMs that ultimately handle the bulk of the inference market. If specialized inference chips find their way into laptops and phones, the way specialized GPUs have, that could be the sign of LLMs going the way of mainframes…still around but not mainstream. According to the New Yorker piece, the trillion-dollar dreams might shrink to a modest $50-100 billion market. And the AI powerhouses may soon be found in purses and pants pockets.