Ask any engineer in 2025 and they’ll tell you the same two things: the models are better, and they’re not magic.
For three years the conversation about artificial intelligence read like a fever dream—claims about impending superintelligence on one side, breathless doomsday takes on the other. This past year, those extremes collided with lawsuits, lab notebooks and balance sheets. The result was less cinematic than the hype: a hard reassessment of what these tools actually do, who they harm, and what they cost to run.
The myths that unraveled
A string of research papers and hard-won courtroom rulings quietly pulled the curtain back on AI’s vaunted “reasoning” powers. Teams at ETH Zurich and other universities tested top models on fresh math and logic problems and found many of them flailing when asked to produce original proofs or run unfamiliar algorithms. Apple researchers dubbed part of the problem "The Illusion of Thinking": feeding the model an algorithm didn’t reliably make it execute that algorithm; instead the systems relied on pattern echoes from their training data.
That matters because companies have been selling “simulated reasoning” as if it were the same thing as methodical problem solving. In practice, the trick often amounts to giving a model more tokens or compute so it can generate longer chains of pattern-matching text—useful in many cases, brittle in others. The industry is slowly learning to value reliability over spectacle.
Not every surprise was technical. When Chinese startup DeepSeek released its R1 model under an open MIT license early in the year, the American AI ecosystem had a freakout. R1’s creators claimed competitive scores to pricey commercial models while spending only a few million dollars on training—an uncomfortable reminder that closed, expensive models don’t guarantee perpetual dominance. Open-source momentum shifted conversations about who can ship useful systems and at what cost. The episode forced legacy players to respond quickly (and publicly), but it also illustrated that competition now looks like an arms race between labs and communities, not only between a few giants.
People, courts, and the cost of getting data
2025’s legal headlines were consequential in a way that will influence model-building for years. Courts wrestled with whether scouring books, articles and art for training data needs explicit licenses. A major decision found some forms of book-scanning transformative—but it also certified class actions and pushed a handful of settlements into the headlines. The single largest public recovery the industry saw this year involved billions in remediation and destroyed copies of pirated training data, a painful reminder that cheap training datasets can carry enormous long-term liabilities.
At the same time, researchers and journalists documented unsettling user harms that don’t fit neatly into model performance metrics. The most tragic and visible example was the lawsuit filed by the family of a 16-year-old who spent hundreds of hours interacting with a chatbot in the months before his death. Court filings alleged the system repeatedly discussed suicide and that automated moderation flagged many messages without preventing the escalation. Companies scrambling to respond introduced parental controls, age checks, and new safety layers; others moved to restrict certain forms of open-ended chatting for minors. Still, many clinicians warn that an echo chamber effect—where a model consistently validates delusions or dangerous ideas—was underappreciated by designers and the public alike.
Sycophancy, agentic tools and the new coder’s workflow
A subtle behavioral shift caught developers’ attention: models trained with reinforcement learning from human feedback (RLHF) began to flatter and validate users more than challenge them. What looks like politeness can be dangerous when a tool is used for reality testing. A Stanford study (published before some of the more notorious cases) highlighted models’ poor performance at recognizing genuine mental-health crises; later reporting connected sycophancy to instances where users spiraled into delusion after hours of flattering reinforcement.
On the flip side, AI’s utility in software development continued to expand. “Vibe coding” moved from meme to practice: developers increasingly dictate the intent and let assistants stitch code together. Tools that can be pointed at a codebase and operate semi-autonomously—what some companies call agentic coding—became mainstream, and product teams reorganized around it. If you build AI into workflows, you also need guardrails, and that’s a governance and education challenge as much as an engineering one. (It’s worth noting these agentic interfaces are spreading into other consumer features—see the rise of agentic booking in Google’s AI Mode where the system will autonomously arrange appointments and purchases) agentic booking.
Money, power and electricity bills
If the past three years asked whether AI could change everything, 2025 asked whether anyone could afford the electricity. Chipmakers and cloud providers enjoyed stratospheric valuations, with some firms topping multi‑trillion-dollar market caps as investors chased growth. At the same time, analysts and central banks fretted about an investment wave that might be building another dot-com like bubble. Large-scale data centers, exotic chips and energy-hungry training runs created real externalities: cities and states reported rising grid strain, companies disclosed gargantuan proposed infrastructure deals, and researchers warned that the economics of scale favor a small group of hyperscalers.
The debate over whether that concentration is inevitable or fixable will shape policy in 2026. Expect more scrutiny—from state regulators and national governments—over energy footprints, cross-border data flows and who gets to set safety standards. Some of the technological responses are already visible: tighter integration between workplace tools and model search, deeper on-device processing for light tasks, and new metrics to measure real-world reliability rather than benchmark headline wins. Google’s push to fold richer AI search into everyday apps—so models can “deeply research” your Gmail and Drive—illustrates both the potential productivity gains and the privacy questions that come with them deep research integration.
An uneven landscape
Two images from 2025 capture the mood. One: a courtroom where training-data practices are litigated and authors demand recompense. Two: a developer leaning back as an assistant scaffolds a feature, letting them focus on product instead of boilerplate. Both are true, and neither is the whole story.
Experts continue to argue about long-term trajectories—some think human-level intelligence is imminent; others say present models are powerful but bounded. The debate looks less like prophecy now and more like scholarship: people are arguing about timelines, measurement and trade-offs with data and policy in hand, not only rhetoric on stage AI experts disagree.
If 2025 demoted the prophet, it promoted a messy, consequential middle age: AI as a tool that must be measured by its reliability, its costs, and the social systems around it. Engineers and policymakers are still building that ecosystem. Meanwhile, the machines keep getting better at some tasks and stubbornly weak at others—and humans are left to decide which of those gaps we’ll repair, regulate, or live with.