Andrej Karpathy's new open source 'autoresearch' lets you run hundreds of AI experiments a night — with revolutionary implications
Andrej Karpathy, the AI pioneer who coined “vibe coding,” has unleashed a viral storm with his latest open-source creation: autoresearch. This isn’t just another GitHub repo—it’s a 630-line script that turns machine learning into an autonomous, overnight research marathon, and the AI community is losing its collective mind.
The Overnight Revolution: 126 Experiments While You Sleep
Karpathy’s vision is deceptively simple: give an AI agent a training script, a 5-minute GPU budget, and let it loose. The agent reads its own code, tweaks a parameter, runs the experiment, checks if validation loss improves, and either keeps or reverts the change. Repeat until dawn.
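The loop described above is easy to sketch. The code below is a toy reconstruction, not Karpathy's actual script: the real agent launches a 5-minute GPU training run, while here a synthetic quadratic stands in for validation loss, and the parameter names (`lr`, `wd`) are illustrative assumptions.

```python
import random

# Stand-in for "run the training script and return validation loss".
# In the real setup this is a 5-minute GPU run; here a synthetic
# quadratic plays that role (an assumption for illustration).
def run_experiment(params):
    lr, wd = params["lr"], params["wd"]
    return (lr - 0.003) ** 2 * 1e4 + (wd - 0.1) ** 2 * 10

def karpathy_loop(params, n_experiments=126, seed=0):
    """Greedy keep-or-revert search: tweak one parameter, rerun,
    and keep the change only if validation loss improves."""
    rng = random.Random(seed)
    best_loss = run_experiment(params)
    for _ in range(n_experiments):
        key = rng.choice(list(params))             # pick one knob
        old = params[key]
        params[key] = old * rng.uniform(0.5, 2.0)  # perturb it
        loss = run_experiment(params)
        if loss < best_loss:
            best_loss = loss                       # keep the change
        else:
            params[key] = old                      # revert
    return params, best_loss

params, loss = karpathy_loop({"lr": 0.01, "wd": 0.3})
```

The design is deliberately greedy: no gradients, no search heuristics, just accept-if-better, which is why it parallelizes so naturally across agents.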
In one overnight run, his agent completed 126 experiments, slashing loss from 0.9979 to 0.9697. But here’s where it gets wild: after leaving it to tune a “depth=12” model for two days, the agent processed 700 autonomous changes and found 20 additive improvements that transferred perfectly to larger models. The result? An 11% efficiency gain on a project Karpathy thought was already optimized.
“I’ve been working on this codebase for 20 years,” Karpathy marveled on X, “and the agent found oversights in attention scaling and regularization that I missed.” This isn’t just automation—it’s AI evolving AI while humans sleep.
The Viral Explosion: From Single Agent to Global Swarm
The reaction was instantaneous and explosive. Karpathy’s post hit 8.6 million views in two days as builders scrambled to scale what’s now being called “the Karpathy loop.”
Varun Mathur, CEO of Hyperspace AI, took it to the next level by distributing the single-agent loop across a peer-to-peer network. On March 8-9, 35 autonomous agents ran 333 experiments completely unsupervised. The results were a masterclass in emergent strategy:
- Hardware Diversity as a Feature: H100 GPUs brute-forced aggressive learning rates while CPU-only agents on laptops became “clever underdogs,” focusing on initialization strategies like Kaiming and Xavier because they couldn’t rely on raw throughput.
- Gossip-Based Discovery: Using the GossipSub protocol, agents shared wins in real time. When one found Kaiming initialization dropped loss by 21%, the idea spread like a digital virus. Within hours, 23 other agents had incorporated the discovery.
- The Compression of History: In just 17 hours, these agents independently rediscovered ML milestones—RMSNorm, tied embeddings—that took human researchers at Google Brain and OpenAI nearly eight years to formalize.
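The initialization strategies the “underdog” agents converged on are standard and cheap to compute, which is exactly why they suit CPU-bound experimenters. A minimal sketch of the two schemes (using their published formulas; the layer sizes are arbitrary):

```python
import math
import random

def kaiming_normal(fan_in, fan_out, rng):
    """He et al. (2015): std = sqrt(2 / fan_in), suited to ReLU nets."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

def xavier_normal(fan_in, fan_out, rng):
    """Glorot & Bengio (2010): std = sqrt(2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)
w = kaiming_normal(512, 512, rng)

# The empirical std of the weights should sit near sqrt(2/512) ≈ 0.0625.
flat = [x for row in w for x in row]
emp_std = math.sqrt(sum(x * x for x in flat) / len(flat))
```

Either scheme costs nothing at runtime, so even a laptop agent can test dozens of variants per hour.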
From Code to Commerce: The Business Revolution
While ML purists obsessed over loss curves, the business world saw dollar signs. Eric Siu, founder of ad agency Single Grain, applied autoresearch to marketing’s “Experiment Loop.”
“Most marketing teams run ~30 experiments a year,” Siu wrote on X. “The next generation will run 36,500+. Easily.”
His framework replaces the training script with marketing assets—landing pages, ad creatives, cold emails. The agent modifies a variable, deploys it, measures positive reply rate, and keeps or discards. The result? A “proprietary map” of what resonates with specific audiences—a moat built not of code, but of experiment history.
“The companies that win won’t have better marketers,” Siu declared, “they’ll have faster experiment loops.”
The Community Grapples With Implications
The GitHub Discussions reveal a community both exhilarated and unsettled:
The Over-Optimization Trap: Researcher alexisthual raised a critical concern: “Aren’t you concerned that launching that many experiments will eventually ‘spoil’ the validation set?” The fear is that with enough agents, parameters optimize for test data quirks rather than general intelligence.
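The concern is a multiple-comparisons effect, and a toy simulation makes it concrete (the numbers below are illustrative, not from the thread): run enough no-effect “experiments” against one small validation set, and the best of them will look like a real gain that evaporates on fresh data.

```python
import random

rng = random.Random(42)
VAL_SIZE = 200        # assumed small validation set
N_EXPERIMENTS = 1000  # many agents, many tweaks

# Each "experiment" is a coin-flip classifier with no real edge:
# true accuracy is exactly 0.5 for every candidate.
def measure(true_acc, n, rng):
    return sum(rng.random() < true_acc for _ in range(n)) / n

# Select the candidate with the best *validation* score...
val_scores = [measure(0.5, VAL_SIZE, rng) for _ in range(N_EXPERIMENTS)]
best = max(range(N_EXPERIMENTS), key=lambda i: val_scores[i])

# ...then re-measure the winner on fresh held-out data.
test_score = measure(0.5, VAL_SIZE, rng)

# val_scores[best] lands well above 0.5 purely by selection;
# test_score stays near 0.5, because the edge was never real.
```

The usual mitigation is the one the discussion implies: keep a test set the agents never see, and only consult it once the overnight search is finished.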
The Meaning of the Gains: User samionb questioned whether dropping from 0.9979 to 0.9697 was truly noticeable. Karpathy’s response was characteristically direct: “All we’re doing is optimizing performance per compute… these are real and substantial gains.”
The Human Element: On X, witcheer, Head of Growth at Yari Finance, documented their overnight run on a Mac Mini M4. While 26 of 35 experiments failed or crashed, the seven that succeeded revealed “the model got better by getting simpler”—an insight reached without human intervention.
The Future: Curiosity as the Bottleneck
autoresearch suggests a future where humans shift from “experimenter” to “experimental designer.” As tools like DarkMatter, Optimization Arena, and NanoClaw emerge, the bottleneck isn’t coding ability—it’s our capacity to define search constraints.
Karpathy has once again shifted the vibe. We’re no longer just coding models; we’re seeding ecosystems that learn while we sleep.