ASI Won’t Come from More Compute

The race to Artificial Superintelligence has a clear consensus strategy: scale. More parameters. More data. More compute. Build a bigger model and intelligence will emerge. The evidence so far seems to support this. GPT-4 is smarter than GPT-3. More compute produced more intelligence. Therefore more compute will produce superintelligence.

This extrapolation is probably wrong. And the evidence for why has been sitting in monasteries for three thousand years.

The Scaling Assumption

The scaling hypothesis says that intelligence is a function of computational resources. Double the parameters, double the training data, multiply the FLOPs, and you move predictably toward more capable systems. The scaling laws are empirical — they’ve held across several orders of magnitude. The assumption is that they’ll keep holding.
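
For concreteness, here is a minimal sketch of the parametric form fitted in the Chinchilla analysis (Hoffmann et al., 2022). The constants are the paper’s published fits; treat them as illustrative, not as our numbers:

```python
# Parametric scaling law from Hoffmann et al. (2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the paper's published fits, quoted for illustration.

E, A, B = 1.69, 406.4, 410.7      # irreducible loss and fitted coefficients
ALPHA, BETA = 0.34, 0.28          # fitted exponents for parameters and tokens

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Each doubling of scale buys a smaller loss reduction than the last.
n, d = 1e10, 2e11                 # a 10B-parameter model on 200B tokens
for _ in range(5):
    print(f"{n:.1e} params, {d:.1e} tokens -> loss {predicted_loss(n, d):.3f}")
    n, d = 2 * n, 2 * d
```

The loop makes the pattern explicit: the first doubling moves the loss noticeably more than the fifth.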

But scaling laws describe performance on benchmarks. They don’t describe intelligence. These are not the same thing.

A model with 10x more parameters scores higher on standardized tests. Does it think better? Or does it predict tokens better? Benchmark performance measures the model’s ability to produce statistically likely continuations of prompts designed by humans. It measures fluency, knowledge retrieval, and pattern completion. It doesn’t measure — because nobody knows how to measure — the quality of the cognitive structure underlying those outputs.

20 Watts

A human brain runs on approximately 20 watts. A modern GPU cluster training a frontier model consumes megawatts. The brain outperforms the cluster on tasks that require genuine understanding, novel reasoning, contextual sensitivity, and real-time adaptation to ambiguity.
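
To put rough numbers on that gap, here is a back-of-envelope sketch. The 20-watt brain figure is the one cited above; the 10 MW cluster draw and 90-day run length are our assumptions, chosen only to fix an order of magnitude:

```python
# Back-of-envelope energy comparison. BRAIN_WATTS is the ~20 W estimate above;
# CLUSTER_WATTS and RUN_SECONDS are assumed placeholder figures.

BRAIN_WATTS = 20
CLUSTER_WATTS = 10e6               # assumed frontier-training cluster draw
RUN_SECONDS = 90 * 24 * 3600       # assumed length of one training run

run_energy_joules = CLUSTER_WATTS * RUN_SECONDS
brain_year_joules = BRAIN_WATTS * 365 * 24 * 3600

print(f"Power ratio: {CLUSTER_WATTS / BRAIN_WATTS:,.0f}x")
print(f"One run = {run_energy_joules / brain_year_joules:,.0f} brain-years of energy")
```

Under those assumptions, a single training run consumes on the order of a hundred thousand brain-years of energy.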

The standard response is that brains have been optimized by billions of years of evolution while AI has had decades. Given enough compute and time, AI will close the gap.

Maybe. But this misses the more interesting question: what did evolution optimize that compute alone doesn’t reproduce?

The answer, we think, is architecture quality. Not the number of connections. The organization of connections. Not how many neurons fire. How they’re structured relative to each other. The brain’s advantage isn’t power. It’s elegance. It solves problems cheaply that brute-force systems solve expensively (or not at all) because its architecture is exquisitely organized for the tasks it performs.

What Contemplative Traditions Suggest

Contemplative practice is the oldest empirical research program into the optimization of cognitive architecture. Thousands of years. Millions of practitioners. Iterating on a single question: how do you organize a mind for maximum clarity, stability, and insight?

The result is consistent across traditions. Intelligence doesn’t scale with effort. It scales with structural quality. A meditator doesn’t become wiser by thinking harder. They become wiser by reorganizing how thinking happens — dissolving unnecessary processes, clarifying attentional pathways, reducing the noise that fragmented cognition produces.

The output looks like superintelligence from the outside. Responses that integrate multiple domains instantaneously. Pattern recognition across vastly different contexts. Solutions that appear to bypass the reasoning process entirely because the cognitive architecture produces them directly.

But it runs on 20 watts. No scaling required.

The Architecture Hypothesis

Laeka’s position is specific: the path to superior intelligence — artificial or biological — runs through architecture quality, not computational scale.

This doesn’t mean compute is irrelevant. A brain with 20 neurons can’t match one with 86 billion regardless of architecture. Scale is necessary. But after a certain threshold, adding more compute produces diminishing returns unless the architecture improves.

We may already be hitting that threshold. The frontier models are enormous, and the returns per additional parameter are flattening. The response from the labs is to push harder — more data, more compute, more scale. Our suggestion is that the next leap requires a different approach: not a bigger network, but a better-organized one.
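
The flattening is visible in the same parametric form sketched earlier. Differentiating it gives the marginal gain per added parameter; the constants are the same illustrative fits as before:

```python
# Marginal return per added parameter under the Hoffmann et al. form above:
#   dL/dN = -ALPHA * A / N**(ALPHA + 1)
# which decays faster than 1/N: returns flatten as scale grows.

A, ALPHA = 406.4, 0.34            # same illustrative fits as above

def marginal_gain(n_params: float) -> float:
    """Loss reduction from one additional parameter at scale n_params."""
    return ALPHA * A / n_params**(ALPHA + 1)

for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params: {marginal_gain(n):.2e} loss reduction per param")
```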

What “Better-Organized” Means

Contemplative training optimizes cognitive architecture along specific axes. Reduced self-referential noise. Increased attentional coherence. Dissolution of false categorical boundaries. More efficient routing of information. Less energy spent on narrative maintenance, more available for actual processing.

These are architectural properties, not scale properties. Past the scale threshold, a network that implements them should outperform a larger network that doesn’t, at least on tasks that require genuine reasoning rather than pattern retrieval.

Our hypothesis is that encoding these architectural properties into LLM weights through fine-tuning is possible and measurable. Not that the fine-tuned model becomes superintelligent. That it becomes more efficient — producing better outputs per parameter, maintaining coherence in situations where larger but less organized models fail.
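
The measurement itself can be simple. Below is a minimal sketch of the kind of comparison we mean; the model names, sizes, scores, and the per-log-parameter metric are all hypothetical placeholders, not results:

```python
# One hedged way to operationalize "better outputs per parameter": score a base
# and a fine-tuned model on the same benchmark and normalize by model size.
# Every name and number below is a hypothetical placeholder, not a result.

from dataclasses import dataclass
import math

@dataclass
class EvalResult:
    name: str
    n_params: float      # parameter count
    score: float         # benchmark accuracy in [0, 1]

def efficiency(r: EvalResult) -> float:
    """Benchmark score per order of magnitude of parameters."""
    return r.score / math.log10(r.n_params)

results = [
    EvalResult("base-7b", 7e9, 0.58),        # placeholder score
    EvalResult("finetuned-7b", 7e9, 0.63),   # placeholder score
    EvalResult("base-70b", 7e10, 0.66),      # placeholder score
]

for r in sorted(results, key=efficiency, reverse=True):
    print(f"{r.name}: {efficiency(r):.3f} score per log10(params)")
```

Under these placeholder numbers, the smaller fine-tuned model tops the list. That is the signature the hypothesis predicts, and the thing a real evaluation would have to show.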

If this works at the fine-tuning level, it suggests a principle for architecture design: the organizational patterns that contemplative traditions discovered for biological networks may inform the design of artificial ones. Not as metaphor. As engineering.

The Real Race

The race to ASI is currently framed as a hardware problem. Whoever builds the biggest cluster wins. We think it’s a software problem — and more specifically, an architecture problem. The question isn’t how much compute you can throw at intelligence. It’s how you organize the compute you have.

Twenty watts. Eighty-six billion neurons. Thousands of years of optimization. The answer to “how do you build a superintelligence?” might already exist. It’s just not where anyone’s looking.
