When will AI outthink humans?

image

The concept of artificial intelligence has piqued our imaginations and challenged our sense of humanity since its conception over 70 years ago. For decades we have made predictions that intelligent machines will transform society. In some ways, they have. In others, they haven’t.

Instead of considering whether AI will be “transformative”, in this post I consider a specific question: when will AI outthink humans — in terms of volume of thought?

More specifically, I’m framing the concept of “outthinking” along two orthogonal dimensions: the volume of tokens and the intelligence of tokens. I only consider the former in this post.

image

In Part 1 of this post I introduce the concept of using thought-hours to reason about large-scale token volumes. In Part 2, we’ll use that unit to compare human and AI thought-hour volumes.

Of course, this is a mental model with tons of assumptions. It’s bound to be wrong, but it might be useful. To do this analysis right would involve a significant time commitment … I’d love to see someone rip this post to shreds with much better analysis!

Part 1: watt-hours → thought-hours

In an interview with Patrick Collison, Jensen Huang juxtaposed the units of the industrial revolution ($/kWh) to the AI revolution ($/M tokens).

It’s thought-provoking — $/M tokens is indeed useful for pricing AI. But couldn’t we improve on it?

image

“Token” usage is useful as a pricing metric: it’s measurable, fairly predictable, scalable, and correlated with the provider’s costs. But it need not be the only unit we use for measuring LLM volumes.

The “horsepower” of the cognitive revolution

It’s not clear to most people how an LLM “token” correlates to value.

By analogy, in the early days of the steam engine there wasn’t a clear way for potential adopters to intuitively quantify its capabilities against the alternatives (e.g. horses). To do so for his new steam engine, James Watt developed the unit “horsepower”, which he and Matthew Boulton standardized at 33,000 foot-pounds per minute in 1783.

image

By analogy to “horsepower”, AI can be measured in terms of “thought-hours”. Whereas horsepower quantifies physical work relative to horses, a thought-hour quantifies cognitive work relative to humans.

The core idea is that one “thought-hour” should be equivalent to the average number of tokens a person “thinks” in an hour, providing a mapping between LLM usage and human cognitive work. Of course, quantifying human cognitive work is non-trivial. Just like “horsepower”, a “thought-hour” is an imperfect mapping that trades some precision for intuitiveness. But when someone is using LLMs (especially in the context of agents), it’s clearer to say that their session has used three thought-hours of work than “30,000 tokens”. We’ll come back to the idea of “token quality” later.

The analogy between thought-hours and tokens mirrors that of horsepower-hours and watt-hours. In hindsight, horsepower seems like a folksy approximation of mechanical power. But it was useful because it was intuitive and practical at the time.

Estimating tokens per thought-hour

To come up with an estimate for tokens per hour, consider two reference points. Humans read at an average rate of roughly 200 words per minute, which at a words-to-tokens ratio of approximately 3:4 is about 16,000 tokens per hour. Listening is slower: audiobooks default to around 9,000 words per hour (150 words per minute), or about 12,000 tokens per hour at the same ratio. Since sustained cognitive work looks more like the slower rate, let’s take 12,000 tokens per hour as the baseline for one “thought-hour”.

Humans are not 100% productive during their workday. Let’s make the simplifying assumption that humans are 83.33% productive, which gives us a nice round estimate of 10,000 tokens per hour. This is quite arbitrary, but keep in mind that when James Watt standardized horsepower, there were millions of horses he could have measured, and he still settled on a round figure.

So given these assumptions, 1 thought-hour (Th) equates to 10,000 tokens.
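Here is the full chain as a small Python sketch — a minimal back-of-the-envelope under the assumptions above (the audiobook rate, the 3:4 words-to-tokens ratio, and the 83.33% productivity factor):

```python
# Back-of-the-envelope: tokens per thought-hour, using the assumptions above
WORDS_TO_TOKENS = 4 / 3            # ~3:4 words-to-tokens ratio
AUDIOBOOK_WORDS_PER_HOUR = 9_000   # sustained listening rate (~150 wpm)
PRODUCTIVITY = 5 / 6               # ~83.33% of the workday spent "thinking productively"

tokens_per_hour = AUDIOBOOK_WORDS_PER_HOUR * WORDS_TO_TOKENS  # 12,000 tokens/hour
tokens_per_thought_hour = tokens_per_hour * PRODUCTIVITY      # 10,000 tokens per Th

print(f"{tokens_per_hour:,.0f} tokens/hour -> {tokens_per_thought_hour:,.0f} tokens per thought-hour")
```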

image

There’s a notable difference between how many tokens we can read/perceive (input) and how many we can speak/write/create (output) in an hour, and the distinction is meaningful for AI models. But it’s not necessary to bake it into the unit — we can distinguish output thought-hours from input thought-hours where appropriate.

Reasoning about utility-scale AI

As we see more adoption of AI models at near-utility scale, we’ll need some way to reason about the vast volumes of tokens consumed and produced, and thought-hours may be a way to juxtapose those volumes with the human cognitive labor force.

It would be absurd to quantify the net generation of U.S. electricity generators in 2023 as “15.05 quintillion joules”. We have watt-hours for that (one watt-hour is 3,600 joules) and SI prefixes to quantify the orders of magnitude. So U.S. electricity generation in 2023 is roughly “4,180 terawatt-hours (TWh)”, or 4.18 petawatt-hours.
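Just to make the conversion explicit, here it is as a two-line sketch (the 15.05-quintillion-joule figure is the one quoted above):

```python
# Convert U.S. 2023 net electricity generation from joules to watt-hours
JOULES_PER_WATT_HOUR = 3_600
us_generation_joules = 15.05e18                                # ~15.05 quintillion joules
us_generation_twh = us_generation_joules / JOULES_PER_WATT_HOUR / 1e12
print(f"{us_generation_twh:,.0f} TWh")                         # ~4,181 TWh (~4.18 PWh)
```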

Like watt-hours and other standard units, the thought-hour allows us to use physics prefixes to reason about orders of magnitude:

image

Limits of the analogy to electricity

We can take the analogy a bit further, though it’s not perfect. For example, one watt is 1 joule / 1 second. By the same relationship, one “thought” would be about 2.78 tokens (2 7/9) per second. That seems … a bit odd, because thoughts don’t intuitively map to a number of tokens.

image

Part 2: human vs AI thought-hours

Quantifying human thought in the global knowledge labor force

image

Assume knowledge workers work 40 hours a week (about 2,000 hours a year), that 95% of that time is spent thinking, and that there are roughly one billion knowledge workers globally (more on that figure below). Each knowledge worker then produces 1,900 thought-hours (Th) each year, which implies the global knowledge labor force produces 1.9 terathought-hours (TTh) each year.

🧮
The global knowledge labor force produces 1.9 terathought-hours (TTh) each year

Assume the average knowledge worker makes $25k per year. With one billion knowledge workers globally, the knowledge labor force costs $25 trillion each year, or roughly 25% of gross world product (GWP). That implies the average human thought-hour costs $25T / 1.9 TTh, or approximately $13 per Th.

🧮
The average human thought-hour (Th) costs $25T/1.9TTh, or approximately $13 / Th.
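The arithmetic behind both callouts, as a quick Python sketch under the stated assumptions (one billion workers, 2,000 hours a year, 95% thinking time, $25k average pay):

```python
# Human thought-hour supply and cost, under the assumptions above
KNOWLEDGE_WORKERS = 1e9      # ~1 billion knowledge workers (a figure questioned below)
HOURS_PER_YEAR = 2_000       # ~40 hours/week
THINKING_SHARE = 0.95        # 95% of work time spent thinking
AVG_SALARY = 25_000          # $25k per worker per year

th_per_worker = HOURS_PER_YEAR * THINKING_SHARE        # 1,900 Th per worker per year
global_tth = KNOWLEDGE_WORKERS * th_per_worker / 1e12  # 1.9 TTh per year
global_cost = KNOWLEDGE_WORKERS * AVG_SALARY           # $25 trillion per year
cost_per_th = global_cost / (global_tth * 1e12)        # ~$13 per Th

print(f"{global_tth:.1f} TTh/year at ${cost_per_th:.2f} per thought-hour")
```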

What has been the thought-hour capacity of the knowledge labor force over time?

McKinsey estimated that there were 230 million knowledge workers in 2012. And other reports cite one billion knowledge workers globally in 2021, implying a compound annual growth rate (CAGR) of 17%. I’m suspicious of the 1 billion figure, as I can’t find an original source for it, and the growth rate of 17% from such a high base seems nothing short of astounding. This is what the growth rate would look like:

image

If the 17% figure is sustained until 2024 (and the 2012 and 2021 estimates are correct), then nearly half of the global workforce is a knowledge worker. This seems implausible to me. But for the purposes of this exercise, let’s use the one billion figure from 2021, carry it forward to 2024, and assume the size of the knowledge labor force stays static after that, as if it hit a ceiling in 2021. That means human thought-hours plateau at roughly 1.9 TTh per year.
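A quick check of the implied growth rate, using the 2012 and 2021 estimates above (the ~3.5 billion global labor force figure in the comment is my rough assumption, added for context):

```python
# Implied CAGR of the knowledge labor force, 2012 -> 2021
workers_2012, workers_2021 = 230e6, 1e9
years = 2021 - 2012
cagr = (workers_2021 / workers_2012) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")                        # ~17.7%

# If that rate held through 2024, knowledge workers would number ~1.6 billion --
# nearly half of a global labor force of roughly 3.5 billion, which seems implausible.
workers_2024 = workers_2021 * (1 + cagr) ** 3
print(f"Implied 2024 knowledge workforce: {workers_2024/1e9:.1f} billion")
```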

image

Quantifying synthetic thought in the global AI labor force

image

What are synthetic thought-hours?

To understand when AI might outthink humans, we first need the concept of synthetic thought-hours, or Th(s): the cognitive output produced by artificial intelligence systems, measured in the same token-based unit we defined for humans. This lets us compare the intellectual labor performed by machines directly to that of humans.

How fast are synthetic thought-hours?

GPT-4o outputs tokens at a median rate of 67 tokens per second, meaning it can produce 67 * 60 * 60 = 241,200 tokens per hour, or about 24 thought-hours of output every hour. In other words, a single synthetic “thinker” works roughly 24 times faster than a human.

How much do synthetic thought-hours cost?

For simplicity, let’s focus solely on output tokens. GPT-4o costs $15 per million output tokens, which translates to $15 per 100 Th, or $0.15 per synthetic Th. At approximately $13 per Th for humans, synthetic thought-hours are roughly 86 times cheaper than human thought-hours.
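Tying the last two subsections together, here is the speed and cost comparison as a sketch (the 67 tokens/second median and $15 per million output tokens are the figures cited above; the ~$13 human figure comes from the previous section):

```python
# Speed and cost of synthetic thought-hours vs. human thought-hours
TOKENS_PER_TH = 10_000
HUMAN_COST_PER_TH = 13.0          # ~$13/Th for human knowledge work (from above)

output_tps = 67                                      # median output tokens per second
tokens_per_hour = output_tps * 3600                  # 241,200 tokens per hour
speed_multiple = tokens_per_hour / TOKENS_PER_TH     # ~24x a human's hourly token volume

price_per_million = 15.0                             # $ per 1M output tokens
synthetic_cost_per_th = price_per_million / 1e6 * TOKENS_PER_TH   # $0.15 per Th
cost_multiple = HUMAN_COST_PER_TH / synthetic_cost_per_th         # ~86-87x cheaper, depending on rounding

print(f"~{speed_multiple:.0f}x faster, ~{cost_multiple:.0f}x cheaper")
```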

What is the global GPU capacity for synthetic thought-hours today?

This section of the analysis is very rough. There are other analyses of compute trends in machine learning systems whose methodologies could likely be extended to get a clearer picture. For a back-of-the-envelope estimate, I simply look at H100 capacity and tokens per H100, given that (1) successive generations of frontier models often require leading/bleeding-edge chips, (2) H100 data is easier to collect, and (3) the methodology may get us within the correct order of magnitude.

Number of H100s deployed for inference

As a starting point, let’s look at the number of H100s sold in 2023:

image
  • Microsoft Azure: 150,000
  • Meta: 150,000
  • Google Cloud: 50,000
  • Oracle Cloud: 50,000
  • Amazon (AWS): 50,000
  • CoreWeave: 50,000
  • AI Startups: 40,000
  • Applied Digital: 30,000
  • Lambda: 30,000
  • Tesla: 20,000
  • Crusoe: 20,000

Total: 640,000 units

Considerations:

  • Chips are used for more than just LLM inference, such as training LLMs, powering other types of AI/ML systems, or doing simulations.
  • H100s tend to be preferred for training, while other chips (e.g., A100s) are used for inference.
  • Training:Inference ratio may be about 1:1.

Meta has said it expects to have roughly 600,000 H100 equivalents by the end of 2024, so its 2023 orders represent approximately 25% of that capacity.

Based on these data points, I’ll assume there are one million H100s or equivalents deployed for LLMs, with half used for inference (500k). This is, of course, a rough estimate — #back_of_the_envelope

As a gut check: OpenAI reportedly generates on the order of 100 billion tokens a day, which is about 36 trillion tokens a year — roughly 10 MTh per day, or 3.7 GTh per year. That’s only around 4% of our 85 GTh capacity estimate, well below OpenAI’s reported 39% market share; then again, capacity and actual usage are different things, and much of the deployed fleet isn’t running inference flat out. A more direct check: the H100 data shows Microsoft with 150k units — or 30% of our 500k figure — which aligns with Microsoft’s 30% market share in the same research report. These cross-checks lend some rough credibility to the estimate, but it should nevertheless be taken with a salt lick.
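The conversion from a daily token volume to annual thought-hours is mechanical; here it is as a sketch (the 100-billion-tokens-per-day figure is the reported order of magnitude used above, so treat the output as a rough cross-check, not a measurement):

```python
# Gut check: convert a daily token volume into annual synthetic thought-hours
TOKENS_PER_TH = 10_000
GLOBAL_CAPACITY_GTH = 85                     # our rough estimate of annual inference capacity

def daily_tokens_to_annual_gth(tokens_per_day: float) -> float:
    """Annual thought-hours (in GTh) implied by a sustained daily token volume."""
    return tokens_per_day * 365 / TOKENS_PER_TH / 1e9

openai_gth = daily_tokens_to_annual_gth(100e9)   # ~100B tokens/day, reported order of magnitude
print(f"~{openai_gth:.2f} GTh/year, ~{openai_gth / GLOBAL_CAPACITY_GTH:.0%} of estimated capacity")
```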

Token Throughput for AI Inference per H100

To estimate the global capacity for AI inference, we'll use Llama 2 70B as our benchmark. This choice is due to its well-documented performance on Nvidia H100 GPUs, and because it is a rough midpoint for throughput. One caveat: this figure likely overestimates throughput for frontier models.

Our benchmark throughput for Llama 2 70B works out to roughly 3,269 tokens per minute of sustained output per H100 (about 54 tokens per second). Note that we’re not measuring throughput as seen from an inference provider, but on bare metal, because our multiplier is the number of GPUs.

Breaking this down:

  • Tokens per hour: 3,269 tokens/minute * 60 minutes = 196,140 tokens per hour.
  • Thought-hours per hour: 196,140 tokens / 10,000 tokens per Th ≈ 19.6 Th per hour.
  • Thought-hours per year: 19.6 Th/hour * 8,760 hours/year ≈ 170 kTh per year.

Given our estimate of 500,000 H100 units dedicated to inference:

🧮
Total synthetic thought-hours per year: 170 kTh/year * 500,000 = 85 GTh per year.
image
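Putting the throughput and fleet-size assumptions together in one place (the 500,000-GPU figure and the 196,140 tokens-per-GPU-hour benchmark are the estimates above):

```python
# Global synthetic thought-hour capacity, under the assumptions above
TOKENS_PER_TH = 10_000
H100S_FOR_INFERENCE = 500_000        # rough estimate of GPUs dedicated to LLM inference
TOKENS_PER_GPU_HOUR = 196_140        # benchmark throughput used above
HOURS_PER_YEAR = 8_760

th_per_gpu_year = TOKENS_PER_GPU_HOUR * HOURS_PER_YEAR / TOKENS_PER_TH   # ~172 kTh per GPU-year
global_capacity_gth = th_per_gpu_year * H100S_FOR_INFERENCE / 1e9        # ~86 GTh per year

# The post rounds these to ~170 kTh per GPU-year and ~85 GTh per year.
print(f"~{th_per_gpu_year/1e3:.0f} kTh per GPU-year, ~{global_capacity_gth:.0f} GTh per year globally")
```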

What is the CAGR of global GPU capacity for synthetic thought-hours?

This is an important figure, given that chips are a rate-limiting factor for AI today. For a napkin sketch, we’ll assume the CAGR of data center spend on GPUs is equivalent to the CAGR of the volume of GPUs.

Markets and Markets puts the growth of the GPU market at 36%, so we’ll use that figure as our estimate for CAGR in GPU units.

image

Putting it all together

When contemplating the future where AI outthinks humans, we must integrate both human and synthetic cognitive capacities. By quantifying the global knowledge labor force's thought-hours and juxtaposing them with the growing synthetic thought-hours powered by AI, we can begin to forecast trend lines for each.

Comparative Analysis of Thought-Hours

  1. Human Thought-Hours:
    • Global Production: The global knowledge labor force produces approximately 1.9 trillion thought-hours (TTh) annually.
    • Cost Efficiency: Each human thought-hour costs around $13, for a total annual cost of roughly $25 trillion.
  2. Synthetic Thought-Hours:
    • Global Capacity: With an estimated 500,000 H100 units dedicated to inference, the global AI systems produce around 85 gigathought-hours (GTh) annually.
    • Cost Efficiency: Each synthetic thought-hour costs approximately $0.15, drastically undercutting human thought-hour costs by a factor of 86.

Growth Trajectories

Considering the current 36% CAGR for global GPU capacity, synthetic thought-hour production could expand dramatically. Even if the human knowledge labor force stays flat, AI is on a very different trajectory.

Projections for Human vs AI thought-hour capacity

To estimate when AI might outthink humans, we must project forward:

  • Current State: AI currently produces 85 GTh annually, while humans produce 1.9 TTh.
  • Future Growth: If the CAGR of 36% continues, AI’s capacity will double approximately every two years.

If this growth rate were to remain constant, AI's synthetic thought-hours would surpass human thought-hours by volume in about 10 years.
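The crossover arithmetic is simple enough to show directly — a minimal sketch, assuming the 85 GTh and 1.9 TTh figures above and a constant 36% CAGR:

```python
# When does synthetic thought-hour volume pass human thought-hour volume?
import math

HUMAN_TH_PER_YEAR = 1.9e12    # human thought-hours per year, held flat
AI_TH_PER_YEAR = 85e9         # synthetic thought-hours per year today
GPU_CAGR = 0.36               # assumed annual growth in GPU capacity

years_to_crossover = math.log(HUMAN_TH_PER_YEAR / AI_TH_PER_YEAR) / math.log(1 + GPU_CAGR)
print(f"Crossover in ~{years_to_crossover:.0f} years")   # ~10 years
```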

image

This is likely a lower bound on the growth rate of synthetic thought-hours. Critically, it leaves out the observed ~0.4 orders of magnitude (OOM) of improvement in algorithmic efficiency each year. If we factor that in, synthetic thought-hour volume exceeds that of humans in well under half the time (roughly 4 years instead of 10).

image

What other factors might influence this trajectory? In short – many. Some top considerations that come to mind:

  • Chips: When it comes to the volume of synthetic thought-hours in the labor force, the single biggest rate limiting factor (and biggest unknown) is chips. The biggest question on this front is the geopolitical situation around Taiwan and TSMC.
  • Energy: Nearly every conversation about energy these days is about data center compute. I think energy is among the largest bottlenecks we face for training larger systems, but when it comes to inference, the largest bottleneck seems to be chips. My main rationale is that once we get frontier models, we typically find performance optimizations that allow those models (or derivatives thereof) to run on earlier generations of chips that are already deployed in data centers with sufficient energy supply.
  • AI ROI: AI is extremely capital intensive. Success stories like ServiceNow suggest the gold rush is warranted, but those wins will need to be sustained over time to sustain industry momentum and investor sentiment.

In case I haven’t already made it abundantly clear, this is a very rough napkin sketch. I would love to see someone make this analysis more detailed and robust. Let me know if you do!

Implications

If these projections are even directionally correct, what does that mean for society? If you also believe AI models are going to get “smarter” over this same period, the impact could well be societal in scale. If the projections for the volume of thought-hours said otherwise — if they projected only a tiny fraction of human thought-hours — societal-scale impact would be much less likely.

As someone who has worked in data and AI/ML for my entire career, I’m quite allergic to “AI hype”. But my conviction is that AI is like the internet in the 90s … full of hype, but nonetheless set to transform society in the coming years and decades.