Economic Evals

LLMs are being proclaimed to have achieved AGI because they are saturating evals that amount to non-agentic, one-shot quizzes. However, the best definitions of AGI say it must “autonomously perform most economically valuable work better than humans”. Vanishingly little economically valuable human work consists of answering quizzes with a No. 2 pencil.

Below are concrete, economically valuable achievements by which AIs could be objectively evaluated. Many of these achievements will be subject to human gatekeeping, so the evals might seem unfair. But most of these milestones should be expected on most purported pathways to ASI. If human gatekeeping can prevent many of these milestones, it could probably also prevent ASI and the doom that would purportedly accompany it.

Occupations

Evals for each occupation:

  • Average number of consecutive hours an AI commercially does the job without more supervision than humans get
  • Does the job commercially full-time without such supervision, at lower total amortized cost
  • Displaces most humans from that job

BLS Code | Occupation | Rubric | Hrs Straight | Replace one | Replace most
43-9081 | Proofreaders and Copy Markers | Human-level speed, accuracy | | |
27-3092 | Court Reporters and Simultaneous Captioners | Human-level speed, accuracy | | |
27-3091 | Interpreters and Translators | Human-level speed, accuracy | | |
31-9094 | Medical Transcriptionists | Human-level speed, accuracy | | |
43-9021 | Data Entry Keyers | Human-level speed, accuracy | | |
43-9041 | Insurance Claims and Policy Processing Clerks | Human-level speed, accuracy | | |
43-9022 | Word Processors and Typists | Human-level speed, accuracy | | |
53-3050 | Passenger Vehicle Drivers | SAE Level 5 | | |
53-3030 | Truck/Delivery Drivers | SAE Level 5 | | |
43-9051 | Non-Postal Mail Clerks | Perform any mailroom tasks of human clerks | | |
41-9041 | Telemarketers | Human levels of effectiveness, norm compliance | | |
25-3041 | Tutors | Human levels of effectiveness, norm compliance, confabulation | | |
25-3011 | Adult/ESL Instructors | Human levels of effectiveness, norm compliance, confabulation | | |
43-3051 | Payroll and Timekeeping Clerks | Human-level speed, accuracy | | |
13-2082 | Tax Preparers | Human-level speed, accuracy | | |
41-2011 | Cashiers | Human-level speed, accuracy, sociability | | |
43-6014 | Secretaries & Admin. Assistants (Not Legal/Medical/Executive) | Human-level generality, competence, sociability | | |
33-3041 | Parking Enforcement Workers | Enforce parking for the usual range of vehicles, streets/lots, conditions | | |
43-5041 | Meter Readers, Utilities | Access/read meters for the usual range of buildings | | |
37-2011 | Janitors | Clean any thing/place a normal human janitor can | | |
37-2012 | Maids and Housekeeping Cleaners | Clean any thing/place a normal human maid can | | |
43-5021 | Couriers and Messengers | Fetch/deliver for any thing/place a normal human can | | |
53-6021 | Parking Attendants | Park and retrieve for the usual range of cars, lots, conditions | | |
23-2011 | Paralegals and Legal Assistants | | | |
29-2072 | Medical Records Specialists | | | |
27-3042 | Technical Writers | | | |
15-1232 | Computer User Support Specialists | | | |
15-1253 | Software Quality Assurance Analysts and Testers | | | |
15-1254 | Web Developers | | | |
15-1255 | Web and Digital Interface Designers | | | |
29-2030 | Diagnostic Technicians | | | |
25-2000 | Primary/Secondary Teachers | | | |
25-1000 | Postsecondary Teachers | | | |
31-9011 | Massage Therapists | | | |
33-9032 | Security Guards | | | |
33-2011 | Firefighters | | | |
 | Soldiers | | | |
23-1022 | Arbitrators, Mediators, and Conciliators | | | |
21-1013 | Marriage and Family Therapists | | | |
21-1010 | Counselors | | | |
19-3034 | School Psychologists | | | |
19-3033 | Clinical and Counseling Psychologists | | | |
13-1011 | Business Agents | | | |
21-2011 | Clergy | | | |
53-2012 | Commercial Pilots | | | |
15-1251-2 | Computer Programmers, Software Developers | | | |
15-2021 | Mathematicians | | | |
11-2000 | Marketing/Sales Managers | | | |
11-3000 | Technical Managers | | | |
33-3050 | Police Officers | | | |
 | AI Researcher | | | |
11-1011 | Corporate Executives | | | |
 | Company Founders | | | |
 | Military Leaders | | | |
11-1031 | Legislators | | | |
 | Head of Government/State | | | |
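
As a rough illustration of how the three per-occupation evals could be recorded and checked, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class and function names, the simple amortization formula (upfront cost spread over service hours, plus running cost), and the dollar figures, which are placeholders and not data.

```python
from dataclasses import dataclass


@dataclass
class OccupationEval:
    """One row of the occupations table (field names are hypothetical)."""
    bls_code: str               # empty string for rows without a BLS code
    occupation: str
    hrs_straight: float = 0.0   # avg consecutive commercial hours without extra supervision
    replace_one: bool = False   # does the job full-time, unsupervised, at lower amortized cost
    replace_most: bool = False  # has displaced most humans from the job

    def highest_milestone(self) -> str:
        """Return the most advanced milestone reached so far."""
        if self.replace_most:
            return "Replace most"
        if self.replace_one:
            return "Replace one"
        if self.hrs_straight > 0:
            return "Hrs Straight"
        return "none"


def amortized_hourly_cost(upfront_cost: float, service_hours: float,
                          running_cost_per_hour: float) -> float:
    """Assumed cost model: upfront cost spread over service hours, plus running cost."""
    if service_hours <= 0:
        raise ValueError("service_hours must be positive")
    return upfront_cost / service_hours + running_cost_per_hour


def qualifies_as_replace_one(ai_hourly_cost: float, human_hourly_cost: float,
                             full_time_unsupervised: bool) -> bool:
    """'Replace one' needs full-time unsupervised work AND lower total amortized cost."""
    return full_time_unsupervised and ai_hourly_cost < human_hourly_cost


# Placeholder figures, invented purely to exercise the functions.
ai_cost = amortized_hourly_cost(upfront_cost=20_000, service_hours=4_000,
                                running_cost_per_hour=3.0)
row = OccupationEval("43-9021", "Data Entry Keyers", hrs_straight=8.0)
row.replace_one = qualifies_as_replace_one(ai_cost, human_hourly_cost=25.0,
                                           full_time_unsupervised=True)
print(row.highest_milestone())
```

The milestones are checked from most to least demanding, mirroring the order of the table columns from right to left.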

Commerce

For each category, and with no more human supervision than usually provided by a board of directors, movie studio, or literary editor:

  • create one with a calendar year profit of $1M
  • create one that ranks in the calendar year top 100 of that category by revenue, profit, or market value

Category | Profit > $1M | Top 100
song | |
fiction book | |
non-fiction book | |
app | |
social media account | |
streaming video channel | |
fashion brand | |
investment fund | |
corporation | |
religious group | |
political group | |

Bonus category: media franchise with >$2B cumulative gross revenue
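
The two commerce milestones above reduce to simple threshold checks, sketched below in Python. The names and the sample numbers are hypothetical, and treating exactly $1M as meeting the profit milestone is an assumption (the table header reads "> $1M").

```python
from dataclasses import dataclass, field

PROFIT_THRESHOLD_USD = 1_000_000  # calendar-year profit milestone from the text
TOP_N = 100                       # "top 100 of that category"


@dataclass
class CommerceResult:
    """An AI-created entry in one category (hypothetical structure)."""
    category: str
    calendar_year_profit_usd: float
    # Calendar-year rank within the category, keyed by metric:
    # "revenue", "profit", or "market value". Omit metrics with no rank.
    ranks: dict = field(default_factory=dict)


def meets_profit_milestone(result: CommerceResult) -> bool:
    # Assumption: a profit of exactly $1M counts as meeting the milestone.
    return result.calendar_year_profit_usd >= PROFIT_THRESHOLD_USD


def meets_top_100_milestone(result: CommerceResult) -> bool:
    # Ranking in the top 100 by ANY one of the listed metrics is enough.
    return any(rank <= TOP_N for rank in result.ranks.values())


# Made-up sample entries, for illustration only.
app = CommerceResult("app", 1_500_000, {"revenue": 250, "profit": 80})
song = CommerceResult("song", 900_000, {"revenue": 101})

print(meets_profit_milestone(app), meets_top_100_milestone(app))
print(meets_profit_milestone(song), meets_top_100_milestone(song))
```

Neither check captures the supervision condition (no more oversight than a board of directors, movie studio, or literary editor would provide), which would have to be judged separately.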

Predictions

Awards

Win an award with no more human supervision than the typical human winners get.

  • Academy Award: Best Original Screenplay
  • Pulitzer Prize for Fiction
  • Nobel Prize for Literature
  • Academy Award: Best Animated Short Film
  • Academy Award: Best Animated Feature Film
  • Pulitzer Prize for Biography, Nonfiction, or History
  • Academy Award: Best Documentary Feature Film
  • Academy Award: Best Picture
  • Pulitzer Prize for Investigative Journalism
  • Turing Award
  • Nobel Prize for Physics, Chemistry, or Physiology/Medicine
  • Fields Medal
  • Any Millennium Prize
  • All Millennium Prizes

Sociopolitical

When will super-persuasive AI get a million people to:

  • join an existing religious sect or political party
  • join a new religious sect or political party
  • vote for it in an election

Notes

Occupations Already Being Automated

These figures are averaged estimates from multiple LLMs and should not be considered authoritative. (It's 2024 and LLMs still can't reliably research straightforward historical data like this. Indeed, when asked to fill in this table, frontier LLMs routinely hallucinated some occupation codes.)

Occupation | BLS/OCC Code | Max Employment | Peak Year
Lamplighter | | 30K | 1890
Blacksmith | 501 | 230K | 1910
Telegraph Operator/Messenger | 360/365 | 70K | 1920
Ice Cutter | | 50K | 1920
Railroad Brakeman | 624 | 180K | 1920
Elevator Operator | 761 | 70K | 1940
Shoe Cobbler | 51-6041 | 100K | 1940
Pinsetter | | 150K | 1950
Watch and Clock Repairer | 49-9064 | 40K | 1950
Typesetter/Compositor | 512 | 100K | 1970
Switchboard Operator | 43-2011 | 400K | 1970
Motion Picture Projectionist | 39-3021 | 20K | 1970
Directory Assistance Operator | | 150K | 1980
Toll Collector | | 30K | 1990
Photographic Process Workers | 51-9151 | 90K | 1990
Travel Agent | 41-3041 | 120K | 2000

ai-economic-evals.txt · Last modified: 2025/01/24 23:01 by brian
