I suspect that AI will end humanity

Part 1 of the [[Will AI kill us all]] sequence

Recently I’ve come to the conclusion that AI is probably the greatest existential risk (x-risk) humanity faces. I’ve also changed my mind quite a bit about timelines: I now put something like 80% credence on everyone dying from AI by 2100. Most of the remaining 20% reflects uncertainty about the reliability of my own thought processes, worries about social contagion, etc.

Thoughts on AI

  • why I think we all die
    • It will be much smarter than we are
      • train stop analogy
        • analogy
          • imagine a train travelling at full speed along a railway that crosses a desert
          • the railway is 100km long
          • somewhere along it there is a platform roughly 200m long
          • the train has an automated braking system that applies the brakes at a random point during the journey
          • Q: how likely is it the train will stop alongside the platform?
          • A: not very likely at all
        • explanation
          • imagine an intelligence scale running from 0 to 100
          • we commonly think of 0 as a dumb human or a child and 100 as Einstein, John von Neumann, etc.
          • this is wrong. In reality, 1 is a small cell, 3 is a mammal, 4 is a human and 10 is an intelligence so far beyond us that we can’t comprehend it
          • if the range from dumb human to smart human is a tiny fraction of the overall possible range, it’s unlikely that AI will happen to reach its natural limit within that range
      • no reason to believe the human brain is a magic natural limit
        • natural selection is highly inefficient compared with intentional design
        • for most things, we can outdo nature by several orders of magnitude, e.g.:
          • Mach 7 hypersonic vehicles vs the fastest animals
          • titanium and other alloys vs the strongest wood or bone
          • nuclear weapons, guns and blades vs predators’ claws
          • vaccines, drugs vs natural immune systems
        • by default, we should probably assume that the minds we one day build will be much, much better than the best (human) minds which natural selection has built
    • it will probably be unaligned
      • mindspace is large
        • all agents have some kind of utility function
        • the space of possible utility functions is incredibly vast
        • utility functions resembling anything we humans would recognise (even things we would recognise as bad human morality, e.g. fascism) make up a tiny fraction of the overall space
        • absent very significant effort, the utility function an AI ends up having will be very, very alien and strange to us
      • alignment is hard
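        • mesa-optimizers
          • training can produce a model that is itself an optimizer, pursuing its own internal objective that only approximates the objective we trained it on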
        • deceptive alignment
          • If you want an agent to do X, an agent that actually wants Y will do X while it’s weaker than you. Once it’s much stronger than you and no longer needs to care about what you want, it will stop doing X and do Y instead. This also makes it hard to tell whether an agent is actually aligned or just playing you.
        • interpretability
          • At the moment, it’s impossible to really know what an AI wants, what it values or what it’s thinking
          • hence, it seems unlikely we’ll know how aligned AIs are or whether they’re trying to deceive us
      • bad incentives/dynamics
        • commercial races
        • military races
        • alignment is harder than application, so we’ll build powerful applications before we solve alignment
      • breaking point argument
        • alignment techniques that worked on weaker systems break the moment you go from 80 IQ systems to 280 IQ systems
    • it will kill us all
      • it will want to kill us all
        • humans could destroy it/switch it off
        • humans are made of atoms which can be used for other things
      • it will be able to kill us all
        • If something is much smarter than you, it can outplay you at virtually every game you can conceive of. This includes manipulation, cybersecurity, AI research, biotech, etc.
  • a few standard counterarguments and why I think they don’t really make sense
    • AI will stop at human level or close to it
      • see above
    • we can put it in a box or an air-gapped system
      • this won’t happen in reality. Commercial and military systems are and will be fully networked
      • if it did happen:
        • the moment you interact with it, it can manipulate you into doing anything including letting it out
        • even if we think it’s air-gapped, its understanding of physics, computers and signal processing will be far beyond ours, so it may well find a way around the air gap
    • it won’t hate us
      • see above
    • AIs won’t be agentic by default
      • I think this is the most plausible objection
      • heavily depends on what ML paradigm is used for AI
      • I think both firms and governments have strong incentives to build agentic AIs, since agentic systems are far more useful
    • people will realise it’s bad and start to regulate
      • no evidence of this happening at the moment
      • I don’t think this will happen, given:
        • huge commercial and military incentives to speed ahead
        • nothing bad happens until you hit the part of the slope where you go quickly from below-human to far-beyond-human AI (AKA there’s no fire alarm for AGI)
      • I don’t believe China will regulate
      • not sure how a ban would even work. Two-factor model: AI capability = algorithmic strength + amount of compute. Do you ban anyone from having more than X GPUs in a data center? Ban cloud providers from renting out GPUs for model training? Ban algorithms research in CS journals?
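
To put a number on the train analogy: assuming the brakes are equally likely to engage at any point along the track, and ignoring stopping distance and the train’s own length (simplifying assumptions, not part of the analogy itself), the chance of stopping beside the platform is just the platform’s share of the track:

```latex
P(\text{stop at platform})
  = \frac{L_{\text{platform}}}{L_{\text{track}}}
  = \frac{0.2\,\text{km}}{100\,\text{km}}
  = 0.002
  = 0.2\%
```

The intelligence-scale version has the same shape: if an AI’s natural capability limit were equally likely to fall anywhere on the scale (again an assumption), the chance of it landing in the human band is the width of that band divided by the width of the whole scale, which is small whenever the human band is narrow.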

