Part 1 of the [[Will AI kill us all]] sequence
Recently I’ve come to the conclusion that AI is probably the greatest X-Risk humanity faces. I’ve also changed my mind quite a bit about timelines, and I now expect with something like 80% credence that everyone dies from AI by 2100. Most of the remaining 20% is uncertainty about the reliability of my own thought processes, worries about social contagion, etc…
Thoughts on AI
- why I think we all die
- It will be much smarter than we are
- train stop analogy
- analogy
- imagine a train going at full speed along a railway that crosses a desert
- the railway is 100km long
- at some point, there is a roughly 200m long platform
- the train has an automated braking system that randomly applies the brakes at some point during the journey
- Q: how likely is it the train will stop alongside the platform?
- A: not very likely at all
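- quick sanity check of that answer (my own toy simulation; the numbers are taken straight from the analogy): with a 200 m platform on a 100 km track and a uniformly random stopping point, the train stops at the platform roughly 0.2% of the time

```python
# Rough sanity check of the train analogy (illustrative only): a 200 m platform
# somewhere on a 100 km track, with the train's stopping point modelled as a
# single point drawn uniformly at random along the track.
import random

TRACK_M = 100_000            # 100 km of track
PLATFORM_START_M = 60_000    # where the platform sits (irrelevant for a uniform stop)
PLATFORM_LEN_M = 200         # 200 m platform

random.seed(0)
trials = 1_000_000
hits = sum(
    PLATFORM_START_M <= random.uniform(0, TRACK_M) <= PLATFORM_START_M + PLATFORM_LEN_M
    for _ in range(trials)
)
print(f"simulated chance of stopping at the platform: {hits / trials:.3%}")
print(f"analytic chance: {PLATFORM_LEN_M / TRACK_M:.3%}")   # 0.200%
```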
- explanation
- imagine a scale of intelligence going from 0 – 100
- commonly we think of 0 as being a dumb human/child and 100 as being Einstein/John von Neumann etc…
- this is incorrect. In reality, 1 is a small cell, 3 is a mammal, 4 is a human, 10 is an intelligence so far beyond us we can’t comprehend it, etc…
- if the range from dumb human to smart human is very small as a % of the overall possible range, it’s unlikely AI will happen to reach its natural limit just inside that range
- no reason to believe human brain is magic natural limit
- natural selection is highly inefficient vs intentionally designed solutions
- for most things, we can outdo nature by several orders of magnitude, e.g.:
- mach 7 hypersonic vehicle vs the fastest animal
- titanium and other alloys vs the strongest tree/bone
- nuclear weapons, guns, blades vs predators claws
- vaccines, drugs vs natural immune systems
- by default, we should probably assume that the minds we one day build will be much, much better than the best (human) minds which natural selection has built
- it will probably be unaligned
- mindspace is large
- all agents have some kind of utility function
- the space of possible utility functions is incredibly vast
- the space of utility functions resembling anything we humans would recognise (even things we would recognise as bad human morality, e.g. fascism) is a tiny % of the overall space
- absent very significant effort, the utility function an AI ends up having will be very, very alien and strange to us
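- to get some intuition for how fast this space blows up, here’s an illustrative count (my own toy formalisation, not a claim about how real utility functions are represented): even if a “utility function” were nothing more than a strict preference ordering over n outcomes, the number of possible orderings is n!, which passes the number of atoms in the observable universe before n reaches 100

```python
# Illustrative only: treat a "utility function" as just a strict preference
# ordering over n distinct outcomes. The number of such orderings is n!,
# which explodes long before n approaches the complexity of the real-world
# states an AI would actually be ranking.
import math

for n in (5, 10, 20, 30, 60):
    print(f"{n:>2} outcomes -> {float(math.factorial(n)):.3e} possible orderings")
# The observable universe has roughly 1e80 atoms;
# 60 outcomes already gives ~8.3e81 orderings.
```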
- alignment is hard
- mesa optimizers
- if you train a system by rewarding it for doing Y, it doesn’t actually learn to do Y. Rather it learns to do X, which is highly correlated with Y in the training environment
- https://www.alignmentforum.org/posts/pL56xPoniLvtMDQ4J/the-inner-alignment-problem
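- a toy illustration of this point (a made-up example of my own, not from the linked post): suppose the goal we reward is “pick the red object”, but in the training environment the red object always happens to be on the left. “Pick whatever is on the left” then earns exactly the same reward, and the difference only shows up once the correlation breaks

```python
# Toy proxy-objective example (assumed setup, not from the linked post).
# Intended goal: pick the red object. In training, red is always on the left,
# so the proxy "pick whatever is on the left" earns identical reward and is
# indistinguishable from the intended goal until deployment.
import random

def episode(p_red_on_left: float) -> tuple:
    """Return (left_colour, right_colour) for one episode."""
    return ("red", "blue") if random.random() < p_red_on_left else ("blue", "red")

def reward(policy, left, right) -> int:
    """1 if the picked object is red (the objective we actually care about)."""
    return 1 if policy(left, right) == "red" else 0

pick_red  = lambda left, right: left if left == "red" else right  # intended goal
pick_left = lambda left, right: left                              # proxy goal

def average_reward(policy, p_red_on_left, n=100_000) -> float:
    return sum(reward(policy, *episode(p_red_on_left)) for _ in range(n)) / n

random.seed(0)
# Training distribution: red is always on the left, so the proxy looks perfect.
print("train  pick_red :", average_reward(pick_red, 1.0))   # ~1.0
print("train  pick_left:", average_reward(pick_left, 1.0))  # ~1.0
# Deployment: the correlation is broken and the proxy's reward collapses.
print("deploy pick_red :", average_reward(pick_red, 0.5))   # ~1.0
print("deploy pick_left:", average_reward(pick_left, 0.5))  # ~0.5
```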
- deceptive alignment
- If you want an agent to do X, it will do X while it’s weaker than you. Once it’s much stronger than you and no longer needs to care about what you want, it will stop doing X and do Y (whatever it actually wants) instead. That also means it’s hard to tell whether an agent is actually aligned or just playing you.
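- a toy way to see why behaviour observed during training tells you so little (a made-up illustration of my own, not any real training setup): a genuinely aligned policy and a deceptive one can behave identically for as long as an overseer is watching

```python
# Toy illustration of deceptive alignment (assumed setup, not a real system):
# two policies choose an action given whether an overseer is watching.
# While watched, their behaviour is identical, so behavioural evidence
# gathered during training cannot separate them.
def aligned_policy(overseer_watching: bool) -> str:
    return "do_X"  # does what we want, watched or not

def deceptive_policy(overseer_watching: bool) -> str:
    # Plays along exactly while it can still be corrected or switched off.
    return "do_X" if overseer_watching else "do_Y"

# "Training": the overseer is always watching, so the two look the same.
print([aligned_policy(True) for _ in range(3)])    # ['do_X', 'do_X', 'do_X']
print([deceptive_policy(True) for _ in range(3)])  # ['do_X', 'do_X', 'do_X']
# "Deployment": oversight no longer binds, and behaviour diverges.
print(aligned_policy(False), deceptive_policy(False))  # do_X do_Y
```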
- interpretability
- At the moment, it’s impossible to really know what an AI wants, what it values or what it’s thinking
- hence, it seems unlikely we’ll know how aligned AIs are or whether they’re trying to deceive us
- bad incentives/dynamics
- commercial races
- military races
- alignment is harder than application, so we do application first
- breaking point argument
- everything that worked before breaks the moment you go from 80 IQ systems to 280 IQ systems
- it will kill us all
- it will want to kill us all
- humans could destroy it/switch it off
- humans are made of atoms which can be used for other things
- it will be able to kill us all
- If something is much smarter than you, it can outplay you at virtually every game you can conceive of. This includes manipulation, cyber-security, AI research, biotech etc…
- a few standard counterarguments and why I think they don’t really make sense
- AI will stop at human level or close to it
- see above
- we can put it in a box/airgapped system
- this won’t happen in reality. Commercial and military systems are and will be fully networked
- if it did happen:
- the moment you interact with it, it can manipulate you into doing anything, including letting it out
- even if we think it’s airgapped, its understanding of physics, computers and signal processing will be far beyond ours, so it may well be able to find a way around the air-gapping
- it won’t hate us
- see above
- AIs won’t be agentic by default
- I think this is the most likely objection
- heavily depends on what ML paradigm is used for AI
- I think there are strong incentives for both firms and governments to make agentic AIs, as agentic systems are far more useful
- people will realise it’s bad and start to regulate
- no evidence of this happening ATM
- don’t think this will happen given
- huge commercial and military incentives to speed ahead
- nothing bad happens until you hit the part of the slope where you go quickly from below-human to far-beyond-human AI (AKA there’s no fire alarm for AGI)
- don’t believe China would go along with it
- not sure how a ban would work? Two-factor model: AI = algo strength + amount of compute. Do you ban anyone from having more than X GPUs in a data center? Ban cloud providers from providing GPUs for model training? Ban algo research in CS journals?
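- rough numbers on the compute factor (my own back-of-the-envelope, using the standard ~6 × parameters × tokens estimate of training FLOPs and rough public GPU figures), mainly to show what any “X GPUs” threshold would actually be regulating:

```python
# Back-of-the-envelope for the "amount of compute" factor (illustrative only;
# the 6 * N * D FLOPs rule of thumb and the GPU figures are rough public numbers).
PARAMS = 175e9            # parameters of a GPT-3-scale model
TOKENS = 300e9            # training tokens
FLOPS_TOTAL = 6 * PARAMS * TOKENS          # ~3.2e23 FLOPs

GPU_PEAK_FLOPS = 312e12   # A100 BF16 peak from the spec sheet
UTILISATION = 0.3         # assumed real-world utilisation
gpu_seconds = FLOPS_TOTAL / (GPU_PEAK_FLOPS * UTILISATION)
gpu_days = gpu_seconds / 86_400
print(f"total training compute: {FLOPS_TOTAL:.2e} FLOPs")
print(f"≈ {gpu_days:,.0f} GPU-days, i.e. ~{gpu_days / 1_000:.0f} days on a 1,000-GPU cluster")
# A cap like "no more than X GPUs in one cluster" therefore mostly stretches
# training time rather than preventing it, unless X is set very low.
```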