In what has been hailed as one of the biggest recent landmarks in the development of AI, AlphaGo Zero recently taught itself to play the ancient and notoriously complex strategy game Go - without any human input. Previous approaches had depended on huge databases of previously played games to train the AI. The new approach gave the system only first principles (the rules of the game) and then had it play itself millions of times.
It took AlphaGo Zero just 40 days to surpass 3,000 years of accumulated human knowledge, and, more significantly, it was not constrained by that knowledge. It was able to create knowledge itself from first principles - from a blank slate.
"It starts from a blank slate and figures out only for itself, only from self-play, and without any human knowledge, or any human data, or features, or examples, or intervention from humans. It discovers how to play the game of Go from first principles," says DeepMind's professor David Silver.
Why does this matter?
What’s particularly notable about AlphaGo Zero is the removal of any need for human expertise in the system. This represents a big step forward for reinforcement learning versus supervised learning.
Reinforcement learning is arguably the more elegant approach: the more a machine can teach itself without human guidance or human data, the better. This matters all the more in a world where data is heavily regulated, with new rules such as GDPR on the horizon.
While AlphaGo Zero lives in the rule-based world of games, the approach can be extended to any application governed by well-defined physical rules, from chemistry and biology to traffic management and logistics. These systems would need no outside data, and therefore face no data-ingestion or data-structure problems. What’s more, there are fewer human-derived bottlenecks in the learning process.
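The self-play idea can be sketched in miniature. The toy below is an illustration of the general principle, not DeepMind's method (AlphaGo Zero actually combines a deep neural network with Monte Carlo tree search): a tabular Q-learner that improves at tic-tac-toe given only the rules and a win/lose/draw signal - no human games, features, or heuristics.

```python
import random
from collections import defaultdict

# Toy self-play learner for tic-tac-toe. The agent is given only the
# "first principles": which moves are legal, and when a game is won or
# drawn. It sees no human-played games at any point.

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WINS:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == "."]

Q = defaultdict(float)      # (state, move) -> estimated value
ALPHA, EPSILON = 0.5, 0.1   # learning rate, exploration rate

def choose(board):
    """Epsilon-greedy move selection over the learned value table."""
    moves = legal_moves(board)
    if random.random() < EPSILON:
        return random.choice(moves)
    return max(moves, key=lambda m: Q[(board, m)])

def self_play_episode():
    """Play one game against itself; update Q from the final outcome."""
    board, player = "." * 9, "X"
    history = []                        # (state, move, player) per turn
    while True:
        move = choose(board)
        history.append((board, move, player))
        board = board[:move] + player + board[move + 1:]
        w = winner(board)
        if w or not legal_moves(board):
            # Terminal reward only: +1 for the winner's moves,
            # -1 for the loser's, 0 for a draw.
            for state, m, p in history:
                reward = 0.0 if w is None else (1.0 if p == w else -1.0)
                Q[(state, m)] += ALPHA * (reward - Q[(state, m)])
            return w
        player = "O" if player == "X" else "X"

random.seed(0)
results = [self_play_episode() for _ in range(10000)]
print("draw rate in last 1000 games:",
      results[-1000:].count(None) / 1000)
```

Because optimal tic-tac-toe play ends in a draw, a rising draw rate across episodes is a rough sign that self-play alone is pushing both "sides" of the agent toward stronger play.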