Until a few years ago, the most prominent AI agent capable of strategy making was OpenAI's multi-agent system, OpenAI Five, which played the complex strategy game Dota 2 at superhuman levels. While the OpenAI Five engineers coded the game's features, rules, and reward functions by hand, the system learned by playing against itself, accumulating the equivalent of 180 years of gameplay per day. It was a remarkable demonstration of collaboration and strategy making by an AI agent.
Recently, the engineers at DeepMind went a step further in the development of a general-purpose algorithm.
Five years in the making, starting with AlphaGo in 2016, DeepMind published a paper in Nature at the end of 2020 describing a computer program that can learn to play games without knowing the rules. This ground-breaking development is likely to prove as fundamental as AlphaGo itself. The algorithm can plan winning strategies in unknown environments, learning simply by doing.
The algorithm, called MuZero, outperformed all prior algorithms across the suite of 57 Atari games while matching the superhuman capabilities of AlphaGo in games like Go, chess, and shogi. Those prior algorithms all relied on knowledge about the dynamics and rules of the environment embedded by their developers.
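MuZero's key idea, as described in the Nature paper, is that it learns three functions instead of being given the rules: a representation function that encodes observations into a latent state, a dynamics function that predicts the next latent state and reward for an action, and a prediction function that estimates policy and value. Planning then happens entirely inside this learned model. The toy sketch below illustrates that structure; the functions here are hand-written stand-ins, not DeepMind's trained neural networks, and the names `h`, `g`, and `f` follow the paper's notation.

```python
# Toy stand-ins for MuZero's three learned functions (illustrative only):
# h: representation  -- observation -> latent state
# g: dynamics        -- (latent state, action) -> (next latent state, reward)
# f: prediction      -- latent state -> (policy, value estimate)
# Because g operates purely on latent states, the agent can plan
# without ever being told the real environment's rules.

def h(observation):
    # Representation: encode the observation as a latent state (toy: a tuple).
    return tuple(observation)

def g(state, action):
    # Dynamics: imagined transition and reward inside the learned model.
    next_state = state + (action,)
    reward = 1.0 if action == 1 else 0.0  # toy rule standing in for a trained net
    return next_state, reward

def f(state):
    # Prediction: policy prior over actions and a value estimate (toy: uniform, 0).
    policy = {0: 0.5, 1: 0.5}
    value = 0.0
    return policy, value

def plan_one_step(observation, actions=(0, 1)):
    """Choose the action with the best predicted reward + value, using only
    the learned model -- never querying the real environment's rules."""
    state = h(observation)
    scores = {}
    for a in actions:
        next_state, reward = g(state, a)
        _, value = f(next_state)
        scores[a] = reward + value
    return max(scores, key=scores.get)

print(plan_one_step([0]))  # the toy dynamics model favors action 1
```

The real system replaces this one-step lookahead with Monte Carlo tree search over many imagined latent-state trajectories, but the division of labor among the three functions is the same.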
Reinforcement Learning …