When you play chess or Go, you know you have perfect information. You know everything about the current state of the board and the full history of every move and every piece on it. When you play poker or, let’s say, StarCraft, you have imperfect knowledge of both the current state and the history of the game. Which is closer to life? Imperfect knowledge is, sadly (or blessedly), the truth of human existence for now. Google’s DeepMind is now bringing its most powerful bots into this world of ignorance so it can train a new army of agents far better prepared for the “messiness of the real world.”
Reinforcement learning is a type of machine learning in which bots (or artificial intelligences, or behavioral models) are trained to act based on the current state of affairs in a specific environment, even though they have only limited knowledge of the world they live in. These bots develop a (really, really big) handbook for life that gives them policies to follow when, in the StarCraft II example, you’ve got almost no health, you’re rich in minerals, and there’s a high probability you’re about to come face to face with Arcturus Mengsk.
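To make the idea of a “handbook” concrete, here is a minimal tabular Q-learning sketch in Python. It is not how DeepMind trains its StarCraft II agents (those use deep neural networks over far richer observations), and the states and actions below are made up for illustration, but it shows the core loop: observe a state, consult the handbook, act, and update the handbook from the reward.

```python
import random
from collections import defaultdict

# Illustrative only: a tiny Q-learning "handbook". Each entry maps an observed
# state to a score for every available action; the agent mostly follows the
# highest-scoring action and occasionally explores.

ACTIONS = ["attack", "retreat", "gather", "build"]   # hypothetical action set
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2               # learning rate, discount, exploration

q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state):
    """Mostly follow the handbook, but sometimes try a random action."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state, action, reward, next_state):
    """Nudge the handbook entry toward the observed reward plus future value."""
    best_next = max(q_table[next_state].values())
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])

# One learning step from a (made-up) game situation.
state = ("low_health", "rich_in_minerals", "enemy_nearby")
action = choose_action(state)
update(state, action, reward=-1.0, next_state=("retreating", "rich_in_minerals", "enemy_nearby"))
```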
Bot: ‘How can I become a real human?’ Data scientist: ‘Slow down!’
If you’re a computer game savant you can maybe do about sixty actions per minute of game play, at best. If you’re a computer bot, you can issue actions orders of magnitude faster. But using that kind of brute force to win a strategy game doesn’t make a bot more human; it makes it less human. So DeepMind is working on measuring the humanity of its bots by slowing their moves down to within the typical boundaries of an average player. For a data scientist, this is novel, cool, and interesting. Research scientist Oriol Vinyals, in his blog post describing the new DeepMind project, explains how mouse clicks, camera movements, resource management, and other decisions all have to be carried out methodically, in real time, by a human player. By leveling the playing field so the bot (usually called an “agent” in this sphere) has to move at the same speed, the most successful strategies the agent learns are strategies a real human could actually follow.
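One simple way to picture this “leveling of the playing field” is a rate limiter that caps how many actions an agent may issue per minute. The sketch below is a hypothetical illustration of the idea, not DeepMind’s implementation; the class name, the 180-APM budget, and the sliding-window approach are all assumptions made for the example.

```python
import time
from collections import deque

class ApmThrottle:
    """Cap an agent at a human-like actions-per-minute budget (illustrative)."""

    def __init__(self, max_apm=180):
        self.max_apm = max_apm
        self.timestamps = deque()  # times of actions issued in the last minute

    def allow_action(self):
        now = time.monotonic()
        # Drop actions that have fallen out of the sliding one-minute window.
        while self.timestamps and now - self.timestamps[0] > 60.0:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_apm:
            self.timestamps.append(now)
            return True
        return False  # over budget: the agent must wait, just like a human would

# Usage: ask the throttle before every action the agent wants to issue.
throttle = ApmThrottle(max_apm=180)
if throttle.allow_action():
    pass  # issue the action to the game
```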
DeepMind and Blizzard aim to ultimately give the bots only the information that we get, pixels on a screen, rather than the direct, Neo-like access to the game’s internal data that’s generally used to train agents. To get there, they’ve developed a layer of simplified information on top of the screen pixels so that an agent can pick up the basic visual cues a human gets, such as which type of weaponry is in play and the health of each player. This alone is a fantastic step toward creating bots that actually see the (digital) world the way humans do.
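A rough way to picture these simplified layers is a stack of small 2D maps aligned to the screen, each encoding one human-visible cue. The layer names and resolution below are assumptions chosen for illustration, not the exact interface DeepMind and Blizzard expose.

```python
import numpy as np

SCREEN = 64  # assumed screen resolution, in feature-layer cells

# Each layer is a simple grid aligned to the screen, encoding one cue a human
# could read off the display (names here are hypothetical).
observation = {
    "player_relative": np.zeros((SCREEN, SCREEN), dtype=np.int8),   # 0 empty, 1 friendly, 2 enemy
    "unit_hit_points": np.zeros((SCREEN, SCREEN), dtype=np.int32),  # health of the unit in each cell
    "unit_type":       np.zeros((SCREEN, SCREEN), dtype=np.int32),  # what kind of unit occupies the cell
}

# A learning agent would stack the layers into one tensor and feed it to a network.
stacked = np.stack([layer.astype(np.float32) for layer in observation.values()])
print(stacked.shape)  # (3, 64, 64): channels x height x width
```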
By constraining bots to what humans can see, data scientists will be able to leverage the massive computing power of deep learning to identify millions of micro-strategies that humans have yet to imagine. By studying and releasing these strategies, DeepMind, Google, Blizzard, and the greater research community can learn new approaches that humans can actually take to real-world problems, which are always multi-dimensional and always unfold in real time. Furthermore, by restricting the artificial intelligence to a human timescale, the agents that come out of the collaboration should feel more human to the players who face them. Presumably, these advances could then be carried over to make the other bots in our lives more convincingly human.
DeepMind’s had some interesting results with agents that beat Atari games, but winning a strategy game like StarCraft II while handcuffed to human constraints is a wholly new level of ambition. In the near future, bots might not just be more intelligent than humans. They might be more human too.