Deep Learning with Reinforcement Learning

Reinforcement learning (RL) is an area of machine learning in which an autonomous agent learns to perform a task by trial and error, without direct guidance from a human. A system of rewards and penalties drives the agent to solve the problem on its own. Human involvement is limited to altering the environment and fine-tuning the rewards and penalties. Because the agent simply maximizes the reward, it tends to find unanticipated ways of doing so, which means humans must also prevent it from exploiting the reward system and steer it toward performing the task as intended. RL is a useful technique when there is no single known way to perform a task, yet there exist rules the model must follow to perform it correctly.

The key characteristic of RL is the way the agent is trained. Instead of inspecting a fixed dataset, the agent interacts with the environment and attempts to find ways to maximize the reward. This interaction forms a feedback loop between the learning system and its experiences: the agent acts, observes the outcome and the reward, and adjusts its behavior. In the case of “deep” RL (discussed below), a neural network accumulates these experiences and thereby improves the way the task is performed.
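The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, its reward values, and the `train` function are hypothetical and not from the article, and the learning rule used here is tabular Q-learning, chosen because it is small enough to read in full. A deep RL system would replace the table with a neural network.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        # Assumed reward scheme: +1 at the goal, a small penalty per other step.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run the act/observe/update loop and return the learned Q-table."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Act: mostly follow current estimates, occasionally try at random.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Observe: the environment returns the next state and a reward.
            next_state, reward, done = env.step(action)
            # Update: move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q
```

Calling `train()` returns a table in which moving right is valued above moving left in every non-goal cell. Nothing in the code tells the agent which way the goal lies; the preference emerges from the rewards alone.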

Genesis of Reinforcement Learning

A principal topic in machine learning involves sequential decision-making. This is the task of using experience to decide the sequence of actions to perform in an uncertain environment to achieve some goals. Sequential decision-making problems cover a broad range of conceivable applications with the potential to impact many domains.

Inspired by research in behavioral psychology, RL provides a formal framework for this class of problems. The central theme is how an artificial agent may learn by interacting with its environment, in a manner similar to a biological agent. Using the collected experience, it may then optimize objectives expressed as cumulative rewards. In principle, this approach applies to any type of sequential decision-making task that relies on past experience.
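To make "cumulative reward" concrete, here is a minimal sketch of one common way to compute it. The discount factor `gamma` is an assumption on our part (the article does not specify one), and the function name `discounted_return` is hypothetical. With `gamma=1.0` the result is a plain sum; smaller values weight near-term rewards more heavily.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum a sequence of rewards, discounting each step further into the future.

    Computes r_0 + gamma * r_1 + gamma**2 * r_2 + ... by folding from the end.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, `discounted_return([1, 1, 1], gamma=0.5)` evaluates to `1 + 0.5 + 0.25 = 1.75`.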


Deep Reinforcement Learning

Many of the achievements in RL over the past several years stem from combining RL with deep learning techniques to address challenging sequential decision-making problems. This combination, called deep RL, is most useful in problem domains with high-dimensional state spaces. Many in the field consider the extension of deep learning to RL an important technological evolution.

Earlier RL approaches raised difficult design issues around the choice of features. Deep RL, however, has been rather successful in complex tasks with less prior knowledge, thanks to its ability to learn different levels of abstraction from data. For instance, a deep RL agent can successfully learn from visual inputs made up of many thousands of pixels. This presents the potential to mimic some human problem-solving capabilities, even in high-dimensional state spaces, which was difficult to imagine just a few years ago.

Several notable examples of deep RL in game playing stand out for reaching or exceeding human-level expertise. For example, systems from DeepMind (a British AI company owned by Alphabet, Inc.) and other research groups have learned to play:

  • Atari video games
  • Go (defeating some of the toughest players)
  • Poker (beating the world’s top professionals)

Deep RL also has potential for real-world applications such as robotics, autonomous vehicles, healthcare, and finance, to name a few. Nevertheless, several challenges arise in applying deep RL algorithms. For instance, exploring the environment efficiently, or generalizing what counts as good behavior to a slightly different context, is not straightforward to achieve. In response, a large collection of algorithms has been proposed for the deep RL framework, each suited to different attributes of the sequential decision-making problem domain.


Reinforcement Learning Challenges

As popular as RL has become in recent years, there are a number of challenges in using the technique. The main challenge is preparing the simulation environment, which tends to be highly dependent on the specific task to be performed. When using the model for familiar game environments like Atari, Chess, or Go, preparing the simulation environment is relatively simple. On the other hand, when it comes to building a model capable of driving an autonomous vehicle, building an accurate simulator is critical before turning the car loose on public streets. The model has to figure out how to brake and avoid obstacles in a safe environment. Moving out of the training environment and into the real world is where things get problematic.

Scaling and fine-tuning the neural network controlling the agent is another challenge. The only way to communicate with the network is through the system of rewards and penalties. This process may lead to something called catastrophic forgetting, where acquiring new knowledge causes some of the old knowledge to be erased from the network. Yet another challenge is getting stuck in a local optimum, i.e., the agent performs the task, but not necessarily in the optimal or required way. Lastly, some agents optimize for the reward without actually performing the task for which they were designed.
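The local-optimum problem can be illustrated with a deliberately tiny example. The sketch below is not from the article: the two-armed bandit, its payouts in `ARM_REWARDS`, and the `run_bandit` function are all hypothetical, and epsilon-greedy exploration is just one simple remedy among many. A purely greedy agent settles on the first option that pays anything, while an agent that occasionally explores discovers the better one.

```python
import random

# Assumed, fixed payouts for illustration: arm 1 is strictly better than arm 0.
ARM_REWARDS = [0.5, 1.0]


def run_bandit(epsilon, steps=1000, seed=0):
    """Return how many times each arm was pulled under epsilon-greedy selection."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]  # running average reward observed per arm
    counts = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore: pick an arm at random
        else:
            # Exploit: pick the arm with the best estimate (ties go to arm 0).
            arm = 0 if estimates[0] >= estimates[1] else 1
        reward = ARM_REWARDS[arm]
        counts[arm] += 1
        # Incremental update of the running average for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts
```

With `epsilon=0.0` the agent never tries arm 1, so it never learns that a better option exists. With `epsilon=0.1` it spends most of its pulls on arm 1 once exploration has revealed the higher payout.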



Leading AI researchers generally agree that RL is one of the most important developments in recent years and has the potential to transform our world. Further, RL seems to be the most likely way to advance a machine’s “creativity”: in a real sense, the act of seeking new, innovative ways to perform tasks is a form of creativity. There are many examples of this already happening, such as DeepMind’s celebrated AlphaGo, whose play included moves that human experts at first considered anomalies but that in fact sealed a victory against one of the toughest human players, Lee Sedol. In the final analysis, RL has the potential to become a pioneering technology and form the foundation of the next step in artificial intelligence.

Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator, having taught data science, machine learning, and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.