## Fall 2019

### Project 1 – Building Reinforcement Learning Environment

Due Date: Sunday, September 29, 11:59pm

### 1 Project Overview

The goal of the project is to explore and gain experience building reinforcement learning environments, following the OpenAI Gym standards. The project consists of building deterministic and stochastic environments that are based on Markov decision processes, and applying tabular methods to solve them.

Part 1 [30 points] – Build a deterministic environment

Define a deterministic environment, where P(s′, r | s, a) ∈ {0, 1}. It must have more than one state and more than one action.

Environment ideas:

• Tic-Tac-Toe
• Grid world
• Student’s Life
• Your own ideas are also welcome
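As a starting point, a deterministic grid world can be written as a small class exposing the Gym-style `reset()`/`step()` interface. The sketch below is illustrative, not a required design; the class and constant names (`GridWorldEnv`, `GOAL`, the reward values) are assumptions, and you may subclass `gym.Env` instead.

```python
class GridWorldEnv:
    """Deterministic 4x4 grid world: P(s', r | s, a) is either 0 or 1."""

    SIZE = 4
    # action index -> (row delta, col delta): up, down, left, right
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    GOAL = (3, 3)

    def __init__(self):
        self.n_states = self.SIZE * self.SIZE
        self.n_actions = len(self.ACTIONS)
        self.state = (0, 0)

    def reset(self):
        """Return the agent to the start state and return its index."""
        self.state = (0, 0)
        return self._to_index(self.state)

    def step(self, action):
        """Apply an action; moves off the grid leave the state unchanged."""
        dr, dc = self.ACTIONS[action]
        r = min(max(self.state[0] + dr, 0), self.SIZE - 1)
        c = min(max(self.state[1] + dc, 0), self.SIZE - 1)
        self.state = (r, c)
        done = self.state == self.GOAL
        reward = 1.0 if done else -0.1  # small step penalty, goal bonus
        return self._to_index(self.state), reward, done, {}

    def _to_index(self, s):
        return s[0] * self.SIZE + s[1]
```

Because the environment is deterministic, calling `step` with the same action from the same state always yields the same next state and reward.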

Part 2 [30 points] – Build a stochastic environment

Define a stochastic environment, where ∑_{s′,r} P(s′, r | s, a) = 1. A modified version of the environment defined in Part 1 can be used.
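One common way to make a grid world stochastic is a "slippery" variant: the chosen action is executed with high probability, and a random action is substituted otherwise, so the probabilities over (s′, r) sum to 1. The sketch below is a minimal example under that assumption; the class name and the 0.2 slip probability are illustrative.

```python
import random


class SlipperyGridWorldEnv:
    """Stochastic 4x4 grid world: the intended action is taken with
    probability 0.8; otherwise a uniformly random action is taken."""

    SIZE = 4
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    GOAL = (3, 3)
    SLIP_PROB = 0.2  # chance the chosen action is replaced at random

    def __init__(self, seed=None):
        self.rng = random.Random(seed)  # seed for reproducible experiments
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state[0] * self.SIZE + self.state[1]

    def step(self, action):
        if self.rng.random() < self.SLIP_PROB:
            action = self.rng.randrange(len(self.ACTIONS))  # slip
        dr, dc = self.ACTIONS[action]
        r = min(max(self.state[0] + dr, 0), self.SIZE - 1)
        c = min(max(self.state[1] + dc, 0), self.SIZE - 1)
        self.state = (r, c)
        done = self.state == self.GOAL
        reward = 1.0 if done else -0.1
        return r * self.SIZE + c, reward, done, {}
```

Seeding the internal random generator makes experiment runs reproducible, which helps when comparing algorithms in the report.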

Part 3 [40 points] – Implement tabular method

Apply a tabular method to solve the environments that were built in Part 1 and Part 2.

Tabular methods options:

• Dynamic programming
• Q-learning
• SARSA
• TD(0)
• Monte Carlo
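As one of the listed options, Q-learning maintains a table Q[s][a] and updates it toward the bootstrapped target r + γ·max_a′ Q(s′, a′). The self-contained sketch below runs it on a tiny 5-state chain MDP; the chain environment, hyperparameters, and function names are illustrative stand-ins, not the environment you are asked to build.

```python
import random

N_STATES, N_ACTIONS = 5, 2  # chain of 5 states; actions: 0 = left, 1 = right


def step(state, action):
    """Move along the chain; reaching the last state ends the episode."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: bootstrap from the greedy next-state value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy derived from Q should move right in every non-terminal state, since only the rightmost state yields reward.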

### 2 Deliverables

There are two parts in your submission (unless they are combined into a Jupyter notebook):

### 2.1 Report

The report can be submitted either as a PDF or directly in a Jupyter notebook, but it must follow the report structure of the NIPS template.

In your report, describe the deterministic/stochastic environments that were defined. Show your results after applying an algorithm to solve the deterministic and stochastic problems; this might include plots and your interpretation of the results.

Show your understanding of:

• Differences between the deterministic/stochastic environments
• Example and role of transition-probability matrix
• Main components of the RL environment
• Explanation of the tabular method that was used to solve the problem
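For the transition-probability matrix, a concrete example can make the report clearer. The nested-list layout below is one illustrative convention, assuming `T[a][s][s2]` = P(s2 | s, a); in a deterministic environment each row would be one-hot, while in a stochastic one the probabilities spread across several next states but each row still sums to 1.

```python
# Illustrative 2-state, 2-action transition-probability matrix:
# T[a][s][s2] = P(s2 | s, a), and each row sums to 1.
T = [
    # action 0: stochastic transitions
    [[0.9, 0.1],
     [0.2, 0.8]],
    # action 1: state 1 is absorbing under this action
    [[0.5, 0.5],
     [0.0, 1.0]],
]
```

Checking that every row sums to 1 is a quick sanity test worth including when you build your own matrices.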

### 2.2 Code

The code of your implementations. Python is the only accepted language for this project. You can submit multiple files, but they all need clear names. All Python code files should be packed in a ZIP file named YOUR_UBID_project1.zip. After extracting the ZIP file and executing the command `python main.py` in the first-level directory, it should generate all the results and plots you used in your report and print them out in a clear manner.

### 3 References

• NIPS Styles (docx, tex)
• Google Colab tutorial
• GYM environments
• Lecture slides
• Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction”, Second Edition, MIT Press, 2018