Gymnasium


Before we introduce further reinforcement learning concepts, let’s look at the Python package gymnasium, an open-source library for developing and comparing reinforcement learning algorithms. Initially developed by OpenAI, it is now maintained by a community of developers.
Every environment in gymnasium exposes the same interface, so the steps below carry over from one environment to another. Let’s start by importing the package and then create one of the environments that ship with it.

import gymnasium as gym

The gymnasium package comes with a range of environments that we can work with. The pprint_registry() function shows a list of the available environments.

gym.pprint_registry()

===== classic_control =====
Acrobot-v1             CartPole-v0            CartPole-v1
MountainCar-v0         MountainCarContinuous-v0 Pendulum-v1
===== phys2d =====
phys2d/CartPole-v0     phys2d/CartPole-v1     phys2d/Pendulum-v0
===== box2d =====
BipedalWalker-v3       BipedalWalkerHardcore-v3 CarRacing-v3
LunarLander-v3         LunarLanderContinuous-v3
===== toy_text =====
Blackjack-v1           CliffWalking-v0        FrozenLake-v1
FrozenLake8x8-v1       Taxi-v3
===== tabular =====
tabular/Blackjack-v0   tabular/CliffWalking-v0
===== mujoco =====
Ant-v2                 Ant-v3                 Ant-v4
Ant-v5                 HalfCheetah-v2         HalfCheetah-v3
HalfCheetah-v4         HalfCheetah-v5         Hopper-v2
Hopper-v3              Hopper-v4              Hopper-v5
Humanoid-v2            Humanoid-v3            Humanoid-v4
Humanoid-v5            HumanoidStandup-v2     HumanoidStandup-v4
HumanoidStandup-v5     InvertedDoublePendulum-v2 InvertedDoublePendulum-v4
InvertedDoublePendulum-v5 InvertedPendulum-v2    InvertedPendulum-v4
InvertedPendulum-v5    Pusher-v2              Pusher-v4
Pusher-v5              Reacher-v2             Reacher-v4
Reacher-v5             Swimmer-v2             Swimmer-v3
Swimmer-v4             Swimmer-v5             Walker2d-v2
Walker2d-v3            Walker2d-v4            Walker2d-v5
===== None =====
GymV21Environment-v0   GymV26Environment-v0

To demonstrate, we will use the "LunarLander-v3" environment.

env = gym.make("LunarLander-v3", render_mode='rgb_array')

Note that we instantiated the environment with the render_mode of 'rgb_array'; this is necessary to produce the animations below.

The env object provides methods for controlling the environment and supplying actions to it. The first thing we can do is reset the environment.

observation, info = env.reset()

This returns an observation and an info dictionary (we can usually ignore the latter). The structure of the observation depends on the environment being used; for LunarLander-v3, the documentation tells us that the observation is an 8-dimensional vector containing:

coordinates in x and y,
velocities in x and y,
angle and angular velocity,
two boolean flags (stored as 0 or 1) indicating whether each leg is in contact with the ground.

observation

array([ 0.00776711,  1.4029775 ,  0.78670657, -0.35302836, -0.00899331,
       -0.17820069,  0.        ,  0.        ], dtype=float32)
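To make the layout of this vector concrete, the sketch below unpacks the observation printed above into named fields. `LanderState`, `decode_observation` and the field names are illustrative choices made here, not part of gymnasium; the field order simply follows the list above.

```python
from typing import NamedTuple, Tuple


class LanderState(NamedTuple):
    """Hypothetical container for a LunarLander-v3 observation (not a gymnasium class)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg_contacts: Tuple[bool, bool]


def decode_observation(observation) -> LanderState:
    """Split the 8-dimensional observation into named fields."""
    x, y, vx, vy, angle, angular_velocity, leg_a, leg_b = (float(v) for v in observation)
    # The leg flags are stored as 0.0 or 1.0, so convert them to booleans.
    return LanderState(x, y, vx, vy, angle, angular_velocity, (bool(leg_a), bool(leg_b)))


# Values copied from the observation printed above.
observation = [0.00776711, 1.4029775, 0.78670657, -0.35302836,
               -0.00899331, -0.17820069, 0.0, 0.0]
state = decode_observation(observation)
print(state.y, state.leg_contacts)
```

The function accepts any sequence of eight numbers, so the NumPy array returned by `env.reset()` can be passed in directly.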

The environment advances by one time step each time the step method is called with an action. Again, the available actions depend on the environment; LunarLander-v3 has four possible actions:

0: do nothing
1: fire left orientation engine
2: fire main engine
3: fire right orientation engine
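Bare integers are easy to mix up, so one option is to give the four actions names. The sketch below does this with an `IntEnum`; the class `LanderAction` and its member names are hypothetical labels introduced here, and only the integer values come from the list above.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete LunarLander-v3 actions."""
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


action = LanderAction.FIRE_MAIN
# int(action) gives the plain integer that the environment expects.
print(int(action), action.name)

# An integer can be mapped back to its name, e.g. when logging actions.
print(LanderAction(3).name)
```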

Below, we choose the action as the step number modulo four, which cycles through the four actions in order. The loop runs for 1000 steps (the environment's maximum episode length) or until the episode ends.

import numpy as np

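current_rewards = 0  # cumulative reward over the episode
obs = env.reset(seed=0)[0]  # fixed seed for a reproducible start
render = []  # RGB frames for the animation
for step in range(env.spec.max_episode_steps):
    action = step % 4  # cycle through the four actions
    obs, reward, terminated, truncated, info = env.step(action)
    current_rewards += reward
    render.append(env.render())
    # Stop when the episode ends or the time limit is reached.
    if terminated or truncated:
        break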

env.close()
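The loop above can also be wrapped in a small reusable helper. This is a sketch rather than anything provided by gymnasium: `run_episode`, `policy` and `cycling_policy` are names introduced here, and the helper only assumes that `env` offers the `reset` and `step` methods used above. It stops when either `terminated` or `truncated` is set.

```python
from typing import Any, Callable, Optional, Tuple


def run_episode(env: Any, policy: Callable[[int, Any], Any], max_steps: int,
                seed: Optional[int] = None) -> Tuple[float, int]:
    """Run one episode and return (total reward, number of steps taken).

    `policy(step, observation)` returns the action for the current step.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for step in range(max_steps):
        action = policy(step, observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # Stop when the episode ends or a time limit cuts it short.
        if terminated or truncated:
            break
    return total_reward, steps


def cycling_policy(step: int, observation: Any) -> int:
    """The schedule used above: cycle through the four actions in order."""
    return step % 4
```

With a freshly created environment, this could be called as `run_episode(env, cycling_policy, env.spec.max_episode_steps, seed=0)`. Passing the policy as a function makes it straightforward to swap in a different action rule later.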

We can then visualise the render with the following code.

import matplotlib.pyplot as plt
import matplotlib.animation
from IPython.display import HTML, display

plt.rcParams["animation.html"] = "jshtml"
plt.ioff()  # turn off interactive mode so only the animation is displayed

fig, ax = plt.subplots(1, 1)

def animate(t):
    # Redraw the axes with frame t and a step counter.
    ax.cla()
    ax.imshow(render[t] / 255)
    ax.text(500, 50, f'Step: {t}', color='white')

ani = matplotlib.animation.FuncAnimation(fig, animate, frames=len(render))
html = HTML(ani.to_jshtml())
display(html)
plt.close()

Surprisingly, the lander survives the landing from this particular starting configuration, even with such a simplistic action schedule. Other starting configurations would be less lucky.