# Gymnasium

Before we further introduce reinforcement learning concepts, let's look at the Python package `gymnasium`. 
`gymnasium` is an open-source Python library for developing and comparing reinforcement learning algorithms.
Initially developed by OpenAI, it is now maintained by a community of developers.  
Every `gymnasium` environment exposes the same small interface (`reset`, `step`, `render`, and `close`), so code written for one environment carries over to others. 
Let's start by creating one of the environments packaged with `gymnasium`. 

In [None]:
import gymnasium as gym

The `gymnasium` package comes with a range of environments that we can work with. 
The `pprint_registry()` function shows a list of the available environments.

In [None]:
gym.pprint_registry()

To demonstrate, we will use the `"LunarLander-v3"` environment. 

In [None]:
env = gym.make("LunarLander-v3", render_mode='rgb_array')

Note that we instantiated the environment with `render_mode='rgb_array'`, which makes `env.render()` return each frame as an image array; this is necessary to produce the animations below. 

The `env` object has methods for controlling the environment and supplying actions to it. 
The first thing to do is `reset` the environment, which must happen before any call to `step`. 

In [None]:
observation, info = env.reset()

This returns an `observation` and some `info` (we can usually ignore the latter). 
The `observation` structure depends on the environment being used; for `LunarLander-v3`, [the documentation](https://gymnasium.farama.org/environments/box2d/lunar_lander/) tells us that the observation is an 8-dimensional vector containing: 
- coordinates in *x* and *y*, 
- velocities in *x* and *y*,
- angle and angular velocity, 
- two boolean flags (stored as 0 or 1) representing whether each leg is in contact with the ground. 

In [None]:
observation
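
To make the layout of that vector easier to read, the eight entries can be unpacked into named fields. 
The following is a minimal sketch and not part of `gymnasium`: the `LanderObservation` class, the `unpack_observation` helper, and the field names are hypothetical, the ordering follows the list above, and the example values are made up rather than taken from a real reset. 

```python
from typing import NamedTuple, Sequence


class LanderObservation(NamedTuple):
    """Hypothetical named view of the 8-dimensional observation (not a gymnasium type)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg_1_contact: bool
    leg_2_contact: bool


def unpack_observation(observation: Sequence[float]) -> LanderObservation:
    """Convert a raw 8-element observation into named fields."""
    if len(observation) != 8:
        raise ValueError(f"expected 8 values, got {len(observation)}")
    # The first six entries are positions, velocities, angle and angular velocity
    *kinematics, leg_1, leg_2 = (float(v) for v in observation)
    # The leg-contact entries are stored as 0 or 1, so convert them to bool
    return LanderObservation(*kinematics, bool(leg_1), bool(leg_2))


# Made-up example values, not the output of a real reset
example = unpack_observation([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 1.0])
print(example.y, example.leg_2_contact)
```

The helper only relies on `len` and `float`, so it should also accept the NumPy array returned by `env.reset()`. 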

The environment advances by one time step each time the `step` method is called with an action. 
Again, the set of valid actions depends on the environment; `LunarLander-v3` has four: 
- `0`: do nothing
- `1`: fire left orientation engine
- `2`: fire main engine
- `3`: fire right orientation engine
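
Bare integers are easy to mix up, so it can help to give the four actions listed above readable names. 
This is a minimal sketch using the standard-library `IntEnum`; the `LanderAction` class and its member names are hypothetical and not part of `gymnasium`. 

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical names for the four discrete actions (not part of gymnasium)."""
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


# IntEnum members compare equal to plain integers
assert LanderAction.FIRE_MAIN == 2

# Look up the name of a raw integer action
print(LanderAction(3).name)

# Convert back to a plain int before handing the action to the environment
raw_action = int(LanderAction.FIRE_MAIN)
```

Converting with `int(...)` keeps the value passed to `step` a plain integer, which is the form used throughout this section. 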

Below, we choose the action as the step index modulo four, which cycles through the four actions in order. 
The loop runs for at most `env.spec.max_episode_steps` steps (1000 for this environment) or until the episode ends. 

````{margin}
```{note}
The `reset` method here is given `seed=0` so that the starting state is reproducible in this course book. 
I recommend removing that seed if you want to train a general solution. 
```
````

In [None]:
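import numpy as np

current_rewards = 0
# reset returns (observation, info); keep only the observation
obs = env.reset(seed=0)[0]
render = []  # one RGB frame per step
for step in range(env.spec.max_episode_steps):
    action = step % 4  # cycle through actions 0, 1, 2, 3
    obs, reward, terminated, truncated, info = env.step(action)
    current_rewards += reward
    render.append(env.render())
    # stop when the episode ends, either by termination or by the time limit
    if terminated or truncated:
        break

env.close()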
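
The loop above can be wrapped in a reusable function. 
The sketch below is one possible arrangement, not a `gymnasium` utility: `run_episode`, `cycle_policy`, and `CountdownEnv` are hypothetical names, and `CountdownEnv` is a toy stand-in that only mimics the return shapes of `reset` and `step` so the example runs without Box2D. 
Unlike a check on `terminated` alone, this version stops on either `terminated` or `truncated`. 

```python
from typing import Any, Callable, Optional, Tuple


def run_episode(env: Any, policy: Callable[[int, Any], int],
                max_steps: int, seed: Optional[int] = None) -> Tuple[float, int]:
    """Run one episode and return (total reward, number of steps taken).

    `env` only needs `reset(seed=...)` and `step(action)` with the return
    shapes used above; `policy` maps (step index, observation) to an action.
    """
    obs, _info = env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for step in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(policy(step, obs))
        total_reward += reward
        steps += 1
        # Stop on either signal: terminated (end state) or truncated (time limit)
        if terminated or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in environment: terminates after `length` steps."""

    def __init__(self, length: int = 5):
        self.length = length
        self.remaining = length

    def reset(self, seed: Optional[int] = None):
        self.remaining = self.length
        return self.remaining, {}

    def step(self, action: int):
        self.remaining -= 1
        reward = 1.0 if action == 0 else 0.0  # toy reward: only action 0 pays
        terminated = self.remaining == 0
        return self.remaining, reward, terminated, False, {}


def cycle_policy(step: int, obs: Any) -> int:
    """Cycle through the four actions, ignoring the observation."""
    return step % 4


total, steps = run_episode(CountdownEnv(length=5), cycle_policy, max_steps=1000, seed=0)
print(total, steps)
```

With the real environment, a call such as `run_episode(env, cycle_policy, env.spec.max_episode_steps, seed=0)` would mirror the loop above, minus the frame collection. 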

We can then visualise the render with the following code. 

In [None]:
import matplotlib.pyplot as plt 
import matplotlib.animation
from IPython.display import HTML, display

plt.rcParams["animation.html"] = "jshtml"
plt.ioff()  # avoid showing the static figure as well as the animation

fig, ax = plt.subplots(1, 1)

def animate(t):
    # clear the previous frame, then draw frame t with its step number
    ax.cla()
    ax.imshow(render[t] / 255)
    ax.text(500, 50, f'Step: {t}', color='white')

ani = matplotlib.animation.FuncAnimation(fig, animate, frames=len(render))
html = HTML(ani.to_jshtml())
display(html)
plt.close()

Surprisingly, with this particular starting configuration the lander survives the landing despite such a simplistic action sequence. 
Other starting configurations may be less lucky. 