OpenAI Gym Environments — Core Concepts
Why Gym environments matter
Reinforcement learning (RL) needs two sides: an agent that decides what to do and an environment that reacts. OpenAI Gym (now Gymnasium) standardises the environment side, so any agent written against the interface can run on many environments without environment-specific glue code. This separation made it much easier to reuse agent code and to compare algorithms on shared benchmarks.
The core loop
Every Gym environment follows the same rhythm:
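- Reset: the environment returns an initial observation (a snapshot of the world state) together with an info dictionary.
- Step: the agent picks an action; in Gymnasium the environment returns five things: the new observation, a reward (a number), a terminated flag (the episode reached an end state), a truncated flag (the episode was cut off, for example by a time limit), and extra info. Older Gym releases returned four, with a single done flag in place of terminated and truncated.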
- Repeat until the episode ends, then reset again.
This loop is captured by two methods: env.reset() and env.step(action).
Spaces: observation and action
Environments declare what the agent sees (observation space) and what the agent can do (action space). Common space types include:
- Discrete: a finite set of choices, like “left” or “right.”
- Box: an n-dimensional array of numbers with a lower and upper bound per element, like joint angles of a robot arm.
- MultiDiscrete: several independent discrete choices packed together.
Knowing the space shapes tells the agent how to read inputs and format outputs without hard-coding anything per game.
Built-in environments
Gymnasium ships dozens of ready-made worlds grouped by theme:
| Category | Examples | Typical use |
|---|---|---|
| Classic Control | CartPole, MountainCar, Pendulum | Quick sanity checks |
| Toy Text | FrozenLake, Taxi, Blackjack | Tabular algorithms |
| Box2D | LunarLander, BipedalWalker | Physics-based control (discrete and continuous) |
| MuJoCo | HalfCheetah, Humanoid, Ant | High-dimensional robotics |
| Atari (via ale-py) | Breakout, Pong, Space Invaders | Deep RL benchmarks |
Each comes with a string identifier like "CartPole-v1" that you pass to gymnasium.make().
Rendering modes
Environments can render to a window ("human" mode), return pixel arrays ("rgb_array" mode for recording), or skip visuals entirely for speed. Choose the mode at creation time:
env = gymnasium.make("LunarLander-v3", render_mode="human")
Wrappers
Wrappers are thin layers that sit between the agent and the raw environment. They can change observations (resize images, normalise values), modify rewards (clip to [-1, 1]), or limit episode length. You chain them like decorators:
TimeLimit → NormalizeObservation → ClipReward → base env
This keeps your agent code clean because transformations live outside the learning algorithm.
Common misconception
Many beginners think Gym trains the agent. It does not. Gym only defines the world. You still need a separate library or your own code for the learning algorithm — Gym just provides the stage.
From OpenAI Gym to Gymnasium
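The original gym package is no longer maintained. The community fork, Gymnasium (pip install gymnasium), is the active successor, maintained by the Farama Foundation. It keeps the same reset/step design and adds improvements such as explicit truncation handling (step() returns separate terminated and truncated flags where older Gym releases returned a single done flag) and better type annotations. New projects should use gymnasium.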
The one thing to remember: Gym environments give every RL agent the same standardised stage — reset, step, observe, reward — so researchers can compare apples to apples.
See Also
- Python Environment Wrappers: How thin add-on layers let you change what a learning program sees and does without rewriting the game itself.
- Python Monte Carlo Tree Search: The clever trick behind AlphaGo, in which a program explores millions of possible moves by playing quick random games against itself.
- Python Multi Agent Reinforcement: What happens when multiple programs learn together in the same world, including cooperation, competition, and emergent teamwork.
- Python Policy Gradient Methods: Instead of scoring every move, what if the program just learned which moves feel right? That is policy gradients.
- Python Q Learning Implementation: How a program builds a cheat sheet of every situation and every action to figure out the best move, no teacher required.