Environment Catalog

Browse published RL environments. 9 available.

5-Stock Trading

5-stock portfolio trading with synthetic GBM prices, transaction costs, and Sharpe-based reward.

finance · medium
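
Synthetic geometric Brownian motion prices like those described above can be sketched as follows. The drift, volatility, and step count are illustrative assumptions, not the environment's actual configuration.

```python
import numpy as np

def simulate_gbm_prices(s0, mu, sigma, n_steps, dt=1.0 / 252, seed=0):
    """Simulate GBM price paths for several stocks.

    s0, mu, sigma are per-stock arrays; returns an (n_steps + 1, n_stocks) array.
    """
    rng = np.random.default_rng(seed)
    s0, mu, sigma = (np.asarray(x, dtype=float) for x in (s0, mu, sigma))
    # Exact GBM update: S_{t+1} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
    z = rng.standard_normal((n_steps, s0.size))
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((1, s0.size)), np.cumsum(log_increments, axis=0)]
    )
    return s0 * np.exp(log_paths)

prices = simulate_gbm_prices(
    s0=[100.0] * 5,    # five stocks starting at $100 (assumed)
    mu=[0.05] * 5,     # 5% annual drift (assumed)
    sigma=[0.2] * 5,   # 20% annual volatility (assumed)
    n_steps=252,       # one trading year of daily steps
)
```

GBM guarantees strictly positive prices, which is why it is a common choice for synthetic trading environments.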

Inventory Management

Warehouse inventory optimization for 3 products with stochastic demand.

optimization · medium
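
A single step of a 3-product inventory model with stochastic demand might look like the sketch below. The Poisson demand model, cost coefficients, and demand means are assumptions for illustration.

```python
import numpy as np

def inventory_step(stock, order, rng, demand_mean=(10.0, 5.0, 8.0),
                   holding_cost=0.1, stockout_cost=1.0):
    """One step of a 3-product inventory model with Poisson demand (assumed).

    Orders arrive, stochastic demand is served, leftover stock incurs a
    holding cost, and unmet demand is penalized. Returns (new_stock, reward).
    """
    stock = np.asarray(stock, dtype=float) + np.asarray(order, dtype=float)
    demand = rng.poisson(demand_mean)
    sold = np.minimum(stock, demand)     # cannot sell more than is on hand
    unmet = demand - sold
    new_stock = stock - sold
    reward = -holding_cost * new_stock.sum() - stockout_cost * unmet.sum()
    return new_stock, float(reward)

rng = np.random.default_rng(0)
stock, reward = inventory_step([20, 10, 15], [5, 5, 5], rng)
```

The reward trades off holding cost against stockout cost, the classic tension an inventory agent must learn to balance.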

TurkishEditorEnv

A text editing environment where an RL agent learns to proofread Turkish documents containing translation errors from English and spelling mistakes violating Turkish "imla kuralları" (spelling rules). The agent navigates through documents using a sliding window, detecting and correcting character-level errors including Turkish-specific distinctions (ı/i, ğ/g, ş/s) and translation false friends, while managing a limited editing budget.

medium
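
The Turkish-specific character distinctions mentioned above can be illustrated with a small candidate generator. The confusion pairs (ı/i, ğ/g, ş/s) come from the card; the example word and the swap-based search are illustrative assumptions about how such corrections might be enumerated.

```python
# Confusable Turkish/ASCII character pairs from the card description.
CONFUSION_PAIRS = {"i": "ı", "ı": "i", "g": "ğ", "ğ": "g", "s": "ş", "ş": "s"}

def candidate_corrections(word):
    """Yield every variant of `word` with one confusable character swapped."""
    for idx, ch in enumerate(word):
        swap = CONFUSION_PAIRS.get(ch)
        if swap:
            yield word[:idx] + swap + word[idx + 1:]

# Expanding to two swaps recovers "yanlış" ("wrong") from the ASCII
# misspelling "yanlis": i -> ı and s -> ş.
candidates = set()
for one_swap in candidate_corrections("yanlis"):
    candidates.add(one_swap)
    candidates.update(candidate_corrections(one_swap))
```

An actual editor environment would score candidates against a dictionary before spending its editing budget; this sketch only shows why the character-level search space is small and tractable.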

dynamic-goal-navigation-v0

A 2D point-mass navigation task where the target goal location evolves continuously to test adaptation to changing reward structures while maintaining fixed transition dynamics. The agent controls a point mass with double-integrator dynamics (mass=1.0, friction=0.1) in a 10x10 bounded arena. The goal follows configurable dynamics: static, linear drift with reflection, Brownian random walk, or periodic teleportation.

navigation · medium · 8/8
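
The double-integrator dynamics above (mass 1.0, friction 0.1, 10x10 arena) can be sketched with one Euler step. The timestep and the viscous-friction model are assumptions; the mass, friction coefficient, and bounds match the card.

```python
import numpy as np

def double_integrator_step(pos, vel, force, dt=0.05, mass=1.0, friction=0.1,
                           bounds=(0.0, 10.0)):
    """One Euler step of a point mass under applied force.

    dt is an assumed control interval; friction is modeled as viscous
    damping proportional to velocity (an assumption).
    """
    pos, vel, force = (np.asarray(x, dtype=float) for x in (pos, vel, force))
    acc = (force - friction * vel) / mass
    vel = vel + acc * dt
    pos = np.clip(pos + vel * dt, *bounds)   # keep the mass inside the arena
    return pos, vel

pos, vel = double_integrator_step([5.0, 5.0], [0.0, 0.0], [1.0, 0.0])
```

Because the goal moves while these dynamics stay fixed, only the reward structure is non-stationary, which is exactly what the environment is designed to isolate.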

non-stationary-cartpole

A CartPole environment with continuously drifting physical parameters (pole length and mass) to test adaptation to non-stationary dynamics. Parameters evolve via configurable schedules (sinusoidal, random walk, or abrupt steps). Observation space optionally includes temporal awareness features (sin/cos of phase) to help the agent anticipate parameter changes.

classic_control · medium · 1/8
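
A sinusoidal parameter schedule with the sin/cos temporal-awareness features mentioned above might look like this sketch. The base length/mass, drift amplitude, and period are illustrative assumptions.

```python
import math

def drifting_params(t, base_length=0.5, base_mass=0.1, amplitude=0.5,
                    period=1000):
    """Sinusoidal drift schedule for pole length and mass.

    Returns the current parameters plus (sin, cos) phase features that an
    agent could use to anticipate the drift. All numeric values are assumed.
    """
    phase = 2.0 * math.pi * t / period
    length = base_length * (1.0 + amplitude * math.sin(phase))
    masspole = base_mass * (1.0 + amplitude * math.sin(phase))
    temporal_features = (math.sin(phase), math.cos(phase))
    return length, masspole, temporal_features

length0, mass0, feats0 = drifting_params(0)     # start of the cycle
length_q, _, _ = drifting_params(250)           # quarter period: peak drift
```

Exposing sin/cos of the phase rather than raw time keeps the observation bounded and makes the periodicity directly learnable.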

adaptive-goal-nav

A 2D point-mass navigation environment where a holonomic robot tracks a smoothly moving goal. The reward structure continuously morphs between dense (distance-based) and sparse (proximity-based) according to a time-varying alpha parameter, testing continual adaptation to non-stationary reward functions.

navigation · medium · 8/8
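
The dense-to-sparse reward morphing described above can be written as a convex combination controlled by alpha. The reward scales, goal radius, and the sinusoidal alpha schedule are illustrative assumptions.

```python
import math

def blended_reward(dist, alpha, goal_radius=0.5, dense_scale=1.0):
    """Morph between dense (distance-based) and sparse (proximity-based)
    reward via alpha in [0, 1]. Scales are assumed, not the env's values."""
    dense = -dense_scale * dist                   # dense: negative distance
    sparse = 1.0 if dist < goal_radius else 0.0   # sparse: goal proximity only
    return (1.0 - alpha) * dense + alpha * sparse

def alpha_schedule(t, period=500):
    """One plausible time-varying alpha: a slow sinusoid kept in [0, 1]."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period))
```

At alpha = 0 the agent gets a shaped gradient everywhere; at alpha = 1 it only sees reward near the goal, so a policy tuned to one regime degrades as alpha drifts.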

custom-env

Gymnasium-compatible continuous 2D navigation environment (10m x 10m arena).

Observation space: Box(low=-10, high=10, shape=(14,), dtype=float32) containing [agent_x, agent_y, agent_vx, agent_vy, goal_relative_x, goal_relative_y, 8 lidar readings (ray-cast distances)]. Action space: Box(low=-1, high=1, shape=(2,), dtype=float32) representing [force_x, force_y]. Physics uses Euler integration with velocity damping 0.9.

The environment maintains a competency buffer tracking success/failure over the last 20 episodes to compute a score c ∈ [0,1]. Obstacle count N = 5 + floor(15*c), and the placement policy evolves continuously: (1) random uniform when c < 0.33; (2) corridor-blocking when 0.33 ≤ c < 0.66, using k-means clustering on recent agent trajectories to identify high-traffic zones and placing obstacles to minimize passage width; (3) adversarial when c ≥ 0.66, using trajectory-distribution analysis to maximize expected path length to the goal. Obstacles are static circles with radii 0.3-0.6m.

Reward: r_t = -0.1*||pos - goal||_2 - 0.01*||action||^2 + 10*success_flag - 5*collision_flag. Episodes terminate on goal reach (distance < 0.5m), collision, or after 500 steps. reset() randomizes start/goal positions (min separation 8m) and regenerates obstacles via the current placement policy based on c.

navigation · medium · 0/8
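
The competency-based adaptation and the reward stated above can be sketched directly from the card's formulas; only the function names are assumed.

```python
import math
import numpy as np

def obstacle_count(success_buffer):
    """Competency score c = success rate over the last 20 episodes;
    obstacle count N = 5 + floor(15*c), both as specified in the card."""
    recent = success_buffer[-20:]
    c = sum(recent) / len(recent) if recent else 0.0
    return c, 5 + math.floor(15 * c)

def nav_reward(pos, goal, action, success, collision):
    """r_t = -0.1*||pos - goal||_2 - 0.01*||action||^2
           + 10*success_flag - 5*collision_flag."""
    pos, goal, action = (np.asarray(x, dtype=float) for x in (pos, goal, action))
    return (-0.1 * float(np.linalg.norm(pos - goal))
            - 0.01 * float(action @ action)
            + 10.0 * success - 5.0 * collision)

c_hi, n_hi = obstacle_count([1] * 20)   # perfect record: c = 1.0, N = 20
c_lo, n_lo = obstacle_count([0] * 20)   # no successes: c = 0.0, N = 5
```

Because c is a rolling success rate, a long losing streak automatically relaxes the obstacle count back toward 5, giving the curriculum a built-in recovery path.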

custom-env

Gymnasium-compatible continuous resource management with 3 interdependent resources (A, B, C).

Observation space: Box(low=0, high=100, shape=(15,), dtype=float32): [storage_A, storage_B, storage_C, demand_A, demand_B, demand_C, demand_derivative_A, demand_derivative_B, demand_derivative_C, coupling_AB, coupling_BC, coupling_CA, time_since_shock, rolling_efficiency_score, normalized_step]. Action space: Box(low=0, high=10, shape=(6,), dtype=float32): [produce_A, produce_B, produce_C, convert_A_to_B, convert_B_to_C, convert_C_to_A].

Dynamics: storage_{t+1} = storage_t + production + conversion_in - conversion_out - demand_t - waste. Demand follows the non-stationary process d_t = d_base + α*sin(ω*t), where ω = ω_base*(1+e) scales with efficiency e ∈ [0,1] (rolling satisfied_demand/total_demand over 100 steps). Shock events occur with probability p = 0.01 + 0.2*max(0, e-0.7). Coupling coefficients C_ij (resource i requires resource j) evolve as C_ij = C_base * e, creating progressive interdependencies; higher e increases production complexity and demand non-stationarity.

Reward: r_t = -sum(|demand_t - satisfied_t|) - 0.5*sum(waste) - 0.01*||action||^2. Episode length: 1000 steps. reset() initializes storage at 50 units, sets the coupling matrix based on performance history (which persists across episodes), and samples new demand phase parameters.

resource_management · medium · 8/8
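
The demand process and shock probability above follow directly from the card's formulas. The shock formula is as specified; the numeric values for d_base, α, and ω_base are illustrative assumptions.

```python
import math

def demand(t, efficiency, d_base=10.0, alpha=3.0, omega_base=0.02):
    """Non-stationary demand d_t = d_base + alpha*sin(omega*t), with
    omega = omega_base*(1+e) scaling with efficiency e in [0, 1].
    The numeric defaults are assumed, not the env's actual values."""
    omega = omega_base * (1.0 + efficiency)
    return d_base + alpha * math.sin(omega * t)

def shock_probability(efficiency):
    """p = 0.01 + 0.2*max(0, e - 0.7), as specified in the card."""
    return 0.01 + 0.2 * max(0.0, efficiency - 0.7)
```

Note the self-balancing pressure: the better the agent satisfies demand (higher e), the faster demand oscillates and the more likely shocks become, so performance itself drives the non-stationarity.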

Dynamic Resource Market

A medium-complexity economic simulation where an agent manages a portfolio of 5 resources. The agent observes resource quantities, market prices, and demand levels to make buy/sell/hold decisions. Market conditions exhibit volatility cycles (20-50% fluctuation ranges) and random scarcity/abundance events. The agent must optimize portfolio value while minimizing transaction costs and drawdowns.

finance · medium · 8/8
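
Two of the quantities the agent must optimize, mark-to-market portfolio value and maximum drawdown, can be sketched as follows; the function names and the 5-resource example values are assumptions.

```python
import numpy as np

def portfolio_value(quantities, prices, cash):
    """Mark-to-market value of the resource portfolio plus cash on hand."""
    return cash + float(np.dot(quantities, prices))

def max_drawdown(value_history):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    values = np.asarray(value_history, dtype=float)
    peaks = np.maximum.accumulate(values)   # running maximum so far
    return float(np.max((peaks - values) / peaks))

value = portfolio_value([1, 2, 0, 0, 0], [10.0, 5.0, 1.0, 1.0, 1.0], 100.0)
dd = max_drawdown([100.0, 120.0, 90.0, 110.0])   # trough 90 after peak 120
```

Minimizing drawdown alongside transaction costs pushes the agent toward smoother equity curves rather than pure end-of-episode value, mirroring the Sharpe-style objectives used elsewhere in this catalog.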