custom-env
Gymnasium-compatible continuous 2D navigation environment in a 10 m x 10 m arena.
Observation space: Box(low=-10, high=10, shape=(14,), dtype=float32) containing [agent_x, agent_y, agent_vx, agent_vy, goal_relative_x, goal_relative_y, 8 lidar readings (ray-cast distances)].
Action space: Box(low=-1, high=1, shape=(2,), dtype=float32) representing [force_x, force_y].
Physics: Euler integration with a velocity damping factor of 0.9.
Competency: the environment keeps a buffer of the success/failure outcomes of the last 20 episodes and computes from it a score c ∈ [0,1].
Adaptation: the obstacle count is N = 5 + floor(15*c), so N ranges from 5 (c = 0) to 20 (c = 1). The placement policy is selected by c from three regimes: (1) random uniform placement when c < 0.33; (2) corridor-blocking when 0.33 ≤ c < 0.66, which runs k-means clustering on recent agent trajectories to identify high-traffic zones and places obstacles to minimize passage width; (3) adversarial placement when c ≥ 0.66, which uses trajectory distribution analysis to maximize the expected path length to the goal. Obstacles are static circles with radii of 0.3 to 0.6 m.
Reward: r_t = -0.1*||pos - goal||_2 - 0.01*||action||_2^2 + 10*success_flag - 5*collision_flag.
Episode end: the agent reaches the goal (distance < 0.5 m), collides with an obstacle, or hits the 500-step limit.
Reset: reset() randomizes the start and goal positions (minimum separation 8 m) and regenerates the obstacles using the placement policy selected by the current c.
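The adaptation rule can be illustrated with a minimal Python sketch. The description does not say how c is derived from the buffer, so the success fraction used here is an assumption, as is the score of 0.0 for an empty buffer. The names CompetencyBuffer, obstacle_count, placement_regime, and the regime labels are hypothetical and are not taken from the environment's code.

```python
import math
from collections import deque


class CompetencyBuffer:
    """Keeps the success/failure outcomes of the most recent episodes."""

    def __init__(self, window: int = 20):
        # A bounded deque drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window)

    def record(self, success: bool) -> None:
        self.outcomes.append(bool(success))

    def score(self) -> float:
        # Assumption: c is the fraction of successes in the window,
        # and an empty buffer scores 0.0.
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)


def obstacle_count(c: float) -> int:
    """N = 5 + floor(15*c), as given in the description."""
    return 5 + math.floor(15 * c)


def placement_regime(c: float) -> str:
    """Maps c to one of the three placement regimes (labels are hypothetical)."""
    if c < 0.33:
        return "random_uniform"
    if c < 0.66:
        return "corridor_blocking"
    return "adversarial"


buffer = CompetencyBuffer()
for outcome in [True] * 10 + [False] * 10:
    buffer.record(outcome)
c = buffer.score()
print(c, obstacle_count(c), placement_regime(c))
```

Under these assumptions, ten successes in the last twenty episodes give c = 0.5, which maps to 12 obstacles in the corridor-blocking regime.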
Box(low=-10, high=10, shape=(14,), dtype=float32)
Box(low=-1, high=1, shape=(2,), dtype=float32)
see spec
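The reward formula and episode-end conditions in the description can be sketched as follows. This is an illustration under stated assumptions, not the environment's actual code: collision detection is taken as an input flag, the success and collision terms are applied independently, and the 500-step limit is reported as a truncation, which is the usual Gymnasium convention but is not stated in the description. The function and constant names are hypothetical.

```python
import math

GOAL_RADIUS = 0.5  # goal counts as reached when distance < 0.5 m
MAX_STEPS = 500    # step limit per episode


def step_reward(pos, goal, action, collided):
    """Returns (reward, success) for one step.

    r_t = -0.1*||pos - goal||_2 - 0.01*||action||_2^2
          + 10*success_flag - 5*collision_flag
    """
    dist = math.dist(pos, goal)
    success = dist < GOAL_RADIUS
    action_sq = sum(a * a for a in action)
    reward = (
        -0.1 * dist
        - 0.01 * action_sq
        + 10.0 * float(success)
        - 5.0 * float(collided)
    )
    return reward, success


def episode_status(success, collided, step_count):
    """Returns (terminated, truncated).

    Assumption: goal reach and collision terminate the episode, while
    the step limit is reported as truncation.
    """
    terminated = bool(success or collided)
    truncated = (not terminated) and step_count >= MAX_STEPS
    return terminated, truncated


reward, success = step_reward((0.0, 0.0), (3.0, 4.0), (1.0, 0.0), collided=False)
print(reward, success, episode_status(success, False, step_count=1))
```

In the usage line the agent is 5 m from the goal and applies a unit force, so the reward is -0.1*5 - 0.01*1 = -0.51 and the episode continues.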