
non-stationary-cartpole

A CartPole environment whose physical parameters (pole length and mass) drift over time, to test an agent's adaptation to non-stationary dynamics. Parameters evolve according to a configurable schedule: sinusoidal, random walk, or abrupt steps. The observation optionally includes time features (sine and cosine of the drift phase) to help the agent anticipate parameter changes.
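The three schedules named above can be sketched as plain functions of the step counter. This is a minimal illustration, not the environment's own code: the sinusoidal formula and the "jump every 1/drift_rate steps" rule follow the class docstring, while the function names, the amplitude, the noise scale, the reversion strength, and the list of step levels are all assumptions chosen for the example.

```python
import math
import random

BASE_POLE_LENGTH = 0.5  # meters, matches the base value in the environment code


def sinusoidal_length(t, base=BASE_POLE_LENGTH, amp=0.1, drift_rate=0.001):
    """Gradual sinusoidal drift: base + amp * sin(2*pi*drift_rate*t)."""
    return base + amp * math.sin(2.0 * math.pi * drift_rate * t)


def random_walk_step(value, base, rng, sigma=0.005, reversion=0.01):
    """One update of a mean-reverting random walk.

    The discretisation (pull toward `base` plus Gaussian noise) is an
    assumed form; the source only says "Brownian motion with mean reversion".
    """
    return value + reversion * (base - value) + rng.gauss(0.0, sigma)


def step_length(t, levels=(0.5, 0.7, 0.3), drift_rate=0.005):
    """Sudden drift: hold a level, then jump every 1/drift_rate steps.

    The `levels` tuple is hypothetical; the source does not list jump targets.
    """
    period = max(1, round(1.0 / drift_rate))
    return levels[(t // period) % len(levels)]


if __name__ == "__main__":
    rng = random.Random(0)
    length = BASE_POLE_LENGTH
    for t in range(0, 1000, 250):
        length = random_walk_step(length, BASE_POLE_LENGTH, rng)
        print(t, sinusoidal_length(t), length, step_length(t))
```

A real implementation would probably also clamp the drifted length and mass to a strictly positive range so the dynamics stay well defined.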

Domain: classic_control
Difficulty: medium
Observation: Box(shape=[6])
Action: Discrete(shape=[1])
Reward: dense
Max Steps: 1000
Version: v1
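To make the 6-element observation concrete, here is a rough sketch of how such a vector could be assembled. The position and angle thresholds come from the environment code; the velocity scales, the clipping, and the phase definition `2*pi*drift_rate*t` are assumptions for illustration, since the source does not show how velocities are normalized or how the phase is computed.

```python
import math

X_THRESHOLD = 2.4                                  # from the environment code
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360   # from the environment code
MAX_CART_VEL = 5.0   # assumed normalization scale (not in the source)
MAX_POLE_VEL = 5.0   # assumed normalization scale (not in the source)


def _clip(value):
    """Clamp a value to [-1, 1]."""
    return max(-1.0, min(1.0, value))


def build_observation(x, x_dot, theta, theta_dot, t,
                      drift_rate=0.001, include_time_features=True):
    """Return a normalized observation list of length 6 (or 4)."""
    obs = [
        _clip(x / X_THRESHOLD),
        _clip(x_dot / MAX_CART_VEL),
        _clip(theta / THETA_THRESHOLD_RADIANS),
        _clip(theta_dot / MAX_POLE_VEL),
    ]
    if include_time_features:
        # Assumed phase: the argument of the sinusoidal drift schedule.
        phase = 2.0 * math.pi * drift_rate * t
        obs += [math.sin(phase), math.cos(phase)]
    return obs
```

Encoding the phase as a sine/cosine pair rather than a raw step counter keeps the feature bounded and continuous across period boundaries.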

Tests (1/8)

syntax, import, reset, step, obs_space, action_space, reward_sanity, determinism

Use via API

import kualia
env = kualia.make("non-stationary-cartpole")
obs, info = env.reset()
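Once an environment has been created, a full episode might be rolled out as sketched below. The helper assumes the returned object follows the Gymnasium convention, where `step` returns `(obs, reward, terminated, truncated, info)`. That seems likely because the class subclasses `gym.Env`, but it is not confirmed for `kualia.make`. The names `run_episode` and `random_policy` are hypothetical.

```python
import random


def random_policy(_obs):
    """Pick push-left (0) or push-right (1) uniformly at random."""
    return random.randrange(2)


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_reward, steps_taken).

    `env` is assumed to expose Gymnasium-style reset() and step().
    """
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


# Usage, assuming the kualia package from the snippet above is installed:
# import kualia
# env = kualia.make("non-stationary-cartpole")
# print(run_episode(env, random_policy))
```

With a reward of +1 per surviving step, the returned total equals the episode length, which makes it a convenient baseline to compare against before and after a parameter shift.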

Environment Code

import gymnasium as gym
import numpy as np
from typing import Optional, Dict, Any


class NonStationaryCartPoleEnv(gym.Env):
    """
    A non-stationary CartPole environment where pole length and mass drift over time.
    
    Observation Space:
        Box(6,) if include_time_features=True: 
            [cart_pos_norm, cart_vel_norm, pole_angle_norm, pole_vel_norm, sin_phase, cos_phase]
        Box(4,) if include_time_features=False:
            [cart_pos_norm, cart_vel_norm, pole_angle_norm, pole_vel_norm]
        All values normalized to [-1, 1].
    
    Action Space:
        Discrete(2): 0 = push left (force = -FORCE_MAG), 1 = push right (force = +FORCE_MAG)
        
    Reward:
        +1.0 for each step survived (dense), clipped to [-10, 10].
        
    Dynamics:
        Physical parameters evolve according to drift_mode and drift_schedule:
        - gradual + sinusoidal: length = base + amp * sin(2*pi*drift_rate*t)
        - gradual + random_walk: Brownian motion with mean reversion
        - sudden + step: discrete jumps every (1/drift_rate) steps
    """
    
    # Physics constants
    GRAVITY: float = 9.8
    CART_MASS: float = 1.0
    FORCE_MAG: float = 10.0
    TAU: float = 0.02  # Time step duration (seconds)
    
    # Termination thresholds
    THETA_THRESHOLD_RADIANS: float = 12 * 2 * np.pi / 360  # ~0.2094 rad
    X_THRESHOLD: float = 2.4
    
    # Base physical parameters
    BASE_POLE_LENGTH: float = 0.5  # meters
    BASE_POLE_