Analysis of Deep RL, Traditional RL and PID Control for Assistive Walker and CartPole Systems

Designed a unified research framework benchmarking five controllers (PID, Q-Learning, PPO, SAC, and DDPG) on custom Assistive Walker and CartPole systems.

Overview

GitHub repo

This repository provides a unified research platform for benchmarking classical and modern control strategies — including PID, Q-Learning, PPO, SAC, and DDPG — on two custom robotic systems: an Assistive Walker and a CartPole.
Both systems are modeled using URDF for realistic physics and simulated in PyBullet, with custom Gymnasium environments for reinforcement learning research.

Table of Contents

  • Project Objectives

  • System Architecture

  • Assistive Walker

  • CartPole

  • Custom Environment Creation

  • Control Algorithms

  • Training & Evaluation Pipeline

  • How to Run

  • File Structure

  • Research Insights

  • License

Project Objectives

  • Develop custom URDF models for both systems, capturing realistic mechanical properties.

  • Implement Gymnasium-compatible environments using PyBullet for physics simulation.

  • Train and benchmark PID, Q-Learning, PPO, SAC, and DDPG controllers.

  • Compare performance using metrics such as episode length, cumulative reward, and stability.

  • Analyze strengths and limitations of each control strategy for both robots.

System Architecture

| Layer | Description |
| --- | --- |
| URDF Model | Defines robot structure, joints, inertia, friction, and sensors. |
| PyBullet | Loads URDF, simulates physics, provides state and control APIs. |
| Environment | Gymnasium-compatible class defining observations, actions, rewards, and episode logic. |
| RL Algorithm | Agent interacts with environment, learns to optimize reward. |

Assistive Walker

Description:
A differential-drive walker with two powered wheels, an assistive handle (pole), and a simulated IMU sensor. Designed for research in stabilization, navigation, and user-adaptive control.

URDF Highlights:

  • Base: Rigid box (4.0 kg), realistic inertia.

  • Wheels: Two, each 0.8 kg, high friction for realistic drive.

  • Pole: 1.2 kg, 1.0 m, revolute joint for handle dynamics.

  • IMU: Simulated MPU6050 providing orientation, angular velocity, and linear acceleration.
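
Since PyBullet has no built-in MPU6050, the IMU readings are presumably derived from the base link state. The sketch below illustrates one common way to do this; the function name, noise level, and use of finite differences for linear acceleration are assumptions, not the repository's implementation.

import numpy as np
import pybullet as p

def read_imu(robot_id, prev_lin_vel, dt, noise_std=0.01):
    """Approximate MPU6050-style readings from the walker's base link."""
    _, orn = p.getBasePositionAndOrientation(robot_id)
    lin_vel, ang_vel = p.getBaseVelocity(robot_id)

    roll, pitch, yaw = p.getEulerFromQuaternion(orn)               # orientation
    lin_acc = (np.array(lin_vel) - np.array(prev_lin_vel)) / dt    # finite-difference acceleration

    # Additive Gaussian noise roughly mimics sensor imperfection.
    reading = np.concatenate([[roll, pitch, yaw], ang_vel, lin_acc])
    reading += np.random.normal(0.0, noise_std, size=reading.shape)
    return reading, lin_vel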

Environment:

  • Observation: Pole angle/velocity, base pose, wheel velocities, IMU data.

  • Action:

    • Discrete → {left, right, stop}

    • Continuous → [left_wheel_torque, right_wheel_torque]

  • Reward: Penalizes pole deviation, displacement, and excessive wheel velocity (a shaping sketch follows this list).

  • Termination: Pole falls or walker moves out of bounds.
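
As a rough illustration of the reward shaping described above, the snippet below penalizes pole deviation, base displacement, and wheel speed. The weights and state names are illustrative assumptions, not the repository's actual coefficients.

import numpy as np

def walker_reward(pole_angle, base_xy, wheel_velocities,
                  w_angle=1.0, w_pos=0.1, w_wheel=0.01):
    """Hypothetical shaping: keep the pole upright, stay near the origin, drive gently."""
    angle_penalty = w_angle * pole_angle ** 2
    position_penalty = w_pos * float(np.dot(base_xy, base_xy))
    wheel_penalty = w_wheel * float(np.sum(np.square(wheel_velocities)))
    return 1.0 - (angle_penalty + position_penalty + wheel_penalty)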

CartPole

Description:
A classic inverted pendulum system with a sliding cart and pole, implemented with a custom URDF for realistic simulation.

URDF Highlights:

  • Track: Fixed, 30×0.05×0.05 m (visual only).

  • Cart: 0.5×0.5×0.2 m, 4 kg, prismatic joint for horizontal motion.

  • Pole: 1 m, 1 kg, continuous joint for rotation.

  • Friction/Damping: Realistic values for both cart and pole for stable physics.

Environment:

  • Observation: Cart position/velocity, pole angle/velocity (the corresponding Gymnasium spaces are sketched after this list).

  • Action:

    • Discrete → {left, right}

    • Continuous → Apply force/torque to cart.

  • Reward: +1 for each step the pole remains balanced.

  • Termination: Pole falls or cart moves off track.
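
For reference, this specification maps naturally onto Gymnasium spaces. The bounds and force limits below are illustrative placeholders rather than the repository's exact values.

import numpy as np
from gymnasium import spaces

# Observation: [cart position, cart velocity, pole angle, pole angular velocity]
observation_space = spaces.Box(
    low=np.array([-15.0, -np.inf, -np.pi, -np.inf], dtype=np.float32),
    high=np.array([15.0, np.inf, np.pi, np.inf], dtype=np.float32),
)

# Discrete variant: push left or push right with a fixed force.
discrete_action_space = spaces.Discrete(2)

# Continuous variant: signed force applied to the cart (newtons).
continuous_action_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)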

Custom Environment Creation

Both environments are implemented as Python classes inheriting from gymnasium.Env; a minimal skeleton following these steps is sketched after the list below.

Key Steps

  1. URDF Modeling: Define robot structure and joints.

  2. PyBullet Integration: Load URDF, set up physics simulation.

  3. Observation & Action Spaces: Define what the agent sees and controls.

  4. Reward & Episode Logic: Specify how agents are scored and when episodes end.

  5. Registration: Register with Gymnasium for use in RL pipelines.
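
A minimal skeleton of these five steps might look like the following. The URDF path, joint indices, force magnitude, and termination threshold are placeholders, and PyBullet runs headless (DIRECT mode) for brevity.

import numpy as np
import pybullet as p
import gymnasium as gym
from gymnasium import spaces

class CartPoleBulletEnv(gym.Env):
    """Sketch of a Gymnasium wrapper around a PyBullet-loaded URDF."""

    def __init__(self):
        self.client = p.connect(p.DIRECT)                     # headless physics server
        self.action_space = spaces.Discrete(2)                # push left / push right
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.robot = None

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        p.resetSimulation(physicsClientId=self.client)
        p.setGravity(0, 0, -9.81, physicsClientId=self.client)
        self.robot = p.loadURDF("urdf/cartpole.urdf", physicsClientId=self.client)
        # Disable the default velocity motor so torque commands take effect.
        p.setJointMotorControl2(self.robot, 0, p.VELOCITY_CONTROL, force=0,
                                physicsClientId=self.client)
        return self._get_obs(), {}

    def step(self, action):
        force = 10.0 if action == 1 else -10.0
        p.setJointMotorControl2(self.robot, 0, p.TORQUE_CONTROL, force=force,
                                physicsClientId=self.client)
        p.stepSimulation(physicsClientId=self.client)
        obs = self._get_obs()
        terminated = bool(abs(obs[2]) > 0.4)                  # pole angle threshold
        return obs, 1.0, terminated, False, {}

    def _get_obs(self):
        cart_pos, cart_vel = p.getJointState(self.robot, 0, physicsClientId=self.client)[:2]
        pole_ang, pole_vel = p.getJointState(self.robot, 1, physicsClientId=self.client)[:2]
        return np.array([cart_pos, cart_vel, pole_ang, pole_vel], dtype=np.float32)

    def close(self):
        p.disconnect(self.client)

Step 5 would then call gymnasium.register with an id and an entry point to this class, so the environment can be created through gym.make.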

Example Usage

# Assistive Walker (Continuous)
from environments.walker import AssistiveWalkerContinuousEnv

env = AssistiveWalkerContinuousEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for demonstration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

# CartPole (Continuous)
from environments.cartpole import CartPoleContinuousEnv

env = CartPoleContinuousEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for demonstration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

Control Algorithms

| Algorithm | Type | Action Space | Library | Notes |
| --- | --- | --- | --- | --- |
| PID | Classical | Continuous | Custom | Baseline for comparison |
| Q-Learning | RL (Classic) | Discrete | Custom | Value-based, tabular |
| PPO | Deep RL | Continuous | Stable Baselines3 | On-policy, robust, stable |
| SAC | Deep RL | Continuous | Stable Baselines3 | Off-policy, sample-efficient |
| DDPG | Deep RL | Continuous | Stable Baselines3 | Off-policy, deterministic |

The deep RL algorithms (PPO, SAC, DDPG) are trained and evaluated using Stable Baselines3, with custom wrappers for noise and logging.
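
Since the PID baseline is a custom implementation, a minimal sketch of such a controller is shown below; the gains and the mapping from pole angle to actuator command are assumptions rather than the repository's tuned values.

class PID:
    """Textbook PID loop: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, setpoint=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: derive a cart force (or a torque for both walker wheels) from the pole angle.
controller = PID(kp=60.0, ki=0.5, kd=3.0)   # illustrative gains
# force = controller.update(pole_angle, dt=1.0 / 240.0)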

Training & Evaluation Pipeline

  1. Configure environment → choose robot and action space.

  2. Select algorithm → PID, Q-Learning, PPO, SAC, or DDPG.

  3. Train → run training loop with chosen hyperparameters.

  4. Evaluate → test trained policy, collect metrics (reward, episode length, stability).

  5. Analyze → compare across algorithms and environments for insights.

Example PPO Config:

policy: MlpPolicy  
learning_rate: 0.0003  
gamma: 0.99  
batch_size: 64  
n_steps: 2048  
total_timesteps: 1000000  
action_noise: 0.1  
wandb_project: assistive-walker-ppo
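
Assuming the hyperparameters above, a minimal Stable Baselines3 training and evaluation script could look like this; the environment import mirrors the earlier usage example, and the evaluation episode count is arbitrary.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

from environments.walker import AssistiveWalkerContinuousEnv

env = AssistiveWalkerContinuousEnv()

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    gamma=0.99,
    batch_size=64,
    n_steps=2048,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)

# Evaluate the trained policy and report the metrics used for comparison.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")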

How to Run

  1. Install Dependencies

pip install gymnasium pybullet stable-baselines3 wandb

  2. Train PPO on Assistive Walker

python train/ppo_trainer.py --config configs/ppo_config.yaml

  3. Train PPO on CartPole

python train/ppo_trainer.py --config configs/cartpole_ppo_config.yaml

  4. Monitor Training
    Use Weights & Biases for live logging and visualization.
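
For step 4, the basic Weights & Biases calls are sketched below; the project name matches the config above, and the logged metric names are illustrative.

import wandb

run = wandb.init(project="assistive-walker-ppo",
                 config={"algo": "PPO", "total_timesteps": 1_000_000})

# Inside the training or evaluation loop, log whichever metrics you track.
wandb.log({"episode_reward": 123.4, "episode_length": 500})

run.finish()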

File Structure

urdf/
  walker.urdf
  cartpole.urdf
environments/
  walker.py
  cartpole.py
train/
  basetrainer.py
  ppo_trainer.py
  utils/
    callbacks.py
    logger.py
    configloader.py
configs/
  ppo_config.yaml
  cartpole_ppo_config.yaml
README.txt

Research Insights

  • PID: Fast, interpretable, but limited adaptability to nonlinearities and disturbances.

  • Q-Learning: Effective for simple, discrete tasks; doesn’t scale well to high-dimensional or continuous domains.

  • Deep RL (PPO, SAC, DDPG): Superior in complex, noisy, continuous environments; robust to varied initial conditions.

  • IMU Integration (Walker): Improves state estimation and reward shaping for robustness.

  • Realistic Physics (URDF + PyBullet): Ensures learned policies are physically plausible and transferable.

License

Licensed under the MIT License.

Contact

For technical questions or collaboration, open an issue or contact the maintainers.
