Analysis of Deep RL, Traditional RL and PID Control for Assistive Walker and CartPole Systems

Designed a unified research framework benchmarking five controllers (PID, Q-Learning, PPO, SAC, and DDPG) on custom Assistive Walker and CartPole systems.

Overview

GitHub repo

This repository provides a unified research platform for benchmarking classical and modern control strategies — including PID, Q-Learning, PPO, SAC, and DDPG — on two custom robotic systems: an Assistive Walker and a CartPole.
Both systems are modeled using URDF for realistic physics and simulated in PyBullet, with custom Gymnasium environments for reinforcement learning research.

Table of Contents

  • Project Objectives

  • System Architecture

  • Assistive Walker

  • CartPole

  • Custom Environment Creation

  • Control Algorithms

  • Training & Evaluation Pipeline

  • How to Run

  • File Structure

  • Research Insights

  • License

Project Objectives

  • Develop custom URDF models for both systems, capturing realistic mechanical properties.

  • Implement Gymnasium-compatible environments using PyBullet for physics simulation.

  • Train and benchmark PID, Q-Learning, PPO, SAC, and DDPG controllers.

  • Compare performance using metrics such as episode length, cumulative reward, and stability.

  • Analyze strengths and limitations of each control strategy for both robots.

System Architecture

| Layer | Description |
| --- | --- |
| URDF Model | Defines robot structure, joints, inertia, friction, and sensors. |
| PyBullet | Loads URDF, simulates physics, provides state and control APIs. |
| Environment | Gymnasium-compatible class defining observations, actions, rewards, and episode logic. |
| RL Algorithm | Agent interacts with environment, learns to optimize reward. |

Assistive Walker

Description:
A differential-drive walker with two powered wheels, an assistive handle (pole), and a simulated IMU sensor. Designed for research in stabilization, navigation, and user-adaptive control.

URDF Highlights:

  • Base: Rigid box (4.0 kg), realistic inertia.

  • Wheels: Two, each 0.8 kg, high friction for realistic drive.

  • Pole: 1.2 kg, 1.0 m, revolute joint for handle dynamics.

  • IMU: Simulated MPU6050 providing orientation, angular velocity, and linear acceleration.
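
Since PyBullet has no built-in MPU6050, the IMU readings are presumably derived from the base link state. The sketch below illustrates one common way to do this; the function name, noise level, and use of finite differences for linear acceleration are assumptions, not the repository's implementation.

import numpy as np
import pybullet as p

def read_imu(robot_id, prev_lin_vel, dt, noise_std=0.01):
    """Approximate MPU6050-style readings from the walker's base link."""
    _, orn = p.getBasePositionAndOrientation(robot_id)
    lin_vel, ang_vel = p.getBaseVelocity(robot_id)

    roll, pitch, yaw = p.getEulerFromQuaternion(orn)               # orientation
    lin_acc = (np.array(lin_vel) - np.array(prev_lin_vel)) / dt    # finite-difference acceleration

    # Additive Gaussian noise roughly mimics sensor imperfection.
    reading = np.concatenate([[roll, pitch, yaw], ang_vel, lin_acc])
    reading += np.random.normal(0.0, noise_std, size=reading.shape)
    return reading, lin_vel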

Environment:

  • Observation: Pole angle/velocity, base pose, wheel velocities, IMU data.

  • Action:

    • Discrete → {left, right, stop}

    • Continuous → [left_wheel_torque, right_wheel_torque]

  • Reward: Penalizes pole deviation, displacement, and excessive wheel velocity (a shaping sketch follows this list).

  • Termination: Pole falls or walker moves out of bounds.
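
As a rough illustration of the reward shaping described above, the snippet below penalizes pole deviation, base displacement, and wheel speed. The weights and state names are illustrative assumptions, not the repository's actual coefficients.

import numpy as np

def walker_reward(pole_angle, base_xy, wheel_velocities,
                  w_angle=1.0, w_pos=0.1, w_wheel=0.01):
    """Hypothetical shaping: keep the pole upright, stay near the origin, drive gently."""
    angle_penalty = w_angle * pole_angle ** 2
    position_penalty = w_pos * float(np.dot(base_xy, base_xy))
    wheel_penalty = w_wheel * float(np.sum(np.square(wheel_velocities)))
    return 1.0 - (angle_penalty + position_penalty + wheel_penalty)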

CartPole

Description:
A classic inverted pendulum system with a sliding cart and pole, implemented with a custom URDF for realistic simulation.

URDF Highlights:

  • Track: Fixed, 30×0.05×0.05 m (visual only).

  • Cart: 0.5×0.5×0.2 m, 4 kg, prismatic joint for horizontal motion.

  • Pole: 1 m, 1 kg, continuous joint for rotation.

  • Friction/Damping: Realistic values for both cart and pole for stable physics.

Environment:

  • Observation: Cart position/velocity, pole angle/velocity (the corresponding Gymnasium spaces are sketched after this list).

  • Action:

    • Discrete → {left, right}

    • Continuous → Apply force/torque to cart.

  • Reward: +1 for each step the pole remains balanced.

  • Termination: Pole falls or cart moves off track.
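
For reference, this specification maps naturally onto Gymnasium spaces. The bounds and force limits below are illustrative placeholders rather than the repository's exact values.

import numpy as np
from gymnasium import spaces

# Observation: [cart position, cart velocity, pole angle, pole angular velocity]
observation_space = spaces.Box(
    low=np.array([-15.0, -np.inf, -np.pi, -np.inf], dtype=np.float32),
    high=np.array([15.0, np.inf, np.pi, np.inf], dtype=np.float32),
)

# Discrete variant: push left or push right with a fixed force.
discrete_action_space = spaces.Discrete(2)

# Continuous variant: signed force applied to the cart (newtons).
continuous_action_space = spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)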

Custom Environment Creation

Both environments are implemented as Python classes inheriting from gymnasium.Env; a minimal skeleton following these steps is sketched after the list below.

Key Steps

  1. URDF Modeling: Define robot structure and joints.

  2. PyBullet Integration: Load URDF, set up physics simulation.

  3. Observation & Action Spaces: Define what the agent sees and controls.

  4. Reward & Episode Logic: Specify how agents are scored and when episodes end.

  5. Registration: Register with Gymnasium for use in RL pipelines.
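
A minimal skeleton of these five steps might look like the following. The URDF path, joint indices, force magnitude, and termination threshold are placeholders, and PyBullet runs headless (DIRECT mode) for brevity.

import numpy as np
import pybullet as p
import gymnasium as gym
from gymnasium import spaces

class CartPoleBulletEnv(gym.Env):
    """Sketch of a Gymnasium wrapper around a PyBullet-loaded URDF."""

    def __init__(self):
        self.client = p.connect(p.DIRECT)                     # headless physics server
        self.action_space = spaces.Discrete(2)                # push left / push right
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.robot = None

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        p.resetSimulation(physicsClientId=self.client)
        p.setGravity(0, 0, -9.81, physicsClientId=self.client)
        self.robot = p.loadURDF("urdf/cartpole.urdf", physicsClientId=self.client)
        # Disable the default velocity motor so torque commands take effect.
        p.setJointMotorControl2(self.robot, 0, p.VELOCITY_CONTROL, force=0,
                                physicsClientId=self.client)
        return self._get_obs(), {}

    def step(self, action):
        force = 10.0 if action == 1 else -10.0
        p.setJointMotorControl2(self.robot, 0, p.TORQUE_CONTROL, force=force,
                                physicsClientId=self.client)
        p.stepSimulation(physicsClientId=self.client)
        obs = self._get_obs()
        terminated = bool(abs(obs[2]) > 0.4)                  # pole angle threshold
        return obs, 1.0, terminated, False, {}

    def _get_obs(self):
        cart_pos, cart_vel = p.getJointState(self.robot, 0, physicsClientId=self.client)[:2]
        pole_ang, pole_vel = p.getJointState(self.robot, 1, physicsClientId=self.client)[:2]
        return np.array([cart_pos, cart_vel, pole_ang, pole_vel], dtype=np.float32)

    def close(self):
        p.disconnect(self.client)

Step 5 would then call gymnasium.register with an id and an entry point to this class, so the environment can be created through gym.make.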

Example Usage

# Assistive Walker (Continuous)
from environments.walker import AssistiveWalkerContinuousEnv

env = AssistiveWalkerContinuousEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for demonstration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

# CartPole (Continuous)
from environments.cartpole import CartPoleContinuousEnv

env = CartPoleContinuousEnv()
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for demonstration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()

Control Algorithms

| Algorithm | Type | Action Space | Library | Notes |
| --- | --- | --- | --- | --- |
| PID | Classical | Continuous | Custom | Baseline for comparison |
| Q-Learning | RL (Classic) | Discrete | Custom | Value-based, tabular |
| PPO | Deep RL | Continuous | Stable Baselines3 | On-policy, robust, stable |
| SAC | Deep RL | Continuous | Stable Baselines3 | Off-policy, sample-efficient |
| DDPG | Deep RL | Continuous | Stable Baselines3 | Off-policy, deterministic |

The deep RL algorithms (PPO, SAC, DDPG) are trained and evaluated using Stable Baselines3, with custom wrappers for noise and logging.
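
Since the PID baseline is a custom implementation, a minimal sketch of such a controller is shown below; the gains and the mapping from pole angle to actuator command are assumptions rather than the repository's tuned values.

class PID:
    """Textbook PID loop: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, setpoint=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: derive a cart force (or a torque for both walker wheels) from the pole angle.
controller = PID(kp=60.0, ki=0.5, kd=3.0)   # illustrative gains
# force = controller.update(pole_angle, dt=1.0 / 240.0)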

Training & Evaluation Pipeline

  1. Configure environment → choose robot and action space.

  2. Select algorithm → PID, Q-Learning, PPO, SAC, or DDPG.

  3. Train → run training loop with chosen hyperparameters.

  4. Evaluate → test trained policy, collect metrics (reward, episode length, stability).

  5. Analyze → compare across algorithms and environments for insights.

Example PPO Config:

policy: MlpPolicy  
learning_rate: 0.0003  
gamma: 0.99  
batch_size: 64  
n_steps: 2048  
total_timesteps: 1000000  
action_noise: 0.1  
wandb_project: assistive-walker-ppo
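
Assuming the hyperparameters above, a minimal Stable Baselines3 training and evaluation script could look like this; the environment import mirrors the earlier usage example, and the evaluation episode count is arbitrary.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

from environments.walker import AssistiveWalkerContinuousEnv

env = AssistiveWalkerContinuousEnv()

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,
    gamma=0.99,
    batch_size=64,
    n_steps=2048,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)

# Evaluate the trained policy and report the metrics used for comparison.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")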

How to Run

  1. Install Dependencies

pip install gymnasium pybullet stable-baselines3 wandb

  2. Train PPO on Assistive Walker

python train/ppo_trainer.py --config configs/ppo_config.yaml

  3. Train PPO on CartPole

python train/ppo_trainer.py --config configs/cartpole_ppo_config.yaml

  4. Monitor Training
    Use Weights & Biases for live logging and visualization.
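
For step 4, the basic Weights & Biases calls are sketched below; the project name matches the config above, and the logged metric names are illustrative.

import wandb

run = wandb.init(project="assistive-walker-ppo",
                 config={"algo": "PPO", "total_timesteps": 1_000_000})

# Inside the training or evaluation loop, log whichever metrics you track.
wandb.log({"episode_reward": 123.4, "episode_length": 500})

run.finish()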

File Structure

urdf/
  walker.urdf
  cartpole.urdf
environments/
  walker.py
  cartpole.py
train/
  basetrainer.py
  ppo_trainer.py
  utils/
    callbacks.py
    logger.py
    configloader.py
configs/
  ppo_config.yaml
  cartpole_ppo_config.yaml
README.txt

Research Insights

  • PID: Fast, interpretable, but limited adaptability to nonlinearities and disturbances.

  • Q-Learning: Effective for simple, discrete tasks; doesn’t scale well to high-dimensional or continuous domains.

  • Deep RL (PPO, SAC, DDPG): Superior in complex, noisy, continuous environments; robust to varied initial conditions.

  • IMU Integration (Walker): Improves state estimation and reward shaping for robustness.

  • Realistic Physics (URDF + PyBullet): Ensures learned policies are physically plausible and transferable.

License

Licensed under the MIT License.

Contact

For technical questions or collaboration, open an issue or contact the maintainers.
