Analysis of Deep RL, Traditional RL and PID Control for Assistive Walker and CartPole Systems
A unified research framework benchmarking five controllers (PID, Q-Learning, PPO, SAC, and DDPG) on custom Assistive Walker and CartPole systems.
Overview
This repository provides a unified research platform for benchmarking classical and modern control strategies — including PID, Q-Learning, PPO, SAC, and DDPG — on two custom robotic systems: an Assistive Walker and a CartPole.
Both systems are modeled using URDF for realistic physics and simulated in PyBullet, with custom Gymnasium environments for reinforcement learning research.
Table of Contents
Project Objectives
System Architecture
Assistive Walker
CartPole
Custom Environment Creation
Control Algorithms
Training & Evaluation Pipeline
How to Run
File Structure
Research Insights
References
License
Contact
Project Objectives
Develop custom URDF models for both systems, capturing realistic mechanical properties.
Implement Gymnasium-compatible environments using PyBullet for physics simulation.
Train and benchmark PID, Q-Learning, PPO, SAC, and DDPG controllers.
Compare performance using metrics such as episode length, cumulative reward, and stability.
Analyze strengths and limitations of each control strategy for both robots.
System Architecture
| Layer | Description |
|---|---|
| URDF Model | Defines robot structure, joints, inertia, friction, and sensors. |
| PyBullet | Loads the URDF, simulates physics, and provides state and control APIs. |
| Environment | Gymnasium-compatible class defining observations, actions, rewards, and episode logic. |
| RL Algorithm | Agent interacts with the environment and learns to optimize reward. |
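To make the layering concrete, here is a minimal sketch of how PyBullet sits underneath the environment. The URDF file name and joint index are placeholders (PyBullet's bundled `cartpole.urdf` is used only as a stand-in), not the models shipped with this repository:

```python
import pybullet as p
import pybullet_data

# Layers 1-2: load a URDF into a PyBullet physics server.
client = p.connect(p.DIRECT)                      # headless; use p.GUI to visualize
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")
robot_id = p.loadURDF("cartpole.urdf")            # stand-in path; the repo ships its own URDFs

# Layers 3-4: the Gymnasium environment wraps calls like these into reset()/step(),
# and the RL agent only ever sees the resulting observations and rewards.
p.setJointMotorControl2(robot_id, jointIndex=0,
                        controlMode=p.TORQUE_CONTROL, force=1.0)
p.stepSimulation()
joint_position, joint_velocity = p.getJointState(robot_id, 0)[:2]
p.disconnect(client)
```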
Assistive Walker
Description:
A differential-drive walker with two powered wheels, an assistive handle (pole), and a simulated IMU sensor. Designed for research in stabilization, navigation, and user-adaptive control.

URDF Highlights:
Base: Rigid box (4.0 kg), realistic inertia.
Wheels: Two, each 0.8 kg, high friction for realistic drive.
Pole: 1.2 kg, 1.0 m, revolute joint for handle dynamics.
IMU: Simulated MPU6050 providing orientation, angular velocity, and linear acceleration.
Environment:
Observation: Pole angle/velocity, base pose, wheel velocities, IMU data.
Action:
Discrete → {left, right, stop}
Continuous → [left_wheel_torque, right_wheel_torque]
Reward: Penalizes pole deviation, displacement, and excessive wheel velocity.
Termination: Pole falls or walker moves out of bounds.
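A minimal sketch of how such a reward and termination rule can be written; the weight names, weights, and thresholds below are illustrative assumptions, not the tuned values used in this repository:

```python
import numpy as np

def walker_reward(pole_angle, base_displacement, wheel_velocities,
                  w_angle=1.0, w_disp=0.1, w_vel=0.01):
    """Penalize pole deviation, base displacement, and excessive wheel speed."""
    return -(w_angle * pole_angle ** 2
             + w_disp * base_displacement ** 2
             + w_vel * float(np.sum(np.square(wheel_velocities))))

def walker_terminated(pole_angle, base_displacement, max_angle=0.5, bound=5.0):
    """End the episode when the pole falls or the walker leaves its workspace."""
    return abs(pole_angle) > max_angle or abs(base_displacement) > bound
```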
CartPole
Description:
A classic inverted pendulum system with a sliding cart and pole, implemented with a custom URDF for realistic simulation.
URDF Highlights:
Track: Fixed, 30×0.05×0.05 m (visual only).
Cart: 0.5×0.5×0.2 m, 4 kg, prismatic joint for horizontal motion.
Pole: 1 m, 1 kg, continuous joint for rotation.
Friction/Damping: Realistic values for both cart and pole for stable physics.
Environment:
Observation: Cart position/velocity, pole angle/velocity.
Action:
Discrete → {left, right}
Continuous → Apply force/torque to cart.
Reward: +1 for each step the pole remains balanced.
Termination: Pole falls or cart moves off track.
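A small sketch of this reward and termination logic; the limits below are the classic CartPole thresholds used as assumed placeholders, not the bounds of the 30 m track in this repository:

```python
import math

def cartpole_step_outcome(cart_position, pole_angle,
                          x_limit=2.4, angle_limit=12 * math.pi / 180):
    """+1 per balanced step; terminate when the pole falls or the cart leaves the track."""
    terminated = abs(cart_position) > x_limit or abs(pole_angle) > angle_limit
    reward = 0.0 if terminated else 1.0
    return reward, terminated
```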
Custom Environment Creation
Both environments are implemented as Python classes inheriting from gymnasium.Env.
Key Steps
URDF Modeling: Define robot structure and joints.
PyBullet Integration: Load URDF, set up physics simulation.
Observation & Action Spaces: Define what the agent sees and controls.
Reward & Episode Logic: Specify how agents are scored and when episodes end.
Registration: Register with Gymnasium for use in RL pipelines.
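The skeleton below illustrates these steps in one class. The names `WalkerEnv`, `walker.urdf`, `AssistiveWalker-v0`, the joint indices, and the space bounds are all placeholders, not the identifiers actually used in this repository:

```python
import gymnasium as gym
import numpy as np
import pybullet as p
from gymnasium import spaces

class WalkerEnv(gym.Env):
    """Skeleton of a PyBullet-backed Gymnasium environment (structure only)."""

    def __init__(self, urdf_path="walker.urdf"):
        self.client = p.connect(p.DIRECT)
        self.urdf_path = urdf_path
        self.robot = None
        # Continuous wheel torques; observation bounds are placeholders.
        self.action_space = spaces.Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        p.resetSimulation(physicsClientId=self.client)
        p.setGravity(0, 0, -9.81, physicsClientId=self.client)
        self.robot = p.loadURDF(self.urdf_path, physicsClientId=self.client)
        return self._get_obs(), {}

    def step(self, action):
        p.setJointMotorControlArray(self.robot, [0, 1], p.TORQUE_CONTROL,
                                    forces=list(action), physicsClientId=self.client)
        p.stepSimulation(physicsClientId=self.client)
        obs = self._get_obs()
        reward, terminated = self._reward_and_termination(obs)
        return obs, reward, terminated, False, {}

    def _get_obs(self):
        # Placeholder: a full implementation reads wheel joint states, base pose,
        # and simulated IMU data from PyBullet.
        return np.zeros(8, dtype=np.float32)

    def _reward_and_termination(self, obs):
        # Placeholder: combine the pole-angle, displacement, and wheel-speed
        # penalties (see the reward sketch above) and check the bounds.
        return 0.0, False

# Registration makes the environment available through gym.make().
gym.register(id="AssistiveWalker-v0", entry_point=WalkerEnv)
```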
Example Usage
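A minimal usage sketch, assuming the environment has been registered under an ID such as `AssistiveWalker-v0` (the actual ID in this repo may differ):

```python
import gymnasium as gym

env = gym.make("AssistiveWalker-v0")   # placeholder ID; use the name registered by the repo
obs, info = env.reset(seed=0)

for _ in range(500):
    action = env.action_space.sample()   # random actions, just to exercise the env
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```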
Control Algorithms
| Algorithm | Type | Action Space | Library | Notes |
|---|---|---|---|---|
| PID | Classical | Continuous | Custom | Baseline for comparison |
| Q-Learning | RL (Classic) | Discrete | Custom | Value-based, tabular |
| PPO | Deep RL | Continuous | Stable Baselines3 | On-policy, robust, stable |
| SAC | Deep RL | Continuous | Stable Baselines3 | Off-policy, sample-efficient |
| DDPG | Deep RL | Continuous | Stable Baselines3 | Off-policy, deterministic |
The deep RL algorithms (PPO, SAC, DDPG) are trained and evaluated using Stable Baselines3, with custom wrappers for noise injection and logging.
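The PID baseline is a custom implementation. A minimal sketch of a discrete-time PID controller acting on the pole-angle error; the gains and timestep below are illustrative assumptions, not the tuned values used in the experiments:

```python
class PID:
    """Simple discrete-time PID controller on a scalar error signal."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def reset(self):
        self.integral = 0.0
        self.prev_error = 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: drive the pole angle toward zero at PyBullet's default 240 Hz timestep.
controller = PID(kp=50.0, ki=0.5, kd=2.0, dt=1.0 / 240.0)
force = controller.control(error=0.05)   # pole-angle error in radians
```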
Training & Evaluation Pipeline
Configure environment → choose robot and action space.
Select algorithm → PID, Q-Learning, PPO, SAC, or DDPG.
Train → run training loop with chosen hyperparameters.
Evaluate → test trained policy, collect metrics (reward, episode length, stability).
Analyze → compare across algorithms and environments for insights.
Example PPO Config:
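The repository's actual configuration format is not reproduced here; the sketch below passes typical PPO hyperparameters directly to Stable Baselines3, and all values are common defaults rather than the tuned settings from the experiments (the env ID is also a placeholder):

```python
from stable_baselines3 import PPO

ppo_kwargs = dict(
    policy="MlpPolicy",
    learning_rate=3e-4,
    n_steps=2048,        # rollout length per update
    batch_size=64,
    gamma=0.99,          # discount factor
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.0,
    verbose=1,
)

# Placeholder env ID; substitute the custom Walker or CartPole environment.
model = PPO(env="CartPole-v1", **ppo_kwargs)
```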
How to Run
Install Dependencies
Train PPO on Assistive Walker
Train PPO on CartPole
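The repository's own training entry points are not shown here; a minimal Stable Baselines3 training sketch that covers either system could look like the following, where the env IDs, timestep budget, and save path are assumed placeholders:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Swap the ID for the registered CartPole environment name
# to train on the CartPole instead of the Assistive Walker.
env = gym.make("AssistiveWalker-v0")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("ppo_assistive_walker")

# Quick rollout of the trained policy.
obs, info = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```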
Monitor Training
Use Weights & Biases for live logging and visualization.
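One way to wire this up, assuming the `wandb` package and its Stable Baselines3 integration are installed; the project name and env ID below are placeholders:

```python
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import PPO

run = wandb.init(project="walker-cartpole-benchmark", sync_tensorboard=True)

model = PPO("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())
run.finish()
```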
File Structure
Research Insights
PID: Fast, interpretable, but limited adaptability to nonlinearities and disturbances.
Q-Learning: Effective for simple, discrete tasks; doesn’t scale well to high-dimensional or continuous domains.
Deep RL (PPO, SAC, DDPG): Superior in complex, noisy, continuous environments; robust to varied initial conditions.
IMU Integration (Walker): Improves state estimation and reward shaping for robustness.
Realistic Physics (URDF + PyBullet): Ensures learned policies are physically plausible and transferable.
License
Licensed under the MIT License.
Contact
For technical questions or collaboration, open an issue or contact the maintainers.
