Unity ML-Agents / deep reinforcement learning

RL Drone Obstacle Course

A quadcopter simulation trained with Unity ML-Agents to navigate an obstacle course using raycast observations, continuous thrust actions, PPO, curriculum learning, behavioral cloning, and GAIL.

What this proves

Autonomous drone behavior trained in randomized simulation.

The project explored Unity as a robotics simulation environment and ML-Agents as a reinforcement-learning sandbox. The agent first learned to make forward progress, then learned to avoid randomized cube obstacles, with human demonstrations supplying the imitation-learning signals.

Simulator: Unity + NVIDIA PhysX rigid-body simulation
Learning: PPO with curriculum learning, behavioral cloning, and GAIL
Observations: Drone position, velocity, angular velocity, and raycast obstacle distances/normals
Actions: Continuous forward, sideways, and hover thrust commands

Demo artifact

Trained drone policy navigating randomized obstacles.

The final demo recording shows the trained drone behavior and links the result back to the report, agent configuration, and exported policy artifact.

Code-level details

Reward design, sensors, and action space.

Raycast Perception

The agent cast two rings of eight rays around the drone and observed normalized hit distance plus surface normal for obstacle awareness.
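
As a rough illustration of that observation layout, the Python sketch below builds two rings of eight ray directions and flattens each hit into a normalized distance plus a surface normal. It is not the project's Unity C# code: the ring elevation angles, the 20-unit maximum ray length, and the encoding of a miss as distance 1.0 with a zero normal are all assumptions made for the example.

```python
import math

def ring_directions(rays_per_ring=8, elevations_deg=(0.0, -30.0)):
    """Unit direction vectors for rings of evenly spaced rays (y is up).

    The elevation angles are assumed values, not taken from the project.
    """
    directions = []
    for elevation in elevations_deg:
        pitch = math.radians(elevation)
        for i in range(rays_per_ring):
            yaw = 2.0 * math.pi * i / rays_per_ring
            directions.append((
                math.cos(pitch) * math.sin(yaw),
                math.sin(pitch),
                math.cos(pitch) * math.cos(yaw),
            ))
    return directions

def encode_hits(hits, max_distance=20.0):
    """Flatten ray results into [normalized distance, nx, ny, nz] per ray.

    hits: list of None (miss) or (distance, (nx, ny, nz)).
    A miss is encoded as distance 1.0 with a zero normal (assumed convention).
    """
    obs = []
    for hit in hits:
        if hit is None:
            obs.extend([1.0, 0.0, 0.0, 0.0])
        else:
            distance, normal = hit
            obs.append(min(max(distance / max_distance, 0.0), 1.0))
            obs.extend(normal)
    return obs
```

With these assumptions the raycast part of the observation has 16 rays x 4 values = 64 entries, to which position, velocity, and angular velocity would be appended.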

Continuous Control

ML-Agents actions were clamped to -1..1 and mapped into forward, sideways, and hover thrust inputs on a Rigidbody-based drone controller.
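
The clamp-and-scale step can be sketched in a few lines of Python. The function names and the maximum thrust values here are hypothetical; the actual controller is a Unity C# script acting on a Rigidbody.

```python
def clamp(value, low=-1.0, high=1.0):
    """Limit a raw policy output to the range [low, high]."""
    return max(low, min(high, value))

def actions_to_thrust(actions, max_forward=10.0, max_sideways=10.0, max_hover=15.0):
    """Map three continuous actions to thrust inputs.

    actions: (forward, sideways, hover) as emitted by the policy.
    The max_* scale factors are assumed values for illustration.
    """
    forward, sideways, hover = (clamp(a) for a in actions)
    return {
        "forward": forward * max_forward,
        "sideways": sideways * max_sideways,
        "hover": hover * max_hover,
    }
```

Clamping before scaling keeps an out-of-range policy output from commanding more thrust than the controller was tuned for.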

Randomized Obstacles

A cube spawner generated obstacles with randomized positions and sizes, then respawned them at the start of each training episode.
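
A minimal Python sketch of per-episode randomization follows. The obstacle count, course bounds, and size range are assumed values, and the dictionary layout is only a stand-in for spawned Unity GameObjects.

```python
import random

def spawn_cubes(rng, count=10,
                course_min=(-5.0, 0.5, 5.0),
                course_max=(5.0, 4.0, 45.0),
                size_range=(0.5, 2.0)):
    """Return `count` cubes with random positions and sizes.

    course_min / course_max bound the (x, y, z) spawn volume; all
    defaults are hypothetical, not the project's actual settings.
    """
    cubes = []
    for _ in range(count):
        position = tuple(rng.uniform(lo, hi)
                         for lo, hi in zip(course_min, course_max))
        size = rng.uniform(*size_range)
        cubes.append({"position": position, "size": size})
    return cubes

def reset_episode(rng):
    """Discard the previous layout and generate a fresh one."""
    return spawn_cubes(rng)
```

Passing an explicit `random.Random(seed)` makes a layout reproducible, which helps when comparing policies on identical courses.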

Reward Iteration

The final reward emphasized forward progress and finish-line success, with penalties for collisions and boundary violations, after earlier reward mixtures produced undesirable crashing behavior.
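
One way to express a reward with those four terms is sketched below in Python. The weights, the use of the z axis as the course direction, and the choice to end the episode on a collision or boundary violation are assumptions; the report and agent scripts hold the actual values.

```python
def step_reward(prev_z, curr_z, reached_finish, collided, out_of_bounds,
                progress_scale=0.1, finish_bonus=1.0,
                collision_penalty=-1.0, boundary_penalty=-1.0):
    """Return (reward, episode_done) for one simulation step.

    Forward progress is measured as movement along +z (assumed course
    direction). All weights are hypothetical placeholders.
    """
    if reached_finish:
        return finish_bonus, True
    if collided:
        return collision_penalty, True
    if out_of_bounds:
        return boundary_penalty, True
    # Dense shaping term: positive when moving toward the finish line,
    # negative when drifting backward.
    return progress_scale * (curr_z - prev_z), False
```

Keeping the per-step progress term small relative to the terminal penalties is one way to discourage a policy from trading a crash for a little extra distance.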

Imitation Learning

Human controller demonstrations were used with behavioral cloning and GAIL to speed up training and improve results over the earlier pure-PPO attempts.

Training Output

The archive includes a trained ONNX policy, PyTorch checkpoint, ML-Agents YAML config, final report, and Unity C# agent scripts.

Future iteration

A foundation for simulation QA and sim-to-real thinking.

Convert the demo video to MP4 and add training-curve screenshots.
Run policy comparisons across obstacle counts, random seeds, and raycast layouts.
Add automated scenario sweeps for success rate, collisions, and time-to-goal.
Compare Unity ML-Agents against Isaac Sim, Gazebo, or MuJoCo for similar tasks.
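
The scenario-sweep metrics listed above could be aggregated along these lines. This is a Python sketch only; the per-episode record fields are hypothetical and no such logging exists in the archive yet.

```python
from statistics import mean

def summarize(episodes):
    """Aggregate per-episode records into sweep metrics.

    Each record is a dict with hypothetical keys:
      success (bool), collisions (int), time_to_goal (float or None).
    """
    n = len(episodes)
    times = [e["time_to_goal"] for e in episodes if e["success"]]
    return {
        "success_rate": sum(1 for e in episodes if e["success"]) / n,
        "collisions_per_episode": sum(e["collisions"] for e in episodes) / n,
        # Time-to-goal is only defined for episodes that reached the finish.
        "mean_time_to_goal": mean(times) if times else None,
    }
```

Running the same summary per obstacle count, seed, or raycast layout would give directly comparable rows for the policy comparisons.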