---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---

# **ppo** Agent playing **SnowballTarget**
This is a trained model of a **ppo** agent playing **SnowballTarget** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)
Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

We wrote tutorials that show how to train your first agent with ML-Agents and publish it to the Hub:
- A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
- A *longer tutorial* to understand how ML-Agents works: https://huggingface.co/learn/deep-rl-course/unit5/introduction

### Resume the training
To continue an earlier run, pass its configuration file and the same run ID together with `--resume`:
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```

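`--resume` only makes sense if the earlier run's output folder still exists. `mlagents-learn` writes each run to `results/<run_id>` by default. The following is a minimal sketch, assuming that default layout, which checks for the folder before building the command. `build_resume_command` and `DEFAULT_RESULTS_DIR` are hypothetical names and are not part of ML-Agents.

```python
from pathlib import Path
from typing import List

DEFAULT_RESULTS_DIR = "results"  # assumed: mlagents-learn's default output directory


def build_resume_command(
    config_path: str, run_id: str, results_dir: str = DEFAULT_RESULTS_DIR
) -> List[str]:
    """Build the `mlagents-learn ... --resume` argument list for an earlier run.

    Hypothetical helper (not part of ML-Agents): it fails early when the
    run folder `<results_dir>/<run_id>` is missing, because there would be
    nothing to resume from.
    """
    run_dir = Path(results_dir) / run_id
    if not run_dir.is_dir():
        raise FileNotFoundError(
            f"No previous run found at {run_dir}; start a new run without --resume."
        )
    cmd = ["mlagents-learn", config_path, f"--run-id={run_id}", "--resume"]
    if results_dir != DEFAULT_RESULTS_DIR:
        # Point mlagents-learn at the same non-default folder used originally.
        cmd.append(f"--results-dir={results_dir}")
    return cmd
```

For example, `subprocess.run(build_resume_command("config/SnowballTarget.yaml", "SnowballTarget1"), check=True)` would relaunch training. Both arguments are placeholders for your own config path and run ID.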
### Watch your Agent play
You can watch your agent **playing directly in your browser**:

1. If the environment is one of the official ML-Agents environments, go to https://huggingface.co/unity
2. Find your model_id: lambdavi/ppo-SnowballTarget
3. Select your `*.nn` or `*.onnx` file
4. Click **Watch the agent play** 👀

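Step 3 asks for a model file. A repository usually also holds a README, configs and logs, so it can help to filter the file list down to the two formats named there. The following is a minimal sketch. `find_model_files` and `MODEL_SUFFIXES` are hypothetical names used only for illustration.

```python
from typing import Iterable, List

# The two policy formats named in step 3 (hypothetical constant).
MODEL_SUFFIXES = (".onnx", ".nn")


def find_model_files(filenames: Iterable[str]) -> List[str]:
    """Return the file names that are exported policies (.onnx or .nn).

    Sorted so the result does not depend on the order of the input listing.
    """
    return sorted(name for name in filenames if name.lower().endswith(MODEL_SUFFIXES))
```

With network access, `huggingface_hub.list_repo_files("lambdavi/ppo-SnowballTarget")` returns the repository's file names, and that list can be passed straight to `find_model_files`.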
### Hyperparameters used
```yaml
SnowballTarget:
  trainer_type: ppo
  hyperparameters:
    batch_size: 128
    buffer_size: 2048
    learning_rate: 0.005
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 5
    shared_critic: False
    learning_rate_schedule: linear
    beta_schedule: linear
    epsilon_schedule: linear
  checkpoint_interval: 50000
  network_settings:
    normalize: False
    hidden_units: 256
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
      network_settings:
        normalize: False
        hidden_units: 128
        num_layers: 2
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
  init_path: None
  keep_checkpoints: 10
  even_checkpoints: False
  max_steps: 500000
  time_horizon: 64
  summary_freq: 10000
  threaded: True
  self_play: None
  behavioral_cloning: None
```
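Two parts of this configuration are easy to misread. The first is how much gradient work one PPO update does. The second is how the `linear` schedules move `learning_rate`, `beta` and `epsilon` over training.

The following is a minimal sketch under stated assumptions; it is not ML-Agents' own code:
- Each update runs `num_epoch` passes over a full `buffer_size` buffer split into `batch_size` minibatches. This holds for a non-recurrent setup, which matches `memory: None` here.
- A linear schedule decays each value until `max_steps`, reaching 0 for `learning_rate`, 1e-5 for `beta` and 0.1 for `epsilon`, as the ML-Agents configuration docs describe.

`PPOSettings`, `LINEAR_FLOORS` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PPOSettings:
    # Values copied from the configuration above.
    batch_size: int = 128
    buffer_size: int = 2048
    num_epoch: int = 5
    max_steps: int = 500_000
    learning_rate: float = 0.005
    beta: float = 0.005
    epsilon: float = 0.2


# Assumed end points of the `linear` schedules (per the ML-Agents config docs).
LINEAR_FLOORS = {"learning_rate": 0.0, "beta": 1e-5, "epsilon": 0.1}


def linear_schedule(initial: float, floor: float, step: int, max_steps: int) -> float:
    """Decay `initial` linearly to `floor` at `max_steps`, then hold it there."""
    progress = min(step, max_steps) / max_steps
    return (initial - floor) * (1.0 - progress) + floor


def gradient_steps_per_update(s: PPOSettings) -> int:
    """Minibatch gradient steps in one PPO update: epochs x minibatches per buffer."""
    return s.num_epoch * (s.buffer_size // s.batch_size)


def approx_num_updates(s: PPOSettings) -> int:
    """Rough count of policy updates: one each time the buffer fills."""
    return s.max_steps // s.buffer_size


if __name__ == "__main__":
    s = PPOSettings()
    print("gradient steps per update:", gradient_steps_per_update(s))  # 5 * 16 = 80
    print("approx. updates over training:", approx_num_updates(s))     # 244
    halfway = s.max_steps // 2
    for name, floor in LINEAR_FLOORS.items():
        start = getattr(s, name)
        value = linear_schedule(start, floor, halfway, s.max_steps)
        print(f"{name}: {start} -> {value:.6g} at step {halfway}")
```

Under these assumptions, `epsilon` (the PPO clipping range) never drops below 0.1, while the learning rate approaches zero as training reaches `max_steps`.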