Note
This example is compatible with Gymnasium version 1.2.0.
Load custom quadruped robot environments
In this tutorial you will create a MuJoCo quadruped walking environment from a model file (ending in .xml), without writing a new environment class.
Steps
- Get your robot's MJCF (or URDF) model file.
  - Create your own model (see the MuJoCo Guide), or
  - find a ready-made one (in this tutorial, we will use a model from the MuJoCo Menagerie collection).
- Load the model with the xml_file argument.
- Tweak the environment parameters to get the desired behavior.
  - Tweak the environment simulation parameters.
  - Tweak the environment termination parameters.
  - Tweak the environment reward parameters.
  - Tweak the environment observation parameters.
- Train an agent to move your robot.
# The reader is expected to be familiar with the `Gymnasium` API & library, the basics of robotics,
# and the included `Gymnasium/MuJoCo` environments with the robot model they use.
# Familiarity with the **MJCF** file model format and the `MuJoCo` simulator is not required but is recommended.
Setup
We require gymnasium>=1.0.0.
import numpy as np
import gymnasium as gym
# Make sure Gymnasium is properly installed
# You can run this in your terminal:
# pip install "gymnasium>=1.0.0"
Step 0.1 - Download the robot model
In this tutorial we will load the Unitree Go1 robot from the excellent MuJoCo Menagerie collection of robot models. Go1 is a quadruped robot, and controlling it to move is a significant learning problem, much harder than the Gymnasium/MuJoCo/Ant environment.
Note: the original tutorial includes an image of the Unitree Go1 robot in a flat-terrain scene. You can view this image at the following link: https://github.com/google-deepmind/mujoco_menagerie/blob/main/unitree_go1/go1.png?raw=true
# You can download the whole MuJoCo Menagerie collection (which includes `Go1`):
# git clone https://github.com/google-deepmind/mujoco_menagerie.git
# You can use any other quadruped robot with this tutorial, just adjust the environment parameter values for your robot.
Step 1 - Load the model
To load the model, all we have to do is use the xml_file argument with the Ant-v5 framework.
# Basic loading (uncomment to use)
# env = gym.make('Ant-v5', xml_file='./mujoco_menagerie/unitree_go1/scene.xml')
# Although this is enough to load the model, we will need to tweak some environment parameters
# to get the desired behavior for our environment, so we will also explicitly set the simulation,
# termination, reward and observation arguments, which we will tweak in the next step.
env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0,
    frame_skip=1,
    max_episode_steps=1000,
)
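As a quick sanity check (a small addition, not part of the original tutorial), you can inspect the spaces of the loaded model; for Go1 we expect a 12-dimensional action space, since each of the four legs has three actuated joints.

# Optional sanity check: `Go1` has 12 actuators (3 per leg),
# so the action space should be 12-dimensional.
print(env.action_space)
print(env.observation_space)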
Step 2 - Tweaking the environment parameters
Tweaking the environment parameters is essential to achieve the desired learning behavior. In the subsections below, the reader is encouraged to consult the documentation of the arguments for more detailed information.
Step 2.1 - Tweaking the environment simulation parameters
The arguments of interest are frame_skip, reset_noise_scale and max_episode_steps.
# We want to tweak the `frame_skip` parameter to get `dt` to an acceptable value
# (typical values are `dt` ∈ [0.01, 0.1] seconds).
# Reminder: dt = frame_skip × model.opt.timestep, where `model.opt.timestep` is the integrator
# time step selected in the MJCF model file.
# The `Go1` model we are using has an integrator timestep of `0.002`, so by selecting
# `frame_skip=25` we can set the value of `dt` to `0.05s`.
# To avoid overfitting the policy, `reset_noise_scale` should be set to a value appropriate
# to the size of the robot: we want the value to be as large as possible without the initial
# distribution of states being invalid (`Terminal` regardless of control actions);
# for `Go1` we choose a value of `0.1`.
# Finally, `max_episode_steps` determines the number of steps per episode before `truncation`;
# here we set it to 1000 to be consistent with the base `Gymnasium/MuJoCo` environments,
# but you can set it higher if you need to.
env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,  # set to avoid policy overfitting
    frame_skip=25,  # set dt=0.05
    max_episode_steps=1000,  # kept at 1000
)
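To double-check the dt arithmetic above (a small addition, not in the original tutorial), note that Gymnasium/MuJoCo environments expose the effective control timestep as env.unwrapped.dt, which equals frame_skip × model.opt.timestep.

# Sanity check: with `frame_skip=25` and the `Go1` integrator timestep of 0.002,
# the effective control timestep should be 0.05 s.
print(env.unwrapped.dt)  # expected: 25 * 0.002 = 0.05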
Step 2.2 - Tweaking the environment termination parameters
Termination is important for robot environments, to avoid sampling "useless" time steps.
# The arguments of interest are `terminate_when_unhealthy` and `healthy_z_range`.
# We want to set `healthy_z_range` to terminate the environment when the robot falls over
# or jumps really high; here we have to choose a value that is logical for the height of the robot.
# For `Go1` we choose `(0.195, 0.75)`.
# Note: `healthy_z_range` checks the absolute value of the height of the robot,
# so if your scene contains different levels of elevation it should be set to `(-np.inf, np.inf)`.
# We could also set `terminate_when_unhealthy=False` to disable termination altogether,
# which is not desirable in the case of `Go1`.
env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(
        0.195,
        0.75,
    ),  # set to avoid sampling steps where the robot has fallen or jumped too high
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
# Note: If you need a different termination condition, you can write your own `TerminationWrapper`
# (see the documentation).
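For reference, here is a minimal sketch of such a custom termination wrapper, not part of the original tutorial. It assumes the observation layout configured above (with exclude_current_positions_from_observation=False, obs[3:7] is the trunk orientation quaternion); the flip test and its threshold are illustrative assumptions to tune for your robot.

# Minimal sketch: additionally terminate when the trunk is rotated far from upright.
# Assumes obs[3:7] is the orientation quaternion (w, x, y, z); the scalar component
# w = cos(theta/2) approaches 0 as the body rotates towards upside down.
class FlipTermination(gym.Wrapper):
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if abs(obs[3]) < 0.3:  # illustrative threshold (rotation beyond ~145 degrees)
            terminated = True
        return obs, reward, terminated, truncated, info

# env = FlipTermination(env)  # wrap only if you need the extra condition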
Step 2.3 - Tweaking the environment reward parameters
The arguments of interest are forward_reward_weight, ctrl_cost_weight, contact_cost_weight, healthy_reward and main_body.
# For the arguments `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight` and `healthy_reward`
# we have to pick values that make sense for our robot; you can use the default `MuJoCo/Ant`
# parameters for reference and tweak them if a change is needed for your environment.
# In the case of `Go1` we only change the `ctrl_cost_weight`, since it has a higher actuator force range.
# For the argument `main_body` we have to choose which body part is the main body
# (usually called something like "torso" or "trunk" in the model file) for the calculation
# of the `forward_reward`; in the case of `Go1` it is the `"trunk"`.
# (Note: in most cases, including this one, it can be left at the default value.)
env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,  # kept the same as the 'Ant' environment
    ctrl_cost_weight=0.05,  # changed because of the stronger motors of `Go1`
    contact_cost_weight=5e-4,  # kept the same as the 'Ant' environment
    healthy_reward=1,  # kept the same as the 'Ant' environment
    main_body=1,  # represents the "trunk" of the `Go1` robot
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
# Note: If you need a different reward function, you can write your own `RewardWrapper`
# (see the documentation).
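As an illustration of that note, here is a minimal reward-shaping sketch, not from the original tutorial. It uses gym.Wrapper rather than gym.RewardWrapper because the shaping term depends on the observation; the target height of 0.35 m and the bonus weight are illustrative assumptions, not Go1 specifications.

# Minimal sketch: add a small bonus for keeping the trunk near a target height.
# Assumes obs[2] is the trunk z position (true for the observation settings above).
class HeightBonus(gym.Wrapper):
    def __init__(self, env, target_height=0.35, weight=0.1):
        super().__init__(env)
        self.target_height = target_height
        self.weight = weight

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        reward += self.weight * float(abs(obs[2] - self.target_height) < 0.05)
        return obs, reward, terminated, truncated, info

# env = HeightBonus(env)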
Step 2.4 - Tweaking the environment observation parameters
The arguments of interest are include_cfrc_ext_in_observation and exclude_current_positions_from_observation.
# Here for `Go1` we have no particular reason to change them.
env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,  # kept the same as the 'Ant' environment
    exclude_current_positions_from_observation=False,  # kept the same as the 'Ant' environment
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
# Note: If you need additional observation elements (such as additional sensors),
# you can write your own `ObservationWrapper` (see the documentation).
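As an illustration of that note, here is a minimal observation-augmentation sketch, not from the original tutorial; it appends the previous action to the observation (e.g. as a cheap proxy for actuator state). The wrapper name and the choice of extra signal are illustrative assumptions.

from gymnasium.spaces import Box

# Minimal sketch: append the most recent action to the observation vector,
# updating the observation space bounds accordingly.
class AppendAction(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low, env.action_space.low])
        high = np.concatenate([env.observation_space.high, env.action_space.high])
        self.observation_space = Box(low=low, high=high, dtype=np.float64)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return np.concatenate([obs, np.zeros(self.env.action_space.shape)]), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return np.concatenate([obs, action]), reward, terminated, truncated, info

# env = AppendAction(env)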
Step 3 - Train your agent
Finally, we are done: we can use a reinforcement learning (RL) algorithm to train an agent to walk/run the Go1 robot. Note: if you have followed this guide with your own robot model, you may find during training that some environment parameters are not as desired; feel free to go back to step 2 and change anything as needed.
def main():
    """Run the final Go1 environment setup."""
    # Note: The original tutorial includes an image showing the Go1 robot in the environment.
    # The image is available at: https://github.com/Kallinteris-Andreas/Gymnasium-kalli/assets/30759571/bf1797a3-264d-47de-b14c-e3c16072f695
    env = gym.make(
        "Ant-v5",
        xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
        forward_reward_weight=1,
        ctrl_cost_weight=0.05,
        contact_cost_weight=5e-4,
        healthy_reward=1,
        main_body=1,
        healthy_z_range=(0.195, 0.75),
        include_cfrc_ext_in_observation=True,
        exclude_current_positions_from_observation=False,
        reset_noise_scale=0.1,
        frame_skip=25,
        max_episode_steps=1000,
        render_mode="rgb_array",  # Change to "human" to visualize
    )
    # Example of running the environment for a few steps
    obs, info = env.reset()
    for _ in range(100):
        action = env.action_space.sample()  # Replace with your agent's action
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            obs, info = env.reset()
    env.close()
    print("Environment tested successfully!")
    # Now you would typically:
    # 1. Set up your RL algorithm
    # 2. Train the agent
    # 3. Evaluate the agent's performance


if __name__ == "__main__":
    main()
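If you want a concrete starting point, here is a minimal training sketch, not part of the original tutorial. It assumes Stable-Baselines3 is installed (pip install stable-baselines3); the choice of PPO and the timestep budget are illustrative, not tuned for Go1.

# Minimal sketch: train a PPO agent on the environment constructed as in `main()`.
from stable_baselines3 import PPO

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)  # illustrative budget; good gaits may need more
model.save("ppo_go1")
env.close()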
Epilogue
You can follow this guide to create most quadruped environments. To create humanoid/bipedal robots, you can also follow this guide using the Gymnasium/MuJoCo/Humanoid-v5 framework.
Note: the original tutorial includes a video demo of the trained Go1 robot walking; according to the manufacturer's data, the robot's top speed is 4.7 m/s. In the original tutorial, the video is embedded from: https://odysee.com/$/embed/@Kallinteris-Andreas:7/video0-step-0-to-step-1000:1?r=6fn5jA9uZQUZXGKVpwtqjz1eyJcS3hj3
# Author: @kallinteris-andreas (https://github.com/Kallinteris-Andreas)