Load custom quadruped robot environments

In this tutorial, you will create a MuJoCo quadruped walking environment from a model file (ending in .xml) without having to create a new class.

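Steps

0. Get the MJCF (or URDF) model file of your robot.
   - Create your own model (see the MuJoCo guide), or
   - find a ready-made model (in this tutorial, we will use a model from the MuJoCo Menagerie collection).
1. Load the model with the xml_file argument.
2. Tweak the environment parameters to get the desired behavior.
   2.1. Tweak the environment simulation parameters.
   2.2. Tweak the environment termination parameters.
   2.3. Tweak the environment reward parameters.
   2.4. Tweak the environment observation parameters.
3. Train an agent to move your robot.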

# The reader is expected to be familiar with the `Gymnasium` API & library, the basics of robotics,
# and the included `Gymnasium/MuJoCo` environments with the robot model they use.
# Familiarity with the **MJCF** file model format and the `MuJoCo` simulator is not required but is recommended.

Setup

We will need gymnasium>=1.0.0.

import numpy as np

import gymnasium as gym


# Make sure Gymnasium is properly installed
# You can run this in your terminal:
# pip install "gymnasium>=1.0.0"

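Step 0.1 - Download a robot model

In this tutorial, we will load the Unitree Go1 robot from the excellent MuJoCo Menagerie robot model collection. Go1 is a quadruped robot; controlling it to move is a significant learning problem, much harder than the Gymnasium/MuJoCo/Ant environment.

Note: the original tutorial includes an image of the Unitree Go1 robot in a flat terrain scene. You can view the image at: https://github.com/google-deepmind/mujoco_menagerie/blob/main/unitree_go1/go1.png?raw=true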

# You can download the whole MuJoCo Menagerie collection (which includes `Go1`):
# git clone https://github.com/google-deepmind/mujoco_menagerie.git

# You can use any other quadruped robot with this tutorial, just adjust the environment parameter values for your robot.
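
Before passing a path to xml_file, it can help to confirm that the file is where you expect it. The following is a minimal sketch that uses only the standard library; the helper name `resolve_model_file` is hypothetical and not part of Gymnasium or MuJoCo.

```python
from pathlib import Path


def resolve_model_file(path: str) -> Path:
    """Return the absolute path of a model file, or raise if it is missing."""
    model_path = Path(path).expanduser().resolve()
    if not model_path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {model_path}. "
            "Did you clone mujoco_menagerie into the current directory?"
        )
    return model_path


if __name__ == "__main__":
    # Path used throughout this tutorial, relative to the working directory.
    try:
        print(resolve_model_file("./mujoco_menagerie/unitree_go1/scene.xml"))
    except FileNotFoundError as err:
        print(err)
```

If the check passes, `str(model_path)` can be used as the value of xml_file in the steps that follow.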

Step 1 - Load the model

To load the model, all we have to do is use the xml_file argument with the Ant-v5 framework.

# Basic loading (uncomment to use)
# env = gym.make('Ant-v5', xml_file='./mujoco_menagerie/unitree_go1/scene.xml')

# Although this is enough to load the model, we will need to tweak some environment parameters
# to get the desired behavior for our environment, so we will also explicitly set the simulation,
# termination, reward and observation arguments, which we will tweak in the next step.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0,
    frame_skip=1,
    max_episode_steps=1000,
)
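
Since the same argument list is repeated and adjusted in every step that follows, one option is to keep the baseline arguments in a dictionary and override only what changes. This is a sketch of that pattern, not something the tutorial itself does; `BASE_KWARGS` and `make_kwargs` are hypothetical names, and the values are copied from the call above.

```python
import math

GO1_XML = "./mujoco_menagerie/unitree_go1/scene.xml"

# Baseline arguments, identical to the explicit gym.make call in Step 1.
BASE_KWARGS = {
    "xml_file": GO1_XML,
    "forward_reward_weight": 0,
    "ctrl_cost_weight": 0,
    "contact_cost_weight": 0,
    "healthy_reward": 0,
    "main_body": 1,
    "healthy_z_range": (0, math.inf),
    "include_cfrc_ext_in_observation": True,
    "exclude_current_positions_from_observation": False,
    "reset_noise_scale": 0,
    "frame_skip": 1,
    "max_episode_steps": 1000,
}


def make_kwargs(**overrides):
    """Return a copy of the baseline arguments with the given overrides applied."""
    return {**BASE_KWARGS, **overrides}


# Example: the simulation tweaks of Step 2.1 expressed as overrides.
step_2_1_kwargs = make_kwargs(frame_skip=25, reset_noise_scale=0.1)
# env = gym.make("Ant-v5", **step_2_1_kwargs)  # uncomment once the model is downloaded
```

The baseline dictionary is never mutated, so each step can be reproduced independently.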

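Step 2 - Tweak the environment parameters

Tweaking the environment parameters is essential to get the desired learning behavior. In the following subsections, the reader is encouraged to consult the documentation of the arguments for more detailed information.

Step 2.1 - Tweak the environment simulation parameters

The arguments of interest are frame_skip, reset_noise_scale and max_episode_steps.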

# We want to tweak the `frame_skip` parameter to get `dt` to an acceptable value
# (typical values are `dt` ∈ [0.01, 0.1] seconds).

# Reminder: dt = frame_skip × model.opt.timestep, where `model.opt.timestep` is the integrator
# time step selected in the MJCF model file.

# The `Go1` model we are using has an integrator timestep of `0.002`, so by selecting
# `frame_skip=25` we can set the value of `dt` to `0.05s`.

# To avoid overfitting the policy, `reset_noise_scale` should be set to a value appropriate
# to the size of the robot. We want the value to be as large as possible without making the
# initial distribution of states invalid (`Terminal` regardless of control actions).
# For `Go1` we choose a value of `0.1`.

# `max_episode_steps` determines the number of steps per episode before `truncation`.
# Here we set it to 1000 to be consistent with the base `Gymnasium/MuJoCo` environments,
# but you can set it higher if you need to.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(0, np.inf),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,  # set to avoid policy overfitting
    frame_skip=25,  # set dt=0.05
    max_episode_steps=1000,  # kept at 1000
)
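
The relation dt = frame_skip × model.opt.timestep can be inverted to pick frame_skip for a target dt. The helper below is a small sketch of that arithmetic; `frame_skip_for` is a hypothetical name, and the timestep value is the one quoted above for the Go1 model rather than read from the file.

```python
def frame_skip_for(timestep: float, target_dt: float) -> int:
    """Return the frame_skip whose dt = frame_skip * timestep is closest to target_dt."""
    if timestep <= 0 or target_dt <= 0:
        raise ValueError("timestep and target_dt must be positive")
    # frame_skip must be a positive integer, so round and clamp to at least 1.
    return max(1, round(target_dt / timestep))


GO1_TIMESTEP = 0.002  # integrator timestep of the Go1 model, as stated above
frame_skip = frame_skip_for(GO1_TIMESTEP, target_dt=0.05)
dt = frame_skip * GO1_TIMESTEP
print(f"frame_skip={frame_skip}, dt={dt:.3f}s")
```

For a different robot, replace the timestep with the value set in its own MJCF file.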

Step 2.2 - Tweak the environment termination parameters

Termination is important for robot environments to avoid sampling "useless" time steps.

# The arguments of interest are `terminate_when_unhealthy` and `healthy_z_range`.

# We want to set `healthy_z_range` to terminate the environment when the robot falls over
# or jumps really high. We have to choose a range that makes sense for the height of the robot;
# for `Go1` we choose `(0.195, 0.75)`.
# Note: `healthy_z_range` checks the absolute (world-frame) height of the robot,
# so if your scene contains different levels of elevation it should be set to `(-np.inf, np.inf)`.

# We could also set `terminate_when_unhealthy=False` to disable termination altogether,
# which is not desirable in the case of `Go1`.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=0,
    ctrl_cost_weight=0,
    contact_cost_weight=0,
    healthy_reward=0,
    main_body=1,
    healthy_z_range=(
        0.195,
        0.75,
    ),  # set to avoid sampling steps where the robot has fallen or jumped too high
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

# Note: If you need a different termination condition, you can write your own `TerminationWrapper`
# (see the documentation).
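
To make the effect of these two arguments concrete, here is a rough sketch of the kind of check they control. It assumes the health test is a finiteness check on the state combined with a z-range check on the main body; the function names are illustrative and are not Gymnasium API.

```python
import math


def is_healthy(state, z, healthy_z_range):
    """Healthy if every state value is finite and z lies inside the allowed range."""
    min_z, max_z = healthy_z_range
    return all(math.isfinite(v) for v in state) and min_z <= z <= max_z


def should_terminate(state, z, healthy_z_range, terminate_when_unhealthy=True):
    """Terminate only when termination is enabled and the robot is unhealthy."""
    return terminate_when_unhealthy and not is_healthy(state, z, healthy_z_range)


GO1_Z_RANGE = (0.195, 0.75)  # range chosen for Go1 in this step

for z in (0.30, 0.10, 0.90):
    state = [0.0, 0.0, z]  # toy state; a real state holds all positions and velocities
    print(z, should_terminate(state, z, GO1_Z_RANGE))
```

With the range `(-math.inf, math.inf)`, the height check never triggers, which matches the advice above for scenes with varying elevation.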

Step 2.3 - Tweak the environment reward parameters

The arguments of interest are forward_reward_weight, ctrl_cost_weight, contact_cost_weight, healthy_reward and main_body.

# For the arguments `forward_reward_weight`, `ctrl_cost_weight`, `contact_cost_weight` and `healthy_reward`
# we have to pick values that make sense for our robot. You can use the default `MuJoCo/Ant`
# parameters for reference and tweak them if your environment needs a change.
# In the case of `Go1` we only change the `ctrl_cost_weight` since it has a higher actuator force range.

# For the argument `main_body` we have to choose which body part is the main body
# (usually called something like "torso" or "trunk" in the model file) for the calculation
# of the `forward_reward`. In the case of `Go1` it is the `"trunk"`
# (Note: in most cases, including this one, it can be left at the default value).

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,  # kept the same as the 'Ant' environment
    ctrl_cost_weight=0.05,  # changed because of the stronger motors of `Go1`
    contact_cost_weight=5e-4,  # kept the same as the 'Ant' environment
    healthy_reward=1,  # kept the same as the 'Ant' environment
    main_body=1,  # represents the "trunk" of the `Go1` robot
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=False,
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)

# Note: If you need a different reward function, you can write your own `RewardWrapper`
# (see the documentation).
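
To see how the four weights interact, the sketch below combines them in the way the Ant reward is commonly described: forward reward plus healthy reward, minus control and contact costs. It is a simplified illustration under that assumption, not the environment's actual code; the function name and the default `contact_force_range` are assumptions.

```python
def ant_style_reward(
    x_velocity,
    action,
    contact_forces,
    *,
    forward_reward_weight=1.0,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1.0,
    is_healthy=True,
    contact_force_range=(-1.0, 1.0),  # assumed clipping range
):
    """Combine the reward terms using the weights chosen in this step."""
    forward = forward_reward_weight * x_velocity
    healthy = healthy_reward if is_healthy else 0.0
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    low, high = contact_force_range
    clipped = [min(max(f, low), high) for f in contact_forces]
    contact_cost = contact_cost_weight * sum(c * c for c in clipped)
    return forward + healthy - ctrl_cost - contact_cost


# Moving forward at 1 m/s with two unit-magnitude actions and no contact forces.
reward = ant_style_reward(1.0, [1.0, -1.0], [0.0])
print(round(reward, 4))
```

Raising `ctrl_cost_weight` penalizes large actuator commands more strongly, which is why it is the one weight changed for the stronger motors of `Go1`.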

Step 2.4 - Tweak the environment observation parameters

The arguments of interest are include_cfrc_ext_in_observation and exclude_current_positions_from_observation.

# Here for `Go1` we have no particular reason to change them.

env = gym.make(
    "Ant-v5",
    xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
    forward_reward_weight=1,
    ctrl_cost_weight=0.05,
    contact_cost_weight=5e-4,
    healthy_reward=1,
    main_body=1,
    healthy_z_range=(0.195, 0.75),
    include_cfrc_ext_in_observation=True,  # kept the same as the 'Ant' environment
    exclude_current_positions_from_observation=False,  # kept the same as the 'Ant' environment
    reset_noise_scale=0.1,
    frame_skip=25,
    max_episode_steps=1000,
)


# Note: If you need additional observation elements (such as additional sensors),
# you can write your own `ObservationWrapper` (see the documentation).
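
Both flags change the length of the observation vector. The sketch below estimates that length under the assumption that the observation is the concatenation of joint positions, joint velocities and, optionally, the 6-dimensional external contact force of every body except the world body. The function name is hypothetical, and the example numbers are believed to correspond to the stock Ant model, not to Go1.

```python
def ant_style_obs_size(
    nq,
    nv,
    nbody,
    *,
    include_cfrc_ext_in_observation=True,
    exclude_current_positions_from_observation=True,
):
    """Estimate the observation length from model dimensions and the two flags."""
    size = nq + nv
    if exclude_current_positions_from_observation:
        size -= 2  # drop the x and y position of the main body
    if include_cfrc_ext_in_observation:
        size += (nbody - 1) * 6  # one 6D contact force per non-world body
    return size


# Dimensions assumed for the stock Ant model: nq=15, nv=14, nbody=14.
print(ant_style_obs_size(15, 14, 14))
```

After creating your own environment, comparing this estimate with `env.observation_space.shape` is a quick sanity check that the flags behave as you expect.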

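Step 3 - Train your agent

Finally, we are done and can use a reinforcement learning (RL) algorithm to train an agent to make the Go1 robot walk or run. Note: if you have followed this guide with your own robot model, you may find during training that some environment parameters do not behave as expected. Feel free to go back to step 2 and change anything as needed.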

def main():
    """Run the final Go1 environment setup."""
    # Note: The original tutorial includes an image showing the Go1 robot in the environment.
    # The image is available at: https://github.com/Kallinteris-Andreas/Gymnasium-kalli/assets/30759571/bf1797a3-264d-47de-b14c-e3c16072f695

    env = gym.make(
        "Ant-v5",
        xml_file="./mujoco_menagerie/unitree_go1/scene.xml",
        forward_reward_weight=1,
        ctrl_cost_weight=0.05,
        contact_cost_weight=5e-4,
        healthy_reward=1,
        main_body=1,
        healthy_z_range=(0.195, 0.75),
        include_cfrc_ext_in_observation=True,
        exclude_current_positions_from_observation=False,
        reset_noise_scale=0.1,
        frame_skip=25,
        max_episode_steps=1000,
        render_mode="rgb_array",  # Change to "human" to visualize
    )

    # Example of running the environment for a few steps
    obs, info = env.reset()

    for _ in range(100):
        action = env.action_space.sample()  # Replace with your agent's action
        obs, reward, terminated, truncated, info = env.step(action)

        if terminated or truncated:
            obs, info = env.reset()

    env.close()
    print("Environment tested successfully!")

    # Now you would typically:
    # 1. Set up your RL algorithm
    # 2. Train the agent
    # 3. Evaluate the agent's performance


if __name__ == "__main__":
    main()

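Epilogue

You can follow this guide to create most quadruped environments. To create humanoid/bipedal robots, you can also follow this guide using the Gymnasium/MuJoCo/Humanoid-v5 framework.

Note: the original tutorial includes a video demonstration of the trained Go1 robot walking. The video shows the robot reaching a top speed of up to 4.7 m/s, according to the manufacturer's data. In the original tutorial, this video is embedded from: https://odysee.com/$/embed/@Kallinteris-Andreas:7/video0-step-0-to-step-1000:1?r=6fn5jA9uZQUZXGKVpwtqjz1eyJcS3hj3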

# Author: @kallinteris-andreas (https://github.com/Kallinteris-Andreas)