Reinforcement Learning using Gazebo

Hi,

I’m new to Gazebo Fortress and my company is working on a Reinforcement Learning algorithm. We were using Unity to simulate our agent but we’ve reached the limitations of the game engine.

I was able to set up our robot and a world.

For RL to work, we need rewards and penalties. From what I understand, the TriggeredPublisher plugin could be used to publish a message on a topic when a collision with a reward is detected.

The issue I’m now facing is how to remove a reward or penalty from the world once the robot has touched it.

Would anyone happen to have an idea?


Couldn’t you also subscribe to the robot’s position and do the reward masking in your processing code?
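A minimal sketch of that masking idea, assuming the robot's planar position is already available from a pose subscription. The `Reward` and `RewardField` names, the pickup radius, and the reward values are hypothetical and only illustrate the bookkeeping. Nothing here talks to Gazebo.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Reward:
    name: str
    x: float
    y: float
    value: float
    radius: float = 0.10  # pickup radius in metres (illustrative value)


@dataclass
class RewardField:
    rewards: list
    collected: set = field(default_factory=set)

    def step(self, robot_x, robot_y):
        """Return the reward earned at this position and mask collected items."""
        total = 0.0
        for reward in self.rewards:
            if reward.name in self.collected:
                continue  # already collected, so it is masked out
            if math.hypot(reward.x - robot_x, reward.y - robot_y) <= reward.radius:
                self.collected.add(reward.name)
                total += reward.value
        return total
```

With this approach the reward models can stay in the world as static visuals. Only the learning code decides whether a reward is still active, so an episode reset only needs to clear `collected`.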

Otherwise, you could try using the breadcrumbs system, as in cave_circuit_practice_01.sdf in the osrf/subt repository on GitHub and https://app.gazebosim.org/OpenRobotics/fuel/models/Medium%20Rock%20Fall . The breadcrumb plugin has a config option that sets how many deployments are allowed, and it publishes a ~/remaining topic with the number of deployments left.

Sounds very interesting. Could you maybe write up a bit more about this?


I found a way to make it work by compiling Gazebo from source to take advantage of a new TriggeredPublisher feature that allows it to call a service (gz-sim/pull/1611). By combining the contact sensor, the TouchPlugin and the TriggeredPublisher, I’m able to detect the collision with a reward model and then send a service request that deletes the reward model from the world.

I’d post a video of the result, but I can’t upload it since I’m a new user.

[Edit: I can now add the video]


If it can be useful to anyone, a typical sphere reward would be defined like this:

<model name="reward_yellow">
      <pose>0 -2 0.05 0 0 0</pose>
      <static>true</static>
      <link name="link">
        <visual name="v2">
          <geometry>
            <sphere>
              <radius>0.05</radius>
            </sphere>
          </geometry>
          <material>
            <ambient>1 1 0 1</ambient>
            <diffuse>1 1 0 1</diffuse>
            <specular>1 1 0 1</specular>
            <emissive>1 1 0 1</emissive>
          </material>
        </visual>
        <collision name="c2">
          <geometry>
            <cylinder>
              <radius>0.10</radius>
              <length>0.5</length>
            </cylinder>
          </geometry>
        </collision>
        <sensor name='sensor_contact' type='contact'>
          <contact>
            <collision>c2</collision>
          </contact>
        </sensor>
      </link>
      <plugin filename="libignition-gazebo-touchplugin-system.so" name="ignition::gazebo::systems::TouchPlugin">
        <target>rover</target>
        <namespace>reward_yellow</namespace>
        <time>0.001</time>
        <enabled>true</enabled>
      </plugin>
      <plugin filename="libignition-gazebo-triggered-publisher-system.so" name="ignition::gazebo::systems::TriggeredPublisher">
        <input type="ignition.msgs.Boolean" topic="/reward_yellow/touched">
          <match>data: true</match>
        </input>
        <service name="/world/demo/remove" reqType="ignition.msgs.Entity" repType="ignition.msgs.Boolean" timeout="1000" reqMsg='name: "reward_yellow", type: 2'></service>
      </plugin>
    </model>
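
To check the removal request on its own, before wiring it into the TriggeredPublisher, the same service can be called from a script. This is a rough sketch that shells out to the Fortress `ign service` command line tool. It assumes the world is named `demo` as in the snippet above and that `ign` is on the PATH. The helper names `build_remove_command` and `remove_model` are made up for illustration.

```python
import subprocess


def build_remove_command(world, model_name, timeout_ms=1000):
    """Build the `ign service` call that asks the world to remove a model."""
    # type: 2 selects a model entity, matching the reqMsg in the SDF above.
    request = f'name: "{model_name}", type: 2'
    return [
        "ign", "service",
        "-s", f"/world/{world}/remove",
        "--reqtype", "ignition.msgs.Entity",
        "--reptype", "ignition.msgs.Boolean",
        "--timeout", str(timeout_ms),
        "--req", request,
    ]


def remove_model(world, model_name):
    """Run the command; needs a running simulation to do anything useful."""
    return subprocess.run(
        build_remove_command(world, model_name),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling `remove_model("demo", "reward_yellow")` while the simulation is running should have the same effect as the trigger. On newer releases the tool and message namespaces are likely `gz` and `gz.msgs` instead, so the strings would need adjusting.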

Hello Alex_SSoM,

Maybe you have seen it already, but there is the Gym-Ignition project, which provides a programmatic Python interface for ~Ignition~ Gazebo with a focus on Reinforcement Learning. I have used it with Gazebo Fortress, and it works fine for my purposes. However, it might not work with Gazebo Garden straight away because of the rename and other breaking changes.

Collisions can be checked programmatically (by listing per-object/link contacts). If you want to give a reward upon reaching a target position in 2D/3D, you can also just compute the distance to the target and apply a threshold if desired.
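
The distance-based variant could look roughly like the sketch below. It is plain Python and does not use the Gym-Ignition API. The function name `target_reward`, the threshold, and the optional shaping term are assumptions chosen for illustration.

```python
import math


def target_reward(position, target, threshold=0.1, bonus=1.0, shaping=0.0):
    """Reward for reaching a 2D/3D target.

    Returns (reward, reached). `bonus` is granted once the robot is within
    `threshold` metres of the target; `shaping` optionally penalises the
    remaining distance to give a denser learning signal.
    """
    distance = math.dist(position, target)  # requires Python 3.8+
    reached = distance <= threshold
    reward = (bonus if reached else 0.0) - shaping * distance
    return reward, reached
```

With `shaping=0.0` this is a sparse reward that only fires inside the threshold. A small positive `shaping` turns it into a dense signal, which often helps early exploration but changes what the agent optimises.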

Using the programmatic interface of the Gazebo server directly reduces transport overhead and improves determinism, because it avoids the non-deterministic timing of socket-based communication. A reliable interface might not be provided for every feature, in which case you can still fall back on Gazebo Transport and/or ROS (2).