[POLL] Gazebo init crash frequency

Hi everyone!

My colleagues and I have been experiencing and reporting Gazebo crashes at startup for quite some time now. I don't know which Gazebo versions they are running. On top of that, I have been having my own problems with NVIDIA, which may muddy the waters (is a given crash NVIDIA-related or Gazebo’s own, and so on).

So, I would like to get a rough idea of how often Gazebo crashes in your experience. On my setup, I would say that roughly 2 out of every 7 roslaunch invocations end up dead, and I have to re-execute.

Thanks!

Hi @li9i, I ran into the same issue.
I was running some optimizers that called the simulation to compute some KPIs that would be the input for the cost function (https://github.com/uuvsimulator/uuv_simulation_evaluation).
Sometimes it was an Nvidia-related issue, and after the first time the error happened it was usually better to reboot and start over (the simulation was sometimes called 1000 times, and eventually too many Nvidia errors piled up).
Other times, though, there was some initialization problem I haven’t figured out, where the whole simulation became unstable from the very beginning (the robot would start to oscillate even before its ROS controllers had been initialized).
To automate these Gazebo runs with different parameter sets, I compared the KPIs coming from similar scenarios. Usually the KPIs would diverge a lot from each other, which meant that one or more simulations had run until the end but something was wrong. So, just to be sure it wasn’t the parameter set making the simulation unstable, I would flag those simulations to be run again.
But it took me forever to figure out how to identify cases where the simulator crashed or initialized with these issues.
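
The KPI comparison described above could be sketched roughly as follows. This is a minimal illustration, not the code from uuv_simulation_evaluation: the function name `flag_divergent_runs`, the median-based criterion and the `rel_tol` threshold are all assumptions made for the example. It takes one KPI value per run of a similar scenario and returns the indices of runs that should be queued again.

```python
import math
from statistics import median


def flag_divergent_runs(kpis, rel_tol=0.5):
    """Return indices of runs whose KPI looks suspicious.

    A run is flagged if its KPI is missing (None), not finite, or deviates
    from the median of the valid runs by more than rel_tol (relative).
    The threshold is a hypothetical default and needs tuning per KPI.
    """
    finite = [v for v in kpis if v is not None and math.isfinite(v)]
    if not finite:
        # No usable reference value at all: rerun everything.
        return list(range(len(kpis)))

    med = median(finite)
    scale = abs(med) if med != 0 else 1.0  # avoid dividing by zero

    flagged = []
    for i, v in enumerate(kpis):
        if v is None or not math.isfinite(v) or abs(v - med) / scale > rel_tol:
            flagged.append(i)
    return flagged


# Four runs of a similar scenario; the last one diverges.
print(flag_divergent_runs([1.0, 1.1, 0.9, 5.0]))  # prints [3]
```

A check like this only says that a run disagrees with its neighbours; it cannot tell whether the cause was the simulator or the parameter set, which is why the flagged runs are simply repeated.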

My rough estimate is that when I called the optimizer with the simulation for a scenario with a robot with joints with dynamic parameters and contacts happening, something went wrong about 50% of the time. Without contacts or active joints (like the scenarios with underwater vehicles I had), it happened less often, maybe 20% of the time.
Interestingly, the first optimizer iterations went pretty smoothly; only after roughly 300 to 400 calls to start Gazebo did a couple of simulations start failing. It might be worth mentioning that I started 2 to 4 instances of Gazebo at the same time at each iteration. 2 was a safe number; 4 instances (on my machine) generated errors way sooner than 300 iterations, and they were mostly Nvidia-related.
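
For anyone automating similar batches, here is a minimal sketch of capping the number of parallel launches and re-running the ones that die. It assumes that a crashed launch exits with a non-zero return code or hangs past a timeout, which will not catch the "ran to the end but was wrong" cases mentioned above. The helper names `run_with_retries` and `run_batch` and the default values are hypothetical; the command would be whatever starts your simulation (for example a roslaunch invocation).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(cmd, max_attempts=3, timeout=None):
    """Run cmd (a list of arguments) until it exits with code 0.

    Returns (success, attempts_used). A timeout counts as a failed attempt;
    subprocess.run kills the direct child process when the timeout expires.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout)
            if result.returncode == 0:
                return True, attempt
        except subprocess.TimeoutExpired:
            pass  # treat a hang like a crash and try again
    return False, max_attempts


def run_batch(cmds, max_parallel=2, **kwargs):
    """Run several commands with at most max_parallel alive at once.

    Results come back in the same order as cmds.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda c: run_with_retries(c, **kwargs), cmds))
```

Note that killing the launcher on timeout does not necessarily clean up processes it spawned itself, so leftover simulator processes may still need to be handled separately.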