“Why can’t we have realtime performance data from the robot the way F1 teams do from their cars?” – Kaylin (FTC#9929 Pit Crew Chief and F1 Fan)
This turned out to be an excellent question, which led to a fun off-season project for our software team.
TL;DR: Recent Real-Life Use Case
Let’s start with a robot issue that the team had leading up to our first league meet last weekend, and how realtime performance data collected from the robot made it possible for the team to solve a hardware problem.
Leading up to our first league meet, our robot’s Mecanum drive would not strafe consistently, either in tele-op or during autonomous. The team was able to work around this issue during the league meet, but they wanted it ironed out before our next one. This is our third season with a Mecanum drive base, so the software team was fairly confident the issue was not in their code. We could not rule that out, though, because the robots we built in earlier seasons did not have this problem mechanically either.
Our two veteran software team members, Lauren and Calvin, put together a quick tele-op that reused the kinematics from our regular tele-op but only passed strafing inputs to the drive base. They put the robot up on some gold minerals left over from Rover Ruckus so that the wheels could spin freely, then sent different power levels to the drive base and recorded the power level and velocity of each drive motor over time. This is one of the resulting charts:
The two series in each chart are motors that should be turning in the same direction with the same velocity, which results in a strafe that does not drift. The collected data shows that, for the exact same power levels, one motor on the robot both lags in response to power input and never reaches the same velocity as the motor diagonally opposite it. Based on this data, Lauren inspected the drive train, checked chain tension, and made sure the wheel and sprocket were not binding on anything, then decided to replace the motor and test again.
After the motor was replaced, Lauren and Calvin re-ran the test, and this is the data that was collected, which is looking much better:
We still have to tear down the NR20 Orbital that was removed from the robot to see what is going on internally, but the team can now move on to testing the robot on the field.
The Technical Details
The FTC SDK itself provides a means to send real-time data to the driver station phone via the Telemetry class, but that data is unstructured and is not stored anywhere. Android’s logging facilities do store what is logged, but there is a rate limit on the amount of data that can be sent to the logs, and the log itself is unstructured. For these reasons, our robot OpModes have historically used the log for significant events but not for performance data.
The team decided to spend some time before SKYSTONE kicked off to research some solutions to this problem, and come up with something they felt could be used to analyze performance of their robot and operators in seasons to come. They knew that the solution should have the following characteristics:
(1) It should be possible to analyze the data in (near) realtime.
(2) Collection and transmission of the data should not have an impact on robot reliability or performance.
(3) The data should be persisted for later analysis.
(4) The tools used should allow users to ask “what-if” questions easily.
(5) The team should not reinvent the wheel, if possible.
The Receiving Side
One hint to the solution was given by our experience with using over-the-air debugging between the robot controller and Android Studio. We’ve been using Jeremy Cole’s setup for this since last season, and it greatly improves the code, test, debug, deploy cycle. Because the team knew that it was possible for their development tools to communicate with the robot, they felt the inverse should be true too.
Keeping in mind requirements (1), (2) and (5), the team looked towards an already-existing timeseries transmission standard, StatsD. StatsD is a very basic protocol, so it is easy to understand. It is also popular: there are many libraries that implement it. It uses UDP for transmission, so the client sends each packet without waiting for a reply, which keeps it fast and low-impact on the robot. We decided to use DataDog’s Java StatsD implementation, but forked it to remove the Unix domain socket support. Our fork lives at https://github.com/HF-Robotics/java-dogstatsd-client. It builds with Maven, and you can add it to your FTC code like any other external library.
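To make the protocol concrete, here is a minimal sketch of what a StatsD gauge looks like on the wire and how little work it takes to send one over UDP. It uses only the JDK rather than the DogStatsD client, and the class name GaugeDatagram, the metric name, and the loopback host are made up for illustration. Port 8125 is the conventional StatsD port.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the team's fork.
final class GaugeDatagram {
    // StatsD gauge wire format: <metric name>:<value>|g
    static String format(String name, double value) {
        return name + ":" + value + "|g";
    }

    // Fire-and-forget: UDP has no handshake and no acknowledgement,
    // so the sender never waits on the receiver.
    static void send(DatagramSocket socket, InetAddress host, int port,
                     String name, double value) throws IOException {
        byte[] payload = format(name, value).getBytes(StandardCharsets.UTF_8);
        socket.send(new DatagramPacket(payload, payload.length, host, port));
    }

    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket()) {
            // Host and metric name are assumptions for this example.
            send(socket, InetAddress.getLoopbackAddress(), 8125,
                 "drive.leftFront.power", 0.75);
        }
    }
}
```

In real robot code the DogStatsD client does this formatting and sending for you; the sketch only shows why the cost on the sending side is so small.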
The next step was to build a small server (https://github.com/HF-Robotics/metrics) that would run on one of the team’s laptops to receive the StatsD UDP packets, parse them, and then store them in a timeseries database. The team used Netty, and hand-rolled a parser, which currently only supports the Gauge metric type from StatsD.
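A gauge-only parser can be quite small. The following is a sketch of the idea, assuming each metric arrives as a single line of the form name:value|g, possibly followed by extra sections such as tags. It is not the code from the metrics repository, and GaugeParser and Gauge are hypothetical names.

```java
import java.util.Optional;

// Hypothetical sketch; the team's real parser lives in the metrics repository.
final class GaugeParser {
    static final class Gauge {
        final String name;
        final double value;

        Gauge(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Parses "<name>:<value>|g"; returns empty for any other input.
    static Optional<Gauge> parse(String line) {
        int colon = line.indexOf(':');
        int pipe = line.indexOf('|', colon + 1);
        if (colon <= 0 || pipe < 0) {
            return Optional.empty();
        }
        // Anything after a second '|' (sample rate, tags) is ignored here.
        String type = line.substring(pipe + 1);
        int nextPipe = type.indexOf('|');
        if (nextPipe >= 0) {
            type = type.substring(0, nextPipe);
        }
        if (!type.equals("g")) {
            return Optional.empty(); // only the gauge type is supported
        }
        try {
            double value = Double.parseDouble(line.substring(colon + 1, pipe));
            return Optional.of(new Gauge(line.substring(0, colon), value));
        } catch (NumberFormatException e) {
            return Optional.empty(); // malformed value: drop the metric
        }
    }
}
```

Returning an empty Optional for anything unexpected means a single malformed packet is dropped rather than taking the server down, which fits the fire-and-forget nature of UDP.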
After parsing the StatsD UDP packets, the server stores them in an InfluxDB timeseries database. The team then uses Grafana to build dashboards to visualize the collected data.
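One way to hand a parsed gauge to InfluxDB is its text-based line protocol, where each point is written as a measurement name, one or more fields, and a timestamp. This post does not say whether the team's server uses the line protocol directly or goes through a client library, so treat the following as an illustrative sketch only. The class name LineProtocol and the field name value are assumptions.

```java
// Hypothetical sketch of turning one gauge sample into an InfluxDB line.
final class LineProtocol {
    // Line protocol shape: <measurement> <field>=<value> <timestamp>
    // Commas and spaces in the measurement name must be backslash-escaped.
    static String gaugeToLine(String measurement, double value, long timestampNanos) {
        String escaped = measurement.replace(",", "\\,").replace(" ", "\\ ");
        return escaped + " value=" + value + " " + timestampNanos;
    }
}
```

StatsD packets carry no timestamp of their own, so in a design like this the server would have to stamp each sample with its own clock when the packet arrives.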
The Robot Side
The software team wired together the StatsD client with a series of collectors that are called once per OpMode loop() to send their data to the server. The current implementation is on GitHub: https://github.com/HF-Robotics/SkyStone/tree/master/TeamCode/src/main/java/com/hfrobots/tnt/corelib/metrics
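The general shape of the collector approach can be sketched as follows. Every name here (GaugeSink, MetricCollector, SupplierCollector, MetricsSampler) is hypothetical and chosen for illustration; the linked repository has the team's real classes, which may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleSupplier;

// Where samples go; in practice this would wrap the StatsD client.
interface GaugeSink {
    void gauge(String name, double value);
}

// One collector knows how to read one or more values from the robot.
interface MetricCollector {
    void collect(GaugeSink sink);
}

// Collector for any single value that can be read as a double,
// for example a motor power level or a servo position.
final class SupplierCollector implements MetricCollector {
    private final String name;
    private final DoubleSupplier source;

    SupplierCollector(String name, DoubleSupplier source) {
        this.name = name;
        this.source = source;
    }

    @Override
    public void collect(GaugeSink sink) {
        sink.gauge(name, source.getAsDouble());
    }
}

// Holds all collectors; sampleAll() is meant to be called once per loop().
final class MetricsSampler {
    private final List<MetricCollector> collectors = new ArrayList<>();
    private final GaugeSink sink;

    MetricsSampler(GaugeSink sink) {
        this.sink = sink;
    }

    void add(MetricCollector collector) {
        collectors.add(collector);
    }

    void sampleAll() {
        for (MetricCollector collector : collectors) {
            collector.collect(sink);
        }
    }
}
```

With a layout like this, instrumenting a new sensor only means registering one more collector, and the OpMode itself still makes a single sampleAll() call per loop.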
Currently, the robot has instrumentation for the following values:
* All joysticks, buttons and triggers on both gamepads
* Motor power levels
* Motor velocity (encoder ticks/sec)
* Servo positions
As various sensors get added to the robot this season, data from those will be collected as well.
A “Big Data” Problem
As the team added more instrumentation to the robot, they soon noticed an issue:
Engineering Notebook Entry – September 6, 2019 – Lauren
Today we expanded what our server intakes for data by adding the driver and operator buttons. We used two types of metrics in the code for it. One dealing with ranged inputs and one with on-off buttons.
One problem we came across was that we took in a lot of data which slowed down the computer we were running the server on. We were taking in 350 KB a second, which is about equivalent to streaming a high def video.
One idea to reduce overload of the computer or server is to only log when the value has changed which would remove all the unchanged zero values that we intake. These unchanged zeros are most of our data intake and we don’t need to log unchanged zeros for all inputs. With the method of only giving data when the value had changed is we would only log if we had a change in value and assume it has remained the same as the last one we received.
Interestingly enough, 350KB/sec of metrics being sent from the phone did not have an impact on robot performance. Ping times at the drivers’ station did not suffer – but it was more data than InfluxDB could handle on the laptops the team uses.
A quick software efficiency aside – the programming team’s laptops are MacBooks vintage 2009, maxed out on RAM, but with traditional hard drives. The robot controller phone is probably more powerful than our developer laptops, in some ways!
While we could get our hands on more powerful hardware, the team felt that it was possible to reduce the amount of data sent from the robot without losing the “interesting” information. This would allow them to instrument more on the robot. What they decided to do was send periodic samples of unchanged values, and send all values around the change of a value. This has worked out well, and reduced the volume of data sent to less than 50KB/s.
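A filter of this kind can be sketched in a few lines. This is one possible interpretation of "periodic samples of unchanged values, and all values around a change", not the team's actual implementation: it sends every changed value, replays the skipped sample just before a change, and otherwise sends only every Nth unchanged sample. The class name ChangeAwareGauge and the keepAliveEvery parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of change-aware sampling for a single metric.
final class ChangeAwareGauge {
    private final int keepAliveEvery; // send an unchanged value every N samples
    private double last = Double.NaN; // NaN means nothing has been seen yet
    private boolean lastWasSent = true;
    private int unchangedCount = 0;

    ChangeAwareGauge(int keepAliveEvery) {
        this.keepAliveEvery = keepAliveEvery;
    }

    // Returns the values to transmit for this sample (zero, one or two).
    List<Double> offer(double value) {
        List<Double> out = new ArrayList<>();
        boolean changed = Double.isNaN(last) || value != last;
        if (changed) {
            if (!lastWasSent) {
                out.add(last); // the skipped sample just before the change
            }
            out.add(value);
            unchangedCount = 0;
            lastWasSent = true;
        } else if (++unchangedCount >= keepAliveEvery) {
            out.add(value); // periodic sample of an unchanged value
            unchangedCount = 0;
            lastWasSent = true;
        } else {
            lastWasSent = false; // suppressed
        }
        last = value;
        return out;
    }
}
```

Replaying the sample just before a change keeps step-shaped signals, such as a button press, looking like steps on a dashboard instead of slow ramps between two widely spaced points. One caveat of this sketch is that the replayed sample is sent at the time of the change, so it only approximates when the old value was last seen.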
The team is looking forward to using this functionality to analyze and optimize how our human operators use the robot throughout the SKYSTONE season. They are also thinking about what they want to instrument to determine how their autonomous programs are making decisions about what to do, when.
During the next off-season, they’d like to clean up, package, and publish what they’ve built so that it is easily used by other teams.
If you have any interest in this topic, please feel free to leave a comment. If we can find some time later in the season, we’d like to add more of a “How to” component to this post.