Mavlogdump.py crashes while reading DISTANCE_SENSOR msgs in some tlog files

I’m using mavlogdump.py (and other pymavlink-based tools) to read DISTANCE_SENSOR messages sent by BlueOS and logged by QGC, and I noticed that mavlogdump.py crashes when trying to read some tlog files.

  • DISTANCE_SENSOR messages are from the ping service in BlueOS 1.1.0 beta 18
  • QGC 4.2.6
  • pymavlink 2.4.37

Traceback:

$ mavlogdump.py --quiet --types DISTANCE_SENSOR ~/Desktop/sa_tests/2023_04_18_surftrak3/2023-04-18\ 11-03-34.tlog
Traceback (most recent call last):
  File "/home/clyde/.local/bin/mavlogdump.py", line 242, in <module>
    m = mlog.recv_match(blocking=args.follow, type=match_types)
  File "/home/clyde/.local/lib/python3.10/site-packages/pymavlink/mavutil.py", line 1633, in recv_match
    m = self.recv_msg()
  File "/home/clyde/.local/lib/python3.10/site-packages/pymavlink/mavutil.py", line 481, in recv_msg
    self.post_message(msg)
  File "/home/clyde/.local/lib/python3.10/site-packages/pymavlink/mavutil.py", line 1436, in post_message
    super(mavlogfile, self).post_message(msg)
  File "/home/clyde/.local/lib/python3.10/site-packages/pymavlink/mavutil.py", line 378, in post_message
    add_message(self.sysid_state[src_system].messages, type, msg)
  File "/home/clyde/.local/lib/python3.10/site-packages/pymavlink/mavutil.py", line 99, in add_message
    messages[mtype]._instances[instance_value] = msg
TypeError: 'NoneType' object does not support item assignment

So far I’ve encountered 3 corrupt tlog files out of 40. I’m not sure if this is a bug in pymavlink (mavlogdump.py should not crash) or something else. I’ve tried to trace through the pymavlink code but I’m hitting the limits of my MAVLink knowledge. I’m curious if anybody else is seeing this?

/Clyde

Hi @clyde, you have a knack for finding fascinating problems it seems :stuck_out_tongue:

After a bit of sleuthing:

  • MAVLink_message objects have their _instances attribute set to None by default
  • Your error line can only run if a message is received that has an instance field set (msg._instance_field is not None), and is of a type that’s been seen before in this set of messages
  • Your error can only occur if msg._instances is None, which implies the previous time the message type was seen it had a None value for its _instance_field (so was added directly to the messages dictionary via the “simple” case, without bothering to set up an _instances dictionary), but the current time it has a non-None _instance_field value which it then fails to put into the corresponding _instances None object which it expects to be a dictionary

This seems like a failure case that pymavlink could quite easily be robust to, but having a message that sometimes does and sometimes doesn’t have an instance field* specified is also very weird, and likely doesn’t make sense within the MAVLink specification.

*Note: I didn’t know what exactly “instance field” was referring to, but after looking through some of the message definitions it’s about what it sounds like - things like the “compass_id” field for compass messages, and the “name” field for NAMED_VALUE_FLOAT messages so that it’s possible to distinguish between different instances covered by the same broad message type.

Beyond that, as far as I can tell the instance field is a class attribute, so I’m not sure how two messages of the same type could end up with different values for the _instance_field attribute unless they’re not the same class (which could conceivably occur if a message gained an instance field in a more recent MAVLink version, since pymavlink dynamically changes the current MAVLink version depending on the messages it’s receiving/processing).

I tried finding messages that are in both v1.0 and v2.0 but have changed their instance field, and it turns out there are a couple of IMU messages that fit the bill:

from pymavlink.dialects.v10 import ardupilotmega as v10
from pymavlink.dialects.v20 import ardupilotmega as v20

v10_instance_fields = {msg_type: getattr(msg_class, 'instance_field', None)
                       for msg_type, msg_class in vars(v10).items()
                       if msg_type.startswith('MAVLink')}
v20_instance_fields = {msg_type: getattr(msg_class, 'instance_field', None)
                       for msg_type, msg_class in vars(v20).items()
                       if msg_type.startswith('MAVLink')}

for msg_type, old_instance_field in v10_instance_fields:
    new_instance_field = v20_instance_fields[msg_type]
    if old_instance_field != new_instance_field:
        print(f'{msg_type} - v10: {old_instance_field}; v20: {new_instance_field}')

which yielded

MAVLink_raw_imu_message - v10: None; v20: id
MAVLink_highres_imu_message - v10: None; v20: id

I suspect you’re running into something like this, but it’s still not super clear to me why that would be happening, since generally the autopilot should be consistently sending messages from the same MAVLink version. I wonder if there’s a BlueOS service that’s accidentally using MAVLink 1.0…

Out of interest,

  1. Does turning off --quiet allow you to find out which message type it’s failing on?
  2. Are you able to upload one breaking and one ok tlog file?

I suspect the simplest fix for your issue would be changing the mavutil mtype check to also check for _instances, e.g.

if mtype not in messages or messages[mtype]._instances is None:

If it works and you submit that as a PR to pymavlink it’s probably worth including a comment above that line with some context, or at least linking to this discussion in the PR description :slight_smile:

Wow, @EliotBR thank you for the detailed analysis! My MAVLink knowledge is increasing.

With that insight, I traced this w/ a debugger and set up a breakpoint on add_message with condition mtype=='DISTANCE_SENSOR', and it just so happened that the first DISTANCE_SENSOR message was a MAVLink_bad_data object with reason "invalid MAVLink CRC in msgID 132 0x2fb3 should be 0x6d00". As expected, _instance_field was None.

It doesn’t have to be the first message: any MAVLink_bad_data message will blow away the existing _instances list.

I upgraded to the latest pymavlink to 2.4.38 and it somehow “fixed” 3 of the 4 tlog files. One tlog file still throws an exception at the same spot.

I’ll file a bug against pymavlink and reference this conversation.

Thanks again,
/Clyde

P.S. – Exception in mavutil.py after bad_data multi-instance message · Issue #807 · ArduPilot/pymavlink · GitHub

2 Likes