Ping360 parsing recovery - Issue with save format or parser script

Issue

I’m currently trying to process some Ping360 scan files, but the Python script stops partway through parsing the file (this happens with multiple files). PingViewer pauses briefly at the same location, but is then able to continue and display the remaining pings.

I’m using logic from the example decoding script provided here by @patrickelectric, and have applied PR #95 to ping-python to continue after errors.

Potential Cause(s)

In my debugging I’ve found that the last message that gets read in has a really long length. That leads me to believe one of two things is happening. Either ping-viewer sometimes saves incorrectly, but has built-in logic to recover from such issues that isn’t used in the Python script (I went looking and couldn’t find anything like this), or the unpack_int method in Python is being applied incorrectly or is using an incorrect format to determine array lengths.

It is particularly confusing that after a complete message is registered and checked in line 143, the loop for that message keeps parsing characters instead of breaking to wait for the next message. That shouldn’t be necessary if the messages and timestamps are being read correctly from the file, and if multiple messages are somehow wrapped into one then this would fail to read the relevant timestamp for any message after the first.

As a side note, I also tried reading the .bin files in plain text, and found that some of the timestamps I can see in the file don’t get read at all by Python, so perhaps it’s reading too much for every message or something? That may also be from messages that aren’t data-getters (1300, 2300), but PingViewer definitely shows more times while scanning than Python prints (although I’m not sure if PingViewer currently displays real-time clock ticks or just received data message times).

Reproducing

I’ve put two example files here.

File                     Stop point     Last message length
20201208-120330402.bin   00:01:07.711   29978416
20201208-113540051.bin   00:04:38.281   15349698

Extra Info

In case it’s relevant, I’m on macOS 11.1 (Big Sur), and the .bin files were created on a Windows 10 machine. They get read in and displayed fine by ping-viewer on both the Windows and Mac machines, and I haven’t yet tried parsing with the script on Windows.

I’m a bit confused that ping-python’s PingMessage class for analysing the messages uses < (little-endianness), while the decode_sensor_binary_log script for getting timestamps and messages from the file uses > (big-endianness), but I imagine this isn’t the cause of the issue as several messages at the start get read in fine.

Note (code error): the binary file structure documentation specifies array-sizes are provided as big-endian uint32s, but the decoding script uses >1i which is big-endian signed 32-bit integers. Unfortunately changing to >1I (or just >I without the redundant 1) has no noticeable effect on the program results, which is expected anyway because none of the message or timestamp arrays should be anywhere near big enough to reach the sign bit.
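To illustrate the point about format strings, here is a small sketch using only Python's standard struct module. The byte values are made up for the example (they are not taken from the log files), and decode is a hypothetical helper name.

```python
import struct

# 1214 stored as four big-endian bytes, as an array length might be.
small = bytes([0x00, 0x00, 0x04, 0xBE])
# A value with the sign bit set, to show where the two formats diverge.
large = bytes([0xFF, 0xFF, 0xFF, 0xFE])


def decode(raw, fmt):
    """Unpack a single 4-byte integer using the given struct format."""
    (value,) = struct.unpack(fmt, raw)
    return value


# ">1i" and ">i" are the same format; ">I" only differs once the top bit is set.
# "<I" reads the same bytes in the opposite order and gives a very different number.
for fmt in (">1i", ">i", ">I", "<I"):
    print(fmt, decode(small, fmt), decode(large, fmt))
```

For small lengths the signed and unsigned big-endian formats agree, which matches the observation that switching between them changes nothing in practice. Reading with the wrong byte order, on the other hand, turns a small length into a huge one.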

What now?

I’m happy to provide extra information as required, and to help solve this. Unfortunately I haven’t been able to fix it myself yet, so couldn’t just post a PR with a fix. I wasn’t sure whether it made sense to raise this as an issue in the ping-viewer repo, because it’s not necessarily a bug with ping-viewer itself.

Correction on this, array lengths are indeed uint32s, but there are values within the header which are int32s, which should be parsed differently.

I’m trying to create a workaround for this that basically involves

if data_array_length > max_data_array_length:
    # assume invalid data received
    # -> read until next timestamp string
    # -> seek back in the file to the start of the timestamp string
    # -> extract and return (timestamp, message) pair starting from here
else:
    # -> extract message
    return timestamp, message

It likely won’t be super fast, but anyone post-processing ping data with Python is likely more interested in recovering all of the data than in getting less data quickly, so it seems a reasonable starting point.
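The outline above could be fleshed out roughly as follows. This is only a sketch under simplifying assumptions, not the script linked later in the thread: it assumes each record is a big-endian uint32 length plus an ASCII timestamp, followed by a big-endian uint32 length plus the message bytes. The names read_array, resync, read_pairs, BadLength and MAX_ARRAY_LENGTH are all hypothetical.

```python
import io
import re
import struct

MAX_ARRAY_LENGTH = 1220  # assumed sanity limit, not a confirmed protocol constant
TIMESTAMP = re.compile(rb"\d{2}:\d{2}:\d{2}\.\d{3}")  # e.g. 00:01:07.711
PREFIX_SIZE = 4  # big-endian uint32 length in front of each array


class BadLength(ValueError):
    """Raised when a length prefix is larger than any plausible array."""


def read_array(stream):
    """Read one length-prefixed array, or raise EOFError/BadLength."""
    prefix = stream.read(PREFIX_SIZE)
    if len(prefix) < PREFIX_SIZE:
        raise EOFError
    (length,) = struct.unpack(">I", prefix)
    if length > MAX_ARRAY_LENGTH:
        raise BadLength(length)
    data = stream.read(length)
    if len(data) < length:
        raise EOFError
    return data


def resync(stream, search_from):
    """Seek to the length prefix of the next timestamp found from search_from."""
    stream.seek(search_from)
    match = TIMESTAMP.search(stream.read())  # simple: reads the rest into memory
    if match is None:
        raise EOFError
    stream.seek(search_from + match.start() - PREFIX_SIZE)


def read_pairs(stream):
    """Yield (timestamp, message) pairs, skipping records that look corrupt."""
    while True:
        record_start = stream.tell()
        try:
            timestamp = read_array(stream)
            message = read_array(stream)
        except BadLength:
            try:
                # Search from one byte into this record's timestamp so the
                # same timestamp cannot match again (guarantees progress).
                resync(stream, record_start + PREFIX_SIZE + 1)
            except EOFError:
                return
            continue
        except EOFError:
            return
        yield timestamp.decode("ascii", errors="replace"), message
```

Calling list(read_pairs(open(path, "rb"))) on a log in this assumed layout would return every pair whose lengths pass the sanity check. A record with an implausible length is dropped, and reading resumes at the next timestamp-like string.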

In trying to determine a reasonable max_data_array_length, I figured I’d find the longest possible ping message and use its length. I’m figuring a 2300 message with 1200 samples is the longest currently possible, which would end up with max_data_array_length = 1214, or 1220 for future-proofing to allow for 2301 messages. I’m confused though, because in the ping-viewer code there seems to be a message-buffer length of 10240, with no reasoning provided for where that number is from.
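The arithmetic behind those two numbers can be checked with struct.calcsize. The field layouts below are my reading of the Ping360 message definitions and should be treated as assumptions: 2300 with a 14-byte fixed part, and 2301 with six extra bytes for start_angle, stop_angle, num_steps and delay. The names here are hypothetical.

```python
import struct

# Assumed fixed (non-sample) payload fields, little-endian with no padding.
# 2300: mode, gain_setting, angle, transmit_duration, sample_period,
#       transmit_frequency, number_of_samples, data_length
DEVICE_DATA_FIELDS = "<BBHHHHHH"
# 2301: as above, plus start_angle, stop_angle, num_steps, delay
AUTO_DEVICE_DATA_FIELDS = "<BBHHHHHHBBHH"

MAX_SAMPLES = 1200  # sample count used in the post


def max_payload(fields, samples=MAX_SAMPLES):
    """Payload size in bytes: the fixed fields plus one byte per sample."""
    return struct.calcsize(fields) + samples


print("2300:", max_payload(DEVICE_DATA_FIELDS))
print("2301:", max_payload(AUTO_DEVICE_DATA_FIELDS))
```

These are payload sizes only. If the array stored in the log holds the whole frame rather than just the payload, the header and checksum (which I believe are 8 and 2 bytes respectively) would need to be added to the limit.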

Should I work off messages being at max 1220 bytes long, or is there some future reason why 10240 is the number to work with?

So, not sure what exactly is wrong with the files, but my recovery method seems to work well (script here, PR here).

Just to make it more annoying to get working, seems like Windows adds a null byte (\x00) before every data byte (likely as part of the serialisation process when ping-viewer is saving the .bin), so my ‘search for timestamp’ regex, as well as the max message and timestamp lengths have to account for the possibility of an optional null byte every second character.
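As a rough sketch of what a null-tolerant timestamp search might look like (this is not the regex from the linked script, and the names NUL, DIGIT, TIMESTAMP and clean are hypothetical), an optional \x00 can be allowed in front of every character of the hh:mm:ss.zzz pattern:

```python
import re

NUL = rb"\x00?"        # optional null byte in front of a character
DIGIT = NUL + rb"\d"   # one digit, possibly null-prefixed

# hh:mm:ss.zzz, with an optional null byte before every character.
TIMESTAMP = re.compile(
    DIGIT * 2 + NUL + rb":" + DIGIT * 2 + NUL + rb":" + DIGIT * 2
    + NUL + rb"\." + DIGIT * 3
)


def clean(raw):
    """Drop any null bytes from a matched timestamp and decode it."""
    return raw.replace(b"\x00", b"").decode("ascii")


plain = b"00:01:07.711"
# UTF-16-BE happens to produce the null-before-every-byte layout for ASCII text.
wide = "00:01:07.711".encode("utf-16-be")

for sample in (plain, wide):
    match = TIMESTAMP.search(b"xx" + sample + b"yy")
    print(len(match.group()), clean(match.group()))
```

The matched span is 12 bytes in the plain case and 24 in the null-prefixed case, which is why any maximum timestamp or message length used for sanity checks has to be doubled when the nulls may be present.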

Anyway, this now reads to the ends of my files, with generally only a couple of recoveries throughout, so I can finally start analysing my data.