Sub locks up when in depth hold - unarmed (and armed)

This happened in the water, but we can reproduce it in the shop. When the sub is powered up but not armed and we enter depth hold, the sub seems to lock up. Nearly all the messages from the sub seem to stop, and changing back to manual mode can take anywhere from 20 seconds to several minutes.

Stabilize does not have this problem and the code for each mode is almost exactly the same with the exception of a call to relax_z_controller.

I’m working with tag ArduSub-4.1.0

Hi @Tautala

I’m seeing similar behavior that I can’t reliably reproduce on the bench: many (all?) MAVLink messages seem to stop after a while. The obvious problem during a dive is that the HEARTBEAT messages seem to stop and I have to restart the autopilot in BlueOS. I will try to repro with depth hold + unarmed.

Is this custom firmware built from the ArduSub 4.1.0 tag? Have you tried it with the stable 4.1.0 build from BR? I’m trying to see if my toolchain is the culprit.

Are you running BlueOS? I was thinking it might be related to mavlink-routerd and my endpoint configuration, since that is also restarted, but this might be a red herring.

/Clyde

It was custom built. I just tried the factory default version which is probably 4.0.3 and I did not have the problem. I will see what differs between the two versions.

It is very odd that depth hold and stabilize are so similar code-wise yet don’t share this problem.

I’ll try a clean build of 4.1.0.


OK, I’ve done a lot of testing, and I’m not sure that we have the same problem. But I’ll report my symptoms here and see if anybody else has some insights.

I found a simple repro case:

  • install BlueOS 1.1.0 beta 29
  • install sub firmware for Navigator from the cloud: select DEV binary (I tested sha b57d1712)
  • launch QGC and leave the sub in MANUAL, disarmed (the same happens in any mode, armed or disarmed)
  • wait 16 minutes
  • QGC will report “lost connection”

QGC complains because the HEARTBEAT messages have stopped arriving, but ArduSub is still running. I’m not seeing any crashes or errors of note reported anywhere.
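For anyone who wants to pin down exactly when the heartbeats stop in a log, here is a minimal sketch in Python. It assumes you have already extracted the HEARTBEAT arrival times (in seconds) from a tlog by some other means. The function name `first_heartbeat_gap` and the 5 second timeout are my own illustrative choices, not anything taken from QGC or ArduSub.

```python
from typing import Iterable, Optional


def first_heartbeat_gap(
    arrival_times: Iterable[float],
    timeout_s: float = 5.0,          # assumed threshold, not QGC's actual value
    end_time: Optional[float] = None,
) -> Optional[float]:
    """Return the timestamp of the last heartbeat seen before the first
    silence longer than timeout_s, or None if no such silence exists.

    arrival_times must be in ascending order. end_time, if given, is the
    time the recording stopped, so a trailing silence is also detected.
    """
    previous: Optional[float] = None
    for t in arrival_times:
        # A gap between two consecutive heartbeats that exceeds the timeout.
        if previous is not None and t - previous > timeout_s:
            return previous
        previous = t
    # Heartbeats stopped and never resumed before the recording ended.
    if previous is not None and end_time is not None:
        if end_time - previous > timeout_s:
            return previous
    return None


if __name__ == "__main__":
    # Hypothetical data: 1 Hz heartbeats for 16 minutes, then silence.
    times = [float(i) for i in range(16 * 60)]
    print(first_heartbeat_gap(times, end_time=1000.0))
```

Comparing the returned timestamp against the BlueOS system logs around the same moment may help narrow down what else changed when the messages stopped.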

AFAICT, binaries built from the Sub-4.1 branch work well. It’s only the binaries built from commits near the tip of master that run into trouble.

AFAICT, all binaries (including DEV) run fine on BlueOS 1.0.1. I have seen problems on BlueOS 1.1.0 beta 17, 23, 28 and 29.

Since this is the DEV binary, I’ve ruled out problems with my toolchain. I’m thinking of flashing the SD card to see if there’s some dependency there. Or it could be my Pi, or Navigator, etc.

/Clyde


Hi @clyde ,

Thanks a lot for reporting.

Have you noticed whether the heartbeat in BlueOS still blinks when this happens?
Also, some system logs would help a lot in understanding what is going on. Can you share them?

Thanks for looking at this @williangalvani

The BlueOS heart stops beating at the same time as QGC announces “communication lost.”

I just ran a test and gathered the dataflash (BIN), tlog and all of the text files in the BlueOS File Browser under ‘system_logs’. I’ll DM you the link.
