Systemd service hanging during startup

We are using a Raspberry Pi 3 from Blue Robotics as the basis for a custom underwater robot. The robot is controlled through a custom Python app that we’ve developed. When the app is started manually (logging in over ssh and running the command to start the app), everything works as intended. However, when the app is auto-started through a systemd service we created, it will sometimes hang indefinitely. The call stack where the app hangs is roughly as follows (most recent call last):

  1. mavutil.recv_match() call in our custom app code with blocking set to True
  2. the recv_match() definition in mavutil.py:
            if m is None:
                if blocking:
                    for hook in self.idle_hooks:
                        hook(self)
                    if timeout is None:
                        self.select(0.05) <---------------------
                    else:
                        self.select(timeout/2)
  3. the select() definition in mavutil.py:
        if self.fd is None:
            time.sleep(min(timeout,0.5))
            return True
        try:
            (rin, win, xin) = select.select([self.fd], [], [], timeout)
        except select.error: <--------------
            return False
        return len(rin) == 1

I’ve been able to attach pdb to the background process that our systemd service starts up and it is hanging at the except select.error line.
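For reference, the recv_match() code quoted above only gives up when a timeout is passed, so a blocking call with timeout left as None can loop forever if no message arrives. Below is a minimal sketch of a bounded wait. The helper name wait_for_message and its total_timeout and poll parameters are hypothetical, and conn is assumed to be a pymavlink connection object. This would not help if the process is truly stuck inside the select call itself.

```python
import time


def wait_for_message(conn, msg_type, total_timeout=10.0, poll=0.5):
    """Return the first matching message, or None if total_timeout elapses.

    `conn` is assumed to be a pymavlink connection, i.e. any object exposing
    recv_match(type=..., blocking=..., timeout=...).
    """
    deadline = time.monotonic() + total_timeout
    while time.monotonic() < deadline:
        # A finite timeout lets recv_match return None instead of looping forever.
        msg = conn.recv_match(type=msg_type, blocking=True, timeout=poll)
        if msg is not None:
            return msg
    return None
```

The caller can then decide what to do on None, for example log the failure and reopen the connection rather than waiting indefinitely.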

Through trying various things to get around this, the app still has issues auto-starting and complains about either not being able to communicate with a serial port or a file descriptor not being available. I believe both of these refer to the Pixhawk connected to the Pi.

The systemd service is set to require and start after the rc.local service, since the .companion.rc startup script is run by rc.local. We’ve also tried putting a 30 second sleep at the start of our systemd service, and that didn’t help. If the systemd service hangs, restarting the service using systemctl seems to fix it. This seems to imply that it’s some kind of race condition and our custom app is starting too soon. However, a 30 second sleep before running our custom app seems like it should be enough time to let any background process finish starting.
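If the problem is a startup race, an alternative to a fixed sleep is to poll until the serial device node actually exists before opening the connection. A minimal sketch, assuming the Pixhawk shows up as a device file; the helper name wait_for_device and the path in the usage note are hypothetical and would need to match the actual setup.

```python
import os
import time


def wait_for_device(path, timeout=60.0, poll=0.5):
    """Return True once `path` exists, False if it never appears in time."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

For example, the app could call wait_for_device("/dev/ttyACM0") (an assumed path) and only then create the MAVLink connection. This only confirms the device node is present, not that another process has finished using the port.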

I’ve run out of things to try at the moment and was hoping there might be some leads to look into. Thanks in advance for any help!

Hello,

That looks unusual. Is your service trying to connect to the Pixhawk via the Serial port? It looks like it from the error you got.
But the freeze happening in the exception is also interesting.

The process is hanging, but not crashing, right? If it crashes, you could try something like systemd’s restart option.
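If the process hangs instead of crashing, one way to still make use of a restart option is to turn a stall into a non-zero exit. The sketch below is a simple in-process watchdog, not something from mavutil. The Watchdog class and its feed() method are hypothetical names, and it assumes the hung call releases the GIL (as select does) so the watchdog thread can still run.

```python
import os
import threading
import time


class Watchdog:
    """Call on_expire if feed() is not called for `timeout` seconds."""

    def __init__(self, timeout, on_expire=None):
        self.timeout = timeout
        # Default action: exit immediately with status 1 so systemd sees a failure.
        self.on_expire = on_expire or (lambda: os._exit(1))
        self._last_fed = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._last_fed = time.monotonic()
        self._thread.start()

    def feed(self):
        # Call this whenever the main loop makes progress (e.g. a message arrives).
        self._last_fed = time.monotonic()

    def stop(self):
        self._stop.set()

    def _run(self):
        # Event.wait returns True once stop() has been called, ending the loop.
        while not self._stop.wait(self.timeout / 4):
            if time.monotonic() - self._last_fed > self.timeout:
                self.on_expire()
                return
```

With the default action, a stalled app exits with status 1, and a unit with Restart=on-failure in its [Service] section would then start it again without anyone needing to log in.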

Have you tried different versions of mavproxy and python?

Hi! Thanks for getting back to me. The service runs our custom app, which does connect to the Pixhawk via serial over USB. Yeah, the process is hanging, and if I restart the service manually using systemctl it works. However, we sometimes hand off the vehicle to non-technical people who don’t know about Linux/ssh/systemd, so we are trying to find a way to make the startup service reliable, i.e. they can power on the vehicle, the service starts, and they can start using our software to control the vehicle.

I have not tried a different version of mavproxy yet, but that sounds like something I should look into. Would using a newer version cause any clashes with the rest of the software stack that came pre-installed on the Pi?

Yeah, the app hanging on the except line is strange. I’ve never seen that happen before and I’m not able to inspect what the exception actually is since it hangs before executing that line.

Right, changing mavproxy system-wide is likely to cause issues. You could install it into a virtual environment to make sure nothing bad happens.

Using a different Python version will probably be safer. I’d try the latest mavproxy with Python 3.