Research on ROV visual autonomous obstacle avoidance navigation based on deep learning


How can I use a Python script to control ROV navigation with the BlueOS system? (My goal is visual obstacle avoidance for the ROV: if feasible, I want to use BlueOS to extract and parse the OAK video stream, and then use path planning to control the ROV's course (up, down, left, right, rotation, acceleration, deceleration).)

Hi @akang-211 -
You’ll need to develop a significant amount of software to make that work. Generally:

  1. Python script receives and processes the Oak depth stream, identifying obstacles or making whatever decisions to create the desired “pilot” output.
  2. The script sends control signals to the autopilot via pymavlink
  3. The vehicle moves, and steps 1 and 2 are repeated as desired (a rough sketch of this loop is included below).
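
As a rough sketch only (not a complete solution), the loop could be structured something like this, where get_depth_frame() and plan_motion() are hypothetical placeholders for the OAK processing and decision logic, and the RC-override call is the same pymavlink approach shown later in this thread.

# Minimal sketch of the perception -> decision -> control loop described above.
# get_depth_frame() and plan_motion() are hypothetical placeholders; the
# RC-override approach is shown in full later in this thread.
import time
from pymavlink import mavutil

master = mavutil.mavlink_connection('udpin:0.0.0.0:14550')
master.wait_heartbeat()

def send_rc_override(channel_id, pwm):
    # Channels are 1-indexed; 65535 means "ignore" for that channel
    values = [65535] * 18
    values[channel_id - 1] = pwm
    master.mav.rc_channels_override_send(
        master.target_system, master.target_component, *values)

def get_depth_frame():
    # Placeholder: read a frame from the OAK depth stream
    return None

def plan_motion(depth_frame):
    # Placeholder: choose an RC channel and PWM value from the obstacle data,
    # e.g. channel 5 (forward) at a gentle speed
    return 5, 1600

while True:
    depth_frame = get_depth_frame()          # step 1: perceive
    channel, pwm = plan_motion(depth_frame)  # step 1: decide
    send_rc_override(channel, pwm)           # step 2: command the autopilot
    time.sleep(0.1)                          # step 3: repeat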

I have set up the RPi4, but I still can’t install the OAK-D extension. What could be the problem?

Hi @akang-211 -
It looks like you are being blocked from reaching GitHub. This can happen in some countries, or if your IP address has been blocked. I’d recommend trying from another source of internet, like a cell-phone hotspot.

Are you able to install any other extensions?

I can’t tell what version of BlueOS you’re running from the text in the lower left - is it not a standard stable or beta version?

I can’t install any extensions; I have BlueOS 1.3.1 installed.

Hi @akang-211
The issue is likely your internet connection / IP being blocked. Can you try another source of internet?

Please verify from the terminal that you can ping the internet, such as the Google DNS server:

ping 8.8.8.8

Hi @akang-211 -
It looks like you have internet access, but I still suspect a firewall is blocking you from reaching GitHub. Can you try another source of internet?

Why does the video stream disappear after upgrading Cockpit?



I have installed the OAK extension; how can I get the OAK video stream?

I can see the video streams now. The RGB video stream of the OAK camera is displayed in my Cockpit client, and the disparity (depth) video stream is displayed in BlueOS in the browser. How can I make them display together in the Cockpit client?


Hi @akang-211 -
Glad you resolved your video / configuration issues. You can learn more about how to configure extra video widgets here, and you can set up additional views to have different full-screen or picture-in-picture configurations of the video widgets - see the earlier portions of that video.


Thank you for your reply. I have successfully displayed three video streams in one window. Next, I want to run DepthAI with the OAK camera. I already have the DepthAI code on my computer (and it has run with the OAK camera before). How can I use the OAK camera to display the following pictures on my computer?

Hi @akang-211 -
You may find that you need to calibrate the camera once immersed, if the depth image ceases to function well.

It’s tough to recommend the best way to run the software you shared without any links to an install process or a description of what is being run - that looks like a Windows program?

I googled DepthAI, and assuming you’re talking about the API provided by the camera manufacturer, it appears they have install instructions for Ubuntu. Setting up a GitHub repository to build an extension, and adjusting the Dockerfile to install the necessary dependencies and run that install process, can get this software running on BlueOS. Fork this repository to get started! Then create a Docker Hub account and follow along with the extension documentation. If it makes the video streams you pictured available on a port, it should be possible to embed them in Cockpit via an iFrame widget!
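
As a rough sketch of that last idea (Flask and the get_latest_frame() helper here are illustrative assumptions, not part of BlueOS or the DepthAI API): the extension could serve the processed frames as an MJPEG stream on a port, and that page could then be embedded in Cockpit as an iFrame widget.

# Sketch only: serve annotated frames as an MJPEG stream over HTTP so that
# Cockpit can embed the stream via an iFrame widget.
# Flask and get_latest_frame() are illustrative assumptions.
import cv2
from flask import Flask, Response

app = Flask(__name__)

def get_latest_frame():
    # Placeholder: return the most recent annotated BGR frame (numpy array)
    raise NotImplementedError

def mjpeg_generator():
    while True:
        frame = get_latest_frame()
        ok, jpeg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg.tobytes() + b"\r\n")

@app.route("/stream")
def stream():
    return Response(mjpeg_generator(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)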

This thread may be of interest, to see and refer to what others have done previously.

If you’re wanting to draw custom annotations on the image then you’ll need to either do that on the camera/RPi side and encode the annotated images into the video stream that gets sent to the topside, or send the annotation data separately and use it to draw the desired annotations at the receiving end.

The latter approach is likely preferable in the long term, although it could be somewhat complicated at the moment in software like Cockpit, because it’s currently lacking dedicated widget overlay support. That said, it should already be possible for an OAK camera BlueOS Extension to provide an annotation-drawing widget to Cockpit as an iframe with a clear background, which could then be manually moved and resized to fit over the relevant video stream widget :slight_smile:

# coding=utf-8
from datetime import datetime
from pathlib import Path

import cv2
import depthai as dai
import numpy as np
from utils import getDeviceInfo, non_max_suppression, FPSHandler, process_mask, toTensorResult

ROOT = Path(__file__).parent


# Configuration parameters
nnWidth, nnHeight = 320, 320
rgbWeight = 0.4
MAX_DEPTH = 9000
MIN_DEPTH = 200

labelMap = [
    "person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
    "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie",
    "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
    "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
    "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich",
    "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor",
    "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
    "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
    "teddy bear", "hair drier", "toothbrush"
]

def create_pipeline():
    blob = ROOT.joinpath("yolov8n-seg.blob")
    model = dai.OpenVINO.Blob(blob)
    dim = next(iter(model.networkInputs.values())).dims
    nnWidth, nnHeight = dim[:2]

    pipeline = dai.Pipeline()

    # Define pipeline nodes
    camRgb = pipeline.create(dai.node.ColorCamera)
    monoLeft = pipeline.create(dai.node.MonoCamera)
    monoRight = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    detectionNN = pipeline.create(dai.node.NeuralNetwork)
    xoutRgb = pipeline.create(dai.node.XLinkOut)
    xoutNN = pipeline.create(dai.node.XLinkOut)
    xoutDepth = pipeline.create(dai.node.XLinkOut)

    xoutRgb.setStreamName("rgb")
    xoutNN.setStreamName("detections")
    xoutDepth.setStreamName("depth")

    # RGB camera configuration
    camRgb.setPreviewSize(nnWidth, nnHeight)
    camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    camRgb.setBoardSocket(dai.CameraBoardSocket.CAM_A)
    camRgb.setInterleaved(False)
    camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
    camRgb.setFps(30)

    # Stereo (mono) camera configuration
    monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
    monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
    monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

    # Stereo depth configuration
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    stereo.setLeftRightCheck(True)
    stereo.setSubpixel(True)
    stereo.setDepthAlign(dai.CameraBoardSocket.CAM_A)
    stereo.setOutputSize(nnWidth, nnHeight)

    # Depth post-processing
    stereo.initialConfig.setMedianFilter(dai.MedianFilter.KERNEL_7x7)
    config = stereo.initialConfig.get()
    config.postProcessing.speckleFilter.enable = False
    config.postProcessing.temporalFilter.enable = True
    config.postProcessing.spatialFilter.enable = True
    config.postProcessing.thresholdFilter.minRange = MIN_DEPTH
    config.postProcessing.thresholdFilter.maxRange = MAX_DEPTH
    stereo.initialConfig.set(config)

    # Neural network configuration
    detectionNN.setBlob(model)
    detectionNN.setNumInferenceThreads(2)

    # Link the nodes
    monoLeft.out.link(stereo.left)
    monoRight.out.link(stereo.right)
    stereo.depth.link(xoutDepth.input)
    camRgb.preview.link(detectionNN.input)
    detectionNN.passthrough.link(xoutRgb.input)
    detectionNN.out.link(xoutNN.input)

    return pipeline


def run():
    with dai.Device(create_pipeline(), getDeviceInfo()) as device:
        qRgb = device.getOutputQueue("rgb", 4, False)
        qDet = device.getOutputQueue("detections", 4, False)
        qDepth = device.getOutputQueue("depth", 4, False)

        # State / control variables
        last_detections = []
        last_protos = []
        show_grayscale = False  # default to color mode
        BLUE_COLOR = (255, 191, 0)  # blue, in BGR order
        fpsHandler = FPSHandler()

        cv2.namedWindow("Triple View")
        cv2.createTrackbar('RGB Weight', 'Triple View', 40, 100, lambda x: globals().update(rgbWeight=x / 100))

        # Video recording (works in either display mode)
        videoWriter = cv2.VideoWriter(
            ROOT.joinpath(f"output_{datetime.now().strftime('%Y%m%d_%H%M%S')}.mp4").as_posix(),
            cv2.VideoWriter_fourcc(*"mp4v"), 20, (1920, 480)
        )

        while True:

            # Handle keyboard input
            key = cv2.waitKey(1)
            if key == ord('g'):
                show_grayscale = not show_grayscale
            elif key == ord('q'):
                videoWriter.release()
                break

            # Fetch data from the output queues
            inRgb = qRgb.get()
            inDepth = qDepth.get()
            inDet = qDet.tryGet()

            # Process the RGB frame
            frame = inRgb.getCvFrame()
            rgbDisplay = frame.copy()
            blendBase = frame.copy()

            # Process depth data (supports mode switching)
            depthFrame = inDepth.getFrame()
            validDepth = np.where((depthFrame >= MIN_DEPTH) & (depthFrame <= MAX_DEPTH), depthFrame, 0)

            if show_grayscale:
                # Grayscale display mode
                depthNormalized = cv2.normalize(validDepth, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
                depthDisplay = cv2.cvtColor(255 - depthNormalized, cv2.COLOR_GRAY2BGR)
            else:
                # Color display mode
                depthNormalized = cv2.normalize(validDepth, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
                depthDisplay = cv2.applyColorMap(255 - depthNormalized, cv2.COLORMAP_JET)

            depthDisplay = cv2.resize(depthDisplay, (nnWidth, nnHeight))

            # Update detection results
            if inDet is not None:
                tensor = toTensorResult(inDet)
                last_detections = non_max_suppression(tensor["output0"], 0.5, nc=80)[0]
                last_protos = tensor.get("output1", [np.zeros((32, 160, 160))])[0]

            # Apply segmentation masks
            h, w = frame.shape[:2]
            depthOverlay = cv2.resize(depthDisplay, (w, h))

            if len(last_detections) > 0:
                masks = process_mask(last_protos, last_detections[:, 6:], last_detections[:, :4], (h, w), upsample=True)

                for mask, det in zip(masks, last_detections):
                    clsId = int(det[5])
                    conf = det[4]
                    bbox = list(map(int, det[:4]))

                    # Compute the depth value at the box centre
                    cx = np.clip((bbox[0] + bbox[2]) // 2, 0, w - 1)
                    cy = np.clip((bbox[1] + bbox[3]) // 2, 0, h - 1)
                    depthVal = depthFrame[cy, cx] if MIN_DEPTH <= depthFrame[cy, cx] <= MAX_DEPTH else 0

                    # Overlay the blue mask
                    alpha = 0.5
                    for display in [rgbDisplay, depthOverlay, blendBase]:
                        display[mask > 0.5] = display[mask > 0.5] * (1 - alpha) + np.array(BLUE_COLOR) * alpha
                        cv2.rectangle(display, (bbox[0], bbox[1]), (bbox[2], bbox[3]), BLUE_COLOR, 2)
                        cv2.putText(display, f"{labelMap[clsId]} {conf:.2f}",
                                    (bbox[0], bbox[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, BLUE_COLOR, 2)
                        cv2.putText(display, f"{depthVal}mm",
                                    (bbox[0], bbox[1] - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

            # Generate the fused (blended) view
            blended = cv2.addWeighted(blendBase, rgbWeight, depthOverlay, 1 - rgbWeight, 0)

            # Compose the combined display
            displayRGB = cv2.resize(rgbDisplay, (640, 480))
            displayDepth = cv2.resize(depthOverlay, (640, 480))
            displayBlend = cv2.resize(blended, (640, 480))

            tripleView = cv2.hconcat([displayRGB, displayDepth, displayBlend])

            # Add view titles
            titles = ["RGB View", "Depth View", "Fusion View"]
            for i, title in enumerate(titles):
                cv2.putText(tripleView, title, (10 + 640 * i, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

            # Add a mode status hint
            mode_text = "GRAYSCALE" if show_grayscale else "COLOR"
            cv2.putText(tripleView, f"Mode: {mode_text} (Press G to toggle)",
                        (10, 460), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

            # Display and record
            cv2.imshow("Triple View", tripleView)
            videoWriter.write(tripleView)

        # Release resources after exiting the loop
        cv2.destroyAllWindows()
        videoWriter.release()


if __name__ == "__main__":
    run()

This is the code that runs when the OAK camera is connected to a computer; it achieves the effects shown below.


How can I deploy the above code on BlueOS, or how can I otherwise implement these functions (depth estimation + instance segmentation) on BlueOS?


Hi @akang-211 -
Congrats on getting the code working!
The next step is to package it as a BlueOS extension. You’ll fork an existing GitHub example repository, add your code to it, and configure the GitHub Action to push to Docker Hub. Once the image builds successfully, you’ll be able to install your extension in BlueOS manually and test it out!
This process is all detailed here, but a tutorial-style guide is in the works. Let us know your questions if you get stuck!
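
One note that may matter in practice: the code above uses cv2.imshow and cv2.waitKey, which won’t work inside a headless Docker container on the Raspberry Pi. Below is a rough sketch of a headless variant of run(), where publish_frame() is a hypothetical placeholder (for example, pushing the composed image to an MJPEG/HTTP endpoint that the extension exposes).

# Sketch: headless variant of run() for use inside a BlueOS extension
# container (no display, so no cv2.imshow / cv2.waitKey).
# publish_frame() is a hypothetical placeholder.
def run_headless(publish_frame):
    with dai.Device(create_pipeline(), getDeviceInfo()) as device:
        qRgb = device.getOutputQueue("rgb", 4, False)
        qDet = device.getOutputQueue("detections", 4, False)
        qDepth = device.getOutputQueue("depth", 4, False)

        while True:
            frame = qRgb.get().getCvFrame()
            depthFrame = qDepth.get().getFrame()
            inDet = qDet.tryGet()
            # ...same detection, mask, and blending logic as in run(),
            # just without the OpenCV window and keyboard handling...
            publish_frame(frame)  # hand the composed image to the web side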

"""
Example of how to use RC_CHANNEL_OVERRIDE messages to force input channels in ArduPilot. These effectively replace the input channels (from a joystick or radio), not the output channels going to the thrusters and servos.
"""

# Import mavutil
from pymavlink import mavutil
import time
# Create the connection
master = mavutil.mavlink_connection('udpin:0.0.0.0:14550')
# Wait a heartbeat before sending commands
master.wait_heartbeat()

# Create a function to send RC values
# More information about Joystick channels
# here: https://www.ardusub.com/operators-manual/rc-input-and-output.html#rc-inputs
def set_rc_channel_pwm(channel_id, pwm=1500):
    """ Set RC channel pwm value
    Args:
        channel_id (TYPE): Channel ID
        pwm (int, optional): Channel pwm value 1100-1900
    """
    if channel_id < 1 or channel_id > 18:
        print("Channel does not exist.")
        return

    # Mavlink 2 supports up to 18 channels:
    # https://mavlink.io/en/messages/common.html#RC_CHANNELS_OVERRIDE
    rc_channel_values = [65535 for _ in range(18)]
    rc_channel_values[channel_id - 1] = pwm
    master.mav.rc_channels_override_send(
        master.target_system,                # target_system
        master.target_component,             # target_component
        *rc_channel_values)                  # RC channel list, in microseconds.


# Pitch control (channel 1: >1500 pitches up, <1500 pitches down)
#set_rc_channel_pwm(1, 1600)
#time.sleep(0.1)

# Roll control (channel 2: >1500 rolls right, <1500 rolls left)
#set_rc_channel_pwm(2, 1600)

# Yaw control (channel 4: >1500 yaws right, <1500 yaws left)
set_rc_channel_pwm(4, 1600)

# Forward/backward control (channel 5: >1500 is forward, <1500 is backward)
#while True:
#    set_rc_channel_pwm(5, 1600)
#    time.sleep(0.2)

# Lateral (left/right) control (channel 6)
#set_rc_channel_pwm(6, 1600)

# The camera pwm value sets the servo sweep speed from the current angle
# toward the minimum/maximum camera angle. It does not set a servo position.
# Set camera tilt to 45º (maximum) at full speed

# Set channel 12 to 1500 us
# This can be used to control a device attached to a servo output by setting
# SERVO[N]_Function to RCIN12 (where N is one of the PWM outputs)
#set_rc_channel_pwm(12, 1500)

I tried to use the above code to control the ROV, but did not see any movement. How can I use the code to control the ROV correctly (pitch, roll, yaw, forward/backward, left/right)?
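
One thing to check with the RC-override approach above: ArduSub will generally not drive the thrusters while disarmed, and a single RC_CHANNELS_OVERRIDE message is usually not enough on its own; the override needs to be re-sent continuously for as long as the motion should continue. A minimal sketch, assuming the same master connection and set_rc_channel_pwm() defined above, and that MANUAL mode is appropriate for the test:

# Sketch: arm the vehicle and stream a gentle yaw-right override for 5 seconds,
# then return the channel to neutral and disarm.
import time

master.set_mode('MANUAL')        # select a mode that accepts manual control
master.arducopter_arm()          # overrides are ignored while disarmed
master.motors_armed_wait()

start = time.time()
while time.time() - start < 5:
    set_rc_channel_pwm(4, 1600)  # channel 4 = yaw, slightly right of neutral
    time.sleep(0.1)              # keep refreshing the override

set_rc_channel_pwm(4, 1500)      # back to neutral
master.arducopter_disarm()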