YOLOv8 + ByteTrack integration issues #1320

Open
1 task done
ddrisco11 opened this issue Jul 1, 2024 · 6 comments
Labels
question Further information is requested

Comments

@ddrisco11

Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Question

Hello! I'm currently building a program to detect deep-sea creatures in submarine video. I am using YOLOv8 to make detections and ByteTrack to assign object IDs to those detections. My output includes both an annotated video (based exclusively on the YOLO output) and a CSV file of all distinct detections (where "distinct" is determined by ByteTrack). I am having an issue where certain creatures are annotated in the video output, i.e. detected by YOLO, but omitted from the CSV output, i.e. never assigned a tracking ID by ByteTrack. Please help! Thanks!

Additional

# Imports assumed by this snippet; helper functions such as get_location_data,
# time_to_seconds, process_frame, get_location_at_time, save_detection_image,
# seconds_to_time_str, and the SOURCE_VIDEO_PATH constant are defined elsewhere
# in the poster's full code.
import cv2
import supervision as sv
from supervision import ByteTrack
from tqdm import tqdm
from ultralytics import YOLO

def process_video(video_path: str, output_path: str, model_path: str, location_path: str, start_time: str,
                  time_col: int, lat_col: int, lon_col: int, depth_col: int, salinity_col: int,
                  oxygen_col: int, altitude_col: int, confidence_threshold: float, iou_threshold: float,
                  track_activation_threshold: float, minimum_matching_threshold: float,
                  lost_track_buffer: int, frame_rate: int, min_box_area: int, aspect_ratio_thresh: float):
    """Process the video to track objects and save tracking data."""
    model = YOLO(model_path)
    tracker = ByteTrack(
        track_activation_threshold=track_activation_threshold,
        minimum_matching_threshold=minimum_matching_threshold,
        lost_track_buffer=lost_track_buffer
    )
    location_data = get_location_data(location_path, time_col, lat_col, lon_col, depth_col, salinity_col, oxygen_col, altitude_col)
    start_time_seconds = time_to_seconds(start_time)

    cap = cv2.VideoCapture(video_path)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path.replace('.csv', '.mp4'), fourcc, fps, (width, height))

    tracking_info = {}
    pbar = tqdm(total=frame_count, desc='Processing frames', leave=True, mininterval=10)

    frame_index = 0
    cached_boxes = None
    cached_labels = None
    results = None

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            current_time = start_time_seconds + (frame_index / fps)
            lat, lon, depth, salinity, oxygen, altitude = get_location_at_time(location_data, current_time)

            if frame_index % 5 == 0:  # Run detection only on every 5th frame
                results = process_frame(frame, model, confidence_threshold, iou_threshold)
                cached_boxes = results.boxes.xyxy.cpu().numpy()  # .cpu() so this also works when inference runs on a GPU
                names = model.names  # Class names
                labels = results.boxes.cls.cpu().numpy().astype(int)  # Integer class indices

                cached_labels = [
                    f"{names[label]} {round(float(confidence), 2)}"
                    for label, confidence in zip(labels, results.boxes.conf.cpu().numpy())
                ]

            # Draw bounding boxes using cached detections and labels
            annotated_frame = frame.copy()
            if cached_boxes is not None and cached_labels is not None:
                drawn_boxes = set()  # Avoid drawing the same box twice
                for box, label in zip(cached_boxes, cached_labels):
                    x1, y1, x2, y2 = map(int, box)  # Box coordinates
                    class_name = label.split()[0]  # Class name from the label

                    if (x1, y1, x2, y2) not in drawn_boxes:
                        # Red rectangle (BGR: (0, 0, 255)) with thicker lines (thickness=3)
                        cv2.rectangle(annotated_frame, (x1, y1), (x2, y2), (0, 0, 255), 3)
                        # Label text in red above the box
                        cv2.putText(annotated_frame, class_name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 2)
                        drawn_boxes.add((x1, y1, x2, y2))

            # Write the frame to the output video
            out.write(annotated_frame)

            if frame_index % 5 == 0:
                detections = sv.Detections.from_ultralytics(results)
                detections = tracker.update_with_detections(detections)

                for index in range(len(detections.class_id)):
                    object_id = detections.tracker_id[index]
                    class_name = model.names[int(detections.class_id[index])]
                    confidence = detections.confidence[index]

                    if object_id not in tracking_info:
                        image_path = save_detection_image(frame, detections[index], object_id, current_time, SOURCE_VIDEO_PATH)
                        tracking_info[object_id] = {
                            'Class': class_name,
                            'Confidence': confidence,
                            'Start Time': seconds_to_time_str(int(current_time)),
                            'End Time': seconds_to_time_str(int(current_time)),
                            'Latitude': lat,
                            'Longitude': lon,
                            'Depth': depth,
                            'Salinity': salinity,
                            'Oxygen': oxygen,
                            'Altitude': altitude,
                            'Image Path': image_path,
                            'All Classes': [class_name]
                        }
                    else:
                        tracking_info[object_id]['End Time'] = seconds_to_time_str(int(current_time))
                        tracking_info[object_id]['Latitude'] = lat
                        tracking_info[object_id]['Longitude'] = lon
                        tracking_info[object_id]['Depth'] = depth
                        tracking_info[object_id]['Salinity'] = salinity
                        tracking_info[object_id]['Oxygen'] = oxygen
                        tracking_info[object_id]['Altitude'] = altitude
                        tracking_info[object_id]['All Classes'].append(class_name)

            pbar.update(1)
            frame_index += 1
    finally:
        # The snippet as posted ended with a bare try:, which is a syntax error;
        # releasing resources here keeps the excerpt runnable.
        pbar.close()
        cap.release()
        out.release()
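Worth noting about the snippet above: the video is annotated from the raw YOLO output, while the CSV rows come from the tracker, so the two outputs can legitimately disagree. Below is a minimal sketch (not part of the original post) of driving the annotation from the tracked detections instead, assuming supervision's BoxAnnotator and LabelAnnotator, so both outputs describe the same objects:

# Sketch: annotate from tracker output rather than raw YOLO output, so the
# video and the CSV agree on which objects exist. Assumes supervision's
# BoxAnnotator / LabelAnnotator, which accept sv.Detections with tracker_id set.
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

detections = sv.Detections.from_ultralytics(results)
detections = tracker.update_with_detections(detections)

labels = [
    f"#{tracker_id} {model.names[int(class_id)]} {confidence:.2f}"
    for tracker_id, class_id, confidence
    in zip(detections.tracker_id, detections.class_id, detections.confidence)
]
annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)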
@ddrisco11 added the question label on Jul 1, 2024
@rolson24
Contributor

rolson24 commented Jul 2, 2024

Hi @ddrisco11!

This could be caused by several things. Could you upload a test video and the model weights to Google Drive and share the link so we can reproduce it?
My initial thought is that the creatures are being detected inconsistently and the object tracker is struggling to re-find tracks that have been lost for several frames, but there is no way to verify without running your model on your test video.

Thanks

@ddrisco11
Author

Hi @rolson24, thanks for the response! Here is a link to a short training video, the full code, and the model weights I am using. Please let me know if you need anything else. https://drive.google.com/drive/folders/11u0m7Koew1D7lPEZngvfSR762Rvz0BNr?usp=drive_link

@rolson24
Contributor

rolson24 commented Jul 3, 2024

Thanks for the code, model weights, and video. I have done a few tests, and it looks like there are a few things going on.

First off, it looks like there is a small bug in supervision that makes the tracker IDs skip several numbers. This can be worked around by setting minimum_consecutive_frames=2 for now. This may be part of your confusion.
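A minimal sketch of that workaround, assuming a supervision version whose ByteTrack exposes minimum_consecutive_frames:

# Workaround sketch: require a detection to persist for 2 consecutive frames
# before a track (and its tracker_id) is created.
tracker = sv.ByteTrack(
    track_activation_threshold=track_activation_threshold,
    minimum_matching_threshold=minimum_matching_threshold,
    lost_track_buffer=lost_track_buffer,
    minimum_consecutive_frames=2,
)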

The other part is that the tracker relies on high-confidence detections to decide whether it should create a new track. Most of the detections from your model have confidence values below 0.3, whereas a well-performing model generally produces confidence values around 0.8. The confidence values greatly affect how the tracker performs, because it uses them as a measure of how likely a detection is to be detected again in the next frame, and thus whether it should become an object to track. To increase the confidence values of the detections you will need more training data. I would recommend adding image augmentations to your existing training data, and considering a more powerful foundation model like DETIC with Roboflow autodistill to automatically label images and then train your smaller YOLOv8 model on those labeled images.
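If you want to verify this on your own output, here is a rough diagnostic sketch (not from the original comment; it reuses model and video_path from the snippet above):

# Diagnostic sketch: collect every detection's confidence across the video to
# see where the scores sit relative to the tracker's activation threshold.
import numpy as np

confidences = []
cap = cv2.VideoCapture(video_path)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    result = model(frame, verbose=False)[0]
    confidences.extend(result.boxes.conf.cpu().numpy().tolist())
cap.release()

confidences = np.array(confidences)
print(f"median confidence: {np.median(confidences):.2f}")
print(f"share below 0.3: {np.mean(confidences < 0.3):.0%}")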

The final thing you can try is to reduce minimum_matching_threshold. This parameter sets the minimum similarity required for an existing track to be matched to a new detection; it essentially combines the detection's confidence and the amount of overlap between the track and the detection into one number. Reducing the threshold lets the tracker follow objects with lower-confidence detections, but it also risks matching a detection that merely happens to overlap an existing track belonging to a different object, i.e. tracks switching between objects. This is unlikely to fix your case, though, and I would first recommend improving the performance of your object detector.
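One hedged way to see that trade-off empirically is to rerun the tracker at a few thresholds and compare the number of distinct IDs produced (a sketch; frames_detections is a hypothetical list of per-frame sv.Detections you would collect first):

# Sketch: compare how many distinct tracker IDs each matching threshold yields.
# A tracker is stateful, so a fresh instance is created per threshold.
for threshold in (0.8, 0.6, 0.4):
    tracker = sv.ByteTrack(minimum_matching_threshold=threshold)
    ids = set()
    for detections in frames_detections:  # hypothetical pre-collected detections
        tracked = tracker.update_with_detections(detections)
        ids.update(tracked.tracker_id.tolist())
    print(f"minimum_matching_threshold={threshold}: {len(ids)} unique tracks")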

@ddrisco11
Author

@rolson24 thanks for taking the time to help out! This was my first time asking a question on GitHub and I very much appreciate the support.

@LinasKo mentioned this issue on Jul 15, 2024
@rolson24
Contributor

@LinasKo,

I think I know the reason the tracker is skipping IDs. I will try to make a fix and see if it works.

@tteresi7

tteresi7 commented Sep 26, 2024

Generally a good performing model will have confidence values of around 0.8

Not necessarily. A good model just has a good degree of output separation between false positives and true positives with minimal false negatives.

The confidence values of the detections greatly affect how the tracker performs

There should be an * here. It greatly affects how ByteTrack works. Most trackers don't operate with confidence thresholds.

To increase the confidence values of the detections you will need more training data

This will not necessarily create higher confidence values.

Overall, the way to fix this is to understand your outputs. If your outputs are largely good from 0.3 and above, you want the high-confidence threshold in ByteTrack (track_activation_threshold, not minimum_matching_threshold) to be maybe 0.4-ish.
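Concretely, if the detector's scores are trustworthy from roughly 0.3 upward, that advice translates to something like the sketch below (the exact value should come from inspecting your own score distribution):

# Sketch: gate new tracks just above the point where the detector's scores
# become reliable, instead of relying on ByteTrack's default.
tracker = sv.ByteTrack(
    track_activation_threshold=0.4,  # confidence gate for starting new tracks
    minimum_matching_threshold=0.8,  # association threshold, left at its default
)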
