Objective
The primary objective of this project was to develop a machine learning model capable of detecting fish in fly fishing videos. The goal was to identify scenes where a fish is in hand, clip these scenes, and save the relevant segments for a TikTok channel.
Introduction
Fly fishing videos often contain valuable moments where anglers hold their catch. Identifying these moments manually can be time-consuming. This project leverages computer vision and deep learning to automate the detection of fish in these videos, enhancing the efficiency of content creation for social media platforms like TikTok.
Technical Setup
Tools and Technologies
• Python: Programming language used for scripting.
• YOLOv5: State-of-the-art object detection model.
• OpenCV: Library for computer vision tasks.
• LabelImg: Tool for annotating images.
• FFmpeg: Tool for processing video files.
• PyTorch (torch): Deep learning framework that YOLOv5 is built on.
Environment Setup
1. Install Required Libraries:
pip install torch torchvision opencv-python labelImg ffmpeg-python
2. Set Up YOLOv5:
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
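Before training, one quick way to confirm the whole stack works is to load the stock yolov5s checkpoint through torch.hub and run it on a sample image that ships with the repo. A minimal smoke test, assuming it is run from the directory containing the yolov5 clone (this is a sanity check only, not part of the project pipeline):
import torch

# Pull the pretrained yolov5s model via torch.hub (downloads weights on first run)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on a sample image bundled with the YOLOv5 repo
results = model('yolov5/data/images/zidane.jpg')
results.print()  # summary of detected classes and confidences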
Data Collection and Preparation
Data Sources
• Collected various fishing images and videos from personal recordings and public datasets.

Data Annotation
• Used LabelImg to annotate images with bounding boxes for fish, hands, fishing rods, and other fishing equipment.
• Saved annotations in YOLO format (an example label file follows).
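In YOLO format, each image gets a matching .txt file with one line per object: class_id x_center y_center width height, all normalized to [0, 1]. A hypothetical label file for a frame containing one fish (class 0) and one hand (class 1) would look like:
0 0.512 0.430 0.220 0.180
1 0.610 0.550 0.150 0.200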
Data Augmentation
• Applied flipping, rotation, and brightness/contrast adjustments to make the dataset more robust to viewpoint and lighting variation (a minimal sketch follows).
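A minimal sketch of these augmentations with OpenCV (the function and parameter values here are illustrative, not taken from the project code):
import cv2

def augment(img):
    """Return simple augmented variants of one training image."""
    flipped = cv2.flip(img, 1)  # horizontal flip

    # Rotate 15 degrees around the image center
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))

    # Brightness/contrast: each pixel becomes alpha * pixel + beta
    brightened = cv2.convertScaleAbs(img, alpha=1.2, beta=30)

    return flipped, rotated, brightened
Keep in mind that geometric transforms (flips, rotations) require the YOLO boxes to be transformed to match, while brightness/contrast changes leave the labels untouched.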
Model Training
Configuration
Created a dataset.yaml file to define the dataset:
train: /path/to/your/dataset/train/images
val: /path/to/your/dataset/val/images
nc: 3 # number of classes
names: ['fish', 'hand', 'fishing_rod']
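YOLOv5 locates the label files by substituting images with labels in each image path, so the dataset referenced by this file is expected to follow roughly this layout:
dataset/
    train/
        images/  # training frames (.jpg)
        labels/  # one YOLO-format .txt per image
    val/
        images/
        labels/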
Training Process
• Trained the model using the following command:
python train.py --img 640 --batch 16 --epochs 50 --data /path/to/dataset.yaml --cfg models/yolov5s.yaml --weights yolov5s.pt --name fish_detector
Testing and Evaluation
Evaluation Metrics
• Evaluated model performance using precision and recall at varying confidence thresholds (see the example invocation after this list).
• Adjusted the model based on test results and retrained with additional data as necessary.
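YOLOv5's bundled validation script reports precision, recall, and mAP on the val split; a typical invocation against the trained weights (the run name here follows from the training command above) would be:
python val.py --weights runs/train/fish_detector/weights/best.pt --data /path/to/dataset.yaml --img 640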
Results
• Initial results showed a high number of false positives, likely due to confusion between fish and other objects like hands and fishing rods.
Iterations and Improvements
Identified Issues
• High rate of false positives due to model confusion.
• Poor video quality affecting detection accuracy.
Solutions Implemented
• Collected and annotated more diverse images, including negative samples (images without fish); see the helper sketch after this list.
• Retrained the model to improve its ability to distinguish between fish and other objects.
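In YOLOv5, negative samples need no special annotation: an image with an empty label file (or none at all) is treated as pure background. A small hypothetical helper to register background-only images:
import os

def add_negative_labels(images_dir, labels_dir):
    """Create empty YOLO label files for background-only images."""
    os.makedirs(labels_dir, exist_ok=True)
    for name in os.listdir(images_dir):
        if name.lower().endswith((".jpg", ".png")):
            stem = os.path.splitext(name)[0]
            label_path = os.path.join(labels_dir, stem + ".txt")
            if not os.path.exists(label_path):
                open(label_path, "w").close()  # empty file = no objects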
Conclusion and Reflections
Learnings
• Importance of a diverse and well-annotated dataset in training accurate models.
• Need for high-quality data to enhance model performance.
• Iterative process of training, testing, and refining is crucial in developing robust machine learning models.
Future Work
• Further expand the dataset to include more variations of fish and different environments.
• Experiment with different model architectures and hyperparameters to improve detection accuracy.
• Explore real-time detection capabilities for live video feeds (a rough sketch follows).
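For the live-feed idea, the same custom weights drop into a standard OpenCV capture loop. A rough sketch (the webcam index, weights path, and confidence threshold are placeholders):
import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')  # trained weights (placeholder path)
model.conf = 0.5  # confidence threshold (placeholder)

cap = cv2.VideoCapture(0)  # default webcam
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)  # boxes drawn in
    cv2.imshow('detections', annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()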
Visuals and Examples
Video Clips
Code Snippets
# reelhooked.py - Automated Fish Detection and Clipping for Fly Fishing Videos
import os
import subprocess

import cv2
import torch

# Directory containing the videos
video_dir = "/Users/XXXX/Desktop/fishing"
output_dir = video_dir  # Output to the same directory
temp_dir = os.path.join(video_dir, "temp")
detections_dir = os.path.join(video_dir, "detections")

os.makedirs(output_dir, exist_ok=True)
os.makedirs(temp_dir, exist_ok=True)
os.makedirs(detections_dir, exist_ok=True)

# Full path to FFmpeg
FFMPEG_PATH = "/opt/homebrew/bin/ffmpeg"

# Load the custom-trained YOLOv5 weights
print("Loading YOLO...")
model_path = '/Users/XXXX/Desktop/fishing/yolov5/runs/train/exp/weights/best.pt'
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Model weights not found at {model_path}")
model = torch.hub.load('ultralytics/yolov5', 'custom', path=model_path)
model.eval()
print("YOLO loaded successfully.")


def remove_audio(video_path, temp_path):
    """Strip the audio track so clips can be published without the original sound."""
    command = [
        FFMPEG_PATH,
        "-i", video_path,
        "-c", "copy",
        "-an",  # Remove audio stream
        temp_path,
    ]
    print(f"Running ffmpeg command to remove audio: {' '.join(command)}")
    subprocess.run(command, check=True)


def process_video(video_path, output_dir):
    """Scan one video for fish-in-hand frames, then clip the 20 seconds leading up to each detection."""
    temp_video_path = os.path.join(temp_dir, os.path.basename(video_path))
    remove_audio(video_path, temp_video_path)

    print(f"Processing video: {temp_video_path}")
    cap = cv2.VideoCapture(temp_video_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    duration = frame_count / fps if fps else 0  # total length in seconds

    fish_in_hand_times = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        current_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0  # current time in seconds

        # Detect fish in hand
        fish_detected, confidence, bbox = detect_fish_in_hand(frame)
        if fish_detected:
            fish_in_hand_times.append(current_time)
            print(f"Fish detected at {current_time} seconds with confidence {confidence} and bounding box {bbox}.")
            # Save the frame with the detection drawn in
            save_detection_frame(frame, bbox, current_time)
    cap.release()

    # Clip video segments where fish was detected. Note: consecutive detection
    # frames each produce a clip, so overlapping clips are expected.
    for i, fish_in_hand_time in enumerate(fish_in_hand_times):
        start_time = max(0, fish_in_hand_time - 20)  # Go back 20 seconds
        end_time = fish_in_hand_time
        print(f"Clipping video from {start_time} to {end_time} seconds.")

        # Cut the video using ffmpeg
        input_filename = os.path.basename(video_path)
        output_filename = os.path.join(output_dir, f"clip_{i}_{input_filename}")
        command = [
            FFMPEG_PATH,
            "-i", temp_video_path,
            "-ss", str(start_time),
            "-to", str(end_time),
            "-c:v", "copy",  # Copy video stream (cuts snap to keyframes)
            output_filename,
        ]
        print(f"Running ffmpeg command: {' '.join(command)}")
        subprocess.run(command, check=True)
        print(f"Video clip saved: {output_filename}")


def detect_fish_in_hand(frame):
    """Run YOLO on one frame and return (detected, best_confidence, first_box)."""
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = model(img)
    # Each row of xyxyn: x1, y1, x2, y2 (normalized), confidence, class
    labels = results.xyxyn[0][:, -1].numpy()
    coords = results.xyxyn[0][:, :-1].numpy()

    confidences = []
    boxes = []
    for i in range(len(labels)):
        if labels[i] == 0:  # 'fish' is class 0 (see dataset.yaml)
            x1, y1, x2, y2, conf = coords[i]
            if conf > 0.3:  # Lower confidence threshold
                height, width = frame.shape[:2]
                x1, y1, x2, y2 = int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height)
                boxes.append([x1, y1, x2 - x1, y2 - y1])  # convert to x, y, w, h
                confidences.append(conf)

    print(f"Detections: {len(boxes)}")
    if boxes:
        return True, max(confidences), boxes[0]  # Return the first detection for simplicity
    return False, None, None


def save_detection_frame(frame, bbox, current_time):
    """Draw the detection box on the frame and save it for later inspection."""
    x, y, w, h = bbox
    color = (0, 255, 0)  # Green bounding box
    thickness = 2
    cv2.rectangle(frame, (x, y), (x + w, y + h), color, thickness)
    output_path = os.path.join(detections_dir, f"detection_{current_time:.2f}.jpg")
    cv2.imwrite(output_path, frame)
    print(f"Saved detection frame at {output_path}")


# Process all videos in the directory
print("Starting video processing...")
for video_file in os.listdir(video_dir):
    if video_file.endswith((".mp4", ".avi")):
        video_path = os.path.join(video_dir, video_file)
        process_video(video_path, output_dir)
print("Video processing completed.")