In this repo, we employ machine learning methods to detect colorectal polyps (bounding-box detection) in colonoscopy images.
| Model | Dataset / Modality | Precision (P) | Recall (R) | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| YOLOv9m | Trained (WLI) | 0.933 | 0.912 | 0.963 | 0.816 |
| Mamba-YOLO | Trained (WLI) | 0.8923 | 0.8278 | 0.9138 | 0.7264 |
| | Unseen (NBI-LCI-FICE-BLI) | 0.8402 | 0.5672 | 0.6947 | 0.4972 |
| YOLOv11 | Trained (WLI) | 0.8799 | 0.8638 | 0.9282 | 0.7067 |
| | Unseen (NBI-LCI-FICE-BLI) | 0.8357 | 0.6675 | 0.7619 | 0.5383 |
| | External (REAL-Colon) | 0.4986 | 0.3413 | 0.3415 | 0.1922 |
| YOLOv11-LoRA | Trained (WLI) | 0.9007 | 0.8628 | 0.9351 | 0.7629 |
| | Unseen (NBI-LCI-FICE-BLI) | 0.8608 | 0.6895 | 0.7985 | 0.6086 |
| | External (REAL-Colon) | 0.4906 | 0.3745 | 0.3665 | 0.1925 |
| YOLOv12 | Trained (WLI) | 0.9418 | 0.8743 | 0.9503 | 0.7134 |
| | Unseen (NBI-LCI-FICE-BLI) | 0.9438 | 0.8999 | 0.9616 | 0.7563 |
| | External (REAL-Colon) | 0.4705 | 0.3644 | 0.3530 | 0.2131 |
- WLI (White Light Imaging): Standard endoscopic view
- NBI (Narrow Band Imaging): Enhanced vascular pattern visualization
- LCI (Linked Color Imaging): Improved color contrast
- FICE (Flexible Spectral Imaging Color Enhancement): Spectral enhancement
- BLI (Blue Laser Imaging): Surface structure enhancement
REAL-Colon provides 60 full-resolution, real-world colonoscopy videos (2.7M frames) from multiple centers, with 350k expert-annotated polyp bounding boxes. It includes clinical metadata, acquisition details, and histopathology, and is designed for robust CADe/CADx development and benchmarking. It is released for non-commercial research; see the paper for details.
- Train

  YOLOv9m summary (fused): 151 layers, 20,013,715 parameters, 0 gradients, 76.5 GFLOPs

  Class: all | Images: 359 | Instances: 382 | Box(P R mAP50 mAP50-95): 0.933 0.912 0.963 0.816

- Detect polyps (bounding box)
Mamba-YOLO merges the state-space modeling efficiency of Mamba with the real-time detection strength of YOLOv8.
The architecture replaces the CSP backbone with a Selective Scan (Mamba) block, enabling long-range spatial dependency modeling at reduced computational cost.
This implementation targets medical image analysis, specifically polyp detection from multimodal colonoscopy datasets.
| Component | Description |
|---|---|
| Backbone | Mamba-based state-space selective scan layers replacing CSP blocks |
| Neck | PANet-style feature pyramid |
| Head | YOLOv8 detection head (multi-scale anchors) |
| Losses | CIoU + BCE + objectness loss |
| Training Framework | Ultralytics YOLO API |
| Hardware | NVIDIA T4 (16 GB) × 2 |
| Software Stack | PyTorch 2.3.1 + CUDA 12.1, Python 3.11 |
| Parameter | Value |
|---|---|
| Modality | WLI |
| Epochs | 300 |
| Batch size | 16 |
| Optimizer | AdamW |
| Image size | 640×640 |
| Scheduler | Cosine annealing |
| Mixed precision | AMP enabled |
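The table above maps directly onto an Ultralytics-style training call. A configuration sketch, assuming the Mamba-YOLO model definition and dataset YAML paths used here (both placeholders, not verified against this repo's file layout):

```python
from ultralytics import YOLO

# placeholder paths: substitute the repo's actual Mamba-YOLO config and data YAML
model = YOLO("mamba-yolo.yaml")
model.train(
    data="polyp_wli.yaml",   # WLI modality
    epochs=300,
    batch=16,
    imgsz=640,
    optimizer="AdamW",
    cos_lr=True,             # cosine annealing schedule
    amp=True,                # mixed precision (AMP)
    device=[0, 1],           # 2x NVIDIA T4 (16 GB)
)
```

This is a sketch of the hyperparameter mapping, not a runnable reproduction; it requires the Mamba-YOLO weights/config and the dataset to be present.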
=== Test Set Metrics on trained modality (WLI) ===
mAP50: 0.9138
mAP50-95: 0.7264
Precision: 0.8923
Recall: 0.8278

=== Metrics on unseen modalities (NBI-LCI-FICE-BLI) ===
mAP50: 0.6947
mAP50-95: 0.4972
Precision: 0.8402
Recall: 0.5672

This project implements a robust polyp detection system using YOLO11 (You Only Look Once, version 11) for medical image analysis. The model is trained on the WLI modality.
- Base Model: YOLO11 from Ultralytics
- Input Resolution: 640×640 pixels
- Backbone: CSPDarkNet
- Neck: PANet
- Head: Multi-scale detection
- Epochs: 50
- Batch Size: 16
- Initial Learning Rate: 0.001
- Optimizer: Auto-selected
- Early Stopping Patience: 10 epochs
- Mosaic: 0.8 probability
- MixUp: 0.1 probability
- Copy-Paste: 0.1 probability
- Horizontal Flip: 0.5 probability
- Color Augmentation: HSV adjustments
- Spatial Transformations: Rotation, translation, scaling, shearing
- Training Confidence Threshold: 0.1
- IoU Threshold: 0.4
- Augmentation Focus: Small object detection
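The IoU threshold of 0.4 controls when a predicted box is considered to match a ground-truth box during evaluation. A minimal self-contained illustration of the underlying computation:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for axis-aligned boxes in (x1, y1, x2, y2) format."""
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # union = sum of areas minus the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# intersection 1, union 7 -> 1/7
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
```

With a 0.4 threshold, this pair would count as a missed detection; lowering the threshold (as done here) makes matching more lenient, which helps with small polyps where localization is noisy.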
=== Test Set Metrics on trained modality (WLI) ===
mAP50: 0.9282
mAP50-95: 0.7067
Precision: 0.8799
Recall: 0.8638

=== Metrics on unseen modalities (NBI-LCI-FICE-BLI) ===
mAP50: 0.7619
mAP50-95: 0.5383
Precision: 0.8357
Recall: 0.6675

=== Metrics on unseen dataset: REAL-Colon ===
mAP50: 0.3415
mAP50-95: 0.1922
Precision: 0.4986
Recall: 0.3413

Next, we turn to larger YOLO models trained with efficiency tricks. More specifically, we mimic LoRA (Low-Rank Adaptation)-style fine-tuning by freezing the early layers:
- Base Model: YOLO11(L) from Ultralytics
- Input Resolution: 640×640 pixels
- Backbone: CSPDarkNet
- Neck: PANet
- Head: Multi-scale detection
- Epochs: 50
- Batch Size: 8
- Frozen Layers: 10 (LoRA-style imitation: only later layers are updated)
- Initial Learning Rate: 0.001
- Optimizer: Auto-selected
- Early Stopping Patience: 10 epochs
- Mosaic: 0.8 probability
- MixUp: 0.1 probability
- Copy-Paste: 0.1 probability
- Horizontal Flip: 0.5 probability
- Color Augmentation: HSV adjustments
- Spatial Transformations: Rotation, translation, scaling, shearing
- Training Confidence Threshold: 0.1
- IoU Threshold: 0.4
- Augmentation Focus: Small object detection
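Freezing the first 10 layers is what stands in for LoRA here: instead of injecting low-rank adapter matrices, gradients simply flow only to the later layers, so far fewer parameters are updated. A toy PyTorch sketch of the idea; the 12-block sequential model is a stand-in, not the actual YOLO11-L backbone:

```python
import torch.nn as nn

# stand-in "backbone": 12 small conv blocks (the real model is YOLO11-L)
blocks = [nn.Conv2d(3 if i == 0 else 8, 8, kernel_size=3, padding=1) for i in range(12)]
model = nn.Sequential(*blocks)

# emulate Ultralytics' freeze=10: disable gradients for the first 10 blocks
for i, block in enumerate(model):
    if i < 10:
        for p in block.parameters():
            p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total}")
```

In the actual runs this is achieved by passing `freeze=10` to the Ultralytics `train()` call, which freezes the first 10 modules of the model before fine-tuning.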
=== Test Set Metrics on trained modality (WLI) ===
mAP50: 0.9351
mAP50-95: 0.7629
Precision: 0.9007
Recall: 0.8628

=== Metrics on unseen modalities (NBI-LCI-FICE-BLI) ===
mAP50: 0.7985
mAP50-95: 0.6086
Precision: 0.8608
Recall: 0.6895

=== Metrics on unseen dataset: REAL-Colon ===
mAP50: 0.3665
mAP50-95: 0.1925
Precision: 0.4906
Recall: 0.3745

This time, we try the most recent YOLO version released by Ultralytics, YOLOv12 (s).
- Base Model: YOLO12 from Ultralytics
- Input Resolution: 640×640 pixels
- Backbone: Attention mechanisms and convolutional operations for feature extraction
- Neck: Residual Efficient Layer Aggregation Networks (R-ELAN)
- Head: Optimized attention architecture with Area Attention Mechanism, FlashAttention, and multi-scale detection
- Epochs: 50
- Batch Size: 16
- Initial Learning Rate: 0.001
- Optimizer: Auto-selected
- Early Stopping Patience: 10 epochs
- Mosaic: 0.8 probability
- MixUp: 0.1 probability
- Copy-Paste: 0.1 probability
- Horizontal Flip: 0.5 probability
- Color Augmentation: HSV adjustments
- Spatial Transformations: Rotation, translation, scaling, shearing
- Training Confidence Threshold: 0.1
- IoU Threshold: 0.4
- Augmentation Focus: Small object detection
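Cross-dataset evaluation (e.g. on REAL-Colon) can be sketched with the Ultralytics validation API, using the confidence and IoU thresholds listed above. The weight and dataset YAML paths below are placeholders, not the repo's actual file names:

```python
from ultralytics import YOLO

# placeholder path to the trained checkpoint
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(
    data="real_colon.yaml",  # placeholder dataset YAML pointing at REAL-Colon
    imgsz=640,
    conf=0.1,                # low confidence threshold for small polyps
    iou=0.4,
)
# mean precision/recall and mAP over the validation split
print(metrics.box.mp, metrics.box.mr, metrics.box.map50, metrics.box.map)
```

This is a usage sketch only; it assumes a trained checkpoint and a converted REAL-Colon dataset in YOLO format are available.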
=== Test Set Metrics on trained modality (WLI) ===
mAP50: 0.9503
mAP50-95: 0.7134
Precision: 0.9418
Recall: 0.8743

=== Metrics on unseen modalities (NBI-LCI-FICE-BLI) ===
mAP50: 0.9616
mAP50-95: 0.7563
Precision: 0.9438
Recall: 0.8999

=== Metrics on unseen dataset: REAL-Colon ===
mAP50: 0.3530
mAP50-95: 0.2131
Precision: 0.4705
Recall: 0.3644















































