Title page

期刊：PLoS One

年份：2021 Aug 17

数据集链接：https://doi.org/10.7910/DVN/FCBUOR

pdf链接：

Summary

In this paper, we create an endoscopic dataset collected from various sources and annotate the ground truth of polyp location and classification results with the help of experienced gastroenterologists.
The dataset can serve as a benchmark platform to train and evaluate the machine learning models for polyp classification
We have also compared the performance of eight state-ofthe-art deep learning-based object detection models.

Workflow

无

Methods

Evaluation models for detection and classification

YOLOv3
YOLOv4
SSD
RetinaNet: a one-stage framework based on the SSD model, using the FPN and Focal loss
DetNet
RefineDet

Dataset build

Images in different datasets vary greatly

Each endoscopic video sequence has significant redundancies

Datasets selection and annotation

将所有数据集规整化为 bounding box + class（二分类：腺瘤/增生性息肉）

源数据：

MICCAI 2017:

18 videos for training and 20 videos for testing
只有掩码

CVC colon DB：

15 short colonoscopy videos with a total of 300 frames
只有掩码

GLRC dataset：

76 short video sequences with class labels
只有分类标签

KUMC dataset：

80 colonoscopy video sequences.

数据集筛选：

To avoid some long videos overwhelming others, we adopt an adaptive sampling rate to extract the frames from each video sequence based on the camera movement and video lengths
After sampling, we extracted around 300 to 500 frames for long sequences to maintain a balance among different sequences, while for small sequences like CVC colon DB, we simply keep all image frames in the sequence.
分类标签：When the endoscopist could not reach an agreement on the classification results, we simply remove those sequences from the dataset.

数据集划分：

We make the division for each dataset and polyp class independently
For each class in one dataset, we randomly select 75%, 10%, and 15% sequences to form the training, validation, and test sets, respectively

Result-show

1. 数据集概览

标注类型：bbox + 二分类结果（腺瘤/增生性息肉）
肠镜类型：待核查
116 training, 17 validation, and 22 test sequences
训练集 28773, 验证集4254, 测试集 4872 帧

3. 基准模型测试结果

息肉分类结果

仅息肉检测的结果：

基于序列分析结果

数据集注释内容

# 数据集名称：
PolypSet

# 数据集结构
--train2019
	Annotation
    	1.xml  (28773个文件)
        2.xml
        ...
    Image
    	1.jpg (28773个文件)
        2.jpg
        ...    
--test2019
	- image 
    	- 1 （22个文件夹）
        	1.jpg
            2.jpg
            ...
    	- 2
        - ...
    - Annotation 
    	- 1 （22个文件夹）
        	1.xml
            2.xml
            ...
        - 2
        - 3
--val2019
	- 1（17个文件夹）
    	- image
            1.jpg
            2.jpg
            3.jpg
            ...
        - Annotation
        	1.xml
            2.xml
            3.xml
            ...
    - 2
    - ...


# Annotation example:
# 1.xml
<annotation>
    <folder>3</folder>
    <filename>245.png</filename>
    <path>/scratch/mfathan/Thesis/Dataset/Extracted/MICCAI2017_Test/test/3/245.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>384</width>
        <height>288</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>adenomatous</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>164</xmin>
            <ymin> 113</ymin>
            <xmax> 343</xmax>
            <ymax> 279</ymax>
        </bndbox>
    </object>
</annotation>

Colonoscopy Polyp Detection and Classification: Dataset Creation and Comparative Evaluations

polyp dataset building