Omni-sourced Webly-supervised Learning for Video Recognition

Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

In ECCV, 2020. Paper

Model Zoo

Kinetics-400

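MMAction2 currently releases 4 models trained with the OmniSource framework, covering both 2D and 3D architectures. The table below compares the Kinetics-400 accuracy of models trained with and without the OmniSource framework: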

Model	Modality	Pretrain	Backbone	Input	Resolution	Top-1 accuracy (Baseline / OmniSource (Delta))	Top-5 accuracy (Baseline / OmniSource (Delta))	Download
TSN	RGB	ImageNet	ResNet50	3seg	340x256	70.6 / 73.6 (+ 3.0)	89.4 / 91.0 (+ 1.6)	Baseline / OmniSource
TSN	RGB	IG-1B	ResNet50	3seg	short-side 320	73.1 / 75.7 (+ 2.6)	90.4 / 91.9 (+ 1.5)	Baseline / OmniSource
SlowOnly	RGB	None	ResNet50	4x16	short-side 320	72.9 / 76.8 (+ 3.9)	90.9 / 92.5 (+ 1.6)	Baseline / OmniSource
SlowOnly	RGB	None	ResNet101	8x8	short-side 320	76.5 / 80.4 (+ 3.9)	92.7 / 94.4 (+ 1.7)	Baseline / OmniSource
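
The Top-1 and Top-5 columns above are top-k accuracies: the percentage of validation videos whose ground-truth class is among the k highest-scoring predictions. The following is a minimal pure-Python sketch of that metric, not MMAction2's own evaluation code; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Return the percentage of samples whose label is among the k top scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example (hypothetical scores): 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # label 1 is the best-scoring class
    [0.5, 0.1, 0.3, 0.1],  # label 2 is only the second-best class
    [0.4, 0.3, 0.2, 0.1],  # label 3 is the worst-scoring class
]
labels = [1, 2, 3]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.1f}  top-2: {top2:.1f}")
```

The Delta values in the table are then simply the OmniSource accuracy minus the Baseline accuracy for the same metric.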

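The Kinetics-400 validation set we use contains 19796 videos, which users can download via the validation set videos link. We also provide the corresponding data list (each line has the format: video ID, number of frames, label index) and the label map (label index to class name).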
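
As a rough illustration of how the data list and label map fit together, here is a small parsing sketch. It rests on assumptions that the README does not confirm: fields in the data list are separated by commas and/or whitespace, and the label map stores one class name per line, with the line order giving the label index. The video IDs and class names below are hypothetical placeholders.

```python
import re
from typing import Dict, List, NamedTuple


class VideoRecord(NamedTuple):
    video_id: str
    num_frames: int
    label: int


def parse_data_list(text: str) -> List[VideoRecord]:
    """Parse lines of the form: video ID, number of frames, label index."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: fields are separated by commas and/or whitespace.
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
        video_id, num_frames, label = parts
        records.append(VideoRecord(video_id, int(num_frames), int(label)))
    return records


def parse_label_map(text: str) -> Dict[int, str]:
    """Assumption: one class name per line; line order gives the label index."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return dict(enumerate(names))


# Hypothetical file contents, for illustration only.
data_list_text = "video_a 300 0\nvideo_b, 250, 1\n"
label_map_text = "class_a\nclass_b\n"

records = parse_data_list(data_list_text)
label_map = parse_label_map(label_map_text)
for rec in records:
    print(rec.video_id, rec.num_frames, label_map[rec.label])
```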

Benchmark on Mini-Kinetics

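The OmniSource project currently releases a subset of the web data it collected, covering the 200 action classes in Mini-Kinetics. Detailed statistics for these datasets are recorded in the OmniSource dataset preparation guide. Users can obtain the data by filling in a request form; once the form is submitted, a download link is sent to the user's email address. For more information about the OmniSource web dataset, see the OmniSource dataset preparation guide.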

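MMAction2 benchmarks the OmniSource framework on the released datasets. The tables below record the detailed results (accuracy on the Mini-Kinetics validation set), which can serve as baselines for training video recognition models with web data.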

TSN-8seg-ResNet50

Model	Modality	Pretrain	Backbone	Input	Resolution	Top-1 accuracy	Top-5 accuracy	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	77.4	93.6	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_googleimage_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	78.0	93.6	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_webimage_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	78.6	93.6	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_insvideo_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	80.6	95.0	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_kineticsraw_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	78.6	93.2	ckpt	json	log
tsn_r50_1x1x8_100e_minikinetics_omnisource_rgb	RGB	ImageNet	ResNet50	3seg	short-side 320	81.3	94.8	ckpt	json	log

SlowOnly-8x8-ResNet50

Model	Modality	Pretrain	Backbone	Input	Resolution	Top-1 accuracy	Top-5 accuracy	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_rgb	RGB	None	ResNet50	8x8	short-side 320	78.6	93.9	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_googleimage_rgb	RGB	None	ResNet50	8x8	short-side 320	80.8	95.0	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_webimage_rgb	RGB	None	ResNet50	8x8	short-side 320	81.3	95.2	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_insvideo_rgb	RGB	None	ResNet50	8x8	short-side 320	82.4	95.6	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_kineticsraw_rgb	RGB	None	ResNet50	8x8	short-side 320	80.3	94.5	ckpt	json	log
slowonly_r50_8x8x1_256e_minikinetics_omnisource_rgb	RGB	None	ResNet50	8x8	short-side 320	82.9	95.8	ckpt	json	log
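
To read the two Mini-Kinetics tables above as baselines, it can help to look at each web data source's gain over the model trained without web data. The sketch below copies the Top-1 / Top-5 numbers from those tables; the dictionary layout, the short source names (taken from the config name suffixes), and the helper `gains_over_baseline` are assumptions made for illustration.

```python
from typing import Dict, Tuple

Accuracy = Tuple[float, float]  # (top-1, top-5)

# Values copied from the Mini-Kinetics tables above.
RESULTS: Dict[str, Dict[str, Accuracy]] = {
    "TSN-8seg-ResNet50": {
        "baseline": (77.4, 93.6),
        "googleimage": (78.0, 93.6),
        "webimage": (78.6, 93.6),
        "insvideo": (80.6, 95.0),
        "kineticsraw": (78.6, 93.2),
        "omnisource": (81.3, 94.8),
    },
    "SlowOnly-8x8-ResNet50": {
        "baseline": (78.6, 93.9),
        "googleimage": (80.8, 95.0),
        "webimage": (81.3, 95.2),
        "insvideo": (82.4, 95.6),
        "kineticsraw": (80.3, 94.5),
        "omnisource": (82.9, 95.8),
    },
}


def gains_over_baseline(rows: Dict[str, Accuracy]) -> Dict[str, Accuracy]:
    """Return (top-1 gain, top-5 gain) of each setting relative to 'baseline'."""
    base1, base5 = rows["baseline"]
    return {
        name: (round(top1 - base1, 1), round(top5 - base5, 1))
        for name, (top1, top5) in rows.items()
        if name != "baseline"
    }


gains = {model: gains_over_baseline(rows) for model, rows in RESULTS.items()}
for model, per_source in gains.items():
    for source, (d1, d5) in per_source.items():
        print(f"{model:24s} {source:12s} top-1 {d1:+.1f}  top-5 {d5:+.1f}")
```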

For reference, the table below lists the Kinetics-400 benchmark results reported in the original paper:

Model	Baseline	+GG-img	+[GG-IG]-img	+IG-vid	+KRaw	OmniSource
TSN-3seg-ResNet50	70.6 / 89.4	71.5 / 89.5	72.0 / 90.0	72.0 / 90.3	71.7 / 89.6	73.6 / 91.0
SlowOnly-4x16-ResNet50	73.8 / 90.9	74.5 / 91.4	75.2 / 91.6	75.2 / 91.7	74.5 / 91.1	76.6 / 92.5

Note:

If the OmniSource project is helpful for your research, please cite it using the following BibTeX entry:

@article{duan2020omni,
  title={Omni-sourced Webly-supervised Learning for Video Recognition},
  author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
  journal={arXiv preprint arXiv:2003.13042},
  year={2020}
}
