Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid
CVPR 2023
@inproceedings{ventura2023covr,
title={CoVR: Learning Composed Video Retrieval from Web Video Captions},
author={Lucas Ventura and Antoine Yang and Cordelia Schmid and G{\"u}l Varol},
booktitle={arXiv},
year={2023}}