Home
Alexandru Dinu edited this page Jan 3, 2019
·
27 revisions
Welcome to the cae wiki!
These models are inspired by [1, 2].
The input consists of 720p images from the YouTube-8M dataset (credit goes to gsssrao for the downloader and frame generator scripts). The dataset consists of 121,827 frames.
The images are padded from 1280x720 to 1280x768 (i.e. 24 pixels of padding at the top and 24 at the bottom), so that each image can be split into 60 patches of 128x128 (a 6x10 grid).
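The padding and patch arithmetic can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual code: the function name `pad_and_split`, the channels-last `(H, W, C)` layout, and the zero-valued padding are all assumptions, since the page does not state how the padded rows are filled.

```python
import numpy as np

PATCH = 128  # patch side length stated on this page


def pad_and_split(frame: np.ndarray, pad_h: int = 24, patch: int = PATCH) -> np.ndarray:
    """Pad a (H, W, C) frame by pad_h rows at the top and bottom, then
    cut it into non-overlapping patch x patch tiles.

    Returns an array of shape (rows, cols, patch, patch, C).
    Zero padding is an assumption made for this sketch.
    """
    padded = np.pad(frame, ((pad_h, pad_h), (0, 0), (0, 0)), mode="constant")
    h, w, c = padded.shape
    if h % patch or w % patch:
        raise ValueError("padded frame must be divisible by the patch size")
    rows, cols = h // patch, w // patch
    # Split both spatial axes into (block index, offset), then bring the
    # two block indices to the front.
    return padded.reshape(rows, patch, cols, patch, c).transpose(0, 2, 1, 3, 4)


frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # one dummy 720p frame
patches = pad_and_split(frame)
print(patches.shape)  # (6, 10, 128, 128, 3), i.e. 6 * 10 = 60 patches
```

Here `patches[i, j]` is the tile in row `i` and column `j` of the padded frame, which matches the `ij` indexing used for the loss.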
The model sees only a single patch per forward pass (i.e. there are 60 forward passes and 60 optimization steps per image).
The loss is computed per patch as MSELoss(orig_patch_ij, out_patch_ij), and the per-patch losses are averaged to give a loss per image.
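The loss bookkeeping can be illustrated with a small NumPy sketch. It is not the training code: `mse` only mimics an MSE loss with mean reduction (the default for PyTorch's `MSELoss`), `image_loss` is a hypothetical helper name, and the "model output" is a stand-in produced by adding a constant offset. In actual training each patch loss would also drive its own optimization step, which is omitted here.

```python
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error over all elements (mean reduction)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))


def image_loss(orig_patches: np.ndarray, out_patches: np.ndarray) -> float:
    """Average the per-patch MSE over a (rows, cols, ...) grid of patches."""
    rows, cols = orig_patches.shape[:2]
    losses = [
        mse(orig_patches[i, j], out_patches[i, j])
        for i in range(rows)
        for j in range(cols)
    ]
    return sum(losses) / len(losses)


rng = np.random.default_rng(0)
orig = rng.random((6, 10, 128, 128, 3))  # 60 dummy patches in a 6x10 grid
out = orig + 0.1                         # stand-in for the model's reconstruction
print(image_loss(orig, out))             # approximately 0.01 (0.1 squared)
```

Because all patches have the same size, the average of the per-patch losses equals the MSE computed over the whole padded image in one go.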
[1] https://arxiv.org/abs/1703.00395

[2] http://arxiv.org/abs/1511.06085