Several questions for this model #37
@sherlock666 Thank you for your interest.
Does a group of images mean sequential images? Currently this is not supported, but I want to support it in a future version. lighthouse/lighthouse/models.py Line 228 in d7c4707
Please see this method for details.
The current methods do not process all of the frames, only the 2 fps frames. Hence, if the video is 150 s, the number of frames the model processes is 75. This is because videos are redundant and processing all of the frames is quite computationally heavy.
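For intuition, here is a minimal sketch of uniform temporal subsampling. This is not the repository's actual frame loader, and the exact sampling rate is clarified a few comments below as one frame every two seconds:

```python
# Minimal sketch, not lighthouse's actual loader: keep one frame every
# `interval_sec` seconds from a video decoded at its native frame rate.
import numpy as np

def subsample_frames(frames: np.ndarray, native_fps: float, interval_sec: float = 2.0) -> np.ndarray:
    """frames: (num_frames, H, W, 3) array decoded at `native_fps`."""
    stride = max(int(round(native_fps * interval_sec)), 1)  # 30 fps * 2 s -> every 60th frame
    return frames[::stride]  # a 150 s video at 30 fps -> ~75 sampled frames
```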
Hmm... what do you mean?
Yes, this is expected. If you want to get more frames (in the demo), change the TOPK_HIGHLIGHT variable. lighthouse/gradio_demo/demo.py Line 30 in d7c4707
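For context, TOPK_HIGHLIGHT is the constant in gradio_demo/demo.py that controls how many highlighted frames the demo shows; the selection is essentially a top-k over the clip-level saliency scores. The sketch below only illustrates that idea; the function and variable names are mine, not the demo's actual code.

```python
# Illustrative top-k highlight selection; only TOPK_HIGHLIGHT itself is
# taken from the demo, the rest is a sketch.
import numpy as np

TOPK_HIGHLIGHT = 5  # raise to e.g. 10 to show more highlighted frames

def top_highlight_clips(saliency_scores: np.ndarray, k: int = TOPK_HIGHLIGHT) -> np.ndarray:
    """Return indices of the k clips with the highest saliency, best first."""
    return np.argsort(saliency_scores)[::-1][:k]
```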
Thanks for the reply. What I mean:
1. Mmm... I had seen somewhere that the video is separated into 2-second clips (e.g., your demo video is 150 s, so it will generate 75 clips), which matches the inference code (though I am not sure about the 2 fps you mentioned). I just hope to know whether the 2 seconds or 2 fps can be adjusted or not.
3. (Sorry, a new question.) I get this warning: /media/user/ch_2024_8T/project_202409_trial-lighthouse/lighthouse/frame_loaders/slowfast_loader.py:71: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
Sorry, not 2 fps but 1 frame per 2 seconds (so 0.5 fps, to be correct). This fps is fixed because the model is trained on 0.5 fps videos. If you want to change it, you need to extract frames, convert them into frame-level CLIP features, and train the model again. You can input videos with a different fps into the model trained on 0.5 fps, but I am not sure what will happen.
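As a rough picture of what "convert them into frame-level CLIP features" involves, here is a sketch using the generic Hugging Face CLIP API. It is not the repository's actual feature-extraction pipeline, and the function name is mine:

```python
# Sketch of frame-level CLIP feature extraction for frames sampled at 0.5 fps.
# Generic illustration with the Hugging Face CLIP API, not lighthouse's
# own extraction scripts.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_frame_features(frames: list[Image.Image]) -> torch.Tensor:
    """frames: one PIL image every 2 seconds; returns (num_frames, dim) features."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalized image features
```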
Sorry, I could not understand what you are getting at. In this case, the model predicts the moment 55s~85s based on the input video and text query. Could you give me more details about your question? :)
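To make the 55s~85s prediction concrete: DETR-style moment retrieval models typically predict each moment as a (center, width) span normalized by the video duration, independent of the 2-second clip granularity, which is why a retrieved moment can be much longer than a single clip. The conversion below is purely illustrative and not taken from the library's code:

```python
# Illustrative only: convert a normalized (center, width) span prediction
# into seconds. Not lighthouse's actual code.
def span_to_seconds(center: float, width: float, duration: float) -> tuple[float, float]:
    start = max((center - width / 2) * duration, 0.0)
    end = min((center + width / 2) * duration, duration)
    return start, end

# For a 150 s video, a prediction like center ~= 0.467, width = 0.2
# maps to roughly the 55 s ~ 85 s moment mentioned above.
print(span_to_seconds(7 / 15, 0.2, 150.0))  # (55.0, 85.0)
```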
Thank you for reporting the issue. We will fix it next week.
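For reference, the usual way to silence that warning is to copy the NumPy array before converting it to a tensor, so that torch.from_numpy gets a writable buffer. This is a sketch of that common fix, not necessarily the exact change that was committed:

```python
# Common fix for the "non-writable NumPy array" warning: copy the array
# before handing it to torch.from_numpy. Assumption about the fix, not
# necessarily what the maintainers shipped.
import numpy as np
import torch

def frames_to_tensor(frames: np.ndarray) -> torch.Tensor:
    # frames.copy() always yields a writable array; torch.tensor(frames)
    # would also copy, at the cost of an extra dtype conversion path.
    return torch.from_numpy(frames.copy())
```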
Thank you for your patience. What I mean is: I know that the "Highlighted Frames" (the bottom-right part of the demo) come from the 2-second clips sorted by the saliency score, right? But how do the "Retrieved Moments" work and how are they predicted? (That is my question: how does the 55s~85s come about, which is 30 seconds?) My assumption:
@sherlock666
@sherlock666 I fixed the bug you reported. If you have any questions, please re-open the issue. Thanks.
Does it support a group of images (let's say 50 processed images) as input to the model, then output saliency scores?
I'm quite interested in how the program processes the video.
What I understand (assuming it's a 30 fps video):
The video will be separated into n 2-second clips (n * 2 <= 150).
Then... the 2-second clip, which is 60 frames... will all of them be used as input? (If not, how is this done?)
Is it possible to adjust the 2-second parameter?
Why, in the demo on the Hugging Face space, are the "Retrieved moments" sometimes longer than 2 seconds (i.e., longer than the clips we just got)?
For "Highlighted frames", sometimes it outputs all negative scores, but it seems to capture the right things. Is that reasonable? And is it possible to get more frames (e.g., 5 -> 10)?
thanks!!!