Hi. I am trying to extract visual and audio features from raw video clips. For visual features, I run:
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
For example, with the command above it gives 112x1024-dimensional RGB and flow features on a video converted to 25 fps.
But for audio features, after converting the video to 25 fps, the command
python main.py feature_type=vggish
produces features whose first dimension does not match that of the visual features.
For example, it gives only a 32x128-dimensional feature.
Could you please tell me what needs to be done so that I get a matching 112x128 audio feature?
Thank you
I see. VGGish extracts features from 0.96 sec windows without overlap. With the command above, I3D extracts features from 0.96 sec windows (24 frames at 25 fps) with a step of 0.32 sec (8 frames), so consecutive windows overlap. Hence the I3D features should be about 3 times longer than the VGGish ones, but that is not the case here, and I don't know why.
You may want to change the VGGish feature extraction code to support overlap, which might solve the issue. Alternatively, if your application permits, you can extract I3D features without overlap (step_size=24). This should make both feature sequences the same size.
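To make the window arithmetic concrete, here is a minimal sketch that counts sliding windows for a given window and step size. The helper `num_windows` and the 768-frame clip are hypothetical, and the sketch assumes a plain sliding window with no padding and with the trailing partial window dropped; the actual extractors may handle clip boundaries differently.

```python
def num_windows(num_frames: int, window: int, step: int) -> int:
    """Count full sliding windows over num_frames.

    Assumes no padding; a trailing partial window is dropped.
    """
    if num_frames < window:
        return 0
    return (num_frames - window) // step + 1


# Hypothetical clip: 768 frames at 25 fps = 30.72 s,
# i.e. exactly 32 non-overlapping 0.96 s windows.
frames = 768

# stack_size=24, step_size=8: overlapping windows
overlapping = num_windows(frames, window=24, step=8)

# stack_size=24, step_size=24: no overlap
non_overlapping = num_windows(frames, window=24, step=24)

print(overlapping, non_overlapping)  # 94 32
```

Under these assumptions, setting the step equal to the window size gives one feature per 0.96 sec, the same rate as VGGish without overlap, while a step of 8 frames gives roughly three times as many.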