Vggish feature vs i3d flow visual feature #121

1980x · 2024-02-05T01:50:05Z

Hi. I am trying to extract visual and audio features from raw video clips. For visual features, I run
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
For example, on a video converted to 25 fps, this command gives 112x1024 RGB and flow features.
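As a rough check on the first dimension, the number of I3D rows can be estimated by treating extraction as a plain sliding window over the 25 fps frames: each row covers stack_size frames and the window advances by step_size frames. This is a sketch under that assumption only. The actual video_features code may consume one extra frame per stack (for example to compute optical flow), so its count can differ by one. `num_i3d_windows` is a hypothetical helper, not part of the repository.

```python
def num_i3d_windows(num_frames: int, stack_size: int = 24, step_size: int = 8) -> int:
    """Estimate how many I3D feature rows a clip yields (hypothetical helper).

    Assumes a plain sliding window: each row covers `stack_size` consecutive
    frames, and the window start advances by `step_size` frames.
    """
    if num_frames < stack_size:
        return 0  # not enough frames for a single stack
    return (num_frames - stack_size) // step_size + 1


# e.g. a 36.48 s clip at 25 fps has 912 frames
print(num_i3d_windows(912))  # 112
```

With step_size=24 the same 912 frames would give 38 rows, one per 0.96 s.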

For audio features, after converting the video to 25 fps, I run
python main.py feature_type=vggish
but the first dimension of the output does not match that of the visual features. For example, it gives only 32x128 features.
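For comparison, the reference VGGish preprocessing (vggish_input.waveform_to_examples in the TensorFlow models repository) resamples audio to 16 kHz, computes log-mel frames with a 25 ms window and a 10 ms hop, and groups them into non-overlapping examples of 96 frames (0.96 s). The sketch below reproduces only that counting arithmetic. It assumes video_features uses the same parameters, and `num_vggish_examples` is a hypothetical helper; resampling details can shift the result slightly.

```python
SAMPLE_RATE = 16_000   # VGGish resamples audio to 16 kHz
STFT_WINDOW = 400      # 25 ms analysis window, in samples
STFT_HOP = 160         # 10 ms hop between log-mel frames, in samples
EXAMPLE_WINDOW = 96    # log-mel frames per example (0.96 s)
EXAMPLE_HOP = 96       # default hop: examples do not overlap


def num_vggish_examples(duration_s: float, example_hop: int = EXAMPLE_HOP) -> int:
    """Estimate how many 128-d VGGish embeddings a clip yields (hypothetical helper)."""
    num_samples = round(duration_s * SAMPLE_RATE)
    if num_samples < STFT_WINDOW:
        return 0
    # Number of 10 ms log-mel frames produced by the STFT framing
    num_mel_frames = 1 + (num_samples - STFT_WINDOW) // STFT_HOP
    if num_mel_frames < EXAMPLE_WINDOW:
        return 0
    # Number of 96-frame examples taken every `example_hop` frames
    return 1 + (num_mel_frames - EXAMPLE_WINDOW) // example_hop


print(num_vggish_examples(36.48))  # 37
```

For a 36.48 s clip (912 frames at 25 fps) this estimate gives 37 rows, roughly a third of what a 0.32 s I3D step produces.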

Could you please tell me what needs to be done to get matching 112x128 audio features?

Thank you

v-iashin · 2024-02-05T06:21:59Z

I see. VGGish extracts features from 0.96 sec windows without overlap. With the command above, I3D extracts features from 0.96 sec windows with a 0.32 sec step (i.e. 0.64 sec overlap between consecutive windows). Hence the I3D features should be about 3 times longer, but in your case they are not, and I don't know why.
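The arithmetic behind the "3 times longer" estimate can be checked with exact fractions: at 25 fps, stack_size=24 spans 24/25 = 0.96 s and step_size=8 advances the window by 8/25 = 0.32 s. The sketch assumes VGGish uses its reference 0.96 s hop; `window_timing` is a hypothetical helper written for illustration.

```python
from fractions import Fraction


def window_timing(stack_size: int, step_size: int, fps: int):
    """Return (window, step, overlap) in seconds for a frame-based sliding window."""
    window = Fraction(stack_size, fps)
    step = Fraction(step_size, fps)
    return window, step, window - step


window, step, overlap = window_timing(stack_size=24, step_size=8, fps=25)
vggish_hop = Fraction(96, 100)  # assumed VGGish hop: 96 log-mel frames of 10 ms

print(float(window), float(step), float(overlap))  # 0.96 0.32 0.64
print(vggish_hop / step)  # 3: about three I3D rows per VGGish row
```

Setting step_size=24 makes the I3D step equal to the VGGish hop, so the ratio becomes 1.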

You may want to change the VGGish feature extraction code to support overlap, which might solve the issue. Alternatively, if your application permits, you can use non-overlapping windows for the I3D features (step_size=24). This should make them the same size.
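For the first option, the change amounts to cutting the log-mel spectrogram into 96-frame examples with a hop shorter than 96 frames. In the reference VGGish code the hop is set by EXAMPLE_HOP_SECONDS in vggish_params.py (0.96 by default), so 0.32 would match step_size=8 at 25 fps; whether video_features keeps that constant in the same place is an assumption. The sketch re-implements only the framing step with NumPy fancy indexing (the reference mel_features.frame uses stride tricks instead); `frame` here is a stand-in for illustration, not the repository's function.

```python
import numpy as np


def frame(data: np.ndarray, window_length: int, hop_length: int) -> np.ndarray:
    """Cut (num_frames, num_bands) log-mel data into possibly overlapping windows.

    Returns an array of shape (num_windows, window_length, num_bands).
    """
    num_windows = 1 + (data.shape[0] - window_length) // hop_length
    if num_windows < 1:
        return np.empty((0, window_length) + data.shape[1:], dtype=data.dtype)
    starts = np.arange(num_windows) * hop_length
    # Row i of `idx` holds the frame indices of window i
    idx = starts[:, None] + np.arange(window_length)[None, :]
    return data[idx]


log_mel = np.zeros((3646, 64), dtype=np.float32)  # ~36.5 s of 10 ms frames, 64 mel bands
no_overlap = frame(log_mel, window_length=96, hop_length=96)  # 0.96 s hop (default)
overlap = frame(log_mel, window_length=96, hop_length=32)     # 0.32 s hop
print(no_overlap.shape, overlap.shape)  # (37, 96, 64) (111, 96, 64)
```

Each window would then be passed through the VGGish network as usual, giving one 128-d row per 0.32 s, close to the I3D row count for the same clip.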

1980x · 2024-02-05T16:05:22Z

Thank you.
