video-sampler

Video sampler allows you to efficiently sample video frames and summarize the videos. Currently, it uses keyframe decoding, frame interval gating and perceptual hashing to reduce duplicated samples.

Use cases: video data collection for machine learning, video summarisation, video frame analysis.

Documentation

Documentation is available at https://lemurpwned.github.io/video-sampler/.

Features

Installation and Usage

If you intend to use all the integrations, you need all the dependencies:

python3 -m pip install -U video_sampler[all]

For minimal, no-CLI usage, install:

python3 -m pip install -U video_sampler

Available extras are:

yt-dlp - for YT-DLP integration
clip - for CLIP models integration
language - for language capture
all - for all dependencies
dev - for development dependencies

To see all available options, run:

python3 -m video_sampler --help

Basic usage

Plain:

python3 -m video_sampler hash FatCat.mp4 ./dataset-frames/ --hash-size 3 --buffer-size 20
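To make the --hash-size and --buffer-size flags more concrete, here is a minimal, self-contained sketch of the general technique: a perceptual (average) hash combined with a fixed-size buffer of recently kept hashes. It is an illustration only, not video-sampler's implementation. The names average_hash and dedupe, the helper functions, and the tiny grayscale grids standing in for decoded frames are all assumptions made for the example.

```python
from collections import deque


def average_hash(pixels, hash_size=3):
    """Block-average a 2D grayscale grid down to hash_size x hash_size cells,
    then emit one bit per cell: 1 if the cell is brighter than the mean."""
    rows, cols = len(pixels), len(pixels[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * rows // hash_size, (i + 1) * rows // hash_size
        for j in range(hash_size):
            c0, c1 = j * cols // hash_size, (j + 1) * cols // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(int(v > mean) for v in cells)


def dedupe(frames, hash_size=3, buffer_size=20):
    """Yield indices of frames whose hash is not among the last
    buffer_size hashes that were kept."""
    recent = deque(maxlen=buffer_size)  # circular buffer of recent hashes
    for idx, frame in enumerate(frames):
        h = average_hash(frame, hash_size)
        if h not in recent:
            recent.append(h)
            yield idx


def vertical_split(lo, hi):
    """Hypothetical 6x6 test frame: dark left strip, bright right part."""
    return [[lo, lo, hi, hi, hi, hi] for _ in range(6)]


def horizontal_split(lo, hi):
    """Hypothetical 6x6 test frame: dark top strip, bright bottom part."""
    return [[lo] * 6 if r < 2 else [hi] * 6 for r in range(6)]


frames = [
    vertical_split(0, 255),
    vertical_split(10, 245),   # near-duplicate of the first frame
    horizontal_split(0, 255),  # visually different frame
    vertical_split(0, 255),    # exact repeat of the first frame
]
kept_large = list(dedupe(frames, hash_size=3, buffer_size=20))
kept_small = list(dedupe(frames, hash_size=3, buffer_size=1))
print(kept_large, kept_small)
```

In this sketch a larger buffer remembers more past frames, so the repeat at the end is dropped, while a buffer of size 1 only compares against the most recently kept frame and lets the repeat through. A smaller hash size makes the hash coarser, so more frames collide and are treated as duplicates.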

From the config file (this is the recommended way if you plan to re-use the same config for different videos):

python3 -m video_sampler config ./configs/hash_base.yaml /my-video-folder/ ./my-output-folder

You can set the number of workers to use with the n_workers parameter. The default is 1.

Streaming and RTSP support

RTSP support is experimental and may not work for all RTSP servers, but it should work for most of them. You can test out the RTSP support by running the following command:

python3 -m video_sampler config ./configs/hash_base.yaml rtsp://localhost:8554/some-stream ./sampled-stream/

RTSP simple server is a good way to test RTSP streams.

Other streams (MJPEG) also work, e.g.

python3 -m video_sampler config ./configs/hash_base.yaml "http://honjin1.miemasu.net/nphMotionJpeg?Resolution=640x480&Quality=Standard" ./sampled-stream/

For proper streaming, you may want to adjust min_frame_interval_sec and buffer sizes to have a shorter flush time. Keep in mind that streaming will be sampled until interrupted, so you may want to specify the end time of the stream with end_time_s parameter. If the stream is a looped video, this is especially important -- otherwise, you'll end up overwriting the same frames over and over again.
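The interplay between a minimum frame interval and an end time can be sketched in a few lines. This is a hedged illustration of the idea, not the library's code: the function gate_by_interval is hypothetical, and its parameters merely reuse the names min_frame_interval_sec and end_time_s from the text above with assumed semantics.

```python
def gate_by_interval(timestamps, min_frame_interval_sec=1.0, end_time_s=None):
    """Keep timestamps that are at least min_frame_interval_sec after the
    previously accepted one, and stop once end_time_s has been passed."""
    accepted = []
    last = None
    for t in timestamps:
        if end_time_s is not None and t > end_time_s:
            break  # without an end time, an endless stream never stops
        if last is None or t - last >= min_frame_interval_sec:
            accepted.append(t)
            last = t
    return accepted


# A hypothetical stream delivering a frame every 0.25 s for 10 s.
stream = [i * 0.25 for i in range(40)]
sampled = gate_by_interval(stream, min_frame_interval_sec=1.0, end_time_s=5.0)
print(sampled)
```

A shorter interval lets more frames through per second, which matters for streams because frames only reach the output once the buffer flushes.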

Image sampling

If your frames are ordered, you can use the image_sampler module to sample them. The images need a well-defined order, for example file names that sort correctly such as image_001.png, image_002.png, because the sampler deduplicates against a circular buffer of hashes. An example config for image_sampler is given in ./configs/image_base.yaml. Key differences from video_sampler are:

frame_time_regex - regex to extract the frame time from the filename. If not provided, the frames will be ordered lexicographically.
any video sampling params, such as min_frame_interval_sec and keyframes_only, will be disregarded.
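The difference between lexicographic ordering and ordering by an extracted frame time is easy to see on file names that are not zero-padded. The sketch below is an assumption-laden illustration: order_frames is a hypothetical helper, and it assumes the regex exposes the frame time in its first capture group, which may not match how video-sampler interprets frame_time_regex.

```python
import re


def order_frames(filenames, frame_time_regex=None):
    """Sort file names by the number captured by frame_time_regex,
    falling back to plain lexicographic order when no regex is given."""
    if frame_time_regex is None:
        return sorted(filenames)
    pattern = re.compile(frame_time_regex)

    def frame_time(name):
        match = pattern.search(name)
        if match is None:
            raise ValueError(f"no frame time found in {name!r}")
        return float(match.group(1))  # assumption: time is in group 1

    return sorted(filenames, key=frame_time)


names = ["image_10.png", "image_2.png", "image_1.png"]
lexicographic = order_frames(names)
by_time = order_frames(names, r"image_(\d+)\.png")
print(lexicographic)
print(by_time)
```

Without the regex, image_10.png sorts before image_2.png, which would feed frames to the hash buffer out of order. Zero-padded names such as image_001.png avoid the problem even without a regex.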

To run the image sampler, you need to specify the --images flag:

python3 -m video_sampler config ./configs/image_base.yaml "./folder-frames/worlds-smallest-cat-bbc" ./sampled-output/ --images

YT-DLP integration plugin

Before using this integration, please consult the ToS of the website you are scraping from; use it responsibly and for research purposes. To use the YT-DLP integration, you need to install yt-dlp first (see yt-dlp). Then, you simply add --ytdlp to the command, and it changes the meaning of the video_path argument.

to search

video_sampler hash "ytsearch:cute cats" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp

to sample a single video

video_sampler hash "https://www.youtube.com/watch?v=W86cTIoMv2U" ./folder-frames/ \
    --hash-size 3 --buffer-size 20 --ytdlp

to sample a playlist

video_sampler hash "https://www.youtube.com/watch?v=GbpP3Sxp-1U&list=PLFezMcAw96RGvTTTbdKrqew9seO2ZGRmk" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp

to segment based on keyword extraction

video_sampler hash "https://www.youtube.com/watch?v=GbpP3Sxp-1U&list=PLFezMcAw96RGvTTTbdKrqew9seO2ZGRmk" ./folder-frames/ \
  --hash-size 3 --buffer-size 20 --ytdlp --keywords "cat,dog,another keyword,test keyword"

The videos are never directly downloaded, only streamed, so you can use it to sample videos from the internet without downloading them first.

Extra YT-DLP options

You can pass extra options to yt-dlp by using the --yt-extra-args flag. For example:

this will only sample videos uploaded before 2019-01-01:

... --ytdlp --yt-extra-args '--datebefore 20190101'

or this will only sample videos uploaded after 2019-01-01:

... --ytdlp --yt-extra-args '--dateafter 20190101'

or this will skip all shorts:

... --ytdlp --yt-extra-args '--match-filter "original_url!*=/shorts/ & url!*=/shorts/"'

OpenAI summary

To use the OpenAI multimodal models integration, you need to install openai first (pip install openai). Then, you simply add --summary-url and --summary-interval to the command.

In the example, I'm using the llamafile LLAVA model to summarize the video at most once every 50 seconds. If you want to use the OpenAI multimodal models, you need to export OPENAI_API_KEY=your_api_key first. The same request format should also work with the default OpenAI endpoints.

To replicate, run LLAVA model locally and set the summary-url to the address of the model. Specify the summary-interval to the minimal interval in seconds between frames that are to be summarised/described.

video_sampler hash ./videos/FatCat.mp4 ./output-frames/ --hash-size 3 --buffer-size 20 --summary-url "http://localhost:8080/completion" --summary-interval 50

Supported environment variables, in case you need them:

OPENAI_API_KEY - OpenAI API key
OPENAI_MODEL - OpenAI model name

This is confirmed to work with e.g. LM Studio, but you need to adjust the summary-url to the correct address, e.g. it might be "http://localhost:8080/completions". The same applies if you want to use the OpenAI API.

Some frames, based on the interval specified, will be summarised by the model and the result will be saved in the ./output-frames/summaries.jsonl file. Summarisation happens after the sampling and gating process, and only those frames that pass both stages are eligible for summarisation.

summaries.jsonl
---
{"time": 56.087, "summary": "A cat is walking through a field of tall grass, with its head down and ears back. The cat appears to be looking for something in the grass, possibly a mouse or another small creature. The field is covered in snow, adding a wintry atmosphere to the scene."}
{"time": 110.087, "summary": "A dog is walking in the snow, with its head down, possibly sniffing the ground. The dog is the main focus of the image, and it appears to be a small animal. The snowy landscape is visible in the background, creating a serene and cold atmosphere."}
{"time": 171.127, "summary": "The image features a group of animals, including a dog and a cat, standing on a beach near the ocean. The dog is positioned closer to the left side of the image, while the cat is located more towards the center. The scene is set against a beautiful backdrop of a blue sky and a vibrant green ocean. The animals appear to be enjoying their time on the beach, possibly taking a break from their daily activities."}
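Because summaries.jsonl holds one JSON object per line, it can be read with the standard library alone. The sketch below parses such lines and checks the spacing between summarised frames. The helper load_summaries is hypothetical, and the sample lines reuse the times from the output above with each summary shortened to its first sentence.

```python
import json


def load_summaries(lines):
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


# Stand-in for `open("./output-frames/summaries.jsonl")`, which is also
# an iterable of lines and can be passed to load_summaries directly.
sample_lines = [
    '{"time": 56.087, "summary": "A cat is walking through a field of tall grass, with its head down and ears back."}',
    '{"time": 110.087, "summary": "A dog is walking in the snow, with its head down, possibly sniffing the ground."}',
    '{"time": 171.127, "summary": "The image features a group of animals, including a dog and a cat, standing on a beach near the ocean."}',
]

summaries = load_summaries(sample_lines)
times = [entry["time"] for entry in summaries]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print([round(gap, 3) for gap in gaps])
```

The gaps between consecutive entries in this sample are all above 50, which is consistent with --summary-interval 50 acting as a minimum spacing in seconds.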

API examples

See examples in ./scripts.

Advanced usage

There are 3 sampling methods available:

hash - uses perceptual hashing to reduce duplicated samples
entropy - uses entropy to reduce duplicated samples (work in progress)
gzip - uses gzip compressed size to reduce duplicated samples (work in progress)

To launch any of them you can run and substitute method-name with one of the above:

video_sampler buffer <method-name> ...other options

e.g.

video_sampler buffer entropy --buffer-size 20 ...

where buffer-size for entropy and gzip means the top-k sliding buffer size. The sliding buffer also uses hashing to reduce duplicated samples.
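For intuition about the entropy and gzip methods, the following sketch scores raw byte buffers by Shannon entropy and by gzip-compressed size, then keeps the k highest-scoring items. It is a rough illustration under stated assumptions, not the library's implementation: the function names are hypothetical, byte strings stand in for decoded frames, and reading "top-k" as "keep the k highest scores" is an interpretation.

```python
import gzip
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gzip_size(data: bytes) -> int:
    """Compressed size in bytes; repetitive content compresses better."""
    return len(gzip.compress(data))


def top_k(items, score, k):
    """Indices of the k items with the highest score, best first."""
    ranked = sorted(range(len(items)), key=lambda i: score(items[i]), reverse=True)
    return ranked[:k]


flat = bytes(256)              # a uniform "frame": every byte is 0
striped = bytes([0, 255] * 128)  # two alternating values
varied = bytes(range(256))     # every byte value appears once

candidates = [flat, striped, varied]
best_by_entropy = top_k(candidates, shannon_entropy, k=2)
best_by_gzip = top_k(candidates, gzip_size, k=2)
print(best_by_entropy, best_by_gzip)
```

Both scores favour content with more variation, so a uniform frame ranks last under either measure.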

Gating

Aside from basic sampling rules, you can also apply gating rules to the sampled frames, further reducing the number of frames. There are 3 gating methods available:

pass - pass all frames
clip - use CLIP to filter out frames that do not contain the specified objects
blur - use blur detection to filter out frames that are too blurry

Here's a quick example of how to use clip:

python3 -m video_sampler clip ./videos ./scratch/clip --pos-samples "a cat" --neg-samples "empty background, a lemur" --hash-size 4

CLIP-based gating comparison

Here's a brief comparison of the frames sampled with and without CLIP-based gating with the following config:

  gate_def = dict(
      type="clip",
      pos_samples=["a cat"],
      neg_samples=[
          "an empty background",
          "text on screen",
          "a forest with no animals",
      ],
      model_name="ViT-B-32",
      batch_size=32,
      pos_margin=0.2,
      neg_margin=0.3,
  )

Evidently, CLIP-based gating is able to filter out frames that do not contain a cat and in consequence, reduce the number of frames with plain background. It also thinks that a lemur is a cat, which is not entirely wrong as fluffy creatures go.

[Image comparison, one column per setting: Pass gate (no gating), CLIP gate, Grid]

The effects of gating in numbers, for this particular set of examples (see produced vs gated columns). produced represents the number of frames sampled without gating, here after the perceptual hashing, while gated represents the number of frames sampled after gating.

video	buffer	gate	decoded	produced	gated
FatCat.mp4	grid	pass	179	31	31
SmolCat.mp4	grid	pass	118	24	24
HighLemurs.mp4	grid	pass	161	35	35
FatCat.mp4	hash	pass	179	101	101
SmolCat.mp4	hash	pass	118	61	61
HighLemurs.mp4	hash	pass	161	126	126
FatCat.mp4	hash	clip	179	101	73
SmolCat.mp4	hash	clip	118	61	31
HighLemurs.mp4	hash	clip	161	126	66
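The effect of the clip gate can be read off the table as the fraction of produced frames that survive gating. The short calculation below uses the produced and gated values from the three hash/clip rows above; the helper kept_fraction is introduced only for this example.

```python
# (video, produced, gated) for the hash buffer with the clip gate,
# copied from the table above.
rows = [
    ("FatCat.mp4", 101, 73),
    ("SmolCat.mp4", 61, 31),
    ("HighLemurs.mp4", 126, 66),
]


def kept_fraction(produced, gated):
    """Fraction of sampled frames that pass the gate."""
    return gated / produced


for video, produced, gated in rows:
    kept = kept_fraction(produced, gated)
    print(f"{video}: kept {kept:.1%}, dropped {1 - kept:.1%}")
```

For these examples the clip gate keeps roughly 72%, 51% and 52% of the hashed frames respectively.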

Blur gating

Helps a little with blurry videos. Adjust threshold and method (laplacian or fft) for best results. Some results from fft at threshold=20:

video	buffer	gate	decoded	produced	gated
MadLad.mp4	grid	pass	120	31	31
MadLad.mp4	hash	pass	120	110	110
MadLad.mp4	hash	blur	120	110	85
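As background for the laplacian method, a common blur measure is the variance of the Laplacian: sharp images have strong local intensity changes, so the Laplacian response varies a lot, while blurry or smooth images give a low variance. The sketch below is a plain-Python illustration of that measure on small grayscale grids. It is not the library's implementation, the function names are hypothetical, and the threshold value is arbitrary and not comparable to the threshold=20 used for fft above.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels
    of a 2D grayscale grid (a list of equally long rows)."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_sharp(gray, threshold):
    """Treat a frame as sharp when its Laplacian variance reaches threshold."""
    return laplacian_variance(gray) >= threshold


# Hypothetical 8x8 test images: a checkerboard (many hard edges)
# and a smooth horizontal gradient (no edges at all).
checkerboard = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
gradient = [[10 * x for x in range(8)] for _ in range(8)]

print(is_sharp(checkerboard, threshold=100.0))
print(is_sharp(gradient, threshold=100.0))
```

A blur gate built on this measure would drop frames scoring below the threshold, so the threshold has to be tuned per video, as noted above.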

Benchmarks

Configuration for this benchmark:

SamplerConfig(min_frame_interval_sec=1.0, keyframes_only=True, buffer_size=30, hash_size=X, queue_wait=0.1, debug=True)

Video	Total frames	Hash size	Decoded	Saved
SmolCat	2936	8	118	106
SmolCat	-	4	-	61
Fat Cat	4462	8	179	163
Fat Cat	-	4	-	101
HighLemurs	4020	8	161	154
HighLemurs	-	4	-	126

SamplerConfig(
    min_frame_interval_sec=1.0,
    keyframes_only=True,
    queue_wait=0.1,
    debug=False,
    print_stats=True,
    buffer_config={
        "type": "entropy",  # or "gzip"; see the Type column below
        "size": 30,
        "debug": False,
        "hash_size": 8,
        "expiry": 50,
    },
)

Video	Total frames	Type	Decoded	Saved
SmolCat	2936	entropy	118	39
SmolCat	-	gzip	-	39
Fat Cat	4462	entropy	179	64
Fat Cat	-	gzip	-	73
HighLemurs	4020	entropy	161	59
HighLemurs	-	gzip	-	63

Benchmark videos

Flit commands

Build

flit build

Install

flit install

Publish

Remember to bump the version in pyproject.toml before publishing.

flit publish

🛡 License

This project is licensed under the terms of the MIT license. See LICENSE for more details.

📃 Citation

@misc{video-sampler,
  author = {video-sampler},
  title = {Video sampler allows you to efficiently sample video frames},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LemurPwned/video-sampler}}
}
