Implemented hardware decoding #1685

matthewlai · 2024-12-14T19:09:17Z

This implements hardware decoding continuing from the original work of @rvillalba-novetta and refactoring by @mikeboers in main...rvillalba-novetta:PyAV:hwaccel (and children commits)

I've completed the refactoring, fixed a few bugs, and brought the changes up to date with both PyAV at HEAD and ffmpeg 7.

I've also added a test and an example script. The test cannot be run in CI for obvious reasons (GitHub runners don't have GPUs), but I've manually run it on Mac, Windows, and Linux (WSL).

I understand that @mikeboers didn't find it worthwhile back in the day, but the situation has changed significantly since then: almost every machine now has a supported decoder, and modern encodings (HEVC, VP9, AV1) are so compute-intensive to decode that they have practically been designed for hardware decoding. As an example, I've been working with 4K HEVC videos (the standard format coming out of action cameras and phones these days), and my Apple M3 can only decode them at about 15 fps on the CPU, while VideoToolbox easily does 100+ fps. I believe that now, and especially going forward, hardware decoding will be a necessity for almost any application that cares about performance or needs realtime decoding.

Performance tests with a 4K HEVC video (using the new example script) -
MacBook Air M3:
CPU (M3) - 16.3 fps
GPU (M3, videotoolbox) - 121.4 fps

Windows:
CPU (AMD Ryzen 5 3500X) - 15.6 fps
GPU (NVIDIA RTX 3060 Ti, using d3d11va) - 49.1 fps

Linux (WSL):
CPU (AMD Ryzen 5 3500X) - 15.5 fps
GPU (NVIDIA RTX 3060 Ti, using cuda) - 91.3 fps
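
For reference, numbers like the ones above can be reproduced with a loop along these lines. This is a minimal sketch, not the example script itself: `count_fps` and `decode_fps` are hypothetical helper names, and the `HWAccel(device_type=..., allow_software_fallback=...)` constructor and the `hwaccel=` argument to `av.open` are assumed to match the API as merged.

```python
import time


def count_fps(frames, clock=time.perf_counter):
    """Consume an iterable of decoded frames and return (frame_count, fps)."""
    start = clock()
    frame_count = 0
    for _ in frames:
        frame_count += 1
    elapsed = clock() - start
    fps = frame_count / elapsed if elapsed > 0 else 0.0
    return frame_count, fps


def decode_fps(path, device_type=None):
    """Decode the first video stream of `path` and return (frame_count, fps).

    device_type=None decodes in software; otherwise it names a hardware
    device type such as "videotoolbox", "d3d11va" or "cuda".
    """
    # Imported lazily so count_fps can be used without PyAV installed.
    import av
    from av.codec.hwaccel import HWAccel  # assumed API, see lead-in

    hwaccel = None
    if device_type is not None:
        # Fail instead of silently falling back to the CPU, so the
        # measurement really reflects hardware decoding.
        hwaccel = HWAccel(device_type=device_type, allow_software_fallback=False)

    with av.open(path, hwaccel=hwaccel) as container:
        if hwaccel is None:
            # Let ffmpeg pick a threading mode for the software baseline.
            container.streams.video[0].thread_type = "AUTO"
        return count_fps(container.decode(video=0))
```

Calling `decode_fps(path)` and then `decode_fps(path, "videotoolbox")` on the same file gives the CPU and GPU figures for one machine.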

Notes:

Windows accelerated performance is much lower than using ffmpeg directly (ffmpeg -hwaccel d3d11va -i ... -f null -). I don't know why, and it's hard for me to investigate on Windows because I haven't figured out how to build PyAV there and have to rely on GitHub Actions. Using ffmpeg I get around 120 fps with d3d11va, and 150 fps with cuda. Maybe we need threading to interleave PCI-E transfers with decode?
On Linux it's a bit slower but probably close enough. On Mac it's almost as fast as ffmpeg CLI.
The ffmpeg included in wheels already has videotoolbox and d3d11va/d3d12va, so Windows and Mac are pretty well covered. Linux mostly only has vendor APIs and none are being built right now. Windows would also benefit from vendor APIs, but D3D11 is probably good enough for now (for NVIDIA it's about 15% slower).
No new failures in existing tests.
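
The ffmpeg CLI baseline from the first note can be scripted for a side-by-side comparison. A minimal sketch, assuming `ffmpeg` is on PATH; the helper names are hypothetical and the flags are only those from the command quoted above. The wall-clock time includes process startup, so treat it as a rough figure.

```python
import subprocess
import time


def ffmpeg_null_decode_cmd(input_path, hwaccel=None):
    """Build `ffmpeg [-hwaccel X] -i INPUT -f null -` as an argument list."""
    cmd = ["ffmpeg"]
    if hwaccel is not None:
        cmd += ["-hwaccel", hwaccel]
    # "-f null -" decodes everything and discards the output.
    cmd += ["-i", input_path, "-f", "null", "-"]
    return cmd


def time_ffmpeg_decode(input_path, hwaccel=None):
    """Run the decode-only command and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        ffmpeg_null_decode_cmd(input_path, hwaccel),
        check=True,  # raise if ffmpeg exits with an error
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start
```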

Thanks!

av/codec/codec.pyx

WyattBlue · 2024-12-15T04:47:42Z

make lint is failing.

12 tests are also failing on my physical macOS machine. The errors are the same as in GitHub Actions.

matthewlai · 2024-12-15T05:06:57Z

Sorry about that. I thought they were existing failures, but I forgot to do a make clean when I synced back, and apparently the .so's didn't get rebuilt? I'll investigate, unless you can see why formats may be getting set to None in streams?

matthewlai · 2024-12-15T07:22:47Z

All tests pass locally for me now. Also made make lint happy.

WyattBlue · 2024-12-17T12:22:28Z

I've cleaned up the things that remained a problem.

legraphista · 2024-12-23T09:38:45Z

examples/basics/hw_decode.py

+if HW_DEVICE is None:
+    av.codec.hwaccel.dump_hwdevices()
+    print("Please set HW_DEVICE.")
+    exit()
+
+assert HW_DEVICE in av.codec.hwaccel.hwdevices_available, f"{HW_DEVICE} not available."


I think the example is a little outdated here:

there is no dump_hwdevices member in av.codec.hwaccel

av.codec.hwaccel.hwdevices_available is a cyfunction, not a list, set or dict

#1689 should fix it.
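
Assuming `hwdevices_available` is meant to be called (it is reported above as a cyfunction) and returns an iterable of device type names, the check in the example could be written roughly as follows. `require_hw_device` and `main` are hypothetical names used for illustration; whether #1689 takes this approach is not confirmed here.

```python
def require_hw_device(requested, available):
    """Return `requested` if it is listed in `available`, else raise ValueError."""
    available = list(available)
    if requested is None:
        raise ValueError(f"Please set HW_DEVICE. Options are: {available}")
    if requested not in available:
        raise ValueError(f"{requested} not available. Options are: {available}")
    return requested


def main():
    import os

    # Imported lazily so require_hw_device can be used without PyAV installed.
    from av.codec.hwaccel import hwdevices_available

    # hwdevices_available is a function, so call it before testing membership.
    device = require_hw_device(os.environ.get("HW_DEVICE"), hwdevices_available())
    print(f"Using hardware device: {device}")
```

Running `main()` with `HW_DEVICE` unset raises an error that lists the options, which covers what `dump_hwdevices()` was presumably meant to print.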

matthewlai · 2024-12-23T11:37:47Z

The HWAccel and HWAccelContext merge is also creating problems with multiple streams sharing one context. I'm preparing a new PR to fix these issues.

legraphista · 2024-12-23T12:42:46Z

> The HWAccel and HWAccelContext merge is also creating problems with multiple streams sharing one context. I'm preparing a new PR to fix these issues.

If you're doing a second PR, could you also have a look at #1690?

WyattBlue reviewed Dec 15, 2024

av/codec/codec.pyx (outdated)

matthewlai force-pushed the hwaccel branch from c39c47c to 6420f5a Compare December 15, 2024 05:01

matthewlai force-pushed the hwaccel branch from 6420f5a to e8983d1 Compare December 15, 2024 07:22

Implemented hardware decoding

2a7e38b

This implements hardware decoding continuing from the work of @rvillalba-novetta and @mikeboers in main...rvillalba-novetta:PyAV:hwaccel (and children commits)

matthewlai force-pushed the hwaccel branch from e8983d1 to 2a7e38b Compare December 15, 2024 07:40

WyattBlue added 2 commits December 17, 2024 04:54

Clean up

a0fc7ec

No args forwarding in Codec.create

6fa996c

WyattBlue force-pushed the hwaccel branch 2 times, most recently from 44f8397 to 741834d Compare December 17, 2024 11:26

Clean up hwaccel

d5121a1

WyattBlue force-pushed the hwaccel branch from 741834d to d5121a1 Compare December 17, 2024 11:47

WyattBlue added 2 commits December 17, 2024 07:06

Combine hwaccel and hwaccel ctx

2b9e3cf

Add changelog

98b863d

WyattBlue merged commit 2c63608 into PyAV-Org:main Dec 17, 2024
8 checks passed

legraphista reviewed Dec 23, 2024

legraphista mentioned this pull request Dec 23, 2024

AV1 support for HW decoding #1690

Closed

6 tasks

matthewlai mentioned this pull request Dec 27, 2024

Make previews fast matthewlai/ReefShader#1

Open
