Focus on Chapter metadata
Centralized information about audio chapters and their various implementations.
Standard | Marker | Supported fields | Specifications | Notes |
---|---|---|---|---|
ID3v1 | None | N/A | | |
APE tag | None | N/A | | |
ID3v2 | CHAP / CTOC | Start timestamp, End timestamp, Start frame, End frame, Title, Subtitle + any other ID3v2 field | Link | Very modular |
Vorbis | CHAPTERnnnXXX | Start timestamp, title, URL | Link | Purely text-based, simple and efficient |
MP4 | "Quicktime" chapters (CHAP) | Duration, title, attached picture | Link - short paragraph about chapter lists | Text data muxed into the media stream (akin to subtitles) -> complex to read and write |
MP4 | "Nero" chapters (CHPL) | Start timestamp, title | None | Metadata-based -> simpler to use than QT chapters |
Matroska | Chapters Element | Start timestamp, End timestamp, title | Basic / Advanced | Metadata-based, using Matroska Elements |
The ID3v2 chapter structure is organized around the CTOC (table of contents) and CHAP (chapter) frames.
- CTOC lists all chapters and gives them an order. CTOC frames can be nested in one another, which makes it possible to describe a tree structure.
- CHAP describes a chapter: title, subtitle, start and end frames / timestamps... The format allows any ID3v2 field to be used to describe a chapter, which makes it very flexible.
The standard is documented thoroughly at http://id3.org/id3v2-chapters-1.0
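To give an idea of what a CHAP frame looks like on disk, here is a minimal sketch that decodes the fixed part of a CHAP frame body. It assumes the layout given in the specification linked above: a null-terminated element ID, followed by four big-endian 32-bit integers (start time and end time in milliseconds, then start and end byte offsets, where 0xFFFFFFFF means "unused"). The function name `parse_chap_body` and the dictionary keys are hypothetical, decoding the element ID as Latin-1 is an assumption, and the embedded sub-frames (title, subtitle...) are returned as raw bytes rather than decoded.

```python
import struct

UNUSED_OFFSET = 0xFFFFFFFF  # all bits set: the offset should be ignored


def parse_chap_body(body: bytes) -> dict:
    """Decode the fixed part of a CHAP frame body (frame header already stripped)."""
    end_of_id = body.index(b"\x00")  # the element ID is null-terminated
    element_id = body[:end_of_id].decode("latin-1")  # assumption: Latin-1 text
    fixed_start = end_of_id + 1
    start_ms, end_ms, start_off, end_off = struct.unpack_from(">IIII", body, fixed_start)
    return {
        "element_id": element_id,
        "start_ms": start_ms,
        "end_ms": end_ms,
        "start_offset": None if start_off == UNUSED_OFFSET else start_off,
        "end_offset": None if end_off == UNUSED_OFFSET else end_off,
        # Embedded ID3v2 sub-frames follow the 16 bytes of fixed fields
        "sub_frames": body[fixed_start + 16:],
    }
```

A real reader would then parse `sub_frames` with its regular ID3v2 frame parser, which is what makes the format so flexible.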
The Vorbis structure for chapters is very simple and efficient. Being text-based, it fits into "vanilla" Vorbis comments.
The idea is to describe chapter metadata with ordinary comment fields whose names are prefixed with "CHAPTERnnn" (e.g. CHAPTER001NAME=Prologue)
The standard is documented thoroughly at https://wiki.xiph.org/Chapter_Extension
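Because everything is plain text, reading these chapters takes very little code. The sketch below groups "CHAPTERnnn" comments by chapter number. It assumes that the bare CHAPTERnnn field holds the start timestamp formatted as HH:MM:SS.mmm and that suffixes such as NAME and URL carry the other values; the function names and the shape of the returned dictionaries are hypothetical.

```python
import re

# CHAPTER + three digits + optional suffix (NAME, URL...)
CHAPTER_FIELD = re.compile(r"^CHAPTER(\d{3})([A-Z]*)$", re.IGNORECASE)


def timestamp_to_ms(text: str) -> int:
    """Convert an HH:MM:SS.mmm timestamp to milliseconds."""
    hours, minutes, seconds = text.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def parse_vorbis_chapters(comments):
    """Build an ordered chapter list from raw 'NAME=value' comment strings."""
    chapters = {}
    for comment in comments:
        name, _, value = comment.partition("=")
        match = CHAPTER_FIELD.match(name)
        if not match:
            continue  # not a chapter field (e.g. TITLE, ARTIST)
        number, suffix = int(match.group(1)), match.group(2).upper()
        chapter = chapters.setdefault(number, {})
        if suffix == "":
            chapter["start_ms"] = timestamp_to_ms(value)
        else:
            chapter[suffix.lower()] = value
    return [chapters[number] for number in sorted(chapters)]


comments = [
    "TITLE=My audiobook",
    "CHAPTER001=00:00:00.000",
    "CHAPTER001NAME=Prologue",
    "CHAPTER002=00:01:30.500",
    "CHAPTER002NAME=Part one",
]
print(parse_vorbis_chapters(comments))
```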
MP4 allows two methods for describing chapters: "Quicktime" chapters and "Nero" chapters.
The MP4 file format being a descendant of the Quicktime file format, they have much in common, including chapter formatting.
Contrary to the ID3v2 or Vorbis implementations, QT chapters are not described in an isolated metadata field somewhere in the header of the file. They are incorporated (multiplexed, to be more precise) into the media data stream itself. They "show up" at their own start timestamps, just like subtitles would. In fact, they use the same data structure as subtitles.
As a consequence, parsing QT chapters requires a bit more than peeking into the udta.meta atom:
- In every audio track, look for moov.trak.tref.chap (optional atom)
- If present, 'chap' contains as many int32 (1) as there are related text tracks that contain chapters (e.g. track 1 has a 'chap' atom containing a single int32 with the '3' value -> that means track 3 contains chapters for track 1)
NB : As far as I know, if an audio track refers to multiple chapter tracks, the one containing chapter titles is the first of the list. I'm not sure about how to identify the contents of the other referred chapter tracks (that might contain chapter URLs, chapter pictures or chapters in other languages ?)
- Go to the referred track and make sure its handler type is 'text'
- Get its number of samples and their durations using moov.trak.mdia.minf.stbl.stts. Be aware that actual durations have to be calculated using the track's own timescale (moov.trak.mdia.mdhd), and not the file timescale (moov.mvhd)
- Map each sample to its containing chunk using moov.trak.mdia.minf.stbl.stsc. Make sure to read the documentation of stsc carefully, as its way of describing data is somewhat unusual.
- Calculate the absolute offset of each sample using moov.trak.mdia.minf.stbl.stco (chunk offset) and moov.trak.mdia.minf.stbl.stsz (sample size)
- For each sample thus located, read its title located at the offset.
Type (1) | Data | Notes |
---|---|---|
int16 | String data size | Size of following string data |
string | Chapter name | Uses UTF-8 encoding; size of binary data is declared on previous field |
(1) : Big-Endian convention
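The last steps of the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the stts, stsc, stco and stsz tables are supposed to be already decoded into plain Python lists (atom parsing itself is left out), stsc entries are taken as (first chunk, samples per chunk, sample description ID) tuples with 1-based chunk numbers, and all function names are hypothetical.

```python
import struct


def sample_start_times(stts_entries, timescale):
    """Start time in seconds of each sample, from (sample_count, sample_delta) pairs."""
    starts, elapsed = [], 0
    for count, delta in stts_entries:
        for _ in range(count):
            starts.append(elapsed / timescale)  # track timescale, not file timescale
            elapsed += delta
    return starts


def sample_offsets(stsc_entries, chunk_offsets, sample_sizes):
    """Absolute file offset of each sample, combining stsc, stco and stsz."""
    offsets = []
    sample_index = 0
    for chunk_number, chunk_offset in enumerate(chunk_offsets, start=1):
        # An stsc entry applies from its first chunk until the next entry starts,
        # so the relevant one is the last entry with first_chunk <= chunk_number.
        samples_in_chunk = [
            per_chunk for first_chunk, per_chunk, _ in stsc_entries
            if first_chunk <= chunk_number
        ][-1]
        position = chunk_offset
        for _ in range(samples_in_chunk):
            if sample_index >= len(sample_sizes):
                break
            offsets.append(position)
            position += sample_sizes[sample_index]  # samples are contiguous in a chunk
            sample_index += 1
    return offsets


def read_chapter_title(data: bytes, offset: int) -> str:
    """Read one text sample: big-endian int16 size, then a UTF-8 string."""
    (size,) = struct.unpack_from(">H", data, offset)
    return data[offset + 2:offset + 2 + size].decode("utf-8")
```

Pairing `sample_start_times(...)` with the titles read at `sample_offsets(...)` gives the chapter list. Passing decoded tables around rather than a file handle keeps the unusual stsc logic isolated and easy to check by hand.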
Specifications for QT chapters are limited to a small paragraph in the Quicktime File Format, which explains how they work from a functional point of view: https://developer.apple.com/standards/qtff-2001.pdf
"Nero" chapters are an alternative to Quicktime chapters implemented by the Nero software suite. They aim at providing a simpler, metadata-based chapter description akin to Vorbis chapters.
They are implemented as a specific atom located at moov.udta.chpl
The contents of the atom are as follows:
Type (1) | Data | Notes |
---|---|---|
int32 | Atom Size | As part of standard MP4 atom header |
char[4] | Atom Name | As part of standard MP4 atom header; value is "chpl" |
byte | Version | Atom version |
int24 | Flags | Atom flags (none known so far) |
byte | Reserved | Unknown reserved byte |
int32 | Chapter count | Number of chapters |
--- | --- | --- Following lines are repeated for each chapter |
int64 | Chapter start time | Uses 100-nanosecond base; divide by 10 000 to get milliseconds |
byte | String data size | Size of following string data |
string | Chapter name | Uses UTF-8 encoding; size of binary data is declared on previous field |
(1) : Big-Endian convention
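Read literally, the table above translates into the following sketch. It assumes the exact layout described in this page (which, as noted below, is not backed by an official specification) and takes the whole atom, header included, as input; the function name `parse_chpl` and the (milliseconds, title) return shape are hypothetical.

```python
import struct


def parse_chpl(atom: bytes):
    """Return a list of (start_ms, title) tuples from a complete chpl atom."""
    size, name = struct.unpack_from(">I4s", atom, 0)
    if name != b"chpl":
        raise ValueError("not a chpl atom")
    # Byte 8: version, bytes 9-11: flags, byte 12: reserved (all skipped here)
    (count,) = struct.unpack_from(">I", atom, 13)
    position = 17
    chapters = []
    for _ in range(count):
        (start_100ns,) = struct.unpack_from(">Q", atom, position)
        title_size = atom[position + 8]
        title_start = position + 9
        title = atom[title_start:title_start + title_size].decode("utf-8")
        chapters.append((start_100ns // 10_000, title))  # 100 ns units -> ms
        position = title_start + title_size
    return chapters
```

Since the string size is stored on a single byte, chapter titles are limited to 255 bytes of UTF-8 data with this layout.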
To my knowledge, understanding of Nero chapters comes from reverse engineering, as there are no official specifications.
NB1 : Quicktime player, iTunes and the built-in iOS audiobook player support Quicktime chapters only, and ignore Nero chapters entirely.
NB2 : Some players such as VLC seem to fail reading Nero chapters properly when there are more than 255 of them, for instance on (very) long audiobooks. As the Nero structure actually allows for any number of chapters to be written, I'm unsure if this is a bug or a part of the Nero standard I'm unaware of...
Matroska has its own data structure to describe chapters through the Chapters Element.
The official doc does a good job at describing its contents.
chapterEditor's dev went down the rabbit hole of describing all advanced use cases and features of Matroska chapters, which looks quite overwhelming.
A few advanced features worth noting :
- "Editions", which can be used to create alternate versions of the same media (e.g. series episode with and without opening credits, alternate cut...)
- Subchapters within chapters (= nested ChapterAtoms, technically speaking)
- Menus
As far as I know, there is no other implementation of audio chapters. I wouldn't be surprised to see Vorbis-like chapters included informally in other standards, as they are portable to any tagging system without effort.