Support multiple instances of tags (like genre) #886

Open
whatdoineed2do opened this issue Jan 15, 2020 · 14 comments

Comments

@whatdoineed2do
Contributor

Some file formats (FLAC) support multiple instances of a tag, such as genre, and ffmpeg happily reports these as semicolon-separated values; indeed, this is what appears in the db.

$ sox -n -c 2 -r 44100 -b 16 /tmp/sine441.flac synth 30 sin 500-100 fade h 0.2 30 0.2 

$ ffprobe -hide_banner -i /tmp/sine441.flac 
Input #0, flac, from '/tmp/sine441.flac':
  Metadata:
    Comment         : Processed by SoX
  Duration: 00:00:30.00, start: 0.000000, bitrate: 206 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16

$ metaflac --set-tag=GENRE=foo /tmp/sine441.flac
$ metaflac --set-tag=GENRE=bar /tmp/sine441.flac
$ metaflac --list /tmp/sine441.flac 
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4096 samples
  maximum blocksize: 4096 samples
  minimum framesize: 2080 bytes
  maximum framesize: 2946 bytes
  sample_rate: 44100 Hz
  channels: 2
  bits-per-sample: 16
  total samples: 1323000
  MD5 signature: 764ae326da977d55bf50734623c34b24
METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 94
  vendor string: reference libFLAC 1.3.2 20170101
  comments: 3
    comment[0]: Comment=Processed by SoX
    comment[1]: GENRE=foo
    comment[2]: GENRE=bar

$ ffprobe -hide_banner -i /tmp/sine441.flac 
Input #0, flac, from '/tmp/sine441.flac':
  Metadata:
    Comment         : Processed by SoX
    GENRE           : foo;bar
  Duration: 00:00:30.00, start: 0.000000, bitrate: 206 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16

MP3s appear to support only one instance of genre, but convention seems to allow a similar scheme of multiple genres separated by ";". Would this be something that you would support?

The main change would be to allow searching and listing by the individual genres. Looking at the db, it (probably) doesn't need to change: it can continue to store the ";"-separated genres in the single db column, and only presentation and search would be affected.
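A minimal sketch of what the presentation side could do under that assumption: the db keeps the raw ffmpeg value (e.g. foo;bar) and the genre listing splits and de-duplicates it. The function name distinct_genres is hypothetical and not part of the project.

```python
def distinct_genres(stored_values, sep=";"):
    """Split each stored genre string on `sep` and return the distinct genres, sorted."""
    genres = set()
    for value in stored_values:
        if not value:
            continue  # tracks without a genre contribute nothing
        for part in value.split(sep):
            part = part.strip()  # tolerate "foo; bar"
            if part:
                genres.add(part)
    # Case-insensitive ordering, as a genre list in a UI would usually be shown
    return sorted(genres, key=str.casefold)


if __name__ == "__main__":
    # Values as they would appear in the genre column after scanning
    print(distinct_genres(["foo;bar", "bar", "Rock;Pop", None]))  # ['bar', 'foo', 'Pop', 'Rock']
```

Listing is the easy half; finding the tracks behind one of these genres is where the search concerns below come in.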

@ejurgensen
Member

Supporting this is fine in itself, but implementing it properly would be difficult, I think. Maybe it isn't even possible without changing the data model. In my view, a proper solution would have to be general, so not just for genre, and would have to not affect search performance too much, so database indexing also needs consideration. If done without changing the data model, I imagine that a number of the searches would become wildcard searches. That would impact performance, and it also brings pains like finding rows with “Rock”, “Rock;Paper” and “Paper;Rock”, but not “Hard Rock”.
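To make both concerns concrete, here is a small in-memory sqlite3 sketch. The one-column files table and the index name are simplified stand-ins, not the project's real schema. It compares SQLite's query plan for an exact match on an indexed genre column with a leading-wildcard LIKE, and shows which rows the wildcard returns. Plan wording varies between SQLite versions, so the sketch only looks for the SEARCH/SCAN keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, genre TEXT)")
conn.execute("CREATE INDEX idx_files_genre ON files (genre)")
conn.executemany(
    "INSERT INTO files (genre) VALUES (?)",
    [("Rock",), ("Rock;Paper",), ("Paper;Rock",), ("Hard Rock",)],
)


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Exact match: SQLite can look the value up directly in the genre index
exact_plan = plan("SELECT id FROM files WHERE genre = ?", ("Rock",))
# Leading wildcard: the index cannot narrow the search, so every row is visited
wildcard_plan = plan("SELECT id FROM files WHERE genre LIKE ?", ("%Rock%",))

# The wildcard also returns "Hard Rock", which is the correctness problem
wildcard_hits = [g for (g,) in conn.execute(
    "SELECT genre FROM files WHERE genre LIKE ? ORDER BY id", ("%Rock%",))]

print(exact_plan)
print(wildcard_plan)
print(wildcard_hits)
```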

That said, I would also like to hear @chme's view. And if the changes you propose are just for the JSON API and the web UI, then I think it is completely up to @chme.

@chme
Collaborator

chme commented Jan 18, 2020

One problem I see, if the db does not change, is how to get the list of genres. You would need to split the genre column in the files table into multiple rows and eliminate duplicates.
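One way to produce that list without a schema change is a recursive CTE that peels the ";"-separated values apart inside SQLite. This is a minimal sketch, assuming a simplified files(id, genre) table; it has to scan and split every row on each call, which is exactly the cost being discussed. DISTINCT here is case-sensitive, so "Rock" and "rock" would still both appear.

```python
import sqlite3

# Each recursion step takes the text before the first ';' as one genre and
# keeps the remainder; appending ';' up front guarantees every element ends with one.
SPLIT_GENRES_SQL = """
WITH RECURSIVE split(genre, rest) AS (
    SELECT '', genre || ';' FROM files WHERE genre IS NOT NULL
    UNION ALL
    SELECT trim(substr(rest, 1, instr(rest, ';') - 1)),
           substr(rest, instr(rest, ';') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT DISTINCT genre FROM split WHERE genre <> '' ORDER BY genre COLLATE NOCASE
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, genre TEXT)")
conn.executemany(
    "INSERT INTO files (genre) VALUES (?)",
    [("foo;bar",), ("bar",), ("Rock; Pop",), (None,)],
)
genres = [row[0] for row in conn.execute(SPLIT_GENRES_SQL)]
print(genres)  # ['bar', 'foo', 'Pop', 'Rock']
```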

I do not see a way to do this without changing the database model (maybe you could outline how the different queries will need to change?).

Maybe it is a good idea to have a new database table for additional metadata. This could also allow storing additional information scraped from the web (e.g. lyrics).

@vasilisvg

Joining in to let you know that I am having the same issue/feature request. Would love to see a solution to this as well (without re-tagging my whole library :-)

@whatdoineed2do
Contributor Author

I did start looking at this, but it's kinda fraught with challenges. There are a couple of options:

  • leave files as it stands and have extraction/queries explicitly separate out multi-tag elements
    queries on multi-tag elements will be more complex (see above)

  • remodel the files table to allow multi-tag elements somehow
    not desirable: multiple rows per song to handle multiple elements

  • add a supplementary meta table that links back to the files entry
    this is the one I explored most, and it is still troubling to resolve. We would keep the main files table as it stands, where each column such as genre continues to hold only one element, and any additional elements would be held in the meta table as name-value pairs

ffmpeg would provide a genre like Pop;Jazz;Smooth Listening, and the scanner would have to insert:

files

id      ...  genre  ...
1234    ...  Pop
2345    ...  Rock

meta

id      type                            value
1234    genre (use int rep)             Jazz
1234    genre (use int rep)             Smooth Listening
2345    something else (use int rep)    xxxx
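A minimal sqlite3 sketch of this layout, assuming a simplified files(id, genre) table and a hypothetical integer code META_GENRE for the "genre" type. The first genre goes into files.genre and the rest into meta. It also shows the catch raised below: every browse by genre has to consult both tables.

```python
import sqlite3

META_GENRE = 1  # hypothetical integer code for "genre" in meta.type

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, genre TEXT);
CREATE TABLE meta (file_id INTEGER NOT NULL REFERENCES files(id),
                   type    INTEGER NOT NULL,
                   value   TEXT NOT NULL);
CREATE INDEX idx_meta_type_value ON meta (type, value);
""")


def insert_file(file_id, ffmpeg_genre):
    """First genre goes into files.genre, any further genres into meta."""
    parts = [p.strip() for p in ffmpeg_genre.split(";") if p.strip()]
    conn.execute("INSERT INTO files (id, genre) VALUES (?, ?)",
                 (file_id, parts[0] if parts else None))
    conn.executemany("INSERT INTO meta (file_id, type, value) VALUES (?, ?, ?)",
                     [(file_id, META_GENRE, p) for p in parts[1:]])


def files_with_genre(genre):
    """Browse by genre: the lookup must combine files and meta."""
    rows = conn.execute(
        "SELECT id FROM files WHERE genre = ? "
        "UNION "
        "SELECT file_id FROM meta WHERE type = ? AND value = ? "
        "ORDER BY 1",
        (genre, META_GENRE, genre))
    return [r[0] for r in rows]


insert_file(1234, "Pop;Jazz;Smooth Listening")
insert_file(2345, "Rock")
print(files_with_genre("Jazz"))  # [1234]
print(files_with_genre("Rock"))  # [2345]
```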

The big problem is the main BROWSE queries: their structure expects lookups from only one table, especially in db_build_query_clause(), which would need to change to pull in data from the meta table. It's not clear whether this approach would be acceptable.

@ejurgensen
Member

I don't think any of those options is the way to go. Since we would want to support n:n relationships, we would have to remove the columns from the files table. We would then need either two tables, one with all the metadata and one for joining the two tables, or a denormalized meta table like the one you have above. Not sure which would be best. As you say, most of the queries would then have to be revisited and optimised, and if their complexity goes through the roof, it wouldn't be acceptable to me.
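For comparison, a minimal sketch of the normalized variant: a genres table plus a join table, so a file can have any number of genres and lookups stay exact. The table and column names are illustrative assumptions, not a proposed schema for the project. With one row per genre, "Rock" and "Hard rock" are simply different rows, so no wildcard is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE COLLATE NOCASE);
-- Join table: one row per (file, genre) pair
CREATE TABLE file_genres (
    file_id  INTEGER NOT NULL REFERENCES files(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (file_id, genre_id)
);
CREATE INDEX idx_file_genres_genre ON file_genres (genre_id);
""")


def add_file(file_id, title, genre_names):
    conn.execute("INSERT INTO files (id, title) VALUES (?, ?)", (file_id, title))
    for name in genre_names:
        # Reuse an existing genre row; the NOCASE unique constraint treats "rock" as "Rock"
        conn.execute("INSERT OR IGNORE INTO genres (name) VALUES (?)", (name,))
        (genre_id,) = conn.execute("SELECT id FROM genres WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO file_genres VALUES (?, ?)", (file_id, genre_id))


def titles_for_genre(name):
    rows = conn.execute("""
        SELECT f.title FROM files f
        JOIN file_genres fg ON fg.file_id = f.id
        JOIN genres g ON g.id = fg.genre_id
        WHERE g.name = ?
        ORDER BY f.title""", (name,))
    return [r[0] for r in rows]


add_file(1, "Album A", ["Rock", "Alternative"])
add_file(2, "Album B", ["Hard rock"])
print(titles_for_genre("Rock"))  # ['Album A']
```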

@Toby-Haynes

With the closure of freedb.org, I expect more people to use musicbrainz.org for their collections. I've taken that path, along with folksonomy tags taken from last.fm. This means that most of the several thousand tracks I have are tagged with three to five genres.

I think you want to leave the genre returned from ffmpeg intact in the files table so that it remains searchable. Then the meta table would include the separated tags one per row.

@hacketiwack
Collaborator

To revive this subject, I would simply complement the API endpoint "/api/library/genres" with a new parameter that would return a list of genres split given a list of characters acting as genre separator.

The list of separators could simply be a configuration in the web UI with a configurable default being ";,/".
These 3 characters probably cover most of the standard possibilities.

@ejurgensen how complicated would it be to have a new parameter on the API for that purpose?
The parameter should be optional.

This would cover #1464 as well.
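The splitting itself could look roughly like this: a Python sketch, assuming the separators arrive as a plain string of characters such as ";,/". The function name split_genre is hypothetical. re.escape keeps characters that are special inside a regex character class (for example "]", "\" or "-") literal.

```python
import re

DEFAULT_SEPARATORS = ";,/"  # default suggested above


def split_genre(value, separators=DEFAULT_SEPARATORS):
    """Split one stored genre string on any of the given separator characters."""
    if not value:
        return []
    if not separators:
        return [value.strip()]
    # Build a character class such as [;,/] from the configured characters
    pattern = "[" + "".join(re.escape(c) for c in separators) + "]"
    return [part.strip() for part in re.split(pattern, value) if part.strip()]


print(split_genre("Rock/Pop; Jazz"))  # ['Rock', 'Pop', 'Jazz']
print(split_genre("Drum & Bass"))     # ['Drum & Bass']
```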

@ejurgensen
Member

list of genres split given a list of characters acting as genre separator

I'm not completely sure I understand what you mean. Do you mean that the endpoint for a library with a file that has Rock/Pop should return Rock and Pop separately? Then yes that is easy enough, but the problem comes afterwards when the user then expects to see the track when clicking either Rock or Pop.

@hacketiwack
Collaborator

@ejurgensen, yes, you understood correctly.

To be clearer, it could work as follows:

A call on the endpoint /api/library/genres?separators=/,; would return:

{
  "items": [
    {
      "name": "Disco",
      "name_sort": "Disco",
      "track_count": 10,
      "album_count": 1,
      "artist_count": 1,
      "length_ms": 2816099,
      "time_added": "2022-06-19T16:55:47Z",
      "in_progress": false,
      "media_kind": "music",
      "data_kind": "file",
      "year": 2007
    },
    {
      "name": "Electronic",
      "name_sort": "Electronic",
      "track_count": 14,
      "album_count": 1,
      "artist_count": 1,
      "length_ms": 2883520,
      "time_added": "2022-06-19T17:02:20Z",
      "in_progress": false,
      "media_kind": "music",
      "data_kind": "file",
      "year": 2010
    },
    {
      "name": "Techno",
      "name_sort": "Techno",
      "track_count": 4,
      "album_count": 1,
      "artist_count": 1,
      "length_ms": 1628001,
      "time_added": "2022-06-19T16:58:53Z",
      "in_progress": false,
      "media_kind": "music",
      "data_kind": "file",
      "year": 1993
    },
    {
      "name": "Acid Jazz",
      "name_sort": "Acid Jazz",
      "track_count": 114,
      "album_count": 17,
      "artist_count": 12,
      "length_ms": 33586231,
      "time_played": "2023-06-30T20:17:40Z",
      "time_added": "2023-01-09T04:30:34Z",
      "in_progress": false,
      "media_kind": "music",
      "data_kind": "file",
      "year": 2014
    },
    {
      "name": "Ambient",
      "name_sort": "Ambient",
      "track_count": 1,
      "album_count": 1,
      "artist_count": 1,
      "length_ms": 313965,
      "time_added": "2022-06-19T16:45:23Z",
      "in_progress": false,
      "media_kind": "music",
      "data_kind": "file",
      "year": 2004
    }
]}
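On the server side, filling in such a response mostly means splitting each track's genre before grouping. This is a rough Python sketch, assuming the server already has a (genre string, album id) pair for every track. Only name, track_count and album_count are computed, and genre_summary is a hypothetical name. A track tagged Disco/Electronic counts towards both genres.

```python
import re
from collections import defaultdict


def genre_summary(tracks, separators=";,/"):
    """Aggregate per-genre counts after splitting each track's genre string."""
    if not separators:
        raise ValueError("at least one separator character is required")
    pattern = "[" + "".join(re.escape(c) for c in separators) + "]"
    track_counts = defaultdict(int)
    album_ids = defaultdict(set)
    for genre_string, album_id in tracks:
        # A set, so a tag like "Rock;Rock" still counts the track only once
        names = {p.strip() for p in re.split(pattern, genre_string or "")}
        names.discard("")
        for name in names:
            track_counts[name] += 1
            album_ids[name].add(album_id)
    return [
        {"name": n, "track_count": track_counts[n], "album_count": len(album_ids[n])}
        for n in sorted(track_counts, key=str.casefold)
    ]


tracks = [("Disco/Electronic", 1), ("Electronic", 2), ("Electronic;Techno", 2), (None, 3)]
for item in genre_summary(tracks):
    print(item)
```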

And instead of using the search endpoint to retrieve the albums or tracks of a specific genre, the following endpoint could be implemented: /api/library/genres/{genre name}?type={albums|tracks}

{
  "items": [
      {
        "id": "2118313652228633454",
        "name": "Alta Fidelidade",
        "name_sort": "Alta Fidelidade",
        "artist": "André Bourgeois & Mano Bap",
        "artist_id": "6257071163004072277",
        "track_count": 10,
        "length_ms": 3394798,
        "time_added": "2023-01-06T11:00:20Z",
        "in_progress": false,
        "media_kind": "music",
        "data_kind": "file",
        "year": 2004,
        "uri": "library:album:2118313652228633454",
        "artwork_url": "./artwork/group/39986"
      },
      {
        "id": "7624549345586162870",
        "name": "Europop",
        "name_sort": "Europop",
        "artist": "Eiffel 65",
        "artist_id": "4385975471614414567",
        "track_count": 1,
        "length_ms": 286720,
        "time_added": "2022-06-19T17:04:26Z",
        "in_progress": false,
        "media_kind": "music",
        "data_kind": "file",
        "year": 1999,
        "uri": "library:album:7624549345586162870",
        "artwork_url": "./artwork/group/41017"
      },
      {
        "id": "227802217919468243",
        "name": "From One Human Being to Another",
        "name_sort": "From One Human Being to Another",
        "artist": "Mourah",
        "artist_id": "813699217636935292",
        "track_count": 11,
        "length_ms": 3644608,
        "time_added": "2022-06-19T16:55:34Z",
        "in_progress": false,
        "media_kind": "music",
        "data_kind": "file",
        "year": 2003,
        "uri": "library:album:227802217919468243",
        "artwork_url": "./artwork/group/36420"
      },
      {
        "id": "1831492059001993846",
        "name": "Handcream for a Generation",
        "name_sort": "Handcream for a Generation",
        "artist": "Cornershop",
        "artist_id": "1026720172435764867",
        "track_count": 4,
        "length_ms": 1020924,
        "time_added": "2022-06-19T16:53:41Z",
        "in_progress": false,
        "media_kind": "music",
        "data_kind": "file",
        "year": 2002,
        "uri": "library:album:1831492059001993846",
        "artwork_url": "./artwork/group/35453"
      }
]}
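Clients would need to encode both proposed URLs, since genre names often contain spaces and the separators parameter contains reserved characters. A small Python sketch with urllib.parse; the endpoints are the proposals above, not an existing API, and whether the server decodes %2F before routing is an assumption to verify.

```python
from urllib.parse import quote, urlencode


def genres_url(separators):
    """URL for the proposed genre listing with a separators parameter."""
    return "/api/library/genres?" + urlencode({"separators": separators})


def genre_items_url(genre, item_type="albums"):
    """URL for the proposed per-genre endpoint; item_type is 'albums' or 'tracks'."""
    if item_type not in ("albums", "tracks"):
        raise ValueError("item_type must be 'albums' or 'tracks'")
    # safe="" also encodes "/", so a genre name cannot add extra path segments
    return "/api/library/genres/" + quote(genre, safe="") + "?" + urlencode({"type": item_type})


print(genres_url("/,;"))             # /api/library/genres?separators=%2F%2C%3B
print(genre_items_url("Acid Jazz"))  # /api/library/genres/Acid%20Jazz?type=albums
```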

By the way, I'm a bit surprised that the current call on /api/library/genres returns the properties time_added, in_progress, data_kind, year.

@ejurgensen
Member

My thought was around how to search for albums/artists in these cases. Let's say you have an album tagged "Rock/Alternative" and another "Hard rock". So the split genre list would be:
Alternative
Hard rock
Rock

If the user clicks "Alternative", you want to return the first album, but that requires a wildcard search. That gives a problem when the user clicks "Rock", because then both albums will be returned (assuming the search is case insensitive). Maybe there is some search query magic that could solve this, but I fear it could get ugly.
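The behaviour the user expects is "equal to one whole element", not "contains the text". A small Python reference sketch of that semantics, using the two albums from the example (has_genre is a hypothetical helper), makes the difference from a substring search visible. Any SQL query would have to reproduce the element-match column.

```python
def has_genre(stored, wanted, separators="/"):
    """True if `wanted` equals one whole element of `stored`, ignoring case."""
    if not stored:
        return False
    elements = [stored]
    for sep in separators:
        # Split every element again on the next separator character
        elements = [piece for e in elements for piece in e.split(sep)]
    target = wanted.strip().casefold()
    return any(e.strip().casefold() == target for e in elements)


albums = {"Album 1": "Rock/Alternative", "Album 2": "Hard rock"}
for genre in ("Alternative", "Hard rock", "Rock"):
    by_element = [a for a, g in albums.items() if has_genre(g, genre)]
    by_substring = [a for a, g in albums.items() if genre.casefold() in g.casefold()]
    print(f"{genre}: element match {by_element}, substring match {by_substring}")
```

For "Rock", the element match returns only Album 1, while the case-insensitive substring match returns both albums.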

@hacketiwack
Copy link
Collaborator

Indeed, when clicking Rock, it should return only one album.
I will think about a query that could potentially work without being ugly.

@hacketiwack
Collaborator

Given that the user provides a list of separator characters (,;/), the query could look like this:

SELECT *
FROM files
WHERE
    -- Pad the stored value with the separator on both sides so that the
    -- first and last genre in the list can match as well
    ',' || genre || ',' LIKE '%,Easy Listening,%'
    OR ';' || genre || ';' LIKE '%;Easy Listening;%'
    OR '/' || genre || '/' LIKE '%/Easy Listening/%'
    -- Add more conditions for other separation characters as needed
    OR '\' || genre || '\' LIKE '%\Easy Listening\%';

Efficiency might not be very high though.
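A parameterized version of the same idea, run against the Rock/Alternative and Hard rock example with sqlite3. This is a sketch under assumptions: a simplified files(title, genre) table, and "!" chosen as the LIKE escape character so that "%" or "_" inside genre names match literally. SQLite's LIKE is case-insensitive only for ASCII letters.

```python
import sqlite3


def like_escape(text):
    """Escape LIKE wildcards so '%' and '_' in genre names match literally."""
    return text.replace("!", "!!").replace("%", "!%").replace("_", "!_")


def find_by_genre(conn, genre, separators=",;/"):
    """Padded-LIKE search: one OR'ed condition per separator character."""
    if not separators:
        raise ValueError("at least one separator character is required")
    clauses, params = [], []
    for sep in separators:
        # Padding the stored value lets the first and last element match too
        clauses.append("(? || genre || ?) LIKE ? ESCAPE '!'")
        params.extend([sep, sep, "%" + like_escape(sep + genre + sep) + "%"])
    sql = "SELECT title FROM files WHERE " + " OR ".join(clauses) + " ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (title TEXT, genre TEXT)")
conn.executemany("INSERT INTO files VALUES (?, ?)",
                 [("Album 1", "Rock/Alternative"), ("Album 2", "Hard rock")])
print(find_by_genre(conn, "Rock"))         # ['Album 1']
print(find_by_genre(conn, "Alternative"))  # ['Album 1']
print(find_by_genre(conn, "Hard rock"))    # ['Album 2']
```

One limitation: a tag that mixes separators, such as Rock;Pop/Jazz, will not match "Pop" under any single padding character, which is one argument for converting everything to a single separator at scan time.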

@ejurgensen
Member

Yes, that could work. The search will be a bit slow, but maybe that's ok here. I will look into it so we can test it a bit. I think it will be with just one kind of separator. Other separators can be converted to that when the library is scanned.
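A sketch of what scan-time normalization could look like, assuming ";" as the single stored separator and "," and "/" as the characters converted to it. The function names are hypothetical. With one separator, a single padded LIKE is enough, and mixed tags like Rock;Pop/Jazz become searchable. Converting "/" also splits any genre name that legitimately contains a slash, so which characters to convert is a policy choice.

```python
import re
import sqlite3


def normalize_genre(raw, separators=",/", canonical=";"):
    """Scan-time normalization: rewrite every separator to one canonical character."""
    if raw is None:
        return None
    chars = separators + canonical
    pattern = "[" + "".join(re.escape(c) for c in chars) + "]"
    seen, parts = set(), []
    for part in re.split(pattern, raw):
        part = part.strip()
        if part and part.casefold() not in seen:  # drop empty and duplicate elements
            seen.add(part.casefold())
            parts.append(part)
    return canonical.join(parts)


def titles_with_genre(conn, genre):
    # With a single ';' separator, one padded LIKE condition is enough.
    # Genre names containing % or _ would need escaping; omitted for brevity.
    rows = conn.execute(
        "SELECT title FROM files WHERE (';' || genre || ';') LIKE ? ORDER BY title",
        ("%;" + genre + ";%",))
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (title TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [(title, normalize_genre(raw)) for title, raw in [
        ("Album 1", "Rock/Alternative"),
        ("Album 2", "Hard rock"),
        ("Album 3", "Rock;Pop/Jazz"),  # mixed separators in one tag
    ]])
print(titles_with_genre(conn, "Pop"))   # ['Album 3']
print(titles_with_genre(conn, "Rock"))  # ['Album 1', 'Album 3']
```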

@hacketiwack
Collaborator

Sounds good. Your idea with the separators being converted when the library is scanned sounds right to me.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants