I maintain my own digital music collection. The only two tools I use for maintaining the CD portion of my collection are k3b and MusicBrainz Picard. k3b can rip to FLAC, and it will only embed metadata present on the CD itself. After I rip a disc, I add it to Picard.
I use the "lookup CD" feature in Picard, which gives me a selection of releases to choose from. Among the choices, I usually see a release matching the catalog number on my CD's case. When I don't see a matching release, I will typically add the disc ID to an existing release, or create a new release, or sometimes even create a new release plus a new release group and add the necessary metadata to MusicBrainz.
I haven't tried any automatic tagging process like the one used by the ripping program in the article, mostly because I want to use Picard to make sure the metadata is correct, or contribute to MusicBrainz if it isn't.
I like MusicBrainz a lot because applications like Plex use it very well to group release groups together and will (usually) deduplicate identical recordings so that identical tracks can share a rating. It's a really great database and is kept up to date pretty well.
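For anyone curious what a "lookup CD" style query looks like outside Picard, here is a minimal sketch that only builds the URL for the public MusicBrainz web service's disc ID lookup. It does not claim to mirror Picard's internals; the function name, the choice of `inc` values, and the placeholder disc ID are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

MB_API_ROOT = "https://musicbrainz.org/ws/2"


def discid_lookup_url(disc_id, includes=("artists", "labels", "recordings")):
    """Build the web-service URL listing the releases attached to a disc ID."""
    # The API separates `inc` values with "+", which is exactly what
    # urlencode emits for a space, so the values are joined with spaces.
    query = urlencode({"inc": " ".join(includes), "fmt": "json"})
    return f"{MB_API_ROOT}/discid/{quote(disc_id, safe='')}?{query}"


# Hypothetical placeholder, not a real disc ID.
url = discid_lookup_url("PLACEHOLDER_DISC_ID")
print(url)
```

Fetching that URL (with a descriptive User-Agent, which MusicBrainz asks for) should return the candidate releases, which can then be compared against the catalog number on the case.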
This is a small point, but calling the 33-byte unit a sector in CDDA is a bit misleading and probably incorrect for the quantity being labeled. The 33-byte unit is a channel data frame: it contains 24 bytes of audio data, 1 byte of subcode data (except for the channel data frames that have sync symbols instead), and the rest is error correction. This is the smallest grouping of data in CDDA, but it's not really an individually addressable unit.
98 of these channel data frames make up a timecode frame which represents 1/75th of a second of audio and has 2352 audio data bytes, 96 subcode bytes (2 frames have sync codes instead) with the remainder being sync and error correction. Timecode frames are addressable (via the timecodes embedded in the subcode data) and are the unit referred to in the TOC. This is probably what's being called a sector here. Notably, a CD-ROM sector corresponds 1:1 with a timecode frame.
Note: Red book actually just confusingly calls both of these things frames and does not use the terms "channel data frame" or "timecode frame"
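The arithmetic above can be checked in a few lines. This is just a sketch using the figures from the comment plus the standard CD audio parameters (44.1 kHz, 16-bit, stereo), which the comment does not state explicitly; the constant and function names are made up for illustration.

```python
# Figures taken from the comment above.
CHANNEL_FRAME_BYTES = 33
AUDIO_BYTES_PER_CHANNEL_FRAME = 24
SUBCODE_BYTES_PER_CHANNEL_FRAME = 1
CHANNEL_FRAMES_PER_TIMECODE_FRAME = 98
SYNC_CHANNEL_FRAMES = 2  # carry sync symbols instead of subcode data
TIMECODE_FRAMES_PER_SECOND = 75

# Standard CD audio parameters (assumed, not stated in the comment).
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2

# Whatever is not audio or subcode in a channel data frame is error correction.
parity_bytes = (CHANNEL_FRAME_BYTES
                - AUDIO_BYTES_PER_CHANNEL_FRAME
                - SUBCODE_BYTES_PER_CHANNEL_FRAME)

audio_bytes_per_timecode_frame = (CHANNEL_FRAMES_PER_TIMECODE_FRAME
                                  * AUDIO_BYTES_PER_CHANNEL_FRAME)
subcode_bytes_per_timecode_frame = (CHANNEL_FRAMES_PER_TIMECODE_FRAME
                                    - SYNC_CHANNEL_FRAMES)
audio_bytes_per_second = audio_bytes_per_timecode_frame * TIMECODE_FRAMES_PER_SECOND


def msf_to_frames(minutes, seconds, frames):
    """Convert a minute:second:frame timecode to a count of timecode frames."""
    return (minutes * 60 + seconds) * TIMECODE_FRAMES_PER_SECOND + frames


print(parity_bytes)                      # 8
print(audio_bytes_per_timecode_frame)    # 2352
print(subcode_bytes_per_timecode_frame)  # 96
print(audio_bytes_per_second)            # 176400
print(msf_to_frames(0, 2, 0))            # 150
```

The 176400 bytes per second matches 44100 samples × 2 channels × 2 bytes, which is why one timecode frame holds exactly 1/75th of a second of audio.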
I used to do the MusicBrainz thing with Picard and later with Beets, but I got sick of Somebody Else's Metadata because of MusicBrainz's (former?) policy where everything must be Title Cased regardless of how it's presented on the CD sleeve. I prefer my tags to match the artist's choice, because I consider it a tonal indicator that helps set the mood for the work.
It seems like they might not enforce that any more, since the album I was going to pick on as an example is now tagged like I have it, although I also have lower-case “my bloody valentine” Artist tags on every track with a Title Cased “My Bloody Valentine” Album-Artist tag for browsing in Navidrome: https://musicbrainz.org/release/1e4c282b-8b0d-4d20-9f74-175f...
…but I already got out of the habit and will still just keep typing them out myself :)
Somewhat related: some conscious artistic choices, such as writing down two tracks but delivering them as one (not sure if this is what happened here), can’t really be transferred into databases.
I own a CD where one track name is a small icon depicting a heart stabbed with a rather lengthy knife. To my knowledge, this track has no canonical name. Any digital version of this CD betrays the respective author’s interpretation of the icon.
When I was building out infrastructure to support streaming at Sony Music Entertainment, it was well known that interns would input the metadata. Typos were rife and genres? Made up out of whole cloth.
It feels safe to assume that the situation has improved since then, but I seriously doubt we’ll ever be free of typos ;)
There are always going to be outliers, but I find MusicBrainz pretty useful. I note that a lot of CD-Text has poor application of title capitalization, and MB usually has it in a more rational form. My ripping system presents a choice when both are available and I usually pick MB. There's also the benefit that the MB database is Unicode, whereas CD-Text is in whatever encoding the authoring tool used, which is usually CP1252 but sometimes not.
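To illustrate the encoding point, here is a rough sketch of decoding raw CD-Text bytes when the encoding is not known up front. The function name and the candidate order are assumptions, not how any particular ripper does it: it tries strict UTF-8 first, then CP1252, and falls back to Latin-1, which accepts any byte.

```python
def decode_cdtext(raw, candidates=("utf-8", "cp1252")):
    """Decode CD-Text bytes by trying each candidate encoding strictly, in order."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this never fails,
    # although the result may still be wrong for exotic encodings.
    return raw.decode("latin-1")


print(decode_cdtext(b"Bj\xf6rk"))                    # CP1252 bytes
print(decode_cdtext("Sigur Rós".encode("utf-8")))    # UTF-8 bytes
```

A heuristic like this can still guess wrong, which is one more reason to prefer the MusicBrainz text when both are available.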
- rip the audio CD via EAC with acousticID (flac)
- retrieve metadata via beets in a script completely automated
- convert flac to mp3 via beets inplace convert (see below)
- backup the flac files to another location
- self-host navidrome and use the substreamer / dsub app and smart playlists to listen "on the go" (The Apple usb-c-to-audiojack adapter is pretty decent)
- transfer this via iTunes VM to my good old iPod Nano 7g as main listening device for audiobooks
"MusicBrainz is operated by the MetaBrainz Foundation, a California based 501(c)(3) tax-exempt non-profit corporation dedicated to keeping MusicBrainz free and open source." - the gloriously retro-looking front page
I use MusicBrainz and donate every month - yeah data is not perfect, but you can go and fix it yourself if needed, and the UI is extremely functional without any frills.
> Aside from some audio tracks and a table of contents over those tracks, very little extra information is included on a disk - you've pretty much only got the artist name, album name and track names actually burned into the disk.
Huh, I actually didn't think there was any metadata at all.
I am keeping an eye on this thread, as I plan to eventually rip my somewhat large collection, but would prefer to do it just the one time.
The author of Exact Audio Copy seems to have moved on to other interests, which is a shame because I was looking for something compatible with an autoloader. It looks like dBpoweramp is the only one left in that arena.
I am allllll about the metadata. Also, a thumbnail, synced lyrics if they could be found, custom metadata for hyperlinks back to entries on Discogs and MusicBrainz, perhaps some ReplayGain values in fields on the FLAC, depending on my MP3 processing case ... but I have so many unanswered questions.
Man, I thought this was going to be about a decoding tool that had some edge case incorrect, but instead it was just about incorrect entries in a database that was used in place of actually decoding...
I had always thought that duplicate disc IDs based on the TOC were unlikely, but it turns out that for discs with few tracks (≤4 or so), you can get duplicates quite easily.
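As a rough illustration of why TOC-derived IDs can collide, here is a sketch of the classic CDDB/freedb-style disc ID. This is not necessarily the scheme the article's ripper uses (MusicBrainz disc IDs hash the exact frame offsets instead), and the two example TOCs are invented. The point is that this ID only sees whole seconds, so a short TOC leaves very little to distinguish discs.

```python
FRAMES_PER_SECOND = 75


def _digit_sum(n):
    """Sum of the decimal digits of n."""
    return sum(int(digit) for digit in str(n))


def cddb_disc_id(track_offsets, leadout_offset):
    """CDDB-style ID from track start offsets and lead-out offset (in frames)."""
    start_seconds = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    # 8-bit checksum over the digits of each track's start time in seconds.
    checksum = sum(_digit_sum(s) for s in start_seconds) % 0xFF
    # Playing time in whole seconds, from first track start to lead-out.
    length = leadout_offset // FRAMES_PER_SECOND - start_seconds[0]
    return f"{(checksum << 24) | (length << 8) | len(track_offsets):08x}"


# Two invented single-track discs whose lead-outs differ by 30 frames (0.4 s).
disc_a = cddb_disc_id([150], 135_150)
disc_b = cddb_disc_id([150], 135_180)
print(disc_a, disc_b)  # 02070801 02070801
```

With many tracks, every extra start offset feeds the checksum, so two unrelated discs are much less likely to line up.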
Wait so back in the day I remember Winamp let you configure a CDDB thing and it connected to something called.. Gracenote? (Am I remembering that correctly?) iTunes desktop at some point used to handle this all for you and I assumed it was pulling from those sources under the hood. Where did MusicBrainz come from?
Huh, once I saw the image with the discrepancies I immediately assumed 'ah, "Nothing Coming Soon" must be in the pre-gap of "Don't Need a Reason", especially with that track length, and the rip combined that into one music file', but no, turns out it just isn't defined in the disc metadata at all. Wonder if that's a (mastering?) error, given that the TITLE metadata doesn't even include it.
Perhaps I should create an overlay for MusicBrainz with sub-minute lag called ZombieBrainz.
If you own a CD and send an edit with a $5 donation, it goes on volatile and nightly. It can go to beta instantly for $100 donations, and if not it'll have to be flagged for violations. If it needs to happen instantly on stable, $10000 (generous patron tier, where I will write a blog post for this entry as well), else I'll get to it in 3 months.
> Aside from some audio tracks and a table of contents over those tracks, very little extra information is included on a disk - you've pretty much only got the artist name, album name and track names actually burned into the disk.
Is that so? Was that introduced very late in the life of CDs, or why was CDDB a thing then?
You know, I've only ever ripped with iTunes or Music on a Mac, and I've never run into this over (at this point) decades and thousands of rips. Am I just lucky?
MusicBrainz and CDDB have become error-ridden enough that I've essentially stopped bothering with them and have switched back to just entering the information manually.
I've ripped hundreds of CDs and the metadata is usually ok on commercial discs. When ripping CDs I created from LP rips, I use Mp3tag to make it right.
Tangentially related: does anyone have a good recommendation for an external CD drive that works well with macOS and has a good form factor and build quality?
I have an ancient ThinkPad that I use a couple of times a year _just for reading CDs_ and have considered retiring it. But all the CD drives I see on Amazon look like disposable crap.
Why does my ripped CD have messed up track names? And why is one track missing?
(akpain.net) 168 points by surprisetalk 12 June 2025 | 152 comments
Comments
I also always include the catalog number in the Comment field and in brackets in my folder names to separate different releases of supposedly the same thing. A good example of why you would want to do this is the 2004 vs. the 2007 releases of MM..FOOD? where the last track (Kookies) had to be redone to remove the Sesame Street samples:
- 2004: https://www.youtube.com/watch?v=Ci_XcL4nYos
- 2007: https://www.youtube.com/watch?v=8iYSwvdEfeY
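The bracketed catalog number idea could be scripted roughly like this. It's only a sketch: the "Artist - Album [catalog number]" layout, the function name, and the catalog numbers are all placeholders (I'm not quoting the real catalog numbers of either release), and the character filter is just a conservative guess at what is unsafe in a folder name.

```python
import re

# Characters that are invalid or awkward in folder names on common filesystems.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def release_folder_name(artist, album, catalog_number):
    """Build 'Artist - Album [CATNO]' with unsafe characters replaced by '_'."""
    name = f"{artist} - {album} [{catalog_number}]"
    # Trailing dots and spaces are also stripped, since some systems reject them.
    return _UNSAFE.sub("_", name).rstrip(" .")


# Placeholder catalog numbers standing in for the 2004 and 2007 releases.
folder_2004 = release_folder_name("MF DOOM", "MM..FOOD?", "CAT-2004")
folder_2007 = release_folder_name("MF DOOM", "MM..FOOD?", "CAT-2007")
print(folder_2004)
print(folder_2007)
```

The two releases end up in separate folders even though artist and album are identical, which is the whole point of carrying the catalog number.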
Shout-out to https://covers.musichoarders.xyz/ and https://fanart.tv/ for high-quality album art to embed.
And then, of course, there’s „Love Symbol“: https://en.wikipedia.org/wiki/Prince_(musician)
Sony was a big supporter of it ~25 years ago.
My personal workflow:
If anyone is looking for fast and accurate ripping hardware, recently I updated my recommended hardware list including a linked tutorial for EAC: https://pilabor.com/blog/2022/10/audio-cd-ripping-hardware/
beets convert config:
Not all edits, just major ones (e.g. name changes). Minor edits usually get auto-accepted.
"MusicBrainz is operated by the MetaBrainz Foundation, a California based 501(c)(3) tax-exempt non-profit corporation dedicated to keeping MusicBrainz free and open source." - the gloriously retro-looking front page
So I've been contributing to tmdb for the last half year or so :)
It had scratches and even holes, but somehow it worked, lol.