
I'd love a new solution that wasn't "break the CD data into pieces."

I've never looked inside a CUE file, but it's just text and I don't think it supports metadata, right?

We need like a new CUE file to go with the FLAC, right?

p.s. https://news.ycombinator.com/item?id=40923646



Ideally, I think I'd want a singular container (of whatever sort) that has the album's audio, the music-related timing metadata (as applicable), and whatever other metadata may be appropriate (lyrics? liner note graphics? music videos? sure!).

The audio should be able to be FLAC. But it should also be able to be anything else, like Vorbis or MP3 or AAC or IDK. It needs to be able to be played continuously without aberration (which can't actually be done with a group of MP3 streams).

The audio needs to be able to be seekable, like a CD is also seekable. By track. By index. (With pregaps, where appropriate -- because CDs also have pregaps.)
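To make the addressing concrete: a Red Book CD is addressed in minutes:seconds:frames at 75 frames per second, and each frame holds 588 stereo samples at 44.1 kHz. Here is a rough Python sketch of the conversion a player would need in order to seek into one continuous audio stream by track or index position. The function names are mine and purely illustrative, not from any existing tool.

```python
FRAMES_PER_SECOND = 75            # Red Book: 75 frames (sectors) per second
SAMPLES_PER_FRAME = 44100 // 75   # 588 stereo samples per frame at 44.1 kHz


def msf_to_frames(msf: str) -> int:
    """Convert an 'MM:SS:FF' position into an absolute frame count."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    if not (0 <= seconds < 60 and 0 <= frames < FRAMES_PER_SECOND):
        raise ValueError(f"invalid MSF position: {msf!r}")
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def msf_to_sample(msf: str) -> int:
    """Convert an 'MM:SS:FF' position into a sample offset in the audio stream."""
    return msf_to_frames(msf) * SAMPLES_PER_FRAME
```

With something like this, a pregap is just the span between a track's index 00 position and its index 01 position, both of which map to exact sample offsets in the single stream.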

Other potential metadata must be able to include whatever subcode data is involved in things like CD+G[0] and CD-Text, and the audio must stay bit-exact so that in-band schemes like HDCD survive, since all of those are supersets of the regular datastream and playback remains compatible with any CD player.

And it needs to be a singular container file because...well, that's just easier to keep track of as the years go by and data migrates.

Only then will we have the beginning of a valid archive format for audio CDs as they actually still exist on [some] store shelves today.

(Some stuff can be optional, just as lots of things are optional inside of an MKV container for a film.)

[0]: Almost nobody ever used this outside of the 1990s karaoke world, but Information Society's self-titled album includes an illustrated sequence, with lyrics, that is completely implemented in CD+G and that runs for the entire length of the album. And I should be able to render that locally here in 2024 from a container on my pocket supercomputer instead of watching a bad rip from a Sega Genesis: https://www.youtube.com/watch?v=b89sSa8QlLg


You mentioned MKV - Matroska (MKA for audio, MKV for video) could honestly work quite well for this situation with just a little extra standardization.

Audio codecs: use a single stream of whatever codec you'd like. FLAC/Vorbis/MP3/AAC/Opus/etc. can all go into Matroska.

Seekable: Use chapters for tracks, and nested chapters for indices. Matroska documentation even gives an example of using ChapterPhysicalEquiv 20 for CD tracks and ChapterPhysicalEquiv 10 for CD indices.
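As a rough illustration of the nested-chapter idea, here is a Python sketch that builds a chapter tree with tracks (ChapterPhysicalEquiv 20) containing indices (ChapterPhysicalEquiv 10). The element names follow the Matroska chapter element names; the disc layout is made up, the helper names are mine, and whether a given muxer accepts this exact XML unchanged is an assumption I haven't tested.

```python
import xml.etree.ElementTree as ET

TRACK, INDEX = 20, 10  # ChapterPhysicalEquiv values mentioned above


def frames_to_timestamp(frames: int) -> str:
    """Format a CD frame count (75 per second) as HH:MM:SS.nnnnnnnnn."""
    total_ns = frames * 1_000_000_000 // 75  # truncates to whole nanoseconds
    seconds, ns = divmod(total_ns, 1_000_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ns:09d}"


def add_chapter(parent, title, start_frames, equiv):
    """Append one ChapterAtom under parent and return it."""
    atom = ET.SubElement(parent, "ChapterAtom")
    ET.SubElement(atom, "ChapterTimeStart").text = frames_to_timestamp(start_frames)
    ET.SubElement(atom, "ChapterPhysicalEquiv").text = str(equiv)
    display = ET.SubElement(atom, "ChapterDisplay")
    ET.SubElement(display, "ChapterString").text = title
    return atom


def build_chapters(tracks):
    """tracks: list of (title, [index start frames]), index 01 first."""
    root = ET.Element("Chapters")
    edition = ET.SubElement(root, "EditionEntry")
    for title, index_starts in tracks:
        track_atom = add_chapter(edition, title, index_starts[0], TRACK)
        for number, start in enumerate(index_starts, start=1):
            add_chapter(track_atom, f"Index {number:02d}", start, INDEX)
    return root


# Made-up example layout: track 1 has two indices, track 2 has one.
chapters = build_chapters([
    ("Track 1", [0, 4500]),
    ("Track 2", [15000]),
])
xml_text = ET.tostring(chapters, encoding="unicode")
```

The "little extra standardization" would mostly be agreeing on conventions like these, so that players know a level-20 chapter means "next track" and a level-10 chapter means "next index".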

Other metadata can be muxed into the stream as well.

Lyrics can be included as text in metadata (lyric tag) or as a subtitle stream.

Liner note graphics (and basically anything else) can be included as embedded files.

Music videos can be video streams in the Matroska file.


I'm glad to see this mentioned. This was the first thought I had as I progressed through this thread. I'm surprised this isn't a popular, supported standard already.


Nested chapters can work for index markers, especially if a player supports them right.

I mean: As mentioned, these have almost never been usable with real CD players in the wild. Maybe not much is lost there. (But the format must still accept these things, and allow them to be usable! An archival format must respect all aspects of the item being archived, including those that are unpopular or disused. I am willing to die on this hill.)

What of things like CD+G? Here in 2024, they're very simple graphics using 35+-year-old tech, and they should be archived neatly, precisely, and without interpretation, to be rendered client-side at a later point. I think I've mentioned it, but we literally have pocket supercomputers in common use today. If we can make the complexities of MAME work for the past couple of decades, and do it with direct ROM dumps, we can do this for CD+G.

But the CD+G must be rendered synchronously with CD audio data on playback. This applies whether it is my Goldilocks example of an Information Society album, or whether it is a CD+G karaoke disc with Garbage's "Only Happy When It Rains" (and twelve other crowd pleasers from that month of 1995).
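The timing relationship itself is at least simple on paper. Each CD sector carries 588 stereo samples of audio plus 96 bytes of subcode, and CD+G splits those 96 bytes into four 24-byte packets, so there are 300 packets per second locked to the audio. Here is a rough Python sketch of that mapping. It assumes the subcode has already been de-interleaved into plain 24-byte packets (as in the common .cdg file layout), and the function names are mine.

```python
SECTORS_PER_SECOND = 75
SAMPLES_PER_SECTOR = 44100 // SECTORS_PER_SECOND   # 588 stereo samples
PACKET_SIZE = 24                                   # bytes per CD+G packet
PACKETS_PER_SECTOR = 96 // PACKET_SIZE             # 4 packets per sector


def split_packets(subcode: bytes) -> list:
    """Split a de-interleaved subcode stream into 24-byte packets."""
    if len(subcode) % PACKET_SIZE:
        raise ValueError("subcode length is not a multiple of 24 bytes")
    return [subcode[i:i + PACKET_SIZE] for i in range(0, len(subcode), PACKET_SIZE)]


def packet_to_sample(packet_index: int) -> int:
    """First audio sample of the sector that carries this packet."""
    sector = packet_index // PACKETS_PER_SECTOR
    return sector * SAMPLES_PER_SECTOR


def packets_due(sample_position: int) -> int:
    """Number of packets belonging to sectors that have fully played."""
    sectors_played = sample_position // SAMPLES_PER_SECTOR
    return sectors_played * PACKETS_PER_SECTOR
```

So a renderer only needs the current sample position to know how far into the packet stream it should be; the open question is where a container keeps those packets.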

How will that work with MKA?

And how will pregaps work?

(Maybe MKA isn't an ideal container if it does not already include avenues that lead to this kind of functionality in ways that are compatible with the original article.)


Interesting point about CD+G. I think whatever format is used needs to take this into account.

There were also a ton of audio CDs that were not CD+G but had a data track with the music video, etc., on them.

I worked on a horrible one for Sony, one of those ones with all the anti-rip protection on it, where I was tasked to build a binary blob for a web site that detected if the specific audio CD was in your drive and let you into the web site. What were those things called, ActiveX?


Sony had plenty of awful stuff at different times for audio CDs, despite being a co-developer of this wildly-successful and long-lasting format.

I think you're referring to ActiveX, yes. It's the only thing I can think of where "web" stuff and "hardware" stuff commingled back then in a semi-transparent way.

And famously, as you surely know yourself, Sony once published some rootkitted audio CDs: https://en.wikipedia.org/wiki/List_of_compact_discs_sold_wit...

Anyway, I'll just assume that you aren't the rootkit guy -- or even if you are, that your heart is in the right place.

---

And yes, CD+G is important. As are the mixed-mode releases with video. All things CD audio are important if we are to talk about an ideal archival (and playback!) format for audio CDs, and archiving an audio CD is not always quite as simple as ripping a folder of FLACs -- there's a ton of diversity here that FLAC (and cue) can't accurately embody.

We're fortunate that we still have so many CDs right now, and that they're still being sold today. This will change. (It must change. It can't not change.)

The good folks working on the Domesday Duplicator have a relatively uphill battle for the often-older (and often rotting) LaserDisc media that they're working on tools to properly preserve.

It would be good to get ahead of the curve and get something with a practical workflow working sooner instead of later.


You are almost describing the MAME CHD format. MAME has the same problem: the object (hard drive, CD, DVD, etc.) must be in one file. CHD can also store differences (so an image is writable in some cases), and it is compressed, as compressed hunks of data. They need the subchannel data too, because some systems do interesting things with it; some even hide their encryption format in the SBI fields. The CHD format is more like a container that acts like whatever media it was, depending on what system it is hooked up to. The downside is that there is no concept of 'metadata' for finding different things in a CHD. It is up to the system it is hooked up to to interpret what that data stream is.


This could be a good avenue. It might be possible to extend the CHD format in a backward-compatible way, or it might even be as simple as bodging all the extra data onto the end of the file and hoping it is ignored by other readers. This is an avenue worth exploring, thank you.


There is a way to extend the format, as it has a version number. It does have some metadata fields (drive geometry, compression formats, version of MAME it was created with, etc.). The trick would be getting the MAME team to accept the changes. I would guess they would not be too happy with just dumping it on the end of the file.

There are a number of requests out there to extend the CHD format and fix a few things. They are currently tracking some of this info in XML files (called hash files). They would be down with more accurate information though. So you would need a proposal that adds more accurate information and gives them something to work with. Getting all the info that something like the redump project tracks in there would make them very happy.

There is a separate project that some other emus use, libchdr, which is a soft fork of MAME. I think they are trying to track closely to what the MAME group is doing but let other emus use it.


Can’t an MP4 container do most or all of that already? (Pregaps would probably need to become a full-fledged chapter in their own right, with the current spec.)


Cue is a bodge that should never have become a de facto standard. Andreas Mueller's cdrdao tool has its own TOC format that faithfully captures everything, including index marks, various flags, and multilingual CD-Text, but it was ignored by everything else in the heyday of the ripping era. Nowadays we'd be better off with a standard YAML/JSON format that duplicates what cdrdao provides.
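To give a feel for what such a JSON format might look like, here is a rough Python sketch that models the items listed above (pregaps, index marks, flags, multilingual CD-Text) and serializes them. Every field and flag name here is hypothetical and made up for illustration; none of it is taken from cdrdao's actual TOC syntax or from any existing standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Track:
    number: int
    start: str                                    # index 01 position, MM:SS:FF
    pregap: str = "00:00:00"                      # length of index 00, MM:SS:FF
    indices: list = field(default_factory=list)   # extra index marks (02 and up)
    flags: list = field(default_factory=list)     # hypothetical flag names
    cd_text: dict = field(default_factory=dict)   # language -> {field: value}


@dataclass
class Disc:
    tracks: list = field(default_factory=list)
    cd_text: dict = field(default_factory=dict)   # disc-level CD-Text


def to_json(disc: Disc) -> str:
    """Serialize the whole disc description as readable JSON."""
    return json.dumps(asdict(disc), indent=2, ensure_ascii=False)


# Made-up example disc.
disc = Disc(
    cd_text={"en": {"title": "Example Album"}},
    tracks=[
        Track(number=1, start="00:00:00",
              cd_text={"en": {"title": "First Song"}}),
        Track(number=2, start="03:25:40", pregap="00:02:00",
              indices=["05:10:00"], flags=["pre_emphasis"]),
    ],
)
text = to_json(disc)
```

The point is only that everything a TOC carries fits comfortably in a structure any language can parse, which is more than can be said for cue.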



