Ask HN: Recovering an audio disk ripped in an unknown format

matja · on Dec 29, 2018

A quick look in a hex editor shows a familiar little-endian 16-bit PCM format: 2352-byte chunks of audio data, each followed by 96 bytes of non-PCM data. This is characteristic of raw CD digital audio (CD-DA) sectors stored with sub-channel data. ("Mode 2" is a CD-ROM data sector layout and does not apply to audio sectors.)

I think .MDF files started with the Alcohol 120% program and are typically the raw data from a CD/DVD, with the track information in a corresponding .MDS file.

Not sure of a free/open program to directly convert to a familiar format, so I made this C program:

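  #include <unistd.h>
  #define BLOCK_SIZE 2448  /* 2352 audio bytes + 96 subchannel bytes */
  #define DATA_SIZE 2352
  int main(void) {
    char buf[BLOCK_SIZE];
    for (;;) {
      /* read() may return fewer bytes than asked for, so fill a whole block */
      size_t have = 0;
      while (have < BLOCK_SIZE) {
        ssize_t n = read(0, buf + have, BLOCK_SIZE - have);
        if (n <= 0) break;
        have += (size_t)n;
      }
      if (have < BLOCK_SIZE) break;  /* EOF, error, or truncated last block */
      /* keep the audio, drop the trailing subchannel bytes */
      if (write(1, buf, DATA_SIZE) != DATA_SIZE) return 1;
    }
    return 0;
  }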

Compile it and convert to raw CDDA audio with:

  cc -o convert convert.c
  ./convert < "Audio Disk.mdf" > out.cdda

Then use 'sox' (an open source audio processing program) to convert the CDDA to .wav:

  sox -c 2 -b 16 -r 44100 --endian little out.cdda out.wav
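If sox isn't at hand, this step can be approximated by hand, since the samples are already little-endian PCM: a WAV file is roughly a 44-byte header followed by the same bytes. A minimal sketch, assuming the canonical 44-byte PCM header layout and CD audio parameters; `build_wav_header` is a hypothetical helper name, not part of sox or of the program above:

```c
#include <stdint.h>
#include <string.h>

/* Store a 16-bit or 32-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = v & 0xff;
    p[1] = v >> 8;
}
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/* Fill hdr with a canonical 44-byte PCM WAV header for CD audio
   (2 channels, 16-bit, 44100 Hz) describing data_bytes of samples. */
void build_wav_header(uint8_t hdr[44], uint32_t data_bytes) {
    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);  /* RIFF chunk size */
    memcpy(hdr + 8, "WAVEfmt ", 8);
    put_le32(hdr + 16, 16);              /* fmt chunk size */
    put_le16(hdr + 20, 1);               /* format tag 1 = PCM */
    put_le16(hdr + 22, 2);               /* channels */
    put_le32(hdr + 24, 44100);           /* sample rate */
    put_le32(hdr + 28, 44100 * 2 * 2);   /* bytes per second */
    put_le16(hdr + 32, 4);               /* bytes per sample frame */
    put_le16(hdr + 34, 16);              /* bits per sample */
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);      /* size of the PCM payload */
}
```

Writing these 44 bytes and then the contents of out.cdda should give a playable .wav, provided the file is under 4 GiB (which any CD image is).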

I'm not sure if the track length information is recoverable.

Update: mdf2iso also seems to work fine for converting the .mdf to raw .cdda format: mdf2iso "Audio Disk.mdf" out.cdda

matja · on Dec 30, 2018

On further thought, the subcode 'P' channel can be used to identify the start of each track, and I verified that it is present in the .mdf file. However, I didn't find a way to convince mdf2iso to output a TOC/CUE, or to convince cdda2wav or cdparanoia to read from a file rather than a CD drive. So I made a utility to split a CDDA .mdf file into WAV files: https://github.com/matja/mdf2wav
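A rough sketch of the P-channel idea, not taken from mdf2wav: it assumes the 96 subchannel bytes are stored in raw interleaved form, where bit 7 of every byte belongs to P, and that P is 1 during the pause before a track and 0 while the track plays. Both function names are hypothetical, and the layout should be checked against the actual file before relying on it.

```c
#include <stdint.h>

#define SUB_SIZE 96

/* Return 1 if the P flag is set for this sector, 0 otherwise.
   ASSUMPTION: sub[] is raw interleaved subchannel data with P in bit 7.
   A majority vote tolerates a few corrupted bytes. */
int sector_p_flag(const uint8_t sub[SUB_SIZE]) {
    int ones = 0;
    for (int i = 0; i < SUB_SIZE; i++)
        ones += (sub[i] >> 7) & 1;
    return ones > SUB_SIZE / 2;
}

/* A track is taken to start where P drops from 1 (pause) to 0 (audio). */
int is_track_start(int prev_p, int cur_p) {
    return prev_p == 1 && cur_p == 0;
}
```

Walking the image in 2448-byte steps and calling `sector_p_flag` on the last 96 bytes of each block would then give candidate split points.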

The .mdf file probably also contains the subcode 'Q' channel, which can be used to identify the CD and get the track names using CDDB (cdda2wav/cdparanoia can do this, but apparently only from a physical CD drive).
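For illustration, reading the Q channel could look something like the sketch below. It assumes the same raw interleaved layout (Q in bit 6 of each subchannel byte, most significant bit first) and the usual position-frame format, where the low nibble of the first Q byte is the ADR field and the second byte is the track number in BCD. The function names are hypothetical, the CRC in the last two Q bytes is not checked, and the lead-out marker (0xAA) is not handled.

```c
#include <stdint.h>
#include <string.h>

/* Gather bit 6 (Q) of each of the 96 subchannel bytes into 12 bytes,
   most significant bit first.
   ASSUMPTION: raw interleaved subchannel layout. */
void extract_q(const uint8_t sub[96], uint8_t q[12]) {
    memset(q, 0, 12);
    for (int i = 0; i < 96; i++)
        if (sub[i] & 0x40)
            q[i / 8] |= (uint8_t)(0x80 >> (i % 8));
}

/* Convert one BCD byte (e.g. 0x12) to its decimal value (12). */
int bcd_to_int(uint8_t b) {
    return (b >> 4) * 10 + (b & 0x0f);
}

/* For a position frame (ADR == 1), q[1] holds the BCD track number.
   Returns -1 for other frame types. */
int q_track_number(const uint8_t q[12]) {
    if ((q[0] & 0x0f) != 1)
        return -1;
    return bcd_to_int(q[1]);
}
```

A CDDB lookup needs the start offset of every track, so these per-sector values would still have to be collected into a table of contents first.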

gus_massa · on Dec 30, 2018

You are just reading blocks of 2448 bytes and writing blocks of 2352 bytes. Does this just remove some kind of "footer"/crc/whatever?

matja · on Dec 30, 2018

Yes, in the .mdf format there is usually subchannel information after every sector. For Red Book (CDDA) audio, that is 2352 bytes of audio data followed by 96 bytes of subchannel data, so this is just skipping the subchannel data.
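As a sanity check on those numbers: 2352 bytes is exactly 1/75 of a second of 44.1 kHz 16-bit stereo (44100 * 4 / 75 = 2352), so the file size alone gives the playing time. A small sketch, assuming every sector in the image occupies 2448 bytes; `msf_t` and `image_duration` are names made up for this example:

```c
#include <stdint.h>

#define BLOCK_SIZE 2448        /* 2352 audio + 96 subchannel bytes */
#define SECTORS_PER_SECOND 75  /* CD audio plays 75 sectors per second */

typedef struct {
    unsigned min, sec, frame;
} msf_t;

/* Convert an image size in bytes to a playing time, assuming every
   sector occupies BLOCK_SIZE bytes. Any trailing partial block is ignored. */
msf_t image_duration(uint64_t file_bytes) {
    uint64_t sectors = file_bytes / BLOCK_SIZE;
    msf_t t;
    t.frame = (unsigned)(sectors % SECTORS_PER_SECOND);
    t.sec   = (unsigned)(sectors / SECTORS_PER_SECOND % 60);
    t.min   = (unsigned)(sectors / SECTORS_PER_SECOND / 60);
    return t;
}
```

If the result is wildly longer than a CD can hold, or the file size is not a multiple of 2448, the 2448-byte assumption is probably wrong for that file.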

gus_massa · on Dec 29, 2018

Most formats include some kind of header. Try to look at the first 8-16 bytes of the file, and hope that Google can find a site about them.

ohnoesjmr · on Dec 29, 2018

Yeah, first few hundred bytes are all zeroes.

gus_massa · on Dec 29, 2018

First 8 non zero bytes?

Last 8 bytes? (IIRC the .zip file has the "header" at the end. I doubt this is a .zip file, but a small number of formats have the header at the end.)

Have you tried listening to the file as a raw wave? (perhaps with 1/2/4 bytes per sample and 1/2 channels)
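The header checks suggested here are easy to script. A minimal sketch, assuming the start (or end) of the file has already been read into a buffer; both helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first non-zero byte in buf,
   or len if every byte is zero. */
size_t first_nonzero(const uint8_t *buf, size_t len) {
    size_t i = 0;
    while (i < len && buf[i] == 0)
        i++;
    return i;
}

/* Return the offset at which the last n bytes of a len-byte buffer begin
   (0 if the buffer is shorter than n). */
size_t tail_offset(size_t len, size_t n) {
    return len > n ? len - n : 0;
}
```

Printing the 8 bytes at `first_nonzero(...)` and at `tail_offset(len, 8)` in hex gives something concrete to search for.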