While running some tests creating CD-ROM ISO images with ddrescue, I ended up with ISO images that were incomplete in some cases (last ~59 MB of image file missing), even though ddrescue’s log file didn’t report any errors. Below are the results I got from 4 attempts at imaging the same CD-ROM on the same PC (note that some of the ddrescue options I used are slightly different, but this appears to be unrelated to my issue). For this I used 2 different external DVD readers:
- Reader A - modern Samsung USB device;
- Reader B - old SATA (internal) device, refurbished to USB.
Attempt 1 - reader A
Command line:
ddrescue -b 2048 -r4 -v /dev/sr0 windows_98_upgrade_nl.iso windows_98_upgrade_nl.log
This resulted in a 601.7 MB ISO image. Here are the contents of the log file:
# Rescue Logfile. Created by GNU ddrescue version 1.17
# Command line: ddrescue -b 2048 -r4 -v /dev/sr0 windows_98_upgrade_nl.iso windows_98_upgrade_nl.log
# current_pos current_status
0x23DC0000 +
# pos size status
0x00000000 0x23DCB000 +
I.e. the log file indicates the CD was imaged without problems. MD5 checksum is 82603be06a8142aad1dfaa9e1279371f
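As an aside, the size ddrescue believes it has rescued can be read straight from the log. Below is a minimal Python sketch that sums the sizes of all blocks marked `+` (finished). It assumes the log layout shown above: comment lines starting with `#`, one status line, then `pos size status` lines. The function name `rescued_bytes` is made up for this illustration.

```python
def rescued_bytes(log_text):
    """Sum the sizes of all blocks marked '+' in a ddrescue log file."""
    total = 0
    seen_status_line = False
    for line in log_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        if not seen_status_line:
            # The first data line holds current_pos and current_status.
            seen_status_line = True
            continue
        fields = line.split()
        size, status = fields[1], fields[2]
        if status == "+":
            # Sizes are hexadecimal with a 0x prefix.
            total += int(size, 16)
    return total

# Hypothetical usage:
# with open("windows_98_upgrade_nl.log") as log:
#     print(rescued_bytes(log.read()))
```

For the log of attempt 1 this gives 601665536 bytes (0x23DCB000), which matches the 601.7 MB size of the image. So the log is consistent with the image, and neither matches the disc.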
Attempt 2 - reader B
Command line:
ddrescue -d -n -b 2048 /dev/sr0 windows_98_upgrade.iso windows_98_upgrade.log
Again this resulted in a 601.7 MB ISO image, again with no indication of read errors in the ddrescue log. MD5 checksum was (again) 82603be06a8142aad1dfaa9e1279371f.
Then by chance I discovered some text files in the image file weren’t readable, so I did a third try, now again with reader A.
Attempt 3 - reader A
Command line:
ddrescue -d -n -b 2048 /dev/sr0 windows_98_upgrade_test.iso windows_98_upgrade_test.log
This resulted in a 660.9 MB ISO file. Again no errors in the ddrescue log; MD5 checksum is 24f0f746d0817121253c6b1242d4246e. After mounting the image, the text files that were problematic in the earlier images were readable as normal.
Attempt 4 - reader B
Command line:
ddrescue -d -n -b 2048 /dev/sr0 windows_98_upgrade_refurbished_onemoretry.iso windows_98_upgrade_refurbished_onemoretry.log
Result was identical to result of attempt 3!
So summarising, 2 runs of ddrescue (using 2 different USB readers) resulted in exactly the same error, whereas the remaining 2 runs (again using 2 different readers) completed fine. So what’s going on here!?
Md5sum directly on physical CD
As a first step I computed the MD5 checksum directly on the physical disc, using:
md5sum /dev/sr0
I repeated this 4 times, using both readers A and B, plugging them into different USB slots. In each case the result was 24f0f746d0817121253c6b1242d4246e, which is identical to the hash I got for the ISO in attempts 3 and 4 (confirming these images are correct).
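The same comparison could be scripted. The sketch below is an illustration only, not something I ran as part of these tests. It hashes a device node or an image file in chunks, so the whole disc never has to fit in memory. The names `md5_of` and `same_content` are hypothetical, and reading `/dev/sr0` assumes a Linux system and sufficient permissions.

```python
import hashlib

def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file or device, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # read() returns b"" at the end of the file or device.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(source, image):
    """True if both paths hash to the same MD5 digest."""
    return md5_of(source) == md5_of(image)

# Hypothetical usage:
# same_content("/dev/sr0", "windows_98_upgrade_test.iso")
```

Note that this reads the whole disc a second time, so it is as slow as the imaging run itself.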
Comparison of ISO images in hex editor
I also did a comparison of the intact and faulty ISOs in a hex editor. This revealed that in the faulty images a block of about 59 MB of data is missing at the end of the file. I double checked this by copying the block of missing data to a separate file (missingblock.dat), after which I appended it to one of the faulty files using:
cat windows_98_upgrade_nl.iso missingblock.dat > isorepaired.iso
Then check:
md5sum isorepaired.iso
Result:
24f0f746d0817121253c6b1242d4246e
Which corresponds to the value of the intact image.
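The hex editor comparison can be approximated in code as well. The following Python sketch checks whether a short image is an exact prefix of the intact one, and if so reports how many bytes are missing at the end. The function name `missing_tail` is made up for this example.

```python
import os

def missing_tail(short_path, full_path, chunk_size=1024 * 1024):
    """Return the number of bytes missing from the end of short_path
    if it is an exact prefix of full_path, otherwise None."""
    short_size = os.path.getsize(short_path)
    full_size = os.path.getsize(full_path)
    if short_size > full_size:
        return None
    with open(short_path, "rb") as short, open(full_path, "rb") as full:
        remaining = short_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            # Any difference within the shared range means this is not
            # a simple truncation.
            if short.read(n) != full.read(n):
                return None
            remaining -= n
    return full_size - short_size

# Hypothetical usage:
# missing_tail("windows_98_upgrade_nl.iso", "windows_98_upgrade_test.iso")
```

A return value of `None` would mean the images differ somewhere before the end of the shorter file, which would point to a different kind of problem than the one described here.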
But why is this happening in the first place?!
The really important question is why this is happening in the first place, and whether there is any way to avoid it. The thread below on the ddrescue mailing list describes a somewhat similar (but not quite the same) issue:
https://lists.gnu.org/archive/html/bug-ddrescue/2014-02/msg00003.html
Note the following quote from the response by ddrescue’s main author. He suggests that the problem might be an issue with a USB port, adding:
Ddrescue can’t know if the data are really good or if the hardware is lying about it.
If correct, this would apply to other imaging tools as well. Based on my results, I’m curious if other people may have run into similar issues. More importantly: how does one even detect errors like these? Of course it is always possible to run a checksum on the physical medium and then compare it to the ISO checksum, but this takes ages. A more quick-and-dirty approach would be to compare the size of each created image against the size of the input medium. E.g. to get the size of a CD-ROM I can use something like this:
lsblk /dev/sr0 -n -b
Result:
sr0 11:0 1 660850688 0 rom
(Fourth column is size of CD in bytes).
To get the size of the ISO image:
du -b windows_98_upgrade_refurbished_onemoretry.iso
Result:
660850688 windows_98_upgrade_refurbished_onemoretry.iso
This does not guarantee the image is correct, but it will detect missing blocks of data.
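The size check is easy to automate. The sketch below is one possible way to do it in Python, under the assumption that seeking to the end of a block device returns its size in bytes (which is how Linux behaves). The function names are hypothetical.

```python
import os

def size_in_bytes(path):
    """Size of a regular file or block device, found by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

def image_is_complete(source, image):
    """True if the image has the same size as the source medium.
    This detects missing blocks, not corrupted ones."""
    return size_in_bytes(source) == size_in_bytes(image)

# Hypothetical usage, right after ddrescue finishes:
# image_is_complete("/dev/sr0", "windows_98_upgrade_refurbished_onemoretry.iso")
```

Unlike a checksum on the physical disc, this takes almost no time, so it could be run after every imaging job.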
I also ran some cursory checks with isovfy and isoinfo, but the output of those tools turned out to be identical for both faulty and intact images, so they’re probably not very helpful for this sort of error.
I’m curious how other people/memory institutions are dealing with this. Any thoughts / suggestions are welcome!
Addition
On Twitter Alexander Duryee rightly pointed out that a CD-ROM's Primary Volume Descriptor contains a field with the size of the disc (this is also where lsblk gets this value). So one would assume that ddrescue would check against this number. Apparently it doesn't do this, so I think I'll report this as a bug. (Note that such a check doesn't guarantee the copied data are identical to the source disc.)
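To illustrate the idea, here is a rough Python sketch that reads the declared volume size from an image's Primary Volume Descriptor and compares it with the actual file size. It relies on the ISO 9660 layout: the descriptor sits at sector 16, the little-endian copy of the volume space size (in logical blocks) is at byte 80, and the logical block size is at byte 128. The function names are made up. It also assumes the declared size equals the readable size of the disc, which may not hold for e.g. multisession or mixed-mode discs.

```python
import os
import struct

SECTOR_SIZE = 2048
PVD_OFFSET = 16 * SECTOR_SIZE  # descriptors start after the 16-sector system area

def pvd_declared_size(path):
    """Return the volume size in bytes as declared in the ISO 9660 PVD."""
    with open(path, "rb") as stream:
        stream.seek(PVD_OFFSET)
        pvd = stream.read(SECTOR_SIZE)
    # Type code 1 plus the standard identifier "CD001" mark a PVD.
    if len(pvd) < SECTOR_SIZE or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no ISO 9660 Primary Volume Descriptor found")
    (block_count,) = struct.unpack_from("<I", pvd, 80)
    (block_size,) = struct.unpack_from("<H", pvd, 128)
    return block_count * block_size

def image_matches_pvd(path):
    """True if the image is at least as large as its PVD says it should be."""
    return os.path.getsize(path) >= pvd_declared_size(path)

# Hypothetical usage:
# image_matches_pvd("windows_98_upgrade_nl.iso")
```

The check uses `>=` rather than `==` because some images carry extra padding after the declared volume; that choice is a guess on my part and may need tuning for real collections.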