Probably far too late to be of any help, but...
The surface of a floppy is just a bunch of oriented magnetic regions. A 1 bit is flagged anywhere that a region with magnetism running in one direction transitions into a region with magnetism pointing in the other. Different drives run at slightly different speeds, and there's an aerodynamics factor, so the controller uses a phase-locked loop to try to space the 1s properly amongst the 0s.
The disk needs to be able to store an arbitrary number of 0s in a row because it needs to be able to store an arbitrary data stream, but you can't just leave an arbitrary distance between flux inversions and expect the correct number of 0s to be picked up. A phase-locked loop with no input over a surface with a slightly variable rotation rate is likely to lose synchronisation. So an encoding scheme needs to be derived that places flux inversions sufficiently regularly to keep the PLL synchronised.
Single density/FM was the most naive scheme possible — insert a 1 as a clock bit between every data bit. So there's never more than one 0 in a row. But a disk is an analogue recording surface so if you try to place flux transitions too close together, they'll bleed into each other. So spending so much of the available bandwidth on clock bits gives a relatively low data rate.
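As a rough illustration of the FM rule, here is a minimal Python sketch. The names `fm_encode` and `longest_zero_run` are made up for this example, and the bit list is just sample data.

```python
def fm_encode(data_bits):
    """FM / single density: a 1 clock bit goes in front of every data bit."""
    cells = []
    for bit in data_bits:
        cells.append(1)    # clock bit, always present
        cells.append(bit)  # data bit
    return cells


def longest_zero_run(cells):
    """Length of the longest run of consecutive 0 cells."""
    longest = run = 0
    for cell in cells:
        run = run + 1 if cell == 0 else 0
        longest = max(longest, run)
    return longest


encoded = fm_encode([0, 0, 0, 0, 1, 0, 1, 1])
print("".join(str(cell) for cell in encoded))  # 1010101011101111
print(longest_zero_run(encoded))               # 1
```

Half of the cells are clocks, which is where the bandwidth goes.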
MFM is the modified version. It's just a smarter solution to the problem of keeping the PLL synchronised. Take the same size of bit windows as FM. Keep the same guarantee that 1s are never more than two windows apart. But increase the underlying clock rate so that you can determine whether the 1 is in the left half or the right half of the window. You can achieve that with a simple rule: write and read at double the rate, but insert a 1 clock bit only if both the bit on the left and the bit on the right are 0. If either is 1 then there's already something there for the PLL.
Further complications are that (i) different sectors may have been written by different drives, so may be at different data rates; and (ii) the read and write heads in a drive are placed slightly apart and the physical density of bits is constant by rotation, so different by track, implying that the two heads are a different number of bit windows apart depending on the track. All together, that means sector contents may be spliced into the existing stream with an abrupt change in phase and slightly aside from where the previous sector lay.
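The MFM clock rule can be sketched in a few lines of Python. This is only an illustration: `mfm_encode` and `cells_to_text` are names invented here, and `previous_bit` is an assumed parameter standing in for whatever data bit came before the stream.

```python
def mfm_encode(data_bits, previous_bit=0):
    """Encode data bits as MFM cells: one clock cell, then one data cell, per bit."""
    cells = []
    for bit in data_bits:
        # The clock is 1 only when the data bits on both sides of it are 0.
        cells.append(1 if previous_bit == 0 and bit == 0 else 0)
        cells.append(bit)
        previous_bit = bit
    return cells


def cells_to_text(cells):
    return "".join(str(cell) for cell in cells)


print(cells_to_text(mfm_encode([0, 0, 0, 0])))  # 10101010
print(cells_to_text(mfm_encode([1, 0, 1, 1])))  # 01000101
print(cells_to_text(mfm_encode([1, 0, 0, 1])))  # 01001001
```

At the doubled cell rate the output never has two 1s next to each other and never has more than three 0s in a row, so the PLL always has a transition reasonably nearby.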
IBM resolved this in two ways:
- leave a large gap between sectors to ensure that even a track formatted on the fastest drive and then modified on the slowest doesn't lose data;
- start each piece of independently written data with a sequence of 00 bytes — which encode as 01010101, priming the PLL — and then a synchronisation word, for proper framing of the bytes that follow.
They defined two different synchronisation words, both of which are intended to be illegal byte encodings. $4489 is used for most purposes, being $A1 with the clock bit that should be at $20 missing; a correctly encoded $A1 would be $44A9. Missing a single clock is unlikely to be enough to throw off a PLL. There's also $5224, which is used only for the index mark that follows the index hole. That's $C2 with the clock bit at $80 omitted; a correctly encoded $C2 would be $52A4.
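To check those two values, here is a small Python sketch that encodes a byte legally and then knocks out one clock cell. `mfm_word` and `omit_clock` are hypothetical helper names, and `previous_bit` is an assumed parameter for the data bit before the byte (it makes no difference here, since both bytes start with a 1).

```python
def mfm_word(byte, previous_bit=0):
    """Return the 16 MFM cells for one byte, clock cell first, as an integer."""
    word = 0
    for shift in range(7, -1, -1):
        bit = (byte >> shift) & 1
        clock = 1 if previous_bit == 0 and bit == 0 else 0
        word = (word << 2) | (clock << 1) | bit
        previous_bit = bit
    return word


def omit_clock(word, mask):
    """Clear one clock cell that is set in the legal encoding."""
    assert word & mask, "that clock cell is not set in the legal encoding"
    return word & ~mask


legal_a1 = mfm_word(0xA1)
legal_c2 = mfm_word(0xC2)
print(f"${legal_a1:04X} -> ${omit_clock(legal_a1, 0x0020):04X}")  # $44A9 -> $4489
print(f"${legal_c2:04X} -> ${omit_clock(legal_c2, 0x0080):04X}")  # $52A4 -> $5224
```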
When the WD is normally scanning the disk, any time it spots either synchronisation value it updates its framing for all future bytes. When performing a read sector, it switches off its byte-frame synchronisation during the body of the sector. When performing a read track, it leaves it on.
Unfortunately, $5224 is not just $C2 with an omitted clock, it's also the 16 bits in the centre of the 18 that result when correctly encoding the 9-bit data stream 000101001. So it crops up all the time in the middle of sectors. And each time it crops up during a read track, the WD is going to throw out its previous idea about byte windowing and return a $C2.
So you're almost always going to see additional $C2s, depending on the particular data in the particular sectors on the particular track. It's not faulty hardware and they're not some freak race condition. You should be able to reread the same track over and over and always get the same $C2s.
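The collision is easy to reproduce. This Python sketch legally encodes the 9-bit stream 000101001, takes the centre 16 of the 18 cells, and then reads those cells the way a controller would if it framed a byte there. `mfm_encode` and `cells_to_int` are illustrative names only.

```python
def mfm_encode(data_bits, previous_bit=0):
    """Encode data bits as MFM cells: one clock cell, then one data cell, per bit."""
    cells = []
    for bit in data_bits:
        cells.append(1 if previous_bit == 0 and bit == 0 else 0)
        cells.append(bit)
        previous_bit = bit
    return cells


def cells_to_int(cells):
    value = 0
    for cell in cells:
        value = (value << 1) | cell
    return value


cells = mfm_encode([0, 0, 0, 1, 0, 1, 0, 0, 1])  # 18 cells, all legal
centre = cells[1:17]                             # drop the first and last cell
print(f"${cells_to_int(centre):04X}")            # $5224

# Framed at this offset, every second cell is taken to be a data bit.
print(f"${cells_to_int(centre[1::2]):02X}")      # $C2
```

The match sits half a bit window off the real framing, so what were clock cells get read as data.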
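Here is a toy Python model of that framing behaviour. It is a sketch under heavy assumptions, not the WD's actual logic: clock recovery is taken to be perfect, the reader starts out correctly framed, and `read_bytes`, `SYNC_WORDS` and the `resync` flag are all names invented for the example.

```python
SYNC_WORDS = {0x4489: 0xA1, 0x5224: 0xC2}  # cell pattern -> byte reported


def mfm_encode(data_bytes, previous_bit=0):
    """Legally MFM-encode bytes into a flat list of cells, clock cell first."""
    cells = []
    for byte in data_bytes:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            cells.append(1 if previous_bit == 0 and bit == 0 else 0)
            cells.append(bit)
            previous_bit = bit
    return cells


def read_bytes(cells, resync):
    """Toy reader: emit a byte every 16 cells; optionally reframe on a sync word."""
    out = []
    shift_register = 0
    cells_in_frame = 0
    for cell in cells:
        shift_register = ((shift_register << 1) | cell) & 0xFFFF
        cells_in_frame += 1
        if resync and shift_register in SYNC_WORDS:
            # Sync pattern seen: report it and restart byte framing from here.
            out.append(SYNC_WORDS[shift_register])
            cells_in_frame = 0
        elif cells_in_frame == 16:
            # Data bits are the second cell of each clock/data pair.
            byte = 0
            for shift in range(14, -1, -2):
                byte = (byte << 1) | ((shift_register >> shift) & 1)
            out.append(byte)
            cells_in_frame = 0
    return out


sector_body = [0x0A, 0x40, 0x00, 0x00]
cells = mfm_encode(sector_body)
print([f"${b:02X}" for b in read_bytes(cells, resync=False)])  # ['$0A', '$40', '$00', '$00']
print([f"${b:02X}" for b in read_bytes(cells, resync=True)])   # ['$0A', '$C2', '$7F', '$FF']
```

The bytes $0A $40 were picked because their bits contain 000101001 straddling the byte boundary. With resynchronisation left on, the $C2 arrives three cells into the second byte and everything after it is read from the clock cells instead of the data cells. The result is deterministic for a given bit stream, which is why rereading gives the same $C2s.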
Clues that you can use:
- if a $C2 is spurious it'll occur partway through an expected byte period. So it'll come early;
- if it's within sector contents then you can spot whether it's spurious by comparing to a read sector of the same place.
So you can in theory spot the first spurious $C2 through timing, do a manual decode on the data returned to make a guess about the sector it was in, read the sector to figure out how the $C2 threw framing off, shift the rest of the previous read track, determine what the next sector would be, read that to patch in contents, etc.
I believe the WD17[7/9]X support sector sizes only up to 1024 bytes. So sadly you can't even do a destructive (i) read track to get the real first few bytes; (ii) start a write track and write just enough to begin a really big sector; (iii) read the track again. I think you could do that on an IBM PC-style 8272, which is a completely different chip. It's also found in the Amstrad CPC and ZX Spectrum +3 if you're looking for something with that chip and a 3" drive.
Euphoric is emulating incorrectly by not inserting the $C2s, but I think it comes down to the MFM_DISK file format. It's a heavily reductive, simplified model of a disk surface that can't tell you the difference between a $C2 that is ordinarily encoded and a $C2 or $A1 that has a clock missing. The missing clock makes a big difference to read track, read sector and read address on a WD, so an emulator has to make a choice. With no defined guidance, I suppose each makes its own guess. Mine guesses which ones are probably outside sector contents, which may or may not align with other people's guesses. Similarly, the format makes no provision for data that is not exactly aligned with the index hole at a consistent data rate across the entire track. Presumably Euphoric is pretending that there's no such thing as clock bits, that alignment is perfect, and that the missing clocks aren't actually the trigger. But I'm speculating.
Summary: spurious $C2s are part of the WD's intended design, not a hardware fault. They relate to a part of the floppy disk mechanism that the most popular Oric file format doesn't have any comprehension of. As a result, don't trust any of the emulators as guides, as each is inventing its own way of dealing with the file format.
EDIT: I've a full emulation implementation of a synchronising read track if it would be of any help to throw a DSK at it and confirm or deny the output for you.