"Who cares if it's slow, at least the code is correct" - someone who never wrote 8-bit code in their life
It at least managed to regurgitate the fast multiply-by-10 algorithm, which of course exists in countless examples all over the web. But then, instead of maybe repeating that twice to multiply by 100, it produced absolutely insane code which - even at a glance - can't be correct unless the input happened to be zero:
; No decimal point, multiply by 100
ld b, h
ld c, l
add hl, hl ; * 2
add hl, hl ; * 4
add hl, hl ; * 8
add hl, hl ; * 16
add hl, hl ; * 32
add hl, hl ; * 64
add hl, bc ; * 65
add hl, hl ; * 130
ld b, h
ld c, l
srl b
rr c ; * 65
add hl, bc ; * 195
add hl, hl ; * 390
add hl, hl ; * 780
srl h
rr l ; * 390
add hl, hl ; * 780
srl h
rr l ; * 390
srl h
rr l ; * 195
srl h
rr l ; * 97.5
add hl, hl ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
add hl, hl ; * 390
srl h
rr l ; * 195
jr done_convert
First a bunch of random left shifts and additions that overshoot the target, then an even more directionless attempt at correcting the result that seems stuck in a loop until it just gives up. Kind of reminds me of "PanicSort" (xkcd.com/1185).
And as you mentioned, it ignores overflow, which might be a real problem since even if it worked correctly, it could only handle amounts up to $655.35. A better solution would have used BCD arithmetic, which the Z80 provides some dedicated instructions for.
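Even staying in binary, the correct shift-and-add version is tiny: 100 = 64 + 32 + 4, so three shifts and two adds do it, or you just run the x10 routine twice. A rough sketch of that decomposition in C (the Z80 version would be the corresponding add hl,hl / add hl,de sequence), still ignoring overflow like the original:

/* x * 100 via shifts and adds: 100 = 64 + 32 + 4 */
unsigned mul100(unsigned x)
{
    return (x << 6) + (x << 5) + (x << 2);
}

/* Equivalent: apply the classic x*10 = ((x << 2) + x) << 1 twice */
unsigned mul100_via_10(unsigned x)
{
    unsigned t = ((x << 2) + x) << 1;   /* x * 10  */
    return ((t << 2) + t) << 1;         /* x * 100 */
}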
That's already how it works on the VGA: you write the start address into a CRTC register and it will be used once the next frame starts, without affecting the current one.
The CRTC latches the display start address at the beginning (IIRC) of vblank, so you can just write a new value to that register at any time without affecting the current frame.
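For anyone curious what that looks like in practice, a minimal sketch using the standard VGA CRTC index/data ports (0x3D4/0x3D5) and the Start Address High/Low registers (0x0C/0x0D); I/O permission setup (ioperm etc.) is omitted:

#include <sys/io.h>   /* outb(value, port) */

/* Program the display start address; the CRTC only picks it up once
   per frame, so the frame currently being scanned out is unaffected. */
static void vga_set_start_address(unsigned short start)
{
    outb(0x0C, 0x3D4);           /* index: Start Address High */
    outb(start >> 8, 0x3D5);
    outb(0x0D, 0x3D4);           /* index: Start Address Low */
    outb(start & 0xFF, 0x3D5);
}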
>Miller’s study uses a test called the “trait-judgment task”: A trait like happy or sad flashes on a screen, and research subjects indicate whether the trait describes them. Miller has slightly modified this task for his split-brain patients—in his experiments, he flashes the trait on a screen straight in front of the subject’s gaze, so that both the left and right hemispheres process the information. Then, he quickly flashes the words “me” and “not me” to one side of the subject’s gaze—so that they’re processed only by one hemisphere—and the subject is instructed to point at the trait on the screen when Miller flashes the appropriate descriptor.
Seems to me (not a neuroscientist) like there's a flaw in that experiment: how would the right hemisphere understand the meaning of the words, if language is only processed by the left? I also recall reading that the more "primitive" parts of our brains don't have a concept of negation.
But maybe they have considered this and it's not an issue?
It looks to me (not being a 68k expert) like only the first word is considered the "opcode": the second word just selects which "D" registers are used for the CAS operation. Normally one would expect the zero bits to be completely ignored in that case, since they don't have any role in the instruction.
But maybe on the 68030 in this case, the bits must be zero even if they have no documented use, because there is hardwired logic for another instruction that is activated by those bits being set, somewhat like the 6502 illegal opcodes?
It's reminiscent of ARM, but the relevant part is that bits 5:0 of the CAS instruction's second word look like a "modrm" (to use the x86 terminology), where the officially documented values select only Dn, but the undocumented variant would correspond to (d16,An). At least, that's my theory for why A1 gets modified.
It also appears that I may not have been the first one to discover that something odd was going on with that bit, causing it to use A0-A7 (with weird results) instead of D0-D7:
The article is about the hardware and kernel level APIs used for interacting with storage. Everything else is by necessity built on top of that interface.
"fopen"? That is outdated stuff from a shitty ecosystem, and how do you think it's implemented?
If that were the case, wouldn't people just fall down, and possibly die from their heart stopping, instead of feeling an invisible wall that they can walk away from?
It's possible that actually reading the register takes (significantly) more time than an empty countdown loop. A somewhat extreme example of that would be on x86, where accessing legacy I/O ports for e.g. the timer goes through a much lower-clocked emulated ISA bus.
However, a more likely explanation is the use of "volatile" (which only appears in the working version of the code). Without it, the compiler might even have completely removed the loop?
> However, a more likely explanation is the use of "volatile" (which only appears in the working version of the code). Without it, the compiler might even have completely removed the loop?
No, because the loop calls cpu_relax(), which is a compiler barrier. It cannot be optimized away.
And yes, reading via the memory bus is much, much slower than a barrier. It's absolutely likely that reading 4 times from main memory on such an old embedded system takes several hundred cycles.
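For reference, a simplified stand-in for what cpu_relax() boils down to (the real definition differs per architecture, but it always contains at least a compiler barrier):

/* The empty asm with a "memory" clobber tells the compiler the statement
   may touch memory, so a loop around it cannot be optimized away even
   without any volatile on the counter. */
#define cpu_relax() __asm__ __volatile__("" ::: "memory")

static void delay_loop(int n)
{
    for (int i = 0; i < n; i++)
        cpu_relax();
}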
From what I understand, the timer registers should be on the APB(1) bus, which operates at a fixed 26 MHz clock. That is much closer to the scale of the fast timer clocks than to cpu_relax() and the main CPU clock, which runs somewhere in the range of 0.5-1 GHz and potentially does some dynamic frequency scaling for power saving.
The silliest part of this mess is that the 26 MHz clock for the APB1 bus is derived from the same source as the 13 MHz, 6.5 MHz, 3.25 MHz, and 1 MHz clocks usable by the fast timers.
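For scale: at a 1 GHz core clock, one 26 MHz APB cycle already corresponds to roughly 1000/26 ≈ 38 CPU cycles, and a register read through the bridge typically costs a few APB cycles plus synchronization overhead, so four reads ending up in the low hundreds of CPU cycles is entirely plausible (back-of-the-envelope, assuming a handful of APB cycles per access).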
You're right, I didn't account for that. Though even when declared volatile, the counter variable would be on the stack, and thus already in the CPU cache (at least 32K according to the datasheet)?
Looking at the assembly code for both versions of this delay loop might clear it up.
The only thing volatile does is ensure that the value is read from memory each time (which implicitly also forbids certain optimizations). Whether that memory is in a CPU cache is purely a hardware issue and outside the C specification. If you read something like a hardware register, you yourself need to take care in some way that a hardware cache will not give you stale values (by mapping it into a non-cached memory area, or by forcing a cache update). If you for-loop over something that acts as a compiler barrier, all that 'volatile' on the counter variable will do is potentially make the for-loop slower.
There are really just very few reasons to ever use 'volatile'. In fact, the Linux kernel even has its own documentation on why you should usually not use it:
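A minimal sketch of that last point (barrier() here stands in for whatever compiler barrier the loop body already contains):

static inline void barrier(void) { __asm__ __volatile__("" ::: "memory"); }

/* Neither loop can be deleted, because of the barrier. The only effect of
   'volatile' on the counter is an extra load/store per iteration, i.e.
   potentially a slower loop. */
void delay_plain(unsigned n)    { for (unsigned i = 0; i < n; i++) barrier(); }
void delay_volatile(unsigned n) { for (volatile unsigned i = 0; i < n; i++) barrier(); }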
Doesn't volatile also ensure that the address is not changed for the read by the compiler (as it might optimise the data layout otherwise)? So you can be sure, when using MMIO etc., that it won't read from the wrong place?
"volatile", according to the standard, simply is: "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine."
Or, simpler: don't assume anything you think you might know about this object; just do as you're told.
And yes, that for instance prohibits putting a value from a memory address into a register for further use, which would be a simple case of data optimization. Instead, a fresh retrieval from memory must be done on each access.
However, whether your system has caching or an MMU is outside of the spec. The compiler does not care. If you tell the compiler to give you the byte at address 0x1000, it will do so. 'volatile' just forbids the compiler from deducing the value from already available knowledge. If a hardware cache or MMU messes with that, that's your problem, not the compiler's.
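The usual MMIO pattern looks something like this (the address is made up): 'volatile' guarantees that both loads below actually happen, while making that address uncached is a separate MMU/platform concern the compiler knows nothing about.

/* Hypothetical timer count register at a made-up address */
#define TIMER_COUNT (*(volatile unsigned int *)0x40001000u)

unsigned int timer_delta(void)
{
    unsigned int before = TIMER_COUNT;  /* must be a real load */
    unsigned int after  = TIMER_COUNT;  /* must be a second, separate load */
    return after - before;
}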