Does it also convert '...' into '…'? Been seeing that a lot more recently. Also 'naïve' (outside of Raymond Chen's blog), that one seems to have been happening for longer though.
Personally, I can't stand such anti-features, and smartphone OSes seem to be full of them. That's another reason, besides the giant privacy invasion, why some weird people (like me) don't use them and aren't aware of every new feature.
The absolutely minimal code to enter - and leave - protected mode is this:
mov eax,cr0
inc ax ;sets bit 0, assuming it was clear
mov cr0,eax
dec ax ;clears bit 0
mov cr0,eax
As the article correctly said, descriptor caches are what the CPU actually uses to access memory. Coming from real mode, the attributes are already set up the same as in privileged 16-bit protected mode (except that CS is writable), the limit is 64K, and the base is the segment number shifted left by 4, exactly what we require.
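To make the base/limit point concrete, here is a minimal Python sketch of what the descriptor cache holds after a segment register load in real mode. The function names are made up for illustration, and it only models the base and limit described above, not the attribute bits.

```python
def real_mode_descriptor(segment: int) -> dict:
    """Model the cached base/limit for a segment register loaded in real mode."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment must fit in 16 bits")
    # Base is the segment number shifted left by 4; limit is 64K - 1.
    return {"base": segment << 4, "limit": 0xFFFF}


def linear_address(segment: int, offset: int) -> int:
    """Compute base + offset, rejecting offsets beyond the 64K limit."""
    desc = real_mode_descriptor(segment)
    if not 0 <= offset <= desc["limit"]:
        raise ValueError("offset exceeds the 64K segment limit")
    return desc["base"] + offset
```

For example, `linear_address(0xB800, 0)` gives `0xB8000`, the usual text-mode video buffer address.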
"But that's cheating!", some might say - not really, how else would it be possible to even execute one instruction in protected mode, if those registers weren't already initialized to a sane state? CS at the very least has to be, so that you can execute a jump to the "proper" protected mode segment right after loading CR0.
I remember reading some documentation that even said you can load the GDT either before or after the "switch" to protected mode, which would be impossible if that somehow required different segments to be set up already.
If you want to be pedantic, there also has to be a jump in there to clear the prefetch queue and make sure the CPU actually interprets code according to the new mode, instead of the one that was active when the instruction was fetched and decoded. But that first jump can - and according to some Intel manuals, must! - actually be a near jump, staying in the same code segment at least for the moment. Since a lot of protected mode init code gets this wrong however, they had to keep support for a far jump as the first instruction as well, probably making the microcode for that instruction slower than it could have otherwise been.
(To enter "unreal mode", of course you do also need a GDT, but it doesn't need to have any descriptors other than one for the flat 4G data segment)
They're only mentioned for the 80386 version of the LOADALL instruction though, where he confirmed that the CPU actually does bus cycles to read them. But the same registers already existed on the 80286 (only 16 bits of course). On that chip they are the 3 words before and 7 words after the MSW, the ones marked as "None" in the table there (note that it is slightly wrong, MSW should be 806H instead of 804H).
Whatever register is copied has to be one of the "type b" ones - ESP, EBP, ESI or EDI. Only ESP is special enough for the hardware to have that direct path for it.
Maybe it's not (just) for privilege transitions, but automatically saving the value of ESP at the start of every instruction, so that it can be "rolled back" when there is a stack limit violation?
That's still billions of CPU instructions being run. If you spent the rest of your life locked in some Tibetan monastery, going through all the steps by hand on paper, you wouldn't even get 1% of the way to rendering this single context menu.
The amount of bloat in modern software is simply obscene.
Yeah, it's probably not an intentional watermark, just something the model has been trained to do. Maybe some professionally written news articles already use them for the same purpose?
Still hope HN adds a filter to block any comment with those characters in it :)
It is very easy to filter those out from the output of GPT, though, using basic UNIX utilities. In fact, many methods don't survive reformatting or copy-pasting, not requiring filtering at all.
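As a rough illustration of how little it takes, here is the same kind of filtering sketched in Python rather than UNIX utilities. The thread never names the exact characters, so the table below is purely an assumption: a few invisible or non-standard space code points picked as examples.

```python
# Assumption: this set of characters is a guess for illustration only;
# the discussion does not say which code points are actually involved.
SUSPECT = {
    "\u200b": "",   # zero width space: drop
    "\u200c": "",   # zero width non-joiner: drop
    "\u200d": "",   # zero width joiner: drop
    "\u2060": "",   # word joiner: drop
    "\ufeff": "",   # zero width no-break space / BOM: drop
    "\u00a0": " ",  # no-break space: replace with a plain space
    "\u202f": " ",  # narrow no-break space: replace with a plain space
}

_TABLE = str.maketrans(SUSPECT)


def strip_suspect(text: str) -> str:
    """Remove or normalize the characters listed in SUSPECT."""
    return text.translate(_TABLE)
```

Ordinary text passes through unchanged, so running this over everything is harmless.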
It is a very basic watermark technique (text steganography) if it indeed is supposed to be one.
A more advanced one would be a linguistic (grammar-based) one, but I am not going to give any more ideas. :D
It's easy to remove those characters, but that still requires being aware of them, and an intent to deceive. So many people just copy LLM output here because they (wrongly) believe it adds something of value to a discussion.
Rather than some conspiracy, my suspicion is that AI companies accidentally succeeded in building a machine capable of hacking (some) people's brains. Not because it's superhumanly intelligent, or even has any agenda at all, but simply because LLMs are specifically tuned to generate the kind of language that is convincing to the "average person".
Managers and politicians might be especially susceptible to this, but there's also enough in the tech crowd who seem to have been hypnotized into becoming mindless enthusiasts for AI.
It's not "paper tape" (that's a digital storage medium), just a printout. And the numbers in the left half are not actually part of the source file, they are line numbers and machine code output produced by the assembler. You'd probably better not waste time transcribing them.
Not trying to dissuade you, but here are some things you should consider:
• Turn off your spell checker, it will only make this more difficult! It certainly won't help with the code itself, and it seems like you want to reproduce everything perfectly, including typos in the comments.
• I'd strongly suggest at the very least becoming a bit familiar with 8080 assembly language before attempting this.
• The tools used to produce this output add another layer of complications. They used the PDP10 system's assembler with a set of macros to adapt it to generate 8080 code, so it's using somewhat different syntax and directives than those of 8080-native assemblers (like the ones from Intel or Digital Research).
• Some characters are hard to read, and without knowledge of the context and at least some of the PDP10-specific syntax it will be impossible to just guess. E.g. decimal numbers are sometimes prefixed with '^D', and octal numbers with '^O', which look quite similar in this scan. The 'RADIX' directive changes the default for when there is no such prefix, it should be 10 for most of it, but I think that it does start out as octal. Memory addresses will be octal (like 'RAMBOT==^O20000' in line 13), ASCII characters could be either but they seem to prefer decimal for those ('^D13' is CR, '^D10' is LF).
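To show why mixing up the two prefixes matters, here is a small Python sketch of the number rules from the last point. It is not how the PDP-10 assembler works internally; `parse_number` and its `radix` parameter are hypothetical names, and only the '^D' and '^O' prefixes plus a default radix are modelled.

```python
def parse_number(token: str, radix: int = 10) -> int:
    """Parse a numeric literal with an optional ^D (decimal) or ^O (octal) prefix.

    `radix` stands in for the current RADIX setting and applies only
    when the token has no prefix.
    """
    if token.startswith("^D"):
        return int(token[2:], 10)
    if token.startswith("^O"):
        return int(token[2:], 8)
    return int(token, radix)
```

With this, `parse_number("^O20000")` is 8192 (hex 2000), while misreading the prefix as '^D' would give 20000 decimal. Likewise '^D13' is 13 (CR), but the same digits read as octal would be 11.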
There are PDP-10 emulators with well-maintained copies of the different operating systems for them, so someone could check that the typed up source can be assembled.
I do have some (surface-level) experience with GB/GBC assembly, but other than that I'm new. As for spell-checker, I've figured out how to rid myself of that. And the paper tape mix-up was just my inexperience.