rep_lodsb · 2025-05-05T20:16:57 1746476217

Does it also convert '...' into '…'? Been seeing that a lot more recently. Also 'naïve' (outside of Raymond Chen's blog), that one seems to have been happening for longer though.

Personally, I can't stand such anti-features, and smartphone OSes seem to be full of them. That's another reason, besides the giant privacy invasion, why some weird people (like me) don't use them and aren't aware of every new feature.

rep_lodsb · 2025-05-05T19:20:01 1746472801

The absolutely minimal code to enter - and leave - protected mode is this:

    mov   eax,cr0
    inc   ax        ;sets bit 0, assuming it was clear
    mov   cr0,eax
    dec   ax        ;clears bit 0
    mov   cr0,eax

As the article correctly said, descriptor caches are what the CPU actually uses to access memory. Coming from real mode, the attributes are already set up the same as in privileged 16-bit protected mode (except that CS is writable), the limit is 64K, and the base is the segment number shifted left by 4, exactly what we require.
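To make the "base is the segment number shifted left by 4, limit is 64K" point concrete, here is a minimal Python sketch of the hidden descriptor-cache fields that a real-mode segment load implies. The function names and the dictionary layout are made up for illustration, and this is a model of the arithmetic only, not of any real CPU's internal format.

```python
def real_mode_descriptor(segment: int) -> dict:
    """Model the cached base/limit implied by loading a segment register in real mode."""
    return {
        "base": (segment & 0xFFFF) << 4,  # segment number shifted left by 4
        "limit": 0xFFFF,                  # 64K segment: highest valid offset
    }


def linear_address(segment: int, offset: int) -> int:
    """Compute the linear address the way the cached descriptor would."""
    desc = real_mode_descriptor(segment)
    if not 0 <= offset <= desc["limit"]:
        raise ValueError("offset is outside the 64K segment limit")
    return desc["base"] + offset


print(hex(linear_address(0xB800, 0x0000)))
print(hex(linear_address(0xFFFF, 0xFFFF)))
```

Setting bit 0 of CR0 does not touch these cached values, which is why the five-instruction sequence above keeps running without loading any descriptor.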

"But that's cheating!", some might say - not really, how else would it be possible to even execute one instruction in protected mode, if those registers weren't already initialized to a sane state? CS at the very least has to be, so that you can execute a jump to the "proper" protected mode segment right after loading CR0.

I remember reading some documentation that even said you can load the GDT either before or after the "switch" to protected mode, which would be even more impossible if that somehow required different segments already set up.

If you want to be pedantic, there also has to be a jump in there to clear the prefetch queue and make sure the CPU actually interprets code according to the new mode, instead of the one that was active when the instruction was fetched and decoded. But that first jump can - and according to some Intel manuals, must! - actually be a near jump, staying in the same code segment at least for the moment. Since a lot of protected mode init code gets this wrong however, they had to keep support for a far jump as the first instruction as well, probably making the microcode for that instruction slower than it could have otherwise been.

(To enter "unreal mode", of course you do also need a GDT, but it doesn't need to have any descriptors other than one for the flat 4G data segment)

rep_lodsb · 2025-05-01T23:53:46 1746143626

AFAIK, the 10 internal registers were first mentioned by Robert Collins in this article: https://www.rcollins.org/articles/loadall/tspec_a3_doc.html

They're only mentioned for the 80386 version of the LOADALL instruction though, where he confirmed that the CPU actually does bus cycles to read them. But the same registers already existed on the 80286 (only 16 bits of course). On that chip they are the 3 words before and 7 words after the MSW, the ones marked as "None" in the table there (note that it is slightly wrong, MSW should be 806H instead of 804H).

rep_lodsb · 2025-05-01T23:47:56 1746143276

Whatever register is copied has to be one of the "type b" ones - ESP, EBP, ESI or EDI. Only ESP is special enough for the hardware to have that direct path for it.

Maybe it's not (just) for privilege transitions, but automatically saving the value of ESP at the start of every instruction, so that it can be "rolled back" when there is a stack limit violation?

rep_lodsb · 2025-04-22T22:03:41 1745359421

That's still billions of CPU instructions being run. If you spent the rest of your life locked in some Tibetan monastery, going through all the steps by hand on paper, you wouldn't even get 1% of the way to rendering this single context menu.

The amount of bloat in modern software is simply obscene.

rep_lodsb · 2025-04-21T07:43:49 1745221429

Probably won't show up in source code at all, though nothing of value would be lost if they did!

cluckindan · 2025-04-21T08:03:52 1745222632

Those characters do matter, though. They are the difference between

    $2.5
    billion

and

    $2.5 billion
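A small Python sketch of that difference, assuming the characters in question are no-break spaces such as U+00A0 (the thread doesn't name the exact code points). It relies on the standard `textwrap` module, which only breaks lines at ASCII whitespace; the sample sentence and width are arbitrary.

```python
import textwrap

plain = "worth $2.5 billion"        # ordinary space: a line break is allowed here
glued = "worth $2.5\u00a0billion"   # U+00A0 NO-BREAK SPACE: no break allowed

# textwrap treats only ASCII whitespace as a break opportunity,
# so the no-break space keeps "$2.5" and "billion" on the same line.
print(textwrap.wrap(plain, width=12))
print(textwrap.wrap(glued, width=12))
```

With the ordinary space the amount and the unit end up on different lines; with the no-break space they stay together.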

rep_lodsb · 2025-04-21T07:41:51 1745221311

Yeah, it's probably not an intentional watermark, just something the model has been trained to do. Maybe some professionally written news articles already use them for the same purpose?

Still hope HN adds a filter to block any comment with those characters in it :)

johnisgood · 2025-04-21T07:54:45 1745222085

It is very easy to filter those out from the output of GPT, though, using basic UNIX utilities. In fact, many such methods don't survive reformatting or copy-pasting, so no filtering is required at all.
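As a rough illustration of how little it takes, here is a sketch in Python rather than UNIX utilities. The set of code points is an assumption (a couple of no-break spaces plus common zero-width characters), since nothing here pins down exactly which characters are involved, and `normalize` is just a name picked for the example.

```python
# Assumed character set; adjust to whatever actually shows up in the text.
NO_BREAK_SPACES = ["\u00a0", "\u202f"]                     # replaced by a plain space
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"]  # removed entirely

_TABLE = str.maketrans(
    {**{ch: " " for ch in NO_BREAK_SPACES}, **{ch: None for ch in ZERO_WIDTH}}
)


def normalize(text: str) -> str:
    """Replace no-break spaces with ordinary spaces and drop zero-width characters."""
    return text.translate(_TABLE)


print(normalize("$2.5\u00a0billion") == "$2.5 billion")
```

Anything outside the assumed set passes through unchanged, so this is a filter for known characters, not a general detector.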

It is a very basic watermark technique (text steganography) if it indeed is supposed to be one.

A more advanced one would be a linguistic (grammar-based) one, but I am not going to give any more ideas. :D

rep_lodsb · 2025-04-21T08:18:39 1745223519

It's easy to remove those characters, but that still requires being aware of them, and an intent to deceive. So many people just copy LLM output here because they (wrongly) believe it adds something of value to a discussion.

johnisgood · 2025-04-21T08:30:36 1745224236

I don't think pasting LLM output typically adds anything to the conversation either. It might, but usually it does not.

rep_lodsb · 2025-04-11T11:18:01 1744370281

The zero-byte program should work on either :)

It's also possible to detect which mode the CPU is in:

    bits 16
    mov  ax,start16   ;may load EAX instead,
    jmp  ax           ;skipping this 2-byte instruction

    bits 32
    dec  eax          ;REX prefix in long mode,
    mov  eax,start32  ;may load RAX,
    jmp  eax          ;skipping these 4 bytes
    nop
    nop

    bits 64
    jmp  start64
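To see why the same bytes land on a different jump in each mode, here is a toy Python decoder. It only knows the handful of opcodes used above, and the two entry-point addresses and the zero `jmp` displacement are hypothetical placeholders, so treat it as a sketch of the decoding argument, not as an emulator.

```python
import struct

START16 = 0x1234       # hypothetical 16-bit entry point
START32 = 0x00405678   # hypothetical 32-bit entry point

CODE = (
    b"\xB8" + struct.pack("<H", START16)    # bits 16: mov ax,start16
    + b"\xFF\xE0"                           #          jmp ax
    + b"\x48"                               # bits 32: dec eax (REX.W prefix in long mode)
    + b"\xB8" + struct.pack("<I", START32)  #          mov eax,start32
    + b"\xFF\xE0"                           #          jmp eax
    + b"\x90\x90"                           #          nop, nop
    + b"\xE9\x00\x00\x00\x00"               # bits 64: jmp start64 (placeholder displacement)
)


def run(mode: int):
    """Decode CODE as a CPU in 16-, 32- or 64-bit mode and return the first jump taken."""
    pc, acc, rex_w = 0, 0, False
    while pc < len(CODE):
        op = CODE[pc]
        pc += 1
        if op == 0x48 and mode == 64:
            rex_w = True                    # REX.W: widens the next instruction
            continue
        if op == 0x48:
            acc = (acc - 1) & 0xFFFFFFFF    # dec eax
        elif op == 0xB8:
            size = 8 if rex_w else (2 if mode == 16 else 4)
            acc = int.from_bytes(CODE[pc:pc + size], "little")  # immediate width depends on mode
            pc += size
        elif op == 0xFF and CODE[pc] == 0xE0:
            return ("jmp register", acc)    # jmp ax / jmp eax
        elif op == 0x90:
            pass                            # nop
        elif op == 0xE9:
            return ("jmp start64", None)    # near jump, only reached in long mode
        else:
            raise ValueError(f"opcode {op:#x} is not handled by this sketch")
        rex_w = False
    raise RuntimeError("ran off the end of the code")


for mode in (16, 32, 64):
    print(mode, run(mode))
```

In 32-bit mode the first `mov` swallows the 16-bit `jmp ax` as the upper half of its immediate, and in long mode the second `mov` swallows `jmp eax` and both `nop`s as part of a 64-bit immediate.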

You can even be compatible with CP/M-80 by putting this at the start:

    add  bx,start8    ;8080: ADD C, JMP start8
    nop               ;immediate may be 16 or 32 bits
    nop

rep_lodsb · 2025-04-09T15:18:50 1744211930

Rather than some conspiracy, my suspicion is that AI companies accidentally succeeded in building a machine capable of hacking (some) people's brains. Not because it's superhumanly intelligent, or even has any agenda at all, but simply because LLMs are specifically tuned to generate the kind of language that is convincing to the "average person".

Managers and politicians might be especially susceptible to this, but there's also enough in the tech crowd who seem to have been hypnotized into becoming mindless enthusiasts for AI.

rep_lodsb · 2025-04-07T11:58:18 1744027098

It's not "paper tape" (that's a digital storage medium), just a printout. And the numbers in the left half are not actually part of the source file, they are line numbers and machine code output produced by the assembler. It's probably better not to waste time transcribing them.

Not trying to dissuade you, but here are some things you should consider:

• Turn off your spell checker, it will only make this more difficult! It certainly won't help with the code itself, and it seems like you want to reproduce everything perfectly, including typos in the comments.

• I'd strongly suggest becoming at least a bit familiar with 8080 assembly language before attempting this.

• The tools used to produce this output add another layer of complications. They used the PDP10 system's assembler with a set of macros to adapt it to generate 8080 code, so it's using somewhat different syntax and directives than those of 8080-native assemblers (like the ones from Intel or Digital Research).

• Some characters are hard to read, and without knowledge of the context and at least some of the PDP10-specific syntax it will be impossible to just guess. E.g. decimal numbers are sometimes prefixed with '^D', and octal numbers with '^O', which look quite similar in this scan. The 'RADIX' directive changes the default for when there is no such prefix, it should be 10 for most of it, but I think that it does start out as octal. Memory addresses will be octal (like 'RAMBOT==^O20000' in line 13), ASCII characters could be either but they seem to prefer decimal for those ('^D13' is CR, '^D10' is LF).
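For checking transcribed numbers, the radix rules in the last point can be sketched in a few lines of Python. The function name is made up, and the octal default is an assumption based on the guess above that the listing starts out in octal; the real assembler's rules may have more cases than this covers.

```python
def parse_literal(token: str, default_radix: int = 8) -> int:
    """Parse a numeric literal with an optional ^D (decimal) or ^O (octal) prefix."""
    if token.startswith("^D"):
        return int(token[2:], 10)
    if token.startswith("^O"):
        return int(token[2:], 8)
    # No prefix: whatever the most recent RADIX directive selected.
    return int(token, default_radix)


print(hex(parse_literal("^O20000")))                      # RAMBOT
print(parse_literal("^D13"), parse_literal("^D10"))       # CR, LF
```

A digit that is invalid for the chosen radix (an 8 or 9 in an octal literal) raises `ValueError`, which is a cheap way to catch a misread '^D' versus '^O'.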

rjsw · 2025-04-07T13:31:05 1744032665

There are PDP-10 emulators with well-maintained copies of the different operating systems for them, so someone could check that the typed up source can be assembled.

EgoIncarnate · 2025-04-07T23:44:35 1744069475

I don't think it would help. As far as I can tell, the source doesn't include the macros needed to actually perform an assembly.

LuciOfStars · 2025-04-07T13:29:20 1744032560

I do have some (surface-level) experience with GB/GBC assembly, but other than that I'm new. As for spell-checker, I've figured out how to rid myself of that. And the paper tape mix-up was just my inexperience.

All super interesting info!