
bytefire · on Nov 18, 2018

It's interesting how little known this exception handling mechanism is: https://stackoverflow.com/questions/51761688/linux-driver-tr...

bytefire · on Nov 16, 2018

Thanks. I am actually using this inside a kernel module whose job is to inspect Intel's virtualisation state: https://github.com/bytefire/vmtool

bytefire · on Oct 15, 2018

good point. maybe the central idea of how it's implemented isn't too bad: i see the hypervisor as a sort of OS kernel for VMs, and the transitions from VM to hypervisor (VM exits) as akin to syscalls. of course there is more, but the above analogy is the basic idea and other things get added along the way

bytefire · on Oct 14, 2018

hi userbinator :) isn't the purpose of virtual 8086 mode somewhat different? i.e. to run real mode applications while the cpu is in protected mode? or did you mean that virtual 8086 could be generalised into a wider virtualisation system?

userbinator · on Oct 15, 2018

> or did you mean that virtual 8086 could be generalised into a wider virtualisation system?

Yes, if you look at the way V86 is implemented, it wouldn't be too hard to extend it to full virtualisation --- something like a "VMX mode task" would've been ideal.

bytefire · on Oct 15, 2018

i see, makes sense. maybe a different team from V86 worked on it? Conway's law :)

bytefire · on Oct 14, 2018

very interesting and creative use of EPT, will read the link. thanks for sharing

bytefire · on Oct 14, 2018

thank you that means a lot! please do add any information you think is relevant :)

bonzini · on Oct 14, 2018

The bit about TLBs is a bit confusing; it seems like you're talking about a software TLB, but EPT is just a second layer of address translation.

Also, after moving a VMCS from one physical CPU to another you have to do VMLAUNCH the first time you start the guest on the new CPU, because you had VMCLEARed it on the old CPU. That's it. :-)

bytefire · on Oct 14, 2018

very good, thank you. i'll try to tidy it up

bytefire · on Oct 8, 2018

no you didn't overlook anything. the article doesn't discuss the actual mechanics of DRAM init, so thank you for adding this info :) i know there is a process of memory training whose aim is to arrive at the right parameters for that DRAM. the way i see it, it is a sort of in-field calibration. boot firmware can then store those parameters in the BIOS chip and on the next reboot just reuse them, because memory training is a time-consuming process.

bytefire · on Oct 8, 2018

you're right, MRC is a major part of FSP but i think FSP does more work than just initialise memory. it also performs some CPU and ICH init.

bytefire · on Oct 7, 2018

@burfog i have updated the post with explanation of how the reset vector address is calculated. thanks for pointing out :)

JdeBP · on Oct 8, 2018

As others here, I strongly recommend reading the IA manuals on this subject, as well as the equivalent AMD doco. Most of the processor part (but not the firmware part) of this subject is in the manufacturer doco.

And yes, one has to be careful about outdated information.

* https://superuser.com/a/347115/38062

* https://superuser.com/a/695716/38062

* https://superuser.com/a/345333/38062

* https://unix.stackexchange.com/a/461774/5132

bytefire · on Oct 7, 2018

yes CS not SS. i should fix that.

how exactly the CPU ends up addressing 0xffff.fff0 is not specified in the post. actually the CS register is loaded with 0xf000, and normally this would yield a segment base address of 0x000f.0000 (CS left-shifted by 4 bits). but on a reset, like the post mentions, the 12 most significant address lines are asserted so the base address ends up being 0xffff.0000. these address lines remain asserted until a long jump is made, after which they are de-asserted and normal CS base calculation resumes.

since the instruction pointer contains -16 (0xfff0 as a 16-bit value) as you mentioned, the resulting address is:

base address + IP = 0xffff.0000 + 0xfff0 = 0xffff.fff0

i am not sure if this is worth adding to the post but it is definitely useful.

atq2119 · on Oct 7, 2018

I recall reading that it's not that those 12 bits are explicitly asserted, but rather that the CS descriptor after reset is in an "unreal mode". After all, x86 segment descriptors consist not just of their numeric value, but also of a base address, segment size, and privilege information.

So at reset, CS is set to a descriptor whose numeric value is 0xf000 and whose base address is 0xffff0000, or something to that effect. All the rest follows naturally -- there's no special case logic that asserts lines of the address bus until the first long jump, it's simply that the reset value of the CS descriptor is rather magical, and that long jumps by their nature load a new CS segment descriptor which isn't magical.

userbinator · on Oct 7, 2018

Yes, initial CS descriptor has base FFFF0000 and limit 0000FFFF, and initial EIP is FFF0. Paging is disabled so first instruction is fetched from physical address FFFFFFF0. This has been true since the 386.

See 386 datasheet, page 20:

https://media.digikey.com/pdf/Data%20Sheets/Intel%20PDFs/Int...

The 8086/8088 is slightly different since it doesn't have protected mode; initial CS:IP is FFFF:0000 which gives a first address of FFFF0. The 286 is closer to the 386+ but its 24-bit address space means the first instruction comes from FFFFF0 instead.
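The three cases above come down to the same sum, truncated to the width of the address bus. A minimal sketch in Python; the function names are made up for illustration, and the 286 base of FF0000 is an assumption inferred from the FFFFF0 result rather than something stated in this thread:

```python
def first_fetch_address(cs_base: int, ip: int, address_bits: int) -> int:
    """Return hidden CS base + instruction pointer, truncated to the bus width."""
    return (cs_base + ip) & ((1 << address_bits) - 1)


def real_mode_base(selector: int) -> int:
    """Base that a real-mode CS load would produce: selector shifted left by 4."""
    return selector << 4


# 8086/8088: CS:IP = FFFF:0000, ordinary real-mode base, 20 address bits.
addr_8086 = first_fetch_address(real_mode_base(0xFFFF), 0x0000, 20)

# 80286: IP = FFF0 on a 24-bit bus. The base FF0000 is an assumption
# inferred from the FFFFF0 first address quoted above.
addr_286 = first_fetch_address(0xFF0000, 0xFFF0, 24)

# 386 and later: base FFFF0000, EIP = FFF0, 32 address bits.
addr_386 = first_fetch_address(0xFFFF0000, 0xFFF0, 32)

print(f"{addr_8086:X} {addr_286:X} {addr_386:X}")
```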

bytefire · on Oct 7, 2018

this is interesting! i am not aware of how this logic is implemented, i.e. the logic of the initial state where the 12 most significant bits are set, but thanks for enlightening

hyperman1 · on Oct 7, 2018

I looked it up:

https://software.intel.com/en-us/articles/intel-sdm#nine-vol...

Get volume 3A and read chapter 9.1.4 at pg 315. The text is quite readable:

  The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The
  processor is initialized to this starting address as follows. The CS register has two parts: the visible segment
  selector part and the hidden base address part. In real-address mode, the base address is normally formed by
  shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a
  hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with
  FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that
  is, FFFF0000 + FFF0H = FFFFFFF0H).

Any change to CS reverts this to normal real mode operation. So near jumps are OK, far jumps or interrupts are not.
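A toy model of that behaviour, as a sketch only: the class and method names are hypothetical, the target offset E05B is an arbitrary example, and real hardware tracks more hidden state (limit, access rights) than this does.

```python
class CodeSegment:
    """Toy model of CS: a visible selector plus a hidden base address."""

    def __init__(self) -> None:
        # Reset state from the manual text quoted above:
        # selector F000H, base FFFF0000H, EIP FFF0H.
        self.selector = 0xF000
        self.base = 0xFFFF0000
        self.eip = 0xFFF0

    def linear_address(self) -> int:
        return (self.base + self.eip) & 0xFFFFFFFF

    def near_jump(self, offset: int) -> None:
        # Only EIP changes; CS is not reloaded, so the hidden base survives.
        self.eip = offset & 0xFFFF

    def far_jump(self, selector: int, offset: int) -> None:
        # A real-mode CS load recomputes the base as selector << 4.
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4
        self.eip = offset & 0xFFFF


cs = CodeSegment()
after_reset = cs.linear_address()

cs.near_jump(0xE05B)          # still executing just below 4 GiB
after_near = cs.linear_address()

cs.far_jump(0xF000, 0xE05B)   # same selector value, but the base is now F0000
after_far = cs.linear_address()

print(f"{after_reset:X} {after_near:X} {after_far:X}")
```

Note that the far jump loads the same selector value, F000, that CS already showed; it is the reload itself, not a different value, that replaces the base.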

Stratoscope · on Oct 7, 2018

Speaking of readability, here's a readable copy of that quote:

> The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The processor is initialized to this starting address as follows. The CS register has two parts: the visible segment selector part and the hidden base address part. In real-address mode, the base address is normally formed by shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that is, FFFF0000 + FFF0H = FFFFFFF0H).

(Don't use indentation to format a block quote, only use it for code listings.)

monocasa · on Oct 7, 2018

Practically, nearly all code I've seen pretty much immediately far jumps into 32-bit protected mode.

burfog · on Oct 7, 2018

The instruction pointer is the IP register. It is zero. It does not contain -16 or 0xfffffff0. The linear address is a different thing (computed from the CS base plus the IP/EIP/RIP content), as is the physical address.

Unless something has changed in recent hardware, there aren't 12 address lines just asserted. This is a side effect of the CS base being a particular value.

An important thing to realize is that x86 has hidden registers associated with segments. These registers get set when a segment selector register is loaded, not when it is used. The CS base is one of these hidden registers. If CS is loaded in protected mode, the base comes out of the descriptor table, and it remains when switching back to real mode. (this is the "unreal mode") If CS is loaded in real mode, the base comes from the selector shifted left, and this base remains even if you switch to protected mode. Switching modes doesn't change a segment base. Loading segment registers is what changes a segment base.

So initially, the CS base is not set in a way that matches what you would get if you loaded the CS selector value that is seen. It is set to a value that is possibly 0xfffffff0, 0x00000ffffffffff0, 0x0000fffffffffff0, or 0xfffffffffffffff0. The older documentation I've seen would use the largest of those values. I suppose it could then be cut down to 32-bit by the bottleneck that is normally a part of addressing when not in long mode. This is the sort of area where Intel, AMD, and others may differ.

Perhaps there is a hardware debugger for x86 (like a JTAG debugger) that would show the initial CS base. One could also guess that Simics or VMware might be correct, disassembling them to find out what they use. Another idea is to examine the badly-documented state used by the virtualization instructions.

bytefire · on Oct 7, 2018

> The instruction pointer is the IP register. It is zero.

it is 0xfff0, at least according to Intel Software Developer's Manual Volume 3, section 9.1.4 "First Instruction Executed". regarding 12 address lines being asserted, that is just a way of thinking about it. actual implementation might be different but what happens on reset is akin to 12 most significant bits being set. CS is 0xf000.

indeed a debugger would give the right answer.

userbinator · on Oct 7, 2018

Initial IP was 0 on the 8086/8088. I suspect that detailed technical information like this tends to be copy-pasted more than understood, which is why a lot of second-sourced information out there on it is just plain wrong or caveated. The sometimes self-contradicting information in Intel's own docs doesn't help either.

This is what I've figured out from Intel's docs:

    8086/88:   CS:IP = FFFF:0000 first instruction at FFFF0
    80186/188: CS:IP = FFFF:0000 first instruction at FFFF0
    80286:     CS:IP = F000:FFF0 first instruction at FFFFF0
    80386:     CS:IP = 0000:0000FFF0 or F000:0000FFF0[1], first instruction at FFFFFFF0
    80486+:    CS:IP = F000:0000FFF0(?) first instruction at FFFFFFF0

[1] Depending on which datasheet/programmer's reference manual you read. I can't find any reference to someone who actually checked what the hardware did, however.

More interesting reading...

http://www.rcollins.org/Productivity/DescriptorCache.html

http://www.rcollins.org/ddj/Aug98/Aug98.html

https://www.pcjs.org/pubs/pc/reference/intel/80386/loadall/

bytefire · on Oct 8, 2018

@userbinator thanks for clarification! this is indeed useful and helps understand contradictions.

ddevault · on Oct 7, 2018

By "asserted" do you mean "pinned" or "fixed"?

bytefire · on Oct 7, 2018

sorry, could you explain what pinned or fixed mean in this context? by asserted, i mean the corresponding bits being set. it's similar to when one says an interrupt line is asserted, or a gpio is asserted :)

ddevault · on Oct 7, 2018

"Asserted" is a past-tense verb, which implies a process of checking that it's valid at a certain moment in time, before proceeding with some work that requires it to be so. If I understand correctly (and I may not), the case is rather that the bits are always set, in which case I might call them "pinned" to 1 or "fixed" to 1 - meaning they cannot change, rather than should not change. It might also be that they are set to these values automatically during boot, but can be changed by the firmware - in which case "initialized" could be better.

Or at least that's the source of confusion for me, maybe the terminology is different at this level.

Filligree · on Oct 7, 2018

> Or at least that's the source of confusion for me, maybe the terminology is different at this level.

Others have mentioned it is.

The reason why 'asserted' is used is that signals at this level are basically analog. The circuit that asserts a signal is, fairly literally, being assertive, and there are all sorts of commonly used options: Pull-ups and pull-downs, either one in either weak or strong (assertive) form.

Connecting a strong pull-down to a strong pull-up represents a short circuit, but having one circuit assert a logical 1 while the other circuit on the same pin holds a weak pull-down (presumably, in this case, 0), is a pretty common configuration.

The most important thing to keep in mind, working with electronics, is that all pins must be connected to at least a weak pull-up/down, which can be as simple as an MOhm-class resistor connected to ground.

If they aren't, then the gate is floating -- and a floating CMOS gate can easily reach states where the gate itself is short-circuiting, since they're made from a transistor pair connected to both ground and power. (As is necessary to support both pull-up and pull-down.) If that doesn't destroy the gate -- check your datasheet -- then, at a minimum, it'll still waste power.

The majority of common microcontrollers (e.g. Arduinos) will allow you to configure the gate with an internal weak pull-up/down, to let you avoid connecting every single pin, but you shouldn't assume that it's configured that way out of the reset vector. Nor that such an internal pull-up even exists.
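The strong-versus-weak rules above can be written down as a small lookup. This is a purely logical sketch, not an electrical simulation; the names are hypothetical, and the "undefined" result for two opposing weak pulls is an assumption that the description above does not cover.

```python
from enum import Enum


class Drive(Enum):
    STRONG_HIGH = "strong pull-up"
    STRONG_LOW = "strong pull-down"
    WEAK_HIGH = "weak pull-up"
    WEAK_LOW = "weak pull-down"


def resolve_pin(drivers):
    """Return the logical state of a pin given everything connected to it."""
    present = set(drivers)
    if Drive.STRONG_HIGH in present and Drive.STRONG_LOW in present:
        return "short circuit"
    # A strong driver wins over any weak pull on the same pin.
    if Drive.STRONG_HIGH in present:
        return "1"
    if Drive.STRONG_LOW in present:
        return "0"
    if Drive.WEAK_HIGH in present and Drive.WEAK_LOW in present:
        return "undefined"  # assumption: not covered by the description above
    if Drive.WEAK_HIGH in present:
        return "1"
    if Drive.WEAK_LOW in present:
        return "0"
    # Nothing connected at all: the gate is floating.
    return "floating"


print(resolve_pin([Drive.STRONG_HIGH, Drive.WEAK_LOW]))
print(resolve_pin([]))
```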

dfox · on Oct 7, 2018

Asserted is a standard term used in electronics for "having the logically active state", used when whether that means one or zero on the physical wire is irrelevant. In particular Intel's documentation uses "asserted" in this sense.
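Put as a tiny sketch (the helper name and the two example signals are hypothetical), the same wire level can be asserted for one signal and deasserted for another, depending on the signal's polarity:

```python
def is_asserted(wire_level: int, active_low: bool) -> bool:
    """True when the signal is in its logically active state."""
    active_level = 0 if active_low else 1
    return wire_level == active_level


# An active-low signal is asserted when the wire reads 0.
active_low_at_0 = is_asserted(0, active_low=True)

# An active-high signal at the same wire level is not asserted.
active_high_at_0 = is_asserted(0, active_low=False)

print(active_low_at_0, active_high_at_0)
```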

bytefire · on Oct 7, 2018

ah i see your point. here asserted defines a state and not a way of ensuring a certain condition is met (as in higher level languages). yes "initialised to 1" would also convey same meaning. i hope it's clearer now. the term asserted is often used in electronics and i can see how it can be misleading.

ddevault · on Oct 7, 2018

Thanks for clarifying!

utborin · on Oct 7, 2018

The terminology is different at this level. Asserted means the lines are set to whichever voltage is logically 1.

0xcde4c3db · on Oct 7, 2018

To me, "asserted" implies a control signal whose semantics aren't numeric. That is, it's describing a true/false or on/off state ("is this condition present?") rather than a binary digit. For an address or data bus I think it's fine to just use "1".

jamiek88 · on Oct 7, 2018

[flagged]

ddevault · on Oct 7, 2018

I don't think it's appropriate for you to bring beef from another thread into an unrelated one, nor do I think your characterization of my behavior in that thread is accurate.

jamiek88 · on Oct 7, 2018

Point taken. You are correct.

Too late to edit now but I apologize.

craftyguy · on Oct 7, 2018

Asking for clarification is being an 'asshole'? Man, get off your high horse.