[dupe] What happens when you type Google.com into your browser's address box (github.com/alex)
130 points by ceocoder on March 12, 2015 | 43 comments




I mentioned the gethostbyname call during an interview with this question. The interviewer reminded me that gethostbyname was deprecated and we had a short discussion about the recently discovered vulnerability in the glibc implementation of it.


It has been obsoleted in favour of getaddrinfo (getnameinfo is the replacement for the reverse lookup, gethostbyaddr). The vulnerability in glibc has been fixed.
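For anyone who wants to see a forward lookup through getaddrinfo, here is a minimal sketch using Python's socket wrapper around it. The helper name `resolve` and the default port are made up for illustration. Passing AF_UNSPEC is what makes the call protocol-independent, so the same code would return IPv6 results where they exist.

```python
import socket

def resolve(host, port=443):
    """Return (family name, address) pairs reported by getaddrinfo."""
    results = []
    # AF_UNSPEC asks for every address family the resolver knows about.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM):
        results.append((socket.AddressFamily(family).name, sockaddr[0]))
    return results

# A numeric literal is parsed locally; a hostname would go through the
# system's configured name services instead.
print(resolve("127.0.0.1"))
```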


Don't forget key pinning checks... (google.com will be in browsers' preload lists, but for other domains, there's an hpkp check)
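A rough sketch of what such a check boils down to, assuming HPKP-style pins (base64 of the SHA-256 digest of each certificate's DER-encoded SubjectPublicKeyInfo). The function names and the placeholder byte strings are hypothetical. A real browser would extract the SPKI from each certificate in the validated chain.

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")

def chain_is_pinned(chain_spkis, pinned):
    """True if any key in the chain matches any pinned value."""
    return any(spki_pin(spki) in pinned for spki in chain_spkis)

# Placeholder byte strings stand in for real SPKI structures.
leaf = b"leaf-key-placeholder"
intermediate = b"intermediate-key-placeholder"
pins = {spki_pin(intermediate)}

print(chain_is_pinned([leaf, intermediate], pins))
print(chain_is_pinned([leaf], pins))
```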


Pretty cool.

Since everyone is going to mention what it misses, here's this pet peeve of mine: when I type Google.com it redirects the URL to google.[countrycode] and then I have to jump through hoops to get the .com, the English version and the global results.


go to www.google.com/ncr (no country redirect) and it won't redirect you any more.


Not working for me. I still get redirected to google.co.th


Do you have cookies enabled?


Thank you!


The DNS, ARP, and Socket sections need an update; they're all IPv4-specific.


I'd also like to see what's going on within the Google servers and data centers. Also, a flow diagram would be awesome. I am hoping to show this to my wife to have her appreciate the engineering involved.


Despite how long this is, it leaves out very many details. I'd like to see a description that goes in detail into protocols, code and circuitry signals at every step.


This is still very much a work in progress: there are 12 open pull requests and lots of issue reports. When I first saw this project a few weeks ago it was much smaller and less complete, so I'm looking forward to seeing what it will look like in a couple more weeks.


I got asked this as a Google interview question. I'm assuming that's why it's here, though it's a fun question by itself.


It seems to forget debouncing by the computer's BIOS, or is that now handled by keyboard hardware?


By the computer in the keyboard. It always was on the PC.

The keyboard is supposed to send properly debounced MAKE/BREAK codes. It also handles autorepeat and knows that some keys are shift keys and lock keys.

It even handles the consequences of new developments and backwards compatibility: the extra ctrl and alt keys couldn't get new keycodes because then existing programs wouldn't recognize them. The solution was to send a special code before MAKE/BREAK of right ctrl/alt.

And it gets worse: there wasn't a separate set of keys for arrows/PgUp/PgDn/Home/End/Ins/Del at first. All we had was the numeric keypad. Whether you got numbers or arrows depended on the xor of num lock and the shift key. So the arrow keys actually share their keycode with the numbers on the numeric keypad! Again the solution involved a special code being sent before an entirely fake MAKE for one of the shift keys, followed by the keycode for one of the numeric keypad keys (MAKE or BREAK), followed by the special code and a fake BREAK for the shift key. And of course, whether you get the synthesized shift key events depends on whether you already have a shift key pressed and on the Num lock status. You can get them for either of the keys sharing the keycode.

Entirely new keys were a lot easier (F11, F12): they just got a new keycode.
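To make the prefix trick concrete, here is a small decoder sketch. The byte values are scan code set 1 as I remember them (0xE0 as the special prefix, bit 7 set for BREAK), so treat the tables as assumptions rather than a reference. The key names and the `decode` function are made up for illustration.

```python
# Assumed subset of scan code set 1, for illustration only.
KEYS = {0x1D: "Ctrl", 0x38: "Alt", 0x2A: "LShift",
        0x48: "Keypad8", 0x57: "F11", 0x58: "F12"}
# The same codes mean something else when preceded by the 0xE0 prefix.
EXTENDED = {0x1D: "RCtrl", 0x38: "RAlt", 0x2A: "FakeShift", 0x48: "Up"}

def decode(stream):
    """Turn a byte stream into (MAKE/BREAK, key name) events."""
    events = []
    extended = False
    for byte in stream:
        if byte == 0xE0:          # prefix applies to the next byte only
            extended = True
            continue
        code = byte & 0x7F        # low 7 bits identify the key
        action = "BREAK" if byte & 0x80 else "MAKE"
        table = EXTENDED if extended else KEYS
        events.append((action, table.get(code, hex(code))))
        extended = False
    return events

# Right Ctrl pressed and released: an old program that ignores the
# prefix byte still sees a plain Ctrl MAKE/BREAK.
print(decode([0xE0, 0x1D, 0xE0, 0x9D]))
# Up arrow wrapped in a synthesized shift MAKE/BREAK pair.
print(decode([0xE0, 0x2A, 0xE0, 0x48, 0xE0, 0xC8, 0xE0, 0xAA]))
```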


On old Windows versions, you could get even more complicated backwards compatibility fun.

The 8086/8088 only had a 1MB address space that wrapped around: if you used a high enough segment value with a high enough offset, you would actually refer to the beginning of memory.

The 80286 had a much larger address space: a whole 16MB! But when running in compatibility mode (where you could only refer to 1MB), the addresses no longer wrapped around. The result was that you actually had an address space of 1MB+64KB (minus 16 bytes). IBM wanted better compatibility than that so they found an unused AND gate in a TTL chip lying around somewhere and put the A20 signal from the CPU through that before it reached the bus. You could then switch A20 off at will, thus getting better compatibility. So how do you control that gate? Well, the microcontroller on the motherboard that talks to the keyboard microcontroller has an extra pin that isn't used for anything. Let's use that pin! So you switch the A20 gate on/off by sending commands to the keyboard controller.
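The arithmetic behind that is easy to check. This is a sketch of the address calculation only, not of any real hardware interface, and the function names are made up. A real-mode address is segment * 16 + offset, the 8086 keeps only 20 address bits, and the A20 gate amounts to forcing address bit 20 to zero.

```python
def linear(segment, offset):
    """Real-mode address: 16-bit segment shifted left 4, plus 16-bit offset."""
    return (segment << 4) + offset

def on_8086(segment, offset):
    """The 8086/8088 has 20 address lines, so bit 20 and up are dropped."""
    return linear(segment, offset) & 0xFFFFF

def through_a20_gate(address, a20_enabled):
    """With the gate off, address line 20 is forced low."""
    return address if a20_enabled else address & ~(1 << 20)

top = linear(0xFFFF, 0xFFFF)
print(hex(top))                             # highest reachable address on a 286
print(hex(on_8086(0xFFFF, 0xFFFF)))         # wraps to low memory on an 8086
print(hex(through_a20_gate(top, False)))    # gate off: same wrap as the 8086
print(top + 1 == 2**20 + 2**16 - 16)        # 1MB + 64KB minus 16 bytes
# With the gate off, odd megabytes alias the even megabyte below them.
print(hex(through_a20_gate(0x300000, False)))
```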

Fast forward some years. We now have a decentish version of Windows that is basically running on top of DOS. It has extensive compatibility with DOS in many ways: not only can it run (and multitask) DOS programs inside it, it can also use DOS device drivers and TSRs loaded before Windows. It does this by merrily switching back and forth between so many different CPU modes that it makes you dizzy. It needs to run with the A20 on, otherwise you would get a really funny address space where every other megabyte was inaccessible, which would be both clumsy and wasteful. And device drivers and TSRs loaded into the 64KB right above the first megabyte would also need A20 on to run. On the other hand, there might be other real mode code in the system that needs A20 to be off! So Windows could switch it on and off while running. It doesn't quite involve the keyboard but it is close... and sometimes a bad keyboard (with bad code in its microcontroller) would distract the microcontroller on the motherboard so much that the A20 switching was slow or even faulty. In that case you could fix your machine by switching keyboard!


Also a great read, "Dizzying but invisible depth" by Jean-Baptiste Quéru

https://plus.google.com/+JeanBaptisteQueru/posts/dfydM2Cnepe



It would be interesting to see what Google does when you visit google.com, because a lot is being done behind the scenes. But still an interesting read.


Disclaimer: not representing anybody, my opinion not anybody else's. I work at Google, and my team looks after this.

Here: https://www.youtube.com/watch?v=DWpBNm6lBU4

I can't tell you anything that's not in that video.


The post misses out the bit where the NSA scan your text and keep it in a big database, ready to use it against you any time you step out of line.


I was wondering how long it would take until someone made a witty NSA comment.


I am still waiting.


gethostbyname uses /etc/nsswitch.conf to determine how host names should be resolved. The document describes /etc/hosts and DNS, but there can be more.

On modern systems, it is likely that multicast DNS is performed to resolve local names before going to DNS.

I actually pointed this out in my Google interview and the interviewer instructed me to skip the Name Service Switch part.
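To illustrate the lookup order, here is a sketch that pulls the hosts sources out of nsswitch.conf-style text. The sample contents are an assumption (a layout often seen on desktop Linux systems with mDNS installed), not taken from any particular machine, and `hosts_sources` is a made-up helper.

```python
import re

# Sample contents, assumed for illustration.
SAMPLE = """\
# /etc/nsswitch.conf
passwd:  files
hosts:   files mdns4_minimal [NOTFOUND=return] dns
"""

def hosts_sources(text):
    """Return the lookup sources on the hosts line, in the order listed."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # Bracketed groups are status actions, not lookup sources.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []

print(hosts_sources(SAMPLE))
```

Here "files" corresponds to /etc/hosts, and the mDNS source is consulted before "dns".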


I wrote this in response to a snarky comment that the omnibox should be implementable with zero allocations:

https://gist.github.com/greggman/18953af231c9bf79cb12


I still think the correct answer is "Google.com webpage appears."


There is no "correct" answer. Explanation is multifaceted: explanations of one kind are grounded-in or reduce-to or supervene on explanations of another kind. Each kind of explanation explains different things.

There might be a "final" explanation in the sense that we talk about quarks, strings, etc. - an explanation which doesn't explain very much but is irreducible.


Question: Where is text rendered? on the gpu or cpu?


That depends on a lot of factors. Regardless, even if text is ultimately rendered on the GPU, it's first generated on the CPU: glyphs are rasterized and put in a glyph cache. Those glyphs might be cached in textures and then used to render text with the GPU as appropriate.
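A toy sketch of the glyph cache idea, with everything here hypothetical: `fake_rasterize` stands in for a real CPU rasterizer, and a real renderer would store bitmaps in a texture atlas rather than strings in a dict. The point is only that each distinct glyph is generated once and reused afterwards.

```python
class GlyphCache:
    """Rasterize each (font, size, char) once, then reuse the result."""

    def __init__(self, rasterize):
        self._rasterize = rasterize
        self._bitmaps = {}
        self.misses = 0

    def get(self, font, size, char):
        key = (font, size, char)
        if key not in self._bitmaps:
            self.misses += 1                      # CPU work happens here
            self._bitmaps[key] = self._rasterize(font, size, char)
        return self._bitmaps[key]

def fake_rasterize(font, size, char):
    # Stand-in for real rasterization; returns a label, not pixels.
    return f"<{font}:{size}:{char}>"

cache = GlyphCache(fake_rasterize)
run = [cache.get("sans", 12, c) for c in "google"]
print(len(run), cache.misses)   # six glyphs drawn, four rasterized
```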


That depends on the browser renderer. However, I suppose most Windows browsers use Direct2D for 2D rendering, which is Microsoft's 2D API for the GPU.


You start using a non-open source search algorithm?


When I type 'g', the autocomplete feature suggests 'gmail.com'. After typing 'g', I type 'oo' and the suggestion feature suggests 'google.com' on its own. I do not have to type the complete 'google.com'. I type 'goo' and then I can press enter.
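That behaviour can be sketched as a prefix search over browsing history, ranked by visit count. The history entries and counts below are invented, and real browsers use much richer scoring. This only shows why 'g' and 'goo' can give different suggestions.

```python
import bisect

# Hypothetical history: (url, visit count), sorted by url.
HISTORY = sorted([("github.com", 40), ("gmail.com", 90), ("google.com", 75)])
URLS = [url for url, _ in HISTORY]

def suggest(prefix):
    """Return the most visited url starting with prefix, or None."""
    # In sorted order, all urls sharing a prefix sit next to each other.
    start = bisect.bisect_left(URLS, prefix)
    matches = []
    for url, visits in HISTORY[start:]:
        if not url.startswith(prefix):
            break
        matches.append((visits, url))
    return max(matches)[1] if matches else None

print(suggest("g"))
print(suggest("goo"))
```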


The article chooses as its starting point the ENTER key hitting the bottom of its travel -- how you got the url or whatever into the address bar doesn't really matter.


Yes it matters. Autocomplete (either local or 100ms network database lookup) is a whole other layer of fascinating computer science+engineering.


It does not matter for the purposes of this project. You could go even further back and look at how the browser renders the location bar, how the OS starts the browser, how the machine boots the OS.... None of those matter, even if they are interesting in their own right.


Now I want to be asked this question at an interview, so I can describe what happens from the start of the process: not the Enter key bottoming out, but from the thought and prefrontal cortex activity; to the motor cortex; describing the action potentials in nerves and how they work, neurotransmitters, charge-hopping via nodes of Ranvier and so forth; all the way down to actuating the muscles and the molecular biology in that; actuating the bones, then onto bottoming out the Enter key...

... then finishing up with the eyes receiving the data; edge-detection in the retina [1]; transfer to the visual cortex and the several layers in the build-up there; eventually finishing up with the sense of disappointment you get when you realise you typed in a misspelling that google couldn't correct for.

[1] Pandemonium (http://en.wikipedia.org/wiki/Pandemonium_architecture) is my favourite theory of anything anywhere. Basically it sees the nerves as a bunch of simple-minded demons yelling out, and the bigger demons hear the loudest below them and yell out, so forth up the chain :)


<off topic rant> What happens is that Google decides to show me the Google search engine in exactly the language I did not want, despite all my settings on all my accounts specifying one and only default language.

It will do so even if I explicitly type google.de, google.nl or google.fr. Whatever language I expect or want, I will get a different one.

This makes me suspect all requests to google.com are routed through a routine with the name "because_fuck_you()".

To add insult to injury, this does not just affect the UI language, but also the content, functionality and sometimes even navigation, and not always in the straightforward relationship between language and content you may expect.

And I'm not even one of those poor souls that lives in a multilingual country... </rant>


This is why I never type google.com into my browser. So annoying.

My current workaround is a search provider with

    https://www.google.com/search?num=50&tbs=li:1&fg=1&gws_rd=cr&q=
http://ready.to/search/en/?sna=goo-v&prf=https%3A%2F%2Fwww.g...


Living in a trilingual country here. I suggest you try google.com/ncr to get English Google wherever you are in the world.


I use https://encrypted.google.com/ which lets me stay on the English version.


That works for search but do you have one for maps and their other sites (like blogger)?


Checking only with curl here, they seem to respect the "Accept-Language" header. Have you had a look at the requests that are being sent?
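For anyone who wants to repeat the experiment without curl, a minimal sketch with Python's urllib follows. The helper name and the language string are just examples, and nothing here says how the server will respond. The request is only built, not sent.

```python
import urllib.request

def build_request(url, languages):
    """Build a GET request carrying an explicit Accept-Language header."""
    return urllib.request.Request(url, headers={"Accept-Language": languages})

req = build_request("https://www.google.com/", "en-US,en;q=0.9")
# urllib normalizes header names to "Accept-language" internally.
print(req.get_header("Accept-language"))

# To actually send it and read the response:
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```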



