Chrome, multicore and embarrassing parallelism

MicahWedemeyer · on Sept 4, 2008

To take advantage of multiple cores doesn’t require multiple (heavyweight) processes. A single multithreaded process would do just as well.

Not to say it’s a bad idea, just that we’ve got to remember that multiple processes is nothing revolutionary. A single thread for each tab would probably work just as well from a parallelization standpoint, but you’d lose the ability to kill one without bringing down the entire browser.

lallysingh · on Sept 4, 2008

Helps in a few ways:

1. Fewer synchronization points. You now have a two-level hierarchy (threads & procs) vs just 1 (just b/w threads). You can reduce the scope of the hardest-hitting syncs (e.g. malloc) to contend within a smaller arena.

Anyone well-versed in Win32 know if there are any GDI-related advantages to different processes? Synchronizing graphics context accesses is another PITA, which may be avoidable here.

2. Resiliency: a single bad plugin or V8 bug need not take down the entire browser. Just a self-closing tab.

3. Security. (not in this version of the browser) The supervisor proc may be able to set per-process access controls (e.g. RBAC) to keep the tabs in check. E.g. keeping activeX in control, while allowing the in-house controls to function with as much access as they need.

A lot of OS functionality is process based, and using them allows for a lot of open space for new possibilities.

sanj · on Sept 4, 2008

I agree. However, it seems like a fundamentally simpler to build a share-nothing system than a threaded one.

anamax · on Sept 4, 2008

Yes, but "share-nothing" almost always has some sharing. (If it doesn't, it can be run somewhere else, by someone else, at some other time, including "never".)

iman · on Sept 4, 2008

yeah, chrome tabs are not "shared-nothing".

They have a common document cache, a common cookie store, a common history store, probably a common dns lookup result cache, and maybe some other things. These all need to be carefully synchronized between multiple tabs.

In fact, I would argue that multi threaded would be better than multi process so that even more things could be easily shared. For example, I imagine that css stylesheets are parsed into some big fat data structure inside browsers. If I open two tabs from a website that share a stylesheet it would be optimal if they share this same internal representation (no locking would be required for the sharing since it's read-only). This has the obvious savings of memory, but it also increases speed since the css file only has to be parsed and processed once instead of multiple times. And things like sharing keep-alive connections between tabs are virtually impossible with multi process, while very possible with multiple threads.

pivo · on Sept 5, 2008

Except that with multi-threading you give up the OS-provided process level memory management and are back to having to track down all those memory leaks which at least Mozilla never can seem to fix satisfactorily. Even though Chrome seems plenty fast to me, I'd gladly give up some speed if I never had to close the browser and restart it because it was taking up 3/4 of available memory.

anewaccountname · on Sept 5, 2008

>probably a common dns lookup result cache

Uhh, from the OS they get that for free.

iman · on Sept 5, 2008

But nevertheless, browsers still maintain their own internal cache. Firefox: http://jelmer.jteam.nl/2006/11/04/disabling-the-firefox-dns-...

sanj · on Sept 4, 2008

Agreed.

But don't undervalue the difference in mindset. At least for me, threading makes me think "what can I peel off of my main task to run in threads" versus share-nothing which makes me think "here are the 3 shared resources that I anticipate will be bottlenecks".

anamax · on Sept 5, 2008

> At least for me, threading makes me think "what can I peel off of my main task to run in threads"

Don't do that.

> "here are the 3 shared resources that I anticipate will be bottlenecks".

Do something like that.

The difference between threads and processes is in the mechanisms for sharing. While those mechanisms affect how you organize computation (different things are cheap), they don't mandate an organization.

jmtulloss · on Sept 4, 2008

And I would argue that this is The Right Way to think about things. When you're working in a shared-nothing world, suddenly the lines you draw between components become much more important. Thinking about these lines and drawing them carefully almost always results in more modular and more flexible software.

anamax · on Sept 5, 2008

"much more important" is actually "much more expensive to cross".

Expensive lines make systems less flexible, not more. They lead to copies for efficiency, aka denormalization, which is another word for "bug waiting to happen". While this can be dealt with, doing so involves lots of nasty tradeoffs, with bugs a common outcome.

More to the point "more modular and more flexible" is neither necessary nor sufficient for producing good software.

MicahWedemeyer · on Sept 5, 2008

"more modular and more flexible" is neither necessary nor sufficient for producing good software

I'm glad I'm not the only one that thinks this way. I wish I could upmod by 1 million.

jmtulloss · on Sept 5, 2008

I would argue that it is necessary, though not sufficient. Writing black box tests for non-modular code is a nightmare.

eggnet · on Sept 5, 2008

Multiple threads don't buy you crash protection. If one of the threads crashes, the process and all threads in it are terminated.

allertonm · on Sept 4, 2008

Your claim is true only if you assume that there will be no penalty for sharing memory across cores. This is sort of true now with 2-4 core machines, but will probably not be the case as the number of cores increases.

MicahWedemeyer · on Sept 4, 2008

Can you explain? How is sharing memory any slower than each process having its own address space? I'll admit: I don't know a lot about the system bus and how a multi-processor machine accesses the memory.

rbanffy · on Sept 4, 2008

I believe the GP was not talking about shared memory spaces but shared memory buses. It would not be trivial to have 50 processors sharing a single pool of physical memory even if the processes don´t share any physical memory addresses. Making all of them agree on the contents of a shared memory space is quite a nightmare.

jmtulloss · on Sept 4, 2008

I'm not sure how they're doing it now, but AMD used to design their multi-core processors optimistically. The RAM would be partitioned between cores, and they would only communicate on a central bus if they needed memory that resided on a different core's partition.

This actually works quite well since the OS tends to schedule a process on the same core, so processes tend to always access the local memory partition.

orib · on Sept 5, 2008

If that's the way they're doing it, then it sounds bad.

Why? Because the "working quite well" is only really valid for single threads with no shared data structures. In other words, when you pretend a thread is a process.

As soon as you have more than one high load thread, the OS will want to split them across multiple processors, which means that you're now trying to share the same chunk of memory between processes. If the OS tries to keep them on the same core, though, then you've got 2 processes competing for CPU time and leaving another core free.

Then again, even turning them into processes on a modern OS wouldn't distribute the memory contention all the time; memory is usually copy on write across forked processes, which means that unless you've written the memory, reads are still contending for the same bus.

jmtulloss · on Sept 5, 2008

Threads are the same as processes to the Linux scheduler.

When the alternative is using a shared bus all the time, it works out nicely this way.

Most programs are single threaded, and even most multithreaded programs don't share that much between threads. Of course the scheduler is going to schedule across CPUs in a reasonable way, but if it makes sense to keep it on the same CPU, it does that. The point is to keep bus contention to a minimum, and this does that in the average case.

orib · on Sept 5, 2008

Yes, but the typical data access patterns differ between threads and processes.

Again, this architecture makes sense for the "lots of totally independent processes" case. The problem is that this case isn't as common as you'd expect. on Linux, if you fork a process, you're sharing memory between them until you write to it. in threads, you're sharing all read-only data unless you've explicitly duplicated it.

cx01 · on Sept 4, 2008

I don't really get what he's saying. Most of the time you will have one active tab, which is consuming a lot of CPU, and all the other tabs in the background will do nothing. Dividing those background tabs onto multiple cores won't bring any significant performance enhancement.

dabeeeenster · on Sept 4, 2008

If you have a tab in the background that has flash elements (i.e. most sites with animated ads these days) they will continue to run, even if the tab is hidden. This will have a knock on performance hit.

rbanffy · on Sept 4, 2008

I really think that there should be a way to shut down "smart" content that´s invisible as not to steal CPU cycles from other tasks. I don´t think Flash content is that much important you need to run it all the time.

wmf · on Sept 4, 2008

This is already done in Safari (and probably other browsers): http://webkit.org/blog/96/background-music/

In both Safari 2 and WebKit nightlies, GIFs don’t animate unless they are being painted somewhere. If an animated GIF becomes invisible, then the animation will pause and no CPU will be consumed by the animation. Therefore all animated images in a background tab will not animate until the page in that tab becomes visible. If an animated GIF is scrolled offscreen even on a foreground page, it will stop animating until it becomes visible again.

Many plugins do animation and work based off being pumped “null events” in which they do processing. The faster you pump these events, the faster animations will occur, and the more CPU will be used. Safari 2 actually throttles these events aggressively to background windows and background tabs.

jmtulloss · on Sept 4, 2008

I run a music site, and I hate that safari does exactly that. I have to bring the tab to focus if I want it to move to the next song correctly. You'll notice the same behavior on Pandora.

olefoo · on Sept 4, 2008

That's not true for me, I usually have a couple of browser tabs open for email and news that are doing things in the background, And a lot of the control panels for various things are starting to update continuously as well.

lallysingh · on Sept 4, 2008

Flash animations come to mind.

As well as a background Pandora tab.

iamelgringo · on Sept 4, 2008

I don't ever recall a time when a web page that I was loading pegged my CPU. I agree that tabbed browsing is embarrassingly parallel. But, are web pages particularly CPU bound? If anything, it's the bandwidth that's throttling the development issue.

The hard core javascript guys I know talk about developing interesting client side code compare it to developing the old 8 bit computer games, and trying to do something really cool and interesting with 64K of memory. They're trying to do something interesting and bundle it up in 64K of code, so the download speeds don't cause people to bounce to another site.

wmf · on Sept 4, 2008

I don't ever recall a time when a web page that I was loading pegged my CPU.

I see this occasionally due to browser bugs. (This is tautological, since I define pegging the CPU as a bug.) In some sense Chrome is the first postmodern browser: instead of trying to eliminate bugs it lets you kill -9 tabs.

fauigerzigerk · on Sept 5, 2008

I'm not sure if that makes any sense. The bottleneck in terms of parallelism on the client is me, the user. Predictably, what all those cores will do 99.999% of the time is wait for me and wait for my ISP to increase throughput. The multi process model in chrome (that IE has as well, on a per window basis) is meant to reduce the effects of crashes, not increase parallelism.

axod · on Sept 4, 2008

I'd really question the theory that #cores will grow exponentially.

Be interesting to see some evidence that's likely to happen.

rbanffy · on Sept 4, 2008

Niagara... cough... Larrabee... cough...

If you really think Larrabee is only a GPU, think again. Each core is a full 64-bit x86 that can quite happily run anything that´s running on my Core 2 Duo.

13ren · on Sept 5, 2008

Moore's Law is exponential

If you believe it will continue, and that clock speeds won't increase, then you get exponentially increasing #'s of cores (or some other use of silicon estate).

An alternative scenario is that there will be no mainstream demand for more computing power, and only niche markets will buy them. The mainstream will favour cheap, less powerful CPU's. The rise of eeePC and clones is suggestive of this scenario.

anewaccountname · on Sept 5, 2008

>If you believe it will continue, and that clock speeds won't increase, then you get exponentially increasing #'s of cores (or some other use of silicon estate).

Only if you believe the wiring connecting the cores wouldn't explode with the number of cores.

13ren · on Sept 5, 2008

I think the wires would just be a single bus.

A single bus works fine for moving data to/from one of many memory locations. Another method of using just one bus is TCP-IP packet-style communication. Of course, there's limited bandwidth, so you want most of your computation done within a processor, not between them.

anamax · on Sept 5, 2008

If a single bus works for large numbers of processors, why are data centers wired differently?

For many problems, bisection bandwidth is a key constraint.

axod · on Sept 6, 2008

Moore's Law is more of an observation. There's nothing to suggest it can continue indefinitely forever.

sanj · on Sept 4, 2008

Now that you mention it, I'm not sure it needs to.

As long as you clear ~100 cores, I think you're done.

DabAsteroid · on Sept 4, 2008

as you clear ~100 cores, I think you're done.

Why do you think so? The 2007 book "Future Directions in Processor Design" says (p483, Ch21):

http://www.springerlink.com/content/xp811205j8523537/

If we think of the technology development as predicted in the ITRS roadmap [208], it is clear that more and more processors will be crammed onto a single chip. The prediction for the roadmap is that we will see hundreds or even thousands of precessors integrated within the next ten ... fifteen years, e.g., 424 precessing elements per chip in 2017. The trend can be confirmed by looking at some ambitious high-end projects in multi-core and multi-processor development. For example, Rapport Inc. is shipping a chip with 256 processing elements on board and is developing a 1,024-core processor, however these are only 8-bit elements [281].

...

My bet is that we will see more specialized processors for different specific tasks. The "one size fits all" simply cannot provide enough cost and power efficient enough solutions for the embedded sector. Thus, we will see various special-purpose off-the-shelf cores emerging.

sanj · on Sept 4, 2008

Sorry, that was unclear. I was thinking more that few people would have >100 browsers/tabs/applications open at one time.

It isn't an argument about the technology, rather about what humans will be able to manage.

hhm · on Sept 4, 2008

Your argument sounds a lot like "64k should be enough to anyone"; I guess we'll only need more and more processing power (and processes aren't necessarily applications, as in the case of Chrome -and you could also be using threads for that matter-, so your argument doesn't follow anyway).

0x44 · on Sept 4, 2008

While I currently only have about twelve tabs open, in the past I have gone above 100. I tend to use tabs to store things I will look at in the recent future, and also as a way to follow threaded comments, or links within articles.