Meteor 0.3.9 adds search engine optimization (meteor.com)
87 points by debergalis on Aug 9, 2012 | 42 comments


Their technique for generating the HTML representation of a deep link into a Meteor app is to run the entire client app in a headless browser and serialize the generated DOM?!
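
In PhantomJS terms, the whole trick is roughly this (a sketch, not Meteor's actual code; the URL and the 1-second wait are placeholders):

    // load the page in headless WebKit, let the client app render,
    // then serialize whatever ended up in the DOM
    var page = require('webpage').create();
    page.open('http://example.com/some/deep/link', function (status) {
      if (status !== 'success') { phantom.exit(1); return; }
      // crude: give client-side rendering some time to finish
      setTimeout(function () {
        console.log(page.evaluate(function () {
          return document.documentElement.outerHTML;
        }));
        phantom.exit();
      }, 1000);
    });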

This is an area of vital importance to public, JS-based RIAs, and needs some real innovation. Why even bother delivering this half-baked solution? The processing cost makes it untenable for all but the tiniest of URL-spaces.


Yeah, it's definitely the kind of solution that makes you twitch a bit. The reason we wrote it is that we needed it now for meteor.com, and it wasn't too much engineering work (I think Nick wrote the first version over a weekend, and then it was a couple of days of packaging it up). The reason we released it is, well, why not? :)

The "real solution" is coming, and we will get it right. It's connected to URL routing, and sending down initial HTML on page load. We're all excited for the day when Meteor apps initialize the client session on the server. It will look at first glance like a traditional server-side app. :)


Did you guys think about just building a normal website?


Did you look into using Zombie instead of PhantomJS? It advertises itself as a headless full-stack testing utility, but I have come to rely on it for integrating headless browsing into node.js apps.
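
A minimal sketch of that kind of integration (the URL is a placeholder):

    // drive Zombie from a node.js app: fetch a page, run its
    // scripts, and read back the resulting DOM
    var Browser = require('zombie');

    var browser = new Browser();
    browser.visit('http://localhost:3000/', function (err) {
      if (err) throw err;
      console.log(browser.html()); // serialized DOM after client JS has run
    });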


Not affiliated with meteor but I have a similar setup.

I have tried Zombie, but there were various issues getting it to work. PhantomJS was just much less painful.


+1 for Zombie. I have been using it for acceptance testing with mocha and it works quite well for emulating the browser environment. The nice thing is it's just an npm module that you can require, and it works within the JS environment.
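
Something like this, roughly (the URL, selector, and expected text are made-up examples):

    // acceptance-test sketch: mocha + zombie
    var assert = require('assert');
    var Browser = require('zombie');

    describe('home page', function () {
      var browser = new Browser();

      before(function (done) {
        browser.visit('http://localhost:3000/', done);
      });

      it('renders the headline', function () {
        assert.equal(browser.text('h1'), 'Welcome');
      });
    });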


> Why even bother delivering this half-baked solution?

Because we used it for our funding announcement at the end of last month -- a full press cycle, where the Andreessen Horowitz and Matrix Partners press machine pushed our story out to all of the tech blogs, with all of the traffic that that implies -- and not only did it work fine, it got us to #1 on Google for "meteor." Above, you know, actual meteors :)

It works fine for us as a stopgap measure and we wanted to share it with others. We put it in an optional smart package that isn't included in new projects by default.

We're near the end of a major rewrite of Meteor's page update engine. You can see the latest progress on the 'spark' branch. One thing that happened during this rewrite was the conversion of Meteor's templating to be 100% string-based. Check it out: go to meteor.com/faq, open your browser console, and evaluate "Template.faq()".

This means that the server can render the templates for your app without having a DOM implementation of any kind (much less a headless client). 'spiderable' doesn't do this yet, but it will by Meteor 1.0 (if 1.0 even has a separate package).
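
Once templates are plain string functions, serving a crawler could be as simple as this (a hypothetical sketch assuming an Express-style handler; this is not a shipping API):

    // no DOM, no headless browser: Template.faq() is just a
    // function that returns an HTML string
    app.get('/faq', function (req, res) {
      res.send('<!DOCTYPE html><html><body>' + Template.faq() + '</body></html>');
    });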

For the record, the ratio of our investment in auth and accounts, to our investment in Spark, is about 2:1.


> Above, you know, actual meteors :)

Elegant proof, if any were needed, that so-called SEO is merely tricks for polluting search indexes. As bad as spammers.


Search engines are all about what people are looking for on the web. It's quite possible that more people are currently interested in Meteor the framework than meteors the objects, in which case I wouldn't call this result "pollution."

If you really want the Wikipedia page for meteors, nothing's stopping you from going straight there.


I very much doubt that, with the Perseid shower due this weekend.


My top 2 hits were meteor.com and meteor.ie (a mobile communications provider in Ireland) and then space debris. It would seem domain names have a lot of SEO weight, an important takeaway for startups.


Never mind the cost. They are proposing that you run your app, with its 3rd-party jQuery/Facebook/Twitter/Google code loaded dynamically over non-SSL connections, on a platform with filesystem write access to your server.


It's no less safe than loading hostile 3rd-party web pages on your browser at home.


It is much less safe. PhantomJS has a filesystem API, which is fine when you consider that its primary use case is testing code you wrote and reporting the results. However, given that it is fairly easy to create a bridge between the PhantomJS and WebKit JavaScript runtimes, or to exploit common patterns for making such a bridge, running arbitrary 3rd-party code, loaded over a connection not verified by an SSL cert chain, is asking for trouble. This is an obvious backdoor to get write access to Meteor servers.


You are so wrong that you are dangerous.


No, I'm not.

I think people's confusion lies in the fact that there are actually two separate contexts where JavaScript runs in Phantom: one is the JavaScript that controls Phantom and has a filesystem API; the other is the JavaScript that runs inside the browser sandbox as part of the web page, just like any other JavaScript in any other browser. It is possible to set up a bridge between the two such that the latter can issue commands to the former, just as you can curl sites and pipe them into bash. The point is that with default settings you can use PhantomJS to load a website without any danger whatsoever.
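
To make that concrete, here's a sketch (PhantomJS 1.6+ APIs; the file path and URL are placeholders):

    var fs = require('fs');                  // control script: has fs access
    var page = require('webpage').create();

    // the bridge only exists if the control script wires it up:
    page.onCallback = function (data) {
      fs.write('/tmp/out.txt', data, 'w');   // now page JS can reach the fs
    };

    page.open('http://example.com/', function () {
      page.evaluate(function () {
        // inside the sandbox: the only way out is window.callPhantom,
        // and only because onCallback was defined above
        if (window.callPhantom) { window.callPhantom('hello from the page'); }
      });
      phantom.exit();
    });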

If you disagree, please write a more worthwhile comment showing me which part of the API is dangerous.


Seems like this would be a very easy way to DoS a Meteor app, if it's really spawning a PhantomJS process for each request.


It is. And the server doesn't even feed PhantomJS the HTML that it can generate; PhantomJS has to make another request to the server. I wonder if it can be made to recurse? https://github.com/meteor/meteor/blob/master/packages/spider...
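
For reference, the shape of the pattern is roughly this (a simplified sketch assuming an Express-style app, not the actual source; 'render.js' stands in for the PhantomJS control script):

    // when a crawler asks for ?_escaped_fragment_=..., shell out to
    // phantomjs and return whatever it prints
    var child_process = require('child_process');

    app.use(function (req, res, next) {
      if (req.query._escaped_fragment_ === undefined) return next();
      // second request back to ourselves -- without the fragment,
      // so in this sketch it takes the normal (non-spidered) path
      var url = 'http://localhost:3000' + req.path;
      child_process.execFile('phantomjs', ['render.js', url],
        function (err, stdout) {
          if (err) return next(err);
          res.send(stdout);
        });
    });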


If this content is mostly for spiders, I don't see why you couldn't cache the resulting HTML for an hour or so and only regenerate pages when the cache expires.
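
Something as simple as this would take most of the load off (a sketch; the TTL and render callback are placeholders):

    // memoize the rendered HTML per URL with a one-hour TTL
    var cache = {}; // url -> { html: ..., ts: ... }
    var TTL = 60 * 60 * 1000;

    function getRendered(url, render, cb) {
      var hit = cache[url];
      if (hit && Date.now() - hit.ts < TTL) return cb(null, hit.html);
      render(url, function (err, html) { // e.g. the phantomjs call
        if (err) return cb(err);
        cache[url] = { html: html, ts: Date.now() };
        cb(null, html);
      });
    }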


It actually is possible to render those pages in Phantom at scale, but it's not very elegant. It seems like something like env.js would be sufficient to render the page, though. I think other people have done that, although I haven't seen a framework that makes it natural. Meteor and others would be in a great position to build such a framework.


Agreed that it needs some real innovation, something better than this. Consider it just a start: the fact that the spiderable package is one of the first ones out means that it is something the Meteor team cares about trying to solve. It's certainly not perfect, but it paves the way for future iterations.


Agree. This is a horrible solution. Sigh...


Who really needs this for their web app? Something like 99% of heavy web apps require a login, so Google is out of the picture anyway.

Anyone who is building a content site with DOM-manipulating JavaScript doing all the work has completely lost their way. Seriously, just render your templates on the server and deliver them to the client. Why does the world want to app-ify everything?


Fat client apps are fantastic for making some kinds of UI interactions trivial.

The simplest example is the checkout form, or any kind of wizard linking multiple pages together. In a fat client app, all the state for all N pages of your checkout cart is in the same page, with 5 lines of code to switch between them. Doing that in the standard MVC "fat server" model is annoying, and about 10x more code.
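
e.g., the entire "router" for an N-step wizard can be something like this (a sketch; the class name is made up):

    // all checkout steps live in the same page; "navigation" is just
    // toggling which <div class="step"> is visible
    function showStep(n) {
      var steps = document.querySelectorAll('.step');
      for (var i = 0; i < steps.length; i++) {
        steps[i].style.display = (i === n) ? 'block' : 'none';
      }
    }
    showStep(0); // start at the first step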

> Why does the world want to app-ify everything?

Think about GUI applications pre-web. They weren't written as servers that generate PDFs, with an embedded scripting language. The current webapp technology stack is a complete accident, and if we had actually sat down to design the "optimal" stack, it would look nothing like what we have.


I completely agree with you that in some (many) cases it's useful.

But your checkout example proves my point; you wouldn't want a bot crawling through there.

Bots should be crawling content-rich pages (blogs, articles, marketing pages), and IMO those should rarely be handled by fat clients.


I have a site with lots of statistics, tables, charts and similar stuff. As you can understand, fetching all that data from the database to render the analytical charts is fairly expensive. My solution is to first render the page template and send it back to the client as quickly as possible with all statistics widgets empty. Then the page makes a few ajax calls to fetch the data from the server to render the pie and bar charts widgets.

It works very well for a human visitor. The page loads extremely quickly, and the widgets rendered using JavaScript and ajax are below the fold anyway, so it doesn't matter that they become visible 500 ms after the initial page load. Unfortunately it is crap for Googlebot, which never runs the ajax calls and never "sees" my pretty graphs, which leads it to think I have a much more boring site than it really is.
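
In code, the pattern is roughly this (a sketch; the /api/stats endpoint and render helpers are made up, jQuery for brevity):

    // ship the page with empty widgets, then fill them in via ajax
    $(function () {
      $.getJSON('/api/stats', function (data) {
        renderPieChart($('#pie'), data.pie);   // placeholder helpers
        renderBarChart($('#bars'), data.bars);
      });
    });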

So that is my need to "app-ify" my site, and my as-yet-unfulfilled need for a framework that can provide Googlebot with an accurate view of my app-ified site.


Because server architecture and rendering are really expensive to scale, whereas a client-side-dependent app can be supported for much less. It is a trade-off.

The presentation layer has been moving to the client side for the past few years. Where have you been?


Rendering templates on the server is not expensive at all; that is completely untrue.

I've been keeping up like any developer, in fact I'm writing a Backbone app as we speak. But like all of the apps I write, Google doesn't need to crawl it.


> Rendering templates on the server is not expensive at all; that is completely untrue.

Vaguest statement ever. It depends on the complexity of the template and the templating system, and then that file has to be served, either directly or through a cache (complexity++). It is harder to scale a server-side presentation layer.

If you don't think that is the case please expand on your statement...

Edit: please don't cite Twitter as a case; they are not the norm.


Why is this interesting? Because 1) search engine crawlability matters and 2) the more AJAXy web apps get, the harder it is to make them crawlable.

The more we move away from traditional web "pages" to rich web apps that do everything through DOM manipulations on a single page, the harder it is for the search engine robots to crawl what we build.


Google's making moves towards having their crawler essentially be a headless Chrome instance. Crawling AJAXy apps is rapidly going to get easier for Google et al.


Easily the funniest link on this site this year.


Yeah, I mean, sure, SEO isn't exactly rocket science... but it's a little more than making a URL crawlable...


Being crawlable might not be sufficient, but it sure is necessary! :D


Wait, they added SEO before database authentication? So very logical.


You can try out the prerelease auth feature right now if you want. A bunch of projects are using it.

https://github.com/meteor/meteor/wiki/Getting-Started-with-A...


Other core devs are working furiously on auth (getting pretty close). This feature was just releasable first.


Sounds good. Despite my first post, Meteor is the most interesting framework out there atm, IMO.


Relax. It's the 0.3.9 preview. C'mon...


If they can tell which pages are public vs. private (which I think they can), they could just tell the client that they need a copy of the page once it's done rendering, have it post the serialized DOM back to the server, and then cache and serve that until the next redeploy.
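
Roughly, both halves of that idea (a hypothetical sketch; assumes jQuery on the client and an Express-style server with body parsing):

    // client: once the app has finished rendering, post the DOM back
    $.post('/snapshot', {
      url: location.pathname,
      html: document.documentElement.outerHTML
    });

    // server: cache it and serve it to crawlers until the next deploy
    var snapshots = {};
    app.post('/snapshot', function (req, res) {
      snapshots[req.body.url] = req.body.html;
      res.end();
    });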


So after a year of developing this product, they finally realized that each page should have a unique URL.

It's called a lack of vision.

PS: ajax content does not rank in Google SERPs at all; this is a typical band-aid solution, so websites made with Meteor will have some serious issues with monetization and stuff.


I know this isn't 100% on-topic, but does anyone know if you can choose your own user agent when using PhantomJS, such as Firefox or IE?
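
For what it's worth, PhantomJS does let the control script override the UA string per page (the Firefox UA below is just an example):

    var page = require('webpage').create();
    page.settings.userAgent =
      'Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0';
    page.open('http://example.com/', function () {
      phantom.exit();
    });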



