I'm currently developing an open-source framework designed for theoretically infinite scalability, built on round-robin DNS, S3, EC2, S3DFS (which lets you mount an S3 bucket as a local file system on multiple EC2 instances), and SQLite (a serverless, embeddable database where each database is just a flat file).
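For context, "round-robin DNS" here just means publishing multiple A records for the same hostname, so requests get spread across the EC2 instances. In BIND zone-file terms it might look like this (hostname and IPs are hypothetical):

```
; one name, several A records -- resolvers rotate through them
www   IN  A  10.0.0.11
www   IN  A  10.0.0.12
www   IN  A  10.0.0.13
```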
S3DFS works at the block level and has read/write caching, so it's really fast. However, it requires a commercial license for non-personal use.
SQLite is amazing in its power and simplicity. It will start to have issues on a high-traffic website, but I'm breaking each user out into their own db file. That sounds really bad - but SQLite has a great feature that lets you attach multiple databases together and run queries across them as if they were one database (handy for site-wide stats, search indexing, etc.). Also, since the dbs are just flat files, backing up is super easy with tar and gzip!
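The per-user-db idea plus ATTACH can be sketched like this (a Python illustration rather than the framework's actual PHP code; the file names and schema are made up):

```python
import os
import sqlite3

# Start clean for the demo.
for f in ("alice.db", "bob.db"):
    if os.path.exists(f):
        os.remove(f)

# Each user gets their own flat-file database (hypothetical layout).
for user in ("alice", "bob"):
    con = sqlite3.connect(f"{user}.db")
    con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    con.execute("INSERT INTO posts (body) VALUES (?)", (f"hello from {user}",))
    con.commit()
    con.close()

# Site-wide stats: ATTACH the other user's file and query across both
# databases through one connection, using schema-qualified table names.
con = sqlite3.connect("alice.db")
con.execute("ATTACH DATABASE 'bob.db' AS bob")
total = con.execute(
    "SELECT (SELECT COUNT(*) FROM main.posts) + (SELECT COUNT(*) FROM bob.posts)"
).fetchone()[0]
print(total)  # 2
con.close()
```

The same ATTACH statement works from PHP's SQLite bindings; note SQLite caps the number of attached databases (10 by default), so site-wide queries over many users would need batching.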
I'm using PHP for the coding, but I anticipate that libraries in other languages will be built to use the system as well.
Help is welcome, just let me know if you're interested!
I was thinking of calling it infinizon - but that sounds kinda dorky - thoughts?
Update: I just came across http://rightscale.com, an amazing AWS console that lets you control almost all aspects of EC2, S3, and SQS. Too many features to list here, but definitely a must-see. I've been playing around with it and it works great.
This past week I listened in on Werner Vogels' ETech 2007 speech, and he said, explicitly, that EC2 is not suitable for high-performance, large-scale databases. My database is more than 10GB in size. You can't have an image that large on EC2. I guess you could get away with hosting some tables there.
"Each instance predictably provides the equivalent of a system with a 1.7Ghz x86 processor, 1.75GB of RAM, 160GB of local disk, and 250Mb/s of network bandwidth."
I'm not seeing where you can't host that DB on EC2. It might not be the optimal platform, but my guess is that EC2 will do a pretty good job for the money.
1) The data is not persistent -- meaning, if your server dies, you lose your database with it. Rebooting the server gives you a clean slate.
This theoretically can be addressed with a complex backup/redundancy model, but frankly, it's not worth the trouble. Alternatively, you can use a distributed filesystem based on S3, but the performance will be poor. Long story short, your database is your bread and butter, and EC2 simply can't provide the basic services required to run a DB efficiently.
2) Dynamic DNS. Extra headaches, simply not worth it.
EC2 was not designed for persistent data storage, it's a compute cloud.
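Since the SQLite databases are just flat files, the backup/redundancy model mentioned above doesn't have to be complex: periodically tar and gzip the db files and copy the archive off the (non-persistent) instance disk, e.g. to S3. A rough Python sketch (paths and bucket are hypothetical; the actual S3 upload is left as a comment since it needs credentials and an S3 client):

```python
import glob
import tarfile
import time

# Snapshot all per-user SQLite files into one compressed archive.
stamp = time.strftime("%Y%m%d-%H%M%S")
archive = f"db-backup-{stamp}.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    for db in glob.glob("*.db"):
        tar.add(db)

# Then push the archive off-box, e.g. (hypothetical bucket name):
#   s3cmd put db-backup-<stamp>.tar.gz s3://my-backups/
print(archive)
```

Run from cron, this gives point-in-time snapshots; it won't help with data written between snapshots, which is exactly the gap people are debating here.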
Re: "rebooting the server gives you a clean slate"
That's not quite true: the HD contents survive crashes and reboots. They are lost only if you release the node (or lose it otherwise -- it's not guaranteed to stay with you).
I suspect lots of startups are finding it "stable enough", and the backup/redundancy planning it requires manageable.
Are you certain? This must be new, because before you would get a fresh machine every time. Any time you boot, AMZ looks at the image you want to run and unpacks and installs it for you, meaning that even a reboot led to a clean drive. Now, there's been some talk about 'shadow copies', where you might even get access to other customers' data simply by starting a deep scan on the drive... but that's hardly persistence.
This is just another cool thing happening to reduce the cost of new software apps. Salesforce.com has the same thing with Apex. The only bad thing there is that they tie you a bit too much to their technologies (Java/Oracle). They have adapters for other platforms but it still runs on Java/Oracle.
Using it on http://onetimeline.com - EC2 for indexing nodes and web front end, s3 for data storage and awsp for webcrawler.
Couldn't have gotten it up this fast without it - 1.5 years in the making (well, fast for developing a search engine, anyway!)
I've used S3 on a few projects and would definitely endorse it. I just got accepted to the EC2 beta last week and haven't started messing around with it.
Sure, if you're using Rails it's really easy to use attachment_fu (the new generation of the image/file-storing acts_as_attachment plugin) to keep uploaded photos on S3. Also for Rails, Jamglue's technique of putting your entire static file hierarchy on S3 is great for maximizing your app servers' bandwidth. BTW, Justin.tv runs off EC2 servers.