Ask HN: Service Idea: Cloud based image hosting
9 points by al_james on Sept 8, 2010 | 11 comments
Hi there.

I run a medium-sized travel blogging site. When I started developing the site, I did not give much thought to dealing with user-uploaded photos. I simply saved them to a directory on the server and resized them to the correct sizes at upload time.

However, when the site became bigger, photos actually became a bit of a pain in the butt:

* It's a pain to synchronise across multiple servers

* If you change the sizes of the photos in your templates you need to regenerate all the images to the new size (hoping you kept the originals!)

* A web server set up to serve images efficiently is different to one set up to serve dynamic HTML efficiently. Running two web servers with limited resources can be a pain.

Of course, most of these go away by using something like Amazon S3. However, you still have the problem of making sure your images are available in the right sizes, file format conversion etc.

I propose a cloud based service to make this all easier for developers: something you post images to through a simple HTTP API. You can then request the image back in whatever size you want. Other image processing functions could be supported too (e.g. cropping, black and white, saturation).

So say, you save an image to the service with file name 'image1', you could form a url to apply these changes:

http://fictional-image-sevice.com/myaccount/image1?crop(50,50,600,400)&size(300,200)&black_and_white&format(jpg)

This would return a cropped, resized, black and white version of the source image as a JPG (regardless of the original file format).
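
To make the URL scheme above a bit more concrete, here is a rough sketch of how the query string could be parsed into an ordered list of operations. This is only an illustration in Python, not the node.js implementation mentioned below; the function name `parse_transform_url` and the rule that numeric arguments become integers are assumptions.

```python
import re
from urllib.parse import urlsplit

# Hypothetical grammar: each '&'-separated token is either "name" or "name(arg,arg,...)".
_OP = re.compile(r"^(?P<name>[a-z_]+)(?:\((?P<args>[^)]*)\))?$")


def parse_transform_url(url):
    """Return (account, image_name, [(operation, [args...]), ...])."""
    parts = urlsplit(url)
    account, image_name = parts.path.strip("/").split("/", 1)
    ops = []
    for token in filter(None, parts.query.split("&")):
        match = _OP.match(token)
        if match is None:
            raise ValueError(f"unrecognised operation: {token!r}")
        raw = match.group("args")
        args = [a.strip() for a in raw.split(",")] if raw else []
        # Assumption: purely numeric arguments are pixel values, so convert them to int.
        args = [int(a) if a.isdigit() else a for a in args]
        ops.append((match.group("name"), args))
    return account, image_name, ops
```

Keeping the operations in URL order matters here, since cropping before resizing gives a different result from resizing before cropping.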

This would remove the headache of having to worry about image resizing or processing.

It would be a low cost service. It would use Amazon S3, so storage and bandwidth costs would be the same as S3's. On top of that it would simply charge a fee per 1000 image requests.

(Really simple) client libraries would be available in all popular languages.

I have a really fast implementation of this image 'proxy' written in node.js that can handle these transformations, and caches the results in memory and then disk to speed it up. So it would be fast. Eventually it may be possible to offer a geo-aware version that downloads from a local server.
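
The "memory, then disk" caching could look roughly like the sketch below. It is written in Python for illustration and is not the node.js code itself; the class name `TwoTierCache`, the first-in-first-out eviction, and hashing the cache key into a file name are all assumptions.

```python
import hashlib
import os


class TwoTierCache:
    """Look in memory first, then on disk, and only then render the image."""

    def __init__(self, directory, max_memory_items=128):
        self.directory = directory
        self.max_memory_items = max_memory_items
        self._memory = {}  # dicts keep insertion order, so this acts as a simple FIFO
        os.makedirs(directory, exist_ok=True)

    def _disk_path(self, key):
        # Hash the key so any transformation URL maps to a safe file name.
        return os.path.join(self.directory, hashlib.sha256(key.encode()).hexdigest())

    def _remember(self, key, data):
        if key not in self._memory and len(self._memory) >= self.max_memory_items:
            self._memory.pop(next(iter(self._memory)))  # evict the oldest entry
        self._memory[key] = data

    def get(self, key, render):
        """Return cached bytes for key, calling render() only on a full miss."""
        if key in self._memory:
            return self._memory[key]
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as fh:
                data = fh.read()
        else:
            data = render()
            with open(path, "wb") as fh:
                fh.write(data)
        self._remember(key, data)
        return data
```

Here `key` would be the full transformation URL and `render` a function that actually fetches and transforms the original.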

Any legs?




I built exactly that about a year ago, serving up billions of images (not billions of different images but billions of requests) every day.

The way I ended up doing the scaling (we didn't need any other format conversions) was using the retrieval URL and a 404 handler that is smart enough to be able to access a list of 'allowed' sizes (so that doesn't become a potential attack vector). So if you access a file in a size that hasn't been made yet it gets created on the fly.
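
Roughly, the logic behind that 404 handler looks like the sketch below. It is in Python for illustration only; the names (`ALLOWED_SIZES`, `handle_missing`), the example sizes, and the dict-like `store` are made up here and not what actually runs in production.

```python
# Hypothetical whitelist; in a real setup this would come from per-client configuration.
ALLOWED_SIZES = {(100, 100), (300, 200), (640, 480)}


class SizeNotAllowed(Exception):
    """Raised when a requested size is not on the whitelist (stays a real 404)."""


def handle_missing(image_id, size, store, resize):
    """Called when a rendition was not found; creates it if the size is allowed.

    store  -- dict-like, maps (image_id, size) to bytes; (image_id, "original") is the source
    resize -- callable(original_bytes, (width, height)) returning resized bytes
    """
    if size not in ALLOWED_SIZES:
        # Refusing unknown sizes stops anyone from forcing unlimited resize work.
        raise SizeNotAllowed(size)
    original = store.get((image_id, "original"))
    if original is None:
        raise KeyError(image_id)
    rendition = resize(original, size)
    store[(image_id, size)] = rendition  # the next request is served directly
    return rendition
```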

The whole thing has been up and running across 9 servers for a year now. It has triple replication and a bunch of Varnish servers on the front end to make it fast.

We have two ways of putting data on there, one through an API that accesses the servers directly, another using a queuing mechanism.

To improve the legibility of the URLs we used a virtual path rather than a bunch of parameters.

so http://mycdn.com/storage/client/format/id/id/id/id/id/id/id....

where the 'id' bits are 2 digits from the image identifier.
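
As a rough illustration of that path layout, sketched in Python: the function name `virtual_path` is made up, and letting an odd-length identifier end in a single-digit segment is an assumption, not necessarily how the real system handles it.

```python
def virtual_path(client, fmt, image_id, width=2):
    """Build /storage/<client>/<format>/<id>/<id>/... from 2-digit pieces of the id."""
    digits = str(image_id)
    # Split the identifier into fixed-width pieces, e.g. "12345678" -> 12, 34, 56, 78.
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "/".join(["", "storage", client, fmt, *chunks])
```

Splitting the identifier this way also keeps any single directory from holding more than a hundred entries per level.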

The nodes have 4TB storage each. Originally we used XFS, but deletion was too much of a bottleneck, so we ended up switching the system to EXT3 after it was already live, which improved performance quite a bit.

I'm sure that if you build this 'properly' (as in nicely abstracted, multi-user, with redundancy by using multiple locations and so on) that there is a market for it but I'm not sure how big that market would be.

So yes, this probably has legs.


Sounds very similar to what we have here, except we are storing the original files on S3 to avoid replication / redundancy issues.

I guess getting the pricing right is key to working out demand.


Is what you built public or private ?


Private. Building this taught me that a lot of stuff I thought was 'easy' is actually pretty hard when you need to do it often enough :)

I always thought live video was hard, but it turns out that handling large numbers of images is actually much harder. That really surprised me.


Google App Engine just released something similar to this a few weeks ago:

Announcement: http://googleappengine.blogspot.com/2010/08/multi-tenancy-su...

Docs: http://code.google.com/appengine/docs/python/images/function...

It could be set up to do image resizing on the fly per URL parameters you pass to it, and storage/bandwidth is cheaper than S3 if I recall correctly. It's based on the same infrastructure as Picasa.

Edit: In fact it could be easily used to create such a service rather than having to build out the functionality oneself.


That's very interesting. Thanks!


No problem, let me know how it goes; my email is in my profile. Your question piqued my interest in building such a service on top of GAE with just basic API access and billing for usage. As long as you cover the hosting cost Google charges you, it'd seem to be relatively straightforward. I'm just not sure if there is enough control built in to determine what bandwidth went where.


GAE (can) serve images out of the blobstore, so you can monitor statistics when you tell it to get the image out. Docs are here: http://code.google.com/appengine/docs/java/images/overview.h...

(I expect you'd probably want to use memcache to cache images rather than hitting the blobstore every time, though)

(Note that you pretty much have to keep the images in the blobstore because you don't have filesystem access. You might be able to keep them in the datastore if you wanted, but those are the only two AppEngine options)


Hmmmm.... The 1 MB limit in and out of the image service could be a problem.


The image service is nice, but really only needed if you want to do transforms. You can serve raw image data out of the blobstore.


WordPress does this, but rather than serve the images from S3, it uses S3 to store the images and populate a self-hosted Varnish cache: http://blog.apokalyptik.com/2007/10/10/so-you-wanna-see-an-i... This reduces the S3 bill by an order of magnitude (http://ma.tt/2007/10/s3-news/), so you may want to consider this approach.

In fact, using this approach, you could use S3 (just storage) to undercut S3 (storage+bandwidth) on cost and get lots of customers! I'd be in! :-)



