A friend who is a sales associate within this particular industry complained to me about how hard and time-consuming it can be to search for a particular item. He said that if I could build a search engine that searches the top 500-1000 sites for this industry, it could be 'really' valuable. My target market for this search engine would be the owners and associates of the sites I would be scraping.
The data I would be scraping are images and their associated descriptions. I would only store and display thumbnail images. Without an image, the description would be fairly worthless. For each image/item, a link would lead directly to the original website.
One business model I am considering, and the most obvious, is a subscription-based web app.
While at PyCon last month I showed a few people a prototype. One person, an employee at Google, said, "Be careful." He was alluding to potential copyright and legal issues. "But," I said, "I'm not really doing anything different than Google." He countered, "Google has lots of lawyers." Ahhhh, message heard loud and clear!
I understand, in general, copyright and fair use [0]. But I don't want to be writing letters to the owners of the original content arguing this point, let alone wind up in court. What advice or experiences can you share that might be helpful?
[0] http://en.wikipedia.org/wiki/Fair_use
Instead, index. Indexing, on the other hand, connotes supplementation in the sense of adding value to that which is already there. Have the thumbnails, excerpts of the descriptions, and whatever secret sauce you've not mentioned add value to the owners' data. Provide traffic or some other measurable benefit to them.
Don't rely merely on Fair Use (or weak interpretations of the doctrine). Provide value to the data owners, and be ready to respect their wishes if they choose not to accept the value proposition you offer.
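One common way for a site owner to signal those wishes is a robots.txt file, so a crawler could check it before indexing anything. The sketch below uses Python's standard urllib.robotparser for that. The bot name ExampleIndexBot, the helper names, and the sample rules are all made up for illustration, and a real crawler would fetch each site's actual robots.txt instead of using an inline string. Honoring robots.txt is only a courtesy mechanism, not a substitute for direct opt-out requests or legal advice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; pick your own and publish it so owners can target it.
USER_AGENT = "ExampleIndexBot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_index(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if the site's rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Made-up rules: this owner opts out of our bot entirely,
# while other crawlers are only kept out of /private/.
SAMPLE_ROBOTS = """\
User-agent: ExampleIndexBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    for url in ("http://example.com/items/1", "http://example.com/private/x"):
        print(url, may_index(rules, url))
```

Skipping any URL for which may_index returns False keeps the opt-out cheap for owners: they add two lines to a file they already control, and the index drops them on the next crawl.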