
Could this strategy work to match products across retailers? If so, any tips on getting started with vector databases? I've heard of them but have yet to try one out.



Yes. Compute an embedding for the product name + description from Target.com, then another for the product name + description from Walmart.com. If both listings describe the same product, the two vectors should be very close (high cosine similarity).
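To make "close" concrete, here is a minimal sketch of cosine similarity in plain Python. The three vectors are made-up 4-dimensional stand-ins for real embeddings, not output from any actual model, so only the relative ordering of the two scores is meaningful.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy vectors (assumption): real embeddings have hundreds of
# dimensions or more and come from an embedding model.
target_listing = [0.12, 0.80, 0.35, 0.05]
walmart_same_item = [0.10, 0.82, 0.33, 0.07]
walmart_other_item = [0.90, 0.05, 0.10, 0.40]

same = cosine_similarity(target_listing, walmart_same_item)
other = cosine_similarity(target_listing, walmart_other_item)
print(f"same product:      {same:.3f}")
print(f"different product: {other:.3f}")
```

In practice you would pick a similarity threshold by checking a sample of known matches by hand, since the right cutoff depends on the model and the catalog.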

The easiest way to get started is with Supabase, since it has a free tier and the pgvector extension built in.

You calculate the embedding using OpenAI's embeddings API and store the result in a vector column. Then it's just a vector similarity query in Postgres (trivially easy).
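A rough sketch of what the Postgres side might look like, written as SQL strings in Python. The table and column names are hypothetical, 1536 is the embedding dimension I'm assuming for the model, and the `%(name)s` placeholders assume a psycopg-style driver. The one thing taken from pgvector itself is `<=>`, its cosine distance operator.

```python
def to_pgvector_literal(embedding):
    """Format a list of numbers as pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Hypothetical schema (assumption): one row per product listing.
# vector(1536) must match the dimension of the embedding model you use.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS products (
    id bigserial PRIMARY KEY,
    retailer text NOT NULL,
    title text NOT NULL,
    embedding vector(1536)
);
"""

# <=> is cosine distance, so smaller means more similar and
# 1 - distance gives the cosine similarity.
MATCH_SQL = """
SELECT id, retailer, title,
       1 - (embedding <=> %(query)s::vector) AS similarity
FROM products
WHERE retailer = %(retailer)s
ORDER BY embedding <=> %(query)s::vector
LIMIT 5;
"""

# Shortened to 3 numbers for illustration only; a real query vector
# needs the same dimension as the column.
params = {
    "query": to_pgvector_literal([0.1, 0.2, 0.3]),
    "retailer": "walmart",
}
print(params["query"])
```

You would pass `MATCH_SQL` and `params` to your driver's `execute` call, with the query vector being the embedding of the Target listing you want to match.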


Another way to do this is with the pgml extension (PostgresML). It lets you run Hugging Face embedding models, which have surpassed OpenAI's at this point. It's pretty fast if you run it on a machine with a GPU for acceleration. On my local desktop with a 3090, I created embeddings for ~2,000,000 tokens in chunks of ~100 tokens (about 450 characters each). It took around 20 minutes using the gte-base model, including the inserts into an indexed table.

It still uses pgvector for storage and similarity search.
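For the chunking step, here is a minimal sketch of one way to split text into pieces of about 450 characters on word boundaries. This is an assumed approach for illustration, not necessarily how the chunks above were produced. The last few lines just redo the arithmetic on the figures quoted above (~2,000,000 tokens, ~100 per chunk, ~20 minutes).

```python
def chunk_text(text, max_chars=450):
    """Greedily pack whitespace-separated words into chunks.

    Each chunk is at most max_chars long, except that a single word
    longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "lorem ipsum dolor sit amet " * 100
chunks = chunk_text(sample)
print(len(chunks), "chunks, longest is", max(len(c) for c in chunks), "chars")

# Back-of-the-envelope throughput from the approximate figures quoted above.
total_tokens, tokens_per_chunk, minutes = 2_000_000, 100, 20
n_chunks = total_tokens // tokens_per_chunk
tokens_per_second = total_tokens / (minutes * 60)
print(n_chunks, "chunks at roughly", round(tokens_per_second), "tokens/s")
```

Those numbers work out to about 20,000 chunks, so most of the per-row cost is the model call plus the index update, which is why batching the inserts tends to help.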



