Bing was my secret weapon when doing data science contracts that were MVPs or side features.

Take the couple million preselected strings the client needed to match against, run them through Bing, page through the results, and build up the corpus I needed with something like mechanize / nokogiri / a headless browser.[0] Then clean up the data the normal data science way, and do whatever mathy stuff I needed to layer on top for whatever app they needed. There you go, mister client: something that has billions of dollars of R&D behind it, cleaned up for your specific use case. You want something better? Go off and hire a team of PhDs and spend a couple years and 50x the money I charged you to get your RoC curves (or whatever) looking 10% or 15% better.

Haven't done this in a couple years due to a startup and then after that randomly finding a client I liked working with so much I haven't needed to take on random jobs again, but I'm sure it would still work.

[0] I had also written custom tools to make this easier / saner. Also, inspector gadget + CSS selectors go a long way too.
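
For anyone wanting the shape of that loop, here is a minimal sketch, not the tooling I actually used. It assumes results come back as `<li class="result">` items (hypothetical markup; a real page needs its own selectors), uses Python's stdlib `html.parser` as a stand-in for nokogiri and CSS selectors, and takes a `fetch_page(query, page)` callable that you would supply yourself. `fake_fetch` is a made-up stub so the example runs without touching the network.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class ResultExtractor(HTMLParser):
    """Collect the text of each <li class="result"> item (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[str] = []
        self._li_depth = 0  # how many <li> levels deep we are inside a result
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._li_depth == 0 and "result" in classes:
            self._li_depth = 1
            self._chunks = []
        elif self._li_depth > 0:
            self._li_depth += 1  # nested <li> inside a result item

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth > 0:
            self._li_depth -= 1
            if self._li_depth == 0:
                # Collapse whitespace so each result is one clean string.
                text = " ".join(" ".join(self._chunks).split())
                if text:
                    self.results.append(text)

    def handle_data(self, data):
        if self._li_depth > 0:
            self._chunks.append(data)


def build_corpus(
    queries: Iterable[str],
    fetch_page: Callable[[str, int], str],
    max_pages: int = 3,
) -> Dict[str, List[str]]:
    """Page through results for each query and gather the extracted text."""
    corpus: Dict[str, List[str]] = {}
    for query in queries:
        snippets: List[str] = []
        for page in range(max_pages):
            parser = ResultExtractor()
            parser.feed(fetch_page(query, page))
            parser.close()
            if not parser.results:
                break  # an empty page means we ran out of results
            snippets.extend(parser.results)
        corpus[query] = snippets
    return corpus


def fake_fetch(query: str, page: int) -> str:
    """Hypothetical stand-in for a real HTTP fetch: two pages, then nothing."""
    if page >= 2:
        return "<ul></ul>"
    return (
        f'<ul><li class="result"><a href="#">{query}</a> hit {page}</li>'
        '<li class="ad">skip me</li></ul>'
    )


if __name__ == "__main__":
    print(build_corpus(["roc curve"], fake_fetch, max_pages=5))
```

Passing the fetcher in as a parameter keeps the paging and extraction logic testable offline; the cleanup and the mathy layer would sit on top of whatever `build_corpus` returns.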




That is not a scalable practice. You may have been able to do it, but if it became too common a practice among developers building similar things, I'm sure Bing would have taken steps to limit such activity.


That's why it was a secret weapon and not an "I told everyone how to do it" weapon.

Also, lol, I spelled it "RoC curves". I clearly didn't proofread this.



I mean, on a good day I capitalize the "O" in ROC.


Oh right! Yeah, and now there are "roc" acronyms everywhere, like AMD's rocm or whatever, and I'm pretty sure I've seen a few others. Pretty easy to get confused.

Edit: I knew I wouldn't get the capitalization right. AMD's is ROCm, for "Radeon Open Compute", and who knows what the "m" stands for.


I'm curious: what about Bing made it especially suited for this?


Absolutely fantastic rate limits (essentially a non-issue) and no captchas or other annoyances. Reasonable enough data quality. It's no Google, but it's not bad if the search terms are for scientific concepts or similar.



