olegus8's comments

Take a look at Weboob (http://weboob.org). It's an attempt to create a well-defined, uniform API for websites that lack one. I wrote a few modules for it, and they've been scraping 4 banks and 4 shops on a daily basis for the last few months.

As I see it, the upside of this approach is that you can scrape pretty much everything you can open in your browser. The downside is that you have to update your scraper every time a website changes. In my personal experience this happens at least once a month.


Excellent point about PDFs. Some Weboob modules scrape PDF statements as well: at least Amazon Store Card, Wells Fargo and Citibank (disclaimer - written by yours truly). But even with these statements, sometimes the whole history cannot be restored. For example, Citibank stores statements for only so many years.


I think "writing my own fits-like-a-glove tool" pretty much covers it. I wanted a very minimalistic tool that just scrapes the data and transforms it into a report. I could probably have achieved the same result by writing a few plugins for HLedger, though I doubt the overall code size would have been smaller (currently it's <1000 SLOC). I've been using it for the last few months without adding any features, and I'm not planning to add more. The only thing that changes or expands is the scraping, which is part of a separate project, Weboob.
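
For illustration only, here is a minimal sketch of the "scrape the data, then transform it into a report" idea. This is not the actual tool: the `Transaction` record and the `monthly_report` and `format_report` functions are hypothetical names, the scraping step is assumed to have already produced the records, and the sample data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record; a real scraper would supply these fields.
    account: str
    day: date
    amount: Decimal  # negative = money spent, positive = money received


def monthly_report(transactions):
    """Sum amounts per (year-month, account), sorted by month then account."""
    totals = defaultdict(Decimal)  # Decimal() is zero, so sums start at 0
    for t in transactions:
        totals[(t.day.strftime("%Y-%m"), t.account)] += t.amount
    return dict(sorted(totals.items()))


def format_report(totals):
    """Render one fixed-width line per (month, account) pair."""
    return "\n".join(
        f"{month}  {account:<12} {amount:>10}"
        for (month, account), amount in totals.items()
    )


if __name__ == "__main__":
    # Made-up sample data standing in for scraped transactions.
    sample = [
        Transaction("bank", date(2015, 1, 5), Decimal("-10.50")),
        Transaction("bank", date(2015, 1, 20), Decimal("-4.50")),
        Transaction("shop", date(2015, 2, 1), Decimal("-20.00")),
    ]
    print(format_report(monthly_report(sample)))
```

Keeping the report step a pure function of already-scraped records means that only the scraping side has to change when a website does.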


I can certainly imagine this happening. Currently I'm scraping 4 banks and 4 shops on a daily basis, and the only issue came up when scraping for the first time from a new IP address: the sites asked a security question or displayed a captcha. I resolved this by logging in the first time from a browser through an SSH tunnel over that IP. All subsequent scraping has gone well since then.


Weboob (http://weboob.org) has a moderately active community. There are even a couple of finance tools built around it, though the web services it scrapes are mostly European.


Thank you! My love for Python started with this game. Though now I'd rather use something like C++ and Lua for games. It'd make them much easier to port and distribute.


javascript.. the actual wora.


Good point, I actually considered JavaScript too. It seems there's no one-size-fits-all tool, just the right tools for the right problems. I'd use JavaScript for in-browser games, but something different for standalone ones if high portability is a requirement. Lua seems to be a good choice, because it's the de facto industry standard in gamedev and runs on anything but my door bell (http://luajit.org/luajit.html). So my current best idea is to write all control logic in Lua, and platform-specific low-level stuff in C++. Regarding the speed concerns, both JavaScript and Lua have JIT compilers, and anyway, no Lua or C++ code should be visible in the profiler at all if we're talking about performance. It seems to me most computations should be done on the GPU instead.

