
How was this created? It would be cool to see how I could download copies of this for personal analysis.


https://registry.opendata.aws/irs990/

A dataset of IRS 990 filings is available there. It is a big collection of XML files.

Here is an example of one chosen at random: https://pastebin.com/pzNYBZYQ

EDIT: here is the same filing through the ProPublica explorer:

https://projects.propublica.org/nonprofits/organizations/437...

which links here, which is the document I posted:

https://s3.amazonaws.com/irs-form-990/201643199349201044_pub...


Yup. We’ve been using this data for a while to render e-filed 990s on our site and to extract highly paid employees. Now we just strip the markup out and toss it all into Elasticsearch for search. It’s really interesting to surface things like grants.
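
For anyone who wants to try the "strip the markup out" step on a single filing, here is a minimal Python sketch using only the standard library. It is not the actual pipeline described above: the function name `strip_markup`, the namespace, and the sample element names are made up for illustration, and the Elasticsearch indexing step is left out entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real 990 filings are much larger and use the IRS's own
# namespace and element names.
SAMPLE = """<Return xmlns="urn:example:efile">
  <Filer><BusinessName>Example Org</BusinessName></Filer>
  <GrantDesc>General support</GrantDesc>
</Return>"""


def strip_markup(xml_text):
    """Return the text content of an XML document as one space-joined string."""
    root = ET.fromstring(xml_text)
    # itertext() walks the tree in document order and yields only text nodes,
    # so tags and attributes are dropped.
    pieces = (text.strip() for text in root.itertext())
    # Skip the whitespace-only pieces that come from indentation.
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__":
    print(strip_markup(SAMPLE))  # Example Org General support
```

The resulting string is the kind of thing you could hand to a full-text index. Note that attributes are discarded, so anything stored only in attributes would be lost with this approach.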

I will say, for personal analysis, that the schema has a habit of changing, and things like grants can appear in multiple places depending on the context. What’s more, only about two-thirds of nonprofits e-file now (and I’m sure fewer and fewer the further back you go). Just some things to look out for.
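
One hedged way to cope with a field showing up in different places is to search the whole tree by element name instead of hard-coding a path. The sketch below assumes that approach. The helper names (`local_name`, `find_all`) and the element names in the sample (`ScheduleI`, `ScheduleF`, `CashGrantAmt`, `GrantAmt`) are placeholders, not a claim about what any particular schema version actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same kind of value under two different parents
# and two different element names.
SAMPLE = """<Return xmlns="urn:example:efile">
  <ScheduleI><Recipient><CashGrantAmt>5000</CashGrantAmt></Recipient></ScheduleI>
  <ScheduleF><GrantAmt>1200</GrantAmt></ScheduleF>
</Return>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def find_all(root, names):
    """Return every element, at any depth, whose local name is in `names`."""
    wanted = set(names)
    return [
        el for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) in wanted
    ]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for el in find_all(root, {"GrantAmt", "CashGrantAmt"}):
        print(local_name(el.tag), el.text)
```

You would still need to maintain the list of candidate names per schema version by hand, and matching on name alone ignores the context an element appears in, so treat the hits as candidates to check.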

If you’re interested in processing the 990 XML data though, check out the truly excellent irsx: https://github.com/jsfenfen/990-xml-reader


If you don't e-file, does that mean the IRS doesn't digitise your accounts, so you avoid appearing in these sorts of data sets?

If so, it sounds like a lot of interesting data will be in that last third.



