Hacker News | schwanksta's comments

As others have said, it does not really make sense to search by ZIP code for most consumers. Your chicken could come from anywhere. This is meant to check the processor for a specific package of chicken.


The site specifically searches by p-code, company, or processor location.

But only by a single location. I'm not sure if you've been near a chicken processing plant, but they are very often clustered together, shockingly near where an abundance of commercial chicken farms are, so knowing the percentage chance by area can be very useful.

My perspective and interest don't seem to align with those of most people commenting here.

I'm not concerned about whether a random package of chicken is OK, I'm concerned about a cluster of processors.


There are more complaints than that, but these are all closed complaints for any officer who has had at least one substantiated allegation. It's only a fraction.


And it's only for currently active officers, right?


Not to be overlooked: ProPublica actually constructed a 3D version of the navigation system's control board: https://www.propublica.org/article/how-we-reconstructed-the-...


Couldn't find it in your article, but it looks like the 3D models are integrated into the story here: https://features.propublica.org/navy-uss-mccain-crash/navy-i...


This is neat! I thought about extracting the grants (still might), but full-text seemed like good bang for the buck. Your tools sound like they might be very useful for reporters. Have you given any thought to that? We love mapping these sorts of connections.


Hey, first up really amazing work you’re doing, hugely inspiring for us! Thank you.

Whilst our focus has been delivering a consumer layer on top of all this data, yea, very open to exposing our underlying graph to others. Want to drop me a note at dan at alma.app?

As you mention elsewhere, half the battle is cleaning the data and getting quality.


Yup. We’ve been using this data for a while to render e-filed 990s on our site and to extract highly paid employees. Now we just strip the markup out and toss it all into Elasticsearch for search. It’s really interesting to surface things like grants.

I will say for personal analysis that the schema has a habit of changing, and things like grants can appear in multiple places depending on the context. What’s more, just two-thirds of nonprofits e-file now (and I’m sure fewer and fewer the further back you go). Just some things to look out for.

If you’re interested in processing the 990 XML data though, check out the truly excellent irsx: https://github.com/jsfenfen/990-xml-reader
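
A minimal sketch of the "strip the markup out" step, assuming plain standard-library Python. This is not ProPublica's actual pipeline; the `strip_markup` helper, the `SAMPLE` document and its element names are all made up for illustration. It only shows how an XML filing can be flattened into one text field before being handed to a search index.

```python
import xml.etree.ElementTree as ET


def strip_markup(xml_text: str) -> str:
    """Flatten an XML document into one whitespace-normalized text string."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text of every element in document order,
    # so tags and attributes are dropped and only the content remains.
    pieces = (text.strip() for text in root.itertext())
    return " ".join(piece for piece in pieces if piece)


# Hypothetical, heavily simplified stand-in for an e-filed 990.
SAMPLE = """<Return>
  <Filer><BusinessName>Example Community Council</BusinessName></Filer>
  <Grant><RecipientName>Example Food Bank</RecipientName><Amount>5000</Amount></Grant>
</Return>"""

# The resulting dict is the kind of document one could index for full-text search.
doc = {"body": strip_markup(SAMPLE)}
print(doc["body"])
```

Because the tags are thrown away, a grant recipient is found by a text search no matter where the schema happens to put it, which is the trade-off described above: cheap full-text search now, structured grant extraction later.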


If you don't e-file, does that mean the IRS don't digitise your accounts and so you avoid appearing in these sorts of data sets?

In which case, it sounds like a lot of interesting data will be in that last third.


Oh hey I built this. Let me know if you have any questions about how it works.

Edit: wrote a little bit about that here - https://news.ycombinator.com/item?id=20141744


This is great. AC is an old friend of mine, you guys are doing amazing work at ProPublica. Nice to see the tech work that runs behind the stories.


How do you get the data? I wasn't able to find the forms for 2018 for some charities. Did the IRS make them available yet?


The IRS puts them on an s3 bucket: https://docs.opendata.aws/irs-990/readme.html

There are 2018 filings in there, but many charities have fiscal years that end in Dec. IIRC, they generally file within 6 months. Given things like human error, bureaucracy and filing extensions... more should start rolling in over time.


Just thought this was curious: if I search for "HOCKANUM VALLEY COMMUNITY COUNCIL", nothing comes up for them specifically, but if I search for the town they are in, then the business appears in that list.


That's super weird! Comes up when I search other text in their form too. I'm gonna flag this and take a look tomorrow. Thanks!


Cool. I had one question: what's the usual lead time for non-profit data to show up in this dataset (e.g. when would you expect that 2018 forms/data would appear)?


There are already some 2018 forms in there, but it's based on fiscal year. So a nonprofit whose FY is Dec 2018 would have had to just file last month -- and sometimes they file late or get extensions. And again, this only covers e-filed forms -- they could file on paper, in which case you'd have to use the nonprofit name search and check for a filing.
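
A rough worked example of that timing, assuming the standard Form 990 rule that the return is due on the 15th day of the 5th month after the fiscal year ends, with an optional 6-month extension. The `form_990_due` helper is hypothetical, and it ignores the shift that applies when the 15th falls on a weekend or holiday.

```python
from datetime import date


def form_990_due(fy_end: date, extended: bool = False) -> date:
    """Nominal Form 990 due date for a fiscal year ending on fy_end."""
    # 5 months after year end, plus 6 more if an extension was filed.
    months_after = 5 + (6 if extended else 0)
    total = (fy_end.month - 1) + months_after
    return date(fy_end.year + total // 12, total % 12 + 1, 15)


fy_end = date(2018, 12, 31)
print(form_990_due(fy_end))                 # 2019-05-15
print(form_990_due(fy_end, extended=True))  # 2019-11-15
```

So a December 2018 filer is nominally due in mid-May 2019, and with an extension may not show up until late in the year.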


Is it possible to sort after searching? Or filter by annual revenue? I'd like to see the organizations with the highest annual revenue for my specific searches. Thanks!


Those are good suggestions! I'm planning to add sorting by year, but revenue makes sense as well.


Assuming Elasticsearch?


Yup! We use it for a bunch of things, and I thought: what if I just dumped all this into it?


The page you found is a different page, not related to the IRS' "free file"/FFA program for those who make under a certain amount. See previous coverage here: https://www.propublica.org/article/turbotax-just-tricked-you...


The only way to get information about their web host is to submit to Cloudflare's abuse form, which, well...


According to GoDaddy, AWS and Rackspace, it is not. Not for non-DMCA complaints, anyway.

Side note: I wrote this.


Why are you cherry-picking the most expensive enterprise hosting companies in the industry? Those are hardly representative of the rest.


What? AWS and GoDaddy are not terribly expensive. GoDaddy is incredibly common. I don't know about Rackspace. Together they make up a large chunk of the Internet as we know it.


AWS and Rackspace are terribly expensive. GoDaddy is huge in the domain space but not particularly big in hosting, besides shared hosting.

Why not look at the likes of OVH, Hetzner, Voxility, Colocrossing and so on? Or maybe try Level3: their business may be a bit different, but they're HUGE and certainly forward abuse reports.


I'm no expert, but a quick Google leads me to http://www.webqom.com/blog/2016_web_hosting_market_share_tre..., which states that GoDaddy is the most popular hosting provider out there.


Yes, Godaddy sells lots of shared hosting at insane margins.

These insane margins help pay for a big abuse department.

Most dedicated hosting providers don't have as big margins because they can't stuff 1000+ customers on one server.

Most dedicated hosting providers don't have very big abuse departments, or any abuse department at all.


What does price have to do with not stripping contact information from a report?


It's got everything to do with having a bunch of humans handling your abuse reports. It's also one of the reasons why IP reputation is so important to these hosts.


You really never realized that? It's been the stated reason for people opting out for a while now. It takes PRISM for it to sink in?

Yeesh.


Apparently up until this point people figured that everyone but the government would have access to their data, but they were OK with that... it's free to signup, after all!


Yeah, that's all kinds of weird. For starters, what rational person would expect that, in a world where everyone but the government has access to their data, the government would not have access to their data?

The statement is of course a strange construct when examined literally, but have you ever tried keeping a secret from just one person?

