Dow Jones’ watchlist of 2.4M high-risk clients has leaked (techcrunch.com)
171 points by twmahna on Feb 27, 2019 | 45 comments


Am I missing something, or is misconfiguring your cloud the way to go if you're a vendor of an OSINT product?

Information from public sources - no liability? No DJ customer details - no loss of business? Bob Diachenko discovered it - so no dumps floating around? Third party responsible - remains unnamed, no brand damage? Free sample included in the high-traffic TC article.

It probably was not intentional, but could Dow Jones have benefited from this press overall?


It is called the shared responsibility security model.

AWS cannot know what your true intention is.



So, where's the data?


Now that most hacking is nation state driven we aren’t seeing these datasets posted publicly nearly as often.


Do you have a source for that assumption?


I also want to see this list and who is on it.


Does anyone know if the targets of this database have a right of reply, and given it is from public sources, does that mean media reports are the primary sources that inform it?

The consequences of those questions could be quite serious.


Can anyone explain why Dow Jones would store data like this in elasticsearch? This seems like a classic relational database scenario.


The data didn't leak from Dow Jones, and the article doesn't cover how Dow Jones stores the data internally. Some customer who had the data leaked it from their own open system.


Extremely fast full text fuzzy search on sparse datasets.
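
To make "fuzzy search" concrete: Elasticsearch documents its fuzzy query in terms of Levenshtein edit distance. The sketch below is a brute-force illustration of that idea only, not how Elasticsearch or Lucene implement it internally. The names in `WATCHLIST` and the helper `fuzzy_lookup` are invented for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query: str, names: list[str], max_edits: int = 2) -> list[str]:
    """Return the names within max_edits of the query, ignoring case."""
    q = query.lower()
    return [n for n in names if edit_distance(q, n.lower()) <= max_edits]


# Hypothetical records, invented for illustration only.
WATCHLIST = ["Jon Smith", "John Smyth", "Joan Smithers", "Ivan Petrov"]

matches = fuzzy_lookup("John Smith", WATCHLIST)
print(matches)
```

A lookup like this also shows why such matching can flag the wrong person: any name within a couple of edits of a listed name counts as a hit.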


Data from various arbitrary public sources would be difficult to put into a relational schema. Querying that schema would also be more difficult than a full-text ES query.


How to get a copy:)?


Is Trump in the leaked "risky customer" DB?

We could find out intel about political candidates.


Yet another sensitive database with probably no way to know if you're in it - GDPR sounds like a pain but I'm coming around to believing it's a necessary evil to stop this nonsense.

OTOH I guess this is relevant information and so they should be allowed to have it under GDPR rules? I'm obviously not a lawyer, although my work, like most programmers', is affected by GDPR, PCI and whatnot.


... it's a compilation of public records... Hard to get too excited about public information being made public.

>The data is all collected from public sources, such as news articles and government filings.


Isn't that a bit like saying Facebook is just a collection of forwards from Granny? The compilation of raw materials into a coherent whole has a larger value than the existence of the raw materials.


So?

It's exactly the same as Wikipedia; yet nobody calls Wikipedia a "sensitive database."

I mean "So?" from a privacy standpoint, from a business standpoint it's an issue for Dow Jones.


Wikipedia is a sensitive database.

Mostly the sensitivity is in controlling write access rather than read though.


This database is the equivalent of a collection of terrorists' Wikipedia pages. That's not a "sensitive database."


It's not where the data came from that's interesting, it's the fact that the list exists, who's on it, and that it's being used to identify people that you may not want to start a business relationship with.


According to the article, the data was all pulled together from public sources.


From what I understand (as mentioned, IANAL), having the database itself is lawful, as they're compiling it to comply with a legal requirement. But that still wouldn't exempt them from having to comply with GDPR and the rights it grants (including a subject's right to know whether they are in the database):

http://lexindicium.com/2018/03/19/data-mining-and-gdpr-compl...

https://ec.europa.eu/info/law/law-topic/data-protection/refo...

Right to:

information about the processing of your personal data;

obtain access to the personal data held about you;

ask for incorrect, inaccurate or incomplete personal data to be corrected;

request that personal data be erased when it’s no longer needed or if processing it is unlawful;

object to the processing of your personal data for marketing purposes or on grounds relating to your particular situation;

request the restriction of the processing of your personal data in specific cases;

receive your personal data in a machine-readable format and send it to another controller (‘data portability’);

request that decisions based on automated processing concerning you or significantly affecting you and based on your personal data are made by natural persons, not only by computers. You also have the right in this case to express your point of view and to contest the decision.

In particular, the clauses about access to personal data and to have decisions being made by a natural person seem relevant here.


There is a very interesting clash here: anti-money-laundering and know-your-customer laws require pretty substantial investigation into customers, while EU laws (GDPR, right to be forgotten) require this sort of data to be purged or publicized.


GDPR explicitly addresses the right of holding and processing data for legal and regulatory compliance.


Yes, I'd quite like to know if I show up in it or not.


Or we could just start punishing companies for massive and widely damaging data leaks. As far as I know, GDPR wouldn't prevent this. These things keep happening because nothing bad happens to companies that let it happen.


GDPR prevents this by putting rules in place that you, as the owner of the data, need to show that you're protecting it responsibly.

The threat of the gigantic fine is what gets people into compliance to prevent this from happening.

Lots, possibly even the majority, of companies in Europe beefed up their IT security procedures because of this, and I wouldn't be surprised if almost everyone who sits at a keyboard in Europe got called into a meeting to talk about how important it is to keep their customers' data private and ways to do that.

Without something like this in place, companies can just not care about users' data, because 'oops, we did nothing to protect it' is still a valid excuse.


>Lots, possibly even the majority of companies in Europe beefed up their IT security procedures because of this

On the other hand, they also don't provide internet services to people.


GDPR specifies fines up to 4% of annual global turnover or 20 million euros, whichever is greater. That seems like plenty enough bite, if it were enforced.
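
As a rough worked example of the "whichever is greater" rule quoted above, here is a small sketch. It computes only the ceiling, not the fine a regulator would actually impose, and the turnover figures are made-up inputs. The function name `max_gdpr_fine` is an assumption for illustration.

```python
FLAT_CAP_EUR = 20_000_000  # fixed figure quoted in the comment above
TURNOVER_PERCENT = 4       # 4% of annual global turnover


def max_gdpr_fine(annual_turnover_eur: int) -> int:
    """Upper bound on the fine: the larger of the flat cap and 4% of turnover.

    Integer arithmetic avoids floating-point rounding on large amounts.
    """
    turnover_based = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FLAT_CAP_EUR, turnover_based)


# Illustrative turnovers (assumed values, in euros).
for turnover in (100_000, 500_000_000, 10_000_000_000):
    print(f"turnover {turnover:>14,} -> ceiling {max_gdpr_fine(turnover):>12,}")
```

With these numbers the two limits coincide at a turnover of 500 million: below that the flat 20 million figure is the ceiling, above it the 4% figure takes over.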


Why does an unprofitable, tiny one-person business get the same bankrupting fine as a profitable 1000-employee firm with $500m in turnover?


It doesn't. Those numbers are upper limits. Just like with traffic tickets and other fines, the actual amount is left to judgement.


If this were true then why have upper limits at all? The only reason I can think of is to protect large corporations.


> If this were true then why have upper limits at all?

Because while the rulemaker believes that there is a range of potentially reasonable judgments based on particular circumstances, they do not believe that range is unbounded.

> The only reason I can think of is to protect large corporations.

The fixed minimum upper limit of €20 million is actually probably there to prevent (or limit the effect of) large corporations using smaller subsidiaries and fancy accounting for GDPR-risky activities, rather than the upper limit protecting large corps.


For two reasons:

1. To prevent cruel and unusual punishment.

2. To set expectations about the seriousness of the infraction in the eyes of the law.

I am not a lawyer or a legal scholar, so I'm sure there are more reasons.


“up to” and “equal to” are not the same.

When a store says “Everything up to 50% off”, that doesn't mean everything is half price.


>up to

I think that is the catch?


Shouldn't we see if GDPR actually starts preventing these leaks before declaring it a success? I'd imagine it being a 'success' is predicated on it being useful, right?

Not just punishing the small percentage who get 'caught' while doing nothing to actually help the problem, à la the drug war. And for everyone who thinks it's just big evil companies who get punished, one of the first GDPR fines was $4k against an Austrian small business owner whose video surveillance around his building was deemed so broad that it violated people's privacy.

I'm not declaring GDPR a failure by any means but all policy must be judged on a long-term full-picture basis. Not simply on "good intentions" of the bill + a few high visibility wins early on, then moving on as if the world is a better place.


GDPR doesn't prevent leaks any more than anti-speeding laws prevent speeding.

GDPR tells you what you can't do and what the penalty is for being caught in violation, just like a speeding law tells you what speed you can't exceed and what the penalty is for being caught.


So long as all the sources were public, I don't see why this is newsworthy at all.

You could probably build most of it with Google.


It is newsworthy for a number of reasons. Firstly, most people do not know that companies are scanning their customers, suppliers and employees against these watchlists.

Secondly, people are placed on these watchlists with no burden of proof or right to recourse.

Thirdly, if you appear on these lists, which can be quite fuzzy, you can find that your bank accounts are frozen, with no explanation. Banks are now very risk averse, meaning that they are more than happy to alienate a few customers if it means avoiding the risk of massive fines.


Compiling and updating this information requires many man hours, which has value, thus Dow Jones can receive payment for access to this database (and many are very willing to pay). It's an asset.


We are big AWS customers at my current employer and have generally had success, and I use Amazon products, but that said:

This is totally on Amazon for not having VPC-enabled Elasticsearch clusters for way too long, AND for not providing an upgrade mechanism to move an existing internet-accessible cluster into a VPC. I was mind-blown when I first used the Elasticsearch service, and I was sure there would be data leaks because it only offered public network access.


While I agree and those defaults are certainly suboptimal with blame to share, I would argue the buck stops with the individual that indexed all the proprietary data on :9200 open to the internet. You can do all sorts of stupid things with AWS (or any other tool). That doesn't make it Amazon's fault entirely. The individual is responsible for attempting a basic understanding of the tools they use.

When I learned the ropes of ES, configuring the endpoint was one of the first things that came up in a large number of docs and posts. In this case, I also wonder if the person doing it even realized it would be a problem since the database was based on "Publicly Available data". "Sure, turn CORS on, let's roll."

Thankfully this leak was of public data combined into a proprietary reporting tool, rather than something more sinister that would cause greater harm.
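
For a self-managed node, one cheap sanity check is to look at the address the service is told to listen on before it ever starts. The sketch below is a hypothetical pre-deployment helper (the name `is_internet_facing` is invented here) built on Python's standard `ipaddress` module. It only classifies the bind address; it is no substitute for access policies, authentication, or firewall rules, and it does not apply to a managed endpoint that is public by design.

```python
import ipaddress


def is_internet_facing(bind_address: str) -> bool:
    """Return True if listening on this address could expose the service
    beyond loopback and private networks."""
    addr = ipaddress.ip_address(bind_address)
    if addr.is_unspecified:
        # 0.0.0.0 or :: means "every interface", public ones included.
        return True
    return not (addr.is_loopback or addr.is_private)


# Example bind addresses one might find in a config file.
for candidate in ("127.0.0.1", "10.0.0.5", "0.0.0.0", "8.8.8.8"):
    status = "EXPOSED" if is_internet_facing(candidate) else "ok"
    print(f"{candidate:>10}:9200 -> {status}")
```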


Just so I'm clear -- AWS Elasticsearch Service was launched in October 2015, and VPC support didn't come around until fall 2017. So for over 2 full years the only way to utilize their Elasticsearch service was to run it internet-accessible. I'm not talking about defaults here -- it was the only option.



