Am I missing something, or is misconfiguring your cloud the way to go if you're a vendor of an OSINT product?
Information from public sources - no liability?
No DJ customer details - no loss of business?
Bob Diachenko discovered it - so no dumps floating around?
3rd party responsible - remains unnamed, no brand damage?
Free sample included in the high traffic TC article
It probably was not intentional, but could Dow Jones have benefited from this press overall?
Does anyone know if the targets of this database have a right of reply, and given it is from public sources, does that mean media reports are the primary sources that inform it?
The consequences of those questions could be quite serious.
The data didn't leak from Dow Jones, and the article doesn't cover how Dow Jones stores the data internally. Some customer who had the data leaked it from their own open system.
Data from various arbitrary public sources would be difficult to put into a rational schema. Querying that schema would also be more difficult than a full-text ES query.
Yet another sensitive database with probably no way to know if you're in it - GDPR sounds like a pain but I'm coming around to believing it's a necessary evil to stop this nonsense.
OTOH I guess this is relevant information and so they should be allowed to have it under GDPR rules? I'm obviously not a lawyer, although my work, like most programmers', is affected by GDPR, PCI and whatnot.
Isn't that a bit like saying Facebook is just a collection of forwards from Granny? The compilation of raw materials into a coherent whole has a larger value than the existence of the raw materials.
It's not where the data came from that's interesting, it's the fact that the list exists, who's on it, and that it's being used to identify people that you may not want to start a business relationship with.
From what I understand (as mentioned, IANAL), having the database itself is lawful, as they're compiling it to comply with a legal requirement. Under GDPR, though, that still wouldn't exempt them from having to respect the data subject's rights (including knowing if a subject is in the database):
information about the processing of your personal data;
obtain access to the personal data held about you;
ask for incorrect, inaccurate or incomplete personal data to be corrected;
request that personal data be erased when it’s no longer needed or if processing it is unlawful;
object to the processing of your personal data for marketing purposes or on grounds relating to your particular situation;
request the restriction of the processing of your personal data in specific cases;
receive your personal data in a machine-readable format and send it to another controller (‘data portability’);
request that decisions based on automated processing concerning you or significantly affecting you and based on your personal data are made by natural persons, not only by computers. You also have the right in this case to express your point of view and to contest the decision.
In particular, the clauses about access to personal data and to have decisions being made by a natural person seem relevant here.
There is a very interesting clash here between the anti-money-laundering and know-your-customer laws, which require pretty substantial investigation into customers, and EU laws (GDPR, right to be forgotten), which require this sort of data to be purged or publicized.
Or we could just start punishing companies for massive and widely damaging data leaks. AFAIK, GDPR wouldn't prevent this. These things keep happening because nothing bad happens to companies that let it happen.
GDPR prevents this by putting rules in place that you, as the owner of the data, need to show that you're protecting it responsibly.
The threat of the gigantic fine is what gets people into compliance to prevent this from happening.
Lots of companies in Europe, possibly even the majority, beefed up their IT security procedures because of this, and I wouldn't be surprised if almost everyone who sits at a keyboard in Europe got called into a meeting to talk about how important it is for them to keep their customers' data private and ways to do that.
Without something like this in place, companies can just not even care about users' data, because 'oops, we did nothing to protect it' is still a valid excuse.
GDPR specifies fines up to 4% of annual global turnover or 20 million euros, whichever is greater. That seems like plenty enough bite, if it were enforced.
> If this were true then why have upper limits at all?
Because while the rulemaker believes that there is a range of potentially reasonable judgments based on particular circumstances, they do not believe that range is unbounded.
> The only reason I can think of is to protect large corporations.
The fixed minimum upper limit of €20 million is actually probably to prevent (or limit the effect of) large corporations using smaller subsidiaries and fancy accounting for GDPR-risky activities, rather than the upper limit protecting large corps.
Shouldn't we see if GDPR actually starts preventing these leaks before declaring it a success? I'd imagine it being a 'success' is predicated on it being useful, right?
Not just punishing the small percentage who get 'caught' while doing nothing to actually help the problem - à la the drug war. And for everyone who thinks it's just big evil companies who get punished, one of the first GDPR fines was $4k against an Austrian small business owner whose video surveillance around his building was deemed so broad that it violated people's privacy.
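To make the "whichever is greater" rule concrete, here is a minimal sketch of the arithmetic for the cap mentioned above. The function name and the sample turnover figures are made up for illustration, and this only computes the ceiling, not what a regulator would actually impose.

```python
def gdpr_max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Ceiling of the higher GDPR fine tier as described in this thread:
    4% of annual global turnover or EUR 20 million, whichever is greater."""
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


# Hypothetical turnovers: a small subsidiary, a mid-size firm, a large corp.
for turnover in (5_000_000, 500_000_000, 10_000_000_000):
    cap = gdpr_max_fine_eur(turnover)
    print(f"turnover {turnover:>14,} EUR -> cap {cap:>13,.0f} EUR")
```

Below €500 million in turnover the fixed €20 million figure is the binding cap, which is why a small shell subsidiary does not shrink the exposure; above that, the 4% term takes over.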
I'm not declaring GDPR a failure by any means but all policy must be judged on a long-term full-picture basis. Not simply on "good intentions" of the bill + a few high visibility wins early on, then moving on as if the world is a better place.
GDPR doesn't prevent leaks any more than anti-speeding laws prevent speeding.
GDPR tells you what you can't do and what the penalty is for being caught in violation, just like a speeding law tells you what speed you can't exceed and what the penalty is for being caught.
It is newsworthy for a number of reasons. Firstly, most people do not know that companies are scanning their customers, suppliers and employees against these watchlists.
Secondly, people are placed on these watchlists with no burden of proof or right to recourse.
Thirdly, if you appear on these lists, which can be quite fuzzy, you can find that your bank accounts are frozen, with no explanation. Banks are now very risk averse, meaning that they are more than happy to alienate a few customers if it means avoiding the risk of massive fines.
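As a rough illustration of how "fuzzy" matching can catch the wrong person, here is a sketch of a naive name screen. Everything in it is an assumption for the sake of the example: the watchlist entries are invented, the 0.8 threshold is arbitrary, and plain string similarity from Python's `difflib` is a stand-in, not how any real screening product is known to work.

```python
from difflib import SequenceMatcher

# Hypothetical watchlist entries, invented purely for illustration.
WATCHLIST = ["Jon Smyth", "Maria Gonzales"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def screen(name: str, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return watchlist entries whose similarity to `name` meets the threshold."""
    hits = []
    for entry in WATCHLIST:
        score = similarity(name, entry)
        if score >= threshold:
            hits.append((entry, score))
    return hits


# A customer with a similar but different name is flagged anyway.
print(screen("John Smith"))
# An unrelated name passes cleanly.
print(screen("Alice Wonderland"))
```

Lowering the threshold catches more spelling variants of listed names but flags more unrelated customers, which is the trade-off a risk-averse bank resolves in favour of over-flagging.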
Compiling and updating this information requires many man hours, which has value, thus Dow Jones can receive payment for access to this database (and many are very willing to pay). It's an asset.
We are big aws customers at my current employer and have generally had success, and I use amazon products, but that said:
This is totally on Amazon for not having VPC-enabled Elasticsearch clusters for way too long, AND for not providing an upgrade mechanism to move an existing internet-accessible cluster to a VPC. I was mind-blown when I first used the Elasticsearch service and was sure that there would be data leaks because it only offered public endpoints.
While I agree and those defaults are certainly suboptimal with blame to share, I would argue the buck stops with the individual that indexed all the proprietary data on :9200 open to the internet. You can do all sorts of stupid things with AWS (or any other tool). That doesn't make it Amazon's fault entirely. The individual is responsible for attempting a basic understanding of the tools they use.
When I learned the ropes of ES, configuring the endpoint was one of the first things that came up in a large number of docs and posts. In this case, I also wonder if the person doing it even realized it would be a problem since the database was based on "Publicly Available data". "Sure, turn CORS on, let's roll."
Thankfully this leak was of public data combined into a proprietary reporting tool, rather than something more sinister that would cause greater harm.
Just so I'm clear -- the AWS Elasticsearch service was launched in October 2015, and VPC support didn't come around until fall 2017. So for over 2 full years the only way to utilize their Elasticsearch service was to run it internet-accessible. I'm not talking about defaults here -- it was the only option.