What I don't get about Shodan: why weren't all unsecured databases found instantly (the moment Shodan went online)? Why do we instead see recurring attacks/dumps like this one that rely on it? Do they update their crawl data in waves?
One real limitation is getting data out of Shodan. Having done a few projects that involve large-scale use of Shodan results (e.g. several hundred thousand records), this kind of thing usually ends up costing around $300 for either export credits or a service plan. Sure, $300 isn't really that much to cause millions in damage, but I think it's a big factor in why we don't often see Shodan used for huge-scale malfeasance. You have to put up the money, and in paying you probably give up some identifying info. I don't know if Shodan has complied with law enforcement in the past, but I can sure see them getting a warrant for "the person who just spent hundreds to export and/or query every unsecured MongoDB."
Also, as a bit of an aside, the relationship between "export credits" and "query credits," and the export system and API of Shodan, are extremely confusing and just a bad bit of product design. Each one seems to be capable of things the other isn't, but they're priced on totally different systems.
But really it's mostly just a matter of motivation, I think. Pulling even just thousands of entries from Shodan, writing some software to use them, and then running it in a reasonably deniable way takes effort and is pretty slow (which is why we see this going on for multiple days). It's not a huge amount of effort, but it's enough that "script kiddie" types don't really seem to do it; you need to be motivated and spend the time on it.
Contrary to security urban legend, it seems the number of people who are highly motivated purely to cause damage is not actually that large. People only put in the time if they can figure out a way to gain from it, and just deleting data doesn't really achieve that. You've got to figure out a way to hold it for ransom and/or collect and leverage sensitive data. We've seen both happening on various scales with this kind of unsecured database, and we'll probably see more of both going forward... but keep in mind that in the ransomware game, encrypting computers is both easier (established off-the-shelf ransomware can be purchased) and probably shows higher returns, so the "professionals" aren't spending a lot of time messing around with exposed databases.
We're actually getting rid of export credits because it's caused confusion over the years. We now just have query credits to download data/ do searches, and scan credits for users that want to request on-demand scans. We announced this change in the most recent Shodan Update newsletter. You can already use our new website (https://beta.shodan.io) to download data using your query credits.
Export credits were the first way I tried to monetize Shodan and it became a legacy system that lots of companies used so I was hesitant to get rid of it until something better was in place.
I'll also add that the API was purposely not designed for downloading lots of search results. The API is designed for security operations center (SOC) use cases. Companies that need large-scale, bulk access to our data would need to check out our enterprise platform (https://enterprise.shodan.io).
This is what I've assumed, but it's in a pretty uncomfortable place right now; e.g., the documentation often refers to export credits via broken links.
The API is somewhat unsuitable for exporting large volumes because it seems remarkably unstable with respect to ordering: it suggests that you can do paginated requests, but the second page tends to have ~30% overlap with the first.
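The practical workaround is to treat pages as possibly overlapping and de-duplicate on your side. Here's a minimal sketch assuming the official `shodan` Python client; the query string, API key placeholder, and choice of de-duplication key are illustrative, not prescriptive.

```python
# Sketch: merging paginated Shodan search results despite ordering drift
# between requests. Each match is keyed on (ip_str, port, timestamp), which
# stays stable even if new results are indexed between page fetches.

def merge_pages(pages):
    """Combine pages of Shodan matches, dropping records already seen."""
    seen, merged = set(), []
    for page in pages:
        for match in page:
            key = (match.get("ip_str"), match.get("port"), match.get("timestamp"))
            if key not in seen:
                seen.add(key)
                merged.append(match)
    return merged

# Hypothetical usage against the live API (requires an API key and credits):
#   import shodan
#   api = shodan.Shodan("YOUR_API_KEY")
#   pages = [api.search("product:MongoDB", page=p)["matches"] for p in (1, 2)]
#   results = merge_pages(pages)
```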
I 100% understand the product motive to move large exports to an "Enterprise" feature, but it's rather disappointing because as a small-scale independent operation I don't expect to be able to afford it, and that would go for a lot of productive people in security research. But then, that's capitalism.
I decided that a broken link is better than having people spend money on something that will be deprecated. We're obviously working on cleaning up those broken links, but it's an easy thing to explain if anybody emails support@shodan.io.
The ordering is based on timestamp, and it can happen that new results were indexed in between your 1st and 2nd request, which creates overlapping results. A 30% overlap is unusual and sounds like it's for a query with many results.
Finally, most researchers don't actually need to download data. They could just use our free API and facet queries to get the information without downloading the actual data. This entire website is powered by a free API key that uses facets:
I think a lot of researchers go into the default mode of "I want to have the data" but using facets is way easier, faster and doesn't cost any money at all. And you can navigate the available facets using our new beta website (another area we're trying to make things a bit clearer). For example:
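To make the facet point concrete, here's a small sketch assuming the official `shodan` Python client, whose `count()` call returns totals and facet breakdowns without downloading individual records; the query and facet choices are illustrative.

```python
# Sketch: answering a research question with a facet/count query instead of
# downloading result records.

def top_facet_values(count_response, facet_name):
    """Return (value, count) pairs for one facet of a Shodan count() response."""
    return [(f["value"], f["count"])
            for f in count_response.get("facets", {}).get(facet_name, [])]

# Hypothetical usage against the live API:
#   import shodan
#   api = shodan.Shodan("YOUR_API_KEY")
#   resp = api.count("product:MongoDB", facets=[("country", 10), ("org", 10)])
#   print(resp["total"])                       # number of matching hosts
#   for country, n in top_facet_values(resp, "country"):
#       print(country, n)                      # breakdown by country
```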
Note that we provide free upgrades to universities/ students/ professors as well as routinely work together with researchers so we're not trying to push them into the enterprise product. We also let universities monitor up to ~120k IPs for free using Shodan Monitor (https://monitor.shodan.io). But the use cases for researchers are few and we figure that if you need lots of data then you can send us an email.
A story like this pops up every year, and preventing ransomware/ mass-deletion of publicly exposed databases has proven very challenging. I mean, I wrote about this issue 5 years ago:
We've also sent the raw data to various database vendors for free but even for them it's difficult to reach out to customers to get it fixed. And then there's always the worry that you'll get shot as the messenger of bad news. We've had a lot more success in getting things taken offline when we already have some relationship with the organization or at least a mutual customer.
In the past, older versions of MongoDB were more public than newer versions but that isn't the case anymore based on what we're seeing right now:
And in terms of Shodan, we crawl 24/7 (i.e. not in waves) and update the search engine as the data is collected, with a small delay (<1 hour), so anybody who gets real-time notifications (https://monitor.shodan.io) for their networks will see it before it shows up in the search index.