Many banks and hedge funds use kdb+/q for time-series databases. This (very expensive) software is virtually unheard of outside these niche domains. I've been using it for close to 5 years for high-frequency data, and honestly nothing out there comes close to kdb+.
I would be interested to hear what you consider high volume (writes/second). I'm supporting a manufacturing system that currently sits at 7,000 writes/second (a normal time series, i.e. id, time, value, quality).
7,000 writes per second is pretty low for many high-volume time-series needs.
Though it's usually not write throughput that these technologies worry about. It's compression using DSP methods, folding aggregate computations over streams, and so on, that matter (a rough sketch of the folding idea is at the end of this comment).
The systems I deal with start at millions of writes per second and go up from there. I have heard of systems that do over a billion writes per second, though I have not crossed that threshold personally (yet).
From an IoT or sensor network standpoint, 7000 writes/second is an idle server.
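For the curious, here's a minimal Python sketch of what I mean by folding aggregates over a stream: raw samples are folded into running per-sensor statistics instead of being stored individually. The sensor names and fields are made up for illustration.

    from collections import defaultdict

    # Fold each incoming (sensor_id, value) sample into running
    # aggregates rather than storing every raw point.
    aggs = defaultdict(lambda: {"n": 0, "sum": 0.0,
                                "min": float("inf"), "max": float("-inf")})

    def ingest(sensor_id, value):
        a = aggs[sensor_id]
        a["n"] += 1
        a["sum"] += value
        a["min"] = min(a["min"], value)
        a["max"] = max(a["max"], value)

    for sid, v in [("temp1", 21.5), ("temp1", 22.1), ("pump3", 0.93)]:
        ingest(sid, v)

    print(aggs["temp1"]["sum"] / aggs["temp1"]["n"])  # running mean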
I don't have numbers handy. The real power of kdb+/q comes from the column-oriented architecture and the extremely powerful vector functional language q, which is inspired by APL. I highly recommend checking out this article to get a sense of the APL family of languages: https://scottlocklin.wordpress.com/2013/07/28/ruins-of-forgo...
If you want a database for blazing-fast data storage and retrieval, there are many options available. You start seeing the real benefits of kdb+/q when you use q to simplify very complex operations that aren't easily done in SQL. Also, q's high-level operators make your code extremely terse. I've written complex backtesting systems that perform data mining on massive datasets, all in one page of very tight q code!
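To give non-q readers a taste: q's as-of join (aj) pairs each trade with the prevailing quote in a single line. The closest everyday analogue I know is pandas' merge_asof; this isn't q, just an illustration of the whole-column style, with made-up data.

    import pandas as pd

    trades = pd.DataFrame({
        "time": pd.to_datetime(["09:30:01", "09:30:05"]),
        "sym": ["AAPL", "AAPL"],
        "price": [101.0, 101.2],
    })
    quotes = pd.DataFrame({
        "time": pd.to_datetime(["09:30:00", "09:30:04"]),
        "sym": ["AAPL", "AAPL"],
        "bid": [100.9, 101.1],
    })

    # For each trade, find the latest quote at or before it (q's aj).
    print(pd.merge_asof(trades, quotes, on="time", by="sym"))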
There is a free 32-bit version of kdb+ available (http://kx.com/software-download.php). For the commercial version, pricing information is not publicly available.
The thing with kdb+ is that almost all deployments are in-memory. It isn't hard to make something run quickly when it has little persistence, or relegates persistence to second-class status.
Datomic is closed source and has features that no open source database currently offers. In particular, it's a time series database of immutable/append-only facts, so its horizontal read scalability is excellent, but it's still ACID and supports joins.
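A toy sketch of the append-only fact model (my own illustration, not Datomic's actual storage format): every fact is an immutable (entity, attribute, value, transaction) tuple, so querying "as of" an earlier transaction is just a filter.

    import itertools

    tx_counter = itertools.count(1)
    facts = []  # append-only log of (entity, attribute, value, tx)

    def assert_fact(e, a, v):
        facts.append((e, a, v, next(tx_counter)))

    def value_as_of(e, a, tx):
        # Latest value asserted at or before transaction tx.
        hits = [f for f in facts if f[0] == e and f[1] == a and f[3] <= tx]
        return hits[-1][2] if hits else None

    assert_fact("user/1", "email", "old@example.com")  # tx 1
    assert_fact("user/1", "email", "new@example.com")  # tx 2
    print(value_as_of("user/1", "email", 1))  # old@example.com
    print(value_as_of("user/1", "email", 2))  # new@example.com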
It's definitely a very slow database. You have to be extremely fortunate to have a problem that fits neatly into its niche. In the future I'd sooner figure out a historical insert-only schema for PostgreSQL. They're not great about fixing problems with Datomic either; it feels like an afterthought, a way to occupy labor not currently allocated to a Cognitect contract gig rather than a priority in its own right.
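For anyone wondering what that insert-only Postgres approach might look like, here's a minimal sketch (my reading of the idea, assuming a local database and psycopg2): changes are only ever appended, and the current state is the latest row per key.

    import psycopg2  # assumes a local PostgreSQL and psycopg2

    conn = psycopg2.connect("dbname=demo")
    cur = conn.cursor()

    # History table: rows are never updated or deleted; every change
    # is a new row stamped with its transaction time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS account_history (
            account_id  bigint      NOT NULL,
            balance     numeric     NOT NULL,
            recorded_at timestamptz NOT NULL DEFAULT now()
        )
    """)

    # Current state = latest row per account (Postgres DISTINCT ON).
    cur.execute("""
        SELECT DISTINCT ON (account_id) account_id, balance
        FROM account_history
        ORDER BY account_id, recorded_at DESC
    """)
    print(cur.fetchall())
    conn.commit()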
I think they've improved a lot WRT fixing problems. We had a chat with them after some issues with Datomic in production, and since then (6 months ago) every problem we've discovered has been fixed very promptly, and Datomic has continued to scale for us.
coolsunglasses - Why did Datomic seem slow to you? Can you describe the problems you had in detail? I'm not from Cognitect, just someone who is developing some products that currently use Datomic among a few other databases.
Would love to hear some honest feedback. Maybe your struggles were because of the tech, earlier versions, bad hardware config, or a misapplied use case?
Stardog[0], a semantic/ontological[1] database, is probably best in class, and is closed source. Anyone interested in writing an open source triplestore, email me ;)
[0] http://stardog.com/
[1] They've started calling it a graph database, though I think triplestore is the most correct name
When you're best in class you can afford to be proprietary.
Clark & Parsia had a history of open source (e.g. Pellet, which was the best in-memory reasoner for a long time, IMO), but not a lot of luck getting sustainable subscription revenue from it. That led to the switch to dual-licensed AGPL in 2008, and now to closed-source Stardog...
The nice thing is that if you were using Stardog and this happened, you could easily move to any of its competitors that implement the same standards, including open source ones. You might miss a feature, but at least your queries will still run and you won't need to redo your whole app.
SPARQL should really be everyone's first technology to investigate before heading off to anything else; while you are still pivoting every week you should use the most generic database tech possible. Only when you scale should you specialize.
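That portability is the point of the standard: the same SPARQL text runs against Stardog, Jena, Virtuoso, or a tiny in-memory store. A minimal sketch with rdflib (Python) and made-up example data:

    from rdflib import Graph  # pip install rdflib

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:alice ex:knows ex:bob .
        ex:bob   ex:knows ex:carol .
    """, format="turtle")

    # Standard SPARQL -- the same query would run unchanged against
    # any conformant triplestore, open source or proprietary.
    for row in g.query("""
        SELECT ?a ?b WHERE { ?a <http://example.org/knows> ?b }
    """):
        print(row.a, row.b)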
One could be "interested in writing an open source triplestore", but why would you go down that path rather than, say, optimizing the heck out of Neo4j?
Well, first, I don't have a high opinion of Neo4j. Secondly, SPARQL queries are pretty distinct, and while, yes, they can be translated into generic graph queries, I'm pretty sure there are some fun optimizations to be had if you focus 100% on their patterns. Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...
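One example of the kind of pattern-focused optimization I mean (my sketch, not any particular store's internals): keep the triples in several permutation indexes so that every shape of SPARQL triple pattern becomes a direct lookup.

    from collections import defaultdict

    # Three permutations of the same triples: any pattern with one
    # unknown -- (?s, p, o), (s, ?p, o), (s, p, ?o) -- is one lookup.
    spo = defaultdict(lambda: defaultdict(set))
    pos = defaultdict(lambda: defaultdict(set))
    osp = defaultdict(lambda: defaultdict(set))

    def add(s, p, o):
        spo[s][p].add(o)
        pos[p][o].add(s)
        osp[o][s].add(p)

    add("alice", "knows", "bob")
    add("carol", "knows", "bob")

    # Pattern (?s, knows, bob): answered directly from the POS index.
    print(pos["knows"]["bob"])  # {'alice', 'carol'}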
> Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...
Presumably, because you have a business, which has a product, which has a nascent feature, which requires some particular set of time-and-space-and-distribution guarantees that no current database on the market makes. This is why, for example, Cassandra was developed.
Do you mean forking the codebase or layering something like N3 over it? (btw, last I checked the Neo4j community version could only scale up and the distributed version was commercial.)
While there are some OSS column-store DBs, Vertica is a very well-put-together solution.
It's very fast, it scales reasonably well, it covers a wide range of analytic functions, and the vendor support is good.
FoundationDB has ACID transactions over the entire database, across the cluster, and across multiple keys. And it's fast.
I've looked over so many open source alternatives, and they claim ACID, but it's a deception based on some narrow interpretation of ACID. It's very annoying to spend time researching just to discover the truth between the lines.
I would love to find a fast, scalable open source DB that implements FoundationDB's features.
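For concreteness, this is what those multi-key, cluster-wide transactions look like through FoundationDB's Python bindings (assuming a running cluster; the keys and the transfer helper are made up):

    import fdb  # official FoundationDB Python bindings

    fdb.api_version(630)
    db = fdb.open()  # uses the local cluster file

    # The decorator retries the whole function as one ACID transaction,
    # even when the two keys live on different storage servers.
    @fdb.transactional
    def transfer(tr, src, dst, amount):
        a = tr[src]
        b = tr[dst]
        a = int(a) if a.present() else 0
        b = int(b) if b.present() else 0
        tr[src] = str(a - amount).encode()
        tr[dst] = str(b + amount).encode()

    transfer(db, b"acct/alice", b"acct/bob", 10)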
Clustered indexes. Data stored in the index. Restoring your database should not be measured in hours for just millions of rows. Statement generation for backups? If you had clustered indexes you'd never finish restoring.
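For anyone unfamiliar with "data in-index": in a clustered (index-organized) table, the rows themselves live in the primary-key B-tree, so key-ordered reads never touch a separate heap. SQLite's WITHOUT ROWID tables are a small, concrete example of the layout (schema is made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # WITHOUT ROWID stores the whole row inside the primary-key
    # B-tree -- an index-organized (clustered) table.
    conn.execute("""
        CREATE TABLE readings (
            sensor_id INTEGER,
            ts        INTEGER,
            value     REAL,
            PRIMARY KEY (sensor_id, ts)
        ) WITHOUT ROWID
    """)
    conn.execute("INSERT INTO readings VALUES (1, 1700000000, 21.5)")

    # A key-ordered range scan reads the data straight from the index.
    print(conn.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = 1 ORDER BY ts"
    ).fetchall())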