Many banks and hedge funds use kdb+/q for time-series databases. This (very expensive) software is virtually unheard of outside these niche domains. I've been using it for close to 5 years for high-frequency data, and honestly nothing out there comes close to kdb+.
I would be interested to hear what you consider high volume (writes/second). I'm supporting a manufacturing system that currently sits at 7,000 writes/second (a normal time series, i.e. id, time, value, quality).
7,000 writes per second is pretty low for many high-volume time-series needs.
Though it's usually not write throughput that these technologies worry about. It's compression using DSP methods, folding aggregate computations over streams, and so on, that matter (a rough sketch of the folding idea is at the end of this comment).
The systems I deal with start at millions of writes per second and go up from there. I have heard of systems that do over a billion writes per second, though I have not crossed that threshold personally (yet).
From an IoT or sensor network standpoint, 7000 writes/second is an idle server.
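For the curious, here's a minimal Python sketch of what I mean by folding aggregates over a stream: raw samples are folded into running per-sensor statistics instead of being stored individually. The sensor names and fields are made up for illustration.

    from collections import defaultdict

    # Fold each incoming (sensor_id, value) sample into running
    # aggregates rather than storing every raw point.
    aggs = defaultdict(lambda: {"n": 0, "sum": 0.0,
                                "min": float("inf"), "max": float("-inf")})

    def ingest(sensor_id, value):
        a = aggs[sensor_id]
        a["n"] += 1
        a["sum"] += value
        a["min"] = min(a["min"], value)
        a["max"] = max(a["max"], value)

    for sid, v in [("temp1", 21.5), ("temp1", 22.1), ("pump3", 0.93)]:
        ingest(sid, v)

    print(aggs["temp1"]["sum"] / aggs["temp1"]["n"])  # running mean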
I don't have numbers handy. The real power of kdb+/q comes from the column-oriented architecture and the extremely powerful vector functional language q, which is inspired by APL. I highly recommend checking out this article to get a sense of the APL family of languages: https://scottlocklin.wordpress.com/2013/07/28/ruins-of-forgo...
If you want a database for blazing-fast data storage and retrieval, there are many options available. You start seeing the real benefits of kdb+/q when you use q to simplify very complex operations that aren't easily done in SQL. Also, q's high-level operators make your code extremely terse. I've written complex backtesting systems that perform data mining on massive datasets, all in one page of very tight q code!
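To give non-q readers a taste: q's as-of join (aj) pairs each trade with the prevailing quote in a single line. The closest everyday analogue I know is pandas' merge_asof; this isn't q, just an illustration of the whole-column style, with made-up data.

    import pandas as pd

    trades = pd.DataFrame({
        "time": pd.to_datetime(["09:30:01", "09:30:05"]),
        "sym": ["AAPL", "AAPL"],
        "price": [101.0, 101.2],
    })
    quotes = pd.DataFrame({
        "time": pd.to_datetime(["09:30:00", "09:30:04"]),
        "sym": ["AAPL", "AAPL"],
        "bid": [100.9, 101.1],
    })

    # For each trade, find the latest quote at or before it (q's aj).
    print(pd.merge_asof(trades, quotes, on="time", by="sym"))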
There is a free 32-bit version of kdb+ available (http://kx.com/software-download.php). For the commercial version, pricing information is not publicly available.
The thing with kdb+ is that almost all deployments are in-memory. It isn't hard to make something run quickly when it has little persistence, or relegates persistence to second-class status.
Datomic is closed source and has features that no open source database currently offers. In particular, it's a time series database of immutable/append-only facts, so its horizontal read scalability is excellent, but it's still ACID and supports joins.
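A toy sketch of the append-only fact model (my own illustration, not Datomic's actual storage format): every fact is an immutable (entity, attribute, value, transaction) tuple, so querying "as of" an earlier transaction is just a filter.

    import itertools

    tx_counter = itertools.count(1)
    facts = []  # append-only log of (entity, attribute, value, tx)

    def assert_fact(e, a, v):
        facts.append((e, a, v, next(tx_counter)))

    def value_as_of(e, a, tx):
        # Latest value asserted at or before transaction tx.
        hits = [f for f in facts if f[0] == e and f[1] == a and f[3] <= tx]
        return hits[-1][2] if hits else None

    assert_fact("user/1", "email", "old@example.com")  # tx 1
    assert_fact("user/1", "email", "new@example.com")  # tx 2
    print(value_as_of("user/1", "email", 1))  # old@example.com
    print(value_as_of("user/1", "email", 2))  # new@example.com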
It's definitely a very slow database. You have to be extremely fortunate to have a problem that fits neatly into its niche. In the future I'd sooner figure out a historical insert-only schema for PostgreSQL. They're not great about fixing problems with Datomic either; it feels like an afterthought, a way to occupy labor not currently allocated to a Cognitect contract gig rather than a priority in its own right.
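For anyone wondering what that insert-only Postgres approach might look like, here's a minimal sketch (my reading of the idea, assuming a local database and psycopg2): changes are only ever appended, and the current state is the latest row per key.

    import psycopg2  # assumes a local PostgreSQL and psycopg2

    conn = psycopg2.connect("dbname=demo")
    cur = conn.cursor()

    # History table: rows are never updated or deleted; every change
    # is a new row stamped with its transaction time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS account_history (
            account_id  bigint      NOT NULL,
            balance     numeric     NOT NULL,
            recorded_at timestamptz NOT NULL DEFAULT now()
        )
    """)

    # Current state = latest row per account (Postgres DISTINCT ON).
    cur.execute("""
        SELECT DISTINCT ON (account_id) account_id, balance
        FROM account_history
        ORDER BY account_id, recorded_at DESC
    """)
    print(cur.fetchall())
    conn.commit()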
I think they've improved a lot WRT fixing problems. We had a chat with them after some issues with Datomic in production, and since then (6 months ago) every problem we've discovered has been fixed very promptly, and Datomic has continued to scale for us.
coolsunglasses - Why did Datomic seem slow to you? Can you describe the problems you had in detail? I'm not from Cognitect, just someone who is developing some products that currently use Datomic among a few other databases.
Would love to hear some honest feedback. Maybe your struggles were because of the tech, earlier versions, bad hardware config, or a misapplied use case?
Stardog[0], a semantic/ontological[1] database, is probably best in class, and is closed source. Anyone interested in writing an open source triplestore, email me ;)
[0] http://stardog.com/
[1] They've started calling it a graph database, though I think triplestore is the most correct name
When you're best in class you can afford to be proprietary.
Clark & Parsia had a history of open source (e.g. Pellet, which was the best in-memory reasoner for a long time, IMO), but not a lot of luck getting sustainable subscription revenue from it. That led to the switch to dual-licensed AGPL in 2008, and now to closed-source Stardog...
The nice thing is that if you were using Stardog and this happened, you could easily move to any of its competitors that implement the same standards, including open source ones. You might miss a feature, but at least your queries will still run and you won't need to redo your whole app.
SPARQL should really be everyone's first technology to investigate before heading off to anything else; while you are still pivoting every week you should use the most generic database tech possible. Only when you scale should you specialize.
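That portability is the point of the standard: the same SPARQL text runs against Stardog, Jena, Virtuoso, or a tiny in-memory store. A minimal sketch with rdflib (Python) and made-up example data:

    from rdflib import Graph  # pip install rdflib

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:alice ex:knows ex:bob .
        ex:bob   ex:knows ex:carol .
    """, format="turtle")

    # Standard SPARQL -- the same query would run unchanged against
    # any conformant triplestore, open source or proprietary.
    for row in g.query("""
        SELECT ?a ?b WHERE { ?a <http://example.org/knows> ?b }
    """):
        print(row.a, row.b)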
One could be "interested in writing an open source triplestore", but why would you go down that path rather than, say, optimizing the heck out of Neo4j?
Well, first, I don't have a high opinion of Neo4j. Secondly, SPARQL queries are pretty distinct, and while, yes, they can be translated into generic graph queries, I'm pretty sure there are some fun optimizations to be had if you focus 100% on their patterns. Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...
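One example of the kind of pattern-focused optimization I mean (my sketch, not any particular store's internals): keep the triples in several permutation indexes so that every shape of SPARQL triple pattern becomes a direct lookup.

    from collections import defaultdict

    # Three permutations of the same triples: any pattern with one
    # unknown -- (?s, p, o), (s, ?p, o), (s, p, ?o) -- is one lookup.
    spo = defaultdict(lambda: defaultdict(set))
    pos = defaultdict(lambda: defaultdict(set))
    osp = defaultdict(lambda: defaultdict(set))

    def add(s, p, o):
        spo[s][p].add(o)
        pos[p][o].add(s)
        osp[o][s].add(p)

    add("alice", "knows", "bob")
    add("carol", "knows", "bob")

    # Pattern (?s, knows, bob): answered directly from the POS index.
    print(pos["knows"]["bob"])  # {'alice', 'carol'}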
> Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...
Presumably, because you have a business, which has a product, which has a nascent feature, which requires some particular set of time-and-space-and-distribution guarantees that no current database on the market makes. This is why, for example, Cassandra was developed.
Do you mean forking the codebase or layering something like N3 over it? (btw, last I checked the Neo4j community version could only scale up and the distributed version was commercial.)
While there are some OSS column-store DBs, Vertica is a very well-put-together solution.
It's very fast, it scales reasonably well, it covers a wide range of analytic functions, and the vendor support is good.
FoundationDB has ACID transactions over the entire database, across the cluster, and across multiple keys. And it's fast.
I've looked over so many open source alternatives, and they claim ACID, but it's a deception based on some narrow interpretation of ACID. It's very annoying to spend time researching just to discover the truth between the lines.
I would love to find a fast, scalable open source DB that implements FoundationDB's features.
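For concreteness, this is what those multi-key, cluster-wide transactions look like through FoundationDB's Python bindings (assuming a running cluster; the keys and the transfer helper are made up):

    import fdb  # official FoundationDB Python bindings

    fdb.api_version(630)
    db = fdb.open()  # uses the local cluster file

    # The decorator retries the whole function as one ACID transaction,
    # even when the two keys live on different storage servers.
    @fdb.transactional
    def transfer(tr, src, dst, amount):
        a = tr[src]
        b = tr[dst]
        a = int(a) if a.present() else 0
        b = int(b) if b.present() else 0
        tr[src] = str(a - amount).encode()
        tr[dst] = str(b + amount).encode()

    transfer(db, b"acct/alice", b"acct/bob", 10)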
Clustered indexes. Data stored in the index. Restoring your database should not be measured in hours for just millions of rows. Statement generation for backups? If you had clustered indexes you'd never finish restoring.
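For anyone unfamiliar with "data in-index": in a clustered (index-organized) table, the rows themselves live in the primary-key B-tree, so key-ordered reads never touch a separate heap. SQLite's WITHOUT ROWID tables are a small, concrete example of the layout (schema is made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # WITHOUT ROWID stores the whole row inside the primary-key
    # B-tree -- an index-organized (clustered) table.
    conn.execute("""
        CREATE TABLE readings (
            sensor_id INTEGER,
            ts        INTEGER,
            value     REAL,
            PRIMARY KEY (sensor_id, ts)
        ) WITHOUT ROWID
    """)
    conn.execute("INSERT INTO readings VALUES (1, 1700000000, 21.5)")

    # A key-ordered range scan reads the data straight from the index.
    print(conn.execute(
        "SELECT ts, value FROM readings WHERE sensor_id = 1 ORDER BY ts"
    ).fetchall())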