What is an example of something done much better by a closed source database (compared to open source)?


Many banks and hedge funds use kdb+/q for time series databases. This (very expensive) software is practically unheard of outside of these niche domains. I've been using it for close to 5 years for high frequency data, and honestly nothing out there comes close to the awesomeness of kdb+.


I would be interested to hear what you consider high volume (writes/second). I am supporting a manufacturing system, and at the moment it sits at 7,000 writes/second (it is a normal time series, i.e. id, time, value, quality).


7000 writes per second is pretty low for many high-volume time series needs.

Though it's not usually write throughput that most of these technologies are worried about. It's usually compression using DSP methods, streaming aggregate computations, etc. that matter.
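To make "aggregate stream folding" concrete, here is a minimal sketch in Python (not any particular product's API) of folding aggregates over a stream so you never have to store or re-scan the raw points:

```python
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Fold count/sum/min/max over a stream without keeping the raw points."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def push(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0

stats = RunningStats()
for v in [3.0, 1.0, 4.0, 1.0, 5.0]:
    stats.push(v)

print(stats.count, stats.lo, stats.hi, stats.mean)  # 5 1.0 5.0 2.8
```

The point is that each incoming write costs O(1) work and O(1) memory, which is why write throughput alone is rarely the hard part.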


Still waiting to hear what is considered high. Compression etc. are not part of the question.


Just queried my db, and 300k a second is a typical peak in CME futures (across all products). I'd imagine that options could be a lot more than that.


If I may ask, what are you using as db?


kdb


I worked in ad serving. At peak moments we were doing around 350,000 updates a second.


Chill, dude, I was being conversational. Your attitude makes me not even want to provide you with my own data now.


That took some restraint. Thanks for being nice. I got your point originally, but some people think everything needs to be an argument, or worse, a fight.


Engineers wanting data isn't exactly surprising. I would not consider it attitude.


The systems I deal with start at millions of writes per second and go up from there. I have heard of systems that do over a billion writes per second, though I have not breached that threshold personally (yet).

From an IoT or sensor network standpoint, 7000 writes/second is an idle server.


what db are you using?


I don't have numbers handy. The real power of kdb+/q comes from the column-oriented architecture and the extremely powerful vector functional language q. The language q is inspired by APL. I highly recommend checking out this article to get a sense of the APL family of languages: https://scottlocklin.wordpress.com/2013/07/28/ruins-of-forgo...

If you want a database for blazing fast data storage and retrieval, there are many options available. You start seeing the real benefits of kdb+/q when you use q to simplify very complex operations that aren't easily done in SQL. Also, q's high-level operators make your code extremely terse. I've written complex backtesting systems that perform data mining on massive datasets, all in one page of very tight q code!
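For readers unfamiliar with column-oriented storage, here is a rough sketch in plain Python (not q, and the tickers/fields are made up) of why analytic queries favor columns over rows:

```python
# Row store: one record per dict. Column store: one list per field.
rows = [
    {"sym": "ES", "px": 4500.25, "size": 10},
    {"sym": "NQ", "px": 15500.5, "size": 2},
    {"sym": "ES", "px": 4501.00, "size": 5},
]

cols = {
    "sym":  [r["sym"] for r in rows],
    "px":   [r["px"] for r in rows],
    "size": [r["size"] for r in rows],
}

# An analytic query ("volume-weighted avg price for ES") only touches the
# columns it needs -- this is why column stores scan so fast.
mask = [s == "ES" for s in cols["sym"]]
notional = sum(p * z for p, z, m in zip(cols["px"], cols["size"], mask) if m)
volume = sum(z for z, m in zip(cols["size"], mask) if m)
vwap = notional / volume
print(vwap)  # 4500.5
```

In q the same query is a one-liner over whole columns at once; the sketch only shows the storage layout idea, not q's terseness.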


Are you using SQL or ?


ISAM


How much would the price be?


There is a free 32-bit version of kdb available (http://kx.com/software-download.php). For the commercial version, pricing information is not publicly available.


The zip files on that page contain the source for kdb, so I thought I'd take a look at how it works... nope! It is impenetrable.


It's impenetrable to you, in the same way Mandarin is impenetrable at first glance to someone used to reading Latin languages.


The thing with kdb is that almost all uses of it are in-memory deployments. It isn't hard to make something run quickly when it has little persistence, or relegates persistence to second-class status.


Well, but it does not exist as open source.


no, but std::map does


Certainly, but it does not provide the same benefits as the complete package.


Not even J?


Datomic is closed source and has features that no open source database currently offers. In particular, it's a time series database of immutable/append-only facts, so its horizontal read scalability is excellent, but it's still ACID and supports joins.
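As a rough illustration of the "immutable/append-only facts" model (a toy sketch, not Datomic's actual datom API), the idea is that nothing is ever updated in place; you only append assertions and retractions, which is also what makes as-of ("time travel") reads cheap:

```python
import itertools

class FactStore:
    """Toy append-only fact log: facts are only asserted or retracted,
    never mutated, so any past state can be reconstructed."""
    def __init__(self):
        self.log = []                 # (entity, attr, value, tx, added?)
        self._tx = itertools.count(1)

    def assert_(self, e, a, v):
        self.log.append((e, a, v, next(self._tx), True))

    def retract(self, e, a, v):
        self.log.append((e, a, v, next(self._tx), False))

    def value(self, e, a, as_of=None):
        """Read entity e's attribute a as of a transaction id."""
        current = None
        for (ee, aa, vv, tx, added) in self.log:
            if as_of is not None and tx > as_of:
                break
            if ee == e and aa == a:
                current = vv if added else None
        return current

db = FactStore()
db.assert_("user/1", "email", "old@example.com")   # tx 1
db.assert_("user/1", "email", "new@example.com")   # tx 2
print(db.value("user/1", "email"))                 # new@example.com
print(db.value("user/1", "email", as_of=1))        # old@example.com
```

Because the log is immutable, read replicas can cache any prefix of it forever, which is where the horizontal read scalability comes from.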


If I remember correctly, Datomic is more of a data modeling layer / transaction manager, and less of a database.


It's definitely a database.


It's definitely a very slow database. You have to be extremely fortunate to have a problem that fits into its niche neatly. In the future I'd sooner figure out a historical insert-only schema for PostgreSQL. They're not great about fixing problems with Datomic either; it feels like an afterthought, a way to soak up labor not currently allocated to a Cognitect contract gig, not a priority in its own right.

-- sad production ex-user of Datomic


I think they've improved a lot WRT fixing problems. We had a chat with them after some issues with Datomic in production, and since then (6 months ago) every problem we've discovered has been fixed very promptly, and Datomic has continued to scale for us.


coolsunglasses - Why did Datomic seem slow to you? Can you describe the problems you had in detail? I'm not from Cognitect, just someone who is developing some products that currently use Datomic among a few other databases.

Would love to hear some honest feedback. Maybe your struggles were because of the tech, earlier versions, bad hardware config, or a mis-applied use case?


MS SQL Server: who else has such easy and varied ways to cluster/mirror/replicate?


MongoDB!


Stardog[0], a semantic/ontological[1] database, is probably best in class, and is closed source. Anyone interested in writing an open source triplestore, email me ;)

[0] http://stardog.com/ [1] They've started calling it a graph database, though I think triplestore is the most correct name


When you're best in class you can afford to be proprietary.

Clark & Parsia had a history of open source (e.g. Pellet, which was the best in-memory reasoner for a long time IMO)... but not a lot of luck getting sustainable business subscription revenue. This led to the switch to dual-license AGPL in 2008, and now the closed-source Stardog...


The nice thing is that if you were using Stardog and this happened, you could easily move to any of its competitors that implement the same standards, including open source ones. I.e. you might miss a feature, but at least your queries will still run and you won't need to redo your whole app.

SPARQL should really be everyone's first technology to investigate before heading off to anything else. I.e. when you are still pivoting every week, you should have the most generic database tech possible. Only when you scale should you specialize.
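For anyone who hasn't met the triple model: a triplestore is just a set of (subject, predicate, object) facts, and SPARQL queries are patterns over them. A minimal sketch in Python (made-up data, and real engines index all three positions rather than scanning):

```python
# A triplestore is a set of (subject, predicate, object) facts.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
}

def match(pattern, store):
    """Match one (s, p, o) pattern against the store; None is a variable,
    like ?x in SPARQL."""
    s, p, o = pattern
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Roughly: SELECT ?p ?o WHERE { :alice ?p ?o }
print(sorted(match(("alice", None, None), triples)))
```

A full engine joins many such patterns and adds inference on top, but the data model itself really is this simple, which is why migrating between standards-compliant stores is feasible.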


One could be "interested in writing an open source triplestore", but why would you go down that path rather than, say, optimizing the heck out of Neo4j?


If you want to help "optimizing the heck out of Neo4j", we are hiring http://neo4j.com/jobs/


Well, first, I don't have a high opinion of Neo4j. Secondly, SPARQL queries are pretty distinct, and while, yes, they can be translated into generic graph queries, I'm pretty sure there are some fun optimizations to be had if you focus on their patterns 100%. Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...


> Thirdly, because it'd be a hell of a lot of fun! Why else would anyone write a database...

Presumably, because you have a business, which has a product, which has a nascent feature, which requires some particular set of time-and-space-and-distribution guarantees that no current database on the market makes. This is why, for example, Cassandra was developed.


Do you mean forking the codebase or layering something like N3 over it? (btw, last I checked the Neo4j community version could only scale up and the distributed version was commercial.)


While there are some OSS column-store DBs, Vertica is a very well put together solution. It's very fast, it scales reasonably well, it supports a wide range of analytic functions, and it comes with good support.


FoundationDB has ACID transactions over the entire DB, over the cluster, and over multiple keys. And it's fast. I've looked over so many open source alternatives, and they claim ACID, but it's a deception based on some narrow interpretation of ACID. It's very annoying to spend time researching to discover the truth between the lines.

I would love to find a fast, scalable open source DB that implements FoundationDB's features.
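To illustrate what "ACID over multiple keys" buys you: all keys touched by a transaction commit or fail together, so no reader ever sees a half-applied update. A toy sketch in Python (a single global lock for serializability; FoundationDB actually uses optimistic concurrency, but the guarantee presented to the caller has the same shape):

```python
import threading

class KV:
    """Toy key-value store with atomic multi-key transactions."""
    def __init__(self):
        self.data = {}
        self._lock = threading.Lock()

    def transact(self, fn):
        with self._lock:
            snapshot = dict(self.data)   # work on a private copy...
            fn(snapshot)
            self.data = snapshot         # ...then commit all keys at once

db = KV()

def transfer(store):
    # Two keys updated in one transaction: both commit or neither does.
    store["a"] = store.get("a", 100) - 30
    store["b"] = store.get("b", 0) + 30

db.transact(transfer)
print(db.data["a"], db.data["b"])  # 70 30
```

The "narrow interpretations" the parent complains about are stores that only guarantee this per single key or per document, which forces application code to handle partial failures.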


> What is an example of something done much better by a closed source database (compared to open source)?

How about FoundationDB? ;-)


That's not the question. A correct answer would say what FoundationDB does better.


Okay Mr. Pedantic, high-performance distributed ACID transactions with an optional SQL layer on top.


While I very much don't like Oracle as a company, I'm not aware of any other DB with something like Flashback.

EDIT: Also, I'm not sure there's a production-ready free software column-store DB.


> I'm not aware of any other DB with something like Flashback.

PostgreSQL had it long before Oracle, but it was dropped as being too much of a hassle to maintain somewhere in the 7->8 transition, IIRC.


I haven't really found an open source Vertica-style columnar store either, and I find this mystifying.


Clustered indexes. Data stored in the index. Restoring your database should not be measured in hours for just millions of rows. Statement generation for backups? If you had clustered indexes you'd never finish restoring.


Oracle. Who else can give me half the database for ten times the cost?



