More

rgo · on Aug 29, 2017

Here's my personal version of the story:

With SQL DBs (Oracle, SQLServer and MySQL):

1. SQL database migrations where killing us. Going back and forward in a dev environment was impossible. No hot deploy in production.

2. Could not work well with application user-defined fields: adding columns adhoc to the database, indexing them, normalizing and denormalizing, performance issues, everything was a problem.

3. Blobs holding logging data got unmanageable quickly.

4. Joins where very hard to optimize even though the team had a lot of DBA experience fine tuning databases.

5. Had to build a very complex architecture around the database for a product that was not that complex: cache, search, database, blob store, distributed, etc.

And with all our 1990s and 2000s previous experiences in data warehousing, business intelligence and DB optimization tools, we were still wasting valuable time with SQL design, indexing, query planning and parameter optimization. So we gave MongoDB a try. First as a cache. Later as the only DB.

Our journey:

1. Heard about Mongo. Tried the DB. The driver worked great. To me that's the number one "marketing antics behind MongoDB": their strategy creating drivers and supporting the programmer community.

2. Understood what NoSQL meant and forgot about joins altogether.

3. Understood what NoSQL meant and built transactions into atomic documents.

4. Understood what NoSQL meant and stopped relying on the database for type, primary and foreign key constraints, default values, triggers (argh!), stored procedures (2x argh!), etc.

5. Simplified the architecture with integrated search, queue and cache. Less moving parts = joy.

6. Result: very low maintenance, easy install, configuration, replication and migrations. 99.999% availability.

7. Bonus: we even implemented a very high frequency, atomic distributed semaphore system with a FIFO queue that reaps zombies using Mongo built-in networking features.

So we've reduced DB-related issues by an order of magnitude. How? I think because NoSQL is a way of saying the DB should not be magically answering random queries. A database should be a data store, period -- just store and retrieve data the way the app needs it. By focusing our energies on getting the data right as documents for a document store meant data flows as objects from code in and out of Mongo.

I believe people underestimate how important (and productive) it is to keep the same data structures flowing between the UI (JSON), server (Object/Hash/Dictionary) and DB (document). It makes code easier to read and more resillient to errors.

But SQL DBs come with a convenience layer bolted on to run random user queries with things like OUTER joins and GROUP BYs. For that we need to flatten data into tables, which clashes with typically how data flows in an app.

SQL DBs however are great as the single source of truth for data: a schema can be laid out and enforced independently of code, so it's safely guarded from programmers breaking it. Business sets up a SQL DB so that their reporting people can query data on demand while consultants with zero knowledge of the business can write code limited by constraints managed by DBAs. SQL is even taught at business schools, which is revealing of who its target audience actually is.

Bottom-line: SQL and schema enforcing are end-user features we did not need to build our tool. On the other hand, every single MongoDB feature is something we need and use profusely.

DaiPlusPlus · on Aug 29, 2017

> 2. Understood what NoSQL meant and forgot about joins altogether.

How would you represent a simple invoicing system in MongoDB (e.g. Customers + Products + Orders + OrderLineItems )? NoSQL-for-everything advocates posit two solutions: either denormalize the data by embedding Customer information within an Order document, which also contains an array of OrderLineItems, or use a UUID as a kind-of foreign key and maintain separate relationships. Both approaches have serious problems (data-duplication and inevitable inconsistency in the first, and lack of referential integrity in the second, besides ending-up abusing a NoSQL database as an RDBMS). Is there a better way? Or would you agree that certain classes of problems are best left to RDBMS' domain?

rmrfrmrf · on Aug 29, 2017

The example you've used (invoices) is actually quite instructive for demonstrating the benefits of a "document store." An invoice, historically, was a literal printed piece of paper. Invoices are actually really annoying to implement in an RDBMS because of so-called "referential integrity" -- an invoice should be a "snapshot in time" of everything that happened when the order was processed, so ideally, when a user views their invoices from the past 2 years, they look the same every time.

Except, oops, your user got married and moved, now your precious "referential integrity" means jack because the generated invoice is flat-out wrong. Product removed from the store? Too bad, needs to stay in the database forever for historical purposes. Prices need to change? Better design the database to handle snapshots of every product state.

If you were implementing this in MongoDB, you'd probably store a UUID and the flattened data at the time of invoice generation, that way you can still query on ids AND not deal with the headache of having a combinatorial explosion of data in your RDBMS.

icedchai · on Aug 29, 2017

You would solve this in a RDBMS the same way: de-normalize when you're saving the invoice (example: a line items table with snap shot of current item price, description, etc.)

rmrfrmrf · on Aug 30, 2017

Yes, which suggests that the "serious problems" mentioned by the grandparent aren't serious (or problems) at all.

purerandomness · on Aug 31, 2017

In Postgres, you'd simply have a table with a JSON column for the snapshot-in-type contract.

You can then select fields from that JSON for invoices, reports, etc with the arrow operator:

https://www.postgresql.org/docs/9.6/static/functions-json.ht...

wvenable · on Aug 29, 2017

With SQL you can denormalize all that (and should) to create that snapshot. But with NoSQL you can't normalize and get back a way to quickly query the number of products sold per month over the last 5 years.

chrisco255 · on Aug 29, 2017

Yes, this is possible with Aggregation and MapReduce: https://docs.mongodb.com/manual/aggregation/

wvenable · on Aug 30, 2017

For relative values of "quickly".

imtringued · on Aug 29, 2017

Instead of nebulous terms like NoSQL you should instead just look at the damn features because these concepts are orthogonal. MongoDB has transaction isolation on the document level instead of the database level. If you can store everything in a single document then it doesn't matter. If you can't then use a database that supports database level transactions. It doesn't matter if it's a NoSQL or RDBMS database.

I feel a lot of people know that typical nosql databases (without database level transactions) are not suitable for their problem but they don't know why and then just think NoSQL is always bad and RDBMS are always better because the NoSQL databases are intended to be used for different problems.

kreetx · on Aug 29, 2017

Not the original commentor, but there are some valid cases for NoSQL: some people use it for storing massive amounts of web crawling data. But the thing here is that it's throw-away'ish, and in that case it's often not worth it to add structure (even though there pretty much is structure in everythig you look at long enough).

But I do think having any data consisting of, say, items, orders, users, payment in MongoDB is very much a bad idea. Been there.

mdpopescu · on Aug 30, 2017

> I think because NoSQL is a way of saying the DB should not be magically answering random queries.

The reason this is wrong is something that Codd et al learned a while ago: the data is MORE IMPORTANT than the application. Applications change and/or become obsolete; the data doesn't. You will still need to query the same database 50 years from now, but you likely won't have the same application to do it with. That means that everything that is important to the data (schema, constraints and so on) needs to stay with the data.

Something1234 · on Aug 29, 2017

What was your tool?

rgo · on June 10, 2016

Here's my advice regarding conferences and events in general, out of experience running a enterprise software startup with a decent marketing budget:

- Most of large conferences and shows are not worthy it, especially the expensive ones that "everyone goes to". People are too overwhelmed, busy and disperse. So both branding and lead generation is ineffective.

- Mid-sized ones are better, I mean, the more targeted ones, which focus on a special professional-groups with only a few booths and a cozy space you can network like crazy.

- Always try/pay to get a speaker spot. Negotiate a deal so it's "included" in the price. That's what gives you most visibility, draws people to your booth and kicks off lots of conversations.

- Don't grab speaking slots right after meals (lunch usually), otherwise you'll get a drowsy audience.

- The opposite is true too. If you have a speaker slot, try to get a booth so attendees can find you to extend the conversation.

- Put up the largest screen you can get in your booth or stand, close to the edge, so that passers-by can stop without fear of being harassed.

- Try to get a booth close to the speaker/conference area so that you can quickly draw people into your booth. Here's a trick: have different slide decks, focused on each of the talks being given (prepared in advance) relating your tech with that subject matter. Then run your decks in synch with the talks. After listening to a talk (ie "Mobile app churn"), many people want to stay in the momentum, they'll be immediately interested if they see "Churn Management Strategies" in big letters on your booth's screen.

- Focus on demoing the technology continuously instead of approaching people asking if they'd like a demo. People stop by when they see you demoing to someone (even if it's an accomplice). They want to listen in, but they don't want to be sold to.

- Don't spend money on swag. People that come for swag just want swag (or food). But have something handy (ie. a simple card-seized mini brochure that's not bulky) so people that stop by to see your tech but don't want to interact have something to grab on to that has your website on.

- Alcohol, if the conference allows it, at the end of the day is actually a great weapon for hearing out your (potential) users. Offer beer at the booth or sponsor a happy-hour. Don't expect to get leads or do serious branding. And don't over do it! (like building a whole Vodka bar with DJ music at the booth). This is more about doing F2F and socializing with people that want to share a drink with you after you take off your salesperson/marketing mask.

- Have your local reseller/partner (or salesperson) in the booth with you, as co-sponsor. Not just for costs, but they can do a follow up locally much better than your marketing team.

- Rent the badge reader option, so you don't have to clumsy exchange emails or biz cards. Also works great with antisocial attendees that are just watching your deck from afar. That's an instant email distribution list for doing a great follow up.

- And don't forget about the follow up email. To all attendees, offer a post-conference webinar where the same content is discussed again so they can share the link with their colleagues saying "you should hear this talk".

Measure everything (cost vs. leads). Make sure you repeat at the good conferences and don't insist with the bad ones. Good marketing is all about consistency.

karlmdavis · on June 10, 2016

Wow, uh... I'm not in enterprise software right now, but if I were, I'd be asking you where I could send a check. This is fantastic advice. Thanks for sharing it!

rgo · on June 8, 2016

Everytime I hear arguments for going back to relational databases, I remember all the scalability problems I lived through for 15 years in relational hell before switching to Mongo.

The thing about relational databases is that they do everything for you. You just lay the schema out (with ancient E-R tools maybe) load your relational data, write the queries, indexes, that's it.

The problem was scalability, or any tough performance situation really. That's when you realized RDBMSs were huge lock-ins, in the sense that they would require an enormous amount of time to figure out how to optimize queries and db parameters so that they could do that magic outer join for you. I remember queries that would take 10x more time to finish just by changing the order of tables in a FROM. I recall spending days trying different Oracle hints just to see if that would make any difference. And the SQL-way, with PK constraints and things like triggers, just made matters worse by claiming the database was actually responsible for maintaining data consistency. SQL, with its naturalish language syntax, was designed so that businessman could inquire the database directly about their business, but somehow that became a programming interface, and finally things like ORMs where invented that actually translated code into English so that a query compiler could translate that back into code. Insane!

Mongo, like most NoSQL, forces you to denormalize and do data consistency in your code, moving data logic into solid models that are tested and versioned from day one. That's the way it's supposed to be done, it sorta screams take control over your data goddammit. So, yes, there's a long way to go with Mongo or any generalistic NoSQL database really, but RDBMS seems a step back even if your data is purely relational.

snuxoll · on June 8, 2016

> And the SQL-way, with PK constraints and things like triggers, just made matters worse by claiming the database was actually responsible for maintaining data consistency.

I...just...I can't.

Your database should ALWAYS be responsible for maintaining consistency. Your application will likely be dead in a couple years, and if it isn't then you are going to end up having something else interfacing with the database at some point, guaranteed. The only sane place to guarantee consistency is in the database itself, otherwise every time you change constraints you are going to be updating every integration you have and hoping you didn't miss something.

My lord, what has come over developers?

nathan_long · on June 9, 2016

Suppose you're writing a system for vacation rentals. Here are two data consistency rules: 1) only one user may have a given email address 2) a given property can't have two overlapping rentals.

Postgres can enforce both, in a way that's not subject to race conditions. 1) Is a unique constraint and 2) is an exclusion constraint.

As far as I'm aware, the only way for application code to do this would be for it to 1) lock an entire table/collection of data 2) do a query to see what data is there 3) check the new data against the existing 4) write new data 5) unlock the database.

Postgres probably has to do the same thing conceptually, but by using indexes and running the checks in C on the same machine where the data lives, it can do it very quickly.

Are you saying you have a better solution?

wvenable · on June 8, 2016

I've been in the opposite situation and I couldn't disagree more. But I will say this, it's always possible to take an RDBMS model and de-normalize it and use it like a NoSQL database (like reddit does, for example) but it's not possible to go the other way.

lloyd-christmas · on June 8, 2016

> but it's not possible to go the other way

Why not? We do exactly that. We prototype in mongo and then migrate to postgres when we're comfortable with where the app is headed.

wvenable · on June 8, 2016

I don't mean it's possible to use a different technology; I mean within the same technology (postgres, for example) you can use it both as a normalized relational database and/or as a de-normalized document store.

kermatt · on June 8, 2016

> do data consistency in your code, moving data logic into solid models that are tested and versioned from day one

So your consistency check code is more reliable that that which has been built and tested over many years, people and projects?

And your code is guaranteed to be deployed into any project that need to access that database later?

rgo · on May 8, 2016

To me the relationship between NYT and FB (or any other content creator and FB) is not very different from that of freelancer content writers and content companies like Demand Media, of eHow and Livestrong fame. The only difference is that content published in FB, either by users or sponsored, gets to keep its branding, whereas content freelancer are ghosts - although most people I know share content in FB with total disregard to the source. It's sad to see how quality content creators like NYT now get to share the wall with unscrupulous ones. To all effects, FB is turning news into a commodity.

rgo · on May 2, 2016

My sister worked closely with many of the parties involved in the "Marco Civil", Brazil's brilliant Internet Bill of Rights. Two years later, I feel much of her work was in vain. No significant legislation was enacted from it and lately judges are trying to circumvent common sense by brute force. Now over 100M people are unable to use their communication platform of choice for 72h. In the meanwhile, congressmen are busy impeaching the president and calling it an act of god, probably as a device to expiate their own sins. It would make a good argument for a Game of Thrones clone series, some would say. And my sister, she's been out of the job since last January. She was let go when local NGOs ran out of money for fighting for an open internet. It seems freedom is the first thing that runs out in a recession. Scarcity is a bitch.

rgo · on April 10, 2016

I thought A380 and 787 had equivalent cabin pressure of 6000 ft with similar humidity features.

That said, I unfortunately did not notice any reduction in tiredness or jet lag in either plane compared to older aircraft. But I did enjoy the lower noise levels and higher stability of the A380, which make for a more pleasant journey overall. I feel less worn off when I land, which means I'll take less to recover from the flight itself, which I find more damaging then the time zone differences.

cylinder · on April 10, 2016

agree. The a380 is tremendous, especially when combined with Qantas service.

rgo · on March 29, 2016

These are the 3 questions I ask my team on non-deterministic errors:

- Can you reproduce it? (locally)

- No? Then can they reproduce it? (remotely)

- No? Then can you follow the flow byte-by-byte by just looking at the code? You should.

If you can reproduce it, great, you can most probably brute force your way into the cause with local monkey-logging or step-by-step debugging.

If a customer can reproduce it then you may have a shot at remote debugging, injecting logging or requesting a dump of some sort. That's why it's important for an app to have good tools built-in so a customer can send back useful debug info.

If you can't reproduce it, then give it a shot at following the flow byte-by-byte. Either mentally, with test cases or a combination of both. Here's a quick guide from the top of my head:

- determine if there are black spots where the variable, stack, heap etc. could have unexpected data or your assumptions could be wrong or your understanding of the language, library or any technology supporting the logic could be incomplete or needs a reread of the manual.

- order your black spots by probability, starting with the most vulnerable code related to the bug (ie, for that infinite loop bug the recursive function tops the rank for weak spot)

- now compare the bug symptoms against such vulnerable code to check if there's 100% match. That way you make sure all symptoms can be caused by the alleged culprit.

- do negative symptom match also, thinking of symptoms that would be caused by that fault and make sure they can be observed (ie, the recursive function writes zeros to a file beside looping forever - did it happen?)

- if there's more than one possible cause, apply Occram's razor: the simpler one, with the least assumptions, although unlikely, is the cause.

- if no possible explanation exists still, start over with less moving parts.

- if a vulnerable fragment as been identified, but no concrete cause or solution found, rewrite the code for robustness, with plenty of assertions, complementary logging and clear error messages. This is a good practice every time you revisit code it should come out cleaner and more robust than before.

danieltillett · on March 30, 2016

If you can reproduce the bug 99% of the problem is solved. I doubt I have spent more than a day fixing a bug that I could reliably trigger.

It is the non-deterministic bugs that drive me crazy. I have one bug where a call to a third party library randomly fails but only after the program has been running for days (no it is not a memory leak). If I make a cut down stub then the error never occurs even after running for a week. My best guess is I am trashing memory somewhere, but under valgrind everything is fine. Arg!

rgo · on Aug 9, 2014

Do you know why they are acquihiring you?

Bootstrapped or not, VC funded or not, profitable or not, a company's value is given by its potential. This potential will be measured (with different metrics, some of them quite subjective) by both parties, then negotiated hopefully to somewhere in between. In your case, you need to come up with your number by best estimating what the value of your team will be once acquired and assigned your new task.

Ie. you are making a cool AI board but you got no traction (hence no VC, revenue or customers). Now Intel wants to acquihire you to make their next generation of synaptic chips something cool. Just lately IBM closed a deal valued at $100M with a Chinese manufacturer for their synaptic chips. Since they are looking at an internal BP for the acquihire of a $100M potential, a 10x ROI puts you all at $10M. Now you have your own BP that you can use to support your intended price for the deal.

Once you have that potential value, you will want to figure your opportunity cost and BATNA by looking at the potential value of your product or company in X reasonable years. This will be your leverage for negotiating.

rgo · on June 19, 2014

External dependencies increase failure points, no argue there.

It depends on your line of business, but if you compare the benefits of making user signup faster, lowering acquisition barriers and getting access to the social graph of users against the risks of depending on one of the top infrastructures in the cloud, I think it may well be worth the trouble.

rgo · on May 13, 2014

I wonder if Spain will claim ownership of the ship or its contents like they (rightfully) did with the treasures of Nuestra Señora de las Mercedes.

http://www.businessweek.com/articles/2012-06-07/odyssey-and-...

The difference here is that the Santa Maria is not a warship, so I'm not sure the same rule applies.

tanzam75 · on May 13, 2014

It does not matter that it is not a warship. Sovereign wrecks are owned by the sovereign entity, whether or not they are armed.

The problem is that the Santa Maria was not owned by Spain. It was owned by Juan de la Cosa, a private citizen.