Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
SQL vs. NoSQL (learncodethehardway.org)
55 points by thinkerer on June 1, 2014 | hide | past | favorite | 77 comments


The problem with SQL is it seems everyone hates its guts. It is a weird obtuse kind of "non-language" that most programmers can't stand.

I'm baffled. I'm a programmer and i rather like SQL. Most programmers i know use SQL quite happily. Indeed, i often hear them wishing they could just write SQL to solve some problem rather than use some API that purports to save you from writing SQL.

I certainly prefer SQL to MongoDB's query language, which is verbose and unexpressive by comparison.

Am i in a tiny minority? Or is this SQL hatred a popular myth?


It's not a popular myth, it's just become a trend. People want to use javascript or some dynamically-typed interpreted algol-like language for everything, and they're freaked out by SQL because it's both different and mathy.

It's also very, very easy, and almost completely fat-free as a mathematical expression of what you might want to do with data. I learned almost everything I needed to know about it within an afternoon ~10 years ago, and the dividends paid since on that investment have been extraordinary.

The use of "non-language" here just reflects the extremely narrow view of how languages must look amongst programmers of late.


When I see people complain about SQL, "mathy" never seems to come up. Mostly it's "tries too hard to be English".


Those "hard to learn" complains are about SQL being mathy. But maybe if it didn't try that hard to be english-like, people would resort to learning the math and get it faster, or maybe it would mean that nobody would use it, and some other english-like language would take its place.


I know SQL inside and out, and I hate it for anything put trivial queries that involve a single table and a single WHERE clause.

Every time I write a query of reasonable complexity, I have to figure out what I want the database to do (in an imperative sense -- read from this table first, then look up this in that table, then do this other thing with this other table for each row), and then figure out how to translate this into a declarative SQL statement, just so the database's query interpreter can turn that back into an imperative program it can run.

The whole thing is just a huge unnecessary round-trip of thinking, and SQL is limited in that there are many simple things it can't do, and you never really have any guarantee that the database's query analyzer will execute your query imperatively in the manner you hope for (which can lead to terrible performance problems, and hacks like index "hints" still don't always solve it). I'd MUCH rather be able to program against the database imperatively, directly.

So despite using SQL day-in, day-out, I totally agree that it is weird, obtuse, and a kind of "non-language" (in the sense that any query is limited to a single declarative statement, with a host of limitations, instead of an imperative programming language that allows you to write whole programs of execution).


Good points all the way around. Do you think SQL's obtuseness comes from it's ability to do more than (perhaps) it should?

Maybe it really only should be used for simple queries with simple where clauses. But because it can be used for more, we choose to do so.

It's based on relational algebra making it fairly predictable (query optimizer aside which is completely unpredictable). It operates on relations and returns relations.

"Select * from MyRelation;" returns a relation. So we could also write "select * from (select * from MyRelation);" and get the same result. You can continue that nested select as many times as you like and with as much complexity as you like and it would still work. Though undoubtedly it gets difficult to follow.

Adding to the obtuseness, SQL is not typed. So for "select * from MyRelation where SomeField = 1;" you could replace the 1 with another select statement, which would return a relation. But as long as that relation had a single column and a single row, it would be interpreted as a single value and successfully replace the 1.

Also adding to the obtuseness as you get more complex is that it variable scope for table aliases can be difficult to follow. Table alias names can be used across various select, where, from and other clauses to link them together through a common filter but it happens in the single SQL statement.

I think it's that common issue of reducing a "program" into a so gel statement that causes the issues I run into. At times, I feel that I am writing what basically amounts to a stored procedure in a single statement. That's when I start getting mad at SQL. Though, it's really my own choices that made it a problem.


This seems to be a good example of not allowing a fundamentally different paradigm to "sink in" but rather attempting to impose a paradigm you're already comfortable with onto it. Relational databases really aren't imperative systems, and while SQL is a pretty good declarative language for relational systems, it is indeed very frustrating to attempt to use it imperatively.


I think the sad proliferation of horrific ORM systems and the thousands of hours put into them by many people is proof enough that this is not just a popular myth. While I wouldn't say that most programmers hate it, I'd say there is a fair share of those who do. Then again, considering the horrors that are built to avoid SQL, one wonders if they should be counted as programmers anyway.


Just because I hate extracting the results of a query into a structure the program can use, doesn't mean I don't like SQL. It's a very nice language to write relational algebra in.


Indeed. The current crop of lightweight semi-ORMs (MyBatis, Ebean, ORMLite, etc) are built around that use case: let programmers write SQL, but automate the conversion of result sets to objects.

You can also do this in standard JPA, to some extent, using constructor expressions, although those are pretty verbose.


I like relational algebra. I like the idea of a data-store language whose core features are the CRUD operations. But I'll pipe up and say I don't like SQL. Verbose string management, monstrous 3-value boolean logic, the lack of any linguistic shortcuts for joining on foreign-key constraints, etc.. The problem is that a good replacement for SQL would be a modernized relational language, not a NOSQL language that throws the baby out with the bath-water.


Yeah the SQL vs. NoSQL thing conflates so many different concerns. Sometimes it's about not wanting to do things relationally, sometimes it's about trading off ACID for other concerns, and sometimes (most rarely, it seems to me) it is actually about the SQL language itself. I don't think the language is really so bad (inconsistencies across databases aside), but I'd be interested in taking a look at a redesigned language that does all the same stuff but cleans a few things up.


You mean shorter than "JOIN fktable USING (key_name)"?

(Granted, that implies both columns have the same name iirc)


Actually, yes. JPQL, the generally SQL-like query language in the Java Persistence API, lets you write joins through foreign key references (as most joins are, although of course not all) as if they were object property references. So instead of:

  select u.user_id
  from User u
   join Company c using (company_id)
   join Country k using (country_id)
  where k.code = 'br'
You would write something like:

  select u from User u where u.company.country.code = 'br'
That's pretty neat. I wish i could do that in SQL.

If you want to fetch data from joined entities, then you still need to name them in an explicit join, but it's still easier than in SQL:

  select u, k.code from User u join u.company.country k where u.creation_date = current_date


Microsoft's Entity Framework is similar, except that they put the Select at the end so that the editor is aware of the availability of relevant variables when you're writing the "select" part of the statement. Such simple changes that will never be part of the SQL standard.


I don't like SQL either. Disclaimer: I wrote my own SQL server some time ago, then went on to help to prepare a NoSQL architecture at one of the largest tech companies.

SQL is a weird hack of relational algebra with a huge variation of syntax between different providers. Also, it's pretty inflexible if you need to change schema, you need to solve crazy transactional problems if you write your own server, not mentioning the problems arising from clustering.

Try modeling trees in SQL - if your data is mostly static, you will be saved by nested sets or their fractal successor, however most SQL servers mislead you in thinking their tree-functions are anywhere as performing. If you think you don't need trees, imagine modeling categories or an org chart in SQL, and making sure decisions at one level propagate correctly to all levels underneath and can be overriden only with correct permissions, and this must be a part of each query.


I really like SQL, because all those hacks crazy transactional problems are encapsulated within it, and I don't have to deal with them in application code.

But yes, writing a noSQL server should be way easier than a SQL one. And yes, you have a point about trees, as do people complaining about queries results having no structure. Thigs could be better, but schema enforcement and data security in face of massive parallelism are quite worth it.


Don't CTEs solve the problem of representing trees in a DB (modulo depth limit)? I have used them to construct schemas that represented org charts and other kind of trees. Right now I'm working on a project that will probably use an RDBMS to store the parse tree of a compilation.


Try to benchmark your CTE query against a nested-sets implementation as described e.g. here:

http://www.evanpetersen.com/item/nested-sets.html


I can't speak for any other server but Postgres recurvise CTEs should be reasonably performant


I am not sure what is the situation today; in 2005 when I wrote nested-sets implementation for TSQL and benchmarked it against Oracle's START-CONNECT extension it was a few milliseconds vs seconds for large hierarchies. Of course, nested sets need an occasional rebuild of the whole hierarchy so they are only useful for mostly static hierarchical data.

If you have only a couple of levels, you are probably fine with what your SQL server offers; for huge amount of hierarchical data you need to use something smarter.


I'm with you. I love SQL. I find it surprisingly expressive, despite having a rigid syntax. Some of my favorite projects that I've worked on thus far into life have been heavily SQL-based, and no successful code run has ever matched the thrill of seeing a SQL statement return what I expected.

I'd venture to say NoSQL has gained it's popularity because it's Magic(TM), and it works well with equally magical frameworks. For frameworks that don't have generators like ActiveRecord, NoSQL replaces the need to boringly write your models and database logic. It's like every Getting Started document on the Internet is having a race to the bottom, and people are falsely equating a quick start with a good framework. I have never picked a technology to work with because I could have a blog written in 10 minutes.


> I'd venture to say NoSQL has gained it's popularity because it's Magic

No, it's because to a first approximation, they aren't magic. Writing your queries in a more imperative style means that most developers can more easily reason about them, doubly so once they get more complicated.

You don't have to rely on generators or magic frameworks to generate SQL for you with a good document database. All your queries can quite reasonably be right there, and things like "get the document with the ID of foo" are already just one method call away.

Of course, most NoSQL databases have tradeoffs that are not obvious, while I believe all SQL databases stick to ACID. That's probably more of the problem than what query language is used.


> You don't have to rely on generators or magic frameworks to generate SQL for you with a good document database. All your queries can quite reasonably be right there, and things like "get the document with the ID of foo" are already just one method call away.

I guess the scenario I was thinking of was if somewhere were to sit down and learn Rails with no knowledge of programming, they don't also have to learn SQL in addition to Ruby and Rails and ERB and everything else bundled in because ActiveRecord takes care of the database for you. If they were learning Sinatra, as a comparison, they'd have to also learn some ORM-layer tool, which potentially abstracts the SQL, but may not. But if they do Mongoid or the like, then the database returns to Magic status. Change you schema? No problem.


Do you know of any good ACID NoSQL?


After studying SQL at MIT with Greenspun, I came to see it not only as a nice language, but beautiful and expressive in a way similar to Lisp. I personally would rather write database applications mostly in SQL than pull as much as possible up into the scripting language through ORMs.


Popular myth. Definitely. There is hardly anything as beautiful and simple as SQL.


Agreed.

Years ago, when I took a database class in college, I hated it. Several years later, I began working at a consulting firm on the support team. My primarily role was defect/enhancement tracking and resolution in the product we had developed for them as well as their satellite apps.

The first time I really got my hands dirty with SQL was pulling out a bunch of VBA and Access "SQL" they were using to generate reports and to rewrite it into real SQL so it could be run on the backend, which cut times down from hours to milliseconds. It was okay (I'd hate those queries now) but it's where I really picked it up.

Then the same client started calling me to write (or edit) queries to pull out some information from their database I don't think anyone had anticipated they'd ever need. That's when I started to really appreciate how beautiful and expressive SQL can be. Stop thinking imperatively - SQL will let you do it but don't. No cursors unless you absolutely have to do. You probably don't. Dynamic SQL has its place, but try to stay away from that too.

There are a few queries I still remember with some fondness. One was to determine the number of days particular classes of employees worked on particular classes of projects. Unfortunately, the way their data was modeled made this not-at-all obvious - and the date information was stored in the "start" and "end" date type fields.

There was a lot more to it, but after pulling my hair out - the customer generally needed these in under half an hour - I suddenly realized I could use a recursive CTE to generate a dataset of all days that I could use to join with, filter out any duplicates, and then use aggregation functions to get what I needed. There were a lot of other details because of the complexity of their model, but it worked and it just seemed so elegant at the end.

I've moved on to another job now and honestly, writing SQL is what I miss most. I would have never pictured myself saying that before I worked there.


I'm probably one of those programmers you think "use SQL quite happily". You're wrong.

SQL is an abstraction with myriad problems in both its core design and its disjointed implementations. ORMs and similar interfaces are crappy abstractions on top of a crappy abstraction. You get the problems of SQL combined with the problems of the ORM.

Preferring to remove a set of problems and be left with only the problems of SQL is not the same thing as liking SQL, it is merely preferring the lesser of two evils.


In what sense SQL is bad?


Well, I like SQL, but let me play devil's advocate and guess at some of the ways people may consider it "bad":

- SQL queries flatten structured and hierarchical relationships into unstructured rows, often requiring very creative join operations to get at the exact set of data you want, after which you then have to convert everything back to the structured data you wanted in the first place. ORM's are a leaky abstraction on top of that, which means that when you use them you end up dealing with two problems instead of one.

- SQL is superficially simple, but at scale the simplicity is a lie. Thhe performance trade-offs you make require you to understand the internals of the database you're running on and how it interprets the query, and often you feel like you're dealing with a too high level API. Often you're resigned to "tricks" to force the engine to do the right thing, like optimizer hints, special index types or denormalized data.

- SQL has no ability to express relexation of the ACID principles. If you're streaming weather data into a SQL database, it could be ok to drop inserts here and there to gain throughput for the data analysis operations. SQL has no abilities to express such a thing (except in db-specific hacks).


Even those reasons are really stretching for it these days.

There are numerous ways of dealing with hierarchical or otherwise structured data using relational DBs. Support for recursive CTEs is becoming more common, which makes such querying quite clean and simple. And with many relational systems offering support for storing and manipulating XML and JSON data, storing more structured data becomes a non-issue.

I think that SQL managing to hide a lot of the underlying implementation details is one of its strongest points. In many situations the defaults are more than sufficient, saving developers and users a significant time. Yes, there are situations where more control over the exact handling is needed, but it gives ways of managing this. At least a simplified, high-level abstraction is offered; many NoSQL systems immediately force you to take a very low-level approach for even the simplest of queries.

And I don't see the ability to intentionally lose data as something that's good, either. The database should be concerned with doing everything in its power to protect and ensure the integrity of the data it has been given. If it's fine to lose some of that data, then whatever or whoever is providing that data should just not provide it to the DB in the first place.

So while some people may present those as possible disadvantages of SQL, I just don't think the arguments really hold up upon further scrutiny.


> SQL has no ability to express relexation of the ACID principles. If you're streaming weather data into a SQL database, it could be ok to drop inserts here and there to gain throughput for the data analysis operations. SQL has no abilities to express such a thing (except in db-specific hacks).

Why should the language control the underlying data store to that level? Also, SQL itself is not defined by or defines ACID principles. You can (and some people do) run SQL over very non-ACID databases (everyone using MyISAM or SQLite?)

I personally feel that tying the data-store so close to the query language is a recipe for unneeded complexity. Though, the transaction isolation levels may be close to what you're talking about (though, I don't think there is anyway to tell a db like PostgreSQL "Go ahead, lose this. I don't care about it".)

> SQL is superficially simple, but at scale the simplicity is a lie. Thhe performance trade-offs you make require you to understand the internals of the database you're running on and how it interprets the query, and often you feel like you're dealing with a too high level API. Often you're resigned to "tricks" to force the engine to do the right thing, like optimizer hints, special index types or denormalized data.

Please tell me of anything where it isn't true that the more intimately you know your tools the better you can use them. Quite honestly most query planners are really friggin' good at what they do (just like a compiler, or are you a person that still claims hand tune assembly is the answer to everything).

EDIT: PostgreSQL does allow unlogged tables that are not crash safe, though I don't think they're randomly lose rows, but are faster. So there are some semantics that allow you to control that, but they are probably vendor-specific (getting into my complexity argument). Thanks Myon on #postgresql for pointing that out.


There is Asynchronous Commit feature in postgres, that allows you to gain performance by allowing postgres to lose some commited transactions. It can be controlled on transaction level and is exactly :"Go ahead, lose this. I don't care about it". Manual: http://www.postgresql.org/docs/current/static/wal-async-comm...


Like an unlogged table that's only not crash safe. I think GP was talking about rows actually not being written to the db during normal circumstances. Maybe I'm misunderstanding what GP is asking for.


Hm. "rows actually not being written to the db during normal circumstances". A database that acts like /dev/null ? Like BLACKHOLE MySQL storage engine? I can not imagine a real use case, where it will be needed. If you allow a database to lose "some" rows it may well lose all of them. What is the purpose of a database that does not store your data?


I'm not the one who said "If you're streaming weather data into a SQL database, it could be ok to drop inserts here and there to gain throughput for the data analysis operations. SQL has no abilities to express such a thing (except in db-specific hacks)."

I can understand wanting an unlogged table for the speedup (missing the last sec of weather data won't cause huge problems and I'm more forgiving of missing data when a whole machine fails), but just to be able to randomly drop things seems scary.

Also, I thought you were joking, but you're not http://dev.mysql.com/doc/refman/5.0/en/blackhole-storage-eng... bu


I know we're not supposed to comment just to say you agree with the above post, but: OMG yes! Everything you just wrote. Relational algebra is a great concept for a data-store language, but why are we stuck with the COBOL of data-store languages?


I thought COBOL was hated because it's very verbose, the programs written in it are very complex because of the domain, and contain warts because they're so old.

What exactly don't you like about SQL?


I think COBOL is just being thrown around here as a euphemism for old people used this. ORMs are 1000x more verbose than SQL. Map-reduce queries are no shorter, require you to denormalize your data ahead of time in frequently nonobvious (and always inflexible) ways, and have vastly different syntaxes and semantics depending on which product you're using. They're definitely new, though.


All the stuff the grandparent post mentioned. Insane 3-value boolean logic. The fact that the result-set must be in the form of a table instead of a graph of data (when the client code wants a graph). Even just moving the "select" keyword to the end of the query so query-writing tools can provide some autocomplete.

Even simple things like providing a simple syntax for walking foreign key relations instead of cumbersome Join operations. Using any ORM worth its salt, I can say customer.company, whereas in native SQL I have to say "from customer join company on customer.companyid = company.companyid".

To me, SQL is like Common LISP. A fantastic invention of a bygone era that has stood the test of time... but is stubbornly resistant to real improvement and shows too many warts of its age.

And again, as the grandparent said, the cases where the abstraction fails and you end up having to figure out how exactly it's implementing your query. Abstraction failures make a tool worse-than-useless, because it means I have to not only understand the problem the tool is trying to solve, but I have to understand the tool and the quirks of its handling of the problem.

I don't want to imply that ORMs are better (the last thing we need is another abstraction layer) or NoSQL is better (throwing the baby out with the bath-water there) just that SQL is old and has not seen the kind of improvement that other languages have seen.


> Insane 3-value boolean logic.

What? Not having a value is not having a value, it's not false. It's also a fairly standard concept. http://c2.com/cgi/wiki?ThreeValuedLogic

> The fact that the result-set must be in the form of a table instead of a graph

Don't use a hammer to screw in a screw. There are graph databases (which support ACID, btw) out there for a reason.

> A fantastic invention of a bygone era that has stood the test of time... but is stubbornly resistant to real improvement and shows too many warts of its age.

I don't get this. It is improving and it's stood the test of time because it solves a problem and solves it well.

> And again, as the grandparent said, the cases where the abstraction fails and you end up having to figure out how exactly it's implementing your query. Abstraction failures make a tool worse-than-useless, because it means I have to not only understand the problem the tool is trying to solve, but I have to understand the tool and the quirks of its handling of the problem.

I don't quite get you. When a SQL query fails, it's not because you need to understand the database arch better. It's like when a C program fails, you don't need to understand GCC to fix it. You do however, need to understand how things are being used to pull the most performance out of it as possible, just like a regular language and a compiler.


Paraphrasing Rich Hickey: It's primarily a user oriented language and there is no underlying abstraction for external programs or systems. In the end programmatic access boils down to using StringBuilders.


I see no reason to reproduce Wikipedia's laundry list:

http://en.wikipedia.org/wiki/SQL#Criticism


Wikipedia's laundry list contains of two items:

  * SQL not being relational algebra
  * SQL implementations being incompatible
Both are real but I have not experienced either of these being a problem in real usage. Despite the problems caused by NULLs and that SQL allows duplicates I do not think the alternative would be better for real world programming. And the incompatibilities are avoided by picking one database per project and sticking with it.


Even if you reduce it to its most basic issues, you missed a third: Incompleteness of implementations.

If you have not experienced problems as a result of these core issues, it honestly makes me feel you must not have a lot of experience -- at least not in diverse projects and environments.

That you think picking one database is some sort of viable solution in the general case amplifies that feeling. That is so often simply not an option, and even where supporting only one database at a time is an option, you absolutely cannot guarantee that you will not have to migrate later. I've been through that pain many times, it is a real-world problem.


I honestly think you should look at the database just like any other piece of your code, you cannot just replace GTK with WX Widgets and except there to be no migration work. And most of these differences between databases have nothing to do with SQL, but rather databases not implementing it correctly for legacy reasons.

I would not expect a new database query language or NoSQL databases to do this better.


I mostly like SQL.

The biggest annoyance for what I do, is that I don't have a clean way to edit a select list; that is "select foo., ... from (...) foo" works to add* columns, but I can't remove or rename or replace existing columns without listing all of them.

The second biggest annoyance, is the hoops I have to jump thru to join a parent to two (or more) sets of child rows without either multiplying everything or dropping data.


A project from 2010 that's stalled half-way through in alpha state.

Still, he is quite correct. You still need a Structured Data Query Language.


SQL remains a useful tool in the box, but while SQL and NoSQL are fighting it out to be the default go-to database technology, something else on the horizon is threatening the concept of having any database separate from the app at all: persistent in-application state.

We don't need a database if we can just load the application state in RAM and save it back via serialization. This works well in surprisingly many cases.


That's a fairly traditional no-DB approach [1], but what's the reason to believe the dominance of that approach is on the horizon?

[1] Common Lisp has a particularly nice system for doing it transparently, http://common-lisp.net/project/elephant/, and I believe HN is implemented using a DB-via-serialization strategy as well


Actor systems.

Actor systems allow this to scale and cross the machine boundary without having some dedicated database product doing it for us.

I've noticed that the more I work with actor systems (each actor is stateful and keeps its own state), the less I use databases.

In fact, the idea of keep state separate from the code reading it and changing it begins to look oddly broken as a concept.


The problems with that are easy to see though: you need lots of RAM if your database is non-trivial in size. You also need to write the state to the database continuously to prevent data loss from failures. Not saying the idea doesn't have use cases, of course. Redis (for example) is pretty popular these days, but I don't see how it'll become the default go-to database.


Yes, you need to write continuously. Basically you need a transaction log. Redis unfortunately botches that part a bit (it has no performant and reliable way of persisting every change before acknowledging a "commit"), but that's roughly the idea.

However not all of your state will be persistent, so you write only the persistent parts.

For example, if you need to have an index for quick access to items based on some criteria, you don't need to store the index on disk, you can always rebuild it on startup from the rest of the data.

While there's an upper limit for the amount of data you can store this way (it has to fit in RAM), it fits very favorably in web scenarios, where you have read-heavy write-light data access patterns, and you never have to touch the disk in order to read a piece of data (guaranteed).


Personally, I find that SQL is hard to generate. It's hard to reason about the creation of dynamic queries, say, for some types of searches. It's not reasonable to create a generic function that can operate on any same-shaped piece of data. And it's ridiculously hard to manage the concept of a pointer to a piece of data which could be one of many types, a problem I'm sure most people have faces.

Stored procedures can help with some of this, but then you've got to remember to update them every time you need to access data in a slightly different way, and it's hard to figure out when you can get rid of old stored procedures. Tools for managing them aren't up to scratch with tools for managing "real" programs.

Personally, I'm keen on RethinkDB. It's a document-oriented NoSQL database which doesn't lose your data, has built-in clustering, and an extremely strong query language built on chaining method calls. While it doesn't have transactions, the query language is powerful enough that you can easily model most forms of deterministic data transformation within a single query.


There is some irony in opening the post proclaiming how important SQL is because of webSQL, webSQL has hung on by being the only choice with Safari, with Apple almost certainly adopting IndexedDB webSQL will cease to exist in any relevance pretty quickly (its already fairly niche).

This post is just filled with naive strawmen arguments, I work on a NoSQL database, show me a SQL database that is implemented embedded inside every browser and allows offline operations to sync between masters transparently ...

Trying to define or dismiss 'NoSQL' as a singular movement is not going to work, there are lots of tools, I use different ones for different things, some of them even have a SQL interface.


NoSQL is a strange term. It really just means "everything else that isn't a traditional SQL database". It encompasses a massive range of completely different database technologies.

SQL has been astonishingly successful in recent years and it's a testament to its dominance that the term NoSQL even exists. It's kind of like calling all meats except for beef "NoBeef".

I'm really happy to see some different database technologies getting some attention these days. They each have their own niche. There's no SQL vs redis vs MongoDB vs cassandra vs whatever "winner", just databases with different strengths which can benefit us in different ways.


I don't think that's necessarily the definition of "NoSQL database".

Databases that predate Codd's work, or long-established ones like dbm and BDB, may appear similar to modern-day NoSQL databases in how they operate, but they surely aren't the same.

Those systems couldn't use relational theory or SQL because they didn't exist yet, or at least didn't reject them outright as one of their main goals.

NoSQL databases, on the other hand, are completely about rejecting relational theory and rejecting SQL. That's at the very core of their philosophy.

True NoSQL databases have been developed as a reaction to several things:

1. A very, very, very small number of situations where relational DB systems cannot easily scale.

2. The far more widespread ignorance of the basics of relational theory, and a lack of willingness to learn about it.

3. The far more widespread ignorance of SQL, and a lack of willingness to learn about it.

4. An urge to be "different" solely for the sake of being different, even if it brings no technological benefit.

5. Unmitigated hype surrounding the term "NoSQL".

Of those, 1) is perhaps the only legitimate reason for using a NoSQL approach today. The number of times this sort of a situation actually arises is remarkably small.

The other four are why those databases have become more widely used, especially within the web development community. As anyone who has dealt with such systems knows, they're rarely about safely and reliably storing and managing data, and they're rarely about doing this efficiently. They're merely a shortcut that some developers use to avoid learning how to use a RDMS.


We surveyed NoSQL users in 2012 and it turns out the biggest draw is the no-schema thing. People don't want to coordinate migrations when they are implementing new features. In some cases this may be technical debt, in others, having a schema to maintain is technical debt. http://blog.couchbase.com/nosql-adoption-survey-surprises


This is crazy. Why wouldn't somebody want a schema?


I had a larger comment typed out, but I think it would detract from my point, ignorant comments like this make this industry a far less pleasant place to work.

Consider the notion that not everyone involved in 'nosql', including some of the SQL industry leaders arent doing so because they are too stupid to realise what you find so obvious.


Do you still have your original comment? If you do, you should seriously consider posting it.

It's difficult to discuss this topic with you if you intentionally avoid posting anything of substance, and instead throw out accusations of "ignorance".


If you have certain religious beliefs then the concept of NoBeef or NoPork is an important one.

I guess that databases must be the same for some people.


Actually I wanted to write something along this lines, had some revelations hah!

Btw, what would you regard as religious-less or belief-less?

I ask on SQL vs. NoSQL not for comparison or that they are mutually exclusive, but rather these are the most often mentioned items.


Exactly. Anyone who tries to compare SQL and NoSQL is frankly an idiot. Cassandra is as different from MongoDB as it is from Oracle, Riak, MySQL or SQLite. They all have very different behaviours and many model SQL concepts quite closely. It just doesn't make any sense to group them.

Funnily enough it's always the PostgreSQL community that wants to create this artificial war. It's like they think everything in terms of one or the other when in fact most systems these days are heterogeneous.


What do you mean? Postgres is probably one of the most popular NoSQL databases. Its SQL-support is among the best as well.


I'll just put a few points here to understand when you should use which architecture.

NoSQL:

- reading is fast, writing is expensive (if all data are pre-processed/denormalized for reading during the writing phase)

- often schema-less

- low latency (for key-value storage)

- offline batch processing (classical Map Reduce)

- no ACID, choose 2 of 3 in CAP

- demanding on SW engineers to get client-side conflict resolution, tricky in general

- Petabytes of data can be suddenly processed

- huge variation of different paradigms, key-value, document, graph, batch etc.

- haywire indexing

SQL:

- writes are fast (normalization), reads are expensive (JOINs)

- ACID (well, only to some extent, clustering messes up many ACID properties unfortunately and conflicts arise in corner cases)

- set operations and a neat math theory behind them

- stable indexing, easily constructable real-time JOINs

- OLTP

- easier for developers

- non-flexible schema

- tradition, well-known recipes on how to do things

Basically, if you want to have low-latency access, your concurrency model allows eventual consistency, or you have a need to store your data in non-standard structure such as graphs/trees, use NoSQL and pre-process all data to be exactly in the format you require for reading.

If you need 99.999% guarantee of consistent data, amount of data you need to handle is under 50TB, you can put your data into a fixed schema and latency doesn't matter that much, use SQL.

I would recommend you to ask yourself a question - is your app/business read-heavy or write-heavy and decide accordingly.


>The NoSQL movement is partially a reaction to antiquated database servers, and also a response to a fear of SQL borne from ignorance of how it works

I think the NoSQL movement was created for practical reasons. Since SQL didn't scale properly for certain applications, new kinds of databases had to be invented. What often happened is that a software used a typical SQL database with locks, and as that didn't scale, the locks were decreased to a point ACID wasn't theoretically guaranteed anymore. And it worked.

If the database in production doesn't guarantee consistency, you might aswell design a new database which is based on that idea, which is a core reason NoSQL databases were created. There's other reasons, too, of course..


"Movement" is a bit much, no? It is a solution for a niche problem.


"Movement" is a good term for it.

There are a lot of people out there who have a quasi-religious attitude toward NoSQL databases.

It's a cause for them to rally around. In some ways, it gives them something to "fight" for.

Objective consideration about whether it's the best technology to use in a given situation is often disregarded.

There are a very small number of very rare situations where such technology is the best or most feasible approach. Otherwise, it's merely something that a lot of people get involved with in order to intentionally avoid learning how to use SQL and relational databases, or to feel like they "belong" to some greater cause.


>There are a very small number of very rare situations where such technology is the best or most feasible approach.

I don't agree that it's a very small number. In fact, usually SQL brings a lot to the table of which very little is actually useful.


He is right, the issue has been servers not scaling horizontally.

SQL the query language is great, the concept of set theory applies really well to data. SQL can be and is used on non-SQL database, impala for example implements SQL to query data burried in hadoop.

NoSQL stores become popular because they scale horizontally (able to use more than one server) naturally. Once the SQL servers can do automatic partitioning, I suspect people will start migrating back.

The truth is that there is no pixie dust, at the end of the day you need to index. I see nosql proponent having the same strugles sql people have with indexing, but right now they have the upper hand because they can spread the work over several servers.


I guess this might explain some of its popularity but I think it has much more to do with the fact you don't have to do any advance planning about your data (i.e. no need to write schemata). "Rapid prototyping" is much easier when you don't have to think about the your data, its types nor the relationships between them.

Of course, moving out of this phase becomes a massive headache, since basing your product on essentially unstructured data is a very good definition of "technical debt". And if you're using structured data in your rapidly prototyped object, why not just a RDBMS in the first place? :)


> Of course, moving out of this phase becomes a massive headache, since basing your product on essentially unstructured data is a very good definition of "technical debt".

Of course, it's possible to use things like JSON-Schema to validate your data if you choose to.

> And if you're using structured data in your rapidly prototyped object, why not just a RDBMS in the first place?

Because where's the RDBMS with a "natural" query language that is well-suited to complex, dynamic queries and document structures? How do people model a document store, with the ability to point to differently-shaped data, in an RDBMS without giving up all that safety?


All what safety?

Anyway, I've used and have used both kinds of data stores. They have their places. Solr's a great document store with a very good distributed data story, for example. Not to mention interesting nested documents & queries features.

For the most part though, you'd have a hard time convincing me that MongoDB or COuchDB (or whatever other document stores) are good enough as the system of record for actual products.

Yes, relational databases require a lot more thought about your data. That's the point.


> Or Why You Still Need SQL

Great way to make people close the page immediately after reading the title. (I personally don't need SQL for my small web app => the title's lying => close tab).


I don't need a hammer to screw in a screw, so I don't need a hammer. QED.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: