What’s Wrong with SQL?
A lot of things, Mike Stonebraker might say, but I have something rather fundamental in mind.
Suppose I’m developing some sort of academic course management system. Chances are I’ll want to display to the user a list of course offerings and their associated course codes, readings from the syllabus, meeting times etc. Maybe something like this:
Now according to Good Rules of Normalization and Decency, I probably stored this data across several database tables, related by foreign keys. I might have tables named “offerings”, “course_codes”, “readings”, “sections”, “meetings” and so forth. So how do I retrieve all this related data from the database?
The good news is that relational databases are made for just this kind of task: joining tables efficiently is what they do for a living. Unsuspectingly, I run my query [1]:
SELECT o.title, cc.code, r.author, r.title, s.name,
m.start_time, m.end_time, m.day, m.place
FROM offerings o, course_codes cc, readings r, sections s,
meetings m
WHERE cc.oid = o.id
AND r.oid = o.id
AND s.oid = o.id
AND m.sid = s.id;
The bad news is: That didn’t work too well. The mistake may seem obvious to seasoned database application developers: I can’t just do several unrelated joins in parallel like that, or I’ll get a gazillion rows [2] back. Not only does this lead to exponentially bad performance, but the result is also in a rather annoying form as far as the client application is concerned. There is even another problem: if any of the courses in the database do not happen to have any sections or readings listed, they will be omitted from the result. SQL “fixes” this through a hack known as outer joins. It introduces NULL values into the result and, rather undeclaratively, requires each join to have its particular join condition specified explicitly rather than as part of the more general WHERE clause.
So how do we retrieve data like this from a relational database? We pull the joins out of the database and evaluate them ourselves, in our own application-specific data structures. Just about every non-trivial database web app out there does this in some way or another. The data is stored across multiple related tables in some MySQL or Postgres database. When the Javascript in the end user’s browser needs to present data to the user in some hierarchical fashion like the example above, it issues a request to a server-side middle layer, written in PHP, Ruby on Rails, Python, Java, awk or whatnot. The middle layer, possibly with the help of a persistence library, then issues a bunch of separate SQL queries to the database to retrieve all the data involved, assembles (read: joins) this into some hierarchical data structure, and returns it to the Javascript app in JSON or XML form. True, the database does help limit the data enough that this assembly process is not too much of a performance concern. But joining tables is the job of the database, and we shouldn’t have to write middle layers to do it ourselves.
There should be a general and declarative way to make big joiny queries like the above work efficiently, returning the data in exactly the hierarchical form we want it — strictly relational result sets are not expressive enough. I am currently working on a simple SQL-like query language that does just this: send my generalized middleware a single big, declarative (no for loops or outer joins here!) query, and you’ll get back the JSON equivalent of the relational result set with the data nested into arrays and objects any way you want it.
[1] “No one does this!” some may object. Actually, Ruby on Rails’ own ActiveRecord did for a while.
[2] I believe the technical term is “The Cartesian Product.” Darn you, Descartes.


Does anyone know if there is another language or set of commands beside SQL for talking with databases?
I’m working on a project and am doing some research thanks
You know, the thing about SQL is, that there is virtually nothing that can replace it.
Does anyone know if a substitute exists for sql? I mean besides MS SQL and Oracle and all that jazz. Thanks.