Can you really “abstract away your database”?

TL;DR: It’s possible to succeed for very narrow requirements and between a limited choice of databases. It’s improbable that you will, though, and it’ll be late-night, high-scale edge cases that let you know.

“Just use a repository!”

Let me stop you there. If you use the word “just”, you’ve made a snap judgement and haven’t thought the problem through to all of its branches.
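For the record, here’s roughly what’s being proposed: a minimal repository sketch in Java, with a hypothetical Order type purely for illustration. The interface looks perfectly engine-neutral; the rest of this post is about what hides behind it.

    import java.util.Optional;

    // Hypothetical domain type, for illustration only.
    record Order(String id, long totalPence) {}

    // The "just use a repository" idea: callers depend only on this
    // interface, and one implementation per database engine is meant
    // to be swappable behind it.
    interface OrderRepository {
        Optional<Order> findById(String id);
        void save(Order order);
    }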

First, a look at history

I personally believe this whole “abstract the database” thing came about properly in the late 1990s, when Oracle and DB2 were the RDBMS mainstays in “Enterprise” and people were still writing “custom” databases over file structures of their own design. Along comes SQL Server, and suddenly there’s a new “Enterprise” option.

This is before the cloud. This is before “software as a service”. Even before enterprise apps were mostly web apps. It’s when you could say “enterprise” with a straight face. It’s probably before “enterprise” was really a thing as we know it now.

The landscape was such that any software you delivered to a customer had to run on their infrastructure, and you’d have little control over that. So if you wanted to cater for all customers, you needed a way to “configure” which database was in use.

However, since your choice was between a small variety of RDBMS engines, abstracting your database was, in theory, quite possible. It’s where Hibernate and Entity Framework came from. They are literally database abstractions.
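To be fair to those tools, the swap really was meant to be pure configuration. Here’s a minimal Hibernate sketch of the idea (the connection details are placeholders, not a working setup):

    import org.hibernate.cfg.Configuration;

    public class DialectSwapDemo {
        public static void main(String[] args) {
            // The dialect tells Hibernate which flavour of SQL to generate;
            // in theory the mapped entities never notice the engine change.
            Configuration cfg = new Configuration()
                .setProperty("hibernate.dialect", "org.hibernate.dialect.OracleDialect") // or SQLServerDialect
                .setProperty("hibernate.connection.url", "jdbc:oracle:thin:@dbhost:1521:ORCL")
                .setProperty("hibernate.connection.driver_class", "oracle.jdbc.OracleDriver");
            // cfg.buildSessionFactory() would then hand back engine-agnostic sessions.
        }
    }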

Even SQL isn’t standard

ANSI SQL != T-SQL != PL/SQL and so on. All the “SQL” databases have different syntaxes, different features, or both. Here are just a few differences I can think of, with a sketch of the first one after the list (some have been aligned by now I expect, but I bet there’s still very little parity in the implementation):

  • SQL Server uses auto-incrementing IDENTITY columns. Oracle didn’t have those. You had to use a “sequence” and some PL/SQL to get auto-incrementing numbers. (I believe SQL Server now has sequences).
  • Datatypes are different. Some databases have a native boolean type; others, like SQL Server, make do with BIT.
  • Oracle has Multi-Version Concurrency Control. SQL Server didn’t (though it now has row versioning via snapshot isolation, which is mostly the same).
  • The way that row locks are escalated to page locks and ultimately table locks may differ – so the same workload on two RDBMSs might suffer from deadlocking in one but not the other.
  • SQL Server participates in distributed transactions between databases and other DTC-capable things like MSMQ and (to a point) RavenDB. Not all databases (i.e. non-Microsoft ones) do.
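To make the first bullet concrete, here’s the same “auto-incrementing key” expressed in the two dialects. Table and column names are made up; this is a sketch, not a full schema:

    public final class AutoIncrementDialects {
        // T-SQL: IDENTITY is declared inline on the column
        // (MONEY is a T-SQL-only datatype, while we're at it).
        static final String SQL_SERVER =
            "CREATE TABLE orders (id INT IDENTITY(1,1) PRIMARY KEY, total MONEY)";

        // Classic Oracle: a plain table, a sequence, and a PL/SQL
        // trigger to wire the two together.
        static final String ORACLE_TABLE =
            "CREATE TABLE orders (id NUMBER PRIMARY KEY, total NUMBER(19,4))";
        static final String ORACLE_SEQUENCE =
            "CREATE SEQUENCE orders_seq START WITH 1 INCREMENT BY 1";
        static final String ORACLE_TRIGGER =
            "CREATE OR REPLACE TRIGGER orders_bi BEFORE INSERT ON orders FOR EACH ROW " +
            "BEGIN SELECT orders_seq.NEXTVAL INTO :NEW.id FROM DUAL; END;";
    }

An ORM’s dialect layer papers over exactly this sort of thing, which is why the abstraction works at all; it’s everything the dialect doesn’t cover that leaks.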

And that’s just RDBMSs. GetEventStore, for example, doesn’t support transactions across two or more documents in its own database, let alone transactions across databases. Redis doesn’t even guarantee writes: persistence is asynchronous by default, so an acknowledged write can still be lost.

Urm, your abstraction is leaking…

Ultimately, a database abstraction is a leaky one at best: your high-level code has to make assumptions about the implementation of the lower-level code, and your interfaces don’t protect you here. Say you’re “doing DDD” and/or CQRS, and you have your aggregate boundaries, transaction boundaries and context boundaries all set up nicely, but you switch from SQL Server to Cassandra. You WILL eventually find that some of your assumptions about how data is written to and read from those systems are wrong. And it’ll happen at scale, when you least expect it, and it’ll be hard to debug.
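Here’s a minimal sketch of where that kind of assumption hides, using plain JDBC and hypothetical table names. On an RDBMS the two inserts commit or roll back together; port the same repository method to a store without multi-row transactions (Cassandra, say) and it can quietly leave half an aggregate behind:

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public final class JdbcOrderRepository {
        private final Connection conn;

        public JdbcOrderRepository(Connection conn) { this.conn = conn; }

        public void save(String orderId, String lineJson) throws Exception {
            conn.setAutoCommit(false); // transaction boundary == aggregate boundary
            try (PreparedStatement head = conn.prepareStatement(
                     "INSERT INTO orders (id) VALUES (?)");
                 PreparedStatement line = conn.prepareStatement(
                     "INSERT INTO order_lines (order_id, body) VALUES (?, ?)")) {
                head.setString(1, orderId);
                head.executeUpdate();
                line.setString(1, orderId);
                line.setString(2, lineJson);
                line.executeUpdate();
                conn.commit();   // both rows or neither, on an RDBMS
            } catch (Exception e) {
                conn.rollback(); // no real equivalent on a per-partition store
                throw e;
            }
        }
    }

Nothing in that method’s signature tells the caller which of those behaviours it’s getting. That’s the leak.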

So don’t kid yourself. And never say “just”.
