
Abstract Data Use Not Data Access

Common data access abstractions I've come across, and been guilty of implementing myself, include:

  • IDatabase
  • IPersistentStore
  • IConnection
  • IDataStore
  • IRepository

The problem is that these are not really abstractions. If anything, they just add an extra layer of indirection. One benefit of this indirection is that each concrete implementation can be substituted, which makes testing easy. Beyond that, such generic solutions introduce a whole host of problems.



Such interfaces sit at the wrong level of abstraction, and they force developers to work at that level too. For example, a controller has no business querying your data store directly. If the same query is required somewhere else, you introduce duplication.

Big Bang Upgrade

Because such indirection offers a poor abstraction, upgrading to a different implementation is tricky. If we assume one hundred usages of IDatabase, all of these code paths need to be migrated and tested. This can be such a huge undertaking that upgrades are often left as technical debt, never to be paid off.

Leaky Abstractions

As with the previous point, these abstractions are poor because they leak implementation details, so they cannot be considered valid abstractions. Consider a SQL implementation of IDatabase. We may have a FindById method that takes an integer as the Id. If we wished to move to a NoSQL solution, the lack of an integer primary key causes problems: FindById for the NoSQL implementation may require a Guid. The interface is now broken.

Interface Bloat

Another downside of coding at the wrong level of abstraction is that the number of use cases grows constantly. What might begin as a humble interface consisting of a handful of query methods soon becomes a dumping ground for all sorts of exotic behaviour specific to niche use cases.

Lowest Common Denominator

Different data access providers have different capabilities, but in order to stay "decoupled" only core functionality present in all providers can be used. This leads to dull, limited interfaces consisting of standard data access functionality. The limited feature set can mean a poor integration. Why avoid the advanced features your library offers?

A poor abstraction that exhibits the problems above may look like this.
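
A minimal sketch of such an interface follows. Only FindById is named in the text; the other members are assumptions added to illustrate how these interfaces tend to look.

```csharp
using System.Collections.Generic;

// Hypothetical generic data access interface, for illustration only.
public interface IDatabase
{
    // Leaks the assumption that every entity has an integer key.
    T FindById<T>(int id);

    // Leaks the query language of the underlying store.
    IEnumerable<T> Query<T>(string sql, object parameters);

    // Generic write operations that every provider must somehow support.
    void Insert<T>(T entity);
    void Delete<T>(int id);
}
```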

To retrieve a user based on the Id.
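
A sketch of that usage, assuming a hypothetical User class and UserController. The interface is trimmed to the one method used so the snippet stands alone.

```csharp
// Hypothetical entity; the properties are placeholders.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Trimmed to the single method this example calls.
public interface IDatabase
{
    T FindById<T>(int id);
}

public class UserController
{
    private readonly IDatabase _database;

    public UserController(IDatabase database)
    {
        _database = database;
    }

    // The controller now knows that a "database" exists
    // and that users are keyed by an integer.
    public User Get(int id)
    {
        return _database.FindById<User>(id);
    }
}
```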


If we abstract how the data is used, and not how the data access is performed, we can avoid these pitfalls. By staying at the right level of abstraction and not leaking implementation details, we end up with a different-looking interface.
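
A sketch of what such an interface might look like. IUserQuery and UserId are the names used in this post, but their shapes here are assumptions: in particular, backing UserId with a string is one possible choice, made so the same type could hold an integer or a Guid.

```csharp
using System;

// Hypothetical result type; the properties are placeholders.
public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// Assumed value object. Consumers pass it around without knowing
// whether the store keys users by integer, Guid, or something else.
public sealed class UserId
{
    public UserId(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("A user id is required.", nameof(value));
        Value = value;
    }

    // Raw identifier, intended to be read by query implementations.
    public string Value { get; }
}

// Named after how the data is used, not how it is fetched.
public interface IUserQuery
{
    User ById(UserId id);
}
```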

The concrete implementation in this example will be a SQL implementation using Dapper.NET.
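
A sketch of how that implementation could look. The Users table, its integer Id column, and the Name and Email columns are all assumptions, as are the shapes of User and UserId. QuerySingleOrDefault is Dapper's extension method on IDbConnection.

```csharp
using System.Data;
using Dapper;

// Assumed shapes, repeated so the snippet stands alone.
public class User
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public sealed class UserId
{
    public UserId(string value) { Value = value; }
    public string Value { get; }
}

public interface IUserQuery
{
    User ById(UserId id);
}

public class SqlUserQuery : IUserQuery
{
    private readonly IDbConnection _connection;

    public SqlUserQuery(IDbConnection connection)
    {
        _connection = connection;
    }

    public User ById(UserId id)
    {
        // Hypothetical schema: Users(Id INT, Name, Email).
        const string sql = "SELECT Name, Email FROM Users WHERE Id = @Id";

        // The integer key is a detail of this implementation only.
        // int.Parse throws FormatException if the id is not numeric.
        var key = int.Parse(id.Value);

        // Dapper is used directly; returns null when no row matches.
        return _connection.QuerySingleOrDefault<User>(sql, new { Id = key });
    }
}
```

Because Dapper is not wrapped, any other feature it offers can be used inside this class without changing IUserQuery.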

The usage is similar.
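
A sketch of a consumer, again with a hypothetical UserController and assumed shapes for User and UserId. The controller depends only on IUserQuery and never learns what kind of store sits behind it.

```csharp
// Assumed shapes, repeated so the snippet stands alone.
public class User
{
    public string Name { get; set; }
}

public sealed class UserId
{
    public UserId(string value) { Value = value; }
    public string Value { get; }
}

public interface IUserQuery
{
    User ById(UserId id);
}

public class UserController
{
    private readonly IUserQuery _userQuery;

    public UserController(IUserQuery userQuery)
    {
        _userQuery = userQuery;
    }

    // The raw id arrives as text (for example from a route) and is
    // wrapped straight away; no key type appears in this class.
    public User Get(string id)
    {
        return _userQuery.ById(new UserId(id));
    }
}
```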

The key point here is that we solve the problems of the "generic" solution.

  • IUserQuery is a better abstraction because it allows selective upgrades. A query this specific will have only a handful of usages, and updating a handful of references is easier than updating every data access component in one go.
  • The fact that we use a SQL database as our store is hidden, so no details leak. UserId encapsulates how we identify users. If we were to switch to a NoSQL store, our consumers would be unaware.
  • One of the biggest benefits is the ability to use our third party library to its fullest. Rather than wrapping Dapper, we use it directly and can take advantage of any special features it offers, instead of conforming to a limited subset of an API.

Aren't We Introducing Lots of Classes?

More, but not "lots". This is a common complaint when the above solution is proposed, but given the benefits, the trade-off is certainly worth it. Additionally, each query or repository implemented in this manner is easier to develop and test, due to closer adherence to the Single Responsibility Principle.

How Do We Unit Test SqlUserQuery?

You don't. In this example we use the third party library directly. The benefits discussed earlier justify this, though it means unit testing SqlUserQuery in isolation is not practical. Instead, apply integration tests against a real data store. The rest of the system is coded against the abstraction, so unit tests can be applied there as normal. Any attempt to "abstract" or wrap the third party library will remove many of the benefits of this solution, so don't worry about it.
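
To illustrate unit testing against the abstraction, here is a sketch of a hand-written stub standing in for IUserQuery. InMemoryUserQuery and WelcomeMessage are hypothetical names, and the other types are repeated with assumed shapes so the snippet stands alone.

```csharp
using System.Collections.Generic;

public class User
{
    public string Name { get; set; }
}

public sealed class UserId
{
    public UserId(string value) { Value = value; }
    public string Value { get; }
}

public interface IUserQuery
{
    User ById(UserId id);
}

// Test double: no database, no Dapper, just a dictionary.
public class InMemoryUserQuery : IUserQuery
{
    private readonly Dictionary<string, User> _users = new Dictionary<string, User>();

    public InMemoryUserQuery With(string id, User user)
    {
        _users[id] = user;
        return this;
    }

    // Returns null for unknown ids.
    public User ById(UserId id)
    {
        return _users.TryGetValue(id.Value, out var user) ? user : null;
    }
}

// Hypothetical consumer under test.
public class WelcomeMessage
{
    private readonly IUserQuery _userQuery;

    public WelcomeMessage(IUserQuery userQuery)
    {
        _userQuery = userQuery;
    }

    public string For(UserId id)
    {
        var user = _userQuery.ById(id);
        return user == null ? "Welcome, guest" : "Welcome, " + user.Name;
    }
}
```

A unit test can then build WelcomeMessage with an InMemoryUserQuery holding a known user and assert on the returned string, while SqlUserQuery is covered separately by integration tests.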

For a great discussion on this topic, check out a talk by Kijana Woodard for more examples.

