Skip to main content

The N+1 Problem

The N+1 problem is when multiple queries are executed against a persistent store when a reduced amount could serve the same purpose. This degrades performance, uses more memory and can cause complexity to be added to the code that processes the results. Most sources of the problem come from the poor use of ORMs or developers thinking procedurally instead of in terms of how the underlying database operates.

Example

Consider a collection of posts that each contain zero or more comments.

  Post
    Comment
    Comment
  Post
    Comment
    Comment
    Comment
    Comment
    Comment
  Post

To retrieve a selection of ten posts including their comments, one option would be to query all posts then perform a query for each individual posts' comments. This would result in a total of eleven queries. While this solution works it is far from ideal. Disturbingly this solution is easily introduced when developers execute queries against databases using loops or misconfigured ORMs.

Solutions

Solutions to solving the N+1 problem are remarkably straightforward. In the case of manual queries such changes are usually easy to implement.

Single Query

Use a join operation to perform a single query. This one query would pull back all posts and their matching comments. This would be the ideal fix for the example described above.

Query and Stitch

Sometimes there is no clear grouping or relation between sets of data. This is often the case when normalized data needs to be denormalized prior to retrieval. In these cases the query and stitch method can be used.

  • One query to grab master set.
  • Another query to grab the related set.

Then simply match on a key in code. The key would be something that groups the data and is present in both sets or is the result of additional programming logic. Query and stitch is useful for paging or when relational thinking and grouping does not fit. This tends to be the case for REST APIs where data is aggregated or composed from multiple sources, or needs further processing after retrieval.

Despite two queries here, it is often possible to return separate datasets within a single query prior to stitching the data together as a further optimisation and simplification.

ORMs or Tooling

When ORMs are used discovering the N+1 problem is more obscured without logging the underlying queries that are performed. Once an issue is discovered it is usually a case of consulting documentation on what the fix is - often configuration related. Due to this it is worth enabling logging during development so queries can be analysed.

Comments

Popular posts from this blog

Constant Object Anti Pattern

Most constants are used to remove magic numbers or variables that lack context. A classic example would be code littered with the number 7. What does this refer to exactly? If this was replaced with DaysInWeek or similar, much clarity is provided. You can determine that code performing offsets would be adding days, rather than a mysterious number seven.Sadly a common pattern which uses constants is the use of a single constant file or object. The beauty of constants is clarity, and the obvious fact such variables are fixed. When a constant container is used, constants are simply lumped together. These can grow in size and often become a dumping ground for all values within the application.A disadvantage of this pattern is the actual value is hidden. While a friendly variable name is great, there will come a time where you will want to know the actual value. This forces you to navigate, if only to peek at the value within the constant object. A solution is to simple perform a refactor …

Three Steps to Code Quality via TDD

Common complaints and problems that I've both encountered and hear other developers raise when it comes to the practice of Test Driven Development are: Impossible to refactor without all the tests breakingMinor changes require hours of changes to test codeTest setup is huge, slow to write and difficult to understandThe use of test doubles (mocks, stubs and fakes is confusing)Over the next three posts I will demonstrate three easy steps that can resolve the problems above. In turn this will allow developers to gain one of the benefits that TDD promises - the ability to refactor your code mercifully in order to improve code quality.StepsStop Making Everything PublicLimit the Amount of Dependencies you Use A Unit is Not Always a Method or ClassCode quality is a tricky subject and highly subjective, however if you follow the three guidelines above you should have the ability to radically change implementation details and therefore improve code quality when needed.

DRY vs DAMP in Tests

In the previous post I mentioned that duplication in tests is not always bad. Sometimes duplication becomes a problem. Tests can become large or virtually identically excluding a few lines. Changes to these tests can take a while and increase the maintenance overhead. At this point, DRY violations need to be resolved.SolutionsTest HelpersA common solution is to extract common functionality into setup methods or other helper utilities. While this will remove and reduce duplication this can make tests a bit harder to read as the test is now split amongst unrelated components. There is a limit to how useful such extractions can help as each test may need to do something slightly differently.DAMP - Descriptive and Meaningful PhrasesDescriptive and Meaningful Phrases is the alter ego of DRY. DAMP tests often use the builder pattern to construct the System Under Test. This allows calls to be chained in a fluent API style, similar to the Page Object Pattern. Internally the implementation wil…