Tuesday, 1 September 2015

Release It - Highlights Part 1

Release It! is one of the most useful books I've read. The advice and suggestions inside certainly change your perspective on how to write software. My key takeaway is that software should be cynical. Expect the worst, expect failures and put up boundaries. In the majority of cases these failures will be triggered by integration points with other systems, be they third parties or your own.

My rough notes and snippets will be spread across the following two posts. There is much more to the book than this, including various examples of real life systems failing and how they should have handled the problem in the first place.

  • Part 1 - Shared Resources, Responses, SLA, Databases and Circuit Breakers
  • Part 2 - Caches, Testing, HTML, Pre-Computation and Logging (Future Post)

Shared Resources

  • Shared Resources can jeopardize scalability.
  • When a shared resource gets overloaded, it will become a bottleneck.
  • If you provide the front end system, test what happens if the back end is slow/down. If you provide the back end, test what happens if the front end is under heavy load.

Responses

  • Generating a slow response is worse than refusing to connect or timing out.
  • Slow responses trigger cascading failures.
  • Slow responses on the front end trigger more requests: users hit refresh a few times, ironically generating even more load.
  • You should error when a response exceeds the system's allowed time, rather than continuing to wait.
  • Most default timeouts of libraries and frameworks are far too generous - always configure manually.
  • One of the worst places scaling effects will bite you is point-to-point communication. Favour alternatives such as messaging to remove this problem.
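To make the timeout point concrete, here is a minimal JavaScript sketch of wrapping a call with an explicitly configured budget rather than trusting a library default. The helper name and the error message are invented for this example.

```javascript
// Wrap any promise so it fails fast once the allowed time is exceeded,
// instead of leaving the caller blocked on a slow response.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timed out')), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `withTimeout(fetch(url), 50)` then errors out quickly rather than waiting on a slow integration point.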

SLA

  • When calling third parties, service levels can only decrease.
  • Make sure even without a third party response your system can degrade gracefully.
  • Be careful when crafting SLAs. Do not simply state 99.999%; it costs too much to hit this target and most systems don't need that sort of uptime.
  • Reorient the discussion around SLAs to focus on features, not systems.
  • You cannot offer a better SLA than the worst of any external dependencies you use.
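The last point follows from a little arithmetic: availabilities of serial dependencies multiply, so the composite can only be worse than the weakest link. The figures below are made-up examples, not real SLAs.

```javascript
// Each entry is the promised availability of one external dependency.
const dependencies = [0.999, 0.995, 0.99];

// A feature that needs all of them in series can do no better than their product.
const composite = dependencies.reduce((acc, a) => acc * a, 1);

console.log(composite); // noticeably below even the weakest single dependency
```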

Databases

  • Your application probably trusts the database far too much.
  • Design with scepticism and you will achieve resilience.
  • What happens if the DB returns 5 million rows instead of 500? You could run out of memory trying to load them all. The only answers a query can return are 0, 1 or many. Don't rely on the database to enforce this limit: other systems or batch processes may not respect the rule and could insert too much data.
  • Once a system is in production, queries can return huge result sets, unlike developer testing where only a small subset of data exists.
  • Limit your DB queries, e.g. SELECT * FROM table LIMIT 15 (with the wildcard and table substituted as appropriate).
  • Put limits into other application protocols, such as REST endpoints, via paging or offsets.
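As a sketch of the paging idea, the hypothetical helper below builds a bounded query so no single request can return an unbounded result set. The function name and parameters are illustrative, not from the book.

```javascript
// Build a query that can never return more than pageSize rows.
// Page numbering starts at zero.
function buildPageQuery(table, pageSize, page) {
  const offset = page * pageSize;
  return `SELECT * FROM ${table} LIMIT ${pageSize} OFFSET ${offset}`;
}
```

For example, `buildPageQuery('orders', 15, 2)` yields `SELECT * FROM orders LIMIT 15 OFFSET 30`.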

Circuit Breakers

  • Now and forever networks will always be unreliable.
  • The timeout pattern prevents calls to integration points from becoming blocked threads.
  • Circuit Breakers are a way of automatically degrading functionality when a system is under stress.
  • Changes in a circuit breaker should always be logged and monitored.
  • The frequency of state changes in a circuit breaker can help diagnose other problems with the system.
  • When there is a problem with an integration point, stop calling it during a cool off period. The circuit breaker will enable this.
  • Popping a circuit breaker always indicates a serious problem - log it.
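The points above can be sketched as a minimal breaker around an async integration call. This is an illustrative toy, not the book's implementation: the thresholds are arbitrary defaults, and a production breaker would add proper monitoring hooks and an explicit half-open state.

```javascript
// Minimal circuit breaker sketch; maxFailures and coolOffMs are
// illustrative defaults, not recommendations.
class CircuitBreaker {
  constructor(fn, { maxFailures = 3, coolOffMs = 5000 } = {}) {
    this.fn = fn;
    this.maxFailures = maxFailures;
    this.coolOffMs = coolOffMs;
    this.failures = 0;
    this.openedAt = null;
  }

  async call(...args) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.coolOffMs) {
        // Fail fast during the cool-off period instead of hitting the
        // struggling integration point again.
        throw new Error('circuit open');
      }
      this.openedAt = null; // cool-off elapsed: permit calls again
      this.failures = 0;
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0; // success resets the failure count
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = Date.now();
        console.error('circuit breaker opened'); // always log a popped breaker
      }
      throw err;
    }
  }
}
```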

Tuesday, 25 August 2015

Production Code is Dirty

Production code is dirty. Dirty may be the wrong word, however; complex could be more suitable. Unlike code that is not yet in production, it is weathered, proven, and full of edge cases, including numerous bug fixes. Over time this build-up of additions can cause the code to be considered dirty or legacy.

Greenfield development used to appeal so much more. Small classes. Small methods. Few dependencies. Just simple, clean code. Except this is not the case. Get into production and that clean code starts to weather. You'll handle edge cases, fix bugs and stabilize the functionality. That lovely, small, well factored application starts to accumulate dirt. The new code smell wears off and you're back waiting for the next new project so you can do it properly a second time around.

This does not have to be the case however. Long living software such as operating systems, browsers and embedded systems are maintained and extended well after they were created. Production code can be complicated but still clean with redeemable qualities. In order to do this you should write tests, control dependencies and get into production or the hands of the user as soon as possible. This may seem an obvious solution but sadly many software projects fall into this trap of dirty code after a handful of iterations.

Tuesday, 18 August 2015

Queue Centric Work Pattern

The Queue Centric Work Pattern (QCWP) is simple. Send a message declaring the intent of the command, acknowledge the message and proceed. All work takes place in a background process so the user is not kept waiting for the request to return. Acknowledgement usually takes the form of persistence to ensure that no messages are lost. Real life examples of the QCWP in action would be the sending of an email or the confirmation of an order being accepted from an online retailer.

The QCWP will introduce the concept of eventual consistency, which surprisingly is not an issue in most cases. The queue itself should be implemented via some form of message queue that handles the more complicated technical issues regarding message metadata, routing, persistence and so on. Once a message queue has been chosen, the code required to implement the QCWP does not differ too far from simple request-response examples in terms of both complexity and lines of code.
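A toy in-memory illustration of the shape of the pattern follows. A real system would use a message broker with persistent acknowledgement; the function and message names here are invented for the example.

```javascript
const queue = [];

// The web-facing side: declare the intent, acknowledge, return immediately.
function submitOrder(order) {
  queue.push({ type: 'PlaceOrder', payload: order });
  return { accepted: true }; // the user is not kept waiting for the work
}

// The background side: drain messages at its own pace.
function consume(handler) {
  while (queue.length > 0) {
    handler(queue.shift());
  }
}
```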

Benefits

Reduced Latency

Transferring the message, confirming acknowledgement and returning to the user with some form of confirmation can be very quick. If the process is long running, it can be vastly quicker to use the QCWP. Even for low latency scenarios, the QCWP introduces other benefits.

Retry

If something fails you can retry the command in a background process. Nothing is lost when one or more systems are down. If the command fails consistently, then you can simply notify the user or perform some other compensating action.

Decoupled

If one system is offline, the message is simply stored and the queue builds up. Once the system is back online, the queue is drained. The temporal coupling between the two systems is now removed. Coupling is reduced so much that you can swap the consumer for another system and the client would be unaware, as long as the message formats remain the same. This also allows different languages to read from and populate the queues.

Scaling

To increase throughput you can simply introduce competing consumers until messages are handled within the SLA boundary. The inverse is also true: the QCWP allows throttling. Rather than peak load from web server traffic hitting the back end services directly, the two can be scaled independently. As the consumer handles each message at its own pace, there is no chance that other dependencies such as databases become overwhelmed.

Downsides

These benefits don't come for free however. The main issue with the QCWP is the time it takes to get to grips with this change of conceptual model. Testing asynchronous code is a lot harder, introducing problems such as polling shared resources for changes. The very same issue means simply debugging asynchronous systems can be challenging even with good monitoring and auditing in place.

Conclusion

QCWP was a real change in terms of how I think about two services communicating. This change in pattern is not hard, merely different. Once you adjust to the challenges, the benefits enable some truly resilient systems when communication must occur out of process.

Tuesday, 11 August 2015

Loops vs Functional Programming Styles

The following are four of the most common functional programming patterns that appear in mainstream languages, though they may be known under different names.

Being a fan of CQS and CQRS, queries work great when coded in the functional style. While this is completely subjective in terms of style, there is another benefit - composition. In other words, the functional styles below can all be joined together with minimal changes, whereas a traditional loop would require additional modifications. The benefit composition provides is similar to the pipes and filters architecture - it is very easy to change the behaviour of the pipeline by simply adding or removing statements.
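For instance, a pipeline of small stages can be extended or trimmed without restructuring anything. The order data below is invented for illustration.

```javascript
const orders = [
  { total: 120, shipped: true },
  { total: 80, shipped: false },
  { total: 200, shipped: true },
];

// Each stage can be added or removed independently, pipes-and-filters style.
const shippedRevenue = orders
  .filter(o => o.shipped)          // keep only shipped orders
  .map(o => o.total)               // project each order to its total
  .reduce((sum, t) => sum + t, 0); // fold into a single figure
```

Here `shippedRevenue` is 320; dropping the `filter` stage, say, changes the pipeline without touching any loop structure, because there is none.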

Composition and concise code aside, traditional loops should not be avoided fully. Each scenario will have different solutions. Sometimes you really just want a standard loop.

The benefit of learning the key concepts behind Map, Filter, ForEach and Reduce is the ability to translate these styles and idioms into other languages that may have the same functionality just behind a different interface.

Map

Also known as Projection. Convert the array into a new array based on the callback provided.
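A small JavaScript example:

```javascript
// Project each element into a new array; the source array is untouched.
const numbers = [1, 2, 3];
const squares = numbers.map(n => n * n); // → [1, 4, 9]
```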

Filter

Filter the array, keeping elements for which the callback returns true. In the same manner as Map, the non-functional version of this code is an extremely common pattern, so the functional version really shines here.
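A small JavaScript example:

```javascript
// Keep only the elements for which the callback returns true.
const scores = [72, 45, 90, 30];
const passing = scores.filter(s => s >= 60); // → [72, 90]
```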

ForEach

Invokes the callback for each member of the array. This is another very common pattern that really benefits from the functional form.
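A small JavaScript example:

```javascript
// Invoke the callback once per element; typically used for side effects.
const names = ['Ada', 'Grace'];
const greetings = [];
names.forEach(name => greetings.push(`Hello, ${name}`));
// greetings is now ['Hello, Ada', 'Hello, Grace']
```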

Reduce

Converts the array into a single value by repeatedly applying a callback that takes the accumulated result so far and the next element as parameters.
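A small JavaScript example:

```javascript
// Fold the array into one value: the callback receives the accumulated
// result and the next element; 0 is the initial accumulator.
const values = [1, 2, 3, 4];
const sum = values.reduce((acc, v) => acc + v, 0); // → 10
```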

Tuesday, 4 August 2015

Why I Don't Like Mocking Frameworks

Disclaimer: By mocking framework I mean anything that includes support for stubs and mock objects.


The use of mocking frameworks was a difficult part of my TDD journey. Not only are newcomers expected to get their heads around the basics of the practice, there are now new tools to contend with. To make matters worse, there are a lot of mocking frameworks out there, of differing quality and suitability.

The use of mocking frameworks includes a variety of disadvantages.

  • Readability suffers in most cases. You often find yourself asking: what exactly is happening here? The frameworks themselves usually impose these constraints and make the issue worse.
  • The use of frameworks tends to lead to header interfaces rather than role interfaces. IDEs usually play a part in this, as they make the anti-pattern so very easy to introduce.
  • A lot of developers are not aware of what these frameworks are doing behind the scenes. This can lead to confusing tests and a general lack of understanding.

Solution

My preference is to use hand crafted test doubles. While these are looked down upon by some, they offer numerous benefits.

  • Stubs and Fakes are easier to understand, write and maintain when hand crafted.
  • Manual test doubles read more easily. The key benefit is being able to name implementations after their use and function.
  • Hand crafted test doubles promote reuse. It is likely that such doubles will be used across numerous tests, and once created, code duplication actually reduces.
  • Hand crafted test doubles are a prerequisite to enable contract testing.

The actual implementation of these hand crafted doubles is minimal. In most cases simply providing the arguments as constructor or method parameters works. For more complicated scenarios DAMP tests can be used.
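As an illustration, a hand-crafted stub can carry its behaviour in its name and take its canned response as a constructor argument. The payment gateway names below are invented, not from any real codebase.

```javascript
// A stub named after what it does: always answers with its canned response.
class StubGatewayReturningDeclined {
  constructor(response = { status: 'declined' }) {
    this.response = response;
  }
  charge(amount) {
    return this.response;
  }
}

// The system under test receives the stub through its dependency.
function processPayment(gateway, amount) {
  const result = gateway.charge(amount);
  return result.status === 'approved' ? 'ok' : 'rejected';
}
```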

One area where frameworks do provide a benefit is mock objects. In non-trivial examples, the requirement to verify numerous parameters and configurations can be verbose to hand craft. However, there are alternatives such as the self-shunt pattern, which will be expanded upon in a future post.