Tuesday, 27 January 2015

Why Technical Blogging?

Given this is my fifth year of blogging, I figured it would be worthwhile answering "Why bother with technical blogging?".

Get Writing

Write about anything. Just get started, provided it fits your core focus. This blog focuses on programming and software development related topics, so anything that falls within that category is fair game. Take a single idea and run with it; from that one blog post you can generate many more ideas. This is where my upcoming list comes from. A single post can spawn many others, and the process repeats itself.

Honest posts that focus on your experiences tend to be the best received. Quality over quantity also matters. I try to keep posts focused, rather than going for length or covering in-depth topics. My early posts are very rough around the edges; they will continue to improve as time goes by. Ultimately, the more you blog, the better you'll become at it.

Schedule

Finding the time to create posts is quite difficult. Making and sticking to a schedule can help immensely though. Since adopting a weekly schedule I have produced a steady stream of posts, and in turn these posts lead to a steady stream of views. Being completely honest, getting started was hard. Following a schedule and using the advice in this post can help though. Initially you may spend a long time working on content, but over time this will reduce.

Views

The best advice is to ignore view counts. High view counts make you feel great, but there is much more to writing content than simply generating stats. Your highest viewed posts may very well surprise you, likewise content you feel should be seen by everyone can struggle. Rather than views, interactions are much more rewarding. Any content that gets a retweet, reply or email is much more satisfying.

People

In the area of technical blogging the majority of interactions are good natured. People are overwhelmingly nice in most cases. Twitter tends to yield positive comments or retweets, partly due to the use of real names in most cases. Article submission sites can be a mixed bag, but any negativity is balanced out by the view-count-to-comment ratio. I have also discovered a lot of interesting people to follow thanks to this blog.

Benefit Yourself

Regular posting allows you to practice writing, which is a surprisingly enjoyable activity when you enjoy the content.

I learn a lot from doing, but I also find writing down what I have learned or discovered is incredibly valuable. Having an archive of content that I find important is a huge help. Learned something new? Blog about it. Talked about something relevant? Blog about it. This helps with the generation of new content.

Having an archive of posts is great for reflection. Looking back over old posts and confirming whether or not I still agree with them helps with learning. Have I discovered anything new since? Just the act of re-reading and refreshing myself on a concept can be useful. This, combined with a developer diary, has proved a powerful combination.

Career Benefits

In addition to the personal benefits, regular blogging has had a big impact on my career. It has helped me during job interviews, as it provides evidence for my claims. Most surprisingly, eighteen months ago I was offered the chance to help write a book. Unfortunately, due to a new house and job I was forced to decline the offer at the time; however, without technical blogging and other writing there is no chance I would have had this opportunity.

Within the last couple of weeks I have been contacted by numerous recruiters. Among these was a personalised email which not only mentioned my blog but clearly took in my other online contributions too. This recruiter went above and beyond the norm. While I never ended up working with them, this polite and encouraging email is yet another benefit of technical blogging.


The book Technical Blogging by @acangiano is a great starting place for more information and advice.

Tuesday, 20 January 2015

Abstract Data Use Not Data Access

Common data access abstractions I've come across and been guilty of implementing myself are the likes of:

  • IDatabase
  • IPersistentStore
  • IConnection
  • IDataStore
  • IRepository

The problem is, these are not really abstractions. If anything they add an extra layer of indirection. One benefit of this indirection is that each concrete implementation can be substituted, which makes testing easy. Other than that, such generic solutions introduce a whole host of problems.


Problems

Abstraction

Such examples sit at the wrong level of abstraction, and the indirection forces developers to work at that level too. For example, a controller has no business querying your data store directly. If the same query is required somewhere else, you introduce duplication.

Big Bang Upgrade

Given such indirection offers a poor abstraction, upgrading to use a different implementation is tricky. If we assume one hundred usages of IDatabase, all of these code paths need to be migrated and tested. This can be such a huge undertaking that upgrades are often left as technical debt, never to be fulfilled.

Leaky Abstractions

In a similar manner to the previous point, these abstractions are poor because they leak implementation details, so they cannot be considered valid abstractions. Consider a SQL implementation of IDatabase with a FindById method that takes an integer as the Id. If we wished to move to a NoSQL solution, the lack of an integer primary key causes problems: FindById for the NoSQL implementation may require a Guid. The interface is now broken.

Interface Bloat

Another downside of coding at the wrong level of abstraction is that the number of use cases increases constantly. What might begin as a humble interface consisting of a handful of query methods soon becomes a dumping ground for all sorts of exotic behaviour specific to niche use cases.

Lowest Common Denominator

Different data access providers have different capabilities, but in order to stay "decoupled" only core functionality present in all providers can be used. This leads to dull, limited interfaces consisting of standard data access functionality. The limited feature set can mean a poor integration. Why avoid the advanced features your library offers?

A poor abstraction that exhibits the problems above may look like this.
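
A minimal sketch of such an interface (the exact members here are illustrative, not taken from any particular codebase) could be:

    using System.Collections.Generic;

    // A "generic" data access interface - an extra layer of indirection
    // rather than a true abstraction.
    public interface IDatabase
    {
        T FindById<T>(int id);                  // assumes an integer primary key
        IEnumerable<T> Query<T>(string query);  // leaks the query language
        void Save<T>(T entity);
    }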

To retrieve a user based on the Id.
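
With that in place, consumers end up querying the store directly (again, a hypothetical sketch):

    // A controller, or any other consumer, working at the wrong level
    // of abstraction.
    IDatabase database = new SqlDatabase(connectionString);
    User user = database.FindById<User>(5);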


Solution

If we abstract how the data is used and not how the data access is performed we can avoid these pitfalls. By staying at the right level of abstraction and not leaking implementation details we end up with a different looking interface.
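
A sketch of what such an interface could look like, using the IUserQuery and UserId names referred to later in this post (the exact shape is an assumption):

    // Abstraction of how the data is used, not how it is accessed.
    public interface IUserQuery
    {
        User Find(UserId id);
    }

    // Encapsulates how users are identified; consumers never learn
    // whether this is an int, a Guid or something else.
    public class UserId
    {
        public UserId(int value)
        {
            Value = value;
        }

        internal int Value { get; private set; }
    }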

The concrete implementation in this example will be a SQL implementation using Dapper.NET.
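
A minimal version of that implementation might look like the following; the Users table, the User type and the connection handling are assumptions, and Dapper's Query extension method is used over a plain SqlConnection:

    using System.Data.SqlClient;
    using System.Linq;
    using Dapper;

    public class SqlUserQuery : IUserQuery
    {
        private readonly string connectionString;

        public SqlUserQuery(string connectionString)
        {
            this.connectionString = connectionString;
        }

        public User Find(UserId id)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // Dapper is used directly, so any of its features are
                // available to us - nothing is hidden behind a wrapper.
                return connection
                    .Query<User>("SELECT * FROM Users WHERE Id = @Id",
                                 new { Id = id.Value })
                    .SingleOrDefault();
            }
        }
    }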

The usage is similar.
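
From the consumer's point of view the call site barely changes (a sketch):

    // Consumers depend only on the abstraction.
    IUserQuery userQuery = new SqlUserQuery(connectionString);
    User user = userQuery.Find(new UserId(5));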

The key point here is that we solve the problems of the "generic" solution.

  • IUserQuery is a better abstraction; it allows selective upgrades. This query will have limited use, so updating a handful of references is easier than updating every data access component in one go.
  • The fact that we use a SQL database as our store is hidden; no details leak. UserId encapsulates how we identify users, so if we were to switch to a NoSQL store our consumers would be unaware.
  • One of the biggest benefits is the ability to use our third party library to its fullest. Rather than wrapping Dapper we can use it directly, taking advantage of any special features it offers rather than conforming to a limited subset of an API.

Aren't We Introducing Lots of Classes?

More, but not "lots". This is a common complaint when the above solution is proposed, but given the benefits the trade off is certainly worth it. Additionally, each query or repository implemented in this manner is easier to develop and test due to closer adherence to the Single Responsibility Principle.

How Do We Unit Test SqlUserQuery?

You don't. In this example we make use of the third party library directly. The benefits discussed above justify this, though it means unit testing is not possible. Instead you should apply integration testing against a real data store. The rest of the system will be coded against the abstraction, so unit tests can be applied as normal there. Any attempt to "abstract" or wrap the third party library would remove many of the benefits of this solution, so don't worry about it.


For a great discussion on this topic, check out a talk by Kijana Woodard for more examples.

Tuesday, 6 January 2015

Caching

The naive approach to implementing caching is to just store everything in an in-memory collection such as a hashtable. After all, it works on my machine.
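
As an illustration only (this is not code from the system in question), the naive approach amounts to something like this:

    using System.Collections.Concurrent;

    // Naive "cache": no TTL, no eviction, and invisible to any other process.
    public static class NaiveCache
    {
        private static readonly ConcurrentDictionary<string, object> Items =
            new ConcurrentDictionary<string, object>();

        public static void Put(string key, object value)
        {
            Items[key] = value; // grows until the process runs out of memory
        }

        public static object Get(string key)
        {
            object value;
            return Items.TryGetValue(key, out value) ? value : null;
        }
    }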

I've worked on systems in the past that used this technique but:

  • Bring in two processes and this falls apart
  • No Time to Live (TTL)
  • No cache eviction; memory will grow until it crashes the process

This sort of caching meant the system needed daily restarts because each worker process gradually ate up more and more RAM. At the time I didn't realise this was the reason daily restarts were required. The restarts were automated, so the team just sort of forgot about the problem after a while. This never felt right.

"Improper use of caching is the major cause of memory leaks, which turn into horrors like daily server restarts" - @mtnygard in Release It!.

Scale this system up and daily becomes twice daily, and so on. In a global market, where software shouldn't be constrained by time zones or "working hours", this is wrong.

Solutions

There are numerous easy ways to solve these problems depending on the application in question.

Don't Roll your Own, Try a Third Party

Easy. Just use an off-the-shelf solution that solves the problems above and includes a whole host of additional features.

Use your Standard Library

For example .NET includes caching functionality within the System.Runtime.Caching namespace. While there are limitations to this, it will work for some scenarios and solves some of the problems above.
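
For instance, MemoryCache from that namespace supports expiration via CacheItemPolicy, which on its own covers the TTL problem above; a rough sketch:

    using System;
    using System.Runtime.Caching;

    public class SimpleCache
    {
        private readonly MemoryCache cache = MemoryCache.Default;

        public void Put(string key, object value)
        {
            // Entries expire after five minutes rather than living forever.
            var policy = new CacheItemPolicy
            {
                AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(5)
            };

            cache.Set(key, value, policy);
        }

        public object Get(string key)
        {
            return cache.Get(key);
        }
    }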

Soft References

I've overlooked soft references in the past, but for caching they can be incredibly useful. Use them for anything that isn't important or that can be recalculated. An example would be content displayed within an MVC view, stored in the web server's session. If each item stored is a weak reference we gain some benefits:

  • Stops your web server running out of memory - references will be reclaimed if memory starts to become a bottleneck.
  • Greater scalability with the same amount of memory - great for a sudden spike in traffic.

A web server's session being full of references that won't expire for a set period is a common cause of downtime. If soft references are used, all we need to do is perform a simple conditional check prior to retrieval from the session. Different languages have similar features, e.g. Weak References in .NET.
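
In .NET that conditional check might look like the following sketch; WeakReference<T> is the relevant type, while ReportModel and the "report" session key are placeholders for whatever recalculable content the view needs:

    using System;
    using System.Web;

    public class ReportModel { /* recalculable view content */ }

    public static class SessionReportCache
    {
        public static ReportModel GetOrCreate(HttpSessionStateBase session,
                                              Func<ReportModel> build)
        {
            var reference = session["report"] as WeakReference<ReportModel>;

            ReportModel report;
            if (reference != null && reference.TryGetTarget(out report))
            {
                return report; // still alive, nothing to recalculate
            }

            // Reclaimed under memory pressure (or never cached) - rebuild it.
            report = build();
            session["report"] = new WeakReference<ReportModel>(report);
            return report;
        }
    }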

Pre-Computation

Caching isn't always the best solution; in some cases pre-computation can be much easier and offer better performance. With caching, at least some users will experience a slow response until the cache is warm; other techniques can avoid this completely. I will expand on pre-computation in a future post.

Reference

More information can be found in the excellent book Release It!