A Modest Proposal: The Most Recent Update Pattern

As I’m currently working on the design of a new application, I have been reading up on design patterns. Using an established pattern to solve a problem that isn’t unique to your application is a great way to make your design more understandable to others and thus easier to maintain (so is writing unit tests, by the way).

For my particular problem, I was most interested in the Concurrency Patterns, though some of them are pretty trivial (like basic locking; do we really need to call that a pattern?). Going over that list, I found one pattern missing that I have used successfully in several multi-threaded applications processing large amounts of real-time data. I have termed it, for lack of a better name, the Most Recent Update Pattern.

Pattern Description

Let me explain what I mean by the Most Recent Update Pattern and where I see a use case for it, using as an example an application that processes stock quotes.

Use Case: Multi-threaded Producer Consumer Relationship

This application has an uplink to a market data provider (e.g. Bloomberg, Reuters) that provides it with a feed of quotes for thousands of different stocks. Data from this feed is used to update objects representing the stocks we’re interested in. The logic to read from the data feed and update the stock objects (henceforth referred to as data objects) is contained in the producer thread. Actually processing the stock quotes (e.g. doing some technical analysis on them) is complex and time-consuming and is therefore done in one or more separate threads, a.k.a. the consumer threads. Each consumer tells the producer which data objects it will be processing by subscribing to them with the producer (akin to the observer pattern). Updates for the subscribed data objects are passed to each consumer via a message queue (one per consumer), where they wait until the consumer has time to process them.

Problem: Information Overload

If this were a simple FIFO queue and the rate of new updates exceeded the rate at which updates are processed, the consumer might never get around to processing all updates, and the application would eventually run out of memory. Even if the rate of incoming updates went down and the consumer could catch up, there would probably be a lot of outdated updates left in the queue. Because the consumer dequeues updates from the head of the queue, it would be unaware of newer updates waiting somewhere towards the tail.

Solution: Process only the most recent update

What you really want in a situation like this is to tell the consumer which data objects have been updated, and when it has time to process one, to let it do so using the most recent update for that data object. It would thus ignore any updates there may have been in the meantime, but those would be outdated and therefore irrelevant anyway (i.e. no guaranteed message delivery). This way, the queue’s maximum size is bounded by the number of data objects the consumer has subscribed to. Also, when the consumer dequeues an update and begins processing it, it always works with the most recent, up-to-date data.

The following graphic illustrates the objects involved, using the same terminology as the sample implementation below:

[Figure: MostRecentUpdatePatternSample]

Sample Implementation

Here is a sample implementation of the Most Recent Update pattern in C#. Since C# uses garbage collection for automatic memory management, we don’t have to worry about what happens to updates that have been replaced by newer ones in the queue. I have, however, implemented this pattern in Delphi as well, where one doesn’t have that luxury and has to keep track of the objects in order to discard them on the right thread. It’s actually a pretty simple modification, and one that could be used in C# as well to implement the object pool pattern in order to avoid creating and destroying so many update objects.

Update

The Update class is merely a container to hold some value. For simplicity I have made that value a double, but it could be anything you like. It is important to note, though, that the Value field is set only once by the producer, and any subsequent access to it by the consumer(s) is read-only. Therefore, we don’t need any locks or other synchronization mechanisms here.

class Update
{
    public Update(double value)
    {
        Value = value;
    }

    public readonly double Value;
}
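As an aside, the object pool variant mentioned above could look something like the following hypothetical sketch (UpdatePool and PooledUpdate are illustrative names, not part of the sample code). Because a pooled update object is reused, its value can no longer be readonly; this stays safe because ownership of the object passes from the pool to the producer and then to the consumer, so the field is still never accessed concurrently.

```csharp
using System.Collections.Concurrent;

// A mutable counterpart to Update, so instances can be recycled.
class PooledUpdate
{
    public double Value;
}

class UpdatePool
{
    private readonly ConcurrentBag<PooledUpdate> _pool = new ConcurrentBag<PooledUpdate>();

    // Called by the producer instead of "new Update(value)"; reuses a
    // previously released object when one is available.
    public PooledUpdate Acquire(double value)
    {
        PooledUpdate update;
        if (!_pool.TryTake(out update))
        {
            update = new PooledUpdate();
        }
        update.Value = value;
        return update;
    }

    // Called by the consumer when it is done processing an update.
    public void Release(PooledUpdate update)
    {
        _pool.Add(update);
    }
}
```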

Subscription

The subscription represents the producer/consumer relationship and holds a reference to the most recent update that has been produced but not yet consumed. The logic for accessing the MostRecentUpdate field is contained in the SubscriptionQueue class and described in the section below.

internal class Subscription
{
    public Update MostRecentUpdate;
}

SubscriptionQueue

The subscription queue contains most of the logic needed to implement the Most Recent Update pattern. Each consumer has its own queue that only this one consumer reads from, although multiple producers may write to it simultaneously. Since I am aiming for a lock-free design wherever possible, I am using the ConcurrentQueue<T> class from version 4 of the .NET Framework. This queue contains the subscriptions of objects that have been updated, but not the updates themselves.

When there is an update for a subscription, the producer thread sets the MostRecentUpdate field of the subscription using Interlocked.Exchange(). This operation is atomic and thus thread-safe, guaranteeing without locks that there is no simultaneous access to the field by the consumer. The consumer likewise sets this reference to null with Interlocked.Exchange() when it takes ownership of the update in TryDequeue(). Interlocked.Exchange() returns the previous value of the field when a new value is placed into it. If the previous value returned to the producer when enqueuing an update is null, the producer knows that the consumer has received the earlier update and the subscription needs to be queued up again. A non-null value means the subscription is still queued and the previous update has just been replaced; the consumer will thus never see the previous update.

class SubscriptionQueue
{
    private readonly ConcurrentQueue<Subscription> _queue = new ConcurrentQueue<Subscription>();

    public void Enqueue(Subscription subscription, Update update)
    {
        if (null == Interlocked.Exchange(ref subscription.MostRecentUpdate, update))
        {
            _queue.Enqueue(subscription);
        }
        //
        // If the previous value of subscription.MostRecentUpdate was something
        // other than null, the subscription was still queued and we just
        // replaced the update with a newer one
        //
    }

    public bool TryDequeue(out Update update)
    {
        Subscription subscription;
        update = null;
        //
        // The update on a subscription may be null (if it has been voided)
        // in which case we move on to the next subscription
        //
        while (null == update)
        {
            if (!_queue.TryDequeue(out subscription))
            {
                //
                // Stop when there are no more subscriptions
                //
                return false;
            }
            update = Interlocked.Exchange(ref subscription.MostRecentUpdate, null);
        }
        return true;
    }
}
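To illustrate the coalescing behavior, here is a small hypothetical usage snippet, assuming the Update, Subscription and SubscriptionQueue classes shown above:

```csharp
var queue = new SubscriptionQueue();
var subscription = new Subscription();

// Two updates arrive before the consumer gets to run. The second one
// replaces the first in subscription.MostRecentUpdate; the subscription
// itself is only queued once.
queue.Enqueue(subscription, new Update(10.0));
queue.Enqueue(subscription, new Update(10.5));

Update update;
bool gotFirst = queue.TryDequeue(out update);   // true
double value = update.Value;                    // 10.5, the most recent value
bool gotSecond = queue.TryDequeue(out update);  // false, the queue is empty
```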

DataObject

The DataObject class is pretty boring and not really part of the pattern’s core logic. It is provided here only as an example of how the SubscriptionQueue might be used.

class DataObject
{
    private Dictionary<SubscriptionQueue, Subscription> _subscriptions =
        new Dictionary<SubscriptionQueue, Subscription>();

    private double _value;
    public double Value
    {
        get
        {
            return _value;
        }
        set
        {
            _value = value;
            //
            // Post the same update object (because its data is read-only) with the new value to
            // each subscriber (TODO: use object pool for update objects)
            //
            PostUpdateToSubscribers(new Update(value));
        }
    }

    public void VoidValue()
    {
        PostUpdateToSubscribers(null);
    }

    private void PostUpdateToSubscribers(Update update)
    {
        foreach (KeyValuePair<SubscriptionQueue, Subscription> kvp in _subscriptions)
        {
            //
            // Enqueue the subscription (kvp.Value) with the update in the queue (kvp.Key)
            //
            kvp.Key.Enqueue(kvp.Value, update);
        }
    }

    /// <remarks>
    /// Call on producer thread only
    /// </remarks>
    public void Subscribe(SubscriptionQueue subscriber)
    {
        Subscription subscription;
        if (_subscriptions.TryGetValue(subscriber, out subscription))
        {
            throw new ArgumentException("Cannot add same subscriber twice", "subscriber");
        }
        subscription = new Subscription();
        _subscriptions.Add(subscriber, subscription);
    }

    /// <remarks>
    /// Call on producer thread only
    /// </remarks>
    public void Unsubscribe(SubscriptionQueue subscriber)
    {
        _subscriptions.Remove(subscriber);
    }
}
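Putting it together, a short hypothetical sequence shows how VoidValue() interacts with the queue, again assuming the classes shown above:

```csharp
var consumer = new SubscriptionQueue();
var stock = new DataObject();
stock.Subscribe(consumer);

// The producer publishes a quote, then voids it before the consumer runs.
stock.Value = 42.0;
stock.VoidValue();

// The voided subscription is skipped in TryDequeue() and the queue is
// reported empty, so the consumer never sees the stale value.
Update update;
bool got = consumer.TryDequeue(out update); // false
```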

Conclusion

This article introduced the Most Recent Update Pattern, a pattern for managing the processing of large numbers of updates in a multi-threaded application where guaranteed message delivery is not a requirement, but working only with the most up-to-date data is. The sample implementation has shown that this pattern can be implemented completely lock-free, making it well suited for high-throughput applications working with real-time data.

You can find sample code for the Most Recent Update Pattern on GitHub.

Why I Love Unit Testing

A lot has been written about the benefits of unit testing. As someone who was originally skeptical about investing a lot of time into writing tests instead of production code, I just want to bring up a few things that made me change my mind.

A good design is also an easily testable design

Being a business major slash self-taught coder meant that I originally approached software development mostly from a coding perspective, i.e. writing some code to solve a problem. The basic process for me was: write code, compile, and run to make sure it works. Obviously, this doesn’t work so well for large, complex programs. But once I was at the point where I had a large program, I had already written so much code with all these dependencies that it was almost impossible to separate out the different modules for proper unit testing. So except for a few shared components with no external dependencies, I gave up on the idea of unit testing and never used it much. As a result, I feel really uncomfortable when I have to go back to that project now to make changes or fix bugs, because I can never be sure I’m not breaking something.

So for my latest project (which is in C#) I decided I was going to look into object-oriented design patterns to help me come up with a design that was modular and would make it easy to test individual components early on. The “early on” part was important, because this project had a bunch of rather abstract parts that I needed to develop first, but that on their own wouldn’t really be a program I could run and test manually. Also, a lot of the other components that these parts would later work with didn’t exist yet. Because waiting to test until the other components were written wasn’t an option, I looked to dependency injection for the answer. Not only did this help me keep the dependencies between my classes in check, it also allowed me to easily test individual components in isolation from the rest by using mock objects. Here is a pretty neat intro that convinced me to write more testable code with dependency injection.
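To make this concrete, here is a minimal hypothetical sketch of what dependency injection buys you in a test. IQuoteSource, Analyzer and FakeQuoteSource are made-up names for illustration, not from any real project:

```csharp
using System;

// Usage in a test: the component runs without any live market data feed.
var analyzer = new Analyzer(new FakeQuoteSource());
Console.WriteLine(analyzer.IsAbove("ACME", 100.0));

// The component depends on its data source only through an interface,
// which is injected via the constructor.
public interface IQuoteSource
{
    double GetLatestQuote(string symbol);
}

public class Analyzer
{
    private readonly IQuoteSource _source;

    public Analyzer(IQuoteSource source)
    {
        _source = source;
    }

    public bool IsAbove(string symbol, double threshold)
    {
        return _source.GetLatestQuote(symbol) > threshold;
    }
}

// A hand-rolled stub standing in for the real feed during unit tests.
public class FakeQuoteSource : IQuoteSource
{
    public double GetLatestQuote(string symbol)
    {
        return 101.0;
    }
}
```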

Proper tooling support

Maybe the biggest hindrance to doing regular unit testing in Delphi was that writing and running tests required me to switch to a different project, recompile, and launch the test application before I could even run the first unit test. I think in order to get the most out of unit testing, it has to interrupt the normal coding process as little as possible, allowing me to make a change to the code, run the unit tests, and two seconds later see that I didn’t break anything.

This means the testing infrastructure has to be integrated right into the IDE and allow running a test with the click of a single button. Thankfully, the unit testing functionality in Visual Studio, courtesy of the ReSharper plugin (the absolutely essential plugin that every .NET developer must have, period), is almost all of that. It is integrated right into the IDE and takes just one click to run one test or even all unit tests at once. Unfortunately, the start-up times of these tests can be quite long for some reason, even when the test itself is just a few lines of code. I don’t know why that is.

Feeling safe

Maybe the biggest benefit, in addition to being able to test isolated components early on, is the good feeling I get after having refactored something or made some other changes and all tests still run through without errors. This is true when I work on my own code, but it’s even more true for those projects that I have inherited from someone else. These are the kinds of projects where the guy (or girl, though it’s usually a guy) who originally wrote the code is no longer there, and no one really knows how the code works, let alone has the time to dig into it to get a good understanding. Most of the time, the code is also a mess, because let’s face it: we feel that all code that wasn’t written by ourselves and/or is older than a year or so is garbage. It’s just a fact of life that programmers hate old code and love new code, because it’s easier to write code than to read it.

So here’s what I did one time when I had to rewrite a portion of someone else’s program that was extremely monolithic, meaning most of the functionality was contained in only a couple of huge classes that were pretty much impossible to test but extremely easy to break by making modifications in the wrong place. First, I wrote a unit test for what the module I was working on should be doing. This is one of the few occasions where I wrote a test before I wrote any other code. I then extracted the original code from the rest of the program and ran the unit tests on it. Most passed, though some didn’t (hence why I had to rewrite the code in the first place). After correcting/rewriting the code, I ran the tests again to make sure my changes worked as expected. They did. Eventually.

I guess what I should have done if I wanted to be really thorough is write unit tests for the rest of the program, too, or at least for the modules immediately adjacent to the one I changed. But as is often the case, time was of the essence, because let’s face it: writing a lot of tests does require more time, initially. And this can be hard to justify to impatient customers when the actual fix is only a couple of lines of code. Still, writing tests is really an investment that pays off later with every new revision, when you can be confident that your changes don’t break any of the code written earlier.

Conclusion

When a colleague was trying to sell me on unit testing, he said “the green bar is addictive”, referring to the progress bar in most unit testing frameworks (see this screenshot of csUnit, for instance) that turns green when all tests pass. At the time I thought he was exaggerating, but it’s true. Having all the tests in a project complete without error can be a huge motivator, especially in the early stages of developing a headless server application, where there isn’t really anything visible to show for your development efforts. Be warned, though, that end users are not impressed when you show them a bunch of successful unit tests instead of the application they have been eagerly waiting for. The addictive quality of the green bar must be a geek thing, I guess.

Another Great Read by Dan Ariely: The Upside of Irrationality

About a year ago I read Predictably Irrational and loved it. So when I heard about Dan Ariely’s latest book The Upside of Irrationality, I bought and read it immediately. And just like the first one, it was excellent.

While the topic of both books is irrationality, the second one takes a slightly different approach. Where Predictably Irrational pointed out the various irrationalities in human behavior, The Upside of Irrationality shows how irrationality influences our decisions and thus has a significant, long-term impact on our personal and professional lives.

For example, it explains how bankers have ended up earning million-dollar bonuses, and why such compensation still does very little to increase their motivation and performance. It also explains why we care more about a child that has fallen into a well and gotten stuck somewhere in our own country than about the much more severe suffering of millions of children every day in Africa and elsewhere. This book is also more personal than the first one, as Ariely uses several stories from his own life to illustrate these points and show how even he has been influenced by irrationality in several important life choices.

In addition to pointing out these various biases in our decision making, Ariely gives some practical advice on how to deal with them. My favorite is his suggestion to go canoeing on a date to find out how compatible you are with a potential romantic partner. Good stuff.