Reading up on Apache Kafka

Even though it is a lot of fun to write your own high-performance middleware and fine-tune its multicast performance, it is important to look outside your team and your organization every now and then at what’s available on the market. Even if it isn’t a perfect fit, it might give you some ideas for your own implementation (as PGM did for me).

Hence I attended Data-driven Day at codecentric, a day of talks about various aspects of Apache Kafka. The keynote speaker was Tim Berglund from Confluent, a company building a platform on top of open-source Kafka. You can find many of the basic concepts from his talk in the Kafka Wikipedia article, but it was helpful to have Tim explain them in a way that made sense even to someone like me who had never heard of Kafka before.

Videos from that day’s talks are available on YouTube: [1], [2], [3].

Afterwards, I got a chance to talk to Tim about some of our specific requirements in terms of performance, throughput and protocol design, and he pointed me to some articles on the Confluent blog. I also found the following resources very helpful:


Pictures from Slovenia Part 3: Mostnica Gorge

After climbing the mountains around Lake Bled on day 2 of our Slovenia trip, we took the bus to Lake Bohinj in Triglav National Park on day 3.

IMG_20180607_182030_863

From there we hiked a few miles up to Stara Fužina and on through Mostnica Gorge to the waterfall at its northern end.

Mostnica Gorge (2)

Mostnica Gorge (1)

Mostnica Gorge (3)

IMG_20180607_171722_669

Mostnica Gorge itself is at times extremely narrow and you literally have to climb between roots and tree trunks to make your way through. Towards the end, though, you walk through a valley and are presented with beautiful vistas of the Julian Alps.

Alpine hut in Triglav National Park (2)

Alpine hut in Triglav National Park (3)

Pictures from Slovenia Part 2: Lake Bled

It’s been almost two years since M and I went to Berchtesgaden, so it was time again for our biennial hiking trip. After landing in Ljubljana, we took the bus to Lake Bled and spent a couple of days there.

I highly recommend the hike up Osojnica Hill. It is pretty steep at times, but the views of the lake and its surroundings from up there are simply breathtaking.

Bled Island

Lake Bled panorama from above

I couldn’t get enough of the postcard-worthy views of Bled Island and the vegetation around Lake Bled.

IMG_20180606_153157_333

IMG_20180606_103551~2

Tree by the lake

Flora (1)

From our hotel room balcony we had a direct view of Bled Castle. And despite some rain on the first night and discouraging weather forecasts for the week, the new day welcomed us with this gorgeous view.

Castle hill in the morning

Castle hill at dusk (1)

Castle hill at dusk (2)

Castle hill at night

Pictures from Slovenia Part 1: Ljubljana

In my quest to visit the capitals of Europe, I was able to cross another one off the list this month: Ljubljana, capital of Slovenia.

The city is rather small, but it features an interesting combination of architectural styles: the small, sometimes dilapidated houses in the Old Town have a Mediterranean flair, almost like Rome.

Ljubljana by the river (1)

Ljubljana by the river (2)

Ljubljana by the river (3)

On the other hand, there are places like Republic Square and several residential high-rises whose architecture seems influenced by Soviet ideals.

Republic Square

Ljubljana prefab high-rise

And of course there is classical architecture from when Slovenia was under Habsburg rule that would fit just as well in Vienna.

Ljubljana Cathedral (3)

_DSC2923-01

Pictures from Ghent

My love of old cities had already taken me down to Rothenburg ob der Tauber and Heidelberg, so I don’t know why I had never been to Ghent, which is a lot closer. While all of these places feature beautiful medieval architecture, Ghent feels the most vibrant. Unlike Rothenburg, which forbids any changes that alter the city’s appearance, Ghent feels more alive because it incorporates modern elements in a beautiful symbiosis. And it is built around several rivers and canals, which, as you can probably tell from the number of photos, have their very own appeal.

Day 1

The weather on the first day wasn’t great, so all of that day’s pictures are a bit gloomy.

Old Town panorama

IMG_20180430_110315_244

IMG_20180430_110121_434

Day 2

I got a very early start on the second day and was rewarded with these beautiful sights during the blue hour.

IMG_20180501_063629_496

Canal

IMG_20180501_063907_375

Moon over the canal

Old Post Office

On my way home, a final look back at the church towers that define Ghent’s skyline and are visible from almost anywhere in the city.

Look back

The Case of Multicast Message Loss (again)

I have written about troubleshooting multicast issues several times before, but multicast is the gift that keeps on giving.

The Problem

The application in question was missing a substantial number of messages. A trace on the connected switch showed that all packets had been put on the wire. A trace with Microsoft Message Analyzer on the machine itself showed the same messages missing, so the packets were being lost somewhere between the wire and the application, and our application was probably not at fault. Additionally, the application worked just fine on other machines.

The Analysis

So I went back to the drawing board, reviewed and double-checked everything I had learned about high-throughput multicast messaging and

  • set appropriately large socket receive buffer sizes in the multicast message receiving application,
  • activated all TCP/UDP Rx/Tx offloads in the NIC configuration,
  • activated receive side scaling (RSS) and picked the maximum number of RSS queues,
  • set the NICs’ receive buffers to their maximum values,
  • disabled flow-control,
  • turned off all power-saving features in NIC and operating system,
  • used the most aggressive interrupt moderation setting, and
  • updated the NIC driver to the latest version.
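The first item on that list happens in the application itself. What follows is a minimal sketch, assuming a Python receiver. The group address, port and buffer size are hypothetical placeholders, not values from our setup. It shows how to request a large socket receive buffer and join an IPv4 multicast group. The operating system may silently cap the requested size, so the sketch reads the effective value back.

```python
import socket
import struct

# Hypothetical placeholder values, not taken from the setup described above.
GROUP = "239.1.2.3"
PORT = 30001
RCVBUF_BYTES = 16 * 1024 * 1024  # 16 MiB requested; the OS may grant less


def make_receiver_socket(port: int, rcvbuf_bytes: int) -> tuple[socket.socket, int]:
    """Create a UDP socket with a large receive buffer, bound to the given port.

    Returns the socket and the receive buffer size reported by the OS.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several receivers on the same host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Request the large buffer before any traffic arrives.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("", port))
    # Read the size back: Linux caps the request at net.core.rmem_max and
    # reports twice the granted value; other platforms may differ.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective


def join_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Join an IPv4 multicast group on the interface with address `iface`
    (0.0.0.0 lets the OS pick the interface)."""
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
```

To use it, call `make_receiver_socket(PORT, RCVBUF_BYTES)`, log the effective size, then call `join_group(sock, GROUP)` and loop on `sock.recvfrom(65535)`. If the effective size is far below what you asked for, the OS-level limit (for example `net.core.rmem_max` on Linux) needs raising before the application setting can take effect.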

In order to check NIC settings, I keep the following PowerShell snippet handy. It gives me the current, all valid and maximum values for each parameter of each NIC in the NIC team. And it doesn’t even require admin privileges.

(Get-NetLbfoTeam "MyNicTeam").Members | Get-NetAdapterAdvancedProperty | ft DisplayName,DisplayValue,ValidDisplayValues,NumericParameterMaxValue

Other useful sources:

But even after making sure all parameters were at their optimal values, the problem persisted. So I spent some time setting up perfmon with these network-related performance counters.

One counter immediately stood out: Packets Received Discarded stayed pretty much constant on the machines where our application worked. But on the machines where we saw packet loss, this number was growing fast.

This TechNet blog post has a good explanation of that performance counter and tips on how to gather it from multiple machines remotely using PowerShell.
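Packets Received Discarded is a cumulative counter, so the telling signal is its rate of change between samples, not its absolute value. The following Python sketch assumes you have already exported (timestamp, value) samples, for example from perfmon. It converts them into discards per second and flags a machine when any interval exceeds a threshold. The function names, sample values and the threshold of 1 discard per second are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (seconds since start, cumulative counter value)


def discard_rates(samples: List[Sample]) -> List[float]:
    """Per-second rate of the counter between each pair of consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be in strictly increasing time order")
        rates.append((v1 - v0) / (t1 - t0))
    return rates


def is_growing(samples: List[Sample], threshold_per_s: float = 1.0) -> bool:
    """True if any interval shows more discards per second than the threshold."""
    return any(rate > threshold_per_s for rate in discard_rates(samples))


# Made-up samples taken every 10 seconds.
healthy = [(0.0, 1200), (10.0, 1200), (20.0, 1201), (30.0, 1201)]
lossy = [(0.0, 1200), (10.0, 5400), (20.0, 9900), (30.0, 14100)]
```

With these made-up numbers, `discard_rates(healthy)` gives `[0.0, 0.1, 0.0]`, while `discard_rates(lossy)` gives `[420.0, 450.0, 420.0]`, which `is_growing` flags.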

The Cause

It turned out that the machines experiencing multicast message loss had substantially smaller NIC receive buffers (512) than the machines that were working fine (2048 and 4096). Even though our setup script had correctly set this parameter to the maximum value these NICs supported, that was apparently still insufficient.
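A rough back-of-the-envelope calculation shows why the ring size matters. It assumes each receive buffer entry holds one incoming packet, which is how descriptor-based NIC drivers typically count it; check your driver's documentation. Under that assumption, a ring of N entries absorbs N / R seconds of traffic at a packet rate R while the host is too busy to drain it. The rate of 100,000 packets per second below is hypothetical, not a measurement from our system.

```python
def ring_headroom_ms(ring_entries: int, packets_per_second: float) -> float:
    """Milliseconds of traffic a NIC receive ring can absorb before it overflows,
    assuming one packet per ring entry and that the host drains nothing meanwhile."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return ring_entries / packets_per_second * 1000.0


RATE = 100_000  # hypothetical packets per second
for entries in (512, 2048, 4096):
    print(f"{entries:>5} entries: {ring_headroom_ms(entries, RATE):.2f} ms")
```

Under these assumptions, a 512-entry ring rides out only about 5 ms of stall. The 2048- and 4096-entry rings ride out roughly 20 ms and 41 ms, so a brief scheduling hiccup is far more likely to overflow the smaller ring.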

So we ended up upgrading the NICs on the affected cluster, and the multicast message loss went away.

Upon closer examination, we also noticed TCP packet loss while our multicast application was running. But because retransmissions mostly succeeded and only introduced small delays, this had gone unnoticed before.