Business Intelligence with PowerQuery: First steps getting JSON into Microsoft Excel

There are many business intelligence solutions large and small for knowledge workers to choose from. But due to its ubiquity, I assume many (myself included) just use Microsoft Excel to interactively analyze data.

A lot of data on the web is available via RESTful APIs returning JSON, e.g. the REST countries service I’ll use in this example.

In the 2016 release Microsoft has vastly expanded the data analysis capabilities of Excel compared to previous versions. And with the free PowerQuery plugin, these capabilities are available to Excel 2013 users as well.

Getting JSON into Excel

One of these capabilities is to retrieve JSON data from the web and turn it into an Excel table. Just go to Data > From Web and enter the URL, e.g.

http://restcountries.eu/rest/v2/all

image

Excel will figure out whether there is a regular web page at the given URL (and offer to extract HTML tables from it) or JSON. In the latter case, a list of records is displayed in the Query Editor.

There is a Convert To Table button conveniently placed at the top left. But with every JSON document I’ve come across, this has required several additional steps to create a proper table.

image

Making a table

Instead I recommend going to View > Advanced Editor and add a manual conversion step by changing the query to this.

let
    Source = Json.Document(Web.Contents("http://restcountries.eu/rest/v2/all")),
    Table = Table.FromRecords(Source)
in
    Table

The same can be achieved by adding a manual step in the Query Settings sidebar and adding the bold text as the function.

Voilà, you now have that JSON as a table with one record per country.

You may want to add some more steps such as right-click the topLevelDomain column and Extract Values… to get the list of domains comma-separated in a single cell.

image

Great usability for non-programmers

This is just one of the many conversion and transformations available via easy to use (but numerous) commands in the context-menu and ribbon.

The great thing about Excel’s handling of all of these conversion steps is that they are applied non-destructively. In the Query Settings sidebar, you can see each one and click it to see the intermediate results it produces.

Not so great: M language

There is one, thing, however, that is a bit disappointing: the programming language behind all of this is the M language inspired by F#. This is unfortunate as it means any previous skills you’ve had working with data in Excel macros is useless, as even basic things such as if statements are syntactically so different I needed to look it up.

Data Analytics and Minimalist Art

At the beginning of July, I created a new Instagram account: _art_minis. And for 31 days I posted a new picture each afternoon. All of these pictures were created using the same basic shapes following the rules set out here.

It was interesting to see, how despite their similarities some pictures scored substantially more likes than others. One thing that was abundantly clear from the start was that using many popular hashtags significantly increased my chances of getting likes. Every Instagram guide will tell you that.

What I am interested in is whether there is a way to forecast how many likes a given picture might get. I’m planning on writing a little application to download hashtags and likes for each image and analyze the dataset for correlations. This would be an interesting opportunity to try out Azure Machine Learning and Cognitive Services.

Reading up on Blockchain and Distributed Ledger Technology

Last year, I was reading up on Bitcoin, blockchain and beyond. Since then, there have been several interesting developments in distributed ledger technology (DLT).

If you’re new to the technology, I highly recommend this introductory, plain English guide to blockchain.

I also came cross this great article defining criteria to avoid pointless blockchain projects and its follow-up on four genuine blockchain use cases.

R3 Corda

For one, R3, which I thought then and still think today shows a lot of promise, has released the code for Corda, its distributed ledger project. They also published a non-technical whitepaper as an introduction and two webinar videos: Introduction to Corda and Corda Developers’ Tutorial. There is alos this excellent non-technical 18 second definition of DLT by Richard Gendal Brown, CTO of R3.

R3 also offered its code to the Hyperledger project.

Hyperledger

Hyperledger isn’t a distributed ledger, per se, but contains multiple DLT projects, e.g. Fabric, which is backed by IBM. While you can run Hyperledger Fabric on your own machines, IBM also gives developers an opportunity to play with the technology in their cloud Bluemix.

Unlike Corda, which was built from the ground up for the financial services industry, finance is only one of the industries Hyperledger is targeting. There are, however, a number of projects underway in the financial services that use Hyperledger, as their proof of concept tracker shows.

One of those projects was undertaken by Germany’s central bank Deutsche Bundesbank and the country’s largest exchange operator Deutsche Börse. A November 2016 speech by Carl-Ludwig Thiele, member of the executive board of Deutsche Bundesbank contained mostly questions about the new technology. His speech from January 2017 already presented a prototype to handle simple settlement, payment and corporate actions.

Conclusion

There are a number of interesting projects underway to apply distributed ledger technology to finance.

Still, a lot of questions to be addressed regarding distributed ledger technology, as this position paper by SWIFT and Accenture from last year points out.

The Germany IT industry association Bitkom looks at some of these, e.g. legal ramifications of distributed ledger technology in banking (in German).

It is interesting to see though that regulators and central banks are already actively involved even though distributed ledger technology is still in its infancy in the financial services industry.

The Case of Outlook not downloading RSS content because of a problem connecting to the server

The Problem

I had been using Outlook to subscribe to RSS feeds for many years. Mostly because I like to have my email and news in one place with the ability to add them to my Outlook to-do list for follow-up.

Over time, however, I noticed that one feed after the other stopped updating. Even feeds from Microsoft sites such as TechNet and MSDN which I had assumed simply had to be compatible with Microsoft’s own RSS reader. Trying to remove and then re-add them gave the following error:

image

I eventually found out that all feeds that no longer updated had been switched from HTTP to HTTPS, which seems to be unsupported by Outlook.

The Cause

This support document describes the scenarios in which Outlook supports authenticated RSS feeds. Basically, support is extremely limited. Normally, HTTPS is not supported and the workaround using Internet Explorer’s Common Feed List did not work for me.

Initially, Internet Explorer would download the RSS feed and they would show up as unread items in Outlook, but after the initial download I wouldn’t get any new items in Outlook, even though they did show up in Internet Explorer.

The Solution

Searching online for a solution did not turn up anything. Either people described workarounds that did not work for me, or recommended simply abandoning Outlook in favor of some other feed reader.

So I implemented my own solution and wrote a very simple proxy server that runs a local web server via HTTP that fetches the requested feed via HTTPS from the original source. By configuring Outlook to use this proxy, I now receive updates for feeds in Outlook even when the source uses HTTPS.

This RSS-via-HTTP Proxy runs as a service in the background. Its source code and some instructions on how to configure it are available on GitHub.

The Case of Multicast Message Loss on Windows Server 2012 R2

I have worked quite a bit with applications using UDP/multicast messaging recently. And I’ve run into a few issues along the way, such as multicast messages not being received on a Windows Fail-over Cluster.

So by now I have a solid checklist of things to configure on our servers and ensure in our applications that consume multicast messages to make sure everything runs smoothly and message loss is kept at acceptable levels. Yet, on our latest Windows Server 2012 R2 machines I had applications experiencing serious datagram loss as the amount of network traffic (in general, not just multicast) on the machine increased increased.

I researched the problem online and got the tips you’d expect: get latest NIC drivers, increase NIC receive buffer sizes, turn on offloads, turn on receive-side scaling, fine-tune receive-side scaling, increase socket buffer sizes etc. Of course, I had already tried all those things, and none of them had worked.

Solution: Exempting multicast traffic from Base Filtering Engine

Eventually I found this support document: Datagram loss when you run a multicast receiver application in Windows 8 and in Windows Server 2012. The problem description matched perfectly with what I was seeing on our server. Unfortunately, the document describes an issue in Windows Server 2012 and the hotfix available there cannot be installed on Windows Server 2012 R2. Fortunately, it doesn’t have to be. You can just set the registry key and the Base Filtering Engine supports it out of the box.

New-ItemProperty HKLM:\System\CurrentControlSet\services\Tcpip\Parameters\ -Name UdpExemptPortRange -Value "XXXX-YYYY" -PropertyType MultiString -Force

I haven’t found any official documentation on this, and prior to this post, there were just four results when searching Google for UdpExemptPortRange. But it works as far as I can tell.

The Case of the Not Enough Storage Error in a Delphi Application

The Problem

After a few weeks Delphi applications running on Windows Server (2008 or later) will fail to start or create new windows with error 8: Not enough storage is available to process this command.

Possible Cause 1: Desktop Heap Exhaustion

Session 0 which service applications are running in gets allocated substantially less desktop heap than interactive sessions, so you might be running out of it. There is a system event log entry with event ID 243 or 244 when the desktop heap gets exhausted.

By increasing the values in the SharedSection part of the data in registry value “Windows” in key “HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems”, you can allocate more to the service session, as described in this Microsoft support article or this Stack Overflow answer.

Note that Microsoft’s Desktop Heap Monitor tool no longer works on current versions of Windows Server.

Possible Cause 2: Atom Leak

Applications compiled with Delphi XE2 or earlier call RegisterWindowMessage in the initialization section of Controls.pas to allocate a unique window message ID. This takes up one slot in the session’s atom table. Unfortunately, there are only around 16 000 slots in the atom table. With no way to unregister a window message, you will eventually run out of them. It has been said that on earlier versions of Windows, the counter wrapped around so you never ran out as slots got overwritten/reused.

For more details, see this bug report on Embarcadero’s Quality Central and this Stack Overflow answer.

I fixed this by editing the Controls.pas unit replacing this line

RM_GetObjectInstance := RegisterWindowMessage(PChar(ControlAtomString));

with this one

RM_GetObjectInstance := RegisterWindowMessage(PChar('DelphiRM_GetObjectInstance'));

This way all Delphi applications share the same atom instead of creating their own. There is also a more involved fix by Andreas Hausladen, but this solution sufficed for me.

Analyzing the Atom Leak

Jordi Corbilla has written the super useful Atom Table Monitor which dumps entries from the atom table. I made a C# version of it, the Atom Table Dumper, because (a) I didn’t want to diagnose issues with my Delphi applications using another Delphi application that had the same issue and (b) I needed an easier to read and process output format.

Note that your analysis tool has to run in the same session as the application getting the error.

Further note that not all of the atoms listed under RegisteredWindowMessage were necessarily allocated by RegisterWindowsMessage. RegisterClass is another function that allocates atoms in this table. However, as the documentation points out “All window classes that an application registers are unregistered when it terminates.“ so having many of them (temporarily) isn’t necessarily a problem.

Watch out mixing String and PChar in Delphi

Rudy Velthuis has written an excellent article about PChar in Delphi. Whenever someone comes up to me with a PChar/string question, I refer them to that article.

There are, however, a few caveats one needs to be aware of when mixing string and PChar that he does not mention. This is what I want to cover in this brief article.

The problems mixing string and PChar described here are caused by null characters (or #0 in Delphi syntax). Since Delphi strings have a length counter, It is perfectly legal for them to contain #0. Since PChar are null-terminated, by definition they cannot contain #0.

However, a few Delphi routines that seem to operate on strings internally actually operate on PChar. They will therefore not work as expected when the string they are given contains #0.

You cannot replace #0 in a string

If you call StringReplace to replace #0, it simply won’t work.

function RemoveNull(AInput: string): string;
begin
  Result := StringReplace(AInput, #0, '', [rfReplaceAll]); // Won’t work
end;

It seems that all replacing routines internally cast to PChar making them useless for this. Instead you will have to do a character by character comparison as in this routine by David Heffernan.

StrPCopy stops at #0

The StrPCopy routine takes a string as an input and copies it into a PChar. I used to use this routine in a TStringBuilder-like class which used a PChar to refer to an internal buffer containing the string being built. It also had a length counter, so one would not have to scan the PChar to figure out its length.

But even though StrPCopy takes a string as its input and could thus have access to its length, it does not copy all character, but stops at the first #0.

If you really want to copy all characters, you’ll have to use a routine such as Move that copies bytes not characters as in this example (again by David Heffernan).

Beware of implicit conversions

Since the compiler supports implicitly converting between PChar and string, you will often get away with passing a PChar where a routine takes a string parameter. The compiler will just generate a temporary string from the PChar and pass that instead.

Hence, this code will compile and work just fine:

function GetSubstring(AInput: PChar; AStart, ASubstringLength: Integer): string;
begin
  Result := Copy(AInput, AStart, ASubstringLength); // Works, but is slow
end;

Unfortunately, it has horrible performance, because the compiler needs to know the length of the PChar in order to built that temporary string. And the only way to figure out the length of a PChar is to check every character until the first #0.

See this question I asked on Stack Overflow how to fix this with a PChar-equivalent for the Copy routine.