
How to reverse engineer MSWord malware

David Lodge 08 May 2017

Following Myonlinesecurity’s malware email warning post (“I am disturbing you for a very important matter”) I too received one of these emails. While they went to great lengths to check all the possible sources and involved parties, I thought I’d take a different approach: deconstructing and analysing exactly what went into it.

Here’s a redacted screenshot:

Every single aspect of this email shouts “virus” at me, from the random email address it came from (an address), to the wording inside the document, to the attached file. Even the extension used (.dot) looks like an attempt to avoid malware filters.

The bit that got me was that it included my real home address (that big redacted bit in the middle). A bit of thinking told me that this wasn’t that strange: the email address it was sent to, and the specific format of the postal address, closely match the details eBay holds on me. So I suspect that an eBay trader I’ve dealt with in the past was compromised, which means my other details should still be safe.

A quick analysis of the email’s headers (with suitable redaction) shows me where it really comes from:

The spam and AV filter headers show that it passed through Sophos without raising an alert and wasn’t flagged as spam.

Using the Received headers, we can trace the route it took to reach me. These should be read from bottom to top, up to the point where it transfers to my email server:

The route is basically: -> -> My email server
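Reading the hops bottom-to-top by hand is easy to get wrong, so here is a minimal sketch using Python’s standard email module that returns the Received headers oldest-first. The function name is made up for illustration; it only reorders and tidies the headers, it doesn’t validate them.

```python
import email


def received_route(raw_message: bytes) -> list[str]:
    """Return the Received headers oldest-first (i.e. read bottom-to-top).

    Each relay prepends its own Received header, so the last one in the
    message is the hop nearest the original sender.
    """
    msg = email.message_from_bytes(raw_message)
    hops = msg.get_all("Received") or []
    # Collapse folded continuation lines so each hop reads as one line.
    return [" ".join(str(hop).split()) for hop in reversed(hops)]
```

Feeding it the raw bytes of a saved message (for example a hypothetical message.eml opened in binary mode) gives one string per hop, starting with the one closest to the sender.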

The source address is owned by the Italian ISP Infostrada:

This tells me nothing more, except that it increases the likelihood of it being a malicious email.

Right, there’s one thing left – to look at the file itself. Before I start, there’s a warning: DO NOT DO THIS YOURSELF UNLESS YOU KNOW WHAT YOU’RE DOING. SERIOUSLY.

Initial Analysis

The first step is to extract the attachment and see what AV thinks of it. Windows Defender lets it through, and VirusTotal gives us no hits:
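VirusTotal can also be searched by file hash, so you can check for an existing report without uploading anything. A minimal sketch using Python’s hashlib; the function name is hypothetical, and the file is only read as bytes, never opened in an Office application.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be pasted into VirusTotal’s search box.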

This is most likely because it has been encrypted with a password. If you refer back to the email, it states that a password (2266) is needed to open the document. We can check this quite easily.

Modern Word documents are stored in a format known as OOXML, a poorly documented ISO standard that was pushed through quickly to try to circumvent the existing ISO standard for office documents (known as ODF). This format zips up XML and other assets into a single file in a semi-consistent way.
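Because an unencrypted OOXML file is just a zip, Python’s standard zipfile module is enough to look at its parts without opening the document. A minimal sketch; the helper names are hypothetical, and it relies on the fact that macro-enabled Word files keep their VBA in a part called word/vbaProject.bin.

```python
import zipfile


def list_ooxml_parts(path: str) -> list[str]:
    """Return the names of the parts stored in an unencrypted OOXML file.

    Only the zip directory is read; nothing inside is parsed or run.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def has_vba_project(parts: list[str]) -> bool:
    """True if the package carries a VBA project (macro-enabled files do)."""
    return any(name.lower().endswith("vbaproject.bin") for name in parts)
```

On an encrypted document this raises zipfile.BadZipFile, which is itself a useful hint.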

If it is encrypted, then the document should contain no readable data, only metadata. This is what we see when viewing it in a zip file viewer:
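An encrypted OOXML document isn’t a plain zip at all: Office wraps the encrypted package in an OLE (Compound File Binary) container with streams such as EncryptionInfo and EncryptedPackage, which is why there is so little to see. The first eight bytes are enough to tell the two containers apart. A minimal sketch; the names are hypothetical.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # Compound File Binary (OLE2)
ZIP_MAGIC = b"PK\x03\x04"                       # zip local file header


def container_type(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(OLE_MAGIC):
        return "ole"   # e.g. an encrypted OOXML file, or a legacy .doc
    if header.startswith(ZIP_MAGIC):
        return "zip"   # a plain, unencrypted OOXML package
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return container_type(f.read(8))
```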

So, our first step is to decrypt the document. It would be fairly brave to open it in Word unless you have a sacrificial, sandboxed VM. Similarly, using another office suite that can read OOXML files (such as LibreOffice) is rather risky.

Fortunately, the encryption scheme is relatively well documented and we can find tools to do this for us:
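As an illustration (not necessarily the tool in the screenshot), the third-party msoffcrypto-tool package can do this from Python. A minimal sketch, assuming the package is installed with pip; the function name is hypothetical.

```python
def decrypt_office_file(src_path: str, dst_path: str, password: str) -> None:
    """Decrypt a password-protected Office document to a new file."""
    # Third-party dependency (pip install msoffcrypto-tool); imported inside
    # the function so this module still loads when it isn't installed.
    import msoffcrypto

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        office_file = msoffcrypto.OfficeFile(src)
        office_file.load_key(password=password)
        office_file.decrypt(dst)
```

With the password from the email, a call would look like `decrypt_office_file("attachment.dot", "decrypted.dot", "2266")`, where both file names are hypothetical. Only the bytes are transformed; no macros are executed.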

After this, the file looks more like a normal Word document:

Let’s see what VirusTotal thinks about the decrypted version:

Looks like I was right: it was malware. At this point I could stop, but it’s more interesting to continue and see whether we can extract the macros themselves.

Next Step: the Code

At this point, I’m moving from a Windows VM to a Linux VM, because:

The way macros are stored is a bit of a throwback. Remember how I said earlier that the standard is a bit poor? When it comes to macros, Microsoft barely changed the old style at all. Rather than being stored sensibly as source in the XML, they live in a separate vbaProject.bin part in the archaic OLE format, as compiled bytecode (p-code) alongside a compressed copy of the source. Think of this format as multiple files (known as streams) in one file; it’s all very 1990s.
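Alongside the compiled p-code, each module stream also holds a compressed copy of the VBA source, and that compressed copy is what extraction tools recover. The compression scheme is documented in Microsoft’s MS-OVBA specification and is small enough to sketch in Python. This is an illustrative implementation written from that description, not the code any particular tool uses; finding where the compressed source starts inside a module stream (an offset recorded in the project’s dir stream, which is itself compressed) is left out.

```python
import struct


def _copy_token_masks(difference: int) -> tuple[int, int, int]:
    """Copy-token bit layout depends on how much of the chunk is decoded."""
    # Smallest bit count with 2**bit_count >= difference, but at least 4.
    bit_count = max((difference - 1).bit_length(), 4)
    length_mask = 0xFFFF >> bit_count
    offset_mask = ~length_mask & 0xFFFF
    return length_mask, offset_mask, bit_count


def ovba_decompress(data: bytes) -> bytes:
    """Decompress an MS-OVBA CompressedContainer (e.g. VBA source text)."""
    if not data or data[0] != 0x01:
        raise ValueError("missing 0x01 signature byte")
    out = bytearray()
    pos = 1
    while pos < len(data):
        chunk_start = pos
        (header,) = struct.unpack_from("<H", data, pos)
        chunk_size = (header & 0x0FFF) + 3   # includes the 2-byte header
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        compressed = header >> 15            # top bit set: compressed chunk
        chunk_end = min(len(data), chunk_start + chunk_size)
        pos += 2
        if not compressed:
            out += data[pos:pos + 4096]      # raw chunk: 4096 literal bytes
            pos += 4096
            continue
        decompressed_chunk_start = len(out)
        while pos < chunk_end:
            flags = data[pos]                # one flag bit per following token
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(data[pos])    # flag bit 0: literal byte
                    pos += 1
                    continue
                (token,) = struct.unpack_from("<H", data, pos)
                pos += 2
                length_mask, offset_mask, bit_count = _copy_token_masks(
                    len(out) - decompressed_chunk_start)
                length = (token & length_mask) + 3
                offset = ((token & offset_mask) >> (16 - bit_count)) + 1
                src = len(out) - offset
                for i in range(length):      # byte by byte: copies may overlap
                    out.append(out[src + i])
    return bytes(out)
```

For example, `ovba_decompress(bytes.fromhex("0105b0086162630320"))` decodes three literals followed by one overlapping copy token, giving `b"abcabcabc"`.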

Fortunately, we have tools to extract this data and decompress the source. The first step is to dump the streams in the document:
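As a rough Python alternative to a dedicated stream-dumping tool, the third-party olefile package can list the streams inside the VBA project. A minimal sketch, assuming olefile is installed and that the decrypted file keeps its VBA at word/vbaProject.bin, the usual location in a macro-enabled Word file; the function name is hypothetical. Module code normally sits in streams under the VBA storage, next to dir and _VBA_PROJECT.

```python
import io
import zipfile


def list_vba_streams(ooxml_path: str) -> list[tuple[str, int]]:
    """List (stream path, size in bytes) for every stream in vbaProject.bin."""
    # Third-party dependency (pip install olefile), imported lazily.
    import olefile

    with zipfile.ZipFile(ooxml_path) as zf:
        vba_project = zf.read("word/vbaProject.bin")
    ole = olefile.OleFileIO(io.BytesIO(vba_project))
    try:
        streams = []
        for entry in ole.listdir(streams=True, storages=False):
            name = "/".join(entry)           # e.g. "VBA/dir"
            streams.append((name, ole.get_size(name)))
        return streams
    finally:
        ole.close()
```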

The M next to stream A3 shows that it contains macro code, so we can extract that stream separately:

Success! We have the source of the macro to analyse!

Unfortunately, it’s not quite that simple to interpret: it’s been obfuscated by renaming variables and also, if you look at the last line of the extract above, by using randomly named pointers to system calls.

To cut a long story short, I loaded the macro up and manually deobfuscated it by renaming variables, which gave me back something readable.
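The renaming step can also be done mechanically once you have worked out what each name does: a whole-word, case-insensitive find-and-replace (VBA identifiers aren’t case-sensitive). A minimal sketch; the function and the example names are invented, and this naive version will also rewrite matches inside string literals and comments.

```python
import re


def rename_identifiers(source: str, renames: dict[str, str]) -> str:
    """Rename whole-word identifiers in VBA source text, ignoring case."""
    if not renames:
        return source
    lookup = {old.lower(): new for old, new in renames.items()}
    # Longest names first, with \b on both sides so "abc" never matches
    # inside "abcd".
    alternation = "|".join(
        re.escape(old) for old in sorted(renames, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], source)
```

For example, mapping a random name like xKjq to url turns `Dim xKjq As String` into `Dim url As String`.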

So What Does it Do?

Here’s a quick snapshot of it getting its malware (URLs redacted for obvious reasons):

So here it updates the document to imply that it’s checking for SSL certificates, and then pulls down one of two GIF files. At first I thought this was just some sort of ping mechanism to detect whether the script had Internet access, but then I couldn’t see where it was getting the malware from, so I looked further.

At the end, we have a procedure, which I called “decode”:

Reading into it, it takes the data pulled down from the above URL (i.e. the GIF file) and passes it to decode. This copies bytes 10 to 265 into a variable called dbuf, then XORs bytes 266 onwards with dbuf.

(Note both of the GIFs passed through VirusTotal without being recognised.)

This makes sense once you know that the GIF specification tells renderers to ignore any extra bytes in the file. So, if we look at the header of the GIF, we can see the following:

A GIF file has a 13-byte header, which starts with the magic value GIF followed by a version of 87a or 89a; so far, so good. The next 4 bytes are the image dimensions, which work out as 0x00cf and 0x008b (207 by 139 pixels), both reasonable values. So the first 10 bytes are roughly valid.
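Those 13 bytes are the 6-byte signature and version followed by the 7-byte Logical Screen Descriptor: width and height as little-endian 16-bit values, then a packed flags byte, the background colour index and the pixel aspect ratio. A minimal parser sketch (the function name is hypothetical); bytes 10 to 12 are the three single-byte fields where this file stops making sense.

```python
import struct


def parse_gif_header(data: bytes) -> dict:
    """Parse the 13-byte GIF header (signature, version, screen descriptor)."""
    if len(data) < 13:
        raise ValueError("need at least 13 bytes")
    signature, version = data[0:3], data[3:6]
    # Little-endian: width, height (16-bit), packed, bg index, aspect (8-bit).
    width, height, packed, bg_index, aspect = struct.unpack_from("<HHBBB", data, 6)
    return {
        "valid_signature": signature == b"GIF" and version in (b"87a", b"89a"),
        "width": width,
        "height": height,
        "packed": packed,
        "background_index": bg_index,
        "aspect": aspect,
    }
```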

But from byte 10 onwards, the values become nonsensical. Those first 10 bytes are probably enough to fool an IDS into thinking it’s a GIF file.

As we already know that bytes 10 to 265 (i.e. 256 bytes) are a key, let’s try decoding the rest of the file. The quickest way is to use a small Python script:
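The original script isn’t reproduced here, but a minimal sketch of the same idea follows the decode routine described above. It assumes zero-based offsets (so the key is bytes 10 to 265 inclusive) and that the key index restarts at 0 for byte 266 and repeats every 256 bytes; the names are hypothetical.

```python
KEY_START = 10   # first key byte (zero-based offset)
KEY_END = 266    # one past the last key byte, so the key is bytes 10..265


def decode_payload(blob: bytes) -> bytes:
    """XOR the data after the key with the key, repeating the key as needed."""
    key = blob[KEY_START:KEY_END]
    if len(key) != KEY_END - KEY_START:
        raise ValueError("input too short to contain the 256-byte key")
    body = blob[KEY_END:]
    # Assumption: byte 266 is XORed with key[0], and the key wraps every 256 bytes.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(body))
```

To use it, read the downloaded file in binary mode, pass the bytes to decode_payload and write the result back out in binary mode.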

After running this, we have a valid decoded PE (i.e. Windows executable) file:
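A quick way to confirm that decoded output really is a PE file: it should start with the DOS “MZ” signature, and the 32-bit little-endian value at offset 0x3C (e_lfanew) should point at a “PE\0\0” signature. A minimal sketch with a hypothetical function name; it checks only these two signatures, not the rest of the headers.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check for the 'MZ' DOS stub and the 'PE\\0\\0' signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```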

At this point I stopped: a quick look into the executable showed me that it was most likely written in C++, which means I’d have to disassemble it to go further, and that wouldn’t be time well spent. One last thing is to send the exe up to VirusTotal:

Oh, that’s definitely dodgy!


Be wary of dodgy emails, especially if they want you to open a file. The use of macros in Office documents is one of the easiest vectors for malware.

You can’t depend on anti-virus: even once the payload is decoded, not all anti-virus software will stop it. The decoded executable was only recognised by a third of the anti-virus engines on VirusTotal, and there were some big names that missed it (McAfee, Symantec and TrendMicro, I’m looking at you here).

As a final note, of the two URLs referenced in the code, one is still actively serving the malware a week after the email was sent out.