By Tomer Bitton, security researcher, Imperva
PDFs are a widely used business file format, which makes them a common target for malware attacks. Because PDFs have so many “features,” hackers have learned how to hide attacks deep under the surface. Using a number of utilities, we can reverse engineer the techniques used in malicious PDFs, gaining insight that we can ultimately use to better protect our systems. We’ll take you through the way a hacker has hidden a piece of malware inside a sample PDF.
Opening the PDF file in a text editor shows that it contains some encoded objects. The first circle, object 11, is a command to execute the JavaScript in object 12. The second and third circles are a command for object 12 to filter the JavaScript with ASCIIHexDecode, which means the script is stored as hexadecimal digits rather than as readable text. The main reason for this filter is to hide malicious code inside the PDF and avoid anti-virus detection. This is our first red flag.
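To make the filter concrete, here is a minimal sketch of what ASCIIHexDecode does to a stream body: pairs of hexadecimal digits become bytes, whitespace is skipped, and `>` marks the end of the data. The function name `ascii_hex_decode` and the sample stream are hypothetical illustrations, not content taken from the analyzed PDF.

```python
# Whitespace characters as defined for PDF files, plus the valid hex digits.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
HEX_DIGITS = b"0123456789abcdefABCDEF"


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode the body of a PDF stream that uses the ASCIIHexDecode filter."""
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):  # '>' is the end-of-data marker
            break
        if byte in HEX_DIGITS:
            digits.append(byte)
        elif byte not in PDF_WHITESPACE:
            raise ValueError(f"unexpected byte 0x{byte:02x} in hex stream")
    if len(digits) % 2:  # an odd number of digits implies a trailing 0
        digits.append(ord("0"))
    return bytes.fromhex(digits.decode("ascii"))


# Hypothetical stream body: harmless JavaScript hidden as hex digits.
encoded = b"61 70 70 2E 61 6C 65 72 74 28 31 29 3B>"
print(ascii_hex_decode(encoded).decode("latin-1"))
```

Running the same kind of decoder over the real stream is what turns the wall of hex digits into text that can be inspected further.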
This second image shows how the stream is decoded, but additional analysis is required to make sense of it. Again, we will open this code with a text editor to understand its purpose.
Opening this code as text, we can see some JavaScript, which is another red flag. We will now work to determine its intent.
Using a utility called Malzilla, we can analyze the JavaScript. We paste the JavaScript into the top box and decode it with the circled button. A closer look at the second circle indicates that this JavaScript contains shellcode, yet another red flag.
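The article does not show how the shellcode is encoded inside the script. One common pattern, assumed here purely for illustration, is a JavaScript string of `%uXXXX` escapes passed to `unescape()`, where each escape is a 16-bit value stored in memory in little-endian order. The sketch below recovers the raw bytes from such a string; the function name `unescape_to_bytes` and the sample input are hypothetical.

```python
import re

# Matches JavaScript-style %uXXXX escapes (assumed encoding, see lead-in).
_WIDE_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_to_bytes(js_string: str) -> bytes:
    """Convert %uXXXX escapes into the bytes they occupy in memory.

    Each escape is one 16-bit code unit, laid out little-endian, so
    %u4142 becomes the bytes 0x42 0x41. Anything else is ignored.
    """
    out = bytearray()
    for match in _WIDE_ESCAPE.finditer(js_string):
        out += int(match.group(1), 16).to_bytes(2, "little")
    return bytes(out)


# Hypothetical payload fragment: two NOP pairs followed by one data word.
sample = 'var sc = unescape("%u9090%u9090%u4142");'
print(unescape_to_bytes(sample).hex())
```

The resulting bytes can be written to a file and handed to a disassembler, which is the manual equivalent of the decode step performed in the tool.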
This is a closer view of the shellcode. Shellcode is a small piece of machine code that is typically used to exploit vulnerabilities while avoiding detection. It earned its name because it often launches a command shell that the attacker can control.
Again, we run a utility, this time to convert the shellcode into an executable file, which we save so that we can take an even closer look at its function.
Here, we run yet another utility, IDA, which enables us to disassemble and debug the instructions in the executable file. As we have highlighted, this file contains multiple NOP slides (long runs of no-operation instructions), which are used in shellcode attacks because the exact location of the shellcode in memory is not known in advance: execution that lands anywhere in the slide runs straight through to the shellcode. This raises another red flag. From here, we should see if there are any interesting strings in the binary.
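Spotting this padding can also be automated. The following sketch assumes x86 code, where the single-byte NOP instruction is `0x90`, and reports every run of at least `min_length` such bytes. The function name `find_nop_sleds`, the threshold of 16, and the sample buffer are illustrative assumptions, and real samples may use other padding instructions that this simple check would miss.

```python
import re


def find_nop_sleds(blob: bytes, min_length: int = 16):
    """Return (offset, length) for each run of 0x90 bytes of min_length or more."""
    pattern = re.compile(rb"\x90{%d,}" % min_length)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(blob)]


# Hypothetical buffer: 4 filler bytes, a 20-byte run of NOPs, then a short run.
blob = b"\x41" * 4 + b"\x90" * 20 + b"\xcc" + b"\x90" * 3
for offset, length in find_nop_sleds(blob):
    print(f"NOP run of {length} bytes at offset {offset:#x}")
```

A long run is only a hint, since compilers also emit short NOP sequences for alignment, which is why the threshold is a parameter rather than a fixed rule.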
Here we have circled multiple binary strings that should raise concern. One of the circled items, URLDownloadToFileA, is a Windows API function to download a file from a remote server and to save it on the user’s PC. In this infected PDF, the shellcode uses it to point the PC to an infection point, which is the IP address we have circled (by the way, don’t visit that IP address). Once the infected file is downloaded, the shellcode will execute it, infecting the computer.
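The string hunt can be approximated outside a disassembler. This sketch pulls runs of printable ASCII out of a binary blob, then flags API names from a small watch list and anything that looks like an IPv4 address. Only `URLDownloadToFileA` comes from the analysis above; the other watch-list entries, the function names, and the sample bytes (which use the documentation-only address 192.0.2.1) are assumptions for illustration.

```python
import re

# Watch list: URLDownloadToFileA is from the analysis; the rest are assumed examples.
SUSPICIOUS_APIS = {"URLDownloadToFileA", "WinExec", "LoadLibraryA"}
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def printable_strings(blob: bytes, min_length: int = 4):
    """Return runs of printable ASCII characters, similar to the `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


def triage(blob: bytes):
    """Return (watch-listed API names, IPv4-looking values) found in the blob."""
    strings = printable_strings(blob)
    apis = [s for s in strings if s in SUSPICIOUS_APIS]
    ips = [ip for s in strings for ip in IPV4.findall(s)]
    return apis, ips


# Hypothetical shellcode fragment with an API name and a download URL embedded.
sample = b"\x90\x90\xeb\x10URLDownloadToFileA\x00http://192.0.2.1/a.exe\x00\xff"
apis, ips = triage(sample)
print("APIs:", apis)
print("IPs:", ips)
```

A hit on both lists at once, a download API next to a hard-coded address, is the same combination that stands out in the disassembly.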
There you have it! You have to go deeper to find what is truly at the heart of this infected PDF. Hackers are clever about wrapping executable files in shellcode, encoding it and hiding it in JavaScript within PDF files, but by reverse engineering their techniques, we gain a better understanding of our vulnerabilities and can work to strengthen our security posture.