Detecting Potentially Malicious JavaScript Embedded Within a PDF File Using LogRhythm NetMon

LogRhythm’s resident NetMon expert, Rob McGovern, has written various blog posts on the numerous benefits of using Deep Packet Analytics (DPA) within NetMon. If you’re not already familiar with DPA rules, Rob’s post is a great resource to review and includes free training!

While analyzing a PCAP file recently, I discovered some malicious, obfuscated JavaScript contained within a PDF file. After performing malware analysis on the PDF file and extracting the second-stage JavaScript code and the subsequent XOR-encoded shellcode, I began to wonder whether it would be possible to detect PDF files containing JavaScript in transit over the network.

Due to the extensive and complex PDF specifications, as well as the rich content that PDF files can carry, they are a great way to hide harmful JavaScript code. Typically, a heap spray exploit can be embedded quite easily within a PDF document using obfuscated JavaScript. This ease of embedding, coupled with the fact that PDF is a common file type regularly transmitted via e-mail, makes for a rather lethal recipe.

However, there is a solution to this problem: you can employ LogRhythm NetMon Deep Packet Analytics to enable rapid detection of malicious JavaScript embedded within a PDF file. To understand how to utilize NetMon in this case, you first need to understand the PDF file format.

PDF File Format: Recap

The PDF file format is well documented, and the graphical structure of a typical PDF file is shown below:

Figure 1: The Graphical Structure of a Typical PDF File

The table below provides a high-level overview of some of the more common PDF file format elements:

Element       Description
Header        The header typically contains the PDF version, which is mandatory in order for the PDF reader to be able to successfully open the PDF document
Object        A PDF will contain one or more objects. An object can contain information necessary to render the document, such as text, graphics, fonts, forms, pictures, ActionScript, and JavaScript
Xref          The Xref table is a mapping table of sorts, which contains offset values to the various elements within the PDF
Trailer       This contains metadata about the file, as well as the root object, offsets, number of objects, their sizes, and so on
End-of-File   This simply marks the end of the file
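
To make the elements in the table more concrete, the following minimal Python sketch checks a raw byte string for each structural marker. The function name `summarize_pdf_structure` and the tiny sample document are illustrative assumptions, and the regular expressions are a simplification rather than a full PDF parser (the sample is not meant to be a renderable PDF).

```python
import re

def summarize_pdf_structure(data: bytes) -> dict:
    """Report which high-level PDF elements appear in a raw byte string."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    return {
        # Header: "%PDF-" followed by the version number
        "version": header.group(1).decode() if header else None,
        # Objects: "<number> <generation> obj" introduces each object
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        # Xref: the standalone keyword (the word boundary excludes "startxref")
        "has_xref": re.search(rb"\bxref\b", data) is not None,
        # Trailer and End-of-File markers
        "has_trailer": b"trailer" in data,
        "has_eof": data.rstrip().endswith(b"%%EOF"),
    }

# Hypothetical, heavily simplified sample used only to exercise the function
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<</Type /Catalog /Pages 2 0 R>>\nendobj\n"
    b"2 0 obj\n<</Type /Pages /Kids [] /Count 0>>\nendobj\n"
    b"xref\n0 3\n"
    b"trailer\n<</Size 3 /Root 1 0 R>>\n"
    b"startxref\n0\n%%EOF\n"
)

print(summarize_pdf_structure(SAMPLE))
```

A check like this is only a quick orientation aid; it says nothing about whether the content of the objects is benign.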

Now that you have a basic understanding of a PDF file, you can relate that understanding back to NetMon.

Network Packet Capture Details Reveal Obfuscated JavaScript Code

Suppose that a network packet capture has been obtained and is found to contain a PDF file. For a typical PDF file, an object similar to the following will appear in the Wireshark output:

    2 0 obj
    <</Type /Page
    /Parent 1 0 R
    /Resources 4 0 R
    /Contents 5 0 R>>
    endobj

With a little reversing, you can obtain similar output from the PDF file itself, outside of the PCAP file. For example, you can carve the PDF file from the PCAP and inspect it with a PDF parsing tool:

    obj 2 0
     Type: /Page
     Referencing: 1 0 R, 4 0 R, 5 0 R
      <<
        /Type /Page
        /Parent 1 0 R
        /Resources 4 0 R
        /Contents 5 0 R
      >>
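
The kind of per-object summary shown above can be approximated in a few lines of Python. This is a rough sketch, not the tool that produced the output: `list_objects` is a hypothetical helper, and the regular expressions assume uncompressed objects delimited by `obj` and `endobj`.

```python
import re

# "<number> <generation> obj ... endobj" delimits one indirect object
OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# "<number> <generation> R" is a reference to another object
REFERENCE_RE = re.compile(rb"\b(\d+)\s+(\d+)\s+R\b")
# "/Type /Something" names the object's type
TYPE_RE = re.compile(rb"/Type\s*(/\w+)")

def list_objects(data: bytes) -> list:
    """Return one summary dict per object found in the raw bytes."""
    objects = []
    for number, generation, body in OBJECT_RE.findall(data):
        type_match = TYPE_RE.search(body)
        objects.append({
            "id": f"{int(number)} {int(generation)}",
            "type": type_match.group(1).decode() if type_match else None,
            "references": [
                f"{int(n)} {int(g)} R" for n, g in REFERENCE_RE.findall(body)
            ],
        })
    return objects

# The page object from the packet capture example
PAGE_OBJECT = (
    b"2 0 obj\n<</Type /Page\n/Parent 1 0 R\n"
    b"/Resources 4 0 R\n/Contents 5 0 R>>\nendobj\n"
)

for obj in list_objects(PAGE_OBJECT):
    print("obj", obj["id"])
    print(" Type:", obj["type"])
    print(" Referencing:", ", ".join(obj["references"]))
```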

The packet below contains a different object from the one above, one that includes a keyword of interest: “JavaScript.” This keyword marks the start of the obfuscated JavaScript code embedded in the PDF file, as seen within the TCP stream for this particular network transfer.

    /S /JavaScript
    /JS
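
One simple way to reason about this indicator offline is to count how often the relevant names occur in the raw bytes of a carved file. The sketch below is illustrative only: `count_js_keywords` and the sample bytes are assumptions, and it looks for the literal names `/JavaScript` and `/JS` without handling compressed object streams or hex-escaped names.

```python
import re

# A PDF name ends at whitespace or a delimiter, so the lookahead keeps
# "/JS" from matching longer names that merely start with those letters.
KEYWORD_PATTERNS = {
    "/JavaScript": re.compile(rb"/JavaScript(?![A-Za-z0-9])"),
    "/JS": re.compile(rb"/JS(?![A-Za-z0-9])"),
}

def count_js_keywords(data: bytes) -> dict:
    """Count literal occurrences of each JavaScript-related name."""
    return {
        name: len(pattern.findall(data))
        for name, pattern in KEYWORD_PATTERNS.items()
    }

# Hypothetical samples: an action object carrying script, and a plain page
SAMPLE_ACTION = (
    b"7 0 obj\n<</Type /Action\n/S /JavaScript\n"
    b"/JS (app.alert\\(1\\);)>>\nendobj\n"
)
CLEAN_PAGE = b"2 0 obj\n<</Type /Page /Parent 1 0 R>>\nendobj\n"

print(count_js_keywords(SAMPLE_ACTION))
print(count_js_keywords(CLEAN_PAGE))
```

A non-zero count is a reason to look closer, not proof of malice, since legitimate PDF forms also use JavaScript.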

Enabling the DPA Rule

Using the network packet capture details and NetMon 3.6.2, you can use a function called GetHttpResponseContent() to search for the “/JavaScript” identifier across the life of the entire flow. While it is possible to do a similar lookup using the GetPayloadString() function within a DPA packet rule, this method would be significantly more expensive from a performance standpoint. In addition, a flow rule using the GetHttpResponseContent function can trigger a user alarm, which is instantly viewable in the NetMon Alarms dashboard.

The full code for this rule is shown below.

    function DetectJavaScriptInHttpFiles (dpiMsg, ruleEngine)
        require 'LOG'
        local customFieldName = 'EmbeddedJavaScript'

        -- Returns true if this filename is already stored in the custom field.
        -- Assumes GetCustomField returns the stored values as multiple results.
        local function FilenameAlreadyFlagged(filename)
            for _, flagged in ipairs({GetCustomField(dpiMsg, customFieldName)}) do
                if flagged == filename then
                    return true
                end
            end
            return false
        end

        -- Only inspect HTTP flows that carry a file
        if HasApplication(dpiMsg, 'http') then
            local fileType = GetString(dpiMsg, 'http', 'file_type')
            if fileType ~= nil then
                local httpResponseContent = GetHttpResponseContent(dpiMsg)
                local jsMatcher = '/JavaScript'
                -- Plain-text search; the nil check skips flows with no response content
                if httpResponseContent ~= nil and string.find(httpResponseContent, jsMatcher, 1, true) then
                    local filename = GetString(dpiMsg, 'http', 'filename')
                    EZWARNING('Session ', GetUuid(dpiMsg), ' contains a ', fileType, ' file with embedded JavaScript: ', filename)
                    -- Record the filename and raise the alarm once per file
                    if not FilenameAlreadyFlagged(filename) then
                        SetCustomField(dpiMsg, customFieldName, filename)
                        local mediumSeverity = 'medium'
                        TriggerUserAlarm(dpiMsg, ruleEngine, mediumSeverity)
                    end
                end
            end
        end
    end

Additional highlights for this rule are as follows:

  • The rule creates a custom field called “EmbeddedJavaScript.” This field automatically appears as “EmbeddedJavaScript_NM” in Elasticsearch for easier querying.
  • If two files are sent via HTTP in the same session, both containing “/JavaScript” identifiers, the custom field holds two entries, one for each filename.
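
The per-filename behavior described in these highlights can be modeled outside NetMon. The following Python sketch is a simplified stand-in, not the DPA rule itself: `flag_session_files` and the sample session are hypothetical, and a plain list plays the role of the custom field.

```python
JS_MARKER = b"/JavaScript"

def flag_session_files(responses):
    """Model the rule's bookkeeping for one session.

    `responses` is an iterable of (filename, content_bytes) pairs.
    Returns (flagged_filenames, alarm_count).
    """
    flagged = []  # stands in for the EmbeddedJavaScript custom field
    alarms = 0
    for filename, content in responses:
        if JS_MARKER in content and filename not in flagged:
            flagged.append(filename)
            alarms += 1  # the rule would trigger one user alarm here
    return flagged, alarms

# Hypothetical session: two distinct PDFs with script, one repeat, one image
session = [
    ("invoice.pdf", b"<</S /JavaScript /JS 8 0 R>>"),
    ("report.pdf", b"<</S /JavaScript /JS 9 0 R>>"),
    ("invoice.pdf", b"<</S /JavaScript /JS 8 0 R>>"),
    ("logo.png", b"\x89PNG..."),
]

print(flag_session_files(session))
```

In this model, the two distinct PDF filenames each produce one entry and one alarm, while the repeated filename and the image are ignored.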

Once the DPA rule has been enabled, the Alarms dashboard will show any occurrences of the rule firing, assuming a PDF containing the “/JavaScript” identifier has been transmitted via HTTP.

Figure 2: Alarms Dashboard Shows Any Occurrences of the Rule Firing

Alternatively, a simple Lucene search query will display all results for the custom metadata field of filetype PDF for the time window selected.

Figure 3: Lucene Search Query Displays All Results for the Custom Metadata Field of Filetype PDF (Details Redacted)

Ascertaining the Final Host Details

But what if a separate team manages the NetMon appliances in your organization? The good news is that even without access to the NetMon dashboard, you can still be alerted when the DPA rule fires. You can quickly search for the “NetMon Lua Alarm” common event, which brings up the raw log message. That message shows a host of useful information, including the source and destination IPs, MAC addresses, port numbers, and other metadata fields.

Figure 4: NetMon Lua Alarm Common Event Can Quickly Be Searched

The session ID can be used for further collaboration and to pivot on the metadata parsed out by NetMon. The “DetectJavaScriptInHttpFiles” vendor message ID tells you which specific Lua rule triggered the alarm.

Once you ascertain which PDF file triggered the alarm, you will have all of the host details you need from both LogRhythm and NetMon to begin taking steps towards mitigation. You will also have access to the underlying packet captures in NetMon to continue the investigation workflow. Your workflow may include creating a case, adding evidence and notes to the case, and investigating how the PDF may have slipped by existing security defenses. This investigation could entail manually carving out the PDF from the PCAP or performing incident response on the infected machine. The DPA rule can be found on our Community site.
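
As a rough illustration of the manual carving step, the sketch below extracts a PDF from a byte string that is assumed to be an already reassembled HTTP response or TCP stream payload (for example, one exported from a packet analysis tool). `carve_pdf` is a hypothetical helper: it simply slices from the first `%PDF-` header to the last `%%EOF` marker and does not handle chunked transfer encoding, compression, or several PDFs in one stream.

```python
from typing import Optional

PDF_HEADER = b"%PDF-"
PDF_EOF = b"%%EOF"

def carve_pdf(stream: bytes) -> Optional[bytes]:
    """Slice a PDF out of a reassembled stream, or return None if absent."""
    start = stream.find(PDF_HEADER)
    if start == -1:
        return None
    # Use the last marker, since an updated PDF can contain more than one %%EOF
    end = stream.rfind(PDF_EOF)
    if end < start:
        return None
    return stream[start:end + len(PDF_EOF)]

# Hypothetical reassembled HTTP response with bytes before and after the PDF
STREAM = (
    b"HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\n\r\n"
    b"%PDF-1.4\n1 0 obj\n<</Type /Catalog>>\nendobj\n"
    b"trailer\n<</Root 1 0 R>>\n%%EOF\n"
    b"trailing bytes"
)

carved = carve_pdf(STREAM)
print(carved[:8], carved[-5:])
```

The carved bytes can then be written to disk and examined in an isolated analysis environment.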

NetMon Deep Packet Analytics Rapidly Detects Malicious Code

Because PDF files are commonly used and exchanged, malicious code embedded in them has a high chance of reaching a target. You can help prevent additional users from being affected by quickly determining which PDF file was the source of this code. LogRhythm and NetMon seamlessly work together to provide immediate detection of threats that may otherwise slip under the radar, such as a threat that is transmitted via a PDF. Using NetMon Deep Packet Analytics, you can be alerted to the presence of a PDF embedded with malicious JavaScript and stop its spread—effectively securing your network.