Content-Type attacks target vulnerabilities in the client-side software used to read content, such as Adobe Reader, Microsoft Office and image viewers. Attackers exploit programming flaws in that code to corrupt memory, so that their own attack code runs on the computer of the victim who opened the PDF or DOC file.
A Content-Type attack is a dark hole in an otherwise secure environment for the following reasons:
- Hard to detect: Many attack types, e.g. cross-site scripting, SQL injection, DNS cache poisoning and HTTP tunneling, can be detected and prevented by devices such as a WAF (Web Application Firewall) or an IPS (Intrusion Prevention System). A Content-Type attack, by contrast, is difficult for these devices to detect.
- Ignorance: Most penetration testing assignments focus on web applications, and some on the critical servers in the infrastructure. Organizations very rarely focus on the workstations in the environment. Even when they do look for workstation vulnerabilities and patches, the effort is usually limited to Windows and a few critical applications. Finally, even an organization that does track vulnerabilities in content-reading software cannot avoid 0-day attacks without additional preventive measures.
- False sense of security: When we talk about the security of an environment, components such as the firewall, IPS, IDS and anti-virus come to mind first. Having these components does not mean the environment is completely secure. They must also be configured well, as must the other components in the environment, such as the proxy server and the outbound policy at the internet gateway.
Content-Type Attack Process:
- The attacker sends the attack document to a victim, perhaps using a compromised machine to relay the e-mail and help conceal the attacker’s identity.
- If the victim double-clicks the file attached to the e-mail, the application registered for the file type launches and starts parsing the file.
- In this malicious file, the attacker will have embedded malformed content that exploits a file-parsing vulnerability, causing the application to corrupt memory on the stack or heap.
- Successful exploits transfer control to the attacker’s shellcode, which has been loaded from the file into memory.
- The shellcode often instructs the machine to write out an EXE file embedded at a fixed offset and run that executable. After the EXE file is written and run, the attacker’s code writes out a “clean file” that is also contained in the attack document and opens the application with the content of that clean file.
- In the meantime, the malicious EXE file that has been written to the file system is run, carrying out whatever mission the attacker intended.
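From a defender's point of view, the embedded EXE described in the steps above is one of the few parts of the document that can sometimes be spotted statically. The following is a minimal sketch, not a complete detector: it assumes the executable is stored unencoded, so an encoded or compressed payload would not be found this way. The function names `find_embedded_pe` and `fake_pe_stub` are hypothetical, and the stub is harmless test data rather than a real executable.

```python
import struct


def find_embedded_pe(data: bytes) -> list[int]:
    """Return offsets in `data` where a plausible Windows PE image begins."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        # A DOS header is 0x40 bytes long; its last field (e_lfanew, at 0x3C)
        # holds the offset of the "PE\0\0" signature relative to "MZ".
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits


def fake_pe_stub() -> bytes:
    """Build a harmless stand-in for an executable header (test data only)."""
    header = bytearray(0x40)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)  # PE signature follows the header
    return bytes(header) + b"PE\x00\x00"


# The stray "MZ" in the prefix is ignored because no PE signature follows it.
prefix = b"%PDF-1.4 filler MZ not an exe "
sample = prefix + fake_pe_stub()
print(find_embedded_pe(sample) == [len(prefix)])
```

Checking the `e_lfanew` pointer rather than just the two-byte `MZ` marker keeps false positives down, since `MZ` occurs by chance in many binary files.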
Malicious Content-Type Attack Document Structure
A malicious document combines several components that the attacker uses to compromise the victim machine:
- Vulnerability: The malformed content that exploits the vulnerability in the content-reading software. After successful exploitation, control is transferred to the shellcode part of the document.
- Shellcode: The part of the document that carries out post-exploitation activity, which can range from executing a binary file or downloading a malicious file to installing a key-logger or opening a reverse tunnel to an attacker-controlled server.
- Embedded binary code: The executable code that the attacker wants to run on the victim machine as part of the post-exploitation activity.
- Clean document within context: A harmless document that fits the context of the message. The attacker uses it to hide the evidence and cover the attack vector.
PDF file analysis
This part gives a brief overview of the PDF file structure, the PDF file format and the objects that are of interest to an attacker as well as to an analyst. It then covers analysis of a PDF using Python-based scripts.
PDF file structure: A PDF file is divided into four main parts: header, body, cross-reference table and trailer.
- PDF Header: The first line of the file specifies the version of the PDF format. For example, “%PDF-1.4” means that the file conforms to version 1.4 of the PDF specification. To read a file you need a reader that supports that version or a later one; PDF 1.4, for instance, was introduced with Adobe Acrobat 5.0.
- PDF Body: The body consists of the objects that make up the contents of the document: image data, fonts, annotations, text streams and so on. It can also hold objects that are not visible on the page, such as those behind interactive features or the logical structure of the document. The content can additionally be protected by security features that restrict unauthorized viewing, printing, editing or modifying. Numeric values in the body come in two types: integers and real numbers.
- The Cross-Reference Table: The cross-reference table records the byte offset of every indirect object in the file, so a reader can jump directly to an object without parsing the whole file. When a PDF file is updated incrementally, the changed objects are appended to the file together with a new cross-reference section, so the changes can be traced through the cross-reference table.
- The Trailer: The trailer points to the cross-reference table and names the root object of the document. The file always ends with “%%EOF”, which marks the end of a PDF file. This line is required: if it is missing, the file is incomplete and may not be processed correctly. PostScript files behave differently. If the last few lines of a PostScript file are missing, most of the pages will still print; with a PDF file, you lose everything.
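The four parts can be checked with a few lines of Python. The sketch below is illustrative only: `build_minimal_pdf` and `describe_pdf_layout` are hypothetical helper names, the generated file is a bare two-object skeleton, and the check assumes a classic `xref` table (files from PDF 1.5 onwards may store the cross-reference data in a stream instead, which this sketch would not recognize).

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble header, body, cross-reference table and trailer by hand."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    out = b"%PDF-1.4\n"                       # header
    offsets = []
    for obj in objects:                       # body
        offsets.append(len(out))
        out += obj
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"           # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off      # byte offset of each object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out


def describe_pdf_layout(data: bytes) -> dict:
    """Report the header version and whether the trailer leads to an xref table."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    xref_offset = int(tail.group(1)) if tail else None
    return {
        "version": header.group(1).decode() if header else None,
        "xref_offset": xref_offset,
        "xref_found": xref_offset is not None
        and data[xref_offset:xref_offset + 4] == b"xref",
        "has_eof": data.rstrip().endswith(b"%%EOF"),
    }


pdf = build_minimal_pdf()
print(describe_pdf_layout(pdf))
print(describe_pdf_layout(pdf[:-10]))  # truncated file: the %%EOF marker is gone
```

Reading the file from the end, as a PDF reader does, is what makes the missing “%%EOF” case so damaging: without the trailer there is no pointer to the cross-reference table.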
PDF file format
The PDF file format uses a syntax derived from the PostScript language to describe a document. Objects contain references to other objects and so form a tree-like structure, as shown below.
In the body (the object list), the following kinds of definitions appear:
- Indirect reference (n g R): references an object by object number n and generation number g, e.g. 5 0 R. If the object doesn’t exist, this is equivalent to the Null object.
- Name (/Name): names are identifiers. If you know Lisp or Scheme, this is similar to the quote special form (e.g. 'ok). The initial / introduces the name but isn’t part of the name; this is similar to $ in Bash, Perl or PHP.
- Dictionary (<< … >>): an unordered collection of (Name, Object) pairs, essentially a hash table. The Object part can be another Name (e.g. /Type /Font).
- Array ([x y z …]): an ordered list of objects, e.g. [ 0 0 200 200 ].
- String Object ((text)): text. The complete syntax is complex, but for now suffice it to say that it is text between parentheses, e.g. (Hello, world!).
- Stream (<< /Length … >> stream … endstream): embedded data, which can be compressed. It starts with a dictionary that describes the stream, such as its length or the encoding (/Filter) it uses.
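To make these definitions concrete, here is a rough sketch that pulls the names, indirect references and stream out of two hand-written objects. It is a regex approximation and not a real PDF parser: it assumes `/Length` is a direct integer, that the only filter is FlateDecode, and that the keywords `obj`, `endobj` and `stream` do not occur inside the data. The helper names `iter_objects` and `split_object` are made up for this example.

```python
import re
import zlib

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
NAME_RE = re.compile(rb"/[\w#.+-]+")
LENGTH_RE = re.compile(rb"/Length\s+(\d+)")
STREAM_START_RE = re.compile(rb"stream\r?\n")


def iter_objects(data: bytes):
    """Yield (object number, generation number, body) for each indirect object."""
    for match in OBJ_RE.finditer(data):
        yield int(match.group(1)), int(match.group(2)), match.group(3)


def split_object(body: bytes):
    """Return (dictionary part, decoded stream or None) for one object body."""
    start = STREAM_START_RE.search(body)
    if start is None:
        return body, None
    head = body[:start.start()]
    length = LENGTH_RE.search(head)
    if length is None:
        return head, None
    # Trust /Length to delimit the payload instead of searching for "endstream".
    payload = body[start.end():start.end() + int(length.group(1))]
    if b"/FlateDecode" in head:
        payload = zlib.decompress(payload)
    return head, payload


content = b"BT /F1 12 Tf (Hello, world!) Tj ET"
packed = zlib.compress(content)
page = b"4 0 obj\n<< /Type /Page /Contents 5 0 R /MediaBox [ 0 0 200 200 ] >>\nendobj\n"
stream_head = b"5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
sample = page + stream_head + packed + b"\nendstream\nendobj\n"

for num, gen, body in iter_objects(sample):
    head, payload = split_object(body)
    print(num, gen, NAME_RE.findall(head), REF_RE.findall(head), payload)
```

Object 4 is a dictionary whose `/Contents` entry is an indirect reference to object 5, and object 5 is a stream whose dictionary announces its length and filter; following such references is how the tree-like structure is walked.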
PDF File analysis using scripts
Multiple scripts are publicly available for analyzing PDF files. For demonstration purposes I will use the pdfid.py and pdf-parser.py scripts developed by Didier Stevens.
PDF analysis using pdf-parser.py: This tool parses a PDF document to identify the fundamental elements used in the analyzed file. It does not render the PDF document.
The main options are:
- stats: displays statistics on the objects found in the PDF document. Use it to identify PDF documents with unusual or unexpected objects, or to classify PDF documents.
- search: searches for a string in indirect objects (not inside the streams of indirect objects). The search is not case-sensitive, and it can be evaded by obfuscation techniques.
- filter: applies the filter(s) to the stream. For the moment, only FlateDecode is supported.
- raw: makes pdf-parser output raw data.
- object: outputs the data of the indirect object whose ID is specified. The ID is not version dependent: if more than one object has the same ID (disregarding the version), all of them are output.
- reference: selects all objects that reference the specified indirect object. The ID is not version dependent.
- type: selects all objects of a given type. The type is a Name, so it is case-sensitive and must start with a slash character (/).
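The selection logic behind the type and reference options can be imitated in a few lines, which may help when reading the tool's output. This is an independent sketch and not pdf-parser's own code; `objects_by_id`, `select_by_type` and `select_referencing` are hypothetical names, and the regexes assume uncompressed, well-formed objects.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)


def objects_by_id(data: bytes) -> dict[int, list[bytes]]:
    """Group object bodies by ID, ignoring the version (generation) number."""
    found: dict[int, list[bytes]] = {}
    for match in OBJ_RE.finditer(data):
        found.setdefault(int(match.group(1)), []).append(match.group(2))
    return found


def select_by_type(data: bytes, type_name: bytes) -> list[int]:
    """IDs of objects whose /Type is exactly `type_name` (case-sensitive)."""
    pattern = re.compile(rb"/Type\s*" + re.escape(type_name) + rb"(?![\w#])")
    return [
        num for num, bodies in objects_by_id(data).items()
        if any(pattern.search(body) for body in bodies)
    ]


def select_referencing(data: bytes, target: int) -> list[int]:
    """IDs of objects that contain an indirect reference to object `target`."""
    pattern = re.compile(rb"(?<!\d)%d\s+\d+\s+R\b" % target)
    return [
        num for num, bodies in objects_by_id(data).items()
        if any(pattern.search(body) for body in bodies)
    ]


sample = (
    b"1 0 obj << /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >> endobj\n"
    b"2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj\n"
    b"3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    b"4 0 obj << /Type /Action /S /JavaScript /JS (app.alert(1)) >> endobj\n"
)

print(select_by_type(sample, b"/Page"))   # /Pages is a different name
print(select_by_type(sample, b"/page"))   # names are case-sensitive
print(select_referencing(sample, 4))      # who points at the JavaScript action?
```

Asking which objects reference a suspicious one, as in the last line, is a common way to work back from a JavaScript action to whatever triggers it.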
PDF analysis using pdfid.py
PDFiD scans a PDF document for different types of objects and keywords, as shown in the snapshot below, and counts the occurrences (total and obfuscated) of each one.
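The idea of counting total and obfuscated occurrences can be sketched as follows. This is a simplified illustration of the approach, not PDFiD's implementation: the keyword list is a small subset chosen for the example, `count_keywords` is a hypothetical name, and the scan only sees names that appear outside compressed streams. Obfuscation here means the `#xx` hex escape that PDF allows inside names, e.g. `/J#61vaScript` for `/JavaScript`.

```python
import re

KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]

# A name runs from "/" up to the next whitespace or delimiter character.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def canonical(name: bytes) -> bytes:
    """Replace every #xx escape in a name with the byte it stands for."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_keywords(data: bytes) -> dict[bytes, list[int]]:
    """Map each keyword to [total occurrences, obfuscated occurrences]."""
    counts = {keyword: [0, 0] for keyword in KEYWORDS}
    for raw in NAME_RE.findall(data):
        plain = canonical(raw)
        if plain in counts:
            counts[plain][0] += 1
            if plain != raw:          # the name only matched after decoding
                counts[plain][1] += 1
    return counts


sample = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    b"3 0 obj << /S /J#61vaScript /JS (x) >> endobj\n"
)

for keyword, (total, obfuscated) in count_keywords(sample).items():
    print(keyword.decode(), total, obfuscated)
```

A non-zero obfuscated count is worth attention in its own right, since there is rarely a legitimate reason to hex-escape a keyword such as `/JavaScript`.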
Protection measures against Content-Type Attack
The following protection measures can be used to protect the environment.
- Security updates: All available security updates must be applied. This prevents exploitation of known vulnerabilities, but not 0-day attacks; the risk from 0-day attacks has to be reduced with the other protection measures.
- JavaScript in Adobe Reader: JavaScript is used to automate tasks in a PDF, e.g. calculations and form filling, but attackers use the same feature for malicious activity. JavaScript should therefore be disabled in the PDF reader and enabled only when required.
- DEP implementation: DEP (Data Execution Prevention) prevents code from executing in memory areas marked non-executable. Attackers usually try to overflow a buffer in order to execute code placed in such an area. DEP should therefore be enabled and, where required, enabled explicitly for trusted applications.
- Security awareness: Security awareness about e-mails and attachments can prevent Content-Type attacks. Awareness programs should encourage employees to submit attachments from untrusted sources for analysis.
- White-list based proxy: An internet proxy can be implemented in two ways. A black-list based proxy blocks access to selected URLs, such as Facebook or Gmail. A white-list based proxy grants access only to explicitly allowed URLs. A white-list based proxy can block post-exploitation activity in which the attacker needs the victim machine to connect to a malicious website.
- Strong outbound firewall policy: A strong firewall policy for both inbound and outbound traffic also blocks post-exploitation activity in which the attacker’s code tries to open a reverse channel from the victim to an attacker-controlled machine, as shown in the Content-Type attack process diagram.
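The white-list decision described for the proxy comes down to a default-deny check on the destination host. The sketch below shows only that decision and is not the configuration of any particular proxy product; the host names in `ALLOWED_HOSTS` and the function name `is_allowed` are placeholders for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would load this from proxy policy.
ALLOWED_HOSTS = {"intranet.example.com", "updates.example.com"}


def is_allowed(url: str) -> bool:
    """Default-deny: permit a URL only if its host is on the allow-list."""
    host = (urlsplit(url).hostname or "").lower()
    # Accept an exact match or a subdomain of an allowed host, nothing else.
    return any(host == allowed or host.endswith("." + allowed)
               for allowed in ALLOWED_HOSTS)


for url in (
    "https://intranet.example.com/report.pdf",
    "http://evil.example.net/payload.exe",
    "http://intranet.example.com.evil.net/",
):
    print(url, is_allowed(url))
```

Matching on the parsed host name rather than on a substring of the URL matters here: the third URL contains an allowed name but belongs to a different domain, and is rejected.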
Every organization should implement as many of these protection measures as possible to secure the environment against Content-Type attacks.