PHP output buffering

Web development in PHP abstracts a lot of the underlying networking and communication issues away from developers. This is great in most cases, but sometimes an application may want more control over when and how responses are sent. The output buffering mechanism is built into the PHP interpreter to satisfy those more advanced requirements.

What is output buffering?

When a PHP script produces output, it is typically passed on to the web server as soon as it is generated. This is advantageous: the script does not need to keep already-produced output in memory, so execution takes less memory, and data reaches the user as early as possible. Output buffering changes this. Output is collected in a buffer in memory instead, and is only sent when the buffer is flushed, either explicitly or when the script ends.

Basic output buffering usage

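Output buffering is usually started explicitly with the ob_start() function (the output_buffering setting in php.ini can also start a buffer automatically for every request). From then on, every piece of output, for example from echo or print statements, is collected in the buffer. The buffered content is sent once one of the flushing functions ob_flush(), ob_end_flush() or ob_get_flush() is called, or when the script finishes. Note that flush() is a different function: it pushes PHP's own write buffer on to the web server, but it does not empty buffers created with ob_start().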

<?php
    // start output buffering
    ob_start();

    // this will not be sent yet, but stored in the buffer
    print("Hello world!");

    // send buffered message
    ob_flush();
?>
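The difference between ob_flush() and ob_end_flush() is easy to miss: the former passes the buffered content on but keeps the buffer active, the latter passes it on and closes the buffer. A minimal sketch that records the nesting level reported by ob_get_level() after each step; the variable names are only for illustration, and the level is measured relative to the starting level because a web server configuration may already have opened a buffer:

```php
<?php
// buffers already active before we start (the output_buffering
// ini setting may have opened one under a web server)
$base = ob_get_level();
$levels = [];

ob_start();
$levels[] = ob_get_level() - $base;   // 1: our buffer is active

print("First part\n");
ob_flush();                           // passes "First part" on, buffer stays active
$levels[] = ob_get_level() - $base;   // still 1

print("Second part\n");
ob_end_flush();                       // passes "Second part" on and closes the buffer
$levels[] = ob_get_level() - $base;   // 0: back where we started

print_r($levels);
```

In short, ob_flush() suits sending a page in several steps, while ob_end_flush() ends buffering for good.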

The buffered content can also be modified before it is sent: ob_get_contents() returns it as a string, and ob_clean() deletes it. This makes it possible to discard previous output entirely:

<?php
    // start output buffering
    ob_start();

    // this will not be sent yet, but stored in the buffer
    print("Hello world!");

    // read currently buffered output
    $previous_output = ob_get_contents();

    // delete buffered output
    ob_clean();

    // write new output
    print("I am the new content\n");
    print("Previous output was " . $previous_output);

    // send output buffer
    ob_flush();
?>
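Besides reading and clearing the buffer by hand, ob_start() accepts an optional callback that receives the buffered text and returns the text that should actually be passed on. The sketch below upper-cases everything; the outer ob_start()/ob_get_clean() pair is only there so the transformed result can be inspected in a variable:

```php
<?php
// outer buffer: collects the final, transformed output for inspection
ob_start();

// inner buffer: its content is run through the callback when flushed
ob_start(function (string $buffer): string {
    return strtoupper($buffer);
});

print("hello world");

// runs the callback and hands its result to the outer buffer
ob_end_flush();

$result = ob_get_clean(); // "HELLO WORLD"
print($result);
```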

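Note that each call to ob_start() creates a new buffer nested inside the currently active one, so you can safely start buffering at any point in the script, even if a previous buffer is still active. Nested buffers are closed in reverse order, and flushing an inner buffer passes its content to the enclosing buffer rather than to the user.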
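Nested buffers behave like a stack. A small sketch of that behaviour, with illustrative variable names: the inner buffer only sees its own content, and ending it moves that content into the outer buffer instead of sending it:

```php
<?php
ob_start();                     // outer buffer
print("outer start, ");

ob_start();                     // inner buffer, stacked on the outer one
print("inner content, ");
$inner = ob_get_contents();     // only "inner content, "
ob_end_flush();                 // moves the inner content into the outer buffer

print("outer end");
$outer = ob_get_clean();        // "outer start, inner content, outer end"

print($outer);
```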

Performance considerations

The main benefit of output buffering is the additional control over the output, which is especially valuable in larger applications. That structural benefit comes at a cost: the entire output is held in memory until it is flushed, which can significantly increase resource usage. Consider this example:

<?php
    ob_start();

    readfile("archive.zip");

    ob_flush();
?>

The script uses readfile() to send the contents of the file archive.zip to the user. Depending on the size of that file, this can be a problem: if archive.zip is 15 GB in size, the script needs roughly 15 GB of memory for every concurrent request, far more than a typical memory_limit allows. Without output buffering, readfile() could instead read and send the file in small chunks (8 KB by default), keeping only one chunk in memory at a time and repeating until the whole file is transferred.
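When a large file has to be sent, one approach is to close any active buffers first and then stream the file in chunks. A minimal sketch, assuming a hypothetical helper named stream_file(); the 8192-byte chunk size simply mirrors the 8 KB figure above. It deliberately discards (rather than flushes) previously buffered output, on the assumption that any stray output in front of a file download would corrupt it:

```php
<?php
// Hypothetical helper: sends a file in fixed-size chunks so that only
// one chunk is held in memory at a time. Returns the number of bytes sent.
function stream_file(string $path, int $chunkSize = 8192): int
{
    // discard active output buffers so chunks are not accumulated in memory
    while (ob_get_level() > 0) {
        if (!ob_end_clean()) {
            break; // this buffer cannot be removed, stop trying
        }
    }

    $handle = fopen($path, "rb");
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $sent = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === "") {
            break;
        }
        print($chunk);
        flush(); // hand the chunk on to the web server right away
        $sent += strlen($chunk);
    }

    fclose($handle);
    return $sent;
}

// usage: stream_file("archive.zip");
```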

HTTP Headers and output buffering

Output buffering holds back the response content, but header() calls are not written into the buffer. This matters because PHP only allows headers to be set before any response content has been sent. Consider this example:

<?php
    print("Hello world");
    header("Content-Type: text/plain");
?>

This script would fail with the warning Warning: Cannot modify header information - headers already sent by [...], and the header would not be set in the response. (This assumes the output_buffering setting in php.ini is off; the php.ini files shipped with PHP enable a 4096-byte buffer, which hides the problem for small outputs like this one.) The reason lies in the HTTP protocol: a response begins with metadata, such as the status code indicating whether the request succeeded, followed by headers that describe the type, size and language of the response data, among other things. After the headers comes a blank line, followed by the actual response content. HTTP was not designed to deliver headers out of order: once the response content has started, the response cannot switch back to sending more headers. (HTTP does allow some fields to be sent after the content as trailers, but we will ignore those for simplicity.)

Output buffering can help here by making sure no content is sent before all headers have been set:

<?php
    // start output buffering
    ob_start();

    // is stored in buffer, not sent yet
    print("Hello world");

    // header() is not written to the buffer; PHP records the header
    // and sends it ahead of the first response content
    header("Content-Type: text/plain");

    // now the headers are sent, followed by the buffered text
    ob_flush();
?>

Because the output buffer delays the print() output until after header() has been called, the headers still go out before the response content, and the warning no longer appears.
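Since the whole body is still in the buffer when header() is called, its size is known at that point. One possible use, sketched below, is a Content-Length header computed with ob_get_length(). The helper send_with_length() is hypothetical and returns the length it announced only so the example can be checked:

```php
<?php
// Hypothetical helper: renders the body into a buffer, then sets headers
// that depend on the finished body before sending it.
function send_with_length(callable $render): int
{
    ob_start();
    $render();                        // body goes into the buffer

    $length = ob_get_length();        // size of the buffered body in bytes
    header("Content-Type: text/plain");
    header("Content-Length: " . $length);

    ob_end_flush();                   // headers first, then the body
    return $length;
}

$sent = send_with_length(function () {
    print("Hello world");
});
```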

Preventing incomplete pages on errors

By default, an uncaught exception or a fatal error stops the script at the point where it occurred, but any output produced up to that point is still sent, so the user receives a partial response.

Consider this example:

<?php
    // set custom exception handler
    set_exception_handler(function($e){
        print($e->getMessage());
    });

    // will be displayed
    print("Hello ");

    // simulate an error
    throw new Exception("Something went wrong!");

    // will not be displayed
    print("World");
?>

The script outputs the partial result Hello Something went wrong!: the first print() call reaches the user, but the second one never runs. Output buffering is commonly used to prevent such partial responses by buffering the page and replacing it with an error page if something goes wrong:

<?php
    // set custom exception handler
    set_exception_handler(function($e){
        // turn off output buffering and delete buffered content
        ob_end_clean();
        print($e->getMessage());
    });

    // enable output buffering
    ob_start();

    // written to output buffer
    print("Hello ");

    // simulate an error
    throw new Exception("Something went wrong!");

    // cannot be reached because of error
    print("World");

    ob_flush();
?>

Now only the custom exception handler's output is shown, and the half-rendered page is discarded.
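The same idea also works locally with try/catch instead of a global exception handler. In the sketch below, render_page() is a hypothetical helper: it records the buffer level before rendering so that any buffers left open by the failing code are discarded too, and it returns either the complete page or an error message:

```php
<?php
// Hypothetical helper: returns the complete page, or an error message if
// rendering throws. Partial output never leaves the buffers.
function render_page(callable $render): string
{
    $level = ob_get_level();
    ob_start();

    try {
        $render();
        return ob_get_clean();
    } catch (Throwable $e) {
        // discard our buffer and any nested ones the failed code opened
        while (ob_get_level() > $level) {
            ob_end_clean();
        }
        return "Error: " . $e->getMessage();
    }
}

print(render_page(function () {
    print("Hello ");
    throw new Exception("Something went wrong!");
}));
// prints "Error: Something went wrong!"
```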

When used properly, output buffering is a powerful tool that enables developers to build more flexible and robust web applications. While its advantages are desirable, it is important to remember its drawbacks and to avoid buffering large responses, so that scripts don't use excessive system memory.