One of the first things to do when receiving data from a request is to validate that it matches the expected format and to sanitize potentially undesired content. Since this is such a common task, PHP provides a default extension for this purpose: filter.
Using the filter extension
The filter PHP extension provides a few common filters to validate or sanitize data. You use it by calling
filter_var() with the value to filter and the filter to apply. You may supply options as the third parameter that change the behaviour of the selected filter. The function will return the filtered value on success, or
false on failure.
$input = "14";
$result = filter_var($input, FILTER_VALIDATE_INT, FILTER_NULL_ON_FAILURE);
var_dump($result);
In this example, we picked
FILTER_VALIDATE_INT to check if
$input is a valid integer. Additionally, we specified the option
FILTER_NULL_ON_FAILURE, which will make it return
NULL instead of
false if the validation check fails. Since the string in
$input is an integer, the example will return the integer 14. Changing
$input to, for example,
"b" will make it return
NULL.
Filters and flags are defined as global constants by the filter extension. Validation filters will start with
FILTER_VALIDATE_ while sanitization filters will have the
FILTER_SANITIZE_ prefix. The documentation contains detailed explanations of all validation filters, sanitization filters and their option flags.
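To make the difference between the two filter families concrete, here is a minimal sketch. The age range of 18 to 120 and the sample strings are assumptions chosen purely for illustration. It passes options and flags together in an array as the third parameter of filter_var().

```php
<?php
// Validation: accept only integers between 18 and 120 (range chosen for illustration).
$ageOptions = [
    "options" => ["min_range" => 18, "max_range" => 120],
    "flags"   => FILTER_NULL_ON_FAILURE,
];

$valid    = filter_var("25", FILTER_VALIDATE_INT, $ageOptions);  // int(25)
$tooSmall = filter_var("12", FILTER_VALIDATE_INT, $ageOptions);  // NULL, below min_range
$notAnInt = filter_var("abc", FILTER_VALIDATE_INT, $ageOptions); // NULL, not an integer

// Sanitization: remove everything except digits, "+" and "-".
$digits = filter_var("abc123", FILTER_SANITIZE_NUMBER_INT);      // string "123"

var_dump($valid, $tooSmall, $notAnInt, $digits);
```

Note that the validation filter converts the value to an integer or rejects it, while the sanitization filter always hands back a cleaned-up string.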
Validating an email address
A common use case of validation is to check if an email address is correctly formatted. Here is how that would look using filter_var():
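$email = "firstname.lastname@example.org";
$result = filter_var($email, FILTER_VALIDATE_EMAIL);
var_dump($result); // the address itself, since it is well-formed
var_dump(filter_var("sample@@somewhere.com", FILTER_VALIDATE_EMAIL)); // bool(false): two @ signs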
Email validation is not as trivial as you would expect. From just checking if it contains
. characters, to multiple lines of regular expressions, there are countless strategies for accomplishing this task, most of them either incorrect or resource-hungry. The advantage of using filter_var() is that it abstracts all of that away from you, making it easy and efficient to validate email addresses against the address syntax of RFC 822 (with a few exceptions listed in the documentation).
Note that this will only check if the email format is valid, not if the mailbox (or domain) actually exists.
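Since filter_var() returns the address itself on success, it is often convenient to wrap it in a small boolean check. The following is only a sketch: the helper name isWellFormedEmail() and the sample addresses are made up for this illustration.

```php
<?php
// Hypothetical helper: true only if $address passes FILTER_VALIDATE_EMAIL.
function isWellFormedEmail(string $address): bool
{
    // Compare strictly against false, the value filter_var() returns on failure.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

$candidates = ["user@example.com", "sample@@somewhere.com", "no-at-sign"];
foreach ($candidates as $candidate) {
    $verdict = isWellFormedEmail($candidate) ? "well-formed" : "rejected";
    echo $candidate, " => ", $verdict, "\n";
}
```

Only the first candidate is reported as well-formed. As noted above, this says nothing about whether the mailbox exists.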
Sanitizing a username
While rarely given much attention, usernames can be tricky at times. Imagine you identify users uniquely by username. What if a user enters a username that looks the same as someone else's, except with some invisible characters in between? To the server they would be different, but to humans they are visually identical.
Stripping such undesirable characters from strings is a prime example of sanitization with filter_var():
$input = "user\nna\rme";
$result = filter_var($input, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW|FILTER_FLAG_STRIP_HIGH);
var_dump($result);
As you can see, we sneaked a newline
\n and carriage return
\r symbol in the
$input variable, but
filter_var() simply returns
"username" with all non-printable characters removed.
Be careful when assigning the results of sanitization filters directly, as even they can fail (see next paragraph for an example of that).
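One way to guard against a failed sanitization is to compare the result strictly against false before using it. The sketch below assumes a hypothetical helper named sanitizeUsername(). It uses an array as the failing input, because filter_var() rejects arrays unless FILTER_REQUIRE_ARRAY or FILTER_FORCE_ARRAY is set.

```php
<?php
// Hypothetical helper: returns the cleaned username, or null if sanitization failed.
function sanitizeUsername($raw): ?string
{
    $clean = filter_var(
        $raw,
        FILTER_SANITIZE_SPECIAL_CHARS,
        FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH
    );

    // Compare strictly: an empty string is a legitimate result, false is a failure.
    return $clean === false ? null : $clean;
}

var_dump(sanitizeUsername("user\nna\rme"));         // "username"
var_dump(sanitizeUsername(["not", "a", "string"])); // NULL, arrays are rejected
```

A loose comparison such as `if (!$clean)` would also treat an empty string or "0" as a failure, which is why the strict `===` matters here.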
A closer look at filter_input()
Filtering commonly happens to data received in a request. For this purpose, the filter extension includes the filter_input() function, which takes a value directly from the supplied input source (given as a constant, for example INPUT_GET or INPUT_POST).
When reading the documentation, you may wonder why there are both filter_var() and
filter_input(). At first glance, they seem to do the same thing; for example, these two lines are equivalent:
$n = filter_var($_GET["number"], FILTER_SANITIZE_NUMBER_INT);
$n = filter_input(INPUT_GET, "number", FILTER_SANITIZE_NUMBER_INT);
While they seem equal at first glance,
filter_input() behaves slightly differently from filter_var().
For starters, it returns the value on success, false on error and
NULL if the variable was not present in the input source. This saves you from having to first check if the variable
$_GET["number"] even exists before filtering it, as it is done for you implicitly.
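The three possible outcomes can be told apart with strict comparisons. The sketch below uses a hypothetical helper, describeFilterResult(), and FILTER_VALIDATE_INT instead of the sanitization filter from the previous example, so that an invalid value actually produces false.

```php
<?php
// Hypothetical helper that names the three possible outcomes of filter_input().
function describeFilterResult($result): string
{
    if ($result === null) {
        return "missing";   // the parameter was not part of the request
    }
    if ($result === false) {
        return "invalid";   // the parameter was present but failed the filter
    }
    return "ok";            // the filtered value is usable (this includes 0)
}

$number = filter_input(INPUT_GET, "number", FILTER_VALIDATE_INT);
echo describeFilterResult($number), "\n";
```

The strict checks matter because a valid result such as 0 is falsy in PHP and would otherwise be mistaken for a failure.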
The second difference is that filter_input() will operate on the data received in the request, not the current state of the variable. Assume this script was called without any GET parameters:
$_GET["number"] = "11";
$n = filter_var($_GET["number"], FILTER_SANITIZE_NUMBER_INT);
// $n is now "11"
$n = filter_input(INPUT_GET, "number", FILTER_SANITIZE_NUMBER_INT);
// $n is now NULL
Even though we changed
$_GET["number"] within our code,
filter_input() read it directly from the input and not our altered variable. This may prevent some attack vectors or safeguard against side effects of functions like
Filtering multiple variables at once
When receiving data, you often need to validate multiple fields. To make this process easier for developers, the filter extension includes array variants: filter_var_array() and filter_input_array().
Assume you are validating data for a new user signing up:
$filters = array(
    "username" => [
        "filter" => FILTER_SANITIZE_SPECIAL_CHARS,
        // flags must be combined into a single bitmask, not passed as an array
        "flags" => FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH,
    ],
    "age" => [
        "filter" => FILTER_VALIDATE_INT,
        "options" => ["min_range" => 18],
    ],
    "email" => FILTER_VALIDATE_EMAIL,
);
$userData = filter_input_array(INPUT_POST, $filters);
var_dump($userData);
This condensed syntax lets us specify and apply all filters at once. If you have trouble understanding the syntax, the documentation explains it in more detail.
$userData will be an associative array with each of the named fields set to the result of applying the filters to the input value of the same name. This means that, for example,
$_POST["email"] will be filtered using
FILTER_VALIDATE_EMAIL and the result (the passing email address or
false) will be available at
$userData["email"]. You can check each result individually or simply loop over the entire
$userData to check if any of the fields is
false (or NULL, if the field was missing from the request).
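Looping over the result could look roughly like the sketch below. It uses filter_var_array() on a made-up sample array instead of filter_input_array(), so that it can be run outside of a real POST request. The helper findProblemFields() and the sample data are assumptions for illustration only.

```php
<?php
// Hypothetical helper: lists the fields that are missing (NULL) or invalid (false).
function findProblemFields(array $filtered): array
{
    $problems = [];
    foreach ($filtered as $field => $value) {
        if ($value === null || $value === false) {
            $problems[] = $field;
        }
    }
    return $problems;
}

$filters = [
    "age"   => ["filter" => FILTER_VALIDATE_INT, "options" => ["min_range" => 18]],
    "email" => FILTER_VALIDATE_EMAIL,
];

// Made-up sample data standing in for $_POST; "email" is deliberately absent.
$sample = ["age" => "17"];
$userData = filter_var_array($sample, $filters);

// Reports "age" (below min_range) and "email" (missing from the input).
print_r(findProblemFields($userData));
```

Because missing fields are added to the result as NULL by default, a single loop is enough to catch both absent and invalid values.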