The first step of penetration testing a web application is reconnaissance. It starts with simple fingerprinting of web server and CMS names and versions, then continues with the website's contents, links, query parameters, unusual HTTP headers and so on. Modern reconnaissance relies mostly on automated tools, which we will explore in this article.
Disclaimer
Even though reconnaissance only gathers information about a target website, it does so by sending requests that can overload or even crash it. Be aware that these tools can, even unintentionally, cause service degradation or entire outages. Only use them against websites for which you have prior permission, and only after verifying that they are not currently under heavy load.
wafw00f
Before looking into a website, the first step should always be to look for defenses. Web application firewalls have risen in popularity since the creation of services like Cloudflare, and may well interfere with the information gathering process.
The wafw00f tool is used simply by passing it a target URL:
wafw00f https://example.com
The output will be only a few lines long (not counting the ASCII art at the top):
                   ~ WAFW00F : v2.2.0 ~
   The Web Application Firewall Fingerprinting Toolkit
[*] Checking https://example.com
[+] Generic Detection results:
[-] No WAF detected by the generic detection
[~] Number of requests: 7
If it finds anything, it will provide the name of the identified firewall, for example:
[*] Checking https://example.com/
[+] The site https://example.com/ is behind Cloudflare (Cloudflare Inc.) WAF.
[~] Number of requests: 2
While wafw00f seems simplistic at first glance, it can accurately detect an impressive list of web application firewalls. You can list them with wafw00f --list.
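When wafw00f runs as part of a larger script, it can help to reduce its output to just the firewall name. The following is a minimal sketch, not part of wafw00f itself: the waf_name function is a hypothetical helper, and it assumes the plain-text "is behind ... WAF." line format shown in the sample output above, without terminal color codes.

```shell
# Hypothetical helper: read wafw00f output on stdin and print only the
# detected firewall name. Prints nothing if no "is behind ... WAF." line
# is present (assumption: the line format matches the sample output above).
waf_name() {
  sed -n 's/.* is behind \(.*\) WAF\..*/\1/p'
}

# Usage (requires wafw00f to be installed):
#   wafw00f https://example.com | waf_name
```

An empty result only means that this particular line was not found, so it should be treated as "no WAF identified" rather than as proof that none exists.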
whatweb
One of the first tools penetration testers turn to when investigating a website is whatweb, because it offers a naive but quick overview of a specific URL, highlighting points of interest. It can help identify early targets on a page that merit further investigation, from web server versions to uncommon HTTP headers, cookies and even contact information such as email addresses.
As with wafw00f, simply pass it a target URL:
whatweb https://example.com
The output will show a condensed stream of information, perfect for quickly scanning the most relevant parts of the page:
https://example.com [200 OK] Cookies[__Host-authjs.csrf-token,__Secure-authjs.callback-url], Country[GERMANY][DE], HTML5, HTTPServer[LiteSpeed], IP[1.2.3.4], LiteSpeed, UncommonHeaders[alt-svc], X-Powered-By[Express]
In the example output, you can clearly identify the web server as LiteSpeed, while the X-Powered-By header hints that the backend is written in JavaScript, using Node.js and Express.js.
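The condensed one-line summary is easier to grep or diff when every finding sits on its own line. Here is a small sketch of one way to do that. The whatweb_findings function is a hypothetical helper, and it assumes the "URL [status] Plugin[value], Plugin[value], ..." layout of the example above, with findings separated by a comma followed by a space.

```shell
# Hypothetical helper: read one whatweb summary line on stdin and print one
# finding per line. Assumes the layout "URL [status] Finding, Finding, ...".
whatweb_findings() {
  # Strip the leading "URL [status] " prefix, then split on ", ".
  # Commas without a trailing space (as in the Cookies list) are kept intact.
  sed 's/^[^ ]* \[[^]]*\] //' |
    awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Usage (requires whatweb; --color=never keeps escape codes out of the pipe):
#   whatweb --color=never https://example.com | whatweb_findings
```

A value that itself contains a comma followed by a space would be split incorrectly, so this is only suitable for quick inspection, not as a robust parser.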
Your output may vary; whatweb only displays hints that it is fairly sure about, but includes 1800 (!) plugins to find and identify interesting portions of a website. It runs in stealth mode by default, using only shallow, difficult-to-detect scans. You can make it more aggressive by setting the -a flag to 3:
whatweb -a 3 https://example.com
Aggression level 3 heuristically includes tests by category: if a base test shows that a feature exists, all tests for that feature will implicitly run too. (There is also aggression level 4, which runs all tests unconditionally, but you should never use it: more than 1800 tests result in a lot of requests for little benefit.)
nikto
For a more complete initial assessment of a target URL, nikto is very popular. It is designed as a complete vulnerability scanner, capable of finding anything from interesting files to misconfigurations or even SQL injections. It is a rather aggressive tool that often trips intrusion detection or firewall systems, and has no stealth scan mode.
Running nikto against a URL is very simple:
nikto -h https://example.com
The command may take several minutes to complete, because it runs a large number of tests while also limiting the request frequency to prevent overloading the target server or triggering rate limits.
The scanning time can be reduced by limiting the scan to specific vulnerabilities with the -Tuning flag:
nikto -h https://example.com -Tuning 9
The example uses tuning value 9 to restrict scanning to SQL injection vulnerabilities. See nikto -H for a list of all possible categories.
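Because the tuning values are single characters, a scripted scan can be easier to read with named categories. The sketch below is only an illustration: nikto_tuning_code and nikto_command are hypothetical helpers, the category names are made up for this example, and the codes other than 9 are assumptions that should be confirmed against nikto -H on the installed version.

```shell
# Hypothetical helper: map a readable category name to a nikto -Tuning code.
# Only a handful of categories are covered; verify the codes with "nikto -H".
nikto_tuning_code() {
  case "$1" in
    interesting-files) echo 1 ;;  # interesting files / seen in logs
    misconfiguration)  echo 2 ;;  # misconfiguration / default files
    info-disclosure)   echo 3 ;;  # information disclosure
    xss-injection)     echo 4 ;;  # injection (XSS/script/HTML)
    sql-injection)     echo 9 ;;  # SQL injection, as used in the example above
    *) echo "unknown category: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (but do not run) the nikto command for a
# target URL ($1) and a category name ($2).
nikto_command() {
  local code
  code="$(nikto_tuning_code "$2")" || return 1
  printf 'nikto -h %s -Tuning %s\n' "$1" "$code"
}

# Usage:
#   nikto_command https://example.com sql-injection
```

Printing the command instead of executing it leaves the decision to actually start a scan with the operator, which fits the caution advised in the disclaimer.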
Nikto can detect several thousand security issues, from infections and existing backdoors to misplaced configuration files and vulnerable database interactions. It even supports advanced evasion features like changing the URL encoding or switching the sequence used for whitespace. See man nikto for a complete explanation of all features.
Deeper reconnaissance
Once the initial tools are exhausted, a penetration tester may turn to more specialized tools for further information gathering or for more target-specific scans. For more thorough and deep scanning of targets, skipfish, OWASP ZAP and Burp Suite are often used, which can crawl entire pages and perform analysis more tailored to the host system, potentially at the expense of significantly increased load on the target system.
If a common CMS was detected during initial reconnaissance, specialized scanners like wpscan (WordPress) or joomscan (Joomla) may be used to search for CMS-specific misconfigurations or known vulnerabilities.
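The hand-off from generic fingerprinting to a CMS-specific scanner can be scripted as well. The following is a rough sketch under stated assumptions: suggest_cms_scanner is a hypothetical helper, and it assumes that the findings text (for example, captured whatweb output) mentions the CMS as "WordPress" or "Joomla".

```shell
# Hypothetical helper: given a target URL ($1) and the text of earlier
# findings ($2), print a suggested CMS-specific scanner command.
# Returns 1 if no known CMS name appears in the findings.
suggest_cms_scanner() {
  local url="$1" findings="$2"
  case "$findings" in
    *WordPress*) printf 'wpscan --url %s\n' "$url" ;;
    *Joomla*)    printf 'joomscan --url %s\n' "$url" ;;
    *)           return 1 ;;
  esac
}

# Usage (requires whatweb; the suggestion is printed, not executed):
#   suggest_cms_scanner https://example.com "$(whatweb --color=never https://example.com)"
```

A plain substring match can be fooled by pages that merely mention a CMS by name, so the suggestion is a starting point for manual confirmation rather than a reliable detection.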
If none of these tools yield sufficient results, a penetration tester may choose to manually investigate the target's source code and server responses. For some targets, especially custom-built applications, this is still the best option to find and assess security risks.
After reconnaissance, a penetration tester will typically pick points of interest gathered along the way for vulnerability testing, or turn to more aggressive tools like directory fuzzers if no data about potentially exploitable functionality has been found. The information gathering phase lays the foundation for the rest of the security assessment, and automated tools significantly reduce the time it takes to learn about the target system.