Practical wget use cases

Installed by default on virtually all Linux distributions, wget is a time-tested tool for automated web access. While most users only use it to download one file at a time, it has many use cases beyond that, from interacting with HTTP APIs to emulating user sessions.

Probing server behavior

Anyone who has dabbled in web security or penetration testing may be familiar with the --spider flag, which prevents downloading the remote document, in combination with --server-response to show the server's HTTP response headers:

wget --spider --server-response 'https://example.com'  

The output will show a lot of meta-information about the remote system:

Spider mode enabled. Check if remote file exists.  
--2025-12-02 12:23:12-- https://example.com/  
Resolving example.com (example.com)... 23.215.0.138, 23.220.75.232, 23.220.75.245, ...  
Connecting to example.com (example.com)|23.215.0.138|:443... connected.  
HTTP request sent, awaiting response...  
HTTP/1.1 200 OK  
Content-Type: text/html  
ETag: "bc2473a18e003bdb249eba5ce893033f:1760028122.592274"  
Last-Modified: Thu, 09 Oct 2025 16:42:02 GMT  
Cache-Control: max-age=86000  
Date: Tue, 02 Dec 2025 11:23:13 GMT  
Connection: keep-alive  
Vary: Accept-Encoding  
Alt-Svc: h3=":443"; ma=93600  
Length: unspecified [text/html]  
Remote file exists and could contain further links,  
but recursion is disabled -- not retrieving.  

From this single response alone, we can tell that the server supports HTTP/3 (Alt-Svc), caching (Cache-Control) and conditional cache revalidation (ETag, Last-Modified), and that it may return different responses when other encodings are requested (Vary).
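
Since --spider does not download anything, it also works well as a quick link checker: wget exits with a zero status when the resource exists and a non-zero status when the server reports an error, so it can drive simple scripts. A minimal sketch (the URL is just an example):

wget --spider -q 'https://example.com/some/page' \
  && echo 'link OK' || echo 'link broken or unreachable'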

Downloading websites for offline use

By combining a few flags, wget can act as an archival program, downloading an entire web page including necessary assets like images, stylesheets and scripts, and rewriting links within the source code to point at the local copies.


You can do this for a single page:

wget --convert-links --adjust-extension \
  --page-requisites 'https://example.com'  

The --page-requisites flag tells wget to download the page and all assets necessary for viewing it, such as images, stylesheets or scripts. If the remote server uses file extensions other than .html for HTML responses, like .php, .jsp or .aspx, the --adjust-extension flag appends .html to their filenames so they work locally. Finally, --convert-links rewrites all links on the page that reference downloaded assets to use the local versions, replacing absolute paths and domains with local relative paths.
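
If you archive pages regularly, it can help to keep each snapshot in its own directory. A sketch using --directory-prefix (the directory name is just an example):

wget --convert-links --adjust-extension \
  --page-requisites \
  --directory-prefix 'archive/example.com' \
  'https://example.com'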


An entire website and all pages linked on it can also be downloaded with a few adjustments:

wget --mirror --convert-links \
  --adjust-extension --page-requisites \
  --span-hosts --domains 'example.com' \
  'https://example.com'  

The only notable differences to the single-page command are --mirror, which recursively downloads all pages linked from the origin page, and --span-hosts, which allows downloading files from other hosts (domains/subdomains), combined with --domains to limit the allowed hosts to the desired domain and its subdomains. Combining these last two options is important to prevent runaway crawling: without --domains, every discovered domain would be crawled and downloaded as well, and if the page contains a link to Wikipedia, wget will try to download the entirety of Wikipedia to your machine.
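
For large sites, you may want to constrain the crawl further. A sketch that swaps --mirror for an explicit recursion depth and skips some heavy file types; the depth and the rejection list are purely illustrative:

wget --recursive --level 2 --timestamping \
  --convert-links --adjust-extension \
  --page-requisites \
  --reject 'mp4,webm,iso' \
  --wait 1 --random-wait \
  --span-hosts --domains 'example.com' \
  'https://example.com'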


This does not guarantee that all website contents are downloaded; wget simply starts at the given URL, follows every link it finds and downloads those pages as well. Pages not linked to from within this link tree will be missing.

Syncing directories between hosts

If you want to quickly share files between two machines on the same network, you can spin up a Python HTTP server in the local directory you want to sync:

python3 -m http.server 8000  

Then use wget on the other machine to download all files recursively (excluding generated index.html files):

wget --mirror --no-parent \
  --reject index.html \
  'http://1.2.3.4:8000/'  

Using this approach has many advantages: it only requires python3 and wget, which are installed by default on virtually every Linux distro; you need no shared storage or physical devices and no access credentials; and the traffic is plain HTTP, which most firewall setups leave unrestricted. On the flip side, this also means there is no authentication and no encryption, so anybody stumbling across the open port can access and download your files too. Only use this approach for files that aren't confidential, and only in local networks you trust (do not use it to download files from your internet-facing servers!).
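
By default, wget places the downloaded files inside a directory named after the host (here 1.2.3.4:8000). If you would rather have them land directly in the current directory, --no-host-directories drops that extra level; a variant of the command above:

wget --mirror --no-parent \
  --reject index.html \
  --no-host-directories \
  'http://1.2.3.4:8000/'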

Downloading files

File downloads are at the heart of wget, but most users miss out on important quality-of-life features. Probably the most important one is resumable downloads, enabled by passing -c:

wget -c 'https://example.com/large.zip'  

If the download is interrupted, you can simply re-run the command and wget will try to continue the download where it left off (if the server supports it). Large files downloaded over wireless networks benefit from this option in particular.
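
On particularly flaky connections, you can combine resuming with more persistent retry behavior; the values below are arbitrary:

wget -c --tries 20 --timeout 30 \
  --retry-connrefused \
  'https://example.com/large.zip'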


When downloading multiple files, you can write their URLs into a text file (e.g. urls.txt), one per line:

https://example.com/first.zip  
https://example.com/second.zip  
https://example.com/third.zip  

then instruct wget to download them all in order:

wget -c -i urls.txt  

Enabling resumable downloads allows re-running the command in case it fails without losing progress.


If you are downloading many files or very large ones, you can limit bandwidth usage:

wget -c -i urls.txt --limit-rate=2m  

This allows the machine to be used for other network tasks while downloading, without wget consuming all available network bandwidth.


To avoid hitting server limits, consider adding a wait time in seconds using --wait, potentially combined with --random-wait to help avoid triggering bot protections that ban clients with too-frequent or suspiciously regular request intervals.
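
For example, adding a randomized pause of around two seconds between downloads (the base delay is arbitrary):

wget -c -i urls.txt \
  --wait 2 --random-wait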

Interacting with REST APIs

Many companies expose REST APIs to interact with their services, and these interactions can be automated with wget. In most cases, you will need three features: setting the HTTP method, the request headers and the request body:

wget --method PUT \
  --header 'API-Key: 12345678' \
  --body-data '{"name": "John Doe"}' \
  'https://example.com/api/user'  

You can also load the request body from a local file using --body-file if you don't want to pass it as a string on the terminal.
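
For example, a sketch posting a JSON payload from a local file; the file name, headers and endpoint are placeholders, and -O - prints the response body to standard output:

wget --method POST \
  --header 'Content-Type: application/json' \
  --header 'API-Key: 12345678' \
  --body-file payload.json \
  -O - 'https://example.com/api/user'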


If the API uses HTTP basic authentication instead, that is also supported:

wget --method GET \
  --user 'myuser' --password 'mytoken' \
  'https://example.com/api/user/123'  

Using wget works fine for most APIs, but keep in mind that it is rather limited when dealing with form data: query strings in URLs and request bodies are expected to be properly URL-formatted and percent-encoded, and advanced features like loading a file's contents into a named form field aren't supported (try curl for that).

Emulating user sessions

Many forms of web automation need to interact with frontends built for humans, with no direct support for scriptable access like APIs. For authentication, most user-facing websites have a login mechanism which stores a cookie with session information on success. For scriptable logins like form submissions, you can save the returned cookies in a local file for later use:

wget --save-cookies cookies.txt \
  --keep-session-cookies \
  --post-data "user=MYUSER&password=MYPASSWORD" \
  'https://example.com/login'

If the login mechanism isn't easily scriptable, for example because it uses CSRF tokens, login links sent by email or two-factor authentication, you will need to retrieve the cookie manually and write it to the cookies.txt file, e.g. using a browser extension.
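
wget expects the Netscape cookie file format: one tab-separated line per cookie containing the domain, a subdomain flag, the path, a secure flag, the expiry as a Unix timestamp, the cookie name and its value. A hypothetical entry could look like this (all values are placeholders):

# Netscape HTTP Cookie File
.example.com	TRUE	/	TRUE	1767225600	sessionid	abc123def456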


Once you have a session cookie, you can use it to interact with pages protected by authentication:

wget --load-cookies cookies.txt 'https://example.com/profiles/@me'  

Some sites have strict bot protections which interfere with emulated user sessions. You may be able to partially work around them by making your requests look more like a real browser's:

wget \
  --user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:119.0) Gecko/20100101 Firefox/119.0' \
  --referer 'https://google.com' \
  --load-cookies cookies.txt \
  'https://example.com'

Note that this does not help against more sophisticated checks based on JavaScript or CAPTCHAs.
