What Does the Mirror Flag Do in wget?
The wget command-line tool is a popular utility for
downloading files from the web, and its mirror flag (-m or
--mirror) is a convenient shortcut for backing up or
duplicating entire websites. This flag acts as a master switch that
enables a specific combination of options designed to systematically
crawl a site and download the pages and files it links to. By understanding how the
mirror flag works and combining it with a few other essential
parameters, you can efficiently create offline archives of web pages
while respecting server resources.
The Anatomy of the Mirror Flag
When you use -m or --mirror in a
wget command, you aren’t just turning on a single feature.
Instead, wget automatically activates a bundle of four
distinct settings optimized for website replication:
- Time-stamping (-N): This ensures that wget only downloads files if the remote version is newer than your local copy. It prevents you from wasting bandwidth redownloading unchanged data during subsequent runs.
- Infinite Recursion (-r with -l inf): Standard recursion in wget stops after five levels of links. The mirror flag sets the recursion depth to infinity, allowing the tool to follow links as deep as the website goes.
- Keep Directory Listing (--no-remove-listing): This tells wget to retain the FTP directory listings (.listing files) generated during the process, which is useful for tracking structure.
In practice, running wget -m https://example.com is
identical to running
wget -r -l inf -N --no-remove-listing https://example.com,
but much easier to type and remember.
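For scripts that should document themselves, the same expansion can also be written with long option names. The following is a minimal sketch: expand_mirror is a hypothetical helper name (not part of wget), and it only prints the command rather than running it.

```shell
# Hypothetical helper (not part of wget): print the long-form command
# that -m stands for, without contacting any server.
expand_mirror() {
  # $1 is the site to mirror
  printf 'wget --recursive --level=inf --timestamping --no-remove-listing %s\n' "$1"
}

expand_mirror "https://example.com"
```

Printing the command first is a cheap way to check what a script is about to run before pointing it at a real site.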
Crucial Companion Flags for Local Browsing
If your goal is to download a website so you can browse it offline on
your local computer, -m alone is often not enough.
Links in the downloaded pages will still point to the live site (e.g.,
https://example.com/about) rather than your local files. To
fix this, you should combine the mirror flag with a few other
options:
wget -m -k -p -E https://example.com
Here is what those additional flags do to make your mirrored site fully functional offline:
- -k (--convert-links): After the download is complete, this converts the links in the documents to point to your local files so you can click through the site offline.
- -p (--page-requisites): This forces wget to download all the assets needed to display each page correctly, such as images, CSS stylesheets, and sound files, even if the recursive crawl would not otherwise fetch them.
- -E (--adjust-extension): If an HTML page ends in an unconventional extension or has no extension at all (like a CGI or PHP generated page), this forces wget to append .html to the filename so your local browser opens it properly.
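The combination described above can be wrapped in a small shell function so the long option names explain themselves. This is a sketch under assumptions: mirror_offline is a hypothetical name chosen for illustration, and https://example.com stands in for the site you actually want to archive.

```shell
# Hypothetical wrapper: mirror a site and prepare it for offline browsing.
# Long option names are used so each setting is readable at a glance.
mirror_offline() {
  # $1 is the site to mirror
  wget --mirror --convert-links --page-requisites --adjust-extension "$1"
}

# Usage (performs real network requests, so it is left commented out):
# mirror_offline "https://example.com"
```

By default wget saves the mirror under a directory named after the host, so the example above would be expected to produce an example.com folder in the current directory.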
Best Practices and Server Etiquette
Mirroring a website can put a heavy load on the host server because
wget requests pages much faster than a human browsing the
site would. To avoid getting your IP address banned or accidentally
crashing a small server, it is highly recommended to introduce a delay
between requests using the -w or --wait
flag.
For example, adding -w 2 tells wget to wait
two seconds between every file it downloads. You can also use
--random-wait to vary the delay between 0.5 and 1.5 times
your specified wait time, making the traffic pattern look more natural
to server firewalls.
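To make the arithmetic concrete, the sketch below computes the delay range that --random-wait would produce for a given base wait, and shows one way to combine the two flags with the mirror flag. The names wait_range and polite_mirror are hypothetical helpers, and the two-second value is only the example used above.

```shell
# Hypothetical helper: print the minimum and maximum delay (in seconds)
# that --random-wait yields for a base wait, i.e. 0.5x and 1.5x the base.
wait_range() {
  awk -v w="$1" 'BEGIN { printf "%.1f %.1f\n", 0.5 * w, 1.5 * w }'
}

# Hypothetical wrapper: mirror a site with a randomized two-second pause
# between requests.
polite_mirror() {
  # $1 is the site to mirror
  wget --mirror --wait=2 --random-wait "$1"
}

wait_range 2   # prints: 1.0 3.0

# Usage (performs real network requests, so it is left commented out):
# polite_mirror "https://example.com"
```

With a base wait of two seconds, each pause therefore falls somewhere between one and three seconds.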