How to Recursively Download a Website with Wget

This article provides a practical guide on how to use the wget command-line utility to download an entire website for offline viewing. You will learn the essential terminal commands, the specific flags required for recursive downloading, and how to safely mirror a site without overloading the host server.

Understanding Recursive Downloading

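When you download a website recursively, wget follows the links on the initial page and downloads those subsequent pages as well. This process repeats up to a maximum depth, allowing you to fetch the structure of a site, including its HTML pages, images, stylesheets, and scripts. With the --recursive option alone, wget stops after five levels of links by default; the --level option changes that limit, and --mirror removes it entirely.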
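As a minimal sketch of depth-limited recursion, the helper below only prints the wget command it would run, so it is safe to try without touching the network. The function name recursive_cmd is hypothetical and exists only for illustration; --recursive and --level are standard wget options.

```shell
# Hypothetical helper: print (not run) a depth-limited recursive wget command.
# $1 = start URL, $2 = maximum link depth (defaults to 5, wget's own default).
recursive_cmd() {
  printf 'wget --recursive --level=%s %s\n' "${2:-5}" "$1"
}

# Follow links at most two hops away from the start page.
recursive_cmd https://example.com 2
```

To perform the download, run the printed command directly in your terminal.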

The Standard Website Mirroring Command

The most common way to download a complete website with wget is to combine the built-in --mirror flag with a few options that make the local copy browsable offline. The command looks like this:

wget --mirror --page-requisites --adjust-extension --convert-links --no-parent https://example.com

Breakdown of the Command Flags

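Each flag in the command serves a specific purpose to ensure the downloaded website functions correctly on your local machine:

--mirror: Turns on recursion with unlimited depth and timestamp checking, so re-running the command only fetches files that have changed.
--page-requisites: Downloads everything a page needs to render properly, such as images, stylesheets, and scripts.
--adjust-extension: Appends .html or .css to saved files whose names lack the matching extension, so they open correctly in a browser.
--convert-links: Rewrites links in the downloaded files, once the download has finished, so they point to the local copies.
--no-parent: Never ascends to the parent directory, which keeps the download inside the starting path.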
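For reference, each long flag in the mirroring command also has a short spelling. The sketch below stores both forms in variables (the variable names are illustrative) and prints the resulting commands rather than running them.

```shell
# Long and short spellings of the same five mirroring options.
long_opts="--mirror --page-requisites --adjust-extension --convert-links --no-parent"
short_opts="-m -p -E -k -np"

# Print both commands instead of executing them (dry run).
echo "wget $long_opts https://example.com"
echo "wget $short_opts https://example.com"
```

The long forms are easier to read in scripts and documentation; the short forms are convenient for interactive use.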

Being a Good Netizen: Adding Delays

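Downloading a website too quickly can strain the target server’s resources. To reduce the risk of your IP address being blocked and to practice good web citizenship, you should throttle your requests. In the command below, --wait=2 pauses for two seconds between retrievals, and --limit-rate=100k caps the download speed at roughly 100 KB per second.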

wget --mirror --page-requisites --adjust-extension --convert-links --no-parent --wait=2 --limit-rate=100k https://example.com
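As a hedged variation, the sketch below adds --random-wait, a standard wget option that is not part of the original command; it varies each pause between 0.5 and 1.5 times the --wait value so requests arrive at a less regular rhythm. The helper name polite_mirror_cmd and its defaults are assumptions for illustration, and it only prints the command.

```shell
# Hypothetical helper: print (not run) a throttled mirror command.
# $1 = URL, $2 = base delay in seconds (default 2), $3 = rate cap (default 100k).
polite_mirror_cmd() {
  printf 'wget --mirror --page-requisites --adjust-extension --convert-links --no-parent --wait=%s --random-wait --limit-rate=%s %s\n' \
    "${2:-2}" "${3:-100k}" "$1"
}

# Defaults match the delay and rate cap used in the article's command.
polite_mirror_cmd https://example.com
```

Raising the delay or lowering the rate cap makes the mirror slower but gentler on the host; the right values depend on the size of the site and the capacity of the server.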