How to Prevent Wget from Ascending to the Parent Directory
When you download files recursively with the wget command-line utility, it will by default follow links that lead into parent directories and pull in files you never wanted. This article gives a quick overview of how to keep wget inside the target directory using specific command-line flags. We will explore the primary solution, discuss why this behavior happens by default, and look at additional flags to fine-tune your recursive downloads.
The Short Answer: Use the -np or --no-parent Flag
The most direct way to stop wget from ascending to a
parent directory during a recursive download is to use the
-np (or
--no-parent) option.
When you start a recursive download, wget follows the links it finds in each page. If a page links to a higher-level directory (like a “Back to Home” or “Parent Directory” link), wget will follow that link too by default. Enabling -np tells wget to ignore any link that does not point inside the starting directory or one of its subdirectories.
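To make that rule concrete, here is a small POSIX shell function that models it. This is a simplified sketch, not wget's actual implementation: the hypothetical is_below treats everything up to the last "/" of the starting URL as the starting directory and accepts a link only if it begins with that prefix. It assumes absolute URLs that already contain a path, and it ignores redirects and "../" segments that a real crawler has to resolve first.

```shell
#!/bin/sh
# Simplified model of the --no-parent check (hypothetical helper,
# not wget's real code). Assumes both arguments are absolute URLs
# that already contain a path component.

# is_below START_URL LINK_URL
# Exit status 0 if LINK_URL lies inside the directory START_URL names,
# 1 otherwise.
is_below() {
  start=$1
  link=$2
  # Drop everything after the last "/" but keep the slash itself:
  #   http://example.com/subdir/ -> http://example.com/subdir/
  #   http://example.com/subdir  -> http://example.com/
  start_dir=${start%/*}/
  # Quoting "$start_dir" makes it a literal prefix; only the trailing *
  # acts as a pattern.
  case $link in
    "$start_dir"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With a starting URL of http://example.com/subdir/, a link to http://example.com/subdir/docs/a.txt passes the check, while a link back to http://example.com/ or across to http://example.com/other/b.txt is rejected.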
Here is the standard command structure:
wget -r -np http://example.com/subdir/
Breakdown of the Command Options
To ensure your download works exactly as intended, it helps to understand what each part of the command is doing:
-r (or --recursive): Turns on recursive downloading, letting wget follow links and download a directory tree, up to a default maximum depth of five levels.
-np (or --no-parent): Restricts the download to the specified directory and its subdirectories, preventing any upward traversal.
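If you run this command often, a small wrapper can spell out the long-form flags and guard against a missing trailing slash. wget decides what counts as a directory from that slash: given http://example.com/subdir, it treats subdir as a file name, so -np would only keep the download below http://example.com/. The name fetch_subdir is hypothetical, and the sketch assumes the URL you pass always names a directory.

```shell
#!/bin/sh
# fetch_subdir URL
# Hypothetical wrapper: recursively download the directory named by URL
# without ascending to its parent. Assumes URL names a directory.
fetch_subdir() {
  url=$1
  # wget treats the last path segment as a directory only if the URL
  # ends in "/", so append one when it is missing.
  case $url in
    */) ;;
    *) url=$url/ ;;
  esac
  # Long-form equivalent of: wget -r -np "$url"
  wget --recursive --no-parent "$url"
}

# Usage (downloads from the network):
# fetch_subdir http://example.com/subdir
```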
Additional Tips for Cleaner Downloads
When downloading directories recursively, combining the no-parent flag with a few other options can prevent cluttering your local system:
-k (or --convert-links): After the download completes, converts the links in the downloaded documents so they work for local viewing.
-N (or --timestamping): Turns on timestamping, so wget downloads a file only if the remote copy is newer than the one you already have locally.
--reject 'index.html*': Discards the web server's auto-generated index pages (wget still fetches them to discover links, then deletes them), leaving you with only the files you actually want. Quote the pattern so your shell does not expand the asterisk itself.
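Put together, the options above might look like the following hypothetical mirror_subdir wrapper, a sketch that assumes you want all four behaviors at once. The reject pattern is single-quoted so the shell passes index.html* to wget literally instead of expanding it against files in your current directory.

```shell
#!/bin/sh
# mirror_subdir URL
# Hypothetical wrapper combining the flags discussed above. URL should
# end in "/" so wget treats its last segment as a directory.
mirror_subdir() {
  # -r   recurse by following links
  # -np  never ascend above the starting directory
  # -k   rewrite links for local viewing once the download finishes
  # -N   only download files whose remote copy is newer than the local one
  # --reject 'index.html*'  discard server-generated index pages;
  #      quoted so the shell does not glob-expand the asterisk
  wget -r -np -k -N --reject 'index.html*' "$1"
}

# Usage (downloads from the network):
# mirror_subdir http://example.com/subdir/
```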
By default, wget treats a link to the parent directory like any other link. Explicitly adding -np keeps your recursive downloads strictly contained within your target directory.