How to Download Specific File Types with Wget?
The wget command-line tool is a powerful utility for
downloading files from the web, but downloading an entire site when you
only need specific file types can waste time and bandwidth. By using the
accept (-A) and reject (-R) flags, you can
fine-tune your download requests to target only specific extensions
like PDFs, JPEGs, or MP3s. This article provides a quick guide on how to
filter your wget downloads by file extension, whether you
are grabbing a single file type or mirroring a directory structure.
Using the Accept Flag for Specific Extensions
The most direct way to limit your downloads to specific file types is
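by using the -A (or --accept) option. This
tells wget to keep only files whose names end with the extensions you
define. During a recursive run it still fetches HTML pages to discover
links, then deletes the ones that do not match.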
To download only PDF files from a specific directory, use the following syntax:
wget -r -A pdf http://example.com/directory/
If you need to target multiple file extensions at once, such as both PDFs and JPEG images, you can separate the extensions with a comma:
wget -r -A pdf,jpg,jpeg http://example.com/directory/
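When the list of extensions comes from a script rather than being typed by hand, the comma-separated value for -A can be assembled programmatically. The sketch below is one possible way to do that in POSIX shell. The helper name join_exts is hypothetical, and the final line only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments with commas, e.g. "pdf jpg" -> "pdf,jpg".
# The body runs in a subshell (note the parentheses) so the IFS change does not leak out.
join_exts() (
    IFS=,
    printf '%s\n' "$*"
)

accept_list=$(join_exts pdf jpg jpeg)

# Dry run: print the command for inspection. Remove "echo" to actually download.
echo wget -r -A "$accept_list" http://example.com/directory/
```

Printing the command first makes it easy to check the accept list before any network traffic happens.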
Key Flags to Use with Extension Filtering
When filtering by extension, you will usually need to combine the accept flag with other options to make the command work efficiently:
-r (Recursive): This tells wget to follow links and travel through directories. Extension filtering is most effective when used recursively, as a single direct URL link usually overrides the filter.
-l [depth] (Level): Limits how deep wget will traverse into the subdirectories (e.g., -l 2 goes two levels deep).
-nd (No Directories): Forces wget to dump all downloaded files into a single local folder instead of recreating the server’s directory hierarchy.
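To see how these flags fit together, here is a sketch that combines them in one command. The URL, the depth of 2, and the variable names are placeholders chosen for illustration. The command is printed rather than executed so it can be checked first.

```shell
#!/bin/sh
# Placeholder values for illustration only.
url="http://example.com/directory/"
depth=2
types="pdf"

# -r  : follow links recursively
# -l  : stop after $depth levels
# -nd : save everything into the current folder, without the server's directory tree
# -A  : keep only files ending in the listed extensions
cmd="wget -r -l $depth -nd -A $types $url"

# Dry run: show the assembled command; paste the printed line into a terminal to run it.
echo "$cmd"
```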
Excluding Specific Extensions
Conversely, if you want to download everything except a
certain file type, you can use the -R (or
--reject) flag. This is useful if a site contains massive
video or zip files that you want to skip.
wget -r -R mp4,zip http://example.com/directory/
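The effect of a reject list can be previewed locally before touching the network. The sketch below is a simplified model, not wget's actual implementation: it assumes plain, case-sensitive suffix matching, and the function name is_rejected and the sample file names are hypothetical.

```shell
#!/bin/sh
# Simplified model of "-R mp4,zip": a name is rejected when it ends in a listed suffix.
# Assumption: plain, case-sensitive suffix matching. Real wget also accepts wildcard patterns.
is_rejected() {
    case "$1" in
        *mp4|*zip) return 0 ;;   # would be skipped
        *)         return 1 ;;   # would be kept
    esac
}

# Hypothetical file names, used only to show the decision for each one.
for name in intro.mp4 archive.zip notes.pdf photo.jpg; do
    if is_rejected "$name"; then
        echo "skip $name"
    else
        echo "keep $name"
    fi
done
```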
A Practical Example
If you want to download all images (PNGs and JPEGs) from a page and save them directly into your current folder without creating nested subfolders, your command would look like this:
wget -r -nd -A png,jpg,jpeg http://example.com/gallery/
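After a run like this, it can be worth confirming that only the intended file types were saved. The following sketch uses the standard find utility for that. The function name list_unexpected is hypothetical, and its extension list is assumed to mirror the -A value in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print files under directory $1 whose names do NOT end in
# .png, .jpg, or .jpeg. An empty result suggests nothing else was left behind.
list_unexpected() {
    find "$1" -type f ! -name '*.png' ! -name '*.jpg' ! -name '*.jpeg'
}

# Check the current folder, where the -nd download placed its files.
list_unexpected .
```

Any other file that happens to sit in the same folder, including the script itself, will also show up in the listing, so the check is most useful in a folder dedicated to the download.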