Can wget filter downloads by file timestamp?
The wget command-line utility can absolutely filter
downloads based on the modification timestamp of files on a remote
server. With its time-stamping option, wget checks
whether a local copy already exists and compares its timestamp with that of the
file on the server. If the server's file is newer, wget downloads
it; otherwise, it skips the download, saving both time and bandwidth.
This feature is particularly useful for mirroring websites, automating
backups, and syncing data efficiently.
Understanding Time-Stamping in wget
By default, wget does not check timestamps at all: if a file
with the same name already exists, a plain download saves a second copy
(for example file.zip.1) instead of skipping it. When you enable
time-stamping, wget first checks the remote file's Last-Modified
time and only transfers the file when the server's copy appears to have changed.
To enable this behavior, you use the -N
(or --timestamping) option.
```bash
wget -N https://example.com/file.zip
```

When you run this command, wget takes the following actions:
- Checks for a local file: It looks for a file with the same name in the download directory (the current directory unless you set one with -P).
- Compares timestamps: If the file exists, wget compares its local modification time with the remote file's timestamp.
- Decides to download: If the remote file is newer (or if the file sizes differ), wget fetches the new version and sets the local file's modification time to match the server's. Otherwise (the local copy is at least as new and the same size), the download is skipped.
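To make the decision rule above concrete, here is a minimal simulation in bash. It is a hypothetical helper, not part of wget: both arguments are local files, with the first standing in for the remote copy, because real wget obtains the remote time and size from HTTP headers or FTP listings instead.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of the -N decision rule.
# $1 stands in for the remote copy, $2 for the local copy.
should_download() {
  local remote="$1" local_copy="$2"
  if [[ ! -e "$local_copy" ]]; then
    # No local copy yet: always fetch.
    echo download
  elif [[ "$remote" -nt "$local_copy" ]]; then
    # Remote copy is strictly newer than the local one: fetch.
    echo download
  elif (( $(wc -c < "$remote") != $(wc -c < "$local_copy") )); then
    # Same or older timestamp, but the sizes differ: fetch anyway.
    echo download
  else
    # Local copy is at least as new and the same size: skip.
    echo skip
  fi
}
```

Calling should_download with a stand-in remote file that has a later modification time than the local one prints download; giving both files the same time and size prints skip.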
Key Options for Time-Based Filtering
Beyond the standard -N flag, wget offers a
few variations and complementary settings to fine-tune how it handles
file timestamps.
1. The Timestamping Option (-N)
As the primary tool for this task, it is best used for recurring downloads or cron jobs where you only want updates.
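For a recurring job, one option is to wrap the time-stamped download in a small script that cron calls. This is a sketch under assumptions: the script name, the /srv/data directory, the schedule, and the sync_data function are all hypothetical; only the -N and -q flags come from wget itself.

```bash
#!/usr/bin/env bash
# Hypothetical sync script. An example crontab entry (path and schedule assumed):
#   0 3 * * * /usr/local/bin/sync-data.sh /srv/data https://example.com/data.csv

# The body runs in a subshell ( ... ) so the cd below does not leak to the caller.
sync_data() (
  dest="$1" url="$2"
  # Work inside the destination: -N compares against the copy saved there.
  cd "$dest" || exit 1
  # -N: download only if the server copy is newer (or differs in size).
  # -q: suppress wget's output so the cron job stays quiet.
  wget -N -q "$url"
)

# Run only when invoked with exactly two arguments (destination and URL).
if [[ $# -eq 2 ]]; then
  sync_data "$1" "$2"
fi
```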
```bash
wget --timestamping https://example.com/data.csv
```

2. Combining with Mirroring (-m)
If you want to back up an entire directory or website while
respecting timestamps, use the mirror flag. The -m option
turns on recursion and time-stamping (-N), sets infinite
recursion depth, and keeps FTP directory listings.
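The wget manual describes -m as currently equivalent to -r -N -l inf --no-remove-listing, so the same mirror can be written with the flags spelled out, which makes it easy to see that time-stamping is part of it. The mirror_tree helper is hypothetical, and --no-parent is an optional extra (not part of -m) that keeps recursion from climbing above the starting URL.

```bash
#!/usr/bin/env bash
# Spelled-out equivalent of -m, per the wget manual:
#   -r                   recurse into links
#   -N                   time-stamping (skip unchanged files)
#   -l inf               unlimited recursion depth
#   --no-remove-listing  keep the .listing files from FTP retrievals
mirror_flags=(-r -N -l inf --no-remove-listing)

# Hypothetical helper: mirror a directory tree, re-fetching only changed files.
# --no-parent is an optional addition, not part of -m.
mirror_tree() {
  wget "${mirror_flags[@]}" --no-parent "$1"
}
```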
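```bash
wget -m https://example.com/downloads/
```

3. Backing Up Local Files (--backups)
When syncing, if you want to keep your old local files instead of having them overwritten when a newer server file is found, add --backups=N (available in newer wget releases). Before overwriting a file, wget renames the existing copy with a .1 suffix and rotates older backups to .2, .3, and so on, keeping at most N. The similarly named --backup-converted (-K) option serves a different purpose: when link conversion (-k) rewrites downloaded pages, it saves each unconverted original with a .orig suffix so that later -N runs compare the server's file against the original rather than the rewritten copy.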
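A minimal sketch of both backup-related options, assuming a wget release recent enough to support --backups. The helper names refresh_with_backups and mirror_converted are hypothetical, and the backup count of 3 is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: refresh a file with -N while keeping up to three older copies.
# --backups=3 renames an existing file to .1 (rotating to .2, .3) before overwriting it.
refresh_with_backups() {
  wget -N --backups=3 "$1"
}

# Hypothetical helper: mirror with link conversion (-k). -K keeps each
# unconverted page as *.orig so later -N runs compare against the original.
mirror_converted() {
  wget -m -k -K "$1"
}
```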
Important Limitations to Keep in Mind
While filtering by modification date is highly effective, it depends heavily on how the server exposes timestamps for the protocol in use.
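- Server Support: wget relies on HTTP headers (like Last-Modified) or FTP directory listings to read timestamps. If a web server generates pages dynamically or omits the Last-Modified header, wget cannot determine the file's age and will typically download it anyway.
- Local Clock Accuracy: wget sets each downloaded file's modification time to the server's timestamp, so comparisons for files it fetched itself are largely unaffected by your clock. Files created or edited locally, however, carry timestamps from your system clock, so keep it properly synchronized (via NTP) to avoid false comparisons.
- HTTP vs. FTP: On FTP servers, wget parses the remote directory listing to find dates. On HTTP/HTTPS servers, recent versions send the download request with an If-Modified-Since header, letting the server answer "304 Not Modified" instead of resending unchanged data; with --no-if-modified-since, wget instead sends a preliminary HEAD request to read the file metadata before deciding whether to send a GET request for the actual data.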
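Because time-stamping over HTTP hinges on the server sending Last-Modified, it can help to check a URL before relying on -N. A minimal sketch using wget's --spider (check the URL without saving it) and -S (print the server's response headers, which wget writes to stderr); the has_last_modified function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds if the server reports a Last-Modified header for $1.
# wget -S indents header lines, so leading whitespace is allowed in the match.
has_last_modified() {
  wget --spider -S "$1" 2>&1 | grep -qi '^[[:space:]]*Last-Modified:'
}
```

If has_last_modified fails for a URL, -N has nothing to compare against, and you should expect wget to download that file on every run.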