How Does wget Check If Remote File Is Newer?
When timestamping is enabled via the -N (or
--timestamping) option, wget determines if a
remote file is newer than a local file by comparing their modification
times and sizes. Instead of downloading the entire file upfront,
wget issues an initial request to gather metadata from the
remote server. By evaluating specific HTTP headers or FTP file listings
against the local file’s attributes, it decides whether a fresh download
is necessary or if the local copy is already up to date.
The Mechanism for HTTP/HTTPS
For web-based downloads, wget relies on standard HTTP
headers to make its decision without transferring the file body unnecessarily.
1. The HEAD Request
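In its classic behavior, wget starts by sending a HEAD
request instead of a standard GET. This asks the server to return only
the response headers, omitting the actual file content. Recent wget
releases instead send a GET carrying an If-Modified-Since header by
default, so an unchanged file yields a 304 Not Modified response with no
body; the --no-if-modified-since option restores the preliminary HEAD
request.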
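A header-only request of this kind can be sketched in Python with the standard library. This is an illustration of a HEAD request in general, not wget's own code; the function names build_head_request and fetch_headers are hypothetical.

```python
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server for headers only (no body)."""
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """Send the HEAD request and return the response headers as a dict.

    Hypothetical helper: it performs real network I/O, so call it only
    against a server you are allowed to query.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return dict(resp.headers.items())
```

Calling fetch_headers on a file URL would typically return entries such as Last-Modified and Content-Length, provided the server sends them.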
2. Inspecting the Headers
Once the server responds, wget looks for two critical
pieces of information:
- Last-Modified: This header indicates the exact date and time the file was last altered on the server.
- Content-Length: This header specifies the size of the remote file in bytes.
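As a rough sketch, the two headers can be turned into comparable values like this. The helper name parse_remote_metadata is hypothetical, and the code assumes the headers arrive as a plain mapping with canonical capitalization.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional, Tuple


def parse_remote_metadata(
    headers: Mapping[str, str],
) -> Tuple[Optional[datetime], Optional[int]]:
    """Extract (modification time, size in bytes) from response headers.

    Either value is None when the server did not send the header.
    Assumption: keys use canonical capitalization; a plain dict lookup
    is case-sensitive.
    """
    last_modified = headers.get("Last-Modified")
    content_length = headers.get("Content-Length")
    # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT".
    mtime = parsedate_to_datetime(last_modified) if last_modified else None
    size = int(content_length) if content_length is not None else None
    return mtime, size
```

A server may omit either header, which is why both results are optional in this sketch.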
3. The Comparison Logic
wget compares these values against the local file’s
metadata. It will proceed to download the remote file if either of the
following is true:
- The remote Last-Modified timestamp is strictly newer than the local file’s modification time.
- The remote file size (Content-Length) differs from the local file size, even if the timestamps match.
If the local file is at least as new as the remote one and the sizes match,
wget skips the download, saving time and data.
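The decision rule above can be expressed as a small predicate. This is a minimal sketch rather than wget's implementation: needs_download is a hypothetical name, timestamps are taken as seconds since the epoch, and treating a missing local file as "download" is an assumption added here for completeness.

```python
import os
from typing import Optional


def needs_download(
    local_path: str, remote_mtime: float, remote_size: Optional[int]
) -> bool:
    """Return True when the remote copy should be fetched."""
    # Assumption: with no local copy there is nothing to compare against.
    if not os.path.exists(local_path):
        return True
    st = os.stat(local_path)
    # Remote file is strictly newer than the local one.
    if remote_mtime > st.st_mtime:
        return True
    # Sizes differ, regardless of what the timestamps say.
    if remote_size is not None and remote_size != st.st_size:
        return True
    return False
```

Checking the size in addition to the timestamp catches cases such as a truncated earlier download whose modification time still looks current.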
The Mechanism for FTP
When downloading from an FTP server, the process relies on different commands since HTTP headers are not available.
1. Checking File Attributes
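wget needs the remote file’s modification time and size, which FTP
exposes differently than HTTP. The wget manual describes FTP
timestamping as based on the server’s directory listing: wget requests
the listing, treats it like Unix ls -l output, and parses the file’s
size and date from it. FTP also defines an MDTM (Modification Time)
command that returns an exact timestamp, but listing parsing is the
documented mechanism, and because listings often show only minutes or
only a date, the parsed time can be coarser than an HTTP Last-Modified value.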
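To make the listing-based approach concrete, here is a minimal sketch that pulls the name, size, and date out of one Unix-style listing line. It is not wget's parser: parse_listing_line and its assumed_year parameter are hypothetical, it assumes the common nine-field ls -l layout, and it relies on English month abbreviations.

```python
from datetime import datetime
from typing import Tuple


def parse_listing_line(line: str, assumed_year: int) -> Tuple[str, int, datetime]:
    """Parse one 'ls -l' style line into (name, size, modification time).

    Assumed layout: perms links owner group size month day time-or-year name.
    Recent files show 'HH:MM' with no year, so the caller supplies one.
    """
    parts = line.split(None, 8)
    if len(parts) < 9:
        raise ValueError(f"unrecognized listing line: {line!r}")
    size = int(parts[4])
    month, day, time_or_year, name = parts[5], parts[6], parts[7], parts[8]
    if ":" in time_or_year:
        # Minute precision, year not shown in the listing.
        stamp = datetime.strptime(
            f"{assumed_year} {month} {day} {time_or_year}", "%Y %b %d %H:%M"
        )
    else:
        # Older files: the listing gives a year but no time of day.
        stamp = datetime.strptime(f"{time_or_year} {month} {day}", "%Y %b %d")
    return name, size, stamp
```

The two branches show why listing timestamps are less precise than HTTP headers: one form lacks the year, the other lacks the time of day.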
2. Comparison and Local Preservation
Similar to HTTP, wget checks if the remote timestamp is
newer or if the file size has changed. If a download does occur,
wget updates the local file’s modification time to match
the remote timestamp, ensuring future checks remain accurate.
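Stamping a downloaded file with the server's time can be sketched with os.utime. The helper name apply_remote_timestamp is hypothetical, and the remote time is assumed to be given in seconds since the epoch.

```python
import os


def apply_remote_timestamp(local_path: str, remote_mtime: float) -> None:
    """Set the local file's modification time to the remote timestamp.

    The access time is left as it was; only the modification time changes.
    """
    st = os.stat(local_path)
    os.utime(local_path, (st.st_atime, remote_mtime))
```

Without this step the local file would carry the time of the download, which would usually look newer than the server's copy and would make later timestamp comparisons unreliable.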