How Does wget Handle URLs With Spaces or Special Characters?
When downloading files with the command-line utility
wget, URLs containing spaces or other special characters (such as
ampersands or question marks) can cause the command to fail or be
interpreted incorrectly. This article gives a quick
overview of how wget processes these problematic URLs and
explains the two essential techniques, proper shell quoting and URL
encoding, that let your downloads complete successfully
without triggering syntax errors.
The Problem with Special Characters in the Shell
The core issue does not usually lie within wget itself,
but rather with how the command-line shell (like Bash or Zsh) interprets
characters before passing them to the utility.
- Spaces: The shell uses spaces to separate arguments. If a URL contains an unquoted space, wget will treat it as two separate URLs, leading to “scheme missing” or “404 Not Found” errors.
- Ampersands (&): In Unix-like shells, the ampersand is used to run a command in the background. If a URL has a query string containing &, the shell splits the command at that character and pushes the first half to the background.
- Other Characters: Characters like ?, *, or $ have special meanings in the shell (wildcards and variable expansions) and can corrupt the URL structure.
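The splitting described above can be observed without touching the network. The following is a minimal sketch, assuming a POSIX sh or Bash session (Zsh does not split unquoted variables by default, although a URL typed unquoted at the prompt is still split); count_args is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

url='https://example.com/download/file name with spaces.pdf'

# Unquoted: the shell splits the value on spaces before the command runs,
# so a program such as wget would see several separate "URLs".
count_args $url      # prints 4

# Quoted: the whole URL arrives as a single argument.
count_args "$url"    # prints 1
```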
Solution 1: Wrapping the URL in Quotes
The simplest way to handle special characters is to wrap the entire
URL in single (') or double (") quotes. Single
quotes are generally preferred because they prevent the shell from
attempting to expand variables (like $).
wget 'https://example.com/download/file name with spaces.pdf'
wget "https://example.com/search?item=book&page=2"

By quoting the URL, you force the shell to treat the entire string as
a single literal argument, which is then passed perfectly intact to
wget.
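The difference between the two quoting styles is easiest to see with a URL that contains a dollar sign. This is a small sketch under the assumption that a variable named user happens to be set in the shell; the variable and the owner parameter are invented for the demonstration.

```shell
#!/bin/sh
user=alice   # hypothetical variable, set only for this demonstration

# Single quotes: every character is literal, including $ and &.
single='https://example.com/search?item=book&owner=$user'

# Double quotes: & and ? are protected, but $user is still expanded.
double="https://example.com/search?item=book&owner=$user"

printf '%s\n' "$single"   # ends in owner=$user
printf '%s\n' "$double"   # ends in owner=alice
```

If the URL is meant to contain a literal dollar sign, the single-quoted form is the one that reaches wget unchanged.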
Solution 2: Percent-Encoding (URL Encoding)
While quoting protects the URL from the shell, certain web servers still struggle with raw spaces or literal special characters in the request line of the HTTP request. To ensure maximum compatibility, these characters should be percent-encoded.
wget is capable of automatically handling
percent-encoded URLs. If you translate the problematic characters into
their percent-encoded equivalents, wget will transmit them
correctly to the server.
| Character | Percent-Encoded Equivalent |
|---|---|
| Space | %20 |
| Ampersand (&) | %26 |
| Question Mark (?) | %3F |
| Equals Sign (=) | %3D |
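Encoding by hand is error-prone, so the table can be turned into a small helper. The sketch below is one possible approach in POSIX sh, not a feature of wget: urlencode_path is a hypothetical function that keeps letters, digits, and the characters . _ ~ / - as they are and percent-encodes everything else. It assumes ASCII input and is meant for the path portion of a URL (or a single query value) only. Passing a whole URL would also encode the colon in https:// and any & or ? that act as real query delimiters.

```shell
#!/bin/sh
# urlencode_path: hypothetical helper, assumes ASCII input.
# The ( ) body runs in a subshell so its variables do not leak.
urlencode_path() (
  LC_ALL=C
  rest=$1
  out=
  while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    ch=${rest%"$tail"}      # the first character itself
    rest=$tail
    case $ch in
      [a-zA-Z0-9._~/-]) out=$out$ch ;;
      # "'$ch" makes printf use the character's numeric code
      *) out=$out$(printf '%%%02X' "'$ch") ;;
    esac
  done
  printf '%s\n' "$out"
)

urlencode_path 'my file.zip'    # prints my%20file.zip
```

A possible use is to build the final command from the encoded path, for example wget "https://example.com/$(urlencode_path 'my file.zip')".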
For example, if the original URL is
https://example.com/my file.zip, the encoded version passed
to wget should be:
wget https://example.com/my%20file.zip

How wget Saves the Resulting Files
When wget successfully downloads a URL with encoded
characters or spaces, it will, by default, derive the local filename
from the last part of the URL. For instance, downloading
file%20name.txt will often result in a local file named
file name.txt or file%20name.txt, depending on
the wget version and, when the --content-disposition option
is used, the server’s Content-Disposition header.
If you want to force wget to save the file under a
clean, specific name without dealing with local special characters, use
the -O (output document) flag:
wget 'https://example.com/file name with spaces.jpg' -O clean_filename.jpg
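When many files are involved, picking a clean name by hand for every -O becomes tedious. The following is a rough sketch of how a safe local name could be derived from the URL itself; clean_name is a hypothetical helper, and replacing spaces and %20 with underscores is just one possible convention.

```shell
#!/bin/sh
# clean_name: hypothetical helper that derives a space-free local filename.
# The ( ) body runs in a subshell so its variables do not leak.
clean_name() (
  name=${1%%\?*}     # drop any query string
  name=${name##*/}   # keep only the last path segment
  # turn encoded and literal spaces into underscores
  printf '%s\n' "$name" | sed 's/%20/_/g' | tr ' ' '_'
)

url='https://example.com/file name with spaces.jpg'
clean_name "$url"    # prints file_name_with_spaces.jpg

# Possible use with wget (not executed here):
#   wget "$url" -O "$(clean_name "$url")"
```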