Using WGET

Blane Warrene

Wget is a tasty utility on Linux and Mac OS X systems that can come in handy for web system administrators.

Wget — found on the GNU.org site — is a command-line application for file retrieval over FTP, HTTP and HTTPS connections.

I find it useful for downloading files directly to a server I am working on in a shell session, saving time instead of downloading to my local desktop and uploading. Additionally, since it can pass user names and passwords, it is powerful for use in web site migrations, setting up mirrored sites and more.

Finally, Wget can be scheduled using cron, so if a file or directory needs to be replicated on a regular basis, it can be set to do so without administrator intervention.
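For example, a crontab entry along these lines (added via crontab -e) would fetch a file nightly at 2:15 a.m. and log the transfer; the schedule, URL and log path here are placeholders:

```
15 2 * * * wget -o /var/log/nightly-wget.log -N ftp://somedomain.com/public/remotefilename.tar.gz
```

The -N (timestamping) option tells Wget to re-download the file only when the remote copy is newer than the local one, which keeps a scheduled job from transferring the same file over and over.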

Some useful examples for utilizing Wget:

1) Downloading a remote file – Perhaps you are downloading an update to an application and have been sent the URL. In this case you could use either FTP or HTTP to retrieve it:

wget http://somedomain.com/public/remotefilename.tar.gz or wget ftp://somedomain.com/public/remotefilename.tar.gz

Wget over FTP defaults to binary transfers (the ‘i’ mode in FTP lingo); however, if you need to use ASCII mode, you simply add ‘;type=a’ (without the quotes) onto the end of the FTP URL in the example above.
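Note that the semicolon is special to most shells, so quote the URL when using this suffix. A sketch, with a placeholder host and file:

```
wget "ftp://somedomain.com/public/readme.txt;type=a"
```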

2) Downloading with authentication – you may be updating a registered application requiring a user name and password to access. Change the syntax as shown below:

wget http://username:password@somedomain.com/reg/remotefilename.tar.gz or wget ftp://username:password@somedomain.com/reg/remotefilename.tar.gz

3) Inserting custom ports into the wget request – perhaps your download will require a custom port along with authentication. Wget easily handles this as well: insert a colon and the port number after the host name and before the /path to the file(s):

wget https://username:password@somedomain.com:portnumber/reg/remotefilename.tar.gz or wget ftp://username:password@somedomain.com:portnumber/reg/remotefilename.tar.gz

4) Entire directories can also be migrated from one server to another, i.e. moving a web site to new hardware. I have found FTP access to be most effective for this. I also make use of logging the transfer (the -o option) in the event debugging or verification of file retrieval is needed, and use the recursive option (-r) to recreate the directory structure on the new server.

So if I am moving mydomain.com — I would use:

wget -o mylogfile -r ftp://myuser:mypass@mydomain.com/


If you have an FTP user that can see more than one domain, ensure you specify the path to the files and directories for the domain you are moving.
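As a sketch, with a hypothetical directory layout where each domain lives under its own folder in the account, a command like this limits the transfer to one site's files:

```
wget -o mylogfile -r ftp://myuser:mypass@mydomain.com/public_html/mydomain.com/
```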

There are several other interesting and useful options including:

--passive-ftp: uses passive FTP transfers, which is often necessary when running wget behind a firewall.

-nd: does not recreate the remote directory structure locally, and instead simply saves all retrieved files into the current local directory.

--cookies=on/off: turns cookies on or off, useful if the remote site requires cookies to be on or off to retrieve files (helpful with authentication at times).

--retr-symlinks: retrieves the files pointed to by symbolic links.
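These options can be combined. For instance, to pull a directory of files through a firewall without rebuilding the remote directory tree locally (placeholder host and credentials):

```
wget --passive-ftp -nd -r ftp://myuser:mypass@mydomain.com/public/
```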

There are several other powerful features in Wget, and fortunately, the included manual offers excellent examples. Simply run man wget on the command line to review them.

Frequently Asked Questions (FAQs) about Using Wget

How Can I Use Wget to Download Files Through a Specific Port?

To use Wget to download files through a specific port, you need to specify the port number in the URL. For example, if you want to download a file from a server running on port 8080, you would use the following command: wget http://example.com:8080/file. The :8080 in the URL tells Wget to connect to port 8080 on the server.

What is the Transport Protocol Used in Wget?

Wget uses TCP/IP for data transmission. It supports the HTTP, HTTPS and FTP protocols for file transfer. These are application layer protocols in the TCP/IP stack, and they use TCP as the transport protocol.

How Can I Use Wget to Download an Entire Website?

To download an entire website using Wget, you can use the -r (or --recursive) option. This tells Wget to follow links and download files from the website. For example, wget -r http://example.com will download the example.com website. Note that recursive retrieval descends five levels deep by default; use the -l option to change this.

How Can I Resume an Interrupted Download with Wget?

If a download is interrupted, you can resume it using the -c (or --continue) option. For example, if you were downloading a file and the download got interrupted, you can resume it with wget -c http://example.com/file.

How Can I Limit the Download Speed in Wget?

You can limit the download speed in Wget using the --limit-rate option followed by the desired speed. For example, wget --limit-rate=200k http://example.com/file will limit the download speed to 200 KB/s.

How Can I Download Files from a List with Wget?

To download files from a list, you can use the -i (or --input-file) option followed by the name of the file containing the list of URLs. For example, wget -i urls.txt will download all files listed in urls.txt.
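A minimal sketch of this workflow, with placeholder URLs: build the list file with one URL per line, then hand it to Wget with -i.

```shell
# Create a list of URLs (placeholders), one per line
printf '%s\n' \
  'http://example.com/file1.tar.gz' \
  'http://example.com/file2.tar.gz' > urls.txt

cat urls.txt

# Fetch every URL in the list (commented out here; requires network access):
# wget -i urls.txt
```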

How Can I Make Wget Ignore Certain Types of Files?

You can make Wget ignore certain types of files using the -R (or --reject) option followed by a comma-separated list of file types. For example, wget -R jpg,jpeg http://example.com will download all files from example.com except for JPEG images.

How Can I Use Wget to Download Files in the Background?

To download files in the background, you can use the -b (or --background) option. For example, wget -b http://example.com/file will start the download and then move the process to the background, freeing up your terminal. Wget's output is then written to a wget-log file in the current directory.

How Can I Use Wget to Download Files from a Password-Protected Site?

To download files from a password-protected site, you can use the --user and --password options followed by your username and password. For example, wget --user=username --password=password http://example.com/file.

How Can I Use Wget to Download Files from a Site with a Self-Signed SSL Certificate?

To download files from a site with a self-signed SSL certificate, you can use the --no-check-certificate option. For example, wget --no-check-certificate https://example.com/file. This tells Wget to ignore SSL certificate checks and proceed with the download.