Using WGET


Wget is a tasty utility on Linux and Mac OS X systems that can come in handy for web system administrators.

Wget — found on the GNU.org site — is a command line application for file retrieval over ftp, http and https connections.

I find it useful for downloading files directly to a server I am working on in a shell session, saving time instead of downloading to my local desktop and uploading. Additionally, since it can pass user names and passwords, it is powerful for use in web site migrations, setting up mirrored sites and more.

Finally, Wget can be scheduled using cron, so if a file or directory needs to be replicated on a regular basis, it can be set to do so without administrator intervention.
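As a rough sketch of the cron idea, a small wrapper script along these lines could be scheduled. The script name, URL, local directory and log path are all placeholders I made up for illustration. The -a option appends to a log file and -P sets the local directory prefix.

```shell
#!/bin/sh
# Hypothetical wrapper that cron could call; every path and URL here is a placeholder.
# Example crontab entry (daily at 02:30):
#   30 2 * * * /usr/local/bin/nightly-fetch.sh --run

WGET="${WGET:-wget}"                      # override to test without the network
SOURCE_URL="ftp://somedomain.com/public/"
DEST_DIR="/var/backups/mirror"
LOG_FILE="/var/log/nightly-fetch.log"

# -r recurses into directories, -a appends to the log, -P sets the local target directory.
nightly_fetch() {
    "$WGET" -r -a "$LOG_FILE" -P "$DEST_DIR" "$SOURCE_URL"
}

# Only transfer when asked to, so the file can be sourced and inspected safely.
if [ "${1:-}" = "--run" ]; then
    nightly_fetch
fi
```

Appending to the log rather than overwriting it keeps a history of each scheduled run, which helps when a transfer fails overnight.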

Some useful examples for utilizing Wget:

1) Downloading a remote file – Perhaps you are downloading an update to an application and have been sent the url. In this case you could use either ftp or http to retrieve:

wget http://somedomain.com/public/remotefilename.tar.gz

or

wget ftp://somedomain.com/public/remotefilename.tar.gz

Wget over ftp defaults to binary (type i in ftp lingo). If you need ascii mode, add ;type=a onto the end of the ftp url in the example above, and wrap the whole url in single quotes on the command line, since the shell would otherwise treat the semicolon as a command separator.

2) Downloading with authentication – you may be updating a registered application requiring a user name and password to access. Change the syntax as shown below:

wget http://username:password@somedomain.com/reg/remotefilename.tar.gz

or

wget ftp://username:password@somedomain.com/reg/remotefilename.tar.gz

3) Inserting custom ports into the wget request – perhaps your download will require a custom port along with authentication. Wget handles this as well: insert a colon and the port number after the host and before the /path to the file(s):

wget http://username:password@somedomain.com:portnumber/reg/remotefilename.tar.gz

or

wget ftp://username:password@somedomain.com:portnumber/reg/remotefilename.tar.gz
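For reference, the form the Wget manual documents puts the scheme first, then the credentials, then the host and optional port. Here is a minimal sketch of a helper that assembles such a url. The function name build_url and the port 8080 in the usage line are my own placeholders. It also percent-encodes an @ in the user name or password as %40, which is the usual workaround when the user name is itself an email-style address.

```shell
# build_url SCHEME USER PASS HOST PORT PATH
# Prints scheme://user:pass@host[:port]/path. Pass an empty PORT to omit it.
# An '@' inside USER or PASS is written as %40 so it is not mistaken for the
# separator in front of the host name. (Hypothetical helper, not part of Wget.)
build_url() {
    scheme=$1
    user=$(printf '%s' "$2" | sed 's/@/%40/g')
    pass=$(printf '%s' "$3" | sed 's/@/%40/g')
    host=$4
    if [ -n "$5" ]; then
        host="$host:$5"
    fi
    printf '%s://%s:%s@%s/%s\n' "$scheme" "$user" "$pass" "$host" "$6"
}

# Usage (not run here): wget "$(build_url http username password somedomain.com 8080 reg/remotefilename.tar.gz)"
```

Keep in mind that a password placed in the url is visible to other users in the process list while wget runs, so this suits quick one-off transfers better than shared machines.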

4) Entire directories can also be migrated from one server to another, e.g. when moving a web site to new hardware. I have found ftp access to be most effective for this. I also log the transfer (the -o option) in case debugging or verification of file retrieval is needed, and use the recursive option (-r) to recreate the directory structure on the new server.

So if I am moving mydomain.com — I would use:

wget -o mylogfile -r ftp://myuser:mypass@mydomain.com/

If you have an ftp user that can see more than one domain, ensure you specify the path to the files and directories for the domain you are moving.
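To make the path point concrete, here is a small sketch that wraps the migration command so the remote path is always given explicitly. The function name migrate_site and the public_html/ path in the usage line are hypothetical. Only -o and -r come from the example above.

```shell
WGET="${WGET:-wget}"   # override to test without the network

# migrate_site USER PASS HOST REMOTE_PATH LOGFILE
# Recursively fetches one directory tree over ftp and writes the transfer log to LOGFILE.
migrate_site() {
    "$WGET" -o "$5" -r "ftp://$1:$2@$3/$4"
}

# Usage (not run here): migrate_site myuser mypass mydomain.com public_html/ mylogfile
```

After the transfer, searching the log file for error lines is a quick way to confirm nothing was skipped.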

There are several other interesting and useful options including:

--passive-ftp: for using wget behind firewalls

-nd: does not recreate the remote directory structure and instead simply saves all retrieved files into the current local directory.

--cookies=on/off: if the remote site requires cookies to be on or off to retrieve files (helpful with authentication at times)

--retr-symlinks: Will retrieve files pointed to by symbolic links.
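These options can be combined. As an illustration only, the sketch below pulls every file under an ftp directory into the current directory, using passive mode and following symbolic links to files. The function name fetch_flat and the url in the usage line are placeholders.

```shell
WGET="${WGET:-wget}"   # override to test without the network

# fetch_flat FTP_URL
#   --passive-ftp    the client opens the data connection, which suits most firewalls
#   -r -nd           recurse, but save every file into the current directory
#   --retr-symlinks  download the files that symbolic links point to
fetch_flat() {
    "$WGET" --passive-ftp -r -nd --retr-symlinks "$1"
}

# Usage (not run here): fetch_flat ftp://somedomain.com/public/
```

Because -nd flattens everything into one directory, files that share a name in different remote directories will not overwrite each other. Wget saves the later ones with a numeric suffix instead.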

There are several other powerful features in Wget, and fortunately, the manual included offers excellent examples. Simply run man wget on the command line to review.

Written By:

Blane Warrene

Blane is a writer and researcher focusing on Apple and Open Source technologies. Prior to this, he helped found a commercial software and consulting venture, and worked in the financial services sector as a director of technology and in varying technical roles. Blane maintains Open Sourcery: SitePoint's Open Source Blog.


{ 19 comments }

Akash Mehta June 9, 2007 at 5:50 pm

@webchalkboard: The cPanel user authentication system is not a proper server-based HTTP authentication method; it’s generated through scripting, and as a result wget can’t handle it. I’m not sure the port should affect the outcome, as you have still specified the protocol as http.

lx December 5, 2006 at 5:25 am

vidhya: try “man wget” to get all options

Try these for proxy authentication:
--proxy-user=USER
--proxy-passwd=PASS

Greetz,
lx

vidhya October 23, 2006 at 7:32 pm

hi

When I tried to download a file using wget from my office network, I got the following message:

Connecting to myserver:8080… connected.
Proxy request sent, awaiting response… 407 Proxy Authentication Required
14:14:30 ERROR 407: Proxy Authentication Required.

When I access it through the browser, I use a username and password.

How can I download from my user home?

regards
vidhya

Yuksel Kurtbas October 22, 2006 at 11:04 am

[QUOTE=webchalkboard]Hi,

I’m a little confused, I can get wget to work when I enter a url like: wget http://username:password@www.mysite.co.uk/admin/

But it doesn’t work when I try to get:
http://username:password@www.mysite.co.uk:2082/getbackup/backup-mysite.co.uk-7-14-2006.tar.gz

I’m trying to grab the cpanel backup for my website from a different server. I thought this was a cool way to do it, but it doesn’t work… maybe because of the port?

I’m confused, nothing is working today! :([/QUOTE]

You can’t access port 2082 by pass

webchalkboard July 15, 2006 at 12:55 am

Hi,

I’m a little confused, I can get wget to work when I enter a url like: wget http://username:password@www.mysite.co.uk/admin/

But it doesn’t work when I try to get:
http://username:password@www.mysite.co.uk:2082/getbackup/backup-mysite.co.uk-7-14-2006.tar.gz

I’m trying to grab the cpanel backup for my website from a different server. I thought this was a cool way to do it, but it doesn’t work… maybe because of the port?

I’m confused, nothing is working today! :(

carlos alberto July 11, 2006 at 9:01 am

I have never used it before; I’m going to try it right now. Very interesting tip.

Nilesh April 22, 2006 at 4:01 pm

Is it possible to transfer free RapidShare files to my web host’s server using Wget? If yes, what command should I use? How can I run Wget on my external web host’s server, so I can use the speed of the external server to transfer files from RapidShare to my web host account’s file manager?

Simply put, is Wget capable of external server-to-server transfers?

Example: I have an http download link for a file, say
http://www.tyr.com/dady.mov
and want to transfer this file to my web host directory using browser-based control of Wget.

Is it possible?

Nate May 19, 2005 at 1:30 pm

try replacing the at sign with %40, as if it were escaped in a querystring:

ftp://name%40domain.dom:password@hostname.com/myfile

assuming “name@domain.dom” is your username

bwarrene November 16, 2004 at 8:00 pm

[QUOTE=BrianCoogan]Unfortunately the syntax given for authentication is completely wrong according to the manual which gives an example (for user hniksic with password ‘password’) of:

wget ftp://hniksic:mypassword@unix.server.com/.emacs

My question though is, what’s the syntax for wget when the FTP username contains an ‘@’ sign?? This breaks everything horribly! :)

- Brian[/QUOTE]

Actually Brian – I always use just that method to wget from either my OS X machine or a Linux/Unix box. You can also try:

wget ftp://name:password@hostname.com/myfile

BrianCoogan November 16, 2004 at 11:34 am

Unfortunately the syntax given for authentication is completely wrong according to the manual which gives an example (for user hniksic with password ‘password’) of:

wget ftp://hniksic:mypassword@unix.server.com/.emacs

My question though is, what’s the syntax for wget when the FTP username contains an ‘@’ sign?? This breaks everything horribly! :)

- Brian

bwarrene September 29, 2004 at 11:44 pm

Yes – while it is feasible some platforms could hiccup on syntax depending on versioning of the app – the protocol used in the blog post worked on Linux servers I manage running the 2.2, 2.4 and even a daring newer kernel – either in production or development environments.

Simon September 29, 2004 at 9:56 pm

Using standard ftp user/password syntax works, ie:

ftp://user:password@ftp.host.org/xyz

I would hazard a guess that it would work for http as well.

count zero September 13, 2004 at 1:32 pm

I have an apache 1.3x server with some files protected by a .htaccess file–a username and passwd is required for access.

The syntax in the article: luser:passwd@http://foo.bar.com
gives the following error: Authorization failed.
luser:passwd@http://foo.bar.com Unsupported scheme.

The alternate syntax given by man wget:
wget http://foo.bar.com --http-user=user --http-passwd=password
does work. ???

Gesiel August 19, 2004 at 12:45 am

Nice, though you could also have mentioned the -c option (resume).

Bob August 18, 2004 at 2:41 pm

>>Isn’t a GUI for WGET called “a browser”?

Ya, because when you want to download 400 images from a server, God knows it’s a whole lot easier to browse them directly, click on each thumbnail, and save them individually instead of typing a one-line command and waiting 2 minutes.

Conrad August 18, 2004 at 8:51 am

>>Deep Vacuum http://www.hexcat.com/deepvacuum/index.html offers Mac OS X users a GUI for wget.

? Isn’t a GUI for WGET called “a browser”?

Smit-tay! August 18, 2004 at 4:21 am

WGET can use an http or ftp proxy. See the descriptions of the environment variables wget reads, http_proxy and ftp_proxy, as well as the command line options for proxy authentication: --proxy-user and --proxy-passwd

sylozof August 16, 2004 at 1:18 pm

Deep Vacuum http://www.hexcat.com/deepvacuum/index.html offers Mac OS X users a GUI for wget.
It installs both wget (as wget is not installed on Mac OS X by default) and the GUI to use it, so you can choose between the command line and the graphical interface.

MRoderick August 16, 2004 at 11:21 am

Wget is also available as a windows binary in a larger package called UNXUTILS.

http://unxutils.sourceforge.net/

Comments on this entry are closed.