1 This is wget.info, produced by makeinfo version 4.11 from wget.texi.
3 INFO-DIR-SECTION Network Applications
5 * Wget: (wget). The non-interactive network downloader.
This file documents the GNU Wget utility for downloading network data.
11 Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
12 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
14 Permission is granted to copy, distribute and/or modify this document
15 under the terms of the GNU Free Documentation License, Version 1.2 or
16 any later version published by the Free Software Foundation; with no
17 Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
18 copy of the license is included in the section entitled "GNU Free
19 Documentation License".
22 File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
27 This file documents the GNU Wget utility for downloading network data.
41 * Overview:: Features of Wget.
42 * Invoking:: Wget command-line arguments.
43 * Recursive Download:: Downloading interlinked pages.
44 * Following Links:: The available methods of chasing links.
45 * Time-Stamping:: Mirroring according to time-stamps.
46 * Startup File:: Wget's initialization file.
47 * Examples:: Examples of usage.
48 * Various:: The stuff that doesn't fit anywhere else.
49 * Appendices:: Some useful references.
50 * Copying this manual:: You may give out copies of this manual.
51 * Concept Index:: Topics covered by this manual.
54 File: wget.info, Node: Overview, Next: Invoking, Prev: Top, Up: Top
59 GNU Wget is a free utility for non-interactive download of files from
60 the Web. It supports HTTP, HTTPS, and FTP protocols, as well as
61 retrieval through HTTP proxies.
63 This chapter is a partial overview of Wget's features.
65 * Wget is non-interactive, meaning that it can work in the
66 background, while the user is not logged on. This allows you to
67 start a retrieval and disconnect from the system, letting Wget
finish the work. By contrast, most Web browsers require the
user's constant presence, which can be a great hindrance when
70 transferring a lot of data.
72 * Wget can follow links in HTML, XHTML, and CSS pages, to create
73 local versions of remote web sites, fully recreating the directory
74 structure of the original site. This is sometimes referred to as
75 "recursive downloading." While doing that, Wget respects the Robot
76 Exclusion Standard (`/robots.txt'). Wget can be instructed to
convert the links in downloaded files to point at the local files,
for offline viewing.
80 * File name wildcard matching and recursive mirroring of directories
81 are available when retrieving via FTP. Wget can read the
82 time-stamp information given by both HTTP and FTP servers, and
83 store it locally. Thus Wget can see if the remote file has
84 changed since last retrieval, and automatically retrieve the new
85 version if it has. This makes Wget suitable for mirroring of FTP
86 sites, as well as home pages.
88 * Wget has been designed for robustness over slow or unstable network
89 connections; if a download fails due to a network problem, it will
90 keep retrying until the whole file has been retrieved. If the
91 server supports regetting, it will instruct the server to continue
92 the download from where it left off.
94 * Wget supports proxy servers, which can lighten the network load,
95 speed up retrieval and provide access behind firewalls. Wget uses
96 the passive FTP downloading by default, active FTP being an option.
98 * Wget supports IP version 6, the next generation of IP. IPv6 is
99 autodetected at compile-time, and can be disabled at either build
100 or run time. Binaries built with IPv6 support work well in both
101 IPv4-only and dual family environments.
103 * Built-in features offer mechanisms to tune which links you wish to
104 follow (*note Following Links::).
106 * The progress of individual downloads is traced using a progress
107 gauge. Interactive downloads are tracked using a
108 "thermometer"-style gauge, whereas non-interactive ones are traced
109 with dots, each dot representing a fixed amount of data received
(1KB by default). Either gauge can be customized to your
preferences.
113 * Most of the features are fully configurable, either through
114 command line options, or via the initialization file `.wgetrc'
115 (*note Startup File::). Wget allows you to define "global"
startup files (`/usr/local/etc/wgetrc' by default) for site
settings.
119 * Finally, GNU Wget is free software. This means that everyone may
120 use it, redistribute it and/or modify it under the terms of the
121 GNU General Public License, as published by the Free Software
Foundation (see the file `COPYING' that came with GNU Wget, for
full details).
126 File: wget.info, Node: Invoking, Next: Recursive Download, Prev: Overview, Up: Top
131 By default, Wget is very simple to invoke. The basic syntax is:
133 wget [OPTION]... [URL]...
135 Wget will simply download all the URLs specified on the command
136 line. URL is a "Uniform Resource Locator", as defined below.
However, you may wish to change some of the default parameters of
Wget. You can do it in two ways: permanently, by adding the appropriate
command to `.wgetrc' (*note Startup File::), or by specifying it on the
command line.
147 * Basic Startup Options::
148 * Logging and Input File Options::
150 * Directory Options::
152 * HTTPS (SSL/TLS) Options::
154 * Recursive Retrieval Options::
155 * Recursive Accept/Reject Options::
159 File: wget.info, Node: URL Format, Next: Option Syntax, Prev: Invoking, Up: Invoking
164 "URL" is an acronym for Uniform Resource Locator. A uniform resource
165 locator is a compact string representation for a resource available via
166 the Internet. Wget recognizes the URL syntax as per RFC1738. This is
167 the most widely used form (square brackets denote optional parts):
169 http://host[:port]/directory/file
170 ftp://host[:port]/directory/file
172 You can also encode your username and password within a URL:
174 ftp://user:password@host/path
175 http://user:password@host/path
177 Either USER or PASSWORD, or both, may be left out. If you leave out
178 either the HTTP username or password, no authentication will be sent.
179 If you leave out the FTP username, `anonymous' will be used. If you
leave out the FTP password, your email address will be supplied as a
default password.(1)
183 *Important Note*: if you specify a password-containing URL on the
184 command line, the username and password will be plainly visible to all
185 users on the system, by way of `ps'. On multi-user systems, this is a
186 big security risk. To work around it, use `wget -i -' and feed the
187 URLs to Wget's standard input, each on a separate line, terminated by
190 You can encode unsafe characters in a URL as `%xy', `xy' being the
191 hexadecimal representation of the character's ASCII value. Some common
192 unsafe characters include `%' (quoted as `%25'), `:' (quoted as `%3A'),
193 and `@' (quoted as `%40'). Refer to RFC1738 for a comprehensive list
194 of unsafe characters.
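
The `%xy' quoting rule can be tried out with a short Python sketch. It
relies on the standard library's `urllib.parse.quote' and is not Wget's
own encoder; the sample password and the helper name `quote_unsafe' are
made up for illustration.

```python
from urllib.parse import quote, unquote


def quote_unsafe(text: str) -> str:
    """Percent-encode every reserved or unsafe character as %XY."""
    # safe="" makes quote() encode ':', '@', '/' and '%' as well.
    return quote(text, safe="")


# Hypothetical password containing the three characters named above.
password = "p@ss:w%rd"
encoded = quote_unsafe(password)
print(encoded)

# Decoding restores the original text.
print(unquote(encoded) == password)
```

The encoded form is what you would place in the USER or PASSWORD part of
a URL so that `:' and `@' are not mistaken for URL delimiters.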
196 Wget also supports the `type' feature for FTP URLs. By default, FTP
197 documents are retrieved in the binary mode (type `i'), which means that
198 they are downloaded unchanged. Another useful mode is the `a'
199 ("ASCII") mode, which converts the line delimiters between the
different operating systems, and is thus useful for text files. Here
is an example:
203 ftp://host/directory/file;type=a
Two alternative variants of URL specification are also supported,
because of historical (hysterical?) reasons and their widespread use.
208 FTP-only syntax (supported by `NcFTP'):
211 HTTP-only syntax (introduced by `Netscape'):
214 These two alternative forms are deprecated, and may cease being
215 supported in the future.
217 If you do not understand the difference between these notations, or
218 do not know which one to use, just use the plain ordinary format you use
219 with your favorite browser, like `Lynx' or `Netscape'.
221 ---------- Footnotes ----------
(1) If you have a `.netrc' file in your home directory, Wget will
also search it for the password.
227 File: wget.info, Node: Option Syntax, Next: Basic Startup Options, Prev: URL Format, Up: Invoking
232 Since Wget uses GNU getopt to process command-line arguments, every
233 option has a long form along with the short one. Long options are more
234 convenient to remember, but take time to type. You may freely mix
235 different option styles, or specify options after the command-line
236 arguments. Thus you may write:
238 wget -r --tries=10 http://fly.srk.fer.hr/ -o log
240 The space between the option accepting an argument and the argument
241 may be omitted. Instead of `-o log' you can write `-olog'.
243 You may put several options that do not require arguments together,
248 This is completely equivalent to:
252 Since the options can be specified after the arguments, you may
253 terminate them with `--'. So the following will try to download URL
254 `-x', reporting failure to `log':
258 The options that accept comma-separated lists all respect the
259 convention that specifying an empty list clears its value. This can be
260 useful to clear the `.wgetrc' settings. For instance, if your `.wgetrc'
261 sets `exclude_directories' to `/cgi-bin', the following example will
262 first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
263 You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).
265 wget -X '' -X /~nobody,/~somebody
267 Most options that do not accept arguments are "boolean" options, so
268 named because their state can be captured with a yes-or-no ("boolean")
269 variable. For example, `--follow-ftp' tells Wget to follow FTP links
270 from HTML files and, on the other hand, `--no-glob' tells it not to
271 perform file globbing on FTP URLs. A boolean option is either
272 "affirmative" or "negative" (beginning with `--no'). All such options
273 share several properties.
275 Unless stated otherwise, it is assumed that the default behavior is
276 the opposite of what the option accomplishes. For example, the
277 documented existence of `--follow-ftp' assumes that the default is to
278 _not_ follow FTP links from HTML pages.
Affirmative options can be negated by prepending `--no-' to the
281 option name; negative options can be negated by omitting the `--no-'
282 prefix. This might seem superfluous--if the default for an affirmative
283 option is to not do something, then why provide a way to explicitly
284 turn it off? But the startup file may in fact change the default. For
285 instance, using `follow_ftp = on' in `.wgetrc' makes Wget _follow_ FTP
286 links by default, and using `--no-follow-ftp' is the only way to
287 restore the factory default from the command line.
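
The precedence just described (factory default, then `.wgetrc', then the
command line) can be modelled in a few lines of Python. This is only a
sketch of the rule as stated in the text, not Wget's option parser; the
function name `effective_setting' and the use of `None' for "not
specified" are assumptions made for the example.

```python
from typing import Optional


def effective_setting(factory_default: bool,
                      wgetrc_value: Optional[bool],
                      cli_value: Optional[bool]) -> bool:
    """Return the value in force; None means 'not given at this level'."""
    value = factory_default
    if wgetrc_value is not None:
        # The startup file may change the default.
        value = wgetrc_value
    if cli_value is not None:
        # A command-line option overrides the startup file.
        value = cli_value
    return value


# follow_ftp = on in .wgetrc, nothing on the command line.
print(effective_setting(False, True, None))

# Same .wgetrc, but --no-follow-ftp given on the command line.
print(effective_setting(False, True, False))
```

The second call shows why a negative form of an affirmative option is
not superfluous: it is the only level left that can undo the startup
file.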
290 File: wget.info, Node: Basic Startup Options, Next: Logging and Input File Options, Prev: Option Syntax, Up: Invoking
292 2.3 Basic Startup Options
293 =========================
297 Display the version of Wget.
301 Print a help message describing all of Wget's command-line options.
305 Go to background immediately after startup. If no output file is
306 specified via the `-o', output is redirected to `wget-log'.
310 Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
311 File::). A command thus invoked will be executed _after_ the
312 commands in `.wgetrc', thus taking precedence over them. If you
313 need to specify more than one wgetrc command, use multiple
318 File: wget.info, Node: Logging and Input File Options, Next: Download Options, Prev: Basic Startup Options, Up: Invoking
320 2.4 Logging and Input File Options
321 ==================================
324 `--output-file=LOGFILE'
Log all messages to LOGFILE. The messages are normally reported
to standard error.
329 `--append-output=LOGFILE'
330 Append to LOGFILE. This is the same as `-o', only it appends to
331 LOGFILE instead of overwriting the old log file. If LOGFILE does
332 not exist, a new file is created.
336 Turn on debug output, meaning various information important to the
337 developers of Wget if it does not work properly. Your system
338 administrator may have chosen to compile Wget without debug
339 support, in which case `-d' will not work. Please note that
340 compiling with debug support is always safe--Wget compiled with
341 the debug support will _not_ print any debug info unless requested
342 with `-d'. *Note Reporting Bugs::, for more information on how to
343 use `-d' for sending bug reports.
347 Turn off Wget's output.
Turn on verbose output, with all the available data. The default
output is verbose.
356 Turn off verbose without being completely quiet (use `-q' for
that), which means that error messages and basic information still
get printed.
362 Read URLs from a local or external FILE. If `-' is specified as
363 FILE, URLs are read from the standard input. (Use `./-' to read
364 from a file literally named `-'.)
366 If this function is used, no URLs need be present on the command
367 line. If there are URLs both on the command line and in an input
file, those on the command line will be the first ones to be
369 retrieved. If `--force-html' is not specified, then FILE should
370 consist of a series of URLs, one per line.
372 However, if you specify `--force-html', the document will be
373 regarded as `html'. In that case you may have problems with
374 relative links, which you can solve either by adding `<base
href="URL">' to the documents or by specifying `--base=URL' on the
command line.
378 If the FILE is an external one, the document will be automatically
379 treated as `html' if the Content-Type matches `text/html'.
380 Furthermore, the FILE's location will be implicitly used as base
381 href if none was specified.
385 When input is read from a file, force it to be treated as an HTML
386 file. This enables you to retrieve relative links from existing
387 HTML files on your local disk, by adding `<base href="URL">' to
388 HTML, or using the `--base' command-line option.
392 Resolves relative links using URL as the point of reference, when
393 reading links from an HTML file specified via the
394 `-i'/`--input-file' option (together with `--force-html', or when
395 the input file was fetched remotely from a server describing it as
396 HTML). This is equivalent to the presence of a `BASE' tag in the
397 HTML input file, with URL as the value for the `href' attribute.
399 For instance, if you specify `http://foo/bar/a.html' for URL, and
400 Wget reads `../baz/b.html' from the input file, it would be
401 resolved to `http://foo/baz/b.html'.
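
The resolution step in this example follows the usual rules for relative
URL references, so it can be reproduced with Python's standard
`urllib.parse.urljoin'. The sketch below only illustrates the
arithmetic of the example above; it is not how Wget implements `--base'.

```python
from urllib.parse import urljoin

# Value given to --base and a relative link read from the input file,
# both taken from the example in the text.
base = "http://foo/bar/a.html"
link = "../baz/b.html"

# urljoin drops the last path segment of the base ("a.html"),
# then applies ".." to step out of "/bar/".
resolved = urljoin(base, link)
print(resolved)
```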
404 File: wget.info, Node: Download Options, Next: Directory Options, Prev: Logging and Input File Options, Up: Invoking
409 `--bind-address=ADDRESS'
410 When making client TCP/IP connections, bind to ADDRESS on the
411 local machine. ADDRESS may be specified as a hostname or IP
address. This option can be useful if your machine is bound to
multiple IPs.
417 Set number of retries to NUMBER. Specify 0 or `inf' for infinite
418 retrying. The default is to retry 20 times, with the exception of
fatal errors like "connection refused" or "not found" (404), which
are not retried.
423 `--output-document=FILE'
424 The documents will not be written to the appropriate files, but all
425 will be concatenated together and written to FILE. If `-' is used
426 as FILE, documents will be printed to standard output, disabling
link conversion. (Use `./-' to print to a file literally named
`-'.)
430 Use of `-O' is _not_ intended to mean simply "use the name FILE
431 instead of the one in the URL;" rather, it is analogous to shell
432 redirection: `wget -O file http://foo' is intended to work like
433 `wget -O - http://foo > file'; `file' will be truncated
434 immediately, and _all_ downloaded content will be written there.
436 For this reason, `-N' (for timestamp-checking) is not supported in
437 combination with `-O': since FILE is always newly created, it will
always have a very new timestamp. A warning will be issued if this
combination is used.
441 Similarly, using `-r' or `-p' with `-O' may not work as you
442 expect: Wget won't just download the first file to FILE and then
443 download the rest to their normal names: _all_ downloaded content
444 will be placed in FILE. This was disabled in version 1.11, but has
445 been reinstated (with a warning) in 1.11.2, as there are some
446 cases where this behavior can actually have some use.
448 Note that a combination with `-k' is only permitted when
449 downloading a single document, as in that case it will just convert
450 all relative URIs to external ones; `-k' makes no sense for
451 multiple URIs when they're all being downloaded to a single file.
455 If a file is downloaded more than once in the same directory,
456 Wget's behavior depends on a few options, including `-nc'. In
457 certain cases, the local file will be "clobbered", or overwritten,
458 upon repeated download. In other cases it will be preserved.
460 When running Wget without `-N', `-nc', `-r', or `-p', downloading
461 the same file in the same directory will result in the original
462 copy of FILE being preserved and the second copy being named
463 `FILE.1'. If that file is downloaded yet again, the third copy
464 will be named `FILE.2', and so on. (This is also the behavior
465 with `-nd', even if `-r' or `-p' are in effect.) When `-nc' is
466 specified, this behavior is suppressed, and Wget will refuse to
467 download newer copies of `FILE'. Therefore, "`no-clobber'" is
468 actually a misnomer in this mode--it's not clobbering that's
469 prevented (as the numeric suffixes were already preventing
clobbering), but rather the multiple version saving that's
prevented.
473 When running Wget with `-r' or `-p', but without `-N', `-nd', or
474 `-nc', re-downloading a file will result in the new copy simply
475 overwriting the old. Adding `-nc' will prevent this behavior,
476 instead causing the original version to be preserved and any newer
477 copies on the server to be ignored.
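
The numeric-suffix behavior for plain downloads (no `-N', `-r', or `-p')
can be sketched as follows. This Python model only mirrors the rules
stated above; the function name `local_name' and the use of a set of
existing names in place of a real directory are assumptions made for the
example.

```python
from typing import Optional, Set


def local_name(filename: str, existing: Set[str],
               no_clobber: bool = False) -> Optional[str]:
    """Return the name to save under, or None if the download is skipped."""
    if filename not in existing:
        return filename
    if no_clobber:
        # With -nc the existing copy is kept and nothing new is saved.
        return None
    # Otherwise pick the first free FILE.1, FILE.2, ... suffix.
    n = 1
    while f"{filename}.{n}" in existing:
        n += 1
    return f"{filename}.{n}"


print(local_name("index.html", {"index.html"}))
print(local_name("index.html", {"index.html", "index.html.1"}))
print(local_name("index.html", {"index.html"}, no_clobber=True))
```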
479 When running Wget with `-N', with or without `-r' or `-p', the
480 decision as to whether or not to download a newer copy of a file
481 depends on the local and remote timestamp and size of the file
(*note Time-Stamping::). `-nc' may not be specified at the same
time as `-N'.
485 Note that when `-nc' is specified, files with the suffixes `.html'
486 or `.htm' will be loaded from the local disk and parsed as if they
487 had been retrieved from the Web.
491 Continue getting a partially-downloaded file. This is useful when
492 you want to finish up a download started by a previous instance of
493 Wget, or by another program. For instance:
495 wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
497 If there is a file named `ls-lR.Z' in the current directory, Wget
498 will assume that it is the first portion of the remote file, and
499 will ask the server to continue the retrieval from an offset equal
500 to the length of the local file.
502 Note that you don't need to specify this option if you just want
503 the current invocation of Wget to retry downloading a file should
504 the connection be lost midway through. This is the default
505 behavior. `-c' only affects resumption of downloads started
506 _prior_ to this invocation of Wget, and whose local files are
507 still sitting around.
509 Without `-c', the previous example would just download the remote
510 file to `ls-lR.Z.1', leaving the truncated `ls-lR.Z' file alone.
512 Beginning with Wget 1.7, if you use `-c' on a non-empty file, and
513 it turns out that the server does not support continued
514 downloading, Wget will refuse to start the download from scratch,
515 which would effectively ruin existing contents. If you really
516 want the download to start from scratch, remove the file.
518 Also beginning with Wget 1.7, if you use `-c' on a file which is of
519 equal size as the one on the server, Wget will refuse to download
520 the file and print an explanatory message. The same happens when
521 the file is smaller on the server than locally (presumably because
522 it was changed on the server since your last download
attempt)--because "continuing" is not meaningful, no download
occurs.
526 On the other side of the coin, while using `-c', any file that's
527 bigger on the server than locally will be considered an incomplete
528 download and only `(length(remote) - length(local))' bytes will be
529 downloaded and tacked onto the end of the local file. This
530 behavior can be desirable in certain cases--for instance, you can
531 use `wget -c' to download just the new portion that's been
532 appended to a data collection or log file.
534 However, if the file is bigger on the server because it's been
535 _changed_, as opposed to just _appended_ to, you'll end up with a
536 garbled file. Wget has no way of verifying that the local file is
537 really a valid prefix of the remote file. You need to be
538 especially careful of this when using `-c' in conjunction with
`-r', since every file will be considered as an "incomplete
download" candidate.
542 Another instance where you'll get a garbled file if you try to use
543 `-c' is if you have a lame HTTP proxy that inserts a "transfer
544 interrupted" string into the local file. In the future a
545 "rollback" option may be added to deal with this case.
547 Note that `-c' only works with FTP servers and with HTTP servers
548 that support the `Range' header.
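
The size comparisons that drive `-c' can be summarized in a small
Python model. It is a sketch of the decision described above, assuming
an HTTP server that honors `Range'; the function name `plan_continue'
and its return shape are invented for the example and do not correspond
to anything in Wget's source.

```python
from typing import Optional, Tuple


def plan_continue(local_size: int,
                  remote_size: int) -> Tuple[str, Optional[str], int]:
    """Return (action, Range header value, bytes still to fetch)."""
    if local_size >= remote_size:
        # Equal size: already complete. Larger locally: the remote file
        # changed, so "continuing" is not meaningful. Either way, skip.
        return ("skip", None, 0)
    # Ask for everything from the local length to the end of the file.
    return ("resume", f"bytes={local_size}-", remote_size - local_size)


print(plan_continue(1000, 5000))
print(plan_continue(5000, 5000))
print(plan_continue(6000, 5000))
```

Note that the model, like Wget, compares sizes only. It cannot tell
whether the local bytes really are a prefix of the remote file, which is
why a changed (rather than appended) remote file yields a garbled result.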
551 Select the type of the progress indicator you wish to use. Legal
552 indicators are "dot" and "bar".
554 The "bar" indicator is used by default. It draws an ASCII progress
555 bar graphics (a.k.a "thermometer" display) indicating the status of
retrieval. If the output is not a TTY, the "dot" bar will be used
by default.
559 Use `--progress=dot' to switch to the "dot" display. It traces
560 the retrieval by printing dots on the screen, each dot
561 representing a fixed amount of downloaded data.
563 When using the dotted retrieval, you may also set the "style" by
564 specifying the type as `dot:STYLE'. Different styles assign
565 different meaning to one dot. With the `default' style each dot
566 represents 1K, there are ten dots in a cluster and 50 dots in a
567 line. The `binary' style has a more "computer"-like
568 orientation--8K dots, 16-dots clusters and 48 dots per line (which
569 makes for 384K lines). The `mega' style is suitable for
570 downloading very large files--each dot represents 64K retrieved,
571 there are eight dots in a cluster, and 48 dots on each line (so
572 each line contains 3M).
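
The per-line totals quoted for each style follow directly from the dot
size and the number of dots per line. The Python sketch below simply
redoes that arithmetic using the figures from the paragraph above; the
table name `DOT_STYLES' and the helper are made up for illustration.

```python
# style -> (KB per dot, dots per cluster, dots per line), as listed above
DOT_STYLES = {
    "default": (1, 10, 50),
    "binary": (8, 16, 48),
    "mega": (64, 8, 48),
}


def kb_per_line(style: str) -> int:
    """Amount of data, in KB, represented by one full line of dots."""
    kb_per_dot, _dots_per_cluster, dots_per_line = DOT_STYLES[style]
    return kb_per_dot * dots_per_line


for style in DOT_STYLES:
    print(style, kb_per_line(style), "KB per line")
```

For `mega' the result is 3072K, which is the 3M per line mentioned in
the text.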
574 Note that you can set the default style using the `progress'
575 command in `.wgetrc'. That setting may be overridden from the
576 command line. The exception is that, when the output is not a
577 TTY, the "dot" progress will be favored over "bar". To force the
578 bar output, use `--progress=bar:force'.
582 Turn on time-stamping. *Note Time-Stamping::, for details.
Print the headers sent by HTTP servers and responses sent by FTP
servers.
590 When invoked with this option, Wget will behave as a Web "spider",
591 which means that it will not download the pages, just check that
they are there. For example, you can use Wget to check your
bookmarks:
595 wget --spider --force-html -i bookmarks.html
597 This feature needs much more work for Wget to get close to the
598 functionality of real web spiders.
602 Set the network timeout to SECONDS seconds. This is equivalent to
603 specifying `--dns-timeout', `--connect-timeout', and
604 `--read-timeout', all at the same time.
606 When interacting with the network, Wget can check for timeout and
607 abort the operation if it takes too long. This prevents anomalies
608 like hanging reads and infinite connects. The only timeout
609 enabled by default is a 900-second read timeout. Setting a
610 timeout to 0 disables it altogether. Unless you know what you are
611 doing, it is best not to change the default timeout settings.
613 All timeout-related options accept decimal values, as well as
614 subsecond values. For example, `0.1' seconds is a legal (though
615 unwise) choice of timeout. Subsecond timeouts are useful for
616 checking server response times or for testing network latency.
618 `--dns-timeout=SECONDS'
619 Set the DNS lookup timeout to SECONDS seconds. DNS lookups that
620 don't complete within the specified time will fail. By default,
there is no timeout on DNS lookups, other than that implemented by
system libraries.
624 `--connect-timeout=SECONDS'
625 Set the connect timeout to SECONDS seconds. TCP connections that
626 take longer to establish will be aborted. By default, there is no
627 connect timeout, other than that implemented by system libraries.
629 `--read-timeout=SECONDS'
630 Set the read (and write) timeout to SECONDS seconds. The "time"
631 of this timeout refers to "idle time": if, at any point in the
632 download, no data is received for more than the specified number
633 of seconds, reading fails and the download is restarted. This
option does not directly affect the duration of the entire
download.
637 Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.
641 `--limit-rate=AMOUNT'
642 Limit the download speed to AMOUNT bytes per second. Amount may
643 be expressed in bytes, kilobytes with the `k' suffix, or megabytes
644 with the `m' suffix. For example, `--limit-rate=20k' will limit
645 the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available
bandwidth.
649 This option allows the use of decimal numbers, usually in
conjunction with power suffixes; for example, `--limit-rate=2.5k'
is a legal value.
653 Note that Wget implements the limiting by sleeping the appropriate
654 amount of time after a network read that took less time than
655 specified by the rate. Eventually this strategy causes the TCP
656 transfer to slow down to approximately the specified rate.
657 However, it may take some time for this balance to be achieved, so
don't be surprised if limiting the rate doesn't work well with
very small files.
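
The sleep-based limiting strategy can be illustrated with a short Python
sketch. It is a simplified model, not Wget's code: the helper names are
invented, and the multipliers of 1024 for `k' and 1024*1024 for `m' are
an assumption chosen to match the "20KB/s" wording above.

```python
def parse_rate(amount: str) -> float:
    """Convert an AMOUNT such as '20k' or '2.5k' to bytes per second."""
    # Assumed multipliers: k = 1024, m = 1024 * 1024.
    multipliers = {"k": 1024.0, "m": 1024.0 * 1024.0}
    suffix = amount[-1].lower()
    if suffix in multipliers:
        return float(amount[:-1]) * multipliers[suffix]
    return float(amount)


def sleep_needed(bytes_read: int, elapsed: float, rate: float) -> float:
    """Seconds to sleep after a read that finished faster than the rate allows."""
    expected = bytes_read / rate  # time the read should have taken
    return max(0.0, expected - elapsed)


rate = parse_rate("20k")
print(rate)

# An 8192-byte read that took 0.1 s is ahead of schedule, so sleep.
print(sleep_needed(8192, 0.1, rate))

# A read that took a full second is already slower than the limit.
print(sleep_needed(8192, 1.0, rate))
```

Because the correction is applied only after each read, a file that
arrives in one or two reads leaves little opportunity to sleep, which is
consistent with the caveat about small files.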
663 Wait the specified number of seconds between the retrievals. Use
664 of this option is recommended, as it lightens the server load by
665 making the requests less frequent. Instead of in seconds, the
666 time can be specified in minutes using the `m' suffix, in hours
667 using `h' suffix, or in days using `d' suffix.
669 Specifying a large value for this option is useful if the network
670 or the destination host is down, so that Wget can wait long enough
671 to reasonably expect the network error to be fixed before the
672 retry. The waiting interval specified by this function is
673 influenced by `--random-wait', which see.
675 `--waitretry=SECONDS'
676 If you don't want Wget to wait between _every_ retrieval, but only
677 between retries of failed downloads, you can use this option.
678 Wget will use "linear backoff", waiting 1 second after the first
679 failure on a given file, then waiting 2 seconds after the second
680 failure on that file, up to the maximum number of SECONDS you
681 specify. Therefore, a value of 10 will actually make Wget wait up
682 to (1 + 2 + ... + 10) = 55 seconds per file.
684 By default, Wget will assume a value of 10 seconds.
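
The linear backoff arithmetic can be checked with a few lines of Python.
This is a sketch of the schedule as described above, not Wget's retry
loop; the function name `retry_waits' is made up for the example.

```python
from typing import List


def retry_waits(waitretry: int, failures: int) -> List[int]:
    """Seconds waited after each successive failure on one file."""
    # Wait 1 s, then 2 s, and so on, never exceeding the configured maximum.
    return [min(n, waitretry) for n in range(1, failures + 1)]


# With --waitretry=10, ten failures give 1 + 2 + ... + 10 seconds in total.
waits = retry_waits(10, 10)
print(waits)
print(sum(waits))
```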
687 Some web sites may perform log analysis to identify retrieval
688 programs such as Wget by looking for statistically significant
689 similarities in the time between requests. This option causes the
690 time between requests to vary between 0.5 and 1.5 * WAIT seconds,
691 where WAIT was specified using the `--wait' option, in order to
692 mask Wget's presence from such analysis.
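
As a rough illustration of the interval described above, the following
Python sketch draws a delay between 0.5 and 1.5 times WAIT. The uniform
distribution and the function name `random_wait' are assumptions; the
text only states the bounds.

```python
import random


def random_wait(wait: float) -> float:
    """Pick a delay between 0.5 * wait and 1.5 * wait seconds."""
    # Assumption: delays are spread evenly across the interval.
    return random.uniform(0.5 * wait, 1.5 * wait)


# With --wait=2 every delay falls between 1 and 3 seconds.
samples = [random_wait(2.0) for _ in range(5)]
print(samples)
```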
694 A 2001 article in a publication devoted to development on a popular
695 consumer platform provided code to perform this analysis on the
696 fly. Its author suggested blocking at the class C address level
697 to ensure automated retrieval programs were blocked despite
698 changing DHCP-supplied addresses.
700 The `--random-wait' option was inspired by this ill-advised
701 recommendation to block many unrelated users from a web site due
702 to the actions of one.
Don't use proxies, even if the appropriate `*_proxy' environment
variable is defined.
708 For more information about the use of proxies with Wget, *Note
713 Specify download quota for automatic retrievals. The value can be
714 specified in bytes (default), kilobytes (with `k' suffix), or
715 megabytes (with `m' suffix).
717 Note that quota will never affect downloading a single file. So
718 if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz',
719 all of the `ls-lR.gz' will be downloaded. The same goes even when
720 several URLs are specified on the command-line. However, quota is
721 respected when retrieving either recursively, or from an input
722 file. Thus you may safely type `wget -Q2m -i sites'--download
723 will be aborted when the quota is exceeded.
725 Setting quota to 0 or to `inf' unlimits the download quota.
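
Why a quota never cuts off a single file becomes clearer in a small
model where the quota is examined only between files. The Python sketch
below follows the description above; the function name `files_fetched'
and the exact comparison (strictly greater than the quota) are
assumptions made for illustration.

```python
from typing import List


def files_fetched(sizes: List[int], quota: int) -> int:
    """Count the files retrieved before the quota stops the run."""
    total = 0
    fetched = 0
    for size in sizes:
        # A file that has been started is always downloaded in full.
        total += size
        fetched += 1
        # A quota of 0 means unlimited, so it never stops the loop.
        if quota and total > quota:
            break
    return fetched


quota = 10 * 1024  # -Q10k
print(files_fetched([50000], quota))
print(files_fetched([6000, 6000, 6000], quota))
print(files_fetched([6000, 6000, 6000], 0))
```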
728 Turn off caching of DNS lookups. Normally, Wget remembers the IP
729 addresses it looked up from DNS so it doesn't have to repeatedly
730 contact the DNS server for the same (typically small) set of hosts
731 it retrieves from. This cache exists in memory only; a new Wget
732 run will contact DNS again.
734 However, it has been reported that in some situations it is not
735 desirable to cache host names, even for the duration of a
736 short-running application like Wget. With this option Wget issues
737 a new DNS lookup (more precisely, a new call to `gethostbyname' or
738 `getaddrinfo') each time it makes a new connection. Please note
739 that this option will _not_ affect caching that might be performed
740 by the resolving library or by an external caching layer, such as
If you don't understand exactly what this option does, you probably
won't need it.
746 `--restrict-file-names=MODES'
747 Change which characters found in remote URLs must be escaped during
748 generation of local filenames. Characters that are "restricted"
749 by this option are escaped, i.e. replaced with `%HH', where `HH'
750 is the hexadecimal number that corresponds to the restricted
751 character. This option may also be used to force all alphabetical
752 cases to be either lower- or uppercase.
754 By default, Wget escapes the characters that are not valid or safe
755 as part of file names on your operating system, as well as control
756 characters that are typically unprintable. This option is useful
757 for changing these defaults, perhaps because you are downloading
758 to a non-native partition, or because you want to disable escaping
759 of the control characters, or you want to further restrict
760 characters to only those in the ASCII range of values.
762 The MODES are a comma-separated set of text values. The acceptable
763 values are `unix', `windows', `nocontrol', `ascii', `lowercase',
764 and `uppercase'. The values `unix' and `windows' are mutually
765 exclusive (one will override the other), as are `lowercase' and
766 `uppercase'. Those last are special cases, as they do not change
767 the set of characters that would be escaped, but rather force local
768 file paths to be converted either to lower- or uppercase.
770 When "unix" is specified, Wget escapes the character `/' and the
771 control characters in the ranges 0-31 and 128-159. This is the
772 default on Unix-like operating systems.
774 When "windows" is given, Wget escapes the characters `\', `|',
775 `/', `:', `?', `"', `*', `<', `>', and the control characters in
776 the ranges 0-31 and 128-159. In addition to this, Wget in Windows
777 mode uses `+' instead of `:' to separate host and port in local
778 file names, and uses `@' instead of `?' to separate the query
779 portion of the file name from the rest. Therefore, a URL that
780 would be saved as `www.xemacs.org:4300/search.pl?input=blah' in
781 Unix mode would be saved as
782 `www.xemacs.org+4300/search.pl@input=blah' in Windows mode. This
783 mode is the default on Windows.
785 If you specify `nocontrol', then the escaping of the control
786 characters is also switched off. This option may make sense when
787 you are downloading URLs whose names contain UTF-8 characters, on
788 a system which can save and display filenames in UTF-8 (some
789 possible byte values used in UTF-8 byte sequences fall in the
790 range of values designated by Wget as "controls").
792 The `ascii' mode is used to specify that any bytes whose values
793 are outside the range of ASCII characters (that is, greater than
794 127) shall be escaped. This can be useful when saving filenames
795 whose encoding does not match the one used locally.
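The escaping rules above can be modelled with a short Python sketch. This is an illustration written for this manual, not Wget's own code: the function name `restrict_file_name`, the use of uppercase hex digits, and the choice to let `windows` win when both `unix` and `windows` are given are all assumptions. It handles a single file name component and does not model the `+` and `@` substitutions of Windows mode.

```python
# Illustrative model only; not taken from Wget's source code.
UNIX_SPECIALS = b"/"
WINDOWS_SPECIALS = b'\\|/:?"*<>'


def restrict_file_name(name: bytes, modes: str = "unix") -> bytes:
    """Escape one file name component according to a comma-separated MODES."""
    selected = set(modes.split(","))
    specials = WINDOWS_SPECIALS if "windows" in selected else UNIX_SPECIALS

    # Case forcing does not change which characters get escaped.
    if "lowercase" in selected:
        name = name.lower()
    elif "uppercase" in selected:
        name = name.upper()

    out = bytearray()
    for byte in name:
        is_control = byte <= 31 or 128 <= byte <= 159
        must_escape = (
            byte in specials
            or (is_control and "nocontrol" not in selected)
            or (byte > 127 and "ascii" in selected)
        )
        if must_escape:
            out += b"%%%02X" % byte  # %HH; uppercase digits are an assumption
        else:
            out.append(byte)
    return bytes(out)
```

For example, the two-byte UTF-8 sequence `b"\xc2\x85"` contains a byte in the 128-159 range, so the sketch escapes it by default and leaves it alone under `nocontrol`.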
801 Force connecting to IPv4 or IPv6 addresses. With `--inet4-only'
802 or `-4', Wget will only connect to IPv4 hosts, ignoring AAAA
803 records in DNS, and refusing to connect to IPv6 addresses
804 specified in URLs. Conversely, with `--inet6-only' or `-6', Wget
805 will only connect to IPv6 hosts and ignore A records and IPv4
808 Neither option should be needed normally. By default, an
809 IPv6-aware Wget will use the address family specified by the
810 host's DNS record. If the DNS responds with both IPv4 and IPv6
811 addresses, Wget will try them in sequence until it finds one it
812 can connect to. (Also see `--prefer-family' option described
815 These options can be used to deliberately force the use of IPv4 or
816 IPv6 address families on dual family systems, usually to aid
817 debugging or to deal with broken network configuration. Only one
818 of `--inet6-only' and `--inet4-only' may be specified at the same
819 time. Neither option is available in Wget compiled without IPv6
822 `--prefer-family=none/IPv4/IPv6'
823 When given a choice of several addresses, connect to the addresses
824 with specified address family first. The address order returned by
825 DNS is used without change by default.
827 This avoids spurious errors and connect attempts when accessing
828 hosts that resolve to both IPv6 and IPv4 addresses from IPv4
829 networks. For example, `www.kame.net' resolves to
830 `2001:200:0:8002:203:47ff:fea5:3085' and to `203.178.141.194'.
831 When the preferred family is `IPv4', the IPv4 address is used
832 first; when the preferred family is `IPv6', the IPv6 address is
833 used first; if the specified value is `none', the address order
834 returned by DNS is used without change.
836 Unlike `-4' and `-6', this option doesn't inhibit access to any
837 address family; it only changes the _order_ in which the addresses
838 are accessed. Also note that the reordering performed by this
839 option is "stable"--it doesn't affect order of addresses of the
840 same family. That is, the relative order of all IPv4 addresses
841 and of all IPv6 addresses remains intact in all cases.
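The "stable" reordering described above can be sketched in Python as follows. The function name `order_addresses` is hypothetical, and the sketch only models the ordering rule; it says nothing about how Wget implements it.

```python
import ipaddress


def order_addresses(addresses, prefer="none"):
    """Put the preferred family first, keeping the order within each family."""
    if prefer == "none":
        return list(addresses)  # DNS order is used without change
    wanted = {"IPv4": 4, "IPv6": 6}[prefer]
    # sorted() is stable, so addresses of the same family keep their DNS order.
    return sorted(
        addresses,
        key=lambda addr: ipaddress.ip_address(addr).version != wanted,
    )
```

With the `www.kame.net' addresses from the example, preferring `IPv4' moves `203.178.141.194' to the front, while preferring `IPv6' or `none' leaves the list as DNS returned it.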
843 `--retry-connrefused'
844 Consider "connection refused" a transient error and try again.
845 Normally Wget gives up on a URL when it is unable to connect to the
846 site because failure to connect is taken as a sign that the server
847 is not running at all and that retries would not help. This
848 option is for mirroring unreliable sites whose servers tend to
849 disappear for short periods of time.
852 `--password=PASSWORD'
853 Specify the username USER and password PASSWORD for both FTP and
854 HTTP file retrieval. These parameters can be overridden using the
855 `--ftp-user' and `--ftp-password' options for FTP connections and
856 the `--http-user' and `--http-password' options for HTTP
860 Prompt for a password for each connection established. Cannot be
861 specified when `--password' is being used, because they are
865 Turn off internationalized URI (IRI) support. Use `--iri' to turn
866 it on. IRI support is activated by default.
868 You can set the default state of IRI support using the `iri'
869 command in `.wgetrc'. That setting may be overridden from the
872 `--local-encoding=ENCODING'
873 Force Wget to use ENCODING as the default system encoding. That
874 affects how Wget converts URLs specified as arguments from locale
875 to UTF-8 for IRI support.
877 Wget uses the function `nl_langinfo()' and then the `CHARSET'
878 environment variable to get the locale. If it fails, ASCII is used.
880 You can set the default local encoding using the `local_encoding'
881 command in `.wgetrc'. That setting may be overridden from the
884 `--remote-encoding=ENCODING'
885 Force Wget to use ENCODING as the default remote server encoding.
886 That affects how Wget converts URIs found in files from remote
887 encoding to UTF-8 during a recursive fetch. This option is only
888 useful for IRI support, for the interpretation of non-ASCII
891 For HTTP, remote encoding can be found in HTTP `Content-Type'
892 header and in HTML `Content-Type http-equiv' meta tag.
894 You can set the default encoding using the `remoteencoding'
895 command in `.wgetrc'. That setting may be overridden from the
899 File: wget.info, Node: Directory Options, Next: HTTP Options, Prev: Download Options, Up: Invoking
901 2.6 Directory Options
902 =====================
906 Do not create a hierarchy of directories when retrieving
907 recursively. With this option turned on, all files will get saved
908 to the current directory, without clobbering (if a name shows up
909 more than once, the filenames will get extensions `.n').
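The "without clobbering" rule can be pictured with a small Python sketch. The helper `unique_name` is hypothetical, and starting the numbering at 1 is an assumption; the text above only states that repeated names receive a `.n' extension.

```python
def unique_name(name, taken):
    """Return NAME, or NAME.n for the first n that is not already taken."""
    if name not in taken:
        return name
    n = 1  # assumed starting value for the numeric suffix
    while f"{name}.{n}" in taken:
        n += 1
    return f"{name}.{n}"
```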
912 `--force-directories'
913 The opposite of `-nd'--create a hierarchy of directories, even if
914 one would not have been created otherwise. E.g. `wget -x
915 http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
916 `fly.srk.fer.hr/robots.txt'.
919 `--no-host-directories'
920 Disable generation of host-prefixed directories. By default,
921 invoking Wget with `-r http://fly.srk.fer.hr/' will create a
922 structure of directories beginning with `fly.srk.fer.hr/'. This
923 option disables such behavior.
925 `--protocol-directories'
926 Use the protocol name as a directory component of local file
927 names. For example, with this option, `wget -r http://HOST' will
928 save to `http/HOST/...' rather than just to `HOST/...'.
931 Ignore NUMBER directory components. This is useful for getting a
932 fine-grained control over the directory where recursive retrieval
935 Take, for example, the directory at
936 `ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with `-r',
937 it will be saved locally under `ftp.xemacs.org/pub/xemacs/'.
938 While the `-nH' option can remove the `ftp.xemacs.org/' part, you
939 are still stuck with `pub/xemacs'. This is where `--cut-dirs'
940 comes in handy; it makes Wget not "see" NUMBER remote directory
941 components. Here are several examples of how `--cut-dirs' option
944 No options -> ftp.xemacs.org/pub/xemacs/
946 -nH --cut-dirs=1 -> xemacs/
947 -nH --cut-dirs=2 -> .
949 --cut-dirs=1 -> ftp.xemacs.org/xemacs/
952 If you just want to get rid of the directory structure, this
953 option is similar to a combination of `-nd' and `-P'. However,
954 unlike `-nd', `--cut-dirs' preserves subdirectories--for
955 instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be
956 placed in `xemacs/beta', as one would expect.
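The way `-nH' and `--cut-dirs' combine can be summarized in a few lines of Python. This is a sketch of the mapping shown in the examples above, under the assumption that the remote path is already split into directory components; `local_directory` and its parameter names are invented for the illustration.

```python
def local_directory(host, remote_dirs, cut_dirs=0, no_host=False):
    """Compute the local directory for a remote directory path."""
    parts = [] if no_host else [host]   # -nH drops the host component
    parts += remote_dirs[cut_dirs:]     # --cut-dirs skips leading components
    return ("/".join(parts) + "/") if parts else "."
```

Calling `local_directory("ftp.xemacs.org", ["pub", "xemacs"], cut_dirs=1, no_host=True)` yields `xemacs/`, matching the `-nH --cut-dirs=1' row above.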
959 `--directory-prefix=PREFIX'
960 Set directory prefix to PREFIX. The "directory prefix" is the
961 directory where all other files and subdirectories will be saved
962 to, i.e. the top of the retrieval tree. The default is `.' (the
966 File: wget.info, Node: HTTP Options, Next: HTTPS (SSL/TLS) Options, Prev: Directory Options, Up: Invoking
968 2.7 HTTP Options
969 ================
971 `--default-page=NAME'
972 Use NAME as the default file name when it isn't known (i.e., for
973 URLs that end in a slash), instead of `index.html'.
977 If a file of type `application/xhtml+xml' or `text/html' is
978 downloaded and the URL does not end with the regexp
979 `\.[Hh][Tt][Mm][Ll]?', this option will cause the suffix `.html'
980 to be appended to the local filename. This is useful, for
981 instance, when you're mirroring a remote site that uses `.asp'
982 pages, but you want the mirrored pages to be viewable on your
983 stock Apache server. Another good use for this is when you're
984 downloading CGI-generated materials. A URL like
985 `http://site.com/article.cgi?25' will be saved as
986 `article.cgi?25.html'.
988 Note that filenames changed in this way will be re-downloaded
989 every time you re-mirror a site, because Wget can't tell that the
990 local `X.html' file corresponds to remote URL `X' (since it
991 doesn't yet know that the URL produces output of type `text/html'
992 or `application/xhtml+xml'). To prevent this re-downloading, you
993 must use `-k' and `-K' so that the original version of the file
994 will be saved as `X.orig' (*note Recursive Retrieval Options::).
996 As of version 1.12, Wget will also ensure that any downloaded
997 files of type `text/css' end in the suffix `.css', and the option
998 was renamed from `--html-extension', to better reflect its new
999 behavior. The old option name is still acceptable, but should now
1000 be considered deprecated.
1002 At some point in the future, this option may well be expanded to
1003 include suffixes for other types of content, including content
1004 types that are not parsed by Wget.
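The suffix rule described for this option can be sketched in Python. The function `adjusted_name` is hypothetical; the HTML test uses the regexp quoted above, while the exact comparison used for `.css' (a plain, case-sensitive suffix check here) is an assumption.

```python
import re

# The regexp quoted above, anchored at the end of the name.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def adjusted_name(local_name, content_type):
    """Return the local file name after the suffix adjustment."""
    if content_type in ("text/html", "application/xhtml+xml"):
        if not HTML_SUFFIX.search(local_name):
            return local_name + ".html"
    elif content_type == "text/css":
        if not local_name.endswith(".css"):  # assumed case-sensitive check
            return local_name + ".css"
    return local_name
```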
1007 `--http-password=PASSWORD'
1008 Specify the username USER and password PASSWORD on an HTTP server.
1009 According to the type of the challenge, Wget will encode them
1010 using either the `basic' (insecure), the `digest', or the Windows
1011 `NTLM' authentication scheme.
1013 Another way to specify username and password is in the URL itself
1014 (*note URL Format::). Either method reveals your password to
1015 anyone who bothers to run `ps'. To prevent the passwords from
1016 being seen, store them in `.wgetrc' or `.netrc', and make sure to
1017 protect those files from other users with `chmod'. If the
1018 passwords are really important, do not leave them lying in those
1019 files either--edit the files and delete them after Wget has
1020 started the download.
1022 `--no-http-keep-alive'
1023 Turn off the "keep-alive" feature for HTTP downloads. Normally,
1024 Wget asks the server to keep the connection open so that, when you
1025 download more than one document from the same server, they get
1026 transferred over the same TCP connection. This saves time and at
1027 the same time reduces the load on the server.
1029 This option is useful when, for some reason, persistent
1030 (keep-alive) connections don't work for you, for example due to a
1031 server bug or due to the inability of server-side scripts to cope
1032 with the connections.
1035 Disable server-side cache. In this case, Wget will send the remote
1036 server an appropriate directive (`Pragma: no-cache') to get the
1037 file from the remote service, rather than returning the cached
1038 version. This is especially useful for retrieving and flushing
1039 out-of-date documents on proxy servers.
1041 Caching is allowed by default.
1044 Disable the use of cookies. Cookies are a mechanism for
1045 maintaining server-side state. The server sends the client a
1046 cookie using the `Set-Cookie' header, and the client responds with
1047 the same cookie upon further requests. Since cookies allow the
1048 server owners to keep track of visitors and for sites to exchange
1049 this information, some consider them a breach of privacy. The
1050 default is to use cookies; however, _storing_ cookies is not on by
1053 `--load-cookies FILE'
1054 Load cookies from FILE before the first HTTP retrieval. FILE is a
1055 textual file in the format originally used by Netscape's
1058 You will typically use this option when mirroring sites that
1059 require that you be logged in to access some or all of their
1060 content. The login process typically works by the web server
1061 issuing an HTTP cookie upon receiving and verifying your
1062 credentials. The cookie is then resent by the browser when
1063 accessing that part of the site, and so proves your identity.
1065 Mirroring such a site requires Wget to send the same cookies your
1066 browser sends when communicating with the site. This is achieved
1067 by `--load-cookies'--simply point Wget to the location of the
1068 `cookies.txt' file, and it will send the same cookies your browser
1069 would send in the same situation. Different browsers keep textual
1070 cookie files in different locations:
1073 The cookies are in `~/.netscape/cookies.txt'.
1075 Mozilla and Netscape 6.x.
1076 Mozilla's cookie file is also named `cookies.txt', located
1077 somewhere under `~/.mozilla', in the directory of your
1078 profile. The full path usually ends up looking somewhat like
1079 `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.
1082 You can produce a cookie file Wget can use by using the File
1083 menu, Import and Export, Export Cookies. This has been
1084 tested with Internet Explorer 5; it is not guaranteed to work
1085 with earlier versions.
1088 If you are using a different browser to create your cookies,
1089 `--load-cookies' will only work if you can locate or produce a
1090 cookie file in the Netscape format that Wget expects.
1092 If you cannot use `--load-cookies', there might still be an
1093 alternative. If your browser supports a "cookie manager", you can
1094 use it to view the cookies used when accessing the site you're
1095 mirroring. Write down the name and value of the cookie, and
1096 manually instruct Wget to send those cookies, bypassing the
1097 "official" cookie support:
1099 wget --no-cookies --header "Cookie: NAME=VALUE"
1101 `--save-cookies FILE'
1102 Save cookies to FILE before exiting. This will not save cookies
1103 that have expired or that have no expiry time (so-called "session
1104 cookies"), but also see `--keep-session-cookies'.
1106 `--keep-session-cookies'
1107 When specified, causes `--save-cookies' to also save session
1108 cookies. Session cookies are normally not saved because they are
1109 meant to be kept in memory and forgotten when you exit the browser.
1110 Saving them is useful on sites that require you to log in or to
1111 visit the home page before you can access some pages. With this
1112 option, multiple Wget runs are considered a single browser session
1113 as far as the site is concerned.
1115 Since the cookie file format does not normally carry session
1116 cookies, Wget marks them with an expiry timestamp of 0. Wget's
1117 `--load-cookies' recognizes those as session cookies, but it might
1118 confuse other browsers. Also note that cookies so loaded will be
1119 treated as other session cookies, which means that if you want
1120 `--save-cookies' to preserve them again, you must use
1121 `--keep-session-cookies' again.
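To check which entries in a saved cookie file are session cookies, a reader could use a sketch like the following. It assumes the usual Netscape `cookies.txt' layout of seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry, name, value) and relies on the expiry timestamp of 0 described above; `read_cookie_file` is a hypothetical helper, not part of Wget.

```python
def read_cookie_file(text):
    """Parse Netscape-format cookie lines and flag session cookies."""
    cookies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        domain, _subdomains, path, _secure, expiry, name, value = line.split("\t")
        cookies.append({
            "domain": domain,
            "path": path,
            "name": name,
            "value": value,
            "session": int(expiry) == 0,  # expiry 0 marks a session cookie
        })
    return cookies
```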
1124 Unfortunately, some HTTP servers (CGI programs, to be more
1125 precise) send out bogus `Content-Length' headers, which makes Wget
1126 go wild, as it thinks not all the document was retrieved. You can
1127 spot this syndrome if Wget retries getting the same document again
1128 and again, each time claiming that the (otherwise normal)
1129 connection has closed on the very same byte.
1131 With this option, Wget will ignore the `Content-Length' header--as
1132 if it never existed.
1134 `--header=HEADER-LINE'
1135 Send HEADER-LINE along with the rest of the headers in each HTTP
1136 request. The supplied header is sent as-is, which means it must
1137 contain name and value separated by colon, and must not contain
1140 You may define more than one additional header by specifying
1141 `--header' more than once.
1143 wget --header='Accept-Charset: iso-8859-2' \
1144 --header='Accept-Language: hr' \
1145 http://fly.srk.fer.hr/
1147 Specification of an empty string as the header value will clear all
1148 previous user-defined headers.
1150 As of Wget 1.10, this option can be used to override headers
1151 otherwise generated automatically. This example instructs Wget to
1152 connect to localhost, but to specify `foo.bar' in the `Host'
1155 wget --header="Host: foo.bar" http://localhost/
1157 In versions of Wget prior to 1.10 such use of `--header' caused
1158 sending of duplicate headers.
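The interaction between repeated `--header' options, the empty-string reset, and automatically generated headers can be modelled as below. This is a sketch of the behavior described for Wget 1.10 and later; the function `effective_headers`, the dictionary representation, and the case-insensitive matching of header names are assumptions made for the illustration.

```python
def effective_headers(automatic, header_options):
    """Combine automatic headers with the values of successive --header options."""
    user = {}
    for line in header_options:
        if line == "":
            user.clear()  # an empty value discards earlier user-defined headers
            continue
        name, _, value = line.partition(":")
        user[name.strip().lower()] = (name.strip(), value.strip())

    merged = {name.lower(): (name, value) for name, value in automatic.items()}
    merged.update(user)  # user-supplied headers override automatic ones
    return dict(merged.values())
```

With `{"Host": "localhost"}` as the automatic set and `["Host: foo.bar"]` as the options, the result carries `Host: foo.bar` once rather than twice.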
1160 `--max-redirect=NUMBER'
1161 Specifies the maximum number of redirections to follow for a
1162 resource. The default is 20, which is usually far more than
1163 necessary. However, on those occasions where you want to allow
1164 more (or fewer), this is the option to use.
1167 `--proxy-password=PASSWORD'
1168 Specify the username USER and password PASSWORD for authentication
1169 on a proxy server. Wget will encode them using the `basic'
1170 authentication scheme.
1172 Security considerations similar to those with `--http-password'
1173 pertain here as well.
1176 Include `Referer: URL' header in HTTP request. Useful for
1177 retrieving documents with server-side processing that assume they
1178 are always being retrieved by interactive web browsers and only
1179 come out properly when Referer is set to one of the pages that
1183 Save the headers sent by the HTTP server to the file, preceding the
1184 actual contents, with an empty line as the separator.
1187 `--user-agent=AGENT-STRING'
1188 Identify as AGENT-STRING to the HTTP server.
1190 The HTTP protocol allows the clients to identify themselves using a
1191 `User-Agent' header field. This enables distinguishing the WWW
1192 software, usually for statistical purposes or for tracing of
1193 protocol violations. Wget normally identifies as `Wget/VERSION',
1194 VERSION being the current version number of Wget.
1196 However, some sites have been known to impose the policy of
1197 tailoring the output according to the `User-Agent'-supplied
1198 information. While this is not such a bad idea in theory, it has
1199 been abused by servers denying information to clients other than
1200 (historically) Netscape or, more frequently, Microsoft Internet
1201 Explorer. This option allows you to change the `User-Agent' line
1202 issued by Wget. Use of this option is discouraged, unless you
1203 really know what you are doing.
1205 Specifying an empty user agent with `--user-agent=""' instructs Wget
1206 not to send the `User-Agent' header in HTTP requests.
1208 `--post-data=STRING'
1210 Use POST as the method for all HTTP requests and send the specified
1211 data in the request body. `--post-data' sends STRING as data,
1212 whereas `--post-file' sends the contents of FILE. Other than
1213 that, they work in exactly the same way. In particular, they
1214 _both_ expect content of the form `key1=value1&key2=value2', with
1215 percent-encoding for special characters; the only difference is
1216 that one expects its content as a command-line parameter and the
1217 other accepts its content from a file. In particular,
1218 `--post-file' is _not_ for transmitting files as form attachments:
1219 those must appear as `key=value' data (with appropriate
1220 percent-coding) just like everything else. Wget does not currently
1221 support `multipart/form-data' for transmitting POST data; only
1222 `application/x-www-form-urlencoded'. Only one of `--post-data' and
1223 `--post-file' should be specified.
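One way to prepare a correctly percent-encoded `key1=value1&key2=value2' string is to let a library do the encoding. The Python sketch below uses the standard `urllib.parse.urlencode` function; the helper name `form_body` and the field values are made up for the example. The resulting string could be passed to `--post-data', or written to a regular file for `--post-file'.

```python
from urllib.parse import urlencode


def form_body(fields):
    """Encode a mapping as application/x-www-form-urlencoded data."""
    return urlencode(fields)


# The "&" and "=" inside the password value are percent-encoded,
# so they cannot be mistaken for field separators.
body = form_body({"user": "foo", "password": "b&r=1"})
```

Note that `urlencode` writes spaces as `+', which is the convention for this content type.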
1225 Please be aware that Wget needs to know the size of the POST data
1226 in advance. Therefore the argument to `--post-file' must be a
1227 regular file; specifying a FIFO or something like `/dev/stdin'
1228 won't work. It's not quite clear how to work around this
1229 limitation inherent in HTTP/1.0. Although HTTP/1.1 introduces
1230 "chunked" transfer that doesn't require knowing the request length
1231 in advance, a client can't use chunked unless it knows it's
1232 talking to an HTTP/1.1 server. And it can't know that until it
1233 receives a response, which in turn requires the request to have
1234 been completed - a chicken-and-egg problem.
1236 Note: if Wget is redirected after the POST request is completed, it
1237 will not send the POST data to the redirected URL. This is because
1238 URLs that process POST often respond with a redirection to a
1239 regular page, which does not desire or accept POST. It is not
1240 completely clear that this behavior is optimal; if it doesn't work
1241 out, it might be changed in the future.
1243 This example shows how to log in to a server using POST and then
1244 proceed to download the desired pages, presumably only accessible
1245 to authorized users:
1247 # Log in to the server. This needs to be done only once.
1248 wget --save-cookies cookies.txt \
1249 --post-data 'user=foo&password=bar' \
1250 http://server.com/auth.php
1252 # Now grab the page or pages we care about.
1253 wget --load-cookies cookies.txt \
1254 -p http://server.com/interesting/article.php
1256 If the server is using session cookies to track user
1257 authentication, the above will not work because `--save-cookies'
1258 will not save them (and neither will browsers) and the
1259 `cookies.txt' file will be empty. In that case use
1260 `--keep-session-cookies' along with `--save-cookies' to force
1261 saving of session cookies.
1263 `--content-disposition'
1264 If this is set to on, experimental (not fully-functional) support
1265 for `Content-Disposition' headers is enabled. This can currently
1266 result in extra round-trips to the server for a `HEAD' request,
1267 and is known to suffer from a few bugs, which is why it is not
1268 currently enabled by default.
1270 This option is useful for some file-downloading CGI programs that
1271 use `Content-Disposition' headers to describe what the name of a
1272 downloaded file should be.
1274 `--auth-no-challenge'
1275 If this option is given, Wget will send Basic HTTP authentication
1276 information (plaintext username and password) for all requests,
1277 just like Wget 1.10.2 and prior did by default.
1279 Use of this option is not recommended, and is intended only to
1280 support a few obscure servers, which never send HTTP
1281 authentication challenges, but accept unsolicited auth info, say,
1282 in addition to form-based authentication.
1286 File: wget.info, Node: HTTPS (SSL/TLS) Options, Next: FTP Options, Prev: HTTP Options, Up: Invoking
1288 2.8 HTTPS (SSL/TLS) Options
1289 ===========================
1291 To support encrypted HTTP (HTTPS) downloads, Wget must be compiled with
1292 an external SSL library, currently OpenSSL. If Wget is compiled
1293 without SSL support, none of these options are available.
1295 `--secure-protocol=PROTOCOL'
1296 Choose the secure protocol to be used. Legal values are `auto',
1297 `SSLv2', `SSLv3', and `TLSv1'. If `auto' is used, the SSL library
1298 is given the liberty of choosing the appropriate protocol
1299 automatically, which is achieved by sending an SSLv2 greeting and
1300 announcing support for SSLv3 and TLSv1. This is the default.
1302 Specifying `SSLv2', `SSLv3', or `TLSv1' forces the use of the
1303 corresponding protocol. This is useful when talking to old and
1304 buggy SSL server implementations that make it hard for OpenSSL to
1305 choose the correct protocol version. Fortunately, such servers are
1308 `--no-check-certificate'
1309 Don't check the server certificate against the available
1310 certificate authorities. Also don't require the URL host name to
1311 match the common name presented by the certificate.
1313 As of Wget 1.10, the default is to verify the server's certificate
1314 against the recognized certificate authorities, breaking the SSL
1315 handshake and aborting the download if the verification fails.
1316 Although this provides more secure downloads, it does break
1317 interoperability with some sites that worked with previous Wget
1318 versions, particularly those using self-signed, expired, or
1319 otherwise invalid certificates. This option forces an "insecure"
1320 mode of operation that turns the certificate verification errors
1321 into warnings and allows you to proceed.
1323 If you encounter "certificate verification" errors or ones saying
1324 that "common name doesn't match requested host name", you can use
1325 this option to bypass the verification and proceed with the
1326 download. _Only use this option if you are otherwise convinced of
1327 the site's authenticity, or if you really don't care about the
1328 validity of its certificate._ It is almost always a bad idea not
1329 to check the certificates when transmitting confidential or
1332 `--certificate=FILE'
1333 Use the client certificate stored in FILE. This is needed for
1334 servers that are configured to require certificates from the
1335 clients that connect to them. Normally a certificate is not
1336 required and this switch is optional.
1338 `--certificate-type=TYPE'
1339 Specify the type of the client certificate. Legal values are
1340 `PEM' (assumed by default) and `DER', also known as `ASN1'.
1342 `--private-key=FILE'
1343 Read the private key from FILE. This allows you to provide the
1344 private key in a file separate from the certificate.
1346 `--private-key-type=TYPE'
1347 Specify the type of the private key. Accepted values are `PEM'
1348 (the default) and `DER'.
1350 `--ca-certificate=FILE'
1351 Use FILE as the file with the bundle of certificate authorities
1352 ("CA") to verify the peers. The certificates must be in PEM
1355 Without this option Wget looks for CA certificates at the
1356 system-specified locations, chosen at OpenSSL installation time.
1358 `--ca-directory=DIRECTORY'
1359 Specifies directory containing CA certificates in PEM format. Each
1360 file contains one CA certificate, and the file name is based on a
1361 hash value derived from the certificate. This is achieved by
1362 processing a certificate directory with the `c_rehash' utility
1363 supplied with OpenSSL. Using `--ca-directory' is more efficient
1364 than `--ca-certificate' when many certificates are installed
1365 because it allows Wget to fetch certificates on demand.
1367 Without this option Wget looks for CA certificates at the
1368 system-specified locations, chosen at OpenSSL installation time.
1370 `--random-file=FILE'
1371 Use FILE as the source of random data for seeding the
1372 pseudo-random number generator on systems without `/dev/random'.
1374 On such systems the SSL library needs an external source of
1375 randomness to initialize. Randomness may be provided by EGD (see
1376 `--egd-file' below) or read from an external source specified by
1377 the user. If this option is not specified, Wget looks for random
1378 data in `$RANDFILE' or, if that is unset, in `$HOME/.rnd'. If
1379 none of those are available, it is likely that SSL encryption will
1382 If you're getting the "Could not seed OpenSSL PRNG; disabling SSL."
1383 error, you should provide random data using some of the methods
1387 Use FILE as the EGD socket. EGD stands for "Entropy Gathering
1388 Daemon", a user-space program that collects data from various
1389 unpredictable system sources and makes it available to other
1390 programs that might need it. Encryption software, such as the SSL
1391 library, needs sources of non-repeating randomness to seed the
1392 random number generator used to produce cryptographically strong
1395 OpenSSL allows the user to specify his own source of entropy using
1396 the `RANDFILE' environment variable. If this variable is unset,
1397 or if the specified file does not produce enough randomness,
1398 OpenSSL will read random data from EGD socket specified using this
1401 If this option is not specified (and the equivalent startup
1402 command is not used), EGD is never contacted. EGD is not needed
1403 on modern Unix systems that support `/dev/random'.
1406 File: wget.info, Node: FTP Options, Next: Recursive Retrieval Options, Prev: HTTPS (SSL/TLS) Options, Up: Invoking
1408 2.9 FTP Options
1409 ===============
1412 `--ftp-password=PASSWORD'
1413 Specify the username USER and password PASSWORD on an FTP server.
1414 Without this, or the corresponding startup option, the password
1415 defaults to `-wget@', normally used for anonymous FTP.
1417 Another way to specify username and password is in the URL itself
1418 (*note URL Format::). Either method reveals your password to
1419 anyone who bothers to run `ps'. To prevent the passwords from
1420 being seen, store them in `.wgetrc' or `.netrc', and make sure to
1421 protect those files from other users with `chmod'. If the
1422 passwords are really important, do not leave them lying in those
1423 files either--edit the files and delete them after Wget has
1424 started the download.
1426 `--no-remove-listing'
1427 Don't remove the temporary `.listing' files generated by FTP
1428 retrievals. Normally, these files contain the raw directory
1429 listings received from FTP servers. Not removing them can be
1430 useful for debugging purposes, or when you want to be able to
1431 easily check on the contents of remote server directories (e.g. to
1432 verify that a mirror you're running is complete).
1434 Note that even though Wget writes to a known filename for this
1435 file, this is not a security hole in the scenario of a user making
1436 `.listing' a symbolic link to `/etc/passwd' or something and
1437 asking `root' to run Wget in his or her directory. Depending on
1438 the options used, either Wget will refuse to write to `.listing',
1439 making the globbing/recursion/time-stamping operation fail, or the
1440 symbolic link will be deleted and replaced with the actual
1441 `.listing' file, or the listing will be written to a
1442 `.listing.NUMBER' file.
1444 Even though this situation isn't a problem, `root' should
1445 never run Wget in a non-trusted user's directory. A user could do
1446 something as simple as linking `index.html' to `/etc/passwd' and
1447 asking `root' to run Wget with `-N' or `-r' so the file will be
1451 Turn off FTP globbing. Globbing refers to the use of shell-like
1452 special characters ("wildcards"), like `*', `?', `[' and `]' to
1453 retrieve more than one file from the same directory at once, like:
1455 wget ftp://gnjilux.srk.fer.hr/*.msg
1457 By default, globbing will be turned on if the URL contains a
1458 globbing character. This option may be used to turn globbing on
1461 You may have to quote the URL to protect it from being expanded by
1462 your shell. Globbing makes Wget look for a directory listing,
1463 which is system-specific. This is why it currently works only
1464 with Unix FTP servers (and the ones emulating Unix `ls' output).
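Conceptually, globbing amounts to fetching the directory listing and keeping the names that match the pattern. The Python sketch below illustrates that idea with the standard `fnmatch` module; the helper `glob_listing` is hypothetical, and Wget's own matcher may differ from `fnmatch` in details.

```python
from fnmatch import fnmatchcase


def glob_listing(names, pattern):
    """Keep the names from a directory listing that match a shell-like pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

For a listing containing `a.msg', `b.txt' and `c.msg', the pattern `*.msg' selects the first and last names.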
1467 Disable the use of the "passive" FTP transfer mode. Passive FTP
1468 mandates that the client connect to the server to establish the
1469 data connection rather than the other way around.
1471 If the machine is connected to the Internet directly, both passive
1472 and active FTP should work equally well. Behind most firewall and
1473 NAT configurations passive FTP has a better chance of working.
1474 However, in some rare firewall configurations, active FTP actually
1475 works when passive FTP doesn't. If you suspect this to be the
1476 case, use this option, or set `passive_ftp=off' in your init file.
1479 Usually, when retrieving FTP directories recursively and a symbolic
1480 link is encountered, the linked-to file is not downloaded.
1481 Instead, a matching symbolic link is created on the local
1482 filesystem. The pointed-to file will not be downloaded unless
1483 this recursive retrieval would have encountered it separately and
1484 downloaded it anyway.
1486 When `--retr-symlinks' is specified, however, symbolic links are
1487 traversed and the pointed-to files are retrieved. At this time,
1488 this option does not cause Wget to traverse symlinks to
1489 directories and recurse through them, but in the future it should
1490 be enhanced to do this.
1492 Note that when retrieving a file (not a directory) because it was
1493 specified on the command-line, rather than because it was recursed
1494 to, this option has no effect. Symbolic links are always
1495 traversed in this case.
1498 File: wget.info, Node: Recursive Retrieval Options, Next: Recursive Accept/Reject Options, Prev: FTP Options, Up: Invoking
1500 2.10 Recursive Retrieval Options
1501 ================================
1503 `-r'
1504 `--recursive'
1505 Turn on recursive retrieving. *Note Recursive Download::, for more
1506 details.
1508 `-l DEPTH'
1509 `--level=DEPTH'
1510 Specify recursion maximum depth level DEPTH (*note Recursive
1511 Download::). The default maximum depth is 5.
1513 `--delete-after'
1514 This option tells Wget to delete every single file it downloads,
1515 _after_ having done so. It is useful for pre-fetching popular
1516 pages through a proxy, e.g.:
1518 wget -r -nd --delete-after http://whatever.com/~popular/page/
1520 The `-r' option is to retrieve recursively, and `-nd' to not
1521 create directories.
1523 Note that `--delete-after' deletes files on the local machine. It
1524 does not issue the `DELE' command to remote FTP sites, for
1525 instance. Also note that when `--delete-after' is specified,
1526 `--convert-links' is ignored, so `.orig' files are simply not
1527 created in the first place.
1529 `-k'
1530 `--convert-links'
1531 After the download is complete, convert the links in the document
1532 to make them suitable for local viewing. This affects not only
1533 the visible hyperlinks, but any part of the document that links to
1534 external content, such as embedded images, links to style sheets,
1535 hyperlinks to non-HTML content, etc.
1537 Each link will be changed in one of the two ways:
1539 * The links to files that have been downloaded by Wget will be
1540 changed to refer to the file they point to as a relative link.
1542 Example: if the downloaded file `/foo/doc.html' links to
1543 `/bar/img.gif', also downloaded, then the link in `doc.html'
1544 will be modified to point to `../bar/img.gif'. This kind of
1545 transformation works reliably for arbitrary combinations of
1546 directories.
1548 * The links to files that have not been downloaded by Wget will
1549 be changed to include host name and absolute path of the
1550 location they point to.
1552 Example: if the downloaded file `/foo/doc.html' links to
1553 `/bar/img.gif' (or to `../bar/img.gif'), then the link in
1554 `doc.html' will be modified to point to
1555 `http://HOSTNAME/bar/img.gif'.
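
The first kind of conversion can be sketched in shell. This is only an illustration of how `../bar/img.gif' follows from the two paths in the example above; `relative_link' is a hypothetical helper, not part of Wget, and it assumes both arguments are absolute paths on the same host.

```shell
# Illustrative sketch only: compute the relative link from one downloaded
# file to another. relative_link is a hypothetical name; both arguments are
# assumed to be absolute paths (starting with "/") on the same host.
relative_link() (
    from_dir=${1%/*}/    # directory of the referring document, e.g. /foo/
    target=$2
    up=
    # Climb one directory at a time until from_dir is a prefix of target.
    while [ "${target#"$from_dir"}" = "$target" ]; do
        from_dir=${from_dir%/*/}/
        up=../$up
    done
    printf '%s\n' "$up${target#"$from_dir"}"
)

relative_link /foo/doc.html /bar/img.gif    # prints ../bar/img.gif
```

Because the result depends only on the relative position of the two files, the downloaded hierarchy can be moved as a whole without breaking such links.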
1557 Because of this, local browsing works reliably: if a linked file
1558 was downloaded, the link will refer to its local name; if it was
1559 not downloaded, the link will refer to its full Internet address
1560 rather than presenting a broken link. The fact that the former
1561 links are converted to relative links ensures that you can move
1562 the downloaded hierarchy to another directory.
1564 Note that only at the end of the download can Wget know which
1565 links have been downloaded. Because of that, the work done by
1566 `-k' will be performed at the end of all the downloads.
1568 `-K'
1569 `--backup-converted'
1570 When converting a file, back up the original version with a `.orig'
1571 suffix. Affects the behavior of `-N' (*note HTTP Time-Stamping
1572 Internals::).
1574 `-m'
1575 `--mirror'
1576 Turn on options suitable for mirroring. This option turns on
1577 recursion and time-stamping, sets infinite recursion depth and
1578 keeps FTP directory listings. It is currently equivalent to `-r
1579 -N -l inf --no-remove-listing'.
1581 `-p'
1582 `--page-requisites'
1583 This option causes Wget to download all the files that are
1584 necessary to properly display a given HTML page. This includes
1585 such things as inlined images, sounds, and referenced stylesheets.
1587 Ordinarily, when downloading a single HTML page, any requisite
1588 documents that may be needed to display it properly are not
1589 downloaded. Using `-r' together with `-l' can help, but since
1590 Wget does not ordinarily distinguish between external and inlined
1591 documents, one is generally left with "leaf documents" that are
1592 missing their requisites.
1594 For instance, say document `1.html' contains an `<IMG>' tag
1595 referencing `1.gif' and an `<A>' tag pointing to external document
1596 `2.html'. Say that `2.html' is similar but that its image is
1597 `2.gif' and it links to `3.html'. Say this continues up to some
1598 arbitrarily high number.
1600 If one executes the command:
1602 wget -r -l 2 http://SITE/1.html
1604 then `1.html', `1.gif', `2.html', `2.gif', and `3.html' will be
1605 downloaded. As you can see, `3.html' is without its requisite
1606 `3.gif' because Wget is simply counting the number of hops (up to
1607 2) away from `1.html' in order to determine where to stop the
1608 recursion. However, with this command:
1610 wget -r -l 2 -p http://SITE/1.html
1612 all the above files _and_ `3.html''s requisite `3.gif' will be
1613 downloaded. Similarly,
1615 wget -r -l 1 -p http://SITE/1.html
1617 will cause `1.html', `1.gif', `2.html', and `2.gif' to be
1618 downloaded. One might think that:
1620 wget -r -l 0 -p http://SITE/1.html
1622 would download just `1.html' and `1.gif', but unfortunately this
1623 is not the case, because `-l 0' is equivalent to `-l inf'--that
1624 is, infinite recursion. To download a single HTML page (or a
1625 handful of them, all specified on the command-line or in a `-i'
1626 URL input file) and its (or their) requisites, simply leave off
1627 `-r' and `-l':
1629 wget -p http://SITE/1.html
1631 Note that Wget will behave as if `-r' had been specified, but only
1632 that single page and its requisites will be downloaded. Links
1633 from that page to external documents will not be followed.
1634 Actually, to download a single page and all its requisites (even
1635 if they exist on separate websites), and make sure the lot
1636 displays properly locally, this author likes to use a few options
1637 in addition to `-p':
1639 wget -E -H -k -K -p http://SITE/DOCUMENT
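
The hop counting in the `1.html' example above can be reproduced with a small shell model. It is a sketch of the reasoning only: `fetched_files' is a hypothetical helper that assumes the chain described earlier (`N.html' embeds `N.gif' and links to the next page), and it does not model `-l 0', which Wget treats as infinite depth.

```shell
# Hypothetical model, not a Wget feature: N.html embeds N.gif and links to
# (N+1).html. Usage: fetched_files LEVEL [p]   (LEVEL >= 1, "p" stands for -p)
fetched_files() (
    level=$1 page_requisites=$2
    out= n=1
    # Pages 1.html .. (LEVEL+1).html lie within LEVEL hops of 1.html.
    while [ "$n" -le $((level + 1)) ]; do
        out="$out $n.html"
        # N.gif is N hops from 1.html, so plain -r fetches it only while
        # N <= LEVEL; with -p the requisites of the last page are added.
        if [ "$n" -le "$level" ] || [ "$page_requisites" = p ]; then
            out="$out $n.gif"
        fi
        n=$((n + 1))
    done
    echo $out    # unquoted on purpose: drops the leading space
)

fetched_files 2      # prints 1.html 1.gif 2.html 2.gif 3.html
fetched_files 2 p    # prints 1.html 1.gif 2.html 2.gif 3.html 3.gif
```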
1641 To finish off this topic, it's worth knowing that Wget's idea of an
1642 external document link is any URL specified in an `<A>' tag, an
1643 `<AREA>' tag, or a `<LINK>' tag other than `<LINK
1644 REL="stylesheet">'.
1646 `--strict-comments'
1647 Turn on strict parsing of HTML comments. The default is to
1648 terminate comments at the first occurrence of `-->'.
1650 According to specifications, HTML comments are expressed as SGML
1651 "declarations". Declaration is special markup that begins with
1652 `<!' and ends with `>', such as `<!DOCTYPE ...>', that may contain
1653 comments between a pair of `--' delimiters. HTML comments are
1654 "empty declarations", SGML declarations without any non-comment
1655 text. Therefore, `<!--foo-->' is a valid comment, and so is
1656 `<!--one-- --two-->', but `<!--1--2-->' is not.
1658 On the other hand, most HTML writers don't perceive comments as
1659 anything other than text delimited with `<!--' and `-->', which is
1660 not quite the same. For example, something like `<!------------>'
1661 works as a valid comment as long as the number of dashes is a
1662 multiple of four (!). If not, the comment technically lasts until
1663 the next `--', which may be at the other end of the document.
1664 Because of this, many popular browsers completely ignore the
1665 specification and implement what users have come to expect:
1666 comments delimited with `<!--' and `-->'.
1668 Until version 1.9, Wget interpreted comments strictly, which
1669 resulted in missing links in many web pages that displayed fine in
1670 browsers, but had the misfortune of containing non-compliant
1671 comments. Beginning with version 1.9, Wget has joined the ranks
1672 of clients that implement "naive" comments, terminating each
1673 comment at the first occurrence of `-->'.
1675 If, for whatever reason, you want strict comment parsing, use this
1676 option to turn it on.
1679 File: wget.info, Node: Recursive Accept/Reject Options, Next: Exit Status, Prev: Recursive Retrieval Options, Up: Invoking
1681 2.11 Recursive Accept/Reject Options
1682 ====================================
1684 `-A ACCLIST --accept ACCLIST'
1685 `-R REJLIST --reject REJLIST'
1686 Specify comma-separated lists of file name suffixes or patterns to
1687 accept or reject (*note Types of Files::). Note that if any of the
1688 wildcard characters, `*', `?', `[' or `]', appear in an element of
1689 ACCLIST or REJLIST, it will be treated as a pattern, rather than a
1690 suffix.
1692 `-D DOMAIN-LIST'
1693 `--domains=DOMAIN-LIST'
1694 Set domains to be followed. DOMAIN-LIST is a comma-separated list
1695 of domains. Note that it does _not_ turn on `-H'.
1697 `--exclude-domains DOMAIN-LIST'
1698 Specify the domains that are _not_ to be followed. (*note
1699 Spanning Hosts::).
1701 `--follow-ftp'
1702 Follow FTP links from HTML documents. Without this option, Wget
1703 will ignore all the FTP links.
1705 `--follow-tags=LIST'
1706 Wget has an internal table of HTML tag / attribute pairs that it
1707 considers when looking for linked documents during a recursive
1708 retrieval. If a user wants only a subset of those tags to be
1709 considered, however, he or she should specify such tags in a
1710 comma-separated LIST with this option.
1712 `--ignore-tags=LIST'
1713 This is the opposite of the `--follow-tags' option. To skip
1714 certain HTML tags when recursively looking for documents to
1715 download, specify them in a comma-separated LIST.
1717 In the past, this option was the best bet for downloading a single
1718 page and its requisites, using a command-line like:
1720 wget --ignore-tags=a,area -H -k -K -r http://SITE/DOCUMENT
1722 However, the author of this option came across a page with tags
1723 like `<LINK REL="home" HREF="/">' and came to the realization that
1724 specifying tags to ignore was not enough. One can't just tell
1725 Wget to ignore `<LINK>', because then stylesheets will not be
1726 downloaded. Now the best bet for downloading a single page and
1727 its requisites is the dedicated `--page-requisites' option.
1729 `--ignore-case'
1730 Ignore case when matching files and directories. This influences
1731 the behavior of -R, -A, -I, and -X options, as well as globbing
1732 implemented when downloading from FTP sites. For example, with
1733 this option, `-A *.txt' will match `file1.txt', but also
1734 `file2.TXT', `file3.TxT', and so on.
1736 `-H'
1737 `--span-hosts'
1738 Enable spanning across hosts when doing recursive retrieving
1739 (*note Spanning Hosts::).
1741 `-L'
1742 `--relative'
1743 Follow relative links only. Useful for retrieving a specific home
1744 page without any distractions, not even those from the same hosts
1745 (*note Relative Links::).
1747 `-I LIST'
1748 `--include-directories=LIST'
1749 Specify a comma-separated list of directories you wish to follow
1750 when downloading (*note Directory-Based Limits::). Elements of
1751 LIST may contain wildcards.
1753 `-X LIST'
1754 `--exclude-directories=LIST'
1755 Specify a comma-separated list of directories you wish to exclude
1756 from download (*note Directory-Based Limits::). Elements of LIST
1757 may contain wildcards.
1760 `-np'
1761 `--no-parent'
1762 Do not ever ascend to the parent directory when retrieving
1763 recursively. This is a useful option, since it guarantees that
1764 only the files _below_ a certain hierarchy will be downloaded.
1765 *Note Directory-Based Limits::, for more details.
1768 File: wget.info, Node: Exit Status, Prev: Recursive Accept/Reject Options, Up: Invoking
1770 2.12 Exit Status
1771 ================
1773 Wget may return one of several error codes if it encounters problems.
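1775 `0'
1776 No problems occurred.
1778 `1'
1779 Generic error code.
1781 `2'
1782 Parse error--for instance, when parsing command-line options, the
1783 `.wgetrc' or `.netrc'...
1785 `3'
1786 File I/O error.
1788 `4'
1789 Network failure.
1791 `5'
1792 SSL verification failure.
1794 `6'
1795 Username/password authentication failure.
1797 `7'
1798 Protocol errors.
1800 `8'
1801 Server issued an error response.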
1803 With the exceptions of 0 and 1, the lower-numbered exit codes take
1804 precedence over higher-numbered ones, when multiple types of errors are
1805 encountered.
1807 In versions of Wget prior to 1.12, Wget's exit status tended to be
1808 unhelpful and inconsistent. Recursive downloads would virtually always
1809 return 0 (success), regardless of any issues encountered, and
1810 non-recursive fetches only returned the status corresponding to the
1811 most recently-attempted download.
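
A calling script can turn the exit status into a readable message. The following is a minimal sketch; `wget_status_text' is a hypothetical helper name, and the numeric codes are assumed to be those of the full exit status list in the Wget 1.12 manual (0 through 8).

```shell
# Sketch: translate a Wget exit status into its documented meaning.
# wget_status_text is a hypothetical helper, not something Wget provides.
wget_status_text() {
    case $1 in
        0) echo "No problems occurred" ;;
        1) echo "Generic error code" ;;
        2) echo "Parse error" ;;
        3) echo "File I/O error" ;;
        4) echo "Network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "Username/password authentication failure" ;;
        7) echo "Protocol errors" ;;
        8) echo "Server issued an error response" ;;
        *) echo "Unknown status $1" ;;
    esac
}
```

A typical use would be to run `wget -q http://SITE/file' and then call `wget_status_text $?' on the next line.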
1814 File: wget.info, Node: Recursive Download, Next: Following Links, Prev: Invoking, Up: Top
1816 3 Recursive Download
1817 ********************
1819 GNU Wget is capable of traversing parts of the Web (or a single HTTP or
1820 FTP server), following links and directory structure. We refer to this
1821 as "recursive retrieval", or "recursion".
1823 With HTTP URLs, Wget retrieves and parses the HTML or CSS from the
1824 given URL, retrieving the files the document refers to, through markup
1825 like `href' or `src', or CSS URI values specified using the `url()'
1826 functional notation. If the freshly downloaded file is also of type
1827 `text/html', `application/xhtml+xml', or `text/css', it will be parsed
1828 and followed further.
1830 Recursive retrieval of HTTP and HTML/CSS content is "breadth-first".
1831 This means that Wget first downloads the requested document, then the
1832 documents linked from that document, then the documents linked by them,
1833 and so on. In other words, Wget first downloads the documents at depth
1834 1, then those at depth 2, and so on until the specified maximum depth.
1836 The maximum "depth" to which the retrieval may descend is specified
1837 with the `-l' option. The default maximum depth is five layers.
1839 When retrieving an FTP URL recursively, Wget will retrieve all the
1840 data from the given directory tree (including the subdirectories up to
1841 the specified depth) on the remote server, creating its mirror image
1842 locally. FTP retrieval is also limited by the `depth' parameter.
1843 Unlike HTTP recursion, FTP recursion is performed depth-first.
1845 By default, Wget will create a local directory tree, corresponding to
1846 the one found on the remote server.
1848 Recursive retrieving can find a number of applications, the most
1849 important of which is mirroring. It is also useful for WWW
1850 presentations, and any other opportunities where slow network
1851 connections should be bypassed by storing the files locally.
1853 You should be warned that recursive downloads can overload the remote
1854 servers. Because of that, many administrators frown upon them and may
1855 ban access from your site if they detect very fast downloads of big
1856 amounts of content. When downloading from Internet servers, consider
1857 using the `-w' option to introduce a delay between accesses to the
1858 server. The download will take a while longer, but the server
1859 administrator will not be alarmed by your rudeness.
1861 Of course, recursive download may cause problems on your machine. If
1862 left to run unchecked, it can easily fill up the disk. If downloading
1863 from a local network, it can also take bandwidth on the system, as well as
1864 consume memory and CPU.
1866 Try to specify the criteria that match the kind of download you are
1867 trying to achieve. If you want to download only one page, use
1868 `--page-requisites' without any additional recursion. If you want to
1869 download things under one directory, use `-np' to avoid downloading
1870 things from other directories. If you want to download all the files
1871 from one directory, use `-l 1' to make sure the recursion depth never
1872 exceeds one. *Note Following Links::, for more information about this.
1874 Recursive retrieval should be used with care. Don't say you were not
1875 warned.
1878 File: wget.info, Node: Following Links, Next: Time-Stamping, Prev: Recursive Download, Up: Top
1880 4 Following Links
1881 *****************
1883 When retrieving recursively, one does not wish to retrieve loads of
1884 unnecessary data. Most of the time the users bear in mind exactly what
1885 they want to download, and want Wget to follow only specific links.
1887 For example, if you wish to download the music archive from
1888 `fly.srk.fer.hr', you will not want to download all the home pages that
1889 happen to be referenced by an obscure part of the archive.
1891 Wget possesses several mechanisms that allow you to fine-tune which
1892 links it will follow.
1896 * Spanning Hosts:: (Un)limiting retrieval based on host name.
1897 * Types of Files:: Getting only certain files.
1898 * Directory-Based Limits:: Getting only certain directories.
1899 * Relative Links:: Follow relative links only.
1900 * FTP Links:: Following FTP links.
1903 File: wget.info, Node: Spanning Hosts, Next: Types of Files, Prev: Following Links, Up: Following Links
1905 4.1 Spanning Hosts
1906 ==================
1908 Wget's recursive retrieval normally refuses to visit hosts different
1909 from the one you specified on the command line. This is a reasonable
1910 default; without it, every retrieval would have the potential to turn
1911 your Wget into a small version of Google.
1913 However, visiting different hosts, or "host spanning," is sometimes
1914 a useful option. Maybe the images are served from a different server.
1915 Maybe you're mirroring a site that consists of pages interlinked between
1916 three servers. Maybe the server has two equivalent names, and the HTML
1917 pages refer to both interchangeably.
1919 Span to any host--`-H'
1920 The `-H' option turns on host spanning, thus allowing Wget's
1921 recursive run to visit any host referenced by a link. Unless
1922 sufficient recursion-limiting criteria are applied, these
1923 foreign hosts will typically link to yet more hosts, and so on
1924 until Wget ends up sucking up much more data than you have
1925 intended.
1927 Limit spanning to certain domains--`-D'
1928 The `-D' option allows you to specify the domains that will be
1929 followed, thus limiting the recursion only to the hosts that
1930 belong to these domains. Obviously, this makes sense only in
1931 conjunction with `-H'. A typical example would be downloading the
1932 contents of `www.server.com', but allowing downloads from
1933 `images.server.com', etc.:
1935 wget -rH -Dserver.com http://www.server.com/
1937 You can specify more than one address by separating them with a
1938 comma, e.g. `-Ddomain1.com,domain2.com'.
1940 Keep download off certain domains--`--exclude-domains'
1941 If there are domains you want to exclude specifically, you can do
1942 it with `--exclude-domains', which accepts the same type of
1943 arguments as `-D', but will _exclude_ all the listed domains. For
1944 example, if you want to download all the hosts from the `foo.edu'
1945 domain, with the exception of `sunsite.foo.edu', you can do it like
1946 this:
1948 wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu \
1953 File: wget.info, Node: Types of Files, Next: Directory-Based Limits, Prev: Spanning Hosts, Up: Following Links
1955 4.2 Types of Files
1956 ==================
1958 When downloading material from the web, you will often want to restrict
1959 the retrieval to only certain file types. For example, if you are
1960 interested in downloading GIFs, you will not be overjoyed to get loads
1961 of PostScript documents, and vice versa.
1963 Wget offers two options to deal with this problem. Each option
1964 description lists a short name, a long name, and the equivalent command
1965 in `.wgetrc'.
1967 `-A ACCLIST'
1968 `--accept ACCLIST'
1969 `accept = ACCLIST'
1970 The argument to the `--accept' option is a list of file suffixes or
1971 patterns that Wget will download during recursive retrieval. A
1972 suffix is the ending part of a file, and consists of "normal"
1973 letters, e.g. `gif' or `.jpg'. A matching pattern contains
1974 shell-like wildcards, e.g. `books*' or `zelazny*196[0-9]*'.
1976 So, specifying `wget -A gif,jpg' will make Wget download only the
1977 files ending with `gif' or `jpg', i.e. GIFs and JPEGs. On the
1978 other hand, `wget -A "zelazny*196[0-9]*"' will download only files
1979 beginning with `zelazny' and containing numbers from 1960 to 1969
1980 anywhere within. Look up the manual of your shell for a
1981 description of how pattern matching works.
1983 Of course, any number of suffixes and patterns can be combined
1984 into a comma-separated list, and given as an argument to `-A'.
1986 `-R REJLIST'
1987 `--reject REJLIST'
1988 `reject = REJLIST'
1989 The `--reject' option works the same way as `--accept', only its
1990 logic is the reverse; Wget will download all files _except_ the
1991 ones matching the suffixes (or patterns) in the list.
1993 So, if you want to download a whole page except for the cumbersome
1994 MPEGs and .AU files, you can use `wget -R mpg,mpeg,au'.
1995 Analogously, to download all files except the ones beginning with
1996 `bjork', use `wget -R "bjork*"'. The quotes are to prevent
1997 expansion by the shell.
1999 The `-A' and `-R' options may be combined to achieve even better
2000 fine-tuning of which files to retrieve. E.g. `wget -A "*zelazny*" -R
2001 .ps' will download all the files having `zelazny' as a part of their
2002 name, but _not_ the PostScript files.
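
The difference between a suffix and a pattern can be sketched with the shell's own `case' matching. This is an illustration of the rule stated above, not Wget's matching code; `matches_list' is a hypothetical helper that takes a file name and a comma-separated list.

```shell
# Illustrative sketch of the suffix-versus-pattern rule. matches_list is a
# hypothetical helper; Wget's own matching may differ in detail.
# Usage: matches_list FILENAME LIST   (LIST is comma-separated)
matches_list() (
    name=$1
    set -f                  # keep the shell from expanding wildcards in LIST
    IFS=,
    for element in $2; do
        case $element in
            *[][*?]*) pattern=$element ;;      # has a wildcard: use as pattern
            *)        pattern="*$element" ;;   # plain text: match as a suffix
        esac
        case $name in
            $pattern) exit 0 ;;
        esac
    done
    exit 1
)

matches_list photo.gif gif,jpg && echo "photo.gif matches gif,jpg"
matches_list zelazny-1964.txt 'zelazny*196[0-9]*' && echo "pattern matches"
```

The quotes around the second list serve the same purpose as on the Wget command line: they keep the shell from expanding the wildcards before the list is examined.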
2004 Note that these two options do not affect the downloading of HTML
2005 files (as determined by a `.htm' or `.html' filename suffix). This
2006 behavior may not be desirable for all users, and may be changed for
2007 future versions of Wget.
2009 Note, too, that query strings (strings at the end of a URL beginning
2010 with a question mark (`?')) are not included as part of the filename for
2011 accept/reject rules, even though these will actually contribute to the
2012 name chosen for the local file. It is expected that a future version of
2013 Wget will provide an option to allow matching against query strings.
2015 Finally, it's worth noting that the accept/reject lists are matched
2016 _twice_ against downloaded files: once against the URL's filename
2017 portion, to determine if the file should be downloaded in the first
2018 place; then, after it has been accepted and successfully downloaded,
2019 the local file's name is also checked against the accept/reject lists
2020 to see if it should be removed. The rationale was that, since `.htm'
2021 and `.html' files are always downloaded regardless of accept/reject
2022 rules, they should be removed _after_ being downloaded and scanned for
2023 links, if they did match the accept/reject lists. However, this can
2024 lead to unexpected results, since the local filenames can differ from
2025 the original URL filenames in the following ways, all of which can
2026 change whether an accept/reject rule matches:
2028 * If the local file already exists and `--no-directories' was
2029 specified, a numeric suffix will be appended to the original name.
2031 * If `--adjust-extension' was specified, the local filename might
2032 have `.html' appended to it. If Wget is invoked with `-E -A.php',
2033 a filename such as `index.php' will be accepted, but upon
2034 download will be named `index.php.html', which no longer matches,
2035 and so the file will be deleted.
2037 * Query strings do not contribute to URL matching, but are included
2038 in local filenames, and so _do_ contribute to filename matching.
2040 This behavior, too, is considered less-than-desirable, and may change
2041 in a future version of Wget.
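
The `-E -A.php' case above can be walked through in shell. This is a sketch of the two checks only; `ends_with_php' and the variable names are illustrative, and nothing is actually downloaded.

```shell
# Sketch of the two accept checks for `wget -E -A.php' (names are
# illustrative; no download takes place).
ends_with_php() {
    case $1 in
        *.php) return 0 ;;
        *)     return 1 ;;
    esac
}

url_name=index.php
local_name=$url_name.html    # name on disk after --adjust-extension

# First check: the URL's filename portion, before the download.
ends_with_php "$url_name"   && echo "$url_name: accepted for download"
# Second check: the local file's name, after the download.
ends_with_php "$local_name" || echo "$local_name: no longer matches, deleted"
```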
2044 File: wget.info, Node: Directory-Based Limits, Next: Relative Links, Prev: Types of Files, Up: Following Links
2046 4.3 Directory-Based Limits
2047 ==========================
2049 Regardless of other link-following facilities, it is often useful to
2050 restrict which files are retrieved based on the directories those
2051 files are placed in. There can be many reasons for this--the
2052 home pages may be organized in a reasonable directory structure; or some
2053 directories may contain useless information, e.g. `/cgi-bin' or `/dev'
2054 directories.
2056 Wget offers three different options to deal with this requirement.
2057 Each option description lists a short name, a long name, and the
2058 equivalent command in `.wgetrc'.
2060 `-I LIST'
2061 `--include LIST'
2062 `include_directories = LIST'
2063 The `-I' option accepts a comma-separated list of directories included
2064 in the retrieval. Any other directories will simply be ignored.
2065 The directories are absolute paths.
2067 So, if you wish to download from `http://host/people/bozo/'
2068 following only links to bozo's colleagues in the `/people'
2069 directory and the bogus scripts in `/cgi-bin', you can specify:
2071 wget -I /people,/cgi-bin http://host/people/bozo/
2073 `-X LIST'
2074 `--exclude LIST'
2075 `exclude_directories = LIST'
2076 The `-X' option is exactly the reverse of `-I'--this is a list of
2077 directories _excluded_ from the download. E.g. if you do not want
2078 Wget to download things from the `/cgi-bin' directory, specify `-X
2079 /cgi-bin' on the command line.
2081 The same as with `-A'/`-R', these two options can be combined to
2082 get a better fine-tuning of downloading subdirectories. E.g. if
2083 you want to load all the files from the `/pub' hierarchy except for
2084 `/pub/worthless', specify `-I/pub -X/pub/worthless'.
2086 `-np'
2087 `--no-parent'
2088 `no_parent = on'
2089 The simplest, and often very useful way of limiting directories is
2090 disallowing retrieval of the links that refer to the hierarchy
2091 "above" the beginning directory, i.e. disallowing ascent to
2092 the parent directory/directories.
2094 The `--no-parent' option (short `-np') is useful in this case.
2095 Using it guarantees that you will never leave the existing
2096 hierarchy. Supposing you issue Wget with:
2098 wget -r --no-parent http://somehost/~luzer/my-archive/
2100 You may rest assured that none of the references to
2101 `/~his-girls-homepage/' or `/~luzer/all-my-mpegs/' will be
2102 followed. Only the archive you are interested in will be
2103 downloaded. Essentially, `--no-parent' is similar to
2104 `-I/~luzer/my-archive', only it handles redirections in a more
2105 intelligent fashion.
2107 *Note* that, for HTTP (and HTTPS), the trailing slash is very
2108 important to `--no-parent'. HTTP has no concept of a
2109 "directory"--Wget relies on you to indicate what's a directory and
2110 what isn't. In `http://foo/bar/', Wget will consider `bar' to be a
2111 directory, while in `http://foo/bar' (no trailing slash), `bar'
2112 will be considered a filename (so `--no-parent' would be
2113 meaningless, as its parent is `/').
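
Two of the rules in this section can be sketched in shell: how the trailing slash decides what counts as the directory, and how `-I' and `-X' lists combine. Both helper names are hypothetical, the lists are assumed to hold absolute directories without wildcards, and the code illustrates the behavior described above rather than Wget's implementation.

```shell
# Illustrative sketch, not Wget's code: both helper names are hypothetical.

# url_directory PATH: the part of a URL path that is treated as the directory.
url_directory() {
    printf '%s\n' "${1%/*}/"
}

# dir_allowed DIR INCLUDES EXCLUDES: apply -I and -X style comma-separated
# lists of absolute directories to DIR (no wildcard support in this sketch).
dir_allowed() (
    dir=${1%/}/
    IFS=,
    set -f
    if [ -n "$2" ]; then
        ok=no
        for inc in $2; do
            case $dir in "${inc%/}"/*) ok=yes ;; esac
        done
        [ "$ok" = yes ] || exit 1    # not under any included directory
    fi
    for exc in $3; do
        case $dir in "${exc%/}"/*) exit 1 ;; esac    # under an excluded one
    done
    exit 0
)

url_directory /bar/    # prints /bar/
url_directory /bar     # prints /
dir_allowed /pub/gnu /pub /pub/worthless && echo "/pub/gnu is retrieved"
```

With `-I/pub -X/pub/worthless' in mind, `dir_allowed /pub/worthless/old /pub /pub/worthless' fails, while anything else under `/pub' passes.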
2116 File: wget.info, Node: Relative Links, Next: FTP Links, Prev: Directory-Based Limits, Up: Following Links
2118 4.4 Relative Links
2119 ==================
2121 When `-L' is turned on, only the relative links are ever followed.
2122 Relative links are here defined as those that do not refer to the web
2123 server root. For example, these links are relative:
2126 <a href="foo/bar.gif">
2127 <a href="../foo/bar.gif">
2129 These links are not relative:
2132 <a href="/foo/bar.gif">
2133 <a href="http://www.server.com/foo/bar.gif">
2135 Using this option guarantees that recursive retrieval will not span
2136 hosts, even without `-H'. In simple cases it also allows downloads to
2137 "just work" without having to convert links.
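
The distinction drawn above can be expressed as a small shell test. It is a sketch under the definition given in this section; `is_relative_link' is a hypothetical helper, not Wget's implementation.

```shell
# Sketch of the classification described above; is_relative_link is a
# hypothetical helper name.
is_relative_link() {
    case $1 in
        /*|*://*) return 1 ;;   # server-root path or full URL: not relative
        *)        return 0 ;;
    esac
}

is_relative_link ../foo/bar.gif && echo "../foo/bar.gif is relative"
is_relative_link /foo/bar.gif   || echo "/foo/bar.gif is not relative"
```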
2139 This option is probably not very useful and might be removed in a
2140 future release.
2143 File: wget.info, Node: FTP Links, Prev: Relative Links, Up: Following Links
2145 4.5 Following FTP Links
2146 =======================
2148 The rules for FTP are somewhat specific, as it is necessary for them to
2149 be. FTP links in HTML documents are often included for purposes of
2150 reference, and it is often inconvenient to download them by default.
2152 To have FTP links followed from HTML documents, you need to specify
2153 the `--follow-ftp' option. Having done that, FTP links will span hosts
2154 regardless of `-H' setting. This is logical, as FTP links rarely point
2155 to the same host where the HTTP server resides. For similar reasons,
2156 the `-L' option has no effect on such downloads. On the other hand,
2157 domain acceptance (`-D') and suffix rules (`-A' and `-R') apply
2158 normally.
2160 Also note that followed links to FTP directories will not be
2161 retrieved recursively further.
2164 File: wget.info, Node: Time-Stamping, Next: Startup File, Prev: Following Links, Up: Top
2166 5 Time-Stamping
2167 ***************
2169 One of the most important aspects of mirroring information from the
2170 Internet is updating your archives.
2172 Downloading the whole archive again and again, just to replace a few
2173 changed files is expensive, both in terms of wasted bandwidth and money,
2174 and the time to do the update. This is why all the mirroring tools
2175 offer the option of incremental updating.
2177 Such an updating mechanism means that the remote server is scanned in
2178 search of "new" files. Only those new files will be downloaded in the
2179 place of the old ones.
2181 A file is considered new if one of these two conditions is met:
2183 1. A file of that name does not already exist locally.
2185 2. A file of that name does exist, but the remote file was modified
2186 more recently than the local file.
2188 To implement this, the program needs to be aware of the time of last
2189 modification of both local and remote files. We call this information
2190 the "time-stamp" of a file.
2192 The time-stamping in GNU Wget is turned on using the `--timestamping'
2193 (`-N') option, or through the `timestamping = on' directive in `.wgetrc'.
2194 With this option, for each file it intends to download, Wget will check
2195 whether a local file of the same name exists. If it does, and the
2196 remote file is not newer, Wget will not download it.
2198 If the local file does not exist, or the sizes of the files do not
2199 match, Wget will download the remote file no matter what the time-stamps
2200 say.
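
The decision just described can be sketched as a shell function. `needs_download' is a hypothetical helper; it assumes that time-stamps are given as seconds since the epoch and sizes in bytes, and that the caller has already obtained them.

```shell
# Sketch of the time-stamping decision; needs_download is a hypothetical
# helper. Times are assumed to be epoch seconds, sizes to be bytes.
# Usage: needs_download LOCAL_EXISTS LOCAL_MTIME LOCAL_SIZE REMOTE_MTIME REMOTE_SIZE
needs_download() {
    [ "$1" = yes ] || return 0       # no local copy: always download
    [ "$3" -eq "$5" ] || return 0    # sizes differ: always download
    [ "$4" -gt "$2" ]                # otherwise only if the remote is newer
}

needs_download yes 200 10 100 10 || echo "local copy is current, skipped"
```

The function succeeds (returns 0) when the file should be fetched and fails otherwise, so it can be used directly in an `if' statement.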
2204 * Time-Stamping Usage::
2205 * HTTP Time-Stamping Internals::
2206 * FTP Time-Stamping Internals::
2209 File: wget.info, Node: Time-Stamping Usage, Next: HTTP Time-Stamping Internals, Prev: Time-Stamping, Up: Time-Stamping
2211 5.1 Time-Stamping Usage
2212 =======================
2214 The usage of time-stamping is simple. Say you would like to download a
2215 file so that it keeps its date of modification.
2217 wget -S http://www.gnu.ai.mit.edu/
2219 A simple `ls -l' shows that the time stamp on the local file equals
2220 the value of the `Last-Modified' header, as returned by the server. As
2221 you can see, the time-stamping info is preserved locally, even without
2222 `-N' (at least for HTTP).
2224 Several days later, you would like Wget to check if the remote file
2225 has changed, and download it if it has.
2227 wget -N http://www.gnu.ai.mit.edu/
2229 Wget will ask the server for the last-modified date. If the local
2230 file has the same timestamp as the server, or a newer one, the remote
2231 file will not be re-fetched. However, if the remote file is more
2232 recent, Wget will proceed to fetch it.
2234 The same goes for FTP. For example:
2236 wget "ftp://ftp.ifi.uio.no/pub/emacs/gnus/*"
2238 (The quotes around that URL are to prevent the shell from trying to
2239 interpret the `*'.)
2241 After download, a local directory listing will show that the
2242 timestamps match those on the remote server. Reissuing the command
2243 with `-N' will make Wget re-fetch _only_ the files that have been
2244 modified since the last download.
2246 If you wished to mirror the GNU archive every week, you would use a
2247 command like the following, weekly:
2249 wget --timestamping -r ftp://ftp.gnu.org/pub/gnu/
2251 Note that time-stamping will only work for files for which the server
2252 gives a timestamp. For HTTP, this depends on getting a `Last-Modified'
2253 header. For FTP, this depends on getting a directory listing with
2254 dates in a format that Wget can parse (*note FTP Time-Stamping
2255 Internals::).
2258 File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
2260 5.2 HTTP Time-Stamping Internals
2261 ================================
2263 Time-stamping in HTTP is implemented by checking the `Last-Modified'
2264 header. If you wish to retrieve the file `foo.html' through HTTP, Wget
2265 will check whether `foo.html' exists locally. If it doesn't,
2266 `foo.html' will be retrieved unconditionally.
2268 If the file does exist locally, Wget will first check its local
2269 time-stamp (similar to the way `ls -l' checks it), and then send a
2270 `HEAD' request to the remote server, demanding the information on the
2271 remote file.
2273 The `Last-Modified' header is examined to find which file was
2274 modified more recently (which makes it "newer"). If the remote file is
2275 newer, it will be downloaded; if it is older, Wget will give up.(1)
2277 When `--backup-converted' (`-K') is specified in conjunction with
2278 `-N', server file `X' is compared to local file `X.orig', if extant,
2279 rather than being compared to local file `X', which will always differ
2280 if it's been converted by `--convert-links' (`-k').
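
Which local file supplies the time-stamp when `-N' and `-K' are combined can be sketched as follows. `timestamp_reference' is a hypothetical helper that only mirrors the rule stated above.

```shell
# Sketch: pick the local file whose time-stamp is compared with the server's
# copy when -N and -K are combined. timestamp_reference is a hypothetical
# helper name.
timestamp_reference() {
    if [ -e "$1.orig" ]; then
        printf '%s\n' "$1.orig"   # unconverted backup left by -K
    else
        printf '%s\n' "$1"        # no backup: the file itself
    fi
}
```

For a file `X', calling `timestamp_reference X' prints `X.orig' once a backup exists and `X' otherwise.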
2282 Arguably, HTTP time-stamping should be implemented using the
2283 `If-Modified-Since' request.
2285 ---------- Footnotes ----------
2287 (1) As an additional check, Wget will look at the `Content-Length'
2288 header, and compare the sizes; if they are not the same, the remote
2289 file will be downloaded no matter what the time-stamp says.
2292 File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping
2294 5.3 FTP Time-Stamping Internals
2295 ===============================
2297 In theory, FTP time-stamping works much the same as HTTP, only FTP has
2298 no headers--time-stamps must be ferreted out of directory listings.
2300 If an FTP download is recursive or uses globbing, Wget will use the
2301 FTP `LIST' command to get a file listing for the directory containing
2302 the desired file(s). It will try to analyze the listing, treating it
2303 like Unix `ls -l' output, extracting the time-stamps. The rest is
2304 exactly the same as for HTTP. Note that when retrieving individual
2305 files from an FTP server without using globbing or recursion, listing
2306 files will not be downloaded (and thus files will not be time-stamped)
2307 unless `-N' is specified.
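As a rough illustration of what "treating it like Unix `ls -l' output" involves, the following sketch pulls the date columns and the file name out of a listing with `awk'. The function name `listing_timestamps' is hypothetical, the column positions assume the common nine-field layout, and file names containing spaces are not handled; Wget's own parser is more thorough.

```shell
# Hypothetical helper: read `ls -l' style lines on standard input and
# print "MONTH DAY YEAR-OR-TIME NAME" for each entry.
listing_timestamps() {
    # Lines with fewer than nine fields (such as "total 8") are skipped.
    awk 'NF >= 9 { print $6, $7, $8, $9 }'
}
```

Feeding it a line such as `-rw-r--r-- 1 ftp ftp 1024 Jan 12 2009 README' yields the date fields followed by the file name.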
2309 The assumption that every directory listing is a Unix-style listing may
2310 sound extremely constraining, but in practice it is not, as many
2311 non-Unix FTP servers use the Unixoid listing format because most (all?)
2312 of the clients understand it. Bear in mind that RFC959 defines no
2313 standard way to get a file list, let alone the time-stamps. We can
2314 only hope that a future standard will define this.
2316 Another non-standard solution is the use of the `MDTM' command
2317 that is supported by some FTP servers (including the popular
2318 `wu-ftpd'), which returns the exact time of the specified file. Wget
2319 may support this command in the future.
2322 File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top
2327 Once you know how to change default settings of Wget through command
2328 line arguments, you may wish to make some of those settings permanent.
2329 You can do that in a convenient way by creating the Wget startup
2332 Although `.wgetrc' is the "main" initialization file, it is
2333 convenient to have a special facility for storing passwords. Thus Wget
2334 reads and interprets the contents of `$HOME/.netrc', if it finds it.
2335 You can find `.netrc' format in your system manuals.
2337 Wget reads `.wgetrc' upon startup, recognizing a limited set of
2342 * Wgetrc Location:: Location of various wgetrc files.
2343 * Wgetrc Syntax:: Syntax of wgetrc.
2344 * Wgetrc Commands:: List of available commands.
2345 * Sample Wgetrc:: A wgetrc example.
2348 File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Prev: Startup File, Up: Startup File
2353 When initializing, Wget will look for a "global" startup file,
2354 `/usr/local/etc/wgetrc' by default (or some prefix other than
2355 `/usr/local', if Wget was not installed there) and read commands from
2356 there, if it exists.
2358 Then it will look for the user's file. If the environment variable
2359 `WGETRC' is set, Wget will try to load that file. Failing that, no
2360 further attempts will be made.
2362 If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
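The choice of the user's file can be summarized with a short shell sketch. The function name `user_wgetrc' is hypothetical and the code only mirrors the rule stated above; it does not check whether the file exists or can be read.

```shell
# Hypothetical helper: print the user startup file Wget would try to load.
user_wgetrc() {
    if [ -n "${WGETRC+set}" ]; then
        # WGETRC is set: only this file is tried, with no fallback.
        printf '%s\n' "$WGETRC"
    else
        # WGETRC is not set: fall back to the file in the home directory.
        printf '%s\n' "$HOME/.wgetrc"
    fi
}
```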
2364 The fact that user's settings are loaded after the system-wide ones
2365 means that in case of collision user's wgetrc _overrides_ the
2366 system-wide wgetrc (in `/usr/local/etc/wgetrc' by default). Fascist
2370 File: wget.info, Node: Wgetrc Syntax, Next: Wgetrc Commands, Prev: Wgetrc Location, Up: Startup File
2375 The syntax of a wgetrc command is simple:
2379 The "variable" will also be called "command". Valid "values" are
2380 different for different commands.
2382 The commands are case-insensitive and underscore-insensitive. Thus
2383 `DIr__PrefiX' is the same as `dirprefix'. Empty lines, lines beginning
2384 with `#' and lines containing white-space only are discarded.
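The case and underscore rule can be tried out with a small shell function that reduces a command name to a canonical form. The name `normalize_command' is made up for this illustration; it is not something Wget provides.

```shell
# Hypothetical helper: drop underscores and fold to lower case, so that
# differently written spellings of the same command compare equal.
normalize_command() {
    printf '%s\n' "$1" | tr -d '_' | tr '[:upper:]' '[:lower:]'
}
```

Both `normalize_command DIr__PrefiX' and `normalize_command dirprefix' print the same string.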
2386 Commands that expect a comma-separated list will clear the list on an
2387 empty command. So, if you wish to reset the rejection list specified in
2388 global `wgetrc', you can do it with:
reject =
2393 File: wget.info, Node: Wgetrc Commands, Next: Sample Wgetrc, Prev: Wgetrc Syntax, Up: Startup File
2398 The complete set of commands is listed below. Legal values are listed
2399 after the `='. Simple Boolean values can be set or unset using `on'
2400 and `off' or `1' and `0'.
2402 Some commands take pseudo-arbitrary values. ADDRESS values can be
2403 hostnames or dotted-quad IP addresses. N can be any positive integer,
2404 or `inf' for infinity, where appropriate. STRING values can be any
2407 Most of these commands have direct command-line equivalents. Also,
2408 any wgetrc command can be specified on the command line using the
2409 `--execute' switch (*note Basic Startup Options::.)
2411 accept/reject = STRING
2412 Same as `-A'/`-R' (*note Types of Files::).
2414 add_hostdir = on/off
2415 Enable/disable host-prefixed file names. `-nH' disables it.
2417 ask_password = on/off
2418 Prompt for a password for each connection established. Cannot be
2419 specified when `--password' is being used, because they are
2420 mutually exclusive. Equivalent to `--ask-password'.
2422 auth_no_challenge = on/off
2423 If this option is given, Wget will send Basic HTTP authentication
2424 information (plaintext username and password) for all requests. See
2425 `--auth-no-challenge'.
2428 Enable/disable going to background--the same as `-b' (which
2431 backup_converted = on/off
2432 Enable/disable saving pre-converted files with the suffix
2433 `.orig'--the same as `-K' (which enables it).
2436 Consider relative URLs in input files (specified via the `input'
2437 command or the `--input-file'/`-i' option, together with
2438 `force_html' or `--force-html') as being relative to STRING--the
2439 same as `--base=STRING'.
2441 bind_address = ADDRESS
2442 Bind to ADDRESS, like the `--bind-address=ADDRESS'.
2444 ca_certificate = FILE
2445 Set the certificate authority bundle file to FILE. The same as
2446 `--ca-certificate=FILE'.
2448 ca_directory = DIRECTORY
2449 Set the directory used for certificate authorities. The same as
2450 `--ca-directory=DIRECTORY'.
2453 When set to off, disallow server-caching. See the `--no-cache'
2457 Set the client certificate file name to FILE. The same as
2458 `--certificate=FILE'.
2460 certificate_type = STRING
2461 Specify the type of the client certificate, legal values being
2462 `PEM' (the default) and `DER' (aka ASN1). The same as
2463 `--certificate-type=STRING'.
2465 check_certificate = on/off
2466 If this is set to off, the server certificate is not checked
2467 against the specified certificate authorities. The default is "on".
2468 The same as `--check-certificate'.
2471 Set the connect timeout--the same as `--connect-timeout'.
2473 content_disposition = on/off
2474 Turn on recognition of the (non-standard) `Content-Disposition'
2475 HTTP header--if set to `on', the same as `--content-disposition'.
2478 If set to on, force continuation of preexistent partially retrieved
2479 files. See `-c' before setting it.
2481 convert_links = on/off
2482 Convert non-relative links locally. The same as `-k'.
2485 When set to off, disallow cookies. See the `--cookies' option.
2488 Ignore N remote directory components. Equivalent to
2492 Debug mode, same as `-d'.
2494 default_page = STRING
2495 Default page name--the same as `--default-page=STRING'.
2497 delete_after = on/off
2498 Delete after download--the same as `--delete-after'.
2501 Top of directory tree--the same as `-P STRING'.
2504 Turning dirstruct on or off--the same as `-x' or `-nd',
2508 Turn DNS caching on/off. Since DNS caching is on by default, this
2509 option is normally used to turn it off and is equivalent to
2513 Set the DNS timeout--the same as `--dns-timeout'.
2516 Same as `-D' (*note Spanning Hosts::).
2519 Specify the number of bytes "contained" in a dot, as seen
2520 throughout the retrieval (1024 by default). You can postfix the
2521 value with `k' or `m', representing kilobytes and megabytes,
2522 respectively. With dot settings you can tailor the dot retrieval
2523 to suit your needs, or you can use the predefined "styles" (*note
2524 Download Options::).
2527 Specify the number of dots in a single cluster (10 by default).
2530 Specify the number of dots that will be printed in each line
2531 throughout the retrieval (50 by default).
2534 Use STRING as the EGD socket file name. The same as
2537 exclude_directories = STRING
2538 Specify a comma-separated list of directories you wish to exclude
2539 from download--the same as `-X STRING' (*note Directory-Based
2542 exclude_domains = STRING
2543 Same as `--exclude-domains=STRING' (*note Spanning Hosts::).
2546 Follow FTP links from HTML documents--the same as `--follow-ftp'.
2548 follow_tags = STRING
2549 Only follow certain HTML tags when doing a recursive retrieval,
2550 just like `--follow-tags=STRING'.
2553 If set to on, force the input filename to be regarded as an HTML
2554 document--the same as `-F'.
2556 ftp_password = STRING
2557 Set your FTP password to STRING. Without this setting, the
2558 password defaults to `-wget@', which is a useful default for
2559 anonymous FTP access.
2561 This command used to be named `passwd' prior to Wget 1.10.
2564 Use STRING as FTP proxy, instead of the one specified in
2568 Set FTP user to STRING.
2570 This command used to be named `login' prior to Wget 1.10.
2573 Turn globbing on/off--the same as `--glob' and `--no-glob'.
2576 Define a header for HTTP downloads, like using `--header=STRING'.
2578 adjust_extension = on/off
2579 Add a `.html' extension to `text/html' or `application/xhtml+xml'
2580 files that lack one, or a `.css' extension to `text/css' files
2581 that lack one, like `-E'. Previously named `html_extension' (still
2582 acceptable, but deprecated).
2584 http_keep_alive = on/off
2585 Turn the keep-alive feature on or off (defaults to on). Turning it
2586 off is equivalent to `--no-http-keep-alive'.
2588 http_password = STRING
2589 Set HTTP password, equivalent to `--http-password=STRING'.
2592 Use STRING as HTTP proxy, instead of the one specified in
2596 Set HTTP user to STRING, equivalent to `--http-user=STRING'.
2598 https_proxy = STRING
2599 Use STRING as HTTPS proxy, instead of the one specified in
2602 ignore_case = on/off
2603 When set to on, match files and directories case insensitively; the
2604 same as `--ignore-case'.
2606 ignore_length = on/off
2607 When set to on, ignore `Content-Length' header; the same as
2610 ignore_tags = STRING
2611 Ignore certain HTML tags when doing a recursive retrieval, like
2612 `--ignore-tags=STRING'.
2614 include_directories = STRING
2615 Specify a comma-separated list of directories you wish to follow
2616 when downloading--the same as `-I STRING'.
2619 When set to on, enable internationalized URI (IRI) support; the
2623 Force connecting to IPv4 addresses, off by default. You can put
2624 this in the global init file to disable Wget's attempts to resolve
2625 and connect to IPv6 hosts. Available only if Wget was compiled
2626 with IPv6 support. The same as `--inet4-only' or `-4'.
2629 Force connecting to IPv6 addresses, off by default. Available
2630 only if Wget was compiled with IPv6 support. The same as
2631 `--inet6-only' or `-6'.
2634 Read the URLs from STRING, like `-i FILE'.
2636 keep_session_cookies = on/off
2637 When specified, causes `save_cookies = on' to also save session
2638 cookies. See `--keep-session-cookies'.
2641 Limit the download speed to no more than RATE bytes per second.
2642 The same as `--limit-rate=RATE'.
2645 Load cookies from FILE. See `--load-cookies FILE'.
2647 local_encoding = ENCODING
2648 Force Wget to use ENCODING as the default system encoding. See
2652 Set logfile to FILE, the same as `-o FILE'.
2654 max_redirect = NUMBER
2655 Specifies the maximum number of redirections to follow for a
2656 resource. See `--max-redirect=NUMBER'.
2659 Turn mirroring on/off. The same as `-m'.
2662 Turn reading netrc on or off.
2668 Disallow retrieving outside the directory hierarchy, like
2669 `--no-parent' (*note Directory-Based Limits::).
2672 Use STRING as the comma-separated list of domains to avoid in
2673 proxy loading, instead of the one specified in environment.
2675 output_document = FILE
2676 Set the output filename--the same as `-O FILE'.
2678 page_requisites = on/off
2679 Download all ancillary documents necessary for a single HTML page
2680 to display properly--the same as `-p'.
2682 passive_ftp = on/off
2683 Change setting of passive FTP, equivalent to the `--passive-ftp'
2687 Specify password STRING for both FTP and HTTP file retrieval.
2688 This command can be overridden using the `ftp_password' and
2689 `http_password' command for FTP and HTTP respectively.
2692 Use POST as the method for all HTTP requests and send STRING in
2693 the request body. The same as `--post-data=STRING'.
2696 Use POST as the method for all HTTP requests and send the contents
2697 of FILE in the request body. The same as `--post-file=FILE'.
2699 prefer_family = none/IPv4/IPv6
2700 When given a choice of several addresses, connect to the addresses
2701 with specified address family first. The address order returned by
2702 DNS is used without change by default. The same as
2703 `--prefer-family', which see for a detailed discussion of why this
2707 Set the private key file to FILE. The same as
2708 `--private-key=FILE'.
2710 private_key_type = STRING
2711 Specify the type of the private key, legal values being `PEM' (the
2712 default) and `DER' (aka ASN1). The same as
2713 `--private-key-type=STRING'.
2716 Set the type of the progress indicator. Legal types are `dot' and
2717 `bar'. Equivalent to `--progress=STRING'.
2719 protocol_directories = on/off
2720 When set, use the protocol name as a directory component of local
2721 file names. The same as `--protocol-directories'.
2723 proxy_password = STRING
2724 Set proxy authentication password to STRING, like
2725 `--proxy-password=STRING'.
2728 Set proxy authentication user name to STRING, like
2729 `--proxy-user=STRING'.
2732 Quiet mode--the same as `-q'.
2735 Specify the download quota, which is useful to put in the global
2736 `wgetrc'. When download quota is specified, Wget will stop
2737 retrieving after the download sum has become greater than quota.
2738 The quota can be specified in bytes (default), kbytes (`k'
2739 appended) or mbytes (`m' appended). Thus `quota = 5m' will set
2740 the quota to 5 megabytes. Note that the user's startup file
2741 overrides system settings.
2744 Use FILE as a source of randomness on systems lacking
2747 random_wait = on/off
2748 Turn random between-request wait times on or off. The same as
2752 Set the read (and write) timeout--the same as `--read-timeout=N'.
2755 Recursion level (depth)--the same as `-l N'.
2758 Recursive on/off--the same as `-r'.
2761 Set HTTP `Referer:' header just like `--referer=STRING'. (Note
2762 that it was the folks who wrote the HTTP spec who got the spelling
2763 of "referrer" wrong.)
2765 relative_only = on/off
2766 Follow only relative links--the same as `-L' (*note Relative
2769 remote_encoding = ENCODING
2770 Force Wget to use ENCODING as the default remote server encoding.
2771 See `--remote-encoding'.
2773 remove_listing = on/off
2774 If set to on, remove FTP listings downloaded by Wget. Setting it
2775 to off is the same as `--no-remove-listing'.
2777 restrict_file_names = unix/windows
2778 Restrict the file names generated by Wget from URLs. See
2779 `--restrict-file-names' for a more detailed description.
2781 retr_symlinks = on/off
2782 When set to on, retrieve symbolic links as if they were plain
2783 files; the same as `--retr-symlinks'.
2785 retry_connrefused = on/off
2786 When set to on, consider "connection refused" a transient
2787 error--the same as `--retry-connrefused'.
2790 Specify whether the norobots convention is respected by Wget, "on"
2791 by default. This switch controls both the `/robots.txt' and the
2792 `nofollow' aspect of the spec. *Note Robot Exclusion::, for more
2793 details about this. Be sure you know what you are doing before
2797 Save cookies to FILE. The same as `--save-cookies FILE'.
2799 save_headers = on/off
2800 Same as `--save-headers'.
2802 secure_protocol = STRING
2803 Choose the secure protocol to be used. Legal values are `auto'
2804 (the default), `SSLv2', `SSLv3', and `TLSv1'. The same as
2805 `--secure-protocol=STRING'.
2807 server_response = on/off
2808 Choose whether or not to print the HTTP and FTP server
2809 responses--the same as `-S'.
2817 strict_comments = on/off
2818 Same as `--strict-comments'.
2821 Set all applicable timeout values to N, the same as `-T N'.
2823 timestamping = on/off
2824 Turn timestamping on/off. The same as `-N' (*note
2828 Set number of retries per URL--the same as `-t N'.
2831 When set to off, don't use proxy even when proxy-related
2832 environment variables are set. In that case it is the same as
2836 Specify username STRING for both FTP and HTTP file retrieval.
2837 This command can be overridden using the `ftp_user' and
2838 `http_user' command for FTP and HTTP respectively.
2841 User agent identification sent to the HTTP Server--the same as
2842 `--user-agent=STRING'.
2845 Turn verbose on/off--the same as `-v'/`-nv'.
2848 Wait N seconds between retrievals--the same as `-w N'.
2851 Wait up to N seconds between retries of failed retrievals
2852 only--the same as `--waitretry=N'. Note that this is turned on by
2853 default in the global `wgetrc'.
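Several of the commands above, such as `quota', accept a number with a `k' or `m' suffix. The sketch below converts such a value to bytes. The function name `quota_bytes' is hypothetical, and the multiplier of 1024 per step is an assumption, since the text speaks only of kilobytes and megabytes.

```shell
# Hypothetical helper: convert a value such as "5m" or "200k" to bytes.
# Assumes 1k = 1024 bytes and 1m = 1024 * 1024 bytes.
quota_bytes() {
    value=$1
    case $value in
        *[kK]) echo $(( ${value%?} * 1024 )) ;;          # strip suffix, scale
        *[mM]) echo $(( ${value%?} * 1024 * 1024 )) ;;
        *)     echo "$value" ;;                          # plain byte count
    esac
}
```

Under that assumption, `quota_bytes 5m' gives the byte count that `quota = 5m' would stand for.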
2856 File: wget.info, Node: Sample Wgetrc, Prev: Wgetrc Commands, Up: Startup File
2861 This is the sample initialization file, as given in the distribution.
2862 It is divided into two sections--one for global usage (suitable for global
2863 startup file), and one for local usage (suitable for `$HOME/.wgetrc').
2864 Be careful about the things you change.
2866 Note that almost all the lines are commented out. For a command to
2867 have any effect, you must remove the `#' character at the beginning of
2871 ### Sample Wget initialization file .wgetrc
2874 ## You can use this file to change the default behaviour of wget or to
2875 ## avoid having to type many many command-line options. This file does
2876 ## not contain a comprehensive list of commands -- look at the manual
2877 ## to find out what you can put into this file.
2879 ## Wget initialization file can reside in /usr/local/etc/wgetrc
2880 ## (global, for all users) or $HOME/.wgetrc (for a single user).
2882 ## To use the settings in this file, you will have to uncomment them,
2883 ## as well as change them, in most cases, as the values on the
2884 ## commented-out lines are the default values (e.g. "off").
2888 ## Global settings (useful for setting up in /usr/local/etc/wgetrc).
2889 ## Think well before you change them, since they may reduce wget's
2890 ## functionality, and make it behave contrary to the documentation:
2893 # You can set retrieve quota for beginners by specifying a value
2894 # optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
2895 # default quota is unlimited.
2898 # You can lower (or raise) the default number of retries when
2899 # downloading a file (default is 20).
2902 # Lowering the maximum depth of the recursive retrieval is handy to
2903 # prevent newbies from going too "deep" when they unwittingly start
2904 # the recursive retrieval. The default is 5.
2907 # By default Wget uses "passive FTP" transfer where the client
2908 # initiates the data connection to the server rather than the other
2909 # way around. That is required on systems behind NAT where the client
2910 # computer cannot be easily reached from the Internet. However, some
2911 # firewall software explicitly supports active FTP and in fact has
2912 # problems supporting passive transfer. If you are in such an
2913 # environment, use "passive_ftp = off" to revert to active FTP.
2916 # The "wait" command below makes Wget wait between every connection.
2917 # If, instead, you want Wget to wait only between retries of failed
2918 # downloads, set waitretry to maximum number of seconds to wait (Wget
2919 # will use "linear backoff", waiting 1 second after the first failure
2920 # on a file, 2 seconds after the second failure, etc. up to this max).
2925 ## Local settings (for a user to set in his $HOME/.wgetrc). It is
2926 ## *highly* undesirable to put these settings in the global file, since
2927 ## they are potentially dangerous to "normal" users.
2929 ## Even when setting up your own ~/.wgetrc, you should know what you
2930 ## are doing before doing so.
2933 # Set this to on to use timestamping by default:
2936 # It is a good idea to make Wget send your email address in a `From:'
2937 # header with your request (so that server administrators can contact
2938 # you in case of errors). Wget does *not* send `From:' by default.
2939 #header = From: Your Name <username@site.domain>
2941 # You can set up other headers, like Accept-Language. Accept-Language
2942 # is *not* sent by default.
2943 #header = Accept-Language: en
2945 # You can set the default proxies for Wget to use for http, https, and ftp.
2946 # They will override the value in the environment.
2947 #https_proxy = http://proxy.yoyodyne.com:18023/
2948 #http_proxy = http://proxy.yoyodyne.com:18023/
2949 #ftp_proxy = http://proxy.yoyodyne.com:18023/
2951 # If you do not want to use proxy at all, set this to off.
2954 # You can customize the retrieval outlook. Valid options are default,
2955 # binary, mega and micro.
2956 #dot_style = default
2958 # Setting this to off makes Wget not download /robots.txt. Be sure to
2959 # know *exactly* what /robots.txt is and how it is used before changing
2963 # It can be useful to make Wget wait between connections. Set this to
2964 # the number of seconds you want Wget to wait.
2967 # You can force creating directory structure, even if a single file is being
2968 # retrieved, by setting this to on.
2971 # You can turn on recursive retrieving by default (don't do this if
2972 # you are not sure you know what it means) by setting this to on.
2975 # To always back up file X as X.orig before converting its links (due
2976 # to -k / --convert-links / convert_links = on having been specified),
2977 # set this variable to on:
2978 #backup_converted = off
2980 # To have Wget follow FTP links from HTML files by default, set this
2984 # To try ipv6 addresses first:
2985 #prefer-family = IPv6
2987 # Set default IRI support state
2990 # Force the default system encoding
2993 # Force the default remote server encoding
2994 #remoteencoding = UTF-8
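The "linear backoff" mentioned in the sample file's comment on `waitretry' can be illustrated as follows. The function name `retry_wait' and its two arguments are hypothetical; the sketch follows only the rule stated in that comment (1 second after the first failure, 2 after the second, capped at the configured maximum).

```shell
# Hypothetical helper: seconds to wait after the Nth failure on a file,
# given the waitretry maximum.
retry_wait() {
    failures=$1
    max=$2
    if [ "$failures" -lt "$max" ]; then
        echo "$failures"    # wait grows by one second per failure
    else
        echo "$max"         # never wait longer than the maximum
    fi
}
```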
2997 File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top
3002 The examples are divided into three sections loosely based on their
3007 * Simple Usage:: Simple, basic usage of the program.
3008 * Advanced Usage:: Advanced tips.
3009 * Very Advanced Usage:: The hairy stuff.
3012 File: wget.info, Node: Simple Usage, Next: Advanced Usage, Prev: Examples, Up: Examples
3017 * Say you want to download a URL. Just type:
3019 wget http://fly.srk.fer.hr/
3021 * But what will happen if the connection is slow, and the file is
3022 lengthy? The connection will probably fail before the whole file
3023 is retrieved, more than once. In this case, Wget will try getting
3024 the file until it either gets the whole of it, or exceeds the
3025 default number of retries (this being 20). It is easy to change
3026 the number of tries to 45, to ensure that the whole file will
3029 wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
3031 * Now let's leave Wget to work in the background, and write its
3032 progress to log file `log'. It is tiring to type `--tries', so we
3035 wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
3037 The ampersand at the end of the line makes sure that Wget works in
3038 the background. To unlimit the number of retries, use `-t inf'.
3040 * The usage of FTP is as simple. Wget will take care of login and
3043 wget ftp://gnjilux.srk.fer.hr/welcome.msg
3045 * If you specify a directory, Wget will retrieve the directory
3046 listing, parse it and convert it to HTML. Try:
3048 wget ftp://ftp.gnu.org/pub/gnu/
3052 File: wget.info, Node: Advanced Usage, Next: Very Advanced Usage, Prev: Simple Usage, Up: Examples
3057 * You have a file that contains the URLs you want to download? Use
3062 If you specify `-' as file name, the URLs will be read from
3065 * Create a five levels deep mirror image of the GNU web site, with
3066 the same directory structure the original has, with only one try
3067 per document, saving the log of the activities to `gnulog':
3069 wget -r -t 1 http://www.gnu.org/ -o gnulog
3071 * The same as the above, but convert the links in the downloaded
3072 files to point to local files, so you can view the documents
3075 wget --convert-links -r http://www.gnu.org/ -o gnulog
3077 * Retrieve only one HTML page, but make sure that all the elements
3078 needed for the page to be displayed, such as inline images and
3079 external style sheets, are also downloaded. Also make sure the
3080 downloaded page references the downloaded links.
3082 wget -p --convert-links http://www.server.com/dir/page.html
3084 The HTML page will be saved to `www.server.com/dir/page.html', and
3085 the images, stylesheets, etc., somewhere under `www.server.com/',
3086 depending on where they were on the remote server.
3088 * The same as the above, but without the `www.server.com/' directory.
3089 In fact, I don't want to have all those random server directories
3090 anyway--just save _all_ those files under a `download/'
3091 subdirectory of the current directory.
3093 wget -p --convert-links -nH -nd -Pdownload \
3094 http://www.server.com/dir/page.html
3096 * Retrieve the index.html of `www.lycos.com', showing the original
3099 wget -S http://www.lycos.com/
3101 * Save the server headers with the file, perhaps for post-processing.
3103 wget --save-headers http://www.lycos.com/
3106 * Retrieve the first two levels of `wuarchive.wustl.edu', saving them
3109 wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
3111 * You want to download all the GIFs from a directory on an HTTP
3112 server. You tried `wget http://www.server.com/dir/*.gif', but that
3113 didn't work because HTTP retrieval does not support globbing. In
3116 wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
3118 More verbose, but the effect is the same. `-r -l1' means to
3119 retrieve recursively (*note Recursive Download::), with maximum
3120 depth of 1. `--no-parent' means that references to the parent
3121 directory are ignored (*note Directory-Based Limits::), and
3122 `-A.gif' means to download only the GIF files. `-A "*.gif"' would
3125 * Suppose you were in the middle of downloading, when Wget was
3126 interrupted. Now you do not want to clobber the files already
3127 present. It would be:
3129 wget -nc -r http://www.gnu.org/
3131 * If you want to encode your own username and password to HTTP or
3132 FTP, use the appropriate URL syntax (*note URL Format::).
3134 wget ftp://hniksic:mypassword@unix.server.com/.emacs
3136 Note, however, that this usage is not advisable on multi-user
3137 systems because it reveals your password to anyone who looks at
3140 * You would like the output documents to go to standard output
3141 instead of to files?
3143 wget -O - http://jagor.srce.hr/ http://www.srce.hr/
3145 You can also combine the two options and make pipelines to
3146 retrieve the documents from remote hotlists:
3148 wget -O - http://cool.list.com/ | wget --force-html -i -
3151 File: wget.info, Node: Very Advanced Usage, Prev: Advanced Usage, Up: Examples
3153 7.3 Very Advanced Usage
3154 =======================
3156 * If you wish Wget to keep a mirror of a page (or FTP
3157 subdirectories), use `--mirror' (`-m'), which is the shorthand for
3158 `-r -l inf -N'. You can put Wget in the crontab file asking it to
3159 recheck a site each Sunday:
3162 0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
3164 * In addition to the above, you want the links to be converted for
3165 local viewing. But, after having read this manual, you know that
3166 link conversion doesn't play well with timestamping, so you also
3167 want Wget to back up the original HTML files before the
3168 conversion. Wget invocation would look like this:
3170 wget --mirror --convert-links --backup-converted \
3171 http://www.gnu.org/ -o /home/me/weeklog
3173 * But you've also noticed that local viewing doesn't work all that
3174 well when HTML files are saved under extensions other than `.html',
3175 perhaps because they were served as `index.cgi'. So you'd like
3176 Wget to rename all the files served with content-type `text/html'
3177 or `application/xhtml+xml' to `NAME.html'.
3179 wget --mirror --convert-links --backup-converted \
3180 --html-extension -o /home/me/weeklog \
3183 Or, with less typing:
3185 wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
3188 File: wget.info, Node: Various, Next: Appendices, Prev: Examples, Up: Top
3193 This chapter contains all the stuff that could not fit anywhere else.
3197 * Proxies:: Support for proxy servers.
3198 * Distribution:: Getting the latest version.
3199 * Web Site:: GNU Wget's presence on the World Wide Web.
3200 * Mailing Lists:: Wget mailing list for announcements and discussion.
3201 * Internet Relay Chat:: Wget's presence on IRC.
3202 * Reporting Bugs:: How and where to report bugs.
3203 * Portability:: The systems Wget works on.
3204 * Signals:: Signal-handling performed by Wget.
3207 File: wget.info, Node: Proxies, Next: Distribution, Prev: Various, Up: Various
3212 "Proxies" are special-purpose HTTP servers designed to transfer data
3213 from remote servers to local clients. One typical use of proxies is
3214 lightening network load for users behind a slow connection. This is
3215 achieved by channeling all HTTP and FTP requests through the proxy
3216 which caches the transferred data. When a cached resource is requested
3217 again, proxy will return the data from cache. Another use for proxies
3218 is for companies that separate (for security reasons) their internal
3219 networks from the rest of Internet. In order to obtain information
3220 from the Web, their users connect and retrieve remote data using an
3223 Wget supports proxies for both HTTP and FTP retrievals. The
3224 standard way to specify proxy location, which Wget recognizes, is using
3225 the following environment variables:
3229 If set, the `http_proxy' and `https_proxy' variables should
3230 contain the URLs of the proxies for HTTP and HTTPS connections
3234 This variable should contain the URL of the proxy for FTP
3235 connections. It is quite common that `http_proxy' and `ftp_proxy'
3236 are set to the same URL.
3239 This variable should contain a comma-separated list of domain
3240 extensions proxy should _not_ be used for. For instance, if the
3241 value of `no_proxy' is `.mit.edu', proxy will not be used to
3242 retrieve documents from MIT.
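The effect of `no_proxy' can be approximated by a suffix test against each entry of the comma-separated list. The function name `bypasses_proxy' is hypothetical, and plain suffix matching is an assumption based on the `.mit.edu' example; Wget's own matching may differ in detail.

```shell
# Hypothetical helper: succeed if HOST ends with one of the
# comma-separated domain extensions in LIST.
bypasses_proxy() {
    host=$1
    list=$2
    while [ -n "$list" ]; do
        suffix=${list%%,*}              # first entry of the list
        case $list in
            *,*) list=${list#*,} ;;     # drop it and keep the rest
            *)   list= ;;               # that was the last entry
        esac
        [ -n "$suffix" ] || continue    # ignore empty entries
        case $host in
            *"$suffix") return 0 ;;     # host ends with this extension
        esac
    done
    return 1
}
```

With `no_proxy' set to `.mit.edu', a call such as `bypasses_proxy web.mit.edu "$no_proxy"' succeeds, while a host outside MIT does not match.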
3244 In addition to the environment variables, proxy location and settings
3245 may be specified from within Wget itself.
3249 This option and the corresponding command may be used to suppress
3250 the use of proxy, even if the appropriate environment variables
3257 These startup file variables allow you to override the proxy
3258 settings specified by the environment.
3260 Some proxy servers require authorization to enable you to use them.
3261 The authorization consists of "username" and "password", which must be
3262 sent by Wget. As with HTTP authorization, several authentication
3263 schemes exist. For proxy authorization only the `Basic' authentication
3264 scheme is currently implemented.
3266 You may specify your username and password either through the proxy
3267 URL or through the command-line options. Assuming that the company's
3268 proxy is located at `proxy.company.com' at port 8001, a proxy URL
3269 location containing authorization data might look like this:
3271 http://hniksic:mypassword@proxy.company.com:8001/
3273 Alternatively, you may use the `proxy-user' and `proxy-password'
3274 options, and the equivalent `.wgetrc' settings `proxy_user' and
3275 `proxy_password' to set the proxy username and password.
3278 File: wget.info, Node: Distribution, Next: Web Site, Prev: Proxies, Up: Various
3283 Like all GNU utilities, the latest version of Wget can be found at the
3284 master GNU archive site ftp.gnu.org, and its mirrors. For example,
3285 Wget 1.12 can be found at
3286 `ftp://ftp.gnu.org/pub/gnu/wget/wget-1.12.tar.gz'
3289 File: wget.info, Node: Web Site, Next: Mailing Lists, Prev: Distribution, Up: Various
3294 The official web site for GNU Wget is at
3295 `http://www.gnu.org/software/wget/'. However, most useful information
3296 resides at "The Wget Wgiki", `http://wget.addictivecode.org/'.
3299 File: wget.info, Node: Mailing Lists, Next: Internet Relay Chat, Prev: Web Site, Up: Various
3307 The primary mailing list for discussion, bug reports, or questions about
3308 GNU Wget is at <bug-wget@gnu.org>. To subscribe, send an email to
3309 <bug-wget-join@gnu.org>, or visit
3310 `http://lists.gnu.org/mailman/listinfo/bug-wget'.
3312 You do not need to subscribe to send a message to the list; however,
3313 please note that unsubscribed messages are moderated, and may take a
3314 while before they hit the list--*usually around a day*. If you want
3315 your message to show up immediately, please subscribe to the list
3316 before posting. Archives for the list may be found at
3317 `http://lists.gnu.org/pipermail/bug-wget/'.
3319 An NNTP/Usenettish gateway is also available via Gmane
3320 (http://gmane.org/about.php). You can see the Gmane archives at
3321 `http://news.gmane.org/gmane.comp.web.wget.general'. Note that the
3322 Gmane archives conveniently include messages from both the current
3323 list, and the previous one. Messages also show up in the Gmane archives
3324 sooner than they do at `lists.gnu.org'.
3329 Additionally, there is the <wget-notify@addictivecode.org> mailing
3330 list. This is a non-discussion list that receives bug report
3331 notifications from the bug-tracker. To subscribe to this list, send an
3332 email to <wget-notify-join@addictivecode.org>, or visit
3333 `http://addictivecode.org/mailman/listinfo/wget-notify'.
3338 Previously, the mailing list <wget@sunsite.dk> was used as the main
3339 discussion list, and another list, <wget-patches@sunsite.dk> was used
3340 for submitting and discussing patches to GNU Wget.
3342 Messages from <wget@sunsite.dk> are archived at
3343 `http://www.mail-archive.com/wget%40sunsite.dk/' and at
3345 `http://news.gmane.org/gmane.comp.web.wget.general' (which also
3346 continues to archive the current list, <bug-wget@gnu.org>).
3348 Messages from <wget-patches@sunsite.dk> are archived at
3349 `http://news.gmane.org/gmane.comp.web.wget.patches'.
3352 File: wget.info, Node: Internet Relay Chat, Next: Reporting Bugs, Prev: Mailing Lists, Up: Various
3354 8.5 Internet Relay Chat
3355 =======================
3357 In addition to the mailing lists, we also have a support channel set up
3358 via IRC at `irc.freenode.org', `#wget'. Come check it out!
3361 File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Internet Relay Chat, Up: Various
3366 You are welcome to submit bug reports via the GNU Wget bug tracker (see
3367 `http://wget.addictivecode.org/BugTracker').
3369 Before actually submitting a bug report, please try to follow a few
3372 1. Please try to ascertain that the behavior you see really is a bug.
3373 If Wget crashes, it's a bug. If Wget does not behave as
3374 documented, it's a bug. If things work strangely, but you are not
3375 sure about the way they are supposed to work, it might well be a
3376 bug, but you might want to double-check the documentation and the
3377 mailing lists (*note Mailing Lists::).
3379 2. Try to repeat the bug in as simple circumstances as possible.
3380 E.g. if Wget crashes while downloading `wget -rl0 -kKE -t5
3381 --no-proxy http://yoyodyne.com -o /tmp/log', you should try to see
3382 if the crash is repeatable, and if it will occur with a simpler set
3383 of options. You might even try to start the download at the page
3384 where the crash occurred to see if that page somehow triggered the
3387 Also, while I will probably be interested to know the contents of
3388 your `.wgetrc' file, just dumping it into the debug message is
3389 probably a bad idea. Instead, you should first try to see if the
3390 bug repeats with `.wgetrc' moved out of the way. Only if it turns
3391 out that `.wgetrc' settings affect the bug, mail me the relevant
3394 3. Please start Wget with `-d' option and send us the resulting
3395 output (or relevant parts thereof). If Wget was compiled without
3396 debug support, recompile it--it is _much_ easier to trace bugs
3397 with debug support on.
3399 Note: please make sure to remove any potentially sensitive
3400 information from the debug log before sending it to the bug
3401 address. The `-d' won't go out of its way to collect sensitive
3402 information, but the log _will_ contain a fairly complete
3403 transcript of Wget's communication with the server, which may
3404 include passwords and pieces of downloaded data. Since the bug
3405 address is publicly archived, you may assume that all bug
3406 reports are visible to the public.
3408 4. If Wget has crashed, try to run it in a debugger, e.g. `gdb `which
3409 wget` core' and type `where' to get the backtrace. This may not
3410 work if the system administrator has disabled core files, but it is
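The advice about scrubbing the debug log could be followed with a small
filter such as the one sketched below. The two patterns (HTTP
authorization headers, and passwords embedded in URLs) are assumptions
about where secrets are likely to appear; they are not an exhaustive list,
so read the result before posting it.

```shell
# Filter standard input, masking likely secrets in a Wget debug log.
redact_log() {
    sed -E \
        -e 's/^((Proxy-)?Authorization:).*/\1 [REDACTED]/' \
        -e 's#(://[^:/@ ]+):[^@/ ]+@#\1:[REDACTED]@#g'
}

# Hypothetical usage, after reproducing the bug with debugging enabled:
#   wget -d -o /tmp/log http://yoyodyne.com
#   redact_log < /tmp/log > /tmp/log.clean
```

The first expression blanks the value of `Authorization' and
`Proxy-Authorization' header lines; the second keeps the username in a
`user:password@host' URL but masks the password.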
3414 File: wget.info, Node: Portability, Next: Signals, Prev: Reporting Bugs, Up: Various
3419 Like all GNU software, Wget works on the GNU system. However, since it
3420 uses GNU Autoconf for building and configuring, and mostly avoids using
3421 "special" features of any particular Unix, it should compile (and work)
3422 on all common Unix flavors.
3424 Various Wget versions have been compiled and tested under many kinds
3425 of Unix systems, including GNU/Linux, Solaris, SunOS 4.x, Mac OS X, OSF
3426 (aka Digital Unix or Tru64), Ultrix, *BSD, IRIX, AIX, and others. Some
3427 of those systems are no longer in widespread use and may not be able to
3428 support recent versions of Wget. If Wget fails to compile on your
3429 system, we would like to know about it.
3431 Thanks to kind contributors, this version of Wget compiles and works
3432 on 32-bit Microsoft Windows platforms. It has been compiled
3433 successfully using MS Visual C++ 6.0, Watcom, Borland C, and GCC
3434 compilers. Naturally, it lacks some of the features available on
3435 Unix, but it should work as a substitute for people stuck with Windows.
3436 Note that Windows-specific portions of Wget are not guaranteed to be
3437 supported in the future, although this has been the case in practice
3438 for many years now. All questions and problems in Windows usage should
3439 be reported to the Wget mailing list at <bug-wget@gnu.org>, where the
3440 volunteers who maintain the Windows-related features might look at them.
3442 Support for building on MS-DOS via DJGPP has been contributed by
3443 Gisle Vanem; a port to VMS is maintained by Steven Schweda, and is
3444 available at `http://antinode.org/'.
3447 File: wget.info, Node: Signals, Prev: Portability, Up: Various
3452 Since the purpose of Wget is background work, it catches the hangup
3453 signal (`SIGHUP') and ignores it. If the output was on standard
3454 output, it will be redirected to a file named `wget-log'. Otherwise,
3455 `SIGHUP' is ignored. This is convenient when you wish to redirect the
3456 output of Wget after having started it.
3458 $ wget http://www.gnus.org/dist/gnus.tar.gz &
3461 SIGHUP received, redirecting output to `wget-log'.
3463 Other than that, Wget will not try to interfere with signals in any
3464 way. `C-c', `kill -TERM' and `kill -KILL' should kill it alike.
3467 File: wget.info, Node: Appendices, Next: Copying this manual, Prev: Various, Up: Top
3472 This chapter contains some references I consider useful.
3476 * Robot Exclusion:: Wget's support for RES.
3477 * Security Considerations:: Security with Wget.
3478 * Contributors:: People who helped.
3481 File: wget.info, Node: Robot Exclusion, Next: Security Considerations, Prev: Appendices, Up: Appendices
3486 It is extremely easy to make Wget wander aimlessly around a web site,
3487 sucking up all the available data in the process. `wget -r SITE', and you're
3488 set. Great? Not for the server admin.
3490 As long as Wget is only retrieving static pages, and doing it at a
3491 reasonable rate (see the `--wait' option), there's not much of a
3492 problem. The trouble is that Wget can't tell the difference between the
3493 smallest static page and the most demanding CGI. A site I know has a
3494 section handled by a CGI Perl script that converts Info files to HTML on
3495 the fly. The script is slow, but works well enough for human users
3496 viewing an occasional Info file. However, when someone's recursive Wget
3497 download stumbles upon the index page that links to all the Info files
3498 through the script, the system is brought to its knees without providing
3499 anything useful to the user. (This task of converting Info files could be
3500 done locally, and access to Info documentation for all installed GNU
3501 software on a system is available from the `info' command.)
3503 To avoid this kind of accident, as well as to preserve privacy for
3504 documents that need to be protected from well-behaved robots, the
3505 concept of "robot exclusion" was invented. The idea is that the server
3506 administrators and document authors can specify which portions of the
3507 site they wish to protect from robots and which they permit robots to access.
3509 The most popular mechanism, and the de facto standard supported by
3510 all the major robots, is the "Robots Exclusion Standard" (RES) written
3511 by Martijn Koster et al. in 1994. It specifies the format of a text
3512 file containing directives that instruct the robots which URL paths to
3513 avoid. To be found by the robots, the specifications must be placed in
3514 `/robots.txt' in the server root, which the robots are expected to
3517 Although Wget is not a web robot in the strictest sense of the word,
3518 it can download large parts of a site without the user intervening to
3519 request each individual page. Because of that, Wget honors RES when
3520 downloading recursively. For instance, when you issue:
3522 wget -r http://www.server.com/
3524 First the index of `www.server.com' will be downloaded. If Wget
3525 finds that it wants to download more documents from that server, it will
3526 request `http://www.server.com/robots.txt' and, if found, use it for
3527 further downloads. `robots.txt' is loaded only once per server.
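To make the mechanism concrete, the sketch below writes a small
`robots.txt' and tests URL paths against its `Disallow' lines by prefix.
The two paths are invented, and the check is deliberately simplified: it
applies every `Disallow' line regardless of the `User-agent' record it
belongs to. It illustrates the file format only and is not Wget's own
parser.

```shell
# A sample robots.txt, written to a temporary file.
robots=$(mktemp)
cat > "$robots" <<'EOF'
# Keep robots away from the expensive CGI scripts.
User-agent: *
Disallow: /cgi-bin/
Disallow: /info2html/
EOF

# Print "blocked" if PATH ($2) starts with any Disallow prefix found in
# the robots file ($1), and "allowed" otherwise.
check_path() {
    awk -v path="$2" '
        tolower($1) == "disallow:" && $2 != "" && index(path, $2) == 1 {
            blocked = 1
        }
        END { print (blocked ? "blocked" : "allowed") }
    ' "$1"
}

check_path "$robots" /index.html
check_path "$robots" /cgi-bin/info2html
```

A robot that honors the file would fetch the first path and skip the
second.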
3529 Until version 1.8, Wget supported the first version of the standard,
3530 written by Martijn Koster in 1994 and available at
3531 `http://www.robotstxt.org/wc/norobots.html'. As of version 1.8, Wget
3532 has supported the additional directives specified in the internet draft
3533 `<draft-koster-robots-00.txt>' titled "A Method for Web Robots
3534 Control". The draft, which as far as I know has never made it to an RFC,
3535 is available at `http://www.robotstxt.org/wc/norobots-rfc.txt'.
3537 This manual no longer includes the text of the Robot Exclusion
3540 The second, lesser-known mechanism enables the author of an individual
3541 document to specify whether they want the links from the file to be
3542 followed by a robot. This is achieved using the `META' tag, like this:
3544 <meta name="robots" content="nofollow">
3546 This is explained in some detail at
3547 `http://www.robotstxt.org/wc/meta-user.html'. Wget supports this
3548 method of robot exclusion in addition to the usual `/robots.txt'
3551 If you know what you are doing and really really wish to turn off the
3552 robot exclusion, set the `robots' variable to `off' in your `.wgetrc'.
3553 You can achieve the same effect from the command line using the `-e'
3554 switch, e.g. `wget -e robots=off URL...'.
3557 File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robot Exclusion, Up: Appendices
3559 9.2 Security Considerations
3560 ===========================
3562 When using Wget, you must be aware that it sends unencrypted passwords
3563 through the network, which may present a security problem. Here are the
3564 main issues, and some solutions.
3566 1. The passwords on the command line are visible using `ps'. The best
3567 way around it is to use `wget -i -' and feed the URLs to Wget's
3568 standard input, each on a separate line, terminated by `C-d'.
3569 Another workaround is to use `.netrc' to store passwords; however,
3570 storing unencrypted passwords is also considered a security risk.
3572 2. Using the insecure "basic" authentication scheme, unencrypted
3573 passwords are transmitted through the network routers and gateways.
3575 3. The FTP passwords are also in no way encrypted. There is no good
3576 solution for this at the moment.
3578 4. Although the "normal" output of Wget tries to hide the passwords,
3579 debugging logs show them, in all forms. This problem is avoided by
3580 being careful when you send debug logs (yes, even when you send
3584 File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
3589 GNU Wget was written by Hrvoje Niksic <hniksic@xemacs.org>, and it is
3590 currently maintained by Micah Cowan <micah@cowan.name>.
3592 However, the development of Wget could never have gone as far as it
3593 has, were it not for the help of many people, either with bug reports,
3594 feature proposals, patches, or letters saying "Thanks!".
3596 Special thanks go to the following people (in no particular order):
3598 * Dan Harkless--contributed a lot of code and documentation of
3599 extremely high quality, as well as the `--page-requisites' and
3600 related options. He was the principal maintainer for some time and
3603 * Ian Abbott--contributed bug fixes, Windows-related fixes, and
3604 provided a prototype implementation of the breadth-first recursive
3605 download. Co-maintained Wget during the 1.8 release cycle.
3607 * The dotsrc.org crew, in particular Karsten Thygesen--donated system
3608 resources such as the mailing list, web space, FTP space, and
3609 version control repositories, along with a lot of time to make
3610 these actually work. Christian Reiniger was of invaluable help
3611 with setting up Subversion.
3613 * Heiko Herold--provided high-quality Windows builds and contributed
3614 bug and build reports for many years.
3616 * Shawn McHorse--bug reports and patches.
3618 * Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization. Lots of
3621 * Gordon Matzigkeit--`.netrc' support.
3623 * Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
3624 suggestions and "philosophical" discussions.
3626 * Darko Budor--initial port to Windows.
3628 * Antonio Rosella--help and suggestions, plus the initial Italian
3631 * Tomislav Petrovic, Mario Mikocevic--many bug reports and
3634 * Francois Pinard--many thorough bug reports and discussions.
3636 * Karl Eichwalder--lots of help with internationalization, Makefile
3637 layout and many other things.
3639 * Junio Hamano--donated support for Opie and HTTP `Digest'
3642 * Mauro Tortonesi--improved IPv6 support, adding support for dual
3643 family systems. Refactored and enhanced FTP IPv6 code. Maintained
3644 GNU Wget from 2004-2007.
3646 * Christopher G. Lewis--maintenance of the Windows version of GNU
3649 * Gisle Vanem--many helpful patches and improvements, especially for
3650 Windows and MS-DOS support.
3652 * Ralf Wildenhues--contributed patches to convert Wget to use
3653 Automake as part of its build process, and various bugfixes.
3655 * Steven Schubiger--Many helpful patches, bugfixes and improvements.
3656 Notably, conversion of Wget to use the Gnulib quotes and quoteargs
3657 modules, and the addition of password prompts at the console, via
3658 the Gnulib getpasswd-gnu module.
3660 * Ted Mielczarek--donated support for CSS.
3662 * Saint Xavier--Support for IRIs (RFC 3987).
3664 * People who provided donations for development--including Brian
3667 The following people have provided patches, bug/build reports, useful
3668 suggestions, beta testing services, fan mail and all the other things
3669 that make maintenance so much fun:
3671 Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman,
3672 Dan Berger, T. Bharath, Christian Biere, Paul Bludov, Daniel Bodea,
3673 Mark Boyns, John Burden, Julien Buty, Wanderlei Cavassin, Gilles Cedoc,
3674 Tim Charron, Noel Cragg, Kristijan Conkas, John Daily, Andreas Damm,
3675 Ahmon Dancy, Andrew Davison, Bertrand Demiddelaer, Alexander Dergachev,
3676 Andrew Deryabin, Ulrich Drepper, Marc Duponcheel, Damir Dzeko, Alan
3677 Eldridge, Hans-Andreas Engel, Aleksandar Erkalovic, Andy Eskilsson,
3678 Joao Ferreira, Christian Fraenkel, David Fritz, Mike Frysinger, Charles
3679 C. Fu, FUJISHIMA Satsuki, Masashi Fujita, Howard Gayle, Marcel Gerrits,
3680 Lemble Gregory, Hans Grobler, Alain Guibert, Mathieu Guillaume, Aaron
3681 Hawley, Jochen Hein, Karl Heuer, Madhusudan Hosaagrahara, HIROSE
3682 Masaaki, Ulf Harnhammar, Gregor Hoffleit, Erik Magnus Hulthen, Richard
3683 Huveneers, Jonas Jensen, Larry Jones, Simon Josefsson, Mario Juric,
3684 Hack Kampbjorn, Const Kaplinsky, Goran Kezunovic, Igor Khristophorov,
3685 Robert Kleine, KOJIMA Haime, Fila Kolodny, Alexander Kourakos, Martin
3686 Kraemer, Sami Krank, Jay Krell, Simos KSenitellis, Christian Lackas,
3687 Hrvoje Lacko, Daniel S. Lewart, Nicolas Lichtmeier, Dave Love,
3688 Alexander V. Lukyanov, Thomas Lussnig, Andre Majorel, Aurelien Marchand,
3689 Matthew J. Mellon, Jordan Mendelson, Ted Mielczarek, Robert Millan, Lin
3690 Zhe Min, Jan Minar, Tim Mooney, Keith Moore, Adam D. Moss, Simon Munton,
3691 Charlie Negyesi, R. K. Owen, Jim Paris, Kenny Parnell, Leonid Petrov,
3692 Simone Piunno, Andrew Pollock, Steve Pothier, Jan Prikryl, Marin Purgar,
3693 Csaba Raduly, Keith Refson, Bill Richardson, Tyler Riddle, Tobias
3694 Ringstrom, Jochen Roderburg, Juan Jose Rodriguez, Maciej W. Rozycki,
3695 Edward J. Sabol, Heinz Salzmann, Robert Schmidt, Nicolas Schodet, Benno
3696 Schulenberg, Andreas Schwab, Steven M. Schweda, Chris Seawood, Pranab
3697 Shenoy, Dennis Smit, Toomas Soome, Tage Stabell-Kulo, Philip Stadermann,
3698 Daniel Stenberg, Sven Sternberger, Markus Strasser, John Summerfield,
3699 Szakacsits Szabolcs, Mike Thomas, Philipp Thomas, Mauro Tortonesi, Dave
3700 Turner, Gisle Vanem, Rabin Vincent, Russell Vincent, Zeljko Vrba,
3701 Charles G Waldman, Douglas E. Wegscheid, Ralf Wildenhues, Joshua David
3702 Williams, Benjamin Wolsey, Saint Xavier, YAMAZAKI Makoto, Jasmin Zainul,
3703 Bojan Zdrnja, Kristijan Zimmer, Xin Zou.
3705 Apologies to all whom I accidentally left out, and many thanks to all
3706 the subscribers of the Wget mailing list.
3709 File: wget.info, Node: Copying this manual, Next: Concept Index, Prev: Appendices, Up: Top
3711 Appendix A Copying this manual
3712 ******************************
3716 * GNU Free Documentation License:: License for copying this manual.
3719 File: wget.info, Node: GNU Free Documentation License, Prev: Copying this manual, Up: Copying this manual
3721 A.1 GNU Free Documentation License
3722 ==================================
3724 Version 1.3, 3 November 2008
3726 Copyright (C) 2000, 2001, 2002, 2007, 2008, 2009 Free Software
3730 Everyone is permitted to copy and distribute verbatim copies
3731 of this license document, but changing it is not allowed.
3735 The purpose of this License is to make a manual, textbook, or other
3736 functional and useful document "free" in the sense of freedom: to
3737 assure everyone the effective freedom to copy and redistribute it,
3738 with or without modifying it, either commercially or
3739 noncommercially. Secondarily, this License preserves for the
3740 author and publisher a way to get credit for their work, while not
3741 being considered responsible for modifications made by others.
3743 This License is a kind of "copyleft", which means that derivative
3744 works of the document must themselves be free in the same sense.
3745 It complements the GNU General Public License, which is a copyleft
3746 license designed for free software.
3748 We have designed this License in order to use it for manuals for
3749 free software, because free software needs free documentation: a
3750 free program should come with manuals providing the same freedoms
3751 that the software does. But this License is not limited to
3752 software manuals; it can be used for any textual work, regardless
3753 of subject matter or whether it is published as a printed book.
3754 We recommend this License principally for works whose purpose is
3755 instruction or reference.
3757 1. APPLICABILITY AND DEFINITIONS
3759 This License applies to any manual or other work, in any medium,
3760 that contains a notice placed by the copyright holder saying it
3761 can be distributed under the terms of this License. Such a notice
3762 grants a world-wide, royalty-free license, unlimited in duration,
3763 to use that work under the conditions stated herein. The
3764 "Document", below, refers to any such manual or work. Any member
3765 of the public is a licensee, and is addressed as "you". You
3766 accept the license if you copy, modify or distribute the work in a
3767 way requiring permission under copyright law.
3769 A "Modified Version" of the Document means any work containing the
3770 Document or a portion of it, either copied verbatim, or with
3771 modifications and/or translated into another language.
3773 A "Secondary Section" is a named appendix or a front-matter section
3774 of the Document that deals exclusively with the relationship of the
3775 publishers or authors of the Document to the Document's overall
3776 subject (or to related matters) and contains nothing that could
3777 fall directly within that overall subject. (Thus, if the Document
3778 is in part a textbook of mathematics, a Secondary Section may not
3779 explain any mathematics.) The relationship could be a matter of
3780 historical connection with the subject or with related matters, or
3781 of legal, commercial, philosophical, ethical or political position
3784 The "Invariant Sections" are certain Secondary Sections whose
3785 titles are designated, as being those of Invariant Sections, in
3786 the notice that says that the Document is released under this
3787 License. If a section does not fit the above definition of
3788 Secondary then it is not allowed to be designated as Invariant.
3789 The Document may contain zero Invariant Sections. If the Document
3790 does not identify any Invariant Sections then there are none.
3792 The "Cover Texts" are certain short passages of text that are
3793 listed, as Front-Cover Texts or Back-Cover Texts, in the notice
3794 that says that the Document is released under this License. A
3795 Front-Cover Text may be at most 5 words, and a Back-Cover Text may
3796 be at most 25 words.
3798 A "Transparent" copy of the Document means a machine-readable copy,
3799 represented in a format whose specification is available to the
3800 general public, that is suitable for revising the document
3801 straightforwardly with generic text editors or (for images
3802 composed of pixels) generic paint programs or (for drawings) some
3803 widely available drawing editor, and that is suitable for input to
3804 text formatters or for automatic translation to a variety of
3805 formats suitable for input to text formatters. A copy made in an
3806 otherwise Transparent file format whose markup, or absence of
3807 markup, has been arranged to thwart or discourage subsequent
3808 modification by readers is not Transparent. An image format is
3809 not Transparent if used for any substantial amount of text. A
3810 copy that is not "Transparent" is called "Opaque".
3812 Examples of suitable formats for Transparent copies include plain
3813 ASCII without markup, Texinfo input format, LaTeX input format,
3814 SGML or XML using a publicly available DTD, and
3815 standard-conforming simple HTML, PostScript or PDF designed for
3816 human modification. Examples of transparent image formats include
3817 PNG, XCF and JPG. Opaque formats include proprietary formats that
3818 can be read and edited only by proprietary word processors, SGML or
3819 XML for which the DTD and/or processing tools are not generally
3820 available, and the machine-generated HTML, PostScript or PDF
3821 produced by some word processors for output purposes only.
3823 The "Title Page" means, for a printed book, the title page itself,
3824 plus such following pages as are needed to hold, legibly, the
3825 material this License requires to appear in the title page. For
3826 works in formats which do not have any title page as such, "Title
3827 Page" means the text near the most prominent appearance of the
3828 work's title, preceding the beginning of the body of the text.
3830 The "publisher" means any person or entity that distributes copies
3831 of the Document to the public.
3833 A section "Entitled XYZ" means a named subunit of the Document
3834 whose title either is precisely XYZ or contains XYZ in parentheses
3835 following text that translates XYZ in another language. (Here XYZ
3836 stands for a specific section name mentioned below, such as
3837 "Acknowledgements", "Dedications", "Endorsements", or "History".)
3838 To "Preserve the Title" of such a section when you modify the
3839 Document means that it remains a section "Entitled XYZ" according
3842 The Document may include Warranty Disclaimers next to the notice
3843 which states that this License applies to the Document. These
3844 Warranty Disclaimers are considered to be included by reference in
3845 this License, but only as regards disclaiming warranties: any other
3846 implication that these Warranty Disclaimers may have is void and
3847 has no effect on the meaning of this License.
3851 You may copy and distribute the Document in any medium, either
3852 commercially or noncommercially, provided that this License, the
3853 copyright notices, and the license notice saying this License
3854 applies to the Document are reproduced in all copies, and that you
3855 add no other conditions whatsoever to those of this License. You
3856 may not use technical measures to obstruct or control the reading
3857 or further copying of the copies you make or distribute. However,
3858 you may accept compensation in exchange for copies. If you
3859 distribute a large enough number of copies you must also follow
3860 the conditions in section 3.
3862 You may also lend copies, under the same conditions stated above,
3863 and you may publicly display copies.
3865 3. COPYING IN QUANTITY
3867 If you publish printed copies (or copies in media that commonly
3868 have printed covers) of the Document, numbering more than 100, and
3869 the Document's license notice requires Cover Texts, you must
3870 enclose the copies in covers that carry, clearly and legibly, all
3871 these Cover Texts: Front-Cover Texts on the front cover, and
3872 Back-Cover Texts on the back cover. Both covers must also clearly
3873 and legibly identify you as the publisher of these copies. The
3874 front cover must present the full title with all words of the
3875 title equally prominent and visible. You may add other material
3876 on the covers in addition. Copying with changes limited to the
3877 covers, as long as they preserve the title of the Document and
3878 satisfy these conditions, can be treated as verbatim copying in
3881 If the required texts for either cover are too voluminous to fit
3882 legibly, you should put the first ones listed (as many as fit
3883 reasonably) on the actual cover, and continue the rest onto
3886 If you publish or distribute Opaque copies of the Document
3887 numbering more than 100, you must either include a
3888 machine-readable Transparent copy along with each Opaque copy, or
3889 state in or with each Opaque copy a computer-network location from
3890 which the general network-using public has access to download
3891 using public-standard network protocols a complete Transparent
3892 copy of the Document, free of added material. If you use the
3893 latter option, you must take reasonably prudent steps, when you
3894 begin distribution of Opaque copies in quantity, to ensure that
3895 this Transparent copy will remain thus accessible at the stated
3896 location until at least one year after the last time you
3897 distribute an Opaque copy (directly or through your agents or
3898 retailers) of that edition to the public.
3900 It is requested, but not required, that you contact the authors of
3901 the Document well before redistributing any large number of
3902 copies, to give them a chance to provide you with an updated
3903 version of the Document.
3907 You may copy and distribute a Modified Version of the Document
3908 under the conditions of sections 2 and 3 above, provided that you
3909 release the Modified Version under precisely this License, with
3910 the Modified Version filling the role of the Document, thus
3911 licensing distribution and modification of the Modified Version to
3912 whoever possesses a copy of it. In addition, you must do these
3913 things in the Modified Version:
3915 A. Use in the Title Page (and on the covers, if any) a title
3916 distinct from that of the Document, and from those of
3917 previous versions (which should, if there were any, be listed
3918 in the History section of the Document). You may use the
3919 same title as a previous version if the original publisher of
3920 that version gives permission.
3922 B. List on the Title Page, as authors, one or more persons or
3923 entities responsible for authorship of the modifications in
3924 the Modified Version, together with at least five of the
3925 principal authors of the Document (all of its principal
3926 authors, if it has fewer than five), unless they release you
3927 from this requirement.
3929 C. State on the Title page the name of the publisher of the
3930 Modified Version, as the publisher.
3932 D. Preserve all the copyright notices of the Document.
3934 E. Add an appropriate copyright notice for your modifications
3935 adjacent to the other copyright notices.
3937 F. Include, immediately after the copyright notices, a license
3938 notice giving the public permission to use the Modified
3939 Version under the terms of this License, in the form shown in
3942 G. Preserve in that license notice the full lists of Invariant
3943 Sections and required Cover Texts given in the Document's
3946 H. Include an unaltered copy of this License.
3948 I. Preserve the section Entitled "History", Preserve its Title,
3949 and add to it an item stating at least the title, year, new
3950 authors, and publisher of the Modified Version as given on
3951 the Title Page. If there is no section Entitled "History" in
3952 the Document, create one stating the title, year, authors,
3953 and publisher of the Document as given on its Title Page,
3954 then add an item describing the Modified Version as stated in
3955 the previous sentence.
3957 J. Preserve the network location, if any, given in the Document
3958 for public access to a Transparent copy of the Document, and
3959 likewise the network locations given in the Document for
3960 previous versions it was based on. These may be placed in
3961 the "History" section. You may omit a network location for a
3962 work that was published at least four years before the
3963 Document itself, or if the original publisher of the version
3964 it refers to gives permission.
3966 K. For any section Entitled "Acknowledgements" or "Dedications",
3967 Preserve the Title of the section, and preserve in the
3968 section all the substance and tone of each of the contributor
3969 acknowledgements and/or dedications given therein.
3971 L. Preserve all the Invariant Sections of the Document,
3972 unaltered in their text and in their titles. Section numbers
3973 or the equivalent are not considered part of the section
3976 M. Delete any section Entitled "Endorsements". Such a section
3977 may not be included in the Modified Version.
3979 N. Do not retitle any existing section to be Entitled
3980 "Endorsements" or to conflict in title with any Invariant Section.
3983 O. Preserve any Warranty Disclaimers.
3985 If the Modified Version includes new front-matter sections or
3986 appendices that qualify as Secondary Sections and contain no
3987 material copied from the Document, you may at your option
3988 designate some or all of these sections as invariant. To do this,
3989 add their titles to the list of Invariant Sections in the Modified
3990 Version's license notice. These titles must be distinct from any
3991 other section titles.
3993 You may add a section Entitled "Endorsements", provided it contains
3994 nothing but endorsements of your Modified Version by various
3995 parties--for example, statements of peer review or that the text
3996 has been approved by an organization as the authoritative
3997 definition of a standard.
3999 You may add a passage of up to five words as a Front-Cover Text,
4000 and a passage of up to 25 words as a Back-Cover Text, to the end
4001 of the list of Cover Texts in the Modified Version. Only one
4002 passage of Front-Cover Text and one of Back-Cover Text may be
4003 added by (or through arrangements made by) any one entity. If the
4004 Document already includes a cover text for the same cover,
4005 previously added by you or by arrangement made by the same entity
4006 you are acting on behalf of, you may not add another; but you may
4007 replace the old one, on explicit permission from the previous
4008 publisher that added the old one.
4010 The author(s) and publisher(s) of the Document do not by this
4011 License give permission to use their names for publicity for or to
4012 assert or imply endorsement of any Modified Version.
4014 5. COMBINING DOCUMENTS
4016 You may combine the Document with other documents released under
4017 this License, under the terms defined in section 4 above for
4018 modified versions, provided that you include in the combination
4019 all of the Invariant Sections of all of the original documents,
4020 unmodified, and list them all as Invariant Sections of your
4021 combined work in its license notice, and that you preserve all
4022 their Warranty Disclaimers.
4024 The combined work need only contain one copy of this License, and
4025 multiple identical Invariant Sections may be replaced with a single
4026 copy. If there are multiple Invariant Sections with the same name
4027 but different contents, make the title of each such section unique
4028 by adding at the end of it, in parentheses, the name of the
4029 original author or publisher of that section if known, or else a
4030 unique number. Make the same adjustment to the section titles in
4031 the list of Invariant Sections in the license notice of the combined work.
4034 In the combination, you must combine any sections Entitled
4035 "History" in the various original documents, forming one section
4036 Entitled "History"; likewise combine any sections Entitled
4037 "Acknowledgements", and any sections Entitled "Dedications". You
4038 must delete all sections Entitled "Endorsements."
4040 6. COLLECTIONS OF DOCUMENTS
4042 You may make a collection consisting of the Document and other
4043 documents released under this License, and replace the individual
4044 copies of this License in the various documents with a single copy
4045 that is included in the collection, provided that you follow the
4046 rules of this License for verbatim copying of each of the
4047 documents in all other respects.
4049 You may extract a single document from such a collection, and
4050 distribute it individually under this License, provided you insert
4051 a copy of this License into the extracted document, and follow
4052 this License in all other respects regarding verbatim copying of that document.
4055 7. AGGREGATION WITH INDEPENDENT WORKS
4057 A compilation of the Document or its derivatives with other
4058 separate and independent documents or works, in or on a volume of
4059 a storage or distribution medium, is called an "aggregate" if the
4060 copyright resulting from the compilation is not used to limit the
4061 legal rights of the compilation's users beyond what the individual
4062 works permit. When the Document is included in an aggregate, this
4063 License does not apply to the other works in the aggregate which
4064 are not themselves derivative works of the Document.
4066 If the Cover Text requirement of section 3 is applicable to these
4067 copies of the Document, then if the Document is less than one half
4068 of the entire aggregate, the Document's Cover Texts may be placed
4069 on covers that bracket the Document within the aggregate, or the
4070 electronic equivalent of covers if the Document is in electronic
4071 form. Otherwise they must appear on printed covers that bracket
4072 the whole aggregate.
4074 8. TRANSLATION
4076 Translation is considered a kind of modification, so you may
4077 distribute translations of the Document under the terms of section
4078 4. Replacing Invariant Sections with translations requires special
4079 permission from their copyright holders, but you may include
4080 translations of some or all Invariant Sections in addition to the
4081 original versions of these Invariant Sections. You may include a
4082 translation of this License, and all the license notices in the
4083 Document, and any Warranty Disclaimers, provided that you also
4084 include the original English version of this License and the
4085 original versions of those notices and disclaimers. In case of a
4086 disagreement between the translation and the original version of
4087 this License or a notice or disclaimer, the original version will prevail.
4090 If a section in the Document is Entitled "Acknowledgements",
4091 "Dedications", or "History", the requirement (section 4) to
4092 Preserve its Title (section 1) will typically require changing the actual title.
4095 9. TERMINATION
4097 You may not copy, modify, sublicense, or distribute the Document
4098 except as expressly provided under this License. Any attempt
4099 otherwise to copy, modify, sublicense, or distribute it is void,
4100 and will automatically terminate your rights under this License.
4102 However, if you cease all violation of this License, then your
4103 license from a particular copyright holder is reinstated (a)
4104 provisionally, unless and until the copyright holder explicitly
4105 and finally terminates your license, and (b) permanently, if the
4106 copyright holder fails to notify you of the violation by some
4107 reasonable means prior to 60 days after the cessation.
4109 Moreover, your license from a particular copyright holder is
4110 reinstated permanently if the copyright holder notifies you of the
4111 violation by some reasonable means, this is the first time you have
4112 received notice of violation of this License (for any work) from
4113 that copyright holder, and you cure the violation prior to 30 days
4114 after your receipt of the notice.
4116 Termination of your rights under this section does not terminate
4117 the licenses of parties who have received copies or rights from
4118 you under this License. If your rights have been terminated and
4119 not permanently reinstated, receipt of a copy of some or all of
4120 the same material does not give you any rights to use it.
4122 10. FUTURE REVISIONS OF THIS LICENSE
4124 The Free Software Foundation may publish new, revised versions of
4125 the GNU Free Documentation License from time to time. Such new
4126 versions will be similar in spirit to the present version, but may
4127 differ in detail to address new problems or concerns. See
4128 `http://www.gnu.org/copyleft/'.
4130 Each version of the License is given a distinguishing version
4131 number. If the Document specifies that a particular numbered
4132 version of this License "or any later version" applies to it, you
4133 have the option of following the terms and conditions either of
4134 that specified version or of any later version that has been
4135 published (not as a draft) by the Free Software Foundation. If
4136 the Document does not specify a version number of this License,
4137 you may choose any version ever published (not as a draft) by the
4138 Free Software Foundation. If the Document specifies that a proxy
4139 can decide which future versions of this License can be used, that
4140 proxy's public statement of acceptance of a version permanently
4141 authorizes you to choose that version for the Document.
4143 11. RELICENSING
4145 "Massive Multiauthor Collaboration Site" (or "MMC Site") means any
4146 World Wide Web server that publishes copyrightable works and also
4147 provides prominent facilities for anybody to edit those works. A
4148 public wiki that anybody can edit is an example of such a server.
4149 A "Massive Multiauthor Collaboration" (or "MMC") contained in the
4150 site means any set of copyrightable works thus published on the MMC site.
4153 "CC-BY-SA" means the Creative Commons Attribution-Share Alike 3.0
4154 license published by Creative Commons Corporation, a not-for-profit
4155 corporation with a principal place of business in San Francisco,
4156 California, as well as future copyleft versions of that license
4157 published by that same organization.
4159 "Incorporate" means to publish or republish a Document, in whole or
4160 in part, as part of another Document.
4162 An MMC is "eligible for relicensing" if it is licensed under this
4163 License, and if all works that were first published under this
4164 License somewhere other than this MMC, and subsequently
4165 incorporated in whole or in part into the MMC, (1) had no cover
4166 texts or invariant sections, and (2) were thus incorporated prior
4167 to November 1, 2008.
4169 The operator of an MMC Site may republish an MMC contained in the
4170 site under CC-BY-SA on the same site at any time before August 1,
4171 2009, provided the MMC is eligible for relicensing.
4174 ADDENDUM: How to use this License for your documents
4175 ====================================================
4177 To use this License in a document you have written, include a copy of
4178 the License in the document and put the following copyright and license
4179 notices just after the title page:
4181 Copyright (C) YEAR YOUR NAME.
4182 Permission is granted to copy, distribute and/or modify this document
4183 under the terms of the GNU Free Documentation License, Version 1.3
4184 or any later version published by the Free Software Foundation;
4185 with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
4186 Texts. A copy of the license is included in the section entitled ``GNU
4187 Free Documentation License''.
4189 If you have Invariant Sections, Front-Cover Texts and Back-Cover
4190 Texts, replace the "with...Texts." line with this:
4192 with the Invariant Sections being LIST THEIR TITLES, with
4193 the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
4196 If you have Invariant Sections without Cover Texts, or some other
4197 combination of the three, merge those two alternatives to suit the situation.
4200 If your document contains nontrivial examples of program code, we
4201 recommend releasing these examples in parallel under your choice of
4202 free software license, such as the GNU General Public License, to
4203 permit their use in free software.
4206 File: wget.info, Node: Concept Index, Prev: Copying this manual, Up: Top
4214 * #wget: Internet Relay Chat. (line 6)
4215 * .css extension: HTTP Options. (line 10)
4216 * .html extension: HTTP Options. (line 10)
4217 * .listing files, removing: FTP Options. (line 21)
4218 * .netrc: Startup File. (line 6)
4219 * .wgetrc: Startup File. (line 6)
4220 * accept directories: Directory-Based Limits.
4222 * accept suffixes: Types of Files. (line 15)
4223 * accept wildcards: Types of Files. (line 15)
4224 * append to log: Logging and Input File Options.
4226 * arguments: Invoking. (line 6)
4227 * authentication <1>: HTTP Options. (line 41)
4228 * authentication: Download Options. (line 448)
4229 * backing up converted files: Recursive Retrieval Options.
4231 * bandwidth, limit: Download Options. (line 238)
4232 * base for relative links in input file: Logging and Input File Options.
4234 * bind address: Download Options. (line 6)
4235 * bug reports: Reporting Bugs. (line 6)
4236 * bugs: Reporting Bugs. (line 6)
4237 * cache: HTTP Options. (line 69)
4238 * caching of DNS lookups: Download Options. (line 324)
4239 * case fold: Recursive Accept/Reject Options.
4241 * client IP address: Download Options. (line 6)
4242 * clobbering, file: Download Options. (line 50)
4243 * command line: Invoking. (line 6)
4244 * comments, HTML: Recursive Retrieval Options.
4246 * connect timeout: Download Options. (line 221)
4247 * Content-Disposition: HTTP Options. (line 298)
4248 * Content-Length, ignore: HTTP Options. (line 158)
4249 * continue retrieval: Download Options. (line 86)
4250 * contributors: Contributors. (line 6)
4251 * conversion of links: Recursive Retrieval Options.
4253 * cookies: HTTP Options. (line 78)
4254 * cookies, loading: HTTP Options. (line 88)
4255 * cookies, saving: HTTP Options. (line 136)
4256 * cookies, session: HTTP Options. (line 141)
4257 * cut directories: Directory Options. (line 32)
4258 * debug: Logging and Input File Options.
4260 * default page name: HTTP Options. (line 6)
4261 * delete after retrieval: Recursive Retrieval Options.
4263 * directories: Directory-Based Limits.
4265 * directories, exclude: Directory-Based Limits.
4267 * directories, include: Directory-Based Limits.
4269 * directory limits: Directory-Based Limits.
4271 * directory prefix: Directory Options. (line 60)
4272 * DNS cache: Download Options. (line 324)
4273 * DNS timeout: Download Options. (line 215)
4274 * dot style: Download Options. (line 147)
4275 * downloading multiple times: Download Options. (line 50)
4276 * EGD: HTTPS (SSL/TLS) Options.
4278 * entropy, specifying source of: HTTPS (SSL/TLS) Options.
4280 * examples: Examples. (line 6)
4281 * exclude directories: Directory-Based Limits.
4283 * execute wgetrc command: Basic Startup Options.
4285 * FDL, GNU Free Documentation License: GNU Free Documentation License.
4287 * features: Overview. (line 6)
4288 * file names, restrict: Download Options. (line 343)
4289 * filling proxy cache: Recursive Retrieval Options.
4291 * follow FTP links: Recursive Accept/Reject Options.
4293 * following ftp links: FTP Links. (line 6)
4294 * following links: Following Links. (line 6)
4295 * force html: Logging and Input File Options.
4297 * ftp authentication: FTP Options. (line 6)
4298 * ftp password: FTP Options. (line 6)
4299 * ftp time-stamping: FTP Time-Stamping Internals.
4301 * ftp user: FTP Options. (line 6)
4302 * globbing, toggle: FTP Options. (line 45)
4303 * hangup: Signals. (line 6)
4304 * header, add: HTTP Options. (line 169)
4305 * hosts, spanning: Spanning Hosts. (line 6)
4306 * HTML comments: Recursive Retrieval Options.
4308 * http password: HTTP Options. (line 41)
4309 * http referer: HTTP Options. (line 210)
4310 * http time-stamping: HTTP Time-Stamping Internals.
4312 * http user: HTTP Options. (line 41)
4313 * idn support: Download Options. (line 461)
4314 * ignore case: Recursive Accept/Reject Options.
4316 * ignore length: HTTP Options. (line 158)
4317 * include directories: Directory-Based Limits.
4319 * incomplete downloads: Download Options. (line 86)
4320 * incremental updating: Time-Stamping. (line 6)
4321 * index.html: HTTP Options. (line 6)
4322 * input-file: Logging and Input File Options.
4324 * Internet Relay Chat: Internet Relay Chat. (line 6)
4325 * invoking: Invoking. (line 6)
4326 * IP address, client: Download Options. (line 6)
4327 * IPv6: Download Options. (line 394)
4328 * IRC: Internet Relay Chat. (line 6)
4329 * iri support: Download Options. (line 461)
4330 * Keep-Alive, turning off: HTTP Options. (line 57)
4331 * latest version: Distribution. (line 6)
4332 * limit bandwidth: Download Options. (line 238)
4333 * link conversion: Recursive Retrieval Options.
4335 * links: Following Links. (line 6)
4336 * list: Mailing Lists. (line 6)
4337 * loading cookies: HTTP Options. (line 88)
4338 * local encoding: Download Options. (line 469)
4339 * location of wgetrc: Wgetrc Location. (line 6)
4340 * log file: Logging and Input File Options.
4342 * mailing list: Mailing Lists. (line 6)
4343 * mirroring: Very Advanced Usage. (line 6)
4344 * no parent: Directory-Based Limits.
4346 * no-clobber: Download Options. (line 50)
4347 * nohup: Invoking. (line 6)
4348 * number of retries: Download Options. (line 12)
4349 * operating systems: Portability. (line 6)
4350 * option syntax: Option Syntax. (line 6)
4351 * output file: Logging and Input File Options.
4353 * overview: Overview. (line 6)
4354 * page requisites: Recursive Retrieval Options.
4356 * passive ftp: FTP Options. (line 61)
4357 * password: Download Options. (line 448)
4358 * pause: Download Options. (line 258)
4359 * Persistent Connections, disabling: HTTP Options. (line 57)
4360 * portability: Portability. (line 6)
4361 * POST: HTTP Options. (line 243)
4362 * progress indicator: Download Options. (line 147)
4363 * proxies: Proxies. (line 6)
4364 * proxy <1>: HTTP Options. (line 69)
4365 * proxy: Download Options. (line 301)
4366 * proxy authentication: HTTP Options. (line 201)
4367 * proxy filling: Recursive Retrieval Options.
4369 * proxy password: HTTP Options. (line 201)
4370 * proxy user: HTTP Options. (line 201)
4371 * quiet: Logging and Input File Options.
4373 * quota: Download Options. (line 308)
4374 * random wait: Download Options. (line 283)
4375 * randomness, specifying source of: HTTPS (SSL/TLS) Options.
4377 * rate, limit: Download Options. (line 238)
4378 * read timeout: Download Options. (line 226)
4379 * recursion: Recursive Download. (line 6)
4380 * recursive download: Recursive Download. (line 6)
4381 * redirect: HTTP Options. (line 195)
4382 * redirecting output: Advanced Usage. (line 89)
4383 * referer, http: HTTP Options. (line 210)
4384 * reject directories: Directory-Based Limits.
4386 * reject suffixes: Types of Files. (line 34)
4387 * reject wildcards: Types of Files. (line 34)
4388 * relative links: Relative Links. (line 6)
4389 * remote encoding: Download Options. (line 481)
4390 * reporting bugs: Reporting Bugs. (line 6)
4391 * required images, downloading: Recursive Retrieval Options.
4393 * resume download: Download Options. (line 86)
4394 * retries: Download Options. (line 12)
4395 * retries, waiting between: Download Options. (line 272)
4396 * retrieving: Recursive Download. (line 6)
4397 * robot exclusion: Robot Exclusion. (line 6)
4398 * robots.txt: Robot Exclusion. (line 6)
4399 * sample wgetrc: Sample Wgetrc. (line 6)
4400 * saving cookies: HTTP Options. (line 136)
4401 * security: Security Considerations.
4403 * server maintenance: Robot Exclusion. (line 6)
4404 * server response, print: Download Options. (line 181)
4405 * server response, save: HTTP Options. (line 217)
4406 * session cookies: HTTP Options. (line 141)
4407 * signal handling: Signals. (line 6)
4408 * spanning hosts: Spanning Hosts. (line 6)
4409 * spider: Download Options. (line 186)
4410 * SSL: HTTPS (SSL/TLS) Options.
4412 * SSL certificate: HTTPS (SSL/TLS) Options.
4414 * SSL certificate authority: HTTPS (SSL/TLS) Options.
4416 * SSL certificate type, specify: HTTPS (SSL/TLS) Options.
4418 * SSL certificate, check: HTTPS (SSL/TLS) Options.
4420 * SSL protocol, choose: HTTPS (SSL/TLS) Options.
4422 * startup: Startup File. (line 6)
4423 * startup file: Startup File. (line 6)
4424 * suffixes, accept: Types of Files. (line 15)
4425 * suffixes, reject: Types of Files. (line 34)
4426 * symbolic links, retrieving: FTP Options. (line 73)
4427 * syntax of options: Option Syntax. (line 6)
4428 * syntax of wgetrc: Wgetrc Syntax. (line 6)
4429 * tag-based recursive pruning: Recursive Accept/Reject Options.
4431 * time-stamping: Time-Stamping. (line 6)
4432 * time-stamping usage: Time-Stamping Usage. (line 6)
4433 * timeout: Download Options. (line 197)
4434 * timeout, connect: Download Options. (line 221)
4435 * timeout, DNS: Download Options. (line 215)
4436 * timeout, read: Download Options. (line 226)
4437 * timestamping: Time-Stamping. (line 6)
4438 * tries: Download Options. (line 12)
4439 * types of files: Types of Files. (line 6)
4440 * updating the archives: Time-Stamping. (line 6)
4441 * URL: URL Format. (line 6)
4442 * URL syntax: URL Format. (line 6)
4443 * usage, time-stamping: Time-Stamping Usage. (line 6)
4444 * user: Download Options. (line 448)
4445 * user-agent: HTTP Options. (line 221)
4446 * various: Various. (line 6)
4447 * verbose: Logging and Input File Options.
4449 * wait: Download Options. (line 258)
4450 * wait, random: Download Options. (line 283)
4451 * waiting between retries: Download Options. (line 272)
4452 * web site: Web Site. (line 6)
4453 * Wget as spider: Download Options. (line 186)
4454 * wgetrc: Startup File. (line 6)
4455 * wgetrc commands: Wgetrc Commands. (line 6)
4456 * wgetrc location: Wgetrc Location. (line 6)
4457 * wgetrc syntax: Wgetrc Syntax. (line 6)
4458 * wildcards, accept: Types of Files. (line 15)
4459 * wildcards, reject: Types of Files. (line 34)
4460 * Windows file names: Download Options. (line 343)