.\"- .\" Copyright (c) 1998-2004 Dag-Erling Coïdan Smørgrav .\" Copyright (c) 2010 Joerg Sonnenberger .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD: fetch.3,v 1.64 2007/12/18 11:03:26 des Exp $ .\" $NetBSD: fetch.3,v 1.1.1.8 2010/01/30 21:26:10 joerg Exp $ .\" .Dd January 22, 2010 .Dt FETCH 3 .Os .Sh NAME .Nm fetchMakeURL , .Nm fetchParseURL , .Nm fetchCopyURL , .Nm fetchFreeURL , .Nm fetchXGetURL , .Nm fetchGetURL , .Nm fetchPutURL , .Nm fetchStatURL , .Nm fetchListURL , .Nm fetchXGet , .Nm fetchGet , .Nm fetchPut , .Nm fetchStat , .Nm fetchList , .Nm fetchXGetFile , .Nm fetchGetFile , .Nm fetchPutFile , .Nm fetchStatFile , .Nm fetchListFile , .Nm fetchXGetHTTP , .Nm fetchGetHTTP , .Nm fetchPutHTTP , .Nm fetchStatHTTP , .Nm fetchListHTTP , .Nm fetchXGetFTP , .Nm fetchGetFTP , .Nm fetchPutFTP , .Nm fetchStatFTP , .Nm fetchListFTP .Nm fetchInitURLList , .Nm fetchFreeURLList , .Nm fetchUnquotePath , .Nm fetchUnquoteFilename , .Nm fetchStringifyURL , .Nm fetchConnectionCacheInit , .Nm fetchConnectionCacheClose , .Nm fetch .Nd file transfer functions .Sh LIBRARY .Lb libfetch .Sh SYNOPSIS .In stdio.h .In fetch.h .Ft struct url * .Fn fetchMakeURL "const char *scheme" "const char *host" "int port" "const char *doc" "const char *user" "const char *pwd" .Ft struct url * .Fn fetchParseURL "const char *URL" .Ft struct url * .Fn fetchCopyURL "const struct url *u" .Ft void .Fn fetchFreeURL "struct url *u" .Ft fetchIO * .Fn fetchXGetURL "const char *URL" "struct url_stat *us" "const char *flags" .Ft fetchIO * .Fn fetchGetURL "const char *URL" "const char *flags" .Ft fetchIO * .Fn fetchPutURL "const char *URL" "const char *flags" .Ft int .Fn fetchStatURL "const char *URL" "struct url_stat *us" "const char *flags" .Ft int .Fn fetchListURL "struct url_list *list" "const char *URL" "const char *flags" .Ft fetchIO * .Fn fetchXGet "struct url *u" "struct url_stat *us" "const char *flags" .Ft fetchIO * .Fn fetchGet "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchPut "struct url *u" "const char *flags" .Ft int .Fn fetchStat "struct url *u" "struct url_stat *us" "const char *flags" .Ft int .Fn fetchList "struct url_list *list" "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchXGetFile "struct url *u" "struct url_stat *us" "const char *flags" .Ft fetchIO * .Fn fetchGetFile "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchPutFile "struct url *u" "const char *flags" .Ft int .Fn fetchStatFile "struct url *u" "struct url_stat *us" "const char *flags" .Ft int .Fn fetchListFile "struct url_list *list" "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchXGetHTTP "struct url *u" "struct url_stat *us" "const char *flags" .Ft fetchIO * .Fn fetchGetHTTP "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchPutHTTP "struct url *u" "const char *flags" .Ft int .Fn fetchStatHTTP "struct url *u" "struct url_stat *us" "const char *flags" .Ft int .Fn fetchListHTTP "struct url_list *list" "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchXGetFTP "struct url *u" "struct url_stat *us" "const char *flags" .Ft fetchIO * .Fn fetchGetFTP "struct url *u" "const char *flags" .Ft fetchIO * .Fn fetchPutFTP "struct url *u" "const char *flags" .Ft int .Fn fetchStatFTP "struct url *u" "struct url_stat *us" "const char *flags" .Ft int .Fn fetchListFTP "struct url_list *list" "struct url *u" "const char *flags" .Ft void .Fn fetchInitURLList "struct url_list *ul" .Ft int .Fn fetchAppendURLList "struct url_list *dst" "const struct url_list *src" .Ft void .Fn fetchFreeURLList "struct url_list *ul" .Ft char * .Fn fetchUnquotePath "struct url *u" .Ft char * .Fn fetchUnquoteFilename "struct url *u" .Ft char * .Fn fetchStringifyURL "const struct url *u" .Ft void .Fn fetchConnectionCacheInit "int global" "int per_host" .Ft void .Fn fetchConnectionCacheClose "void" .Sh DESCRIPTION These functions implement a high-level library for retrieving and uploading files using Uniform Resource Locators (URLs). .Pp .Fn fetchParseURL takes a URL in the form of a null-terminated string and splits it into its components function according to the Common Internet Scheme Syntax detailed in RFC 1738. A regular expression which produces this syntax is: .Bd -literal -offset indent \*[Lt]scheme\*[Gt]:(//(\*[Lt]user\*[Gt](:\*[Lt]pwd\*[Gt])?@)?\*[Lt]host\*[Gt](:\*[Lt]port\*[Gt])?)?/(\*[Lt]document\*[Gt])? .Ed .Pp If the URL does not seem to begin with a scheme name, it is assumed to be a local path. Only absolute path names are accepted. .Pp Note that some components of the URL are not necessarily relevant to all URL schemes. For instance, the file scheme only needs the .Aq scheme and .Aq document components. .Fn fetchParseURL quotes any unsafe character in the URL automatically. This is not done by .Fn fetchMakeURL . .Fn fetchCopyURL copies an existing .Vt url structure. .Pp .Fn fetchMakeURL , .Fn fetchParseURL , and .Fn fetchCopyURL return a pointer to a .Vt url structure, which is defined as follows in .In fetch.h : .Bd -literal #define URL_SCHEMELEN 16 #define URL_USERLEN 256 #define URL_PWDLEN 256 #define URL_HOSTLEN 255 struct url { char scheme[URL_SCHEMELEN + 1]; char user[URL_USERLEN + 1]; char pwd[URL_PWDLEN + 1]; char host[URL_HOSTLEN + 1]; int port; char *doc; off_t offset; size_t length; time_t last_modified; }; .Ed .Pp The pointer returned by .Fn fetchMakeURL , .Fn fetchCopyURL , and .Fn fetchParseURL should be freed using .Fn fetchFreeURL . The size of .Vt struct URL is not part of the ABI. .Pp .Fn fetchXGetURL , .Fn fetchGetURL , and .Fn fetchPutURL constitute the recommended interface to the .Nm fetch library. They examine the URL passed to them to determine the transfer method, and call the appropriate lower-level functions to perform the actual transfer. .Fn fetchXGetURL also returns the remote document's metadata in the .Vt url_stat structure pointed to by the .Fa us argument. .Pp The .Fa flags argument is a string of characters which specify transfer options. The meaning of the individual flags is scheme-dependent, and is detailed in the appropriate section below. .Pp .Fn fetchStatURL attempts to obtain the requested document's metadata and fill in the structure pointed to by its second argument. The .Vt url_stat structure is defined as follows in .In fetch.h : .Bd -literal struct url_stat { off_t size; time_t atime; time_t mtime; }; .Ed .Pp If the size could not be obtained from the server, the .Fa size field is set to \-1. If the modification time could not be obtained from the server, the .Fa mtime field is set to the epoch. If the access time could not be obtained from the server, the .Fa atime field is set to the modification time. .Pp .Fn fetchListURL attempts to list the contents of the directory pointed to by the URL provided. The pattern can be a simple glob-like expression as hint. Callers should not depend on the server to filter names. If successful, it appends the list of entries to the .Vt url_list structure. The .Vt url_list structure is defined as follows in .In fetch.h : .Bd -literal struct url_list { size_t length; size_t alloc_size; struct url *urls; }; .Ed .Pp The list should be initialized by calling .Fn fetchInitURLList and the entries be freed by calling .Fn fetchFreeURLList . The function .Fn fetchAppendURLList can be used to append one URL lists to another. If the .Ql c (cache result) flag is specified, the library is allowed to internally cache the result. .Pp .Fn fetchStringifyURL returns the URL as string. .Fn fetchUnquotePath returns the path name part of the URL with any quoting undone. Query arguments and fragment identifiers are not included. .Fn fetchUnquoteFilename returns the last component of the path name as returned by .Fn fetchUnquotePath . .Fn fetchStringifyURL , .Fn fetchUnquotePath , and .Fn fetchUnquoteFilename return a string that should be deallocated with .Fn free after use. .Pp .Fn fetchConnectionCacheInit enables the connection cache. The first argument specifies the global limit on cached connections. The second argument specifies the host limit. Entries are considered to specify the same host, if the host name from the URL is identical, indepent of the address or address family. .Fn fetchConnectionCacheClose flushed the connection cache and closes all cached connections. .Pp .Fn fetchXGet , .Fn fetchGet , .Fn fetchPut , and .Fn fetchStat are similar to .Fn fetchXGetURL , .Fn fetchGetURL , .Fn fetchPutURL , and .Fn fetchStatURL , except that they expect a pre-parsed URL in the form of a pointer to a .Vt struct url rather than a string. .Pp All of the .Fn fetchXGetXXX , .Fn fetchGetXXX , and .Fn fetchPutXXX functions return a pointer to a stream which can be used to read or write data from or to the requested document, respectively. Note that although the implementation details of the individual access methods vary, it can generally be assumed that a stream returned by one of the .Fn fetchXGetXXX or .Fn fetchGetXXX functions is read-only, and that a stream returned by one of the .Fn fetchPutXXX functions is write-only. .Sh PROTOCOL INDEPENDENT FLAGS If the .Ql i (if-modified-since) flag is specified, the library will try to fetch the content only if it is newer than .Va last_modified . For HTTP an .Li If-Modified-Since HTTP header is sent. For FTP a .Li MTDM command is sent first and compared locally. For FILE the source file is compared. .Sh FILE SCHEME .Fn fetchXGetFile , .Fn fetchGetFile , and .Fn fetchPutFile provide access to documents which are files in a locally mounted file system. Only the .Aq document component of the URL is used. .Pp .Fn fetchXGetFile and .Fn fetchGetFile do not accept any flags. .Pp .Fn fetchPutFile accepts the .Ql a (append to file) flag. If that flag is specified, the data written to the stream returned by .Fn fetchPutFile will be appended to the previous contents of the file, instead of replacing them. .Sh FTP SCHEME .Fn fetchXGetFTP , .Fn fetchGetFTP , and .Fn fetchPutFTP implement the FTP protocol as described in RFC 959. .Pp By default .Nm libfetch will attempt to use passive mode first and only fallback to active mode if the server reports a syntax error. If the .Ql a (active) flag is specified, a passive connection is not tried and active mode is used directly. .Pp If the .Ql l (low) flag is specified, data sockets will be allocated in the low (or default) port range instead of the high port range (see .Xr ip 4 ) . .Pp If the .Ql d (direct) flag is specified, .Fn fetchXGetFTP , .Fn fetchGetFTP , and .Fn fetchPutFTP will use a direct connection even if a proxy server is defined. .Pp If no user name or password is given, the .Nm fetch library will attempt an anonymous login, with user name "anonymous" and password "anonymous@\*[Lt]hostname\*[Gt]". .Sh HTTP SCHEME The .Fn fetchXGetHTTP , .Fn fetchGetHTTP , and .Fn fetchPutHTTP functions implement the HTTP/1.1 protocol. With a little luck, there is even a chance that they comply with RFC 2616 and RFC 2617. .Pp If the .Ql d (direct) flag is specified, .Fn fetchXGetHTTP , .Fn fetchGetHTTP , and .Fn fetchPutHTTP will use a direct connection even if a proxy server is defined. .Pp Since there seems to be no good way of implementing the HTTP PUT method in a manner consistent with the rest of the .Nm fetch library, .Fn fetchPutHTTP is currently unimplemented. .Sh AUTHENTICATION Apart from setting the appropriate environment variables and specifying the user name and password in the URL or the .Vt struct url , the calling program has the option of defining an authentication function with the following prototype: .Pp .Ft int .Fn myAuthMethod "struct url *u" .Pp The callback function should fill in the .Fa user and .Fa pwd fields in the provided .Vt struct url and return 0 on success, or any other value to indicate failure. .Pp To register the authentication callback, simply set .Va fetchAuthMethod to point at it. The callback will be used whenever a site requires authentication and the appropriate environment variables are not set. .Pp This interface is experimental and may be subject to change. .Sh RETURN VALUES .Fn fetchParseURL returns a pointer to a .Vt struct url containing the individual components of the URL. If it is unable to allocate memory, or the URL is syntactically incorrect, .Fn fetchParseURL returns a .Dv NULL pointer. .Pp The .Fn fetchStat functions return 0 on success and \-1 on failure. .Pp All other functions return a stream pointer which may be used to access the requested document, or .Dv NULL if an error occurred. .Pp The following error codes are defined in .In fetch.h : .Bl -tag -width 18n .It Bq Er FETCH_ABORT Operation aborted .It Bq Er FETCH_AUTH Authentication failed .It Bq Er FETCH_DOWN Service unavailable .It Bq Er FETCH_EXISTS File exists .It Bq Er FETCH_FULL File system full .It Bq Er FETCH_INFO Informational response .It Bq Er FETCH_MEMORY Insufficient memory .It Bq Er FETCH_MOVED File has moved .It Bq Er FETCH_NETWORK Network error .It Bq Er FETCH_OK No error .It Bq Er FETCH_PROTO Protocol error .It Bq Er FETCH_RESOLV Resolver error .It Bq Er FETCH_SERVER Server error .It Bq Er FETCH_TEMP Temporary error .It Bq Er FETCH_TIMEOUT Operation timed out .It Bq Er FETCH_UNAVAIL File is not available .It Bq Er FETCH_UNKNOWN Unknown error .It Bq Er FETCH_URL Invalid URL .El .Pp The accompanying error message includes a protocol-specific error code and message, e.g.\& "File is not available (404 Not Found)" .Sh ENVIRONMENT .Bl -tag -width ".Ev FETCH_BIND_ADDRESS" .It Ev FETCH_BIND_ADDRESS Specifies a host name or IP address to which sockets used for outgoing connections will be bound. .It Ev FTP_LOGIN Default FTP login if none was provided in the URL. .It Ev FTP_PASSIVE_MODE If set to anything but .Ql no , forces the FTP code to use passive mode. .It Ev FTP_PASSWORD Default FTP password if the remote server requests one and none was provided in the URL. .It Ev FTP_PROXY URL of the proxy to use for FTP requests. The document part is ignored. FTP and HTTP proxies are supported; if no scheme is specified, FTP is assumed. If the proxy is an FTP proxy, .Nm libfetch will send .Ql user@host as user name to the proxy, where .Ql user is the real user name, and .Ql host is the name of the FTP server. .Pp If this variable is set to an empty string, no proxy will be used for FTP requests, even if the .Ev HTTP_PROXY variable is set. .It Ev ftp_proxy Same as .Ev FTP_PROXY , for compatibility. .It Ev HTTP_AUTH Specifies HTTP authorization parameters as a colon-separated list of items. The first and second item are the authorization scheme and realm respectively; further items are scheme-dependent. Currently, only basic authorization is supported. .Pp Basic authorization requires two parameters: the user name and password, in that order. .Pp This variable is only used if the server requires authorization and no user name or password was specified in the URL. .It Ev HTTP_PROXY URL of the proxy to use for HTTP requests. The document part is ignored. Only HTTP proxies are supported for HTTP requests. If no port number is specified, the default is 3128. .Pp Note that this proxy will also be used for FTP documents, unless the .Ev FTP_PROXY variable is set. .It Ev http_proxy Same as .Ev HTTP_PROXY , for compatibility. .It Ev HTTP_PROXY_AUTH Specifies authorization parameters for the HTTP proxy in the same format as the .Ev HTTP_AUTH variable. .Pp This variable is used if and only if connected to an HTTP proxy, and is ignored if a user and/or a password were specified in the proxy URL. .It Ev HTTP_REFERER Specifies the referrer URL to use for HTTP requests. If set to .Dq auto , the document URL will be used as referrer URL. .It Ev HTTP_USER_AGENT Specifies the User-Agent string to use for HTTP requests. This can be useful when working with HTTP origin or proxy servers that differentiate between user agents. .It Ev NETRC Specifies a file to use instead of .Pa ~/.netrc to look up login names and passwords for FTP sites. See .Xr ftp 1 for a description of the file format. This feature is experimental. .It Ev NO_PROXY Either a single asterisk, which disables the use of proxies altogether, or a comma- or whitespace-separated list of hosts for which proxies should not be used. .It Ev no_proxy Same as .Ev NO_PROXY , for compatibility. .El .Sh EXAMPLES To access a proxy server on .Pa proxy.example.com port 8080, set the .Ev HTTP_PROXY environment variable in a manner similar to this: .Pp .Dl HTTP_PROXY=http://proxy.example.com:8080 .Pp If the proxy server requires authentication, there are two options available for passing the authentication data. The first method is by using the proxy URL: .Pp .Dl HTTP_PROXY=http://\*[Lt]user\*[Gt]:\*[Lt]pwd\*[Gt]@proxy.example.com:8080 .Pp The second method is by using the .Ev HTTP_PROXY_AUTH environment variable: .Bd -literal -offset indent HTTP_PROXY=http://proxy.example.com:8080 HTTP_PROXY_AUTH=basic:*:\*[Lt]user\*[Gt]:\*[Lt]pwd\*[Gt] .Ed .Pp To disable the use of a proxy for an HTTP server running on the local host, define .Ev NO_PROXY as follows: .Bd -literal -offset indent NO_PROXY=localhost,127.0.0.1 .Ed .Sh SEE ALSO .\" .Xr fetch 1 , .\" .Xr ftpio 3 , .Xr ftp 1 , .Xr ip 4 .Rs .%A J. Postel .%A J. K. Reynolds .%D October 1985 .%B File Transfer Protocol .%O RFC 959 .Re .Rs .%A P. Deutsch .%A A. Emtage .%A A. Marine .%D May 1994 .%T How to Use Anonymous FTP .%O RFC 1635 .Re .Rs .%A T. Berners-Lee .%A L. Masinter .%A M. McCahill .%D December 1994 .%T Uniform Resource Locators (URL) .%O RFC 1738 .Re .Rs .%A R. Fielding .%A J. Gettys .%A J. Mogul .%A H. Frystyk .%A L. Masinter .%A P. Leach .%A T. Berners-Lee .%D January 1999 .%B Hypertext Transfer Protocol -- HTTP/1.1 .%O RFC 2616 .Re .Rs .%A J. Franks .%A P. Hallam-Baker .%A J. Hostetler .%A S. Lawrence .%A P. Leach .%A A. Luotonen .%A L. Stewart .%D June 1999 .%B HTTP Authentication: Basic and Digest Access Authentication .%O RFC 2617 .Re .Sh HISTORY The .Nm fetch library first appeared in .Fx 3.0 . .Sh AUTHORS .An -nosplit The .Nm fetch library was mostly written by .An Dag-Erling Sm\(/orgrav Aq des@FreeBSD.org with numerous suggestions from .An Jordan K. Hubbard Aq jkh@FreeBSD.org , .An Eugene Skepner Aq eu@qub.com and other .Fx developers. It replaces the older .Nm ftpio library written by .An Poul-Henning Kamp Aq phk@FreeBSD.org and .An Jordan K. Hubbard Aq jkh@FreeBSD.org . .Pp This manual page was written by .An Dag-Erling Sm\(/orgrav Aq des@FreeBSD.org . .Sh BUGS Some parts of the library are not yet implemented. The most notable examples of this are .Fn fetchPutHTTP and FTP proxy support. .Pp There is no way to select a proxy at run-time other than setting the .Ev HTTP_PROXY or .Ev FTP_PROXY environment variables as appropriate. .Pp .Nm libfetch does not understand or obey 305 (Use Proxy) replies. .Pp Error numbers are unique only within a certain context; the error codes used for FTP and HTTP overlap, as do those used for resolver and system errors. For instance, error code 202 means "Command not implemented, superfluous at this site" in an FTP context and "Accepted" in an HTTP context. .Pp .Fn fetchStatFTP does not check that the result of an MDTM command is a valid date. .Pp The man page is incomplete, poorly written and produces badly formatted text. .Pp The error reporting mechanism is unsatisfactory. .Pp Some parts of the code are not fully reentrant.