.\" $Header: /a/swan/home/swan/staff/csg/lmjm/src/perl/mirror/RCS/mirror.man,v 1.15 1992/03/20 21:01:06 lmjm Exp lmjm $
.\" $Log: mirror.man,v $
.\" Revision 1.15  1992/03/20  21:01:06  lmjm
.\" Added a whole bunch of corrections from someone on the net
.\" whose name I've lost!  Very embarasing as it was a lot of work.
.\"
.\" Revision 1.14  1992/01/14  11:02:53  lmjm
.\" Added local_ls_lR_file.
.\"
.\" Revision 1.13  1991/11/27  22:17:49  lmjm
.\" Added do_deletes deletes_excl split_max split_chunk split_patt
.\"
.\" Revision 1.12  1991/10/23  22:42:24  lmjm
.\" Added the ls_lR_file keyword
.\"
.\" Revision 1.11  1991/10/07  18:30:38  lmjm
.\" Zapped the out of date ls_lR_file option.
.\"
.\" Revision 1.10  1991/09/20  21:13:07  lmjm
.\" Gave a default to -U
.\"
.\" Revision 1.9  1991/09/20  20:59:18  lmjm
.\" Added -U option.
.\"
.\" Revision 1.8  1991/09/20  20:22:19  lmjm
.\" Some more hints.
.\" Update the examples.
.\"
.\" Revision 1.7  1991/09/17  22:53:17  lmjm
.\" Update get_size_change entry
.\"
.\" Revision 1.6  1991/09/12  22:40:26  lmjm
.\" Added description of command line based facilities.
.\"
.\" Revision 1.5  1991/08/29  16:23:51  lmjm
.\" Oodles of fixes.
.\"
.\" Revision 1.4  1991/08/19  15:47:26  lmjm
.\" Added -v option - just to print the version.
.\" Corrected the examples.
.\"
.\" Revision 1.3  1991/08/16  22:17:47  lmjm
.\" Added the -T option to re-timestamp the local archive.
.\" Added the update_local option.
.\" Allow key+value
.\" Added the -T option to just timestamp.
.\" Corrected the English and some typos, fixes courtesty k.twidle
.\"
.\" Revision 1.2  1991/08/14  21:19:57  lmjm
.\" Made the example of the key=value syntax match the description.
.\" Cleaned up the english.
.\" Improved the example.
.\"
.\" Revision 1.1  1991/08/13  22:31:13  lmjm
.\" Initial revision
.\"
.\"
.TH MIRROR 1L "13 August 1991"
.SH NAME
mirror \- mirror packages on remote sites
.SH SYNOPSIS
.B mirror
.B \-[dvTn] [\-U\fIfilename\fP] [\-p\fIpackage\fP] [config-files]
.br
.B mirror
.B \-[mtfr[G|P]] [\-k\fIkey=val\fP] [\-C\fIconfig-file\fP]
.br
.B \ \ \ [\-s\fIsite\fP] [\-u\fIuser\fP] {\fIlocal_dir\fP \fIremote_dir\fP [\fIget_patt\fP]}+
.SH DESCRIPTION
.B Mirror
is a package written in Perl that uses the ftp protocol to copy
directory hierarchies between machines.  It tries to avoid copying
files unnecessarily.
.LP
It is commonly used in one of two ways.  In the first usage given in
the synopsis above,
.B mirror
is used to make a local ftp archive mirror the contents of a remote
ftp archive.  A minimal number of arguments are required and
.B mirror
is controlled by the settings read from the configuration files (or
standard input).
.LP
In its second use
.B mirror
is controlled by command line arguments.  It provides an easy way to
get, or put, a directory hierarchy.
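.LP
As a sketch based on the second form in the synopsis (the site and
remote directory are borrowed from the EXAMPLES section, and /tmp/gnu
is a made-up local directory), fetching a remote hierarchy might look
like:
.LP
.RS
.ft B
.nf
mirror \-G \-sprep.ai.mit.edu /tmp/gnu /pub/gnu
.fi
.ft R
.RE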
.LP
It was written to mirror remote Un*x archives but has grown (like Topsy).
.SH OPTIONS
.TP
.B \-d
Enable debugging.  If this argument is given more than once the
debugging level will increase.  Currently the maximum useful level is
three.
.TP
.B \-n
Do nothing except compare local and remote directories.  Sets debug
level to two, so you are shown a trace of what would be done.
.TP
.B \-T
Force the time stamps of any local files to be reset to match those of
the corresponding remote files.  Normally only used when initialising a
mirror area from existing file contents.
.TP
.B \-p\fIpackage\fP
Only mirror the given package.  This option may be given multiple
times in which case all the given packages will be mirrored.  Without
this option being specified all packages will be mirrored.
.TP
.B \-v
Print the version details of mirror and exit.
.TP
.B \-U[filename]
Record all uploads into filename.  Remember that mirror changes into
local_dir to do its work, so filename should be a full pathname.  If no
filename is given it defaults to \`pwd\`/upload_log.day.month.year.
.TP
.B \-G
Get files from the remote machine.  The local and remote directories
have to be given on the command line.
.TP
.B \-P
Put files onto the remote machine.  The local and remote directories
have to be given on the command line.
.TP
.B \-Cfile
Specify config files.  Needed to give config files with
.B \-P
and
.B \-G
options.
.TP
.B \-kkey=value
Override any default key/value.
.TP
.B \-m
Equivalent to
.B \-kmode_copy=true
.TP
.B \-t
Equivalent to
.B \-ktext_mode=true
.TP
.B \-r
Equivalent to
.B \-krecursive=false
.TP
.B \-sSite
Equivalent to
.B \-ksite=Site
.TP
.B \-uUser
Equivalent to
.BR \-kremote_user=User .
You are then prompted for a password, with echo turned off.  The
password is used to set
.BR remote_password .
.SH CONFIGURATION FILE
The configuration file is parsed as a series of statements.
Blank lines and lines beginning with a hash are ignored.  Each
statement is of the form:
.LP
.I keyword
'BI = value
.br
or
.br
.I keyword
'BI + value
.LP
You can add whitespace before the keyword and the equals/plus.
Everything immediately following the equals/plus is the value,
including any leading or trailing whitespace.  The equals version sets
the keyword to this value. The plus version concatenates the value
onto the end of the default.
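.LP
As an illustration, using values taken from the EXAMPLES section below
(and assuming a package line resets keywords to the values set under
.BR defaults ,
as described later in this section):
.LP
.RS
.ft B
.nf
package=defaults
	local_dir=/vol/public/

package=gnu
	local_dir+gnu
.fi
.ft R
.RE
.LP
Here the plus form should leave
.B local_dir
set to /vol/public/gnu for the gnu package.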
.LP
A statement can be continued over multiple lines by ending every line
except the last with an ampersand (&).  The line following the
ampersand is appended to the current line with all of its leading
whitespace removed.
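.LP
For instance, a long value might be split as in the following sketch
(the package name and comment text are made up for illustration):
.LP
.RS
.ft B
.nf
package=example
	comment=A long description of the package that is &
		continued on a second line
.fi
.ft R
.RE
.LP
Going by the rule above, the second line is joined to the first with its
leading tabs removed, so the comment reads as a single line.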
.LP
Here is a list of the keywords and their values; any defaults are
given inside square brackets.  Those options flagged with a star are
not yet implemented.
.de kV
.TP 15m
.I \\$1
\\$3
.if !'\\$2'' [\\$2]
..
.kV package '' "should be a unique name for the package to be mirrored
.kV comment '' "used in reports
.kV skip '' "Setting this entry causes this package to be skipped.  (It's easier than commenting the entry out.)
.kV site '' "sitename or ip address of the remote site
.kV remote_dir '' "remote directory to mirror (or copy into, for \fB\-P\fP operation)
.kV local_dir '' "local directory to copy into (or mirror, for \fB\-P\fP operation)
.kV remote_user anonymous "username to use at remote site
.kV remote_password user@localhostname "password to use at remote site
.kV get_patt . "regexp of remote pathnames to retrieve
.kV exclude_patt '' "regexp of remote pathnames to ignore
.kV update_local false "Set get_patt to be local_dir/*.  This is useful if you only want to mirror selected subdirectories of a remote archive.
.kV local_ignore '' "regexp of local pathnames to totally ignore.  Useful to skip restricted local directories.
.kV do_deletes false "delete local files that are not in the remote archive.  Files are not actually deleted yet; a note is printed about what would be deleted.
.kV delete_excl '' "regexp of local pathnames to never delete
.kV max_days 0 "if >0 ignore files older than this many days
.kV split_max 0 "if >0 and size of file greater than this the file is split up to be stored locally (filename must also match split_patt)
.kV split_chunk 102400 "Size of chunks to split up files into
.kV split_patt '' "regexp of remote pathnames to split up before storing locally
.kV local_ls_lR_file '' "local file containing ls-lR - else use remote ls_lR_file.  This is useful when first mirroring a large package.
.kV ls_lR_file '' "remote file containing ls-lR - else run remote ls
.kV name_mappings '' "remote to local pathname mappings (a perl \fIs\fP command, e.g. s:old:new:); currently only one is allowed
.kV get_newer true "get the remote file if its date is newer than local
.kV get_size_change true "get the file if its size is different from local.  If a file is compressed when fetched then the size is automatically ignored.
.kV compress_patt '' "regexp of files to compress before storing locally.  (See also get_size_change.)
.kV compress_excl \e.[zZ] "regexp of files not to compress
.kV force_times yes "Force local times to match remote times
.kV retry_call yes "If initial connect fails retry ONCE after ONE minute
.kV update_log '' "Filename, relative to local_dir, where an update report is to be kept
.kV mail_to '' "Mail a report to this comma separated list of people
.kV user '' "User name or uid to give to local pathnames
.kV group '' "Group name or gid to give to local pathnames
.kV file_mode 0444 "Mode to give files created locally
.kV dir_mode 0755 "mode to give directories created locally
.kV timeout 20 "timeout ftp requests after this many seconds
.kV ftp_port 21 "port number of remote ftp daemon
.kV proxy 0 "set to 1 to use proxy ftp service
.kV proxy_ftp_port 4514 "port number of proxy-service ftp daemon
.kV proxy_gateway internet-gateway "name of proxy-service, may also be supplied by the environment variable INTERNET_HOST
.kV recursive true "do sub directories as well
.kV flags_recursive \-lRat "flags to send to ls to do a recursive listing
.kV flags_nonrecursive \-lat "flags to send to ls to do a non-recursive listing
.kV mode_copy false "flag indicating if we need to copy the mode bits
.kV interactive false "noninteractive copy default
.kV text_mode false "transfer in binary mode by default
.kV force false "transfer selectively by default
.kV getfile true "perform get, not put by default
.kV verbose false "Verbose messages
.kV disconnect false "disconnect from remote site at end of package
.kV *remote_fs unix "Remote file store type.  (Only copes with unix at the moment)
.kV mail_prog mail "Program called to send to the mail_to list.
.kV delete_source false "Delete the source files and dirs once transferred.
.LP
Each group of keywords defines how to mirror a particular package and
should begin with a unique
.B package
line.  The package name is used in report generation and by the
.B \-p
argument, so pick something mnemonic.  The minimum needed for each
package is the
.BR site ,
.B remote_dir
and
.BR local_dir .
On finding a package line all the default values are reset.
.LP
If the package name is
.B defaults
then no site is contacted but the default values given for any
keywords are changed.  Personally I begin my config files with:
.LP
.RS
.ft B
.nf
package=defaults
	remote_password=ukuug-soft@doc.ic.ac.uk
	get_newer=yes
	get_size_change=yes
.fi
.ft R
.RE
.LP
If the package is not
.B defaults
then
.B mirror
will perform the following steps.  Unless an internal failure is
detected any error will cause the current package to be skipped and
the next one tried.
.LP
If 
.B Mirror
is not already connected to the site it
will disconnect from any site it is already connected to then
attempt to connect to the remote site's
.B ftp
daemon.  It will then login using the given remote username and password.  Once
connected
.B mirror
turns on binary mode transfers.  Next it changes to the given local
directory, creating it if necessary, and scans it to get the details of
the local files that already exist.  Once this is completed
the remote directory is similarly scanned.
.B Mirror
does this by changing to the remote directory and running the ftp LIST
command, passing the flags given by
.B flags_recursive
(by default \-lRat).  If
.B ls_lR_file
is set, the listing is read from that remote file instead.  Each
remote pathname will have any specified mappings performed on it to
create a local pathname.  Then any checks specified
by the
.BR exclude_patt ,
.BR max_days ,
.BR get_newer
and
.BR get_size_change
keywords are applied on names of files or symlinks.  Only 
.BR exclude_patt
checking is applied to directories.
.LP
The above creates a list of all required remote files and the local path names
to store them in.
.LP
Once the directory listing is completed all required files are
fetched from the remote site into their local path names.  This is
done by pulling the file into a temporary file in the target directory.
If required the temporary file is compressed.
The temporary file is renamed when the transfer is successful.
.SH EXAMPLES
.LP
Here is the mirror.defaults file from the archive on
.BR src.doc.ic.ac.uk .
.LP
.RS
.ft B
.nf
# These are the default mirror settings used by my site:
# src.doc.ic.ac.uk (146.169.3.7)
# This is home of the UKUUG Software Distribution Service
#
# Lee McLoughlin <lmjm@doc.ic.ac.uk>

# Set my defaults
package=defaults
	# Keep all local_dirs relative to here
	local_dir=/vol/public/
	remote_password=ukuug-soft@doc.ic.ac.uk
	mail_to=lmjm
	dir_mode=0755
	file_mode=0444
	user=0
	group=0
	get_newer=yes
	get_size_change=yes
	# Don't overwrite my mirror log with the remote one.
	# Don't pull back any of their mirror temporary files.
	exclude_patt=^\e.mirror$|^MIRROR.LOG$|^\e.in\e..*\e.$|^#.*#|^lost+found/
	# Don't compress arc, zip, boo, readme files and index.txt files
	compress_excl+|\e.arc$|\e.zip$|\e.boo$|[Rr][Ee][Aa][Dd][Mm][Ee]|index.txt
	# Keep a log file in each updated directory
	update_log=.mirror
.fi
.ft R
.RE
And here is part of the mirror.config:
.LP
.RS
.ft B
.nf
package=gnu
	comment=Powerful and free Un*x utilities
	site=prep.ai.mit.edu
	remote_dir=/pub/gnu
	local_dir+gnu
	exclude_patt+|^ListArchives/|^lost+found/|^scheme-7.0/|^\e.history
	# I tend to only keep the latest couple of versions of things
	# this stops mirror from re-pulling the older versions I've removed
	max_days=30

package=elisp-archive
	site=tut.cis.ohio-state.edu
	remote_dir=/pub/gnu/emacs/elisp-archive
	local_dir+gnu/EmacsBits/elisp-archive

package=X
	comment=The X Area at export
	site=export.lcs.mit.edu
	remote_dir=/contrib
	local_dir+X/contrib
	# go-1.0.b.tar.Z is immense so I store it split locally.
	exclude_patt+|^unicom|^go-1.0.b.tar.Z
	# I tend to only keep the latest couple of versions of things
	# this stops mirror from re-pulling the older versions I've removed
	max_days=30

package=cnews
	comment=The C News system
	site=ftp.cs.toronto.edu
	remote_dir=/pub/c-news
	local_dir+news/c
	compress_excl+|patches/PATCHDATES
	compress_patt=patches/
	exclude_patt+|^c-news.Z

# and on, and on ...
.fi
.ft R
.RE
.SH HINTS
.LP
When adding a new package, always check it out first with the
.B \-n
option.
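.LP
For example, assuming the config files are named as in the EXAMPLES
section, a trial run of just the gnu package might look like:
.LP
.RS
.ft B
.nf
mirror \-n \-pgnu mirror.defaults mirror.config
.fi
.ft R
.RE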
.LP
If you are adding to an existing archive then it is usually best to
force the timestamps so time comparisons will work.
.LP
Try to keep all packages that are being retrieved from the same site
one after the other.  That way
.B mirror
will only have to log in once.
.LP
Keep your default settings in a separate file.  That way you will,
hopefully, be able to share mirror details with others.
.SH NETIQUETTE
If you are going to mirror a remote site please obey any restrictions
that the site administrators place on access.  You can generally find
the restrictions by connecting to the archive using the standard ftp
command.  Any restrictions are normally given as a login banner or in
a, hopefully, obvious file.
.LP
Here are, what I hope are, some good general rules.
.LP
Only mirror a site well outside the working hours of both the local
and remote sites.
.LP
It is probably unfriendly to try to mirror a remote site more
than once a day.
.LP
Before trying to mirror a remote site try to find the
packages you want from local archives.  No one will be pleased if you
soak up a lot of network bandwidth needlessly.
.LP
If you have a local archive then tell people about it so they don't
have to waste bandwidth and CPU at the remote site.
.LP
Do remember to check your config files from time to time in case the
remote archive has changed its access restrictions.
.LP
Check the remote site regularly for any new restrictions.
.SH SEE ALSO
perl(1), ftp(1)
.SH BUGS
The remaining keywords need to be implemented.
.LP
Should be able to mirror non Un*x sites (it may be able to but I have
not tested this - the remote ls is the problem).
.LP
It should restart file transfers where they left off.
.LP
Hanging data transfers should be detected.
.LP
Should allow for multiple packages from the same host, efficiently.
.LP
Some of the netiquette guidelines should be enforced.
.LP
Should be able to cope with links as well as symlinks.
.LP
Beginning to suffer from \fIcreeping featurism\fP.
.SH AUTHOR
Written by Lee McLoughlin <lmjm@doc.ic.ac.uk>.
It uses the ftp.pl package by: Alan R. Martello <al@ee.pitt.edu>
which uses the chat2.pl package by: Randal L. Schwartz <merlyn@iwarp.intel.com>
