TODO list for distcc

See also TODO comments in source files



monitor

    There's two things that could be monitored.  First is the daemon running
    on `this' computer and the client that is sending processes across the
    network.

    Some things I'd like to see for the daemon:
    1) Uptime
    2) Configuration (port, lzo compression? ssh enabled? etc)
    3) Number of jobs done (and a spread of the types of errors reported)
    4) Average throughput
    5) Current compiling tasks (pid, Src, filesize, filename, time recv'd)


    For the client (ie, distccmon-gnome replacement):
    On a per job basis
    For SEND and RECEIVE state:
    1) Current throughput
    2) Type of connection (ssh? port? lzo?)
    3) pid
    4) filename
    For (remote) COMPILE state:
    1) The actual pre-processed filename
    2) Type of connection (ssh? port? lzo?)
    3) pid
    For (local) LINKING state:
    1) The actual pre-processed filename
    2) pid
    3) Where the object code was compiled
    4) The post-linked filenamed (gcc ... -o [display this])
    For (local) PREPROCESS state:
    1) filename
    2) pid

    A tall order to be sure, and it'd suck to do the GUI... but you asked.
    :)




error messages

   distcc[32356] ERROR: compile on nevada failed with exit code 1

   Ought to say which input file.



unlink .i file as soon as it has been sent

    Might help with vm performance by hinting to the kernel that it
    will be discarded.

    Possibly reduces the chances of temporary files being left behind.



variable to add extra remote cflags, to handle icpc



tmpdir stuff

    Guard against the temporary directory being created by an attacker
    before we use it.   It must be owned by the right user and have
    the right privleges.

    Perhaps open the directory and fchdir to it to help prevent
    problems with the directory getting removed or having a different
    name while we're running.

    http://www.security-express.com/archives/bugtraq/2000-01/0012.html



loadavg limit on accepting connections?

    This should *not* be used instead of -j.  It's a backup.
 

handle -xFOO.



handle -Wp,-MF

    Some makefiles seem to generate this.  Aarg!



Installation as an SSH subsystem

    Might make use easier on Windows.  I don't see any real advantage
    on Unix.


control through command line

    Handle options like

       --distcc-verbose
       --distcc-hosts=

    to allow options to be set on the command line.


If we have produced a .i file and need to fall back to running locally
then use that rather than the original source.  On the other hand,
falling back to running the original command is possibly more robust.

 * @todo Make absolutely sure that if we fail, the .o file is removed.
 * Perhaps it would be better to receive to a temporary file and then
 * rename into place?  On the other hand, gcc seems to just write
 * directly, and if we fail or crash then Make ought to know not to
 * use it.
 *
 * @todo Count the preprocessor, and any compilations run locally, against the
 * load of localhost.  In doing this, make sure that we cannot deadlock
 * against a load limit, by having a case where we need to hold one lock and
 * take another to make progress.  I don't think there should be any such case
 * -- we can release the cpp lock before starting the main compiler.
 *

allow more control over verbosity

    For example, for the client, it would be nice to get just 'info'
    level messages about things that can or can't be distributed.


split gcc-specific argument parsing into a separate module


version checking

    Can we do some check that the compiler versions are compatible?

    What about a sanity check command that can be run at installation
    time?  This is now present as the distcc-check shell script.

    We could add several tests that exercise different aspects that
    are known to cause incompatibilities.  Alexandre suggests:

        Besides getting it to use ${CC-cc}, I also meant to make the source
        file configurable, which would make it possible to avoid certain
        known incompatibilities between compilers as they show up (e.g., using
        gcc 3.3 to preprocess and 3.2 to compile will get you link errors in
        programs that use the va_start macro).


    



boredom

    When there are too many jobs submitted by make, then we have to
    wait until any slot is available.  Unfortunately there is no
    OS-level locking system I can think of that allows us to block
    waiting for any one of a number of resources.

    If there are no slots to run, then at the moment we just sleep for
    2s.  This is OK, but can leave the processor idle.  It would be
    better to be woken up by other processes as they exit.  One way to
    do this would be to listen on a named pipe for notifications.

    This must be backed up by a sleep timer because we may not get the
    notification if e.g. the other process is killed.  Also it won't
    work on Cygwin, which doesn't have named pipes.

    Simply doing a select() on a pipe allows us to block for a while
    or until signalled.  Simply doing a nonblocking write of one byte
    to the pipe ought to allow waking up exactly one of the sleepers.

    Using an OS level semaphore to guard access to slots might work
    with some fudging, but there is no good portable implementation of
    them so it is moot.
    
    When woken, the clients can do one full round of trying to get a
    slot and then go back to sleep.

    This "guides" the OS scheduler towards keeping (almost) the exact
    number of clients activated, without too many of them spinning.

    We can't make the timeout too high, or the client will idle for a
    long time waiting for it.  But if we make it too low then we have
    the thundering herd problem that currently exists...

    Perhaps this is overengineering: people shouldn't make the -j
    number so high that this is hit very often, and we need to have
    the timeout anyhow, so why not just rely on it.

    Just listening on a pipe is cheaper than checking all the locks.



intel CC

    Does not understand the .ii extension.

    We need to specify -xc++ to make it properly compile C++ from
    preprocessed source.

    Is it OK to just get the user to add this?

    Perhaps we could add it always?
    
    Do we need a DISTCC_ADD_OPTIONS variable?


clean up temp files when a client is signalled

    Interrupting a compilation is pretty common.  It might be good to
    handle this more cleanly.

    We can also remove status files.


globally visible status files
    
    Perhaps store in a world-writable /var/lib/distcc, so that
    they're visible even when TMPDIR or HOME has been reset, as when
    building with emerge.

    Another good case to support is compilation from inside a chroot
    jail.
    
    It might also be nice to be able to see other people using your
    machine either as a client or as a server.

    This requires passing a trust boundary when publishing information
    across accounts.  The directory needs to be writable and the
    programs need to be robust against other users trying to cause
    mischief.  

    It's perhaps not great to allow that kind of security issue in a
    default installation.  Should we really create a mode 777
    directory by default?  umask will put some restrictions on what
    can be seen.

    Alternatively, have an environment variable that sets the state
    location.  If people want it globally visible they can set it to a
    global location.


dnotify in monitor

    This has been implemented, but I pulled it out because I'm not
    convinced it is a good idea.   

    Signals into GTK seem to cause some trouble when running from
    valgrind etc.

    Polling is not too expensive, and is nice and simple.  It also
    allows easier ways to handle corner cases like cleaning up state
    files left over after a compiler is terminated.

    Could set up dnotify on the state directory so that we don't have
    to keep polling it.  This would slightly reduce our CPU usage when
    idle, and might allow for faster updates when busy.

    We still have to scan the whole directory though, so we don't want
    to do it too often.

    I'm not sure how to nicely integrate this into GNOME though.
    dnotify sends us a signal, which doesn't seem to fit in well with
    the GNOME system.  Perhaps the dummy pipe trick?  Or perhaps we
    can jump out of the signal?  

    We can't call GTK code from inside.

    state changes are "committed" by renaming the file, so we'd want
    to listen for DN_RENAME I think.

    We need to make sure not to get into a loop by reacting to our own
    delete events.


SSH connection hoarding

    It might be nice to hold open SSH connections to avoid the network
    and CPU overhead of opening new ones.  

    However, fsh is far too slow, probably because of being written in
    Python. 

    It's only going to work on systems which can pass file descriptors
    and therefore needs to be optional.  Probably this only works on
    Unix.

    Building the kernel between the three x2000s seems to make
    localhost thrash.  A few jobs (but not many) get passed out to the
    other machines.

    Perhaps for C++ or something with really large files fsh would be
    better because the cost of starting Python would be amortized
    across more work.

    I don't think this needs to be done in distcc.  It can be a
    completely separate project to just rewrite fsh into C.  Indeed
    you could even be compatible with the Python implementation and
    just write the short-lived client bit in C.


Masquerade

    It might be nice to automatically create the directory and
    symlinks.  However we don't know what compiler names they'll want
    to hook...  

    Probably the best that we can do is provide clear instructions for
    users or package distributors to set this up.


Packaging

    Perhaps build RPMS and .debs?

    Is it easy to build a static (or LSB-compliant?) .rpm on Debian?
    
    What about an apt repository?


never run locally?

    Perhaps if compilation on one remote machine fails, try another, rather
    than falling back to localhost?

    Perhaps we should more carefully distinguish e.g. "failed to connect",
    "server dropped connection", etc etc.  

    Perhaps backing off from downed machines will make this more or
    less unnecessary.


asynchronous name lookups

    http://people.redhat.com/drepper/asynchnl.pdf

    I don't think this is very useful for us until we get to
    attempting parallel connections, because the client just doesn't
    have anything else to do until name resolution is complete.

    We could use this to impose a timeout on name lookups but that
    doesn't seem like a very common problem:

     - if your local resolver nameserver is broken, fix it
     - if a named hosts's nameserver is broken, fix it


Statistics

    Accumulate statistics on how many jobs are built on various machines.

    Want to be able to do something like "watch ccache -s".

    Perhaps just dump files into a status directory where they can be
    examined?

    Ignore (or delete) files over ~60s old.  This avoids problems with
    files hanging around from interrupted compilations. 


refactor name handling

    Common function that looks at file extensions and returns
    information about them

        - what is the preprocessed form of this extension?
        - does this need preprocessing?
        - is this a source file?


check that EINTR is handled in all cases


check that all lengths are unsigned 32-bit

    I think this is done, but it's worth checking a bit more.


abort when cpp fails

    The same SIGCHLD handling approach used to feed the compiler from
    a fifo might be used to abort early if the preprocessor fails.
    This will happen reasonably often, whenever there is a problem
    with an include, ifdef, comment, etc.  

    It might save waiting for a long connection to complete.

    One complication is that we know the compiler ought to consume all
    its input but we don't know when cpp ought to finish.  So the
    sigchld handler will have to check if it failed or not.  If it
    failed, then abort compilation.  If it did not fail, then keep
    going with the connection or whatever.

    This is probably not worthwhile at the moment because connections
    generally seem faster than waiting for cpp.


feed compiler from fifo

    Probably quite desirable, because it allows the compiler to start
    work sooner.

    This was originally removed because of some hitches to do with
    process termination.  I think it can be put back in reliably, but
    only if this is fixed.  Perhaps we need to write to the compiler
    in nonblocking mode?

    Perhaps it would be better to talk to both the compiler and
    network in nonblocking mode?  It is pretty desirable to pull
    information from the network as soon as possible, so that the TCP
    windows and buffers can open right up.

    Check CVS to remember what originally went wrong here.

    Events that we need to consider:

        Client forks
        
        Compiler opens pipe
        
        Client exits

        Server opens pipe
        
    There are a few possibilities here:

        Client opens fifo, reads all input, and exits.  The normal
        success case.

        Client never reads from fifo and just exits.  Would happen if
        the compiler command line was wrong.

        Client reads from fifo but not the whole thing, and then
        exits.

    Opening the fifo is a synchronization point: in blocking mode
    neither the compiler or server can proceed past here until the other
    one opens it.  If the compiler exits, then the server ought to be
    broken out of it by a SIGCHLD.  But there is a race condition
    here: the SIGCHLD might happen just before the open() call.  

    We need to either jump out of the signal handler and abort the
    compilation, or use a non-blocking open and a dummy pipe to break
    the select().

    If we jump out with longjmp then this makes the code a bit
    convoluted.

    Alternatively the signal handler could just do a nonblocking open
    on the pipe, which would allow the open to complete, if it had not
    already.

    This was last supported in 0.12.  That version doesn't handle the
    compiler exiting without opening the pipe though.


streaming input output

    We could start sending the preprocessed source out before it is
    complete.  This would require a protocol that allows us to send
    little chunks from various streams, followed by an EOF.  

    This can certainly be done -- fsh and ssh do it.  However,
    particularly if we want to allow for streaming more than one thing
    at a time, then getting all the timing conditions right to avoid
    deadlock caused by bubbles of data in TCP pipes.  rsync has had
    trouble with this.  It's even more hairy when running over ssh.

    So on the whole I am very skeptical about doing this.  Even when
    refactored into a general 'distexec', this is more about batch
    than interactive processing.


assemble on client

    May be useful if there is a cross compiler but no cross assembler,
    as is supposed to be the case for PPC AIX.  See thread by Stuart D
    Gathman.  Would also allow piping output back to client, if the
    protocol was changed to support that.


web site

    http://user-mode-linux.sourceforge.net/thanks.html


sendfile
 
    perhaps try sendfile to receive as well, if this works on any platforms.


static linking
    
    cachegrind shows that a large fraction of client runtime is spent in the
    dynamic linker, which is kind of a waste.  In principle using dietlibc
    might reduce the fixed overhead of the client.  However, the nsswitch
    functions are always dynamically linked: even if we try to produce a
    static client it will include dlopen and eventually indirectly get libc,
    so it's probably not practical.

testing

    How to use Debian's make-kpkg with distcc?  Does it work with the
    masquerade feature?

    http://moin.conectiva.com.br/files/AptRpm/attachments/apt-0.5.5cnc4.1.tar.bz2



way to set nice values for SSH mode

    Just need a client option that causes --nice NICE to be set for
    the remote server.

    
ccache
    
    Add an uncached fd to ccache, so that we can describe e.g. network
    failures that shouldn't be remembered. 

    Already done as UNCACHED_ERR_FD in distcc.  Needs to be patched in
    to ccache. 


coverage

    Try running with gcov.  May require all tests to be run from the same
    directory (no chdir) so that the .da files can accumulate properly.


slow networks

    Use Linux Traffic Control to simulate compilation across a slow
    network.


scheduling onto localhost

    Where does local execution fit into the picture?

    Perhaps we could talk to a daemon on localhost to coordinate with
    other processes, though that's a bit yucky.

    However the client should use the same information and shared
    state as the daemon when deciding whether it can take on another
    job.

    At the moment we just use a fixed number of slots, by default 4,
    and this seems to work adequately.


make "localhost" less magic

    Recognizing this magic string and treating it differently from
    127.0.0.1 or the canonical name of the host is perhaps a bit
    strange.  People do seem to get it wrong.  I can't think of a
    better simple solution though.



DNS multi-A-records

    build.foo.com expands to a list of all IP addresses for building. 

    Need to choose an appropriate target that has the right compilers.

    Probably not a good idea.

    If we go to using DNS roundrobin records, or if people have the same
    HOSTS set on different machines, then we can't rely on the ordering of
    hosts.  Perhaps we should always shuffle them?

    ssh is an interesting case because we probably want to open the
    connection using the hostname, so that the ssh config's "Host"
    sections can have the proper effect.

    Sometimes people use multi A records for machines with several
    routeable interfaces.  In that case it would be bad to assume the
    machine can run multiple jobs, and it is better to let the
    resolver work out which address to use.




DNS SRV records
    
    Can only be updated by zone administrator -- unless you have
    dynamic DNS, which is quite possible.



better scheduler

    What's the best way to schedule jobs?  Multiprocessor machines present
    a considerable complication, because we ought to schedule to them even
    if they're already busy.

    We don't know how many more jobs will arrive in the future.  This
    might be the first of many, or it might be the last, or all jobs might
    be sequenced in this stage of compilation.

    Generic OS scheduling theory suggests (??) that we should schedule a
    job in the place where it is likely to complete fastest.  In other
    words, we should put it on the fastest CPU that's not currently busy.

    We can't control the overall amount of concurrency -- that's down to
    Make.  I think all we really want is to keep roughly the same number
    of jobs running on each machine.

    I would rather not require all clients to know the capabilities of the
    machines they might like to use, but it's probably acceptable.

    We could also take the current load of the CPUs into account, but I'm
    not sure if we could get the information back fast enough for it to
    make a difference.

    Note that loadavg on Linux includes processes stuck in D state,
    which are not necessarily using any CPU.


    We want to approximate all tasks on the network being in a single queue,
    from which the servers invite tasks as cycles become available.  

    However, we also want to preserve the classic-TCP model of clients opening
    connections to servers, because this makes the security model
    straightforward, works over plain TCP, and also can work over SSH.

    http://www.cs.panam.edu/~meng/Course/CS6354/Notes/meng/master/node4.html

    Research this more.

    We "commit" to using a particular server at the last possible moment: when
    we start sending a job to it.  This is almost certainly preferable to
    queueing up on a particular server when we don't know that it will be the
    next one free. 

    One analogy for this is patients waiting in a medical center to see one of
    several doctors.  They all wait in a common waiting room (the queue) until
    a doctor (server) is free.  Normally the doctors would come into the
    waiting room to say "who's next?", but the constraint of running over TCP
    means that in our case the doctors cannot initiate the transaction.
    
    One approach would be to have a central controller (ie receptionist), who
    knows which clients are waiting and which servers are free, but I don't
    really think the complexity is justified at this stage.

    Imagine if the clients sat so that they could see which doctor had their
    door open and was ready to accept a new patient.  The first client who
    sees that then gets up to go through that door.  There is a possibility of
    a race when two patients head for the door at the same time, but we just
    need to make sure that only one of them wins, and that the other returns
    to her seat and keeps looking rather than getting stuck.

    Ideally this will be built on top of some mechanism that does not rely on
    polling.

    I had wondered whether it would work to use refused TCP connections to
    indicate that a server's door is closed, but I think that is no good.  

    It seems that at least on Linux, and probably on other platforms, you
    cannot set the TCP SYN backlog down to zero for a socket.  The kernel will
    still accept new connections on behalf of the process if it is listening,
    even if it's asked for no backlog and if it's not accepting them yet.
    netstat shows these processes just in 

    It looks like the only way to reliably have the server turn away
    connections is to either close its listening socket when it's too busy, or
    drop connections.  This would work OK, but it forces the client into
    retrying, which is inefficient and ugly.

    Suppose clients connect and then wait for a prompt from the server before
    they begin to send.  For multiple servers the client would keep opening
    connections to new machines until it got an invitation to send a job.

    This requires a change to the protocol but it can be made backward
    compatible if necessary, though perhaps that's not necessary.

    This would have the advantage of working over either TCP or SSH.  The main
    problem is that the client will potentially need to open connections to
    many machines before it can proceed.

    We almost certainly need to do this with nonblocking IO, but that should
    be reasonably portable.
    
    Local compilation needs to be handled by lockfiles or some similar
    mechanism.

    So in pseudocode this will be something like

    looking_fds = []
    while not accepted:
      select() on looking_fds:
        if any have failed, remove them
	if any have sent an invitation:
	  close all others
	  use the accepted connection
      open a new connection

    I'm not sure if connections should be opened in random order or the order
    they're listed.

    Clients are almost certainly not going to be accepted in the order in
    which they arrive. 

    If the client sends its job early then it doesn't hurt anybody else.  I
    suppose it could open a lot of connections but that sort of fairness issue
    is not really something that distcc needs to handle.  (Just block the user
    if they misbehave.)

    We can't use select() to check for the ability to run a process locally.
    Perhaps the select() needs to timeout and we can then, say, check the load
    average.


problems with new protocol

    Does anyone actually want this?  I really need an example of
    somewhere where it would be useful.

    The server may need to know the right extension for the temporary
    file to make the compiler behave in the right way.  In fact,
    knowing the acceptable temporary filenames is part of the
    application definition.


proposed new protocol

    Client connects to server, but needs to get confirmation before it
    actually starts sending the job.  It can actually ask multiple servers,
    and then choose the one that is seen to respond first.

    Also, client and server should always send their version strings.  This
    might make debugging easier in some cases.
    
    Because we'd like to support persistent connections in the future, it is
    not OK to rely on connection opening/closing at any point in the
    protocol. 
    
    As a partial exception, we might follow HTTP by making parties drop the
    connection when they think the other side is confused, as a way of
    restoring sanity.  However a mere compiler error does not require the
    connection to be closed.

    Therefore: there is a little state machine where the client and server are
    handshaking.  The client sends DIST, and when the server is ready for a
    job it sends back MEEP.  The client can now either proceed to send the
    job, or can drop the connection, or it can send DIST again when it wants
    to restart.

    Is there a mutual commitment happening here?  How does the server know
    that the client is actually not going to use it?  What would happen if 10
    clients connected, all got invitations, and then all tried to proceed to
    use the server?

    If the connection is not dropped how does the server know that the client
    is not going to accept the invitation?  For how long is the invitation
    valid?

    Is this overoptimization?

    Another good reason to have the client wait for a greeting is that it
    allows for cleaner error messages when the server decides not to talk to
    us and drops the connection: rather than, say, sendfile failing with
    EPIPE, we'll get a closed connection at a point where it's expected.

    Is the cost of an initial turnaround too high?



Compression

    Use LZO.  Very cheap compared to cpp.

    Perhaps statically link minilzo into the program -- simpler and
    possibly faster than using a shared library.

    Compression is probably only useful when we're network-bound.  We can
    roughly detect this by seeing whether we had to wait to acquire the mutex
    to send data to a particular machine.  If we're waiting for the network to
    be free, we might as well start compressing.

    Specify algorithm as a string so that it we can try e.g. gzip in a
    future release?

    Can compression automatically be turned on, rather than requiring
    user configuration?  I can't tell at the moment when would be the
    right time to do that.
    
    Is it cheap enough to always have it on?  We not only pay the cost
    of compression, but we also need to give up on using sendfile()
    and therefore pay for more kernel-userspace transitions and some
    data copying.  Therefore probably not, at least for GigE.


manpages

    The GNU project considers manpages to be deprecated, and they are
    certainly harder to maintain than a proper manual, but many people still
    find them useful.

    It might be nice to update the manual pages to contain quick-reference
    information that is smaller than the user manual but larger than what is
    available from --help.  Is that ever really needed?

    The manpages should be reasonably small both because that suits the
    format, and also because I don't want to need to keep too much duplicated
    information up to date.

    This might be a nice small bit of work for somebody who wants to
    contribute.
    
    http://www.debian.org/doc/debian-policy/ch-docs.html



User Manual

    Document SSH in more detail

    The UML manual is very good

    Make it clear that _LOG does not set the daemon logfile.

    Document the protocol (though perhaps not in the user manual.)

    Document DISTCCD_PATH.

    Suggest not using inetd unless you really have to.  You can do access
    control through distccd, and there may be more optimizations for
    standalone in the future.

 - Add some documentation of the benchmark system.  Does this belong
   in the manual, or in a separate manual?

 - Note that mixed gcc versions might cause different optimizations, which may
   be a problem.  In addition, files which test the gcc version either in the
   configure script or in the preprocessor will have trouble.  The kernel is
   one such program, and it needs to be built the same versions of gcc on all
   machines.

 - FAQ: Can't you check the gcc version?  No, because gcc programs which
   report the same versions number can have different behaviours, perhaps due
   to vendor/distributor patches.

 - Actually, distcc might use flock, lockf, or something else, depending on
   the platform.

 - Note that LSB requires init scripts to reset PATH, etc, so as to be
   independent of user settings if started interactively.

 - Discuss dietlibc.


Just cpp and linker?

   Is it easy to describe how to install only the bits of gcc needed for
   distcc clients?  Basically the driver, header, linker, and specs.  Would
   this save much space?

   Certainly installing gcc is much easier than installing a full cross
   development environment, because you don't need headers or libraries.  So
   if you have a target machine that is a bit slower but not terrible (or you
   don't have many of them) it might be convenient to do most of your builds
   on the target, but rely on helpers with cross-compilers to help out.


-g support

    I'm told that gcc may fix this properly in a future release.  There would
    then be no need to kludge around it in distcc.

    Perhaps detect the -g option, and then absolutify filenames passed to the
    compiler.  This will cause absolute filenames to appear in error messages,
    but I don't see any easy way to have both correct stabs info and also
    correct error messages.

    Is anything else wrong with this approach?

  
kill compiler

    If the client is killed, it will close the connection.  The server ought
    to kill the compiler so as to prevent runaway processes on the server. 

    This probably involves selecting() for read on the connection.

    The compilation will complete relatively soon anyhow, so it's not worth
    doing this unless there is a simple implementation.
    

data in syn packet?

    I think this is valid in TCP but never seen in the wild.

    Can corks cause data to be sent in the SYN or ACK packet?
    Apparently not.


tcp fiddling

    I wonder if increasing the maximum window size (sys.net.core.wmem_default,
    etc) will help anything?  It's probably dominated by scheduling
    inefficiency at the moment.

    The client does seem to spend time in wait_for_tcp_memory, which
    might be benefitted by increasing the available memory.


benchmark

    Try aspell and xmms, which may have strange Makefiles.

    glibc
    gtk/glib
    glibc++
    qt
    gcc
    gdb
    linux
    openoffice
    mozilla


rsync-like distributed caching

    Look in the remote machine's cache as well.

    Perhaps use a SQUID-like broadcast of the file digest and other critical
    details to find out if any machine in the workgroup has the file cached.
    Perhaps this could be built on top of a more general file-caching
    mechanism that maps from hash to body.  At the moment this sounds like
    premature optimization.

    Send source as an rdiff against the previous version.

    Needs to be able to fall back to just sending plain text of course.

    Perhaps use different compression for source and binary.

    librsync is probably not stable enough to do this very well.


--ping option
       
    It would be nice to have a <tt>--ping</tt> client option to contact
    all the remote servers, and perhaps return some kind of interesting
    information.  

    Output should be machine-parseable e.g. to use in removing
    unreachable machines from the host list.

    Perhaps send little fixed signatures, based on --version.  Would
    this ever be useful?


non-CC-specific Protocol

  Perhaps rather than getting the server to reinterpret the command
  line, we should mark the input and output parameters on the client.
  So what's sent across the network might be

    distcc -c @@INPUT@@ -o @@OUTPUT@@

  It's probably better to add additional protocol sections to say
  which words should be the input and output files than to use magic
  values.

  The attraction is that this would allow a particularly knotty part
  of code to be included only in the client and run only once.  If any
  bugs are fixed in this, then only the client will need to be
  upgraded.  This might remove most of the gcc-specific knowledge from
  the server.

  Different clients might be used to support various very different
  distributable jobs.

  We ought to allow for running commands that don't take an input or
  output file, in case we want to run "gcc --version".

  The drawback is that probably new servers need to be installed to
  handle the new protocol version.

  I don't know if there's really a compelling reason to do this.  If
  the argument parser depends on things that can only be seen on the
  client, such as checking whether files exist, then this may be
  needed.
  
  The server needs to use an appropriately-named temporary file.


gcc wierdnesses:

    distcc needs to  handle <tt>$COMPILER_PATH</tt> and
    <tt>$GCC_EXEC_PREFIX</tt> in some sensible way, if there is one.
    Not urgent because I have never heard of them being used.


networking timeouts:

    Can we just set SO_TIMEOUT on the socket?

    distcc waits for too long on unreachable hosts.  We probably need
    to timeout after about a second and build locally.  Probably this
    should be implemented by connect() in non-blocking mode, bounded
    by a select.

    Also we want a timeout for name resolution.  The GNU resolver has
    a specific feature to do this.  On other systems we probably need
    to use alarm(), but that might be more trouble than it is worth.  Jonas
    Jensen says:

	Timing out the connect call could be done easier than this, just by
	interrupting it with a SIGALRM, but that's not enough to abort
	gethostbyname. This method of longjmp'ing from a signal handler is what
	they use in curl, so it should be ok.


waitstatus

    Make sure that native waitstatus formats are the same as the
    Unix/Linux/BSD formats used on the wire.  (See
    <http://www.opengroup.org/onlinepubs/007904975/functions/wait.html>,
    which says they may only be interpreted by macros.)  I don't know
    of any system where they're different.


override compiler name
	    
    distcc could support cross-compilation by a per-volunteer option to
    override the compiler name.  On the local host, it might invoke gcc
    directly, but on some volunteers it might be necessary to specify a more
    detailed description of the compiler to get the appropriate cross tool.
    This might be insufficient for Makefiles that need to call several
    different compilers, perhaps gcc and g++ or different versions of gcc.
    Perhaps they can make do with changing the DISTCC host settings at
    appropriate times.

    I'm not convinced this complexity is justified.

    Rusty is doing this in ccontrol, which is possibly a better place
    for it.



Installable package for Windows

    Also, it would be nice to have an easily installable package for Windows
    that makes the machine be a Cygwin-based compile volunteer.  It probably
    needs to include cross-compilers for Linux (or whatever), or at least
    simple instructions for building them.



autodetection (Rendezvous, etc)

    http://dotlocal.org/mdnsd/

    The Apple licence is apparently not GPL compatible.

    Brad reckons SLP is a better fit.

    Automatic detection ("zero configuration") of compile volunteers is
    probably not a good idea, because it might be complicated to implement,
    and would possibly cause breakage by distributing to machines which are
    not properly configured.


OpenMOSIX autodiscovery

    what is this?


central configuration


    Notwithstanding the previous point, centralized configuration for a site
    would be good, and probably quite practical.  Setting up a list of
    machines centrally rather than configuring each one sounds more friendly.
    The most likely design is to use DNS SRV records (RFC2052), or perhaps
    multi-RR A records.  For exmaple, compile.ozlabs.foo.com would resolve to
    all relevant machines.  Another possibility would be to use SLP, the
    Service Location Protocol, but that adds a larger dependency and it seems
    not to be widely deployed.



Large-scale Distribution

    distcc in it's present form works well on small numbers of close machines
    owned by the same people.  It might be an interesting project to
    investigate scaling up to large numbers of machines, which potentially do
    not trust each other.  This would make distcc somewhat more like other
    "peer-to-peer" systems like Freenet and Napster.



preprocess remotely

    Some people might like to assume that all the machines have the same
    headers installed, in which case we really can preprocess remotely and
    only ship the source.  Imagine e.g. a Clearcase environment where the same
    filesystem view is mounted on all machines, and they're all running the
    exact same system release.

    It's probably not really a good idea, because it will be marginally faster
    but much more risky.  It is possible, though, and perhaps people building
    files with enormous headers would like it.

    Perhaps those people should just use a different tool like dmake, etc.


Local variables:
mode: indented-text
indent-tabs-mode: nil
End: