


SHORTEN(1)					       SHORTEN(1)


NAME
       shorten - fast compression for waveform files

SYNOPSIS
       shorten	[-hlu]	[-a  #bytes] [-b #samples] [-c #channels]
       [-d #bytes] [-m #blocks] [-n #dB] [-p #order]  [-q  #bits]
       [-r  #bits]  [-t	 filetype]  [-v	 #version] [waveform-file
       [shortened-file]]

       shorten -x [-hl] [-a #bytes] [-d #bytes]  [shortened-file
       [waveform-file]]

DESCRIPTION
       shorten reduces the size of waveform files (such as audio)
       using Huffman coding of prediction residuals and optional
       additional quantisation.  In lossless mode the amount of
       compression obtained depends on the nature of the waveform.
       Waveforms consisting of low frequencies and low amplitudes
       give the best compression, which may be 2:1 or better.
       Lossy compression is controlled by specifying a minimum
       acceptable segmental signal to noise ratio or a maximum
       bit rate.  It operates by zeroing the lower order bits of
       the waveform, so retaining the waveform shape.

       If  both	 file  names are specified then these are used as
       the input and output files.  The first file  name  can  be
       replaced	 by  "-" to read from standard input and likewise
       the second file name can be replaced by "-" to write to
       standard	 output.   Under  UNIX,	 if only one file name is
       specified, then that name is used for input and the output
       file name is generated by adding the suffix ".shn" on com-
       pression and removing the ".shn" suffix on  decompression.
       In  these  cases	 the input file is removed on completion.
       The use of automatic file name generation is not currently
       supported  under	 DOS.	If  no	file names are specified,
       shorten reads from standard input and writes  to	 standard
       output.	 Whenever  possible, the output file inherits the
       permissions, owner, group, access and  modification  times
       of the input file.
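
       The automatic file name rule above can be sketched as follows.
       This is only an illustration: the function name is hypothetical,
       and raising an error when the ".shn" suffix is missing is an
       assumption, since the text does not say what shorten does then.

```python
SUFFIX = ".shn"


def default_output_name(input_name: str, extract: bool) -> str:
    """Derive the output name when only one file name is given (UNIX rule)."""
    if not extract:
        # Compression: append the suffix.
        return input_name + SUFFIX
    # Decompression: strip the suffix.  Rejecting other names is an
    # assumption made for this sketch, not documented behaviour.
    if not input_name.endswith(SUFFIX) or input_name == SUFFIX:
        raise ValueError("expected a name ending in " + SUFFIX)
    return input_name[: -len(SUFFIX)]
```

       For example, "speech.wav" maps to "speech.wav.shn" on
       compression and back again on decompression.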

OPTIONS
       -a align bytes
	      Specify  the  number of bytes to be copied verbatim
	      before compression begins.  This option can be used
	      to  preserve fixed length ASCII headers on waveform
	      files, and may be necessary if the header length is
	      an odd number of bytes.

       -b block size
	      Specify  the number of samples to be grouped into a
	      block for processing.  Within a  block  the  signal
	      elements	are  expected  to  have the same spectral
	      characteristics.	The default option works well for
	      a large range of audio files.

       -c channels
	      Specify  the number of independent interwoven chan-
	      nels.  For two signals, a(t) and b(t) the	 original
	      data format is assumed to be a(0),b(0),a(1),b(1)...
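
	      The interleaved layout can be illustrated with a short
	      sketch.  The function name is hypothetical, and
	      rejecting a partial final frame is a choice made for
	      this example only.

```python
def deinterleave(samples, nchan):
    """Split a(0),b(0),a(1),b(1)... into one list per channel."""
    if nchan < 1:
        raise ValueError("need at least one channel")
    if len(samples) % nchan:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel c owns every nchan-th sample, starting at offset c.
    return [samples[c::nchan] for c in range(nchan)]
```

	      With two channels, [1, 10, 2, 20, 3, 30] separates
	      into [1, 2, 3] and [10, 20, 30].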

       -d discard bytes
	      Specify the number of bytes to be discarded  before
	      compression  or decompression.  This may be used to
	      delete header information from a	file.	Refer  to
	      the -a option for storing the header information in
	      the compressed file.

       -h     Give a short message specifying usage options.

       -l     Print the software license specifying the conditions
	      for the distribution and usage of this software.

       -m blocks
	      Specify the number of past blocks	 to  be	 used  to
	      estimate	the  mean  and	power of the signal.  The
	      value of zero disables this prediction and the mean
	      is assumed to lie in the middle of the range of the
	      relevant data type (i.e. at zero for signed quanti-
	      ties).	The  default value is non-zero for format
	      versions 2.0 and above.
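
	      One way to picture the mean estimate is a running
	      average over the last few block means.  This is a
	      sketch under assumptions: the class name, the plain
	      averaging and the rounding are illustrative, and the
	      power estimate is omitted.

```python
from collections import deque


class MeanEstimator:
    """Predict a block's mean from the means of the previous blocks."""

    def __init__(self, nblocks):
        # nblocks == 0 disables prediction, as with "-m 0".
        self.history = deque(maxlen=nblocks) if nblocks > 0 else None

    def predict(self):
        # With no history (or prediction disabled) assume a mean of zero,
        # the middle of the range for signed data.
        if not self.history:
            return 0
        return round(sum(self.history) / len(self.history))

    def update(self, block):
        # Remember this block's mean; the deque drops the oldest entry.
        if self.history is not None and block:
            self.history.append(sum(block) / len(block))
```

	      The predicted mean would be subtracted from each
	      sample before further processing.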

       -n noise level
	      Specify the minimum acceptable segmental signal  to
	      noise  ratio  in	dB.  The signal power is taken as
	      the variance of the samples in the  current  block.
	      The  noise power is the quantisation noise incurred
	      by coding the current block assuming  that  samples
	      are uniformly distributed over the quantisation
	      interval.	 The bit rate is dynamically  changed  to
	      maintain	the  desired  signal to noise ratio.  The
	      default value represents lossless coding.
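
	      The calculation can be sketched as follows.  The
	      signal power is the block variance, as stated above.
	      The noise power of step*step/12 is the textbook value
	      for an error spread uniformly over one quantisation
	      interval; whether shorten uses exactly this constant
	      is an assumption, and the function names are
	      hypothetical.

```python
import math
from statistics import pvariance


def segmental_snr_db(block, qbits):
    """SNR in dB of one block when qbits low order bits are discarded."""
    signal_power = pvariance(block)
    step = 1 << qbits                   # width of the quantisation interval
    noise_power = step * step / 12.0    # uniform-error assumption
    if signal_power == 0:
        return float("-inf")
    return 10.0 * math.log10(signal_power / noise_power)


def max_discard_bits(block, min_snr_db, max_bits=15):
    """Largest number of low bits that still meets the requested SNR."""
    best = 0                            # 0 means the block stays lossless
    for qbits in range(1, max_bits + 1):
        if segmental_snr_db(block, qbits) < min_snr_db:
            break
        best = qbits
    return best
```

	      Because the decision is made per block, quiet blocks
	      lose fewer bits than loud ones, which is what makes
	      the bit rate vary.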

       -p prediction order
	      Specify the maximum order of the linear  predictive
	      filter.  The default value of zero disables the use
	      of linear prediction and a polynomial interpolation
	      method is used instead.  The use of the linear pre-
	      dictive  filter  generally  results  in	a   small
	      improvement  in compression ratio at the expense of
	      execution time.	This is the only option to use	a
	      significant  amount  of  floating	 point processing
	      during compression.   Decompression  still  uses	a
	      minimal number of floating point operations.

	      Decompression  time is normally about twice that of
	      the default polynomial interpolation.  For  version
	      0	 and  1, compression time is linear in the speci-
	      fied maximum order as all lower values are searched
	      for  the	greatest expected compression (the number
	      of bits required to transmit the prediction  resid-
	      ual  is  monotonically  decreasing  with prediction
	      order, but  transmitting	each  filter  coefficient
	      requires	about 7 bits).	 For version 2 and above,
	      the search is started at zero order and  terminated
	      when  the	 last two prediction orders give a larger
	      expected bit rate than the minimum found	to  date.
	      This is a reasonable strategy for many real world
	      signals; you may revert to the exhaustive
	      algorithm by setting -v1 to check that this works
	      for your signal type.
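
	      The two search strategies can be compared with a
	      small sketch.  Here cost(order) stands for the
	      expected bit rate at a given order; it and the
	      function names are hypothetical.  Treating an equal
	      cost as "not better" is an assumption of this sketch.

```python
def exhaustive_order(cost, max_order):
    """Versions 0 and 1: evaluate every order up to max_order."""
    return min(range(max_order + 1), key=cost)


def early_stop_order(cost, max_order):
    """Version 2 and above: stop once two successive orders fail to improve."""
    best_order, best_cost = 0, cost(0)
    misses = 0
    for order in range(1, max_order + 1):
        c = cost(order)
        if c < best_cost:
            best_order, best_cost = order, c
            misses = 0
        else:
            misses += 1
            if misses == 2:
                break           # the last two orders were both worse
    return best_order
```

	      If the cost has a second, deeper minimum beyond two
	      poor orders, only the exhaustive search finds it.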

       -q quantisation level
	      Specify the number of low order bits in each sample
	      which can be discarded (set to zero).  This is use-
	      ful if these bits carry no information, for example
	      when the signal is corrupted by noise.
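
	      Setting the low order bits to zero amounts to a bit
	      mask, as in this sketch.  The function name is
	      hypothetical, and plain masking (which rounds towards
	      minus infinity) is an assumption about the rounding.

```python
def discard_low_bits(samples, qbits):
    """Set the qbits lowest bits of every sample to zero."""
    mask = ~((1 << qbits) - 1)      # e.g. qbits=2 gives ...11111100
    return [s & mask for s in samples]
```

	      With qbits of 2, the samples 5, 7, -3 and 8 become
	      4, 4, -4 and 8.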

       -r bit rate
	      Specify  the  expected  maximum  number of bits per
	      sample.	The  upper  bound  on  the  bit	 rate  is
	      achieved	by setting the low order bits of the sam-
	      ple to zero, hence maximising the segmental  signal
	      to noise ratio.

       -t file type
	      Give the type of the sound sample file as one of
	      {ulaw,alaw,s8,u8,s16,u16,s16x,u16x,s16hl,u16hl,s16lh,u16lh}.
	      ulaw is the natural file type of ulaw encoded files
	      (such as the default Sun .au files) and alaw is a
	      similar  byte-packed  scheme.   All the other types
	      have initial s or u for signed  or  unsigned  data,
	      followed	by 8 or 16 as the number of bits per sam-
	      ple.  No further extension means the data is in the
	      natural  byte  order,  a	trailing x specifies byte
	      swapped data, hl explicitly states the  byte  order
	      as  high	byte followed by low byte and lh the con-
	      verse.  The default is s16, meaning signed  16  bit
	      integers in the natural byte order.
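
	      The naming scheme for the linear types can be made
	      concrete with a sketch that decodes raw bytes.  The
	      table and function names are hypothetical, ulaw and
	      alaw are left out, and silently ignoring a trailing
	      partial sample is a simplification of this example.

```python
import struct
import sys

# "Natural" byte order is that of the machine running the code.
_NATIVE = "<" if sys.byteorder == "little" else ">"
_SWAPPED = ">" if sys.byteorder == "little" else "<"

FORMATS = {
    "s8": "b", "u8": "B",
    "s16": _NATIVE + "h", "u16": _NATIVE + "H",
    "s16x": _SWAPPED + "h", "u16x": _SWAPPED + "H",
    "s16hl": ">h", "u16hl": ">H",    # high byte first
    "s16lh": "<h", "u16lh": "<H",    # low byte first
}


def decode_samples(data, filetype):
    """Turn raw bytes into a list of integers for a linear file type."""
    fmt = FORMATS[filetype]
    size = struct.calcsize(fmt)
    usable = len(data) - len(data) % size   # ignore a partial last sample
    return [v for (v,) in struct.iter_unpack(fmt, data[:usable])]
```

	      The two bytes 0x01 0x02 read as 258 under s16hl and
	      as 513 under s16lh.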

	      Specific optimisations are applied to ulaw and alaw
	      files.  If lossless compression is  specified  with
	      ulaw  files  then	 a  check  is made that the whole
	      dynamic range is used (useful for files recorded on
	      a SPARCstation with the volume set too high).
	      Lossless coding of both file types uses an internal
	      format  with  a  monotonic  mapping  to linear.  If
	      lossy compression is specified  then  the	 data  is
	      internally  converted  to linear.	 The lossy option
	      "-r4" has been observed to give little degradation.

       -u     The ulaw standard (ITU-T G.711) has two codes which
	      both map onto the zero value on a linear scale.
	      The "-u" flag maps the negative zero onto the positive
	      zero and so yields marginally better compression
	      for format version 2 (the gain is significant
	      for older format versions).
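
	      The two zero codes can be seen with a small decoder.
	      This sketch assumes the usual G.711 byte
	      representation, in which 0xFF is the positive zero
	      and 0x7F the negative zero; the function names are
	      hypothetical and this is not shorten's own code.

```python
POSITIVE_ZERO = 0xFF
NEGATIVE_ZERO = 0x7F


def ulaw_to_linear(code):
    """Expand one ulaw byte to a linear sample (common G.711 expansion)."""
    u = ~code & 0xFF                    # ulaw bytes are stored inverted
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
    return 0x84 - magnitude if u & 0x80 else magnitude - 0x84


def merge_zeros(data):
    """Replace every negative zero with the positive zero."""
    return bytes(POSITIVE_ZERO if b == NEGATIVE_ZERO else b for b in data)
```

	      Both codes expand to 0, so the mapping does not
	      change the linear signal, but the original ulaw bytes
	      can no longer be recovered exactly.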

       -v version
	      Specify the binary format version	 number	 of  com-
	      pressed  files.	 Legal values are currently 1 and
	      2, higher numbers generally giving better	 compres-
	      sion.    Detection  of  format version on decode is
	      automatic.

       -x extract
	      Reconstruct the original file.  All  other  command
	      line options except -a and -d are ignored.


METHODOLOGY
       shorten	works  by  blocking the signal, making a model of
       each block in order to remove  temporal	redundancy,  then
       Huffman coding the quantised prediction residual.


   Blocking
       The signal is read in a block of about 128 or 256 samples,
       and converted to integers  with	expected  mean	of  zero.
       Sample-wise-interleaved	data  is  converted  to	 separate
       channels, which are assumed independent.


   Decorrelation
       Four functions are computed, corresponding to the  signal,
       difference  signal,  second  and	 third order differences.
       The one with the lowest variance is coded.   The	 variance
       is  measured  by	 summing absolute values for speed and to
       avoid overflow.
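
       The selection can be sketched as follows.  The function names
       are hypothetical, and starting each block from a history of
       zeros is a simplification made for this example.

```python
def difference(x):
    """First difference of x, assuming the sample before x[0] was 0."""
    out, prev = [], 0
    for v in x:
        out.append(v - prev)
        prev = v
    return out


def best_difference_order(block):
    """Return (order, residual) for the candidate with the smallest cost."""
    candidates = [list(block)]              # order 0: the signal itself
    for _ in range(3):                      # orders 1, 2 and 3
        candidates.append(difference(candidates[-1]))
    # Sum of absolute values stands in for the variance.
    costs = [sum(abs(v) for v in c) for c in candidates]
    order = costs.index(min(costs))         # ties go to the lower order
    return order, candidates[order]
```

       For the ramp 10, 20, 30, 40, 50 the second difference wins,
       leaving the residual 10, 0, 0, 0, 0.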


   Compression
       It is assumed the signal has the Laplacian probability
       density function of exp(-abs(x)).  There is a computationally
       efficient way of mapping this density to Huffman codes.
       The code is in two parts: a run of zeros terminated by a
       bounding one, and a mantissa of a fixed number of bits.
       The number of leading zeros gives the offset from zero.
       Signed numbers are stored by calling the function for
       unsigned numbers with the sign in the lowest bit.  Some
       examples for a 2 bit mantissa:

	      100	 0
	      101	 1
	      110	 2
	      111	 3
	      0100	 4
	      0111	 7
	      00100	 8
	      0000100	 16
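
       The examples above can be reproduced with a short sketch that
       writes the code as a string of "0" and "1" characters.  The
       function names are hypothetical.  The unsigned code follows the
       description above; the exact folding used for signed numbers is
       an assumption of this sketch.

```python
def encode_unsigned(n, mantissa_bits):
    """Run of zeros, a bounding one, then a fixed-width mantissa."""
    if n < 0:
        raise ValueError("n must be non-negative")
    zeros = n >> mantissa_bits                  # offset from zero
    low = n & ((1 << mantissa_bits) - 1)        # remaining low bits
    mantissa = format(low, "b").zfill(mantissa_bits) if mantissa_bits else ""
    return "0" * zeros + "1" + mantissa


def decode_unsigned(bits, mantissa_bits):
    """Inverse of encode_unsigned for a single code word."""
    zeros = bits.index("1")
    mantissa = bits[zeros + 1 : zeros + 1 + mantissa_bits]
    return (zeros << mantissa_bits) | (int(mantissa, 2) if mantissa else 0)


def encode_signed(v, mantissa_bits):
    """Fold the sign into the lowest bit, then use the unsigned code."""
    # Assumed folding: 0, -1, 1, -2, 2 ... map to 0, 1, 2, 3, 4 ...
    folded = (v << 1) if v >= 0 else ((~v) << 1) | 1
    return encode_unsigned(folded, mantissa_bits)
```

       Small values get short codes, which suits a density that
       peaks at zero.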

       This Huffman code was first used by Robert Rice.  For more
       details see the technical report CUED/F-INFENG/TR.156
       included with the shorten distribution as files tr156.tex
       and tr156.ps.


SEE ALSO
       compress(1), pack(1).


DIAGNOSTICS
       Exit  status  is	 normally  0.  A warning is issued if the
       file is not properly  aligned,  i.e.  a	whole  number  of
       records could not be read at the end of the file.

BUGS
       Large  values of '-c' or '-b' cause MS-DOS to throw a wob-
       bly.  Presumably this is a  (lack  of)  memory  management
       problem.

       An easy way to test shorten for your system is to use
       "make test".  If this fails, for whatever reason, please
       report it.

       No check is made for increasing file size, but valid wave-
       form files generally achieve some compression.  Even  com-
       pressing	 a  file  of  random  bytes (which represents the
       worst case waveform file) only results in a small increase
       in  the file length (about 6% for 8 bit data and 3% for 16
       bit data).  There is one condition that is known to be
       problematic: the lossy compression of unsigned data
       without mean estimation.  Large file sizes may result
       if the mean is far from the middle range value.	For these
       files the value of the -m switch should be non-zero, as it
       is by default in format version 2.

       There  is  no  provision for different channels containing
       different data types.  Normally, this is	 not  a	 restric-
       tion,  but  it  does mean that if lossy coding is selected
       for the ulaw type, then all channels use lossy coding.

       It would be possible for all options to	be  channel  spe-
       cific as in the -r option.   I could do this if anyone has
       a really good need for it.

       See the file "change.log" for a history of bug fixes.

       Please mail me immediately at the address below if you  do
       find a bug.


AVAILABILITY
       The  latest  version can be obtained by anonymous FTP from
       svr-ftp.eng.cam.ac.uk,  in  directory  comp.speech/coding.
       The  sources  are  available  for  UNIX	machines in files
       shorten.tar.Z and shorten.tar.gz and for DOS  machines  as
       file  shorten.zip.   All	 distributions contain a DOS exe-
       cutable.


AUTHOR
       Copyright (C) 1992-1995 by Tony Robinson and SoftSound Ltd
       (ajr@softsound.com)

       Shorten	is  available for non-commercial use without fee.
       See the LICENSE file for	 the  formal  copying  and  usage
       restrictions.

			   22 Dec 1995				6


