1.7.1995

Welcome to POWERpv1.1, a set of programs
for altering and creating sounds using
Fourier analysis/synthesis as their basis.
These programs build upon the
structure of the practical phase
vocoder C program described by
Richard Moore in Elements of Computer
Music. The programs work at the
level of the Unix command line.

First, a scan through the alterations
(hopefully improvements) since the last
release:

1. Reorganized directories for greater clarity.
2. Bug fixes for temper, tempeh.
3. Addition of several new modules.
4. Intel-friendly code, where needed.
5. Removal of NeXT dependencies.

Downer:

Documentation for the newer modules is sparse
to non-existent (the price for making this code
available before the 21st century).

Most of the programs behave like
filters, with an input and an output.
A few of the programs are synthesizers
with no input. These modules are dedicated
to Gordon Mumma, who has been fond of building
electronic music circuits with no inputs 
since the late 1950s.

In order to build these programs,
first visit the PVLIB directory
and type "make". (Don't be a slob! 
Type "make clean" afterwards.)
Then go to the src directory and
type:
make ; make install ; make clean.
You are now either done, or mad at
me because the stuff didn't compile.
Although I have removed most if not all
of the NeXT dependencies, not all
C compilers are the same, so you may hit
some obscure gotcha. Email me if you need help.
Please see the sections HEADER HEADACHES and
LA DIFFERENCE below before compiling, as you may
need to make minor alterations to the file pv.h,
found in PVLIB.

HEADER HEADACHES 

A few of the new modules need to read in multiple
soundfiles, and therefore must deal with both headers
and byte-ordering. Since these programs don't know about
(and don't want to know about) soundfile headers, they
just throw them away, but first they have to know the
size of the header in bytes. This is defined in pv.h 
as 28 bytes (the size of a NeXT soundfile header) so
you will want to change this if your soundfiles have a
different header size. (Or you could use Sox to convert
your soundfile to a NeXT soundfile.)
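To make the header-skipping step concrete, here is a
minimal C sketch, assuming the header size lives in a
macro. The name HEADER_SIZE is hypothetical (check pv.h
for the real one). It reads and discards bytes rather
than seeking, so it also works when the soundfile
arrives through a pipe.

```c
#include <stdio.h>

/* Hypothetical macro name; 28 bytes matches the NeXT soundfile
   header size described above. Change it for other formats. */
#define HEADER_SIZE 28

/* Discard the first HEADER_SIZE bytes of an open soundfile so the
   next read starts at the first sample. Returns 0 on success, or -1
   if the input ended before the whole header was consumed. */
int skip_header(FILE *fp)
{
    for (int i = 0; i < HEADER_SIZE; i++) {
        if (fgetc(fp) == EOF)
            return -1;
    }
    return 0;
}
```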

LA DIFFERENCE

If you are on a little-endian machine such as an Intel PC,
some of the new modules must swap bytes as they gobble
your integer samples directly. On big-endian machines
(such as the Motorola-based NeXTs), you do not want
byte-swapping. This choice is controlled by the BYTESWAP
constant defined in pv.h. By default, bytes are swapped.
Because these modules read the integer samples directly,
they will only work on soundfiles stored as 16-bit integers.
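A minimal C sketch of the byte-order fix, assuming the
soundfile stores its samples big-endian (as NeXT
soundfiles do) and that BYTESWAP is a 0/1 macro. The
functions swap16 and fix_byte_order are hypothetical,
not taken from the POWERpv source.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the BYTESWAP constant in pv.h:
   1 on little-endian hosts (Intel), 0 on big-endian hosts (NeXT). */
#define BYTESWAP 1

/* Exchange the two bytes of a 16-bit sample. */
static uint16_t swap16(uint16_t s)
{
    return (uint16_t)((s << 8) | (s >> 8));
}

/* Convert n big-endian 16-bit samples, read raw from a soundfile,
   into host byte order in place. Does nothing when BYTESWAP is 0. */
void fix_byte_order(int16_t *buf, size_t n)
{
#if BYTESWAP
    for (size_t i = 0; i < n; i++)
        buf[i] = (int16_t)swap16((uint16_t)buf[i]);
#else
    (void)buf;
    (void)n;
#endif
}
```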

"OFF WITH HER HEAD!" SHOUTED THE QUEEN

Most of these programs process a stream of
floating point samples through Unix pipes.
Since most soundfiles have
some header information, and are often
stored as integer samples, you will
have to strip off the header, possibly
convert to floating point values, run
them through a POWERpv processor and then
reverse the process: convert back to
integers and reattach a header.
If you have the utilities to accomplish
this, then you probably already know how 
to use them. If not, I suggest you pick
up the Sound_io_NeXT.tar.Z distribution 
from princeton.edu. If you are on a 
NEXTSTEP/INTEL platform, you will want
nextsf-intel.tar.Z at wendy.ucsd.edu.

The CARL Package has recently been released
at ccrma-ftp.stanford.edu so you might
get the soundfile IO you need from there
instead. As a bonus, you also get cmusic
and Moore's original pv program.
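If you end up writing your own converter, the heart of
it is a pair of loops like these. This is a sketch under
two assumptions not stated above: the floating point
stream consists of native 32-bit floats, and full scale
(32768) maps to 1.0. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: floats are normalized so that 32768 <-> 1.0. */
#define FULL_SCALE 32768.0f

/* 16-bit integer samples -> floats (the strip-and-convert direction). */
void shorts_to_floats(const int16_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / FULL_SCALE;
}

/* Floats -> 16-bit integers, clipping to the legal range and
   rounding to the nearest integer (the reverse direction). */
void floats_to_shorts(const float *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = in[i] * FULL_SCALE;
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        out[i] = (int16_t)(v < 0.0f ? v - 0.5f : v + 0.5f);
    }
}
```

A converter would read 16-bit samples from stdin after
skipping the header, call shorts_to_floats, and fwrite
the floats to stdout. Clipping on the way back guards
against processed samples that exceed full scale.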

All the processors in POWERpv are 
self-documenting to a certain degree. Example:

mitosis> temper
temper:  static spectral compander
temper   [flags] < floatsams > floatsams
        N:      fft length [1024]
        R:      sampling rate [44100]
        M:      window size in samples [2048]
        D:      decimation factor in samples [256]
        I:      interpolation factor in samples [256]
        P:      pitch factor [0 for overlap add] [1.0]
        f:      spectral companding factor [2.0]
        t:      oscillator resynthesis threshold [.001]
        s:      synthesize analysis input

The usage message gives you a list of flags
and their default values. Since all the important
values for temper have defaults, it could be
executed successfully without any user-supplied
parameters.

I will briefly discuss the parameters which are
most common throughout these programs. For more
information, see Moore.

R - sampling rate of input sound.
N - number of points in the FFT analysis.
This works out to N/2 instantaneous amplitudes
and frequencies (or phases - more about that later).
At first I felt ripped off when I realized that for
a value of N FFT points, I was only getting N/2
frequency points. But I felt better after I
realized I was also getting negative frequencies. [:)]
N must be a power of 2, since we are using the
FFT. Typical values are 1024 or 2048, but I recommend
experimenting with extreme high and low values as they can
produce interesting effects. Note that higher
values of N increase frequency resolution but
decrease transient resolution.
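The resolution tradeoff is easy to put in numbers:
adjacent analysis channels are R/N Hz apart, and an
N-point frame spans N/R seconds. A small C sketch
(the helper names are illustrative):

```c
/* Nonzero if n is a positive power of two (required for the FFT). */
int is_power_of_two(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/* Spacing in Hz between adjacent analysis channels: R / N. */
double bin_spacing_hz(double R, int N)
{
    return R / N;
}

/* Duration of one N-point frame in milliseconds: 1000 * N / R. */
double frame_ms(double R, int N)
{
    return 1000.0 * N / R;
}
```

At R = 44100, N = 1024 gives channels about 43.1 Hz
apart and frames of about 23.2 ms; doubling N to 2048
halves the spacing to about 21.5 Hz but doubles the
frame to about 46.4 ms.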


M - window size. If M is larger than N, the windowed
block of M samples is folded into N points (successive
blocks of N samples are added together) before
analysis, resulting in some loss of fidelity, but
speeding the computation process compared with
an M-point FFT.
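One way to picture the folding of an M-sample window
into N points is the C sketch below, written in the
spirit of the folding step in Moore's phase vocoder. It
is not code from POWERpv; the function name and argument
order are assumptions. Sample i of the windowed block
is added into slot (n + i) mod N, where n is the input
time of the first sample.

```c
/* Window M input samples and fold them into N output points.
   n is the absolute input time of in[0]; rotating by n keeps the
   phase reference of successive frames consistent. */
void fold(const float *in, const float *win, int M,
          float *out, int N, long n)
{
    for (int i = 0; i < N; i++)
        out[i] = 0.0f;
    n %= N;
    if (n < 0)
        n += N;
    for (int i = 0; i < M; i++) {
        out[n] += in[i] * win[i];   /* blocks of N samples pile up */
        if (++n == N)
            n = 0;
    }
}
```

With M = 8192 and N = 2048 (as in the temper example
at the end of this file), each output point is the sum
of four windowed input samples.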

D - This is the number of samples skipped between
FFT analysis frames. D is conservatively set to
M/8 by default (256 samples for the default M of
2048), but it may be set as large as N under certain
conditions. Compute time is roughly proportional
to the ratio N/D. So if you want quality, you have
to pay.
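Why N/D sets the bill: a sound of len samples needs
about len/D frames, and each frame costs one N-point FFT
(on the order of N log2 N operations). A rough C sketch
of that estimate; the cost unit is arbitrary and the
function names are hypothetical.

```c
/* Frames needed to cover len input samples, one frame every
   D samples (rounded up). */
long frame_count(long len, int D)
{
    return (len + D - 1) / D;
}

/* Integer log2 of a power-of-two N, counted by shifting. */
static int ilog2(int N)
{
    int k = 0;
    while (N > 1) {
        N >>= 1;
        k++;
    }
    return k;
}

/* Rough relative cost of analyzing len samples:
   frames * N * log2(N). Halving D roughly doubles it. */
double relative_cost(long len, int N, int D)
{
    return (double)frame_count(len, D) * N * ilog2(N);
}
```

For one second at 44100 Hz with N = 1024, D = 256 needs
173 frames; halving D to 128 needs 345, roughly doubling
the work.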

I - this is the number of samples pumped out
for each FFT frame. If I != D, you have effectively
altered the duration of the sound without changing
its spectrum (much). Your own personal Springer machine.
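The duration change follows directly: each frame
consumes D input samples and emits I output samples, so
the output lasts about I/D times as long as the input.
A C sketch with hypothetical helper names; the exact
length depends on how a processor handles the final
partial frame.

```c
/* Time-scaling factor implied by the D and I flags. */
double stretch_factor(int D, int I)
{
    return (double)I / D;
}

/* Approximate output length in samples for len input samples:
   one frame per D input samples, I output samples per frame. */
long output_length(long len, int D, int I)
{
    long frames = (len + D - 1) / D;
    return frames * I;
}
```

For example, -D256 -I512 roughly doubles the duration:
44100 input samples become 173 frames of 512, or 88576
output samples.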

P - this is a pitch multiplier for the spectrum.
One of the interesting features of Moore's 
implementation is that the FFT frame may be
resynthesized either with the inverse FFT, or
with a bank of oscillators. Oscillator resynthesis
tends to be more computationally expensive than
inverse FFT resynthesis, but it is also necessary for
processors which radically alter the frequency
content of the spectrum. A non-zero value for
P specifies oscillator bank resynthesis.
For P = 2, the spectrum is shifted up an octave
while the duration remains the same. Groovy.
When using inverse FFT synthesis (P = 0), it may not
be necessary to convert phase to instantaneous
frequency if you pretty much leave the frequencies
alone. This efficiency measure is discussed elsewhere.
See the leanconvert subroutine for details.

t - this is a multiplier that sets an amplitude threshold
below which frequency components are not resynthesized.
It applies only to oscillator bank resynthesis
and can speed up the processor considerably.
A threshold of .001 corresponds to about -60dB.
I've gotten away with values of .01 and higher.
Past a certain point, this creates noticeable
artifacts which you may or may not enjoy.
The actual synthesis threshold is adaptively
recalculated at each FFT frame, relative to the
highest reported amplitude.

s - it is possible for the processors to accept
a stored FFT analysis file. The -s flag indicates
that your input is in this format. Some of the
processors only use analysis data, in which case
the -s flag is disabled. I have separated the
analysis portion of Moore's program into a
standalone, pvanal, since I didn't wish to
include the analysis code in every processor.
Please note that analysis files will be larger
than the source files, perhaps considerably
larger, so courtesy may dictate storing these
files sparingly at a shared installation.
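To estimate how much larger, here is a back-of-the-
envelope C sketch. The storage format is an assumption,
not taken from pvanal: each frame is taken to hold
N/2+1 amplitude/frequency pairs as 4-byte floats,
written once every D input samples, compared against
16-bit mono source samples.

```c
/* Assumed analysis-file size divided by 16-bit source size. */
double analysis_size_ratio(int N, int D)
{
    double frame_bytes = (N + 2) * 4.0;  /* (N/2 + 1) pairs * 2 floats * 4 bytes */
    double source_bytes = D * 2.0;       /* D input samples * 2 bytes */
    return frame_bytes / source_bytes;
}
```

Under those assumptions, N = 1024 and D = 256 give an
analysis file about 8 times the size of the source;
N = 2048 with D = 1024 gives about 4 times.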

Finally, here is an example command-line use of
the above program employing the CARL soundfile
filters fromsf and tosf:

mitosis> fromsf -H crumhorn_master | temper -R44100 -N2048 -M8192 -D1024 -I1024 -f.666 | tosf -R44100 -c1 pile_driver

or, using the nextsf tools:

mitosis> fromsnd crumhorn_master | temper -R44100 -N2048 -M8192 -D1024 -I1024 -f.666 | tosnd pile_driver


Congratulations! You are now ready to do some
serious frequency-domain hacking. 

Bug reports, tapes and denatured texts to:

Eric Lyon
eric@cmlab.keio.sfc.ac.jp
eric@wendy.ucsd.edu