/* This is a reference implementation of Snefru. Snefru is a one-way hash
   function that provides authentication. It does not provide secrecy.

Snefru is named after a Pharaoh of ancient Egypt.



Copyright (c) Xerox Corporation 1989 All rights reserved.

License to  copy and use this software is granted provided that it is
   identified as the 'Xerox Secure Hash Function' in all material mentioning
   or referencing this software or this hash function.

License is also granted to make and use derivative works provided that such
   works are identified as 'derived from the Xerox Secure Hash Function' in
   all material mentioning or referencing the derived work.

Xerox Corporation makes no representations concerning either the
   merchantability of this software or the suitability of this software for
   any particular purpose.  It is provided "as is" without express or implied
   warranty of any kind.

These notices must be retained in any copies of any part of this software.



This is version 2.0, July 31, 1989.

Version 2.0 is algorithmically different from versions 1.4 and 1.3.
In particular, version 2.0 makes the following changes:

1.)  The S-boxes in version 2.0 have been computed in accordance with a
   publicly known algorithm.

2.)  The special case handling of inputs shorter than 64 bytes has been
   deleted.  This special case code offered little performance advantage and
   increased the complexity of the algorithm.  In addition, Coppersmith
   noticed a weakness that affected the 128-bit-input/128-bit-output version
   of Snefru (though not the 512-bit input/128-bit or 256-bit output version).

3.)  The parameters p0, p1, and p2 have been eliminated.  There are several
   reasons for this change.  The principal reason is that they increase
   both the complexity of the code and the conceptual complexity of the hash
   function, and provide only modest improvement in security.

4.)  Because the parameter mechanism was used to distinguish between inputs
   that differ only in the number of trailing 0 bits, a different mechanism
   has been adopted.  This new mechanism simply counts the number of bits in
   the input, and then places this 64-bit bit-count into the rightmost
   64-bits of the last block hashed.  A slightly different method of applying
   the hash has also been adopted.

5.)  Several people requested a larger output (to provide greater
   security). Because this will not always be needed, the algorithm was
   modified to generate either 128 bits or 256 bits of output, depending on a
   command line option.  Notice that 128 bits of output only provides 64
   "real" bits of security (because of square root attacks) and similarly 256
   bits of output provides only 128 "real" bits of security. Use of the
   higher security 256-bit output will slow down the algorithm by about 1/3
   (each application of Hash512 then consumes 32 bytes of input rather than
   48 bytes).

6.) Other non-algorithmic changes have been made to the code, in keeping
   with various criticisms and comments.



A 128 bit output should provide adequate security for most commercial
   applications (barring discovery of some unexpected weakness in the
   algorithm).  Where higher security is desired, the 256-bit output
   size can be used.

Version 1.4 differs from Version 1.3 only in the wording of the export
   notice. Version 1.3 forbids export entirely.

Version 1.3 fixes a security bug in handling short files (64 bytes or less).
   Such files should use the length of the file as a parameter to the hash,
   to prevent two files of different lengths from producing the same hash
   value if one file is a prefix of another.  This had been done for long
   files (greater than 64 bytes in length) but was overlooked for short
   files.

In addition, Version 1.3 improves the way in which the parameter is mixed
   into the hash.  In essence, the change mixes in the parameter one more
   time. Although a similar effect can be achieved by increasing the security
   parameter by 1 (e.g., from 2 to 3) this also increases the amount of
   computation required by 50%.

Version 1.3 also makes some more changes in the notices that accompany the
   code.

Version 1.2 makes no changes in the code.  Only the notices that accompany
   the code and some of the comments have changed.

Version 1.1 added the routine 'convertBytes' to convert an array of 'char'
   into an array of unsigned long int.  This conversion is a no-op on the SUN
   and many other computers, but will have an effect on the VAX and computers
   with 'backwards' byte ordering.  It will also SLOW DOWN THE HASH FUNCTION,
   so it should be removed whenever possible if speed is an issue.

This program reads from the standard input until EOF is reached (the first
   'read' that does not return a full buffer of data).  The data on the
   standard input is 'hashed' with a cryptographically secure one-way hash
   function (also known as a 'message digest', 'fingerprint', 'Manipulation
   Detection Code' or 'MDC').  The hash is then printed on the standard
   output.

The input can be of any size.  The output is either 128 bits printed as 32
   characters in hex, or 256 bits printed as 64 characters in hex.

The primary use of one-way hash functions is to determine if there have been
   any unauthorized, malicious, or accidental changes made to a file.  For
   example, if an executable program file produces the hash '209884c4
   2e89d967 5456ac0e 61269550', then any change to that program file will
   cause the hash to be changed.  Thus, the tampering can be detected by
   comparing the current output value with the previously computed (and
   presumably correct) output value.

Hash512 is the central routine in this program.  It is used in this program
   in a linear fashion -- i.e., a sequential file is hashed down by repeated
   applications of Hash512.  Changing a single bit in the file would then
   require completely re-computing the hash from the point of change onward.

Hash512 can be used in a tree-structured fashion to authenticate a large
   table of data. This would imply that changing a single bit would not force
   a complete re-computation of the hash value, but would instead require
   only log n re-computations of Hash512 to 'patch up' the changes along the
   path from the root to the changed leaf entry. A tree-structured
   application also has the advantage that any single entry in the table can
   subsequently be authenticated by someone who knows only the
   'authentication path' from the root of the tree to the leaf entry.  These
   concepts are discussed more thoroughly in 'Secrecy, Authentication, and
   Public Key Systems' by Ralph C. Merkle, UMI Research Press, 1982 (see
   particularly Chapter 2, 'One Way Hash Functions').  The use of a
   tree-structured pattern of applications of a one-way hash function is
   covered by U.S. Patent #4,309,569, 'Method of Providing Digital
   Signatures' (contact Stanford University, Office of Technology Licensing).
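
As an illustration only (this sketch is not part of the Xerox code), the
   'authentication path' idea might look as follows.  The names LEAVES,
   toy_hash2, build_tree, auth_path and root_from_path are hypothetical, and
   toy_hash2 is a stand-in mixing function, not Snefru and not secure.  The
   sketch assumes a table whose size is a power of 2 and a C99 compiler (for
   <stdint.h> and '//' comments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAVES 8   // hypothetical table size; must be a power of 2

// Stand-in 2-to-1 compression function. NOT Snefru and NOT secure;
// it only gives the tree something deterministic to combine.
static uint32_t toy_hash2(uint32_t left, uint32_t right)
{
    uint32_t h = 2166136261u;
    h = (h ^ left) * 16777619u;
    h = (h ^ right) * 16777619u;
    return h ^ (h >> 15);
}

// Fill tree[1..2*LEAVES-1] heap-style: tree[1] is the root and the
// leaves occupy tree[LEAVES..2*LEAVES-1]. tree[0] is unused.
static void build_tree(const uint32_t leaf[LEAVES], uint32_t tree[2 * LEAVES])
{
    size_t i;
    for (i = 0; i < LEAVES; i++)
        tree[LEAVES + i] = leaf[i];
    for (i = LEAVES - 1; i >= 1; i--)
        tree[i] = toy_hash2(tree[2 * i], tree[2 * i + 1]);
}

// Collect the sibling of every node on the way from leaf 'index' up to
// the root. Returns the number of siblings stored (log2 of LEAVES).
static size_t auth_path(const uint32_t tree[2 * LEAVES], size_t index,
                        uint32_t path[])
{
    size_t node = LEAVES + index, n = 0;
    assert(index < LEAVES);
    while (node > 1) {
        path[n++] = tree[node ^ 1];   // the sibling shares the parent
        node >>= 1;
    }
    return n;
}

// Recompute the root from one leaf value and its authentication path.
static uint32_t root_from_path(uint32_t value, size_t index,
                               const uint32_t path[], size_t n)
{
    size_t node = LEAVES + index, i;
    for (i = 0; i < n; i++) {
        // An odd node number is a right child, so its sibling goes first.
        value = (node & 1) ? toy_hash2(path[i], value)
                           : toy_hash2(value, path[i]);
        node >>= 1;
    }
    return value;
}
```

With 8 leaves the path holds 3 siblings, so a verifier who knows only the
   root, one leaf and that 3-entry path can recompute the root with 3
   applications of the hash.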


At the present time (July 31, 1989) the author knows of no method for
   'breaking' this one-way function (i.e., finding two input files that
   produce the same output value). The algorithm has undergone some review
   for security.  Further review is expected.  Use of this algorithm for
   production use is not recommended at this time.  Note that we are
   specifically examining the security of Hash512 with a 512-bit input and a
   128-bit output, and Hash512 with a 512-bit input and a 256-bit output.
   Use of other sizes is not recommended at this time.  In particular, we
   recommend against the use of output sizes smaller than 128 bits, and
   recommend against the use of an input that is less than 2 (two) times the
   size of the output.  When the input size equals the output size, Snefru
   suffers a serious degradation in security (an observation due to
   Coppersmith).

If anyone using this program finds two different inputs that produce the same
   output, please contact Ralph C. Merkle via E-mail (merkle@xerox.com) or
   via normal mail at: Xerox PARC, 3333 Coyote Hill Road, Palo Alto, CA 94304,
   (415) 494-4000


See the paper 'A Software One Way Hash Function', by Ralph C. Merkle, for a
   more detailed explanation.

The following test cases were taken directly from a terminal, and can be used
   to verify the correct functioning of an implementation of Snefru.  The
   first input is simply a carriage return.  The second input is '1', the
   third input is '12', etc.  The test sequence is repeated with security
   parameter "4" and with the output size set to "256".

% hash

 13af7619 ab98d4b5 f5e0a9e6 b26b5452
% hash
1
 578c83f8 8fe1f6a8 c119d2ba 3a9256c2
% hash
12
 255468d4 b4bd985b 696a7313 6027fc80
% hash
123
 f5339a52 9c4dafc5 34fe3f0d 7a66baf7
% hash
1234
 2645ff86 9a6c0ec6 5c49c20d d9050165
% hash
12345
 387d2929 8ed52ece 88e64f38 fe4fdb11
% hash
123456
 f29f8915 d23a0e02 838cc2e2 75f5dfe7
% hash
1234567
 4fb0f76e 9af16a2d 61844b9c e833e18f
% hash
12345678
 aacc56fc 85910fef e81fc697 6b061f4e
% hash
123456789
 e6997849 44ed68a1 c762ea1e 90c77967


% hash 4 256

 6c504351 ce7f4b7a 93adb29a f9781ff9 2150f157 fee18661 eef511a3 fc83ddf
% hash 4 256
1
 65d657f8 85ad8b4a b35999cc 3ded8b82 7cf71fa4 25424750 35778910 d6c2e320
% hash 4 256
12
 7636f3d1 af139cf9 58f46f99 66221282 a444732a 7de59da5 d3481c6b bd6e7092
% hash 4 256
123
 cd3c7163 5b14c7c2 c24be864 4baab592 b8ab5b99 91ee5ee5 b3cf7a7f c6426ad7
% hash 4 256
1234
 9ba783a1 290cb21e fe196a02 3286ece5 49394c75 1ddd607e 5d67c4dc 549c62eb
% hash 4 256
12345
 c9680da8 ef00d2f8 4459a8e9 b50ada71 c63cae6f dcb6f774 f6998783 30a4a1f4
% hash 4 256
123456
 7656d389 f980bbe8 94152abe c6dc5f16 faf21c60 3b8f5098 861acf3c c059467b
% hash 4 256
1234567
 d96eb599 8377bb1d 74a02a2f ac9a85 3175250e 4796af36 36609747 372bba80
% hash 4 256
12345678
 b7818f09 2118e98a 140af09a 6cca4e6f 1eba88e7 52c20174 653637c9 d628f33f
% hash 4 256
123456789
 c2242249 1187baaa 94725400 8dffd5b 38f01557 9f3f2390 50969991 fdc1a810
% 



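Note that 'unsigned long int' MUST be 32 bits.  The rotations in this code
   assume exactly 32 bits; on a machine where 'unsigned long int' is wider
   they will give wrong results.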

Implementor:  Ralph C. Merkle

*/




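#include <stdio.h>		/* printf */
#include <stdlib.h>		/* exit */
#include <unistd.h>		/* read, write */

#define inputFile  0		/* normally set up for standard in */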
#define errorFile  2		/* normally set up for standard error */

/* WARNING:  Changing the following parameter may affect security in
   non-obvious ways  */
#define inputBlockSize  16	/* size in 32-bit words of an input block to
				   the hash routine  */
#define maxOutputBlockSize  8	/* size in 32-bit words of largest output
				   block from the hash routine */



#define bufferSize  3072	/* MUST be 3 * 2**n,  n > 5  */
#define bufferSizeInWords  768	/* MUST be bufferSize/4  */
#define maxSBoxCount 8
#define maxSBoxCountDiv2 4	/* has to equal maxSBoxCount/2, of course */
#define maxWordCount 16		/* maximum valid value for wordCount  */

/* Note that the following variable should be set either to 4 or to 8.  */
int     outputBlockSize = 4;	/* default to normal 4 32-bit word or 128 bit
				   output size */

/* default to normal 48-byte chunk size (12 32-bit words). Must be
   inputBlockSize-outputBlockSize  */
int     chunkSize = 12;

/* initialize the default value for the security parameter */
int     securityLevel = 2;

int     shiftTable[4] = {16, 8, 16, 24};

typedef unsigned long int sBox[256];


/* The standard S boxes must be defined in another file */
extern sBox standardSBoxes[maxSBoxCount];


/* an array needed only for the fast version of HashN -- Hash512.   It's safe
   to ignore this array if you don't need to understand the speeded up
   version.  Note that the 32-bit word specified by
   'rotatedRightStandardSBoxes[i][j][k]' is rotated right by i*8 bits  */
sBox    rotatedRightStandardSBoxes[4][maxSBoxCount];



/* The following routine is a simple error exit routine  -- it prints a
   message and aborts */
void    errAbort (s)
	char   *s;
{
	int     length;

	for (length = 0; s[length] != 0; length++);
	if (write (errorFile, s, length) != length)
		exit (2);
	if (write (errorFile, "\n", 1) != 1)
		exit (2);
	exit (1);
}


/* The following routine converts a byte array to an array of unsigned long
   int.  It is primarily intended to eliminate the byte-ordering problem.
   VAXes order the bytes in a character array differently than SUNs do. Note
   that this routine will SLOW DOWN THE HASH FUNCTION */
void    convertBytes (buffer, wordBuffer)
	char    buffer[bufferSize];
unsigned long int wordBuffer[bufferSizeInWords];

{
	int     i;
	unsigned long int t0, t1, t2, t3;

	/* buffer --        an input buffer of characters
	
	wordBuffer --  an output buffer of unsigned long int's
	
	*/

	for (i = 0; i < bufferSizeInWords; i++) {

		t0 = buffer[4 * i];
		t1 = buffer[4 * i + 1];
		t2 = buffer[4 * i + 2];
		t3 = buffer[4 * i + 3];
		t0 &= 0xff;
		t1 &= 0xff;
		t2 &= 0xff;
		t3 &= 0xff;
		wordBuffer[i] = (t0 << 24) | (t1 << 16) | (t2 << 8) | t3;

	};

}
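
/* Illustrative sketch, not part of the original Xerox code.  It shows the
   same byte-to-word packing as convertBytes (first byte most significant)
   written against C99 <stdint.h>, which postdates this file.  The names
   load_be32 and bytes_to_words are hypothetical.  Taking the input as
   'unsigned char' is an assumption made here so that the '& 0xff' masking
   needed above for a possibly signed 'char' can be dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Pack 4 bytes into one 32-bit word, first byte most significant,
// independent of the host's byte order.
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

// Convert nwords * 4 bytes into nwords words.
static void bytes_to_words(const unsigned char *bytes, uint32_t *words,
                           size_t nwords)
{
    size_t i;
    assert(bytes != NULL && words != NULL);
    for (i = 0; i < nwords; i++)
        words[i] = load_be32(bytes + 4 * i);
}
```

   Because the word is assembled with shifts rather than by reinterpreting
   memory, the result is the same on a VAX-style and a SUN-style machine.
*/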



void    HashN (output, wordCount, input, localSecurityLevel)
	unsigned long int output[maxOutputBlockSize];
int     wordCount;
unsigned long int input[];
int     localSecurityLevel;

{

	/* Note that we are computing localSecurityLevel * wordCount * 4
	   rounds.  */

	unsigned long int mask;
	unsigned long int block[maxWordCount];	/* holds the array of data
						   being hashed  */
	unsigned long int SBoxEntry;	/* just a temporary */
	int     shift;
	int     i;
	int     index;
	int     next, last;
	int     byteInWord;


	mask = wordCount - 1;	/* presumes wordCount is a power of 2  */


	/* Test for various error conditions and logic problems  */

	if (localSecurityLevel * 2 > maxSBoxCount)
		errAbort ("Too few S-boxes");
	if (wordCount > maxWordCount)
		errAbort ("Logic error, wordCount > maxWordCount");
	if (wordCount != 16)
		errAbort ("Security warning, input size not equal to 512 bits");
	/* note that this routine will fail spectacularly on 8 byte blocks,
	   hence the following ban. */
	if (wordCount < 4)
		errAbort (" wordCount too small");
	if ((wordCount & mask) != 0)
		errAbort ("logic error, wordCount not a power of 2");
	if (outputBlockSize > wordCount)
		errAbort ("logic error, outputBlockSize is too big");
	if ((outputBlockSize != 4) && (outputBlockSize != 8))
		errAbort ("Output size neither 128 nor 256 bits");

	/* All the error conditions have now been checked -- everything should
	   work smoothly  */


	/* initialize the block to be encrypted from the input  */
	for (i = 0; i < wordCount; i++)
		block[i] = input[i];



	for (index = 0; index < localSecurityLevel; index++) {


		for (byteInWord = 0; byteInWord < 4; byteInWord++) {


			for (i = 0; i < wordCount; i++) {
				next = (i + 1) & mask;
				last = (i + mask) & mask;	/* really last = (i-1)
								   MOD wordCount */

				SBoxEntry = standardSBoxes[2 * index + ((i / 2) & 1)][block[i] & 0xff];
				block[next] ^= SBoxEntry;
				block[last] ^= SBoxEntry;
			};


			/* Rotate right all 32-bit words in the entire block
			   at once.  */
			shift = shiftTable[byteInWord];
			for (i = 0; i < wordCount; i++)
				block[i] = (block[i] >> shift) | (block[i] << (32 - shift));


		};		/* end of byteInWord going from 0 to 3 */


	};			/* end of index going from 0 to
				   localSecurityLevel-1 */



	for (i = 0; i < outputBlockSize; i++)
		output[i] = input[i] ^ block[mask - i];


}
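
/* Illustrative sketch, not part of the original Xerox code.  It isolates
   the rotate-right step used in HashN and shows why the four shifts in
   shiftTable (16, 8, 16, 24) visit every byte of a word once and then
   restore the original alignment: they sum to 64, i.e. two full turns.
   The names shift_table, rotr32 and low_byte_at_pass are hypothetical, and
   the sketch assumes C99 <stdint.h> so that words are exactly 32 bits.

```c
#include <assert.h>
#include <stdint.h>

// Same four values as shiftTable in this file.
static const unsigned shift_table[4] = {16, 8, 16, 24};

// Rotate a 32-bit word right by 'shift' bits (reduced modulo 32).
static uint32_t rotr32(uint32_t x, unsigned shift)
{
    shift &= 31u;
    return (x >> shift) | (x << ((32u - shift) & 31u));
}

// Which byte of the original word (0 = least significant) sits in the
// low 8 bits at the start of pass 'pass', i.e. after 'pass' rotations.
static unsigned low_byte_at_pass(unsigned pass)
{
    unsigned total = 0, i;

    assert(pass <= 4);
    for (i = 0; i < pass; i++)
        total += shift_table[i];
    return (total / 8) % 4;
}
```

   The byte order this gives for passes 0 to 3 is 0, 2, 3, 1, which matches
   the shifts Hash512 applies in its four unrolled sections (none, >> 16,
   >> 24, >> 8).
*/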




void    Hash512 (output, input, localSecurityLevel)
	unsigned long int output[maxOutputBlockSize];
unsigned long int input[];
int     localSecurityLevel;

{

	/* This routine is a specialized version of HashN.  It is optimized
	   for speed, and assumes that the input is always 16 words long
	   It hashes 512 bits, hence its name.  */
	/* You need not try to figure out this routine unless you wish to
	   figure out a fast implementation of Snefru  */

	/* the following are two pointers to S-boxes  */
	unsigned long int *SBox0;
	unsigned long int *SBox1;


	/* the array 'block' is divided into 16 distinct variables */
	unsigned long int block0, block1, block2, block3;
	unsigned long int block4, block5, block6, block7;
	unsigned long int block8, block9, block10, block11;
	unsigned long int block12, block13, block14, block15;

	unsigned long int SBoxEntry;	/* just a temporary */
	int     index;

	if (inputBlockSize != 16)
		errAbort ("Hash512 called with inputBlockSize != 16");
	if ((outputBlockSize != 4) && (outputBlockSize != 8))
		errAbort ("Hash512 called with outputBlockSize != 4 or 8");

	/* initialize the block to be encrypted from the input  */
	/* Note that in theory block<i> should be kept in register.  Not all
	   compilers can do this, even when there are enough registers --
	   this will degrade performance significantly.  */
	block0 = input[0];
	block1 = input[1];
	block2 = input[2];
	block3 = input[3];
	block4 = input[4];
	block5 = input[5];
	block6 = input[6];
	block7 = input[7];
	block8 = input[8];
	block9 = input[9];
	block10 = input[10];
	block11 = input[11];
	block12 = input[12];
	block13 = input[13];
	block14 = input[14];
	block15 = input[15];



	for (index = 0; index < 2 * localSecurityLevel; index += 2) {


		/* set up the base address for the two S-box pointers.  */
		SBox0 = rotatedRightStandardSBoxes[0][index];
		SBox1 = SBox0 + 256;


		/* In the following unrolled code, the basic 'assembly
		   language' block that is repeated is:
		
		1    temp1 = shift(block<i>, shiftConstant)
		2    temp2 = temp1 & 0x3fc
		3    temp3 = SBox<0 or 1> + temp2
		4    temp4 = *temp3
		5    block<i-1> ^= temp4
		6    block<i+1> ^= temp4
		
		In step 1, we simply shift the ith 32-bit block to bring the
		   8-bit byte into the right position.  Note that we will
		   also build-in a left-shift by 2 bits at this stage, to
		   eliminate the left shift required later because we are
		   indexing into an array of 4-byte table entries.
		
		In step 2, we mask off the desired 8 bits.  Note that 0x3fc
		   is simply 0xff << 2.
		
		In step 3, we use a normal integer add to compute the actual
		   address of the S-box entry.  Note that one of two pointers
		   is used, as appropriate.  Temp3 then holds the actual byte
		   address of the desired S-box entry.
		
		In step 4, we load the 4-byte S-box entry.
		
		In steps 5 and 6, we XOR the loaded S-box entry with both the
		   previous and the next 32-bit entries in the 'block' array.
		
		Typical optimizing compilers might fail to put all the
		   block<i> variables into registers. This can result in
		   significant performance degradation. Also, most compilers
		   will use a separate left-shift-by-2 after masking off the
		   needed 8 bits, but the performance degradation caused by
		   this oversight should be modest.
		
		*/

		SBoxEntry = SBox0[block0 & 0xff];
		block1 ^= SBoxEntry;
		block15 ^= SBoxEntry;
		SBoxEntry = SBox0[block1 & 0xff];
		block2 ^= SBoxEntry;
		block0 ^= SBoxEntry;
		SBoxEntry = SBox1[block2 & 0xff];
		block3 ^= SBoxEntry;
		block1 ^= SBoxEntry;
		SBoxEntry = SBox1[block3 & 0xff];
		block4 ^= SBoxEntry;
		block2 ^= SBoxEntry;
		SBoxEntry = SBox0[block4 & 0xff];
		block5 ^= SBoxEntry;
		block3 ^= SBoxEntry;
		SBoxEntry = SBox0[block5 & 0xff];
		block6 ^= SBoxEntry;
		block4 ^= SBoxEntry;
		SBoxEntry = SBox1[block6 & 0xff];
		block7 ^= SBoxEntry;
		block5 ^= SBoxEntry;
		SBoxEntry = SBox1[block7 & 0xff];
		block8 ^= SBoxEntry;
		block6 ^= SBoxEntry;
		SBoxEntry = SBox0[block8 & 0xff];
		block9 ^= SBoxEntry;
		block7 ^= SBoxEntry;
		SBoxEntry = SBox0[block9 & 0xff];
		block10 ^= SBoxEntry;
		block8 ^= SBoxEntry;
		SBoxEntry = SBox1[block10 & 0xff];
		block11 ^= SBoxEntry;
		block9 ^= SBoxEntry;
		SBoxEntry = SBox1[block11 & 0xff];
		block12 ^= SBoxEntry;
		block10 ^= SBoxEntry;
		SBoxEntry = SBox0[block12 & 0xff];
		block13 ^= SBoxEntry;
		block11 ^= SBoxEntry;
		SBoxEntry = SBox0[block13 & 0xff];
		block14 ^= SBoxEntry;
		block12 ^= SBoxEntry;
		SBoxEntry = SBox1[block14 & 0xff];
		block15 ^= SBoxEntry;
		block13 ^= SBoxEntry;
		SBoxEntry = SBox1[block15 & 0xff];
		block0 ^= SBoxEntry;
		block14 ^= SBoxEntry;

		/* SBox0 = rotatedRightStandardSBoxes[2][index];  */
		SBox0 += 2 * maxSBoxCount * 256;
		SBox1 = SBox0 + 256;

		SBoxEntry = SBox0[(block0 >> 16) & 0xff];
		block1 ^= SBoxEntry;
		block15 ^= SBoxEntry;
		SBoxEntry = SBox0[(block1 >> 16) & 0xff];
		block2 ^= SBoxEntry;
		block0 ^= SBoxEntry;
		SBoxEntry = SBox1[(block2 >> 16) & 0xff];
		block3 ^= SBoxEntry;
		block1 ^= SBoxEntry;
		SBoxEntry = SBox1[(block3 >> 16) & 0xff];
		block4 ^= SBoxEntry;
		block2 ^= SBoxEntry;
		SBoxEntry = SBox0[(block4 >> 16) & 0xff];
		block5 ^= SBoxEntry;
		block3 ^= SBoxEntry;
		SBoxEntry = SBox0[(block5 >> 16) & 0xff];
		block6 ^= SBoxEntry;
		block4 ^= SBoxEntry;
		SBoxEntry = SBox1[(block6 >> 16) & 0xff];
		block7 ^= SBoxEntry;
		block5 ^= SBoxEntry;
		SBoxEntry = SBox1[(block7 >> 16) & 0xff];
		block8 ^= SBoxEntry;
		block6 ^= SBoxEntry;
		SBoxEntry = SBox0[(block8 >> 16) & 0xff];
		block9 ^= SBoxEntry;
		block7 ^= SBoxEntry;
		SBoxEntry = SBox0[(block9 >> 16) & 0xff];
		block10 ^= SBoxEntry;
		block8 ^= SBoxEntry;
		SBoxEntry = SBox1[(block10 >> 16) & 0xff];
		block11 ^= SBoxEntry;
		block9 ^= SBoxEntry;
		SBoxEntry = SBox1[(block11 >> 16) & 0xff];
		block12 ^= SBoxEntry;
		block10 ^= SBoxEntry;
		SBoxEntry = SBox0[(block12 >> 16) & 0xff];
		block13 ^= SBoxEntry;
		block11 ^= SBoxEntry;
		SBoxEntry = SBox0[(block13 >> 16) & 0xff];
		block14 ^= SBoxEntry;
		block12 ^= SBoxEntry;
		SBoxEntry = SBox1[(block14 >> 16) & 0xff];
		block15 ^= SBoxEntry;
		block13 ^= SBoxEntry;
		SBoxEntry = SBox1[(block15 >> 16) & 0xff];
		block0 ^= SBoxEntry;
		block14 ^= SBoxEntry;


		/* SBox0 = rotatedRightStandardSBoxes[1][index];  */
		SBox0 -= maxSBoxCount * 256;
		SBox1 = SBox0 + 256;

		SBoxEntry = SBox0[block0 >> 24];
		block1 ^= SBoxEntry;
		block15 ^= SBoxEntry;
		SBoxEntry = SBox0[block1 >> 24];
		block2 ^= SBoxEntry;
		block0 ^= SBoxEntry;
		SBoxEntry = SBox1[block2 >> 24];
		block3 ^= SBoxEntry;
		block1 ^= SBoxEntry;
		SBoxEntry = SBox1[block3 >> 24];
		block4 ^= SBoxEntry;
		block2 ^= SBoxEntry;
		SBoxEntry = SBox0[block4 >> 24];
		block5 ^= SBoxEntry;
		block3 ^= SBoxEntry;
		SBoxEntry = SBox0[block5 >> 24];
		block6 ^= SBoxEntry;
		block4 ^= SBoxEntry;
		SBoxEntry = SBox1[block6 >> 24];
		block7 ^= SBoxEntry;
		block5 ^= SBoxEntry;
		SBoxEntry = SBox1[block7 >> 24];
		block8 ^= SBoxEntry;
		block6 ^= SBoxEntry;
		SBoxEntry = SBox0[block8 >> 24];
		block9 ^= SBoxEntry;
		block7 ^= SBoxEntry;
		SBoxEntry = SBox0[block9 >> 24];
		block10 ^= SBoxEntry;
		block8 ^= SBoxEntry;
		SBoxEntry = SBox1[block10 >> 24];
		block11 ^= SBoxEntry;
		block9 ^= SBoxEntry;
		SBoxEntry = SBox1[block11 >> 24];
		block12 ^= SBoxEntry;
		block10 ^= SBoxEntry;
		SBoxEntry = SBox0[block12 >> 24];
		block13 ^= SBoxEntry;
		block11 ^= SBoxEntry;
		SBoxEntry = SBox0[block13 >> 24];
		block14 ^= SBoxEntry;
		block12 ^= SBoxEntry;
		SBoxEntry = SBox1[block14 >> 24];
		block15 ^= SBoxEntry;
		block13 ^= SBoxEntry;
		SBoxEntry = SBox1[block15 >> 24];
		block0 ^= SBoxEntry;
		block14 ^= SBoxEntry;


		/* SBox0 = rotatedRightStandardSBoxes[3][index];  */
		SBox0 += 2 * maxSBoxCount * 256;
		SBox1 = SBox0 + 256;

		SBoxEntry = SBox0[(block0 >> 8) & 0xff];
		block1 ^= SBoxEntry;
		block15 ^= SBoxEntry;
		SBoxEntry = SBox0[(block1 >> 8) & 0xff];
		block2 ^= SBoxEntry;
		block0 ^= SBoxEntry;
		SBoxEntry = SBox1[(block2 >> 8) & 0xff];
		block3 ^= SBoxEntry;
		block1 ^= SBoxEntry;
		SBoxEntry = SBox1[(block3 >> 8) & 0xff];
		block4 ^= SBoxEntry;
		block2 ^= SBoxEntry;
		SBoxEntry = SBox0[(block4 >> 8) & 0xff];
		block5 ^= SBoxEntry;
		block3 ^= SBoxEntry;
		SBoxEntry = SBox0[(block5 >> 8) & 0xff];
		block6 ^= SBoxEntry;
		block4 ^= SBoxEntry;
		SBoxEntry = SBox1[(block6 >> 8) & 0xff];
		block7 ^= SBoxEntry;
		block5 ^= SBoxEntry;
		SBoxEntry = SBox1[(block7 >> 8) & 0xff];
		block8 ^= SBoxEntry;
		block6 ^= SBoxEntry;
		SBoxEntry = SBox0[(block8 >> 8) & 0xff];
		block9 ^= SBoxEntry;
		block7 ^= SBoxEntry;
		SBoxEntry = SBox0[(block9 >> 8) & 0xff];
		block10 ^= SBoxEntry;
		block8 ^= SBoxEntry;
		SBoxEntry = SBox1[(block10 >> 8) & 0xff];
		block11 ^= SBoxEntry;
		block9 ^= SBoxEntry;
		SBoxEntry = SBox1[(block11 >> 8) & 0xff];
		block12 ^= SBoxEntry;
		block10 ^= SBoxEntry;
		SBoxEntry = SBox0[(block12 >> 8) & 0xff];
		block13 ^= SBoxEntry;
		block11 ^= SBoxEntry;
		SBoxEntry = SBox0[(block13 >> 8) & 0xff];
		block14 ^= SBoxEntry;
		block12 ^= SBoxEntry;
		SBoxEntry = SBox1[(block14 >> 8) & 0xff];
		block15 ^= SBoxEntry;
		block13 ^= SBoxEntry;
		SBoxEntry = SBox1[(block15 >> 8) & 0xff];
		block0 ^= SBoxEntry;
		block14 ^= SBoxEntry;



	};			/* end of index going from 0 to
				   2*localSecurityLevel-2 in steps of 2 */


	output[0] = input[0] ^ block15;
	output[1] = input[1] ^ block14;
	output[2] = input[2] ^ block13;
	output[3] = input[3] ^ block12;

	/* generate an extra 128 bits if the output is 256 bits */
	if (outputBlockSize == 8) {
		output[4] = input[4] ^ block11;
		output[5] = input[5] ^ block10;
		output[6] = input[6] ^ block9;
		output[7] = input[7] ^ block8;
	};


}


/* Hash512twice does exactly that.  It hashes 512 bits using both Hash512 and
   HashN.  The output of the two implementations is then compared, and if
   they differ an error message is issued. This ensures that two quite
   different C routines compute the same value, and is quite
   helpful in catching bugs, implementation errors, etc.

It should be particularly helpful to the implementor trying to "tune" the
   hash function for speed, as it provides a constant check on correctness.

*/

void    Hash512twice (output, input, localSecurityLevel)
	unsigned long int output[maxOutputBlockSize];
unsigned long int input[];
int     localSecurityLevel;

{
	unsigned long int out1[maxOutputBlockSize];
	unsigned long int out2[maxOutputBlockSize];
	int     i;

	HashN (out1, inputBlockSize, input, localSecurityLevel);
	Hash512 (out2, input, localSecurityLevel);
	for (i = 0; i < outputBlockSize; i++)
		if (out1[i] != out2[i])
			errAbort (" Hash512 and HashN differ");
	for (i = 0; i < outputBlockSize; i++)
		output[i] = out1[i];
}



/* The main program reads the input, hashes it, and prints the result.  Much
   of the logic in the main program is taken up with the trivia of buffer
   management, error checking, command-line parameter checking, self-tests,
   and the like. The actual use of the hash function occupies a modest
   portion of the overall program.

The basic idea is simple.  As an example, if H is the hash function that
   produces either 128-bit (or 256-bit) outputs, and if we pick an input
   string that is 3 "chunks" long then we are computing:

output = H( H(  H(  H(
			0 || chunk[0])  || chunk[1])  || chunk[2]) || bit-length)

"bit-length" is a "chunk" sized field into which has been put the length of
   the input, in bits, right justified.  Note that the size of a "chunk" is
   just the input size minus the output size.

"0" is a vector of 0 bits of the same size (in bits) as the output of H
   (i.e., either 128 or 256 bits).

"||" is the concatenation operator, and is used to concatenate the output
   field of the preceding computation of H with the next "chunk" of bits from
   the input.

"chunk" is an array which holds the input string.  The final element of the
   array is left justified and zero-filled on the right.
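
As an illustration only (not part of the original code), the chaining
   described above can be sketched for the 128-bit case as follows.  The
   names toy_H and chain_hash are hypothetical, and toy_H is a stand-in
   mixing function, not Snefru and not secure; only the way the chunks and
   the bit-length are fed through it follows the description.  C99 is
   assumed (for <stdint.h> and '//' comments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 16                         // 512-bit input to H
#define OUT_WORDS    4                         // 128-bit output of H
#define CHUNK_WORDS (BLOCK_WORDS - OUT_WORDS)  // 12 words = 48 bytes

// Stand-in for H. NOT Snefru and NOT secure: it just mixes all 16 input
// words into OUT_WORDS output words so the chaining can be exercised.
static void toy_H(uint32_t out[OUT_WORDS], const uint32_t in[BLOCK_WORDS])
{
    uint32_t h = 2166136261u, tmp[OUT_WORDS];
    size_t i;

    for (i = 0; i < BLOCK_WORDS; i++)
        h = (h ^ in[i]) * 16777619u;
    for (i = 0; i < OUT_WORDS; i++) {
        h = (h ^ (uint32_t)i) * 16777619u;
        tmp[i] = h ^ (h >> 15);
    }
    memcpy(out, tmp, sizeof tmp);   // written last, so out may alias in
}

// output = H( ... H(H(0 || chunk[0]) || chunk[1]) ... || bit-length)
static void chain_hash(uint32_t out[OUT_WORDS], const unsigned char *msg,
                       size_t len)
{
    uint32_t block[BLOCK_WORDS] = {0};  // first OUT_WORDS words: running hash
    uint64_t bits = (uint64_t)len * 8;
    size_t off, i, b;

    assert(msg != NULL || len == 0);
    for (off = 0; off < len; off += 4 * CHUNK_WORDS) {
        // Pack the next chunk, left justified and zero-filled on the right.
        for (i = 0; i < CHUNK_WORDS; i++) {
            uint32_t w = 0;
            for (b = 0; b < 4; b++) {
                size_t p = off + 4 * i + b;
                w = (w << 8) | (p < len ? msg[p] : 0u);
            }
            block[OUT_WORDS + i] = w;
        }
        toy_H(block, block);  // result overwrites the first OUT_WORDS words
    }

    // Final block: running hash, zeros, then the 64-bit bit count.
    for (i = OUT_WORDS; i < BLOCK_WORDS; i++)
        block[i] = 0;
    block[BLOCK_WORDS - 2] = (uint32_t)(bits >> 32);
    block[BLOCK_WORDS - 1] = (uint32_t)bits;
    toy_H(out, block);
}
```

Hashing the 3 bytes 'abc' and the 4 bytes 'abc' plus a zero byte fills the
   same chunk identically, so only the bit-length field in the final block
   tells the two inputs apart.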

*/

int     main (argc, argv)
	int     argc;
	char   *argv[];

{

	int     i;
	char    buffer[bufferSize];
	int     hitEOF = 0;	/* 0 means we haven't hit EOF, 1 means we
				   have */
	unsigned long int wordBuffer[bufferSizeInWords];
	unsigned long int hash[maxOutputBlockSize];
	unsigned long int hashArray[inputBlockSize];
	unsigned long int bitCount[2];	/* the 64-bit count of the number of
					   bits in the input */
	int     byteCount;	/* the count of the number of bytes we have
				   in the buffer */
	int     dataLoc;	/* the location of the next block of data we
				   wish to hash down.  Note this is the index
				   into an array of 32-bit elements, not
				   bytes */



	/* self-test, to make sure everything is okay.  */
	/* First, test the standard S boxes to make sure they haven't been
	   damaged.  */
	/* Test to make sure each column is a permutation.  */
	for (i = 0; i < maxSBoxCount; i++) {
		char    testArray[256];
		int     testShift = 0;
		int     j;

		for (testShift = 0; testShift < 32; testShift += 8) {
			for (j = 0; j < 256; j++)
				testArray[j] = 0;
			for (j = 0; j < 256; j++)
				testArray[(standardSBoxes[i][j] >> testShift) & 0xff]++;
			for (j = 0; j < 256; j++)
				if (testArray[j] != 1)
					errAbort ("Table error -- the standard S box is corrupted");
		};
	};
	/* Okay, the standard S-box hasn't been damaged  */
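
	/* Illustrative sketch, not part of the original Xerox code.  It
	   expresses the column test above as a stand-alone function.  The
	   name column_is_permutation is hypothetical, the sketch uses a
	   'seen' flag per value instead of the counters used above, and it
	   assumes C99 <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Return 1 if byte column 'shift' (0, 8, 16 or 24) of a 256-entry
// table takes every value 0..255 exactly once, else 0.
static int column_is_permutation(const uint32_t table[256], unsigned shift)
{
    unsigned char seen[256] = {0};
    size_t j;

    assert(shift < 32 && shift % 8 == 0);
    for (j = 0; j < 256; j++) {
        unsigned byte = (table[j] >> shift) & 0xffu;
        if (seen[byte])
            return 0;     // a repeated value means another one is missing
        seen[byte] = 1;
    }
    return 1;
}
```

	   With 256 entries and 256 possible byte values, "no value appears
	   twice" is equivalent to "every value appears exactly once".
	*/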



	/* Set up the rotated array for the fast hash routine  */
	{
		int     index;	/* ranges over the S box indices  */
		int     rotation;	/* ranges over the four possible
					   byte-rotations  */
		int     i;	/* ranges over the 256 possible S-box entries    */

		for (index = 0; index < maxSBoxCount; index++)
			for (rotation = 0; rotation < 4; rotation++)
				for (i = 0; i < 256; i++)
					/* "& 31" keeps the left shift at 0 rather than an
					   undefined 32 when rotation is 0 */
					rotatedRightStandardSBoxes[rotation][index][i] =
						(standardSBoxes[index][i] >> (rotation * 8)) |
						(standardSBoxes[index][i] << ((32 - rotation * 8) & 31));


	};



	/* Now try hashing something.  Note that we're testing both HashN and
	   Hash512 here  */
	{
		unsigned long int testInput[inputBlockSize];
		unsigned long int testOutput[maxOutputBlockSize];
		int     j;
		int	k;

		/* Set output size to 256 bits -- just for this test routine */
		outputBlockSize = maxOutputBlockSize;
		chunkSize = inputBlockSize - outputBlockSize;

		if (maxOutputBlockSize != 8)
			errAbort ("The output block size has changed, update the self-test");
		if (inputBlockSize != 16)
			errAbort ("The input block size has changed, update the self-test");
		if (maxSBoxCount != 8)
			errAbort ("Wrong number of S boxes, update the self-test");

		for (i = 0; i < inputBlockSize; i++)
			testInput[i] = 0;	/* zero the input */
		k = 0;  /*  zero the pointer into the input buffer */
		for (i = 0; i < 50; i++) {
			Hash512twice (testOutput, testInput, 4);
			/*	Copy the output into a new slot in the input buffer */
			for (j = 0; j < maxOutputBlockSize; j++)
				testInput[k+j] = testOutput[j];
			k += maxOutputBlockSize;
				/*	reset pointer into input buffer
					if it might overflow next time */
			if ( (k+maxOutputBlockSize) > inputBlockSize) k=0;
		};
		if ((testOutput[0] != 1967985403) || (testOutput[1] != 2688653266) ||
		    (testOutput[2] != 3911883902) || (testOutput[3] != 1117323144) ||
		    (testOutput[4] != 4238876879) || (testOutput[5] != 877649382) ||
		    (testOutput[6] != 1112396746) || (testOutput[7] != 324992866)
			)
			errAbort ("Test hash of 64 bytes of 0 failed");
	};
	/* Okay, we can hash at least 50  64-byte values correctly.  */



	outputBlockSize = 4;	/* default is 4 32-bit words, or 128 bits */
	chunkSize = inputBlockSize - outputBlockSize;

	/* Check command line arguments */
	if (argc == 3) {
		/* Two arguments -- generate the Large Economy Size Output of
		   256 bits */
		outputBlockSize = 8;	/* 8  32-bit words */
		chunkSize = inputBlockSize - outputBlockSize;
		if ((argv[2][0] != '2') ||
		    (argv[2][1] != '5') ||
		    (argv[2][2] != '6') ||
		    (argv[2][3] != 0))
			errAbort ("usage:  snefru <security level> [256]");
		argc--;
	};

	if ((bufferSizeInWords % chunkSize) != 0)
		errAbort ("Buffer size is fouled up");

	if (argc > 2)
		errAbort ("usage:  snefru [ <security level> [ <output size> ] ]");
	else if (argc == 2) {
		securityLevel = 0;
		if (argv[1][0] == '2')
			securityLevel = 2;
		else if (argv[1][0] == '3')
			securityLevel = 3;
		else if (argv[1][0] == '4')
			securityLevel = 4;

		if ((argv[1][1] != 0) || (securityLevel == 0))
			errAbort ("The security level can only be 2, 3, or 4 (4 is most secure).");
	};


	/* Apply the one-way hash function to the standard input  */


	bitCount[0] = 0;
	bitCount[1] = 0;
	byteCount = read (inputFile, buffer, bufferSize);
	if (byteCount < 0)
		errAbort ("Error on first read from standard input");
	if (byteCount != bufferSize)
		hitEOF = 1;	/* set EOF flag true if didn't fill buffer */


	bitCount[1] += byteCount * 8;	/* increment the total number of bits
					   read */
	/* bump the upper 32 bits when the lower 32-bits wraps around */
	if (bitCount[1] < (byteCount * 8))
		bitCount[0] += 1;
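
	/* Illustrative sketch, not part of the original Xerox code.  It
	   isolates the carry test used above: after an unsigned 32-bit
	   addition, the sum is smaller than the addend exactly when the
	   addition wrapped around.  The name add_bits is hypothetical and
	   the sketch assumes C99 <stdint.h>.

```c
#include <assert.h>
#include <stdint.h>

// Add nbytes * 8 to a 64-bit count held as two 32-bit words:
// count[0] is the upper half, count[1] the lower half.
static void add_bits(uint32_t count[2], uint32_t nbytes)
{
    uint32_t nbits;

    assert(nbytes <= 0x1fffffffu);   // so that nbytes * 8 fits in 32 bits
    nbits = nbytes * 8u;
    count[1] += nbits;               // wraps modulo 2**32 on overflow
    if (count[1] < nbits)            // wrapped: the sum is below the addend
        count[0] += 1;
}
```

	   For example, starting from a lower word of 0xfffffff8, adding one
	   more byte (8 bits) leaves the lower word at 0 and bumps the upper
	   word to 1.
	*/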

	/* zero out rest of buffer  */
	for (i = byteCount; i < bufferSize; i++)
		buffer[i] = 0;

	/* following conversion required for machines with byte-ordering
	   unlike the SUN */
	convertBytes (buffer, wordBuffer);
	dataLoc = 0;		/* 4-byte-word location in data input buffer */



	/* Now we hash it down with multiple applications of Hash512  */

	for (i = 0; i < inputBlockSize; i++)
		hashArray[i] = 0;	/* initialize hashArray  */

	/* Hash each chunk in the input (either 48-byte chunks or 32-byte
	   chunks)  -- and keep the result in hashArray.  Note that the first
	   16 (32) bytes of hashArray hold the output of the previous hash
	   computation.  */

	while (byteCount > 0) {
		if (dataLoc + chunkSize > bufferSizeInWords)
			errAbort ("logic error, buffer over run");

		/* Get the next chunk */
		for (i = 0; i < chunkSize; i++)
			hashArray[outputBlockSize + i] = wordBuffer[dataLoc + i];
		/* and hash it in */
		Hash512 (hashArray, hashArray, securityLevel);

		/* increment index to next input chunk (48 or 32 bytes) */
		dataLoc += chunkSize;


		/* decrement the number of bytes left to process */
		/* Yes, if byteCount was less than chunkSize * 4 (the chunk
		   size in bytes) this might make byteCount negative. It's
		   ok, it's caught in the following "if" statement */
		byteCount -= chunkSize * 4;

		/* Out of data -- read some more */
		if (byteCount <= 0) {
			if (hitEOF == 1)
				byteCount = 0;
			else {
				if (byteCount != 0)
					errAbort ("Logic error near EOF");
				byteCount = read (inputFile, buffer, bufferSize);
				if (byteCount < 0)
					errAbort ("Error while reading from input");
				if (byteCount != bufferSize)
					hitEOF = 1;
			};

			bitCount[1] += byteCount * 8;	/* increment the
							   bit-count */
			/* bump the upper 32 bits when the lower 32-bits
			   wraps around */
			if (bitCount[1] < (byteCount * 8))
				bitCount[0] += 1;

			/* zero out rest of buffer */
			for (i = byteCount; i < bufferSize; i++)
				buffer[i] = 0;

			convertBytes (buffer, wordBuffer);
			dataLoc = 0;
		};
	};			/* end of while */


	/*  Zero out the remainder of hashArray  */
	for (i = 0; i < chunkSize; i++)
			hashArray[outputBlockSize + i] = 0;

	/* Put the 64-bit bit-count into the final 64-bits of the block about
	   to be hashed */
	hashArray[inputBlockSize - 2] = bitCount[0];	/* upper 32 bits of
							   count */
	hashArray[inputBlockSize - 1] = bitCount[1];	/* lower 32 bits of
							   count */

	/* Final hash down.  */
	Hash512 (hash, hashArray, securityLevel);

	/* 'hash' now holds the hashed result, which is printed on standard
	   output */
	for (i = 0; i < outputBlockSize; i++)
		printf (" %lx", hash[i]);
	printf ("\n");
	exit (0);
}
