M8, RS8, IS8 -
Three variable block size encryption functions.


Leslie R. McBride
104 Westwood Dr. Apt. 355
Lafayette, LA, USA 70506
E-Mail: macck@linknet.net

Abstract. Most secret key encryption methods use a small encryption 
block, e.g. DES = 64 bits, RC5 = 32, 64 or 128 bits, Blowfish = 64 bits, 
IDEA = 64 bits. These algorithms are normally used in feedback modes to 
encrypt large blocks. This paper proposes three ciphers suitable for 
encrypting large data blocks of variable length. These ciphers are product 
ciphers consisting of three different cipher block chaining methods over a 
simple substitution. They are redesigned versions of SCOTT16 (now SCOTT8) 
with a new key scheduling method, improved chaining, and a smaller lookup 
table, based on discussions in the sci.crypt Usenet newsgroup.

1. Terminology.

     a. All descriptions will be in pseudo-C code.
     b. A[B] will be the (B+1)th element of the array A[0..x].
     c. A(B1,B2,...) will be a function A of B1,B2,...
     d. A byte is defined as 8 bits.
     e. The most significant bit of a word is assumed to be on the left,
        i.e. 76543210 where 7 is most significant.
     f. The most significant byte of a word will be stored higher in 
        memory than the lower bytes, i.e. fedcba9876543210 will be stored 
        as 76543210 fedcba98. This is "little endian" byte order, the 
        standard Intel x86 format.
     g. L is the index of the last element in an array a[0..L].
     h. P[n] is plaintext before a round of encipherment.
     i. C[n] is ciphertext after a round of encipherment.
     j. ^ is bitwise addition modulo 2 (exclusive or).


2. Description of the basic round functions.

     P[n] and C[n] are each 1 byte long for these descriptions.
     S[n] and IS[n] are 256 element substitution arrays.
     n=IS[S[n]], that is, the IS array is the inverse of S.

     a. Cipher Block Chaining (CBC):

            Encryption:             Decryption:
            C[0] = E(P[0]^IV)       P[0] = D(C[0])^IV
     (n>0)  C[n] = E(P[n]^C[n-1])   P[n] = D(C[n])^C[n-1]

     b. Simple Substitution:

            Encryption:             Decryption:
            C[n]=S[P[n]]            P[n]=IS[C[n]]

     c. SCOTT8

        The SCOTT8 algorithm requires decryption last byte first.

             Encryption:                      Decryption:
     (n=0)   C[0] = S(P[0]^P[L-1]^P[1])       P[0] = IS(C[0])^P[L-1]^P[1]
     (0<n<L) C[n] = S(P[n]^C[n-1]^P[n+1])     P[n] = IS(C[n])^C[n-1]^P[n+1]
     (n=L)   C[L] = S(P[L]^C[L-1]^C[0])       P[L] = IS(C[L])^C[L-1]^C[0]


     d. RS8

        RS8 is simply repeated CBC over a substitution table.
        The IV has been replaced by the last plaintext block.

            Encryption:             Decryption:
     (n=0)  C[0] = S(P[0]^P[L])     P[0] = IS(C[0])^P[L]
     (n>0)  C[n] = S(P[n]^C[n-1])   P[n] = IS(C[n])^C[n-1]
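
     The RS8 round above can be sketched in C as follows. This is only
     an illustration, not the reference implementation: the function
     names and the in-place buffer layout are assumptions, and the demo
     uses a toy table rather than a key-derived one.

```c
#include <stddef.h>

typedef unsigned char BYTE;

/* One RS8 encryption round, in place. len must be at least 2. */
void rs8_encrypt_round(const BYTE S[256], BYTE *buf, size_t len)
{
    size_t n, L = len - 1;            /* L is the index of the last byte */

    buf[0] = S[buf[0] ^ buf[L]];      /* last plaintext byte replaces the IV */
    for (n = 1; n <= L; n++)
        buf[n] = S[buf[n] ^ buf[n - 1]];
}

/* One RS8 decryption round, in place. */
void rs8_decrypt_round(const BYTE IS[256], BYTE *buf, size_t len)
{
    size_t n, L = len - 1;

    /* Walk backwards so buf[n-1] is still ciphertext when it is used. */
    for (n = L; n >= 1; n--)
        buf[n] = IS[buf[n]] ^ buf[n - 1];
    buf[0] = IS[buf[0]] ^ buf[L];     /* P[L] has been recovered by now */
}

/* Hypothetical self-check: returns 1 if four rounds round-trip. */
int rs8_demo(void)
{
    BYTE S[256], IS[256], msg[5] = {'h', 'e', 'l', 'l', 'o'}, ref[5];
    int i, r;

    for (i = 0; i < 256; i++) S[i] = (BYTE)(i * 7 + 3);  /* toy bijection */
    for (i = 0; i < 256; i++) IS[S[i]] = (BYTE)i;
    for (i = 0; i < 5; i++) ref[i] = msg[i];

    for (r = 0; r < 4; r++) rs8_encrypt_round(S, msg, 5);
    for (r = 0; r < 4; r++) rs8_decrypt_round(IS, msg, 5);

    for (i = 0; i < 5; i++)
        if (msg[i] != ref[i]) return 0;
    return 1;
}
```

     Unlike SCOTT8 and IS8, the bytes for n>0 depend only on
     ciphertext, so the backward walk here is a convenience of working
     in place rather than a requirement of the cipher.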

     e. IS8:

        IS8 is a modified version of SCOTT8. It also requires decryption
        of the last byte first. + and - outside of the array references
        are both modulo 256.

             Encryption:                     Decryption:
     (n=0)   C[0] = S((P[0]^P[L-1])+P[1])    P[0] = (IS(C[0])-P[1])^P[L-1]
     (0<n<L) C[n] = S((P[n]^C[n-1])+P[n+1])  P[n] = (IS(C[n])-P[n+1])^C[n-1]
     (n=L)   C[L] = S((P[L]^C[L-1])+C[0])    P[L] = (IS(C[L])-C[0])^C[L-1]
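
     A minimal C sketch of one IS8 round, following the formulas above
     exactly as printed (including P[L-1] in the n=0 case). The
     function names are illustrative, the table in the demo is a toy
     bijection, and the casts to BYTE supply the modulo 256 arithmetic.

```c
#include <stddef.h>

typedef unsigned char BYTE;

/* One IS8 encryption round, in place. len must be at least 3. */
void is8_encrypt_round(const BYTE S[256], BYTE *buf, size_t len)
{
    size_t n, L = len - 1;

    buf[0] = S[(BYTE)((buf[0] ^ buf[L - 1]) + buf[1])];
    for (n = 1; n < L; n++)           /* buf[n+1] is still plaintext */
        buf[n] = S[(BYTE)((buf[n] ^ buf[n - 1]) + buf[n + 1])];
    buf[L] = S[(BYTE)((buf[L] ^ buf[L - 1]) + buf[0])];
}

/* One IS8 decryption round, in place: last byte first. */
void is8_decrypt_round(const BYTE IS[256], BYTE *buf, size_t len)
{
    size_t n, L = len - 1;

    buf[L] = (BYTE)(IS[buf[L]] - buf[0]) ^ buf[L - 1];
    for (n = L - 1; n >= 1; n--)      /* buf[n+1] is plaintext again */
        buf[n] = (BYTE)(IS[buf[n]] - buf[n + 1]) ^ buf[n - 1];
    buf[0] = (BYTE)(IS[buf[0]] - buf[1]) ^ buf[L - 1];
}

/* Hypothetical self-check: returns 1 if six rounds round-trip. */
int is8_demo(void)
{
    BYTE S[256], IS[256], msg[4] = {0x00, 0xff, 0x10, 0x80}, ref[4];
    int i, r;

    for (i = 0; i < 256; i++) S[i] = (BYTE)(i * 7 + 3);  /* toy bijection */
    for (i = 0; i < 256; i++) IS[S[i]] = (BYTE)i;
    for (i = 0; i < 4; i++) ref[i] = msg[i];

    for (r = 0; r < 6; r++) is8_encrypt_round(S, msg, 4);
    for (r = 0; r < 6; r++) is8_decrypt_round(IS, msg, 4);

    for (i = 0; i < 4; i++)
        if (msg[i] != ref[i]) return 0;
    return 1;
}
```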

     f. M8

        The M8 algorithm requires decryption of the second byte first. 
        The first byte cannot be decrypted until the last byte is 
        decrypted. M8 uses a set of round keys RK[i,0] and RK[i,1] where 
        i is the round. M8 after the first two bytes uses normal PCBC 
        (propagating cipher block chaining).

             Encryption:                     Decryption:
     (n=0)   C[0] = S(P[0]^P[L-1]^RK[i,0])   P[0] = IS(C[0])^P[L-1]^RK[i,0]
     (n=1)   C[1] = S(C[0]^P[1]^RK[i,1])     P[1] = IS(C[1])^C[0]^RK[i,1]
     (1<n)   C[n] = S(P[n]^C[n-1]^P[n-1])    P[n] = IS(C[n])^C[n-1]^P[n-1]
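
     One M8 round might be coded as below. This is a sketch under
     stated assumptions: the function names are invented, the round
     keys are simply passed in, and the n=0 case follows the printed
     formula with P[L-1], which needs at least three bytes to be
     invertible. The second byte is recovered from C[0], as inverting
     the n=1 encryption requires.

```c
#include <stddef.h>

typedef unsigned char BYTE;

/* One M8 encryption round, in place. len must be at least 3 here. */
void m8_encrypt_round(const BYTE S[256], BYTE rk0, BYTE rk1,
                      BYTE *buf, size_t len)
{
    size_t n, L = len - 1;
    BYTE prev_p, p;

    buf[0] = S[buf[0] ^ buf[L - 1] ^ rk0];
    prev_p = buf[1];                        /* keep P[1] for the PCBC step */
    buf[1] = S[buf[0] ^ prev_p ^ rk1];
    for (n = 2; n <= L; n++) {
        p = buf[n];
        buf[n] = S[p ^ buf[n - 1] ^ prev_p];  /* P[n]^C[n-1]^P[n-1] */
        prev_p = p;
    }
}

/* One M8 decryption round, in place: second byte first, first byte last. */
void m8_decrypt_round(const BYTE IS[256], BYTE rk0, BYTE rk1,
                      BYTE *buf, size_t len)
{
    size_t n, L = len - 1;
    BYTE prev_c, c;

    prev_c = buf[1];
    buf[1] = IS[prev_c] ^ buf[0] ^ rk1;     /* buf[0] is still C[0] */
    for (n = 2; n <= L; n++) {
        c = buf[n];
        buf[n] = IS[c] ^ prev_c ^ buf[n - 1];
        prev_c = c;
    }
    buf[0] = IS[buf[0]] ^ buf[L - 1] ^ rk0; /* P[L-1] is known by now */
}

/* Hypothetical self-check with made-up round keys: 1 on success. */
int m8_demo(void)
{
    BYTE S[256], IS[256], msg[5] = {1, 2, 3, 4, 5}, ref[5];
    int i, r;

    for (i = 0; i < 256; i++) S[i] = (BYTE)(i * 7 + 3);  /* toy bijection */
    for (i = 0; i < 256; i++) IS[S[i]] = (BYTE)i;
    for (i = 0; i < 5; i++) ref[i] = msg[i];

    for (r = 0; r < 4; r++)
        m8_encrypt_round(S, (BYTE)(0x10 + r), (BYTE)(0x20 + r), msg, 5);
    for (r = 3; r >= 0; r--)                /* undo rounds in reverse order */
        m8_decrypt_round(IS, (BYTE)(0x10 + r), (BYTE)(0x20 + r), msg, 5);

    for (i = 0; i < 5; i++)
        if (msg[i] != ref[i]) return 0;
    return 1;
}
```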


3. Description of the substitution table.

     a. Filling the S array

     The substitution table is filled by the following algorithm:

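     int fill_S(BYTE S[256], BYTE *perm, int len)
     {
     int i,j,k;
     unsigned m;

     for (i=0;i<256;i++) S[i]=i;          /* start from the identity */
     for (i=256;i>1;i--)
          {
          m=bdiv(perm,i,len);             /* m = perm mod i; perm = perm / i */
          while (len>0 && perm[len-1]==0) len--; /* drop high zero bytes */
          j=S[256-i];                     /* swap the current element ...   */
          k=S[256-i+m];                   /* ... with one of the remaining  */
          S[256-i]=k;
          S[256-i+m]=j;
          }
     return len;                          /* length of what remains of perm */
     }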

     This is based on 256! (256 factorial): all substitution tables are
     possible. The shuffle algorithm swaps the current element with one 
     of the remaining ones, selected by the value of m. Element 0 is 
     swapped with the element in the position returned by bdiv, which is 
     perm mod 256, and perm is divided by 256. Element 1 is then swapped 
     with the element perm mod 255 positions further on, now that one 
     position is filled, and perm is divided by 255. This is repeated 
     until all positions are filled.

     The one cycle permutation method by David Scott developed for SCOTT16
     and SCOTT8 is discarded in favor of all possible permutations.

     b. The bdiv function.

     The function bdiv is defined as:

     unsigned bdiv(BYTE *bint, unsigned d, int len)
     {int i;
      unsigned m=0,u;

      for (i=len-1;i>=0;i--)
          {u = (m<<8) + bint[i];
           m = u%d;
           bint[i]=u/d;
          }
      return m;
     }

     bdiv takes a "little-endian" number len bytes long, divides it by d,
     and returns the remainder. bint holds the number to divide on entry
     and the quotient on return.

     c. Inverting S to make IS.

     The inversion of the substitution uses the following algorithm:

     void invert_S(BYTE S[256], BYTE IS[256])
     {
     int i;
     for (i=0;i<256;i++) IS[S[i]]=i;
     }

     This simply records, for each i, that S[i] decrypts to i.
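
     The three routines of this section can be exercised together as
     in the sketch below. The routines are restated so that the listing
     compiles on its own, with a guard added so that len cannot run
     below zero. table_demo and its filler bytes are hypothetical test
     scaffolding, not part of the cipher.

```c
typedef unsigned char BYTE;

/* Divide the little-endian number bint (len bytes) by d in place
   and return the remainder. */
static unsigned bdiv(BYTE *bint, unsigned d, int len)
{
    int i;
    unsigned m = 0, u;

    for (i = len - 1; i >= 0; i--) {
        u = (m << 8) + bint[i];
        m = u % d;
        bint[i] = (BYTE)(u / d);
    }
    return m;
}

/* Shuffle S according to the number held in perm. */
static int fill_S(BYTE S[256], BYTE *perm, int len)
{
    int i;
    unsigned m;
    BYTE t;

    for (i = 0; i < 256; i++) S[i] = (BYTE)i;
    for (i = 256; i > 1; i--) {
        m = bdiv(perm, (unsigned)i, len);
        while (len > 0 && perm[len - 1] == 0) len--;
        t = S[256 - i];
        S[256 - i] = S[256 - i + m];
        S[256 - i + m] = t;
    }
    return len;
}

static void invert_S(const BYTE S[256], BYTE IS[256])
{
    int i;
    for (i = 0; i < 256; i++) IS[S[i]] = (BYTE)i;
}

/* Returns 1 if IS undoes S at every index, which also shows that
   S came out as a permutation of 0..255. */
int table_demo(void)
{
    BYTE S[256], IS[256], perm[64];
    int i;

    for (i = 0; i < 64; i++) perm[i] = (BYTE)(i * 37 + 11); /* arbitrary */
    fill_S(S, perm, 64);
    invert_S(S, IS);
    for (i = 0; i < 256; i++)
        if (IS[S[i]] != i) return 0;
    return 1;
}
```

     As a small worked case for bdiv, the two byte number {0x00,0x01}
     is 256; dividing by 10 leaves the quotient {25,0} in place and
     returns the remainder 6.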


4. Round keys for M8.

   A set of round keys is needed to use M8. These keys are defined as:

   RK[i,j]=S[S[i]^E_2[j]]

   For j=0, E_2[0]=0xB7 and for j=1, E_2[1]=0xE1, the first two bytes of
   e_2 in Appendix A. These are the only two values used in M8.
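
   The definition above might be tabulated as in this sketch. The
   function name, the array layout, and the choice to number rounds
   from 0 are assumptions; the two E_2 constants are supplied by the
   caller rather than fixed in the code.

```c
typedef unsigned char BYTE;

/* Fill rk[i][j] = S[S[i] ^ e2[j]] for rounds i = 0..rounds-1.
   rounds must not exceed 256, since i indexes the table S. */
void m8_round_keys(const BYTE S[256], const BYTE e2[2],
                   unsigned rounds, BYTE rk[][2])
{
    unsigned i, j;

    for (i = 0; i < rounds; i++)
        for (j = 0; j < 2; j++)
            rk[i][j] = S[S[i] ^ e2[j]];
}
```

   Because every round key is read out of S, the keys change whenever
   the table does, and no separate key material has to be stored.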


5. Initialization from the key.

     a. The initial S

     The initial S is produced by making a 256 byte array and using the
     array to fill S. The array is made by adding, modulo 2, a stream
     of 17 bytes of pi-3 and a stream of 19 bytes of e-2. The algorithm is:

     pi_len=17;
     e_len=19;
     for (i=0,j=0,k=0; i<256; i++)
          {
          temp[i]=pi_3[j++]^e_2[k++];
          if (j==pi_len) j=0;
          if (k==e_len) k=0;
          }
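
     A self-contained version of the loop above, as a sketch: the
     function name make_initial_array is an assumption, and the two
     constant arrays are the first 17 and 19 bytes copied from
     Appendix A. Since 17 and 19 have no common factor, the pairing of
     bytes only repeats after 323 positions, so no pair is reused
     within the 256 byte array.

```c
typedef unsigned char BYTE;

/* First 17 bytes of pi_3 and first 19 bytes of e_2 (Appendix A). */
static const BYTE pi_3[17] = {
    0x24, 0x3f, 0x6a, 0x88, 0x85, 0xa3, 0x08, 0xd3,
    0x13, 0x19, 0x8a, 0x2e, 0x03, 0x70, 0x73, 0x44,
    0xa4
};
static const BYTE e_2[19] = {
    0xB7, 0xE1, 0x51, 0x62, 0x8A, 0xED, 0x2A, 0x6A,
    0xBF, 0x71, 0x58, 0x80, 0x9C, 0xF4, 0xF3, 0xC7,
    0x62, 0xE7, 0x16
};

/* Build the 256 byte array that is handed to fill_S for the initial S. */
void make_initial_array(BYTE temp[256])
{
    int i, j = 0, k = 0;

    for (i = 0; i < 256; i++) {
        temp[i] = pi_3[j++] ^ e_2[k++];
        if (j == 17) j = 0;     /* wrap the pi stream */
        if (k == 19) k = 0;     /* wrap the e stream  */
    }
}
```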

     b. Preprocessing the key.

     The key should be preprocessed if a limited key is to be used 
     (ie. for export purposes). The preprocessing consists of 
     encrypting the key text four rounds with the initial S and 
     using a truncated portion (ie. 5 bytes or 40 bits). The key 
     length is now the length of the truncated portion. This step 
     is skipped if no truncation is needed.

     c. Using the key

     The key is combined with the golden mean (rho) in a manner similar 
     to the algorithm used in a. The difference is that the key length 
     is added modulo 2 to the first byte of rho and the procedure then 
     streams from the second byte of rho and the first byte of the key. 
     23 bytes of rho are used. The array is again 256 bytes. The 
     algorithm is:

     rho_len=23;
     for (i=1,j=0,k=1,temp[0]=key_len^rho_1[0]; i<256; i++)
          {
          temp[i]=key[j++]^rho_1[k++];
          if (j>=key_len) j=0;
          if (k==rho_len) k=0;
          }

     If the key length is zero, key[0] MUST equal zero.
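
     The key mixing step could be packaged as below. This is a sketch:
     mix_key is an assumed name, key_len is taken to fit in one byte,
     and rho_1 holds the first 23 bytes from Appendix A.

```c
typedef unsigned char BYTE;

/* First 23 bytes of rho_1 (Appendix A). */
static const BYTE rho_1[23] = {
    0x9E, 0x37, 0x79, 0xB9, 0x7F, 0x4A, 0x7C, 0x15,
    0xF3, 0x9C, 0xC0, 0x60, 0x5C, 0xED, 0xC8, 0x34,
    0x10, 0x82, 0x27, 0x6B, 0xF3, 0xA2, 0x72
};

/* Combine the key with rho into a 256 byte array.
   key must hold at least one readable byte, even when key_len is 0. */
void mix_key(const BYTE *key, unsigned key_len, BYTE temp[256])
{
    unsigned i, j = 0, k = 1;

    temp[0] = (BYTE)(key_len ^ rho_1[0]);   /* length goes in first */
    for (i = 1; i < 256; i++) {
        temp[i] = key[j++] ^ rho_1[k++];
        if (j >= key_len) j = 0;            /* repeat the key   */
        if (k == 23) k = 0;                 /* repeat rho       */
    }
}
```

     For the two byte key "AB" this gives temp[0]=0x9C (2^0x9E),
     temp[1]=0x76 ('A'^0x37) and temp[2]=0x3B ('B'^0x79). Mixing in
     the length keeps keys such as "AB" and "ABAB" from producing the
     same array.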


     d. Making the final S.

     The array from c. is encrypted using the initial S. The encryption 
     is N rounds, where N is the number of rounds in a normal encryption 
     and N is always at least four. The new array is used to fill S again 
     as in step a. The encryption method is the same as the normal 
     encryption, IS8, RS8, or M8.

5. Describing the algorithm

The algorithm may be described as: RS8-r.
Where:
r = the number of rounds. (Minimum 4)

An additional parameter is added for a fixed key size with pretreatment 
of the key: RS8-r/l

l = the restricted key length

RS4 uses half a byte at a time and a 16 word table. For RS4 the low four
bits are taken as the first word and the high four bits as the second word.
RS4 is defined as a smaller cipher that is easier to break.

M8 and IS8 are similarly defined, as are M4 and IS4.


6. Special notes for encryption.

The lookup table remains the same for a two to four byte encryption as for 
the algorithm as described in section 5. The table is calculated for the 
normal number of rounds, and the minimum number of rounds is used to encrypt 
these numbers of bytes. These values were selected to yield a minimum of 
16 table lookups.

If fewer than 5 bytes are encrypted:
bytes     minimum rounds

2         8
3         6
4         4
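
The table can be written as a small lookup, sketched here with assumed
function names. One table lookup per byte per round is taken as the
cost model, which gives 16, 18 and 16 lookups for the three rows.

```c
#include <stddef.h>

/* Minimum rounds for a short block, from the table above.
   Returns 0 when the table has no entry for nbytes. */
unsigned short_block_min_rounds(size_t nbytes)
{
    switch (nbytes) {
    case 2:  return 8;
    case 3:  return 6;
    case 4:  return 4;
    default: return 0;
    }
}

/* Table lookups performed: one per byte per round. */
size_t table_lookups(size_t nbytes, unsigned rounds)
{
    return nbytes * rounds;
}
```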

Four is the recommended minimum number of rounds. Six rounds are used in 
the reference implementation. Six rounds is probably the minimum for good
security.

The minimum number of bytes is two for RS8 and M8. IS8 and SCOTT8 both
require three bytes. The reference implementation requires three bytes.

7. More notes on the reference implementation.

The method the reference implementation uses to calculate the size of
blocks ensures that all blocks are less than 8192 bytes. The algorithm
is as follows:

X=file length
Y=total number of whole blocks in the file
Z=the length of the last odd sized block
W=Z+8192

encrypt the first (Y-1) blocks
if W is even then encrypt the remainder of the file as two
   blocks of length W/2.
if W is odd then encrypt the remainder of the file as one block
   of length (W+1)/2 and one block of length (W-1)/2.
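
The block sizing rule can be sketched as follows. Several things here
are assumptions rather than statements about the reference
implementation: a whole block is taken to be 8192 bytes (implied by
W=Z+8192), Y and Z are taken to be the quotient and remainder of X by
8192, the file is assumed to be at least 8192 bytes long, and the
names block_plan and plan_blocks are invented.

```c
#define BLOCK 8192UL   /* assumed whole-block size */

typedef struct {
    unsigned long full_blocks;  /* Y-1 blocks of BLOCK bytes        */
    unsigned long tail_a;       /* first of the two final blocks    */
    unsigned long tail_b;       /* second of the two final blocks   */
} block_plan;

/* Split a file of file_len bytes (file_len >= BLOCK assumed). */
block_plan plan_blocks(unsigned long file_len)
{
    block_plan p;
    unsigned long Y = file_len / BLOCK;
    unsigned long Z = file_len % BLOCK;
    unsigned long W = Z + BLOCK;

    p.full_blocks = Y - 1;
    p.tail_a = (W + 1) / 2;     /* equals W/2 when W is even     */
    p.tail_b = W / 2;           /* equals (W-1)/2 when W is odd  */
    return p;
}
```

For a 20000 byte file this gives one whole block followed by two
blocks of 5904 bytes; for 20001 bytes the final pair is 5905 and 5904.
Under these assumptions neither final block is shorter than 4096
bytes.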

The reference implementation also includes a PCBC mode over the
selected mode. PCBC is well defined over a decreasing block size.

8. Analysis

Before analyzing a complete round it is useful to analyze the parts. The
simple substitution requires 255 chosen plaintext-ciphertext pairs to be
'broken'. This is a 'dictionary attack', where all possible input-output
pairs are solved. CBC is vulnerable to differential attack if the same IV
is used twice with chosen plaintext, but in no case can it be weaker than
the function E(). Effectively this means that in order to break one round
two simultaneous ciphers must be broken.

Standard differential and linear cryptanalysis techniques that are so
effective against DES and other Feistel ciphers are of dubious use in
an attack on chained-substitution ciphers. They require some knowledge of
the substitution table, which is lacking.

SCOTT8 and SCOTT16 seem to resist attack well except for a chosen-plaintext
attack with a fixed length file which is a multiple of the cycle size plus 
one. The file consists of a constant value. All three of the methods 
presented here are resistant to that attack in its simplest form, although 
RS8 may be vulnerable to similar attacks.

That attack on SCOTTx by Paul Onions used the single-cycle permutation
and the advanced plaintext xor to form regular cycles in the output.
The cycles in the output are then directly used to reconstruct the table.
For multiple cycles this form of attack can also be used by varying the
file lengths until a regular pattern appears. For any given input value,
the output will only consist of values in a given cycle.

RS8 is somewhat vulnerable to similar attacks, but the cycles it forms
have less correlation to the table-cycles. RS8 is also subject to a known-
plaintext attack that takes advantage of the poor diffusion in the reverse
direction upon decryption. This form is vulnerable for two rounds with 
only about 2000 bytes of adaptively chosen-ciphertext. I have not been 
successful with the attack but it seems promising. Three rounds increases 
the resistance considerably. For three rounds about 2**24 chosen ciphertext 
bytes would be required for the same attack. The complete attack will be 
included in later versions of this document.

The two-round attack on RS8 is based on the two equations:

(1)    P[n]=IS[IS[c[n]]]^IS[0] and
(2)    P[n]=IS[IS[c[n]]^S[0]]

S[0] is easily found and the corresponding tables representing the
two equations are easily constructed. If table IS can be determined
from these two equations then IS can be converted to S trivially.
The attack indicates that at most 131072 chosen ciphertext bytes are
required. That number can be trimmed somewhat by observing that
for X1X2X3 the three bytes make two pairs of bytes.

RS8 is included primarily as a reduced version of the M8 and IS8 algorithms.
RS8 trades a speed increase for decreased security. RS8 also has the
characteristic that if a portion of the ciphertext is changed only
a limited number of plaintext bytes after the error will be wrong. This
may be advantageous in a noisy environment where the ciphertext may
not be transmitted properly and less security is required.

IS8 uses the concept of 'three non-group functions' used in IDEA.
S[] is totally non-linear. + and xor(^) are both linear but do not form
a group. This cipher appears to be resistant to the Onions attack. IS8 
produces good diffusion after two rounds for both encryption and 
decryption. The attack which is effective against RS8 does not work 
against IS8.

M8 also uses the 'three non-group functions' concept. S[] is the first
function, xor(^) is the second, and Y[x]=S[x]^x is the third. It is 
interesting to look at the function in this manner: because ^ and S[] 
do not form a group, the function Y[x] combining the two is also not a 
group. Y[x] is in general not reversible, that is, it is not bijective 
(i.e. it is not a one-to-one mapping). The round keys used in M8 are 
dependent on the table, so finding the table gives the round keys. M8 
is probably more secure than IS8 because of the use of round keys. M8 
is provably more secure than RS8 because of the round keys, if we assume 
that Y[x] and S[x] have equal security:

       For n>1:                 For n=1:                  For n=0:
RS8 =  S[S[P[n-1]]^P[n]]        S[S[P[0]]^P[1]]           S[P[L]^P[0]]
M8  =  S[Y[P[n-1]]^P[n]]        S[S[P[0]]^P[1]^RK[i,1]]   S[P[L]^P[0]^RK[i,0]]

In reality it is easy to prove that Y[x] improves the security over S[x] 
if we assume that the two are not related. Y[x] is however related to 
S[x].
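
Whether Y[x]=S[x]^x is one-to-one for a particular table can be
checked by counting its distinct outputs, as in this sketch. The
function names are illustrative, and the table in the demo
(S[x]=x+1 mod 256) is a toy chosen only because its Y is easy to work
out by hand.

```c
typedef unsigned char BYTE;

/* Count the distinct outputs of Y[x] = S[x] ^ x.
   Y is a bijection only if the count is 256. */
int y_distinct_outputs(const BYTE S[256])
{
    int seen[256] = {0};
    int x, count = 0;

    for (x = 0; x < 256; x++) {
        BYTE y = (BYTE)(S[x] ^ x);
        if (!seen[y]) {
            seen[y] = 1;
            count++;
        }
    }
    return count;
}

/* Toy table S[x] = x + 1 mod 256: Y takes only the values
   1, 3, 7, 15, 31, 63, 127 and 255. */
int y_demo(void)
{
    BYTE S[256];
    int x;

    for (x = 0; x < 256; x++) S[x] = (BYTE)(x + 1);
    return y_distinct_outputs(S);
}
```

A count below 256 means that several inputs share an output, so Y[x]
alone cannot be inverted even when S[] is fully known.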

M8 has good diffusion after two rounds for encryption and three rounds
for decryption. This is due to the asymmetry in the first three bytes.
Changes in the first byte take three rounds to propagate on decryption
but only one round on encryption.

Y[x] in M8 is similar to the T(x) box in CMEA (Cellular Message Encryption 
Algorithm). T(x) was found to be weak for two reasons: a) the use of a 
known starting table, b) the uneven distribution of that table. Y[x] does 
not suffer from these deficiencies. CMEA was also only a three round 
algorithm that did not use the S[x] transformation found in M8. Only two 
rounds of CMEA actually contain the non-linear function T(x).

The table creation of all three algorithms is similar except for the 
function used to encrypt the key material. Because of the good diffusion 
properties in the encryption direction, a single bit change in the key 
results in a large change in the final S[] table. The keying produces 
about 2**356 equivalent keys for each of the 2**1684 possible tables. 
If the 95 printable characters are used, only 2**1675.3 keys are possible 
for key length 255. For all key lengths of 95 characters, 2**1679.2 keys 
are possible. This means only 1 in 2**4.8 (27.9) possible tables results 
in a valid key. A key resulting in a linear table is therefore unlikely. 
Most tables will have points with small cycle lengths. If they were used 
for SCOTTx this would be a serious deficiency. Cycling through the Onions 
attack with a file of length (8!)+1=40321 would find all cycles of eight 
or less.

For this section we will assume breaking the cipher is equivalent to
finding the table, not just creating a 'dictionary'. Another angle of
attack is incremental improvement of the S[] table. For a short block
length (3) and 6 rounds, only 18 table lookups are used in RS8 and
IS8. For M8 an additional 18 lookups are performed to create the
round keys. This allows us to incrementally improve our estimate of
the S[] table. 

This attack is a 'lottery attack', finding R values out of X
possible with none repeated. Such an attack has a complexity
of X!/((X-R)!R!). We will call this function U(X,R). In this 
case the added complexity of finding both the values and the 
locations simultaneously gives a complexity of U(X,R)**2, assuming 
the order of the values in the locations is irrelevant. Since the
order is relevant, the actual complexity is R!*U(X,R)**2.


R   LOG2(U(256,R))    LOG2(U(256,R)**2)     LOG2(R!*U(256,R)**2)     
1      8                16                    16
3     21.4              42.8                  45.38
6     38.42             76.85                 86.34
9     53.33            106.65                125.12
12    66.79            133.57                162.41
18    90.61            181.22                233.73
24   111.36            222.71                301.75
36   146.18            292.36                430.45
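
The first rows of the table can be checked with exact integer
arithmetic, as in the sketch below. The function names are
assumptions, and 64 bit integers only reach as far as R=6 for
U(256,R) and R=4 for R!*U(256,R)**2; the larger rows need logarithms
or big numbers.

```c
/* U(X,R) = X!/((X-R)! R!), computed exactly. */
unsigned long long lottery_u(unsigned x, unsigned r)
{
    unsigned long long u = 1;
    unsigned i;

    for (i = 1; i <= r; i++)
        u = u * (x - r + i) / i;    /* integral after every step */
    return u;
}

/* R! * U(X,R)**2, exact while the result fits in 64 bits. */
unsigned long long lottery_cost(unsigned x, unsigned r)
{
    unsigned long long u = lottery_u(x, r), f = 1;
    unsigned i;

    for (i = 2; i <= r; i++)
        f *= i;
    return f * u * u;
}

/* Bits needed to write v: floor(log2(v)) + 1 for v > 0. */
unsigned bit_length(unsigned long long v)
{
    unsigned n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}
```

U(256,3) is 2763520, which lies between 2**21 and 2**22 and so agrees
with the 21.4 in the table. 3!*U(256,3)**2 lies between 2**45 and
2**46, in agreement with 45.38.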

The difficulty of finding one value and location assuming
neither is known is 16 bits. Each round of IS8 and RS8 uses
3 table lookups minimum (for three bytes). M8 uses 6 lookups
for three bytes; the extra three are from the round keys.
The reference implementation uses 6 rounds. For IS8 and RS8
this involves 18 lookups. With these two we know one location
and 3 values, which reduces the complexity somewhat. If we assume
that the reduction is equivalent to six lookups (it is less than
that), then RS8 and IS8 have a strength of 162.4 bits. The
round keys in M8 are assumed to increase the strength to
24 lookups or 301.75 bits, and in no case will the strength be
less than 18 lookups or 233.7 bits.

This attack is further complicated by more than one set of
18 lookups leading to the same output. For a 4 bit table there
are only 16 positions therefore this attack is equivalent
to 'brute forcing' the table for the 4 bit algorithms.

For 6 rounds RS8 seems secure against the current attacks.
IS8 has a probable security of above 160 bits. M8 has a probable 
security of greater than 230 bits. All three algorithms allow 
plaintext pass phrases in a manner which is unlikely to produce
weak keys.


9. Possible improvements.

a. IS8 and RS8 may be improved by adding round keys.
b. All three algorithms can obviously be improved by increasing
   the table size to 16 bits.
c. Further for small block sizes more rounds can be used,
   requiring an improved table in section 6. A minimum number of
   table lookups could be set at 32 rather than the current 16 by
   means of that table.
d. More rounds could be used for larger block sizes as well.
e. More tables could be used. For example, two tables, used either
   on alternating bytes or alternating rounds.
f. The minimum file size could be set higher, say 8 bytes. This
   would increase the number of table lookups to 48.


Appendix A - Hexadecimal digits of pi_3, e_2, and rho_1

typedef unsigned char BYTE;
BYTE pi_3[]=
     {0x24,0x3f,0x6a,0x88,0x85,0xa3,0x08,0xd3, /* 8  */
      0x13,0x19,0x8a,0x2e,0x03,0x70,0x73,0x44, /* 16 */
      0xa4,0x09,0x38,0x22,0x29,0x9f,0x31,0xd0, /* 24 */
      0x08,0x2e,0xfa,0x98,0xec,0x4e,0x6c,0x89, /* 32 */
      0x45,0x28,0x21,0xe6,0x38,0xd0,0x13,0x77, /* 40 */
      0xbe,0x54,0x66,0xcf,0x34,0xe9,0x0c,0x6c, /* 48 */
      0xc0,0xac,0x29,0xb7,0xc9,0x7c,0x50,0xdd, /* 56 */
      0x3f,0x84,0xd5,0xb5,0xb5,0x47,0x09,0x17, /* 64 */

      0x92,0x16,0xd5,0xd9,0x89,0x79,0xfb,0x1b, /* 72 */
      0xD1,0x31,0x0B,0xA6,0x98,0xDF,0xB5,0xAC, /* 80 */
      0x2F,0xFD,0x72,0xDB,0xD0,0x1A,0xDF,0xB7, /* 88 */
     };
BYTE e_2[]=
     {0xB7,0xE1,0x51,0x62,0x8A,0xED,0x2A,0x6A, /* 8 */
      0xBF,0x71,0x58,0x80,0x9C,0xF4,0xF3,0xC7, /* 16 */
      0x62,0xE7,0x16,0x0F,0x38,0xB4,0xDA,0x56, /* 24 */
      0xA7,0x84,0xD9,0x04,0x51,0x90,0xCF,0xEF, /* 32 */
      0x32,0x4E,0x77,0x38,0x92,0x6C,0xFB,0xE5, /* 40 */
      0xF4,0xBF,0x8D,0x8D,0x8C,0x31,0xD7,0x63, /* 48 */
      0xDA,0x06,0xC8,0x0A,0xBB,0x11,0x85,0xEB, /* 56 */
      0x4F,0x7C,0x7B,        /* 59 */
     };
BYTE rho_1[]=
     {0x9E,0x37,0x79,0xB9,0x7F,0x4A,0x7C,0x15, /* 8 */
      0xF3,0x9C,0xC0,0x60,0x5C,0xED,0xC8,0x34, /* 16 */
      0x10,0x82,0x27,0x6B,0xF3,0xA2,0x72,0x51, /* 24 */
      0xF8,0x6C,0x6A,0x11,0xD0,0xC1,0x8E,0x95, /* 32 */
      0x27,0x67,0xF0,0xB1,0x53,0xD2,0x7B,0x7F, /* 40 */
      0x03,0x47,0x04,0x5B,0x5B,0xF1,0x82,0x7F, /* 48 */
      0x01,0x88,0x6F,0x09,0x28,0x40,0x30,0x02, /* 56 */
      0xC1,0xD6,0x4B,0xA4,0x0F,0x33,0x5E,0x36, /* 64 */

      0xF0,0x6A,0xD7 /* 67 */
     };

Appendix B - Source Listing (not included due to ITAR)
