Notes on the Beale Ciphers

The first 121 words of the key for B1 would decipher half of the
        message, including a longest run of 10 consecutive cleartext
        letters.

Using the DOI as a key for B1 gives mostly garbage, except for
        the curious occurrence of part of the alphabet in the
        early part of the paper:


seq#   188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
code#  147 436 195 320  37 122 113   6 140   8 120 305  42  58 461  44 106
	a   b   c   d   e   f   g   h   i   i   j   k   l   m   m   n   o

	What are the odds that this is chance?
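
        The decoding used for these tables can be sketched as below.
        This is only a minimal illustration of a first-letter book
        cipher; the function names are my own, and the toy key is
        just the first ten words of the DOI with none of the
        numbering corrections discussed in these notes.

```python
# Minimal book-cipher decoder: each cipher number selects a word of the
# key text (counting from 1) and contributes that word's first letter.

def first_letters(key_text):
    """Return the first letter of every word in the key, lowercased."""
    return [word[0].lower() for word in key_text.split()]

def decode(cipher, key_text, unknown="?"):
    """Decode a list of word numbers against a key text.

    Numbers outside the key (less than 1 or past the last word)
    decode to `unknown` rather than raising an error.
    """
    letters = first_letters(key_text)
    out = []
    for n in cipher:
        if 1 <= n <= len(letters):
            out.append(letters[n - 1])
        else:
            out.append(unknown)
    return "".join(out)

# Toy key: the opening ten words of the DOI, punctuation removed.
toy_key = "when in the course of human events it becomes necessary"
print(decode([6, 2, 5, 99], toy_key))   # prints "hio?"
```

        A number past the end of the key (99 here) is shown as `?'
        so that out-of-range elements stay visible in the output.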

Other sequences of the first letters of the alphabet appear when using
the corrections described by Aaron and Matyas: "How the Message in
Paper No. 2 was Recovered"

150  251  284  308  231  124  211  486  225  401
a    a    a    b    b    c    d    e    f    f

25   485  18   436  65   84   200  283  118  320  138
a    b    b    b    c    c    c    c    d    d    e

24   283  134  92   63   246  486
a    c    b    c    d    d    e

147 436 195 320 37  122 113 6   140 8  120 305 42  58  461 44  106 301 13  408
a   b   c   d   e   f   g   h   i   i  j   k   l   m   m   n   o   h   p   p

Note that the largest number in any of the 4 sequences is 486.
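
The claim is easy to check mechanically.  A small sketch, with the four
sequences typed in exactly as listed above (the variable names are
mine):

```python
# The four letter sequences listed above, as cipher numbers.
sequences = [
    [150, 251, 284, 308, 231, 124, 211, 486, 225, 401],
    [25, 485, 18, 436, 65, 84, 200, 283, 118, 320, 138],
    [24, 283, 134, 92, 63, 246, 486],
    [147, 436, 195, 320, 37, 122, 113, 6, 140, 8, 120, 305, 42, 58,
     461, 44, 106, 301, 13, 408],
]

maxima = [max(s) for s in sequences]   # largest number in each sequence
overall = max(maxima)                  # largest number in any sequence
print(maxima, overall)                 # prints [486, 485, 486, 461] 486
```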

        Reworked my copy of B2 to match  the  Ward  pamphlet.   I
        included  corrections for what are almost surely printing
        errors, and left in the  counting  errors  introduced  by
        Beale.   Also  tried  generating  a  version  of  the DOI
        numbered the way Beale might have done it by  hand.   The
        assumed method is to number only every tenth (or possibly 
        every  fifth) word of the document.  The numbering errors
        can most easily be explained if the ORIGINAL  VERSION  of
        the  DOI is used.  The original is written with very long
        lines that might cause the type of counting  errors  seen
        in B2.  Most of the numbering shifts can be attributed to 
        Beale  miscounting  when  going from the end of a line to
        the beginning of the next.  My corrections are: 

        1) Between `new' and `government' insert  a  filler  word
		`X'.  The X would be encoded as 156, but is never
                used (in any of the 3 papers).  Since Beale would 
                count  from the nearest `10-mark' when converting
                a letter to its position, he would  probably  not
                see  his  error  once  the document was numbered.
                This is the only error that requires inserting  a
                word into the document.  All others are caused by 
                dropping a word (or merging them to show how they 
                were derived).  

                Note that Ward just added the word  `a'  at  this
                point (a new government).  

        2) Merge the words `object'  and  `evinces'.   Thus  code
                word 244 could be read as `o' or `e' in B3.  This 
                error  is  also unlikely to be seen by Beale once
                made.  Merging really means dropping one  of  the
                two  words merged.  The program that reads such a
                merged pair will use  the  first  letter  of  the
                string.  

        3) Number 480=people, then number 480=dissolutions.  This 
                error  is  similar  to  the  others,  except  the
                `10-marks'  are  miscounted  instead  of just the
                distance between them.   Again,  the  mistake  is
                across  a  line boundary.  For counting purposes,
                the safest thing to do seems to be  to  drop  the
                sequence:  `He  has refused for a long time after
                such dissolutions'  (Just  as  Ward  did).   Code
                words  475-484  aren't  used anywhere.  Note also
		that none of the Gillogly Strings contain numbers
                higher  than  486.  The break at this point could
                be related to these strings.  Unfortunately,  the
                numbers 485 and 486 occur AFTER the break.....  

        4) `meantime' should be counted  as  2  words.   This  is
                clear   from   inspecting   the  DOI.   mean=509,
                time=510.  In this case, most  modern  texts  are
                wrong,  and  Beale  counted correctly.  Or: count
                `remaining' as two words  since  it's  hyphenated
                across a line break.  

        5) Merge `among' and `us' as word 627.  From  this  point
                on,  the  adjustments  have  little justification
                other than that they are made in the same  manner
		as  the  previous  ones.

        6) Merge `boundaries' and `so' as word 778.  There are  4
                places  that this error could have been made.  It
                only affects a few code words.  This corrects the 
                counting errors through  code  element  #811  and
                leaves only the `x' needing adjustment.  
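
        The two kinds of correction used above (inserting a filler
        word, and merging two adjacent words) could be applied to a
        word list roughly as follows.  This is a sketch only: the
        helper names and the `+' notation for a merged pair are my
        own assumptions, and the fragment is a toy, so the numbers
        it prints are not real DOI positions.

```python
def insert_filler(words, before, filler="X"):
    """Insert `filler` just before the first occurrence of `before`."""
    i = words.index(before)
    return words[:i] + [filler] + words[i:]

def merge_pair(words, first, second):
    """Merge the first adjacent (first, second) pair into one entry.

    The merged entry is written 'first+second'; a decoder that takes
    the first letter of the string will see the letter of `first`.
    """
    for i in range(len(words) - 1):
        if words[i] == first and words[i + 1] == second:
            return words[:i] + [first + "+" + second] + words[i + 2:]
    raise ValueError(f"pair {first!r} {second!r} not found")

def number_words(words):
    """Number the words from 1, as a dict {number: word}."""
    return {n: w for n, w in enumerate(words, start=1)}

# Toy fragment; the positions below are NOT the real DOI positions.
words = "institute new government laying its foundation".split()
words = insert_filler(words, "government")       # like correction 1
words = merge_pair(words, "its", "foundation")   # like corrections 2, 5, 6
print(number_words(words))
```

        Every word after an inserted filler moves up by one, and
        every word after a merged pair moves down by one, which is
        the kind of numbering shift the corrections account for.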

        There are 4 words remaining in the DOI that contain an
        `x': executioners, excited, sexes, and extend.  Which (if
        any) of these did Beale use as element 1005?  `Sexes' is
        commonly assumed; I suggest an alternative.  `Executioners'
        is the sixth word of a line, and it could be element #1005
        if the numbering was restarted at 1000 at the beginning of
        that line.  Actually, this is pretty weak reasoning.  I just
        haven't seen a good explanation as to why 1005=X in B2.

        Just received material ordered from the BCA: Ward's  1885
        pamphlet,  Hart's  version  and  the  '81 proceedings.  I
        found a few irritating differences between what I thought 
        were correct versions of the 3  ciphers  and  the  values
        published  in  Ward's paper.  In particular in B1 I found
        the following differences: 

		Position	Hart,etc.	Ward
		 260		 320		 324
		 405		  90		 290
		 462		 858		 868
		 516		 820		 826

	In B3 the following differences exist:

		Position	Hart,etc.	Ward
		 401		  11		   1
		 554		  29		  28

        Where did these errors come from?   Since  the  cleartext
	for  B2  is known, the errors there are understandable as
        either typesetting errors or mis-counts by the author  of
        the ciphers.  
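
        Differences like those tabulated above can be found by
        comparing two transcriptions position by position.  A
        minimal sketch; the function name is mine, and the two short
        lists are made-up stand-ins, not the real ciphers.

```python
def differences(a, b):
    """Return (position, a_value, b_value) for every 1-based position
    where the two transcriptions disagree."""
    if len(a) != len(b):
        raise ValueError("transcriptions differ in length")
    return [(i, x, y)
            for i, (x, y) in enumerate(zip(a, b), start=1)
            if x != y]

# Made-up 5-element transcriptions, for illustration only.
hart = [10, 20, 30, 320, 90]
ward = [10, 20, 30, 324, 290]
for pos, h, w in differences(hart, ward):
    print(pos, h, w)
```

        The length check matters: a dropped or added element in one
        copy would otherwise show up as a long run of false
        differences after that point.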

Extending the `Gillogly strings':
        Another string emerges and the longest string is extended 
        if a count of 5 is added  to  elements  above  604.   The
        string: 

		604 230 436 664 582

        is `aabad' without the correction, and `aabcd'  with  it.
        Even more interesting is the cipher element #208  at  the
        end  of  the string: `abcdefghiijklmmnohpp'.  The element
        is 680, and is deciphered as `a' without  adding  5,  but
        becomes  `q'  by  adding  5.  Note that there is only one
        word in the entire DOI that  begins  with  `q'.   Against
        this  argument  is  the  clear(!)  requirement  that  the
        counting not be shifted by 5 for decoding B2.  
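
        The adjustment itself is simple to state in code.  A sketch,
        assuming "above 604" means strictly greater than 604 (so 604
        itself is left alone); the function name is mine.

```python
def shift_above(cipher, threshold=604, shift=5):
    """Add `shift` to every element strictly greater than `threshold`."""
    return [n + shift if n > threshold else n for n in cipher]

string = [604, 230, 436, 664, 582]
print(shift_above(string))   # prints [604, 230, 436, 669, 582]
print(shift_above([680]))    # prints [685]
```

        Only 664 changes in the five-element string, which matches
        the single changed letter between `aabad' and `aabcd'.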

        Hammer's 1971 CACM article also notes significant  biases
        for multiples of 5 in B3.  

        Also, the second `h' in  the  string  is  represented  by
        301.  The 302nd word in my version of the DOI is `of'.  

Explanation for the Gillogly strings:
	Assume the method for encoding B1 and B2 went something like this:

        A partial list of numbers is prepared by writing the
        alphabet down the left side of a piece of paper.  Words
        beginning with each letter are then noted and their
        positions in the DOI are written on the appropriate line.
        This process continues until most of the lines contain
        enough numbers for the expected task.  B2 is then encoded
        using this list, with reference back to the DOI when a
        needed letter isn't in the prepared list or the encoder
        thinks a number has been used too often.  New numbers may
        be added to the list during this process.

        In order to encode  B1,  the  preparer  then  writes  the
        alphabet  ACROSS  THE  TOP of his prepared list of cipher
        elements  and  proceeds  as  before;  this  time  picking
        numbers  from  the  columns  instead  of rows.  Thus when
        encoding a particular word, it would be natural to  stick
        to  the top of the columns and work down while encoding a
        word.  Note that some of the Gillogly strings use numbers
        that do not appear in B2, so this list must have
        been made up before either of the two messages was
        encoded.

        If this scenario is correct, then the appearance of (say)
        four C's in a row probably indicates four different
        letters in the cleartext of B1.
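
        One way to read this scenario is sketched below.  It is my
        interpretation only: I assume the k-th letter of a word is
        taken from row k of the list (a, b, c, ... working down),
        in the column headed by the cleartext letter.  The key is a
        nine-word toy, not the DOI, and all names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase

def build_side_table(key_words):
    """Rows keyed by letter: positions (from 1) of the key words that
    start with that letter, in order of appearance."""
    table = {letter: [] for letter in ALPHABET}
    for n, word in enumerate(key_words, start=1):
        letter = word[0].lower()
        if letter in table:
            table[letter].append(n)
    return table

def encode_by_columns(table, word):
    """Encode one word column-wise, working down from the top row.

    The k-th cleartext letter is taken from row k (a, b, c, ...), in
    the column headed by that cleartext letter.  A position where the
    row is too short to reach that column is returned as None.
    """
    out = []
    for row_letter, ch in zip(ALPHABET, word.lower()):
        column = ALPHABET.index(ch)
        row = table[row_letter]
        out.append(row[column] if column < len(row) else None)
    return out

# Toy key (hypothetical): rows a, b and c get three numbers each.
key = "apple bear cat ant bee cow axe bat cub".split()
table = build_side_table(key)
cipher = encode_by_columns(table, "cab")
decoded = "".join(key[n - 1][0] for n in cipher)
print(cipher)    # prints [7, 2, 6]
print(decoded)   # prints "abc"
```

        Decoding the result with the key in the ordinary way gives
        the row letters `abc' rather than the cleartext `cab', which
        is the effect proposed here for the strings in B1.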

        Problems with this explanation: Some  rows  of  the  list
        would  have  only a few numbers in them and thus would be
        unlikely to appear in B1(doi).  This is  contradicted  by
        the  string:  `ijkl'.   There are only 6 words that start
        with `j' and only 2 that start with `k' in the first  811
        words  of the DOI.  Some rows of the list would also have
        many more than 26 numbers and thus  shouldn't  appear  at
        all in B1.  Finally, the BCA newsletter (June 82) article 
        by  Aaron  mentions that the key to B1 was in a format of
	25 letters per line,  basing this observation on the bias
        of numbers toward the center of a  key  list.   (3/30/83:
        This  tendency is very weak; my modulo program shows only
        one significant peak in a chart as described by Aaron) 

        From the recent discussion  in  the  BCA  newsletter,  it
        seems that Ward really was the agent for the author.

        Modulo tests.  Wrote a program to display the remainders
        after division of the cipher elements.  For example,
        there is a definite bias in the remainders after division
        by 5 in all 3 ciphers:

	B1 % 5, mean: 86.20, sigma:  8.30
	B1 %5 = 0: 78                            5
	B1 %5 = 1:125                                            5++++
	B1 %5 = 2: 59                     5---
	B1 %5 = 3: 80                            5
	B1 %5 = 4: 89                               5

	B2 % 5, mean:138.00, sigma: 10.51
	B2 %5 = 0:187                                         5++++
	B2 %5 = 1:134                              5
	B2 %5 = 2:145                                5
	B2 %5 = 3:140                               5
	B2 %5 = 4: 84                   5-----

	B3 % 5, mean:117.80, sigma:  9.71
	B3 %5 = 0: 81                     5----
	B3 %5 = 1:152                                       5+++
	B3 %5 = 2:111                             5
	B3 %5 = 3:121                               5
	B3 %5 = 4:124                                5

        For each message, the expected number of remainders for a 
        completely random distribution  is  printed  (the  mean),
        followed  by  the  number  of counts corresponding to one
        standard deviation away  from  the  mean  (sigma).   Each
        subsequent line shows the remainder being calculated, the 
        number  of  cipher  elements  with  this remainder, and a
        graphical representation of the deviation.  +'s  and  -'s
        after  the charted number indicate the number of standard
        deviations away from the mean that the count  represents.
        Deviations of 3 sigma or more in either direction seem to
        be significant.
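
        A sketch of such a modulo test follows.  It is not the
        original program.  I assume sigma is the binomial value
        sqrt(n*p*(1-p)) with p = 1/modulus, which does reproduce the
        printed means and sigmas from the printed counts.  The rule
        for the number of +/- marks (truncating the deviation) is a
        guess and may not match the charts exactly.  The test list
        is synthetic, built only to have the B1 %5 counts above.

```python
import math
from collections import Counter

def modulo_test(cipher, modulus):
    """Return (mean, sigma, rows) for the remainders of `cipher`.

    mean  = n / modulus, the expected count per remainder if uniform
    sigma = sqrt(n * p * (1 - p)) with p = 1 / modulus (binomial)
    rows  = list of (remainder, count, deviation in sigmas)
    """
    n = len(cipher)
    if n == 0:
        raise ValueError("empty cipher")
    p = 1.0 / modulus
    mean = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    counts = Counter(x % modulus for x in cipher)
    rows = [(r, counts[r], (counts[r] - mean) / sigma)
            for r in range(modulus)]
    return mean, sigma, rows

def report(name, cipher, modulus):
    """Print one chart: a header line, then one line per remainder."""
    mean, sigma, rows = modulo_test(cipher, modulus)
    print(f"{name} % {modulus}, mean:{mean:6.2f}, sigma:{sigma:6.2f}")
    for r, count, dev in rows:
        # Assumed marking rule: one mark per whole sigma of deviation.
        marks = ("+" if dev > 0 else "-") * int(abs(dev))
        print(f"{name} %{modulus} = {r}:{count:3d}  {dev:+5.2f} {marks}")

# Synthetic list with the printed B1 %5 counts (not the real cipher).
b1_counts = [78, 125, 59, 80, 89]
fake_b1 = [r for r, c in enumerate(b1_counts) for _ in range(c)]
report("B1", fake_b1, 5)   # header shows mean: 86.20, sigma:  8.30
```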

        B2 prefers numbers evenly divisible by 5, while B3 avoids
        them.  The pattern for all 3 ciphers is similar: one
        remainder is preferred, one avoided, and the remaining
        ones are about random.

        It's not surprising to find a particular remainder
        preferred over others, but the pattern for the Beale
        ciphers is peculiar.  The excess use of a particular
        remainder is not balanced by a general avoidance of the
        other 4 remainders.  Instead, the deficit in a single other
        remainder accounts for the excess.  What could cause this?

        The patterns for B2%10 and B3%10 also show significant
        deviations from random:

	B2 % 10, mean: 69.00, sigma:  7.88
	B2 %10= 0:116                                                 10++++++
	B2 %10= 1: 60                           10-
	B2 %10= 2: 69                              10
	B2 %10= 3: 70                               10
	B2 %10= 4: 55                        10-
	B2 %10= 5: 71                               10
	B2 %10= 6: 74                                 10
	B2 %10= 7: 76                                  10
	B2 %10= 8: 70                               10
	B2 %10= 9: 29             10-----

	B3 % 10, mean: 58.90, sigma:  7.28
	B3 %10= 0: 30                10----
	B3 %10= 6: 87                                             10+++

        Again B2 prefers numbers evenly divisible by 10, and
        avoids numbers with remainders of 9.  B3 avoids evenly
        divisible numbers, and concentrates on remainders of 6
        (any number with a remainder of 6 when dividing by 10
        has a remainder of 1 when dividing by 5).

Conclusions/Observations:
	1) The original DOI was the key for B2; numbering errors
		all occur at line break boundaries of the original DOI.
	2) A side table arranged alphabetically was prepared before
		B1 or B2 was encoded.  The Gillogly strings contain
		elements that do not appear in B2.
	3) All 3 ciphers show a bias in the remainders after
		division by 5.
	4) A shift of 5 for elements above 604 will create/extend
		the Gillogly strings in B1.
	5) X=1005 in B2, but no word near 1005 contains an X.
	6) The Ward pamphlet contains the words 'for silver' as the
		cleartext for B2, but the cipher contains no such
		set of numbers.
	7) J.B.Ward was not the author of "The Beale Papers". 
