* add_co(a,a) and shiftl_co(a,1) are identical.  Affects m68k, x86, pyr, etc.
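
  A minimal C sketch of why the two coincide, assuming 32-bit words.  The
  functions below are illustrative models written for this note, not the
  simulation macros themselves, and co_result and same_result are
  hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit result together with its carry-out. */
typedef struct { uint32_t value; unsigned carry; } co_result;

/* Add with carry-out: carry is set when the unsigned sum wraps. */
static co_result add_co(uint32_t a, uint32_t b)
{
    co_result r;
    r.value = a + b;
    r.carry = r.value < a;
    return r;
}

/* Shift left with carry-out: carry is the last bit shifted out. */
static co_result shiftl_co(uint32_t a, unsigned n)
{
    co_result r;
    assert(n >= 1 && n <= 31);
    r.value = a << n;
    r.carry = (a >> (32 - n)) & 1u;
    return r;
}

/* Nonzero if both forms give the same value and the same carry for a. */
static int same_result(uint32_t a)
{
    co_result x = add_co(a, a);
    co_result y = shiftl_co(a, 1);
    return x.value == y.value && x.carry == y.carry;
}
```

  Since both forms always agree, the search only needs to try one of them
  on targets that have both.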

* Try "subcc %g0,s1,%g0" on SPARC?

* Adding the bsfl instruction revealed a deficiency: We can't deal with
  instructions that give an undefined result for some inputs.  This is so
  because the sequences might fail to work only when the undefined result
  happens to become a certain value.  To cope with this, we have to make
  test_sequence try lots of values, but it can only do that if it knows
  about these instructions.
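
  The problem can be sketched in C as follows.  Everything here is
  hypothetical (bsf_model, candidate, passes, and the idea of passing the
  undefined result in as a parameter); it only illustrates why one trial
  value is not enough.  The candidate "bit scan, then add 1" computes ffs
  correctly for zero input only if the undefined result happens to be all
  ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word;

/* Model of a bit-scan-forward: index of the lowest set bit.  For x == 0
   the real result is undefined, so the caller supplies a stand-in. */
static word bsf_model(word x, word undef_fill)
{
    word i = 0;
    if (x == 0)
        return undef_fill;
    while (((x >> i) & 1u) == 0)
        i++;
    return i;
}

/* Goal: ffs(x), i.e. 0 for x == 0, else 1 + index of the lowest set bit. */
static word goal_ffs(word x)
{
    return x == 0 ? 0 : bsf_model(x, 0) + 1;
}

/* Candidate sequence "bsf, then add 1". */
static word candidate(word x, word undef_fill)
{
    return bsf_model(x, undef_fill) + 1;
}

/* Accept the candidate only if it matches the goal for every input
   under every stand-in value tried. */
static int passes(const word *inputs, size_t n_inputs,
                  const word *fills, size_t n_fills)
{
    size_t i, j;
    assert(n_fills > 0);
    for (i = 0; i < n_inputs; i++)
        for (j = 0; j < n_fills; j++)
            if (candidate(inputs[i], fills[j]) != goal_ffs(inputs[i]))
                return 0;
    return 1;
}
```

  With the single stand-in 0xFFFFFFFF the bogus candidate is accepted;
  adding any other stand-in value rejects it.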

* Now we require equality between a computed goal value and a computed
  result.  Permit fuzzier functions, like "something negative".  E.g., a
  fuzzy sgn function might be useful.
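
  One way to picture a fuzzy goal is as an acceptance predicate on (input,
  result) pairs instead of a single expected value.  The sketch below is an
  assumption about how that could look; sign_class and fuzzy_sgn_accepts
  are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* -1, 0 or +1 according to the sign of v. */
static int sign_class(int32_t v)
{
    return (v > 0) - (v < 0);
}

/* Fuzzy sgn goal: any result with the same sign as the input is accepted. */
static int fuzzy_sgn_accepts(int32_t input, int32_t result)
{
    return sign_class(input) == sign_class(result);
}

/* Usage: several different results are acceptable for one input. */
static void fuzzy_sgn_examples(void)
{
    assert(fuzzy_sgn_accepts(-5, -1));
    assert(fuzzy_sgn_accepts(-5, -77));
    assert(!fuzzy_sgn_accepts(-5, 0));
    assert(fuzzy_sgn_accepts(0, 0));
    assert(fuzzy_sgn_accepts(7, 1000));
}
```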

* Most importantly: Generalize the class of possible goal functions.  Allow
  them to be any mapping from a vector of words to another vector of words,
  each of arbitrary length.

  To make it fast, record after each instruction if it generates a value
  that is in (the vector) goal_value, and prune a sequence if it has not
  produced N-M requested values when M more instructions are allowed [N the
  number of words in goal_value].

  We should split `synth'.  The leaf search `synth' function could be
  written like currently, but with the leaf-test "if (allowed_cost > 0)"
  removed.  The non-leaf `synth' needs to loop and look for the generated
  value in goal_value.  To avoid massive code replication, we have to put
  the synth function in a separate file, and play with cpp and #include.

  Make sure to handle the case where you find all values before the last
  instruction.  This might be non-trivial!  We know that we have to use the
  value from the ultimate instruction, otherwise we would have found this
  sequence before.  Problem is, we will either have to loop and look for
  the value in goal_value, or, probably much better, just accept the
  sequence.
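
  The N-M pruning rule from this item could be sketched as below.  The
  names count_found and should_prune are hypothetical.  For simplicity the
  sketch rescans all values, whereas the item proposes recording hits
  after each instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of goal words that appear among the values computed so far. */
static size_t count_found(const uint32_t *goal_value, size_t n_goal,
                          const uint32_t *values, size_t n_values)
{
    size_t found = 0, i, j;
    for (i = 0; i < n_goal; i++)
        for (j = 0; j < n_values; j++)
            if (values[j] == goal_value[i]) {
                found++;
                break;
            }
    return found;
}

/* Prune when fewer than N-M goal words have been produced and only M
   more instructions are allowed (N = n_goal, M = remaining). */
static int should_prune(size_t n_goal, size_t n_found, size_t remaining)
{
    assert(n_found <= n_goal);
    return n_found + remaining < n_goal;
}
```

  The test is written as n_found + remaining < n_goal to avoid unsigned
  underflow when M exceeds N.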

* Add -test-on-cpu option triggering a mechanism for testing the generated
  sequences on the real hardware.  That would help debug the simulation
  code.

* I'd like to have a means to declare that a goal function is not defined
  for all possible input values.  An extra parameter, ALLOWED_ARGUMENTS, to
  DEF_GOAL could take care of that.

  Also I'd like the user to have the possibility to add a list of immediate
  values to try for each goal function.  For example, 31 and 32 could be
  useful for ffs.

* Make it possible to handle more immediate values, for example by putting
  them in the immediate_val array.

* Interpret goal functions so the user doesn't need to recompile.
  Interpretation would make goal function evaluation slower than it is now,
  but goal function evaluation is not critical.

* Add code to algebraically prove that generated sequences are correct.

* Add bsrl/bsfl and bfffo to CISC synth.

* Check that PERFORM_CLZ works like RS/6000's cntlz and 29k's clz.  Is it
  ok for input == 0?

* A major speed improvement would be to make independent insns have a
  canonical order.  Consider `gts' on the SPARC.  This is probably not very
  hard, if insns are enumerated in some clever way and loop variables are
  passed down.  A very simple but potentially quite powerful mechanism: If
  the putative instruction doesn't depend on the last instruction, compare
  the putative instruction's opcode with the last instruction's opcode, and
  proceed iff, say, the < relation holds.

  After an instruction that sets carry (and there is another instruction
  with the same effect apart from that it doesn't affect carry), the
  generated carry has to be used.  [Fix this with a reservation vector
  --allow both making and deleting a reservation.  Make reservation when
  carry is generated and delete it when it is used.]  The leaf instructions
  have to input carry if an unused carry is pending.

  Make sure all computed values are used by subsequent instructions.  For
  example, if we have just two more values to compute and three yet unused
  values, the last two instructions have to restrict their input operands.
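
  The opcode-ordering test from the first paragraph of this item could be
  sketched as below.  The insn struct and both function names are
  hypothetical, and the dependency check is deliberately simplified: it
  only looks at register reads of the previous destination and ignores
  carry and other flags.

```c
#include <assert.h>

typedef struct {
    int opcode;
    int dest;        /* register written */
    int src1, src2;  /* registers read */
} insn;

/* Nonzero if next reads the register that prev wrote. */
static int depends_on(const insn *next, const insn *prev)
{
    return next->src1 == prev->dest || next->src2 == prev->dest;
}

/* Dependent pairs are always allowed.  Independent pairs are allowed
   only in increasing opcode order, so one of the two orderings is cut. */
static int canonical_order_ok(const insn *prev, const insn *next)
{
    assert(prev && next);
    if (depends_on(next, prev))
        return 1;
    return prev->opcode < next->opcode;
}
```

  With a strict < relation, two independent instructions with the same
  opcode are rejected in both orders; whether that is wanted, or <= should
  be used instead, is left open here.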

* Efficient pruning of sequences not using generated resources:

  Each generated instruction should record its computed 'resources' in a
  list of unused resources.  (A written register is such a resource, and the
  carry flag is such a resource.)  When a resource is used by an
  instruction, it's removed from the list.

  At each recursion, we check that the unused resources can be consumed
  with the allowed number of instructions.  If not, we back-track.

  Beware: A resource is not 'consumed' when it has been used.  I have seen
  optimal sequences that use a generated carry more than once.
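
  A minimal sketch of this bookkeeping, using a bit mask in place of a
  list.  All names (res_state, RES_CARRY, RES_REG, must_backtrack) are
  hypothetical, and the bound "each remaining instruction can use at most
  max_inputs resources" is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define RES_CARRY  (1u << 31)   /* hypothetical bit for the carry flag */
#define RES_REG(n) (1u << (n))  /* hypothetical bit for register n, n < 31 */

typedef struct { uint32_t unused; } res_state;

/* Record newly generated resources as not yet used. */
static void res_generate(res_state *s, uint32_t mask) { s->unused |= mask; }

/* Mark resources as used.  They stay available for further uses; only
   the "unused" mark is cleared, so a second use is harmless. */
static void res_use(res_state *s, uint32_t mask) { s->unused &= ~mask; }

/* Count the set bits in x. */
static unsigned count_bits(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Back-track when the remaining instructions cannot possibly use all
   pending resources. */
static int must_backtrack(const res_state *s, unsigned remaining,
                          unsigned max_inputs)
{
    assert(max_inputs > 0);
    return count_bits(s->unused) > remaining * max_inputs;
}
```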

* Shift 32 steps on 68k is well-defined.  LSHIFTR_CO can be used to zero a
  word and simultaneously move the sign bit to the X flag, ASHIFTR_CO can
  be used to propagate the sign bit to the whole word and to the X flag.
  Useful?
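
  The two effects described above can be modelled in C roughly as follows.
  This is a sketch under the behaviour stated in this item, not a checked
  model of the 68k; shift_result and the lower-case function names are
  hypothetical.  Note that C itself leaves a 32-bit shift by 32 undefined,
  so that count is handled separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: shifted word plus the X flag. */
typedef struct { uint32_t value; unsigned x; } shift_result;

/* Logical shift right by n, 1 <= n <= 32; X gets the last bit shifted
   out.  For n == 32 that is the sign bit, and the word becomes zero. */
static shift_result lshiftr_co(uint32_t a, unsigned n)
{
    shift_result r;
    assert(n >= 1 && n <= 32);
    r.value = n == 32 ? 0 : a >> n;
    r.x = (a >> (n - 1)) & 1u;
    return r;
}

/* Arithmetic shift right by n, 1 <= n <= 32.  For n == 32 the sign bit
   fills the whole word and also ends up in X. */
static shift_result ashiftr_co(uint32_t a, unsigned n)
{
    shift_result r;
    uint32_t sign_fill = (a >> 31) ? 0xFFFFFFFFu : 0;
    assert(n >= 1 && n <= 32);
    if (n == 32)
        r.value = sign_fill;
    else
        r.value = (a >> n) | (sign_fill << (32 - n));
    r.x = (a >> (n - 1)) & 1u;
    return r;
}
```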

* Model the exact timing, i.e., instruction overlap, superscalar issue,
  etc.  Requires modelling the CPU internal function units.

* `386: bt, clc, cmc, cdq[0->1], lea, shld, shrd, stc.

* Make the instruction description cleaner.  Something of this kind would
  be great:

  88k:
	{ADD,		"addu		%d{r},%1{r,0},%2{r,[0-FFFF]}"},
	{ADD_CI,	"addu.ci	%d{r},%1{r,0},%2{r,[0-FFFF]}"},
	...

  sparc:
	{ADD,		"add		%1{r,0},%2{r,[-1000,+FFF]},%d{r}"},
	{ADD_CI,	"addx		%1{r,0},%2{r,[-1000,+FFF]},%d{r}"},
	...

  We would need a tool to extract the information and generate a 'synth'
  function.  (That instruction description format would be useful to
  assemblers, disassemblers, and simulators too.)

* Include a 'synth' function for several targets in one gso binary.  Have a
  command line option -t<target> select which one to use.
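
  A table-driven lookup is one way this could work.  The sketch below is
  an assumption, not existing code: the per-target function names, the
  target_entry struct, find_target, and the two target names in the table
  are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*synth_fn)(void);

/* Hypothetical stand-ins for the per-target synth functions. */
static void synth_sparc(void) {}
static void synth_m88k(void) {}

struct target_entry { const char *name; synth_fn synth; };

static const struct target_entry targets[] = {
    {"sparc", synth_sparc},
    {"m88k",  synth_m88k},
};

/* Map an argument of the form -t<target> to its table entry.  Returns
   NULL if arg is not a -t option or names an unknown target. */
static const struct target_entry *find_target(const char *arg)
{
    size_t i;
    assert(arg != NULL);
    if (strncmp(arg, "-t", 2) != 0)
        return NULL;
    for (i = 0; i < sizeof targets / sizeof targets[0]; i++)
        if (strcmp(arg + 2, targets[i].name) == 0)
            return &targets[i];
    return NULL;
}
```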
