From xemacs-m  Mon Dec  2 01:54:18 1996
Received: from venus.Sun.COM (venus.Sun.COM [192.9.25.5]) by xemacs.cs.uiuc.edu (8.8.3/8.8.3) with SMTP id BAA26684 for <xemacs-beta@xemacs.org>; Mon, 2 Dec 1996 01:54:17 -0600 (CST)
Received: from Eng.Sun.COM ([129.146.1.25]) by venus.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id XAA07373; Sun, 1 Dec 1996 23:53:13 -0800
Received: from kindra.eng.sun.com by Eng.Sun.COM (SMI-8.6/SMI-5.3)
	id XAA28250; Sun, 1 Dec 1996 23:53:11 -0800
Received: from xemacs.eng.sun.com by kindra.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA11313; Sun, 1 Dec 1996 23:53:10 -0800
Received: by xemacs.eng.sun.com (SMI-8.6/SMI-SVR4)
	id XAA19995; Sun, 1 Dec 1996 23:53:08 -0800
Date: Sun, 1 Dec 1996 23:53:08 -0800
Message-Id: <199612020753.XAA19995@xemacs.eng.sun.com>
From: Martin Buchholz <mrb@Eng.Sun.COM>
To: Kenichi Handa <handa@etlken.etl.go.jp>
Cc: mule@etl.go.jp, xemacs-beta@xemacs.org, Bob Brewin <brewin@Eng.Sun.COM>,
        Teruhiko Kurosaka <Teruhiko.Kurosaka@Japan.Sun.COM>
Subject: Re: mule API of Emacs and XEmacs (Re: caesar-region)
In-Reply-To: <199610181120.UAA22219@etlken.etl.go.jp>
References: <199610170721.QAA26100@mikan.jaist.ac.jp>
	<199610181120.UAA22219@etlken.etl.go.jp>
Reply-To: Martin Buchholz <mrb@Eng.Sun.COM>
Mime-Version: 1.0 (generated by tm-edit 7.94)
Content-Type: text/plain; charset=US-ASCII

>>>>> "KH" == Kenichi Handa <handa@etlken.etl.go.jp> writes:

KH>      In XEmacs/mule, function `charset-chars' returns 94 or 96, function
KH>    `charset-dimension' returns 1 or 2.
KH> 	   (charset-chars 'japanese) -> 94
KH> 	   (charset-dimension 'japanese) -> 2
KH> 	   (charset-chars 'latin-1) -> 96
KH> 	   (charset-dimension 'latin-1) -> 1

Ben Wing did the original XEmacs Mule implementation.  I am currently
maintaining it.

(charset-dimension) describes the encoding space, not the display
space.  For example,

(eq (charset-dimension 'japanese) 2) means that you need 2 octets to encode
a character, and therefore must pass 2 octet arguments to (make-char):

make-char: (CHARSET ARG1 &optional ARG2)
  -- a built-in function.
Make a multi-byte character from CHARSET and octets ARG1 and ARG2.
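
As a rough sketch of how the two functions fit together, a wrapper can
check the octet count against (charset-dimension) before calling
(make-char).  The name my-make-char-checked is hypothetical and exists
in neither emacs; the sketch assumes only that (charset-dimension)
returns 1 or 2 for the charsets being used.

```elisp
;; Hypothetical helper, for illustration only: refuse to call
;; `make-char' unless the number of octets supplied matches the
;; dimension of CHARSET.
(defun my-make-char-checked (charset octet1 &optional octet2)
  "Make a character in CHARSET from OCTET1 and optional OCTET2.
Signal an error if the octet count differs from `charset-dimension'."
  (let ((dimension (charset-dimension charset))
        (supplied (if octet2 2 1)))
    (unless (= dimension supplied)
      (error "Charset %s needs %d octet(s), got %d"
             charset dimension supplied))
    (if octet2
        (make-char charset octet1 octet2)
      (make-char charset octet1))))
```

With this, (my-make-char-checked 'japanese-jisx0208 36 34) goes through
to (make-char), while passing only one octet for the same charset
signals an error.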

On the other hand, (charset-columns) gives the display width (in
character cells) of a character on a tty.  This is not necessarily
the same as (charset-dimension), although the two often coincide in
practice.  On an X display, (charset-columns) approximates the actual
displayed width of a character better than assuming that all
characters are the same size (especially if fonts are chosen with
this in mind), so (charset-columns) is used by the column-manipulating
functions like (current-column).  I think a function named
(charset-width) ought to return the actual displayed width of a
character, but that is font-dependent and therefore undefined.  So we
are better off not having a (char-width) or (charset-width) function
at all, and I like XEmacs' decision to have (charset-dimension) and
(charset-columns), but not (charset-width).

But I am willing to change XEmacs' (charset-columns) function name to
(charset-width), for compatibility reasons.

KH>    Registered name of charsets are not systematic.  So systematic named
KH>    alias may be comfortable for users.  Charset defined in RFC 1922 is
KH>    systematic.  One way is to use this naming rule, like:
KH> 	   coding-system-<script>-<charset>-<edition>
KH>    For example:
KH> 	   coding-system-ja-jis-1978
KH> 	   coding-system-ja-jis-1990
KH> 	   coding-system-ja-sjis-1978
KH> 	   coding-system-ja-sjis-1990
KH> 	   coding-system-kr

I agree that we should use international standards to name charsets
and coding-systems.  Can't we just copy the names from the relevant
RFCs?  But I object to the prefixes `coding-system-' and `charset-',
although I can be swayed on this point.  An interface that could be
common to both emacsen is (get-charset):

get-charset: (NAME)
  -- a built-in function.
Retrieve the charset of the given name.
Same as `find-charset' except an error is signalled if there is no such
charset instead of returning nil.

So (get-charset 'japanese-jisx0208) would be an API that could work
with both emacsen.  With GNU Emacs the implementation could be:
(defun get-charset (symbol)
  "Return the value of the variable named charset-SYMBOL."
  (symbol-value (intern (concat "charset-" (symbol-name symbol)))))

But I can see why you would not like this (for aesthetic reasons).

BTW, I have been making some changes for compatibility (!).
My latest XEmacs returns this for (charset-list):

(chinese-cns11643-7 ethiopic korean-ksc5601 chinese-big5-2
japanese-jisx0212 japanese-jisx0201-roman latin-5 arabic latin-3
control-1 chinese-cns11643-3 arabic-0 sisheng chinese-cns11643-5
chinese-cns11643-2 chinese-big5-1 japanese-jisx0208
japanese-jisx0201-kana hebrew cyrillic latin-2 ascii arabic-2
vietnamese-upper ipa chinese-cns11643-6 composite chinese-cns11643-1
chinese-gb japanese-jisx0208-1978 thai greek latin-4 latin-1
chinese-cns11643-4 arabic-1 vietnamese-lower)

There are several fundamental design differences between Emacs/Mule
and XEmacs/Mule.  I really don't like having a charset be simply an
integer.  One cannot write a robust (charsetp) function, for example.

Mule 19.33-delta returns this for (charset-list):

(0 129 130 131 132 133 134 135 136 137 138 140 141 144 145 146 147 148
149 150 151 152 153 160 161 162 163 164 165 166 224 245 246 247 248
249 250)
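
To make the (charsetp) objection concrete, here is a minimal sketch of
the best predicate available when charsets are plain integers.  The
names my-mule-charset-ids and my-naive-charsetp are hypothetical; the
id list is simply copied from the output above.

```elisp
;; Charset ids copied from the Mule 19.33-delta (charset-list) output.
(defvar my-mule-charset-ids
  '(0 129 130 131 132 133 134 135 136 137 138 140 141 144 145 146 147
    148 149 150 151 152 153 160 161 162 163 164 165 166 224 245 246
    247 248 249 250)
  "Hypothetical snapshot of integer charset ids, for illustration.")

;; Hypothetical predicate: the only test possible is membership in
;; the id list, so any integer that happens to be in the list passes.
(defun my-naive-charsetp (object)
  "Return t if OBJECT is an integer found in `my-mule-charset-ids'."
  (and (integerp object)
       (memq object my-mule-charset-ids)
       t))
```

Here (my-naive-charsetp 129) is true whether 129 was meant as a charset,
a character code, or a line count, which is why such a predicate cannot
be robust.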

Perhaps even XEmacs' (charset-list) function is broken.  It returns a
list of symbols (that can be used to look up charsets), but not the
charset objects themselves.  Contrast this with what (buffer-list)
returns (in both emacsen!):

(#<buffer *scratch*> #<buffer *Minibuf-1*> #<buffer *Messages*>
#<buffer *Help*> #<buffer *Apropos*>)

Perhaps what XEmacs' (charset-list) function should return is what is
now returned by (mapcar #'get-charset (charset-list)):

(#<charset chinese-cns11643-7 "Chinese CNS Plane 7" 94x94 l2r cols=2
g0 final='M' reg=CNS11643[.-]\(.*[.-]\)?7$ 0x760> #<charset ethiopic
"Ethiopic" 94x94 l2r cols=2 g0 final='2' reg=Ethio 0x782> #<charset
korean-ksc5601 "Korean" 94x94 l2r cols=2 g0 final='C' reg=KSC5601
0x217> ....)

Martin

