Merged in CHARACTERS
This commit is contained in:
parent
462e90c9f9
commit
33d92c1f9d
1 changed files with 57 additions and 0 deletions
57
etc/TO-DO
57
etc/TO-DO
|
@ -24,3 +24,60 @@ Things useful to do for GNU Emacs:
|
|||
inverse video.
|
||||
|
||||
* VMS code to list a file directory. Make dired work.
|
||||
|
||||
Long range:
|
||||
|
||||
Ideas for extending GNU Emacs to deal with arbitrary character sets.
|
||||
|
||||
I would like GNU Emacs to be extended to handle all the world's alphabets
|
||||
and word signs. I don't expect to have time to do such a thing in the next
|
||||
few years, so here are my ideas on the best way to do it.
|
||||
|
||||
* Each graphic is represented by a sequence of ordinary 8-bit characters.
|
||||
|
||||
* All the characters that make up such a sequence have codes >= 0200.
|
||||
|
||||
* The first character of such a sequence is between 0200 and 0237.
|
||||
|
||||
* The remaining characters of such a sequence are all 0240 or higher.
|
||||
|
||||
* The first character of the sequence determines the number of characters
|
||||
in the sequence. Thus, 0200...0207 could start two-character sequences,
|
||||
0210...0227 could start three-character sequences, and 0230 could start
|
||||
four-character sequences. (Codes 0231...0237 would be reserved.)
|
||||
|
||||
* Several common alphabets, and some mathematical symbols, would get
|
||||
two-character sequences. (Probably Greek, Russian, Hebrew(?), Arabic(?),
|
||||
Korean, and Japanese kana). The remaining alphabets, and some versions of
|
||||
Chinese, would get three-character sequences. Other sets of Chinese
|
||||
characters would get four-character sequences.
|
||||
|
||||
Each country that uses Chinese characters has its own standard character
|
||||
set, and it is not easy to correlate them to avoid overlap. So there may
|
||||
need to be several sets of Chinese characters. That is why they need so
|
||||
much code space.
|
||||
|
||||
True support for Hebrew and Arabic requires dealing with the problem of
|
||||
writing direction for mixed text; I don't know what to do for that.
|
||||
|
||||
* The functions that use syntax table would determine the
|
||||
syntax of a sequence from its first character.
|
||||
|
||||
* Functions in indent.c for computing widths and columns would
|
||||
determine the width of a sequence from its first character.
|
||||
So would display routines.
|
||||
|
||||
* Only a few other editing routines would need any change. In
|
||||
particular, searching and regexp matching might not need any change.
|
||||
|
||||
* Most of the work required would be in redisplay. The only case that
|
||||
needs to be supported is with X windows, since ordinary terminals
|
||||
can't display all these characters anyway.
|
||||
|
||||
* There might need to be code to translate files from this format
|
||||
to whatever format is typically stored on disk.
|
||||
|
||||
|
||||
I would be very unhappy with half-measures, such as support for
|
||||
Japanese only.
|
||||
|
||||
|
|
Loading…
Add table
Reference in a new issue