Import Unicode 12.0 data files

* admin/unidata/copyright.html:
* admin/unidata/UnicodeData.txt:
* admin/unidata/SpecialCasing.txt:
* admin/unidata/NormalizationTest.txt:
* admin/unidata/Blocks.txt:
* admin/unidata/BidiMirroring.txt:
* admin/unidata/BidiBrackets.txt: New versions from Unicode 12.0.
* admin/unidata/unidata-gen.el (unidata-gen-file):
* admin/unidata/blocks.awk (name2alias): Adapt to changes in
new data files.
* admin/notes/unicode: Update and improve instructions for
importing a new Unicode Standard.

* lisp/international/characters.el (char-width-table): Update
lists of characters according to Unicode 12.0.
* lisp/international/fontset.el (script-representative-chars):
Add characters from new scripts to 'script-representative-chars'.
(otf-script-alist): Update according to data on the MS site.
* lisp/international/mule-cmds.el (ucs-names): Update unused
ranges of codepoints according to Unicode 12.0.

* test/lisp/international/ucs-normalize-tests.el
(ucs-normalize-tests--failing-lines-part1)
(ucs-normalize-tests--failing-lines-part2): Update for the new
NormalizationTest.txt file.
* test/manual/BidiCharacterTest.txt: Update with the new
version from Unicode 12.0.
Eli Zaretskii 2019-03-09 12:41:48 +02:00
parent 4e082ce394
commit fddb915d23
15 changed files with 791 additions and 211 deletions
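
The admin/notes/unicode change in this commit states that double-width characters are those marked W or F in EastAsianWidth.txt. As a rough aid for checking char-width-table against that file, the extraction could be sketched as below. This is not part of the commit: the function name `wide_ranges` is hypothetical, the sample lines are only illustrations of the file's format, and the code assumes the usual UCD layout (`XXXX..YYYY;W # comment`, sorted by codepoint).

```python
import re

# Matches UCD-style lines such as "1100..115F;W # ..." or "3000;F # ...".
# Comment lines (starting with "#") and blank lines do not match.
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def wide_ranges(lines):
    """Return merged (start, end) codepoint ranges whose width class is W or F.

    Assumes the input is sorted by codepoint, as the UCD files are.
    """
    ranges = []
    for line in lines:
        m = _LINE.match(line)
        if not m or m.group(3) not in ("W", "F"):
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        if ranges and start <= ranges[-1][1] + 1:
            # Adjacent to (or overlapping) the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Illustrative lines in the file's format, not a copy of the real data.
    sample = [
        "# EastAsianWidth sample",
        "0041..005A;Na # LATIN CAPITAL LETTERS",
        "1100..115F;W  # HANGUL CHOSEONG",
        "3000;F        # IDEOGRAPHIC SPACE",
        "3001..3003;W  # CJK punctuation",
    ]
    for start, end in wide_ranges(sample):
        print(f"(#x{start:04X} . #x{end:04X})")
```

The output is printed as Lisp-style cons pairs only to make manual comparison with the ranges in characters.el easier; zero-width characters are not covered, since the notes say they come from the General Category property rather than from EastAsianWidth.txt.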

@@ -11,15 +11,20 @@ Emacs uses the following files from the Unicode Character Database

 . UnicodeData.txt
 . Blocks.txt
-. BidiMirroring.txt
 . BidiBrackets.txt
+. BidiCharacterTest.txt
+. BidiMirroring.txt
 . IVD_Sequences.txt
 . NormalizationTest.txt
 . SpecialCasing.txt
-. BidiCharacterTest.txt

 First, the first 7 files need to be copied into admin/unidata/, and
-then Emacs should be rebuilt for them to take effect. Rebuilding
+the file https://www.unicode.org/copyright.html should be copied over
+copyright.html in admin/unidata (that file might need trailing
+whitespace removed before it can be committed to the Emacs
+repository).
+
+Then Emacs should be rebuilt for them to take effect. Rebuilding
 Emacs updates several derived files elsewhere in the Emacs source
 tree, mainly in lisp/international/.

@@ -28,7 +33,10 @@ files, pay attention to any warning or error messages. In particular,
 admin/unidata/unidata-gen.el will complain if UnicodeData.txt defines
 new bidirectional attributes of characters, because unidata-gen.el,
 bidi.c and dispextern.h need to be updated in that case; failure to do
-so will cause aborts in redisplay.
+so will cause aborts in redisplay. unidata-gen.el will also complain
+if the format of the Unicode Copyright notice in copyright.html
+changed in significant ways; in that case, update the regular
+expression in unidata-gen-file used to extract the copyright string.

 Next, review the changes in UnicodeData.txt vs the previous version
 used by Emacs. Any changes, be it introduction of new scripts or
@@ -40,7 +48,12 @@ and see if any changes in admin/unidata/blocks.awk are required.

 The setting of char-width-table around line 1200 of characters.el
 should be checked against the latest version of the Unicode file
-EastAsianWidth.txt, and any discrepancies fixed.
+EastAsianWidth.txt, and any discrepancies fixed: double-width
+characters are those marked with W or F in that file. Zero-width
+characters are not taken from EastAsianWidth.txt, they are those whose
+Unicode General Category property is one of Mn, Me, or Cf, and also
+Hangul jungseong and jongseong characters (a.k.a. "Jamo medial vowels"
+and "Jamo final consonants").

 Any new scripts added by UnicodeData.txt will also need updates to
 script-representative-chars defined in fontset.el, and also the list