Split Unicode emoji into their own script

* admin/notes/unicode: Describe how to update emoji for new Unicode release.
* admin/unidata/Makefile.in: Pass emoji-data.txt to
blocks.awk script.
* admin/unidata/README: Add pointer to emoji-data.txt file.
* admin/unidata/blocks.awk: Parse emoji-data.txt, add emoji codepoints
to the 'emoji' script (except for the ASCII ones).
* admin/unidata/emoji-data.txt: New file.
* etc/NEWS: Describe new 'emoji' script.
* etc/TODO: Update item about 'emoji' script.
* lisp/international/fontset.el (script-representative-chars): Add
'emoji' script.
(setup-default-fontset): Add 'emoji' script.  Use "Noto Color Emoji"
as default font for it.
Robert Pluim 2021-09-14 19:07:03 +02:00
parent 9ca737c419
commit 12d2fb58c4
8 changed files with 1388 additions and 13 deletions

@@ -16,13 +16,14 @@ Emacs uses the following files from the Unicode Character Database
 . IVD_Sequences.txt
 . NormalizationTest.txt
 . SpecialCasing.txt
+. emoji-data.txt
 . BidiCharacterTest.txt

-First, the first 7 files need to be copied into admin/unidata/, and
+First, the first 8 files need to be copied into admin/unidata/, and
 the file https://www.unicode.org/copyright.html should be copied over
-copyright.html in admin/unidata (that file might need trailing
-whitespace removed before it can be committed to the Emacs
-repository).
+copyright.html in admin/unidata (that file and emoji-data.txt might
+need trailing whitespace removed before they can be committed to the
+Emacs repository).

 Then Emacs should be rebuilt for them to take effect. Rebuilding
 Emacs updates several derived files elsewhere in the Emacs source
@@ -85,8 +86,34 @@ modified to follow suit. If there's trailing whitespace in
 BidiCharacterTest.txt, it should be removed before committing the new
 version.

-etc/NEWS should be updated to announce the support for the new Unicode
-version.
+Visit "emoji-data.txt" with the rebuilt Emacs, and check that an
+appropriate font is being used for the emoji (by default Emacs uses
+"Noto Color Emoji"). Running the following command in that buffer
+will give you an idea of which codepoints are not supported by
+whichever font Emacs is using.
+
+(defun check-emoji-coverage (font-name-regexp)
+  "Display a buffer containing emoji codepoints for which FONT-NAME is not used.
+This must be run from a buffer in the format of emoji-data.txt.
+FONT-NAME-REGEXP is checked using `string-match'."
+  (interactive "MFont Name: ")
+  (save-excursion
+    (goto-char (point-min))
+    (let (res char name ifont)
+      (while (re-search-forward "; Emoji [^(]+(\\(.\\)[).\uFE0F]" nil t)
+        (setq char (aref (match-string 1) 0))
+        (setq ifont (car (internal-char-font nil char)))
+        (when ifont
+          (setq name (font-xlfd-name ifont)))
+        (if (or (not ifont) (not (string-match font-name-regexp name)))
+            (setq res (concat (string char) res))))
+      (when res
+        (with-output-to-temp-buffer "*Check-Emoji-Coverage*"
+          (princ (format "Font not matching '%s' was used for the following characters:\n%s"
+                         font-name-regexp (reverse res))))))))
+
+Finally, etc/NEWS should be updated to announce the support for the
+new Unicode version.

 Problems, fixmes and other unicode-related issues
 -------------------------------------------------------------