Split Unicode emoji into their own script

* admin/notes/unicode: Describe how to update emoji for new Unicode release.
* admin/unidata/Makefile.in: Pass emoji-data.txt to
blocks.awk script.
* admin/unidata/README: Add pointer to emoji-data.txt file.
* admin/unidata/blocks.awk: Parse emoji-data.txt, add emoji codepoints
to the 'emoji' script (except for the ASCII ones).
* admin/unidata/emoji-data.txt: New file.
* etc/NEWS: Describe new 'emoji' script.
* etc/TODO: Update item about 'emoji' script.
* lisp/international/fontset.el (script-representative-chars): Add
'emoji' script.
(setup-default-fontset): Add 'emoji' script.  Use "Noto Color Emoji"
as default font for it.
Robert Pluim 2021-09-14 19:07:03 +02:00
parent 9ca737c419
commit 12d2fb58c4
8 changed files with 1388 additions and 13 deletions

admin/notes/unicode

@@ -16,13 +16,14 @@ Emacs uses the following files from the Unicode Character Database
. IVD_Sequences.txt
. NormalizationTest.txt
. SpecialCasing.txt
. emoji-data.txt
. BidiCharacterTest.txt
First, the first 7 files need to be copied into admin/unidata/, and
First, the first 8 files need to be copied into admin/unidata/, and
the file https://www.unicode.org/copyright.html should be copied over
copyright.html in admin/unidata (that file might need trailing
whitespace removed before it can be committed to the Emacs
repository).
copyright.html in admin/unidata (that file and emoji-data.txt might
need trailing whitespace removed before they can be committed to the
Emacs repository).
Then Emacs should be rebuilt for them to take effect. Rebuilding
Emacs updates several derived files elsewhere in the Emacs source
@@ -85,8 +86,34 @@ modified to follow suit. If there's trailing whitespace in
BidiCharacterTest.txt, it should be removed before committing the new
version.
etc/NEWS should be updated to announce the support for the new Unicode
version.
Visit "emoji-data.txt" with the rebuilt Emacs, and check that an
appropriate font is being used for the emoji (by default Emacs uses
"Noto Color Emoji"). Running the following command in that buffer
will give you an idea of which codepoints are not supported by
whichever font Emacs is using.
(defun check-emoji-coverage (font-name-regexp)
"Display a buffer containing emoji codepoints for which FONT-NAME is not used.
This must be run from a buffer in the format of emoji-data.txt.
FONT-NAME-REGEXP is checked using `string-match'."
(interactive "MFont Name: ")
(save-excursion
(goto-char (point-min))
(let (res char name ifont)
(while (re-search-forward "; Emoji [^(]+(\\(.\\)[).\uFE0F]" nil t)
(setq char (aref (match-string 1) 0))
(setq ifont (car (internal-char-font nil char)))
(when ifont
(setq name (font-xlfd-name ifont)))
(if (or (not ifont) (not (string-match font-name-regexp name)))
(setq res (concat (string char) res))))
(when res
(with-output-to-temp-buffer "*Check-Emoji-Coverage*"
(princ (format "Font not matching '%s' was used for the following characters:\n%s"
font-name-regexp (reverse res))))))))
Finally, etc/NEWS should be updated to announce the support for the
new Unicode version.
Problems, fixmes and other unicode-related issues
-------------------------------------------------------------

admin/unidata/Makefile.in

@@ -81,8 +81,10 @@ charscript.el: ${unidir}/charscript.el
blocks = ${srcdir}/blocks.awk
${unidir}/charscript.el: ${srcdir}/Blocks.txt ${blocks}
$(AM_V_GEN)$(AWK) -f ${blocks} < $< > $@
${unidir}/charscript.el: ${blocks}
${unidir}/charscript.el: ${srcdir}/Blocks.txt ${srcdir}/emoji-data.txt
$(AM_V_GEN)$(AWK) -f ${blocks} $^ > $@
.PHONY: clean bootstrap-clean distclean maintainer-clean gen-clean

admin/unidata/README

@@ -32,3 +32,7 @@ http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
SpecialCasing.txt
http://unicode.org/Public/UNIDATA/SpecialCasing.txt
2017-04-20
emoji-data.txt
https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt
2021-08-26

admin/unidata/blocks.awk

@@ -131,7 +131,7 @@ function name2alias(name , w, w2) {
return name
}
/^[0-9A-F]/ {
FILENAME == "Blocks.txt" && /^[0-9A-F]/ {
sep = index($1, "..")
len = length($1)
s = substr($1,1,sep-1)
@@ -202,6 +202,29 @@ function name2alias(name , w, w2) {
}
}
# The space after 'Emoji' is significant in the next two rules.
# This purposely and deliberately excludes codepoints <= 00FF
FILENAME == "emoji-data.txt" && /^00[0-9A-F]{2}.*; Emoji / {
next
}
FILENAME == "emoji-data.txt" && /^[0-9A-F].*; Emoji / {
sep = index($1, "..")
len = length($1)
if (sep > 0) {
s = substr($1,1,sep-1)
e = substr($1,sep+2,len-sep-1)
} else {
s = $1
e = $1
}
$1 = ""
i++
start[i] = s
end[i] = e
alt[i] = "emoji"
name[i] = "Autogenerated emoji"
}
END {
print ";;; charscript.el --- character script table -*- lexical-binding:t -*-"
print ";;; Automatically generated from admin/unidata/Blocks.txt"
@@ -223,6 +246,6 @@ END {
print " (or (memq (nth 2 elt) script-list)"
print " (setq script-list (cons (nth 2 elt) script-list))))"
print " (set-char-table-extra-slot char-script-table 0 (nreverse script-list)))"
print ""
print "\n"
print "(provide 'charscript)"
}

admin/unidata/emoji-data.txt
New file (1297 lines). File diff suppressed because it is too large.

etc/NEWS

@@ -131,6 +131,19 @@ of files visited via 'C-x C-f' and other commands.
---
** Emacs now supports Unicode Standard version 14.0.
+++
** New character script 'emoji' has been created.
Various blocks of codepoints have been split out of the 'symbol'
script into their own 'emoji' script to allow easier specification of
their treatment. Which codepoints are treated as emoji is derived
from the Unicode specifications. Also, Emacs will now use "Noto Color
Emoji" by default for that script. Use:
(set-fontset-font t 'emoji
'("My New Emoji Font" . "iso10646-1") nil 'prepend)
to change the font used.
+++
** New command 'execute-extended-command-for-buffer'.
This new command, bound to 'M-S-x', works like
@@ -379,6 +392,9 @@ When customized to nil, it uses 'minibuffer-restore-windows' in
'minibuffer-exit-hook' to remove only the window showing the
"*Completions*" buffer.
* Editing Changes in Emacs 28.1
---
*** New variable 'redisplay-adhoc-scroll-in-resize-mini-windows'.
Customizing it to nil will disable the ad-hoc auto-scrolling of

etc/TODO

@@ -405,7 +405,8 @@ punctuation characters, disregarding the fontsets, should be modified
to exempt Emoji from this rule (since Emoji characters belong to the
'symbol' script in Emacs), so that use-default-font-for-symbols would
not have to be tweaked to have Emoji display by default with a capable
font.
font. (This has now been implemented, but only one font is currently
considered, please augment the list).
*** Consider changing the default display of Variation Selectors
Emacs by default displays the Variation Selector (VS) codepoints not

lisp/international/fontset.el

@@ -278,7 +278,8 @@
(indic-siyaq-number #x1ec71)
(ottoman-siyaq-number #x1ed01)
(mahjong-tile #x1F000)
(domino-tile #x1F030)))
(domino-tile #x1F030)
(emoji #x1F300 #x1F600 #xFE0F)))
(defvar otf-script-alist)
@@ -781,7 +782,8 @@
toto
adlam
mahjong-tile
domino-tile))
domino-tile
emoji))
(set-fontset-font "fontset-default"
script (font-spec :registry "iso10646-1" :script script)
nil 'append))
@@ -894,6 +896,9 @@
(#x1FA00 . #x1FA6F))) ;; Chess Symbols
(set-fontset-font "fontset-default" symbol-subgroup
'("Symbola" . "iso10646-1") nil 'prepend))
;; This sets up the Emoji codepoints to use prettier fonts.
(set-fontset-font "fontset-default" 'emoji
'("Noto Color Emoji" . "iso10646-1") nil 'prepend)
;; Box Drawing and Block Elements
(set-fontset-font "fontset-default" '(#x2500 . #x259F)
'("FreeMono" . "iso10646-1") nil 'prepend)
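
The two emoji-data.txt rules this commit adds to blocks.awk can be tried in isolation. The following is only a minimal sketch, not the script itself: the `emoji_ranges` helper name and the sample input lines are assumptions made for illustration, and it prints the start and end of each range instead of filling the `start`, `end`, `alt` and `name` arrays. The `{2}` interval from the commit is spelled out as two bracket expressions, since not every awk implementation supports interval expressions.

```shell
# Hypothetical helper: read emoji-data.txt-style lines on stdin and print
# "START END" for every range that would be assigned to the 'emoji' script.
emoji_ranges() {
  awk '
    # Skip codepoints <= 00FF, as the commit does.
    /^00[0-9A-F][0-9A-F].*; Emoji / { next }
    # The space after "Emoji" keeps properties such as Emoji_Modifier out.
    /^[0-9A-F].*; Emoji / {
      sep = index($1, "..")
      if (sep > 0) {
        s = substr($1, 1, sep - 1)
        e = substr($1, sep + 2)
      } else {
        s = $1
        e = $1
      }
      print s, e
    }
  '
}

# Sample lines in the emoji-data.txt format (illustrative, not the real file).
printf '%s\n' \
  '0023          ; Emoji                # hash sign' \
  '1F600         ; Emoji                # grinning face' \
  '1F601..1F606  ; Emoji                # beaming face..squinting face' \
  '1F3FB..1F3FF  ; Emoji_Modifier       # skin tones' |
  emoji_ranges
```

With this sample input, the 0023 line is dropped by the first rule, the Emoji_Modifier line matches neither rule, and the remaining two lines are printed as ranges (a single codepoint becomes a range whose start and end are equal).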