Split Unicode emoji into their own script
* admin/notes/unicode: Describe how to update emoji for new Unicode release.
* admin/unidata/Makefile.in: Pass emoji-data.txt to blocks.awk script.
* admin/unidata/README: Add pointer to emoji-data.txt file.
* admin/unidata/blocks.awk: Parse emoji-data.txt, add emoji codepoints to the 'emoji' script (except for the ASCII ones).
* admin/unidata/emoji-data.txt: New file.
* etc/NEWS: Describe new 'emoji' script.
* etc/TODO: Update item about 'emoji' script.
* lisp/international/fontset.el (script-representative-chars): Add 'emoji' script.
(setup-default-fontset): Add 'emoji' script. Use "Noto Color Emoji" as default font for it.
parent 9ca737c419
commit 12d2fb58c4
8 changed files with 1388 additions and 13 deletions
@ -16,13 +16,14 @@ Emacs uses the following files from the Unicode Character Database
|
|||
. IVD_Sequences.txt
|
||||
. NormalizationTest.txt
|
||||
. SpecialCasing.txt
|
||||
. emoji-data.txt
|
||||
. BidiCharacterTest.txt
|
||||
|
||||
First, the first 7 files need to be copied into admin/unidata/, and
|
||||
First, the first 8 files need to be copied into admin/unidata/, and
|
||||
the file https://www.unicode.org/copyright.html should be copied over
|
||||
copyright.html in admin/unidata (that file might need trailing
|
||||
whitespace removed before it can be committed to the Emacs
|
||||
repository).
|
||||
copyright.html in admin/unidata (that file and emoji-data.txt might
|
||||
need trailing whitespace removed before they can be committed to the
|
||||
Emacs repository).
|
||||
|
||||
Then Emacs should be rebuilt for them to take effect. Rebuilding
|
||||
Emacs updates several derived files elsewhere in the Emacs source
|
||||
|
@ -85,8 +86,34 @@ modified to follow suit. If there's trailing whitespace in
|
|||
BidiCharacterTest.txt, it should be removed before committing the new
|
||||
version.
|
||||
|
||||
etc/NEWS should be updated to announce the support for the new Unicode
|
||||
version.
|
||||
Visit "emoji-data.txt" with the rebuilt Emacs, and check that an
|
||||
appropriate font is being used for the emoji (by default Emacs uses
|
||||
"Noto Color Emoji"). Running the following command in that buffer
|
||||
will give you an idea of which codepoints are not supported by
|
||||
whichever font Emacs is using.
|
||||
|
||||
(defun check-emoji-coverage (font-name-regexp)
|
||||
"Display a buffer containing emoji codepoints for which FONT-NAME is not used.
|
||||
This must be run from a buffer in the format of emoji-data.txt.
|
||||
FONT-NAME-REGEXP is checked using `string-match'."
|
||||
(interactive "MFont Name: ")
|
||||
(save-excursion
|
||||
(goto-char (point-min))
|
||||
(let (res char name ifont)
|
||||
(while (re-search-forward "; Emoji [^(]+(\\(.\\)[).\uFE0F]" nil t)
|
||||
(setq char (aref (match-string 1) 0))
|
||||
(setq ifont (car (internal-char-font nil char)))
|
||||
(when ifont
|
||||
(setq name (font-xlfd-name ifont)))
|
||||
(if (or (not ifont) (not (string-match font-name-regexp name)))
|
||||
(setq res (concat (string char) res))))
|
||||
(when res
|
||||
(with-output-to-temp-buffer "*Check-Emoji-Coverage*"
|
||||
(princ (format "Font not matching '%s' was used for the following characters:\n%s"
|
||||
font-name-regexp (reverse res))))))))
|
||||
|
||||
Finally, etc/NEWS should be updated to announce the support for the
|
||||
new Unicode version.
|
||||
|
||||
Problems, fixmes and other unicode-related issues
|
||||
-------------------------------------------------------------
|
||||
|
|
|
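The notes above say that copyright.html and emoji-data.txt might need trailing whitespace removed before they can be committed. One way to do that from a shell is sketched below. The helper name `strip_trailing_ws` and the sample file are hypothetical and not part of the commit; the sketch assumes a POSIX `sed` and the widely available `mktemp`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): strip trailing
# whitespace from every line of the file named by $1, in place.
strip_trailing_ws() {
    tmp=$(mktemp) || return 1
    # [[:space:]]*$ matches any run of blanks, tabs or CRs at end of line.
    # Writing back with cat keeps the original file's permissions.
    sed 's/[[:space:]]*$//' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Demo on a throwaway file with trailing blanks and a trailing tab.
demo_file=$(mktemp)
printf 'first line   \nsecond line\t\nthird line\n' > "$demo_file"
strip_trailing_ws "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

In a checkout this would be run as, for example, `strip_trailing_ws admin/unidata/emoji-data.txt` before committing.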
@ -81,8 +81,10 @@ charscript.el: ${unidir}/charscript.el
|
|||
|
||||
blocks = ${srcdir}/blocks.awk
|
||||
|
||||
${unidir}/charscript.el: ${srcdir}/Blocks.txt ${blocks}
|
||||
$(AM_V_GEN)$(AWK) -f ${blocks} < $< > $@
|
||||
${unidir}/charscript.el: ${blocks}
|
||||
|
||||
${unidir}/charscript.el: ${srcdir}/Blocks.txt ${srcdir}/emoji-data.txt
|
||||
$(AM_V_GEN)$(AWK) -f ${blocks} $^ > $@
|
||||
|
||||
|
||||
.PHONY: clean bootstrap-clean distclean maintainer-clean gen-clean
|
||||
|
|
|
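The Makefile change above passes both data files to awk as arguments (`$^`) instead of redirecting Blocks.txt to standard input, which is what lets blocks.awk tell the two inputs apart through `FILENAME`. awk sets `FILENAME` to the argument exactly as it was given, so a comparison against a bare name depends on how the file was spelled on the command line. The sketch below illustrates that behaviour only. The function `count_by_file` and the sample data are hypothetical, and the suffix match is shown as an alternative for comparison, not as what the commit does.

```shell
#!/bin/sh
# Hypothetical demo: count input lines per file, dispatching on FILENAME.
count_by_file() {
    # $1 = Blocks.txt-style file, $2 = emoji-data.txt-style file.
    awk '
        # Exact comparison: matches only if awk was given the bare name.
        FILENAME == "emoji-data.txt"  { exact++ }
        # Suffix match: also matches a name with a directory prefix.
        FILENAME ~ /emoji-data\.txt$/ { suffix++ }
        FILENAME ~ /Blocks\.txt$/     { blocks++ }
        END { printf "exact=%d suffix=%d blocks=%d\n", exact, suffix, blocks }
    ' "$1" "$2"
}

demo_dir=$(mktemp -d)
printf '0000..007F; Basic Latin\n' > "$demo_dir/Blocks.txt"
printf '1F600 ; Emoji # one\n1F601..1F610 ; Emoji # range\n' \
    > "$demo_dir/emoji-data.txt"

# Names carry a directory prefix: prints "exact=0 suffix=2 blocks=1".
count_by_file "$demo_dir/Blocks.txt" "$demo_dir/emoji-data.txt"

# Bare names from inside the directory: prints "exact=2 suffix=2 blocks=1".
(cd "$demo_dir" && count_by_file Blocks.txt emoji-data.txt)

rm -rf "$demo_dir"
```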
@ -32,3 +32,7 @@ http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt
|
|||
SpecialCasing.txt
|
||||
http://unicode.org/Public/UNIDATA/SpecialCasing.txt
|
||||
2017-04-20
|
||||
|
||||
emoji-data.txt
|
||||
https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt
|
||||
2021-08-26
|
||||
|
|
|
@ -131,7 +131,7 @@ function name2alias(name , w, w2) {
|
|||
return name
|
||||
}
|
||||
|
||||
/^[0-9A-F]/ {
|
||||
FILENAME == "Blocks.txt" && /^[0-9A-F]/ {
|
||||
sep = index($1, "..")
|
||||
len = length($1)
|
||||
s = substr($1,1,sep-1)
|
||||
|
@ -202,6 +202,29 @@ function name2alias(name , w, w2) {
|
|||
}
|
||||
}
|
||||
|
||||
# The space after 'Emoji' is significant in the next two rules.
|
||||
# This purposely and deliberately excludes codepoints <= 00FF
|
||||
FILENAME == "emoji-data.txt" && /^00[0-9A-F]{2}.*; Emoji / {
|
||||
next
|
||||
}
|
||||
FILENAME == "emoji-data.txt" && /^[0-9A-F].*; Emoji / {
|
||||
sep = index($1, "..")
|
||||
len = length($1)
|
||||
if (sep > 0) {
|
||||
s = substr($1,1,sep-1)
|
||||
e = substr($1,sep+2,len-sep-1)
|
||||
} else {
|
||||
s = $1
|
||||
e = $1
|
||||
}
|
||||
$1 = ""
|
||||
i++
|
||||
start[i] = s
|
||||
end[i] = e
|
||||
alt[i] = "emoji"
|
||||
name[i] = "Autogenerated emoji"
|
||||
}
|
||||
|
||||
END {
|
||||
print ";;; charscript.el --- character script table -*- lexical-binding:t -*-"
|
||||
print ";;; Automatically generated from admin/unidata/Blocks.txt"
|
||||
|
@ -223,6 +246,6 @@ END {
|
|||
print " (or (memq (nth 2 elt) script-list)"
|
||||
print " (setq script-list (cons (nth 2 elt) script-list))))"
|
||||
print " (set-char-table-extra-slot char-script-table 0 (nreverse script-list)))"
|
||||
print ""
|
||||
print "\n"
|
||||
print "(provide 'charscript)"
|
||||
}
|
||||
|
|
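The two emoji rules added to blocks.awk above can be tried in isolation on a few lines in the emoji-data.txt format. The following is a minimal sketch, not the script from the commit: the function name `emoji_ranges` and the sample lines are hypothetical, it prints the start and end code points instead of filling the `start`/`end` arrays, and it spells the `{2}` interval as two bracket expressions on the assumption that some awk implementations handle interval expressions differently.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the emoji rules, for experimenting.
emoji_ranges() {
    awk '
        # Skip entries starting at 00xx, as the rule in the diff does.
        /^00[0-9A-F][0-9A-F].*; Emoji / { next }
        # The space after "Emoji" keeps properties such as
        # Emoji_Modifier or Emoji_Presentation from matching.
        /^[0-9A-F].*; Emoji / {
            sep = index($1, "..")
            if (sep > 0) {
                s = substr($1, 1, sep - 1)   # text before ".."
                e = substr($1, sep + 2)      # text after ".."
            } else {
                s = $1                       # single code point
                e = $1
            }
            print s, e
        }
    ' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
# comment lines and blank lines are ignored
0023          ; Emoji                # hash sign
231A..231B    ; Emoji                # watch..hourglass done
1F3FB..1F3FF  ; Emoji_Modifier       # skin tones
1F600         ; Emoji                # grinning face
EOF

# Prints:
#   231A 231B
#   1F600 1F600
emoji_ranges "$sample"
rm -f "$sample"
```

The 0023 entry is dropped by the first rule and the Emoji_Modifier range never matches, so only the two remaining entries are printed.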
1297
admin/unidata/emoji-data.txt
Normal file
File diff suppressed because it is too large
16
etc/NEWS
|
@ -131,6 +131,19 @@ of files visited via 'C-x C-f' and other commands.
|
|||
---
|
||||
** Emacs now supports Unicode Standard version 14.0.
|
||||
|
||||
+++
|
||||
** New character script 'emoji' has been created.
|
||||
Various blocks of codepoints have been split out of the 'symbol'
|
||||
script into their own 'emoji' script to allow easier specification of
|
||||
their treatment. Which codepoints are treated as emoji is derived
|
||||
from the Unicode specifications. Also, Emacs will now use "Noto Color
|
||||
Emoji" by default for that script. Use:
|
||||
|
||||
(set-fontset-font t 'emoji
|
||||
'("My New Emoji Font" . "iso10646-1") nil 'prepend)
|
||||
|
||||
to change the font used.
|
||||
|
||||
+++
|
||||
** New command 'execute-extended-command-for-buffer'.
|
||||
This new command, bound to 'M-S-x', works like
|
||||
|
@ -379,6 +392,9 @@ When customized to nil, it uses 'minibuffer-restore-windows' in
|
|||
'minibuffer-exit-hook' to remove only the window showing the
|
||||
"*Completions*" buffer.
|
||||
|
||||
|
||||
* Editing Changes in Emacs 28.1
|
||||
|
||||
---
|
||||
*** New variable 'redisplay-adhoc-scroll-in-resize-mini-windows'.
|
||||
Customizing it to nil will disable the ad-hoc auto-scrolling of
|
||||
|
|
3
etc/TODO
|
@ -405,7 +405,8 @@ punctuation characters, disregarding the fontsets, should be modified
|
|||
to exempt Emoji from this rule (since Emoji characters belong to the
|
||||
'symbol' script in Emacs), so that use-default-font-for-symbols would
|
||||
not have to be tweaked to have Emoji display by default with a capable
|
||||
font.
|
||||
font. (This has now been implemented, but only one font is currently
|
||||
considered, please augment the list).
|
||||
|
||||
*** Consider changing the default display of Variation Selectors
|
||||
Emacs by default displays the Variation Selector (VS) codepoints not
|
||||
|
|
|
@ -278,7 +278,8 @@
|
|||
(indic-siyaq-number #x1ec71)
|
||||
(ottoman-siyaq-number #x1ed01)
|
||||
(mahjong-tile #x1F000)
|
||||
(domino-tile #x1F030)))
|
||||
(domino-tile #x1F030)
|
||||
(emoji #x1F300 #x1F600 #xFE0F)))
|
||||
|
||||
(defvar otf-script-alist)
|
||||
|
||||
|
@ -781,7 +782,8 @@
|
|||
toto
|
||||
adlam
|
||||
mahjong-tile
|
||||
domino-tile))
|
||||
domino-tile
|
||||
emoji))
|
||||
(set-fontset-font "fontset-default"
|
||||
script (font-spec :registry "iso10646-1" :script script)
|
||||
nil 'append))
|
||||
|
@ -894,6 +896,9 @@
|
|||
(#x1FA00 . #x1FA6F))) ;; Chess Symbols
|
||||
(set-fontset-font "fontset-default" symbol-subgroup
|
||||
'("Symbola" . "iso10646-1") nil 'prepend))
|
||||
;; This sets up the Emoji codepoints to use prettier fonts.
|
||||
(set-fontset-font "fontset-default" 'emoji
|
||||
'("Noto Color Emoji" . "iso10646-1") nil 'prepend)
|
||||
;; Box Drawing and Block Elements
|
||||
(set-fontset-font "fontset-default" '(#x2500 . #x259F)
|
||||
'("FreeMono" . "iso10646-1") nil 'prepend)
|
||||
|
|