Case mapping
There are three sets of characters: upper case, lower case, and neither. toUpperCase() and toLowerCase() provide mappings between them. Let's define the following sets:
$\mathbb{C}$ is the set of all single-code-point Unicode characters.
$I := \{ c\in\mathbb{C}\mid \mathtt{toUpperCase}(c) = \mathtt{toLowerCase}(c) = c \}$ are characters that are invariant under both toUpperCase() and toLowerCase(), such as numbers and emojis. These are "uninteresting" characters that aren't in the scope of discussion.
No character outside $I$ is mapped into $I$, i.e. there does not exist a character $c\in\mathbb{C}$ such that $c\notin I$ but $\mathtt{toUpperCase}(c)\in I$ or $\mathtt{toLowerCase}(c)\in I$. This check is built into the data collector code shown below, so you can verify that nothing is logged to the browser console. (Note that $I$ also includes code points that aren't assigned to characters, so the exact set of characters is hard to get.)
$\mathbb{UC} := \mathbb{C}\setminus I$. Characters in $\mathbb{UC}$ are never mapped to $I$. Size: 2907
$M_L := \{ c\in\mathbb{UC}\mid \mathtt{toLowerCase}(\mathtt{toUpperCase}(c))\in\mathbb{C}, \mathtt{toUpperCase}(c)\notin\mathbb{C} \}$ = Character set (21)
ǰ (U+01F0) → J̌ (U+004A U+030C) → ǰ (U+01F0)
ΐ (U+0390) → Ϊ́ (U+03AA U+0301) → ΐ (U+0390)
ΰ (U+03B0) → Ϋ́ (U+03AB U+0301) → ΰ (U+03B0)
ẖ (U+1E96) → H̱ (U+0048 U+0331) → ẖ (U+1E96)
ẗ (U+1E97) → T̈ (U+0054 U+0308) → ẗ (U+1E97)
ẘ (U+1E98) → W̊ (U+0057 U+030A) → ẘ (U+1E98)
ẙ (U+1E99) → Y̊ (U+0059 U+030A) → ẙ (U+1E99)
ὐ (U+1F50) → Υ̓ (U+03A5 U+0313) → ὐ (U+1F50)
ὒ (U+1F52) → Υ̓̀ (U+03A5 U+0313 U+0300) → ὒ (U+1F52)
ὔ (U+1F54) → Υ̓́ (U+03A5 U+0313 U+0301) → ὔ (U+1F54)
ὖ (U+1F56) → Υ̓͂ (U+03A5 U+0313 U+0342) → ὖ (U+1F56)
ᾶ (U+1FB6) → Α͂ (U+0391 U+0342) → ᾶ (U+1FB6)
ῆ (U+1FC6) → Η͂ (U+0397 U+0342) → ῆ (U+1FC6)
ῒ (U+1FD2) → Ϊ̀ (U+03AA U+0300) → ῒ (U+1FD2)
ῖ (U+1FD6) → Ι͂ (U+0399 U+0342) → ῖ (U+1FD6)
ῗ (U+1FD7) → Ϊ͂ (U+03AA U+0342) → ῗ (U+1FD7)
ῢ (U+1FE2) → Ϋ̀ (U+03AB U+0300) → ῢ (U+1FE2)
ῤ (U+1FE4) → Ρ̓ (U+03A1 U+0313) → ῤ (U+1FE4)
ῦ (U+1FE6) → Υ͂ (U+03A5 U+0342) → ῦ (U+1FE6)
ῧ (U+1FE7) → Ϋ͂ (U+03AB U+0342) → ῧ (U+1FE7)
ῶ (U+1FF6) → Ω͂ (U+03A9 U+0342) → ῶ (U+1FF6)
To summarize, they are:
Latin Extended-B
LATIN SMALL LETTER J WITH CARON (U+01F0)
Greek and Coptic
GREEK SMALL LETTER {IOTA,UPSILON} WITH DIALYTIKA AND TONOS: U+0390, U+03B0
Latin Extended Additional
LATIN SMALL LETTER H WITH LINE BELOW (U+1E96)
LATIN SMALL LETTER T WITH DIAERESIS (U+1E97)
LATIN SMALL LETTER {W,Y} WITH RING ABOVE: U+1E98, U+1E99
Greek Extended
Greek small letters with diacritics: U+1F50, U+1F52, U+1F54, U+1F56, U+1FB6, U+1FC6, U+1FD2, U+1FD6, U+1FD7, U+1FE2, U+1FE4, U+1FE6, U+1FE7, U+1FF6
$M_U := \{ c\in\mathbb{UC}\mid \mathtt{toUpperCase}(\mathtt{toLowerCase}(c))\in\mathbb{C}, \mathtt{toLowerCase}(c)\notin\mathbb{C} \}$ = Character set (1)
İ (U+0130) → i̇ (U+0069 U+0307) → İ (U+0130)
The only character is LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
$N_L := \{ c\in\mathbb{UC}\mid \mathtt{toLowerCase}(\mathtt{toUpperCase}(c))\notin\mathbb{C}, \mathtt{toUpperCase}(c)\notin\mathbb{C} \}$ = Character set (79)
ß (U+00DF) → SS (U+0053 U+0053) → ss (U+0073 U+0073)
ŉ (U+0149) → ʼN (U+02BC U+004E) → ʼn (U+02BC U+006E)
և (U+0587) → ԵՒ (U+0535 U+0552) → եւ (U+0565 U+0582)
ẚ (U+1E9A) → Aʾ (U+0041 U+02BE) → aʾ (U+0061 U+02BE)
ᾀ (U+1F80) → ἈΙ (U+1F08 U+0399) → ἀι (U+1F00 U+03B9)
ᾁ (U+1F81) → ἉΙ (U+1F09 U+0399) → ἁι (U+1F01 U+03B9)
ᾂ (U+1F82) → ἊΙ (U+1F0A U+0399) → ἂι (U+1F02 U+03B9)
ᾃ (U+1F83) → ἋΙ (U+1F0B U+0399) → ἃι (U+1F03 U+03B9)
ᾄ (U+1F84) → ἌΙ (U+1F0C U+0399) → ἄι (U+1F04 U+03B9)
ᾅ (U+1F85) → ἍΙ (U+1F0D U+0399) → ἅι (U+1F05 U+03B9)
ᾆ (U+1F86) → ἎΙ (U+1F0E U+0399) → ἆι (U+1F06 U+03B9)
ᾇ (U+1F87) → ἏΙ (U+1F0F U+0399) → ἇι (U+1F07 U+03B9)
ᾈ (U+1F88) → ἈΙ (U+1F08 U+0399) → ἀι (U+1F00 U+03B9)
ᾉ (U+1F89) → ἉΙ (U+1F09 U+0399) → ἁι (U+1F01 U+03B9)
ᾊ (U+1F8A) → ἊΙ (U+1F0A U+0399) → ἂι (U+1F02 U+03B9)
ᾋ (U+1F8B) → ἋΙ (U+1F0B U+0399) → ἃι (U+1F03 U+03B9)
ᾌ (U+1F8C) → ἌΙ (U+1F0C U+0399) → ἄι (U+1F04 U+03B9)
ᾍ (U+1F8D) → ἍΙ (U+1F0D U+0399) → ἅι (U+1F05 U+03B9)
ᾎ (U+1F8E) → ἎΙ (U+1F0E U+0399) → ἆι (U+1F06 U+03B9)
ᾏ (U+1F8F) → ἏΙ (U+1F0F U+0399) → ἇι (U+1F07 U+03B9)
ᾐ (U+1F90) → ἨΙ (U+1F28 U+0399) → ἠι (U+1F20 U+03B9)
ᾑ (U+1F91) → ἩΙ (U+1F29 U+0399) → ἡι (U+1F21 U+03B9)
ᾒ (U+1F92) → ἪΙ (U+1F2A U+0399) → ἢι (U+1F22 U+03B9)
ᾓ (U+1F93) → ἫΙ (U+1F2B U+0399) → ἣι (U+1F23 U+03B9)
ᾔ (U+1F94) → ἬΙ (U+1F2C U+0399) → ἤι (U+1F24 U+03B9)
ᾕ (U+1F95) → ἭΙ (U+1F2D U+0399) → ἥι (U+1F25 U+03B9)
ᾖ (U+1F96) → ἮΙ (U+1F2E U+0399) → ἦι (U+1F26 U+03B9)
ᾗ (U+1F97) → ἯΙ (U+1F2F U+0399) → ἧι (U+1F27 U+03B9)
ᾘ (U+1F98) → ἨΙ (U+1F28 U+0399) → ἠι (U+1F20 U+03B9)
ᾙ (U+1F99) → ἩΙ (U+1F29 U+0399) → ἡι (U+1F21 U+03B9)
ᾚ (U+1F9A) → ἪΙ (U+1F2A U+0399) → ἢι (U+1F22 U+03B9)
ᾛ (U+1F9B) → ἫΙ (U+1F2B U+0399) → ἣι (U+1F23 U+03B9)
ᾜ (U+1F9C) → ἬΙ (U+1F2C U+0399) → ἤι (U+1F24 U+03B9)
ᾝ (U+1F9D) → ἭΙ (U+1F2D U+0399) → ἥι (U+1F25 U+03B9)
ᾞ (U+1F9E) → ἮΙ (U+1F2E U+0399) → ἦι (U+1F26 U+03B9)
ᾟ (U+1F9F) → ἯΙ (U+1F2F U+0399) → ἧι (U+1F27 U+03B9)
ᾠ (U+1FA0) → ὨΙ (U+1F68 U+0399) → ὠι (U+1F60 U+03B9)
ᾡ (U+1FA1) → ὩΙ (U+1F69 U+0399) → ὡι (U+1F61 U+03B9)
ᾢ (U+1FA2) → ὪΙ (U+1F6A U+0399) → ὢι (U+1F62 U+03B9)
ᾣ (U+1FA3) → ὫΙ (U+1F6B U+0399) → ὣι (U+1F63 U+03B9)
ᾤ (U+1FA4) → ὬΙ (U+1F6C U+0399) → ὤι (U+1F64 U+03B9)
ᾥ (U+1FA5) → ὭΙ (U+1F6D U+0399) → ὥι (U+1F65 U+03B9)
ᾦ (U+1FA6) → ὮΙ (U+1F6E U+0399) → ὦι (U+1F66 U+03B9)
ᾧ (U+1FA7) → ὯΙ (U+1F6F U+0399) → ὧι (U+1F67 U+03B9)
ᾨ (U+1FA8) → ὨΙ (U+1F68 U+0399) → ὠι (U+1F60 U+03B9)
ᾩ (U+1FA9) → ὩΙ (U+1F69 U+0399) → ὡι (U+1F61 U+03B9)
ᾪ (U+1FAA) → ὪΙ (U+1F6A U+0399) → ὢι (U+1F62 U+03B9)
ᾫ (U+1FAB) → ὫΙ (U+1F6B U+0399) → ὣι (U+1F63 U+03B9)
ᾬ (U+1FAC) → ὬΙ (U+1F6C U+0399) → ὤι (U+1F64 U+03B9)
ᾭ (U+1FAD) → ὭΙ (U+1F6D U+0399) → ὥι (U+1F65 U+03B9)
ᾮ (U+1FAE) → ὮΙ (U+1F6E U+0399) → ὦι (U+1F66 U+03B9)
ᾯ (U+1FAF) → ὯΙ (U+1F6F U+0399) → ὧι (U+1F67 U+03B9)
ᾲ (U+1FB2) → ᾺΙ (U+1FBA U+0399) → ὰι (U+1F70 U+03B9)
ᾳ (U+1FB3) → ΑΙ (U+0391 U+0399) → αι (U+03B1 U+03B9)
ᾴ (U+1FB4) → ΆΙ (U+0386 U+0399) → άι (U+03AC U+03B9)
ᾷ (U+1FB7) → Α͂Ι (U+0391 U+0342 U+0399) → ᾶι (U+1FB6 U+03B9)
ᾼ (U+1FBC) → ΑΙ (U+0391 U+0399) → αι (U+03B1 U+03B9)
ῂ (U+1FC2) → ῊΙ (U+1FCA U+0399) → ὴι (U+1F74 U+03B9)
ῃ (U+1FC3) → ΗΙ (U+0397 U+0399) → ηι (U+03B7 U+03B9)
ῄ (U+1FC4) → ΉΙ (U+0389 U+0399) → ήι (U+03AE U+03B9)
ῇ (U+1FC7) → Η͂Ι (U+0397 U+0342 U+0399) → ῆι (U+1FC6 U+03B9)
ῌ (U+1FCC) → ΗΙ (U+0397 U+0399) → ηι (U+03B7 U+03B9)
ῲ (U+1FF2) → ῺΙ (U+1FFA U+0399) → ὼι (U+1F7C U+03B9)
ῳ (U+1FF3) → ΩΙ (U+03A9 U+0399) → ωι (U+03C9 U+03B9)
ῴ (U+1FF4) → ΏΙ (U+038F U+0399) → ώι (U+03CE U+03B9)
ῷ (U+1FF7) → Ω͂Ι (U+03A9 U+0342 U+0399) → ῶι (U+1FF6 U+03B9)
ῼ (U+1FFC) → ΩΙ (U+03A9 U+0399) → ωι (U+03C9 U+03B9)
ﬀ (U+FB00) → FF (U+0046 U+0046) → ff (U+0066 U+0066)
ﬁ (U+FB01) → FI (U+0046 U+0049) → fi (U+0066 U+0069)
ﬂ (U+FB02) → FL (U+0046 U+004C) → fl (U+0066 U+006C)
ﬃ (U+FB03) → FFI (U+0046 U+0046 U+0049) → ffi (U+0066 U+0066 U+0069)
ﬄ (U+FB04) → FFL (U+0046 U+0046 U+004C) → ffl (U+0066 U+0066 U+006C)
ﬅ (U+FB05) → ST (U+0053 U+0054) → st (U+0073 U+0074)
ﬆ (U+FB06) → ST (U+0053 U+0054) → st (U+0073 U+0074)
ﬓ (U+FB13) → ՄՆ (U+0544 U+0546) → մն (U+0574 U+0576)
ﬔ (U+FB14) → ՄԵ (U+0544 U+0535) → մե (U+0574 U+0565)
ﬕ (U+FB15) → ՄԻ (U+0544 U+053B) → մի (U+0574 U+056B)
ﬖ (U+FB16) → ՎՆ (U+054E U+0546) → վն (U+057E U+0576)
ﬗ (U+FB17) → ՄԽ (U+0544 U+053D) → մխ (U+0574 U+056D)
To summarize, they are:
Latin-1 Supplement
LATIN SMALL LETTER SHARP S (U+00DF)
Latin Extended-A
LATIN SMALL LETTER N PRECEDED BY APOSTROPHE (U+0149)
Latin Extended Additional
LATIN SMALL LETTER A WITH RIGHT HALF RING (U+1E9A)
Armenian
ARMENIAN SMALL LIGATURE ECH YIWN (U+0587)
Greek Extended
Greek small letters with ypogegrammeni ($Gr_L$): U+1F{8,9,A}0 – U+1F{8,9,A}7, U+1F{B,C,F}2 – U+1F{B,C,F}4, U+1F{B,C,F}7 (the iota subscript itself, COMBINING GREEK YPOGEGRAMMENI (U+0345), maps to GREEK CAPITAL LETTER IOTA (U+0399), which will be discussed later)
Greek capital letters with prosgegrammeni ($Gr_U$): U+1F{8,9,A}8 – U+1F{8,9,A}F, U+1F{B,C,F}C
Alphabetic Presentation Forms
All Latin ligatures: U+FB00 – U+FB06
All Armenian ligatures: U+FB13 – U+FB17
$N_U := \{ c\in\mathbb{UC}\mid \mathtt{toUpperCase}(\mathtt{toLowerCase}(c))\notin\mathbb{C}, \mathtt{toLowerCase}(c)\notin\mathbb{C} \}$ = ∅
$M_L' := \mathtt{toUpperCase}(M_L)$. Conceptually, these are uppercase letters that just don't have a single Unicode code point.
$M_U' := \mathtt{toLowerCase}(M_U)$. Conceptually, these are lowercase letters that just don't have a single Unicode code point.
$N_L' := \mathtt{toUpperCase}(N_L)$
$N_U' := \mathtt{toLowerCase}(N_U)$
$\mathcal{R}_U := \mathtt{toUpperCase}(\mathbb{UC})$. Size: 1464
$\mathcal{R}_L := \mathtt{toLowerCase}(\mathbb{UC})$. Size: 1485
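The set definitions above can be approximated with a short script. This is a minimal sketch, not the data collector used for this article: the names `upper`, `lower`, `isSingle`, `allCodePoints`, and `collectSets` are hypothetical, membership in $I$ is assumed to be tested on the raw mappings, and NFC is assumed to be applied to every mapping output before counting code points. Exact sizes depend on the Unicode version of the JavaScript engine.

```javascript
// NFC-normalized case mappings (assumption: every output is normalized).
const upper = (s) => s.toUpperCase().normalize("NFC");
const lower = (s) => s.toLowerCase().normalize("NFC");
// A string is "in C" when it consists of exactly one code point.
const isSingle = (s) => [...s].length === 1;

// Every Unicode scalar value as a one-code-point string (surrogates skipped).
function* allCodePoints() {
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue;
    yield String.fromCodePoint(cp);
  }
}

// Classify characters into I, UC, M_L, M_U, N_L and N_U.
function collectSets(chars = allCodePoints()) {
  const sets = { I: [], UC: [], ML: [], MU: [], NL: [], NU: [] };
  for (const c of chars) {
    // Assumption: invariance is tested on the raw (non-normalized) mappings.
    if (c.toUpperCase() === c && c.toLowerCase() === c) {
      sets.I.push(c);
      continue;
    }
    sets.UC.push(c);
    const u = upper(c);
    const l = lower(c);
    // Multi-code-point uppercase: does lowercasing return to one code point?
    if (!isSingle(u)) (isSingle(lower(u)) ? sets.ML : sets.NL).push(c);
    // Multi-code-point lowercase: does uppercasing return to one code point?
    if (!isSingle(l)) (isSingle(upper(l)) ? sets.MU : sets.NU).push(c);
  }
  return sets;
}

// "1", "a", sharp s, j with caron, I with dot above
const sample = collectSets(["1", "a", "\u00DF", "\u01F0", "\u0130"]);
console.log(sample.I, sample.ML, sample.MU, sample.NL);
```

Calling `collectSets()` with no argument scans every code point; passing a small array, as in the last lines, is enough to spot-check individual characters.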
Also define the following predicates:
$\mathtt{isUpperCase}(c) = \mathtt{/\backslash p\{Uppercase\_Letter\}/u.test(c)}$
$\mathtt{isLowerCase}(c) = \mathtt{/\backslash p\{Lowercase\_Letter\}/u.test(c)}$
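As a small sketch, the two predicates can be written as plain functions and tried on a few sample characters. The sample characters are chosen here for illustration only; they are not taken from the article's data.

```javascript
// The two predicates over one-code-point strings.
const isUpperCase = (c) => /\p{Uppercase_Letter}/u.test(c);
const isLowerCase = (c) => /\p{Lowercase_Letter}/u.test(c);

// Uppercase_Letter (Lu) and Lowercase_Letter (Ll) are General_Category values,
// so titlecase letters (Lt), letter numbers (Nl) and symbols (So) match neither.
const samples = [
  "A",         // Lu -> true, false
  "a",         // Ll -> false, true
  "\u{1D400}", // MATHEMATICAL BOLD CAPITAL A, Lu -> true, false
  "\u01C5",    // titlecase digraph, Lt -> false, false
  "\u2160",    // ROMAN NUMERAL ONE, Nl -> false, false
  "\u24D0",    // CIRCLED LATIN SMALL LETTER A, So -> false, false
];
for (const c of samples) {
  console.log(c, isUpperCase(c), isLowerCase(c));
}
```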
Thus define the following sets:
$U_{ext} := \{ c\in\mathbb{C}\mid \mathtt{isUpperCase}(c) \}$. Size: 1876 (Note: Unicode utility lists 1831, with the extra 45 possibly being duplicates by normalization.)
$L_{ext} := \{ c\in\mathbb{C}\mid \mathtt{isLowerCase}(c) \}$. Size: 2273 (Note: Unicode utility lists 2233, with the extra 40 possibly being duplicates by normalization.)
$U := U_{ext}\setminus I$. Size: 1350
$L := L_{ext}\setminus I$. Size: 1441
Define the following terminology:
$c$ is upper case if $c\in U_{ext}$.
$c$ is lower case if $c\in L_{ext}$.
Upper case and lower case are mutually exclusive: $U_{ext}\cap L_{ext}$ = ∅
$c$ is cased if $c\in U_{ext}\cup L_{ext}$.
$c$ is uncased if $c\notin U_{ext}\cup L_{ext}$.
$c$ is lowercase variant if $\mathtt{toLowerCase}(c)\neq c$.
$c$ is uppercase variant if $\mathtt{toUpperCase}(c)\neq c$.
$c$ is case-mapping variant if $c$ is either lowercase variant or uppercase variant.
$c$ is case-mapping invariant if $c$ is neither lowercase variant nor uppercase variant.
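To make the terminology concrete, here is a hedged sketch that labels a single character with these terms. The function name `describe` and its return shape are hypothetical, and the variance tests use the raw mappings.

```javascript
const isUpperCase = (c) => /\p{Uppercase_Letter}/u.test(c);
const isLowerCase = (c) => /\p{Lowercase_Letter}/u.test(c);

// Label a one-code-point string with the terms defined above.
function describe(c) {
  const lowercaseVariant = c.toLowerCase() !== c;
  const uppercaseVariant = c.toUpperCase() !== c;
  return {
    cased: isUpperCase(c) || isLowerCase(c),
    lowercaseVariant,
    uppercaseVariant,
    caseMappingVariant: lowercaseVariant || uppercaseVariant,
  };
}

console.log(describe("a"));         // cased, uppercase variant
console.log(describe("\u2160"));    // ROMAN NUMERAL ONE: uncased, lowercase variant
console.log(describe("\u{1D400}")); // MATHEMATICAL BOLD CAPITAL A: cased, case-mapping invariant
```

The last two calls show that "cased" and "case-mapping variant" are independent properties.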
NOTE: To maximize the number of single-code-point characters in discussion, we normalize the output with .normalize("NFC").
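A short sketch of why the normalization matters, using ǰ (U+01F0) from $M_L$. The helper `codePoints` is hypothetical and only formats the output.

```javascript
// Format a string as a list of U+XXXX code points.
const codePoints = (s) =>
  [...s].map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

const up = "\u01F0".toUpperCase();       // J followed by COMBINING CARON
const raw = up.toLowerCase();            // j followed by COMBINING CARON
const normalized = raw.normalize("NFC"); // composed back into one code point

console.log(codePoints(up));         // [ 'U+004A', 'U+030C' ]
console.log(codePoints(raw));        // [ 'U+006A', 'U+030C' ]
console.log(codePoints(normalized)); // [ 'U+01F0' ]
```

Without the NFC step the round trip would end in two code points, and ǰ would be counted in $N_L$ rather than $M_L$.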
The first invariant we want to establish is toUpperCase(toUpperCase(c)) == toUpperCase(c) and toLowerCase(toLowerCase(c)) == toLowerCase(c) for all $c\in\mathbb{UC}$.
$\{ c\in\mathbb{UC}\mid \mathtt{toUpperCase}(\mathtt{toUpperCase}(c))\neq \mathtt{toUpperCase}(c) \}$ = ∅
$\{ c\in\mathbb{UC}\mid \mathtt{toLowerCase}(\mathtt{toLowerCase}(c))\neq \mathtt{toLowerCase}(c) \}$ = ∅
This also means that if $c$ is uppercase variant, then $c$ will not be the output of toUpperCase(); similarly, if $c$ is lowercase variant, then $c$ will not be the output of toLowerCase().
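The idempotence check can be sketched as follows. The names `upper`, `lower`, `allCodePoints`, and `nonIdempotent` are hypothetical, and the scan covers all code points rather than only $\mathbb{UC}$, on the assumption that invariant characters cannot fail the check.

```javascript
// NFC-normalized case mappings, following the note on normalization.
const upper = (s) => s.toUpperCase().normalize("NFC");
const lower = (s) => s.toLowerCase().normalize("NFC");

// Every Unicode scalar value as a one-code-point string (surrogates skipped).
function* allCodePoints() {
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue;
    yield String.fromCodePoint(cp);
  }
}

// Return every input for which applying fn twice differs from applying it once.
function nonIdempotent(fn, chars = allCodePoints()) {
  const found = [];
  for (const c of chars) {
    const once = fn(c);
    if (fn(once) !== once) found.push(c);
  }
  return found;
}

// Both counts should be 0 if the invariant holds on the engine in use.
console.log(nonIdempotent(upper).length, nonIdempotent(lower).length);
```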
The ranges of toUpperCase() and toLowerCase() are disjoint:
$\mathcal{R}_U\cap \mathcal{R}_L$ = ∅
However, together they do not cover $\mathbb{UC}$:
$\mathbb{UC}\setminus(\mathcal{R}_U\cup \mathcal{R}_L)$ = Character set (31) ǅ (U+01C5) ǈ (U+01C8) ǋ (U+01CB) ǲ (U+01F2) ᾈ (U+1F88) ᾉ (U+1F89) ᾊ (U+1F8A) ᾋ (U+1F8B) ᾌ (U+1F8C) ᾍ (U+1F8D) ᾎ (U+1F8E) ᾏ (U+1F8F) ᾘ (U+1F98) ᾙ (U+1F99) ᾚ (U+1F9A) ᾛ (U+1F9B) ᾜ (U+1F9C) ᾝ (U+1F9D) ᾞ (U+1F9E) ᾟ (U+1F9F) ᾨ (U+1FA8) ᾩ (U+1FA9) ᾪ (U+1FAA) ᾫ (U+1FAB) ᾬ (U+1FAC) ᾭ (U+1FAD) ᾮ (U+1FAE) ᾯ (U+1FAF) ᾼ (U+1FBC) ῌ (U+1FCC) ῼ (U+1FFC)
27 of these characters are $Gr_U$. The other 4 are:
Latin Extended-B
LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (U+01C5)
LATIN CAPITAL LETTER L WITH SMALL LETTER J (U+01C8)
LATIN CAPITAL LETTER N WITH SMALL LETTER J (U+01CB)
LATIN CAPITAL LETTER D WITH SMALL LETTER Z (U+01F2)
These characters cannot be produced by toUpperCase() or toLowerCase() with any input, including themselves.
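A hedged sketch of how the two ranges and the unreachable characters could be computed. The function `unreachable` and the other helper names are hypothetical; membership in $I$ is assumed to use the raw mappings and the ranges are assumed to hold NFC-normalized outputs.

```javascript
const upper = (s) => s.toUpperCase().normalize("NFC");
const lower = (s) => s.toLowerCase().normalize("NFC");

// Every Unicode scalar value as a one-code-point string (surrogates skipped).
function* allCodePoints() {
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue;
    yield String.fromCodePoint(cp);
  }
}

// Build R_U and R_L over UC, then list characters of UC that are in neither.
function unreachable() {
  const uc = [];
  const rangeU = new Set();
  const rangeL = new Set();
  for (const c of allCodePoints()) {
    if (c.toUpperCase() === c && c.toLowerCase() === c) continue; // c is in I
    uc.push(c);
    rangeU.add(upper(c));
    rangeL.add(lower(c));
  }
  const overlap = [...rangeU].filter((s) => rangeL.has(s));
  const missing = uc.filter((c) => !rangeU.has(c) && !rangeL.has(c));
  return { overlap, missing };
}

const { overlap, missing } = unreachable();
console.log(overlap.length, missing.length);
```

Here `overlap` corresponds to $\mathcal{R}_U\cap\mathcal{R}_L$ and `missing` to $\mathbb{UC}\setminus(\mathcal{R}_U\cup\mathcal{R}_L)$; the printed sizes depend on the engine's Unicode version.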
Yes. toUpperCase() and toLowerCase() are identity functions on $U_{ext}$ and $L_{ext}$, respectively.
$\{ c\in U_{ext}\mid \mathtt{toUpperCase}(c)\neq c \}$ = ∅
$\{ c\in L_{ext}\mid \mathtt{toLowerCase}(c)\neq c \}$ = ∅
This means upper case implies uppercase invariance, and lower case implies lowercase invariance.
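The two empty sets above can be checked with a sketch like the following. The function `changedByOwnCase` and the helper `allCodePoints` are hypothetical names.

```javascript
const isUpperCase = (c) => /\p{Uppercase_Letter}/u.test(c);
const isLowerCase = (c) => /\p{Lowercase_Letter}/u.test(c);

// Every Unicode scalar value as a one-code-point string (surrogates skipped).
function* allCodePoints() {
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue;
    yield String.fromCodePoint(cp);
  }
}

// Collect upper case letters changed by toUpperCase() and
// lower case letters changed by toLowerCase().
function changedByOwnCase(chars = allCodePoints()) {
  const upperChanged = [];
  const lowerChanged = [];
  for (const c of chars) {
    if (isUpperCase(c) && c.toUpperCase() !== c) upperChanged.push(c);
    if (isLowerCase(c) && c.toLowerCase() !== c) lowerChanged.push(c);
  }
  return { upperChanged, lowerChanged };
}

// Both lists should be empty if the claim holds on the engine in use.
const result = changedByOwnCase();
console.log(result.upperChanged.length, result.lowerChanged.length);
```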
No. $U_{ext}$ and $L_{ext}$ are not subsets of $\mathbb{UC}$:
U e x t ∖ U U_{ext}\setminus\mathbb{U} U e x t ∖ U = Character set (526 ) ϒ (U+03D2) ϓ (U+03D3) ϔ (U+03D4) (U+1C89) ℂ (U+2102) ℇ (U+2107) ℋ (U+210B) ℌ (U+210C) ℍ (U+210D) ℐ (U+2110) ℑ (U+2111) ℒ (U+2112) ℕ (U+2115) ℙ (U+2119) ℚ (U+211A) ℛ (U+211B) ℜ (U+211C) ℝ (U+211D) ℤ (U+2124) ℨ (U+2128) ℬ (U+212C) ℭ (U+212D) ℰ (U+2130) ℱ (U+2131) ℳ (U+2133) ℾ (U+213E) ℿ (U+213F) ⅅ (U+2145) (U+A7CB) (U+A7CC) (U+A7CE) (U+A7D2) (U+A7D4) (U+A7DA) (U+A7DC) (U+10D50) (U+10D51) (U+10D52) (U+10D53) (U+10D54) (U+10D55) (U+10D56) (U+10D57) (U+10D58) (U+10D59) (U+10D5A) (U+10D5B) (U+10D5C) (U+10D5D) (U+10D5E) (U+10D5F) (U+10D60) (U+10D61) (U+10D62) (U+10D63) (U+10D64) (U+10D65) (U+16EA0) (U+16EA1) (U+16EA2) (U+16EA3) (U+16EA4) (U+16EA5) (U+16EA6) (U+16EA7) (U+16EA8) (U+16EA9) (U+16EAA) (U+16EAB) (U+16EAC) (U+16EAD) (U+16EAE) (U+16EAF) (U+16EB0) (U+16EB1) (U+16EB2) (U+16EB3) (U+16EB4) (U+16EB5) (U+16EB6) (U+16EB7) (U+16EB8) 𝐀 (U+1D400) 𝐁 (U+1D401) 𝐂 (U+1D402) 𝐃 (U+1D403) 𝐄 (U+1D404) 𝐅 (U+1D405) 𝐆 (U+1D406) 𝐇 (U+1D407) 𝐈 (U+1D408) 𝐉 (U+1D409) 𝐊 (U+1D40A) 𝐋 (U+1D40B) 𝐌 (U+1D40C) 𝐍 (U+1D40D) 𝐎 (U+1D40E) 𝐏 (U+1D40F) 𝐐 (U+1D410) 𝐑 (U+1D411) 𝐒 (U+1D412) 𝐓 (U+1D413) 𝐔 (U+1D414) 𝐕 (U+1D415) 𝐖 (U+1D416) 𝐗 (U+1D417) 𝐘 (U+1D418) 𝐙 (U+1D419) 𝐴 (U+1D434) 𝐵 (U+1D435) 𝐶 (U+1D436) 𝐷 (U+1D437) 𝐸 (U+1D438) 𝐹 (U+1D439) 𝐺 (U+1D43A) 𝐻 (U+1D43B) 𝐼 (U+1D43C) 𝐽 (U+1D43D) 𝐾 (U+1D43E) 𝐿 (U+1D43F) 𝑀 (U+1D440) 𝑁 (U+1D441) 𝑂 (U+1D442) 𝑃 (U+1D443) 𝑄 (U+1D444) 𝑅 (U+1D445) 𝑆 (U+1D446) 𝑇 (U+1D447) 𝑈 (U+1D448) 𝑉 (U+1D449) 𝑊 (U+1D44A) 𝑋 (U+1D44B) 𝑌 (U+1D44C) 𝑍 (U+1D44D) 𝑨 (U+1D468) 𝑩 (U+1D469) 𝑪 (U+1D46A) 𝑫 (U+1D46B) 𝑬 (U+1D46C) 𝑭 (U+1D46D) 𝑮 (U+1D46E) 𝑯 (U+1D46F) 𝑰 (U+1D470) 𝑱 (U+1D471) 𝑲 (U+1D472) 𝑳 (U+1D473) 𝑴 (U+1D474) 𝑵 (U+1D475) 𝑶 (U+1D476) 𝑷 (U+1D477) 𝑸 (U+1D478) 𝑹 (U+1D479) 𝑺 (U+1D47A) 𝑻 (U+1D47B) 𝑼 (U+1D47C) 𝑽 (U+1D47D) 𝑾 (U+1D47E) 𝑿 (U+1D47F) 𝒀 (U+1D480) 𝒁 (U+1D481) 𝒜 (U+1D49C) 𝒞 (U+1D49E) 𝒟 (U+1D49F) 𝒢 (U+1D4A2) 𝒥 (U+1D4A5) 𝒦 (U+1D4A6) 𝒩 (U+1D4A9) 𝒪 (U+1D4AA) 𝒫 (U+1D4AB) 𝒬 (U+1D4AC) 𝒮 (U+1D4AE) 𝒯 (U+1D4AF) 𝒰 
(U+1D4B0) 𝒱 (U+1D4B1) 𝒲 (U+1D4B2) 𝒳 (U+1D4B3) 𝒴 (U+1D4B4) 𝒵 (U+1D4B5) 𝓐 (U+1D4D0) 𝓑 (U+1D4D1) 𝓒 (U+1D4D2) 𝓓 (U+1D4D3) 𝓔 (U+1D4D4) 𝓕 (U+1D4D5) 𝓖 (U+1D4D6) 𝓗 (U+1D4D7) 𝓘 (U+1D4D8) 𝓙 (U+1D4D9) 𝓚 (U+1D4DA) 𝓛 (U+1D4DB) 𝓜 (U+1D4DC) 𝓝 (U+1D4DD) 𝓞 (U+1D4DE) 𝓟 (U+1D4DF) 𝓠 (U+1D4E0) 𝓡 (U+1D4E1) 𝓢 (U+1D4E2) 𝓣 (U+1D4E3) 𝓤 (U+1D4E4) 𝓥 (U+1D4E5) 𝓦 (U+1D4E6) 𝓧 (U+1D4E7) 𝓨 (U+1D4E8) 𝓩 (U+1D4E9) 𝔄 (U+1D504) 𝔅 (U+1D505) 𝔇 (U+1D507) 𝔈 (U+1D508) 𝔉 (U+1D509) 𝔊 (U+1D50A) 𝔍 (U+1D50D) 𝔎 (U+1D50E) 𝔏 (U+1D50F) 𝔐 (U+1D510) 𝔑 (U+1D511) 𝔒 (U+1D512) 𝔓 (U+1D513) 𝔔 (U+1D514) 𝔖 (U+1D516) 𝔗 (U+1D517) 𝔘 (U+1D518) 𝔙 (U+1D519) 𝔚 (U+1D51A) 𝔛 (U+1D51B) 𝔜 (U+1D51C) 𝔸 (U+1D538) 𝔹 (U+1D539) 𝔻 (U+1D53B) 𝔼 (U+1D53C) 𝔽 (U+1D53D) 𝔾 (U+1D53E) 𝕀 (U+1D540) 𝕁 (U+1D541) 𝕂 (U+1D542) 𝕃 (U+1D543) 𝕄 (U+1D544) 𝕆 (U+1D546) 𝕊 (U+1D54A) 𝕋 (U+1D54B) 𝕌 (U+1D54C) 𝕍 (U+1D54D) 𝕎 (U+1D54E) 𝕏 (U+1D54F) 𝕐 (U+1D550) 𝕬 (U+1D56C) 𝕭 (U+1D56D) 𝕮 (U+1D56E) 𝕯 (U+1D56F) 𝕰 (U+1D570) 𝕱 (U+1D571) 𝕲 (U+1D572) 𝕳 (U+1D573) 𝕴 (U+1D574) 𝕵 (U+1D575) 𝕶 (U+1D576) 𝕷 (U+1D577) 𝕸 (U+1D578) 𝕹 (U+1D579) 𝕺 (U+1D57A) 𝕻 (U+1D57B) 𝕼 (U+1D57C) 𝕽 (U+1D57D) 𝕾 (U+1D57E) 𝕿 (U+1D57F) 𝖀 (U+1D580) 𝖁 (U+1D581) 𝖂 (U+1D582) 𝖃 (U+1D583) 𝖄 (U+1D584) 𝖅 (U+1D585) 𝖠 (U+1D5A0) 𝖡 (U+1D5A1) 𝖢 (U+1D5A2) 𝖣 (U+1D5A3) 𝖤 (U+1D5A4) 𝖥 (U+1D5A5) 𝖦 (U+1D5A6) 𝖧 (U+1D5A7) 𝖨 (U+1D5A8) 𝖩 (U+1D5A9) 𝖪 (U+1D5AA) 𝖫 (U+1D5AB) 𝖬 (U+1D5AC) 𝖭 (U+1D5AD) 𝖮 (U+1D5AE) 𝖯 (U+1D5AF) 𝖰 (U+1D5B0) 𝖱 (U+1D5B1) 𝖲 (U+1D5B2) 𝖳 (U+1D5B3) 𝖴 (U+1D5B4) 𝖵 (U+1D5B5) 𝖶 (U+1D5B6) 𝖷 (U+1D5B7) 𝖸 (U+1D5B8) 𝖹 (U+1D5B9) 𝗔 (U+1D5D4) 𝗕 (U+1D5D5) 𝗖 (U+1D5D6) 𝗗 (U+1D5D7) 𝗘 (U+1D5D8) 𝗙 (U+1D5D9) 𝗚 (U+1D5DA) 𝗛 (U+1D5DB) 𝗜 (U+1D5DC) 𝗝 (U+1D5DD) 𝗞 (U+1D5DE) 𝗟 (U+1D5DF) 𝗠 (U+1D5E0) 𝗡 (U+1D5E1) 𝗢 (U+1D5E2) 𝗣 (U+1D5E3) 𝗤 (U+1D5E4) 𝗥 (U+1D5E5) 𝗦 (U+1D5E6) 𝗧 (U+1D5E7) 𝗨 (U+1D5E8) 𝗩 (U+1D5E9) 𝗪 (U+1D5EA) 𝗫 (U+1D5EB) 𝗬 (U+1D5EC) 𝗭 (U+1D5ED) 𝘈 (U+1D608) 𝘉 (U+1D609) 𝘊 (U+1D60A) 𝘋 (U+1D60B) 𝘌 (U+1D60C) 𝘍 (U+1D60D) 𝘎 (U+1D60E) 𝘏 (U+1D60F) 𝘐 (U+1D610) 𝘑 (U+1D611) 𝘒 (U+1D612) 𝘓 (U+1D613) 𝘔 (U+1D614) 𝘕 (U+1D615) 𝘖 (U+1D616) 𝘗 (U+1D617) 𝘘 
(U+1D618) 𝘙 (U+1D619) 𝘚 (U+1D61A) 𝘛 (U+1D61B) 𝘜 (U+1D61C) 𝘝 (U+1D61D) 𝘞 (U+1D61E) 𝘟 (U+1D61F) 𝘠 (U+1D620) 𝘡 (U+1D621) 𝘼 (U+1D63C) 𝘽 (U+1D63D) 𝘾 (U+1D63E) 𝘿 (U+1D63F) 𝙀 (U+1D640) 𝙁 (U+1D641) 𝙂 (U+1D642) 𝙃 (U+1D643) 𝙄 (U+1D644) 𝙅 (U+1D645) 𝙆 (U+1D646) 𝙇 (U+1D647) 𝙈 (U+1D648) 𝙉 (U+1D649) 𝙊 (U+1D64A) 𝙋 (U+1D64B) 𝙌 (U+1D64C) 𝙍 (U+1D64D) 𝙎 (U+1D64E) 𝙏 (U+1D64F) 𝙐 (U+1D650) 𝙑 (U+1D651) 𝙒 (U+1D652) 𝙓 (U+1D653) 𝙔 (U+1D654) 𝙕 (U+1D655) 𝙰 (U+1D670) 𝙱 (U+1D671) 𝙲 (U+1D672) 𝙳 (U+1D673) 𝙴 (U+1D674) 𝙵 (U+1D675) 𝙶 (U+1D676) 𝙷 (U+1D677) 𝙸 (U+1D678) 𝙹 (U+1D679) 𝙺 (U+1D67A) 𝙻 (U+1D67B) 𝙼 (U+1D67C) 𝙽 (U+1D67D) 𝙾 (U+1D67E) 𝙿 (U+1D67F) 𝚀 (U+1D680) 𝚁 (U+1D681) 𝚂 (U+1D682) 𝚃 (U+1D683) 𝚄 (U+1D684) 𝚅 (U+1D685) 𝚆 (U+1D686) 𝚇 (U+1D687) 𝚈 (U+1D688) 𝚉 (U+1D689) 𝚨 (U+1D6A8) 𝚩 (U+1D6A9) 𝚪 (U+1D6AA) 𝚫 (U+1D6AB) 𝚬 (U+1D6AC) 𝚭 (U+1D6AD) 𝚮 (U+1D6AE) 𝚯 (U+1D6AF) 𝚰 (U+1D6B0) 𝚱 (U+1D6B1) 𝚲 (U+1D6B2) 𝚳 (U+1D6B3) 𝚴 (U+1D6B4) 𝚵 (U+1D6B5) 𝚶 (U+1D6B6) 𝚷 (U+1D6B7) 𝚸 (U+1D6B8) 𝚹 (U+1D6B9) 𝚺 (U+1D6BA) 𝚻 (U+1D6BB) 𝚼 (U+1D6BC) 𝚽 (U+1D6BD) 𝚾 (U+1D6BE) 𝚿 (U+1D6BF) 𝛀 (U+1D6C0) 𝛢 (U+1D6E2) 𝛣 (U+1D6E3) 𝛤 (U+1D6E4) 𝛥 (U+1D6E5) 𝛦 (U+1D6E6) 𝛧 (U+1D6E7) 𝛨 (U+1D6E8) 𝛩 (U+1D6E9) 𝛪 (U+1D6EA) 𝛫 (U+1D6EB) 𝛬 (U+1D6EC) 𝛭 (U+1D6ED) 𝛮 (U+1D6EE) 𝛯 (U+1D6EF) 𝛰 (U+1D6F0) 𝛱 (U+1D6F1) 𝛲 (U+1D6F2) 𝛳 (U+1D6F3) 𝛴 (U+1D6F4) 𝛵 (U+1D6F5) 𝛶 (U+1D6F6) 𝛷 (U+1D6F7) 𝛸 (U+1D6F8) 𝛹 (U+1D6F9) 𝛺 (U+1D6FA) 𝜜 (U+1D71C) 𝜝 (U+1D71D) 𝜞 (U+1D71E) 𝜟 (U+1D71F) 𝜠 (U+1D720) 𝜡 (U+1D721) 𝜢 (U+1D722) 𝜣 (U+1D723) 𝜤 (U+1D724) 𝜥 (U+1D725) 𝜦 (U+1D726) 𝜧 (U+1D727) 𝜨 (U+1D728) 𝜩 (U+1D729) 𝜪 (U+1D72A) 𝜫 (U+1D72B) 𝜬 (U+1D72C) 𝜭 (U+1D72D) 𝜮 (U+1D72E) 𝜯 (U+1D72F) 𝜰 (U+1D730) 𝜱 (U+1D731) 𝜲 (U+1D732) 𝜳 (U+1D733) 𝜴 (U+1D734) 𝝖 (U+1D756) 𝝗 (U+1D757) 𝝘 (U+1D758) 𝝙 (U+1D759) 𝝚 (U+1D75A) 𝝛 (U+1D75B) 𝝜 (U+1D75C) 𝝝 (U+1D75D) 𝝞 (U+1D75E) 𝝟 (U+1D75F) 𝝠 (U+1D760) 𝝡 (U+1D761) 𝝢 (U+1D762) 𝝣 (U+1D763) 𝝤 (U+1D764) 𝝥 (U+1D765) 𝝦 (U+1D766) 𝝧 (U+1D767) 𝝨 (U+1D768) 𝝩 (U+1D769) 𝝪 (U+1D76A) 𝝫 (U+1D76B) 𝝬 (U+1D76C) 𝝭 (U+1D76D) 𝝮 (U+1D76E) 𝞐 (U+1D790) 𝞑 (U+1D791) 𝞒 (U+1D792) 𝞓 (U+1D793) 𝞔 
(U+1D794) 𝞕 (U+1D795) 𝞖 (U+1D796) 𝞗 (U+1D797) 𝞘 (U+1D798) 𝞙 (U+1D799) 𝞚 (U+1D79A) 𝞛 (U+1D79B) 𝞜 (U+1D79C) 𝞝 (U+1D79D) 𝞞 (U+1D79E) 𝞟 (U+1D79F) 𝞠 (U+1D7A0) 𝞡 (U+1D7A1) 𝞢 (U+1D7A2) 𝞣 (U+1D7A3) 𝞤 (U+1D7A4) 𝞥 (U+1D7A5) 𝞦 (U+1D7A6) 𝞧 (U+1D7A7) 𝞨 (U+1D7A8) 𝟊 (U+1D7CA)
L e x t ∖ U L_{ext}\setminus\mathbb{U} L e x t ∖ U = Character set (832 ) ĸ (U+0138) ƍ (U+018D) ƛ (U+019B) ƪ (U+01AA) ƫ (U+01AB) ƺ (U+01BA) ƾ (U+01BE) ȡ (U+0221) ȴ (U+0234) ȵ (U+0235) ȶ (U+0236) ȷ (U+0237) ȸ (U+0238) ȹ (U+0239) ɕ (U+0255) ɘ (U+0258) ɚ (U+025A) ɝ (U+025D) ɞ (U+025E) ɟ (U+025F) ɢ (U+0262) ɤ (U+0264) ɧ (U+0267) ɭ (U+026D) ɮ (U+026E) ɰ (U+0270) ɳ (U+0273) ɴ (U+0274) ɶ (U+0276) ɷ (U+0277) ɸ (U+0278) ɹ (U+0279) ɺ (U+027A) ɻ (U+027B) ɼ (U+027C) ɾ (U+027E) ɿ (U+027F) ʁ (U+0281) ʄ (U+0284) ʅ (U+0285) ʆ (U+0286) ʍ (U+028D) ʎ (U+028E) ʏ (U+028F) ʐ (U+0290) ʑ (U+0291) ʓ (U+0293) ʖ (U+0296) ʗ (U+0297) ʘ (U+0298) ʙ (U+0299) ʚ (U+029A) ʛ (U+029B) ʜ (U+029C) ʟ (U+029F) ʠ (U+02A0) ʡ (U+02A1) ʢ (U+02A2) ʣ (U+02A3) ʤ (U+02A4) ʥ (U+02A5) ʦ (U+02A6) ʧ (U+02A7) ʨ (U+02A8) ʩ (U+02A9) ʪ (U+02AA) ʫ (U+02AB) ʬ (U+02AC) ʭ (U+02AD) ʮ (U+02AE) ʯ (U+02AF) ϼ (U+03FC) ՠ (U+0560) ֈ (U+0588) (U+1C8A) ᴀ (U+1D00) ᴁ (U+1D01) ᴂ (U+1D02) ᴃ (U+1D03) ᴄ (U+1D04) ᴅ (U+1D05) ᴆ (U+1D06) ᴇ (U+1D07) ᴈ (U+1D08) ᴉ (U+1D09) ᴊ (U+1D0A) ᴋ (U+1D0B) ᴌ (U+1D0C) ᴍ (U+1D0D) ᴎ (U+1D0E) ᴏ (U+1D0F) ᴐ (U+1D10) ᴑ (U+1D11) ᴒ (U+1D12) ᴓ (U+1D13) ᴔ (U+1D14) ᴕ (U+1D15) ᴖ (U+1D16) ᴗ (U+1D17) ᴘ (U+1D18) ᴙ (U+1D19) ᴚ (U+1D1A) ᴛ (U+1D1B) ᴜ (U+1D1C) ᴝ (U+1D1D) ᴞ (U+1D1E) ᴟ (U+1D1F) ᴠ (U+1D20) ᴡ (U+1D21) ᴢ (U+1D22) ᴣ (U+1D23) ᴤ (U+1D24) ᴥ (U+1D25) ᴦ (U+1D26) ᴧ (U+1D27) ᴨ (U+1D28) ᴩ (U+1D29) ᴪ (U+1D2A) ᴫ (U+1D2B) ᵫ (U+1D6B) ᵬ (U+1D6C) ᵭ (U+1D6D) ᵮ (U+1D6E) ᵯ (U+1D6F) ᵰ (U+1D70) ᵱ (U+1D71) ᵲ (U+1D72) ᵳ (U+1D73) ᵴ (U+1D74) ᵵ (U+1D75) ᵶ (U+1D76) ᵷ (U+1D77) ᵺ (U+1D7A) ᵻ (U+1D7B) ᵼ (U+1D7C) ᵾ (U+1D7E) ᵿ (U+1D7F) ᶀ (U+1D80) ᶁ (U+1D81) ᶂ (U+1D82) ᶃ (U+1D83) ᶄ (U+1D84) ᶅ (U+1D85) ᶆ (U+1D86) ᶇ (U+1D87) ᶈ (U+1D88) ᶉ (U+1D89) ᶊ (U+1D8A) ᶋ (U+1D8B) ᶌ (U+1D8C) ᶍ (U+1D8D) ᶏ (U+1D8F) ᶐ (U+1D90) ᶑ (U+1D91) ᶒ (U+1D92) ᶓ (U+1D93) ᶔ (U+1D94) ᶕ (U+1D95) ᶖ (U+1D96) ᶗ (U+1D97) ᶘ (U+1D98) ᶙ (U+1D99) ᶚ (U+1D9A) ẜ (U+1E9C) ẝ (U+1E9D) ẟ (U+1E9F) ℊ (U+210A) ℎ (U+210E) ℏ (U+210F) ℓ (U+2113) ℯ (U+212F) ℴ (U+2134) ℹ (U+2139) ℼ (U+213C) ℽ (U+213D) ⅆ 
(U+2146) ⅇ (U+2147) ⅈ (U+2148) ⅉ (U+2149) ⱱ (U+2C71) ⱴ (U+2C74) ⱷ (U+2C77) ⱸ (U+2C78) ⱹ (U+2C79) ⱺ (U+2C7A) ⱻ (U+2C7B) ⳤ (U+2CE4) ꜰ (U+A730) ꜱ (U+A731) ꝱ (U+A771) ꝲ (U+A772) ꝳ (U+A773) ꝴ (U+A774) ꝵ (U+A775) ꝶ (U+A776) ꝷ (U+A777) ꝸ (U+A778) ꞎ (U+A78E) ꞕ (U+A795) ꞯ (U+A7AF) (U+A7CD) (U+A7CF) ꟓ (U+A7D3) ꟕ (U+A7D5) (U+A7DB) ꟺ (U+A7FA) ꬰ (U+AB30) ꬱ (U+AB31) ꬲ (U+AB32) ꬳ (U+AB33) ꬴ (U+AB34) ꬵ (U+AB35) ꬶ (U+AB36) ꬷ (U+AB37) ꬸ (U+AB38) ꬹ (U+AB39) ꬺ (U+AB3A) ꬻ (U+AB3B) ꬼ (U+AB3C) ꬽ (U+AB3D) ꬾ (U+AB3E) ꬿ (U+AB3F) ꭀ (U+AB40) ꭁ (U+AB41) ꭂ (U+AB42) ꭃ (U+AB43) ꭄ (U+AB44) ꭅ (U+AB45) ꭆ (U+AB46) ꭇ (U+AB47) ꭈ (U+AB48) ꭉ (U+AB49) ꭊ (U+AB4A) ꭋ (U+AB4B) ꭌ (U+AB4C) ꭍ (U+AB4D) ꭎ (U+AB4E) ꭏ (U+AB4F) ꭐ (U+AB50) ꭑ (U+AB51) ꭒ (U+AB52) ꭔ (U+AB54) ꭕ (U+AB55) ꭖ (U+AB56) ꭗ (U+AB57) ꭘ (U+AB58) ꭙ (U+AB59) ꭚ (U+AB5A) ꭠ (U+AB60) ꭡ (U+AB61) ꭢ (U+AB62) ꭣ (U+AB63) ꭤ (U+AB64) ꭥ (U+AB65) ꭦ (U+AB66) ꭧ (U+AB67) ꭨ (U+AB68) (U+10D70) (U+10D71) (U+10D72) (U+10D73) (U+10D74) (U+10D75) (U+10D76) (U+10D77) (U+10D78) (U+10D79) (U+10D7A) (U+10D7B) (U+10D7C) (U+10D7D) (U+10D7E) (U+10D7F) (U+10D80) (U+10D81) (U+10D82) (U+10D83) (U+10D84) (U+10D85) (U+16EBB) (U+16EBC) (U+16EBD) (U+16EBE) (U+16EBF) (U+16EC0) (U+16EC1) (U+16EC2) (U+16EC3) (U+16EC4) (U+16EC5) (U+16EC6) (U+16EC7) (U+16EC8) (U+16EC9) (U+16ECA) (U+16ECB) (U+16ECC) (U+16ECD) (U+16ECE) (U+16ECF) (U+16ED0) (U+16ED1) (U+16ED2) (U+16ED3) 𝐚 (U+1D41A) 𝐛 (U+1D41B) 𝐜 (U+1D41C) 𝐝 (U+1D41D) 𝐞 (U+1D41E) 𝐟 (U+1D41F) 𝐠 (U+1D420) 𝐡 (U+1D421) 𝐢 (U+1D422) 𝐣 (U+1D423) 𝐤 (U+1D424) 𝐥 (U+1D425) 𝐦 (U+1D426) 𝐧 (U+1D427) 𝐨 (U+1D428) 𝐩 (U+1D429) 𝐪 (U+1D42A) 𝐫 (U+1D42B) 𝐬 (U+1D42C) 𝐭 (U+1D42D) 𝐮 (U+1D42E) 𝐯 (U+1D42F) 𝐰 (U+1D430) 𝐱 (U+1D431) 𝐲 (U+1D432) 𝐳 (U+1D433) 𝑎 (U+1D44E) 𝑏 (U+1D44F) 𝑐 (U+1D450) 𝑑 (U+1D451) 𝑒 (U+1D452) 𝑓 (U+1D453) 𝑔 (U+1D454) 𝑖 (U+1D456) 𝑗 (U+1D457) 𝑘 (U+1D458) 𝑙 (U+1D459) 𝑚 (U+1D45A) 𝑛 (U+1D45B) 𝑜 (U+1D45C) 𝑝 (U+1D45D) 𝑞 (U+1D45E) 𝑟 (U+1D45F) 𝑠 (U+1D460) 𝑡 (U+1D461) 𝑢 (U+1D462) 𝑣 (U+1D463) 𝑤 (U+1D464) 𝑥 (U+1D465) 𝑦 (U+1D466) 𝑧 (U+1D467) 𝒂 (U+1D482) 𝒃 (U+1D483) 
𝒄 (U+1D484) 𝒅 (U+1D485) 𝒆 (U+1D486) 𝒇 (U+1D487) 𝒈 (U+1D488) 𝒉 (U+1D489) 𝒊 (U+1D48A) 𝒋 (U+1D48B) 𝒌 (U+1D48C) 𝒍 (U+1D48D) 𝒎 (U+1D48E) 𝒏 (U+1D48F) 𝒐 (U+1D490) 𝒑 (U+1D491) 𝒒 (U+1D492) 𝒓 (U+1D493) 𝒔 (U+1D494) 𝒕 (U+1D495) 𝒖 (U+1D496) 𝒗 (U+1D497) 𝒘 (U+1D498) 𝒙 (U+1D499) 𝒚 (U+1D49A) 𝒛 (U+1D49B) 𝒶 (U+1D4B6) 𝒷 (U+1D4B7) 𝒸 (U+1D4B8) 𝒹 (U+1D4B9) 𝒻 (U+1D4BB) 𝒽 (U+1D4BD) 𝒾 (U+1D4BE) 𝒿 (U+1D4BF) 𝓀 (U+1D4C0) 𝓁 (U+1D4C1) 𝓂 (U+1D4C2) 𝓃 (U+1D4C3) 𝓅 (U+1D4C5) 𝓆 (U+1D4C6) 𝓇 (U+1D4C7) 𝓈 (U+1D4C8) 𝓉 (U+1D4C9) 𝓊 (U+1D4CA) 𝓋 (U+1D4CB) 𝓌 (U+1D4CC) 𝓍 (U+1D4CD) 𝓎 (U+1D4CE) 𝓏 (U+1D4CF) 𝓪 (U+1D4EA) 𝓫 (U+1D4EB) 𝓬 (U+1D4EC) 𝓭 (U+1D4ED) 𝓮 (U+1D4EE) 𝓯 (U+1D4EF) 𝓰 (U+1D4F0) 𝓱 (U+1D4F1) 𝓲 (U+1D4F2) 𝓳 (U+1D4F3) 𝓴 (U+1D4F4) 𝓵 (U+1D4F5) 𝓶 (U+1D4F6) 𝓷 (U+1D4F7) 𝓸 (U+1D4F8) 𝓹 (U+1D4F9) 𝓺 (U+1D4FA) 𝓻 (U+1D4FB) 𝓼 (U+1D4FC) 𝓽 (U+1D4FD) 𝓾 (U+1D4FE) 𝓿 (U+1D4FF) 𝔀 (U+1D500) 𝔁 (U+1D501) 𝔂 (U+1D502) 𝔃 (U+1D503) 𝔞 (U+1D51E) 𝔟 (U+1D51F) 𝔠 (U+1D520) 𝔡 (U+1D521) 𝔢 (U+1D522) 𝔣 (U+1D523) 𝔤 (U+1D524) 𝔥 (U+1D525) 𝔦 (U+1D526) 𝔧 (U+1D527) 𝔨 (U+1D528) 𝔩 (U+1D529) 𝔪 (U+1D52A) 𝔫 (U+1D52B) 𝔬 (U+1D52C) 𝔭 (U+1D52D) 𝔮 (U+1D52E) 𝔯 (U+1D52F) 𝔰 (U+1D530) 𝔱 (U+1D531) 𝔲 (U+1D532) 𝔳 (U+1D533) 𝔴 (U+1D534) 𝔵 (U+1D535) 𝔶 (U+1D536) 𝔷 (U+1D537) 𝕒 (U+1D552) 𝕓 (U+1D553) 𝕔 (U+1D554) 𝕕 (U+1D555) 𝕖 (U+1D556) 𝕗 (U+1D557) 𝕘 (U+1D558) 𝕙 (U+1D559) 𝕚 (U+1D55A) 𝕛 (U+1D55B) 𝕜 (U+1D55C) 𝕝 (U+1D55D) 𝕞 (U+1D55E) 𝕟 (U+1D55F) 𝕠 (U+1D560) 𝕡 (U+1D561) 𝕢 (U+1D562) 𝕣 (U+1D563) 𝕤 (U+1D564) 𝕥 (U+1D565) 𝕦 (U+1D566) 𝕧 (U+1D567) 𝕨 (U+1D568) 𝕩 (U+1D569) 𝕪 (U+1D56A) 𝕫 (U+1D56B) 𝖆 (U+1D586) 𝖇 (U+1D587) 𝖈 (U+1D588) 𝖉 (U+1D589) 𝖊 (U+1D58A) 𝖋 (U+1D58B) 𝖌 (U+1D58C) 𝖍 (U+1D58D) 𝖎 (U+1D58E) 𝖏 (U+1D58F) 𝖐 (U+1D590) 𝖑 (U+1D591) 𝖒 (U+1D592) 𝖓 (U+1D593) 𝖔 (U+1D594) 𝖕 (U+1D595) 𝖖 (U+1D596) 𝖗 (U+1D597) 𝖘 (U+1D598) 𝖙 (U+1D599) 𝖚 (U+1D59A) 𝖛 (U+1D59B) 𝖜 (U+1D59C) 𝖝 (U+1D59D) 𝖞 (U+1D59E) 𝖟 (U+1D59F) 𝖺 (U+1D5BA) 𝖻 (U+1D5BB) 𝖼 (U+1D5BC) 𝖽 (U+1D5BD) 𝖾 (U+1D5BE) 𝖿 (U+1D5BF) 𝗀 (U+1D5C0) 𝗁 (U+1D5C1) 𝗂 (U+1D5C2) 𝗃 (U+1D5C3) 𝗄 (U+1D5C4) 𝗅 (U+1D5C5) 𝗆 (U+1D5C6) 𝗇 (U+1D5C7) 𝗈 (U+1D5C8) 𝗉 
(U+1D5C9) 𝗊 (U+1D5CA) 𝗋 (U+1D5CB) 𝗌 (U+1D5CC) 𝗍 (U+1D5CD) 𝗎 (U+1D5CE) 𝗏 (U+1D5CF) 𝗐 (U+1D5D0) 𝗑 (U+1D5D1) 𝗒 (U+1D5D2) 𝗓 (U+1D5D3) 𝗮 (U+1D5EE) 𝗯 (U+1D5EF) 𝗰 (U+1D5F0) 𝗱 (U+1D5F1) 𝗲 (U+1D5F2) 𝗳 (U+1D5F3) 𝗴 (U+1D5F4) 𝗵 (U+1D5F5) 𝗶 (U+1D5F6) 𝗷 (U+1D5F7) 𝗸 (U+1D5F8) 𝗹 (U+1D5F9) 𝗺 (U+1D5FA) 𝗻 (U+1D5FB) 𝗼 (U+1D5FC) 𝗽 (U+1D5FD) 𝗾 (U+1D5FE) 𝗿 (U+1D5FF) 𝘀 (U+1D600) 𝘁 (U+1D601) 𝘂 (U+1D602) 𝘃 (U+1D603) 𝘄 (U+1D604) 𝘅 (U+1D605) 𝘆 (U+1D606) 𝘇 (U+1D607) 𝘢 (U+1D622) 𝘣 (U+1D623) 𝘤 (U+1D624) 𝘥 (U+1D625) 𝘦 (U+1D626) 𝘧 (U+1D627) 𝘨 (U+1D628) 𝘩 (U+1D629) 𝘪 (U+1D62A) 𝘫 (U+1D62B) 𝘬 (U+1D62C) 𝘭 (U+1D62D) 𝘮 (U+1D62E) 𝘯 (U+1D62F) 𝘰 (U+1D630) 𝘱 (U+1D631) 𝘲 (U+1D632) 𝘳 (U+1D633) 𝘴 (U+1D634) 𝘵 (U+1D635) 𝘶 (U+1D636) 𝘷 (U+1D637) 𝘸 (U+1D638) 𝘹 (U+1D639) 𝘺 (U+1D63A) 𝘻 (U+1D63B) 𝙖 (U+1D656) 𝙗 (U+1D657) 𝙘 (U+1D658) 𝙙 (U+1D659) 𝙚 (U+1D65A) 𝙛 (U+1D65B) 𝙜 (U+1D65C) 𝙝 (U+1D65D) 𝙞 (U+1D65E) 𝙟 (U+1D65F) 𝙠 (U+1D660) 𝙡 (U+1D661) 𝙢 (U+1D662) 𝙣 (U+1D663) 𝙤 (U+1D664) 𝙥 (U+1D665) 𝙦 (U+1D666) 𝙧 (U+1D667) 𝙨 (U+1D668) 𝙩 (U+1D669) 𝙪 (U+1D66A) 𝙫 (U+1D66B) 𝙬 (U+1D66C) 𝙭 (U+1D66D) 𝙮 (U+1D66E) 𝙯 (U+1D66F) 𝚊 (U+1D68A) 𝚋 (U+1D68B) 𝚌 (U+1D68C) 𝚍 (U+1D68D) 𝚎 (U+1D68E) 𝚏 (U+1D68F) 𝚐 (U+1D690) 𝚑 (U+1D691) 𝚒 (U+1D692) 𝚓 (U+1D693) 𝚔 (U+1D694) 𝚕 (U+1D695) 𝚖 (U+1D696) 𝚗 (U+1D697) 𝚘 (U+1D698) 𝚙 (U+1D699) 𝚚 (U+1D69A) 𝚛 (U+1D69B) 𝚜 (U+1D69C) 𝚝 (U+1D69D) 𝚞 (U+1D69E) 𝚟 (U+1D69F) 𝚠 (U+1D6A0) 𝚡 (U+1D6A1) 𝚢 (U+1D6A2) 𝚣 (U+1D6A3) 𝚤 (U+1D6A4) 𝚥 (U+1D6A5) 𝛂 (U+1D6C2) 𝛃 (U+1D6C3) 𝛄 (U+1D6C4) 𝛅 (U+1D6C5) 𝛆 (U+1D6C6) 𝛇 (U+1D6C7) 𝛈 (U+1D6C8) 𝛉 (U+1D6C9) 𝛊 (U+1D6CA) 𝛋 (U+1D6CB) 𝛌 (U+1D6CC) 𝛍 (U+1D6CD) 𝛎 (U+1D6CE) 𝛏 (U+1D6CF) 𝛐 (U+1D6D0) 𝛑 (U+1D6D1) 𝛒 (U+1D6D2) 𝛓 (U+1D6D3) 𝛔 (U+1D6D4) 𝛕 (U+1D6D5) 𝛖 (U+1D6D6) 𝛗 (U+1D6D7) 𝛘 (U+1D6D8) 𝛙 (U+1D6D9) 𝛚 (U+1D6DA) 𝛜 (U+1D6DC) 𝛝 (U+1D6DD) 𝛞 (U+1D6DE) 𝛟 (U+1D6DF) 𝛠 (U+1D6E0) 𝛡 (U+1D6E1) 𝛼 (U+1D6FC) 𝛽 (U+1D6FD) 𝛾 (U+1D6FE) 𝛿 (U+1D6FF) 𝜀 (U+1D700) 𝜁 (U+1D701) 𝜂 (U+1D702) 𝜃 (U+1D703) 𝜄 (U+1D704) 𝜅 (U+1D705) 𝜆 (U+1D706) 𝜇 (U+1D707) 𝜈 (U+1D708) 𝜉 (U+1D709) 𝜊 (U+1D70A) 𝜋 (U+1D70B) 𝜌 (U+1D70C) 𝜍 (U+1D70D) 𝜎 
(U+1D70E) 𝜏 (U+1D70F) 𝜐 (U+1D710) 𝜑 (U+1D711) 𝜒 (U+1D712) 𝜓 (U+1D713) 𝜔 (U+1D714) 𝜖 (U+1D716) 𝜗 (U+1D717) 𝜘 (U+1D718) 𝜙 (U+1D719) 𝜚 (U+1D71A) 𝜛 (U+1D71B) 𝜶 (U+1D736) 𝜷 (U+1D737) 𝜸 (U+1D738) 𝜹 (U+1D739) 𝜺 (U+1D73A) 𝜻 (U+1D73B) 𝜼 (U+1D73C) 𝜽 (U+1D73D) 𝜾 (U+1D73E) 𝜿 (U+1D73F) 𝝀 (U+1D740) 𝝁 (U+1D741) 𝝂 (U+1D742) 𝝃 (U+1D743) 𝝄 (U+1D744) 𝝅 (U+1D745) 𝝆 (U+1D746) 𝝇 (U+1D747) 𝝈 (U+1D748) 𝝉 (U+1D749) 𝝊 (U+1D74A) 𝝋 (U+1D74B) 𝝌 (U+1D74C) 𝝍 (U+1D74D) 𝝎 (U+1D74E) 𝝐 (U+1D750) 𝝑 (U+1D751) 𝝒 (U+1D752) 𝝓 (U+1D753) 𝝔 (U+1D754) 𝝕 (U+1D755) 𝝰 (U+1D770) 𝝱 (U+1D771) 𝝲 (U+1D772) 𝝳 (U+1D773) 𝝴 (U+1D774) 𝝵 (U+1D775) 𝝶 (U+1D776) 𝝷 (U+1D777) 𝝸 (U+1D778) 𝝹 (U+1D779) 𝝺 (U+1D77A) 𝝻 (U+1D77B) 𝝼 (U+1D77C) 𝝽 (U+1D77D) 𝝾 (U+1D77E) 𝝿 (U+1D77F) 𝞀 (U+1D780) 𝞁 (U+1D781) 𝞂 (U+1D782) 𝞃 (U+1D783) 𝞄 (U+1D784) 𝞅 (U+1D785) 𝞆 (U+1D786) 𝞇 (U+1D787) 𝞈 (U+1D788) 𝞊 (U+1D78A) 𝞋 (U+1D78B) 𝞌 (U+1D78C) 𝞍 (U+1D78D) 𝞎 (U+1D78E) 𝞏 (U+1D78F) 𝞪 (U+1D7AA) 𝞫 (U+1D7AB) 𝞬 (U+1D7AC) 𝞭 (U+1D7AD) 𝞮 (U+1D7AE) 𝞯 (U+1D7AF) 𝞰 (U+1D7B0) 𝞱 (U+1D7B1) 𝞲 (U+1D7B2) 𝞳 (U+1D7B3) 𝞴 (U+1D7B4) 𝞵 (U+1D7B5) 𝞶 (U+1D7B6) 𝞷 (U+1D7B7) 𝞸 (U+1D7B8) 𝞹 (U+1D7B9) 𝞺 (U+1D7BA) 𝞻 (U+1D7BB) 𝞼 (U+1D7BC) 𝞽 (U+1D7BD) 𝞾 (U+1D7BE) 𝞿 (U+1D7BF) 𝟀 (U+1D7C0) 𝟁 (U+1D7C1) 𝟂 (U+1D7C2) 𝟄 (U+1D7C4) 𝟅 (U+1D7C5) 𝟆 (U+1D7C6) 𝟇 (U+1D7C7) 𝟈 (U+1D7C8) 𝟉 (U+1D7C9) 𝟋 (U+1D7CB) 𝼀 (U+1DF00) 𝼁 (U+1DF01) 𝼂 (U+1DF02) 𝼃 (U+1DF03) 𝼄 (U+1DF04) 𝼅 (U+1DF05) 𝼆 (U+1DF06) 𝼇 (U+1DF07) 𝼈 (U+1DF08) 𝼉 (U+1DF09) 𝼋 (U+1DF0B) 𝼌 (U+1DF0C) 𝼍 (U+1DF0D) 𝼎 (U+1DF0E) 𝼏 (U+1DF0F) 𝼐 (U+1DF10) 𝼑 (U+1DF11) 𝼒 (U+1DF12) 𝼓 (U+1DF13) 𝼔 (U+1DF14) 𝼕 (U+1DF15) 𝼖 (U+1DF16) 𝼗 (U+1DF17) 𝼘 (U+1DF18) 𝼙 (U+1DF19) 𝼚 (U+1DF1A) 𝼛 (U+1DF1B) 𝼜 (U+1DF1C) 𝼝 (U+1DF1D) 𝼞 (U+1DF1E) 𝼥 (U+1DF25) 𝼦 (U+1DF26) 𝼧 (U+1DF27) 𝼨 (U+1DF28) 𝼩 (U+1DF29) 𝼪 (U+1DF2A)
This means there are cased letters that are case-mapping invariant. Always-upper characters include:
Greek and Coptic
Variants of GREEK UPSILON: U+03D2 – U+03D4
Letterlike Symbols
EULER CONSTANT (U+2107)
DOUBLE-STRUCK (ITALIC) CAPITAL {C,H,N,P,Q,R,Z,GAMMA,PI,D}: U+2102, U+210D, U+2115, U+2119, U+211A, U+211D, U+2124, U+213E, U+213F, U+2145
SCRIPT CAPITAL {H,I,L,R,B,E,F,M}: U+210B, U+2110, U+2112, U+211B, U+212C, U+2130, U+2131, U+2133
BLACK-LETTER CAPITAL {H,I,R,Z,C}: U+210C, U+2111, U+211C, U+2128, U+212D
Mathematical Alphanumeric Symbols
MATHEMATICAL {BOLD,ITALIC,BOLD ITALIC,SCRIPT,BOLD SCRIPT,FRAKTUR,DOUBLE-STRUCK,BOLD FRAKTUR,SANS-SERIF,SANS-SERIF BOLD,SANS-SERIF ITALIC,SANS-SERIF BOLD ITALIC,MONOSPACE} CAPITAL Latin alphabet: U+1D400 – U+1D419, U+1D434 – U+1D44D, U+1D468 – U+1D481, U+1D49C – U+1D4B5, U+1D4D0 – U+1D4E9, U+1D504 – U+1D51C, U+1D538 – U+1D550, U+1D56C – U+1D585, U+1D5A0 – U+1D5B9, U+1D5D4 – U+1D5ED, U+1D608 – U+1D621, U+1D63C – U+1D655, U+1D670 – U+1D689
MATHEMATICAL {BOLD,ITALIC,BOLD ITALIC,SANS-SERIF BOLD,SANS-SERIF ITALIC} CAPITAL Greek alphabet: U+1D6A8 – U+1D6C0, U+1D6E2 – U+1D6FA, U+1D71C – U+1D734, U+1D756 – U+1D76E, U+1D790 – U+1D7A8
MATHEMATICAL BOLD CAPITAL DIGAMMA (U+1D7CA)
Always-lower characters include:
Latin Extended-A
LATIN SMALL LETTER KRA (U+0138)
Latin Extended-B
LATIN SMALL LETTER TURNED DELTA (U+018D)
LATIN SMALL LETTER LAMBDA WITH STROKE (U+019B)
LATIN LETTER REVERSED ESH LOOP (U+01AA)
LATIN SMALL LETTER T WITH PALATAL HOOK (U+01AB)
LATIN SMALL LETTER EZH WITH TAIL (U+01BA)
LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE (U+01BE)
LATIN SMALL LETTER {D,L,N,T} WITH CURL: U+0221, U+0234 – U+0236
LATIN SMALL LETTER DOTLESS J (U+0237)
LATIN SMALL LETTER {DB,QP} DIGRAPH: U+0238, U+0239
IPA Extensions
U+0250 – U+02AF, except 28 of them
Greek and Coptic
GREEK RHO WITH STROKE SYMBOL (U+03FC)
Armenian
ARMENIAN SMALL LETTER TURNED AYB (U+0560)
ARMENIAN SMALL LETTER YI WITH STROKE (U+0588)
Phonetic Extensions
Latin letters, Greek letters, and Cyrillic letter: U+1D00 – U+1D2B
Latin letter for American lexicography, Latin letters with middle tilde: U+1D6B – U+1D76
LATIN SMALL LETTER TURNED G (U+1D77)
Other phonetic symbols, except LATIN SMALL LETTER INSULAR G: U+1D7A – U+1D7F
Phonetic Extensions Supplement
Latin letters with palatal hook, except LATIN SMALL LETTER Z WITH PALATAL HOOK: U+1D80 – U+1D8D
Latin letters with retroflex hook: U+1D8F – U+1D9A
Latin Extended Additional
Medievalist additions: U+1E9C, U+1E9D, U+1E9F
...
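A few entries from these lists can be spot-checked with Unicode property escapes. The sketch below assumes that "upper case" and "lower case" mean the general categories Uppercase_Letter (`\p{Lu}`) and Lowercase_Letter (`\p{Ll}`); the helper names are illustrative and not taken from the data collector.

```javascript
// Assumption: "cased" means general category Lu or Ll.
const isUpperLetter = (c) => /^\p{Lu}$/u.test(c);
const isLowerLetter = (c) => /^\p{Ll}$/u.test(c);

// A character is case-mapping invariant if neither function changes it.
const isInvariant = (c) => c.toUpperCase() === c && c.toLowerCase() === c;

const casedButInvariant = (c) =>
  (isUpperLetter(c) || isLowerLetter(c)) && isInvariant(c);

// Samples taken from the lists above.
const alwaysUpperSamples = ["\u2102", "\u210B", "\u{1D400}"]; // ℂ, ℋ, 𝐀
const alwaysLowerSamples = ["\u0138", "\u0237", "\u03FC"]; // ĸ, ȷ, ϼ

console.log(alwaysUpperSamples.every(casedButInvariant));
console.log(alwaysLowerSamples.every(casedButInvariant));
```

The results depend on the Unicode version shipped with the JavaScript engine, so a newer engine may give some of these characters a case mapping.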
To make the discussion more meaningful, from here on we restrict attention to $U$ and $L$ instead of $U_{ext}$ and $L_{ext}$, so that all sets in question are subsets of $\mathbb{UC}$. Uppercase (lowercase) letters that are case-mapping variant must be lowercase (uppercase) variant, because we already showed that they are uppercase (lowercase) invariant.
No. $U_{ext}$ and $L_{ext}$ do not partition $\mathbb{UC}$: there are characters that are uncased but case-mapping variant:
$\mathbb{UC}\setminus(U_{ext}\cup L_{ext})$ = Character set (116): Dž (U+01C5) Lj (U+01C8) Nj (U+01CB) Dz (U+01F2) ͅ (U+0345) ᾈ (U+1F88) ᾉ (U+1F89) ᾊ (U+1F8A) ᾋ (U+1F8B) ᾌ (U+1F8C) ᾍ (U+1F8D) ᾎ (U+1F8E) ᾏ (U+1F8F) ᾘ (U+1F98) ᾙ (U+1F99) ᾚ (U+1F9A) ᾛ (U+1F9B) ᾜ (U+1F9C) ᾝ (U+1F9D) ᾞ (U+1F9E) ᾟ (U+1F9F) ᾨ (U+1FA8) ᾩ (U+1FA9) ᾪ (U+1FAA) ᾫ (U+1FAB) ᾬ (U+1FAC) ᾭ (U+1FAD) ᾮ (U+1FAE) ᾯ (U+1FAF) ᾼ (U+1FBC) ῌ (U+1FCC) ῼ (U+1FFC) Ⅰ (U+2160) Ⅱ (U+2161) Ⅲ (U+2162) Ⅳ (U+2163) Ⅴ (U+2164) Ⅵ (U+2165) Ⅶ (U+2166) Ⅷ (U+2167) Ⅸ (U+2168) Ⅹ (U+2169) Ⅺ (U+216A) Ⅻ (U+216B) Ⅼ (U+216C) Ⅽ (U+216D) Ⅾ (U+216E) Ⅿ (U+216F) ⅰ (U+2170) ⅱ (U+2171) ⅲ (U+2172) ⅳ (U+2173) ⅴ (U+2174) ⅵ (U+2175) ⅶ (U+2176) ⅷ (U+2177) ⅸ (U+2178) ⅹ (U+2179) ⅺ (U+217A) ⅻ (U+217B) ⅼ (U+217C) ⅽ (U+217D) ⅾ (U+217E) ⅿ (U+217F) Ⓐ (U+24B6) Ⓑ (U+24B7) Ⓒ (U+24B8) Ⓓ (U+24B9) Ⓔ (U+24BA) Ⓕ (U+24BB) Ⓖ (U+24BC) Ⓗ (U+24BD) Ⓘ (U+24BE) Ⓙ (U+24BF) Ⓚ (U+24C0) Ⓛ (U+24C1) Ⓜ (U+24C2) Ⓝ (U+24C3) Ⓞ (U+24C4) Ⓟ (U+24C5) Ⓠ (U+24C6) Ⓡ (U+24C7) Ⓢ (U+24C8) Ⓣ (U+24C9) Ⓤ (U+24CA) Ⓥ (U+24CB) Ⓦ (U+24CC) Ⓧ (U+24CD) Ⓨ (U+24CE) Ⓩ (U+24CF) ⓐ (U+24D0) ⓑ (U+24D1) ⓒ (U+24D2) ⓓ (U+24D3) ⓔ (U+24D4) ⓕ (U+24D5) ⓖ (U+24D6) ⓗ (U+24D7) ⓘ (U+24D8) ⓙ (U+24D9) ⓚ (U+24DA) ⓛ (U+24DB) ⓜ (U+24DC) ⓝ (U+24DD) ⓞ (U+24DE) ⓟ (U+24DF) ⓠ (U+24E0) ⓡ (U+24E1) ⓢ (U+24E2) ⓣ (U+24E3) ⓤ (U+24E4) ⓥ (U+24E5) ⓦ (U+24E6) ⓧ (U+24E7) ⓨ (U+24E8) ⓩ (U+24E9)
They include $Gr_U$, $La$, and also:
Number Forms
Uppercase roman numerals ($Ro_U$): U+2160 – U+216F
Small roman numerals ($Ro_L$): U+2170 – U+217F
Enclosed Alphanumerics
Circled Latin letters ($Ci_U$): U+24B6 – U+24CF
Circled small Latin letters ($Ci_L$): U+24D0 – U+24E9
Combining Diacritical Marks
COMBINING GREEK YPOGEGRAMMENI (U+0345) (Previously mentioned)
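One representative from each group can be checked directly. This is a minimal sketch that assumes "uncased" means "neither Uppercase_Letter nor Lowercase_Letter"; the general categories in the labels (Lt, Nl, So, Mn) come from the Unicode Character Database, and the helper names are illustrative.

```javascript
// Assumption: "uncased" means the general category is neither Lu nor Ll.
const isCasedLetter = (c) => /^[\p{Lu}\p{Ll}]$/u.test(c);
const isVariant = (c) => c.toUpperCase() !== c || c.toLowerCase() !== c;

const samples = {
  "\u01C5": "titlecase digraph (Lt)", // ǅ
  "\u2160": "roman numeral (Nl)", // Ⅰ
  "\u24D0": "circled letter (So)", // ⓐ
  "\u0345": "combining mark (Mn)", // COMBINING GREEK YPOGEGRAMMENI
};

for (const [c, note] of Object.entries(samples)) {
  const cp = c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${cp} ${note}: uncased=${!isCasedLetter(c)}, variant=${isVariant(c)}`);
}
```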
No. There are characters that are both uppercase variant and lowercase variant:
$\{ c\in\mathbb{UC}\mid c\neq \mathtt{toUpperCase}(c) \neq \mathtt{toLowerCase}(c) \}$ = $La\cup Gr_U$ = Character set (31): Dž (U+01C5) → DŽ (U+01C4), dž (U+01C6); Lj (U+01C8) → LJ (U+01C7), lj (U+01C9); Nj (U+01CB) → NJ (U+01CA), nj (U+01CC); Dz (U+01F2) → DZ (U+01F1), dz (U+01F3); ᾈ (U+1F88) → ἈΙ (U+1F08 U+0399), ᾀ (U+1F80); ᾉ (U+1F89) → ἉΙ (U+1F09 U+0399), ᾁ (U+1F81); ᾊ (U+1F8A) → ἊΙ (U+1F0A U+0399), ᾂ (U+1F82); ᾋ (U+1F8B) → ἋΙ (U+1F0B U+0399), ᾃ (U+1F83); ᾌ (U+1F8C) → ἌΙ (U+1F0C U+0399), ᾄ (U+1F84); ᾍ (U+1F8D) → ἍΙ (U+1F0D U+0399), ᾅ (U+1F85); ᾎ (U+1F8E) → ἎΙ (U+1F0E U+0399), ᾆ (U+1F86); ᾏ (U+1F8F) → ἏΙ (U+1F0F U+0399), ᾇ (U+1F87); ᾘ (U+1F98) → ἨΙ (U+1F28 U+0399), ᾐ (U+1F90); ᾙ (U+1F99) → ἩΙ (U+1F29 U+0399), ᾑ (U+1F91); ᾚ (U+1F9A) → ἪΙ (U+1F2A U+0399), ᾒ (U+1F92); ᾛ (U+1F9B) → ἫΙ (U+1F2B U+0399), ᾓ (U+1F93); ᾜ (U+1F9C) → ἬΙ (U+1F2C U+0399), ᾔ (U+1F94); ᾝ (U+1F9D) → ἭΙ (U+1F2D U+0399), ᾕ (U+1F95); ᾞ (U+1F9E) → ἮΙ (U+1F2E U+0399), ᾖ (U+1F96); ᾟ (U+1F9F) → ἯΙ (U+1F2F U+0399), ᾗ (U+1F97); ᾨ (U+1FA8) → ὨΙ (U+1F68 U+0399), ᾠ (U+1FA0); ᾩ (U+1FA9) → ὩΙ (U+1F69 U+0399), ᾡ (U+1FA1); ᾪ (U+1FAA) → ὪΙ (U+1F6A U+0399), ᾢ (U+1FA2); ᾫ (U+1FAB) → ὫΙ (U+1F6B U+0399), ᾣ (U+1FA3); ᾬ (U+1FAC) → ὬΙ (U+1F6C U+0399), ᾤ (U+1FA4); ᾭ (U+1FAD) → ὭΙ (U+1F6D U+0399), ᾥ (U+1FA5); ᾮ (U+1FAE) → ὮΙ (U+1F6E U+0399), ᾦ (U+1FA6); ᾯ (U+1FAF) → ὯΙ (U+1F6F U+0399), ᾧ (U+1FA7); ᾼ (U+1FBC) → ΑΙ (U+0391 U+0399), ᾳ (U+1FB3); ῌ (U+1FCC) → ΗΙ (U+0397 U+0399), ῃ (U+1FC3); ῼ (U+1FFC) → ΩΙ (U+03A9 U+0399), ῳ (U+1FF3)
In addition, as shown before, these characters cannot be produced by toUpperCase() or toLowerCase() from any input, including themselves.
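The claim that nothing maps to these characters can be checked by brute force. The sketch below scans every code point and collects those whose uppercase or lowercase form equals a given target. `producersOf` is an illustrative name, and the outcome depends on the Unicode version of the JavaScript engine.

```javascript
// Collect every code point (lone surrogates excluded) that toUpperCase()
// or toLowerCase() maps to `target`.
function producersOf(target) {
  const hits = [];
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip surrogate code points
    const c = String.fromCodePoint(cp);
    if (c.toUpperCase() === target || c.toLowerCase() === target) hits.push(c);
  }
  return hits;
}

// ǅ (U+01C5) and ᾈ (U+1F88) are both uppercase and lowercase variant,
// so not even they themselves map to them.
const producersOfDz = producersOf("\u01C5");
const producersOfAlpha = producersOf("\u1F88");
// For contrast: an ordinary letter is produced by itself and its counterpart.
const producersOfA = producersOf("A");

console.log(producersOfDz.length, producersOfAlpha.length, producersOfA);
```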
We already mentioned that certain uppercase and lowercase letters are mapping invariant. Furthermore, there are plenty of characters in $M_L\cup M_U\cup N_L\cup N_U$ that are cased. Dropping those, the answer is yes. If the input is a Lowercase_Letter, the output of toUpperCase() is always an Uppercase_Letter. If the input is an Uppercase_Letter, the output of toLowerCase() is always a Lowercase_Letter.
$\mathtt{toUpperCase}(L\setminus M_L\setminus N_L) \setminus U$ = ∅
$\mathtt{toLowerCase}(U\setminus M_U\setminus N_U) \setminus L$ = ∅
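The first identity can be spot-checked over a range of code points. The sketch below looks for Lowercase_Letter characters whose uppercase form is a single, different code point that is not an Uppercase_Letter. It approximates the exclusion of $M_L$ and $N_L$ by skipping every character whose uppercase form has more than one code point, which is an assumption about those sets rather than their definition. `lowerToNonUpper` is an illustrative name.

```javascript
const isSingleCodePoint = (s) => [...s].length === 1;

// Returns the Lowercase_Letter characters in [from, to] whose uppercase form
// is a single, different code point that is NOT an Uppercase_Letter.
function lowerToNonUpper(from, to) {
  const offenders = [];
  for (let cp = from; cp <= to; cp++) {
    const c = String.fromCodePoint(cp);
    if (!/^\p{Ll}$/u.test(c)) continue; // only Lowercase_Letter inputs
    const up = c.toUpperCase();
    if (up === c || !isSingleCodePoint(up)) continue; // invariant or multi-code-point
    if (!/^\p{Lu}$/u.test(up)) offenders.push(c);
  }
  return offenders;
}

// Latin, IPA, Greek and Cyrillic blocks; the range stays below the surrogates.
const offenders = lowerToNonUpper(0x0000, 0x052f);
console.log(offenders);
```

Swapping the roles of `\p{Ll}`/`\p{Lu}` and `toUpperCase()`/`toLowerCase()` gives the corresponding check for the second identity.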
(Again, disregarding multi-code-point characters.) No and no (but yes, if you count case-mapping invariant but cased characters). $U$ and $L$ are proper subsets of $\mathcal{R}_U$ and $\mathcal{R}_L$, respectively:
$U\setminus \mathcal{R}_U$ = ∅
$\mathcal{R}_U\setminus U\setminus M_L'\setminus N_L'$ = $Ro_U\cup Ci_U$ = Character set (42): Ⅰ (U+2160) Ⅱ (U+2161) Ⅲ (U+2162) Ⅳ (U+2163) Ⅴ (U+2164) Ⅵ (U+2165) Ⅶ (U+2166) Ⅷ (U+2167) Ⅸ (U+2168) Ⅹ (U+2169) Ⅺ (U+216A) Ⅻ (U+216B) Ⅼ (U+216C) Ⅽ (U+216D) Ⅾ (U+216E) Ⅿ (U+216F) Ⓐ (U+24B6) Ⓑ (U+24B7) Ⓒ (U+24B8) Ⓓ (U+24B9) Ⓔ (U+24BA) Ⓕ (U+24BB) Ⓖ (U+24BC) Ⓗ (U+24BD) Ⓘ (U+24BE) Ⓙ (U+24BF) Ⓚ (U+24C0) Ⓛ (U+24C1) Ⓜ (U+24C2) Ⓝ (U+24C3) Ⓞ (U+24C4) Ⓟ (U+24C5) Ⓠ (U+24C6) Ⓡ (U+24C7) Ⓢ (U+24C8) Ⓣ (U+24C9) Ⓤ (U+24CA) Ⓥ (U+24CB) Ⓦ (U+24CC) Ⓧ (U+24CD) Ⓨ (U+24CE) Ⓩ (U+24CF)
$L\setminus \mathcal{R}_L$ = ∅
$\mathcal{R}_L\setminus L\setminus M_U'\setminus N_U'$ = $Ro_L\cup Ci_L\cup\{\text{U+0345}\}$ = Character set (43): ͅ (U+0345) ⅰ (U+2170) ⅱ (U+2171) ⅲ (U+2172) ⅳ (U+2173) ⅴ (U+2174) ⅵ (U+2175) ⅶ (U+2176) ⅷ (U+2177) ⅸ (U+2178) ⅹ (U+2179) ⅺ (U+217A) ⅻ (U+217B) ⅼ (U+217C) ⅽ (U+217D) ⅾ (U+217E) ⅿ (U+217F) ⓐ (U+24D0) ⓑ (U+24D1) ⓒ (U+24D2) ⓓ (U+24D3) ⓔ (U+24D4) ⓕ (U+24D5) ⓖ (U+24D6) ⓗ (U+24D7) ⓘ (U+24D8) ⓙ (U+24D9) ⓚ (U+24DA) ⓛ (U+24DB) ⓜ (U+24DC) ⓝ (U+24DD) ⓞ (U+24DE) ⓟ (U+24DF) ⓠ (U+24E0) ⓡ (U+24E1) ⓢ (U+24E2) ⓣ (U+24E3) ⓤ (U+24E4) ⓥ (U+24E5) ⓦ (U+24E6) ⓧ (U+24E7) ⓨ (U+24E8) ⓩ (U+24E9)
Uncased letters produced by to{Upper,Lower}Case are the same sets as discussed before: those characters that are uncased but case-mapping variant. To produce an uncased output, the input must be uncased too.
On the other hand, $L$ and $\mathcal{R}_U$ are disjoint, and so are $U$ and $\mathcal{R}_L$:
$L\cap \mathcal{R}_U$ = ∅
$U\cap \mathcal{R}_L$ = ∅
So toUpperCase() never produces a Lowercase_Letter, and toLowerCase() never produces an Uppercase_Letter.
Yes. Uncased letters may become cased after case mapping:
$\mathtt{toUpperCase}\left(\mathbb{UC}\setminus (U\cup L)\right) \cap U$ = $\mathtt{toUpperCase}(La\cup \{\text{U+0345}\})$ = Character set (5): DŽ (U+01C4) LJ (U+01C7) NJ (U+01CA) DZ (U+01F1) Ι (U+0399)
$\mathtt{toLowerCase}\left(\mathbb{UC}\setminus (U\cup L)\right) \cap L$ = $\mathtt{toLowerCase}(La\cup Gr_U)$ = Character set (31): dž (U+01C6) lj (U+01C9) nj (U+01CC) dz (U+01F3) ᾀ (U+1F80) ᾁ (U+1F81) ᾂ (U+1F82) ᾃ (U+1F83) ᾄ (U+1F84) ᾅ (U+1F85) ᾆ (U+1F86) ᾇ (U+1F87) ᾐ (U+1F90) ᾑ (U+1F91) ᾒ (U+1F92) ᾓ (U+1F93) ᾔ (U+1F94) ᾕ (U+1F95) ᾖ (U+1F96) ᾗ (U+1F97) ᾠ (U+1FA0) ᾡ (U+1FA1) ᾢ (U+1FA2) ᾣ (U+1FA3) ᾤ (U+1FA4) ᾥ (U+1FA5) ᾦ (U+1FA6) ᾧ (U+1FA7) ᾳ (U+1FB3) ῃ (U+1FC3) ῳ (U+1FF3)
These characters are also the characters that are uncased but case-mapping variant.
To summarize:
Input and output cases of toUpperCase when the input is...
| Input case \ Output case | Upper case | Lower case | Uncased |
| --- | --- | --- | --- |
| Upper case | $U_{ext}$ (identity) | Never | Never |
| Lower case | $L$ | $L_{ext}\setminus L$ (identity) | Never |
| Uncased | $La$, U+0345 | Never | $Ro_L$, $Ci_L$, other identities |
Input and output cases of toLowerCase when the input is...
| Input case \ Output case | Upper case | Lower case | Uncased |
| --- | --- | --- | --- |
| Upper case | $U_{ext}\setminus U$ (identity) | $U$ | Never |
| Lower case | Never | $L_{ext}$ (identity) | Never |
| Uncased | Never | $La$, $Gr_U$ | $Ro_U$, $Ci_U$, other identities |
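Individual cells of the two summaries can be reproduced by classifying a character together with both of its mapped forms. This is a minimal sketch under the assumption that "upper case" and "lower case" correspond to `\p{Lu}` and `\p{Ll}`. The `category` and `describe` helpers are illustrative, and multi-code-point results get their own label because the summaries disregard them.

```javascript
// Classify a string as upper, lower, or uncased (single code points only).
const category = (s) => {
  if ([...s].length !== 1) return "multi-code-point";
  if (/^\p{Lu}$/u.test(s)) return "upper";
  if (/^\p{Ll}$/u.test(s)) return "lower";
  return "uncased";
};

// Input case plus the case of both mapped forms.
const describe = (c) => ({
  input: category(c),
  upper: category(c.toUpperCase()),
  lower: category(c.toLowerCase()),
});

// a, ǅ, combining ypogegrammeni, Ⅰ, ᾈ
for (const c of ["a", "\u01C5", "\u0345", "\u2160", "\u1F88"]) {
  console.log(c.codePointAt(0).toString(16), describe(c));
}
```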
We now focus on these particular subsets:
$M_L$
Lower case (thus lowercase invariant): $M_L\cap (\mathbb{UC}\setminus L)$ = ∅
$M_U$
Upper case (thus uppercase invariant): $M_U\cap (\mathbb{UC}\setminus U)$ = ∅
$N_L$
We already mentioned that $Gr_U$ is uncased, and both uppercase and lowercase variant. $N_L\setminus Gr_U$ is lower case (thus lowercase invariant): $N_L\setminus Gr_U\cap (\mathbb{UC}\setminus L)$ = ∅
$N_U$
Define $\mathbb{UC}_S = \mathbb{UC}\setminus M_U\setminus N_U\setminus M_L\setminus N_L$. We may then characterize the domain and codomain of toUpperCase() and toLowerCase() as follows, where each piece has a disjoint domain and each piece except the last has a disjoint codomain:
$$
\begin{aligned}
\mathtt{toUpperCase}&: \begin{cases}
M_L'\to M_L'&\text{(Identity)}\\
M_U'\to M_U\\
M_U\to M_U&\text{(Identity)}\\
M_L\to M_L'\\
N_U\to N_U&\text{(Identity)}\\
N_L\to N_L'\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}&(*)\\
\end{cases}\\
\mathtt{toLowerCase}&: \begin{cases}
M_L'\to M_L\\
M_U'\to M_U'&\text{(Identity)}\\
M_U\to M_U'\\
M_L\to M_L&\text{(Identity)}\\
N_U\to N_U'\\
N_L\setminus Gr_U\to N_L\setminus Gr_U&\text{(Identity)}\\
Gr_U\to Gr_L\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}&(*)\\
\end{cases}
\end{aligned}
$$
There are other cases where multiple code points are mapped to a single code point, but they are not of interest here. We will discuss these multi-code-point characters soon.
We wonder whether characters in $\mathbb{UC}_S$ always stay in $\mathbb{UC}_S$. To narrow the codomain of the pieces marked with (*), we look for characters $c\in \mathbb{UC}_S$ such that $\mathtt{toUpperCase}(c)\in M_U\cup M_L\cup N_U\cup N_L$ or $\mathtt{toLowerCase}(c)\in M_U\cup M_L\cup N_U\cup N_L$.
$\{ c\in \mathbb{UC}_S\mid \mathtt{toUpperCase}(c)\in M_U\cup M_L\cup N_U\cup N_L \}$ = ∅
$\{ c\in \mathbb{UC}_S\mid \mathtt{toLowerCase}(c)\in M_U\cup M_L\cup N_U\cup N_L \}$ = Character set (1): ẞ (U+1E9E)
There is exactly one: LATIN CAPITAL LETTER SHARP S (U+1E9E). The mappings of this character are:
ẞ $\xrightarrow{\mathtt{toUpperCase}}$ ẞ (U+1E9E)
ẞ $\xrightarrow{\mathtt{toLowerCase}}$ ß (U+00DF) $\xrightarrow{\mathtt{toUpperCase}}$ SS (U+0053 U+0053)
i.e. it maps to a character in $N_L$. There's no other character that maps to U+00DF:
$\{ c\in \mathbb{UC}\mid \mathtt{toLowerCase}(c) = \text{U+00DF} \}$ = Character set (2): ß (U+00DF) ẞ (U+1E9E)
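The sharp s mappings above are easy to reproduce. The short sketch below follows the same two arrows and shows that the round trip does not return to U+1E9E; the constant names are illustrative.

```javascript
const CAPITAL_SHARP_S = "\u1E9E"; // ẞ
const SMALL_SHARP_S = "\u00DF"; // ß

// ẞ is uppercase invariant and lowercases to ß ...
const upperOfCapital = CAPITAL_SHARP_S.toUpperCase();
const lowerOfCapital = CAPITAL_SHARP_S.toLowerCase();
// ... but ß uppercases to the two-code-point string "SS", not back to ẞ.
const roundTrip = lowerOfCapital.toUpperCase();

console.log(upperOfCapital === CAPITAL_SHARP_S); // true
console.log(lowerOfCapital === SMALL_SHARP_S); // true
console.log(roundTrip); // "SS"
console.log(roundTrip.toLowerCase()); // "ss"
```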
The characters that map to $\mathtt{toLowerCase}(Gr_U)$ are exactly $Gr_L$ and $Gr_U$:
$\{ c\in \mathbb{UC}\mid \mathtt{toLowerCase}(c) \in \mathtt{toLowerCase}(Gr_U) \}$ = Character set (54): ᾀ (U+1F80) ᾁ (U+1F81) ᾂ (U+1F82) ᾃ (U+1F83) ᾄ (U+1F84) ᾅ (U+1F85) ᾆ (U+1F86) ᾇ (U+1F87) ᾈ (U+1F88) ᾉ (U+1F89) ᾊ (U+1F8A) ᾋ (U+1F8B) ᾌ (U+1F8C) ᾍ (U+1F8D) ᾎ (U+1F8E) ᾏ (U+1F8F) ᾐ (U+1F90) ᾑ (U+1F91) ᾒ (U+1F92) ᾓ (U+1F93) ᾔ (U+1F94) ᾕ (U+1F95) ᾖ (U+1F96) ᾗ (U+1F97) ᾘ (U+1F98) ᾙ (U+1F99) ᾚ (U+1F9A) ᾛ (U+1F9B) ᾜ (U+1F9C) ᾝ (U+1F9D) ᾞ (U+1F9E) ᾟ (U+1F9F) ᾠ (U+1FA0) ᾡ (U+1FA1) ᾢ (U+1FA2) ᾣ (U+1FA3) ᾤ (U+1FA4) ᾥ (U+1FA5) ᾦ (U+1FA6) ᾧ (U+1FA7) ᾨ (U+1FA8) ᾩ (U+1FA9) ᾪ (U+1FAA) ᾫ (U+1FAB) ᾬ (U+1FAC) ᾭ (U+1FAD) ᾮ (U+1FAE) ᾯ (U+1FAF) ᾳ (U+1FB3) ᾼ (U+1FBC) ῃ (U+1FC3) ῌ (U+1FCC) ῳ (U+1FF3) ῼ (U+1FFC)
Then we can refine the domain and codomain of toUpperCase() and toLowerCase() as follows, so that each piece has a disjoint domain and codomain:
$$
\begin{aligned}
\mathtt{toUpperCase}&: \begin{cases}
M_L'\to M_L'&\text{(Identity)}\\
M_U'\to M_U\\
M_U\to M_U&\text{(Identity)}\\
M_L\to M_L'\\
N_U\to N_U&\text{(Identity)}\\
N_L\to N_L'\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}_S\\
\end{cases}\\
\mathtt{toLowerCase}&: \begin{cases}
M_L'\to M_L\\
M_U'\to M_U'&\text{(Identity)}\\
M_U\to M_U'\\
M_L\to M_L&\text{(Identity)}\\
N_U\to N_U'\\
N_L\setminus Gr_U\to N_L\setminus Gr_U&\text{(Identity)}\\
Gr_U\to Gr_L\\
Gr_L\to Gr_L&\text{(Identity)}\\
I\to I&\text{(Identity)}\\
\{\text{U+1E9E}, \text{U+00DF}\}\to \{\text{U+00DF}\}\\
\mathbb{UC}_S\setminus Gr\setminus \{\text{U+1E9E}, \text{U+00DF}\}\to\mathbb{UC}_S\setminus Gr\setminus \{\text{U+1E9E}, \text{U+00DF}\}\\
\end{cases}
\end{aligned}
$$
To study the injectivity/surjectivity of case mapping, we introduce the concept of a mapping graph, a directed graph $(\mathbb{UC}_S\cup M_L\cup M_U\cup N_L\cup N_U\cup M_L'\cup M_U'\cup N_L'\cup N_U'\cup I, E_u\cup E_l)$. The vertices are all the characters we have discussed, and the edges have two colors: $(c_1, c_2)\in E_u$ iff $\mathtt{toUpperCase}(c_1) = c_2$, and $(c_1, c_2)\in E_l$ iff $\mathtt{toLowerCase}(c_1) = c_2$. Therefore, we can reformulate case-mapping variance as:
A node is case-mapping invariant iff both of its out edges are self-loops. These are the nodes in $I$.
A node is uppercase invariant iff its out $u$-edge is a self-loop.
A node is lowercase invariant iff its out $l$-edge is a self-loop.
A node is both uppercase and lowercase variant iff neither of its out edges is a self-loop. These are the 31 characters mentioned before: $La\cup Gr_U$.
We also have the following observations:
Edges can be self-referential. Each node except those in $N_L'$ and $N_U'$ has exactly one out $u$-edge and exactly one out $l$-edge. Each node in $N_L'$ has a self-referential out $u$-edge and no out $l$-edge. Each node in $N_U'$ has a self-referential out $l$-edge and no out $u$-edge.
Each node has zero or more in edges colored either $u$ or $l$.
Idempotence: if a node has a non-self-referential in $u$-edge, then its out $u$-edge is self-referential. Similarly, if a node has a non-self-referential in $l$-edge, then its out $l$-edge is self-referential.
Complementary ranges: if a node has a non-self-referential in $u$-edge, then it has no non-self-referential in $l$-edge. Similarly, if a node has a non-self-referential in $l$-edge, then it has no non-self-referential in $u$-edge.
Closedness of $I$: if a node has a non-self-referential in $u$-edge, then it has no out $l$-edge or its out $l$-edge is non-self-referential. Similarly, if a node has a non-self-referential in $l$-edge, then it has no out $u$-edge or its out $u$-edge is non-self-referential.
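The self-loop reformulation and the idempotence observation translate almost literally into code. The sketch below models a node's two out edges as the results of `toUpperCase()` and `toLowerCase()` and checks a handful of sample nodes. The function names are illustrative, and the samples are a spot check rather than a proof over all nodes.

```javascript
// Out edges of a node: the targets of its u-edge and l-edge.
const outEdges = (c) => ({ u: c.toUpperCase(), l: c.toLowerCase() });

// Classify a node by which of its out edges are self-loops.
function classify(c) {
  const { u, l } = outEdges(c);
  if (u === c && l === c) return "case-mapping invariant";
  if (u === c) return "uppercase invariant";
  if (l === c) return "lowercase invariant";
  return "both variant";
}

// Idempotence: the target of a u-edge has a u-self-loop, likewise for l.
function isIdempotentAt(c) {
  const { u, l } = outEdges(c);
  return u.toUpperCase() === u && l.toLowerCase() === l;
}

const samples = ["7", "A", "a", "\u01C5", "\u00DF", "\u03F4"]; // 7, A, a, ǅ, ß, ϴ
for (const c of samples) console.log(c, classify(c), isIdempotentAt(c));
```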
We already established that:
Nodes in $I$ have no in edges from other nodes.
Nodes in $La\cup Gr_U$ have no in edges from other nodes.
Therefore, ignoring these nodes, the graph is inherently bipartite:
Each uppercase invariant node points to a lowercase invariant node with an $l$-edge and points to itself with a $u$-edge. It has zero or more in $u$-edges and no in $l$-edge.
Each lowercase invariant node points to an uppercase invariant node with a $u$-edge and points to itself with an $l$-edge. It has zero or more in $l$-edges and no in $u$-edge.
The connected subgraph formed by $c$, called a cluster, is defined recursively:
$c$ is in the cluster.
If $c'$ is in the cluster, then all nodes that point to $c'$ and all nodes that $c'$ points to are in the cluster.
Nodes in the cluster formed by some $c\in N_L'\cup N_U'$ do not always form cycles through a series of toUpperCase/toLowerCase transformations, because they eventually map to a node in $N_L'\cup N_U'$, which has no out $l$-edge or no out $u$-edge.
Character set (51 ) SS (U+0053 U+0053), ß (U+00DF), ẞ (U+1E9E) ʼN (U+02BC U+004E), ʼn (U+0149) ԵՒ (U+0535 U+0552), և (U+0587) Aʾ (U+0041 U+02BE), ẚ (U+1E9A) ἈΙ (U+1F08 U+0399), ᾀ (U+1F80), ᾈ (U+1F88) ἉΙ (U+1F09 U+0399), ᾁ (U+1F81), ᾉ (U+1F89) ἊΙ (U+1F0A U+0399), ᾂ (U+1F82), ᾊ (U+1F8A) ἋΙ (U+1F0B U+0399), ᾃ (U+1F83), ᾋ (U+1F8B) ἌΙ (U+1F0C U+0399), ᾄ (U+1F84), ᾌ (U+1F8C) ἍΙ (U+1F0D U+0399), ᾅ (U+1F85), ᾍ (U+1F8D) ἎΙ (U+1F0E U+0399), ᾆ (U+1F86), ᾎ (U+1F8E) ἏΙ (U+1F0F U+0399), ᾇ (U+1F87), ᾏ (U+1F8F) ἨΙ (U+1F28 U+0399), ᾐ (U+1F90), ᾘ (U+1F98) ἩΙ (U+1F29 U+0399), ᾑ (U+1F91), ᾙ (U+1F99) ἪΙ (U+1F2A U+0399), ᾒ (U+1F92), ᾚ (U+1F9A) ἫΙ (U+1F2B U+0399), ᾓ (U+1F93), ᾛ (U+1F9B) ἬΙ (U+1F2C U+0399), ᾔ (U+1F94), ᾜ (U+1F9C) ἭΙ (U+1F2D U+0399), ᾕ (U+1F95), ᾝ (U+1F9D) ἮΙ (U+1F2E U+0399), ᾖ (U+1F96), ᾞ (U+1F9E) ἯΙ (U+1F2F U+0399), ᾗ (U+1F97), ᾟ (U+1F9F) ὨΙ (U+1F68 U+0399), ᾠ (U+1FA0), ᾨ (U+1FA8) ὩΙ (U+1F69 U+0399), ᾡ (U+1FA1), ᾩ (U+1FA9) ὪΙ (U+1F6A U+0399), ᾢ (U+1FA2), ᾪ (U+1FAA) ὫΙ (U+1F6B U+0399), ᾣ (U+1FA3), ᾫ (U+1FAB) ὬΙ (U+1F6C U+0399), ᾤ (U+1FA4), ᾬ (U+1FAC) ὭΙ (U+1F6D U+0399), ᾥ (U+1FA5), ᾭ (U+1FAD) ὮΙ (U+1F6E U+0399), ᾦ (U+1FA6), ᾮ (U+1FAE) ὯΙ (U+1F6F U+0399), ᾧ (U+1FA7), ᾯ (U+1FAF) ᾺΙ (U+1FBA U+0399), ᾲ (U+1FB2) ΑΙ (U+0391 U+0399), ᾳ (U+1FB3), ᾼ (U+1FBC) ΆΙ (U+0386 U+0399), ᾴ (U+1FB4) Α͂Ι (U+0391 U+0342 U+0399), ᾷ (U+1FB7) ῊΙ (U+1FCA U+0399), ῂ (U+1FC2) ΗΙ (U+0397 U+0399), ῃ (U+1FC3), ῌ (U+1FCC) ΉΙ (U+0389 U+0399), ῄ (U+1FC4) Η͂Ι (U+0397 U+0342 U+0399), ῇ (U+1FC7) ῺΙ (U+1FFA U+0399), ῲ (U+1FF2) ΩΙ (U+03A9 U+0399), ῳ (U+1FF3), ῼ (U+1FFC) ΏΙ (U+038F U+0399), ῴ (U+1FF4) Ω͂Ι (U+03A9 U+0342 U+0399), ῷ (U+1FF7) FF (U+0046 U+0046), ff (U+FB00) FI (U+0046 U+0049), fi (U+FB01) FL (U+0046 U+004C), fl (U+FB02) FFI (U+0046 U+0046 U+0049), ffi (U+FB03) FFL (U+0046 U+0046 U+004C), ffl (U+FB04) ST (U+0053 U+0054), ſt (U+FB05), st (U+FB06) ՄՆ (U+0544 U+0546), ﬓ (U+FB13) ՄԵ (U+0544 U+0535), ﬔ (U+FB14) ՄԻ (U+0544 U+053B), ﬕ (U+FB15) ՎՆ (U+054E U+0546), ﬖ (U+FB16) ՄԽ (U+0544 U+053D), ﬗ (U+FB17)
Define a mapping pair as a cluster of size 2. A mapping pair $(c_u, c_l)$ ($c_u\neq c_l$) satisfies:
$(c_u, c_l)\in E_l$ (the $l$-edge of $c_u$ points to $c_l$)
$(c_l, c_u)\in E_u$ (the $u$-edge of $c_l$ points to $c_u$)
$\nexists c\notin\{c_u, c_l\}$ such that $(c, c_u)\in E_u$ or $(c, c_l)\in E_l$ (no other node points to $c_u$ or $c_l$)
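Clusters and mapping pairs can be computed with an undirected adjacency map over all non-self-loop edges. The following is a sketch under stated assumptions: it scans raw code points with no filtering or normalization, keeps multi-code-point results as nodes without following their own out edges, and uses illustrative names (`neighbors`, `cluster`, `isMappingPair`). Counts obtained this way depend on the engine's Unicode version and need not match the numbers reported here.

```javascript
// Adjacency over non-self-loop edges, ignoring direction and color.
const neighbors = new Map(); // node -> Set of adjacent nodes
const link = (a, b) => {
  if (!neighbors.has(a)) neighbors.set(a, new Set());
  if (!neighbors.has(b)) neighbors.set(b, new Set());
  neighbors.get(a).add(b);
  neighbors.get(b).add(a);
};
for (let cp = 0; cp <= 0x10ffff; cp++) {
  if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip surrogate code points
  const c = String.fromCodePoint(cp);
  const up = c.toUpperCase();
  const low = c.toLowerCase();
  if (up !== c) link(c, up);
  if (low !== c) link(c, low);
}

// Cluster of c: every node reachable when edges are followed in both directions.
function cluster(c) {
  const seen = new Set([c]);
  const pending = [c];
  while (pending.length > 0) {
    const node = pending.pop();
    for (const next of neighbors.get(node) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        pending.push(next);
      }
    }
  }
  return [...seen].sort();
}

// Mapping pair: the two nodes point at each other and nothing else is attached.
const isMappingPair = (cu, cl) =>
  cu !== cl &&
  cu.toLowerCase() === cl &&
  cl.toUpperCase() === cu &&
  cluster(cu).length === 2;

console.log(cluster("s"), cluster("\u00DF"));
console.log(isMappingPair("A", "a"), isMappingPair("S", "s"));
```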
There are 1386 such pairs. Of these, 1322 are pairs of Uppercase_Letter and Lowercase_Letter, and the rest are:
Character set (64 ) İ (U+0130) — i̇ (U+0069 U+0307) ǰ (U+01F0) — J̌ (U+004A U+030C) ΐ (U+0390) — Ϊ́ (U+03AA U+0301) ΰ (U+03B0) — Ϋ́ (U+03AB U+0301) ẖ (U+1E96) — H̱ (U+0048 U+0331) ẗ (U+1E97) — T̈ (U+0054 U+0308) ẘ (U+1E98) — W̊ (U+0057 U+030A) ẙ (U+1E99) — Y̊ (U+0059 U+030A) ὐ (U+1F50) — Υ̓ (U+03A5 U+0313) ὒ (U+1F52) — Υ̓̀ (U+03A5 U+0313 U+0300) ὔ (U+1F54) — Υ̓́ (U+03A5 U+0313 U+0301) ὖ (U+1F56) — Υ̓͂ (U+03A5 U+0313 U+0342) ᾶ (U+1FB6) — Α͂ (U+0391 U+0342) ῆ (U+1FC6) — Η͂ (U+0397 U+0342) ῒ (U+1FD2) — Ϊ̀ (U+03AA U+0300) ῖ (U+1FD6) — Ι͂ (U+0399 U+0342) ῗ (U+1FD7) — Ϊ͂ (U+03AA U+0342) ῢ (U+1FE2) — Ϋ̀ (U+03AB U+0300) ῤ (U+1FE4) — Ρ̓ (U+03A1 U+0313) ῦ (U+1FE6) — Υ͂ (U+03A5 U+0342) ῧ (U+1FE7) — Ϋ͂ (U+03AB U+0342) ῶ (U+1FF6) — Ω͂ (U+03A9 U+0342) Ⅰ (U+2160) — ⅰ (U+2170) Ⅱ (U+2161) — ⅱ (U+2171) Ⅲ (U+2162) — ⅲ (U+2172) Ⅳ (U+2163) — ⅳ (U+2173) Ⅴ (U+2164) — ⅴ (U+2174) Ⅵ (U+2165) — ⅵ (U+2175) Ⅶ (U+2166) — ⅶ (U+2176) Ⅷ (U+2167) — ⅷ (U+2177) Ⅸ (U+2168) — ⅸ (U+2178) Ⅹ (U+2169) — ⅹ (U+2179) Ⅺ (U+216A) — ⅺ (U+217A) Ⅻ (U+216B) — ⅻ (U+217B) Ⅼ (U+216C) — ⅼ (U+217C) Ⅽ (U+216D) — ⅽ (U+217D) Ⅾ (U+216E) — ⅾ (U+217E) Ⅿ (U+216F) — ⅿ (U+217F) Ⓐ (U+24B6) — ⓐ (U+24D0) Ⓑ (U+24B7) — ⓑ (U+24D1) Ⓒ (U+24B8) — ⓒ (U+24D2) Ⓓ (U+24B9) — ⓓ (U+24D3) Ⓔ (U+24BA) — ⓔ (U+24D4) Ⓕ (U+24BB) — ⓕ (U+24D5) Ⓖ (U+24BC) — ⓖ (U+24D6) Ⓗ (U+24BD) — ⓗ (U+24D7) Ⓘ (U+24BE) — ⓘ (U+24D8) Ⓙ (U+24BF) — ⓙ (U+24D9) Ⓚ (U+24C0) — ⓚ (U+24DA) Ⓛ (U+24C1) — ⓛ (U+24DB) Ⓜ (U+24C2) — ⓜ (U+24DC) Ⓝ (U+24C3) — ⓝ (U+24DD) Ⓞ (U+24C4) — ⓞ (U+24DE) Ⓟ (U+24C5) — ⓟ (U+24DF) Ⓠ (U+24C6) — ⓠ (U+24E0) Ⓡ (U+24C7) — ⓡ (U+24E1) Ⓢ (U+24C8) — ⓢ (U+24E2) Ⓣ (U+24C9) — ⓣ (U+24E3) Ⓤ (U+24CA) — ⓤ (U+24E4) Ⓥ (U+24CB) — ⓥ (U+24E5) Ⓦ (U+24CC) — ⓦ (U+24E6) Ⓧ (U+24CD) — ⓧ (U+24E7) Ⓨ (U+24CE) — ⓨ (U+24E8) Ⓩ (U+24CF) — ⓩ (U+24E9)
These are roman numerals $(Ro_U, Ro_L)$, circled letters $(Ci_U, Ci_L)$, and $(M_L', M_L)$, $(M_U, M_U')$.
Now, the remaining nodes are the complex cycles that neither have dead ends nor are simple mapping pairs. They are:
Character set (25 ) I (U+0049), i (U+0069), ı (U+0131) S (U+0053), s (U+0073), ſ (U+017F) µ (U+00B5), Μ (U+039C), μ (U+03BC) DŽ (U+01C4), Dž (U+01C5), dž (U+01C6) LJ (U+01C7), Lj (U+01C8), lj (U+01C9) NJ (U+01CA), Nj (U+01CB), nj (U+01CC) DZ (U+01F1), Dz (U+01F2), dz (U+01F3) ͅ (U+0345), Ι (U+0399), ι (U+03B9) Β (U+0392), β (U+03B2), ϐ (U+03D0) Ε (U+0395), ε (U+03B5), ϵ (U+03F5) Θ (U+0398), θ (U+03B8), ϴ (U+03F4), ϑ (U+03D1) Κ (U+039A), κ (U+03BA), ϰ (U+03F0) Π (U+03A0), π (U+03C0), ϖ (U+03D6) Ρ (U+03A1), ρ (U+03C1), ϱ (U+03F1) Σ (U+03A3), ς (U+03C2), σ (U+03C3) Φ (U+03A6), φ (U+03C6), ϕ (U+03D5) В (U+0412), в (U+0432), ᲀ (U+1C80) Д (U+0414), д (U+0434), ᲁ (U+1C81) О (U+041E), о (U+043E), ᲂ (U+1C82) С (U+0421), с (U+0441), ᲃ (U+1C83) Т (U+0422), т (U+0442), ᲄ (U+1C84), ᲅ (U+1C85) Ъ (U+042A), ъ (U+044A), ᲆ (U+1C86) Ѣ (U+0462), ѣ (U+0463), ᲇ (U+1C87) ᲈ (U+1C88), Ꙋ (U+A64A), ꙋ (U+A64B) Ṡ (U+1E60), ṡ (U+1E61), ẛ (U+1E9B)
Below are all characters that are the uppercase form of multiple characters.
Character set (53 ) I (U+0049): i (U+0069), ı (U+0131) S (U+0053): s (U+0073), ſ (U+017F) DŽ (U+01C4): Dž (U+01C5), dž (U+01C6) LJ (U+01C7): Lj (U+01C8), lj (U+01C9) NJ (U+01CA): Nj (U+01CB), nj (U+01CC) DZ (U+01F1): Dz (U+01F2), dz (U+01F3) Β (U+0392): β (U+03B2), ϐ (U+03D0) Ε (U+0395): ε (U+03B5), ϵ (U+03F5) Θ (U+0398): θ (U+03B8), ϑ (U+03D1) Ι (U+0399): ͅ (U+0345), ι (U+03B9) Κ (U+039A): κ (U+03BA), ϰ (U+03F0) Μ (U+039C): µ (U+00B5), μ (U+03BC) Π (U+03A0): π (U+03C0), ϖ (U+03D6) Ρ (U+03A1): ρ (U+03C1), ϱ (U+03F1) Σ (U+03A3): ς (U+03C2), σ (U+03C3) Φ (U+03A6): φ (U+03C6), ϕ (U+03D5) В (U+0412): в (U+0432), ᲀ (U+1C80) Д (U+0414): д (U+0434), ᲁ (U+1C81) О (U+041E): о (U+043E), ᲂ (U+1C82) С (U+0421): с (U+0441), ᲃ (U+1C83) Т (U+0422): т (U+0442), ᲄ (U+1C84), ᲅ (U+1C85) Ъ (U+042A): ъ (U+044A), ᲆ (U+1C86) Ѣ (U+0462): ѣ (U+0463), ᲇ (U+1C87) Ṡ (U+1E60): ṡ (U+1E61), ẛ (U+1E9B) Ꙋ (U+A64A): ᲈ (U+1C88), ꙋ (U+A64B) ἈΙ (U+1F08 U+0399): ᾀ (U+1F80), ᾈ (U+1F88) ἉΙ (U+1F09 U+0399): ᾁ (U+1F81), ᾉ (U+1F89) ἊΙ (U+1F0A U+0399): ᾂ (U+1F82), ᾊ (U+1F8A) ἋΙ (U+1F0B U+0399): ᾃ (U+1F83), ᾋ (U+1F8B) ἌΙ (U+1F0C U+0399): ᾄ (U+1F84), ᾌ (U+1F8C) ἍΙ (U+1F0D U+0399): ᾅ (U+1F85), ᾍ (U+1F8D) ἎΙ (U+1F0E U+0399): ᾆ (U+1F86), ᾎ (U+1F8E) ἏΙ (U+1F0F U+0399): ᾇ (U+1F87), ᾏ (U+1F8F) ἨΙ (U+1F28 U+0399): ᾐ (U+1F90), ᾘ (U+1F98) ἩΙ (U+1F29 U+0399): ᾑ (U+1F91), ᾙ (U+1F99) ἪΙ (U+1F2A U+0399): ᾒ (U+1F92), ᾚ (U+1F9A) ἫΙ (U+1F2B U+0399): ᾓ (U+1F93), ᾛ (U+1F9B) ἬΙ (U+1F2C U+0399): ᾔ (U+1F94), ᾜ (U+1F9C) ἭΙ (U+1F2D U+0399): ᾕ (U+1F95), ᾝ (U+1F9D) ἮΙ (U+1F2E U+0399): ᾖ (U+1F96), ᾞ (U+1F9E) ἯΙ (U+1F2F U+0399): ᾗ (U+1F97), ᾟ (U+1F9F) ὨΙ (U+1F68 U+0399): ᾠ (U+1FA0), ᾨ (U+1FA8) ὩΙ (U+1F69 U+0399): ᾡ (U+1FA1), ᾩ (U+1FA9) ὪΙ (U+1F6A U+0399): ᾢ (U+1FA2), ᾪ (U+1FAA) ὫΙ (U+1F6B U+0399): ᾣ (U+1FA3), ᾫ (U+1FAB) ὬΙ (U+1F6C U+0399): ᾤ (U+1FA4), ᾬ (U+1FAC) ὭΙ (U+1F6D U+0399): ᾥ (U+1FA5), ᾭ (U+1FAD) ὮΙ (U+1F6E U+0399): ᾦ (U+1FA6), ᾮ (U+1FAE) ὯΙ (U+1F6F U+0399): ᾧ (U+1FA7), ᾯ (U+1FAF) ΑΙ (U+0391 U+0399): ᾳ (U+1FB3), ᾼ (U+1FBC) ΗΙ 
(U+0397 U+0399): ῃ (U+1FC3), ῌ (U+1FCC) ΩΙ (U+03A9 U+0399): ῳ (U+1FF3), ῼ (U+1FFC) ST (U+0053 U+0054): ſt (U+FB05), st (U+FB06)
Below are all characters that are the lowercase form of multiple characters.
Character set (5 ) dž (U+01C6): DŽ (U+01C4), Dž (U+01C5) lj (U+01C9): LJ (U+01C7), Lj (U+01C8) nj (U+01CC): NJ (U+01CA), Nj (U+01CB) dz (U+01F3): DZ (U+01F1), Dz (U+01F2) θ (U+03B8): Θ (U+0398), ϴ (U+03F4)
A case-mapping chain is a sequence of distinct nodes $(c_1, c_2, \dots, c_n)$ such that $(c_i, c_{i+1})\in E_u\cup E_l$. Invariant nodes in $I$ have case-mapping chains of length 1 (only the node itself). Simple mapping pairs have case-mapping chains of length 2 (the two nodes). The longest case-mapping chain has length 3, and there are many of them:
Character set (28 ) µ (U+00B5) → Μ (U+039C) → μ (U+03BC) ı (U+0131) → I (U+0049) → i (U+0069) ſ (U+017F) → S (U+0053) → s (U+0073) Dž (U+01C5) → DŽ (U+01C4) → dž (U+01C6) Lj (U+01C8) → LJ (U+01C7) → lj (U+01C9) Nj (U+01CB) → NJ (U+01CA) → nj (U+01CC) Dz (U+01F2) → DZ (U+01F1) → dz (U+01F3) ͅ (U+0345) → Ι (U+0399) → ι (U+03B9) ς (U+03C2) → Σ (U+03A3) → σ (U+03C3) ϐ (U+03D0) → Β (U+0392) → β (U+03B2) ϑ (U+03D1) → Θ (U+0398) → θ (U+03B8) ϕ (U+03D5) → Φ (U+03A6) → φ (U+03C6) ϖ (U+03D6) → Π (U+03A0) → π (U+03C0) ϰ (U+03F0) → Κ (U+039A) → κ (U+03BA) ϱ (U+03F1) → Ρ (U+03A1) → ρ (U+03C1) ϴ (U+03F4) → θ (U+03B8) → Θ (U+0398) ϵ (U+03F5) → Ε (U+0395) → ε (U+03B5) ᲀ (U+1C80) → В (U+0412) → в (U+0432) ᲁ (U+1C81) → Д (U+0414) → д (U+0434) ᲂ (U+1C82) → О (U+041E) → о (U+043E) ᲃ (U+1C83) → С (U+0421) → с (U+0441) ᲄ (U+1C84) → Т (U+0422) → т (U+0442) ᲅ (U+1C85) → Т (U+0422) → т (U+0442) ᲆ (U+1C86) → Ъ (U+042A) → ъ (U+044A) ᲇ (U+1C87) → Ѣ (U+0462) → ѣ (U+0463) ᲈ (U+1C88) → Ꙋ (U+A64A) → ꙋ (U+A64B) ẛ (U+1E9B) → Ṡ (U+1E60) → ṡ (U+1E61) ẞ (U+1E9E) → ß (U+00DF) → SS (U+0053 U+0053)
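A chain starting at a given node can be found with a small depth-first search over the two out edges. The sketch below treats multi-code-point nodes as dead ends, which is an assumption that mirrors the treatment of $N_L'$ and $N_U'$ above; `longestChain` is an illustrative name.

```javascript
const isSingleCodePoint = (s) => [...s].length === 1;

// Longest sequence of distinct nodes starting at `start`, following u- and
// l-edges. Multi-code-point nodes are kept in the chain but not expanded.
function longestChain(start, visited = [start]) {
  let best = visited;
  if (!isSingleCodePoint(start)) return best;
  for (const next of [start.toUpperCase(), start.toLowerCase()]) {
    if (visited.includes(next)) continue; // nodes in a chain must be distinct
    const candidate = longestChain(next, [...visited, next]);
    if (candidate.length > best.length) best = candidate;
  }
  return best;
}

console.log(longestChain("7")); // invariant node
console.log(longestChain("A")); // simple mapping pair
console.log(longestChain("\u017F")); // ſ
console.log(longestChain("\u1E9E")); // ẞ
```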