Case mapping

There are three sets of characters: upper case, lower case, and neither. toUpperCase() and toLowerCase() provide mappings between them. Let's define the following sets:

Also define the following predicates:

Thus define the following sets:

Define the following terminologies:

// Lowercase a string (NFC-normalized), or every element of an iterable.
export function toLowerCase(char: string): string;
export function toLowerCase(char: Iterable<string>): Set<string>;
export function toLowerCase(
  char: string | Iterable<string>,
): string | Set<string> {
  if (typeof char === "string") return char.toLowerCase().normalize("NFC");
  // Wrap the call so each element goes through the string overload and the
  // index argument that map() passes is dropped.
  return new Set(Iterator.from(char).map((c) => toLowerCase(c)));
}

// Uppercase a string (NFC-normalized), or every element of an iterable.
export function toUpperCase(char: string): string;
export function toUpperCase(char: Iterable<string>): Set<string>;
export function toUpperCase(
  char: string | Iterable<string>,
): string | Set<string> {
  if (typeof char === "string") return char.toUpperCase().normalize("NFC");
  return new Set(Iterator.from(char).map((c) => toUpperCase(c)));
}

// True if the string consists of exactly one code point.
export function isChar(char: string): boolean {
  return [...char].length === 1;
}

export function isUpperCase(char: string): boolean {
  return /\p{Uppercase_Letter}/u.test(char);
}

export function isLowerCase(char: string): boolean {
  return /\p{Lowercase_Letter}/u.test(char);
}

export const UC = new Set<string>();
export const U = new Set<string>();
export const L = new Set<string>();
export const Uext = new Set<string>();
export const Lext = new Set<string>();
export const ML = new Set<string>();
export const MU = new Set<string>();
export const ML_ = new Set<string>();
export const MU_ = new Set<string>();
export const NL = new Set<string>();
export const NU = new Set<string>();
export const NL_ = new Set<string>();
export const NU_ = new Set<string>();

// True if the string is unchanged by both case mappings.
export function isI(c: string): boolean {
  return toUpperCase(c) === c && toLowerCase(c) === c;
}

// Scan every code point, U+0000 through U+10FFFF inclusive.
for (let c = 0; c <= 0x10ffff; c++) {
  const char = String.fromCodePoint(c).normalize("NFC");
  if (isUpperCase(char)) Uext.add(char);
  if (isLowerCase(char)) Lext.add(char);
  // Skip multi-code-point normalizations and case-mapping invariant characters.
  if (!isChar(char) || isI(char)) continue;
  UC.add(char);
  const upper = toUpperCase(char);
  const lower = toLowerCase(char);
  if (isI(upper))
    console.error(`Invariant broken: toUpperCase(${char}) = ${upper} ∈ I`);
  else if (isI(lower))
    console.error(`Invariant broken: toLowerCase(${char}) = ${lower} ∈ I`);
  if (!isChar(upper)) {
    // The uppercase form has several code points: ML if lowercasing it gives
    // a single code point again, NL otherwise.
    if (isChar(toLowerCase(upper))) {
      ML.add(char);
      ML_.add(upper);
    } else {
      NL.add(char);
      NL_.add(upper);
    }
  }
  if (!isChar(lower)) {
    // Same split for a multi-code-point lowercase form.
    if (isChar(toUpperCase(lower))) {
      MU.add(char);
      MU_.add(lower);
    } else {
      NU.add(char);
      NU_.add(lower);
    }
  }
  if (isUpperCase(char)) U.add(char);
  if (isLowerCase(char)) L.add(char);
}

// Members of NL that are also changed by toLowerCase().
export const GrU = new Set(NL.values().filter((c) => toLowerCase(c) !== c));

// Ranges of the two mappings over UC.
export const RU = toUpperCase(UC);
export const RL = toLowerCase(UC);

NOTE: To maximize the number of single-code-point characters in discussion, we normalize the output with .normalize("NFC").
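As a small illustration of what the normalization changes, the sketch below compares raw and NFC-normalized outputs for two characters. The helper name `codePoints` is an assumption made for this example only and is not part of the code above.

```typescript
// Hypothetical helper: render a string as a list of U+XXXX code points.
function codePoints(s: string): string {
  return Array.from(s)
    .map((ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

// U+0390 uppercases to three code points; NFC composes the first two.
const rawUpper = "\u0390".toUpperCase();
const nfcUpper = rawUpper.normalize("NFC");
console.log(codePoints(rawUpper)); // U+0399 U+0308 U+0301
console.log(codePoints(nfcUpper)); // U+03AA U+0301

// Round trip through U+0130: lowercasing gives two code points, and
// uppercasing those only becomes a single code point again after NFC.
const lowered = "\u0130".toLowerCase();
const roundTrip = lowered.toUpperCase();
console.log(codePoints(lowered)); // U+0069 U+0307
console.log(codePoints(roundTrip)); // U+0049 U+0307
console.log(codePoints(roundTrip.normalize("NFC"))); // U+0130
```

Without the normalization step, the round trip from U+0130 would not be recognized as returning to a single code point.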

Case-mapping properties

Idempotence

The first invariant we want to establish is toUpperCase(toUpperCase(c)) == toUpperCase(c) and toLowerCase(toLowerCase(c)) == toLowerCase(c) for all $c\in\mathbb{UC}$.

This also means that if $c$ is uppercase variant, then $c$ will not be the output of toUpperCase(); similarly, if $c$ is lowercase variant, then $c$ will not be the output of toLowerCase().
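The idempotence check can be sketched as a scan over a range of code points. This is a minimal sketch, not the author's verification code; the names `upperNFC`, `lowerNFC`, `isIdempotentAt`, and `findViolations` are assumptions introduced here.

```typescript
// NFC-normalized case mappings, mirroring the helpers defined earlier.
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");

// True when applying each mapping twice equals applying it once.
function isIdempotentAt(c: string): boolean {
  const u = upperNFC(c);
  const l = lowerNFC(c);
  return upperNFC(u) === u && lowerNFC(l) === l;
}

// Collect every code point in [from, to] that violates idempotence.
function findViolations(from = 0, to = 0x10ffff): string[] {
  const violations: string[] = [];
  for (let cp = from; cp <= to; cp++) {
    const c = String.fromCodePoint(cp).normalize("NFC");
    if (!isIdempotentAt(c)) violations.push(c);
  }
  return violations;
}

console.log(isIdempotentAt("\u00DF")); // true: "ß" -> "SS" -> "SS"
console.log(findViolations(0x0000, 0x024f).length); // 0 for the Latin blocks
```

Calling `findViolations()` without arguments scans all of Unicode; the result then depends on the Unicode version the runtime ships with.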

Complementary ranges

The ranges of toUpperCase() and toLowerCase() are disjoint:

But they do not partition $\mathbb{UC}$:

27 of these characters are in $Gr_U$. The other 4 are:

These characters cannot be produced by toUpperCase() or toLowerCase() with any input, including themselves.
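A rough way to reproduce this kind of check is to build both ranges for a span of code points and look at what falls in both, or in neither. The function name `ranges` and the choice of the Latin blocks as the scanned span are assumptions for this sketch.

```typescript
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");

// Build the ranges of both mappings over the case-mapping variant
// single code points in [from, to].
function ranges(from: number, to: number): { RU: Set<string>; RL: Set<string> } {
  const RU = new Set<string>();
  const RL = new Set<string>();
  for (let cp = from; cp <= to; cp++) {
    const c = String.fromCodePoint(cp).normalize("NFC");
    if (Array.from(c).length !== 1) continue;
    const u = upperNFC(c);
    const l = lowerNFC(c);
    if (u === c && l === c) continue; // case-mapping invariant, not in UC
    RU.add(u);
    RL.add(l);
  }
  return { RU, RL };
}

const { RU, RL } = ranges(0x0000, 0x024f);
const overlap = Array.from(RU).filter((s) => RL.has(s));
console.log(overlap.length); // 0: the two ranges are disjoint here
// U+01C5 is both uppercase variant and lowercase variant, so it is in neither range.
console.log(RU.has("\u01C5") || RL.has("\u01C5")); // false
```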

Relationships between case-mapping variance and case

Does upper(lower) case imply upper(lower)case invariance?

Yes. toUpperCase() and toLowerCase() are identity functions on $U_{ext}$ and $L_{ext}$, respectively.

This means upper case implies uppercase invariance, and lower case implies lowercase invariance.

Does upper(lower) case imply lower(upper)case variance?

No. $U_{ext}$ and $L_{ext}$ are not subsets of $\mathbb{UC}$:

This means there are cased letters that are case-mapping invariant. Always-upper characters include:

Always-lower characters include:

To make our discussions more meaningful, we will limit our future discussions to $U$ and $L$ instead of $U_{ext}$ and $L_{ext}$, so that all sets in question are subsets of $\mathbb{UC}$. Upper(Lower) case letters that are case-mapping variant must be lower(upper)case variant, because we already showed that they are upper(lower)case invariant.

Does case-mapping variance imply casedness?

No. $U_{ext}$ and $L_{ext}$ do not partition $\mathbb{UC}$: there are characters that are uncased, but are case-mapping variant:

They include $Gr_U$, $La$, and also:

Are uppercase variance and lowercase variance mutually exclusive?

No. There are characters that are both uppercase variant and lowercase variant:

In addition, as shown before, these are also characters that cannot be produced by toUpperCase() or toLowerCase() with any input, including themselves.

Is Lower(Upper)Case_Letter always mapped to Upper(Lower)Case_Letter by toUpper(Lower)Case?

We already mentioned that certain upper-/lower-case letters are mapping invariant. Furthermore, there are plenty of characters in $M_L\cup M_U\cup N_L\cup N_U$ that are cased. Dropping those, the answer is yes. If the input is a Lowercase_Letter, the output of toUpperCase() is always an Uppercase_Letter. If the input is an Uppercase_Letter, the output of toLowerCase() is always a Lowercase_Letter.

Does toUpper(Lower)Case always produce Upper(Lower)case_Letter? Can it produce Lower(Upper)case_Letter?

(Again, disregarding multi-code-point characters) No and no (but yes, if you count case-mapping invariant but cased characters). $U$ and $L$ are proper subsets of $\mathcal{R}_U$ and $\mathcal{R}_L$, respectively:

Uncased letters produced by to{Upper,Lower}Case are the same sets as discussed before: those characters that are uncased but case-mapping variant. To produce an uncased output, the input must be uncased too.

On the other hand, $L$ and $\mathcal{R}_U$ are disjoint, and so are $U$ and $\mathcal{R}_L$:

So toUpperCase() never produces a Lowercase_Letter, and toLowerCase() never produces an Uppercase_Letter.

Can toUpper(Lower)Case produce Upper(Lower)case_Letter from uncased characters?

Yes. Uncased letters may become cased after case mapping:

These characters are also the characters that are uncased but case-mapping variant.

To summarize:

Input and output cases of toUpperCase when the input is...

| Input case \ Output case | Upper case | Lower case | Uncased |
| --- | --- | --- | --- |
| Upper case | $U_{ext}$ (identity) | Never | Never |
| Lower case | $L$ | $L_{ext}\setminus L$ (identity) | Never |
| Uncased | $La$, U+0345 | Never | $Ro_L$, $Ci_L$, other identities |

Input and output cases of toLowerCase when the input is...

| Input case \ Output case | Upper case | Lower case | Uncased |
| --- | --- | --- | --- |
| Upper case | $U_{ext}\setminus U$ (identity) | $U$ | Never |
| Lower case | Never | $L_{ext}$ (identity) | Never |
| Uncased | Never | $La$, $Gr_U$ | $Ro_U$, $Ci_U$, other identities |
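Individual cells of these two tables can be spot-checked with a small classifier. This is only a sketch: `caseOf` and `transition` are hypothetical names, and "uncased" here means neither Uppercase_Letter nor Lowercase_Letter, following the predicates defined at the top.

```typescript
type Case = "upper" | "lower" | "uncased";

// Classify a string by general category, using the same property escapes
// as isUpperCase() and isLowerCase().
function caseOf(s: string): Case {
  if (/\p{Uppercase_Letter}/u.test(s)) return "upper";
  if (/\p{Lowercase_Letter}/u.test(s)) return "lower";
  return "uncased";
}

// Return the [input case, output case] pair for one mapping.
function transition(c: string, mapping: "upper" | "lower"): [Case, Case] {
  const raw = mapping === "upper" ? c.toUpperCase() : c.toLowerCase();
  return [caseOf(c), caseOf(raw.normalize("NFC"))];
}

console.log(transition("a", "upper")); // ["lower", "upper"]
console.log(transition("\u01C5", "upper")); // ["uncased", "upper"] (titlecase digraph)
console.log(transition("\u0345", "upper")); // ["uncased", "upper"]
console.log(transition("\u2170", "upper")); // ["uncased", "uncased"] (Roman numeral)
console.log(transition("\u1F88", "lower")); // ["uncased", "lower"] (Greek titlecase)
```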

Properties of characters that map to multiple characters

We now focus on these particular subsets:

Define $\mathbb{UC}_S = \mathbb{UC}\setminus M_U\setminus N_U\setminus M_L\setminus N_L$. So, we may characterize the domain and codomain of toUpperCase() and toLowerCase() as follows, where each piece has a disjoint domain and each except the last has a disjoint codomain:

$$
\begin{aligned}
\mathtt{toUpperCase}&: \begin{cases}
M_L'\to M_L'&\text{(Identity)}\\
M_U'\to M_U\\
M_U\to M_U&\text{(Identity)}\\
M_L\to M_L'\\
N_U\to N_U&\text{(Identity)}\\
N_L\to N_L'\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}&(*)\\
\end{cases}\\
\mathtt{toLowerCase}&: \begin{cases}
M_L'\to M_L\\
M_U'\to M_U'&\text{(Identity)}\\
M_U\to M_U'\\
M_L\to M_L&\text{(Identity)}\\
N_U\to N_U'\\
N_L\setminus Gr_U\to N_L\setminus Gr_U&\text{(Identity)}\\
Gr_U\to Gr_L\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}&(*)\\
\end{cases}
\end{aligned}
$$

There are other cases where multiple code points can be mapped to single code points, but they are not of interest here. We will discuss these multi-code-point characters soon.

We wonder if characters in $\mathbb{UC}_S$ always stay in $\mathbb{UC}_S$. In order to narrow the codomain of the pieces marked with (*), we want to find if there are characters $c\in \mathbb{UC}_S$ such that $\mathtt{toUpperCase}(c)\in M_U\cup M_L\cup N_U\cup N_L$, or $\mathtt{toLowerCase}(c)\in M_U\cup M_L\cup N_U\cup N_L$.

There is exactly one: LATIN CAPITAL LETTER SHARP S (U+1E9E). The mappings of this character are:

i.e. it maps to a character in $N_L$. There's no other character that maps to U+00DF:
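One way to check this is to scan every code point for preimages of U+00DF under either mapping. The sketch below assumes the same NFC-normalized mappings as before; `preimagesOf` is an illustrative name.

```typescript
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");

// Every other single code point that either mapping sends to `target`.
function preimagesOf(target: string): string[] {
  const found = new Set<string>();
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    const c = String.fromCodePoint(cp).normalize("NFC");
    if (c === target) continue; // skip the self-loop
    if (upperNFC(c) === target || lowerNFC(c) === target) found.add(c);
  }
  return Array.from(found);
}

console.log(lowerNFC("\u1E9E") === "\u00DF"); // true: U+1E9E lowercases to U+00DF
console.log(upperNFC("\u1E9E") === "\u1E9E"); // true: U+1E9E is uppercase invariant
console.log(upperNFC("\u00DF")); // "SS": U+00DF uppercases to two code points
console.log(preimagesOf("\u00DF")); // ["\u1E9E"]
```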

The characters that map to $\mathtt{toLowerCase}(Gr_U)$ are exactly $Gr_L$ and $Gr_U$:

Then we can refine the domain and codomain of toUpperCase() and toLowerCase() as the following, so that each piece has a disjoint domain and codomain:

$$
\begin{aligned}
\mathtt{toUpperCase}&: \begin{cases}
M_L'\to M_L'&\text{(Identity)}\\
M_U'\to M_U\\
M_U\to M_U&\text{(Identity)}\\
M_L\to M_L'\\
N_U\to N_U&\text{(Identity)}\\
N_L\to N_L'\\
I\to I&\text{(Identity)}\\
\mathbb{UC}_S\to\mathbb{UC}_S\\
\end{cases}\\
\mathtt{toLowerCase}&: \begin{cases}
M_L'\to M_L\\
M_U'\to M_U'&\text{(Identity)}\\
M_U\to M_U'\\
M_L\to M_L&\text{(Identity)}\\
N_U\to N_U'\\
N_L\setminus Gr_U\to N_L\setminus Gr_U&\text{(Identity)}\\
Gr_U\to Gr_L\\
Gr_L\to Gr_L&\text{(Identity)}\\
I\to I&\text{(Identity)}\\
\{\text{U+1E9E}, \text{U+00DF}\}\to \{\text{U+00DF}\}\\
\mathbb{UC}_S\setminus Gr\setminus \{\text{U+1E9E}, \text{U+00DF}\}\to\mathbb{UC}_S\setminus Gr\setminus \{\text{U+1E9E}, \text{U+00DF}\}\\
\end{cases}
\end{aligned}
$$

Mapping graph

To study the injectivity/surjectivity of case mapping, we introduce the concept of a mapping graph, a directed graph $(\mathbb{UC}_S\cup M_L\cup M_U\cup N_L\cup N_U\cup M_L'\cup M_U'\cup N_L'\cup N_U'\cup I, E_u\cup E_l)$. The vertices are all characters that we discussed, and the edges have two colors: $(c_1, c_2)\in E_u$ iff $\mathtt{toUpperCase}(c_1) = c_2$, and $(c_1, c_2)\in E_l$ iff $\mathtt{toLowerCase}(c_1) = c_2$. Therefore, we can reformulate case-mapping variance as:

  1. A node is case-mapping invariant iff both of its out edges are self-loops. These are nodes in $I$.
  2. A node is uppercase invariant iff its out $u$-edge is a self-loop.
  3. A node is lowercase invariant iff its out $l$-edge is a self-loop.
  4. A node is both uppercase and lowercase variant iff both of its out edges are not self-loops. These are the 31 characters mentioned before: $La\cup Gr_U$.
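The four cases above translate directly into a test on the two out-edges of a node. The following is a minimal sketch; the `Variance` labels and the function name `variance` are assumptions chosen for this example.

```typescript
type Variance =
  | "invariant"
  | "lowercase variant only"
  | "uppercase variant only"
  | "both variant";

// Classify a node by whether its out u-edge and out l-edge are self-loops.
function variance(c: string): Variance {
  const uSelfLoop = c.toUpperCase().normalize("NFC") === c;
  const lSelfLoop = c.toLowerCase().normalize("NFC") === c;
  if (uSelfLoop && lSelfLoop) return "invariant";
  if (uSelfLoop) return "lowercase variant only"; // uppercase invariant
  if (lSelfLoop) return "uppercase variant only"; // lowercase invariant
  return "both variant";
}

console.log(variance("1")); // "invariant"
console.log(variance("A")); // "lowercase variant only"
console.log(variance("a")); // "uppercase variant only"
console.log(variance("\u01C5")); // "both variant"
```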

We also have the following observations:

  1. Edges can be self-referential. Each node except those in $N_L'$ and $N_U'$ has exactly one out $u$-edge and exactly one out $l$-edge. A node in $N_L'$ has a self-referential out $u$-edge and no out $l$-edge. A node in $N_U'$ has a self-referential out $l$-edge and no out $u$-edge.
  2. Each node has zero or more in edges colored either $u$ or $l$.
  3. Idempotence: if a node has a non-self-referential in $u$-edge, then its out $u$-edge is self-referential. Similarly, if a node has a non-self-referential in $l$-edge, then its out $l$-edge is self-referential.
  4. Complementary ranges: if a node has a non-self-referential in $u$-edge, then it has no non-self-referential in $l$-edge. Similarly, if a node has a non-self-referential in $l$-edge, then it has no non-self-referential in $u$-edge.
  5. Closedness of $I$: if a node has a non-self-referential in $u$-edge, then it has no out $l$-edge or its out $l$-edge is non-self-referential. Similarly, if a node has a non-self-referential in $l$-edge, then it has no out $u$-edge or its out $u$-edge is non-self-referential.

We already established that:

  1. Nodes in $I$ have no in edges from other nodes.
  2. Nodes in $La\cup Gr_U$ have no in edges from other nodes.

Therefore, ignoring these nodes, the graph is inherently bipartite:

Connected subgraphs

A connected subgraph, called a cluster, formed by $c$ is recursively defined:

  1. $c$ is in the cluster.
  2. If $c'$ is in the cluster, then all nodes that point to $c'$ and those that $c'$ points to are all in the cluster.
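The recursive definition amounts to a graph search that follows edges in both directions. Below is a sketch under stated assumptions: in-edges are collected only from case-mapping variant single code points, and a multi-code-point node keeps an out-edge only when its target is a node that some single code point maps to (so "SS" gets no edge to "ss"). The names `buildInEdges` and `clusterOf` are hypothetical.

```typescript
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");
const isSingle = (s: string): boolean => Array.from(s).length === 1;

// Reverse adjacency (target -> sources). Invariant characters are skipped
// so the map stays small.
function buildInEdges(): Map<string, Set<string>> {
  const edges = new Map<string, Set<string>>();
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    const c = String.fromCodePoint(cp).normalize("NFC");
    if (!isSingle(c)) continue;
    const images = [upperNFC(c), lowerNFC(c)];
    if (images[0] === c && images[1] === c) continue;
    for (const target of images) {
      const sources = edges.get(target) ?? new Set<string>();
      sources.add(c);
      edges.set(target, sources);
    }
  }
  return edges;
}

// Collect everything reachable from `start` along in-edges and out-edges.
function clusterOf(start: string, edges: Map<string, Set<string>>): Set<string> {
  const cluster = new Set<string>([start]);
  const pending = [start];
  while (pending.length > 0) {
    const node = pending.pop()!;
    const neighbours = Array.from<string>(edges.get(node) ?? []);
    for (const target of [upperNFC(node), lowerNFC(node)]) {
      if (isSingle(node) || edges.has(target)) neighbours.push(target);
    }
    for (const next of neighbours) {
      if (!cluster.has(next)) {
        cluster.add(next);
        pending.push(next);
      }
    }
  }
  return cluster;
}

const inEdges = buildInEdges();
console.log(Array.from(clusterOf("s", inEdges)).sort()); // ["S", "s", "ſ"]
console.log(Array.from(clusterOf("\u00DF", inEdges)).sort()); // ["SS", "ß", "ẞ"]
```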

Nodes that are in the cluster formed by $c\in N_L'\cup N_U'$ do not always form cycles through some series of toUpperCase/toLowerCase transformations, because they eventually map to a node in $N_L'\cup N_U'$, which has no out $l$-edge or no out $u$-edge.

Character set (51)
  • SS (U+0053 U+0053), ß (U+00DF), ẞ (U+1E9E)
  • ʼN (U+02BC U+004E), ʼn (U+0149)
  • ԵՒ (U+0535 U+0552), և (U+0587)
  • Aʾ (U+0041 U+02BE), ẚ (U+1E9A)
  • ἈΙ (U+1F08 U+0399), ᾀ (U+1F80), ᾈ (U+1F88)
  • ἉΙ (U+1F09 U+0399), ᾁ (U+1F81), ᾉ (U+1F89)
  • ἊΙ (U+1F0A U+0399), ᾂ (U+1F82), ᾊ (U+1F8A)
  • ἋΙ (U+1F0B U+0399), ᾃ (U+1F83), ᾋ (U+1F8B)
  • ἌΙ (U+1F0C U+0399), ᾄ (U+1F84), ᾌ (U+1F8C)
  • ἍΙ (U+1F0D U+0399), ᾅ (U+1F85), ᾍ (U+1F8D)
  • ἎΙ (U+1F0E U+0399), ᾆ (U+1F86), ᾎ (U+1F8E)
  • ἏΙ (U+1F0F U+0399), ᾇ (U+1F87), ᾏ (U+1F8F)
  • ἨΙ (U+1F28 U+0399), ᾐ (U+1F90), ᾘ (U+1F98)
  • ἩΙ (U+1F29 U+0399), ᾑ (U+1F91), ᾙ (U+1F99)
  • ἪΙ (U+1F2A U+0399), ᾒ (U+1F92), ᾚ (U+1F9A)
  • ἫΙ (U+1F2B U+0399), ᾓ (U+1F93), ᾛ (U+1F9B)
  • ἬΙ (U+1F2C U+0399), ᾔ (U+1F94), ᾜ (U+1F9C)
  • ἭΙ (U+1F2D U+0399), ᾕ (U+1F95), ᾝ (U+1F9D)
  • ἮΙ (U+1F2E U+0399), ᾖ (U+1F96), ᾞ (U+1F9E)
  • ἯΙ (U+1F2F U+0399), ᾗ (U+1F97), ᾟ (U+1F9F)
  • ὨΙ (U+1F68 U+0399), ᾠ (U+1FA0), ᾨ (U+1FA8)
  • ὩΙ (U+1F69 U+0399), ᾡ (U+1FA1), ᾩ (U+1FA9)
  • ὪΙ (U+1F6A U+0399), ᾢ (U+1FA2), ᾪ (U+1FAA)
  • ὫΙ (U+1F6B U+0399), ᾣ (U+1FA3), ᾫ (U+1FAB)
  • ὬΙ (U+1F6C U+0399), ᾤ (U+1FA4), ᾬ (U+1FAC)
  • ὭΙ (U+1F6D U+0399), ᾥ (U+1FA5), ᾭ (U+1FAD)
  • ὮΙ (U+1F6E U+0399), ᾦ (U+1FA6), ᾮ (U+1FAE)
  • ὯΙ (U+1F6F U+0399), ᾧ (U+1FA7), ᾯ (U+1FAF)
  • ᾺΙ (U+1FBA U+0399), ᾲ (U+1FB2)
  • ΑΙ (U+0391 U+0399), ᾳ (U+1FB3), ᾼ (U+1FBC)
  • ΆΙ (U+0386 U+0399), ᾴ (U+1FB4)
  • Α͂Ι (U+0391 U+0342 U+0399), ᾷ (U+1FB7)
  • ῊΙ (U+1FCA U+0399), ῂ (U+1FC2)
  • ΗΙ (U+0397 U+0399), ῃ (U+1FC3), ῌ (U+1FCC)
  • ΉΙ (U+0389 U+0399), ῄ (U+1FC4)
  • Η͂Ι (U+0397 U+0342 U+0399), ῇ (U+1FC7)
  • ῺΙ (U+1FFA U+0399), ῲ (U+1FF2)
  • ΩΙ (U+03A9 U+0399), ῳ (U+1FF3), ῼ (U+1FFC)
  • ΏΙ (U+038F U+0399), ῴ (U+1FF4)
  • Ω͂Ι (U+03A9 U+0342 U+0399), ῷ (U+1FF7)
  • FF (U+0046 U+0046), ff (U+FB00)
  • FI (U+0046 U+0049), fi (U+FB01)
  • FL (U+0046 U+004C), fl (U+FB02)
  • FFI (U+0046 U+0046 U+0049), ffi (U+FB03)
  • FFL (U+0046 U+0046 U+004C), ffl (U+FB04)
  • ST (U+0053 U+0054), ſt (U+FB05), st (U+FB06)
  • ՄՆ (U+0544 U+0546), ﬓ (U+FB13)
  • ՄԵ (U+0544 U+0535), ﬔ (U+FB14)
  • ՄԻ (U+0544 U+053B), ﬕ (U+FB15)
  • ՎՆ (U+054E U+0546), ﬖ (U+FB16)
  • ՄԽ (U+0544 U+053D), ﬗ (U+FB17)

Simple mapping pairs

Define a mapping pair as a cluster of size 2. A mapping pair $(c_u, c_l)$ ($c_u\neq c_l$) satisfies:

  1. $(c_u, c_l)\in E_l$ (the $l$-edge of $c_u$ points to $c_l$)
  2. $(c_l, c_u)\in E_u$ (the $u$-edge of $c_l$ points to $c_u$)
  3. $\nexists c\notin\{c_u, c_l\}$ such that $(c, c_u)\in E_u$ or $(c, c_l)\in E_l$ (no other node points to $c_u$ or $c_l$)
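These three conditions can be checked mechanically. The sketch below is an approximation rather than the exact procedure used for the counts that follow: it considers only single code points as possible third nodes, and it rejects in-edges of either color from them. `variantChars` and `isMappingPair` are hypothetical names.

```typescript
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");

// All case-mapping variant single code points (after NFC), collected once.
const variantChars = new Set<string>();
for (let cp = 0; cp <= 0x10ffff; cp++) {
  const c = String.fromCodePoint(cp).normalize("NFC");
  if (Array.from(c).length !== 1) continue;
  if (upperNFC(c) !== c || lowerNFC(c) !== c) variantChars.add(c);
}

function isMappingPair(cu: string, cl: string): boolean {
  if (cu === cl) return false;
  // Conditions 1 and 2: the two nodes point at each other.
  if (lowerNFC(cu) !== cl || upperNFC(cl) !== cu) return false;
  // Condition 3: no third node points to either of them.
  for (const c of Array.from(variantChars)) {
    if (c === cu || c === cl) continue;
    const u = upperNFC(c);
    const l = lowerNFC(c);
    if (u === cu || u === cl || l === cu || l === cl) return false;
  }
  return true;
}

console.log(isMappingPair("A", "a")); // true
console.log(isMappingPair("S", "s")); // false: U+017F also uppercases to "S"
console.log(isMappingPair("\u2160", "\u2170")); // true: Roman numeral one
```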

There are 1386 such pairs. Of these, 1322 are pairs of Uppercase_Letter and Lowercase_Letter, and the rest are:

Character set (64)
  • İ (U+0130) — i̇ (U+0069 U+0307)
  • ǰ (U+01F0) — J̌ (U+004A U+030C)
  • ΐ (U+0390) — Ϊ́ (U+03AA U+0301)
  • ΰ (U+03B0) — Ϋ́ (U+03AB U+0301)
  • ẖ (U+1E96) — H̱ (U+0048 U+0331)
  • ẗ (U+1E97) — T̈ (U+0054 U+0308)
  • ẘ (U+1E98) — W̊ (U+0057 U+030A)
  • ẙ (U+1E99) — Y̊ (U+0059 U+030A)
  • ὐ (U+1F50) — Υ̓ (U+03A5 U+0313)
  • ὒ (U+1F52) — Υ̓̀ (U+03A5 U+0313 U+0300)
  • ὔ (U+1F54) — Υ̓́ (U+03A5 U+0313 U+0301)
  • ὖ (U+1F56) — Υ̓͂ (U+03A5 U+0313 U+0342)
  • ᾶ (U+1FB6) — Α͂ (U+0391 U+0342)
  • ῆ (U+1FC6) — Η͂ (U+0397 U+0342)
  • ῒ (U+1FD2) — Ϊ̀ (U+03AA U+0300)
  • ῖ (U+1FD6) — Ι͂ (U+0399 U+0342)
  • ῗ (U+1FD7) — Ϊ͂ (U+03AA U+0342)
  • ῢ (U+1FE2) — Ϋ̀ (U+03AB U+0300)
  • ῤ (U+1FE4) — Ρ̓ (U+03A1 U+0313)
  • ῦ (U+1FE6) — Υ͂ (U+03A5 U+0342)
  • ῧ (U+1FE7) — Ϋ͂ (U+03AB U+0342)
  • ῶ (U+1FF6) — Ω͂ (U+03A9 U+0342)
  • Ⅰ (U+2160) — ⅰ (U+2170)
  • Ⅱ (U+2161) — ⅱ (U+2171)
  • Ⅲ (U+2162) — ⅲ (U+2172)
  • Ⅳ (U+2163) — ⅳ (U+2173)
  • Ⅴ (U+2164) — ⅴ (U+2174)
  • Ⅵ (U+2165) — ⅵ (U+2175)
  • Ⅶ (U+2166) — ⅶ (U+2176)
  • Ⅷ (U+2167) — ⅷ (U+2177)
  • Ⅸ (U+2168) — ⅸ (U+2178)
  • Ⅹ (U+2169) — ⅹ (U+2179)
  • Ⅺ (U+216A) — ⅺ (U+217A)
  • Ⅻ (U+216B) — ⅻ (U+217B)
  • Ⅼ (U+216C) — ⅼ (U+217C)
  • Ⅽ (U+216D) — ⅽ (U+217D)
  • Ⅾ (U+216E) — ⅾ (U+217E)
  • Ⅿ (U+216F) — ⅿ (U+217F)
  • Ⓐ (U+24B6) — ⓐ (U+24D0)
  • Ⓑ (U+24B7) — ⓑ (U+24D1)
  • Ⓒ (U+24B8) — ⓒ (U+24D2)
  • Ⓓ (U+24B9) — ⓓ (U+24D3)
  • Ⓔ (U+24BA) — ⓔ (U+24D4)
  • Ⓕ (U+24BB) — ⓕ (U+24D5)
  • Ⓖ (U+24BC) — ⓖ (U+24D6)
  • Ⓗ (U+24BD) — ⓗ (U+24D7)
  • Ⓘ (U+24BE) — ⓘ (U+24D8)
  • Ⓙ (U+24BF) — ⓙ (U+24D9)
  • Ⓚ (U+24C0) — ⓚ (U+24DA)
  • Ⓛ (U+24C1) — ⓛ (U+24DB)
  • Ⓜ (U+24C2) — ⓜ (U+24DC)
  • Ⓝ (U+24C3) — ⓝ (U+24DD)
  • Ⓞ (U+24C4) — ⓞ (U+24DE)
  • Ⓟ (U+24C5) — ⓟ (U+24DF)
  • Ⓠ (U+24C6) — ⓠ (U+24E0)
  • Ⓡ (U+24C7) — ⓡ (U+24E1)
  • Ⓢ (U+24C8) — ⓢ (U+24E2)
  • Ⓣ (U+24C9) — ⓣ (U+24E3)
  • Ⓤ (U+24CA) — ⓤ (U+24E4)
  • Ⓥ (U+24CB) — ⓥ (U+24E5)
  • Ⓦ (U+24CC) — ⓦ (U+24E6)
  • Ⓧ (U+24CD) — ⓧ (U+24E7)
  • Ⓨ (U+24CE) — ⓨ (U+24E8)
  • Ⓩ (U+24CF) — ⓩ (U+24E9)

These are Roman numerals $(Ro_U, Ro_L)$, circled letters $(Ci_U, Ci_L)$, and the pairs $(M_L', M_L)$ and $(M_U, M_U')$.

Now, the remaining nodes are the complex cycles that neither have dead ends nor are simple mapping pairs. They are:

Character set (25)
  • I (U+0049), i (U+0069), ı (U+0131)
  • S (U+0053), s (U+0073), ſ (U+017F)
  • µ (U+00B5), Μ (U+039C), μ (U+03BC)
  • DŽ (U+01C4), Dž (U+01C5), dž (U+01C6)
  • LJ (U+01C7), Lj (U+01C8), lj (U+01C9)
  • NJ (U+01CA), Nj (U+01CB), nj (U+01CC)
  • DZ (U+01F1), Dz (U+01F2), dz (U+01F3)
  • ͅ (U+0345), Ι (U+0399), ι (U+03B9)
  • Β (U+0392), β (U+03B2), ϐ (U+03D0)
  • Ε (U+0395), ε (U+03B5), ϵ (U+03F5)
  • Θ (U+0398), θ (U+03B8), ϴ (U+03F4), ϑ (U+03D1)
  • Κ (U+039A), κ (U+03BA), ϰ (U+03F0)
  • Π (U+03A0), π (U+03C0), ϖ (U+03D6)
  • Ρ (U+03A1), ρ (U+03C1), ϱ (U+03F1)
  • Σ (U+03A3), ς (U+03C2), σ (U+03C3)
  • Φ (U+03A6), φ (U+03C6), ϕ (U+03D5)
  • В (U+0412), в (U+0432), ᲀ (U+1C80)
  • Д (U+0414), д (U+0434), ᲁ (U+1C81)
  • О (U+041E), о (U+043E), ᲂ (U+1C82)
  • С (U+0421), с (U+0441), ᲃ (U+1C83)
  • Т (U+0422), т (U+0442), ᲄ (U+1C84), ᲅ (U+1C85)
  • Ъ (U+042A), ъ (U+044A), ᲆ (U+1C86)
  • Ѣ (U+0462), ѣ (U+0463), ᲇ (U+1C87)
  • ᲈ (U+1C88), Ꙋ (U+A64A), ꙋ (U+A64B)
  • Ṡ (U+1E60), ṡ (U+1E61), ẛ (U+1E9B)

Which characters are the upper(lower)case form of multiple characters?

Below are all characters that are the uppercase form of multiple characters.

Character set (53)
  • I (U+0049): i (U+0069), ı (U+0131)
  • S (U+0053): s (U+0073), ſ (U+017F)
  • DŽ (U+01C4): Dž (U+01C5), dž (U+01C6)
  • LJ (U+01C7): Lj (U+01C8), lj (U+01C9)
  • NJ (U+01CA): Nj (U+01CB), nj (U+01CC)
  • DZ (U+01F1): Dz (U+01F2), dz (U+01F3)
  • Β (U+0392): β (U+03B2), ϐ (U+03D0)
  • Ε (U+0395): ε (U+03B5), ϵ (U+03F5)
  • Θ (U+0398): θ (U+03B8), ϑ (U+03D1)
  • Ι (U+0399): ͅ (U+0345), ι (U+03B9)
  • Κ (U+039A): κ (U+03BA), ϰ (U+03F0)
  • Μ (U+039C): µ (U+00B5), μ (U+03BC)
  • Π (U+03A0): π (U+03C0), ϖ (U+03D6)
  • Ρ (U+03A1): ρ (U+03C1), ϱ (U+03F1)
  • Σ (U+03A3): ς (U+03C2), σ (U+03C3)
  • Φ (U+03A6): φ (U+03C6), ϕ (U+03D5)
  • В (U+0412): в (U+0432), ᲀ (U+1C80)
  • Д (U+0414): д (U+0434), ᲁ (U+1C81)
  • О (U+041E): о (U+043E), ᲂ (U+1C82)
  • С (U+0421): с (U+0441), ᲃ (U+1C83)
  • Т (U+0422): т (U+0442), ᲄ (U+1C84), ᲅ (U+1C85)
  • Ъ (U+042A): ъ (U+044A), ᲆ (U+1C86)
  • Ѣ (U+0462): ѣ (U+0463), ᲇ (U+1C87)
  • Ṡ (U+1E60): ṡ (U+1E61), ẛ (U+1E9B)
  • Ꙋ (U+A64A): ᲈ (U+1C88), ꙋ (U+A64B)
  • ἈΙ (U+1F08 U+0399): ᾀ (U+1F80), ᾈ (U+1F88)
  • ἉΙ (U+1F09 U+0399): ᾁ (U+1F81), ᾉ (U+1F89)
  • ἊΙ (U+1F0A U+0399): ᾂ (U+1F82), ᾊ (U+1F8A)
  • ἋΙ (U+1F0B U+0399): ᾃ (U+1F83), ᾋ (U+1F8B)
  • ἌΙ (U+1F0C U+0399): ᾄ (U+1F84), ᾌ (U+1F8C)
  • ἍΙ (U+1F0D U+0399): ᾅ (U+1F85), ᾍ (U+1F8D)
  • ἎΙ (U+1F0E U+0399): ᾆ (U+1F86), ᾎ (U+1F8E)
  • ἏΙ (U+1F0F U+0399): ᾇ (U+1F87), ᾏ (U+1F8F)
  • ἨΙ (U+1F28 U+0399): ᾐ (U+1F90), ᾘ (U+1F98)
  • ἩΙ (U+1F29 U+0399): ᾑ (U+1F91), ᾙ (U+1F99)
  • ἪΙ (U+1F2A U+0399): ᾒ (U+1F92), ᾚ (U+1F9A)
  • ἫΙ (U+1F2B U+0399): ᾓ (U+1F93), ᾛ (U+1F9B)
  • ἬΙ (U+1F2C U+0399): ᾔ (U+1F94), ᾜ (U+1F9C)
  • ἭΙ (U+1F2D U+0399): ᾕ (U+1F95), ᾝ (U+1F9D)
  • ἮΙ (U+1F2E U+0399): ᾖ (U+1F96), ᾞ (U+1F9E)
  • ἯΙ (U+1F2F U+0399): ᾗ (U+1F97), ᾟ (U+1F9F)
  • ὨΙ (U+1F68 U+0399): ᾠ (U+1FA0), ᾨ (U+1FA8)
  • ὩΙ (U+1F69 U+0399): ᾡ (U+1FA1), ᾩ (U+1FA9)
  • ὪΙ (U+1F6A U+0399): ᾢ (U+1FA2), ᾪ (U+1FAA)
  • ὫΙ (U+1F6B U+0399): ᾣ (U+1FA3), ᾫ (U+1FAB)
  • ὬΙ (U+1F6C U+0399): ᾤ (U+1FA4), ᾬ (U+1FAC)
  • ὭΙ (U+1F6D U+0399): ᾥ (U+1FA5), ᾭ (U+1FAD)
  • ὮΙ (U+1F6E U+0399): ᾦ (U+1FA6), ᾮ (U+1FAE)
  • ὯΙ (U+1F6F U+0399): ᾧ (U+1FA7), ᾯ (U+1FAF)
  • ΑΙ (U+0391 U+0399): ᾳ (U+1FB3), ᾼ (U+1FBC)
  • ΗΙ (U+0397 U+0399): ῃ (U+1FC3), ῌ (U+1FCC)
  • ΩΙ (U+03A9 U+0399): ῳ (U+1FF3), ῼ (U+1FFC)
  • ST (U+0053 U+0054): ſt (U+FB05), st (U+FB06)

Below are all characters that are the lowercase form of multiple characters.

Character set (5)
  • dž (U+01C6): DŽ (U+01C4), Dž (U+01C5)
  • lj (U+01C9): LJ (U+01C7), Lj (U+01C8)
  • nj (U+01CC): NJ (U+01CA), Nj (U+01CB)
  • dz (U+01F3): DZ (U+01F1), Dz (U+01F2)
  • θ (U+03B8): Θ (U+0398), ϴ (U+03F4)
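Lists like the two above can be produced by grouping single code points by their image under one mapping and keeping the images with more than one preimage. This is a sketch, assuming self-loops are ignored as in the lists; `manyToOne` is an illustrative name, and the exact result depends on the Unicode version of the runtime.

```typescript
// Group single code points by their image under `map`, ignoring self-loops,
// and keep only the images that have two or more preimages.
function manyToOne(map: (s: string) => string): Map<string, string[]> {
  const preimages = new Map<string, Set<string>>();
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    const c = String.fromCodePoint(cp).normalize("NFC");
    if (Array.from(c).length !== 1) continue;
    const image = map(c);
    if (image === c) continue;
    const sources = preimages.get(image) ?? new Set<string>();
    sources.add(c);
    preimages.set(image, sources);
  }
  const result = new Map<string, string[]>();
  preimages.forEach((sources, image) => {
    if (sources.size > 1) result.set(image, Array.from(sources));
  });
  return result;
}

const upperForms = manyToOne((s) => s.toUpperCase().normalize("NFC"));
const lowerForms = manyToOne((s) => s.toLowerCase().normalize("NFC"));
console.log(upperForms.get("I")); // ["i", "ı"]
console.log(lowerForms.get("\u03B8")); // ["Θ", "ϴ"]
```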

What's the longest case-mapping chain?

A case-mapping chain is a sequence of distinct nodes $(c_1, c_2, \dots, c_n)$ such that $(c_i, c_{i+1})\in E_u\cup E_l$. Invariant nodes in $I$ have case-mapping chains of length 1 (only the node itself). Simple mapping pairs have case-mapping chains of length 2 (the two nodes). The longest case-mapping chain has length 3, and there are many of them:

Character set (28)
  • µ (U+00B5) → Μ (U+039C) → μ (U+03BC)
  • ı (U+0131) → I (U+0049) → i (U+0069)
  • ſ (U+017F) → S (U+0053) → s (U+0073)
  • Dž (U+01C5) → DŽ (U+01C4) → dž (U+01C6)
  • Lj (U+01C8) → LJ (U+01C7) → lj (U+01C9)
  • Nj (U+01CB) → NJ (U+01CA) → nj (U+01CC)
  • Dz (U+01F2) → DZ (U+01F1) → dz (U+01F3)
  • ͅ (U+0345) → Ι (U+0399) → ι (U+03B9)
  • ς (U+03C2) → Σ (U+03A3) → σ (U+03C3)
  • ϐ (U+03D0) → Β (U+0392) → β (U+03B2)
  • ϑ (U+03D1) → Θ (U+0398) → θ (U+03B8)
  • ϕ (U+03D5) → Φ (U+03A6) → φ (U+03C6)
  • ϖ (U+03D6) → Π (U+03A0) → π (U+03C0)
  • ϰ (U+03F0) → Κ (U+039A) → κ (U+03BA)
  • ϱ (U+03F1) → Ρ (U+03A1) → ρ (U+03C1)
  • ϴ (U+03F4) → θ (U+03B8) → Θ (U+0398)
  • ϵ (U+03F5) → Ε (U+0395) → ε (U+03B5)
  • ᲀ (U+1C80) → В (U+0412) → в (U+0432)
  • ᲁ (U+1C81) → Д (U+0414) → д (U+0434)
  • ᲂ (U+1C82) → О (U+041E) → о (U+043E)
  • ᲃ (U+1C83) → С (U+0421) → с (U+0441)
  • ᲄ (U+1C84) → Т (U+0422) → т (U+0442)
  • ᲅ (U+1C85) → Т (U+0422) → т (U+0442)
  • ᲆ (U+1C86) → Ъ (U+042A) → ъ (U+044A)
  • ᲇ (U+1C87) → Ѣ (U+0462) → ѣ (U+0463)
  • ᲈ (U+1C88) → Ꙋ (U+A64A) → ꙋ (U+A64B)
  • ẛ (U+1E9B) → Ṡ (U+1E60) → ṡ (U+1E61)
  • ẞ (U+1E9E) → ß (U+00DF) → SS (U+0053 U+0053)
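A chain starting at a given character can be found with a depth-first search over the two out-edges, never revisiting a node. The sketch below makes one simplifying assumption: multi-code-point nodes are treated as terminal, so the chain from U+1E9E stops at "SS". The name `longestChainFrom` is hypothetical.

```typescript
const upperNFC = (s: string): string => s.toUpperCase().normalize("NFC");
const lowerNFC = (s: string): string => s.toLowerCase().normalize("NFC");

// Longest sequence of distinct nodes reachable from `start` by following
// out u-edges and out l-edges.
function longestChainFrom(start: string): string[] {
  const visit = (node: string, path: string[]): string[] => {
    let best = path;
    // Assumption: multi-code-point nodes have no outgoing edges worth following.
    if (Array.from(node).length !== 1) return best;
    for (const next of [upperNFC(node), lowerNFC(node)]) {
      if (path.includes(next)) continue; // nodes in a chain must be distinct
      const candidate = visit(next, [...path, next]);
      if (candidate.length > best.length) best = candidate;
    }
    return best;
  };
  return visit(start, [start]);
}

console.log(longestChainFrom("\u00B5")); // ["µ", "Μ", "μ"]
console.log(longestChainFrom("\u1E9E")); // ["ẞ", "ß", "SS"]
console.log(longestChainFrom("a")); // ["a", "A"]
```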