Invitational round | 5 points | Tags: ComputationalPuzzle
Starting from "small":

- "small" is followed by: elephants (2 times), strawberries (1 time); so the next word is "elephants".
- "elephants" is followed by: eat (1 time), are (2 times); so the next word is "are".
- "are" is followed by: not (1 time), alleged (2 times), very (1 time); so the next word is "alleged".
- "alleged" is followed by: spies (2 times), blueberries (1 time); so the next word is "spies".

So the sentence is "small elephants are alleged spies."
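The greedy walk above can be sketched in a few lines. This is only a sketch, not the official solution; the successor counts are hard-coded from the observations listed above.

```python
# Bigram successor counts, copied from the observations above.
successors = {
    "small": {"elephants": 2, "strawberries": 1},
    "elephants": {"eat": 1, "are": 2},
    "are": {"not": 1, "alleged": 2, "very": 1},
    "alleged": {"spies": 2, "blueberries": 1},
}

def greedy_decode(start: str) -> str:
    """Repeatedly append the most frequent successor until none is known."""
    words = [start]
    while words[-1] in successors:
        counts = successors[words[-1]]
        words.append(max(counts, key=counts.get))
    return " ".join(words)

print(greedy_decode("small"))  # small elephants are alleged spies
```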
In J2, the critical information is as follows:
And the common token counts:
| | 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ |
|---|---|---|---|---|---|---|
| 1️⃣ | | 2 | 2 | 1 | | 0 |
| 2️⃣ | 2 | | | 0 | 1 | |
| 3️⃣ | 2 | | | | 0 | |
| 4️⃣ | 1 | 0 | | | 1 | 2 |
| 5️⃣ | | 1 | 0 | 1 | | |
| 6️⃣ | 0 | | | 2 | | |
Note that in general, the words overlap quite a bit, so we want to tokenize the words such that they share tokens as much as possible. That is, we always want to extract longest common substrings as tokens. My first assumption: "c" never appears as a token on its own, because the two tokens above are both "c" followed by something else.
A. [ch]anging
B. [co][a][ch][es]
C. ex[ce]ssive
D. pro[ce]eding
E. pro[ce]sses
F. started
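The "extract longest common substrings" heuristic can be sketched with the standard dynamic-programming routine; the inputs below are words from the list above.

```python
def longest_common_substring(a: str, b: str) -> str:
    """DP over prefix pairs: cur[j] holds the length of the longest
    common suffix of a[:i] and b[:j]; track the best match seen."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

print(longest_common_substring("excessive", "processes"))  # cess
print(longest_common_substring("changing", "coaches"))     # ch
```

Note that the heuristic alone would happily extract "cess" whole; the actual split into "ce" and "ss" is forced by how the other words share tokens.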
The "ex" and "pro" prefixes must be tokens, because the suffixes look more complex and we only have 4 tokens to work with. Among the suffixes, note that "sses" contains "es", a known token. The remainder, "ss", also appears in C.
A. [ch]anging
B. [co][a][ch][es]
C. [ex][ce][ss][ive]
D. [pro][ce]eding
E. [pro][ce][ss][es]
F. started
If we divide "eding" into "ed" and "ing", then "ed" can be reused for F and "ing" can be reused for A.
A. [ch]ang[ing]
B. [co][a][ch][es]
C. [ex][ce][ss][ive]
D. [pro][ce][ed][ing]
E. [pro][ce][ss][es]
F. start[ed]
Let's count the common tokens. Although we haven't fully tokenized A and F, we can still be sure a token does not appear in them whenever the word contains no such substring; where a token might or might not appear, the count is written as a range (e.g. 0/1).
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| A | | 1/2 | 0 | 1 | 0 | 0/1 |
| B | 1/2 | | 0 | 0 | 1 | 0/1 |
| C | 0 | 0 | | 1 | 2 | 0 |
| D | 1 | 0 | 1 | | 2 | 1 |
| E | 0 | 1 | 2 | 2 | | 0 |
| F | 0/1 | 0/1 | 0 | 1 | 0 | |
- 1️⃣ shares 2 tokens with 2 other words; only E can do that (with C and D), so 1️⃣ must be E, while 2️⃣ and 3️⃣ are C and D, in some order.
- 1️⃣ shares 1 token with 4️⃣, and E shares 1 token only with B, so 4️⃣ is B.
- 4️⃣ is known to share 2 tokens with 6️⃣, so 6️⃣ must be A, and the tokenization of A contains the token "a": A. [ch][a][ng][ing].
- B is known to share no tokens with C or D, but 4️⃣ needs to share 1 token with 5️⃣, so 5️⃣ must be F, and F also contains the token "a": F. st[a]rt[ed].
- Finally, 5️⃣ shares 1 token with 2️⃣ and no tokens with 3️⃣, so 2️⃣ is D and 3️⃣ is C.
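As a sanity check, a short script can confirm the shared-token counts used in the deduction. This is a sketch using the final tokenizations above; the never-tokenized "st" and "rt" of F are left out of its token set.

```python
# Final tokenizations of the six words (F's "st"/"rt" are never tokens).
tokens = {
    "A": {"ch", "a", "ng", "ing"},
    "B": {"co", "a", "ch", "es"},
    "C": {"ex", "ce", "ss", "ive"},
    "D": {"pro", "ce", "ed", "ing"},
    "E": {"pro", "ce", "ss", "es"},
    "F": {"a", "ed"},
}

def shared(x: str, y: str) -> int:
    """Number of tokens the two words have in common."""
    return len(tokens[x] & tokens[y])

# The counts cited in the deduction above:
assert shared("E", "C") == 2 and shared("E", "D") == 2  # 1 is E
assert shared("E", "B") == 1                            # 4 is B
assert shared("B", "A") == 2                            # 6 is A
assert shared("B", "F") == 1                            # 5 is F
assert shared("F", "D") == 1 and shared("F", "C") == 0  # 2 is D, 3 is C
print("all checks pass")
```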
(This is mostly guesswork, based on the assumption that the tokenization should create as many common tokens as possible. I'm not sure if there's an entirely rigorous way to do this.)