Wordle — What Is Statistically the Best Word to Start the Game With?
Wordle fans: this blog post will answer this question.
Introduction
Wordle is a daily word game [1]. Each day there is one common five-letter word that everyone tries to discover. You get six tries, and each try must be a five-letter English word. After each guess, every letter is marked: it appears in the word and is in the right place (“green”), it appears but in the wrong place (“yellow”), or it doesn’t appear at all (“grey”). The goal is to guess the word in as few tries as possible.
Now, I am sure all Wordle fans, and fans-to-be, ask themselves what the best word to start with is, and what the best strategy for playing the game is. To answer these questions I used the English word list in NLTK [2]. The NLTK English vocabulary contains 236,736 words. I kept only the words of length five, which left me with 10,422 words. This is our search space.
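A minimal sketch of this filtering step, assuming the vocabulary comes from `nltk.corpus.words` (the corpus has to be downloaded once with `nltk.download("words")`). The post does not say whether the 10,422 count was taken before or after lowercasing and de-duplicating, so the normalization below is an assumption, and the helper names are illustrative only.

```python
def five_letter_words(vocabulary, length=5):
    """Keep alphabetic words of the given length, lowercased and de-duplicated."""
    kept = {w.lower() for w in vocabulary if len(w) == length and w.isalpha()}
    return sorted(kept)


def load_nltk_vocabulary():
    """Load the NLTK English word list (needs nltk and its 'words' corpus)."""
    from nltk.corpus import words  # lazy import: the filter itself works without nltk
    return words.words()


# Toy vocabulary so the example runs without any downloads.
toy_vocabulary = ["Arose", "arose", "oreas", "cat", "planet", "it's!", "crane"]
print(five_letter_words(toy_vocabulary))
```

With NLTK installed, `five_letter_words(load_nltk_vocabulary())` would produce the search space; the exact size may differ from the figure above depending on how case and duplicates are handled.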
Let’s Start Simple
To answer the first question, what is the best word to start the game with, I decided to start with a simple heuristic: I look for the words that give me the maximal number of “yellow” squares (right letter, wrong place), and among them I pick the one that maximizes the “green” squares (right letter, right place). Of course this is a naive approach, but let’s start with it.
We want to choose words based on the most common letters in five-letter English words. Following the maximum likelihood estimation (MLE) principle, let’s use our dataset to count the letters:
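The counting step could look roughly like this. It is a sketch on a toy word list, and it counts every occurrence of a letter; counting each letter at most once per word is an equally plausible reading, and the post does not say which was used.

```python
from collections import Counter


def letter_frequencies(words):
    """Relative frequency of each letter over all letter occurrences (the MLE)."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    # most_common() orders letters from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}


toy_words = ["arose", "oreas", "crane", "slate", "stare"]
freqs = letter_frequencies(toy_words)
top_five = list(freqs)[:5]
print(top_five)
```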
We learn from this distribution that a starting word containing the letters ‘a’, ‘e’, ‘r’, ‘o’, ‘s’ has the highest probability of “yellow” squares (right letter, wrong place). Let’s search our dictionary for the words containing all five of those letters:
arose
oreas
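The dictionary search above amounts to a subset test on each word's letters. A small sketch, with a toy list standing in for the full five-letter vocabulary (the function name is hypothetical):

```python
def words_containing_all(words, letters):
    """Return the words that contain every letter in `letters`."""
    required = set(letters)
    return [w for w in words if required <= set(w)]


toy_words = ["arose", "oreas", "crane", "stare", "raise"]
print(words_containing_all(toy_words, "aeros"))
```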
Now let’s choose, between the two, the word with the highest probability of “green” squares (right letter, right place). We’ll use the MLE principle again, this time for p(letter|position): count how often each letter appears in each position across the five-letter vocabulary, then normalize the counts to get probabilities.
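One way to build these tables is sketched below. It assumes the normalization is done within each position, so that the probabilities over letters sum to one per position, which is what the notation p(letter|position) implies; normalizing per letter instead would give p(position|letter).

```python
from collections import Counter


def position_probabilities(words, length=5):
    """Estimate p(letter | position): one dict of probabilities per position."""
    tables = []
    for pos in range(length):
        counts = Counter(w[pos] for w in words)
        total = sum(counts.values())
        tables.append({ch: n / total for ch, n in counts.items()})
    return tables


toy_words = ["arose", "oreas", "crane", "slate", "stare"]
tables = position_probabilities(toy_words)
print(tables[0])  # distribution over first letters
print(tables[4])  # distribution over last letters
```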
Let’s plot a heatmap of the location of each of the five letters:
Let’s assume the letters are independent given their position, and that we are not restricted to a fixed vocabulary. We can then tell this generative story, in which a five-letter word is generated as follows:
p(five-letter word with ‘a’, ‘e’, ‘r’, ‘o’, ‘s’) = p(letter|position=0) * p(letter|position=1) * p(letter|position=2) * p(letter|position=3) * p(letter|position=4)
p(‘arose’) = 0.0004
p(‘oreas’) = 0.000007
p(‘arose’) > p(‘oreas’)
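The product above can be computed directly from the word list. This is a sketch under the same independence assumption, run on a toy list, so the numbers it prints are not the ones reported above.

```python
from math import prod


def word_probability(word, words):
    """Product over positions of p(letter | position), estimated from `words`."""
    n = len(words)
    return prod(
        sum(w[pos] == ch for w in words) / n  # share of words with `ch` at `pos`
        for pos, ch in enumerate(word)
    )


toy_words = ["arose", "oreas", "crane", "slate", "stare"]
for candidate in ("arose", "oreas"):
    print(candidate, word_probability(candidate, toy_words))
```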
According to this simple heuristic, I recommend you start your game with the word ‘arose’.
Can We Do Better?
Notice that our heuristic separated the frequency of the letters (“yellow”) and the order of the letters (“green”) into two consecutive steps, which is obviously sub-optimal. Let’s now try to model “green”, “yellow” and “grey” all together. Remember that we want to choose the word that reduces our uncertainty about the answer the most, that is, the word that minimizes the entropy remaining after the guess. But how will we calculate the entropy of each word?
We can think of the Wordle answer to our guess as a tree of depth 5 + 1. Each level corresponds to a letter position in the word, and each node splits into three branches, “green”, “yellow” and “grey”, according to the possible Wordle results. This gives 3⁵ = 243 leaves, and each root-to-leaf path represents one possible Wordle response to our guess.
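Each leaf of this tree can be encoded as a five-character string. The sketch below enumerates all leaves and computes the leaf reached for a given guess and answer. The handling of repeated letters (greens first, then yellows limited by the remaining letter counts) follows the usual Wordle behaviour and is an assumption here, since the description above does not cover duplicates.

```python
from collections import Counter
from itertools import product


def feedback(guess, answer):
    """Wordle response as a string of 'G' (green), 'Y' (yellow), 'X' (grey)."""
    result = ["X"] * len(guess)
    # Answer letters that were not matched exactly remain available for yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "X" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


all_leaves = ["".join(p) for p in product("GYX", repeat=5)]
print(len(all_leaves))             # 243
print(feedback("arose", "oreas"))  # YGYYY
```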
For each guess, we can count the English words that are consistent with each path. For each leaf we calculate a probability: the number of words that follow the path’s rule divided by the total number of words (10,422). Notice that the leaf probabilities should sum to one.
Let’s make sure you are all following. In this diagram, the rightmost leaf (five exact matches) will always contain exactly one word, the guess itself, so its probability is 1/10,422. The leftmost leaf will contain all the words that share none of the five letters in our guess, divided by the total number of words.
We can now build these distribution trees for all of our five-letter words (10,422) and calculate the entropy of each word’s leaf distribution. The word whose leaves have the highest entropy splits the vocabulary most evenly, which leaves the lowest expected uncertainty about the answer after the guess, so it is the best word to start the game with.
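A sketch of the entropy calculation on a toy candidate list, assuming every candidate is equally likely to be the answer. It reports both the entropy of the leaf distribution and the expected entropy that remains about the answer after the guess (log2 of the number of candidates minus the leaf entropy), so the guess that minimizes the latter is the one that maximizes the former. The duplicate-letter handling in `feedback` is an assumption based on standard Wordle rules, and all function names are illustrative.

```python
from collections import Counter
from math import log2


def feedback(guess, answer):
    """Wordle response: 'G' green, 'Y' yellow, 'X' grey (standard duplicate rules)."""
    result = ["G" if g == a else "X" for g, a in zip(guess, answer)]
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, g in enumerate(guess):
        if result[i] == "X" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def leaf_distribution(guess, candidates):
    """Probability of each reachable leaf: matching words / total words."""
    counts = Counter(feedback(guess, answer) for answer in candidates)
    return {leaf: c / len(candidates) for leaf, c in counts.items()}


def leaf_entropy(guess, candidates):
    """Entropy in bits of the leaf distribution for this guess."""
    return -sum(p * log2(p) for p in leaf_distribution(guess, candidates).values())


def expected_remaining_entropy(guess, candidates):
    """Expected entropy about the answer left after seeing the response."""
    return log2(len(candidates)) - leaf_entropy(guess, candidates)


toy_candidates = ["arose", "oreas", "crane", "slate"]
for guess in ("arose", "lucky"):
    print(guess, leaf_entropy(guess, toy_candidates),
          expected_remaining_entropy(guess, toy_candidates))
```

On the full vocabulary this is a quadratic computation (every guess against every candidate), roughly 10,422² feedback evaluations, so caching or vectorizing the feedback step may be worthwhile.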
This method extends easily to a strategy for playing the whole game: after each guess, build the conditional probability trees over only the words that are consistent with the previous responses.
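The conditioning step could be sketched as a filter: keep only the words that would have produced the response actually observed. The `feedback` helper is included so the block runs on its own; its duplicate-letter handling is an assumption based on standard Wordle rules, and the names are hypothetical.

```python
from collections import Counter


def feedback(guess, answer):
    """Wordle response: 'G' green, 'Y' yellow, 'X' grey (standard duplicate rules)."""
    result = ["G" if g == a else "X" for g, a in zip(guess, answer)]
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, g in enumerate(guess):
        if result[i] == "X" and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, observed):
    """Keep the words that would have produced the observed response."""
    return [w for w in candidates if feedback(guess, w) == observed]


toy_candidates = ["arose", "oreas", "crane", "slate"]
# Suppose we guessed 'lucky' and every square came back grey.
print(filter_candidates(toy_candidates, "lucky", "XXXXX"))
```

The next guess would then be chosen by recomputing the leaf distributions over the filtered list only.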
References:
[1] Wordle Game — https://www.powerlanguage.co.uk/wordle/
[2] Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.