The mathematically optimal first guess in Wordle
Like many people on my Twitter timeline currently, I’ve been playing Wordle over the past few days. It’s a fun, simple little game where you guess 5-letter words, and it tells you which letters you got correct and whether they are in the right positions or not. Part of the game is that there’s only one puzzle per day, and it’s the same word for everyone playing.
So I decided to make a bot to play the game optimally for me. Now, you could consider this cheating, but considering how easy it is to cheat by “just looking up the answer for today’s puzzle”, I don’t think making a bot like this is unethical or anything. But knowing the “best possible first word” might possibly ruin the game for you, so you should stop reading now if you don’t want to know.
Anyway, let’s get started.
First, I grabbed the word lists the game uses. There are separate lists for solutions and for allowed guesses (so you can enter weird Scrabble words like AAHED, which will never show up as an actual solution because it’s not a commonly used word), and both lists are accessible by viewing the game’s source code. There are 2315 possible solutions, and 12972 words it accepts as guesses.
For a quick estimate of how long this would take, I first multiplied 12972 by 2315 to get ~30 million guess/solution pairs, and evaluating each pair is gonna multiply that by 2315 again, making it on the order of 70 billion calculations. 70 billion calculations seems like a lot, but it’s a similar order of magnitude to how many operations a CPU can do per second (4 GHz x 8 cores = 32 billion), so it should be feasible to do this in the dumb brute-force way. I chose C++ to have as little overhead as possible, and also because I am very comfortable with it.
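The back-of-the-envelope numbers above can be multiplied out exactly. Here is a minimal sketch of that arithmetic; the function names are made up for illustration, and the "one evaluation per clock cycle" assumption is optimistic, so treat the time estimate as a rough lower bound rather than a benchmark.

```cpp
#include <cassert>
#include <cstdint>

// Word-list sizes quoted in the post.
constexpr std::uint64_t kGuessWords = 12972;
constexpr std::uint64_t kSolutionWords = 2315;

// Every allowed guess paired with every possible solution.
constexpr std::uint64_t guessSolutionPairs() {
    return kGuessWords * kSolutionWords;
}

// Evaluating each pair against the full solution list once more.
constexpr std::uint64_t bruteForceOperations() {
    return guessSolutionPairs() * kSolutionWords;
}

// Rough wall-clock estimate in seconds, assuming (optimistically) that one
// evaluation costs one clock cycle. Real evaluations cost many cycles each.
double estimatedSeconds(double cyclesPerSecond) {
    assert(cyclesPerSecond > 0);
    return static_cast<double>(bruteForceOperations()) / cyclesPerSecond;
}
```

The exact product is 69,519,866,700, so on the assumed 4 GHz x 8 core machine `estimatedSeconds(32e9)` comes out to a couple of seconds of idealized compute, which is why brute force looked feasible.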
I have uploaded the code for the bot on GitHub here, for reference.
After a little bit of coding, I had a “simple” test. For every possible guess, it checks it against every possible solution. “Green” squares (the letter is in the correct spot) are worth 2 points, “yellow” squares (the letter exists in the word, but it’s in the wrong spot) are worth 1 point, and grey squares (the letter does not exist in the word) are worth zero points. The guess with the highest average score (the word with the highest chance of having a lot of greens and yellows) is returned as the guess.
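To make the scoring concrete, here is a minimal sketch of what that point system could look like. This is not the code from the repository: the names `feedback`, `points`, and `bestByPoints` are hypothetical, words are assumed to be lowercase a-z, and repeated letters are handled with the usual two-pass rule (greens first, then yellows while unmatched letters remain), which is an assumption about how the game marks them.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

enum Mark { Grey = 0, Yellow = 1, Green = 2 };

// Marks for one guess against one solution. Assumes lowercase 5-letter words.
std::array<int, 5> feedback(const std::string& guess, const std::string& solution) {
    assert(guess.size() == 5 && solution.size() == 5);
    std::array<int, 5> marks{};    // starts out all Grey
    std::array<int, 26> unused{};  // solution letters not claimed by a green
    for (int i = 0; i < 5; ++i) {
        if (guess[i] == solution[i]) marks[i] = Green;
        else ++unused[solution[i] - 'a'];
    }
    for (int i = 0; i < 5; ++i) {
        if (marks[i] == Green) continue;
        int& left = unused[guess[i] - 'a'];
        if (left > 0) { marks[i] = Yellow; --left; }
    }
    return marks;
}

// 2 points per green, 1 per yellow, 0 per grey.
int points(const std::string& guess, const std::string& solution) {
    int total = 0;
    for (int m : feedback(guess, solution)) total += m;
    return total;
}

// Guess with the highest total (and therefore highest average) points
// across all solutions. Ties go to the earliest guess in the list.
std::string bestByPoints(const std::vector<std::string>& guesses,
                         const std::vector<std::string>& solutions) {
    std::string best;
    long bestTotal = -1;
    for (const auto& g : guesses) {
        long total = 0;
        for (const auto& s : solutions) total += points(g, s);
        if (total > bestTotal) { bestTotal = total; best = g; }
    }
    return best;
}
```

Calling `bestByPoints` with the full guess list and the full solution list would give the opening word under this metric; after each real guess you would filter the solution list down to the words consistent with the feedback and call it again.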
This system does fairly well. On average it finds a solution in 3.69017 guesses. The first word it picks is SOARE (“a young hawk”). However, this bot does not beat the game every time. In the worst case, it takes 8 guesses to find a solution, two more than the six the game allows.
Now, while this method is extremely fast, we can do better. A more specific metric for how good a guess is would be “how many possible solutions are left on average after making this guess”. After another couple hours I had a version of the bot that would use this method, multithreaded so it could actually compute the result in a reasonable amount of time. This version of the bot finds a solution in 3.49417 guesses on average, and has a worst-case of 5 guesses (meaning, it *always* finds a solution). The ideal first guess using this method is ROATE.
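Here is a rough sketch of how the “solutions left on average” metric can be computed. The trick is that two solutions that produce the same colour pattern for a guess can’t be told apart by that guess, so you bucket the candidates by pattern: a bucket holding n of the N candidates is hit with probability n/N and leaves n words, so the average is the sum of n²/N over all buckets. The names `pattern`, `expectedRemaining`, and `bestByExpectedRemaining` are hypothetical, the duplicate-letter handling is an assumption, and this version is single-threaded (the loop over guesses is the part that splits naturally across threads).

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Feedback packed into one base-3 number (0 = grey, 1 = yellow, 2 = green
// per position), so there are 3^5 = 243 possible values. Assumes lowercase
// 5-letter words.
int pattern(const std::string& guess, const std::string& solution) {
    assert(guess.size() == 5 && solution.size() == 5);
    std::array<int, 5> marks{};
    std::array<int, 26> unused{};  // solution letters not claimed by a green
    for (int i = 0; i < 5; ++i) {
        if (guess[i] == solution[i]) marks[i] = 2;
        else ++unused[solution[i] - 'a'];
    }
    int code = 0;
    for (int i = 0; i < 5; ++i) {
        if (marks[i] == 0 && unused[guess[i] - 'a'] > 0) {
            marks[i] = 1;
            --unused[guess[i] - 'a'];
        }
        code = code * 3 + marks[i];
    }
    return code;
}

// Average number of candidates still possible after playing `guess`,
// assuming every candidate is equally likely to be the answer.
double expectedRemaining(const std::string& guess,
                         const std::vector<std::string>& candidates) {
    assert(!candidates.empty());
    std::array<int, 243> bucket{};
    for (const auto& s : candidates) ++bucket[pattern(guess, s)];
    double sumSquares = 0.0;
    for (int n : bucket) sumSquares += static_cast<double>(n) * n;
    return sumSquares / static_cast<double>(candidates.size());
}

// Guess that leaves the fewest candidates on average. Ties go to the
// earliest guess in the list.
std::string bestByExpectedRemaining(const std::vector<std::string>& guesses,
                                    const std::vector<std::string>& candidates) {
    assert(!guesses.empty());
    std::string best = guesses.front();
    double bestValue = expectedRemaining(best, candidates);
    for (const auto& g : guesses) {
        const double value = expectedRemaining(g, candidates);
        if (value < bestValue) { bestValue = value; best = g; }
    }
    return best;
}
```

One design note: bucketing makes each guess cost a single pass over the candidates instead of a nested loop over pairs of solutions, which is a big part of what keeps this kind of search tractable.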
The efficiency with which this bot finds solutions is honestly pretty impressive to look at. Check out how quickly it finds the correct solution for today’s (12/29) puzzle:
So there you have it, the ideal first guess for Wordle is ROATE. It means, “The cumulative net earnings after taxes available to common shareholders, adjusted for tax-affected amortization of intangibles, for the calendar quarters in each calendar year in a specified period of time divided by average shareholder’s tangible common equity”.
But wait, ROATE might be the ideal first guess based on average number of guesses, but since it’s only in the guesses list and not in the solutions list, it can never hit a “hole-in-one”. And if you really want to impress your friends, you want to be able to hit that 1 in 2315 chance of getting it on the first guess. For this, the ideal first guess would be RAISE. With RAISE as the first guess, you will need 3.49546 guesses on average, only very slightly worse than ROATE, but you have that sweet, sweet chance of landing that hole-in-one. (RAISE is also the word with the best “worst case” performance, so I favor this myself now.)
So there you have it. ROATE is the mathematically optimal first guess in Wordle. RAISE is very slightly less good, but gives you a chance of a hole-in-one. Wordle is solved!
…
or is it?
OK, so ROATE is the optimal first guess if the metric you’re measuring is “the size of the possible solution list after a guess”. But if what you really want is “get to a solution in the fewest moves on average”, that is not quite equivalent to what I was measuring here.
I got to coding up a recursive solution that would directly measure the average number of additional guesses it would take to arrive at a solution after any given guess, but the naive brute-force version proved way too much for my computer to handle. We’re talking “longer than the age of the universe” amounts of computer time here. I have actual work I should be working on though, so I’ll leave figuring this out to you all as homework. Good luck! Please tweet at me if you manage to figure out the *actual* optimal solution.
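If you want a starting point for the homework, here is a minimal sketch of what a naive recursive search could look like. It is an illustration under assumptions, not the code I ran: the names `pattern`, `minTotalGuesses`, `kAllGreen`, and `kUnsolvable` are made up, the candidates are assumed to be distinct lowercase words that all appear in the guess list, and there is no caching or clever pruning. It tries every guess at every step and recurses into every feedback bucket, which is fine for a handful of words and hopeless for the real lists.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <vector>

constexpr int kAllGreen = 242;                         // 22222 in base 3
constexpr long long kUnsolvable = 1000000000000LL;     // "no useful guess" marker

// Feedback packed in base 3 (0 = grey, 1 = yellow, 2 = green per position).
int pattern(const std::string& guess, const std::string& solution) {
    assert(guess.size() == 5 && solution.size() == 5);
    std::array<int, 5> marks{};
    std::array<int, 26> unused{};  // solution letters not claimed by a green
    for (int i = 0; i < 5; ++i) {
        if (guess[i] == solution[i]) marks[i] = 2;
        else ++unused[solution[i] - 'a'];
    }
    int code = 0;
    for (int i = 0; i < 5; ++i) {
        if (marks[i] == 0 && unused[guess[i] - 'a'] > 0) {
            marks[i] = 1;
            --unused[guess[i] - 'a'];
        }
        code = code * 3 + marks[i];
    }
    return code;
}

// Smallest possible total number of guesses, summed over every candidate,
// when playing optimally. Divide by candidates.size() for the average.
long long minTotalGuesses(const std::vector<std::string>& candidates,
                          const std::vector<std::string>& guesses) {
    const long long n = static_cast<long long>(candidates.size());
    if (n <= 1) return n;  // nothing left, or one word that we guess directly
    long long best = kUnsolvable;
    for (const auto& g : guesses) {
        std::map<int, std::vector<std::string>> buckets;
        for (const auto& s : candidates) buckets[pattern(g, s)].push_back(s);
        if (buckets.size() == 1) continue;  // guess tells us nothing new
        long long cost = n;                 // this guess is spent on every candidate
        for (const auto& entry : buckets) {
            if (entry.first == kAllGreen) continue;  // solved by this very guess
            cost += minTotalGuesses(entry.second, guesses);
            if (cost >= best) break;                 // already no better than best
        }
        best = std::min(best, cost);
    }
    return best;
}
```

For example, with the four candidates BILLS, FILLS, HILLS, and MILLS as both lists, every guess only rules out itself, so the best total is 1 + 2 + 3 + 4 = 10 guesses (2.5 on average). Getting from this to the full 12972 x 2315 problem is where the real work is: caching repeated candidate sets and bounding the search are the obvious things to try.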