A crash course in Python “comprehensions” and “generators”
Master them in 20 minutes. Use them every day.
I love Python’s list comprehensions and generators.
They keep my code concise. They’re great one-liners for exploring and “munging” data. They’re intuitive, and once you know what you’re looking at, very easy to read.
Haven’t heard of them? Read on! Think you know all the variations of this popular Python construct already? Read on…
Spoiler Alert: There’s a great little video at the end of this article designed to help intermediate level coders with nested list comprehensions…
So… what are “comprehensions” exactly?
The best way of explaining these little beauties is to show you what they’re intended to replace and improve on. Instead of this traditional for loop:
>>> fruits = ["apples", "bananas", "pears", "pears"]
>>> new_words = []
>>> for word in fruits:
...     new_words.append(word.title())
...
>>> new_words
['Apples', 'Bananas', 'Pears', 'Pears']
You can just write:
>>> [word.title() for word in fruits]
['Apples', 'Bananas', 'Pears', 'Pears']
This, dear reader, is a list comprehension. Beautiful isn’t it?
No indentation or colon to remember. No empty list to define then build up. Just the same familiar brackets you already know and love for indicating a list (or as you’ll see shortly, a dictionary or set).
Generators
Change the square brackets to parentheses and you create something called a generator expression (usually just called a generator). These are like list comprehensions but they’re described as “lazy” because they don’t evaluate what’s inside them until each value is actually requested. They’re great for minimising memory use, and often for improving speed too, especially when you’re dealing with real data and large files, not these toy tutorial examples:
>>> (word.title() for word in fruits)
<generator object <genexpr> at 0x000001A2A97D20A0>
>>> generator = _ # In a REPL session, "_" means "the previous output"
>>> next(generator)
'Apples'
>>> list(generator) # Notice that 'Apples' has already been consumed
['Bananas', 'Pears', 'Pears']
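To put some numbers on the “lazy” idea, here’s a minimal sketch comparing a list comprehension with the equivalent generator. The data (a run of square numbers) and the variable names are made up for illustration, and the exact sizes reported will vary between Python versions:

```python
import sys

# Hypothetical data: the first 100,000 square numbers.
squares_list = [n * n for n in range(100_000)]   # every value is built up front
squares_gen = (n * n for n in range(100_000))    # nothing is calculated yet

# The generator object stays small however many values it will eventually yield.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)  # True

# Functions like sum() pull values from a generator one at a time.
total = sum(squares_gen)

# A generator can only be consumed once, so it is now empty.
leftover = list(squares_gen)
print(leftover)  # []
```

The last two lines are the thing to watch out for: if you need to loop over the results more than once, a list is usually the safer choice.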
There are subtle differences between generators, iterators and iterables in Python which you might like to Google, but this article is intended as a practical crash-course to get you using the tools, even if you can’t precisely categorise or define them. So let’s move on…
Set Comprehensions
Change the square brackets to curly ones and you have yourself a set comprehension which is a great way of filtering out duplicates or finding the differences or overlaps with other sets of data:
>>> {word.title() for word in fruits}
{'Apples', 'Bananas', 'Pears'}
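Here’s a rough sketch of the “differences or overlaps” idea, using two made-up lists (basket and wishlist are my own names, not anything built in). Once both are sets, Python’s ordinary set operators do the rest:

```python
# Hypothetical data for illustration.
basket = ["apples", "pears", "pears", "plums"]
wishlist = ["Pears", "Melons", "Apples"]

# Normalise the capitalisation and drop duplicates in one step.
basket_set = {word.title() for word in basket}
wishlist_set = {word.title() for word in wishlist}

overlap = basket_set & wishlist_set          # items in both sets
only_in_basket = basket_set - wishlist_set   # items in basket but not in wishlist

print(overlap == {"Apples", "Pears"})   # True
print(only_in_basket == {"Plums"})      # True
```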
Dictionary Comprehensions
You can create dictionary comprehensions using curly brackets and starting off with the pattern {key: value for… }:
>>> {x.title(): fruits.count(x) for x in fruits}
{'Apples': 1, 'Bananas': 1, 'Pears': 2}
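One thing worth knowing about the example above: fruits.count(x) re-scans the whole list for every item, which is fine for four fruits but gets slow on long lists. A possible alternative (a swapped-in technique, not the one used in this article) is to count everything once with collections.Counter from the standard library, then build the dictionary from the tallies:

```python
from collections import Counter

fruits = ["apples", "bananas", "pears", "pears"]

# Count every item in a single pass over the list.
tally = Counter(fruits)

# Unpack each (word, count) pair into the key: value pattern.
counts = {word.title(): n for word, n in tally.items()}

print(counts == {"Apples": 1, "Bananas": 1, "Pears": 2})  # True
```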
Unpacking Values
You can extract or “unpack” more than one item (e.g. keys and values from a dictionary, or multiple values from a list of tuples) into any type of comprehension or generator using one of these patterns:
[… for x, y, z in your_iterable]
{… for x, y, z in your_iterable}
(… for x, y, z in your_iterable)
{…: … for x, y, z in your_iterable}
For example:
>>> fruits = [("apples", "2", "round"), ("bananas", "8", "curved")]
>>> [(x,z) for x,y,z in fruits]
[('apples', 'round'), ('bananas', 'curved')]
You can also unpack variable-length tuples with a special use of the * character, meaning “unpack into a list of zero or more values”:
>>> fruits = [("apples", "green"), ("bananas", "yellow", "curved")]
>>> [f"{x.title()} are normally {' and '.join(y)}" for x, *y in fruits]
['Apples are normally green', 'Bananas are normally yellow and curved']
If the f"…" pattern in the second line (above) is a new syntax for you, it’s a great tool to add to your tool-kit. Just Google “f-strings in Python”.
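Going back to the * pattern for a moment: the starred name always receives a list, and that list is simply empty when there’s nothing left to unpack. Here’s a small sketch with made-up data (the fallback wording is my own invention) showing the empty case and one way of handling it:

```python
# Hypothetical data: the first tuple has no extra values at all.
fruits = [("apples",), ("bananas", "yellow", "curved")]

# The starred name always gets a list, which may be empty.
rest = [y for x, *y in fruits]
print(rest)  # [[], ['yellow', 'curved']]

# Fall back to a different message when there is nothing to join.
descriptions = [
    f"{x.title()} are normally {' and '.join(y)}" if y
    else f"{x.title()} have no description"
    for x, *y in fruits
]
print(descriptions)
```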
You’ll often see the following kinds of pattern used to unpack a Python dictionary:
>>> fruits = {"apples": "green", "bananas": "yellow", "pears": "green"}
>>> {f"{k} are {v}" for k,v in fruits.items()}
{'bananas are yellow', 'apples are green', 'pears are green'}
>>> [k for k in fruits]  # A list of keys
['apples', 'bananas', 'pears']
>>> [v for v in fruits.values()]  # A list of values
['green', 'yellow', 'green']
>>> {v for v in fruits.values()}  # A set of (unique) values
{'green', 'yellow'}
Notice that the order of items in a Python set isn’t fixed, so the order of the output above doesn’t necessarily follow the order of the dictionary we unpacked.
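The same .items() pattern works for building a new dictionary too. This is just a sketch (shouting and inverted are names I’ve made up), but it shows one gotcha: when you swap keys and values, any repeated value becomes a repeated key, and the last one wins:

```python
fruits = {"apples": "green", "bananas": "yellow", "pears": "green"}

# Transform the keys and keep the values as they are.
shouting = {k.upper(): v for k, v in fruits.items()}

# Swap keys and values. "green" appears twice, so "pears" overwrites "apples".
inverted = {v: k for k, v in fruits.items()}

print(shouting == {"APPLES": "green", "BANANAS": "yellow", "PEARS": "green"})  # True
print(inverted == {"green": "pears", "yellow": "bananas"})                     # True
```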
Filtering Values
You can filter your results by adding if followed by an expression:
>>> fruits = ["apples", "pears", "pears", "", None, False, 0, [], {}, ()]
>>> {x.title(): fruits.count(x) for x in fruits if x}
{'Apples': 1, 'Pears': 2}
>>> exclusions = "PEARS ORANGES MELONS".split()
>>> {x.title() for x in fruits if x and x.upper() not in exclusions}
{'Apples'}
The last few values in fruits are examples of so-called “falsey” expressions in Python. They’re considered to be False when it comes to evaluating if x, and this is a nice concise way of excluding them from your results.
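A word of caution with the if x shortcut: the number 0 is falsey too, so it gets thrown out along with everything else. If zero is a legitimate value in your data, a more specific test is safer. A quick sketch with made-up readings:

```python
# Hypothetical data: None means "missing", but 0 is a real reading.
readings = [3, 0, None, 7, 0, None]

# "if x" drops the zeros as well as the missing values.
truthy_only = [x for x in readings if x]
print(truthy_only)  # [3, 7]

# Testing for None explicitly keeps the zeros.
not_missing = [x for x in readings if x is not None]
print(not_missing)  # [3, 0, 7, 0]
```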
And finally, you can throw in the keyword else to assign alternative values in your comprehension, but notice the word order now needs to follow the pattern <value> if x else <other value>:
>>> fruits = ["apples", "pears", "pears", "", None, False, 0, [], {}, ()]
>>> {x.upper() if x else "<falsey>" for x in fruits}  # Notice the order of a set isn't fixed
{'APPLES', '<falsey>', 'PEARS'}
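It took me a while to spot that these are two different jobs: an if at the end decides which items get in, while if … else at the front decides what each item turns into. You can use both in the same comprehension. A minimal sketch, with made-up words and an arbitrary length cut-off of my own choosing:

```python
# Hypothetical data for illustration.
words = ["apples", "", "pears", None, "kiwis"]

labels = [
    w.upper() if len(w) > 5 else w   # front: choose the value to keep
    for w in words
    if w                             # end: filter out the falsey items first
]
print(labels)  # ['APPLES', 'pears', 'kiwis']
```

The filter at the end runs before the expression at the front, which is why len(w) never gets called on None here.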
Nested Comprehensions
Here’s a simple example of a nested list or “list of lists”:
>>> nest1 = ['egg1', 'egg2']
>>> nest2 = ['egg3', 'egg4', 'egg5']
>>> trees = [nest1, nest2]
>>> trees
[['egg1', 'egg2'], ['egg3', 'egg4', 'egg5']]
A common task is to extract all the lowest order elements from a nested list like this, or in other words, to “flatten” it. We can do this easily and succinctly with a nested list comprehension:
>>> [x for y in trees for x in y]
['egg1', 'egg2', 'egg3', 'egg4', 'egg5']
What about flattening a nested dictionary using a list comprehension? Let’s use some dog breeds as an example this time:
>>> dog_breeds = {
...     "Terrier": ["Patterdale", "Border"],
...     "Other": ["Dalmatian", "Poodle", "Whippet"],
... }
Did you work it out on your own?
>>> [dog for breed in dog_breeds.values() for dog in breed]
['Patterdale', 'Border', 'Dalmatian', 'Poodle', 'Whippet']
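If you’d rather keep hold of the dictionary keys while flattening, one option is to loop over .items() instead of .values(). This is only a sketch with a cut-down dictionary of my own (kennel, group and breeds are made-up names):

```python
# Hypothetical, shortened data for illustration.
kennel = {
    "Terrier": ["Border"],
    "Other": ["Poodle", "Whippet"],
}

# Unpack each key and its list, then loop over the list.
pairs = [(group, dog) for group, breeds in kennel.items() for dog in breeds]
print(pairs)  # [('Terrier', 'Border'), ('Other', 'Poodle'), ('Other', 'Whippet')]
```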
Big Reveal
When you read the one-liners above, I hope you’ll agree they look elegant and concise, and that it’s pretty easy to see what’s going on. The tricky part, though, will be trying to reconstruct the pesky thing yourself several weeks from now while staring at a blank screen (especially if you forget to bookmark it or follow me after reading this!).
I’ve been coding professionally for several years and I’m not ashamed to admit it’s taken me ages to truly master this simple nested comprehension to the point of being able to write it without deliberate thought. For a while I even resorted to creating code snippets in my IDE but still it didn’t “stick” and I found myself back on Stack Overflow time and time again…
The good news for me and you is that I had a “Eureka” moment not long ago and I’ve prepared a short animation to help remember how to construct these mighty one-liners. Basically you just have to visualise what the traditional way of writing for loops would look like, then slide the lines into the right position between square brackets. A picture speaks a thousand words, so here it is:
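The same idea can also be written out in code. Treat this as my rough sketch of the trick rather than a frame-by-frame copy of the animation; the names nest and egg are just illustrative:

```python
trees = [["egg1", "egg2"], ["egg3", "egg4", "egg5"]]

# Traditional version.
eggs = []
for nest in trees:          # outer loop
    for egg in nest:        # inner loop
        eggs.append(egg)    # the value you want to keep

# Comprehension: the value slides to the front,
# and the two for lines keep exactly the same top-to-bottom order.
flat = [
    egg
    for nest in trees
    for egg in nest
]

print(flat == eggs)  # True
```

Once it’s laid out over several lines like this, joining it back up into a one-liner is just a matter of deleting the line breaks.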
Final Points to Ponder
It’s easy to get over-enthused about comprehensions and generators, and as a general rule I’d suggest that if you’re starting to spill over into two or more lines, your code is becoming unreadable and you might like to consider:
- Going back to using a traditional for loop where you can write each step or transformation on a separate (shorter) line.
- Defining a function that contains each step or transformation on a separate line, then using:
[my_function(x) for x in my_iterable]
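As a minimal sketch of that second option, here’s a made-up helper called tidy (the function, its steps and the sample data are all hypothetical) that keeps each transformation on its own line and leaves the comprehension short:

```python
def tidy(text):
    """Hypothetical helper: one transformation per line."""
    text = text.strip()              # remove surrounding whitespace
    text = text.replace("_", " ")    # swap underscores for spaces
    return text.title()              # capitalise each word


raw = ["  granny_smith ", "conference_pear"]
cleaned = [tidy(x) for x in raw]
print(cleaned)  # ['Granny Smith', 'Conference Pear']
```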
Opinions vary on these next suggestions, but I find it quite difficult coming up with meaningful variable names at the best of times, let alone for a one-line comprehension, so I tend to go with as short a name as possible based on one of the following approaches:
- Simple ‘algebraic’ names e.g.
[(x, y) for x, y in coordinates]
- Matching singular/plural names e.g.
[tree for tree in trees]
- Short, general-purpose names like index, count, item, key, value, element, sublist, group, result, text, word, prefix, suffix, body, row, column, field, cell, line, page, sheet, book, tag, match, first, last, nth e.g.:
{index: item for index, item in enumerate(my_iterable)}
Test Your Mastery
You only get so far by (passively) reading an article like this, so I’ll finish with a challenge for you… Well two challenges actually:
CHALLENGE 1: Reinforce your learning by copying each of the code snippets from this article into your Python IDE, and let me know if you find any typos or errors.
CHALLENGE 2: Now you have the nested comprehensions animation firmly in your mind, try writing and testing a one-liner for flattening a list of lists of lists, or in other words, a 3-level deep nested list.
I hope this article has shown you some of the powerful ways you can build on Python’s basic list comprehension pattern to do some pretty useful things with very little code or hard work.
Good luck with the challenges, and please let me know if you have any handy examples of your own to share in the comments below.