A crash course in Python “comprehensions” and “generators”

I love Python’s list comprehensions and generators.

They keep my code concise. They’re great one-liners for exploring and “munging” data. They’re intuitive, and once you know what you’re looking at, very easy to read.

Haven’t heard of them? Read on! Think you know all the variations of this popular Python construct already? Read on…

Spoiler Alert: There’s a great little video at the end of this article designed to help intermediate-level coders with nested list comprehensions.

So… what are “comprehensions” exactly?

The best way of explaining these little beauties is to show you what they’re intended to replace and improve on. Instead of this traditional for loop:

>>> fruits = ["apples", "bananas", "pears", "pears"]
>>> new_words = []
>>> for word in fruits:
...     new_words.append(word.title())
...
>>> new_words
['Apples', 'Bananas', 'Pears', 'Pears']

You can just write:

>>> [word.title() for word in fruits]
['Apples', 'Bananas', 'Pears', 'Pears']

This, dear reader, is a list comprehension. Beautiful isn’t it?

No indentation or colon to remember. No empty list to define then build up. Just the same familiar brackets you already know and love for indicating a list (or as you’ll see shortly, a dictionary or set).

Generators

Change the square brackets to round brackets (parentheses) and you create something called a generator (strictly speaking, a generator expression that produces a generator object). These are like list comprehensions but they’re described as “lazy” because they don’t evaluate each item until the very last minute when it’s actually requested. Because items are produced one at a time rather than stored all at once, they keep memory use low and can speed up your code, especially when you’re dealing with real data and large files, not these toy tutorial examples:

>>> (word.title() for word in fruits) 
<generator object <genexpr> at 0x000001A2A97D20A0>
>>> generator = _ # In a REPL session, "_" means "the previous output"
>>> next(generator)
'Apples'
>>> list(generator) # Notice that 'Apples' has already been consumed
['Bananas', 'Pears', 'Pears']
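
To make the memory point a bit more concrete, here’s a minimal sketch comparing a list comprehension with the equivalent generator. The variable names and the figure of 100,000 items are my own, chosen purely for illustration, and exact sizes vary between Python versions so I haven’t quoted any:

```python
import sys

# Build 100,000 squares eagerly (list) and lazily (generator).
squares_list = [n * n for n in range(100_000)]
squares_gen = (n * n for n in range(100_000))

# The list stores a reference to every result; the generator only stores its own state.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True

# A generator can be fed straight into functions like sum(), one item at a time.
total = sum(squares_gen)
print(total == sum(squares_list))  # True

# Once consumed, a generator is exhausted and yields nothing more.
leftover = list(squares_gen)
print(leftover)  # []
```

The last step is the main trade-off to keep in mind: a list can be read as many times as you like, whereas a generator can only be walked through once.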

There are subtle differences between generators, iterators and iterables in Python which you might like to Google, but this article is intended as a practical crash-course to get you using the tools, even if you can’t precisely categorise or define them. So let’s move on…

Set Comprehensions

Change the square brackets to curly ones and you have yourself a set comprehension which is a great way of filtering out duplicates or finding the differences or overlaps with other sets of data:

>>> {word.title() for word in fruits}
{'Apples', 'Bananas', 'Pears'}
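
As a rough sketch of the “differences or overlaps” idea, here are two set comprehensions compared with Python’s built-in set operators. The second list, other_fruits, is made up for this example:

```python
fruits = ["apples", "bananas", "pears", "pears"]
other_fruits = ["pears", "oranges", "melons"]  # hypothetical second data set

fruit_set = {word.title() for word in fruits}
other_set = {word.title() for word in other_fruits}

overlap = fruit_set & other_set          # items that appear in both sets
only_in_fruits = fruit_set - other_set   # items in the first set but not the second

print(overlap)                 # {'Pears'}
print(sorted(only_in_fruits))  # ['Apples', 'Bananas']
```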

Dictionary Comprehensions

You can create dictionary comprehensions using curly brackets and starting off with the pattern {key: value for… }:

>>> {x.title(): fruits.count(x) for x in fruits} 
{'Apples': 1, 'Bananas': 1, 'Pears': 2}
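
One thing worth knowing about the example above: fruits.count(x) rescans the whole list for every item, which is fine for a toy list but can get slow on bigger data. A possible alternative, sketched here, is to let collections.Counter do the counting in a single pass and then reshape the result with a dictionary comprehension:

```python
from collections import Counter

fruits = ["apples", "bananas", "pears", "pears"]

# Counter tallies every item in a single pass over the list.
counts = Counter(fruits)

# A dictionary comprehension can then reshape the result, e.g. title-casing the keys.
title_counts = {name.title(): count for name, count in counts.items()}
print(title_counts)  # {'Apples': 1, 'Bananas': 1, 'Pears': 2}
```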

Unpacking Values

You can extract or “unpack” more than one item (e.g. keys and values from a dictionary, or multiple values from a list of tuples) into any type of comprehension or generator using one of these patterns:

[… for x, y, z in your_iterable]
{… for x, y, z in your_iterable}
(… for x, y, z in your_iterable)
{…: … for x, y, z in your_iterable}

For example:

>>> fruits = [("apples", "2", "round"), ("bananas", "8", "curved")]
>>> [(x,z) for x,y,z in fruits]
[('apples', 'round'), ('bananas', 'curved')]

You can also unpack variable-length tuples with a special use of the * character, meaning “unpack everything that’s left (zero or more values) into a list”:

>>> fruits = [("apples", "green"), ("bananas", "yellow", "curved")]
>>> [f"{x.title()} are normally {' and '.join(y)}" for x, *y in fruits]
['Apples are normally green', 'Bananas are normally yellow and curved']
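
The starred name always receives a list, and that list can be empty when there is nothing left to capture. A quick sketch, using a made-up one-item tuple to show the empty case:

```python
# The first tuple has no extra values after the name (illustrative data).
fruits = [("apples",), ("bananas", "yellow", "curved")]

# `name` takes the first value; `extras` collects whatever remains as a list.
described = [(name, extras) for name, *extras in fruits]
print(described)  # [('apples', []), ('bananas', ['yellow', 'curved'])]
```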

If the f"…" pattern in the second line (above) is a new syntax for you, it’s a great tool to add to your tool-kit. Just Google f-strings in Python.
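
In case it saves you a search, here’s a tiny f-string sketch. The variables (including the price) are invented for the example:

```python
fruit = "bananas"
count = 8
price = 0.25  # hypothetical price per item, purely for illustration

# Anything inside {...} is evaluated as a Python expression.
# `:.2f` formats a number to two decimal places; `!r` shows the repr() of a value.
cost_message = f"{count} {fruit} cost {count * price:.2f}"
length_message = f"{fruit.title()!r} has {len(fruit)} letters"

print(cost_message)    # 8 bananas cost 2.00
print(length_message)  # 'Bananas' has 7 letters
```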

You’ll often see the following kinds of pattern used to unpack a Python dictionary:

>>> fruits = {"apples": "green", "bananas": "yellow", "pears": "green"}
>>> {f"{k} are {v}" for k,v in fruits.items()}
{'bananas are yellow', 'apples are green', 'pears are green'}

>>> [k for k in fruits] # A list of keys
['apples', 'bananas', 'pears']

>>> [v for v in fruits.values()] # A list of values
['green', 'yellow', 'green']

>>> {v for v in fruits.values()} # A set of (unique) values
{'green', 'yellow'}

Notice that the order of items in a Python set isn’t fixed, so the order of the output above doesn’t necessarily follow the order of the dictionary we unpacked.
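
If you need the output in a predictable order, one option (a sketch, not the only way) is to wrap the set comprehension in sorted(), which always returns a list. Dictionary comprehensions don’t have this problem because dictionaries keep their insertion order in Python 3.7 and later:

```python
fruits = {"apples": "green", "bananas": "yellow", "pears": "green"}

# sorted() returns a list, so the order is predictable regardless of set ordering.
colours = sorted({v for v in fruits.values()})
print(colours)  # ['green', 'yellow']

# Dictionaries keep insertion order (Python 3.7+), so this output is stable too.
lengths = {k: len(k) for k in fruits}
print(lengths)  # {'apples': 6, 'bananas': 7, 'pears': 5}
```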

Filtering Values

You can filter your results by adding if followed by an expression:

>>> fruits = ["apples", "pears", "pears", "", None, False, 0, [], {}, ()] 
>>> {x.title(): fruits.count(x) for x in fruits if x}
{'Apples': 1, 'Pears': 2}

>>> exclusions = "PEARS ORANGES MELONS".split()
>>> {x.title() for x in fruits if x and x.upper() not in exclusions}
{'Apples'}

Everything after the second "pears" in fruits is an example of a so-called “falsey” value in Python. These values are treated as False when evaluating if x, which makes this a nice concise way of excluding them from your results.
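
Here’s a small sketch showing how Python judges each of those values, plus one caveat: 0 is falsey too, so if x will silently drop legitimate zeros. The readings list is made-up data to illustrate the point:

```python
values = ["apples", "", None, False, 0, [], {}, ()]

# bool() shows how each value is treated by `if x`.
truthiness = [bool(x) for x in values]
print(truthiness)  # [True, False, False, False, False, False, False, False]

# If zero is a legitimate value in your data, filter on None explicitly instead.
readings = [3, 0, None, 7]  # hypothetical readings where 0 is a real measurement
truthy_only = [x for x in readings if x]
not_none = [x for x in readings if x is not None]
print(truthy_only)  # [3, 7]
print(not_none)     # [3, 0, 7]
```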

And finally, you can throw in the keyword else to assign alternative values in your comprehension, but notice the word order changes: the condition now goes before the for, following the pattern <value> if x else <other value>:

>>> fruits = ["apples", "pears", "pears", "", None, False, 0, [], {}, ()] 
>>> {x.upper() if x else "<falsey>" for x in fruits} # Notice the order of a set isn't fixed
{'APPLES', '<falsey>', 'PEARS'}
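
The two kinds of if can be combined in one comprehension, which is where the word order really matters. In this sketch (the length rule is arbitrary, just something to test against), the if at the end decides which items are kept, while the if/else at the front decides what each kept item turns into:

```python
fruits = ["apples", "pears", "kiwis", "", None]

# `if x` at the end filters items out; `... if ... else ...` at the front
# chooses what value to produce for each item that survives the filter.
labels = [x.upper() if len(x) > 5 else x for x in fruits if x]
print(labels)  # ['APPLES', 'pears', 'kiwis']
```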

Nested Comprehensions

Here’s a simple example of a nested list or “list of lists”:

>>> nest1 = ['egg1', 'egg2']
>>> nest2 = ['egg3', 'egg4', 'egg5']
>>> trees = [nest1, nest2]
>>> trees
[['egg1', 'egg2'], ['egg3', 'egg4', 'egg5']]

A common task is to extract all the lowest order elements from a nested list like this, or in other words, to “flatten” it. We can do this easily and succinctly with a nested list comprehension:

>>> [x for y in trees for x in y]
['egg1', 'egg2', 'egg3', 'egg4', 'egg5']
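
For comparison, the standard library offers another way to flatten one level of nesting: itertools.chain.from_iterable. This sketch simply checks that both approaches give the same answer, and which one reads better is a matter of taste:

```python
from itertools import chain

trees = [["egg1", "egg2"], ["egg3", "egg4", "egg5"]]

# The nested comprehension from above.
flat_comprehension = [x for y in trees for x in y]

# chain.from_iterable lazily joins the inner lists end to end.
flat_chain = list(chain.from_iterable(trees))

print(flat_chain)                        # ['egg1', 'egg2', 'egg3', 'egg4', 'egg5']
print(flat_comprehension == flat_chain)  # True
```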

What about flattening a dictionary of lists using a list comprehension? Let’s use some dog breeds as an example this time, and see if you can work out the one-liner before reading on:

>>> dog_breeds = {
...     "Terrier": ["Patterdale", "Border"],
...     "Other": ["Dalmatian", "Poodle", "Whippet"],
... }

Did you work it out on your own?

>>> [dog for breed in dog_breeds.values() for dog in breed]
['Patterdale', 'Border', 'Dalmatian', 'Poodle', 'Whippet']

Big Reveal

When you read the one-liners above, I hope you’ll agree they look elegant and concise, and that it’s pretty easy to see what’s going on. The tricky part, though, will be trying to reconstruct the pesky thing yourself several weeks from now while staring at a blank screen (especially if you forget to bookmark this article or follow me after reading it!).

I’ve been coding professionally for several years and I’m not ashamed to admit it’s taken me ages to truly master this simple nested comprehension to the point of being able to write it without deliberate thought. For a while I even resorted to creating code snippets in my IDE but still it didn’t “stick” and I found myself back on Stack Overflow time and time again…

The good news for me and you is that I had a “Eureka” moment not long ago and I’ve prepared a short animation to help you remember how to construct these mighty one-liners. Basically you just have to visualise what the traditional way of writing for loops would look like, then slide the lines into the right position between square brackets. A picture speaks a thousand words, so here it is:

How to remember nested comprehensions. Video by author.
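
If you prefer to see the same idea as code, here’s my rough text version of the rule (a sketch of the principle, not a transcript of the animation). Write the loops the traditional way, then move the value to the front and keep the for clauses in the same order:

```python
trees = [["egg1", "egg2"], ["egg3", "egg4", "egg5"]]

# The traditional version: outer loop first, inner loop second, value last.
eggs_loop = []
for nest in trees:
    for egg in nest:
        eggs_loop.append(egg)

# The comprehension: the value moves to the front, and the `for` clauses
# keep exactly the same top-to-bottom order as the loops above.
eggs_comprehension = [egg for nest in trees for egg in nest]

print(eggs_comprehension == eggs_loop)  # True
```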

Final Points to Ponder

It’s easy to get over-enthused about comprehensions and generators, and as a general rule I’d suggest that if you’re starting to spill over into two or more lines, your code is becoming unreadable and you might like to consider:

  1. Going back to using a traditional for loop where you can write each step or transformation on a separate (shorter) line.
  2. Defining a function that contains each step or transformation on a separate line, then use: [my_function(x) for x in my_iterable]
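
The second option might look something like this. The helper clean_name and the sample data are hypothetical, invented just to show the shape of the approach:

```python
def clean_name(raw):
    """Hypothetical helper: tidy one raw fruit name, one step per line."""
    text = raw.strip()              # remove surrounding whitespace
    text = text.replace("_", " ")   # swap underscores for spaces
    return text.title()             # capitalise each word

raw_names = ["  apples", "blood_oranges ", "PEARS"]  # made-up sample data

# The comprehension stays short because the detail lives in the function.
clean_names = [clean_name(x) for x in raw_names]
print(clean_names)  # ['Apples', 'Blood Oranges', 'Pears']
```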

Opinions vary on these next suggestions, but I find it quite difficult coming up with meaningful variable names at the best of times, let alone for a one-line comprehension, so I tend to go with as short a name as possible based on one of the following approaches:

  1. Simple ‘algebraic’ names e.g. [(x, y) for x, y in coordinates]
  2. Matching singular/plural names e.g. [tree for tree in trees]
  3. Short, general-purpose names like index, count, item, key, value, element, sublist, group, result, text, word, prefix, suffix, body, row, column, field, cell, line, page, sheet, book, tag, match, first, last, nth e.g.: {index: item for index, item in enumerate(my_iterable)}

Test Your Mastery

You only get so far by (passively) reading an article like this, so I’ll finish with a challenge for you… Well two challenges actually:

CHALLENGE 1: Reinforce your learning by copying each of the code snippets from this article into your Python IDE, and let me know if you find any typos or errors.

CHALLENGE 2: Now you have the nested comprehensions animation firmly in your mind, try writing and testing a one-liner for flattening a list of lists of lists, or in other words, a 3-level deep nested list.

I hope this article has shown you some of the powerful ways you can build on Python’s basic list comprehension pattern to do some pretty useful things with very little code or hard work.

Good luck with the challenges, and please let me know if you have any handy examples of your own to share in the comments below.

Pete Fison

Former IT Director & Video Producer. Experienced Python developer with a special interest in web scraping, ETL, NLP, AI/ML, rich media/social media and YouTube