List comprehensions in Python are concise, syntactic constructs. They can be utilized to generate lists from other lists by applying functions to each element in the list. The following section explains and demonstrates the use of these expressions.
List Comprehensions
A list comprehension creates a new list by applying an expression to each element of an iterable. The most basic form is:
[ <expression> for <element> in <iterable> ]
There's also an optional 'if' condition:
[ <expression> for <element> in <iterable> if <condition> ]
Each <element> in the <iterable> is plugged in to the <expression> if the (optional) <condition> evaluates to true . All results are returned at once in the new list. Generator expressions are evaluated lazily, but list comprehensions evaluate the entire iterator immediately - consuming memory proportional to the iterator's length.
To create a list of squared integers:
squares = [x * x for x in (1, 2, 3, 4)]
# squares: [1, 4, 9, 16]
The for expression sets x to each value in turn from (1, 2, 3, 4). The result of the expression x * x is appended to an internal list. The internal list is assigned to the variable squares when completed.
Besides a speed increase (as explained here), a list comprehension is roughly equivalent to the following for-loop:
squares = []
for x in (1, 2, 3, 4):
squares.append(x * x)
# squares: [1, 4, 9, 16]
The expression applied to each element can be as complex as needed:
# Get a list of uppercase characters from a string
[s.upper() for s in "Hello World"]
# ['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']
# Strip off any commas from the end of strings in a list
[w.strip(',') for w in ['these,', 'words,,', 'mostly', 'have,commas,']]
# ['these', 'words', 'mostly', 'have,commas']
# Organize letters in words more reasonably - in an alphabetical order
sentence = "Beautiful is better than ugly"
["".join(sorted(word, key = lambda x: x.lower())) for word in sentence.split()]
# ['aBefiltuu', 'is', 'beertt', 'ahnt', 'gluy']
else
else can be used in List comprehension constructs, but be careful regarding the syntax. The if/else clauses should be used before for loop, not after:
# create a list of characters in apple, replacing non vowels with '*'
# Ex - 'apple' --> ['a', '*', '*', '*' ,'e']
[x for x in 'apple' if x in 'aeiou' else '*']
#SyntaxError: invalid syntax
# When using if/else together use them before the loop
[x if x in 'aeiou' else '*' for x in 'apple']
#['a', '*', '*', '*', 'e']
Note this uses a different language construct, a conditional expression, which itself is not part of the comprehension syntax. Whereas the if after the for…in is a part of list comprehensions and used to filter elements from the source iterable.
Double Iteration
Order of double iteration [... for x in ... for y in ...] is either natural or counter-intuitive. The rule of thumb is to follow an equivalent for loop:
def foo(i):
return i, i + 0.5
for i in range(3):
for x in foo(i):
yield str(x)
This Becomes:
[str(x)
for i in range(3)
for x in foo(i) ]
This can be compressed into one line as [str(x) for i in range(3) for x in foo(i)]
In-place Mutation and Other Side Effects
Before using list comprehension, understand the difference between functions called for their side effects (mutating, or in-place functions) which usually return None, and functions that return an interesting value.
Many functions (especially pure functions) simply take an object and return some object. An in-place function modifies the existing object, which is called a side effect. Other examples include input and output operations such as printing.
list.sort() sorts a list in-place (meaning that it modifies the original list) and returns the value None. Therefore, it won't work as expected in a list comprehension:
[x.sort() for x in [[2, 1], [4, 3], [0, 1]]]
# [None, None, None]
Instead, sorted() returns a sorted list rather than sorting in-place:
[sorted(x) for x in [[2, 1], [4, 3], [0, 1]]]
# [[1, 2], [3, 4], [0, 1]]
Using comprehensions for side-effects is possible, such as I/O or in-place functions. Yet a for loop is usually more readable. While this works in Python 3:
[print(x) for x in (1, 2, 3)]
Instead use:
for x in (1, 2, 3):
print(x)
In some situations, side effect functions are suitable for list comprehension. random.randrange() has the side effect of changing the state of the random number generator, but it also returns an interesting value. Additionally, next() can be called on an iterator.
The following random value generator is not pure, yet makes sense as the random generator is reset every time the expression is evaluated:
from random import randrange
[randrange(1, 7) for _ in range(10)]
# [2, 3, 2, 1, 1, 5, 2, 4, 3, 5]
Whitespace in list comprehensions:
More complicated list comprehensions can reach an undesired length, or become less readable. Although less common in examples, it is possible to break a list comprehension into multiple lines like so:
[
x for x
in 'foo'
if x not in 'bar'
]
Conditional List Comprehensions
Given a list comprehension you can append one or more if conditions to filter values.
[ <expression> for <element> in <iterable> if <condition> ]
For each <element> in the <iterable>; if the (optional) <condition> evaluates to True,add <expression> (usually a function of <element>) to the returned list.
For example, this can be used to extract only even numbers from a sequence of integers:
[x for x in range(10) if x % 2 == 0]
# Out: [0, 2, 4, 6, 8] .
The above code is equivalent to:
even_numbers = []
for x in range(10):
if x % 2 == 0:
even_numbers.append(x)
print(even_numbers)
# Out: [0, 2, 4, 6, 8]
Also, a conditional list comprehension of the form [e for x in y if c] (where e and c are expressions in terms of x) is equivalent to list(filter(lambda x: c, map(lambda x: e, y))).
Despite providing the same result, pay attention to the fact that the former example is almost 2x faster than the latter one. For those who are curious, this is a nice explanation of the reason why.
Note that this is quite different from the ... if ... else ... conditional expression (sometimes known as a ternary expression) that you can use for the part of the list comprehension. Consider the following example:
[x if x % 2 == 0 else None for x in range(10)]
# Out: [0, None, 2, None, 4, None, 6, None, 8, None]
Here the conditional expression isn't a filter, but rather an operator determining the value to be used for the list items:
<value-if-condition-is-true> if <condition> else <value-if-condition-is-false>
This becomes more obvious if you combine it with other operators:
[2 * (x if x % 2 == 0 else -1) + 1 for x in range(10)]
# Out: [1, -1, 5, -1, 9, -1, 13, -1, 17, -1]
Avoid repetitive and expensive operations using conditional clause :
Consider the below list comprehension:
>>> def f(x):
import time
time.sleep(.1) # Simulate expensive functio
return x**2
>>> [f(x) for x in range(1000) if f(x) > 10]
[16, 25, 36, ...]
This results in two calls to f(x) for 1,000 values of x: one call for generating the value and the other for checking the if condition. If f(x) is a particularly expensive operation, this can have significant performance implications. Worse, if calling f() has side effects, it can have surprising results.
Instead, you should evaluate the expensive operation only once for each value of x by generating an intermediate iterable (generator expression) as follows:
>>> [v for v in (f(x) for x in range(1000)) if v > 10]
[16, 25, 36, ...]
Or, using the builtin map equivalent:
>>> [v for v in map(f, range(1000)) if v > 10]
[16, 25, 36, ...]
Another way that could result in a more readable code is to put the partial result (v in the previous example) in an iterable (such as a list or a tuple) and then iterate over it. Since v will be the only element in the iterable, the result is that we now have a reference to the output of our slow function computed only once:
>>> [v for x in range(1000) for v in [f(x)] if v > 10]
[16, 25, 36, ...]