Python collections: Counter, defaultdict, deque, and More
Imagine you have a basic toolbox with a hammer and a screwdriver. They work fine for simple jobs. But when you're building a deck, you really want a power drill, a level, and a tape measure.
Python's built-in list and dict are your hammer and screwdriver. They handle most tasks. But the `collections` module gives you specialized tools that make certain jobs much easier and faster.
In this tutorial, you'll learn the five most useful tools from collections: Counter, defaultdict, deque, OrderedDict, and namedtuple. Each one solves a specific problem that would take many lines of code with basic data structures.
What Is Counter and How Does It Count Things?
Have you ever counted how many times each letter appears in a word? Or tallied votes in a classroom? That's exactly what Counter does. You give it a list (or string, or any iterable), and it counts how many times each item appears.
A Counter is a special dictionary where keys are the items and values are the counts. You can create one from any iterable, such as a list or a string, or from a dictionary that already maps items to counts.
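Here is a small sketch of the three ways to build a Counter described above. The sample data (the word "mississippi", the lunch votes, the fruit stock) is made up for illustration.

```python
from collections import Counter

# Count the characters in a string
letter_counts = Counter("mississippi")
print(letter_counts["s"])   # 4
print(letter_counts["z"])   # 0 (a missing item counts as zero, no KeyError)

# Count the items in a list
votes = ["pizza", "tacos", "pizza", "salad", "pizza", "tacos"]
vote_counts = Counter(votes)
print(vote_counts)          # Counter({'pizza': 3, 'tacos': 2, 'salad': 1})

# Start from a dictionary that already holds counts
stock = Counter({"apples": 3, "pears": 1})
print(stock["apples"])      # 3
```

Notice that asking for an item the Counter has never seen returns 0 instead of raising an error.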
The most_common() method is incredibly useful. It returns a list of (item, count) pairs sorted by count, from highest to lowest. You can pass a number to get just the top N items.
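A quick sketch of most_common() in action, using a made-up list of colors:

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red"]
color_counts = Counter(colors)

# With no argument, you get every item, highest count first
print(color_counts.most_common())   # [('red', 3), ('blue', 2), ('green', 1)]

# Pass a number to keep only the top N
top_two = color_counts.most_common(2)
print(top_two)                      # [('red', 3), ('blue', 2)]
```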
You can also add and subtract Counters, which is perfect for combining tallies or finding differences between two datasets.
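As a rough illustration, suppose you tallied fruit sales on two days (the names and numbers here are invented). Adding the Counters combines the tallies, and subtracting shows what one day had more of:

```python
from collections import Counter

monday = Counter(apples=3, bananas=1)
tuesday = Counter(apples=2, cherries=4)

# Adding combines the counts for every item
total = monday + tuesday
print(total)        # Counter({'apples': 5, 'cherries': 4, 'bananas': 1})

# Subtracting keeps only the items with a count above zero
difference = monday - tuesday
print(difference)   # Counter({'apples': 1, 'bananas': 1})
```

Cherries do not appear in the difference because the - operator drops any count that ends up at zero or below.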
How Does defaultdict Eliminate KeyError Forever?
With a regular dictionary, if you try to access a key that doesn't exist, Python raises a KeyError. This is annoying when you're building up a dictionary, because you always have to check if a key exists before using it.
# Without defaultdict: check for the key before every append
groups = {}
words = ['apple', 'banana', 'avocado', 'blueberry', 'cherry']
for word in words:
    first_letter = word[0]
    if first_letter not in groups:
        groups[first_letter] = []
    groups[first_letter].append(word)
print(groups)

# With defaultdict: a missing key starts out as an empty list
from collections import defaultdict
groups = defaultdict(list)
words = ['apple', 'banana', 'avocado', 'blueberry', 'cherry']
for word in words:
    groups[word[0]].append(word)
print(dict(groups))

A defaultdict automatically creates a default value when you look up a missing key with square brackets. You tell it what type of default to create: list, int, set, or any other callable that takes no arguments and returns a default value.
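The other default factories mentioned above work the same way. This is a minimal sketch with invented data, showing int, set, and a custom callable:

```python
from collections import defaultdict

# int() returns 0, so every new key starts counting from zero
letter_totals = defaultdict(int)
for letter in "banana":
    letter_totals[letter] += 1
print(dict(letter_totals))        # {'b': 1, 'a': 3, 'n': 2}

# set() returns an empty set, so repeated values are stored only once
tags_by_user = defaultdict(set)
pairs = [("ana", "python"), ("ben", "sql"), ("ana", "python"), ("ana", "git")]
for user, tag in pairs:
    tags_by_user[user].add(tag)
print(len(tags_by_user["ana"]))   # 2

# Any callable that takes no arguments works, including a lambda
status = defaultdict(lambda: "unknown")
print(status["server-1"])         # unknown
```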
Why Is deque Faster Than a List for Queues?
A regular Python list is great for adding and removing items from the end. But adding or removing from the beginning is slow because Python has to shift every other element over.
A deque (pronounced "deck") stands for "double-ended queue." It's designed to be fast at both ends. Think of it like a line of people where you can quickly add or remove someone from either the front or the back.
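To make the "line of people" picture concrete, here is a small sketch (the names are made up) that adds and removes people at both ends:

```python
from collections import deque

line = deque(["Ana", "Ben", "Cal"])

line.append("Dee")        # joins at the back
line.appendleft("Zed")    # joins at the front
print(list(line))         # ['Zed', 'Ana', 'Ben', 'Cal', 'Dee']

first = line.popleft()    # leaves from the front
last = line.pop()         # leaves from the back
print(first, last)        # Zed Dee
print(list(line))         # ['Ana', 'Ben', 'Cal']
```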
One of the coolest features of deque is the rotate() method. It shifts all elements left or right, wrapping around the edges. Think of it like a circular conveyor belt.
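Here is a short sketch of rotate() on a made-up deque of numbers. A positive number shifts items to the right, and a negative number shifts them to the left:

```python
from collections import deque

belt = deque([1, 2, 3, 4, 5])

belt.rotate(2)      # shift right by 2; the last two items wrap to the front
print(list(belt))   # [4, 5, 1, 2, 3]

belt.rotate(-1)     # shift left by 1; the first item wraps to the back
print(list(belt))   # [5, 1, 2, 3, 4]
```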
You can also create a deque with a maximum size using maxlen. When the deque is full and you add a new item, the item on the opposite end is automatically removed. This is perfect for keeping track of the last N items.
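As an illustration, this sketch keeps only the two most recent search terms (the variable name and the terms are invented for the example):

```python
from collections import deque

recent_searches = deque(maxlen=2)
for term in ["cats", "dogs", "birds"]:
    recent_searches.append(term)

# 'cats' was pushed out of the front when 'birds' arrived at the back
print(list(recent_searches))   # ['dogs', 'birds']

# Adding at the front pushes an item out of the back instead
recent_searches.appendleft("fish")
print(list(recent_searches))   # ['fish', 'dogs']
```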
Do You Still Need OrderedDict in Modern Python?
Since Python 3.7, regular dictionaries remember the order you inserted items. Before that, dictionaries had no guaranteed order, so OrderedDict was essential.
Today, OrderedDict still has a few useful tricks. The handiest is its move_to_end() method, which lets you reorder items without rebuilding the dictionary.
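A minimal sketch of move_to_end(), using a made-up chore list. By default the key moves to the end, and passing last=False moves it to the front instead:

```python
from collections import OrderedDict

tasks = OrderedDict(wash="dishes", fold="laundry", walk="dog")

tasks.move_to_end("wash")               # send 'wash' to the end
print(list(tasks))                      # ['fold', 'walk', 'wash']

tasks.move_to_end("walk", last=False)   # send 'walk' to the front
print(list(tasks))                      # ['walk', 'fold', 'wash']
```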
How Do namedtuples Make Your Code More Readable?
Regular tuples use index numbers to access data: point[0] for x and point[1] for y. But what does index 0 mean? You have to remember the order. A namedtuple lets you access items by name instead, making your code much easier to read.
# Regular tuples: you have to remember what each index means
point = (3, 7)
print(f'x={point[0]}, y={point[1]}')
student = ('Alice', 16, 'A')
print(f'{student[0]} is {student[1]} with grade {student[2]}')

# namedtuples: the field names say what each value is
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
point = Point(3, 7)
print(f'x={point.x}, y={point.y}')
Student = namedtuple('Student', ['name', 'age', 'grade'])
student = Student('Alice', 16, 'A')
print(f'{student.name} is {student.age} with grade {student.grade}')

A namedtuple is still a tuple, so it's immutable (you can't change its values after creation) and you can use it anywhere a regular tuple works. But it adds named fields that make your code self-documenting.
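The sketch below checks both claims: a namedtuple still behaves like a regular tuple, and its fields can't be reassigned. It also shows _replace(), which builds a new namedtuple with one field changed and leaves the original alone.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
point = Point(3, 7)

# It still works like a tuple: indexing and unpacking are fine
x, y = point
print(point[0] == point.x)   # True

# Fields cannot be reassigned after creation
try:
    point.x = 10
except AttributeError:
    print("can't change a namedtuple field")

# _replace() returns a new Point and leaves the original untouched
moved = point._replace(x=10)
print(moved)   # Point(x=10, y=7)
print(point)   # Point(x=3, y=7)
```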
Practice Exercises
Use Counter from the collections module to count how many times each character appears in the string 'abracadabra'. Print the Counter object.
Given the list words = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat', 'fish'], use Counter to find and print the 2 most common words. Use most_common(2).
Use defaultdict(list) to group the names ['Alice', 'Bob', 'Anna', 'Charlie', 'Ben'] by their first letter. Print the resulting dictionary (convert to a regular dict first with dict()).
What does this code print? Think about what appendleft and popleft do, then type the exact output.
from collections import deque
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
d.popleft()
print(list(d))
Create a deque with maxlen=3 called history. Append the values 1 through 5 one at a time. After the loop, print the deque converted to a list. It should only contain the last 3 items.
Create a namedtuple called Book with fields title, author, and pages. Create a book instance with title 'Python Basics', author 'Guido', and pages 350. Print the book's title and pages in the format: Python Basics has 350 pages.
This code tries to count word lengths using defaultdict, but it has a bug. The programmer used the wrong default factory. Fix it so that each word length maps to the count of words with that length.
Expected output: {5: 2, 6: 1, 3: 1}