Home› Tutorials› Data Structures
Data StructuresSets and Frozensets in Python: Unordered Collections with Powerful Maths
- Sets hold only unique values; duplicates are silently dropped on insertion.
- Set arithmetic uses operators (
|,&,-,^) or named methods for clarity. - Membership testing with
inon a set is O(1) versus O(n) on a list. frozensetis hashable, so it can be used as a dict key or inside another set.
Creating a set
A set is an unordered collection of unique, hashable objects. You create one with curly braces or the set() constructor. Note that an empty pair of curly braces creates a dict, not a set — always use set() for an empty set:
colours = {"red", "green", "blue"}
print(type(colours)) # <class 'set'>
# From an iterable — duplicates are dropped automatically
tags = set(["python", "tutorial", "python", "beginner"])
print(tags) # {"python", "tutorial", "beginner"} (order may vary)
# Empty set — NOT {}
empty = set()
print(type(empty)) # <class 'set'>
Because sets are unordered you cannot index them with s[0]. If you need ordered unique items, collect into a set to drop duplicates and then sort: sorted(set(items)).
Core operations: add, discard, remove
The three most common mutation methods are add(), discard(), and remove():
s = {"apple", "banana"}
s.add("cherry") # add a new item
s.add("apple") # already present — no error, no duplicate
s.discard("banana") # remove if present; silent if absent
s.discard("mango") # no KeyError
s.remove("cherry") # remove if present; raises KeyError if absent
# s.remove("mango") # would raise KeyError
Use discard() when you are not sure whether the element exists. Use remove() when its absence would be a genuine bug worth surfacing.
pop() removes and returns an arbitrary element; clear() empties the set in place.
Set arithmetic: union, intersection, difference
Python's set type supports the classic mathematical operations through both operators and method names. The two styles are equivalent, but method names read more clearly in complex expressions:
| Operation | Operator | Method | Returns |
|---|---|---|---|
| Union | a | b | a.union(b) | Items in a, b, or both |
| Intersection | a & b | a.intersection(b) | Items in both a and b |
| Difference | a - b | a.difference(b) | Items in a but not in b |
| Symmetric diff | a ^ b | a.symmetric_difference(b) | Items in either, not both |
| Subset check | a <= b | a.issubset(b) | True if a is contained in b |
| Superset check | a >= b | a.issuperset(b) | True if a contains b |
visitors_mon = {"alice", "bob", "carol"}
visitors_tue = {"bob", "carol", "dan"}
# Who visited on either day?
either = visitors_mon | visitors_tue
# {"alice", "bob", "carol", "dan"}
# Who visited both days?
both = visitors_mon & visitors_tue
# {"bob", "carol"}
# Who visited Monday but not Tuesday?
monday_only = visitors_mon - visitors_tue
# {"alice"}
# Who visited exactly one day?
exclusive = visitors_mon ^ visitors_tue
# {"alice", "dan"}
In-place variants (|=, &=, -=, ^=) update the left-hand set without creating a new object.
Membership testing is O(1)
Sets are backed by a hash table, so checking whether an item is present takes constant time regardless of how many elements the set contains. Lists, by contrast, scan from the beginning and take O(n) time:
import time
big_list = list(range(10_000_000))
big_set = set(big_list)
target = 9_999_999
# List lookup: scans up to 10 million elements
start = time.perf_counter()
found = target in big_list
print(f"list: {time.perf_counter() - start:.4f}s")
# Set lookup: single hash calculation
start = time.perf_counter()
found = target in big_set
print(f"set: {time.perf_counter() - start:.4f}s")
On a typical machine the set lookup is hundreds of times faster. Any time you are calling x in collection repeatedly in a loop, converting the collection to a set first is one of the highest-leverage micro-optimisations available.
Set comprehensions
Like list and dict comprehensions, set comprehensions use curly braces with a single expression (no colon). They are ideal for building a set of transformed or filtered values in one readable line:
words = ["apple", "banana", "apricot", "blueberry", "avocado"]
# Unique first letters
initials = {w[0] for w in words}
# {"a", "b"}
# Lengths of unique words
lengths = {len(w) for w in words}
# {5, 6, 9}
# Words longer than 6 characters, uppercased
long_upper = {w.upper() for w in words if len(w) > 6}
# {"BLUEBERRY", "APRICOT"} (order not guaranteed)
Frozensets: the immutable variant
frozenset is a set that cannot be modified after creation. Because it is immutable, it is also hashable, which means it can be stored inside another set or used as a dictionary key:
fs = frozenset(["read", "write"])
# Can be used as a dict key
permissions = {
frozenset(["read"]): "viewer",
frozenset(["read", "write"]): "editor",
}
role = permissions.get(frozenset(["read", "write"]))
print(role) # "editor"
# Can be stored inside a regular set
groups = {frozenset(["a", "b"]), frozenset(["c", "d"])}
frozenset supports all the read operations of a regular set — membership testing, iteration, arithmetic — but not the mutating methods (add, discard, remove, etc.).
When sets beat lists
Reach for a set when one or more of these apply: you need to eliminate duplicates; you need fast repeated membership checks; you need mathematical set operations; or you want to treat a group of items as an unordered collection without caring about their position. Keep a list when order matters, when you need indexing, or when duplicates are meaningful data.
TypeError if you try to add them. Use frozenset or convert a list to a tuple when you need to store a collection inside a set.