Home Tutorials Data Structures

Data Structures

Sets and Frozensets in Python: Unordered Collections with Powerful Maths

Pyford Notes July 1, 2026 7 min read
Key points
  • Sets hold only unique values; duplicates are silently dropped on insertion.
  • Set arithmetic uses operators (|, &, -, ^) or named methods for clarity.
  • Membership testing with in on a set is O(1) versus O(n) on a list.
  • frozenset is hashable, so it can be used as a dict key or inside another set.

Creating a set

A set is an unordered collection of unique, hashable objects. You create one with curly braces or the set() constructor. Note that an empty pair of curly braces creates a dict, not a set — always use set() for an empty set:

colours = {"red", "green", "blue"}
print(type(colours))   # <class 'set'>

# From an iterable — duplicates are dropped automatically
tags = set(["python", "tutorial", "python", "beginner"])
print(tags)  # {"python", "tutorial", "beginner"} (order may vary)

# Empty set — NOT {}
empty = set()
print(type(empty))   # <class 'set'>

Because sets are unordered you cannot index them with s[0]. If you need ordered unique items, collect into a set to drop duplicates and then sort: sorted(set(items)).

Core operations: add, discard, remove

The three most common mutation methods are add(), discard(), and remove():

s = {"apple", "banana"}

s.add("cherry")       # add a new item
s.add("apple")        # already present — no error, no duplicate

s.discard("banana")   # remove if present; silent if absent
s.discard("mango")    # no KeyError

s.remove("cherry")    # remove if present; raises KeyError if absent
# s.remove("mango")   # would raise KeyError

Use discard() when you are not sure whether the element exists. Use remove() when its absence would be a genuine bug worth surfacing.

pop() removes and returns an arbitrary element; clear() empties the set in place.

Set arithmetic: union, intersection, difference

Python's set type supports the classic mathematical operations through both operators and method names. The two styles are equivalent, but method names read more clearly in complex expressions:

OperationOperatorMethodReturns
Uniona | ba.union(b)Items in a, b, or both
Intersectiona & ba.intersection(b)Items in both a and b
Differencea - ba.difference(b)Items in a but not in b
Symmetric diffa ^ ba.symmetric_difference(b)Items in either, not both
Subset checka <= ba.issubset(b)True if a is contained in b
Superset checka >= ba.issuperset(b)True if a contains b
visitors_mon = {"alice", "bob", "carol"}
visitors_tue = {"bob", "carol", "dan"}

# Who visited on either day?
either = visitors_mon | visitors_tue
# {"alice", "bob", "carol", "dan"}

# Who visited both days?
both = visitors_mon & visitors_tue
# {"bob", "carol"}

# Who visited Monday but not Tuesday?
monday_only = visitors_mon - visitors_tue
# {"alice"}

# Who visited exactly one day?
exclusive = visitors_mon ^ visitors_tue
# {"alice", "dan"}

In-place variants (|=, &=, -=, ^=) update the left-hand set without creating a new object.

Membership testing is O(1)

Sets are backed by a hash table, so checking whether an item is present takes constant time regardless of how many elements the set contains. Lists, by contrast, scan from the beginning and take O(n) time:

import time

big_list = list(range(10_000_000))
big_set  = set(big_list)

target = 9_999_999

# List lookup: scans up to 10 million elements
start = time.perf_counter()
found = target in big_list
print(f"list: {time.perf_counter() - start:.4f}s")

# Set lookup: single hash calculation
start = time.perf_counter()
found = target in big_set
print(f"set:  {time.perf_counter() - start:.4f}s")

On a typical machine the set lookup is hundreds of times faster. Any time you are calling x in collection repeatedly in a loop, converting the collection to a set first is one of the highest-leverage micro-optimisations available.

Set comprehensions

Like list and dict comprehensions, set comprehensions use curly braces with a single expression (no colon). They are ideal for building a set of transformed or filtered values in one readable line:

words = ["apple", "banana", "apricot", "blueberry", "avocado"]

# Unique first letters
initials = {w[0] for w in words}
# {"a", "b"}

# Lengths of unique words
lengths = {len(w) for w in words}
# {5, 6, 9}

# Words longer than 6 characters, uppercased
long_upper = {w.upper() for w in words if len(w) > 6}
# {"BLUEBERRY", "APRICOT"} (order not guaranteed)

Frozensets: the immutable variant

frozenset is a set that cannot be modified after creation. Because it is immutable, it is also hashable, which means it can be stored inside another set or used as a dictionary key:

fs = frozenset(["read", "write"])

# Can be used as a dict key
permissions = {
    frozenset(["read"]):        "viewer",
    frozenset(["read", "write"]): "editor",
}

role = permissions.get(frozenset(["read", "write"]))
print(role)  # "editor"

# Can be stored inside a regular set
groups = {frozenset(["a", "b"]), frozenset(["c", "d"])}

frozenset supports all the read operations of a regular set — membership testing, iteration, arithmetic — but not the mutating methods (add, discard, remove, etc.).

When sets beat lists

Reach for a set when one or more of these apply: you need to eliminate duplicates; you need fast repeated membership checks; you need mathematical set operations; or you want to treat a group of items as an unordered collection without caring about their position. Keep a list when order matters, when you need indexing, or when duplicates are meaningful data.

Hashing requirement Only hashable objects can be stored in a set. Strings, numbers, tuples of hashables, and frozensets all qualify. Lists, dicts, and ordinary sets are not hashable and will raise a TypeError if you try to add them. Use frozenset or convert a list to a tuple when you need to store a collection inside a set.