Property-Based Testing

Chapter 9: Property-Based Testing with Hypothesis

Traditional tests check specific examples: "Does sort([3, 1, 2]) return [1, 2, 3]?" Property-based testing checks universal properties: "Does sort(x) always return a sorted list for any input x?" Instead of writing dozens of examples manually, you define properties and let Hypothesis generate the test cases automatically (100 per test by default). This chapter introduces property-based testing and shows you how to find bugs you'd never think to test for manually.

What is Property-Based Testing?

Property-based testing defines properties that should hold for all inputs, then generates random inputs to verify those properties.

Example-Based Test (Traditional):

You manually pick three examples and hope they're representative.

Property-Based Test:

Hypothesis generates the lists automatically (100 per test by default): empty lists, single elements, very long lists, lists with duplicates, negatives, and so on. You define the property ("reversing twice returns the original"), and Hypothesis searches for inputs that break it.
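The two styles described above can be put side by side for list reversal. This is a minimal sketch: reverse is a stand-in function defined here for illustration, and the three hand-picked inputs are arbitrary.

```python
from hypothesis import given, strategies as st


def reverse(items):
    """Stand-in function under test (hypothetical)."""
    return items[::-1]


# Example-based: three hand-picked inputs.
def test_reverse_examples():
    assert reverse([1, 2, 3]) == [3, 2, 1]
    assert reverse([]) == []
    assert reverse([7]) == [7]


# Property-based: Hypothesis supplies the lists.
@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(items):
    assert reverse(reverse(items)) == items
```

Both tests run under pytest. The first checks exactly three inputs; the second checks whatever lists Hypothesis generates on each run.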

Installing Hypothesis

pip install hypothesis

Hypothesis ships with a pytest plugin, so tests decorated with @given run under pytest with no extra configuration.

Defining Properties

Good properties are universal truths about your code's behavior.

Property Examples:

For Sorting:

  • Output length equals input length
  • Output is sorted (each element <= next element)
  • Output contains same elements as input

For Reversal:

  • Reversing twice returns original
  • Length unchanged
  • First element becomes last

For Addition:

  • Commutative: a + b == b + a
  • Associative: (a + b) + c == a + (b + c) (exact for integers; floating-point rounding can break it)
  • Identity: a + 0 == a
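As a sketch of how such a list turns into tests, the three addition properties can be written directly against Python integers. The test names are illustrative choices.

```python
from hypothesis import given, strategies as st


@given(st.integers(), st.integers())
def test_addition_is_commutative(a, b):
    assert a + b == b + a


@given(st.integers(), st.integers(), st.integers())
def test_addition_is_associative(a, b, c):
    # Holds exactly for Python ints; floats can violate it through rounding.
    assert (a + b) + c == a + (b + c)


@given(st.integers())
def test_zero_is_identity(a):
    assert a + 0 == a
```

Each property takes one line to state, which is typical of a good first property: it needs no knowledge of how addition is implemented.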

Hypothesis Strategies

Strategies tell Hypothesis what kind of data to generate.

Built-in Strategies:

import hypothesis.strategies as st

st.integers()  # Any integer
st.integers(min_value=0, max_value=100)  # 0-100
st.floats()  # Any float, including NaN and infinities by default
st.text()  # Any string
st.booleans()  # True or False
st.lists(st.integers())  # Lists of integers
st.dictionaries(keys=st.text(), values=st.integers())  # Dict
st.tuples(st.integers(), st.text())  # Tuple of (int, str)

Composite Strategies:

# Lists of 1-10 positive integers
st.lists(st.integers(min_value=1), min_size=1, max_size=10)

# Email-like strings (fullmatch=True makes the whole string match, not just a substring)
st.from_regex(r"[a-z]+@[a-z]+\.(com|org)", fullmatch=True)

# Custom objects (User is assumed to be your own class with name and age fields)
@st.composite
def users(draw):
    name = draw(st.text(min_size=1))
    age = draw(st.integers(min_value=0, max_value=120))
    return User(name=name, age=age)
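To see a composite strategy in use, here is a self-contained sketch. The User dataclass is a hypothetical domain object defined only so the example runs; substitute your own class.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class User:
    """Hypothetical domain object, defined here so the example runs."""
    name: str
    age: int


@st.composite
def users(draw):
    name = draw(st.text(min_size=1))
    age = draw(st.integers(min_value=0, max_value=120))
    return User(name=name, age=age)


@given(users())
def test_generated_users_respect_bounds(user):
    # Every generated User honors the constraints declared in the strategy.
    assert len(user.name) >= 1
    assert 0 <= user.age <= 120
```

For a plain constructor call like this one, st.builds(User, name=st.text(min_size=1), age=st.integers(min_value=0, max_value=120)) is an equivalent shorthand. @st.composite pays off when one drawn value depends on another.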

Finding Bugs with Hypothesis

Hypothesis excels at finding edge cases you'd never think to test.

Example: Buggy Median Function:

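from hypothesis import given, strategies as st

def median(numbers):
    sorted_nums = sorted(numbers)
    n = len(sorted_nums)
    return sorted_nums[n // 2]  # Bug: wrong for even-length lists

@given(st.lists(st.integers(), min_size=1))
def test_median_property(numbers):
    # Property: negating every input must negate the median
    assert median([-x for x in numbers]) == -median(numbers)

Hypothesis quickly finds a two-element counterexample. For instance, median([1, 2]) returns 2 (the higher value), not the true median 1.5, so median([-1, -2]) returns -1 instead of -2 and the symmetry property fails. A weaker property such as "the result is in the list" would not catch this bug, because 2 is an element of [1, 2].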

Example: Encoding/Decoding:

def encode(text):
    return text.encode('utf-8')

def decode(data):
    return data.decode('utf-8')

@given(st.text())
def test_encode_decode_roundtrip(text):
    # Property: encoding then decoding returns original
    assert decode(encode(text)) == text

Because st.text() generates non-ASCII characters as well as ASCII, this exercises multi-byte UTF-8 sequences without you listing them by hand.

Shrinking: Finding Minimal Failing Examples

When Hypothesis finds a failure, it "shrinks" the input to find the smallest example that still fails.

Example:

def buggy_sort(lst):
    if len(lst) > 5:
        return sorted(lst)[:-1]  # Bug: drops last element on long lists
    return sorted(lst)

@given(st.lists(st.integers()))
def test_sort_preserves_length(lst):
    assert len(buggy_sort(lst)) == len(lst)

Hypothesis might find this with input [5, 2, 8, 1, 9, 3, 7], then shrink it to [0, 0, 0, 0, 0, 0]—the minimal example that triggers the bug. Shrinking makes debugging easier.
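One way to watch shrinking in isolation is hypothesis.find, which searches for an input satisfying a predicate and returns the smallest one it can reach. The sketch below reuses buggy_sort and asks for an input that violates the length property. The exact values returned are up to the shrinker, so the code makes no claim about them beyond printing the result.

```python
from hypothesis import find, strategies as st


def buggy_sort(lst):
    if len(lst) > 5:
        return sorted(lst)[:-1]  # Bug: drops last element on long lists
    return sorted(lst)


# Predicate: "the length property is violated for this input".
minimal = find(
    st.lists(st.integers()),
    lambda lst: len(buggy_sort(lst)) != len(lst),
)
print(minimal)
```

The result should be a six-element list, because five elements or fewer never reach the buggy branch.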

Property-Based Testing Best Practices

Start with Simple Properties: Don't overcomplicate. "Output length equals input length" is a great first property.

Combine with Example Tests: Use property tests for general behavior, example tests for specific known edge cases.

Use Realistic Data: Generate data that matches your domain. For emails, use email-like strings, not random text.

Test Invariants: Look for things that should always be true—sorted output, no data loss, reversibility.

Set Bounds: Unbounded generation can be slow. Use min_size, max_size, min_value, max_value to keep tests fast.

Real-World Property-Based Testing Examples

Testing a JSON Serializer:

import json

@given(st.dictionaries(st.text(), st.integers()))
def test_json_roundtrip(data):
    # Property: JSON serialization is reversible
    serialized = json.dumps(data)
    deserialized = json.loads(serialized)
    assert deserialized == data

Testing Password Validation:

@given(st.text(min_size=8, alphabet=st.characters()))
def test_password_validator_result_is_consistent(password):
    # Property: every result is either valid or reports at least one error
    # (validate_password is assumed to return an object with is_valid and errors)
    result = validate_password(password)
    assert result.is_valid or len(result.errors) > 0

Testing Database Operations:

@given(st.lists(st.integers(min_value=1, max_value=100), unique=True))
def test_database_insert_retrieve(user_ids):
    # Property: inserted IDs can all be retrieved
    # (Hypothesis calls this function once per example, so db must start empty each time)
    for user_id in user_ids:
        db.insert_user(user_id)

    for user_id in user_ids:
        user = db.get_user(user_id)
        assert user is not None
        assert user.id == user_id
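Hypothesis calls the test body once per generated example, so any state shared between calls leaks from one example into the next. The sketch below swaps the real database for a hypothetical in-memory stand-in (InMemoryDb and UserRecord are invented for illustration) and creates it inside the test, so every example starts empty.

```python
from dataclasses import dataclass

from hypothesis import given, strategies as st


@dataclass
class UserRecord:
    id: int


class InMemoryDb:
    """Hypothetical stand-in for a real database client."""

    def __init__(self):
        self._users = {}

    def insert_user(self, user_id):
        if user_id in self._users:
            raise ValueError(f"duplicate id {user_id}")
        self._users[user_id] = UserRecord(id=user_id)

    def get_user(self, user_id):
        return self._users.get(user_id)


@given(st.lists(st.integers(min_value=1, max_value=100), unique=True))
def test_database_insert_retrieve(user_ids):
    db = InMemoryDb()  # fresh state for every generated example
    for user_id in user_ids:
        db.insert_user(user_id)
    for user_id in user_ids:
        user = db.get_user(user_id)
        assert user is not None
        assert user.id == user_id
```

With a real database, the same effect usually comes from truncating tables or rolling back a transaction at the start of the test body.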

When to Use Property-Based Testing

Use Property-Based Testing For:

  • Parsers and serializers (roundtrip properties)
  • Sorting and data transformation
  • Mathematical operations (commutativity, associativity)
  • Encoding/decoding
  • Data validation
  • API contracts

Stick with Example-Based Tests For:

  • Specific business rules
  • UI behavior
  • Known edge cases you want to document
  • Simple CRUD operations

Best Strategy: Combine both. Use example tests for specific cases, property tests for general behavior.

Hypothesis Configuration

Control Hypothesis behavior with settings:

from hypothesis import given, settings

@given(st.lists(st.integers()))
@settings(
    max_examples=1000,  # Run 1000 test cases (default: 100)
    deadline=None,  # Disable time limit
)
def test_expensive_operation(data):
    result = expensive_operation(data)
    assert verify_result(result)
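When the same settings apply to many tests, named profiles avoid repeating the decorator. This is a sketch of one possible setup. The profile names "ci" and "dev" and their values are arbitrary choices, not recommendations.

```python
from hypothesis import settings

# Profile names and values below are illustrative assumptions.
settings.register_profile("ci", max_examples=1000, deadline=None)
settings.register_profile("dev", max_examples=20)

# Typically placed in conftest.py; choose the profile however suits your setup.
settings.load_profile("dev")
```

Under pytest, a registered profile can also be selected from the command line with --hypothesis-profile=ci.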

Common Pitfalls

Pitfall 1: Overly Specific Properties

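Don't just reimplement the function in your property test. An expected value computed the same way as the code under test repeats its mistakes and says nothing about which behaviors matter: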

# Bad: reimplements sort
@given(st.lists(st.integers()))
def test_sort_bad(lst):
    result = my_sort(lst)
    expected = sorted(lst)  # Just testing against sorted()
    assert result == expected

Better: Test properties:

@given(st.lists(st.integers()))
def test_sort_properties(lst):
    result = my_sort(lst)
    # Property 1: sorted
    assert all(result[i] <= result[i+1] for i in range(len(result)-1))
    # Property 2: same elements
    assert sorted(result) == sorted(lst)

Pitfall 2: Non-Deterministic Code

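Property tests assume determinism. If the code under test behaves randomly, a failure may not reproduce when Hypothesis replays it, and the test is reported as flaky. Control the seed, or let Hypothesis supply the randomness.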
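One way to keep randomized code reproducible is to take the random number generator as a test input. The sketch below uses the st.randoms() strategy, which provides a random.Random-compatible object managed by Hypothesis. The shuffle property itself is just an illustration.

```python
from hypothesis import given, strategies as st


@given(st.lists(st.integers()), st.randoms())
def test_shuffle_keeps_all_elements(items, rng):
    shuffled = list(items)
    # The randomness comes from Hypothesis, so a failing run can be replayed.
    rng.shuffle(shuffled)
    assert sorted(shuffled) == sorted(items)
```

This only works if the code under test accepts a generator as a parameter rather than calling the global random module directly.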

Pitfall 3: Slow Properties

Keep property tests fast. Hypothesis runs 100 examples per test by default, so each one must be quick.

Hypothesis and Fuzzing

Property-based testing is related to fuzzing—both generate inputs to find bugs. Hypothesis is smarter than random fuzzing because it:

  1. Understands data types: Generates valid integers, not random bytes
  2. Shrinks failures: Finds minimal failing examples
  3. Guides generation: Favors boundary values and other inputs likely to expose bugs, and can steer toward a metric you report with target()

This makes Hypothesis more effective than naive fuzzing for finding bugs.

Advanced Hypothesis Features

Stateful Testing: Test sequences of operations, not just single function calls.

from hypothesis.stateful import RuleBasedStateMachine, rule

class BankAccountMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.balance = 0

    @rule(amount=st.integers(min_value=1, max_value=1000))
    def deposit(self, amount):
        self.balance += amount
        assert self.balance >= 0

    @rule(amount=st.integers(min_value=1, max_value=100))
    def withdraw(self, amount):
        if amount <= self.balance:
            self.balance -= amount
        assert self.balance >= 0

TestBankAccount = BankAccountMachine.TestCase

Hypothesis generates random sequences: deposit, withdraw, deposit, deposit, withdraw—testing state transitions automatically.
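A state machine is most useful when it drives a real object and compares it against a simple model. The sketch below assumes a hypothetical BankAccount class as the system under test, tracks the expected balance in a plain integer, and checks the two after every step with @invariant.

```python
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    rule,
    run_state_machine_as_test,
)


class BankAccount:
    """Hypothetical system under test."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class BankAccountComparison(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.account = BankAccount()  # real object
        self.expected = 0             # simple model of its balance

    @rule(amount=st.integers(min_value=1, max_value=1000))
    def deposit(self, amount):
        self.account.deposit(amount)
        self.expected += amount

    @rule(amount=st.integers(min_value=1, max_value=1000))
    def withdraw(self, amount):
        if amount > self.expected:
            # Overdrafts must be rejected and leave the balance untouched.
            try:
                self.account.withdraw(amount)
            except ValueError:
                return
            raise AssertionError("overdraft was allowed")
        self.account.withdraw(amount)
        self.expected -= amount

    @invariant()
    def balance_matches_model(self):
        assert self.account.balance == self.expected
        assert self.account.balance >= 0


# Collected by pytest like any other test class.
TestBankAccountComparison = BankAccountComparison.TestCase
```

Outside pytest, run_state_machine_as_test(BankAccountComparison) executes the same machine directly.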

Custom Strategies: Build complex data generators for your domain.

@st.composite
def valid_email(draw):
    username = draw(st.text(alphabet=st.characters(min_codepoint=97, max_codepoint=122), min_size=1, max_size=20))
    domain = draw(st.sampled_from(["gmail.com", "yahoo.com", "example.com"]))
    return f"{username}@{domain}"

@given(valid_email())
def test_email_validator(email):
    assert validate_email(email)

Filtering Examples: Exclude invalid inputs.

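from hypothesis import assume, given

@given(st.integers())
def test_division(n):
    assume(n != 0)  # Skip n=0
    quotient, remainder = divmod(100, n)
    # Exact integer identity; checking (100 / n) * n == 100 with floats can fail on rounding
    assert quotient * n + remainder == 100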

Use assume() to filter out invalid inputs, but don't filter too much—Hypothesis will struggle to find valid examples.
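When the excluded inputs are easy to describe, building only valid inputs avoids rejection altogether. The sketch below constructs non-zero integers from two ranges instead of discarding zero, and checks an exact integer identity. The name nonzero_integers is an illustrative choice.

```python
from hypothesis import given, strategies as st

# Build non-zero integers directly instead of generating and discarding zero.
nonzero_integers = st.one_of(
    st.integers(max_value=-1),
    st.integers(min_value=1),
)


@given(nonzero_integers)
def test_divmod_identity(n):
    quotient, remainder = divmod(100, n)
    assert quotient * n + remainder == 100
```

For a single excluded value the difference is negligible, but for narrow conditions (say, only prime numbers or only valid dates) a constructive strategy keeps Hypothesis from spending its budget on rejected inputs.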

Property Discovery Techniques

Finding good properties requires practice. Here are techniques:

Inverse Operations: If you have encode and decode, test decode(encode(x)) == x.

Invariants: Properties that never change. For sorted lists, all(lst[i] <= lst[i+1] for i in range(len(lst) - 1)). For sets, len(set) == len(unique_items).

Idempotence: Operations that can be repeated without changing the result. abs(abs(x)) == abs(x). dedupe(dedupe(lst)) == dedupe(lst).
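As a sketch, both idempotence examples translate directly into tests. dedupe is a hypothetical helper written here for illustration; it keeps the first occurrence of each item.

```python
from hypothesis import given, strategies as st


def dedupe(items):
    """Hypothetical helper: drop repeats, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


@given(st.lists(st.integers()))
def test_dedupe_is_idempotent(items):
    once = dedupe(items)
    assert dedupe(once) == once


@given(st.integers())
def test_abs_is_idempotent(x):
    assert abs(abs(x)) == abs(x)
```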

Comparison with Alternative Implementation: If you have a simple but slow implementation and a fast complex one, verify they produce the same results.

@given(st.lists(st.integers()))
def test_fast_sort_matches_slow_sort(lst):
    assert fast_sort(lst) == slow_but_simple_sort(lst)

Metamorphic Properties: Changing input in predictable ways produces predictable output changes.

@given(st.lists(st.integers()))
def test_sort_preserves_reversal(lst):
    # Sorting then reversing should equal reverse-sorting
    sorted_then_reversed = list(reversed(sorted(lst)))
    reverse_sorted = sorted(lst, reverse=True)
    assert sorted_then_reversed == reverse_sorted

Debugging Property Test Failures

When Hypothesis finds a failure, it provides the minimal failing example:

Falsifying example: test_function(lst=[0, 0])

Step 1: Reproduce Locally. Hypothesis prints the exact input that failed. Use it to write a focused example test.

Step 2: Understand the Failure. Why does this input break your property? Is the property wrong or is the code buggy?

Step 3: Fix and Re-test. Fix the bug, then run Hypothesis again to verify the fix handles all cases.

Step 4: Add Example Test. Convert the failing case to an example test to prevent regression and document the edge case.
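Step 4 does not require a separate test function: the @example decorator pins an explicit input onto the property test, and Hypothesis runs it on every execution alongside the generated cases. In this sketch, my_sort is a stand-in for whichever function was just fixed, and [0, 0] is the input from the falsifying example shown earlier.

```python
from hypothesis import example, given, strategies as st


def my_sort(items):
    """Stand-in for the function that was just fixed (hypothetical)."""
    return sorted(items)


@given(st.lists(st.integers()))
@example([0, 0])  # the input Hypothesis reported; always included
def test_sort_preserves_length(items):
    assert len(my_sort(items)) == len(items)
```

The pinned input also documents the edge case for the next reader of the test.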

Integration with pytest

Hypothesis integrates seamlessly with pytest:

# Run property tests
pytest tests/

# Show statistics about generated examples and run time
pytest --hypothesis-show-statistics

# Seed for reproducibility
pytest --hypothesis-seed=12345

Property tests appear as regular pytest tests in output. Failed properties show the minimal failing example in the error message.

Getting Started with Hypothesis in Your Project

Step 1: Identify Candidates. Look for functions with clear mathematical properties, parsers, encoders, or data transformations. These benefit most from property testing.

Step 2: Start Simple. Begin with one simple property. Don't try to test everything with properties immediately.

Step 3: Add Incrementally. As you gain confidence, add more property tests. Combine them with your existing example tests.

Step 4: Learn from Failures. When Hypothesis finds a bug, understand why your property failed. This teaches you about your code's behavior.

Step 5: Share with Team. Property testing has a learning curve. Share examples with your team and demonstrate the bugs Hypothesis finds.

Common First Properties to Try:

  • Serialization roundtrips: deserialize(serialize(x)) == x
  • Reversibility: undo(do(x)) == x
  • Length preservation: len(transform(lst)) == len(lst)
  • Sorting properties: output is sorted, contains same elements
  • Idempotence: f(f(x)) == f(x)

Start with these patterns and expand as you discover more properties in your codebase. Property-based testing complements traditional testing—use both for comprehensive coverage.

Property-based testing represents a paradigm shift from example-based testing. It finds bugs in edge cases you never thought to test manually, making your code more robust and reliable in production environments.