Mutation Testing

Chapter 10: Mutation Testing

You have 100% code coverage. All tests pass. But are your tests actually good? Mutation testing answers this by deliberately breaking your code ("mutating" it) and checking if tests catch the bugs. If a test suite doesn't fail when code is broken, the tests aren't testing effectively. This chapter introduces mutation testing and shows you how to measure and improve test quality.

What is Mutation Testing?

Mutation testing modifies your source code in small ways (mutations) and runs your test suite. If tests still pass after the mutation, your tests missed a bug—they're not strong enough.

Example:

Mutation (change > to >=):

If your tests still pass with this mutation, you don't have a test for n == 0. A strong test suite would fail, catching the mutation.

How Mutation Testing Works

Run Tests: Verify all tests pass on original code
Create Mutation: Modify code (change + to -, > to >=, etc.)
Run Tests on Mutant: Do tests fail?
- Killed: Tests failed (good! Tests caught the bug)
- Survived: Tests passed (bad! Tests missed the bug)
Repeat: Try all possible mutations
Calculate Score: killed / (killed + survived) * 100%

Mutation Score: Percentage of mutations your tests catch. Higher is better. 80%+ is strong.

Installing mutmut

pip install mutmut

mutmut is the most popular Python mutation testing tool.

Running Mutation Tests

# Run mutation testing
mutmut run

# View results
mutmut results

# Show specific mutations
mutmut show 1

mutmut automatically finds your source code and tests, creates mutations, and runs tests on each mutation.

Types of Mutations

Arithmetic Operators:

+ → -, *, /
- → +
* → /, +

Comparison Operators:

> → >=, <, ==
== → !=, >=, <=
< → <=, >, ==

Boolean Operators:

and → or
or → and
not → removed

Constants:

True → False
0 → 1
"string" → ""

Return Values:

return x → return None
return True → return False

Interpreting Mutation Results

High Mutation Score (80%+): Strong test suite. Most mutations are caught.

Medium Score (60-80%): Decent coverage but room for improvement. Some edge cases untested.

Low Score (<60%): Weak tests. Many mutations survive. Tests verify execution but not correctness.

Example Output:

Total mutations: 100
Killed: 85
Survived: 10
Timeout: 5

Mutation Score: 85%

85% is strong. The 10 survivors indicate specific gaps in test coverage.

Improving Your Mutation Score

Step 1: Identify Survivors. Run mutmut show <id> to see mutations that survived.

Step 2: Understand Why. Why did tests pass? Missing edge case? Weak assertion?

Step 3: Add Tests. Write tests that would kill the mutation.

Example: Mutation survived because you test add(2, 3) but not add(0, 5).

Step 4: Re-run. Verify new tests kill the mutation.

Mutation Testing Best Practices

Don't Chase 100%: Some mutations are equivalent (don't change behavior) or impossible to kill. 85-90% is excellent.

Focus on Critical Code: Run mutation testing on business logic, not trivial getters or configuration.

Integrate with CI: Run mutation testing periodically, not on every commit (it's slow).

Use Results to Guide Testing: Survivors show where tests are weak. Add tests for those cases.

Set Minimum Score: Fail builds if mutation score drops below threshold (e.g., 80%).

Mutation Testing vs Code Coverage

Code coverage measures execution. Mutation testing measures quality.

High Coverage, Low Mutation Score:

def calculate_discount(price, is_member):
    if is_member:
        return price * 0.9
    return price

def test_discount():
    result = calculate_discount(100, True)
    # No assertion! Just executes code.

100% code coverage, 0% mutation score. The test verifies nothing.

High Coverage, High Mutation Score:

def test_discount_strong():
    # Member gets discount
    assert calculate_discount(100, True) == 90
    # Non-member pays full price
    assert calculate_discount(100, False) == 100

100% coverage, 100% mutation score. Tests verify correctness.

Coverage measures "did this run?" Mutation testing measures "did this work correctly?"

Equivalent Mutations

Some mutations don't change behavior:

def is_adult(age):
    return age >= 18

# Mutation: age >= 18 to age > 17
# These are equivalent! Can't distinguish with tests.

Equivalent mutations can't be killed. Don't worry about them—they're unavoidable. Aim for 85-90%, not 100%.

Performance Considerations

Mutation testing is slow. Each mutation requires running the full test suite.

Optimize:

Run mutation testing on changed files only
Use parallel execution: mutmut run --threads-to-use=4
Run nightly, not on every commit
Focus on critical modules

Example CI Integration:

# Run weekly mutation testing
schedule:
  - cron: '0 2 * * 1'  # Monday 2 AM

jobs:
  mutation-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run mutation testing
        run: |
          pip install mutmut
          mutmut run --paths-to-mutate=src/core
          mutmut results

When to Use Mutation Testing

Use Mutation Testing For:

Critical business logic (payment processing, financial calculations, data transformations)
Security-sensitive code (authentication, authorization, encryption, input validation)
Complex algorithms (sorting, searching, optimization, graph algorithms)
Code with high test coverage that still has bugs (mutation testing reveals test quality issues)
Stable APIs and libraries (where correctness is paramount and code changes infrequently)

Skip Mutation Testing For:

Simple CRUD operations (basic create, read, update, delete with minimal logic)
Trivial utilities (simple getters, setters, data structure wrappers)
UI code (visual components change frequently, mutation testing adds little value)
Code that changes frequently (tests will change too, making mutation testing impractical)
Prototype or experimental code (focus on functionality first, quality later)

Mutation testing is expensive—it multiplies test execution time by the number of mutations. Use it strategically on code that matters most: code that handles money, security, or critical business rules. For a typical application, apply mutation testing to 10-20% of your codebase: the core logic that absolutely must be correct.

Mutation Testing Workflow

Achieve High Coverage: Get to 80%+ code coverage first with traditional testing
Run Mutation Testing: Identify weak tests
Improve Tests: Kill surviving mutations by adding better assertions and edge cases
Maintain Score: Set minimum mutation score and prevent regression
Focus on ROI: Don't chase 100%—focus on critical code

Mutation testing is the final quality check after coverage, code review, and property-based testing.

Course Recommendations

Advanced Python Testing

Mutation testing in depth
Test quality metrics beyond coverage
Combining testing strategies
Enroll at paiml.com

Software Quality Engineering

Comprehensive quality metrics
Testing strategies comparison
Building quality into processes
Enroll at paiml.com

Test-Driven Development Mastery

Writing tests that catch real bugs
TDD with mutation testing
Test quality from day one
Enroll at paiml.com

Quiz

Final Thoughts

Mutation testing is the ultimate test quality metric. It answers the critical question: "Do my tests actually catch bugs, or do they just execute code?" Use it strategically on critical code to build confidence in your test suite. Combined with TDD, property-based testing, and code review, mutation testing ensures your tests truly protect against regressions.

Congratulations on completing this comprehensive guide to Testing in Python! You now have the knowledge to write tests that are fast, reliable, maintainable, and effective at catching real bugs.

📝 Test Your Knowledge: Mutation Testing

Take this quiz to reinforce what you've learned in this chapter.