Chapter 8: Testing Best Practices

Good tests are clear, fast, and maintainable. Bad tests are cryptic, slow, and brittle. This chapter distills years of testing wisdom into practical guidelines. Follow these best practices and your test suite will help rather than hinder development.

Organize Tests to Mirror Your Code

Test organization should mirror your source code structure. If a developer can find a module in your source tree, they should be able to find its tests just as easily.

Project Structure:

myproject/
├── src/
│   ├── auth/
│   │   ├── __init__.py
│   │   └── login.py
│   └── payments/
│       ├── __init__.py
│       └── processor.py
└── tests/
    ├── unit/
    │   ├── test_auth_login.py
    │   └── test_payments_processor.py
    └── integration/
        └── test_payment_flow.py

This structure makes tests discoverable. Looking for login tests? Check tests/unit/test_auth_login.py.

Separate Unit and Integration Tests: Unit tests run fast and should run on every save. Integration tests are slower and run less frequently. Keep them separate.
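
With this layout, the two suites can be run separately from the command line (assuming the tests/unit and tests/integration directories shown above):

pytest tests/unit          # fast feedback while developing
pytest tests/integration   # slower suite, run before pushing or in CI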

Write Self-Documenting Tests

Test names should explain what they verify. A failing test name should tell you what broke without reading the code.

Bad:
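
def test_calc():
    # representative example: the name reveals nothing about the behavior
    result = calculate(5, 3)
    assert result == 8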

What does this test? No idea without reading it.

Good:

def test_calculate_adds_positive_numbers():
    result = calculate(5, 3)
    assert result == 8

Now the test name documents its purpose.

Use a Describe-When-Then Format:

def test_user_service_create_user_with_duplicate_email_raises_error():
    # <subject>_<action>_<condition>_<expected result>
    pass

Or use the AAA pattern in comments:

def test_transfer_money():
    # Arrange: Set up accounts
    from_account = Account(balance=1000)
    to_account = Account(balance=500)

    # Act: Perform transfer
    transfer(from_account, to_account, 200)

    # Assert: Verify balances updated
    assert from_account.balance == 800
    assert to_account.balance == 700

Keep Tests Independent

Tests must run in any order. One test's outcome shouldn't affect another. Shared state between tests causes mysterious failures.

Anti-Pattern: Shared State:

# Global state - BAD!
user_count = 0

def test_create_user():
    global user_count
    create_user("alice")
    user_count += 1
    assert user_count == 1  # Fails if run after test_delete_user

def test_delete_user():
    global user_count
    delete_user("alice")
    user_count -= 1

Best Practice: Fixtures:

@pytest.fixture
def fresh_database():
    db = Database()
    yield db
    db.clear()  # Clean up after test

def test_create_user(fresh_database):
    user = fresh_database.create_user("alice")
    assert user.name == "alice"

def test_delete_user(fresh_database):
    user = fresh_database.create_user("bob")
    fresh_database.delete_user(user.id)
    assert fresh_database.get_user(user.id) is None

Each test gets a fresh database. Order doesn't matter.

Don't Test Implementation Details

Tests should verify behavior, not implementation. Testing internals makes tests brittle—they break when you refactor even though behavior hasn't changed.

Bad - Testing Implementation:

def test_sort_uses_quicksort():
    sorter = Sorter()
    sorter.sort([3, 1, 2])
    assert sorter._algorithm == "quicksort"  # Fragile!

Good - Testing Behavior:

def test_sort_returns_sorted_list():
    sorter = Sorter()
    result = sorter.sort([3, 1, 2])
    assert result == [1, 2, 3]  # Tests what, not how

Now you can change the sorting algorithm without breaking tests.

Test One Thing Per Test

Each test should verify one specific behavior. Small, focused tests are easier to understand and debug.

Anti-Pattern: Testing Everything:

def test_user_workflow():
    user = create_user("alice")
    user.update_email("new@example.com")
    user.add_address("123 Main St")
    order = user.create_order()
    order.add_item("Widget", 2)
    order.checkout()
    # 20 assertions...

When this fails, which part broke? Hard to tell.

Best Practice: Focused Tests:

def test_create_user_returns_user_with_email():
    user = create_user("alice@example.com")
    assert user.email == "alice@example.com"

def test_update_email_changes_user_email():
    user = create_user("old@example.com")
    user.update_email("new@example.com")
    assert user.email == "new@example.com"

def test_checkout_creates_order_record():
    order = create_order()
    order.checkout()
    assert order.status == "pending"

Clear, focused, easy to debug.

Use Fixtures for Setup

Don't repeat setup code in every test. Use fixtures to DRY up test code.

Without Fixtures (Repetitive):

def test_user_creation():
    db = Database("test.db")
    db.connect()
    # Test code
    db.disconnect()

def test_user_deletion():
    db = Database("test.db")
    db.connect()
    # Test code
    db.disconnect()

With Fixtures (DRY):

@pytest.fixture
def database():
    db = Database("test.db")
    db.connect()
    yield db
    db.disconnect()

def test_user_creation(database):
    # database fixture handles connection setup and teardown
    user = database.create_user("alice")
    assert user is not None

def test_user_deletion(database):
    user = database.create_user("bob")
    database.delete_user(user.id)
    assert database.get_user(user.id) is None

Assert Meaningfully

Use specific assertions that produce helpful failure messages.

Bad:

def test_user_data():
    user = get_user(1)
    assert user  # What failed? Email? Name? ID?

Good:

def test_user_has_required_fields():
    user = get_user(1)
    assert user.id == 1
    assert user.email == "alice@example.com"
    assert user.name == "Alice"

Or use custom messages:

def test_user_active():
    user = get_user(1)
    assert user.is_active, f"User {user.id} should be active but was inactive"

Common Anti-Patterns to Avoid

Anti-Pattern 1: Test-Dependent Tests

Tests that only pass when run in a specific order are fragile and frustrating.

Anti-Pattern 2: Sleeping in Tests

import time

def test_async_operation():
    start_operation()
    time.sleep(5)  # BAD! Slow and unreliable
    assert operation_complete()

Use proper async testing or polling with timeout instead.
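
A minimal polling sketch (wait_for is a hypothetical helper; start_operation and operation_complete come from the snippet above):

import time

def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll a condition until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

def test_async_operation():
    start_operation()
    assert wait_for(operation_complete), "operation did not complete within 5 seconds"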

Anti-Pattern 3: Excessive Mocking

Over-mocking makes tests verify mocks, not real behavior. Mock boundaries, not everything.
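
A sketch of mocking at the boundary (compute_total and the price client are hypothetical): mock the object that talks to the network, but exercise the real calculation logic.

from unittest.mock import Mock

def compute_total(price_client, item_ids):
    # real logic under test; only the client is a boundary
    return sum(price_client.get_price(item_id) for item_id in item_ids)

def test_compute_total_sums_prices_from_client():
    client = Mock()
    client.get_price.side_effect = [10, 25]  # mock only the network boundary
    assert compute_total(client, ["widget", "gadget"]) == 35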

Anti-Pattern 4: Testing Third-Party Code

import requests

def test_requests_library_works():
    response = requests.get("http://example.com")
    assert response.status_code == 200  # Don't test requests!

Test your code, not libraries.

Anti-Pattern 5: Ignoring Flaky Tests

Flaky tests that pass/fail randomly are technical debt. Fix or delete them—don't ignore.

Write Fast Tests

Slow tests discourage running them. Keep unit tests under 100ms each. The entire unit test suite should run in seconds.

Make Tests Fast:

  • Use in-memory databases
  • Mock network calls
  • Avoid file I/O when possible
  • Use pytest-xdist for parallel execution (see the command after this list)
  • Keep integration tests separate
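
For the parallel-execution point, pytest-xdist (assuming it is installed) distributes tests across CPU cores:

pytest -n auto  # run tests in parallel on all available cores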

Measure Test Speed:

pytest --durations=10  # Show 10 slowest tests

Optimize the slow ones.

Maintain Your Tests

Tests are code. They need maintenance, refactoring, and cleanup just like production code.

Refactor Tests: When you refactor production code, refactor tests too. Keep them clean and readable.

Delete Dead Tests: Tests for deleted features should be deleted. Don't accumulate test cruft.

Update Tests with Requirements: When requirements change, update tests first (TDD) or immediately after.

Review Test Code: Code review should include tests. Bad tests are worse than no tests.

Continuous Integration Best Practices

Run Tests on Every Commit: Catch regressions immediately. Use GitHub Actions, GitLab CI, or Jenkins.

Fail Fast: Run fast unit tests first. Only run slow integration tests if unit tests pass.
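
A simple way to approximate this in a CI script, using the directory split from earlier:

pytest tests/unit && pytest tests/integration  # integration tests run only if unit tests pass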

Enforce Coverage: Set minimum coverage thresholds and fail builds below them.
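
With pytest-cov installed and sources under src/ (as in the layout above), a threshold can be enforced from the test command:

pytest --cov=src --cov-fail-under=80  # fail the build if coverage drops below 80%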

Make CI Fast: Slow CI discourages commits. Optimize for speed—cache dependencies, parallelize tests, use faster test databases.

Real-World Testing Scenarios

Scenario 1: Testing Error Messages

Don't just verify an exception is raised—verify the message helps users understand the problem.

def withdraw(account, amount):
    if amount > account.balance:
        raise ValueError(f"Insufficient funds: need ${amount}, have ${account.balance}")
    account.balance -= amount

def test_withdraw_insufficient_funds_has_helpful_message():
    account = Account(balance=50)
    with pytest.raises(ValueError, match="Insufficient funds: need \\$100, have \\$50"):
        withdraw(account, 100)

Scenario 2: Testing Time-Dependent Code

Use libraries like freezegun to control time in tests:

from freezegun import freeze_time
from datetime import datetime

def is_business_hours():
    now = datetime.now()
    return 9 <= now.hour < 17 and now.weekday() < 5

@freeze_time("2024-01-15 14:30:00")  # Monday at 2:30 PM
def test_is_business_hours_during_business_day():
    assert is_business_hours()

@freeze_time("2024-01-15 19:00:00")  # Monday at 7 PM
def test_is_business_hours_after_hours():
    assert not is_business_hours()

Scenario 3: Testing Randomness

Control random values in tests for reproducibility:

import random

def generate_id():
    return random.randint(1000, 9999)

def test_generate_id_returns_four_digit_number(monkeypatch):
    monkeypatch.setattr(random, 'randint', lambda a, b: 5555)
    generated_id = generate_id()
    assert generated_id == 5555

Debugging Test Failures

When tests fail, debug systematically:

Step 1: Read the Failure Message. pytest provides detailed output. Read it carefully. The assertion message often tells you exactly what's wrong.

Step 2: Run the Test in Isolation. If a test fails in the suite but passes alone, you have test pollution. Tests share state or depend on execution order.

Step 3: Add Print Statements. Use print() to inspect values. pytest shows print output for failing tests.

Step 4: Use a Debugger. Run tests with pytest --pdb to drop into a debugger on failure:

pytest --pdb tests/test_user.py::test_create_user

Step 5: Check Recent Changes. If tests started failing recently, what changed? Git blame and git diff are your friends.

Step 6: Verify Assumptions. Tests fail because assumptions were wrong. What did you assume that might not be true?

Test Maintenance Strategies

Keep Test Code Clean: Apply the same standards to test code as production code. Refactor duplicated test code. Extract helper functions. Use clear names.
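
For example, a small factory helper (User, can_log_in, and make_user are illustrative names) keeps repeated object construction in one place:

def make_user(email="alice@example.com", active=True):
    # test helper: one place to update when the User constructor changes
    return User(email=email, active=active)

def test_inactive_user_cannot_log_in():
    user = make_user(active=False)
    assert not user.can_log_in()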

Review Tests in Code Review: Don't rubber-stamp test changes. Review them as carefully as production code. Bad tests cause more problems than no tests.

Update Tests with Features: When adding a feature, add tests. When removing a feature, remove its tests. Keep tests synchronized with code.

Refactor Tests: As production code evolves, tests need refactoring too. Don't let test code become a tangled mess.

Delete Flaky Tests: If a test randomly fails, either fix it or delete it. Flaky tests erode trust in the entire test suite. Don't tolerate them.

Performance Testing Guidelines

Measure What Matters: Don't test arbitrary performance metrics. Test that operations complete within acceptable time limits for your use case.

import time

def test_search_completes_within_one_second():
    start = time.time()
    results = search_database("query")
    duration = time.time() - start

    assert duration < 1.0, f"Search took {duration}s, limit is 1s"
    assert len(results) > 0

Use Appropriate Tools: For serious performance testing, use profilers (cProfile, line_profiler) and benchmarking libraries (pytest-benchmark) rather than basic timing.

Test Performance Regressions: Track performance over time. If search takes 100ms today, a change that makes it take 500ms is a regression even if both are "fast."
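
A sketch with pytest-benchmark (assuming it is installed; search_database is the function from the example above). The plugin's benchmark fixture runs the function repeatedly and records timing statistics that can be saved and compared between runs:

def test_search_performance(benchmark):
    # benchmark() calls search_database repeatedly and records timing statistics
    results = benchmark(search_database, "query")
    assert len(results) > 0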

Testing Best Practices Checklist

Before considering your tests complete, verify:

Test Organization:

  • Tests mirror source code structure
  • Unit and integration tests separated
  • Test files easily discoverable

Test Quality:

  • Test names describe what they verify
  • Tests are independent and can run in any order
  • Each test focuses on one specific behavior
  • Assertions are specific and meaningful
  • No testing of third-party library code

Test Maintenance:

  • Tests use fixtures to avoid duplication
  • Test code is clean and readable
  • Tests updated when requirements change
  • Flaky tests fixed or deleted
  • Dead tests for removed features deleted

Performance:

  • Unit test suite runs in seconds
  • Slow tests identified and optimized
  • Tests run in parallel where possible

CI/CD Integration:

  • Tests run on every commit
  • Fast tests run before slow tests
  • Minimum coverage threshold enforced
  • Build fails on test failures

Following these practices creates a test suite that helps development rather than hinders it.
