Testing Guide¶
This guide covers testing practices and standards for the Segmentation Robustness Framework.
๐งช Testing Philosophy¶
Testing Principles¶
- Comprehensive Coverage: Aim for >90% code coverage
- Fast Execution: Tests should run quickly (<5 minutes for full suite)
- Reliable Results: Tests should be deterministic and repeatable
- Clear Purpose: Each test should have a specific, documented purpose
- Maintainable: Tests should be easy to understand and modify
๐ Test Structure¶
tests/
โโโ adapters/ # Model adapter tests
โ โโโ test_custom_adapter.py
โ โโโ test_huggingface_adapter.py
โ โโโ test_smp_adapter.py
โ โโโ test_torchvision_adapter.py
โ โโโ test_registry.py
โโโ attacks/ # Attack implementation tests
โ โโโ test_base_attack.py
โ โโโ test_fgsm.py
โ โโโ test_pgd.py
โ โโโ test_rfgsm.py
โ โโโ test_tpgd.py
โ โโโ test_custom_attacks.py
โ โโโ test_registry.py
โโโ datasets/ # Dataset tests
โ โโโ test_voc_dataset.py
โ โโโ test_ade20k_dataset.py
โ โโโ test_cityscapes_dataset.py
โ โโโ test_stanford_background_dataset.py
โ โโโ test_registry.py
โโโ loaders/ # Model and dataset loader tests
โ โโโ test_universal_model_loader.py
โ โโโ test_torchvision_model_loader.py
โ โโโ test_smp_model_loader.py
โ โโโ test_huggingface_model_loader.py
โ โโโ test_custom_model_loader.py
โ โโโ test_dataset_loader.py
โ โโโ test_attack_loader.py
โโโ metrics/ # Metric tests
โ โโโ test_base_metrics.py
โ โโโ test_custom_metrics.py
โโโ pipeline/ # Pipeline tests
โ โโโ test_core.py
โ โโโ test_config.py
โโโ utils/ # Utility function tests
โ โโโ test_image_preprocessing.py
โ โโโ test_model_utils.py
โ โโโ test_dataset_utils.py
โ โโโ test_loader_utils.py
โ โโโ test_visualization.py
โโโ data/ # Test data files
โโโ dummy_model_weights.pth
โโโ dummy_model_checkpoint.pth
โโโ dummy_encoder_weights.pth
๐ Running Tests¶
Basic Commands¶
# Run all tests
pytest
# Run with verbose output
pytest -v
# Run with coverage
pytest --cov=segmentation_robustness_framework --cov-report=html
# Run specific test file
pytest tests/attacks/test_fgsm.py
# Run specific test function
pytest tests/attacks/test_fgsm.py::test_fgsm_attack
# Run tests matching pattern
pytest -k "attack"
# Run tests in parallel
pytest -n auto
Test Categories¶
# Adapter tests only
pytest tests/adapters/
# Attack tests only
pytest tests/attacks/
# Dataset tests only
pytest tests/datasets/
# Loader tests only
pytest tests/loaders/
# Pipeline tests only
pytest tests/pipeline/
๐ Writing Tests¶
Test Function Naming¶
def test_function_name_expected_behavior():
"""Test that function behaves as expected."""
pass
def test_function_name_edge_case():
"""Test function behavior with edge case input."""
pass
def test_function_name_error_condition():
"""Test function behavior with invalid input."""
pass
Test Structure¶
def test_fgsm_attack_creates_valid_adversarial_images():
"""Test that FGSM creates valid adversarial images."""
# Arrange
model = create_mock_model()
attack = FGSM(model, eps=0.02)
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 21, (2, 224, 224))
# Act
adv_images = attack.apply(images, labels)
# Assert
assert adv_images.shape == images.shape
assert torch.all(adv_images >= 0) and torch.all(adv_images <= 1)
assert torch.any(adv_images != images) # Images should be perturbed
Using Fixtures¶
import pytest
@pytest.fixture
def sample_model():
"""Create a sample model for testing."""
return create_mock_model()
@pytest.fixture
def sample_dataset():
"""Create a sample dataset for testing."""
return create_mock_dataset()
def test_attack_with_sample_model(sample_model, sample_dataset):
"""Test attack with sample model and dataset."""
attack = FGSM(sample_model, eps=0.02)
# Test implementation
Parameterized Tests¶
import pytest
@pytest.mark.parametrize("eps", [0.01, 0.02, 0.05])
def test_fgsm_different_eps_values(eps):
"""Test FGSM with different epsilon values."""
model = create_mock_model()
attack = FGSM(model, eps=eps)
images = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 21, (2, 224, 224))
adv_images = attack.apply(images, labels)
# Check perturbation magnitude
perturbation = torch.abs(adv_images - images)
assert torch.all(perturbation <= eps + 1e-6)
๐ง Test Utilities¶
Mock Models¶
class MockModel(nn.Module):
"""Mock model for testing."""
def __init__(self, num_classes=21):
super().__init__()
self.num_classes = num_classes
def forward(self, x):
"""Forward pass returning random logits."""
batch_size = x.shape[0]
height, width = x.shape[2], x.shape[3]
return torch.randn(batch_size, self.num_classes, height, width)
def create_mock_model(num_classes=21):
"""Create a mock model for testing."""
return MockModel(num_classes)
Sample Data Generators¶
def create_sample_images(batch_size=2, channels=3, height=224, width=224):
"""Create sample images for testing."""
return torch.randn(batch_size, channels, height, width)
def create_sample_labels(batch_size=2, height=224, width=224, num_classes=21):
"""Create sample labels for testing."""
return torch.randint(0, num_classes, (batch_size, height, width))
def create_sample_dataset(num_samples=10):
"""Create a sample dataset for testing."""
class SampleDataset(torch.utils.data.Dataset):
def __init__(self, num_samples):
self.num_samples = num_samples
def __len__(self):
return self.num_samples
def __getitem__(self, idx):
image = create_sample_images(1, 3, 224, 224).squeeze(0)
label = create_sample_labels(1, 224, 224).squeeze(0)
return image, label
return SampleDataset(num_samples)
Assertion Helpers¶
def assert_tensor_shape(tensor, expected_shape):
"""Assert tensor has expected shape."""
assert tensor.shape == expected_shape, f"Expected {expected_shape}, got {tensor.shape}"
def assert_tensor_range(tensor, min_val=0, max_val=1):
"""Assert tensor values are within range."""
assert torch.all(tensor >= min_val), f"Values below {min_val}"
assert torch.all(tensor <= max_val), f"Values above {max_val}"
def assert_metric_value(metric_value, min_val=0, max_val=1):
"""Assert metric value is within valid range."""
assert min_val <= metric_value <= max_val, f"Metric value {metric_value} outside range [{min_val}, {max_val}]"
๐ท๏ธ Test Markers¶
Built-in Markers¶
import pytest
@pytest.mark.slow
def test_slow_operation():
"""Test that takes a long time to run."""
pass
@pytest.mark.gpu
def test_gpu_operation():
"""Test that requires GPU."""
pass
@pytest.mark.integration
def test_integration():
"""Integration test."""
pass
@pytest.mark.unit
def test_unit():
"""Unit test."""
pass
Custom Markers¶
Register custom markers in pytest.ini:
[tool:pytest]
markers =
slow: marks tests as slow (deselect with '-m "not slow"')
gpu: marks tests as requiring GPU
integration: marks tests as integration tests
unit: marks tests as unit tests
benchmark: marks tests as performance benchmarks
๐ Coverage Reporting¶
Coverage Configuration¶
[tool:pytest]
addopts = --cov=segmentation_robustness_framework --cov-report=html --cov-report=term-missing
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
Coverage Targets¶
- Overall Coverage: >90%
- Critical Paths: >95%
- New Code: >95%
๐จ Debugging Tests¶
Common Issues¶
Test Isolation¶
# Use fixtures for proper setup/teardown
@pytest.fixture(autouse=True)
def reset_state():
"""Reset global state before each test."""
# Setup
yield
# Teardown
torch.cuda.empty_cache()
Deterministic Results¶
# Set random seeds for reproducible tests
@pytest.fixture(autouse=True)
def set_random_seed():
"""Set random seed for reproducible tests."""
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)
Memory Issues¶
# Monitor memory usage
def test_memory_usage():
"""Test with memory monitoring."""
import psutil
process = psutil.Process()
initial_memory = process.memory_info().rss
# Run test
final_memory = process.memory_info().rss
print(f"Memory usage: {final_memory - initial_memory} bytes")
Debugging Commands¶
# Run single test with debug output
pytest tests/attacks/test_fgsm.py::test_fgsm_attack -v -s
# Run with print statements
pytest tests/attacks/test_fgsm.py::test_fgsm_attack -s
# Run with pdb debugger
pytest tests/attacks/test_fgsm.py::test_fgsm_attack --pdb
# Run with detailed error information
pytest tests/attacks/test_fgsm.py::test_fgsm_attack -vvv
๐ฏ Best Practices¶
Test Design¶
- Test One Thing: Each test should verify one specific behavior
- Use Descriptive Names: Test names should clearly describe what they test
- Arrange-Act-Assert: Structure tests with clear sections
- Use Fixtures: Reuse common setup code
- Test Edge Cases: Include boundary conditions and error cases
Test Maintenance¶
- Keep Tests Fast: Avoid slow operations in unit tests
- Use Mocks: Mock external dependencies
- Update Tests: Update tests when code changes
- Review Coverage: Regularly review and improve coverage
- Document Complex Tests: Add comments for complex test logic
Test Data¶
- Use Small Data: Use minimal data for fast tests
- Create Realistic Data: Use data that resembles real usage
- Avoid External Dependencies: Don't rely on external services
- Version Control Test Data: Include test data in repository
- Document Data Format: Document expected data formats
๐ Next Steps¶
After reading this testing guide:
- Set up your testing environment following the Development Setup Guide
- Write your first test using the examples above
- Run the test suite to ensure everything works
- Contribute tests for new features or bug fixes
- Review existing tests to understand patterns and conventions
Happy testing! ๐งช