Metrics API
This page documents the evaluation metrics components of the Segmentation Robustness Framework.
Metrics Classes
segmentation_robustness_framework.metrics.base_metrics
Classes
MetricsCollection(num_classes: int, ignore_index: int = 255)
Implements metrics to evaluate the quality of multiclass semantic segmentation models.
Attributes:
| Name | Type | Description |
|---|---|---|
| targets | Tensor | Ground truth segmentation mask [C, H, W], where each pixel value is the true class. |
| preds | Tensor | Predicted segmentation mask [C, H, W], where each pixel value is the predicted class. |
| num_classes | int | The number of classes. |
Initialize segmentation metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_classes | int | Number of classes for segmentation. | required |
| ignore_index | int | Index to ignore in evaluation. Defaults to 255. | 255 |
Raises:
| Type | Description |
|---|---|
| TypeError | If |
| ValueError | If |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
Functions
mean_iou(targets: torch.Tensor, preds: torch.Tensor, average: str = 'macro') -> float
Compute mean Intersection over Union metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Tensor | Ground-truth segmentation masks. | required |
| preds | Tensor | Predicted segmentation masks. | required |
| average | str | Type of averaging to use: "macro" or "micro". Defaults to "macro". | 'macro' |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Mean IoU. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
pixel_accuracy(targets: torch.Tensor, preds: torch.Tensor) -> float
Compute pixel accuracy metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Tensor | Ground-truth segmentation masks. | required |
| preds | Tensor | Predicted segmentation masks. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Pixel accuracy. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
precision(targets: torch.Tensor, preds: torch.Tensor, average: str = 'macro') -> float
Compute precision metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Tensor | Ground-truth segmentation masks. | required |
| preds | Tensor | Predicted segmentation masks. | required |
| average | str | Type of averaging to use: "macro" or "micro". Defaults to "macro". | 'macro' |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Precision metric. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
recall(targets: torch.Tensor, preds: torch.Tensor, average: str = 'macro') -> float
Compute recall metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Tensor | Ground-truth segmentation masks. | required |
| preds | Tensor | Predicted segmentation masks. | required |
| average | str | Type of averaging to use: "macro" or "micro". Defaults to "macro". | 'macro' |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Recall metric. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
dice_score(targets: torch.Tensor, preds: torch.Tensor, average: str = 'macro') -> float
Compute dice score.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Tensor | Ground-truth segmentation masks. | required |
| preds | Tensor | Predicted segmentation masks. | required |
| average | str | Type of averaging to use: "macro" or "micro". Defaults to "macro". | 'macro' |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Dice score. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
get_metric_with_averaging(metric_name: str, average: str = 'macro')
Get a metric function with the specified averaging strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metric_name | str | Name of the metric ('mean_iou', 'precision', 'recall', 'dice_score'). | required |
| average | str | Averaging strategy ('macro' or 'micro'). | 'macro' |
Returns:
| Name | Type | Description |
|---|---|---|
| callable | | Metric function with the specified averaging. |
Raises:
| Type | Description |
|---|---|
| ValueError | If metric_name is not supported or average is invalid. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
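One plausible way to build such a factory is to bind the `average` argument with `functools.partial`. The sketch below is an illustration only, not the framework's source: `get_metric_with_averaging_sketch`, `toy_precision`, and the lookup table `_METRICS` are hypothetical names, and the stand-in metric merely reports which averaging mode it received.

```python
from functools import partial


def toy_precision(targets, preds, average="macro"):
    """Stand-in metric (hypothetical): reports the averaging mode it was given."""
    return f"precision[{average}]"


# Hypothetical lookup table; the framework's real table is not shown on this page.
_METRICS = {"precision": toy_precision}
_AVERAGES = ("macro", "micro")


def get_metric_with_averaging_sketch(metric_name, average="macro"):
    """Return the named metric with its `average` argument pre-bound."""
    if metric_name not in _METRICS:
        raise ValueError(f"Unsupported metric: {metric_name!r}")
    if average not in _AVERAGES:
        raise ValueError(f"Invalid average: {average!r}")
    return partial(_METRICS[metric_name], average=average)


# The returned callable takes only (targets, preds), like pixel_accuracy does.
micro_precision = get_metric_with_averaging_sketch("precision", "micro")
result = micro_precision([0, 1], [0, 1])
```

Binding the argument up front gives every metric the same two-argument call shape, which is convenient when a list of metric callables is passed to a pipeline.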
get_all_metrics_with_averaging(include_pixel_accuracy: bool = True)
Get all metrics with both macro and micro averaging.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_pixel_accuracy | bool | Whether to include pixel_accuracy (no averaging). | True |
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | | (metrics_list, metric_names_list) with proper naming. |
Source code in segmentation_robustness_framework/metrics/base_metrics.py
segmentation_robustness_framework.metrics.custom_metrics
Functions
register_custom_metric(name: str) -> Callable
Register a custom metric function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name to register the metric under. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Callable | Callable | Decorator function. |
Example
@register_custom_metric("my_dice_score")
def my_dice_score(targets, preds):
    # Custom implementation
    return score
Source code in segmentation_robustness_framework/metrics/custom_metrics.py
get_custom_metric(name: str) -> Callable
Get a custom metric function by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the registered metric. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Callable | Callable | The metric function. |
Raises:
| Type | Description |
|---|---|
| KeyError | If the metric name is not registered. |
Source code in segmentation_robustness_framework/metrics/custom_metrics.py
list_custom_metrics() -> list[str]
List all registered custom metrics.
Returns:
| Type | Description |
|---|---|
| list[str] | List of registered metric names. |
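The three registry functions above fit a common decorator-plus-dictionary pattern. The following is a minimal sketch of that pattern under stated assumptions, not the framework's code: the module-level `_REGISTRY` dict and the `*_sketch` function names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical module-level store mapping metric names to functions.
_REGISTRY: Dict[str, Callable] = {}


def register_metric_sketch(name: str) -> Callable:
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[name] = fn
        return fn  # hand the function back unchanged so it stays usable directly
    return decorator


def get_metric_sketch(name: str) -> Callable:
    """Look up a registered metric; unknown names raise KeyError."""
    return _REGISTRY[name]


def list_metrics_sketch() -> List[str]:
    """Return registered names in insertion order."""
    return list(_REGISTRY)


@register_metric_sketch("exact_match")
def exact_match(targets, preds):
    # Toy metric: 1.0 if the two sequences are identical, else 0.0.
    return float(targets == preds)
```

Because the decorator returns the original function, `exact_match` can still be called directly as well as looked up by name.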
Metrics Overview
The framework provides comprehensive evaluation metrics for semantic segmentation tasks, including both standard metrics and custom implementations.
MetricsCollection
The main metrics container that provides standardized evaluation functions:
from segmentation_robustness_framework.metrics import MetricsCollection
# Initialize metrics collection
metrics = MetricsCollection(num_classes=21, ignore_index=255)
# Get metric functions for pipeline
metric_functions = [
    metrics.mean_iou,
    metrics.pixel_accuracy,
    metrics.precision,
    metrics.recall,
    metrics.dice_score,
]
Available Metrics
Mean IoU (Intersection over Union)
The most commonly used metric for semantic segmentation. It is called as metrics.mean_iou(targets, predictions) and returns a single float.
Features:
- Handles class imbalance
- Robust to different class distributions
- Standard benchmark metric
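The per-class IoU computation behind this metric can be sketched on flat Python lists. This is an illustration rather than the framework's tensor implementation: `mean_iou_sketch` is a hypothetical name, and skipping classes that are absent from both masks is an assumption about edge-case handling.

```python
def mean_iou_sketch(targets, preds, num_classes, ignore_index=255):
    """Macro-averaged IoU over flat sequences of class labels."""
    ious = []
    for c in range(num_classes):
        intersection = 0
        union = 0
        for t, p in zip(targets, preds):
            if t == ignore_index:
                continue  # ignored pixels contribute to neither count
            if t == c and p == c:
                intersection += 1
            if t == c or p == c:
                union += 1
        if union > 0:  # assumption: classes absent from both masks are skipped
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


targets = [0, 0, 1, 1, 255]
preds = [0, 1, 1, 1, 0]
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
score = mean_iou_sketch(targets, preds, num_classes=2)
```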
Pixel Accuracy
Overall pixel-level accuracy:
# Calculate pixel accuracy
accuracy = metrics.pixel_accuracy(targets, predictions)
print(f"Pixel Accuracy: {accuracy:.3f}")
Features:
- Simple and intuitive
- Fast computation
- Good for balanced datasets
Precision
Precision, computed per class and then combined into a single float according to average ("macro" by default):
# Calculate precision
precision = metrics.precision(targets, predictions)
print(f"Precision: {precision:.3f}")
Features:
- Per-class evaluation
- Useful for imbalanced datasets
- Detailed performance analysis
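Recall
Recall, computed per class and then combined into a single float according to average ("macro" by default). It is called as metrics.recall(targets, predictions).
Features:
- Per-class evaluation
- Completeness measure
- Balanced with precision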
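Per-class recall is the fraction of a class's ground-truth pixels that were predicted correctly. The sketch below works on flat Python lists and is not the framework's implementation; `per_class_recall` is a hypothetical helper, and omitting classes with no ground-truth pixels is an assumption.

```python
def per_class_recall(targets, preds, num_classes, ignore_index=255):
    """Return {class: recall} for classes that occur in the ground truth."""
    recalls = {}
    for c in range(num_classes):
        true_pos = 0
        false_neg = 0
        for t, p in zip(targets, preds):
            if t == ignore_index or t != c:
                continue  # only ground-truth pixels of class c matter for recall
            if p == c:
                true_pos += 1
            else:
                false_neg += 1
        if true_pos + false_neg > 0:
            recalls[c] = true_pos / (true_pos + false_neg)
    return recalls


targets = [0, 0, 0, 1, 1, 255]
preds = [0, 0, 1, 1, 0, 1]
recalls = per_class_recall(targets, preds, num_classes=2)
# Macro averaging: unweighted mean of the per-class values.
macro_recall = sum(recalls.values()) / len(recalls)
```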
Dice Score (F1-Score)
Harmonic mean of precision and recall:
# Calculate dice score
dice = metrics.dice_score(targets, predictions)
print(f"Dice Score: {dice:.3f}")
Features:
- Balanced metric
- Good for imbalanced classes
- Medical imaging standard
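The equivalence between the Dice score and the harmonic mean of precision and recall can be checked numerically from raw counts. The helper names and the example counts below are illustrative and not part of the framework.

```python
def dice_from_counts(tp, fp, fn):
    """Dice score from true-positive, false-positive, and false-negative counts."""
    return 2 * tp / (2 * tp + fp + fn)


def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)


tp, fp, fn = 6, 2, 4            # made-up counts for a single class
precision = tp / (tp + fp)      # 6 / 8
recall = tp / (tp + fn)         # 6 / 10
dice = dice_from_counts(tp, fp, fn)
f1 = harmonic_mean(precision, recall)
# dice and f1 agree up to floating-point rounding.
```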
Custom Metrics
Create custom metrics by implementing metric functions:
import torch
from segmentation_robustness_framework.pipeline import SegmentationRobustnessPipeline
def custom_metric(targets: torch.Tensor, predictions: torch.Tensor,
                  num_classes: int, ignore_index: int = 255) -> float:
    """Custom metric implementation."""
    # Remove ignored pixels
    mask = targets != ignore_index
    targets = targets[mask]
    predictions = predictions[mask]
    # Your custom metric calculation
    # Example: accuracy over the non-ignored pixels (apply per-class weights here for a weighted variant)
    correct = (targets == predictions).float()
    weighted_accuracy = correct.mean()
    return weighted_accuracy.item()
# Use custom metric in pipeline
# (model, dataset, and FGSM are assumed to be defined or imported elsewhere)
pipeline = SegmentationRobustnessPipeline(
    model=model,
    dataset=dataset,
    attacks=[FGSM(model, eps=0.1)],
    metrics=[custom_metric],
    batch_size=4,
    device="cuda"
)
Metric Registration
Register custom metrics for automatic discovery:
import torch
from segmentation_robustness_framework.metrics import register_custom_metric
@register_custom_metric("weighted_accuracy")
def weighted_accuracy(targets: torch.Tensor, predictions: torch.Tensor,
                      num_classes: int, ignore_index: int = 255) -> float:
    """Weighted accuracy metric."""
    # Remove ignored pixels
    mask = targets != ignore_index
    targets = targets[mask]
    predictions = predictions[mask]
    # Fraction of correct non-ignored pixels (apply per-class weights here for a weighted variant)
    correct = (targets == predictions).float()
    accuracy = correct.mean()
    return accuracy.item()
# Now you can use it in configuration
# metrics:
#   - weighted_accuracy
Metric Configuration
Configure metrics in YAML configuration files:
metrics:
  ignore_index: 255
  selected_metrics:
    - mean_iou
    - pixel_accuracy
    - precision
    - recall
    - {"name": "dice_score", "average": "micro"}
    - weighted_accuracy  # Custom metric
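The selected_metrics list mixes plain names with name/average mappings. A loader might normalize both forms into uniform pairs before resolving them to functions; the sketch below shows one way to do that on the already-parsed list. It is an assumption about how such entries could be handled, not the framework's loader, and `normalize_metric_entries` is a hypothetical name.

```python
def normalize_metric_entries(entries):
    """Turn mixed str/dict metric entries into (name, average) pairs.

    Plain strings get average=None, meaning "use the metric's own default".
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            normalized.append((entry, None))
        elif isinstance(entry, dict):
            normalized.append((entry["name"], entry.get("average")))
        else:
            raise TypeError(f"Unsupported metric entry: {entry!r}")
    return normalized


# The list as it would look after parsing the YAML above.
selected_metrics = [
    "mean_iou",
    "pixel_accuracy",
    {"name": "dice_score", "average": "micro"},
    "weighted_accuracy",
]
pairs = normalize_metric_entries(selected_metrics)
```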
Metric Usage in Pipeline
Metrics are automatically used by the pipeline:
from segmentation_robustness_framework.pipeline import SegmentationRobustnessPipeline
from segmentation_robustness_framework.metrics import MetricsCollection
# Create metrics collection
metrics = MetricsCollection(num_classes=21, ignore_index=255)
# Create pipeline with metrics
# (model, dataset, and FGSM are assumed to be defined or imported elsewhere)
pipeline = SegmentationRobustnessPipeline(
    model=model,
    dataset=dataset,
    attacks=[FGSM(model, eps=0.1)],
    metrics=[
        metrics.mean_iou,
        metrics.pixel_accuracy,
        metrics.precision,
        metrics.recall,
        metrics.dice_score
    ],
    batch_size=4,
    device="cuda"
)
# Run evaluation
results = pipeline.run()
# Access results
clean_iou = results['clean']['mean_iou']
attack_iou = results['attack_fgsm']['mean_iou']
print(f"Clean IoU: {clean_iou:.3f}")
print(f"Attack IoU: {attack_iou:.3f}")
Metric Aggregation
The averaged metrics (mean_iou, precision, recall, dice_score) take an average argument that selects the aggregation strategy:
# Micro averaging (global)
micro_precision = metrics.precision(targets, predictions, average='micro')
# Macro averaging (per-class then average)
macro_precision = metrics.precision(targets, predictions, average='macro')
# Weighted averaging (per-class weighted by frequency)
# Note: the API reference above lists only "macro" and "micro" for average; confirm that 'weighted' is supported before relying on it
weighted_precision = metrics.precision(targets, predictions, average='weighted')
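The difference between macro and micro averaging shows up most clearly on imbalanced data. The sketch below computes both for precision on flat Python lists; the helper names are hypothetical and the code illustrates the two averaging definitions, not the framework's implementation.

```python
def precision_counts(targets, preds, num_classes):
    """Return a (true_positives, false_positives) pair for each class."""
    counts = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(targets, preds) if p == c and t == c)
        fp = sum(1 for t, p in zip(targets, preds) if p == c and t != c)
        counts.append((tp, fp))
    return counts


def macro_precision(counts):
    """Average the per-class precisions, giving every class equal weight."""
    per_class = [tp / (tp + fp) for tp, fp in counts if tp + fp > 0]
    return sum(per_class) / len(per_class)


def micro_precision(counts):
    """Pool the counts over all classes, then compute one global precision."""
    tp_sum = sum(tp for tp, _ in counts)
    fp_sum = sum(fp for _, fp in counts)
    return tp_sum / (tp_sum + fp_sum)


# Eight pixels of class 0 and two of class 1; the rare class is never predicted correctly.
targets = [0] * 8 + [1] * 2
preds = [0] * 7 + [1] + [0, 0]
counts = precision_counts(targets, preds, num_classes=2)
macro = macro_precision(counts)  # pulled down by the failing rare class
micro = micro_precision(counts)  # dominated by the frequent class
```

In this example the macro value is much lower than the micro value, because macro averaging weights the failing rare class as heavily as the frequent one.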
Performance Considerations
- GPU Acceleration: All metrics support GPU computation
- Memory Efficiency: Optimized for large batches
- Batch Processing: Efficient batch metric computation
- Numerical Stability: Robust to edge cases
Metric Interpretation
Understanding metric results:
# Good performance indicators
good_iou = 0.8       # 80% IoU is excellent
good_accuracy = 0.9  # 90% accuracy is very good
good_dice = 0.85     # 85% dice score is excellent
# Poor performance indicators
poor_iou = 0.3       # 30% IoU indicates issues
poor_accuracy = 0.5  # 50% accuracy is poor
poor_dice = 0.4      # 40% dice score is poor
# Robustness evaluation (clean_iou and attack_iou come from the pipeline results)
robustness_ratio = attack_iou / clean_iou
if robustness_ratio > 0.8:
    print("Model is robust")
elif robustness_ratio > 0.5:
    print("Model has moderate robustness")
else:
    print("Model is vulnerable to attacks")
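The robustness check can be wrapped in a small helper that also guards against a zero clean score. This is a sketch that reuses the 0.8 and 0.5 thresholds from the snippet above; `robustness_label` is a hypothetical name, not a framework function.

```python
def robustness_label(clean_score, attack_score):
    """Return (ratio, label) comparing a metric under attack to its clean value."""
    if clean_score <= 0:
        # A ratio is undefined when the clean score is zero or negative.
        raise ValueError("clean_score must be positive")
    ratio = attack_score / clean_score
    if ratio > 0.8:
        label = "robust"
    elif ratio > 0.5:
        label = "moderate"
    else:
        label = "vulnerable"
    return ratio, label


# Example values chosen for illustration only.
ratio, label = robustness_label(clean_score=0.8, attack_score=0.72)
print(f"Robustness ratio: {ratio:.2f} ({label})")
```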