Datasets API¶
This page documents the dataset components of the Segmentation Robustness Framework.
Dataset Classes¶
segmentation_robustness_framework.datasets.voc¶
Classes¶
VOCSegmentation(split: str, root: Optional[Union[Path, str]] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)¶
Bases: Dataset
Pascal VOC 2012 dataset for semantic segmentation.
The Pascal VOC 2012 segmentation dataset covers 21 classes (20 object categories plus background) in natural scenes. Images are paired with pixel-level segmentation masks for training and evaluation.
Setup Instructions:
The dataset will be automatically downloaded and extracted if not present.
When download=True (default):
- If root is provided, the dataset will be stored at root/voc/VOCdevkit/VOC2012/.
- If root is None, the dataset will be cached in the default cache directory.
When download=False:
- The dataset must be present at the exact path specified by root.
- If root is None, the dataset will be looked for in the default cache directory.
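The root/download rules above can be summarized as a small path-resolution function. This is a minimal sketch, not the framework's implementation: `resolve_voc_dir` and `DEFAULT_CACHE_DIR` are hypothetical names, the cache location is an assumption, and the sketch assumes the cache uses the same nested layout as an explicit root.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical cache location; the framework's real default may differ.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "segmentation_datasets"


def resolve_voc_dir(
    root: Optional[Union[str, Path]],
    download: bool,
    cache_dir: Path = DEFAULT_CACHE_DIR,
) -> Path:
    """Return the directory where the VOC 2012 files are expected."""
    nested = Path("voc") / "VOCdevkit" / "VOC2012"
    if root is None:
        # No root given: both modes fall back to the cache directory.
        # Assumption: the cache mirrors the nested layout used under root.
        return cache_dir / nested
    if download:
        # Downloads are stored under root/voc/VOCdevkit/VOC2012/.
        return Path(root) / nested
    # download=False: root must already be the exact dataset path.
    return Path(root)
```

With `download=False`, the caller is responsible for passing the full dataset path, which is why the last branch returns `root` unchanged.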
Supported Splits:
- train: Training images (1,464 samples)
- val: Validation images (1,449 samples)
- trainval: Combined train and validation (2,913 samples)
Attributes:
| Name | Type | Description |
|---|---|---|
| root | str \| Path \| None | Directory for dataset storage or cache location. |
| split | str | Dataset split ('train', 'val', 'trainval'). |
| transform | callable | Image transformations. |
| target_transform | callable | Target transformations. |
| download | bool | Whether to download dataset if not present. |
| num_classes | int | Number of semantic classes (21). |
Initialize Pascal VOC 2012 dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split | str | Dataset split. Must be one of 'train', 'val', or 'trainval'. | required |
| root | str \| Path \| None | Directory for dataset storage. If None, the default cache directory is used. | None |
| transform | callable | Transform to apply to images. Defaults to None. | None |
| target_transform | callable | Transform to apply to masks. Defaults to None. | None |
| download | bool | Whether to download dataset if not present. Defaults to True. | True |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If dataset is not found and download fails. |
| ValueError | If split is not valid. |
Source code in segmentation_robustness_framework/datasets/voc.py
segmentation_robustness_framework.datasets.ade20k¶
Classes¶
ADE20K(split: str, root: Optional[Union[Path, str]] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)¶
Bases: Dataset
ADE20K dataset for semantic segmentation.
The ADE20K scene parsing benchmark contains 20,210 training images and 2,000 validation images annotated with 150 semantic categories. Images are paired with pixel-level segmentation masks for training and evaluation.
Setup Instructions:
The dataset will be automatically downloaded and extracted if not present.
When download=True (default):
- If root is provided, the dataset will be stored at root/ade20k/ADEChallengeData2016/.
- If root is None, the dataset will be cached in the default cache directory.
When download=False:
- The dataset must be present at the exact path specified by root.
- If root is None, the dataset will be looked for in the default cache directory.
Supported Splits:
- train: Training images (~20,000 samples)
- val: Validation images (~2,000 samples)
Attributes:
| Name | Type | Description |
|---|---|---|
| root | str \| Path \| None | Directory for dataset storage or cache location. |
| split | str | Dataset split ('train', 'val'). |
| transform | callable | Image transformations. |
| target_transform | callable | Target transformations. |
| download | bool | Whether to download dataset if not present. |
| num_classes | int | Number of semantic classes (150). |
Initialize ADE20K dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| split | str | Dataset split. Must be one of 'train' or 'val'. | required |
| root | str \| Path \| None | Directory for dataset storage. If None, the default cache directory is used. | None |
| transform | callable | Transform to apply to images. Defaults to None. | None |
| target_transform | callable | Transform to apply to masks. Defaults to None. | None |
| download | bool | Whether to download dataset if not present. Defaults to True. | True |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If dataset is not found and download fails. |
| ValueError | If split is not valid. |
Source code in segmentation_robustness_framework/datasets/ade20k.py
segmentation_robustness_framework.datasets.cityscapes¶
Classes¶
Cityscapes(root: Union[Path, str], split: str = 'train', mode: str = 'fine', target_type: str = 'semantic', transform: Optional[Callable] = None, target_transform: Optional[Callable] = None)¶
Bases: Dataset
Cityscapes dataset for semantic segmentation.
Cityscapes is a large-scale dataset for semantic understanding of urban street scenes. It contains high-quality pixel-level annotations for 5,000 images from 50 cities.
Setup Instructions:
- Register at https://www.cityscapes-dataset.com/
- Download the dataset files:
  - leftImg8bit_trainvaltest.zip (11GB) - training/validation/test images
  - gtFine_trainval.zip (241MB) - fine annotations for train/val
  - gtCoarse.zip (1.3GB) - coarse annotations for train/val/train_extra
  - leftImg8bit_trainextra.zip (44GB) - extra training images (optional)
- Extract all archives to the same root directory
- Ensure the directory structure matches what the loader expects
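A quick check before constructing the dataset can catch an incomplete extraction. The sketch below assumes the layout the official archives normally extract to (`leftImg8bit/<split>/` for images and `gtFine/<split>/` or `gtCoarse/<split>/` for annotations under the root); the helper `missing_cityscapes_dirs` is hypothetical and not part of the framework.

```python
from pathlib import Path
from typing import List, Union


def missing_cityscapes_dirs(
    root: Union[str, Path], split: str = "train", mode: str = "fine"
) -> List[Path]:
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    annotation_dir = "gtFine" if mode == "fine" else "gtCoarse"
    expected = [root / "leftImg8bit" / split]
    if split != "test":
        # Assumption: the test split needs no annotation directory,
        # since no annotations are available for it.
        expected.append(root / annotation_dir / split)
    return [path for path in expected if not path.is_dir()]
```

An empty return value means both directories exist; it does not verify that the files inside them are complete.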
Supported Splits:
- train: Training images with fine annotations
- val: Validation images with fine annotations
- test: Test images (no annotations available)
- train_extra: Extra training images with coarse annotations
Supported Modes:
- fine: High-quality pixel-level annotations
- coarse: Coarse polygon annotations
Supported Target Types:
- semantic: Semantic segmentation masks
- instance: Instance segmentation masks
- color: Color-coded visualization masks
- polygon: Polygon annotations (JSON format)
Attributes:
| Name | Type | Description |
|---|---|---|
| root | str \| Path | Path to the Cityscapes dataset root directory. |
| split | str | Dataset split ('train', 'val', 'test', 'train_extra'). |
| mode | str | Annotation mode ('fine' or 'coarse'). |
| target_type | str \| list | Type of target annotations. |
| transform | callable | Image transformations. |
| target_transform | callable | Target transformations. |
| num_classes | int | Number of semantic classes (35). |
Initialize Cityscapes dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path | Path to the Cityscapes dataset root directory. Must contain the extracted dataset files with proper directory structure. | required |
| split | str | Dataset split. Must be one of 'train', 'val', 'test', or 'train_extra'. Defaults to "train". | 'train' |
| mode | str | Annotation mode. Must be 'fine' or 'coarse'. Defaults to "fine". | 'fine' |
| target_type | str \| list | Type of target annotations. Can be a single type or list of types. Must be one or more of 'semantic', 'instance', 'color', 'polygon'. Defaults to "semantic". | 'semantic' |
| transform | callable | Transform to apply to images. Defaults to None. | None |
| target_transform | callable | Transform to apply to targets. Defaults to None. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If root directory does not exist. |
| ValueError | If split is not valid. |
| ValueError | If mode is not valid. |
| ValueError | If target_type is not valid. |
| ValueError | If test split is used with coarse mode. |
| ValueError | If train_extra split is used with fine mode. |
| ValueError | If required dataset files are missing. |
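The argument constraints listed above (valid splits, modes, target types, and the two disallowed split/mode combinations) can be expressed as one validation function. This is a sketch of the documented rules only; `validate_cityscapes_args` is a hypothetical helper, and the framework's own checks and error messages may differ.

```python
from typing import List, Union

VALID_SPLITS = ("train", "val", "test", "train_extra")
VALID_MODES = ("fine", "coarse")
VALID_TARGET_TYPES = ("semantic", "instance", "color", "polygon")


def validate_cityscapes_args(
    split: str, mode: str, target_type: Union[str, List[str]]
) -> List[str]:
    """Check the documented constraints; return target_type as a list."""
    if split not in VALID_SPLITS:
        raise ValueError(f"Invalid split {split!r}; expected one of {VALID_SPLITS}")
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid mode {mode!r}; expected one of {VALID_MODES}")
    if split == "test" and mode == "coarse":
        raise ValueError("The test split cannot be used with coarse mode")
    if split == "train_extra" and mode == "fine":
        raise ValueError("The train_extra split cannot be used with fine mode")
    # Accept either a single target type or a list of them.
    target_types = [target_type] if isinstance(target_type, str) else list(target_type)
    for name in target_types:
        if name not in VALID_TARGET_TYPES:
            raise ValueError(
                f"Invalid target_type {name!r}; expected one of {VALID_TARGET_TYPES}"
            )
    return target_types
```

Normalizing `target_type` to a list up front keeps the single-type and multi-type cases on the same code path.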
Source code in segmentation_robustness_framework/datasets/cityscapes.py
segmentation_robustness_framework.datasets.stanford_background¶
Classes¶
StanfordBackground(root: Optional[Union[Path, str]] = None, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = True)¶
Bases: Dataset
Stanford Background dataset for semantic segmentation.
The Stanford Background dataset contains 715 images with 9 semantic categories. Images are paired with pixel-level segmentation masks for training and evaluation.
Setup Instructions:
The dataset will be automatically downloaded and extracted if not present.
When download=True (default):
- If root is provided, the dataset will be stored at root/stanford_background/stanford_background/.
- If root is None, the dataset will be cached in the default cache directory.
When download=False:
- The dataset must be present at the exact path specified by root.
- If root is None, the dataset will be looked for in the default cache directory.
Dataset Structure:
- images/: Input RGB images
- labels_colored/: Segmentation masks (color images)
Attributes:
| Name | Type | Description |
|---|---|---|
| root | str \| Path \| None | Directory for dataset storage or cache location. |
| transform | callable | Image transformations. |
| target_transform | callable | Target transformations. |
| download | bool | Whether to download dataset if not present. |
| num_classes | int | Number of semantic classes (9). |
Initialize Stanford Background dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| root | str \| Path \| None | Directory for dataset storage. If None, the default cache directory is used. | None |
| transform | callable | Transform to apply to images. Defaults to None. | None |
| target_transform | callable | Transform to apply to masks. Defaults to None. | None |
| download | bool | Whether to download dataset if not present. Defaults to True. | True |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If dataset is not found and download fails. |
Source code in segmentation_robustness_framework/datasets/stanford_background.py
Dataset Loaders¶
segmentation_robustness_framework.loaders.dataset_loader¶
Classes¶
DatasetLoader(dataset_config: dict[str, Any])¶
Load and configure datasets for image segmentation tasks.
The DatasetLoader initializes and loads a dataset by name using the provided configuration, and applies preprocessing to input images and segmentation masks.
Supported attributes
- config (dict[str, Any]): Configuration specifying the dataset and its parameters.
- dataset_name (str): Name of the dataset to be loaded (e.g., VOC, ADE20K).
- root (str): Root directory where the dataset is located.
- images_shape (list[int]): Desired image shape for preprocessing [height, width].
Example
Initialize the DatasetLoader with a dataset configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_config | dict[str, Any] | Configuration specifying the dataset and its parameters. | required |
Source code in segmentation_robustness_framework/loaders/dataset_loader.py
Functions¶
load_dataset() -> Dataset¶
Loads and preprocesses the specified dataset.
Based on the dataset name in the configuration, the corresponding dataset class is initialized with appropriate preprocessing transformations applied.
Returns:
| Name | Type | Description |
|---|---|---|
| Dataset | Dataset | An instance of the dataset class ready for training or evaluation. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the specified dataset name is not recognized. |
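Name-based loading of this kind usually comes down to a lookup table from dataset name to constructor. The sketch below illustrates that pattern under stated assumptions: `build_dataset` and the `registry` argument are hypothetical, names are assumed to be matched case-insensitively, and the real loader additionally builds preprocessing transforms from the image shape.

```python
from typing import Any, Callable, Dict


def build_dataset(
    config: Dict[str, Any], registry: Dict[str, Callable[..., Any]]
) -> Any:
    """Instantiate the dataset named in config using a name -> factory map."""
    name = str(config["name"]).lower()
    if name not in registry:
        raise ValueError(
            f"Unknown dataset {name!r}; available: {sorted(registry)}"
        )
    # Keys consumed by the loader itself are not passed to the constructor.
    kwargs = {
        key: value
        for key, value in config.items()
        if key not in ("name", "image_shape")
    }
    return registry[name](**kwargs)
```

In practice the registry would map names such as "voc" or "ade20k" to the dataset classes documented above.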
Source code in segmentation_robustness_framework/loaders/dataset_loader.py
Dataset Overview¶
The framework provides support for popular semantic segmentation datasets with automatic preprocessing and data loading.
Available Datasets¶
VOC (PASCAL VOC 2012)¶
The PASCAL Visual Object Classes dataset:
from segmentation_robustness_framework.datasets import VOCSegmentation
# transform and target_transform are callables defined elsewhere (or None)
# Load VOC dataset
dataset = VOCSegmentation(
    root="./data",
    split="val",
    transform=transform,
    target_transform=target_transform,
)
print(f"Dataset size: {len(dataset)}")
print(f"Number of classes: {dataset.num_classes}")
Features:
- 20 object classes + background
- High-quality pixel-level annotations
- Standard benchmark dataset
- Automatic download support
ADE20K (MIT Scene Parsing)¶
The MIT ADE20K dataset for scene parsing:
from segmentation_robustness_framework.datasets import ADE20K
# transform and target_transform are callables defined elsewhere (or None)
# Load ADE20K dataset
dataset = ADE20K(
    root="./data",
    split="val",
    transform=transform,
    target_transform=target_transform,
)
print(f"Dataset size: {len(dataset)}")
print(f"Number of classes: {dataset.num_classes}")
Features:
- 150 semantic categories
- Complex scene understanding
- High-resolution images
- Detailed annotations
Cityscapes¶
Urban scene understanding dataset:
from segmentation_robustness_framework.datasets import Cityscapes
# transform and target_transform are callables defined elsewhere (or None)
# Load Cityscapes dataset (root must contain the manually downloaded files)
dataset = Cityscapes(
    root="./data",
    split="val",
    mode="fine",
    target_type="semantic",
    transform=transform,
    target_transform=target_transform,
)
print(f"Dataset size: {len(dataset)}")
print(f"Number of classes: {dataset.num_classes}")
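Features:
- 19 semantic categories in the standard evaluation set (num_classes is documented as 35, presumably counting every label ID including void classes)
- High-resolution urban images
- Fine and coarse annotations
- Multiple annotation types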
Stanford Background¶
Natural scene parsing dataset:
from segmentation_robustness_framework.datasets import StanfordBackground
# transform and target_transform are callables defined elsewhere (or None)
# Load Stanford Background dataset
dataset = StanfordBackground(
    root="./data",
    transform=transform,
    target_transform=target_transform,
)
print(f"Dataset size: {len(dataset)}")
print(f"Number of classes: {dataset.num_classes}")
Features:
- 9 classes as reported by num_classes (the dataset's 8 semantic categories plus, presumably, an unknown label)
- Natural outdoor scenes
- High-quality annotations
- Compact dataset for testing
Dataset Configuration¶
Configure datasets in YAML configuration files. The framework automatically applies preprocessing based on the image_shape parameter:
dataset:
  name: voc
  split: val
  root: ./data
  image_shape: [512, 512]  # Automatically applies resize, normalize, and mask conversion
Dataset Loading¶
Use the DatasetLoader for automatic dataset loading with preprocessing:
from segmentation_robustness_framework.loaders import DatasetLoader
# Load dataset with configuration
dataset_config = {
    "name": "voc",
    "split": "val",
    "root": "./data",
    "image_shape": [512, 512],  # Automatically applies preprocessing
}
dataset_loader = DatasetLoader(dataset_config)
dataset = dataset_loader.load_dataset()
Automatic Preprocessing¶
The framework automatically applies preprocessing based on the image_shape parameter:
from segmentation_robustness_framework.utils.image_preprocessing import get_preprocessing_fn
# Get preprocessing functions (automatically called by DatasetLoader)
image_preprocess, target_preprocess = get_preprocessing_fn(
    image_shape=[512, 512],
    dataset_name="voc",
)
# The preprocessing includes:
# - Image resize to specified shape
# - Image normalization (ImageNet stats)
# - Mask resize to match image
# - RGB to index conversion for masks
# - Stride alignment (ensures dimensions are divisible by 8)
What Gets Applied Automatically¶
When you specify image_shape in the dataset configuration, the framework automatically applies:
Image Preprocessing¶
- Resize: Images are resized to the specified [height, width]
- Normalization: Images are normalized using ImageNet statistics (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- Tensor Conversion: Images are converted to PyTorch tensors
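To make the normalization step concrete, the arithmetic for a single RGB pixel is shown below. This is a plain-Python illustration of the usual scale-then-standardize convention (values scaled to [0, 1], then `(x - mean) / std` per channel); `normalize_pixel` is a hypothetical helper, and the framework is assumed to apply the same formula to whole tensors.

```python
from typing import List, Sequence

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb: Sequence[int]) -> List[float]:
    """Scale an 8-bit RGB pixel to [0, 1], then standardize per channel."""
    return [
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    ]
```

A black pixel therefore maps to negative values (about -2.1 in the red channel) rather than zero, which matters when choosing perturbation budgets in normalized space.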
Mask Preprocessing¶
- Resize: Masks are resized to match the image dimensions
- RGB to Index: RGB masks are converted to class indices using dataset-specific color palettes
- Stride Alignment: Dimensions are adjusted to be divisible by 8 for model compatibility
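Stride alignment can be sketched as rounding each dimension to a multiple of 8. The helper below is hypothetical and assumes rounding down (with a floor of one stride); the framework may round up or pad instead.

```python
from typing import Tuple


def align_to_stride(height: int, width: int, stride: int = 8) -> Tuple[int, int]:
    """Round each dimension down to the nearest positive multiple of stride."""

    def align(size: int) -> int:
        # Never go below one full stride, even for very small inputs.
        return max(stride, (size // stride) * stride)

    return align(height), align(width)
```

For example, a requested shape of [500, 375] becomes (496, 368), while [512, 512] is already aligned and stays unchanged.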
Dataset-Specific Features¶
- Color Mapping: Each dataset has its own color palette for mask conversion
- Ignore Index: Proper handling of ignored pixels (usually index 255)
- Error Handling: Warnings for unmapped colors in masks
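The three points above fit together in the RGB-to-index conversion: each pixel color is looked up in the dataset palette, and colors without an entry are mapped to the ignore index with a warning. The sketch below uses nested lists for clarity; `rgb_mask_to_indices` is a hypothetical helper, the ignore value 255 follows the text above, and the actual implementation presumably works on arrays or tensors.

```python
import warnings
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int]
IGNORE_INDEX = 255


def rgb_mask_to_indices(
    mask: Sequence[Sequence[Color]], palette: Dict[Color, int]
) -> List[List[int]]:
    """Convert an RGB mask (rows of color tuples) to class indices."""
    unmapped = set()
    index_mask = []
    for row in mask:
        index_row = []
        for color in row:
            color = tuple(color)
            if color not in palette:
                unmapped.add(color)
            # Colors missing from the palette become the ignore index.
            index_row.append(palette.get(color, IGNORE_INDEX))
        index_mask.append(index_row)
    if unmapped:
        warnings.warn(
            f"{len(unmapped)} unmapped color(s) set to {IGNORE_INDEX}: {sorted(unmapped)}"
        )
    return index_mask
```

Collecting the unmapped colors and warning once per mask avoids emitting one warning per pixel.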
Custom Datasets¶
Create custom datasets by inheriting from torch.utils.data.Dataset:
import torch
from torch.utils.data import Dataset
from PIL import Image
import os
class MyCustomDataset(Dataset):
    def __init__(self, root, split="train", transform=None, target_transform=None):
        self.root = root
        self.split = split
        self.transform = transform
        self.target_transform = target_transform
        self.num_classes = 21
        # Load your data
        self.images = []  # List of image paths
        self.masks = []  # List of mask paths
    def __len__(self):
        return len(self.images)
    def __getitem__(self, idx):
        # Load image and mask
        image = Image.open(self.images[idx]).convert("RGB")
        mask = Image.open(self.masks[idx])
        # Apply transforms
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            mask = self.target_transform(mask)
        return image, mask
# Use custom dataset
dataset = MyCustomDataset("./data", split="train")
Dataset Registration¶
Register custom datasets for automatic discovery:
from torch.utils.data import Dataset
from segmentation_robustness_framework.datasets import register_dataset
@register_dataset("my_custom")
class MyCustomDataset(Dataset):
    def __init__(self, root, split="train", transform=None, target_transform=None):
        # Your dataset implementation
        pass
    def __len__(self):
        return len(self.images)
    def __getitem__(self, idx):
        # Your data loading logic
        pass
# Now you can use it in configuration
# dataset:
#   name: my_custom
#   split: train
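For readers unfamiliar with decorator-based registration, the general pattern is a module-level dictionary that the decorator fills in at class-definition time. The sketch below is a generic illustration with hypothetical names (`DATASET_REGISTRY`, `register`); it is not the framework's `register_dataset`, whose behavior on duplicate names and lookup details may differ.

```python
from typing import Callable, Dict

# Maps a configuration name to the dataset class registered under it.
DATASET_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the class under the given name."""

    def decorator(cls: type) -> type:
        if name in DATASET_REGISTRY:
            raise ValueError(f"Dataset {name!r} is already registered")
        DATASET_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator
```

Because registration happens when the class is defined, the module containing the decorated class has to be imported before the name can be resolved from a configuration file.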
Dataset Usage in Pipeline¶
Datasets are automatically used by the pipeline:
from segmentation_robustness_framework.pipeline import SegmentationRobustnessPipeline
# Assumes model, dataset, FGSM, and metrics are already defined or imported
# Create pipeline with dataset
pipeline = SegmentationRobustnessPipeline(
    model=model,
    dataset=dataset,  # Your dataset here
    attacks=[FGSM(model, eps=0.1)],
    metrics=[metrics.mean_iou],
    batch_size=4,
    device="cuda",
)
results = pipeline.run()
Performance Considerations¶
- Memory Efficiency: Lazy loading for large datasets
- GPU Compatibility: Automatic device placement
- Batch Processing: Optimized for batch inference
- Data Augmentation: Built-in augmentation support