Code Documentation

This section provides a complete overview of the internal modules of biomedical-data-generator. It is intended for developers, contributors, and advanced users who want to understand or extend the code base.

The API documentation is generated automatically with Sphinx autodoc and autosummary. Each entry listed below (module, class, or function) expands into a separate page in the _autosummary directory.
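For contributors who want to rebuild these pages locally, the following is a minimal sketch of the Sphinx settings that enable autodoc and autosummary. The file location (docs/conf.py) is an assumption, and the project's actual configuration may contain more than this. The _autosummary output directory is assumed to come from a `:toctree: _autosummary` option on the autosummary directives.

```python
# docs/conf.py (assumed location; a sketch, not the project's actual file)

extensions = [
    "sphinx.ext.autodoc",      # pulls documentation from docstrings
    "sphinx.ext.autosummary",  # builds summary tables and per-entry stub pages
]

# Ask autosummary to write the stub pages at build time instead of
# requiring them to be checked in by hand.
autosummary_generate = True
```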

Configuration Models

These classes define the full dataset configuration, including class structure, correlated clusters, noise distribution, and optional batch effects.

biomedical_data_generator.config.ClassConfig

Configuration for a single class in the dataset.

biomedical_data_generator.config.BatchEffectsConfig

Configuration for simulating batch effects.

biomedical_data_generator.config.CorrClusterConfig

Correlated feature cluster simulating coordinated biomarker patterns.

biomedical_data_generator.config.DatasetConfig

Configuration for synthetic dataset generation.

Dataset Generator

The central entry point for creating synthetic datasets.

biomedical_data_generator.generate_dataset

Generate synthetic biomedical dataset with specified feature structure.

Feature Generators

Modules responsible for generating informative features, noise features, and correlated feature clusters.

Informative features

biomedical_data_generator.features.informative

Generation of free informative features and class separation.
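To make the idea of class separation concrete, here is a minimal NumPy sketch that shifts each class mean by a fixed distance. It is an illustration under stated assumptions, not the code in biomedical_data_generator.features.informative: the function name `sample_informative` and the parameter `class_sep` are hypothetical.

```python
import numpy as np


def sample_informative(y, n_features, class_sep, seed=0):
    """Hypothetical helper: unit-variance Gaussian features whose mean
    depends on the class label, with adjacent classes `class_sep` apart."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Place class centers symmetrically around zero, class_sep apart.
    centers = (np.arange(len(classes)) - (len(classes) - 1) / 2) * class_sep
    # Map each label to the index of its class in the sorted class array.
    class_index = np.searchsorted(classes, y)
    noise = rng.standard_normal((len(y), n_features))
    return noise + centers[class_index][:, None]


# Two balanced classes, three informative features.
y_demo = np.repeat([0, 1], 2000)
X_informative = sample_informative(y_demo, n_features=3, class_sep=2.0)
```

A larger `class_sep` makes the classes easier to distinguish. Values close to zero leave the features nearly uninformative.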

Correlated feature clusters

biomedical_data_generator.features.correlated

Generation of correlated feature clusters simulating pathway-like modules.
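One standard way to obtain a cluster of features with a shared pairwise correlation is to multiply independent normal draws by the Cholesky factor of the target correlation matrix. The sketch below illustrates that technique only. Whether biomedical_data_generator.features.correlated uses the same approach is not stated here, and the function name and parameters are hypothetical.

```python
import numpy as np


def sample_equicorrelated_cluster(n_samples, n_features, rho, seed=0):
    """Hypothetical helper: standard-normal features that all share the
    pairwise correlation `rho`.

    `rho` must lie in (-1 / (n_features - 1), 1) so that the target
    matrix is positive definite and the Cholesky factorization exists.
    """
    rng = np.random.default_rng(seed)
    # Target correlation matrix: 1 on the diagonal, rho everywhere else.
    corr = np.full((n_features, n_features), float(rho))
    np.fill_diagonal(corr, 1.0)
    # Lower-triangular factor with chol @ chol.T == corr.
    chol = np.linalg.cholesky(corr)
    independent = rng.standard_normal((n_samples, n_features))
    # Mixing independent columns with chol.T imposes the target correlation.
    return independent @ chol.T


cluster = sample_equicorrelated_cluster(5000, 4, rho=0.7)
empirical_corr = np.corrcoef(cluster, rowvar=False)
```

The empirical correlations in `empirical_corr` approach `rho` as the sample count grows. Small samples can deviate noticeably.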

Independent noise features

Batch Effects

Simulation of site effects, instrument variation, temporal drift, and confounding with class labels.

biomedical_data_generator.effects.batch

Batch effect simulation for synthetic biomedical datasets.
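The interaction between batch assignment and class labels can be sketched in a few lines of NumPy. This example covers only additive per-batch shifts with a tunable degree of confounding. It is not the implementation in biomedical_data_generator.effects.batch, and the function name `add_batch_shift` and all of its parameters are hypothetical.

```python
import numpy as np


def add_batch_shift(X, y, n_batches, confounding, shift_scale=1.0, seed=0):
    """Hypothetical helper: add one additive offset vector per batch.

    With probability `confounding` a sample's batch is determined by its
    class label; otherwise the batch is drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    random_batch = rng.integers(0, n_batches, size=n_samples)
    label_batch = y % n_batches  # batch fully determined by the class
    use_label = rng.random(n_samples) < confounding
    batch = np.where(use_label, label_batch, random_batch)
    # Each batch gets its own offset for every feature.
    offsets = rng.normal(0.0, shift_scale, size=(n_batches, n_features))
    return X + offsets[batch], batch


demo_rng = np.random.default_rng(1)
X_clean = demo_rng.standard_normal((200, 3))
y_labels = demo_rng.integers(0, 2, size=200)
X_batched, batch_ids = add_batch_shift(
    X_clean, y_labels, n_batches=2, confounding=1.0
)
```

With `confounding=1.0` the batch label carries exactly the same information as the class label, so a model can appear to separate the classes by learning the batch offset alone. With `confounding=0.0` the batches are assigned independently of the class.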

Metadata

Structured metadata describing the full generative process, including feature roles, class labels, correlated clusters, batch labels, and derived dataset properties.

biomedical_data_generator.meta.DatasetMeta

Metadata about the generated dataset.

Utility Modules (Optional)

Helper functions for correlation analysis, dataset export, visualization, and integration with scikit-learn.

biomedical_data_generator.utils.correlation_tools

Correlation analysis and seed search utilities (no plotting).

biomedical_data_generator.utils.export_utils

Export utilities for saving generated datasets to various formats.

biomedical_data_generator.utils.visualization

Plot utilities for correlation analysis.

biomedical_data_generator.utils.sklearn_compat

Sklearn-like convenience wrapper around biomedical-data-generator.