Code Documentation
This section provides a complete overview of the internal modules of
biomedical-data-generator.
It is intended for developers, contributors, and advanced users who want
to understand or extend the code base.
The API documentation is automatically generated using Sphinx
autodoc and autosummary.
Each module listed below expands into a separate page in the
_autosummary directory.
—
Configuration Models
These classes define the full dataset configuration, including class structure, correlated clusters, noise distribution, and optional batch effects.
Configuration for a single class in the dataset.
Configuration for simulating batch effects.
Correlated feature cluster simulating coordinated biomarker patterns.
Configuration for synthetic dataset generation.
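As a rough illustration of how configuration objects of this kind might fit together, the sketch below uses plain standard-library dataclasses. The names ClassSpec, ClusterSpec, and DatasetSpec, and all of their fields, are hypothetical stand-ins chosen for this example. They are not the package's actual classes; the generated pages remain the reference for the real signatures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical per-class settings: a label and a sample count."""
    label: str
    n_samples: int


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical correlated cluster: size and target pairwise correlation."""
    n_features: int
    correlation: float

    def __post_init__(self):
        # Reject values that cannot be a valid equicorrelation target.
        if not 0.0 <= self.correlation < 1.0:
            raise ValueError("correlation must be in [0, 1)")


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical top-level configuration combining the pieces above."""
    classes: Tuple[ClassSpec, ...]
    n_informative: int = 0
    n_noise: int = 0
    clusters: Tuple[ClusterSpec, ...] = ()
    seed: int = 0

    @property
    def n_samples(self) -> int:
        # Total samples is the sum over all classes.
        return sum(c.n_samples for c in self.classes)

    @property
    def n_features(self) -> int:
        # Informative + noise + every feature inside a correlated cluster.
        clustered = sum(c.n_features for c in self.clusters)
        return self.n_informative + self.n_noise + clustered


spec = DatasetSpec(
    classes=(ClassSpec("healthy", 30), ClassSpec("diseased", 20)),
    n_informative=4,
    n_noise=10,
    clusters=(ClusterSpec(n_features=3, correlation=0.8),),
)
print(spec.n_samples, spec.n_features)
```

Deriving the sample and feature counts from the parts, rather than storing them separately, keeps a configuration from contradicting itself.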
—
Dataset Generator
The central entry point for creating synthetic datasets.
Generate a synthetic biomedical dataset with the specified feature structure.
—
Feature Generators
Functions responsible for generating informative features, noise features, and correlated feature clusters.
Informative features
Generation of free informative features and class separation.
Independent noise features
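The three kinds of features named in this section can be illustrated with a minimal standard-library sketch. It assumes Gaussian features, a mean shift proportional to the class index for informative features, and a single shared latent factor per sample for a correlated cluster. The function names and parameters (class_sep, rho) are hypothetical and do not reflect the package's own functions or its exact generative model.

```python
import math
import random


def informative_features(labels, n_features, class_sep, rng):
    """Gaussian features whose mean is class_sep times the class index."""
    return [[rng.gauss(class_sep * y, 1.0) for _ in range(n_features)]
            for y in labels]


def noise_features(n_samples, n_features, rng):
    """Standard normal features that carry no class information."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_samples)]


def correlated_cluster(n_samples, n_features, rho, rng):
    """Unit-variance features with pairwise correlation rho (shared factor)."""
    rows = []
    for _ in range(n_samples):
        shared = rng.gauss(0.0, 1.0)  # latent factor common to the cluster
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(n_features)
        ])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def class_gap(rows, labels, col):
    """Difference of the class-1 and class-0 means in one column."""
    ones = [r[col] for r, y in zip(rows, labels) if y == 1]
    zeros = [r[col] for r, y in zip(rows, labels) if y == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


rng = random.Random(42)
labels = [0] * 200 + [1] * 200
inf = informative_features(labels, n_features=2, class_sep=3.0, rng=rng)
clu = correlated_cluster(len(labels), n_features=3, rho=0.8, rng=rng)
noi = noise_features(len(labels), n_features=2, rng=rng)

# Column layout: 0-1 informative, 2-4 correlated cluster, 5-6 noise.
X = [a + b + c for a, b, c in zip(inf, clu, noi)]

gap_informative = class_gap(X, labels, 0)
gap_noise = class_gap(X, labels, 6)
cluster_corr = pearson([r[2] for r in X], [r[3] for r in X])
```

In this sketch the informative column separates the two classes by roughly class_sep, the noise column shows no systematic gap, and two columns of the cluster correlate at roughly rho.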
—
Batch Effects
Simulation of site effects, instrument variation, temporal drift, and confounding with class labels.
Batch effect simulation for synthetic biomedical datasets.
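One simple way to picture a site effect that is confounded with the class label is an additive per-batch offset combined with a biased batch assignment. The sketch below assumes exactly that model. The helper names and the confounding and scale parameters are hypothetical, and the package's own batch-effect simulation may use a different model and different options.

```python
import random


def assign_batches(labels, n_batches, confounding, rng):
    """Assign one batch per sample.

    With probability `confounding` the batch follows the class label
    (label modulo n_batches); otherwise it is drawn uniformly at random.
    """
    batches = []
    for y in labels:
        if rng.random() < confounding:
            batches.append(y % n_batches)
        else:
            batches.append(rng.randrange(n_batches))
    return batches


def apply_batch_shift(X, batches, n_batches, scale, rng):
    """Add one Gaussian offset per (batch, feature) to each sample's row."""
    n_features = len(X[0])
    offsets = [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_batches)]
    shifted = [[v + offsets[b][j] for j, v in enumerate(row)]
               for row, b in zip(X, batches)]
    return shifted, offsets


rng = random.Random(0)
labels = [0] * 100 + [1] * 100
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in labels]

batches = assign_batches(labels, n_batches=2, confounding=0.8, rng=rng)
X_shifted, offsets = apply_batch_shift(X, batches, n_batches=2, scale=2.0, rng=rng)

# Fraction of samples whose batch index equals their class label.
agreement = sum(b == y for b, y in zip(batches, labels)) / len(labels)
```

With two batches and confounding=0.8, the expected agreement between batch and class is 0.8 + 0.2 * 0.5 = 0.9, which is why a classifier trained on such data can end up learning the batch offset rather than the biology.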
—
Metadata
Structured metadata describing the full generative process, including feature roles, class labels, correlated clusters, batch labels, and derived dataset properties.
Metadata about the generated dataset.
—
Utility Modules (Optional)
Helper functions for data manipulation, visualization, and integration with scikit-learn.
Correlation analysis and seed search utilities (no plotting).
Export utilities for saving generated datasets to various formats.
Plot utilities for correlation analysis.
Sklearn-like convenience wrapper around biomedical-data-generator.