biomedical_data_generator.utils.export_utils
Export utilities for saving generated datasets to various formats.
Functions
| to_csv | Export dataset to CSV file. |
| to_labeled_dataframe | Convert generated dataset to DataFrame with optional labels. |
| to_parquet | Export dataset to Parquet file (efficient for large datasets). |
- biomedical_data_generator.utils.export_utils.to_csv(x, y, meta, filepath, *, include_labels=True, index=False, **csv_kwargs)[source]
Export dataset to CSV file.
Convenience wrapper around to_labeled_dataframe() + DataFrame.to_csv().
- Parameters:
x (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix.
y – Class labels, one per row of x.
meta (DatasetMeta) – Dataset metadata.
filepath (str | Path) – Output path (e.g., "data/train.csv").
include_labels (bool) – If True, include label columns.
index (bool) – If True, write row indices to CSV.
**csv_kwargs – Additional arguments forwarded to pd.DataFrame.to_csv() (e.g., sep=';', encoding='utf-8').
- Return type:
None
Examples
>>> to_csv(x, y, meta, "output/dataset.csv", index=False)
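The wrapper pattern described above (build a DataFrame, then call DataFrame.to_csv()) can be sketched with plain pandas. The helper export_csv_sketch below is a hypothetical stand-in, not the library's implementation. Its feature_names argument and the "y" label column are assumptions that replace what meta would normally supply.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def export_csv_sketch(x, y, feature_names, filepath, *,
                      include_labels=True, index=False, **csv_kwargs):
    """Hypothetical stand-in for to_csv(): build a DataFrame, then write it."""
    df = pd.DataFrame(np.asarray(x), columns=list(feature_names))
    if include_labels:
        # Assumed label column name; the real function may also add string labels.
        df["y"] = np.asarray(y)
    # Extra keyword arguments are passed through unchanged to DataFrame.to_csv().
    df.to_csv(filepath, index=index, **csv_kwargs)


x = np.array([[0.1, 1.2], [0.3, 1.4], [0.5, 1.6]])
y = np.array([0, 1, 0])

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "dataset.csv"
    export_csv_sketch(x, y, ["f1", "f2"], out_path, sep=";")
    csv_text = out_path.read_text()
    restored = pd.read_csv(out_path, sep=";")
```

Because index is an explicit keyword of the wrapper, only options such as sep or encoding need to travel through **csv_kwargs.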
- biomedical_data_generator.utils.export_utils.to_labeled_dataframe(x, y=None, meta=None, *, include_labels=True, label_col_name='y', label_str_col_name='y_label', feature_names=None)[source]
Convert generated dataset to DataFrame with optional labels.
Flexible conversion supporting multiple use cases:
1. Full conversion: x + y + meta → df with features + labels
2. Features only: x + meta → df with features (no labels)
3. Custom names: override default column names
- Parameters:
x (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix (DataFrame or ndarray).
y (ndarray[tuple[Any, ...], dtype[int64]] | None) – Optional class labels (integers 0 to n_classes-1).
meta (DatasetMeta | None) – Optional dataset metadata.
include_labels (bool) – If True and y is provided, add label columns.
label_col_name (str) – Column name for numeric labels.
label_str_col_name (str) – Column name for string labels.
feature_names (list[str] | None) – Override meta.feature_names (for custom naming).
- Returns:
DataFrame with requested columns.
- Raises:
ValueError – If the shapes of the inputs do not match or required arguments are missing.
- Return type:
DataFrame
Examples
>>> # Standard usage
>>> df = to_labeled_dataframe(x, y, meta)
>>> # Features only
>>> df_features = to_labeled_dataframe(x, meta=meta, include_labels=False)
>>> # Custom column names
>>> df = to_labeled_dataframe(x, y, meta,
...                           label_col_name="class",
...                           label_str_col_name="diagnosis")
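To make the documented behavior concrete without depending on DatasetMeta, the following is a minimal sketch of the same conversion logic in plain pandas. The function labeled_frame_sketch and its class_names argument are hypothetical. The real function takes a meta object instead, and how it derives string labels is not stated here.

```python
import numpy as np
import pandas as pd


def labeled_frame_sketch(x, y=None, *, feature_names=None, class_names=None,
                         include_labels=True, label_col_name="y",
                         label_str_col_name="y_label"):
    """Hypothetical re-creation of the documented behavior, not the library code."""
    x = np.asarray(x, dtype=float)
    if feature_names is None:
        # Assumed fallback naming when no names are supplied.
        feature_names = [f"feature_{i}" for i in range(x.shape[1])]
    if len(feature_names) != x.shape[1]:
        raise ValueError("feature_names must match the number of columns in x")
    df = pd.DataFrame(x, columns=list(feature_names))
    if include_labels and y is not None:
        y = np.asarray(y)
        if y.shape[0] != x.shape[0]:
            raise ValueError("x and y must have the same number of rows")
        df[label_col_name] = y
        if class_names is not None:
            # Map integer labels 0..n_classes-1 to readable names.
            df[label_str_col_name] = [class_names[i] for i in y]
    return df


x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0, 1])

full = labeled_frame_sketch(
    x, y,
    feature_names=["age", "crp"],
    class_names=["healthy", "diseased"],
    label_col_name="class",
    label_str_col_name="diagnosis",
)
features_only = labeled_frame_sketch(
    x, y, feature_names=["age", "crp"], include_labels=False
)
```

Here full carries the columns age, crp, class, and diagnosis, while features_only keeps just the two feature columns.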
- biomedical_data_generator.utils.export_utils.to_parquet(X, y, meta, filepath, *, include_labels=True, **parquet_kwargs)[source]
Export dataset to Parquet file (efficient for large datasets).
- Parameters:
X (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix.
y – Class labels, one per row of X.
meta (DatasetMeta) – Dataset metadata.
filepath (str | Path) – Output path (e.g., "data/train.parquet").
include_labels (bool) – If True, include label columns.
**parquet_kwargs – Additional arguments for pd.DataFrame.to_parquet() (e.g., compression='gzip', engine='pyarrow').
- Return type:
None
Examples
>>> to_parquet(X, y, meta, "output/dataset.parquet")
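pandas writes Parquet through an optional engine (pyarrow or fastparquet), so a call like the one above only works when one of them is installed. The sketch below uses plain pandas rather than to_parquet() itself. The small DataFrame is an assumed stand-in for what the function would assemble from X, y, and meta, and the compression argument illustrates the kind of option that could be passed via **parquet_kwargs.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Assumed stand-in for the DataFrame built from X, y and meta.
df = pd.DataFrame({
    "f1": np.array([0.1, 0.3, 0.5]),
    "f2": np.array([1.2, 1.4, 1.6]),
    "y": np.array([0, 1, 0], dtype=np.int64),
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "dataset.parquet"
    try:
        df.to_parquet(out_path, compression="gzip")
        restored = pd.read_parquet(out_path)
    except ImportError:
        # Raised by pandas when neither pyarrow nor fastparquet is available.
        restored = None
```

Unlike CSV, Parquet stores column types in the file, so the integer label column should come back as int64 without any parsing hints.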