biomedical_data_generator.utils.export_utils

Export utilities for saving generated datasets to various formats.

Functions

to_csv(x, y, meta, filepath, *[, ...])

Export dataset to CSV file.

to_labeled_dataframe(x[, y, meta, ...])

Convert generated dataset to DataFrame with optional labels.

to_parquet(X, y, meta, filepath, *[, ...])

Export dataset to Parquet file (efficient for large datasets).

biomedical_data_generator.utils.export_utils.to_csv(x, y, meta, filepath, *, include_labels=True, index=False, **csv_kwargs)[source]

Export dataset to CSV file.

Convenience wrapper around to_labeled_dataframe() followed by DataFrame.to_csv().

Parameters:
  • x (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix.

  • y (ndarray[tuple[Any, ...], dtype[int64]]) – Class labels.

  • meta (DatasetMeta) – Dataset metadata.

  • filepath (str | Path) – Output path (e.g., "data/train.csv").

  • include_labels (bool) – If True, include label columns.

  • index (bool) – If True, write row indices to CSV.

  • **csv_kwargs – Additional keyword arguments forwarded to pd.DataFrame.to_csv() (e.g., sep=';', na_rep='NA'). Row-index output is controlled by the explicit index parameter above.

Return type:

None

Examples

>>> to_csv(x, y, meta, "output/dataset.csv", index=False)
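
The two steps that the description above attributes to this wrapper (build a DataFrame, then call DataFrame.to_csv()) can be sketched with pandas alone. This is a minimal illustration, not the library's implementation: the data, the feature names, and the temporary output path are all stand-ins chosen for the example, and in real use the column names would come from meta.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in data; in real use x and y come from the generator.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = np.array([0, 1, 0, 1, 1], dtype=np.int64)
feature_names = ["f1", "f2", "f3"]  # assumed names, normally taken from meta

# Step 1: build the DataFrame (the part to_labeled_dataframe() is documented to cover).
df = pd.DataFrame(x, columns=feature_names)
df["y"] = y

# Step 2: write it out; index and any extra keyword arguments go to DataFrame.to_csv().
out_path = Path(tempfile.mkdtemp()) / "dataset.csv"
df.to_csv(out_path, index=False, sep=";")

# Reading the file back requires the same separator.
restored = pd.read_csv(out_path, sep=";")
```

A non-default separator such as sep=';' has to be repeated when the file is read, which is worth keeping in mind when sharing exported files.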

biomedical_data_generator.utils.export_utils.to_labeled_dataframe(x, y=None, meta=None, *, include_labels=True, label_col_name='y', label_str_col_name='y_label', feature_names=None)[source]

Convert generated dataset to DataFrame with optional labels.

Flexible conversion supporting multiple use cases:

  1. Full conversion: x + y + meta → df with features + labels
  2. Features only: x + meta → df with features (no labels)
  3. Custom names: override default column names

Parameters:
  • x (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix (DataFrame or ndarray).

  • y (ndarray[tuple[Any, ...], dtype[int64]] | None) – Optional class labels (integers 0 to n_classes-1).

  • meta (DatasetMeta | None) – Optional dataset metadata.

  • include_labels (bool) – If True and y provided, add label columns.

  • label_col_name (str) – Column name for numeric labels.

  • label_str_col_name (str) – Column name for string labels.

  • feature_names (list[str] | None) – Override meta.feature_names (for custom naming).

Returns:

DataFrame with requested columns.

Raises:

ValueError – If the shapes of the inputs do not match or required arguments are missing.

Return type:

DataFrame

Examples

>>> # Standard usage
>>> df = to_labeled_dataframe(x, y, meta)
>>> # Features only
>>> df_features = to_labeled_dataframe(x, meta=meta, include_labels=False)
>>> # Custom column names
>>> df = to_labeled_dataframe(x, y, meta,
...                   label_col_name="class",
...                   label_str_col_name="diagnosis")
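
To make the three use cases concrete without depending on a DatasetMeta instance, the following sketch mimics the documented behaviour with plain pandas. The helper labeled_frame_sketch and its class_names argument are hypothetical: this page does not say where the string labels come from, so the sketch takes them as an explicit list. Treat it as an illustration of the expected column layout, not as the library's code.

```python
import numpy as np
import pandas as pd


def labeled_frame_sketch(x, y=None, *, feature_names=None, class_names=None,
                         include_labels=True, label_col_name="y",
                         label_str_col_name="y_label"):
    """Illustrative stand-in for to_labeled_dataframe(); not the real implementation."""
    df = x.copy() if isinstance(x, pd.DataFrame) else pd.DataFrame(np.asarray(x))
    if feature_names is not None:
        if len(feature_names) != df.shape[1]:
            raise ValueError("feature_names length does not match the number of columns")
        df.columns = list(feature_names)
    if include_labels and y is not None:
        y = np.asarray(y)
        if y.shape[0] != df.shape[0]:
            raise ValueError("x and y have different numbers of rows")
        df[label_col_name] = y
        if class_names is not None:  # assumed source of the string labels
            df[label_str_col_name] = [class_names[i] for i in y]
    return df


x = np.zeros((4, 2))
y = np.array([0, 1, 1, 0])
names = ["healthy", "diseased"]  # hypothetical class names

# 1. Full conversion: features plus numeric and string labels.
full = labeled_frame_sketch(x, y, feature_names=["a", "b"], class_names=names)
# 2. Features only: labels are skipped even though y is passed.
features_only = labeled_frame_sketch(x, y, feature_names=["a", "b"], include_labels=False)
# 3. Custom names for the two label columns.
custom = labeled_frame_sketch(x, y, feature_names=["a", "b"], class_names=names,
                              label_col_name="class", label_str_col_name="diagnosis")
```

In this sketch a row-count mismatch between x and y raises ValueError, matching the error condition listed under Raises.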

biomedical_data_generator.utils.export_utils.to_parquet(X, y, meta, filepath, *, include_labels=True, **parquet_kwargs)[source]

Export dataset to Parquet file (efficient for large datasets).

Parameters:
  • X (DataFrame | ndarray[tuple[Any, ...], dtype[float64]]) – Feature matrix.

  • y (ndarray[tuple[Any, ...], dtype[int64]]) – Class labels.

  • meta (DatasetMeta) – Dataset metadata.

  • filepath (str | Path) – Output path (e.g., "data/train.parquet").

  • include_labels (bool) – If True, include label columns.

  • **parquet_kwargs – Additional keyword arguments forwarded to pd.DataFrame.to_parquet() (e.g., compression='gzip', engine='pyarrow').

Return type:

None

Examples

>>> to_parquet(X, y, meta, "output/dataset.parquet")
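
The keyword arguments mentioned above are ordinary DataFrame.to_parquet() options, so their effect can be tried with pandas directly. The sketch below uses made-up data and a temporary path, and it assumes the optional pyarrow engine; since pandas needs such an engine for Parquet, the write is skipped when pyarrow is not installed.

```python
import importlib.util
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical stand-in for an already labeled dataset.
df = pd.DataFrame(np.arange(6, dtype=np.float64).reshape(3, 2), columns=["f1", "f2"])
df["y"] = np.array([0, 1, 0], dtype=np.int64)

# Parquet support in pandas relies on an optional engine such as pyarrow.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None

restored = None
if has_pyarrow:
    out_path = Path(tempfile.mkdtemp()) / "dataset.parquet"
    # The same options could be passed through **parquet_kwargs.
    df.to_parquet(out_path, engine="pyarrow", compression="gzip")
    restored = pd.read_parquet(out_path, engine="pyarrow")
```

Unlike CSV, Parquet stores column types in the file, so the integer label column is read back as an integer column without extra parsing options.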