biomedical_data_generator.BatchEffectsConfig

class biomedical_data_generator.BatchEffectsConfig(*, n_batches=0, effect_strength=0.5, effect_type='additive', effect_granularity='per_feature', confounding_with_class=0.0, affected_features='all', proportions=None)[source]

Bases: BaseModel

Configuration for simulating batch effects.

Simulate batch effects by adding random intercepts or scaling factors to a subset of features. This can be used to mimic:

  • site-to-site differences (multi-center studies),

  • instrument calibration shifts,

  • cohort / recruitment waves (temporal batches).

Conceptual separation of batch effect aspects:
  • confounding_with_class controls sampling bias: which samples (classes) are recruited into which batch.

  • effect_strength, effect_type and effect_granularity control technical variation: how strongly, and how coherently across features, the measurements shift between batches.

Parameters:
  • n_batches (int) – Number of batches. Value 0 effectively disables batch effects.

  • effect_strength (float) –

    Scale of batch effects. Must be non-negative.
    • For effect_type="additive": standard deviation of the additive batch effects, sampled as Normal(0, effect_strength).

    • For effect_type="multiplicative": standard deviation of the multiplicative deviations around 1.0, sampled as 1 + Normal(0, effect_strength).

  • effect_type (Literal['additive', 'multiplicative']) –

    Type of batch effect:
    • "additive": Additive intercepts (shifts in feature means).

    • "multiplicative": Multiplicative scaling (changes in variance/scale).

  • effect_granularity (Literal['per_feature', 'scalar']) –

    Granularity of batch effects across features:
    • "per_feature": draw distinct effects per batch and affected feature (shape (n_batches, n_affected_features)).

    • "scalar": draw a single effect per batch and apply it uniformly to all affected features (global per-batch shift/scale).

  • confounding_with_class (float) –

    Degree of confounding between batch and class in [0.0, 1.0]. Controls how strongly batch assignment correlates with class labels, simulating recruitment bias in multi-center studies.

    Semantics (for two classes / two batches with equal base proportions):
    • 0.0 → independent: each batch has ~50/50 class mix.

    • 0.5 → moderate correlation.

    • 0.8 → strong recruitment bias (most samples of a class go to one batch).

    • 1.0 → perfect confounding: each class maps to one preferred batch (if n_batches >= n_classes).

  • affected_features (list[int] | Literal['all']) –

    Which features should be affected:
    • "all": apply batch effects to all features.

    • list of ints: explicit 0-based column indices of affected features.

  • proportions (list[float] | None) – Optional target proportions for batch sizes. Values are normalized to sum to 1. If None, batches are (approximately) equal in size.
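For illustration, a minimal sketch constructing two configurations that separate the two aspects described above (technical variation vs. sampling bias); all parameter values are arbitrary examples:

```python
from biomedical_data_generator import BatchEffectsConfig

# Technical variation only: three batches with per-feature additive shifts;
# batch assignment stays independent of class (confounding_with_class=0.0).
technical = BatchEffectsConfig(
    n_batches=3,
    effect_strength=0.8,
    effect_type="additive",
    effect_granularity="per_feature",
)

# Recruitment bias: strong batch-class confounding with only mild
# multiplicative scaling, restricted to the first five features.
confounded = BatchEffectsConfig(
    n_batches=2,
    effect_strength=0.2,
    effect_type="multiplicative",
    confounding_with_class=0.8,
    affected_features=[0, 1, 2, 3, 4],
    proportions=[0.7, 0.3],  # normalized to sum to 1
)
```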


__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

Return type:

None

Methods

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

construct([_fields_set])

copy(*[, include, exclude, update, deep])

Returns a copy of the model.

dict(*[, include, exclude, by_alias, ...])

from_orm(obj)

json(*[, include, exclude, by_alias, ...])

model_construct([_fields_set])

Creates a new instance of the Model class with validated data.

model_copy(*[, update, deep])

Returns a copy of the model.

model_dump(*[, mode, include, exclude, ...])

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

model_dump_json(*[, indent, ensure_ascii, ...])

Generates a JSON representation of the model using Pydantic's to_json method.

model_json_schema([by_alias, ref_template, ...])

Generates a JSON schema for a model class.

model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct.

model_rebuild(*[, force, raise_errors, ...])

Try to rebuild the pydantic-core schema for the model.

model_validate(obj, *[, strict, extra, ...])

Validate a pydantic model instance.

model_validate_json(json_data, *[, strict, ...])

Validate the given JSON data against the Pydantic model.

model_validate_strings(obj, *[, strict, ...])

Validate the given object with string data against the Pydantic model.

parse_file(path, *[, content_type, ...])

parse_obj(obj)

parse_raw(b, *[, content_type, encoding, ...])

schema([by_alias, ref_template])

schema_json(*[, by_alias, ref_template])

update_forward_refs(**localns)

validate(value)

validate_proportions(v, info)

Ensure proportions are non-negative, match n_batches, and sum to 1.
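Apart from validate_proportions, these are the standard Pydantic BaseModel methods. A short round-trip sketch using model_dump and model_validate:

```python
from biomedical_data_generator import BatchEffectsConfig

cfg = BatchEffectsConfig(n_batches=2, effect_strength=0.3)
payload = cfg.model_dump()                        # plain-dict representation
restored = BatchEffectsConfig.model_validate(payload)
assert restored == cfg
```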

Attributes

model_computed_fields

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_extra

Get extra fields set during validation.

model_fields

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

n_batches

effect_strength

effect_type

effect_granularity

confounding_with_class

affected_features

proportions

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
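Because extra is set to "forbid", unrecognized keyword arguments are rejected at construction time. A small sketch of this behavior:

```python
from pydantic import ValidationError

from biomedical_data_generator import BatchEffectsConfig

try:
    BatchEffectsConfig(n_batches=2, effect_stren=0.5)  # misspelled field name
except ValidationError as exc:
    print(exc.error_count())  # 1: the unknown field is reported as an error
```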

classmethod validate_proportions(v, info)[source]

Ensure proportions are non-negative, match n_batches, and sum to 1.

Parameters:

v (list[float] | None)
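A sketch of the documented validation behavior at construction time; the normalized values shown follow from the "sum to 1" rule described above:

```python
from pydantic import ValidationError

from biomedical_data_generator import BatchEffectsConfig

# Valid: three non-negative weights for three batches; per the documented
# normalization they end up summing to 1 (here [0.5, 0.25, 0.25]).
cfg = BatchEffectsConfig(n_batches=3, proportions=[2.0, 1.0, 1.0])

# Invalid: proportions length does not match n_batches.
try:
    BatchEffectsConfig(n_batches=3, proportions=[0.5, 0.5])
except ValidationError as exc:
    print(exc)
```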