biomedical_data_generator.BatchEffectsConfig

class biomedical_data_generator.BatchEffectsConfig(*, n_batches=0, effect_strength=0.5, effect_type='additive', effect_granularity='per_feature', confounding_with_class=0.0, affected_features='all', proportions=None)[source]

Bases: BaseModel

Configuration for simulating batch effects.

Simulate batch effects by adding random intercepts or scaling factors to a subset of features. This can be used to mimic:

  • site-to-site differences (multi-center studies),

  • instrument calibration shifts,

  • cohort / recruitment waves (temporal batches).

Conceptual separation of batch effect aspects:
  • confounding_with_class controls sampling bias: which samples (classes) are recruited into which batch.

  • effect_strength, effect_type and effect_granularity control technical variation: how strongly, and how coherently across features, the measurements shift between batches.

Parameters:
  • n_batches (int) – Number of batches. Value 0 effectively disables batch effects.

  • effect_strength (float) –

    Scale of batch effects. Must be non-negative.
    • For effect_type="additive": standard deviation of the additive batch effects, sampled as Normal(0, effect_strength).

    • For effect_type="multiplicative": standard deviation of the multiplicative deviations around 1.0, sampled as 1 + Normal(0, effect_strength).

  • effect_type (Literal['additive', 'multiplicative']) –

    Type of batch effect:
    • "additive": Additive intercepts (shifts in feature means).

    • "multiplicative": Multiplicative scaling (changes in variance/scale).

  • effect_granularity (Literal['per_feature', 'scalar']) –

    Granularity of batch effects across features:
    • "per_feature": draw distinct effects per batch and affected feature (shape (n_batches, n_affected_features)).

    • "scalar": draw a single effect per batch and apply it uniformly to all affected features (global per-batch shift/scale).

  • confounding_with_class (float) –

    Degree of confounding between batch and class in [0.0, 1.0]. Controls how strongly batch assignment correlates with class labels, simulating recruitment bias in multi-center studies.

    Semantics (for two classes / two batches with equal base proportions):
    • 0.0 → independent: each batch has ~50/50 class mix.

    • 0.5 → moderate correlation.

    • 0.8 → strong recruitment bias (most samples of a class go to one batch).

    • 1.0 → perfect confounding: each class maps to one preferred batch (if n_batches >= n_classes).

  • affected_features (list[int] | Literal['all']) –

    Which features should be affected:
    • "all": apply batch effects to all features.

    • list of ints: explicit 0-based column indices of affected features.

  • proportions (list[float] | None) – Optional target proportions for batch sizes. Values are normalized to sum to 1. If None, batches are (approximately) equal in size.
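For illustration, a minimal sketch constructing two configurations that separate the two aspects described above (technical variation vs. sampling bias); all parameter values are arbitrary examples:

```python
from biomedical_data_generator import BatchEffectsConfig

# Technical variation only: three batches with per-feature additive shifts;
# batch assignment stays independent of class (confounding_with_class=0.0).
technical = BatchEffectsConfig(
    n_batches=3,
    effect_strength=0.8,
    effect_type="additive",
    effect_granularity="per_feature",
)

# Recruitment bias: strong batch-class confounding with only mild
# multiplicative scaling, restricted to the first five features.
confounded = BatchEffectsConfig(
    n_batches=2,
    effect_strength=0.2,
    effect_type="multiplicative",
    confounding_with_class=0.8,
    affected_features=[0, 1, 2, 3, 4],
    proportions=[0.7, 0.3],  # normalized to sum to 1
)
```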


__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)

Return type:

None

Methods

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

construct([_fields_set])

copy(*[, include, exclude, update, deep])

Returns a copy of the model.

dict(*[, include, exclude, by_alias, ...])

from_orm(obj)

json(*[, include, exclude, by_alias, ...])

model_construct([_fields_set])

Creates a new instance of the Model class with validated data.

model_copy(*[, update, deep])

Returns a copy of the model.

model_dump(*[, mode, include, exclude, ...])

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

model_dump_json(*[, indent, ensure_ascii, ...])

Generates a JSON representation of the model using Pydantic's to_json method.

model_json_schema([by_alias, ref_template, ...])

Generates a JSON schema for a model class.

model_parametrized_name(params)

Compute the class name for parametrizations of generic classes.

model_post_init(context, /)

Override this method to perform additional initialization after __init__ and model_construct.

model_rebuild(*[, force, raise_errors, ...])

Try to rebuild the pydantic-core schema for the model.

model_validate(obj, *[, strict, extra, ...])

Validate a pydantic model instance.

model_validate_json(json_data, *[, strict, ...])

Validate the given JSON data against the Pydantic model.

model_validate_strings(obj, *[, strict, ...])

Validate the given object with string data against the Pydantic model.

parse_file(path, *[, content_type, ...])

parse_obj(obj)

parse_raw(b, *[, content_type, encoding, ...])

schema([by_alias, ref_template])

schema_json(*[, by_alias, ref_template])

update_forward_refs(**localns)

validate(value)

validate_proportions(v, info)

Ensure proportions are non-negative, match n_batches, and sum to 1.
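Apart from validate_proportions, these are the standard Pydantic BaseModel methods. A short round-trip sketch using model_dump and model_validate:

```python
from biomedical_data_generator import BatchEffectsConfig

cfg = BatchEffectsConfig(n_batches=2, effect_strength=0.3)
payload = cfg.model_dump()                        # plain-dict representation
restored = BatchEffectsConfig.model_validate(payload)
assert restored == cfg
```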

Attributes

model_computed_fields

model_config

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_extra

Get extra fields set during validation.

model_fields

model_fields_set

Returns the set of fields that have been explicitly set on this model instance.

n_batches

effect_strength

effect_type

effect_granularity

confounding_with_class

affected_features

proportions

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
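Because extra is set to "forbid", unrecognized keyword arguments are rejected at construction time. A small sketch of this behavior:

```python
from pydantic import ValidationError

from biomedical_data_generator import BatchEffectsConfig

try:
    BatchEffectsConfig(n_batches=2, effect_stren=0.5)  # misspelled field name
except ValidationError as exc:
    print(exc.error_count())  # 1: the unknown field is reported as an error
```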

classmethod validate_proportions(v, info)[source]

Ensure proportions are non-negative, match n_batches, and sum to 1.

Parameters:

v (list[float] | None)
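A sketch of the documented validation behavior at construction time; the normalized values shown follow from the "sum to 1" rule described above:

```python
from pydantic import ValidationError

from biomedical_data_generator import BatchEffectsConfig

# Valid: three non-negative weights for three batches; per the documented
# normalization they end up summing to 1 (here [0.5, 0.25, 0.25]).
cfg = BatchEffectsConfig(n_batches=3, proportions=[2.0, 1.0, 1.0])

# Invalid: proportions length does not match n_batches.
try:
    BatchEffectsConfig(n_batches=3, proportions=[0.5, 0.5])
except ValidationError as exc:
    print(exc)
```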