biomedical_data_generator.config.BatchEffectsConfig
- class biomedical_data_generator.config.BatchEffectsConfig(*, n_batches=0, effect_strength=0.5, effect_type='additive', effect_granularity='per_feature', confounding_with_class=0.0, affected_features='all', proportions=None)
Bases: BaseModel
Configuration for simulating batch effects.
Simulate batch effects by adding random intercepts or scaling factors to a subset of features. This can be used to mimic:
site-to-site differences (multi-center studies),
instrument calibration shifts,
cohort / recruitment waves (temporal batches).
- Conceptual separation of batch effect aspects:
confounding_with_class controls sampling bias: which samples (classes) are recruited into which batch.
effect_strength, effect_type and effect_granularity control technical variation: how strongly, and how coherently across features, the measurements shift between batches.
- Parameters:
n_batches (Annotated[int, Ge(ge=0)]) – Number of batches. Value 0 effectively disables batch effects.
effect_strength (Annotated[float, Ge(ge=0)]) –
Scale of batch effects. Must be non-negative.
- For effect_type="additive": standard deviation of the additive batch effects, sampled as Normal(0, effect_strength).
- For effect_type="multiplicative": standard deviation of the multiplicative deviations around 1.0, sampled as 1 + Normal(0, effect_strength).
effect_type (Literal['additive', 'multiplicative']) – Type of batch effect.
- "additive": additive intercepts (shifts in feature means).
- "multiplicative": multiplicative scaling (changes in variance/scale).
effect_granularity (Literal['per_feature', 'scalar']) –
Granularity of batch effects across features:
- "per_feature": draw distinct effects per batch and affected feature (shape (n_batches, n_affected_features)).
- "scalar": draw a single effect per batch and apply it uniformly to all affected features (global per-batch shift/scale).
confounding_with_class (Annotated[float, Ge(ge=0.0), Le(le=1.0)]) –
Degree of confounding between batch and class in [0.0, 1.0]. Controls how strongly batch assignment correlates with class labels, simulating recruitment bias in multi-center studies.
Semantics (for two classes / two batches with equal base proportions):
0.0 → independent: each batch has a ~50/50 class mix.
0.5 → moderate correlation.
0.8 → strong recruitment bias (most samples of a class go to one batch).
1.0 → perfect confounding: each class maps to one preferred batch (if n_batches >= n_classes).
affected_features (list[int] | Literal['all']) – Which features should be affected:
- "all": apply batch effects to all features.
- list of ints: explicit 0-based column indices of affected features.
proportions (list[float] | None) – Optional target proportions for batch sizes. Values are normalized to sum to 1. If None, batches are (approximately) equal in size.
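How effect_strength, effect_type, and effect_granularity interact can be sketched in plain Python. This is an illustration of the documented semantics only, not the library's actual implementation; apply_batch_effects is a hypothetical helper name.

```python
import random

def apply_batch_effects(X, batch_of, n_batches, *, effect_strength=0.5,
                        effect_type="additive",
                        effect_granularity="per_feature", rng=None):
    """Shift or scale each sample's features according to its batch.

    X        : list of samples, each a list of feature values
    batch_of : batch index for each sample
    """
    rng = rng or random.Random(0)
    if n_batches == 0 or effect_strength == 0:
        return [row[:] for row in X]  # batch effects disabled
    n_features = len(X[0])
    if effect_granularity == "per_feature":
        # Distinct effect per (batch, feature): shape (n_batches, n_features)
        effects = [[rng.gauss(0.0, effect_strength) for _ in range(n_features)]
                   for _ in range(n_batches)]
    else:
        # "scalar": one effect per batch, applied uniformly to all features
        effects = [[rng.gauss(0.0, effect_strength)] * n_features
                   for _ in range(n_batches)]
    out = []
    for row, b in zip(X, batch_of):
        if effect_type == "additive":
            out.append([x + e for x, e in zip(row, effects[b])])
        else:
            # "multiplicative": scale around 1.0, i.e. x * (1 + Normal(0, s))
            out.append([x * (1.0 + e) for x, e in zip(row, effects[b])])
    return out
```

With effect_strength=0 (or n_batches=0) the data passes through unchanged, mirroring the "value 0 effectively disables batch effects" behavior of n_batches.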
- __init__(**data)
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- Return type:
None
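The confounding_with_class semantics described above can be illustrated with a small stand-alone sketch. This is not the library's sampling code; assign_batches is a hypothetical helper, and "preferred batch" is modeled here simply as the class label modulo n_batches.

```python
import random

def assign_batches(labels, n_batches, confounding, rng=None):
    """Assign each sample to a batch with class-dependent recruitment bias.

    With probability `confounding` a sample goes to its class's preferred
    batch; otherwise its batch is drawn uniformly at random.
    """
    rng = rng or random.Random(0)
    batches = []
    for y in labels:
        if rng.random() < confounding:
            batches.append(y % n_batches)        # preferred batch per class
        else:
            batches.append(rng.randrange(n_batches))
    return batches
```

At confounding=1.0 every class maps deterministically to one batch (perfect confounding); at 0.0 the batch assignment is independent of the class labels.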
Methods
__init__(**data) – Create a new model by parsing and validating input data from keyword arguments.
construct([_fields_set])
copy(*[, include, exclude, update, deep]) – Returns a copy of the model.
dict(*[, include, exclude, by_alias, ...])
from_orm(obj)
json(*[, include, exclude, by_alias, ...])
model_construct([_fields_set]) – Creates a new instance of the Model class with validated data.
model_copy(*[, update, deep])
model_dump(*[, mode, include, exclude, ...])
model_dump_json(*[, indent, ensure_ascii, ...])
model_json_schema([by_alias, ref_template, ...]) – Generates a JSON schema for a model class.
model_parametrized_name(params) – Compute the class name for parametrizations of generic classes.
model_post_init(context, /) – Override this method to perform additional initialization after __init__ and model_construct.
model_rebuild(*[, force, raise_errors, ...]) – Try to rebuild the pydantic-core schema for the model.
model_validate(obj, *[, strict, extra, ...]) – Validate a pydantic model instance.
model_validate_json(json_data, *[, strict, ...])
model_validate_strings(obj, *[, strict, ...]) – Validate the given object with string data against the Pydantic model.
parse_file(path, *[, content_type, ...])
parse_obj(obj)
parse_raw(b, *[, content_type, encoding, ...])
schema([by_alias, ref_template])
schema_json(*[, by_alias, ref_template])
update_forward_refs(**localns)
validate(value)
validate_proportions(v, info) – Ensure proportions are non-negative, match n_batches, and sum to 1.
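The contract of validate_proportions can be sketched as a stand-alone function. This is a hypothetical stand-in reproducing only the documented behavior (non-negative values, length matching n_batches, normalization to sum 1), not the library's validator.

```python
def normalize_proportions(proportions, n_batches):
    """Validate and normalize target batch proportions."""
    if proportions is None:
        return None  # batches are made (approximately) equal in size
    if len(proportions) != n_batches:
        raise ValueError("proportions must have length n_batches")
    if any(p < 0 for p in proportions):
        raise ValueError("proportions must be non-negative")
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Normalize so the values sum to 1, as documented for `proportions`
    return [p / total for p in proportions]
```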
Attributes
model_computed_fields
model_config
model_extra – Get extra fields set during validation.
model_fields
model_fields_set – Returns the set of fields that have been explicitly set on this model instance.
n_batches
effect_strength
effect_type
effect_granularity
confounding_with_class
affected_features
proportions
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].