biomedical_data_generator.DatasetConfig
- class biomedical_data_generator.DatasetConfig(*, n_samples=100, n_features=None, n_informative=2, n_pseudo=0, n_noise=0, noise_distribution=NoiseDistribution.normal, noise_scale=1.0, noise_params=None, n_classes=2, weights=None, class_counts=None, class_sep=1.5, feature_naming='prefixed', prefix_informative='i', prefix_pseudo='p', prefix_noise='n', prefix_corr='corr', corr_clusters=<factory>, corr_between=0.0, anchor_mode='equalized', effect_size='medium', batch=None, random_state=None)[source]
Bases: BaseModel
Configuration for synthetic dataset generation.
The strict mode="before" normalizer fills and validates n_features without Pydantic warnings.
Note:
- The 'before' validator normalizes raw inputs:
  - fills n_features if omitted,
  - enforces n_features >= the minimal requirement (strict).
- Use DatasetConfig.relaxed(...) if you want a silent auto-fix instead of a validation error (see the sketch below).
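A minimal sketch of the two entry points, assuming the strict validator rejects an undersized n_features while relaxed() raises it to the required minimum (the field values and the computed minimum of 8 are illustrative and assume no correlation clusters):

from pydantic import ValidationError

from biomedical_data_generator import DatasetConfig

# Strict: n_features must cover informative + pseudo + noise (+ cluster proxies).
try:
    DatasetConfig(n_samples=50, n_informative=5, n_noise=3, n_features=4)
except ValidationError as err:
    print(err)  # n_features=4 is below the required minimum (5 + 0 + 3 = 8)

# Relaxed: n_features is silently raised to the required minimum instead.
cfg = DatasetConfig.relaxed(n_samples=50, n_informative=5, n_noise=3, n_features=4)
print(cfg.n_features)  # 8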
- Parameters:
n_samples (int)
n_features (int | None)
n_informative (int)
n_pseudo (int)
n_noise (int)
noise_distribution (NoiseDistribution)
noise_scale (float)
n_classes (int)
class_sep (float)
feature_naming (Literal['prefixed', 'simple'])
prefix_informative (str)
prefix_pseudo (str)
prefix_noise (str)
prefix_corr (str)
corr_clusters (list[CorrCluster])
corr_between (float)
anchor_mode (Literal['equalized', 'strong'])
effect_size (Literal['small', 'medium', 'large'])
batch (BatchConfig | None)
random_state (int | None)
- __init__(**data)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
data (Any)
- Return type:
None
Methods
- __init__(**data): Create a new model by parsing and validating input data from keyword arguments.
- breakdown(): Return a structured breakdown of feature counts, incl. cluster proxies.
- construct([_fields_set])
- copy(*[, include, exclude, update, deep]): Returns a copy of the model.
- count_informative_anchors(): Count clusters whose anchor contributes as 'informative'.
- dict(*[, include, exclude, by_alias, ...])
- from_orm(obj)
- from_yaml(path): Load from YAML and validate via the same 'before' pipeline.
- json(*[, include, exclude, by_alias, ...])
- model_construct([_fields_set]): Creates a new instance of the Model class with validated data.
- model_copy(*[, update, deep]): Returns a copy of the model.
- model_dump(*[, mode, include, exclude, ...]): Generate a dictionary representation of the model.
- model_dump_json(*[, indent, ensure_ascii, ...]): Generate a JSON representation of the model.
- model_json_schema([by_alias, ref_template, ...]): Generates a JSON schema for a model class.
- model_parametrized_name(params): Compute the class name for parametrizations of generic classes.
- model_post_init(context, /): Override this method to perform additional initialization after __init__ and model_construct.
- model_rebuild(*[, force, raise_errors, ...]): Try to rebuild the pydantic-core schema for the model.
- model_validate(obj, *[, strict, extra, ...]): Validate a pydantic model instance.
- model_validate_json(json_data, *[, strict, ...]): Validate the given JSON data against the Pydantic model.
- model_validate_strings(obj, *[, strict, ...]): Validate the given object with string data against the Pydantic model.
- parse_file(path, *[, content_type, ...])
- parse_obj(obj)
- parse_raw(b, *[, content_type, encoding, ...])
- relaxed(**kwargs): Create a configuration, silently autofixing n_features to the required minimum.
- schema([by_alias, ref_template])
- schema_json(*[, by_alias, ref_template])
- summary(*[, per_cluster, as_markdown]): Return a human-readable summary of the configuration.
- update_forward_refs(**localns)
- validate(value)
Attributes
- model_computed_fields
- model_config: Configuration for the model, should be a dictionary conforming to ConfigDict.
- model_extra: Get extra fields set during validation.
- model_fields
- model_fields_set: Returns the set of fields that have been explicitly set on this model instance.
- n_samples, n_features, n_informative, n_pseudo, n_noise, noise_distribution, noise_scale, noise_params, n_classes, weights, class_counts, class_sep, feature_naming, prefix_informative, prefix_pseudo, prefix_noise, prefix_corr, corr_clusters, corr_between, anchor_mode, effect_size, batch, random_state
- breakdown()[source]
Return a structured breakdown of feature counts, incl. cluster proxies.
Returns:
A dict with keys:
- n_informative_total
- n_informative_anchors
- n_informative_free
- n_pseudo_free
- n_noise
- proxies_from_clusters
- n_features_expected
- n_features_configured
Raises:
ValueError: If self.n_features is inconsistent. This is a safeguard against manual tampering with instance attributes and should not occur if the instance was created via the normal validators; if you encounter it, please report a bug.
Note
n_features_expected = n_informative + n_pseudo + n_noise + proxies_from_clusters
n_features_configured = self.n_features (may be larger than expected)
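A short sketch of the documented invariant, assuming no correlation clusters (so proxies_from_clusters is 0):

from biomedical_data_generator import DatasetConfig

cfg = DatasetConfig.relaxed(n_informative=4, n_pseudo=2, n_noise=3)
info = cfg.breakdown()
# Per the formula above: n_features_expected = 4 + 2 + 3 + 0 = 9.
assert info["n_features_expected"] == (
    cfg.n_informative + cfg.n_pseudo + cfg.n_noise + info["proxies_from_clusters"]
)
# The configured count may legitimately exceed the expected count:
assert info["n_features_configured"] >= info["n_features_expected"]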
- count_informative_anchors()[source]
Count clusters whose anchor contributes as 'informative'.
Note: This is a subset of n_informative, not a separate count.
Returns:
The number of clusters with anchor_role == "informative".
Note
If you want the number of additional features contributed by clusters, use self._proxies_from_clusters(self.corr_clusters).
- Return type:
int
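A hedged usage sketch; CorrCluster's import path and constructor arguments are not documented on this page, so the anchor_role keyword below is an assumption based on the description above:

from biomedical_data_generator import CorrCluster, DatasetConfig

# Hypothetical CorrCluster construction; only anchor_role is documented here.
cluster = CorrCluster(anchor_role="informative")
cfg = DatasetConfig.relaxed(n_informative=3, corr_clusters=[cluster])
print(cfg.count_informative_anchors())  # 1, counted as a subset of n_informative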
- classmethod from_yaml(path)[source]
Load from YAML and validate via the same 'before' pipeline.
- Parameters:
cls – The DatasetConfig class.
path (str) – Path to a YAML file.
- Return type:
DatasetConfig
Returns:
A validated DatasetConfig instance.
Raises:
FileNotFoundError: If the file does not exist.
yaml.YAMLError: If the file cannot be parsed as YAML.
pydantic.ValidationError: If the loaded config is invalid.
Note
This requires PyYAML to be installed.
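A minimal usage sketch (requires PyYAML; the field names come from the signature above, the file path is illustrative):

from pathlib import Path

from biomedical_data_generator import DatasetConfig

Path("config.yaml").write_text(
    "n_samples: 200\n"
    "n_informative: 4\n"
    "n_noise: 2\n"
)
cfg = DatasetConfig.from_yaml("config.yaml")
print(cfg.n_samples)  # 200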
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'use_enum_values': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
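Because extra is set to 'forbid', unknown keyword arguments are rejected rather than silently ignored (standard Pydantic behavior; the misspelled field below is illustrative):

from pydantic import ValidationError

from biomedical_data_generator import DatasetConfig

try:
    DatasetConfig(n_samples=10, not_a_field=1)  # 'not_a_field' does not exist
except ValidationError as err:
    print(err)  # reports the unexpected extra field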
- classmethod relaxed(**kwargs)[source]
Create a configuration, silently autofixing n_features to the required minimum.
Convenience factory; prefer it in teaching notebooks to avoid interruptions from validation errors.
- Parameters:
**kwargs (Any) – Any valid DatasetConfig field.
- Return type:
DatasetConfig
Returns:
A validated DatasetConfig instance with n_features >= required minimum.
Note
This does NOT modify the original kwargs dict.
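A small sketch of both guarantees, assuming no correlation clusters (so the required minimum here is n_informative + n_pseudo + n_noise = 5):

from biomedical_data_generator import DatasetConfig

kwargs = {"n_informative": 5, "n_features": 2}
cfg = DatasetConfig.relaxed(**kwargs)
assert kwargs["n_features"] == 2  # the original kwargs dict is untouched
assert cfg.n_features >= 5        # autofixed to at least the required minimum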