edaflow.ml.validate_ml_data

edaflow.ml.validate_ml_data(experiment_data: Dict[str, Any] | None = None, check_missing: bool = True, check_duplicates: bool = True, check_outliers: bool = True, verbose: bool = True, X: DataFrame | None = None, y: Series | None = None, check_cardinality: bool = True, check_distributions: bool = True) → Dict[str, Any]

Validate data quality for ML experiments.

This function supports two calling patterns:

1. Experiment config: validate_ml_data(experiment_config), where experiment_config is the dictionary returned by setup_ml_experiment
2. Sklearn-style: validate_ml_data(X=X_train, y=y_train)
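The two calling patterns are alternative ways of supplying the same data. The sketch below illustrates one way such either/or input handling could be resolved; it is not edaflow's implementation. The helper name resolve_inputs and the dictionary keys X_train and y_train are hypothetical, since this page does not document the structure of the dictionary returned by setup_ml_experiment.

```python
from typing import Any, Dict, Optional, Tuple

import pandas as pd


def resolve_inputs(
    experiment_data: Optional[Dict[str, Any]] = None,
    X: Optional[pd.DataFrame] = None,
    y: Optional[pd.Series] = None,
) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
    """Return (features, target) from either calling pattern (illustrative only)."""
    if experiment_data is not None:
        # Pattern 1: experiment config dictionary.
        # HYPOTHETICAL keys: the real layout of setup_ml_experiment's output may differ.
        return experiment_data["X_train"], experiment_data.get("y_train")
    if X is not None:
        # Pattern 2: sklearn-style keyword arguments.
        return X, y
    # Neither pattern was used, so there is nothing to validate.
    raise ValueError("Provide either experiment_data or X (and optionally y).")
```

In this sketch a config dictionary takes precedence when both forms are passed; the real function may behave differently.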

Parameters:

experiment_data : Dict[str, Any], optional

Dictionary returned by setup_ml_experiment, containing the data splits

check_missing : bool, default=True

Whether to check for missing values

check_duplicates : bool, default=True

Whether to check for duplicate rows

check_outliers : bool, default=True

Whether to check for outliers

verbose : bool, default=True

Whether to print validation details

X : pd.DataFrame, optional

Feature data (alternative to experiment_data)

y : pd.Series, optional

Target data (alternative to experiment_data)

check_cardinality : bool, default=True

Whether to check feature cardinality

check_distributions : bool, default=True

Whether to check feature distributions
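To make the check_* flags more concrete, here is a minimal pandas sketch of checks of the kind they toggle. It is not edaflow's implementation: the function name quality_checks_sketch, the 1.5 * IQR outlier rule, the max_unique_ratio threshold, the result keys, and the recommendation strings are all assumptions for illustration. A distribution check (check_distributions) is omitted.

```python
from typing import Any, Dict, List

import pandas as pd


def quality_checks_sketch(
    X: pd.DataFrame,
    check_missing: bool = True,
    check_duplicates: bool = True,
    check_outliers: bool = True,
    check_cardinality: bool = True,
    max_unique_ratio: float = 0.9,  # HYPOTHETICAL threshold for "high cardinality"
) -> Dict[str, Any]:
    """Illustrative data-quality checks; not edaflow's implementation."""
    results: Dict[str, Any] = {}
    recommendations: List[str] = []

    if check_missing:
        # Count missing values per column, keeping only affected columns.
        counts = X.isna().sum()
        results["missing"] = {c: int(n) for c, n in counts.items() if n > 0}
        if results["missing"]:
            recommendations.append("Impute or drop columns with missing values.")

    if check_duplicates:
        # Rows identical to an earlier row across all columns.
        n_dup = int(X.duplicated().sum())
        results["duplicate_rows"] = n_dup
        if n_dup:
            recommendations.append("Remove duplicate rows.")

    if check_outliers:
        # Assumed rule: flag numeric values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
        numeric = X.select_dtypes(include="number")
        q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
        iqr = q3 - q1
        mask = numeric.lt(q1 - 1.5 * iqr, axis=1) | numeric.gt(q3 + 1.5 * iqr, axis=1)
        results["outliers"] = {c: int(n) for c, n in mask.sum().items() if n > 0}
        if results["outliers"]:
            recommendations.append("Inspect, cap, or transform outlying numeric values.")

    if check_cardinality:
        # Flag non-numeric columns where nearly every value is unique (e.g. IDs).
        n_rows = max(len(X), 1)
        results["high_cardinality"] = [
            c
            for c in X.select_dtypes(exclude="number").columns
            if X[c].nunique() / n_rows > max_unique_ratio
        ]
        if results["high_cardinality"]:
            recommendations.append("Drop or encode high-cardinality categorical columns.")

    results["recommendations"] = recommendations
    return results
```

Each flag simply skips its block, so disabling a check removes its key from the result rather than reporting an empty finding.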

Returns:

Dict[str, Any]

Dictionary containing validation results and recommendations