edaflow.ml.validate_ml_data

edaflow.ml.validate_ml_data(experiment_data: Dict[str, Any] | None = None, check_missing: bool = True, check_duplicates: bool = True, check_outliers: bool = True, verbose: bool = True, X: DataFrame | None = None, y: Series | None = None, check_cardinality: bool = True, check_distributions: bool = True) → Dict[str, Any]

Validate data quality for ML experiments.

This function supports two calling patterns:

1. Experiment config: validate_ml_data(experiment_config)
2. Sklearn-style: validate_ml_data(X=X_train, y=y_train)
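
The two calling patterns imply that the function first has to decide which input to validate. The following is a minimal sketch of one way such a dispatch could work, not edaflow's actual implementation. The helper name resolve_inputs, the "X_train" and "y_train" dictionary keys, and the choice to prefer X when both inputs are given are all assumptions made for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_inputs(
    experiment_data: Optional[Dict[str, Any]] = None,
    X: Any = None,
    y: Any = None,
) -> Tuple[Any, Any]:
    """Hypothetical helper: pick features and target from either calling pattern."""
    if X is not None:
        # Sklearn-style call: X (and optionally y) passed as keywords.
        # Preferring X over experiment_data is an assumption of this sketch.
        return X, y
    if experiment_data is not None:
        # Experiment-config call. The "X_train"/"y_train" keys are assumed;
        # check what setup_ml_experiment actually returns before relying on them.
        return experiment_data.get("X_train"), experiment_data.get("y_train")
    raise ValueError("Pass either experiment_data or X (with optional y).")
```

Keeping X and y as keyword arguments after the check flags means a positional first argument is always treated as the experiment dictionary, which is why the sklearn-style call in the list above uses X= and y= explicitly.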

Parameters:

experiment_data (Dict[str, Any], optional): Dictionary from setup_ml_experiment containing splits
check_missing (bool, default=True): Whether to check for missing values
check_duplicates (bool, default=True): Whether to check for duplicate rows
check_outliers (bool, default=True): Whether to check for outliers
verbose (bool, default=True): Whether to print validation details
X (pd.DataFrame, optional): Feature data (alternative to experiment_data)
y (pd.Series, optional): Target data (alternative to experiment_data)
check_cardinality (bool, default=True): Whether to check feature cardinality
check_distributions (bool, default=True): Whether to check feature distributions
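
To make the check flags more concrete, here is a rough standard-library sketch of what missing-value, duplicate-row, outlier, and cardinality checks typically compute for one numeric column. It is an illustration only: the function name basic_quality_report, the row-of-dicts input, the returned keys, and the 1.5 × IQR outlier rule are assumptions, and edaflow may use different definitions and thresholds.

```python
import statistics
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[float]]


def basic_quality_report(rows: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical report for one numeric column; None marks a missing value."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]

    # Duplicate rows: count every row identical to an earlier one.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Outliers via the 1.5 * IQR rule (an assumed, commonly used definition).
    outliers: List[float] = []
    if len(present) >= 2:
        q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
        spread = 1.5 * (q3 - q1)
        outliers = [v for v in present if v < q1 - spread or v > q3 + spread]

    return {
        "missing": len(values) - len(present),
        "duplicates": duplicates,
        "outliers": outliers,
        "cardinality": len(set(present)),  # number of distinct non-missing values
    }
```

Each flag in the parameter list would then correspond to one entry of such a report being computed or skipped.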

Returns:

Dict[str, Any]: Dictionary containing validation results and recommendations