edaflow.ml.validate_ml_data
- edaflow.ml.validate_ml_data(experiment_data: Dict[str, Any] | None = None, check_missing: bool = True, check_duplicates: bool = True, check_outliers: bool = True, verbose: bool = True, X: DataFrame | None = None, y: Series | None = None, check_cardinality: bool = True, check_distributions: bool = True) → Dict[str, Any]
Validate data quality for ML experiments.
This function supports two calling patterns:
1. Experiment config: validate_ml_data(experiment_config)
2. Sklearn-style: validate_ml_data(X=X_train, y=y_train)
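As a rough illustration of the sklearn-style pattern, the sketch below builds a toy feature frame and passes it by keyword. The toy data and the helper names `build_toy_training_data` and `run_validation` are made up for this example; only the `X`, `y`, `check_outliers`, and `verbose` arguments come from the signature above. The keys of the returned dictionary are not listed in this reference, so the sketch does not assume any.

```python
import pandas as pd


def build_toy_training_data():
    """Return a small feature frame and target with a few deliberate quality issues."""
    X = pd.DataFrame({
        "age": [25, 32, None, 41, 32],            # one missing value
        "city": ["Oslo", "Lima", "Lima", "Oslo", "Lima"],
    })                                             # rows 1 and 4 are duplicates
    y = pd.Series([0, 1, 0, 1, 1], name="target")
    return X, y


def run_validation(X, y):
    """Hypothetical wrapper around the sklearn-style call."""
    # Imported lazily so the data helper above works without edaflow installed.
    from edaflow.ml import validate_ml_data

    # Individual checks can be switched off through their keyword flags.
    return validate_ml_data(X=X, y=y, check_outliers=False, verbose=False)
```

Calling `run_validation(*build_toy_training_data())` requires edaflow to be installed. The experiment-config pattern differs only in that the dictionary returned by `setup_ml_experiment` is passed as the first argument instead of `X` and `y`.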
Parameters:
- experiment_data : Dict[str, Any], optional
Dictionary from setup_ml_experiment containing splits
- check_missing : bool, default=True
Whether to check for missing values
- check_duplicates : bool, default=True
Whether to check for duplicate rows
- check_outliers : bool, default=True
Whether to check for outliers
- verbose : bool, default=True
Whether to print validation details
- X : pd.DataFrame, optional
Feature data (alternative to experiment_data)
- y : pd.Series, optional
Target data (alternative to experiment_data)
- check_cardinality : bool, default=True
Whether to check feature cardinality
- check_distributions : bool, default=True
Whether to check feature distributions
Returns:
- Dict[str, Any]
Dictionary containing validation results and recommendations
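This reference does not document how each check is computed or what keys the result holds. To give a feel for the kind of quantity the `check_missing`, `check_duplicates`, `check_outliers`, and `check_cardinality` flags refer to, here is a pandas-only sketch. It is not edaflow's implementation: the function name `basic_quality_summary`, the 1.5 × IQR outlier rule, and the result keys are all assumptions chosen for illustration, and the distribution check is left out.

```python
import pandas as pd


def basic_quality_summary(X: pd.DataFrame) -> dict:
    """Hypothetical stand-in: NOT edaflow's implementation or return format."""
    numeric = X.select_dtypes(include="number")

    # Outliers via the 1.5 * IQR rule (an assumed rule, used only for illustration).
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)

    return {
        "missing": X.isna().sum().to_dict(),        # missing values per column
        "duplicates": int(X.duplicated().sum()),    # fully duplicated rows
        "outliers": outside.sum().to_dict(),        # flagged values per numeric column
        "cardinality": X.nunique().to_dict(),       # distinct non-null values per column
    }


X = pd.DataFrame({
    "income": [40.0, 42.0, 41.0, 43.0, 400.0, None],
    "segment": ["a", "b", "a", "b", "a", "b"],
})
summary = basic_quality_summary(X)
print(summary)
```

A summary like this can serve as a cross-check next to the dictionary that `validate_ml_data` returns, though the two should not be expected to agree on thresholds or structure.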