edaflow.ml.setup_ml_experiment
- edaflow.ml.setup_ml_experiment(data: DataFrame | None = None, target_column: str | None = None, test_size: float = 0.2, validation_size: float | None = None, random_state: int = 42, stratify: bool = True, verbose: bool = True, experiment_name: str | None = None, X: DataFrame | None = None, y: Series | None = None, val_size: float | None = None, primary_metric: str | None = None) -> Dict[str, Any]
Set up a complete ML experiment with train/validation/test splits.
This function supports two calling patterns:
1. DataFrame with target column: setup_ml_experiment(data, target_column)
2. Sklearn-style: setup_ml_experiment(X=X, y=y)
Parameters:
…existing parameters…
- primary_metric : str, optional
The main metric used for model selection and ranking (e.g., 'roc_auc', 'f1', 'accuracy', 'r2'). It is stored in the config for downstream use.
- target_column : str, optional
Name of the target variable column (required if using data parameter)
- test_size : float, default=0.2
Proportion of data to use for testing
- validation_size : float, optional
Proportion of training data to use for validation (default=0.2)
- random_state : int, default=42
Random seed for reproducibility
- stratify : bool, default=True
Whether to stratify the splits (for classification)
- verbose : bool, default=True
Whether to print experiment setup details
- experiment_name : str, optional
Name for the experiment (default='ml_experiment')
- X : pd.DataFrame, optional
Feature matrix (alternative to data + target_column pattern)
- y : pd.Series, optional
Target vector (alternative to data + target_column pattern)
- val_size : float, optional
Alternative name for validation_size (for compatibility)
Returns:
- Dict[str, Any]
Dictionary containing X_train, X_val, X_test, y_train, y_val, y_test, feature_names, target_name, and experiment_config
Examples:
>>> import edaflow.ml as ml
# Method 1: DataFrame with target column (recommended)
>>> experiment = ml.setup_ml_experiment(df, target_column='target')
# Method 2: Sklearn-style (also supported)
>>> X = df.drop('target', axis=1)
>>> y = df['target']
>>> experiment = ml.setup_ml_experiment(X=X, y=y)
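Because validation_size is described as a proportion of the training data rather than of the full dataset, the effective split fractions may differ from what the raw numbers suggest. The sketch below works through that arithmetic. The helper `split_sizes` is hypothetical and not part of edaflow. It assumes the test set is carved out first, the validation set is then taken from the remaining rows, and counts are rounded to the nearest integer; the library's actual rounding may differ.

```python
def split_sizes(n_rows, test_size=0.2, validation_size=0.2):
    """Hypothetical helper (not an edaflow function).

    Estimates row counts for a two-stage split, assuming the test set is
    removed first and the validation set is a fraction of what remains.
    """
    if not 0 < test_size < 1 or not 0 < validation_size < 1:
        raise ValueError("split fractions must lie strictly between 0 and 1")

    # Stage 1: hold out the test rows from the full dataset.
    n_test = round(n_rows * test_size)
    n_remaining = n_rows - n_test

    # Stage 2: hold out validation rows from the remaining training rows.
    n_val = round(n_remaining * validation_size)
    n_train = n_remaining - n_val

    return {"train": n_train, "val": n_val, "test": n_test}


sizes = split_sizes(1000)
print(sizes)  # {'train': 640, 'val': 160, 'test': 200}
```

Under these assumptions, the defaults give roughly a 64% / 16% / 20% train/validation/test split of the original rows, not 60% / 20% / 20%.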