edaflow.ml.setup_ml_experiment

edaflow.ml.setup_ml_experiment(data: DataFrame | None = None, target_column: str | None = None, test_size: float = 0.2, validation_size: float | None = None, random_state: int = 42, stratify: bool = True, verbose: bool = True, experiment_name: str | None = None, X: DataFrame | None = None, y: Series | None = None, val_size: float | None = None, primary_metric: str | None = None) → Dict[str, Any]

Set up a complete ML experiment with train/validation/test splits.

This function supports two calling patterns:

1. DataFrame with target column: setup_ml_experiment(data, target_column)
2. Sklearn-style: setup_ml_experiment(X=X, y=y)

Parameters:

…existing parameters…

primary_metric : str, optional

The main metric used for model selection and ranking (e.g., 'roc_auc', 'f1', 'accuracy', 'r2'). The value is stored in the experiment config for downstream use.

target_column : str, optional

Name of the target variable column. Required when the data parameter is used.

test_size : float, default=0.2

Proportion of data to use for testing

validation_size : float, optional

Proportion of the training data to use for validation (0.2 is used when not given)

random_state : int, default=42

Random seed for reproducibility

stratify : bool, default=True

Whether to stratify the splits (for classification)

verbose : bool, default=True

Whether to print experiment setup details

experiment_name : str, optional

Name for the experiment (default='ml_experiment')

X : pd.DataFrame, optional

Feature matrix (alternative to data + target_column pattern)

y : pd.Series, optional

Target vector (alternative to data + target_column pattern)

val_size : float, optional

Alternative name for validation_size (for compatibility)
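
Because validation_size is described as a proportion of the training data rather than of the full dataset, the final split sizes are not obvious from the parameter values alone. The sketch below reproduces that arithmetic with scikit-learn's train_test_split. It is an illustration under the assumption that the test set is held out first and the validation set is then carved from the remainder; it is not edaflow's actual implementation, and the toy data is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 rows, binary target with a 70/30 class balance.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0] * 70 + [1] * 30)

test_size = 0.2
validation_size = 0.2  # assumed to be a fraction of the remaining training data

# Stage 1: hold out the test set, stratified on the target.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=test_size, random_state=42, stratify=y
)

# Stage 2: carve the validation set out of what remains.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval,
    y_trainval,
    test_size=validation_size,
    random_state=42,
    stratify=y_trainval,
)

split_sizes = (len(X_train), len(X_val), len(X_test))
print(split_sizes)  # (64, 16, 20)
```

Under this assumption the defaults yield roughly 64% train, 16% validation, and 20% test of the original rows, and stratification keeps the class balance of each split close to the 70/30 balance of the full data.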

Returns:

Dict[str, Any]

Dictionary containing X_train, X_val, X_test, y_train, y_val, y_test, feature_names, target_name, and experiment_config
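
A minimal sketch of consuming the returned dictionary follows. The key names are taken from the description above; the helper unpack_splits and the hand-built stand-in dictionary are hypothetical and exist only so the snippet runs without edaflow installed.

```python
from typing import Any, Dict, Tuple

# Keys listed in the Returns description.
EXPECTED_KEYS = (
    "X_train", "X_val", "X_test",
    "y_train", "y_val", "y_test",
    "feature_names", "target_name", "experiment_config",
)


def unpack_splits(experiment: Dict[str, Any]) -> Tuple[Any, ...]:
    """Hypothetical helper: check the keys, then return the six split objects."""
    missing = [key for key in EXPECTED_KEYS if key not in experiment]
    if missing:
        raise KeyError(f"experiment is missing keys: {missing}")
    return tuple(experiment[key] for key in EXPECTED_KEYS[:6])


# Stand-in for the value returned by setup_ml_experiment (contents are made up).
experiment = {
    "X_train": [[1.0], [2.0]], "X_val": [[3.0]], "X_test": [[4.0]],
    "y_train": [0, 1], "y_val": [0], "y_test": [1],
    "feature_names": ["feature_a"],
    "target_name": "target",
    "experiment_config": {"primary_metric": "roc_auc"},
}

X_train, X_val, X_test, y_train, y_val, y_test = unpack_splits(experiment)
```

Checking the keys up front gives a single clear error if a split is absent, instead of a failure later in a training loop.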

Examples:

# Method 1: DataFrame with target column (recommended)
>>> experiment = ml.setup_ml_experiment(df, target_column='target')

# Method 2: Sklearn-style (also supported)
>>> X = df.drop('target', axis=1)
>>> y = df['target']
>>> experiment = ml.setup_ml_experiment(X=X, y=y)