edaflow.ml.setup_ml_experiment

edaflow.ml.setup_ml_experiment(data: DataFrame | None = None, target_column: str | None = None, test_size: float = 0.2, validation_size: float | None = None, random_state: int = 42, stratify: bool = True, verbose: bool = True, experiment_name: str | None = None, X: DataFrame | None = None, y: Series | None = None, val_size: float | None = None, primary_metric: str | None = None) → Dict[str, Any]

Set up a complete ML experiment with train/validation/test splits.

This function supports two calling patterns:

1. DataFrame with target column: setup_ml_experiment(data, target_column)
2. Sklearn-style: setup_ml_experiment(X=X, y=y)

Parameters:

target_column (str, optional): Name of the target variable column (required if using the data parameter)
test_size (float, default=0.2): Proportion of data to use for testing
validation_size (float, optional): Proportion of training data to use for validation (default=0.2)
random_state (int, default=42): Random seed for reproducibility
stratify (bool, default=True): Whether to stratify the splits (for classification)
verbose (bool, default=True): Whether to print experiment setup details
experiment_name (str, optional): Name for the experiment (default='ml_experiment')
X (pd.DataFrame, optional): Feature matrix (alternative to the data + target_column pattern)
y (pd.Series, optional): Target vector (alternative to the data + target_column pattern)
val_size (float, optional): Alternative name for validation_size (for compatibility)
primary_metric (str, optional): The main metric used for model selection and ranking (e.g., 'roc_auc', 'f1', 'accuracy', 'r2'). It is stored in the config for downstream use.
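
To make the interaction between test_size, validation_size, and random_state concrete, here is a minimal pure-Python sketch of a three-way split. It is not edaflow's implementation: the helper name three_way_split is hypothetical, the order of operations (test set carved out first, then validation taken from the remaining training rows) is an assumption based on the parameter descriptions above, and the rounding rule is a choice made for this sketch. Stratification is left out.

```python
import random


def three_way_split(rows, test_size=0.2, validation_size=0.2, random_state=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Assumption: the test set is removed first, then validation_size is
    applied to the remaining training rows. Rounding is a sketch choice.
    """
    rng = random.Random(random_state)  # seeded generator for reproducibility
    shuffled = list(rows)
    rng.shuffle(shuffled)

    # Test rows are a fraction of the full dataset.
    n_test = round(len(shuffled) * test_size)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # Validation rows are a fraction of what is left after the test split.
    n_val = round(len(remaining) * validation_size)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))  # 640 160 200
```

Under these assumptions, the defaults leave 64% of the rows for training, 16% for validation, and 20% for testing. Calling the helper again with the same random_state returns the same partition.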

Returns:

Dict[str, Any]: Dictionary containing X_train, X_val, X_test, y_train, y_val, y_test, feature_names, target_name, and experiment_config
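
A small helper can confirm that a returned dictionary has the documented keys before the splits are passed on to model code. The sketch below is illustrative only: summarize_experiment and REQUIRED_KEYS are hypothetical names, not part of edaflow, and the stand-in dictionary uses plain lists instead of DataFrames so the snippet runs on its own.

```python
# Keys listed in the Returns section above.
REQUIRED_KEYS = (
    "X_train", "X_val", "X_test",
    "y_train", "y_val", "y_test",
    "feature_names", "target_name", "experiment_config",
)


def summarize_experiment(experiment):
    """Return {split: row_count} after checking the documented keys exist."""
    missing = [key for key in REQUIRED_KEYS if key not in experiment]
    if missing:
        raise KeyError(f"experiment is missing keys: {missing}")

    summary = {}
    for split in ("train", "val", "test"):
        n_features_rows = len(experiment[f"X_{split}"])
        n_target_rows = len(experiment[f"y_{split}"])
        if n_features_rows != n_target_rows:
            raise ValueError(f"X_{split} and y_{split} differ in length")
        summary[split] = n_features_rows
    return summary


# Stand-in for a real result; the values here are made up for illustration.
fake_experiment = {
    "X_train": [[1.0], [2.0], [3.0]], "y_train": [0, 1, 0],
    "X_val": [[4.0]], "y_val": [1],
    "X_test": [[5.0]], "y_test": [0],
    "feature_names": ["f0"], "target_name": "target",
    "experiment_config": {"test_size": 0.2},
}
print(summarize_experiment(fake_experiment))  # {'train': 3, 'val': 1, 'test': 1}
```

Because the helper only relies on len() and key lookup, it should work equally well with the DataFrame and Series objects the real function is documented to return.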

Examples:

# Method 1: DataFrame with target column (recommended)
>>> experiment = ml.setup_ml_experiment(df, target_column='target')

# Method 2: Sklearn-style (also supported)
>>> X = df.drop('target', axis=1)
>>> y = df['target']
>>> experiment = ml.setup_ml_experiment(X=X, y=y)