edaflow.handle_outliers_median

edaflow.handle_outliers_median(df: DataFrame, columns: str | List[str] | None = None, method: str = 'iqr', iqr_multiplier: float = 1.5, inplace: bool = False, verbose: bool = True) → DataFrame

Replace outliers in numerical columns with the median value.

This function identifies outliers using one of three statistical methods (IQR, z-score, or modified z-score) and replaces each one with the median of its column. It is designed to pair with visualize_numerical_boxplots: inspect the outliers visually first, then replace them.

Parameters:

df (pd.DataFrame) – The input DataFrame
columns (Optional[Union[str, List[str]]], optional) – Column name(s) to process. If None, processes all numerical columns. Defaults to None.
method (str, optional) – Method used to identify outliers. Defaults to 'iqr'. Options:
    - 'iqr': Interquartile Range method; flags values outside [Q1 - iqr_multiplier*IQR, Q3 + iqr_multiplier*IQR], i.e. 1.5*IQR by default
    - 'zscore': Z-score method; flags values with |z-score| > 3
    - 'modified_zscore': Modified Z-score based on the median absolute deviation
iqr_multiplier (float, optional) – Multiplier for IQR method. Defaults to 1.5.
inplace (bool, optional) – If True, modifies the original DataFrame. If False, returns a new DataFrame. Defaults to False.
verbose (bool, optional) – If True, displays detailed information about the outlier handling process. Defaults to True.
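
The three detection rules listed under method can be illustrated with a minimal sketch. This is not edaflow's source code: it works on plain Python lists with the standard library so that each rule is visible, and several details are assumptions, namely linear-interpolation quartiles, a population standard deviation for the z-score, the conventional 0.6745 constant and 3.5 cutoff for the modified z-score (the cutoff is not stated above), and a median computed over all values including the outliers. All function names are hypothetical.

```python
import statistics


def iqr_mask(values, multiplier=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points (assumed here)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v < low or v > high for v in values]


def zscore_mask(values, threshold=3.0):
    """Flag values whose |z-score| exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std is an assumption
    if std == 0:
        return [False] * len(values)
    return [abs((v - mean) / std) > threshold for v in values]


def modified_zscore_mask(values, threshold=3.5):
    """Flag values using the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def replace_with_median(values, mask):
    """Return a new list with flagged values replaced by the median."""
    med = statistics.median(values)  # assumption: median of all values
    return [med if flagged else v for v, flagged in zip(values, mask)]


column_a = [1, 2, 3, 4, 5, 100]
print(iqr_mask(column_a))              # only 100 is flagged
print(zscore_mask(column_a))           # nothing is flagged
print(modified_zscore_mask(column_a))  # only 100 is flagged
print(replace_with_median(column_a, iqr_mask(column_a)))
```

Under these assumptions the z-score rule does not flag 100 in this six-value column: a single extreme value inflates the standard deviation, and with only six points no z-score can exceed 3. On very small samples the IQR and MAD-based rules are usually the more sensitive choices.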

Returns:

DataFrame with outliers replaced by median values. If inplace=True, this is the original DataFrame, modified in place.

Return type:

pd.DataFrame

Raises:

ValueError – If no valid numerical columns are found or if an invalid method is specified.
KeyError – If specified column(s) don’t exist in the DataFrame.
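
The column selection, inplace behaviour, and error conditions documented above could be sketched with pandas as follows. This is an illustrative reimplementation under stated assumptions, not the library's code: resolve_columns and replace_outliers_iqr are hypothetical names, only the IQR rule is shown, and the error messages are invented.

```python
import pandas as pd


def resolve_columns(df, columns=None):
    """Hypothetical helper: normalise `columns` to a list of numeric column names."""
    if columns is None:
        selected = df.select_dtypes(include="number").columns.tolist()
    elif isinstance(columns, str):
        selected = [columns]
    else:
        selected = list(columns)

    missing = [c for c in selected if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")

    numeric = [c for c in selected if pd.api.types.is_numeric_dtype(df[c])]
    if not numeric:
        raise ValueError("No valid numerical columns found")
    return numeric


def replace_outliers_iqr(df, columns=None, iqr_multiplier=1.5, inplace=False):
    """Hypothetical sketch: replace IQR outliers with the column median."""
    # Work on the caller's object only when inplace=True; otherwise on a copy
    target = df if inplace else df.copy()
    for col in resolve_columns(target, columns):
        q1 = target[col].quantile(0.25)
        q3 = target[col].quantile(0.75)
        iqr = q3 - q1
        is_outlier = (target[col] < q1 - iqr_multiplier * iqr) | (
            target[col] > q3 + iqr_multiplier * iqr
        )
        # Assumption: the median is taken over all values, outliers included
        target[col] = target[col].mask(is_outlier, target[col].median())
    return target


df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 100],
    "B": [10, 20, 30, 40, 50, 60],
    "C": ["x", "y", "z", "x", "y", "z"],
})
df_clean = replace_outliers_iqr(df)      # df itself is left unchanged
print(df_clean["A"].tolist())
```

Returning the same object when inplace=True mirrors the documented return value, so either call style can be assigned to a variable.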

Example

>>> import pandas as pd
>>> import edaflow
>>>
>>> # Create sample data with outliers
>>> df = pd.DataFrame({
...     'A': [1, 2, 3, 4, 5, 100],  # 100 is an outlier
...     'B': [10, 20, 30, 40, 50, 60],
...     'C': ['x', 'y', 'z', 'x', 'y', 'z']
... })
>>>
>>> # First visualize outliers
>>> edaflow.visualize_numerical_boxplots(df)
>>>
>>> # Then handle outliers
>>> df_clean = edaflow.handle_outliers_median(df)
>>>
>>> # Or handle specific columns
>>> df_clean = edaflow.handle_outliers_median(df, columns=['A'])
>>>
>>> # Or modify inplace
>>> edaflow.handle_outliers_median(df, inplace=True)

>>> # Alternative import style
>>> from edaflow.analysis import handle_outliers_median
>>> df_clean = handle_outliers_median(df, method='zscore')