edaflow.handle_outliers_median
- edaflow.handle_outliers_median(df: DataFrame, columns: str | List[str] | None = None, method: str = 'iqr', iqr_multiplier: float = 1.5, inplace: bool = False, verbose: bool = True) -> DataFrame
Replace outliers in numerical columns with the median value.
This function identifies outliers using a statistical method and replaces each one with the median of its column. It is designed to be used together with the visualize_numerical_boxplots function: inspect the outliers visually first, then replace them.
- Parameters:
df (pd.DataFrame) – The input DataFrame.
columns (Optional[Union[str, List[str]]], optional) – Column name(s) to process. If None, processes all numerical columns. Defaults to None.
method (str, optional) – Method used to identify outliers. Defaults to 'iqr'. Options:
  - 'iqr': Interquartile Range method; flags values outside [Q1 - iqr_multiplier*IQR, Q3 + iqr_multiplier*IQR] (multiplier 1.5 by default)
  - 'zscore': Z-score method; flags values with |z-score| > 3
  - 'modified_zscore': Modified Z-score based on the median absolute deviation
iqr_multiplier (float, optional) – Multiplier applied to the IQR when method='iqr'. Defaults to 1.5.
inplace (bool, optional) – If True, modifies the original DataFrame. If False, returns a new DataFrame. Defaults to False.
verbose (bool, optional) – If True, displays detailed information about the outlier handling process. Defaults to True.
- Returns:
- DataFrame with outliers replaced by median values.
If inplace=True, returns the modified original DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If no valid numerical columns are found or if an invalid method is specified.
KeyError – If the specified column(s) don't exist in the DataFrame.
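The 'iqr' rule described under Parameters can be illustrated with a minimal, library-independent sketch. This is not edaflow's implementation: the helper name replace_outliers_iqr is hypothetical, the quartiles use linear interpolation between data points (pandas' default quantile interpolation is also linear), and the median is assumed to be computed over all values in the column, outliers included.

```python
from statistics import median, quantiles


def replace_outliers_iqr(values, iqr_multiplier=1.5):
    """Return a copy of `values` with IQR outliers replaced by the median.

    Hypothetical helper for illustration only; not part of edaflow.
    """
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - iqr_multiplier * iqr
    upper = q3 + iqr_multiplier * iqr
    # Assumption: the replacement median is taken over all values,
    # including the outliers themselves.
    med = median(values)
    return [med if (v < lower or v > upper) else v for v in values]


column_a = [1, 2, 3, 4, 5, 100]
cleaned_a = replace_outliers_iqr(column_a)
print(cleaned_a)  # [1, 2, 3, 4, 5, 3.5]
```

Here Q1 = 2.25 and Q3 = 4.75, so the fences are -1.5 and 8.5; only 100 falls outside them and is replaced by the median 3.5. A larger iqr_multiplier widens the fences and flags fewer values.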
Example
>>> import pandas as pd
>>> import edaflow
>>>
>>> # Create sample data with outliers
>>> df = pd.DataFrame({
...     'A': [1, 2, 3, 4, 5, 100],  # 100 is an outlier
...     'B': [10, 20, 30, 40, 50, 60],
...     'C': ['x', 'y', 'z', 'x', 'y', 'z']
... })
>>>
>>> # First visualize outliers
>>> edaflow.visualize_numerical_boxplots(df)
>>>
>>> # Then handle outliers
>>> df_clean = edaflow.handle_outliers_median(df)
>>>
>>> # Or handle specific columns
>>> df_clean = edaflow.handle_outliers_median(df, columns=['A'])
>>>
>>> # Or modify inplace
>>> edaflow.handle_outliers_median(df, inplace=True)
>>>
>>> # Alternative import style:
>>> from edaflow.analysis import handle_outliers_median
>>> df_clean = handle_outliers_median(df, method='zscore')
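The choice of method matters on small samples such as column 'A' above. The following sketch compares a plain z-score rule with a modified z-score rule on that column. It is an illustration under stated assumptions, not edaflow's code: the function names are hypothetical, the z-score uses the population standard deviation, and the modified z-score uses the conventional 0.6745 scaling constant with a cutoff of 3.5, neither of which is stated in this documentation.

```python
from statistics import mean, median, pstdev


def zscore_flags(values, threshold=3.0):
    """Flag values whose |z-score| exceeds `threshold` (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [False] * len(values)  # constant column: nothing to flag
    return [abs(v - mu) / sigma > threshold for v in values]


def modified_zscore_flags(values, threshold=3.5):
    """Flag values by modified z-score based on the MAD (hypothetical helper).

    The 0.6745 constant and the 3.5 cutoff are the conventional choices,
    assumed here; the library's actual cutoff is not documented above.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # avoid division by zero
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


column_a = [1, 2, 3, 4, 5, 100]
print(zscore_flags(column_a))           # [False, False, False, False, False, False]
print(modified_zscore_flags(column_a))  # [False, False, False, False, False, True]
```

With only six values, the largest possible |z-score| is about 2.24 (or about 2.04 with the sample standard deviation), so a cutoff of 3 cannot flag 100 in this column. The median-based rule is not inflated by the outlier itself and does flag it.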