edaflow.visualize_histograms
- edaflow.visualize_histograms(df: DataFrame, columns: str | List[str] | None = None, title: str | None = None, figsize: tuple | None = None, bins: int | str = 'auto', kde: bool = True, show_stats: bool = True, show_normal_curve: bool = True, color_palette: str = 'Set2', alpha: float = 0.7, grid_alpha: float = 0.3, rows: int | None = None, cols: int | None = None, statistical_tests: bool = True, verbose: bool = True) -> None
Create comprehensive histogram visualizations with distribution analysis and skewness detection.
This function provides detailed histogram analysis for numerical columns, including:
- Distribution shape visualization with histograms and KDE curves
- Skewness and kurtosis analysis with interpretation
- Normal distribution comparison overlay
- Statistical tests for normality (Shapiro-Wilk, Anderson-Darling)
- Comprehensive distribution statistics and insights
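The shape statistics listed above can be reproduced by hand with SciPy. This is a minimal sketch, not edaflow's internal code: the helper name `summarize_distribution` is hypothetical, and only the Shapiro-Wilk test is shown.

```python
import numpy as np
import pandas as pd
from scipy import stats


def summarize_distribution(series: pd.Series) -> dict:
    """Hypothetical helper: shape statistics and a normality test for one column."""
    values = series.dropna().to_numpy()
    # Shapiro-Wilk: a small p-value is evidence against normality.
    shapiro_stat, shapiro_p = stats.shapiro(values)
    return {
        "skewness": float(stats.skew(values)),
        # Fisher definition: roughly 0 for normally distributed data.
        "excess_kurtosis": float(stats.kurtosis(values)),
        "shapiro_statistic": float(shapiro_stat),
        "shapiro_p": float(shapiro_p),
    }


rng = np.random.default_rng(42)
normal_summary = summarize_distribution(pd.Series(rng.normal(100, 15, 1000)))
skewed_summary = summarize_distribution(pd.Series(rng.exponential(2, 1000)))

print(normal_summary)
print(skewed_summary)
```

A positive skewness indicates a longer right tail, as in the exponential sample; values near zero are consistent with a symmetric distribution.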
- Parameters:
df (pd.DataFrame) – The input DataFrame.
columns (Optional[Union[str, List[str]]], optional) – Column name(s) to visualize. If None, processes all numerical columns. Defaults to None.
title (Optional[str], optional) – Main title for the entire plot. If None, auto-generated. Defaults to None.
figsize (Optional[tuple], optional) – Figure size (width, height). If None, auto-calculated. Defaults to None.
bins (Union[int, str], optional) – Number of bins or binning strategy. Options: int, 'auto', 'sturges', 'fd', 'scott', 'sqrt'. Defaults to 'auto'.
kde (bool, optional) – Whether to show the Kernel Density Estimation curve. Defaults to True.
show_stats (bool, optional) – Whether to display statistics on each subplot. Defaults to True.
show_normal_curve (bool, optional) – Whether to overlay a normal distribution curve. Defaults to True.
color_palette (str, optional) – Seaborn color palette. Defaults to 'Set2'.
alpha (float, optional) – Transparency of histogram bars (0-1). Defaults to 0.7.
grid_alpha (float, optional) – Transparency of grid lines (0-1). Defaults to 0.3.
rows (Optional[int], optional) – Number of rows in the subplot grid. If None, auto-calculated. Defaults to None.
cols (Optional[int], optional) – Number of columns in the subplot grid. If None, auto-calculated. Defaults to None.
statistical_tests (bool, optional) – Whether to run normality tests (Shapiro-Wilk, etc.). Defaults to True.
verbose (bool, optional) – If True, displays detailed distribution analysis. Defaults to True.
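The string options for `bins` share their names with NumPy's bin-width estimators. Assuming edaflow forwards them unchanged to the plotting backend (an assumption, not confirmed by this page), the following sketch shows how many bins each strategy would yield for a sample, using `numpy.histogram_bin_edges` directly:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(100, 15, 1000)

# Count the bins each named strategy produces for the same data.
bin_counts = {}
for strategy in ["auto", "sturges", "fd", "scott", "sqrt"]:
    edges = np.histogram_bin_edges(data, bins=strategy)
    bin_counts[strategy] = len(edges) - 1
    print(f"{strategy:>8}: {bin_counts[strategy]} bins")
```

Comparing the counts before plotting can help decide whether to pass a strategy name or a fixed integer such as `bins=30`.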
- Returns:
None. The function displays the histogram visualization as a side effect.
- Return type:
None
- Raises:
ValueError – If no numerical columns are found or the DataFrame is empty.
KeyError – If the specified column(s) don't exist in the DataFrame.
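The documented error conditions can be checked up front before plotting. The sketch below mirrors the rules described above; `resolve_numeric_columns` is a hypothetical helper written for illustration and is not part of edaflow's API.

```python
from typing import List, Optional, Union

import pandas as pd


def resolve_numeric_columns(
    df: pd.DataFrame,
    columns: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Hypothetical pre-check that follows the documented Raises behavior."""
    if df.empty:
        raise ValueError("DataFrame is empty")
    if columns is None:
        # No selection given: fall back to every numerical column.
        selected = df.select_dtypes(include="number").columns.tolist()
    else:
        requested = [columns] if isinstance(columns, str) else list(columns)
        missing = [name for name in requested if name not in df.columns]
        if missing:
            raise KeyError(f"Column(s) not found: {missing}")
        selected = [
            name for name in requested
            if pd.api.types.is_numeric_dtype(df[name])
        ]
    if not selected:
        raise ValueError("No numerical columns found")
    return selected


sample_df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "label": ["a", "b", "c"]})
print(resolve_numeric_columns(sample_df))           # all numerical columns
print(resolve_numeric_columns(sample_df, "price"))  # a single name as a string
```

Wrapping a call to `edaflow.visualize_histograms` in `try`/`except (ValueError, KeyError)` is the alternative when a pre-check is not wanted.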
Example
>>> import pandas as pd
>>> import numpy as np
>>> import edaflow
>>>
>>> # Create sample data with different distributions
>>> np.random.seed(42)
>>> df = pd.DataFrame({
...     'normal': np.random.normal(100, 15, 1000),
...     'skewed_right': np.random.exponential(2, 1000),
...     'skewed_left': 10 - np.random.exponential(2, 1000),
...     'uniform': np.random.uniform(0, 100, 1000)
... })
>>>
>>> # Basic histogram analysis
>>> edaflow.visualize_histograms(df)
>>>
>>> # Custom analysis with specific columns
>>> edaflow.visualize_histograms(
...     df,
...     columns=['normal', 'skewed_right'],
...     bins=30,
...     show_normal_curve=True,
...     statistical_tests=True
... )
>>>
>>> # Detailed styling
>>> edaflow.visualize_histograms(
...     df,
...     title="Distribution Analysis Dashboard",
...     color_palette='viridis',
...     alpha=0.8,
...     figsize=(15, 10)
... )