edaflow.visualize_histograms
- edaflow.visualize_histograms(df: DataFrame, columns: str | List[str] | None = None, title: str | None = None, figsize: tuple | None = None, bins: int | str = 'auto', kde: bool = True, show_stats: bool = True, show_normal_curve: bool = True, color_palette: str = 'Set2', alpha: float = 0.7, grid_alpha: float = 0.3, rows: int | None = None, cols: int | None = None, statistical_tests: bool = True, verbose: bool = True) -> None
Create comprehensive histogram visualizations with distribution analysis and skewness detection.
This function provides detailed histogram analysis for numerical columns, including:
- Distribution shape visualization with histograms and KDE curves
- Skewness and kurtosis analysis with interpretation
- Normal distribution comparison overlay
- Statistical tests for normality (Shapiro-Wilk, Anderson-Darling)
- Comprehensive distribution statistics and insights
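The shape statistics and normality testing listed above can be approximated with SciPy. The sketch below is a minimal illustration, assuming `scipy.stats` is available; `describe_distribution` is a hypothetical helper, not part of edaflow, and it covers only Shapiro-Wilk (Anderson-Darling is left out for brevity). edaflow's own thresholds and interpretation text may differ.

```python
import numpy as np
from scipy import stats


def describe_distribution(values, alpha=0.05):
    """Hypothetical helper: summarize shape and normality of a 1-D sample."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before computing statistics

    skewness = stats.skew(x)         # > 0: long right tail, < 0: long left tail
    excess_kurt = stats.kurtosis(x)  # Fisher definition, so a normal sample is ~0

    # Shapiro-Wilk: a small p-value suggests the sample is not normal.
    # SciPy notes the p-value may be less accurate for very large samples.
    shapiro_stat, shapiro_p = stats.shapiro(x)

    return {
        "skewness": float(skewness),
        "excess_kurtosis": float(excess_kurt),
        "shapiro_stat": float(shapiro_stat),
        "shapiro_p": float(shapiro_p),
        "looks_normal": bool(shapiro_p > alpha),
    }


rng = np.random.default_rng(0)
print(describe_distribution(rng.exponential(2, 1000)))
```

Reporting skewness alongside the test p-value is useful because, with large samples, Shapiro-Wilk rejects normality even for small departures; the skewness value indicates how large the departure actually is.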
- Parameters:
df (pd.DataFrame) – The input DataFrame
columns (Optional[Union[str, List[str]]], optional) – Column name(s) to visualize. If None, processes all numerical columns. Defaults to None.
title (Optional[str], optional) – Main title for the entire plot. If None, auto-generated. Defaults to None.
figsize (Optional[tuple], optional) – Figure size (width, height). If None, auto-calculated. Defaults to None.
bins (Union[int, str], optional) – Number of bins or binning strategy. Options: an int, or one of ‘auto’, ‘sturges’, ‘fd’ (Freedman-Diaconis), ‘scott’, ‘sqrt’. Defaults to ‘auto’.
kde (bool, optional) – Whether to show Kernel Density Estimation curve. Defaults to True.
show_stats (bool, optional) – Whether to display statistics on each subplot. Defaults to True.
show_normal_curve (bool, optional) – Whether to overlay normal distribution curve. Defaults to True.
color_palette (str, optional) – Seaborn color palette. Defaults to ‘Set2’.
alpha (float, optional) – Transparency of histogram bars (0-1). Defaults to 0.7.
grid_alpha (float, optional) – Transparency of grid lines (0-1). Defaults to 0.3.
rows (Optional[int], optional) – Number of rows in subplot grid. If None, auto-calculated. Defaults to None.
cols (Optional[int], optional) – Number of columns in subplot grid. If None, auto-calculated. Defaults to None.
statistical_tests (bool, optional) – Whether to run normality tests (Shapiro-Wilk, Anderson-Darling). Defaults to True.
verbose (bool, optional) – If True, displays detailed distribution analysis. Defaults to True.
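The `bins` strategy names listed above are names that NumPy's `np.histogram` also accepts. Assuming edaflow passes `bins` through to NumPy or Seaborn (this page does not say), the sketch below shows how each choice changes the bin count for a normal sample, and how a fitted normal curve like the one requested by `show_normal_curve=True` can be scaled to raw counts. `histogram_with_normal` is a hypothetical helper, not edaflow's implementation.

```python
import numpy as np
from scipy import stats


def histogram_with_normal(values, bins="auto"):
    """Hypothetical helper: bin a sample and compute a fitted normal curve in count units."""
    x = np.asarray(values, dtype=float)
    counts, edges = np.histogram(x, bins=bins)  # bins: an int or a NumPy strategy name
    centers = (edges[:-1] + edges[1:]) / 2
    width = edges[1] - edges[0]  # these strategies all produce equal-width bins

    # Scale the normal pdf by n * bin width so it lines up with the raw counts.
    mu, sigma = x.mean(), x.std(ddof=1)
    expected = len(x) * width * stats.norm.pdf(centers, loc=mu, scale=sigma)
    return counts, edges, expected


rng = np.random.default_rng(42)
sample = rng.normal(100, 15, 1000)
for strategy in ["auto", "sturges", "fd", "scott", "sqrt", 30]:
    counts, edges, _ = histogram_with_normal(sample, bins=strategy)
    print(f"{strategy!s:>8}: {len(edges) - 1} bins")
```

Multiplying the density by `n * bin_width` converts it to expected counts per bin, so the curve shares an axis with the bars; if the bars were density-normalized instead, the raw pdf would be plotted directly.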
- Returns:
Displays the histogram visualization
- Return type:
None
- Raises:
ValueError – If no numerical columns are found or DataFrame is empty.
KeyError – If specified column(s) don’t exist in the DataFrame.
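The documented exceptions imply a column-selection step that runs before any plotting. The hypothetical helper below sketches one way to enforce that contract with pandas; it is not edaflow's code, and silently skipping non-numeric columns in an explicit `columns` list is an assumption.

```python
from typing import List, Optional, Union

import pandas as pd


def select_numeric_columns(
    df: pd.DataFrame, columns: Optional[Union[str, List[str]]] = None
) -> List[str]:
    """Hypothetical helper mirroring the documented ValueError/KeyError rules."""
    if df.empty:
        raise ValueError("DataFrame is empty")

    numeric = df.select_dtypes(include="number").columns
    if columns is None:
        selected = list(numeric)  # default: every numerical column
    else:
        requested = [columns] if isinstance(columns, str) else list(columns)
        missing = [c for c in requested if c not in df.columns]
        if missing:
            raise KeyError(f"Column(s) not found: {missing}")
        # Assumption: non-numeric columns in an explicit list are skipped.
        selected = [c for c in requested if c in numeric]

    if not selected:
        raise ValueError("No numerical columns found")
    return selected
```

Checking for missing names before filtering by dtype keeps the two failure modes distinct: a typo in `columns` raises `KeyError`, while a valid request that contains no numeric data raises `ValueError`.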
Example
>>> import pandas as pd
>>> import numpy as np
>>> import edaflow
>>>
>>> # Create sample data with different distributions
>>> np.random.seed(42)
>>> df = pd.DataFrame({
...     'normal': np.random.normal(100, 15, 1000),
...     'skewed_right': np.random.exponential(2, 1000),
...     'skewed_left': 10 - np.random.exponential(2, 1000),
...     'uniform': np.random.uniform(0, 100, 1000)
... })
>>>
>>> # Basic histogram analysis
>>> edaflow.visualize_histograms(df)
>>>
>>> # Custom analysis with specific columns
>>> edaflow.visualize_histograms(
...     df,
...     columns=['normal', 'skewed_right'],
...     bins=30,
...     show_normal_curve=True,
...     statistical_tests=True
... )
>>>
>>> # Detailed styling
>>> edaflow.visualize_histograms(
...     df,
...     title="Distribution Analysis Dashboard",
...     color_palette='viridis',
...     alpha=0.8,
...     figsize=(15, 10)
... )