edaflow.visualize_histograms

edaflow.visualize_histograms(df: DataFrame, columns: str | List[str] | None = None, title: str | None = None, figsize: tuple | None = None, bins: int | str = 'auto', kde: bool = True, show_stats: bool = True, show_normal_curve: bool = True, color_palette: str = 'Set2', alpha: float = 0.7, grid_alpha: float = 0.3, rows: int | None = None, cols: int | None = None, statistical_tests: bool = True, verbose: bool = True) → None

Create comprehensive histogram visualizations with distribution analysis and skewness detection.

This function provides detailed histogram analysis for numerical columns, including:

  • Distribution shape visualization with histograms and KDE curves

  • Skewness and kurtosis analysis with interpretation

  • Normal distribution comparison overlay

  • Statistical tests for normality (Shapiro-Wilk, Anderson-Darling)

  • Comprehensive distribution statistics and insights
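To make the skewness and kurtosis figures concrete, here is a minimal, dependency-free sketch of the moment-based definitions. It is not edaflow's implementation: the helper names (`skewness`, `excess_kurtosis`, `describe_skew`) are hypothetical, and the 0.5 / 1.0 cut-offs are a common rule of thumb, assumed here for illustration only.

```python
import math
from typing import Sequence


def _central_moment(values: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values: Sequence[float]) -> float:
    """Biased (population) skewness: m3 / m2 ** 1.5."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return _central_moment(values, 3) / math.pow(m2, 1.5)


def excess_kurtosis(values: Sequence[float]) -> float:
    """Biased excess kurtosis: m4 / m2 ** 2 - 3 (0 for a normal distribution)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return _central_moment(values, 4) / (m2 ** 2) - 3.0


def describe_skew(g1: float) -> str:
    """Rule-of-thumb label (assumed thresholds, not edaflow's)."""
    if abs(g1) < 0.5:
        return "approximately symmetric"
    direction = "right" if g1 > 0 else "left"
    strength = "moderately" if abs(g1) < 1.0 else "highly"
    return f"{strength} {direction}-skewed"


sample = [1, 1, 1, 2, 10]  # one large value gives a long right tail
print(describe_skew(skewness(sample)))
```

These are the biased population estimators. Libraries that apply a small-sample bias correction, such as pandas `Series.skew`, will report slightly different values on the same data.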

Parameters:
  • df (pd.DataFrame) – The input DataFrame

  • columns (Optional[Union[str, List[str]]], optional) – Column name(s) to visualize. If None, processes all numerical columns. Defaults to None.

  • title (Optional[str], optional) – Main title for the entire plot. If None, auto-generated. Defaults to None.

  • figsize (Optional[tuple], optional) – Figure size (width, height). If None, auto-calculated. Defaults to None.

  • bins (Union[int, str], optional) – Number of bins or binning strategy. Options: int, ‘auto’, ‘sturges’, ‘fd’, ‘scott’, ‘sqrt’. Defaults to ‘auto’.

  • kde (bool, optional) – Whether to show Kernel Density Estimation curve. Defaults to True.

  • show_stats (bool, optional) – Whether to display statistics on each subplot. Defaults to True.

  • show_normal_curve (bool, optional) – Whether to overlay normal distribution curve. Defaults to True.

  • color_palette (str, optional) – Seaborn color palette. Defaults to ‘Set2’.

  • alpha (float, optional) – Transparency of histogram bars (0-1). Defaults to 0.7.

  • grid_alpha (float, optional) – Transparency of grid lines (0-1). Defaults to 0.3.

  • rows (Optional[int], optional) – Number of rows in subplot grid. If None, auto-calculated. Defaults to None.

  • cols (Optional[int], optional) – Number of columns in subplot grid. If None, auto-calculated. Defaults to None.

  • statistical_tests (bool, optional) – Whether to run normality tests (Shapiro-Wilk, Anderson-Darling). Defaults to True.

  • verbose (bool, optional) – If True, displays detailed distribution analysis. Defaults to True.
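The string options for bins share their names with the bin-width estimators accepted by NumPy's `histogram_bin_edges`. Whether edaflow delegates to NumPy internally is an assumption, not something this page states. Under that assumption, the sketch below shows how many bins each rule would choose for the same data, which can help when picking a value for bins.

```python
import numpy as np

# Right-skewed sample data, similar to the 'skewed_right' column in the example
rng = np.random.default_rng(42)
values = rng.exponential(scale=2.0, size=1000)

# Count the bins each NumPy estimator produces for this sample
bin_counts = {}
for rule in ("auto", "sturges", "fd", "scott", "sqrt"):
    edges = np.histogram_bin_edges(values, bins=rule)
    bin_counts[rule] = len(edges) - 1  # n edges define n - 1 bins
    print(f"{rule:>8}: {bin_counts[rule]} bins")
```

For 1000 observations, 'sturges' and 'sqrt' depend only on the sample size, while 'fd' and 'scott' also depend on the spread of the data, so they tend to differ most on skewed or heavy-tailed columns.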

Returns:

Nothing. The function displays the histogram visualization as a side effect.

Return type:

None

Raises:
  • ValueError – If no numerical columns are found or DataFrame is empty.

  • KeyError – If specified column(s) don’t exist in the DataFrame.
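One way to surface a KeyError early with a clearer message is to check the requested names before calling the function. The helper below is a hypothetical sketch, not part of edaflow. It accepts the same shapes as the columns parameter (a single name, a list of names, or None) but checks names only, so it does not verify that a column is numerical.

```python
from typing import List, Sequence, Union


def resolve_columns(
    available: Sequence[str],
    columns: Union[str, List[str], None] = None,
) -> List[str]:
    """Return the requested column names, or raise KeyError if any are missing.

    Hypothetical pre-check: `available` would typically be list(df.columns).
    """
    if columns is None:
        # No selection given: fall back to every available name
        return list(available)
    # Accept a single name as well as a list of names
    requested = [columns] if isinstance(columns, str) else list(columns)
    missing = [name for name in requested if name not in available]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return requested


available = ["normal", "skewed_right", "skewed_left", "uniform"]
print(resolve_columns(available, "normal"))
try:
    resolve_columns(available, ["normal", "skwed_right"])  # typo on purpose
except KeyError as exc:
    print(f"Caught: {exc}")
```

With a DataFrame, the result could then be passed as the columns argument, for example `columns=resolve_columns(list(df.columns), wanted)`, where `wanted` is whatever selection the caller built.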

Example

>>> import pandas as pd
>>> import numpy as np
>>> import edaflow
>>>
>>> # Create sample data with different distributions
>>> np.random.seed(42)
>>> df = pd.DataFrame({
...     'normal': np.random.normal(100, 15, 1000),
...     'skewed_right': np.random.exponential(2, 1000),
...     'skewed_left': 10 - np.random.exponential(2, 1000),
...     'uniform': np.random.uniform(0, 100, 1000)
... })
>>>
>>> # Basic histogram analysis
>>> edaflow.visualize_histograms(df)
>>>
>>> # Custom analysis with specific columns
>>> edaflow.visualize_histograms(
...     df,
...     columns=['normal', 'skewed_right'],
...     bins=30,
...     show_normal_curve=True,
...     statistical_tests=True
... )
>>>
>>> # Detailed styling
>>> edaflow.visualize_histograms(
...     df,
...     title="Distribution Analysis Dashboard",
...     color_palette='viridis',
...     alpha=0.8,
...     figsize=(15, 10)
... )