edaflow.visualize_heatmap
- edaflow.visualize_heatmap(df: DataFrame, heatmap_type: str = 'correlation', columns: str | List[str] | None = None, title: str | None = None, figsize: tuple | None = None, cmap: str = 'RdYlBu_r', annot: bool = True, fmt: str = '.2f', square: bool = True, linewidths: float = 0.5, cbar_kws: dict | None = None, method: str = 'pearson', missing_threshold: float = 5.0, verbose: bool = True) -> None
Create comprehensive heatmap visualizations for exploratory data analysis.
This function provides multiple types of heatmaps for different EDA purposes:
- Correlation heatmaps for numerical relationships
- Missing data pattern heatmaps
- Numerical data value heatmaps
- Cross-tabulation heatmaps for categorical relationships
- Parameters:
df (pd.DataFrame) – The input DataFrame
heatmap_type (str, optional) – Type of heatmap to create. Defaults to “correlation”. Options:
- “correlation”: Correlation matrix heatmap (default)
- “missing”: Missing data pattern heatmap
- “values”: Raw data values heatmap (for small datasets)
- “crosstab”: Cross-tabulation heatmap for categorical data
columns (Optional[Union[str, List[str]]], optional) – Column name(s) to include. If None, uses appropriate columns based on heatmap_type. Defaults to None.
title (Optional[str], optional) – Custom title for the heatmap. If None, auto-generated. Defaults to None.
figsize (Optional[tuple], optional) – Figure size (width, height). If None, auto-calculated. Defaults to None.
cmap (str, optional) – Colormap for the heatmap. Defaults to “RdYlBu_r”.
annot (bool, optional) – Whether to annotate cells with values. Defaults to True.
fmt (str, optional) – String formatting code for annotations. Defaults to “.2f”.
square (bool, optional) – Whether to make cells square-shaped. Defaults to True.
linewidths (float, optional) – Width of lines separating cells. Defaults to 0.5.
cbar_kws (Optional[dict], optional) – Keyword arguments for colorbar. Defaults to None.
method (str, optional) – Correlation method for correlation heatmaps. Options: “pearson”, “kendall”, “spearman”. Defaults to “pearson”.
missing_threshold (float, optional) – Threshold for missing data highlighting (%). Only used for missing data heatmaps. Defaults to 5.0.
verbose (bool, optional) – If True, displays detailed information about the heatmap creation process. Defaults to True.
- Returns:
Nothing is returned; the heatmap is displayed as a side effect.
- Return type:
None
- Raises:
ValueError – If heatmap_type is not supported or no suitable data is found.
KeyError – If specified column(s) don’t exist in the DataFrame.
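The method and missing_threshold parameters correspond to quantities that can be inspected directly with pandas before plotting. The sketch below is only an assumption about what the correlation and missing-data heatmaps summarize, not edaflow's actual implementation. The names `corr_by_method`, `missing_pct`, and `flagged` are illustrative, and treating the threshold as a strict "greater than" comparison is a guess.

```python
import numpy as np
import pandas as pd

# Sample data with one missing value in 'age' (illustrative only)
df = pd.DataFrame({
    "age": [25, 30, np.nan, 35, 32, 29, 31, 33],
    "income": [50000, 55000, 48000, 62000, 51000, 45000, 53000, 49000],
    "score": [85, 90, 78, 92, 88, 95, 81, 87],
    "category": ["A", "B", "A", "C", "B", "A", "C", "B"],
})

# Correlation matrices are defined for numeric columns only
numeric = df.select_dtypes(include="number")

# One matrix per correlation method; "kendall" is also accepted by
# DataFrame.corr but is omitted here because pandas needs SciPy for it
corr_by_method = {m: numeric.corr(method=m) for m in ("pearson", "spearman")}

# Percentage of missing values per column
missing_pct = df.isna().mean() * 100

# Assumption: columns above the threshold are the ones that get highlighted
missing_threshold = 5.0
flagged = missing_pct[missing_pct > missing_threshold].index.tolist()

print(corr_by_method["spearman"].round(2))
print(missing_pct)
print(flagged)
```

Checking these tables first makes it easier to judge whether a heatmap will be informative, for example whether any column has enough missing data to show a visible pattern.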
Example
>>> import pandas as pd
>>> import edaflow
>>>
>>> # Create sample data
>>> df = pd.DataFrame({
...     'age': [25, 30, 28, 35, 32, 29, 31, 33],
...     'income': [50000, 55000, 48000, 62000, 51000, 45000, 53000, 49000],
...     'score': [85, 90, 78, 92, 88, 95, 81, 87],
...     'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B']
... })
>>>
>>> # Correlation heatmap (default)
>>> edaflow.visualize_heatmap(df)
>>>
>>> # Missing data pattern heatmap
>>> edaflow.visualize_heatmap(df, heatmap_type="missing")
>>>
>>> # Custom styling
>>> edaflow.visualize_heatmap(
...     df,
...     heatmap_type="correlation",
...     method="spearman",
...     cmap="viridis",
...     title="Spearman Correlation Analysis"
... )
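The example does not cover the “crosstab” type, and this page does not state how columns are chosen for it. As a hedged sketch, the contingency table that such a heatmap would presumably visualize can be built with `pandas.crosstab`. The `segment` column is hypothetical, added only so there are two categorical columns to cross.

```python
import pandas as pd

# Two categorical columns; 'segment' is a hypothetical addition
df = pd.DataFrame({
    "category": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "segment": ["x", "y", "x", "y", "x", "y", "y", "x"],
})

# Count how often each (category, segment) pair occurs
table = pd.crosstab(df["category"], df["segment"])

print(table)
```

Each cell holds the number of rows that share a given pair of category values, which is the kind of count matrix a cross-tabulation heatmap colors.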