edaflow.profile_report

edaflow.profile_report(df: DataFrame, top_n_categorical: int = 5, output_format: str = 'html') → Any

Generate a comprehensive profiling report for a DataFrame.

This function creates an automated EDA report similar to ydata-profiling’s ProfileReport, including dataset overview, missing value analysis, categorical insights, and visualizations.

Parameters:
  • df (pd.DataFrame) – The input DataFrame to profile

  • top_n_categorical (int, optional) – Number of top categorical columns to analyze. Defaults to 5.

  • output_format (str, optional) – Output format for the report. Options: “html” (saves to temp file), “dict” (returns dict). Defaults to “html”.

Returns:

If output_format=“html”, returns the path to the HTML file.

If output_format=“dict”, returns a dict with:
  • ‘overview’: DataFrame with dataset info

  • ‘summary_stats’: DataFrame with summary statistics

  • ‘missing_values’: DataFrame with null analysis

  • ‘categorical_insights’: Dict with category distributions

  • ‘numeric_insights’: Dict with numeric column info

  • ‘visualizations’: Dict with matplotlib figures

Return type:

Any
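
Because the return type is Any, calling code may want to branch on what actually came back. The following is a minimal sketch, not part of edaflow: describe_result is a hypothetical helper, and it assumes the HTML path arrives as a str or an os.PathLike object, which the documentation above does not specify.

```python
import os
from typing import Any

# Keys listed in the Returns section for output_format="dict".
EXPECTED_KEYS = (
    "overview",
    "summary_stats",
    "missing_values",
    "categorical_insights",
    "numeric_insights",
    "visualizations",
)


def describe_result(result: Any) -> str:
    """Hypothetical helper: summarize what profile_report returned."""
    # Assumption: the HTML report path is a str or an os.PathLike object.
    if isinstance(result, (str, os.PathLike)):
        return f"HTML report at {os.fspath(result)}"
    if isinstance(result, dict):
        # Report any documented key that is absent from the dict.
        missing = [key for key in EXPECTED_KEYS if key not in result]
        if missing:
            return "dict report missing keys: " + ", ".join(missing)
        return "dict report with all documented keys"
    raise TypeError(f"unexpected result type: {type(result).__name__}")
```

With this helper, describe_result(edaflow.profile_report(df)) and describe_result(edaflow.profile_report(df, output_format="dict")) would go through the same code path regardless of the format requested.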

Raises:
  • ValueError – If df is empty or output_format is invalid

  • TypeError – If df is not a pandas DataFrame
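
The two documented exceptions can be caught together when a failed report should not stop a larger pipeline. This is a sketch under stated assumptions: try_profile is a hypothetical wrapper, not an edaflow function, and it takes the profiling callable as an argument so it can be exercised without the library installed.

```python
from typing import Any, Callable, Optional, Tuple


def try_profile(
    profile_fn: Callable[..., Any], df: Any, **kwargs: Any
) -> Tuple[Optional[Any], Optional[str]]:
    """Hypothetical wrapper: return (result, None) or (None, error message)."""
    try:
        return profile_fn(df, **kwargs), None
    except (TypeError, ValueError) as exc:
        # Only the exceptions listed under Raises are handled here;
        # anything else propagates to the caller unchanged.
        return None, f"{type(exc).__name__}: {exc}"
```

A call such as try_profile(edaflow.profile_report, df, output_format="dict") would then yield either the report or a short message naming the exception type.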

Examples

>>> import pandas as pd
>>> import edaflow
>>>
>>> # Create sample data
>>> df = pd.DataFrame({
...     'age': [25, 30, 35, 28, None, 45],
...     'salary': [50000, 60000, 70000, 55000, 65000, 80000],
...     'department': ['HR', 'IT', 'IT', 'HR', 'Finance', 'IT'],
...     'city': ['NYC', 'LA', 'NYC', 'LA', 'NYC', 'LA']
... })
>>>
>>> # Generate HTML report
>>> report_path = edaflow.profile_report(df)
>>> print(f"Report saved to: {report_path}")
>>>
>>> # Generate dict report
>>> report_dict = edaflow.profile_report(df, output_format="dict")
>>> print(report_dict['overview'])
>>>
>>> # Analyze top 3 categorical columns
>>> report = edaflow.profile_report(df, top_n_categorical=3, output_format="dict")
>>> print(report['categorical_insights'])

Alternative import:

>>> from edaflow.analysis import profile_report
>>> report = profile_report(df)