Installation Guide

Requirements

edaflow requires Python 3.8 or higher and the following dependencies:

  • pandas >= 1.5.0 - Data manipulation and analysis

  • numpy >= 1.21.0 - Numerical computing

  • matplotlib >= 3.5.0 - Static plotting

  • seaborn >= 0.11.0 - Statistical data visualization

  • scipy >= 1.9.0 - Scientific computing

  • plotly >= 5.0.0 - Interactive visualizations

  • scikit-learn >= 1.0.0 - Machine learning library (for regression analysis)

  • statsmodels >= 0.13.0 - Statistical analysis (for LOWESS smoothing)

  • missingno >= 0.5.2 - Missing data visualization
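
To see whether an existing environment already meets these minimums, a short script along the following lines can help. It is a minimal sketch that uses only the standard library. The names `MINIMUMS`, `version_tuple`, and `check_requirements` are illustrative and not part of edaflow, and the simplified version parsing ignores pre-release suffixes such as `rc1`.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "pandas": "1.5.0",
    "numpy": "1.21.0",
    "matplotlib": "3.5.0",
    "seaborn": "0.11.0",
    "scipy": "1.9.0",
    "plotly": "5.0.0",
    "scikit-learn": "1.0.0",
    "statsmodels": "0.13.0",
    "missingno": "0.5.2",
}


def version_tuple(text):
    """Turn '1.21.0' into (1, 21, 0); only the leading digits of each part are used."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so that '1.5' compares equal to '1.5.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 8)
    print("Python", sys.version.split()[0], "OK" if python_ok else "too old")
    for name, (installed, ok) in check_requirements().items():
        print(f"{name:<14} {installed or 'not installed':<14} {'OK' if ok else 'CHECK'}")
```

Any row marked CHECK points at a package that is missing or older than the listed minimum.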

Install from Source

If you want to install the latest development version from GitHub:

git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e .

Development Installation

For development work, install with additional development dependencies:

git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e ".[dev]"

This installs the following tools:

  • pytest - Testing framework

  • black - Code formatting

  • flake8 - Linting

  • isort - Import sorting

  • mypy - Type checking

  • build - Package building

  • twine - PyPI uploading

Verify Installation

To verify that edaflow is installed correctly:

import edaflow
print(edaflow.hello())
print(f"edaflow version: {edaflow.__version__}")

You should see output similar to the following (the version number reflects the release you installed):

Hello from edaflow! Ready for exploratory data analysis.
edaflow version: 0.8.6

Jupyter Notebook Setup

For the best experience with color-coded outputs and interactive visualizations:

pip install jupyter
jupyter notebook

Then in your notebook:

import pandas as pd
import edaflow

# Load data
df = pd.read_csv('your_data.csv')

# Beautiful color-coded output
edaflow.check_null_columns(df)

Troubleshooting

Import Error

If you encounter import errors, upgrade edaflow so that pip also installs any missing dependencies:

pip install --upgrade edaflow

Version Conflicts

If you have dependency conflicts, create a fresh virtual environment:

python -m venv fresh_env
# Activate the environment
source fresh_env/bin/activate    # Linux/macOS (on Windows: fresh_env\Scripts\activate)
pip install edaflow
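
The same environment can also be created from Python with the standard-library `venv` module, which is convenient in setup scripts. The following is a minimal sketch; `create_env` is a hypothetical helper name, and the returned activation command assumes a bash-like shell on Linux/macOS or the default `Scripts` layout on Windows.

```python
import os
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return the command that activates it."""
    env_dir = Path(path)
    venv.create(env_dir, with_pip=with_pip)
    if os.name == "nt":
        # Windows places the activation script under Scripts\
        return str(env_dir / "Scripts" / "activate")
    # POSIX shells (bash, zsh) activate by sourcing bin/activate
    return f"source {env_dir / 'bin' / 'activate'}"
```

Calling `create_env("fresh_env")` builds the environment and returns the activation command to run in your shell before `pip install edaflow`.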

Missing Dependencies

If specific visualizations don’t work, check that the packages they depend on are installed at the required versions:

# For interactive plots
pip install "plotly>=5.0.0"

# For advanced statistics
pip install "scikit-learn>=1.0.0" "statsmodels>=0.13.0"

Performance Issues

For large datasets, consider:

  • Using smaller samples for visualization functions

  • Increasing memory allocation for Jupyter notebooks

  • Using the verbose=False option in functions that support it
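
The first suggestion can be sketched with plain pandas. `sample_for_plotting` is a hypothetical helper, not an edaflow function, and the 10,000-row default is an arbitrary assumption to tune for your data.

```python
import pandas as pd


def sample_for_plotting(df, max_rows=10_000, seed=0):
    """Return df unchanged if it is small, otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    # A fixed random_state makes the sample (and the resulting plot) repeatable.
    return df.sample(n=max_rows, random_state=seed)


# Example: shrink a 1,000,000-row frame before passing it to a plotting function.
big = pd.DataFrame({"value": range(1_000_000)})
small = sample_for_plotting(big)
print(len(big), "->", len(small))
```

A random sample keeps the overall shape of most distributions, but rare values can disappear, so rerun on the full data when outliers matter.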

Getting Help

If you encounter issues:

  1. Check the existing GitHub Issues (https://github.com/evanlow/edaflow/issues)

  2. Create a new issue with your error details

  3. Include your Python version, edaflow version, and full error traceback
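
The version details for step 3 can be gathered with a few lines of standard-library Python. This is a minimal sketch; `environment_report` is an illustrative name, not an edaflow function, and the error traceback still has to be copied from your terminal or notebook separately.

```python
import platform
import sys
from importlib import metadata


def environment_report(package="edaflow"):
    """Collect the version details an issue report should contain, as plain text."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    lines = [
        f"Python:   {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
        f"{package}:  {package_version}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Paste the printed lines into the issue together with the full traceback.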