Installation Guide

Requirements

edaflow requires Python 3.8 or higher and the following dependencies:

pandas >= 1.5.0 - Data manipulation and analysis
numpy >= 1.21.0 - Numerical computing
matplotlib >= 3.5.0 - Static plotting
seaborn >= 0.11.0 - Statistical data visualization
scipy >= 1.9.0 - Scientific computing
plotly >= 5.0.0 - Interactive visualizations
scikit-learn >= 1.0.0 - Machine learning library (for regression analysis)
statsmodels >= 0.13.0 - Statistical analysis (for LOWESS smoothing)
missingno >= 0.5.2 - Missing data visualization

Install from PyPI (Recommended)

The easiest way to install edaflow is using pip from PyPI:

pip install edaflow

This will automatically install all required dependencies.

Install from Source

If you want to install the latest development version from GitHub:

git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e .

Development Installation

For development work, install with additional development dependencies:

git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e ".[dev]"

This includes tools for:

pytest - Testing framework
black - Code formatting
flake8 - Linting
isort - Import sorting
mypy - Type checking
build - Package building
twine - PyPI uploading

Verify Installation

To verify that edaflow is installed correctly:

import edaflow
print(edaflow.hello())
print(f"edaflow version: {edaflow.__version__}")

You should see:

Hello from edaflow! Ready for exploratory data analysis.
edaflow version: 0.8.6

Virtual Environment (Recommended)

It’s recommended to install edaflow in a virtual environment:

# Create virtual environment
python -m venv edaflow_env

# Activate (Windows)
edaflow_env\\Scripts\\activate

# Activate (macOS/Linux)
source edaflow_env/bin/activate

# Install edaflow
pip install edaflow

Jupyter Notebook Setup

For the best experience with color-coded outputs and interactive visualizations:

pip install jupyter
jupyter notebook

Then in your notebook:

import pandas as pd
import edaflow

# Load data
df = pd.read_csv('your_data.csv')

# Beautiful color-coded output
edaflow.check_null_columns(df)

Troubleshooting

Import Error

If you encounter import errors, ensure all dependencies are installed:

pip install --upgrade edaflow

Version Conflicts

If you have dependency conflicts, create a fresh virtual environment:

python -m venv fresh_env
# Activate the environment
pip install edaflow

Missing Dependencies

If specific visualizations don’t work, check for missing optional dependencies:

# For interactive plots
pip install plotly>=5.0.0

# For advanced statistics
pip install scikit-learn>=1.0.0 statsmodels>=0.13.0

Performance Issues

For large datasets, consider:

Using smaller samples for visualization functions
Increasing memory allocation for Jupyter notebooks
Using the verbose=False option in functions that support it

Getting Help

If you encounter issues:

Check the GitHub Issues
Create a new issue with your error details
Include your Python version, edaflow version, and full error traceback