Installation Guide
Requirements
edaflow requires Python 3.8 or higher and the following dependencies:
pandas >= 1.5.0 - Data manipulation and analysis
numpy >= 1.21.0 - Numerical computing
matplotlib >= 3.5.0 - Static plotting
seaborn >= 0.11.0 - Statistical data visualization
scipy >= 1.9.0 - Scientific computing
plotly >= 5.0.0 - Interactive visualizations
scikit-learn >= 1.0.0 - Machine learning library (for regression analysis)
statsmodels >= 0.13.0 - Statistical analysis (for LOWESS smoothing)
missingno >= 0.5.2 - Missing data visualization
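Before installing, you may want to see which of these requirements your environment already meets. The sketch below is not part of edaflow. It uses only the standard library's importlib.metadata, and its version comparison is deliberately simplified: it looks at the first three numeric release components and ignores pre-release tags. Treat its output as a hint. For exact specifier matching, the packaging library is the usual tool.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed above, keyed by the names pip uses.
MINIMUMS = {
    "pandas": "1.5.0",
    "numpy": "1.21.0",
    "matplotlib": "3.5.0",
    "seaborn": "0.11.0",
    "scipy": "1.9.0",
    "plotly": "5.0.0",
    "scikit-learn": "1.0.0",
    "statsmodels": "0.13.0",
    "missingno": "0.5.2",
}


def release_tuple(ver):
    """Turn '1.5.3' into (1, 5, 3); keeps only leading digits, so '0rc1' counts as 0."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.5' compares equal to '1.5.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ required, found " + sys.version.split()[0])
    for dist, minimum in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (need >= {minimum})")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{dist}: {installed} installed, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["All listed requirements satisfied."]:
        print(line)
```

Missing packages are not a problem at this stage: installing edaflow with pip pulls them in.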
Install from PyPI (Recommended)
The easiest way to install edaflow is using pip from PyPI:
pip install edaflow
This will automatically install all required dependencies.
Install from Source
If you want to install the latest development version from GitHub:
git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e .
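With an editable install, Python imports edaflow directly from your clone, so source edits take effect without reinstalling (modules that are already imported still need an interpreter or kernel restart). One way to confirm which copy is being picked up is to ask the import system where it would load the package from. The helper name below is hypothetical; it is built on importlib.util.find_spec from the standard library.

```python
import importlib.util


def package_location(name):
    """Return the file Python would import for `name`, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package this is the path to its __init__.py.
    return spec.origin


if __name__ == "__main__":
    # After `pip install -e .`, this path should point inside your cloned edaflow directory.
    print(package_location("edaflow"))
```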
Development Installation
For development work, install with additional development dependencies:
git clone https://github.com/evanlow/edaflow.git
cd edaflow
pip install -e ".[dev]"
This includes tools for:
pytest - Testing framework
black - Code formatting
flake8 - Linting
isort - Import sorting
mypy - Type checking
build - Package building
twine - PyPI uploading
Verify Installation
To verify that edaflow is installed correctly:
import edaflow
print(edaflow.hello())
print(f"edaflow version: {edaflow.__version__}")
You should see output similar to the following (the version number will match the release you installed):
Hello from edaflow! Ready for exploratory data analysis.
edaflow version: 0.8.6
Virtual Environment (Recommended)
It’s recommended to install edaflow in a virtual environment:
# Create virtual environment
python -m venv edaflow_env
# Activate (Windows)
edaflow_env\Scripts\activate
# Activate (macOS/Linux)
source edaflow_env/bin/activate
# Install edaflow
pip install edaflow
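To check from Python that the environment is actually active (for example, from inside a notebook kernel), you can compare sys.prefix with sys.base_prefix. This is a minimal sketch that assumes the environment was created with python -m venv as shown above; conda environments and very old virtualenv releases are not detected this way.

```python
import sys


def in_virtualenv():
    """True when running inside an environment created with `python -m venv`."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter:", sys.executable)
```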
Jupyter Notebook Setup
For the best experience with color-coded outputs and interactive visualizations:
pip install jupyter
jupyter notebook
Then in your notebook:
import pandas as pd
import edaflow
# Load data
df = pd.read_csv('your_data.csv')
# Beautiful color-coded output
edaflow.check_null_columns(df)
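If you don't have a CSV file handy, you can try the call on a small hand-built DataFrame. The data below is made up for illustration, and the plain-pandas null percentages are included only as a cross-check, assuming check_null_columns reports missing values per column as its name suggests.

```python
import numpy as np
import pandas as pd

# A tiny stand-in for 'your_data.csv' with deliberately missing values.
df = pd.DataFrame(
    {
        "age": [25, np.nan, 31, 40],
        "city": ["Paris", "Lyon", None, None],
        "score": [0.5, 0.7, 0.9, 0.1],
    }
)

# Plain pandas: percentage of missing values in each column.
null_pct = df.isnull().mean() * 100
print(null_pct)

try:
    import edaflow

    edaflow.check_null_columns(df)
except ImportError:
    print("edaflow could not be imported in this kernel; run `pip install edaflow`.")
```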
Troubleshooting
Import Error
If you encounter import errors, ensure all dependencies are installed:
pip install --upgrade edaflow
Version Conflicts
If you have dependency conflicts, create a fresh virtual environment:
python -m venv fresh_env
# Activate (Windows)
fresh_env\Scripts\activate
# Activate (macOS/Linux)
source fresh_env/bin/activate
pip install edaflow
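To confirm whether an existing environment actually has conflicts before rebuilding it, pip's built-in pip check command lists installed packages whose declared requirements are not satisfied. The wrapper below is a small convenience sketch, not part of edaflow; it runs pip through the current interpreter so it inspects the same environment that runs the script.

```python
import subprocess
import sys


def pip_check():
    """Run `pip check` for the current interpreter; return (ok, report)."""
    # `sys.executable -m pip` targets this interpreter's environment,
    # not whichever `pip` happens to be first on PATH.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    report = (result.stdout + result.stderr).strip()
    return result.returncode == 0, report


if __name__ == "__main__":
    ok, report = pip_check()
    print("No broken requirements found." if ok else report)
```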
Missing Dependencies
If specific visualizations don’t work, make sure the packages they rely on are installed and meet the minimum versions listed under Requirements:
# For interactive plots (the quotes stop the shell from reading >= as a redirect)
pip install "plotly>=5.0.0"
# For advanced statistics
pip install "scikit-learn>=1.0.0" "statsmodels>=0.13.0"
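Note that pip names and import names do not always match: scikit-learn is installed as scikit-learn but imported as sklearn. The sketch below (a hypothetical helper, not part of edaflow) asks the running interpreter which of these packages it can find, which helps when a notebook kernel uses a different environment than your terminal.

```python
import importlib.util

# pip (distribution) name -> name used in `import` statements.
IMPORT_NAMES = {
    "plotly": "plotly",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose modules the current interpreter cannot find."""
    return [
        dist
        for dist, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("plotly, scikit-learn and statsmodels are importable.")
```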
Performance Issues
For large datasets, consider:
Using smaller samples for visualization functions
Increasing memory allocation for Jupyter notebooks
Using the verbose=False option in functions that support it
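One way to apply the sampling suggestion is to downsample with pandas before calling a plotting function. The helper name and the 10,000-row cap below are arbitrary choices for illustration, not edaflow settings; summaries such as null counts are still best computed on the full data.

```python
import numpy as np
import pandas as pd


def sample_for_plotting(df, max_rows=10_000, seed=0):
    """Return at most `max_rows` randomly chosen rows of `df` (the frame itself if smaller)."""
    if len(df) <= max_rows:
        return df
    # A fixed random_state keeps the sample, and therefore the plots, reproducible.
    return df.sample(n=max_rows, random_state=seed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    big = pd.DataFrame({"x": np.arange(200_000), "y": rng.normal(size=200_000)})
    small = sample_for_plotting(big)
    print(len(big), "->", len(small))
```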
Getting Help
If you encounter issues:
Check the GitHub Issues page (https://github.com/evanlow/edaflow/issues) for existing reports
Create a new issue with your error details
Include your Python version, edaflow version, and the full error traceback
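The version details requested above can be gathered in one step with the standard library. This helper is a suggestion rather than an edaflow feature, and the package list is arbitrary; extend it with anything else that appears in your traceback.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("edaflow", "pandas", "numpy", "matplotlib")):
    """Collect version details worth pasting into a GitHub issue."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```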