A simple EDA tool that generates an automatic HTML report

Exploratory data analysis (EDA) is one of the most important steps in any data project. It gives us the key details about a dataset: a description of the data, data types, missing and null values, and so on. Although there are separate commands for each of these, we can get them all in a single step using Sweetviz.
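For comparison, here is a rough sketch of the separate commands you would otherwise run by hand in pandas. The small dataframe and its column names are made up purely for illustration; they are not the dataset used later in this article.

```python
import pandas as pd

# Tiny made-up sample standing in for a real dataset (values are illustrative only).
sample = pd.DataFrame(
    {
        "Age": [19, 21, None, 35],
        "Gender": ["Male", "Female", "Female", None],
        "Income": [15.0, 16.5, 17.0, 18.2],
    }
)

summary = sample.describe()      # count, mean, std, min, quartiles, max for numeric columns
dtypes = sample.dtypes           # data type of each column
missing = sample.isnull().sum()  # number of missing values per column

print(summary)
print(dtypes)
print(missing)
```

Each call answers one question at a time, which is the manual work a single Sweetviz report is meant to replace.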

Sweetviz is an open-source Python library that generates reports as HTML files. It is used to visualize a dataset, and it can also compare two dataframes or two subsets of the same dataframe.

Let’s try and use the tool on a dataset!

The dataset I am using is the “Mall Customer Cluster Analysis”. It has customer details like Customer ID, age, gender, annual income and spending score.

1) Installation — Much like for other tools, we use pip

! pip install sweetviz

2) Import and Load your dataset

import sweetviz as sv

import pandas as pd

df = pd.read_csv("Mall_customers.csv", encoding="utf-8")

Let’s move on to analysing the data through visualization.

3) Build report

To analyse the complete dataset, use analyze()

my_customer_report = sv.analyze(df)

my_customer_report.show_html("Customer.html")

The report of a complete dataframe

To compare two dataframes, such as a train set and a test set, split the dataframe and use compare()

from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2, random_state=10)

split_report = sv.compare([train, "Train"], [test, "Test"])

split_report.show_html("Spending_Report.html")

The report comparing two split dataframes

You can also compare two subsets of the same dataframe using compare_intra(). All of these reports are generated automatically as .html files.
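As a minimal sketch of compare_intra(), the helpers below split one dataframe into two groups with a boolean condition and compare them. The helper names split_condition and intra_report are hypothetical, and the "Gender" column name used in the usage line is an assumption about how the CSV labels that field, so adjust it to match your file.

```python
import pandas as pd


def split_condition(df, column, value):
    """Return a boolean Series: True for rows where df[column] equals value."""
    return df[column] == value


def intra_report(df, column, value, names):
    """Compare rows matching the condition against the remaining rows.

    names is a two-item list labelling the True group and the False group.
    """
    # Imported here so the helpers can be defined even where sweetviz is not installed.
    import sweetviz as sv

    return sv.compare_intra(df, split_condition(df, column, value), names)
```

With the dataframe loaded earlier, usage would look something like intra_report(df, "Gender", "Male", ["Male", "Female"]).show_html("Gender_Report.html"), where the output file name is only an example.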

Those are the main features of the Sweetviz library!

Thank you for reading! This is my first article.

If you want to get in touch with me, feel free to reach out on LinkedIn. You can also view the code and data I used here on my GitHub.

Software Developer -> Data Scientist -> Business Scientist