Exploratory data analysis (EDA) is one of the most important steps in any data project. It tells us the essentials about a dataset: descriptive statistics, data types, missing and null values, and so on. Although there is a separate command for each of these checks, Sweetviz can produce all of them in a single step.
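For comparison, here is a minimal pandas sketch of the separate checks mentioned above (summary statistics, data types, missing values). The four-row DataFrame and its column names are hypothetical, made up only so the snippet runs on its own; it is not the article's dataset.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset); one age is missing on purpose
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, None, 35],
    "Annual Income": [15, 16, 17, 18],
})

summary = df.describe()      # count, mean, std, min, quartiles, max of numeric columns
dtypes = df.dtypes           # data type of each column
missing = df.isnull().sum()  # number of missing (NaN/None) values per column

print(summary, dtypes, missing, sep="\n\n")
```

Each check is a separate call here; a Sweetviz report gathers this kind of information into one page.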
Sweetviz is an open-source Python library that generates EDA reports as self-contained HTML pages. It visualizes every feature in a dataset, and it can also compare two dataframes, or two subsets of the same dataframe, side by side.
Let’s try and use the tool on a dataset!
The dataset I am using is the “Mall Customer Cluster Analysis”. It has customer details like Customer ID, age, gender, annual income and spending score.
1) Installation: as with most Python packages, install Sweetviz with pip (the leading ! runs the command from a Jupyter notebook cell)
! pip install sweetviz
2) Import the libraries and load your dataset
import sweetviz as sv
import pandas as pd
# Load the mall customer data into a pandas DataFrame
df = pd.read_csv("Mall_customers.csv", encoding="utf-8")
With the data loaded, let’s move on to analysing it visually.
3) Build the report
To analyse the complete dataset, pass the dataframe to analyze().
To compare two dataframes, such as a training set and a test set, split the dataframe and pass both parts to compare():
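from sklearn.model_selection import train_test_split
# Hold out 20% of the rows as a test set; random_state makes the split reproducible
train, test = train_test_split(df, test_size=0.2, random_state=10)
# Each argument is [dataframe, display name]
split_report = sv.compare([train, "Train"], [test, "Test"])
split_report.show_html("split_report.html")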
You can also compare two subsets of the same dataframe using compare_intra(). Calling show_html() on any of these reports writes it to a standalone .html file and, by default, opens it in the browser.
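Those are the core features of Sweetviz: analyze() for a single dataset, compare() for two dataframes, and compare_intra() for two subsets of one dataframe!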