In this step-by-step tutorial, we’ll walk through how to use Pandas, a powerful data manipulation and analysis library for Python, to analyze sales data. Our objective is to answer business questions like “What’s the most profitable month?” and “What’s the best-selling product?”
1. Setup and Installation
First, install Pandas and Matplotlib for data visualization:
pip install pandas matplotlib
2. Reading Data
Download the sales data CSV file, sales_data.csv, which has the columns ‘Month’, ‘Product’, and ‘Revenue’.
Create a file named ‘sales_data.py’ in the same directory as ‘sales_data.csv’.
Add the following Python code:
import pandas as pd
# Load data into a Pandas DataFrame
df = pd.read_csv('sales_data.csv')
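Before cleaning, it helps to confirm that the file loaded the way you expect. The sketch below uses a small in-memory CSV with made-up rows (an assumption for illustration, not the real sales_data.csv) so that it runs on its own. With the real file, you would pass 'sales_data.csv' to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for sales_data.csv
sample_csv = io.StringIO(
    "Month,Product,Revenue\n"
    "January,Widget,1200.50\n"
    "January,Gadget,800.00\n"
    "February,Widget,950.25\n"
)

df = pd.read_csv(sample_csv)

# Quick checks: first rows, column types, and (rows, columns)
print(df.head())
print(df.dtypes)
print(df.shape)
```

If 'Revenue' shows up with an object or string type instead of a float type, some values probably contain text, which the cleaning step needs to handle.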
3. Data Cleaning
Before analysis, we need to ensure that the data is clean and in the right format.
3.1 Remove Missing Values
# Drop rows with missing values
df.dropna(inplace=True)
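Dropping rows silently can hide how much data was lost. As a minimal sketch, assuming a small made-up DataFrame with gaps, you can count missing values per column first and then report how many rows were removed:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps
df = pd.DataFrame({
    "Month": ["January", "January", "February", None],
    "Product": ["Widget", "Gadget", "Widget", "Gadget"],
    "Revenue": [1200.5, np.nan, 950.25, 400.0],
})

# Count missing values per column before dropping anything
missing_per_column = df.isna().sum()
print(missing_per_column)

rows_before = len(df)
df = df.dropna()  # returns a new DataFrame without the incomplete rows
rows_after = len(df)
print(f"Dropped {rows_before - rows_after} of {rows_before} rows.")
```

If a large share of rows disappears, it may be worth checking the source file before continuing with the analysis.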
3.2 Convert Data Types
# Convert 'Revenue' to float
df['Revenue'] = df['Revenue'].astype(float)
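Note that astype(float) raises an error if any value cannot be parsed, for example when the export contains currency symbols or thousands separators. One possible alternative, assuming the made-up raw values shown below, is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they can be dropped afterwards:

```python
import pandas as pd

# Hypothetical revenue values as they might appear in a raw export
df = pd.DataFrame({"Revenue": ["1,200.50", "$800", "n/a", "950.25"]})

# Strip dollar signs and thousands separators before converting
cleaned = df["Revenue"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" converts unparseable entries to NaN instead of raising
df["Revenue"] = pd.to_numeric(cleaned, errors="coerce")

# Drop the rows that could not be converted
df = df.dropna(subset=["Revenue"])
print(df)
```

Which characters need stripping depends on your data, so treat the regular expression here as a starting point.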
4. Data Analysis
4.1 Most Profitable Month
# Group data by 'Month' and sum 'Revenue'
monthly_sales = df.groupby('Month')['Revenue'].sum()
# Find the most profitable month (highest total revenue; the data has no cost column)
most_profitable_month = monthly_sales.idxmax()
print(f"The most profitable month is {most_profitable_month}.")
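By default, groupby sorts its keys, so month names come back in alphabetical order (April, August, ...) instead of calendar order. This does not affect idxmax, but it matters when you print or plot the result. A minimal sketch of reordering, assuming 'Month' holds full English month names and using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data; assumes full English month names
df = pd.DataFrame({
    "Month": ["March", "January", "February", "January"],
    "Revenue": [300.0, 100.0, 250.0, 50.0],
})

monthly_sales = df.groupby("Month")["Revenue"].sum()

calendar_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Keep only the months present in the data, in calendar order
month_order = [m for m in calendar_order if m in monthly_sales.index]
monthly_sales = monthly_sales.reindex(month_order)
print(monthly_sales)
```

If your file stores months as numbers or dates, the default sort order is already chronological and this step is unnecessary.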
4.2 Best-Selling Product
# Group data by 'Product' and sum 'Revenue'
product_sales = df.groupby('Product')['Revenue'].sum()
# Find the best-selling product (by total revenue, not units sold)
best_selling_product = product_sales.idxmax()
print(f"The best-selling product is {best_selling_product}.")
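A single winner is often less useful than a ranking. As a sketch with made-up sample data, nlargest returns the top products by revenue, and dividing by the total gives each product's share:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Product": ["Widget", "Gadget", "Widget", "Gizmo"],
    "Revenue": [1200.0, 800.0, 950.0, 300.0],
})

product_sales = df.groupby("Product")["Revenue"].sum()

# Top 2 products by total revenue
top_products = product_sales.nlargest(2)
print(top_products)

# Each product's share of total revenue, largest first
revenue_share = (product_sales / product_sales.sum()).sort_values(ascending=False)
print(revenue_share.round(3))
```

The number passed to nlargest is arbitrary here; pick whatever cutoff suits your report.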
5. Data Visualization
5.1 Plotting Monthly Sales
import matplotlib.pyplot as plt
# Bar chart of total revenue per month (groupby sorts the months alphabetically)
plt.bar(monthly_sales.index, monthly_sales)
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.title('Monthly Sales Data')
plt.show()
5.2 Plotting Product Sales
# Bar chart of total revenue per product
plt.bar(product_sales.index, product_sales)
plt.xlabel('Product')
plt.ylabel('Revenue')
plt.title('Product Sales Data')
plt.show()
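If you want both charts in one image, for example to attach to a report, one option is to draw them side by side with plt.subplots and save the figure to disk. The sketch below uses made-up totals, and the output filename sales_overview.png is an arbitrary choice. The "Agg" backend is selected only so the example runs without a display; you can omit that line in an interactive session.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical totals standing in for monthly_sales and product_sales
monthly_sales = pd.Series({"January": 150.0, "February": 250.0, "March": 300.0})
product_sales = pd.Series({"Widget": 450.0, "Gadget": 250.0})

# One row, two columns of axes
fig, (ax_month, ax_product) = plt.subplots(1, 2, figsize=(10, 4))

ax_month.bar(monthly_sales.index, monthly_sales.values)
ax_month.set_xlabel("Month")
ax_month.set_ylabel("Revenue")
ax_month.set_title("Monthly Sales Data")

ax_product.bar(product_sales.index, product_sales.values)
ax_product.set_xlabel("Product")
ax_product.set_ylabel("Revenue")
ax_product.set_title("Product Sales Data")

fig.tight_layout()  # prevent labels from overlapping
fig.savefig("sales_overview.png")  # hypothetical output filename
```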
6. Conclusion
Through this tutorial, you’ve learned how to load, clean, analyze, and visualize sales data using Pandas and Matplotlib. Now you can easily answer business-related questions to help make better decisions.