NumPy Statistical Functions with Code Examples

NumPy is a powerful library for numerical computations in Python. One of its many features is a collection of statistical functions that help in analyzing a dataset. In this tutorial, we’ll explore some of the basic NumPy statistical functions.

Pre-requisites

  • Python 3.x
  • NumPy library
  • Matplotlib (optional, used only for the charts)

If you haven’t installed NumPy yet, you can install it using pip. To visualize the dataset, we can use Matplotlib, a popular data visualization library for Python.

Shell
pip install numpy
pip install matplotlib

Dataset

For this tutorial, let’s use a simple dataset consisting of the scores of 10 students in Mathematics, Science, and English.

Python
# Sample Dataset
math_scores = [90, 85, 77, 95, 80, 84, 89, 92, 74, 88]
science_scores = [85, 90, 78, 92, 82, 88, 89, 90, 76, 91]
english_scores = [80, 79, 85, 87, 90, 86, 88, 91, 76, 84]

Step 1: Import NumPy

The first step is to import the NumPy library.

Python
import numpy as np

Step 2: Convert Dataset to NumPy Arrays

Convert the sample dataset into NumPy arrays for easier manipulation.

Python
# Convert lists to NumPy arrays
math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])
science_scores = np.array([85, 90, 78, 92, 82, 88, 89, 90, 76, 91])
english_scores = np.array([80, 79, 85, 87, 90, 86, 88, 91, 76, 84])
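To illustrate what "easier manipulation" can mean in practice, here is a minimal sketch of two operations that arrays support directly and plain lists do not: element-wise arithmetic and boolean masking. The names total_per_student and high_math are illustrative and not part of the original tutorial.

```python
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])
science_scores = np.array([85, 90, 78, 92, 82, 88, 89, 90, 76, 91])
english_scores = np.array([80, 79, 85, 87, 90, 86, 88, 91, 76, 84])

# Element-wise addition: one combined total per student, no explicit loop
total_per_student = math_scores + science_scores + english_scores
print(total_per_student)

# Boolean mask: keep only the Mathematics scores of 90 or above
high_math = math_scores[math_scores >= 90]
print(high_math)  # [90 95 92]
```

With plain Python lists, the + operator would concatenate the three lists instead of adding them position by position.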

Step 3: Mean

Calculate the mean (average) score for each subject.

Python
# Calculate mean
math_mean = np.mean(math_scores)
science_mean = np.mean(science_scores)
english_mean = np.mean(english_scores)

print(f"Math Mean: {math_mean}")
print(f"Science Mean: {science_mean}")
print(f"English Mean: {english_mean}")

Output
Math Mean: 85.4
Science Mean: 86.1
English Mean: 84.6
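The three separate calls can also be collapsed into one by stacking the subjects into a 2D array and using the axis argument of mean. The sketch below assumes a layout with one row per subject and one column per student; the names scores, subject_means, and student_means are illustrative.

```python
import numpy as np

# Rows are subjects (Mathematics, Science, English); columns are students
scores = np.array([
    [90, 85, 77, 95, 80, 84, 89, 92, 74, 88],
    [85, 90, 78, 92, 82, 88, 89, 90, 76, 91],
    [80, 79, 85, 87, 90, 86, 88, 91, 76, 84],
])

# axis=1 averages across each row: one mean per subject
subject_means = scores.mean(axis=1)
print(subject_means)

# axis=0 averages down each column: one mean per student
student_means = scores.mean(axis=0)
print(student_means.round(2))
```

The same axis argument is accepted by np.median, np.std, np.min, np.max, and np.sum, so this layout works for the later steps as well.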

Bar Chart

Plot a bar chart to compare the mean scores for each subject.

Figure: mean scores
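The plotting code for this chart is not included in the tutorial. One possible way to draw it with Matplotlib is sketched below; the output file name mean_scores.png, the axis label, and the variable names are assumptions made for illustration.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])
science_scores = np.array([85, 90, 78, 92, 82, 88, 89, 90, 76, 91])
english_scores = np.array([80, 79, 85, 87, 90, 86, 88, 91, 76, 84])

subjects = ["Mathematics", "Science", "English"]
means = [np.mean(math_scores), np.mean(science_scores), np.mean(english_scores)]

# One bar per subject, with bar height equal to the mean score
fig, ax = plt.subplots()
ax.bar(subjects, means)
ax.set_ylabel("Mean score")
ax.set_title("Mean scores")

# Hypothetical output file name; the figure is written to disk
out_path = Path("mean_scores.png")
fig.savefig(out_path)
plt.close(fig)
```

In an interactive session, calling plt.show() instead of fig.savefig() displays the chart in a window.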

Step 4: Median

Calculate the median score for each subject.

Python
# Calculate median
math_median = np.median(math_scores)
science_median = np.median(science_scores)
english_median = np.median(english_scores)

print(f"Math Median: {math_median}")
print(f"Science Median: {science_median}")
print(f"English Median: {english_median}")

Output
Math Median: 86.5
Science Median: 88.5
English Median: 85.5
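Each subject has an even number of scores (10), so the median is the average of the two middle values after sorting. The following sketch reproduces the Mathematics result by hand to show what np.median computes for a 1D array; the variable names are illustrative.

```python
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])

sorted_scores = np.sort(math_scores)
n = sorted_scores.size

if n % 2 == 0:
    # Even count: average the two middle values
    manual_median = (sorted_scores[n // 2 - 1] + sorted_scores[n // 2]) / 2
else:
    # Odd count: take the single middle value
    manual_median = float(sorted_scores[n // 2])

print(sorted_scores)   # [74 77 80 84 85 88 89 90 92 95]
print(manual_median)   # 86.5
```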

Step 5: Standard Deviation

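Calculate the standard deviation for each subject to measure how widely the scores spread around the mean. By default, np.std computes the population standard deviation (ddof=0), dividing by the number of values N rather than N - 1.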

Python
# Calculate standard deviation
math_std = np.std(math_scores)
science_std = np.std(science_scores)
english_std = np.std(english_scores)

print(f"Math Standard Deviation: {math_std}")
print(f"Science Standard Deviation: {science_std}")
print(f"English Standard Deviation: {english_std}")

Output
Math Standard Deviation: 6.390618123468184
Science Standard Deviation: 5.3563046963368315
English Standard Deviation: 4.651881339845203
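The values above are population standard deviations, because np.std divides by N unless told otherwise. If the 10 students are treated as a sample from a larger group, the sample standard deviation (dividing by N - 1) is usually the better estimate, and it is available through the ddof argument. The sketch below computes the population value by hand and compares both variants for Mathematics; the variable names are illustrative.

```python
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])

# Population standard deviation by hand: sqrt(mean of squared deviations)
deviations = math_scores - math_scores.mean()
population_std = np.sqrt(np.mean(deviations ** 2))

# Sample standard deviation: ddof=1 divides by N - 1 instead of N
sample_std = np.std(math_scores, ddof=1)

print(np.isclose(population_std, np.std(math_scores)))  # True
print(sample_std > population_std)                      # True
```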

Step 6: Minimum and Maximum

Find the minimum and maximum scores for each subject.

Python
# Calculate minimum and maximum
math_min, math_max = np.min(math_scores), np.max(math_scores)
science_min, science_max = np.min(science_scores), np.max(science_scores)
english_min, english_max = np.min(english_scores), np.max(english_scores)

print(f"Math Min-Max: {math_min}-{math_max}")
print(f"Science Min-Max: {science_min}-{science_max}")
print(f"English Min-Max: {english_min}-{english_max}")

Output
Math Min-Max: 74-95
Science Min-Max: 76-92
English Min-Max: 76-91
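Knowing the extreme values is often more useful together with their positions. As a small sketch, np.argmin and np.argmax return the 0-based index of the lowest and highest score, and np.ptp ("peak to peak") returns the range, that is, maximum minus minimum. The variable names are illustrative.

```python
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])

# 0-based positions of the lowest and highest Mathematics score
lowest_idx = np.argmin(math_scores)
highest_idx = np.argmax(math_scores)

# Range of the scores: max - min
score_range = np.ptp(math_scores)

print(lowest_idx, math_scores[lowest_idx])    # 8 74
print(highest_idx, math_scores[highest_idx])  # 3 95
print(score_range)                            # 21
```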

Step 7: Sum and Product

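Find the sum and product of scores for each subject. The product of ten two-digit scores is on the order of 10^19, which exceeds the largest 64-bit integer (about 9.22 × 10^18), so np.prod on an integer array silently wraps around and returns a wrong value. Passing dtype=object makes NumPy multiply Python integers, which have arbitrary precision.

Python
# Calculate sum and product
# dtype=object avoids silent 64-bit integer overflow in the product
math_sum, math_prod = np.sum(math_scores), np.prod(math_scores, dtype=object)
science_sum, science_prod = np.sum(science_scores), np.prod(science_scores, dtype=object)
english_sum, english_prod = np.sum(english_scores), np.prod(english_scores, dtype=object)

print(f"Math Sum-Product: {math_sum}-{math_prod}")
print(f"Science Sum-Product: {science_sum}-{science_prod}")
print(f"English Sum-Product: {english_sum}-{english_prod}")

Output
Math Sum-Product: 854-20051056675077120000
Science Sum-Product: 861-21944611184880384000
English Sum-Product: 846-18493246568788992000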
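Because these products are close to 2 × 10^19, it is worth checking them against the 64-bit integer limit. The sketch below is one way to run that check for Mathematics: it compares a plain int64 product with an exact product from math.prod (Python standard library, 3.8+). The explicit dtype=np.int64 and the variable names are choices made for this illustration.

```python
import math

import numpy as np

# Force int64 so the behaviour is the same on every platform
math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88], dtype=np.int64)

# Exact product using Python's arbitrary-precision integers
exact = math.prod(math_scores.tolist())

# int64 product: integer arithmetic in NumPy reductions wraps without an error
wrapped = int(np.prod(math_scores))

limit = np.iinfo(np.int64).max
print(exact > limit)   # True
print(exact)           # 20051056675077120000
print(wrapped)         # 1604312601367568384

# The two results differ by a multiple of 2**64, the signature of wraparound
print((exact - wrapped) % 2**64 == 0)  # True
```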

Boxplot

Create a boxplot to visualize the distribution, median, and outliers for each subject.

Figure: score distribution
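The plotting code for the boxplot is not included in the tutorial either. A minimal Matplotlib sketch that produces a comparable chart might look like this; the output file name score_distribution.png, the axis label, and the variable names are assumptions.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

math_scores = np.array([90, 85, 77, 95, 80, 84, 89, 92, 74, 88])
science_scores = np.array([85, 90, 78, 92, 82, 88, 89, 90, 76, 91])
english_scores = np.array([80, 79, 85, 87, 90, 86, 88, 91, 76, 84])

fig, ax = plt.subplots()

# One box per subject; the line inside each box marks the median
ax.boxplot([math_scores, science_scores, english_scores])
ax.set_xticklabels(["Mathematics", "Science", "English"])
ax.set_ylabel("Score")
ax.set_title("Score distribution")

# Hypothetical output file name; the figure is written to disk
out_path = Path("score_distribution.png")
fig.savefig(out_path)
plt.close(fig)
```

By default, Matplotlib draws the whiskers out to 1.5 times the interquartile range and plots any points beyond them as individual markers, which is how outliers would show up in the chart.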

With these simple functions, you can perform basic statistical analysis on a dataset. NumPy offers many more statistical functions, but these should provide a good starting point for basic data analysis.
