Filtering

Ten quick examples of filtering operations on pandas DataFrames

Data filtering is the process of selecting a subset of data based on specific criteria. It is a way to “filter out” what you don’t need, so you can more easily analyze, visualize, or understand your data. Filtering can be applied to rows, columns, or both, and it often precedes other data operations like sorting, grouping, and statistical analysis.
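
As a rough illustration of filtering rows, columns, or both, the sketch below uses a small made-up DataFrame (the column names and values are assumptions, not data from a real source). It relies only on boolean indexing, column-list selection, and `.loc`.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 4, 3, 2, 1],
    "C": ["x", "y", "x", "y", "x"],
})

# Rows only: keep rows where A is greater than 2.
rows_only = df[df["A"] > 2]

# Columns only: keep columns A and C.
cols_only = df[["A", "C"]]

# Rows and columns together: .loc takes a row mask and a list of columns.
both = df.loc[df["A"] > 2, ["A", "C"]]
```

Using `.loc` for the combined case selects rows and columns in one step instead of chaining two separate selections.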

Types of Filtering

  1. Conditional Filtering: Selecting data based on conditions, such as numerical or textual thresholds.
  2. Range-based Filtering: Selecting data within a certain range (e.g., dates between Jan 1, 2020, and Jan 31, 2020).
  3. Categorical Filtering: Selecting data that belongs to certain categories or classes.
  4. Text-based Filtering: Selecting data based on text patterns, often using regular expressions.
  5. Custom Filtering: Using complex logic or external algorithms to select data.
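
To make the range-based and categorical types above a little more concrete, here is a minimal sketch on a made-up `sales` table. The table, its column names, and its values are all hypothetical.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for this sketch.
sales = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-12-30", "2020-01-01", "2020-01-15", "2020-01-31", "2020-02-02"]
    ),
    "region": ["north", "south", "north", "east", "south"],
    "amount": [120, 80, 200, 50, 75],
})

# Range-based: dates from Jan 1, 2020 through Jan 31, 2020 (both ends included).
start = pd.Timestamp("2020-01-01")
end = pd.Timestamp("2020-01-31")
in_january = sales[sales["date"].between(start, end)]

# Categorical: rows whose region is one of the listed categories.
north_or_south = sales[sales["region"].isin(["north", "south"])]
```

Note that a timestamp later on Jan 31 (for example 15:00) would fall outside this range, because `end` represents midnight at the start of that day.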

Filtering data is an essential part of data analysis. Below are ten examples of filtering operations using the Python pandas library, each with short sample code. Unless an example defines its own data, it reuses the DataFrame `df` created in Example 1:

Example 1: Filter Rows Based on Single Condition

Python
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
filtered_df = df[df['A'] > 2]

Example 2: Filter Rows Using Multiple Conditions (AND)

Python
filtered_df = df[(df['A'] > 2) & (df['B'] < 5)]

Example 3: Filter Rows Using Multiple Conditions (OR)

Python
filtered_df = df[(df['A'] > 2) | (df['B'] < 5)]
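
A common stumbling block with Examples 2 and 3 is using Python's `and` / `or` instead of `&` / `|`. The sketch below, which rebuilds the same `df` as Example 1, shows the failure and one way to keep combined conditions readable. The mask names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})

# Python's `and` needs a single True/False value, so it fails on a whole Series.
try:
    df[(df["A"] > 2) and (df["B"] < 5)]
    and_worked = True
except ValueError:
    and_worked = False

# Naming each mask keeps longer conditions readable and avoids
# having to wrap every comparison in parentheses.
a_large = df["A"] > 2
b_small = df["B"] < 5
both = df[a_large & b_small]     # AND: element-wise
either = df[a_large | b_small]   # OR: element-wise
```

The parentheses in Examples 2 and 3 are needed because `&` and `|` bind more tightly than comparison operators such as `>` and `<`.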

Example 4: Filter Using isin()

Python
filtered_df = df[df['A'].isin([1, 3, 5])]

Example 5: Filter Using query()

Python
filtered_df = df.query("A > 2 and B < 5")
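
Thresholds often live in Python variables rather than in the query string. A minimal sketch, assuming the same `df` as Example 1 and two hypothetical variables `min_a` and `max_b`, uses the `@` prefix that `query()` provides for referring to local variables:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})

# Hypothetical thresholds held in ordinary Python variables.
min_a = 2
max_b = 5

# The @ prefix tells query() to look the name up in the surrounding Python scope.
filtered_df = df.query("A > @min_a and B < @max_b")
```

This avoids building the query string with string formatting.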

Example 6: Filter Using String Methods

Assume a DataFrame with a string column ‘Name’.

Python
# Use a separate variable so the numeric df from Example 1 stays available.
names_df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
filtered_df = names_df[names_df['Name'].str.startswith('A')]
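
Real string columns often contain missing values, which propagate into the mask and can make the indexing step fail. The sketch below uses made-up names (one of them missing) and passes `na=False` so that missing entries are simply treated as non-matches.

```python
import pandas as pd

# Hypothetical data with one missing name.
names_df = pd.DataFrame({"Name": ["Alice", "Bob", None, "Anna"]})

# na=False makes missing values count as "does not start with A".
mask = names_df["Name"].str.startswith("A", na=False)
starts_with_a = names_df[mask]
```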

Example 7: Filter Using between()

Python
filtered_df = df[df['A'].between(2, 4)]
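
By default `between()` includes both endpoints. The following sketch, assuming pandas 1.3 or later (where `inclusive` accepts string values) and the same `df` as Example 1, shows how the `inclusive` argument changes which rows are kept.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})

# Default: both endpoints included (2 <= A <= 4).
closed = df[df["A"].between(2, 4)]

# Left endpoint only (2 <= A < 4).
left_only = df[df["A"].between(2, 4, inclusive="left")]

# Neither endpoint (2 < A < 4).
open_interval = df[df["A"].between(2, 4, inclusive="neither")]
```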

Example 8: Filter Using ~ to Negate Condition

Python
filtered_df = df[~(df['A'] > 3)]

Example 9: Filter Using Regex

Assume a DataFrame with a string column ‘Text’.

Python
# Use a separate variable so the numeric df from Example 1 stays available.
text_df = pd.DataFrame({'Text': ['apple', 'banana', 'cherry', 'date']})
filtered_df = text_df[text_df['Text'].str.contains('app|ban', regex=True)]
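
Regex filters frequently need to ignore case, anchor the pattern, or cope with missing text. A small sketch on made-up data, using the `case` and `na` arguments of `str.contains()`:

```python
import pandas as pd

# Hypothetical data with mixed case and one missing value.
text_df = pd.DataFrame({"Text": ["Apple", "banana", None, "date"]})

# ^ anchors the match at the start of the string; (?:...) is a non-capturing group.
# case=False ignores letter case, and na=False treats missing text as a non-match.
mask = text_df["Text"].str.contains(r"^(?:app|ban)", case=False, na=False, regex=True)
matches = text_df[mask]
```

The non-capturing group is a deliberate choice here: pandas emits a warning when a pattern passed to `str.contains()` has capturing groups.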

Example 10: Filter Using a Custom Function

Python
def custom_filter(row):
    return row['A'] + row['B'] > 5

filtered_df = df[df.apply(custom_filter, axis=1)]
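
Row-wise `apply` calls the Python function once per row, which tends to be slow on large frames. When the logic can be written with column arithmetic, a vectorized mask usually gives the same result. The sketch below compares the two on made-up data, chosen so that some rows are actually filtered out.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 1, 3, 0, 1]})

def custom_filter(row):
    # Keep rows where the sum of A and B exceeds 5.
    return row["A"] + row["B"] > 5

# Row-wise version: one Python function call per row.
row_wise = df[df.apply(custom_filter, axis=1)]

# Vectorized version: the same condition expressed on whole columns.
vectorized = df[(df["A"] + df["B"]) > 5]
```

A custom function is still the more practical option when the condition cannot be expressed with column operations, for example when it calls external code for each row.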

These are just a few examples to get you started with filtering operations in Pandas. Each of these methods can be adapted for more complex conditions and data structures.