Data filtering is the process of selecting a subset of data based on specific criteria. It is a way to “filter out” what you don’t need, so you can more easily analyze, visualize, or understand your data. Filtering can be applied to rows, columns, or both, and it often precedes other data operations like sorting, grouping, and statistical analysis.
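As a minimal sketch of filtering rows and columns together, the snippet below uses `.loc` with a boolean mask for rows and a list of labels for columns. The `sales` table and its column names are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
sales = pd.DataFrame({
    "region": ["north", "south", "east", "west"],
    "units": [120, 45, 80, 10],
    "price": [2.5, 3.0, 1.75, 4.0],
})

# Row filter: a boolean mask that is True where units exceed 50.
row_mask = sales["units"] > 50

# Column filter: the labels of the columns to keep.
wanted_columns = ["region", "units"]

# .loc applies both at once: mask for rows, label list for columns.
subset = sales.loc[row_mask, wanted_columns]
print(subset)
```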
Types of Filtering
- Conditional Filtering: Selecting data based on conditions, such as numerical or textual thresholds.
- Range-based Filtering: Selecting data within a certain range (e.g., dates between Jan 1, 2020, and Jan 31, 2020).
- Categorical Filtering: Selecting data that belongs to certain categories or classes.
- Text-based Filtering: Selecting data based on text patterns, often using regular expressions.
- Custom Filtering: Using complex logic or external algorithms to select data.
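Range-based filtering on dates is not covered by the numbered examples, so here is a small sketch of the January 2020 case mentioned in the list. The `events` table and its columns are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical event log; dates are parsed into datetime values first.
events = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-31", "2020-01-01", "2020-01-15", "2020-01-31", "2020-02-01",
    ]),
    "value": [10, 20, 30, 40, 50],
})

start = pd.Timestamp("2020-01-01")
end = pd.Timestamp("2020-01-31")

# Keep rows whose date falls within the range, endpoints included.
in_january = events[(events["date"] >= start) & (events["date"] <= end)]
print(in_january)
```

Note that a timestamp such as 2020-01-31 10:00 would fall outside this range, because `end` is midnight at the start of Jan 31. With time-of-day data, comparing against `< pd.Timestamp("2020-02-01")` is one way around that.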
Filtering data is an essential part of data analysis. Below are ten examples of filtering operations using the Python Pandas library, each with an explanation and sample code:
Example 1: Filter Rows Based on Single Condition
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
filtered_df = df[df['A'] > 2]  # keep rows where column A is greater than 2
Example 2: Filter Rows Using Multiple Conditions (AND)
filtered_df = df[(df['A'] > 2) & (df['B'] < 5)]  # both conditions must hold; the parentheses are required
Example 3: Filter Rows Using Multiple Conditions (OR)
filtered_df = df[(df['A'] > 2) | (df['B'] < 5)]  # at least one condition must hold; the parentheses are required
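The parentheses around each condition matter because `&` and `|` bind more tightly than comparison operators in Python. The sketch below reuses the sample data from Example 1 to compare the AND and OR results, then shows what happens when the parentheses are left out.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# With parentheses, each comparison is evaluated first, then combined.
both = df[(df['A'] > 2) & (df['B'] < 5)]    # AND: rows meeting both conditions
either = df[(df['A'] > 2) | (df['B'] < 5)]  # OR: rows meeting at least one

# Without parentheses, & binds tighter than the comparisons, so Python reads
# the expression as df['A'] > (2 & df['B']) < 5, a chained comparison that
# needs a single True/False from a whole Series and therefore fails.
try:
    df[df['A'] > 2 & df['B'] < 5]
    raised = False
except ValueError:
    raised = True

print(both['A'].tolist(), either['A'].tolist(), raised)
```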
Example 4: Filter Using isin()
filtered_df = df[df['A'].isin([1, 3, 5])]
Example 5: Filter Using query()
filtered_df = df.query("A > 2 & B < 5")  # the condition is a string; column names are used directly
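When the thresholds live in Python variables rather than in the string itself, `query()` can reference them with an `@` prefix. The helper name `filter_by_thresholds` and its parameters are hypothetical, chosen only for this sketch.

```python
import pandas as pd


def filter_by_thresholds(frame, min_a, max_b):
    """Keep rows where A is above min_a and B is below max_b."""
    # @name refers to a Python variable in the calling scope, not a column.
    return frame.query("A > @min_a and B < @max_b")


df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})
filtered_df = filter_by_thresholds(df, min_a=2, max_b=5)
print(filtered_df)
```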
Example 6: Filter Using String Methods
Assume a DataFrame with a string column ‘Name’.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David']})
filtered_df = df[df['Name'].str.startswith('A')]
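Real string columns often contain missing values, and a string method can then produce missing entries in the mask, which boolean indexing may reject. A small sketch, assuming a hypothetical `people` table with one missing name, uses the `na=False` argument to treat missing values as non-matches.

```python
import pandas as pd

# Hypothetical data with one missing name.
people = pd.DataFrame({'Name': ['Alice', None, 'Bob', 'Anna']})

# na=False makes missing values count as "does not match".
starts_with_a = people['Name'].str.startswith('A', na=False)
filtered_people = people[starts_with_a]
print(filtered_people)
```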
Example 7: Filter Using between()
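df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})  # restore the numeric DataFrame; Example 6 reassigned df
filtered_df = df[df['A'].between(2, 4)]  # both endpoints are included by default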
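By default `between()` includes both endpoints. The sketch below contrasts that with an exclusive range, assuming pandas 1.3 or newer, where the `inclusive` parameter accepts the strings `"both"`, `"neither"`, `"left"`, and `"right"`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})

# Default: 2 <= A <= 4
closed = df[df['A'].between(2, 4)]

# Exclusive on both sides: 2 < A < 4 (string values assume pandas >= 1.3)
open_range = df[df['A'].between(2, 4, inclusive="neither")]

print(closed['A'].tolist(), open_range['A'].tolist())
```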
Example 8: Filter Using ~ to Negate Condition
filtered_df = df[~(df['A'] > 3)]
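Negating a condition is not always the same as writing the opposite comparison. The sketch below, which uses a hypothetical column containing one missing value, shows that `~(A > 3)` keeps the missing row while `A <= 3` drops it, because any comparison with NaN evaluates to False.

```python
import pandas as pd

# Hypothetical data with one missing value in column A.
df = pd.DataFrame({'A': [1.0, 4.0, float('nan'), 3.0]})

negated = df[~(df['A'] > 3)]  # NaN > 3 is False, so ~False keeps the NaN row
direct = df[df['A'] <= 3]     # NaN <= 3 is False, so the NaN row is dropped

print(len(negated), len(direct))
```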
Example 9: Filter Using Regex
Assume a DataFrame with a string column ‘Text’.
df = pd.DataFrame({'Text': ['apple', 'banana', 'cherry', 'date']})
filtered_df = df[df['Text'].str.contains('app|ban', regex=True)]
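The pattern in Example 9 matches `app` or `ban` anywhere in the string and is case-sensitive. As a variation, this sketch anchors the pattern to the start of the string and ignores case. The `fruits` data is made up for illustration.

```python
import pandas as pd

# Hypothetical data mixing upper and lower case.
fruits = pd.DataFrame({'Text': ['Apple', 'banana', 'pineapple', 'Bandana']})

# ^ anchors the match to the start; (?:...) groups without capturing.
pattern = r'^(?:app|ban)'

matches = fruits[
    fruits['Text'].str.contains(pattern, case=False, regex=True, na=False)
]
print(matches)
```

Here `pineapple` is excluded even though it contains `app`, because the match must begin at the first character.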
Example 10: Filter Using a Custom Function
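df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [5, 4, 3, 2, 1]})  # restore the numeric DataFrame; Example 9 reassigned df
def custom_filter(row):
    # Return True for rows to keep; with this data every row sums to 6, so all rows pass.
    return row['A'] + row['B'] > 5
filtered_df = df[df.apply(custom_filter, axis=1)]  # axis=1 passes each row to the function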
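Row-wise `apply` calls a Python function once per row, which tends to be slow on large tables. When the logic can be written with column arithmetic, a vectorized mask usually gives the same result. The sketch below compares the two approaches on hypothetical data where the sums differ from row to row.

```python
import pandas as pd

# Hypothetical data chosen so that only some rows pass the filter.
scores = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, 4, 5]})


def custom_filter(row):
    """Return True when the row's A and B values sum to more than 5."""
    return row['A'] + row['B'] > 5


# Row-wise version: one function call per row.
row_wise = scores[scores.apply(custom_filter, axis=1)]

# Vectorized version: the same condition expressed on whole columns.
vectorized = scores[(scores['A'] + scores['B']) > 5]

print(row_wise.equals(vectorized))
```

A custom function remains the more flexible choice when the condition depends on logic that column operations cannot express.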
These are just a few examples to get you started with filtering operations in Pandas. Each of these methods can be adapted for more complex conditions and data structures.