Pandas

Intermediate
20 min

What is Pandas?

Pandas is a Python library for data manipulation and analysis. Its core object, the DataFrame, holds tabular data in labeled rows and columns, much like a spreadsheet in Excel.

Key Components:

  • DataFrame for tabular data
  • Powerful filtering and grouping
  • Handles missing data
  • Reads many file formats

Why it matters

Data Import

Read CSV, Excel, SQL data

Data Cleaning

Handle missing values

Analysis

Group, filter, transform

Export

Save to various formats

Key Concepts

DataFrame

2D table with labeled rows and columns, like an Excel sheet

Example: df = pd.read_csv("file.csv")...
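A DataFrame can also be built directly from in-memory data, which is handy for experimenting without a file. A minimal sketch, assuming made-up column names (Product, Sales) used purely for illustration:

```python
import pandas as pd

# Hypothetical sample data; the column names and values are assumptions.
df = pd.DataFrame({
    "Product": ["Apple", "Banana", "Cherry"],
    "Sales": [1200, 800, 1500],
})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
```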

Series

Single labeled column of a DataFrame

Example: df["Sales"]...
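Selecting with single brackets gives a Series, while double brackets give a one-column DataFrame. The sketch below illustrates the difference on made-up data (the values are assumptions):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Product": ["Apple", "Banana"], "Sales": [1200, 800]})

sales = df["Sales"]          # single brackets: a Series
sales_frame = df[["Sales"]]  # double brackets: a one-column DataFrame

print(type(sales).__name__)
print(type(sales_frame).__name__)
print(sales.sum())           # Series support aggregations directly
```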

GroupBy

Split rows into groups, then aggregate each group

Example: df.groupby("Product")["Sales"].sum()...
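The result of a grouped sum is a Series indexed by the group key. If a regular table is preferred, reset_index() turns the key back into a column. A small sketch on made-up data (values are assumptions):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Product": ["Apple", "Banana", "Apple", "Banana"],
    "Sales": [100, 200, 300, 400],
})

totals = df.groupby("Product")["Sales"].sum()  # Series indexed by Product
table = totals.reset_index()                   # DataFrame with Product and Sales columns

print(totals)
print(table)
```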

Merge

Combine DataFrames on a shared key column

Example: pd.merge(df1, df2, on="ID")...
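By default pd.merge performs an inner join, keeping only keys present in both frames; the how parameter changes that. The sketch below uses two hypothetical tables (customers and orders are illustrative names, not from the lesson):

```python
import pandas as pd

# Hypothetical tables sharing an "ID" column.
customers = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"ID": [1, 2, 4], "Sales": [250, 400, 100]})

inner = pd.merge(customers, orders, on="ID")             # default how="inner"
left = pd.merge(customers, orders, on="ID", how="left")  # keep every customer

print(inner)  # only IDs found in both tables
print(left)   # unmatched customers get NaN in Sales
```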

How to use

1

Import pandas

import pandas as pd

2

Load data

df = pd.read_csv("data.csv")

3

Preview

df.head() to see the first five rows
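head() accepts a row count, and tail() does the same from the end of the table. A quick sketch on ten rows of made-up data:

```python
import pandas as pd

# Ten rows of hypothetical data.
df = pd.DataFrame({"Sales": range(10)})

print(df.head())   # first 5 rows by default
print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
```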

4

Clean

Handle missing values, filter rows
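One possible cleaning pass is sketched below: count missing values, drop rows that lack a key field, fill the remaining gaps, then filter with a boolean condition. The data and the choice to fill missing sales with 0 are assumptions for illustration; the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "Product": ["Apple", "Banana", None, "Cherry"],
    "Sales": [1200.0, np.nan, 500.0, 1500.0],
})

print(df.isna().sum())  # missing values per column

cleaned = df.dropna(subset=["Product"])                     # drop rows with no product
cleaned = cleaned.assign(Sales=cleaned["Sales"].fillna(0))  # assumption: missing sales mean 0
high = cleaned[cleaned["Sales"] > 1000]                     # boolean filter on a column

print(high)
```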

5

Analyze

Group, aggregate, calculate
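Several statistics can be computed per group in one call with agg(). A minimal sketch on made-up data, where the result has one row per product and one column per statistic:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "Product": ["Apple", "Banana", "Apple", "Banana"],
    "Sales": [100, 200, 300, 400],
})

stats = df.groupby("Product")["Sales"].agg(["sum", "mean", "count"])
print(stats)
```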

6

Export

df.to_csv("output.csv")
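By default to_csv also writes the row index as an unnamed first column; passing index=False leaves it out. The sketch below calls to_csv without a path, which returns the CSV text, so the difference can be inspected without creating a file (the data is made up):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Product": ["Apple", "Banana"], "Sales": [1200, 800]})

with_index = df.to_csv()                # no path given: returns the CSV as a string
without_index = df.to_csv(index=False)  # omit the row index column

print(with_index)
print(without_index)
```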

Example

Goal: Total sales by product
import pandas as pd
df = pd.read_csv("sales.csv")
summary = df.groupby("Product")["Sales"].sum()
print(summary)
Result: Prints a Series of total sales, indexed by product
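To try the example above without a sales.csv on disk, the same steps can be run against in-memory CSV text. The rows below are invented stand-ins for the real file:

```python
import io

import pandas as pd

# Made-up CSV text standing in for sales.csv.
csv_text = """Product,Sales
Apple,100
Banana,200
Apple,300
"""

df = pd.read_csv(io.StringIO(csv_text))  # read_csv accepts file-like objects
summary = df.groupby("Product")["Sales"].sum()
print(summary)
```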

Pro Tips

  • Use df.info(): See column data types and non-null counts
  • Use df.describe(): Summary statistics for numeric columns
  • Vectorized operations: Avoid Python loops over rows; use column-wise pandas methods
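The last tip can be illustrated by computing the same column two ways. The Price and Quantity columns are hypothetical; both approaches give identical results, but the vectorized form is shorter and typically much faster on large tables.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Price": [2.0, 3.5, 4.0], "Quantity": [10, 4, 5]})

# Row-by-row loop: works, but slow on large DataFrames.
loop_totals = []
for _, row in df.iterrows():
    loop_totals.append(row["Price"] * row["Quantity"])

# Vectorized: one expression applied to whole columns at once.
df["Total"] = df["Price"] * df["Quantity"]

print(df["Total"].tolist())
```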

Practice

Read an Excel file, keep only the rows where Sales > 1000, and save the result as a new CSV file