🐼 Part 2: Pandas Basics

🎬 Series vs DataFrame

A Series is a 1D labelled array (think: one column). A DataFrame is a 2D table (rows × columns) with named columns. It's the workhorse of data analysis in Python.
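
To make the distinction concrete, here is a minimal sketch that builds one of each and shows that pulling a single column out of a DataFrame gives you a Series. The city names and numbers are borrowed from Step 1; the variable names (sales, col) are only illustrative.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
sales = pd.Series([120, 90, 200],
                  index=["Mumbai", "Delhi", "Bangalore"],
                  name="Sales")
print(sales["Delhi"])          # look up a value by its label

# A DataFrame: several named columns sharing one row index.
df = pd.DataFrame({
    "City":  ["Mumbai", "Delhi", "Bangalore"],
    "Sales": [120, 90, 200],
})

# Selecting a single column hands back a Series.
col = df["Sales"]
print(type(col).__name__)
print(df.shape)                # (rows, columns)
```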

Step 1: Build a DataFrame from a dict

python
import pandas as pd

df = pd.DataFrame({
    "City":   ["Mumbai", "Delhi", "Bangalore"],
    "Sales":  [120, 90, 200],
    "Region": ["West", "North", "South"],
})
print(df)
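
A dict of columns is not the only input pandas accepts. As a small sketch, the same table can be built row by row from a list of dicts, which is often the shape data has when it comes from JSON. The names records and df_rows are made up for this example.

```python
import pandas as pd

# Each dict is one row; the keys become column names.
records = [
    {"City": "Mumbai",    "Sales": 120, "Region": "West"},
    {"City": "Delhi",     "Sales": 90,  "Region": "North"},
    {"City": "Bangalore", "Sales": 200, "Region": "South"},
]

df_rows = pd.DataFrame(records)
print(df_rows)
print(df_rows.columns.tolist())
```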

Step 2: Read a CSV

We've mounted a small sales.csv file into the runtime. Press Run.

City       Month  Product   Units_Sold  Revenue
Mumbai     Jan    Notebook  120         2400
Mumbai     Feb    Notebook  150         3000
Delhi      Jan    Notebook  90          1800
Bangalore  Jan    Notebook  200         4000
Chennai    Feb    Notebook  95          1900
python
import pandas as pd

df = pd.read_csv("sales.csv")
print(df.head())
print("\nshape:", df.shape)
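
If you are following along outside this runtime, you won't have sales.csv on disk. One way around that, sketched below, is to feed read_csv an in-memory text buffer. The rows are copied from the preview above; it is an assumption that the real file contains exactly these five rows and nothing more.

```python
import io
import pandas as pd

# Stand-in for sales.csv, copied from the preview table (assumed complete).
csv_text = """City,Month,Product,Units_Sold,Revenue
Mumbai,Jan,Notebook,120,2400
Mumbai,Feb,Notebook,150,3000
Delhi,Jan,Notebook,90,1800
Bangalore,Jan,Notebook,200,4000
Chennai,Feb,Notebook,95,1900
"""

# read_csv accepts any file-like object, not just a path.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print("shape:", df.shape)
```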

Step 3: Explore

python
import pandas as pd
df = pd.read_csv("sales.csv")

print("--- info ---")
df.info()
print("\n--- describe ---")
print(df.describe())
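
A few more one-liners are handy when you first meet a dataset. The sketch below rebuilds the preview rows in memory (an assumption: the real sales.csv may hold more rows) and then checks column types, counts rows per city, and pulls a single statistic out of the describe() table.

```python
import pandas as pd

# Preview rows from sales.csv, rebuilt by hand for a self-contained example.
df = pd.DataFrame({
    "City":       ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month":      ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Product":    ["Notebook"] * 5,
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue":    [2400, 3000, 1800, 4000, 1900],
})

print(df.dtypes)                     # data type of each column
city_counts = df["City"].value_counts()
print(city_counts)                   # how many rows each city has

# By default describe() summarises the numeric columns only.
summary = df.describe()
print(summary.loc["mean", "Revenue"])
```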

Step 4: Select columns & rows

python
import pandas as pd
df = pd.read_csv("sales.csv")

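print("one column :\n", df["City"].head(), "\n")
print("two cols   :\n", df[["City", "Revenue"]].head(), "\n")
# loc slices by label and includes the end label: rows 0, 1 and 2
print("loc by label:\n", df.loc[0:2, ["City", "Revenue"]], "\n")
# iloc slices by position and excludes the end: rows 0 and 1, first three columns
print("iloc by pos :\n", df.iloc[0:2, 0:3])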
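
The difference between loc and iloc is easy to miss while the row labels happen to be 0, 1, 2, and so on. The sketch below, using the preview rows rebuilt in memory (assumed to match sales.csv), shows two things: loc includes the end of a slice while iloc does not, and once the rows are reordered, "label 0" and "position 0" point at different rows.

```python
import pandas as pd

df = pd.DataFrame({
    "City":       ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month":      ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Product":    ["Notebook"] * 5,
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue":    [2400, 3000, 1800, 4000, 1900],
})

by_label = df.loc[0:2, ["City", "Revenue"]]   # labels 0, 1, 2 (end included)
by_pos = df.iloc[0:2, 0:3]                    # positions 0, 1 (end excluded)
print(len(by_label), len(by_pos))

# After sorting, each row keeps its original label but moves position.
ranked = df.sort_values("Revenue", ascending=False)
top_city = ranked["City"].iloc[0]      # first row in the sorted order
label_zero_city = ranked.loc[0, "City"]  # row that was originally first
print(top_city, label_zero_city)
```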

Step 5: Filter rows

python
import pandas as pd
df = pd.read_csv("sales.csv")

big = df[df["Revenue"] > 2000]
print(big)
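
Real questions usually combine conditions. A minimal sketch, again on the preview rows rebuilt in memory (assumed to match sales.csv): join boolean masks with & (and) or | (or), wrapping each condition in parentheses, use isin for "one of these values", or write the same filter as a query string.

```python
import pandas as pd

df = pd.DataFrame({
    "City":       ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month":      ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Product":    ["Notebook"] * 5,
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue":    [2400, 3000, 1800, 4000, 1900],
})

# Parentheses are required: & binds tighter than == and >.
jan_big = df[(df["Month"] == "Jan") & (df["Revenue"] > 2000)]
print(jan_big)

# isin keeps rows whose value appears in the given list.
metros = df[df["City"].isin(["Mumbai", "Delhi"])]
print(metros)

# The same filter as jan_big, written as a query string.
same = df.query("Month == 'Jan' and Revenue > 2000")
```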

Step 6: Add / drop columns

python
import pandas as pd
df = pd.read_csv("sales.csv")

df["Price_per_Unit"] = df["Revenue"] / df["Units_Sold"]
print(df.head())

df2 = df.drop(columns=["Month"])
print("\nafter drop:\n", df2.head())
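
One detail worth checking for yourself: drop returns a new DataFrame and leaves the original alone, whereas df["new"] = ... changes df in place. The sketch below demonstrates this on the preview rows (rebuilt in memory, assumed to match sales.csv) and shows assign as a non-mutating way to add a column. The names df2 and df3 are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "City":       ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month":      ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue":    [2400, 3000, 1800, 4000, 1900],
})

# drop gives back a copy without the column; df itself is unchanged.
df2 = df.drop(columns=["Month"])
print("Month" in df.columns, "Month" in df2.columns)

# assign also returns a new DataFrame instead of modifying df.
df3 = df.assign(Price_per_Unit=df["Revenue"] / df["Units_Sold"])
print(df3.head())
```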

Step 7: Missing values

python
import pandas as pd
import numpy as np

df = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [np.nan, 5, 6, 7]})
print("isnull:\n", df.isnull())
print("\nfillna(0):\n", df.fillna(0))
print("\ndropna:\n", df.dropna())
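
Filling with 0 or dropping every incomplete row are the bluntest options. As a sketch of two gentler ones on the same toy frame: count the gaps first, fill each column with its own mean, or drop rows only when a particular column is missing. Whether mean-filling is appropriate depends on your data; it is shown here only as one common choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [np.nan, 5, 6, 7]})

# How many values are missing in each column?
missing = df.isnull().sum()
print(missing)

# Fill each column's gaps with that column's mean.
filled = df.fillna(df.mean())
print(filled)

# Drop a row only if "x" is missing; a missing "y" is tolerated.
only_x = df.dropna(subset=["x"])
print(only_x)
```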

Step 8: GroupBy & aggregation

The big payoff: one line answers the principal's question.

python
import pandas as pd
df = pd.read_csv("sales.csv")

by_city = df.groupby("City")["Revenue"].sum().sort_values(ascending=False)
print("Total revenue by city:\n", by_city)

print("\nUnits sold by product:\n", df.groupby("Product")["Units_Sold"].sum())
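
When you want several summaries per group at once, agg with named aggregations keeps it to a single call. A minimal sketch on the preview rows (rebuilt in memory, assumed to match sales.csv); the output column names total_revenue, total_units and months are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "City":       ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month":      ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue":    [2400, 3000, 1800, 4000, 1900],
})

# Each keyword is an output column: (input column, aggregation).
summary = (
    df.groupby("City")
      .agg(
          total_revenue=("Revenue", "sum"),
          total_units=("Units_Sold", "sum"),
          months=("Month", "nunique"),   # distinct months with sales
      )
      .sort_values("total_revenue", ascending=False)
)
print(summary)
```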

🧠 Quick Check

Q1. Which method picks rows by integer position?
Q2. How do you keep only rows where age > 25?
Q3. df.groupby('city')['sales'].sum() returns…