
🐼 Part 2 — Pandas Basics

🎬 Series vs DataFrame

A Series is a one-dimensional labelled array (think: one column). A DataFrame is a two-dimensional table of rows and columns, in which each named column is a Series. The DataFrame is the workhorse of data analysis in Python.
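To make the distinction concrete, here is a minimal sketch. The city and sales values are only illustrative, and the variable names are our own.

```python
import pandas as pd

# A Series: one labelled column of values (here the labels are city names).
sales = pd.Series([120, 90, 200], index=["Mumbai", "Delhi", "Bangalore"], name="Sales")

# A DataFrame: several named columns that share one row index.
df = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Bangalore"],
    "Sales": [120, 90, 200],
})

print(type(sales).__name__)        # the standalone object is a Series
print(type(df).__name__)           # the table is a DataFrame
print(type(df["Sales"]).__name__)  # one column pulled out of the table is a Series again
print(sales["Delhi"])              # look up a single value by its label
```

Pulling one column out of a DataFrame hands you back a Series, which is why the two types share so many methods.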

Step 1 — Build a DataFrame from a dict

python

import pandas as pd

df = pd.DataFrame({
    "City":   ["Mumbai", "Delhi", "Bangalore"],
    "Sales":  [120, 90, 200],
    "Region": ["West", "North", "South"],
})
print(df)

Step 2 — Read a CSV

We've mounted a small sales.csv file into the runtime; the table below previews its rows. Run the code to load it.

City	Month	Product	Units_Sold	Revenue
Mumbai	Jan	Notebook	120	2400
Mumbai	Feb	Notebook	150	3000
Delhi	Jan	Notebook	90	1800
Bangalore	Jan	Notebook	200	4000
Chennai	Feb	Notebook	95	1900

python

import pandas as pd

df = pd.read_csv("sales.csv")
print(df.head())
print("\nshape:", df.shape)
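If you are following along outside this runtime and don't have sales.csv, one possible workaround is to read the same rows from an in-memory string. This sketch assumes the file is comma-separated and contains exactly the rows previewed above; the real file may differ.

```python
import io
import pandas as pd

# Assumption: comma-separated text with the same rows as the preview table.
csv_text = """City,Month,Product,Units_Sold,Revenue
Mumbai,Jan,Notebook,120,2400
Mumbai,Feb,Notebook,150,3000
Delhi,Jan,Notebook,90,1800
Bangalore,Jan,Notebook,200,4000
Chennai,Feb,Notebook,95,1900
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print("\nshape:", df.shape)  # (number of rows, number of columns)
```

To keep a copy on disk for the later steps, `df.to_csv("sales.csv", index=False)` writes it back out.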

Step 3 — Explore

python

import pandas as pd
df = pd.read_csv("sales.csv")

print("--- info ---")
df.info()
print("\n--- describe ---")
print(df.describe())

Step 4 — Select columns & rows

python

import pandas as pd
df = pd.read_csv("sales.csv")

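# A single column comes back as a Series; a list of columns gives a DataFrame.
print("one column :\n", df["City"].head(), "\n")
print("two cols   :\n", df[["City", "Revenue"]].head(), "\n")
# loc slices by label and includes the end label: rows 0, 1 and 2.
print("loc by label:\n", df.loc[0:2, ["City", "Revenue"]], "\n")
# iloc slices by position and excludes the end: rows 0 and 1, columns 0 to 2.
print("iloc by pos :\n", df.iloc[0:2, 0:3])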
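The difference between loc and iloc is easy to miss while the index is just 0, 1, 2, ... The sketch below uses a small hand-built frame (values borrowed from the sales table) to show that a loc slice includes its end label while an iloc slice stops before its end position. It then sets City as the index, so that labels and positions no longer coincide.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Revenue": [2400, 1800, 4000, 1900],
})

by_label = df.loc[0:2, ["City", "Revenue"]]  # labels 0, 1, 2: end label included
by_pos = df.iloc[0:2, 0:2]                   # positions 0, 1: end position excluded
print(len(by_label), "rows from loc,", len(by_pos), "rows from iloc")

# With city names as the index, loc needs a name and iloc still needs a number.
indexed = df.set_index("City")
print(indexed.loc["Delhi", "Revenue"])  # row labelled "Delhi"
print(indexed.iloc[1, 0])               # second row, first column: the same cell
```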

Step 5 — Filter rows

python

import pandas as pd
df = pd.read_csv("sales.csv")

big = df[df["Revenue"] > 2000]
print(big)
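Under the hood, the comparison builds a Series of True/False values (a boolean mask), and indexing with it keeps the True rows. Here is a sketch of that idea, plus two common variations. The frame is rebuilt by hand from the sales table so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month": ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Revenue": [2400, 3000, 1800, 4000, 1900],
})

# The comparison alone yields one True/False per row.
mask = df["Revenue"] > 2000
print(mask.tolist())

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
jan_big = df[(df["Revenue"] > 2000) & (df["Month"] == "Jan")]
print(jan_big)

# isin() tests membership in a list of allowed values.
south = df[df["City"].isin(["Bangalore", "Chennai"])]
print(south)
```

Python's plain `and` / `or` don't work element-wise on a Series, which is why the `&` and `|` operators are used here.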

Step 6 — Add / drop columns

python

import pandas as pd
df = pd.read_csv("sales.csv")

df["Price_per_Unit"] = df["Revenue"] / df["Units_Sold"]
print(df.head())

df2 = df.drop(columns=["Month"])
print("\nafter drop:\n", df2.head())
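One detail worth noticing: drop() returns a new DataFrame and leaves the original untouched, whereas `df["new"] = ...` changes df in place. If you'd rather not modify the original at all, assign() is one alternative. Here is a minimal sketch on a two-row frame, with values taken from the sales table.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Mumbai", "Delhi"],
    "Month": ["Jan", "Jan"],
    "Units_Sold": [120, 90],
    "Revenue": [2400, 1800],
})

# assign() returns a copy with the extra column; df itself is not changed.
with_price = df.assign(Price_per_Unit=df["Revenue"] / df["Units_Sold"])

# drop() also returns a copy; "Month" is still present in df afterwards.
without_month = df.drop(columns=["Month"])

print(list(df.columns))
print(list(with_price.columns))
print(list(without_month.columns))
```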

Step 7 — Missing values

python

import pandas as pd
import numpy as np

df = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [np.nan, 5, 6, 7]})
print("isnull:\n", df.isnull())
print("\nfillna(0):\n", df.fillna(0))
print("\ndropna:\n", df.dropna())
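Filling with 0 or dropping every incomplete row are the bluntest options. As a sketch of some gentler variations on the same toy frame, you can count the gaps per column, fill each gap with that column's mean, or drop rows only when a particular column is missing. Which strategy is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [np.nan, 5, 6, 7]})

# True counts as 1, so summing the isnull() mask counts missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# df.mean() skips NaN by default; fillna then fills each column with its own mean.
filled_with_mean = df.fillna(df.mean())
print(filled_with_mean)

# Drop a row only if "x" is missing; a missing "y" is tolerated.
only_x_required = df.dropna(subset=["x"])
print(only_x_required)
```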

Step 8 — GroupBy & aggregation

The big payoff: one line answers the principal's question.

python

import pandas as pd
df = pd.read_csv("sales.csv")

by_city = df.groupby("City")["Revenue"].sum().sort_values(ascending=False)
print("Total revenue by city:\n", by_city)

print("\nUnits sold by product:\n", df.groupby("Product")["Units_Sold"].sum())
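When you want several summaries per group at once, named aggregation via agg() is one way to get them in a single table. The sketch below rebuilds the sales rows by hand so it runs without the CSV. The output column names (total_revenue, total_units, n_rows) are our own choices.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Month": ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue": [2400, 3000, 1800, 4000, 1900],
})

# Each keyword is an output column: (source column, aggregation function).
summary = (
    df.groupby("City")
      .agg(
          total_revenue=("Revenue", "sum"),
          total_units=("Units_Sold", "sum"),
          n_rows=("Month", "count"),
      )
      .sort_values("total_revenue", ascending=False)
)
print(summary)
```

The result is a DataFrame indexed by City, so `summary.loc["Mumbai"]` pulls out one city's figures.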

🧠 Quick Check

Q1. Which method picks rows by integer position?

Q2. How do you keep only rows where age > 25?

Q3. df.groupby('city')['sales'].sum() returns…
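If you'd like to check your answers by running code, here is one possible sketch. The frame is hypothetical: only the column names age, city and sales come from the questions, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names are taken from the questions.
people = pd.DataFrame({
    "age": [22, 31, 27, 19],
    "city": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "sales": [10, 20, 30, 40],
})

first_two = people.iloc[0:2]                      # Q1: select rows by integer position
over_25 = people[people["age"] > 25]              # Q2: keep rows where the mask is True
per_city = people.groupby("city")["sales"].sum()  # Q3: one total per city

print(first_two)
print(over_25)
print(type(per_city).__name__)
print(per_city)
```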
