Part 2: Pandas Basics
Series vs DataFrame
A Series is a 1D labelled array (think: one column). A DataFrame is a 2D table (rows × columns) with named columns. It's the workhorse of data analysis in Python.
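A minimal sketch of the distinction, using made-up values: selecting one column with single brackets gives back a Series, while selecting a list of columns with double brackets gives back a DataFrame.

```python
import pandas as pd

cities = ["Mumbai", "Delhi", "Bangalore"]

# A Series: one labelled column of values (the numbers are illustrative).
sales = pd.Series([120, 90, 200], index=cities, name="Sales")

# A DataFrame: several named columns sharing one row index.
df = pd.DataFrame(
    {"Sales": [120, 90, 200], "Region": ["West", "North", "South"]},
    index=cities,
)

one_column = df["Sales"]     # single brackets -> Series
sub_table = df[["Sales"]]    # double brackets -> DataFrame

print(type(one_column).__name__, one_column.shape)
print(type(sub_table).__name__, sub_table.shape)
```

The shapes make the difference visible: the Series has one dimension, the DataFrame has two even when it holds a single column.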
Step 1: Build a DataFrame from a dict
python
import pandas as pd
df = pd.DataFrame({
"City": ["Mumbai", "Delhi", "Bangalore"],
"Sales": [120, 90, 200],
"Region": ["West", "North", "South"],
})
print(df)
Step 2: Read a CSV
We've mounted a small sales.csv file into the runtime. Press Run.
| City | Month | Product | Units_Sold | Revenue |
|---|---|---|---|---|
| Mumbai | Jan | Notebook | 120 | 2400 |
| Mumbai | Feb | Notebook | 150 | 3000 |
| Delhi | Jan | Notebook | 90 | 1800 |
| Bangalore | Jan | Notebook | 200 | 4000 |
| Chennai | Feb | Notebook | 95 | 1900 |
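To follow along outside this runtime, the preview rows above can be loaded from an in-memory string instead of a file. This is only a sketch: it assumes the table shows the file's column names exactly, and the real sales.csv may hold more rows than the five previewed here.

```python
import io
import pandas as pd

# The five preview rows as CSV text (assumed to mirror the start of sales.csv).
csv_text = """City,Month,Product,Units_Sold,Revenue
Mumbai,Jan,Notebook,120,2400
Mumbai,Feb,Notebook,150,3000
Delhi,Jan,Notebook,90,1800
Bangalore,Jan,Notebook,200,4000
Chennai,Feb,Notebook,95,1900
"""

# read_csv accepts a file-like object as well as a path on disk.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.dtypes)
```

Calling `df.to_csv("sales.csv", index=False)` afterwards would write a local copy, so the later steps could run unchanged on your own machine.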
python
import pandas as pd
df = pd.read_csv("sales.csv")
print(df.head())
print("\nshape:", df.shape)
Step 3: Explore
python
import pandas as pd
df = pd.read_csv("sales.csv")
print("--- info ---")
df.info()
print("\n--- describe ---")
print(df.describe())
Step 4: Select columns & rows
python
import pandas as pd
df = pd.read_csv("sales.csv")
print("one column :\n", df["City"].head(), "\n")
print("two cols :\n", df[["City", "Revenue"]].head(), "\n")
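# loc slices by label and includes the end label: 0:2 returns rows 0, 1 and 2
print("loc by label:\n", df.loc[0:2, ["City", "Revenue"]], "\n")
# iloc slices by position and excludes the end: 0:2 returns rows 0 and 1
print("iloc by pos :\n", df.iloc[0:2, 0:3])
Step 5: Filter rows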
python
import pandas as pd
df = pd.read_csv("sales.csv")
big = df[df["Revenue"] > 2000]
print(big)
Step 6: Add / drop columns
python
import pandas as pd
df = pd.read_csv("sales.csv")
df["Price_per_Unit"] = df["Revenue"] / df["Units_Sold"]
print(df.head())
df2 = df.drop(columns=["Month"])
print("\nafter drop:\n", df2.head())
Step 7: Missing values
python
import pandas as pd
import numpy as np
df = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [np.nan, 5, 6, 7]})
print("isnull:\n", df.isnull())
print("\nfillna(0):\n", df.fillna(0))
print("\ndropna:\n", df.dropna())
Step 8: GroupBy & aggregation
The big payoff: one line answers the principal's question.
python
import pandas as pd
df = pd.read_csv("sales.csv")
by_city = df.groupby("City")["Revenue"].sum().sort_values(ascending=False)
print("Total revenue by city:\n", by_city)
print("\nUnits sold by product:\n", df.groupby("Product")["Units_Sold"].sum())
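When one statistic per group is not enough, `agg` with named aggregations can compute several at once. The sketch below uses rows modelled on the sales.csv preview rather than the file itself, and the output column names `total_revenue` and `avg_units` are chosen here purely for illustration.

```python
import pandas as pd

# Illustrative rows modelled on the sales.csv preview; the real file may differ.
df = pd.DataFrame({
    "City": ["Mumbai", "Mumbai", "Delhi", "Bangalore", "Chennai"],
    "Units_Sold": [120, 150, 90, 200, 95],
    "Revenue": [2400, 3000, 1800, 4000, 1900],
})

# Each keyword names an output column: (source column, aggregation function).
summary = (
    df.groupby("City")
      .agg(total_revenue=("Revenue", "sum"), avg_units=("Units_Sold", "mean"))
      .sort_values("total_revenue", ascending=False)
)

print(summary)
```

The result is a DataFrame indexed by city, so a single value can be read with `summary.loc["Mumbai", "total_revenue"]`.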
Quick Check
Q1. Which method picks rows by integer position?
Q2. How do you keep only rows where age > 25?
Q3. df.groupby('city')['sales'].sum() returns…
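One way to check your answers is to try each operation on a tiny table. The data below is hypothetical; only the column names `age`, `city` and `sales` come from the questions.

```python
import pandas as pd

# Hypothetical data for experimenting with the three questions.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "age": [22, 31, 27, 19],
    "sales": [10, 20, 30, 40],
})

first_two = df.iloc[0:2]                      # Q1: rows at integer positions 0 and 1
over_25 = df[df["age"] > 25]                  # Q2: boolean mask keeps matching rows
totals = df.groupby("city")["sales"].sum()    # Q3: inspect the type of the result

print(first_two)
print(over_25)
print(type(totals).__name__)
print(totals)
```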