This document records what I learned and the assignment work I completed using the Pandas library in Python. It covers DataFrame creation, data selection, manipulation, aggregation, and input/output operations, with explanations and code examples for each concept and technique. ("Pandas is for analyzing, cleaning, exploring, and manipulating data.")
To work with Pandas, import it under the alias pd, and import NumPy under the alias np for numerical operations. The randn function from numpy.random is imported separately to generate random sample data.
import pandas as pd
import numpy as np
from numpy.random import randn
Create a DataFrame of random values (randn draws from the standard normal distribution) with 5 rows labeled A to E and 4 columns labeled W to Z.
df = pd.DataFrame(randn(5, 4), index='A B C D E'.split(), columns='W X Y Z'.split())
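Because randn draws new values on every run, the numbers in the DataFrame change each time the script is executed. The following is a minimal sketch of one way to make the data reproducible. It assumes NumPy's default_rng generator and a seed of 101; both the seed value and the name df_fixed are arbitrary choices for illustration and are not part of the original assignment.

```python
import numpy as np
import pandas as pd

# Assumption: the seed 101 is arbitrary; any fixed integer gives repeatable data.
rng = np.random.default_rng(101)

# standard_normal samples the same distribution that randn does.
df_fixed = pd.DataFrame(
    rng.standard_normal((5, 4)),
    index="A B C D E".split(),
    columns="W X Y Z".split(),
)

print(df_fixed.shape)          # (5, 4)
print(list(df_fixed.index))    # ['A', 'B', 'C', 'D', 'E']
print(list(df_fixed.columns))  # ['W', 'X', 'Y', 'Z']
```

Re-creating the generator with the same seed and repeating the call should produce an identical DataFrame, which makes results easier to compare between runs.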
Access columns using bracket notation or dot notation.
df['W'] # Select column W
df[['W', 'Z']] # Select columns W and Z
df.W # Dot notation for column W
df['W']['A'] # Access specific value (column W, row A)
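The selection styles above differ in what they return and in when they are safe to use. The sketch below uses a small made-up frame called demo (hypothetical data, not from the assignment) with one column deliberately named count to show where dot notation can go wrong.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the values are made up for illustration.
demo = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["A", "B", "C"],
    columns=["W", "count"],
)

single = demo["W"]    # one label returns a Series
multi = demo[["W"]]   # a list of labels returns a DataFrame, even for one column
print(type(single).__name__)  # Series
print(type(multi).__name__)   # DataFrame

# Dot notation only reaches a column whose name does not clash with an
# existing attribute. "count" is also a DataFrame method, so demo.count
# is the method, and the column must be selected with brackets.
print(callable(demo.count))     # True
print(demo["count"].tolist())   # [1, 3, 5]

# demo['W']['A'] is chained indexing (two lookups); demo.loc['A', 'W']
# reads the same cell in a single step.
print(demo["W"]["A"] == demo.loc["A", "W"])  # True
```

Bracket notation works for any column name, so it is the safer default; loc with a row and column label is generally preferred over chained brackets, especially when assigning values.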
Use loc for label-based indexing and iloc for integer position-based indexing.
df.loc['A'] # Select row A
df.iloc[2] # Select the third row
df.iloc[:, 0:3] # Select the first three columns (positions 0 to 2) for all rows
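One difference between loc and iloc that is easy to miss is how slices treat their end point. The sketch below uses a hypothetical frame named grid, filled with the integers 0 to 19 so that each value can be checked by hand; the name and the data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with predictable values (0..19) instead of random data.
grid = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index="A B C D E".split(),
    columns="W X Y Z".split(),
)

by_label = grid.loc["A":"C", "W":"Y"]   # label slices include both end points
by_position = grid.iloc[0:3, 0:3]       # position slices exclude the stop value

print(by_label.shape)                   # (3, 3)
print(by_label.equals(by_position))     # True

# A single cell can be read with a row and column given together.
print(grid.loc["B", "Y"])   # 6
print(grid.iloc[1, 2])      # 6
```

Here the label slice 'A':'C' and the position slice 0:3 select the same three rows, because loc includes the final label while iloc stops before position 3.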