Regression Definition

Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

Linear Equation

Linear Regression

# Simple Linear Regression Implementation
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('Salary_Data.csv')
X = df.iloc[:, :-1].values
y = df.iloc[:, 1].values

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predictions
y_pred = regressor.predict(X_test)

# Visualization
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Multiple Regression

Uses two or more independent variables to predict a dependent variable

General form: y = a₁x₁ + a₂x₂ + a₃x₃ + ... + aₙxₙ + b

Cost Function

Measures how well a machine learning model performs by quantifying the difference between predicted and actual outputs

Goal is to minimize this function by adjusting model parameters

For linear regression h(X) = θ₀ + θ₁X, the cost function is:

J(θ₀, θ₁) = 1/2m * Σ(h(xⁱ) - yⁱ)²

Where: