Regression is a statistical method used in finance, investing, and other disciplines that attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
y = ax + b
where y is the predicted output, x is the input variable, and a and b are constants that define the line: a is the slope and b is the intercept.
# Simple Linear Regression Implementation
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load dataset
df = pd.read_csv('Salary_Data.csv')
X = df.iloc[:, :-1].values  # every column except the last: the feature(s)
y = df.iloc[:, -1].values   # last column: the target (salary)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# Train model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predictions
y_pred = regressor.predict(X_test)
# Visualization
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
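Once a model like the one above is fitted, the constants a and b from y = ax + b can be read back from the regressor. The sketch below is only an illustration: it uses a small made-up, noise-free dataset instead of Salary_Data.csv so that it runs on its own, and the names `years` and `salary` are assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from salary = 5000 * years + 30000 (no noise)
years = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = 5000.0 * years.ravel() + 30000.0

model = LinearRegression()
model.fit(years, salary)

a = model.coef_[0]               # slope of the fitted line
b = model.intercept_             # intercept of the fitted line
r2 = model.score(years, salary)  # R^2 on the data the model was fitted on

print(f"a = {a:.1f}, b = {b:.1f}, R^2 = {r2:.3f}")
```

Because the synthetic data lies exactly on a line, the recovered slope and intercept match the values used to generate it. With real, noisy data they would only be estimates.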
Multiple linear regression uses two or more independent variables to predict a dependent variable.
General form: y = a₁x₁ + a₂x₂ + a₃x₃ + ... + aₙxₙ + b
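As a minimal sketch of the general form, the same `LinearRegression` class can be fitted on a feature matrix with more than one column. The data here is hypothetical, generated from y = 2x₁ + 3x₂ + 1 with no noise, so the fitted coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: each row is one observation with two features (x1, x2)
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
])
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0

model = LinearRegression()
model.fit(X, y)

coefficients = model.coef_    # one coefficient per feature: [a1, a2]
intercept = model.intercept_  # the constant term b

# Predict for a new observation with x1 = 6, x2 = 5
prediction = model.predict(np.array([[6.0, 5.0]]))[0]
print(coefficients, intercept, prediction)
```

Each entry of `coef_` corresponds to one aᵢ in the general form, and `intercept_` corresponds to b.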
A cost function measures how well a machine learning model performs by quantifying the difference between predicted and actual outputs.
The goal is to minimize this function by adjusting the model parameters.
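For linear regression with hypothesis h(x) = θ₀ + θ₁x, the cost function is:
J(θ₀, θ₁) = (1 / (2m)) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Where:
m is the number of training examples
x⁽ⁱ⁾ and y⁽ⁱ⁾ are the input and actual output of the i-th training example, and the sum runs over i = 1, ..., m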
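The cost function can be sketched directly in NumPy. The example below computes J for a hypothesis of the form h(x) = θ₀ + θ₁x and then minimizes it with batch gradient descent. Gradient descent is an assumption here, one common way to adjust the parameters, since the text does not name a specific optimizer. The data, the function names, the learning rate, and the step count are likewise illustrative choices.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Half the mean squared error: sum of squared errors divided by 2m."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

def gradient_descent(x, y, learning_rate=0.05, steps=5000):
    """Repeatedly move both parameters against the gradient of the cost."""
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(steps):
        errors = hypothesis(theta0, theta1, x) - y
        grad0 = np.sum(errors) / m      # partial derivative of J w.r.t. theta0
        grad1 = np.sum(errors * x) / m  # partial derivative of J w.r.t. theta1
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Hypothetical noise-free data generated from y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

cost_at_zero = cost(0.0, 0.0, x, y)   # 20.5: a poor fit gives a high cost
cost_at_truth = cost(1.0, 2.0, x, y)  # 0.0: the generating parameters fit exactly
theta0, theta1 = gradient_descent(x, y)

print(cost_at_zero, cost_at_truth, theta0, theta1)
```

On this data the parameters found by gradient descent approach θ₀ = 1 and θ₁ = 2, the values that drive the cost to zero. On noisy data the minimum cost would be positive, and the learning rate would typically need tuning.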