A standard loss function for logistic-style regression, with labels y \in \{-1, +1\}, is the logistic loss:
L(w) = \log(1 + e^{-y \, w^T x})
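As a quick illustration of this formula, here is a minimal NumPy sketch; the function name logistic_loss and the toy data are illustrative assumptions, not from the text:

```python
import numpy as np

def logistic_loss(w, X, y):
    # Mean logistic loss: (1/m) * sum_i log(1 + exp(-y_i * w^T x_i)).
    # X has shape (m, d); y holds labels in {-1, +1}.
    # np.logaddexp(0, z) computes log(1 + e^z) without overflow.
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

# Toy usage: two 2-D points, one per class.
X = np.array([[1.0, 2.0], [-1.0, -0.5]])
y = np.array([1.0, -1.0])
print(logistic_loss(np.zeros(2), X, y))  # log(2) ~ 0.693 at w = 0
```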
Linear regression fits the line (more generally, the hyperplane) that best predicts the targets from the inputs.
It assumes the targets follow a Gaussian (normal) distribution around that line.
The model predicts:
y = w^T x + b
and the errors (residuals) are modeled as Gaussian noise:
\epsilon \sim \mathcal{N}(0, \sigma^2)
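To make the generative assumption concrete, here is a minimal NumPy sketch that simulates data from exactly this model; the dimensions, seed, and parameter values are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 100, 3                        # samples and features (arbitrary)
w_true = rng.normal(size=d)          # ground-truth weights
b_true = 0.5                         # ground-truth bias

X = rng.normal(size=(m, d))          # inputs x_i
eps = rng.normal(0.0, 1.0, size=m)   # epsilon ~ N(0, sigma^2) with sigma = 1
y = X @ w_true + b_true + eps        # y = w^T x + b + epsilon
```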
Hence, absorbing the bias b into w by appending a constant-1 feature to x, the likelihood of observing y given x is:
p(y|x; w, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - w^T x)^2}{2\sigma^2}}
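Taking the negative log-likelihood of m i.i.d. training samples makes the next step explicit (a standard derivation, sketched here for completeness):
-\log \prod_{i=1}^{m} p(y_i|x_i; w, \sigma^2) = \frac{m}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - w^T x_i)^2
The first term and the factor \frac{1}{2\sigma^2} are constant in w, so minimizing the negative log-likelihood over w is equivalent to minimizing the sum of squared residuals.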
Maximizing this likelihood leads directly to minimizing the Mean Squared Error (MSE):
L(w) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - w^T x_i)^2
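To close the loop, here is a minimal sketch that minimizes this MSE with ordinary least squares (np.linalg.lstsq), repeating the simulation setup from the earlier snippet so the block stays self-contained; the bias is absorbed via a constant-1 feature as described above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 3
w_true, b_true = rng.normal(size=d), 0.5
X = rng.normal(size=(m, d))
y = X @ w_true + b_true + rng.normal(0.0, 1.0, size=m)

def mse(w, X, y):
    # L(w) = (1 / (2m)) * sum_i (y_i - w^T x_i)^2
    return 0.5 * np.mean((y - X @ w) ** 2)

# Absorb the bias into w by appending a constant-1 feature to each x_i.
X_aug = np.hstack([X, np.ones((m, 1))])

# Ordinary least squares minimizes the MSE; by the derivation above,
# this is also the maximum-likelihood estimate under Gaussian noise.
w_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print("recovered w:", w_hat[:-1])
print("recovered b:", w_hat[-1])
print("MSE at w_hat:", mse(w_hat, X_aug, y))
```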