Basic Concept[ML]

Through algorithms, machines can learn rules from large amounts of data and make decisions on new samples.


Four Elements in Machine Learning

  • Data
  • Model
  • Learning Rule
  • Optimization Algorithm

Learning Rules

A good model should be consistent with the real mapping function for all input-output pairs:

$$|f(x, \theta) - y| \le \epsilon, \quad \forall (x, y) \in \mathcal{X} \times \mathcal{Y}$$

Loss Function

A loss function is a non-negative real-valued function used to quantify the difference between the model's prediction and the true label.

For example, the quadratic loss function:

$$L(y, f(x;\theta)) = \frac{1}{2}\big(y - f(x;\theta)\big)^2$$
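As a minimal sketch in plain Python (the values are illustrative), the quadratic loss can be computed as:

```python
# Quadratic loss: L(y, f(x; theta)) = 1/2 * (y - f(x; theta))^2
def quadratic_loss(y, y_pred):
    # Non-negative; zero only when the prediction matches the label exactly.
    return 0.5 * (y - y_pred) ** 2

print(quadratic_loss(3.0, 2.0))  # -> 0.5
print(quadratic_loss(1.0, 1.0))  # -> 0.0
```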

Empirical risk minimization

After selecting an appropriate loss function, we look for a parameter $\theta^*$ that minimizes the empirical risk:

$$\theta^* = \arg\min_{\theta} \hat{R}(\theta)$$

The ML problem is thus transformed into an optimization problem.

Expected Risk

Expected risk (true risk):

$$R(\theta) = \mathbb{E}_{(x,y)\sim p_r(x,y)}\big[L(y, f(x;\theta))\big]$$

  • $p_r(x,y)$: the real data distribution

Empirical Risk

The expected risk is unknown, so it is approximated by the empirical risk:

$$\hat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i;\theta)\big)$$
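A small sketch of this average, assuming a toy model and quadratic loss (both hypothetical, not from the notes):

```python
# Empirical risk: the average loss over the n observed samples,
# used to approximate the unknown expected risk.
def empirical_risk(xs, ys, f, loss):
    return sum(loss(y, f(x)) for x, y in zip(xs, ys)) / len(xs)

quadratic = lambda y, y_pred: 0.5 * (y - y_pred) ** 2
f = lambda x: 2 * x  # hypothetical model f(x; theta) with theta = 2

print(empirical_risk([1.0, 2.0], [2.0, 5.0], f, quadratic))  # -> 0.25
```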

Stochastic Gradient Descent

SGD samples a single example in each iteration:

$$\theta_{t+1} = \theta_t - \alpha \frac{\partial L\big(y_n, f(x_n;\theta)\big)}{\partial \theta}, \quad n = 1, \dots, N$$
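The update rule above can be sketched for a toy 1-D linear model $f(x;\theta) = \theta x$ with quadratic loss (the model, data, and hyperparameters are illustrative assumptions):

```python
import random

# SGD sketch: the gradient of the quadratic loss for f(x; theta) = theta * x
# is dL/dtheta = -(y - theta * x) * x; one sample is drawn per iteration.
def sgd(xs, ys, theta=0.0, alpha=0.1, steps=200, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        n = rng.randrange(len(xs))               # pick one sample per iteration
        grad = -(ys[n] - theta * xs[n]) * xs[n]  # stochastic gradient
        theta -= alpha * grad                    # theta_{t+1} = theta_t - alpha * grad
    return theta

# Data generated from y = 3x, so theta should approach 3.
xs, ys = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]
print(round(sgd(xs, ys), 3))  # close to 3.0
```

Because each step uses only one sample, the per-iteration cost is constant in the dataset size, which is why SGD scales to large $N$.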

Generalization Error

Generalization error:

$$G_D(f) = R(f) - \hat{R}_D^{emp}(f)$$

Regularization

The principle of empirical risk minimization can easily lead to a low error rate on the training set but a high error rate on unseen data. Regularization mitigates this by adding a penalty term on the parameters to the empirical risk.
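A common form is L2 regularization; the sketch below adds a penalty $\lambda \theta^2$ to the empirical risk of the same toy 1-D linear model (the model, data, and $\lambda$ are illustrative assumptions, not from the notes):

```python
# L2-regularized empirical risk for a toy model f(x; theta) = theta * x.
# The penalty lam * theta^2 discourages large parameters, trading a bit of
# training-set fit for better behavior on unseen data.
def regularized_risk(theta, xs, ys, lam=0.01):
    emp = sum(0.5 * (y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return emp + lam * theta ** 2

# theta = 3 fits the data exactly, so only the penalty term remains.
print(round(regularized_risk(3.0, [1.0, 2.0], [3.0, 6.0]), 6))  # -> 0.09
```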