Basic Concept[ML]

Through algorithms, machines can learn rules from large amounts of data and make decisions on new samples.


Four Elements in Machine Learning

  • Data
  • Model
  • Learning Rule
  • Optimization Algorithm

Learning Rules

A good model should be consistent with the real mapping function for all input-output pairs:

$$|f(x, \theta) - y| \le \epsilon, \quad \forall (x, y) \in \mathcal{X} \times \mathcal{Y}$$

Loss Function

A loss function is a non-negative real-valued function used to quantify the difference between the model's prediction and the true label.

For example, the quadratic loss function:

$$L(y, f(x;\theta)) = \frac{1}{2}\big(y - f(x;\theta)\big)^2$$
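As a minimal sketch in plain Python (the values are illustrative), the quadratic loss can be computed as:

```python
# Quadratic loss: L(y, f(x; theta)) = 1/2 * (y - f(x; theta))^2
def quadratic_loss(y, y_pred):
    # Non-negative; zero only when the prediction matches the label exactly.
    return 0.5 * (y - y_pred) ** 2

print(quadratic_loss(3.0, 2.0))  # -> 0.5
print(quadratic_loss(1.0, 1.0))  # -> 0.0
```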

Empirical risk minimization

After selecting an appropriate loss function, we look for a parameter $\theta^*$ that minimizes the empirical risk:

$$\theta^* = \arg\min_{\theta} \hat{R}(\theta)$$

The ML problem is thus transformed into an optimization problem.

Expected Risk

Expected risk (true risk):

$$R(\theta) = \mathbb{E}_{(x,y)\sim p_r(x,y)}\big[L(y, f(x;\theta))\big]$$

  • $p_r(x,y)$: the real data distribution

Empirical Risk

The expected risk is unknown, so it is approximated by the empirical risk:

$$\hat{R}(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\big(y_i, f(x_i;\theta)\big)$$
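A small sketch of this average, assuming a toy model and quadratic loss (both hypothetical, not from the notes):

```python
# Empirical risk: the average loss over the n observed samples,
# used to approximate the unknown expected risk.
def empirical_risk(xs, ys, f, loss):
    return sum(loss(y, f(x)) for x, y in zip(xs, ys)) / len(xs)

quadratic = lambda y, y_pred: 0.5 * (y - y_pred) ** 2
f = lambda x: 2 * x  # hypothetical model f(x; theta) with theta = 2

print(empirical_risk([1.0, 2.0], [2.0, 5.0], f, quadratic))  # -> 0.25
```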

Stochastic Gradient Descent

SGD samples a single example in each iteration:

$$\theta_{t+1} = \theta_t - \alpha \frac{\partial L\big(y_n, f(x_n;\theta)\big)}{\partial \theta}, \quad n = 1, \dots, N$$
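The update rule above can be sketched for a toy 1-D linear model $f(x;\theta) = \theta x$ with quadratic loss (the model, data, and hyperparameters are illustrative assumptions):

```python
import random

# SGD sketch: the gradient of the quadratic loss for f(x; theta) = theta * x
# is dL/dtheta = -(y - theta * x) * x; one sample is drawn per iteration.
def sgd(xs, ys, theta=0.0, alpha=0.1, steps=200, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        n = rng.randrange(len(xs))               # pick one sample per iteration
        grad = -(ys[n] - theta * xs[n]) * xs[n]  # stochastic gradient
        theta -= alpha * grad                    # theta_{t+1} = theta_t - alpha * grad
    return theta

# Data generated from y = 3x, so theta should approach 3.
xs, ys = [1.0, 2.0, 3.0], [3.0, 6.0, 9.0]
print(round(sgd(xs, ys), 3))  # close to 3.0
```

Because each step uses only one sample, the per-iteration cost is constant in the dataset size, which is why SGD scales to large $N$.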

Generalization Error

Generalization error:

$$G_D(f) = R(f) - \hat{R}_D^{emp}(f)$$

Regularization

The principle of empirical risk minimization can easily lead to a low error rate on the training set but a high error rate on unseen data. Regularization mitigates this by adding a penalty term on the parameters to the empirical risk.
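A common form is L2 regularization; the sketch below adds a penalty $\lambda \theta^2$ to the empirical risk of the same toy 1-D linear model (the model, data, and $\lambda$ are illustrative assumptions, not from the notes):

```python
# L2-regularized empirical risk for a toy model f(x; theta) = theta * x.
# The penalty lam * theta^2 discourages large parameters, trading a bit of
# training-set fit for better behavior on unseen data.
def regularized_risk(theta, xs, ys, lam=0.01):
    emp = sum(0.5 * (y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return emp + lam * theta ** 2

# theta = 3 fits the data exactly, so only the penalty term remains.
print(round(regularized_risk(3.0, [1.0, 2.0], [3.0, 6.0]), 6))  # -> 0.09
```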