ch2_learning
Key terms
Y = f(x1, x2, x3)
- want to improve sales (Y) of a product
-> Y: output variable, dependent variable
- control advertising budgets: SNS (x1), streaming (x2), flyers (x3)
->x1, x2, x3 : input variables, independent variables, predictors
Key questions
1) What is the relationship between x1, x2, x3 and Y? -> learning
2) How accurately can we predict Y from x1, x2, x3? -> prediction
data --(learn)--> pattern, knowledge, principles (model)
data <--(apply)-- model
Formally,
collect data : observe Yi and Xi = (Xi1, ..., Xip) for i = 1, ..., n
assume that there is a relationship between Y and X's.
model the relationship f as Yi = f(Xi) + ei, where ei is a zero-mean random error
statistical learning : estimate (learn) f from data
Models are useful for
1) prediction : predict Y from (new or unseen) X
2) inference : understand the relationship between X and Y
Prediction
Once we have a good model, we can predict Y from new X.
y^ (prediction, estimate) = f^(X) (estimate of f, f itself is unknown!)
(The ^ symbol here is called a caret; y^ is read as "y hat".)
How accurate is the prediction?
Reducible vs irreducible errors
True relationship : Y = f(x) + e
We learn f from data and use it for prediction. Y^ = f^(X)
In general, f^ != f : this is the reducible error; it can be reduced by making f^ closer to f.
But even if we knew f exactly and predicted Y^ = f(X), there would still be irreducible error: the prediction still misses e.
Irreducible errors are caused by randomness and by variables outside X (our set of predictor variables).
Quantification of the error
- mean squared error(MSE)
Goal : estimate f so that reducible error is minimized
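As a short derivation sketch (treating f^ and X as fixed, so the only remaining randomness is the zero-mean error e), the expected squared error splits into the two parts above:

E[(Y - \hat{Y})^2] = E[(f(X) + \epsilon - \hat{f}(X))^2]
                   = \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}

(the cross term drops out because E[\epsilon] = 0)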
Inference
In prediction, f^ can be treated as a black box: we only care that its predictions are accurate.
But, for inference, we want to know the exact form of f.
Understand how Y changes as a function of X1, ..., Xp.
input (X) --> black box f --> output (Y)
Inference questions
-which predictors are associated with the response? (see the sketch after this list)
e.g. among X1, ..., Xp, which are relevant?
-what is the relationship between the response and each predictor?
e.g. does increasing x1 increase (or decrease) Y?
e.g. does increasing x1 increase (or decrease) Y when x2 is positive?
-Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?
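A minimal sketch of attacking the first two questions with a linear model on synthetic data (all names and numbers below are hypothetical; statsmodels is assumed to be available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: only x1 and x2 actually drive Y; x3 is irrelevant.
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()  # ordinary least squares
print(fit.params)   # estimated coefficients: sign and size suggest direction/strength of each effect
print(fit.pvalues)  # a small p-value suggests the predictor is associated with the response
```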
Some examples of Prediction vs. inference
prediction example : direct-marketing
- given 90,000 people with 400 different characteristics, want to predict how much money an individual will donate.
- should I send a mailing to a given individual?
(Don't care how you estimate Y)
x1, ..., xp: demographic data
Y: positive or negative response
Inference example : advertising
- which media contribute to sales?
- which media generate the biggest boost in sales?
- how much increase in sales is associated with a given increase in SNS advertising?
Inference example : Housing
How do we estimate f?
Given a set of training data {(x1, y1), (x2, y2), ..., (xn,yn)}, we want to estimate f.
Two types of approaches.
-parametric methods
-non-parametric methods
Parametric methods
-estimating f -> estimating a set of parameters
Step 1. make an assumption about the functional form, a model, of f.
e.g. a linear model: f(X) = β0 + β1·X1 + ... + βp·Xp
Step 2. Use the training data to fit the model.
e.g. estimate β0, β1, ..., βp of the linear model (using least squares)
linear model : only need to estimate p+1 coefficients! (a least-squares sketch follows below)
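A minimal sketch of Steps 1-2 for a linear model fit by least squares (numpy only; the synthetic data and coefficient values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: assume a linear functional form f(X) = b0 + b1*x1 + b2*x2.
# Step 2: estimate the p + 1 = 3 coefficients from training data by least squares.
n, p = 100, 2
X = rng.normal(size=(n, p))                       # predictors
true_beta = np.array([1.0, 2.0, -0.5])            # hypothetical b0, b1, b2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.3, size=n)  # Y = f(X) + e

X_design = np.column_stack([np.ones(n), X])       # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # least-squares estimates
print(beta_hat)  # should be close to the true b0, b1, b2
```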
non-parametric methods
-do not make explicit assumptions about the functional form of f.
-advantage : flexibility! can fit a much wider range of shapes of f.
-disadvantage: harder to learn; typically requires much more data (see the k-nearest-neighbors sketch below).
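As one concrete non-parametric sketch (k-nearest-neighbors regression, used here purely for illustration on made-up 1-D data): f is estimated locally from nearby training points, without assuming any functional form.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: y = f(x) + e with a nonlinear f we never write down.
x_train = rng.uniform(0, 10, size=200)
y_train = np.sin(x_train) + rng.normal(scale=0.2, size=200)

def knn_predict(x_new, x_train, y_train, k=5):
    """Estimate f(x_new) as the average y of the k nearest training points."""
    nearest = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[nearest].mean()

print(knn_predict(2.0, x_train, y_train))  # should be roughly sin(2.0) ~ 0.91
```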
parametric vs. non-parametric models
-There are always parameters.
-Parametric models: parameters are explicitly estimated. (e.g. linear regression)
-Non-parametric models:
-I choose a family of models, but I don't have direct control over the parameters.
-Non-parametric models actually end up with far more parameters!
The more flexible, the better?
Q. Why would we ever choose a more restrictive method instead of a very flexible approach?
Trade-off : flexibility vs. interpretability
[Figure: the trade-off between flexibility and interpretability for various statistical learning methods. In general, as flexibility increases, interpretability decreases.]
Simple models are easier to interpret!
Back to linear regression.
Y^i = β0 + β1·Xi1 + β2·Xi2 + ... + βp·Xip
βj : the average change in Y for a one-unit increase in Xj, holding all other variables constant. (e.g. if β1 = 2, increasing X1 by one unit while fixing the other predictors raises the predicted Y by 2 on average.)
Overfitting
An overly flexible model -> poor estimation: it starts fitting the random error e, not just f.
Y = f(X) + e (random error)
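A small sketch of the effect, assuming polynomial fits of increasing flexibility on made-up data (numpy only; the exact numbers will vary by seed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: Y = f(X) + e with f(x) = sin(x).
x = rng.uniform(0, 6, size=40)
y = np.sin(x) + rng.normal(scale=0.3, size=40)
x_train, y_train = x[:30], y[:30]
x_test,  y_test  = x[30:], y[30:]

for degree in (1, 4, 15):  # increasing flexibility
    coeffs = np.polyfit(x_train, y_train, degree)   # may warn for the high-degree fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse  = np.mean((np.polyval(coeffs, x_test)  - y_test) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))

# Training MSE keeps shrinking as the degree grows, but test MSE typically
# gets worse for the very flexible fit: it has started modeling the noise e.
```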
Summary
learned key concepts of supervised learning
learning : learn f from (training) data
prediction vs. inference
reducible vs. irreducible errors
parametric vs. non-parametric methods for learning
flexibility vs. interpretability
overfitting
References:
- "Finding the optimal model (subtitle: understanding the bias and variance problem)" (medium.com)
- "Reducible and Irreducible Errors" (https://senthilkumarbala.medium.com/reducible-and-irreducible-errors-663eadace3a3)
- "2.4. Linear model vs non-Linear model" (wikidocs.net)
- "Parametric model vs. non-parametric model" (https://process-mining.tistory.com/131)