ch2_learning

9taetae9 2024. 3. 12. 14:31

Key terms

Y = f(x1, x2, x3)

- want to improve sales (Y) of a product

-> Y: output variable, dependent variable

- control advertising budgets: SNS (x1), streaming (x2), flyer (x3)

-> x1, x2, x3: input variables, independent variables, predictors

 

Key questions

1) What is the relationship between x1, x2, x3 and Y? -> learning

2) How accurately can we predict Y from x1, x2, x3? -> prediction

 

data --(learn)--> pattern, knowledge, principles (model)

data <--(apply)-- model

 

 

Formally, 

Collect data: observe Yi and Xi = (Xi1, ..., Xip) for i = 1, ..., n.

Assume that there is a relationship between Y and the X's.

Model the relationship f as Yi = f(Xi) + ei, where ei is a zero-mean random error.

Statistical learning: estimate (learn) f from the data.
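As a minimal sketch of this setup (the specific f and error scale below are arbitrary choices for illustration, not anything from these notes), we can simulate data of the form Yi = f(Xi) + ei:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a hypothetical "true" relationship; in practice f is unknown
    return 2.0 + 3.0 * x

n = 100
x = rng.uniform(0, 1, size=n)        # observed predictors X_i
e = rng.normal(0.0, 0.5, size=n)     # zero-mean random errors e_i
y = f(x) + e                         # observed responses Y_i = f(X_i) + e_i

# statistical learning: estimate (learn) f from (x, y) without using f itself
```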

 

 

Models are useful for

1) prediction : predict Y from (new or unseen) X

2) inference : understand the relationship between X and Y

 

Prediction

Once we have a good model, we can predict Y from new X.

y^ (the prediction/estimate) = f^(X) (an estimate of f; f itself is unknown!)

(In this formula, the symbol ^ is called a "caret"; y^ is read as "y hat".)

 

How accurate is the prediction?

 

Reducible vs irreducible errors 

True relationship : Y = f(x) + e

 

We learn f from data and use it for prediction. Y^ = f^(X)

 

In general, f^ != f: this gap is the reducible error, which can potentially be shrunk by pushing f^ toward f.

But even if we knew f exactly, there would still be irreducible error: Y^ = f(X) is still missing e.

Irreducible errors are the errors caused by variables beyond the realm of X (our set of predictor variables).
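This split can be written out explicitly. Treating X and the fitted f^ as fixed, and using only that e has zero mean (a standard textbook decomposition, stated here for completeness):

E[(Y - Y^)²] = [f(X) - f^(X)]² + Var(e)
             = reducible error  + irreducible error

The first term can be driven toward zero by improving f^; the second term, Var(e), is a floor below which the expected prediction error cannot go.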

 

Quantification of the error

- mean squared error (MSE): MSE = (1/n) Σi (Yi - f^(Xi))²

Goal: estimate f so that the reducible error is minimized.
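As a rough sketch of measuring this (re-using the simulated-data idea from above; the straight-line f^ here is just one possible estimate), the MSE of a fit can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=100)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=100)   # Y = f(X) + e

# fit a simple estimate f^ (here a least-squares line) and measure its MSE
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

mse = np.mean((y - y_hat) ** 2)                      # average of (Y - Y^)^2
print(f"training MSE: {mse:.3f}")
```

Even a perfect estimate of f would leave an MSE of about Var(e) = 0.25 in this simulation, which is the irreducible part.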

 

Inference

In prediction, f^ could be treated as a black box.

But for inference, we want to know the exact form of f.

Understand how Y changes as a function of X1, ..., Xp.

input (X) --> black box f --> output (Y)

 

Inference questions 

- Which predictors are associated with the response?

e.g., among X1, ..., Xp, which are relevant?

 

- What is the relationship between the response and each predictor?

e.g., does increasing x1 increase (or decrease) Y?

e.g., does increasing x1 increase (or decrease) Y when x2 is positive?

 

-Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?

 

Some examples of Prediction vs. inference 

prediction example : direct-marketing 

- given 90,000 people with 400 different characteristics, want to predict how much money an individual will donate.

- should I send a mailing to a given individual?

(We don't care how Y^ is obtained, only that the prediction is accurate.)

x1, ..., xp: demographic data

Y: positive or negative response

 

Inference example : advertising

which media contribute to sales?

which media generate the biggest boost in sales? or

how much increase in sales is associated with a given increase in SNS advertising?

 

Inference example : Housing

 

How do we estimate f?

Given a set of training data {(x1, y1), (x2, y2), ..., (xn, yn)}, we want to estimate f.

Two types of approaches.

-parametric methods

-non-parametric methods

 

Parametric methods

- Estimating f reduces to estimating a set of parameters.

Step 1. Make an assumption about the functional form (a model) of f.

e.g., a linear model: f(X) = β0 + β1X1 + ... + βpXp

 

Step 2. Use the training data to fit the model.

e.g., estimate β0, β1, ..., βp of the linear model (using least squares).

Linear model: we only need to estimate p + 1 coefficients!
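A minimal sketch of both steps (the two-predictor linear form and the coefficient values below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.uniform(0, 1, size=(n, p))                     # predictors X1, X2
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.3, size=n)

# Step 1: assume a linear form  f(X) = b0 + b1*X1 + b2*X2
# Step 2: fit the p + 1 = 3 coefficients by least squares
A = np.column_stack([np.ones(n), X])                   # design matrix with an intercept column
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)                                        # estimates of (b0, b1, b2)
```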

 

non-parametric methods

- Do not make explicit assumptions about the functional form of f.

- Advantage: flexibility! They can fit a much wider range of shapes of f.

- Disadvantage: harder to learn; they require more data.
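One familiar non-parametric method is k-nearest-neighbours regression; it is not mentioned in the notes above, but the sketch below shows what "no assumed functional form" looks like in practice: the prediction at a new point is simply the average response of nearby training points.

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k=5):
    """Predict f(x_new) as the mean y of the k nearest training points."""
    dists = np.abs(x_train - x_new)        # distances in one dimension
    nearest = np.argsort(dists)[:k]        # indices of the k closest training points
    return y_train[nearest].mean()

rng = np.random.default_rng(2)
x_train = rng.uniform(0, 1, size=100)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=100)  # a non-linear f

print(knn_predict(x_train, y_train, x_new=0.5, k=5))
```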

 

parametric vs. non-parametric models

-There are always parameters.

-Parametric models: parameters are explicitly estimated (e.g., linear regression).

-Non-parametric models:

 - I choose a family of models, but I don't have direct control over the parameters.

 - Non-parametric models actually have far more parameters!

 

The bigger the better?

Q. why would we ever choose to use a more restrictive method instead of a very flexible approach?

 

Trade-off : flexibility vs. interpretability

Trade-off between prediction accuracy and model interpretability.

A figure shows the balance between the flexibility and interpretability of various statistical learning methods: in general, as flexibility increases, interpretability decreases.

 

Simple models are easier to interpret!

Back to linear regression.

y^ = β0 + β1·Xi1 + β2·Xi2 + ... + βp·Xip

βj: the average increase in Y for a one-unit increase in Xj, holding all other variables constant.

 

Overfitting

A model that is too flexible -> poor estimation (it ends up following the random errors too closely).

Y = f(X) + e (random error)
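A minimal sketch of this effect (degrees, sample sizes, and noise level chosen arbitrarily): a very flexible polynomial fits the training data almost perfectly, but does worse on new data than a simple line because it also fits the random error e.

```python
import numpy as np

rng = np.random.default_rng(3)

def true_f(x):
    return 2.0 + 3.0 * x                              # the (in practice unknown) true f

x_train = rng.uniform(0, 1, size=20)
y_train = true_f(x_train) + rng.normal(0, 0.5, size=20)
x_test = rng.uniform(0, 1, size=200)
y_test = true_f(x_test) + rng.normal(0, 0.5, size=200)

for degree in (1, 10):                                # simple line vs. very flexible polynomial
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The flexible fit wins on training MSE but typically loses on test MSE, which is exactly the overfitting trade-off.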

 

 

 

 

 

 

Summary

We covered the key concepts of supervised learning:

learning : learn f from (training) data

prediction vs. inference

reducible vs. irreducible errors

parametric vs. non-parametric methods for learning

flexibility  vs. interpretability

overfitting

 

 
