ch2_learning2 (Balancing Flexibility to Optimize Model Accuracy): Assessing Model Accuracy

9taetae9 2024. 3. 13. 20:18

Assessing Model Accuracy

So many machine learning methods!
• A single best method for all data sets? Nope!
• One method may work best on a particular data set.
• But some other method may work better on a similar but different data set.

 

How to compare Methods?
• Given a set of data, which method will produce the best result?
• In other words, how to compare different learning methods?

 

Measuring quality of fit
• Evaluate the performance of a statistical learning method on a given data set
• For regression problems, a common measure of prediction accuracy is the mean squared error (MSE):

MSE = (1/n) Σ (yᵢ − f̂(xᵢ))², where the sum runs over the n observations

• We choose the model that achieves the smallest MSE.

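For concreteness, here is a minimal NumPy sketch of the MSE computation; the toy arrays y and y_hat are made-up values, not data from the notes.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average squared gap between observed and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

# Made-up observed responses and predictions from some fitted model.
y     = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5,  0.0, 2.0, 8.0]
print(mse(y, y_hat))  # 0.375
```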
training MSE vs. test MSE
• training MSE: easy to minimize
• but, training MSE ≠ test MSE
• test MSE is what we really want to minimize!

 

Overfitting
• “working too hard” to find patterns: the method ends up fitting the irreducible error rather than the true f!

Y = f(X) + e

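To see overfitting numerically, here is a small simulation sketch (my own illustration, not the data behind the figure below): responses are generated as Y = f(X) + e for a hypothetical f of my choosing, and polynomials of increasing degree stand in for increasingly flexible methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true function (chosen for illustration) and noisy observations: Y = f(X) + e
f = lambda x: np.sin(3 * x)
x_train = rng.uniform(-1, 1, 30)
y_train = f(x_train) + rng.normal(scale=0.3, size=x_train.size)   # e: irreducible error
x_test  = rng.uniform(-1, 1, 200)
y_test  = f(x_test)  + rng.normal(scale=0.3, size=x_test.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# Polynomial degree plays the role of "flexibility".
for degree in (1, 3, 6, 9, 12):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = mse(y_train, np.polyval(coefs, x_train))
    test_mse  = mse(y_test,  np.polyval(coefs, x_test))
    print(f"degree {degree:2d}: training MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typically the training MSE keeps dropping as the degree grows,
# while the test MSE falls and then rises once the fit starts chasing the noise e.
```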
[Figure: left panel shows simulated data with the true f (black), a linear regression fit (orange), and two more flexible spline fits (blue, green); right panel shows training MSE (grey) and test MSE (red) as a function of model flexibility]

์™ผ์ชฝ ๊ทธ๋ž˜ํ”„์—์„œ ๊ฒ€์€์ƒ‰ ์„ ์€ ์‹ค์ œ ํ•จ์ˆ˜ f ๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ, ์ด๋Š” ์šฐ๋ฆฌ๊ฐ€ ์ถ”์ •ํ•˜๊ณ ์ž ํ•˜๋Š” ์‹ค์ œ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ์˜๋ฏธํ•œ๋‹ค. ์—ฌ๊ธฐ์— ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ(๋™๊ทธ๋ผ๋ฏธ๋กœ ํ‘œ์‹œ๋œ)๋Š” ์‹ค์ œ ํ•จ์ˆ˜์—์„œ ๋ฐœ์ƒํ–ˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค. ์˜ค๋ Œ์ง€์ƒ‰ ์„ ์€ ์„ ํ˜• ํšŒ๊ท€ ๋ชจ๋ธ์„ ํ†ตํ•ด ์–ป์–ด์ง„ ์ถ”์ •์น˜๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์„ ํ˜• ๋ชจ๋ธ์€ ๋ฐ์ดํ„ฐ์˜ ์ „๋ฐ˜์ ์ธ ๊ฒฝํ–ฅ์„ฑ์€ ํŒŒ์•…ํ•˜์ง€๋งŒ, ๋ฐ์ดํ„ฐ์˜ ๋ชจ๋“  ํŒจํ„ด์„ ์žก์•„๋‚ด์ง€๋Š” ๋ชปํ•œ๋‹ค. ํŒŒ๋ž€์ƒ‰๊ณผ ๋…น์ƒ‰ ์„ ์€ ๊ฐ๊ฐ ๋” ๋งŽ์€ ์œ ์—ฐ์„ฑ์„ ๊ฐ€์ง„ ๋ชจ๋ธ์„ ํ†ตํ•ด ์ถ”์ •๋œ ๊ฐ’์œผ๋กœ, ์ด๋Ÿฌํ•œ ์Šคํ”Œ๋ผ์ธ ํ˜น์€ ๋น„์„ ํ˜• ๋ชจ๋ธ์€ ๋ฐ์ดํ„ฐ์˜ ์ง€์—ญ์ ์ธ ๋ณ€๋™์„ฑ์„ ๋” ์ž˜ ํฌ์ฐฉํ•œ๋‹ค.

The right-hand graph shows the mean squared error (MSE) as a function of flexibility, where flexibility refers to how closely a model can follow the data. The grey curve is the training MSE and the red curve is the test MSE. In general, the training MSE keeps decreasing as the model becomes more flexible, but the test MSE starts to rise again after a certain point. This means the model has become so flexible that it overfits the training data and no longer works well on genuinely new data. Ideally, we want to find the model whose test MSE is smallest. The squares on the graph mark the training and test MSE of each fit (orange, blue, green).

 

Q. Which captures the data best? => green (lowest training MSE)

The green curve best captures the training data as it fits closest to the individual data points, following the fluctuations in the training data with the highest precision. This level of flexibility in the model tends to demonstrate a high degree of fit to the given data. However, it's important to be cautious of the potential for overfitting when applying the model to test data.

 

์ฃผ์–ด์ง„ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์žฅ ์ž˜ ํฌ์ฐฉํ•˜๋Š” ๊ฒƒ์€ ๋…น์ƒ‰ ์„ ์ด๋‹ค. ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์— ๊ฐ€์žฅ ๊ทผ์ ‘ํ•˜๊ฒŒ ๋งž์ถ”์–ด์ ธ ์žˆ๊ณ , ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์˜ ๋ณ€๋™์„ฑ์„ ๊ฐ€์žฅ ์„ธ๋ฐ€ํ•˜๊ฒŒ ๋”ฐ๋ผ๊ฐ€๊ณ  ์žˆ๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ณ ๋„๋กœ ์œ ์—ฐํ•œ ๋ชจ๋ธ์€ ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ๋†’์€ ์ ํ•ฉ๋„๋ฅผ ๋ณด์ด๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ๊ณผ์ ํ•ฉ(overfitting)์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Œ์„ ์œ ์˜ํ•ด์•ผ ํ•œ๋‹ค.


Q. Which is the best estimate of f? => blue (lowest test MSE)

The blue model is considered the best estimate of the true function f, as it captures the structure of the data effectively while avoiding overfitting, thereby demonstrating the greatest generalization ability on new data. This is evidenced by its lowest test MSE, indicating a balance between fitting the training data and maintaining predictive performance on unseen data.

 

ํŒŒ๋ž€์ƒ‰ ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ์˜ ๊ตฌ์กฐ๋ฅผ ์ž˜ ํฌ์ฐฉํ•˜๋ฉด์„œ๋„ ๊ณผ์ ํ•ฉ๋˜์ง€ ์•Š์•„ ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์ด ๊ฐ€์žฅ ๋›ฐ์–ด๋‚˜๋‹ค.


 ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์žฅ ์ž˜ ํฌ์ฐฉํ•˜๋Š” ๋ชจ๋ธ๊ณผ f์˜ ์ตœ์„ ์˜ ์ถ”์ •์น˜์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž. ์˜ค๋ Œ์ง€์ƒ‰ ๋ชจ๋ธ์€ ๋ฐ์ดํ„ฐ์˜ ์ „๋ฐ˜์ ์ธ ์ถ”์„ธ๋ฅผ ์ž˜ ํŒŒ์•…ํ•˜์ง€๋งŒ, ๋ชจ๋“  ํŒจํ„ด์„ ์žก์•„๋‚ด์ง€๋Š” ๋ชปํ•œ๋‹ค. ํŒŒ๋ž€์ƒ‰๊ณผ ๋…น์ƒ‰ ๋ชจ๋ธ์€ ๋ฐ์ดํ„ฐ์˜ ์„ธ๋ถ€์ ์ธ ๋ณ€๋™๊นŒ์ง€ ์ž˜ ํฌ์ฐฉํ•˜๊ณ  ์žˆ์ง€๋งŒ, ๋„ˆ๋ฌด ์œ ์—ฐํ•ด์„œ ํ…Œ์ŠคํŠธ MSE๊ฐ€ ์ฆ๊ฐ€ํ•˜๋Š” ๊ณผ์ ํ•ฉ์˜ ์œ„ํ—˜์ด ์žˆ๋‹ค. ๊ทธ๋ž˜ํ”„ ์ƒ์—์„œ๋Š” ํŒŒ๋ž€์ƒ‰ ๋ชจ๋ธ์ด ํ…Œ์ŠคํŠธ MSE๊ฐ€ ๊ฐ€์žฅ ๋‚ฎ๊ธฐ ๋•Œ๋ฌธ์—, ์ผ๋ฐ˜ํ™” ์ธก๋ฉด์—์„œ๋Š” ํŒŒ๋ž€์ƒ‰ ๋ชจ๋ธ์ด f์˜ ์ตœ์„ ์˜ ์ถ”์ •์น˜๋กœ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

In general,
• the more flexible the method ➔ the lower the training MSE
• However, the test MSE may be higher for a more flexible method!

 

Avoid overfitting!
Choose just the right level of flexibility, keeping the risk of overfitting in mind!
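
In practice, one simple way to pick that level of flexibility is to hold out part of the data and keep the candidate with the smallest held-out MSE. Below is a sketch under the same hypothetical setup as the simulation above (the data and degree range are assumptions for illustration, not from the notes).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data, split into a training part and a held-out part.
f = lambda x: np.sin(3 * x)
x = rng.uniform(-1, 1, 120)
y = f(x) + rng.normal(scale=0.3, size=x.size)
x_train, x_hold = x[:80], x[80:]
y_train, y_hold = y[:80], y[80:]

# Fit one polynomial per candidate flexibility (degree) and record its held-out MSE.
degrees = list(range(1, 13))
hold_mse = []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, deg=d)
    resid = y_hold - np.polyval(coefs, x_hold)
    hold_mse.append(float(np.mean(resid ** 2)))

best = degrees[int(np.argmin(hold_mse))]
print(f"flexibility (degree) with the smallest held-out MSE: {best}")
```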
