ch1_overview

9taetae9 2024. 3. 6. 12:45

statistical learning

supervised / unsupervised

supervised : given data (x, y), build a model f that maps the input x to the output y, so that Y ≈ f(X)

- use the model f to predict Y for new, unseen inputs X'
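To make the "build a model, then predict" idea concrete, here is a minimal sketch. It assumes the simplest possible f, a straight line fitted by least squares, and uses made-up toy data. The names fit_line and predict are hypothetical helpers for illustration, not something from the book.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(model, x_new):
    """Apply the fitted model f to a new input."""
    a, b = model
    return a + b * x_new


# Toy training data (made up): generated from y = 1 + 2x
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)   # learn f from (x, y) pairs
y_hat = predict(model, 4.0)          # predict Y for an unseen input X'
print(model, y_hat)
```

The same two steps (fit on known pairs, predict on new inputs) apply whatever form f takes.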

find the relationship between factors based on data

ex) predict wage from values such as year, age, gender, ...

consider outliers (observations that lie far from the rest of the data)

https://en.wikipedia.org/wiki/Outlier

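scatter plot : shows the relationship between two continuous variables

box plot : the line inside the box is the median (center value); the box spans from the 25th percentile (bottom edge) to the 75th percentile (top edge)

=> each type of plot is chosen for a reason, depending on what needs to be shown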
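The numbers behind a box plot can be computed directly. The sketch below uses Python's statistics.quantiles on made-up data and flags outliers with the 1.5 × IQR rule. That rule is a common convention assumed here, not something these notes specify.

```python
import statistics

# Toy data (made up); 50.0 is deliberately far from the rest
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]

# Quartiles: q1 = 25th percentile, q2 = median, q3 = 75th percentile
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range = height of the box
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, outliers)
```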

regression : predict a continuous output from inputs (ch3)

classification : predict a categorical output Y from inputs X (ch4)

categorical -> not continuous (the output takes one of a fixed set of classes)
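A small sketch of predicting a categorical output. It assumes a nearest-centroid rule, picked only because it is short; it is not the method covered in ch4. The data, the labels "low" and "high", and the helper names are all made up for illustration.

```python
def fit_centroids(points, labels):
    """Compute the mean point (centroid) of each class."""
    sums = {}
    counts = {}
    for (x1, x2), label in zip(points, labels):
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def classify(centroids, point):
    """Return the class whose centroid is closest to the point."""
    def sq_dist(c):
        return (c[0] - point[0]) ** 2 + (c[1] - point[1]) ** 2
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy training data (made up): two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(points, labels)
pred_a = classify(centroids, (2, 2))
pred_b = classify(centroids, (7, 7))
print(pred_a, pred_b)
```

The output is always one of the known class labels, never a value in between, which is what separates classification from regression.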

machine learning does not always work

unsupervised

given only inputs x (no output y), learn the underlying structure => hard to analyze

=> use a dimension reduction technique (ch12)

find some 'meaningful' directions Z1 and Z2, then plot the data using Z1 and Z2. Now we can see some underlying structure.

ex) gene expression data (expression levels of 6830 genes from 64 cancer cell lines)
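One way to find such directions is principal component analysis, which is an assumption here since the notes do not name the technique. The sketch takes Z1 and Z2 as the top two eigenvectors of the covariance matrix, found by power iteration with deflation. The 3-feature toy data is made up, and the helper names are hypothetical.

```python
import math


def center(rows):
    """Subtract the column means so each feature has mean 0."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def top_direction(cov, iters=500):
    """Power iteration: leading eigenvector and eigenvalue of cov."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def deflate(cov, v, eigenvalue):
    """Remove the variance already explained by direction v."""
    d = len(cov)
    return [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)]


# Toy data (made up): 6 observations, 3 features
rows = [
    [2.0, 0.0, 0.1], [-2.0, 0.0, -0.1],
    [0.0, 1.0, 0.1], [0.0, -1.0, -0.1],
    [1.0, 0.5, 0.0], [-1.0, -0.5, 0.0],
]

centered = center(rows)
cov = covariance(centered)

z1_dir, var1 = top_direction(cov)                          # direction of Z1
z2_dir, var2 = top_direction(deflate(cov, z1_dir, var1))   # direction of Z2

# Project each observation onto the two directions: (Z1, Z2) coordinates
projected = [
    (sum(a * b for a, b in zip(r, z1_dir)),
     sum(a * b for a, b in zip(r, z2_dir)))
    for r in centered
]
print(projected)
```

Each observation is now described by two numbers instead of three, and the (Z1, Z2) pairs can be drawn on a scatter plot. With thousands of features, as in the gene expression example, the idea is the same, but a numerical library would be the practical choice.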
