
ch1_overview

9taetae9 2024. 3. 6. 12:45

statistical learning

supervised / unsupervised

 

supervised : given data (x, y), build a model f, a mapping from the input X to the output Y, so that Y ≈ f(X)

- use the model f to predict Y for new, unseen inputs X'
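A minimal sketch of the fit-then-predict idea, assuming f is a straight line fitted by least squares. The notes do not fix a particular form for f, so the line, the helper names fit_line and predict, and the toy data are all assumptions made for illustration.

```python
# Supervised learning sketch: learn f from (x, y) pairs, then predict Y for a new X'.
# The training data is made up and lies exactly on y = 1 + 2x.

def fit_line(xs, ys):
    """Least-squares fit of y ≈ intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x_new):
    """Apply the fitted model f to an unseen input."""
    intercept, slope = model
    return intercept + slope * x_new

train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.0, 5.0, 7.0, 9.0, 11.0]

f = fit_line(train_x, train_y)   # (intercept, slope)
y_hat = predict(f, 6.0)          # prediction for an input not in the training data
print(f, y_hat)
```

Because the toy data has no noise, the fitted line recovers intercept 1 and slope 2; with real data the fit would only approximate the relationship.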

 

find the relationship between factors based on data

ex) predict wage from various values such as year, age, gender ...

 

watch for outliers (observations that lie far from the rest of the data)

https://en.wikipedia.org/wiki/Outlier

 


 

scatter plot

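box plot : the center line is the median, and the box spans the 25th percentile (bottom edge) to the 75th percentile (top edge)

=> each plot type has its own use: a scatter plot shows how two continuous variables relate, while a box plot summarizes the distribution of one variable, often across groups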
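The numbers a box plot draws can be computed directly. The sketch below uses Python's statistics.quantiles to get the quartiles and flags outliers with the 1.5 × IQR rule. That rule is a common plotting convention and an assumption here, not something the notes specify; the function name box_plot_summary and the data are also made up.

```python
import statistics

def box_plot_summary(values):
    """Quartiles plus points outside the 1.5 * IQR fences (assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # toy data with one extreme value
summary = box_plot_summary(data)
print(summary)
```

Here the box runs from 4 to 8 with the median at 6, and the value 30 falls above the upper fence, so it would be drawn as a separate point.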

 

 

regression : predict a continuous output from inputs (ch3)

 

 

classification : predict a categorical output Y from inputs X (ch4)

categorical -> Y takes one of a fixed set of classes, not a continuous value
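To make the categorical case concrete, here is a small sketch using a 1-nearest-neighbor rule: a new input gets the label of the closest training point. This is only one simple classifier picked for illustration, not necessarily the method covered in ch4, and the points and the labels "low" / "high" are invented.

```python
import math

def nearest_neighbor_classify(train_points, train_labels, query):
    """Return the label of the training point closest to query (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        d = math.dist(point, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy training set: two inputs per observation, one categorical output.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["low", "low", "high", "high"]

pred_a = nearest_neighbor_classify(train_points, train_labels, (1.2, 1.4))
pred_b = nearest_neighbor_classify(train_points, train_labels, (5.5, 5.0))
print(pred_a, pred_b)
```

The output is always one of the class labels seen in training, never a number in between, which is the difference from regression.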

 

machine learning does not always work: when the inputs carry little information about the output, even a good model predicts poorly

 

 

 

unsupervised

given only inputs x (no output y), learn the underlying structure => hard to analyze, since there is no output to check results against

=> use a dimension reduction technique (ch12)

find some 'meaningful' directions Z1 and Z2, then plot the data using Z1 and Z2. Now we can see some underlying structure.

 

ex) gene expression data (expression levels of 6830 genes from 64 cancer cell lines)
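One way to find two 'meaningful' directions is principal component analysis, taking Z1 and Z2 as the directions of largest variance. The notes do not name the technique, so PCA is an assumption here. The sketch below is a pure-Python version using power iteration on the covariance matrix, run on a tiny made-up 3-column data set rather than the gene expression data; all function names are hypothetical.

```python
import math

def center(rows):
    """Subtract each column's mean."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_direction(cov, iters=500):
    """Power iteration: unit vector along the largest-variance direction."""
    v = [float(i + 1) for i in range(len(cov))]   # arbitrary nonzero start
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v, dot(v, mat_vec(cov, v))             # direction and its variance

def first_two_directions(rows):
    x = center(rows)
    cov = covariance(x)
    p = len(cov)
    v1, lam1 = top_direction(cov)
    # Remove the variance along v1, then repeat to get the second direction.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(p)]
                for i in range(p)]
    v2, _ = top_direction(deflated)
    scores = [(dot(r, v1), dot(r, v2)) for r in x]  # (Z1, Z2) per observation
    return v1, v2, scores

# Toy data: the first two columns move together, the third varies a little on its own.
rows = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, -0.4],
    [6.0, 6.0, 0.6],
    [8.0, 7.9, -0.5],
    [10.0, 10.1, 0.3],
    [12.0, 12.0, -0.5],
]
v1, v2, scores = first_two_directions(rows)
print(v1, v2)
```

Each observation is reduced to a pair (Z1, Z2) in scores, which can then go on an ordinary scatter plot. For data with thousands of columns a library routine would be the practical choice; this version only shows the idea.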

 

 

 
