What is machine learning?
Machine learning is the semi-automated extraction of knowledge from data.
- Knowledge from data: Starts with a question that might be answerable using data
- Automated extraction: A computer provides the insight
- Semi-automated: Requires many smart decisions by a human
The machine learning approach starts with either a problem that you need to solve or a given dataset that you need to analyze.
Two main categories of machine learning:
Supervised learning: Making predictions using data
- Example: Is a given email "spam" or "ham"?
- There is an outcome we are trying to predict
Unsupervised learning: Extracting structure from data
- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors
- There is no "right answer"
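A minimal sketch of the difference between the two categories, assuming scikit-learn and NumPy are installed. The toy data, the choice of KNeighborsClassifier and KMeans, and the parameter values are illustrative assumptions, not part of these notes. The supervised estimator is fit on features plus known outcomes, while the clustering estimator sees only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two features per observation, two well-separated groups.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])
# Known outcomes, available only in the supervised setting.
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: learn the mapping from features to the known outcome.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
supervised_pred = clf.predict(np.array([[1.0, 1.0], [8.0, 8.0]]))

# Unsupervised: no outcome is supplied; the model only groups similar rows.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print(supervised_pred)  # predicted outcomes for the two new observations
print(cluster_ids)      # cluster numbers are arbitrary group ids, not "right answers"
```

Note that fit receives both X and y in the supervised case but only X in the unsupervised case.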
High-level steps of supervised learning:
First, train a machine learning model using labeled data
- "Labeled data" has been labeled with the outcome
- "Machine learning model" learns the relationship between the attributes of the data and its outcome
Then, make predictions on new data for which the outcome is unknown
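| Problem type   | Evaluation metrics                                        | Model               | scikit-learn constructor |
|----------------|-----------------------------------------------------------|---------------------|--------------------------|
| Classification | Mean accuracy, confusion matrix, sensitivity, specificity | K-nearest neighbors | KNeighborsClassifier     |
| Classification | Mean accuracy, confusion matrix, sensitivity, specificity | Logistic regression | LogisticRegression       |
| Regression     | RMSE                                                      | Linear regression   | LinearRegression         |
| Regression     | RMSE                                                      | K-nearest neighbors | KNeighborsRegressor      |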
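The evaluation metrics named above can be computed directly from their definitions. The following is a pure-Python sketch: the label lists and numeric values are made up for illustration, and the helper names confusion_counts and rmse are hypothetical. scikit-learn offers equivalents in sklearn.metrics (for example accuracy_score and confusion_matrix).

```python
import math

# Made-up binary labels for illustration (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

def confusion_counts(actual, predicted):
    """Return (TN, FP, FN, TP) for binary labels coded as 0/1."""
    pairs = list(zip(actual, predicted))
    tn = pairs.count((0, 0))  # true negatives
    fp = pairs.count((0, 1))  # false positives
    fn = pairs.count((1, 0))  # false negatives
    tp = pairs.count((1, 1))  # true positives
    return tn, fp, fn, tp

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)  # share of all predictions that are correct
sensitivity = tp / (tp + fn)        # share of actual positives that were found
specificity = tn / (tn + fp)        # share of actual negatives that were found

def rmse(actual, predicted):
    """Root mean squared error for a regression problem."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up regression values for illustration.
reg_error = rmse([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])

print((tn, fp, fn, tp))                    # (4, 1, 2, 3)
print(accuracy, sensitivity, specificity)  # 0.7 0.6 0.8
print(round(reg_error, 3))                 # 1.225
```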
Machine learning on the iris dataset
- Framed as a supervised learning problem: Predict the species of an iris using the measurements
- Famous dataset for machine learning because prediction is easy
- Learn more about the iris dataset: UCI Machine Learning Repository
- Each row is an observation (also known as: sample, example, instance, record)
- Each column is a feature (also known as: predictor, attribute, independent variable, input, regressor, covariate)
- The value being predicted is the response (also known as: target, outcome, label, dependent variable)
- 50 samples of 3 different species of iris (150 samples total)
- Measurements: sepal length, sepal width, petal length, petal width
- Response variable is the iris species
- Classification problem since response is categorical
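The structure described above can be checked by loading the copy of the iris dataset that ships with scikit-learn. This is a small sketch assuming scikit-learn and NumPy are installed; the variable names X and y are a common convention, not something these notes prescribe.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # feature matrix: one row per observation, one column per feature
y = iris.target  # response vector: species coded as 0, 1, 2

print(X.shape)             # (150, 4)
print(y.shape)             # (150,)
print(iris.feature_names)  # the four sepal and petal measurements
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))      # [50 50 50], i.e. 50 samples per species
```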
Scikit-learn 4-step modeling pattern
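A minimal sketch of the pattern, assuming the four steps are the usual import, instantiate, fit, predict sequence. The choice of KNeighborsClassifier with n_neighbors=1 and the two measurement rows in X_new are illustrative assumptions.

```python
# Step 1: import the estimator class you plan to use.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Step 2: instantiate the estimator, setting any tuning parameters.
knn = KNeighborsClassifier(n_neighbors=1)

# Step 3: fit the model to the labeled data (the model learns in place).
knn.fit(X, y)

# Step 4: predict the response for new observations.
# Hypothetical measurements: sepal length, sepal width, petal length, petal width.
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
pred = knn.predict(X_new)

print(pred)                     # encoded species, one per new observation
print(iris.target_names[pred])  # the same predictions as species names
```

The same four steps apply to other scikit-learn estimators; only the imported class and its parameters change.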
K-nearest neighbors classification
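To show the idea behind K-nearest neighbors classification, here is a pure-Python sketch (Python 3.8+ for math.dist). It is a simplified illustration, not scikit-learn's implementation: it assumes Euclidean distance and a simple majority vote, and the function name knn_predict and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest observations.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; on a tie, the label seen first (the closest one) wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: two features per observation, two classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
train_y = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # small
print(knn_predict(train_X, train_y, (6.0, 7.0), k=3))  # large
```

A larger k smooths the decision boundary because more neighbors take part in each vote.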
Logistic Regression
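Despite its name, logistic regression is used here as a classification model. The sketch below fits it to the iris data with scikit-learn. The max_iter=1000 setting is an assumption made to give the solver room to converge on unscaled measurements, and the rows in X_new are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Instantiate and fit; max_iter is raised above the default as a precaution.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

# Hypothetical measurements: sepal length, sepal width, petal length, petal width.
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
pred = logreg.predict(X_new)         # most likely species for each row
proba = logreg.predict_proba(X_new)  # estimated probability of each species

print(iris.target_names[pred])
print(proba.round(3))  # one row per observation, one column per species
```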