DataScience With Python/R/SAS: Supervised Learning | K-nearest neighbors | Logistic Regression | Machine Learning | Sci-kit Learn

What is machine learning?
Machine learning is the semi-automated extraction of knowledge from data.

Knowledge from data: Starts with a question that might be answerable using data
Automated extraction: A computer provides the insight
Semi-automated: Requires many smart decisions by a human

Machine learning is a tool for analyzing data, finding hidden patterns and relationships, and extracting information that supports data-driven decisions.
The machine learning approach starts with either a problem that you need to solve or a given dataset that you need to analyze.

Two main categories of machine learning:

Supervised learning: Making predictions using data
Example: Is a given email "spam" or "ham"? There is an outcome we are trying to predict

Unsupervised learning: Extracting structure from data
Example: Segment grocery store shoppers into clusters that exhibit similar behaviors
There is no "right answer"

High-level steps of supervised learning:

First, train a machine learning model using labeled data

"Labeled data" has been labeled with the outcome
"Machine learning model" learns the relationship between the attributes of the data and its outcome

Then, make predictions on new data for which the label is unknown.

The primary goal of supervised learning is to build a model that "generalizes": it accurately predicts the future rather than the past!
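The train-then-predict idea can be sketched by holding back part of a labeled dataset and treating it as "new" data. This is only an illustration: the choice of the iris data, KNeighborsClassifier with n_neighbors=1, the 60/40 split and random_state=4 are all assumptions made for the example, not something the steps above prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled data: X holds the attributes, y holds the known outcome
X, y = load_iris(return_X_y=True)

# Hold back 40% of the rows to play the role of "new" data (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4, stratify=y
)

# Train the model on the labeled training rows only
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Accuracy on the rows the model has already seen ("the past")
train_accuracy = model.score(X_train, y_train)
# Accuracy on the held-back rows ("the future")
test_accuracy = model.score(X_test, y_test)

print("training accuracy:", train_accuracy)
print("testing accuracy:", test_accuracy)
```

A 1-nearest-neighbor model memorizes its training rows, so the training score says little about how well it generalizes. The score on the held-back rows is the more honest estimate.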

Problem type	Evaluation metrics	Models	scikit-learn constructors
Classification	mean accuracy, confusion matrix, sensitivity, specificity	K-Nearest Neighbor	KNeighborsClassifier
Classification	mean accuracy, confusion matrix, sensitivity, specificity	Logistic Regression	LogisticRegression
Regression	RMSE	Linear Regression	LinearRegression
Regression	RMSE	K-Nearest Neighbor	KNeighborsRegressor
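To make the evaluation metrics concrete, here is a small plain-Python sketch of how each one is computed. The label lists and numbers are made-up illustrative values, and the helper names (confusion_counts, rmse) are hypothetical, not part of scikit-learn.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def rmse(y_true, y_pred):
    """Root mean squared error for a regression problem."""
    squared_errors = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Classification: made-up true and predicted labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all predictions that are correct
sensitivity = tp / (tp + fn)         # share of actual positives that were found
specificity = tn / (tn + fp)         # share of actual negatives that were found

print("confusion counts (tp, fp, tn, fn):", (tp, fp, tn, fn))
print("accuracy:", accuracy)
print("sensitivity:", sensitivity)
print("specificity:", specificity)

# Regression: made-up true and predicted values
values_true = [3.0, 5.0, 2.0, 7.0]
values_pred = [1.0, 5.0, 4.0, 7.0]
error = rmse(values_true, values_pred)
print("RMSE:", error)
```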

Machine learning on the iris dataset

Framed as a supervised learning problem: Predict the species of an iris using the measurements

Famous dataset for machine learning because prediction is easy

Learn more about the iris dataset: UCI Machine Learning Repository

Machine learning terminology:

Each row is an observation (also known as: sample, example, instance, record)

Each column is a feature (also known as: predictor, attribute, independent variable, input, regressor, covariate)

The value being predicted is the response (also known as: label, target)

Let's load and understand the dataset:

50 samples of each of 3 different species of iris (150 samples total)
Measurements: sepal length, sepal width, petal length, petal width
Response variable is the iris species
Classification problem since response is categorical
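One way to load the data and check these facts is scikit-learn's bundled copy of the iris dataset. This is a minimal sketch; the variable names X, y and counts are just conventions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the copy of the iris dataset that ships with scikit-learn
iris = load_iris()

X = iris.data     # feature matrix: one row per observation, one column per feature
y = iris.target   # response vector: species encoded as 0, 1, 2

print(X.shape)             # number of observations and number of features
print(iris.feature_names)  # names of the four measurements
print(iris.target_names)   # names of the three species

# Number of observations per species
counts = np.bincount(y)
print(counts)
```

The features and the response are stored as separate NumPy arrays, which is the layout scikit-learn estimators expect.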

Scikit-learn 4-step modeling pattern
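The post names the pattern without listing its steps. The sketch below assumes the commonly taught version: import the class, instantiate it, fit it, then predict. The estimator (KNeighborsClassifier with n_neighbors=1) and the two new observations are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris

# Step 1: import the class you plan to use
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 2: instantiate the estimator, setting any tuning parameters
knn = KNeighborsClassifier(n_neighbors=1)

# Step 3: fit the model to the labeled data (learn from X and y)
knn.fit(X, y)

# Step 4: predict the response for new observations
# Each row: sepal length, sepal width, petal length, petal width (made-up values)
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
predictions = knn.predict(X_new)
print(predictions)  # encoded species, one per new observation
```

The same four steps apply to the other estimators in this post. Only the imported class and its parameters change.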

K-nearest neighbors classification
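The idea behind K-nearest neighbors classification can be sketched in a few lines of plain Python: find the K training observations closest to the new one and take a majority vote of their labels. This is a from-scratch illustration, not scikit-learn's KNeighborsClassifier implementation. The function name knn_predict and the tiny two-feature dataset are hypothetical.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote of its k nearest neighbors."""
    # Euclidean distance from the new observation to every training observation
    distances = [
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    ]
    # Keep the k closest training observations
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: [petal length, petal width] with a species label
X_train = [[1.4, 0.2], [1.3, 0.2], [1.5, 0.3],
           [4.5, 1.5], [4.7, 1.4], [4.4, 1.3]]
y_train = ["setosa", "setosa", "setosa",
           "versicolor", "versicolor", "versicolor"]

small_flower = knn_predict(X_train, y_train, [1.6, 0.4], k=3)
large_flower = knn_predict(X_train, y_train, [4.6, 1.5], k=3)
print(small_flower)
print(large_flower)
```

The value of K is a tuning parameter: a small K follows the training data closely, while a larger K smooths the decision over more neighbors.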

Logistic Regression
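Logistic regression can be applied to the iris data using the same fit and predict calls. A minimal sketch follows; max_iter=1000 is an assumption added so the solver has room to converge, and the new observations are made-up values.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Instantiate the model; a higher max_iter (assumed value) helps avoid convergence warnings
logreg = LogisticRegression(max_iter=1000)

# Fit the model to the labeled data
logreg.fit(X, y)

# Predict the species for new observations (made-up measurements)
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
predictions = logreg.predict(X_new)

# Logistic regression also provides a predicted probability for each class
probabilities = logreg.predict_proba(X_new)

print(predictions)
print(probabilities)
```

Each row of the probability output sums to 1, and the predicted class is the one with the highest probability.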
