Supervised Learning | K-Nearest Neighbors | Logistic Regression | Machine Learning | Scikit-learn | Part 1



What is machine learning?
Machine learning is the semi-automated extraction of knowledge from data.
  • Knowledge from data: Starts with a question that might be answerable using data
  • Automated extraction: A computer provides the insight
  • Semi-automated: Requires many smart decisions by a human
Machine learning is a powerful tool for analyzing data, finding hidden patterns and relationships, and extracting information that supports data-driven decisions and insights.
The machine learning approach starts with either a problem that you need to solve or a given dataset that you need to analyze.
Two main categories of machine learning:
  • Supervised learning: Making predictions using data
    Example: Is a given email "spam" or "ham"? There is an outcome we are trying to predict.
  • Unsupervised learning: Extracting structure from data
    Example: Segment grocery store shoppers into clusters that exhibit similar behaviors. There is no "right answer".
High-level steps of supervised learning:
First, train a machine learning model using labeled data
  • "Labeled data" has been labeled with the outcome
  • "Machine learning model" learns the relationship between the attributes of the data and its outcome
Then, make predictions on new data for which the label is unknown.
The primary goal of supervised learning is to build a model that "generalizes": it accurately predicts the future rather than the past!
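One common way to check whether a model generalizes is to hold back some labeled data, train on the rest, and score predictions on the held-back rows as a stand-in for "new data". The sketch below assumes scikit-learn is installed and uses its bundled iris dataset with a K-nearest neighbors classifier; the 40% split, `random_state=4`, and `n_neighbors=5` are arbitrary illustrative choices, not values from this post.

```python
# Sketch: estimate generalization by evaluating on labeled data the model never saw.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 40% of the labeled observations; they are not used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)  # learn only from the training portion

# Predict labels for the held-out rows, then compare them with the true labels.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on held-out data: {test_accuracy:.3f}")
```

Accuracy on the held-out rows is usually a more honest estimate of future performance than accuracy on the rows the model was trained on.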


Problem type   | Evaluation metrics                                        | Model               | Scikit-learn constructor
Classification | Mean accuracy, confusion matrix, sensitivity, specificity | K-nearest neighbors | KNeighborsClassifier
               |                                                           | Logistic regression | LogisticRegression
Regression     | RMSE                                                      | Linear regression   | LinearRegression
               |                                                           | K-nearest neighbors | KNeighborsRegressor
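The metrics in the table can be computed with `sklearn.metrics`. This minimal sketch uses small hand-made label arrays (hypothetical, not real model output) so the arithmetic is easy to follow. Sensitivity and specificity are derived from the binary confusion matrix, and RMSE is taken as the square root of `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, mean_squared_error

# Hypothetical binary labels: 1 = positive class, 0 = negative class.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions

# Rows = actual class, columns = predicted class, ordered by labels=[0, 1].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate

# Hypothetical regression targets and predictions.
y_true_reg = np.array([3.0, 5.0, 2.0])
y_pred_reg = np.array([2.0, 5.0, 4.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))  # root mean squared error

print(accuracy, sensitivity, specificity, rmse)
```

Here 5 of 8 labels match (accuracy 0.625), 3 of 4 positives are caught (sensitivity 0.75), 2 of 4 negatives are correctly rejected (specificity 0.5), and the regression errors of -1, 0, and 2 give RMSE = sqrt(5/3), about 1.291.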

Machine learning on the iris dataset

  • Framed as a supervised learning problem: Predict the species of an iris using the measurements
  • Famous dataset for machine learning because prediction is easy
  • Learn more about the iris dataset: UCI Machine Learning Repository
Machine learning terminology:

  • Each row is an observation (also known as: sample, example, instance, record)
  • Each column is a feature (also known as: predictor, attribute, independent variable, input, regressor, covariate)
  • The value being predicted is the response (also known as: target, outcome, label, dependent variable)
Let's load and understand the dataset:
  • 50 samples of each of 3 different species of iris (150 samples total)
  • Measurements: sepal length, sepal width, petal length, petal width
  • Response variable is the iris species
  • Classification problem since response is categorical
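A minimal sketch of loading the dataset, assuming the copy bundled with scikit-learn (`load_iris`) rather than a download from the UCI repository. It maps the terminology above onto code: `X` is the feature matrix (observations × features) and `y` is the response vector.

```python
from sklearn.datasets import load_iris

iris = load_iris()  # a Bunch object with data, target, feature_names, target_names

X = iris.data    # feature matrix: one row per observation, one column per feature
y = iris.target  # response vector: species encoded as integers 0, 1, 2

print(iris.feature_names)  # the four measurements, in centimeters
print(iris.target_names)   # the three species names
print(X.shape, y.shape)    # (150, 4) (150,)
```

The integer codes in `y` index into `iris.target_names`, so 0 is setosa, 1 is versicolor, and 2 is virginica.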
Scikit-learn 4-step modeling pattern
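The 4-step pattern is commonly presented as: import the class, instantiate the estimator, fit it with data, and predict. The sketch below follows those steps with `KNeighborsClassifier` on the bundled iris data. The new observation `[3, 5, 4, 2]` and `n_neighbors=1` are illustrative assumptions. The import is placed mid-script only so each step stands out.

```python
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Step 1: import the class you plan to use
from sklearn.neighbors import KNeighborsClassifier

# Step 2: instantiate the estimator (hyperparameters such as n_neighbors are set here)
knn = KNeighborsClassifier(n_neighbors=1)

# Step 3: fit the model with data (the model learns the relationship between X and y)
knn.fit(X, y)

# Step 4: predict the response for a new, unlabeled observation
X_new = [[3, 5, 4, 2]]  # hypothetical measurements in cm
prediction = knn.predict(X_new)
print(prediction)
```

The same fit/predict interface is shared by every estimator in the table above, so swapping in `LogisticRegression` changes only steps 1 and 2.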

K-nearest neighbors classification
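To show how K-nearest neighbors classification works, here is a from-scratch sketch. It computes the Euclidean distance to every training point, keeps the k closest, and takes a majority vote. The helper `knn_predict` and the toy data are hypothetical, written only to illustrate the idea, and the result is checked against scikit-learn's `KNeighborsClassifier`. With default settings, that class uses Minkowski distance with p=2 (Euclidean) and uniform weights.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_query, k):
    """Hypothetical helper: label each query point by majority vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distance to each training point
        nearest = np.argsort(distances)[:k]                    # indices of the k closest points
        votes = np.bincount(y_train[nearest])                  # count class labels among neighbors
        predictions.append(votes.argmax())                     # ties resolve to the smallest label
    return np.array(predictions)

# Hypothetical toy data: two well-separated groups of points.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [5.5, 5.5]])

manual = knn_predict(X_train, y_train, X_query, k=3)
library = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_query)
print(manual, library)  # both: [0 1]
```

Small values of k follow the training data closely and can overfit. Larger values smooth the decision boundary, so k is usually chosen by checking accuracy on held-out data.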
Logistic Regression
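Logistic regression follows the same 4-step pattern. This minimal sketch fits `LogisticRegression` on the bundled iris data and, besides class predictions, shows `predict_proba`, which returns one probability per species. Raising `max_iter` to 1000 is an assumption made only to avoid convergence warnings with the default `lbfgs` solver in recent scikit-learn versions. The two new observations are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Instantiate and fit; max_iter is raised only to help the solver converge on unscaled features.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]  # hypothetical measurements in cm
print(logreg.predict(X_new))        # predicted species code for each observation
print(logreg.predict_proba(X_new))  # one probability per class; each row sums to 1
```

Unlike KNN's vote counts, these probabilities come from a fitted linear model. The predicted class is the one with the highest probability.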
