Let us explore and analyze the nltk module.
Getting and preparing the data.
In this example we are going to train two models to classify SMS as "Spam" or "Ham".
Let's import the relevant modules and load the tab-separated file into a pandas DataFrame. The dataset has 5572 observations and 2 features. We will add one more feature based on the classification type (label).
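As a minimal sketch of this loading step, the snippet below reads a tiny in-memory stand-in for the tab-separated file. The sample rows, the column names `label` and `message`, and the derived column `label_num` are assumptions made for illustration, not names taken from the original dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated SMS file:
# one row per message, "label<TAB>text", with no header row.
raw = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWINNER! Claim your free prize now, call 0800 000 000\n"
    "ham\tI will call you later tonight\n"
    "spam\tFree entry in a weekly competition, text WIN to 80000\n"
)

# Column names are assumptions; the real file may be labelled differently.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                 names=["label", "message"])

# Add a third, numeric feature derived from the label: ham -> 0, spam -> 1.
df["label_num"] = df["label"].map({"ham": 0, "spam": 1})

print(df.shape)
print(df.head())
```

With the real file, `io.StringIO(raw)` would be replaced by the path to the dataset, and `df.shape` would report 5572 rows.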
With the features assigned to X and the labels to y as shown above, we now split the data into training and testing sets using cross validation, and then vectorize the text to get the features.
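The split and vectorization might look like the sketch below. It assumes scikit-learn's `train_test_split` and `CountVectorizer`; the toy messages, the 25% test size, and the random seed are illustrative choices only. In older scikit-learn releases `train_test_split` lived in the `sklearn.cross_validation` module; current releases expose it from `sklearn.model_selection`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the SMS texts (X) and labels (y).
X = [
    "Are we still meeting for lunch today",
    "WINNER claim your free prize now",
    "I will call you later tonight",
    "Free entry in a weekly competition",
    "Can you pick up some milk on the way home",
    "Urgent your account has won a cash reward",
    "See you at the station at six",
    "Text WIN to claim your free ringtone",
]
y = [0, 1, 0, 1, 0, 1, 0, 1]  # 0 = ham, 1 = spam

# Hold out 25% of the messages for testing; stratify keeps both classes
# represented in each part. test_size and random_state are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Learn the vocabulary on the training messages only, then reuse it
# unchanged to turn the test messages into word-count vectors.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

print(X_train_counts.shape, X_test_counts.shape)
```

Fitting the vectorizer on the training part only keeps words that appear solely in the test messages from leaking into the learned vocabulary.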
We will use the multinomial Naive Bayes classifier, as it is suitable for classification with **discrete features** (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
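A minimal sketch of both variants follows, assuming scikit-learn's `MultinomialNB`, `CountVectorizer`, and `TfidfTransformer`. The toy messages and variable names are hypothetical; the first model is trained on integer word counts and the second on fractional tf-idf weights.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data standing in for the SMS corpus.
messages = [
    "Are we still meeting for lunch today",
    "WINNER claim your free prize now",
    "I will call you later tonight",
    "Free entry in a weekly competition",
    "See you at the station at six",
    "Text WIN to claim your free ringtone",
]
labels = ["ham", "spam", "ham", "spam", "ham", "spam"]

# Integer word counts: the discrete features the model is designed for.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(messages)
clf_counts = MultinomialNB().fit(counts, labels)

# Fractional tf-idf weights computed from the same counts.
tfidf = TfidfTransformer().fit_transform(counts)
clf_tfidf = MultinomialNB().fit(tfidf, labels)

# Class probabilities for the training messages, one row per message,
# with columns ordered as in clf_counts.classes_.
print(clf_counts.classes_)
print(clf_counts.predict_proba(counts).round(3))
print(clf_tfidf.predict(tfidf))
```

Both models accept the same sparse matrix shape, so switching between counts and tf-idf only changes the feature values, not the training code.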
Let's now predict the class for the following SMS: