Naive Bayes Classifier


Classification is the process of predicting the class of given data points.
Classes are sometimes called targets, labels, or categories.
A classifier is built as a model of the classes from a set of records that already contain class labels.
In other words, classification is the process of approximating a mapping function (f) from input variables (X) to discrete output variables (y).

In this article we will talk about only one such classifier, the Naive Bayes classifier, a probabilistic classifier that assumes all attributes are equally important and independent of one another given the class.
It can be considered the opposite of the OneR classifier, which classifies using only the single attribute that it determines to be the most important predictor.

This classifier is based on the work of Thomas Bayes (1702-1761). The formula used is:
Pr[H|E] = ( Pr[E1|H] * Pr[E2|H] * … * Pr[En|H] * Pr[H] ) / Pr[E]

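H = hypothesis (the class)
Pr[H] = probability of the hypothesis before the evidence is seen (the prior)
E = evidence (the attribute values E1 … En of the record)
Pr[Ei|H] = probability of the i-th piece of evidence given that the hypothesis holds
Pr[H|E] = probability of the hypothesis after the evidence is seen (the posterior)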
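
As a rough illustration of the formula, here is a minimal Python sketch. The function name `posteriors` and the spam/ham numbers are hypothetical, chosen only to exercise the code. It assumes that Pr[E] is not estimated separately: because Pr[E] is the same for every hypothesis, the sketch simply rescales the numerators so that they sum to 1.

```python
from math import prod


def posteriors(priors, likelihoods):
    """Return Pr[H|E] for every hypothesis H.

    priors:      {H: Pr[H]}
    likelihoods: {H: [Pr[E1|H], Pr[E2|H], ...]}
    """
    # Numerator of the formula: Pr[E1|H] * ... * Pr[En|H] * Pr[H]
    scores = {h: prod(likelihoods[h]) * priors[h] for h in priors}
    # Pr[E] is identical for all hypotheses, so dividing by the sum of
    # the numerators turns the scores into probabilities.
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


# Hypothetical numbers, not taken from any dataset.
result = posteriors(
    priors={"spam": 0.4, "ham": 0.6},
    likelihoods={"spam": [0.5, 0.2], "ham": [0.1, 0.5]},
)
print(result)
```

With these made-up inputs the numerators are 0.04 for "spam" and 0.03 for "ham", so "spam" receives 4/7 of the probability mass.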

To better understand this, here is an example using the weather dataset provided for educational purposes by the Weka project:


Some explanation:
- The first table contains the known dataset we work with.
- The last column is what we consider the class.
- All other columns are data attributes.
- This example should be easy to follow, since the class can have just 2 values: Yes/No.
- The second table contains the counts tallied for each attribute value and class. For example, when Temperature was Hot, the record was classified as Play=Yes 2 times out of 9 …
- Given a new record with Outlook==Sunny, Temp==Cool, Humidity==High, Wind==True, we want to classify it as Play Yes/No using the Naive Bayes classifier.
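- The final outcome is that the new record has a 20.5% chance of being Play=Yes and a 79.5% chance of being Play=No, so it is classified as Play=No. (The unnormalized scores are 2/9 * 3/9 * 3/9 * 3/9 * 9/14 ≈ 0.0053 for Yes and 3/5 * 1/5 * 4/5 * 3/5 * 5/14 ≈ 0.0206 for No.)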
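
The steps above can be sketched in Python as follows. The rows in `DATA` are an assumption: they are the 14 records of Weka's nominal weather dataset as commonly distributed, typed in by hand, so check them against your own copy. The helper names `train` and `classify` are hypothetical.

```python
from collections import Counter

# Assumed rows: (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]


def train(rows):
    """Tally class counts and (attribute, value, class) counts."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts


def classify(record, class_counts, value_counts):
    """Return the normalized probability of each class for one record."""
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior Pr[H]
        for i, value in enumerate(record):
            # Pr[Ei|H], estimated as a relative frequency (no smoothing)
            score *= value_counts[(i, value, label)] / count
        scores[label] = score
    total = sum(scores.values())  # plays the role of Pr[E]
    return {label: s / total for label, s in scores.items()}


class_counts, value_counts = train(DATA)
probs = classify(("sunny", "cool", "high", True), class_counts, value_counts)
print({label: round(p, 3) for label, p in probs.items()})
# prints {'no': 0.795, 'yes': 0.205}
```

This version deliberately uses raw relative frequencies, so it is exposed to zero counts for attribute values that never occur with a class.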

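Naive Bayes can suffer from a problem called the zero probability problem. When an attribute value never occurs together with a class in the training data, its conditional probability is zero, which forces the whole product for that class to zero regardless of the other attributes, so the classifier fails to give a valid prediction. An easy way to solve this problem is to add 1 to all counts (Laplace correction).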
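
A minimal sketch of the Laplace correction is shown below. The function `smoothed_probability` and its parameter names are hypothetical. The counts are an assumption taken from the standard Weka weather data: among the 5 Play=No records, Outlook is Sunny 3 times, Overcast 0 times, and Rainy 2 times.

```python
def smoothed_probability(count, class_total, n_values, alpha=1):
    """Estimate Pr[value | class] with add-alpha (Laplace) smoothing.

    count:       records of this class that have the attribute value
    class_total: all records of this class
    n_values:    number of distinct values the attribute can take
    """
    return (count + alpha) / (class_total + alpha * n_values)


# Assumed counts of each Outlook value among the 5 Play=No records.
outlook_given_no = {"sunny": 3, "overcast": 0, "rainy": 2}

unsmoothed = outlook_given_no["overcast"] / 5  # 0.0, wipes out the product
smoothed = {
    value: smoothed_probability(count, class_total=5, n_values=3)
    for value, count in outlook_given_no.items()
}
print(unsmoothed, smoothed)
```

Adding 1 to each of the three counts also adds 3 to the denominator, so the smoothed estimates (4/8, 1/8, 3/8) still sum to 1 and none of them is zero.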

As you can see, Naive Bayes is a very simple algorithm to implement, and it often gives good results. It scales easily to larger datasets, since training only requires counting, which takes time linear in the number of records and attributes.

When to use the Naive Bayes Classifier?
- If you have a moderate or large training data set.
- If the instances have several attributes.
- If the attributes which describe the instances are (at least approximately) statistically independent given the class.

Common applications:
a) Sentiment Analysis: analyze social media posts to see if they express positive or negative emotions.
b) Document Categorization: for example, the web page ranking made by search engines
c) Classifying news articles as Technology, Entertainment, Sports, Politics, etc.
d) Email Spam Filtering