Spam Detection

Overall findings

Multiple vectorization and transformation techniques were applied to the raw data.
The raw data was balanced.
The data was split before transformation to prevent data leakage, so the training data is not influenced by the mean and standard deviation of the test data.
Multinomial Naive Bayes classifiers achieved satisfactory performance metrics.

                Precision    Recall  F1-score   Support

         Ham       0.96      0.93      0.94       190
        Spam       0.94      0.96      0.95       201

    Accuracy                           0.95       391
   Macro avg       0.95      0.95      0.95       391
Weighted avg       0.95      0.95      0.95       391
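
The split-then-fit order and the Multinomial Naive Bayes classifier described above can be sketched in plain Python. This is a minimal, standard-library illustration, not the project's code: the toy messages, the function names (`split`, `fit`, `predict`, `precision_recall_f1`), the 25% test fraction, and Laplace smoothing with `alpha=1.0` are all assumptions made for the example. The point it tries to show is that the vocabulary and word counts are learned from the training rows only, so nothing about the test messages reaches the model before evaluation.

```python
import math
import random
from collections import Counter, defaultdict

# Tiny made-up corpus (illustrative only, not the project's dataset).
DATA = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("claim your free prize today", "spam"),
    ("cheap loans click here now", "spam"),
    ("urgent offer win cash", "spam"),
    ("free entry win prize click", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at the meeting tomorrow", "ham"),
    ("can you call me after lunch", "ham"),
    ("thanks for the notes from the meeting", "ham"),
    ("i will call you tomorrow", "ham"),
    ("lunch at noon works for me", "ham"),
]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle and split BEFORE any statistics are computed from the text."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (train, test)


def fit(train):
    """Learn vocabulary, class counts and word counts from training rows only."""
    vocab = {word for text, _ in train for word in text.split()}
    class_counts = Counter(label for _, label in train)
    word_counts = defaultdict(Counter)
    for text, label in train:
        word_counts[label].update(text.split())
    return vocab, class_counts, word_counts


def predict(model, text, alpha=1.0):
    """Return the class with the highest smoothed log-probability."""
    vocab, class_counts, word_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, test = split(DATA)   # 1. split first
model = fit(train)          # 2. fit on the training rows only
y_true = [label for _, label in test]
y_pred = [predict(model, text) for text, _ in test]  # 3. evaluate on held-out rows
for label in ("ham", "spam"):
    p, r, f1 = precision_recall_f1(y_true, y_pred, label)
    print(f"{label:>5}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")
```

With only twelve toy messages the printed scores say nothing about real performance; the sketch is meant to show the order of operations and how the per-class figures in the report above are defined.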

Problem

Scope

Stakeholders

CAPTIONS

Code

Full Project