The basic supervised machine learning algorithms like KNN, Decision trees, etc. have certain limitations. Because of their simplicity, sometimes they are unable produce a model accurate enough for your problem. You could try using deep neural networks but, in practice, deep neural networks require a large amount of labeled data which you might not have and it might be slower for your need. Another approach for boosting the performance of simple learning algorithms can be using ensemble learning.
Ensemble learning is a learning paradigm that, instead of trying to learn a single highly accurate model, focuses on training a large number of low-accuracy models and then combining the predictions given by those low-accuracy models to obtain a high-accuracy meta-model.
Low-accuracy models are usually learned by weak learners, which are learning algorithms that cannot learn complex models, and thus are typically fast at the training and at the prediction time. The most frequently used weak learner is a decision tree in which we often stop splitting the training set after just a few iterations. The obtained trees are shallow and not much accurate, but the idea behind ensemble learning is that if the trees are not identical and each tree is at least slightly better than random guessing, then we can get high accuracy by combining a large number of such shallow trees.
To obtain the prediction for input x, the predictions of each weak model are combined using some sort of weighted voting. The specific form of vote weighting depends on the algorithm, but, independently of the algorithm, the idea is the same: if the council of weak models predicts that the message is spam, then we assign the label spam to x.
Two principal ensemble learning methods includes boosting and bagging.
Boosting and Bagging
Boosting consists of using the original training data and iteratively create multiple models by using weak learners. Each new model would be different from the previous ones in the sense that the weak learner, by building each new model tries to “fix” the errors which the previous models made. The last ensemble model is a certain combination of those multiple weak models built iteratively.
Bagging consists of creating many “copies” of the training data (each copy being slightly different from another) and then apply the weak learner to each copy to obtain multiple weak models and then combine them at last. A widely used and effective machine learning algorithm based on the idea of bagging is random forest.