In this guide I want to introduce you to an extremely powerful machine learning technique known as the Support Vector Machine (SVM). It is one of the best "out of the box" supervised classification techniques. As such, it is an important tool for both the quantitative trading researcher and the data scientist. I feel it is important for a quant researcher or data scientist to be comfortable with both the theoretical aspects and the practical usage of the techniques in their toolkit. Hence this article forms the first part of a series of articles that discuss support vector machines.
This article specifically covers the theory of maximal margin classifiers, support vector classifiers and support vector machines. Subsequent articles will use the Python scikit-learn library to demonstrate these theoretical techniques on actual data.
The problem to be solved in this article is one of supervised binary classification. That is, we wish to categorise new unseen objects into two separate groups based on their properties and a set of known examples that have already been categorised. A good example of such a system is classifying a set of new documents into positive or negative sentiment groups, based on other documents that have already been classified as positive or negative. Similarly, we could classify new emails as spam or non-spam, based on a large corpus of documents that have already been marked as spam or non-spam by humans. SVMs are highly applicable to such situations.
A Support Vector Machine models the situation by creating a feature space. This is a finite-dimensional vector space, each dimension of which represents a "feature" of a particular object. In the context of spam or document classification, each "feature" is the prevalence or importance of a particular word. The goal of the SVM is to train a model that assigns new unseen objects to a particular category. It achieves this by creating a linear partition of the feature space into two categories. Based on the features of a new unseen object (e.g. a document or email), it places the object "above" or "below" the separating hyperplane, leading to a categorisation (e.g. spam or non-spam). This makes it an example of a non-probabilistic linear classifier. It is non-probabilistic because the features of the new object fully determine its location in feature space, and there is no stochastic element involved.
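To make the feature-space picture concrete, here is a minimal sketch of a linear decision rule. It rests on several assumptions:

* A hypothetical four-word vocabulary defines the feature space.
* Raw word counts serve as the features.
* The weight vector `w` and bias `b` are hand-picked rather than learned by an SVM.

Each email becomes a point `x` in feature space. It lands "above" the plane when `w . x + b > 0` and "below" it otherwise. The names `VOCABULARY`, `to_feature_vector`, `decision_value` and `classify` are illustrative only and are not scikit-learn's API.

```python
from typing import List

# Hypothetical vocabulary: each word is one dimension of the feature space.
VOCABULARY: List[str] = ["free", "winner", "meeting", "report"]


def to_feature_vector(text: str) -> List[float]:
    """Map a document to a point in feature space using raw word counts."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def decision_value(x: List[float], w: List[float], b: float) -> float:
    """Compute w . x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(text: str, w: List[float], b: float) -> str:
    """Deterministic rule: 'above' the hyperplane -> spam, otherwise non-spam."""
    return "spam" if decision_value(to_feature_vector(text), w, b) > 0 else "non-spam"


# Hand-picked, hypothetical parameters; an SVM would instead learn w and b
# from a corpus of labelled examples.
w = [1.5, 2.0, -1.0, -1.0]
b = -1.0

print(classify("Free prize winner", w, b))        # spam      (w . x + b = 2.5)
print(classify("Meeting report attached", w, b))  # non-spam  (w . x + b = -3.0)
```

No randomness is involved. The same text always maps to the same point and therefore to the same side of the hyperplane, which is what "non-probabilistic" means here. Words outside the vocabulary, such as "prize", simply contribute nothing.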