A Support Vector Machine (SVM) is a classifier that aims to find an optimal hyperplane separating the categories in the training data; that separation can then be used to predict the category of new observations. SVMs have been widely used in the biological sciences, so it makes sense to give one a run over the Iris dataset built into the Seaborn library, a famous dataset that dates back to 1936.
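To make the hyperplane idea concrete, here is a minimal sketch using a linear kernel on just two of the three Iris species, so the boundary reduces to a single hyperplane. The choice of the two petal features is purely illustrative, not part of the run below.

# A minimal sketch of the hyperplane idea: two species, two features, linear kernel
import seaborn as sns
from sklearn.svm import SVC

iris = sns.load_dataset('iris')
two_species = iris[iris['species'].isin(['setosa', 'versicolor'])]
X_demo = two_species[['petal_length', 'petal_width']]
Y_demo = two_species['species']

linear_svm = SVC(kernel='linear')
linear_svm.fit(X_demo, Y_demo)

# The fitted hyperplane is w . x + b = 0: coef_ holds w, intercept_ holds b
print(linear_svm.coef_)             # orientation of the separating hyperplane
print(linear_svm.intercept_)        # its offset
print(linear_svm.support_vectors_)  # the training points that define the margin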
Using the scikit-learn train_test_split utility, the data is prepared for the model. The scikit-learn svm.SVC classifier is the standard approach, and with this dataset it yields an f1 score of 0.98. A great result, but to be competitive on a site like Kaggle and set yourself apart from the crowd, it's well worth tweaking the model to see what may be gained.
For this we can use a grid search, which allows multiple parameter combinations to be tested. The candidate values are specified in GridSearchCV through the param_grid argument, here the regularization strength C and the RBF kernel coefficient gamma, and through a process of iteration and cross-validated evaluation the best parameters are found. Subsequent evaluation of the tuned model yields full scores of 1.0...and the need for a more complicated dataset.
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
# Load the Iris data and separate the features from the species label
iris = sns.load_dataset('iris')
X = iris.drop('species', axis=1)
Y = iris['species']

# Hold out 30% of the rows for testing (scores vary slightly with the random split)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)

# Fit the baseline classifier (RBF kernel by default) and evaluate it
model = SVC()
model.fit(X_train, Y_train)
predictions = model.predict(X_test)
print(confusion_matrix(Y_test, predictions))
print(classification_report(Y_test, predictions))
# Candidate values for the regularization strength C and kernel coefficient gamma
param_grid = {'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}

# GridSearchCV cross-validates every combination; verbose=3 logs each fit
grid = GridSearchCV(SVC(), param_grid, verbose=3)
grid.fit(X_train, Y_train)

# Report the winning combination and the refitted estimator
print(grid.best_params_)
print(grid.best_estimator_)

# Evaluate the tuned model on the held-out test set
grid_predictions = grid.predict(X_test)
print(confusion_matrix(Y_test, grid_predictions))
print(classification_report(Y_test, grid_predictions))
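One extra check worth knowing about, as an optional aside rather than part of the run above: GridSearchCV also records the mean cross-validation score of the winning combination, which can differ from the held-out test score, and by default it refits the best estimator on the full training set.

# Mean cross-validation score of the best C/gamma combination found above
print(grid.best_score_)

# With refit=True (the default), grid.predict already used the tuned model,
# so no separate refit step is needed before the evaluation above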