# Comparing Classification Methods

Let's now compare the different classification methods we have discussed. 
Instead of using our own implementations, we will use the more efficient implementations in `scikit-learn`. 
We will test each of them on the [breast cancer dataset](../setup/datasets), so we start by loading this dataset. 

In [None]:
import pandas as pd

data = pd.read_csv('./../data/breast-cancer.csv')
data

This is a labelled dataset where each label is either Malignant or Benign. 
Many of our algorithms require numerical labels, so we encode Malignant as 1 and Benign as 0. 

In [2]:
data['Encoded Diagnosis'] = data['Diagnosis'].apply(lambda x: 1 if x == 'Malignant' else 0)

Let's split the data into our usual training and test subsets. 

In [8]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.2, random_state=42)

We will then scale the data. 
This is not necessarily required for all of the algorithms, but for consistency, we will use it in all cases.

In [9]:
from sklearn.preprocessing import StandardScaler

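scaler = StandardScaler()
# Fit the scaler on the training features only (the label columns are dropped).
scaled_train = scaler.fit_transform(train.drop(['Diagnosis', 'Encoded Diagnosis'], axis=1))
# Reuse the training mean and standard deviation for the test features;
# refitting on the test data would scale the two subsets inconsistently.
scaled_test = scaler.transform(test.drop(['Diagnosis', 'Encoded Diagnosis'], axis=1))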
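
To illustrate what scaling with training statistics means in practice, here is a minimal sketch using small made-up arrays (the `toy_*` names and values are hypothetical and are not taken from the breast cancer dataset). It assumes the common convention of fitting the scaler on the training subset and only transforming the test subset, so the test data are not expected to end up with zero mean.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data, purely for illustration.
toy_train = np.array([[1.0], [2.0], [3.0]])
toy_test = np.array([[4.0], [5.0]])

toy_scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only.
toy_scaled_train = toy_scaler.fit_transform(toy_train)
# Reuse those training statistics to transform the test data.
toy_scaled_test = toy_scaler.transform(toy_test)

print(toy_scaler.mean_)         # mean learned from the training data
print(toy_scaled_train.mean())  # approximately zero by construction
print(toy_scaled_test.ravel())  # not centred on zero: scaled with training statistics
```

Because the test values here all lie above the training mean, their scaled values are all positive, which is the expected behaviour rather than an error.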

Since all of these `scikit-learn` estimators share the same `fit` and `predict` API, we can use a loop to apply each method in turn. 
For each method, we train on the training subset and then make predictions on the test data. 

In [10]:
from sklearn.linear_model import LogisticRegression 
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

methods = {'Logistic Regression': LogisticRegression(random_state=42),
           'SVM': SVC(random_state=42),
           'Random Forest': RandomForestClassifier(random_state=42)}

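for k, v in methods.items():
    # Train on the scaled training features and the encoded labels.
    v.fit(scaled_train, train['Encoded Diagnosis'])
    # Store the test-set predictions in a new column named after the method.
    test[f'{k} Prediction'] = v.predict(scaled_test)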

## Metrics

To quantify the success of a machine learning workflow, numerical quality scores are necessary. 
So far, we have used `accuracy_score`; however, this is not always an ideal metric, as it reports only the overall fraction of correct predictions and does not distinguish between false positives and false negatives. 
Other popular metrics include precision, recall, and the combination of these two, the F<sub>1</sub>-score. 
Precision tells us how many of the samples that were identified as malignant were, in fact, malignant, 

$$
\text{precision} = \frac{N(\text{true positives})}{N(\text{true positives})+ N(\text{false positives})}.
$$

Recall tells us how many of the malignant samples were correctly identified by the algorithm, 

$$
\text{recall} = \frac{N(\text{true positives})}{N(\text{true positives})+ N(\text{false negatives})}.
$$

Finally, the F<sub>1</sub>-score, the harmonic mean of precision and recall, balances these two and is a valuable tool when a single metric is needed, 

$$
F_1 = \frac{2\times\text{precision}\times\text{recall}}{\text{precision}+\text{recall}}.
$$

The true and false positives and negatives are described in {numref}`metrics`. 

```{figure} ../images/metrics.png
---
name: metrics
width: 60%
---
A figure showing the identification of true and false positives and negatives that make up the precision and recall scores. 
```
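
As a worked example of the formulas above, the following minimal sketch computes each metric by hand. The four counts are hypothetical, chosen purely for illustration, and are not results from the breast cancer dataset; accuracy is included for comparison.

```python
# Hypothetical confusion counts, chosen purely for illustration.
true_positives = 40   # malignant samples predicted as malignant
false_positives = 5   # benign samples predicted as malignant
false_negatives = 10  # malignant samples predicted as benign
true_negatives = 45   # benign samples predicted as benign

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (true_positives + true_negatives) / (
    true_positives + false_positives + false_negatives + true_negatives
)

print(f'Precision: {precision:.3f}')  # 0.889
print(f'Recall: {recall:.3f}')        # 0.800
print(f'F1-Score: {f1:.3f}')          # 0.842
print(f'Accuracy: {accuracy:.3f}')    # 0.850
```

With these counts the accuracy is 0.850, yet one in five malignant samples is missed, which only the recall of 0.800 makes visible.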

These metrics are computed with `sklearn`, as shown below. 

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

for k in methods.keys():
    print(f'{k} Precision: {precision_score(test["Encoded Diagnosis"], test[f"{k} Prediction"]):.3f}')
    print(f'{k} Recall: {recall_score(test["Encoded Diagnosis"], test[f"{k} Prediction"]):.3f}')
    print(f'{k} F1-Score: {f1_score(test["Encoded Diagnosis"], test[f"{k} Prediction"]):.3f}')
    print()

All three methods do very well in the classification. 
Indeed, the `scikit-learn` implementations outperform the implementations we wrote ourselves. 
However, comparing the F<sub>1</sub>-scores shows that the support vector machine is the most effective. 
We highlight here that the implementations used are naïve, in that no hyperparameter optimisation is performed. 
To improve on this, one could perform a random search or another optimisation over the hyperparameter space. 
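
A random search of this kind could be sketched with `RandomizedSearchCV`, shown here for the support vector machine only. This is a minimal sketch under stated assumptions: it uses the copy of the breast cancer data bundled with `scikit-learn` as a stand-in for the CSV file used above, and the search ranges for `C` and `gamma`, the number of iterations, and the number of folds are illustrative choices rather than tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled breast cancer data stand in for the CSV
# file used above. In this bundled version, 0 is malignant and 1 is benign,
# so the 'f1' scorer below treats benign as the positive class.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling sits inside the pipeline so it is refitted on each cross-validation fold.
pipeline = make_pipeline(StandardScaler(), SVC(random_state=42))

# Illustrative search ranges, sampled on a logarithmic scale.
param_distributions = {
    'svc__C': loguniform(1e-2, 1e2),
    'svc__gamma': loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions,
    n_iter=20,       # number of random hyperparameter samples
    scoring='f1',
    cv=5,            # 5-fold cross-validation on the training subset
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f'Best cross-validated F1-score: {search.best_score_:.3f}')
print(f'Test F1-score: {search.score(X_test, y_test):.3f}')
```

Placing the scaler inside the pipeline means each fold is scaled using only its own training portion, so the cross-validated score is not influenced by the held-out fold.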