{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Binary classification problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dataset :"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We study first a binary classification problem, performed by a neural network. Each input has two real features, and the output can be only 0 or 1. The training set contains 4000 examples, and the validation set, 1000."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "# Display figures on jupyter notebook\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a function to generate the dataset, in the form of two interlaced spirals\n",
    "def spiral(phi):\n",
    "    x = (phi+1)*torch.cos(phi)\n",
    "    y = phi*torch.sin(phi)\n",
    "    return torch.cat((x, y), dim=1)\n",
    "\n",
    "def generate_data(num_data):\n",
    "    angles = torch.empty((num_data, 1)).uniform_(1, 15)\n",
    "    data = spiral(angles)\n",
    "    # add some noise to the data\n",
    "    data += torch.empty((num_data, 2)).normal_(0.0, 0.4)\n",
    "    labels = torch.zeros((num_data,), dtype=torch.int)\n",
    "    # flip half of the points to create two classes\n",
    "    data[num_data//2:,:] *= -1\n",
    "    labels[num_data//2:] = 1\n",
    "    return data, labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate the training set with 4000 examples by function generate_data\n",
    "\n",
    "X_train, y_train = generate_data(4000)\n",
    "X_train.size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the vis_data function to visualize the dataset\n",
    "def vis_data(X, y):\n",
    "    plt.figure(figsize=(5, 5))\n",
    "    plt.plot(X[y==1, 0], X[y==1, 1], 'r+') #Examples are represented as red plusses for label 1\n",
    "    plt.plot(X[y==0, 0], X[y==0, 1], 'b+') #Examples are represented as blue plusses for label 0 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now invoke the `vis_data` function on the dataset previously generated to see what it looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vis_data(X_train, y_train) # visualize training set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the `TensorDataset` wrapper from pytorch, so that the framework can easily understand our tensors as a proper dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import TensorDataset, DataLoader\n",
    "training_set = TensorDataset(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Training the model with a neural network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a skeleton of a neural network with a single layer (thus: a linear classifier). This is the model you'll work on to improve it during this exercise.\n",
    "\n",
    "Look at the code and run it to see the structure, then follow the questions below to iteratively improve the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "import torch.nn.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the first step, we define a neural network with just two layers. A useful tutorial for constructing model can be found [here](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Basic network structure with a single layer\n",
    "class Model(nn.Module):\n",
    "    \n",
    "    def __init__(self):\n",
    "        super(Model, self).__init__()\n",
    "        # A single linear layer\n",
    "        # The model has 2 inputs (the coordinates of the point) and an output (the prediction)\n",
    "        self.l1 = nn.Linear(2, 10)\n",
    "        self.l2 = nn.Linear(10, 1)\n",
    "     \n",
    "        \n",
    "    def forward(self, inputs):\n",
    "        # We want the model to predict 0 for one class and 1 for the other class\n",
    "        # A Sigmoid activation function seems appropriate\n",
    "        h = torch.relu(self.l1(inputs))\n",
    "        outputs = torch.sigmoid(self.l2(h))\n",
    "        \n",
    "        return outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the model: \n",
    "model = Model()\n",
    "\n",
    "# Choose the hyperparameters for training: \n",
    "num_epochs = 10\n",
    "batch_size = 10\n",
    "\n",
    "# Training criterion. This one is a mean squared error (MSE) loss between the output\n",
    "# of the network and the target label\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "# Use SGD optimizer with a learning rate of 0.01\n",
    "# It is initialized on our model\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Training the defined model\n",
    "More information can be found [here](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a function for training\n",
    "model.train()\n",
    "def train(num_epochs, batch_size, criterion, optimizer, model, dataset):\n",
    "    train_error = []\n",
    "    train_loader = DataLoader(dataset, batch_size, shuffle=True)\n",
    "    model.train()\n",
    "    for epoch in range(num_epochs):\n",
    "        epoch_average_loss = 0.0\n",
    "        for (X_batch, y_real) in train_loader:\n",
    "            y_pre = model(X_batch).view(-1)\n",
    "            loss = criterion(y_pre, y_real.float())\n",
    "            optimizer.zero_grad()\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "            epoch_average_loss += loss.item() * batch_size / len(dataset)\n",
    "        train_error.append(epoch_average_loss)\n",
    "        print('Epoch [{}/{}], Loss: {:.4f}'\n",
    "                      .format(epoch+1, num_epochs, epoch_average_loss))\n",
    "    return train_error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_error = train(num_epochs, batch_size, criterion, optimizer, model, training_set)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot the training error wrt. the number of epochs: \n",
    "plt.plot(range(1, num_epochs+1), train_error)\n",
    "plt.xlabel(\"num_epochs\")\n",
    "plt.ylabel(\"Train error\")\n",
    "plt.title(\"Visualization of convergence\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Evaluate the model on the validation set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate 1000 validation data:\n",
    "X_val, y_val = generate_data(1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predict labels for validation set\n",
    "model.eval() # set the model to test mode\n",
    "with torch.no_grad():\n",
    "    y_pre = model(X_val).view(-1)\n",
    "    #loss = criterion(y_val, y_pre.float())\n",
    "    #print(loss.item())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate the accuracy on validation set to evaluate the model by the function accuracy\n",
    "def accuracy(y_real, y_pre):\n",
    "    y_pre[y_pre<0.5] = 0\n",
    "    y_pre[y_pre>=0.5] = 1\n",
    "\n",
    "    acc = 1 - torch.sum(torch.abs(y_pre - y_real))/len(y_pre)\n",
    "    print('Accuracy of the network on the 1000 validation data: {:.2f} %'.format(acc.item()*100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "accuracy(y_val, y_pre)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compare the prediction with real labels\n",
    "\n",
    "def compare_pred(X, y_real, y_pre):\n",
    "    plt.figure(figsize=(10, 5))\n",
    "\n",
    "    plt.subplot(121)\n",
    "    plt.plot(X[y_real==1, 0], X[y_real==1, 1], 'r+') #Examples are represented as a red plusses for label 1\n",
    "    plt.plot(X[y_real==0, 0], X[y_real==0, 1], 'b+') #Examples are represented as a blue plusses for label 0\n",
    "    plt.title(\"real data\")\n",
    "\n",
    "    plt.subplot(122)\n",
    "    plt.plot(X[y_pre==1, 0], X[y_pre==1, 1], 'r+')\n",
    "    plt.plot(X[y_pre==0, 0], X[y_pre==0, 1], 'b+')\n",
    "    plt.title(\"prediciton results\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "compare_pred(X_val, y_val, y_pre)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1: Impact of the architecture of the model\n",
    "\n",
    "The class `Model` is the definition of your model. You can now modify it to try out different architectures and\n",
    "see the impact of the following factors:\n",
    "\n",
    "* Try to add more layers (1, 2, 3, more ?)\n",
    "* Try to different activation functions ([sigmoid](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.sigmoid), [tanh](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.tanh), [relu](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.relu), etc.)\n",
    "* Try to change the number of neurons for each layer (5, 10, 20, more ?)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2: Impact of the optimizer\n",
    "\n",
    "Retrain the model by using different parameters of the optimizer, you can change its parameter in the cell initializing it, after the definition of your model.\n",
    "\n",
    "* Use different batch size from 10 to 400\n",
    "* Try different values of the learning rate (between 0.001 and 10), and see how these impact the trainig process. Do all network architectures react the same way to different learning rates?\n",
    "* Change the duration of the training by increasing the number of epochs\n",
    "* Try other optimizers, such as [Adam](https://pytorch.org/docs/stable/optim.html?highlight=adam#torch.optim.Adam) or [RMSprop](https://pytorch.org/docs/stable/optim.html?highlight=rmsprop#torch.optim.RMSprop)\n",
    "\n",
    "**Note:** These changes may interact with your previous choices of architectures, and you may need to change them as well!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 3: Impact of the loss function\n",
    "\n",
    "The current model uses a mean square error (MSE) loss. While this loss can be used in this case, it is now rarely used for classification, and instead a Binary Cross Entropy (BCE) is used. It consists in interpreting the output of the network as the probability $p(y | x)$ of the point $x$ to belong to the class $y$, and in maximizing the probability to be correct for all samples $x$, that is, in maximizing $\\displaystyle \\prod_{(x,y) \\in Dataset} p(y|x)$. Applying $-\\log$ to this quantity, we obtain the following criterion to minimize:\n",
    "\n",
    "$$ \\sum_{(x,y) \\in Dataset} - \\log p(y | x) $$\n",
    "\n",
    "This is implemented as such by the [BCELoss](https://pytorch.org/docs/stable/nn.html?highlight=bce#torch.nn.BCELoss) of pytorch. Note that this criterion requires its input to be a probability, i.e. in $[0,1]$, which requires the use of an appropriate activation function beforehand, e.g., a sigmoid.\n",
    "\n",
    "It turns out that, for numerical stability reasons, it is better to incorporate this sigmoid and the BCELoss into a single function; this is done by the [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html?highlight=bcewithlogit#torch.nn.BCEWithLogitsLoss). Try to replace the MSE by this one and see how this changes the behavior in the network. This can also interact with the changes of the two previous exercices.\n",
    "\n",
    "**Note:** As a consequence, when using the BCEWithLogitsLoss, the last layer of your network should not be followed by an activation function, as BCEWithLogitsLoss already adds a sigmoid."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 4: Prediction on test set\n",
    "\n",
    "Once you have a model that seems satisfying on the validation dataset, you SHOULD evaluate it on a test dataset that has never been used before, to obtain a final accuracy value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here is a test dataset. Use it similarly to the validaiton dataset above\n",
    "# to compute the final performance of your model\n",
    "X_test, y_test = generate_data(500)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}