Bacteria Classification Introduction

    The following tutorial outlines the most important pieces of my work classifying bacteria from Raman spectral data with neural networks.

    The basis of our research, as well as the data required for model training and testing, comes from a study published by researchers at Stanford University. The Stanford team collected close to 70,000 bacteria samples and created a 26-layer residual neural network which achieved 82% accuracy on a 30-isolate classification task. Our goal was first to replicate their results, and then improve on them.

    Over the course of the summer, we were able to replicate the Stanford ResNet, create a modular version which allowed for hyperparameter tuning, and use that modular version to discover a simpler, more effective network, which uses 4 convolutional layers and no residual connections. This network runs significantly faster and achieves an accuracy of ~86%. With further tuning, possibly automated, we may be able to increase this accuracy even further.

Part 1: Background

    The first and possibly the most important step in creating efficient neural networks is to understand the theory of each different type of network. The better you can picture what your network is doing behind the scenes, the more educated your adjustments to hyperparameters and network architecture will be.

    After learning about the theory behind these networks, you also need to be comfortable with whatever environment you are designing your network in, so that you can be confident in applying those adjustments. In our case, the environment is TensorFlow 2, running on Google’s Colaboratory platform. This allows us to train our networks remotely on more powerful GPUs, and to take advantage of Google Drive integration.

    The following tutorials should build background on these types of neural networks, as well as comfort designing them in TensorFlow (although doing your own research to fill in knowledge gaps is very important):

Part 2: The Exploration Model

    The first task is recreating the Stanford network, which is a 26-layer residual neural network. However, since we are planning on improving the network's abilities, we must also make the network as modular as possible, so we can experiment with different hyperparameters and architectures. The model we create will allow for adjustment of not only the hyperparameters, but also the basic network architecture itself. This will let us find some 'rules of thumb' for the best hyperparameters and architecture, and home in on a more final and less modular model.

    The 26-Layer network is a ResNet organized in blocks. An initial convolution is followed by 4 residual blocks, each containing 6 convolutional layers. Skip connections, in this case either a simple addition or a convolution between the block input and output, allow for residual blocks to be skipped if they are too deep for the network.

    The first task is to create a ResNet class, and within it a few methods for creating and training our network: a residual_block method, a build method, and pretrain and finetune methods.


    After the imports and the class declaration, we define the residual_block method, which applies a single residual block to an input tensor and returns an output tensor. First, the header includes all the parameters the residual_block method needs:

    With the method declared, we can move on to applying the main 'branch' of the block. This looks like creating any other CNN, and includes everything in the block except for the shortcut 'skip' connection.

    The section starts by initializing and modifying a couple of variables, seen on lines 1-4. Line 1 reduces the number of filters to 1/4 of the original, which is applied to all but the last convolutional layer. This 'bottleneck' reduces the size of the network without losing much information. The next 2 lines save the state of the input tensor in 2 places: in the shortcut variable (held until the end of the method, when the skip connection is applied) and in the x tensor, which runs through the main branch of the block.

    We then iterate through as many convolutional layers as are passed in through the conv_layers parameter. Each iteration applies an activation and a batch normalization layer followed by a convolutional layer, giving the block a pre-activation structure. If dropout is enabled, a dropout layer is also included, which randomly removes some layer outputs to reduce overfitting.

    After finishing the main branch layers, we can add the shortcut tensor to the processed x tensor.

    If the output shape (of the x tensor) is different from the input shape (of the shortcut tensor), a convolution can be added to the shortcut branch, outputting the correct shape to match the x tensor, shown in lines 2-7. This is controlled by the boolean parameter shortcut_reduce.
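    As a rough guide, a minimal sketch of what such a residual_block method might look like is shown below (written as a standalone function for brevity; the exact layer ordering, parameter names, and defaults are illustrative rather than the verbatim code):

from tensorflow.keras import layers

def residual_block(input_tensor, conv_layers, filters, kernel_size,
                   shortcut_reduce=False, dropout=0.0):
    # Bottleneck: 1/4 of the filters for all but the last convolutional layer
    bottleneck_filters = filters // 4
    shortcut = input_tensor   # saved until the skip connection at the end
    x = input_tensor          # runs through the main branch of the block

    for i in range(conv_layers):
        out_filters = filters if i == conv_layers - 1 else bottleneck_filters
        # Pre-activation structure: activation and batch norm before the convolution
        x = layers.Activation('relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Conv1D(out_filters, kernel_size, padding='same')(x)
        if dropout > 0:
            x = layers.Dropout(dropout)(x)

    if shortcut_reduce:
        # A convolution reshapes the shortcut so it matches the main branch output
        shortcut = layers.Conv1D(filters, 1, padding='same')(shortcut)

    # The skip connection: add the (possibly reshaped) input onto the block output
    return layers.Add()([x, shortcut])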

    With a method capable of creating an individual block, we can implement a higher-level build method that takes in all the parameters describing a ResNet, and returns a compiled model.

    As with the residual_block method, the build method includes the following parameters. Having control over the parameters at the level of the build function and even higher lets us tune hyperparameters much more easily, as they are all in one accessible place.

    Before moving into building the network, we can also add some assertions (lines 12-14) to check the lengths of the hyperparameter lists passed in: conv_layers, filters, and kernel_sizes. The last two should be equal in length to residual_blocks, since they each contain one item per residual block. The conv_layers list has a length one greater than residual_blocks, because it includes one item per residual block as well as one item for the initial convolution.
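    In code, those length checks boil down to something like this (the variable names mirror the parameters described above):

assert len(filters) == residual_blocks
assert len(kernel_sizes) == residual_blocks
assert len(conv_layers) == residual_blocks + 1   # extra entry for the initial convolution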

    Building the network is fairly simple, since we abstracted most of the network-building code to the residual_block function. It involves defining an initial convolutional layer, with normalization and activation, and then iterating through and creating each residual block with the correct parameters, displayed on lines 26-42.

    Finally, an average pooling layer reduces the size of the input to the flatten-dense-softmax classifier which follows it, to a degree defined by the pooling_size parameter. The flatten layer reduces the tensor output by the residual blocks into a 1D tensor, which connects via a dense (fully-connected) layer to a softmax activation function, which outputs a probability distribution indicating the network's prediction across the 30 classes.
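    A minimal sketch of such a build method is shown below, reusing the residual_block sketch from earlier and assuming integer class labels (hence sparse categorical cross-entropy); the parameter names, defaults, and compile settings are illustrative:

import tensorflow as tf
from tensorflow.keras import layers

def build(input_shape, classes, residual_blocks, conv_layers, filters,
          kernel_sizes, pooling_size, learning_rate=1e-3,
          shortcut_reduce=True, dropout=0.0):
    # (the length assertions from the previous snippet would go here)
    inputs = tf.keras.Input(shape=input_shape)

    # Initial convolutional layer, with normalization and activation
    x = layers.Conv1D(filters[0], kernel_sizes[0], padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # Stack the residual blocks, each with its own layer count, filters, and kernel size
    for b in range(residual_blocks):
        x = residual_block(x, conv_layers[b + 1], filters[b], kernel_sizes[b],
                           shortcut_reduce=shortcut_reduce, dropout=dropout)

    # Average pooling shrinks the tensor feeding the flatten-dense-softmax classifier
    x = layers.AveragePooling1D(pool_size=pooling_size)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(classes)(x)
    outputs = layers.Activation('softmax')(x)

    # Return a compiled model, as described above
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model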

    The last two methods are fairly similar: the pretraining method and the finetuning method.

    These methods serve mostly as wrappers which process data, call the build function, and train, so their parameters are almost identical to those of the build function, with the addition of:

    Both methods start by creating a train/test split (lines 7-8 and 34-35), which gives us a validation accuracy from within the reference and finetuning training files, for the sake of debugging. For example, when the network gets ~90% validation accuracy from within the reference dataset, but only achieves ~50% accuracy on the testing data, it is a good indication that either the network does not generalize well or there is a significant difference between the reference and testing data.
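    A common way to create that split is scikit-learn's train_test_split (the 10% validation fraction here is illustrative):

from sklearn.model_selection import train_test_split

# Hold out a slice of the reference data for in-run validation
X_train, X_val, y_train, y_val = train_test_split(X_ref, y_ref, test_size=0.1)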

    A depth channel must also be added to the data (lines 11-12 and 38-39), because the Keras input layer expects a depth dimension. In our case this is simply 1.
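    Adding that depth channel is a one-liner per array, for example with np.expand_dims:

import numpy as np

# Keras expects a trailing depth dimension; each spectrum simply gets depth 1
X_train = np.expand_dims(X_train, axis=-1)   # e.g. (samples, 1000) -> (samples, 1000, 1)
X_val = np.expand_dims(X_val, axis=-1)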

    With the data ready for training and testing, the last task for the pretrain method is passing parameters into the build, fit, and evaluate functions.
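    Putting those pieces together, the tail of a pretrain method might look something like the sketch below, which reuses the build function and the split/depth-channel steps from the previous snippets (argument names and the save step are illustrative):

def pretrain(X_ref, y_ref, epochs, batch_size, save_dir, **build_params):
    # Validation split and depth channel, as shown above
    X_train, X_val, y_train, y_val = train_test_split(X_ref, y_ref, test_size=0.1)
    X_train = np.expand_dims(X_train, axis=-1)
    X_val = np.expand_dims(X_val, axis=-1)

    # Build a compiled model, train it, and evaluate on the held-out split
    model = build(input_shape=X_train.shape[1:], classes=30, **build_params)  # 30 isolates in this task
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size)
    _, val_acc = model.evaluate(X_val, y_val)
    print('Pretraining validation accuracy:', val_acc)

    # Save so the finetune method can load the pretrained weights later
    model.save(save_dir)
    return model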

    The finetune method has slightly more work to do, as it must load the model from the save directory (line 42) and, if freeze_convs is true, iterate through each layer and set the trainable parameter to false for all but the last 2 layers (lines 44-48). Once this is complete, it must recompile the model (with a new, often lower, learning rate), train it, and evaluate it.
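    A sketch of those finetuning steps is shown below (written as a standalone function; the default learning rate, epoch count, batch size, and loss are illustrative assumptions):

import tensorflow as tf

def finetune(save_dir, X_fine, y_fine, freeze_convs=True,
             learning_rate=1e-4, epochs=25, batch_size=32):
    # Load the pretrained model saved by the pretrain method
    model = tf.keras.models.load_model(save_dir)

    if freeze_convs:
        # Freeze everything except the last 2 layers, which make up the classifier
        for layer in model.layers[:-2]:
            layer.trainable = False

    # Recompile with a new (usually lower) learning rate, then train
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(X_fine, y_fine, epochs=epochs, batch_size=batch_size)
    return model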

    This completes the ResNet class, and leaves only some runner code to write. This runner code must load the data (which will likely be specific to the environment and file structure you are using), shuffle it, call pretraining and finetuning methods, and finally evaluate on testing data.

    Lines 1-13 load files found in the Stanford database into numpy arrays, in my case from Google Drive, since I am working in the Google Colab environment. You will likely have to change these depending on how your environment is set up, but any method which gets the required files (X_test, y_test, X_ref, y_ref, X_fine, y_fine) into numpy arrays should work.

    This is followed by more generic code to shuffle the arrays (lines 15-31). This is extremely important if the data you are using is ordered; we don't want our network to relate the relative location of a sample in the database to its class, because this relationship won't hold up under generalization. Shuffling our data is a little more complicated than running a single np.random.shuffle() operation, because the X and y arrays for each set of files (reference, fine, and test) must be shuffled together. This can be done with 3 separate index lists, one for each set of files. If we randomly shuffle these index lists, they can then be used to reorder the X and y arrays in unison.
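    In code, the index-list approach looks something like this (the array names match the loading step above):

import numpy as np

# One index list per set of files, shuffled, then used to reorder X and y in unison
ref_idx = np.arange(len(X_ref))
fine_idx = np.arange(len(X_fine))
test_idx = np.arange(len(X_test))
np.random.shuffle(ref_idx)
np.random.shuffle(fine_idx)
np.random.shuffle(test_idx)

X_ref, y_ref = X_ref[ref_idx], y_ref[ref_idx]
X_fine, y_fine = X_fine[fine_idx], y_fine[fine_idx]
X_test, y_test = X_test[test_idx], y_test[test_idx]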

    The most important sections of this runner code are lines 33-47 and 60-63, which set the hyperparameters for the entire network. The hyperparameters currently loaded into the network are manually optimized, and can achieve ~82% accuracy. They are loosely based on the 26-layer Stanford network, although there are some key differences between the designs. Some accuracy could probably be gained by tuning these parameters more thoroughly, but as this network is too large for the problem at hand, there is not much point in delving deeper here. We will focus most of our hyperparameter tuning energy on the more manageable 4-layer network.

    The output of this network on a single run is shown below (with model.summary results hidden for the sake of space):

Epoch 1/5
594/594 [==============================] - 44s 74ms/step - loss: 0.8429 - accuracy: 0.7956
Epoch 2/5
594/594 [==============================] - 44s 73ms/step - loss: 0.3735 - accuracy: 0.9084
Epoch 3/5
594/594 [==============================] - 44s 74ms/step - loss: 0.3160 - accuracy: 0.9237
Epoch 4/5
594/594 [==============================] - 44s 74ms/step - loss: 0.2851 - accuracy: 0.9304
Epoch 5/5
594/594 [==============================] - 44s 73ms/step - loss: 0.2613 - accuracy: 0.9377
6/6 [==============================] - 0s 19ms/step - loss: 0.2391 - accuracy: 0.9467
Pretraining validation accuracy: 0.9466666579246521
30/30 [==============================] - 1s 20ms/step - loss: 3.8406 - accuracy: 0.5717
Pretrain Testing Accuracy 0.5716666579246521
Epoch 1/25
29/29 [==============================] - 2s 72ms/step - loss: 1.3590 - accuracy: 0.6839
Epoch 2/25
29/29 [==============================] - 2s 71ms/step - loss: 0.5091 - accuracy: 0.8611
Epoch 3/25
29/29 [==============================] - 2s 71ms/step - loss: 0.3715 - accuracy: 0.9130
Epoch 4/25
29/29 [==============================] - 2s 71ms/step - loss: 0.2723 - accuracy: 0.9470
Epoch 5/25
29/29 [==============================] - 2s 71ms/step - loss: 0.1946 - accuracy: 0.9688
Epoch 6/25
29/29 [==============================] - 2s 72ms/step - loss: 0.1518 - accuracy: 0.9811
...
29/29 [==============================] - 2s 72ms/step - loss: 0.0702 - accuracy: 1.0000
Epoch 24/25
29/29 [==============================] - 2s 71ms/step - loss: 0.0695 - accuracy: 1.0000
Epoch 25/25
29/29 [==============================] - 2s 72ms/step - loss: 0.0686 - accuracy: 1.0000
2/2 [==============================] - 0s 7ms/step - loss: 0.3321 - accuracy: 0.9000
Finetuning Validation Accuracy: 0.8999999761581421
30/30 [==============================] - 1s 20ms/step - loss: 0.8393 - accuracy: 0.8387
Pretraining Testing Accuracy:  0.5716666579246521
Fintuning Testing Accuracy:  0.8386666774749756

    Although it turns out that this model is overkill by a fairly huge margin, it is still extremely modular and could be reused for other applications which require a deeper network. Given enough time and computing power, it might also be possible to auto-tune the hyperparameters with Bayesian optimization or a similar algorithm to increase accuracy, especially since the network is modular enough to expose not only traditional hyperparameters (learning rate, convolution filters, kernel size, etc.) but also the architecture of the network itself (number of convolutions, number of residual blocks). Some other adjustments which are harder to control, and which I didn't have time to explore, might also give higher accuracy (strides and padding, different activations and optimizers, different data preprocessing algorithms, etc.).

Part 3: The 4-Layer Network

    As our group experimented with a wide variety of network architectures and hyperparameters, we noticed that the number of convolutional layers per residual block, as well as the number of residual blocks, played a very small role in determining accuracy. Taking this to the extreme, a network with 1 residual block and 1 convolution within that residual block still achieves 84% accuracy, in significantly less time than the larger networks.

    This was the first step towards realizing that the 26-layer ResNet we spent so long developing was not actually necessary for this problem; a much smaller and simpler network could achieve similar (and, after tuning, better) levels of accuracy. Since the larger network had residual 'skip' connections, the extra complexity was likely hidden: the network simply learned to route around the layers it didn't need.

    We found that an ideal network for this problem was convolutional but not residual, with 4 layers: 3 convolutions, followed by pooling/flattening, and then 1 dense classifier layer. The ConvNet class and runner code build this network, and also generalize it a little to work with any number of convolutional layers.

    Much like our previous ResNet build, we begin with import statements and a class definition. This time, though, we can skip straight to the build method, because we do not need to define residual blocks.

This build method returns a model, taking in the following parameters through **kwargs:

    Using kwargs for keywords is a major improvement over my previous method of writing out each keyword argument. I will use kwargs in the future, and I am sure there are more efficient implementations than the one I have shown here.
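    The pattern looks something like the snippet below; the keys and values are placeholders to illustrate the dictionary-unpacking idea, not the tuned hyperparameters:

# Gather build hyperparameters in one dictionary, then unpack them with **
kwargs = {
    'filters': [10, 25, 25],       # one entry per convolutional layer
    'kernel_sizes': [5, 5, 5],
    'pooling_size': 8,
}

model = build(input_shape=(1000, 1), classes=30, **kwargs)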

Moving on to the contents of the build method, which are much simpler than in the ResNet class:

    The build method again begins with some assert statements, to ensure that our arguments were passed in correctly and are of the correct length, as well as defining the batchnorm axis (lines 1-4).

    We then use a short for loop (lines 10-13) to iterate through each of our convolutions, and apply activation and batch normalization.

    The final section (lines 15-19) adds some pooling, followed by a fully connected classifier. This classifier is the only section which is trained during finetuning, and contains by far the most weights and biases. Although the convolutions help to convert raw data into readable features, they act more as data preprocessing than actual classification, and emulate the sort of feature-extraction which was previously hard-coded by programmers.
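    A minimal sketch of the whole ConvNet build method, matching the convolution-activation-batchnorm ordering and the pooling/flatten/dense/softmax classifier visible in the model summary further below, might look like this (written as a standalone function; the keyword names are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def build(input_shape, classes, **kwargs):
    filters = kwargs['filters']            # one entry per convolutional layer
    kernel_sizes = kwargs['kernel_sizes']
    pooling_size = kwargs['pooling_size']
    assert len(filters) == len(kernel_sizes)

    inputs = tf.keras.Input(shape=input_shape)
    x = inputs

    # Convolution -> activation -> batch normalization, once per convolutional layer
    for f, k in zip(filters, kernel_sizes):
        x = layers.Conv1D(f, k, padding='same')(x)
        x = layers.Activation('relu')(x)
        x = layers.BatchNormalization()(x)

    # Pooling, then the flatten-dense-softmax classifier
    x = layers.AveragePooling1D(pool_size=pooling_size)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(classes)(x)
    outputs = layers.Activation('softmax')(x)

    return tf.keras.Model(inputs, outputs, name='convnet')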

    The other function in the ConvNet class is a train method, which combines finetuning and pretraining, controlling which one is performed with a finetune boolean.

    The train method combines hardcoded keyword arguments and the **kwargs argument, simply because I did not completely convert my code to the dictionary method. The arguments encoded in the kwargs dictionary are those which apply only to pretraining (i.e. the build method arguments), and not to finetuning. Since the argument dictionary length is variable, we can pass in None values when finetuning, and since they are never used, no errors are thrown. When pretraining, we can simply pass the kwargs dictionary on to the build method.

    Besides the arguments for the build method, which I have already described, the train method takes in some arguments which apply both to pretraining and finetuning:

    The rest of the train method is for the most part self-explanatory. If finetuning is set to True, a model is loaded from save_directory (line 11), and all but the last 2 layers, which make up the classifier, are frozen (lines 13-15).

    If finetuning is set to False, the input_shape and classes arguments are calculated (lines 18-19), and those along with the kwargs dictionary get passed to the build function (line 21), which returns a model.

    Whichever branch the uncompiled model comes from, it is compiled with the Adam optimizer and a learning rate set by the learning_rate argument, a debugging summary is printed, and the model is trained on the features and labels passed in (lines 24-26).
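    Taken together, a sketch of the train method might look like the following (standalone rather than a class method, reusing the imports and build sketch above; the defaults and loss are illustrative assumptions):

def train(X, y, finetune=False, save_directory='convnet_model',
          learning_rate=1e-3, epochs=10, batch_size=10,
          save_weights=False, **kwargs):
    if finetune:
        # Load the pretrained model and freeze all but the classifier (last 2 layers)
        model = tf.keras.models.load_model(save_directory)
        for layer in model.layers[:-2]:
            layer.trainable = False
    else:
        # Work out the input shape and class count, then build a fresh model
        input_shape = X.shape[1:]
        classes = int(y.max()) + 1   # assumes integer labels starting at 0
        model = build(input_shape, classes, **kwargs)

    # Either way: compile with Adam at the requested learning rate, print a summary, train
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.summary()
    model.fit(X, y, epochs=epochs, batch_size=batch_size)

    if save_weights:
        model.save(save_directory)
    return model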

    That concludes the ConvNet class, leaving only a bit of runner code. The data loading and shuffling is identical to that of the ResNet class, and the hyperparameter definitions are almost identical as well.

    To make passing arguments even easier, the pretraining arguments are first declared in a dictionary called kwargs (lines 39-46).

    With data loaded and parameters initialized, running pretraining and finetuning is as simple as passing in parameters.

    When creating the two models, pretraining and finetuning, the only difference between the two is that the pretraining model sets the finetune boolean to false and save_weights to true, while the finetuning model does the opposite (lines 3-12). The pretraining model also gets the additional kwargs dictionary, which includes arguments needed to define the model when it is first created, but which are irrelevant to finetuning.
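    The two train calls and the final evaluation might look roughly like this (reusing the sketches above; save_dir and the depth-channel handling here are assumptions for illustration):

# Pretrain on the reference data, saving the model for finetuning to load
save_dir = 'convnet_model'
pretrained = train(np.expand_dims(X_ref, axis=-1), y_ref, finetune=False,
                   save_weights=True, save_directory=save_dir, **kwargs)

# Finetune on the finetuning data, loading and freezing the saved model
finetuned = train(np.expand_dims(X_fine, axis=-1), y_fine, finetune=True,
                  save_weights=False, save_directory=save_dir)

# Evaluate both models on the held-out test set and print the accuracies
X_test_depth = np.expand_dims(X_test, axis=-1)
_, pretrain_acc = pretrained.evaluate(X_test_depth, y_test)
_, finetune_acc = finetuned.evaluate(X_test_depth, y_test)
print('Pretrained Accuracy:', pretrain_acc)
print('Finetuned Accuracy:', finetune_acc)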

    Finally, we simply print out the evaluated accuracies along with the runtime. A network with the hyperparameters we set here reaches ~86% accuracy, ± ~0.5%. The output of one run of the network is shown below.

Model: "convnet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_20 (InputLayer)        [(None, 1000, 1)]         0         
_________________________________________________________________
conv1d_54 (Conv1D)           (None, 1000, 10)          50        
_________________________________________________________________
activation_81 (Activation)   (None, 1000, 10)          0         
_________________________________________________________________
batch_normalization_63 (Batc (None, 1000, 10)          40        
_________________________________________________________________
conv1d_55 (Conv1D)           (None, 1000, 25)          1250      
_________________________________________________________________
activation_82 (Activation)   (None, 1000, 25)          0         
_________________________________________________________________
batch_normalization_64 (Batc (None, 1000, 25)          100       
_________________________________________________________________
conv1d_56 (Conv1D)           (None, 1000, 25)          3125      
_________________________________________________________________
activation_83 (Activation)   (None, 1000, 25)          0         
_________________________________________________________________
batch_normalization_65 (Batc (None, 1000, 25)          100       
_________________________________________________________________
average_pooling1d_18 (Averag (None, 125, 25)           0         
_________________________________________________________________
flatten_18 (Flatten)         (None, 3125)              0         
_________________________________________________________________
dense_18 (Dense)             (None, 30)                93780     
_________________________________________________________________
activation_84 (Activation)   (None, 30)                0         
=================================================================
Total params: 98,445
Trainable params: 98,325
Non-trainable params: 120
_________________________________________________________________
Epoch 1/10
  1/600 [..............................] - ETA: 10s - loss: 3.6628 - accuracy: 0.0100WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0090s vs `on_train_batch_end` time: 0.0149s). Check your callbacks.
600/600 [==============================] - 11s 18ms/step - loss: 0.5143 - accuracy: 0.8777
Epoch 2/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2950 - accuracy: 0.9366
Epoch 3/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2728 - accuracy: 0.9419
Epoch 4/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2561 - accuracy: 0.9458
Epoch 5/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2415 - accuracy: 0.9502
Epoch 6/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2349 - accuracy: 0.9511
Epoch 7/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2327 - accuracy: 0.9522
Epoch 8/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2284 - accuracy: 0.9530
Epoch 9/10
600/600 [==============================] - 11s 18ms/step - loss: 0.2215 - accuracy: 0.9562
Epoch 10/10
600/600 [==============================] - 10s 17ms/step - loss: 0.2198 - accuracy: 0.9563
30/30 [==============================] - 0s 6ms/step - loss: 4.6145 - accuracy: 0.4553
Model: "convnet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_20 (InputLayer)        [(None, 1000, 1)]         0         
_________________________________________________________________
conv1d_54 (Conv1D)           (None, 1000, 10)          50        
_________________________________________________________________
activation_81 (Activation)   (None, 1000, 10)          0         
_________________________________________________________________
batch_normalization_63 (Batc (None, 1000, 10)          40        
_________________________________________________________________
conv1d_55 (Conv1D)           (None, 1000, 25)          1250      
_________________________________________________________________
activation_82 (Activation)   (None, 1000, 25)          0         
_________________________________________________________________
batch_normalization_64 (Batc (None, 1000, 25)          100       
_________________________________________________________________
conv1d_56 (Conv1D)           (None, 1000, 25)          3125      
_________________________________________________________________
activation_83 (Activation)   (None, 1000, 25)          0         
_________________________________________________________________
batch_normalization_65 (Batc (None, 1000, 25)          100       
_________________________________________________________________
average_pooling1d_18 (Averag (None, 125, 25)           0         
_________________________________________________________________
flatten_18 (Flatten)         (None, 3125)              0         
_________________________________________________________________
dense_18 (Dense)             (None, 30)                93780     
_________________________________________________________________
activation_84 (Activation)   (None, 30)                0         
=================================================================
Total params: 98,445
Trainable params: 93,780
Non-trainable params: 4,665
_________________________________________________________________
Epoch 1/10
100/100 [==============================] - 0s 4ms/step - loss: 0.9957 - accuracy: 0.7830
Epoch 2/10
100/100 [==============================] - 0s 4ms/step - loss: 0.3138 - accuracy: 0.9283
Epoch 3/10
100/100 [==============================] - 0s 4ms/step - loss: 0.1864 - accuracy: 0.9753
Epoch 4/10
100/100 [==============================] - 0s 4ms/step - loss: 0.1325 - accuracy: 0.9943
Epoch 5/10
100/100 [==============================] - 0s 4ms/step - loss: 0.1115 - accuracy: 0.9997
Epoch 6/10
100/100 [==============================] - 0s 4ms/step - loss: 0.1041 - accuracy: 1.0000
Epoch 7/10
100/100 [==============================] - 0s 4ms/step - loss: 0.0981 - accuracy: 1.0000
Epoch 8/10
100/100 [==============================] - 0s 4ms/step - loss: 0.0930 - accuracy: 1.0000
Epoch 9/10
100/100 [==============================] - 0s 4ms/step - loss: 0.0887 - accuracy: 1.0000
Epoch 10/10
100/100 [==============================] - 0s 4ms/step - loss: 0.0848 - accuracy: 1.0000
100/100 [==============================] - 0s 4ms/step - loss: 0.5874 - accuracy: 0.8577
Pretrained Accuracy: 0.45533332228660583
Fintuned Accuracy:  0.8576666712760925

 Completed in: 114.18s

Denzel Farmer