Training a CNN
This section walks through a common deep learning task: training a neural network. WekaDeeplearning4j allows you to do this in one of two ways, both of which are explained in this section:
- Design your own architecture, specifying a custom layer setup
- Use a well-known pre-defined architecture from the Model Zoo. Most models from the Model Zoo have the added benefit of a pre-trained option, which tends to improve generality and reduce the training required on your end to achieve good performance.
All datasets/models referenced in this tutorial can be found in the asset pack.
Starting Simple - The Iris Dataset (Tabular)
A very common dataset to test algorithms with is the Iris Dataset, a simple 4-attribute classification dataset. Although this is often used to demonstrate classical ML algorithms (e.g., boosting, decision trees), neural networks can also be applied to this style of dataset.
(Figures: a sample of the dataset, and a visualisation of its class distribution.)
We're going to start with a simple network, progressively adding layers and noting the effect this has on performance & model size.
GUI
- Open the Weka Explorer, click Open file... and open the iris.arff dataset.
- Switch to the Classify panel, click Choose, and select functions > Dl4jMlpClassifier as the classifier.
- Click Start to begin training. You should see the Progress Manager show a progress bar indicating the progress and ETA. By default the classifier has only one layer (the output layer), but this layer does have trainable weights, so it can be fit to the dataset. Note the cross-validated accuracy.
We're now going to start adding some layers:
- Next to Choose, click the Dl4jMlpClassifier box to open the classifier options.
- Click layer specification to open the GenericArrayEditor. From here we can edit the model's layers.
- Click Choose and select the DenseLayer.
- Open the DenseLayer settings and set number of outputs to 32.
- Click Add to add the layer. You should see it appear above the OutputLayer.
- Click OK to save the configuration and click Start again to see how the accuracy improves.
You'll note we've gone from 15 parameters to 259 after adding only a single layer - neural networks can balloon in size very quickly as we add more layers!
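These numbers are easy to verify: the lone OutputLayer maps iris's 4 attributes straight to its 3 classes, giving 4 × 3 weights + 3 biases = 15 parameters. With the 32-unit DenseLayer inserted, the hidden layer has 4 × 32 + 32 = 160 parameters and the output layer has 32 × 3 + 3 = 99, for a total of 259.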
- Finally, add one more DenseLayer with 8 outputs.
Train the model again and you should see another minor improvement in accuracy, though not as large as before. If you don't see an improvement, don't worry - the point of this part of the tutorial isn't accurate classification, but a gentle introduction to creating neural networks.
You may notice the model being trained many times; by default, WEKA performs 10-fold cross-validation on your data. This trains 10 different classifiers on subsets of the data, then averages their performance for the final summary result. This is fine for smaller datasets & models, but as we continue this tutorial and train larger models it may take too long (a programmatic sketch of this cross-validation is shown below). For the sake of this tutorial, a faster method is to do a training/test split on the loaded dataset.
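Before switching, for reference, here is what that 10-fold cross-validation looks like in Java - a minimal sketch using WEKA's standard Evaluation API, assuming data and myClassifier are set up as in the Java example later in this section:

// 10-fold cross-validation: builds 10 classifiers, each trained on 9 folds
// and evaluated on the held-out fold, then pools the results
weka.classifiers.Evaluation eval = new weka.classifiers.Evaluation(data);
eval.crossValidateModel(myClassifier, data, 10, new java.util.Random(1));
System.out.println(eval.toSummaryString());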
- In Test options, select Percentage split and set the % field to 80. This will train the model once on 80% of the data, evaluating it on the remaining 20% (see the sketch below for the Java equivalent).
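The percentage split also has a simple Java equivalent - a sketch only; the full Java example below uses stratified folds instead:

// Shuffle the data, then train on the first 80% and test on the rest
weka.core.Instances shuffled = new weka.core.Instances(data);
shuffled.randomize(new java.util.Random(0));
int trainSize = (int) Math.round(shuffled.numInstances() * 0.8);
weka.core.Instances train = new weka.core.Instances(shuffled, 0, trainSize);
weka.core.Instances test = new weka.core.Instances(shuffled, trainSize, shuffled.numInstances() - trainSize);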
Command Line
$ java weka.Run .Dl4jMlpClassifier \
-layer "weka.dl4j.layers.DenseLayer -nOut 32" \
-layer "weka.dl4j.layers.DenseLayer -nOut 8" \
-layer "weka.dl4j.layers.OutputLayer" \
-t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/iris.arff \
-split-percentage 80
Java
The same architecture can be built programmatically with the following Java code:
// Load all packages so that the Dl4jMlpClassifier class can be found by
// forName("weka.classifiers.functions.Dl4jMlpClassifier")
weka.core.WekaPackageManager.loadPackages(true);
// Load the dataset
weka.core.Instances data = new weka.core.Instances(new FileReader("datasets/nominal/iris.arff"));
data.setClassIndex(data.numAttributes() - 1);
// Two DenseLayers (32 and 8 units) followed by a softmax OutputLayer, matching the GUI setup above
String[] classifierOptions = weka.core.Utils.splitOptions("-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.DenseLayer -nOut 32\" -layer \"weka.dl4j.layers.DenseLayer -nOut 8\" -layer \"weka.dl4j.layers.OutputLayer -activation weka.dl4j.activations.ActivationSoftmax -lossFn weka.dl4j.lossfunctions.LossMCXENT\"");
weka.classifiers.AbstractClassifier myClassifier = (AbstractClassifier) weka.core.Utils.forName(weka.classifiers.AbstractClassifier.class, "weka.classifiers.functions.Dl4jMlpClassifier", classifierOptions);
// Stratify and split the data (3 folds: train on ~67%, test on ~33%)
Random rand = new Random(0);
Instances randData = new Instances(data);
randData.randomize(rand);
randData.stratify(3);
Instances train = randData.trainCV(3, 0);
Instances test = randData.testCV(3, 0);
// Build the classifier on the training data
myClassifier.buildClassifier(train);
// Evaluate the model on test data
Evaluation eval = new Evaluation(test);
eval.evaluateModel(myClassifier, test);
// Output some summary statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
Starting Simple - The MNIST Dataset (Image)
This next section walks through training a neural network on an image-classification dataset, using a cut-down version of the MNIST dataset, which provides images of handwritten digits in 10 classes (0-9). We'll also introduce the WekaDeeplearning4j Model Zoo.
GUI
- Back in the Preprocess panel, open the mnist.meta.minimal.arff dataset.
Important note: You'll notice that the format of this dataset is quite different to the previous one. Whereas iris.arff specified all attributes explicitly, this version of MNIST specifies only the filename and its associated class; this is a much more common use case (especially for image classification), so it's important to understand how to work with it in WekaDeeplearning4j.
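To make this concrete, a meta-style .arff file contains little more than a filename attribute and a class attribute, along these lines (an illustrative sketch, not the exact contents of mnist.meta.minimal.arff):

@relation mnist.meta.minimal

@attribute filename string
@attribute class {0,1,2,3,4,5,6,7,8,9}

@data
img_00001.jpg,0
img_00002.jpg,3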
Going to the Classify panel, if you click Start, WEKA will throw an error (...Dl4jMlpClassifier: Cannot handle string attributes). This is because we're using a 'meta-style' dataset which contains only filenames. We need to configure the Dl4jMlpClassifier further to handle this new type of dataset and tell it what to do with these filenames.
- In the Dl4jMlpClassifier settings, change the instance iterator from DefaultInstanceIterator to ImageInstanceIterator.
- In the ImageInstanceIterator settings, set the directory of images to mnist-minimal/images/, and the size of mini batch to 64 (you can reduce this if you run into out-of-memory issues).
- Delete the 2 layers we previously added (leaving only the OutputLayer) and click Start to train your model.
Command Line
$ java weka.Run .Dl4jMlpClassifier \
-iterator ".ImageInstanceIterator \
-imagesLocation $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal -bs 64" \
-t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff \
-split-percentage 80
You should get an accuracy of ~71%, which is reasonable considering the model uses only a single output layer. We won't go any further into custom architectures at this point; instead, we'll look at using a model from the WekaDeeplearning4j Model Zoo.
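For completeness, the same iterator setup can be written in Java - a sketch only; the setter names here mirror the -imagesLocation and -bs CLI flags and may differ slightly between package versions:

// Tell the classifier how to turn filename-only instances into image minibatches
weka.dl4j.iterators.instance.ImageInstanceIterator imgIter = new weka.dl4j.iterators.instance.ImageInstanceIterator();
imgIter.setImagesLocation(new java.io.File("datasets/nominal/mnist-minimal"));
imgIter.setTrainBatchSize(64);

weka.classifiers.functions.Dl4jMlpClassifier clf = new weka.classifiers.functions.Dl4jMlpClassifier();
clf.setInstanceIterator(imgIter);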
Fine-tuning a Zoo Model
WekaDeeplearning4j's Model Zoo contains a set of predefined state-of-the-art architectures which can be used on your dataset out of the box. In addition, most of them include pre-trained weights, so they often require less training to converge than a model trained from scratch (with randomly-initialized weights).
As we're training on MNIST, it makes sense to use a pretrained model suited to this dataset. The LeNet architecture is one of the earliest CNN architectures, originally designed for use on the MNIST dataset. The Dl4jLeNet model included in WekaDeeplearning4j makes this (relatively) basic CNN architecture available, and we'll use it for this section.
Note that the first time this is run, the pretrained weights may need to be downloaded, in which case the actual runtime will be longer. These weights are cached locally, so subsequent runs are faster.
GUI
- In the Dl4jMlpClassifier settings, change the zooModel from CustomNet to Dl4jLeNet.
- That's all you need to change! Click Start to begin fine-tuning the pretrained Dl4jLeNet model.
You should see a marked improvement in accuracy over the previous (default layer-setup) model - ~88% with a single click. Using pretrained models couldn't get much easier!
Commandline
$ java weka.Run .Dl4jMlpClassifier \
-iterator ".ImageInstanceIterator \
-imagesLocation $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal -bs 64" \
-zooModel ".Dl4jLeNet" \
-t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff \
-split-percentage 80
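Programmatically, selecting a zoo model is a single setter call on the classifier - a sketch continuing from the iterator setup shown earlier:

// Swap the default (empty) CustomNet for the pretrained LeNet zoo model
clf.setZooModel(new weka.dl4j.zoo.Dl4jLeNet());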
Fine-tuning a model on a Custom Dataset
The previous section is a useful walkthrough if you're training a model on a dataset which already has an .arff file correctly defined. For most use cases 'in the wild', however, this is not so; a more common scenario for image classification is a 'folder organised' dataset - images are sorted into subfolders, with the class name being the subfolder name. This format is intuitive and easy to work with, but cannot be loaded directly into WEKA without a few extra steps.
The dataset used in this tutorial is from the Plant Seedlings Classification competition on Kaggle, which is supplied in the tutorial asset folder.
Loading a dataset without an .arff file - Introduction to the ImageDirectoryLoader
This dataset uses the 'folder organised' format just described, so it needs a little processing before WEKA can load it.
WekaDeeplearning4j comes with the ImageDirectoryLoader, a simple tool which creates an .arff file from a 'folder organised' dataset.
GUI Usage
The ImageDirectoryLoader can be invoked by selecting a folder instead of a file from the Open file... menu in the Preprocess panel.
- Click Open file... and navigate to the images/ folder in the Plant Seedlings dataset.
- Click Ok, then on the next dialog click Choose and select the ImageDirectoryLoader.
- There are no settings to change, so simply click OK to run - you should be taken back to the Preprocess panel with your instances now loaded.
The ImageDirectoryLoader loads the dataset into the 'meta' format similar to the MNIST dataset we used above.
Commandline Usage
The tool can also be run from the command line:
java weka.Run .ImageDirectoryLoader -i <input dataset path> -name <output arff filename>
e.g.:
java weka.Run .ImageDirectoryLoader -i /path/to/plant-seedlings/data/train -name plant-seedlings-train.arff
The associated meta .arff file is created at the specified path and can be loaded into WEKA just like any other .arff file.
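From Java, the generated file can then be loaded like any other ARFF - a sketch using WEKA's standard DataSource helper, with the filename from the command above:

// Read the meta .arff produced by the ImageDirectoryLoader
weka.core.Instances data = weka.core.converters.ConverterUtils.DataSource.read("plant-seedlings-train.arff");
// The class (plant species) is the last attribute
data.setClassIndex(data.numAttributes() - 1);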
As we're simply checking accuracy within WEKA, we won't load in the test/ data and submit predictions to Kaggle - that is outside the scope of this tutorial.
Training - GUI
Now that we've loaded our custom dataset, we can get back to setting up our model.
- If you haven't already, set Dl4jMlpClassifier as the Classifier and select the ImageInstanceIterator as the instance iterator.
- In the ImageInstanceIterator settings, set the images location to the images/ directory in the plant-seedlings dataset folder.
- As we'll be using a larger model than before, your machine may not be able to handle a batch size of 64; set it to 8 for now. We'll be using a pretrained model (which has a fixed input size), so the width, height, and number of channels don't need to be set.
For the sake of this example, we'll use the tried-and-tested ResNet architecture.
- Select KerasResNet from the zooModel option.
Note that by default, the layer specification is not loaded in the GUI for usability reasons; loading the layers every time an option is changed can slow down the GUI significantly. If, however, you'd like to view the layers of the zoo model you've selected, set the Preview zoo model layer specification in GUI flag to true.
The above setup, trained for 20 epochs with a batch size of 16, produces a classification accuracy of 94.51% on the test data (your results may vary).
Correctly Classified Instances 1497 94.5076 %
Incorrectly Classified Instances 87 5.4924 %
Kappa statistic 0.9392
Mean absolute error 0.01
Root mean squared error 0.0894
Relative absolute error 6.6502 %
Root relative squared error 32.5587 %
Total Number of Instances 1584
=== Confusion Matrix ===
a b c d e f g h i j k l <-- classified as
44 0 0 0 6 0 38 0 0 0 0 0 | a = Black-grass
0 128 0 0 0 1 0 0 0 1 0 0 | b = Charlock
0 0 92 2 0 0 0 0 0 0 0 2 | c = Cleavers
0 1 0 202 0 0 1 0 0 0 0 0 | d = Common Chickweed
2 0 0 0 69 0 2 0 0 0 0 1 | e = Common wheat
1 0 1 0 1 154 0 0 0 0 0 1 | f = Fat Hen
11 0 0 0 0 0 206 0 1 0 0 0 | g = Loose Silky-bent
0 0 0 0 0 0 3 71 0 0 0 0 | h = Maize
1 0 0 0 0 0 0 0 171 0 0 0 | i = Scentless Mayweed
0 0 0 1 0 0 0 0 5 71 0 0 | j = Shepherds Purse
0 0 0 0 0 0 1 0 0 0 164 0 | k = Small-flowered Cranesbill
0 1 0 2 0 0 0 0 0 0 0 125 | l = Sugar beet
Training - Commandline
$ java weka.Run \
.Dl4jMlpClassifier \
-S 1 \
-iterator ".ImageInstanceIterator -imagesLocation plant-seedlings/data/train -bs 8" \
-zooModel ".KerasResNet" \
-numEpochs 1 \
-t plant-seedlings/data/train/plant-seedlings-train.arff \
-split-percentage 80
Training - Java
// Load all packages so that the Dl4jMlpClassifier class can be found by
// forName("weka.classifiers.functions.Dl4jMlpClassifier")
weka.core.WekaPackageManager.loadPackages(true);
// Load the meta dataset created by the ImageDirectoryLoader
weka.core.Instances data = new weka.core.Instances(new FileReader("plant-seedlings/data/train/plant-seedlings-train.arff"));
data.setClassIndex(data.numAttributes() - 1);
// KerasResNet (RESNET152V2 variation) with an Adam updater, trained for 20 epochs
String[] classifierOptions = weka.core.Utils.splitOptions("-iterator \".ImageInstanceIterator -imagesLocation plant-seedlings/data/train -bs 16\" -zooModel \"weka.dl4j.zoo.KerasResNet -variation RESNET152V2\" -config \"weka.dl4j.NeuralNetConfiguration -updater \\\"weka.dl4j.updater.Adam -lr 0.1\\\"\" -numEpochs 20");
weka.classifiers.AbstractClassifier myClassifier = (AbstractClassifier) weka.core.Utils.forName(weka.classifiers.AbstractClassifier.class, "weka.classifiers.functions.Dl4jMlpClassifier", classifierOptions);
// Stratify and split the data (5 folds: train on 80%, test on the remaining 20%)
Random rand = new Random(0);
Instances randData = new Instances(data);
randData.randomize(rand);
randData.stratify(5);
Instances train = randData.trainCV(5, 0);
Instances test = randData.testCV(5, 0);
// Build the classifier on the training data
myClassifier.buildClassifier(train);
// Evaluate the model on test data
Evaluation eval = new Evaluation(test);
eval.evaluateModel(myClassifier, test);
// Output some summary statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
Rapid Prototyping
One difficulty with rapid prototyping in ML is that large datasets & models can increase your iteration time, slowing down development. The following techniques help you avoid this:
Dataset Resampling
Using the Resample filter, we can randomly remove instances from our dataset while roughly preserving the class balance (the proportion of images in each category); a programmatic sketch follows the steps below.
- In the Preprocess panel, select the Resample filter.
- Set the sampleSizePercent to a value < 100.
- Click Apply to resample the dataset down to your specified size.
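Here's the programmatic equivalent - a sketch using the unsupervised Resample filter, downsampling to an illustrative 25%, and assuming data holds the loaded Instances:

// Randomly keep ~25% of the instances; uniform sampling keeps the
// class proportions roughly intact
weka.filters.unsupervised.instance.Resample resample = new weka.filters.unsupervised.instance.Resample();
resample.setSampleSizePercent(25.0);
resample.setNoReplacement(true);
resample.setInputFormat(data);
weka.core.Instances smaller = weka.filters.Filter.useFilter(data, resample);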
Reduce Training Epochs
As we move to modern CNN architectures, models become large and training takes a long time. We can reduce the training time during experimentation, then easily scale it back up once we've settled on a configuration. A one-line programmatic sketch follows the step below.
- In the Dl4jMlpClassifier settings, change number of epochs to 1.
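In Java this is a single setter on the classifier - a sketch, using the programmatic counterpart of the -numEpochs flag seen earlier:

// Train for just one epoch while experimenting
clf.setNumEpochs(1);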
We're now ready for a short training run.
- Keeping the same 80% split, click Start.
Even with the smaller dataset, training such a sizable model (KerasResNet) on CPU can take a while (~20-30 minutes). Once the model has begun training successfully, feel free to move on to the other sections of the tutorial rather than waiting here.
Further Experiments
We've introduced a few new features already, so now is the time to do some exploration of your own. Below are some suggestions for things to try - see if you can improve on the accuracy we achieved with the defaults.
- Custom dataset - if you've got your own dataset, try loading it into WEKA for your exploration. If not, feel free to use one of the other datasets provided in the asset pack.
- Custom layer architecture - designing neural network architectures is an open research area which we've only briefly touched on, so now's the time to try creating your own model and getting a feel for CNNs. Start with a blank Dl4jMlpClassifier and create your own architecture; you may like to refer to the model summaries for inspiration.
- Different zoo models - after trying your own model, try out some of the different models (and variations) available in the Model Zoo.
- Hyperparameter tuning - part of becoming a competent deep learning practitioner is having an intuition for training hyperparameters and the effect they have on a model's overall accuracy. Try altering parts of the Dl4jMlpClassifier config; e.g., with your custom model architecture, try different numbers of epochs (10, 20, 30, 40, 50). You can also try the GridSearch and MultiSearch packages in WEKA to automatically search for optimal hyperparameters.