Training a CNN
This section walks through a common deep learning task - training a neural network. WekaDeeplearning4j allows you to do this in one of two ways, both of which will be explained in this section:
- Design your own architecture, specifying a custom layer setup.
- Use a well-known predefined architecture from the Model Zoo. Most models from the Model Zoo have the added benefit of a pre-trained option, which tends to improve generality and reduce the training required on your end to achieve good performance.
All datasets/models referenced in this tutorial can be found in the asset pack.
Starting Simple - The Iris Dataset (Tabular)
A very common dataset to test algorithms with is the Iris Dataset, a simple 4-attribute classification dataset. Although this is often used to demonstrate classical ML algorithms (e.g., boosting, decision trees), neural networks can also be applied to this style of dataset.
Dataset Distribution Visualisation
We're going to start with a simple network, progressively adding layers and noting the effect this has on performance & model size.
- Open the Weka Explorer, click Open file..., and open the iris.arff dataset.
- Switch to the Classify panel, click Choose, and select Dl4jMlpClassifier as the classifier.
- Click Start to begin training. You should see the Progress Manager show a progress bar indicating the progress and ETA. By default the classifier has only one layer (the output layer), but this layer does have trainable weights, so it can still be fit to the dataset. Note the cross-validated accuracy.
We're now going to start adding some layers:
- Next to Choose, click the Dl4jMlpClassifier box to open the classifier options.
- Click layer specification to open the GenericArrayEditor. From here we can edit the model's layers.
- Click Choose and select the DenseLayer.
- Open the DenseLayer settings and set number of outputs to 32.
- Click Add to add the layer. You should see it appear above the OutputLayer.
- Click OK to save the configuration and click Start again to see how the accuracy improves.
You'll note we've gone from 15 parameters to 259 after adding only a single layer - neural networks can balloon in size very quickly as we add more layers!
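The jump from 15 to 259 is easy to verify by hand: a fully-connected layer with nIn inputs and nOut outputs contributes nIn × nOut weights plus nOut biases. Below is a minimal sketch in plain Java (no Weka dependency - the paramCount helper is ours for illustration, not a Weka API) for the Iris network, which has 4 attributes and 3 classes:

```java
public class DenseParamCount {
    // Total trainable parameters of a fully-connected network given its
    // layer widths: each layer has (nIn * nOut) weights plus nOut biases.
    static int paramCount(int[] widths) {
        int total = 0;
        for (int i = 0; i < widths.length - 1; i++) {
            total += widths[i] * widths[i + 1] + widths[i + 1];
        }
        return total;
    }

    public static void main(String[] args) {
        // Iris: 4 attributes in, 3 classes out
        System.out.println(paramCount(new int[]{4, 3}));      // output layer only: 15
        System.out.println(paramCount(new int[]{4, 32, 3}));  // one 32-unit DenseLayer added: 259
    }
}
```

With only the output layer this gives (4 + 1) × 3 = 15 parameters; inserting a single 32-unit hidden layer gives (4 + 1) × 32 + (32 + 1) × 3 = 259.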
- Finally, add one more DenseLayer in the same way.
Train the model again and you should see another minor improvement in accuracy, though not as large as before. If you don't see an improvement, don't worry - the point of this part of the tutorial isn't accurate classification, but rather a gentle introduction to creating neural networks.
You may notice WEKA training the model many times; by default WEKA performs 10-fold cross-validation on your data. This trains 10 different classifiers on subsets of the data, then averages their performance for the final summary result. This is fine for smaller datasets & models, but as we continue this tutorial and train larger models it may take too long. For the sake of this tutorial, a faster method is to do a training/test split on the loaded dataset.
- Under Test options, select Percentage split and set the value to 80. This will train the model on 80% of the data once, evaluating it on the remaining 20%.
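To make the split concrete for Iris (which has 150 instances), here's a small sketch; the split helper is illustrative only, and Weka's own rounding may differ by an instance:

```java
public class PercentageSplit {
    // Train/test sizes for a percentage split.
    static int[] split(int numInstances, double trainPercent) {
        int train = (int) Math.round(numInstances * trainPercent / 100.0);
        return new int[]{train, numInstances - train};
    }

    public static void main(String[] args) {
        int[] s = split(150, 80);  // Iris has 150 instances
        System.out.println(s[0] + " train / " + s[1] + " test");  // 120 train / 30 test
    }
}
```

Compared with 10-fold cross-validation (10 model fits), a percentage split trains the model once, which is why it's so much faster for large models.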
$ java weka.Run .Dl4jMlpClassifier \
    -layer "weka.dl4j.layers.DenseLayer -nOut 8" \
    -layer "weka.dl4j.layers.DenseLayer -nOut 32" \
    -layer "weka.dl4j.layers.OutputLayer" \
    -t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/iris.arff \
    -split-percentage 80
The same architecture can be built programmatically with the following Java code:

import java.io.FileReader;
import java.util.Random;

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;

// Load all packages so that the Dl4jMlpClassifier class can be found by
// forName("weka.classifiers.functions.Dl4jMlpClassifier")
weka.core.WekaPackageManager.loadPackages(true);

// Load the dataset
Instances data = new Instances(new FileReader("datasets/nominal/iris.arff"));
data.setClassIndex(data.numAttributes() - 1);

String classifierOptions = weka.core.Utils.splitOptions(
    "-S 1 -numEpochs 10 -layer \"weka.dl4j.layers.OutputLayer " +
    "-activation weka.dl4j.activations.ActivationSoftmax " +
    "-lossFn weka.dl4j.lossfunctions.LossMCXENT\"");
AbstractClassifier myClassifier = (AbstractClassifier) weka.core.Utils.forName(
    AbstractClassifier.class,
    "weka.classifiers.functions.Dl4jMlpClassifier",
    classifierOptions);

// Stratify and split the data into 3 folds
Random rand = new Random(0);
Instances randData = new Instances(data);
randData.randomize(rand);
randData.stratify(3);
Instances train = randData.trainCV(3, 0);
Instances test = randData.testCV(3, 0);

// Build the classifier on the training data
myClassifier.buildClassifier(train);

// Evaluate the model on the test data
Evaluation eval = new Evaluation(test);
eval.evaluateModel(myClassifier, test);

// Output some summary statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
Starting Simple - The MNIST Dataset (Image)
This next section will walk through training a neural network on an image-classification dataset, using a cut-down version of the MNIST dataset, which provides images of handwritten digits in 10 classes (0-9). We'll also introduce you to the WekaDeeplearning4j Model Zoo.
- Back in the Preprocess panel, open the mnist.meta.minimal.arff dataset.
Important note: you'll notice that the format of this dataset is quite different to the previous one. Whereas iris.arff specified all attribute values explicitly, this version of MNIST specifies only each image's filename and its associated class. This is a much more common use case (especially for image classification), so it's important to understand how to work with this in WekaDeeplearning4j.
If you go to the Classify panel and click Start, WEKA will throw an error (...Dl4jMlpClassifier: Cannot handle string attributes). This is because we're using a 'meta-style' dataset which contains only filenames. We need to configure the Dl4jMlpClassifier further to handle this new type of dataset and tell it what to do with these filenames.
- In the Dl4jMlpClassifier settings, change the instance iterator to the ImageInstanceIterator.
- In the ImageInstanceIterator settings, set the directory of images to mnist-minimal/images/, and the size of mini batch to 64 (you can reduce this if you run into out-of-memory issues).
- Delete the 2 layers we previously added (leaving only the OutputLayer) and click Start to train your model.
$ java weka.Run .Dl4jMlpClassifier \
    -iterator ".ImageInstanceIterator -imagesLocation $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal -bs 64" \
    -t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff \
    -split-percentage 80
You should get an accuracy of ~71%, which is reasonable considering the model uses only the single output layer. We won't go any further into custom architectures at this point; instead, we'll look at using a model from the WekaDeeplearning4j Model Zoo.
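To see why a single output layer is so limited here, consider its size. Assuming the standard 28 × 28 grayscale MNIST image format, the model is just a linear map from 784 pixel values to 10 classes (the paramCount helper below is ours for illustration, not a Weka API):

```java
public class MnistOutputParams {
    // Parameter count of a single fully-connected output layer:
    // one weight per (pixel, class) pair, plus one bias per class.
    static int paramCount(int pixels, int classes) {
        return pixels * classes + classes;
    }

    public static void main(String[] args) {
        System.out.println(paramCount(28 * 28, 10));  // 7850
    }
}
```

A purely linear model over raw pixels can only go so far on image data; the convolutional layers in the zoo models extract spatial features first, which is where the accuracy gains in the next section come from.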
Fine-tuning a Zoo Model
WekaDeeplearning4j's Model Zoo contains a set of predefined state-of-the-art architectures which can be used on your dataset out-of-the-box. In addition, most of them include pre-trained weights so they often require less training to converge than training a model from scratch (with randomly-initialized weights).
As we're training on MNIST, it makes sense to use a pretrained model suited to this dataset. The LeNet architecture is one of the earliest CNN architectures proposed, and it was originally designed for use on the MNIST dataset. The Dl4jLeNet model included in WekaDeeplearning4j makes this (relatively) basic CNN architecture available, and we'll use it for this section.
Note that the first time this is run the pretrained weights may need to be downloaded in which case actual runtime will be longer. These weights are cached locally so subsequent runs are faster.
- In the Dl4jMlpClassifier settings, change the zoo model to Dl4jLeNet.
- That's all you need to change! Click Start to begin fine-tuning the pretrained model.
You should see a marked improvement over the previous (default layer-setup) model - an accuracy of ~88% with a single click. Using pretrained models couldn't get much easier!
$ java weka.Run .Dl4jMlpClassifier \
    -iterator ".ImageInstanceIterator -imagesLocation $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal -bs 64" \
    -zooModel ".Dl4jLeNet" \
    -t $WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff \
    -split-percentage 80
Fine-tuning a model on a Custom Dataset
The previous section is a useful walkthrough if you're training a model on a dataset which already has a correctly defined .arff file. For most use cases 'in the wild', however, this is not the case; a more common format for image classification is the 'folder organised' dataset - images are sorted into subfolders, with the class name being the subfolder name. This format is intuitive and easy to work with but cannot be loaded directly into WEKA without a few extra steps.
The dataset used in this tutorial is from the Plant Seedlings Classification competition on Kaggle, which is supplied in the tutorial asset folder.
Loading a dataset without an .arff file - Introduction to the ImageDirectoryLoader
The Plant Seedlings dataset uses this 'folder organised' format, so it cannot be loaded directly into WEKA without further processing.
WekaDeeplearning4j comes with the ImageDirectoryLoader, a simple tool which creates an .arff file from a 'folder organised' dataset.
The ImageDirectoryLoader can be invoked by selecting a folder instead of a file from the Open file... menu in the Preprocess panel.
- Click Open file... and navigate to the images/ folder in the Plant Seedlings dataset.
- Click Ok, then on the next dialog click Choose and choose the ImageDirectoryLoader.
- There are no settings to change, so simply click OK to run - you should be taken back to the Preprocess panel with your instances now loaded.
The ImageDirectoryLoader loads the dataset into the 'meta' format similar to the MNIST dataset we used above.
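Conceptually, what gets produced is simple: one string attribute for the filename, one nominal class attribute, and one data row per image. The sketch below (plain Java; toArff is our illustrative helper, not part of the ImageDirectoryLoader API) generates that meta-style ARFF text from an in-memory listing:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetaArffSketch {
    // Build a meta-style ARFF from a map of class name -> image filenames.
    // Illustrative only: the real ImageDirectoryLoader scans the folder
    // structure itself and handles more options.
    static String toArff(String relationName, Map<String, List<String>> byClass) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relationName).append("\n\n");
        sb.append("@attribute filename string\n");
        sb.append("@attribute class {")
          .append(String.join(",", byClass.keySet()))
          .append("}\n\n@data\n");
        for (Map.Entry<String, List<String>> e : byClass.entrySet()) {
            for (String file : e.getValue()) {
                sb.append(file).append(",").append(e.getKey()).append("\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> byClass = new LinkedHashMap<>();
        byClass.put("Maize", List.of("Maize/001.png", "Maize/002.png"));
        byClass.put("Charlock", List.of("Charlock/001.png"));
        System.out.print(toArff("plant-seedlings", byClass));
    }
}
```

The filenames here are hypothetical; the real loader emits whatever image files it finds under each class subfolder.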
The tool can also be run from the command line:
java weka.Run .ImageDirectoryLoader -i <input dataset path> -name <output arff filename>
java weka.Run .ImageDirectoryLoader -i /path/to/plant-seedlings/data/train -name plant-seedlings-train.arff
The associated meta .arff file is created at the path specified and can be loaded into WEKA just like any other dataset.
As we're simply checking accuracy within WEKA, we won't load in the test/ data and submit predictions to Kaggle - that is outside the scope of this tutorial.
Training - GUI
Now that we've loaded our custom dataset, we can get back to setting up our model.
- If you haven't already, set Dl4jMlpClassifier as the classifier and select the ImageInstanceIterator as the instance iterator.
- In the ImageInstanceIterator settings, set the images location to the images/ directory in the Plant Seedlings dataset.
- As we'll be using a larger model than before, your machine may not be able to handle a batch size of 64. Set it to 8 for now. We'll be using a pretrained model (which has a fixed input size), so the width, height, and number of channels don't need to be set.
For the sake of this example, we'll use the tried-and-tested ResNet architecture: in the Dl4jMlpClassifier settings, set the zoo model to KerasResNet.
Note that by default, the layer specification is not loaded in the GUI for usability reasons;
loading the layers every time an option is changed can slow down the GUI significantly. If, however, you'd like
to view the layers of the zoo model you've selected, set the
Preview zoo model layer specification in GUI flag to true.
The above setup, trained for 20 epochs with a batch size of 16, produces a classification accuracy of 94.51% on the test data (your results may vary).
Correctly Classified Instances        1497               94.5076 %
Incorrectly Classified Instances        87                5.4924 %
Kappa statistic                          0.9392
Mean absolute error                      0.01
Root mean squared error                  0.0894
Relative absolute error                  6.6502 %
Root relative squared error             32.5587 %
Total Number of Instances             1584

=== Confusion Matrix ===

   a   b   c   d   e   f   g   h   i   j   k   l   <-- classified as
  44   0   0   0   6   0  38   0   0   0   0   0 |   a = Black-grass
   0 128   0   0   0   1   0   0   0   1   0   0 |   b = Charlock
   0   0  92   2   0   0   0   0   0   0   0   2 |   c = Cleavers
   0   1   0 202   0   0   1   0   0   0   0   0 |   d = Common Chickweed
   2   0   0   0  69   0   2   0   0   0   0   1 |   e = Common wheat
   1   0   1   0   1 154   0   0   0   0   0   1 |   f = Fat Hen
  11   0   0   0   0   0 206   0   1   0   0   0 |   g = Loose Silky-bent
   0   0   0   0   0   0   3  71   0   0   0   0 |   h = Maize
   1   0   0   0   0   0   0   0 171   0   0   0 |   i = Scentless Mayweed
   0   0   0   1   0   0   0   0   5  71   0   0 |   j = Shepherds Purse
   0   0   0   0   0   0   1   0   0   0 164   0 |   k = Small-flowered Cranesbill
   0   1   0   2   0   0   0   0   0   0   0 125 |   l = Sugar beet
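The headline accuracy is simply the confusion matrix's diagonal count over the total count. As a check, the sketch below (plain Java; accuracy is our helper, not a Weka API) reproduces the 94.5076 % figure - the diagonal sums to 1497 of 1584 instances:

```java
public class ConfusionAccuracy {
    // The confusion matrix above: rows are actual classes, columns predictions.
    static final int[][] MATRIX = {
        {44, 0, 0, 0, 6, 0, 38, 0, 0, 0, 0, 0},   // a = Black-grass
        {0, 128, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0},   // b = Charlock
        {0, 0, 92, 2, 0, 0, 0, 0, 0, 0, 0, 2},    // c = Cleavers
        {0, 1, 0, 202, 0, 0, 1, 0, 0, 0, 0, 0},   // d = Common Chickweed
        {2, 0, 0, 0, 69, 0, 2, 0, 0, 0, 0, 1},    // e = Common wheat
        {1, 0, 1, 0, 1, 154, 0, 0, 0, 0, 0, 1},   // f = Fat Hen
        {11, 0, 0, 0, 0, 0, 206, 0, 1, 0, 0, 0},  // g = Loose Silky-bent
        {0, 0, 0, 0, 0, 0, 3, 71, 0, 0, 0, 0},    // h = Maize
        {1, 0, 0, 0, 0, 0, 0, 0, 171, 0, 0, 0},   // i = Scentless Mayweed
        {0, 0, 0, 1, 0, 0, 0, 0, 5, 71, 0, 0},    // j = Shepherds Purse
        {0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 164, 0},   // k = Small-flowered Cranesbill
        {0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 125},   // l = Sugar beet
    };

    // Accuracy = diagonal (correct) count over total count, as a percentage.
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        System.out.printf("%.4f%%%n", accuracy(MATRIX));  // 94.5076%
    }
}
```

The matrix also shows where the model struggles: most errors are Black-grass predicted as Loose Silky-bent (38 instances) and vice versa (11 instances).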
Training - Commandline
$ java weka.Run .Dl4jMlpClassifier \
    -S 1 \
    -iterator ".ImageInstanceIterator -imagesLocation plant-seedlings/data/train -bs 8" \
    -zooModel ".KerasResNet" \
    -numEpochs 1 \
    -t plant-seedlings/data/train/plant-seedlings-train.arff \
    -split-percentage 80
The same training run can be performed programmatically with the following Java code:

import java.io.FileReader;
import java.util.Random;

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Evaluation;
import weka.core.Instances;

// Load all packages so that the Dl4jMlpClassifier class can be found by
// forName("weka.classifiers.functions.Dl4jMlpClassifier")
weka.core.WekaPackageManager.loadPackages(true);

// Load the dataset
Instances data = new Instances(new FileReader("/home/rhys/Documents/git/kaggle-competitions/plant-seedlings/data/train/output.arff"));
data.setClassIndex(data.numAttributes() - 1);

String classifierOptions = weka.core.Utils.splitOptions(
    "-iterator \".ImageInstanceIterator -imagesLocation /home/rhys/Documents/git/kaggle-competitions/plant-seedlings/data/train -bs 16\" " +
    "-zooModel \"weka.dl4j.zoo.KerasResNet -variation RESNET152V2\" " +
    "-config \"weka.dl4j.NeuralNetConfiguration -updater \\\"weka.dl4j.updater.Adam -lr 0.1\\\"\" " +
    "-numEpochs 20");
AbstractClassifier myClassifier = (AbstractClassifier) weka.core.Utils.forName(
    AbstractClassifier.class,
    "weka.classifiers.functions.Dl4jMlpClassifier",
    classifierOptions);

// Stratify and split the data into 5 folds
Random rand = new Random(0);
Instances randData = new Instances(data);
randData.randomize(rand);
randData.stratify(5);
Instances train = randData.trainCV(5, 0);
Instances test = randData.testCV(5, 0);

// Build the classifier on the training data
myClassifier.buildClassifier(train);

// Evaluate the model on the test data
Evaluation eval = new Evaluation(test);
eval.evaluateModel(myClassifier, test);

// Output some summary statistics
System.out.println(eval.toSummaryString());
System.out.println(eval.toMatrixString());
One difficulty with rapid prototyping in ML is that large datasets & models can increase your iteration time, slowing down development. Following are some useful techniques to avoid this:
Downsample the Dataset
Using the Resample filter, we can randomly remove instances from our dataset while roughly preserving the class balance (how many images are in each category).
- In the Preprocess panel, select the Resample filter.
- Set the sampleSizePercent to a value < 100.
- Click Apply to resample the dataset down to your specified size.
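To see what 'roughly preserving the class balance' means, here's an illustrative per-class calculation (plain Java; the downsample helper is ours - Weka's Resample filter samples individual instances, optionally biasing toward a uniform class distribution, rather than rounding counts like this). The class counts used are made up for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResampleSketch {
    // Scale each class's instance count to samplePercent of its original
    // size, so relative class proportions stay roughly the same.
    static Map<String, Integer> downsample(Map<String, Integer> counts, double samplePercent) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.put(e.getKey(), (int) Math.round(e.getValue() * samplePercent / 100.0));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical per-class image counts
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Maize", 220);
        counts.put("Common Chickweed", 600);
        System.out.println(downsample(counts, 25));
    }
}
```

Every class shrinks by the same factor, so a model trained on the sample sees roughly the same class distribution as one trained on the full dataset.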
Reduce Training Epochs
As we move to modern CNN architectures, model sizes become large and training takes a long time. We can reduce the amount of training during experimentation, and then easily scale it back up once we've decided on a configuration.
- In the Dl4jMlpClassifier settings, set the number of epochs to a small value (e.g., 1).
We're now ready for a short training run.
- Keeping the rest of the configuration the same, click Start to begin training.
Even with the smaller dataset, training on CPU with such a sizable model (KerasResNet) can take a while (~20-30 minutes). Once you've got the model to begin training successfully, feel free to move on to the other sections of the tutorial if you find yourself waiting here for a long time.
We've introduced a few new features already, so now is the time to do some exploration of your own. Below are some suggestions for things to try out - try to improve the accuracy over what we achieved with the defaults.
- Custom Dataset - if you've got your own dataset, try loading that into WEKA for your exploration. If not, feel free to use one of the other datasets provided in the asset pack.
- Custom layer architecture - designing neural network architectures is an open research area which we've only briefly touched on, so now's the time to try creating your own model and get a feel for CNNs. Start with a blank Dl4jMlpClassifier and create your own architecture; you may like to refer to the model summaries for some inspiration.
- Different Zoo Models - After trying to create your own model, try out some of the different models (and variations) available in the Model Zoo.
- Hyperparameter tuning - part of becoming a competent deep learning practitioner is having an intuition for training hyperparameters and the effect they have on a model's overall accuracy. Try altering some parts of the Dl4jMlpClassifier config, e.g., with your custom model architecture, try different numbers of epochs (10, 20, 30, 40, 50). You can also try the MultiSearch package in WEKA to automatically search for optimal hyperparameters.