The MNIST Dataset - Feature Extraction


Instead of training a full neural network on your dataset, you may like to try using a pretrained model as a feature extractor and fitting a simpler model to those features.

You can use any model you've trained within WEKA, or any pretrained model in the model zoo for this task; try a few and see which works best for your task!

Concatenating Activations

By default the filter takes features from the final dense/pooling layer of the model (before the classification layer). You can also take features from any intermediary layer and concatenate them.

Activation Pooling

An important parameter when using intermediate layers is the filter's PoolingType. Activations from intermediate layers are often 3-dimensional for a given instance, so they need to be reduced into a 1-dimensional vector. There are 4 pooling methods currently supported:

- PoolingType.MAX (default)
- PoolingType.AVG
- PoolingType.SUM
- PoolingType.MIN

These pool the 2nd and 3rd dimension into a single value, i.e., activations of [512, 26, 26] (512 26x26 feature maps) are pooled into shape [512]. You can also specify PoolingType.NONE which simply flattens the extra dimensions (aforementioned example would become shape [346112]).
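The pooling modes above can be illustrated with a small plain-Java sketch. It is not the filter's actual implementation: the class name, the local PoolingType enum, and the array layout ([channels][height][width]) are assumptions chosen to mirror the description.

```java
import java.util.Arrays;

/** Sketch of the pooling modes, applied to [channels][height][width] activations. */
class PoolingSketch {

    /** Local stand-in for the filter's PoolingType (assumed to mirror its values). */
    enum PoolingType { MAX, AVG, SUM, MIN, NONE }

    /** Reduces a [C][H][W] activation volume to a 1-dimensional feature vector. */
    static double[] pool(double[][][] activations, PoolingType type) {
        int channels = activations.length;
        int height = activations[0].length;
        int width = activations[0][0].length;

        if (type == PoolingType.NONE) {
            // No pooling: flatten every value, giving C * H * W features.
            double[] flat = new double[channels * height * width];
            int i = 0;
            for (double[][] featureMap : activations)
                for (double[] row : featureMap)
                    for (double v : row)
                        flat[i++] = v;
            return flat;
        }

        // Pooling: one value per feature map, giving C features.
        double[] pooled = new double[channels];
        for (int c = 0; c < channels; c++) {
            double max = Double.NEGATIVE_INFINITY;
            double min = Double.POSITIVE_INFINITY;
            double sum = 0;
            for (double[] row : activations[c]) {
                for (double v : row) {
                    max = Math.max(max, v);
                    min = Math.min(min, v);
                    sum += v;
                }
            }
            switch (type) {
                case MAX: pooled[c] = max; break;
                case MIN: pooled[c] = min; break;
                case SUM: pooled[c] = sum; break;
                default:  pooled[c] = sum / (height * width); break; // AVG
            }
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Two 2x2 feature maps, i.e. activations of shape [2, 2, 2].
        double[][][] activations = {
            {{1, 2}, {3, 4}},
            {{0, -1}, {5, 2}}
        };
        System.out.println(Arrays.toString(pool(activations, PoolingType.MAX))); // [4.0, 5.0]
        System.out.println(Arrays.toString(pool(activations, PoolingType.AVG))); // [2.5, 1.5]
        // Flattening [512, 26, 26] yields 512 * 26 * 26 = 346112 features.
        System.out.println(pool(new double[512][26][26], PoolingType.NONE).length);
    }
}
```

The size difference is the practical point: pooling keeps one feature per feature map, whereas NONE can produce hundreds of thousands of attributes per instance.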

Default Feature Extraction Layer

During feature extraction, the output activations from the designated feature extraction layer are used to create the 'featurized' instances. All zoo models have a default feature extraction layer, which is typically the second-to-last layer in the model (e.g., Dl4jResNet50's default feature layer is set to flatten_1). The second-to-last layer tends to give the most meaningful activations, which is why it is the default (although you can use any intermediary layer).

PoolingType does not need to be specified when using the default activation layer: the outputs are already the correct dimensionality ([batch size, num activations]). If using an intermediary layer, the outputs will typically be of size [batch size, num channels, height, width], matching the [512, 26, 26] per-instance example above.

Example 1: Default MNIST Minimal

The following example walks through using a pretrained ResNet50 (from the Deeplearning4j model zoo) as a feature extractor on the MNIST dataset and fitting Weka's SMO classifier to the extracted features. This only takes 1-2 minutes on a modern CPU, which is much faster than training a neural network from scratch.

The process below is split into two steps: storing the featurized dataset, and fitting a Weka classifier to it. The two can be combined into a single command with a FilteredClassifier, but the split approach is more efficient because featurizing the dataset (the most expensive part of the operation) is done only once; with a FilteredClassifier and 10-fold CV it would be done 10 times. Saving the featurized dataset separately also makes it much faster to try out different Weka classifiers.
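To make the cost difference concrete, here is a small arithmetic sketch in plain Java. It assumes the filter caches nothing between folds, so that inside a FilteredClassifier every instance passes through the network once per fold (training split when the classifier is built, test split when it is evaluated). The class and method names are illustrative only.

```java
/** Counts model forward passes for the two workflows compared above. */
class FeaturizationCost {

    /** Featurize once, save the result, then cross-validate on the saved features. */
    static long featurizeOnce(int numInstances) {
        return numInstances;
    }

    /**
     * Filter wrapped in a FilteredClassifier (assumption: no caching between folds).
     * Each fold filters its training and test splits, i.e. the whole dataset.
     */
    static long featurizePerFold(int numInstances, int numFolds) {
        return (long) numInstances * numFolds;
    }

    public static void main(String[] args) {
        int numInstances = 420; // size of the minimal MNIST sample used in this tutorial
        int numFolds = 10;
        System.out.println("Featurize once:     " + featurizeOnce(numInstances));
        System.out.println("FilteredClassifier: " + featurizePerFold(numInstances, numFolds));
    }
}
```

Under these assumptions the combined approach needs 4200 forward passes instead of 420, and the gap widens with every additional classifier evaluated on the same features.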

Note that the first time this example is run it may need to download the pretrained weights, in which case actual runtime will be longer. These weights are cached locally so subsequent runs are much faster.

GUI

The first step is to open the MNIST meta ARFF file in the Weka Explorer Preprocess tab via Open File. A randomly sampled MNIST dataset of 420 images is provided in the WekaDeeplearning4j package for testing purposes ($WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff).

Then, select the Dl4jMlpFilter in the filter panel (within filters/unsupervised/attribute). Click in the properties box to open the filter settings.


To load the images correctly, select the ImageInstanceIterator as the instance iterator and point it to the MNIST directory that contains the actual image files ($WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal/). The width, height, and channels are set automatically based on the zoo model's input shape.

If running on CPU then set mini-batch size to your machine's thread count. If you run into memory issues then use a smaller mini-batch size. Click OK to exit the ImageInstanceIterator settings.
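If you are unsure of your machine's thread count, the JVM can report it. The following is a minimal sketch; the class name, the helper method, and the cap of 64 are illustrative assumptions, not part of WekaDeeplearning4j.

```java
/** Picks a mini-batch size from the number of logical processors. */
class BatchSizeSketch {

    /** Returns the thread count reported by the JVM, capped at maxBatchSize. */
    static int suggestBatchSize(int maxBatchSize) {
        int threads = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(threads, maxBatchSize));
    }

    public static void main(String[] args) {
        // 64 is an arbitrary cap; lower it if you run into memory issues.
        System.out.println("Suggested mini-batch size: " + suggestBatchSize(64));
    }
}
```

The printed value can be entered as the mini-batch size in the ImageInstanceIterator settings, or passed via -bs on the command line.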


Dl4jResNet50 is already selected as the feature extractor model, and will by default use the final dense layer activations as the image features. The other filter options can be left as default; they'll be explained in the next example.

Click OK and then Apply to begin processing your dataset. After completion, you should see your newly processed dataset!


Simply switch to the Classify tab to start applying different WEKA classifiers to your newly transformed dataset.

Commandline

This example assumes that weka.jar is in the classpath and the current working directory is the root of the WekaDeeplearning4j folder ($WEKA_HOME/packages/wekaDeeplearning4j).

Because we're using this model's default extraction layer (flatten_1), we can simply specify the -default-feature-layer flag instead of naming the layer. This is especially useful when trying a range of different zoo models, as it avoids specifying layer names for each one.

$ java -Xmx8g weka.Run \
    .Dl4jMlpFilter \
        -i datasets/nominal/mnist.meta.minimal.arff \
        -o mnist-rn50.arff \
        -c last \
        -decimal 20 \
        -iterator ".ImageInstanceIterator -imagesLocation datasets/nominal/mnist-minimal -bs 12" \
        -zooModel ".Dl4jResNet50" \
        -default-feature-layer

We now have a standard .arff file that any Weka classifier can be fit to, just like any other numerical dataset:

$ java weka.Run .SMO -t mnist-rn50.arff
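Because -default-feature-layer removes the need to name a layer, the same filter options can be generated for several zoo models. The sketch below only builds the option strings using the flags shown in the command above. Dl4jResNet50 comes from this tutorial; the other model names are assumed examples and should be checked against the model zoo before use.

```java
/** Builds Dl4jMlpFilter option strings for several zoo models. */
class ZooModelOptions {

    /** Options for one zoo model, relying on its default feature extraction layer. */
    static String filterOptions(String zooModel, String imagesLocation, int batchSize) {
        return "-iterator \".ImageInstanceIterator -imagesLocation " + imagesLocation
                + " -bs " + batchSize + "\""
                + " -zooModel \"." + zooModel + "\""
                + " -default-feature-layer";
    }

    public static void main(String[] args) {
        // Only Dl4jResNet50 appears in this tutorial; the others are assumptions.
        String[] zooModels = {"Dl4jResNet50", "Dl4jVGG", "Dl4jDarknet19"};
        for (String model : zooModels) {
            System.out.println(filterOptions(model, "datasets/nominal/mnist-minimal", 12));
        }
    }
}
```

Each string can then be split with weka.core.Utils.splitOptions and passed to the filter, as in the Java section of this example.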

Java

This uses reflection to load the filter so all the DL4J dependencies don't need to be on the CLASSPATH - as long as WekaDeeplearning4j is installed from the Package Manager, weka.core.WekaPackageManager.loadPackages will load the necessary libraries at runtime.

import java.io.FileReader;
import java.util.Random;

// Load all packages so that Dl4jMlpFilter class can be found using forName("weka.filters.unsupervised.attribute.Dl4jMlpFilter")
weka.core.WekaPackageManager.loadPackages(true);

// Load the dataset
weka.core.Instances instances = new weka.core.Instances(new FileReader("datasets/nominal/mnist.meta.minimal.arff"));
instances.setClassIndex(1);
String[] filterOptions = weka.core.Utils.splitOptions("-iterator \".ImageInstanceIterator -imagesLocation datasets/nominal/mnist-minimal -bs 12\" -poolingType AVG -layer-extract \".DenseLayer -name flatten_1\" -zooModel \".Dl4jResNet50\"");
weka.filters.Filter myFilter = (weka.filters.Filter) weka.core.Utils.forName(weka.filters.Filter.class, "weka.filters.unsupervised.attribute.Dl4jMlpFilter", filterOptions);

// Run the filter, using the model as a feature extractor
myFilter.setInputFormat(instances);
weka.core.Instances transformedInstances = weka.filters.Filter.useFilter(instances, myFilter);

// You could save the instances at this point to an arff file for rapid experimentation with other classifiers via:
// https://waikato.github.io/weka-wiki/formats_and_processing/save_instances_to_arff/

// Cross-validate an SMO classifier on the extracted features
weka.classifiers.evaluation.Evaluation evaluation = new weka.classifiers.evaluation.Evaluation(transformedInstances);
int numFolds = 10;
evaluation.crossValidateModel(new weka.classifiers.functions.SMO(), transformedInstances, numFolds, new Random(1));
System.out.println(evaluation.toSummaryString());
System.out.println(evaluation.toMatrixString());

Results

Using SMO gives us 89% accuracy. That is certainly not SOTA, but given the simplicity and speed of the method it's not bad! Note that the training dataset is very small (420 instances) and that the ResNet50 weights were trained on ImageNet, which is a very different domain from MNIST (classifying cars, animals, etc. vs. classifying handwritten digits).

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         374               89.0476 %
Incorrectly Classified Instances        46               10.9524 %
Kappa statistic                          0.8783
Mean absolute error                      0.1611
Root mean squared error                  0.2736
Relative absolute error                 89.4975 %
Root relative squared error             91.2069 %
Total Number of Instances              420     

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j   <-- classified as
 39  0  0  2  0  0  0  0  0  0 |  a = 0
  0 46  1  0  0  0  0  0  0  0 |  b = 1
  2  0 37  0  0  0  1  0  1  0 |  c = 2
  1  0  0 40  0  1  0  0  0  2 |  d = 3
  1  1  1  0 36  0  0  0  0  2 |  e = 4
  0  0  0  3  0 33  0  1  1  0 |  f = 5
  0  0  0  0  0  0 41  0  0  0 |  g = 6
  0  0  0  2  1  0  0 38  0  3 |  h = 7
  1  0  0  2  0  4  0  1 32  1 |  i = 8
  1  0  1  0  3  1  0  4  0 32 |  j = 9
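The accuracy and Kappa statistic in the summary can be re-derived from the confusion matrix, which is a useful sanity check when comparing runs. The following plain-Java sketch does this for the matrix above; the class and method names are illustrative and it does not use Weka's Evaluation class.

```java
/** Re-derives accuracy and Cohen's kappa from a confusion matrix. */
class ConfusionMatrixStats {

    /** Confusion matrix from Example 1 (rows = actual class, columns = predicted). */
    static final int[][] EXAMPLE_1 = {
        {39, 0, 0, 2, 0, 0, 0, 0, 0, 0},
        {0, 46, 1, 0, 0, 0, 0, 0, 0, 0},
        {2, 0, 37, 0, 0, 0, 1, 0, 1, 0},
        {1, 0, 0, 40, 0, 1, 0, 0, 0, 2},
        {1, 1, 1, 0, 36, 0, 0, 0, 0, 2},
        {0, 0, 0, 3, 0, 33, 0, 1, 1, 0},
        {0, 0, 0, 0, 0, 0, 41, 0, 0, 0},
        {0, 0, 0, 2, 1, 0, 0, 38, 0, 3},
        {1, 0, 0, 2, 0, 4, 0, 1, 32, 1},
        {1, 0, 1, 0, 3, 1, 0, 4, 0, 32}
    };

    /** Fraction of instances on the diagonal. */
    static double accuracy(int[][] m) {
        long correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return (double) correct / total;
    }

    /** Cohen's kappa: (observed - expected agreement) / (1 - expected agreement). */
    static double kappa(int[][] m) {
        int n = m.length;
        long[] rowSums = new long[n];
        long[] colSums = new long[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSums[i] += m[i][j];
                colSums[j] += m[i][j];
                total += m[i][j];
            }
        }
        double expected = 0;
        for (int i = 0; i < n; i++) {
            expected += (double) rowSums[i] * colSums[i];
        }
        expected /= (double) total * total;
        double observed = accuracy(m);
        return (observed - expected) / (1 - expected);
    }

    public static void main(String[] args) {
        System.out.printf("Accuracy: %.4f %%%n", 100 * accuracy(EXAMPLE_1)); // 89.0476 %
        System.out.printf("Kappa:    %.4f%n", kappa(EXAMPLE_1));             // 0.8783
    }
}
```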

Example 2: MNIST Using Activation Layer Concatenation and Pooling

This example shows concatenating the activations from an intermediary convolution layer (res4a_branch2b) to the default activations (from layer flatten_1) and using PoolingType.AVG to average pool the extra dimensions from res4a_branch2b.

Check out the model summary to look at other layers you could use.
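Conceptually, the featurized instance is the default layer's activations joined with the pooled activations of the intermediary layer. The sketch below shows that idea with tiny hypothetical arrays standing in for the real layer outputs; it is not the filter's implementation, and the sizes are chosen only for readability.

```java
import java.util.Arrays;

/** Sketch of concatenating default-layer features with pooled intermediate activations. */
class ConcatSketch {

    /** Average-pools [C][H][W] activations to one value per channel. */
    static double[] averagePool(double[][][] activations) {
        double[] pooled = new double[activations.length];
        for (int c = 0; c < activations.length; c++) {
            double sum = 0;
            int count = 0;
            for (double[] row : activations[c]) {
                for (double v : row) {
                    sum += v;
                    count++;
                }
            }
            pooled[c] = sum / count;
        }
        return pooled;
    }

    /** Concatenates feature vectors in the order given. */
    static double[] concat(double[]... parts) {
        int length = 0;
        for (double[] part : parts) length += part.length;
        double[] result = new double[length];
        int offset = 0;
        for (double[] part : parts) {
            System.arraycopy(part, 0, result, offset, part.length);
            offset += part.length;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for flatten_1: already 1-dimensional.
        double[] defaultFeatures = {0.5, 0.1, 0.9};
        // Hypothetical stand-in for res4a_branch2b: 2 feature maps of 2x2.
        double[][][] intermediate = {
            {{1, 2}, {3, 4}},
            {{0, -1}, {5, 2}}
        };
        double[] features = concat(defaultFeatures, averagePool(intermediate));
        System.out.println(Arrays.toString(features)); // [0.5, 0.1, 0.9, 2.5, 1.5]
    }
}
```

The order of the arguments to concat determines the order of the resulting features, in the same way that the order of the feature extraction layers determines the attribute order in the filter's output.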

GUI

The first step is to open the MNIST meta ARFF file in the Weka Explorer Preprocess tab via Open File. A randomly sampled MNIST dataset of 420 images is provided in the WekaDeeplearning4j package for testing purposes ($WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist.meta.minimal.arff).

Then, select the Dl4jMlpFilter in the filter panel. Click in the properties box to open the filter settings.


To load the images correctly, select the ImageInstanceIterator as the instance iterator and point it to the MNIST directory that contains the actual image files ($WEKA_HOME/packages/wekaDeeplearning4j/datasets/nominal/mnist-minimal/).

If you run into memory issues then use a smaller mini-batch size.


Dl4jResNet50 is already selected as the feature extractor model. To add res4a_branch2b as another feature extraction layer, we first set the Use default feature layer flag to false - if this is not done, only the default extraction layer will be used. To edit the Feature extraction layers property, click the property to open the array editor, and click the DenseLayer specification to open the editor for our new layer.


When adding another feature extraction layer, only the layer name property needs to be set. Set the layer name property to res4a_branch2b, then click OK and Add to add the newly created layer.

Note that the order of feature extraction layers may have an effect on accuracy obtained by a WEKA classifier.

Because we're using activations from an intermediary layer now, the Pooling Type argument will be used (as explained at the beginning of this tutorial). The default type is MAX, but for the sake of this tutorial we're going to use AVG, so change the Pooling Type property to AVG.

Click OK on the filter properties and then Apply to begin processing your dataset. After completion, you should see your newly processed dataset! The attributes are named after the layer they were derived from, so you can investigate which layer provides the most informative features (e.g., using the Select Attributes panel in WEKA).


Simply switch to the Classify tab to start applying different WEKA classifiers (this example uses SMO) to the newly transformed dataset.

Commandline

$ java -Xmx8g weka.Run \
    .Dl4jMlpFilter \
        -i datasets/nominal/mnist.meta.minimal.arff \
        -o mnist-rn50-concat.arff \
        -c last \
        -decimal 20 \
        -iterator ".ImageInstanceIterator -imagesLocation datasets/nominal/mnist-minimal -bs 16" \
        -poolingType AVG \
        -zooModel ".Dl4jResNet50" \
        -layer-extract ".DenseLayer -name res4a_branch2b" \
        -layer-extract ".DenseLayer -name flatten_1"

We now have a standard .arff file that any Weka classifier can be fit to, just like any other numerical dataset:

$ java weka.Run .SMO -t mnist-rn50-concat.arff

Java

import java.io.FileReader;
import java.util.Random;

// Load all packages so that Dl4jMlpFilter class can be found using forName("weka.filters.unsupervised.attribute.Dl4jMlpFilter")
weka.core.WekaPackageManager.loadPackages(true);

// Load the dataset
weka.core.Instances instances = new weka.core.Instances(new FileReader("datasets/nominal/mnist.meta.minimal.arff"));
instances.setClassIndex(1);
String[] filterOptions = weka.core.Utils.splitOptions("-iterator \".ImageInstanceIterator -imagesLocation datasets/nominal/mnist-minimal -bs 12\" -poolingType AVG -layer-extract \".DenseLayer -name flatten_1\" -layer-extract \".DenseLayer -name res4a_branch2b\" -zooModel \".Dl4jResNet50\"");
weka.filters.Filter myFilter = (weka.filters.Filter) weka.core.Utils.forName(weka.filters.Filter.class, "weka.filters.unsupervised.attribute.Dl4jMlpFilter", filterOptions);

// Run the filter, using the model as a feature extractor
myFilter.setInputFormat(instances);
weka.core.Instances transformedInstances = weka.filters.Filter.useFilter(instances, myFilter);

// You could save the instances at this point to an arff file for rapid experimentation with other classifiers via:
// https://waikato.github.io/weka-wiki/formats_and_processing/save_instances_to_arff/

// Cross-validate an SMO classifier on the extracted features
weka.classifiers.evaluation.Evaluation evaluation = new weka.classifiers.evaluation.Evaluation(transformedInstances);
int numFolds = 10;
evaluation.crossValidateModel(new weka.classifiers.functions.SMO(), transformedInstances, numFolds, new Random(1));
System.out.println(evaluation.toSummaryString());
System.out.println(evaluation.toMatrixString());

Results

Adding this extra layer increased the accuracy very slightly, from 89.05% (374 of 420 correct) to 89.52% (376 of 420 correct). Try playing around with some other layers, classifiers, and pooling types to try to improve the accuracy even further.

=== Stratified cross-validation ===

Correctly Classified Instances         376               89.5238 %
Incorrectly Classified Instances        44               10.4762 %
Kappa statistic                          0.8836
Mean absolute error                      0.1611
Root mean squared error                  0.2736
Relative absolute error                 89.521  %
Root relative squared error             91.2207 %
Total Number of Instances              420     

=== Confusion Matrix ===

  a  b  c  d  e  f  g  h  i  j   <-- classified as
 39  0  1  1  0  0  0  0  0  0 |  a = 0
  0 46  1  0  0  0  0  0  0  0 |  b = 1
  1  0 37  1  0  0  1  0  1  0 |  c = 2
  1  0  1 39  0  1  0  0  0  2 |  d = 3
  0  1  1  0 37  0  0  0  0  2 |  e = 4
  0  0  0  2  0 34  0  1  1  0 |  f = 5
  1  0  0  0  0  0 39  0  1  0 |  g = 6
  0  0  0  1  1  0  0 39  1  2 |  h = 7
  1  0  1  2  0  1  0  1 33  2 |  i = 8
  1  0  0  0  3  1  0  4  0 33 |  j = 9