10.4. The Object Detection Dataset

The object detection field has no small datasets comparable to MNIST or Fashion-MNIST. To test models quickly, we are going to assemble a small dataset. First, we generate 1000 banana images at different angles and sizes using free bananas from our office. Then, we collect a series of background images and place a banana image at a random position on each one.
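
The script that generated the dataset is not shown here, so the following is only a minimal sketch of how a label could be derived when an object image is pasted at a random position on a background. The class `BananaPlacement` and the method `randomLabel` are hypothetical names, not part of DJL, and the class index 0 for "banana" is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper, not part of DJL or the original dataset tooling. */
final class BananaPlacement {

    /**
     * Picks a random top-left corner such that the object fits entirely
     * inside the background, then returns {class, xMin, yMin, xMax, yMax}
     * with all coordinates divided by the background size.
     */
    static float[] randomLabel(int bgW, int bgH, int objW, int objH, Random rng) {
        if (objW > bgW || objH > bgH) {
            throw new IllegalArgumentException("object must fit inside the background");
        }
        int x = rng.nextInt(bgW - objW + 1); // left edge in pixels
        int y = rng.nextInt(bgH - objH + 1); // top edge in pixels
        return new float[] {
            0f,                           // assumed class index for "banana"
            (float) x / bgW,              // xMin
            (float) y / bgH,              // yMin
            (float) (x + objW) / bgW,     // xMax
            (float) (y + objH) / bgH      // yMax
        };
    }

    public static void main(String[] args) {
        float[] label = randomLabel(256, 256, 64, 64, new Random());
        System.out.println(Arrays.toString(label));
    }
}
```

Because the corner is drawn from the range that keeps the object inside the background, every coordinate in the resulting label stays within 0 and 1.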

10.4.1. Downloading the Dataset

The banana detection dataset in RecordIO format can be downloaded directly from the Internet.

%mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/

%maven ai.djl:api:0.7.0-SNAPSHOT
%maven ai.djl:basicdataset:0.7.0-SNAPSHOT
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26

%maven ai.djl.mxnet:mxnet-engine:0.7.0-SNAPSHOT
%maven ai.djl.mxnet:mxnet-native-auto:1.7.0-a

import ai.djl.basicdataset.BananaDetection;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.modality.cv.output.BoundingBox;
import ai.djl.modality.cv.output.DetectedObjects;
import ai.djl.modality.cv.output.Rectangle;
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.DataType;
import ai.djl.training.dataset.Batch;
import ai.djl.training.dataset.Dataset;
import ai.djl.translate.TranslateException;

import javax.swing.*;
import java.awt.*;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

10.4.2. Reading the Dataset

We are going to read the object detection dataset by creating an instance of BananaDetection. DJL makes it fairly easy to get the dataset. Here is how we do it.

// Load the bananas dataset.
BananaDetection trainIter = BananaDetection.builder()
                                .setSampling(32, true)  // Batch size 32; read the dataset in random order
                                .optUsage(Dataset.Usage.TRAIN)
                                .build();

trainIter.prepare();

Below, we read a minibatch and print the shapes of the image and label. The shape of the image is the same as in the previous experiment: (batch size, number of channels, height, width). The shape of the label is (batch size, \(m\), 5), where \(m\) is the maximum number of bounding boxes contained in any single image in the dataset. Computing on minibatches is efficient, but it requires every image to carry the same number of bounding boxes so that the labels can be stacked into one array. Since images may contain different numbers of bounding boxes, we pad every image that has fewer than \(m\) bounding boxes with illegal bounding boxes until it has exactly \(m\). This lets us read a minibatch of images at a time.

The label of each bounding box is an array of length 5. The first element is the category of the object contained in the bounding box; a value of -1 marks an illegal bounding box that is used only for padding. The remaining four elements are the \(x, y\) coordinates of the upper-left corner followed by the \(x, y\) coordinates of the lower-right corner of the bounding box, each scaled to lie between 0 and 1. The banana dataset here has only one bounding box per image, so \(m=1\).

NDManager manager = NDManager.newBaseManager();

for (Batch batch : trainIter.getData(manager)){

    System.out.println(batch.getData().get(0).getShape() + ", " + batch.getLabels().get(0).getShape());
    break;
}
(32, 3, 256, 256), (32, 1, 5)
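
The padding scheme described above can be illustrated in plain Java. This is a sketch under assumptions rather than DJL's own implementation: `LabelPadding`, `padLabels`, and `countValid` are hypothetical names, and filling the coordinates of a padding row with -1 is an assumption (the text only fixes the category to -1).

```java
import java.util.Arrays;

/** Hypothetical helper illustrating label padding; not part of DJL. */
final class LabelPadding {

    static final float PAD_VALUE = -1f; // category -1 marks a padding box

    /**
     * Returns an (m, 5) array: the given boxes first, followed by
     * padding rows so that every image ends up with exactly m boxes.
     */
    static float[][] padLabels(float[][] boxes, int m) {
        if (boxes.length > m) {
            throw new IllegalArgumentException("more than m boxes in one image");
        }
        float[][] padded = new float[m][5];
        for (int i = 0; i < m; i++) {
            if (i < boxes.length) {
                if (boxes[i].length != 5) {
                    throw new IllegalArgumentException("each label needs 5 elements");
                }
                padded[i] = boxes[i].clone();
            } else {
                // Assumption: coordinates of a padding row are also set to -1.
                Arrays.fill(padded[i], PAD_VALUE);
            }
        }
        return padded;
    }

    /** Counts the rows whose category is not the padding marker. */
    static int countValid(float[][] labels) {
        int n = 0;
        for (float[] row : labels) {
            if (row[0] >= 0f) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        float[][] oneBox = {{0f, 0.1f, 0.2f, 0.5f, 0.6f}};
        System.out.println(Arrays.deepToString(padLabels(oneBox, 3)));
    }
}
```

A loss function would then skip rows whose category is -1, which is why the marker has to be a value that no real class index can take.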

10.4.3. Demonstration

We display ten images with their bounding boxes. The angle, size, and position of the banana differ from image to image. Of course, this is a simple artificial dataset; in practice, the data are usually much more complicated.

public static class ImagePanel extends JPanel {
        int SCALE;
        BufferedImage img;

        public ImagePanel() {
            this.SCALE = 1;
        }

        public ImagePanel(int scale, BufferedImage img) {
            this.SCALE = scale;
            this.img = img;
        }

        @Override
        protected void paintComponent(Graphics g) {
            super.paintComponent(g);  // Clear the panel before drawing
            Graphics2D g2d = (Graphics2D) g;
            g2d.scale(SCALE, SCALE);
            g2d.drawImage(this.img, 0, 0, this);
        }
}

public static class Container extends JPanel {
        public Container(String label) {
            setLayout(new BoxLayout(this, BoxLayout.Y_AXIS));
            JLabel l = new JLabel(label, JLabel.CENTER);
            l.setAlignmentX(Component.CENTER_ALIGNMENT);
            add(l);
        }

        public Container(String trueLabel, String predLabel) {
            setLayout(new BoxLayout(this, BoxLayout.Y_AXIS));
            JLabel l = new JLabel(trueLabel, JLabel.CENTER);
            l.setAlignmentX(Component.CENTER_ALIGNMENT);
            add(l);
            JLabel l2 = new JLabel(predLabel, JLabel.CENTER);
            l2.setAlignmentX(Component.CENTER_ALIGNMENT);
            add(l2);
        }
}

public static void showImages(Image[] dataset,
                                  int number, int WIDTH, int HEIGHT, int SCALE,
                                  NDManager manager)
            throws IOException, TranslateException {
        // Plot a list of images
        JFrame frame = new JFrame("");
        for (int record = 0; record < number; record++) {
            Image i = dataset[record];
            BufferedImage img = (BufferedImage) i.getWrappedImage();

            JPanel panel = new ImagePanel(SCALE, img);
            panel.setPreferredSize(new Dimension(WIDTH * SCALE, HEIGHT * SCALE));
            JPanel container = new Container("");
            container.add(panel);
            frame.getContentPane().add(container);
        }
        frame.getContentPane().setLayout(new FlowLayout());
        frame.pack();
        frame.setVisible(true);
}

Image[] imageArr = new Image[10];
List<List<String>> classNames = new ArrayList<>();
List<List<Double>> prob = new ArrayList<>();
List<List<BoundingBox>> boxes = new ArrayList<>();

for (Batch batch : trainIter.getData(manager)) {

    for (int i=0; i < 10; i++){
        NDArray imgData = batch.getData().get(0).get(i);
        imgData.muli(255);
        NDArray imgLabel = batch.getLabels().get(0).get(i);

        List<String> bananaList = new ArrayList<>();
        bananaList.add("banana");
        classNames.add(new ArrayList<>(bananaList));

        List<Double> probabilityList = new ArrayList<>();
        probabilityList.add(1.0);
        prob.add(new ArrayList<>(probabilityList));

        List<BoundingBox> boundBoxes = new ArrayList<>();

        float[] coord = imgLabel.get(0).toFloatArray();
        double first = (double) (coord[1]);
        double second = (double) (coord[2]);
        double third = (double) (coord[3]);
        double fourth = (double) (coord[4]);

        boundBoxes.add(new Rectangle(first, second, (third-first), (fourth-second)));

        boxes.add(new ArrayList<>(boundBoxes));
        DetectedObjects detectedObjects = new DetectedObjects(classNames.get(i), prob.get(i), boxes.get(i));
        imageArr[i] = ImageFactory.getInstance().fromNDArray(imgData.toType(DataType.INT8, true));
        imageArr[i].drawBoundingBoxes(detectedObjects);
    }
    break;
}

// refer to https://github.com/aws-samples/d2l-java/tree/master/documentation/troubleshoot.md
// if you encounter X11 errors when drawing bounding boxes.
showImages(imageArr, 10, 256, 256, 1, manager)
https://d2l-java-resources.s3.amazonaws.com/img/object_detection.png

Fig. 10.4.1 Banana images with their bounding boxes.
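
The demonstration code converts each label from corner form (upper-left and lower-right corners) into an origin plus width and height before building a Rectangle. That arithmetic can be isolated as below. `BoxConversion`, `toXywh`, and `toPixels` are hypothetical names used only for illustration, and the pixel conversion is an extra step not taken in the code above, shown under the assumption that one wants integer pixel coordinates.

```java
import java.util.Arrays;

/** Hypothetical helper for bounding-box arithmetic; not part of DJL. */
final class BoxConversion {

    /**
     * Converts a label {class, xMin, yMin, xMax, yMax} into
     * {x, y, width, height}, still in normalized coordinates.
     */
    static double[] toXywh(float[] label) {
        if (label.length != 5) {
            throw new IllegalArgumentException("label needs 5 elements");
        }
        double xMin = label[1];
        double yMin = label[2];
        double xMax = label[3];
        double yMax = label[4];
        return new double[] {xMin, yMin, xMax - xMin, yMax - yMin};
    }

    /** Scales a normalized {x, y, width, height} box to rounded pixel units. */
    static int[] toPixels(double[] xywh, int imgW, int imgH) {
        return new int[] {
            (int) Math.round(xywh[0] * imgW),
            (int) Math.round(xywh[1] * imgH),
            (int) Math.round(xywh[2] * imgW),
            (int) Math.round(xywh[3] * imgH)
        };
    }

    public static void main(String[] args) {
        float[] label = {0f, 0.25f, 0.5f, 0.75f, 1.0f};
        double[] box = toXywh(label);
        System.out.println(Arrays.toString(box));
        System.out.println(Arrays.toString(toPixels(box, 256, 256)));
    }
}
```

Keeping the box normalized until the last moment means the same label stays valid if the image is resized.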

10.4.4. Summary

  • The banana detection dataset we synthesized can be used to test object detection models.

  • The data reading for object detection is similar to that for image classification. However, once bounding boxes are introduced, the label shape and image augmentation (e.g., random cropping) change.