
13.4. The Object Detection Dataset

Object detection has no small benchmark datasets comparable to MNIST or Fashion-MNIST. To test models quickly, we assemble a small dataset of our own. First, we generate 1000 banana images at different angles and sizes using free bananas from our office. Then, we collect a set of background images and place a banana image at a random position on each one.
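To make the synthesis step concrete, here is a minimal sketch in plain Java of pasting an object image onto a background at a random position and deriving the matching label. The class `SyntheticSample` and the method `pasteAtRandomPosition` are hypothetical names chosen for illustration; they are not part of DJL and are not the script that produced the banana dataset. The sketch assumes class index 0 for the single object class and corner coordinates normalized by the background size.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of DJL. */
final class SyntheticSample {

    /**
     * Draws {@code object} onto {@code background} (modified in place) at a random
     * position where it fits entirely, and returns the label
     * {class, xMin, yMin, xMax, yMax} with coordinates normalized to [0, 1].
     */
    static float[] pasteAtRandomPosition(
            BufferedImage background, BufferedImage object, Random rng) {
        int maxX = background.getWidth() - object.getWidth();
        int maxY = background.getHeight() - object.getHeight();
        if (maxX < 0 || maxY < 0) {
            throw new IllegalArgumentException("object is larger than the background");
        }
        // Upper-left corner of the pasted object, in pixels.
        int x = rng.nextInt(maxX + 1);
        int y = rng.nextInt(maxY + 1);

        Graphics2D g = background.createGraphics();
        g.drawImage(object, x, y, null);
        g.dispose();

        float w = background.getWidth();
        float h = background.getHeight();
        // Class 0 is an assumption: the dataset has a single object class.
        return new float[] {
            0f, x / w, y / h, (x + object.getWidth()) / w, (y + object.getHeight()) / h
        };
    }

    public static void main(String[] args) {
        // Stand-ins for a real background photo and a real banana cutout.
        BufferedImage background = new BufferedImage(256, 256, BufferedImage.TYPE_INT_RGB);
        BufferedImage object = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = object.createGraphics();
        g.setColor(Color.YELLOW);
        g.fillRect(0, 0, 64, 48);
        g.dispose();

        float[] label = pasteAtRandomPosition(background, object, new Random());
        System.out.println(Arrays.toString(label));
    }
}
```

Because the paste position is known at generation time, the bounding box label comes for free; no manual annotation is needed.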

13.4.1. Downloading the Dataset

The banana detection dataset in RecordIO format can be downloaded directly from the Internet.

%load ../utils/djl-imports
import ai.djl.basicdataset.cv.*;
import java.awt.*;
import java.awt.image.*;
import java.util.List;
import javax.swing.*;
import ai.djl.modality.cv.output.Rectangle;

13.4.2. Reading the Dataset

We read the object detection dataset by creating a BananaDetection instance. DJL makes it fairly easy to get the dataset. Here is how we do it.

// Load the bananas dataset.
BananaDetection trainIter = BananaDetection.builder()
        .setSampling(32, true)  // Batch size 32; read the dataset in random order
        .optUsage(Dataset.Usage.TRAIN)
        .build();

trainIter.prepare();

Below, we read a minibatch and print the shapes of the image and the label. The shape of the image is the same as in the previous experiment: (batch size, number of channels, height, width). The shape of the label is (batch size, \(m\), 5), where \(m\) is the maximum number of bounding boxes contained in any single image in the dataset.

Minibatch computation is efficient, but it requires every image to contain the same number of bounding boxes so that the labels can be stacked into one batch. Since images may have different numbers of bounding boxes, we add illegal (filler) bounding boxes to images that have fewer than \(m\) bounding boxes until every image contains exactly \(m\). This lets us read a whole minibatch of images at a time.

The label of each bounding box is an array of length 5. The first element is the category of the object contained in the bounding box. A value of -1 marks an illegal bounding box that was added only as filler. The remaining four elements are the \(x, y\) coordinates of the upper-left corner and the \(x, y\) coordinates of the lower-right corner of the bounding box, each in the range from 0 to 1. The banana dataset here has only one bounding box per image, so \(m=1\).
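The padding scheme described above can be sketched in plain Java as follows. The class `LabelPadding` and the method `padBoxes` are hypothetical names used only for illustration; they are not DJL APIs, and BananaDetection does not need them because \(m=1\). Setting the coordinates of filler rows to 0 is an assumption of this sketch; only the class value -1 is specified by the text.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of DJL. */
final class LabelPadding {

    /** Values per box: class, xMin, yMin, xMax, yMax. */
    static final int BOX_SIZE = 5;

    /**
     * Pads the boxes of one image to exactly {@code m} rows. Filler rows get
     * class -1 so that later code can recognize and ignore them.
     */
    static float[][] padBoxes(float[][] boxes, int m) {
        if (boxes.length > m) {
            throw new IllegalArgumentException("image has more than m boxes");
        }
        float[][] padded = new float[m][BOX_SIZE];
        for (int i = 0; i < m; i++) {
            if (i < boxes.length) {
                padded[i] = Arrays.copyOf(boxes[i], BOX_SIZE);
            } else {
                // Filler box; its coordinates stay 0 (an assumption of this sketch).
                padded[i][0] = -1f;
            }
        }
        return padded;
    }

    public static void main(String[] args) {
        // An image with two real boxes, padded to m = 3.
        float[][] twoBoxes = {
            {0f, 0.1f, 0.2f, 0.5f, 0.6f},
            {0f, 0.4f, 0.4f, 0.9f, 0.8f}
        };
        System.out.println(Arrays.deepToString(padBoxes(twoBoxes, 3)));
    }
}
```

After padding, the labels of every image have shape (\(m\), 5), so a batch of them stacks into the (batch size, \(m\), 5) shape described above.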

NDManager manager = NDManager.newBaseManager();

Batch batch = trainIter.getData(manager).iterator().next();
System.out.println(batch.getData().get(0).getShape() + ", " + batch.getLabels().get(0).getShape());
(32, 3, 256, 256), (32, 1, 5)

13.4.3. Demonstration

Below, we display ten images with their bounding boxes. We can see that the angle, size, and position of the banana differ from image to image. Of course, this is a simple artificial dataset. In practice, the data are usually much more complicated.

public static BufferedImage showImages(Image[] dataset, int width, int height) {
    int col = 1280 / width;
    int row = (dataset.length + col - 1) / col;
    int w = col * (width + 3);
    int h = row * (height + 3);
    BufferedImage bi = new BufferedImage(w + 3, h, BufferedImage.TYPE_INT_RGB);
    Graphics2D g = bi.createGraphics();

    for (int i = 0; i < dataset.length; i++) {
        Image image = dataset[i];
        BufferedImage img = (BufferedImage) image.getWrappedImage();
        int x = (i % col) * (width + 3) + 3;
        int y = (i / col) * (height + 3) + 3;
        g.drawImage(img, x, y, width, height, null);
    }
    g.dispose();
    return bi;
}

// Collect ten images together with their class names, probabilities, and boxes.
Image[] imageArr = new Image[10];
List<List<String>> classNames = new ArrayList<>();
List<List<Double>> prob = new ArrayList<>();
List<List<BoundingBox>> boxes = new ArrayList<>();

Batch batch = trainIter.getData(manager).iterator().next();
for (int i = 0; i < 10; i++) {
    NDArray imgData = batch.getData().get(0).get(i);
    imgData.muli(255);  // Rescale pixel values to the 0-255 range (in place)
    NDArray imgLabel = batch.getLabels().get(0).get(i);

    List<String> bananaList = new ArrayList<>();
    bananaList.add("banana");
    classNames.add(new ArrayList<>(bananaList));

    List<Double> probabilityList = new ArrayList<>();
    probabilityList.add(1.0);
    prob.add(new ArrayList<>(probabilityList));

    List<BoundingBox> boundBoxes = new ArrayList<>();

    // Label layout: (class, xMin, yMin, xMax, yMax), coordinates between 0 and 1.
    float[] coord = imgLabel.get(0).toFloatArray();
    double xMin = coord[1];
    double yMin = coord[2];
    double xMax = coord[3];
    double yMax = coord[4];

    // Rectangle takes (x, y, width, height), so convert from corner format.
    boundBoxes.add(new Rectangle(xMin, yMin, xMax - xMin, yMax - yMin));

    boxes.add(new ArrayList<>(boundBoxes));
    DetectedObjects detectedObjects = new DetectedObjects(classNames.get(i), prob.get(i), boxes.get(i));
    imageArr[i] = ImageFactory.getInstance().fromNDArray(imgData.toType(DataType.INT8, true));
    imageArr[i].drawBoundingBoxes(detectedObjects);
}

// refer to https://github.com/deepjavalibrary/d2l-java/tree/master/documentation/troubleshoot.md
// if you encounter X11 errors when drawing bounding boxes.
showImages(imageArr, 256, 256)
../_images/output_object-detection-dataset_42a473_10_0.png
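The demonstration relies on two conventions: filler rows carry class -1, and labels store normalized corner coordinates while a rectangle is described by its upper-left corner plus width and height. For datasets with \(m>1\), both come into play when labels are decoded. The following is a minimal plain-Java sketch; `LabelDecoding` and `toPixelRectangles` are hypothetical names, not DJL APIs, and rounding to whole pixels is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of DJL. */
final class LabelDecoding {

    /**
     * Converts label rows {class, xMin, yMin, xMax, yMax} (normalized to [0, 1])
     * into pixel rectangles {x, y, width, height}, skipping filler rows.
     */
    static List<int[]> toPixelRectangles(float[][] labels, int imageWidth, int imageHeight) {
        List<int[]> rectangles = new ArrayList<>();
        for (float[] row : labels) {
            if (row[0] < 0) {
                continue; // class -1 marks a filler box
            }
            int x = Math.round(row[1] * imageWidth);
            int y = Math.round(row[2] * imageHeight);
            int width = Math.round((row[3] - row[1]) * imageWidth);
            int height = Math.round((row[4] - row[2]) * imageHeight);
            rectangles.add(new int[] {x, y, width, height});
        }
        return rectangles;
    }

    public static void main(String[] args) {
        float[][] labels = {
            {0f, 0.25f, 0.5f, 0.75f, 1.0f},
            {-1f, 0f, 0f, 0f, 0f} // filler row, ignored
        };
        for (int[] r : toPixelRectangles(labels, 256, 256)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```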

13.4.4. Summary

  • The banana detection dataset we synthesized can be used to test object detection models.

  • Reading data for object detection is similar to reading data for image classification. However, once bounding boxes are introduced, the label shape changes, and image augmentation (e.g., random cropping) must account for the boxes as well.