
# 2.1. Data Manipulation

In order to get anything done, we need some way to store and manipulate
data. Generally, there are two important things we need to do with data:
(i) acquire them; and (ii) process them once they are inside the
computer. There is no point in acquiring data without some way to store
it, so let us get our hands dirty first by playing with synthetic data.
To start, we introduce the \(n\)-dimensional array (`ndarray`),
MXNet’s primary tool for storing and transforming data. In MXNet,
`ndarray` is a class and we call any instance “an `ndarray`”.

If you have worked with NumPy, the most widely-used scientific computing
package in Python, then you will find this section familiar. That’s by
design. We designed MXNet’s `ndarray` to be an extension to NumPy’s
`ndarray` with a few killer features. First, MXNet’s `ndarray`
supports asynchronous computation on CPU, GPU, and distributed cloud
architectures, whereas NumPy only supports CPU computation. Second,
MXNet’s `ndarray` supports automatic differentiation. These properties
make MXNet’s `ndarray` suitable for deep learning. Throughout the
book, when we say `ndarray`, we are referring to MXNet’s `ndarray`
unless otherwise stated.

## 2.1.1. Getting Started

In this section, we aim to get you up and running, equipping you with the basic math and numerical computing tools that you will build on as you progress through the book. Do not worry if you struggle to grok some of the mathematical concepts or library functions. The following sections will revisit this material in the context of practical examples and it will sink in. On the other hand, if you already have some background and want to go deeper into the mathematical content, just skip this section.

To start, we import the `api` and `mxnet-engine` modules from the Deep
Java Library (DJL), which is published on Maven. The `api` module
includes all the high-level Java APIs that will be used for data
processing, training, and inference. The `mxnet-engine` module
implements those high-level APIs on top of the Apache MXNet framework.
In DJL's automatic engine mode, the MXNet native libraries, whose basic
operations and functions are implemented in C++, are downloaded
automatically the first time DJL is used.

```
%load ../utils/djl-imports
```

An `ndarray` represents a (possibly multi-dimensional) array of
numerical values. With one axis, an `ndarray` corresponds (in math) to
a *vector*. With two axes, an `ndarray` corresponds to a *matrix*.
Arrays with more than two axes do not have special mathematical
names; we simply call them *tensors*.

To start, we can use `arange` to create a row vector `x` containing
the first \(12\) integers starting with \(0\). Because we pass an
integer argument, the elements are created as 32-bit integers
(`int32`), as the output below shows. Each of the values in an
`ndarray` is called an *element* of the `ndarray`. For instance, there
are \(12\) elements in the `ndarray` `x`. Unless otherwise specified,
a new `ndarray` is placed on DJL's default device: the CPU, or the
first GPU when one is available, which is why the outputs in this
section report `gpu(0)`.

```
NDManager manager = NDManager.newBaseManager();
var x = manager.arange(12);
x
```

```
ND: (12) gpu(0) int32
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Here we are using an
[`NDManager`](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDManager.html)
to create the `ndarray` `x`. `NDManager` implements the
`AutoCloseable` interface and manages the life cycles of the
`ndarray`s it creates. This is needed to manage native memory, which
the Java garbage collector does not control. We usually wrap an
`NDManager` in a try-with-resources block so that all of its
`ndarray`s are closed in time. To learn more about memory management,
read DJL’s documentation.

```
// try-with-resources closes the manager, and the ndarrays it created, on exit
try (NDManager manager = NDManager.newBaseManager()) {
    NDArray x = manager.arange(12);
}
```
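The closing behavior itself is plain Java and can be observed without DJL. The sketch below uses a hypothetical `TrackedBuffer` class (not part of DJL) as a stand-in for an object that owns native memory. It only illustrates that `close()` runs automatically when a try-with-resources block exits; it makes no claim about how `NDManager` is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an object that owns native memory; not a DJL class.
class TrackedBuffer implements AutoCloseable {
    private final String name;
    private final List<String> events;

    TrackedBuffer(String name, List<String> events) {
        this.name = name;
        this.events = events;
        events.add("open " + name);
    }

    @Override
    public void close() {
        // A real resource would release its native memory here.
        events.add("close " + name);
    }

    // Opens a buffer, uses it, and returns the recorded sequence of events.
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        try (TrackedBuffer buffer = new TrackedBuffer("x", events)) {
            events.add("use " + buffer.name);
        } // close() is called here, even if the body throws
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [open x, use x, close x]
    }
}
```

The resource is released as soon as the block ends, rather than whenever the garbage collector happens to run.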

We can access an `ndarray`’s *shape* (the length along each axis) by
calling its `getShape()` method.

```
x.getShape()
```

```
(12)
```

If we just want to know the total number of elements in an `ndarray`,
i.e., the product of all of the shape elements, we can call its
`size()` method. Because we are dealing with a vector here, the single
element of its shape is identical to its size.

```
x.size()
```

```
12
```

To change the shape of an `ndarray` without altering either the number
of elements or their values, we can invoke the `reshape` function. For
example, we can transform our `ndarray`, `x`, from a row vector with
shape (\(12\),) to a matrix with shape (\(3\), \(4\)). This
new `ndarray` contains the exact same values, but views them as a
matrix organized as \(3\) rows and \(4\) columns. To reiterate,
although the shape has changed, the elements in `x` have not. Note
that the `size` is unaltered by reshaping.

```
x = x.reshape(3, 4);
x
```

```
ND: (3, 4) gpu(0) int32
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
]
```
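The output above fills the matrix one row at a time. As a minimal plain-Java sketch of that row-by-row layout, the hypothetical helper below places flat element \(k\) at row \(k / 4\) and column \(k \bmod 4\). It copies the data into a new array for simplicity, which is an assumption of the sketch and not a statement about what DJL's `reshape` does internally.

```java
import java.util.Arrays;

class RowMajorReshape {
    // Copies a flat array into a rows-by-cols matrix, filling one row at a time.
    static int[][] reshape(int[] flat, int rows, int cols) {
        if (rows * cols != flat.length) {
            throw new IllegalArgumentException("shape does not match element count");
        }
        int[][] matrix = new int[rows][cols];
        for (int k = 0; k < flat.length; k++) {
            matrix[k / cols][k % cols] = flat[k]; // element k lands in row k / cols
        }
        return matrix;
    }

    public static void main(String[] args) {
        int[] x = new int[12];
        for (int k = 0; k < x.length; k++) {
            x[k] = k;
        }
        System.out.println(Arrays.deepToString(reshape(x, 3, 4)));
        // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    }
}
```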

Reshaping by manually specifying every dimension is unnecessary. If our
target shape is a matrix with shape (height, width), then after we know
the width, the height is given implicitly. Why should we have to perform
the division ourselves? In the example above, to get a matrix with
\(3\) rows, we specified both that it should have \(3\) rows and
\(4\) columns. Fortunately, `ndarray` can automatically work out
one dimension given the rest. We invoke this capability by placing
`-1` for the dimension that we would like `ndarray` to automatically
infer. In our case, instead of calling `x.reshape(3, 4)`, we could
have equivalently called `x.reshape(-1, 4)` or `x.reshape(3, -1)`.
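The arithmetic behind the `-1` placeholder is a single division. The following plain-Java sketch shows one way it could be computed. The class and method names are hypothetical and the error handling is an assumption; it is an illustration of the idea, not DJL's code.

```java
import java.util.Arrays;

class ReshapeInference {
    // Returns `target` with its single -1 entry (if any) replaced by the implied length.
    static long[] inferShape(long totalElements, long... target) {
        long known = 1;       // product of the explicitly given dimensions
        int unknownAxis = -1; // position of the -1 placeholder, if present
        for (int i = 0; i < target.length; i++) {
            if (target[i] == -1) {
                if (unknownAxis != -1) {
                    throw new IllegalArgumentException("at most one dimension may be -1");
                }
                unknownAxis = i;
            } else {
                known *= target[i];
            }
        }
        long[] shape = target.clone();
        if (unknownAxis == -1) {
            if (known != totalElements) {
                throw new IllegalArgumentException("shape does not match element count");
            }
        } else {
            if (known == 0 || totalElements % known != 0) {
                throw new IllegalArgumentException("cannot infer the missing dimension");
            }
            shape[unknownAxis] = totalElements / known;
        }
        return shape;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(inferShape(12, -1, 4))); // [3, 4]
        System.out.println(Arrays.toString(inferShape(12, 3, -1))); // [3, 4]
    }
}
```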

Calling the `create` method with only a `Shape` grabs a chunk of
memory and hands us back a matrix without bothering to change the value
of any of its entries. This is remarkably efficient, but we must be
careful because the entries might take arbitrary values, including very
big ones!

```
manager.create(new Shape(3, 4))
```

```
ND: (3, 4) gpu(0) float32
[[ 1.12103877e-44, 1.26116862e-44, 1.40129846e-44, 1.54142831e-44],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],
]
```

Typically, we will want our matrices initialized either with zeros,
ones, some other constants, or numbers randomly sampled from a specific
distribution. We can create an `ndarray` representing a tensor with
all elements set to \(0\) and a shape of (\(2\), \(3\),
\(4\)) as follows:

```
manager.zeros(new Shape(2, 3, 4))
```

```
ND: (2, 3, 4) gpu(0) float32
[[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
],
]
```

Similarly, we can create tensors with each element set to 1 as follows:

```
manager.ones(new Shape(2, 3, 4))
```

```
ND: (2, 3, 4) gpu(0) float32
[[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
],
]
```

Often, we want to randomly sample the values for each element in an
`ndarray` from some probability distribution. For example, when we
construct arrays to serve as parameters in a neural network, we will
typically initialize their values randomly. The following snippet
creates an `ndarray` with shape (\(3\), \(4\)). Each of its
elements is randomly sampled from a standard Gaussian (normal)
distribution with a mean of \(0\) and a standard deviation of
\(1\).

```
manager.randomNormal(0f, 1f, new Shape(3, 4), DataType.FLOAT32)
```

```
ND: (3, 4) gpu(0) float32
[[ 0.2925, -0.7184, 0.1 , -0.3932],
[ 2.547 , -0.0034, 0.0083, -0.251 ],
[ 0.129 , 0.3728, 1.0822, -0.665 ],
]
```

You can also just pass the shape and it will use default values for mean and standard deviation (0 and 1).

```
manager.randomNormal(new Shape(3, 4))
```

```
ND: (3, 4) gpu(0) float32
[[ 0.5434, -0.7168, -1.4913, 1.4805],
[ 0.1374, -1.2208, 0.3072, 1.1135],
[-0.0376, -0.7109, -1.2903, -0.8822],
]
```

We can also specify the exact values for each element in the desired
`ndarray` by supplying an array containing the numerical values and
the desired shape.

```
manager.create(new float[]{2, 1, 4, 3, 1, 2, 3, 4, 4, 3, 2, 1}, new Shape(3, 4))
```

```
ND: (3, 4) gpu(0) float32
[[2., 1., 4., 3.],
[1., 2., 3., 4.],
[4., 3., 2., 1.],
]
```

## 2.1.2. Operations

This book is not about software engineering. Our interests are not
limited to simply reading and writing data from/to arrays. We want to
perform mathematical operations on those arrays. Some of the simplest
and most useful operations are the *elementwise* operations. These apply
a standard scalar operation to each element of an array. For functions
that take two arrays as inputs, elementwise operations apply some
standard binary operator on each pair of corresponding elements from the
two arrays. We can create an elementwise function from any function that
maps from a scalar to a scalar.

In mathematical notation, we would denote such a *unary* scalar operator
(taking one input) by the signature
\(f: \mathbb{R} \rightarrow \mathbb{R}\). This just means that the
function is mapping from any real number (\(\mathbb{R}\)) onto
another. Likewise, we denote a *binary* scalar operator (taking two real
inputs, and yielding one output) by the signature
\(f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}\). Given any two
vectors \(\mathbf{u}\) and \(\mathbf{v}\) *of the same shape*,
and a binary operator \(f\), we can produce a vector
\(\mathbf{c} = F(\mathbf{u},\mathbf{v})\) by setting
\(c_i \gets f(u_i, v_i)\) for all \(i\), where \(c_i, u_i\),
and \(v_i\) are the \(i^\mathrm{th}\) elements of vectors
\(\mathbf{c}, \mathbf{u}\), and \(\mathbf{v}\). Here, we
produced the vector-valued
\(F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d\) by
*lifting* the scalar function to an elementwise vector operation.
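To make the lifting construction concrete, here is a minimal plain-Java sketch on `double[]` vectors. The class `ElementwiseLift` and its `lift` method are hypothetical names introduced for illustration; the scalar operator \(f\) is passed in as a `DoubleBinaryOperator`.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

class ElementwiseLift {
    // Builds c = F(u, v) with c[i] = f(u[i], v[i]) for vectors of equal length.
    static double[] lift(DoubleBinaryOperator f, double[] u, double[] v) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("vectors must have the same shape");
        }
        double[] c = new double[u.length];
        for (int i = 0; i < u.length; i++) {
            c[i] = f.applyAsDouble(u[i], v[i]);
        }
        return c;
    }

    public static void main(String[] args) {
        double[] u = {1, 2, 4, 8};
        double[] v = {2, 2, 2, 2};
        System.out.println(Arrays.toString(lift(Double::sum, u, v))); // [3.0, 4.0, 6.0, 10.0]
        System.out.println(Arrays.toString(lift(Math::pow, u, v)));   // [1.0, 4.0, 16.0, 64.0]
    }
}
```

Any scalar function of two arguments can be passed as `f`, which is the sense in which every such function gives rise to an elementwise one.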

In DJL, the common standard arithmetic operations (addition,
subtraction, multiplication, division, and exponentiation) have all
been *lifted* to elementwise operations for any identically-shaped
tensors of arbitrary shape. We can call elementwise operations on any
two tensors of the same shape. Because Java does not support operator
overloading, we call the methods `add`, `sub`, `mul`, `div`, and `pow`
instead of writing `+`, `-`, `*`, `/`, and `**`. The following
examples apply each of the five operations in turn.

```
var x = manager.create(new float[]{1f, 2f, 4f, 8f});
var y = manager.create(new float[]{2f, 2f, 2f, 2f});
x.add(y);
```

```
ND: (4) gpu(0) float32
[ 3., 4., 6., 10.]
```

```
x.sub(y);
```

```
ND: (4) gpu(0) float32
[-1., 0., 2., 6.]
```

```
x.mul(y);
```

```
ND: (4) gpu(0) float32
[ 2., 4., 8., 16.]
```

```
x.div(y);
```

```
ND: (4) gpu(0) float32
[0.5, 1. , 2. , 4. ]
```

```
x.pow(y);
```

```
ND: (4) gpu(0) float32
[ 1., 4., 16., 64.]
```

Many more operations can be applied elementwise, including unary operators like exponentiation.

```
x.exp()
```

```
ND: (4) gpu(0) float32
[ 2.71828175e+00, 7.38905621e+00, 5.45981483e+01, 2.98095801e+03]
```

In addition to elementwise computations, we can also perform linear algebra operations, including vector dot products and matrix multiplication. We will explain the crucial bits of linear algebra (with no assumed prior knowledge) in Section 2.3.

We can also *concatenate* multiple `ndarray`s together, stacking
them end-to-end to form a larger `ndarray`. We just need to provide a
list of `ndarray`s and tell the system along which axis to
concatenate. The example below shows what happens when we concatenate
two matrices along rows (axis \(0\), the first element of the shape)
vs. columns (axis \(1\), the second element of the shape). We can
see that the first output `ndarray`’s axis-\(0\) length
(\(6\)) is the sum of the two input `ndarray`s’ axis-\(0\)
lengths (\(3 + 3\)); while the second output `ndarray`’s
axis-\(1\) length (\(8\)) is the sum of the two input
`ndarray`s’ axis-\(1\) lengths (\(4 + 4\)).

```
x = manager.arange(12f).reshape(3, 4);
y = manager.create(new float[]{2, 1, 4, 3, 1, 2, 3, 4, 4, 3, 2, 1}, new Shape(3, 4));
x.concat(y) // default axis = 0
```

```
ND: (6, 4) gpu(0) float32
[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 2., 1., 4., 3.],
[ 1., 2., 3., 4.],
[ 4., 3., 2., 1.],
]
```

```
x.concat(y, 1)
```

```
ND: (3, 8) gpu(0) float32
[[ 0., 1., 2., 3., 2., 1., 4., 3.],
[ 4., 5., 6., 7., 1., 2., 3., 4.],
[ 8., 9., 10., 11., 4., 3., 2., 1.],
]
```
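For readers who want to see the bookkeeping spelled out, the sketch below concatenates two small matrices stored as `double[][]` in plain Java. `Concat2D` is a hypothetical helper written for this illustration and is not related to DJL's implementation of `concat`.

```java
import java.util.Arrays;

class Concat2D {
    // Concatenates two matrices along axis 0 (rows) or axis 1 (columns).
    static double[][] concat(double[][] a, double[][] b, int axis) {
        if (axis == 0) {
            if (a[0].length != b[0].length) {
                throw new IllegalArgumentException("axis-1 lengths must match");
            }
            double[][] out = new double[a.length + b.length][];
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i].clone();
            }
            for (int i = 0; i < b.length; i++) {
                out[a.length + i] = b[i].clone();
            }
            return out;
        }
        if (axis == 1) {
            if (a.length != b.length) {
                throw new IllegalArgumentException("axis-0 lengths must match");
            }
            double[][] out = new double[a.length][];
            for (int i = 0; i < a.length; i++) {
                // Widen row i of a, then copy row i of b into the new tail.
                out[i] = Arrays.copyOf(a[i], a[i].length + b[i].length);
                System.arraycopy(b[i], 0, out[i], a[i].length, b[i].length);
            }
            return out;
        }
        throw new IllegalArgumentException("axis must be 0 or 1");
    }

    public static void main(String[] args) {
        double[][] a = {{0, 1}, {2, 3}};
        double[][] b = {{4, 5}, {6, 7}};
        System.out.println(Arrays.deepToString(concat(a, b, 0)));
        // [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
        System.out.println(Arrays.deepToString(concat(a, b, 1)));
        // [[0.0, 1.0, 4.0, 5.0], [2.0, 3.0, 6.0, 7.0]]
    }
}
```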

Sometimes, we want to construct a binary `ndarray` via *logical
statements*. Take `x.eq(y)` as an example. For each position, if `x`
and `y` are equal at that position, the corresponding entry in the new
`ndarray` takes the value `true`, meaning that the logical
statement `x.eq(y)` is true at that position; otherwise that position
takes `false`.

```
x.eq(y)
```

```
ND: (3, 4) gpu(0) boolean
[[false, true, false, true],
[false, false, false, false],
[false, false, false, false],
]
```

Summing all the elements in the `ndarray` yields an `ndarray` with
only one element.

```
x.sum()
```

```
ND: () gpu(0) float32
66.
```

The result is still an `ndarray`, with the empty shape `()`. To read
its value as a Java `float`, we can call `x.sum().getFloat()`.

## 2.1.3. Broadcasting Mechanism

In the above section, we saw how to perform elementwise operations on
two `ndarray`s of the same shape. Under certain conditions, even
when shapes differ, we can still perform elementwise operations by
invoking the *broadcasting mechanism*. This mechanism works in the
following way: First, expand one or both arrays by copying elements
appropriately so that after this transformation, the two `ndarray`s
have the same shape. Second, carry out the elementwise operations on the
resulting arrays.

In most cases, we broadcast along an axis where an array initially only has length \(1\), such as in the following example:

```
var a = manager.arange(3f).reshape(3, 1);
var b = manager.arange(2f).reshape(1, 2);
a
```

```
ND: (3, 1) gpu(0) float32
[[0.],
[1.],
[2.],
]
```

```
b
```

```
ND: (1, 2) gpu(0) float32
[[0., 1.],
]
```

Since `a` and `b` are \(3\times1\) and \(1\times2\) matrices
respectively, their shapes do not match up if we want to add them. We
*broadcast* the entries of both matrices into a larger \(3\times2\)
matrix as follows: for matrix `a` it replicates the columns and for
matrix `b` it replicates the rows before adding up both elementwise.

```
a.add(b)
```

```
ND: (3, 2) gpu(0) float32
[[0., 1.],
[1., 2.],
[2., 3.],
]
```
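The two steps of broadcasting (expand, then operate elementwise) can be sketched in plain Java for the two-dimensional case. The hypothetical `BroadcastAdd` class below assumes that, along each axis, the two lengths are either equal or one of them is \(1\). Instead of physically copying elements, it re-reads index \(0\) along any axis of length \(1\), which gives the same result.

```java
import java.util.Arrays;

class BroadcastAdd {
    // Length of the broadcast axis: equal lengths pass through, a length of 1 stretches.
    static int broadcastLength(int m, int n) {
        if (m == n || n == 1) {
            return m;
        }
        if (m == 1) {
            return n;
        }
        throw new IllegalArgumentException("incompatible lengths " + m + " and " + n);
    }

    // Adds two non-empty matrices after broadcasting them to a common shape.
    static double[][] add(double[][] a, double[][] b) {
        int rows = broadcastLength(a.length, b.length);
        int cols = broadcastLength(a[0].length, b[0].length);
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // An axis of length 1 always reads index 0, which mimics copying it.
                double av = a[a.length == 1 ? 0 : i][a[0].length == 1 ? 0 : j];
                double bv = b[b.length == 1 ? 0 : i][b[0].length == 1 ? 0 : j];
                out[i][j] = av + bv;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{0}, {1}, {2}}; // shape (3, 1)
        double[][] b = {{0, 1}};        // shape (1, 2)
        System.out.println(Arrays.deepToString(add(a, b)));
        // [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
    }
}
```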

## 2.1.4. Indexing and Slicing

DJL uses the same indexing and slicing syntax as NumPy in Python,
written as a string and passed to `get` or `NDIndex`. Just as in a
Python array, elements in an `ndarray` can be accessed by index. The
first element has index \(0\), and ranges are specified to include the
first element but stop *before* the last. As in standard Python lists,
we can access elements according to their position relative to the end
of the list by using negative indices.

Thus, `:-1` selects every row except the last, and `1:3` selects the
second and the third rows, as follows:

```
x.get(":-1");
```

```
ND: (2, 4) gpu(0) float32
[[0., 1., 2., 3.],
[4., 5., 6., 7.],
]
```

```
x.get("1:3")
```

```
ND: (2, 4) gpu(0) float32
[[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
]
```
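The half-open range convention and the treatment of negative indices can be written down in a few lines of plain Java. The hypothetical `SliceRows` helper below only handles row ranges on a `double[][]`; it is a sketch of the indexing rule, not of DJL's `NDIndex` parser.

```java
import java.util.Arrays;

class SliceRows {
    // Maps a possibly negative index onto an axis of the given length.
    static int resolve(int index, int length) {
        return index < 0 ? index + length : index;
    }

    // Returns rows start (inclusive) to stop (exclusive); row arrays are shared, not copied.
    static double[][] rows(double[][] x, int start, int stop) {
        return Arrays.copyOfRange(x, resolve(start, x.length), resolve(stop, x.length));
    }

    public static void main(String[] args) {
        double[][] x = {{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}};
        // Counterpart of ":-1": from row 0 up to, but not including, the last row.
        System.out.println(Arrays.deepToString(rows(x, 0, -1)));
        // [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
        // Counterpart of "1:3": rows 1 and 2.
        System.out.println(Arrays.deepToString(rows(x, 1, 3)));
        // [[4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
    }
}
```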

Beyond reading, we can also write elements of a matrix by specifying indices.

```
x.set(new NDIndex("1, 2"), 9);
x
```

```
ND: (3, 4) gpu(0) float32
[[ 0., 1., 2., 3.],
[ 4., 5., 9., 7.],
[ 8., 9., 10., 11.],
]
```

If we want to assign multiple elements the same value, we simply index
all of them and then assign them the value. For instance, `[0:2, :]`
accesses the first and second rows, where `:` takes all the elements
along axis \(1\) (column). While we discussed indexing for matrices,
this obviously also works for vectors and for tensors of more than
\(2\) dimensions.

```
x.set(new NDIndex("0:2, :"), 12);
x
```

```
ND: (3, 4) gpu(0) float32
[[12., 12., 12., 12.],
[12., 12., 12., 12.],
[ 8., 9., 10., 11.],
]
```

## 2.1.5. Saving Memory

Running operations can cause new memory to be allocated to host results.
For example, if we write `y = x.add(y)`, we will dereference the
`ndarray` that `y` used to point to and instead point `y` at the
newly allocated memory.

This might be undesirable for two reasons. First, we do not want to run
around allocating memory unnecessarily all the time. In machine
learning, we might have hundreds of megabytes of parameters and update
all of them multiple times per second. Typically, we will want to
perform these updates *in place*. Second, we might point at the same
parameters from multiple variables. If we do not update in place, other
references will still point to the old memory location, making it
possible for parts of our code to inadvertently reference stale
parameters.

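Fortunately, performing in-place operations in DJL is easy. We can
write the result of an operation into a previously allocated array using
in-place methods like `addi`, `subi`, `muli`, and `divi`. Each of
these methods returns the array it was called on, so the reference
comparison below evaluates to `true`.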

```
var original = manager.zeros(y.getShape());
var actual = original.addi(x);
original == actual
```

```
true
```
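The difference between allocating a result and updating in place does not depend on DJL. The plain-Java sketch below contrasts the two on `double[]` arrays. The method names `add` and `addi` are chosen to mirror the DJL methods, but they are hypothetical helpers defined here, and the class is only an illustration.

```java
class InPlaceDemo {
    // Out-of-place: allocates a new array for the result and leaves a untouched.
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // In-place: overwrites a with the result and returns the same reference.
    static double[] addi(double[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] += b[i];
        }
        return a;
    }

    public static void main(String[] args) {
        double[] original = new double[3];
        double[] x = {1, 2, 3};
        System.out.println(add(original, x) == original);  // false
        System.out.println(addi(original, x) == original); // true
    }
}
```

After the call to `addi`, every variable that refers to `original` sees the updated values, which avoids the stale-reference problem described above.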

## 2.1.6. Summary

- DJL’s `ndarray` is an extension to NumPy’s `ndarray` with a few killer advantages that make it suitable for deep learning.
- DJL’s `ndarray` provides a variety of functionalities including basic mathematics operations, broadcasting, indexing, slicing, and memory saving.

## 2.1.7. Exercises

1. Run the code in this section. Change the conditional statement `x.eq(y)` in this section to `x.lt(y)` (less than) or `x.gt(y)` (greater than), and then see what kind of `ndarray` you can get.
2. Replace the two `ndarray`s that operate by element in the broadcasting mechanism with other shapes, e.g., three-dimensional tensors. Is the result the same as expected?