A Quick Practitioner's Guide to Tensors¶
by Eric Wong
Introduction¶
Two approaches to teaching linear algebra¶
This notebook serves as a basic introduction to tensors from a practitioner's perspective. What does this mean? In many math departments, there often exist multiple levels of linear algebra. For example, at CMU there are
- 21-241 "Matrices and Linear Transformations" - typically aimed at undergraduates that need a working level of linear algebra for their application field, e.g. engineering. This usually takes the approach of teaching linear algebra from the perspective of matrices and vectors of numbers, explaining their operations, properties, and their decompositions.
- 21-341 "Linear Algebra" - usually taken by math majors/minors. This focuses more on a mathematically rigorous treatment of linear algebra over arbitrary fields, focusing on abstract linear transformations and their corresponding operations, properties, and decompositions.
Both courses have their respective strengths: for many people, working with matrix operations is sufficient for their application, and the former course is more than enough. The latter course gives a more abstract perspective which, while providing a more fundamental understanding of how these operations work, is not directly necessary when performing matrix computations.
However, when we move on from matrices to tensors, it is easy to find a number of resources from the more mathematical point of view, but more difficult to find a resource that shows how to use tensors from a working perspective. References on tensors tend to explain tensors as linear functionals on vector spaces and dual vector spaces, which automatically assumes a more mathematically rigorous understanding of linear algebra.
Why tensors? Tensor differential calculus!¶
Why do we need tensors, you might ask? Everything is simple and easy to understand with matrix operations, you might say! I find tensor algebra (and consequently this notebook) to be highly motivated by matrix differential calculus. For matrix differential calculus, there are resources from both of the above perspectives.
- The commonly used and referenced Matrix Cookbook by Kaare Brandt Petersen and Michael Syskind Pedersen contains all the identities and facts about matrices (and consequently matrix derivatives) you would ever use in practice.
- For a more mathematical background, one can read Matrix Differential Calculus with Applications in Statistics and Econometrics by Jan R. Magnus and Heinz Neudecker.
The major limitation of these resources is that they focus primarily on scalar or vector valued functions, with derivatives taken with respect to scalars or vectors. What happens when you want to take the derivative of a vector valued function with respect to a matrix parameter? Or the derivative of a matrix valued function with respect to a vector parameter? A very natural higher dimensional extension of the Jacobian (the matrix of partial derivatives of a vector valued function) is a tensor of partial derivatives. The Matrix Cookbook avoids this by using elementwise partial derivatives, while the book on matrix differential calculus side-steps the whole idea of tensor differential calculus by simply vectorizing the function inputs and outputs so that it only ever works with 2-D Jacobian matrices.
While it is possible to turn tensor operations into combinations of matrix and vector operations, I believe this to be an unnecessary extra step for the practitioner. It can often be clearer to derive your equations directly with tensors than to mangle them with a series of vec and Kronecker operations just to turn everything into 2-D calculus. With the use of differential notation, we can get very clear and elegant derivations.
Optimization problems¶
The intended use case for this is (but is not limited to) optimization. Standard solutions to optimization problems, $\min_\theta f(\theta)$, usually involve some variation of gradient descent, where the parameters $\theta \in \mathbb{R}^k$ are iteratively updated with the equation
$$\theta = \theta - \alpha \nabla f(\theta)$$
where the gradient is usually derived from the problem at hand. For example, the canonical least squares problem,
$$\min_\theta \frac{1}{2}||X\theta - y||^2$$
can be solved with gradient descent by iteratively updating with the following gradient:
$$\nabla \frac{1}{2}||X\theta - y||^2 = X^T(X\theta - y)$$
Calculating this gradient for various problems is where matrix differential calculus is typically used. With more problems involving higher dimensional inputs and outputs, this is an opportunity to use tensor differential calculus to simplify all of our derivations.
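To make this concrete, here is a minimal sketch of this update on a small least squares problem (the data X, y, the step size alpha, and the iteration count are all made up for illustration), compared against the closed-form solution:
import numpy as np
np.random.seed(0)
X = np.random.randn(50, 3)
y = np.random.randn(50)
theta = np.zeros(3)
alpha = 0.01
for _ in range(500):
    # gradient of 0.5*||X theta - y||^2 with respect to theta
    theta = theta - alpha * X.T.dot(X.dot(theta) - y)
print(theta)
print(np.linalg.lstsq(X, y, rcond=None)[0])  # least squares solution, for comparison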
Scalars to Vectors to Matrices to Tensors¶
We begin with a brief overview of vector and matrix algebra, leading up to tensor algebra. These will be accompanied by code examples for the practitioner using Numpy.
import numpy as np
Scalars¶
Scalars are simply any real number. For example, all of the following are scalars:
- 0
- 1.5
- 9,001
- -23.23
- $\pi$
a = 0
b = 1.5
c = 9001
d = -23.23
e = np.pi
print(a,b,c,d,e)
Vectors¶
Vectors are finite sequences of scalars, often organized as a 'row vector' or 'column vector'. The key is that there is one dimension, and an index into this dimension provides the scalar at that position. For example, if we have a vector
x = [0, 10, 21, 8, 1]
Then (using 0-indexing) the value x[2] = 21.
print(np.ones(5))
x = np.array([0, 10, 21, 8, 1])
print (x, x[2])
Matrices¶
Matrices are finite sequences of vectors of the same size. The key is that there are now 2 dimensions: 1 dimension to designate which vector in the sequence, and 1 dimension to designate the scalar within that vector. In other words, it's a 2D grid of numbers; a vector can be seen as a 1D grid of numbers. For example, if we have a matrix
X = [[0, 1, 2],
[3, 4, 5]]
Then X[0] = [0 1 2] and X[0,1] = X[0][1] = 1.
print(np.ones((5,6)))
X = np.array([[0, 1, 2],
[3, 4, 5]])
print(X, X[0], X[0,1], X[0][1])
3-D Tensors¶
Now we move past the cushy realm of matrices. However, this is again quite similar to before. A 3-D tensor is simply a finite sequence of matrices of the same size. The key is that there are now 3 dimensions: 1 dimension to designate which matrix, and 2 dimensions to designate the scalar in that matrix. In other words, it's just a 3D grid of numbers. For example, consider the following two matrices
M1 = [[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]
M2 = [[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]
We can form a tensor by simply making the sequence of these two matrices:
T = [M1, M2]
= [[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]]
M1 = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
M2 = np.array([[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])
T = np.array([M1, M2])
print(T)
N-D Tensors¶
Lastly, we can generalize to arbitrary dimensions. An N-D tensor is a finite sequence of (N-1)-D tensors of the same size. There are N dimensions: 1 dimension to designate which (N-1)-D tensor, and N-1 dimensions to denote the scalar in that (N-1)-D tensor. Again, this is simply an N-dimensional grid of numbers. We can form an N-D tensor by making a sequence of (N-1)-D tensors.
T0_3D = T.copy()
T1_3D = T.copy()*(-1)
T_4D = np.array([T0_3D, T1_3D])
print(T_4D)
Note that using this nomenclature, we can say that a matrix is just a 2-D tensor, a vector is a 1-D tensor, and a scalar is a 0-D tensor.
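We can check this in Numpy with the ndim attribute (a quick sanity check using the tensors constructed above):
print(np.array(1.5).ndim)          # a scalar is a 0-D tensor
print(np.array([0, 10, 21]).ndim)  # a vector is a 1-D tensor
print(np.ones((2, 3)).ndim)        # a matrix is a 2-D tensor
print(T.ndim, T_4D.ndim)           # the 3-D and 4-D tensors from above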
Tensor Operations¶
This is all fairly straightforward: we're just making grids of numbers with more dimensions. But how do we do algebra with these tensors? First, we will introduce some notation that is modeled after the notation used in Python's Numpy library.
- Let T be a tensor. Then T[i1, i2, ..., ik] is the (i1, i2, ..., ik)th element of the tensor. We can replace indices with slices to indicate that we want all elements along that dimension. The number of unfixed dimensions determines the dimension of the resulting sub-tensor.
- E.g. T[0,:,2] will fix the indices along dimensions 0 and 2 to be 0 and 2 respectively, and return all the elements along dimension 1, resulting in a 1D tensor (vector).
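For example, using the 3-D tensor T constructed above (which has shape (2, 3, 4)):
print(T.shape)
print(T[0, :, 2])   # fix dimensions 0 and 2, slice dimension 1: a 1D tensor (vector)
print(T[0, 1, 2])   # fix all three dimensions: a scalar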
Vector operations¶
With vectors, recall we have scalar multiplication and vector dot product:
- For any vector v and scalar a, we can calculate a*v with element-wise multiplication by a:
(a*v)[i] = a*v[i]
- If v1 and v2 have the same dimension k, we can calculate the dot product of v1 and v2, which we will designate as dot(v1,v2).
dot(v1,v2) = v1[0]*v2[0] + ... + v1[k-1]*v2[k-1]
v1 = np.arange(5)
v2 = np.arange(5)
print(v1, v2, np.dot(v1, v2))
Matrix operations¶
Also recall your standard matrix dot operations:
For any matrix M and scalar a, we can calculate a*M with element-wise multiplication by a.
(a*M)[i,j] = a*M[i,j]
If M1 is n by k and v is a k length vector, we can do the normal matrix-vector multiplication as the dot product of the rows of M1 with v.
dot(M1, v)[i] = dot(M1[i,:],v)
Alternatively, if M1 is k by n and v is a k length vector, we can do a 'left multiply' as the dot product of v with the columns of M1.
dot(v, M1)[i] = dot(v, M1[:,i])
If M1 is n by k and M2 is k by m, then we can perform the usual matrix multiplication as the dot product of the rows of M1 with the columns of M2. In numpy, this is called the dot operation as well.
dot(M1, M2)[i,j] = dot(M1[i,:], M2[:,j])
Note that the colon is used to designate a slice across an entire axis; providing an index into one dimension reduces the matrix to a vector.
a = 5
M1 = np.ones((2,3))
M2 = np.arange(6).reshape(3,2)
v1 = np.arange(3)
v2 = np.arange(2)
print(a*M1)
print(np.dot(M1,v1))
print(np.dot(v2, M1))
print(np.dot(M1, M2))
Note that the order here matters. Let's try to be more specific about what is actually going on here.
vecdot¶
For notational simplicity, we'll use vecdot to denote the normal dot product of the vectorized inputs. Specifically:
vecdot(T1, T2) = dot(vec(T1), vec(T2))
The reason is that the normal dot product has different semantics depending on the inputs. The vecdot operator allows us to have a general dot operation that works the same across all different types of inputs.
Note that if T1=v1 and T2=v2 are both 1D tensors (vectors), then it is equivalent to the normal dot product:
vecdot(v1, v2) = dot(v1,v2)
Also note that if T1=T2=M1 is a matrix, then this is just the squared Frobenius norm.
vecdot(M1, M1) = ||M1||_F^2
Note that in Numpy, the vec operation (flattening a tensor into a vector) is called ravel.
def vecdot(T1, T2):
    # flatten both tensors and take the ordinary dot product
    return np.dot(np.ravel(T1), np.ravel(T2))
T1 = np.arange(6).reshape(3,2)
T2 = np.ones((3,2))
print(T1)
print(T2)
print(vecdot(T1,T2))
tensordot¶
Now we're ready to define tensordot, which will be slightly different from a dot product but generalize to arbitrary dimensions. It will have the following signature:
tensordot(T1, T2, axes=(L1, L2))
T1 : D1 dimensional Tensor
T2 : D2 dimensional Tensor
L1 : list of k dimensions (axes), a subset of size k of range(D1)
L2 : list of k dimensions (axes), a subset of size k of range(D2)
At a high level, it is simply doing a 'dot product' over the specified dimensions, keeping all the other dimensions fixed. Without loss of generality, let's assume that the specified dimensions L1 = (i1, ..., ik) and L2 = (j1, ..., jk) are the last in the list of dimensions (they do not have to be, but it makes the definition notationally nicer). Then if n1=(D1-k) and n2=(D2-k) are the numbers of remaining dimensions in the two tensors, we have
tensordot(T1, T2, (L1, L2))[a1, ..., an1 , b1, ..., bn2]
= vecdot(T1[a1, ..., an1, :, .... :], T2[b1, ..., bn2, :, ..., :])
where the slices are over the specified dimensions. If the specified dimensions are not at the end, then we just rearrange the slices and the fixed axes. For the purposes of this guide, we will refer to the two sets of dimensions:
- static dimensions: these are the unspecified dimensions, (a1, ..., an1) and (b1, ..., bn2). These determine the output tensor's shape.
- collapsed dimensions: these are the specified dimensions L1 and L2. The vecdot operation is done over these dimensions, and so these dimensions are lost from the tensor.
from itertools import product
def tensordot(T1, T2, axes=((), ())):
    L1, L2 = axes
    if len(L1) != len(L2):
        raise ValueError("L1 and L2 must have equal length (found {} and {}).".format(len(L1), len(L2)))
    # get lists of static dimensions
    T1_rdim = [d for d in range(T1.ndim) if d not in L1]
    T2_rdim = [d for d in range(T2.ndim) if d not in L2]
    # get shapes of static dimensions, this is the shape of the output
    T1_rshape = [T1.shape[d] for d in T1_rdim]
    T2_rshape = [T2.shape[d] for d in T2_rdim]
    # initialize the output tensor
    T = np.zeros(tuple(T1_rshape) + tuple(T2_rshape))
    # initialize the indices into T1 and T2 (slices over the collapsed dimensions)
    T1_index = [slice(None, None, None)]*T1.ndim
    T2_index = [slice(None, None, None)]*T2.ndim
    for t1 in product(*[range(n) for n in T1_rshape]):
        for t2 in product(*[range(n) for n in T2_rshape]):
            # for each scalar in the output tensor, construct the appropriate
            # indexing and calculate the vecdot value.
            for i in range(len(T1_rdim)):
                T1_index[T1_rdim[i]] = t1[i]
            for i in range(len(T2_rdim)):
                T2_index[T2_rdim[i]] = t2[i]
            T[t1+t2] = vecdot(T1[tuple(T1_index)], T2[tuple(T2_index)])
    return T
T1 = np.arange(24).reshape(2,3,4)
T2 = np.ones(3)
print(T1)
print(T2)
print(tensordot(T1,T2, ([1], [0])))
What does this mean? Let's go over some known examples first.
Example: Vector dot product¶
For example, recall the vector dot product over vectors v1 and v2. Note that since they are both vectors, each one only has 1 dimension: the 0th dimension. Thus:
tensordot(v1, v2, ([0], [0])) = vecdot(v1[:], v2[:]) = dot(v1, v2)
Very naturally, we get the original vector dot product! So a tensor dot product of two vectors along the 0th axis is the same as the original vector dot product. Since there is only one dimension available in each vector, there is only one other option, which is to specify no axes at all:
tensordot(v1, v2, ([], []))[i,j] = vecdot(v1[i], v2[j]) = outer(v1, v2)[i,j]
This should also be a familiar operation: the outer product. While I think this should just be treated as a subcase of the matrix product, I bring it up here to show all the possible cases with two vectors.
v1 = np.ones(4)
v2 = np.arange(4)
print(np.dot(v1, v2), tensordot(v1, v2, ([0], [0])))
print(np.outer(v1, v2))
print(tensordot(v1, v2, ([], [])))
Example: Matrix vector product¶
Now let's go to the matrix vector product. Recall that we can 'left multiply' and 'right multiply' by a vector with dot(v,M) and dot(M,v). However, this is the same as just picking an axis of M and taking the dot product with v!
tensordot(M, v, ([0], [0]))[a] = vecdot(M[:,a], v) = dot(v, M)[a]
tensordot(M, v, ([1], [0]))[a] = vecdot(M[a,:], v) = dot(M, v)[a]
Notice that we no longer have to worry about the order of M and v: we just need to specify which axis to perform the tensordot operation over.
M = np.arange(9).reshape(3,3)
v = np.ones(3)
print(np.dot(v, M))
print(tensordot(M, v, ([0], [0])))
print(np.dot(M,v))
print(tensordot(M, v, ([1], [0])))
Example: Matrix matrix product¶
Finally, let's consider the matrix matrix product M1 x M2, where M1 is n by k and M2 is k by n. Again, we have the option of left and right multiplying these matrices. But in the end, we see the same story: we can do the same thing with a tensordot operation by specifying the correct dimensions.
tensordot(M1, M2, ([1], [0]))[a,b] = vecdot(M1[a,:], M2[:,b]) = dot(M1,M2)[a,b]
tensordot(M2, M1, ([1], [0]))[a,b] = vecdot(M2[a,:], M1[:,b]) = dot(M2,M1)[a,b]
tensordot(M1, M2, ([0], [1]))[a,b] = vecdot(M1[:,a], M2[b,:]) = dot(M2,M1).T[a,b]
tensordot(M2, M1, ([0], [1]))[a,b] = vecdot(M2[:,a], M1[b,:]) = dot(M1,M2).T[a,b]
M1 = np.arange(12).reshape(3,4)
M2 = np.random.random((4,3))
print(np.dot(M1,M2))
print(tensordot(M1, M2, ([1], [0])))
print(np.dot(M2,M1))
print(tensordot(M2, M1, ([1], [0])))
print(np.dot(M2,M1).T)
print(tensordot(M1, M2, ([0], [1])))
print(np.dot(M1,M2).T)
print(tensordot(M2, M1, ([0], [1])))
A note on transposes: Here, we notice that swapping the arguments of the tensordot operation results in the transpose operation. This is because of the ordering of the dimensions: so far we've always listed the dimensions of the first tensor followed by those of the second tensor. Thus, swapping the matrix arguments reverses the order of the two dimensions, which is equivalent to doing a transpose.
Example: Frobenius norm¶
Recall that the vecdot of a matrix with itself is the squared Frobenius norm. We can rewrite this using the tensordot operation as follows:
tensordot(M, M, ([0, 1], [0, 1])) = vecdot(M[:,:], M[:,:]) = ||M||_F^2
As a sneak peek towards the future, the ability to take a vecdot product along multiple axes instead of just one will be extremely valuable when calculating differentials.
M = np.arange(6).reshape(2,3)
print(tensordot(M, M, ([0,1], [0,1])), np.sum(M*M))
Example: Tensor tensor products¶
We are now ready to look at an example of a tensor tensor product. By specifying the axes over which to do a dot product, we proceed in exactly the same manner as before, just with more dimensions. Furthermore, it turns out Numpy also has tensordot built in as a function, so you can use that one instead of the one coded earlier (which is inefficient).
The only interface difference is that instead of always passing two lists of axes, the 3rd argument has two options:
- if it is a two-tuple of lists of dimensions, proceed as usual
- if it is an integer k, then infer the lists of dimensions to be the last k dimensions of T1 and the first k dimensions of T2.
T1 = np.arange(24).reshape(3,2,4)
T2 = np.ones((2,4,2))
print(tensordot(T1, T2, ([1,2], [2,1])))
print(np.tensordot(T1, T2, axes=([1,2], [2,1])))
print(tensordot(T1, T2, ([1,2], [0,1])))
print(np.tensordot(T1, T2, 2))
Notation¶
In LaTeX we will refer to tensordot as a normal dot with a $T$ subscript, $\cdot_T$.
In general, we will assume the case where axes=k=min(T1.ndim, T2.ndim). In other words, unless otherwise specified, the tensordot will occur over as many dimensions as possible, following the semantics used by Numpy (the last k dimensions of T1 matched with the first k dimensions of T2).
As a preview (illustrated with a small example after the list below), this will manifest itself as $\mathrm df = T_1 \cdot_T \mathrm dT_2$, where
- $f$ is a tensor valued function that takes tensor valued arguments
- $T_1$ is a tensor of partial derivatives
- $\mathrm dT_2$ is the differential of the arguments to $f$
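As a small, purely illustrative check of this pattern (the function and all names here are made up for this example), take $f(M) = Mv$ for a fixed vector $v$. The tensor of partial derivatives $T_1$ is then 3-D, with entries $\partial f_i/\partial M[a,b]$, and contracting it against a perturbation $\mathrm dM$ with tensordot reproduces the change in $f$:
n, k = 3, 4
M = np.random.randn(n, k)
v = np.random.randn(k)
dM = 1e-6*np.random.randn(n, k)       # a small perturbation of M
# tensor of partial derivatives: J[i, a, b] = d f_i / d M[a, b] = (i == a)*v[b]
J = np.zeros((n, n, k))
for i in range(n):
    J[i, i, :] = v
df_tensor = np.tensordot(J, dM, 2)    # contract the last 2 dims of J with dM
df_actual = (M + dM).dot(v) - M.dot(v)
print(np.allclose(df_tensor, df_actual))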
Tensordot properties¶
Lastly, we cover a few properties of tensordot. We start with the case where the tensors are matrices.
- Transpose invariant: $M_1^T\cdot_T M_2^T = M_1 \cdot_T M_2$ (i.e. swapping the axes of the two matrices doesn't change the tensordot value).
- Inner/outer product relation: $x^TMy = y\cdot_T (x\cdot_T M) = xy^T \cdot_T M$ (this is why a lot of tensor references define tensors as linear functionals, i.e. functions from vector spaces to scalars)
These can both be fairly quickly verified by writing out the definitions (and are spot-checked numerically below). Of course, the tensor variants work as well. Let $\otimes$ be an outer product, then:
- Transpose invariant: $T_1^T\cdot_T T_2^T = T_1 \cdot_T T_2$ (where the transpose here refers to some reordering of the axes).
- Inner/outer product relation: $(x_k \cdot_T (\dots(x_1\cdot_T M))) = (x_1 \otimes \dots \otimes x_k) \cdot_T M$ (for a $k$ dimensional tensor, taking the dot product with a sequence of $k$ vectors is equivalent to taking the tensordot with the tensor formed by the outer product of the vectors).
This last one is actually even more general, and works with products of tensors instead of just vectors.
- Inner/outer product relation: $(T_k \cdot_T (\dots(T_1\cdot_T T))) = (T_1 \otimes \dots \otimes T_k) \cdot_T T$
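Here is a quick numerical spot-check of the matrix-case properties (a sketch using np.tensordot with the default 'as many axes as possible' convention; the particular shapes are arbitrary):
x = np.random.randn(3)
y = np.random.randn(4)
M = np.random.randn(3, 4)
M1 = np.random.randn(3, 4)
M2 = np.random.randn(3, 4)
# transpose invariance: M1^T ._T M2^T == M1 ._T M2
print(np.allclose(np.tensordot(M1.T, M2.T, 2), np.tensordot(M1, M2, 2)))
# inner/outer product relation: x^T M y == y ._T (x ._T M) == (x outer y) ._T M
print(x.dot(M).dot(y))
print(np.tensordot(y, np.tensordot(x, M, 1), 1))
print(np.tensordot(np.outer(x, y), M, 2))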
Rearranging expressions¶
Lastly, sometimes we'll want to rearrange matrix expressions as tensordot expressions. For matrix calculations, note that
$$ABC = (A\otimes C) \cdot_T^{2,3} B$$
where the tensor dot product is taken over the second and third axes of $A \otimes C$ (axes 1 and 2 in the 0-indexed notation of Numpy), matched against the two axes of $B$. By doing so, we can pull out $B$ as a tensor product, which will be useful in the next section when taking differentials of tensor functions. Implementation-wise, this is achieved either by passing the axes to the tensordot function, or by rolling axis 3 to position 1 and using the default tensordot behavior.
A = np.random.randn(2,3)
B = np.random.randn(3,4)
C = np.random.randn(4,5)
print(A.dot(B).dot(C))
print(np.tensordot(np.multiply.outer(A,C),B, axes=([1,2],[0,1])))
print(np.tensordot(np.rollaxis(np.multiply.outer(A,C),3,1),B))