NumPy-style broadcasting for R TensorFlow users

By admin2010

July 22, 2025


We develop, train, and deploy TensorFlow models from R. But that doesn’t mean we don’t make use of documentation, blog posts, and examples written in Python. We look up specific functionality in the official TensorFlow API docs; we get inspiration from other people’s code.

Depending on how comfortable you are with Python, there’s a problem. For example: You’re supposed to know how broadcasting works. And perhaps, you’d say you’re vaguely familiar with it: So when arrays have different shapes, some elements get duplicated until their shapes match and … and isn’t R vectorized anyway?

While such a global notion may work in general, like when skimming a blog post, it’s not enough to understand, say, examples in the TensorFlow API docs. In this post, we’ll try to arrive at a more exact understanding, and check it on concrete examples.

Speaking of examples, here are two motivating ones.

Broadcasting in action

The first uses TensorFlow’s matmul to multiply two tensors. Would you like to guess the result (not the numbers, but how it comes about in general)? Does this even run without error? Shouldn’t matrices be two-dimensional (rank-2 tensors, in TensorFlow speak)?

a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
# 
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

b <- tf$constant(keras::array_reshape(101:106, dim = c(1, 3, 2)))
b
# tf.Tensor(
# [[[101. 102.]
#   [103. 104.]
#   [105. 106.]]], shape=(1, 3, 2), dtype=float64)

c <- tf$matmul(a, b)

Second, here is a “real example” from a TensorFlow Probability (TFP) GitHub issue. (Translated to R, but keeping the semantics.)
In TFP, we can have batches of distributions. That, per se, is not surprising. But look at this:

library(tfprobability)
d <- tfd_normal(loc = c(0, 1), scale = matrix(1.5:4.5, ncol = 2, byrow = TRUE))
d
# tfp.distributions.Normal("Normal", batch_shape=[2, 2], event_shape=[], dtype=float64)

We create a batch of four normal distributions, each with a different scale (1.5, 2.5, 3.5, 4.5). But wait: there are only two location parameters given. So which location goes with which scale?
Thankfully, TFP developers Brian Patton and Chris Suter explained how it works: TFP actually does broadcasting with distributions, just like with tensors!

We get back to both examples at the end of this post. Our main focus will be to explain broadcasting as done in NumPy, as NumPy-style broadcasting is what numerous other frameworks have adopted (e.g., TensorFlow).

Before that, though, let’s quickly review a few basics about NumPy arrays: how to index or slice them (indexing normally referring to single-element extraction, while slicing yields, well, slices containing several elements); how to parse their shapes; some terminology and related background.
Though not complicated per se, these are the kinds of things that can be confusing to infrequent Python users; yet they’re often a prerequisite to successfully making use of Python documentation.

Stated upfront, we’ll really restrict ourselves to the basics here; for example, we won’t touch advanced indexing which, like a lot more, can be looked up in detail in the NumPy documentation.

A few facts about NumPy

Basic slicing

For simplicity, we’ll use the terms indexing and slicing more or less synonymously from now on. The basic device here is a slice, namely, a start:stop structure indicating, for a single dimension, which range of elements to include in the selection.

In contrast to R, Python indexing is zero-based, and the end index is exclusive:

import numpy as np
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[1:7] 
# array([1, 2, 3, 4, 5, 6])

x[5:] 
# array([5, 6, 7, 8, 9])

x[:7]
# array([0, 1, 2, 3, 4, 5, 6])

x[:] 
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x = np.array([[1, 2], [3, 4], [5, 6]])
x
# array([[1, 2],
#        [3, 4],
#        [5, 6]])

x[1, :] 
# array([3, 4])

x[1] 
# array([3, 4])

x[1, ] 
# array([3, 4])

While x[1, ] certainly looks a bit like R, the mechanism is different. Technically, these start:stop things are parts of a Python tuple (that list-like, but immutable data structure that can be written with or without parentheses, e.g., 1,2 or (1,2)), and whenever we have more dimensions in the array than elements in the tuple, NumPy will assume we meant : for each missing dimension: just select everything.

We can see that by moving on to three dimensions. Here is a 2 x 3 x 1 array:

x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
x
# array([[[1],
#         [2],
#         [3]],
# 
#        [[4],
#         [5],
#         [6]]])

x.shape
# (2, 3, 1)

x[0,]
# array([[1],
#        [2],
#        [3]])

x[0, ...]
# array([[1],
#        [2],
#        [3]])
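
To make the tuple mechanism concrete, here is a small sketch (the variable names are ours, chosen for illustration) showing that the shorthand forms give the same result as spelling out the tuple and the missing : slices:

```python
import numpy as np

x = np.array([[[1], [2], [3]], [[4], [5], [6]]])  # shape (2, 3, 1)

# x[0,] passes the one-element tuple (0,) as the index ...
short = x[0,]
as_tuple = x[(0,)]

# ... and NumPy fills in ":" (that is, slice(None)) for the remaining axes.
spelled_out = x[0, :, :]
with_slice_objects = x[(0, slice(None), slice(None))]

# Ellipsis stands for "as many ':' as needed".
with_ellipsis = x[0, ...]

for result in (as_tuple, spelled_out, with_slice_objects, with_ellipsis):
    assert np.array_equal(short, result)

print(short.shape)  # (3, 1)
```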

We stop here with our selection of essential (yet possibly confusing, to infrequent Python users) NumPy indexing features. Speaking of “possibly confusing”, here are a few remarks about array creation.

Syntax for array creation

Creating a more-dimensional NumPy array is not that hard, depending on how you do it. The trick is to use reshape to tell NumPy exactly what shape you want. For example, to create an array of all zeros, of dimensions 4 x 3 x 2:

np.zeros(24).reshape(4, 3, 2)

But we also need to be able to read arrays that others have written out as nested brackets, like these:

c1 = np.array([[[0, 0, 0]]])
c2 = np.array([[[0], [0], [0]]]) 
c3 = np.array([[[0]], [[0]], [[0]]])

c1.shape # (1, 1, 3)
c2.shape # (1, 3, 1)
c3.shape # (3, 1, 1)

Of course, we can always find out the shape by calling .shape, but we’d like to be able to “parse” such literals mentally, without executing the code. One way to think about it is to process the brackets like a state machine, every opening bracket moving one axis to the right and every closing bracket moving back left by one axis. Let us know if you can think of other, possibly more helpful, mnemonics!
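
The bracket-counting mnemonic can be written down as a tiny state machine. The following is only a sketch: shape_from_brackets is a hypothetical helper of ours, and it assumes a well-formed, rectangular, non-empty literal containing nothing but numbers, brackets and commas.

```python
import ast
import numpy as np

def shape_from_brackets(literal):
    """Infer an array shape by tracking brackets and commas only."""
    depth = 0    # how many brackets are currently open
    commas = {}  # commas seen in the list currently open at each depth
    shape = {}   # item count of the first list closed at each depth
    for ch in literal:
        if ch == "[":
            depth += 1          # opening bracket: move one axis to the right
            commas[depth] = 0
        elif ch == ",":
            commas[depth] += 1  # a comma separates items on the current axis
        elif ch == "]":
            # n commas mean n + 1 items; keep the count from the first closed list
            shape.setdefault(depth, commas[depth] + 1)
            depth -= 1          # closing bracket: move back left by one axis
    return tuple(shape[d] for d in sorted(shape))

for text in ("[[[0, 0, 0]]]", "[[[0], [0], [0]]]", "[[[0]], [[0]], [[0]]]"):
    # Cross-check against what NumPy itself reports.
    assert shape_from_brackets(text) == np.array(ast.literal_eval(text)).shape
    print(text, "->", shape_from_brackets(text))
```

Ragged or empty lists would fool this helper, so it is a reading aid rather than a validator.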

In the last sentence, we purposely used “left” and “right” to refer to the array axes; “out there” though, you’ll also hear “outmost” and “innermost”. Which, then, is which?

A bit of terminology

In common Python (TensorFlow, for example) usage, when talking of an array shape like (2, 6, 7), outmost is left and innermost is right. Why?
Let’s take a simpler, two-dimensional example of shape (2, 3).

a = np.array([[1, 2, 3], [4, 5, 6]])
a
# array([[1, 2, 3],
#        [4, 5, 6]])

Computer memory is conceptually one-dimensional, a sequence of locations; so when we create arrays in a high-level programming language, their contents are effectively “flattened” into a vector. That flattening could occur “by row” (row-major, C-style, the default in NumPy), resulting in the above array ending up like this

1 2 3 4 5 6

or “by column” (column-major, Fortran-style, the ordering used in R), yielding

1 4 2 5 3 6

for the above example.

Now if we see “outmost” as the axis whose index varies the least often, and “innermost” as the one that changes most quickly, then in row-major ordering the left axis is “outer”, and the right one is “inner”.
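
Both flattening orders can be checked directly in NumPy. A minimal sketch, using the order argument of ravel (the variable names are ours):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

row_major = a.ravel(order="C")     # "by row": NumPy's default layout
column_major = a.ravel(order="F")  # "by column": the layout R uses

print(row_major)     # [1 2 3 4 5 6]
print(column_major)  # [1 4 2 5 3 6]
```

In the row-major result, the column index (right axis) changes at every step, while the row index (left axis) changes only once.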

Just as a (cool!) aside, NumPy arrays have an attribute called strides that stores how many bytes have to be traversed, for each axis, to arrive at its next element. For the three arrays from the previous section (holding 8-byte integers):

c1 = np.array([[[0, 0, 0]]])
c1.shape   # (1, 1, 3)
c1.strides # (24, 24, 8)

c2 = np.array([[[0], [0], [0]]])
c2.shape   # (1, 3, 1)
c2.strides # (24, 8, 8)

c3 = np.array([[[0]], [[0]], [[0]]])
c3.shape   # (3, 1, 1)
c3.strides # (8, 8, 8)

For array c3, every element is on its own on the outmost level; so for axis 0, to jump from one element to the next, it’s just 8 bytes. For c2 and c1 though, everything is “squished” into the first element of axis 0 (there is just a single element there). So if we wanted to jump to another, nonexisting-as-yet, outmost item, it’d take us 3 * 8 = 24 bytes.
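
The stride arithmetic can be reproduced by hand. The sketch below assumes a contiguous row-major array and 8-byte elements (we set dtype=np.int64 explicitly, since the default integer size can differ between platforms); c_order_strides is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def c_order_strides(shape, itemsize):
    """Byte strides of a contiguous row-major array with the given shape."""
    strides = []
    step = itemsize                # the innermost axis moves one element at a time
    for size in reversed(shape):   # walk from the innermost to the outmost axis
        strides.append(step)
        step *= size               # one step on the next-outer axis skips a whole block
    return tuple(reversed(strides))

for shape in [(1, 1, 3), (1, 3, 1), (3, 1, 1)]:
    print(shape, c_order_strides(shape, itemsize=8))
# (1, 1, 3) (24, 24, 8)
# (1, 3, 1) (24, 8, 8)
# (3, 1, 1) (8, 8, 8)

# Cross-check against NumPy on an array without length-1 axes.
arr = np.zeros((2, 3, 4), dtype=np.int64)
assert arr.strides == c_order_strides(arr.shape, arr.itemsize)  # (96, 32, 8)
```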

At this point, we’re ready to talk about broadcasting. We first stay with NumPy and then examine some TensorFlow examples.

NumPy Broadcasting

What happens if we add a scalar to an array? This won’t be surprising for R users:

a = np.array([1,2,3])
b = 1
a + b

# array([2, 3, 4])

Technically, this is already broadcasting in action; b is virtually (not physically!) expanded to shape (3,) in order to match the shape of a.
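
One way to look at this “virtual” expansion is np.broadcast_to, which returns a read-only view instead of a copy. This is only an illustration of the idea, not necessarily what NumPy does internally when evaluating a + b:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(1)  # the scalar, wrapped as a zero-dimensional array

expanded = np.broadcast_to(b, a.shape)  # view of shape (3,), no data copied

print(expanded)          # [1 1 1]
print(expanded.strides)  # (0,)
print(a + expanded)      # [2 3 4]
```

The stride of 0 bytes means that every position of the expanded array points to the same single value in memory.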

How about two arrays, one of shape (2, 3), that is, two rows and three columns, the other one-dimensional, of shape (3,)?

a = np.array([1,2,3])
b = np.array([[1,2,3], [4,5,6]])
a + b

# array([[2, 4, 6],
#        [5, 7, 9]])

The one-dimensional array gets added to both rows. If a were length-two instead, would it get added to every column?

a = np.array([1,2])
b = np.array([[1,2,3], [4,5,6]])
a + b

# ValueError: operands could not be broadcast together with shapes (2,) (2,3)

So now it is time for the broadcasting rule. For broadcasting (virtual expansion) to happen, the following is required.

1. We align array shapes, starting from the right.

   # array 1, shape:     8  1  6  1
   # array 2, shape:        7  1  5

2. Starting to look from the right, the sizes along aligned axes either have to match exactly, or one of them has to be 1, in which case the latter is broadcast to the one not equal to 1.
3. If on the left, one of the arrays has an additional axis (or more than one), the other is virtually expanded to have a 1 in that place, in which case broadcasting will happen as stated in (2).
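
The rule above is mechanical enough to write down as a short function. This is a sketch for checking one’s understanding, with broadcast_shape being a hypothetical helper of ours; we compare its result with the shape NumPy itself reports via np.broadcast.

```python
import numpy as np

def broadcast_shape(shape1, shape2):
    """Result shape of broadcasting two shapes, or ValueError if impossible."""
    # Pad the shorter shape with 1s on the left, so that equal positions
    # correspond to axes that are aligned from the right.
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)

    result = []
    for m, n in zip(s1, s2):
        if m == n or n == 1:
            result.append(m)   # sizes match, or the second one is stretched
        elif m == 1:
            result.append(n)   # the first one is stretched
        else:
            raise ValueError(f"shapes {shape1} and {shape2} cannot be broadcast")
    return tuple(result)

print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)

# Cross-check with NumPy.
reference = np.broadcast(np.empty((8, 1, 6, 1)), np.empty((7, 1, 5))).shape
assert broadcast_shape((8, 1, 6, 1), (7, 1, 5)) == reference
```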

Stated like this, it probably sounds incredibly simple. Maybe it is, and it only seems complicated because it presupposes correct parsing of array shapes (which, as shown above, can be confusing)?

Here again is a quick example to test our understanding:

a = np.zeros([2, 3]) # shape (2, 3)
b = np.zeros([2])    # shape (2,)
c = np.zeros([3])    # shape (3,)

a + b # error

a + c
# array([[0., 0., 0.],
#        [0., 0., 0.]])

All in accord with the rules. Maybe there’s something else that makes it confusing?
From linear algebra, we are used to thinking in terms of column vectors (often seen as the default) and row vectors (accordingly, seen as their transposes). What, then, is np.array([0, 0]), of shape (as we’ve seen a few times by now) (2,)? Really it’s neither; it’s just a one-dimensional array structure. We can create row vectors and column vectors though, in the sense of 1 x n and n x 1 matrices, by explicitly adding a second axis. Any of these would create a column vector:

# start with the above "non-vector"
c = np.array([0, 0])
c.shape
# (2,)

# way 1: reshape
c.reshape(2, 1).shape
# (2, 1)

# way 2: np.newaxis inserts a new axis
c[ :, np.newaxis].shape
# (2, 1)

# way 3: None does the same
c[ :, None].shape
# (2, 1)

# or construct directly as (2, 1), paying attention to the brackets...
c = np.array([[0], [0]])
c.shape
# (2, 1)

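# the column vector from above
c = np.array([[0], [0]])
c.shape
# (2, 1)

# (2, 3) + (2, 1): c is broadcast across the three columns
a = np.zeros([2, 3])
a.shape
# (2, 3)
a + c
# array([[0., 0., 0.],
#        [0., 0., 0.]])

# (3, 2) + (2, 1): sizes 3 and 2 on the left axis do not match
a = np.zeros([3, 2])
a.shape
# (3, 2)
a + c
# ValueError: operands could not be broadcast together with shapes (3,2) (2,1)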

a = np.array([0.0, 10.0, 20.0, 30.0])
a.shape
# (4,)

b = np.array([1.0, 2.0, 3.0])
b.shape
# (3,)

a[:, np.newaxis] * b
# array([[ 0.,  0.,  0.],
#        [10., 20., 30.],
#        [20., 40., 60.],
#        [30., 60., 90.]])
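
The last example is an outer product computed by broadcasting: a[:, np.newaxis] has shape (4, 1), b has shape (3,), and the rule yields (4, 3). As a quick sanity check, here is a sketch comparing it with np.outer as the reference:

```python
import numpy as np

a = np.array([0.0, 10.0, 20.0, 30.0])
b = np.array([1.0, 2.0, 3.0])

via_broadcasting = a[:, np.newaxis] * b  # (4, 1) * (3,) -> (4, 3)
via_outer = np.outer(a, b)               # reference implementation

assert via_broadcasting.shape == (4, 3)
assert np.array_equal(via_broadcasting, via_outer)
```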

TensorFlow

If by now you’re feeling less than enthusiastic about hearing a detailed exposition of how TensorFlow broadcasting differs from NumPy’s, there is good news: Basically, the rules are the same. However, when matrix operations work on batches, as in the case of matmul and friends, things may still get complicated; the best advice here probably is to carefully read the documentation (and as always, try things out).

Before revisiting our introductory matmul example, we quickly check that things really work just like in NumPy. Thanks to the tensorflow R package, there is no reason to do this in Python; so at this point, we switch to R. Attention: it’s 1-based indexing from here.

First check: (4, 1) added to (4,) should yield (4, 4):

a <- tf$ones(shape = c(4L, 1L))
a
# tf.Tensor(
# [[1.]
#  [1.]
#  [1.]
#  [1.]], shape=(4, 1), dtype=float32)

b <- tf$constant(c(1, 2, 3, 4))
b
# tf.Tensor([1. 2. 3. 4.], shape=(4,), dtype=float32)

a + b
# tf.Tensor(
# [[2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]], shape=(4, 4), dtype=float32)

And second, when we add tensors with shapes (3, 3) and (3,), the 1-d tensor should get added to every row (not every column):

a <- tf$constant(matrix(1:9, ncol = 3, byrow = TRUE), dtype = tf$float32)
a
# tf.Tensor(
# [[1. 2. 3.]
#  [4. 5. 6.]
#  [7. 8. 9.]], shape=(3, 3), dtype=float32)

b <- tf$constant(c(100, 200, 300))
b
# tf.Tensor([100. 200. 300.], shape=(3,), dtype=float32)

a + b
# tf.Tensor(
# [[101. 202. 303.]
#  [104. 205. 306.]
#  [107. 208. 309.]], shape=(3, 3), dtype=float32)

Now back to the initial matmul example.

Back to the puzzles

The documentation for matmul says,

The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication dimensions, and any further outer dimensions specify matching batch size.

So here (see code just below), the inner two dimensions look good, (2, 3) and (3, 2), while the one (one and only, in this case) batch dimension shows mismatching values 2 and 1, respectively.
A case for broadcasting thus: Both “batches” of a get matrix-multiplied with b.

a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
# 
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

b <- tf$constant(keras::array_reshape(101:106, dim = c(1, 3, 2)))
b
# tf.Tensor(
# [[[101. 102.]
#   [103. 104.]
#   [105. 106.]]], shape=(1, 3, 2), dtype=float64)

c <- tf$matmul(a, b)
c
# tf.Tensor(
# [[[ 622.  628.]
#   [1549. 1564.]]
# 
#  [[2476. 2500.]
#   [3403. 3436.]]], shape=(2, 2, 2), dtype=float64)

Let’s quickly check this really is what happens, by multiplying both batches separately:

tf$matmul(a[1, , ], b)
# tf.Tensor(
# [[[ 622.  628.]
#   [1549. 1564.]]], shape=(1, 2, 2), dtype=float64)

tf$matmul(a[2, , ], b)
# tf.Tensor(
# [[[2476. 2500.]
#   [3403. 3436.]]], shape=(1, 2, 2), dtype=float64)

Is it too weird to wonder whether broadcasting would also happen for matrix dimensions? E.g., could we try matmuling tensors of shapes (2, 4, 1) and (2, 3, 1), where the 4 x 1 matrix would be broadcast to 4 x 3? A quick test shows that this does not work.
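
NumPy’s np.matmul treats leading axes as batch dimensions in the same way, so both observations can be reproduced without TensorFlow. This is a NumPy sketch of the behaviour, not the TensorFlow code path, and the variable names are ours:

```python
import numpy as np

a = np.arange(1, 13, dtype=np.float64).reshape(2, 2, 3)
b = np.arange(101, 107, dtype=np.float64).reshape(1, 3, 2)

# Batch axes (2,) and (1,) are broadcast; the inner two axes are multiplied.
c = np.matmul(a, b)
print(c.shape)  # (2, 2, 2)

# Each batch of a is multiplied with the single matrix in b.
for i in range(2):
    assert np.array_equal(c[i], a[i] @ b[0])

# The matrix dimensions themselves are not broadcast: (4, 1) x (3, 1) fails.
try:
    np.matmul(np.ones((2, 4, 1)), np.ones((2, 3, 1)))
    matrix_dims_broadcast = True
except ValueError:
    matrix_dims_broadcast = False

print(matrix_dims_broadcast)  # False
```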

To see how, when dealing with TensorFlow operations, it really pays off to overcome one’s initial reluctance and actually consult the documentation, let’s try another one.

In the documentation for matvec, we are told:

Multiplies matrix a by vector b, producing a * b.
The matrix a must, following any transpositions, be a tensor of rank >= 2, with shape(a)[-1] == shape(b)[-1], and shape(a)[:-2] able to broadcast with shape(b)[:-1].

In our understanding, given input tensors of shapes (2, 2, 3) and (2, 3), matvec should perform two matrix-vector multiplications: once for each batch, as indexed by each input’s leftmost dimension. Let’s check this; so far, there is no broadcasting involved:

# two matrices
a <- tf$constant(keras::array_reshape(1:12, dim = c(2, 2, 3)))
a
# tf.Tensor(
# [[[ 1.  2.  3.]
#   [ 4.  5.  6.]]
# 
#  [[ 7.  8.  9.]
#   [10. 11. 12.]]], shape=(2, 2, 3), dtype=float64)

# two vectors
b <- tf$constant(keras::array_reshape(101:106, dim = c(2, 3)))
b
# tf.Tensor(
# [[101. 102. 103.]
#  [104. 105. 106.]], shape=(2, 3), dtype=float64)

c <- tf$linalg$matvec(a, b)
c
# tf.Tensor(
# [[ 614. 1532.]
#  [2522. 3467.]], shape=(2, 2), dtype=float64)

Double-checking, we manually multiply the corresponding matrices and vectors, and get:

tf$linalg$matvec(a[1,  , ], b[1, ])
# tf.Tensor([ 614. 1532.], shape=(2,), dtype=float64)

tf$linalg$matvec(a[2,  , ], b[2, ])
# tf.Tensor([2522. 3467.], shape=(2,), dtype=float64)

The same. Now, will we see broadcasting if b has just a single batch?

b <- tf$constant(keras::array_reshape(101:103, dim = c(1, 3)))
b
# tf.Tensor([[101. 102. 103.]], shape=(1, 3), dtype=float64)

c <- tf$linalg$matvec(a, b)
c
# tf.Tensor(
# [[ 614. 1532.]
#  [2450. 3368.]], shape=(2, 2), dtype=float64)

Multiplying every batch of a with b, for comparison:

tf$linalg$matvec(a[1,  , ], b)
# tf.Tensor([[ 614. 1532.]], shape=(1, 2), dtype=float64)

tf$linalg$matvec(a[2,  , ], b)
# tf.Tensor([[2450. 3368.]], shape=(1, 2), dtype=float64)

It worked!
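
For readers who want to replay the matvec shapes in NumPy: a batched matrix-vector product can be sketched with np.matmul by temporarily turning each vector into a column matrix. The matvec function below is a hypothetical helper of ours, not TensorFlow’s implementation.

```python
import numpy as np

def matvec(a, b):
    """Batched matrix-vector product built on np.matmul."""
    # Turn each vector into a column matrix, multiply, then drop the extra axis.
    return np.matmul(a, b[..., np.newaxis])[..., 0]

a = np.arange(1, 13, dtype=np.float64).reshape(2, 2, 3)
b_two = np.arange(101, 107, dtype=np.float64).reshape(2, 3)  # one vector per batch
b_one = np.arange(101, 104, dtype=np.float64).reshape(1, 3)  # a single vector

print(matvec(a, b_two))  # batches are paired up, no broadcasting
# [[ 614. 1532.]
#  [2522. 3467.]]

print(matvec(a, b_one))  # the single vector is broadcast to both batches
# [[ 614. 1532.]
#  [2450. 3368.]]

print(matvec(a[1], b_one).shape)  # (1, 2)
```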

Now, on to the other motivating example, using tfprobability.

Broadcasting everywhere

Here again is the setup:

library(tfprobability)
d <- tfd_normal(loc = c(0, 1), scale = matrix(1.5:4.5, ncol = 2, byrow = TRUE))
d
# tfp.distributions.Normal("Normal", batch_shape=[2, 2], event_shape=[], dtype=float64)

What is going on? Let’s inspect location and scale separately:

d$loc
# tf.Tensor([0. 1.], shape=(2,), dtype=float64)

d$scale
# tf.Tensor(
# [[1.5 2.5]
#  [3.5 4.5]], shape=(2, 2), dtype=float64)

Just focusing on these tensors and their shapes, and having been told that there’s broadcasting going on, we can reason like this: Aligning both shapes on the right and extending loc’s shape by 1 (on the left), we have (1, 2), which can be broadcast with (2, 2). In matrix-speak, loc is treated as a row and duplicated.

Meaning: We have two distributions with mean 0 (one of scale 1.5, the other of scale 3.5), and also two with mean 1 (corresponding scales being 2.5 and 4.5).

Here’s a more direct way to see this:

d$mean()
# tf.Tensor(
# [[0. 1.]
#  [0. 1.]], shape=(2, 2), dtype=float64)

d$stddev()
# tf.Tensor(
# [[1.5 2.5]
#  [3.5 4.5]], shape=(2, 2), dtype=float64)
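
The same pairing of locations and scales can be reproduced on plain arrays. This NumPy sketch only mimics the shape logic (it does not create any distributions), using np.broadcast_arrays:

```python
import numpy as np

loc = np.array([0.0, 1.0])                  # shape (2,)
scale = np.array([[1.5, 2.5], [3.5, 4.5]])  # shape (2, 2)

# loc is treated as shape (1, 2) and repeated along the new left axis.
loc_b, scale_b = np.broadcast_arrays(loc, scale)
print(loc_b)
# [[0. 1.]
#  [0. 1.]]

# One (location, scale) pair per distribution in the batch.
pairs = list(zip(loc_b.ravel().tolist(), scale_b.ravel().tolist()))
print(pairs)  # [(0.0, 1.5), (1.0, 2.5), (0.0, 3.5), (1.0, 4.5)]
```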

Puzzle solved!

Summing up, broadcasting is simple “in theory” (its rules are), but may need some practice to get right, especially since functions and operators have their own views on which parts of their inputs should broadcast, and which shouldn’t. Really, there is no way around looking up the actual behaviors in the documentation.

Hopefully though, you’ve found this post to be a good start into the topic. Maybe, like the author, you feel like you might see broadcasting going on anywhere in the world now. Thanks for reading!
