Non-volatile summation of contiguous dynamic ranges in Excel

This is just a very quick post, more as a reminder to my future self than anything else, but when summing a contiguous, dynamically-sized range in Excel, it can be orders of magnitude faster to use INDEX in place OFFSET, the latter of which is volatile and is therefore recalculated in response to changes to any other value on the sheet, regardless of its dependencies. This has already been explored by others, but just for future reference, always try to replace:

=SUM(A1:OFFSET(A1,rowOffset,0))

with:

=SUM(A1:INDEX(A:A,1+rowOffset))

That’s all for now!

Replacing cell value assignment loops with single array assignments in VBA

If you’re shifting large amounts of data around between VBA variables and Excel worksheets, there are substantial performance gains that can be achieved by making relatively straightforward changes to your VBA, especially if you’re faced with code similar to the following:

Dim i, j, columns, rows As Integer
columns = 50
rows = 50

For i = 1 To columns
    For j = 1 To rows
        Sheets("RandomValues").Cells(j, i).Value = Rnd
    Next j
Next i

If the intention is to simply move data from VBA to a worksheet in a single, atomic operation, the very first (and simplest) performance optimization is to switch off automatic recalculation before entering the loop:

Dim i, j, columns, rows As Integer
columns = 50
rows = 50

Application.Calculation = xlManual

For i = 1 To columns
    For j = 1 To rows
        Sheets("RandomValues").Cells(j, i).Value = Rnd
    Next j
Next i

Application.Calculation = xlCalculationAutomatic

In the above case of generating 2,500 (in this case, random) values in VBA and writing them out to the RandomValues sheet, switching off automatic recalculation reduced the mean runtime over five runs from 35.22s (standard deviation [SD] = 0.367s) to 0.270s (SD = 0.010s), a performance improvement of just over two orders of magnitude.

But what happens if we increase the number of values to, say, 250,000 in a 500×500 grid? In this case, even with automatic recalculation switched off, we see the runtime of the above code increasing roughly linearly back up to 26.62s (SD = 0.317s). The major bottleneck has now shifted to the cell value assignment, which is being called 250,000 times in the inner loop. The easiest way to alleviate this is to assign the values to an array in the inner loop, then assign the array directly to the desired range:

Option Base 1

Dim i, j, columns, rows As Integer
Dim values() As Double
columns = 500
rows = 500

Application.Calculation = xlManual

ReDim values(rows, columns)

For i = 1 To columns
    For j = 1 To rows
        values(j, i) = Rnd
    Next j
Next i

With Sheets("RandomValues")
    .Range(.Cells(1, 1), .Cells(rows, columns)).Value = values
End With

Application.Calculation = xlCalculationAutomatic

This once again gives us a performance improvement in excess of two orders of magnitude relative to the manual calculation single-cell assignment loop. With 100x more values than in the first two examples, the runtime dropped back down to 0.138s (SD = 0.004s). Since the VBA is no longer interacting with the worksheet in the inner loop, we could actually leave automatic recalculation on in this final example, although there is still a slight performance hit relative to switching on manual recalculation: leaving automatic recalculation on increased the runtime to 0.163s (SD = 0.004s).

If we examine these three approaches together on a modest number of data points, say 1,000 in a 10×100 grid, we can see the substantial incremental effects on the performance of moving the data out onto the worksheet:

SD, standard deviation; n=5 for each technique
Technique Mean runtime (ms) SD (ms)
Automatic recalculation with individual cell assignment 13,851.80 292.54
Manual recalculation with individual cell assignment 106.04 3.48
Manual recalculation with single array assignment 0.86 0.14

The improvement is so dramatic that if you do have any VBA that’s performing a similar task, it’s definitely worth investigating both of these relatively straightforward optimizations.