Replacing cell value assignment loops with single array assignments in VBA

If you’re shifting large amounts of data around between VBA variables and Excel worksheets, there are substantial performance gains that can be achieved by making relatively straightforward changes to your VBA, especially if you’re faced with code similar to the following:

Dim i, j, columns, rows As Integer
columns = 50
rows = 50

For i = 1 To columns
    For j = 1 To rows
        Sheets("RandomValues").Cells(j, i).Value = Rnd
    Next j
Next i

If the intention is to simply move data from VBA to a worksheet in a single, atomic operation, the very first (and simplest) performance optimization is to switch off automatic recalculation before entering the loop:

Dim i, j, columns, rows As Integer
columns = 50
rows = 50

Application.Calculation = xlManual

For i = 1 To columns
    For j = 1 To rows
        Sheets("RandomValues").Cells(j, i).Value = Rnd
    Next j
Next i

Application.Calculation = xlCalculationAutomatic

In the above case of generating 2,500 (in this case, random) values in VBA and writing them out to the RandomValues sheet, switching off automatic recalculation reduced the mean runtime over five runs from 35.22s (standard deviation [SD] = 0.367s) to 0.270s (SD = 0.010s), a performance improvement of just over two orders of magnitude.

But what happens if we increase the number of values to, say, 250,000 in a 500×500 grid? In this case, even with automatic recalculation switched off, we see the runtime of the above code increasing roughly linearly back up to 26.62s (SD = 0.317s). The major bottleneck has now shifted to the cell value assignment, which is being called 250,000 times in the inner loop. The easiest way to alleviate this is to assign the values to an array in the inner loop, then assign the array directly to the desired range:

Option Base 1

Dim i, j, columns, rows As Integer
Dim values() As Double
columns = 500
rows = 500

Application.Calculation = xlManual

ReDim values(rows, columns)

For i = 1 To columns
    For j = 1 To rows
        values(j, i) = Rnd
    Next j
Next i

With Sheets("RandomValues")
    .Range(.Cells(1, 1), .Cells(rows, columns)).Value = values
End With

Application.Calculation = xlCalculationAutomatic

This once again gives us a performance improvement in excess of two orders of magnitude relative to the manual calculation single-cell assignment loop. With 100x more values than in the first two examples, the runtime dropped back down to 0.138s (SD = 0.004s). Since the VBA is no longer interacting with the worksheet in the inner loop, we could actually leave automatic recalculation on in this final example, although there is still a slight performance hit relative to switching on manual recalculation: leaving automatic recalculation on increased the runtime to 0.163s (SD = 0.004s).

If we examine these three approaches together on a modest number of data points, say 1,000 in a 10×100 grid, we can see the substantial incremental effects on the performance of moving the data out onto the worksheet:

SD, standard deviation; n=5 for each technique
Technique Mean runtime (ms) SD (ms)
Automatic recalculation with individual cell assignment 13,851.80 292.54
Manual recalculation with individual cell assignment 106.04 3.48
Manual recalculation with single array assignment 0.86 0.14

The improvement is so dramatic that if you do have any VBA that’s performing a similar task, it’s definitely worth investigating both of these relatively straightforward optimizations.

Adding text cells to an Excel worksheet using xlsx4j

I’ve just started using xlsx4j (a large collection of classes in what is more widely known as the docx4j project) to implement Excel export as part of a large, ongoing StackThread project. As an extremely brief introduction, docx4j uses the Java Architecture for XML Binding (JAXB) to create an in-memory representation of the Microsoft Open XML format that underlies all Microsoft Office documents in the post-2007 era.

Since xlsx4j is effectively a wrapper around the Open XML format, it becomes immediately apparent in using the library that you need to have some familiarity with the file format in order to get anything working. The first snag I hit was the manner in which Excel handles string storage, namely through the use of a shared string table. The shared string table contains every unique string in the workbook only once. Every cell that contains a string actually just holds a reference to the appropriate string in the shared string table. While I did eventually get this working, it took quite a lot of code to do so:

// Create a new spreadsheet package
SpreadsheetMLPackage pkg = SpreadsheetMLPackage.createPackage();

// Create a new worksheet part and retrieve the sheet data
WorksheetPart sheet = pkg.createWorksheetPart(new PartName("/xl/worksheets/sheet1.xml"), "Sheet 1", 1);
SheetData sheetData = sheet.getJaxbElement().getSheetData();

// Keep track of how many strings we've added
long sharedStringCounter = 0;

// Create a new row
Row row = Context.getsmlObjectFactory().createRow();

// Create a shared strings table instance
CTSst sharedStringTable = new CTSst();
CTXstringWhitespace ctx;
CTRst crt;

// Create 10 cells and add them to the row
for (int i = 0; i < 10; i++) {
	
	// Create a shared string
	crt = new CTRst();
	ctx = Context.getsmlObjectFactory().createCTXstringWhitespace();
	ctx.setValue("Shared string text " + Integer.toString(i + 1));
	crt.setT(ctx);
    
	// Add it to the shared string table
	sharedStringTable.getSi().add(crt);
 
	// Add a reference to the shared string to our cell using the counter
    Cell cell = Context.getsmlObjectFactory().createCell();
    cell.setT(STCellType.S);
    cell.setV(String.valueOf(sharedStringCounter));
    
    // Add the cell to the row and increment the counter
    row.getC().add(cell);
    sharedStringCounter++;
}

// Add the row to our sheet
sheetData.getRow().add(row);

// Set the string and unique string counts on the shared string table
sharedStringTable.setCount(sharedStringCounter);
sharedStringTable.setUniqueCount(sharedStringCounter);

// Create a SharedStrings workbook part 
SharedStrings sharedStrings = new SharedStrings(new PartName("/xl/sharedStrings.xml"));

// Add the shared string table to the part
sharedStrings.setJaxbElement(sharedStringTable);

// Then add the part to the workbook
Parts parts = pkg.getParts();
Part workBook = parts.get( new PartName("/xl/workbook.xml") );
workBook.addTargetPart(sharedStrings);

This is, of course, greatly simplified. Notably, as you'll see on line 46, it's not keeping track of unique strings. While this wouldn't be too difficult to implement (a HashMap<String, Integer> could be used to keep count of unique strings and instances), the above code feels very heavyweight for one of the most basic spreadsheet manipulation tasks. Thankfully, there is an easier way.

The easier way

After a little bit of digging around, I discovered that cells also support inline strings. With the addition of a convenience newCellWithInlineString method to generate INLINE_STR cells, the shared string table code can be removed entirely, leaving this:

// Create a new spreadsheet package
SpreadsheetMLPackage pkg = SpreadsheetMLPackage.createPackage();

// Create a new worksheet part and retrieve the sheet data
WorksheetPart sheet = pkg.createWorksheetPart(new PartName("/xl/worksheets/sheet1.xml"), "Sheet 1", 1);
SheetData sheetData = sheet.getJaxbElement().getSheetData();

// Create a new row
Row row = Context.getsmlObjectFactory().createRow();

// Create 10 cells and add them to the row
for (int i = 0; i < 10; i++) {
	row.getC().add(this.newCellWithInlineString("Inline string text " + Integer.toString(i + 1)));
}

// Add the row to our sheet
sheetData.getRow().add(row);

private Cell newCellWithInlineString(String string) {
	
	CTXstringWhitespace ctx = Context.getsmlObjectFactory().createCTXstringWhitespace();
	ctx.setValue(string);
	
	CTRst ctrst = new CTRst();
	ctrst.setT(ctx);
	
	Cell newCell = Context.getsmlObjectFactory().createCell();
	newCell.setIs(ctrst);
	newCell.setT(STCellType.INLINE_STR);
	
	return newCell;
}

While the first example could definitely have been distilled down and refactored to keep the workings of the shared string table hidden away, the lack of string-tracking overhead in the second example makes for much easier-to-read code in my opinion. It's also much faster to get to the point of functional Excel export without having the hassle of setting up the shared string table. I haven't yet investigated the performance implications of using INLINE_STR when opening the workbook in Excel, but from a quick look at the XML after opening and saving an Excel workbook created using the INLINE_STR method, it looks as though Excel generates its own shared string table when the workbook's first saved.

Code-signing Windows executables using Authenticode

I’ve recently made a Windows installer (in the form of a WinRAR self-extracting archive) available for download to one of our clients. When testing out the download link (which is currently just secured using HTTP Basic Auth), I chose to “Run” the installer directly and was presented with Microsoft’s SmartScreen Application Reputation feature, introduced with IE9 in 2010. For an infrequently downloaded, unsigned installer package, the dialog that appears after the installer has downloaded and Windows has run its security scan looks like this:
SmartScreen Application Reputation warningThis is obviously fairly intimidating to the end user. To complete the download and run the installer, the user must click “Actions”, which displays a screen noting that the publisher is unknown, that SmartScreen Filter “reported that this program is not commonly downloaded and is not signed by its author” and presenting three options:

  • Don’t run this program (recommended)
  • Delete program
  • Run anyway

The “Don’t run this program” option is selected by default. This is bespoke software, so it’s never going to be “commonly downloaded”, but is there anything we can do to make the process less user hostile? The first port of call was to obtain a code-signing certificate to use with Microsoft’s Authenticode program. The process is summarized in this MSDN blog post from 2011, but I thought it would be worth documenting the process here since the nuts and bolts of it don’t all seem to be in one place anywhere else:

1. Obtain a code-signing certificate

Since DigiCert were on our client’s list of approved certificate providers, we ordered a 1-year code signing certificate which, at the time of writing, cost $223. After an email asking to confirm the order and a telephone call from DigiCert to confirm contact details, DigiCert emailed through a link to create and download the certificate. Since I was downloading the certificate onto an OS X machine in Chrome, the certificate was automatically added to the login Keychain. If you downloaded the certificate using IE or Chrome in Windows, you can skip to step 3 as the certificate should already be in the Windows certificate store.

2. Export certificate to a .p12 file from Keychain Access

Select the downloaded certificate in your Keychain, ensuring that you’ve first selected the appropriate Keychain and the “My Certificates” category to the left:
Keychain export
Then choose File > Export Items… and save the certificate in “Personal Information Exchange (.p12)” format. After choosing the file location, you’ll be prompted to set a password for the exported certificate (and then for the Administrator password to actually export the certificate):Keychain export 3

3. Install signtool in Windows

With the certificate exported to a .p12 file, it’s now time to sign your Windows executable. This is best done in Windows using Microsoft’s signtool command line utility. More information about the utility can be found over at MSDN, but you’ll first need to ensure that it’s installed. The utility is bundled with Microsoft Visual Studio 2005 or later and is included in the Microsoft Windows SDK. If you’ve gone down the route of installing the Windows SDK, you should find that signtool is installed here: C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin

4. Sign the executable

Since I downloaded the certificate in OS X, the appropriate signtool incantation references the .p12 file exported in step 2:

>signtool sign
/t http://timestamp.digicert.com
/f "C:\path\to\certificate.p12"
/p password_you_set_during_p12_export
/a "C:\path\to\the\executable\you\want\to\sign.exe"

Note that the timestamp server option specified by \t is optional, but since it adds a cryptographically-verifiable timestamp to the signature, it proves that the executable was signed when the certificate was still valid, obviating the need to re-sign every signed executable at the time of certificate expiry. Each certificate authority has its own timestamp server, which you should be able to locate in their documentation.

After running the above command (without the /f and /p options if you downloaded the certificate in Windows in the first place), you should see the following at the command prompt:

Done Adding Additional Store
Successfully signed and timestamped: C:\path\to\the\executable\you\want\to\sign.exe

And that’s it. Your executable is now cryptographically signed. You can verify this in Windows, by selecting the executable in Windows Explorer, selecting File > Properties and switching to the Digital Signatures tab, under which the timestamped signature should be listed.

Conclusion

The process of signing a Windows executable with a cryptographic signature is relatively straightforward, but unfortunately this isn’t the end of the road for bespoke software providers who are trying to appear reputable to their users. Once we re-uploaded the file to our web server and downloaded it again, the only change from the end-user perspective was that the SmartScreen warning had a yellow banner in place of the red banner and the red warning “shield” from the original dialog had disappeared:
SmartScreen Application Reputation warning after signing
Clicking “Actions” brings up a dialog with exactly the same layout as with the unsigned executable (including a red “shield”), but the publisher is now listed and the text changes from this:

SmartScreen Filter reported that this program is not commonly downloaded and is not signed by its author.

to this:

SmartScreen Filter reported that this program is not commonly downloaded.

Both dialogs still include the text “If you received this program as an unexpected solicitation, link, or attachment, it might be an attempt to fool you. We recommend that you delete it.” and the recommended option is still “Don’t run this program”:
SmartScreen filter after signing

So the problem we’re having is now exclusively one of “reputation” from the perspective of Microsoft’s SmartScreen Application Reputation software, the workings of which are entirely opaque to us. So the next step here is to upgrade to an Extended Validation (EV) certificate and sign the installer using that. From the MSDN article:

“These certificates tend to be more expensive and harder-to-use (requiring security token hardware) but have the benefit of providing faster accumulation of reputation for SmartScreen.”

Although we don’t have the certificate just yet, DigiCert customer service have already been very good about the upgrade to an EV certificate. They credited back the $223 for our standard certificate back within a couple of hours and won’t revoke the certificate until we confirm receipt of the hardware “e-token” required for EV certificate signing. More to follow!

Update June 25th, 2014: Success with the EV certificate!