I’ve just started using xlsx4j (a large collection of classes in what is more widely known as the docx4j project) to implement Excel export as part of a large, ongoing StackThread project. As an extremely brief introduction, docx4j uses the Java Architecture for XML Binding (JAXB) to create an in-memory representation of the Microsoft Open XML format that underlies all Microsoft Office documents in the post-2007 era.
Since xlsx4j is effectively a wrapper around the Open XML format, it becomes immediately apparent in using the library that you need to have some familiarity with the file format in order to get anything working. The first snag I hit was the manner in which Excel handles string storage, namely through the use of a shared string table. The shared string table contains every unique string in the workbook only once. Every cell that contains a string actually just holds a reference to the appropriate string in the shared string table. While I did eventually get this working, it took quite a lot of code to do so:
// Create a new spreadsheet package SpreadsheetMLPackage pkg = SpreadsheetMLPackage.createPackage(); // Create a new worksheet part and retrieve the sheet data WorksheetPart sheet = pkg.createWorksheetPart(new PartName("/xl/worksheets/sheet1.xml"), "Sheet 1", 1); SheetData sheetData = sheet.getJaxbElement().getSheetData(); // Keep track of how many strings we've added long sharedStringCounter = 0; // Create a new row Row row = Context.getsmlObjectFactory().createRow(); // Create a shared strings table instance CTSst sharedStringTable = new CTSst(); CTXstringWhitespace ctx; CTRst crt; // Create 10 cells and add them to the row for (int i = 0; i < 10; i++) { // Create a shared string crt = new CTRst(); ctx = Context.getsmlObjectFactory().createCTXstringWhitespace(); ctx.setValue("Shared string text " + Integer.toString(i + 1)); crt.setT(ctx); // Add it to the shared string table sharedStringTable.getSi().add(crt); // Add a reference to the shared string to our cell using the counter Cell cell = Context.getsmlObjectFactory().createCell(); cell.setT(STCellType.S); cell.setV(String.valueOf(sharedStringCounter)); // Add the cell to the row and increment the counter row.getC().add(cell); sharedStringCounter++; } // Add the row to our sheet sheetData.getRow().add(row); // Set the string and unique string counts on the shared string table sharedStringTable.setCount(sharedStringCounter); sharedStringTable.setUniqueCount(sharedStringCounter); // Create a SharedStrings workbook part SharedStrings sharedStrings = new SharedStrings(new PartName("/xl/sharedStrings.xml")); // Add the shared string table to the part sharedStrings.setJaxbElement(sharedStringTable); // Then add the part to the workbook Parts parts = pkg.getParts(); Part workBook = parts.get( new PartName("/xl/workbook.xml") ); workBook.addTargetPart(sharedStrings);
This is, of course, greatly simplified. Notably, as you'll see on line 46, it's not keeping track of unique strings. While this wouldn't be too difficult to implement (a HashMap<String, Integer> could be used to keep count of unique strings and instances), the above code feels very heavyweight for one of the most basic spreadsheet manipulation tasks. Thankfully, there is an easier way.
The easier way
After a little bit of digging around, I discovered that cells also support inline strings. With the addition of a convenience newCellWithInlineString method to generate INLINE_STR cells, the shared string table code can be removed entirely, leaving this:
// Create a new spreadsheet package SpreadsheetMLPackage pkg = SpreadsheetMLPackage.createPackage(); // Create a new worksheet part and retrieve the sheet data WorksheetPart sheet = pkg.createWorksheetPart(new PartName("/xl/worksheets/sheet1.xml"), "Sheet 1", 1); SheetData sheetData = sheet.getJaxbElement().getSheetData(); // Create a new row Row row = Context.getsmlObjectFactory().createRow(); // Create 10 cells and add them to the row for (int i = 0; i < 10; i++) { row.getC().add(this.newCellWithInlineString("Inline string text " + Integer.toString(i + 1))); } // Add the row to our sheet sheetData.getRow().add(row); private Cell newCellWithInlineString(String string) { CTXstringWhitespace ctx = Context.getsmlObjectFactory().createCTXstringWhitespace(); ctx.setValue(string); CTRst ctrst = new CTRst(); ctrst.setT(ctx); Cell newCell = Context.getsmlObjectFactory().createCell(); newCell.setIs(ctrst); newCell.setT(STCellType.INLINE_STR); return newCell; }
While the first example could definitely have been distilled down and refactored to keep the workings of the shared string table hidden away, the lack of string-tracking overhead in the second example makes for much easier-to-read code in my opinion. It's also much faster to get to the point of functional Excel export without having the hassle of setting up the shared string table. I haven't yet investigated the performance implications of using INLINE_STR when opening the workbook in Excel, but from a quick look at the XML after opening and saving an Excel workbook created using the INLINE_STR method, it looks as though Excel generates its own shared string table when the workbook's first saved.