Validating Excel Markov models in R

As part of my ongoing campaign to stop using Microsoft Excel for tasks to which it isn’t well suited, I’ve recently started validating all Excel-based Markov implementations using R. Regrettably, Excel is still the preferred “platform” for developing health economic models (hence R being relegated to validation work), but recreating Markov models in R starkly illustrates how much time could be saved if more reimbursement or health technology assessment agencies would consider models written in non-proprietary languages rather than proprietary file formats such as Excel.

As a brief re-cap, strict Markov chains are memoryless, probability-driven processes in which transitions through a defined state space, S, are driven by a transition matrix, briefly as follows:

$$ \Pr(X_{n+1}=x\mid X_1=x_1, X_2=x_2, \ldots, X_n=x_n) = \Pr(X_{n+1}=x\mid X_n=x_n) $$

where $$S = \{X_1 \ldots X_n\} $$

A typical Markov modelling approach in Excel is to start with a transition matrix as a contiguous range at the top of sheet, lay out the initial state distributions immediately below it, and model state transitions using a series of rows containing SUMPRODUCT() formulae, which multiply and sum the state distribution in the previous row with the relevant transition probability row. As an illustrative example, a simple 5-state Markov model running over 120 cycles (say, 10 years with a monthly cycle length) would result in 600 SUMPRODUCT() formulae. Despite the ease with which the cells can be populated with Excel’s AutoFill functionality, making any change to the model (e.g. adding a state) requires the whole range to be updated every time. The use of per-row summation checks with a Machine epsilon (2^-1022) can highlight errors in the formulae, but this requires yet more formulae to update should the model structure change.

Conversely, in R, once the transition matrix and initial states have been defined, implementing a similar Markov model requires just one line of code (with the use of the Matrix exponential package):

install.packages("expm")

tm <- matrix(c(
	0.9,0.1,0,0,0,
	0,0.9,0.1,0,0,
	0,0,0.9,0.1,0,
	0,0,0,0.9,0.1,
	0,0,0,0,1), byrow=TRUE, nrow=5)

initial <- c(1,0,0,0,0)

print(initial %*% (tm %^% 120))

             [,1]         [,2]         [,3]        [,4]     [,5]
[1,] 3.229246e-06 4.305661e-05 0.0002846521 0.001244035 0.998425

This simply prints out the final state distribution after 120 cycles, but the entire Markov trace can be printed using R’s built-in sapply() and t() functions without a dramatic increase in the complexity of the code:

print(t(sapply(1:120, function(x) initial %*% (tm %^% x))))

             [,1]         [,2]         [,3]        [,4]        [,5]
[1,] 1.000000e+00 0.000000e+00 0.0000000000 0.000000000 0.000000000
[2,] 9.000000e-01 1.000000e-01 0.0000000000 0.000000000 0.000000000
...

At this point in Excel, the state distributions might subsequently be used to tally up costs and quality-adjusted life expectancy (in quality-adjusted life years or QALYs) for each state. This would require another two identically-sized ranges of cells to capture cost and QALY estimates for each state, trebling the number of formulas to 1,800. In R, adding in and summing up state costs is much more straightforward:

costs <- c(300, 350, 400, 200, 0)
timehorizon <- 120
utilities <- c(0.85, 0.8, 0.75, 0.85, 0)
discount <- 0.035

print(apply(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * costs * 1/(1+discount)^x), c(1), sum)) 
print(apply(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * utilities * 1/(1+discount)^x), c(1), sum))

[1] 2699.991 3499.853 3998.790 1997.099    0.000
[1] 7.649975 7.999664 7.497731 8.487670 0.000000

To output the top-line health economic outcomes over the entire Markov simulation, the code can then actually be simplified (since we don’t need cycle-by-cycle results) to sum the costs and the quality-adjusted life expectancy:

print(sum(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * costs * 1/(1+discount)^x)))
print(sum(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * utilities * 1/(1+discount)^x)))

[1] 6293.488
[1] 16.01336

And finally, with the introduction of a second transition matrix, a complete, two-arm, five-state Markov model with support for discounting and an arbitrary time horizon can be implemented in 10 lines of R as follows:

tm <- matrix(c(
	0.9,0.1,0,0,0,
	0,0.9,0.1,0,0,
	0,0,0.9,0.1,0,
	0,0,0,0.9,0.1,
	0,0,0,0,1), byrow=TRUE, nrow=5)
	
tm2 <- matrix(c(
	0.91,0.09,0,0,0,
	0,0.91,0.09,0,0,
	0,0,0.91,0.09,0,
	0,0,0,0.91,0.09,
	0,0,0,0,1), byrow=TRUE, nrow=5)

initial <- c(1,0,0,0,0)

costs <- c(300, 350, 400, 200, 0)
utilities <- c(0.85, 0.8, 0.75, 0.85, 0)
timehorizon <- 120
discount <- 0.035

cost <- c(
	sum(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * costs * 1/(1+discount)^x)),
	sum(sapply(1:timehorizon, function(x) initial %*% (tm2 %^% x) * costs * 1/(1+discount)^x))
)
qale <- c(
	sum(sapply(1:timehorizon, function(x) initial %*% (tm %^% x) * utilities * 1/(1+discount)^x)),
	sum(sapply(1:timehorizon, function(x) initial %*% (tm2 %^% x) * utilities * 1/(1+discount)^x))
)
print( (cost[1] - cost[2])/(qale[1] - qale[2]) )

[1] 395.0946

Having all of this functionality in 10 easily readable lines of code strikes an excellent balance between concision and clarity, and is definitely preferable to 1,800+ formulae in Excel with equivalent functionality (or even to a similar Markov model laid out in TreeAge). The above R code, while still slightly contrived in its simplicity, can be printed on a single sheet of paper and, crucially, can be very easily adapted and modified without needing to worry about filling ranges of cells with formulae including a mix of relative and absolute cell referencing.

While R is not necessarily the very best language for health economic modelling, given how straightforwardly a Markov model (as but one example) can be implemented in the language, it wouldn’t be a bad candidate to be supported by healthcare technology assessment agencies for the purposes of economic evaluation. More importantly, regardless of any specific merits and shortcomings of R, the adoption or acceptance of any similar language would represent an excellent first step away from the current near-ubiquitous use of proprietary modelling technologies or platforms that are, in many cases, ill-suited to the task.

Quality of life by age, sex and body mass index

Just a quick post on some data reported in Section 6 of NICE Clinical Guideline 43 (originally from a 2004 publication by Macran) which reports quality of life (QoL) by sex, and BMI and age brackets:1Macran S. The Relationship between Body Mass Index and Health-Related Quality of Life. 190. 2004. York, The Publications Office, Centre for Health Economics. Centre for Health Economics Discussion … Continue reading

NICE CG43 QoL Figure

The only unexpected aspect of the data is the high QoL in males with BMI greater than 40 aged 25-34 and females aged 18-24, but that’s likely driven by low sample size. Otherwise, there’s a general downward trend with age and an inverted J-shaped curve with respect to BMI, which approximately mirrors the J-shaped curves observed when plotting BMI against mortality 2Berrington de Gonzalez A, Hartge P, Cerhan JR et al. Body-mass index and mortality among 1.46 million white adults. N Engl J Med. 2010;363(23):2211–9.,3Tobias DK, Pan A, Jackson CL et al. Body-mass index and mortality among adults with incident type 2 diabetes. N Engl J Med. 2014;370(3):233–44.,4Dudina A, Cooney MT, Bacquer DD et al. Relationships between body mass index, cardiovascular mortality, and risk factors: a report from the SCORE investigators. Eur J Cardiovasc Prev Rehabil. … Continue reading and certain diabetes complications.5Sohn MW, Budiman-Mak E, Lee TA, Oh E, Stuck RM. Significant J-shaped association between body mass index (BMI) and diabetic foot ulcers. Diabetes Metab Res Rev. 2011;27(4):402–9.

References

References
1 Macran S. The Relationship between Body Mass Index and Health-Related Quality of Life. 190. 2004. York, The Publications Office, Centre for Health Economics. Centre for Health Economics Discussion Paper.
2 Berrington de Gonzalez A, Hartge P, Cerhan JR et al. Body-mass index and mortality among 1.46 million white adults. N Engl J Med. 2010;363(23):2211–9.
3 Tobias DK, Pan A, Jackson CL et al. Body-mass index and mortality among adults with incident type 2 diabetes. N Engl J Med. 2014;370(3):233–44.
4 Dudina A, Cooney MT, Bacquer DD et al. Relationships between body mass index, cardiovascular mortality, and risk factors: a report from the SCORE investigators. Eur J Cardiovasc Prev Rehabil. 2011;18(5):731-42.
5 Sohn MW, Budiman-Mak E, Lee TA, Oh E, Stuck RM. Significant J-shaped association between body mass index (BMI) and diabetic foot ulcers. Diabetes Metab Res Rev. 2011;27(4):402–9.