These functions fit penalized dummy coefficients of ordinally scaled independent variables. The covariates are assumed to take values 1,2,…,max, where max denotes the (columnwise) highest level observed in the data. The ordSmooth()
function penalizes the sum of squared differences of adjacent dummy coefficients, while ordSelect()
uses a group lasso penalty on differences of adjacent dummies and ordFusion()
proceeds with a fused lasso penalty on differences of adjacent dummy coefficients using the glmpath algorithm. For details on the different penalties used, please see also Tutz and Gertheiss (2014, 2016).
Smoothing
For smoothing only, the generalized linear model is fitted via penalized maximum likelihood. In particular, the logit or poisson model is fitted by penalized Fisher scoring. For stopping the iterations the criterion \(\sqrt(\sum((b.new-b.old)^2)/\sum(b.old^2)) < \delta\) is used.
The tuning parameter \(\lambda\) controls the overall strength of the penalty. If \(\lambda = 0\) one obtains the usual un-penalized coefficient estimates. If \(\lambda \to \infty\) coefficients are enforced to shrink towards 0. For \(0 < \lambda < \infty\) parameters corresponding to adjacent categories are enforced to obtain similar values, as it is sensible to assume that coefficients vary smoothly over ordinal categories.
It should be highlighted, that ordSmooth()
is intended for use with high-dimensional ordinal predictors; more precisely, if the number of ordinal predictors is large. Package ordPens
, however, also includes auxiliary functions such that mgcv::gam()
(Wood, 2011 and 2017) can be used for fitting generalized linear and additive models with first- and second-order ordinal smoothing penalty as well as built-in smoothing parameter selection. In addition, mgcv
tools for further statistical inference can be used. Note, however, significance of smooth (ordinal) terms is only reliable in case of the second-order penalty. Also note, if using gam()
, dummy coefficients/fitted functions are centered over the data observed. For details, see also Gertheiss et al. (2021).
Simulated data example
Now, let’s demonstrate and compare the three methods for the default Gaussian model.
Simulating data with ordinal predictors:
set.seed(123)
# generate (ordinal) predictors
x1 <- sample(1:8, 100, replace = TRUE)
x2 <- sample(1:6, 100, replace = TRUE)
x3 <- sample(1:7, 100, replace = TRUE)
# the response
y <- -1 + log(x1) + sin(3*(x2-1)/pi) + rnorm(100)
# x matrix
x <- cbind(x1,x2,x3)
# lambda values
lambda <- c(1000, 500, 200, 100, 70, 50, 30, 20, 10, 1)
The syntax of the three functions is very similar, type also ?ordSmooth
or one of the other functions. The fitted ordPen
object contains the fitted coefficients and can be visualized by the generic plot
function:
osm1 <- ordSmooth(x = x, y = y, lambda = lambda)
osl <- ordSelect(x = x, y = y, lambda = lambda)
#> Setting update.every = length(lambda) + 1
#> Lambda: 1000 nr.var: 1
#> Lambda: 500 nr.var: 1
#> Lambda: 200 nr.var: 1
#> Lambda: 100 nr.var: 8
#> Lambda: 70 nr.var: 13
#> Lambda: 50 nr.var: 13
#> Lambda: 30 nr.var: 13
#> Lambda: 20 nr.var: 13
#> Lambda: 10 nr.var: 13
#> Lambda: 1 nr.var: 19
ofu <- ordFusion(x = x, y = y, lambda = lambda)
par(mar = c(4.1, 4.1, 3.1, 1.1))
plot(osm1)
plot(osl, main = "")
plot(ofu, main = "")
The vector of penalty parameters lambda
is sorted decreasingly and each curve corresponds to a \(\lambda\) value, with larger values being associated with the darker colors. The top row corresponds to smoothing, the middle row shows selection and the bottom row demonstrates variable fusion.
It is seen that for smaller \(\lambda\), the coefficients are more wiggly, while differences of adjacent categories are more and more shrunk when \(\lambda\) increases, which yields smoother coefficients.
Variable 3 is excluded from the model for \(\lambda \geq 10\) with both ordSelect()
and ordFusion()
. In general, ordFusion()
yields estimates that tend to be flat over adjacent categories, which represents a specific form of smoothness. With the fused lasso, all categories are fused and excluded at some point, which can be viewed from the bottom row or from the matrix of estimated coefficients (not printed here):
round(osm1$coefficients, digits = 3)
round(osl$coefficients, digits = 3)
round(ofu$coefficients, digits = 3)
Now, let’s plot the path of coefficients against the log-lambda value:
matplot(log(lambda), t(osm1$coefficients), type = "l", xlab = expression(log(lambda)),
ylab = "Coefficients", col = c(1, rep(2,8), rep(3,6), rep(4,7)), cex.main = 1,
main = "Smoothing", xlim = c(max(log(lambda)), min(log(lambda))))
axis(4, at = osm1$coefficients[,ncol(osm1$coefficients)], label = rownames(osm1$coefficients),
line = -.5, las = 1, tick = FALSE, cex.axis = 0.75)
matplot(log(lambda), t(osl$coefficients), type = "l", xlab = expression(log(lambda)),
ylab = "Coefficients", col = c(1, rep(2,8), rep(3,6), rep(4,7)), cex.main = 1,
main = "Selection", xlim = c(max(log(lambda)), min(log(lambda))))
axis(4, at = osl$coefficients[,ncol(osl$coefficients)], label = rownames(osl$coefficients),
line = -.5, las = 1, tick = FALSE, cex.axis = 0.75)
matplot(log(lambda), t(ofu$coefficients), type = "l", xlab = expression(log(lambda)),
ylab = "Coefficients", col = c(1, rep(2,8), rep(3,6), rep(4,7)), cex.main = 1,
main = "Fusion", xlim = c(max(log(lambda)), min(log(lambda))))
axis(4, at = ofu$coefficients[,ncol(ofu$coefficients)], label = rownames(ofu$coefficients),
line = -.5, las = 1, tick = FALSE, cex.axis = 0.75)
Each curve corresponds to a dummy coefficient and each color corresponds to a variable (intercept not penalized here). The effect of the different penalties can be seen very clearly.
Using mgcv::gam()
with ordPens
and comparison to ordSmooth()
As mentioned before, the package also builds a bridge to mgcv::gam()
by providing a new type of spline basis for ordered factors s(..., bs = "ordinal")
, such that smooth terms in the GAM formula can be used. In addition, generic functions for prediction and plotting are provided; and existing mgcv
tools for further statistical inference can be used this way, which enables a very flexible analysis.
Before estimation, we need to modify the ordinal predictors to the class of ordered factors. We also add a nominal covariate to control for.
x1 <- as.ordered(x1)
x2 <- as.ordered(x2)
x3 <- as.ordered(x3)
u1 <- sample(1:8, 100, replace = TRUE)
u <- cbind(u1)
osm2 <- ordSmooth(x = x, y = y, u = u, lambda = lambda)
Now, let us use gam()
from mgcv for model fitting. Estimation with first-order penalty and smoothing parameter selection by REML:
gom1 <- gam(y ~ s(x1, bs = "ordinal", m = 1) + s(x2, bs = "ordinal", m = 1) +
s(x3, bs = "ordinal", m = 1) + factor(u1), method = "REML")
Using second-order penalty instead:
gom2 <- gam(y ~ s(x1, bs = "ordinal", m = 2) + s(x2, bs = "ordinal", m = 2) +
s(x3, bs = "ordinal", m = 2) + factor(u1), method = "REML")
The summary command including significance of smooth terms can be used in the usual way. Please note, the latter is only reliable for m = 2
, as mentioned above.
summary(gom2)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> y ~ s(x1, bs = "ordinal", m = 2) + s(x2, bs = "ordinal", m = 2) +
#> s(x3, bs = "ordinal", m = 2) + factor(u1)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.01402 0.37928 -0.037 0.971
#> factor(u1)2 0.23057 0.58116 0.397 0.693
#> factor(u1)3 0.54998 0.52426 1.049 0.297
#> factor(u1)4 0.31065 0.48449 0.641 0.523
#> factor(u1)5 0.15299 0.58312 0.262 0.794
#> factor(u1)6 -0.06266 0.48096 -0.130 0.897
#> factor(u1)7 0.27193 0.48767 0.558 0.579
#> factor(u1)8 0.37443 0.48599 0.770 0.443
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(x1) 1.710 2.146 15.047 2.07e-06 ***
#> s(x2) 3.019 3.731 9.029 1.17e-05 ***
#> s(x3) 1.000 1.000 0.100 0.753
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.419 Deviance explained = 49.4%
#> -REML = 155.64 Scale est. = 1.3422 n = 100
We can also visualize the coefficients including confidence intervals by executing the plot
function:
par(mar = c(4.1, 4.1, 3.1, 1.1))
plot(osm2)
plot(gom1)
plot(gom2)
Predicting from an ordPen
object
The package also includes a generic function for prediction. It takes a fitted ordPen
object and a matrix/data frame of new observations of the considered ordinal predictors (denoted by newx
). If the fitted object contains also additional nominal or metric predictors, further matrices of new observations corresponding to those predictors need to be added (denoted by newu
or newz
).
x1 <- sample(1:8, 10, replace = TRUE)
x2 <- sample(1:6, 10, replace = TRUE)
x3 <- sample(1:7, 10, replace = TRUE)
newx <- cbind(x1, x2, x3)
The type of prediction can be chosen by the type
option: “link” is on the scale of linear predictors, “response” is on the scale of the response variable (inverse link function).
Predict from an ordPen
object:
round(predict(osm1, newx), digits = 3)
#> 1000 500 200 100 70 50 30 20 10 1
#> [1,] 0.261 0.301 0.404 0.539 0.632 0.733 0.911 1.068 1.351 2.107
#> [2,] 0.229 0.239 0.265 0.299 0.323 0.351 0.400 0.445 0.518 0.525
#> [3,] 0.289 0.352 0.511 0.706 0.828 0.954 1.151 1.305 1.547 2.049
#> [4,] 0.240 0.261 0.315 0.388 0.440 0.497 0.601 0.693 0.857 1.210
#> [5,] 0.211 0.203 0.178 0.137 0.106 0.070 0.007 -0.045 -0.113 -0.017
#> [6,] 0.232 0.244 0.273 0.306 0.325 0.343 0.365 0.375 0.377 0.297
#> [7,] 0.234 0.247 0.280 0.315 0.333 0.348 0.364 0.371 0.379 0.472
#> [8,] 0.272 0.320 0.441 0.588 0.680 0.772 0.910 1.007 1.124 1.094
#> [9,] 0.256 0.290 0.376 0.482 0.548 0.615 0.718 0.792 0.884 0.827
#> [10,] 0.246 0.272 0.333 0.405 0.447 0.486 0.537 0.563 0.572 0.402
round(predict(osl, newx), digits = 3)
#> 1000 500 200 100 70 50 30 20 10 1
#> [1,] 0.218 0.218 0.218 0.255 0.473 0.692 0.996 1.212 1.532 2.276
#> [2,] 0.218 0.218 0.218 0.159 0.220 0.315 0.434 0.505 0.566 0.483
#> [3,] 0.218 0.218 0.218 0.331 0.683 0.970 1.299 1.485 1.691 2.102
#> [4,] 0.218 0.218 0.218 0.206 0.329 0.483 0.712 0.875 1.092 1.292
#> [5,] 0.218 0.218 0.218 0.117 0.079 0.050 -0.043 -0.113 -0.160 0.053
#> [6,] 0.218 0.218 0.218 0.296 0.356 0.378 0.403 0.413 0.406 0.260
#> [7,] 0.218 0.218 0.218 0.372 0.440 0.416 0.375 0.355 0.359 0.490
#> [8,] 0.218 0.218 0.218 0.331 0.578 0.755 0.930 0.994 0.994 1.034
#> [9,] 0.218 0.218 0.218 0.296 0.478 0.622 0.781 0.856 0.896 0.767
#> [10,] 0.218 0.218 0.218 0.331 0.455 0.511 0.552 0.551 0.504 0.356
round(predict(ofu, newx), digits = 3)
#> 1000 500 200 100 70 50 30 20 10 1
#> [1,] 0.218 0.218 0.218 0.218 0.246 0.512 0.872 1.090 1.425 2.169
#> [2,] 0.218 0.218 0.218 0.218 0.190 0.069 0.183 0.303 0.572 0.538
#> [3,] 0.218 0.218 0.218 0.218 0.246 0.512 0.872 1.090 1.425 2.032
#> [4,] 0.218 0.218 0.218 0.218 0.190 0.137 0.329 0.455 0.774 1.295
#> [5,] 0.218 0.218 0.218 0.218 0.190 0.069 0.183 0.303 0.109 0.014
#> [6,] 0.218 0.218 0.218 0.218 0.246 0.440 0.519 0.545 0.488 0.315
#> [7,] 0.218 0.218 0.218 0.218 0.246 0.250 0.102 0.028 0.199 0.463
#> [8,] 0.218 0.218 0.218 0.218 0.246 0.512 0.785 0.850 0.872 1.005
#> [9,] 0.218 0.218 0.218 0.218 0.246 0.512 0.785 0.850 0.872 0.816
#> [10,] 0.218 0.218 0.218 0.218 0.246 0.440 0.519 0.545 0.488 0.344
ICF core set for chronic widespread Pain
Lastly, we demonstrate the three methods on a publicly available dataset suiting the high-dimensional setting they’re designed for. The ICF core set for chronic widespread pain (CWP) consists of 67 rating scales - called ICF categories - and a physical health component summary measure (phcs
).
Each ICF factor is associated with one of the following four types: ‘body functions’, ‘body structures’, ‘activities and participation’, and ‘environmental factors’. The latter are measured on a nine-point Likert scale ranging from −4 ‘complete barrier’ to +4 ‘complete facilitator’. All remaining factors are evaluated on a five-point Likert scale ranging from 0 ‘no problem’ to 4 ‘complete problem’. Type ?ICFCoreSetCWP
and see also references therein for further details.
data(ICFCoreSetCWP)
head(ICFCoreSetCWP)
#> b1602 b122 b126 b130 b134 b140 b147 b152 b164 b180 b260 b265 b270 b280 b430
#> 1 0 1 2 1 1 0 0 2 0 0 0 0 0 1 0
#> 2 3 2 2 3 3 2 3 3 3 1 2 2 2 2 0
#> 3 0 1 2 1 1 0 1 2 0 0 0 0 0 1 0
#> 4 0 0 0 2 1 0 0 0 0 0 0 0 0 3 0
#> 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 7 0 1 0 2 3 3 2 3 2 2 0 0 3 3 0
#> b455 b640 b710 b730 b735 b740 b760 b780 d160 d175 d220 d230 d240 d410 d415
#> 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0
#> 2 3 0 2 3 3 3 3 3 3 2 3 3 3 3 3
#> 3 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0
#> 4 3 0 0 0 0 3 0 1 0 0 0 1 0 0 2
#> 5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 7 3 2 0 3 3 3 0 2 2 2 3 2 0 3 3
#> d430 d450 d455 d470 d475 d510 d540 d570 d620 d640 d650 d660 d720 d760 d770
#> 1 1 0 2 0 1 0 0 0 1 0 0 0 0 0 0
#> 2 3 3 3 3 0 2 2 3 3 3 3 3 3 3 0
#> 3 1 0 2 0 1 0 0 0 0 0 0 0 0 0 0
#> 4 2 3 2 0 0 0 0 0 0 1 0 0 0 0 0
#> 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 7 3 1 3 1 3 1 1 1 1 3 2 0 1 0 3
#> d845 d850 d855 d910 d920 e1101 e310 e325 e355 e410 e420 e425 e430 e450 e455
#> 1 0 0 0 0 1 0 0 0 4 0 0 0 0 4 0
#> 2 0 0 2 2 2 2 3 2 3 3 3 3 3 3 3
#> 3 0 0 0 0 1 0 0 0 4 0 0 0 0 4 0
#> 4 0 0 0 0 0 0 2 1 2 0 -1 -1 0 2 2
#> 5 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0
#> 7 2 2 0 0 2 3 3 3 3 3 2 1 -1 2 2
#> e460 e465 e570 e575 e580 e590 s770 phcs
#> 1 0 0 3 3 4 3 0 44.33
#> 2 2 2 2 2 2 2 2 21.09
#> 3 0 0 3 3 4 0 0 41.74
#> 4 -1 0 0 2 2 1 1 33.96
#> 5 0 0 0 0 0 0 0 46.29
#> 7 -2 -2 1 1 1 1 1 29.89
Before estimation, we make sure that the ordinal design matrix is coded adequately:
y <- ICFCoreSetCWP$phcs
x <- ICFCoreSetCWP[, 1:67] + matrix(c(rep(1, 50), rep(5, 16), 1),
nrow(ICFCoreSetCWP), 67,
byrow = TRUE)
xnames <- names(x)
head(x)
#> b1602 b122 b126 b130 b134 b140 b147 b152 b164 b180 b260 b265 b270 b280 b430
#> 1 1 2 3 2 2 1 1 3 1 1 1 1 1 2 1
#> 2 4 3 3 4 4 3 4 4 4 2 3 3 3 3 1
#> 3 1 2 3 2 2 1 2 3 1 1 1 1 1 2 1
#> 4 1 1 1 3 2 1 1 1 1 1 1 1 1 4 1
#> 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 7 1 2 1 3 4 4 3 4 3 3 1 1 4 4 1
#> b455 b640 b710 b730 b735 b740 b760 b780 d160 d175 d220 d230 d240 d410 d415
#> 1 2 1 1 2 1 1 1 2 1 1 2 1 2 1 1
#> 2 4 1 3 4 4 4 4 4 4 3 4 4 4 4 4
#> 3 2 1 1 2 1 2 1 2 1 1 1 2 2 1 1
#> 4 4 1 1 1 1 4 1 2 1 1 1 2 1 1 3
#> 5 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 7 4 3 1 4 4 4 1 3 3 3 4 3 1 4 4
#> d430 d450 d455 d470 d475 d510 d540 d570 d620 d640 d650 d660 d720 d760 d770
#> 1 2 1 3 1 2 1 1 1 2 1 1 1 1 1 1
#> 2 4 4 4 4 1 3 3 4 4 4 4 4 4 4 1
#> 3 2 1 3 1 2 1 1 1 1 1 1 1 1 1 1
#> 4 3 4 3 1 1 1 1 1 1 2 1 1 1 1 1
#> 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 7 4 2 4 2 4 2 2 2 2 4 3 1 2 1 4
#> d845 d850 d855 d910 d920 e1101 e310 e325 e355 e410 e420 e425 e430 e450 e455
#> 1 1 1 1 1 2 5 5 5 9 5 5 5 5 9 5
#> 2 1 1 3 3 3 7 8 7 8 8 8 8 8 8 8
#> 3 1 1 1 1 2 5 5 5 9 5 5 5 5 9 5
#> 4 1 1 1 1 1 5 7 6 7 5 4 4 5 7 7
#> 5 1 1 1 1 1 5 6 5 6 5 5 5 5 6 5
#> 7 3 3 1 1 3 8 8 8 8 8 7 6 4 7 7
#> e460 e465 e570 e575 e580 e590 s770
#> 1 5 5 8 8 9 8 1
#> 2 7 7 7 7 7 7 3
#> 3 5 5 8 8 9 5 1
#> 4 4 5 5 7 7 6 2
#> 5 5 5 5 5 5 5 1
#> 7 3 3 6 6 6 6 2
We can check whether all categories are observed at least once as follows:
rbind(apply(x, 2, min), apply(x, 2, max))
#> b1602 b122 b126 b130 b134 b140 b147 b152 b164 b180 b260 b265 b270 b280
#> [1,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2,] 4 5 4 5 5 5 5 5 5 4 4 5 5 5
#> b430 b455 b640 b710 b730 b735 b740 b760 b780 d160 d175 d220 d230 d240 d410
#> [1,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2,] 4 5 5 5 5 5 5 4 5 4 5 5 5 5 5
#> d415 d430 d450 d455 d470 d475 d510 d540 d570 d620 d640 d650 d660 d720 d760
#> [1,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2,] 4 5 4 5 5 5 4 4 4 5 5 5 5 4 5
#> d770 d845 d850 d855 d910 d920 e1101 e310 e325 e355 e410 e420 e425 e430
#> [1,] 1 1 1 1 1 1 2 1 1 2 1 1 1 1
#> [2,] 5 5 5 5 5 5 9 9 9 9 9 9 9 9
#> e450 e455 e460 e465 e570 e575 e580 e590 s770
#> [1,] 1 2 1 1 1 1 1 1 1
#> [2,] 9 9 9 9 9 9 9 9 5
As for some covariates not all possible levels are observed at least once, the easiest way is to add a corresponding row to x with corresponding y value being NA
to ensure a corresponding coefficient is to be fitted.
x <- rbind(x, rep(1,67))
x <- rbind(x, c(rep(5, 50), rep(9,16), 5))
y <- c(y, NA, NA)
osm_icf <- ordSmooth(x = x, y = y, lambda = lambda)
osl_icf <- ordSelect(x = x, y = y, lambda = lambda)
#> Setting update.every = length(lambda) + 1
#> Lambda: 1000 nr.var: 20
#> Lambda: 500 nr.var: 55
#> Lambda: 200 nr.var: 160
#> Lambda: 100 nr.var: 268
#> Lambda: 70 nr.var: 279
#> Lambda: 50 nr.var: 302
#> Lambda: 30 nr.var: 305
#> Lambda: 20 nr.var: 313
#> Lambda: 10 nr.var: 317
#> Lambda: 1 nr.var: 317
ofu_icf <- ordFusion(x = x, y = y, lambda = lambda)
Let us illustrate the fitted coefficients for some covariates for different values of \(\lambda\) (larger values being associated with the darker curves). We can use axis()
to replace the integers 1,2,… used for fitting with the original ICF levels:
wx <- which(xnames=="b1602"|xnames=="d230"|xnames=="d430"|xnames=="d455"|xnames=="e1101")
xmain <- c()
xmain[wx] <- c("Content of thought",
"Carrying out daily routine",
"Lifting and carrying objects",
"Moving around",
"Drugs")
par(mar = c(4.1, 4.1, 3.1, 1.1))
for(i in wx){
plot(osm_icf, whx = i, main = "", xaxt = "n")
axis(1, at = 1:length(osm_icf$xlevels),
labels = ((1:length(osm_icf$xlevels)) - c(rep(1,50), rep(5,16), 1)[i]))
plot(osl_icf, whx = i, main = xmain[i], xaxt = "n")
axis(1, at = 1:length(osm_icf$xlevels),
labels = ((1:length(osm_icf$xlevels)) - c(rep(1,50), rep(5,16), 1)[i]))
plot(ofu_icf, whx = i, main = "", xaxt = "n")
axis(1, at = 1:length(osm_icf$xlevels),
labels = ((1:length(osm_icf$xlevels)) - c(rep(1,50), rep(5,16), 1)[i]))
}