The StepReg package, developed for exploratory model building, supports model construction for several response variable types, including continuous (linear regression), binary (logistic regression), and time-to-event (Cox regression), among others. StepReg implements the commonly used model selection strategies: forward selection, backward elimination, bidirectional elimination, and best subsets. It is also flexible in the choice of selection metric, accommodating both information criteria (AIC, BIC, etc.) and significance level cutoffs. This vignette provides examples of using StepReg for model development in different contexts, and discusses considerations for selecting appropriate strategies and metrics so that users can make informed decisions throughout the modeling process.
StepReg 1.5.2
Model selection is the process of choosing the most relevant features from a set of candidate variables. This procedure matters because it aims to keep the final model accurate and interpretable while remaining computationally efficient and limiting overfitting. Stepwise regression algorithms iteratively add or remove features from the model based on a chosen criterion (e.g., a significance level applied to P-values, or an information criterion such as AIC or BIC). The process continues until no further improvement can be made according to that criterion. At the end of the stepwise procedure, you’ll have a final model that includes the selected features and their coefficients.
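The add-one-feature-at-a-time loop described above can be sketched in a few lines of base R. This is a minimal illustration of forward selection by AIC, not StepReg's implementation: the helper names (make_formula, fit_aic) are hypothetical, and base R's AIC() may differ from the AIC reported by other software by additive constants, so the printed values need not match those shown elsewhere in this vignette.

```r
# Forward selection by AIC using only base R (illustrative sketch).
data(mtcars)

response   <- "mpg"
candidates <- setdiff(names(mtcars), response)
selected   <- character(0)

# Hypothetical helper: build "mpg ~ 1" or "mpg ~ x1 + x2 + ...".
make_formula <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else vars
  reformulate(rhs, response = response)
}

# Hypothetical helper: AIC of the linear model with the given predictors.
fit_aic <- function(vars) {
  AIC(lm(make_formula(vars), data = mtcars))
}

null_aic    <- fit_aic(selected)  # intercept-only model
current_aic <- null_aic

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break

  # AIC of every model that adds exactly one more candidate.
  trial_aic <- vapply(remaining,
                      function(v) fit_aic(c(selected, v)),
                      numeric(1))

  # Stop when no single addition lowers the AIC.
  if (min(trial_aic) >= current_aic) break

  best        <- names(trial_aic)[which.min(trial_aic)]
  selected    <- c(selected, best)
  current_aic <- min(trial_aic)
}

final_model <- lm(make_formula(selected), data = mtcars)
print(selected)
print(coef(final_model))
```

Backward elimination follows the same pattern with the loop reversed (start from all candidates and drop the variable whose removal lowers the criterion most), and a bidirectional strategy alternates between the two kinds of step.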
StepReg simplifies model selection tasks by providing a unified programming interface. It currently supports model building for five distinct response variable types (section 3.1), four model selection strategies (section 3.2), including the best subsets algorithm, and a variety of selection metrics (section 3.3). Moreover, StepReg detects and addresses multicollinearity issues if they exist (section 3.4). The output of StepReg includes multiple tables summarizing the final model and the variable selection procedure. Additionally, StepReg offers a plot function to visualize the selection steps (section 4). For demonstration, this vignette includes four use cases covering distinct regression scenarios (section 5). Non-programmers can access the tool through the interactive Shiny app detailed in section 6.
The following example selects an optimal linear regression model with the mtcars dataset.
library(StepReg)
data(mtcars)
formula <- mpg ~ .
res <- stepwise(formula = formula,
                data = mtcars,
                type = "linear",
                include = c("qsec"),
                strategy = "bidirection",
                metric = c("AIC"))
Breakdown of the parameters:
formula
: specifies the dependent and independent variables
type
: specifies the regression category; depending on your data, choose from “linear”, “logit”, “cox”, etc.
include
: specifies the variables that must be in the final model
strategy
: specifies the model selection strategy; choose from “forward”, “backward”, “bidirection”, “subset”
metric
: specifies the model fit evaluation metric; choose one or more from “AIC”, “AICc”, “BIC”, “SL”, etc.

The output consists of multiple tables, which can be viewed with:
res
Table 1. Summary of arguments for model selection
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Parameter                        Value
—————————————————————————————————————————————
included variable                qsec
strategy                         bidirection
metric                           AIC
tolerance of multicollinearity   1e-07
multicollinearity variable       NULL
intercept                        1
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Table 2. Summary of variables in dataset
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Variable_type   Variable_name   Variable_class
——————————————————————————————————————————————
Dependent       mpg             numeric
Independent     cyl             numeric
Independent     disp            numeric
Independent     hp              numeric
Independent     drat            numeric
Independent     wt              numeric
Independent     qsec            numeric
Independent     vs              numeric
Independent     am              numeric
Independent     gear            numeric
Independent     carb            numeric
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Table 3. Summary of selection process under bidirection with AIC
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Step  EffectEntered  EffectRemoved  NumberParams  AIC
——————————————————————————————————————————————————————————————
1     1                             1             149.94345
2     qsec                          2             145.776054
3     wt                            3             97.90843
4     am                            4             95.307305
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Table 4. Summary of coefficients for the selected model with mpg under bidirection and AIC
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
Variable     Estimate   Std. Error  t value    Pr(>|t|)
—————————————————————————————————————————————————————————
(Intercept)  9.617781   6.959593    1.381946   0.177915
qsec         1.225886   0.28867     4.246676   0.000216
wt           -3.916504  0.711202    -5.506882  7e-06
am           2.935837   1.410905    2.080819   0.046716
‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗‗
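As a sanity check, the coefficient table of the selected model can be reproduced with base R alone. The sketch below assumes the selected predictors are qsec, wt, and am, as reported in Table 4; the object names check_fit and check_coef are illustrative and not part of StepReg.

```r
# Refit the selected model with base R and inspect its coefficient table.
data(mtcars)

check_fit  <- lm(mpg ~ qsec + wt + am, data = mtcars)

# Estimate, Std. Error, t value and Pr(>|t|), rounded to six decimals
# for comparison with Table 4.
check_coef <- round(summary(check_fit)$coefficients, 6)
print(check_coef)
```

The estimates should agree with Table 4 up to rounding, since both describe an ordinary least squares fit of the same formula to the same data.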
You can also visualize the variable selection procedures with:
plot(res)
$bidirection
$bidirection$detail
$bidirection$summary