The package can be installed from CRAN:

install.packages("ldatuning")

or from the GitHub repository (development version):

install.packages("devtools")
devtools::install_github("nikita-moor/ldatuning")

The ldatuning package implements four metrics for selecting a suitable number of topics for an LDA model.

library("ldatuning")

Load the “AssociatedPress” dataset from the topicmodels package and keep the first 10 documents so that the example runs quickly.

library("topicmodels")
data("AssociatedPress", package="topicmodels")
dtm <- AssociatedPress[1:10, ]

The easiest way is to calculate all metrics at once. All existing methods require training multiple LDA models in order to select the one with the best performance. This is a computationally intensive procedure, so ldatuning runs it in parallel; do not forget to set the mc.cores parameter to the correct number of CPU cores to achieve the best performance.

All standard LDA methods and parameters from the topicmodels package can be set with method and control.
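One way to choose a value for the number of cores is to ask R how many are available. The sketch below is only an illustration: it uses detectCores() from the base parallel package, and the variable name n_cores and the choice to leave one core free are assumptions, not something ldatuning requires.

```r
# Ask the operating system how many cores R can see.
# detectCores() may return NA when the count is unknown.
available <- parallel::detectCores()

# Leave one core free for the rest of the system, but never go below 1.
n_cores <- if (is.na(available)) 1L else max(1L, as.integer(available) - 1L)

print(n_cores)
```

The resulting n_cores could then be passed as the number of cores instead of a hard-coded value.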

result <- FindTopicsNumber(
  dtm,
  topics = seq(from = 2, to = 15, by = 1),
  metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
  method = "Gibbs",
  control = list(seed = 77),
  mc.cores = 2L,
  verbose = TRUE
)
## fit models... done.
## calculate metrics:
##   Griffiths2004... done.
##   CaoJuan2009... done.
##   Arun2010... done.
##   Deveaud2014... done.

The result is a table with the number of topics and the corresponding metric values:

topics Griffiths2004 CaoJuan2009 Arun2010 Deveaud2014
    15     -15297.82   0.5047240 15.92711   0.1362596
    14     -15338.24   0.4927860 15.36552   0.1406462
    13     -15319.82   0.4944709 15.80569   0.1504368
    12     -15326.94   0.4756351 15.81278   0.1594651
    11     -15293.55   0.4347111 15.23313   0.1770861
    10     -15291.00   0.3829542 14.93706   0.1969989
     9     -15303.87   0.3379840 14.71664   0.2181424
     8     -15256.30   0.3061726 14.78140   0.2435689
     7     -15259.80   0.2746812 14.82908   0.2746203
     6     -15251.04   0.2612029 15.28425   0.3101625
     5     -15226.91   0.1875260 15.34470   0.3718687
     4     -15242.86   0.1779016 16.29708   0.4323482
     3     -15266.66   0.1600736 16.97832   0.5318997
     2     -15349.79   0.1169522 18.47430   0.6989189

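A simple approach to analyzing the metrics is to find their extrema: the minimum for Arun2010 and CaoJuan2009, and the maximum for Deveaud2014 and Griffiths2004. A more complete description is given in the corresponding papers.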
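The extremum search can be sketched in a few lines of base R. This is a minimal illustration, not part of the package: the data frame is typed in by hand from the result table above, and the maximize vector and best_topics name are assumptions made for the example. It assumes that Griffiths2004 and Deveaud2014 are to be maximized and that CaoJuan2009 and Arun2010 are to be minimized.

```r
# Metric values copied from the result table above (10-document subset).
result <- data.frame(
  topics        = 15:2,
  Griffiths2004 = c(-15297.82, -15338.24, -15319.82, -15326.94, -15293.55,
                    -15291.00, -15303.87, -15256.30, -15259.80, -15251.04,
                    -15226.91, -15242.86, -15266.66, -15349.79),
  CaoJuan2009   = c(0.5047240, 0.4927860, 0.4944709, 0.4756351, 0.4347111,
                    0.3829542, 0.3379840, 0.3061726, 0.2746812, 0.2612029,
                    0.1875260, 0.1779016, 0.1600736, 0.1169522),
  Arun2010      = c(15.92711, 15.36552, 15.80569, 15.81278, 15.23313,
                    14.93706, 14.71664, 14.78140, 14.82908, 15.28425,
                    15.34470, 16.29708, 16.97832, 18.47430),
  Deveaud2014   = c(0.1362596, 0.1406462, 0.1504368, 0.1594651, 0.1770861,
                    0.1969989, 0.2181424, 0.2435689, 0.2746203, 0.3101625,
                    0.3718687, 0.4323482, 0.5318997, 0.6989189)
)

# Assumed search direction per metric: TRUE = larger is better.
maximize <- c(Griffiths2004 = TRUE, CaoJuan2009 = FALSE,
              Arun2010 = FALSE, Deveaud2014 = TRUE)

# For each metric, return the number of topics at its extremum.
best_topics <- sapply(names(maximize), function(metric) {
  values <- result[[metric]]
  idx <- if (maximize[[metric]]) which.max(values) else which.min(values)
  result$topics[idx]
})

print(best_topics)
```

For the table above this gives 5 topics for Griffiths2004, 2 for CaoJuan2009, 9 for Arun2010 and 2 for Deveaud2014. The metrics disagree here, which is not surprising for a 10-document subset.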

The helper function FindTopicsNumber_plot can be used for easy analysis of the results:

FindTopicsNumber_plot(result)
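The metrics live on very different scales (compare Griffiths2004 with CaoJuan2009 in the table above), so comparing them side by side by hand requires bringing them to a common range first. The sketch below shows plain min-max rescaling to [0, 1] on a few rows copied from the table. It is an illustration only and is not presented as the package's plotting code; rescale01, sample_result and scaled are hypothetical names.

```r
# Min-max rescaling: the smallest value maps to 0, the largest to 1.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# A few rows copied from the result table above.
sample_result <- data.frame(
  topics        = c(2, 5, 9, 15),
  Griffiths2004 = c(-15349.79, -15226.91, -15303.87, -15297.82),
  CaoJuan2009   = c(0.1169522, 0.1875260, 0.3379840, 0.5047240)
)

# Rescale every metric column, leaving the topics column untouched.
scaled <- sample_result
scaled[-1] <- lapply(sample_result[-1], rescale01)

print(scaled)
```

After rescaling, each metric column spans exactly 0 to 1, so curves for different metrics can share one axis.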

Results calculated on the whole dataset (about 10 hours on a quad-core computer) look like this: