smallsets: Visual Documentation for Data Preprocessing

Data practitioners regularly use the 'R' and 'Python' programming languages to prepare data for analyses, so important data preprocessing decisions end up encoded in 'R' and 'Python' code. The 'smallsets' package decodes these decisions into a Smallset Timeline, a static, compact visualisation of data preprocessing decisions (Lucchesi et al. (2022) <doi:10.1145/3531146.3533175>). The visualisation consists of small data snapshots taken at different preprocessing steps. The 'smallsets' package builds this visualisation from a user's dataset and from preprocessing code located in an 'R', 'R Markdown', 'Python', or 'Jupyter Notebook' file. Users only need to add structured comments containing snapshot instructions to their preprocessing code. One optional feature in 'smallsets' requires installation of the 'Gurobi' optimisation software and the 'gurobi' 'R' package, available from <>. More information about the optional feature and 'gurobi' installation can be found in the 'smallsets' vignette.

Version: 2.0.0
Depends: R (≥ 3.5.0)
Imports: callr, colorspace, flextable, ggplot2, ggtext, knitr, patchwork, plotrix, reticulate, rmarkdown
Suggests: gurobi, testthat (≥ 3.0.0)
Published: 2023-12-05
DOI: 10.32614/CRAN.package.smallsets
Author: Lydia R. Lucchesi ORCID iD [aut, cre], Petra M. Kuhnert [ths], Jenny L. Davis [ths], Lexing Xie [ths]
Maintainer: Lydia R. Lucchesi <Lydia.Lucchesi at>
License: GPL (≥ 3)
NeedsCompilation: no
Materials: README NEWS
CRAN checks: smallsets results


Reference manual: smallsets.pdf
Vignettes: smallsets User Guide


Package source: smallsets_2.0.0.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
macOS binaries: r-release (arm64): smallsets_2.0.0.tgz, r-oldrel (arm64): smallsets_2.0.0.tgz, r-release (x86_64): smallsets_2.0.0.tgz, r-oldrel (x86_64): smallsets_2.0.0.tgz
Old sources: smallsets archive

