selenium

selenium is a tool for the automation of web browsers. It is a low-level interface to the WebDriver specification, and an up-to-date alternative to RSelenium.

Installation

# Install selenium from CRAN
install.packages("selenium")

# Or the development version from GitHub
# install.packages("pak")
pak::pak("ashbythorpe/selenium-r")

You must also have a Selenium server installed and running (see below).

Starting the server

A selenium instance consists of two parts: the client and the server. The selenium package only provides the client. This means that you have to start the server yourself.

To do this, you need Java installed (the server is distributed as a .jar file) and a browser to automate.

There are many different ways to download and start the server, one of which is provided by selenium:

library(selenium)
server <- selenium_server()

This will download the latest version of the server and start it.

By default, the server file is stored in a temporary directory, so it is deleted when the R session ends. To keep the file between sessions, so that it does not have to be downloaded again each time, set the temp argument to FALSE:

server <- selenium_server(temp = FALSE)
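The server needs a moment before it accepts connections, so a script may want to wait before creating a client. The following is a minimal sketch, not the package's recommended workflow. It assumes that selenium_server_available() is exported by your installed version of selenium and that the server object supports $kill() (as a processx process does). wait_until() is a hypothetical helper defined here, not part of the package.

```r
# Hypothetical helper: call `check()` repeatedly until it returns TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(check, timeout = 60, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) {
      return(TRUE)
    }
    if (Sys.time() >= deadline) {
      return(FALSE)
    }
    Sys.sleep(interval)
  }
}

# Only run in an interactive session, since this downloads and starts a server.
if (interactive()) {
  library(selenium)

  server <- selenium_server(temp = FALSE)

  # Assumption: selenium_server_available() returns TRUE once the server responds.
  ready <- wait_until(function() selenium_server_available(), timeout = 60)
  if (!ready) {
    stop("The Selenium server did not become available within 60 seconds.")
  }

  # ... start client sessions here ...

  # Stop the server process when finished.
  server$kill()
}
```

Keeping the polling logic separate from the selenium call means the same helper can wrap any other readiness check.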

You can also download and start the server manually:

  1. Download the latest .jar file for Selenium Server. Do this by navigating to the latest GitHub release page (https://github.com/SeleniumHQ/selenium/releases/latest/), scrolling down to the Assets section, and downloading the file named selenium-server-<VERSION>.jar (with <VERSION> being the latest release version).
  2. Make sure you are in the same directory as the file you downloaded.
  3. In the terminal, run java -jar selenium-server-<VERSION>.jar standalone --selenium-manager true, replacing <VERSION> with the version number that you downloaded. This will download any drivers the server needs to communicate with the browser, and then start the server.
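If you prefer to stay in R, the manual steps above can be scripted with base R. This is a rough sketch under stated assumptions: the .jar file has already been downloaded, java is on your PATH, and find_server_jar() and server_args() are hypothetical helpers written for this example, not functions from selenium.

```r
# Hypothetical helper: locate a downloaded Selenium Server .jar file in `dir`.
# If several files match, the first in alphabetical order is used.
find_server_jar <- function(dir = ".") {
  jars <- list.files(dir, pattern = "^selenium-server.*\\.jar$", full.names = TRUE)
  if (length(jars) == 0) {
    stop("No Selenium Server .jar file found in ", dir)
  }
  jars[[1]]
}

# Hypothetical helper: build the arguments passed to `java`,
# mirroring the terminal command in step 3.
server_args <- function(jar) {
  c("-jar", shQuote(jar), "standalone", "--selenium-manager", "true")
}

# Only run interactively, since this launches a long-running process.
if (interactive()) {
  jar <- find_server_jar()
  # wait = FALSE returns immediately and leaves the server running in the background.
  system2("java", server_args(jar), wait = FALSE)
}
```

A server started this way is not tied to an R object, so it has to be stopped from outside R (for example, by ending the java process).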

There are a few other ways of starting Selenium Server:

Starting the client

Once a server instance has started, move to R and load selenium.

library(selenium)

Client sessions can be started using SeleniumSession$new():

session <- SeleniumSession$new()

By default, this will connect to Firefox, but you can use the browser argument to specify a different browser if you like.

session <- SeleniumSession$new(browser = "chrome")

You can also use the capabilities argument to specify options for the browser. Here, the remote-debugging-port argument to Chrome is used to make sure that the port the browser uses does not conflict with any others (this may be necessary if Chrome is not working by default).

session <- SeleniumSession$new(
  browser = "chrome",
  capabilities = list(
    `goog:chromeOptions` = list(
      args = list("remote-debugging-port=9222")
    )
  )
)
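If you pass several Chrome arguments, it can be convenient to build the nested capabilities list in one place. The sketch below assumes a hypothetical helper, chrome_capabilities(), which is not part of selenium. The "headless" argument is included purely as an illustration of a second Chrome switch.

```r
# Hypothetical helper: wrap Chrome command-line arguments in the nested
# list structure used for the `goog:chromeOptions` capability.
chrome_capabilities <- function(...) {
  list(
    `goog:chromeOptions` = list(
      args = list(...)
    )
  )
}

caps <- chrome_capabilities("remote-debugging-port=9222", "headless")

# Inspect the structure that will be sent to the server.
str(caps)

# Only run interactively, with a Selenium server already running.
if (interactive()) {
  session <- selenium::SeleniumSession$new(
    browser = "chrome",
    capabilities = caps
  )
  session$close()
}
```

The helper only assembles a list, so you can check its output before any browser is involved.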

If this doesn’t work, please see the Debugging Selenium article for more information.

Usage

Once the session has been successfully started, you can use the session object to control the browser. Here, we dynamically navigate through the R project homepage.

session$navigate("https://www.r-project.org/")
session$
  find_element(using = "css selector", value = ".row")$
  find_element(using = "css selector", value = "ul")$
  find_element(using = "css selector", value = "a")$
  click()

session$
  find_element(using = "css selector", value = ".row")$
  find_elements(using = "css selector", value = "div")[[2]]$
  find_element(using = "css selector", value = "p")$
  get_text()
#> [1] ""

session$close()
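If an error occurs partway through a script, session$close() may never be reached and the browser is left open. One way to guard against this is to wrap the work in a function that always closes the session. This is a minimal sketch: with_session() is a hypothetical helper, not part of selenium, and the CSS selector in the usage example is only illustrative.

```r
# Hypothetical helper: create a session, run `action(session)`, and always
# close the session afterwards, even if `action` throws an error.
with_session <- function(action,
                         new_session = function() selenium::SeleniumSession$new()) {
  session <- new_session()
  # on.exit() expressions run when the function exits, normally or via an error.
  on.exit(session$close(), add = TRUE)
  action(session)
}

# Only run interactively, with a Selenium server already running.
if (interactive()) {
  first_link_text <- with_session(function(session) {
    session$navigate("https://www.r-project.org/")
    session$
      find_element(using = "css selector", value = "a")$
      get_text()
  })
  print(first_link_text)
}
```

The new_session argument exists so that a different browser, or a stand-in object for testing, can be supplied without changing the helper.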

For a more detailed introduction to using selenium, see the Getting Started article.

Note that selenium is low-level and mainly aimed at developers. If you want to use browser automation for web scraping or testing, you may want to take a look at selenider instead.