selectr

The selectr package for R parses CSS3 Selectors and translates them to XPath 1.0 expressions. It is an R port of the cssselect package for Python. Development occurs on GitHub in the selectr repository.

The main purpose of this package is to make working with (X)HTML and XML documents easier within R. The XML and xml2 packages are typically used when working with these documents, but they can only select content based on XPath expressions. By translating CSS selectors to XPath, we can use the more familiar CSS selectors instead of XPath.

Download

To use selectr, all that is necessary is to run the following command as selectr is available on CRAN:

install.packages("selectr")

Usage

The most basic and flexible function provided by selectr is css_to_xpath(). It simply translates a vector of CSS Selectors to their equivalent XPath expressions.

> library(selectr)
> css_to_xpath("div > a")
[1] "descendant-or-self::div/a"
> css_to_xpath("div:nth-child(2) > a")
[1] "descendant-or-self::div[count(preceding-sibling::*) = 1]/a"

A common task is to search for matching nodes within a document. selectr makes this task easier by mimicking the behaviour of DOM methods present in JavaScript (querySelector() and querySelectorAll()).

> fileName <- system.file("exampleData", "test.xml", package="XML")
> mydoc <- xmlParse(fileName)
> querySelector(mydoc, "a")
<a>
  <!-- A comment -->
  <b> 
    %extEnt;
  </b>
</a> 
> querySelectorAll(mydoc, "code")
[[1]]
<code>
xmlTreeParse("test.xml", replaceEntities = TRUE)
</code> 

[[2]]
<code>
xmlTreeParse("test.xml")
</code> 

attr(,"class")
[1] "XMLNodeSet"

Further Information

A technical report has been created that describes this package in more detail. It also contains examples on how you would use selectr.

The package documentation also goes into more detail on the usage of each particular function.