Title: | Formal Parser and Related Tools for R Markdown Documents |
---|---|
Description: | An implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree. |
Authors: | Colin Rundel [aut, cre] |
Maintainer: | Colin Rundel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0 |
Built: | 2024-11-12 06:12:53 UTC |
Source: | https://github.com/rundel/parsermd |
Convert an object into an rmd_ast.
Currently only supports conversion of rmd_tibble objects back to rmd_ast.
as_ast(x, ...)
x | Object to convert |
... | Unused, for extensibility. |
Returns an rmd_ast object.
parse_rmd(system.file("examples/hw01.Rmd", package="parsermd")) %>% as_tibble() %>% as_ast()
Convert an rmd_ast, rmd_tibble, or any ast node into text.
as_document(x, padding = "", collapse = NULL, ...)
x | |
padding | Padding to add between nodes when assembling the text. |
collapse | If not |
... | Passed to |
Returns a character vector.
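As a minimal usage sketch, assuming the parsermd package is installed and using the bundled examples/minimal.Rmd file that appears elsewhere in this manual, the following parses a document and converts the resulting ast back into text. Only the default arguments are used, and the variable names are illustrative.

```r
library(parsermd)

# Parse the example document shipped with the package into an rmd_ast.
rmd <- parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))

# Convert the ast back into text; the result is documented as a character vector.
doc <- as_document(rmd)

# Print the reconstructed document, one element per line.
cat(doc, sep = "\n")
```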
Helper functions for obtaining or changing chunk options within an rmd object.
rmd_set_options(x, ...)
rmd_get_options(x, ..., defaults = list())
x | An |
... | Either a collection of named values for the setter or character values of the option names for the getter. |
defaults | A named list of default values for the options. |
rmd_set_options returns the modified version of the original object.
rmd_get_options returns a list of the requested options (or all options if none are specified). Non-chunk nodes return NULL.
rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
str(rmd_get_options(rmd))
str(rmd_get_options(rmd, "include"))
rmd_set_options(rmd, include = TRUE)
Recursively searches a directory for R Markdown or Quarto documents and parses them into a collection of rmd_ast objects.
parse_qmd_collection(
  dir = "./",
  pattern = "*.qmd",
  all = FALSE,
  recurse = TRUE,
  regex = FALSE
)
parse_rmd_collection(
  dir = "./",
  pattern = "*.Rmd",
  all = FALSE,
  recurse = TRUE,
  regex = FALSE
)
dir | Directory to search |
pattern | Pattern to match files, defaults to glob syntax |
all | Search includes hidden files |
recurse | Search recursively within |
regex | Treat |
Returns a tibble object with columns for document name, path, and ast.
parse_rmd_collection(system.file("examples/", package="parsermd"))
Documents are parsed into an rmd_ast object.
parse_rmd(rmd, allow_incomplete = FALSE)
parse_qmd(qmd, allow_incomplete = FALSE)
rmd | Either the path to an |
allow_incomplete | Allow incomplete parsing of the document. |
qmd | Either the path to an |
Returns an rmd_ast object.
parse_rmd(system.file("examples/hw01.Rmd", package="parsermd"))
Render parsermd objects.
Object contents are converted to a character vector and written to a temporary directory before rendering via quarto::quarto_render() or rmarkdown::render().
Note that this function has the potential to overwrite existing output files (e.g. .html, .pdf, etc).
render(x, name = NULL, ..., engine = c("quarto", "rmarkdown"))
x | Object to render, e.g. a |
name | Name of the output file, if not given it will be inferred from the name of |
... | Any additional arguments to be passed to |
engine | The rendering engine to use, either "quarto" or "rmarkdown". |
Returns the results of the render function.
Functions for adding nodes to the beginning or end of an ast.
rmd_ast_append(x, ...)
rmd_ast_prepend(x, ...)
x | An object containing an |
... | A collection of ast nodes to append or prepend. |
Returns an object of the same class as x.
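A short usage sketch, assuming parsermd is installed and that the bundled examples/minimal.Rmd parses as in the other examples of this manual. The heading name "Appendix" is made up for illustration. rmd_ast_prepend() takes the same arguments and is not shown here.

```r
library(parsermd)

rmd <- parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))

# Append a new level 1 heading node to the end of the ast.
# "Appendix" is a hypothetical heading name, not part of the example file.
appended <- rmd_ast_append(rmd, rmd_heading(name = "Appendix", level = 1L))

# Compare node types before and after to see the extra node at the end.
rmd_node_type(rmd)
rmd_node_type(appended)
```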
This function compares the provided Rmd against a template and reports on discrepancies (e.g. missing or unmodified components).
rmd_check_template(rmd, template, ...)
rmd | The rmd to be checked, can be an |
template | |
... | Unused, for extensibility. |
Invisibly returns TRUE if the rmd matches the template, FALSE otherwise.
tmpl = parse_rmd(system.file("examples/hw01.Rmd", package = "parsermd")) %>%
  rmd_select(by_section(c("Exercise *", "Solution"))) %>%
  rmd_template(keep_content = TRUE)
rmd_check_template(
  system.file("examples/hw01-student.Rmd", package = "parsermd"),
  tmpl
)
Functions for creating ast nodes:
rmd_ast() - Create an ast container of nodes
rmd_yaml() - Create a yaml node
rmd_heading() - Create a heading node
rmd_code_block() - Create a markdown code block node
rmd_chunk() - Create a chunk node
rmd_raw_chunk() - Create a raw chunk node
rmd_fenced_div_open() - Create a fenced div open node
rmd_fenced_div_close() - Create a fenced div close node
rmd_markdown() - Create a markdown container node of rmd_markdown_line nodes
rmd_markdown_line() - Create a markdown line node
rmd_inline_code() - Create an inline code node
rmd_shortcode() - Create a shortcode node
rmd_ast(...)
rmd_yaml(...)
rmd_heading(name, level)
rmd_code_block(attr = "", code = character(), indent = "", n_ticks = 3L)
rmd_chunk(
  name = NULL,
  engine = "r",
  options = list(),
  yaml_options = list(),
  code = character(),
  indent = "",
  n_ticks = 3L
)
rmd_raw_chunk(format, code = character(), indent = "", n_ticks = 3L)
rmd_fenced_div_open(attr = character())
rmd_fenced_div_close()
rmd_markdown(...)
rmd_markdown_line(...)
rmd_inline_code(engine = "", code = "")
rmd_shortcode(func, args = character())
... | Elements within the node. |
name | Character. Heading or chunk name. |
level | Integer. Heading level (1-6). |
attr | Character. Attributes for code block or fenced div. |
code | Character. Code lines for code block or chunk. |
indent | Character. Indentation for code block or chunk. |
n_ticks | Integer. Number of backticks for code block or chunk. |
engine | Character. Language engine for chunk or inline code. |
options | List. Chunk options. |
yaml_options | List. Chunk yaml options. |
format | Character. Format for raw chunk. |
func | Character. Shortcode function name. |
args | Character. Shortcode arguments. |
Returns an object with class matching the function name, e.g. rmd_ast() returns an rmd_ast object.
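The constructors above can be combined to build a small document by hand. The following is a sketch under the assumption that parsermd is installed and that rmd_ast() accepts node objects directly through its ... argument. The heading text, chunk name, and chunk code are made up for illustration.

```r
library(parsermd)

# Assemble a two-node ast: a level 1 heading followed by an R chunk.
# All names and code below are hypothetical example values.
ast <- rmd_ast(
  rmd_heading(name = "Analysis", level = 1L),
  rmd_chunk(name = "sim", engine = "r", code = c("x <- rnorm(10)", "mean(x)"))
)

# The class should match the constructor name.
class(ast)

# Convert the hand-built ast to text to inspect the resulting document.
cat(as_document(ast), sep = "\n")
```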
Functions for extracting information from Rmd nodes.
rmd_node_label(x, ...)
rmd_node_type(x, ...)
rmd_node_length(x, ...)
rmd_node_content(x, ...)
rmd_node_attr(x, attr, ...)
rmd_node_engine(x, ...)
rmd_node_options(x, ...)
rmd_node_code(x, ...)
x | An rmd object, e.g. |
... | Unused, for extensibility. |
attr | Attribute name to extract. |
rmd_node_label() - returns a character vector of node labels, nodes without labels return NA.
rmd_node_type() - returns a character vector of node types.
rmd_node_length() - returns an integer vector of node lengths (i.e. lines of code, lines of text, etc.), nodes without a length return NA.
rmd_node_content() - returns a character vector of node textual content, nodes without content return NA.
rmd_node_attr() - returns a list of node attribute values.
rmd_node_engine() - returns a character vector of chunk engines, NA for all other node types.
rmd_node_options() - returns a list of chunk node options (named list), NULL for all other node types.
rmd_node_code() - returns a list of chunk node code (character vector), NULL for all other node types.
rmd = parse_rmd(system.file("examples/hw01.Rmd", package="parsermd"))
rmd_node_label(rmd)
rmd_node_type(rmd)
rmd_node_content(rmd)
rmd_node_attr(rmd, "level")
rmd_node_engine(rmd)
rmd_node_options(rmd)
rmd_node_code(rmd)
Uses the section headings of an rmd object to identify the hierarchical structure of the document.
rmd_node_sections(x, levels = 1:6, drop_na = FALSE)
x | An rmd object, e.g. |
levels | Limit which section heading levels to return. |
drop_na | Should |
A list of section names for each node.
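A brief usage sketch, assuming parsermd is installed and using the bundled examples/hw01.Rmd file from the other examples in this manual. It relies only on the documented arguments; the variable names are illustrative.

```r
library(parsermd)

rmd <- parse_rmd(system.file("examples/hw01.Rmd", package = "parsermd"))

# One list element per node, holding the section names that node falls under.
secs <- rmd_node_sections(rmd)
str(head(secs, 3))

# Restrict the result to level 1 and level 2 headings only.
top <- rmd_node_sections(rmd, levels = 1:2)
str(head(top, 3))
```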
This function is implemented using tidyselect::eval_select() which enables a variety of useful syntax for selecting nodes from the ast.
A number of parsermd-specific selection helpers are also available: by_section(), has_type(), has_label(), and has_option().
rmd_select(x, ...)
x | Rmd object, e.g. |
... | One or more unquoted expressions separated by commas. Chunk labels can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of nodes. |
Returns a subset Rmd object (either rmd_ast or rmd_tibble depending on input).
rmd = parse_rmd(system.file("examples/hw01.Rmd", package = "parsermd"))
rmd_select(rmd, "plot-dino", "cor-dino")
rmd_select(rmd, "plot-dino":"cor-dino")
rmd_select(rmd, `plot-dino`:`cor-dino`)
rmd_select(rmd, has_type("rmd_chunk"))
rmd_select(rmd, by_section(c("Exercise *", "Solution")))
These functions are used in conjunction with rmd_select() to select nodes from an Rmd ast.
by_section() - uses section selectors to select nodes.
has_type() - selects all nodes that have the given type(s).
has_label() - selects nodes with labels matching the given glob.
has_code() - selects chunk nodes with code line(s) matching the given regex pattern(s).
has_option() - selects nodes that have the given option(s) set.
has_type(types)
by_section(sec_ref, keep_parents = TRUE)
has_label(label)
has_code(code)
has_option(...)
types | Vector of character type names, e.g. |
sec_ref | Character vector, a section reference selector. See the details below on how these are constructed. |
keep_parents | Logical, retain the parent headings of selected sections. Default: |
label | Character vector, glob patterns for matching chunk labels. |
code | Character vector, regex patterns for matching chunk code line(s). |
... | Either option names represented by a scalar string or a named argument with the form |
Section reference selectors are a simplified version of CSS selectors that are designed to enable the selection of nodes in a way that respects the implied hierarchy of a document's section headings.
They consist of a character vector of heading names where each subsequent value is assumed to be nested within the preceding value. For example, the section selector c("Sec 1", "Sec 2") would select all nodes that are contained within a section named Sec 2 that is in turn contained within a section named Sec 1 (or a section contained within a section named Sec 1, and so on).
The individual section names can be specified using wildcards (aka globbing patterns), which may match one or more sections within the document, e.g. c("Sec 1", "Sec *"). See utils::glob2rx() or Wikipedia for more details on the syntax for these patterns.
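The nesting and wildcard rules described above can be illustrated with base R alone. This is only a sketch of the described semantics and not parsermd's actual implementation: the section paths are made-up data, and matches_ref() is a hypothetical helper that checks whether each glob in a selector matches a heading along a node's section path, in order, with intermediate sections allowed.

```r
# Hypothetical data: the section path (outermost heading first) for four nodes.
# These heading names are invented for illustration.
paths <- list(
  c("Sec 1"),
  c("Sec 1", "Sec 2"),
  c("Sec 1", "Sec 1.5", "Sec 2"),
  c("Intro", "Sec 2")
)

# Hypothetical helper: TRUE when every glob in `sec_ref` matches a heading in
# `path`, in order, with other sections allowed in between.
matches_ref <- function(path, sec_ref) {
  pos <- 0L
  # utils::glob2rx() turns each glob into an anchored regular expression.
  for (pattern in utils::glob2rx(sec_ref)) {
    hits <- which(grepl(pattern, path))
    hits <- hits[hits > pos]        # only consider headings nested deeper
    if (length(hits) == 0L) return(FALSE)
    pos <- hits[1L]                 # take the shallowest remaining match
  }
  TRUE
}

selected <- vapply(paths, matches_ref, logical(1), sec_ref = c("Sec 1", "Sec *"))
print(selected)
```

Under these assumptions, a node that sits directly in Sec 1 is not selected, while nodes in any section nested inside Sec 1 whose name starts with "Sec " are.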
All helper functions return an integer vector of selected indexes.
rmd = parse_rmd(system.file("examples/hw01.Rmd", package="parsermd"))
rmd_select(rmd, has_type("rmd_chunk"))
rmd_select(rmd, has_label("*dino"))
rmd_select(rmd, has_option("message"))
rmd_select(rmd, has_option(message = FALSE))
rmd_select(rmd, has_option(message = TRUE))
This is the equivalent of the source() function for Rmd files or their resulting asts.
rmd_source(x, local = FALSE, ..., label_comment = TRUE, use_eval = TRUE)
x | An Rmd document (e.g. |
local | |
... | Additional arguments passed to |
label_comment | Attach chunk labels as comment before each code block. |
use_eval | Use the |
Returns the result of source() for any R code chunks.
rmd_source(system.file("examples/minimal.Rmd", package = "parsermd"), echo=TRUE)
Subset an rmd object based on sections, node types, or names.
rmd_subset(
  x,
  sec_refs = NULL,
  type_refs = NULL,
  name_refs = NULL,
  exclude = FALSE,
  keep_yaml = TRUE,
  keep_setup = FALSE,
  ...
)
x | rmd object, e.g. |
sec_refs | Section references, TODO - add details. |
type_refs | Node type references, TODO - add details. |
name_refs | Name references, TODO - add details. |
exclude | Should the matching nodes be excluded. |
keep_yaml | Should the document yaml be kept. |
keep_setup | Should the document setup chunk be kept. |
... | Unused, for extensibility. |
Returns a subset Rmd object (either rmd_ast or rmd_tibble depending on input).
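A tentative usage sketch, assuming parsermd is installed. Because the argument details are still marked TODO above, this example assumes that sec_refs accepts the same kind of section reference selector that by_section() uses elsewhere in this manual; that assumption is not confirmed by the argument descriptions.

```r
library(parsermd)

rmd <- parse_rmd(system.file("examples/hw01.Rmd", package = "parsermd"))

# Assumption: sec_refs takes a section reference selector (a character vector
# of nested heading names, with glob wildcards), as by_section() does.
sol <- rmd_subset(rmd, sec_refs = c("Exercise *", "Solution"))

# Inspect which node types remain in the subset.
rmd_node_type(sol)
```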
Tools for selecting or checking a single node using rmd_subset() selection.
rmd_get_node(x, sec_refs = NULL, type_refs = NULL, name_refs = NULL, ...)
rmd_get_chunk(x, sec_refs = NULL, name_refs = NULL)
rmd_get_markdown(x, sec_refs = NULL)
rmd_has_node(x, sec_refs = NULL, type_refs = NULL, name_refs = NULL, ...)
rmd_has_chunk(x, sec_refs = NULL, name_refs = NULL, ...)
rmd_has_markdown(x, sec_refs = NULL, ...)
x | rmd object, e.g. |
sec_refs | Section references, TODO - add details. |
type_refs | Node type references, TODO - add details. |
name_refs | Name references, TODO - add details. |
... | Unused, for extensibility. |
rmd_get_*() functions return a single Rmd node object (e.g. rmd_heading, rmd_chunk, rmd_markdown, etc.).
rmd_has_*() functions return TRUE if a matching node exists, FALSE otherwise.
Create a template from an rmd object.
Templates are objects which are meant to capture the structure of an R Markdown document and facilitate the comparison between the template and new Rmd documents, usually to ensure the structure and/or content matches sufficiently.
rmd_template(
  rmd,
  keep_content = FALSE,
  keep_labels = TRUE,
  keep_headings = FALSE,
  keep_yaml = FALSE,
  ...
)
rmd | R Markdown document in the form of an |
keep_content | Should the template keep the document's content (markdown text and chunk code). |
keep_labels | Should the template keep the document's code chunk labels. |
keep_headings | Should the template keep the document's headings. |
keep_yaml | Should the template keep the document's yaml. |
... | Unused, for extensibility. |
Returns an rmd_template object, which is a derived tibble containing relevant structural details of the document.
rmd = parse_rmd(system.file("examples/hw01.Rmd", package="parsermd"))
rmd_select(rmd, by_section(c("Exercise *", "Solution"))) %>%
  rmd_template()