@JosiahParry
Last active March 26, 2025 15:38
Read R’s C headers to identify where R functions are coming from
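library(dplyr)

# functions listed by tools:::funAPI() (internal, unexported)
api <- as_tibble(tools:::funAPI())

# list all header files from include dir
header_files <- list.files(
  R.home("include"),
  full.names = TRUE,
  recursive = TRUE,
  pattern = "\\.h$"  # `pattern` is a regular expression, not a glob
)

read_header <- function(.h) {
  # find include dir
  include_dir <- R.home("include")

  # generate the clang command #thanksChatGPT
  # (paths are quoted so that spaces in them do not break the command)
  system_cmd <- sprintf(
    "clang -Xclang -ast-dump=json -fsyntax-only -x c -I%s %s",
    shQuote(include_dir),
    shQuote(.h)
  )

  # run clang, then parse the JSON AST and take the names of top-level nodes
  json <- paste(system(system_cmd, intern = TRUE), collapse = "")
  in_header <- yyjsonr::read_json_str(json)[[c("inner", "name")]]

  # one row per declared name, with the header path relative to include dir
  header_fns <- tibble(
    name = in_header,
    header_path = sub(paste0(include_dir, "/"), "", .h, fixed = TRUE)
  )

  header_fns
}

all_api_fns <- lapply(header_files, read_header) |>
  bind_rows()

# keep only the API entries that were found in some header
res <- inner_join(api, all_api_fns, by = "name")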

aitap commented Mar 26, 2025

Some of the duplication may come from the headers being #included by other headers (e.g. R_ext/Callbacks.h including Rinternals.h). Unfortunately, parsing C involves preprocessing it, which in turn involves substituting contents of other files into your translation unit.

clang's -ast-dump=json handles this by including the ['loc']['file'] JSON entry, except that, in an attempt at compression, the entry is omitted when it is identical to that of the previous item. (Also, I couldn't find any official documentation confirming this.) Once you expand the deduplicated ['loc']['file'] entries, it should be possible to filter for declarations belonging directly to your file, not to its #includes.
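A minimal sketch of that expansion step, assuming the AST has been parsed into a plain nested list (for example with jsonlite::fromJSON(json, simplifyVector = FALSE)) and that only top-level declarations matter. The helper name decls_in_file, the toy paths and the toy declaration names are all hypothetical. It also assumes that nested nodes never switch files between two top-level declarations; if they do, the whole tree would have to be walked in dump order instead.

```r
# Hypothetical helper: carry the last seen loc$file forward over the
# top-level declarations, then keep the names attributed to `target`.
decls_in_file <- function(inner, target) {
  current <- NA_character_
  files <- character(length(inner))
  for (i in seq_along(inner)) {
    f <- inner[[i]]$loc$file
    if (!is.null(f)) current <- f  # an explicit entry resets the current file
    files[i] <- current            # otherwise inherit the previous one
  }
  keep <- !is.na(files) & files == target
  vapply(
    inner[keep],
    function(d) if (is.null(d$name)) NA_character_ else d$name,
    character(1)
  )
}

# Toy input mimicking the compressed form: `file` appears only when it changes.
inner <- list(
  list(name = "a_one", loc = list(file = "/inc/a.h", line = 1L)),
  list(name = "a_two", loc = list(line = 2L)),
  list(name = "b_one", loc = list(file = "/inc/b.h", line = 1L)),
  list(name = "b_two", loc = list(line = 2L))
)

decls_in_file(inner, "/inc/b.h")
```

In the gist's read_header(), the target would be the normalized path of .h, so declarations pulled in through #include would drop out before the join.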

Some API corner cases (e.g. the Fortran functions d1mach, rexit, rwarn) may not be declared in C headers at all because they are aimed at MIL-STD-1753-era Fortran, to be used with implicit interfaces.
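One way to surface such entries is to list the API names that never matched a header. The sketch below uses toy stand-ins for the gist's api and all_api_fns tables. Only the three Fortran names come from the comment above; Rf_example and Example.h are hypothetical.

```r
# Toy stand-ins for the tables built in the gist (illustrative values only).
api <- data.frame(name = c("Rf_example", "d1mach", "rexit", "rwarn"))
all_api_fns <- data.frame(name = "Rf_example", header_path = "Example.h")

# API entries with no declaration found in any parsed header.
not_in_headers <- setdiff(api$name, all_api_fns$name)
not_in_headers
```

With the real tables, dplyr::anti_join(api, all_api_fns, by = "name") would give the same rows while keeping the other columns of api.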
