Last active
March 26, 2025 15:38
Read R’s C headers to identify where R functions are coming from
library(dplyr)

api <- as_tibble(tools:::funAPI())

# list all header files from include dir
# (`pattern` is a regular expression, not a glob)
header_files <- list.files(
  R.home("include"),
  full.names = TRUE,
  recursive = TRUE,
  pattern = "\\.h$"
)

read_header <- function(.h) {
  # find include dir
  include_dir <- R.home("include")

  # generate the clang command #thanksChatGPT
  # (paths are shell-quoted in case they contain spaces)
  system_cmd <- sprintf(
    "clang -Xclang -ast-dump=json -fsyntax-only -x c -I%s %s",
    shQuote(include_dir),
    shQuote(.h)
  )

  # parse and read
  json <- paste(system(system_cmd, intern = TRUE), collapse = "")
  in_header <- yyjsonr::read_json_str(json)[[c("inner", "name")]]

  # one row per declaration name, with the header path relative to include_dir
  header_fns <- tibble(
    name = in_header,
    header_path = gsub(paste0(include_dir, "/"), "", .h, fixed = TRUE)
  )

  header_fns
}

all_api_fns <- lapply(header_files, read_header) |>
  bind_rows()

res <- inner_join(api, all_api_fns, by = "name")
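
The inner join above keeps only the API entries that were matched to a declaration in some header. As a complementary check, here is a minimal base R sketch that lists the entries with no match. The two small data frames are made-up stand-ins for `api` and `all_api_fns`; only the `name` column (the join key used above) is assumed, and the sample rows are illustrative rather than real output.

```r
# Made-up stand-ins for `api` and `all_api_fns`; only the `name` join key
# is taken from the script above, everything else is illustrative.
api_mock <- data.frame(
  name = c("Rf_allocVector", "Rf_addTaskCallback", "d1mach"),
  stringsAsFactors = FALSE
)
headers_mock <- data.frame(
  name = c("Rf_allocVector", "Rf_addTaskCallback"),
  header_path = c("Rinternals.h", "R_ext/Callbacks.h"),
  stringsAsFactors = FALSE
)

# Anti-join in base R: API names with no matching header declaration
unmatched <- api_mock[!api_mock$name %in% headers_mock$name, , drop = FALSE]
print(unmatched$name)
```

With dplyr loaded, `anti_join(api, all_api_fns, by = "name")` should express the same idea on the real tables.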
Some of the duplication may come from the headers being `#include`d by other headers (e.g. `R_ext/Callbacks.h` including `Rinternals.h`). Unfortunately, parsing C involves preprocessing it, which in turn involves substituting contents of other files into your translation unit. `clang`'s `-ast-dump=json` handles this by including the `['loc']['file']` JSON entry, except that, in an attempt at compression, it is omitted when it is identical to that of the previous item. (Also I couldn't find any official documentation confirming this.) Once you expand the deduplicated `['loc']['file']` entries, it should be possible to filter for declarations directly belonging to your file, not its `#include`s.

Some API corner cases (e.g. Fortran functions `d1mach`, `rexit`, `rwarn`) may not be declared in C headers at all because they are aimed at MIL-STD-1753-era Fortran, to be used with implicit interfaces.
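
The fill-forward idea for the `['loc']['file']` entries described above can be sketched as follows. This is a minimal illustration, not a tested parser of clang output. It assumes the JSON was parsed into nested lists (for instance with `jsonlite::fromJSON(json, simplifyVector = FALSE)`), that an omitted `file` entry inherits the most recently seen one, and that tracking top-level declarations is enough. `decl_files()` is a hypothetical helper, and the sample declarations and paths are made up.

```r
# Hypothetical helper: recover the source file of each top-level declaration
# by carrying the last seen loc$file forward over entries that omit it.
decl_files <- function(decls) {
  files <- character(length(decls))
  current <- NA_character_
  for (i in seq_along(decls)) {
    f <- decls[[i]][["loc"]][["file"]]
    if (!is.null(f)) current <- f   # an explicit file entry updates the state
    files[i] <- current             # otherwise inherit the previous one
  }
  files
}

# Made-up sample mimicking the assumed shape of the top-level "inner" array
decls <- list(
  list(name = "Rf_allocVector",
       loc = list(file = "/inc/Rinternals.h", line = 1L)),
  list(name = "Rf_length",
       loc = list(line = 2L)),
  list(name = "Rf_addTaskCallback",
       loc = list(file = "/inc/R_ext/Callbacks.h", line = 3L)),
  list(name = "Rf_removeTaskCallbackByName",
       loc = list(line = 4L))
)

files <- decl_files(decls)

# Keep only declarations that belong to the header being parsed
target <- "/inc/R_ext/Callbacks.h"
own <- decls[which(files == target)]
own_names <- vapply(own, function(d) d[["name"]], character(1))
print(own_names)
```

If the omitted-entry state turns out to be shared with nested nodes as well, the same carry-forward would need to run over a full walk of the tree instead of only the top level.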