@jflanaga
Last active June 8, 2023 12:02
R Script for splitting data frame and then saving separate .csv
#----------------------------------------------------------------------------------------
# File:
# Author: Joseph Flanagan, adapted from https://stackoverflow.com/questions/10002021/split-dataframe-into-multiple-output-files-in-r
# email: [email protected]
# Purpose: Split a dataframe by group, then save each as separate .csv file
#----------------------------------------------------------------------------------------
# new tidyverse solution with `group_walk`
library(dplyr)
library(readr)
iris %>%
  group_by(Species) %>%
  group_walk(~ write_csv(.x, paste0(.y$Species, ".csv")))
# Old version
library(tidyverse)
# Make a copy of iris
iris2 <- iris
# Split by variable
spt2 <- split(iris2, iris2$Species)
# Save
lapply(names(spt2), function(x) {
  write_csv(spt2[[x]], paste(x, ".csv", sep = ""))
})
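
One difference between the two versions worth checking: by default, group_walk() hands each group to the function without the grouping column, so the files written by the tidyverse version lack the Species column, while the split() version keeps it. The sketch below assumes a recent dplyr (1.0 or later, as far as I know) where the argument is spelled `.keep`; the output folder `out_dir` is a hypothetical name chosen for illustration.

```r
library(dplyr)
library(readr)

# Hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "iris_by_species")
dir.create(out_dir, showWarnings = FALSE)

iris %>%
  group_by(Species) %>%
  group_walk(
    ~ write_csv(.x, file.path(out_dir, paste0(.y$Species, ".csv"))),
    .keep = TRUE  # keep the grouping column (Species) in each file
  )

# Read one file back to confirm the Species column was written
setosa <- read.csv(file.path(out_dir, "setosa.csv"))
names(setosa)
```

Dropping `.keep = TRUE` gives the behaviour of the script above, where the species is recorded only in the file name.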
@millassch

millassch commented Nov 23, 2021

Thank you for posting this!

This solution almost worked for my work.
write_csv has been deprecated. write_csv2 is an option.

I personally like write.table because it gives me the freedom to control the parameters I need, like sep = ","

I updated the last part to:

lapply(names(spt2), function(x) {
  write.table(
    spt2[[x]],
    file = paste(x, ".csv", sep = ""),
    row.names = FALSE,
    col.names = TRUE,
    sep = ",",
    na = ""
  )
})

@jflanaga
Author

> This solution almost worked for my work. write_csv has been deprecated. write_csv2 is an option.

Sorry, are you sure that write_csv() from the tidyverse has been deprecated? Where did you see that? I thought the difference between write_csv() and write_csv2() was mainly the delimiter: a comma (US and others) or a semicolon (at least some European countries). That choice mostly depends on whether the locale in question uses a point or a comma as the decimal separator.
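
A small sketch of that delimiter difference, using readr's format_csv() and format_csv2() to render the text in memory instead of writing files. The data frame `df` and its column names are made up for illustration.

```r
library(readr)

# Made-up example data with one non-integer numeric column
df <- data.frame(id = c("a", "b"), value = c(1.5, 2.25))

csv_text  <- format_csv(df)   # comma-delimited, point as decimal mark
csv2_text <- format_csv2(df)  # semicolon-delimited, comma as decimal mark

cat(csv_text)
cat(csv2_text)
```

write_csv() and write_csv2() follow the same two conventions when writing to a file.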

@millassch

> Sorry, are you sure that write_csv() from the tidyverse has been deprecated? Where did you see that? I thought the difference between write_csv() and write_csv2() was mainly the delimiter: a comma (US and others) or a semicolon (at least some European countries). That choice mostly depends on whether the locale in question uses a point or a comma as the decimal separator.

I'm sorry, I was mistaken. The "path" argument of write_csv is deprecated. I read the message R printed too quickly and did not realize it referred to the argument and not the entire function.

@jflanaga
Author

> The "path" argument inside of write_csv is deprecated

Thanks for the update. I just saw that path = is deprecated and that we should now use file = instead. I updated the script to pass the file name without naming the argument, which avoids the issue completely (although my preference is for arguments after the first to be named).

I also updated the original solution to take advantage of group_walk() from dplyr. It's now a much cleaner solution that doesn't rely upon lapply() at all.

I prefer to use readr in most cases just for convenience, but obviously there is a range of alternatives, and people can always find another package for their own use cases.
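
For anyone who prefers the named form mentioned above, a minimal sketch with file = spelled out. It assumes a readr version recent enough to accept `file` (1.4.0 or later, as far as I know); `out_file` is a hypothetical path in the temp directory.

```r
library(readr)

# Hypothetical output path inside the session's temp directory
out_file <- file.path(tempdir(), "setosa.csv")

# Name the second argument explicitly; `path =` is the deprecated spelling
write_csv(subset(iris, Species == "setosa"), file = out_file)

# Read the file back with base R to check what was written
written <- read.csv(out_file)
nrow(written)
```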

@endlesstour

thank you!!!
