library(readr)
# Write two built-in data sets to CSV files so there is something to hash
write_csv(datasets::iris, "iris.csv")
write_csv(datasets::mtcars, "mtcars.csv")
August 14, 2023
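To make my work more transparent, I looked into file hashing. A hash function assigns a file a fixed-length value that is calculated from the file's contents with a defined algorithm. For practical purposes this value is unique: identical contents always give the same hash, and different contents almost never do.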
In R, the rlang package provides the hash_file() function, which uses the XXH128 hash algorithm to generate a 128-bit hash.
This hash can be used to identify a data file by its contents. For reproducible research, you can record the hash values of the data files used in a project so that others can check they are working with exactly the same files.
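As a small illustration of what "identifying a file by its contents" means, the sketch below hashes three temporary files. The file contents and variable names are made up for this example; only hash_file() comes from rlang.

```r
library(rlang)

# Two temporary files with identical content, and a third that differs
# by a single character (contents are invented for illustration)
path_a <- tempfile(fileext = ".txt")
path_b <- tempfile(fileext = ".txt")
path_c <- tempfile(fileext = ".txt")
writeLines("species,count", path_a)
writeLines("species,count", path_b)
writeLines("species,counts", path_c)

# hash_file() accepts a character vector of paths and returns one hash per file
hashes <- hash_file(c(path_a, path_b, path_c))

nchar(hashes)            # 32 hexadecimal characters each, i.e. 128 bits
hashes[1] == hashes[2]   # same bytes, same hash, whatever the file name
hashes[1] == hashes[3]   # a one-character change gives a different hash
```

The hash depends only on the bytes in the file, not on its name or location, which is what makes it useful for checking that two people hold the same data.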
Here, I created two files from data sets in the datasets package and calculated the hash values of these files.
library(tidyverse)
# List all CSV files in the working directory and hash each one
tibble(files = fs::dir_ls(glob = "*.csv")) |>
  mutate(hash = rlang::hash_file(files))
files | hash |
---|---|
iris.csv | dbdc1846dff7fba30a88d5b23e15ea80 |
mtcars.csv | 1d350737ac40dc6fb6ae8f5ad616fc4e |
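One way to put such hashes to use is to store them next to the project and recompute them later. The following is a minimal sketch under some assumptions: it works in a temporary directory, the manifest name hashes.csv is hypothetical, and base R's write.csv() is used instead of readr to keep the example self-contained (so the hash it produces need not match the table above).

```r
library(rlang)

# Work in a temporary directory so the sketch does not touch project files
dir <- tempfile("hash-demo-")
dir.create(dir)
data_path <- file.path(dir, "iris.csv")
write.csv(datasets::iris, data_path, row.names = FALSE)

# Record the hash once, for example when the analysis is published
manifest <- data.frame(file = basename(data_path), hash = hash_file(data_path))
manifest_path <- file.path(dir, "hashes.csv")  # hypothetical manifest name
write.csv(manifest, manifest_path, row.names = FALSE)

# Later: read the manifest back (as text, so hashes are never parsed as
# numbers) and compare the recorded values with freshly computed ones
recorded <- read.csv(manifest_path, colClasses = "character")
unchanged_before <- hash_file(file.path(dir, recorded$file)) == recorded$hash

# Simulate an accidental edit to the data file and check again
cat("extra line\n", file = data_path, append = TRUE)
unchanged_after <- hash_file(file.path(dir, recorded$file)) == recorded$hash

unchanged_before  # TRUE: the file still matches the recorded hash
unchanged_after   # FALSE: the modified file no longer matches
```

A mismatch only says that the bytes differ, not what changed, so the manifest works best alongside version control or an archived copy of the data.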