Online data dictionary creator

9/1/2023

To add a static text description to a column, you use the info_columns() function. Each info_columns() description is a new layer added to the informant via a %>% or |> pipe.
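Here is a minimal sketch of layering column descriptions onto an informant. The state_pops table and its values are placeholders invented for illustration (only the Pop_2020 column name comes from the text). Passing the column name as a string along with an info field is one way to call info_columns(). Treat the details as assumptions rather than the article's own listing.

```r
library(pointblank)

# Placeholder table; the values are invented, not real Census counts
state_pops <- data.frame(
  State = c("A", "B", "C"),
  Pop_2020 = c(100, 200, 300)
)

# Each info_columns() call adds one more layer of column metadata
informant <- create_informant(
  tbl = state_pops,
  tbl_name = "state_pops",
  label = "Toy state population table"
) |>
  info_columns(
    columns = "State",
    info = "Name of the state."
  ) |>
  info_columns(
    columns = "Pop_2020",
    info = "Population count from the 2020 decennial census."
  )
```

Printing informant in an interactive session should display the report with both column descriptions filled in.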

Many pointblank arguments will take Markdown syntax, such as hyperlinks. As an example, the Stored in field in Listing 2 creates a hyperlink to my repository. If you run the info_tabular() code chunk or something similar and print the informant, you should see output similar to what's shown in Figure 3 at the top of your report.

Figure 3. An informant with custom metadata fields added.

Next, I'll add details about the data set's columns. You can describe as many or as few columns as you'd like in an informant; those details aren't required. I don't always describe every column in a data set when the column names seem clear, although admittedly that's not best practice. For example, "Pop_2020" in a data set of Census population information seems pretty self-explanatory.
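The table-level metadata step can be sketched roughly as follows. This is an illustration under assumptions, not the article's Listing 2: the state_pops table is a placeholder with invented values, and the Repository field and its example.com URL are hypothetical, included only to show a Markdown link inside an info_tabular() field.

```r
library(pointblank)

# Placeholder table standing in for the state population file (values invented)
state_pops <- data.frame(
  State = c("A", "B"),
  Pop_2020 = c(100, 200)
)

informant <- create_informant(
  tbl = state_pops,
  tbl_name = "state_pops",
  label = "US state populations (toy example)"
) |>
  info_tabular(
    Description = "Table of US state populations from decennial censuses.",
    Updates = "Does not update (except once every 10 years)",
    Source = "US Census Bureau and the R tidycensus package",
    # Hypothetical field and URL, showing Markdown link syntax
    Repository = "[Example repository](https://example.com/my-repo)"
  )

# Build the report object without printing it
report <- get_informant_report(informant)
```

Because info_tabular() takes arbitrary named fields, the same set of names can be reused across data sets to keep dictionaries consistent.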

Create a data dictionary report with R and pointblank

Here's how it all works. To document a data set with pointblank in R, you start by creating a pointblank informant object with the create_informant() function. Let's use a simple data set of US state population data to see it in action.

The code in Listing 1 loads the pointblank and dplyr packages and uses the rio package to read the file into R. Feel free to use whatever import function you like best for importing CSV files: readr::read_csv(), vroom::vroom(), data.table::fread(), base R's read.csv(), etc.

library(pointblank) # install with install.packages if needed

The table-level fields include:

description = "Table of US state populations from decennial censuses, with data from 2000, 2010, and 2020 as well as columns for percent changes and Census Bureau regions and divisions.",
Updates = "Does not update (except once every 10 years)",
Source = "US Census Bureau and the R tidycensus package",

I suggest creating a template file you can copy and fill in, or an RStudio code snippet, with the info_tabular() fields you want in your data dictionary. That way, you can standardize your metadata.

The codebook webapp is another option. You can upload files without variable and value labels (e.g. csv), but the resulting codebook will be less useful. You'll get the most mileage out of this package by using data collected with and imported using the formr R package. If you prefer a PDF over HTML (but remember, PDFs are much less readable for machines and hard to read on mobile devices), just remove the html_document block below.

The webapp sets reasonable defaults, and it is possible to edit the text and the R code to improve the resulting codebook. However, the webapp does not store edits, is not as interactive as working in R, and requires the user to upload the dataset to a server. This is not permissible for certain restricted-use datasets. Moreover, for very large datasets, you may get an error message, because the server limits the resources you can use. If you want to document large, private, or many datasets, or if you first need to add the metadata, it is easier to install the codebook package locally.

The accompanying code sets warning = TRUE and message = TRUE to show warnings and messages during codebook generation, and error = TRUE so that errors do not interrupt it. You can omit the following lines if your missing values are already properly labelled:

codebook_data <- detect_missing(codebook_data,
    only_labelled = TRUE, # only labelled values are autodetected as missing
    negative_values_are_missing = FALSE, # negative values are not treated as missing
    ninety_nine_problems = TRUE # 99/999 are treated as missing under certain conditions
)

If you are not using formr, the codebook package needs to guess which items belong together. It finds item aggregates by their names, and identifying these aggregates allows the codebook function to make use of them. However, it will not reverse items automatically.
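A minimal local sketch of the missing-value and aggregate detection steps, assuming the codebook package is installed. It uses the bfi example data shipped with the package rather than an uploaded file, which is a choice made here for illustration. The final codebook() call is left commented out because it is meant to run inside an R Markdown document.

```r
library(codebook)

# Example data bundled with the codebook package (an assumption of this sketch)
codebook_data <- codebook::bfi

# Flag missing values; skip this step if they are already properly labelled
codebook_data <- detect_missing(
  codebook_data,
  only_labelled = TRUE,
  negative_values_are_missing = FALSE,
  ninety_nine_problems = TRUE
)

# Look for item aggregates by name; items are not reversed automatically
codebook_data <- detect_scales(codebook_data)

# Inside an R Markdown chunk, the codebook itself would then be generated with:
# codebook(codebook_data)
```

To use your own file instead, something like rio::import("mydata.csv") could replace the bfi line, where mydata.csv is a hypothetical file name.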

The codebook generated here will be stored for 24 hours. Unless you share the link, others cannot easily discover it. The data you upload is not stored, but if you do not want to upload the data, you can also install the codebook R package on your computer using install.packages("codebook"). This will also make it easier to document multiple data files in the same document, should you want to.

A range of file formats is supported. All are read using rio, which means you can also upload zipped files; see the rio docs for more information. The codebook package uses variable and value labels, as well as labelled missing values, to make sense of the data.
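For data that arrives without labels, one way to attach variable and value labels before generating a codebook is the labelled package. That package is not mentioned in the text, so this is a sketch under that assumption, with an invented survey column and placeholder responses.

```r
library(labelled)

# Hypothetical survey data; the responses are placeholders
survey <- data.frame(
  satisfied = c(1, 2, 2, 1, 99)
)

# Value labels: what each code means
val_labels(survey$satisfied) <- c(Yes = 1, No = 2, "No answer" = 99)

# Variable label: what the column measures
var_label(survey$satisfied) <- "Are you satisfied with the service?"
```

With labels stored on the column, tools that read label metadata have something to work with beyond the bare column name.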
