```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Use this document to follow along with the **gt** basics working session, running provided code to see how it works as well as writing your own practice code. I recommend turning on the visual editor via the ![](https://rstudio.github.io/visual-markdown-editing/images/visual_mode_2x.png){width="3%"} button in the right-hand corner. You can find today's slides [here](https://aosmith16.github.io/spring-r-topics/slides/week04_gt_tables.html#1). The slides contain introductory material as well as a list of resources on **gt** tables that aren't included here. ## R packages Be sure to run code chunks in order, starting with loading packages. The main package we're using today is **gt**, but we'll also use package **dplyr** for data manipulation. Check that your package versions are up to date. ```{r, message = FALSE, warning = FALSE} library(gt) # v. 0.2.2 library(dplyr) # v. 1.0.5 ``` ## The datasets We will primarily work with built-in datasets, although I will also provide a small dataset for you to practice on. Creating display tables is generally going to involve data manipulation. This can be done in the same code pipeline as making the table, and many **gt** examples will include data manipulation code as well as table-making code. I want to focus in on just table features today, so we'll begin by creating very small examples from the datasets that ship with the **gt** package. ### The `gtcars` dataset This dataset is data on deluxe automobiles from 2014-2017. See `?gtcars` for more information. We'll pull out just 6 rows from two countries and keep 7 of the variables. ```{r} gtcars_small = gtcars %>% filter(ctry_origin %in% c("United States", "Japan")) %>% select(mfr:year, mpg_c, mpg_h, ctry_origin, msrp) gtcars_small ``` ### The `mtcars` dataset The `mtcars` dataset is data from 1974 Motor Trend road tests. It contains a fairly large number of numeric and integer variables that make it useful for practicing with. I'm taking just the first 6 rows and adding in some missing values. ```{r} mtcars_small = mtcars %>% head() %>% mutate( disp = c(NA, disp[2:6]), qsec = c(qsec[1:5], NA) ) %>% select(disp, hp, wt, qsec, carb) mtcars_small ``` ### Table of results Here is a small table of results I created after an analysis, which reports estimated ratios of medians between two groups for 4 `litterspp`. ```{r} results = structure(list(contrast = structure(c(1L, 1L, 1L, 1L), .Label = "DF / RA", class = "factor"), litterspp = structure(1:4, .Label = c("ACMA", "ALRU", "PSME", "TSHE"), class = "factor"), ratio = c(2.92534041512422, 3.98726047426825, 1.1303275363783, 1.69285339886012), lower.CL = c(1.35187771051096, 1.84261924981326, 0.522354456290416, 0.782312637958251), upper.CL = c(6.33017060479877, 8.62806903340073, 2.44592598782139, 3.66318079369337)), row.names = c(NA, 4L), class = "data.frame") results ``` ## Basic **gt** usage We start with the `gt()` function to create a basic **gt** table object. This is the first step in a typical workflow. The dataset is the first argument to `gt()`. You can put your dataset within `gt()` (i.e., `gt(data = gtcars_small)`), but it is standard to use pipe syntax when building **gt** tables. We'll be using that syntax throughout this session. This shows all the default settings when using `gt()`. ```{r} gtcars_small %>% gt() ``` ## Modify columns We're going to first focus on cleaning up the table body and columns. The [`cols_*()` functions](https://gt.rstudio.com/reference/index.html#section-modify-columns) allow for modifications of entire columns. We can control the column labels, cell alignment, column width and placement as well as combine columns with `cols_*()` functions. ### Column labels Relabel one or more labels using `cols_label()`. The original names are on the left and new names in quotes on the right within `cols_label()`. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) ``` You can control text formatting of labels with Markdown or HTML syntax. Such syntax can be used on any text in a **gt** table. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = md("**Manufacturer**"), # markdown bold syntax ctry_origin = html("Country") # html italic syntax ) ``` ### Column alignment Align all text within different columns using `cols_align()`. Most often we left-align text with varying length and right-align numbers. By default all columns will be aligned when using `cols_align()`. Choose which variables you want to align a certain way within `vars()`. Most columns above are already aligned how we might want them to be due to the `gt()` defaults, so this is for practice. **Of note**: While we changed the *label* of `mfr` to `Manufacturer`, the underlying column name is unchanged and so we must refer to the original name. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_align( align = "center", columns = vars(mfr, model) ) ``` Align different columns different ways by adding multiple `cols_align()` layers. You can see how the code pipe will quickly get very long when working with **gt**. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_align( align = "center", columns = vars(mfr, model) ) %>% cols_align( align = "left", columns = vars(mpg_c, mpg_h, msrp) ) ``` ### Column order Move columns to the start or end or wherever you'd like with the `cols_move_*()` functions. Functions `cols_move_to_start()` and `cols_move_to_end()` can move variables to the beginning or end of the table, respectively. Choose `columns` with `vars()` as in `cols_align()`. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_move_to_start( columns = vars(ctry_origin) ) ``` Use `cols_move()` to move a column *after* another column. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_move_to_start( columns = vars(ctry_origin) ) %>% cols_move( columns = vars(mpg_c, mpg_h), after = vars(msrp) ) ``` ### Column widths You can control with widths of columns either in pixels (`px()`) or percentage of the current size (`pct()`) with `cols_width()`. This is really starting to get at the overall look of your output table, where you might want a categorical variable that defines the rows of the table to be wider than others. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_move_to_start( columns = vars(ctry_origin) ) %>% cols_width( vars(ctry_origin) ~ px(150) ) ``` If you want many columns to be the same width, use `everything()` after defining the widths of other columns. ```{r} gtcars_small %>% gt() %>% cols_label( mfr = "Manufacturer", ctry_origin = "Country" ) %>% cols_move_to_start( columns = vars(ctry_origin) ) %>% cols_width( vars(ctry_origin, mfr, model) ~ px(120), everything() ~ px(50) ) ``` ### Merging columns You can combine multiple columns into one using `cols_merge()`. Here, for example, we could make a column that shows the range of `mpg` by combining the city (`mpg_c`) and highway (`mpg_h`) columns. Merging looks similar to previous functions, where we first define which columns we want to merge. ```{r} gtcars_small %>% gt() %>% cols_merge( columns = vars(mpg_c, mpg_h) ) ``` However, by default the new column is named based on only the first column and the values are put together with a space between them. The `pattern` argument controls how column values are combined. Use indices to refer to the combined columns; i.e., use `{1}` to refer to the first column listed and `{2}` for the second. I want a hyphen between the two values. ```{r} gtcars_small %>% gt() %>% cols_merge( columns = vars(mpg_c, mpg_h), pattern = "{1}-{2}" ) ``` ### Your turn Write code in the empty code chunk below. Starting with the `cols_merge()` code chunk we just used above, do the following: - We just made a new *merged* column called `mpg_c`. Change the name of this column to `Range`. - Change the name of `ctry_origin` to `Origin`. - Move `year` to be the first column, followed by `ctry_origin`. ```{r} # Write your code in this chunk and run to see the output ``` ## Format columns The **gt** package provides a series of [`fmt_*()` functions](https://gt.rstudio.com/reference/index.html#section-format-data) for formatting the values within columns. This can be done on entire rows or on individual cells (i.e., rows within columns). ### Number formatting The function `fmt_number()` is for formatting numeric columns, which includes, e.g., choosing the number of decimal places, setting the decimal separator (defaults to "."), and using large-number suffixes such as `K` for thousands. We'll start by setting the decimal place for `mpg_c` and adding a suffix to `msrp`. Note `decimals` defaults to 2. ```{r} gtcars_small %>% gt() %>% fmt_number( columns = vars(mpg_c), decimals = 1 ) %>% fmt_number( columns = vars(msrp), decimals = 0, suffixing = TRUE ) ``` ### Currency formatting Add a currency symbol using `fmt_currency()`. This function has many options we won't see today, including options for how to display negative values. First we'll add a dollar sign to `msrp`. Note `USD` is the default currency symbol, set with the `currency` argument. ```{r} gtcars_small %>% gt() %>% fmt_currency( columns = vars(msrp), currency = "USD", decimals = 0, suffixing = TRUE ) ``` We can change the currency symbol, using either 3-letter currency codes or common currency names. See the documentation for details as well as available currencies. We'll change the currency symbol to euros. ```{r} gtcars_small %>% gt() %>% fmt_currency( columns = vars(msrp), currency = "EUR", decimals = 0, suffixing = TRUE ) ``` Then to pounds. ```{r} gtcars_small %>% gt() %>% fmt_currency( columns = vars(msrp), currency = "pound", decimals = 0, suffixing = TRUE ) ``` ### Percent formatting Use `fmt_percent()` for percent formatting. Let's pretend the `mpg` columns are percents. ```{r} gtcars_small %>% gt() %>% fmt_percent( columns = vars(mpg_c, mpg_h), decimals = 0 ) ``` You can see that, by default, the values were multiplied by 100 prior to adding the `%` symbol. If your values already are percentages and not proportions, change this using `scale_value = FALSE`. ```{r} gtcars_small %>% gt() %>% fmt_percent( columns = vars(mpg_c, mpg_h), decimals = 0, scale_value = FALSE ) ``` ### Missing values formatting By default missing values in R are shown as `NA`. This can be controlled using `fmt_missing()`. The two `mpg` columns contain a missing value. Note my use of the select helper function `contains()` to choose all columns that have "mpg" in them. This uses the new default missing text, `---`. ```{r} gtcars_small %>% gt() %>% fmt_missing( columns = contains("mpg") ) ``` Change the missing text to `none` using `missing_text`. ```{r} gtcars_small %>% gt() %>% fmt_missing( columns = contains("mpg"), missing_text = "none" ) ``` ### Formatting specific cells All the `fmt_*()` functions work on entire columns by default. In [Jonathan Schwabish's "Ten Guidelines for Better Tables"](https://www.cambridge.org/core/journals/journal-of-benefit-cost-analysis/article/abs/ten-guidelines-for-better-tables/74C6FD9FEB12038A52A95B9FBCA05A12), he recommends adding symbols (e.g., currency, percentage symbols) to only the first row in each column. To add a symbol to only the first row of a column use the `rows` argument. Choose rows by index. In this case, we use `1` to choose the first row. Notice I am now choosing the two `mpg` columns by their position in the table. ```{r} gtcars_small %>% gt() %>% fmt_percent( columns = 4:5, rows = 1, decimals = 0, scale_values = FALSE ) ``` You'll note this changes the alignment of the column, which is not ideal. There is an open issue to fix this in **gt**. In the meantime, check out Thomas Mock's work-arounds in his [blog post](https://themockup.blog/posts/2020-09-04-10-table-rules-in-r/#rule-7-remove-unit-repetition). ### Date-time formatting While we aren't going to explore them today, if you are working with dates, times, or date-times you'll find the respective `fmt_*()` functions for those to be useful. Here's a link to some examples in the **gt** Cookbook: . ### Your turn You will be using `mtcars_small` for this exercise. - Convert `carb` to a percent with 1 decimal place, where `4` = `4.0%`. - Replace the missing value in `disp` with `---`. - Replace the missing value in `qsec` with "missing". - Add the `yen` currency symbol to `hp` with 0 decimal places. ```{r} # Write your code in this chunk and run to see the output ``` ## Row names, row groups, and summary rows Now that we've spent time working on the columns, let's switch to thinking about the table rows. The rows are what **gt** refers to as the *stub*. The stub is often a column of row labels that doesn't have (or need) a column label. We can also add in row grouping here. ### Row names The variable that represents the rows is an argument within the `gt()` function. For example, let's start by using `"ctry_origin"` as row labels. Note we need quotes around the variable we pass to `rowname_col`. ```{r} gtcars_small %>% gt(rowname_col = "ctry_origin") ``` You can see the column is moved to the beginning of the table and, by default, does not have a column name. If you want a column name, add one using `tab_stubhead()`. ### Row groups Adding groups can help us organize a table to tell a story. If our focus is on different countries, separating the table into country groups makes sense. This can be done by assigning a grouping variable with `groupname_col` in `gt()`. You can also pass a grouped dataset from **dplyr** to set the groups. Let's group by `ctry_origin`. ```{r} gtcars_small %>% gt(groupname_col = "ctry_origin") ``` By default the groups are organized based on the order groups appear in the table. You can change this using function `row_group_order()`. ### Row names and groups In many cases we'll want to have a row names column when we have groups. This helps the table organization. Let's group by `ctry_origin` and put `year` as the row names. ```{r} gtcars_small %>% gt( rowname_col = "year", groupname_col = "ctry_origin" ) ``` If you don't have an appropriate row names column, you may want to put in blank row names. See an example from the **gt** Cookbook [here](https://themockup.blog/static/gt-cookbook.html#create-blank-rownames) for one approach. ### Manual groups You can manually define groups using the function `tab_row_group()`. This involves using some sort of logical statement. Let's make two groups, one for "low" city `mpg_c` (\<21) and one for "high" city mpg. This involves two calls to `tab_row_group()`. If you have many groups, consider making a grouping column prior to creating the **gt** table rather than doing it this way. The name of the group is defined with `group` and the rows of interest are defined with logical operators in `rows`. ```{r} gtcars_small %>% gt() %>% tab_row_group( group = "Low city mpg", rows = mpg_c < 21 ) %>% tab_row_group( group = "High city mpg", rows = mpg_c >= 21 ) ``` ### Summary rows Once we have groups, we may want to add summaries for each group. To get summary rows by group use `summary_rows()` with `groups = TRUE`. You must also have a `rowname_col` as well as defined groups, which I believe is a [bug](https://github.com/rstudio/gt/issues/602) that has been reported. As in previous functions, you can choose which columns to work on in `columns`. You must define the function or functions you want to use for aggregation in a list. Put in list names to name the summary row in the output. ```{r} gtcars_small %>% gt( rowname_col = "year", groupname_col = "ctry_origin" ) %>% summary_rows( groups = TRUE, columns = vars(msrp), fns = list(Average = ~mean(.)) ) ``` Use the `formatter` argument to pass `fmt_*()` functions to control what the output looks like. Use any arguments you want to use in the `fmt_*()` function as further arguments. Here we add an additional group summary (SD) and round all results to 0 decimal places. ```{r} gtcars_small %>% gt( rowname_col = "year", groupname_col = "ctry_origin" ) %>% summary_rows( groups = TRUE, columns = vars(msrp), fns = list(Average = ~mean(.), SD = ~sd(.)), formatter = fmt_number, decimals = 0 ) ``` As always in R, watch out for missing values when summarizing. Using the tilde coding in the list of functions allows you to add additional arguments. You've likely noted non-summarized columns are given `---`. Change this with `missing_text`. I'll put in a blank as I average the `mpg` columns, removing the `NA` values prior to taking the mean. ```{r} gtcars_small %>% gt( rowname_col = "year", groupname_col = "ctry_origin" ) %>% summary_rows( groups = TRUE, columns = contains("mpg"), fns = list(Average = ~mean(., na.rm = TRUE)), missing_text = "" ) ``` If we have all numeric columns we can summarize them all at once. Leave the `columns` argument out of the function to do this. We'll switch to `mtcars_small` to see this in action. Note this only works right now if we have a `rowname_col`, which may be [fixed in the future](https://github.com/rstudio/gt/issues/602). ```{r} mtcars_small %>% gt( rowname_col = "hp", groupname_col = "carb" ) %>% summary_rows( groups = TRUE, fns = list(Average = ~mean(., na.rm = TRUE)) ) ``` ### Grand summary Add an overall summary with `grand_summary_rows()`. You could also leave `groups = TRUE` out of `summary_rows()`. ```{r} gtcars_small %>% gt( rowname_col = "year", groupname_col = "ctry_origin" ) %>% grand_summary_rows( columns = vars(msrp), fns = list(Average = ~mean(.)) ) ``` ### Your turn Take `gtcars_small` and do the following: - Group rows by `year` - Use `ctry_origin` as row names - Add the total `msrp` per group with the row name "Total" - Finally, add the overall total `msrp` across groups with the row name "Overall" ```{r} # Write your code in this chunk and run to see the output ``` ## Headers, column spanners, and notes We can add information to our table using a series of [`tab_*()` functions](https://gt.rstudio.com/reference/index.html#section-create-or-modify-parts). Per the **gt** documentation, these functions are for *creating or modifying parts* of your display table. ### Headers Add a title and/or subtitle as a table header using `tab_header()`. ```{r} gtcars_small %>% gt() %>% tab_header( title = "Deluxe automobiles from the 2014-2017 period", subtitle = "Data from the United States and Japan" ) ``` As we saw with column labels, we can control formatting with `md()` or `html()` syntax. ```{r} gtcars_small %>% gt() %>% tab_header( title = "Deluxe automobiles from the 2014-2017 period", subtitle = md("Data from the **United States** and **Japan**") ) ``` ### Spanner column labels When we have columns that contain similar information, we may want to add labels that span those columns. In a way this is similar to grouping, but now we're grouping columns instead of rows. This can be done with `tab_spanner()`. We choose what we want to call the grouped column using `label` and choose the columns to groups with `columns` much like in other functions we've used. ```{r} gtcars_small %>% gt() %>% tab_spanner( label = "Miles per gallon", columns = vars(mpg_c, mpg_h) ) ``` Tables can have multiple spanners encompassing different groups of columns. ```{r} gtcars_small %>% gt() %>% tab_spanner( label = "Miles per gallon", columns = vars(mpg_c, mpg_h) ) %>% tab_spanner( label = "Car type", columns = 1:2 ) ``` ### Source notes Source notes are generally notes about where the data in the table came from. Most often these will be at the bottom of the table, but you can move them other places (*not shown today*). Use `tab_source_note()` to add a source note. You can use `md()` and `html()` with source notes. ```{r} gtcars_small %>% gt() %>% tab_source_note( source_note = md("*Data from [package **gt**](https://gt.rstudio.com/reference/gtcars.html)*") ) ``` ### Footnotes We can add footnotes to add information on important components of the table that may not be clear. This is done with `tab_footnote()`. The `footnote` argument is where we define the text of the footnote. Again, use `md()` and/or `html()` to style the text. The `locations` argument is how we set the placement of the footnote. This involves a series of [`cells_*()` helper functions](https://gt.rstudio.com/reference/index.html#section-helper-functions) that we have not covered yet. We'll use one of these here but will not go through these functions in detail. We'll start by adding a footnote to the column label `msrp` to indicate these are given in USD. Since we are putting a footnote on a column label we need `cells_column_labels()` to help us choose just that column. ```{r} gtcars_small %>% gt() %>% tab_source_note( source_note = md("*Data from [package **gt**](https://gt.rstudio.com/reference/gtcars.html)*") ) %>% tab_footnote( footnote = "In USD", location = cells_column_labels( columns = vars(msrp) ) ) ``` **gt** keeps track of our footnote numbering for us. The footnote that is first in the table is given an smaller number even if we list them out of order in our code. ```{r} gtcars_small %>% gt() %>% tab_source_note( source_note = md("*Data from [package **gt**](https://gt.rstudio.com/reference/gtcars.html)*") ) %>% tab_footnote( footnote = "In USD", location = cells_column_labels( columns = vars(msrp) ) ) %>% tab_footnote( footnote = "City miles per gallon", location = cells_column_labels( columns = vars(mpg_c) ) ) ``` You can put footnotes wherever you want, including in individual cells, on row groups, etc. It all depends on what `cells*()` helper function you use. ### Your turn Using the `mtcars_small` dataset - Add a header title and subtitle to describe the table however you'd like - Group columns `disp`, `hp`, `qsec`, and `carb` with spanner label "Engine" - Add a source note to indicate the data are from a 1974 issue of Motor Trend magazine - Add a footnote to describe `disp` (which is engine displacement in cubic inches) ```{r} # Write your code in this chunk and run to see the output ``` ## Table styling The **gt** package allows for an enormous amount of customization, allowing you to build a *theme* for what your tables should look like. What you'll want your table to look like will depend on your audience, and a table for a manuscript will likely look different than one you build for an HTML document you are sharing. We'll take a cursory look at some options today and then look at some overall themes next week. ### Table option functions The family of `opt_*_*()` functions are for setting some commonly used table options. This includes adding row striping or setting the font of the whole table. Row stripes with `opt_row_striping()` can help distinguish rows. ```{r} gtcars_small %>% gt() %>% opt_row_striping() ``` Set overall fonts using fonts you have locally or use Google fonts via `google_font()` in `opt_table_font()`. (*This works fine below but won't show in the output HTML file unless you add Fira Mono onto your computer.*) ```{r} gtcars_small %>% gt() %>% opt_table_font(font = google_font("Fira Mono")) ``` ### Table output options Much of the work styling the table overall is done in the `tab_options()` function. Much like `themes()` from **ggplot2**, there are an enormous number of things you can change in `tab_options()`. Here I'll make the top and bottom lines black and remove all other horizontal lines within the main body of the table. ```{r} gtcars_small %>% gt() %>% tab_options( column_labels.border.top.color = "black", column_labels.border.bottom.color = "black", table_body.border.bottom.color = "black", table_body.hlines.color = "white" ) ``` To get rid of the line above and below the header, set the top and bottom border colors to white and the line below the header to black. I throw in `opt_align_table_header()` to left-align the header. ```{r} gtcars_small %>% gt() %>% tab_header( title = "Here is a header" ) %>% tab_source_note( source_note = "Give the source here" ) %>% tab_options( table.border.top.color = "white", heading.border.bottom.color = "black", table.border.bottom.color = "white", column_labels.border.top.color = "black", column_labels.border.bottom.color = "black", table_body.border.bottom.color = "black", table_body.hlines.color = "white" ) %>% opt_align_table_header(align = "left") # left align header ``` See the `tab_style()` function for controlling the style of individual parts of the table. ## Saving tables The workhorse of saving tables is `gtsave()`. From the documentation > The gtsave() function makes it easy to save a gt table to a file. The function guesses the file type by the extension provided in the output filename, producing either an HTML, PDF, PNG, LaTeX, or RTF file. You pass in a **gt** object to the `data()` argument and then set the file name in `filename`. If you want to save them somewhere outside your working directory set the path with `path`. We can practice this by naming one of our tables. So far we've only been printing them. ```{r} gt_output = gtcars_small %>% gt() %>% tab_header( title = "Here is a header" ) %>% tab_source_note( source_note = "Give the source here" ) %>% tab_options( table.border.top.color = "white", heading.border.bottom.color = "black", table.border.bottom.color = "white", column_labels.border.top.color = "black", column_labels.border.bottom.color = "black", table_body.border.bottom.color = "black", table_body.hlines.color = "white" ) %>% opt_align_table_header(align = "left") ``` *You may need to install PhantomJS prior to being able to saving image files, particularly if you have a Windows machine.* ```{r} webshot::install_phantomjs(version = "2.1.1", baseURL = "https://github.com/wch/webshot/releases/download/v0.3.1/", force = FALSE) ``` Save the table as a PNG image file into the same directory as this document. ```{r, eval = FALSE} gtsave( data = gt_output, filename = "test1.png" ) ``` If you want to use a table in Word you'll need the RTF format. You may still need to modify some formats (like column widths) manually after you open in Word. ```{r, eval = FALSE} gtsave( data = gt_output, filename = "test1.rtf" ) ``` ## Final exercise For the final practice exercise you will clean up the small table of results named `results` and save it. Here's what you should do: - Use `litterspp` as the row names - Use `contrast` as the row group - Increase the column width of `litterspp` to 100 px - Round the numeric columns to a single decimal place - Change the label of `ratio` to "Ratio" - Merge the two CL columns with a comma between the numbers and name the result "95% CI" - Add a header that says "Comparisons" - Add a source note in italics that says "Data from Siuslaw study" - Save this as a PNG file called "final_exercise.png" Optional if you have time: - Use the same `tab_options()` as above but add `row_group.border.top.color = "black"`, `row_group.border.bottom.color = "white"`, and `stub.border.color = "transparent"` You may want to do these steps one or a few at a time to help troubleshoot any problems that arise. Sometimes if we add too much at once it is hard to tell where any mistakes are coming from. ```{r} # Write your code in this chunk and run to see the output ```