Topic 5: Deeper dive into how to provide and pre-process data in teal.modules.clinical (15 minutes)

How to provide data to a teal application? - code samples

The data argument in teal::init() is where you specify all the datasets your application will use. There are several ways to provide data to teal:

  • using teal_data()
  • using teal_data() together with within() to track the preprocessing code
  • using cdisc_data() for clinical data
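
As a minimal sketch of how such an object reaches the application, the snippet below passes a teal_data object to the data argument of init(). It assumes a recent teal release in which init() returns a list with ui and server elements (as in the last exercise on this page). The object name app_data and the module label are illustrative only.

```r
library(teal)
library(teal.data)

# Bundle the datasets; modules refer to them by these names (IRIS, MTCARS).
app_data <- teal_data(IRIS = iris, MTCARS = mtcars)

# example_module() is a placeholder module shipped with teal.
app <- init(
  data = app_data,
  modules = modules(example_module(label = "Example"))
)

# Launch interactively:
# shinyApp(app$ui, app$server)
```

Any of the options described in the rest of this section can be swapped in for app_data.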

Option 1: Using teal_data()

library(teal.data)

# Create a basic teal_data object
my_data <- teal_data(
  IRIS = iris,
  MTCARS = mtcars
)

# View the structure
print(my_data)
✖ code unverified
<environment: 0x562e0b14c590> 🔒 
Parent: <environment: package:teal.data> 
Bindings:
- IRIS: [data.frame]
- MTCARS: [data.frame]

Option 2: Adding code for reproducibility

One of the key features of teal.data is its ability to track the code used to generate data:

# Create teal_data with explicit code tracking
my_data <- teal_data()

# Add datasets with code
my_data <- within(my_data, {
  # Load and prepare iris data
  IRIS <- iris
  IRIS$Species <- as.factor(IRIS$Species)

  # Load and prepare mtcars data
  MTCARS <- mtcars
  MTCARS$cyl <- as.factor(MTCARS$cyl)
  MTCARS$gear <- as.factor(MTCARS$gear)
})

# View the tracked code
get_code(my_data) |>
  cat()
IRIS <- iris
IRIS$Species <- as.factor(IRIS$Species)
MTCARS <- mtcars
MTCARS$cyl <- as.factor(MTCARS$cyl)
MTCARS$gear <- as.factor(MTCARS$gear)
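
The tracked code can also be narrowed down to a single dataset. The sketch below assumes a teal.data version in which get_code() accepts a names argument (older releases called it datanames). The object name tracked is illustrative.

```r
library(teal.data)

# Build the object inside within() so that every step is recorded.
tracked <- within(teal_data(), {
  IRIS <- iris
  IRIS$Species <- as.factor(IRIS$Species)
  MTCARS <- mtcars
  MTCARS$cyl <- as.factor(MTCARS$cyl)
})

# Request only the code needed to reproduce IRIS.
iris_code <- get_code(tracked, names = "IRIS")
cat(iris_code)
```

This is the same mechanism that lets a module show only the code relevant to the datasets it uses.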

Option 3: Using cdisc_data() for clinical data

For clinical trial data following CDISC standards, teal.data provides the specialized cdisc_data() function that automatically handles common clinical data relationships.

library(teal.data)
library(pharmaverseadam)

adsl <- pharmaverseadam::adsl
adae <- pharmaverseadam::adae

# Create cdisc_data object
clinical_data <- cdisc_data(
  ADSL = adsl,
  ADAE = adae,
  code = c(
    "ADSL <- pharmaverseadam::adsl",
    "ADAE <- pharmaverseadam::adae"
  )
)

print(clinical_data)
✖ code unverified
<environment: 0x562e0f570a80> 🔒 
Parent: <environment: package:pharmaverseadam> 
Bindings:
- ADAE: [tbl_df]
- ADSL: [tbl_df]

Introduction to teal.data in the context of the CDISC data standard

The teal.data package serves as the foundation for all data operations in teal. It provides a structured way to:

  • Store multiple datasets in a single object
  • Track the code used to create or modify data
  • Define relationships between datasets

When you create a teal_data object, teal does several things behind the scenes:

  1. Data storage - Your datasets are stored within the object
  2. Code tracking - The creation process is recorded for reproducibility
  3. Metadata creation - Information about datasets and their relationships is stored
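
The three pieces listed above can each be inspected from the console. The sketch below assumes a recent teal.data release in which names() and [[ work on teal_data objects. The object name demo is illustrative.

```r
library(teal.data)

demo <- within(teal_data(), {
  IRIS <- iris
  MTCARS <- mtcars
})

# 1. Data storage: list the stored datasets and extract one of them.
names(demo)
head(demo[["IRIS"]])

# 2. Code tracking: the statements that created the data.
cat(get_code(demo))

# 3. Metadata: relationships between datasets (none defined yet).
join_keys(demo)
```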

Creating cdisc_data objects with explicit code

# More explicit approach with within()
library(teal.data)
library(teal)
library(pharmaverseadam)

clinical_data <- cdisc_data()
clinical_data <- within(clinical_data, {
  ADSL <- pharmaverseadam::adsl
  ADAE <- pharmaverseadam::adae
})
join_keys(clinical_data) <- teal.data::default_cdisc_join_keys[c("ADSL", "ADAE")]

# View the structure and join keys
print(clinical_data)
print(join_keys(clinical_data))

Introduction to the concept of keys in teal.data

Join keys define how datasets relate to each other, which is crucial for:

  • Proper filtering across related datasets (filtering the parent dataset also filters the matching entries in the child dataset)
  • Correct data merging in modules (modules use the join keys to merge datasets together)
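
To make the two points above concrete, here is a base R sketch of what a join key enables. It does not call teal at all. The toy data frames patients and visits are made up for illustration, and teal's internal implementation may differ.

```r
# Parent dataset: one row per patient.
patients <- data.frame(patient_id = 1:3, arm = c("A", "B", "A"))

# Child dataset: several rows per patient, linked by patient_id.
visits <- data.frame(
  patient_id = c(1, 1, 2, 3, 3),
  visit = c(1, 2, 1, 1, 2)
)

# Filtering the parent ...
patients_a <- patients[patients$arm == "A", ]

# ... propagates to the child through the key column.
visits_a <- visits[visits$patient_id %in% patients_a$patient_id, ]

# The same key tells a module how to merge the two datasets.
merged <- merge(patients_a, visits_a, by = "patient_id")
merged
```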

Automatic join key detection with cdisc_data()

One of the key advantages of cdisc_data() is automatic join key detection:

library(pharmaverseadam)
library(teal)
Loading required package: shiny
Loading required package: teal.slice
Registered S3 method overwritten by 'teal':
  method        from      
  c.teal_slices teal.slice

You are using teal version 1.1.0

Attaching package: 'teal'
The following objects are masked from 'package:teal.slice':

    as.teal_slices, teal_slices
library(teal.data)

adsl <- pharmaverseadam::adsl
clinical_data_1 <- teal.data::cdisc_data(ADSL = adsl, code = "ADSL <- pharmaverseadam::adsl")
print(join_keys(clinical_data_1))
A join_keys object containing foreign keys between 1 datasets:
ADSL: [STUDYID, USUBJID] 
clinical_data_2 <- teal.data::cdisc_data()
clinical_data_2 <- within(clinical_data_2, {
  ADSL <- pharmaverseadam::adsl
})
print(join_keys(clinical_data_2))
An empty join_keys object. 

Notice the difference between the two printouts above. When a dataset is passed directly to the cdisc_data() constructor, the constructor adds the matching join_keys automatically. In the second example the dataset was created inside within(), so the object did not receive any join keys and they have to be assigned manually.
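
One way to close that gap, sketched below, is to assign the relevant subset of default_cdisc_join_keys after the within() call. The toy ADSL data frame is hypothetical and stands in for pharmaverseadam::adsl so that the snippet runs without extra packages.

```r
library(teal.data)

# Create the dataset inside within(): no join keys are added automatically.
toy <- within(cdisc_data(), {
  ADSL <- data.frame(STUDYID = "S1", USUBJID = c("S1-01", "S1-02"))
})
print(join_keys(toy))

# Assign the default CDISC keys for the datasets that are present.
join_keys(toy) <- default_cdisc_join_keys["ADSL"]
print(join_keys(toy))
```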

Understanding join key output

library(pharmaverseadam)
library(teal.data)
library(teal)
# Create example with multiple relationships
complex_data <- cdisc_data(
  ADSL = pharmaverseadam::adsl,
  ADAE = pharmaverseadam::adae,
  ADEX = pharmaverseadam::adex
)
join_keys(complex_data) <- teal.data::default_cdisc_join_keys[c("ADSL", "ADAE", "ADEX")]

# Understanding the output:
# - Each top-level line names a dataset with its primary keys in brackets
# - Indented lines list related datasets and the joining columns
# - `<--` marks a child of the dataset above; `-->` points to its parent
# - ADSL is typically the parent (subject-level data)
# - `--*` marks an implicit relationship, created automatically when
#   teal.data detects a possible join through a "middleman" dataset

# Print and interpret join keys
cat("Join Keys Structure:\n")
Join Keys Structure:
print(join_keys(complex_data))
A join_keys object containing foreign keys between 3 datasets:
ADSL: [STUDYID, USUBJID]
  <-- ADAE: [STUDYID, USUBJID]
  <-- ADEX: [STUDYID, USUBJID]
ADAE: [STUDYID, USUBJID, ASTDTM, AETERM, AESEQ]
  --> ADSL: [STUDYID, USUBJID]
  --* (implicit via parent with): ADEX
ADEX: [STUDYID, USUBJID, PARCAT1, PARAMCD, AVISITN, ASTDTM, EXSEQ]
  --> ADSL: [STUDYID, USUBJID]
  --* (implicit via parent with): ADAE 

Creating a custom teal.data object with custom user-defined keys - code samples

Sometimes you need to create or modify join keys manually, especially when working with non-standard data structures or when you need custom relationships.

Manual join keys creation

# Create a teal_data object without automatic join detection
my_data <- teal_data(
  PATIENTS = data.frame(
    patient_id = 1:100,
    age = sample(18:80, 100, replace = TRUE),
    treatment = sample(c("A", "B"), 100, replace = TRUE)
  ),
  VISITS = data.frame(
    patient_id = rep(1:100, each = 4),
    visit_num = rep(1:4, 100),
    visit_date = seq.Date(as.Date("2024-01-01"), by = "week", length.out = 400),
    measurement = rnorm(400, 100, 15)
  ),
  EVENTS = data.frame(
    patient_id = sample(1:100, 200, replace = TRUE),
    event_type = sample(c("AE", "CM", "EX"), 200, replace = TRUE),
    event_date = sample(seq.Date(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day"), 200)
  )
)

# Manually define join keys
join_keys(my_data) <- join_keys(
  join_key("PATIENTS", "VISITS", "patient_id"),
  join_key("PATIENTS", "EVENTS", "patient_id")
)

# View the join keys
print("Manual join keys:")
[1] "Manual join keys:"
print(join_keys(my_data))
A join_keys object containing foreign keys between 3 datasets:
PATIENTS: [no primary keys]
  <-- VISITS: [patient_id]
  <-- EVENTS: [patient_id]
VISITS: [no primary keys]
  --> PATIENTS: [patient_id]
  --* (implicit via parent with): EVENTS
EVENTS: [no primary keys]
  --> PATIENTS: [patient_id]
  --* (implicit via parent with): VISITS 
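
The output above reports "no primary keys" because only relationships between datasets were declared. The sketch below adds a primary key with the self-referencing join_key() form used later in this section, then queries the result. It assumes that two-index subsetting (jk[i, j]) and parents() behave as in recent teal.data releases. The object name jk is illustrative.

```r
library(teal.data)

jk <- join_keys(
  # Primary key: a dataset joined with itself.
  join_key("PATIENTS", "PATIENTS", "patient_id"),
  # Foreign key: PATIENTS is the parent of VISITS.
  join_key("PATIENTS", "VISITS", "patient_id")
)

# Columns used to join one specific pair of datasets.
jk["PATIENTS", "VISITS"]

# Parent of each child dataset.
parents(jk)
```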

Overwriting automatic join keys in cdisc_data

Sometimes the automatic join key detection in cdisc_data() doesn't match your specific needs. Here's how to override the default behavior:

library(pharmaverseadam)
library(teal)
library(teal.data)

custom_keys <- join_keys(
  # Primary keys for each dataset
  join_key("ADSL", "ADSL", c("STUDYID", "USUBJID")),
  join_key("ADAE", "ADAE", c("STUDYID", "USUBJID", "AESEQ")),
  join_key("ADCM", "ADCM", c("STUDYID", "USUBJID", "CMSEQ")),

  # Relationships between datasets
  join_key("ADSL", "ADAE", c("STUDYID", "USUBJID")),
  join_key("ADSL", "ADCM", c("STUDYID", "USUBJID")),

  # Custom: Allow direct relationship between ADAE and ADCM
  # This might be useful for analyzing AEs and concomitant medications together
  join_key("ADAE", "ADCM", c("STUDYID", "USUBJID"))
)

custom_data <- cdisc_data(
  ADSL = pharmaverseadam::adsl,
  ADAE = pharmaverseadam::adae,
  ADCM = pharmaverseadam::adcm
)

# Override the automatic join keys
join_keys(custom_data) <- custom_keys

# View the custom join keys
cat("\nCustom join keys:\n")

Custom join keys:
print(join_keys(custom_data))
A join_keys object containing foreign keys between 3 datasets:
ADSL: [STUDYID, USUBJID]
  <-- ADAE: [STUDYID, USUBJID]
  <-> ADCM: [STUDYID, USUBJID]
ADAE: [STUDYID, USUBJID, AESEQ]
  --> ADSL: [STUDYID, USUBJID]
  <-- ADCM: [STUDYID, USUBJID]
ADCM: [STUDYID, USUBJID, CMSEQ]
  <-> ADSL: [STUDYID, USUBJID]
  --> ADAE: [STUDYID, USUBJID] 

🛠️ Exercise

  • load teal.data
  • create a teal_data object named basic_data that bundles the built-in iris and mtcars datasets
  • print basic_data and notice the verification status printed in the console
  • try to verify the code you provided to teal_data actually reproduces the data you stored inside the object
  • inspect the resulting object with print() and get_code()
library(teal.data)

basic_data <- teal_data(iris = iris, mtcars = mtcars, code = "iris <- iris; mtcars <- mtcars")
print(basic_data)
# verify() returns a verified copy, so the result must be assigned
basic_data <- verify(basic_data)
print(basic_data)
cat(get_code(basic_data))

🛠️ Exercise

  • use teal_data() together with within() (or an equivalent approach) to construct an object named tracked_data.
  • make at least one transformation to each dataset (e.g., convert a column to a factor, create a derived variable).
  • confirm that get_code(tracked_data) records the transformation steps.
library(teal.data)

tracked_data <- teal_data(iris = iris, mtcars = mtcars, code = "iris <- iris; mtcars <- mtcars")
tracked_data <- within(tracked_data, {
  iris$Species <- as.factor(iris$Species)
  mtcars$new_column <- 3
})
cat(get_code(tracked_data))

🛠️ Exercise

  • load the pharmaverseadam package and pull the datasets adsl, adae, and adtte (the solution below uses adtte_onco).
  • build a single teal_data object named adam_manual containing those three datasets.
  • define custom join keys that mimic the automatic CDISC relationships:
    • ADSL -> ADAE on c("STUDYID", "USUBJID")
    • ADSL -> ADTTE on c("STUDYID", "USUBJID")
    • Optionally add subject-level self keys for each table.
  • assign the custom keys with join_keys(adam_manual) <- ....
  • confirm the structure by printing the join keys.
library(pharmaverseadam)
library(teal.data)
library(teal)

# Dataset names must match the names used in the join keys below
adam_manual <- teal_data(ADSL = adsl, ADAE = adae, ADTTE = adtte_onco)
custom_join_keys <- join_keys(
  join_key("ADSL", "ADAE", c("STUDYID", "USUBJID")),
  join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID"))
)
join_keys(adam_manual) <- custom_join_keys
print(join_keys(adam_manual))

🛠️ Exercise

  • use one of the applications you defined in the exercises or one of the example applications shown during the workshop
  • debug issues with join keys if any
  • use the Show R Code button to verify the application returns code that you can use to reproduce the output
library(pharmaverseadam)
library(teal.data)
library(teal)
library(teal.modules.clinical)

adam_manual <- teal_data(ADSL = adsl, ADAE = adae, ADTTE = adtte_onco)
adam_manual <- within(adam_manual, {
  ADSL$ARM <- as.factor(ADSL$ARM)
  ADSL$ARMCD <- as.factor(ADSL$ARMCD)
})
custom_join_keys <- join_keys(
  join_key("ADSL", "ADAE", c("STUDYID", "USUBJID")),
  join_key("ADSL", "ADTTE", c("STUDYID", "USUBJID")),
  join_key("ADSL", "ADSL", c("STUDYID", "USUBJID"))
)
join_keys(adam_manual) <- custom_join_keys
print(join_keys(adam_manual))

app <- init(
  data = adam_manual,
  modules = modules(
    tm_t_events(
      label = "Adverse Event Table",
      dataname = "ADAE",
      arm_var = choices_selected(c("ARM", "ARMCD"), "ARM"),
      llt = choices_selected(
        choices = variable_choices("ADAE", c("AETERM", "AEDECOD")),
        selected = c("AEDECOD")
      ),
      hlt = choices_selected(
        choices = variable_choices("ADAE", c("AEBODSYS", "AESOC")),
        selected = "AEBODSYS"
      ),
      add_total = TRUE,
      event_type = "adverse event",
      sort_criteria = "alpha",
      pre_output = shiny::div("Who won the most at the poker last night?")
    )
  )
)

shinyApp(app$ui, app$server)

🌐 References