4 Documenting Trust

4.1 How do you document your trust in an open source solution?

How do we have document our trust that an open source solution is accurate?
How do we know if a third-party will accept our documentation of trust?

4.2 Community Input [DRAFT]

Once we’ve chosen our process, [we must] either demonstrate that any human action in the process is without error (quality control and quality assurance) or any machine action in the process works as intended (testing and validation). We accomplish this by demonstrating the accuracy, reproducibility and traceability of the data which is transformed through that process.
– Michael Rimler, Can Clinical Data Processed With R Be Used in a Regulatory Submission?, PHUSE Blog, 2022

The two primary efforts which have tackled the question of documenting trust in software solutions in the industry come from:

Transcelerate’s Modernization of Statistical Analytics (MSA) Framework
R Validation Hub

Each of these efforts speak to the value of accuracy, reproducibility, and traceability in establishing trust in a software solution, which must be documented in case it is ever interrogated by a third-party.

According to Transcelerate’s MSA Framework:

Accuracy is the “measure of correctness of software libraries that are used to generate results.”
Reproducibility indicates the ability that that result “can be recreated from the original dataset along with the associated environment, including all artifacts and dependencies.”
Traceability “refers to the ability to trace inputs to outputs, with the main goal being to provide evidence to connect e-data, code, and environment to the final output that is produced.

The R Validation Hub cites the FDA’s Glossary of Computer System Software Development Technology in defining validation as “establishing documented evidence which provides a high degree of assurance (accuracy) that a specific process consistently (reproducibility) produces a product meeting its predetermined specifications (traceability) and quality attributes.”

In 2020, the R Validation Hub issued a white paper “A Risk-based Approach for Assessing R Package Accuracy within a Validated Infrastructure,” suggesting a risk assessment framework based on four criteria:

Purpose
Maintenance Good Practice (SDLC)
Community Usage
Testing

Essentially, we may choose to trust a solution if that solution:

is well documented
follows a well-defined and well-designed software development process,
is widely used by the community, and
contains documentable and thorough testing of algorithms

Andy Nicholls, former Chair of the R Validation Hub and co-author of the white paper, suggests that there are 3 key documents required for establishing trust in a solution:

Explanation of the overall approach to validation
Explanation of why one might be willing to accept suites of packages, full-stop. For example, when considering tidyverse or core R packages, documenting that “We’ve reviewed [list of documents] from posit (or R Foundation) and accept that they follow good practice.” may be sufficient.
An assessment report for each additional package, which may include both human-written interpretations and automated metrics.

4.2.1 Useful Packages for R

The risk assessment framework described in the white paper has resulted in two open-sourced R packages which perform automated checks on characteristics which correlate with the four criteria:

riskmetric: “a framework to quantify an R package’s”risk of use” by assessing a number of meaningful metrics designed to evaluate package development best practices, code documentation, community engagement, and development sustainability.”
riskassessment: “an R package containing a shiny front-end to augment the utility of the {riskmetric} package within an organizational context.”

In addition, the {valtools} package “helps automate the validation of R packages used in clinical research and drug development” by providing “useful templates and helper functions for tasks that arise during project set up and development of the validation framework.”

Another open-sourced R package is {thevalidatoR} which is “a GitHub action that generates a validation report for an R package.”

These packages help a user perform and document risk assessments on R packages, which aids in justifying why that individual (or organization) places trust in the results of the functionality provided by the solution.

4.2.2 Approaches Across Industry

PHUSE’s End-to-End Open-Source Collaboration Guidance references a Case Studies Repository, “which contains examples from Roche, Merck and Novartis on how they approach validation and risk mitigation.” The main takeaway, according to James Black, is of “the 4 companies that shared their process for assessing R packages, each company currently takes a very different approach.” Coline Zeballos presented on Roche’s to package validation at the R/Pharma conference, both in 2021 and 2022 (co-presenting with Doug Kelkhoff).

4.3 Discussion

Contribute to the discussion here in GitHub Discussions:
How do you document your trust in an open source solution to satisfy a third-party inquiry?