6 Regulatory Acceptance – Open Source Technology in Clinical Data Analysis

6.1 Will the regulatory agencies accept data and analyses generated with solutions developed and available as open source?

What do we know regarding data submissions to FDA?
What do we know regarding data submissions to other regulatory agencies?
Are there technical considerations for the creation of submission data packages?

6.2 (Perceived) Fear

There has been a long debate about whether regulatory agencies would accept submissions using R and Open Source tools. This is in spite of communications from agency staff for over 15 years refuting this and stating that the agency would not and could not endorse any specific software tool for sponsor submissions (Bell 2006; Soukup 2007; Drug Administration (FDA) 2015; Schuette 2018). Yet still there has been reticence from pharmaceutical industry sponsors around using R for clinical trials reporting and analysis of key endpoints. Meanwhile clinical pharmacology and pharmacometrics groups within many pharma companies have been using R for over 10 years in regulatory submissions and interactions, preparing graphics for presentation, model diagnostics and data summaries. So while some business areas within the industry are comfortable using R and other open source tools, other parts are more nervous about committing fully to an open source software solution.

What is driving this nervousness then? If the regulatory agencies are telling us that they will accept submissions with R then what is stopping it from being more widely used within and across pharmaceutical companies?

Perhaps it is down to fear of having a dossier rejected by a regulatory agency - “Refusal to file” - due to application deficiencies. In this context, could the use of open source software trigger this “Refusal to file”? Since “Refusal to file” could have a significant impact on a company’s reputation, confidence and ultimately their stock price, it is understandable that companies are cagey. Also, who would want to be first to try out this untrodden path? It is easier to follow in the footsteps of somebody else than to forge a path yourself.

6.3 (Reducing) Uncertainty

“…there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know.”

– Donald Rumsfeld

Fortunately we now have a growing number of examples of companies who are on this journey towards using more open source software in drug development and, critically, in regulatory interactions and submission (Knoph 2023; Neitman 2023; Bowsher, n.d.). There is also the R Consortium Submissions Working Group (“R Consortium Submissions Working Group” 2021) - a cross industry pharma working group focusing on improving practices of R-based clinical trial regulatory submissions. This group has open dialogue with colleagues at the US FDA regulatory agency around technical issues of filing submissions using R deliverables and pilots these using open source examples in order to check and verify that submissions via the FDA electronic submissions process are received and can be validated within the agency. This eliminates some elements of fear and uncertainty from sponsors around the technical aspects, allows regulatory agency colleagues to provide feedback with the working group members in an environment which is not high-pressure or business critical to a sponsor. It is in the interest of both regulators and industry to modernise, to acknowledge a shift in tools that are being used to conduct clinical development of new therapeutics and be ready for whatever future state emerges. To be able to do this with increased confidence and mutual understanding is vital to this modernisation.

The Novo Nordisk experience (Knoph 2023) provides critical learnings. In their filing to US FDA, PMDA and other health authorities, they used a combination of SAS and R to prepare submission datasets and deliverables. They described their process as being an evolution, rather than revolution, using R to replicate tables and figures usually generated via SAS. Key also to the process was building an infrastructure and environment around R that would meet expectations pertaining to quality and reproducibility. This, more so than the replication of SAS output provides the backbone of “validated” work in R. They communicated in advance of submission with the regulatory authorities, which allowed both parties (sponsor and authority) to set expectations and “get ready” in good time. This did not prevent “Information Requests” from the FDA for clarification - causing anxious moments for the sponsor - but in the end this was about information sharing, reaching mutual understanding, for clarification and ensuring that the FDA internal environment would match Novo Nordisk’s rather than a refusal to accept open source software approaches. The results of their work is giving Novo Nordisk and the pharmaceutical industry generally, more confidence to move forward with R and open source tools as the core of their submission work.

Similarly, Roche (Neitman 2023) is spending a great deal of effort in preparation - organisationally and technologically - for submission to regulatory agencies. The common thread across both Novo Nordisk and Roche / Genentech is about preparation and dialogue within the company (with regulatory, quality assurance and process groups) and with regulatory agencies in the lead up to submission. Proper preparation prevents poor performance, as the aphorism goes.

6.4 (Avoiding) Doubt

As has been mentioned, the Submissions Working Group of the R Consortium is actively partnering with US FDA to pilot submissions through the agency’s electronic submissions portal. These pilot submissions are created and presented on Github as open source code, so it is possible for anyone to review what was submitted, how, and the agency’s feedback and report on which elements were successful and any issues found with the submission. The pilots are vital for sponsors to understand the mechanisms of submission using open source software like R, and for regulatory agencies to understand how to reconstitute sponsor software environments, install packages and confirm results. This dialogue reduces the likelihood of any fundamental issues with submission using open source software, but does not completely eliminate sources of potential problems. In a sense, the pilots uncover known unknowns - the things that we expect to cause issues - so that they can be discussed and addressed on both sides. The sponsor needs to be sure to provide sufficient information that the agency has the best chance of recreating the environment for reproducing the results. Through the pilot process then, both parties can understand what needs to be communicated at the time of submission to reduce unexpected findings and business critical issues that need to resolved quickly.

The pilots are also allowing sponsors and agencies to look at modernising the content of the submission, moving from static tables, listings and figures to more dynamic, interactive presentations through web applications and dynamic HTML presentations. This goes beyond the “evolution” described by Novo Nordisk towards a true “revolution” in what sponsors submit to agencies and how they review the contents of those submissions.

Yet still some doubts persist. If a sponsor submits analytical results using Tool X, but the agency reviewer re-analyses the data with Tool Y and sees a different result, who owns reconciling the differences? The sponsor, via a time-bound Information Request from the agency? The agency? The developer of Tool X? When the majority of analysis uses only one tool for analysis, this situation is likely to be a rare occurence. But when the toolset available to both sponsors and agencies becomes wider, then resolving these questions is likely to come up more often.

And what about agencies who typically do not re-analyse sponsor data? How can the industry provide reassurance and proof that analytical environments are validated: accurate, reproducible and traceable? When installation of software into an environment is a one-step process, we can be pretty sure of consistency across analysts. But if there are multiple steps involved, and the software can change almost daily, then how do we ensure this consistency and reproducibility? This is not just the job of IT organisations within sponsor companies, but the responsibility of the individual analyst, to ensure that they are working compliant with organisational processes and in validated environments. Audit trails and traceability helps ensure this, but it is a potential source of doubt for both sponsor and regulatory authority.

6.5 Industry Experiences

The industry offers a diverse range of experiences that can significantly enhance your professional growth. Below is an overview of various industry experiences gathered so far:

Sponsor	Agency	Experiences
RConsortium Pilot Info	FDA	Submitted Accepted
Novo Nordisk Presentation	FDA	Submitted Big data ADaM programmed in R and pre-announced this during Type C and B meeting packages Remaining ADaM in SAS and double programmed in R (where applicable) All TFL in R Following requests from FDA all TFL programs and internal R packages submitted. Packages packed using pkglite. Additional guidance and informal meeting with FDA statisticians on restoring R environment was held
	EMA	Submitted content using R
	NMPA	Same as PMDA
	PMDA	Submitted Big data ADaM programmed in R and pre-announced during additional eSubmission meeting Remaining ADaM in SAS and double programmed in R (where applicable) All TFL in R programs submitted
	Health Canada	Submitted content using R
Roche Presentation	FDA	Submitted content using R Initial pushback from the FDA reviewer to accept R, they wanted SAS instead. Roche pushed back and submitted in R (based on Technical Conformance Guidance) Key message - acceptance of R may vary between different agency reviewers Key message - have the content & format discussions early with the agency
	EMA	Submitted content using R (no comments received so far)
	NMPA	Submitted content using R (no comments received so far)
SwissMedic
Merck	FDA	Submitted Multiple successful submissions GxP platform available Reproducible environment
Astellas Pharma	FDA	(2021) for using R markdown to submit programs Example

6.6 Discussion

Contribute to the discussion here in GitHub Discussions:
Will the FDA accept data and analyses generated with solutions developed and available as open source?

6.7 References

Bell, B “Issues with Open Source Statistical Software in Industry: Validation, Legal Issues, and Regulatory Requirements” ASA JSM 2006.

Sukop, M “Using R: Perspectives of a FDA Statistical RevieweR”. UseR 2007 https://www.r-project.org/conferences/useR-2007/program/presentations/soukup.pdf

U.S. Food & Drug Administration. (2015, May 6).Statistical Software Clarifing Statement. Retrieved from FDA.gov: https://www.fda.gov/media/109552/download