Introduction

In this whitepaper, we outline a framework for conducting statistical analyses in different programming languages, particularly when results from different languages do not match exactly. The purpose of this paper is to support statisticians and analysts in identifying and reconciling differences in numerical results, giving them confidence in the integrity of those results even when another language produces seemingly different values.

To illustrate the framework, we draw examples from differences observed between SAS and R within selected families of analyses.

Specifically, we cover summary statistics, linear models, mixed models, survival models, and the Cochran-Mantel-Haenszel test. The table below lists the concepts addressed under each topic.

Topic             Concept Reference
Summary Stats     FIVE NUMBER SUMMARIES
                  FREQUENCY TABLES
                  GROUPED SUMMARIES
                  MISCELLANEOUS SUMMARY VALUES
Linear Models     REGRESSION
                  COMPARING MEANS
                  ANOVA MODELS
                  ANCOVA MODELS
                  MANOVA MODELS
                  USING CONTRASTS
                  TESTS OF NORMALITY
                  TESTS OF EQUAL VARIANCE
Mixed Models      MIXED MODEL ANOVA
                  REPEATED MEASURES
Survival Models   KAPLAN MEIER
                  LOG-RANK TEST
                  COX PROPORTIONAL HAZARDS MODEL
                  PARAMETRIC MODEL
                  COCHRAN-MANTEL-HAENSZEL