Generalized Model Aggregation

 

Generalized Model Aggregation (GMA)

A five-year joint project between the MIT Sloan School of Management and the School of Industrial & Systems Engineering at Georgia Tech

Rapid growth in scientific output requires flexible and robust methods for aggregating the findings of prior studies. In most cases, 'qualitative' reviews are conducted to take stock of what is known, but they offer little 'quantitative' guidance. Moreover, the common quantitative aggregation methods (i.e., meta-analysis methods) usually combine the effect of one explanatory variable (e.g., a treatment) on one response variable (e.g., a health outcome) across multiple studies with similar designs.

Despite these limitations, the rapid growth of the scientific literature has driven increasing use of meta-analysis. Counting only articles with the term “meta-analysis” in the title, the number of such articles in major databases has grown more than 25-fold over the last decade, reaching tens of thousands annually. The value of a broader and more flexible method for synthesizing prior research can therefore be immense across disciplines.

Our recent paper in PLOS ONE outlines a new method (generalized model aggregation, or GMA; see it on Wikipedia!) for aggregating the results of prior studies of a phenomenon into a meta-model, even when those studies vary in design and in the measures used. In the paper, we provide several numerical and empirical examples demonstrating the ability of GMA to aggregate evidence from methodologically diverse studies and to obtain unbiased estimates from potentially mis-specified studies. By enabling more complex meta-analyses, GMA allows researchers to leverage previous findings to compare alternative theories and advance new models in diverse domains.

 

Why GMA

  • Growing Need: Given the growing volume of research globally, there is a significant need for methods for combining, contrasting, and building on others’ research, a need reflected in the exponential growth of meta-analysis papers over the past decade. For example, according to Web of Science, the number of 2015 articles with “Meta-Analysis” in their title exceeds the total number of articles in many important domains such as virology, thermodynamics, marine and freshwater biology, geochemistry and geophysics, parasitology, developmental biology, oceanography, or evolutionary biology. Generalized model aggregation offers a major contribution to such aggregation of prior studies. It relaxes several restrictive requirements of current meta-analysis methods, which can only combine studies with similar designs; it allows for estimating statistically consistent parameters from potentially biased prior studies; and it enables the estimation of some of the biases in prior measurement methods. We therefore think GMA can address an important unmet need shared by various research communities.
     
  • Breadth of Applications: We present a wide range of examples, from health to climate change and environmental sciences, that can benefit from GMA.
     
  • Relevance to Reproducibility: Building confidence in research results and ensuring reproducibility are growing concerns in the scientific community. In conducting GMA, one reproduces (the analysis of) prior studies in order to combine them, and can identify outliers and tease out the reasons for discrepancies among those studies, thus contributing to this important area of research.
     
  • Accessibility: The mathematical and conceptual underpinnings of GMA are simple enough that a broad research audience can appreciate the idea and see its potential for their research areas.
     
  • Theoretical, Empirical and Numerical Validation: We provide analytical proofs for various properties of the method, as well as numerical examples that show the breadth of situations in which GMA can be used and its potential for application in various research domains. Finally, we present an empirical validation that demonstrates the strength of the method in an actual research application, and we validate the results using an independent dataset. In the paper, we focus on the key ideas and results, relegating the mathematical treatment, proofs, and documentation to the online supplement.

 

The Method

The intuition behind GMA is simple: "Prior studies provide statistical estimates (denoted as signatures, e.g., regression coefficients, correlation matrices, and variance of effect sizes across prior studies) that, even if biased and incomplete, include relevant information about the phenomenon of interest, i.e., the data generating process. A meta-model corresponds well to the real data generating process if the same statistical operations that generated the empirical signatures of prior studies lead to similar signatures when applied to simulated data from the meta-model. Thus, by matching the simulated signatures from a meta-model against the empirical signatures of prior studies, we can estimate the parameters of the meta-model. The resulting meta-model aggregates prior research by embedding into a single model the quantitative information from all prior studies and the variations across them." The figure below presents an overview of the method; see the paper for more details.

Overview of GMA (Download the high-resolution version)
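
To make the signature-matching idea concrete, here is a minimal toy sketch in Python. It is not the GMA codes linked on this page (those are in MATLAB), and everything in it is an assumption made for illustration: the two hypothetical prior studies, the linear meta-model y = b1*x1 + b2*x2 + e, the known correlation rho between x1 and x2, the unweighted squared distance, and the simple compass search used as the optimizer. Study A is assumed to have regressed y on x1 only and study B on x2 only, so each reported slope is biased by the omitted variable, yet matching both signatures at once can still recover b1 and b2.

```python
import random

def make_draws(n, rho, seed=0):
    """Draw fixed random numbers once so every candidate sees the same noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z1, z2, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = z1
        x2 = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # corr(x1, x2) is about rho
        draws.append((x1, x2, e))
    return draws

def ols_slope(xs, ys):
    """Slope of a simple regression of ys on xs (with intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulated_signatures(params, draws):
    """Apply each (hypothetical) prior study's analysis to simulated data."""
    b1, b2 = params
    x1 = [d[0] for d in draws]
    x2 = [d[1] for d in draws]
    y = [b1 * d[0] + b2 * d[1] + d[2] for d in draws]  # toy meta-model
    return [ols_slope(x1, y),   # study A: y regressed on x1 only
            ols_slope(x2, y)]   # study B: y regressed on x2 only

def distance(params, empirical, draws):
    """Unweighted squared gap between simulated and empirical signatures."""
    sim = simulated_signatures(params, draws)
    return sum((s - e) ** 2 for s, e in zip(sim, empirical))

def fit(empirical, draws, start=(0.0, 0.0), step=1.0, tol=1e-4):
    """Compass search: try +/- step on each parameter, halve step when stuck."""
    best = list(start)
    best_val = distance(best, empirical, draws)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] += delta
                val = distance(cand, empirical, draws)
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            step /= 2
    return best

# Hypothetical reported slopes; they equal b1 + rho*b2 and b2 + rho*b1
# for b1 = 1.0, b2 = 0.5, rho = 0.6 in the population.
empirical = [1.3, 1.1]
draws = make_draws(1000, rho=0.6)
estimate = fit(empirical, draws)
print(estimate)
```

Under these assumptions the estimate should land near (1.0, 0.5), up to simulation noise, even though neither study's slope equals either parameter. Reusing the same random draws for every candidate keeps the objective smooth in the parameters, which is what lets a simple derivative-free search work here.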

 

  Upcoming seminars

May 5, 2 to 3:30 pm, Harvard School of Public Health (Building 2, Room 426), by Dr. Jalali

Seminar webpage (no registration required)

 

 Media Coverage

MIT Sloan Professor Builds New Meta-analysis Method to Help Settle Unresolved Debates, April 2017

New Meta-Analysis Method To Help Settle Unresolved Debates, April 2017

Why Meta-Data Analysis Tools Like This one are key for Industry 4.0, April 2017

More news pieces to come soon...

 

 Paper

See/download the paper in PLOS ONE
Download the supplementary text
See below for GMA codes and instructions.

 

 Recipe for GMA

GMA Codes (MATLAB version)
Codes README

Here we provide a quick recipe of the main steps for using GMA. See the supplementary document for the details of each step. The Codes README file provides more information about the code files listed in each step, as well as all other code files, including their actions, inputs, and outputs.

Step 1: Choose signatures and replicate prior models

  • Code file:       
    User_model_(l).m
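
As a rough illustration of Step 1, the Python sketch below shows what "replicating a prior model" could look like for one hypothetical study. It is not a translation of User_model_(l).m; the function names, the row layout, and the choice of signatures (intercept, slope, and Pearson correlation from a simple regression of y on x) are all assumptions. The point is only that the function takes a dataset and returns the same statistics the prior study reported, so it can later be applied to simulated data.

```python
import math

def replicate_study_1(rows):
    """Hypothetical prior study: simple regression of y on x.

    rows is a list of (x, y) pairs; returns [intercept, slope, pearson_r],
    i.e. the signature vector this imagined study reported.
    """
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    pearson_r = sxy / math.sqrt(sxx * syy)
    return [intercept, slope, pearson_r]

# Tiny check on data that lie exactly on the line y = 1 + 2x.
signature = replicate_study_1([(0, 1), (1, 3), (2, 5), (3, 7)])
print(signature)
```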

Step 2: Generate simulated data for explanatory and response variables

  • Code files:
    User_DataGeneration.m and User_SIGMA.m for the explanatory variables
    User_Meta_model.m for the response variable
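
One common way to carry out a step like this, sketched below in Python, is to draw the explanatory variables from a multivariate normal with a chosen covariance matrix and then pass them through the meta-model to obtain the response. This is an assumption-laden sketch, not the content of the MATLAB files above: the zero-mean normal distribution, the Cholesky-based sampler, the linear form of the meta-model, and all function names are hypothetical.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L * L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def generate_explanatory(sigma, n, rng):
    """Draw n zero-mean normal vectors whose covariance is approximately sigma."""
    L = cholesky(sigma)
    k = len(sigma)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(k)]
        rows.append([sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(k)])
    return rows

def meta_model(x, beta, noise_sd, rng):
    """Hypothetical linear meta-model: response = beta . x + normal noise."""
    return sum(b * v for b, v in zip(beta, x)) + rng.gauss(0, noise_sd)

rng = random.Random(1)
sigma = [[4.0, 2.0], [2.0, 3.0]]          # assumed covariance of (x1, x2)
X = generate_explanatory(sigma, 5000, rng)
y = [meta_model(x, beta=[1.0, 0.5], noise_sd=1.0, rng=rng) for x in X]
```

In a real application the distribution of the explanatory variables and the functional form of the meta-model would come from the prior studies and the theory being tested, not from the placeholders used here.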

Step 3: Initiate the GMA and optimization solver

  • Code files: 
    User_GeneralInputs.m
    User_OptInitiation.m

Step 4: Weighting matrix, optimization and iteration

  • Code files:
    GMA_W_star.m
    GMA_Optimization.m
    GMA_ObjFn.m
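
For readers unfamiliar with weighted matching objectives, the Python sketch below shows one generic form: a quadratic distance d' W d between simulated and empirical signatures, with W built from the inverse variances of the reported estimates. Whether the GMA codes use exactly this form is not stated on this page, so treat both the quadratic form and the diagonal inverse-variance weights as assumptions; the supplementary document describes the actual weighting and iteration scheme. All names and numbers here are hypothetical.

```python
def inverse_variance_weights(std_errors):
    """Diagonal weighting matrix: precise signatures get larger weights."""
    n = len(std_errors)
    return [[1.0 / std_errors[i] ** 2 if i == j else 0.0 for j in range(n)]
            for i in range(n)]

def weighted_distance(simulated, empirical, W):
    """Quadratic form d' W d, where d = simulated - empirical signatures."""
    d = [s - e for s, e in zip(simulated, empirical)]
    n = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(n) for j in range(n))

# Hypothetical numbers: two signatures with reported standard errors.
W = inverse_variance_weights([0.1, 0.2])
J = weighted_distance([1.2, 0.9], [1.0, 1.0], W)
print(J)
```

With these weights, a gap of 0.2 on the first signature (standard error 0.1) contributes far more to the objective than a gap of 0.1 on the second (standard error 0.2), so the optimizer is pushed to match the more precisely estimated signature first.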

 

Cite these materials as:

Rahmandad H, Jalali MS, Paynabar K (2017) A flexible method for aggregation of prior statistical findings. PLOS ONE 12(4): e0175111. doi: 10.1371/journal.pone.0175111

Download the citation here