Theory | Gennady Gorin, Ph.D.

In brief: I develop, solve, and apply models that represent variability due to sequencing as well as biology, and use these models to understand the limitations of typical workflows.

Single-cell workflows have been evolving at a tremendous pace, yet data interpretation has been lagging behind in important ways. Conventional analyses take inspiration from computer science, using methods from signal processing and graph theory. But gene expression data violate many of these methods’ assumptions: they are high-dimensional, discrete, and sparse. Even if the current methods work acceptably some of the time, I strive to characterize their limits and failure modes to understand how to build on them.

The observed experimental readouts combine biological variability due to cell type heterogeneity, transcriptional bursting, and single-molecule stochasticity with technical variability due to imperfect sequencing, batch effects, and many as-yet-uncharacterized sources of variation. To appropriately attribute and exploit differences in expression, we need to understand the contributions of biological and technical effects.
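As a rough illustration of how the two kinds of variability combine, the sketch below simulates one gene under simplifying assumptions that are not taken from this page: bursty transcription at steady state is modeled as a negative binomial distribution, and sequencing as independent capture of each molecule with a fixed probability. The function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_counts(burst_size, burst_freq, capture_prob, n_cells, rng):
    """Draw true and observed RNA counts for one gene across cells.

    Assumed toy model: geometric bursts with mean `burst_size`, arriving at
    rate `burst_freq` (in units of the degradation rate), followed by
    binomial capture with probability `capture_prob`.
    """
    # Biological variability: the steady-state bursty model is negative
    # binomial with shape burst_freq and mean burst_size * burst_freq.
    true_counts = rng.negative_binomial(
        burst_freq, 1.0 / (1.0 + burst_size), size=n_cells
    )
    # Technical variability: each molecule is captured independently.
    observed = rng.binomial(true_counts, capture_prob)
    return true_counts, observed

def fano(counts):
    """Variance-to-mean ratio; equal to 1 for Poisson-distributed counts."""
    return counts.var() / counts.mean()

rng = np.random.default_rng(0)
true_counts, observed = simulate_counts(
    burst_size=10.0, burst_freq=2.0, capture_prob=0.1, n_cells=200_000, rng=rng
)
print("true:     mean", true_counts.mean(), "Fano", fano(true_counts))
print("observed: mean", observed.mean(), "Fano", fano(observed))
```

Under these assumptions the true counts have a Fano factor of 1 + burst_size, while the observed counts have 1 + burst_size × capture_prob. Imperfect capture therefore makes a bursty gene look much closer to Poisson, and the biological and technical parameters cannot be separated from a single summary statistic.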

My Ph.D. work focused on developing a suite of interpretable, tractable models that combine the mechanisms of single-cell biology with the technology of single-cell RNA sequencing in a common stochastic framework. This approach brings the foundations of sequencing in line with theory developed for fluorescence transcriptomics, and confers two advantages.

First, the mechanistic approach uses more of the data and provides more information than the descriptive approach. For example, instead of talking about differences in average gene expression, we can directly attribute those differences to changes in specific transcriptional parameters. To that end, I have implemented the Monod software package and used it to characterize subtle differences in biophysical processes which would otherwise be undetectable.
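To make the contrast with average-based comparisons concrete, here is a minimal sketch using method-of-moments estimates under an assumed negative binomial (bursty) model with no technical noise. It is not the Monod interface or its inference procedure, and the conditions and parameter values are hypothetical.

```python
import numpy as np

def moment_estimates(counts):
    """Method-of-moments estimates for the assumed bursty model.

    For a negative binomial steady state, Fano = 1 + burst_size and
    mean = burst_size * burst_freq.
    """
    mean = counts.mean()
    burst_size = counts.var() / mean - 1.0
    burst_freq = mean / burst_size
    return burst_size, burst_freq

rng = np.random.default_rng(1)
n_cells = 200_000

# Hypothetical conditions given as (burst_size, burst_freq). Both
# perturbations double the mean relative to baseline, from 10 to 20.
conditions = {
    "baseline": (5.0, 2.0),
    "larger bursts": (10.0, 2.0),
    "more frequent bursts": (5.0, 4.0),
}

estimates = {}
for name, (b, f) in conditions.items():
    counts = rng.negative_binomial(f, 1.0 / (1.0 + b), size=n_cells)
    estimates[name] = moment_estimates(counts)
    b_hat, f_hat = estimates[name]
    print(f"{name}: mean={counts.mean():.2f}, "
          f"burst size={b_hat:.2f}, burst frequency={f_hat:.2f}")
```

The two perturbed conditions have the same mean, so a comparison of averages cannot tell them apart, whereas the parameter estimates separate a change in burst size from a change in burst frequency.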

Second, it provides a principled way to ask questions of standard workflows, clarify their performance, and build better tools. I have used this approach to characterize the limitations of standard methods. For example, dimensionality reduction methods make severe assumptions, leading them to discard biologically meaningful variability. RNA velocity uses the outputs of dimensionality reduction workflows to visualize its results, leading to potentially flawed conclusions.

Stochastic modeling provides a principled way to address the foundational issues of RNA velocity and dimensionality reduction.

References

2024

New and notable: Revisiting the “two cultures” through extrinsic noise

Gennady Gorin and Lior Pachter

Biophysical Journal, Jan 2024

A recent article by Grima and Esmenjaud draws attention to the unexpectedly complex effects of extrinsic noise on inference of transcriptional kinetics. We contrast the authors’ mechanistic approach with the descriptive, data-scientific methods used in single-cell RNA sequencing, and discuss broader philosophical connections to Leo Breiman’s "two cultures" of statistics.

2023

Length biases in single-cell RNA sequencing of pre-mRNA

Gennady Gorin and Lior Pachter

Biophysical Reports, Mar 2023

Single-cell RNA sequencing data can be modeled using Markov chains to yield genome-wide insights into transcriptional physics. However, quantitative inference with such data requires careful assessment of noise sources. We find that long pre-mRNA transcripts are over-represented in sequencing data. To explain this trend, we propose a length-based model of capture bias, which may produce false-positive observations. We solve this model and use it to find concordant parameter trends as well as systematic, mechanistically interpretable technical and biological differences in paired data sets.

Assessing Markovian and Delay Models for Single-Nucleus RNA Sequencing

Gennady Gorin, Shawn Yoshida, and Lior Pachter

Bulletin of Mathematical Biology, Oct 2023

The serial nature of reactions involved in the RNA life-cycle motivates the incorporation of delays in models of transcriptional dynamics. The models couple a transcriptional process to a fairly general set of delayed monomolecular reactions with no feedback. We provide numerical strategies for calculating the RNA copy number distributions induced by these models, and solve several systems with splicing, degradation, and catalysis. An analysis of single-cell and single-nucleus RNA sequencing data using these models reveals that the kinetics of nuclear export do not appear to require invocation of a non-Markovian waiting time.

Studying stochastic systems biology of the cell with single-cell genomics data

Gennady Gorin, John J. Vastola, and Lior Pachter

Cell Systems, Oct 2023

Recent experimental developments in genome-wide RNA quantification hold considerable promise for systems biology. However, rigorously probing the biology of living cells requires a unified mathematical framework that accounts for single-molecule biological stochasticity in the context of technical variation associated with genomics assays. We review models for a variety of RNA transcription processes, as well as the encapsulation and library construction steps of microfluidics-based single-cell RNA sequencing, and present a framework to integrate these phenomena by the manipulation of generating functions. Finally, we use simulated scenarios and biological data to illustrate the implications and applications of the approach.

2022

RNA velocity unraveled

Gennady Gorin, Meichen Fang, Tara Chari, and Lior Pachter

PLOS Computational Biology, Sep 2022

We perform a thorough analysis of RNA velocity methods, with a view towards understanding the suitability of the various assumptions underlying popular implementations. In addition to providing a self-contained exposition of the underlying mathematics, we undertake simulations and perform controlled experiments on biological datasets to assess workflow sensitivity to parameter choices and underlying biology. Finally, we argue for a more rigorous approach to RNA velocity, and present a framework for Markovian analysis that points to directions for improvement and mitigation of current problems.