Clustering methods for single-cell RNA-sequencing expression data: performance evaluation with varying sample sizes and cell compositions

Suner Karakülah A.

STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, vol.18, no.5, 2019 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 18 Issue: 5
  • Publication Date: 2019
  • DOI Number: 10.1515/sagmb-2019-0004
  • Keywords: clustering, performance evaluation, RNA sequencing, single cell, EMBRYONIC STEM-CELLS, QUALITY-CONTROL, SEQ, HETEROGENEITY, EVOLUTION, VISUALIZATION, CRITERIA, TOOLS


A number of specialized clustering methods have been developed for the accurate analysis of single-cell RNA-sequencing (scRNA-seq) expression data, and several reports have documented their performance under different conditions. However, to date, no study has systematically evaluated the performance of these clustering methods while taking into account the sample size and cell composition of a given scRNA-seq dataset. Herein, a comprehensive performance evaluation of 11 selected scRNA-seq clustering methods was performed using synthetic datasets with known sample sizes and numbers of subpopulations, as well as varying levels of transcriptome complexity. The results indicate that the overall performance of the clustering methods under study is highly dependent on the sample size and complexity of the scRNA-seq dataset. In most cases, clustering performance improved as the number of cells in a given expression dataset increased. The findings of this study also highlight the importance of sample size for the successful detection of rare cell subpopulations with an appropriate clustering tool.
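
The kind of evaluation described in the abstract can be illustrated with a minimal sketch. The abstract names neither the performance measures nor the 11 methods, so everything here is an assumption made for illustration: the adjusted Rand index is used as a common agreement measure between known and recovered labels, `two_means` is a plain 1-D k-means stand-in rather than any of the evaluated tools, and `simulate` draws toy one-dimensional values rather than realistic scRNA-seq counts.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0
    # Count pairs of cells that share a label in both, or in each, labeling.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = pairs_truth * pairs_pred / total_pairs
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (pairs_both - expected) / (max_index - expected)


def simulate(n_cells, rare_fraction, rng):
    """Hypothetical toy data: one common and one rare subpopulation."""
    n_rare = max(1, round(n_cells * rare_fraction))
    truth = [1] * n_rare + [0] * (n_cells - n_rare)
    values = [rng.gauss(3.0 if t else 0.0, 1.0) for t in truth]
    return values, truth


def two_means(values, n_iter=50):
    """Stand-in clusterer: 1-D k-means with k=2, started at min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [int(abs(v - centers[1]) < abs(v - centers[0])) for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


# Score the same stand-in method at several (assumed) sample sizes.
rng = random.Random(0)
for n_cells in (50, 200, 1000):
    values, truth = simulate(n_cells, rare_fraction=0.05, rng=rng)
    print(n_cells, round(adjusted_rand_index(truth, two_means(values)), 3))
```

Because the true labels of synthetic data are known, each clustering result can be scored directly against them. Repeating the loop over several random seeds and rare-population fractions would give a rough picture of how a method's score varies with sample size and cell composition.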