SPADE parameters and quality control

SPADE parameters were benchmarked, and the optimal user-defined settings for this particular dataset were selected. The distribution of every marker in each SPADE cluster can be displayed as a histogram to check the clustering quality.

SPADE parameters

A random pre-downsampling step was first used to select 50,000 cells from each sample (50,000 being the number of cells in the smallest sample). The SPADE algorithm itself was then applied to all samples to define the phenotype of each cluster as well as the topology of the tree. Finally, full upsampling was performed.
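The random pre-downsampling step can be illustrated with a minimal Python sketch. It is not the code used for this analysis: the function name `pre_downsample`, the dictionary-of-arrays input layout (one cells x markers array per sample), and the fixed seed are all assumptions made for illustration. When no target size is given, the sketch falls back to the size of the smallest sample, which mirrors the choice of 50,000 cells described above.

```python
import numpy as np

def pre_downsample(samples, n_cells=None, seed=0):
    """Randomly keep the same number of cells (rows) from every sample.

    samples : dict mapping a sample name to a (cells x markers) array
              (hypothetical layout, chosen for this sketch).
    n_cells : number of cells to keep per sample; defaults to the size
              of the smallest sample.
    """
    if n_cells is None:
        n_cells = min(events.shape[0] for events in samples.values())

    rng = np.random.default_rng(seed)
    downsampled = {}
    for name, events in samples.items():
        if events.shape[0] < n_cells:
            raise ValueError(f"sample {name!r} has fewer than {n_cells} cells")
        # Draw row indices uniformly at random, without replacement.
        idx = rng.choice(events.shape[0], size=n_cells, replace=False)
        # Sorting keeps the retained cells in their original acquisition order.
        downsampled[name] = events[np.sort(idx)]
    return downsampled
```

With this sketch, every sample contributes the same number of cells to the subsequent clustering, so that no single large sample dominates the result.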

For our dataset, the optimal SPADE settings, determined with the SPADEVizR package, were: 25 clustering markers (CD66, HLA-DR, CD3, CD107a, CD8, CD45, granzyme B, CD56, CD62L, CD4, CD11a, CD2, CD7, NKG2D, CD11c, CD69, CD25, CD16, CCR5, CXCR4, CD14, perforin, NKG2A, CD20, and CCR7), 900 clusters, a density-based downsampling of 20%, and an outlier density parameter of 0.01.


SPADE quality control

The clustering quality was expressed as the percentage of clusters displaying a unimodal and narrow distribution for all clustering markers, as well as the percentage of small clusters (clusters with fewer than 50 cells in total). Marker distributions were assessed using Hartigan’s dip test (p-value < 0.05 to reject the unimodality hypothesis). Marker distributions with an interquartile range (IQR) < 2 were considered narrow.
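The quality criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPADEVizR implementation: the function names, the dictionary-of-arrays input (one cells x markers array per cluster), and the `dip_pvalue` argument are hypothetical. The dip test itself is not reimplemented here; `dip_pvalue` stands for any callable that returns the p-value of Hartigan's dip test for a one-dimensional array. The sketch also assumes that the small-cluster percentage is reported separately and does not enter the unimodal-and-narrow criterion.

```python
import numpy as np

def iqr(values):
    """Interquartile range (75th minus 25th percentile) of a 1-D array."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

def cluster_is_good(expr, dip_pvalue, alpha=0.05, max_iqr=2.0):
    """True if every marker (column) of the cluster is unimodal and narrow.

    expr       : (cells x markers) expression array for one cluster.
    dip_pvalue : callable returning the dip-test p-value of a 1-D array
                 (supplied by the caller; an assumption of this sketch).
    """
    for j in range(expr.shape[1]):
        marker = expr[:, j]
        if dip_pvalue(marker) < alpha:  # unimodality hypothesis rejected
            return False
        if iqr(marker) >= max_iqr:      # distribution is not narrow
            return False
    return True

def quality_summary(clusters, dip_pvalue, min_cells=50):
    """Percentage of good clusters and percentage of small clusters."""
    n = len(clusters)
    good = sum(cluster_is_good(c, dip_pvalue) for c in clusters.values())
    small = sum(c.shape[0] < min_cells for c in clusters.values())
    return {"pct_good": 100.0 * good / n, "pct_small": 100.0 * small / n}
```

One possible choice for `dip_pvalue`, assuming the third-party `diptest` package is installed, would be `lambda x: diptest.diptest(x)[1]`.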

Overall, 66.7% of the NK cell clusters (22 out of 33) showed ‘good’ SPADE clustering.