SPADE parameters and quality control

SPADE parameters were benchmarked, and the optimal user-defined settings for this particular dataset were selected. The distribution of every marker within each SPADE cluster can be displayed as a histogram to check the clustering quality.

SPADE parameters

A random pre-downsampling step was first used to select 60,000 cells from each sample (60,000 being the number of cells in the smallest sample). The SPADE algorithm itself was then applied to all samples to define the phenotype of each cluster and the topology of the tree. Finally, full upsampling was performed.
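The pre-downsampling and upsampling steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SPADE's own implementation: cells are assumed to be tuples of transformed marker intensities, the function names `pre_downsample` and `upsample_to_clusters` are hypothetical, and upsampling is assumed to assign each cell to the cluster of its nearest clustered cell (Euclidean distance).

```python
import math
import random


def pre_downsample(samples, seed=0):
    """Randomly draw the same number of cells from every sample.

    `samples` maps a sample name to a list of cells (tuples of marker
    intensities). The common size is that of the smallest sample.
    """
    rng = random.Random(seed)
    n = min(len(cells) for cells in samples.values())
    # Sampling without replacement, independently for each sample.
    return {name: rng.sample(cells, n) for name, cells in samples.items()}, n


def upsample_to_clusters(all_cells, clustered_cells, labels):
    """Assign every cell to the cluster of its nearest clustered cell.

    Assumption: nearest neighbour by Euclidean distance, brute force.
    `labels[i]` is the cluster of `clustered_cells[i]`.
    """
    assigned = []
    for cell in all_cells:
        nearest = min(range(len(clustered_cells)),
                      key=lambda i: math.dist(cell, clustered_cells[i]))
        assigned.append(labels[nearest])
    return assigned
```

In the analysis described here, `n` would correspond to the 60,000 cells of the smallest sample.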

For our dataset, the optimal SPADE settings, determined with the SPADEVizR package, were: 20 clustering markers (CD66, HLA-DR, CD3, CD64, CD8, CD123, CD11a, CD11b, CD4, CD23, CD86, CD32, CXCR4, CCR5, CD16, CD11c, CD14, CD45, CD20 and CCR7), 600 clusters, a density-based downsampling of 10%, and an outlier density parameter of 0.01.
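The density-based downsampling controlled by the last two parameters can be illustrated with a simplified Python sketch. It is an assumption-laden reimplementation, not SPADE's code: local density is a brute-force count of cells within a hypothetical `radius`, the outlier density of 0.01 is interpreted as discarding the sparsest 1% of cells, and the 10% downsampling is interpreted as keeping about 10% of the cells. Each remaining cell is kept with probability min(1, T/density), with the target density T found by bisection.

```python
import math
import random


def local_density(cells, radius):
    """Count the cells (including the cell itself) within `radius` of each cell.

    Brute-force O(n^2) neighbour search: only suitable for a small sketch.
    """
    return [sum(1 for other in cells if math.dist(cell, other) <= radius)
            for cell in cells]


def target_density(densities, n_target, iterations=100):
    """Bisect for T such that sum(min(1, T / d)) is approximately n_target."""
    def expected_kept(t):
        return sum(min(1.0, t / d) for d in densities)

    lo, hi = 0.0, float(max(densities))
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if expected_kept(mid) >= n_target:
            hi = mid
        else:
            lo = mid
    return hi


def density_downsample(cells, radius, target_fraction=0.10,
                       outlier_fraction=0.01, seed=0):
    """Simplified density-dependent downsampling (hypothetical helper)."""
    rng = random.Random(seed)
    dens = local_density(cells, radius)
    # 1. Drop the sparsest `outlier_fraction` of cells as outliers.
    n_drop = int(outlier_fraction * len(cells))
    dropped = set(sorted(range(len(cells)), key=dens.__getitem__)[:n_drop])
    kept_idx = [i for i in range(len(cells)) if i not in dropped]
    # 2. Choose T so that about `target_fraction` of all cells are expected to survive.
    t = target_density([dens[i] for i in kept_idx], target_fraction * len(cells))
    # 3. Keep each cell with probability min(1, T / density): dense regions are
    #    thinned strongly, while sparse (rare) populations are mostly retained.
    return [cells[i] for i in kept_idx if rng.random() < min(1.0, t / dens[i])]
```

The point of this design is that rare populations survive downsampling, whereas a uniform 10% random draw would thin them as much as abundant ones.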


SPADE quality control

The clustering quality was expressed as the percentage of clusters displaying a unimodal and narrow distribution for all clustering markers, together with the percentage of small clusters (clusters with fewer than 50 cells in total). Marker distributions were assessed with Hartigan's dip test (p-value < 0.05 to reject the unimodality hypothesis), and distributions with an interquartile range (IQR) < 2 were considered narrow.
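The two quality metrics can be computed as sketched below. The sketch uses only the Python standard library, so Hartigan's dip test is passed in as a caller-supplied function returning a p-value (for example, a wrapper around `dip.test` from the R `diptest` package, or a Python port of it). The function names and the data layout (one dict of marker values per cluster) are hypothetical, and the thresholds default to the values stated above. Note that IQR estimates depend slightly on the quantile method; `statistics.quantiles` uses its default "exclusive" method here.

```python
import statistics


def iqr(values):
    """Interquartile range (Q3 - Q1) of a list of expression values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def is_uniform(values, dip_pvalue, alpha=0.05, max_iqr=2.0):
    """Unimodal (dip test does not reject, p >= alpha) and narrow (IQR < max_iqr)."""
    return dip_pvalue(values) >= alpha and iqr(values) < max_iqr


def clustering_quality(clusters, dip_pvalue, alpha=0.05, max_iqr=2.0, min_cells=50):
    """Return (% 'good' clusters, % small clusters).

    `clusters` is a list with one dict per cluster, mapping each clustering
    marker to the list of that marker's values for the cells in the cluster.
    """
    good = small = 0
    for cluster in clusters:
        # 'Good' cluster: every clustering marker is unimodal and narrow.
        if all(is_uniform(v, dip_pvalue, alpha, max_iqr) for v in cluster.values()):
            good += 1
        # All markers share the same cells, so any marker gives the cell count.
        n_cells = len(next(iter(cluster.values())))
        if n_cells < min_cells:
            small += 1
    n = len(clusters)
    return 100.0 * good / n, 100.0 * small / n
```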

Overall, 62.5% of the clusters (375 of 600) showed ‘good’ SPADE clustering. Imperfect clustering quality was mainly due to a bimodal distribution of CD64 in 25% of the clusters, which likely resulted from differences in staining and/or acquisition efficiency between animals, as shown with the control samples (see Figure S13 of the original scientific paper). Excluding CD64, 97% of the clusters were uniform for the expression of at least 18 of the 19 remaining clustering markers.
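The last figure counts, for each cluster, how many clustering markers other than CD64 passed both the unimodality and the narrowness criteria. A minimal sketch, assuming one dict of per-marker uniformity flags per cluster (the function name `share_uniform_excluding` and this input layout are hypothetical). As an arithmetic check, 375/600 = 0.625, and 25% of the 600 clusters corresponds to 150 clusters.

```python
def share_uniform_excluding(uniformity, excluded, min_uniform):
    """Percentage of clusters uniform for at least `min_uniform` markers.

    `uniformity` is a list with one dict per cluster, mapping each clustering
    marker to True (unimodal and narrow) or False. The marker named
    `excluded` is ignored when counting.
    """
    hits = 0
    for cluster in uniformity:
        n_uniform = sum(ok for marker, ok in cluster.items() if marker != excluded)
        if n_uniform >= min_uniform:
            hits += 1
    return 100.0 * hits / len(uniformity)


# Arithmetic check of the headline figure: 375 'good' clusters out of 600.
assert 100 * 375 / 600 == 62.5
```

With the 20 markers used here, the reported value would correspond to a call such as `share_uniform_excluding(flags, "CD64", 18)`.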