

Phenotypic families

As the second analysis step, NK cell clusters and markers were hierarchically clustered, and the result is represented as a heatmap. The heatmap allows the phenotype of each SPADE cluster to be visualized at a glance. It represents the phenotypic diversity within the whole dataset, not that of a particular sample and in no case that of a particular timepoint, nor does it indicate the abundance of each subphenotype. It is not a snapshot. Clusters sharing similar phenotypes, as measured by close proximity on the cluster dendrogram, were gathered into phenotypic families, and phenotypic families were further grouped into superfamilies. This analytical strategy, which includes two successive clusterings (SPADE followed by hierarchical clustering of the SPADE clusters), guards against inaccurate interpretations due to potential SPADE ‘over-clustering’. Indeed, clusters may actually account for different stages of activation or maturation within a cell subpopulation, whereas phenotypic families may represent actual subpopulations.
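The grouping of SPADE clusters into families and superfamilies can be sketched as agglomerative clustering of a cluster-by-marker matrix, cut once into families and once, more coarsely, into superfamilies. The minimal pure-Python sketch below assumes each row holds one SPADE cluster's marker values (for instance its five-level heatmap categories) and uses the Euclidean distance with the ward.D Lance-Williams update named in the heatmap legend. The function name `ward_d_cut`, the toy data and the rule of cutting to a fixed number of groups are hypothetical; the text only says that families were defined by close proximity on the dendrogram.

```python
import math
from itertools import combinations


def ward_d_cut(rows, k):
    """Agglomerate rows (one per SPADE cluster) until k groups remain.

    Uses Euclidean distances and the ward.D Lance-Williams update applied to
    those (unsquared) distances. Returns a 1-based group label per row, with
    groups numbered in order of their first member. Illustrative sketch only.
    """
    groups = {i: [i] for i in range(len(rows))}  # group id -> member row indices
    dist = {
        frozenset((i, j)): math.dist(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }
    next_id = len(rows)
    while len(groups) > k:
        pair = min(dist, key=dist.get)  # closest pair of current groups
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        n_i, n_j = len(groups[i]), len(groups[j])
        for m in groups:
            if m in (i, j):
                continue
            n_m = len(groups[m])
            d_mi = dist.pop(frozenset((m, i)))
            d_mj = dist.pop(frozenset((m, j)))
            # Lance-Williams recurrence for Ward linkage
            dist[frozenset((m, next_id))] = (
                (n_i + n_m) * d_mi + (n_j + n_m) * d_mj - n_m * d_ij
            ) / (n_i + n_j + n_m)
        groups[next_id] = groups.pop(i) + groups.pop(j)
        next_id += 1
    labels = [0] * len(rows)
    for label, members in enumerate(sorted(groups.values(), key=min), start=1):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical marker values for six SPADE clusters.
rows = [[1, 1], [1, 2], [4, 4], [4, 5], [5, 1], [5, 2]]
families = ward_d_cut(rows, 3)       # expected: [1, 1, 2, 2, 3, 3]
superfamilies = ward_d_cut(rows, 2)  # expected: [1, 1, 2, 2, 2, 2]
```

Because merges happen in a fixed order, the coarser cut only ever joins whole families, so each family falls inside exactly one superfamily. The name ward.D matches a method of R's `hclust`, which may be what was used; this sketch reimplements only its Lance-Williams recurrence and does not reproduce merge heights or tie handling.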

Heatmap interactivity

The web format favors interactivity. On mouseover of any cluster in the heatmap, three complementary images pop up. A cluster ID card summarizes the cluster's phenotype, pedigree and dynamics. A phenotypic viewer shows the range of expression of each marker within the dataset, which helps in understanding and interpreting the five color-coded categories of the heatmap. For the given cluster, it also shows the expression intensity of every marker in every sample, so that inter-individual variability and clustering quality can be assessed. Above all, a kinetic viewer shows the abundance profile over time of the phenotypic family to which the cluster of interest belongs, linking phenotype to dynamics.


Heatmap. For all SPADE clusters, the mean of the median of the mean signal intensity of each marker across all samples was displayed according to five phenotypic bins, obtained by dividing the marker's range of expression between the 5th and 95th percentiles into five categories. For each cluster, samples contributing fewer than 10 cells were excluded. Hierarchical clustering of cell clusters and of markers was performed using the Euclidean distance and ward.D linkage. Ten phenotypic families were identified; they were arbitrarily colored and numbered with Arabic numerals. Groups of phenotypic families defining superfamilies were framed with bold lines and labeled with capital letters.
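The per-cluster summary and the five-bin categorization described in the heatmap legend can be sketched in a few lines of Python. This is an illustration under assumptions the text does not state: the per-sample value is taken as the median of per-cell intensities, percentiles use linear interpolation, the reference distribution for a marker is passed in as a plain list, and values outside the 5th-95th percentile range are clamped into the lowest or highest category. All names are hypothetical.

```python
import math
from statistics import mean, median

MIN_CELLS = 10  # samples contributing fewer cells to a cluster are excluded


def percentile(values, p):
    """Linear-interpolation percentile of values, with p in [0, 100]."""
    xs = sorted(values)
    h = (len(xs) - 1) * p / 100
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def cluster_summary(cells_by_sample):
    """Mean across samples of each sample's median intensity for one marker.

    cells_by_sample maps a sample id to that sample's per-cell intensities
    in the cluster (assumed input layout). Small samples are skipped.
    """
    medians = [median(c) for c in cells_by_sample.values() if len(c) >= MIN_CELLS]
    return mean(medians) if medians else None


def phenotypic_bin(value, reference, n_bins=5):
    """Map a cluster summary to categories 1..n_bins over the 5th-95th percentile range."""
    p5, p95 = percentile(reference, 5), percentile(reference, 95)
    if p95 == p5:
        return 1  # degenerate range: everything falls in one category
    clamped = min(max(value, p5), p95)  # assumption: clamp out-of-range values
    return min(int((clamped - p5) / (p95 - p5) * n_bins), n_bins - 1) + 1


# Hypothetical marker intensities across the dataset, and one cluster's samples.
reference = list(range(101))
summary = cluster_summary({"s1": list(range(10)), "s2": list(range(10, 20)), "s3": [100] * 5})
category = phenotypic_bin(summary, reference)
```

Here sample `s3` is dropped for contributing only 5 cells, the two remaining medians (4.5 and 14.5) average to 9.5, and 9.5 lies in the first fifth of the 5-95 range, so the cluster falls in category 1.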

Cluster ID. For each cluster, its phenotypic family (1 to 10) and superfamily (A to C), its annotation and its detailed annotation are summarized from the heatmap and linked to the kinetic analysis. The card also indicates whether the cluster's phenotypic family was retained by LASSO and LDA as necessary to discriminate the responses to the first and second immunizations (prime, boost, or not included). The main features that differed between the post-prime and post-boost immune responses were identified using a Least Absolute Shrinkage and Selection Operator (LASSO) approach, which allowed us to statistically select the phenotypic families that best characterized the post-prime or post-boost immune response. Linear discriminant analysis (LDA) then allowed us to score their contribution to the post-prime and post-boost categories and provided a statistical criterion for classifying the selected phenotypic families into the post-prime or post-boost signature.
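The LASSO selection step can be illustrated with cyclic coordinate descent and soft-thresholding. This is a minimal sketch under simplifying assumptions: the response is coded 0 (post-prime) or 1 (post-boost) and fitted with a squared-error LASSO on centred, unstandardized columns, whereas the text does not state which LASSO variant (a logistic one, for example) or which penalty `lam` was used. Rows are hypothetical samples, columns hypothetical phenotypic-family abundances, and the LDA scoring step is not reproduced.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Squared-error LASSO fitted by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 after centring
    X and y (so no intercept term is needed). Returns the coefficients.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[X[i][j] - col_means[j] for i in range(n)] for j in range(p)]
    resid = [v - y_mean for v in y]  # residuals for b = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # a constant column carries no information
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * beta[j]
            new_bj = soft_threshold(rho, lam) / norm
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
            beta[j] = new_bj
    return beta


# Hypothetical abundances of two phenotypic families in six samples.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.0], [3.1, 5.0], [2.9, 3.0]]
y = [0, 0, 0, 1, 1, 1]  # 0 = post-prime, 1 = post-boost
beta = lasso_cd(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

Families whose coefficient survives the penalty are the "selected" ones: family 0, whose abundance rises after the boost, is kept, while family 1, which does not differ between groups, is shrunk to exactly zero. In practice the penalty is usually chosen by cross-validation.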

Phenotypic viewer. The cell cluster phenotype is visualized using parallel coordinates: the X-axis represents the cell markers and the Y-axis their expression levels. Each line corresponds to a biological sample, and samples from the 5 animals collected at the same timepoint share the same color. For each marker, the grey ribbon displays the range of expression between the 5th and 95th percentiles, which was divided into the 5 categories used in the heatmap. The dotted line represents, for each marker, the mean of the median of the mean signal intensity across all samples, which is used to assign the categories in the heatmap.

Heatmap compared to SPADE tree

Phenotypic families defined by the categorical heatmap and the annotated branches of the SPADE tree may not match. The heatmap is built on categories of marker expression, whereas the SPADE tree is built on marker expression intensities. The heatmap results from hierarchical clustering represented as a dendrogram, whereas the SPADE algorithm links cell populations with similar phenotypes using a minimum spanning tree approach. Distances between cell populations are meaningful in a dendrogram representation but not in a SPADE tree representation. Finally, the heatmap was generated from the expression of all 32 markers (20 SPADE clustering markers and 12 cytokines), whereas the SPADE tree representations were generated from the expression of 25 SPADE clustering markers.
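To contrast with the dendrogram, the minimum spanning tree idea that SPADE relies on can be sketched with Prim's algorithm over hypothetical cluster median vectors. This is a generic MST, not SPADE's own implementation, which also performs density-dependent downsampling and draws the tree with its own layout. The output is only a list of which clusters are linked; unlike the merge heights of a dendrogram, it carries no distance scale, and how long the edges are drawn on screen is a separate layout choice.

```python
import math


def prim_mst(points):
    """Return the edges (parent, child) of a minimum spanning tree over points.

    Edge weights are Euclidean distances; the tree is grown from point 0.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # attach the outside point with the cheapest connecting edge
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


# Hypothetical median marker vectors of four SPADE clusters.
medians = [[0, 0], [1, 0], [5, 0], [6, 0]]
edges = prim_mst(medians)
```

For these four clusters the tree is the chain 0-1-2-3: clusters 1 and 2 are linked even though they are four times farther apart than the other pairs, and nothing in the edge list records that difference.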