Tutorial 2: Integrating datasets measured by different platforms.

In this tutorial, we demonstrate how to apply Spatialign to integrate datasets measured by different platforms. As an example, we use three mouse olfactory bulb datasets: one slice profiled by 10x Genomics Visium and two slices obtained from Stereo-seq.

Import packages

[1]:

import scanpy as sc
from spatialign import Spatialign
from warnings import filterwarnings
from anndata import AnnData

filterwarnings("ignore")

[2]:

datasets = [
    "stereo_olfactory_bulb_ann.h5ad",
    "visium_olfactory_bulb_ann.h5ad",
    "stereo_ob_SS200000213BR_C5_bin200_ann.h5ad"
]

Initialize

Read the datasets and load the Spatialign model

[3]:

model = Spatialign(
    *datasets,
    batch_key='batch',
    is_norm_log=True,
    is_scale=False,
    n_neigh=15,
    is_undirected=True,
    latent_dims=100,
    seed=42,
    gpu=0,
    save_path="./",
    is_verbose=False
)

2023-07-20 18:18:32 Found Dataset:
  cell nums: 1123 gene nums: 20317
  cell nums: 1184 gene nums: 13956
  cell nums: 812 gene nums: 16370
2023-07-20 18:18:33: Spatial coordinates are used to calculate nearest neighbor graphs
2023-07-20 18:18:38 Method: 'init_bank' Running...
  Running time: 0.0 min 1.0062 s
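The log above reports a different gene count for each slice, which is expected when platforms differ. A quick way to see how many genes the slices have in common is to intersect their gene names. The helper below is a minimal sketch with a hypothetical name (shared_genes) and toy gene lists; it is not part of Spatialign and makes no claim about how Spatialign itself handles the gene sets.

```python
def shared_genes(gene_lists):
    """Return the sorted gene names present in every input list."""
    gene_sets = [set(genes) for genes in gene_lists]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Toy gene lists standing in for each slice's var_names (illustrative values only).
toy_gene_lists = [
    ["Gad1", "Calb2", "Pcp4", "Doc2g"],
    ["Calb2", "Pcp4", "Doc2g"],
    ["Pcp4", "Doc2g", "Slc17a7"],
]
common = shared_genes(toy_gene_lists)
print(len(common), common)
```

With the real files, the same helper could be called as shared_genes(sc.read_h5ad(path).var_names for path in datasets).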

Visualize the original datasets with UMAP

[4]:

raw_merge = AnnData.concatenate(*model.dataset.data_list)

[5]:

# PCA, neighbor graph, and UMAP on the merged data before alignment
sc.tl.pca(raw_merge, n_comps=100, random_state=42)
sc.pp.neighbors(raw_merge, random_state=42)
sc.tl.umap(raw_merge, random_state=42)

[6]:

sc.pl.umap(raw_merge, color=["batch", "celltype"])

Figure (_images/Tutorial-2-Integrate-datasets-measured-by-different-platforms_9_0.png): UMAP of the merged original datasets, colored by batch and celltype.

Training Spatialign model

Spatialign aims to align the biological effects across datasets while preserving as much of the biological variance among spots/cells as possible in the latent embedding.

It is possible to fine-tune the parameters (tau1, tau2, tau3) to optimize the model’s performance.

[7]:

model.train(tau1=0.05, tau2=1, tau3=0.1)

2023-07-20 18:20:43 Method: 'train' Running...
  2023-07-20 18:21:30 Epoch: 123 Loss: 21.9860 Loss min: 21.9455 EarlyStopping counter: 15 out of 15
  2023-07-20 18:21:30 Model Training Finished!
  2023-07-20 18:21:30 Trained checkpoint file has been saved to ./ckpt
  Running time: 0.0 min 46.6752 s
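The tuning of tau1, tau2, and tau3 mentioned above can be organized as a small grid of candidate settings. The sketch below only enumerates the combinations; the helper name tau_grid and the candidate values are assumptions chosen around the settings used in this tutorial, not recommended defaults.

```python
from itertools import product


def tau_grid(tau1_values, tau2_values, tau3_values):
    """Enumerate every (tau1, tau2, tau3) combination as keyword dicts."""
    return [
        {"tau1": t1, "tau2": t2, "tau3": t3}
        for t1, t2, t3 in product(tau1_values, tau2_values, tau3_values)
    ]


# Candidate values are illustrative assumptions, not tuned recommendations.
grid = tau_grid([0.01, 0.05, 0.1], [1], [0.05, 0.1])
for params in grid:
    print(params)  # each dict has the keyword names accepted by model.train
```

Each dict can be passed as model.train(**params). Whether a trained model can be retrained in place is not covered in this tutorial, so creating a fresh Spatialign instance with its own save_path for each setting is the safer assumption.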

Infer the aligned datasets

After model training, the learned lower-dimensional representations are saved in adata.obsm['correct'], and the higher-dimensional representations are saved in adata.X.

[8]:

model.alignment()

2023-07-20 18:21:30 Method: 'alignment' Running...
2023-07-20 18:21:33 Batch Alignment Finished!
2023-07-20 18:21:33 Alignment data saved in: ./res
  Running time: 0.0 min 3.3786 s

Validate the inferred datasets

[9]:

correct1 = sc.read_h5ad("./res/correct_data0.h5ad")
correct2 = sc.read_h5ad("./res/correct_data1.h5ad")
correct3 = sc.read_h5ad("./res/correct_data2.h5ad")
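When more slices are integrated, the three explicit reads above can be replaced by a loop. The sketch below assumes the output files keep the correct_data{i}.h5ad naming seen in this tutorial and sit in a res folder under save_path; the helper names are hypothetical.

```python
from pathlib import Path


def corrected_paths(save_path, n_datasets):
    """Build the expected output paths, following the names used in this tutorial."""
    res_dir = Path(save_path) / "res"
    return [res_dir / f"correct_data{i}.h5ad" for i in range(n_datasets)]


def load_corrected(save_path, n_datasets):
    """Read every corrected slice, failing early if any file is missing."""
    import scanpy as sc  # imported here so the path helper works without scanpy

    paths = corrected_paths(save_path, n_datasets)
    missing = [str(p) for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing corrected files: {missing}")
    return [sc.read_h5ad(p) for p in paths]


paths = corrected_paths("./", 3)
print([p.name for p in paths])
```

After alignment has run, load_corrected("./", 3) would return the three AnnData objects in order.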

[10]:

merge_data = correct1.concatenate(correct2, correct3)

[11]:

# Build the neighbor graph on the aligned embedding stored in obsm['correct']
sc.pp.neighbors(merge_data, use_rep="correct", random_state=42)
sc.tl.umap(merge_data, random_state=42)

[12]:

sc.pl.umap(merge_data, color=["batch", "celltype"])

Figure (_images/Tutorial-2-Integrate-datasets-measured-by-different-platforms_18_0.png): UMAP of the merged aligned datasets, colored by batch and celltype.
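The UMAP gives a visual check of batch mixing. As a rough numeric complement, one can measure how often a spot's nearest neighbors in the embedding come from a different batch. The function below is a dependency-free sketch under that assumption; its name and the choice of k are hypothetical, it is not a Spatialign feature, and its quadratic cost makes it suitable only for small inputs or subsamples.

```python
import heapq
import math


def neighbor_batch_mixing(embedding, batches, k=15):
    """Mean fraction of each point's k nearest neighbors that belong to another batch."""
    n = len(embedding)
    if n != len(batches):
        raise ValueError("embedding and batches must have the same length")
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    total = 0.0
    for i in range(n):
        # Euclidean distance from point i to every other point
        dists = [(math.dist(embedding[i], embedding[j]), j) for j in range(n) if j != i]
        nearest = heapq.nsmallest(k, dists)
        total += sum(batches[j] != batches[i] for _, j in nearest) / k
    return total / n


# Toy 1-D embeddings: interleaved batches versus fully separated batches.
mixed = neighbor_batch_mixing([[0], [1], [2], [3]], ["a", "b", "a", "b"], k=1)
separated = neighbor_batch_mixing([[0], [1], [10], [11]], ["a", "a", "b", "b"], k=1)
print(mixed, separated)
```

On the tutorial data, the inputs would be merge_data.obsm["correct"].tolist() and list(merge_data.obs["batch"]). Higher values indicate more mixing, but batches with different cell-type composition are not expected to mix perfectly, so the score is best read alongside the celltype coloring.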
