背景
co-occurrence/mutual exclusivity分析,画出图d。
Co-occurrence/mutual exclusivity -- Only mutations seen in at least 10 patients were kept. The DISCOVER method was used to determine significant mutual exclusivity and co-occurrence. A plot of the co-occurrences was generated using corrplot with the odds ratio of the pairwise co-occurrence used to color and scale the circle sizes.

出自PMID: 30333627文章
应用场景
用TCGA的基因突变数据分析Co-occurrence/mutual exclusivity。
用corrplot同时展示相关性和显著性。
corrplot这个包对相关性的更多展示方式可参考http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram。
环境设置
#使用国内镜像安装包
#options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
#options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
#install.packages("Cairo")
#安装discover包
#options(repos=c(getOption("repos"), "http://ccb.nki.nl/software/discover/repos/r"))
#install.packages("discover")
library(reshape2)
library(RColorBrewer)
library(Cairo)
library(discover)
library(readr)
library(corrplot)
library(openxlsx)
Sys.setenv(LANGUAGE = "en") #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor
参数设置
# 你感兴趣的癌症,此处以BRCA为例
target_tumor <- "BRCA"
输入文件的准备
如果你的数据已经保存为“easy_input_mutation.csv”,就可以跳过这步,直接进入“Co-occurrence/mutual exclusivity分析”。
如果只为画图,就直接进入“开始画图”。
例文没有提供基因层面的突变数据,此处以TCGA的BRCA为例,用DISCOVERY method进行co-occurrence和mutual exclusivity分析。
基因层面的突变数据下载
从UCSC xena的TCGA Pan-Cancer (PANCAN) (39 datasets)https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443),下载两个文件:
- mc3.v0.2.8.PUBLIC.nonsilentGene.xena:somatic mutation (SNP and INDEL) - Gene level non-silent mutation。点击链接下载:https://pancanatlas.xenahubs.net/download/mc3.v0.2.8.PUBLIC.nonsilentGene.xena.gz。下载后解压缩,把mc3.v0.2.8.PUBLIC.nonsilentGene.xena文件保存到当前文件夹。里面包含TCGA所有癌症类型,用下面的代码提取你感兴趣的癌症的突变数据。
- TCGA_phenotype_denseDataOnlyDownload.tsv:phenotype - sample type and primary disease。点击链接下载:https://pancanatlas.xenahubs.net/download/TCGA_phenotype_denseDataOnlyDownload.tsv.gz
- TCGA的癌症全称和缩写的对应关系,参照GEPIA help的Differential analysis:http://gepia.cancer-pku.cn/help.html,整理成samplepair.txt文件
#library(readr)
#数据量很大,读入可能需要一段时间
mutationinfo <- read_tsv(file = "mc3.v0.2.8.PUBLIC.nonsilentGene.xena")
mutationinfo[1:5,1:5]
samplepair <- read.delim("samplepair.txt",as.is = T)
gtexpair <- samplepair[,c(1,2,5)]
gtexpair
#以TCGA的缩写为GTEx的组织命名
gtexpair$type <- paste0(gtexpair$TCGA,"_normal_GTEx")
gtexpair$type2 <-"normal"
gtextcga <- gtexpair[,c(1,3:5)] #筛掉Detail列
colnames(gtextcga)[1:2] <- c("tissue","X_primary_site")
head(gtextcga)
#为GTEx的sample标出组织
gtexcase <- read.delim(file="GTEX_phenotype.tsv",header=T,as.is = T)
colnames(gtexcase)[1] <- "sample"
gtexcase2tcga <- merge(gtextcga,gtexcase,by="X_primary_site")
gtextable <- gtexcase2tcga[,c(5,2:4)]
head(gtextable)
tissue <- gtexpair$TCGA
names(tissue) <- gtexpair$Detail
tcgacase <- read.delim(file="TCGA_phenotype_denseDataOnlyDownload.tsv",header=T,as.is = T)
tcgacase$tissue <- tissue[tcgacase$X_primary_disease]
tcgacase$type <- ifelse(tcgacase$sample_type=='Solid Tissue Normal',paste(tcgacase$tissue,"normal_TCGA",sep="_"),paste(tcgacase$tissue,"tumor_TCGA",sep="_"))
tcgacase$type2 <- ifelse(tcgacase$sample_type=='Solid Tissue Normal',"normal","tumor")
tcgatable<-tcgacase[,c(1,5:7)]
head(tcgatable)
提取你感兴趣的癌症的突变数据
library(data.table)
target_sample <- tcgacase[tcgacase$tissue == target_tumor & tcgatable$type2 == "tumor",]
target_mutationinfo <- mutationinfo[,colnames(mutationinfo) %in% target_sample$sample]
head(target_mutationinfo)
#table(colnames(mutationinfo) %in% target_sample$sample)
target_mutationinfo$sample = mutationinfo$sample
target_mutationinfo = target_mutationinfo[complete.cases(target_mutationinfo[,"sample"]),]
target_mutationinfo = as.data.frame(target_mutationinfo)
rownames(target_mutationinfo) = target_mutationinfo$sample
target_mutationinfo = target_mutationinfo[,-length(target_mutationinfo)]
target_mutationinfo[1:5,1:5]
#保存到文件
#write.csv(target_mutationinfo, "easy_input_mutation.csv",quote = F)
#此处保存局部,用于查看文件格式
write.csv(target_mutationinfo[,1:5], "easy_input_mutation.csv",quote = F)
Co-occurrence/mutual exclusivity分析
DESCOVERY method,用R版本的DESCOVERY计算co-occurrence和mutual exclusivity。
参考资料:https://github.com/NKI-CCB/DISCOVER
http://ccb.nki.nl/software/discover/doc/r/discover-intro.html
下面使用Discover包进行co-occurrence和mutual exclusivity分析
# 输入文件:每个基因在每个sample里是否发生突变,0为未突变,1为突变。每行一个基因,每列一个sample。
#target_mutationinfo <- read.csv("easy_input_mutation.csv", row.names = 1)
#target_mutationinfo[1:5,1:5]
library(discover)
#step1 估计背景矩阵,这一步会比较花时间
events <- discover.matrix(target_mutationinfo)
#step2 选择至少在25个肿瘤样本中都有突变的基因
subset <- rowSums(target_mutationinfo) > 25
#step3 进行pairwise test,这一步用来进行mutual exclusivity的运算。
result.mutex <- pairwise.discover.test(events[subset, ],alternative=c("less"))
result.mutex
print(result.mutex, fdr.threshold=0.05)
result = as.data.frame(result.mutex)
write.csv(result,file = "Target_cancer_discover_less.csv")
#进行co-occurence的计算,加入参数alternative=c("greater")
result.mutex <- pairwise.discover.test(events[subset, ],alternative=c("greater"))
print(result.mutex, fdr.threshold=0.05)
result = as.data.frame(result.mutex)
write.csv(result,file = "Target_cancer_discover_greater.csv")
例文的图中有两个输入:P value和odds ratio。
Figure legend of Figure 1d.d, Co-occurrence or exclusivity of the most recurrent mutational events in the Beat AML cohort (n = 531 patients) were assessed using the DISCOVER method. The dot plot shows the odds ratio of co-occurrence (blue) or exclusivity (red) using colour-coding and circle size as well as asterisks that indicate FDR-corrected statistical significance. *P < 0.1; **P < 0.05; ***P < 0.01.
And the discription in Methods:
Co-occurrence and mutual exclusivity. Only mutations seen in at least 10 patients were kept. The DISCOVER41 method was used to determine significant mutual exclusivity and co-occurrence. A plot of the co-occurrences was generated using corrplot84 with the odds ratio of the pairwise co-occurrence used to colour and scale the circle sizes.
然而DESCOVER并不输出odds ratio,原文未描述odds ratio的计算方法,欢迎回帖讨论。