背景
突变会影响转录因子结合吗?作出判断,并同时画出motif logo和SNP的图。

出自PMID: 28108517文章
应用场景
在基因组上同时展示突变位点和motif,为突变影响转录因子结合提供量化(pvalue、score)和可视化的证据。
运行下面这行,查看motifbreakR的官方手册
browseVignettes("motifbreakR")
motifbreakR还可以跟其他工具结合使用,一系列结果图作为证据,帮你充实文章。看这篇:Variant Annotation Workshop with FunciVAR, StateHub and MotifBreakR
环境设置
使用国内镜像安装包
options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("motifbreakR", version = "3.8")
#如果你只提供rs ID,就需要安装这个包
#SNP locations and alleles for Homo sapiens extracted from NCBI dbSNP Build 151. The source data files used for this package were created by NCBI between February 16-22, 2018, and contain SNPs mapped to reference genome GRCh38.p7
#这个版本的SNP文件480M,其他版本更大,建议下载后本地安装,<http://bioconductor.org/packages/3.8/data/annotation/src/contrib/SNPlocs.Hsapiens.dbSNP142.GRCh37_0.99.5.tar.gz>
BiocManager::install("SNPlocs.Hsapiens.dbSNP142.GRCh37", version = "3.8")
#如果你提供bed或vcf,有下面这个包就够了
#Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects.也可以本地下载后安装<https://mirrors.westlake.edu.cn/bioconductor/packages/3.21/data/annotation/src/contrib/BSgenome.Hsapiens.UCSC.hg19_1.4.3.tar.gz>
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19", version = "3.8")
其他物种到http://www.bioconductor.org/packages/3.8/data/annotation查找相应的包的名字。
你的电脑可能需要安装GhostScript,参考https://github.com/Simon-Coetzee/motifBreakR里的Prepairing to install
加载包
library(motifbreakR)
library(SNPlocs.Hsapiens.dbSNP142.GRCh37)
library(BSgenome.Hsapiens.UCSC.hg19)
Sys.setenv(LANGUAGE = "en") #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor
输入数据
motifbreakR可以用SNP的rs ID,或bed文件,或vcf文件作为输入。
提供SNP的rs ID
输入文件可以只提供SNP的rsID,例如rs1006140
SNPID <- read.table("easy_input_rs.txt")
SNPID
SNPinfo <- snps.from.rsid(rsid = SNPID$V1,
dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
search.genome = BSgenome.Hsapiens.UCSC.hg19)
SNPinfo
提供突变位点的bed或vcf文件
easy_input.bed,整理自例文里的Table S1
SNPinfo <- snps.from.file(file = "easy_input.bed", #输入文件
search.genome = BSgenome.Hsapiens.UCSC.hg19,
format = "bed") #或vcf
判断SNP对motif的影响
data(hocomoco)
results <- motifbreakR(snpList = SNPinfo, filterp = TRUE,
pwmList = hocomoco,
threshold = 1e-4,
method = "ic",
bkg = c(A=0.25, C=0.25, G=0.25, T=0.25),
BPPARAM = BiocParallel::SerialParam())
# SerialParam()串行;bpparam()并行
#提取其中一个突变位点的结果
result1 <- results[names(results) %in% "chr11:111957524:C:T"]
#去除重复的行
result1 <- unique(result1)
#计算p value
result1pval <- calculatePvalue(result1)
result1pval
#计算例文图中motif右侧的分数,Altscore-Refscore
result1pval$score <- result1pval$scoreAlt-result1pval$scoreRef
# 查看所有可用的列名
names(mcols(result1pval))
# 选择您关心的列(例如:基因符号、分数变化、p值等)
selected_columns <- c("geneSymbol", "scoreRef", "scoreAlt", "alleleDiff", "Refpvalue", "Altpvalue", "effect", "score")
result1pval_subset <- as.data.frame(result1pval)[, selected_columns]
# 导出选定的数据
write.csv(result1pval_subset, "output.csv", row.names = FALSE, quote = FALSE)
开始画图
pdf("SNPmotif.pdf",8,4)
plotMB(results = result1, rsid = "chr11:111957524:C:T",
effect = "strong") #或weak
dev.off()

后期加工
根据output.csv中的最后一列score,向图中添加每个转录因子的Altscore-Refscore