背景
有了转录组数据,需要画出某基因相关通路的其他基因的表达热图,通路和基因的关系,从KEGG数据库中获得。
通过R语言批量获取总结。
代码
rm(list = ls()) #### 魔幻操作,一键清空~
getwd()
setwd('D:/tmp/2')
# 自行写代码进行KEGG富集分析时需要制作富集背景文件,将所有通路中包含的所有基因
# (蛋白质)提取出来,生成背景文件。
# 加载必要的库
library(KEGGREST)
# 获取小鼠的所有通路
pathways <- keggList("pathway", "mmu")
pathway_ids <- names(pathways)
# 初始化数据框,用于存储通路ID、通路名称和基因ID
pathway_ids
pathway_gene_df <- data.frame(PathwayID = character(),
PathwayName = character(),
gene_id = character(),
description = character())
# 遍历每个通路,获取基因信息
for (pathway_id in pathway_ids) {
pathway_info <- keggGet(pathway_id)[[1]]
# 提取基因信息。这里假设基因信息出现在"GENE"行后
if("GENE" %in% names(pathway_info)) {
genes_info <- pathway_info$GENE
genes_df <- data.frame(gene_id = character(),
description = character())
for(i in seq(1, length(genes_info), by = 2)) {
# 基因ID位于奇数行
gene_id <- genes_info[i]
# 描述位于偶数行
description <- genes_info[i + 1]
# 将提取的信息添加到数据框中
genes_df <- rbind(genes_df, data.frame(GeneID = gene_id, Description = description, stringsAsFactors = FALSE))
}
# 添加到数据框
pathway_gene_df1 <- data.frame(
PathwayID = rep(pathway_id, nrow(genes_df)),
PathwayName = rep(pathways[pathway_id], nrow(genes_df))
)
pathway_gene_df1 <- cbind(pathway_gene_df1,genes_df)
pathway_gene_df <- rbind(pathway_gene_df,pathway_gene_df1)
} else {
# 如果不存在GENE列,打印错误消息和当前pathway的ID(假设pathway ID存储在pathway_info[[i]]的某个字段)
cat("Error: 'GENE' column not found for pathway ID", pathway_id, "\n")
}
}
# Error: 'GENE' column not found for pathway ID mmu01100
# Error: 'GENE' column not found for pathway ID mmu01200
# Error: 'GENE' column not found for pathway ID mmu01210
# Error: 'GENE' column not found for pathway ID mmu01212
# Error: 'GENE' column not found for pathway ID mmu01230
# Error: 'GENE' column not found for pathway ID mmu01232
# Error: 'GENE' column not found for pathway ID mmu01250
# Error: 'GENE' column not found for pathway ID mmu01240
# Error: 'GENE' column not found for pathway ID mmu01521
# Error: 'GENE' column not found for pathway ID mmu01524
# Error: 'GENE' column not found for pathway ID mmu01523
# Error: 'GENE' column not found for pathway ID mmu01522
#12个没有 343个mmu有
# 查看数据框的结构
head(pathway_gene_df)
# 如果你想保存这个数据框到CSV文件
write.table(pathway_gene_df, "Mus_kegg_pathway_gene_relationships.txt",sep = '\t', quote = F, row.names = F)
小结
存在某些通路id找不到gene列,
如前355个通路,有12个找不到。
这个时候寻求网页帮助,查漏补缺。
例如mmu01522
参考
[https://www.genome.jp/dbget-bin/get_linkdb?-t+genes+path:mmu01522](https://www.genome.jp/dbget-bin/get_linkdb?-t+genes+path:mmu01522)
以上网址可以查到相关的基因。 |