生信人

找回密码
立即注册
搜索
热搜: 活动 交友 discuz
发新帖

0

收听

0

听众

11

主题
发表于 2024-4-11 13:24:33 | 查看: 1130| 回复: 0

背景

有了转录组数据,需要画出某基因相关通路的其他基因的表达热图,通路和基因的关系,从KEGG数据库中获得。 通过R语言批量获取总结。

代码

rm(list = ls())  ####  魔幻操作,一键清空~
getwd()
setwd('D:/tmp/2')

# 自行写代码进行KEGG富集分析时需要制作富集背景文件,将所有通路中包含的所有基因
# (蛋白质)提取出来,生成背景文件。
# 加载必要的库
library(KEGGREST)

# 获取小鼠的所有通路
pathways <- keggList("pathway", "mmu")
pathway_ids <- names(pathways)

# 初始化数据框,用于存储通路ID、通路名称和基因ID
pathway_ids
pathway_gene_df <- data.frame(PathwayID = character(),
                              PathwayName = character(),
                              gene_id = character(),
                              description = character())
# 遍历每个通路,获取基因信息
for (pathway_id in pathway_ids) {
  pathway_info <- keggGet(pathway_id)[[1]]

  # 提取基因信息。这里假设基因信息出现在"GENE"行后
  if("GENE" %in% names(pathway_info)) {
    genes_info <- pathway_info$GENE

    genes_df <- data.frame(gene_id = character(),
                           description = character())
    for(i in seq(1, length(genes_info), by = 2)) {
      # 基因ID位于奇数行
      gene_id <- genes_info[i]
      # 描述位于偶数行
      description <- genes_info[i + 1]

      # 将提取的信息添加到数据框中
      genes_df <- rbind(genes_df, data.frame(GeneID = gene_id, Description = description, stringsAsFactors = FALSE))
    }
    # 添加到数据框
    pathway_gene_df1 <- data.frame(
      PathwayID = rep(pathway_id, nrow(genes_df)),
      PathwayName = rep(pathways[pathway_id], nrow(genes_df))
    )
    pathway_gene_df1 <- cbind(pathway_gene_df1,genes_df)
    pathway_gene_df <- rbind(pathway_gene_df,pathway_gene_df1)
  } else {
    # 如果不存在GENE列,打印错误消息和当前pathway的ID(假设pathway ID存储在pathway_info[[i]]的某个字段)
    cat("Error: 'GENE' column not found for pathway ID", pathway_id, "\n")
  }
}
# Error: 'GENE' column not found for pathway ID mmu01100 
# Error: 'GENE' column not found for pathway ID mmu01200 
# Error: 'GENE' column not found for pathway ID mmu01210 
# Error: 'GENE' column not found for pathway ID mmu01212 
# Error: 'GENE' column not found for pathway ID mmu01230 
# Error: 'GENE' column not found for pathway ID mmu01232 
# Error: 'GENE' column not found for pathway ID mmu01250 
# Error: 'GENE' column not found for pathway ID mmu01240 
# Error: 'GENE' column not found for pathway ID mmu01521 
# Error: 'GENE' column not found for pathway ID mmu01524 
# Error: 'GENE' column not found for pathway ID mmu01523 
# Error: 'GENE' column not found for pathway ID mmu01522
#12个没有 343个mmu有
# 查看数据框的结构
head(pathway_gene_df)

# 如果你想保存这个数据框到CSV文件
write.table(pathway_gene_df, "Mus_kegg_pathway_gene_relationships.txt",sep = '\t', quote = F, row.names = F)

小结

存在某些通路id找不到gene列, 如前355个通路,有12个找不到。 这个时候寻求网页帮助,查漏补缺。 例如mmu01522 参考

[https://www.genome.jp/dbget-bin/get_linkdb?-t+genes+path:mmu01522](https://www.genome.jp/dbget-bin/get_linkdb?-t+genes+path:mmu01522)

以上网址可以查到相关的基因。

收藏回复 显示全部楼层 道具 举报

您需要登录后才可以回帖 登录 | 立即注册

QQ|Archiver|手机版|小黑屋|生信人 ( 萌ICP备20244422号 )

GMT+8, 2024-11-21 17:36 , Processed in 0.102107 second(s), 30 queries .

Powered by Discuz! X3.5

© 2001-2024 Discuz! Team.

快速回复 返回顶部 返回列表