生信人

Posted on 2022-4-14 17:58:09 | Views: 1651 | Replies: 0
1 Different k-mer sizes
  mkdir 1.kmer
  cd 1.kmer
  cp ../36.illumina/lib.list .
  # Assemble with three k-mer sizes; each binary must support the chosen K
  SOAPdenovo-31mer all -s lib.list -K 31 -o kmer31 -D 1 -d 1 -u 2 -p 12 >kmer31.log
  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer63 -D 1 -d 1 -u 2 -p 12 >kmer63.log
  SOAPdenovo-127mer all -s lib.list -K 105 -o kmer105 -D 1 -d 1 -u 2 -p 12 >kmer105.log

  ls -l *.scafSeq
  seqkit stat *.scafSeq
  mv *.scafSeq ../
  rm -rf kmer*
  # Vary -d (k-mer frequency cutoff); each run gets its own output prefix so results are not overwritten
  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer63_0 -D 1 -d 0 -u 2 -p 12 >kmer63_0.log
  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer63_1 -D 1 -d 1 -u 2 -p 12 >kmer63_1.log
  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer63_2 -D 1 -d 2 -u 2 -p 12 >kmer63_2.log
  seqkit stat *.scafSeq
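
The runs above differ only in K and in which binary can handle that K. A minimal sketch of scripting that choice follows. The helper name `soap_binary_for_k` is hypothetical, the binary names are the ones used in this post, and the odd-K check reflects the usual SOAPdenovo requirement. It only prints the commands (a dry run), so nothing is assembled until the output is saved and executed.

```shell
# Hypothetical helper: pick the SOAPdenovo binary (names as used in this post)
# whose maximum k-mer size covers the requested K.
soap_binary_for_k() {
    k=$1
    if [ $((k % 2)) -eq 0 ]; then
        echo "K must be odd: $k" >&2
        return 1
    fi
    if [ "$k" -le 31 ]; then
        echo "SOAPdenovo-31mer"
    elif [ "$k" -le 63 ]; then
        echo "SOAPdenovo-63mer"
    elif [ "$k" -le 127 ]; then
        echo "SOAPdenovo-127mer"
    else
        echo "K too large: $k" >&2
        return 1
    fi
}

# Dry run: print one assembly command per K instead of executing it.
for k in 31 63 105; do
    bin=$(soap_binary_for_k "$k") || continue
    echo "$bin all -s lib.list -K $k -o kmer$k -D 1 -d 1 -u 2 -p 12 >kmer$k.log"
done
```

Redirecting the loop output to a file and running it with `sh` would reproduce the three assemblies in one go.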

2 Library insert size

lib.list describes the small-insert library. lib2.list, used by the second command below, presumably adds the large-insert library; both [LIB] blocks are shown here.
  max_rd_len=150
  # small-insert library
  [LIB]
  avg_ins=500
  reverse_seq=0
  asm_flags=3
  rank=1
  pair_num_cutoff=3
  q1=/cleandata/500_clean.1.fq.gz
  q2=/cleandata/500_clean.2.fq.gz

  # large-insert library
  [LIB]
  avg_ins=2000
  reverse_seq=1
  asm_flags=2
  rank=2
  pair_num_cutoff=3
  q1=/cleandata/2000_clean.1.fq.gz
  q2=/cleandata/2000_clean.2.fq.gz

  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer1 -D 1 -d 1 -u 2 -p 12 >kmer1.log
  SOAPdenovo-63mer all -s lib2.list -K 63 -o kmer2 -D 1 -d 1 -u 2 -p 12 >kmer2.log

  ls -l *.scafSeq
  seqkit stat *.scafSeq
  # Compare only scaffolds of at least 500 bp
  seqkit seq -m 500 kmer1.scafSeq | seqkit stat
  seqkit seq -m 500 kmer2.scafSeq | seqkit stat
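
The two config files can also be written from a script, which makes the difference between them explicit. This is a sketch under one assumption: that lib.list holds only the small-insert library and lib2.list holds both. The function name `write_lib_configs` is hypothetical; the values are the ones listed above.

```shell
# Sketch: write lib.list (small-insert library only) and lib2.list (both libraries).
# The split between the two files is an assumption, not confirmed by the post.
write_lib_configs() {
    dir=${1:-.}
    cat > "$dir/lib.list" <<'EOF'
max_rd_len=150
[LIB]
avg_ins=500
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
q1=/cleandata/500_clean.1.fq.gz
q2=/cleandata/500_clean.2.fq.gz
EOF
    # lib2.list starts as a copy of lib.list, then gains the mate-pair block.
    cp "$dir/lib.list" "$dir/lib2.list"
    cat >> "$dir/lib2.list" <<'EOF'
[LIB]
avg_ins=2000
reverse_seq=1
asm_flags=2
rank=2
pair_num_cutoff=3
q1=/cleandata/2000_clean.1.fq.gz
q2=/cleandata/2000_clean.2.fq.gz
EOF
}

# Demo in a scratch directory; pass "." to write into the working directory instead.
tmp=$(mktemp -d)
write_lib_configs "$tmp"
grep -c '^\[LIB\]' "$tmp/lib.list" "$tmp/lib2.list"
```

The final `grep -c` is a quick check that the first file contains one [LIB] block and the second contains two.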
Switching to SPAdes to see the effect of the large-insert library
  # Each echo writes one command to spades.sh; the second run adds the 2 kb mate-pair library
  echo "spades.py --isolate -o spades1 -t 12 -1 /share/home/xiehs/05.assembly/data/illumina_1.fastq.gz -2 /share/home/xiehs/05.assembly/data/illumina_2.fastq.gz 1>spades1.log 2>spades1.err" > spades.sh
  echo "spades.py --isolate -o spades2 -t 12 -1 /share/home/xiehs/05.assembly/data/illumina_1.fastq.gz -2 /share/home/xiehs/05.assembly/data/illumina_2.fastq.gz --mp1-1 /cleandata/2000_clean.1.fq.gz --mp1-2 /cleandata/2000_clean.2.fq.gz 1>spades2.log 2>spades2.err" >> spades.sh
  nohup sh spades.sh &
  # Run these once both assemblies have finished
  seqkit seq -m 500 spades1/scaffolds.fasta | seqkit stat
  seqkit seq -m 500 spades2/scaffolds.fasta | seqkit stat

3 Data quality
Compare the assembly results before and after read filtering.

  mkdir 3.filter
  cd 3.filter
  # Assumes lib.list in this directory points at the unfiltered reads
  SOAPdenovo-63mer all -s lib.list -K 63 -o kmer63 -D 1 -d 0 -u 2 -p 12 >kmer63.log
  seqkit seq -m 500 kmer63.scafSeq | seqkit stat
  seqkit seq -m 500 ../1.kmer/kmer63_1.scafSeq | seqkit stat
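
Plain `seqkit stat` reports sequence count and total, minimum, average and maximum length; adding `-a` also reports N50, which is usually the figure compared between assemblies. To show what that number means, here is a small awk sketch that computes N50 from a FASTA stream. The function names `fasta_lengths` and `n50` are made up for this example.

```shell
# Print the length of every sequence in a FASTA stream, one per line.
fasta_lengths() {
    awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
         { len += length($0) }
         END { if (seen) print len }'
}

# N50: walking from the longest sequence down, the length at which the
# running total first reaches half of the total assembly length.
n50() {
    sort -nr | awk '{ len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Toy input: five "scaffolds" of length 2, 3, 4, 5 and 6 (total 20).
printf '>a\nAC\n>b\nACG\n>c\nACGT\n>d\nACGTA\n>e\nACGTAC\n' | fasta_lengths | n50   # prints 5
```

On real output it could be chained after the length filter, for example `seqkit seq -m 500 kmer63.scafSeq | fasta_lengths | n50`.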

4 Amount of data
Subsample 10%, 30%, 50% and 80% of the reads and compare the assemblies.

  # Single example; using the same seed (-s) on both files keeps the read pairs matched
  seqkit sample -p 0.1 -s 1234 /share/home/xiehs/05.assembly/data/illumina_1.fastq.gz | gzip >reads.0.1_1.fq.gz
  seqkit sample -p 0.1 -s 1234 /share/home/xiehs/05.assembly/data/illumina_2.fastq.gz | gzip >reads.0.1_2.fq.gz

  # All four fractions
  for i in 0.1 0.3 0.5 0.8; do
      seqkit sample -p ${i} -s 1234 /share/home/xiehs/05.assembly/data/illumina_1.fastq.gz | gzip >reads.${i}_1.fq.gz
      seqkit sample -p ${i} -s 1234 /share/home/xiehs/05.assembly/data/illumina_2.fastq.gz | gzip >reads.${i}_2.fq.gz
  done
  # Write one SPAdes command per read pair to spades.sh;
  # ${i%_1.fq.gz} strips the _1.fq.gz suffix so the rest can serve as the -o directory
  ls -1 *.fq.gz | xargs -n 2 | while read i j; do echo "spades.py -o spades_${i%_1.fq.gz} -t 12 -1 ${i} -2 ${j}"; done > spades.sh
  # spades.sh then contains:
  #   spades.py -o spades_reads.0.1 -t 12 -1 reads.0.1_1.fq.gz -2 reads.0.1_2.fq.gz
  #   spades.py -o spades_reads.0.3 -t 12 -1 reads.0.3_1.fq.gz -2 reads.0.3_2.fq.gz
  #   spades.py -o spades_reads.0.5 -t 12 -1 reads.0.5_1.fq.gz -2 reads.0.5_2.fq.gz
  #   spades.py -o spades_reads.0.8 -t 12 -1 reads.0.8_1.fq.gz -2 reads.0.8_2.fq.gz

  nohup sh spades.sh &
  # Run this once the assemblies have finished
  seqkit stat spades_reads*/scaffolds.fasta
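
When interpreting the four assemblies it helps to know roughly what sequencing depth each fraction corresponds to. The sketch below applies depth = read pairs x 2 x read length x fraction / genome size. The input numbers (1,000,000 pairs, 150 bp reads, a 5 Mb genome) are placeholders rather than values from this dataset, and `est_depth` is a made-up helper name.

```shell
# Estimated depth = pairs * 2 * read_length * fraction / genome_size.
est_depth() {
    awk -v n="$1" -v l="$2" -v g="$3" -v p="$4" \
        'BEGIN { printf "%.1f\n", n * 2 * l * p / g }'
}

# Placeholder inputs: 1,000,000 read pairs of 150 bp, 5 Mb genome.
pairs=1000000
read_len=150
genome_size=5000000

for p in 0.1 0.3 0.5 0.8; do
    echo "fraction $p -> about $(est_depth "$pairs" "$read_len" "$genome_size" "$p")x"
done
```

The real pair count and genome size can be read from `seqkit stat` on the FASTQ and reference files. Since `seqkit sample -p` samples by proportion, the depth actually obtained will only be close to the estimate.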


5 Effect of read length

  # Different read lengths
  # Simulate 50 bp and 300 bp reads with wgsim (options go before the file arguments)
  cp /share/home/xiehs/05.assembly/data/MGH78578.fasta .
  wgsim -1 50 -2 50 MGH78578.fasta read.50_1.fq read.50_2.fq
  wgsim -1 300 -2 300 MGH78578.fasta read.300_1.fq read.300_2.fq

  spades.py -o spades50 -t 12 -1 read.50_1.fq -2 read.50_2.fq
  spades.py -o spades300 -t 12 -1 read.300_1.fq -2 read.300_2.fq
  seqkit stat spades50/scaffolds.fasta
  seqkit stat spades300/scaffolds.fasta
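
Note that wgsim simulates a fixed number of read pairs (`-N`, default 1,000,000), so the 50 bp and 300 bp datasets above differ sixfold in total bases as well as in read length. One way to separate the two effects is to pick `-N` so that both runs reach the same depth. The sketch below prints such commands as a dry run; the genome size and target depth are placeholders, and `pairs_for_depth` is a hypothetical helper.

```shell
# Read pairs needed for a target depth: depth * genome_size / (2 * read_length),
# rounded up to a whole pair.
pairs_for_depth() {
    awk -v d="$1" -v g="$2" -v l="$3" \
        'BEGIN { n = d * g / (2 * l); p = int(n); if (p < n) p++; print p }'
}

genome_size=5000000   # placeholder; take the real value from `seqkit stat MGH78578.fasta`
depth=50              # placeholder target depth

# Dry run: print the wgsim commands rather than executing them.
for len in 50 300; do
    n=$(pairs_for_depth "$depth" "$genome_size" "$len")
    echo "wgsim -N $n -1 $len -2 $len MGH78578.fasta read.${len}_1.fq read.${len}_2.fq"
done
```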

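6 Different error rates
  # Effect of different error rates
  wgsim -e 0.01 -1 50 -2 50 MGH78578.fasta read.0.01_1.fq read.0.01_2.fq
  wgsim -e 0.1 -1 50 -2 50 MGH78578.fasta read.0.1_1.fq read.0.1_2.fq
  # Assemble with SOAPdenovo; each run needs its own config pointing at the matching reads
  # (lib_0.01.list and lib_0.1.list are placeholder names for those two configs)
  SOAPdenovo-63mer all -s lib_0.01.list -K 35 -o kmer35 -D 1 -d 1 -u 2 -p 12
  SOAPdenovo-63mer all -s lib_0.1.list -K 35 -o kmer35_01 -D 1 -d 1 -u 2 -p 12
  seqkit stat kmer35.scafSeq kmer35_01.scafSeq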
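
Each simulated dataset needs a SOAPdenovo config that points at its own FASTQ files. A possible way to generate them is sketched below. The file names `lib_<rate>.list` are placeholders, `avg_ins=500` is chosen to match wgsim's default outer distance (`-d 500`), and `max_rd_len=50` matches the 50 bp reads simulated above; the other values are copied from the small-insert library in section 2.

```shell
# Write one SOAPdenovo config per simulated error rate into the current directory.
# Assumed values: avg_ins=500 (wgsim default -d), max_rd_len=50 (reads simulated above).
for e in 0.01 0.1; do
    cat > "lib_${e}.list" <<EOF
max_rd_len=50
[LIB]
avg_ins=500
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
q1=read.${e}_1.fq
q2=read.${e}_2.fq
EOF
    echo "wrote lib_${e}.list"
done
```

The here-document is left unquoted on purpose so that `${e}` is expanded into the read file names.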
