大規(guī)?;蚪M測序中的信息分析_第1頁
大規(guī)?;蚪M測序中的信息分析_第2頁
大規(guī)?;蚪M測序中的信息分析_第3頁
大規(guī)?;蚪M測序中的信息分析_第4頁
大規(guī)?;蚪M測序中的信息分析_第5頁
已閱讀5頁,還剩20頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領

文檔簡介

1、大規(guī)?;蚪M測序中的信息分析-拼接與注釋 大規(guī)模測序是基因組研究的最基本任務,它的每一個環(huán)節(jié)都與信息分析緊密相關。從測序儀的光密度采樣與分析、堿基讀出、載體標識與去除、拼接與組裝、填補序列間隙、到重復序列標識、讀框預測和基因標注的每一步都是緊密依賴基因組信息學的軟件和數(shù)據(jù)庫的。 12Sequence Data DistributionVectorSequencing and Data Processing Procedure BasecallingRepeat MarkORF Prediction Gene AnnotationFinishingAssembleVector MarkPhred

2、Phd2fastaCrossmatchPhrapConsedRepeatmaskerGlimmerBlastxBlastnClastaltRNAscanQualityControlQualCalQualDrawQualStatCOGsSwiss-port,PIR, GDB,GenBankSequencing3 How to find the coding regions in rude DNA sequence?Statistical method and Sequence Alignment Method eneven positional base frequence (D value)
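
The slides do not define the D value. The following is only a minimal Python sketch of one way to quantify uneven positional base frequency, and it assumes a reading frame that starts at the first base. Three things are hypothetical and chosen for illustration:

* the scoring rule, which sums over bases the squared deviation of each base's frequency at codon positions 1, 2 and 3 from its mean;
* the function name `positional_frequencies`;
* the function name `d_value`.

The authors' actual D value may be defined differently.

```python
from collections import Counter

BASES = "ACGT"


def positional_frequencies(seq: str) -> dict:
    """Relative frequency of each base at codon positions 0, 1, 2.

    Assumes the reading frame starts at seq[0]. Characters other than
    A/C/G/T are skipped but still advance the codon position.
    """
    counts = {p: Counter() for p in range(3)}
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            counts[i % 3][base] += 1
    freqs = {}
    for p, c in counts.items():
        total = sum(c.values())
        freqs[p] = {b: (c[b] / total if total else 0.0) for b in BASES}
    return freqs


def d_value(seq: str) -> float:
    """Hypothetical unevenness score: for each base, the summed squared
    deviation of its three positional frequencies from their mean."""
    freqs = positional_frequencies(seq)
    score = 0.0
    for b in BASES:
        values = [freqs[p][b] for p in range(3)]
        mean = sum(values) / 3
        score += sum((v - mean) ** 2 for v in values)
    return score


print(d_value("ATGATGATGATG"))  # strongly uneven across codon positions
print(d_value("ACGTACGTACGT"))  # bases spread evenly over the positions
```

In a gene-finding setting the score would typically be computed in a sliding window along the contig, and high values would flag candidate coding regions. The window length is a further assumption.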

Other approaches explored include:
* a neural network for predicting splice sites;
* the fractal dimension of exons and introns;
* complexity analysis, which asks how many different patterns occur in windows taken from different regions of a DNA sequence;
* methods and techniques from cryptology (coincident index, unicity distance);
* the hidden Markov model (HMM) method, as in Glimmer.

Neural Network Method
An improved back-propagation (BP) learning algorithm was built in this study. One goal of the network is to find the splice sites between introns and exons.

Neural Network Procedure
Notation for layer $l$:
$W^{(l)}$ = synaptic weight vector of a neuron in layer $l$
$\theta^{(l)}$ = threshold of a neuron in layer $l$
$V^{(l)}$ = vector of net internal activity levels of neurons in layer $l$
$Y^{(l)}$ = vector of function signals of neurons in layer $l$

Forward Computation
The inputs are an input vector $x$ and a desired response vector $d$. The net internal activity level of neuron $j$ in layer $l$ is
$$v_j^{(l)} = \sum_i w_{ji}^{(l)}\, y_i^{(l-1)} - \theta_j^{(l)},$$
and its function signal is $y_j^{(l)} = \varphi\big(v_j^{(l)}\big)$, with $y^{(0)} = x$.
If neuron $j$ is in the output layer (i.e., $l = L$), set $o_j = y_j^{(L)}$. Then compute the error signal
$$e_j = d_j - o_j.$$
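
Below is a minimal pure-Python sketch of the forward computation and the output error signal. It assumes a logistic (sigmoid) activation $\varphi$ and a net internal activity level $v_j = \sum_i w_{ji} y_i - \theta_j$. The list-of-lists weight layout and the function names `forward` and `error_signal` are illustrative assumptions, not the authors' implementation. The "improved" parts of their BP algorithm are not modelled.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic activation phi(v); assumed, since the slides do not name phi."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, weights, thresholds):
    """Forward computation through a layered network.

    weights[l][j][i] : weight from neuron i of layer l to neuron j of layer l+1
    thresholds[l][j] : threshold of neuron j of layer l+1
    Returns [y(0), y(1), ..., y(L)], where y(0) is the input vector x.
    """
    ys = [list(x)]
    for W, theta in zip(weights, thresholds):
        prev = ys[-1]
        # net internal activity level: weighted sum of previous signals minus threshold
        v = [sum(w * y for w, y in zip(row, prev)) - th for row, th in zip(W, theta)]
        # function signals of this layer
        ys.append([sigmoid(vj) for vj in v])
    return ys


def error_signal(d, ys):
    """e_j = d_j - o_j, where o_j = y_j(L) is the output-layer signal."""
    return [dj - oj for dj, oj in zip(d, ys[-1])]


# Usage: 2 inputs -> 2 hidden neurons -> 1 output neuron (arbitrary weights)
weights = [[[0.5, -0.5], [0.3, 0.8]], [[1.0, -1.0]]]
thresholds = [[0.0, 0.1], [0.2]]
signals = forward([1.0, 0.0], weights, thresholds)
print(error_signal([1.0], signals))
```

The sketch keeps every layer's function signals in `ys` on purpose. The backward computation needs $y^{(l-1)}$ and $\varphi'(v^{(l)})$ at every layer. For the logistic function, $\varphi'(v) = y(1 - y)$.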

Backward Computation
Let $\delta^{(l)}$ denote the vector of local gradients of neurons in layer $l$. Let $e$ be the error vector with elements $e_1, e_2, \ldots, e_q$. For neuron $j$ in the output layer $L$,
$$\delta_j^{(L)} = e_j\, \varphi'\big(v_j^{(L)}\big),$$
and for neuron $j$ in hidden layer $l$,
$$\delta_j^{(l)} = \varphi'\big(v_j^{(l)}\big) \sum_k \delta_k^{(l+1)}\, w_{kj}^{(l+1)}.$$
The forward and backward computations are then iterated.

[Figure: exon-intron-exon structure showing true splice sites (-GU, AG-) and pseudo-splice sites (-GU, AG-); success rate = 38/60, 64%.]

Fractal Dimension of Exon and Intron
We introduce a mapping from nucleotide sequences into a two-dimensional metric space. Each nucleotide is assigned a fixed vector, and the representation is built by joining these vectors end to end.

Fractal Dimension of Exon and Intron Sequences
D is the fractal dimension, and the nucleotide sequence is self-similar within this range.
[Figure: example sequence -CGCGGCGTGTGTTATA-; ln N plotted for intron, exon, and random sequences, with the end-to-end range, main range, and geometric range marked.]
For each class of sequences, the fitted quantities are averaged and their standard deviations computed. The classes are denoted as follows: exon sequences EX, intron sequences IN, randomized exon sequences RE, and randomized intron sequences RI.

Complexity Analysis
Here, complexity means the number of base-connection patterns in a window, with windows taken from different parts of DNA sequences. Our results show that exons have higher complexity than introns and than the 5′- and 3′-flanking regions.
[Figure: Relative complexity in exons, introns, and flanks]

Using Methods and Techniques from Cryptology
DNA sequences can be thought of as resembling telegrams used to transmit secrets in commerce or military affairs. We therefore try to apply methods and techniques from cryptology to DNA sequence analysis. As a first step, we introduce some cryptological parameters into DNA sequence analysis.

Coincident index. In cryptology, this parameter has been used to locate the plaintext within the enciphered text of a telegram. The plaintext portion is the region where the parameter takes high values. We assume that exons correspond to the plaintext and therefore also show high values of the coincident index.
Unicity distance

Coincident Index of Exon and Intron
The original form of the coincident index is
$$IC = \frac{\sum_{l=0}^{L-1} f_l\,(f_l - 1)}{N(N-1)},$$
in which $N$ is the number of characters in the sequence, $L$ is the number of different kinds of characters, and $f_l$ is the frequency (count) of the $l$th character in the sequence. Here the coincident index from cryptology is applied to DNA sequence analysis.
* For single letters (mononucleotides), the coincident index IC of a DNA sequence is expressed with indices $j = 0, 1, \ldots, L-1$ and $k$ = A, C, G, T, and $R = 4 \times 25 \times (25 - 1) = 2400$.
* For letter pairs (dinucleotides), IC is expressed with $j = 0, 1, \ldots, L-1$ and $k$ = AA, AC, AG, AT, CA, CC, …, TG, TT.

[Figure: Values of the coincident index in the HSHSC70 sequence]
[Figure: ORF prediction of the contig by five methods]
[Figure: Composite artificial neural network system]
[Figure: Auto-annotation procedure]

T. tengcongensis Complete Genome Analysis
The bacterium, designated Thermoanaerobacter tengcongensis MB4, was isolated four years ago from a hot spring in the Tengcong area, Yunnan Province, China. It was isolated by scientists at the Institute of Microbiology, Chinese Academy of Sciences. Its cells are rod-shaped and Gram-negative. It does not form spores and is non-motile. With respect to oxygen tolerance, it is an anaerobe. Its optimum temperature is 75 °C and its optimum pH is 7.0. The genome of T. tengcongensis consists of a single circular chrom…
