Information Analysis in Large-Scale Genome Sequencing
大規(guī)?;蚪M測(cè)序中的信息分析_第2頁(yè)
大規(guī)模基因組測(cè)序中的信息分析_第3頁(yè)
大規(guī)?;蚪M測(cè)序中的信息分析_第4頁(yè)
大規(guī)?;蚪M測(cè)序中的信息分析_第5頁(yè)
已閱讀5頁(yè),還剩25頁(yè)未讀 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

Sequencing and Data Processing Procedure

Stages and the tools used at each stage:

  Sequencing
  Basecalling: Phred, Phd2fasta
  Vector Mark: Crossmatch
  Assemble: Phrap
  Finishing: Consed
  Repeat Mark: Repeatmasker
  ORF Prediction: Glimmer
  Gene Annotation: Blastx, Blastn, Clustal, tRNAscan
  Quality Control: QualCal, QualDraw, QualStat
  Databases: COGs, Swiss-Prot, PIR, GDB, GenBank
  Sequence Data Distribution

For a neuron in layer l:

  $W^{(l)}$ = synaptic weight vector of a neuron in layer l
  $\theta^{(l)}$ = threshold of a neuron in layer l
  $V^{(l)}$ = vector of net internal activity levels of neurons in layer l
  $Y^{(l)}$ = vector of function signals of neurons in layer l

Forward Computation

The input vector is x and the desired response vector is d. The net internal activity level for neuron j in layer l is

$$ v_j^{(l)}(n) = \sum_{i=0}^{p} w_{ji}^{(l)}(n)\, y_i^{(l-1)}(n) $$

and its function signal is

$$ y_j^{(l)}(n) = \frac{1}{1 + \exp\left(-v_j^{(l)}(n)\right)} $$

If neuron j is in the output layer (i.e., l = L), set

$$ y_j^{(L)}(n) = o_j(n) $$

Hence, compute the error signal

$$ e_j(n) = d_j(n) - o_j(n) $$

Backward Computation

  $\delta^{(l)}$ = vector of local gradients of neurons in layer l
  e = error vector with elements $e_1, e_2, \ldots, e_q$

For neuron j in output layer L:

$$ \delta_j^{(L)}(n) = e_j^{(L)}(n)\, o_j(n)\, [1 - o_j(n)] $$

For neuron j in hidden layer l:

$$ \delta_j^{(l)}(n) = y_j^{(l)}(n)\, [1 - y_j^{(l)}(n)] \sum_{k} \delta_k^{(l+1)}(n)\, w_{kj}^{(l+1)}(n) $$

The synaptic weights are then adjusted according to

$$ w_{ji}^{(l)}(n+1) = w_{ji}^{(l)}(n) + \alpha\, [w_{ji}^{(l)}(n) - w_{ji}^{(l)}(n-1)] + \eta\, \delta_j^{(l)}(n)\, y_i^{(l-1)}(n) $$

where $\eta$ is the learning-rate parameter and $\alpha$ is the momentum constant.

Iteration

Repeat the forward and backward computations for each new training example.
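The forward computation, backward computation, and weight update above can be sketched in a few lines of plain Python. This is a minimal illustration, not the system described on the slides: the function names, the toy 2-2-1 network, the starting weights, and the values of `eta` and `alpha` are all assumptions made for the example, and the threshold is handled by the assumed convention of a fixed input of -1 at index i = 0.

```python
import math

def sigmoid(v):
    """Logistic function y = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def forward(weights, x):
    """Forward computation; returns the function signals of every layer.

    weights[l][j][i] holds w_ji for the layer fed by signals[l].
    Index i = 0 is the threshold, driven by a fixed input of -1
    (an assumed convention, not stated in the text).
    """
    signals = [list(x)]
    for layer in weights:
        prev = [-1.0] + signals[-1]
        signals.append([sigmoid(sum(w * y for w, y in zip(row, prev)))
                        for row in layer])
    return signals

def backward(weights, signals, d):
    """Backward computation; returns the local gradients of every layer."""
    out = signals[-1]
    # Output layer: delta_j = e_j * o_j * (1 - o_j), with e_j = d_j - o_j.
    deltas = [[(dj - oj) * oj * (1.0 - oj) for dj, oj in zip(d, out)]]
    # Hidden layers, last to first:
    # delta_j = y_j * (1 - y_j) * sum_k delta_k * w_kj of the next layer.
    for l in range(len(weights) - 1, 0, -1):
        y, nxt = signals[l], deltas[0]
        deltas.insert(0, [
            y[j] * (1.0 - y[j])
            * sum(nxt[k] * weights[l][k][j + 1] for k in range(len(nxt)))
            for j in range(len(y))
        ])
    return deltas

def update(weights, previous, signals, deltas, eta, alpha):
    """w(n+1) = w(n) + alpha * (w(n) - w(n-1)) + eta * delta_j * y_i."""
    new = []
    for l, layer in enumerate(weights):
        y_prev = [-1.0] + signals[l]
        new.append([
            [w + alpha * (w - w_old) + eta * deltas[l][j] * y_prev[i]
             for i, (w, w_old) in enumerate(zip(row, previous[l][j]))]
            for j, row in enumerate(layer)
        ])
    return new

def train(weights, samples, epochs, eta=0.5, alpha=0.5):
    """Iterate forward and backward passes over the samples."""
    previous = weights  # no momentum contribution on the first step
    for _ in range(epochs):
        for x, d in samples:
            signals = forward(weights, x)
            deltas = backward(weights, signals, d)
            weights, previous = (
                update(weights, previous, signals, deltas, eta, alpha),
                weights,
            )
    return weights

# Toy network (hypothetical values): 2 inputs, 2 hidden neurons, 1 output.
initial = [
    [[0.1, 0.2, -0.1], [-0.2, 0.1, 0.3]],
    [[0.05, 0.3, -0.3]],
]
sample = ([1.0, 0.0], [1.0])
before = forward(initial, sample[0])[-1][0]
trained = train(initial, [sample], epochs=200)
after = forward(trained, sample[0])[-1][0]
print(f"output before: {before:.3f}, after: {after:.3f}")
```

Training on a single example only shows that the update moves the output toward the desired response; a real classifier would need many labelled examples and a held-out test group.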

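Splicing sites and pseudosplicing sites

[Figure: exon-intron-exon schematic marking the splicing sites (-GU ... AG-) and the pseudosplicing sites (-GU, AG-)]

Number of splicing sites and pseudosplicing sites of the testing group

        splicing sites    pseudosplicing sites
  GU          60                 1710
  AG          60                 2800

Distribution Matrix of Prediction

                          predicted as     predicted as
                          splicing         pseudosplicing     total
  splicing sites             38                22               60
  pseudosplicing sites       15              2785             2800
  total                      53              2807             2860

Success rate = 38/60 ≈ 63% (given as 64% on the slide).

Fractal dimension

[Equation not recoverable from the extracted text; it defines D in terms of ln N and ln R_N over a range of N.]

D is the fractal dimension, and self-similarity exists in this range of the nucleotide sequence.

[Figure: the bases C, G, A, T and the example sequence -CGCGGCGTGTGTTATA-]

[Figure: log-log plot with axes ln N and ln (R_N^2)^{1/2}; curves for Intron Seq., Exon Seq., and Random Seq.; labels End-to-end Range, Main Range, Geometric Range]

Fractal dimension of classes of sequences

  sequence    mean Dg    s.d. Dg    mean Dm    s.d. Dm
  EX          1.904      0.103      1.664      0.230
  RE          1.953      0.072      1.773      0.274
  IN          1.711      0.147      1.591      0.216
  RI          1.935      0.134      1.668      0.309

The mean values are the averages of Dg and Dm over each class of sequences, and the s.d. values are their standard deviations. EX denotes exon sequences, IN intron sequences, RE randomized exon sequences, and RI randomized intron sequences.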
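The splicing-site prediction counts reported earlier (38 and 22 for the 60 true splicing sites, 15 and 2785 for the 2800 pseudosplicing sites) can be turned into rates with a short calculation. The sketch below is only a check on the arithmetic; the function name and the terms sensitivity, specificity, and precision are not used in the slides and are introduced here as assumptions about how one might summarise the matrix.

```python
def prediction_rates(true_pos, false_neg, false_pos, true_neg):
    """Summarise a 2x2 prediction matrix as three rates."""
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "precision": true_pos / (true_pos + false_pos),
    }

# Counts from the distribution matrix: 38 and 22 in the splicing-site row,
# 15 and 2785 in the pseudosplicing-site row.
rates = prediction_rates(true_pos=38, false_neg=22, false_pos=15, true_neg=2785)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.633
# specificity: 0.995
# precision: 0.717
```

The sensitivity is the slide's success rate (38/60). The precision shows that 38 of the 53 sites predicted as splicing sites are true ones.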

Complexity analysis

Using the methods and techniques of cryptology.

Coincident index

The original form of the coincident index is:

$$ IC = \sum_{l=0}^{L-1} \frac{C_{f_l}^{2}}{C_{N}^{2}} = \sum_{l=0}^{L-1} \frac{f_l\,(f_l - 1)}{N\,(N - 1)} $$

in which N is the number of characters in the sequence, L is the number of different kinds of characters, and $f_l$ is the frequency (number of occurrences) of the l-th character in the sequence.

The coincident index, introduced from cryptology, is applied to DNA sequence analysis. In the mono-letter case, the expression for the coincident index IC of a DNA sequence is:

[Equation not recoverable from the extracted text; it involves the terms b_{jk}, sums over j and k, and the constant R.]

j = 0, 1, ..., L-1; k = A, C, G, T. R = 4 × 25 × (25 - 1) = 2400.

In the biletter case, the expression for IC is:

[Equation not recoverable from the extracted text; it has the same form as the mono-letter case.]

j = 0, 1, ..., L-1; k = AA, AC, AG, AT, CA, CC, ..., TG, TT. [The value of R for this case is not recoverable from the extracted text.]

Composite Artificial Neural Network System

Auto-annotation Procedure

  Annotation DataBase
  Sequence from 25 Whole genomes
  Compared with COG
  Select contig for ORF prediction
  Sequence collection
  Sequence Function Index
  Sequence Classification
  Perl Processing Module
  Result File Parsing
  Create Function Class for AceDB DataBase
  Report File Printing
  PSI-Blast

Sequence Data Storage and Distribution

  Database
  Perl modules
  Aceclient/perl
  Gifaceserver
  Cache: Db.xxx.html, Db.xxx.keyset, Db.xxx.boxes, Db.xxx.gif
  Cgi-bin: webace, display, getimg, label
  Html, gif
  JavaScript
  Java
  Cookies
  INTERNET

Data Analysis and Model Design

A. Complete Genome Analysis

Large-scale whole-genome shotgun sequencing

  Shotgun reactions: 109,970
  Total effective read length: 26,553,803 bp
  Genome length: 2,689,445 bp
  Redundancy: 9.87

PCR gap closure

  PCR reactions: 40,000
  PCR products sequenced: 4,100

[Figure: Circular representation of the genome of T. tengcongensis MB4]
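The redundancy figure in the shotgun statistics follows directly from the other two numbers: total effective read length divided by genome length. The snippet below reproduces that arithmetic. The variable names are chosen for this example, and the mean effective read length is a derived quantity that the slides do not state.

```python
# Figures reported for the whole-genome shotgun project.
shotgun_reads = 109_970
total_read_length_bp = 26_553_803
genome_length_bp = 2_689_445

# Redundancy: how many times, on average, each base was sequenced.
redundancy = total_read_length_bp / genome_length_bp

# Derived here, not stated in the slides: mean effective length per read.
mean_read_length_bp = total_read_length_bp / shotgun_reads

print(f"redundancy: {redundancy:.2f}")                             # redundancy: 9.87
print(f"mean effective read length: {mean_read_length_bp:.0f} bp")
```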

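The coincident index from the complexity analysis can be computed directly from character counts. The sketch below assumes the standard index-of-coincidence form, the sum of f(f - 1) over all characters divided by N(N - 1). It does not attempt the windowed mono-letter and biletter variants, whose exact expressions are not recoverable from the slides. The function name is an assumption made for the example.

```python
from collections import Counter

def coincident_index(sequence):
    """Chance that two characters drawn without replacement are equal."""
    n = len(sequence)
    if n < 2:
        raise ValueError("sequence must contain at least two characters")
    counts = Counter(sequence)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(coincident_index("AAAA"))                        # 1.0
print(coincident_index("ACGT"))                        # 0.0
print(round(coincident_index("CGCGGCGTGTGTTATA"), 3))  # 0.242
```

A sequence made of one repeated base gives the maximum value of 1, and a sequence in which no base repeats gives 0. Real DNA falls in between, which is what makes the index usable as a composition-based complexity measure.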