An online system for functional relationship analysis of genome-wide gene products

Qiang Hu, Zheng-Guo Zhang*
Department of Biomedical Engineering, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences; School of Basic Medicine, Peking Union Medical College, Beijing, China
*Email:

Abstract
Although functional relationship analysis of gene products is useful, a convenient and user-friendly tool for measuring the functional similarity of genome-wide gene products in multiple species is still not available. We computed the genome-wide functional similarity of gene products in human, mouse and rat based on our algorithm. Databases and web services were built on the precomputed similarity scores. Our system provides a group of tools to retrieve functional similarity and analyze the functional relationships of gene products. The web service is freely available at /fsim/index.html.

I. INTRODUCTION
Measuring the functional similarity of gene products is a useful way to investigate their relationships. One important application of functional similarity analysis is predicting and assessing protein-protein interactions [1], [2], [3]. Another is discovering positional candidate genes of diseases [4]. Functional similarity can also be used to cluster gene expression data, since functionally related genes have similar expression profiles [5].

Most methods for measuring functional similarity are based on the annotation information of gene products. The Gene Ontology (GO) database [6] provides a controlled vocabulary of terms for annotating the functions of gene products, and it is adopted by most algorithms and tools that measure functional similarity.

Although many tools have been developed to measure functional similarity, a convenient and user-friendly tool for analyzing the relationships of genome-wide gene products is still not available. The GO tools web page lists a large number of software packages based on the database. For example, AmiGO [7] and QuickGO [8] provide interfaces for searching and browsing the ontology and annotation data; users can compare the relationships of gene products with them, but not automatically. GOTax [9], which integrates the annotation data of proteins and protein families, provides a functional similarity search tool (FSST) based on the Information Content (IC) of GO terms. The tool can be used to measure the functional similarity of proteins and protein families. G-SESAME [10] introduced a new algorithm for measuring functional similarity, but its web tool can only measure the functional similarity of two gene products. FunSimMat [11] calculated the similarity of proteins in UniProtKB [12], and a web search engine was developed to retrieve the functional similarity of proteins.

It would be helpful if a tool could help biologists compare the functional relationships of genes of interest with all gene products in the genome. However, genome-wide relationship analysis cannot be carried out on ordinary computing servers: it would take dozens of hours even on a high-performance cluster.

We developed an online system for functional relationship analysis of genome-wide gene products. All-against-all functional similarity comparisons of genome-wide gene products in human, mouse and rat were precomputed with our algorithm, and three databases were built to store the similarity scores of the three species. Based on the precomputed scores, a web search engine was developed to retrieve the similarity scores directly, and several related tools were developed to extend the online web services. Biologists can use the system easily to analyze the functional relationships of genome-wide gene products.

II. CONSTRUCTION AND CONTENT
A. Data Sets
The raw data used to calculate the similarity came directly from the annotation packages of the R/Bioconductor project [13], [14]. For example, the packages org.Hs.eg.db, org.Mm.eg.db and org.Rn.eg.db contain the GO annotation data of gene products in human, mouse and rat, respectively. The packages are described in Table I. All these GO-related packages were built by the Bioconductor project from the latest version of the GO database as of March 2009. The annotation data for the probe IDs of different microarray platforms were also taken from Bioconductor annotation packages.

TABLE I
DATA SETS ADOPTED IN THE DATABASES

Annotation package   Species   Raw data
org.Hs.eg.db         Human     GO annotation
org.Mm.eg.db         Mouse     GO annotation
org.Rn.eg.db         Rat       GO annotation
org.Hs.sp.db         Human     Protein identifiers to Entrez IDs
org.Mm.sp.db         Mouse     Protein identifiers to Entrez IDs
org.Rn.sp.db         Rat       Protein identifiers to Entrez IDs
GO.db                -         GO terms relationship and annotation
KEGG.db              -         Annotation maps for KEGG database

B. Implementation
1) Algorithm: Three databases integrate all similarity scores of genome-wide gene products in human, mouse and rat, respectively. We proposed a novel algorithm to measure the relationship, in which a statistical model is built from the annotation terms shared by two gene products. GO provides three structured vocabularies (ontologies) that describe gene products in terms of their associated biological processes (BP), cellular components (CC) and molecular functions (MF). GO terms are connected to each other by child-parent relationships, and each of the three ontologies is structured as a directed acyclic graph (DAG), with terms lying at different levels. Terms located close to the leaves of the DAG describe more specific meanings and therefore contain more information than terms located close to the root. We defined a parameter, the Level Coefficient (LC), to denote the information weight of a GO term. The LC values of leaves are defined as 1; from children to parents, the LC values decrease gradually according to the ratio of their levels in the DAG.

978-1-4244-4713-8/10/$25.00 ©2010 IEEE
Fig. 1. Functional similarity search for gene products.

A gene is usually annotated by more than one term in the three ontologies, and the information of a term should also include the information of its ancestor terms. Thus, the terms shared by two gene products can be summarized in a contingency table, with the LC values counted into the table as the information weights of the terms. The relationship of two gene products can then be measured by statistically testing the agreement of the contingency table. We adopted the Kappa statistic to measure the agreement and a Z test to assess the significance of the Kappa value. When two gene products are functionally related, the Kappa value is close to 1.

2) Similarity Scores Computation: Each species has more than ten thousand gene products. An all-against-all comparison of all gene products requires so much computing power that ordinary computers cannot finish the calculation, so the computation was split into small tasks by dividing the input data. If the number of genome-wide gene products is n, the i-th task calculates the similarity scores between the i-th gene product and the first through i-th gene products. Different tasks were assigned to different CPUs of a high-performance cluster, and the results were then assembled into a matrix of similarity scores. Parallel programs written in R were developed for the computation; the R packages Rmpi [15] and snow [16] provide parallel interfaces to the MPI library of the cluster environment.

C. Databases
Three databases were created to store the precomputed similarity score matrices of all gene products in human, mouse and rat. The scores include the Kappa values and Z scores between every pair of gene products. For example, there are 17482 human gene products, so a score matrix of dimension 17482 × 17482 is stored in the database. The programs performing the computation were written in R [13]. The result matrices were so large that they were difficult to store in a regular relational database. We therefore split each large score matrix into hundreds of smaller matrices, which our system stores directly in R binary files (Rdata). The database files total approximately 4 gigabytes and can be loaded by R scripts.

Fig. 2. Online tools for functional relationship analysis.

D. Web System
The system can be accessed through the Internet to retrieve and analyze the functional relationships of gene products. The Apache HTTP server serves the HTML web pages; through it, users submit their data to the system and the results are returned on web pages. The R environment is the core of the system and is in charge of data analysis and interaction with the databases. rApache [17], a functional module of Apache, connects the web server and the R environment: the data and variables submitted by users are transferred to the R environment via Apache, and the results of the R programs are returned to users through the web server.

III. UTILITY AND DISCUSSION
A. Web Interfaces
Web interfaces to the databases and analysis tools were developed. As shown in Fig. 1, our web tools were designed to be concise and user-friendly. The system provides tools for functional similarity search and classification of gene products. Other tools, such as gene enrichment analysis, identifier conversion and GO annotation, were added to the system to assist data analysis. The FAQ page documents the tools and gives examples.

B. Functional Similarity Search for a Single Gene Product
The gFSim tool searches the genome for the gene products most related to a single gene product (Fig. 1A). Several identifier types are supported, including Entrez ID, Symbol, UniGene and SwissProt ID, and gene products of three species (human, mouse and rat) can be searched. The number of gene products in the results can be specified; by default, the top 100 functionally similar gene products are returned. The search results show Entrez IDs, annotated GO terms and Z scores (Fig. 1B), and gene products annotated with the same GO terms are placed in the same row. The search results can also be downloaded as a CSV (comma-separated values) file.

C. Functional Similarity Analysis for a Group of Gene Products
The gsFSim tool retrieves and analyzes the functional relationships of a group of gene products (Fig. 1C). It supports the same identifier types and species as gFSim. A group of gene products can be submitted separated by commas, semicolons, spaces or line breaks. The results show a similarity score matrix of Kappa values for the input gene products, and the matrix is also visualized graphically. A heat map (Fig. 1D) shows the annotated GO terms of the gene products: blue means that a GO term annotates the corresponding gene product, and black means that it does not. A dendrogram (Fig. 1E) shows the hierarchical clustering of the similarity score matrix, which classifies the gene products into groups according to their functional relationships.

D. Enrichment Analysis
Gene enrichment analysis [18] is a useful method for discovering the functional annotations that are specific to selected genes relative to all (universe) genes. As shown in Fig. 2A, the annotation database is selected first; the BP, MF and CC ontologies of GO and the KEGG pathway database [19] are supported. Then the p-value cutoff of the significance test in the enrichment analysis can be specified; the default is 0.05. A term that is more specific and important for the selected gene products gets a smaller p-value, so the cutoff can be used to restrict the number of results; if the analysis returns no result, a larger cutoff can be assigned. The gene products of interest are submitted as the selected genes, and all gene products should be submitted as the universe genes. The results include the significantly enriched functions, p-values, odds ratios and annotated counts (Fig. 2B), and they can be downloaded as a CSV file. The enrichment analysis tool can also be used to analyze the results of a functional search for a group of gene products (gsFSim).

E. Microarray Probe ID Conversion
The microarray probe ID conversion tool converts probe IDs from different microarray platforms to Entrez IDs (Fig. 2C). Most commercial gene chips, such as Affymetrix, Agilent, GE (General Electric) and Illumina, are supported. The converted Entrez IDs can then be submitted to the other tools to analyze functional relationships, so the tool extends the identifier types supported by the system.

F. GO Annotation
A set of GO terms can be submitted to the annotation tool to retrieve detailed information in batch. The results include the term names, definitions, synonyms and LC values, sorted in descending order of LC. Because LC denotes the weighted information of a GO term, terms with more specific biological meanings are shown first.

IV. CONCLUSION
To provide a powerful and user-friendly tool for analyzing the functional relationships of genome-wide gene products, we precomputed the functional similarity scores of all gene products in human, mouse and rat based on our algorithm. An online system was developed on the basis of these precomputed scores. The system provides a group of tools to retrieve functional similarity and analyze the relationships of genome-wide gene products. Our web services are freely available at /fsim/index.html.

ACKNOWLEDGMENT
This work was partially supported by China Medical Board of New York, Inc. #03-787. The similarity score matrices were computed at the High Performance Computing Center, Peking Union Medical College.

REFERENCES
[1] L. J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, "Assessing the limits of genomic data integration for predicting protein networks," Genome Res, vol. 15, no. 7, pp. 945-953, Jul 2005.
[2] A. Schlicker, C. Huthmacher, F. Ramírez, T. Lengauer, and M. Albrecht, "Functional evaluation of domain-domain interactions and human protein interaction networks," Bioinformatics, vol. 23, no. 7, pp. 859-865, Apr 2007.
[3] M. E. Futschik, G. Chaurasia, and H. Herzel, "Comparison of human protein-protein interaction maps," Bioinformatics, vol. 23, no. 5, pp. 605-611, Mar 2007.
[4] E. A. Adie, R. R. Adams, K. L. Evans, D. J. Porteous, and B. S. Pickard, "SUSPECTS: enabling fast and effective prioritization of positional candidates," Bioinformatics, vol. 22, no. 6, pp. 773-774, Mar 2006.
[5] Y. Qu and S. Xu, "Supervised cluster analysis for microarray data based on multivariate Gaussian mixture," Bioinformatics, vol. 20, no. 12, pp. 1905-1913, Aug 2004.
[6] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock, "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium," Nat Genet, vol. 25, no. 1, pp. 25-29, May 2000.
[7] S. Carbon, A. Ireland, C. J. Mungall, S. Shu, B. Marshall, S. Lewis, A. O. Hub, and W. P. W. Group, "AmiGO: online access to ontology and annotation data," Bioinformatics, vol. 25, no. 2, pp. 288-289, Jan 2009.
[8] D. Binns, E. Dimmer, R. Huntley, D. Barrell, C. O'Donovan, and R. Apweiler, "QuickGO: a web-based tool for Gene Ontology searching," Bioinformatics, vol. 25, no. 22, pp. 3045-3046, Nov 2009.
[9] A. Schlicker, J. Rahnenführer, M. Albrecht, T. Lengauer, and F. S. Domingues, "GOTax: investigating biological processes and biochemical activities along the taxonomic tree," Genome Biol, vol. 8, no. 3, p. R33, 2007.
[10] Z. Du, L. Li, C.-F. Chen, P. S. Yu, and J. Z. Wang, "G-SESAME: web tools for GO-term-ba
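Section II.B.1 defines the Level Coefficient only loosely: leaves get LC = 1 and values "decrease gradually according to the ratio of their levels". The sketch below is one possible reading, not the authors' formula. It assumes depth(t) is the longest path from the root (root depth 1) and that LC(t) = depth(t) / depth of the deepest leaf below t. It also shows the ancestor propagation the text requires ("the information of a term should also include the information of its ancestor terms"). The names `ancestors`, `propagate` and `level_coefficients`, and the `parents` dictionary format, are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical input format: each GO term maps to its parent terms.
Parents = Dict[str, List[str]]


def ancestors(term: str, parents: Parents) -> Set[str]:
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate(annotations: Set[str], parents: Parents) -> Set[str]:
    """Expand a gene's direct annotations with every ancestor term."""
    out: Set[str] = set()
    for t in annotations:
        out |= ancestors(t, parents)
    return out


def level_coefficients(parents: Parents) -> Dict[str, float]:
    """Assumed LC rule: depth(t) / depth of the deepest leaf below t.

    depth(root) = 1 and depth(t) = 1 + max depth of its parents, so every
    leaf gets LC = 1 and LC strictly decreases from a child to its parents.
    """
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    children: Dict[str, List[str]] = {t: [] for t in terms}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    depth: Dict[str, int] = {}

    def get_depth(t: str) -> int:
        if t not in depth:
            depth[t] = 1 + max((get_depth(p) for p in parents.get(t, [])), default=0)
        return depth[t]

    leaf_depth: Dict[str, int] = {}

    def get_leaf_depth(t: str) -> int:
        if t not in leaf_depth:
            leaf_depth[t] = max((get_leaf_depth(c) for c in children[t]),
                                default=get_depth(t))  # a leaf is its own deepest leaf
        return leaf_depth[t]

    return {t: get_depth(t) / get_leaf_depth(t) for t in terms}
```

For a toy DAG with root R, child A, grandchild B and a second leaf C under R, this assigns B and C an LC of 1, A an LC of 2/3 and R an LC of 1/3. Under this assumed rule, sorting terms with `sorted(lc, key=lc.get, reverse=True)` gives the most specific terms first, which is the ordering described for the GO Annotation tool.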
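Section II.B.1 scores a pair of gene products by testing the agreement of an LC-weighted contingency table with the Kappa statistic and a Z test. The paper does not give the exact table layout or variance formula, so the following is a sketch under stated assumptions. Every term in a term universe falls into one of four cells (annotated to both genes, only A, only B, neither) and contributes its LC weight. Kappa is Cohen's (p_obs - p_exp) / (1 - p_exp). The Z score uses Cohen's large-sample null standard error sqrt(p_exp / (N (1 - p_exp))), with the total weight standing in for N. That substitution is an assumption. The function name `weighted_kappa` is hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def weighted_kappa(terms_a: Set[str], terms_b: Set[str],
                   weights: Dict[str, float]) -> Tuple[float, float]:
    """Cohen's kappa on an LC-weighted 2x2 table, plus an approximate Z score.

    `weights` maps every term of the universe to its (assumed) LC weight;
    `terms_a` / `terms_b` are the ancestor-propagated annotations of two genes.
    """
    a = b = c = d = 0.0  # a: both, b: only A, c: only B, d: neither
    for term, w in weights.items():
        in_a, in_b = term in terms_a, term in terms_b
        if in_a and in_b:
            a += w
        elif in_a:
            b += w
        elif in_b:
            c += w
        else:
            d += w
    n = a + b + c + d
    if n <= 0:
        raise ValueError("the term universe must carry positive total weight")
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if not 0.0 < p_exp < 1.0:
        return float("nan"), float("nan")  # degenerate table: kappa undefined
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    # Cohen's (1960) standard error under H0 (kappa = 0); using the total
    # weight n as the sample size is an assumption, not the paper's formula.
    se0 = math.sqrt(p_exp / (n * (1.0 - p_exp)))
    return kappa, kappa / se0
```

With four terms of weight 1, two genes sharing exactly the same two terms give Kappa = 1 (and Z = 2), while two genes with disjoint terms give Kappa = -1. This matches the paper's statement that functionally related gene products have Kappa close to 1. Note that the universe must include terms annotated to neither gene; otherwise the "neither" cell is empty and Kappa is biased downward.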
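Section II.B.2 splits the all-against-all job so that task i compares gene product i with gene products 1 through i, and assembles the results into a symmetric matrix. The authors used R with Rmpi and snow. The Python sketch below only illustrates the same decomposition. `mapper` stands in for the cluster scheduler: the built-in `map` runs serially, and `concurrent.futures.ProcessPoolExecutor().map` is one standard-library way to spread tasks over processes. The names `row_task` and `all_against_all` are hypothetical.

```python
from functools import partial
from typing import Callable, List, Sequence


def row_task(i: int, items: Sequence, sim: Callable) -> List[float]:
    """Task i: similarity of item i with items 0..i (one lower-triangle row)."""
    return [sim(items[i], items[j]) for j in range(i + 1)]


def all_against_all(items: Sequence, sim: Callable, mapper=map) -> List[List[float]]:
    """Run the n row tasks through `mapper` and assemble a symmetric matrix.

    `mapper` defaults to the built-in map. Passing
    concurrent.futures.ProcessPoolExecutor().map distributes the tasks over
    processes; `sim` and `items` must then be picklable.
    """
    n = len(items)
    task = partial(row_task, items=items, sim=sim)
    rows = mapper(task, range(n))  # results come back in task order
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, score in enumerate(row):
            matrix[i][j] = matrix[j][i] = score  # fill both triangles
    return matrix
```

Only n(n + 1)/2 similarity evaluations are needed because the score is symmetric. Row i costs i + 1 comparisons, so tasks are unevenly sized. One possible way to balance CPUs is to pair row i with row n - 1 - i, giving every pair exactly n + 1 comparisons. This is a design option, not something the paper describes.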
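Section II.C says each large score matrix was cut into hundreds of smaller matrices saved as Rdata files. The paper does not say how the matrix was cut. The sketch below assumes horizontal strips, so that a single-gene query (as in gFSim) reads one file to obtain a gene's full row of scores. NumPy `.npy` files stand in for Rdata here. The file naming scheme and the names `save_strips` and `load_row` are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_strips(matrix: np.ndarray, rows_per_strip: int, directory: str) -> int:
    """Split a square score matrix into horizontal strips, one .npy file each."""
    count = 0
    for start in range(0, matrix.shape[0], rows_per_strip):
        path = os.path.join(directory, f"strip_{count:04d}.npy")
        np.save(path, matrix[start:start + rows_per_strip])
        count += 1
    return count


def load_row(i: int, rows_per_strip: int, directory: str) -> np.ndarray:
    """Fetch row i (one gene against all genes) by opening a single strip."""
    strip_index, offset = divmod(i, rows_per_strip)
    path = os.path.join(directory, f"strip_{strip_index:04d}.npy")
    strip = np.load(path, mmap_mode="r")  # memory-map: only the needed row is read
    return np.array(strip[offset])        # copy out so the file can be released


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        scores = np.arange(25, dtype=float).reshape(5, 5)
        print(save_strips(scores, 2, d), load_row(3, 2, d))
```

As an illustration of scale, 100 rows per strip would turn the 17482 × 17482 human matrix into 175 files. The strip height trades the number of files against the amount of data read per query.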
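Section III.C describes gsFSim: identifiers pasted with commas, semicolons, spaces or line breaks, a Kappa similarity matrix, and a dendrogram from hierarchical clustering. The paper does not state the distance or the linkage method. This sketch assumes distance = 1 - Kappa and average linkage, using SciPy rather than the R functions the authors' system would have used. The names `parse_ids` and `cluster_by_kappa` are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def parse_ids(text: str) -> List[str]:
    """Split pasted identifiers on commas, semicolons, spaces or line breaks."""
    return [tok for tok in re.split(r"[,;\s]+", text) if tok]


def cluster_by_kappa(kappa: Sequence[Sequence[float]], n_groups: int,
                     method: str = "average") -> Tuple[np.ndarray, np.ndarray]:
    """Hierarchically cluster gene products using 1 - kappa as the distance."""
    dist = 1.0 - np.asarray(kappa, dtype=float)
    np.fill_diagonal(dist, 0.0)     # a gene is at distance 0 from itself
    dist = (dist + dist.T) / 2.0    # guard against tiny asymmetries
    tree = linkage(squareform(dist, checks=False), method=method)
    labels = fcluster(tree, t=n_groups, criterion="maxclust")
    return tree, labels
```

The returned `tree` is a standard SciPy linkage matrix, so `scipy.cluster.hierarchy.dendrogram(tree)` can draw a plot comparable to Fig. 1E. `labels` assigns each gene product to one of `n_groups` flat groups.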
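Section III.D lists the enrichment tool's inputs (selected genes, universe genes, a p-value cutoff defaulting to 0.05) and outputs (enriched terms, p-values, odds ratios, annotated counts). It does not name the test. The sketch below assumes the common one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) and the sample odds ratio of the 2x2 table. The function names and the `annotation` mapping (term to set of gene IDs) are hypothetical. `math.comb` requires Python 3.8 or later.

```python
import math
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N universe genes, K annotated, n drawn)."""
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(n, K) + 1)) / comb(N, n)


def enrich(selected: Set[str], universe: Set[str],
           annotation: Dict[str, Set[str]], cutoff: float = 0.05
           ) -> List[Tuple[str, float, float, int]]:
    """Return (term, p-value, odds ratio, count) for terms with p <= cutoff."""
    selected = selected & universe
    N, n = len(universe), len(selected)
    results = []
    for term, genes in annotation.items():
        K = len(genes & universe)   # annotated genes in the universe
        k = len(genes & selected)   # annotated genes among the selected
        if k == 0:
            continue
        p = hypergeom_upper_tail(k, n, K, N)
        num = k * (N - K - n + k)
        den = (n - k) * (K - k)
        odds = num / den if den else (math.inf if num else math.nan)
        if p <= cutoff:
            results.append((term, p, odds, k))
    return sorted(results, key=lambda r: r[1])  # most significant first
```

For a 10-gene universe where a term annotates exactly the 3 selected genes, the p-value is 1 / C(10, 3) = 1/120 and the odds ratio is infinite. A term hitting only one of the selected genes among 5 annotated genes gets p = 110/120 and is filtered out at the default cutoff. Raising `cutoff` when nothing passes mirrors the advice in Section III.D.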