Data Mining: Foreign-Language Literature Translation
(The document contains the English original and a Chinese translation.)

Original text:

What is Data Mining?

Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer which carries both “data” and “mining” became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:

· data cleaning: to remove noise or irrelevant data,
· data integration: where multiple data sources may be combined,
· data selection: where data relevant to the analysis task are retrieved from the database,
· data transformation: where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,
· data mining: an essential process where intelligent methods are applied in order to extract data patterns,
· pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and
· knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.

We agree that data mining is a step in the knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term of “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.

Based on this view, the architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.
2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.
3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, evolution and deviation analysis.
5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.
6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases. By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore, data mining is considered one of the most important frontiers in database systems and one of the most promising new database applications in the information industry.

A classification of data mining systems

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.

Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.

1) Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems, and legacy data mining systems.
2) Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc. A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities. Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, including generalized knowledge (at a high level of abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.
3) Classification according to the kinds of techniques utilized. Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.

Translation:

What is data mining?

Put simply, data mining means extracting, or “mining”, knowledge from large amounts of data. The term is actually something of a misnomer. Note that mining gold from ore or sand is called gold mining, not ore mining. Data mining would therefore be more accurately named “mining knowledge from data”, which is unfortunately rather long. “Knowledge mining” is a shorter term, but it may not convey the sense of mining from large amounts of data. After all, mining is a vivid term that captures the process of finding a few gold nuggets in a great deal of raw material. So this misnomer, carrying both “data” and “mining”, became the popular choice. Several other terms have meanings similar to, but slightly different from, data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people regard data mining as a synonym for another commonly used term, knowledge discovery in databases, or KDD. Others regard data mining as merely one basic step in the process of knowledge discovery in databases. The knowledge discovery process consists of the following steps:

1) data cleaning: remove noise or inconsistent data,
2) data integration: multiple data sources can be combined,
3) data selection: retrieve from the database the data relevant to the analysis task,
4) data transformation: transform or consolidate data into forms suitable for mining, for example through summary or aggregation operations,
5) data mining: the basic step, in which intelligent methods are used to extract data patterns,
6) pattern evaluation: identify the truly interesting patterns that represent knowledge, according to some interestingness measure,
7) knowledge presentation: use visualization and knowledge representation techniques to present the mined knowledge to the user.

The data mining step can interact with the user or a knowledge base. Interesting patterns are presented to the user or stored in the knowledge base as new knowledge. Note that, on this view, data mining is only one step in the whole process, albeit the most important one, because it uncovers hidden patterns.

We agree that data mining is one step in the knowledge discovery process. However, in industry, the media, and the database research community, “data mining” is more popular than the longer term “knowledge discovery in databases”. Therefore, the term chosen in this book is data mining. We adopt a broad view of data mining: data mining is the process of mining interesting knowledge from large amounts of data stored in databases or other information repositories.

Based on this view, a typical data mining system has the following major components:

Database, data warehouse, or other information repository: This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and integration can be performed on the data.
Database or data warehouse server: The database or data warehouse server is responsible for fetching the relevant data according to the user’s data mining request.
Knowledge base: This is domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. Such knowledge may include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge about user beliefs can also be included; it can be used to assess a pattern’s interestingness according to its unexpectedness. Other examples of domain knowledge are interestingness constraints or thresholds, and metadata (for example, describing data from multiple heterogeneous sources).
Data mining engine: This is the basic part of the data mining system and consists of a set of functional modules for characterization, association, classification, cluster analysis, and evolution and deviation analysis.
Pattern evaluation module: Typically, this component uses interestingness measures and interacts with the data mining modules so as to focus the search on interesting patterns. It may use interestingness thresholds to filter the discovered patterns. The pattern evaluation module may also be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is recommended to push pattern evaluation as deep as possible into the mining process, so as to confine the search to interesting patterns.
Graphical user interface: This module communicates between the user and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From the data warehouse point of view, data mining can be seen as an advanced stage of on-line analytical processing (OLAP). However, by incorporating more advanced techniques for data understanding, data mining goes much further than the summarization-style analytical processing of data warehouses.

Although many “data mining systems” are already on the market, not all of them can perform true data mining. A data analysis system that cannot handle large amounts of data is at most called a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values or answering deductive queries in large databases, should be classified as a database system
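The knowledge discovery steps listed in the text (cleaning, integration, selection, transformation, mining, pattern evaluation, presentation) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from the source: the toy market-basket records, the function names, the choice of co-occurring item pairs as the "pattern", and the use of support (the fraction of baskets containing a pair) as the interestingness measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two "sources" of market-basket records.
SOURCE_A = [
    {"id": 1, "items": ["bread", "milk", None]},
    {"id": 2, "items": ["bread", "butter"]},
]
SOURCE_B = [
    {"id": 3, "items": ["Milk", "bread", "butter"]},
    {"id": 4, "items": ["milk", "bread"]},
    {"id": 5, "items": []},
]


def clean(records):
    """Data cleaning: drop missing items and normalise case."""
    return [
        {"id": r["id"], "items": [i.lower() for i in r["items"] if i]}
        for r in records
    ]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for src in sources for r in src]


def select(records):
    """Data selection: keep only records relevant to the task (non-empty baskets)."""
    return [r for r in records if r["items"]]


def transform(records):
    """Data transformation: consolidate each record into a sorted tuple of distinct items."""
    return [tuple(sorted(set(r["items"]))) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


def run_pipeline(min_support=0.5):
    baskets = transform(select(integrate(clean(SOURCE_A), clean(SOURCE_B))))
    return present(evaluate(mine(baskets), len(baskets), min_support))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In this sketch the evaluation stage runs after mining as a separate filter. The text's recommendation to push interestingness evaluation into the mining process would correspond to pruning low-support candidates while counting, which this toy version does not attempt.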