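Introduction to Encyclopedic and Buddhist Knowledge Graph Construction Techniques

Guilin Qi (漆桂林), Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
  Encyclopedic knowledge graph construction techniques
  Buddhist knowledge graph construction techniques

Introduction of Knowledge Bases
  What is knowledge?
    Facts, information, descriptions, or skills
    Acquired through experience or education by perceiving, discovering, or learning
  Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc.
  A knowledge base is a principal part of expert systems.
  "the power of an AI program came to be seen as largely in its knowledge base" (Edward Feigenbaum, 1994 ACM Turing Award)

Development of Knowledge Base in Recent Decades
  [Timeline figure. Years: 1985, 1990, 1995, 2000, 2005, 2010, 2012. Labels recoverable from the figure: (#$capitalCity #$France #$Paris); student, enrollee, person; 35 million articles in 288 different languages; … 15 thousand concepts, 600 million instances, 20 billion facts; NELL; Google]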

Knowledge Graph (KG)
  It is a new generation of intelligent search technology, which enables you to search for things, not strings.
  Formal definition: a knowledge graph is a knowledge base with graph structure, where the nodes are instances or concepts, and the edges are relations between them.
  It is a special semantic network.
  It belongs to knowledge engineering.

Example
  [Figure: company relationship graph. Node labels: 中興通訊, 中興康訊, Acacia (IPO in progress), 卓翼科技, 美國高通, 共進股份, 宇順電子, 美國博通, 中國移動, 英特爾, 華為, 中國聯通, 大富科技, 華星創業, 盛路通信, 超聲電子. Type and relation labels: listed company, unlisted company, subsidiary, supplier, customer, competitor, partner.]

KG and Semantic Search
  Go deeper and broader.

Technologies of Knowledge Base Construction
  Baidu, Hudong, Zh-Wikipedia

Knowledge Graph (KG) Construction from Online Encyclopedias
  Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
  Technologies of encyclopedic knowledge graph construction:
    Data extraction
    Entity matching
    Type inference

Zhishi.me
  Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link them together as Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
  Currently, it consists of structured data extracted from the three largest Chinese encyclopedia sites:
    Baidu Baike
    Hudong Baike
    Chinese Wikipedia
  It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service and a SPARQL endpoint.

Data Extraction
  Labels: rdfs:label
  Abstracts: zhishi:abstract, rdfs:comment, dbpedia:abstract
  Redirects: zhishi:pageRedirects
  Images: zhishi:thumbnail
  Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220

Data Extraction
  Infobox properties: http://zhishi.me/[sourceName]/property/[propertyName]
  Example: http://zhishi.me/baidubaike/property/中文名稱 "南京"@zh

Data Extraction
  Internal links: zhishi:internalLink
  Other properties listed on the slide: zhishi:category, skos:broader

Entity Matching
  Baidu: 北京 and Zh-Wiki: 北京市 are equivalent entities.

Entity Matching
  Automatically discover and refine dataset-specific matching rules in iterations.
  Derive these rules by finding the most discriminative data characteristics for a given data source pair, e.g. (baidu:北京, Zh-wiki:北京市).
  From Haofen Wang

Entity Matching
  For each pair of existing matched instances, their property-value pairs are merged.

  Values                   | Property_1     | Property_2
  "大熊貓"                  | baidu:標簽      | hudong:中文學名
  "Ailuropoda melanoleuca" | baidu:拉丁學名  | hudong:二名法
  "白鰭豚"                  | baidu:標簽      | hudong:中文學名
  "桂花"                    | baidu:標簽      | hudong:中文學名
  …                        | …              | …
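The merged table can be read as evidence for which property pairs tend to carry the same value. The following is a minimal sketch of that counting step, assuming each instance is a plain dict from property name to value. The function name `mine_property_pairs` and the `min_support` threshold are illustrative, not taken from the slides, and this is a simple support count rather than a full frequent itemset miner.

```python
from collections import Counter

def mine_property_pairs(matched_pairs, min_support=2):
    """Count (property_1, property_2) pairs that hold equal values in matched instances."""
    support = Counter()
    for left, right in matched_pairs:
        for p1, v1 in left.items():
            for p2, v2 in right.items():
                if v1 == v2:
                    support[(p1, p2)] += 1
    # Keep only the pairs seen at least min_support times (threshold is an assumption).
    return {pair: n for pair, n in support.items() if n >= min_support}

# Values come from the table above; grouping them into instances is assumed.
matched = [
    ({"baidu:標簽": "大熊貓", "baidu:拉丁學名": "Ailuropoda melanoleuca"},
     {"hudong:中文學名": "大熊貓", "hudong:二名法": "Ailuropoda melanoleuca"}),
    ({"baidu:標簽": "白鰭豚"}, {"hudong:中文學名": "白鰭豚"}),
    ({"baidu:標簽": "桂花"}, {"hudong:中文學名": "桂花"}),
]
frequent = mine_property_pairs(matched)
```

With this toy input only the pair (baidu:標簽, hudong:中文學名) reaches the assumed threshold, because it recurs in all three matched pairs.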

Entity Matching
  Matching rule (frequent set mining): baidu:x and hudong:x are matched iff
    valueOf(baidu:標簽) = valueOf(hudong:中文學名) and
    valueOf(baidu:拉丁學名) = valueOf(hudong:二名法) and
    valueOf(baidu:綱) = valueOf(hudong:綱)
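The rule above is a conjunction of equality tests, which can be sketched as follows. The dict representation of instances, the helper names, and the choice to treat a missing property as a non-match are assumptions. The value 哺乳綱 is made up for illustration.

```python
# Property pairs from the matching rule on the slide.
RULE = [
    ("baidu:標簽", "hudong:中文學名"),
    ("baidu:拉丁學名", "hudong:二名法"),
    ("baidu:綱", "hudong:綱"),
]

def value_of(instance, prop):
    """Look up a property value; None when the property is absent."""
    return instance.get(prop)

def is_match(baidu_x, hudong_x, rule=RULE):
    """True only if every property pair in the rule is present on both sides with equal values."""
    for p1, p2 in rule:
        v1, v2 = value_of(baidu_x, p1), value_of(hudong_x, p2)
        if v1 is None or v2 is None or v1 != v2:
            return False
    return True

# Illustrative instances; the 綱 values are hypothetical.
baidu_panda = {"baidu:標簽": "大熊貓", "baidu:拉丁學名": "Ailuropoda melanoleuca", "baidu:綱": "哺乳綱"}
hudong_panda = {"hudong:中文學名": "大熊貓", "hudong:二名法": "Ailuropoda melanoleuca", "hudong:綱": "哺乳綱"}
hudong_baiji = {"hudong:中文學名": "白鰭豚", "hudong:綱": "哺乳綱"}
```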

Entity Matching
  Apply the obtained rule(s) to the unlabeled data to generate match candidates.
  The combiner is used to combine the confidence values of a match candidate.

Type Inference
  Type information, stating that an instance is of a certain type (e.g. China is an instance of country), is an important component of knowledge bases.
  Given an application scenario: Question Answering.
    Question: Who is the Nobel laureate in literature of the People's Republic of China?
    Answer: Mo Yan.
    How to get the answer? Mo Yan InstanceOf Nobel laureate of the People's Republic of China
  Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229
  The 4th Joint International Semantic Technology Conference

Approach
  In Chinese online encyclopedias, we discover that lots of fine-grained types exist in the categories of article pages.
  "Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium"

Approach
  For example, given the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia, its categories are as follows:
  [Figure: categories of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia]

Approach (cont.)
  We take the categories of one given instance as its candidate types and try to filter out the noise by leveraging the attributes.
  Intuitively, when given the attributes of a certain instance as follows:
    "actors, release date, director": an instance of "Movie"
    "name, foreign name": an instance of "?"
  We assume that if an instance contains the representative attributes of one candidate type, the instance probably belongs to this type.
  But another problem is: category attributes are not abundantly available.

Approach (cont.)
  Explicit IsA Relation Detector: detect explicit instanceOf and subclassOf relations.
  Category Attributes Generator: generate attributes for categories with an attribute propagation algorithm.
  Instance Type Ranker: rank candidate types with a graph-based random walk method.

Explicit IsA Relation Detector
  Explicit InstanceOf Relation Detection: mining explicit instanceOf relations from infoboxes
    I = {i1, i2, …, in}: an instance set (all articles)
    C = {c1, c2, …, cm}: a concept set (all article categories)
    Infobox (attribute, value): {<a1, v1>, …, <ak, vk>}, an AVP set
    vk instanceOf ak
    Example: <director, Steven Spielberg>
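The infobox step can be sketched as a filter over attribute-value pairs. The slide does not spell out when "vk instanceOf ak" is emitted. The sketch assumes the value must be a known instance (an article) and the attribute a known concept (a category). The function name and the sample sets are hypothetical.

```python
def explicit_instance_of(avps, instances, concepts):
    """Return (value, attribute) pairs, each read as 'value instanceOf attribute'.

    Assumed condition: the value is in the instance set I and the
    attribute is in the concept set C.
    """
    return [(v, a) for a, v in avps if v in instances and a in concepts]

# Hypothetical instance set, concept set and infobox for illustration.
I = {"Steven Spielberg", "Jurassic Park"}
C = {"director", "film"}
infobox = [
    ("director", "Steven Spielberg"),
    ("release date", "1993"),
    ("name", "Jurassic Park"),
]
relations = explicit_instance_of(infobox, I, C)
```

Under these assumptions only the pair from the slide's example survives the filter: the release date is not an article, and "name" is not a category.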

Explicit IsA Relation Detector
  Explicit InstanceOf Relation Detection: mining explicit instanceOf relations from abstracts
    Perform dependency parsing with FudanNLP [Qiu et al., 2013]: Subject, Predicate, Object
    Example: 邁克爾·喬丹 (Michael Jeffrey Jordan) instanceOf 籃球運動員 (Basketball Player)

Explicit IsA Relation Detector
  Explicit SubclassOf Relation Detection
    Generate candidate SubclassOf category pairs in the form of (sub-category, category) based on the category system.
    Check whether the sub-category and the category share the same lexical head, using POS tagging.
    For each (sub-category, category), check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
    Example: 江蘇學校 (school in Jiangsu) subclassOf 中國學校 (school in China)
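The lexical head check can be sketched on pre-tagged input. The sketch assumes some POS tagger has already segmented and tagged each category name, and that the head is the last noun token. Both points are simplifications, and the tag names NN and NR follow the Penn Chinese Treebank convention. The Zhishi.schema lookup is not covered here.

```python
def lexical_head(tagged_tokens, noun_tags=("NN", "NR")):
    """Return the last noun token, used here as a stand-in for the lexical head."""
    for token, tag in reversed(tagged_tokens):
        if tag in noun_tags:
            return token
    return None

def share_lexical_head(sub_category, category):
    """True when both tagged category names have the same (non-empty) head."""
    head_sub = lexical_head(sub_category)
    head_cat = lexical_head(category)
    return head_sub is not None and head_sub == head_cat

# Hypothetical tagger output for the slide's example and one counterexample.
jiangsu_schools = [("江蘇", "NR"), ("學校", "NN")]
china_schools = [("中國", "NR"), ("學校", "NN")]
jiangsu_people = [("江蘇", "NR"), ("人物", "NN")]
```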

Category Attributes Generator
  We take attributes in infobox templates as existing category attributes, and attributes in the infoboxes of article pages as instance attributes.
  We construct a Category Graph composed of all categories with subclassOf relations.
  We propagate attributes over the Category Graph, leveraging existing category attributes, instance attributes, and the identified instanceOf and subclassOf relations.

Category Attributes Generator
  The attribute propagation algorithm is based on the following rules:
    Rule 1: If a category c has attributes from infobox templates, these attributes should remain unchanged.
    Rule 2: If a category c has some instances with attributes, the attributes should be propagated to c when they are shared by more than half of these instances.
    Rule 3: If a category c has some child categories with attributes, the attributes should be propagated to c when they are shared by more than half of these child categories.
    Rule 4: If parent categories of a category c have attributes, all the attributes should be inherited by c.
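The four rules can be sketched as a fixed-point loop over the Category Graph. The data layout (dicts keyed by category name), the iteration order, and the function names are assumptions. The slides do not say how the rules are scheduled, so this is one plausible reading rather than the authors' algorithm.

```python
from collections import Counter

def majority(attr_sets):
    """Attributes shared by more than half of the given attribute sets."""
    counts = Counter(a for s in attr_sets for a in set(s))
    return {a for a, n in counts.items() if n > len(attr_sets) / 2}

def propagate(categories, template_attrs, instance_attrs, children, parents):
    """Apply Rules 1-4 repeatedly until no category gains a new attribute."""
    attrs = {c: set(template_attrs.get(c, ())) for c in categories}
    changed = True
    while changed:
        changed = False
        for c in categories:
            if c in template_attrs:            # Rule 1: template attributes stay fixed
                continue
            new = set(attrs[c])
            inst = instance_attrs.get(c, [])
            if inst:
                new |= majority(inst)          # Rule 2: majority of instances
            kids = [attrs[k] for k in children.get(c, []) if attrs[k]]
            if kids:
                new |= majority(kids)          # Rule 3: majority of child categories
            for p in parents.get(c, []):
                new |= attrs[p]                # Rule 4: inherit from parents
            if new != attrs[c]:
                attrs[c], changed = new, True
    return attrs

# Hypothetical two-node Category Graph for illustration.
result = propagate(
    categories=["Film", "Chinese film"],
    template_attrs={"Film": {"director", "release date"}},
    instance_attrs={"Chinese film": [{"director", "language"}, {"language", "actors"}, {"language"}]},
    children={"Film": ["Chinese film"]},
    parents={"Chinese film": ["Film"]},
)
```

Attribute sets only grow during the loop and the attribute vocabulary is finite, so the loop terminates.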

Instance Type Ranker
  We organize each given instance, its attributes and the categories (i.e. candidate types) of the corresponding article page into an Instance Graph.
  We group synonymous attributes with BabelNet before constructing all Instance Graphs.
  We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
  When executing a random step from the given instance to one of its attributes, the walk tends to choose the most representative attribute in order to walk to the correct categories.
  When executing a random step from an attribute to one of the categories in the article page, the categories containing this attribute have equal opportunity.
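The two random steps described above can be sketched as a two-hop walk on the Instance Graph. The sketch assumes that representativeness is 1 divided by the number of categories containing the attribute, and that a candidate type's score is the probability of reaching it after two steps. The function name and the toy data are hypothetical, and the slides do not give the actual transition weights.

```python
def rank_types(instance_attrs, candidate_types, category_attrs):
    """Score candidate types by a two-step walk: instance -> attribute -> category."""
    # Step 1 weights: attributes held by fewer categories are more representative.
    weights = {}
    for a in instance_attrs:
        n = sum(1 for attrs in category_attrs.values() if a in attrs)
        if n:
            weights[a] = 1.0 / n
    total = sum(weights.values())
    scores = {c: 0.0 for c in candidate_types}
    if total == 0:
        return scores
    for a, w in weights.items():
        # Step 2: candidate categories containing the attribute are equally likely.
        holders = [c for c in candidate_types if a in category_attrs.get(c, ())]
        for c in holders:
            scores[c] += (w / total) / len(holders)
    return scores

# Hypothetical category attributes; "Person" is not a candidate type of this page.
category_attrs = {
    "Movie": {"actors", "release date", "director", "name"},
    "American culture": {"name"},
    "Person": {"name"},
}
scores = rank_types(
    ["actors", "release date", "director", "name"],
    ["Movie", "American culture"],
    category_attrs,
)
```

In this toy graph "name" belongs to three categories and therefore carries little weight, so most of the probability mass reaches "Movie".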

Experiment
  Accuracy Evaluation
    We randomly select 500 (category, attribute) pairs from each online encyclopedia and 500 type statements from different sources in each online encyclopedia.
    We invite six postgraduate students who are familiar with linked data to label each sample mentioned above as "Correct", "Incorrect", or "Unknown".
    To generalize the findings on each sample to the whole dataset, we compute the Wilson intervals [Brown et al., 2001] for α = 5%.
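For reference, the Wilson score interval for a sample proportion can be computed as sketched below. The function name is illustrative, z = 1.96 corresponds to a two-sided α of 5%, and the count of 450 correct labels out of 500 is a made-up input, not a result from the talk.

```python
import math

def wilson_interval(correct, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (two-sided, default alpha = 5%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical sample: 450 of 500 statements labeled "Correct".
low, high = wilson_interval(450, 500)
```

Unlike the plain normal approximation, the interval stays inside [0, 1] even when the observed accuracy is close to 0 or 1.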

Experiment
  Comparison with Other Knowledge Bases (Overlap of Type Information)
    We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet.
    Since DBpedia and Yago have multilingual versions, we mapped the English type statements to Chinese ones (both the instance and the type in one type statement can be mapped to Chinese labels).

Technologies of Knowledge Base Construction
  Web access to Zhishi.me: http://zhishi.me/api

Schedule of My Talk
  Encyclopedic knowledge graph construction
  Buddhist knowledge graph construction

Framework (take Buddhist figures as the example)

Knowledge Collection
  Category method
    Manually inspect the encyclopedia categories related to Buddhist figures.
    Extract the entities corresponding to all articles under the Buddhist figure categories.
  Naming rule method
    Examples: ".+菩薩", ".+禪師"
    All entities under the Wikipedia category "佛教頭銜" (Buddhist titles)
    High-frequency common substrings in the names of already extracted entities

Knowledge Fusion
  Subject fusion
    Use an entity's "alias" (別名) attribute and its redirects as the alias set of the entity.
    If entities from different sources have one exactly matching alias, they are considered the same entity.
    Manually check the mappings that link more than three identical entities.
    Example alias sets (sources labeled on the slide: Baidu Baike, Hudong Baike, Wikipedia): {確吉堅贊, 班禪額爾德尼·確吉堅贊, 羅桑赤烈倫珠} {班禪額爾德
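The naming rules and the alias-based fusion rule can be sketched together. The two patterns are the ones quoted on the slide. Matching against the whole name, the function names, and the second alias set are assumptions made for illustration.

```python
import re

# Naming rules quoted on the slide, applied to the whole entity name (assumption).
NAME_RULES = [re.compile(r".+菩薩"), re.compile(r".+禪師")]

def matches_naming_rule(name):
    """True if the entity name fully matches one of the naming rules."""
    return any(rule.fullmatch(name) for rule in NAME_RULES)

def same_entity(aliases_a, aliases_b):
    """Entities are treated as the same if they share at least one identical alias."""
    return bool(set(aliases_a) & set(aliases_b))

# First alias set is from the slide; the second one is hypothetical.
aliases_source_1 = {"確吉堅贊", "班禪額爾德尼·確吉堅贊", "羅桑赤烈倫珠"}
aliases_source_2 = {"確吉堅贊"}
```

Exact string intersection keeps the rule easy to audit, which fits the manual check the slide prescribes for larger mappings.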
