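An Introduction to Techniques for Constructing Encyclopedic and Buddhist Knowledge Graphs
Guilin Qi (漆桂林), Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
Techniques for constructing encyclopedic knowledge graphs
Techniques for constructing Buddhist knowledge graphs

Introduction of Knowledge Bases
What is knowledge?
Facts, information, descriptions, or skills
Acquired through experience or education by perceiving, discovering, or learning
Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc.
Is a principal part of expert systems
"the power of an AI program came to be seen as largely in its knowledge base"
Edward Feigenbaum, 1994 ACM Turing Award

Development of Knowledge Base in Recent Decades
[Timeline figure: 1985, 1990, 1995, 2000, 2005, 2010, 2012. Recoverable labels: (#$capitalCity #$France #$Paris); student, enrollee, person; 35 million articles in 288 different languages; … 15 thousand concepts, 600 million instances, 20 billion facts; NELL]

Google Knowledge Graph (KG)
It is a new generation of intelligent search technology, which enables you to search for things, not strings.
Formal definition: a knowledge graph is a knowledge base with graph structure, where the nodes are instances or concepts, and the edges are relations between them.
It is a special semantic network.
It belongs to knowledge engineering.

Example
[Figure: a knowledge graph centered on ZTE (中興通訊). Node types: listed company, unlisted company. Relation labels: subsidiary, supplier, customer, competitor, partner. Entities shown: 中興康訊, Acacia (IPO in progress), 卓翼科技, Qualcomm (US), 共進股份, 宇順電子, Broadcom (US), China Mobile, Intel, Huawei, China Unicom, 大富科技, 華星創業, 盛路通信, 超聲電子]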
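
The formal definition above (nodes are instances or concepts, edges are relations) can be illustrated with a minimal sketch. The triples reuse facts that appear on the timeline slide; the class name KnowledgeGraph, its methods, and the relation name "subclassOf" for the student/enrollee/person chain are assumptions made for illustration, not part of the talk.

```python
from collections import defaultdict

# Facts from the slides. "subclassOf" is an assumed name for the
# student -> enrollee -> person chain shown on the timeline.
TRIPLES = [
    ("France", "capitalCity", "Paris"),
    ("student", "subclassOf", "enrollee"),
    ("enrollee", "subclassOf", "person"),
]


class KnowledgeGraph:
    """A knowledge base stored as a directed, edge-labelled graph."""

    def __init__(self, triples=()):
        self.nodes = set()
        self.out_edges = defaultdict(list)  # subject -> [(relation, object)]
        for subject, relation, obj in triples:
            self.add(subject, relation, obj)

    def add(self, subject, relation, obj):
        self.nodes.update((subject, obj))
        self.out_edges[subject].append((relation, obj))

    def objects(self, subject, relation):
        """All nodes reachable from `subject` over one `relation` edge."""
        return [o for r, o in self.out_edges.get(subject, []) if r == relation]

    def ancestors(self, node, relation="subclassOf"):
        """Follow `relation` edges transitively, breadth first."""
        seen, queue = [], self.objects(node, relation)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.objects(current, relation))
        return seen


kg = KnowledgeGraph(TRIPLES)
print(kg.objects("France", "capitalCity"))
print(kg.ancestors("student"))
```

Storing edges per subject keeps lookups such as "what is the capital of France" a single dictionary access, which is the kind of "things, not strings" query the slide refers to.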

KG and Semantic Search
Go deeper and broader

Technologies of Knowledge Base Construction
[Figure labels: Baidu, Hudong, Zh-Wikipedia]

Knowledge Graph (KG) Construction from Online Encyclopedias
Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
Technologies of encyclopedic knowledge graph construction:
Data extraction
Entity matching
Type inference

Zhishi.me
Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link them together as a Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
Currently, it consists of structured data extracted from the three largest Chinese encyclopedia sites:
Baidu Baike
Hudong Baike
Chinese Wikipedia
It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service and a SPARQL endpoint.

Data Extraction
Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220
Labels: rdfs:label
Abstracts: zhishi:abstract, rdfs:comment, dbpedia:abstract
Redirects: zhishi:pageRedirects
Images: zhishi:thumbnail
Infobox properties: http://zhishi.me/[sourceName]/property/[propertyName]
e.g. http://zhishi.me/baidubaike/property/中文名稱 "南京"@zh
Internal links: zhishi:internalLink, zhishi:category, skos:broader

Entity Matching
Baidu:北京 and Zh-Wiki:北京市 are equivalent entities.
Automatically discovering and refining dataset-specific matching rules in iterations.
Deriving these rules by finding the most discriminative data characteristics for a given data source pair, e.g. (baidu:北京, Zh-wiki:北京市).
(From Haofen Wang)

For each pair of existing matched instances, their property-value pairs are merged.
Values                   | Property_1     | Property_2
"大熊貓"                 | baidu:標簽     | hudong:中文學名
"Ailuropoda melanoleuca" | baidu:拉丁學名 | hudong:二名法
"白鰭豚"                 | baidu:標簽     | hudong:中文學名
"桂花"                   | baidu:標簽     | hudong:中文學名
…                        | …              | …

Matching rule (frequent set mining): baidu:x and hudong:x are matched iff
valueOf(baidu:標簽) = valueOf(hudong:中文學名) and
valueOf(baidu:拉丁學名) = valueOf(hudong:二名法) and
valueOf(baidu:綱) = valueOf(hudong:綱)

Applying the obtained rule(s) on the unlabeled data to generate match candidates.
The combiner is used to combine the confidence values of a match candidate.
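
The rule-derivation and rule-application steps above can be sketched roughly as follows. This is a simplified stand-in, not the method from the talk: it counts single property pairs whose values coincide instead of running full frequent set mining, and it uses the fraction of satisfied conditions as a confidence value because the slides do not define the combiner. The function names, the min_support parameter and the "銀杏" candidate are hypothetical; the three labeled pairs come from the merged table.

```python
from collections import Counter
from itertools import product


def equal_value_property_pairs(left, right):
    """Property pairs (p_left, p_right) whose values coincide for one matched pair."""
    return {
        (pl, pr)
        for (pl, vl), (pr, vr) in product(left.items(), right.items())
        if vl == vr
    }


def mine_rule(matched_pairs, min_support=0.5):
    """Keep property pairs that agree in at least `min_support` of the matched pairs."""
    counts = Counter()
    for left, right in matched_pairs:
        counts.update(equal_value_property_pairs(left, right))
    threshold = min_support * len(matched_pairs)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


def rule_confidence(rule, left, right):
    """Fraction of rule conditions satisfied by a candidate pair (assumed scoring)."""
    if not rule:
        return 0.0
    hits = sum(
        1 for pl, pr in rule
        if pl in left and pr in right and left[pl] == right[pr]
    )
    return hits / len(rule)


# Existing matched instances, taken from the merged property-value table.
matched = [
    ({"baidu:標簽": "大熊貓", "baidu:拉丁學名": "Ailuropoda melanoleuca"},
     {"hudong:中文學名": "大熊貓", "hudong:二名法": "Ailuropoda melanoleuca"}),
    ({"baidu:標簽": "白鰭豚"}, {"hudong:中文學名": "白鰭豚"}),
    ({"baidu:標簽": "桂花"}, {"hudong:中文學名": "桂花"}),
]

rule = mine_rule(matched)
# Hypothetical unlabeled candidate pair, for illustration only.
candidate = ({"baidu:標簽": "銀杏"}, {"hudong:中文學名": "銀杏"})
print(rule, rule_confidence(rule, *candidate))
```

With only three labeled pairs, the Latin-name condition falls below the support threshold; a larger labeled set would be needed to recover the three-condition rule shown on the slide.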

Type Inference
Type information stating that an instance is of a certain type (e.g. China is an instance of country) is an important component of knowledge bases.
Given an application scenario, Question Answering:
Question: Who is the Nobel laureate in literature of the People's Republic of China?
Answer: Mo Yan.
How to get the answer? Mo Yan InstanceOf Nobel laureate of the People's Republic of China
Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229
The 4th Joint International Semantic Technology Conference

Approach
In Chinese online encyclopedias, we discover that lots of fine-grained types exist in the categories of article pages.
"Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium"
Given an example: for the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia, the categories are as follows:
[Figure: categories of "China" in the three encyclopedias]

Approach (cont.)
We take the categories of one given instance as its candidate types and try to filter out the noise by leveraging the attributes.
Intuitively, when given attributes of a certain instance as follows:
"actors, release date, director": an instance of "Movie"
"name, foreign name": an instance of "?"
We assume that if an instance contains the representative attributes of one candidate type, the instance probably belongs to this type.
But another problem is that category attributes are not abundantly available.

Approach (cont.)
Explicit IsA Relation Detector: detect explicit instanceOf and subclassOf relations
Category Attributes Generator: generate attributes for categories with an attribute propagation algorithm
Instance Type Ranker: rank candidate types with a graph-based random walk method

Explicit IsA Relation Detector
Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from infoboxes
I = {i1, i2, …, in}: an instance set (all articles)
C = {c1, c2, …, cm}: a concept set (all article categories)
An infobox is a table of attribute-value rows (a1, v1), (a2, v2), …, i.e. an AVP (attribute-value pair) set {<a1, v1>, …, <ak, vk>}
vk instanceOf ak
Example: <director, Steven Spielberg>

Mining explicit instanceOf relations from abstracts
Perform dependency parsing with FudanNLP [Qiu et al., 2013] to obtain Subject, Predicate and Object
Example: 邁克爾·喬丹 (Michael Jeffrey Jordan) instanceOf 籃球運動員 (Basketball Player)
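
The infobox rule above (vk instanceOf ak) can be sketched as a simple filter. The slides define the instance set I and the concept set C but do not spell out the filtering condition, so requiring vk to be in I and ak to be in C is an assumption, as are the function name and the "release date" row.

```python
def explicit_instance_of(avp_set, instances, concepts):
    """Emit (value, 'instanceOf', attribute) for infobox rows <attribute, value>.

    Assumed condition: the value names a known instance (an article) and the
    attribute names a known concept (an article category).
    """
    return [
        (value, "instanceOf", attribute)
        for attribute, value in avp_set
        if value in instances and attribute in concepts
    ]


# <director, Steven Spielberg> is the slide's example; the second row is hypothetical.
infobox = [("director", "Steven Spielberg"), ("release date", "1993")]
instances = {"Steven Spielberg"}
concepts = {"director"}

relations = explicit_instance_of(infobox, instances, concepts)
print(relations)
```

The second row is dropped because its value is a literal rather than an instance, which shows why some filter of this kind is needed before treating every infobox row as type information.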

Explicit SubclassOf Relation Detection
Generate candidate subclassOf category pairs in the form of (sub-category, category) based on the category system.
Check whether the (sub-category, category) pair shares the same lexical head, using POS tagging.
For each (sub-category, category), check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
Example: 江蘇學校 (school in Jiangsu) subclassOf 中國學校 (school in China)
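
The lexical-head check above can be sketched as follows, assuming the category names have already been segmented and POS-tagged by some external tagger. The tag names (NR for proper noun, NN for common noun) and the heuristic that the last noun token is the head are assumptions for illustration; the slides do not say which tagger or head rule is used.

```python
def lexical_head(tagged_tokens):
    """Return the last noun token of a tagged category name, or None.

    Assumes a head-final noun phrase and tags that start with "N" for nouns.
    """
    for token, tag in reversed(tagged_tokens):
        if tag.startswith("N"):
            return token
    return None


def share_lexical_head(sub_category, category):
    """True when both tagged category names have the same lexical head."""
    head = lexical_head(sub_category)
    return head is not None and head == lexical_head(category)


# Pre-tagged input (hypothetical tagger output) for the slide's example.
jiangsu_schools = [("江蘇", "NR"), ("學校", "NN")]
china_schools = [("中國", "NR"), ("學校", "NN")]

print(share_lexical_head(jiangsu_schools, china_schools))
```

Sharing a head is only a candidate filter; the slide pairs it with a lookup in Zhishi.schema to confirm the parent-child relation.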

Category Attributes Generator
We take attributes in infobox templates as existing category attributes, and attributes in the infoboxes of article pages as instance attributes.
We construct a Category Graph composed of all categories with subclassOf relations.
We propagate attributes over the Category Graph, leveraging existing category attributes, instance attributes, and the identified instanceOf and subclassOf relations.
The attribute propagation algorithm is based on the following rules:
Rule 1: If a category c has attributes from infobox templates, these attributes should remain unchanged.
Rule 2: If a category c has some instances with attributes, the attributes should be propagated to c when they are shared by more than half of these instances.
Rule 3: If a category c has some child categories with attributes, the attributes should be propagated to c when they are shared by more than half of these child categories.
Rule 4: If parent categories of a category c have attributes, all the attributes should be inherited by c.
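
The four rules above can be sketched as a fixpoint iteration over the Category Graph. The slides do not state the order in which the rules are applied or when the propagation stops, so repeating all rules until nothing changes is an assumption. The function names, the input dictionaries and the toy categories ("Work", "Movie", "Action movie") are hypothetical.

```python
from collections import Counter


def majority(attr_sets):
    """Attributes shared by more than half of the non-empty attribute sets."""
    sets = [s for s in attr_sets if s]
    counts = Counter(a for s in sets for a in s)
    return {a for a, n in counts.items() if n > len(sets) / 2}


def propagate(template_attrs, instance_attrs, instances_of, children_of):
    """Propagate attributes over the category graph until a fixpoint is reached."""
    categories = set(template_attrs) | set(instances_of) | set(children_of)
    for kids in children_of.values():
        categories.update(kids)
    parents_of = {c: set() for c in categories}
    for parent, kids in children_of.items():
        for kid in kids:
            parents_of[kid].add(parent)

    attrs = {c: set(template_attrs.get(c, ())) for c in categories}
    fixed = {c for c in categories if attrs[c]}  # Rule 1: template attributes stay unchanged
    changed = True
    while changed:
        changed = False
        for c in sorted(categories - fixed):
            new = set(attrs[c])
            # Rule 2: attributes shared by more than half of c's instances
            new |= majority(instance_attrs.get(i, set()) for i in instances_of.get(c, ()))
            # Rule 3: attributes shared by more than half of c's child categories
            new |= majority(attrs[k] for k in children_of.get(c, ()))
            # Rule 4: inherit every attribute of the parent categories
            for p in parents_of[c]:
                new |= attrs[p]
            if new != attrs[c]:
                attrs[c], changed = new, True
    return attrs


# Toy input, hypothetical apart from the attribute names taken from the slides.
result = propagate(
    template_attrs={"Work": {"name"}},
    instance_attrs={
        "m1": {"actors", "director", "release date"},
        "m2": {"actors", "director"},
        "m3": {"director", "language"},
    },
    instances_of={"Movie": ["m1", "m2", "m3"]},
    children_of={"Work": ["Movie"], "Movie": ["Action movie"]},
)
print(result["Movie"])
```

Attribute sets only grow during the iteration, so the loop terminates once every category has stabilized.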

Instance Type Ranker
We organize each given instance, its attributes and the categories (i.e. candidate types) of the corresponding article page into an Instance Graph.
We group synonymous attributes with BabelNet before constructing all Instance Graphs.
We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
When executing a random step from the given instance to one of its attributes, the walk tends to choose the most representative attribute in order to walk to the correct categories.
When executing a random step from an attribute to one of the categories in the article page, the categories containing this attribute have equal opportunity.
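
The two kinds of random step described above can be sketched as a two-step walk from the instance to its candidate types. The slides do not give the transition probabilities, so weighting each attribute by 1 / (number of categories that contain it) is an assumption, as are the function name and the toy data; the actual ranker may iterate the walk rather than stop after two steps.

```python
def rank_types(instance_attrs, candidate_types, category_attrs):
    """Score candidate types by a two-step walk: instance -> attribute -> category."""

    def n_categories(attribute):
        return sum(1 for attrs in category_attrs.values() if attribute in attrs)

    # Step 1: representative attributes (contained in fewer categories) get more weight.
    weights = {a: 1.0 / n_categories(a) for a in instance_attrs if n_categories(a) > 0}
    total = sum(weights.values())
    scores = {c: 0.0 for c in candidate_types}
    if total == 0:
        return sorted(scores.items())

    # Step 2: candidate types containing the attribute are chosen with equal probability.
    for attribute, weight in weights.items():
        holders = [c for c in candidate_types if attribute in category_attrs.get(c, ())]
        for c in holders:
            scores[c] += (weight / total) / len(holders)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: "name" occurs in many categories, so it is less representative.
category_attrs = {
    "Movie": {"actors", "director", "name"},
    "Living people": {"name"},
    "Book": {"name"},
}
ranking = rank_types(
    instance_attrs=["actors", "director", "name"],
    candidate_types=["Movie", "Living people"],
    category_attrs=category_attrs,
)
print(ranking)
```

Probability mass that flows through an attribute held by no candidate type is simply dropped in this sketch, so the scores need not sum to one in general.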

Experiment
Accuracy Evaluation
We randomly select 500 (category, attribute) pairs from each online encyclopedia and 500 type statements from different sources in each online encyclopedia.
We invite six postgraduate students who are familiar with linked data to label each sample mentioned above with "Correct", "Incorrect", or "Unknown".
To generalize the findings on each sample to the whole dataset, we compute the Wilson intervals [Brown et al., 2001] for α = 5%.
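
The Wilson interval used above can be computed as in the sketch below, which follows the standard Wilson score formula for a binomial proportion. The sample size of 500 comes from the slide; the count of 450 samples labeled "Correct" is a hypothetical value for illustration, not a result from the talk.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Two-sided Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    p_hat = successes / n
    denominator = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denominator
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denominator
    return center - half_width, center + half_width


# 500 labeled samples as on the slide; 450 "Correct" labels is a made-up count.
low, high = wilson_interval(450, 500)
print(f"accuracy interval: [{low:.4f}, {high:.4f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves well when the observed accuracy is close to 100%, which is the usual situation in this kind of evaluation.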

Experiment
Comparison with Other Knowledge Bases (Overlap of Type Information)
We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet.
Since DBpedia and Yago have multilingual versions, we mapped the English type statements to Chinese ones (both the instance and the type in one type statement can be mapped to the Chinese labels).

Technologies of Knowledge Base Construction
Web Access to Zhishi.me
http://zhishi.me/api

Schedule of My Talk
Encyclopedic knowledge graph construction
Buddhist knowledge graph construction

Framework (take Buddhist figures as the example)

Knowledge Collection
Category method
Manually inspect the categories related to Buddhist figures in the encyclopedias.
Extract the entities corresponding to all articles under the Buddhist-figure categories.
Naming-rule method
Examples: ".+菩薩", ".+禪師"
All entities under the Wikipedia category "佛教頭銜" (Buddhist titles)
High-frequency common substrings in the names of already extracted entities

Knowledge Fusion
Subject fusion
An entity's "alias" (別名) attribute and its redirects form the alias set of the entity.
If entities from different sources share one exactly matching alias, they are considered the same entity.
Manually check the mappings in which more than three entities are identified as the same.
Baidu Baike: Hudong Baike: Wikipedia: {確吉堅贊, 班禪額爾德尼·確吉堅贊, 羅桑赤烈倫珠} {班禪額爾德
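
The naming-rule collection and alias-based subject fusion steps above can be sketched as follows. The two regular expressions and the first alias set are taken from the slides; which encyclopedia that alias set belongs to, the entity identifiers, the other two entities, and the function names are all assumptions made for illustration. Clusters are built with a small union-find, which is an implementation choice and not something the talk specifies.

```python
import re

# Naming rules quoted on the slide.
NAME_RULES = [re.compile(r".+菩薩"), re.compile(r".+禪師")]


def matches_naming_rule(name):
    """True when the whole entity name matches one of the naming rules."""
    return any(rule.fullmatch(name) for rule in NAME_RULES)


def fuse_by_alias(entities):
    """Cluster entities from different sources that share an exactly equal alias.

    entities: list of (source, entity_id, alias_set) tuples.
    """
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_alias = {}
    for idx, (source, _entity_id, aliases) in enumerate(entities):
        for alias in aliases:
            for other in by_alias.get(alias, []):
                if entities[other][0] != source:  # only fuse across sources
                    parent[find(idx)] = find(other)
            by_alias.setdefault(alias, []).append(idx)

    clusters = {}
    for idx, (_source, entity_id, _aliases) in enumerate(entities):
        clusters.setdefault(find(idx), []).append(entity_id)
    return sorted(sorted(cluster) for cluster in clusters.values())


entities = [
    # Alias set from the slide; the source assignment and the id are assumed.
    ("baidubaike", "baidu:e1", {"確吉堅贊", "班禪額爾德尼·確吉堅贊", "羅桑赤烈倫珠"}),
    # Hypothetical entities from other sources.
    ("hudongbaike", "hudong:e1", {"確吉堅贊"}),
    ("zhwiki", "zhwiki:e2", {"文殊菩薩"}),
]

clusters = fuse_by_alias(entities)
needs_manual_check = [c for c in clusters if len(c) > 3]
print(clusters, needs_manual_check)
```

Clusters with more than three members are only flagged here; following the slide, they would go to a human reviewer rather than being merged automatically.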