A Survey of Information Filtering (IF).ppt


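A Survey of Information Filtering (IF)
Software Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
Wang Bin (王斌), 2001.12.10

Outline
  - Basic concepts of IF
  - Classification of IF systems
  - Components of IF systems
  - Evaluation of IF systems
  - Current status and trends of IF

I. Basic Concepts

Definition
  - IF: selecting, from a dynamic information stream, the information that matches a user's interests. The user's interests generally do not change over a fairly long period (static).
  - Selective Dissemination of Information (SDI), from the library field.
  - Routing, from Message Understanding.
  - Current Awareness, Data Mining

IF vs. IR / Categorization / IE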

  - IF & IR
      - Broadly speaking, IF is a part of IR.
      - IF: dynamic database, static information need. IR: static database, dynamic information need.
      - User profile vs. query
      - IF users need some understanding of the system; IR users do not.
      - IF involves social issues such as user modeling and personal privacy.
  - IF & Categorization
      - In categorization, the categories do not change often. User profiles, by comparison, change dynamically.
  - IF & IE
      - IF is concerned with relevance. IE is concerned only with the parts it extracts, regardless of relevance.

IF applications
  - Internet search results filter
  - Personal email filter
  - List server / newsgroup filter
  - Browser filter
  - Filter for children
  - Filter for customers: recommendation

II. Classification of IF Systems

IF classification diagram

Initiative of operation
  - Active IF systems
      - Collect and send relevant info to users
      - Push to users
      - Info overload is a risk, so build accurate user profiles
  - Passive IF systems
      - Do not collect info for users
      - Email or Usenet news

Location of operation
  - At the info source
      - Post profiles to the info provider
      - Clipping service
      - Usually for a fee
  - At a filtering server
      - Info providers send info to the server
      - The server distributes info to users
  - At the user site
      - Local filtering system
      - Such as Outlook, Netscape Email, and Foxmail

Filtering approach
  - Cognitive filtering
      - Content-based filtering
      - Document content vs. user profiles
  - Sociological filtering
      - Collaborative filtering, or property-based filtering
      - Similarity between users
      - Recommendation systems
      - User modeling & user clustering
      - Complement to content-based systems

Methods of acquiring knowledge about users
  - Explicit approach
      - User interrogation
      - Filling in forms
  - Implicit approach
      - Recording user behavior
      - Time / times / context / activity (save, discard, print, browse, click), etc.
  - Explicit & implicit approach
      - Document space (case-based)
      - Stereotypic inference (predefined default profile, then changed during scanning)

III. Components of IF Systems

General structure
  [Figure: block diagram of the Information Provider, the User, and four components: (a) Data Analyzer Component, (b) Filtering Component, (c) User-Model Component, (d) Learning Component. Arrow labels: data items, represented data items, relevant data items, personal details, user profile, feedback, updates.]

Data-analyzer component
  - Is close to the info provider
  - Obtains or collects data from the info provider
  - Analyzes and represents documents (such as Boolean model, VSM, etc.)
  - Passes the representation to the filtering component

User-model component
  - Gathers info about users (explicitly and/or implicitly)
  - Constructs the user profiles or other user models (rules, VSM, document centers)
  - Passes the user models to the filtering component
  - User models must be suitable for the document representation

Filtering component
  - The heart of the IF system
  - Matches the user profiles against the represented data items
  - The decision may be binary or probabilistic (ordered by rank)
  - The relevancy of the selected items can be determined by the user
  - The relevancy info can be sent to the learning component (feedback info)

Learning component
  - Improves further filtering
  - Detects shifts in users' interests
  - Updates the user model

Two concepts used in IF systems
  - Systems based on the statistical concept
  - Systems based on the knowledge-based concept

Statistical concept
  - User-model component: the profile is a weighted vector of index terms (such as VSM, LSI)
  - Filtering component
      - Correlation, cosine measure
      - Robertson & Sparck Jones formula (PRM)
      - (Naive) Bayesian classifier
  - Learning component: feedback, query reconstruction (such as Rocchio)

Knowledge-based concept
  - Rule-based and semantic-net filtering systems
      - Rules (if ... then take action); obsolescence problem
      - User profile represented by a semantic net (WordNet)
  - Neural-network filtering systems
  - Genetic-based filtering systems

User modeling for IF systems
  - Acquisition of the data for the model
      - Implicit approach: observation of user behavior
      - Explicit approach: filling in forms, interaction (feedback)
  - Data included in the model
      - Shallow semantics: keywords
      - Enhanced user model: high-level knowledge about the user (background, past experience)
      - Semantic networks / stereotypic inference / statistical inference on the relationships between words in docs
  - Underlying architecture
      - Agents / neural networks for automatically inferred models
      - VSM / LSI for explicit inference
      - Concept models for intelligent systems
      - Keyword systems for statistically based systems

Learning in IF systems
  - Methods of learning
      - Learning by observation
      - Learning by feedback
      - User-training learning
  - Frequency of learning
      - Critical learning
      - Periodic learning

IV. Evaluation of IF Systems

Methods & Measures

Evaluation methods of IF systems
  - Evaluation by experiments
  - Evaluation by simulation, such as TREC
  - Analytical evaluation

Measures of evaluation of IF systems
  - Simple precision & recall
  - Statistical measurements
      - Correlation (user evaluation vs. system evaluation): rank vector
  - Set-based measurements
      - Utility = (A*R+) + (B*N+) + (C*R-) + (D*N-), normalized
      - ASP (average set precision) = P*R; if P or R = 0, ASP is not suitable
  - User-oriented measures
      - Coverage ratio = |Rk|/|U| = |A∩U|/|U|, where |Rk| is the number of documents known to the user
      - Novelty = |Ru|/(|Ru|+|Rk|)

V. Current Status and Trends of IF

Current situation
  - IF systems are indispensable
  - But IF systems are unreliable
      - Relevancy of commercial IF systems is about 50%
      - Results of the TREC experiments are poor
      - Users prefer to read some non-relevant info for fear of losing important info
  - Much remains to be done to improve the effectiveness of IF systems

User modeling
  - Integrate several methods to model users (not only keywords, but also user properties and other parameters)
  - Profile updating & updating time
  - Include a learning module
  - Query formulation and tracking of query changes over time

Filtering techniques
  - Goal: get more relevant docs, even at the cost of some non-relevant docs
  - Combining several methods
  - Research directions
      - Intelligent agents: decentralized, based on trust, evolve, compete & collaborate
      - Visualization techniques: map
      - Variety of multiple implicit resources on user behavior: open profiling standard
      - Filtering of multimedia repositories: VOD, not t
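
The "Filtering approach" slide lists collaborative (sociological) filtering, which relies on similarity between users rather than on document content. The Python sketch below illustrates one simple way this could work: it finds the users most similar to a target user and suggests items they rated that the target has not seen. The rating dictionaries, the function names, and the choice of cosine similarity with `k` nearest users are all illustrative assumptions, not details given in the slides.

```python
import math


def similarity(r1, r2):
    """Cosine similarity between two users' {item: rating} dicts (assumed measure)."""
    common = set(r1) & set(r2)
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def recommend(target, ratings, k=1):
    """Suggest unseen items, weighted by the k most similar users' ratings."""
    others = [(similarity(ratings[target], r), user)
              for user, r in ratings.items() if user != target]
    others.sort(reverse=True)  # most similar users first
    seen = set(ratings[target])
    scores = {}
    for sim, user in others[:k]:
        for item, rating in ratings[user].items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ratings: ann and bob have similar tastes, cat does not.
ratings = {
    "ann": {"d1": 5, "d2": 4},
    "bob": {"d1": 5, "d2": 5, "d3": 4},
    "cat": {"d4": 5},
}
suggestions = recommend("ann", ratings)
```

Because no document content is inspected, a scheme like this can complement a content-based filter, as the slide notes.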
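
The "Statistical concept" slide describes a profile as a weighted vector of index terms that the filtering component matches against documents, for example with the cosine measure. A minimal Python sketch of that idea follows. The raw term-frequency weighting, the `THRESHOLD` value, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

THRESHOLD = 0.3  # hypothetical cut-off for the binary keep/discard decision


def to_vector(text):
    """Represent a text as a term-frequency vector (a very simple VSM)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine measure between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_stream(profile, documents, threshold=THRESHOLD):
    """Return (score, document) pairs at or above the threshold, best first."""
    scored = [(cosine(profile, to_vector(d)), d) for d in documents]
    kept = [(s, d) for s, d in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


profile = to_vector("information filtering user profile")
stream = [
    "a survey of information filtering and user profile learning",
    "football results from the weekend",
]
selected = filter_stream(profile, stream)
```

Returning the scores alongside the documents covers both decision styles mentioned on the "Filtering component" slide: the threshold gives a binary decision, and the sort gives a ranked one.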
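
For the learning component, the slides name feedback and query reconstruction "such as Rocchio". The sketch below shows the classic Rocchio update applied to a profile vector: the old profile is moved toward the average of the documents the user judged relevant and away from the average of those judged non-relevant. The coefficient values and the decision to drop non-positive weights are illustrative assumptions, not settings taken from the slides.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated {term: weight} profile from user feedback.

    profile, and every entry of relevant / non_relevant, is a
    {term: weight} dict. alpha, beta, gamma are assumed example values.
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                # Add the averaged (centroid) contribution of this feedback set.
                updated[term] += coeff * weight / len(docs)
    # Assumption: terms whose weight is not positive are dropped.
    return {t: w for t, w in updated.items() if w > 0}


old_profile = {"filtering": 1.0}
new_profile = rocchio_update(
    old_profile,
    relevant=[{"filtering": 1.0, "profile": 2.0}],
    non_relevant=[{"sports": 1.0}],
)
```

Running such an update after each batch of feedback would correspond to the "periodic learning" option on the "Learning in IF systems" slide.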
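
The evaluation measures in Section IV can be computed from simple sets of document IDs. The slides do not define the symbols, so this sketch assumes a common reading: R+ and N+ are the relevant and non-relevant documents that were retrieved, R- and N- are the relevant and non-relevant documents that were not retrieved, A is the retrieved set, U is the set of relevant documents known to the user, Rk = A∩U, and Ru is the set of retrieved relevant documents the user did not know. The coefficient defaults are hypothetical, and the normalization step mentioned on the slide is omitted because its form is not given.

```python
def set_measures(retrieved, relevant, total_docs,
                 a=2.0, b=-1.0, c=0.0, d=0.0):
    """Precision, recall, linear utility, and ASP for one filtering run.

    a, b, c, d are assumed example coefficients for R+, N+, R-, N-.
    """
    r_plus = len(retrieved & relevant)    # relevant, retrieved
    n_plus = len(retrieved - relevant)    # non-relevant, retrieved
    r_minus = len(relevant - retrieved)   # relevant, not retrieved
    n_minus = total_docs - r_plus - n_plus - r_minus
    precision = r_plus / len(retrieved) if retrieved else 0.0
    recall = r_plus / len(relevant) if relevant else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "utility": a * r_plus + b * n_plus + c * r_minus + d * n_minus,
        "asp": precision * recall,  # uninformative when P or R is 0
    }


def user_oriented(retrieved, relevant, known):
    """Coverage ratio and novelty under the assumed definitions above."""
    known = known & relevant                # U: relevant docs the user knows
    r_k = retrieved & known                 # Rk = A ∩ U
    r_u = (retrieved & relevant) - known    # Ru: relevant, retrieved, unknown
    coverage = len(r_k) / len(known) if known else 0.0
    denom = len(r_u) + len(r_k)
    novelty = len(r_u) / denom if denom else 0.0
    return coverage, novelty


retrieved = {1, 2, 3, 4}
relevant = {1, 2, 5, 6}
measures = set_measures(retrieved, relevant, total_docs=10)
coverage, novelty = user_oriented(retrieved, relevant, known={1, 5})
```

With these example sets, half of the retrieved documents are relevant and half of the relevant documents are retrieved, so precision and recall are both 0.5 and ASP is 0.25.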
