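A Survey of Information Filtering (IF)
Wang Bin (王斌), Software Division, Institute of Computing Technology, Chinese Academy of Sciences, 2001.12.10

Outline
- Basic concepts of IF
- Classification of IF systems
- Components of IF systems
- Evaluation of IF systems
- Current status and trends of IF

I. Basic Concepts

Definition
- IF: selecting, from a dynamic information stream, the information that matches a user's interests. A user's interests generally do not change over a fairly long period (they are static).
- Related terms:
  - Selective Dissemination of Information (SDI), from the library field
  - Routing, from Message Understanding
  - Current Awareness; Data Mining

IF vs IR / Categorization / IE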
- IF & IR: broadly speaking, IF is a part of IR.
  - IF: the database is dynamic and the need is static; IR: the database is static and the need is dynamic.
  - User profile (IF) vs query (IR).
  - IF users need some understanding of the system; IR users do not.
  - IF raises social issues such as user modeling and personal privacy.
- IF & Categorization: the categories used in categorization rarely change, whereas a user profile changes dynamically.
- IF & IE: IF is concerned with relevance; IE only cares about the parts to be extracted and ignores relevance.

IF applications
- Internet search results filter
- Personal email filter
- List server / newsgroup filter
- Browser filter
- Filter for children
- Filter for customers: recommendation

II. Classification of IF Systems

[Figure: IF taxonomy diagram]

Initiative of operation
- Active IF systems
  - Collect relevant info and send it to users (push)
  - Because of information overload, they need accurate user profiles
- Passive IF systems
  - Do not collect info for users
  - e.g., email or Usenet news

Location of operation
- At the info source
  - Users post their profiles to the info provider
  - e.g., clipping services, usually fee-based
- At a filtering server
  - Info providers send info to the server
  - The server distributes the info to users
- At the user site
  - Local filtering systems, such as the Outlook, Netscape Email and Foxmail mail clients

Filtering approach
- Cognitive filtering
  - Content-based filtering
  - Matches document content against user profiles
- Sociological filtering
  - Collaborative filtering, or properties-based filtering
  - Based on similarity between users
  - Recommendation systems
  - User modeling & user clustering
  - Complements content-based systems

Methods of acquiring knowledge about users
- Explicit approach
  - User interrogation
  - Filling in forms
- Implicit approach
  - Recording user behavior: time spent, number of visits, context, activity (save/discard/print/browse/click), etc.
- Combined explicit & implicit approach
  - Document space (case-based)
  - Stereotypic inference (a predefined default profile, then
    changed during scanning)

III. Components of IF Systems

General structure

[Figure: general architecture of an IF system. Components: (a) Data Analyzer, (b) Filtering, (c) User-Model, (d) Learning, together with the User and the Information Provider. Labeled flows: data items, represented data items, relevant data items, personal details, user profile, feedback, updates.]

Data-analyzer component
- Sits close to the info provider
- Obtains or collects data from the info provider
- Analyzes and represents documents (e.g., Boolean model, VSM)
- Passes the representation to the filtering component

User-model component
- Gathers info about users (explicitly and/or implicitly)
- Constructs user profiles or other user models (rules, VSM, document centroids)
- Passes the user models to the filtering component
- User models must be compatible with the document representation

Filtering component
- The heart of the IF system
- Matches user profiles against the represented data items
- The decision may be binary or probabilistic (items ordered by rank)
- The user can judge the relevance of the selected items
- This relevance info can be sent to the learning component (feedback)

Learning component
- Improves subsequent filtering
- Detects shifts in the user's interests
- Updates the user model

Two concepts used in IF systems
- Systems based on the statistical concept
- Systems based on the knowledge-based concept

Statistical concept
- User-model component: the profile is a weighted vector of index terms (e.g., VSM, LSI)
- Filtering component:
  - Correlation, cosine measure
  - Robertson & Sparck Jones formula (PRM)
  - (Naive) Bayesian classifier
- Learning component: feedback,
  query reconstruction (e.g., Rocchio)

Knowledge-based concept
- Rule-based and semantic-net filtering systems
  - Rules (if ... then take action), which suffer from an obsolescence problem
  - User profile represented as a semantic net (e.g., WordNet)
- Neural-network filtering systems
- Genetic-algorithm-based filtering systems

User modeling for IF systems
- Acquisition of the data for the model
  - Implicit approach: observation of user behavior
  - Explicit approach: filling in forms, interaction (feedback)
- Data included in the model
  - Shallow semantics: keywords
  - Enhanced user model: high-level knowledge about the user (background, past experience)
  - Semantic networks / stereotypic inference / statistical inference on the relationships between words in documents
- Underlying architecture
  - Agents / neural networks for automatically inferred models
  - VSM/LSI for explicit inference
  - Concept models for intelligent systems
  - Keyword systems for statistically based systems

Learning in IF systems
- Methods of learning
  - Learning by observation
  - Learning by feedback
  - User-training learning
- Frequency of learning
  - Critical learning
  - Periodic learning

IV. Evaluation of IF Systems

Methods & measures

Evaluation methods of IF systems
- Evaluation by experiments
- Evaluation by simulation, e.g., TREC
- Analytical evaluation

Measures of evaluation of IF systems
- Simple: precision & recall
- Statistical measurements: correlation between user evaluation and system evaluation (rank vectors)
- Set-based measurements
  - Utility $= A \cdot R^{+} + B \cdot N^{+} + C \cdot R^{-} + D \cdot N^{-}$ (normalized). Here $R^{+}$ and $N^{+}$ are the relevant and non-relevant documents delivered, and $R^{-}$ and $N^{-}$ are the relevant and non-relevant documents not delivered.
  - ASP (average set precision) $= P \times R$. If $P$ or $R$ is 0, ASP is 0 whatever the other value is, so ASP is not suitable in that case.
- User-oriented measures
  - Coverage ratio $= |R_k| / |U| = |A \cap U| / |U|$
  - Novelty $= |R_u| / (|R_u| + |R_k|)$
  - Definitions:
    - $A$ is the set of retrieved documents.
    - $U$ is the set of relevant documents known to the user.
    - $R_k$ is the set of retrieved relevant documents already known to the user.
    - $R_u$ is the set of retrieved relevant documents previously unknown to the user.

V. Current Status and Trends of IF

Current situation
- IF systems are indispensable
- But IF systems are unreliable
  - The relevance rate of commercial IF systems is about 50%
  - Results of the TREC experiments are poor
- Users prefer to read non-relevant info rather than risk losing important info
- Much remains to be done to improve the effectiveness of IF systems

User modeling
- Integrate several methods to model users (not only keywords, but also user properties and other parameters)
- Profile updating and update timing
- Include a learning module
- Query formulation and tracking of query changes over time

Filtering techniques
- Goal: get more relevant docs, even at the cost of some non-relevant docs
- Combine several methods
- Research directions
  - Intelligent agents: decentralized, trust-based, evolving, competing and collaborating
  - Visualization techniques: maps
  - A variety of implicit sources of user behavior: Open Profiling Standard
  - Filtering of multimedia repositories: VOD, not t…
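Part II lists sociological (collaborative) filtering as an approach. It recommends items based on similarity between users rather than on document content.

The block below is a minimal user-based sketch under these assumptions:
- Ratings are stored as a hypothetical `user -> {item: rating}` dictionary.
- User similarity is cosine similarity over raw ratings. Pearson correlation is another common choice.
- The names `cosine_sim` and `predict`, and the neighbourhood size `k`, are illustrative and do not come from the slides.

```python
import math
from typing import Dict, Optional

# Hypothetical layout: user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def cosine_sim(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two users' rating vectors (unrated items count as 0)."""
    common = u.keys() & v.keys()
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if not common or norm_u == 0 or norm_v == 0:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (norm_u * norm_v)


def predict(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted mean of the ratings that the k users most similar
    to `user` gave to `item`; None if no similar user has rated it."""
    neighbours = [
        (cosine_sim(ratings[user], theirs), theirs[item])
        for other, theirs in ratings.items()
        if other != user and item in theirs
    ]
    neighbours.sort(reverse=True)  # most similar users first
    top = [(sim, rating) for sim, rating in neighbours[:k] if sim > 0]
    if not top:
        return None
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)


if __name__ == "__main__":
    demo: Ratings = {
        "alice": {"doc1": 5, "doc2": 3},
        "bob": {"doc1": 5, "doc2": 3, "doc3": 4},
        "carol": {"doc1": 1, "doc3": 1, "doc4": 5},
    }
    # bob's ratings overlap alice's far more than carol's, so his rating dominates
    print(predict(demo, "alice", "doc3"))
```

This is the complement to content-based systems that the slides mention. It can recommend documents whose text shares no terms with the user's profile, as long as similar users liked them.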
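In the statistical concept of part III, the user profile is a weighted vector of index terms. The filtering component matches this profile against each incoming document, for example with the cosine measure.

The block below is a minimal sketch of that matching step under these assumptions:
- Whitespace tokenization.
- Raw term frequencies for documents.
- A hand-written profile.
- A hypothetical default threshold of 0.2.

It shows both kinds of decision named in the slides: binary (`filter_stream`) and ranked (`rank`). All function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Vector = Dict[str, float]


def term_vector(text: str) -> Vector:
    """Raw term-frequency vector (a real system would add stop-word removal,
    stemming and tf-idf weighting)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def filter_stream(profile: Vector, stream: Iterable[str],
                  threshold: float = 0.2) -> Iterator[Tuple[float, str]]:
    """Binary decision: pass on each incoming document whose score reaches the threshold."""
    for doc in stream:
        score = cosine(profile, term_vector(doc))
        if score >= threshold:
            yield score, doc


def rank(profile: Vector, docs: Iterable[str]) -> List[Tuple[float, str]]:
    """Ranked decision: score every document and order by decreasing score."""
    return sorted(((cosine(profile, term_vector(d)), d) for d in docs), reverse=True)


if __name__ == "__main__":
    profile = {"filtering": 1.0, "information": 0.5, "profile": 0.8}
    incoming = ["information filtering with a user profile", "football results today"]
    for score, doc in filter_stream(profile, incoming):
        print(f"{score:.2f}  {doc}")
```

The threshold is the main knob trading precision against recall. In practice it would be tuned per profile from the user's feedback rather than fixed.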
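The filtering component also lists the Robertson & Sparck Jones formula. It weights each profile term by how much more often the term occurs in relevant documents than in non-relevant ones.

The block below is a minimal sketch using the usual 0.5-corrected form:

$w = \log \frac{(r+0.5)(N-n-R+r+0.5)}{(n-r+0.5)(R-r+0.5)}$

The symbols are:
- $N$: documents seen so far.
- $n$: documents containing the term.
- $R$: documents judged relevant.
- $r$: relevant documents containing the term.

In a filtering setting these counts are assumed to accumulate from the stream and the user's relevance judgements. The function names are hypothetical.

```python
import math
from typing import Dict, Iterable


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson/Sparck Jones relevance weight with the 0.5 correction.
    N: documents seen, n: documents containing the term,
    R: documents judged relevant, r: relevant documents containing the term."""
    return math.log(
        ((r + 0.5) * (N - n - R + r + 0.5)) / ((n - r + 0.5) * (R - r + 0.5))
    )


def rsj_score(doc_terms: Iterable[str], weights: Dict[str, float]) -> float:
    """Score a document as the sum of the weights of the distinct profile terms it contains."""
    return sum(weights.get(t, 0.0) for t in set(doc_terms))


if __name__ == "__main__":
    # A term found in every relevant document but rarely elsewhere gets a large weight.
    print(rsj_weight(N=100, n=5, R=5, r=5))
    # With no relevance information (R = r = 0) the weight behaves like an idf.
    print(rsj_weight(N=100, n=10, R=0, r=0))
```

Comparing `rsj_score` against a threshold gives the binary decision. Sorting by it gives a ranking.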
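The (naive) Bayesian classifier, also listed for the filtering component, treats filtering as two-class text categorization (relevant vs non-relevant). It is trained on the user's relevance judgements.

The block below is a minimal sketch under these assumptions:
- A multinomial model with add-one (Laplace) smoothing.
- Whitespace tokenization.
- Words never seen in training are ignored.

The class name `NaiveBayesFilter` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


class NaiveBayesFilter:
    """Two-class (relevant / non-relevant) multinomial naive Bayes with
    add-one smoothing, trained from the user's relevance judgements."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text: str, relevant: bool) -> None:
        """Record one judged document."""
        self.word_counts[relevant].update(text.lower().split())
        self.doc_counts[relevant] += 1

    def _log_score(self, words: List[str], label: bool, vocab: Set[str]) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(self.word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def is_relevant(self, text: str) -> bool:
        """Binary filtering decision: pick the class with the higher log posterior."""
        words = text.lower().split()
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self._log_score(words, True, vocab) > self._log_score(words, False, vocab)


if __name__ == "__main__":
    nb = NaiveBayesFilter()
    nb.train("information filtering user profile", True)
    nb.train("football match score", False)
    print(nb.is_relevant("user profile filtering"))
```

Because `train` can be called again whenever the user judges a delivered item, the same object also plays a simple learning-component role.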
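For the learning component, the slides cite feedback-driven query reconstruction such as Rocchio. The block below is a minimal sketch of a Rocchio-style profile update:

$\vec q' = \alpha \vec q + \frac{\beta}{|D_r|}\sum_{d \in D_r} \vec d - \frac{\gamma}{|D_n|}\sum_{d \in D_n} \vec d$

Here $D_r$ and $D_n$ are the documents the user judged relevant and non-relevant.

The sketch makes these assumptions:
- Sparse dictionary vectors.
- The commonly used weights $\alpha = 1$, $\beta = 0.75$, $\gamma = 0.15$. These values are an assumption, not from the slides.
- Terms whose weight drops to zero or below are removed.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def rocchio_update(profile: Vector, relevant: List[Vector], non_relevant: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return alpha*profile + beta*centroid(relevant) - gamma*centroid(non_relevant),
    keeping only strictly positive weights."""
    new: Dict[str, float] = defaultdict(float)
    for term, w in profile.items():
        new[term] += alpha * w
    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # no judgements of this kind yet
        for vec in docs:
            for term, w in vec.items():
                new[term] += coef * w / len(docs)  # adds coef * centroid
    # Terms pushed to zero or below are dropped from the profile.
    return {term: w for term, w in new.items() if w > 0}


if __name__ == "__main__":
    profile = {"filtering": 1.0}
    liked = [{"filtering": 1.0, "profile": 1.0}]
    disliked = [{"football": 1.0}]
    print(rocchio_update(profile, liked, disliked))
```

Applying the update after each batch of judgements lets the profile follow the shifts in user interest that the learning component is meant to detect. $\gamma$ is typically set smaller than $\beta$.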
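The set-based and user-oriented measures in part IV reduce to simple counts over document sets. The block below is a minimal sketch under these assumptions:
- The evaluator supplies the utility coefficients $A$, $B$, $C$, $D$. The slides give no values.
- The slides say utility is normalized but do not specify how, so the sketch returns the raw utility.
- The documents the user already knows ($U$) are a subset of the relevant ones.

Function names are hypothetical.

```python
from typing import Set, Tuple


def linear_utility(r_plus: int, n_plus: int, r_minus: int, n_minus: int,
                   A: float, B: float, C: float = 0.0, D: float = 0.0) -> float:
    """Utility = A*R+ + B*N+ + C*R- + D*N- (raw, unnormalized)."""
    return A * r_plus + B * n_plus + C * r_minus + D * n_minus


def precision_recall_asp(delivered: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and ASP = P * R for one delivered set."""
    hits = len(delivered & relevant)
    p = hits / len(delivered) if delivered else 0.0
    r = hits / len(relevant) if relevant else 0.0
    return p, r, p * r


def coverage_novelty(answer: Set[str], relevant: Set[str], known: Set[str]) -> Tuple[float, float]:
    """answer: retrieved set A; known: relevant documents already known to the user (U)."""
    r_k = answer & known               # retrieved and already known to the user
    r_u = (answer & relevant) - known  # retrieved, relevant, previously unknown
    coverage = len(r_k) / len(known) if known else 0.0
    found = len(r_u) + len(r_k)
    novelty = len(r_u) / found if found else 0.0
    return coverage, novelty


if __name__ == "__main__":
    delivered = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    known = {"d1", "d5"}
    print(precision_recall_asp(delivered, relevant))
    print(coverage_novelty(delivered, relevant, known))
```

The ASP helper shows the weakness noted in the slides. As soon as either precision or recall is 0, `p * r` is 0, so ASP cannot tell such runs apart.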