
1. Using the Web for Automated Translation Extraction in Cross-Language Information Retrieval
  • Advisor: Dr. Hsu
  • Presenter: Zih-Hui Lin
  • Authors: Ying Zhang and Phil Vines

2. Outline
  • Motivation
  • Objective
  • Previous work
  • Methodology
  • Experiments and results
  • Conclusions

3. Motivation
  • One of the major remaining reasons that cross-language information retrieval (CLIR) does not perform as well as monolingual retrieval is the presence of out-of-vocabulary (OOV) terms.
  • An OOV term is not recognized by the segmenter and is split into either smaller sequences of characters or individual characters. For example, 北野武 (Kitano Takeshi) comes out as "north limit military".
  • Previous work has either relied on manual intervention or has only been partially successful in solving this problem.

4. Objective
  • We propose a segmentation-free method that can be applied to both Chinese-English and English-Chinese CLIR. It correctly extracts translations of OOV terms from the Web automatically, and is thus a significant improvement on earlier work.

5. English translation extraction in Chinese-English CLIR
  • Chinese OOV term detection: for a term such as 北野武 (north limit military), the Pvalue given by the HMM will be very low. If Pvalue < Pmin, the query is taken to contain OOV terms.
  • Web text extraction: we extract strings from the Web that contain the Chinese query terms and some English text.
  • Collection of co-occurrence statistics.
  • Translation selection.
  • Example strings, written as Chinese character positions (c) followed by the English text (e):
    北野武(Kitano Takeshi): c4 c5 c6 e1
    導演北野武 (Kitano Takeshi): c2 c3 c4 c5 c6 e1

6. Chinese translation extraction in English-Chinese CLIR
  • Extraction of web text: use Google to fetch the top 100 Chinese documents with the English OOV term eoov as the query.
  • Collection of co-occurrence statistics: accumulate the frequency foov. Consider all substrings in Sleft and Sright, and collect the frequency fn and the length |sn| of each Chinese substring.
  • Translation selection: exclude any substring that is already in the translation dictionary or that does not occur in the document collection.

7. Experiments and results
  • Chinese-English CLIR
  • English-Chinese CLIR

8. Introduction
  • When translating from Chinese to English, a standard first step is to segment the text into words based on an existing segmentation dictionary.
  • However, where an OOV term occurs, it is not recognized and is segmented into either smaller sequences of characters or individual characters.
  • We propose a segmentation-free method based on frequency and length analysis and corpus-based disambiguation.

9. Previous work
Dictionary-based translation schemes need to address three major issues:
  • Phrase identification and translation, e.g. "non proliferation treaty" and "cross straits".
  • Translation ambiguity, handled with techniques such as term co-occurrence, mutual information, or language modeling.
  • Out-of-vocabulary (OOV) terms, e.g. "Dioxin".

10. Previous work: Existing approaches to the OOV problem
  • Depending on the language, it may be possible to deduce appropriate transliterated translations automatically. This approach has been applied successfully in English-Arabic CLIR.
  • The issue is more difficult in Chinese. Many characters have the same sound, and many English syllables have no equivalent sound in Chinese, so selecting the correct characters to represent a transliterated word can be problematic.
  • Examples: cross straits (兩岸), 北野武 (north limit military).

11. Previous work: Segmentation-free translation extraction
  • It is common to find a small amount of English text in Chinese web documents, but extremely rare to find Chinese text in English web documents. We therefore rely on Chinese web documents to extract translations in both directions.
  • The problem is that the Chinese OOV term we are looking for is unknown in advance, so we have no information about how it should be segmented.
  • In previous work, this problem was overcome by manual intervention to provide appropriate segmentation.

12. Experiments and results
  • Chinese-English CLIR: retrieving English documents using Chinese queries.

13. Experiments and results (cont.)
  • English-Chinese CLIR: retrieving Chinese documents using English queries.
  • The aim of our work is to find appropriate Chinese translations of English OOV terms.

14. Conclusions
  • We have also described improved ways to extract the translation of OOV terms from the Web in a way that does not rely on prior segmentation. Although
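
The co-occurrence collection and translation selection steps outlined for English-Chinese CLIR can be illustrated with a short Python sketch. It is not the authors' implementation. The function names (collect_substring_stats, select_translation), the window and max_len parameters, and the ranking rule (higher frequency first, then longer substring) are assumptions made for illustration, and the snippets are hand-written stand-ins for text fetched from the Web.

```python
import re
from collections import Counter

# Runs of CJK Unified Ideographs (covers the characters used in this example).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def substrings(run, max_len):
    """Yield every contiguous substring of `run` up to `max_len` characters."""
    for i in range(len(run)):
        for j in range(i + 1, min(len(run), i + max_len) + 1):
            yield run[i:j]


def collect_substring_stats(snippets, e_oov, window=6, max_len=6):
    """Count Chinese substrings that appear next to the English OOV term.

    Returns (freq, f_oov): freq maps each Chinese substring to the number of
    occurrences of e_oov it appeared beside; f_oov is the total number of
    occurrences of e_oov. `window` (characters kept on each side) is an
    assumed parameter, not something stated in the slides.
    """
    pattern = re.compile(re.escape(e_oov), re.IGNORECASE)
    freq = Counter()
    f_oov = 0
    for text in snippets:
        for m in pattern.finditer(text):
            f_oov += 1
            s_left = text[max(0, m.start() - window):m.start()]
            s_right = text[m.end():m.end() + window]
            seen = set()  # count each substring once per occurrence
            for side in (s_left, s_right):
                for run in CJK_RUN.findall(side):
                    seen.update(substrings(run, max_len))
            freq.update(seen)
    return freq, f_oov


def select_translation(freq, dictionary=frozenset(), collection_terms=None):
    """Rank candidates, skipping known dictionary entries.

    If `collection_terms` is given, candidates absent from it are also
    skipped. The ordering (frequency, then length) is an assumed heuristic.
    """
    candidates = [
        s for s in freq
        if s not in dictionary
        and (collection_terms is None or s in collection_terms)
    ]
    return sorted(candidates, key=lambda s: (-freq[s], -len(s), s))


if __name__ == "__main__":
    snippets = [
        "導演北野武(Kitano Takeshi)的新片",
        "北野武 Kitano Takeshi 訪談",
        "日本導演北野武(Kitano Takeshi)",
    ]
    stats, f_oov = collect_substring_stats(snippets, "Kitano Takeshi")
    ranked = select_translation(stats, dictionary={"導演"})
    print(f_oov, ranked[0])
```

Counting every substring rather than segmenting first is what makes the approach segmentation-free: the correct translation surfaces because it is both frequent and long, while shorter fragments of it tie on frequency but lose on length.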
