1. Using the Web for Automated Translation Extraction in Cross-Language Information Retrieval
Advisor: Dr. Hsu
Presenter: Zih-Hui Lin
Authors: Ying Zhang and Phil Vines

2. Outline
- Motivation
- Objective
- Previous work
- Methodology
- Experiments and results
- Conclusions

3. Motivation
- One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of out-of-vocabulary (OOV) terms.
- An OOV term is not recognized by the segmenter and is split into either smaller sequences of characters or individual characters. For example, 北野武 is translated character by character as "north limit military".
- Previous work has either relied on manual intervention or has only been partially successful in solving this problem.

4. Objective
- We propose a segmentation-free method that can be applied to both Chinese-English and English-Chinese CLIR.
- The method correctly extracts translations of OOV terms from the Web automatically, and is thus a significant improvement on earlier work.

5. English translation extraction in Chinese-English CLIR
- Chinese OOV term detection: for a query such as 北野武 ("north limit military"), the P_value given by the HMM will be very low. If P_value < P_min, the query is taken to contain OOV terms.
- Web text extraction: we extract strings from the Web that contain the Chinese query terms and some English text.
- Collection of co-occurrence statistics.
- Translation selection.
Example strings (c = Chinese character, e = English string):
- 北野武 (Kitano Takeshi): c4 c5 c6 e1
- 導演北野武 (Kitano Takeshi): c2 c3 c4 c5 c6 e1

6. Chinese translation extraction in English-Chinese CLIR
- Extraction of web text: use Google to fetch the top 100 Chinese documents with the English OOV term e_oov as the query.
- Collection of co-occurrence statistics: accumulate the frequency f_oov; consider all substrings in S_left and S_right, collecting the frequency f_n and the length |s_n| of each Chinese substring.
- Translation selection: exclude any substring that is already in the translation dictionary or that does not occur in the document collection.

7. Experiments and results
- Chinese-English CLIR
- English-Chinese CLIR

8. Introduction
- When translating from Chinese to English, a standard first step is to segment the text into words based on an existing segmentation dictionary.
- However, where an OOV term occurs, it will not be recognized, and it will be segmented into either smaller sequences of characters or individual characters.
- We propose a segmentation-free method based on frequency and length analysis and corpus-based disambiguation.

9. Previous work
Dictionary-based translation schemes need to address three major issues:
- Phrase identification and translation, e.g. "non proliferation treaty" and "cross straits".
- Translation ambiguity, handled with techniques such as term co-occurrence, mutual information, or language modeling.
- Out-of-vocabulary (OOV) terms, e.g. "Dioxin".

10. Previous work: Existing approaches to the OOV problem
- Depending on the language, it may be possible to deduce appropriate transliterated translations automatically. This approach has been applied successfully in English-Arabic CLIR.
- The issue is more difficult in Chinese: many characters have the same sound, and many English syllables have no equivalent sound in Chinese, so selecting the correct characters to represent a transliterated word can be problematic.
- Examples: cross straits (兩岸), 北野武 ("north limit military").

11. Previous work: Segmentation-free translation extraction
- It is common to find a small amount of English text in Chinese web documents, but extremely rare to find Chinese text in English web documents. We therefore rely on Chinese web documents to extract translations in both directions.
- The problem is that the Chinese OOV term we are looking for is currently unknown, so we have no information about how it should be segmented.
- In previous work, this problem was overcome by manual intervention to provide an appropriate segmentation.

12. Experiments and results
- Chinese-English CLIR: retrieving English documents using Chinese queries.

13. Experiments and results (cont.)
- English-Chinese CLIR: retrieving Chinese documents using English queries.
- The aim of our work is to find appropriate Chinese translations of English OOV terms.

14. Conclusions
- We have also described improved ways to extract the translation of OOV terms from the Web in a way that does not rely on prior segmentation. Although
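
Illustrative sketch (not from the slides): the English-Chinese extraction steps on slide 6 (collect the Chinese substrings to the left and right of the English OOV term, count their frequency and length, then filter against the dictionary) could be prototyped roughly as below. The function names, the character window size, the maximum substring length, the restriction to the basic CJK block, and the ranking rule (highest frequency first, longer substring as tie-breaker) are all assumptions made for illustration; the slides do not state the actual selection formula. Web retrieval is left out, so `snippets` stands in for text already fetched from Chinese web pages.

```python
import re
from collections import Counter

# Runs of CJK unified ideographs. Treating only this block as "Chinese"
# is a simplifying assumption of this sketch.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def substrings(s, max_len=8):
    """Yield every contiguous substring of s with at most max_len characters."""
    for i in range(len(s)):
        for j in range(i + 1, min(len(s), i + max_len) + 1):
            yield s[i:j]


def collect_statistics(snippets, e_oov, window=10):
    """Count occurrences of e_oov (f_oov) and of nearby Chinese substrings (f_n).

    For each occurrence, S_left and S_right are the `window` characters
    before and after the English term (the window size is an assumption).
    """
    f_oov = 0
    f_n = Counter()
    pattern = re.compile(re.escape(e_oov), re.IGNORECASE)
    for text in snippets:
        for m in pattern.finditer(text):
            f_oov += 1
            s_left = text[max(0, m.start() - window):m.start()]
            s_right = text[m.end():m.end() + window]
            for side in (s_left, s_right):
                for run in CJK_RUN.findall(side):
                    f_n.update(substrings(run))
    return f_oov, f_n


def select_translation(f_n, dictionary, in_collection=lambda s: True):
    """Rank candidates after dropping dictionary entries and unseen strings.

    Ranking by (frequency, length) is an assumed stand-in for the
    frequency and length analysis mentioned on the slides.
    """
    candidates = [s for s in f_n if s not in dictionary and in_collection(s)]
    return sorted(candidates, key=lambda s: (f_n[s], len(s)), reverse=True)


if __name__ == "__main__":
    snippets = [
        "導演北野武 (Kitano Takeshi) 的電影",
        "北野武(Kitano Takeshi)出席",
        "日本導演北野武 Kitano Takeshi",
    ]
    f_oov, f_n = collect_statistics(snippets, "Kitano Takeshi")
    print(f_oov, select_translation(f_n, dictionary={"導演"})[:3])
```

The length tie-breaker matters here because every character and prefix of a frequent string is at least as frequent as the string itself, so raw frequency alone would favor single characters.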