Part 1

1. Basic viewpoints and methods of the two schools in NLP: rationalist and statistical

Symbolic approach (rationalism): encode all the required information into the computer.
- linguistic knowledge (static knowledge, context-dependent knowledge)
- world knowledge (uniqueness of reference, type of noun, situational associativity between nouns)

Statistical approach (empiricism): infer language properties from language samples.
- Collect a large collection of texts relevant to your domain.
- For each noun, compute its probability of taking a certain determiner: P(determiner | noun) = freq(noun, determiner) / freq(noun)
- Given a new noun, select the determiner with the highest likelihood as estimated on the training corpus.

2. Given three encodings of the two characters "自然", decide which is Big5, GB2312 or UTF-8, and give reasons

Reason:
Big5:
- the first byte ranges from 0xA1 to 0xF9
- the second byte ranges from 0x40 to 0x7E or from 0xA1 to 0xFE
- ASCII characters are still represented with a single byte
- the MSB of the first byte of a Big5 character is always 1
- Big5 is an 8-bit encoding with a 15-bit code space

GB2312:
- contains only one code point for each character
- the MSB (bit 8) of each byte is set to 1, so the byte becomes an 8-bit character; otherwise the byte is interpreted as ASCII
- every Chinese character is represented by a two-byte code, and the MSBs of both the first and second bytes are set

          自      然      语      言      处      理
GB2312    D7D4    C8BB    D3EF    D1D4    B4A6    C0ED
Big5      A6DB    B54D    BB79    A8A5    B342    B27A
UTF-8     E887AA  E784B6  E8AFAD  E8A880  E5A484  E79086

3. Given five Chinese words, decide which word-formation type each belongs to (note the English terms)
- Modified noun compound: 大人, 小人, 热心, 水手, 黑板, 去年
- Modified verb compound: 寄生, 飞驰, 杂居, 火葬, 面授, 单恋
- Coordinative compound: 报告, 声音, 奇怪, 帮助, 学习, 购买
- Antonymous compound: 买卖, 左右, 高矮, 大小, 开关, 长短
- Verb-object compound: 放心, 鼓掌, 动员, 司机, 主席, 干事
- Verb-complement compound: 进来, 进去, 介入, 改良, 打破, 推翻
- Subject-predicate compound: 地震, 心疼, 民主, 自决, 胆小, 年轻
- Noun-measure complement compound: 人口, 羊群, 书本, 花朵, 枪支
- Modifier-noun: 情人节, 小说家, 加油站, 大学生, 金黄色
- Verb-object tri-syllabic compound: 开玩笑, 吹牛皮, 吃豆腐
- Subject-verb-object: 胆结石, 鬼画符, 鬼打墙
- Descriptive + noun: 棒棒糖, 乒乓球, 呼啦圈

4. Give example words for the three types of structural ambiguity (overlapping, combinatorial, mixed)
- Overlapping ambiguity (交集型歧义): 网球场, 美国会
- Combinatorial ambiguity (组合型歧义): 才能, 学生会
- Mixed type (混合型歧义): 太平洋, 太平, 平淡

5. Write down three types of features of unknown words
- abbreviations (国考 - 国家公务员考试)
- proper names / named entities: names of people (小月月), names of places (延坪岛), names of organizations (上海合作组织)
- derived words (审计人, 审计员, 审计局, 审计处)
- compounds (光敏感, 流体力学)
- numeric-type compounds (五月三日, 八点十分, 第一)

Part 2: Information theory

(1) What is entropy?
- Defined by the second law of thermodynamics
- A measure of the energy not available for work in a thermodynamic process
- A closed system always tends towards a state of maximum entropy

1. For limited substitutability, limited modifiability and compositionality to a limited extent, give two quantitative features each
- synonymy substitution ratio
- a feature that characterizes the distribution significance of how two words co-occur at different positions
- the number of peak co-occurrences
Examples: 开门, 斗志昂扬

2. How does WordNet distinguish the different senses of a word? How do HowNet and Tong Yi Cilin do so? Contrast the ways WordNet and HowNet describe the semantics of words.

WordNet:
- word classes follow different grammatical rules
- every synset contains a group of synonymous words or collocations
- different senses of a word are in different synsets
- the meaning of each synset is further clarified with a short defining gloss
- synsets are connected to other synsets via a number of semantic relations

HowNet:
- the concept definition in HowNet is based on sememes
- sememes are expressed in a structured markup language
- HowNet constructs a graph structure of its knowledge base on the inter-concept relations
- the representation is based on concepts denoted by words and expressions in both Chinese and English

Tong Yi Cilin:
- its hierarchical structure reflects the semantic relationships between words
- each minor semantic cluster consists of a set of words
- words under the same minor semantic cluster share the concept of that class

3. WordNet divides words into synsets. How does it establish the links between synsets, for example for nouns and adjectives?
Synsets are connected to other synsets via a number of semantic relations.

4. Homonyms, antonyms, hypernymy, hyponymy, holonymy (the exam, of course, did not give the Chinese terms)
- Homonyms (同音): a group of words that share the same spelling and the same pronunciation but have different meanings
- Antonyms (反义): different words having contradictory or contrary meanings
- Synonyms (同义): different words having similar or identical meanings
- Hypernymy (上位): the semantic relation of being superordinate or belonging to a higher rank or class
- Hyponymy (下位): the semantic relation of being subordinate or belonging to a lower rank or class
- Holonymy (整体): the relationship between a term denoting the whole and a term denoting a part of the whole
- Meronym (部分): a word that names a part of a larger whole
- Metonymy (转指): a figure of speech in which a concept is referred to by the name of something closely associated with that concept
- Proposition: the meaning of a statement
Explain what these terms mean and give examples (the explanations should presumably be in English and the examples in Chinese).

5. What are the two basic assumptions about word sense ambiguity (attributed to Thomas, or someone similar)?
One sense per collocation, one sense per discourse.

6. Contrast how smoothing and linear interpolation handle zero probabilities in language modeling.

7. Explain the two assumptions commonly used in word sense disambiguation: one sense per collocation and one sense per discourse.
- One sense per collocation: nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order and syntactic relationship.
- One sense per discourse: the sense of a target word is highly consistent within any given document. This holds for topic-dependent words but not for verbs.

8. Answer the following questions about the hidden Markov model (HMM):
(1) Write down the three basic elements and the three basic problems of the Markov model.
Markov assumption:
Three basic elements:
Three basic problems:
1. Evaluation problem
2. Decoding problem
3. Learning problem
(2) Describe the basic idea of the Viterbi algorithm, and state which basic HMM problem it addresses (the decoding problem).
General idea: if the best path ending in state s_j at time t goes through state s_i at time t-1, then up to time t-1 it should coincide with the best path ending in s_i.

9. Describe the basic idea of window-based collocation extraction and its three commonly used types of features.
- Based on the property of collocations: recurrent and habitual use
- For a given headword, collect all of the co-words surrounding this headword within fixed context windows
- Identify the word combinations with statistical lexical significance as collocations
- Features based on lexical co-occurrence frequency significance
- Features based on lexical co-occurrence distribution significance
- Features based on context

10. Describe the basic idea of the k-means clustering algorithm.

11. 她 是 漂亮 女孩: parse this sentence using bottom-up and top-down rules.

12. Of K-NN and k-means, which is used for clustering and which for classification? Describe the principle of each algorithm. KNN

13. Give three
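For the statistical approach in question 1 of Part 1, the determiner-selection procedure can be sketched in a few lines. This is only an illustration: it assumes the probability is estimated by relative frequency, and the function names and the tiny list of (determiner, noun) pairs are hypothetical, not taken from the course material.

```python
from collections import Counter


def train(pairs):
    """Count (determiner, noun) pairs and nouns in a toy training corpus."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for _det, noun in pairs)
    return pair_counts, noun_counts


def p_det_given_noun(det, noun, pair_counts, noun_counts):
    """Relative-frequency estimate of P(determiner | noun); 0.0 for unseen nouns."""
    if noun_counts[noun] == 0:
        return 0.0
    return pair_counts[(det, noun)] / noun_counts[noun]


def best_determiner(noun, determiners, pair_counts, noun_counts):
    """Select the determiner with the highest estimated likelihood for the noun."""
    return max(
        determiners,
        key=lambda d: p_det_given_noun(d, noun, pair_counts, noun_counts),
    )


# Hypothetical training pairs, for illustration only.
pairs = [("the", "sun"), ("the", "sun"), ("a", "sun"),
         ("a", "dog"), ("the", "dog"), ("a", "dog")]
pair_counts, noun_counts = train(pairs)
print(best_determiner("sun", ["a", "the"], pair_counts, noun_counts))
print(best_determiner("dog", ["a", "the"], pair_counts, noun_counts))
```

An unseen noun gets probability 0.0 for every determiner here, which is exactly the zero-probability situation that smoothing and interpolation (question 6) are meant to handle.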
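For the encoding question (question 2 of Part 1), the three byte sequences for "自然" can be told apart mechanically. The sketch below is one simple way to do it, assuming Python's built-in `gb2312`, `big5` and `utf-8` codecs; the helper name `guess_encoding` is hypothetical, and trial decoding against a known text is not a general-purpose encoding detector.

```python
# Candidate encodings from the question; the names are Python's built-in codecs.
CANDIDATES = ("gb2312", "big5", "utf-8")


def guess_encoding(raw: bytes, expected: str) -> list[str]:
    """Return the candidate encodings under which `raw` decodes to `expected`."""
    matches = []
    for name in CANDIDATES:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # The byte sequence is not valid in this encoding.
            pass
    return matches


# Hex codes for "自然" taken from the table in the notes.
hex_codes = ["D7D4C8BB", "A6DBB54D", "E887AAE784B6"]
results = {h: guess_encoding(bytes.fromhex(h), "自然") for h in hex_codes}
for hex_code, names in results.items():
    print(hex_code, names)
```

The byte-range rules in the notes explain the outcome: the UTF-8 sequence uses three bytes per character, while the two double-byte encodings place the same characters at different code points.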
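For question 8 (2), the Viterbi idea can be made concrete with a small dynamic-programming sketch. It is a minimal version under stated assumptions: the two-state model, its state names and all probabilities below are hypothetical toy values, and the function works with plain probabilities rather than log probabilities, so it is only suitable for short sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most likely hidden state sequence."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # The best path into s extends the best path into some predecessor.
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model, for illustration only.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

Storing only the best predecessor per state and time step is what keeps the search linear in the sequence length instead of enumerating every possible state sequence.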
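For question 9, the collection step of window-based collocation extraction can be sketched as follows. This is an assumption-laden illustration: the function names, the window size of 2 and the toy sentence are hypothetical, and only raw counts are computed. The significance tests that turn these counts into frequency and distribution features are not shown.

```python
from collections import Counter


def window_cooccurrences(tokens, headword, window=2):
    """Count co-words by relative position within +/- `window` of each headword occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != headword:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                # Key on (co-word, offset) so the positional distribution is kept.
                counts[(tokens[j], j - i)] += 1
    return counts


def cooccurrence_frequency(counts):
    """Collapse positional counts into a total frequency per co-word."""
    totals = Counter()
    for (word, _offset), n in counts.items():
        totals[word] += n
    return totals


# Hypothetical pre-tokenized text, for illustration only.
tokens = "he opened the door and she opened the window then he closed the door".split()
counts = window_cooccurrences(tokens, "opened", window=2)
totals = cooccurrence_frequency(counts)
print(totals.most_common(3))
```

Keeping the offset in the key preserves how a co-word is spread over positions, which is the raw material for distribution-based features, while the collapsed totals feed frequency-based features.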
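For questions 10 and 12, the contrast between the two algorithms is easiest to see side by side: k-means groups unlabeled points into clusters, while k-NN assigns a label to a new point from labeled examples. The sketch below is a minimal version under stated assumptions: the function names, the fixed initial centroids and the toy 2-D points are all hypothetical, and a practical k-means would also choose its initial centroids and test for convergence.

```python
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Clustering: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def knn_classify(query, labeled_points, k=3):
    """Classification: majority vote among the k nearest labeled neighbors."""
    nearest = sorted(labeled_points, key=lambda item: dist2(query, item[0]))[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
labeled = [(p, "A") for p in points[:3]] + [(p, "B") for p in points[3:]]
print(centroids)
print(knn_classify((2.0, 1.5), labeled), knn_classify((8.0, 9.0), labeled))
```

k-means never sees a label and only returns groupings, whereas k-NN has no training step at all and defers the whole computation to query time.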