Natural Language Processing Review

Uploaded by: 我*** | IP location: Guizhou | Uploaded: 2020-06-23 | Format: DOC | Pages: 14 | Size: 385 KB


Part 1

1. Basic views and methods of the two camps in NLP: rationalist and statistical

Symbolic approach (rationalism):
- Encode all the required information into the computer.
- Linguistic knowledge: static knowledge, context-dependent knowledge.
- World knowledge: uniqueness of reference, type of noun, situational associativity between nouns.

Statistical approach:

- Infer language properties from language samples (empiricism).
- Collect a large collection of texts relevant to your domain.
- For each noun, compute its probability of taking a certain determiner: P(determiner | noun) = freq(noun, determiner) / freq(noun).
- Given a new noun, select the determiner with the highest likelihood as estimated on the training corpus.

2. Given three encodings of the two characters 自然, decide which is Big5, GB2312, or UTF-8, and give the reasons.

Reasons:

Big5:
- The first byte ranges from 0xA1 to 0xF9.
- The second byte ranges from 0x40 to 0x7E or from 0xA1 to 0xFE.
- ASCII characters are still represented with a single byte.
- The MSB of the first byte of a Big5 character is always 1.
- Big5 is an 8-bit encoding with a 15-bit code space.

GB2312:
- Contains only one code point for each character.
- The MSB (bit 8) of each byte is set to 1, so the byte becomes an 8-bit character; otherwise the byte is interpreted as ASCII.
- Every Chinese character is represented by a two-byte code; the MSB of both the first and second bytes is set.

Encodings of 自然語言處理 (the GB2312 and UTF-8 rows encode the simplified form 自然语言处理):

Encoding  自      然      語      言      處      理
GB2312    D7D4    C8BB    D3EF    D1D4    B4A6    C0ED
Big5      A6DB    B54D    BB79    A8A5    B342    B27A
UTF-8     E887AA  E784B6  E8AFAD  E8A880  E5A484  E79086

3. Given five Chinese words, decide which word-formation pattern each belongs to (note the English terms).
- Modified noun compound: 大人，小人，熱心，水手，黑板，去年
- Modified verb compound: 寄生，飛馳，雜居，火葬，面授，單戀
- Coordinative compound: 報告，聲音，奇怪，幫助，學習，購買
- Antonymous compound: 買賣，左右，高矮，大小，開關，長短
- Verb-object compound: 放心，鼓掌，動員，司機，主席，干事
- Verb-complement compound: 進來，進去，介入，改良，打破，推翻
- Subject-predicate compound: 地震，心疼，民主，自決，膽小，年輕
- Noun-measure complement compound: 人口，羊群，書本，花朵，槍支
- Modifier-noun: 情人節，小說家，加油站，大學生，金黃色
- Verb-object tri-syllabic compound: 開玩笑，吹牛皮，吃豆腐
- Subject-verb-object: 膽結石，鬼畫符，鬼打牆
- Descriptive + noun: 棒棒糖，乒乓球，呼啦圈

4. Give example words for the three types of structural ambiguity (overlapping, combinatorial, mixed).
- Overlapping ambiguity (交集型歧義): 網球場，美國會
- Combinatorial ambiguity (組合型歧義): 才能，學生會
- Mixed type (混合型歧義): 太平洋，太平，平淡

5. Write down three types of features of unknown words.
- Abbreviations: 國考 (國家公務員考試)
- Proper names / named entities: names of people (小月月), names of places (延坪島), names of organizations (上海合作組織)
- Derived words: 審計人，審計員，審計局，審計處
- Compounds: 光敏感，流體力學
- Numeric-type compounds: 五月三日，八點十分，第一

Part 2: Information theory

(1) What is entropy?
- Defined by the second law of thermodynamics.
- A measure of the energy not available for work in a thermodynamic process.
- A closed system always tends towards a state of maximum entropy.

1. For limited substitutability, limited modifiability, and limited-extent compositionality, give two quantitative features each.
- Synonymy substitution and ratio.
- A feature that characterizes the distribution significance of how two words co-occur at different positions.
- The number of peak co-occurrences.
Examples: 開門，鬥志昂揚

2. How does WordNet identify the different senses of a word? How do HowNet and Tongyi Cilin do so? Distinguish how WordNet and HowNet describe the semantics of words.

WordNet:
- Distinguishes nouns, verbs, adjectives, and adverbs because they follow different grammatical rules.
- Every synset contains a group of synonymous words or collocations.
- Different senses of a word are in different synsets.
- The meaning of a synset is further clarified with a short defining gloss.
- Synsets are connected to other synsets via a number of semantic relations.

HowNet:
- Concept definitions in HowNet are based on sememes.
- Sememes are written in a structured markup language.

- HowNet builds a graph structure over its knowledge base from the inter-concept relations.
- The representation is based on concepts denoted by words and expressions in both Chinese and English.

Tongyi Cilin:
- Its hierarchical structure reflects the semantic relationships between words.
- Each minor semantic cluster consists of a set of words.
- Words under the same minor semantic cluster share the concept of that class.

3. WordNet divides words into synsets. How does it establish links between synsets, for example for nouns and adjectives?
- Synsets are connected to other synsets via a number of semantic relations.

4. Homonyms, antonyms, hypernymy, hyponymy, holonymy: explain what they mean and give examples. (Of course, the exam does not give the Chinese terms. The explanations should presumably be in English and the examples in Chinese.)
- Homonyms (同音): a group of words that share the same spelling and the same pronunciation but have different meanings.
- Antonyms (反義): different words having contradictory or contrary meanings.
- Synonyms (同義): different words having similar or identical meanings.
- Hypernymy (上位): the semantic relation of being superordinate or belonging to a higher rank or class.
- Hyponymy (下位): the semantic relation of being subordinate or belonging to a lower rank or class.
- Holonymy (整體): the relationship between a term denoting the whole and a term denoting a part of, or a member of, the whole.
- Meronym (部分): a word that names a part of a larger whole.
- Metonymy (轉指): a figure of speech in which a concept is referred to by the name of something closely associated with that concept.
- Proposition: the meaning of a statement.

5. (Thomas, or someone else?) What are the two basic assumptions about word sense ambiguity?
- One sense per collocation; one sense per discourse.

6. Distinguish how smoothing and linear interpolation differ in handling zero probabilities in language modeling.

7. Explain the two assumptions commonly used in word sense disambiguation: one sense per collocation and one sense per discourse.
- One sense per collocation: nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order, and syntactic relationship.
- One sense per discourse: the sense of a target word is highly consistent within any given document.
  - True for topic-dependent words.
  - Not true for verbs.

8. Answer the following questions about the hidden Markov model (HMM):

(1) Write down the three basic elements and the three basic problems of the Markov model.
Markov assumption: the next state depends only on the current state, not on the earlier history.
Three basic elements: the state transition probabilities, the observation (emission) probabilities, and the initial state distribution.
Three basic problems:
1. Evaluation problem
2. Decoding problem
3. Learning problem

(2) Describe the basic idea of the Viterbi algorithm, and state which basic problem of the hidden Markov model it addresses. (Decoding problem)
General idea: if the best path ending in state s_j at time k goes through state s_i at time k-1, then it should coincide with the best path ending in s_i at time k-1.

9. Describe the basic idea of window-based collocation extraction and the three types of commonly used features.
- Based on the defining property of collocations: recurrent and habitual use.
- For a given headword, collect all of the co-words surrounding this headword within fixed context windows.
- Identify the word combinations with statistically significant lexical co-occurrence as collocations.
- Features based on the significance of lexical co-occurrence frequency.
- Features based on the significance of lexical co-occurrence distribution.
- Features based on context.

10. Describe the basic idea of the k-means clustering algorithm.

11. 她是漂亮女孩: parse this sentence using bottom-up and top-down rules.

12. Of k-NN and k-means, which is used for clustering and which for classification? Describe the principle of each algorithm.
KNN

13. Give three ...
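
The window-based collocation extraction steps in question 9 can be sketched roughly as follows. This is only an illustration under assumptions that are not in the notes: the corpus is already tokenized, the window is +/-2 tokens, pointwise mutual information (PMI) stands in for a frequency-significance feature, and the peak offset stands in for a distribution feature. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def collect_cowords(sentences, headword, window=2):
    """Count co-words of `headword` per relative offset within +/- window."""
    by_position = defaultdict(Counter)  # co-word -> Counter{offset: count}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != headword:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    by_position[tokens[j]][j - i] += 1
    return by_position

def score_cowords(sentences, headword, window=2):
    """Return {co-word: (co-occurrence count, PMI, peak offset)}."""
    unigram = Counter(tok for tokens in sentences for tok in tokens)
    total = sum(unigram.values())
    scores = {}
    for coword, positions in collect_cowords(sentences, headword, window).items():
        joint = sum(positions.values())
        # Crude PMI estimate: log2( P(head, co) / (P(head) * P(co)) ),
        # with P(head, co) approximated as joint / total (an assumption).
        pmi = math.log2(joint * total / (unigram[headword] * unigram[coword]))
        # Offset at which this co-word occurs most often (distribution feature).
        peak_offset = max(positions, key=positions.get)
        scores[coword] = (joint, pmi, peak_offset)
    return scores

# Toy, made-up corpus for illustration only.
corpus = [
    ["she", "drinks", "strong", "tea", "daily"],
    ["strong", "tea", "keeps", "me", "awake"],
    ["he", "likes", "strong", "tea"],
    ["the", "tea", "was", "cold"],
    ["a", "strong", "wind", "blew"],
]

scores = score_cowords(corpus, "tea", window=2)
for word, (count, pmi, offset) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(word, count, round(pmi, 3), offset)
```

On a corpus this small, PMI favors rare words, so the sketch ranks candidates by raw co-occurrence count; a real extractor would combine several of the feature types listed in question 9 and apply a significance threshold.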

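
For question 8(2), the idea that the best path into a state must extend the best path into its predecessor can be sketched as a small dynamic program. This is a minimal illustration, not the course's reference solution: the two tags, the three-word sentence and all probabilities below are made-up values, and the function signature is an assumption.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            # Extend the best path into each predecessor r and keep the maximum.
            prob, prev = max(
                (best[t - 1][r][0] * trans_p[r][s] * emit_p[s][observations[t]], r)
                for r in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return prob, path

# Toy part-of-speech model; every number here is invented for illustration.
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {
    "N": {"dogs": 0.5, "chase": 0.1, "cats": 0.4},
    "V": {"dogs": 0.1, "chase": 0.8, "cats": 0.1},
}

prob, path = viterbi(["dogs", "chase", "cats"], states, start_p, trans_p, emit_p)
print(path, prob)
```

Storing a back-pointer per state and time step is what lets the decoder recover the whole path after a single left-to-right pass, instead of enumerating every state sequence.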