Natural Language Processing Review

Part I

1. The basic views and methods of the two schools in NLP, rationalist and statistical.

Symbolic approach: encode all the required information into the computer (rationalism).
- linguistic knowledge (static knowledge, context-dependent knowledge)
- world knowledge (uniqueness of reference, type of noun, situational associativity between nouns)

Statistical approach: infer language properties from language samples (empiricism).
- Collect a large collection of texts relevant to your domain.
- For each noun, compute its probability of taking a certain determiner: P(determiner | noun) = count(noun, determiner) / count(noun).
- Given a new noun, select the determiner with the highest likelihood as estimated on the training corpus.

2. Given three encodings of the two characters 自然, decide whether each is Big5, GB2312, or UTF-8, and give the reasons.

Big5:
- The first byte ranges from 0xA0 to 0xF9; the second byte ranges from 0x40 to 0x7E or 0xA0 to 0xFE.
- ASCII characters are still represented with a single byte.
- The MSB of the first byte of a Big5 character is always 1.
- Big5 is an 8-bit encoding with a 15-bit code space.

GB2312:
- Contains only one code point per character.
- If the MSB (bit 8) of a byte is set to 1, the byte is part of an 8-bit (two-byte) character; otherwise the byte is interpreted as ASCII.
- Every Chinese character is represented by a two-byte code; the MSBs of both the first and the second byte are set to 1.

UTF-8:
- Each Chinese character takes three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx.

Encodings of 自然語言處理:

        自      然      語      言      處      理
GB2312  D7D4    C8BB    D3EF    D1D4    B4A6    C0ED
Big5    A6DB    B54D    BB79    A8A5    B342    B27A
UTF-8   E887AA  E784B6  E8AFAD  E8A880  E5A484  E79086

3. Given five Chinese words, identify the word-formation type of each (note the English terms):
- Modified noun compound: 大人, 小人, 熱心, 水手, 黑板, 去年
- Modified verb compound: 寄生, 飛馳, 雜居, 火葬, 面授, 單戀
- Coordinative compound: 報告, 聲音, 奇怪, 幫助, 學(xué)習(xí), 購買
- Antonymous compound: 買賣, 左右, 高矮, 大小, 開關(guān), 長短
- Verb-object compound: 放心, 鼓掌, 動員, 司機(jī), 主席, 干事
- Verb-complement compound: 進(jìn)來, 進(jìn)去, 介入, 改良, 打破, 推翻
- Subject-predicate compound: 地震, 心疼, 民主, 自決, 膽小, 年輕
- Noun-measure complement compound: 人口, 羊群, 書本, 花朵, 槍支
- Modifier-noun compound: 情人節(jié), 小說家, 加油站, 大學(xué)生, 金黃色
- Verb-object tri-syllabic compound: 開玩笑, 吹牛皮, 吃豆腐
- Subject-verb-object compound: 膽結(jié)石, 鬼畫符, 鬼打牆
- Descriptive + noun compound: 棒棒糖, 乒乓球, 呼啦圈

4. Give word examples of the three types of structural ambiguity:
- Overlapping ambiguity (交集型歧義): 網(wǎng)球場, 美國會
- Combinatorial ambiguity (組合型歧義): 才能, 學(xué)生會
- Mixed type (混合型歧義): 太平洋, 太平, 平淡

5. Write down three types of unknown words (with examples):
- Abbreviations: 國考 (國家公務(wù)員考試)
- Proper names / named entities: persons (小月月), places (延坪島), organizations (上海合作組織)
- Derived words: 審計(jì)人, 審計(jì)員, 審計(jì)局, 審計(jì)處
- Compounds: 光敏感, 流體力學(xué)
- Numeric-type compounds: 五月三日, 八點(diǎn)十分, 第一

Part II: Information theory and lexical semantics

(1) What does entropy mean?
- Defined by the second law of thermodynamics.
- A measure of the energy not available for work in a thermodynamic process.
- A closed system always tends towards a state of maximum entropy.

1. For limited substitutability, limited modifiability, and limited compositionality, give two quantitative features each:
- synonym substitution and the substitution ratio;
- a feature characterizing the distributional significance of how two words co-occur at different positions;
- the number of peak co-occurrence positions.
Examples: 開門, 斗志昂揚.

2. How does WordNet identify the different senses of a word? How do HowNet and TongYiCiLin do it? Distinguish how WordNet and HowNet describe word semantics.

WordNet:
- Words that follow different grammatical rules are treated separately.
- Every synset contains a group of synonymous words or collocations.
- Different senses of a word are placed in different synsets.
- The meaning of a synset is further clarified with a short defining gloss.
- Synsets are connected to other synsets via a number of semantic relations.

HowNet:
- Concept definitions are based on sememes.
- Sememes are written in a structured markup language.
- HowNet builds a graph-structured knowledge base on inter-concept relations.
- The representation is based on concepts denoted by words and expressions in both Chinese and English.

TongYiCiLin (同義詞詞林):
- Its hierarchical structure reflects the semantic relationships between words.
- Each minor semantic cluster consists of a set of words.
- Words under the same minor semantic cluster share the concept of that class.

3. WordNet groups words into synsets; how does it establish links between synsets, e.g. for nouns and adjectives?
- Synsets are connected to other synsets via a number of semantic relations: noun synsets mainly through hypernymy/hyponymy and holonymy/meronymy, adjective synsets mainly through antonymy.

4. Explain the following relations and give examples (explanations in English; examples may be in Chinese — the exam does not supply the Chinese glosses):
- Homonyms (同音): words that share the same spelling and the same pronunciation but have different meanings.
- Antonyms (反義): words with contradictory or contrary meanings.
- Synonyms (同義): words with similar or identical meanings.
- Hypernymy (上位): the semantic relation of being superordinate, i.e. belonging to a higher rank or class.
- Hyponymy (下位): the semantic relation of being subordinate, i.e. belonging to a lower rank or class.
- Holonymy (整體): the relation between a term denoting the whole and a term denoting one of its parts.
- Meronymy (部分): a word that names a part of a larger whole.
- Metonymy (轉(zhuǎn)指): a figure of speech in which a concept is referred to by the name of something closely associated with it.
- Proposition: the meaning of a statement.

5. What are the two basic assumptions about word-sense ambiguity? (The original notes attribute them vaguely to "Thomas or someone".) One sense per collocation; one sense per discourse.

6. Distinguish how smoothing and linear interpolation handle zero-probability events in language modeling.

7. Explain the two common assumptions used in word-sense disambiguation:
- One sense per collocation: nearby words provide strong and consistent clues to the sense of a target word, conditional on relative distance, order, and syntactic relationship.
- One sense per discourse: the sense of a target word is highly consistent within any given document; true for topic-dependent words, not true for verbs.

8. Hidden Markov Models:
(1) The three basic elements and the three basic problems.
- Elements: the initial state distribution π, the state-transition probabilities A, and the emission probabilities B (under the Markov assumption).
- Problems: 1. the evaluation problem; 2. the decoding problem; 3. the learning problem.
(2) Describe the basic idea of the Viterbi algorithm and say which basic HMM problem it addresses (the decoding problem).
General idea: if the best path ending in state s_j at time t passes through state s_i at time t-1, then its prefix must coincide with the best path ending in s_i at time t-1; so it suffices to keep, for each state at each time step, only the best path reaching it.

9. Describe the basic idea of window-based collocation extraction and three commonly used types of features.
Basic idea, based on the defining property of collocations (recurrent and habitual use):
- For a given headword, collect all the co-words surrounding it within a fixed context window.
- Identify word combinations with statistically significant lexical association as collocations.
Features:
- features based on lexical co-occurrence frequency significance;
- features based on lexical co-occurrence distribution significance;
- features based on context.

10. Describe the basic idea of the k-means clustering algorithm.

11. Recognize the sentence 她 是 漂亮 女孩 with bottom-up and with top-down parsing rules.

12. Which of k-NN and k-means is for clustering and which for classification? Describe each algorithm. k-NN is the classifier: it labels a new sample by a majority vote among its k nearest labeled neighbors. k-means is the clustering algorithm: it alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.

13. Give three …
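The Viterbi idea from the HMM question can be made concrete with a short sketch. The weather HMM below (states, probabilities, and the observation sequence) is a made-up toy example, not part of the course notes:

```python
# Viterbi decoding for a toy HMM (the decoding problem): find the most
# likely hidden state sequence for a given observation sequence.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # delta[s] = probability of the best path ending in state s so far
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []  # backpointers for path recovery
    for o in obs[1:]:
        ptr, new_delta = {}, {}
        for s in states:
            # The best path ending in s must extend the best path ending
            # in some predecessor state -- the coincidence property.
            prev = max(states, key=lambda p: delta[p] * trans_p[p][s])
            new_delta[s] = delta[prev] * trans_p[prev][s] * emit_p[s][o]
            ptr[s] = prev
        delta, back = new_delta, back + [ptr]
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.insert(0, ptr[path[0]])
    return path, delta[last]

# Hypothetical weather HMM: hidden states Rainy/Sunny, observed activities.
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
path, p = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path)  # -> ['Sunny', 'Rainy', 'Rainy']
```

Each step keeps, per state, only the best path reaching it, which is exactly the "best path ending in s_j goes through the best path ending in s_i" property stated above; this reduces the exponential number of paths to a table of size states × time.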

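The byte-range reasoning in Part I, question 2 can also be checked mechanically. The sketch below is a rough heuristic that assumes the input is a short, all-Chinese byte string in one of the three encodings; the function name and the decision order are my own, and real charset detection needs far more evidence than this:

```python
def guess_encoding(data: bytes) -> str:
    """Rough guess among UTF-8 / Big5 / GB2312 for a short Chinese string,
    using only the byte-range facts from the notes."""
    try:
        data.decode("utf-8")
        return "UTF-8"  # valid 1110xxxx 10xxxxxx 10xxxxxx sequences
    except UnicodeDecodeError:
        pass
    # Both legacy encodings set the MSB of the first byte of each pair;
    # they differ in the second byte: GB2312 requires its MSB set too
    # (0xA1-0xFE), while Big5 also allows 0x40-0x7E.
    second_bytes = data[1::2]  # assumes two bytes per character
    if all(0xA1 <= b <= 0xFE for b in second_bytes):
        return "GB2312"
    return "Big5"

print(guess_encoding("自然".encode("utf-8")))   # UTF-8
print(guess_encoding("自然".encode("gb2312")))  # GB2312
print(guess_encoding("自然".encode("big5")))    # Big5
```

Note that a valid GB2312 string is usually also a structurally valid Big5 string, which is why the GB2312 check must come first here, and why the exam asks for reasons rather than a single decisive test.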