CL timetable

Mon    Tues   Weds   Thurs   Fri
25     26     27     28      29
1      2      3      4       5
8      9      10     11      12

7th April (Sun): Friday timetable

CL timetable

Date                          Time          Room
27/03 (Wed)                   18:30-21:30   E6-224
28/03 (Thu)                   13:15-16:40   E6-219
29/03 (Fri)                   14:05-17:30   E6-219
03/04 (Wed)                   18:30-21:30   E6-224
07/04 (Sun, Friday timetable) 14:05-17:30   E6-219
10/04 (Wed)                   18:30-21:30   E6-224
11/04 (Thu)                   13:15-16:40   E6-219
12/04 (Fri)                   14:05-17:30   E6-219

Introducing Corpus Linguistics
Corpus Linguistics, Richard Xiao

Module description
- Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics
  - Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning.” (Leech 1997)
- One of the strengths of corpus data lies in its empirical and attested nature
  - It pools together the intuitions of a great number of speakers
  - It makes linguistic analysis more objective
- This module introduces the theoretical and practical issues of using corpora in linguistic studies
  - It explores how the corpus-based approach and other methodologies can be combined in linguistic studies

Aims of the module
The module aims to:
- provide an introduction to corpus linguistics;
- familiarise students with major corpus resources and tools;
- pass on essential knowledge and skills for building DIY corpora;
- keep students up to date with the latest developments in corpus research;
- develop students’ ability in corpus-based language studies.

Contents
1) Introducing corpus linguistics
2) Corpus design and types of corpora
3) Data capture and markup
4) Corpus annotation
5) Making statistical claims
6) Corpus analysis (1): concordance and wordlist
7) Corpus analysis (2): keyword analysis
8) Corpora in lexicographic and lexical studies
9) Corpora in grammatical studies
10) Corpora in diachronic studies
11) Corpora in language variation research
12) Corpora in sociolinguistic studies
13) Corpora in language education
14) Corpora in literary and stylistic studies
15) Corpora in critical discourse analysis
16) Corpora in contrastive and translation studies

Learning outcomes
On successful completion of the module, students will be able to:
- understand the major theoretical frameworks in corpus linguistics and formulate research questions that are amenable to corpus research;
- think critically about the strengths and weaknesses of the corpus methodology and decide when and how to interface it with other methodologies;
- become familiar with major corpus resources and tools, and develop DIY corpora when necessary;
- apply the corpus-based approach in their own research.

Teaching/learning strategies
- With a dual focus on why and how to use corpora in language studies, this practical module will be delivered through a series of lectures and hands-on lab sessions
- The module also engages students in extensive reading and interaction with corpus data outside of class

Assessment
- Option A
  - A 1,000-word essay that critically reviews a corpus exploration tool or a corpus-based study (40%)
  - A 2,500-word project report (60%)
- Option B
  - One 3,500-word essay based on a research project of your own choice (100%)
- Deadline: Friday 31 May 2013
- Submission: a Word copy as an email attachment

Reading list
- Set texts
  - McEnery, A., Xiao, R. and Tono, Y. (2006) Corpus-Based Language Studies: An Advanced Resource Book. London & New York: Routledge.
  - Wynne, M. (2005) Developing Linguistic Corpora. Oxford: Oxbow Books. Available online at http://www.ahds.ac.uk/creating/guides/linguistic-corpora
- Recommended reading
  - See the module syllabus on the course website: www.lancs.ac.uk/fass/projects/corpus/ZJU/CL_syllabus.htm (password for unzipping the ebooks: lancs)

Outline of this session
- Lecture: introducing key concepts and debates in corpus linguistics
  - What is and is not a corpus?
  - Why use corpora?
  - Corpora vs. intuitions
  - The corpus methodology
  - A brief history of corpus linguistics
  - Nature and applications of corpus-based studies
- Lab: testing your intuitions + exploring online resources

What is a corpus?
- The word corpus comes from Latin (“body”), and the plural is corpora
- A corpus is a body of naturally occurring language, but it is rarely a random collection of text
  - Corpora “are generally assembled with particular purposes in mind, and are often assembled to be (informally speaking) representative of some language or text type.” (Leech 1992)
- “A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.” (MXT 2006: 5)

What is not a corpus?
- A list of words is not a corpus
  - Building blocks of language
- A text archive is not a corpus
  - A random collection of texts
- A collection of citations is not a corpus
  - A short quotation which contains a word or phrase that is the reason for its selection
- A collection of quotations is not a corpus
  - A short selection from a text chosen on internal criteria by human beings
- A text is not a corpus
  - Intended to be read in different ways
- The Web is not a corpus
  - Its dimensions are unknown, it is constantly changing, and it is not designed from a linguistic perspective
Sinclair (2005)

What is a corpus for?
- A corpus is made for the study of language in a broad sense
  - To test existing linguistic theories and hypotheses
  - To generate and verify new linguistic hypotheses
  - Beyond linguistics, to provide textual evidence in text-based humanities and social science subjects
- The purpose is reflected in a well-designed corpus

Why use corpora?
- Even expert speakers have only a partial knowledge of a language
  - A corpus can be more comprehensive and balanced
- Even expert speakers tend to notice the unusual and think of what is possible
  - A corpus can show us what is common and typical
- Even expert speakers cannot quantify their knowledge of language
  - A corpus can readily give us accurate statistics

Why use corpora?
- Even expert speakers cannot remember everything they know
  - A corpus can store and recall all the information that has been stored in it
- Even expert speakers cannot make up natural examples
  - A corpus can provide us with a vast number of examples in real communicative contexts
- Even expert speakers have prejudices and preferences, and every language has cultural connotations and an underlying ideology
  - A corpus can give you more objective evidence

Why use corpora?
- Even expert speakers are not always available to be consulted
  - A corpus can be made permanently accessible to all
- Even expert speakers cannot keep up with language change
  - A constantly updated corpus can reflect even recent changes in the language
- Even expert speakers lack authority: they can be challenged by other expert speakers
  - A corpus can encompass the actual language use of many expert speakers

Intuitions as an alternative
- Intuitions are always useful in linguistics
  - To invent (grammatical, ungrammatical, or questionable) example sentences for linguistic analysis
  - To make judgments about the acceptability/grammaticality or meaning of an expression
  - To help with categorization

Intuitions as an alternative
- Intuitions should be applied with caution
  - They are possibly biased, as they are likely to be influenced by one’s dialect or sociolect
  - Introspective data is artificial and may not represent typical language use, as one is consciously monitoring one’s own language production
  - Introspective data is decontextualized, because it exists in the analyst’s mind rather than in any real linguistic context
  - Intuitions are not observable and verifiable by everyone, as corpora are
  - Excessive reliance on intuitions blinds the analyst to the realities of language usage, because we tend to notice the unusual but overlook the commonplace
  - There are areas in linguistics where intuitions cannot be used reliably, e.g. language variation, historical linguistics, register and style, first and second language acquisition
  - Human beings have only the vaguest notion of the frequency of a construction or a word

Benefits of corpus data
- Corpus data is more reliable
  - A corpus pools together the linguistic intuitions of a range of language speakers, which offsets the potential biases in the intuitions of individual speakers
- Corpus data is more natural
  - It is used in real communication instead of being invented specifically for linguistic analysis
- Corpus data is contextualized
  - It is attested language use that has already occurred in a real linguistic context
- Corpus data is quantitative
  - Corpora can readily provide frequencies and statistics
- Corpus data can reveal differences that intuitions alone cannot perceive
  - E.g. the synonyms totally, absolutely, utterly, completely, entirely

Corpora vs. intuitions
- They are not necessarily antagonistic. Rather, they corroborate each other and can be gainfully viewed as complementary
  - Armchair linguists and corpus linguists “need each other. Or better, the two kinds of linguists, wherever possible, should exist in the same body.” (Fillmore 1992)
  - “Neither the corpus linguist of the 1950s, who rejected intuitions, nor the general linguist of the 1960s, who rejected corpus data, was able to achieve the interaction of data coverage and the insight that characterize the many successful corpus analyses of recent years.” (Leech 1991)
- The key to using corpus data is to find the balance between the use of corpus data and the use of one’s intuitions

The corpus methodology
- It is debatable whether CL is a methodology or a branch of linguistics
  - One view: CL goes well beyond this methodological role and has become an independent discipline
  - Another view: in spite of the name, CL is indeed a methodology rather than an independent branch of linguistics in the same sense as phonetics, syntax, semantics or pragmatics
    - These latter areas of linguistics describe, or explain, a certain aspect of language use
    - Corpus linguistics, in contrast, is not restricted to a particular aspect of language: it can be employed to explore almost any area of linguistic research

A brief history of CL
- The term corpus linguistics first appeared only in the early 1980s, but corpus-based language study has a substantial history
- The history of CL can be split into two periods: before and after Chomsky

A brief history of CL
- Before Chomsky
  - Field linguists and linguists of the structuralist tradition used “shoebox corpora”: shoeboxes filled with paper slips
  - Their methodology was essentially “corpus-based” in the sense that it was empirical and based on observed data
  - The work of early corpus linguistics was underpinned by two fundamental, yet flawed, assumptions:
    - The sentences of a natural language are finite.
    - The sentences of a natural language can be collected and enumerated.
  - Most linguists saw the “corpus” as the only source of linguistic evidence in the formation of linguistic theories

A brief history of CL
- The Chomsky revolution: between 1957 and 1965, Chomsky changed the direction of linguistics from empiricism towards rationalism
  - “Any natural corpus will be skewed. Some sentences won’t occur because they are obvious, others because they are false, still others because they are impolite. The corpus, if natural, will be so wildly skewed that the description would be no more than a mere list.” (Chomsky 1962)
  - Our internal knowledge of language in the human brain (competence, I-language) replaced observed data (performance, E-language)
  - Intuitions came to be relied on as evidence
- Xiao, R. (2008) “Theory-driven corpus research: using corpora to inform aspect theory”. In A. Lüdeling & M. Kytö (eds.) Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter.

A brief history of CL
- Revival of CL
  - Corpus research continued in a few centres (Brown, Lancaster) in the 1960s and 1970s
    - The Brown University Standard Corpus of Present-Day American English (Brown corpus)
    - The Lancaster-Oslo-Bergen Corpus of British English (LOB)
  - Hardware still imposed some restrictions until the real development started in the 1980s
  - The marriage of corpora with computer technology rekindled interest in the corpus methodology
  - Since then, the number and size of corpora and corpus-based studies have increased dramatically
  - Nowadays, the corpus methodology enjoys widespread popularity and has opened up or foregrounded many new areas of research

Areas that have used corpora
- Lexicography
- Lexical studies
- Grammatical studies
- Register/genre analysis
- Language variation
- Contrastive analysis
- Translation studies
- Language change
- Language teaching
- Semantics
- Pragmatics
- Stylistics
- Literary study
- Sociolinguistics
- Discourse analysis
- Forensic linguistics
- Computational linguistics

Nature of the corpus-based approach
- It is empirical, analysing the actual patterns of use in natural texts
- It utilises a large and principled collection of natural texts as the basis for analysis
- It makes extensive use of computers for analysis, using both automatic and interactive techniques
- It integrates both quantitative and qualitative analytical techniques
(Biber et al. 1998: 4-5)

Why use computers?
- The development of computer technology has revived CL
- Machine-readability is a de facto attribute of modern corpora
- Electronic corpora have advantages unavailable to their “shoebox” ancestors
- It is the use of computerized corpora, together with computer programs that facilitate linguistic analysis, that distinguishes modern electronic corpora from early drawer-cum-slip corpora

Why use computers?
- Computerized corpora can be processed and manipulated rapidly at minimal cost
  - E.g. searching, selecting, sorting and formatting
- Computers can process machine-readable data accurately and consistently
- Computers can avoid human bias in an analysis, thus making the result more reliable
- Machine-readability allows further automatic processing to be performed on the corpus, so that corpus texts can be enriched with various metadata and linguistic analyses
  - Corpus markup and corpus annotation

A question for Deep Thought
“Alright,” said the computer Deep Thought. “The Answer to the Great Question...”
“Yes...!”
“Of Life, the Universe and Everything...” said Deep Thought.
“Yes...!”
“Is...”
“Yes...!...?”
“Forty-two,” said Deep Thought, with infinite majesty and calm.
It was a long time before anyone spoke.
“Forty-two!” yelled someone in the audience. “Is that all you’ve got to show for seven and a half million years’ work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”
(The Hitchhiker’s Guide to the Galaxy by Douglas Adams)
What can we learn from this story?

What corpora cannot do
- Corpora do not provide negative evidence
  - They cannot tell us what is possible or not possible
  - But they can show what is central and typical in language
- Corpora can yield findings but rarely provide explanations for what is observed
  - Interfacing with other methodologies
- The use of corpora as a methodology also defines the boundaries of any given study
  - Importance of amenable research questions
- The findings based on a particular corpus only tell us what is true in that corpus
  - Generalisation vs. representativeness
- See Unit B2 for the pros and cons of corpora

Ask corpora the right questions
- Corpus linguistics as a methodology is only one of the (many) ways of “doing linguistics”
- The usefulness of corpora depends upon the research question being investigated
  - “They are invaluable for doing what they do, and what they do not do must be done in another way.” (Hunston 2002: 20)
- The development of the corpus-based approach as a tool in language studies has been compared to the invention of telescopes in astronomy
  - If it is ridiculous to criticize a telescope for not being a microscope, it is equally pointless to criticize the corpus-based approach for not doing what it is not intended to do
- It is up to you to formulate research questions amenable to corpus-based investigation and to decide how to combine corpora with other resources

Testing your intuitions with BYU-BNC
/bnc/

Most common noun in English
- Search for n*
- Top 10: time, people, way, years, year, work, government, day, man, world

Most common noun in adverts
- Search for nn* in w-advert
- Top 20: hotel,
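The BNC lab queries ("n*" for all nouns, and "nn*" restricted to the w-advert genre) rank word forms by frequency within a part-of-speech tag pattern. The minimal sketch below shows how that kind of query works on a local tagged corpus. It illustrates the points that corpus data is quantitative and that computers make searching, selecting and sorting cheap. It makes several assumptions. The corpus is already tokenized and tagged with lowercase CLAWS C5-style tags (nn1 singular noun, nn2 plural noun, np0 proper noun). Each token also carries a genre label. The `Token` type, the `top_by_tag` function and the `SAMPLE` tokens are all hypothetical. They are not the BYU-BNC interface, and `SAMPLE` is invented toy data, not BNC data, so its counts say nothing about real English.

```python
from __future__ import annotations

from collections import Counter
from fnmatch import fnmatchcase
from typing import Iterable, NamedTuple, Optional


class Token(NamedTuple):
    """One running word of a POS-tagged corpus (hypothetical format)."""
    word: str
    tag: str    # assumed CLAWS C5-style tag in lower case, e.g. "nn1"
    genre: str  # assumed genre label of the source text, e.g. "w-advert"


def top_by_tag(tokens: Iterable[Token], tag_pattern: str,
               genre: Optional[str] = None, n: int = 10) -> list[tuple[str, int]]:
    """Rank word forms whose tag matches a shell-style wildcard such as "n*".

    Word forms are folded to lower case in this sketch. Words with equal
    frequency keep the order in which they were first seen.
    """
    counts: Counter[str] = Counter()
    for tok in tokens:
        if genre is not None and tok.genre != genre:
            continue  # select only texts of one genre, like "in w-advert"
        if fnmatchcase(tok.tag.lower(), tag_pattern.lower()):
            counts[tok.word.lower()] += 1
    return counts.most_common(n)  # sort by descending frequency


# Invented toy tokens for illustration only (NOT real BNC data).
SAMPLE = [
    Token("The", "at0", "w-news"),
    Token("government", "nn1", "w-news"),
    Token("wants", "vvz", "w-news"),
    Token("more", "dt0", "w-news"),
    Token("time", "nn1", "w-news"),
    Token("London", "np0", "w-news"),
    Token("hotel", "nn1", "w-advert"),
    Token("rooms", "nn2", "w-advert"),
    Token("Hotel", "nn1", "w-advert"),
    Token("London", "np0", "w-advert"),
    Token("time", "nn1", "w-advert"),
]

if __name__ == "__main__":
    # "n*" also matches the proper-noun tag np0; "nn*" keeps common nouns only.
    print(top_by_tag(SAMPLE, "n*"))
    # → [('time', 2), ('london', 2), ('hotel', 2), ('government', 1), ('rooms', 1)]
    print(top_by_tag(SAMPLE, "nn*", genre="w-advert"))
    # → [('hotel', 2), ('rooms', 1), ('time', 1)]
```

The sketch uses `fnmatchcase` so that `*` matches any run of characters, which mirrors the wildcard used in the lab searches. An actual corpus interface may use its own query syntax and its own treatment of case and lemmas.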