kc第13講-數(shù)據(jù)散列技術(shù).ppt_第1頁
kc第13講-數(shù)據(jù)散列技術(shù).ppt_第2頁
kc第13講-數(shù)據(jù)散列技術(shù).ppt_第3頁
kc第13講-數(shù)據(jù)散列技術(shù).ppt_第4頁
kc第13講-數(shù)據(jù)散列技術(shù).ppt_第5頁
已閱讀5頁,還剩24頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)

文檔簡介

1、第13講: (第12章) 數(shù)據(jù)散列技術(shù) 重慶大學計算機學院,課程名稱: 數(shù)據(jù)庫系統(tǒng) -,第13講:數(shù)據(jù)散列技術(shù),項目驅(qū)動目標: 如何實現(xiàn)基于散列的文件組織和索引: 一、靜態(tài)散列 二、動態(tài)散列 三、SQL中的索引定義 主要討論問題: 什么是靜態(tài)散列 好的散列函數(shù)該是什么樣 什么是動態(tài)散列 如何在動態(tài)散列中插入記錄 如何刪除動態(tài)散列中的記錄 與靜態(tài)散列相比,動態(tài)散列有特點 順序索引與散列的區(qū)別 SQL中如何定義索引,Exercise 13,Static Hashing(靜態(tài)散列),A bucket(桶) is a unit of storage containing one or more rec

2、ords (a bucket is typically a disk block). In a hash file organization(散列文件組織) we obtain the bucket of a record directly from its search-key value using a hash function(散列函數(shù)). Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B. Hash function is

3、 used to locate records for access, insertion as well as deletion. Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record.,一 靜態(tài)散列,1-1 什么是靜態(tài)散列?,問題1答案,1.1 基本概念,Example of Hash File Organization,There are 10 bucke

4、ts, The binary representation of the ith character is assumed to be the integer i. The hash function returns the sum of the binary representations of the characters modulo 10. E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(B r igh t o n)= 3 h(Mia nu s) = 7,Hash file organization of account file, using d

5、ept_name as key,1.1 基本概念,Hash Functions(散列函數(shù)),Worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file. An ideal hash function is uniform(均勻的), i.e., each bucket is assigned the same number of search-key val

6、ues from the set of all possible values. (桶的尺寸一樣大!) An ideal hash function is random(隨機的), so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file. (桶中放的記錄一樣多!) Typical hash functions perform computation on the inter

7、nal binary representation of the search-key. For example, for a string search-key (如account的部門名), the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned. .,1-2 好的散列函數(shù)該是什么樣?,1.1 基本概念,問題2答案,Problems of Bucket Overflows(溢

8、出問題),Bucket overflow can occur because of Insufficient buckets(桶不足) 若記錄總數(shù)是nr,fr一個桶能存放的記錄數(shù)(分布的空間大小),則桶數(shù)目nB應(yīng)滿足: nB nr / fr 但遺憾的是: 在選擇散列函數(shù)時, nr不是已知的 Skew(偏斜) in distribution of records. This can occur due to two reasons: multiple records have same search-key value chosen hash function produces non-unif

9、orm distribution of key values Although the probability of bucket overflow can be reduced(減少), it cannot be eliminated(消除); it is handled by using overflow buckets(溢出桶).,1.2 桶溢出問題,Handling of Bucket Overflows (溢出處理),解決方法:Overflow chaining(溢出鏈) The overflow buckets of a given bucket are chained toget

10、her in a linked list. Above scheme is called closed hashing(閉散列). 注: An alternative, called open hashing(開散列) which does not use overflow buckets, is not suitable for database applications.,1.2 桶溢出問題,Hash Indices(散列索引),Hashing can be used not only for file organization, but also for index-structure

11、creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure. Strictly speaking, hash indices are always secondary indices if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessar

12、y. However, we use the term hash index to refer to both secondary index structures and hash organized files. 例子:散列索引,1.3 散列索引,Example of Hash Index,1.3 散列索引,Deficiencies of Static Hashing,In static hashing, function h maps search-key values to a fixed set of B of bucket addresses. Databases(指記錄數(shù)) gr

13、ow升 or shrink降 with time. If initial number of buckets is too small, and file grows, performance will degrade due to too much overflows. If space is allocated for anticipated growth, a significant amount of space will be wasted initially (and buckets will be underfull). If database shrinks, again sp

14、ace will be wasted. One solution: periodic re-organization of the file with a new hash function Expensive, disrupts干擾 normal operations Better solution: allow the number of buckets to be modified dynamically. (詳見下頁),1.4 靜態(tài)散列的弱點,Dynamic Hashing(動態(tài)散列),Good for database that grows and shrinks in size A

15、llows the hash function to be modified dynamically Extendable hashing(可擴充散列) one form of dynamic hashing: Hash function generates values over a large range typically b-bit integers, with b = 32. (典型值) (但并不為每個散列值創(chuàng)建1個桶) At any time use only a prefix of the hash function to index into a table of bucket

16、 addresses. (這前i個位,用于計算桶地址偏移量) Let the length of the prefix be i bits, 0 i 32. Bucket address table size = 2i. Initially i = 0 Value of i grows and shrinks as the size of the database grows and shrinks. Multiple多個 entries in the bucket address table may point to a bucket (why?) Thus, actual number o

17、f buckets is 2i The number of buckets also changes dynamically due to coalescing 整合 and splitting of buckets. (因重組時一次僅作用1個桶, 索引更新的維護開銷低) 一般結(jié)構(gòu):可擴展散列,1-1 什么是動態(tài)散列?,二 動態(tài)散列,問題3答案,2.1 動態(tài)散列的基本知識,General Extendable Hash Structure -一般結(jié)構(gòu),In this structure, i2 = i3 = i (=2), whereas i1 = i 1 (=1) 而 i=2 (see ne

18、xt slide for details),一般地iki,2.1 動態(tài)散列的基本知識,Use of Extendable Hash Structure-如何該使用索引,如何知道指向同一桶: Each bucket j stores a valueij 正整數(shù) All the entries that point to the same bucket have the same values on the first ij bits. 如何定位屬于哪一桶:To locate the bucket containing search-key Kj: 1.Compute h(Kj) = X 即計算該

19、搜索碼相應(yīng)的二進數(shù)表示 Use the first i high order bits of X as a displacement偏轉(zhuǎn)/位移 into bucket address table, and follow跟隨 the pointer to appropriate bucket 桶地址表中指向桶j的表項編號(計算公式)為: 2(i-ij) 如何插入記錄:To insert a record with search-key value Kj follow same procedure as look-up and locate the bucket, say j. (桶j) If t

20、here is room in the bucket j, insert record in the bucket. Else the bucket must be split and insertion re-attempted 重試 (next slide) Overflow buckets used instead in some cases (will see in next slide),2.1 動態(tài)散列的基本知識,1-2 如何在動態(tài)散列中插入記錄?,問題4答案,Insertion in Extendable Hash Structure (桶分裂),:If i ij (more t

21、han one pointer to bucket j) 可在該桶后增加桶 allocate a new bucket z, and set ij = iz = (ij + 1) Update the second half of the bucket address table entries originally pointing to j, to point to z remove each record in bucket j and reinsert (in j or z) recompute new bucket for Kj and insert record in the bu

22、cket (further splitting is required if the bucket is still full) If i = ij (only one pointer to bucket j) 不能在該桶后增加桶 If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket 只能擴展溢出桶 Else (做下三步工作) 還可成倍增加桶指針 increment i (to i+1) and double the size of the

23、 bucket address table. 表項指針翻倍! replace each entry in the table by two entries that point to the same bucket. (因指針空間加倍,用指向同一桶的兩個指針去替代表中每個指針) recompute new bucket address table entry for Kj (當前待插記錄) (Now i ij so use the first case above.) 桶加倍和桶內(nèi)記錄處理!,To split a bucket j when inserting record with sear

24、ch-key value Kj: (滿時),返回例子中,2.2 散列桶動態(tài)分裂,Deletion in Extendable Hash Structure,locate it in its bucket and remove it. The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table). Coalescing整合 of buckets can be done (can coalesce only with a “buddy” buck

25、et having same value of ij and same ij 1 prefix, if it is present) Decreasing bucket address table size is also possible Note: decreasing bucket address table size is an expensive operation and should be done only if number of buckets becomes much smaller than the size of the table,2.2 散列桶動態(tài)分裂,1-3 如

26、何刪除動態(tài)散列中的記錄?,問題5答案,To delete a key value,Use Example of Extendable Hash Structure,1)Initial Hash structure:假設(shè)bucket size = 2 (記錄,桶大小),b=2,Branch_name的32位散列值: (即將它們按照前面指出的單詞到整數(shù)的映射,并將該數(shù)字采用二進制表示),2.3 散列桶動態(tài)分裂過程實例,Example (續(xù)1),2)Hash structure after insertion of one Brighton and two Downtown records -在依次

27、插入Brighton記錄和第一個Downtown記錄時,i=0,僅一個初始桶,不需桶定位,都可直接裝入; -在插入第二個Downtown記錄時,桶已裝滿,需要分裂 (參分裂方法) i=i0且i2 ,屬于“需要成倍增加桶指針”情形,分裂如下: (i=1,桶地址表加倍,桶數(shù)量加倍!由前i=1位為0、1兩表項指針分別確定) (并且,原桶中記錄重定位時,根據(jù)前i=1位是0或1分別裝入到第1、2桶中),2.3 散列桶動態(tài)分裂過程實例,Example (續(xù)2),3)Hash structure after insertion of Mianus record 過程說明: Downtown的散列碼首位也為1

28、 ,Mianus的散列碼首位也為1; 但第二個桶已裝滿,故桶指針數(shù)量又增加一倍,到4個; 這時本來仍只有兩個桶,但Mianus定位到第二個桶時發(fā)現(xiàn)已滿; 故在該桶后又增加一個桶,并將Mianus記錄插入。,2.3 散列桶動態(tài)分裂過程實例,Example (續(xù)3),4)Hash structure after insertion of three Perryridge records 說明:在插入第3個Perryridge記錄時,因與已有兩個Perryridge的散列值相同,無法增加桶來存放(因無法區(qū)分這三個記錄)。這是只能采用增加溢出桶來解決!,Example (續(xù)4),5) Hash str

29、ucture after insertion of Redwood and Round Hill records 說明:因桶中有空間,在確定桶位置后(散列值的前2位分別是00和11),直接插入既可,2.3 散列桶動態(tài)分裂過程實例,Extendable Hashing vs. Other Schemes(與靜態(tài)散列比),Benefits of extendable hashing: Hash performance does not degrade with growth of file Minimal space overhead負載 (桶地址表中僅為每個當前前綴長度的散列值存放一個指針) D

30、isadvantages of extendable hashing Extra level of indirection間接層 to find desired record 增加訪問桶地址表開銷 Bucket address table may itself become very big (larger than memory) Cannot allocate very large contiguous areas on disk either Solution: B+-tree structure to locate desired record in bucket address ta

31、ble Changing size of bucket address table is an expensive operation 自學:* Linear hashing(線性散列-另一種動態(tài)散列) is an alternative mechanism Allows incremental growth of its directory 避免使用額外的間接層 (equivalent to bucket address table) At the cost of more bucket overflows,2.4 動態(tài)散列與靜態(tài)散列比較,2-4 與靜態(tài)散列相比,動態(tài)散列有特點?,問題6答案

32、,Comparison of Ordered Indexing and Hashing,是否使用索引和使用什么類型的索引應(yīng)考慮的因素: Cost of periodic re-organization Relative frequency of insertions and deletions Is it desirable to optimize average access time at the expense of worst-case access time? 什么時候用什么索引會更好 Expected type of queries: Hashing is generally be

33、tter at retrieving records having a specified value of the key. If range queries are common, ordered indices are to be preferred 哪些DBMS支持哪些類型索引: In practice: PostgreSQL supports hash indices, but discourages use due to poor performance Oracle supports static hash organization, but not hash indices S

34、QLServer supports only B+-trees,2.5 有序索引與散列的比較,2-5 何時使用順序索引和散列?,問題7答案,Index Definition in SQL,Create an index create index on () E.g. create index b-index on branch(branch_name) Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key. E.g. creat

35、e unique index branch-index on branch(branch_name) Not really required if SQL unique integrity constraint is supported To drop an index drop index Most database systems allow specification of type of index(比如:B+樹或散列) Some database systems allow specification of clustering index.,三 SQL中的索引定義,3-1 SQL中

36、如何定義索引?,問題8答案,3-1 SQL索引如何被使用?,系統(tǒng)自動分析選擇使用,用于無法顯示指定! 應(yīng)在SQL查詢語句中的WHERE條件中使用到相應(yīng)字段!,項目驅(qū)動目標: 數(shù)據(jù)庫系統(tǒng)如何實現(xiàn)查詢快速處理: 一 、查詢處理概述 二 、基本查詢操作的處理 三 、連接運算的處理 主要討論問題: 你知道查詢處理的基本過程 查詢代價大致該如何估算 選擇操作代價如何 如何實現(xiàn)外排序歸并 外排序歸并的復(fù)雜度如何 嵌套循環(huán)連接代價如何 索引嵌套循環(huán)連接代價如何 歸并連接算法是什么樣,練習 13:,P.350 12.18 12.19,Thank you !,End !,Bitmap Indices(位圖索引)

37、,Bitmap indices are a special type of index designed for efficient querying on multiple keys Records in a relation are assumed to be numbered sequentially from, say, 0 Given a number n it must be easy to retrieve record n Particularly easy if records are of fixed size Applicable on attributes that t

38、ake on a relatively small number of distinct values E.g. gender, country, state, E.g. income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000- infinity) A bitmap is simply an array of bits,*四 位圖索引,4-1 什么是位圖索引?,問題9答案,Bitmap Indices (Cont.),In its s

39、implest form a bitmap index on an attribute has a bitmap for each value of the attribute Bitmap has as many bits as records In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise,二 位圖索引,Bitmap Indices (Cont.),Bitmap indices are useful f

40、or queries on multiple attributes not particularly useful for single attribute queries Queries are answered using bitmap operations Intersection (and) Union (or) Complementation (not) Each operation takes two bitmaps of the same size and applies the operation on corresponding bits to get the result bitmap E.g. 100110 AND 110011 = 100010 100110 OR 110011 = 110111 NOT 100110 = 011001 Males with income level L1: 10010 AND 10100 = 10000 Can then retrieve required tuples. Counting number of matching tuples is even faster,二 位圖索引,Bitmap Indices (Cont.),Bitmap ind

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論