=========================preview======================
(COMP334)final99F_sol.pdf
Back to COMP334 Login to download
======================================================
The Hong Kong University of Science and Technology
COMP334: Distributed Database Systems
Final Examination
16 December 1999
Note: This is an open book examination. You can consult books and notes to answer the questions, but do not simply copy from them.
Please check whether you have all the eleven pages. You are required to answer all questions in the space provided.
Your Student Number: _______________________________ Your Name: _______________________________ Lab: ________________________________
Question Total marks You got
Q1 20
Q2 25
Q3 20
Q4 15
Q5 20
Total 100
Question 1 (20%)
Answer R=S get 1 mark
(2) (5%) Blooming join of two relations, R and S at site S1 and S2, works as follows. At site S1, a bit-vector of some size k is computed by hashing the join attribute values of R into range 0 to k-1 and set bit I (0 I < k) to 1 if some tuple hashes to I. After the bit-vector is computed, it is shipped to S2. At site S2, each tuple of S is hashed similarly. If the corresponding bit in the Rs bit-vector is 0, the tuple is discarded. The remaining tuples of this reduction process are shipped to site S1 to join with R. Write the Blooming join algorithm in the form of psudo-code.
Clear all bits in bit-vector B
foreach tuple r in R do
begin
I := hash (r.A);
B[I] := 1;
end;
Ship B to S2;
for each tuple s in S do
begin
I := hash(s.B);
If B[I] = 1 then insert s to S
end
Ship S to S1
Join R and S at site S1
(3) (3%) Compared to semi-join, what are the pros and cons of blooming join?
+ bit-vector size can be smaller than the size of projection of the join attributes
-may ship some tuples that will not produce join results
(4) (8%) Give the cost formula of blooming join of relation R with S. Describe clearly the meaning of the symbols used in your formula.
One scan of relation R; Hash each tuples in R; Transfer the bit-vector One scan of relation S; Hash each tuple of S Store/Ship relation S Local join cost of S and R (S may be stored)
Question 2 (25%)
Consider a database consisting the following two relations
Employees (eid: integer, did: integer, sal: real)
Departments (did: integer, mgrid: integer, budget: integer) The mgrid field of Departments is the eid of the manager. Each of these relations contains 20-byte tuples, and the sal and budget fields both contain uniformly distributed values in the range 0 to 1,000,000. The Employees relation contains 100,000 pages, the Departments relation contains 5000 pages, and each processor has 100 buffer pages of 4000 bytes each. The cost of one page I/O is td and the cost of shipping one page is ts. There are no indexes.
The database is stored in a distributed DBMS with 10 sites. The Departments tuples are horizontally partitioned across the 10 sites by did, with the same number of tuples assigned to each site, and with no particular order to how tuples are assigned to sites. The Emplo