COMP337: Sample Final Exam
Qiang Yang, Fall 2009


Consider the following data table where Play is a class attribute. In the table, the Humidity attribute has values l (for low) or h (for high), Sunny has values y or n, Windy has y or n, and Play has y or n.

Humidity   Sunny   Windy   Play
   h         n       n      n
   l         n       y      y
   l         y       n      y
   h         y       y      n

Table 1. Training data for deciding to play or not.


1. (Decision Trees)

a. Build a one-level decision tree using information gain. Show all calculations. What is the error rate on training data?

b. Now build a two-level decision tree using information gain. Show all your calculations. What is the error rate on training data?

c. Which tree is better? Why?
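As a sanity check on part (a), the information-gain computation for Table 1 can be sketched in Python. This is an illustrative sketch, not a model answer; the attribute encodings follow the table directly:

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

# Table 1 rows: (Humidity, Sunny, Windy, Play)
rows = [("h", "n", "n", "n"),
        ("l", "n", "y", "y"),
        ("l", "y", "n", "y"),
        ("h", "y", "y", "n")]
play = [r[3] for r in rows]

def info_gain(attr_index):
    """Gain(A) = H(Play) - sum_v (|S_v|/|S|) * H(Play | A = v)."""
    n = len(rows)
    gain = entropy(play)
    for v in set(r[attr_index] for r in rows):
        subset = [r[3] for r in rows if r[attr_index] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

for name, i in [("Humidity", 0), ("Sunny", 1), ("Windy", 2)]:
    print(name, info_gain(i))
```

On this data, Humidity splits the classes perfectly (gain 1 bit), while Sunny and Windy give zero gain.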





2. (Naïve Bayes)

a. For the following test data, determine the value of the Play attribute chosen by the Naïve Bayes learning method. Note that the value of Sunny is missing.





Humidity   Sunny   Windy
   n         ?       n




b. The Naïve Bayes learning method can be considered as a Bayesian Network. Draw a Bayesian Network for the Naïve Bayes method for the data in Table 1. Briefly explain your answer and the network. Point out the conditional independence properties among the nodes.

c. (Missing values) Prove that even when Sunny has a missing value, the Naïve Bayes maximum likelihood function can still be used to decide the final class value, simply by ignoring the attribute with the missing value.
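The effect described in part (c) can be illustrated with a small maximum-likelihood Naïve Bayes sketch over Table 1. The test values below (Humidity = l, Windy = n) are hypothetical, chosen only to show that the factor for the unobserved Sunny attribute is simply omitted from the product, which does not change the argmax:

```python
# Naive Bayes with maximum-likelihood estimates from Table 1.
# A missing attribute (here Sunny) is dropped from the product: its
# factor P(Sunny | Play) is unobserved, so we omit it for both classes.

rows = [("h", "n", "n", "n"),
        ("l", "n", "y", "y"),
        ("l", "y", "n", "y"),
        ("h", "y", "y", "n")]
ATTRS = {"Humidity": 0, "Sunny": 1, "Windy": 2}

def likelihood(play_value, observed):
    """P(Play=c) * product over observed attrs of P(attr=v | Play=c)."""
    in_class = [r for r in rows if r[3] == play_value]
    score = len(in_class) / len(rows)            # prior P(Play=c)
    for attr, v in observed.items():
        i = ATTRS[attr]
        score *= sum(1 for r in in_class if r[i] == v) / len(in_class)
    return score

# Hypothetical test instance; Sunny is missing, so it never appears here.
observed = {"Humidity": "l", "Windy": "n"}
scores = {c: likelihood(c, observed) for c in ("y", "n")}
print(scores)
```

With these counts the class "y" wins (0.25 vs 0.0), and the comparison is unaffected by the absent Sunny factor.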



3. (Neural Nets) Draw a Perceptron Neural Network with multiple layers for the XOR problem. Explain why a single layer perceptron cannot learn this function.
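One concrete two-layer solution can be sketched with hand-picked threshold units (the weights below are one valid choice, an assumption, not the only one): one hidden unit computes OR, another computes AND, and the output fires when OR holds but AND does not. A single-layer perceptron cannot do this because XOR is not linearly separable — no single line in the input plane puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other.

```python
# Two-layer threshold network for XOR with hand-picked weights.

def step(z):
    """Hard threshold activation."""
    return 1 if z > 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 0.5)      # hidden unit 1: OR(a, b)
    h2 = step(a + b - 1.5)      # hidden unit 2: AND(a, b)
    return step(h1 - h2 - 0.5)  # output: OR and not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```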



4. (Association Rules) Remove the Play attribute. Now consider the following questions:



a. What are the max frequent patterns for support value = 2?

b. Draw a lattice for finding all frequent patterns with support value = 2. Indicate the max patterns in this lattice.
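The pattern counts for part (a) can be checked with a brute-force enumeration sketch. Encoding each cell as an attribute=value item is one reasonable reading of the question (an assumption, since the exam does not fix the encoding):

```python
from itertools import combinations

# Transactions: attribute=value items from Table 1 with Play removed.
transactions = [
    {"Humidity=h", "Sunny=n", "Windy=n"},
    {"Humidity=l", "Sunny=n", "Windy=y"},
    {"Humidity=l", "Sunny=y", "Windy=n"},
    {"Humidity=h", "Sunny=y", "Windy=y"},
]

def support(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*transactions))
frequent = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(set(c)) >= 2]

# Maximal = frequent itemsets with no frequent proper superset.
maximal = [f for f in frequent if not any(f < g for g in frequent)]
print(sorted(sorted(m) for m in maximal))
```

Under this encoding every 2-itemset occurs in only one transaction, so the six single-item patterns are the maximal frequent patterns at support 2.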



5. (Collaborative Filtering) Compute User A's preference for New Item using Pearson Correlation.



Users   Item1   Item2   Item3   New Item
  A       1       5       1        ?
  B       2       3       2        4
  C       4       1       4        2
  D       4       4       2        5
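The prediction can be sketched as below, assuming the common mean-centered weighted-deviation scheme (a Resnick-style estimate; whether and how to mean-center is a modelling choice, so treat this as one reasonable reading, not the only valid answer):

```python
from math import sqrt

# Ratings on Item1..Item3 (used for similarity) and on New Item.
ratings = {"A": [1, 5, 1], "B": [2, 3, 2], "C": [4, 1, 4], "D": [4, 4, 2]}
new_item = {"B": 4, "C": 2, "D": 5}

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return num / den

mean_a = sum(ratings["A"]) / 3
num = den = 0.0
for u in ("B", "C", "D"):
    r = pearson(ratings["A"], ratings[u])
    mean_u = sum(ratings[u]) / 3
    num += r * (new_item[u] - mean_u)   # correlation-weighted deviation
    den += abs(r)
prediction = mean_a + num / den
print(round(prediction, 3))
```

Here A correlates +1 with B, -1 with C, and +0.5 with D, so C's below-average rating of New Item actually pushes the prediction up.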





6. Answer the following multiple-choice questions.


1. Suppose we would like to perform clustering on spatial data such as the geometrical locations of houses. We wish to produce overlapping clusters because some houses can belong to several categories. Which of the following methods is (are) NOT appropriate?
A. Probabilistic Clustering
B. Density-based clustering
C. Model-based clustering
D. K-means clustering
E. Decision Trees


2. Which of the following statements is correct?
A. If confidence(X → Y) ≥ c, then confidence(Y → X) ≥ c
B. If confidence(X → YZ) ≥ c, then confidence(X → Y) ≥ c
C. If confidence(XY → Z) ≥ c, then confidence(X → Z) ≥ c
D. If support