COMP225 Assignment 2 — Semester 1, 2008
Tree-structured databases: Part 2
Worth 10%
Due 11:59pm on Monday 5 May 2008.
Objectives
This assignment is designed to give you more
practice in programming with multiple pointer-based trees. You will also
learn how to rationalise data from several sources to maintain
consistency.
This assignment has a standard part (which is worth the 10% mentioned above)
and a bonus part (which is worth even more).
For the standard (none bonus) part you must submit one file for Task 1 using the
Blackboard submission system, and one hard copy in the named
assignment box for Task 2. The
file should be called Merge.cpp, and should complete the code
you need to solve the problem described below at Task 1. The hard copy
should contain the written
answers to the questions described below at Task 2.
The formats you should use for your files are specified below.
If you do not use the filenames stated above for the program files, or you do not use
the file format specified below, then you will
receive 0 (zero) marks.
Submit your work by the due date and time stated above.
Overview
When a database is distributed over a number of independent sites, from time to time the information gathered at the different sites must be merged into a single consistent file. In a smart telephone-enquiry service the day’s information gathered at different locations must be gathered into a central consistent repository so that all users can benefit from the improved service.
This Assignment
In this assignment you will learn how to merge two saved games together (from Assignment 1) to produce a single consistent game tree. The process for doing that is split up into two stages: the first merges files together; and the second removes inconsistencies generated by the merging process. In Task 1 you are asked to write a program which merges two trees together. In the Task 2 you are asked to design an algorithm to remove any inconsistencies remaining after the merging process. For the bonus you are asked to code up your algorithm designed in Task 2.
You have been provided with
a number of program- and specification files, and three executables. The
executable called Guess is the game program from Assignment 1. The executatable called Merge gives you an idea of what your program from Task 1 should do if implemented correctly. The executable called Clean is for Task 2 and the Bonus. All executables will run on titanic. The other files are as follows:
- QuestionNode.h This is the file which deals with the
building blocks for a tree: it defines a tree node, and implements a
constructor. (This file contains fully implemented code, corresponding to Task 2 of Assignment 1.)
- Animal.h This contains some utility routines for
implementing the game, and is the same as the file of that name distributed for Assignment 1.
- Merge.cpp This is the file which does the merging. It takes as input two saved games in serialised format, and outputs a single serialised file which is the merge of the two input files. Your Task 1 below asks you to complete the implementation of the function Merge which is the main routine for merging the trees. To understand how
Merge should work, read the specification of Task 1 below, play the separate games corresponding to the two input files (animalData1, animalData2) and the merged file (animalMerge) by executing GuessGame separately on the three files and study the notes provided in lectures. Then complete the starred tutorial and practical exercises.
- animalData1, animalData2 and animalMerge are three files containing respectively two saved games and the result of a merge.
To understand how they relate to each other, play the game separately on the three files.
- animalData3, animalData4, animalMerge2 and animalCleaned are four files containing respectively two saved games, the result of merging them, and the result of cleaning them to remove inconsistencies.
To understand how they relate to each other, try playing the four games. Your Task 2 below asks you questions about that relationship, and to how to design an algorithm which achieves it, i.e. cleans the merge of two files to remove inconsistencies.
To play the game using any of the executables, download it
and any required datafiles and make sure that they are in the same
directory.
- To run the game on animalData1 type ./GuessGame animalData1
- To merge two files animalData1 and animalData2 type ./Merge animalData1 animalData2 . This will produce a third file called animalMerge which can then be played by running GuessGame animalMerge.
- To clean a file animalMerge type ./Clean animalMerge. This will produce a third file called animalCleaned which can then be played by running GuessGame animalCleaned.
Task 1 (10 marks)
Your first task is to complete the function Merge in
Merge.cpp so that your program merges two game trees together and
produces a serialised file of the merged trees, in the same way that the executable Merge does.
To help your understanding of what needs to be done, complete the starred Tutorial exercises in Week 6. Then interpret the following pseudocode to complete this task. (UPPER-CASE portions are for you to fill in.)
QuestionNode *merge(QuestionNode *qN1, QuestionNode *qN2) {
if (EITHER qN1 OR qN2 IS NULL) RETURN THE APPROPRIATE TREE WITHOUT MERGING;
else if (THE TWO TEXTS ARE THE SAME) // DO NOT DUPLICATE THE SAME QUESTION
MAKE A NODE WITH THE CURRENT TEXT, THEN
ADD THE MERGE OF THE YES SUBTREES AS THE NEW YES SUBTREE
ADD THE MERGE OF THE NO SUBTREES AS THE NEW NO SUBTREE
THEN RETURN THE RESULT;
else if (BOTH ARE GUESS NODES)
MAKE A NODE USING qN1'S TEXT, THEN
ADD NULL AS THE YES CHILD, AND qN2 AS THE NO SUBTREE
THEN RETURN THE RESULT
else if (qN1 IS A GUESS)
MAKE A NODE WITH qN2'S TEXT, THEN
ADD THE MERGE OF qN1 AND qN2'S YES SUBTREES AS THE NEW YES SUBTREE
ADD THE MERGE OF qN1 AND qN2'S NO SUBTREES AS THE NEW NO SUBTREE
THEN RETURN THE RESULT;
else if (qN2 IS A GUESS)
SIMILAR TO THE ABOVE;
// IF NONE OF THE ABOVE, THEN THE TWO NODE ARE BOTH QUESTIONS
else MAKE A NODE USING qN1'S TEXT, THEN
ADD THE MERGE OF qN2 AND qN1'S YES SUBTREE AS THE NEW YES SUBTREE
ADD THE MERGE OF qN2 AND qN1'S NO SUBTREE AS THE NEW NO SUBTREE
THEN RETURN THE RESULT// NOTE ORDER SWAP.
}
You will have correctly implemented this function if it does exactly the
same as Merge.
Remember that in order to compile your code, you should have all the files
QuestionNode.h, Animal.h and Merge.cpp in the
same folder and use g++ Merge.cpp.
Task 2 (10 marks)
The second task is to answer the following questions about designing a cleaning
algorithm for removing inconsistencies in a merged tree.
- (1 mark) Given the two files animalData3 and animamalData4 given below sketch their corresponding binary
trees. Now sketch the tree obtained by merging the two together and explain why there is an
inconsistency.
- (2 marks) Now suppose that we want to preserve all the information of the two original trees,
but to remove any inconsistencies.
Which branches of the merged tree should be removed? Sketch the resulting tree.
- (1 mark) The basic idea to removing inconsistencies is to traverse the tree, pruning it whenever
a question is repeated. Describe what additional data structures are required to do this in practice.
- (4 marks) Now describe a strategy for pruning the tree, explaining carefully how you would use the
data structure mentioned above. You must present your answer clearly using pseudocode. You
may if you wish define a number of functions to carry out the various tasks in the strategy,
but you must specify those functions carefully.
- (2 marks) Give an estimate of the complexity of your design, with reasons.
The Bonus
Complete the function clean in the file Clean.cpp, so that it does the same as
the executable Clean.
The files
Animal class (fully implemented): Animal.h
QuestionNode class from Assignment 1
(now fully implemented):
QuestionNode.h
The merge program (partially implemented): Merge.cpp
The clean program (partially implemented for Bonus Supplicants only): Clean.cpp
Two saved games and the result of merging them resepctively:
animalData1,
animalData2,
animalMerge.
Two saved games, the result of merging them and the result of
cleaning up the merged file resepctively:
animalData3,
animalData4,
animalMerge1
animalCleaned.
The executable: GuessGame
The executable: Merge
The executable: Clean
Submission instructions
Submit your files using the on-line Blackboard system. Make sure that the program files match the given names exactly. You may submit a .doc file for Task 2 if you wish.
Assessment
This assignment will be marked electronically. To
see how the marks will be awarded get the marking scheme here.
The electronic part
You may have your program “trialed” by the electronic
marker before the due date. Trials will begin from 28 April onwards. Any
files submitted after that time and before the deadline will be
presented to the electronic marker every alternate day until the
submission date.
The results of the trial runs will be emailed to you at your student
account. These trial results will be completely disregarded for
marking, so do not worry that mistakes in the trials will count against
you: they will not. The feedback is to help you correct errors that may
be lurking in your programs, and to make sure that your code is
efficient enough.
You will receive the actual results for the electronically marked
part of Assignment 2 after the final deadline for submission.
Hints for obtaining full marks in this assignment
There are a number of common mistakes that students make which
prevent them receiving full marks, even if they think they have
thoroughly tested their programs!
- There are sometimes small differences between the compilers
that students use at home and the compilers used in the department
computers. If you have developed your program at home, be sure to
check that it works on a department computer before you submit it.
If it fails because your home setup is different from the
department’s, you might receive zero marks.
This policy is for two reasons. Firstly (and obviously) we cannot run
assignments on your home computers, and so we cannot check whether they
work there. (We can’t even check if you bring in your laptop,
because we can’t be sure what’s installed on it.)
The second, and more important reason, is that when you have
graduated and are writing programs for clients, they will of course
insist that your product run on their machines. If you say to them
“Well, it ran on my own installation back at the office.”
then you won’t be in business for long.
- In this assignment you may only use the C++ library files already
declared in the program files. If you include any libraries of your own,
any at all, in your submitted files then you
will receive zero marks.
- Your assignments will be marked partly electronically on a
unix-based workstation (rather than on a PC). The unix operating system
distinguishes upper- and lower-cases in text. (For example
MyProgram.cpp and myprogram.cpp are different on a
unix machine, but might the same on a PC.) This means that you must
follow the case conventions for naming files and classes exactly; if you
do not then you run the risk that your program will not compile when it
is tested electronically, and then you will receive zero
marks.
Remember that there will be several opportunities to check that your
programs survive the electronic marker before the final submission date.
In spite of all these warnings, sometimes
programs are submitted that, astonishingly, don’t even
compile with our departmental electronic marker. This means that
their authors not only ignored all the warnings above, but they did not
even once use the trial-marking system. Try not to be one of
those people.