COMP225 Assignment 2 — Semester 1, 2008


Tree-structured databases: Part 2

Worth 10%

Due 11:59pm on Monday 5 May 2008.


Objectives

This assignment is designed to give you more practice in programming with multiple pointer-based trees. You will also learn how to rationalise data from several sources to maintain consistency.

This assignment has a standard part (which is worth the 10% mentioned above) and a bonus part (which is worth even more). For the standard (none bonus) part you must submit one file for Task 1 using the Blackboard submission system, and one hard copy in the named assignment box for Task 2. The file should be called Merge.cpp, and should complete the code you need to solve the problem described below at Task 1. The hard copy should contain the written answers to the questions described below at Task 2.

The formats you should use for your files are specified below.

If you do not use the filenames stated above for the program files, or you do not use the file format specified below, then you will receive 0 (zero) marks.

Submit your work by the due date and time stated above.

Overview

When a database is distributed over a number of independent sites, from time to time the information gathered at the different sites must be merged into a single consistent file. In a smart telephone-enquiry service the day’s information gathered at different locations must be gathered into a central consistent repository so that all users can benefit from the improved service.

This Assignment

In this assignment you will learn how to merge two saved games together (from Assignment 1) to produce a single consistent game tree. The process for doing that is split up into two stages: the first merges files together; and the second removes inconsistencies generated by the merging process. In Task 1 you are asked to write a program which merges two trees together. In the Task 2 you are asked to design an algorithm to remove any inconsistencies remaining after the merging process. For the bonus you are asked to code up your algorithm designed in Task 2. You have been provided with a number of program- and specification files, and three executables. The executable called Guess is the game program from Assignment 1. The executatable called Merge gives you an idea of what your program from Task 1 should do if implemented correctly. The executable called Clean is for Task 2 and the Bonus. All executables will run on titanic. The other files are as follows:
  1. QuestionNode.h This is the file which deals with the building blocks for a tree: it defines a tree node, and implements a constructor. (This file contains fully implemented code, corresponding to Task 2 of Assignment 1.)
  2. Animal.h This contains some utility routines for implementing the game, and is the same as the file of that name distributed for Assignment 1.
  3. Merge.cpp This is the file which does the merging. It takes as input two saved games in serialised format, and outputs a single serialised file which is the merge of the two input files. Your Task 1 below asks you to complete the implementation of the function Merge which is the main routine for merging the trees. To understand how Merge should work, read the specification of Task 1 below, play the separate games corresponding to the two input files (animalData1, animalData2) and the merged file (animalMerge) by executing GuessGame separately on the three files and study the notes provided in lectures. Then complete the starred tutorial and practical exercises.
  4. animalData1, animalData2 and animalMerge are three files containing respectively two saved games and the result of a merge.

    To understand how they relate to each other, play the game separately on the three files.

  5. animalData3, animalData4, animalMerge2 and animalCleaned are four files containing respectively two saved games, the result of merging them, and the result of cleaning them to remove inconsistencies.

    To understand how they relate to each other, try playing the four games. Your Task 2 below asks you questions about that relationship, and to how to design an algorithm which achieves it, i.e. cleans the merge of two files to remove inconsistencies.

To play the game using any of the executables, download it and any required datafiles and make sure that they are in the same directory.

Task 1 (10 marks)

Your first task is to complete the function Merge in Merge.cpp so that your program merges two game trees together and produces a serialised file of the merged trees, in the same way that the executable Merge does.

To help your understanding of what needs to be done, complete the starred Tutorial exercises in Week 6. Then interpret the following pseudocode to complete this task. (UPPER-CASE portions are for you to fill in.)

 QuestionNode *merge(QuestionNode *qN1, QuestionNode *qN2) {
  if      (EITHER qN1 OR qN2 IS NULL) RETURN THE APPROPRIATE TREE WITHOUT MERGING;
  else if (THE TWO TEXTS ARE THE SAME) // DO NOT DUPLICATE THE SAME QUESTION
       MAKE A NODE WITH THE CURRENT TEXT, THEN 
       ADD THE MERGE OF THE YES SUBTREES AS THE NEW YES SUBTREE
       ADD THE MERGE OF THE NO SUBTREES AS THE NEW NO SUBTREE
       THEN RETURN THE RESULT;
  else if (BOTH ARE GUESS NODES)
       MAKE A NODE USING qN1'S TEXT, THEN
       ADD NULL AS THE YES CHILD, AND qN2 AS THE NO SUBTREE
       THEN RETURN THE RESULT
  else if (qN1 IS A GUESS) 
       MAKE A NODE WITH qN2'S TEXT, THEN 
       ADD THE MERGE OF qN1 AND qN2'S YES SUBTREES AS THE NEW YES SUBTREE
       ADD THE MERGE OF qN1 AND qN2'S NO SUBTREES AS THE NEW NO SUBTREE
       THEN RETURN THE RESULT;
  else if (qN2 IS A GUESS)
        SIMILAR TO THE ABOVE;
        // IF NONE OF THE ABOVE, THEN THE TWO NODE ARE BOTH QUESTIONS
  else MAKE A NODE USING qN1'S TEXT, THEN
       ADD THE MERGE OF qN2 AND qN1'S YES SUBTREE AS THE NEW YES SUBTREE
       ADD THE MERGE OF qN2 AND qN1'S NO SUBTREE AS THE NEW NO SUBTREE
       THEN RETURN THE RESULT// NOTE ORDER SWAP.
}
 
You will have correctly implemented this function if it does exactly the same as Merge.

Remember that in order to compile your code, you should have all the files QuestionNode.h, Animal.h and Merge.cpp in the same folder and use g++ Merge.cpp.

Task 2 (10 marks)

The second task is to answer the following questions about designing a cleaning algorithm for removing inconsistencies in a merged tree.
  1. (1 mark) Given the two files animalData3 and animamalData4 given below sketch their corresponding binary trees. Now sketch the tree obtained by merging the two together and explain why there is an inconsistency.
  2. (2 marks) Now suppose that we want to preserve all the information of the two original trees, but to remove any inconsistencies. Which branches of the merged tree should be removed? Sketch the resulting tree.
  3. (1 mark) The basic idea to removing inconsistencies is to traverse the tree, pruning it whenever a question is repeated. Describe what additional data structures are required to do this in practice.
  4. (4 marks) Now describe a strategy for pruning the tree, explaining carefully how you would use the data structure mentioned above. You must present your answer clearly using pseudocode. You may if you wish define a number of functions to carry out the various tasks in the strategy, but you must specify those functions carefully.
  5. (2 marks) Give an estimate of the complexity of your design, with reasons.

The Bonus

Complete the function clean in the file Clean.cpp, so that it does the same as the executable Clean.

The files

Animal class (fully implemented): Animal.h

QuestionNode class from Assignment 1 (now fully implemented): QuestionNode.h

The merge program (partially implemented): Merge.cpp

The clean program (partially implemented for Bonus Supplicants only): Clean.cpp

Two saved games and the result of merging them resepctively: animalData1, animalData2, animalMerge.

Two saved games, the result of merging them and the result of cleaning up the merged file resepctively: animalData3, animalData4, animalMerge1 animalCleaned.

The executable: GuessGame

The executable: Merge

The executable: Clean

Submission instructions

Submit your files using the on-line Blackboard system. Make sure that the program files match the given names exactly. You may submit a .doc file for Task 2 if you wish.

Assessment

This assignment will be marked electronically. To see how the marks will be awarded get the marking scheme here.

The electronic part

You may have your program “trialed” by the electronic marker before the due date. Trials will begin from 28 April onwards. Any files submitted after that time and before the deadline will be presented to the electronic marker every alternate day until the submission date.

The results of the trial runs will be emailed to you at your student account. These trial results will be completely disregarded for marking, so do not worry that mistakes in the trials will count against you: they will not. The feedback is to help you correct errors that may be lurking in your programs, and to make sure that your code is efficient enough.

You will receive the actual results for the electronically marked part of Assignment 2 after the final deadline for submission.

Hints for obtaining full marks in this assignment

There are a number of common mistakes that students make which prevent them receiving full marks, even if they think they have thoroughly tested their programs!

Remember that there will be several opportunities to check that your programs survive the electronic marker before the final submission date.

In spite of all these warnings, sometimes programs are submitted that, astonishingly, don’t even compile with our departmental electronic marker. This means that their authors not only ignored all the warnings above, but they did not even once use the trial-marking system.

Try not to be one of those people.