
lazy_id3

Lazy_id3 is a free project written mainly in Python.

Lazy ID3 implementation in Python

Objective

Lazy algorithms have no training phase; their task is to classify incoming instances against a set of examples as they arrive. A classic example is the nearest-neighbor (NN) algorithm, in which the neighbors closest to the instance to be classified are selected from the example set. Decision-tree algorithms, on the other hand, have a training phase before they can classify incoming instances: a decision tree is built first and is then used to answer classification queries. In this approach, the ID3 algorithm's training phase is replaced by one that also considers the query instance in order to minimize the produced tree. This way the training (tree-construction) phase is converted into a decision phase, during which only the subtree relevant to the incoming instance is computed.

Description

In order to build the decision tree upon the arrival of an unclassified instance, the first step is to prune from the example set all examples that are unrelated to the unclassified instance.

First step: Removing all unrelated examples

A simple way to remove all unrelated records is to compare each example against the fields of the unclassified instance: an example that shares at least one field value with the instance is kept, while an example that shares none is removed from the example set. This is safe because, had we built the complete decision tree, the classification process would never follow the paths that branch on the field values of those removed records, since none of them match our unclassified instance.
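
A rough sketch of this step is shown below. It assumes, purely for illustration, that examples are dictionaries mapping field names to values and that the query's unknown target is represented by None; this is not necessarily how the project stores its data.

    def prune_unrelated(examples, query):
        """Keep only the examples that share at least one field value
        with the unclassified query instance."""
        related = []
        for example in examples:
            # An example is related if any known field of the query
            # has the same value in the example.
            if any(example.get(field) == value
                   for field, value in query.items()
                   if value is not None):
                related.append(example)
        return related

    examples = [
        {"time": "5pm", "position": "Center", "result": "win"},
        {"time": "9am", "position": "Left",   "result": "loss"},
    ]
    query = {"time": "5pm", "position": "Right", "result": None}
    print(prune_unrelated(examples, query))  # only the first example remains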

Second step: Excluding all unrelated fields from the branching heuristic

Since we have already checked that every remaining example shares at least one field value with the instance to be classified, we also know which fields are important for our decision tree. We must make sure the tree branches only on those fields, because if it branches on a field for which no training example has the same value as the instance, we will not be able to classify the instance through that subtree. A simple way to do this is to keep all the fields in a list and remove from it the fields we are allowed to branch on; the remaining fields can then be used as an "already visited" list, preventing the algorithm from branching on them.
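
Continuing the sketch from the previous step (reusing prune_unrelated and the toy examples and query defined there; the representation and names remain illustrative, not the project's actual API), the "already visited" list could be computed as follows:

    def fields_to_exclude(examples, query, target_field):
        """Return the fields the tree must not branch on: those for which
        no remaining example shares the query instance's value."""
        excluded = []
        for field, value in query.items():
            if field == target_field or value is None:
                continue
            if not any(example.get(field) == value for example in examples):
                # No example matches the query on this field, so a branch on
                # it could never lead to a leaf that classifies the query.
                excluded.append(field)
        return excluded

    related = prune_unrelated(examples, query)
    visited = set(fields_to_exclude(related, query, "result"))
    # `visited` is then handed to the ID3 tree builder as if those fields had
    # already been used, so the information-gain heuristic never considers them.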

Running

An example is presented below:

    python main.py -d datasets/games/games.data --lazy_find UNSEEN_VALUE,5pm,OTHER_VALUE,OTHER_VALUE,Center,OTHER_VALUE,?

This command creates a small decision tree for the instance given as the parameter to the --lazy_find argument, using the approach described above (the target value is marked with a question mark).
