Two-Tier Performance Based Classification Model for Low Level NLP tasks
Date
2005-02-02
Publisher
INFLIBNET Centre
Abstract
An error in classification is either an error of omission, statistically known as a false
negative, or an error of commission, statistically known as a false positive. A perfect
classifier must have zero false negatives and zero false positives. With this in mind,
we propose a two-tier model for the classifier. The first tier will reduce false negatives to zero
and pass the results to the second tier. The second tier will reduce false positives to zero.
We demonstrate the working of this model for the task of classifying sentences in Hindi as
passive formations. The first tier will consist of a simple pattern matching system for filtering
out sentences with likely passive formations without committing errors of omission. This will
reduce the size of the corpus considerably. The second tier will work on the reduced corpus
and perform a complete grammatical analysis of these filtered sentences in order to reduce
the false positives to zero. The Anusaraka System [Bharati 1995] is a very good example
of such a system. This paper concentrates on building the first tier. A hill climbing algorithm
is proposed, where the start state is a list of patterns commonly found in passive formations.
Each step up the hill will update the list of patterns such that the next state will bring down the
number of false negatives, thereby reducing errors of omission. The hill climbing algorithm
terminates when the false negatives are zero.
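
The first-tier hill climbing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the pool of candidate patterns, the greedy "add one pattern per step" move, and the toy English sentences, which stand in for the Hindi corpus and are not the patterns used in the paper.

```python
import re

def false_negatives(patterns, labeled):
    """Return the passive sentences that no pattern matches (errors of omission)."""
    return [s for s, is_passive in labeled
            if is_passive and not any(re.search(p, s) for p in patterns)]

def hill_climb(start_patterns, candidates, labeled):
    """Greedily add one candidate pattern per step while false negatives decrease."""
    patterns = list(start_patterns)
    missed = false_negatives(patterns, labeled)
    while missed:
        # Pick the unused candidate that leaves the fewest false negatives.
        best = min(
            (c for c in candidates if c not in patterns),
            key=lambda c: len(false_negatives(patterns + [c], labeled)),
            default=None,
        )
        if best is None:
            break  # candidate pool exhausted
        remaining = false_negatives(patterns + [best], labeled)
        if len(remaining) >= len(missed):
            break  # no uphill move available
        patterns.append(best)
        missed = remaining
    return patterns, missed

def tier_one(patterns, sentences):
    """Keep every sentence matched by at least one pattern (likely passives)."""
    return [s for s in sentences if any(re.search(p, s) for p in patterns)]

# Toy labeled data (hypothetical): (sentence, is_passive)
LABELED = [
    ("the letter was posted by ravi", True),
    ("the songs were recorded last year", True),
    ("the glass was broken by the wind", True),
    ("ravi posted the letter", False),
    ("she was tired", False),
]
START = [r"\bwas \w+ed\b"]  # hypothetical start state
CANDIDATES = [r"\bwere \w+ed\b", r"\bwas \w+en\b", r"\bis \w+ing\b"]

patterns, missed = hill_climb(START, CANDIDATES, LABELED)
filtered = tier_one(patterns, [s for s, _ in LABELED])
```

On this toy data the climb ends with no missed passives, while the filtered set still contains the non-passive "she was tired". That leftover false positive is the kind of case the second tier's full grammatical analysis is meant to remove.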
Keywords
Natural Language Processing, Automated Language Processing