ICADL 2007 - LNCS 4822

Synopsis Information Extraction in Documents Through Probabilistic Text Classifiers

Jantima Polpinij¹ and Aditya Ghose²

¹Faculty of Informatics, Mahasarakham University, Mahasarakham 44150 Thailand
jantima.p@msu.ac.th

²School of Computer Science and Software Engineering, Faculty of Informatics, University of Wollongong, Wollongong, 2500 NSW, Australia
aditya@uow.edu.au

Abstract. Digital libraries currently use several advanced information technologies to organize information and make it easily accessible to users. The current trend is toward dynamic digital libraries [1]. Business rules may also offer a way to improve dynamic digital libraries. Business rules [2] are statements that define or constrain some aspects of IT systems, providing a foundation for understanding how an IT system functions. At present, the need for automated business rules is becoming more pressing because of the increasing use of IT systems. However, business rules are not easy to extract because they are written in natural language, much of which is ignored. One important question in this research area is therefore how to extract a business rule from a document automatically. Information extraction (IE) [3] can typically be applied to this problem. In essence, IE transforms text into information that is more readily analyzed. We believe that if the content of a document is reduced, the accuracy of rule extraction may increase. Under this assumption, if irrelevant information is filtered out of the document, business rules can be extracted more easily from what remains. This research therefore proposes a method based on a probabilistic text classifier to extract synopsis information. This work can be regarded as the pre-processing stage of a business rules extraction methodology.
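
The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which probabilistic classifier is used, so the example below assumes a multinomial naive Bayes model with add-one smoothing, applied sentence by sentence. The class name `NaiveBayesSentenceFilter`, the function `synopsis`, the labels "relevant" and "irrelevant", and the toy training sentences are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceFilter:
    """Multinomial naive Bayes with add-one smoothing (an assumed model choice)."""

    def __init__(self):
        self.word_counts = {}          # label -> Counter of token frequencies
        self.label_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.label_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(sentence):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def log_scores(self, sentence):
        """Return log P(label) + sum of log P(token | label) for each label."""
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n / total)
            for token in tokenize(sentence):
                if token in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


def synopsis(model, sentences, keep_label="relevant"):
    """Keep only the sentences the classifier labels as relevant."""
    return [s for s in sentences if model.predict(s) == keep_label]


# Toy training data, invented for illustration only.
train = [
    ("A customer must pay the invoice within thirty days.", "relevant"),
    ("An order must not be shipped unless payment is approved.", "relevant"),
    ("Each employee must submit a report every month.", "relevant"),
    ("The company was founded in a small town.", "irrelevant"),
    ("We thank our colleagues for helpful discussions.", "irrelevant"),
    ("The weather was pleasant during the workshop.", "irrelevant"),
]
model = NaiveBayesSentenceFilter().fit(
    [text for text, _ in train], [label for _, label in train]
)
```

Calling `synopsis(model, sentences)` on the sentences of a document returns the reduced text that a later rule-extraction step would work on. In practice the classifier would need a much larger labeled corpus than the six sentences used here.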

LNCS 4822, p. 508 f.
