R&D

R&D in the Categorizing Page of the Encyclopedia of Law

The LCC classification research of the Categorizing Page of the Encyclopedia of Law aims to automatically assign each entry of the legal encyclopedia an LCC call number and category.

The aim of this project is to assign one or more classifications from the Library of Congress Classification (LCC) scheme to each entry. Over the years, other projects have explored similar territory: most are based on information retrieval techniques and use additional metadata, such as the document title. Some use machine learning methods for text classification, such as Support Vector Machines.

The Categorizing Page of the Encyclopedia of Law has two main practical applications for this research. First, we are using it to assign an LCC classification to the more than 200,000 existing Encyclopedia of Law entries (records). Second, we will use this project in combination with our other assignment tools (like the legal classification project) to assign both Library of Congress Classification and other schemes to new entries of the Encyclopedia.

Work in Progress

We have applied the Library of Congress Classification (a detailed version of the LCC hierarchy) to every entry in the Encyclopedia of Law, although the classification is currently hidden, and created a browsing interface to this structure. This interface is a work in progress.

The model itself is based on the LCC’s hierarchical structure. We found that when we use more training data, performance increases.
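To make the hierarchical structure concrete, here is a minimal sketch of how an LCC class mark could be expanded into a path from the main class down to the specific number. It is an illustration only, not the project's actual model: the function name `lcc_path` is hypothetical, and the sketch assumes that each additional letter in the prefix marks a narrower level (as in K, then KF), which holds for most but not all of the LCC schedules.

```python
import re
from typing import List


def lcc_path(class_mark: str) -> List[str]:
    """Expand an LCC class mark into levels, broadest first.

    Assumption (not from the source): every extra letter in the
    prefix is treated as one level deeper in the hierarchy.
    """
    cleaned = class_mark.strip().upper()
    # One to three letters, optionally followed by a (decimal) number.
    match = re.match(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?", cleaned)
    if match is None:
        raise ValueError(f"not an LCC class mark: {class_mark!r}")
    letters, number = match.group(1), match.group(2)
    # Letter prefixes: 'KF' yields 'K', then 'KF'.
    path = [letters[:i] for i in range(1, len(letters) + 1)]
    if number:
        # The full class number is the most specific level.
        path.append(letters + number)
    return path


print(lcc_path("KF8745"))
```

A hierarchical classifier could then be trained or evaluated level by level along such a path, starting with the main class.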

We are particularly interested in top-level accuracy: the proportion of records that were correctly classified at the top level of the tree.
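As a rough illustration of this metric, the sketch below computes top-level accuracy for a list of predicted and reference class marks. It assumes that the "top level" is the LCC main class, that is, the first letter of the class mark, and that each record has a single predicted and a single reference classification. The function names and the sample class marks are hypothetical.

```python
def top_level_class(class_mark):
    """Return the LCC main class (first letter), e.g. 'K' for 'KF8745'."""
    return class_mark.strip()[0].upper()


def top_level_accuracy(predicted, reference):
    """Proportion of records whose predicted main class matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one record is required")
    correct = sum(
        top_level_class(p) == top_level_class(r)
        for p, r in zip(predicted, reference)
    )
    return correct / len(reference)


# Hypothetical sample: three of the four predictions fall in main class K.
predicted = ["KF8745", "HV6001", "KJC510", "K230"]
reference = ["KF4550", "KF9219", "KJE947", "K3240"]
print(top_level_accuracy(predicted, reference))  # 0.75
```

Note that a prediction counts as correct here even when the deeper levels differ, which is why this measure is more forgiving than exact-match accuracy.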

We have encountered a number of problems. For example, classification becomes harder when an entry has a good primary category plus an additional category or keyword that is only slightly related to the entry topic.

The Categorizing Page of the Encyclopedia of Law Project has several avenues of future research in mind, including creating a user-friendly Subject Disciplines hierarchy and browsing interface based on the Library of Congress Classification.

