Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

Machine Learning for Readability of Legislative Sentences


Date

Authors

Curtotti, Michael
McCreath, Eric
Bruce, Thomas
Frug, Sara
Weibel, Wayne
Ceynowa, Nicolas

Journal Title

Journal ISSN

Volume Title

Publisher

Association for Computing Machinery (ACM)

Abstract

Improving the readability of legislation is an important and unresolved problem. Recently, researchers have begun to apply legal informatics to this problem. This paper applies machine learning to predict the readability of sentences from legislation and regulations. A corpus of sentences from the United States Code and the US Code of Federal Regulations was created. Each sentence was labelled for language difficulty using the results of a large-scale crowdsourced study undertaken during 2014. The corpus was used as training and test data for machine learning. It includes a version tagged with the Stanford parser's context-free grammar and a version tagged with the Stanford dependency grammar parser. The corpus is described and made available to interested researchers. We investigated whether extending the natural language features available as input to machine learning improves the accuracy of prediction. The features evaluated include those derived from the context-free and dependency grammars, as well as letter and word n-grams. We found that adding such features improves prediction accuracy on legal language. We also undertake a correlation study of natural language features and language difficulty, drawing insights into the characteristics that may make legal language more difficult. These insights, and those from machine learning, enable us to describe a system for reducing legal language difficulty and to identify a number of suggested heuristics for improving the writing of legislation and regulations.
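The abstract mentions letter and word n-grams as input features. A minimal sketch of how such counts might be extracted from a single sentence follows, assuming simple lowercase whitespace tokenisation. The function names and the sample sentence are hypothetical illustrations, not the authors' feature pipeline.

```python
from collections import Counter


def word_ngrams(sentence: str, n: int) -> Counter:
    """Count word n-grams after (assumed) lowercase whitespace tokenisation."""
    tokens = sentence.lower().split()
    # Each n-gram is a tuple of n consecutive tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def letter_ngrams(sentence: str, n: int) -> Counter:
    """Count character n-grams over the lowercased sentence, spaces included."""
    text = sentence.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical example sentence in a legislative style.
sentence = "The Secretary shall prescribe regulations"
print(word_ngrams(sentence, 2).most_common(2))
print(letter_ngrams(sentence, 3)["the"])
```

In a learning setup, counts like these would typically be turned into sparse feature vectors and combined with parser-derived features; which n values and tokenisation the study actually used is not stated here.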

Description

Keywords

Citation

Source

A Study of Query Reformulation for Patent Prior Art Search with Partial Patent Applications

Book Title

Entity type

Access Statement

License Rights

Restricted until

2037-12-31