Using Text Mining to Identify Crime Patterns from Arabic Crime News Report Corpus

De Montfort University Open Research Archive

Alruily, Meshrif 2012-10-17T10:16:47Z 2012-10-17T10:16:47Z 2012
dc.description.abstract Most text mining techniques have been proposed only for English text, and even there, most research has been conducted on texts from specific domains, such as politics, medicine and crime. In contrast, although Arabic is a widely spoken language, few mining tools have been developed to process Arabic text, and some Arabic domains have not been studied at all. Arabic has a very complex morphology because it is highly inflectional, and processing texts written in Arabic is therefore difficult. This research studies the crime domain in the Arabic language, exploiting unstructured text using text mining techniques. A system for extracting important information from crime reports would be useful for police investigators, both for accelerating the investigative process (instead of reading entire reports) and for conducting further or wider analyses. We propose the Crime Profiling System (CPS) to extract crime-related information (crime type, crime location and nationality of persons involved in the event), automatically construct dictionaries for the extracted information, cluster crime documents based on certain attributes and utilise visualisation techniques to assist in crime data analysis. The proposed information extraction approach is novel, and it relies on computational linguistic techniques to identify the abovementioned information, i.e. without using predefined dictionaries (e.g. lists of location names) or an annotated corpus. The language used in crime reporting is studied to identify patterns of interest using a corpus-based approach. Frequency analysis, collocation analysis and concordance analysis are used to perform the syntactic analysis in order to discover the local grammar.
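As an illustration only, the three corpus-based analyses named above (frequency, concordance and collocation) can be sketched in a few lines of Python. This is not the thesis implementation: the function names, the whitespace tokeniser and the English toy sentences in `DOCS` are all assumptions made for the example, whereas the actual system works on Arabic crime news text.

```python
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenisation; real Arabic text needs more care.
    return text.split()


def frequencies(docs):
    # Frequency analysis: count every token across the corpus.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def concordance(docs, keyword, window=2):
    # Concordance analysis: each occurrence of the keyword with its
    # left and right context (keyword-in-context lines).
    lines = []
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                lines.append((left, tok, right))
    return lines


def right_collocates(docs, keyword):
    # Collocation analysis (simplified): count the word that
    # immediately follows the keyword.
    counts = Counter()
    for _, _, right in concordance(docs, keyword, window=1):
        if right:
            counts[right[0]] += 1
    return counts


# Hypothetical toy corpus, not taken from the thesis data.
DOCS = [
    "police arrested a man in Riyadh for theft",
    "a robbery occurred in Jeddah last night",
    "two men were detained in Riyadh yesterday",
]

if __name__ == "__main__":
    print(frequencies(DOCS).most_common(3))
    for left, key, right in concordance(DOCS, "in"):
        print(" ".join(left), "[" + key + "]", " ".join(right))
    print(right_collocates(DOCS, "in"))
```

In this toy corpus the word following the preposition is always a place name, which is the kind of recurring pattern a local grammar would capture in order to find locations without a predefined list of location names.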
Moreover, the Self Organising Map (SOM) approach is adopted to perform the clustering and visualisation tasks for crime documents based on crime type, location or nationality. This clustering technique is improved because only refined data, containing the meaningful keywords extracted through the information extraction process, are fed into it, i.e. the data is cleaned by removing noise. As a result, the quantity of data fed into the SOM is greatly reduced, which saves memory, data loading time and the execution time needed to perform the clustering, and therefore accelerates the computation of the SOM. Finally, the quantization error is reduced, which leads to high-quality clustering. The outcome of the clustering stage is also visualised, and the system is able to provide statistical information in the form of graphs and tables about crimes committed within certain periods of time and within a particular area. en
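For readers unfamiliar with the technique, a minimal SOM and its quantization error can be sketched as follows. This is a generic textbook-style sketch in pure Python, not the CPS code: the grid size, the linear decay schedule, the Gaussian neighbourhood, the vocabulary and the binary keyword vectors in `VECTORS` are all assumptions chosen for illustration.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    # The grid cell whose weight vector is closest to the input.
    return min(weights, key=lambda unit: sq_dist(weights[unit], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def quantization_error(weights, data):
    # Mean distance between each input and its best matching unit.
    total = 0.0
    for x in data:
        total += math.sqrt(sq_dist(weights[best_matching_unit(weights, x)], x))
    return total / len(data)


# Hypothetical keyword-presence vectors over an assumed vocabulary:
# [theft, murder, Riyadh, Jeddah]
VECTORS = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

if __name__ == "__main__":
    som = train_som(VECTORS, rows=2, cols=2)
    for v in VECTORS:
        print(v, "->", best_matching_unit(som, v))
    print("quantization error:", quantization_error(som, VECTORS))
```

Each input vector here has only as many dimensions as there are extracted keywords, rather than one per word in the raw reports, which is the source of the memory and run-time savings the abstract describes.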
dc.description.sponsorship Saudi Cultural Bureau en
dc.language.iso en en
dc.publisher De Montfort University en
dc.subject Arabic language en
dc.subject text mining en
dc.subject Information Extraction en
dc.subject Named Entity Recognition en
dc.title Using Text Mining to Identify Crime Patterns from Arabic Crime News Report Corpus en
dc.type Thesis or dissertation en
dc.publisher.department Faculty of Technology en
dc.publisher.department Software Technology Research Laboratory en
dc.type.qualificationlevel Doctoral en
dc.type.qualificationname PhD en
