• Login
    View Item 
    •   DORA Home
    • De Montfort University e-theses
    • PhD
    • View Item
    •   DORA Home
    • De Montfort University e-theses
    • PhD
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Using Text Mining to Identify Crime Patterns from Arabic Crime News Report Corpus

    Thumbnail
    View/Open
    thesis- Meshrif Alruily.pdf (3.062Mb)
    Date
    2012
    Author
    Alruily, Meshrif
    Metadata
    Show attachments and full item record
    Abstract
    Most text mining techniques have been proposed only for English text, and even here, most research has been conducted on specific texts related to special contexts within the English language, such as politics, medicine and crime. In contrast, although Arabic is a widely spoken language, few mining tools have been developed to process Arabic text, and some Arabic domains have not been studied at all. In fact, Arabic is a language with a very complex morphology because it is highly inflectional l, and therefore, dealing with texts written in Arabic is highly complicated. This research studies the crime domain in the Arabic language, exploiting unstructured text using text mining techniques. Developing a system for extracting important information from crime reports would be useful for police investigators, for accelerating the investigative process (instead of reading entire reports) as well as for conducting further or wider analyses. We propose the Crime Profiling System (CPS) to extract crime-related information (crime type, crime location and nationality of persons involved in the event), automatically construct dictionaries for the existing information, cluster crime documents based on certain attributes and utilize visualisation techniques to assist in crime data analysis. The proposed information extraction approach is novel, and it relies on computational linguistic techniques to identify the abovementioned information, i.e. without using predefined dictionaries (e.g. lists of location names) and annotated corpus. The language used in crime reporting is studied to identify patterns of interest using a corpus-based approach. Frequency analysis, collocation analysis and concordance analysis are used to perform the syntactic analysis in order to discover the local grammar. Moreover, the Self Organising Map (SOM) approach is adopted in order to perform the clustering and visualisation tasks for crime documents based on crime type, location or nationality. This clustering technique is improved because only refined data containing meaningful keywords extracted through the information extraction process are inputted into it, i.e. the data is cleaned by removing noise. As a result, a huge reduction in the quantity of data fed into the SOM is obtained, consequently, saving memory, data loading time and the execution time needed to perform the clustering. Therefore, the computation of the SOM is accelerated. Finally, the quantization error is reduced, which leads to high quality clustering. The outcome of the clustering stage is also visualised and the system is able to provide statistical information in the form of graphs and tables about crimes committed within certain periods of time and within a particular area.
    Description
    URI
    http://hdl.handle.net/2086/7584
    Collections
    • PhD [1359]

    Submission Guide | Reporting Guide | Reporting Tool | DMU Open Access Libguide | Take Down Policy | Connect with DORA
    DMU LIbrary
     

     

    Browse

    All of DORACommunities & CollectionsAuthorsTitlesSubjects/KeywordsResearch InstituteBy Publication DateBy Submission DateThis CollectionAuthorsTitlesSubjects/KeywordsResearch InstituteBy Publication DateBy Submission Date

    My Account

    Login

    Submission Guide | Reporting Guide | Reporting Tool | DMU Open Access Libguide | Take Down Policy | Connect with DORA
    DMU LIbrary