A generic architecture for semantic-enhanced tagging systems
The Social Web, or Web 2.0, has recently gained popularity because of its low cost and ease of use. Social tagging sites (e.g. Flickr and YouTube) offer end-users new ways to publish and classify their content (data). Tagging systems contain free keywords (tags) generated by end-users to annotate and categorise data. The main drawback of social tagging is its lack of semantics, which stems from the use of an unstructured vocabulary. As a result, tagging systems suffer from shortcomings such as low precision, lack of collocation, synonymy, multilinguality, and the use of shorthand. Consequently, relevant content is not visible, and thus not retrievable, when searching tag-based systems. The Semantic Web, also called Web 3.0, on the other hand provides a rich semantic infrastructure. Ontologies are the key enabling technology of the Semantic Web, and they can be integrated with the Social Web to overcome the lack of semantics in tagging systems. In the work presented in this thesis, we build an architecture that addresses a number of these drawbacks. In particular, we use the controlled vocabularies provided by ontologies to improve information retrieval in tag-based systems. Building on the tags provided by end-users, we introduce the idea of adding “system tags” drawn from semantic as well as social resources. System tags are more comprehensive and wide-ranging than the limited “user tags”, and they are used to bridge the gap between user tags and the terms used to search tag-based systems. We restricted the scope of our work to the following shortcomings of tagging systems:
- the lack of semantic relations between user tags and search terms (e.g. synonymy, hypernymy);
- the lack of a translation medium between user tags and search terms (multilinguality);
- the lack of context for defining the meaning of emergent shorthand user tags.
To address the first shortcoming, we use the WordNet ontology as a semantic lexical resource from which system tags are extracted. For the second, we use the MultiWordNet ontology to recognise cross-language links between words in different languages. Finally, to address the third, we use tag clusters obtained from the Social Web to create a context for defining the meaning of shorthand tags. We implemented a prototype of the architecture, for which we built our own database to host videos imported from a real tag-based system (YouTube). The user tags associated with these videos were also imported and stored in the database. For each user tag, our algorithm adds a number of system tags that come either from the semantic ontologies (WordNet or MultiWordNet) or from tag clusters imported from the Flickr website. Each system tag added to an imported video is therefore related to one of that video's user tags, and the relation is one of the following: synonymy, hypernymy, similar term, related term, translation, or clustering relation. To evaluate the suitability of the proposed system tags, we developed an online environment where participants submit search terms, retrieve two groups of videos, and evaluate them. Each group is produced from one type of tag only: user tags or system tags. Both groups are drawn from the same database and are evaluated by the same participants, to keep the evaluation consistent and reliable. Since real tag-based systems are currently searched using user tags, we take the effectiveness of user tags as the reference against which the effectiveness of the new system tags is compared. To compare how relevant each group of retrieved videos is to the search terms, we carried out a statistical analysis.
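The tag-expansion and two-group retrieval steps described above can be illustrated with a minimal Python sketch. It assumes small hypothetical in-memory dictionaries (`SYNONYMS`, `HYPERNYMS`, `TRANSLATIONS`, `CLUSTERS`) standing in for WordNet, MultiWordNet and the Flickr tag clusters; these names, the toy data and the exact-match search are illustrative assumptions, not the prototype's actual implementation, and the similar-term and related-term relations are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical toy resources standing in for WordNet (synonymy, hypernymy),
# MultiWordNet (translation) and Flickr tag clusters (clustering).
SYNONYMS = {"car": ["automobile", "auto"]}
HYPERNYMS = {"car": ["motor vehicle"]}
TRANSLATIONS = {"car": ["macchina"]}  # English -> Italian
CLUSTERS = {"nyc": ["new york", "manhattan"]}

RESOURCES = {
    "synonymy": SYNONYMS,
    "hypernymy": HYPERNYMS,
    "translation": TRANSLATIONS,
    "clustering": CLUSTERS,
}


@dataclass(frozen=True)
class SystemTag:
    term: str             # the added system tag
    relation: str         # how it relates to the user tag
    source_user_tag: str  # the user tag it was derived from


def system_tags_for(user_tags):
    """Expand each user tag into system tags, recording the relation used."""
    result = []
    seen = set(user_tags)  # never re-add a user tag or a duplicate
    for tag in user_tags:
        for relation, resource in RESOURCES.items():
            for term in resource.get(tag, []):
                if term not in seen:
                    seen.add(term)
                    result.append(SystemTag(term, relation, tag))
    return result


def search(videos, query, use_system_tags):
    """Return ids of videos matched by exactly one tag type (user or system)."""
    q = query.lower()
    hits = []
    for video_id, user_tags in videos.items():
        if use_system_tags:
            terms = {st.term for st in system_tags_for(user_tags)}
        else:
            terms = set(user_tags)
        if q in terms:
            hits.append(video_id)
    return hits
```

With the hypothetical collection `videos = {"v1": ["car", "red"], "v2": ["nyc", "night"]}`, a search for `"automobile"` finds nothing through user tags but returns `["v1"]` through system tags, which is the kind of gap system tags are meant to bridge. Each `SystemTag` keeps its originating user tag and relation, mirroring the abstract's point that every system tag is tied to one user tag.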
According to the Wilcoxon signed-rank test, there was no significant difference between searching with system tags and searching with user tags. The findings indicate that using system tags in search is as effective as using user tags: the two types of tags retrieve different results, but at the same level of relevance to the submitted search terms.
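The statistical comparison can be sketched as a paired Wilcoxon signed-rank test. The sketch below assumes, hypothetically, that each search term yields one relevance score for the user-tag group and one for the system-tag group; the abstract does not specify how scores were aggregated. The function name `wilcoxon_signed_rank` is illustrative. It drops zero differences, ranks absolute differences using average ranks for ties, and uses the large-sample normal approximation without tie or continuity correction, so for small samples a library routine such as `scipy.stats.wilcoxon`, which can compute exact p-values, is preferable.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, z, p) for paired samples x and y.

    W is the smaller of the positive- and negative-rank sums; z and the
    two-sided p-value use the normal approximation (no tie or continuity
    correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences (1-based), giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Mean and standard deviation of W under the null hypothesis.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return w, z, p
```

For hypothetical paired scores `x = [3, 4, 2, 5, 4]` (user tags) and `y = [2, 4, 3, 3, 5]` (system tags), the function returns W = 4.0 with a two-sided p-value of roughly 0.7, i.e. no significant difference at the 0.05 level; these numbers only illustrate the computation and are not the thesis results.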
- PhD