Evaluation and Improvement of Semantically-Enhanced Tagging System
The Social Web, or ‘Web 2.0’, is focused on interaction and collaboration between website users. It is credited with the existence of tagging systems, amongst other things such as blogs and wikis. Tagging systems like YouTube and Flickr offer their users simplicity and freedom in creating and sharing their own content. Folksonomy is therefore a very active research area in which many improvements have been proposed to overcome existing disadvantages such as the lack of semantic meaning, ambiguity, and inconsistency. TE is a tagging system that proposes solutions to the problems of multilingualism, lack of semantic meaning and shorthand writing (which is very common in the social web) with the aid of semantic and social resources. The current research presents an addition to the TE system in the form of an embedded stemming component that provides a solution to the problem of different lexical forms. Prior to this, the TE system had to be explored thoroughly and its efficiency determined in order to decide on the practicality of embedding any additional components as performance enhancements. This involved analysing the algorithm using an analytical approach to determine its time and space complexity. The TE system has a time growth rate of O(N²), which is polynomial, so the algorithm is considered efficient. Nonetheless, recommended modifications such as batch SQL execution can improve this. Regarding space complexity, the number of tags per photo represents the problem size; as it grows, the required memory space increases linearly. Based on these findings, the TE system is re-implemented on Flickr instead of YouTube because of a recent YouTube restriction. Flickr is also of greater benefit to a multilingual tagging system, since the language barrier is meaningless in this case. The re-implementation is achieved using ‘flickrj’ (a Java interface for the Flickr APIs).
Next, the stemming component is added to perform tag normalisation prior to querying the ontologies. The component is embedded using the Java implementation of the Porter 2 stemmer, which supports many languages including Italian. The impact of the stemming component on the performance of the TE system, in terms of the size of the index table and the number of retrieved results, is investigated in an experiment that showed a 48% reduction in the size of the index table. This also means that search queries have fewer system tags to compare against the search keywords, which can speed up the search. Furthermore, the experiment runs similar search trials on two versions of the TE system, one without the stemming component and one with it, and finds that the latter produces more results, provided that it works with valid words and valid stems. By reducing the size of the index table, the stemming component embedded in the new TE system has lessened the storage overhead of the generated system tags, which makes the system suited to many applications such as text classification, summarisation, email filtering and machine translation.
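
The effect of tag normalisation on the index table can be illustrated with a minimal sketch. It does not reproduce the TE system or the Porter 2 algorithm: `toy_stem` is a deliberately crude suffix stripper, and the photo identifiers, tags and the `build_index` helper are all hypothetical, chosen only to show why stemming shrinks the index and lets a query match more photos.

```python
def toy_stem(tag: str) -> str:
    """Crude suffix stripper used for illustration only; NOT Porter 2."""
    word = tag.lower()
    for suffix in ("ing", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(tagged_photos, normalise=None):
    """Map each (optionally normalised) tag to the set of photo ids carrying it."""
    index = {}
    for photo_id, tags in tagged_photos.items():
        for tag in tags:
            key = normalise(tag) if normalise else tag.lower()
            index.setdefault(key, set()).add(photo_id)
    return index


# Hypothetical tagged photos; different lexical forms of the same word.
photos = {
    "p1": ["garden", "flowers"],
    "p2": ["gardens", "flower"],
    "p3": ["gardening", "bridge"],
    "p4": ["bridges"],
}

plain_index = build_index(photos)                        # one entry per lexical form
stemmed_index = build_index(photos, normalise=toy_stem)  # one entry per stem

print(len(plain_index), len(stemmed_index))
print(sorted(plain_index.get("gardens", set())))
print(sorted(stemmed_index.get(toy_stem("gardens"), set())))
```

In this toy data the stemmed index holds fewer keys than the plain one, and a query for "gardens" reaches every photo tagged with any form of the word instead of only the exact match. The figures here say nothing about the 48% reduction reported for the TE system, which comes from the experiment described above.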
- PhD