Media Technology MSc

MSc Thesis


Automatic Annotation of Cyttron Entries using the NCIthesaurus


David Graus




Semantic annotation uses human knowledge formalized in ontologies to enrich texts by providing structured, machine-understandable information about their content. This paper proposes an approach for automatically annotating texts of the Cyttron Scientific Image Database using the NCI Thesaurus ontology. Several frequency-based keyword extraction algorithms were implemented and evaluated, with the aim of extracting important concepts while excluding less relevant ones. Furthermore, topic classification algorithms were applied to identify important concepts that do not occur in the text. The algorithms were evaluated by comparing their output to annotations provided by experts: semantic networks were generated from both sets of annotations, and an ontology-based similarity metric was applied to perform the comparison. Finally, the networks were visualized to provide further insight into the differences between the semantic structures generated by humans and those generated by the algorithms.

Full Reference

David Graus, "Automatic Annotation of Cyttron Entries using the NCIthesaurus", Master's Thesis for the Media Technology programme, Leiden University (The Netherlands), 2012