Free crowdsourcing-based corpus annotation

Publisher

Université Amar Telidji - Laghouat - Département d'informatique

Abstract

Large corpora are very useful for developing and validating Natural Language Processing (NLP) systems. However, these corpora are generally collected and annotated automatically. Such annotation can be validated in two ways: by relying on experts, which can be costly and time-consuming, or by crowdsourcing. Crowdsourcing can be defined as attracting many non-experts to complete a given task through a dedicated paid or unpaid platform. In this work, we validate the semi-automatic dialect annotation of the Kalam’DZ corpus. Our approach relies on free crowdsourcing using the Crowdcrafting platform. The validation is performed on 10% (11 hours) of the total size of Kalam’DZ. Quality is controlled by comparing the crowd annotations with expert annotations, which shows that more than 80% of the annotations agree. Our results confirm that free crowdsourcing is effective for speech dialect annotation.
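
The comparison between crowd and expert annotations described in the abstract can be illustrated with a minimal sketch. The abstract does not say how crowd answers were aggregated or how agreement was computed, so everything here is an assumption: majority voting per segment, ties counted as disagreement, and the names `majority_label`, `agreement_rate`, the segment identifiers, and the dialect labels are all hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent crowd label, or None when empty or tied."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # A tie between the two most frequent labels yields no majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def agreement_rate(crowd, expert):
    """Fraction of expert-labelled segments whose crowd majority matches the expert."""
    shared = [seg for seg in expert if seg in crowd]
    if not shared:
        return 0.0
    matches = sum(1 for seg in shared if majority_label(crowd[seg]) == expert[seg])
    return matches / len(shared)


# Hypothetical toy data: segment id -> labels (not taken from Kalam'DZ).
crowd = {
    "seg1": ["dialect_A", "dialect_A", "dialect_B"],
    "seg2": ["dialect_B", "dialect_B", "dialect_B"],
    "seg3": ["dialect_C", "dialect_A", "dialect_A"],
    "seg4": ["dialect_B", "dialect_C"],  # tie: counted as disagreement
    "seg5": ["dialect_C", "dialect_C", "dialect_B"],
}
expert = {
    "seg1": "dialect_A",
    "seg2": "dialect_B",
    "seg3": "dialect_C",
    "seg4": "dialect_B",
    "seg5": "dialect_C",
}

rate = agreement_rate(crowd, expert)
print(f"Crowd/expert agreement: {rate:.0%}")
```

Raw percent agreement of this kind is the simplest possible measure; a chance-corrected statistic could be substituted without changing the overall structure.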
