Free crowdsourcing-based corpus annotation

dc.contributor.author	Bougrine, Fatna
dc.contributor.author	Djellikh, Soumia
dc.contributor.author	Cherroun, Hadda
dc.date.accessioned	2023-01-25T14:05:48Z
dc.date.available	2023-01-25T14:05:48Z
dc.date.issued	2017
dc.description.abstract	Large corpora are very useful for developing and validating Natural Language Processing (NLP) systems. However, these corpora are generally collected and annotated automatically. Two solutions are possible for validating such annotations: relying on expert skills, which can be costly and time-consuming, or using crowdsourcing. Crowdsourcing can be defined as attracting many non-experts to complete a given task through a dedicated paid or unpaid platform. In this work, we aim to validate a semi-automatic dialect annotation of the Kalam’DZ corpus. Our approach relies on free crowdsourcing using the Crowdcrafting platform. The validation is performed on 10% (11 hours) of the total size of Kalam’DZ. The quality of this validation is checked by comparing it with expert annotation, which shows that more than 80% of the annotations agree. Our results confirm that free crowdsourcing is effective for annotating spoken dialects.
dc.identifier.uri	https://dspace.lagh-univ.dz/handle/123456789/3185
dc.language.iso	en
dc.publisher	Université Amar Telidji - Laghouat - Département d'informatique
dc.title	Free crowdsourcing-based corpus annotation
dc.type	Thesis
