Free crowdsourcing-based corpus annotation

dc.contributor.author Bougrine, Fatna
dc.contributor.author Djellikh, Soumia
dc.contributor.author Cherroun, Hadda
dc.date.accessioned 2023-01-25T14:05:48Z
dc.date.available 2023-01-25T14:05:48Z
dc.date.issued 2017
dc.description.abstract Large corpora are very useful for developing and validating Natural Language Processing (NLP) systems. However, these corpora are generally collected and annotated automatically. Two solutions are possible for validating such annotations: relying on experts, which can be costly and time-consuming, or using crowdsourcing. Crowdsourcing can be defined as attracting many non-experts to complete a given task through a dedicated paid or unpaid platform. In this work, we aim to validate the semi-automatic dialect annotation of the Kalam’DZ corpus. Our approach relies on free crowdsourcing using the Crowdcrafting platform. The validation is performed on 10% (11 hours) of the total size of Kalam’DZ. The quality of this validation is controlled by comparing it with expert annotation, which shows that more than 80% of the annotations agree. Our results confirm that free crowdsourcing is effective for annotating dialectal speech.
dc.identifier.uri https://dspace.lagh-univ.dz/handle/123456789/3185
dc.language.iso en
dc.publisher Université Amar Telidji - Laghouat - Département d'informatique
dc.title Free crowdsourcing-based corpus annotation
dc.type Thesis

Files

Original bundle

Name: MF 01-22.pdf
Size: 5.11 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission