Amazigh Grammatical Labelling using n-gram Properties and Segmentation Pre-treatment

Mohamed Outahajala, Yassine Benajiba, Paolo Rosso, Lahbib Zenkouar

Abstract


This paper presents the first Amazigh POS tagger. Very few linguistic resources have been developed so far for Amazigh, and we believe that the development of a POS tagger is the first step needed for automatic text processing. To this end, we have trained two sequence classification models, using Support Vector Machines (SVMs) and Conditional Random Fields (CRFs), after a tokenization step. We have used 10-fold cross-validation to evaluate our approach. Results show that the performance of SVMs and CRFs is very comparable: SVMs outperformed CRFs at the fold level (92.58% vs. 92.14%), whereas CRFs outperformed SVMs on the 10-fold average (89.48% vs. 89.29%). These results are very promising considering that we have used a corpus of only ~20k tokens.
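The evaluation protocol mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the SVM and CRF models are replaced here by a simple most-frequent-tag baseline, and the function names (`k_fold_indices`, `train_baseline`, `cross_validate`) and the toy tokens are hypothetical placeholders. The sketch only shows how a per-fold score and a 10-fold average can both be derived from the same runs.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    if k < 2 or n_items < k:
        raise ValueError("need k >= 2 and at least k items")
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_baseline(sentences):
    """Most-frequent-tag baseline (assumption: stands in for SVM/CRF)."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            all_tags[tag] += 1
    default_tag = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    # Unknown words fall back to the most frequent tag overall.
    return lambda word: lexicon.get(word, default_tag)


def cross_validate(sentences, k=10):
    """Return one token-level accuracy per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(sentences), k):
        held = set(held_out)
        train = [s for i, s in enumerate(sentences) if i not in held]
        test = [sentences[i] for i in held_out]
        tagger = train_baseline(train)
        correct = sum(tagger(w) == t for s in test for w, t in s)
        total = sum(len(s) for s in test)
        scores.append(correct / total)
    return scores


# Toy corpus with placeholder tokens and tags (not Amazigh data).
corpus = [[("a", "X"), ("b", "Y")] for _ in range(10)]
scores = cross_validate(corpus, k=10)
best_fold = max(scores)                    # a single-fold figure
average = sum(scores) / len(scores)        # the 10-fold average
```

A single-fold figure and the average over all folds can rank two models differently, which is why the abstract reports both.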



Copyright (c) 2012 Mohamed Outahajala, Yassine Benajiba, Paolo Rosso, Lahbib Zenkouar

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN 1114-8802 / ISBN 2665-7015

Last updated: Oct 6, 2020