Etiquetado de partes del discurso sobre un corpus en castellano basado en metaheurísticas

José Julio Tobar Cifuentes; Miguel Alexis Solano Jiménez; Luz Marina Sierra M; Carlos Alberto Cobos Lozada

Affiliation (all authors): Universidad del Cauca, Colombia
Published in: RISTI: Revista Ibérica de Sistemas e Tecnologias de Informação, ISSN-e 1646-9895, No. Extra 32, 2020, pp. 215-228
Language: Spanish
Parallel titles:
- Parts of Speech Tagging for a corpus in Spanish based on metaheuristics
Abstract
  Part-of-speech tagging is one of the most important tasks in natural language preprocessing, with uses in sentiment analysis, text translation, speech recognition, and information retrieval, among others. The task faces three main challenges: word ambiguity, the size of the tagset, and the tagging of unknown words. This article presents the construction of a Spanish dataset and a comparison of several state-of-the-art metaheuristic algorithms on that corpus, including an improved memetic algorithm that handles different word contexts, which allows it to achieve better performance.
