Linguistically-driven Selection of Correct Arcs for Dependency Parsing

Dell'Orletta, Felice; Venturi, Giulia; Montemagni, Simonetta

Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 Ciudad de México Apr./Jun. 2013

Felice Dell'Orletta¹, Giulia Venturi², and Simonetta Montemagni³

¹ Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), ItaliaNLP Lab (www.italianlp.it), Pisa, Italy, felice.dellorletta@ilc.cnr.it

² Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), ItaliaNLP Lab (www.italianlp.it), Pisa, Italy, giulia.venturi@ilc.cnr.it

³ Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR), ItaliaNLP Lab (www.italianlp.it), Pisa, Italy, simonetta.montemagni@ilc.cnr.it

Article received on 07/12/2012; accepted on 15/01/2013.

Abstract

LISCA is an unsupervised algorithm that assigns a quality score to each arc produced by a dependency parser, so that arcs can be ranked in decreasing order of reliability, from correct to incorrect ones. LISCA collects statistics on a set of linguistically-motivated, dependency-based features from a large corpus of automatically parsed sentences, and uses them to score each arc of a parsed sentence belonging to the same domain as that corpus. LISCA was successfully tested on two datasets from two different domains; in all experiments it outperformed several baselines, showing that it can reliably detect correct arcs, including arcs that reflect domain-specific peculiarities.
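The abstract describes LISCA only at a high level: feature statistics gathered from a large automatically parsed corpus are used to score, and then rank, the arcs of new parses. The Python sketch below illustrates that general scheme under stated assumptions. The `Arc` type, the three feature templates in `features` (POS pair with label, attachment direction, bucketed arc length), and the scoring rule (a product of additively smoothed relative frequencies) are all hypothetical choices made for illustration. They are not the feature set or weighting actually used by LISCA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Arc:
    """One dependency arc (hypothetical representation): 1-based token indices, head 0 = root."""
    dep: int
    head: int
    dep_pos: str
    head_pos: str
    label: str


def features(arc: Arc) -> List[Tuple]:
    """Illustrative linguistically-motivated features of one arc (assumed, not LISCA's set)."""
    if arc.head == 0:
        direction = "root"
    elif arc.head < arc.dep:
        direction = "head_before"  # head precedes its dependent
    else:
        direction = "head_after"   # head follows its dependent
    length = 0 if arc.head == 0 else min(abs(arc.head - arc.dep), 5)  # lengths >= 5 share a bucket
    return [
        ("pos_pair", arc.dep_pos, arc.head_pos, arc.label),
        ("direction", arc.dep_pos, arc.label, direction),
        ("length", arc.label, length),
    ]


class ArcScorer:
    """Scores arcs with feature statistics gathered from automatically parsed sentences."""

    def __init__(self, parsed_corpus: Iterable[List[Arc]], alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.counts: Counter = Counter()  # occurrences of each full feature tuple
        self.totals: Counter = Counter()  # occurrences of each feature template
        for sentence in parsed_corpus:
            for arc in sentence:
                for feat in features(arc):
                    self.counts[feat] += 1
                    self.totals[feat[0]] += 1
        # number of distinct values observed per template, used for smoothing
        self.distinct = Counter(feat[0] for feat in self.counts)

    def score(self, arc: Arc) -> float:
        """Product of additively smoothed relative frequencies of the arc's features."""
        result = 1.0
        for feat in features(arc):
            template = feat[0]
            denominator = self.totals[template] + self.alpha * (self.distinct[template] + 1)
            result *= (self.counts[feat] + self.alpha) / denominator
        return result

    def rank(self, sentence: List[Arc]) -> List[Tuple[Arc, float]]:
        """Arcs of a parsed sentence, sorted from most to least plausible."""
        scored = [(arc, self.score(arc)) for arc in sentence]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy "auto-parsed corpus": ten copies of "the cat sleeps".
    det = Arc(1, 2, "DET", "NOUN", "det")
    subj = Arc(2, 3, "NOUN", "VERB", "nsubj")
    root = Arc(3, 0, "VERB", "ROOT", "root")
    scorer = ArcScorer([[det, subj, root]] * 10)
    # A new parse in which the determiner is wrongly attached to the verb.
    suspicious = Arc(1, 3, "DET", "VERB", "det")
    for arc, s in scorer.rank([suspicious, subj, root]):
        print(arc.label, arc.dep_pos, "->", arc.head_pos, round(s, 5))
```

In this toy run the determiner-to-verb arc sinks to the bottom of the ranking. The scorer has rarely or never seen that POS pair and arc length in the parsed corpus. The statistics come from parser output rather than gold annotation, so this kind of scheme needs no manually annotated training data. It does, however, assume that the scored sentences come from the same domain as the parsed corpus.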

Keywords: Dependency parsing, correct arcs.
