Graph Mining under Linguistic Constraints for Exploring Large Texts

Quiniou, Solen; Cellier, Peggy; Charnois, Thierry; Legallois, Dominique

Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 17 no. 2, Mexico City, Apr./Jun. 2013

Articles

Graph Mining under Linguistic Constraints for Exploring Large Texts

Minería de grafos bajo restricciones lingüísticas para exploración de textos grandes

Solen Quiniou¹, Peggy Cellier², Thierry Charnois³, and Dominique Legallois⁴

¹ LINA, LUNAM Université de Nantes, Nantes, France, solen.quiniou@univ-nantes.fr

² IRISA, INSA de Rennes, Rennes, France, peggy.cellier@irisa.fr

³ GREYC, Université de Caen Basse-Normandie, Caen, France, and MoDyCO, Université Paris-Ouest Nanterre La Défense, Paris, France, thierry.charnois@unicaen.fr

⁴ CRISCO, Université de Caen Basse-Normandie, Caen, France, dominique.legallois@unicaen.fr

Article received on 07/12/2012
Accepted on 11/01/2013.

Abstract

In this paper, we propose an approach for exploring large texts by highlighting their coherent sub-parts. The exploration method relies on a graph representation of the text built according to Hoey's linguistic model, which allows both adjacent and non-adjacent sentences to be selected and linked. Our main contribution is a method that combines Hoey's linguistic model with a dedicated graph mining technique, called CoHoP mining, to extract coherent sub-parts from the graph representation of the text. Experiments on several English texts show the value of the proposed approach.

Keywords: Text coherence, graph representation, graph mining, Hoey's linguistic model.
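
The general idea of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not use CoHoP Miner. It assumes that two sentences are linked when they share a minimum number of lexical items (the threshold `min_shared=3`, the stopword list, and the function names are all hypothetical choices made for illustration), and it replaces CoHoP mining with plain k-clique percolation, ignoring any further constraints the actual technique applies.

```python
import re
from itertools import combinations

# Hypothetical stopword list; a real system would rely on lemmatisation
# and part-of-speech filtering instead of this crude filter.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "on", "by", "for", "with"}

def lexical_items(sentence):
    """Return the lowercased content words of a sentence."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}

def sentence_graph(sentences, min_shared=3):
    """Link two sentences, adjacent or not, when they share >= min_shared items.

    Returns a dict mapping (i, j) with i < j to the set of shared items.
    The threshold is an assumption made for this sketch.
    """
    items = [lexical_items(s) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        shared = items[i] & items[j]
        if len(shared) >= min_shared:
            edges[(i, j)] = shared
    return edges

def k_clique_components(edges, k=3):
    """Merge k-cliques that overlap on at least k-1 vertices (clique percolation)."""
    nodes = sorted({n for e in edges for n in e})
    # Brute-force clique enumeration: fine for a toy example only.
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(p in edges for p in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) >= k - 1:
            parent[find(a)] = find(b)

    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())

sentences = [
    "Graph mining extracts patterns from text graphs.",
    "Text graphs support mining of coherent patterns.",
    "Coherent patterns in text graphs guide mining.",
    "The weather was pleasant yesterday.",
    "Mining text graphs reveals patterns across sentences.",
]

edges = sentence_graph(sentences)
components = k_clique_components(edges, k=3)
print(components)
```

In this toy input, the fourth sentence shares no vocabulary with the others and stays outside the graph, while the fifth sentence is linked to the first three even though it is not adjacent to them. Each returned component is a list of sentence indices that can be read as one candidate coherent sub-part.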



Acknowledgments

This work is partly supported by the French Région Basse-Normandie and by the project Hybride (ANR-11-BS02-002), funded by the ANR (French National Research Agency). The authors would also like to thank Pierre-Nicolas Mougel and Christophe Rigotti (LIRIS, Lyon) for making CoHoP Miner available.
