SciELO - Scientific Electronic Library Online



Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 México Apr./Jun. 2013




Graph Mining under Linguistic Constraints for Exploring Large Texts


Minería de grafos bajo restricciones lingüísticas para exploración de textos grandes


Solen Quiniou1, Peggy Cellier2, Thierry Charnois3, and Dominique Legallois4


1 LINA, LUNAM Université de Nantes, Nantes, France

2 IRISA, INSA de Rennes, Rennes, France

3 GREYC, Université de Caen Basse-Normandie, Caen, France and MoDyCO, Université Paris-Ouest Nanterre La Défense, Paris, France

4 CRISCO, Université de Caen Basse-Normandie, Caen, France


Article received on 07/12/2012
Accepted on 11/01/2013.



In this paper, we propose an approach for exploring large texts by highlighting their coherent sub-parts. The exploration method relies on a graph representation of the text built according to Hoey's linguistic model, which allows adjacent and non-adjacent sentences to be selected and bound together. Our main contribution is a method that combines Hoey's linguistic model with a dedicated graph mining technique, called CoHoP mining, to extract coherent sub-parts from the graph representation of the text. Experiments conducted on several English texts show the value of the proposed approach.

Keywords: Text coherence, graph representation, graph mining, Hoey's linguistic model.
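
The general idea described in the abstract (sentences as graph nodes, links between lexically related sentences, dense sub-graphs as coherent sub-parts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the stop-word list, the bonding threshold `MIN_SHARED_WORDS`, the function names, and the sample sentences are all hypothetical. The grouping step is a plain brute-force k-clique percolation, used here as a simplified stand-in; it is not CoHoP mining and does not reproduce CoHoP Miner.

```python
from itertools import combinations
import re

# Hypothetical stop-word list and bonding threshold (not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "on", "it", "for"}
MIN_SHARED_WORDS = 2


def content_words(sentence):
    """Return the set of lowercased alphabetic tokens, minus stop words."""
    return set(re.findall(r"[a-z]+", sentence.lower())) - STOP_WORDS


def build_sentence_graph(sentences, min_shared=MIN_SHARED_WORDS):
    """Link two sentences (adjacent or not) when they share enough words."""
    words = [content_words(s) for s in sentences]
    edges = set()
    for i, j in combinations(range(len(sentences)), 2):
        if len(words[i] & words[j]) >= min_shared:
            edges.add(frozenset((i, j)))
    return edges


def percolate_k_cliques(n_nodes, edges, k=3):
    """Brute-force k-clique percolation: merge k-cliques sharing k-1 nodes."""
    cliques = [
        frozenset(c)
        for c in combinations(range(n_nodes), k)
        if all(frozenset(p) in edges for p in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Made-up sample text; sentence 3 is unrelated to the others.
SENTENCES = [
    "Graph mining extracts patterns from a text graph.",
    "The text graph links sentences that share words.",
    "Mining the text graph reveals coherent sentences.",
    "Bears sleep through the winter.",
    "Coherent sentences in a graph form a clique.",
]

edges = build_sentence_graph(SENTENCES)
print(percolate_k_cliques(len(SENTENCES), edges, k=3))  # [[0, 1, 2, 4]]
```

In this toy run, sentence 4 joins sentences 0 to 2 even though it is not adjacent to them and shares only one word with sentence 0, while the unrelated sentence 3 is left out. The brute-force enumeration is only practical for small inputs; a large text would need a dedicated miner.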



En este artículo se propone un enfoque para la exploración de textos grandes destacando sus sub-partes coherentes. El método de exploración se basa en la representación del texto mediante un grafo de acuerdo con el modelo lingüístico de Hoey, el cual permite la selección y vinculación de frases adyacentes y no adyacentes. La principal aportación de este trabajo es la propuesta de un método basado, por un lado, en el modelo lingüístico de Hoey y, por otro, en una técnica especial de minería de grafos llamada minería CoHoP, con el fin de extraer las sub-partes coherentes de la representación en grafo del texto. Se realizaron experimentos sobre varios textos en inglés que muestran el interés del enfoque propuesto.

Palabras clave: Coherencia de texto, representación con un grafo, minería de grafos, el modelo lingüístico de Hoey.





This work is partly supported by the French Région Basse-Normandie and by the ANR (French National Research Agency) through the funded project Hybride ANR-11-BS02-002. The authors would also like to thank Pierre-Nicolas Mougel and Christophe Rigotti (LIRIS, Lyon) for making CoHoP Miner available.




Creative Commons License: All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.