
Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.2 México Apr./Jun. 2013




A Supervised Approach for Reconstructing Thread Structure in Comments on Blogs and Online News Agencies




Ali Balali¹, Hesham Faili², Masoud Asadpour³, and Mostafa Dehghani⁴


¹ School of ECE, College of Engineering, University of Tehran, Tehran, Iran

² School of ECE, College of Engineering, University of Tehran, Tehran, Iran

³ School of ECE, College of Engineering, University of Tehran, Tehran, Iran

⁴ School of ECE, College of Engineering, University of Tehran, Tehran, Iran


Article received on 07/12/2012
Accepted on 16/01/2013.



Online environments such as forums, chats, and blogs hold a great deal of knowledge. When a single page carries a large volume of comments on different subjects, the actual conversation streams become hard to follow, because the reply structure of the comments is generally not publicly accessible. Automatically reconstructing the thread structure of the comments addresses this problem. This work focuses on reconstructing thread structures in the comment spaces of blogs and online news agencies. First, we define a set of textual and non-textual features. Then we use a learning algorithm to combine the extracted features. The proposed method is evaluated on three datasets, two in Persian and one in English, and its accuracy is compared with that of three baseline algorithms. The results show that the proposed method achieves higher accuracy than the baseline methods on all three datasets.

Keywords: Reconstructing thread structure, reply structure, information extraction, blogs and online news agencies, machine learning, information management.
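
The pipeline summarized in the abstract (extract features for each candidate reply pair, combine them with learned weights, choose a parent for every comment, then measure accuracy) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three features (word overlap, recency, author mention), the hand-set `WEIGHTS` that stand in for a trained model, the `THRESHOLD` below which a comment is treated as a reply to the article itself, and all names such as `Comment` and `reconstruct`.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    id: int
    author: str
    time: float  # minutes since the article was published
    text: str


def tokens(text):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def features(child, cand):
    """Hypothetical features for the pair (child, candidate parent)."""
    a, b = tokens(child.text), tokens(cand.text)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 0.0   # textual feature
    recency = 1.0 / (1.0 + (child.time - cand.time))      # non-textual feature
    mention = 1.0 if cand.author.lower() in a else 0.0    # child names the candidate's author
    return [jaccard, recency, mention]


# Assumed weights; in a supervised setting these would be learned from labeled threads.
WEIGHTS = [1.0, 0.5, 2.0]
# Assumed cutoff: below this score the comment is treated as a reply to the article.
THRESHOLD = 0.5


def score(child, cand):
    return sum(w * f for w, f in zip(WEIGHTS, features(child, cand)))


def reconstruct(comments):
    """Map each comment id to its predicted parent id (None = reply to the article)."""
    ordered = sorted(comments, key=lambda c: c.time)
    parents = {}
    for i, child in enumerate(ordered):
        # Only earlier comments can be parents.
        best = max(ordered[:i], key=lambda c: score(child, c), default=None)
        if best is not None and score(child, best) >= THRESHOLD:
            parents[child.id] = best.id
        else:
            parents[child.id] = None
    return parents


def accuracy(predicted, gold):
    """Fraction of comments whose predicted parent matches the gold parent."""
    return sum(predicted[k] == gold[k] for k in gold) / len(gold)


demo = [
    Comment(1, "sara", 0, "The new transit plan ignores the suburbs"),
    Comment(2, "omid", 5, "Ticket prices are the real problem here"),
    Comment(3, "lena", 9, "sara the suburbs already have bus lines"),
]
print(reconstruct(demo))  # {1: None, 2: None, 3: 1}
```

In this toy thread, comment 3 is attached to comment 1 because it names that comment's author and shares vocabulary with it, while comment 2 falls below the cutoff and is treated as a direct reply to the article.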









All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.