SciELO - Scientific Electronic Library Online

Computación y Sistemas

Online version ISSN 2007-9737 · Print version ISSN 1405-5546

Abstract

SHAH, Sapan; S, Sarath and REDDY, Sreedhar. Similarity Driven Unsupervised Learning for Materials Science Terminology Extraction. Comp. y Sist. [online]. 2019, vol.23, n.3, pp.1005-1013. Epub 09-Aug-2021. ISSN 2007-9737. https://doi.org/10.13053/cys-23-3-3266.

Knowledge of material properties, microstructure, underlying material composition, and the manufacturing process parameters a material has undergone is of significant interest to materials scientists and engineers. A large amount of this information is available in unstructured sources. To retrieve the right information for a given problem, various domain-specific search systems have been developed, and domain terminologies, when available, can significantly improve the quality of such systems. In this paper, we propose a novel similarity-driven learning approach for automatic terminology extraction in the materials science domain. It first uses various intra-domain and inter-domain unsupervised corpus-level features to score and rank candidate terms. For the inter-domain features, we use the British National Corpus (BNC) as the general-purpose corpus. The ranked candidate terms are then used to generate training data for learning a similarity-based scoring function. The parameters of this scoring function are learnt using a Siamese neural network that uses word embeddings learnt from both the domain corpus and the general-purpose corpus to leverage contrasting term features. The proposed similarity-based learning approach consistently outperforms other reported classification approaches on the materials dataset.
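The abstract does not define the corpus features, the network architecture, or the final scoring rule, so the following is only a minimal Python sketch of the overall pipeline under stated assumptions. A smoothed log ratio of domain-corpus to general-corpus frequency stands in for one inter-domain feature. The top-k and bottom-k ranked candidates are treated as pseudo-labelled positive and negative seeds, from which same-class and mixed pairs are generated. A toy Siamese model (one shared diagonal encoder applied to both inputs, over concatenated domain and general-corpus embeddings) is trained on those pairs, and a candidate is scored by its mean similarity to positive seeds minus its mean similarity to negative seeds. Every function name, count, and embedding below is hypothetical and not taken from the paper.

```python
import math
import random
from collections import Counter


def contrastive_score(term, domain_counts, general_counts):
    """Assumed inter-domain feature: log ratio of the term's add-one smoothed
    relative frequency in the domain corpus vs. the general corpus."""
    vocab = len(set(domain_counts) | set(general_counts))
    p_dom = (domain_counts[term] + 1) / (sum(domain_counts.values()) + vocab)
    p_gen = (general_counts[term] + 1) / (sum(general_counts.values()) + vocab)
    return math.log(p_dom / p_gen)


def make_pairs(ranked, k):
    """Treat the top-k ranked terms as positive seeds and the bottom-k as
    negative seeds; label same-class pairs 1 and mixed pairs 0."""
    pos, neg = ranked[:k], ranked[-k:]
    pairs = [(a, b, 1) for g in (pos, neg) for i, a in enumerate(g) for b in g[i + 1:]]
    pairs += [(a, b, 0) for a in pos for b in neg]
    return pos, neg, pairs


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SiameseScorer:
    """Toy Siamese model: one shared diagonal encoder f(x) = w * x is applied
    to both inputs, and P(same class) = sigmoid(f(a) . f(b) + bias)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w = [1.0 + 0.01 * rng.random() for _ in range(dim)]
        self.bias = 0.0

    def prob(self, a, b):
        z = sum(wi * wi * ai * bi for wi, ai, bi in zip(self.w, a, b)) + self.bias
        return sigmoid(z)

    def fit(self, data, epochs=200, lr=0.1):
        # Plain SGD on binary cross-entropy: dLoss/dz = p - y and
        # dz/dw_i = 2 * w_i * a_i * b_i (same weights for both branches).
        for _ in range(epochs):
            for a, b, y in data:
                g = self.prob(a, b) - y
                self.w = [wi - lr * g * 2 * wi * ai * bi
                          for wi, ai, bi in zip(self.w, a, b)]
                self.bias -= lr * g


def term_score(model, vec, pos_vecs, neg_vecs):
    """Assumed scoring rule: mean similarity to positive seeds minus mean
    similarity to negative seeds."""
    sp = sum(model.prob(vec, p) for p in pos_vecs) / len(pos_vecs)
    sn = sum(model.prob(vec, n) for n in neg_vecs) / len(neg_vecs)
    return sp - sn


# Hypothetical toy data: term counts and 2-d embeddings from each corpus.
domain = Counter({"austenite": 40, "grain boundary": 50, "tensile strength": 30,
                  "result": 20, "table": 15, "method": 10})
general = Counter({"grain boundary": 1, "tensile strength": 2,
                   "result": 200, "table": 150, "method": 120})
dom_emb = {"austenite": [1.0, 0.0], "grain boundary": [0.9, 0.0],
           "tensile strength": [0.8, 0.0], "result": [0.0, 0.9],
           "table": [0.0, 0.8], "method": [0.0, 1.0]}
gen_emb = {"austenite": [0.0, 0.2], "grain boundary": [0.0, 0.1],
           "tensile strength": [0.0, 0.3], "result": [0.8, 0.0],
           "table": [1.0, 0.0], "method": [0.9, 0.0]}
emb = {t: dom_emb[t] + gen_emb[t] for t in dom_emb}  # concatenate both views

ranked = sorted(domain, key=lambda t: contrastive_score(t, domain, general), reverse=True)
pos, neg, pairs = make_pairs(ranked, k=2)
model = SiameseScorer(dim=4)
model.fit([(emb[a], emb[b], y) for a, b, y in pairs])
scores = {t: term_score(model, emb[t], [emb[p] for p in pos], [emb[n] for n in neg])
          for t in ranked}
```

In this toy data the domain-like and generic terms occupy disjoint embedding dimensions, so the learned similarity separates them cleanly; real word embeddings would not be this tidy. The diagonal encoder is chosen only to keep the hand-written gradient short; a practical Siamese network would more likely use deeper shared layers trained with a framework that provides automatic differentiation.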

Keywords: Terminology extraction; computational terminology; domain-specific search; natural language processing.
