Breast, Lung and Liver Cancer Classification from Structured and Unstructured Data

González-Beltrán, Beatriz A.; Reyes-Ortiz, José A.; Montelongo-González, Erick E.

doi:10.13053/cys-26-1-4167

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

GONZALEZ-BELTRAN, Beatriz A.; REYES-ORTIZ, José A. and MONTELONGO-GONZALEZ, Erick E. Breast, Lung and Liver Cancer Classification from Structured and Unstructured Data. Comp. y Sist. [online]. 2022, vol.26, n.1, pp.233-243. Epub 08-Aug-2022. ISSN 2007-9737. https://doi.org/10.13053/cys-26-1-4167.

Cancer is currently a worldwide public health problem. Machine learning and deep learning techniques hold great promise in healthcare because they can analyze Electronic Health Records (EHRs), which contain large collections of structured and unstructured data. However, most research has relied on structured data alone, even though valuable information is also found in physicians' plain-text notes. This paper therefore proposes an approach to classify breast, liver, and lung cancer from structured and unstructured data obtained from the MIMIC-II clinical database, using machine learning and deep learning techniques. In particular, the Paragraph Vector algorithm is used as a deep learning approach to text representation. The goal of this work is to help physicians diagnose cancer early. The proposed approach was tested on a balanced dataset of breast, liver, and lung cancer patient records. Both the structured and the unstructured data are pre-processed, and the result is used as input to three machine learning models: Support Vector Machines (SVM), Multilayer Perceptron (MLP), and AdaBoost-SAMME. The scoring metrics for these models are then calculated under different training data configurations in order to choose the best-performing model for classification. Results show that the best-performing model was the MLP, which achieved 89% precision using unstructured data.

Keywords: Cancer classification; structured and unstructured data; deep learning for unstructured data representation; machine learning models; electronic health records.
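
The model-selection step mentioned in the abstract (scoring each model under each training data configuration and keeping the best) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `macro_precision` and `best_configuration` are hypothetical, macro-averaging is only one possible way to summarize precision over the three classes, and the labels are toy values invented here, not data or results from the paper.

```python
from collections import Counter


def macro_precision(y_true, y_pred):
    """Average per-class precision over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)   # how often each class was predicted
    true_pos = Counter()          # how often each prediction was correct
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            true_pos[guess] += 1
    # A class that is never predicted contributes a precision of 0.
    per_class = [
        true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in labels
    ]
    return sum(per_class) / len(per_class)


def best_configuration(y_true, predictions):
    """Return the (model, data configuration) key with the highest score."""
    scores = {
        key: macro_precision(y_true, y_pred)
        for key, y_pred in predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy labels, invented for illustration only (not taken from the paper).
y_true = ["breast", "liver", "lung", "breast", "liver", "lung"]
predictions = {
    ("SVM", "structured"): ["breast", "liver", "liver", "breast", "lung", "lung"],
    ("MLP", "unstructured"): ["breast", "liver", "lung", "liver", "liver", "breast"],
}

best, scores = best_configuration(y_true, predictions)
print(best, round(scores[best], 3))
```

In practice the predictions would come from trained classifiers evaluated on held-out records; the sketch only shows how one score per (model, configuration) pair could be compared.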
