Clustering and topic modeling over tweets: A comparison over a health dataset

Juan Antonio Lossio-Ventura; Juandiego Morzan; Hugo Alatrista-Salas; Tina Hernandez-Boussard; Jiang Bian

doi:10.1109/BIBM47256.2019.8983167

Universidad del Pacífico

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Twitter has become the most popular form of social interaction in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that their performance is negatively affected by the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because the existing applications have not been compared. In this paper, we use internal indexes to evaluate different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We provide this comparison to help health practitioners select the most suitable application for their tasks.
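As an illustration of what an evaluation "based on internal indexes" can look like, the sketch below computes a silhouette coefficient for two candidate groupings of a few tweets. The abstract does not say which internal indexes the paper uses, so the silhouette coefficient, the cosine distance, the function names, and the toy term counts are all assumptions chosen for illustration, not the authors' method or data.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy data: term counts for four tweets over the
# made-up vocabulary [flu, vaccine, diet, sugar].
tweets = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]
good_labels = [0, 0, 1, 1]  # grouping that follows the vocabulary split
bad_labels = [0, 1, 0, 1]   # grouping that mixes the two themes

good = silhouette_score(tweets, good_labels)
bad = silhouette_score(tweets, bad_labels)
print(f"topical grouping: {good:.3f}")
print(f"mixed grouping:   {bad:.3f}")
```

An internal index of this kind needs no gold-standard labels: it scores a grouping only from the distances between the grouped items, which is why it suits comparing tools on unlabeled tweets. On the toy data the grouping that follows the vocabulary split scores higher than the mixed one.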

Original language	English
Title of host publication	Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Editors	Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1544-1547
Number of pages	4
ISBN (Electronic)	978-172811867-3
DOIs	https://doi.org/10.1109/BIBM47256.2019.8983167
State	Published - 1 Nov 2019
Event	Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - Duration: 1 Nov 2019 → …

Publication series

Name	Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

Twitter
clustering
internal cluster indexes
natural language processing
topic modeling

Cite this

Lossio-Ventura, J. A., Morzan, J., Alatrista-Salas, H., Hernandez-Boussard, T., & Bian, J. (2019). Clustering and topic modeling over tweets: A comparison over a health dataset. In I. Yoo, J. Bi, & X. T. Hu (Eds.), Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 (pp. 1544-1547). Article 8983167 (Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BIBM47256.2019.8983167

@inproceedings{5ce9aa004e7249c3a115c9f12297f5b3,
  title = "Clustering and topic modeling over tweets: A comparison over a health dataset",
  abstract = "Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.",
  keywords = "Twitter, clustering, internal cluster indexes, natural language processing, topic modeling",
  author = "Lossio-Ventura, {Juan Antonio} and Juandiego Morzan and Hugo Alatrista-Salas and Tina Hernandez-Boussard and Jiang Bian",
  note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 ; Conference date: 01-11-2019",
  year = "2019",
  month = nov,
  day = "1",
  doi = "10.1109/BIBM47256.2019.8983167",
  language = "English",
  series = "Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "1544--1547",
  editor = "Illhoi Yoo and Jinbo Bi and Hu, {Xiaohua Tony}",
  booktitle = "Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019",
  address = "United States",
}

Lossio-Ventura, JA, Morzan, J, Alatrista-Salas, H, Hernandez-Boussard, T & Bian, J 2019, Clustering and topic modeling over tweets: A comparison over a health dataset. in I Yoo, J Bi & XT Hu (eds), Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019., 8983167, Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019, Institute of Electrical and Electronics Engineers Inc., pp. 1544-1547, Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019, 1/11/19. https://doi.org/10.1109/BIBM47256.2019.8983167

TY - GEN

T1 - Clustering and topic modeling over tweets

T2 - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

AU - Lossio-Ventura, Juan Antonio

AU - Morzan, Juandiego

AU - Alatrista-Salas, Hugo

AU - Hernandez-Boussard, Tina

AU - Bian, Jiang

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.

KW - Twitter

KW - clustering

KW - internal cluster indexes

KW - natural language processing

KW - topic modeling

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85084332074&origin=inward

UR - http://www.scopus.com/inward/record.url?scp=85084332074&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/19b575ec-1d7b-3178-9373-38f25615a22c/

U2 - 10.1109/BIBM47256.2019.8983167

DO - 10.1109/BIBM47256.2019.8983167

M3 - Conference contribution

T3 - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

SP - 1544

EP - 1547

BT - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

A2 - Yoo, Illhoi

A2 - Bi, Jinbo

A2 - Hu, Xiaohua Tony

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 1 November 2019

ER -

Lossio-Ventura JA, Morzan J, Alatrista-Salas H, Hernandez-Boussard T, Bian J. Clustering and topic modeling over tweets: A comparison over a health dataset. In Yoo I, Bi J, Hu XT, editors, Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 1544-1547. 8983167. (Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019). doi: 10.1109/BIBM47256.2019.8983167
