About this project

The global emergence of the novel SARS-CoV-2 virus has generated demand for rapid research into the virus and the disease it causes, COVID-19. However, the literature on coronaviruses such as SARS-CoV-2 is vast and difficult to sift through. This website showcases a way to organize existing literature on coronaviruses, other pandemics, and early research on the current COVID-19 outbreak, in response to the call to action issued by the White House Office of Science and Technology Policy (Science and Policy, 2020) and posted on the Semantic Scholar (Scholar, 2020) and Kaggle (2020) websites. We augment the original dataset posted on those sites with articles drawn from other databases to make our final interactive organizational structure more robust for researchers.

Our primary goal is to create a framework for topic-based search of the papers in this dataset that is helpful to those investigating the novel coronavirus, SARS-CoV-2, and the global COVID-19 pandemic. To discover the latest topics present in the collection of scholarly articles, and to organize them into a hierarchical tree structure that allows for interactive search, we use a modified hierarchical nonnegative matrix factorization (HNMF) approach.
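To give a rough sense of how a hierarchical NMF can produce a topic tree, here is a minimal sketch in Python. It is not the modified HNMF used for this project: it applies plain NMF with multiplicative updates, assigns each document to its strongest topic, and recurses on each group. The function names (nmf, build_topic_tree), the parameters, and the toy term-count matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0):
    """Approximate nonnegative X (docs x terms) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + 1e-3  # document-topic weights
    H = rng.random((k, X.shape[1])) + 1e-3  # topic-term weights
    eps = 1e-10                             # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def build_topic_tree(X, vocab, doc_ids, k=2, depth=2, top_n=3):
    """Recursively split documents into k topics; returns a nested dict."""
    node = {"docs": list(doc_ids), "children": []}
    if depth == 0 or len(doc_ids) < k:
        return node  # leaf: too few documents or depth limit reached
    W, H = nmf(X, k)
    labels = W.argmax(axis=1)  # assign each document to its strongest topic
    for t in range(k):
        mask = labels == t
        if not mask.any():
            continue  # skip topics that attracted no documents
        sub_ids = [d for d, m in zip(doc_ids, mask) if m]
        child = build_topic_tree(X[mask], vocab, sub_ids, k, depth - 1, top_n)
        top_terms = np.argsort(H[t])[::-1][:top_n]  # highest-weight terms
        child["keywords"] = [vocab[i] for i in top_terms]
        node["children"].append(child)
    return node

# Toy data (made up): 6 documents, 6 terms, two rough themes.
vocab = ["vaccine", "antibody", "immune", "transmission", "mask", "airborne"]
X = np.array([
    [5, 3, 4, 0, 0, 1],
    [4, 4, 2, 1, 0, 0],
    [3, 5, 5, 0, 1, 0],
    [0, 1, 0, 5, 4, 3],
    [1, 0, 0, 4, 5, 4],
    [0, 0, 1, 3, 4, 5],
], dtype=float)
doc_ids = ["paper_%d" % i for i in range(len(X))]

tree = build_topic_tree(X, vocab, doc_ids)
for child in tree["children"]:
    print(child["keywords"], child["docs"])
```

Each node of the resulting tree carries its top keywords and the documents assigned to it, which is the kind of structure a keyword-driven browsing interface can walk from the root down to individual papers.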

We use this hierarchical organization of the papers to create a website that allows users to walk through the topic tree based on the top keywords associated with each topic.

A full corpus data file can be downloaded here.