Comparing methods to extract technical content for technological intelligence

Title: Comparing methods to extract technical content for technological intelligence
Format: Conference
Publication Date: November 2012
Description: We are developing indicators for the emergence of science and technology (S&T) topics. We are targeting various S&T information resources, including metadata (i.e., bibliographic information) and full text. We explore alternative text analysis approaches, namely principal components analysis (PCA) and topic modeling, to extract technical topic information. We analyze the topical content to pursue potential applications and innovation pathways. In this presentation we compare alternative ways of consolidating messy sets of key terms [e.g., using Natural Language Processing (NLP) on abstracts and titles, together with various keyword sets]. Our process includes combinations of stopword removal, fuzzy term matching, association rules, and tf-idf weighting. We compare PCA results to topic modeling results. Our key test set consists of 4104 Web of Science records on Dye-Sensitized Solar Cells (DSSCs). Results suggest good potential to enhance our technical intelligence payoffs from database searches on topics of interest. © 2012 IEEE.
Ivan Allen College Contributors:
Citation: 1279 - 1285.
Related Departments:
  • School of Public Policy
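
Illustrative sketch (not from the paper): the description names stopword removal, fuzzy term matching, and tf-idf weighting as steps for consolidating messy key terms. The Python below shows one minimal way those three steps could fit together, using only the standard library. The stopword list, the 0.85 similarity threshold, the function names, and the three toy records are all assumptions made for illustration; the authors' actual tools and settings are not given in this record, and the association-rule, PCA, and topic-modeling stages are not covered here.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical stopword list; a real pipeline would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:()[]").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def consolidate(terms, threshold=0.85):
    """Map each term to the first earlier term it closely resembles.

    Similarity is difflib's ratio; the threshold is an assumed value.
    """
    canonical = []
    mapping = {}
    for term in terms:
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, term, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(term)
            match = term
        mapping[term] = match
    return mapping


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy records standing in for titles; not drawn from the DSSC test set.
records = [
    "Dye-sensitized solar cells with TiO2 electrodes",
    "Dye sensitised solar cell efficiency",
    "Electrolytes for solar cells",
]
tokens = [tokenize(r) for r in records]
mapping = consolidate([t for doc in tokens for t in doc])
docs = [[mapping[t] for t in doc] for doc in tokens]
weights = tf_idf(docs)
```

In this sketch "cell" is folded into "cells" by the fuzzy match, while "electrolytes" and "electrodes" stay separate because their similarity falls below the assumed threshold. A term that occurs in every record, such as "solar", receives a tf-idf weight of zero, which is the property that lets the weighting demote terms that do not discriminate between records.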