Using the Wayback Machine to mine websites in the social sciences: A methodological resource

Title: Using the Wayback Machine to mine websites in the social sciences: A methodological resource
Format: Journal Article
Publication Date: August 2016
Published In: Journal of the Association for Information Science and Technology
Description: © 2015 The Authors. Journal of the Association for Information Science and Technology published by Wiley Periodicals, Inc. on behalf of ASIS&T. Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
Ivan Allen College Contributors:
Citation: Journal of the Association for Information Science and Technology, 67(8), 1904-1915. ISSN 2330-1635. DOI 10.1002/asi.23503.
Related Departments:
  • School of Public Policy