Sergio Pelaez
Ph.D. Student
Overview
I am a Ph.D. candidate in Public Policy, specializing in Science and Technology Policy at Georgia Tech. My research investigates how policy, incentives, and social networks influence the scientific and technological outcomes of researchers and inventors, as well as the societal impacts of such activities, particularly in the fields of artificial intelligence and nanotechnology. A significant aspect of my work involves implementing state-of-the-art language models to extract and analyze value promises from patent text and examining their role in patent drafting, prosecution, valuation, and technological orientation.
I have an additional research agenda in the economics of innovation, focusing on the factors that drive business innovation and its impacts, for example how market competition, taxation, and entrepreneurial ecosystems influence firms’ innovation activities. My work has also explored the effects of business innovation on the exports of knowledge-intensive business services (KIBS) and the gender disparities within KIBS sectors. I have also investigated how productive development policies contribute to increased tax collection.
Education
- MA Economics of Public Policy, Universidad del Rosario
- BA Economics, Universidad Autonoma de Manizales
- BA Business, Universidad Autonoma de Manizales
Interests
- Applied Microeconomics
- Development Economics
- Industrial Organization
- Program Evaluation, Public Management and Administration
- Science, Technology, and Innovation Policy
- Antitrust Law and Policy
- Innovation
- Intellectual Property Law
- Science and Technology
- Technology and Innovation
Publications
All Publications
Journal Articles
- Large-scale text analysis using generative language models: A case study in discovering public value expressions in AI patents
In: Quantitative Science Studies [Peer Reviewed]
Date: March 2024
We put forward a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using text (5.4 million sentences) for 154,934 US AI patent documents from the United States Patent and Trademark Office (USPTO), we design a semi-automated, human-supervised framework for identifying and labeling public value expressions in these sentences. A GPT-4 prompt is developed that includes definitions, guidelines, examples, and rationales for text classification. We evaluate the labels and rationales produced by GPT-4 using BLEU scores and topic modeling, finding that they are accurate, diverse, and faithful. GPT-4 achieved an advanced recognition of public value expressions from our framework, which it also uses to discover unseen public value expressions. The GPT-produced labels are used to train BERT-based classifiers and predict sentences on the entire database, achieving high F1 scores for the 3-class (0.85) and 2-class classification (0.91) tasks. We discuss the implications of our approach for conducting large-scale text analyses with complex and abstract concepts. With careful framework design and interactive human oversight, we suggest that generative language models can offer significant assistance in producing labels and rationales.
- Analyzing research outcomes and spillovers at a US nanotechnology user facility
In: Journal of Nanoparticle Research [Peer Reviewed]
Date: November 2022
This paper maps research outcomes and identifies spillover effects at a US University Research Center (URC) that offers user facilities for nanotechnology research. We use scientometric and network science approaches to analyze measures of topical orientation, productivity, impact, and collaboration applied to URC-related Web of Science publication records. We focus on the analysis of spillover effects on external organizations (i.e., non-affiliated users). Our findings suggest the URC’s network relies on external organizations acting as brokers that provide access to the facilities to other external organizations. Analysis of heterophily indicates that collaboration among internal and external organizations is enhanced by the facilities, while articles written by a mix of co-authors affiliated with internal and external organizations are likely to be more cited. These results provide insights on how URCs with user facilities can create conditions for diverse collaboration and greater research impact.
- Taxation and innovation: evidence from Colombia
In: Economics of Innovation and New Technology [Peer Reviewed]
Date: November 2022
We use firm-level data from a Colombian manufacturing survey, complemented with data from the tax department, to test the effect of firms’ total tax and contribution rate (TCR) on the ratio of innovation expenditures to sales. We construct a data panel from 2003 to 2018 comprising 104,762 observations and implement fixed effects and instrumental variables estimation methods. Our results suggest that an increase of one percentage point in direct taxation leads to a decrease of 0.10% in the probability that firms engage in innovation investments, and market power moderates this effect. We discuss distinctive features of the effect of taxation on innovation in emerging economies—one being the inability of local innovation clusters to temper it. Policy implications include considering modifications to the magnitude and composition of the TCR as an alternative to R&D tax credits.
Working Papers
- Do Societal Promises Influence Patent Value? An Analysis of Inventions in Artificial Intelligence
In: SSRN
Date: June 2024
- Exporting Knowledge-Intensive Business Services (KIBS): Innovation and Complementary Factors
In: SSRN
Date: April 2024
This paper investigates the impact of innovation on the export performance of Knowledge-Intensive Business Services (KIBS) firms. It contrasts two competing hypotheses: the diminishing returns hypothesis, which predicts a positive but weaker effect of innovation on exporting for KIBS firms than for traditional services; and the complementarity hypothesis, which predicts a positive and stronger effect for KIBS firms, based on the presence of complementary factors that enhance their innovation capabilities. To test these hypotheses, the paper applies a three-stage structural model (i.e., the CDM model) to a large-scale firm-level innovation survey in Colombia, covering 25,996 observations from 2014 to 2019. The empirical results support the complementarity hypothesis, showing that innovation increases the likelihood of exporting for KIBS firms more than for traditional services. The paper also identifies the main complementary factors that boost the innovation performance of KIBS firms, such as superior management practices, higher investment in information and communication technologies for innovation purposes, and stronger commitment to employee training in innovation-related tasks. Based on these findings, the paper discusses how innovation policy can facilitate the entry of KIBS firms into foreign markets, as a key element of a modern productive development strategy.
- The Gradual Impact of Sanctioning Cartels on Market Competition: Evidence from the Colombian Manufacturing Sector
In: Research Square
Date: September 2023
This paper investigates how fines on hard-core cartels affect market competition in Colombia. We use data from the national department of statistics and the competition agency, covering 10,316 firms in the manufacturing sector from 2012 to 2020 in a panel of 67,671 observations. We measure market competition by the complement of the Lerner index and use an indicator variable to account for the variation in the timing and sectors of the fines. We apply a difference-in-differences (DID) approach with multiple periods based on Callaway and Sant’Anna (2021) and perform robustness tests with Gardner (2021) and by incorporating anticipation effects. Our results show that the fines had a positive and gradual impact on market competition, implying that they deterred cartel behavior but also that some tacit collusion and price inertia persisted over time.
- Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents
In: arXiv
Date: May 2023
Labeling data is essential for training text classifiers but is often difficult to accomplish accurately, especially for complex and abstract concepts. Seeking an improved method, this paper employs a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. We apply this approach to the task of discovering public value expressions in US AI patents. We collect a database comprising 154,934 patent documents using an advanced Boolean query submitted to InnovationQ+. The results are merged with full patent text from the USPTO, resulting in 5.4 million sentences. We design a framework for identifying and labeling public value expressions in these AI patent sentences. A prompt for GPT-4 is developed which includes definitions, guidelines, examples, and rationales for text classification. We evaluate the quality of the labels and rationales produced by GPT-4 using BLEU scores and topic modeling and find that they are accurate, diverse, and faithful. These rationales also serve as a chain-of-thought for the model, a transparent mechanism for human verification, and support for human annotators to overcome cognitive limitations. We conclude that GPT-4 achieved a high level of recognition of public value theory from our framework, which it also uses to discover unseen public value expressions. We use the labels produced by GPT-4 to train BERT-based classifiers and predict sentences on the entire database, achieving high F1 scores for the 3-class (0.85) and 2-class (0.91) classification tasks. We discuss the implications of our approach for conducting large-scale text analyses with complex and abstract concepts and suggest that, with careful framework design and interactive human oversight, generative language models can offer significant advantages in quality and in reduced time and costs for producing labels and rationales.
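For readers unfamiliar with the F1 metric reported in the abstract above, the following is a minimal sketch of how a multi-class F1 score can be computed from true and predicted labels. It is an illustration only, not the paper's evaluation code: the integer class labels, the toy data, the function names `f1_per_class` and `macro_f1`, and the choice of macro averaging are all assumptions, since the abstract does not state the averaging scheme.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the truth but was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels for a 3-class task (classes 0, 1, 2 are placeholders).
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 1, 1, 0, 2]

if __name__ == "__main__":
    print(f1_per_class(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, which matters when some classes are rare; a weighted or micro average would give different numbers on imbalanced data.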