Text as Data

PhD

Published

September 10, 2022


Resources

Syllabus

Course Description

The course is aimed at doctoral students and teaches current textual analysis methods used in Accounting, Management, and Finance research. It introduces a framework and a tool set that enable researchers to use text data to measure latent concepts that were previously hard to measure.

The course is roughly divided into three parts. The first, shorter part introduces a modern framework for drawing inferences from data. It covers basic concepts and shows how to use directed acyclic graphs (DAGs) to derive research designs. The second part introduces textual analysis using a framework that divides textual analysis (or any measure generation) into two connected steps: quantification and mapping. The third part introduces generative AI (GenAI) as a tool for creating measures of important concepts. The advent of GenAI applications has opened new possibilities for deriving concept representations, but it also poses several challenges. We will discuss and test-run the use of GenAI on several examples.

Quantification concerns converting text into a machine-readable numerical form, such as the bag-of-words representation. Mapping encompasses methods, such as word lists and supervised or unsupervised learning, that turn these numerical representations into the measure of interest.
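As a rough illustration of the two steps, the following minimal sketch quantifies a sentence into a bag-of-words and then maps it to a score with a word list. The word list, the example sentence, and the function names `quantify` and `map_to_measure` are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import re
from collections import Counter

# Hypothetical word list, for illustration only (not from the course material).
NEGATIVE_WORDS = {"loss", "decline", "risk"}


def quantify(text: str) -> Counter:
    """Quantification: turn raw text into a bag-of-words (token counts)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def map_to_measure(bow: Counter, word_list: set[str]) -> float:
    """Mapping: share of all tokens that appear in the word list."""
    total = sum(bow.values())
    if total == 0:
        return 0.0
    hits = sum(count for word, count in bow.items() if word in word_list)
    return hits / total


# Made-up example sentence.
doc = "The loss widened and the decline in sales raises risk."
bow = quantify(doc)                          # step 1: quantification
score = map_to_measure(bow, NEGATIVE_WORDS)  # step 2: mapping
print(f"negative-tone share: {score:.2f}")
```

Keeping the two steps in separate functions mirrors the framework: the same bag-of-words could instead be passed to a supervised or unsupervised mapping without changing the quantification step.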

Participants will be introduced to commonly applied approaches for both steps and will learn to reason about which approaches are advisable given the text at hand and the concept to be measured. We will see multiple examples of how the concept to be measured influences certain texts and suggests particular quantification and mapping steps.

Learning Goals

  1. Develop a Comprehensive Understanding of Inference and Measurement in Empirical Research. Students will learn to critically evaluate the theoretical underpinnings of causal inference and prediction, utilizing Directed Acyclic Graphs (DAGs) to visualize and reason through complex relationships among variables, confounders, moderators, and mediators. Students will demonstrate proficiency in diagnosing measurement errors and their impacts on empirical identification and inference.
  2. Master Advanced Textual Analysis Techniques for Empirical Measurement. Students will acquire the skills necessary to quantify textual data into numerical representations and effectively map these into meaningful empirical constructs. They will proficiently apply and critically evaluate methodologies such as dictionary-based approaches, document similarity measures, supervised and unsupervised machine learning techniques, and pre-trained language models like BERT to empirically measure complex concepts such as sentiment, economic uncertainty, and competition.
  3. Critically Assess and Implement LLM Methods in Textual Analysis. Students will gain expertise in leveraging large language models (LLMs) as an advanced methodological tool for textual analysis. They will analyze and articulate the methodological strengths, limitations, and potential biases inherent in LLM-based measurement techniques, demonstrating the capability to apply LLMs responsibly and effectively to quantify complex concepts in novel research contexts.