The emergence of a new field: topics and new categories (Topic Models)
Following our earlier session on quantitative textual analysis, Sophie Mützel will introduce us to applications of topic models. Originally developed in computer science, machine learning, and natural language processing, topic models have recently attracted growing attention in the digital humanities and in the social sciences. Topic model algorithms rest on a probabilistic model of how words are distributed across documents: they group words that tend to occur together into "topics" without a priori coding by analysts. Topic modeling can thus be used as a tool to discover themes in large collections of documents. Sophie's work analyses the emergence of a new field, innovative breast cancer therapeutics, that has been evolving since the late 1980s. She applies topic models to show thematic trajectories over 22 years using several large textual corpora: scientific discussions (Web of Science data), industry and financial analysts, newspapers, and wire services (all from LexisNexis). Her talk will present some results of this work and will also focus on practical aspects of running topic models and, more generally, on modelling textual data computationally.
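For anyone who wants a feel for the mechanics before the session, here is a minimal sketch of one common topic model, latent Dirichlet allocation (the model discussed in the Blei article listed under Resources), fitted by collapsed Gibbs sampling. It is an illustration only, not the pipeline used in the talk: the function name `fit_lda`, the parameter values, and the tiny corpus are all invented for this example, and real work would normally rely on an established library and far larger corpora.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words).

    alpha and beta are symmetric Dirichlet hyperparameters; the defaults are
    arbitrary choices for this toy example.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # words in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # words assigned to topic k overall
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic for this
                # occurrence: (topic share in the document) x (word share in the topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    return topic_word, doc_topic


# Invented toy corpus (not taken from the corpora described above).
docs = [
    "trial patient therapy response trial therapy".split(),
    "patient therapy antibody trial response".split(),
    "shares investor market analyst shares market".split(),
    "market investor analyst shares forecast".split(),
]

topic_word, doc_topic = fit_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(4) if c > 0]
    print(f"topic {k}:", ", ".join(top_words))
```

No labels are supplied anywhere: the sampler sees only which words appear in which documents, and the "topics" are whatever word groupings the counts settle into. Results vary with the random seed and the hyperparameters, which is one of the practical issues in running topic models.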
Resources:
Blei, David M. (2012) "Probabilistic Topic Models", Communications of the ACM 55(4): 77-84.