Chan Young Park
PhD Candidate
Carnegie Mellon University
Chan Park is a PhD candidate in the School of Computer Science at Carnegie Mellon University, advised by Professor Yulia Tsvetkov. She earned her MS and BS in CS from Yonsei University, Korea, and is currently a visiting PhD student at the University of Washington.
Her research lies at the intersection of natural language processing, computational social science, and AI ethics, with a focus on making language technologies more effective, fair, and adaptable for diverse populations. She is passionate about using NLP for social good and is dedicated to developing fair and accessible language technologies.
Her work has been published in top conferences and journals, including PNAS, ACL, EMNLP, WWW, and ICWSM. She was selected as a University of Chicago Rising Star in Data Science and received the 2023 Wikimedia Foundation Research Award of the Year for studying social biases on Wikipedia. Chan is also a recipient of the Korea Foundation for Advanced Studies PhD fellowship.
Society and culture play a significant role in shaping an individual's identity. Language, as a tool for cultural and psychological expression, thus reflects the individual's sociocultural background. However, most current language technologies do not explicitly consider sociocultural contexts. This can result in less effective and potentially biased models, as they may overemphasize dominant cultures and social groups. On the other hand, the fields that study sociocultural knowledge, including social science and sociolinguistics, are not making the most of state-of-the-art NLP models either. Despite these models' enormous potential and remarkable recent advances, social science researchers still rely on relatively primitive techniques, such as lexicon-based approaches and simple unsupervised methods like topic models.
In this thesis, I aim to bridge the gap between NLP and sociocultural knowledge and demonstrate how they can mutually benefit each other. In Part 1, I present a series of works demonstrating how NLP models can be improved by incorporating sociocultural context. Specifically, I explore the role of cultural and social community contexts in transfer language selection and norm violation detection models. In Part 2, I show how we can leverage advanced language technologies to study three sociocultural phenomena: social bias, social movements, and information campaigns. For each study, I identify the obstacles that prevent state-of-the-art NLP models from being used to study the phenomenon and provide solutions that make these models practical and effective. Ultimately, this thesis aims to demonstrate the vital connection between NLP and sociocultural knowledge.