Calibrated Information Extraction from Coastal Ecosystems Literature
About
A large portion of the data on freshwater and coastal ecosystems exists within the text, tables, and figures of PDF research papers. Generative AI is increasingly used to extract such data, but it is prone to high-risk inaccuracies (e.g., 'hallucinations'). We propose to overcome this drawback with a novel technique: calibrated information extraction. We develop mechanistic interpretability tools that probe an LLM's internal activation patterns and produce a confidence score for each extracted data point. In turn, we show that strong calibration of these scores suggests a path toward reliably supporting ecological research through downstream statistical models and analyses.
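For readers unfamiliar with the term, a set of confidence scores is calibrated when, among extractions scored near 0.9, roughly 90% turn out to be correct. One common way to measure this is the expected calibration error (ECE). The sketch below is a minimal, generic illustration and is not taken from the talk: the function name, the equal-width binning, and the example values are all assumptions, and the speaker's own evaluation may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: scores in [0, 1], one per extracted data point (hypothetical).
    correct: booleans, True if the extracted value matched the ground truth.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one scored extraction is required")

    # Group extractions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for score, ok in zip(confidences, correct):
        index = min(int(score * n_bins), n_bins - 1)  # score 1.0 goes in the last bin
        bins[index].append((score, ok))

    # Sum each bin's |mean confidence - accuracy|, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(score for score, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(mean_confidence - accuracy)
    return ece


# Made-up example: ten extractions, all scored 0.9.
scores = [0.9] * 10
calibrated = expected_calibration_error(scores, [True] * 9 + [False])
overconfident = expected_calibration_error(scores, [True] * 5 + [False] * 5)
```

An ECE near zero means the scores can be read as approximate probabilities of correctness, which is what would allow a downstream statistical model to weight or filter extracted values by confidence.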
Speaker

Kevin Quinn
Kevin is a PhD student at Boston University working with Professor Mark Crovella and Professor Evimaria Terzi. Kevin's research focuses on the design and application of interpretable machine learning models, specifically for unsupervised clustering problems. He previously completed a BA in mathematics and computer science at BU.