Calibrated Information Extraction from Coastal Ecosystems Literature

About

A large portion of the data on freshwater and coastal ecosystems exists within the text, tables, and figures of PDF research papers. Generative AI is increasingly used to extract such data, but it is prone to high-risk inaccuracies (e.g., 'hallucinations'). We propose to surmount this drawback through a novel technique: calibrated information extraction. We develop mechanistic interpretability tools that probe an LLM's internal activation patterns and produce confidence scores for extracted data points. In turn, we show that strong calibration of these scores suggests a path toward reliably using the extracted data in downstream statistical models and analyses for ecological research.

Speaker

Kevin Quinn

Event Details

Evaluating Language Model Responses to Mental Health Symptom Disclosures & Survey of Predictive Recursive Algorithms for Inference

CDS PhD Student Lightning Talk Competition