Automated data science | Lengerich Lab

Data science has traditionally required extensive domain expertise and arduous human effort to design targeted experiments that isolate effects in observational data. We are building AI tools to automate and streamline this work.

References

2023

Data Science with LLMs and Interpretable Models

Sebastian Bordt, Ben Lengerich, Harsha Nori, and Rich Caruana

AAAI Explainable AI for Science, 26–28 Aug 2023

Recent years have seen important advances in the building of interpretable models, machine learning models that are designed to be easily understood by humans. In this work, we show that large language models (LLMs) are remarkably good at working with interpretable models, too. In particular, we show that LLMs can describe, interpret, and debug Generalized Additive Models (GAMs). Combining the flexibility of LLMs with the breadth of statistical patterns accurately described by GAMs enables dataset summarization, question answering, and model critique. LLMs can also improve the interaction between domain experts and interpretable models, and generate hypotheses about the underlying phenomenon. We release TalkToEBM as an open-source LLM-GAM interface.
@article{bordt2024data,
  author = {Bordt, Sebastian and Lengerich, Ben and Nori, Harsha and Caruana, Rich},
  title = {Data Science with LLMs and Interpretable Models},
  journal = {AAAI Explainable AI for Science},
  year = {2023},
  informal_venue = {AAAI XAI4Sci},
  keywords = {Interpretable, LLMs},
}

LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

Benjamin J. Lengerich, Sebastian Bordt, Harsha Nori, Mark E. Nunnally, Yin Aphinyanaphongs, Manolis Kellis, and Rich Caruana

26–28 Aug 2023

We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package TalkToEBM as an open-source LLM-GAM interface.
@article{lengerich2023llms,
  author = {Lengerich, Benjamin J. and Bordt, Sebastian and Nori, Harsha and Nunnally, Mark E. and Aphinyanaphongs, Yin and Kellis, Manolis and Caruana, Rich},
  title = {LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs},
  year = {2023},
}