Language Model Interpretability - Explainable AI Methods: Exploring explainable AI methods for interpreting and explaining the decisions made by language models to enhance transparency and trustworthiness

Srihari Maruthi; Sarath Babu Dodda; Ramswaroop Reddy Yellu; Praveen Thuniki; Surendranadha Reddy Byrapu Reddy

Authors

Srihari Maruthi, University of New Haven, West Haven, CT, United States
Sarath Babu Dodda, Central Michigan University, MI, United States
Ramswaroop Reddy Yellu, Independent Researcher, USA
Praveen Thuniki, Independent Researcher & Program Analyst, Georgia, United States
Surendranadha Reddy Byrapu Reddy, Sr. Data Architect at Lincoln Financial Group, Greensboro, NC, United States

Keywords:

Language models, Explainable AI, Interpretability, Transparency, Trustworthiness

Abstract

Language models have achieved remarkable success across natural language processing tasks, but their internal workings are complex and opaque, which raises concerns about their reliability and ethical implications. Explainable AI (XAI) methods aim to address this opacity by providing insight into how language models arrive at their decisions. This paper presents a comprehensive review of XAI methods for interpreting and explaining the decisions made by language models. We discuss key approaches, including attention mechanisms, saliency maps, and model-agnostic techniques, and highlight their strengths and limitations. We also explore the implications of XAI for enhancing the transparency and trustworthiness of language models in real-world applications.
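To make the idea of a model-agnostic explanation concrete, the following is a minimal sketch of leave-one-out (occlusion) token importance, one simple way a saliency-style attribution might be computed. It is an illustration under stated assumptions and not a method taken from the paper: `toy_sentiment_score` is a hypothetical stand-in for a language model's output score, its word weights are invented for the example, and `occlusion_saliency` is an illustrative helper name.

```python
from typing import Callable, List, Tuple


def toy_sentiment_score(tokens: List[str]) -> float:
    """Hypothetical stand-in for a language model's output score.

    Sums made-up word weights; a real model would replace this function.
    """
    weights = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}
    return sum(weights.get(t.lower(), 0.0) for t in tokens)


def occlusion_saliency(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out importance: the score drop when each token is removed.

    Model-agnostic: only calls score_fn, never inspects model internals.
    """
    baseline = score_fn(tokens)
    saliency = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # input with token i removed
        saliency.append((tok, baseline - score_fn(occluded)))
    return saliency


if __name__ == "__main__":
    sentence = "The plot was great but the ending was bad".split()
    for token, importance in occlusion_saliency(sentence, toy_sentiment_score):
        print(f"{token:>8s}  {importance:+.1f}")
```

Because the helper treats the scoring function as a black box, the same loop could in principle be pointed at any model that maps a token list to a number; the cost is one extra model call per token.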
