Trustworthy Artificial Intelligence

Mihir Mehta
Pennsylvania State University
Mihir is a doctoral student and an instructor for an undergraduate engineering analytics course in the Industrial and Manufacturing Engineering Department at Pennsylvania State University. His research is focused on developing TAI-based healthcare applications. He holds a master’s degree in statistics and has worked in the industry for over seven years in data science-driven roles. He also serves as an editorial staff writer for OR/MS Tomorrow.

The COVID-19 pandemic has accelerated the adoption of digital technologies across global organizations, creating an unprecedented amount of digital data at our disposal [mck]. This data has huge potential to help us make better decisions and improve our lives. Thanks to enhanced computing and storage devices and efficient, easy-to-use software, we are deploying more and more artificial intelligence (AI) based applications. These applications have shown immense potential to solve complex problems, ranging from predicting the 3-D structure of proteins within a few hours with “AlphaFold-2” [8] to beating a Go master with “AlphaGo” [4].

Our society exhibits disparities in different forms, including social [24, 29], economic [43], and health [22] disparities. The COVID-19 pandemic has widened existing disparities, or made them more prominent, in both structured and unstructured data [50]. AI-based applications have been used to combat some of the challenges related to these disparities. For example, AI technologies have improved healthcare access in resource-poor settings [58]. However, the application of these technologies is not without unintended consequences. We have a responsibility to balance the overall performance of AI applications with their ethical and socially responsible usage. At times, AI-based applications have aggravated racial bias and discrimination (e.g., in image data) [opi, bea], gender imbalance (e.g., in medical imaging) [35], comprehension difficulties (e.g., with electronic health records) [60], and poor generalization (e.g., under imaging transformations) [16]. These ill effects have raised multiple questions about the trustworthiness of AI applications and created significant barriers to leveraging them for societal welfare.

As digitalization continues to influence our societal growth and future, we need to make conscious efforts to develop socially responsible and ethical AI-based applications. This effort will minimize the post-implementation adverse effects and assist us in carving a path for an equitable, diverse, and inclusive society. Concurrently, trust in AI applications will be enhanced, facilitating their continued use in future societal welfare. Recent advances in the Trustworthy AI (TAI) domain are focused on meeting these objectives through appropriate trade-off strategies in relation to overall performance.

An AI-based decision-making system takes a data-driven learning path. However, this data is nothing but a mirror reflecting our present. As a result, it reflects the existing disparities, biases, and injustices present in society. Hence, TAI strongly recommends making socio-technically justified adjustments to this data, and to the algorithms based on it, while evaluating their performance. With a multi-faceted contextual and causal approach, TAI helps design safe, secure, unbiased, fair, robust, explainable, accountable, and dynamic AI applications that perform well.

This article provides an introductory overview of the TAI domain. It begins with an overview of the guidelines and recommendations for developing TAI applications. Then, it covers the motivations behind each of TAI’s multiple attributes and lists key approaches taken to satisfy the corresponding requirements. Afterward, it describes important research trends in verifiable claims, which are vital for demonstrating and quantifying trust in deployed AI applications. Finally, it concludes with a summary.

The ethical and responsible development of AI serves as the foundation for developing and deploying TAI-driven applications. TAI is based on the five principles of ‘Beneficence’ (sustainable and inclusive goodness), ‘Non-Maleficence’ (minimizing ill effects), ‘Autonomy’ (consideration of human choices), ‘Justice’ (ensuring impartiality), and ‘Explicability’ (transparency and responsibility discussions) [26]. TAI discussions are engaged from multiple perspectives. A few examples of these perspectives are:

  1. Academia-focused discussions of trustworthy machine learning technologies [55]
  2. Governance guidelines to shape Europe’s digital future [eur]
  3. Google’s explainability and trust discussions, which provide an industrial perspective [goo]
  4. Verifiable claims-based developmental recommendations through AI developer lenses [20]

All these perspectives emphasize a few key ideas:

  1. Data investigation before feeding to the algorithm such as identifying Simpson’s paradox in social data [14]
  2. A consistent and impartial algorithmic methodology like a convex framework for fair regression [19]
  3. Documentation-based transparency and verifiability enhancement like ABOUT ML (bench-marking of machine-learning lifecycles) [49]

  4. Promoting community-level collaborative AI development tools and practices such as ‘Model cards’ [42] and ‘Datasheets’ [28]
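Data investigation before modeling can be made concrete with Simpson’s paradox [14]: a trend present in every subgroup can reverse after aggregation. The sketch below reproduces the effect using the well-known kidney-stone treatment figures; the numbers serve only to exhibit the reversal and are not drawn from any reference in this article.

```python
# Simpson's paradox: treatment A beats B within every subgroup,
# yet B looks better once the subgroups are aggregated.

def success_rate(successes, total):
    return successes / total

# Subgroup data: (successes, trials) for treatments A and B.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

# Within each subgroup, A outperforms B.
for name, g in groups.items():
    assert success_rate(*g["A"]) > success_rate(*g["B"]), name

# Aggregation hides the subgroup structure and reverses the ranking.
agg_a = success_rate(81 + 192, 87 + 263)   # overall rate for A
agg_b = success_rate(234 + 55, 270 + 80)   # overall rate for B
assert agg_b > agg_a  # the paradox: B appears better overall
```

Checks like this, run per protected subgroup before training, flag datasets where an aggregate model would learn the reversed trend.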


Figure: Trustworthy AI: An Overview

Trustworthy AI Attributes
The TAI domain focuses on formulating specific methodologies to realize explainable, unbiased, fair, consistent, robust, secure, privacy-preserving AI algorithms. These algorithms are further augmented through causality and context-specific considerations. The overall deployment process manages trade-offs strategies between the overall algorithm performance and each of these considerations. With the development of verifiable claims, the overall TAI deployment process demonstrates and quantifies its trustworthiness.


Context

  • Why: An AI-based application is a socio-technical toolkit used to attain a specific societal welfare objective. As a result, its development process should account for the underlying social and professional context. The context helps decide the required level of compliance rigor. By accounting for the degree of criticality of the objective and the involved stakeholders, this attribute enhances holistic TAI development.
  • How: Two context-based examples are included.
    1. The “Sepsis Watch," a real-world AI-application deployment case study, is designed as a socio-technical toolkit for the clinical decision-making process [52].
    2. The ‘Artificial Intelligence and Machine Learning in Software as a Medical Device,’ the US Food and Drug Administration (FDA) regulatory framework, provides guidelines for context-based iterative digital health tool development [27].


Explainability

  • Why: For the Sacramento region in California, AI-based algorithm results claimed that air pollution levels were safe for humans, contrary to the ground reality [sac]. In similar and increasingly critical contexts, AI-based algorithms need to explain the reasoning behind their decisions. The explainable AI (XAI) attribute develops methodologies to explain the functioning of AI applications, which often involve complex architectures.
  • How:
    1. These techniques take diverse approaches for explaining AI algorithms.
      1. Quantifying the importance of input variables for the model like Shapley global explanations [38]
      2. Intrinsically interpretable model development like decision sets [34]
      3. Comparison involving a change in prediction values such as data perturbation for identifying the indirect influence of features [13]
      4. Approaches driven by contestability by design [39], such as effective contestability as explanations [48]
    2. All these explanations are evaluated by quantifying accuracy and uncertainty in their results [30, 15]
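As a minimal illustration of the Shapley-value idea behind global explanation approaches like [38], the sketch below computes exact Shapley attributions for a hypothetical three-feature model by averaging marginal contributions over all feature orderings. The model, input, and baseline are invented for the example, and exact enumeration is feasible only for a handful of features; practical tools approximate this average by sampling.

```python
from itertools import permutations

# Hypothetical scoring model over three binary features;
# the interaction term makes attribution non-obvious.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]

BASELINE = (0, 0, 0)  # reference input ("feature absent")

def shapley_values(x):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible ordering of the features."""
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        present = list(BASELINE)
        prev = model(tuple(present))
        for i in order:
            present[i] = x[i]           # reveal feature i
            curr = model(tuple(present))
            phi[i] += curr - prev       # its marginal contribution
            prev = curr
    return [p / len(perms) for p in phi]

phi = shapley_values((1, 1, 1))
# Efficiency property: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (model((1, 1, 1)) - model(BASELINE))) < 1e-9
```

Note how the 3-point interaction between features 0 and 2 is split equally between them, so feature 0 receives 2.0 + 1.5 and feature 2 receives 1.5.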


Robustness

  • Why: With a minor addition of noise, an image classification AI application classified a benign tumor image as a malignant tumor image [25]. Unstable and brittle AI algorithms like this one often produce unintended and ill consequences. Hence, robustness-related research enhances the overall validity, consistency, and accountability of AI applications.
  • How: These techniques assess AI application performance under multiple aspects:
    1. Consistency over time, for instance, performance under data distributional shift [46]
    2. Validity based on underlying assumptions and data, for example by using counterfactual explanations [54]
    3. The quantification of uncertainty by estimating distribution-free predictive intervals [47] and verification of limitations through formal methods [23].
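One standard instance of distribution-free predictive intervals is split conformal prediction: hold out a calibration set, compute residual scores for a fitted point predictor, and widen its predictions by an empirical quantile of those scores. The predictor, synthetic data, and coverage level below are all hypothetical, chosen only to keep the sketch self-contained.

```python
import random

random.seed(0)

def predict(x):
    # Hypothetical point predictor, assumed already fitted elsewhere.
    return 2.0 * x

# Synthetic calibration data: y = 2x + bounded noise.
calib = [(x, 2.0 * x + random.uniform(-1, 1))
         for x in [i / 10 for i in range(100)]]

alpha = 0.1  # target 90% coverage
scores = sorted(abs(y - predict(x)) for x, y in calib)
# Conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.
k = min(len(scores) - 1, int((1 - alpha) * (len(scores) + 1)))
q = scores[k]

def interval(x):
    """Symmetric predictive interval around the point prediction."""
    return (predict(x) - q, predict(x) + q)

lo, hi = interval(5.0)
assert lo < 10.0 < hi  # the noiseless value 2*5 lies inside the band
```

The guarantee is marginal and distribution-free: under exchangeability of calibration and test points, the interval covers the true outcome with probability at least 1 - alpha, regardless of how crude the underlying predictor is.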

Bias, Fairness, Privacy, and Security

  • Why: A research study [53] found that entries in an open image dataset were contributed from only six countries across North America and Europe. In another study [40], a strong correlation between race and zip code was observed. Such disparities and biases limit the capacity of AI applications to make non-discriminatory and impartial decisions. Bias-reduction and fairness-compliant methodologies identify existing biases and discrimination, develop specific strategies to mitigate them, and finally verify the success of those mitigation strategies. Recently, AI applications have also proved vulnerable to privacy violations and security issues [hea, aih]; it is crucial to reduce these vulnerabilities.
  • How: Numerous routes have been taken to develop unbiased, non-discriminatory, privacy-preserving, and secure AI algorithms.
    1. Defining different types of biases and discriminations [40] and mitigating them
      1. Distributional skewness, as in the image dataset described earlier, often leads to ‘Representational Bias.’ Hence, the investigative research work [53] advocates the vital need for geo-diversity and inclusion in open data sets, especially for the developing world.
      2. The inclusion of zip codes in AI-driven decision-making is vulnerable to ‘Indirect Discrimination.’ A causal framework-based approach has been developed to discover and remove these types of discrimination [63].
    2. Examples of input-, algorithm-, and output-focused strategies include, respectively: mitigating bias in the data, as in Data Statements for natural language processing [18]; constraint-based or modified objective function-based methods such as fairness classification [56]; and mitigation-success verification strategies such as fairness inference on outcomes using causal graphs [45]
    3. Allowing model decisions to be altered by changing plausible input variables, as in actionable recourse [57]
    4. Further work is required to develop specific methods to discover unfairness and to extend fairness definitions beyond equality perspectives, for example through context-driven [31] and temporal fairness [36] discussions
    5. Privacy-preserving measures such as techniques based on decentralized [32] and differential privacy [41]
    6. Security approaches such as homomorphic encryption [12], secure multi-party computation [64], and secure hardware implementations [sec]
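Many of the diagnostics above start from simple group-level statistics of model decisions. The sketch below computes two common ones on hypothetical records: the demographic-parity (selection-rate) gap and the true-positive-rate gap, one component of equalized odds. It is a minimal illustration, not any specific cited method.

```python
# Hypothetical records: (group, true_label, predicted_label).
records = [
    ("a", 1, 1), ("a", 1, 1), ("a", 0, 1), ("a", 0, 0),
    ("b", 1, 1), ("b", 1, 0), ("b", 0, 0), ("b", 0, 0),
]

def selection_rate(rows):
    # Fraction of the group that receives a positive decision.
    return sum(pred for _, _, pred in rows) / len(rows)

def true_positive_rate(rows):
    # Among truly positive cases, fraction predicted positive.
    pos = [r for r in rows if r[1] == 1]
    return sum(pred for _, _, pred in pos) / len(pos)

group_a = [r for r in records if r[0] == "a"]
group_b = [r for r in records if r[0] == "b"]

dp_gap = abs(selection_rate(group_a) - selection_rate(group_b))
tpr_gap = abs(true_positive_rate(group_a) - true_positive_rate(group_b))
```

On these toy records both gaps come out to 0.5, flagging the classifier as both selecting and correctly recognizing group "a" far more often; constraint-based training methods like [56] aim to drive such gaps toward zero.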


Causality

  • Why: Data-driven AI applications leverage association learning and, at times, are susceptible to spurious correlations [21, 51]. Hence, including a causal perspective in TAI development further bolsters the overall success and longevity of AI applications. It also incorporates subject-matter knowledge.
  • How: Causal approaches have been used to address multiple attributes of trustworthy AI development.
    1. Generative counterfactuals to explain AI applications [37], investigate fairness, and improve robustness [54]
    2. Causal-based fairness quantification formula [62]
    3. Causality-manipulated data augmentations for robustness [61]

Verifiable Claims

  • Why: TAI development needs to demonstrate and quantify its overall trustworthiness. Quantifiable and verifiable claims will further accelerate and promote TAI-based deployment.
  • How: Three directions have been taken toward developing verifiable claims:
    1. Theoretical methods like formal verification for deep learning robustness [23]
    2. Certification or grade-based approaches like compliance certificates [17] and care labels [44]
    3. Quantification techniques by combining one or more attributes of TAI, like causal regulator-based trust quantification [33] and fairness-based trust quantification [59]

The growing trends in technological advancement and digitalization make a strong case for AI-driven policy implementations. Verifiable trust quantification of ethical and responsible AI-driven policies will accelerate their adoption. AI-based applications should strive to find trade-offs between overall performance and TAI attributes, and should demonstrate their trustworthiness through algorithms that incorporate contextual and causally justified enhancements along with transparent, fair, secure, privacy-preserving, and robust capabilities. By ensuring these characteristics, such applications can become a vital medium for realizing a safer, more inclusive, diverse, and equitable society. The progress of the TAI domain is significant and critical for a better tomorrow.

The author would like to thank Drs. Soundar Kumara (Ph.D. advisor, Penn State University) and Ram Sriram (NIST), as well as Egbe-Etu Etu, Abigail Linder, and Tatiana Klett, for their help and support in completing this article.



All the references listed in this article can be viewed at the following link: References_PDF