Hybrid Cognitive-Affective Strategies for AI Safety

Publication date

2020-12-02

Authors

Aliman, Nadisha-Marie

Advisors

Werkhoven, P.J.
Masthoff, J.F.M.

Document Type

Dissertation

Abstract

The steadily increasing capabilities of AI systems can have tremendous beneficial impacts on society. However, it is important to simultaneously tackle the possible risks that accompany these developments. For this reason, the relatively young field of AI safety has gained international relevance. In parallel, popular media have commented on whether society should associate AI with motifs such as fear or enthusiasm. However, in order to assess the landscape of AI risks and opportunities, it is instead first and foremost of relevance not to be afraid, not to be enthusiastic, but to understand, as Spinoza similarly suggested in the 17th century. In this vein, this thesis performs a transdisciplinary examination to understand how to address possible instantiations of AI risks with the aid of scientifically grounded hybrid cognitive-affective strategies. The identified strategies are “hybrid” because AI systems cannot be analyzed in isolation: the nature of human entities as well as the properties of human-machine interactions have to be taken into account within a socio-technological framework that addresses not only unintentional failures but also intentional malice. The attribute “cognitive-affective”, in turn, refers to the inherently affective nature of human cognition. We consider two disjoint sets of systems: Type I and Type II systems. Type II systems are systems that are able to consciously create and understand explanatory knowledge. Conversely, Type I systems are all systems that do not exhibit this ability. All current AIs are of Type I. However, although Type II AI does not exist today, its implementation is not physically impossible.
Overall, we identify the following non-exhaustive set of 10 tailored hybrid cognitive-affective strategic clusters for AI safety: 1) international (meta-)goals, 2) transdisciplinary Type I/II AI safety and related education, 3) socio-technological feedback-loop, 4) integration of affective, dyadic and social information, 5) security measures and ethical adversarial examples research, 6) virtual reality frameworks, 7) orthogonality-based disentanglement of responsibilities, 8) augmented utilitarianism and ethical goal functions, 9) AI self-awareness and 10) artificial creativity augmentation research. In the thesis, we also introduce the so-called AI safety paradox, which states, figuratively speaking, that value alignment and control represent conjugate requirements. In theory, with a Type II AI, a mutual value alignment might be achievable via a co-construction of novel values, however, at the cost of its predictability. Conversely, it is possible to build Type I AI systems that are controllable and predictable, but they would not exhibit a sufficient understanding of human morality. Nevertheless, AI safety can be addressed by a cybersecurity-oriented and risk-centered approach that reformulates AI safety as a discipline which proactively addresses AI risks and reactively responds to occurring instantiations of AI risks. In a nutshell, future AI safety requires transdisciplinarily conceived and scientifically grounded dynamics combining proactive error-prediction and reactive error-correction within a socio-technological feedback-loop, together with the cognizance that it is first of relevance not to be afraid, not to be enthusiastic, but to understand – that the price of security is eternal creativity.

Keywords

AI safety, cybersecurity, cognitive science, affective science, adversarial AI, virtual reality, cybernetics, AI governance, AI ethics
