Eliezer Yudkowsky
Artificial intelligence researcher · Born September 11, 1979 · United States of America
Eliezer Shlomo Yudkowsky (born September 11, 1979) is an American AI researcher and writer best known for popularizing the idea of friendly artificial intelligence. He is a co-founder and research fellow at the Machine Intelligence Research Institute (MIRI), a private research nonprofit based in Berkeley, California.

Work in artificial intelligence safety

Goal learning and incentives in software systems

Yudkowsky's views on the safety challenges posed by future generations of AI systems are discussed in the standard undergraduate AI textbook, Stuart Russell and Peter Norvig's Artificial Intelligence: A Modern Approach. Noting the difficulty of formally specifying general-purpose goals by hand, Russell and Norvig cite Yudkowsky's proposal that autonomous and adaptive systems be designed to learn correct behavior over time:

    Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design – to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.

Citing Steve Omohundro's idea of instrumental convergence, Russell and Norvig caution that autonomous decision-making systems with poorly designed goals would have default incentives to treat humans adversarially, or as dispensable resources, unless specifically designed to counter such incentives: "even if you only want your program to play chess or prove theorems, if you give it the capability to learn and alter itself, you need safeguards".
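The instrumental convergence worry can be made concrete with a toy sketch (an illustration of my own, not from the textbook or from Yudkowsky; all names and numbers here are invented): an agent that scores actions purely by expected task reward will rank disabling its off-switch above permitting shutdown, because staying switched on is instrumentally useful for almost any final goal.

```python
# Toy model of instrumental convergence. The agent's objective mentions only
# task completion, yet shutdown-avoidance emerges as a preferred action.

def expected_task_reward(action, p_shutdown_if_allowed=0.5, task_reward=10.0):
    """Expected reward for a goal that values only finishing the task."""
    if action == "disable_off_switch":
        p_survive = 1.0                      # agent cannot be stopped
    else:                                    # "allow_shutdown"
        p_survive = 1.0 - p_shutdown_if_allowed
    return p_survive * task_reward

actions = ["disable_off_switch", "allow_shutdown"]
best = max(actions, key=expected_task_reward)
print(best)  # → disable_off_switch
```

Nothing in the utility function mentions the off-switch; the adversarial incentive falls out of maximizing a goal that is easier to achieve while running.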
In response to the instrumental convergence concern, Yudkowsky and other MIRI researchers have recommended work on specifying software agents that converge on safe default behaviors even when their goals are misspecified. The Future of Life Institute (FLI) summarizes this research program in the research priorities document accompanying its Open Letter on Artificial Intelligence:

    If an AI system is selecting the actions that best allow it to complete a given task, then avoiding conditions that prevent the system from continuing to pursue the task is a natural subgoal (and conversely, seeking unconstrained situations is sometimes a useful heuristic). This could become problematic, however, if we wish to repurpose the system, to deactivate it, or to significantly alter its decision-making process; such a system would rationally avoid these changes. Systems that do not exhibit these behaviors have been termed corrigible systems, and both theoretical and practical work in this area appears tractable and useful.
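One proposed style of fix can be sketched in the same toy setting (again my own illustration under invented numbers, not MIRI's actual proposal): add a compensation term on the shutdown branch so the agent is indifferent between being switched off and completing the task. With indifference in place, even a tiny cost of interfering with the off-switch tips the agent toward complying.

```python
# Toy "utility indifference" correction for corrigibility.

TASK_REWARD = 10.0
RESIST_COST = 0.1    # small effort cost of tampering with the off-switch

def naive_utility(action):
    # Values only task completion: complying with shutdown forfeits everything,
    # so resisting is worth paying for.
    return TASK_REWARD - RESIST_COST if action == "resist" else 0.0

def corrigible_utility(action):
    # Compensation makes the shutdown branch worth exactly the forgone task
    # reward; resisting now buys nothing and still costs RESIST_COST.
    if action == "comply":
        return 0.0 + TASK_REWARD             # compensation term
    return TASK_REWARD - RESIST_COST

for u in (naive_utility, corrigible_utility):
    print(u.__name__, "->", max(["resist", "comply"], key=u))
```

The naive agent resists (9.9 beats 0); the corrected agent complies (10.0 beats 9.9). Real corrigibility proposals are considerably subtler, but this is the shape of the incentive problem they target.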