Towards General Evaluation of Intelligent Systems: Lessons Learned from Reproducing AIQ Test Results

Creator: Ondrej Vadinský
Keywords: 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology


Journal of Artificial General Intelligence

Article . 2018 . Peer-reviewed

License: CC BY NC ND

Data sources: Crossref


Article, 01 Feb 2018, English. Publisher: Walter de Gruyter GmbH. Journal: Journal of Artificial General Intelligence, volume 9, pages 1-54 (eISSN: 1946-0163).


doi: 10.2478/jagi-2018-0001


Abstract

This paper attempts to replicate the results of evaluating several artificial agents with the Algorithmic Intelligence Quotient (AIQ) test, originally reported by Legg and Veness. Three experiments were conducted: one using the default settings, one in which the action space was varied, and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the results differed when MC-AIXI was used. Varying the observation space seems to have no qualitative impact on the results, as originally reported, while (contrary to the original results) varying the action space seems to have some impact. The impact of modifying MC-AIXI's parameters on its performance in the default settings was analyzed with the help of data mining techniques used to identify high-performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable; however, as a general artificial intelligence evaluation method, it has several limits. The test depends on the chosen reference machine and is also sensitive to changes in its settings. It brings out some differences among agents; however, since these differences are limited in size, the test setting may not yet be sufficiently complex. Thoroughly evaluating configurable agents requires a demanding parameter sweep, which, together with the test format, further highlights the computational requirements of an agent. These and other issues are discussed in the paper, along with proposals for alleviating them. An implementation of some of the proposals is also demonstrated.
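To make concrete why an exhaustive parameter sweep over a configurable agent is demanding, the following Python sketch counts the agent-environment runs a full grid search would require. It is only an illustration under stated assumptions: the parameter names and values in `grid` are hypothetical and are not MC-AIXI's actual settings or the sweep used in the paper, and `samples_per_config` is an assumed number of sampled test environments used to score each configuration.

```python
from itertools import product
from math import prod

# Hypothetical parameter grid for a configurable agent; the names and values
# are illustrative only and are not taken from the paper.
grid = {
    "depth": [32, 48, 64, 96],
    "simulations": [50, 100, 200, 400],
    "horizon": [4, 6, 8],
    "exploration": [1.0, 1.4, 2.0],
}

def sweep_configs(grid):
    """Yield every configuration in the full Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def sweep_cost(grid, samples_per_config):
    """Total agent-environment runs if each configuration is scored on
    samples_per_config sampled test environments (an assumed figure)."""
    n_configs = prod(len(v) for v in grid.values())
    return n_configs * samples_per_config

configs = list(sweep_configs(grid))
print(len(configs))                                 # 144 (4 * 4 * 3 * 3)
print(sweep_cost(grid, samples_per_config=10_000))  # 1440000
```

Because the grid is a Cartesian product, adding one more value to any single parameter multiplies the total rather than adding to it, and each counted run may itself span many interaction cycles.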

Related Organizations

University of Economics Prague, Czech Republic
Knowledge University, Iraq
