
Abstract This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: One using default settings, one in which the action space was varied and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the resulting values differed, when using MC-AIXI. Varying the observation space seems to have no qualitative impact on the results as reported, while (contrary to the original results) varying the action space seems to have some impact. An analysis of the impact of modifying parameters of MC-AIXI on its performance in the default settings was carried out with the help of data mining techniques used to identifying highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable, however as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and also sensitive to changes to its settings. It brings out some differences among agents, however, since they are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents that, together with the test format, further highlights computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
