A definition of general intelligence
In 2019, Chollet published a paper in which he defines the intelligence of a system as follows:
The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty.
The paper focuses on human-like intelligence and uses insights from the field of psychometrics to design an intelligence test for both human and non-human agents.
Psychometrics is a well-established discipline, and its findings about the measurement of human intelligence are considered reliable. The same cannot be said of the state of the art in measuring the intelligence of artificial systems.
Difficulties evaluating intelligence of systems
The conditions under which intelligence tests are carried out on humans are not easily transferable to systems.
When evaluating human intelligence, it’s not expected that the human will train for the test; however, that’s the current paradigm for evaluating systems. Developers provide as many examples as possible during a “training phase” so the system can learn and perform well in a final evaluation.
It’s also difficult to build a test that systems cannot easily exploit. A system can take shortcuts that lead to a high score without showing the type of intelligence the test is meant to measure. As Chollet puts it:
… optimizing for a single metric or set of metrics often leads to tradeoffs and shortcuts … (a well-known effect on Kaggle, where winning models are often overly specialized for the specific benchmark they won and cannot be deployed on real-world versions of the underlying problem).
Measuring intelligence
Many of the approaches used to measure intelligence focus solely on the performance of a system in a single task, or in a set of closely related tasks. Chollet argues that, in order to measure general intelligence, it is necessary to measure not only the skill of the system at a particular task but also its ability to deal properly with new, previously unknown tasks.
The goal is to measure a system’s broad abilities instead of its task-specific skills.
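The difference between task-specific skill and broad ability can be illustrated with a toy evaluation. The sketch below is not Chollet's benchmark or his formal measure. The tasks, the Memorizer class, and the score function are all hypothetical names made up for this illustration. It only shows how a system that memorizes its training examples can score perfectly on the tasks it trained for and still fail on a new task that comes with just a few demonstrations.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Task = Callable[[int], int]


class Memorizer:
    """Hypothetical system that only stores the examples it has seen."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, int], int] = {}

    def learn(self, name: str, task: Task, inputs: Iterable[int]) -> None:
        # Record the correct output for every demonstrated input.
        for x in inputs:
            self.table[(name, x)] = task(x)

    def predict(self, name: str, x: int) -> Optional[int]:
        # Returns None for any (task, input) pair never seen before.
        return self.table.get((name, x))


def score(system: Memorizer, tasks: Dict[str, Task], inputs: Iterable[int]) -> float:
    """Fraction of correct answers over all given tasks and inputs."""
    inputs = list(inputs)
    results = [
        system.predict(name, x) == task(x)
        for name, task in tasks.items()
        for x in inputs
    ]
    return sum(results) / len(results)


# Toy tasks, assumed purely for illustration.
train_tasks: Dict[str, Task] = {"double": lambda x: 2 * x, "square": lambda x: x * x}
new_tasks: Dict[str, Task] = {"negate": lambda x: -x}

system = Memorizer()

# Training phase: the system sees every input it will later be tested on.
for name, task in train_tasks.items():
    system.learn(name, task, range(5))

# New task: only three demonstrations, then evaluation on unseen inputs.
for name, task in new_tasks.items():
    system.learn(name, task, range(3))

skill_on_trained = score(system, train_tasks, range(5))
skill_on_new = score(system, new_tasks, range(3, 6))
print(skill_on_trained, skill_on_new)
```

Reporting only the first number would suggest a highly capable system, while the second number is the one closer to the kind of ability the text describes.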