We Need a Turing Test for Learning

Jun 2, 2018

We need a Turing Test for learning, which tests not what a machine or other intelligent entity knows, but its ability to learn. Intelligence is far more than the possession and use of knowledge; fundamentally, it is concerned with the acquisition of knowledge, that is, with learning. Put simply, machines won’t be able to achieve human-level intelligence or Strong AI without something comparable to human-level learning.

Here’s a scary, breathtaking reality — I tried a Google search for “Turing Test for Learning” (in quotes) and Google came back with:

No results found for “Turing Test for Learning”.

Wow, really??!!

Like, nobody ever thought or wrote about this before?

Incredible!

So, here we go…

It is not the intention of this informal paper to propose or detail a specific Turing-style test for learning, but simply to highlight the nature of the need for such a test, or more likely a set of tests.

Apologies to Turing

Technically, there is no reason that a test for learning should be associated with Turing per se, but I use his name here as an expedient since so many people are aware of the Turing test for intelligence.

Background

For background reference, consult the Wikipedia article for Turing Test. Or consult Turing’s original paper directly, in which he refers to it as The Imitation Game.

Conceptually, we could design and build a robot with all known human knowledge and wisdom, which would enable this machine to do lots of interesting and cool stuff, but even such an all-knowing machine would not be intelligent per se if it lacked human-level learning abilities.

Machine learning (ML) is all the rage right now, especially so-called deep learning or neural networks, but as interesting or even amazing as these capabilities are, they still pale in comparison to true human-level learning.

Human learning

Human learning encompasses a wide range of modes, abilities, and behaviors, from very simple to very complex. Human learning occurs at all stages of human life, from infancy and childhood to school and adult maturity.

What kind of test, or tests plural, can adequately assess whether a machine can compete with this fantastic range of human learning capabilities?

Granted, machines don’t have anything comparable to human biological stages of life, but there may well be stages of learning which may be comparable between machines and people.

Or levels of learning.

Or something like that. I really don’t know, since this is truly uncharted territory.

Range of tests for learning

Who knows, maybe it might be possible to conjure up a single Turing test for learning that can somehow, magically, cover all manner of human-level learning. I doubt it, but you never know.

But, for now, I’ll tentatively posit that a range of test suites will be necessary, with a host of tests tailored to each discrete subinterval of the range.

Whether the full range or spectrum of learning actually has discrete subintervals or is in fact continuous can remain an open question, for now.

Learning function

It may also be true that a machine can (hypothetically) approximate the full range of human learning with either a more discrete or a more continuous range than is true for the typical human learner. Even if both are discrete, maybe the granularity of subintervals might be different even as the full range is equivalent.

In other words, mathematically we want to take the integral of the learning function over the range, and as long as the integral is the same between human and machine, the actual learning function may be somewhat or rather different.
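To make the idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `human` and `machine` curves, the 0 to 50 year range, and the helper names `integrate`, `stepwise`, and `same_total_learning` are all made up, and nothing here claims to say what a real learning function looks like. It only shows that a continuous curve and a coarse step function can have the same integral over the range.

```python
from typing import Callable, Sequence

LearningFn = Callable[[float], float]


def integrate(f: LearningFn, lo: float, hi: float, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def stepwise(levels: Sequence[float], lo: float, hi: float) -> LearningFn:
    """A discrete learning function: constant on equal-width subintervals."""
    def f(x: float) -> float:
        idx = min(int((x - lo) / (hi - lo) * len(levels)), len(levels) - 1)
        return levels[idx]
    return f


def same_total_learning(f: LearningFn, g: LearningFn,
                        lo: float, hi: float, tol: float = 1e-6) -> bool:
    """True if the two learning functions have (nearly) the same integral."""
    return abs(integrate(f, lo, hi) - integrate(g, lo, hi)) <= tol


# Hypothetical curves, invented purely for this example.
def human(age: float) -> float:
    return age / 50.0  # a smooth, continuous ramp


machine = stepwise([0.1, 0.3, 0.5, 0.7, 0.9], 0.0, 50.0)  # five coarse steps

print(same_total_learning(human, machine, 0.0, 50.0))
```

The two curves differ at almost every age, yet their totals over the range match, which is the only property the comparison above cares about.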

Subintervals of learning

Without prejudicing the eventual path of research and progress in this area, let me start with a simple proposal for an initial set of discrete subintervals of the learning function:

Human infant, 0–3 months.
Human infant, 3–6 months.
Human infant, 6–12 months.
Human toddler, 1–2 years.
Human small child, 2–4 years.
Human child, 4–6 years.
Human child, 6–8 years.
Human child, 8–10 years.
Human tween, 10–12 years.
Human young teenager, 12–14 years.
Human teenager, 14–16 years.
Human high school student, 16–18 years.
Human college student, 18–22 years.
Human grad student, 22–25 years.
Human recent college grad, 22–25 years.
Human young adult, 25–30 years.
Human mature adult, 30–50 years.

There are two questions with that list of intervals:

How does one learn at that age or stage?
How can you test or inquire so as to assess what level of learning ability or age-level a given individual possesses?

Learning IQ

There is also the notion of something comparable to IQ, so that orthogonal to age, for each age interval we might have:

Well below average learning ability for this age.
Below average learning ability for this age.
Average learning ability for this age.
Above average learning ability for this age.
Well above average learning ability for this age.
Near-genius learning ability for this age.
Genius learning ability for this age.
Super-genius learning ability for this age.

Reading comprehension

One possible proxy for a test of learning is reading comprehension — expose the intelligent entity to a story and then ask it questions to ascertain how well it comprehended the overall theme and specific details of the story.

Also ask it questions that cannot be answered from the story text, to ensure that answers are not simply being made up or produced by approximate heuristics.

For a baseline, the test would need to ask the same set of questions before the entity is exposed to the story, so that it assesses learning ability rather than being fooled by preexisting knowledge.

And one could offer a sequence of stories to verify that the intelligent entity can keep the stories apart and answer questions only in the context of the relevant story as opposed to previous stories. One could also pose questions related to previous stories, such as “back in the story of X and Y, …”

One could also ask the intelligent entity to retell the story in its own words, although there are probably automated heuristics which could mimic such a task without requiring real intelligence and true comprehension per se.
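The before-and-after procedure described in this section can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the `Reader` interface, the `Question` record, the `comprehension_report` function, and especially the `ToyReader` stand-in (which just memorizes "question|answer" lines) are all hypothetical names invented here. A real test would need far richer answer matching than exact string equality.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Sequence


@dataclass(frozen=True)
class Question:
    text: str
    expected: Optional[str]  # None means the story does not contain the answer


class Reader(Protocol):
    def read(self, story: str) -> None: ...
    def answer(self, question: str) -> Optional[str]: ...


def count_correct(reader: Reader, questions: Sequence[Question]) -> int:
    """Number of questions answered exactly as expected."""
    return sum(1 for q in questions if reader.answer(q.text) == q.expected)


def comprehension_report(reader: Reader, story: str,
                         questions: Sequence[Question]) -> Dict[str, int]:
    answerable = [q for q in questions if q.expected is not None]
    unanswerable = [q for q in questions if q.expected is None]
    before = count_correct(reader, answerable)   # baseline, before the story
    reader.read(story)
    after = count_correct(reader, answerable)    # same questions, after the story
    # Any answer at all to an unanswerable question counts as made up.
    made_up = sum(1 for q in unanswerable if reader.answer(q.text) is not None)
    return {"before": before, "after": after,
            "gain": after - before, "made_up": made_up}


class ToyReader:
    """Hypothetical stand-in: memorizes lines of the form 'question|answer'."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def read(self, story: str) -> None:
        for line in story.splitlines():
            if "|" in line:
                question, answer = line.split("|", 1)
                self.memory[question.strip()] = answer.strip()

    def answer(self, question: str) -> Optional[str]:
        return self.memory.get(question)


story = "Who owns the boat?|Ana\nWhere is the boat?|the harbor"
questions = [
    Question("Who owns the boat?", "Ana"),
    Question("Where is the boat?", "the harbor"),
    Question("What color is the boat?", None),  # not answerable from the story
]
report = comprehension_report(ToyReader(), story, questions)
print(report)
```

The `gain` figure is what matters for learning; `before` guards against preexisting knowledge and `made_up` guards against an entity that simply guesses.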

Reading level

One could piggyback a reading level test on the reading comprehension test by offering a sequence of stories, each posed at distinct reading levels to get a sense of what reading level the intelligent entity is capable of.

The point of a test for reading level would simply be that presumably a truly intelligent AI system should have a reading level approximating a mature adult, or at least a high school or college student, as opposed to that of a child in elementary school. Although, even the reading level of a child would be a major leap forward.

Beyond reading

Reading comprehension alone is not a full indication for human-level learning. A machine must also be able to learn from:

Viewing images.
Viewing video.
Listening to voices. All of the aspects of speech beyond simply the textual words.
Listening to sounds.
Listening to music.
Viewing experiences.
Participating in activities with other intelligent entities which it is expected to learn from.
Making mistakes of its own — and learning from them.
Creativity. Creating that which never existed.

Learning how to learn

Beyond simply a basic ability to learn, there is the ability to learn how to learn.

How to test this quantum leap of learning potential is anybody’s guess at this stage.

General test for learning

I don’t have any great suggestions for a specific general test of learning, but I think any such test would need to satisfy several key requirements.

Requirements for any general test for learning

Any general test for learning would need to satisfy several key requirements:

The question is whether machine M is capable of learning about topic T such that it can answer questions Q1, Q2, … Qn about said topic.
The test must first verify that machine M does not already possess enough knowledge about topic T to answer those questions. The machine must be unable to answer any, or at least most, of the designated questions.
Expose the machine to enough knowledge, K, and experience, E, so that the machine should be able to learn enough about topic T to answer the chosen questions.
Ask the questions again to assess how much the machine learned.
Repeat for some number of other topics, T2, T3, … Tn.
The knowledge and experience for each topic should be graduated so that the ability to answer the designated questions can be used to judge how able the machine is to learn, independent of the particular topic.

Even such a rigorous process as that might not be sufficient as a general test, but at least it’s a start, and any ultimate test for learning will need to do at least as well as that.

Any given test might verify only that some degree of learning was observed. At best, such a test might only validate a portion of the complete learning ability of the intelligent entity.
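As a rough illustration of the requirements listed above, here is a minimal Python sketch of the loop over topics. The names `Topic`, `Machine`, `run_learning_test`, and `RoteMachine`, and the `max_prior` threshold of 0.2, are assumptions of mine and not part of any established test. The sketch also leaves out the graduated knowledge and experience requirement entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Topic:
    name: str
    material: Dict[str, str]          # stand-in for knowledge K and experience E
    questions: List[Tuple[str, str]]  # (question, expected answer) pairs


class Machine(Protocol):
    def study(self, material: Dict[str, str]) -> None: ...
    def answer(self, question: str) -> Optional[str]: ...


def score(machine: Machine, topic: Topic) -> float:
    """Fraction of the topic's questions answered as expected."""
    correct = sum(1 for q, expected in topic.questions
                  if machine.answer(q) == expected)
    return correct / len(topic.questions)


def run_learning_test(machine: Machine, topics: Sequence[Topic],
                      max_prior: float = 0.2) -> Dict[str, Optional[float]]:
    """Per-topic learning gain, or None when the topic was already known."""
    results: Dict[str, Optional[float]] = {}
    for topic in topics:
        before = score(machine, topic)
        if before > max_prior:
            results[topic.name] = None  # prior knowledge invalidates this topic
            continue
        machine.study(topic.material)
        results[topic.name] = score(machine, topic) - before
    return results


class RoteMachine:
    """Hypothetical stand-in that only memorizes question/answer pairs."""

    def __init__(self, known: Optional[Dict[str, str]] = None) -> None:
        self.known: Dict[str, str] = dict(known or {})

    def study(self, material: Dict[str, str]) -> None:
        self.known.update(material)

    def answer(self, question: str) -> Optional[str]:
        return self.known.get(question)


tides = Topic("tides", {"What causes tides?": "the Moon"},
              [("What causes tides?", "the Moon")])
chess = Topic("chess", {"How many squares?": "64"},
              [("How many squares?", "64")])
machine = RoteMachine(known={"How many squares?": "64"})
results = run_learning_test(machine, [tides, chess])
print(results)
```

A rote memorizer passes this sketch easily, which is exactly why the questions in a real test would have to demand more than recall of the material.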

Test suites

As indicated earlier, there may need to be a whole range of tests and test suites for learning.

The challenge seems quite daunting.

Heuristic test of learning

Given the great challenge of fully assessing learning, maybe the real question is whether there is any very small heuristic test or heuristic test suite which can give a rough ballpark estimate of learning ability without being exhaustive. Like an 80% confidence for 1% of the full effort.

And then once a basic heuristic test delivers an initial approximate assessment of learning ability, further heuristic tests could incrementally narrow in on an ever-finer approximation of an accurate assessment of learning ability.

What’s next?

The sole purpose of this informal paper is to highlight the issue of the need for a decent test of learning ability of an intelligent entity and suggest a starting point for discussion, not to outline a full pathway to the ultimate solution.

Some simple questions to ponder:

Is the notion of discrete subintervals of learning ability sensible, or is a more continuous learning model more appropriate?
Are the discrete subintervals suggested here the most reasonable?
Should there be more subintervals?
Should there be fewer subintervals?
Does it make sense to pursue development of tests before we have developed the actual learning capabilities themselves?
Might a traditional IQ test apply here?
Do we need some alternative model comparable to IQ for machine intelligence?
Do we need a new model for metrics for knowledge and intelligence?

The real bottom line here, the point of this informal paper, is that any judgment of progress towards Strong AI should be heavily based on assessing the ability of the intelligent entity to learn with comparable skill to a human.

Again, intelligence is not simply what you know, but critically your ability to learn.

For more of my writings on artificial intelligence, see List of My Artificial Intelligence (AI) Papers.