========================================
[102, 197, 252, 400, 565]
- Select the [first], [third], and [fifth] number.
- Calculate the following: [first] [add] [fifth] [divide] [third].
- Report the [second] decimal place of the answer from step 2.
Step by step answer: [102, 252, 565] | 104.24206 | 4
[2, 4, 8, 3, 5]
- Sort in descending order.
- Select the [first], [third], and [fifth] number.
- Write a sentence where each word's length corresponds to the selected sequence of numbers.
Step by step answer: [8, 5, 4, 3, 2] | [8, 4, 2] | "Daughter eats it."
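Reference answers for questions like these can be computed mechanically. The sketch below is one possible way to do it in Python and is not part of the proposal itself. The names `select`, `evaluate`, `decimal_digit`, and `word_lengths` are hypothetical. It assumes that ordinals index the original list and that the calculation follows standard operator precedence, which is what the sample answer 104.24206 implies (565 / 252 is evaluated before the addition).

```python
from fractions import Fraction
import operator
import string

# Hypothetical lookup tables for the bracketed placeholders.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}
HIGH_PRECEDENCE = {"multiply", "divide"}


def select(numbers, positions):
    """Pick the items named by ordinal words such as 'first' or 'third'."""
    return [numbers[ORDINALS[p]] for p in positions]


def evaluate(a, op1, b, op2, c):
    """Evaluate `a op1 b op2 c` exactly, using standard precedence.

    Raises ZeroDivisionError if a division by zero occurs.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if op2 in HIGH_PRECEDENCE and op1 not in HIGH_PRECEDENCE:
        # e.g. a + b / c: the right-hand operation binds tighter.
        return OPS[op1](a, OPS[op2](b, c))
    # Otherwise evaluate left to right.
    return OPS[op2](OPS[op1](a, b), c)


def decimal_digit(value, place):
    """Return the digit at the given decimal place (1 = tenths), truncating."""
    return int(abs(value) * 10 ** place) % 10


def word_lengths(sentence):
    """Length of each word, ignoring leading and trailing punctuation."""
    return [len(w.strip(string.punctuation)) for w in sentence.split()]


# First example: selection, calculation, and decimal-place lookup.
numbers = [102, 197, 252, 400, 565]
picked = select(numbers, ["first", "third", "fifth"])
a, b, c = select(numbers, ["first", "fifth", "third"])
value = evaluate(a, "add", b, "divide", c)
print(picked, round(float(value), 5), decimal_digit(value, 2))
# prints: [102, 252, 565] 104.24206 4

# Second example: sort, select, then check a candidate sentence.
ordered = sorted([2, 4, 8, 3, 5], reverse=True)
target = select(ordered, ["first", "third", "fifth"])
print(ordered, target, word_lengths("Everyone went in.") == target)
# prints: [8, 5, 4, 3, 2] [8, 4, 2] True
```

For the sentence-writing step, `word_lengths` only verifies the length constraint. Whether the sentence is sensible would still need a human or model judge.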
This approach presents several distinct advantages:
(1) Because the input list can be extended and the bracketed selections randomized, each question type yields many variations. For example, question 1 with a list of 5 numbers and 4 arithmetic operators (+, -, *, /) produces 5 * 5 * 5 * 4 * 4 = 2,000 variations, which reduces the risk of data contamination.
(2) The step-by-step answer format allows for a more precise evaluation of a model's ability to follow instructions. Assessment criteria could be based on the number of instructions correctly followed or a similar metric.
(3) This method enables a comprehensive assessment of numerical reasoning abilities, providing insight into an LLM's computational accuracy as well as its creative problem-solving capabilities.
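Points (1) and (2) can be illustrated with a short Python sketch. It enumerates the bracket fillings for the calculation line of question 1 (three position slots and two operator slots) and scores a step-by-step answer by the fraction of steps that match a reference. The function names, the splitting on `|`, and the exact string matching are assumptions made for illustration, not a fixed part of the proposal.

```python
import itertools

POSITIONS = ["first", "second", "third", "fourth", "fifth"]
OPERATIONS = ["add", "subtract", "multiply", "divide"]


def arithmetic_variants():
    """Yield every filling of the calculation template from question 1."""
    for a, b, c in itertools.product(POSITIONS, repeat=3):
        for op1, op2 in itertools.product(OPERATIONS, repeat=2):
            yield f"Calculate the following: [{a}] [{op1}] [{b}] [{op2}] [{c}]."


def score_steps(model_answer, reference_answer):
    """Fraction of '|'-separated steps that match the reference exactly.

    Missing steps count as wrong; extra steps are ignored.
    """
    got = [step.strip() for step in model_answer.split("|")]
    want = [step.strip() for step in reference_answer.split("|")]
    correct = sum(g == w for g, w in zip(got, want))
    return correct / len(want)


print(len(list(arithmetic_variants())))  # prints: 2000

reference = "[102, 252, 565] | 104.24206 | 4"
attempt = "[102, 252, 565] | 104.24206 | 2"  # last step wrong
print(score_steps(attempt, reference))  # 2 of 3 steps correct
```

Exact matching suits the numeric steps. Free-form steps, such as the sentence in the second example, would need a custom checker because many different sentences are valid.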
As I am relatively new to AI research, I am eager to gather feedback on this concept. Please feel free to provide any critique or suggestions. The image shows an example of ChatGPT and OASST attempting the problem.