Context:
Hi, I am a Data Scientist working on a local reproduction of the "GPTs are GPTs" paper (https://arxiv.org/pdf/2303.10130.pdf). In this paper, the authors discuss the impact of LLMs on the workforce and give predictions on the expected levels of automation.
I would like to reproduce these estimates on local data. Unfortunately, the authors do not provide the occupation labels generated by GPT (estimates of the automation potential per task for each occupation).
Question:
Therefore I need to generate these labels myself! That means calling the GPT-4 API with a categorization instruction (rubric) on 20,000 occupation/task combinations (approximately 10 tasks per occupation). Since the rubric described in the paper is about 6,000 characters, this operation can be quite expensive. I am therefore wondering if anyone in this forum could help me make the API calls more efficient. From what I understand, I would have to pass the rubric with each request, costing nearly 1,500 tokens(!), even though the task/occupation pairs are only about 100 tokens each. I only need a 2-token response for each label (2 characters). Is there a way to make the request to GPT-4 (or any other, more suitable model?) so that I only need to send the task/occupation pair each time to get a label?
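To make the overhead concrete, here is a rough back-of-envelope token count in Python. All figures are my own estimates from above, not measured values, and the function name is just for illustration. The second case, packing all ~10 tasks of one occupation into a single request so the rubric is sent once per occupation, is only a hypothetical I have not tested; I don't know whether label quality holds up that way.

```python
# Back-of-envelope token count. All numbers are rough estimates, not measurements.
N_PAIRS = 20_000        # occupation/task combinations to label
RUBRIC_TOKENS = 1_500   # approximate size of the rubric (~6,000 characters)
PAIR_TOKENS = 100       # approximate size of one occupation/task pair
LABEL_TOKENS = 2        # size of one returned label


def total_tokens(pairs_per_request: int) -> int:
    """Prompt + completion tokens if each request carries the rubric once
    plus `pairs_per_request` occupation/task pairs."""
    n_requests = -(-N_PAIRS // pairs_per_request)  # ceiling division
    prompt = n_requests * RUBRIC_TOKENS + N_PAIRS * PAIR_TOKENS
    completion = N_PAIRS * LABEL_TOKENS
    return prompt + completion


one_per_request = total_tokens(1)    # rubric repeated for every pair
ten_per_request = total_tokens(10)   # hypothetical: one request per occupation

print(f"1 pair per request:   {one_per_request:,} tokens")
print(f"10 pairs per request: {ten_per_request:,} tokens")
```

With these assumptions, the rubric accounts for roughly 30 million of the ~32 million tokens in the one-pair-per-request case, which is why I am looking for a way to avoid resending it.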
Thanks in advance! 