#How can I use assistant api to generate quiz from a pdf?

regal stratus
#

I am uploading my PDF to OpenAI, and then creating an assistant by providing it instructions and attaching it to the file.id of the uploaded PDF.

      const file = await openai.files.create({
        file: fileStream,
        purpose: "assistants",
      });
      console.log(file);
      const assistant = await openai.beta.assistants.create({
        instructions:
          "You are a quiz generator, make use of your knowledge based on the pdf to generate a quiz for the user based on the specifications.",
        model: "gpt-3.5-turbo-1106",
        tools: [{ type: "retrieval" }],
        file_ids: ["file-NLj0Kes2UrQGXxv5TcF5JbFP"],
      });

How can I proceed further to generate a response?

Say the user uploads the PDF and asks to generate a quiz of 5 mcq type questions. Then the AI should give the desired output.

I do know that I need to create a thread, and then create a run and then retrieve run. If this is the correct way to go then help me through the process, or if there is another way then help me understand that too.

viral turretBOT
#

Hi @regal stratus. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Based on the information provided in the extracts, you're on the right track with your current approach. After creating the assistant and uploading the file, you would indeed need to create a thread, add a message to the thread, and then create a run to generate a response. Here's a general outline of the process:

  1. Create a Thread: After creating an assistant, you need to create a thread when a user starts a conversation. This is done using the client.beta.threads.create() method.

  2. Add a Message to the Thread: You can add messages to the thread as the user asks questions. This is done using the client.beta.threads.messages.create() method. In your case, the content of the message could be something like "generate a quiz of 5 mcq type questions".

  3. Create a Run: After adding the message, you need to create a run to trigger responses from the assistant. This is done using the client.beta.threads.runs.create() method.

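  4. Retrieve the Run: Runs are asynchronous, so you retrieve the run to check its status and poll until it is completed. This is done using the client.beta.threads.runs.retrieve() method. The run object itself does not contain the reply; once the run is completed, the assistant's response is available in the thread's messages via client.beta.threads.messages.list().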

Here's a Python code snippet from the extracts that demonstrates this process:

import time

# client, file and assistant are assumed to exist from the earlier steps.
thread = client.beta.threads.create()

# Add the user's request to the thread and attach the uploaded file.
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="generate a quiz of 5 mcq type questions",
    file_ids=[file.id],
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run leaves the in-progress states, pausing between requests.
while run.status in ("queued", "in_progress"):
    time.sleep(0.5)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )

if run.status != "completed":
    print(f"run ended with status {run.status}: {run.last_error}")

# The assistant's reply is stored as a message on the thread.
messages = client.beta.threads.messages.list(
    thread_id=thread.id
)

print(messages)



Please note that this is a Python code snippet and you're using JavaScript. You'll need to adapt this code to JavaScript using OpenAI's Node.js SDK.
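
A JavaScript adaptation of the same steps might look like the sketch below. It is a minimal outline, not a definitive implementation: generateQuiz and sleep are hypothetical helper names, and openai is assumed to be a client from the Node.js SDK that you have already configured elsewhere. The file is assumed to be attached to the assistant, so the message carries only the prompt.

```javascript
// Hypothetical helper: pause for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch of the thread -> run -> poll -> read messages flow.
// `openai` is assumed to be an already-configured SDK client.
async function generateQuiz(openai, assistantId, prompt, pollMs = 500) {
  // Create the thread together with the user's first message.
  const thread = await openai.beta.threads.create({
    messages: [{ role: "user", content: prompt }],
  });

  // Start a run; it is processed asynchronously on OpenAI's side.
  let run = await openai.beta.threads.runs.create(thread.id, {
    assistant_id: assistantId,
  });

  // Poll until the run leaves the queued / in-progress states.
  while (run.status === "queued" || run.status === "in_progress") {
    await sleep(pollMs);
    run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
  }
  if (run.status !== "completed") {
    throw new Error(`Run ended with status "${run.status}"`);
  }

  // The reply is stored as a message on the thread (newest first by default).
  const messages = await openai.beta.threads.messages.list(thread.id);
  return messages.data[0];
}
```

Calling it would look like `const reply = await generateQuiz(openai, assistant.id, "Generate me a quiz of 5 questions in mcq format.");`.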

For more information, you can refer to the **OpenAI Assistants API documentation** (<https://platform.openai.com/docs/assistants/overview>).

Please note that the actual implementation might vary depending on the specifics of your application and the capabilities of the assistant.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
#

@regal stratus kapa.ai is still learning and improving, please let us know how it did by reacting below

regal stratus
#

@viral turret , please go through this.

      const thread = await openai.beta.threads.create({
        messages: [
          {
            role: "user",
            content: "Generate me a quiz of 5 questions in mcq format.",
            file_ids: [file.id],
          },
        ],
      });

      const run = await openai.beta.threads.runs.create(thread.id, {
        assistant_id: assistant.id,
      });
      
      const completedRun = await openai.beta.threads.runs.retrieve(
        thread.id,
        run.id
      );

This is my current approach, I am creating a thread with a message from the user, then creating a run and then trying to retrieve the run.

But with this I am unable to understand where my response is,
When I am logging the completedRun this is what I have,

{
  id: 'run_Tlnvi79hD3tHQb0gnAsDV1MF',
  object: 'thread.run',
  created_at: 1705819442,
  assistant_id: 'asst_TRNUzlzM1k7Wh0iBVIpdHYeV',
  thread_id: 'thread_i87CLddwJ40ixjaAU9iVDvIa',
  status: 'completed',
  started_at: 1705819443,
  expires_at: null,
  cancelled_at: null,
  failed_at: null,
  completed_at: 1705819453,
  last_error: null,
  model: 'gpt-3.5-turbo-1106',
  instructions: 'You are a quiz generator, make use of your knowledge based on the pdf to generate a quiz for the user based on the specifications.',
  tools: [ { type: 'retrieval' } ],
  file_ids: [ 'file-NLj0Kes2UrQGXxv5TcF5JbFP' ],
  metadata: {},
  usage: { prompt_tokens: 3729, completion_tokens: 399, total_tokens: 4128 }
}

Where is my response? Or do i need something else to get the response?

viral turretBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, the response from the assistant is not directly included in the run object. Instead, you need to retrieve the messages from the thread to get the assistant's response.

In your case, after the run is completed, you should retrieve the messages from the thread. The assistant's response will be included in these messages. Here's how you can do it:

const messages = await openai.beta.threads.messages.list(thread.id);
console.log(messages);

This will return a list of messages in the thread, including the assistant's response. You can then parse these messages to extract the assistant's response.
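
Parsing the list could be sketched as follows. Each message's content is an array of parts, and the text of a text part sits under text.value. The helper name latestAssistantText is hypothetical, and the sketch assumes the list is in its default newest-first order.

```javascript
// Collect the plain text of the most recent assistant message.
// Assumes `messagesPage.data` is ordered newest first (the default order).
function latestAssistantText(messagesPage) {
  const reply = messagesPage.data.find((m) => m.role === "assistant");
  if (!reply) return null; // no assistant message yet

  // Keep only text parts and join their values.
  return reply.content
    .filter((part) => part.type === "text")
    .map((part) => part.text.value)
    .join("\n");
}
```

With the messages object from the call above, `latestAssistantText(messages)` would return the quiz text, or null if the assistant has not replied yet.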

Please note that the actual implementation might vary depending on the specifics of your application and the capabilities of the assistant. For more information, you can refer to the OpenAI Assistants API documentation (https://platform.openai.com/docs/assistants/overview).

regal stratus
#

@viral turret , this helped a lot.
I am able to get the response.
One more query, if my only purpose is for the user to upload the files and generate 5 mcq type questions from that PDF file, then which is a better approach?

Should I extract the text from the PDF and then send the text as prompt?
Or
Should I extract the text from the PDF, divide it into chunks and then send the text as prompt? (Not sure how to achieve this since I ultimately need to generate quiz from the entire PDF)
Or
Should I use the assistant API to upload the PDF to the OpenAI, create an assistant and with the help of the assistant generate a quiz?

Also give me the use of tokens in each case, like approx use of tokens. Assuming I'll maximum be sending a file of 10 pages.

viral turretBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, here are the three approaches you mentioned with their potential benefits and drawbacks:

  1. Extract the text from the PDF and send the text as a prompt: This approach is straightforward and doesn't require uploading the file to OpenAI. However, the prompt and the reply must fit together in the model's context window (16,385 tokens for gpt-3.5-turbo-1106, with at most 4,096 tokens of output). If the text is longer than that, you can't send it all at once and would have to fall back to chunking.

  2. Extract the text from the PDF, divide it into chunks, and send the text as a prompt: This approach allows you to process longer documents by breaking them down into manageable chunks. However, it requires careful management of the chunks to ensure that the context is maintained. Also, the total token usage would be the sum of the tokens in each chunk.

  3. Use the assistant API to upload the PDF to OpenAI, create an assistant, and generate a quiz: This approach lets the retrieval tool read from the uploaded document, so you don't have to extract or chunk the text yourself. It can handle longer documents, but it requires more setup, and the document content that retrieval pulls into the prompt is still billed as prompt tokens. The token usage depends on the length of the document and the complexity of the prompts and responses; the usage field of the run object reports the actual numbers (4,128 total tokens in the run logged earlier in this thread).

In your case, since you want to generate a quiz from the entire PDF, the third approach might be the most suitable. It allows the assistant to access the entire document and generate more comprehensive and context-aware questions.
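
For comparison, the chunking in the second approach could be sketched like this. It is an illustration under stated assumptions: chunkText is a hypothetical helper, it splits by character count rather than by tokens, and the default sizes are arbitrary placeholders. One way to cover the whole PDF would be to ask for a question or two per chunk and then select five from the results.

```javascript
// Split text into chunks of at most `maxChars` characters, repeating
// `overlap` characters between neighbours so context is not cut abruptly.
function chunkText(text, maxChars = 8000, overlap = 200) {
  if (overlap >= maxChars) {
    throw new Error("overlap must be smaller than maxChars");
  }
  const chunks = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
    // Stop once this chunk has reached the end of the text.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}
```

Each chunk is sent as its own prompt, so the total prompt tokens are roughly the document's tokens plus the overlap and the repeated instructions.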

#

As for the token usage, it's hard to give an exact number without knowing the specifics of the document and the prompts. However, keep in mind that both the prompt and the response count towards the total token usage. As a rough rule of thumb, one token is about four characters of English text, so a 10-page document of ordinary prose is typically several thousand tokens. That usually fits in the 16,385-token context window of gpt-3.5-turbo-1106, but a dense document can come close to it, so measure rather than assume. You can use OpenAI's tiktoken (https://github.com/openai/tiktoken) Python library to count the number of tokens in a text string without making an API call.

Remember to review the OpenAI Pricing (https://openai.com/pricing) for details on the costs associated with tokens and API usage.
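
Since tiktoken is a Python library and your code is JavaScript, a rough estimate can be sketched without a tokenizer. This is only an approximation based on the common rule of thumb of about four characters per token for English text; estimateTokens and fitsInContext are hypothetical helpers, and an actual tokenizer is needed for exact counts.

```javascript
// Rough estimate only: about 4 characters per token for typical English text.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// Check whether the text plus a reserved reply budget fits in a context window.
// `contextWindow` and `replyBudget` are token counts supplied by the caller.
function fitsInContext(text, contextWindow, replyBudget) {
  return estimateTokens(text) + replyBudget <= contextWindow;
}
```

For example, `fitsInContext(pdfText, contextWindow, 1000)` tells you whether the extracted text could plausibly be sent in one prompt while leaving about 1,000 tokens for the quiz, given whatever context window your model has.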

regal stratus
#

@viral turret ,
Okay, can I reduce my token usage when using the assistant API?
This is my current approach.
Are there any tweaks I should consider or is it okay?

const fileStream = await fetch(url);

      const file = await openai.files.create({
        file: fileStream,
        purpose: "assistants",
      });
      
      const assistant = await openai.beta.assistants.create({
        instructions:
          "You are a quiz generator, make use of your knowledge based on the pdf to generate a quiz for the user based on the specifications.",
        model: "gpt-3.5-turbo-1106",
        tools: [{ type: "retrieval" }],
        file_ids: [file.id],
      });

      const thread = await openai.beta.threads.create({
        messages: [
          {
            role: "user",
            content: "Generate me a quiz of 5 questions in mcq format.",
            file_ids: [file.id],
          },
        ],
      });

      const run = await openai.beta.threads.runs.create(thread.id, {
        assistant_id: assistant.id,
      });

      const completedRun = await openai.beta.threads.runs.retrieve(
        thread.id,
        run.id
      );

      const messages = await openai.beta.threads.messages.list(
        thread.id
      );
viral turretBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Your current approach seems to be well-optimized for the task at hand. However, there are a few general strategies you can consider to potentially reduce token usage:

  1. Use shorter prompts: The length of the prompt contributes to the total token count. If possible, try to make your prompts concise while still clearly conveying the task to the assistant.

  2. Fine-tune models: Fine-tuning can shorten the instructions and examples you need to include in each prompt, but it does not shrink the document content itself, which is the bulk of the tokens here. The extracts (https://help.openai.com/en/articles/7232945-how-can-i-use-the-chatgpt-api) state that fine-tuning is only supported for base models; this may be out of date, so check the current fine-tuning documentation for supported models.

  3. Cache common user queries: If there are common queries that users frequently ask, you can cache the responses to these queries so that they don't need to be processed repeatedly. This can save tokens, but it might not be applicable in your case if each user's queries are unique.

  4. Use smaller models for some tasks: A smaller model does not reduce the number of tokens, but it usually costs less per token, so you could consider using one if it can still perform the task adequately. However, this might not be applicable in your case if the task requires the capabilities of a larger model.

Remember that the actual token usage will depend on the specifics of the document, the prompts, and the responses. You can use OpenAI's tiktoken (https://github.com/openai/tiktoken) Python library to count the number of tokens in a text string without making an API call.

For more information on managing costs and estimating token usage, you can refer to the OpenAI Production Best Practices Guide (https://platform.openai.com/docs/guides/production-best-practices).
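
The caching idea from the list above could be sketched as follows. It is an in-memory illustration only: createQuizCache is a hypothetical helper, and generate stands for whatever async function you use to produce a quiz for a given file and prompt. It only helps when the same file and prompt are requested more than once.

```javascript
// Wrap an async quiz generator with an in-memory cache keyed by
// file id and prompt. `generate(fileId, prompt)` is supplied by the caller.
function createQuizCache(generate) {
  const cache = new Map();
  return async function cachedQuiz(fileId, prompt) {
    const key = `${fileId}::${prompt}`;
    if (!cache.has(key)) {
      // Store the promise so concurrent identical requests share one call.
      const pending = generate(fileId, prompt).catch((err) => {
        cache.delete(key); // do not keep failed attempts in the cache
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

A repeated request for the same file and prompt then returns the stored result without spending any further tokens.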

regal stratus
#

@viral turret ,

        const run = await openai.beta.threads.runs.create(thread.id, {
          assistant_id: assistant.id,
        });
        console.log(run);

        while (run.status !== "completed") {
          const completedRun = await openai.beta.threads.runs.retrieve(
            thread.id,
            run.id
          );
          console.log(completedRun);
          if (
            ["failed", "cancelled", "expired"].includes(completedRun.status)
          ) {
            throw new Error("Error while generating quiz from OpenAI.");
          }
        }

        const completion = await openai.beta.threads.messages.list(thread.id);
        console.log(completion);
        let message = completion.data[0].content;

With this code, I am getting rate limiting error from OpenAI since retrieve run is being called till the status is completed, but if I remove that loop then how can I call the retrieve run only and only when the run status is completed?

viral turretBOT
#

Let me check the knowledge base for you. One moment! :books:

#

Based on the information provided in the extracts, you can handle rate limit errors by implementing a backoff strategy. This means that if you encounter a rate limit error, you wait for a certain amount of time before making the next request. The waiting time increases exponentially with each subsequent rate limit error. Your loop hits the limit for two reasons: it calls retrieve again immediately with no pause between requests, and run is never updated with the retrieved status, so the loop condition never changes.

Here's how you can implement a simple exponential backoff strategy in your code:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});

let delay = 500; // Start with a delay of 500 ms
while (run.status !== "completed") {
  await sleep(delay); // Always pause before polling again
  try {
    // Reassign run so the loop condition sees the latest status
    run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
    delay = 500; // Reset the delay after a successful request
  } catch (error) {
    // The SDK exposes the HTTP status on API errors; 429 means rate limited
    if (error.status !== 429) {
      throw error; // If it's not a rate limit error, rethrow it
    }
    delay *= 2; // Double the delay for the next attempt
    continue;
  }
  if (["failed", "cancelled", "expired"].includes(run.status)) {
    throw new Error(`Error while generating quiz from OpenAI (run ${run.status}).`);
  }
}

const completion = await openai.beta.threads.messages.list(thread.id);
// content is an array of parts; this assumes the first part is text
let message = completion.data[0].content[0].text.value;

This code polls the run until it's completed, pausing between requests. If it encounters a rate limit error, it doubles the waiting time before trying again, and it resets the delay after the next successful request.

Please note that this is a simple implementation and might not cover all possible scenarios. The Node.js SDK also retries rate-limited requests automatically a limited number of times (configurable through the client's maxRetries option). In Python, a library such as Tenacity (https://github.com/jd/tenacity) provides advanced retry strategies.

regal stratus
#

@viral turret ,
Modify the function so that it calls the run retrieve after every 500ms, irrespective of the error.

viral turretBOT
#

Let me check the knowledge base for you. One moment! :books:
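
A fixed-interval version could be sketched as below. It is a minimal outline under stated assumptions: waitForRun and sleep are hypothetical helper names, openai is an already-configured SDK client, and "irrespective of the error" is read as "wait 500 ms before every check, not only after a rate limit error". The timeout is an added safeguard so the loop cannot run forever.

```javascript
// Hypothetical helper: pause for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll a run at a fixed interval until it completes, stops, or times out.
async function waitForRun(openai, threadId, runId, intervalMs = 500, timeoutMs = 120000) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const run = await openai.beta.threads.runs.retrieve(threadId, runId);
    if (run.status === "completed") return run;

    // These statuses will not turn into "completed" by waiting.
    if (["failed", "cancelled", "expired", "requires_action"].includes(run.status)) {
      throw new Error(`Run stopped with status "${run.status}"`);
    }
    await sleep(intervalMs); // fixed pause before the next check
  }
  throw new Error("Timed out waiting for the run to complete");
}
```

After `await waitForRun(openai, thread.id, run.id)` resolves, the thread's messages can be listed to read the assistant's reply.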