AQA for User Intent Resolution | Google Developer Community | Page 1

slender steppe Apr 5, 2024, 1:56 PM

#

Creating this thread for those who want to discuss using the AQA Gemini model for user intent resolution.

late crypt Apr 5, 2024, 1:57 PM

#

Ok, I have to admit. Using AQA and not Functions for intent detection is... I have to think about it.

slender steppe Apr 5, 2024, 3:03 PM

#

late crypt Ok, I have to admit. Using AQA and not Functions for *intent detection* is... I ...

How are you using Functions in this capacity? Here is how I use AQA. Below is a CSV sample file I load from disk that contains the inline grounding passages` I use for user intent resolution in my article/blog post writing assistant:

follow link one, I will now follow the link
follow link two, This is the web page the link visits
summarize one, This web page talks about raising puppies
summarize two, Here is a summary of this page
enter text one, Let's fill in the text for this field
enter text one, Let's create the content you need now for this text box
blog post one, I can help you write an article
blog post two, I will now combine your research into a blog post

The first field is the intent name, the second is the "sample" inline grounding passage I use to support the intent. As you can see, it's structured in the form of a hypothetical answer to a question that would fit the intent.

NOTE: The sample or hypothetical answer is not meant to be shown to the user. Once the user intent is resolved, my app then hands control over to code that uses a non-AQA model to work with the user to execute the operations indicated by the resolved intent (e.g. - writing a blog post/article)

Here is a sample run:

<cont'd>

#

SAMPLE RUN

USER INPUT: I want to write a blog post.

Answer: I can help you write an article. SourceId: blog post one. AnswerProb: 0.9986335).

USER INPUT: Can you help me write a document?

Answer: I can help you write an article. SourceId: blog post one. AnswerProb: 0.9967421).

USER INPUT: Give me a sample page

Answer: This is the web page the link visits. SourceId: follow link two. AnswerProb: 0.8102423).

==================

USER INPUT: Follow that link

Answer: I will now follow the link. SourceId: follow link one. AnswerProb: 0.91078687).

USER INPUT: Take me to that web page

Answer: I will now follow the link. SourceId: follow link one. AnswerProb: 0.861801).

USER INPUT: Let's go to that web site

Answer: I will now follow the link. SourceId: follow link one. AnswerProb: 0.8034113).

==========

USER INPUT: Can you give me an overview of the page?

Answer: This web page talks about raising puppies. SourceId: summarize one. AnswerProb: 0.9036653).

USER INPUT: Just summarize it for me

Answer: Here is a summary of this page. SourceId: summarize two. AnswerProb: 0.8448937).

USER INPUT: What is the gist of this article

Answer: raising puppies. SourceId: summarize one. AnswerProb: 0.13272023).

==========

<cont'd>

#

As you can see, the AQA model nails the task with a minimal amount of effort on my part. I will update this thread when my intent list gets really large to see how it holds up.

late crypt Apr 5, 2024, 3:15 PM

#

I will have functions for a variety of actions, possibly with parameters. and descriptions for all of those. So it does a match of my phrase against the best described function and extracts the parameters in it.

So a definition like

  "tools": [
    {
      "functionDeclarations": [
       {
        "name": "test",
        "description": "Run a test with a specific name and get if it passed or failed",
        "parameters": {
         "type": "object",
         "properties": {
          "testName": {
           "type": "string",
           "description": "The name of the test that should be run."
          }
         }
        }
       }
      ]

If I said something like "Run the omega-1 test" it would send back a part with

          {
            "functionCall": {
              "name": "test",
              "args": {
                "testName": "omega-1"
              }
            }

I can include a bunch of function declarations and the best one matches.

slender steppe Apr 5, 2024, 3:17 PM

#

Ok, so functions are not helping you with intent resolution, but are the "execute" side of what to do if an intent is matched? So you use the AQA to do a match against the description of each function to do the intent resolution? Do I have that right?

late crypt Apr 5, 2024, 3:20 PM

#

It does the intent detection and entity extraction part. What I do with it at that point is up to me. So I'd determine what to do based on the name, and then using the arts to determine how to do that.

I'm still digesting your use of AQA.

It seems like it does general matching, but can't do entity extraction unless you've definied those entities.

slender steppe Apr 5, 2024, 3:22 PM

#

late crypt It does the intent detection and entity extraction part. What I do with it at th...

Right. I only want it to do intent resolution. The hand-off module that executes the intent would re-process the same user input and then do things like entity extraction etc., typically using a prompt that instructs the general (non-AQA) model to output a JSON object with the desired elements.

late crypt Apr 5, 2024, 3:23 PM

#

I'm not sure what you mean by "resolution" in this case. Just picking which intent?

slender steppe Apr 5, 2024, 3:24 PM

#

late crypt I'm not sure what you mean by "resolution" in this case. Just picking which inte...

Yes, exactly., that is, conforming all the different ways a user could say something into one of a set of crisp intents that I can then use to execute a specific task.

late crypt Apr 5, 2024, 3:25 PM

#

Nod. Makes sense. (Your text strings seem free-form enough that they looked like what actually would be sent to the user, even tho you said they weren't. So was making sure I understood.)

slender steppe Apr 5, 2024, 3:27 PM

#

(Image from an AI gen artist I like on twitter @Psuedoliv1)

#AQA for User Intent Resolution