#Tool to evaluate LLMs on summarization

vast tapir
#

I need a tool to evaluate LLMs on a document summarization dataset. Do you know of a tool that takes the dataset in JSON, runs the model on it, and calculates the BLEU scores?
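
For context, here is a minimal sketch of what such a tool would do, using only the Python standard library. The JSON layout (a list of records with `document` and `reference` keys) and the `summarize` callable are assumptions made for illustration, not something specified in this thread. The BLEU here is a simplified version (whitespace tokens, one reference per summary, no smoothing), so its numbers will not match an established implementation exactly.

```python
import json
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Simplified corpus-level BLEU: one reference per candidate, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.lower().split(), ref.lower().split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_ngrams, r_ngrams = ngrams(c, n), ngrams(r, n)
            # Clipped counts: a candidate n-gram is credited at most as
            # often as it appears in the reference.
            matches[n - 1] += sum(min(k, r_ngrams[g]) for g, k in c_ngrams.items())
            totals[n - 1] += sum(c_ngrams.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages summaries shorter than the reference.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


def evaluate(dataset_json, summarize):
    """Run a (hypothetical) summarize callable over the dataset and score it."""
    records = json.loads(dataset_json)
    candidates = [summarize(rec["document"]) for rec in records]
    references = [rec["reference"] for rec in records]
    return corpus_bleu(candidates, references)
```

In practice `summarize` would wrap a call to the model under test, and an established BLEU implementation would likely be a safer choice than this hand-rolled one when reporting results.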

vast hearth
#

@turbid shadow do you think we can use embeddings here to evaluate summaries? What if we build a dataset containing 10 summaries of one piece of content, each worse than the last, and plot the cosine similarity? Do you think that could yield a result?

#

Especially with contextualised token embeddings?
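
A rough sketch of that experiment, assuming each summary is compared against the source content. The `embed` function below is only a bag-of-words placeholder so the example runs without dependencies; it is not a contextualised embedding, and a real sentence or token embedding model would be swapped in at that point. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Placeholder embedding (word counts). Replace with a real model's vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_summaries(content, summaries):
    """Similarity of each summary to the source content, in the given order."""
    content_vec = embed(content)
    return [cosine_similarity(content_vec, embed(s)) for s in summaries]


def is_monotonically_decreasing(scores):
    """True if the scores never increase as the summaries get worse."""
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Toy data: summaries ordered from best to worst (invented for illustration).
content = "the cat sat on the mat"
summaries = ["the cat sat on the mat", "the cat sat", "dogs bark loudly"]
scores = score_summaries(content, summaries)
```

If the metric is useful, the scores should fall as the summaries degrade, which is what `is_monotonically_decreasing(scores)` checks; the same list can be handed to any plotting library to produce the curve.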