#Extract Main Title

62 messages · Page 1 of 1 (latest)

tender iris
#

What's the best way to extract the movie franchise name from an installment title? For example, "Thor: Ragnarok" is based on "Thor", and "Avengers: Age of Ultron" is based on "Avengers".

I want it to generate a list of main movie topics, but not too installment-specific

storm saffron
#

The problem is this is a very subjective thing that doesn’t apply to all movies. What about:

  • Green Book
  • I Know What You Did Last Summer
  • Water
  • Justice League
  • John Wick
#

You’d need a good definition before trying automatic solutions

tender iris
#

mb for creating a thread

tender iris
#

OK, so let's assume I have a database of all the topics, like "Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron". A user goes to my app and searches for a topic, and I show them the most similar one. Say the user queries "Avatar": I let them select the "Avatar" topic because it exists in the database.
@storm saffron all good till here, right?
Moving on: I will constantly fetch new articles about movies, and I want to tag each one with every topic whose string appears in it.
One approach is to loop over every topic and check whether it occurs in the article text.
That can be done, but it's pretty inefficient imo.
Plus, the DB could be a graph DB, so that, for example, all "Avatar"-related topics are interlinked.
I hope you are getting me 😅
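
To make the "interlinked topics" idea concrete, here is a minimal sketch of one possible linking rule: a topic is treated as related to any longer topic that starts with the same words. The function names (`normalize`, `link_related_topics`) are hypothetical, and the word-prefix rule is an assumption, not something agreed on in this thread. It would miss franchises whose installments do not share a leading name.

```python
from collections import defaultdict


def normalize(title: str) -> list[str]:
    """Lowercase a title and split it into words, treating punctuation as spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in title.lower())
    return cleaned.split()


def link_related_topics(titles: list[str]) -> dict[str, list[str]]:
    """Map each topic to the longer topics that begin with the same words.

    Assumption: sharing a leading word sequence means "same franchise".
    """
    tokens = {title: normalize(title) for title in titles}
    related = defaultdict(list)
    for parent, parent_words in tokens.items():
        for child, child_words in tokens.items():
            is_longer = len(child_words) > len(parent_words)
            if is_longer and child_words[: len(parent_words)] == parent_words:
                related[parent].append(child)
    return dict(related)


topics = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]
graph = link_related_topics(topics)
print(graph)
```

The pairwise comparison is quadratic in the number of topics, which is fine for a one-off batch over a small list but would need an index (for example a trie keyed on words) for a large catalogue.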

#

pasted the text ^^

tender iris
#

and deleted from #📖・nlp-discussions

storm saffron
#

You just need to classify the most likely title for each recognized named entity. If none exists, then you don't need to loop

#

You'd still need a data quality feedback mechanism to ensure your new data stays good quality in the long term

#

Classify each recognized movie title into a title in the DB

#

Have a user mechanism to indicate "This article does not contain the title that I want"

#

If enough people select that, you'd want to fix that data point
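
A rough sketch of how such a feedback counter could look, assuming flags are stored per (article, title) pair and each user counts once. The class name `FeedbackLog`, the IDs, and the threshold of 3 are all made up for illustration.

```python
from collections import Counter

FLAG_THRESHOLD = 3  # assumed cutoff; tune it to your traffic


class FeedbackLog:
    """Counts 'this article does not contain the title' flags per (article, title)."""

    def __init__(self, threshold: int = FLAG_THRESHOLD):
        self.threshold = threshold
        self.flags = Counter()
        self.seen = set()  # (user, article, title) triples already counted

    def flag(self, user_id: str, article_id: str, title: str) -> None:
        key = (user_id, article_id, title)
        if key in self.seen:
            return  # ignore repeat flags from the same user
        self.seen.add(key)
        self.flags[(article_id, title)] += 1

    def needs_review(self) -> list[tuple[str, str]]:
        """Return (article, title) links flagged by at least `threshold` users."""
        return sorted(pair for pair, count in self.flags.items() if count >= self.threshold)


log = FeedbackLog()
for user in ["u1", "u2", "u3", "u3"]:
    log.flag(user, "article-1", "Avengers")
log.flag("u1", "article-2", "Avatar")
print(log.needs_review())
```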

#

I think this works

tender iris
#

I hope I am getting you

storm saffron
#

Oh uh

storm saffron
#

It's two separate problems

tender iris
#

didnt u say to create every title in db

#

uh

#

also sorry for taking ur time 🥲

storm saffron
#

First, in a batch process, you'd need to constantly link each new article to titles in the database

#

So you would perform NER on new articles, and link them to the right title

#

That's to create your database

#

That way, you don't need to perform NLU on their query

#

You just need to let them select the title they want right?

#

So just an autocomplete multi-select

#

And every article that gets classified into those titles will be shown/delivered
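
For the autocomplete part, a minimal prefix-matching sketch over the title list, using only the standard library. `TitleAutocomplete` and `suggest` are hypothetical names, and a real app would more likely lean on the database or a search index for this.

```python
import bisect


class TitleAutocomplete:
    """Case-insensitive prefix suggestions over a fixed list of titles."""

    def __init__(self, titles: list[str]):
        # Sort by lowercase form so all titles sharing a prefix are contiguous.
        self._entries = sorted((title.lower(), title) for title in titles)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        wanted = prefix.lower()
        start = bisect.bisect_left(self._keys, wanted)
        results = []
        for key, title in self._entries[start:]:
            if not key.startswith(wanted) or len(results) >= limit:
                break
            results.append(title)
        return results


ac = TitleAutocomplete(["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"])
print(ac.suggest("av"))
print(ac.suggest("bad"))
```

Because users can only pick from what `suggest` returns, only full titles from the DB ever end up in a subscription.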

tender iris
#

so thats why u suggested the user feedback thing?

#

but I wouldnt want those titles to be suggested as well

#

like only "breaking" or "bad"

storm saffron
#

Step 1: create an NER model to recognize movie titles. Your model should be good enough to recognize "Breaking Bad" as a whole, not just single words. If it's not, tune it
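
As a stand-in for a trained NER model, the sketch below uses a plain dictionary lookup with longest-match-first, which at least shows the "whole title, not single words" behaviour. This is not a trained model and will not find titles missing from the list. `find_title_mentions` is a hypothetical helper name.

```python
import re


def find_title_mentions(text: str, known_titles: list[str]) -> list[str]:
    """Return known titles found in `text`, preferring the longest match at each position."""
    words = re.findall(r"\w+", text.lower())
    # Key each title by its word sequence so punctuation differences do not matter.
    title_words = {tuple(re.findall(r"\w+", t.lower())): t for t in known_titles}
    max_len = max((len(key) for key in title_words), default=0)

    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            match = title_words.get(tuple(words[i : i + n]))
            if match:
                found.append(match)
                i += n
                break
        else:
            i += 1  # no title starts at this word
    return found


titles = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]
text = "Fans of Breaking Bad will enjoy Avengers: Age of Ultron, a bad day for Ultron."
print(find_title_mentions(text, titles))
```

Here the lone words "bad" and "Ultron" later in the sentence are not reported, and "Avengers: Age of Ultron" wins over the shorter "Avengers".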

tender iris
#

ahhh

storm saffron
#

Step 2: each entity should be classified into an existing list of labels, like "Breaking Bad"

tender iris
#

so i should have a custom trained model??

storm saffron
#

So even if it's "Breaking", which one out of ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"], is most likely?
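
One simple way to answer "which one is most likely" is string similarity, sketched here with the standard library's `difflib.SequenceMatcher`. This swaps in fuzzy string matching for the trained classifier discussed in the thread. The helper name `best_title` and the 0.6 cutoff are assumptions.

```python
from difflib import SequenceMatcher


def best_title(entity: str, labels: list[str], min_score: float = 0.6):
    """Return (label, score) for the most similar label, or None if nothing is close enough."""
    scored = [
        (label, SequenceMatcher(None, entity.lower(), label.lower()).ratio())
        for label in labels
    ]
    label, score = max(scored, key=lambda pair: pair[1])
    return (label, score) if score >= min_score else None


labels = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]
print(best_title("Breaking", labels))
print(best_title("Zzz", labels))
```

Character-level similarity has no notion of context, so an entity like "Avatar" cannot be told apart from "Avatar 2" without extra features from the article.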

storm saffron
#

Then say there are two movie titles in that article, like "Breaking Bad" and "Avengers": in your DB, you link that article to both "Breaking Bad" and "Avengers"

tender iris
#

mhm

storm saffron
#

Users who select "Breaking Bad" and "Avengers" as their subscriptions will just be delivered every article classified as having those
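
The delivery step can be sketched as a lookup from title to subscribers, assuming the article has already been linked to titles. `Subscriptions` and the user IDs are hypothetical.

```python
from collections import defaultdict


class Subscriptions:
    """Tracks which users subscribed to which titles."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # title -> set of user ids

    def subscribe(self, user_id: str, titles: list[str]) -> None:
        for title in titles:
            self._subscribers[title].add(user_id)

    def recipients(self, article_titles: list[str]) -> set[str]:
        """Users who should receive an article linked to any of `article_titles`."""
        users = set()
        for title in article_titles:
            users |= self._subscribers.get(title, set())
        return users


subs = Subscriptions()
subs.subscribe("alice", ["Breaking Bad", "Avengers"])
subs.subscribe("bob", ["Avatar"])
print(subs.recipients(["Breaking Bad", "Avengers"]))
```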

#

Now if a user reads the article and says "Avengers" is not actually in there, you would use that feedback to create a new ground truth. Obviously you'd need conflict resolution for when two users have different opinions on whether "Avengers" actually appears or not.
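
One possible conflict-resolution rule is a simple majority vote with a minimum number of votes, sketched below. The rule, the name `resolve_label`, and the default of 3 votes are assumptions. Weighting users by reliability would be a natural refinement.

```python
def resolve_label(votes: list[bool], min_votes: int = 3):
    """Decide whether a title really appears in an article.

    `votes` holds one bool per user (True = "the title is in the article").
    Returns True or False on a clear majority, or None if undecided.
    """
    if len(votes) < min_votes:
        return None  # not enough feedback yet
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        return None  # tie: keep the current label and wait for more votes
    return yes > no


print(resolve_label([False, False, True]))
print(resolve_label([True, False]))
```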

tender iris
#

hmmm

storm saffron
#

But that should be feedback to your classification model (to classify recognized entities into movie titles in your db)

#

Users don't need to have free text query. They shouldn't. Just auto complete on movie titles, that's good enough

tender iris
#

makes sense, so the main takeaway is to train and finetune my nlp model and take feedback from users on classification

storm saffron
#

Yeah, and realize that it's two different models

#

One to find entities in the article

#

Another to match entities to actual movie titles

tender iris
#

and one to classify

storm saffron
#

Yup yup

tender iris
#

like linking alternate titles to breaking bad
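
Linking alternate titles could start as a plain alias table checked before any model is involved. The aliases below are invented examples, and `canonical_title` is a hypothetical helper.

```python
# Hypothetical alias table: lowercase alternate title -> canonical title in the DB.
ALIASES = {
    "brba": "Breaking Bad",
    "age of ultron": "Avengers: Age of Ultron",
}

TITLES = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]


def canonical_title(mention: str, aliases: dict[str, str], titles: list[str]):
    """Resolve a mention to a canonical title via exact match, then alias lookup."""
    key = " ".join(mention.lower().split())  # lowercase and collapse whitespace
    by_lower = {title.lower(): title for title in titles}
    if key in by_lower:
        return by_lower[key]
    return aliases.get(key)  # None when the mention is unknown


print(canonical_title("Age of  Ultron", ALIASES, TITLES))
print(canonical_title("breaking bad", ALIASES, TITLES))
```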

storm saffron
#

It's not gonna be easy because you'll have a large list of titles. You'll probably eventually need NER for other things like characters, actors, etc. as features to help classify, but that's the next step