# Extract Main Title
The problem is that this is a very subjective thing that doesn’t apply to all movies. What about:
- the Green Book
- I Know What You Did Last Summer
- Water
- Justice League
- John Wicks
You’d need a good definition of a “main title” before trying automated solutions.
I see. Btw, I discussed the problem further in #📖・nlp-discussions since someone asked a similar question there earlier 😅
mb for creating a thread
OK, so let's assume I have a database of all the topics, like "Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron". Now a user goes to my app and searches for a topic. I give the user the most similar topic: say the user queried "Avatar", then I let them select the "Avatar" topic because it exists in the database.
@storm saffron all good up to here, right?
Now, moving on: I will constantly fetch new articles about movies, and I want to tag each one if it contains the same topic string.
One approach could be to loop over every topic and check whether it exists in the article string.
It can be done, but it's pretty inefficient imo.
Plus, that DB could be a graph DB so that, for example, all "Avatar"-related topics can be interlinked.
I hope you are getting me 😅
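For reference, a minimal Python sketch of the naive loop described above, assuming topics are plain strings and tagging is a case-insensitive substring check (`TOPICS` and `tag_article_naive` are hypothetical names). It makes the cost visible: every article is scanned once per topic, and substring matching also tags shorter titles nested inside longer ones.

```python
from __future__ import annotations

# Illustrative topic list, taken from the examples in this thread.
TOPICS = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]

def tag_article_naive(article: str, topics: list[str]) -> list[str]:
    """Return every topic whose text occurs in the article (case-insensitive).

    One substring search per topic, so the work roughly grows with the number
    of topics times the article length, and "Avatar" also matches inside "Avatar 2".
    """
    text = article.lower()
    return [topic for topic in topics if topic.lower() in text]

tags = tag_article_naive("Avatar 2 beat Avengers at the box office", TOPICS)
print(tags)  # ['Avatar', 'Avatar 2', 'Avengers']
```

The extra "Avatar" tag is the same ambiguity raised at the start of the thread.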
pasted the text ^^
Sorry, I'm replying slowly because I'm working. Uh, let's see
and deleted it from #📖・nlp-discussions
sure np
You just need to classify the most likely title for each recognized named entity. If none exists, then you don't need to loop.
You'd still need a data-quality feedback mechanism to ensure your new data stays good quality in the long term.
Classify each recognized movie title into a title in the DB.
Have a user mechanism to indicate "This article does not contain the title that I want".
If enough people select that, you'd want to fix that data point.
I think this works
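One way to sketch that feedback step in Python: count distinct users who flag an (article, title) link as wrong, and mark the link for fixing once a threshold is reached. The threshold value and the `LinkFeedback` class are assumptions for illustration, not something specified in the thread.

```python
from collections import Counter

class LinkFeedback:
    """Tracks "this article does not contain the title I want" reports per (article, title) link."""

    def __init__(self, threshold: int = 3):  # assumed threshold; tune for your traffic
        self.threshold = threshold
        self.counts = Counter()  # (article_id, title) -> number of distinct reporters
        self.seen = set()        # (article_id, title, user_id) triples already counted

    def report_missing(self, article_id: str, title: str, user_id: str) -> bool:
        """Record one report; return True once the link has enough reports to be fixed."""
        link = (article_id, title)
        if (article_id, title, user_id) not in self.seen:  # one vote per user per link
            self.seen.add((article_id, title, user_id))
            self.counts[link] += 1
        return self.counts[link] >= self.threshold

    def links_to_fix(self) -> list:
        return [link for link, n in self.counts.items() if n >= self.threshold]

fb = LinkFeedback(threshold=2)
fb.report_missing("a1", "Avengers", "u1")
fb.report_missing("a1", "Avengers", "u1")  # duplicate report from the same user is ignored
print(fb.links_to_fix())  # []
fb.report_missing("a1", "Avengers", "u2")
print(fb.links_to_fix())  # [('a1', 'Avengers')]
```

Counting distinct users rather than raw clicks keeps one person from forcing a fix on their own.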
So, like, perform a search for all the main named entities one by one, right?
I hope I am getting you
What do you mean by search?
Oh uh
uh
It's two separate problems
First, in a batch process, you'd need to constantly link each new article to titles in the database
So you would perform NER on new articles, and link them to the right title
That's to create your database
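A minimal sketch of that batch linking job, assuming the NER model and the entity-to-title classifier are passed in as plain functions. `toy_recognize` and `toy_classify` are hypothetical stand-ins so the example runs; they are not real models. The result is an index from each DB title to the ids of the articles linked to it, which is all the query side needs.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

def link_new_articles(
    articles: Iterable[tuple[str, str]],       # (article_id, text) pairs from the fetcher
    recognize: Callable[[str], list[str]],     # step 1: NER, returns title-like spans
    classify: Callable[[str], Optional[str]],  # step 2: span -> DB title, or None
    index: dict[str, set[str]],                # DB title -> linked article ids
) -> dict[str, set[str]]:
    """Batch job: run NER on each new article and link it to the DB titles it mentions."""
    for article_id, text in articles:
        for span in recognize(text):
            title = classify(span)
            if title is not None:  # unmatched spans are simply skipped
                index.setdefault(title, set()).add(article_id)
    return index

# Hypothetical toy stand-ins so the sketch runs; a real system would plug in trained models.
KNOWN = {"breaking bad": "Breaking Bad", "avengers": "Avengers"}

def toy_recognize(text: str) -> list[str]:
    return [name for name in ("Breaking Bad", "Avengers", "Breaking") if name in text]

def toy_classify(span: str) -> Optional[str]:
    return KNOWN.get(span.lower())

demo_index = link_new_articles(
    [("a1", "Breaking Bad and Avengers news"), ("a2", "Box office roundup")],
    toy_recognize, toy_classify, index={},
)
print(demo_index)  # "Breaking" alone is recognized but not linked
```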
That way, you don't need to perform NLU on their query
You just need to let them select the title they want right?
So just an autocomplete multi-select
And every article that gets classified into those titles will be shown/delivered
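Because users only pick from existing titles, the query side can be a plain prefix autocomplete. A sketch using Python's `bisect` module over a sorted, lowercased title list (`TitleAutocomplete` is a hypothetical name); a production app would more likely use its database's or search engine's own prefix search.

```python
import bisect

class TitleAutocomplete:
    """Case-insensitive prefix autocomplete over the DB's title list."""

    def __init__(self, titles):
        # Sort (lowercased, original) pairs so all matches for a prefix are contiguous.
        self.entries = sorted((t.lower(), t) for t in titles)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, prefix: str, limit: int = 5) -> list:
        p = prefix.lower()
        start = bisect.bisect_left(self.keys, p)  # index of the first key >= prefix
        out = []
        for key, title in self.entries[start:]:
            if not key.startswith(p) or len(out) >= limit:
                break
            out.append(title)
        return out

ac = TitleAutocomplete(["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"])
print(ac.suggest("aven"))  # ['Avengers', 'Avengers: Age of Ultron']
```

The titles a user picks from these suggestions then become their multi-select subscription.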
So if the NER finds common titles, they will be linked with each other? But this has the problem of producing a lot of single-word entities, which I won't need, right?
So that's why you suggested the user feedback thing?
But I wouldn't want those titles to be suggested either
like only "breaking" or "bad"
Step 1: create an NER model to recognize movie titles. Your model should be good enough to recognize "Breaking Bad" as a whole, not just single words. If it isn't, tune it.
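The thread doesn't say how to label training data, but one common convention for multi-word entities is BIO tagging: the first token of a title gets `B-TITLE`, the following tokens get `I-TITLE`, and everything else gets `O`. A model trained on labels like these learns to emit "Breaking Bad" as one span rather than two single words. The helper names below are hypothetical; an NER library such as spaCy has its own training-data format.

```python
def bio_tags(tokens: list, title_spans: list) -> list:
    """Label tokens with B-TITLE / I-TITLE / O from (start, end) token spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in title_spans:
        tags[start] = "B-TITLE"
        for i in range(start + 1, end):
            tags[i] = "I-TITLE"
    return tags

def spans_from_bio(tokens: list, tags: list) -> list:
    """Decode BIO tags back into multi-word title strings (what the trained NER should output)."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-TITLE":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-TITLE" and current:
            current.append(token)
        else:  # "O", or a stray I-TITLE with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "New Breaking Bad spin-off announced".split()
tags = bio_tags(tokens, [(1, 3)])
print(tags)                          # ['O', 'B-TITLE', 'I-TITLE', 'O', 'O']
print(spans_from_bio(tokens, tags))  # ['Breaking Bad']
```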
ahhh
Step 2: classify each entity into the existing list of labels, like "Breaking Bad".
So I should have a custom-trained model??
So, like, train it on articles etc.?
So even if it's "Breaking", which one out of ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"] is most likely?
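To make Step 2 concrete: a sketch that scores an entity against every DB title with a simple string-similarity heuristic (character similarity from `difflib.SequenceMatcher` plus token overlap) and returns the best title, or `None` if nothing clears a threshold. The 50/50 weighting and the 0.5 cutoff are arbitrary assumptions; the thread proposes a trained classifier, and this only stands in for one.

```python
from difflib import SequenceMatcher
from typing import Optional

TITLES = ["Avatar", "Breaking Bad", "Avatar 2", "Avengers", "Avengers: Age of Ultron"]

def similarity(entity: str, title: str) -> float:
    """Heuristic score in [0, 1]: half character similarity, half token overlap."""
    e, t = entity.lower(), title.lower()
    char_sim = SequenceMatcher(None, e, t).ratio()
    e_tokens, t_tokens = set(e.split()), set(t.split())
    # Fraction of the entity's tokens that also appear in the title.
    token_overlap = len(e_tokens & t_tokens) / len(e_tokens) if e_tokens else 0.0
    return 0.5 * char_sim + 0.5 * token_overlap

def most_likely_title(entity: str, titles=TITLES, min_score: float = 0.5) -> Optional[str]:
    best = max(titles, key=lambda title: similarity(entity, title))
    return best if similarity(entity, best) >= min_score else None

print(most_likely_title("Breaking"))       # Breaking Bad
print(most_likely_title("Age of Ultron"))  # Avengers: Age of Ultron
print(most_likely_title("Titanic"))        # None
```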
hmmm
Then, say there are two movie titles in that article, like "Breaking Bad" and "Avengers". In your DB, you link that article to "Breaking Bad" and "Avengers".
mhm
Users who subscribe to "Breaking Bad" and "Avengers" will simply be delivered every article classified as containing those titles.
Now, if a user reads the article and says "Avengers" isn't actually in it, you'd use that feedback from users to create new ground truth. Obviously you'd need conflict resolution for when two users disagree on whether "Avengers" actually appears.
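Delivery then reduces to set unions over the title-to-articles index built by the batch job. A sketch with a hypothetical in-memory index (the article ids are made up); an article linked to two subscribed titles is delivered once.

```python
def articles_for_user(subscriptions: set, index: dict) -> set:
    """Every article id linked to at least one title the user subscribed to (no duplicates)."""
    delivered = set()
    for title in subscriptions:
        delivered |= index.get(title, set())
    return delivered

# Hypothetical index as produced by the batch linking job: DB title -> article ids.
index = {"Breaking Bad": {"a1", "a3"}, "Avengers": {"a1", "a2"}, "Avatar": {"a4"}}
feed = articles_for_user({"Breaking Bad", "Avengers"}, index)
print(sorted(feed))  # ['a1', 'a2', 'a3']
```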
hmmm
But that feedback should go to your classification model (the one that classifies recognized entities into movie titles in your DB).
Users don't need free-text queries. They shouldn't have them. Just autocomplete on movie titles; that's good enough.
Makes sense, so the main takeaway is to train and fine-tune my NLP model and take feedback from users on classification
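One simple form of the conflict resolution mentioned above is a thresholded majority vote per (article, title) link, with undecided links left out of the new ground truth. The vote minimum and agreement ratio are assumed values, and `resolve_label` and `feedback_examples` are hypothetical names. The resulting rows are what would be fed back to the classification model.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional

def resolve_label(votes: list[bool], min_votes: int = 3, min_agreement: float = 0.7) -> Optional[bool]:
    """votes: True = "the title is in the article", False = "it isn't".

    Returns the agreed label, or None when there are too few votes or users disagree too much.
    """
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

def feedback_examples(feedback: dict, **kwargs) -> list[tuple]:
    """feedback: {(article_id, title): [votes]} -> [(article_id, title, label)] rows for retraining."""
    rows = []
    for (article_id, title), votes in feedback.items():
        label = resolve_label(votes, **kwargs)
        if label is not None:
            rows.append((article_id, title, label))
    return rows

rows = feedback_examples({
    ("a1", "Avengers"): [False, False, False, True],  # 3 of 4 say "not in the article"
    ("a1", "Breaking Bad"): [True, False],            # too few votes: undecided
})
print(rows)  # [('a1', 'Avengers', False)]
```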
yea
Yeah, and realize that it's two different models
One to find entities in the article
Another to match entities to actual movie titles
and one to classify
Yup yup
I think I can use a graph DB for the second model as well, right?
like linking alternate titles to "Breaking Bad"
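A rough sketch of how alias and "related topic" links could be modeled, using plain Python dicts as an in-memory stand-in for a graph DB: alias edges map alternate titles to the canonical DB title, and a breadth-first search over related-topic edges collects interlinked topics like "Avatar" and "Avatar 2". The edge data and function names are illustrative assumptions; a real graph database would express this in its own query language.

```python
from collections import deque

# Illustrative alias edges: alternate title -> canonical DB title.
ALIAS_OF = {
    "Breaking Bad (TV series)": "Breaking Bad",
    "Avatar: The Way of Water": "Avatar 2",
}
# Illustrative undirected "related topic" edges between canonical titles.
RELATED = {
    "Avatar": {"Avatar 2"},
    "Avatar 2": {"Avatar"},
    "Avengers": {"Avengers: Age of Ultron"},
    "Avengers: Age of Ultron": {"Avengers"},
}

def canonical(name: str) -> str:
    """Follow an alias edge if one exists; otherwise the name is already canonical."""
    return ALIAS_OF.get(name, name)

def related_topics(title: str, max_hops: int = 2) -> set:
    """Breadth-first search over related-topic edges, up to max_hops away."""
    start = canonical(title)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in RELATED.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen

print(canonical("Avatar: The Way of Water"))       # Avatar 2
print(related_topics("Avatar: The Way of Water"))  # {'Avatar'}
```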
It's not gonna be easy because you'll have a large list of titles. You'll probably eventually need NER for other things like characters, actors, etc. as features to help classify, but that's the next step.
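As an illustration of using character and actor entities as classification features: a sketch that reranks candidate titles by how many co-occurring entities from the article appear in a per-title knowledge base. The tiny `KB` is hypothetical illustrative data, and counting overlaps is just one simple choice of feature.

```python
# Hypothetical per-title knowledge base of characters and actors.
KB = {
    "Breaking Bad": {"Walter White", "Jesse Pinkman", "Bryan Cranston"},
    "Avengers": {"Tony Stark", "Robert Downey Jr."},
    "Avengers: Age of Ultron": {"Ultron", "Tony Stark", "James Spader"},
}

def rerank_with_context(candidates: list, context_entities: list, kb: dict = KB) -> list:
    """Order candidate titles by how many article entities appear in each title's KB entry.

    sorted() is stable even with reverse=True, so ties keep the input order.
    """
    ctx = set(context_entities)
    return sorted(candidates, key=lambda title: len(kb.get(title, set()) & ctx), reverse=True)

print(rerank_with_context(["Avengers", "Avengers: Age of Ultron"], ["Ultron", "Tony Stark"]))
# ['Avengers: Age of Ultron', 'Avengers']
```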
Yup yup