Problem Statement
The existing emote search algorithm, while effective for finding emotes with exact name matches, suffers from a significant drawback: it prioritizes exact text matches within the default_name field over the overall popularity and prevalence of an emote. This can lead to counter-intuitive search results where rarely used emotes with exact name matches outrank popular and widely used emotes that have partial matches or are relevant through tags.
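The sketch below models the ordering described above in Rust to make the problem concrete. It assumes, purely for illustration, that the current ranking is lexicographic: an exact match on default_name always wins, and popularity only breaks ties. The Hit type, its fields and the numbers in the example are hypothetical. They are not the real search code.

```rust
use std::cmp::Ordering;

/// A hypothetical, simplified search hit (not the real emote model).
#[derive(Debug, Clone)]
pub struct Hit {
    pub name: &'static str,
    /// Whether default_name equals the query exactly.
    pub exact_match: bool,
    /// Some popularity signal, e.g. a usage count (assumed).
    pub popularity: u64,
}

/// Assumed current ordering: exact matches first, popularity only as a tie-breaker.
pub fn current_order(a: &Hit, b: &Hit) -> Ordering {
    b.exact_match
        .cmp(&a.exact_match)
        .then_with(|| b.popularity.cmp(&a.popularity))
}

/// Sorts hits with the assumed current ordering and returns their names.
pub fn rank_current(mut hits: Vec<Hit>) -> Vec<&'static str> {
    hits.sort_by(current_order);
    hits.into_iter().map(|h| h.name).collect()
}
```

Suppose the query is pepe. Under this model, a hypothetical exact match used 3 times outranks a partial match such as pepeLaugh used a million times. This is the counter-intuitive result described above.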
Examples of the Problem:
https://discord.com/channels/817075418054000661/1317190353295507519
My Proposal
I propose a multi-part (and admittedly somewhat complex) change:
- A Prevalence Metric (channel_count): Adding a new field to the emote data model to track the number of channels in which a particular emote is used.
- Modifying the Current Sorting Algorithm: Adjusting the sorting formula to incorporate the emote's prevalence alongside text relevance and overall popularity.
- Implementing Sort Mode Options: Providing users with the ability to choose between different sorting modes, allowing them to prioritize exact matches or popularity as needed.
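The sketch below shows one possible shape for the sorting change and the sort modes. It is a minimal illustration, not a tuned formula. The Candidate fields other than channel_count, the SortMode names and the weights W_TEXT, W_POPULARITY and W_PREVALENCE are all hypothetical. Popularity and channel_count are log-scaled as ln(1 + x), so very large counts cannot completely drown out text relevance. ExactFirst keeps today's behavior: it sorts exact matches first and uses the blended score only as a tie-breaker.

```rust
/// Hypothetical sort modes a client could choose between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortMode {
    /// Keep today's behavior: exact default_name matches first.
    ExactFirst,
    /// Rank purely by the blended score below.
    Popularity,
}

/// A simplified candidate; every field except channel_count is an assumption.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub name: &'static str,
    pub exact_match: bool,
    /// Text relevance normalized to 0.0..=1.0 (assumed).
    pub text_score: f64,
    /// Overall popularity signal (assumed).
    pub popularity: u64,
    /// Number of channels using the emote (the proposed field).
    pub channel_count: i32,
}

// Illustrative weights; real values would need tuning.
const W_TEXT: f64 = 10.0;
const W_POPULARITY: f64 = 1.0;
const W_PREVALENCE: f64 = 2.0;

/// Weighted sum of text relevance and log-scaled popularity and prevalence.
pub fn blended_score(c: &Candidate) -> f64 {
    // Clamp negative counts to zero before log scaling.
    let prevalence = (c.channel_count.max(0) as f64).ln_1p();
    let popularity = (c.popularity as f64).ln_1p();
    W_TEXT * c.text_score + W_POPULARITY * popularity + W_PREVALENCE * prevalence
}

/// Sorts candidates according to the chosen mode and returns their names.
pub fn rank(mut hits: Vec<Candidate>, mode: SortMode) -> Vec<&'static str> {
    hits.sort_by(|a, b| {
        // Higher blended score first.
        let by_score = blended_score(b).total_cmp(&blended_score(a));
        match mode {
            SortMode::ExactFirst => b.exact_match.cmp(&a.exact_match).then(by_score),
            SortMode::Popularity => by_score,
        }
    });
    hits.into_iter().map(|c| c.name).collect()
}
```

Consider an exact match pepe used in 2 channels and a partial match pepeLaugh used in 50,000 channels. ExactFirst returns pepe first, while Popularity returns pepeLaugh first.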
Walk-through
1) channel_count field
The objective of this field is to obtain a quantifiable measure of how widely an emote is used.
```rust
#[derive(Debug, Clone, Default, serde::Deserialize, serde::Serialize, TypesenseCollection)]
#[typesense(collection_name = "emotes")]
#[serde(deny_unknown_fields)]
pub struct Emote {
    // ... other fields ...
    /// Number of channels whose emote set currently includes this emote.
    pub channel_count: i32,
}
```
The channel_count field should be updated in response to the following events:
- When a user adds an emote to their channel's emote set.
- When a user removes an emote from their channel's emote set.
- Upon the creation of a new emote, its channel_count should be initialized to zero.
- When an emote is deleted, its channel_count becomes irrelevant. The record might be deleted or marked as deleted.
... maybe something else?
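The sketch below expresses the update rules above as a small event handler. It is an in-memory illustration only and assumes a hypothetical EmoteEvent enum and u64 emote IDs. In practice the counter would more likely be an atomic increment/decrement in the database, synced to the emotes collection. For deleted emotes it picks the "delete the record" option from the list.

```rust
use std::collections::HashMap;

/// Hypothetical emote identifier; the real ID type may differ.
pub type EmoteId = u64;

/// The events listed above (variant names are illustrative).
#[derive(Debug, Clone, Copy)]
pub enum EmoteEvent {
    Created(EmoteId),
    AddedToChannelSet(EmoteId),
    RemovedFromChannelSet(EmoteId),
    Deleted(EmoteId),
}

/// Applies one event to an in-memory map of channel counts.
pub fn apply(counts: &mut HashMap<EmoteId, i32>, event: EmoteEvent) {
    match event {
        // New emotes start at zero.
        EmoteEvent::Created(id) => {
            counts.insert(id, 0);
        }
        // One more channel uses the emote.
        EmoteEvent::AddedToChannelSet(id) => {
            *counts.entry(id).or_insert(0) += 1;
        }
        // One fewer channel; never go below zero, even if events arrive out of order.
        EmoteEvent::RemovedFromChannelSet(id) => {
            if let Some(c) = counts.get_mut(&id) {
                *c = (*c).saturating_sub(1).max(0);
            }
        }
        // The count is irrelevant once the emote is gone, so drop the record.
        EmoteEvent::Deleted(id) => {
            counts.remove(&id);
        }
    }
}
```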