#Forecasting the number of strength buyers from historical user purchase data

snow shard

Hello, I am a statistical research and development graduate student and I have an interesting problem that I would like to share. I am trying to frame the problem correctly and would appreciate any opinions.

Imagine a dataset with users (user_id) and the dates on which they purchased specific items (e.g. ['toothpaste', 'apple']). Each user has multiple purchase dates over time. If a user purchases items on sale on one day and then makes another sale purchase the next day, they get a coupon. A user who shows this purchase behavior is known as a "strength buyer".
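To make the definition concrete, here is a minimal sketch of how strength buyers could be flagged from a raw purchase log. The tuple layout, the `on_sale` flag, and the function name `strength_buyers` are assumptions made for illustration, since the actual schema is not shown here.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical purchase log: (user_id, purchase_date, item, on_sale).
# The on_sale flag is an assumption about how sale purchases are recorded.
purchases = [
    ("u1", date(2023, 3, 1), "toothpaste", True),
    ("u1", date(2023, 3, 2), "apple", True),
    ("u2", date(2023, 3, 1), "apple", True),
    ("u2", date(2023, 3, 3), "toothpaste", True),
    ("u3", date(2023, 3, 4), "apple", False),
    ("u3", date(2023, 3, 5), "apple", True),
]

def strength_buyers(rows, start, end):
    """Users with sale purchases on two consecutive days,
    where the second day falls inside [start, end]."""
    sale_days = defaultdict(set)
    for user_id, day, _item, on_sale in rows:
        if on_sale:
            sale_days[user_id].add(day)
    one_day = timedelta(days=1)
    return {
        user_id
        for user_id, days in sale_days.items()
        if any(start <= d <= end and (d - one_day) in days for d in days)
    }

buyers = strength_buyers(purchases, date(2023, 3, 1), date(2023, 3, 7))
print(sorted(buyers))
```

Running the same function over each past week would give a label per user per week, which is the quantity to be forecast.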

My goal is to forecast the total number of users who will be strength buyers in the next week, i.e. the seven days following the latest date in the dataset. To do this, I plan to build, for each user, flattened lists of the items purchased and the purchase dates, plus the last item purchased.

However, I cannot use word embedding techniques because cosine similarity requires vectors with equal dimensions, and each user will have a different number of purchases and purchase dates. Therefore, I am facing a categorical time series problem.

So, my question is, how can I generalize this problem to forecast the number of strength buyers while considering all the user data, rather than running time series analysis individually for each user?
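One possible framing, sketched below under stated assumptions, is to summarise every user's variable-length history as a fixed-length feature vector at a cutoff date, fit a single pooled classifier on all users, and then add up the predicted probabilities to get the expected number of strength buyers. The three features, the 28-day window, and the example probabilities are hypothetical choices, not something fixed by the problem.

```python
from datetime import date

# Hypothetical sale-purchase dates per user; histories differ in length.
sale_history = {
    "u1": [date(2023, 3, 1), date(2023, 3, 2), date(2023, 3, 6)],
    "u2": [date(2023, 2, 10)],
    "u3": [date(2023, 3, 5), date(2023, 3, 6), date(2023, 3, 7)],
}

def user_features(days, cutoff, window=28):
    """Summarise a variable-length history as a fixed-length vector:
    [sale purchases in the last `window` days,
     past days that directly follow another sale-purchase day,
     days since the last sale purchase]."""
    past = sorted(d for d in days if d <= cutoff)
    day_set = set(past)
    recent = [d for d in past if (cutoff - d).days < window]
    consecutive = sum(
        1 for d in past if date.fromordinal(d.toordinal() - 1) in day_set
    )
    days_since_last = (cutoff - past[-1]).days if past else window
    return [len(recent), consecutive, days_since_last]

cutoff = date(2023, 3, 7)
X = {user: user_features(days, cutoff) for user, days in sale_history.items()}

def expected_count(probabilities):
    """By linearity of expectation, the expected number of strength buyers
    is the sum of the per-user probabilities."""
    return sum(probabilities)

# Placeholder probabilities standing in for the output of a fitted classifier.
forecast = expected_count([0.6, 0.05, 0.8])
```

Training rows would come from sliding the cutoff back through the history: features computed at each past cutoff, with the label being whether the user was a strength buyer in the following seven days.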

Thank you for taking the time to read this.

thorn bobcat

Idk but would it be possible to use padding to make the vectors of equal dimension?

I use padding when I do tokenization. It's an inbuilt feature there.
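A rough sketch of what that padding could look like for purchase histories, written in plain Python rather than with a tokenizer library. The vocabulary, the choice of left-padding, and the helper name `pad_sequences` are assumptions for illustration.

```python
# Hypothetical item vocabulary; index 0 is reserved for padding.
vocab = {"<pad>": 0, "toothpaste": 1, "apple": 2}

def pad_sequences(sequences, max_len, pad_value=0):
    """Left-pad short sequences and keep only the most recent max_len
    items of long ones, so every row has exactly max_len entries.
    Assumes max_len is a positive integer."""
    padded = []
    for seq in sequences:
        seq = seq[-max_len:]
        padded.append([pad_value] * (max_len - len(seq)) + seq)
    return padded

histories = [
    ["toothpaste", "apple"],
    ["apple"],
    ["apple", "apple", "toothpaste", "apple", "apple"],
]
encoded = [[vocab[item] for item in history] for history in histories]
padded = pad_sequences(encoded, max_len=4)

# A mask marks real entries (1) versus padding (0), so a downstream model
# can ignore the padded positions.
mask = [[int(value != 0) for value in row] for row in padded]
```

Keeping a mask next to the padded rows matters because the padding value carries no information about the user and should not be treated as a purchase.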

snow shard

Potentially @thorn bobcat would that not be mathematically just ?