#timeseries
Hello @hallow dagger
Hi, I need help estimating the market size of consumer credit. I'm unsure which statistical model to use. My data is aggregated at the macro, industry, and consumer levels.
Have you looked into ARIMA, @hallow dagger?
@hallow dagger You can also look at SARIMA, which accounts for both seasonality and autocorrelation. VAR as well.
Hi, I have 8 months (2022-01-01 00:00:00 to 2022-08-31 23:00:00) of measured (not synthetic) hourly time series data. I want to forecast 1 month ahead. The ADF test says the data is stationary, with the values below:
ADF Statistic: -4.2776107655424855
p-value: 0.00048558484613883977
Critical Values: {'1%': -3.4314785489044994, '5%': -2.8620387116731525, '10%': -2.567035462232358}
However, when I apply seasonal decomposition, I see the trend changes over time. This confuses me.
Also, is it possible for the data to have both daily and weekly seasonality? If so, how should I approach this problem? How do I select the seasonality parameter (24 for daily, 168 for weekly)? Is a lower MAPE always better? Thanks.
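On the ADF question: the ADF test checks for a unit root, so rejecting it does not rule out a deterministic trend or level shifts, which can still show up in a decomposition. For hourly data with both daily (24) and weekly (168) cycles, one common approach (dynamic harmonic regression, as in fpp3) is to add Fourier terms for each period as regressors. A minimal numpy sketch under assumptions: the harmonic counts `k_daily`/`k_weekly` and the toy series are illustrative, and in practice the residuals would usually still get an ARIMA-type model.

```python
import numpy as np

def fourier_terms(n_obs, period, k, start=0):
    """Return an (n_obs, 2*k) array of sin/cos pairs for one seasonal period.

    `start` offsets the time index so future rows continue the training rows.
    """
    t = np.arange(start, start + n_obs)
    cols = []
    for j in range(1, k + 1):
        cols.append(np.sin(2 * np.pi * j * t / period))
        cols.append(np.cos(2 * np.pi * j * t / period))
    return np.column_stack(cols)

def seasonal_design(n_obs, start=0, k_daily=3, k_weekly=5):
    """Intercept + daily (24 h) + weekly (168 h) Fourier terms for hourly data.

    Keep k_weekly < 7: weekly harmonic 7 equals daily harmonic 1 (collinear).
    """
    return np.column_stack([
        np.ones(n_obs),
        fourier_terms(n_obs, period=24, k=k_daily, start=start),
        fourier_terms(n_obs, period=168, k=k_weekly, start=start),
    ])

# Toy hourly series with both cycles, purely illustrative (not the poster's data).
rng = np.random.default_rng(0)
n_train = 24 * 7 * 8                      # eight weeks of hourly observations
t = np.arange(n_train)
y = (50 + 5 * np.sin(2 * np.pi * t / 24)
     + 3 * np.cos(2 * np.pi * t / 168)
     + rng.normal(0, 0.5, n_train))

X = seasonal_design(n_train)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit

horizon = 24 * 30                          # one month ahead, hourly
X_future = seasonal_design(horizon, start=n_train)
forecast = X_future @ coef
```

The numbers of harmonics play the role of the "seasonality parameter": they can be chosen by comparing validation error (or AICc) across a few values rather than picking a single seasonal period.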
Can anyone recommend a good resource for mastering time series?
Hi @spare bridge, please share resources for time series.
I'm also searching; I'll drop one here once I find it.
Hi everyone
Use LSTM for time series.
ChatGPT is very good; it will help you.
I'm no expert, but I believe the most important part of time series work is feature engineering. I've always gotten better predictions with XGBoost on carefully chosen features.
LSTM and ARIMA were usually way too linear/simplistic when I tried them on my problems (probably also because those were multivariate problems).
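To illustrate the feature-engineering approach described above, here is a minimal numpy sketch that turns a univariate series into lag and rolling-mean features that a gradient-boosted model such as XGBoost could consume. The lag set and window size are arbitrary placeholders, not recommendations.

```python
import numpy as np

def make_lag_features(y, lags=(1, 2, 3), window=3):
    """Build a tabular feature matrix from a 1-D series for a tree model.

    Row t uses only values before t: lagged values y[t-l] and the mean of the
    previous `window` values, so no information from time t leaks in.
    """
    y = np.asarray(y, dtype=float)
    start = max(max(lags), window)          # first row with complete history
    rows, targets = [], []
    for t in range(start, len(y)):
        lag_vals = [y[t - l] for l in lags]
        roll_mean = y[t - window:t].mean()  # mean of y[t-window] .. y[t-1]
        rows.append(lag_vals + [roll_mean])
        targets.append(y[t])
    return np.array(rows), np.array(targets)

X, target = make_lag_features([1, 2, 3, 4, 5, 6], lags=(1, 2), window=3)
# X[0] corresponds to t=3: lags [3, 2] and the rolling mean of [1, 2, 3] = 2.0
```

`X` and `target` can then be passed to any tabular regressor (for example `xgboost.XGBRegressor().fit(X, target)`), with train/validation splits made by time rather than at random.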
Hi @spare bridge, if you prefer an all-in-one book, I recommend this new one:
https://www.amazon.com/Modern-Time-Forecasting-Python-industry-ready/dp/1803246804
A fairly complete practical reference, from fundamentals to advanced concepts like transformers for time-series forecasting, that helps anyone quickly master forecasting use cases.
There is also this course, which was my main source for learning time series:
https://otexts.com/fpp3/regression.html
Although it's in R, the theory applies even if you don't use R.
Hey everyone, I'm putting together a time series project and was wondering if anyone knows... exactly how "real-time" is the Yahoo Finance API? Are we talking up to the second? Just wondering if anyone knows.
I think it's a 15-minute delay because it uses screen scraping... However, Alpha Vantage has a real-time one! The problem is that you can only make 500 calls a day (I think? The number might be different) in the basic version and you need to pay to make more.
Indeed, this book is very good. Some details are explained in more depth elsewhere, but it's a good place to start with time series.
Hi all, I created a notebook on time series clustering using a statistical method called functional data analysis (FDA). Please take a look and let me know what you think if you're interested: https://www.kaggle.com/code/yuqizheng/time-series-clustering-with-fda
Hello, I am working on a time series forecasting project and decided to build a CNN-Transformer model. I didn't find much existing code for it, so I built the model from scratch. The model works, but it performs worse than other models such as CNN-LSTM or LSTM-Attention. Could someone first tell me whether the model is built correctly, and then how I can improve it further?
```python
from tensorflow.keras.layers import (Add, Conv1D, Dense, Dropout, GlobalAveragePooling1D,
                                     Input, LayerNormalization, MultiHeadAttention)
from tensorflow.keras.models import Model

def transformer_encoder(inputs, model_dim, num_heads, ff_dim, dropout_rate):
    # Multi-head self-attention (query = value = inputs)
    attention_output = MultiHeadAttention(num_heads=num_heads, key_dim=model_dim,
                                          dropout=dropout_rate)(inputs, inputs)
    attention_output = Dropout(dropout_rate)(attention_output)
    # Residual connection + layer normalization (post-norm)
    attention_output = Add()([inputs, attention_output])
    attention_output = LayerNormalization(epsilon=1e-6)(attention_output)
    # Position-wise feed-forward network with its own residual + norm
    ffn_output = Dense(ff_dim, activation="relu")(attention_output)
    ffn_output = Dropout(dropout_rate)(ffn_output)
    ffn_output = Dense(model_dim, activation="linear")(ffn_output)
    ffn_output = Add()([attention_output, ffn_output])
    ffn_output = LayerNormalization(epsilon=1e-6)(ffn_output)
    return ffn_output

# Model parameters
input_shape = (24, 4)   # 24 time steps, 4 features
num_filters = 64
kernel_size = 3
model_dim = 64          # equals the last Conv1D's filters, so the residual Add() shapes match
num_heads = 8
ff_dim = 100
dropout_rate = 0.1

# Model building
inputs = Input(shape=input_shape)
# Note: neither Conv1D has an activation, so their composition is still linear
cnn_output = Conv1D(filters=num_filters, kernel_size=kernel_size, padding="same")(inputs)
cnn_output = Conv1D(filters=model_dim, kernel_size=kernel_size, padding="same")(cnn_output)
transformer_output = transformer_encoder(cnn_output, model_dim=model_dim, num_heads=num_heads,
                                         ff_dim=ff_dim, dropout_rate=dropout_rate)
transformer_output = GlobalAveragePooling1D()(transformer_output)
transformer_output = Dense(64, activation="linear")(transformer_output)
# Predicting 24 future values (two stacked linear Dense layers act as one linear map)
outputs = Dense(24, activation="linear")(transformer_output)
model = Model(inputs=inputs, outputs=outputs)
```
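One thing worth checking in the model above: it has no positional encoding. Self-attention on its own is permutation-equivariant, so the only ordering information comes from the kernel-size-3 Conv1D layers. A common remedy is to add the fixed sine/cosine position encodings from "Attention Is All You Need" to the Conv1D output before the encoder. A numpy sketch follows; whether it actually improves accuracy on this data is untested, and the Keras wiring in the final comment is an assumption to verify.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, model_dim):
    """Fixed sine/cosine position encodings, shape (seq_len, model_dim).

    Even columns use sin, odd columns use cos; column pair (2i, 2i+1) has
    angular frequency 1 / 10000**(2i / model_dim).
    """
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    i = np.arange(model_dim)[None, :]                           # (1, model_dim)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / model_dim)
    angles = positions * angle_rates                            # (seq_len, model_dim)
    pe = np.zeros((seq_len, model_dim))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe.astype("float32")

pe = sinusoidal_positional_encoding(seq_len=24, model_dim=64)
# Possible (untested) wiring in the model above, right after the second Conv1D:
#   cnn_output = cnn_output + pe   # broadcasts over the batch dimension
```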
I don't know much about that, but I definitely want to see the answer.
Hi everyone,
I'm currently working on time series forecasting for budget predictions and am facing challenges due to limited data. I need to predict a 365-day horizon, but I have only four months of daily data. I've already explored several models, including ARIMA, SARIMAX, Prophet, and some custom LSTM and hybrid models. I've also tried TIDE and TFT; experiments with synthetic data yielded promising results, but I'm struggling to achieve similar outcomes on my real dataset. Could anyone suggest a model or approach that performs well under such constraints? Any tips or insights would be greatly appreciated!
Thank you!
Hi,
Has anyone worked on wind speed forecasting using deep learning before?
Hi, can anyone here please share a link to a tutorial on univariate time series transformers or GRUs that can actually be replicated? That would be a great help. I'm shocked to see so many claps on Medium articles that are nowhere close to replicable. In fact, some of them are quite misleading.
Hi, could you please share the TIDE code?
hi
I'm currently studying time series and would like to share a notebook I made about classical time series decomposition. Feedback is appreciated, thanks! https://www.kaggle.com/code/caiomaxximus/time-series-decomposition-bike-sales-america
Hi All,
I am working with a client who expects me to forecast a commodity price on a monthly basis, but they can only provide monthly data for the past 5 years (60 data points).
I have tried Holt-Winters, ARIMA, SARIMAX, LSTM, LR, and Prophet, but the accuracy is not up to the mark.
What is the minimum number of data points required for monthly forecasting?
Could I please get help with the correct approach here?
If it's a trend-plus-seasonality pattern, 5 years should be enough. But there are probably other factors and a degree of randomness involved. I'm not sure daily granularity would improve things unless you're going to use some sort of news feed as an input.
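The "trend plus seasonality" case above can be made concrete as a linear regression on a time index plus month-of-year dummies: 13 parameters (intercept, slope, 11 dummies), which 60 monthly points can support. A minimal numpy sketch on a made-up series; a holdout backtest against a seasonal naive forecast would still be needed to judge whether it beats the models already tried.

```python
import numpy as np

def trend_season_design(n_obs, start=0, period=12):
    """Columns: intercept, linear time trend, and (period - 1) month dummies.

    The first month of the cycle is the baseline, so the dummies are not
    collinear with the intercept.
    """
    t = np.arange(start, start + n_obs)
    month = t % period
    dummies = np.stack([(month == m).astype(float) for m in range(1, period)], axis=1)
    return np.column_stack([np.ones(n_obs), t.astype(float), dummies])

# Illustrative 60-month series (5 years): level + trend + fixed monthly pattern.
pattern = np.array([0, 2, 5, 3, 1, -1, -3, -4, -2, 0, 1, 2], dtype=float)
t = np.arange(60)
y = 100 + 0.5 * t + pattern[t % 12]

X = trend_season_design(60)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS fit
forecast_next_12 = trend_season_design(12, start=60) @ coef
```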
Hi,
What is the best approach to predict the next day's weather state (cloudy, sunny, or partly sunny) based only on current and historical solar irradiance, temperature, and clear-sky index data? I would use it for one-day-ahead solar irradiance forecasting.
Thanks,
Hi, I'm Abdullah, an ML engineer, and I want to join a team to participate in Kaggle competitions.
Thanks.
BSTS
DM me
I am interested in trading and healthcare.
I'd like everyone's opinions on this topic in the LinkedIn comments.
Hi folks, I'm finishing my project as part of my data science studies and need some help with the LSTM model I use. Everything works except the confusion matrix, which is broken. I'd really appreciate help with my Kaggle notebook: https://www.kaggle.com/code/sheroleg/naya-final-lstm (if possible, please comment on the notebook rather than here).
Need help with time series modeling.
Data:
Project year Month MoneyLeft
prj1 2024 1 1000
prj1 2024 2 800
prj1 2024 3 400
prj1 2024 4 100
prj2 2022 3 5000
prj2 2022 4 3493
prj2 2022 5 2000
prj2 2022 6 1000
Fabricate this for 10 to 20 projects; each project can span 12 to 18 months.
For a new project, given MoneyLeft for 2 or 3 months, the model should predict MoneyLeft for the next 4 months.
Models like ARIMA, SARIMA, exponential smoothing, etc. handle only one seasonality or trend, which means we can train them only on a single project.
1. One solution is to convert this time series problem into a regression problem: create 3-month lags or windows and predict the next 4 months. The problem is that the model will train only on those lags or windows; it should also take the project name into account (I don't know how to do that).
2. The other solution would be to train a model for each project, which is not feasible in this case.
How should I do this?
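The first option above (recasting the problem as regression on lag windows) is often handled as a single "global" model trained on windows pooled from all projects. For a new project the project name carries no learned information, so one hedged alternative is to give the model project-level features instead, e.g. values scaled by the project's first observed MoneyLeft and how many months into the project the window starts. Those feature choices are assumptions for illustration; `X` and `Y` could feed any multi-output regressor.

```python
import numpy as np

def make_windows(projects, n_in=3, n_out=4):
    """Pool sliding windows from every project into one supervised dataset.

    projects: dict mapping a project name to its MoneyLeft values in month order.
    Each row of X holds n_in consecutive values divided by the project's first
    value, plus the window's start month within the project; each row of Y holds
    the next n_out scaled values. Scaling lets projects with different budgets
    share one model (assumes the first value is non-zero).
    """
    X, Y = [], []
    for values in projects.values():
        v = np.asarray(values, dtype=float)
        scale = v[0]                                   # first observed MoneyLeft
        for s in range(len(v) - n_in - n_out + 1):
            window = v[s:s + n_in] / scale
            target = v[s + n_in:s + n_in + n_out] / scale
            X.append(np.append(window, s))             # s = months since start
            Y.append(target)
    return np.array(X), np.array(Y)

# The two sample projects from the post; with only 4 months each, use 2-in / 2-out.
projects = {
    "prj1": [1000, 800, 400, 100],
    "prj2": [5000, 3493, 2000, 1000],
}
X, Y = make_windows(projects, n_in=2, n_out=2)
```

Predictions come back in scaled units, so multiply by the new project's first MoneyLeft value to return to currency.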
#timeseries Attaching a text on a quantum approach to time series. Comments appreciated.
If you're interested, please go through my thought paper above.
Hello everyone, I'm working on an early warning score prediction model using an LSTM as the prediction model. The dataset I was able to get has six vital-sign features and two demographic features. My main aim is to predict a patient's next two to three days.
The dataset has multiple patients with varying numbers of days of entries (some had their vitals taken for three days, some for four, and so on). I've been wondering whether this kind of dataset is suitable for training a model that can take any vital sign and predict what it will be over the coming days.
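Different numbers of days per patient don't by themselves rule out an LSTM. A usual approach is to pad each patient's sequence to a common length and keep a mask so padded steps are ignored, for example through a Keras `Masking` layer or PyTorch's `torch.nn.utils.rnn.pack_padded_sequence`. A minimal numpy sketch of the padding step; the patients and shapes below are hypothetical, mirroring the six vitals plus two demographics described above.

```python
import numpy as np

def pad_sequences_with_mask(sequences, pad_value=0.0):
    """Stack variable-length (days, features) arrays into one padded batch.

    Returns (batch, mask): batch has shape (n_patients, max_days, n_features)
    and mask[i, d] is True only for days that patient i actually has.
    """
    max_len = max(len(s) for s in sequences)
    n_feat = np.asarray(sequences[0]).shape[1]
    batch = np.full((len(sequences), max_len, n_feat), pad_value, dtype=float)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=float)
        batch[i, :len(seq)] = seq              # real days first, padding after
        mask[i, :len(seq)] = True
    return batch, mask

# Hypothetical patients with 3 and 4 days of 8 features (6 vitals + 2 demographics).
rng = np.random.default_rng(0)
patients = [rng.normal(size=(3, 8)), rng.normal(size=(4, 8))]
batch, mask = pad_sequences_with_mask(patients)
```

Splitting train and test by patient (not by day) keeps the evaluation honest.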
Can someone help me make a submission for this competition? https://www.kaggle.com/c/store-sales-time-series-forecasting/data
I have already built a good model on the training dataset, but I don't understand how to use the test data to make a submission. Please ping me if you can help.
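In most Kaggle forecasting competitions the workflow is: build features for the rows in `test.csv` exactly as for training, predict, and write one row per test `id` in the format of `sample_submission.csv`. A sketch of the last step with the standard `csv` module; the `id,sales` header is an assumption based on this competition's sample file and should be checked against it, and the ids in the usage line are hypothetical.

```python
import csv
import io

def write_submission(test_ids, predictions, fh):
    """Write one row per test id with its predicted sales, in test-set order."""
    if len(test_ids) != len(predictions):
        raise ValueError("need exactly one prediction per test id")
    writer = csv.writer(fh)
    writer.writerow(["id", "sales"])          # header assumed from sample_submission.csv
    for row_id, pred in zip(test_ids, predictions):
        writer.writerow([row_id, max(float(pred), 0.0)])  # sales cannot be negative

# Usage with hypothetical ids; in practice open("submission.csv", "w", newline="").
buf = io.StringIO()
write_submission([101, 102], [1.5, -0.2], buf)
```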
When dealing with time series data, how do we know whether there is serial dependence in the data? Is it a question of domain knowledge, or should we check each time with methods like lag features and time-step features?
You can try to plot Autocorrelation (ACF), Partial Autocorrelation (PACF), and Cross Correlation (CCF) https://business-science.github.io/timetk/reference/plot_acf_diagnostics.html
Thanks, will look into this
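The ACF suggestion above is a direct check for serial dependence: for white noise, roughly 95% of sample autocorrelations fall within ±1.96/√n, so spikes outside that band point to dependence at those lags. A small numpy sketch of the computation; plotting tools such as timetk or statsmodels' `plot_acf` do the same and draw the band for you.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    r_k = sum_{t=k}^{n-1} (x_t - mean)(x_{t-k} - mean) / sum_t (x_t - mean)^2
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom for k in range(max_lag + 1)])

def white_noise_band(n):
    """Approximate 95% band: |r_k| beyond 1.96/sqrt(n) hints at serial dependence."""
    return 1.96 / np.sqrt(n)

x = np.array([1.0, -1.0] * 50)      # perfectly alternating series, n = 100
r = sample_acf(x, max_lag=3)        # r[1] = -0.99, far outside the ±0.196 band
```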
If anyone here has worked with the ROCKET transform for time series, or with feature extraction using random convolutional kernels, it'd be great if you could DM me; I need advice on a project. Thanks!
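For reference, the core of ROCKET is small: sample many random dilated convolution kernels (length 7, 9 or 11, mean-centred normal weights, uniform bias in [-1, 1], exponentially scaled dilation, padding on or off at random) and keep two features per kernel, the proportion of positive values (PPV) and the maximum. The simplified numpy sketch below follows that recipe but is not the reference implementation; the paper's default is on the order of 10,000 kernels followed by a ridge classifier, and libraries such as sktime ship a full `Rocket` transformer.

```python
import numpy as np

def random_kernels(n_kernels, series_len, rng):
    """Sample ROCKET-style kernels: (weights, bias, dilation, padding).

    Assumes series_len > 11 so every kernel length fits.
    """
    kernels = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(0, 1, length)
        weights -= weights.mean()                       # mean-centred weights
        bias = rng.uniform(-1, 1)
        max_exp = np.log2((series_len - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))    # kernel span fits the series
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels

def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution of one series; returns (ppv, max) features."""
    x = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    out = np.array([
        np.dot(weights, x[i:i + span + 1:dilation]) + bias
        for i in range(len(x) - span)
    ])
    return np.mean(out > 0), out.max()

def rocket_features(X, kernels):
    """Map each series (row of X) to 2 * n_kernels features."""
    return np.array([[f for k in kernels for f in apply_kernel(x, *k)] for x in X])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))                # 5 toy series of length 100
kernels = random_kernels(20, X.shape[1], rng)
features = rocket_features(X, kernels)       # shape (5, 40)
```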
Hi, what happens when the data is not sampled at regular intervals? Take ADS-B data, for example (basically flight logs broadcast at irregular intervals): how would we analyze this data if it isn't spaced at regular intervals? Also, would it make sense to model this kind of data with time series methods, or should we use other features like position, altitude, and velocity to predict a flight's position at a given time?
What methods can be used to build a system that flags outliers/spoofing in flight data, such as a sudden unexplained change in speed/altitude, or times when the reported latitude/longitude doesn't make sense for the path being taken (a common issue with ADS-B)?
Any suggestions or help is appreciated 🙂
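For the outlier/spoofing question, one simple physics-based check (a starting point, not a full spoofing detector) is to compute the implied ground speed between consecutive position reports from the haversine distance and the actual time gap, and flag anything no aircraft could plausibly fly. Because it uses the real time gap, it works on irregularly spaced reports without resampling. The 400 m/s threshold and the sample reports are hypothetical; the same rate idea applies to altitude.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def flag_implausible_jumps(reports, max_speed_mps=400.0):
    """Return indices of reports whose implied speed from the previous one is too high.

    reports: list of (timestamp_s, lat_deg, lon_deg), sorted by time, any spacing.
    A single spoofed point is usually flagged twice: the jump out and the jump back.
    """
    flagged = []
    for i in range(1, len(reports)):
        t0, lat0, lon0 = reports[i - 1]
        t1, lat1, lon1 = reports[i]
        dt = t1 - t0
        if dt <= 0:
            flagged.append(i)            # duplicate or out-of-order timestamp
            continue
        if haversine_m(lat0, lon0, lat1, lon1) / dt > max_speed_mps:
            flagged.append(i)
    return flagged

# Hypothetical track: report 2 jumps about 440 km in 15 s, then jumps back.
reports = [(0, 51.0, 0.0), (10, 51.02, 0.0), (25, 55.0, 0.0), (40, 51.05, 0.0)]
```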
Hey everyone!
I’m working on the CMI – Detect Behavior with Sensor Data Kaggle competition, where the goal is to classify BFRB vs non‑BFRB behaviors using wrist-worn sensor data (TOF, IMU, pressure, etc.)
https://www.kaggle.com/competitions/cmi-detect-behavior-with-sensor-data
I’ve trained an LSTM using PyTorch and got surprisingly strong results (accuracy = 93%), which makes me worry about potential data leakage or preprocessing issues...
Here’s what I did to avoid leakage:
- Split data by sequence ID, with no overlap between train and test
- Fit MinMaxScaler only on the training set, then applied it to both
- Replaced NaN, -1, and inf values with 0 before scaling
However, since 0 is a valid sensor reading, replacing missing/invalid values with 0 might introduce bias. I'm unsure whether I should switch to median, KNN, or use masking instead.
If anyone has experience with sensor data or wants to take a look at the code, I’d really appreciate the help, and I’m happy to include collaborators in the Kaggle submission team! Just DM me or reply here.
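On the imputation question: one option that avoids confusing a genuine 0 with "missing" is to impute with a training-set statistic (the median here) and add binary missing-indicator features, which is close to the masking idea. A numpy sketch assuming the data is flattened to (rows, features) and that NaN, ±inf and -1 are the invalid codes, as in the post; whether median, KNN or masking works best would need to be validated on held-out sequences.

```python
import numpy as np

def impute_with_mask(train, test, invalid_values=(-1.0,)):
    """Median-impute invalid readings using train statistics, and add mask columns.

    NaN, +/-inf and any value in `invalid_values` count as missing. Medians are
    computed on the training split only, so the test split cannot leak into the
    fill values. Each output gets one extra column per feature (1 = was missing).
    """
    def to_nan(a):
        a = np.array(a, dtype=float)           # copy so the inputs are not modified
        a[~np.isfinite(a)] = np.nan
        for v in invalid_values:
            a[a == v] = np.nan
        return a

    train, test = to_nan(train), to_nan(test)
    medians = np.nanmedian(train, axis=0)     # per-feature medians from train only
    out = []
    for a in (train, test):
        missing = np.isnan(a)
        filled = np.where(missing, medians, a)
        out.append(np.hstack([filled, missing.astype(float)]))
    return out[0], out[1]

train_out, test_out = impute_with_mask(
    [[0.0, 5.0], [2.0, -1.0], [np.nan, 7.0]],   # the genuine 0.0 stays 0.0, mask 0
    [[np.inf, 6.0]],
)
```

MinMaxScaler can then be fit on the imputed training columns as before.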
Job Title: Part-Time Senior AI/ML Engineer (Remote)
We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.
Requirements:
- Minimum of 7–10 years of professional software development experience
- Proven experience working effectively in a remote environment
- Advanced English proficiency (C1 or higher); an American accent is preferred
- Availability to work 10–15 hours per week during EST or CST business hours
If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384
Hi, @everybody
I have a question. I'm training ML models for prediction on a 3-class classification problem where the classes have similar numbers of samples, but the predictions are skewed.
The first and second classes are predicted, though with low precision, while the third class is never predicted. What's the reason? I can't figure it out.
Before this, I applied reinforcement learning with the three classes assigned to three actions, and one action was never selected there either.
This is actually a prediction model for forex EUR/USD.
Has there ever been a competition that involved images of time series charts rather than time series data?
Hey, I’m also interested in quant finance!
I’ve worked on a similar multi-class problem before, and from my experience it can help to split it into two binary models instead of one 3-class model.
For example, instead of predicting (long/neutral/short), try:
Model 1: short vs not short
Model 2: long vs not long
This often gives better probability calibration, and during feature selection you can see which features are more useful for each direction (long/short).
I’d be happy to discuss it or help you out with your model if you’d like - feel free to dm me!
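The two-model setup described above might look like this: derive two binary targets from the three-class labels, train any binary classifier on each, then combine the two predicted probabilities with a decision rule. The threshold and the tie-break (the more confident side wins) are assumptions; tuning the threshold on validation data is also a direct way to control how often each side gets predicted.

```python
import numpy as np

def one_vs_rest_targets(labels):
    """Turn long/neutral/short labels into (short vs not, long vs not) targets."""
    labels = np.asarray(labels)
    return (labels == "short").astype(int), (labels == "long").astype(int)

def combine(p_short, p_long, threshold=0.5):
    """Decide per sample from the two models' probabilities.

    Take a side only when its model clears the threshold; if both do, pick the
    more confident one; otherwise stay neutral.
    """
    p_short, p_long = np.asarray(p_short), np.asarray(p_long)
    decision = np.full(p_short.shape, "neutral", dtype=object)
    long_ok, short_ok = p_long >= threshold, p_short >= threshold
    decision[long_ok & ~short_ok] = "long"
    decision[short_ok & ~long_ok] = "short"
    both = long_ok & short_ok
    decision[both] = np.where(p_long[both] >= p_short[both], "long", "short")
    return decision

y_short, y_long = one_vs_rest_targets(["long", "neutral", "short"])
decision = combine(p_short=[0.2, 0.7, 0.6, 0.3], p_long=[0.8, 0.1, 0.65, 0.4])
```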
https://media.discordapp.net/attachments/1436719817624256534/1436719913518633010/1.JPG?ex=6910a130&is=690f4fb0&hm=6a48397700e40b701b7defba0bc73ccc590e83e58af09eb7035cae318e9fb319&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914034659408/2.jpg?ex=6910a130&is=690f4fb0&hm=5d3c01e3db0b2fe7135969c69c22cbf49db07bae5ed8cb9a98ac3e18d3c73ce5&=&format=webp&width=515&height=687
https://media.discordapp.net/attachments/1436719817624256534/1436719914512547951/3.jpg?ex=6910a130&is=690f4fb0&hm=59a326eaa4d74733a406431b5c2eb8ee07f6b78d95094102deb1153d2e261407&=&format=webp&width=515&height=687
I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.
I am working on a time series project, and the goal is to predict cinema audience counts.
I have been given 5 datasets.
Is there anyone who would like to join me?
Yes please, I'm interested.
When will you be free?
Hi everyone, I am currently working on my thesis involving time series modeling of stock returns.
I estimated an ARMA(1,2)–GARCH(1,1) model with GED errors in EViews. From the model, I obtained the standardized residuals and the GED shape parameter. Since EViews does not provide an Anderson–Darling goodness-of-fit test, I exported the standardized residuals to R.
My goal is to test whether the standardized residuals follow the GED distribution. Below is the R code I used:
library(readxl)
library(fGarch)
library(goftest)
data <- read_excel("C:/Users/myusername/Skripsi/1. data/resid01.xlsx")
z <- na.omit(data$resid01)
# Ljung-Box tests
Box.test(z, lag = 20, type = "Ljung-Box")
Box.test(z^2, lag = 20, type = "Ljung-Box")
# Anderson-Darling test for GED
nu <- 1.127493 # shape parameter from EViews
ad.test(z,
null = pged,
mean = 0,
sd = 1,
nu = nu)
# QQ-plot vs GED
n <- length(z)
p <- ppoints(n)
q_theoretical <- qged(p, mean = 0, sd = 1, nu = nu)
z_sorted <- sort(z)
plot(q_theoretical, z_sorted,
main = "QQ-Plot Standardized Residual vs GED",
xlab = "Theoretical GED Quantiles",
ylab = "Sample Quantiles")
abline(0,1,col="red",lwd=2)
Results
- The Anderson–Darling test does not reject GED (p > 0.05).
- The QQ-plot fits well in the center, but there are visible deviations in the tails.
Questions
- Is this a correct way to apply the Anderson–Darling test for GED on standardized residuals from a GARCH(GED) model estimated in EViews?
- I found other implementations of the AD test that reject GED, while my code accepts it. Why might different implementations produce opposite results?
- How should I interpret tail deviations in the QQ-plot when the AD test does not reject the distribution?
- For model validation, should I rely more on the AD test result or the tail behavior in the QQ-plot?
Any guidance would be greatly appreciated. Thank you.
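On why implementations can disagree: textbook Anderson-Darling p-values assume the null CDF is fully specified in advance. Here the GED shape ν (and the GARCH parameters behind the standardized residuals) were estimated from the same data, which generally makes the standard p-value too large, i.e. the test is conservative. Implementations that correct for estimated parameters, or that use a different GED parameterization, can therefore reach different conclusions; a parametric bootstrap of the statistic is one way to get a calibrated p-value. The statistic itself is simple; below is a Python sketch with a pluggable CDF, where the standard normal is only a stand-in for testing.

```python
import math

def anderson_darling(z, cdf):
    """Anderson-Darling statistic A^2 of sample z against a fully specified CDF.

    A^2 = -n - (1/n) * sum_{i=1}^{n} (2i - 1) * [ln F(z_(i)) + ln(1 - F(z_(n+1-i)))]
    """
    u = sorted(cdf(v) for v in z)       # CDF is monotone, so this sorts z too
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))
```

With SciPy, a unit-variance GED CDF should be `scipy.stats.gennorm(beta=nu, scale=math.sqrt(math.gamma(1 / nu) / math.gamma(3 / nu))).cdf`, which is intended to match `fGarch::pged(..., sd = 1)`; it is worth checking numerically that the two parameterizations agree before comparing results.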
Explore this dataset for time series: https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset
- 5 years daily gold futures (GC=F) data from Yahoo Finance with complete OHLCV
- Clean, ready-to-use for LSTM/GRU, ARIMA, Prophet time-series forecasting models
- 11 pre-computed technical indicators: MA7/30/90, RSI, MACD, Bollinger Bands, volatility
- No missing values, properly scaled features for immediate ML experimentation
🔗 [Starter Notebook created]: EDA, technical plots, and an LSTM baseline with RMSE evaluation
Hey everyone 👋
I’ve shared a clean, ready-to-use dataset covering Microsoft (MSFT) stock data from IPO to 2026 📈
https://www.kaggle.com/datasets/shauryasrivastava01/microsoft-all-time-stock-datalatest
🔹 Includes:
Open, High, Low, Close prices
Adjusted Close
Volume
Clean date formatting
💡 Perfect for:
Time series analysis
Stock price prediction models
EDA & visualization projects
Feature engineering practice
Shaurya, this is awesome, thanks for sharing. Having decades of data means you can really stress-test time series models across multiple market cycles (dot-com bubble, 2008, COVID crash, etc.).
A few ideas for anyone picking this up:
• Try decomposing the trend/seasonality using statsmodels
• Rolling volatility windows make for great features in prediction models
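Following up on the rolling-volatility idea, here's a minimal numpy sketch: annualised rolling standard deviation of daily log returns, computed only from past and current prices so it can be used as a feature without look-ahead. The 20-day window and 252 trading days per year are common conventions, assumed here rather than prescribed.

```python
import numpy as np

def rolling_volatility(close, window=20, periods_per_year=252):
    """Annualised rolling standard deviation of daily log returns.

    Entry j uses the `window` returns ending at close[window + j], so the
    output is aligned to close[window:] and never looks ahead.
    """
    close = np.asarray(close, dtype=float)
    log_ret = np.diff(np.log(close))                     # r_t = ln(P_t / P_{t-1})
    windows = np.lib.stride_tricks.sliding_window_view(log_ret, window)
    return windows.std(axis=1, ddof=1) * np.sqrt(periods_per_year)
```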