#timeseries

hallow dagger
#

Hi

outer oracle
#

Hello @hallow dagger

hallow dagger
#

Hi, I need help estimating the market size of consumer credit, and I am unsure which statistical model to use. My data is aggregated at the macro, industry, and consumer levels.

brave eagle
#

Have you looked into ARIMA, @hallow dagger?

mossy saffron
#

@hallow dagger You can also look at SARIMA - it takes into account both seasonality and autocorrelation. VAR as well

chilly linden
#

Hi, I have 8 months (2022-01-01 00:00:00 to 2022-08-31 23:00:00) of measured (not synthetic) hourly time series data, and I want to forecast 1 month ahead. ADF says the data is stationary, with the values below:
ADF Statistic: -4.2776107655424855
p-value: 0.00048558484613883977
Critical Values: {'1%': -3.4314785489044994, '5%': -2.8620387116731525, '10%': -2.567035462232358}
However, when I apply seasonal decomposition, I see the trend changes over time. This confuses me.
Also, is it possible that the data has both daily and weekly seasonality? If so, how should I approach this problem? How should I select the seasonality parameter (24 for daily, 168 for weekly)? Is a lower MAPE always better? Thanks.
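
On the daily-plus-weekly question: hourly data can carry both a 24-hour and a 168-hour cycle at once. One common way to represent both, sketched below, is to encode each period as a few sine/cosine (Fourier) terms and pass them as extra regressors to whatever model is used. The function name, the number of harmonics, and the use of an integer hour index are assumptions made for illustration.

```python
import math

def fourier_features(t, periods=(24, 168), order=2):
    """Return sin/cos terms for hour index t, one pair per harmonic per period."""
    feats = []
    for period in periods:
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            feats.append(math.sin(angle))
            feats.append(math.cos(angle))
    return feats

# Build one feature row for each of the first 3 hours of the series.
rows = [fourier_features(t) for t in range(3)]
print(len(rows[0]))  # 2 periods * 2 harmonics * 2 (sin, cos) = 8
```

With this encoding there is no need to pick a single seasonal period; the model sees both cycles, and the number of harmonics controls how sharp each seasonal shape can be.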

spare bridge
#

Who can recommend a good resource for mastering time series?

simple shoal
#

Hi @spare bridge, please share resources for time series.

midnight kiln
#

Hi everyone

rigid jewel
#

ChatGPT is very good; it will help you.

steep swan
#

I'm no expert, but I believe the most important part of time series work is feature engineering. I've always had better predictions with XGBoost on carefully chosen features.

#

LSTM & ARIMA were usually way too linear/simplistic a regression when I tried using them on my problems (probably also because those were multivariate problems).

ionic crane
#

A fairly complete practical reference from fundamentals to advanced concepts like transformers for time-series forecasting, to quickly mastering time series forecasting use cases for everyone.

steep swan
#

Although it's in R, the theory is all there even if you don't use R.

opal hollow
#

Hey everyone, I'm putting together a time series project and I was wondering if anyone knows exactly how "real-time" the Yahoo Finance API is. Are we talking up to the second?

wooden daggerBOT
#
_shahin0519 has been warned

Reason: Bad word usage

languid solstice
#

Hello, I am working on a time series forecasting project and decided to build a CNN-Transformer model. I did not find much existing code for it, so I built the model from scratch. It works, but it performs worse than other models such as CNN-LSTM or LSTM-Attention. Could someone tell me whether the model is built correctly and how I can improve it further?

`# Imports assumed to come from tf.keras; the original post omitted them
from tensorflow.keras.layers import (Add, Conv1D, Dense, Dropout, GlobalAveragePooling1D,
                                     Input, LayerNormalization, MultiHeadAttention)
from tensorflow.keras.models import Model

def transformer_encoder(inputs, model_dim, num_heads, ff_dim, dropout_rate):
    # Multi-head self-attention, then residual connection and layer norm
    attention_output = MultiHeadAttention(num_heads=num_heads, key_dim=model_dim, dropout=dropout_rate)(inputs, inputs)
    attention_output = Dropout(dropout_rate)(attention_output)
    attention_output = Add()([inputs, attention_output])
    attention_output = LayerNormalization(epsilon=1e-6)(attention_output)
    # Position-wise feed-forward network, then residual connection and layer norm
    ffn_output = Dense(ff_dim, activation="relu")(attention_output)
    ffn_output = Dropout(dropout_rate)(ffn_output)
    ffn_output = Dense(model_dim, activation="linear")(ffn_output)
    ffn_output = Add()([attention_output, ffn_output])
    ffn_output = LayerNormalization(epsilon=1e-6)(ffn_output)
    return ffn_output

# Model parameters
input_shape = (24, 4)  # 24 time steps, 4 features
num_filters = 64
kernel_size = 3
model_dim = 64
num_heads = 8
ff_dim = 100
dropout_rate = 0.1

# Model building
inputs = Input(shape=input_shape)
cnn_output = Conv1D(filters=num_filters, kernel_size=kernel_size, padding="same")(inputs)
cnn_output = Conv1D(filters=model_dim, kernel_size=kernel_size, padding="same")(cnn_output)
transformer_output = transformer_encoder(cnn_output, model_dim=model_dim, num_heads=num_heads, ff_dim=ff_dim, dropout_rate=dropout_rate)
transformer_output = GlobalAveragePooling1D()(transformer_output)
transformer_output = Dense(64, activation="linear")(transformer_output)

# Predicting 24 future values (fixed: the original line had an extra closing parenthesis)
outputs = Dense(24, activation="linear")(transformer_output)
model = Model(inputs=inputs, outputs=outputs)`
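
One thing the encoder above does not include is an explicit position signal. Self-attention on its own does not know the order of the 24 time steps, and the global average pooling afterwards discards order as well, so the only ordering information comes from the local Conv1D kernels. A minimal sketch of the standard sinusoidal positional encoding follows, written in plain Python so it runs without TensorFlow. The idea of adding this table to `cnn_output` before the encoder is a suggestion to test, not something the original post does, and whether it closes the gap to CNN-LSTM is an open question.

```python
import math

def positional_encoding(seq_len, model_dim):
    """Sinusoidal table: even columns use sin, odd columns use cos."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(model_dim):
            # Each (sin, cos) column pair shares one frequency.
            freq = 1.0 / (10000 ** ((2 * (i // 2)) / model_dim))
            angle = pos * freq
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# Shape matches the post's settings: 24 time steps, model_dim = 64.
pe = positional_encoding(seq_len=24, model_dim=64)
print(len(pe), len(pe[0]))
```

The resulting 24 x 64 table has the same shape as the convolution output for one sample, so it could be added element-wise to it as a constant.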

ancient merlin
#

Hi everyone,

I'm currently working on time series forecasting for budget predictions and am facing challenges due to the limited data available. I need to predict for a horizon of 365 days, but I have only four months of daily data. I've already explored several models, including ARIMA, SARIMAX, Prophet, and some custom models using LSTM and hybrid approaches. I've also tried TIDE and TFT, experimenting with synthetic data which yielded promising results; however, I'm struggling to achieve similar outcomes with my real dataset. Could anyone suggest a model or approach that performs well with such constraints? Any tips or insights would be greatly appreciated!

Thank you!

languid solstice
#

Hi,
Has anyone worked on wind speed forecasting using deep learning?

gentle badge
#

Hi, can anyone here please share a link to a tutorial on univariate time series transformers or GRUs that can be replicated? That would be a great help. I am shocked to see so many claps on Medium articles that are nowhere close to being replicable. In fact, some of them are quite misleading.

lean fog
#

hi

mild abyss
#

Hi All,

I am working with a client who expects me to forecast commodity prices on a monthly basis, but they can only provide monthly data for the past 5 years (60 data points).

I have tried Holt-Winters, ARIMA, SARIMAX, LSTM, linear regression, and Prophet, but the accuracy is not up to the mark.

What is the minimum number of data points required for monthly forecasting?

Can I please have help with the correct approach here?

desert raven
#

if it is a trend plus seasonality thing, 5 years would be enough. But probably there are other factors and a degree of randomness involved. I'm not sure a daily granularity would improve things, unless you are going to use some sort of news feed as an input
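
With only 60 monthly points, it may help to check any candidate model against a seasonal-naive baseline using rolling-origin (walk-forward) evaluation, so that "accuracy is not up to the mark" is measured against something concrete. The sketch below is one way to set that up. The function names, the 36-month minimum training window, the 3-month horizon, and the synthetic price series are all assumptions for illustration.

```python
def seasonal_naive(history, horizon, season=12):
    """Forecast by repeating the last observed seasonal cycle."""
    last_cycle = history[-season:]
    return [last_cycle[h % season] for h in range(horizon)]

def rolling_origin_mae(series, forecaster, min_train=36, horizon=3):
    """Mean absolute error averaged over successive train/test cut points."""
    errors = []
    for cut in range(min_train, len(series) - horizon + 1):
        train, actual = series[:cut], series[cut:cut + horizon]
        predicted = forecaster(train, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return sum(errors) / len(errors)

# Synthetic stand-in for 60 monthly prices (illustrative values only).
prices = [100 + 0.5 * t + 5 * ((t % 12) - 5.5) for t in range(60)]
baseline_mae = rolling_origin_mae(prices, seasonal_naive)
print(baseline_mae)
```

Any other model can be plugged in as `forecaster` as long as it takes a history and a horizon; if it cannot beat this baseline, more data or external drivers are probably needed rather than a more complex model.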

languid solstice
#

Hi,
What is the best approach for determining the next day's weather state (cloudy, sunny, or partly sunny) based only on current and historical solar irradiance, temperature, and clear-sky index data? I would use it for a one-day-ahead solar irradiance forecast.
Thanks,

alpine elk
#

Hi, I am Abdullah. I am an ML engineer and want to join a team to participate in Kaggle competitions.

strange grove
#

I am interested in trading and healthcare.

surreal ferry
#

🚀 𝗔𝗜 𝗶𝘀 𝗘𝘅𝗽𝗹𝗼𝗱𝗶𝗻𝗴… 𝗕𝘂𝘁 𝗔𝗿𝗲 𝗪𝗲 𝗙𝗼𝗿𝗴𝗲𝘁𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗲𝗮𝗹 𝗧𝗵𝗿𝗲𝗮𝘁?

As an 𝗔𝗜 𝗘𝗻𝘁𝗵𝘂𝘀𝗶𝗮𝘀𝘁, 𝗣𝗿𝗼𝗱𝘂𝗰𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿, 𝗮𝗻𝗱 𝗔𝗜 𝗥&𝗗...

covert agate
#

Hi folks, I am finishing my project as part of my data science studies and need some help with the LSTM model I use. Everything is fine except that the confusion matrix is broken. My Kaggle notebook is here: https://www.kaggle.com/code/sheroleg/naya-final-lstm. I would really appreciate any help. If possible, please comment on the notebook and not here.

tender hamlet
#

need help in time series modeling
data:

Project  year  Month  MoneyLeft
prj1  2024  1  1000
prj1  2024  2  800
prj1  2024  3  400
prj1  2024  4  100
prj2  2022  3  5000
prj2  2022  4  3493
prj2  2022  5  2000
prj2  2022  6  1000
Fabricate this for 10 to 20 projects; each project can have 12 to 18 months of data.
For a new project, given MoneyLeft for 2 or 3 months, the model should predict MoneyLeft for the next 4 months.
Models like ARIMA, SARIMA, exponential smoothing, etc. capture only one season or trend, which means we can train them only on a single project.

  1. One solution is to convert the time series problem into a regression problem: create lags or windows of three months and predict the next 4 months. The problem is that the model trains on those lags or windows only; it should also give importance to the project name (I do not know how to do that).
  2. The other solution is to train one model per project, which is not feasible in this case.

How should I do this?
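
A minimal sketch of the windowing idea from option 1, pooled across projects so that a single regression model can be trained on all of them. Dividing each series by its first value is one possible way to make projects comparable without feeding in the project name; a categorical project feature could be added as an extra column instead. The function name and the eight-month example series are made up for illustration.

```python
def make_windows(projects, n_lags=3, horizon=4):
    """Pool (3 lagged months -> next 4 months) pairs from every project."""
    X, Y = [], []
    for name, series in projects.items():
        start = series[0]  # initial budget, used to put projects on one scale
        scaled = [v / start for v in series]
        for i in range(len(scaled) - n_lags - horizon + 1):
            X.append(scaled[i:i + n_lags])
            Y.append(scaled[i + n_lags:i + n_lags + horizon])
    return X, Y

# Hypothetical MoneyLeft histories (made-up numbers, 8 months each).
projects = {
    "prj1": [1000, 900, 800, 650, 500, 400, 250, 100],
    "prj2": [5000, 4300, 3493, 2800, 2000, 1500, 1000, 400],
}
X, Y = make_windows(projects)
print(len(X), len(X[0]), len(Y[0]))
```

Each row of `X` holds three consecutive scaled months and the matching row of `Y` holds the following four, so any multi-output regressor can be fit on the pooled table. Predictions for a new project are multiplied back by its starting budget.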
sharp verge
#

Hello guys, I am working on an early warning score prediction model using an LSTM. The dataset I was able to get has six vital-sign features and two demographic features. My main aim is to predict a patient's next two to three days.

The dataset has multiple patients with varying numbers of daily entries (some had their vitals taken for three days, some for four, and so on). I have been wondering whether this kind of dataset is suitable for training a model that can take any vital sign and predict its values for the coming days.
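
Varying sequence lengths are usually handled by padding every patient to a common length and keeping a mask that marks which steps are real, so the model can ignore the padding. A minimal sketch follows, with a hypothetical function name and made-up two-feature readings.

```python
def pad_sequences(sequences, pad_value=0.0):
    """Right-pad each sequence to the longest length and return a 0/1 mask."""
    max_len = max(len(seq) for seq in sequences)
    n_features = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        filler = [[pad_value] * n_features for _ in range(n_pad)]
        padded.append([list(step) for step in seq] + filler)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask

# Two hypothetical patients: 3 days and 4 days of (heart rate, temperature).
patient_a = [[72.0, 36.6], [75.0, 36.8], [80.0, 37.1]]
patient_b = [[65.0, 36.5], [66.0, 36.4], [70.0, 36.9], [71.0, 37.0]]
padded, mask = pad_sequences([patient_a, patient_b])
print(mask[0])
```

In Keras, a `Masking` layer placed in front of the LSTM plays the role of the mask. With only three or four days per patient, though, the amount of history per sequence may limit what a two-to-three-day forecast can learn.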

obtuse peak
#

When dealing with time series data, how do we know whether there is serial dependence in the data? Is it a question of using domain knowledge, or should we use methods like lag and time-step features each time to check?

spark fable
# obtuse peak when dealing with time series data, how do we know if there is serial dependence...

You can try to plot Autocorrelation (ACF), Partial Autocorrelation (PACF), and Cross Correlation (CCF) https://business-science.github.io/timetk/reference/plot_acf_diagnostics.html
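
For reference, the sample autocorrelation behind an ACF plot can be computed directly. This is a minimal sketch in plain Python; the function name and the toy series with a period of 4 are illustrative.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        acf.append(cov / variance)
    return acf

# Toy series that repeats every 4 steps.
series = [0, 1, 0, -1] * 10
acf = autocorrelation(series, max_lag=4)
# For white noise, values outside roughly +/- 1.96 / sqrt(n) are unusual.
print([round(v, 2) for v in acf])
```

Lags where the value sits well outside the white-noise band suggest serial dependence; here lag 4 is strongly positive and lag 2 strongly negative, matching the period of the toy series.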

subtle quest
#

If anyone here has worked with the ROCKET transform for time series, or with feature extraction using randomized convolutional kernels, it'd be great if you could DM me. I need advice on a project. Thanks.

obtuse peak
#

Hi, what happens when the data is not sampled at regular time intervals? Taking ADS-B data as an example (basically flight logs that are broadcast at irregular intervals), how would we analyse this data if it is not spread over regular intervals? Also, would it make sense to model this kind of data with time series methods, or should we use other features like position, altitude, and velocity to predict a flight's position at a given time?
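
One common first step with irregular sampling is to interpolate onto a regular grid, after which ordinary time series methods apply. A minimal sketch with linear interpolation follows. The function name, the altitude values, and the 2-second grid are made up, and the timestamps are assumed to be strictly increasing.

```python
from bisect import bisect_right

def resample_linear(times, values, step):
    """Linearly interpolate irregular samples onto a regular time grid."""
    grid, out = [], []
    t = times[0]
    while t <= times[-1]:
        j = bisect_right(times, t)  # index of the first sample strictly after t
        if j == len(times):         # t coincides with the last sample
            y = values[-1]
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            y = values[j - 1] + w * (values[j] - values[j - 1])
        grid.append(t)
        out.append(y)
        t += step
    return grid, out

# Irregular timestamps (seconds) and altitude readings (illustrative values).
times = [0, 3, 4, 10]
altitude = [1000.0, 1030.0, 1040.0, 1100.0]
grid, resampled = resample_linear(times, altitude, step=2)
print(grid)
```

Linear interpolation is only reasonable when gaps are short relative to how fast the signal changes; for long gaps, models that take the time delta as an input feature avoid inventing data.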

#

What methods can be used to build a system that flags outliers or spoofing in flight data, such as a sudden unexplainable change in speed or altitude, or cases where the reported latitude/longitude does not make sense for the path being taken (a common issue in ADS-B)?
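
A simple physical-plausibility check is often a useful starting point: compute the speed implied by each pair of consecutive position reports and flag anything an aircraft could not fly. The sketch below uses the haversine great-circle distance. The 1200 km/h limit, the function names, and the sample fixes are assumptions for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def flag_impossible_jumps(fixes, max_speed_kmh=1200.0):
    """fixes: (t_seconds, lat, lon) tuples. Return indices reached too fast."""
    flagged = []
    for i in range(1, len(fixes)):
        t0, lat0, lon0 = fixes[i - 1]
        t1, lat1, lon1 = fixes[i]
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            flagged.append(i)  # duplicate or out-of-order timestamp
            continue
        if haversine_km(lat0, lon0, lat1, lon1) / hours > max_speed_kmh:
            flagged.append(i)
    return flagged

# Third fix jumps about 1.5 degrees of latitude in 10 seconds (made-up data).
fixes = [(0, 50.00, 8.0), (10, 50.02, 8.0), (20, 51.50, 8.0), (30, 50.06, 8.0)]
print(flag_impossible_jumps(fixes))
```

Note that the fix after a bad point is flagged too, because the jump back is just as fast; comparing against the last accepted fix instead of the previous one would avoid that.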

#

Any suggestions or help is appreciated 🙂

candid vector
#

Hey everyone!
I’m working on the CMI – Detect Behavior with Sensor Data Kaggle competition, where the goal is to classify BFRB vs non‑BFRB behaviors using wrist-worn sensor data (TOF, IMU, pressure, etc.)

https://www.kaggle.com/competitions/cmi-detect-behavior-with-sensor-data

I’ve trained an LSTM using PyTorch and got surprisingly strong results (i.e. accuracy = 93), which makes me worry about potential data leakage or preprocessing issues.

Here’s what I did to avoid leakage:

-Split data by sequence ID, no overlap between train/test

-Fit MinMaxScaler only on the training set, then applied to both

-Replaced NaNs, -1, and inf values with 0 before scaling

However, since 0 is a valid sensor reading, replacing missing/invalid values with 0 might introduce bias. I'm unsure whether I should switch to median, KNN, or use masking instead.
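
One option that keeps 0 available as a real reading is to impute with a statistic fitted on the training set only and to add a separate mask channel that tells the model which values were observed. A minimal sketch for a single sensor column follows; the function names and sample values are hypothetical, and -1 is treated as a missing-value sentinel as described above.

```python
import math
from statistics import median

def is_missing(x):
    """Treat None, the -1 sentinel, NaN, and +/-inf as missing."""
    return x is None or x == -1 or math.isnan(x) or math.isinf(x)

def fit_median(train_column):
    """Median of the valid training values only, to avoid leakage."""
    return median(v for v in train_column if not is_missing(v))

def impute_with_mask(column, fill_value):
    """Return (imputed values, mask) where mask is 1.0 for observed entries."""
    values = [fill_value if is_missing(v) else v for v in column]
    mask = [0.0 if is_missing(v) else 1.0 for v in column]
    return values, mask

train_col = [0.0, 2.0, float("nan"), -1, 4.0]
test_col = [float("inf"), 0.0, 3.0]
fill = fit_median(train_col)
test_values, test_mask = impute_with_mask(test_col, fill)
print(fill, test_values, test_mask)
```

The mask is fed to the model as an extra input channel next to the scaled values, so a genuine 0 and a filled-in value remain distinguishable.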

If anyone has experience with sensor data or wants to take a look at the code, I’d really appreciate the help and happy to include collaborators in the Kaggle submission team! Just DM me or reply here

balmy ether
#

Job Title: Part-Time Senior AI/ML Engineer (Remote)

We are seeking a skilled and experienced Senior AI/ML Engineer to join our remote team on a part-time basis. The ideal candidate will have a strong technical background, excellent communication skills, and the ability to work independently in a fast-paced environment.

Requirements:
-Minimum of 7–10 years of professional software development experience

-Proven experience working effectively in a remote environment

-Advanced English proficiency (C1 or higher); an American accent is preferred

-Availability to work 10–15 hours per week during EST or CST business hours

If you're a highly motivated engineer with a passion for building high-quality software and can commit to a flexible part-time schedule, we’d love to hear from you.
You can connect with me on WhatsApp: +1 (567) 469-5384

steel dune
#

Hi, @everybody
I have one question. I'm training ML models for a 3-class classification problem where the number of samples per class is similar, but the predictions are skewed.
The first and second classes are predicted, though with low precision, while the third class is never predicted. What could be the reason? I can't find it.
Previously, when I applied reinforcement learning with the three classes assigned to three actions, one action was never selected either.
The model is a prediction model for forex EUR/USD.

charred steppe
#

Has there ever been a competition that involved images of time series charts rather than time series data?

burnt dome
# steel dune Hi, @everybody I have one question, I'm training ml models for the prediction, w...

Hey, I’m also interested in quant finance!
I’ve worked on a similar multi-class problem before, and from my experience it can help to split it into two binary models instead of one 3-class model.
For example, instead of predicting (long/neutral/short) , try:

Model 1: short vs not short
Model 2: long vs not long

This often gives better probability calibration, and during feature selection you can see which features are more useful for each direction (long/short).
I’d be happy to discuss it or help you out with your model if you’d like - feel free to dm me!
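
A minimal sketch of how the two-binary-model setup could be wired: derive one binary target per direction from the 3-class labels, then combine the two predicted probabilities into a single decision. The 0.5 threshold, the tie-breaking rule, and the function names are assumptions for illustration; the classifiers themselves are left out.

```python
def to_binary_targets(labels):
    """Split 3-class labels into 'short vs not short' and 'long vs not long'."""
    short = [1 if y == "short" else 0 for y in labels]
    long_ = [1 if y == "long" else 0 for y in labels]
    return short, long_

def combine(p_short, p_long, threshold=0.5):
    """Pick the more confident direction, or neutral if neither is confident."""
    if max(p_short, p_long) < threshold:
        return "neutral"
    return "long" if p_long >= p_short else "short"

labels = ["long", "neutral", "short", "long"]
y_short, y_long = to_binary_targets(labels)
print(y_short, y_long)
print(combine(0.7, 0.2), combine(0.3, 0.4), combine(0.1, 0.8))
```

Because neutral is now the fallback rather than a class competing for probability mass, the threshold can be tuned per direction on a validation set.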

steel dune
#

I'm looking for a US developer to collaborate with. If anybody is interested, please DM me.

sullen fulcrum
#

I am working on a time series project whose goal is to predict cinema audience counts.
I have been given 5 datasets.
Is there anyone who would like to join me?

tight osprey
#

Hi everyone, I am currently working on my thesis involving time series modeling of stock returns.

I estimated an ARMA(1,2)–GARCH(1,1) model with GED errors in EViews. From the model, I obtained the standardized residuals and the GED shape parameter. Since EViews does not provide an Anderson–Darling goodness-of-fit test, I exported the standardized residuals to R.

My goal is to test whether the standardized residuals follow the GED distribution. Below is the R code I used:

library(readxl)
library(fGarch)
library(goftest)

data <- read_excel("C:/Users/myusername/Skripsi/1. data/resid01.xlsx")
z <- na.omit(data$resid01)

# Ljung-Box tests
Box.test(z, lag = 20, type = "Ljung-Box")
Box.test(z^2, lag = 20, type = "Ljung-Box")

# Anderson-Darling test for GED
nu <- 1.127493  # shape parameter from EViews

ad.test(z,
        null = pged,
        mean = 0,
        sd = 1,
        nu = nu)

# QQ-plot vs GED
n <- length(z)
p <- ppoints(n)
q_theoretical <- qged(p, mean = 0, sd = 1, nu = nu)
z_sorted <- sort(z)

plot(q_theoretical, z_sorted,
     main = "QQ-Plot Standardized Residual vs GED",
     xlab = "Theoretical GED Quantiles",
     ylab = "Sample Quantiles")
abline(0,1,col="red",lwd=2)

Results

  • The Anderson–Darling test does not reject GED (p > 0.05).
  • The QQ-plot fits well in the center, but there are visible deviations in the tails.

Questions
  1. Is this a correct way to apply the Anderson–Darling test for GED on standardized residuals from a GARCH(GED) model estimated in EViews?
  2. I found other implementations of the AD test that reject GED, while my code accepts it. Why might different implementations produce opposite results?
  3. How should I interpret tail deviations in the QQ-plot when the AD test does not reject the distribution?
  4. For model validation, should I rely more on the AD test result or the tail behavior in the QQ-plot?

Any guidance would be greatly appreciated. Thank you.
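
For comparing implementations, it can help to compute the Anderson–Darling statistic by hand. The sketch below implements the standard A² formula for any fully specified CDF. The Python standard library has no GED CDF, so the standard normal is used here purely as a stand-in; for the real check, a GED CDF with the estimated shape would be passed instead. One point worth keeping in mind: the usual A² p-values assume the null distribution is fully specified in advance. Here the shape parameter (and the ARMA–GARCH parameters behind the residuals) were estimated from the same data, which tends to make the unadjusted test conservative. Implementations that do or do not correct for this could plausibly disagree, though that is only one possible explanation.

```python
import math
from statistics import NormalDist

def anderson_darling(sample, cdf):
    """A^2 statistic of `sample` against a fully specified CDF."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(x[i - 1])   # F at the i-th smallest value
        f_high = cdf(x[n - i])  # F at the i-th largest value
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

def normal_cdf(z):
    """Standard normal CDF (stand-in for the GED CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Evenly spaced normal quantiles fit the normal CDF almost perfectly;
# shifting them by 1 should inflate the statistic.
n = 50
sample = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
a2_good = anderson_darling(sample, normal_cdf)
a2_bad = anderson_darling([v + 1.0 for v in sample], normal_cdf)
print(a2_good < a2_bad)
```

The statistic weights the tails more heavily than the centre, so visible tail deviations in the QQ-plot alongside a non-rejection are worth reporting together rather than treating either one as decisive.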

rugged edge
#

Explore this dataset for time series: https://www.kaggle.com/datasets/suhanigupta04/gold-futures-5-year-dataset

  • 5 years daily gold futures (GC=F) data from Yahoo Finance with complete OHLCV
  • Clean, ready-to-use for LSTM/GRU, ARIMA, Prophet time-series forecasting models
  • 11 pre-computed technical indicators: MA7/30/90, RSI, MACD, Bollinger Bands, volatility
  • No missing values, properly scaled features for immediate ML experimentation

🔗 [Starter Notebook created] — EDA, technical plots, LSTM baseline with RMSE evaluation

solemn crypt
#

Shaurya, this is awesome, thanks for sharing. Having decades of data means you can really stress-test time series models across multiple market cycles (dot-com bubble, 2008, COVID crash, etc.).

A few ideas for anyone picking this up:
• Try decomposing the trend/seasonality using statsmodels
• Rolling volatility windows make for great features in prediction models
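
A minimal sketch of the rolling-volatility feature: log returns from closing prices, then the sample standard deviation over a trailing window. The 5-day window, the function names, and the prices are made up for illustration.

```python
import math
from statistics import stdev

def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rolling_volatility(returns, window=5):
    """Sample std of returns over a trailing window; None until it fills."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(returns[i + 1 - window:i + 1]))
    return out

# Illustrative closing prices (not real market data).
closes = [100.0, 101.0, 99.0, 102.0, 98.0, 103.0, 97.0]
rets = log_returns(closes)
vol = rolling_volatility(rets, window=5)
print(vol)
```

Because each value uses only returns up to and including the current day, the feature should be shifted by one step before it is used to predict that same day's return.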