#need help with lstm

tulip anvil Mar 16, 2024, 2:30 PM

I was playing around with LSTM networks for sequence classification. The dataset is "tweet_eval" (on Hugging Face).
I made a small LSTM network, but no matter how I tune the hyperparameters, the test accuracy isn't improving. It just moves up and down in the 45-55% range. Since the training set has about 45K samples, I think the model should be able to generalize to the test data, but that isn't happening. Right after training starts the accuracy is around 50%, and it's about the same even after 20 or so epochs.

Here's the network's code:

```python
import torch
from torch import nn

class net(nn.Module):
    def __init__(self, embedding_len=32, hidden_dim=64, n_layers=1, output_size=3):
        super(net, self).__init__()
        # `vocab` is built elsewhere (not shown here)
        self.embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embedding_len)
        self.lstm = nn.LSTM(input_size=embedding_len, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        x = self.embedding_layer(x)  # (batch, seq_len) -> (batch, seq_len, embedding_len)
        x, _ = self.lstm(x)          # (batch, seq_len, hidden_dim)
        x = x[:, -1, :]              # keep only the output at the last time step
        return self.linear(x).softmax(1)

model = net().to("cuda")
```
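One thing that may be worth checking in the network above: it applies `softmax` over 3 outputs, while the training code pairs it with `nn.BCELoss`. A common setup for single-label, multi-class classification is to return raw logits and use `nn.CrossEntropyLoss`, which applies log-softmax internally. The sketch below assumes the labels are integer class indices in {0, 1, 2}; `LogitNet`, `vocab_size=100`, and the random tokens are hypothetical stand-ins, not the real data or the original model.

```python
import torch
from torch import nn

class LogitNet(nn.Module):  # hypothetical name, same layers as the network above
    def __init__(self, vocab_size, embedding_len=32, hidden_dim=64, n_layers=1, output_size=3):
        super().__init__()
        self.embedding_layer = nn.Embedding(vocab_size, embedding_len)
        self.lstm = nn.LSTM(embedding_len, hidden_dim, num_layers=n_layers, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        x = self.embedding_layer(x)
        x, _ = self.lstm(x)
        return self.linear(x[:, -1, :])  # raw logits, no softmax here

torch.manual_seed(0)
model = LogitNet(vocab_size=100)             # assumption: toy vocabulary of 100 tokens
criterion = nn.CrossEntropyLoss()            # expects logits and integer class indices

tokens = torch.randint(0, 100, (4, 12))      # fake batch: 4 sequences of 12 token ids
labels = torch.tensor([0, 2, 1, 2])          # assumption: labels are class indices

logits = model(tokens)                       # shape (4, 3)
loss = criterion(logits, labels)
loss.backward()

preds = logits.argmax(dim=1)                 # predicted class per sequence
```

If probabilities are needed at evaluation time, `logits.softmax(dim=1)` can be applied outside the model; `argmax` gives the same prediction either way.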

tulip anvil Mar 18, 2024, 2:57 PM


```python
from tqdm import trange

lr = 0.003
test_steps = 500
interval = 100
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
criterion = nn.BCELoss()
epochs = 10
stat = []
acc = 0

gradient_accumulation_steps = 128

for epoch in range(epochs):
    for sample in (t := trange(len(x_train))):
        # one sample at a time; unsqueeze(0) adds the batch dimension
        x, y = torch.tensor(x_train[sample]).to("cuda").unsqueeze(0), y_train[sample]
        #print(x.shape)

        out = model(x)  # TODO: inefficient! --> write a normal dataloader
        loss = criterion(out.squeeze(), y.float())
        loss = loss / gradient_accumulation_steps  # scale so accumulated gradients are averaged

        loss.backward()

        # step every `gradient_accumulation_steps` samples and at the end of the epoch
        if ((sample + 1) % gradient_accumulation_steps == 0) or ((sample + 1) == len(x_train)):
            optimizer.step()
            optimizer.zero_grad()
        if (sample + 1) % interval == 0:
            t.set_description("E: %d | Loss: %.4f |" % (epoch + 1, loss.item()))  #| Accuracy: %.2f " % (lt, acc))
        if (sample + 1) % 5000 == 0:
            print(measure_acc(1000))
            model.train()
            #stat.append(lt)
```

Here's the training code.
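For the TODO about writing a normal dataloader, here is a minimal sketch of batching variable-length sequences with padding. It assumes `x_train` is a list of token-id lists, `y_train` holds integer labels, and token id 0 is reserved for padding; the toy data, the `collate` helper, and the layer sizes are hypothetical. Because padded sequences end in pad tokens, the sketch picks the LSTM output at each sequence's last real step instead of `[:, -1, :]`.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy stand-ins for the real data (assumption: token id 0 is the padding id).
x_train = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [2]]
y_train = [0, 2, 1, 1]

def collate(batch):
    """Pad a list of (tokens, label) pairs to a common length."""
    seqs, labels = zip(*batch)
    lengths = torch.tensor([len(s) for s in seqs])
    padded = pad_sequence([torch.tensor(s) for s in seqs],
                          batch_first=True, padding_value=0)
    return padded, lengths, torch.tensor(labels)

loader = DataLoader(list(zip(x_train, y_train)), batch_size=2,
                    shuffle=False, collate_fn=collate)

padded, lengths, labels = next(iter(loader))

# Tiny embedding + LSTM just to show how to index the last real time step.
emb = nn.Embedding(10, 4, padding_idx=0)
lstm = nn.LSTM(4, 8, batch_first=True)
out, _ = lstm(emb(padded))                           # (batch, max_len, 8)
last = out[torch.arange(out.size(0)), lengths - 1]   # (batch, 8)
```

With a loader like this, the manual gradient accumulation over 128 single samples could be replaced by a larger `batch_size`, and `shuffle=True` would normally be used for training.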