Trying to make my own neural network for language generation, struggling to get coherent text | Learn AI Together | Page 1

tired sun Mar 31, 2023, 8:30 AM

#

I'm trying to make a network that generates the next character, given a sequence of characters. I don't want to use an LSTM/Transformer/RNN; I want to see how far I can get without these.
So far I am able to make the network overfit, but it is really bad at generalizing.
I'm not sure if I need to make my network deeper, or wider, or what.
Also it takes a long time to train (takes a good couple hours on an RTX 4090 to get down to even 1.7 training loss, which still produces gibberish like this: "The sovk If the serl wk tnt tn t site toove thich tn toooe te tnsorvlng torhhfr torphith
TThe sont on tn t toarn tn thes trrt tn tn t d tfher Tut tn the soketin the sordnh pg th th th the sord th tnpert th tond tndtn hnere hrto d r og thich tirld tnl w tn th te tptont rtd
he sa eu tf tomttrueteon toaeave tp the sor")

I can only get the training accuracy up to about 55% (meaning the next character is predicted correctly 55% of the time, but the output still ends up looking like gibberish).

Anybody have any tips?

Currently the network has about 1.5 million parameters

```python
import torch.nn as nn


class Gen(nn.Module):
    # The class header and signature were cut off in the post; the argument
    # names are taken from how they are used in the body, their order is assumed.
    def __init__(self, vocab_size, read_size, output_size):
        super(Gen, self).__init__()
        print("SIZE: ", read_size * vocab_size)
        embedding_dim = 10
        mid_size = 200
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lin1 = nn.Linear(read_size * embedding_dim, 400)
        self.bn = nn.BatchNorm1d(400)
        self.lin2 = nn.Linear(400, mid_size)
        self.bn2 = nn.BatchNorm1d(mid_size)
        # 15 residual-style blocks: main path, skip path and a batch norm each
        self.linears = nn.ModuleList([nn.Linear(mid_size, mid_size) for i in range(15)])
        self.resid = nn.ModuleList([nn.Linear(mid_size, mid_size) for i in range(15)])
        self.batchnorms = nn.ModuleList([nn.BatchNorm1d(mid_size) for i in range(15)])
        self.layernorm = nn.LayerNorm(mid_size)  # defined but never used in forward()
        self.lin4 = nn.Linear(mid_size, 100)
        self.lin5 = nn.Linear(100, 50)
        self.lin6 = nn.Linear(50, 20)
        self.bn6 = nn.BatchNorm1d(20)
        self.lin7 = nn.Linear(20, 10)
        self.lin8 = nn.Linear(10, output_size * vocab_size)

        self.drp = nn.Dropout(0.15)
        self.drpmid = nn.Dropout(0.01)
        self.drpsmall = nn.Dropout(0.00)  # p=0, so this is a no-op

    def forward(self, x):
        x = x.long()
        x = self.embedding(x)
        # Flatten (batch, read_size, embedding_dim) to (batch, read_size * embedding_dim)
        x = x.view(x.shape[0], -1)
        x = nn.functional.leaky_relu(self.bn(self.lin1(x)))
        x = self.drpsmall(nn.functional.leaky_relu(self.bn2(self.lin2(x))))
        for i, l in enumerate(self.linears):
            x = self.drpmid(nn.functional.leaky_relu(self.batchnorms[i](l(x)) + nn.functional.leaky_relu(self.resid[i](x)), negative_slope=0.01))
        x = nn.functional.leaky_relu(self.lin4(x))
        x = nn.functional.leaky_relu(self.lin5(x))
        x = self.drp(x)
        x = nn.functional.selu(self.bn6(self.lin6(x)))
        x = nn.functional.leaky_relu(self.lin7(x))
        x = self.lin8(x)
        return x
```

sweet spade Mar 31, 2023, 2:11 PM

#

This is @strange grove's area of expertise; he knows a lot more about non-transformer text generation than anyone else. It won't be brute-forcing like you're trying to do, though. It will incorporate quite a bit of linguistics knowledge.

slow wedge Mar 31, 2023, 2:26 PM

#

https://keras.io/examples/generative/text_generation_fnet/

Keras documentation: Text Generation using FNet

strange grove Mar 31, 2023, 2:52 PM

#

This is an n-gram model. I don't quite understand from your code the size of the n-gram (15?), but beyond 5-grams in general you won't be able to see enough data to fit the model.

tired sun Mar 31, 2023, 4:48 PM

#

strange grove This is a n-gram model. I don't quite understand from your code the size of the ...

The way the model is set up, it takes the last read_size characters (60 in this case) and then predicts the next 1 character by outputting a prediction for each possible character.
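A minimal sketch of the windowing described above, assuming a plain Python string as the corpus. The helper name `make_windows` and the toy text are made up for illustration and are not taken from the posted code; in the thread `read_size` is 60, here it is 4 so the result is easy to inspect.

```python
def make_windows(text, read_size):
    """Return the vocabulary and (context, next_char) index pairs.

    Each context holds the indices of `read_size` consecutive characters,
    and the target is the index of the character that follows them.
    """
    vocab = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(vocab)}
    ids = [char_to_idx[ch] for ch in text]

    pairs = []
    for start in range(len(ids) - read_size):
        context = ids[start:start + read_size]
        target = ids[start + read_size]
        pairs.append((context, target))
    return vocab, pairs


# Toy corpus (hypothetical); a real run would use the full training text.
vocab, pairs = make_windows("hello world", read_size=4)
print(len(vocab), "distinct characters,", len(pairs), "training pairs")
```

Each pair is one training example: the context goes into the embedding layer, and the target is the class index for the cross-entropy loss.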

#

the 15 is just because I have 15 other linear layers in the forward() method

glacial verge Mar 31, 2023, 5:11 PM

#

@tired sun How are you sampling from the network's prediction? Uniformly?

tired sun Mar 31, 2023, 5:11 PM

#

glacial verge @tired sun How are you sampling from the network's prediction? Uniform...

for the loss I use cross-entropy

#

and then to sample (when generating text after training) I just do an argmax

#

choose the max value
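For comparison, here is a rough sketch of the argmax choice next to temperature sampling, a common alternative that draws the next character from the softmax distribution instead of always taking the top one. The function names, the example logits, and the temperature of 0.8 are all assumptions for illustration; nothing here comes from the posted code, and it is not a claim that sampling alone fixes the output.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Argmax: always return the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Draw an index at random, weighted by the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a 3-character vocabulary.
logits = [2.0, 1.5, 0.1]
rng = random.Random(0)
draws = [sample_pick(logits, temperature=0.8, rng=rng) for _ in range(1000)]
print("greedy:", greedy_pick(logits))
print("sampled counts:", [draws.count(i) for i in range(len(logits))])
```

Greedy decoding returns the same index every time for the same scores, which is one way generated text can fall into repeating fragments. Sampling trades that determinism for variety.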

glacial verge Mar 31, 2023, 5:15 PM

#

Reading through your code now. It's too verbose. You could try packaging some of the layers in a Sequential container.
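One possible shape for that refactor, as a sketch only: the 15 repeated layers become a small module stacked inside `nn.Sequential`. The names `ResidualBlock` and `build_trunk` are hypothetical, and the block is meant to mirror the posted loop (batch-normed linear path plus an activated linear skip path, then leaky ReLU and dropout) as I read it, not to be the author's code.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """One step of the posted loop: act(bn(linear(x)) + act(skip_linear(x)))."""

    def __init__(self, width, dropout=0.01):
        super().__init__()
        self.main = nn.Sequential(nn.Linear(width, width), nn.BatchNorm1d(width))
        self.skip = nn.Sequential(nn.Linear(width, width), nn.LeakyReLU())
        self.act = nn.LeakyReLU()  # default negative_slope is 0.01
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        return self.drop(self.act(self.main(x) + self.skip(x)))


def build_trunk(width=200, depth=15):
    """Stack `depth` blocks so forward() becomes a single call."""
    return nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])


trunk = build_trunk()
out = trunk(torch.randn(8, 200))  # batch of 8 vectors of size mid_size=200
print(out.shape)
```

With this layout the three parallel `ModuleList`s and the explicit `for` loop collapse into one attribute, and changing the depth or width is a one-argument change.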

tired sun Mar 31, 2023, 5:15 PM

#

yeah that would probably be easier to read for sure

glacial verge Mar 31, 2023, 5:18 PM

#

tired sun yeah that would probably be easier to read for sure

I'll think on it. Would it be OK if I run the code? Or do you have a Colab notebook I can look at?

tired sun Mar 31, 2023, 5:18 PM

#

glacial verge I'll think on it. Would it be OK if I run the code? Or do you have a Colab noteb...

yea i can just send you the code

strange grove Mar 31, 2023, 6:58 PM

#

tired sun The way the model is set up is it takes the last read_size characters (60 in thi...

Then it is a 60-gram character model. This is a super, super sparse space. Try jacking up the dropout. How big is your training data? Try running it through The Pile.
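A small sketch of what "sparse" means here, assuming a count-based view of the data: as the context length n grows, more and more of the distinct n-grams in a corpus are seen exactly once, so there is little evidence for what follows them. The helper `singleton_fraction` and the sample sentence are made up for illustration. A neural model with embeddings can in principle share information across similar contexts, so this is intuition rather than a hard limit.

```python
from collections import Counter


def singleton_fraction(text, n):
    """Fraction of distinct character n-grams that occur exactly once."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Hypothetical sample text; a real check would use the training corpus.
sample = (
    "the quick brown fox jumps over the lazy dog while the other dog "
    "watches the quick brown fox run over the hill and into the woods"
)
for n in (1, 3, 5, 10):
    print(n, round(singleton_fraction(sample, n), 3))
```

Running the same count on the actual training text with n=60 would show how many of the 60-character contexts the model ever sees more than once.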

terse night Mar 31, 2023, 8:40 PM

#

Use word2vec embeddings for the tokens; that might help. Better embeddings like ELMo or BERT can be used, but they use a more complicated architecture than an MLP layer. If you want, you can also train your own word2vec embeddings.

tired sun Apr 1, 2023, 5:48 AM

#

FINAL: siderate, not to say unreasonable.

Mr. Parke came in, but could only shake my h||andstnd tnpuerunedtor sen eng tnay or t sart ete sav te n tirra ng tor toar the sotony oh a tn tarnt tnsereetorenir tynht te sIin eoand te sav sn tist teaete thet the sney these th te tis th toane tft ohe e trd tIonmtor tes,elf,

“

HE CORCTH.T hock tnsom the soon tf cerstitk d temunee the siteeh ou the srevi tli toae oith t doare aieme ng ttaeke tnd the tilished tn tnshratetooehtstf

#

Everything after the "||" is generated

#

It's just gibberish 😦

tired sun Apr 1, 2023, 5:49 AM

#

terse night use word2vec embeddings for the tokens, that might help. Better embeddings like ...

right now I'm using embeddings for characters
