I'm trying to make a network that generates a character, given a sequence of preceding characters. I don't want to use an LSTM, Transformer, or RNN; I want to see how far I can get without these.
So far I am able to make the network overfit, but it is really bad at generalizing.
I'm not sure if I need to make my network deeper, or wider, or what.
Also it takes a long time to train (takes a good couple hours on an RTX 4090 to get down to even 1.7 training loss, which still produces gibberish like this: "The sovk If the serl wk tnt tn t site toove thich tn toooe te tnsorvlng torhhfr torphith
TThe sont on tn t toarn tn thes trrt tn tn t d tfher Tut tn the soketin the sordnh pg th th th the sord th tnpert th tond tndtn hnere hrto d r og thich tirld tnl w tn th te tptont rtd
he sa eu tf tomttrueteon toaeave tp the sor")
I can only get the training accuracy up to about 55% (meaning the character is generated correctly 55% of the time, but the output still ends up looking like gibberish).
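To put the 1.7 figure in perspective, here is a small sketch that converts a loss value into perplexity and bits per character. It assumes the loss is a mean per-character cross-entropy measured in nats (the default for PyTorch's `nn.CrossEntropyLoss`); the post does not say which loss is used, and the helper names `perplexity` and `bits_per_char` are made up for illustration.

```python
import math


def perplexity(loss_nats):
    """Effective number of equally likely choices per character."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """The same loss expressed in bits instead of nats."""
    return loss_nats / math.log(2)


loss = 1.7  # training loss quoted above; assumed to be mean cross-entropy in nats
print(f"perplexity:    {perplexity(loss):.2f}")
print(f"bits per char: {bits_per_char(loss):.2f}")
```

Under that assumption, a loss of 1.7 corresponds to a perplexity of roughly 5.5: on average the network is about as unsure as if it were picking uniformly among five or six characters at each step, which fits with samples that have word-like spacing but few real words.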
Anybody have any tips?
Currently the network has about 1.5 million parameters:
```python
import torch.nn as nn
import torch.nn.functional as F


class Gen(nn.Module):
    # Signature reconstructed from the names used below;
    # the argument order is an assumption.
    def __init__(self, vocab_size, read_size, output_size):
        super(Gen, self).__init__()
        print("SIZE: ", read_size * vocab_size)
        embedding_dim = 10
        mid_size = 200
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lin1 = nn.Linear(read_size * embedding_dim, 400)
        self.bn = nn.BatchNorm1d(400)
        self.lin2 = nn.Linear(400, mid_size)
        self.bn2 = nn.BatchNorm1d(mid_size)
        # 15 blocks, each with a main linear layer, a learned side path, and a batch norm
        self.linears = nn.ModuleList([nn.Linear(mid_size, mid_size) for i in range(15)])
        self.resid = nn.ModuleList([nn.Linear(mid_size, mid_size) for i in range(15)])
        self.batchnorms = nn.ModuleList([nn.BatchNorm1d(mid_size) for i in range(15)])
        self.layernorm = nn.LayerNorm(mid_size)  # defined but not used in forward()
        # Output head: narrows 200 -> 100 -> 50 -> 20 -> 10 before the final projection
        self.lin4 = nn.Linear(mid_size, 100)
        self.lin5 = nn.Linear(100, 50)
        self.lin6 = nn.Linear(50, 20)
        self.bn6 = nn.BatchNorm1d(20)
        self.lin7 = nn.Linear(20, 10)
        self.lin8 = nn.Linear(10, output_size * vocab_size)
        self.drp = nn.Dropout(0.15)
        self.drpmid = nn.Dropout(0.01)
        self.drpsmall = nn.Dropout(0.00)  # p=0, so this is a no-op

    def forward(self, x):
        x = x.long()
        x = self.embedding(x)       # (batch, read_size, embedding_dim)
        x = x.view(x.shape[0], -1)  # flatten to (batch, read_size * embedding_dim)
        x = F.leaky_relu(self.bn(self.lin1(x)))
        x = self.drpsmall(F.leaky_relu(self.bn2(self.lin2(x))))
        for i, lin in enumerate(self.linears):
            main = self.batchnorms[i](lin(x))
            side = F.leaky_relu(self.resid[i](x))  # learned projection, not an identity skip
            x = self.drpmid(F.leaky_relu(main + side, negative_slope=0.01))
        x = F.leaky_relu(self.lin4(x))
        x = F.leaky_relu(self.lin5(x))
        x = self.drp(x)
        x = F.selu(self.bn6(self.lin6(x)))
        x = F.leaky_relu(self.lin7(x))
        x = self.lin8(x)  # logits, shape (batch, output_size * vocab_size)
        return x
```
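
As a rough cross-check of the parameter count quoted above, the sketch below tallies the learnable parameters layer by layer in plain Python, using the layer sizes from the model. The values `vocab_size=65`, `read_size=45`, and `output_size=1` are hypothetical (the post does not state them) and were picked only because they land near 1.5 million; `linear`, `norm`, and `count_params` are illustrative helpers, not part of the model.

```python
def linear(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def norm(n):
    """Scale and shift of a BatchNorm1d or LayerNorm layer.

    Running statistics are buffers, not learnable parameters.
    """
    return 2 * n


def count_params(vocab_size, read_size, output_size,
                 embedding_dim=10, mid_size=200, n_blocks=15):
    """Tally learnable parameters per group for the layer sizes in the model."""
    return {
        "embedding": vocab_size * embedding_dim,
        "input (lin1, bn, lin2, bn2)": (
            linear(read_size * embedding_dim, 400) + norm(400)
            + linear(400, mid_size) + norm(mid_size)
        ),
        "blocks (linears, resid, batchnorms)": n_blocks * (
            2 * linear(mid_size, mid_size) + norm(mid_size)
        ),
        "layernorm (unused in forward)": norm(mid_size),
        "head (lin4..lin8, bn6)": (
            linear(mid_size, 100) + linear(100, 50) + linear(50, 20) + norm(20)
            + linear(20, 10) + linear(10, output_size * vocab_size)
        ),
    }


# Hypothetical sizes: the post does not give these values.
counts = count_params(vocab_size=65, read_size=45, output_size=1)
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:38s} {n:>9,d}  ({n / total:.1%})")
print(f"{'total':38s} {total:>9,d}")
```

With these assumed sizes, about four fifths of the parameters sit in the 15 mid-size blocks, while the layers after them narrow to 10 units before the output projection. On the actual module, `sum(p.numel() for p in model.parameters())` should report the same total for the same sizes.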