Hey guys, I'm trying to understand a concept in reinforcement learning, mainly deep Q-learning. As I understand it, there are two neural networks, an online Q network and a target Q network, and the goal is for the online Q network to adjust its weights so that its predicted values are as close as possible to the ones from the target Q network. Here is the part I don't understand: both networks are basically the same except for how frequently their weights are changed, so how can we trust the values coming from the target Q network if it's basically the same as the online Q network?
#Difference between Target Q network and Online Q network
I'm still new, but I'll have a go at explaining. First you have to understand what a Q value is: it is basically the expected total (discounted) future reward for performing a certain action in a certain state. The target Q network is used to estimate that future reward through the Bellman equation.
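Written out, one common form of that target looks like the following. This is a sketch in my own notation, not something taken from a specific codebase: theta is assumed to be the online network's weights, theta^- the target network's weights, and d the "done" flag (1 if the episode ended, else 0).

```latex
y = r + \gamma\,(1 - d)\,\max_{a'} Q_{\theta^-}(s', a'),
\qquad
L(\theta) = \bigl(Q_{\theta}(s, a) - y\bigr)^2
```

Only theta is trained on L, while theta^- is held fixed between updates. Note that r comes from the environment, not from either network. Actor-critic variants typically replace the max with the value of an action chosen by an actor network.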
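So having another network that is more stable and does not change every timestep helps stabilize and improve training, because it provides more consistent target values for the Q-learning update. The target also isn't only the target network's guess: it includes the reward you actually observed, and that is what anchors it to reality.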
The code is as follows:
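# Estimate the value of the next state/action with the slow-moving target copy
target_Q = self.critic_target(next_state, next_action)
# Bellman target: observed reward + discounted future value (zero if the episode ended)
target_Q = reward + ((1 - done) * args.gamma * target_Q).detach()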
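To see the whole loop in one place, here is a minimal sketch of the idea. It is not real DQN: as an assumption to keep it runnable without any deep learning library, the two networks are swapped for small lookup tables, and the toy chain environment, the names (`step`, `train`, `SYNC_EVERY`) and the hyperparameters are all made up for illustration.

```python
import random

# Assumption: tiny 5-state chain; action 1 moves right, action 0 moves left.
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, SYNC_EVERY = 0.9, 0.1, 50


def step(state, action):
    """Hypothetical toy environment: reaching the last state pays 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(steps=5000, seed=0):
    rng = random.Random(seed)
    # "Online" and "target" tables stand in for the two networks.
    online = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    target = [row[:] for row in online]  # starts as an exact copy
    state = 0
    for t in range(1, steps + 1):
        action = rng.randrange(N_ACTIONS)  # purely random exploration, for brevity
        next_state, reward, done = step(state, action)
        # The target mixes a real observed reward with the frozen copy's estimate.
        y = reward + (0.0 if done else GAMMA * max(target[next_state]))
        # Only the online table moves toward y on every step.
        online[state][action] += LR * (y - online[state][action])
        # Every SYNC_EVERY steps the frozen copy catches up with the online table.
        if t % SYNC_EVERY == 0:
            target = [row[:] for row in online]
        state = 0 if done else next_state
    return online, target


online, target = train()
```

The point to notice is that `y` is never just "whatever the target table says": the reward term is ground truth from the environment, and the frozen copy only supplies the estimate of what comes afterwards. Freezing it for `SYNC_EVERY` steps means the online table is chasing a fixed goal for a while instead of one that shifts with every update.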
Oh ok, this makes sense.