#Difference between Target Q network and Online Q network

fiery vault
#

Hey guys, I am trying to understand a concept in reinforcement learning, mainly deep Q-learning. As I understand it, there are two neural networks, an Online Q network and a Target Q network, and the goal is for the Online Q network to adjust its weights so that its predicted values are as close as possible to those of the Target Q network. Here is the part I don't understand: both networks are basically the same except for how frequently their weights are updated, so how can we trust the values coming from the Target Q network if it's basically the same as the Online Q network?

young lichen
#

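I am still new, but I will have a go at explaining. First you have to understand what a Q value is. It is basically the expected total (discounted) future reward for performing a certain action in a certain state, not just the immediate reward. The Target Q network is used to estimate the future reward from the next state, and that estimate is plugged into the Bellman equation to build the training target.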
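
For reference, here is roughly how that target looks in standard DQN. This is a sketch in my own notation, not something from the thread: θ are the online weights, θ⁻ the target weights, d is 1 if the episode ended and 0 otherwise, and γ is the discount factor.

$$
y = r + \gamma \,(1 - d)\, \max_{a'} Q_{\theta^-}(s', a'),
\qquad
L(\theta) = \big(Q_{\theta}(s, a) - y\big)^2
$$

Note that the observed reward r is the only piece of real feedback in y. The target network only supplies a slowly changing estimate of everything that comes after it, so it does not have to be "trusted" as correct, just held still while the online network catches up.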

#

So by having a second network that is more stable and does not change every timestep (it is a copy of the online network that is only refreshed every so often), training becomes more stable, because the target values for the Q-learning update stay consistent between refreshes.

#

The code is as follows:
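# Value of the next state-action pair, estimated by the slowly updated target network
target_Q = self.critic_target(next_state, next_action)
# Bellman target: reward plus discounted future value; (1 - done) drops the future term at episode end
target_Q = reward + ((1 - done) * args.gamma * target_Q).detach()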
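
If it helps to see the whole loop, below is a toy sketch of the same idea with a lookup table standing in for the neural network, so it runs with plain Python. Everything in it is made up for illustration: the 4-state chain environment, the `step` function, and the values of `GAMMA`, `LR` and `SYNC_EVERY` are assumptions, not anything from the snippet above.

```python
import copy
import random

GAMMA = 0.9       # discount factor (assumed value)
LR = 0.1          # learning rate (assumed value)
SYNC_EVERY = 20   # hypothetical: copy online -> target every 20 updates

# Toy chain: states 0..3. Action 1 moves right, action 0 stays put.
# Reaching state 3 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 4, 2

def step(state, action):
    next_state = min(state + action, N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

online_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
target_q = copy.deepcopy(online_q)  # starts as an exact copy of the online table

random.seed(0)
for update in range(1, 2001):
    state = random.randrange(N_STATES - 1)   # sample a non-terminal state
    action = random.randrange(N_ACTIONS)
    next_state, reward, done = step(state, action)

    # The future-value estimate comes from the frozen copy, so the target
    # does not move every time the online table is changed.
    td_target = reward + (1 - done) * GAMMA * max(target_q[next_state])
    online_q[state][action] += LR * (td_target - online_q[state][action])

    # Hard update: refresh the target copy only once in a while.
    if update % SYNC_EVERY == 0:
        target_q = copy.deepcopy(online_q)
```

Both tables start at zero, so at first the target really is no better than the online estimate. What drives learning is the real reward at the end of the chain, which then spreads backwards one refresh at a time.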

fiery vault
#

Oh ok, this makes sense.