Why Temporal Difference (TD) Learning?

How?

TD learning updates the value function using the error between the TD target (a one-step, bootstrapped estimate of the return) and the current value estimate; this difference is called the TD error.

$$ V(S_t) \leftarrow V(S_t) + \alpha[(R_{t+1} + \gamma V(S_{t+1})) - V(S_t)] $$
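As a minimal sketch of this update rule (the function name, the table-based value representation, and the choices of `alpha` and `gamma` are illustrative assumptions, not from the text):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V[s] a fraction alpha toward the TD target.

    V       : dict mapping state -> current value estimate (assumed tabular)
    s, s_next : current and next state
    r       : reward R_{t+1} observed on the transition
    """
    td_target = r + gamma * V[s_next]   # estimate of the return from s
    td_error = td_target - V[s]         # the TD error
    V[s] += alpha * td_error            # nudge V[s] toward the target
    return td_error


# Example: two states, one observed transition 0 -> 1 with reward 0.5
V = {0: 0.0, 1: 1.0}
td0_update(V, s=0, r=0.5, s_next=1, alpha=0.5, gamma=1.0)
# V[0] moves halfway toward the target 0.5 + 1.0 = 1.5, giving 0.75
```

Run over many sampled transitions, this update makes `V` converge toward the true value function under the usual step-size conditions.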

This originates from the following equation:

$$ V(S_t) \leftarrow (1 - \alpha)V(S_t) + \alpha G_t $$

The idea is that the learning rate $\alpha$ controls the balance between the current estimate and the new sample: the update moves $V(S_t)$ a fraction $\alpha$ of the way toward the return $G_t$. In TD learning, $G_t$ is replaced by the TD target, which is available after a single step rather than only at the end of an episode.
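The two forms of the update are algebraically identical, which a quick check with arbitrary example values (my own, not from the text) makes concrete:

```python
alpha, v, g = 0.1, 2.0, 5.0                # step size, current estimate, sampled return

form_a = (1 - alpha) * v + alpha * g       # convex-combination form
form_b = v + alpha * (g - v)               # error-correction form used by TD

# Both compute the same value: 0.9*2.0 + 0.1*5.0 = 2.0 + 0.1*3.0 = 2.3
print(form_a, form_b)
```

Writing the update in the error-correction form is what makes the TD error explicit as the quantity being learned from.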