I was wondering what the difference is in terms of performance. I'm assuming that deriving the normals from the height map won't help, and that the shader will in fact probably end up sampling the height map AND the normal map.
Best case scenario, performance is the same, provided the height map is channel-packed into a texture that is already being sampled rather than costing a separate, otherwise unneeded sample.
I'm assuming this because otherwise everyone would probably be using height for normals already.
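
To make the sample-count side of this concrete, here is a rough CPU-side sketch in Python, not shader code and not a benchmark. It assumes the normal is derived from height with central differences, which is one common approach but not the only one. `CountingTexture`, `normal_from_height`, `normal_from_map` and the `strength` parameter are hypothetical names made up for illustration. Fetch count is only a crude proxy for GPU cost, since caching, bandwidth and texture format all matter too.

```python
import math


class CountingTexture:
    """Hypothetical stand-in for a GPU texture that counts its fetches."""

    def __init__(self, texels):
        self.texels = texels          # 2D list indexed as texels[y][x]
        self.height = len(texels)
        self.width = len(texels[0])
        self.fetches = 0

    def sample(self, x, y):
        # Nearest-texel lookup with clamp-to-edge addressing.
        x = min(max(x, 0), self.width - 1)
        y = min(max(y, 0), self.height - 1)
        self.fetches += 1
        return self.texels[y][x]


def normal_from_height(height_tex, x, y, strength=1.0):
    """Derive a tangent-space normal via central differences (4 fetches)."""
    dx = height_tex.sample(x + 1, y) - height_tex.sample(x - 1, y)
    dy = height_tex.sample(x, y + 1) - height_tex.sample(x, y - 1)
    # Unnormalized normal of the surface z = h(x, y); the 2.0 accounts for
    # the two-texel span of the central difference.
    nx, ny, nz = -dx * strength, -dy * strength, 2.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def normal_from_map(normal_tex, x, y):
    """Read a pre-baked normal stored as 0..1 RGB (1 fetch)."""
    r, g, b = normal_tex.sample(x, y)
    return (r * 2.0 - 1.0, g * 2.0 - 1.0, b * 2.0 - 1.0)


# A 4x4 ramp that rises along x, and a flat "pointing up" normal map.
height_tex = CountingTexture([[0.5 * x for x in range(4)] for _ in range(4)])
normal_tex = CountingTexture([[(0.5, 0.5, 1.0)] * 4 for _ in range(4)])

derived = normal_from_height(height_tex, 1, 1)
baked = normal_from_map(normal_tex, 1, 1)
print("height fetches:", height_tex.fetches)
print("normal map fetches:", normal_tex.fetches)
```

In this sketch the height-derived path costs four fetches per pixel against one for the baked normal map, which lines up with the guess that deriving normals from height is unlikely to be a free win.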