Not sure what it's worth, but here's my two bits: Anyone who's watched Jurassic Park should know that critical systems should always have a manual override. Now, just how critical you consider a doorbell is up to you of course. I get about two visitors a year and the parcel couriers never ring the bell anyway, so for me it would not be.
On the topic of video integration in HA, I'm a bit disillusioned. That's not HA's fault by the way. It's just that video is still much too clunky, laggy and data-intensive for a seamless integration, even with modern networking. So if you want to have the best of both worlds, here's what I'd propose:
1, don't rely on the flashy new Ring/Blink/Reolink systems; they tend to sacrifice reliability for price and flexibility. Get a dedicated old-fashioned IP-based video doorbell system instead.
2, run a dedicated line from the camera to the screen for maximum performance and reliability instead of relying on your existing network infrastructure. If you have reason to, add a dedicated switch to get the data into your network. That is to say, if your network is a tree, don't mount the camera and the screen on different branches, but right out there on the same twig.
3, consider keeping your video infrastructure in a dedicated VLAN with good QoS or a routed subnet, either of which will keep much of the prattle that routinely goes on in a densely populated network away from the critical lines.
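
If it helps to picture the "same twig" idea from point 2, here is a rough Python sketch. Everything in it is made up for illustration: the device names, the topology and the helper functions are assumptions, not anything from a real setup or from HA. It just lists which devices the video stream has to cross on its way from the camera to the screen. The fewer there are, the fewer things can take your doorbell down.

```python
from collections import deque

# Hypothetical home network: each device maps to the devices it is wired to.
NETWORK = {
    "router": ["core_switch"],
    "core_switch": ["router", "office_switch", "door_switch"],
    "office_switch": ["core_switch", "laptop", "nas"],
    "door_switch": ["core_switch", "door_camera", "hall_screen"],
    "laptop": ["office_switch"],
    "nas": ["office_switch"],
    "door_camera": ["door_switch"],
    "hall_screen": ["door_switch"],
}


def path_between(network, start, goal):
    """Return the shortest path from start to goal (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in network.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # the two devices are not connected at all


def devices_in_between(network, start, goal):
    """List the devices the traffic must cross between start and goal."""
    path = path_between(network, start, goal)
    if path is None:
        raise ValueError(f"no path from {start} to {goal}")
    return path[1:-1]


# Camera and screen on the same twig: only the dedicated switch sits between them.
print(devices_in_between(NETWORK, "door_camera", "hall_screen"))
# Camera and NAS on different branches: three switches sit in the path.
print(devices_in_between(NETWORK, "door_camera", "nas"))
```

In this toy layout the doorbell keeps working even if the core switch or the router falls over, because neither of them sits between the camera and the screen.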