As you've mentioned, I highly recommend you look at Prosody for the server. It is by far the easiest option, and it's also really good. The only area where ejabberd might be better is extremely large deployments that need failover and load balancing.
XMPP doesn't use SIP; it has its own protocol for voice and video calls, called Jingle. All servers, afaik, support it. On the other hand, SIP/RTP servers such as FreeSWITCH and Asterisk do support Jingle bridging!
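To give a rough idea of what Jingle looks like on the wire, here is a minimal Python sketch that builds the outer shell of a `session-initiate` request as defined in XEP-0166, with an audio description from XEP-0167. The JIDs, the `sid`, the IQ `id` and the function name are made up for illustration, and a real offer would also carry payload types and a transport element, which are left out here.

```python
import xml.etree.ElementTree as ET

JINGLE_NS = "urn:xmpp:jingle:1"          # XEP-0166 namespace
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"    # XEP-0167 namespace


def build_session_initiate(initiator: str, responder: str, sid: str) -> ET.Element:
    """Build a skeleton Jingle session-initiate IQ (no payload types or transport)."""
    iq = ET.Element(
        "iq",
        {"type": "set", "from": initiator, "to": responder, "id": "jingle1"},
    )
    jingle = ET.SubElement(
        iq,
        f"{{{JINGLE_NS}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    content = ET.SubElement(
        jingle,
        f"{{{JINGLE_NS}}}content",
        {"creator": "initiator", "name": "voice"},
    )
    # Announce an audio session; codecs would be listed as child elements.
    ET.SubElement(content, f"{{{RTP_NS}}}description", {"media": "audio"})
    return iq


if __name__ == "__main__":
    # Hypothetical JIDs and session ID, for illustration only.
    stanza = build_session_initiate(
        "alice@example.org/phone", "bob@example.org/laptop", "a73sjjvkla37jfea"
    )
    print(ET.tostring(stanza, encoding="unicode"))
```

The call negotiation is just IQ stanzas exchanged between the two clients, and the media itself flows over RTP outside the XMPP stream.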
OMEMO and GPG support is purely a client-side thing, so server support is irrelevant, though some servers can be configured to refuse to pass unencrypted messages.
With XMPP, bridges are usually implemented as external components (a feature built into the XMPP standard). The Slidge framework seems to be the latest and greatest in terms of external bridges: https://sr.ht/~nicoco/slidge/ A WhatsApp bridge is built using it: https://git.sr.ht/~nicoco/slidge-whatsapp
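For a sense of how lightweight the external component mechanism is: under XEP-0114 (Jabber Component Protocol), a component authenticates by sending the lowercase hex SHA-1 of the server's stream ID concatenated with a shared secret. Below is a minimal Python sketch of just that step. The stream ID and secret are made-up values, the function names are mine, and this is not how Slidge itself is written, only an illustration of the handshake.

```python
import hashlib


def component_handshake(stream_id: str, shared_secret: str) -> str:
    """Return the XEP-0114 handshake digest.

    The digest is the lowercase hex SHA-1 of the stream ID (sent by the
    server in its stream header) concatenated with the shared secret.
    """
    return hashlib.sha1((stream_id + shared_secret).encode("utf-8")).hexdigest()


def handshake_stanza(stream_id: str, shared_secret: str) -> str:
    """Wrap the digest in the <handshake/> element the component sends."""
    return f"<handshake>{component_handshake(stream_id, shared_secret)}</handshake>"


if __name__ == "__main__":
    # Hypothetical values: a real stream ID comes from the server,
    # and the secret is whatever is configured for the component.
    print(handshake_stanza("3BF96D32", "not-a-real-secret"))
```

On the server side, the component's hostname and the same secret are set in the server config, and the bridge then connects to the component port like any other external process.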