There are three ways such a scene can play out.
1) A is on camera, hails B. Camera turns to B, who responds.
2) A, on camera, hails B, who is off camera. B responds.
3) A, off camera, hails B, who is on camera. B responds.
Now, the first scenario presents no problems. Whenever the camera moves, we can say that there was an arbitrarily long cut in the shot, and any commbadge delays would happen during that cut.
The second scenario presents no problems, either. A could hail B with an arbitrarily long phrase, and B could respond even while the tail end of A's phrase is coming out of B's commbadge. The absence of the camera from the receiving end would hide any commbadge delays.
The third scenario mirrors the second, but the rationalization we need is subtly different. We cannot tell whether A started his hail at the moment B hears it start, or perhaps slightly earlier. The latter would explain away commbadge delays: A's hail would be kept "on hold" until the computer resolved the recipient (that is, heard the "to B" part), and only then piped to B's commbadge.
There is no scenario where both A and B would simultaneously be on camera, because nobody in his right mind would use a commbadge to contact somebody who's in the same shot with him. So we never actually face the problem of the commbadge delay directly. But the contrast between the mirror scenarios 2 and 3 forces us to face the problem indirectly: in 2, we hear that there is no delay even when there obviously ought to be one, and we have to invoke the "B responded before listening to the whole thing" rationalization in addition to the "the computer kept the message on hold while establishing the recipient" one. In 3, we don't and can't use the overlap rationalization. It then becomes a bit weird how people in certain scenarios always overlap, while people in mirror scenarios never do... Not impossible or unphysical or anything like that, just statistically weird.
But the delay problem is very real, and the saving grace is that the delay is never very long, because, just like Cruella sez, the phrases used for opening the conversation are short. The delay thus ought to consist of just two words: the name of the speaker and "to" (plus whatever reaction time the computer requires, probably close to zero).
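For what it's worth, the "on hold" idea can be put into a toy model. This is only a sketch of the reasoning above, not anything established on screen: the function names, the 0.4 seconds per spoken word and the zero computer reaction time are all made-up assumptions for illustration.

```python
# Toy model of the "on hold" rationalization. Everything here is hypothetical:
# the function names, the speaking rate and the reaction time are assumptions.

def held_words(hail: str, recipient: str) -> list[str]:
    """Return the words spoken before the recipient's name,
    i.e. the part of the hail the computer keeps on hold."""
    words = [w.strip(".,!?") for w in hail.split()]
    return words[:words.index(recipient)]


def hold_delay(hail: str, recipient: str,
               seconds_per_word: float = 0.4,
               computer_reaction: float = 0.0) -> float:
    """Estimate the delay before the hail starts playing at the receiving end."""
    return len(held_words(hail, recipient)) * seconds_per_word + computer_reaction


if __name__ == "__main__":
    hail = "A to B, respond."
    print(held_words(hail, "B"))   # the part held back: speaker's name and "to"
    print(hold_delay(hail, "B"))   # delay in seconds under the assumed speaking rate
```

With the assumed numbers, the held portion is two words long no matter how long the rest of the hail runs, which is the whole point: the delay depends on the opening phrase, not on the message.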
Timo Saloniemi