That's right. And there's a long-established technique in film where you see one thing while hearing another. The most obvious example is seeing the exterior of a skyscraper while already hearing sound from the scene inside the office, like a phone ringing or a conversation in progress. It gives the narrative a "sweeping along" feeling, like we're not wasting any time and we trust the audience to get it.
Good example. Another is whether we, the audience, hear the other side of a phone call or not. Sometimes we do, sometimes not, but if we do, I don't think it's necessary to demand, "How can we hear what this person is saying to Don Draper when Don is clearly not using a speakerphone?" Another example, closer to the voiceover, is non-diegetic music such as TOS' beautiful scores and library cues. There's no reason to demand, "Hey, where exactly are the horn players we can hear when Kirk notices the bridge crew 'slowing down' after Deela dopes his coffee? Were they hyperaccelerated too? Sitting near the navigation sub-systems station?" It's just part of the experience for the viewers.