I think you're taking the metaphor about a string quartet as a "conversation among equals" too literally.
In terms of perception, I'm not sure there's much of a relationship to a human conversation. To make things equal, the string players would need to take turns soloing while the others wait more or less silently to respond, each with their own solo response. You'd be bored out of your gourd if string quartets were written that way.
But more to the point, the vast majority of time in a string quartet is devoted to two or more of the players producing phrases of music in parallel, and that is musically coherent and pleasing to the players and audience. Most humans cannot track two humans speaking in parallel at all. That alone tells us that music cognition is a very different phenomenon from speech cognition.
In short, I'm not sure why a string quartet would be considered the optimal genre for humans to produce music together. And even if it is, the reasons why are even less likely to have anything to do with the protocols around human speech cognition, and certainly not with some bizarre equivalent of the "theory of mind" associated with the musical phrase produced by one of the instruments[1].
1: Small digression: in Elliott Carter's 2nd String Quartet he actually started with a concept that each instrument was a kind of "character" in a play among the quartet. In this case, the problem with OP's metaphor becomes obvious even in the introduction: the homogeneous timbre of a string quartet makes it difficult to hear the differences among the characters. (IIRC even Carter admitted this.)