I have been looking for the same thing, either from Meta's SAM 3[1] model or from things like the OP.
There has been some research specifically in this area using what appear to be classic ML models [2], but it's unclear to me whether they can generalize to dances they have not been trained on.