For distributed training, one important feature is to be able to do GPU-to-GPU communication, such as allreduce, allgather, and all2all. Those are not supported at the moment, but they are on our roadmap. At this level, however, the language runtime itself plays a reduced role, so I don't expect the experience to be much different from, say, Python/JAX.
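For reference, here is a minimal sketch of what one of those collectives (allreduce) looks like on the JAX side mentioned above. It uses jax.pmap with jax.lax.psum, which is JAX's standard way to sum a value across devices; the axis name "devices" and the toy data are just illustrative assumptions, not anything from this project.

    import jax
    import jax.numpy as jnp

    def allreduce_sum(x):
        # Sum x across every device participating along the mapped axis.
        return jax.lax.psum(x, axis_name="devices")

    n = jax.local_device_count()
    # One scalar shard per device; after psum, every device holds the global sum.
    shards = jnp.arange(n, dtype=jnp.float32)
    out = jax.pmap(allreduce_sum, axis_name="devices")(shards)
    print(out)  # every entry equals sum(range(n))

The point of the comparison is that once the collective is dispatched, most of the work happens in the communication library underneath, not in the host language.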

For the second question, my understanding is that all the big tech models rely on distributed training, so distributed training is really a prerequisite for competing.


