A live music event in which musicians perform from remote locations is possible over the Internet but is severely constrained by network latency. This is especially true for rhythmic music that requires tight synchrony, or when the musicians are separated by long distances. To overcome these time delays, we propose an intelligent system that listens to the audio input at one end and synthesizes a predicted audio output at the other. In this context, we study how our musical exposure, or enculturation, gives rise to musical anticipation. Moreover, since such prediction cannot be error-free, we aim to model the musical intentions of the performers and the expectations of the listeners.