Our model takes a point cloud video clip of 12 frames as input and conditions a spatiotemporal neural field on it to predict complete point clouds of the dynamic scene at any chosen moment in time. Longer output sequences are generated by applying the model repeatedly over successive time windows.
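For concreteness, the sketch below illustrates this conditioning-and-rollout scheme under stated assumptions; the module names (PointCloudEncoder, SpatioTemporalField), the global-code conditioning, and the query strategy are illustrative placeholders rather than the actual implementation.

```python
# Minimal sketch (not the authors' implementation): a 12-frame point cloud clip
# conditions a spatiotemporal neural field, and longer sequences are produced by
# sliding the model over successive 12-frame windows. All names are hypothetical.
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """Encodes a (T, N, 3) point cloud clip into a single conditioning code."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, clip):                       # clip: (T, N, 3)
        feats = self.mlp(clip)                     # (T, N, dim) per-point features
        return feats.mean(dim=(0, 1))              # global code: (dim,)

class SpatioTemporalField(nn.Module):
    """Maps a query (x, y, z, t) plus the clip code to an occupancy logit."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4 + dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, code, xyzt):                 # xyzt: (M, 4)
        z = code.expand(xyzt.shape[0], -1)         # broadcast code to all queries
        return self.net(torch.cat([xyzt, z], dim=-1))  # (M, 1)

def rollout(encoder, field, frames, window=12, n_queries=1024):
    """Cover a long sequence by applying the model over successive windows."""
    outputs = []
    for start in range(0, frames.shape[0] - window + 1, window):
        code = encoder(frames[start:start + window])        # condition on one clip
        t = torch.full((n_queries, 1), float(start + window))  # chosen query time
        xyz = torch.rand(n_queries, 3)                       # query locations
        outputs.append(field(code, torch.cat([xyz, t], dim=-1)))
    return outputs
```

A driver would simply call `rollout(PointCloudEncoder(), SpatioTemporalField(), frames)` on a long `(F, N, 3)` recording; the per-window codes keep memory bounded regardless of total sequence length.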