Modeling protein-protein interactions is a necessary step for elucidating the mechanisms behind fundamental biological processes. Recent advances in protein structure prediction have laid the groundwork for resolving those interactions through protein co-folding. In this work, we repurpose the building blocks of protein folding architectures to operate directly on structures and perform end-to-end protein docking, without the need for costly sequence alignments. To do this, we introduce a two-track pipeline that reasons about pairwise three-dimensional matching of interfaces and guides Euclidean-equivariant models in the iterative construction and refinement of complexes. Our approach performs on par with state-of-the-art methods while reducing computation time by orders of magnitude.