Iterative Stencil Loop (ISL) computation is a well-known class of scientific computation used in fields ranging from image processing and seismic simulation to numerical methods and physical modeling. In such a computation, a series of sweeps is performed over a regular grid, and each sweep updates every point with a fixed nearest-neighbor pattern. Because of their regular structure, ISL algorithms are ideal candidates for automatic optimization and hardware acceleration. An efficient implementation of an ISL pipeline, however, must optimize both parallelism and locality, and the structure of stencils creates a fundamental tension among parallelism, locality, and redundant recomputation of shared values. This talk presents Halide, a language and compiler for optimizing parallelism, locality, and recomputation in ISL pipelines in both single-node and distributed settings. Halide uses simple language constructs to express what to compute and a separate scheduling co-language to express when and where to perform the computation. The distributed benchmarks achieve speedups of up to 18x on a 16-node test machine and up to 57x on 64 nodes of the NERSC Cori supercomputer.