Writing Maintainable SIMD Intrinsics Using C++ Templates

With the release of CPUs supporting AVX instructions, there are now two major families of SIMD instructions, SSE and AVX, available to software developers. This presents a challenge to developers who want to write code that performs well on a variety of processors: maintaining separate scalar, SSE, and now AVX implementations of the same piece of code is expensive and error-prone. In this article, I present a technique based on C++ templates that I have found useful for addressing this problem.
