...and speaking of BUILD 2014, Eric Brumer did a cool presentation on native code performance on modern CPUs.
He covered some new things like fused multiply-add (FMA), AVX2, vectorization, store buffers, store-to-load forwarding, etc.
My favourite slide from the entire presentation is the one for item #2, where he shows that performance on modern CPUs is memory-bound (which should be true for most workloads).
Most people in the concurrency world are already aware of this, but it seems to hold even for single-threaded code, mostly (but not only) because of vectorization: wider SIMD units let a single core consume data faster than the memory subsystem can deliver it.
I think he distilled a very good idea out of it: we should pay attention to where the loads and stores are located in our code. That is not an easy task for a developer, but it is a necessary one for those of us who want to write high-performance code.
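To make the point about where loads happen a bit more concrete, here is a small sketch in C (my own illustration, not something taken from the talk). The two functions perform exactly the same loads and additions over an n x n matrix and differ only in the order of those loads. The names `sum_row_major`, `sum_col_major` and the `compare_traversals` driver are hypothetical, the matrix contents are arbitrary, and `clock()` is a coarse timer, so treat any numbers it prints as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk the row-major matrix along each row: consecutive iterations
   touch adjacent addresses (unit stride), so every element of each
   fetched cache line is used before moving on. */
long long sum_row_major(const int *m, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            s += m[i * n + j];
    return s;
}

/* Same loads, same additions, but walking down each column:
   consecutive iterations are n * sizeof(int) bytes apart, so for
   large n most loads touch a cache line that is not already cached. */
long long sum_col_major(const int *m, size_t n) {
    long long s = 0;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            s += m[i * n + j];
    return s;
}

/* Hypothetical driver: fills an n x n matrix with small values,
   times both traversals and checks that they agree.
   Returns the common sum, or -1 if allocation fails. */
long long compare_traversals(size_t n) {
    int *m = malloc(n * n * sizeof *m);
    if (m == NULL)
        return -1;
    for (size_t k = 0; k < n * n; ++k)
        m[k] = (int)(k % 7);

    clock_t t0 = clock();
    long long a = sum_row_major(m, n);
    clock_t t1 = clock();
    long long b = sum_col_major(m, n);
    clock_t t2 = clock();

    assert(a == b); /* identical work; only the access order differs */
    printf("n=%zu row-major: %.3f s, column-major: %.3f s\n", n,
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(m);
    return a;
}
```

Calling `compare_traversals(4096)` from `main` (a 64 MB matrix with 4-byte `int`) should, on most machines, show the column-major walk running noticeably slower than the row-major one even though both return the same sum; the exact gap depends on cache sizes, the hardware prefetcher and compiler flags. The row-major loop is also the natural candidate for auto-vectorization, since its inner loop reads contiguous memory.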