Float Sum combine1: Maximum use of data abstraction: Best: 10.73 (14%), Overall Best: 10.74 40-most: 10.74 cycles/element
Float Sum combine2: Take vec_length() out of loop: Best: 8.92 (2%), Overall Best: 8.95 40-most: 8.95 cycles/element
Float Sum combine3: Array reference to vector data: Best: 2.68 (58%), Overall Best: 2.68 40-most: 2.69 cycles/element
Float Sum combine3w: Update *dest within loop only with write: Best: 2.67 (2%), Overall Best: 2.69 40-most: 2.70 cycles/element
Float Sum combine4: Array reference, accumulate in temporary: Best: 2.66 (2%), Overall Best: 2.68 40-most: 2.71 cycles/element
Float Sum combine4b: Include bounds check in loop: Best: 2.70 (6%), Overall Best: 2.70 40-most: 2.74 cycles/element
Float Sum combine4p: Pointer reference, accumulate in temporary: Best: 2.68 (8%), Overall Best: 2.68 40-most: 2.70 cycles/element
Float Sum combine5: Array code, unrolled by 2: Best: 2.68 (4%), Overall Best: 2.68 40-most: 2.74 cycles/element
Float Sum combine5p: Pointer code, unrolled by 2, for loop: Best: 2.69 (10%), Overall Best: 2.69 40-most: 2.74 cycles/element
Float Sum unroll2aw: Array code, unrolled by 2, while loop: Best: 2.68 (2%), Overall Best: 2.68 40-most: 2.75 cycles/element
Float Sum unroll3a: Array code, unrolled by 3: Best: 2.68 (2%), Overall Best: 2.69 40-most: 2.75 cycles/element
Float Sum unroll4a: Array code, unrolled by 4: Best: 2.68 (2%), Overall Best: 2.69 40-most: 2.75 cycles/element
Float Sum unroll5a: Array code, unrolled by 5: Best: 2.69 (6%), Overall Best: 2.69 40-most: 2.75 cycles/element
Float Sum unroll6a: Array code, unrolled by 6: Best: 2.68 (2%), Overall Best: 2.68 40-most: 2.75 cycles/element
Float Sum unroll7a: Array code, unrolled by 7: Best: 2.68 (4%), Overall Best: 2.69 40-most: 2.74 cycles/element
Float Sum unroll8a: Array code, unrolled by 8: Best: 2.62 (2%), Overall Best: 2.67 40-most: 2.74 cycles/element
Float Sum unroll9a: Array code, unrolled by 9: Best: 2.68 (2%), Overall Best: 2.69 40-most: 2.74 cycles/element
Float Sum unroll10a: Array code, unrolled by 10: Best: 2.68 (2%), Overall Best: 2.69 40-most: 2.72 cycles/element
Float Sum unroll16a: Array code, unrolled by 16: Best: 2.68 (2%), Overall Best: 2.69 40-most: 2.71 cycles/element
Float Sum unroll2: Pointer code, unrolled by 2: Best: 2.67 (4%), Overall Best: 2.69 40-most: 2.69 cycles/element
Float Sum unroll3: Pointer code, unrolled by 3: Best: 2.68 (66%), Overall Best: 2.68 40-most: 2.69 cycles/element
Float Sum unroll4: Pointer code, unrolled by 4: Best: 2.68 (16%), Overall Best: 2.69 40-most: 2.69 cycles/element
Float Sum unroll8: Pointer code, unrolled by 8: Best: 2.68 (68%), Overall Best: 2.68 40-most: 2.69 cycles/element
Float Sum unroll16: Pointer code, unrolled by 16: Best: 2.67 (2%), Overall Best: 2.69 40-most: 2.69 cycles/element
Float Sum combine6: Array code, unrolled by 2, Superscalar x2: Best: 1.33 (2%), Overall Best: 1.34 40-most: 1.34 cycles/element
Float Sum unroll4x2a: Array code, unrolled by 4, Superscalar x2: Best: 1.34 (56%), Overall Best: 1.34 40-most: 1.35 cycles/element
Float Sum unroll8x2a: Array code, unrolled by 8, Superscalar x2: Best: 1.32 (2%), Overall Best: 1.33 40-most: 1.34 cycles/element
Float Sum unroll3x3a: Array code, unrolled by 3, Superscalar x3: Best: 0.88 (2%), Overall Best: 0.90 40-most: 0.90 cycles/element
Float Sum unroll4x4a: Array code, unrolled by 4, Superscalar x4: Best: 0.89 (2%), Overall Best: 0.91 40-most: 0.91 cycles/element
Float Sum unroll5x5a: Array code, unrolled by 5, Superscalar x5: Best: 0.92 (28%), Overall Best: 0.92 40-most: 0.94 cycles/element
Float Sum unroll6x6a: Array code, unrolled by 6, Superscalar x6: Best: 0.93 (6%), Overall Best: 0.93 40-most: 0.97 cycles/element
Float Sum unroll7x7a: Array code, unrolled by 7, Superscalar x7: Best: 0.92 (2%), Overall Best: 0.93 40-most: 0.99 cycles/element
Float Sum unroll8x4a: Array code, unrolled by 8, Superscalar x4: Best: 0.94 (2%), Overall Best: 0.95 40-most: 1.00 cycles/element
Float Sum unroll8x8a: Array code, unrolled by 8, Superscalar x8: Best: 0.93 (4%), Overall Best: 0.93 40-most: 1.00 cycles/element
Float Sum unroll9x9a: Array code, unrolled by 9, Superscalar x9: Best: 0.95 (4%), Overall Best: 0.96 40-most: 1.03 cycles/element
Float Sum unroll10x10a: Array code, unrolled by 10, Superscalar x10: Best: 0.95 (4%), Overall Best: 0.96 40-most: 1.03 cycles/element
Float Sum unroll2x6a: Array code, unrolled by 12, Superscalar x6: Best: 0.96 (4%), Overall Best: 0.96 40-most: 1.03 cycles/element
Float Sum unroll12x12a: Array code, unrolled by 12, Superscalar x12: Best: 0.95 (12%), Overall Best: 0.95 40-most: 0.98 cycles/element
Float Sum unroll16x16a: Array code, unrolled by 16, Superscalar x16: Best: 0.94 (4%), Overall Best: 0.94 40-most: 1.01 cycles/element
Float Sum unroll20x20a: Array code, unrolled by 20, Superscalar x20: Best: 1.17 (6%), Overall Best: 1.18 40-most: 1.24 cycles/element
Float Sum unroll8x2: Pointer code, unrolled by 8, Superscalar x2: Best: 1.34 (6%), Overall Best: 1.34 40-most: 1.41 cycles/element
Float Sum unroll8x4: Pointer code, unrolled by 8, Superscalar x4: Best: 0.93 (2%), Overall Best: 0.93 40-most: 0.99 cycles/element
Float Sum unroll8x8: Pointer code, unrolled by 8, Superscalar x8: Best: 0.93 (8%), Overall Best: 0.93 40-most: 0.98 cycles/element
Float Sum unroll9x3: Pointer code, unrolled by 9, Superscalar x3: Best: 0.89 (2%), Overall Best: 0.92 40-most: 0.95 cycles/element
Float Sum unrollx2as: Array code, Unroll x2, Superscalar x2, noninterleaved: Best: 1.34 (8%), Overall Best: 1.34 40-most: 1.36 cycles/element
Float Sum combine7: Array code, unrolled by 2, different associativity: Best: 1.33 (4%), Overall Best: 1.34 40-most: 1.35 cycles/element
Float Sum unroll3aa: Array code, unrolled by 3, Different Associativity: Best: 1.18 (2%), Overall Best: 1.19 40-most: 1.20 cycles/element
Float Sum unroll4aa: Array code, unrolled by 4, Different Associativity: Best: 0.87 (2%), Overall Best: 0.89 40-most: 0.90 cycles/element
Float Sum unroll5aa: Array code, unrolled by 5, Different Associativity: Best: 0.89 (18%), Overall Best: 0.90 40-most: 0.90 cycles/element
Float Sum unroll6aa: Array code, unrolled by 6, Different Associativity: Best: 0.90 (58%), Overall Best: 0.90 40-most: 0.91 cycles/element
Float Sum unroll7aa: Array code, unrolled by 7, Different Associativity: Best: 0.89 (2%), Overall Best: 0.90 40-most: 0.91 cycles/element
Float Sum unroll8aa: Array code, unrolled by 8, Different Associativity: Best: 0.89 (8%), Overall Best: 0.90 40-most: 0.91 cycles/element
Float Sum unroll9aa: Array code, unrolled by 9, Different Associativity: Best: 0.90 (20%), Overall Best: 0.91 40-most: 0.91 cycles/element
Float Sum unroll10aa: Array code, unrolled by 10, Different Associativity: Best: 0.89 (2%), Overall Best: 0.92 40-most: 0.92 cycles/element
Float Sum unroll12aa: Array code, unrolled by 12, Different Associativity: Best: 0.92 (24%), Overall Best: 0.92 40-most: 0.94 cycles/element
Float Sum simd_v1: SSE code, 1*VSIZE-way parallelism: Best: 7.33 (20%), Overall Best: 7.33 40-most: 7.36 cycles/element
Float Sum simd_v2: SSE code, 2*VSIZE-way parallelism: Best: 5.77 (2%), Overall Best: 5.81 40-most: 5.85 cycles/element
Float Sum simd_v4: SSE code, 4*VSIZE-way parallelism: Best: 2.51 (2%), Overall Best: 2.53 40-most: 2.57 cycles/element
Float Sum simd_v8: SSE code, 8*VSIZE-way parallelism: Best: 1.72 (4%), Overall Best: 1.76 40-most: 1.80 cycles/element
Float Sum simd_v10: SSE code, 10*VSIZE-way parallelism: Best: 1.69 (2%), Overall Best: 1.71 40-most: 1.76 cycles/element
Float Sum simd_v12: SSE code, 12*VSIZE-way parallelism: Best: 1.77 (20%), Overall Best: 1.78 40-most: 1.81 cycles/element
Float Sum simd_v2a: SSE code, 2*VSIZE-way parallelism, reassociate: Best: 5.71 (2%), Overall Best: 5.71 40-most: 5.76 cycles/element
Float Sum simd_v4a: SSE code, 4*VSIZE-way parallelism, reassociate: Best: 2.64 (2%), Overall Best: 2.72 40-most: 2.75 cycles/element
Float Sum simd_v8a: SSE code, 8*VSIZE-way parallelism, reassociate: Best: 1.64 (6%), Overall Best: 1.69 40-most: 1.76 cycles/element
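
The log names its variants only by short labels, so the following C sketch illustrates what three of those labels most plausibly mean for a float sum: a single accumulator ("accumulate in temporary"), two independent accumulators ("unrolled by 2, Superscalar x2"), and a regrouped add ("unrolled by 2, different associativity"). The function names sum_serial, sum_2x2, and sum_2a are hypothetical and are not taken from the benchmark source; the actual benchmark code may differ in detail.

```c
#include <stddef.h>

/* Single accumulator: every add depends on the result of the previous
 * add, so the loop forms one serial dependency chain. */
float sum_serial(const float *data, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Unrolled by 2 with two accumulators: even- and odd-indexed elements
 * feed separate chains, which the processor can overlap. */
float sum_2x2(const float *data, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += data[i];
        acc1 += data[i + 1];
    }
    for (; i < n; i++)          /* leftover element when n is odd */
        acc0 += data[i];
    return acc0 + acc1;
}

/* Unrolled by 2 with reassociation: the pair is added first, so only
 * one add per two elements sits on the chain through acc. */
float sum_2a(const float *data, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = acc + (data[i] + data[i + 1]);
    for (; i < n; i++)          /* leftover element when n is odd */
        acc += data[i];
    return acc;
}
```

All three functions return the same value for inputs whose partial sums are exactly representable, but because floating-point addition is not associative, the second and third can differ from the first in the last bits on general data. In the measurements above, the single-accumulator variants sit near 2.68 cycles/element, while combine6 and combine7 both come in near 1.34, about half.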