#AVX Intrinsics help


tidal otter
#

For some homework I need to vectorise the k-loop of this code:


```
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
```

for which I've produced this:

```
void q1_vec_k() {

    //note:  B array is transposed during the init routine

    __m256 cV, bV, aV;
    __m128 xmm1; //vector for extracting significant 128 bits of cV vector to use SS command

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {

            cV =_mm256_setzero_ps(); //sets the value of all C vector elements to 0

            for (int k = 0; k < (N/8) * 8; k += 8) {
                
                bV = _mm256_loadu_ps(&kB[k][j]); //loads 8 values of the transposed B-Array (column order of the original array)
                aV = _mm256_loadu_ps(&A[i][k]); //loads 8 values of the A array (row order)

                cV = _mm256_fmadd_ps(aV, bV, cV); //cv += (aV * bV)
            }

            cV = _mm256_hadd_ps(cV, cV);
            cV = _mm256_hadd_ps(cV, cV);
            cV = _mm256_hadd_ps(cV, cV); //at this point 1 element of cV is the sum of 8 elements

            xmm1 = _mm256_extractf128_ps(cV, 0); //extracts the first 4 elements of cV, converting it to a 128 bit vector, allowing for use of store_ss

            _mm_store_ss(&kC[i][j], xmm1);

            for (int k = (N / 8) * 8; k < N; k++) {
                kC[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}
```

But this doesn't produce the same result: the values are off by anywhere from 20 to 70 compared with what they should be, so the check comes back false, and it's really stressing me out. Can anyone help?

Full code listed below:
limpid iris BOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

tidal otter
#

Apologies if this is a really easy fix. I'm quite new to all of this, and it's really frustrating me as I can't figure out what I'm doing wrong.
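
One thing that stands out in the code above is the reduction. `_mm256_hadd_ps` adds neighbours inside each 128-bit half separately, so after two calls each half already holds the sum of its own 4 elements. The third call doubles that, and extracting half 0 then gives twice the sum of elements 0..3 while elements 4..7 are dropped. It may also be worth checking the index order in `&kB[k][j]`: an 8-wide load only walks along k if k is the contiguous (last) index of the transposed array, which depends on how the init routine stores it. A minimal sketch of a reduction that adds the two halves first is below. The names `hsum256_ps` and `dot_avx` are made up for illustration, the `target` attribute assumes GCC or Clang, and it uses a separate multiply and add so that only AVX is needed (`_mm256_fmadd_ps` works the same way where FMA is available).

```c
#include <immintrin.h>

/* Horizontal sum of all 8 floats in a __m256 (hypothetical helper). */
__attribute__((target("avx")))
static float hsum256_ps(__m256 v)
{
    __m128 lo  = _mm256_castps256_ps128(v);    /* elements 0..3 */
    __m128 hi  = _mm256_extractf128_ps(v, 1);  /* elements 4..7 */
    __m128 sum = _mm_add_ps(lo, hi);           /* 4 partial sums */
    sum = _mm_hadd_ps(sum, sum);               /* 2 partial sums */
    sum = _mm_hadd_ps(sum, sum);               /* total in every element */
    return _mm_cvtss_f32(sum);                 /* read element 0 */
}

/* Dot product of two contiguous float arrays of length n
   (hypothetical helper, not taken from the original code). */
__attribute__((target("avx")))
static float dot_avx(const float *a, const float *b, int n)
{
    __m256 acc = _mm256_setzero_ps();
    int k = 0;

    /* Vector part: 8 products per iteration, accumulated lane by lane. */
    for (; k + 8 <= n; k += 8) {
        __m256 av = _mm256_loadu_ps(a + k);
        __m256 bv = _mm256_loadu_ps(b + k);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(av, bv));
    }

    float sum = hsum256_ps(acc);

    /* Scalar tail for the last n % 8 elements. */
    for (; k < n; k++)
        sum += a[k] * b[k];

    return sum;
}
```

Assuming the transposed B is stored with k as the last index (call it `Bt`, a hypothetical name), the body of the j-loop would then reduce to `C[i][j] += dot_avx(A[i], Bt[j], N);`, with the scalar tail reading from the same transposed array as the vector part.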

tidal otter
#

!resolved

#

!solved