Block Matrix Multiplication

When dealing with large matrices, it helps to have an algorithm to break multiplication into operations on matrix subblocks. Tens of thousands of indices add up to gigabytes, and you’re going to have trouble calling X.matmul on a matrix library backend. Here’s a simple implementation that plays to the tune of Major->Minor->Major to reassemble the subblocks.

Row / Column, Major and Minor

“In row-major order, the consecutive elements of a row reside next to each other, whereas the same holds true for consecutive elements of a column in column-major order. While the terms allude to the rows and columns of a two-dimensional array, i.e. a matrix, the orders can be generalized to arrays of any dimension by noting that the terms row-major and column-major are equivalent to lexicographic and colexicographic orders, respectively.”

excerpt and illustration: Row- and column-major order

The Major/Minor distinction will be helpful in writing a block matrix multiplication algorithm as we’ll see below.

An outer loop increments in Major, the second loop increments in Minor, and the third loop writes values corresponding to both the Minor and Major indices.

Essentially, the values are determined by the relationship: Major -> (Minor depending Major)

source code, Python

def structured_mult(A,B,blockwid=(500,500)):
    """
    structured_mult(A,B,blockwid)
    return matrix product A*B
    """
    m,_ = A.shape; _,n = B.shape
    shape = (m,n)
    A_0 = np.zeros(shape)

    # working index incrementors
    x_i,y_i = 0,0
    x_j,y_j = 0,0

    bx,by = blockwid

    M,N = np.array(A.shape)//blockwid

    print("M,N", M,N)
    print("bx,by",bx,by)

    # tot_ct = 0

    # parameter <--> incrementor, Key
    # p = y_j
    # n = x_j, n = y_i
    # q = y_j, q = x_i
    # see statement (*)
    for q in range(M):
        y_i = 0
        x_j = 0

        for n in range(N):
            #print(tot_ct, x_i, x_j, y_i, y_j)
            # tot_ct+=1
            # print( A[x_i:x_i+500,y_i:y_i+500])
            # print( B[x_j:x_j+500,y_j:y_j+500])
            # print(x_i ,   x_i+500   ,  y_i, y_i+500, )
            # print(x_j          ,    x_j+500,    y_j        ,   y_j+500)

            #print("shape",A[x_i:x_i+by, y_i:y_i+bx].shape)
            for p in range(M):
                # Major -> Minor -> Major

                # statement (*)
                G = tf.matmul(
                    A[x_i:x_i+bx, y_i:y_i+by] , B[x_j:x_j+by, p*bx:(p+1)*bx])

                A_0[q*bx:(q+1)*bx,p*bx:(p+1)*bx] += G

            y_i += by
            x_j += by
        y_j += bx
        x_i += bx

    return np.array(A_0)

source code, Python

By Alexander Wei

Leave a comment Cancel reply