When dealing with large matrices, it helps to have an algorithm that breaks multiplication into operations on matrix subblocks. Tens of thousands of indices per dimension add up to gigabytes (a 50,000 × 50,000 matrix of 64-bit floats takes about 20 GB), and you’re going to have trouble calling X.matmul on a matrix library backend. Here’s a simple implementation that plays to the tune of Major->Minor->Major to reassemble the…
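To make the subblock idea concrete, here is a minimal sketch in Python, assuming NumPy as the backend. The function name `block_matmul` and the `block` size parameter are hypothetical, chosen for illustration; this is the textbook tiled product and not necessarily the Major->Minor->Major scheme the post goes on to describe.

```python
import numpy as np


def block_matmul(a, b, block=256):
    """Multiply a (m x k) by b (k x n) one subblock at a time.

    `block` is a hypothetical tuning knob: the edge length of each tile.
    """
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")

    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, block):          # row blocks of the result
        for j in range(0, n, block):      # column blocks of the result
            for p in range(0, k, block):  # blocks along the shared dimension
                # Slices clip at the array edge, so ragged final blocks work.
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out


# Check the tiled result against a direct product on a small case.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 300))
b = rng.standard_normal((300, 400))
assert np.allclose(block_matmul(a, b, block=128), a @ b)
```

Each step only touches three tiles of at most `block × block` elements, so the working set stays small even when the operands are too large to hand to a single matmul call. In this sketch the operands are ordinary in-memory arrays; for matrices that do not fit in RAM, one option is to back them with `np.memmap` so that each slice is read from disk on demand.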