Introduction to GKMX

GKMX stands for General Kernel Matrix Operations. It currently contains two templates: GKMM (General Kernel Matrix Matrix) and GKRM (General Kernel Reduce Matrix). In brief, GKMM is a matrix-matrix multiplication (C = A * B) in which the matrices A, B and C may have different types. If you are familiar with GEMM from BLAS (Basic Linear Algebra Subprograms), you can interpret GEMM as the special case of GKMM where all types are the same (float, double, or complex). When A, B and C have different types, however, multiplication (×) and addition (+) may not be defined on them. To make this work, an instance of GKMM must come with appropriate operators whose argument and return types satisfy the type system. The sections below describe GKMM and GKRM in detail.
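To make these semantics concrete, here is a naive C++ sketch; it is not HMLP's implementation. The names gkmm_ref, gemm_example and mixed_example are hypothetical, and row-major storage is an assumption made for brevity. The second instance shows why the types matter: with int8_t inputs, 100 × 100 would overflow if the product were kept in int8_t, so OP2 widens to a 32-bit accumulator type and OPKERNEL converts the result to float.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Naive reference semantics of a GKMM-style operation (illustrative sketch,
// not HMLP's gkmm interface). Row-major: A is m-by-k, B is k-by-n, C is m-by-n.
//   C(i,j) = opkernel( initV op1 op2(A(i,0),B(0,j)) op1 ... op1 op2(A(i,k-1),B(k-1,j)) )
template <typename TA, typename TB, typename TC, typename TV,
          typename OPKERNEL, typename OP1, typename OP2>
void gkmm_ref(int m, int n, int k,
              const std::vector<TA>& A, const std::vector<TB>& B,
              std::vector<TC>& C,
              OPKERNEL opkernel, OP1 op1, OP2 op2, TV initV)
{
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      TV v = initV;                                  // accumulator of type TV
      for (int p = 0; p < k; ++p)
        v = op1(v, op2(A[i * k + p], B[p * n + j])); // TV = OP1(TV, OP2(TA, TB))
      C[i * n + j] = opkernel(v);                    // TC = OPKERNEL(TV)
    }
}

// GEMM as a special case: every type is double, OP1 = +, OP2 = *, kernel = identity.
std::vector<double> gemm_example()
{
  std::vector<double> A = {1, 2, 3, 4}, B = {5, 6, 7, 8}, C(4);
  gkmm_ref(2, 2, 2, A, B, C,
           [](double v) { return v; },
           std::plus<double>(), std::multiplies<double>(), 0.0);
  return C;
}

// Mixed types: int8_t inputs, int32_t accumulator, float output.
// OP2 widens both operands before multiplying, so 100 * 100 does not overflow.
std::vector<float> mixed_example()
{
  std::vector<std::int8_t> A = {100, 100}, B = {100, 100};
  std::vector<float> C(1);
  gkmm_ref(1, 1, 2, A, B, C,
           [](std::int32_t v) { return static_cast<float>(v); },
           std::plus<std::int32_t>(),
           [](std::int8_t a, std::int8_t b) { return std::int32_t(a) * std::int32_t(b); },
           std::int32_t(0));
  return C;
}
```

The GEMM instance returns {19, 22, 43, 50}, and the mixed-type instance returns {20000.0f}.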

GKMM (General Kernel Matrix Matrix)

Definition

GKMM< OPKERNEL, OP1, OP2, TA, TB, TC, TV > computes the following:

$$ C_{ij} = K\left( \bigoplus_{p=0}^{k-1} A_{ip} \otimes B_{pj} \right) $$

Here K has type OPKERNEL, the reduction operator ⊕ has type OP1, and the combining operator ⊗ has type OP2. These operators must satisfy the following type constraints:

< TV > = OP1( < TV >, OP2( < TA >, < TB > ) ), and

< TC > = OPKERNEL( < TV > ).
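These constraints can be checked at compile time. The sketch below is a hypothetical C++17 helper, gkmm_types_ok, that is not part of HMLP. It assumes, as the identity functor in the gkmm_dfma example suggests, that the kernel is invoked as opkernel(v, i, j, b) with int indices. "Satisfies" here means the results are implicitly convertible, so, for example, a double result would be accepted for a float TC.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical compile-time check (not part of HMLP) of the GKMM type rules:
//   <TV> = OP1( <TV>, OP2( <TA>, <TB> ) )  and  <TC> = OPKERNEL( <TV> ).
// Assumption: OPKERNEL is called as opkernel(v, i, j, b), matching the
// identity functor of the gkmm_dfma example.

// True if OP2(TA, TB) is callable and OP1(TV, that result) converts to TV.
// The defaulted bool keeps an uncallable OP2 from causing a hard error.
template <typename OP1, typename OP2, typename TA, typename TB, typename TV,
          bool = std::is_invocable_v<OP2, const TA&, const TB&>>
struct accumulates : std::false_type {};

template <typename OP1, typename OP2, typename TA, typename TB, typename TV>
struct accumulates<OP1, OP2, TA, TB, TV, true>
    : std::bool_constant<std::is_invocable_r_v<
          TV, OP1, const TV&, std::invoke_result_t<OP2, const TA&, const TB&>>> {};

template <typename OPKERNEL, typename OP1, typename OP2,
          typename TA, typename TB, typename TC, typename TV>
inline constexpr bool gkmm_types_ok =
    accumulates<OP1, OP2, TA, TB, TV>::value &&
    std::is_invocable_r_v<TC, OPKERNEL, const TV&, int, int, int>;

// The identity kernel from the example, with its extra (i, j, b) arguments.
template <typename TC, typename TV>
struct identity
{
  TC operator()(const TV& x, int, int, int) const { return x; }
};

struct not_callable {};  // deliberately has no operator()

static_assert(gkmm_types_ok<identity<double, double>, std::plus<double>,
                            std::multiplies<double>, double, double, double, double>);
static_assert(!gkmm_types_ok<not_callable, std::plus<double>,
                             std::multiplies<double>, double, double, double, double>);
```

A combination that violates the rules makes gkmm_types_ok false instead of producing an error deep inside the kernel code, so it can be wrapped in a static_assert next to each GKMM instance.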

For example, to implement a batched matrix-matrix multiplication in double precision on x86_64 and Intel Xeon Phi, we can create a GKMM instance called gkmm_dfma. Ignore the numbers in the template argument list for now; they are block sizes used for performance optimization. OP1 and OP2 are taken from the STL (std::plus and std::multiplies). Note that we do not use std::identity; instead we define our own identity operator so that it can accept extra arguments (i, j and b).

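#include <functional>  // std::plus, std::multiplies
// The HMLP header that declares gkmm and hmlpOperation_t must also be included.

using namespace hmlp::gkmx;

// Identity kernel: returns the accumulated value unchanged.
// The extra arguments (i, j, b) are accepted but ignored.
template<typename TC, typename TV>
struct identity
{
  inline TC operator()( const TV& x, int i, int j, int b ) const
  { return x; }
};

void gkmm_dfma
(
  hmlpOperation_t transA, hmlpOperation_t transB,
  int m, int n, int k,
  double *A, int lda,
  double *B, int ldb,
  double *C, int ldc,
  int batchSize
)
{
  std::plus<double> op1;               // OP1: reduction (+)
  std::multiplies<double> op2;         // OP2: combination (*)
  identity<double, double> opkernel;   // OPKERNEL: no transformation

  double initV = 0.0;                  // initial value of the TV accumulator

  gkmm
  <104, 4096, 256, 8, 4, 104, 4096, 8, 4, 32,                  // block sizes
  false, 
  identity<double, double>, std::plus<double>, std::multiplies<double>, // OPKERNEL, OP1, OP2
  double, double, double, double>                              // TA, TB, TC, TV
  (
    transA, transB,
    m, n, k,
    A, lda,
    B, ldb,
    C, ldc,
    batchSize,
    opkernel, op1, op2, initV
  );
}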
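The extra (i, j, b) arguments are what let a kernel depend on the position of an entry. As a hedged illustration, shifted_identity below is a hypothetical functor with the same call shape as identity that adds a constant on the diagonal, turning A*B into A*B + shift*I. apply_kernel is a hypothetical stand-in for the kernel step and assumes column-major storage with a leading dimension, in line with the lda/ldb/ldc parameters above.

```cpp
// Hypothetical kernel functor (not part of HMLP) with the same call shape as
// identity: (value, row i, column j, batch b). It adds `shift` when i == j,
// which is only possible because the kernel receives the indices.
template <typename TC, typename TV>
struct shifted_identity
{
  TV shift;
  TC operator()(const TV& x, int i, int j, int /* b */) const
  {
    return i == j ? x + shift : x;
  }
};

// Hypothetical kernel step: applies opkernel to every entry of an m-by-n
// column-major matrix V (leading dimension ldv) and writes C (leading
// dimension ldc). The batch index b is passed through unchanged.
template <typename TC, typename TV, typename OPKERNEL>
void apply_kernel(int m, int n, const TV* V, int ldv, TC* C, int ldc,
                  int b, const OPKERNEL& opkernel)
{
  for (int j = 0; j < n; ++j)
    for (int i = 0; i < m; ++i)
      C[j * ldc + i] = opkernel(V[j * ldv + i], i, j, b);
}
```

In gkmm_dfma, such a functor would replace identity<double, double> both in the template argument list and as the opkernel argument; that HMLP passes exactly these row, column and batch indices is an assumption based on the identity signature.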

Templates

Microkernels

See SEMIRINGKERNEL and FUSEDKERNEL for the prototypes of the architecture-dependent GKMM kernels.

GKRM (General Kernel Reduce Matrix)

GKRM< OPKERNEL, OP1, OP2, OPREDUCE, TA, TB, TC, TV > computes the following:

[Equation: GKRM definition]

The additional reduction operator ⊙ has type OPREDUCE.
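The following naive C++ sketch assumes that the reduction ⊙ runs over the column index j, collapsing each row of the GKMM result to a single value; gkrm_ref and nearest_sq_dist are hypothetical names, not HMLP functions. With ⊗ as a squared difference, ⊕ as +, K as the identity and ⊙ as min, each output entry is the squared distance from a query point (a row of A) to its nearest reference point (a column of B).

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

// Naive reference sketch of a GKRM-style operation (not HMLP's interface).
// Assumption: the reduction runs over the column index j:
//   V(i,j) = initV op1 op2(A(i,0),B(0,j)) op1 ... op1 op2(A(i,k-1),B(k-1,j))
//   C(i)   = initR opreduce K(V(i,0)) opreduce ... opreduce K(V(i,n-1))
// A is m-by-k and B is k-by-n, both row-major.
template <typename TA, typename TB, typename TC, typename TV,
          typename OPKERNEL, typename OP1, typename OP2, typename OPREDUCE>
std::vector<TC> gkrm_ref(int m, int n, int k,
                         const std::vector<TA>& A, const std::vector<TB>& B,
                         OPKERNEL opkernel, OP1 op1, OP2 op2, TV initV,
                         OPREDUCE opreduce, TC initR)
{
  std::vector<TC> C(m, initR);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      TV v = initV;
      for (int p = 0; p < k; ++p)
        v = op1(v, op2(A[i * k + p], B[p * n + j]));
      C[i] = opreduce(C[i], opkernel(v));  // reduce over j; V(i,j) is never stored
    }
  return C;
}

// Squared distance from each of m query points (rows of A, dimension d)
// to the nearest of n reference points (columns of B).
std::vector<double> nearest_sq_dist(int m, int n, int d,
                                    const std::vector<double>& A,
                                    const std::vector<double>& B)
{
  return gkrm_ref(m, n, d, A, B,
                  [](double v) { return v; },                            // K
                  std::plus<double>(),                                   // OP1
                  [](double a, double b) { return (a - b) * (a - b); },  // OP2
                  0.0,
                  [](double x, double y) { return std::min(x, y); },     // OPREDUCE
                  std::numeric_limits<double>::infinity());
}
```

For one-dimensional queries {0, 10} and reference points {1, 4, 8}, nearest_sq_dist(2, 3, 1, ...) returns {1, 4}. Reducing inside the j loop means the m-by-n intermediate matrix never has to be written to memory.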

Templates

Microkernels

See SEMIRINGKERNEL and FUSEDKERNEL for the prototypes of the architecture-dependent GKRM kernels.

Algorithms and Implementation

The goal of GKMX is to enable high-performance computing for all matrix-multiplication-like operations on most popular architectures. For x86_64 and Intel Xeon Phi, GKMX implements the Goto algorithm [] in the BLIS framework [].
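The loop structure of the Goto algorithm can be sketched as five loops around a small microkernel. The sketch below is a simplified, hypothetical C++ illustration for ordinary double-precision C += A*B in column-major storage: it uses + and × where GKMX would use OP1 and OP2, the block sizes are arbitrary illustrative values rather than HMLP's tuned ones, and it omits the packing of A and B into contiguous buffers that a real implementation performs before the inner loops.

```cpp
#include <algorithm>

// Illustrative block sizes (not HMLP's). MC and NC are multiples of MR and NR
// so the inner loops never cross a cache-block boundary.
constexpr int NC = 64, KC = 32, MC = 16, NR = 4, MR = 4;

// "Microkernel": updates an mr-by-nr block of C with a rank-kc product.
// All matrices are column-major: X(i,j) = X[j * ldx + i].
void micro_kernel(int mr, int nr, int kc,
                  const double* A, int lda, const double* B, int ldb,
                  double* C, int ldc)
{
  for (int p = 0; p < kc; ++p)
    for (int j = 0; j < nr; ++j)
      for (int i = 0; i < mr; ++i)
        C[j * ldc + i] += A[p * lda + i] * B[j * ldb + p];
}

// C (m-by-n) += A (m-by-k) * B (k-by-n), packing omitted.
void goto_gemm(int m, int n, int k,
               const double* A, int lda, const double* B, int ldb,
               double* C, int ldc)
{
  for (int jc = 0; jc < n; jc += NC)                            // loop 5: NC columns of B, C
    for (int pc = 0; pc < k; pc += KC)                          // loop 4: KC slice of k
      for (int ic = 0; ic < m; ic += MC)                        // loop 3: MC rows of A, C
        for (int jr = jc; jr < std::min(jc + NC, n); jr += NR)  // loop 2: NR columns
          for (int ir = ic; ir < std::min(ic + MC, m); ir += MR) // loop 1: MR rows
            micro_kernel(std::min(MR, m - ir), std::min(NR, n - jr),
                         std::min(KC, k - pc),
                         &A[pc * lda + ir], lda,   // A(ir, pc)
                         &B[jr * ldb + pc], ldb,   // B(pc, jr)
                         &C[jr * ldc + ir], ldc);  // C(ir, jr)
}
```

The three outer loops (jc, pc, ic) select blocks of B, of the k dimension, and of A that are intended to stay resident in the cache hierarchy; the two inner loops (jr, ir) walk MR-by-NR register blocks, each updated by the microkernel. Replacing the + and × in micro_kernel with user-supplied operators is the point where a GKMX-style generalization would plug in.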

Limitation

HMLP is currently not an open-source project. Do not distribute.
