-
Notifications
You must be signed in to change notification settings - Fork 0
Sparse matrices
Sparse matrices can be created in several ways: by converting a dense matrix to sparse:
scala> val a = bernrnd(0.5,5,5) a: BIDMat.IMat = 0 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 1 0 0 0 0 scala> val b = sparse(a) b: BIDMat.SMat = ( 1, 0) 1 ( 2, 0) 1 ( 4, 0) 1 ( 3, 1) 1 ( 1, 2) 1 ( 2, 2) 1 ( 0, 3) 1 ( 1, 3) 1 ... ... ...
from which you can see that sparse matrices print out as (row, column, value) tuples. Note that a was actually an IMat, but was implicitly cast to FMat, which is the argument type for sparse().
There is a sparse random number generator, sprand() whose arguments are nrows, ncols, and probability of element presence. The full() function converts back from a sparse matrix to a dense one.
scala> val a = sprand(5,5,0.5) a: BIDMat.SMat = ( 0, 0) 0.32618 ( 2, 0) 0.12405 ( 3, 0) 0.48652 ( 0, 1) 0.87441 ( 1, 1) 0.44249 ( 2, 1) 0.16163 ( 0, 2) 0.56278 ( 1, 2) 0.56792 ... ... ... scala> full(a) res0: BIDMat.FMat = 0.32618 0.87441 0.56278 0.50188 0 0 0.44249 0.56792 0.75591 0.58401 0.12405 0.16163 0 0.91592 0 0.48652 0 0 0.82078 0 0 0 0.97430 0 0
The operators \ and on work as they do for dense matrices. Its important to know that both horizontal and vertical concatenation is an efficient operation for sparse matrices. In both cases, the time taken is proportional to the size of the output: no additional sorting is needed.
scala> val b = sprand(5,3,0.3) b: BIDMat.SMat = ( 1, 0) 0.78089 ( 0, 1) 0.87985 ( 2, 1) 0.13684 ( 1, 2) 0.38906 ( 2, 2) 0.078547 ( 3, 2) 0.63639 scala> a \ b res1: BIDMat.SMat = ( 0, 0) 0.32618 ( 2, 0) 0.12405 ( 3, 0) 0.48652 ( 0, 1) 0.87441 ( 1, 1) 0.44249 ( 2, 1) 0.16163 ( 0, 2) 0.56278 ( 1, 2) 0.56792 ... ... ... scala> full(a\b) res2: BIDMat.FMat = 0.32618 0.87441 0.56278 0.50188 0 0 0.87985 0 0 0.44249 0.56792 0.75591 0.58401 0.78089 0 0.38906 0.12405 0.16163 0 0.91592 0 0 0.13684 0.078547 0.48652 0 0 0.82078 0 0 0 0.63639 0 0 0.97430 0 0 0 0 0 scala> full(a) res4: BIDMat.FMat = 0.32618 0.87441 0.56278 0.50188 0 0 0.44249 0.56792 0.75591 0.58401 0.12405 0.16163 0 0.91592 0 0.48652 0 0 0.82078 0 0 0 0.97430 0 0 scala> full(b) res5: BIDMat.FMat = 0 0.87985 0 0.78089 0 0.38906 0 0.13684 0.078547 0 0 0.63639 0 0 0
find functions work exactly as they do for dense matrices. i.e. find(A) returns the single indices of non-zeros of A, find2(A) returns a two-tuple with row and column indices of non-zeros, and find3(A) returns a three-tuple with row, column and value as column matrices. It should always be the case that findx(A) = findx(sparse(A)) and findx(B) = findx(full(B)) for findx one of find(), find2() or find3().
Just as for Dense matrices, the find functions for sparse matrices return the indices in column-major order. That means that the indices returned by find() will be strictly increasing.
The find() functions play a special role for sparse matrices however, because they allow access to alternative representations of sparse arrays. Sparse matrices in BIDMat, as in Fortran and Matlab, are internally stored in compressed sparse column (CSC) format. CSC is a column-major format where column indices are compressed. For a matrix in column-major order, the column indices are non-decreasing, and e.g. all the elements with column index j occur in one contiguous block in this ordering. Rather than store them explicitly, in CSC you store only the start and end indices of this block. Specifically, in a CSC matrix there is an array jc of length ncols+1 where jc(j) is the index of the first element whose column index is j in the column-major list of all non-zero elements. jc(ncols) is equal to the number of non-zeros in the matrix.
The find3() function effectively converts the matrix to coordinate format, where all of row, column and value matrices are available. It allows the matrix elements to be reordered or filtered in various ways. All the find functions are efficient, and take linear time.
The sparse function is overloaded to construct (CSC) sparse matrices from vectors of indices and value. So if we had indices defined like this:
scala> val ii = icol(1,0,2,1,2,3) ii: BIDMat.IMat = 1 0 2 1 2 3 scala> val jj = icol(0,1,1,2,2,2) jj: BIDMat.IMat = 0 1 1 2 2 2 scala> val vv = col(0.78, 0.88, 0.13, 0.39, 0.08, 0.64) vv: BIDMat.FMat = 0.78000 0.88000 0.13000 0.39000 0.080000 0.64000 scala> val a = sparse(ii,jj,vv) a: BIDMat.SMat = ( 1, 0) 0.78000 ( 0, 1) 0.88000 ( 2, 1) 0.13000 ( 1, 2) 0.39000 ( 2, 2) 0.080000 ( 3, 2) 0.64000 scala> full(a) res0: BIDMat.FMat = 0 0.88000 0 0.78000 0 0.39000 0 0.13000 0.080000 0 0 0.64000
Note that creating sparse matrices in this way is expensive. Its necessary to sort the indices to column-major order, to reduce (sum values) over duplicate indices, and to remove zero values. In general this is an O(n log n) process.
Sparse array elements can be accessed directly as A(row,column)
scala> full(a) res0: BIDMat.FMat = 0 0.88000 0 0.78000 0 0.39000 0 0.13000 0.080000 0 0 0.64000 scala> a(0,1) res1: Double = 0.88
but this is a slow operation. Since the row indices are stored in a sorted array per column, a binary search is needed to find the row index if it exists. Its better to use "batch" calculations on sparse matrices that work on all their elements at once, or on blocks of values.
Updating is possible on sparse matrix elements, but only if the element exists. Adding a new non-zero would require returning a new array, and this is not supported:
scala> full(a) res0: BIDMat.FMat = 0 0.88000 0 0.78000 0 0.39000 0 0.13000 0.080000 0 0 0.64000 scala> a(0,1)=0.2 res2: Double = 0.2 scala> full(a) res4: BIDMat.FMat = 0 0.20000 0 0.78000 0 0.39000 0 0.13000 0.080000 0 0 0.64000 scala> a(0,0)=1 java.lang.RuntimeException: Can't set missing values
Slicing is an important operation for sparse matrices because it allows operations on sparse matrices in blocks. Of particular importance is column slicing (where the row index is a wildcard), which is always efficient:
scala> val a = sprand(5,10,0.4) a: BIDMat.SMat = ( 0, 0) 0.97430 ( 2, 0) 0.50188 ( 0, 1) 0.75591 ( 1, 1) 0.91592 ( 3, 1) 0.82078 ( 4, 1) 0.58401 ( 1, 2) 0.51697 ( 3, 2) 0.19109 ... ... ... scala> full(a) res7: BIDMat.FMat = 0.97430 0.75591 0 0 0.33410 0.25824 0 0 0 0.38906 0 0.91592 0.51697 0 0 0.58547 0 0 0 0 0.50188 0 0 0 0 0.25645 0.88392 0 0.87985 0 0 0.82078 0.19109 0.066242 0 0.77816 0 0.12139 0.13684 0.078547 0 0.58401 0.93623 0 0.64584 0 0 0.78089 0 0 scala> val b = a(?, 1 \ 2 \ 4 \ 4) b: BIDMat.SMat = ( 0, 0) 0.75591 ( 1, 0) 0.91592 ( 3, 0) 0.82078 ( 4, 0) 0.58401 ( 1, 1) 0.51697 ( 3, 1) 0.19109 ( 4, 1) 0.93623 ( 0, 2) 0.33410 ... ... ... scala> full(b) res8: BIDMat.FMat = 0.75591 0 0.33410 0.33410 0.91592 0.51697 0 0 0 0 0 0 0.82078 0.19109 0 0 0.58401 0.93623 0.64584 0.64584
On the other hand, row slicing in general requires a sort of the result and can be O(n log n). If Regular row slicing is needed, it may be better to compute the transpose of the matrix first, and then slice by columns.