Thresholding sparse matrices in Matlab
Here are the methods I tried:
function [tA] = hard_threshold(A, t)
tic;
tA = sparse(size(A));
tA(abs(A) >= t) = A(abs(A) >= t);
toc;
clear tA;
tic;
tA = A;
tA(abs(tA) < t) = 0;
toc;
clear tA;
tic;
tA = A;
find_A = find(A);
find_tA = find(abs(A) >= t);
victim_tA = setdiff(find_A, find_tA);
tA(victim_tA) = 0;
toc;
fprintf('numel(A):%i nnz(A):%i nnz(tA):%i \n', numel(A), nnz(A), nnz(tA)');
end
I first tried a small sparse matrix with 100k elements, 1% sparsity, removing 50% of nonzeros:
A = sprand(1e5,1,0.01); tA = hard_threshold(A, 0.5);
Elapsed time is 0.128991 seconds.
Elapsed time is 0.007644 seconds.
Elapsed time is 0.003038 seconds.
numel(A):100000 nnz(A):995 nnz(tA):489
I next repeated with 1m elements:
A = sprand(1e6,1,0.01); tA = hard_threshold(A, 0.5);
Elapsed time is 15.456836 seconds.
Elapsed time is 0.082908 seconds.
Elapsed time is 0.018396 seconds.
numel(A):1000000 nnz(A):9966 nnz(tA):5019
With 100m elements, excluding the first, slowest, method:
A = sprand(1e8,1,0.01); tA = hard_threshold(A, 0.5);
Elapsed time is 16.405617 seconds.
Elapsed time is 0.259951 seconds.
numel(A):100000000 nnz(A):994845 nnz(tA):498195
The time differential is about the same even when the thresholded matrix is much sparser than the original:
A = sprand(1e8,1,0.01); tA = hard_threshold(A, 0.95);
Elapsed time is 12.980427 seconds.
Elapsed time is 0.238180 seconds.
numel(A):100000000 nnz(A):995090 nnz(tA):49950
The second method fails due to memory constraints for really large sparse matrices:
Error using <
Requested 1000000000x1 (7.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may
take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information. Error in hard_threshold (line 10)
tA(abs(tA) < t) = 0;
After excluding the second method, the third method gives:
A = sprand(1e9,1,0.01); tA = hard_threshold(A, 0.5);
Elapsed time is 1.894251 seconds.
numel(A):1000000000 nnz(A):9950069 nnz(tA):4977460
Are there any other approaches that are faster?
This was originally published here: https://calvinmccarter.wordpress.com/2017/06/07/thresholding-sparse-matrices-in-matlab/