How to speed up large double sums in a table?
I am calculating a 3-by-3 matrix whose elements are given as follows:
$$
M_{mn} = \frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{N} \left(r^i_m - r^j_m\right)\left(r^i_n - r^j_n\right) \tag{1}
$$
where $N$ is the total number of particles and $r^i_m$ denotes the $m$-th component of the $i$-th particle's vector. The sums can be computationally expensive, since $N$ is often around $10000$.
Below is my implementation of the matrix $M$ in Mathematica:
matrixM = Table[(1./npart)*
   Sum[Sum[(Part[vecs, i, m] - Part[vecs, j, m])*(Part[vecs, i, n] -
        Part[vecs, j, n]), {j, 1, npart}], {i, 1, npart}], {m, 3}, {n, 3}];
and here vecs is the array of all particle vectors (one row per particle, and each row of the array is a vector of 3 components).
- How could I speed up computations of this nature? Is there a bottleneck in my implementation due to the way I generate the matrix using Table, or due to the fact that I access the components stored in a big array using Part? Any advice would be very helpful.
matrix performance-tuning table summation
Avoid using N as a variable name. It is built-in functionality.
– zhk
May 5 at 12:11
@zhk Good point. I actually used npart as the variable name in my own notebook; while writing the post I ended up using the same names as in eq. (1). It is now replaced by npart instead of N in the shown code.
– user929304
May 5 at 12:13
3 Answers
Using only matrix operations makes this much faster (runtime scales quadratically with npart):
matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
   (Transpose[diff].diff)/npart];
For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.
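To put a number on "huge" (a rough estimate added here for illustration, not part of the original answer): diff has npart^2 rows of 3 machine reals at 8 bytes each, which can be computed directly.
(* rough memory estimate for the intermediate array diff; illustration only *)
npart = 10000;
UnitConvert[Quantity[npart^2*3*8, "Bytes"], "Gigabytes"]
(* about 2.4 GB for the raw array alone, before any temporary copies *)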
Update
After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:
matrixM = 2*(npart-1)*Covariance[vecs]
Much much faster, no space issues, no stability issues!
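As a quick sanity check (an illustrative sketch added here, not part of the original answer), one can verify on a small random sample that 2*(npart-1)*Covariance[vecs] agrees with the explicit double sum from eq. (1):
(* verify the covariance identity on a small example *)
SeedRandom[1];
npart = 200;
vecs = RandomReal[{-1, 1}, {npart, 3}];
bruteForce = Table[(1./npart)*
    Sum[(vecs[[i, m]] - vecs[[j, m]]) (vecs[[i, n]] - vecs[[j, n]]),
     {i, npart}, {j, npart}], {m, 3}, {n, 3}];
viaCovariance = 2*(npart - 1)*Covariance[vecs];
Max[Abs[bruteForce - viaCovariance]]
(* a value at machine-roundoff level, confirming the two agree *)
The identity holds because Covariance uses the $1/(N-1)$ normalization, so $2(N-1)\,\mathrm{Cov}(r_m, r_n) = 2\sum_i (r_m^i - \bar{r}_m)(r_n^i - \bar{r}_n)$, which equals the double sum divided by $N$.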
This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
– user929304
May 5 at 13:21
@user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
– Roman
May 5 at 13:25
Very instructive, thanks a lot.
– user929304
May 5 at 13:37
What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (a 7000 times speed-up for one example I just tried). How can it be performed this rapidly!
– user929304
May 5 at 18:29
@CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
– Roman
May 6 at 8:33
Your double sum can be written in terms of single sums:
$$M_{mn}=2\sum_{i=1}^N r_m^i r_n^i-\frac{2}{N}\sum_{i=1}^N r_m^i \sum_{i=1}^N r_n^i$$
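For completeness, here is the expansion behind that identity (a derivation sketch added here; it is not part of the original answer). Multiplying out the summand and using the symmetry in $i$ and $j$ gives
$$
\begin{aligned}
M_{mn} &= \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(r_m^i r_n^i - r_m^i r_n^j - r_m^j r_n^i + r_m^j r_n^j\right)\\
       &= \frac{1}{N}\left(2N\sum_{i=1}^{N} r_m^i r_n^i - 2\sum_{i=1}^{N} r_m^i \sum_{i=1}^{N} r_n^i\right)
        = 2\sum_{i=1}^{N} r_m^i r_n^i - \frac{2}{N}\sum_{i=1}^{N} r_m^i \sum_{i=1}^{N} r_n^i .
\end{aligned}
$$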
A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by
$$M_{mn}=2\sum_{i=1}^N \left(r_m^i - \bar{r}_m\right)\left(r_n^i-\bar{r}_n\right)$$
where $\bar{r}_k$ is the mean of the $r_k^i$ values over $i$.
nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], {nparticles, 3}];
m = 1; n = 2;

AbsoluteTiming[
 Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
  {i, nparticles}, {j, nparticles}]]
(* {3.4060337777777776, -2.42222014120762} *)

AbsoluteTiming[
 2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
     Total[r[[All, n]]]/nparticles)]]
(* {0.00008335802469135802, -2.4222201412076174} *)
Using single sums is about 40,000 times faster for 1,000 particles.
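If all nine entries of the matrix are needed, the same stable single-sum form can be applied to every component pair at once via a centered dot product. This is a sketch added here for illustration (not part of the original answer); rc and matrixM2 are names introduced just for this example:
(* center the data, then form all M_mn in one 3 x 3 dot product *)
rc = r - ConstantArray[Mean[r], nparticles];
matrixM2 = 2*Transpose[rc].rc;
matrixM2[[1, 2]]
(* matches the m = 1, n = 2 value computed above *)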
I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
– Roman
May 5 at 16:43
@Roman The single-sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as estimating a covariance, so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
– JimB
May 5 at 16:48
As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], {nparticles, 3}] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
– Roman
May 5 at 16:48
@Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
– JimB
May 5 at 17:02
See the update on my answer: this robustification amounts to using the built-in Covariance function!
– Roman
May 5 at 17:34
This is a nice application of low-rank factorization by Adaptive Cross Approximation. We require the first code block from this post, which defines the function ACACompression.
Now we can do this:
npart = 5000;
SeedRandom[1234];
vecs = RandomReal[{-1, 1}, {npart, 3}];

(* Roman's proposal *)
matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
     (Transpose[diff].diff)/npart
     ]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
     Block[{x, y, col, row},
      x = r[[i]];
      y = r[[j]];
      col = row = k \[Function] (x[[k]] - x) (y[[k]] - y)/npart;
      {U, V} = ACACompression[row, col];
      Total[U, 2].Total[V, 2]
      ],
     {i, 1, 3}, {j, 1, 3}]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]

16.8389

0.076707

6.0471*10^-15
Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points it needs about 1 second; for 1000000 uniformly distributed points it needs about 20 seconds (probably due to memory bottlenecks).
The efficiency and accuracy depend quite a bit on the distribution of points, but one does not require overly uniformly distributed point clouds to make this work.
When the point distribution is not remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression needs roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves, which need at least $O(N^2)$ time. More severely, Outer also needs $O(N^2)$ memory. This is why using Outer becomes infeasible quite quickly for increasing $N$.
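If one nevertheless wants the exact dense computation without the $O(N^2)$ memory of Outer, the pairwise differences can be processed in row blocks and the 3-by-3 contributions accumulated, so that only a (chunk*npart)-by-3 slice exists at any time. This is a sketch added here for illustration (not part of the original answer; the block size chunk is an arbitrary choice), and the work is still $O(N^2)$ -- only the memory footprint drops:
(* block-wise accumulation of Transpose[diff].diff; uses npart, vecs, matrixM from above *)
chunk = 500;
matrixMBlocked = (1./npart)*Total[
    Table[
     With[{diff = Flatten[
         Outer[Subtract, vecs[[k ;; Min[k + chunk - 1, npart]]], vecs, 1], 1]},
      Transpose[diff].diff],
     {k, 1, npart, chunk}]];
Max[Abs[matrixMBlocked - matrixM]]
(* a value at machine-roundoff level *)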
Many thanks Henrik! Excellent complement to the other answers; in particular, learning the ACA method will also be very useful for future, similar computations. This whole post has been so incredibly instructive.
– user929304
May 5 at 18:36
You're welcome!
– Henrik Schumacher
May 5 at 18:40