How to speed up large double sums in a table?


I am calculating a 3-by-3 matrix whose elements are given as follows:

$$
M_{mn} = \frac{1}{N}\sum_{i=1}^N \sum_{j=1}^N (r^i_m - r^j_m)(r^i_n - r^j_n) \tag{1}
$$

where $N$ is the total number of particles and $r^i_m$ denotes the $m$-th component of the $i$-th particle's vector. The sums can be computationally expensive, as $N$ is often around $10000$.

Below is my implementation of the matrix $M$ in Mathematica:

matrixM = Table[(1./npart)*
   Sum[Sum[(Part[vecs, i, m] - Part[vecs, j, m])*(Part[vecs, i, n] -
       Part[vecs, j, n]), {j, 1, npart}], {i, 1, npart}], {m, 3}, {n, 3}];

Here vecs is the array of all particle vectors (one row per particle, each row being a vector of 3 components).

  • How could I speed up computations of this nature? Is the bottleneck the way I generate the matrix using Table, or the fact that I access the components stored in a big array using Part? Any advice would be very helpful.
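For concreteness, here is one possible way to generate a test array with the layout described above (an editor's sketch, not part of the original question; the names npart and vecs follow the code, while the uniform random positions and the smaller particle count are assumptions for experimentation):

npart = 1000;  (* smaller than the target 10000, just for quick experiments *)
SeedRandom[1];
vecs = RandomReal[{-1, 1}, {npart, 3}];  (* one row per particle, 3 components per row *)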









Tags: matrix, performance-tuning, table, summation






asked May 5 at 12:10 by user929304
edited May 5 at 15:11 by user64494











  • Avoid using N as a variable name; it is built-in functionality. – zhk, May 5 at 12:11

  • @zhk Good point. I actually used npart as the variable name in my own notebook; while writing the post I ended up using the same names as in eq. (1). The shown code now uses npart instead of N. – user929304, May 5 at 12:13

















3 Answers

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
   (Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.
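To put a rough number on this (an editor's estimate, not part of the original answer), diff holds npart^2 rows of 3 machine reals at 8 bytes each:

N[10000^2*3*8/2^30]  (* about 2.2 GiB for the intermediate diff array alone *)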




Update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!
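As a quick consistency check (an editor's sketch, not part of the original answer; the test size and random data are assumptions), the covariance form can be compared against the explicit double sum on a small sample:

npartTest = 200;
SeedRandom[1];
vecsTest = RandomReal[{-1, 1}, {npartTest, 3}];
direct = Table[(1./npartTest)*
    Sum[(vecsTest[[i, m]] - vecsTest[[j, m]])*(vecsTest[[i, n]] - vecsTest[[j, n]]),
     {i, npartTest}, {j, npartTest}], {m, 3}, {n, 3}];
viaCovariance = 2*(npartTest - 1)*Covariance[vecsTest];
Max[Abs[direct - viaCovariance]]  (* should be at the level of machine-precision round-off *)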






answered May 5 at 12:59 by Roman (edited May 5 at 17:34)

  • This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such a translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation? – user929304, May 5 at 13:21

  • @user929304 Every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm, and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms – Roman, May 5 at 13:25

  • Very instructive, thanks a lot. – user929304, May 5 at 13:37

  • What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (a 7000-times speed-up for one example I just tried). How can it be performed this rapidly! – user929304, May 5 at 18:29

  • @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and it used to be even more important back in the days before computers. – Roman, May 6 at 8:33

Your double sum can be written in terms of single sums.



$$M_{mn}=2\sum_{i=1}^N r_m^i r_n^i-\frac{2}{N}\sum_{i=1}^N r_m^i \sum_{j=1}^N r_n^j$$
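For reference, a short derivation added here (not in the original answer): expand the product and use the symmetry of the double sum in $i$ and $j$,

$$
\sum_{i=1}^N\sum_{j=1}^N (r_m^i - r_m^j)(r_n^i - r_n^j)
= \sum_{i,j}\left(r_m^i r_n^i + r_m^j r_n^j - r_m^i r_n^j - r_m^j r_n^i\right)
= 2N\sum_{i=1}^N r_m^i r_n^i - 2\sum_{i=1}^N r_m^i \sum_{j=1}^N r_n^j ,
$$

and dividing by $N$ gives the expression above.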



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_{mn}=2\sum_{i=1}^N (r_m^i - \bar{r}_m)(r_n^i-\bar{r}_n)$$


where $\bar{r}_k$ denotes the mean of the $r_k^i$ values over $i$.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], {nparticles, 3}];

m = 1; n = 2;
AbsoluteTiming[
 Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
  {i, nparticles}, {j, nparticles}]]
(* {3.4060337777777776`, -2.42222014120762`} *)

AbsoluteTiming[
 2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
     Total[r[[All, n]]]/nparticles)]]
(* {0.00008335802469135802`, -2.4222201412076174`} *)


Using single sums is about 40,000 times faster for 1,000 particles.
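If all nine entries are wanted at once, the mean-shifted form also maps onto a single dot product (an editor's sketch in the spirit of the answer above, not part of it; rc is a hypothetical name for the mean-centered data):

rc = r - ConstantArray[Mean[r], nparticles];  (* subtract the per-component means *)
matrixM = 2*Transpose[rc].rc;                 (* entry (m,n) is 2 Sum_i (r_m^i - mean_m)(r_n^i - mean_n) *)

This agrees with the Covariance-based form in the other answer, since Covariance normalizes by nparticles - 1.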






answered May 5 at 16:06 by JimB (edited May 5 at 17:21)








  • I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure! – Roman, May 5 at 16:43

  • @Roman The single-sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as estimating a covariance, so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order. – JimB, May 5 at 16:48

  • As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], {nparticles, 3}] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r. – Roman, May 5 at 16:48

  • @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested. – JimB, May 5 at 17:02

  • See the update on my answer: this robustification means using the built-in Covariance function! – Roman, May 5 at 17:34

This is a nice application of low-rank factorization by Adaptive Cross Approximation. We require the first code block from this post, which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[{-1, 1}, {npart, 3}];

(* Roman's proposal *)
matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
     (Transpose[diff].diff)/npart
     ]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
     Block[{x, y, col, row},
      x = r[[i]];
      y = r[[j]];
      col = row = k \[Function] (x[[k]] - x) (y[[k]] - y)/npart;
      {U, V} = ACACompression[row, col];
      Total[U, 2].Total[V, 2]
      ],
     {i, 1, 3}, {j, 1, 3}]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 seconds (probably due to memory bottlenecks).



The efficiency and accuracy depend quite a lot on the distribution of the points, but one does not need particularly uniformly distributed point clouds to make this work.



When the points are not remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression needs roughly $O(N)$ time and $O(N)$ memory for the factorization, in contrast to Outer and the double Sums themselves, which need at least $O(N^2)$ time. More severely, Outer also needs $O(N^2)$ memory. This is why using Outer becomes infeasible quite quickly as $N$ increases.
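To see why an essentially exact low-rank factorization exists here (an editor's illustration, not part of the original answer; the small test matrix below is an assumption), note that the summand matrix with entries $(x_k - x_l)(y_k - y_l)/\text{npart}$ is a sum of four rank-one terms, so its rank is at most 4 regardless of the number of particles:

n = 50; SeedRandom[7];
x = RandomReal[{-1, 1}, n]; y = RandomReal[{-1, 1}, n];
A = Table[(x[[k]] - x[[l]]) (y[[k]] - y[[l]])/n, {k, n}, {l, n}];
MatrixRank[A]  (* typically 4, independently of n *)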






answered May 5 at 14:42 by Henrik Schumacher (edited May 5 at 15:33)








  • Many thanks Henrik! Excellent complement to the other answers; in particular, learning the ACA method will also be very useful for future, similar computations. This whole post has been so incredibly instructive. – user929304, May 5 at 18:36

  • You're welcome! – Henrik Schumacher, May 5 at 18:40











Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "387"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f197722%2fhow-to-speed-up-large-double-sums-in-a-table%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









12












$begingroup$

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!






share|improve this answer











$endgroup$












  • $begingroup$
    This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
    $endgroup$
    – user929304
    May 5 at 13:21






  • 3




    $begingroup$
    @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
    $endgroup$
    – Roman
    May 5 at 13:25







  • 1




    $begingroup$
    Very instructive, thanks a lot.
    $endgroup$
    – user929304
    May 5 at 13:37










  • $begingroup$
    What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
    $endgroup$
    – user929304
    May 5 at 18:29






  • 1




    $begingroup$
    @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
    $endgroup$
    – Roman
    May 6 at 8:33















12












$begingroup$

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!






share|improve this answer











$endgroup$












  • $begingroup$
    This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
    $endgroup$
    – user929304
    May 5 at 13:21






  • 3




    $begingroup$
    @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
    $endgroup$
    – Roman
    May 5 at 13:25







  • 1




    $begingroup$
    Very instructive, thanks a lot.
    $endgroup$
    – user929304
    May 5 at 13:37










  • $begingroup$
    What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
    $endgroup$
    – user929304
    May 5 at 18:29






  • 1




    $begingroup$
    @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
    $endgroup$
    – Roman
    May 6 at 8:33













12












12








12





$begingroup$

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!






share|improve this answer











$endgroup$



Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!







share|improve this answer














share|improve this answer



share|improve this answer








edited May 5 at 17:34

























answered May 5 at 12:59









RomanRoman

8,27811238




8,27811238











  • $begingroup$
    This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
    $endgroup$
    – user929304
    May 5 at 13:21






  • 3




    $begingroup$
    @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
    $endgroup$
    – Roman
    May 5 at 13:25







  • 1




    $begingroup$
    Very instructive, thanks a lot.
    $endgroup$
    – user929304
    May 5 at 13:37










  • $begingroup$
    What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
    $endgroup$
    – user929304
    May 5 at 18:29






  • 1




    $begingroup$
    @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
    $endgroup$
    – Roman
    May 6 at 8:33
















  • $begingroup$
    This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
    $endgroup$
    – user929304
    May 5 at 13:21






  • 3




    $begingroup$
    @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
    $endgroup$
    – Roman
    May 5 at 13:25







  • 1




    $begingroup$
    Very instructive, thanks a lot.
    $endgroup$
    – user929304
    May 5 at 13:37










  • $begingroup$
    What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
    $endgroup$
    – user929304
    May 5 at 18:29






  • 1




    $begingroup$
    @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
    $endgroup$
    – Roman
    May 6 at 8:33















$begingroup$
This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
$endgroup$
– user929304
May 5 at 13:21




$begingroup$
This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation?
$endgroup$
– user929304
May 5 at 13:21




3




3




$begingroup$
@user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
$endgroup$
– Roman
May 5 at 13:25





$begingroup$
@user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
$endgroup$
– Roman
May 5 at 13:25





1




1




$begingroup$
Very instructive, thanks a lot.
$endgroup$
– user929304
May 5 at 13:37




$begingroup$
Very instructive, thanks a lot.
$endgroup$
– user929304
May 5 at 13:37












$begingroup$
What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
$endgroup$
– user929304
May 5 at 18:29




$begingroup$
What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly!
$endgroup$
– user929304
May 5 at 18:29




1




1




$begingroup$
@CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
$endgroup$
– Roman
May 6 at 8:33




$begingroup$
@CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers.
$endgroup$
– Roman
May 6 at 8:33











7












$begingroup$

Your double sum can be written in terms of single sums.



$$M_mn=2sum_i=1^N r_m^i r_n^i-2sum_i=1^N r_m^i sum_i=1^N r_n^i/N$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_mn=2sum_i=1^N (r_m^i - barr_m^i)(r_n^i-barr_n^i)$$



where $barr_k^i$ is the mean of the $r_k^i$ values.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], nparticles, 3];

m = 1; n = 2;
AbsoluteTiming[
Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
i, nparticles, j, nparticles]]
(* 3.4060337777777776`,-2.42222014120762` *)

AbsoluteTiming[
2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
Total[r[[All, n]]]/nparticles)]]
(* 0.00008335802469135802`,-2.4222201412076174` *)


Using single sums is about 40,000 times faster for 1,000 particles.






share|improve this answer











$endgroup$








  • 2




    $begingroup$
    I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
    $endgroup$
    – Roman
    May 5 at 16:43










  • $begingroup$
    @Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
    $endgroup$
    – JimB
    May 5 at 16:48










  • $begingroup$
    As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
    $endgroup$
    – Roman
    May 5 at 16:48






  • 2




    $begingroup$
    @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
    $endgroup$
    – JimB
    May 5 at 17:02






  • 1




    $begingroup$
    see update on my answer: this robustification means using a built-in Covariance function!
    $endgroup$
    – Roman
    May 5 at 17:34















7












$begingroup$

Your double sum can be written in terms of single sums.



$$M_mn=2sum_i=1^N r_m^i r_n^i-2sum_i=1^N r_m^i sum_i=1^N r_n^i/N$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_mn=2sum_i=1^N (r_m^i - barr_m^i)(r_n^i-barr_n^i)$$



where $barr_k^i$ is the mean of the $r_k^i$ values.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], nparticles, 3];

m = 1; n = 2;
AbsoluteTiming[
Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
i, nparticles, j, nparticles]]
(* 3.4060337777777776`,-2.42222014120762` *)

AbsoluteTiming[
2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
Total[r[[All, n]]]/nparticles)]]
(* 0.00008335802469135802`,-2.4222201412076174` *)


Using single sums is about 40,000 times faster for 1,000 particles.






share|improve this answer











$endgroup$








  • 2




    $begingroup$
    I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
    $endgroup$
    – Roman
    May 5 at 16:43










  • $begingroup$
    @Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
    $endgroup$
    – JimB
    May 5 at 16:48










  • $begingroup$
    As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
    $endgroup$
    – Roman
    May 5 at 16:48






  • 2




    $begingroup$
    @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
    $endgroup$
    – JimB
    May 5 at 17:02






  • 1




    $begingroup$
    see update on my answer: this robustification means using a built-in Covariance function!
    $endgroup$
    – Roman
    May 5 at 17:34













7












7








7





$begingroup$

Your double sum can be written in terms of single sums.



$$M_mn=2sum_i=1^N r_m^i r_n^i-2sum_i=1^N r_m^i sum_i=1^N r_n^i/N$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_mn=2sum_i=1^N (r_m^i - barr_m^i)(r_n^i-barr_n^i)$$



where $barr_k^i$ is the mean of the $r_k^i$ values.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], nparticles, 3];

m = 1; n = 2;
AbsoluteTiming[
Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
i, nparticles, j, nparticles]]
(* 3.4060337777777776`,-2.42222014120762` *)

AbsoluteTiming[
2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
Total[r[[All, n]]]/nparticles)]]
(* 0.00008335802469135802`,-2.4222201412076174` *)


Using single sums is about 40,000 times faster for 1,000 particles.






share|improve this answer











$endgroup$



Your double sum can be written in terms of single sums.



$$M_mn=2sum_i=1^N r_m^i r_n^i-2sum_i=1^N r_m^i sum_i=1^N r_n^i/N$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_mn=2sum_i=1^N (r_m^i - barr_m^i)(r_n^i-barr_n^i)$$



where $barr_k^i$ is the mean of the $r_k^i$ values.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], nparticles, 3];

m = 1; n = 2;
AbsoluteTiming[
Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
i, nparticles, j, nparticles]]
(* 3.4060337777777776`,-2.42222014120762` *)

AbsoluteTiming[
2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
Total[r[[All, n]]]/nparticles)]]
(* 0.00008335802469135802`,-2.4222201412076174` *)


Using single sums is about 40,000 times faster for 1,000 particles.







share|improve this answer














share|improve this answer



share|improve this answer








edited May 5 at 17:21

























answered May 5 at 16:06









JimBJimB

18.9k12863




18.9k12863







  • 2




    $begingroup$
    I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
    $endgroup$
    – Roman
    May 5 at 16:43










  • $begingroup$
    @Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
    $endgroup$
    – JimB
    May 5 at 16:48










  • $begingroup$
    As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
    $endgroup$
    – Roman
    May 5 at 16:48






  • 2




    $begingroup$
    @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
    $endgroup$
    – JimB
    May 5 at 17:02






  • 1




    $begingroup$
    see update on my answer: this robustification means using a built-in Covariance function!
    $endgroup$
    – Roman
    May 5 at 17:34












  • 2




    $begingroup$
    I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
    $endgroup$
    – Roman
    May 5 at 16:43










  • $begingroup$
    @Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
    $endgroup$
    – JimB
    May 5 at 16:48










  • $begingroup$
    As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
    $endgroup$
    – Roman
    May 5 at 16:48






  • 2




    $begingroup$
    @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
    $endgroup$
    – JimB
    May 5 at 17:02






  • 1




    $begingroup$
    see update on my answer: this robustification means using a built-in Covariance function!
    $endgroup$
    – Roman
    May 5 at 17:34







2




2




$begingroup$
I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
$endgroup$
– Roman
May 5 at 16:43




$begingroup$
I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure!
$endgroup$
– Roman
May 5 at 16:43












$begingroup$
@Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
$endgroup$
– JimB
May 5 at 16:48




$begingroup$
@Roman The single sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as a estimating a covariance so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order.
$endgroup$
– JimB
May 5 at 16:48












$begingroup$
As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
$endgroup$
– Roman
May 5 at 16:48




$begingroup$
As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], nparticles, 3] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r.
$endgroup$
– Roman
May 5 at 16:48




2




2




$begingroup$
@Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
$endgroup$
– JimB
May 5 at 17:02




$begingroup$
@Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested.
$endgroup$
– JimB
May 5 at 17:02




1




1




$begingroup$
see update on my answer: this robustification means using a built-in Covariance function!
$endgroup$
– Roman
May 5 at 17:34




$begingroup$
see update on my answer: this robustification means using a built-in Covariance function!
$endgroup$
– Roman
May 5 at 17:34











6












$begingroup$

This is a nice application of low rank factorization by Adaptive Cross Approximation. We require the first code block from this post which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[-1, 1, npart, 3];

(* Roman's proposal*)
matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart
]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
Block[x, y, col, row,
x = r[[i]];
y = r[[j]];
col = row = k [Function] (x[[k]] - x) (y[[k]] - y)/npart;
U, V = ACACompression[row, col];
Total[U, 2].Total[V, 2]
],
i, 1, 3, j, 1, 3]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 second (probably due to a memory bottlenecks).



The efficiency and accuracy depend quite much on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make that work.



When the points are not in remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves which need at least $O(N^2)$ time. More severely, Outer needs also $O(N^2)$ memory. This is why using Outer becomes infeasible quite qickly for increasing $N$.






share|improve this answer











$endgroup$








  • 2




    $begingroup$
    Many thanks Henrik! Excellent complement to the other answers, in particular learning the ACA method will be also very useful for all future and similar computations. This whole post has been so incredibly instructive.
    $endgroup$
    – user929304
    May 5 at 18:36










  • $begingroup$
    You're welcome!
    $endgroup$
    – Henrik Schumacher
    May 5 at 18:40















6












$begingroup$

This is a nice application of low rank factorization by Adaptive Cross Approximation. We require the first code block from this post which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[-1, 1, npart, 3];

(* Roman's proposal*)
matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart
]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
Block[x, y, col, row,
x = r[[i]];
y = r[[j]];
col = row = k [Function] (x[[k]] - x) (y[[k]] - y)/npart;
U, V = ACACompression[row, col];
Total[U, 2].Total[V, 2]
],
i, 1, 3, j, 1, 3]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 second (probably due to a memory bottlenecks).



The efficiency and accuracy depend quite much on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make that work.



When the points are not in remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves which need at least $O(N^2)$ time. More severely, Outer needs also $O(N^2)$ memory. This is why using Outer becomes infeasible quite qickly for increasing $N$.






share|improve this answer











$endgroup$








  • 2




    $begingroup$
    Many thanks Henrik! Excellent complement to the other answers, in particular learning the ACA method will be also very useful for all future and similar computations. This whole post has been so incredibly instructive.
    $endgroup$
    – user929304
    May 5 at 18:36










  • $begingroup$
    You're welcome!
    $endgroup$
    – Henrik Schumacher
    May 5 at 18:40













6












6








6





$begingroup$

This is a nice application of low rank factorization by Adaptive Cross Approximation. We require the first code block from this post which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[-1, 1, npart, 3];

(* Roman's proposal*)
matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart
]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
Block[x, y, col, row,
x = r[[i]];
y = r[[j]];
col = row = k [Function] (x[[k]] - x) (y[[k]] - y)/npart;
U, V = ACACompression[row, col];
Total[U, 2].Total[V, 2]
],
i, 1, 3, j, 1, 3]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 second (probably due to a memory bottlenecks).



The efficiency and accuracy depend quite much on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make that work.



When the points are not in remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves which need at least $O(N^2)$ time. More severely, Outer needs also $O(N^2)$ memory. This is why using Outer becomes infeasible quite qickly for increasing $N$.






share|improve this answer











$endgroup$



This is a nice application of low rank factorization by Adaptive Cross Approximation. We require the first code block from this post which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[-1, 1, npart, 3];

(* Roman's proposal*)
matrixM = With[diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1],
(Transpose[diff].diff)/npart
]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
Block[x, y, col, row,
x = r[[i]];
y = r[[j]];
col = row = k [Function] (x[[k]] - x) (y[[k]] - y)/npart;
U, V = ACACompression[row, col];
Total[U, 2].Total[V, 2]
],
i, 1, 3, j, 1, 3]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 second (probably due to a memory bottlenecks).



The efficiency and accuracy depend quite much on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make that work.



When the points are not in remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves which need at least $O(N^2)$ time. More severely, Outer needs also $O(N^2)$ memory. This is why using Outer becomes infeasible quite qickly for increasing $N$.







share|improve this answer














share|improve this answer



share|improve this answer








edited May 5 at 15:33

























answered May 5 at 14:42









Henrik SchumacherHenrik Schumacher

62.2k586172




62.2k586172







  • 2

    Many thanks Henrik! Excellent complement to the other answers; in particular, learning the ACA method will also be very useful for all future and similar computations. This whole post has been so incredibly instructive.
    – user929304
    May 5 at 18:36

  • You're welcome!
    – Henrik Schumacher
    May 5 at 18:40


