How to speed up large double sums in a table?


I am calculating a 3-by-3 matrix whose elements are given as follows:



$$
M_{mn} = \frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{N} \left(r^i_m - r^j_m\right)\left(r^i_n - r^j_n\right) \tag{1}
$$



where $N$ is the total number of particles and $r^i_m$ denotes the $m$-th component of the $i$-th particle's vector. The sums can be computationally expensive, since $N$ is often around $10000$.



Below is my implementation of the matrix $M$ in Mathematica:



matrixM = Table[(1./npart)*
   Sum[Sum[(Part[vecs, i, m] - Part[vecs, j, m])*(Part[vecs, i, n] -
       Part[vecs, j, n]), {j, 1, npart}], {i, 1, npart}], {m, 3}, {n, 3}];


Here vecs is the array of all particle vectors (one row per particle, and each row of the array is a vector of 3 components).
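
For reference, a minimal test setup along these lines (the values here are hypothetical placeholders; in practice vecs comes from the actual particle data) could be:

(* Hypothetical test data, just to make the snippet above runnable;
   the real vecs comes from the simulation. *)
npart = 1000;                             (* number of particles *)
vecs = RandomReal[{-1, 1}, {npart, 3}];   (* npart rows, 3 components each *)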




  • How can I speed up computations of this nature? Is the bottleneck the way I generate the matrix using Table, or the way I access the components stored in a big array using Part? Any advice would be very helpful.









matrix performance-tuning table summation

asked May 5 at 12:10 by user929304, edited May 5 at 15:11 by user64494

  • Avoid using N. It is a built-in function. – zhk, May 5 at 12:11

  • @zhk Good point. I actually used npart as the variable name in my own notebook; while writing the post I ended up reusing the names from eq. (1). The code shown now uses npart instead of N. – user929304, May 5 at 12:13

3 Answers

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
  (Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.
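
As a rough estimate of that size (assuming packed machine-precision reals at 8 bytes each), diff has npart^2 rows of 3 components:

(* Back-of-the-envelope size of the intermediate array diff,
   assuming machine-precision reals (8 bytes each). *)
npart = 10000;
npart^2*3*8/2.^30   (* about 2.2 GiB for diff alone *)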




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!
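
As a quick sanity check (a sketch on small, hypothetical test data; the real vecs is much larger), the Covariance form reproduces the direct double sum up to rounding:

(* Sanity check on small random data: the Covariance form should match
   the direct double sum up to rounding error. *)
npart = 200;
SeedRandom[1];
vecs = RandomReal[{-1, 1}, {npart, 3}];

direct = Table[(1./npart) Sum[(vecs[[i, m]] - vecs[[j, m]]) (vecs[[i, n]] - vecs[[j, n]]),
     {i, npart}, {j, npart}], {m, 3}, {n, 3}];

viaCov = 2 (npart - 1) Covariance[vecs];

Max[Abs[direct - viaCov]]   (* small, comparable to machine rounding error *)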






answered May 5 at 12:59 by Roman, last edited May 5 at 17:34












  • This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation? – user929304, May 5 at 13:21

  • @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms – Roman, May 5 at 13:25

  • Very instructive, thanks a lot. – user929304, May 5 at 13:37

  • What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly! – user929304, May 5 at 18:29

  • @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers. – Roman, May 6 at 8:33



















Your double sum can be written in terms of single sums.



$$M_{mn}=2\sum_{i=1}^N r_m^i r_n^i-\frac{2}{N}\sum_{i=1}^N r_m^i \sum_{i=1}^N r_n^i$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_{mn}=2\sum_{i=1}^N \left(r_m^i - \bar{r}_m\right)\left(r_n^i-\bar{r}_n\right)$$



where $\bar{r}_k$ is the mean of the $r_k^i$ values over $i$.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], {nparticles, 3}];

m = 1; n = 2;
AbsoluteTiming[
 Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
  {i, nparticles}, {j, nparticles}]]
(* {3.4060337777777776`, -2.42222014120762`} *)

AbsoluteTiming[
 2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
    Total[r[[All, n]]]/nparticles)]]
(* {0.00008335802469135802`, -2.4222201412076174`} *)


Using single sums is about 40,000 times faster for 1,000 particles.
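
The same mean-centering also gives the whole 3-by-3 matrix in a single matrix product (a sketch; r1 is the centered array suggested in the comments below, and r is the nparticles-by-3 array defined above):

(* Sketch: full 3x3 matrix from mean-centered data.
   Each column of r has its mean subtracted first. *)
r1 = Transpose[Transpose[r] - Mean[r]];
matrixM = 2*Transpose[r1].r1;

matrixM[[1, 2]]   (* reproduces the m = 1, n = 2 value computed above *)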






answered May 5 at 16:06 by JimB, last edited May 5 at 17:21








  • I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure! – Roman, May 5 at 16:43

  • @Roman The single-sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as estimating a covariance, so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order. – JimB, May 5 at 16:48

  • As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], {nparticles, 3}] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r. – Roman, May 5 at 16:48

  • @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested. – JimB, May 5 at 17:02

  • See the update on my answer: this robustification means using the built-in Covariance function! – Roman, May 5 at 17:34



















This is a nice application of low-rank factorization by Adaptive Cross Approximation. We require the first code block from this post, which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[{-1, 1}, {npart, 3}];

(* Roman's proposal *)
matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
    (Transpose[diff].diff)/npart
    ]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
    Block[{x, y, col, row},
     x = r[[i]];
     y = r[[j]];
     col = row = k \[Function] (x[[k]] - x) (y[[k]] - y)/npart;
     {U, V} = ACACompression[row, col];
     Total[U, 2].Total[V, 2]
     ],
    {i, 1, 3}, {j, 1, 3}]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 seconds (probably due to memory bottlenecks).



The efficiency and accuracy depend quite strongly on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make this work.



When the points are not remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves, which need at least $O(N^2)$ time. More severely, Outer also needs $O(N^2)$ memory. This is why using Outer becomes infeasible quite quickly for increasing $N$.
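
One way to see the memory-scaling difference in practice (a sketch with a hypothetical problem size; exact byte counts depend on machine and version) is to compare the peak memory of the Outer-based route with the Covariance one-liner from the other answer:

(* Compare peak additional memory: Outer-based approach vs. Covariance.
   The first grows like npart^2, the second stays roughly O(npart). *)
npart = 2000;
SeedRandom[1];
vecs = RandomReal[{-1, 1}, {npart, 3}];

MaxMemoryUsed[
  With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
   (Transpose[diff].diff)/npart]]

MaxMemoryUsed[2 (npart - 1) Covariance[vecs]]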






answered May 5 at 14:42 by Henrik Schumacher, last edited May 5 at 15:33








  • Many thanks Henrik! Excellent complement to the other answers; in particular, learning the ACA method will also be very useful for future similar computations. This whole post has been so incredibly instructive. – user929304, May 5 at 18:36

  • You're welcome! – Henrik Schumacher, May 5 at 18:40










