How to speed up large double sums in a table?


I am calculating a 3-by-3 matrix whose elements are given as follows:



$$
M_{mn} = \frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{N} \left(r^i_m - r^j_m\right)\left(r^i_n - r^j_n\right) \tag{1}
$$



where $N$ is the total number of particles and $r^i_m$ denotes the $m$-th component of the $i$-th particle's vector. The sums can be computationally expensive, since $N$ is often around $10000$.



Below is my implementation of the matrix $M$ in Mathematica:



matrixM = Table[(1./npart)*
   Sum[Sum[(Part[vecs, i, m] - Part[vecs, j, m])*(Part[vecs, i, n] -
       Part[vecs, j, n]), {j, 1, npart}], {i, 1, npart}], {m, 3}, {n, 3}];


Here vecs is the array of all particle vectors (one row per particle, and each row of the array is a vector of 3 components).
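
For reference, a minimal test setup along these lines (the values here are hypothetical placeholders; in practice vecs comes from the actual particle data) could be:

(* Hypothetical test data, just to make the snippet above runnable;
   the real vecs comes from the simulation. *)
npart = 1000;                             (* number of particles *)
vecs = RandomReal[{-1, 1}, {npart, 3}];   (* npart rows, 3 components each *)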




  • How can I speed up computations of this nature? Is the bottleneck the way I generate the matrix using Table, or the way I access the components stored in a big array using Part? Any advice would be very helpful.









matrix performance-tuning table summation

asked May 5 at 12:10 by user929304, edited May 5 at 15:11 by user64494

  • Avoid using N. It is a built-in function. – zhk, May 5 at 12:11

  • @zhk Good point. I actually used npart as the variable name in my own notebook; while writing the post I ended up reusing the names from eq. (1). The code shown now uses npart instead of N. – user929304, May 5 at 12:13

3 Answers

Using only matrix operations makes this much faster (runtime scales quadratically with npart):



matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
  (Transpose[diff].diff)/npart];


For npart=10000 this will take a lot of memory though: the intermediate array diff will be huge.
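
As a rough estimate of that size (assuming packed machine-precision reals at 8 bytes each), diff has npart^2 rows of 3 components:

(* Back-of-the-envelope size of the intermediate array diff,
   assuming machine-precision reals (8 bytes each). *)
npart = 10000;
npart^2*3*8/2.^30   (* about 2.2 GiB for diff alone *)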




update



After the discussions on the other solutions, it occurred to me that what's being calculated here is simply the covariance matrix of the vectors in vecs:



matrixM = 2*(npart-1)*Covariance[vecs]


Much much faster, no space issues, no stability issues!
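
As a quick sanity check (a sketch on small, hypothetical test data; the real vecs is much larger), the Covariance form reproduces the direct double sum up to rounding:

(* Sanity check on small random data: the Covariance form should match
   the direct double sum up to rounding error. *)
npart = 200;
SeedRandom[1];
vecs = RandomReal[{-1, 1}, {npart, 3}];

direct = Table[(1./npart) Sum[(vecs[[i, m]] - vecs[[j, m]]) (vecs[[i, n]] - vecs[[j, n]]),
     {i, npart}, {j, npart}], {m, 3}, {n, 3}];

viaCov = 2 (npart - 1) Covariance[vecs];

Max[Abs[direct - viaCov]]   (* small, comparable to machine rounding error *)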






answered May 5 at 12:59 by Roman, last edited May 5 at 17:34












  • This is incredible Roman! I just compared the absolute timings for both approaches, and yours takes 34 times less time to compute! Would you kindly add a brief explanation for how you spotted that the double sum could be translated into a matrix product? In case such translation is not immediately apparent (where the expression is a bit more involved), do you know what was causing the main slow-down in my implementation? – user929304, May 5 at 13:21

  • @user929304 every sum over consecutive parts can be written as a dot product. It's a matter of patience and experience to figure out how exactly. As to what slows your implementation down: matrix-matrix dot products are one of the most efficient subroutines available on a computer, as they lie at the core of (almost) every algorithm and entire CPU architectures are designed around making them efficient. You cannot compete with that by writing down your own sum. See en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms – Roman, May 5 at 13:25

  • Very instructive, thanks a lot. – user929304, May 5 at 13:37

  • What an update! Great observation, I myself wasn't aware I was in fact calculating the covariance matrix! Suddenly the computation has become instantaneous (7000 times speed up for one example I just tried). How can it be performed this rapidly! – user929304, May 5 at 18:29

  • @CATrevillian When you want to write fast code, try to avoid doing anything "by hand", like looping, summing, etc., and try to think in terms of higher-order abstractions: vectors instead of elements, matrices instead of lists of vectors, dot-products instead of summations, geometric/unitary/orthogonal transformations instead of matrix multiplications, etc. This is the core of mathematics, and used to be even more important back in the days before computers. – Roman, May 6 at 8:33



















Your double sum can be written in terms of single sums.



$$M_{mn}=2\sum_{i=1}^N r_m^i r_n^i-\frac{2}{N}\sum_{i=1}^N r_m^i \sum_{i=1}^N r_n^i$$



A theoretically equivalent but more numerically stable version of the above (suggested by @Roman) is given by



$$M_{mn}=2\sum_{i=1}^N \left(r_m^i - \bar{r}_m\right)\left(r_n^i-\bar{r}_n\right)$$



where $\bar{r}_k$ is the mean of the $r_k^i$ values over $i$.



nparticles = 1000;
SeedRandom[12345];
r = RandomVariate[UniformDistribution[], {nparticles, 3}];

m = 1; n = 2;
AbsoluteTiming[
 Sum[(r[[i, m]] - r[[j, m]]) (r[[i, n]] - r[[j, n]])/nparticles,
  {i, nparticles}, {j, nparticles}]]
(* {3.4060337777777776`, -2.42222014120762`} *)

AbsoluteTiming[
 2 Total[(r[[All, m]] - Total[r[[All, m]]]/nparticles) (r[[All, n]] -
    Total[r[[All, n]]]/nparticles)]]
(* {0.00008335802469135802`, -2.4222201412076174`} *)


Using single sums is about 40,000 times faster for 1,000 particles.
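
The same mean-centering also gives the whole 3-by-3 matrix in a single matrix product (a sketch; r1 is the centered array suggested in the comments below, and r is the nparticles-by-3 array defined above):

(* Sketch: full 3x3 matrix from mean-centered data.
   Each column of r has its mean subtracted first. *)
r1 = Transpose[Transpose[r] - Mean[r]];
matrixM = 2*Transpose[r1].r1;

matrixM[[1, 2]]   (* reproduces the m = 1, n = 2 value computed above *)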






answered May 5 at 16:06 by JimB, last edited May 5 at 17:21








  • I was going to write down the same thing and then decided that I wasn't convinced of the numerical stability of the transformation for large lists (difference of products instead of product of differences). For very large $N$ it may be advisable to first shift the numbers $r_m^i$ such that their averages are zero. Nonetheless I'm still wary of this way of doing the sum. It's fast though, sure! – Roman, May 5 at 16:43

  • @Roman The single-sum approach can certainly be made more numerically stable. The resulting sum is pretty much the same as estimating a covariance, so using similar "robustifying" (if I can make up such a word) techniques (like standardizing the numbers with their respective means and standard deviations) would certainly be in order. – JimB, May 5 at 16:48

  • As a concrete example of my concern, using r = RandomVariate[UniformDistribution[], {nparticles, 3}] + 10^6 already gives the wrong answer with this fast method. You can fix this by first defining r1 = Transpose[Transpose[r] - Mean[r]] and then doing the fast method with r1 instead of r. – Roman, May 5 at 16:48

  • @Roman "Unsuspecting users" have kept me in business for quite a long while. However, your point is well taken. I'll modify the above answer with what you've suggested. – JimB, May 5 at 17:02

  • See the update on my answer: this robustification means using the built-in Covariance function! – Roman, May 5 at 17:34



















This is a nice application of low-rank factorization by Adaptive Cross Approximation. We require the first code block from this post, which defines the function ACACompression.



Now we can do this:



npart = 5000;
SeedRandom[1234];
vecs = RandomReal[{-1, 1}, {npart, 3}];

(* Roman's proposal *)
matrixM = With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
    (Transpose[diff].diff)/npart
    ]; // AbsoluteTiming // First

r = Transpose[vecs];
matrixMACA = Table[
    Block[{x, y, col, row},
     x = r[[i]];
     y = r[[j]];
     col = row = k \[Function] (x[[k]] - x) (y[[k]] - y)/npart;
     {U, V} = ACACompression[row, col];
     Total[U, 2].Total[V, 2]
     ],
    {i, 1, 3}, {j, 1, 3}]; // AbsoluteTiming // First

Max[Abs[matrixMACA - matrixM]]/Max[Abs[matrixM]]



16.8389



0.076707



6.0471*10^-15




Thus, utilizing ACACompression is about 200 times faster for 5000 points, returning the result correctly up to machine precision. For 100000 uniformly distributed points, it needs about 1 second. For 1000000 uniformly distributed points, it needs about 20 seconds (probably due to memory bottlenecks).



The efficiency and accuracy depend quite strongly on the distribution of points, but one does not require overly uniformly distributed point clouds in order to make this work.



When the points are not remarkably ill-poised, ACACompression will terminate with an almost exact low-rank factorization of the matrix that contains the summands of the double sum. For $N$ particles, ACACompression will need roughly $O(N)$ time and $O(N)$ memory for the factorization -- in contrast to Outer and the double Sums themselves, which need at least $O(N^2)$ time. More severely, Outer also needs $O(N^2)$ memory. This is why using Outer becomes infeasible quite quickly for increasing $N$.
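
One way to see the memory-scaling difference in practice (a sketch with a hypothetical problem size; exact byte counts depend on machine and version) is to compare the peak memory of the Outer-based route with the Covariance one-liner from the other answer:

(* Compare peak additional memory: Outer-based approach vs. Covariance.
   The first grows like npart^2, the second stays roughly O(npart). *)
npart = 2000;
SeedRandom[1];
vecs = RandomReal[{-1, 1}, {npart, 3}];

MaxMemoryUsed[
  With[{diff = Flatten[Outer[Subtract, vecs, vecs, 1], 1]},
   (Transpose[diff].diff)/npart]]

MaxMemoryUsed[2 (npart - 1) Covariance[vecs]]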






answered May 5 at 14:42 by Henrik Schumacher, last edited May 5 at 15:33








  • Many thanks Henrik! Excellent complement to the other answers; in particular, learning the ACA method will also be very useful for future similar computations. This whole post has been so incredibly instructive. – user929304, May 5 at 18:36

  • You're welcome! – Henrik Schumacher, May 5 at 18:40










