


Why is a `for` loop so much faster to count True values?

















I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):



def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))


In addition, I looked at using a list comprehension and list.count:



def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)


The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum. I would have expected the second one to be faster (based on, e.g., this answer), and it is what I would have recommended turning the former into if asked for a review. But it turns out it is the other way around. Testing it with some random numbers with an increasing number of digits (so the chance that any single digit is even is about 50%), I get the following timings:



[Plot: timings of the three functions for random numbers with an increasing number of digits]
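(For reference, here is one minimal way such a comparison could be timed with timeit; the exact benchmark harness isn't shown in the question, and make_number is just an illustrative helper, but the counting functions are the ones defined above.)

import random
import timeit

# Build a random test number with a given digit count, mirroring the setup
# described above (each digit has a ~50% chance of being even).
def make_number(digits):
    return int(''.join(random.choice('0123456789') for _ in range(digits)))

n = make_number(1000)
for func in (count_even_digits_spyr03_for,
             count_even_digits_spyr03_sum,
             count_even_digits_spyr03_list):
    print(func.__name__, timeit.timeit(lambda: func(n), number=1000))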



Why is the manual for loop so much faster? It's almost a factor of two faster than using sum. And since the built-in sum should itself be about five times faster than manually summing a list (as per the linked answer), that would make the gap effectively a factor of ten! Is the saving of only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?




Using an if as a filter like so:



def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")


improves the timing only to the same level as the list comprehension.




When extending the timings to larger numbers and normalizing to the for loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n) takes:



[Plot: timings normalized to the for loop timing, extended to very large numbers]





























  • You're summing a lot of zeroes; what if you move c in "02468" to the end of the list comprehension, as a filter, and make the value 1? – jonrsharpe, May 24 at 7:50

  • @jonrsharpe That improves the timing to be about the same as the list comprehension with list.count. – Graipher, May 24 at 7:54

  • If you profile your code, you will probably see that your timings are dominated by function calls, which are very expensive in Python. count_even_digits_spyr03_for has only one (str(n)), while all the others have two. For much larger n you should probably start seeing sum() with @jonrsharpe's edit take over. – norok2, May 24 at 8:04

  • @norok2 That's not true; all the functions have only one str(n) call in them. The statement after for ... in only gets executed once, even if it's a comprehension. – Markus Meskanen, May 24 at 9:52

  • @MarkusMeskanen There is the additional call to sum, I guess. – Graipher, May 24 at 9:56

















5 Answers














sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:



  • The use of a generator expression causes overhead for constantly pausing and resuming the generator.

  • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.

  • Adding booleans instead of ints prevents sum from using its integer fast path.


Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.



If we replace the genexp with a list comprehension:



In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....:
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....:
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop


we see an immediate speedup, at the cost of wasting a bunch of memory on a list.




If you look at your genexp version:



def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))


you'll see it has no if. It just throws booleans into sum. In contrast, your loop:



def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count


only adds anything if the digit is even.



If we change the f2 defined earlier to also incorporate an if, we see another speedup:



In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....:
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop


f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.




It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:



if (PyLong_CheckExact(item)) {


Once we make the final change, changing True to 1:



In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....:
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop


we see one last small speedup.
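(A quick illustration of why bool misses that fast path; this is an aside rather than part of the original answer.)

# bool is a subclass of int, so isinstance() succeeds, but its exact type is
# not int, which is what CPython's PyLong_CheckExact() fast-path check requires.
print(isinstance(True, int))   # True
print(type(True) is int)       # False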




So after all that, do we beat the explicit loop?



In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....:
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop


Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.



Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:



def f5(x):
    return len([1 for c in str(x) if c in '02468'])


Testing digits individually isn't the only way we can try to beat the loop, either. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:



def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')


Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:



>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
...
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
...
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
...
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
...
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

























  • Thanks for the nice answer! Yours was able to most concisely and precisely describe all the problems, so I chose it as the accepted one. The first point, the overhead of the generator expression, I probably would have figured out eventually; the missing if was the one I could see but deemed insufficient to explain the difference; but the fact that bool does not count as int here I definitely did not think could be a reason. – Graipher, May 25 at 8:28
































@MarkusMeskanen's answer has the right bits – function calls are slow, and both genexprs and listcomps are basically function calls.



Anyway, to be pragmatic:



Using str.count(c) is faster, and this related answer of mine about strpbrk() in Python could make things faster still.



def count_even_digits_spyr03_count(n):
    s = str(n)
    return sum(s.count(c) for c in "02468")


def count_even_digits_spyr03_count_unrolled(n):
    s = str(n)
    return s.count("0") + s.count("2") + s.count("4") + s.count("6") + s.count("8")


Results:



string length: 502
count_even_digits_spyr03_list 0.04157966522
count_even_digits_spyr03_sum 0.05678154459
count_even_digits_spyr03_for 0.036128606150000006
count_even_digits_spyr03_count 0.010441866129999991
count_even_digits_spyr03_count_unrolled 0.009662931009999999
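As a quick sanity check (an addition here, assuming the functions from the question and from this answer are defined), the str.count variants agree with the explicit loop:

n = 1234567890  # contains five even digits: 2, 4, 6, 8, 0

assert count_even_digits_spyr03_for(n) == 5
assert count_even_digits_spyr03_count(n) == 5
assert count_even_digits_spyr03_count_unrolled(n) == 5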




























  • Interesting, I would have thought that having to iterate over the string multiple times would ruin it, but it works until the strings get very large (see the updated question). Although for small numbers it is slower for me than the for loop. – Graipher, May 24 at 10:12

  • That's where you'd want something like strpbrk() – the glibc version is pretty darn fast at finding any given character from a set in a string. See the link in my answer :) (It does require an additional C extension though, but if you need the speed...) – AKX, May 24 at 10:17

  • If you don't mind I will add your code (the one in this answer; the one with strpbrk from glibc is probably overkill) to the timings in my answer to the question on Code Review. Unless you want to add it as a separate answer there? – Graipher, May 24 at 10:22

  • @Graipher Feel free to. :) – AKX, May 24 at 10:29
































If we use dis.dis(), we can see how the functions actually behave.
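(For reference, this is how the disassembly below can be reproduced, assuming the functions from the question are defined; the exact opcodes and offsets vary between Python versions.)

import dis

dis.dis(count_even_digits_spyr03_for)
dis.dis(count_even_digits_spyr03_sum)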



count_even_digits_spyr03_for():



  7           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (count)

  8           6 SETUP_LOOP              42 (to 51)
              9 LOAD_GLOBAL              0 (str)
             12 LOAD_GLOBAL              1 (n)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
        >>   19 FOR_ITER                28 (to 50)
             22 STORE_FAST               1 (c)

  9          25 LOAD_FAST                1 (c)
             28 LOAD_CONST               2 ('02468')
             31 COMPARE_OP               6 (in)
             34 POP_JUMP_IF_FALSE       19

 10          37 LOAD_FAST                0 (count)
             40 LOAD_CONST               3 (1)
             43 INPLACE_ADD
             44 STORE_FAST               0 (count)
             47 JUMP_ABSOLUTE           19
        >>   50 POP_BLOCK

 11     >>   51 LOAD_FAST                0 (count)
             54 RETURN_VALUE


We can see that there's only one function call: the call to str() at the beginning:



              9 LOAD_GLOBAL              0 (str)
            ...
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)


The rest of it is highly optimized code, using jumps, stores, and in-place adds.



As for count_even_digits_spyr03_sum():



 14           0 LOAD_GLOBAL              0 (sum)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x10dcc8c90, file "test.py", line 14>)
              6 LOAD_CONST               2 ('count2.<locals>.<genexpr>')
              9 MAKE_FUNCTION            0
             12 LOAD_GLOBAL              1 (str)
             15 LOAD_GLOBAL              2 (n)
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 GET_ITER
             22 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             25 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             28 RETURN_VALUE


While I can't perfectly explain the differences, we can clearly see that there are more function calls (probably sum() and in(?)), which make the code run much slower than executing the machine instructions directly.
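(One part this disassembly doesn't show, as the comment below notes, is the body of the generator expression itself, which lives in a nested code object. A sketch of how that can be inspected too, assuming the function from the question is defined; the co_consts index is an assumption and may differ between Python versions.)

import dis

# The generator expression is compiled into its own nested code object; it can
# be disassembled separately (typically at index 1, after the implicit None).
genexpr_code = count_even_digits_spyr03_sum.__code__.co_consts[1]
dis.dis(genexpr_code)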


























  • This is not a perfect answer that explains it in detail, as I'm not too familiar with the dis output myself. But I believe it shows the general idea: delegating the job to different functions is much slower than executing the code outright. Feel free to fill in any gaps if anyone is more knowledgeable on the topic. – Markus Meskanen, May 24 at 10:04

  • The dis output for the second function is only showing about a third of the story. Another third is in the genexp, which wasn't disassembled, and another third is in sum, which you can't disassemble because it's in C. The function calls aren't really the problem; the problem is constantly going in and out of the genexp stack frame. – user2357112, May 24 at 19:39
































There are a few differences that actually contribute to the observed performance gap. I aim to give a high-level overview of them, but try not to go too much into the low-level details or possible improvements. For the benchmarks I use my own package simple_benchmark.



Generators vs. for loops



Generators and generator expressions are syntactic sugar that can be used instead of writing iterator classes.



When you write a generator like:



def count_even(num):
s = str(num)
for c in s:
yield c in '02468'


Or a generator expression:



(c in '02468' for c in str(num))


That will be transformed (behind the scenes) into a state machine that is accessible through an iterator class. In the end it will be roughly equivalent to (although the actual code generated around a generator will be faster):



class Count:
    def __init__(self, num):
        self.str_num = iter(str(num))

    def __iter__(self):
        return self

    def __next__(self):
        c = next(self.str_num)
        return c in '02468'


So a generator will always have one additional layer of indirection. Advancing the generator (or generator expression, or iterator) means calling __next__ on the iterator produced by the generator, which itself calls __next__ on the object you actually want to iterate over. There is also some overhead because you need to create one additional "iterator instance". Typically these overheads are negligible if you do anything substantial in each iteration.



Just to provide an example of how much overhead a generator imposes compared to a manual loop:



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def iteration(it):
    for i in it:
        pass

@bench.add_function()
def generator(it):
    it = (item for item in it)
    for i in it:
        pass

@bench.add_arguments()
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

plt.figure()
result = bench.run()
result.plot()


[Plot: generator overhead vs. plain iteration]



Generators vs. List comprehensions



Generators have the advantage that they don't create a list; they "produce" the values one by one. So while a generator has the overhead of the "iterator class", it can save the memory needed for creating an intermediate list. It's a trade-off between speed (list comprehension) and memory (generators). This has been discussed in various posts around StackOverflow, so I don't want to go into much more detail here.



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def generator_expression(it):
    it = (item for item in it)
    for i in it:
        pass

@bench.add_function()
def list_comprehension(it):
    it = [item for item in it]
    for i in it:
        pass

@bench.add_arguments('size')
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, list(range(size))

plt.figure()
result = bench.run()
result.plot()


[Plot: generator expression vs. list comprehension]




sum should be faster than manual iteration



Yes, sum is indeed faster than an explicit for loop. Especially if you iterate over integers.



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def my_sum(it):
    sum_ = 0
    for i in it:
        sum_ += i
    return sum_

bench.add_function()(sum)

@bench.add_arguments()
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

plt.figure()
result = bench.run()
result.plot()


[Plot: built-in sum vs. manual summing loop]



String methods vs. Any kind of Python loop



The key to understanding the performance difference between string methods like str.count and loops (explicit or implicit) is that strings in Python are actually stored as values in an (internal) array. That means such a method doesn't call any __next__ methods; it can loop directly over the array, which is significantly faster. However, it also imposes a method lookup and a method call on the string, which is why it's slower for very short numbers.



Just to provide a small comparison of how long it takes to iterate over a string with a Python loop vs. how long it takes Python to iterate over the internal array directly:



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def string_iteration(s):
    # there is no "a" in the string, so this iterates over the whole string
    return 'a' in s

@bench.add_function()
def python_iteration(s):
    for c in s:
        pass

@bench.add_arguments('string length')
def argument_provider():
    for i in range(2, 20):
        size = 2**i
        yield size, '1'*size

plt.figure()
result = bench.run()
result.plot()


In this benchmark it's ~200 times faster to let Python iterate over the string at the C level (via the in operator) than to iterate over the string with a for loop.



[Plot: C-level string iteration vs. Python for loop over a string]



Why do all of them converge for large numbers?



This is actually because the number to string conversion will be dominant there. So for really huge numbers you're essentially just measuring how long it takes to convert that number to a string.
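(A quick way to convince yourself of this, added here as an aside rather than taken from the answer: time just the str(n) conversion against one of the full counting functions; the digit count below is an arbitrary illustrative choice.)

import timeit

# A 4000-digit number (kept below CPython 3.11+'s default int->str digit limit).
n = int('1234567890' * 400)

print('str(n) only:', timeit.timeit(lambda: str(n), number=1000))
print('full count :', timeit.timeit(
    lambda: sum(1 for c in str(n) if c in '02468'), number=1000))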



You'll see the difference if you compare the versions that take a number and convert it to a string with the ones that take an already-converted string (I use the functions from another answer here to illustrate that). On the left is the number benchmark and on the right is the benchmark that takes the strings; the y-axis is the same for both plots:

[Plot: left panel "Number", right panel "String" - timings of f1-f5, explicit_loop, and str for both input kinds]



As you can see, the benchmarks for the functions that take the string are significantly faster for large numbers than the ones that take a number and convert it to a string inside. This indicates that the string conversion is the "bottleneck" for large numbers. For convenience I also included a benchmark of just the string conversion in the left plot (which becomes significant/dominant for large numbers).



%matplotlib notebook

from simple_benchmark import BenchmarkBuilder
import matplotlib.pyplot as plt
import random

bench1 = BenchmarkBuilder()

@bench1.add_function()
def f1(x):
    return sum(c in '02468' for c in str(x))

@bench1.add_function()
def f2(x):
    return sum([c in '02468' for c in str(x)])

@bench1.add_function()
def f3(x):
    return sum([True for c in str(x) if c in '02468'])

@bench1.add_function()
def f4(x):
    return sum([1 for c in str(x) if c in '02468'])

@bench1.add_function()
def explicit_loop(x):
    count = 0
    for c in str(x):
        if c in '02468':
            count += 1
    return count

@bench1.add_function()
def f5(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

bench1.add_function()(str)

@bench1.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, int(''.join(str(random.randint(0, 9)) for _ in range(size))))


bench2 = BenchmarkBuilder()

@bench2.add_function()
def f1(x):
    return sum(c in '02468' for c in x)

@bench2.add_function()
def f2(x):
    return sum([c in '02468' for c in x])

@bench2.add_function()
def f3(x):
    return sum([True for c in x if c in '02468'])

@bench2.add_function()
def f4(x):
    return sum([1 for c in x if c in '02468'])

@bench2.add_function()
def explicit_loop(x):
    count = 0
    for c in x:
        if c in '02468':
            count += 1
    return count

@bench2.add_function()
def f5(x):
    return sum(x.count(c) for c in '02468')

@bench2.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, ''.join(str(random.randint(0, 9)) for _ in range(size)))

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
b1 = bench1.run()
b2 = bench2.run()
b1.plot(ax=ax1)
b2.plot(ax=ax2)
ax1.set_title('Number')
ax2.set_title('String')



















































All your functions contain an equal number of calls to str(n) (one each) and c in "02468" (one for every c in n). So I would like to simplify:



import timeit

num = ''.join(str(i % 10) for i in range(1, 10000001))

def count_simple_sum():
    return sum(1 for c in num)

def count_simple_for():
    count = 0
    for c in num:
        count += 1
    return count


print('For Loop Sum:', timeit.timeit(count_simple_for, number=10))
print('Built-in Sum:', timeit.timeit(count_simple_sum, number=10))


sum is still slower:

For Loop Sum: 2.8987821330083534
Built-in Sum: 3.245505138998851


The key difference between these two functions is that in count_simple_for you are iterating only through num with a pure for loop for c in num, but in count_simple_sum you are creating a generator object here (from @Markus Meskanen's answer with dis.dis):



 3 LOAD_CONST               1 (<code object <genexpr> at 0x10dcc8c90, file "test.py", line 14>)
 6 LOAD_CONST               2 ('count2.<locals>.<genexpr>')


sum is iterating over this generator object to sum the elements produced, and this generator is iterating over the elements in num to produce 1 for each element. Having one more iteration step is expensive because it requires calling generator.__next__() on each element, and these calls are wrapped in a try: ... except StopIteration: block, which also adds some overhead.
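(A minimal illustration, added here for clarity, of the iteration protocol sum() has to go through for each element of such a generator:)

gen = (1 for c in "24")   # a tiny generator like the one count_simple_sum builds

print(next(gen))   # 1 -- every element costs a __next__() call
print(next(gen))   # 1
try:
    next(gen)      # the exhausted generator signals the end...
except StopIteration:
    print("done")  # ...by raising StopIteration, which the consumer must handle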





























      Your Answer






      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f56288015%2fwhy-is-a-for-loop-so-much-faster-to-count-true-values%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      33














      sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:



      • The use of a generator expression causes overhead for constantly pausing and resuming the generator.

      • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.

      • Adding booleans instead of ints prevents sum from using its integer fast path.


      Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.



      If we replace the genexp with a list comprehension:



      In [66]: def f1(x):
      ....: return sum(c in '02468' for c in str(x))
      ....:
      In [67]: def f2(x):
      ....: return sum([c in '02468' for c in str(x)])
      ....:
      In [68]: x = int('1234567890'*50)
      In [69]: %timeit f1(x)
      10000 loops, best of 5: 52.2 µs per loop
      In [70]: %timeit f2(x)
      10000 loops, best of 5: 40.5 µs per loop


      we see an immediate speedup, at the cost of wasting a bunch of memory on a list.




      If you look at your genexp version:



      def count_even_digits_spyr03_sum(n):
      return sum(c in "02468" for c in str(n))


      you'll see it has no if. It just throws booleans into sum. In constrast, your loop:



      def count_even_digits_spyr03_for(n):
      count = 0
      for c in str(n):
      if c in "02468":
      count += 1
      return count


      only adds anything if the digit is even.



      If we change the f2 defined earlier to also incorporate an if, we see another speedup:



      In [71]: def f3(x):
      ....: return sum([True for c in str(x) if c in '02468'])
      ....:
      In [72]: %timeit f3(x)
      10000 loops, best of 5: 34.9 µs per loop


      f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.




      It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:



      if (PyLong_CheckExact(item)) {


      Once we make the final change, changing True to 1:



      In [73]: def f4(x):
      ....: return sum([1 for c in str(x) if c in '02468'])
      ....:
      In [74]: %timeit f4(x)
      10000 loops, best of 5: 33.3 µs per loop


      we see one last small speedup.




      So after all that, do we beat the explicit loop?



      In [75]: def explicit_loop(x):
      ....: count = 0
      ....: for c in str(x):
      ....: if c in '02468':
      ....: count += 1
      ....: return count
      ....:
      In [76]: %timeit explicit_loop(x)
      10000 loops, best of 5: 32.7 µs per loop


      Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.



      Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:



      def f5(x):
      return len([1 for c in str(x) if c in '02468'])


      Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:



      def f6(x):
      s = str(x)
      return sum(s.count(c) for c in '02468')


      Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:



      >>> import timeit
      >>> def f(x):
      ... return sum([1 for c in str(x) if c in '02468'])
      ...
      >>> def g(x):
      ... return len([1 for c in str(x) if c in '02468'])
      ...
      >>> def h(x):
      ... s = str(x)
      ... return sum(s.count(c) for c in '02468')
      ...
      >>> x = int('1234567890'*50)
      >>> timeit.timeit(lambda: f(x), number=10000)
      0.331528635986615
      >>> timeit.timeit(lambda: g(x), number=10000)
      0.30292080697836354
      >>> timeit.timeit(lambda: h(x), number=10000)
      0.15950968803372234
      >>> def explicit_loop(x):
      ... count = 0
      ... for c in str(x):
      ... if c in '02468':
      ... count += 1
      ... return count
      ...
      >>> timeit.timeit(lambda: explicit_loop(x), number=10000)
      0.3305045129964128





      share|improve this answer




















      • 2





        Thanks for the nice answer! Yours was able to most concisely and precisely describe all the problems, so I chose it as the accepted. The first point, the overhead of the generator expression, I probably would have figured out eventually, the non-existing if was the one I could see but deemed insufficient to explain the difference, but the bool not counting as int in this case I definitely did not think could be a reason.

        – Graipher
        May 25 at 8:28















      33














      sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:



      • The use of a generator expression causes overhead for constantly pausing and resuming the generator.

      • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.

      • Adding booleans instead of ints prevents sum from using its integer fast path.


      Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.



      If we replace the genexp with a list comprehension:



      In [66]: def f1(x):
      ....: return sum(c in '02468' for c in str(x))
      ....:
      In [67]: def f2(x):
      ....: return sum([c in '02468' for c in str(x)])
      ....:
      In [68]: x = int('1234567890'*50)
      In [69]: %timeit f1(x)
      10000 loops, best of 5: 52.2 µs per loop
      In [70]: %timeit f2(x)
      10000 loops, best of 5: 40.5 µs per loop


      we see an immediate speedup, at the cost of wasting a bunch of memory on a list.




      If you look at your genexp version:



      def count_even_digits_spyr03_sum(n):
      return sum(c in "02468" for c in str(n))


      you'll see it has no if. It just throws booleans into sum. In constrast, your loop:



      def count_even_digits_spyr03_for(n):
      count = 0
      for c in str(n):
      if c in "02468":
      count += 1
      return count


      only adds anything if the digit is even.



      If we change the f2 defined earlier to also incorporate an if, we see another speedup:



      In [71]: def f3(x):
      ....: return sum([True for c in str(x) if c in '02468'])
      ....:
      In [72]: %timeit f3(x)
      10000 loops, best of 5: 34.9 µs per loop


      f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.




      It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:



      if (PyLong_CheckExact(item)) {


      Once we make the final change, changing True to 1:



      In [73]: def f4(x):
      ....: return sum([1 for c in str(x) if c in '02468'])
      ....:
      In [74]: %timeit f4(x)
      10000 loops, best of 5: 33.3 µs per loop


      we see one last small speedup.




      So after all that, do we beat the explicit loop?



      In [75]: def explicit_loop(x):
      ....: count = 0
      ....: for c in str(x):
      ....: if c in '02468':
      ....: count += 1
      ....: return count
      ....:
      In [76]: %timeit explicit_loop(x)
      10000 loops, best of 5: 32.7 µs per loop


      Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.



      Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:



      def f5(x):
      return len([1 for c in str(x) if c in '02468'])


      Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:



      def f6(x):
      s = str(x)
      return sum(s.count(c) for c in '02468')


      Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:



      >>> import timeit
      >>> def f(x):
      ... return sum([1 for c in str(x) if c in '02468'])
      ...
      >>> def g(x):
      ... return len([1 for c in str(x) if c in '02468'])
      ...
      >>> def h(x):
      ... s = str(x)
      ... return sum(s.count(c) for c in '02468')
      ...
      >>> x = int('1234567890'*50)
      >>> timeit.timeit(lambda: f(x), number=10000)
      0.331528635986615
      >>> timeit.timeit(lambda: g(x), number=10000)
      0.30292080697836354
      >>> timeit.timeit(lambda: h(x), number=10000)
      0.15950968803372234
      >>> def explicit_loop(x):
      ... count = 0
      ... for c in str(x):
      ... if c in '02468':
      ... count += 1
      ... return count
      ...
      >>> timeit.timeit(lambda: explicit_loop(x), number=10000)
      0.3305045129964128





      share|improve this answer




















      • 2





        Thanks for the nice answer! Yours was able to most concisely and precisely describe all the problems, so I chose it as the accepted. The first point, the overhead of the generator expression, I probably would have figured out eventually, the non-existing if was the one I could see but deemed insufficient to explain the difference, but the bool not counting as int in this case I definitely did not think could be a reason.

        – Graipher
        May 25 at 8:28













      33












      33








      33







      sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:



      • The use of a generator expression causes overhead for constantly pausing and resuming the generator.

      • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.

      • Adding booleans instead of ints prevents sum from using its integer fast path.


      Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.



      If we replace the genexp with a list comprehension:



      In [66]: def f1(x):
      ....: return sum(c in '02468' for c in str(x))
      ....:
      In [67]: def f2(x):
      ....: return sum([c in '02468' for c in str(x)])
      ....:
      In [68]: x = int('1234567890'*50)
      In [69]: %timeit f1(x)
      10000 loops, best of 5: 52.2 µs per loop
      In [70]: %timeit f2(x)
      10000 loops, best of 5: 40.5 µs per loop


      we see an immediate speedup, at the cost of wasting a bunch of memory on a list.




      If you look at your genexp version:



      def count_even_digits_spyr03_sum(n):
      return sum(c in "02468" for c in str(n))


      you'll see it has no if. It just throws booleans into sum. In constrast, your loop:



      def count_even_digits_spyr03_for(n):
      count = 0
      for c in str(n):
      if c in "02468":
      count += 1
      return count


      only adds anything if the digit is even.



      If we change the f2 defined earlier to also incorporate an if, we see another speedup:



      In [71]: def f3(x):
      ....: return sum([True for c in str(x) if c in '02468'])
      ....:
      In [72]: %timeit f3(x)
      10000 loops, best of 5: 34.9 µs per loop


      f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.




      It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:



      if (PyLong_CheckExact(item)) {


      Once we make the final change, changing True to 1:



      In [73]: def f4(x):
      ....: return sum([1 for c in str(x) if c in '02468'])
      ....:
      In [74]: %timeit f4(x)
      10000 loops, best of 5: 33.3 µs per loop


      we see one last small speedup.




      So after all that, do we beat the explicit loop?



      In [75]: def explicit_loop(x):
      ....: count = 0
      ....: for c in str(x):
      ....: if c in '02468':
      ....: count += 1
      ....: return count
      ....:
      In [76]: %timeit explicit_loop(x)
      10000 loops, best of 5: 32.7 µs per loop


      Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.



      Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:



      def f5(x):
      return len([1 for c in str(x) if c in '02468'])


      Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:



      def f6(x):
      s = str(x)
      return sum(s.count(c) for c in '02468')


      Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:



      >>> import timeit
      >>> def f(x):
      ... return sum([1 for c in str(x) if c in '02468'])
      ...
      >>> def g(x):
      ... return len([1 for c in str(x) if c in '02468'])
      ...
      >>> def h(x):
      ... s = str(x)
      ... return sum(s.count(c) for c in '02468')
      ...
      >>> x = int('1234567890'*50)
      >>> timeit.timeit(lambda: f(x), number=10000)
      0.331528635986615
      >>> timeit.timeit(lambda: g(x), number=10000)
      0.30292080697836354
      >>> timeit.timeit(lambda: h(x), number=10000)
      0.15950968803372234
      >>> def explicit_loop(x):
      ... count = 0
      ... for c in str(x):
      ... if c in '02468':
      ... count += 1
      ... return count
      ...
      >>> timeit.timeit(lambda: explicit_loop(x), number=10000)
      0.3305045129964128





answered May 24 at 20:40 by user2357112
• Thanks for the nice answer! Yours was able to most concisely and precisely describe all the problems, so I chose it as the accepted. The first point, the overhead of the generator expression, I probably would have figured out eventually; the non-existent if was the one I could see but deemed insufficient to explain the difference; but the bool not counting as an int in this case I definitely did not think could be a reason.
  – Graipher, May 25 at 8:28












      @MarkusMeskanen's answer has the right bits – function calls are slow, and both genexprs and listcomps are basically function calls.



      Anyway, to be pragmatic:



      Using str.count(c) is faster, and this related answer of mine about strpbrk() in Python could make things faster still.



def count_even_digits_spyr03_count(n):
    s = str(n)
    return sum(s.count(c) for c in "02468")


def count_even_digits_spyr03_count_unrolled(n):
    s = str(n)
    return s.count("0") + s.count("2") + s.count("4") + s.count("6") + s.count("8")


      Results:



      string length: 502
      count_even_digits_spyr03_list 0.04157966522
      count_even_digits_spyr03_sum 0.05678154459
      count_even_digits_spyr03_for 0.036128606150000006
      count_even_digits_spyr03_count 0.010441866129999991
      count_even_digits_spyr03_count_unrolled 0.009662931009999999
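The harness that produced these numbers isn't shown here; a minimal timeit-based sketch (the 500-digit input and the loop below are assumptions, and the other functions named in the output would be timed the same way) could look like this:

import timeit

n = int("1234567890" * 50)  # 500 digits, roughly matching "string length: 502" above

for func in (count_even_digits_spyr03_count, count_even_digits_spyr03_count_unrolled):
    # timeit returns the total time in seconds for `number` calls
    print(func.__name__, timeit.timeit(lambda: func(n), number=10000))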





answered May 24 at 10:09 by AKX

• Interesting, I would have thought that having to iterate over the string multiple times would ruin it, but it works until the strings get very large (see updated question). Although for small numbers it is slower for me than the for loop.
  – Graipher, May 24 at 10:12

• That's where you'd want something like strpbrk() – the glibc version is pretty darn fast at finding any given character from a set in a string. See the link in my answer :) (It does require an additional C extension though, but if you need the speed...)
  – AKX, May 24 at 10:17

• If you don't mind I will add your code (the one in this answer, the one with strpbrk from glibc is probably overkill) to the timings in my answer to the question on Code Review. Unless you want to add it as a separate answer there?
  – Graipher, May 24 at 10:22

• @Graipher Feel free to. :)
  – AKX, May 24 at 10:29





























      If we use dis.dis(), we can see how the functions actually behave.
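For reference, the listings below can be reproduced with something like the following, assuming the two functions from the question are defined in the same session:

import dis

# Disassemble both implementations from the question to compare their bytecode.
dis.dis(count_even_digits_spyr03_for)
dis.dis(count_even_digits_spyr03_sum)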



      count_even_digits_spyr03_for():



       7 0 LOAD_CONST 1 (0)
      3 STORE_FAST 0 (count)

      8 6 SETUP_LOOP 42 (to 51)
      9 LOAD_GLOBAL 0 (str)
      12 LOAD_GLOBAL 1 (n)
      15 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
      18 GET_ITER
      >> 19 FOR_ITER 28 (to 50)
      22 STORE_FAST 1 (c)

      9 25 LOAD_FAST 1 (c)
      28 LOAD_CONST 2 ('02468')
      31 COMPARE_OP 6 (in)
      34 POP_JUMP_IF_FALSE 19

      10 37 LOAD_FAST 0 (count)
      40 LOAD_CONST 3 (1)
      43 INPLACE_ADD
      44 STORE_FAST 0 (count)
      47 JUMP_ABSOLUTE 19
      >> 50 POP_BLOCK

      11 >> 51 LOAD_FAST 0 (count)
      54 RETURN_VALUE


We can see that there's only one function call: the one to str() at the beginning:



      9 LOAD_GLOBAL 0 (str)
      ...
      15 CALL_FUNCTION 1 (1 positional, 0 keyword pair)


The rest of it is tight loop bytecode, using jumps, stores, and in-place adds.



As for count_even_digits_spyr03_sum():



       14 0 LOAD_GLOBAL 0 (sum)
      3 LOAD_CONST 1 (<code object <genexpr> at 0x10dcc8c90, file "test.py", line 14>)
      6 LOAD_CONST 2 ('count2.<locals>.<genexpr>')
      9 MAKE_FUNCTION 0
      12 LOAD_GLOBAL 1 (str)
      15 LOAD_GLOBAL 2 (n)
      18 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
      21 GET_ITER
      22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
      25 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
      28 RETURN_VALUE


While I can't perfectly explain all the differences, we can clearly see that there are more function calls: the three CALL_FUNCTION instructions are the calls to str(), to the generator expression's code object, and to sum(). That extra calling makes the code run much slower than executing the bytecode inline.






answered May 24 at 10:03 by Markus Meskanen

• This is not a perfect answer that would explain it in detail, as I'm not too familiar with the dis output myself. But I believe it shows the general idea: delegating the job to different functions is much slower than executing the code outright. Feel free to fill in any gaps if anyone is more knowledgeable on the topic.
  – Markus Meskanen, May 24 at 10:04

• The dis output for the second function is only showing about a third of the story. Another third is in the genexp, which wasn't disassembled, and another third is in sum, which you can't disassemble because it's in C. The function calls aren't really the problem; the problem is constantly going in and out of the genexp stack frame.
  – user2357112, May 24 at 19:39





























      There are a few differences that actually contribute to the observed performance differences. I aim to give a high-level overview of these differences but try not to go too much into the low-level details or possible improvements. For the benchmarks I use my own package simple_benchmark.



      Generators vs. for loops



      Generators and generator expressions are syntactic sugar that can be used instead of writing iterator classes.



      When you write a generator like:



def count_even(num):
    s = str(num)
    for c in s:
        yield c in '02468'


      Or a generator expression:



      (c in '02468' for c in str(num))


      That will be transformed (behind the scenes) into a state machine that is accessible through an iterator class. In the end it will be roughly equivalent to (although the actual code generated around a generator will be faster):



class Count:
    def __init__(self, num):
        self.str_num = iter(str(num))

    def __iter__(self):
        return self

    def __next__(self):
        c = next(self.str_num)
        return c in '02468'


So a generator always has one additional layer of indirection. Advancing the generator (or generator expression) means calling __next__ on the iterator created by the generator, which in turn calls __next__ on the object you actually want to iterate over. There is also some overhead because one additional "iterator instance" has to be created. Typically these overheads are negligible if you do anything substantial in each iteration.
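A tiny interactive illustration of that indirection (nothing beyond plain CPython is assumed here):

gen = (c in '02468' for c in '1234')   # genexp wrapping an iterator over the string
print(next(gen))   # False: resumes the generator frame, which calls next() on its own
print(next(gen))   # True:  string iterator and then evaluates `c in '02468'`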



Just to give an example of how much overhead a generator imposes compared to a manual loop:



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def iteration(it):
    for i in it:
        pass

@bench.add_function()
def generator(it):
    it = (item for item in it)
    for i in it:
        pass

@bench.add_arguments()
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

plt.figure()
result = bench.run()
result.plot()


[benchmark plot: plain iteration vs. iteration through a generator]



      Generators vs. List comprehensions



      Generators have the advantage that they don't create a list, they "produce" the values one-by-one. So while a generator has the overhead of the "iterator class" it can save the memory for creating an intermediate list. It's a trade-off between speed (list comprehension) and memory (generators). This has been discussed in various posts around StackOverflow so I don't want to go into much more detail here.
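The memory side of that trade-off is easy to see directly; a quick sketch (CPython, exact byte counts vary by version):

import sys

s = str(int('1234567890' * 50))

list_comp = [c in '02468' for c in s]   # materializes one bool per character
gen_exp = (c in '02468' for c in s)     # produces values lazily

# The list grows with the input; the generator object stays small no matter
# how many items it will eventually yield.
print(sys.getsizeof(list_comp))
print(sys.getsizeof(gen_exp))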



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def generator_expression(it):
    it = (item for item in it)
    for i in it:
        pass

@bench.add_function()
def list_comprehension(it):
    it = [item for item in it]
    for i in it:
        pass

@bench.add_arguments('size')
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, list(range(size))

plt.figure()
result = bench.run()
result.plot()


[benchmark plot: generator expression vs. list comprehension]




      sum should be faster than manual iteration



      Yes, sum is indeed faster than an explicit for loop. Especially if you iterate over integers.



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def my_sum(it):
    sum_ = 0
    for i in it:
        sum_ += i
    return sum_

bench.add_function()(sum)

@bench.add_arguments()
def argument_provider():
    for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

plt.figure()
result = bench.run()
result.plot()


[benchmark plot: built-in sum vs. manual summation loop]



      String methods vs. Any kind of Python loop



The key to understanding the performance difference between string methods like str.count and loops (explicit or implicit) is that strings in Python are actually stored as values in an (internal) array. That means such a method doesn't call any __next__ methods; it can loop directly over the array, which is significantly faster. However, it also imposes a method lookup and a method call on the string, which is why it's slower for very short numbers.



Just to provide a small comparison of how long it takes to iterate over a string with a Python for loop vs. how long it takes when Python scans the internal array directly:



import matplotlib.pyplot as plt
from simple_benchmark import BenchmarkBuilder
%matplotlib notebook

bench = BenchmarkBuilder()

@bench.add_function()
def string_iteration(s):
    # there is no "a" in the string, so this scans the whole string in C
    return 'a' in s

@bench.add_function()
def python_iteration(s):
    for c in s:
        pass

@bench.add_arguments('string length')
def argument_provider():
    for i in range(2, 20):
        size = 2**i
        yield size, '1'*size

plt.figure()
result = bench.run()
result.plot()


      In this benchmark it's ~200 times faster to let Python do the iteration over the string than to iterate over the string with a for loop.



[benchmark plot: C-level scan ('a' in s) vs. Python for loop over the string]



      Why do all of them converge for large numbers?



      This is actually because the number to string conversion will be dominant there. So for really huge numbers you're essentially just measuring how long it takes to convert that number to a string.
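A quick sanity check of that claim (just a sketch; the 5000-digit number is an arbitrary large example):

import timeit

x = int('1234567890' * 500)   # roughly 5000 digits

# The conversion alone already accounts for most of the time any of the
# counting functions above spend on an input this large.
print(timeit.timeit(lambda: str(x), number=1000))
print(timeit.timeit(lambda: sum(str(x).count(c) for c in '02468'), number=1000))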



You'll see the difference if you compare the versions that take a number and convert it to a string with the ones that take the already-converted string (I use the functions from another answer here to illustrate that). The left plot is the number benchmark and the right one is the benchmark that takes strings; the y-axis is the same for both plots:
[benchmark plots: functions taking a number (left) vs. functions taking a pre-converted string (right)]



As you can see, the benchmarks for the functions that take the string are significantly faster for large numbers than the ones that take a number and convert it to a string inside. This indicates that the string conversion is the "bottleneck" for large numbers. For convenience I also included a benchmark that only does the string conversion in the left plot (it becomes significant/dominant for large numbers).



%matplotlib notebook

from simple_benchmark import BenchmarkBuilder
import matplotlib.pyplot as plt
import random

bench1 = BenchmarkBuilder()

@bench1.add_function()
def f1(x):
    return sum(c in '02468' for c in str(x))

@bench1.add_function()
def f2(x):
    return sum([c in '02468' for c in str(x)])

@bench1.add_function()
def f3(x):
    return sum([True for c in str(x) if c in '02468'])

@bench1.add_function()
def f4(x):
    return sum([1 for c in str(x) if c in '02468'])

@bench1.add_function()
def explicit_loop(x):
    count = 0
    for c in str(x):
        if c in '02468':
            count += 1
    return count

@bench1.add_function()
def f5(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

bench1.add_function()(str)

@bench1.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, int(''.join(str(random.randint(0, 9)) for _ in range(size))))


bench2 = BenchmarkBuilder()

@bench2.add_function()
def f1(x):
    return sum(c in '02468' for c in x)

@bench2.add_function()
def f2(x):
    return sum([c in '02468' for c in x])

@bench2.add_function()
def f3(x):
    return sum([True for c in x if c in '02468'])

@bench2.add_function()
def f4(x):
    return sum([1 for c in x if c in '02468'])

@bench2.add_function()
def explicit_loop(x):
    count = 0
    for c in x:
        if c in '02468':
            count += 1
    return count

@bench2.add_function()
def f5(x):
    return sum(x.count(c) for c in '02468')

@bench2.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, ''.join(str(random.randint(0, 9)) for _ in range(size)))

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
b1 = bench1.run()
b2 = bench2.run()
b1.plot(ax=ax1)
b2.plot(ax=ax2)
ax1.set_title('Number')
ax2.set_title('String')





      share|improve this answer





























        4














        There are a few differences that actually contribute to the observed performance differences. I aim to give a high-level overview of these differences but try not to go too much into the low-level details or possible improvements. For the benchmarks I use my own package simple_benchmark.



        Generators vs. for loops



        Generators and generator expressions are syntactic sugar that can be used instead of writing iterator classes.



        When you write a generator like:



        def count_even(num):
        s = str(num)
        for c in s:
        yield c in '02468'


        Or a generator expression:



        (c in '02468' for c in str(num))


        That will be transformed (behind the scenes) into a state machine that is accessible through an iterator class. In the end it will be roughly equivalent to (although the actual code generated around a generator will be faster):



        class Count:
        def __init__(self, num):
        self.str_num = iter(str(num))

        def __iter__(self):
        return self

        def __next__(self):
        c = next(self.str_num)
        return c in '02468'


        So a generator will always have one additional layer of indirection. That means that advancing the generator (or generator expression or iterator) means that you call __next__ on the iterator that is generated by the generator which itself calls __next__ on the object you actually want to iterate over. But it also has some overhead because you actually need to create one additional "iterator instance". Typically these overheads are negligible if you do anything substantial in each iteration.



        Just to provide an example how much overhead a generator imposes compared to a manual loop:



        import matplotlib.pyplot as plt
        from simple_benchmark import BenchmarkBuilder
        %matplotlib notebook

        bench = BenchmarkBuilder()

        @bench.add_function()
        def iteration(it):
        for i in it:
        pass

        @bench.add_function()
        def generator(it):
        it = (item for item in it)
        for i in it:
        pass

        @bench.add_arguments()
        def argument_provider():
        for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

        plt.figure()
        result = bench.run()
        result.plot()


        enter image description here



        Generators vs. List comprehensions



        Generators have the advantage that they don't create a list, they "produce" the values one-by-one. So while a generator has the overhead of the "iterator class" it can save the memory for creating an intermediate list. It's a trade-off between speed (list comprehension) and memory (generators). This has been discussed in various posts around StackOverflow so I don't want to go into much more detail here.



        import matplotlib.pyplot as plt
        from simple_benchmark import BenchmarkBuilder
        %matplotlib notebook

        bench = BenchmarkBuilder()

        @bench.add_function()
        def generator_expression(it):
        it = (item for item in it)
        for i in it:
        pass

        @bench.add_function()
        def list_comprehension(it):
        it = [item for item in it]
        for i in it:
        pass

        @bench.add_arguments('size')
        def argument_provider():
        for i in range(2, 15):
        size = 2**i
        yield size, list(range(size))

        plt.figure()
        result = bench.run()
        result.plot()


        enter image description here




        sum should be faster than manual iteration



        Yes, sum is indeed faster than an explicit for loop. Especially if you iterate over integers.



        import matplotlib.pyplot as plt
        from simple_benchmark import BenchmarkBuilder
        %matplotlib notebook

        bench = BenchmarkBuilder()

        @bench.add_function()
        def my_sum(it):
        sum_ = 0
        for i in it:
        sum_ += i
        return sum_

        bench.add_function()(sum)

        @bench.add_arguments()
        def argument_provider():
        for i in range(2, 15):
        size = 2**i
        yield size, [1 for _ in range(size)]

        plt.figure()
        result = bench.run()
        result.plot()


        enter image description here



        String methods vs. Any kind of Python loop



        To understand the performance difference when using string methods like str.count compared to loops (explicit or implicit) is that strings in Python are actually stored as values in an (internal) array. That means a loop doesn't actually call any __next__ methods, it can use a loop directly over the array, this will be significantly faster. However it also imposes a method lookup and a method call on the string, that's why it's slower for very short numbers.



        Just to provide a small comparison how long it takes to iterate a string vs. how long it takes Python to iterate over the internal array:



        import matplotlib.pyplot as plt
        from simple_benchmark import BenchmarkBuilder
        %matplotlib notebook

        bench = BenchmarkBuilder()

        @bench.add_function()
        def string_iteration(s):
        # there is no "a" in the string, so this iterates over the whole string
        return 'a' in s

        @bench.add_function()
        def python_iteration(s):
        for c in s:
        pass

        @bench.add_arguments('string length')
        def argument_provider():
        for i in range(2, 20):
        size = 2**i
        yield size, '1'*size

        plt.figure()
        result = bench.run()
        result.plot()


        In this benchmark it's ~200 times faster to let Python do the iteration over the string than to iterate over the string with a for loop.



        enter image description here



        Why do all of them converge for large numbers?



        This is actually because the number to string conversion will be dominant there. So for really huge numbers you're essentially just measuring how long it takes to convert that number to a string.



        You'll see the difference if you compare the versions that take a number and convert it to a string with the one that take the converted number (I use the functions from another answer here to illustrate that). Left is the number-benchmark and on the right is the benchmark that takes the strings - also the y-axis is the same for both plots:
        enter image description here



        As you can see the benchmarks for the functions that take the string are significantly faster for large numbers than the ones that take a number and convert them to a string inside. This indicates that the string-conversion is the "bottleneck" for large numbers. For convenience I also included a benchmark only doing the string conversion to the left plot (which becomes significant/dominant for large numbers).



        %matplotlib notebook

        from simple_benchmark import BenchmarkBuilder
        import matplotlib.pyplot as plt
        import random

        bench1 = BenchmarkBuilder()

        @bench1.add_function()
        def f1(x):
        return sum(c in '02468' for c in str(x))

        @bench1.add_function()
        def f2(x):
        return sum([c in '02468' for c in str(x)])

        @bench1.add_function()
        def f3(x):
        return sum([True for c in str(x) if c in '02468'])

        @bench1.add_function()
        def f4(x):
        return sum([1 for c in str(x) if c in '02468'])

        @bench1.add_function()
        def explicit_loop(x):
        count = 0
        for c in str(x):
        if c in '02468':
        count += 1
        return count

        @bench1.add_function()
        def f5(x):
        s = str(x)
        return sum(s.count(c) for c in '02468')

        bench1.add_function()(str)

        @bench1.add_arguments(name='number length')
        def arg_provider():
        for i in range(2, 15):
        size = 2 ** i
        yield (2**i, int(''.join(str(random.randint(0, 9)) for _ in range(size))))


        bench2 = BenchmarkBuilder()

        @bench2.add_function()
        def f1(x):
        return sum(c in '02468' for c in x)

        @bench2.add_function()
        def f2(x):
        return sum([c in '02468' for c in x])

        @bench2.add_function()
        def f3(x):
        return sum([True for c in x if c in '02468'])

        @bench2.add_function()
        def f4(x):
        return sum([1 for c in x if c in '02468'])

        @bench2.add_function()
        def explicit_loop(x):
        count = 0
        for c in x:
        if c in '02468':
        count += 1
        return count

        @bench2.add_function()
        def f5(x):
        return sum(x.count(c) for c in '02468')

        @bench2.add_arguments(name='number length')
        def arg_provider():
        for i in range(2, 15):
        size = 2 ** i
        yield (2**i, ''.join(str(random.randint(0, 9)) for _ in range(size)))

        f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
        b1 = bench1.run()
        b2 = bench2.run()
        b1.plot(ax=ax1)
        b2.plot(ax=ax2)
        ax1.set_title('Number')
        ax2.set_title('String')





        share|improve this answer



























          4












          4








          4







          There are a few differences that actually contribute to the observed performance differences. I aim to give a high-level overview of these differences but try not to go too much into the low-level details or possible improvements. For the benchmarks I use my own package simple_benchmark.



          Generators vs. for loops



          Generators and generator expressions are syntactic sugar that can be used instead of writing iterator classes.



          When you write a generator like:



          def count_even(num):
          s = str(num)
          for c in s:
          yield c in '02468'


          Or a generator expression:



          (c in '02468' for c in str(num))


          That will be transformed (behind the scenes) into a state machine that is accessible through an iterator class. In the end it will be roughly equivalent to (although the actual code generated around a generator will be faster):



          class Count:
          def __init__(self, num):
          self.str_num = iter(str(num))

          def __iter__(self):
          return self

          def __next__(self):
          c = next(self.str_num)
          return c in '02468'


          So a generator will always have one additional layer of indirection. That means that advancing the generator (or generator expression or iterator) means that you call __next__ on the iterator that is generated by the generator which itself calls __next__ on the object you actually want to iterate over. But it also has some overhead because you actually need to create one additional "iterator instance". Typically these overheads are negligible if you do anything substantial in each iteration.



          Just to provide an example how much overhead a generator imposes compared to a manual loop:



          import matplotlib.pyplot as plt
          from simple_benchmark import BenchmarkBuilder
          %matplotlib notebook

          bench = BenchmarkBuilder()

          @bench.add_function()
          def iteration(it):
          for i in it:
          pass

          @bench.add_function()
          def generator(it):
          it = (item for item in it)
          for i in it:
          pass

          @bench.add_arguments()
          def argument_provider():
          for i in range(2, 15):
          size = 2**i
          yield size, [1 for _ in range(size)]

          plt.figure()
          result = bench.run()
          result.plot()


          enter image description here



          Generators vs. List comprehensions



          Generators have the advantage that they don't create a list, they "produce" the values one-by-one. So while a generator has the overhead of the "iterator class" it can save the memory for creating an intermediate list. It's a trade-off between speed (list comprehension) and memory (generators). This has been discussed in various posts around StackOverflow so I don't want to go into much more detail here.



          import matplotlib.pyplot as plt
          from simple_benchmark import BenchmarkBuilder
          %matplotlib notebook

          bench = BenchmarkBuilder()

          @bench.add_function()
          def generator_expression(it):
          it = (item for item in it)
          for i in it:
          pass

          @bench.add_function()
          def list_comprehension(it):
          it = [item for item in it]
          for i in it:
          pass

          @bench.add_arguments('size')
          def argument_provider():
          for i in range(2, 15):
          size = 2**i
          yield size, list(range(size))

          plt.figure()
          result = bench.run()
          result.plot()


          enter image description here




          sum should be faster than manual iteration



          Yes, sum is indeed faster than an explicit for loop. Especially if you iterate over integers.



          import matplotlib.pyplot as plt
          from simple_benchmark import BenchmarkBuilder
          %matplotlib notebook

          bench = BenchmarkBuilder()

          @bench.add_function()
          def my_sum(it):
          sum_ = 0
          for i in it:
          sum_ += i
          return sum_

          bench.add_function()(sum)

          @bench.add_arguments()
          def argument_provider():
          for i in range(2, 15):
          size = 2**i
          yield size, [1 for _ in range(size)]

          plt.figure()
          result = bench.run()
          result.plot()


          enter image description here



          String methods vs. Any kind of Python loop



          To understand the performance difference when using string methods like str.count compared to loops (explicit or implicit) is that strings in Python are actually stored as values in an (internal) array. That means a loop doesn't actually call any __next__ methods, it can use a loop directly over the array, this will be significantly faster. However it also imposes a method lookup and a method call on the string, that's why it's slower for very short numbers.



          Just to provide a small comparison how long it takes to iterate a string vs. how long it takes Python to iterate over the internal array:



          import matplotlib.pyplot as plt
          from simple_benchmark import BenchmarkBuilder
          %matplotlib notebook

          bench = BenchmarkBuilder()

          @bench.add_function()
          def string_iteration(s):
          # there is no "a" in the string, so this iterates over the whole string
          return 'a' in s

          @bench.add_function()
          def python_iteration(s):
          for c in s:
          pass

          @bench.add_arguments('string length')
          def argument_provider():
          for i in range(2, 20):
          size = 2**i
          yield size, '1'*size

          plt.figure()
          result = bench.run()
          result.plot()


          In this benchmark it's ~200 times faster to let Python do the iteration over the string than to iterate over the string with a for loop.



          enter image description here



          Why do all of them converge for large numbers?



          This is actually because the number to string conversion will be dominant there. So for really huge numbers you're essentially just measuring how long it takes to convert that number to a string.



          You'll see the difference if you compare the versions that take a number and convert it to a string with the one that take the converted number (I use the functions from another answer here to illustrate that). Left is the number-benchmark and on the right is the benchmark that takes the strings - also the y-axis is the same for both plots:
          enter image description here



          As you can see the benchmarks for the functions that take the string are significantly faster for large numbers than the ones that take a number and convert them to a string inside. This indicates that the string-conversion is the "bottleneck" for large numbers. For convenience I also included a benchmark only doing the string conversion to the left plot (which becomes significant/dominant for large numbers).



%matplotlib notebook

from simple_benchmark import BenchmarkBuilder
import matplotlib.pyplot as plt
import random

bench1 = BenchmarkBuilder()

@bench1.add_function()
def f1(x):
    return sum(c in '02468' for c in str(x))

@bench1.add_function()
def f2(x):
    return sum([c in '02468' for c in str(x)])

@bench1.add_function()
def f3(x):
    return sum([True for c in str(x) if c in '02468'])

@bench1.add_function()
def f4(x):
    return sum([1 for c in str(x) if c in '02468'])

@bench1.add_function()
def explicit_loop(x):
    count = 0
    for c in str(x):
        if c in '02468':
            count += 1
    return count

@bench1.add_function()
def f5(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

bench1.add_function()(str)

@bench1.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, int(''.join(str(random.randint(0, 9)) for _ in range(size))))


bench2 = BenchmarkBuilder()

@bench2.add_function()
def f1(x):
    return sum(c in '02468' for c in x)

@bench2.add_function()
def f2(x):
    return sum([c in '02468' for c in x])

@bench2.add_function()
def f3(x):
    return sum([True for c in x if c in '02468'])

@bench2.add_function()
def f4(x):
    return sum([1 for c in x if c in '02468'])

@bench2.add_function()
def explicit_loop(x):
    count = 0
    for c in x:
        if c in '02468':
            count += 1
    return count

@bench2.add_function()
def f5(x):
    return sum(x.count(c) for c in '02468')

@bench2.add_arguments(name='number length')
def arg_provider():
    for i in range(2, 15):
        size = 2 ** i
        yield (2**i, ''.join(str(random.randint(0, 9)) for _ in range(size)))

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
b1 = bench1.run()
b2 = bench2.run()
b1.plot(ax=ax1)
b2.plot(ax=ax2)
ax1.set_title('Number')
ax2.set_title('String')
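
To see the conversion cost in isolation, here is a minimal sketch using only the standard-library timeit module. The 4000-digit number and the repeat count are arbitrary illustration choices (not from the original benchmark), and absolute timings will vary by machine and CPython version:

import timeit

# Hypothetical quick check: measure the int -> str conversion on its own
# versus conversion plus counting, so the conversion's share is visible.
n = int('7' * 4000)  # kept below CPython 3.11+'s default int<->str digit limit

to_string = timeit.timeit(lambda: str(n), number=100)
convert_and_count = timeit.timeit(
    lambda: sum(c in '02468' for c in str(n)), number=100)

print(f'str(n) only       : {to_string:.4f}s')
print(f'str(n) + counting : {convert_and_count:.4f}s')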





answered May 24 at 22:30 (edited May 24 at 23:00) by MSeifert



































All your functions contain an equal number of calls to str(n) (one call) and c in "02468" (one for every c in n). Given that, I would like to simplify:



import timeit

num = ''.join(str(i % 10) for i in range(1, 10000001))

def count_simple_sum():
    return sum(1 for c in num)

def count_simple_for():
    count = 0
    for c in num:
        count += 1
    return count


print('For Loop Sum:', timeit.timeit(count_simple_for, number=10))
print('Built-in Sum:', timeit.timeit(count_simple_sum, number=10))


              sum is still slower:



              For Loop Sum: 2.8987821330083534
              Built-in Sum: 3.245505138998851


The key difference between these two functions is that in count_simple_for you are iterating only through num with a plain for loop (for c in num), whereas in count_simple_sum you are creating a generator object here (from @Markus Meskanen's answer, via dis.dis):



               3 LOAD_CONST 1 (<code object <genexpr> at 0x10dcc8c90, file "test.py", line 14>)
              6 LOAD_CONST 2 ('count2.<locals>.<genexpr>')


sum iterates over this generator object to add up the elements it produces, and the generator in turn iterates over the elements of num to produce 1 for each element. Having one more iteration step is expensive because it requires calling generator.__next__() for every element, and those calls are wrapped in a try: ... except StopIteration: block, which also adds some overhead.
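
For reference, a minimal sketch of how to reproduce the disassembly above yourself with the standard-library dis module; the short num string here is just a stand-in so the snippet runs on its own:

import dis

num = '1234567890'  # small stand-in input; the bytecode shape is the same

def count_simple_sum():
    # sum() drives a separate generator code object via __next__ per element
    return sum(1 for c in num)

def count_simple_for():
    # the loop body runs directly inside this function's own frame
    count = 0
    for c in num:
        count += 1
    return count

dis.dis(count_simple_sum)
dis.dis(count_simple_for)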






answered May 24 at 10:56 by Sanyash


























