What is the rationale of making subtraction of two pointers not related to the same array undefined behavior? [duplicate]
This question already has an answer here:
What is the rationale for limitations on pointer arithmetic or comparison?
7 answers
According to the C++ draft, [expr.add], when you subtract pointers of the same type but not belonging to the same array, the behavior is undefined (emphasis is mine):
When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).
(5.1) If P and Q both evaluate to null pointer values, the result is 0.
(5.2) Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.
(5.3) Otherwise, the behavior is undefined.
[ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
What is the rationale for making such behavior undefined instead of, for instance, implementation-defined?
c++ language-lawyer pointer-arithmetic
marked as duplicate by xskxzr, Blaze, M.M
May 8 at 14:07
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
What meaning would the resulting value have?
– Blaze
May 8 at 8:14
I don't think there would be much difference if it was implementation-defined; you would probably have to read in your compiler's documentation that it is undefined ;)
– formerlyknownas_463035818
May 8 at 8:15
@αλεχολυτ What I'm getting at is that the resulting value is nonsensical; there's no way to use it properly. Instead of forcing the compiler to generate a nonsense value, the standard just says that it's UB, allowing the compiler to salvage that situation however it wants, possibly by not even doing the subtraction and thus saving time. It could just optimize away the line, and the value would be whatever was in memory to begin with; the result would be just as useless. In general, leaving things up to the implementation leaves room for optimization.
– Blaze
May 8 at 8:26
What if the objects are in different memory segments? There's no meaningful "difference" then.
– melpomene
May 8 at 8:34
@Blaze Assuming a linear memory layout, the resulting value isn't entirely nonsensical. I've seen code that actually relies on pointer arithmetic across separate arrays. For example, setting d = p - q and later assuming that q + d yields p.
– nwellnhof
May 8 at 8:35
edited May 8 at 14:02 by Boann
asked May 8 at 8:11 by αλεχολυτ
3 Answers
Speaking more academically: pointers are not numbers. They are pointers.
It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).
But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.
This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations to imbuing further meaning to practical, low-level computer properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?) it just says that the result is undefined.
In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.
In addition to iterating over the objects in an array, you can iterate over the byte representation of an object.
– αλεχολυτ
May 8 at 14:20
@αλεχολυτ Yep, by treating the object as an array of unsigned char ;) (or similar)
– Lightness Races in Orbit
May 8 at 14:21
As far as I know, in that case there are technically no objects except the big one.
– αλεχολυτ
May 8 at 14:37
@αλεχολυτ Each unsigned char (or similar) is effectively an object.
– Lightness Races in Orbit
May 8 at 14:39
Are you sure? Here is definition of object in C++.
– αλεχολυτ
May 8 at 14:49
As noted by some in the comments, unless the resulting value has some meaning or is usable in some way, there is no point in making the behavior defined.
There has been a study done for the C language to answer questions related to pointer provenance (with an intention to propose wording changes to the C specification), and one of the questions was:
Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first? (source)
The conclusions of the authors of the study were published in a paper titled Exploring C Semantics and Pointer Provenance, and with respect to this particular question, the answer was:
Inter-object pointer arithmetic
The first example in this section relied on guessing (and then checking) the offset between two allocations. What if one instead calculates the offset, with pointer subtraction; should that let one move between objects, as below?
// pointer_offset_from_ptr_subtraction_global_xy.c
#include <stdio.h>
#include <string.h>
#include <stddef.h>
int x=1, y=2;
int main() {
  int *p = &x;
  int *q = &y;
  ptrdiff_t offset = q - p;
  int *r = p + offset;
  if (memcmp(&r, &q, sizeof(r)) == 0) {
    *r = 11; // is this free of UB?
    printf("y=%d *q=%d *r=%d\n", y, *q, *r);
  }
}
In ISO C11, the q-p is UB (as a pointer subtraction between pointers to different objects, which in some abstract-machine executions are not one-past-related). In a variant semantics that allows construction of more-than-one-past pointers, one would have to choose whether the *r=11 access is UB or not. The basic provenance semantics will forbid it, because r will retain the provenance of the x allocation, but its address is not in bounds for that. This is probably the most desirable semantics: we have found very few example idioms that intentionally use inter-object pointer arithmetic, and the freedom that forbidding it gives to alias analysis and optimisation seems significant.
This study was picked up by the C++ community, summarized, and sent to WG21 (the C++ Standards Committee) for feedback.
Relevant point of the Summary:
Pointer difference is only defined for pointers with the same provenance and within the same array.
So, they have decided to keep it undefined for now.
Note that there is a study group SG12 within the C++ Standards Committee for studying Undefined Behavior & Vulnerabilities. This group conducts a systematic review to catalog cases of vulnerabilities and undefined/unspecified behavior in the standard, and recommend a coherent set of changes to define and/or specify the behavior. You can keep track of the proceedings of this group to see if there are going to be any changes in the future to the behaviors that are currently undefined or unspecified.
First see this question mentioned in the comments for why it isn't well defined. The answer given concisely is that arbitrary pointer arithmetic is not possible in segmented memory models used by some (now archaic?) systems.
What is the rationale to make such behavior undefined instead of, for instance, implementation defined?
Whenever the standard specifies something as undefined behaviour, it usually could be specified merely as implementation-defined instead. So, why specify anything as undefined?
Well, undefined behaviour is more lenient. In particular, being allowed to assume that there is no undefined behaviour, a compiler may perform optimisations that would break the program if the assumptions weren't correct. So, a reason to specify undefined behaviour is optimisation.
Let's consider a function fun(int* arr1, int* arr2) that takes two pointers as arguments. Those pointers could point to the same array, or not. Let's say the function iterates through one of the pointed-to arrays (arr1 + n), and must compare each position to the other pointer for equality ((arr1 + n) != arr2) in each iteration, for example to ensure that the pointed-to object is not overwritten.
Let's say that we call the function like this: fun(array1, array2). The compiler knows that (array1 + n) != array2, because otherwise the behaviour is undefined. Therefore, if the function call is expanded inline, the compiler can remove the redundant check (arr1 + n) != arr2, which is always true. If pointer arithmetic across array boundaries were well (or even implementation-) defined, then (array1 + n) == array2 could be true for some n, and this optimisation would be impossible - unless the compiler can prove that (array1 + n) != array2 holds for all possible values of n, which can sometimes be more difficult to prove.
Pointer arithmetic across members of a class could be implemented even in segmented memory models. The same goes for iterating across the boundaries of a subarray. There are use cases where these could be quite useful, but they are technically UB.
An argument for UB in these cases is more possibilities for UB optimisation. You don't necessarily need to agree that this is a sufficient argument.
Ah, I'm confusing the rules for ordering pointers. == and != are well defined for pointers to objects of the same type (or void *).
– Caleth
May 8 at 10:55
@Caleth Cool. That's what I remembered :) The relational operators aren't themselves UB either (at least in the latest draft). It's just that the order is unspecified, so they don't impose a strict ordering, which may lead to violation of some preconditions.
– eerorika
May 8 at 10:59
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Speaking more academically: pointers are not numbers. They are pointers.
It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).
But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.
This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations to imbuing further meaning to practical, low-level computer properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?) it just says that the result is undefined.
In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.
In additinal to iterate over objects in array you can iterate on byte-representation over object.
– αλεχολυτ
May 8 at 14:20
@αλεχολυτ Yep, by treating the object as an array ofunsigned char
;) (or similar)
– Lightness Races in Orbit
May 8 at 14:21
As far as I know in that case there's technically no objects except big one.
– αλεχολυτ
May 8 at 14:37
@αλεχολυτ Eachunsigned char
(or similar) is effectively an object.
– Lightness Races in Orbit
May 8 at 14:39
Are you sure? Here is definition of object in C++.
– αλεχολυτ
May 8 at 14:49
|
show 2 more comments
Speaking more academically: pointers are not numbers. They are pointers.
It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).
But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.
This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations to imbuing further meaning to practical, low-level computer properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?) it just says that the result is undefined.
In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.
In additinal to iterate over objects in array you can iterate on byte-representation over object.
– αλεχολυτ
May 8 at 14:20
@αλεχολυτ Yep, by treating the object as an array ofunsigned char
;) (or similar)
– Lightness Races in Orbit
May 8 at 14:21
As far as I know in that case there's technically no objects except big one.
– αλεχολυτ
May 8 at 14:37
@αλεχολυτ Eachunsigned char
(or similar) is effectively an object.
– Lightness Races in Orbit
May 8 at 14:39
Are you sure? Here is definition of object in C++.
– αλεχολυτ
May 8 at 14:49
|
show 2 more comments
Speaking more academically: pointers are not numbers. They are pointers.
It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).
But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.
This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations to imbuing further meaning to practical, low-level computer properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?) it just says that the result is undefined.
In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.
Speaking more academically: pointers are not numbers. They are pointers.
It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).
But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.
This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations to imbuing further meaning to practical, low-level computer properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?) it just says that the result is undefined.
In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.
answered May 8 at 10:49
Lightness Races in OrbitLightness Races in Orbit
299k56485832
299k56485832
In additinal to iterate over objects in array you can iterate on byte-representation over object.
– αλεχολυτ
May 8 at 14:20
@αλεχολυτ Yep, by treating the object as an array of unsigned char ;) (or similar)
– Lightness Races in Orbit
May 8 at 14:21
As far as I know, in that case there are technically no objects except the big one.
– αλεχολυτ
May 8 at 14:37
@αλεχολυτ Each unsigned char (or similar) is effectively an object.
– Lightness Races in Orbit
May 8 at 14:39
Are you sure? Here is the definition of an object in C++.
– αλεχολυτ
May 8 at 14:49
As noted by some in the comments, unless the resulting value has some meaning or is usable in some way, there is no point in making the behavior defined.
There has been a study done for the C language to answer questions related to Pointer Provenance (with the intention of proposing wording changes to the C specification), and one of the questions was:
Can one make a usable offset between two separately allocated objects by inter-object subtraction (using either pointer or integer arithmetic), to make a usable pointer to the second by adding the offset to the first? (source)
The conclusion of the authors of the study was published in a paper titled Exploring C Semantics and Pointer Provenance, and with respect to this particular question, the answer was:
Inter-object pointer arithmetic
The first example in this section relied on guessing (and then checking) the offset between two allocations. What if one instead calculates the offset, with pointer subtraction; should that let one move between objects, as below?

// pointer_offset_from_ptr_subtraction_global_xy.c
#include <stdio.h>
#include <string.h>
#include <stddef.h>
int x=1, y=2;
int main() {
  int *p = &x;
  int *q = &y;
  ptrdiff_t offset = q - p;
  int *r = p + offset;
  if (memcmp(&r, &q, sizeof(r)) == 0) {
    *r = 11; // is this free of UB?
    printf("y=%d *q=%d *r=%d\n", y, *q, *r);
  }
}
In ISO C11, the q-p is UB (as a pointer subtraction between pointers to different objects, which in some abstract-machine executions are not one-past-related). In a variant semantics that allows construction of more-than-one-past pointers, one would have to choose whether the *r=11 access is UB or not. The basic provenance semantics will forbid it, because r will retain the provenance of the x allocation, but its address is not in bounds for that. This is probably the most desirable semantics: we have found very few example idioms that intentionally use inter-object pointer arithmetic, and the freedom that forbidding it gives to alias analysis and optimisation seems significant.
This study was picked up by the C++ community, summarized, and sent to WG21 (the C++ Standards Committee) for feedback.
Relevant point of the Summary:
Pointer difference is only defined for pointers with the same provenance and within the same array.
So, they have decided to keep it undefined for now.
Note that there is a study group, SG12, within the C++ Standards Committee for studying Undefined Behavior & Vulnerabilities. This group conducts a systematic review to catalog cases of vulnerabilities and undefined/unspecified behavior in the standard, and to recommend a coherent set of changes to define and/or specify the behavior. You can keep track of the proceedings of this group to see whether there will be any future changes to behaviors that are currently undefined or unspecified.
edited May 8 at 14:55 by Lightness Races in Orbit
answered May 8 at 10:21 by P.W
First see this question mentioned in the comments for why it isn't well defined. The answer given, concisely, is that arbitrary pointer arithmetic is not possible in the segmented memory models used by some (now archaic?) systems.
What is the rationale to make such behavior undefined instead of, for instance, implementation defined?
Whenever the standard specifies something as undefined behaviour, it could usually have been specified merely as implementation-defined instead. So, why specify anything as undefined?
Well, undefined behaviour is more lenient. In particular, being allowed to assume that there is no undefined behaviour, a compiler may perform optimisations that would break the program if the assumptions weren't correct. So, a reason to specify undefined behaviour is optimisation.
Let's consider a function fun(int* arr1, int* arr2) that takes two pointers as arguments. Those pointers could point into the same array, or not. Let's say the function iterates through one of the pointed-to arrays (arr1 + n), and must compare each position to the other pointer for equality ((arr1 + n) != arr2) in each iteration, for example to ensure that the pointed-to object is not overwritten.
Let's say that we call the function like this: fun(array1, array2). The compiler knows that (array1 + n) != array2, because otherwise behaviour would be undefined. Therefore, if the function call is expanded inline, the compiler can remove the redundant check (arr1 + n) != arr2, which is always true. If pointer arithmetic across array boundaries were well defined (or even implementation-defined), then (array1 + n) == array2 could be true for some n, and this optimisation would be impossible - unless the compiler could prove that (array1 + n) != array2 holds for all possible values of n, which can sometimes be much harder to prove.
Pointer arithmetic across members of a class could be implemented even in segmented memory models. The same goes for iterating across the boundaries of a subarray. There are use cases where these could be quite useful, but they are technically UB.
An argument for UB in these cases is more possibilities for UB optimisation. You don't necessarily need to agree that this is a sufficient argument.
Ah, I'm confusing the rules for ordering pointers. == and != are well defined for pointers to objects of the same type (or void *)
– Caleth
May 8 at 10:55
@Caleth Cool. That's what I remembered :) The relational operators aren't themselves UB either (at least in latest draft). It's just that the order is unspecified, so they don't impose a strict ordering, which may lead to violation of some preconditions.
– eerorika
May 8 at 10:59
edited May 8 at 11:05
answered May 8 at 10:17 by eerorika
What meaning would the resulting value have?
– Blaze
May 8 at 8:14
I don't think there would be much difference if it were implementation-defined; you would probably just have to read in your compiler's documentation that it is undefined ;)
– formerlyknownas_463035818
May 8 at 8:15
@αλεχολυτ what I'm getting at is that the resulting value is nonsensical, there's no way to use it properly. Instead of forcing the compiler to generate a nonsense value, the standard just says that it's UB, allowing the compiler to salvage that situation however it wants, possibly by not even doing the subtraction and thus saving time. I mean, it could just optimize away the line and the value is whatever was in memory to begin with, the result would be just as useless. In general, leaving things up to the implementation generates potential for possible optimization.
– Blaze
May 8 at 8:26
What if the objects are in different memory segments? There's no meaningful "difference" then.
– melpomene
May 8 at 8:34
@Blaze Assuming a linear memory layout, the resulting value isn't entirely nonsensical. I've seen code that actually relies on pointer arithmetic across separate arrays, for example setting d = p - q and later assuming that q + d yields p.
– nwellnhof
May 8 at 8:35