Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Fix mismatch between thread ID sets and reachable threads #484

Merged
merged 1 commit into from
Dec 9, 2021

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Dec 8, 2021

This makes nonunique thread spawns also spawn the unique version to keep it reachable since thread ID sets refer to it.
It should fix the issue particularly with incremental comparison where a thread ID set refers to a unique thread, which later was found to be nonunique, making the unique context of it not reached and thus pruned.

Compared to using Hoare sets for thread ID sets, this is significantly simpler and also should work with thread joins correctly.
Compared to path splitting according to the created thread set in the threadid analysis, this should be more efficient by not splitting the states.

TODO

This makes nonunique thread spawns also spawn the unique version to keep it reachable since thread ID sets refer to it.
Copy link
Member

@vesalvojdani vesalvojdani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this does fix the issue: https://goblint.cs.ut.ee/vesal-results/latest/ We still have a potential unsoundness / precision improvement with restart, but there the difference is in the int domain.

@sim642
Copy link
Member Author

sim642 commented Dec 9, 2021

I ran sv-benchmarks before vs after and didn't see any precision or performance change due to this change either, so it should be good.

@sim642 sim642 merged commit 994680c into master Dec 9, 2021
@sim642 sim642 deleted the threadset-recreate-unique branch December 9, 2021 07:59
@@ -136,9 +136,9 @@ struct
let n = Base.threadenter n v in
let ((p', s') as composed) = compose current n in
if is_unique composed && S.mem n cs then
(p, S.singleton n)
[(p, S.singleton n); composed] (* also respawn unique version of the thread to keep it reachable while thread ID sets refer to it *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a comment that this potentially harms precision for creates that are encountered multiple times but not in a loop and is only needed for incremental.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of example do you have in mind where this precision loss can happen? I cannot think of one right now.

And if it does somehow happen, then I think the proper way to get such precision back would be to also track a set of must-created threads, not just the may-created threads that it currently tracks. Then it would be possible to distinguish the loop case (where it isn't must-created) from some other case (where I guess it is must-created?).

Copy link
Member

@michael-schwarz michael-schwarz Dec 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something like

// SKIP PARAM: --set ana.path_sens[+] threadflag --set ana.activated[+] threadJoins
#include <pthread.h>
#include <assert.h>

int t_fun(void *arg) {
  int i = (int)arg;
  return i+1;
}

int crea_and_join(int arg) {
  pthread_t id;

  int i;
  // Not my proudest moment but this can't be a proper pointer or that would escape
  pthread_create(&id, NULL, t_fun, (void**)arg);
  pthread_join(id, (void**) &i);
  assert(i == arg+1);
}


int main(void) {
  int t;

  crea_and_join(10);
  crea_and_join(5);

  return 0;
}

But somehow it also works here which is highly confusing. One would expect that now also for the unique thread id
main, t_fun@tests/regression/36-apron/96-exp.c:19:3-19:47, {} would have to account for the possibility that t_fun is passed 5 but it doesn't. Or am I missing something here?
Also, for some strange reason id is still a singleton set in crea_and_join(5) which is strange as it should contain both thread ids, if I spawn two different threads here, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm, my local master was behind. This actually is an example highlighting the precision loss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threadflag path-sensitivity and the threadJoins analysis aren't really needed for this, are they?

I now see what you mean, but this precision we're losing here is extremely specific: using a wrapper function twice to distinguish the first and the second thread created from the same location, while casting the argument to an unrealistic type.

I feel like there are numerous cleaner way to get this precision than rely on the previous inconsistent behavior, e.g. having a must-created set in ThreadIdDomain.History or maybe disabling this double-spawning when threadflag is path-sensitive or having a threadWrapper analysis to distinguish these or having a more general form of inlining.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threadflag path-sensitivity and the threadJoins analysis aren't really needed for this, are they?

Nope

I feel like there are numerous cleaner way to get this precision than rely on the previous inconsistent behavior, e.g. having a must-created set in ThreadIdDomain.History or maybe disabling this double-spawning when threadflag is path-sensitive or having a threadWrapper analysis to distinguish these or having a more general form of inlining.

The cleanest but probably also expensive way would be to do it as we describe it in the paper.

Overall, you are right, the precision loss here seems to be limited to very peculiar changes, so it probably doesn't do anything in the grand scheme of things.

@michael-schwarz
Copy link
Member

I ran sv-benchmarks before vs after and didn't see any precision or performance change due to this change either, so it should be good.

The more interesting comparison would be what happens to the numbers we reported in the more-traces paper.

Actually, I am wondering if there should not be a switch to turn this behavior off, as it is only needed for incremental and it might be surprising to have such a precision loss to a reader of the more traces paper.

@sim642
Copy link
Member Author

sim642 commented Dec 9, 2021

Actually, I am wondering if there should not be a switch to turn this behavior off, as it is only needed for incremental and it might be surprising to have such a precision loss to a reader of the more traces paper.

This inconsistency is also present in the non-incremental case. A global thread ID set may contain a unique thread ID, but if you look at that thread function, then it only exists in the non-unique context. So the part of the solution is referring to something which doesn't appear in the solution, because from the final solution the unique context is no longer reachable and is thus pruned by reachability. But reachability doesn't go into the domain elements of particular analyses to find any references to the pruned contexts.

@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants