Skip to content

Missoptimization of functions with C calling convention #33868

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Florob opened this issue May 25, 2016 · 14 comments
Closed

Missoptimization of functions with C calling convention #33868

Florob opened this issue May 25, 2016 · 14 comments
Labels
A-FFI Area: Foreign function interface (FFI) P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.

Comments

@Florob
Copy link
Contributor

Florob commented May 25, 2016

When trying to look at how aggregates are passed to functions I came across the following:
The code presented below erroneously outputs 488447261 instead of 20.
This is using the 1.9.0 preview release (or anything newer) and seems to be a regression from 1.8.0.
The behaviour only occurs when using the C calling conventions and optimizations, eliminating either factor gives the correct result.

#[repr(C)]
struct S {
    a: u32,
    b: f32,
    c: u32
}

extern "C" fn test(s: S) -> u32 {
    s.a + s.b as u32 + s.c
}

fn main() {
    println!("{}", test(S { a: 12, b: 3.4, c: 5 }));
}
@retep998
Copy link
Member

Which target is this for? There are multiple calling conventions across operating systems and architectures after all.

@Florob
Copy link
Contributor Author

Florob commented May 25, 2016

This is on x86_64-unknown-linux-gnu.

@michaelwoerister
Copy link
Member

I can confirm this for rustc 1.10.0-nightly (dd6e8d45e 2016-05-23)

@nagisa nagisa added the regression-from-stable-to-beta Performance or correctness regression from stable to beta. label May 25, 2016
@nagisa
Copy link
Member

nagisa commented May 25, 2016

Works on stable.

@nagisa
Copy link
Member

nagisa commented May 25, 2016

cc @eddyb could it be your call trans refactor?

@nagisa
Copy link
Member

nagisa commented May 25, 2016

Seems like we invoke some UB:

; Function Attrs: noinline norecurse nounwind readnone uwtable
define i32 @test({ i64, i64 }) unnamed_addr #0 {
entry-block:
  ret i32 undef
}

is what the function ends up being optimised to.

@nagisa nagisa added I-wrong A-FFI Area: Foreign function interface (FFI) labels May 25, 2016
@nagisa
Copy link
Member

nagisa commented May 25, 2016

This is what I managed to reduce the issue to.

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

This suffices:

#[no_mangle]
pub extern "C"  fn test(s: S) -> u32 {
    s.c
}

Bad IR:

define i32 @test({ i64, i64 }) unnamed_addr #0 !dbg !176 {
entry-block:
  %s = alloca %S
  %1 = bitcast %S* %s to { i64, i64 }*
  store { i64, i64 } %0, { i64, i64 }* %1, align 4
  call void @llvm.dbg.declare(metadata %S* %s, metadata !202, metadata !203), !dbg !204
  %2 = getelementptr inbounds %S, %S* %s, i32 0, i32 2
  %3 = load i32, i32* %2, !dbg !205
  ret i32 %3, !dbg !207
}

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

The old, working functionality uses a shim:

  %1 = alloca { i64, i64 }
  store { i64, i64 } %0, { i64, i64 }* %1
  %2 = bitcast { i64, i64 }* %1 to %S*
  %3 = call i32 @_ZN4test10__rust_abiE(%S* noalias nocapture dereferenceable(12) %2)
  ret i32 %3

@arielb1
Copy link
Contributor

arielb1 commented May 25, 2016

We are storing 16 bytes of data in a 12-byte bag and that is killing us.

@nagisa
Copy link
Member

nagisa commented May 25, 2016

For the reference, define i32 @test(i64 %s.coerce0, i32 %s.coerce1) #0 { is the signature generated by clang for an identical function. Alternatively we could use {i64, i32} instead of {i64, i64}.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 25, 2016

@arielb1 what was the signature of the shim function? nevermind

@nikomatsakis nikomatsakis added the P-high High priority label May 25, 2016
@eddyb
Copy link
Member

eddyb commented May 25, 2016

Ah, this is the problem @dotdash was aware of, isn't it?
Where none of our current system works perfectly right with respect to cast arguments/returns of different sizes but some of the old code was long-winded enough to avoid the problem.

EDIT: That was actually more about calls, rather than definitions, and I can't get calls to misbehave (we do the same thing there that the old code does AFAICT). One thing that we might need to look into is MIR trans not doing a cast returned value memcpy.

@alexcrichton
Copy link
Member

The core team discussed this issue today and our conclusions were as follows:

  • Definitely seems like something we should fix immediately, hence the P-high.
  • We're very likely to backport the fix to beta (unless it turns out to require a compiler refactor, which seems unlikely)
  • We're moving forward with the 1.9 release tomorrow as this has been on nightly for at least 6 weeks and hasn't been discovered yet. That's at least a signal that the breakage isn't major enough to warrant blocking or delaying the entire release.
  • Once the fix is backported to beta and has some testing on the beta branch we can consider whether to make a point release (e.g. 1.9.1). If this bug is widespread in practice then we can definitely do that, but if the impact continues to be low we may not do that.

@bors bors closed this as completed in 97bf80f May 26, 2016
alexcrichton pushed a commit to alexcrichton/rust that referenced this issue May 26, 2016
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-FFI Area: Foreign function interface (FFI) P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta.
Projects
None yet
Development

No branches or pull requests

8 participants