Skip to content

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Oct 20, 2023

Conversation

mtvec
Copy link
Contributor

@mtvec mtvec commented Sep 6, 2023

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

  • State: array of currently known GPR values;
  • Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
  • Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.

@mtvec mtvec requested a review from a team as a code owner September 6, 2023 13:45
@github-actions github-actions bot added backend:RISC-V mc Machine (object) code labels Sep 6, 2023
@mtvec mtvec force-pushed the riscv-eval-auipc-jalr branch from 4065620 to 2b9d089 Compare September 6, 2023 15:12
@mtvec
Copy link
Contributor Author

mtvec commented Sep 20, 2023

Ping.

@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2023

@llvm/pr-subscribers-llvm-binary-utilities

Changes

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

  • State: array of currently known GPR values;
  • Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
  • Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.


Full diff: https://github.com/llvm/llvm-project/pull/65480.diff

5 Files Affected:

  • (modified) llvm/include/llvm/MC/MCInstrAnalysis.h (+15)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp (+64)
  • (modified) llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s (+2-2)
  • (added) llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s (+45)
  • (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+20-5)
diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..dac12af599e6f34 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
   MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
   virtual ~MCInstrAnalysis() = default;
 
+  /// Clear the internal state. See updateState for more information.
+  virtual void resetState() {}
+
+  /// Update internal state with \p Inst at \p Addr.
+  ///
+  /// For some types a analyses, inspecting a single instruction is not
+  /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+  /// on AArch64. To support inspecting multiple instructions, targets may keep
+  /// track of an internal state while analysing instructions. Clients should
+  /// call updateState for every instruction which allows later calls to one of
+  /// the analysis functions to take previous instructions into account.
+  /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+  /// new function), clients should call resetState to clear it.
+  virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
   virtual bool isBranch(const MCInst &Inst) const {
     return Info->get(Inst.getOpcode()).isBranch();
   }
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 75af5c2de09469b..5349fa0a4b30881 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -114,10 +114,65 @@ static MCTargetStreamer *createRISCVNullTargetStreamer(MCStreamer &S) {
 namespace {
 
 class RISCVMCInstrAnalysis : public MCInstrAnalysis {
+  std::optional<int64_t> GPRState[31];
+
+  static bool isGPR(unsigned Reg) {
+    return Reg >= RISCV::X0 && Reg <= RISCV::X31;
+  }
+
+  void setGPRState(unsigned Reg, std::optional<int64_t> Value) {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg != RISCV::X0)
+      GPRState[Reg - RISCV::X1] = Value;
+  }
+
+  std::optional<int64_t> getGPRState(unsigned Reg) const {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg == RISCV::X0)
+      return 0;
+    return GPRState[Reg - RISCV::X1];
+  }
+
 public:
   explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
       : MCInstrAnalysis(Info) {}
 
+  void resetState() override {
+    std::fill(std::begin(GPRState), std::end(GPRState), std::nullopt);
+  }
+
+  void updateState(const MCInst &Inst, uint64_t Addr) override {
+    // Terminators mark the end of a basic block which means the sequentially
+    // next instruction will be the first of another basic block and the current
+    // state will typically not be valid anymore. For calls, we assume all
+    // registers may be clobbered by the callee (TODO: should we take the
+    // calling convention into account?).
+    if (isTerminator(Inst) || isCall(Inst)) {
+      resetState();
+      return;
+    }
+
+    switch (Inst.getOpcode()) {
+    default: {
+      // Clear the state of all defined registers for instructions that we don't
+      // explicitly support.
+      auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
+      for (unsigned I = 0; I < NumDefs; ++I) {
+        auto DefReg = Inst.getOperand(I).getReg();
+        if (isGPR(DefReg))
+          setGPRState(DefReg, std::nullopt);
+      }
+      break;
+    }
+    case RISCV::AUIPC:
+      setGPRState(Inst.getOperand(0).getReg(),
+                  Addr + (Inst.getOperand(1).getImm() << 12));
+      break;
+    }
+  }
+
   bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
                       uint64_t &Target) const override {
     if (isConditionalBranch(Inst)) {
@@ -140,6 +195,15 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
       return true;
     }
 
+    if (Inst.getOpcode() == RISCV::JALR) {
+      if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        return true;
+      }
+
+      return false;
+    }
+
     return false;
   }
 
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
index 5fec4e6e25a39a3..ebd86a702b70e7c 100644
--- a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
@@ -57,11 +57,11 @@ c.jal bar
 c.j bar
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <foo+0x58>
 call .Llocal
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <bar>
 call bar
 
 .Llocal:
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
new file mode 100644
index 000000000000000..91b643e961fc6df
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
@@ -0,0 +1,45 @@
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+c < %s | \
+# RUN:     llvm-objdump -d -M no-aliases --no-show-raw-insn - | \
+# RUN:     FileCheck %s
+
+## Test multiple interleaved auipc/jalr pairs
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: auipc t1, 0
+2: auipc t1, %pcrel_hi(bar)
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+## Target should not be printed because the call above clobbers register state
+# CHECK: jalr ra, {{[0-9]+}}(t1){{$}}
+jalr %pcrel_lo(2b)(t1)
+
+## Test that auipc+jalr with a write to the target register in between does not
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t0, 0
+li t0, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a write to an unrelated register in between does
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t1, 0
+li t1, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a terminator in between does not print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.j {{.*}} <bar>
+j bar
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+# CHECK-LABEL: <bar>:
+bar:
+# CHECK: c.nop
+nop
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 96d74d6e2d5e865..385be9fb9257e16 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -842,7 +842,7 @@ class DisassemblerTarget {
   std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
   std::shared_ptr<MCContext> Context;
   std::unique_ptr<MCDisassembler> DisAsm;
-  std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+  std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
   std::shared_ptr<MCInstPrinter> InstPrinter;
   PrettyPrinter *Printer;
 
@@ -1265,14 +1265,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
   }
 }
 
-static void collectLocalBranchTargets(
-    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
-    MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
-    uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+                          MCDisassembler *DisAsm, MCInstPrinter *IP,
+                          const MCSubtargetInfo *STI, uint64_t SectionAddr,
+                          uint64_t Start, uint64_t End,
+                          std::unordered_map<uint64_t, std::string> &Labels) {
   // So far only supports PowerPC and X86.
   if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
     return;
 
+  if (MIA)
+    MIA->resetState();
+
   Labels.clear();
   unsigned LabelCount = 0;
   Start += SectionAddr;
@@ -1298,6 +1303,9 @@ static void collectLocalBranchTargets(
           !Labels.count(Target) &&
           !(STI->getTargetTriple().isPPC() && Target == Index))
         Labels[Target] = ("L" + Twine(LabelCount++)).str();
+      MIA->updateState(Inst, Index);
+    } else if (!Disassembled && MIA) {
+      MIA->resetState();
     }
     Index += Size;
   }
@@ -1939,6 +1947,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                                BBAddrMapLabels);
       }
 
+      if (DT->InstrAnalysis)
+        DT->InstrAnalysis->resetState();
+
       while (Index < End) {
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
@@ -2155,6 +2166,10 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
               if (TargetOS == &CommentStream)
                 *TargetOS << "\n";
             }
+
+            DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
+          } else if (!Disassembled && DT->InstrAnalysis) {
+            DT->InstrAnalysis->resetState();
           }
         }
 

@arichardson
Copy link
Member

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

@mtvec
Copy link
Contributor Author

mtvec commented Oct 2, 2023

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):

$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null

So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

@arichardson
Copy link
Member

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):

$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null

So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 3, 2023

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

@arichardson
Copy link
Member

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

I guess it makes sense that printing and looking up symbols is more expensive than zeroing 512 bytes somewhat regularly. I think the optimization does make sense unless it makes the code a lot less readable. But I'll leave that decision to other reviewers. The change looks good to me now but please wait for a review from someone else before merging.

@mtvec
Copy link
Contributor Author

mtvec commented Oct 4, 2023

I think the optimization does make sense unless it makes the code a lot less readable.

I don't think it makes the code unreadable so I implemented the optimization.

This patch implements `MCInstrAnalysis` state in order to be able
analyze auipc+jalr pairs inside `evaluateBranch`.

This is implemented as follows:
- State: array of currently known GPR values;
- Whenever an auipc is detected in `updateState`, update the state value
  of RD with the immediate;
- Whenever a jalr is detected in `evaluateBranch`, check if the state
  holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output
of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new
one testing the output for interleaved auipc+jalr pairs.
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants