[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

mtvec · 2023-09-06T13:45:13Z

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

State: array of currently known GPR values;
Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.

llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp

llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s

mtvec · 2023-09-20T08:04:32Z

Ping.

llvm/include/llvm/MC/MCInstrAnalysis.h

llvm/tools/llvm-objdump/llvm-objdump.cpp

llvmbot · 2023-10-02T08:36:09Z

@llvm/pr-subscribers-llvm-binary-utilities

Changes

This patch implements MCInstrAnalysis state in order to be able analyze auipc+jalr pairs inside evaluateBranch.

This is implemented as follows:

State: array of currently known GPR values;
Whenever an auipc is detected in updateState, update the state value of RD with the immediate;
Whenever a jalr is detected in evaluateBranch, check if the state holds a value for RS1 and use that to compute its target.

Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump.

This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

This PR builds on #65479. Please only review the top commit here.

Full diff: https://github.com/llvm/llvm-project/pull/65480.diff

5 Files Affected:

(modified) llvm/include/llvm/MC/MCInstrAnalysis.h (+15)
(modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp (+64)
(modified) llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s (+2-2)
(added) llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s (+45)
(modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+20-5)

diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..dac12af599e6f34 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
   MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
   virtual ~MCInstrAnalysis() = default;
 
+  /// Clear the internal state. See updateState for more information.
+  virtual void resetState() {}
+
+  /// Update internal state with \p Inst at \p Addr.
+  ///
+  /// For some types a analyses, inspecting a single instruction is not
+  /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+  /// on AArch64. To support inspecting multiple instructions, targets may keep
+  /// track of an internal state while analysing instructions. Clients should
+  /// call updateState for every instruction which allows later calls to one of
+  /// the analysis functions to take previous instructions into account.
+  /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+  /// new function), clients should call resetState to clear it.
+  virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
   virtual bool isBranch(const MCInst &Inst) const {
     return Info->get(Inst.getOpcode()).isBranch();
   }
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 75af5c2de09469b..5349fa0a4b30881 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -114,10 +114,65 @@ static MCTargetStreamer *createRISCVNullTargetStreamer(MCStreamer &S) {
 namespace {
 
 class RISCVMCInstrAnalysis : public MCInstrAnalysis {
+  std::optional<int64_t> GPRState[31];
+
+  static bool isGPR(unsigned Reg) {
+    return Reg >= RISCV::X0 && Reg <= RISCV::X31;
+  }
+
+  void setGPRState(unsigned Reg, std::optional<int64_t> Value) {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg != RISCV::X0)
+      GPRState[Reg - RISCV::X1] = Value;
+  }
+
+  std::optional<int64_t> getGPRState(unsigned Reg) const {
+    assert(isGPR(Reg) && "Invalid GPR reg");
+
+    if (Reg == RISCV::X0)
+      return 0;
+    return GPRState[Reg - RISCV::X1];
+  }
+
 public:
   explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
       : MCInstrAnalysis(Info) {}
 
+  void resetState() override {
+    std::fill(std::begin(GPRState), std::end(GPRState), std::nullopt);
+  }
+
+  void updateState(const MCInst &Inst, uint64_t Addr) override {
+    // Terminators mark the end of a basic block which means the sequentially
+    // next instruction will be the first of another basic block and the current
+    // state will typically not be valid anymore. For calls, we assume all
+    // registers may be clobbered by the callee (TODO: should we take the
+    // calling convention into account?).
+    if (isTerminator(Inst) || isCall(Inst)) {
+      resetState();
+      return;
+    }
+
+    switch (Inst.getOpcode()) {
+    default: {
+      // Clear the state of all defined registers for instructions that we don't
+      // explicitly support.
+      auto NumDefs = Info->get(Inst.getOpcode()).getNumDefs();
+      for (unsigned I = 0; I < NumDefs; ++I) {
+        auto DefReg = Inst.getOperand(I).getReg();
+        if (isGPR(DefReg))
+          setGPRState(DefReg, std::nullopt);
+      }
+      break;
+    }
+    case RISCV::AUIPC:
+      setGPRState(Inst.getOperand(0).getReg(),
+                  Addr + (Inst.getOperand(1).getImm() << 12));
+      break;
+    }
+  }
+
   bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
                       uint64_t &Target) const override {
     if (isConditionalBranch(Inst)) {
@@ -140,6 +195,15 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
       return true;
     }
 
+    if (Inst.getOpcode() == RISCV::JALR) {
+      if (auto TargetRegState = getGPRState(Inst.getOperand(1).getReg())) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        return true;
+      }
+
+      return false;
+    }
+
     return false;
   }
 
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
index 5fec4e6e25a39a3..ebd86a702b70e7c 100644
--- a/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/branches.s
@@ -57,11 +57,11 @@ c.jal bar
 c.j bar
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <foo+0x58>
 call .Llocal
 
 # CHECK: auipc ra, 0
-# CHECK: jalr	ra, 16(ra){{$}}
+# CHECK: jalr	ra, 16(ra) <bar>
 call bar
 
 .Llocal:
diff --git a/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
new file mode 100644
index 000000000000000..91b643e961fc6df
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s
@@ -0,0 +1,45 @@
+# RUN: llvm-mc -filetype=obj -triple riscv32 -mattr=+c < %s | \
+# RUN:     llvm-objdump -d -M no-aliases --no-show-raw-insn - | \
+# RUN:     FileCheck %s
+
+## Test multiple interleaved auipc/jalr pairs
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: auipc t1, 0
+2: auipc t1, %pcrel_hi(bar)
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+## Target should not be printed because the call above clobbers register state
+# CHECK: jalr ra, {{[0-9]+}}(t1){{$}}
+jalr %pcrel_lo(2b)(t1)
+
+## Test that auipc+jalr with a write to the target register in between does not
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t0, 0
+li t0, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a write to an unrelated register in between does
+## print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.li t1, 0
+li t1, 0
+# CHECK: jalr ra, {{[0-9]+}}(t0) <bar>
+jalr %pcrel_lo(1b)(t0)
+
+## Test that auipc+jalr with a terminator in between does not print the target
+# CHECK: auipc t0, 0
+1: auipc t0, %pcrel_hi(bar)
+# CHECK: c.j {{.*}} <bar>
+j bar
+# CHECK: jalr ra, {{[0-9]+}}(t0){{$}}
+jalr %pcrel_lo(1b)(t0)
+
+# CHECK-LABEL: <bar>:
+bar:
+# CHECK: c.nop
+nop
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 96d74d6e2d5e865..385be9fb9257e16 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -842,7 +842,7 @@ class DisassemblerTarget {
   std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
   std::shared_ptr<MCContext> Context;
   std::unique_ptr<MCDisassembler> DisAsm;
-  std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+  std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
   std::shared_ptr<MCInstPrinter> InstPrinter;
   PrettyPrinter *Printer;
 
@@ -1265,14 +1265,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
   }
 }
 
-static void collectLocalBranchTargets(
-    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
-    MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
-    uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+                          MCDisassembler *DisAsm, MCInstPrinter *IP,
+                          const MCSubtargetInfo *STI, uint64_t SectionAddr,
+                          uint64_t Start, uint64_t End,
+                          std::unordered_map<uint64_t, std::string> &Labels) {
   // So far only supports PowerPC and X86.
   if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
     return;
 
+  if (MIA)
+    MIA->resetState();
+
   Labels.clear();
   unsigned LabelCount = 0;
   Start += SectionAddr;
@@ -1298,6 +1303,9 @@ static void collectLocalBranchTargets(
           !Labels.count(Target) &&
           !(STI->getTargetTriple().isPPC() && Target == Index))
         Labels[Target] = ("L" + Twine(LabelCount++)).str();
+      MIA->updateState(Inst, Index);
+    } else if (!Disassembled && MIA) {
+      MIA->resetState();
     }
     Index += Size;
   }
@@ -1939,6 +1947,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                                BBAddrMapLabels);
       }
 
+      if (DT->InstrAnalysis)
+        DT->InstrAnalysis->resetState();
+
       while (Index < End) {
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
@@ -2155,6 +2166,10 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
               if (TargetOS == &CommentStream)
                 *TargetOS << "\n";
             }
+
+            DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
+          } else if (!Disassembled && DT->InstrAnalysis) {
+            DT->InstrAnalysis->resetState();
           }
         }

arichardson · 2023-10-02T14:19:01Z

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

mtvec · 2023-10-02T17:57:18Z

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):

$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null

So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

arichardson · 2023-10-02T18:03:30Z

In general this looks good to me but I do worry a bit about the performance implications of regularly zeroing all the GPRS. Could you show the impact on disassembly of a large binary (e.g. statically linked clang for riscv?)

Here's a quick benchmark (release build without asserts):
$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null
So there seems to be about 1% overhead.

If this is too much, one solution would be to not store an array of std::optional<uint64_t> but one containing just uint64_t and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

mtvec · 2023-10-03T10:52:34Z

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

arichardson · 2023-10-03T14:09:30Z

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.

I implemented this and to my surprise, the overhead is still around 1%. After some digging, I noticed that the increase in runtime seems almost entirely due to the increased output size (total size increase is about 2% on the clang benchmark, the functions contributing the most to the increased runtime are output-related ones like llvm::sys::unicode::columnWidthUTF8).

Since this overhead is unavoidable, do you still feel it's worthwhile to implement the optimization?

I guess it makes sense that printing and looking up symbols is more expensive than zeroing 512 bytes somewhat regularly. I think the optimization does make sense unless it makes the code a lot less readable. But I'll leave that decision to other reviewers. The change looks good to me now but please wait for a review from someone else before merging.

mtvec · 2023-10-04T06:35:57Z

I think the optimization does make sense unless it makes the code a lot less readable.

I don't think it makes the code unreadable so I implemented the optimization.

This patch implements `MCInstrAnalysis` state in order to be able analyze auipc+jalr pairs inside `evaluateBranch`. This is implemented as follows: - State: array of currently known GPR values; - Whenever an auipc is detected in `updateState`, update the state value of RD with the immediate; - Whenever a jalr is detected in `evaluateBranch`, check if the state holds a value for RS1 and use that to compute its target. Note that this is similar to how binutils implements it and the output of llvm-objdump should now mostly match the one of GNU objdump. This patch also updates the relevant llvm-objdump patches and adds a new one testing the output for interleaved auipc+jalr pairs.

mtvec requested review from asb, MaskRay, jrtc27, topperc and jh7370 September 6, 2023 13:45

mtvec requested a review from a team as a code owner September 6, 2023 13:45

github-actions bot added backend:RISC-V mc Machine (object) code labels Sep 6, 2023

mtvec force-pushed the riscv-eval-auipc-jalr branch from 4065620 to 2b9d089 Compare September 6, 2023 15:12

arichardson requested changes Sep 7, 2023

View reviewed changes

llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp Show resolved Hide resolved

llvm/test/tools/llvm-objdump/ELF/RISCV/multi-instr-target.s Outdated Show resolved Hide resolved

arichardson reviewed Sep 29, 2023

View reviewed changes

llvm/include/llvm/MC/MCInstrAnalysis.h Outdated Show resolved Hide resolved

llvm/tools/llvm-objdump/llvm-objdump.cpp Show resolved Hide resolved

llvmbot added the llvm:binary-utilities label Oct 2, 2023

arichardson approved these changes Oct 3, 2023

View reviewed changes

mtvec mentioned this pull request Oct 17, 2023

[MC][NFC] Allow MCInstrAnalysis to store state #65479

Merged

mtvec force-pushed the riscv-eval-auipc-jalr branch from 0c03d56 to f8b8f8a Compare October 20, 2023 06:34

mtvec merged commit 95f924f into llvm:main Oct 20, 2023

mtvec deleted the riscv-eval-auipc-jalr branch October 20, 2023 06:36

madhur13490 mentioned this pull request Oct 20, 2023

Revert commit ba8565fbcb975e2d067ce3ae5a7dbaae4953edd3 madhur13490/llvm-project#3

Closed

spupyrev mentioned this pull request Oct 23, 2023

[BOLT] Rename cds to cdsort #69966

Merged

banach-space mentioned this pull request Oct 24, 2023

[mlir][vector] Add scalable vectors to tests for vector.contract #70039

Merged

MaskRay mentioned this pull request Jun 21, 2025

[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv #144620

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

Uh oh!

mtvec commented Sep 6, 2023

Uh oh!

Uh oh!

Uh oh!

mtvec commented Sep 20, 2023

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Oct 2, 2023

Uh oh!

arichardson commented Oct 2, 2023

Uh oh!

mtvec commented Oct 2, 2023

Uh oh!

arichardson commented Oct 2, 2023

Uh oh!

mtvec commented Oct 3, 2023

Uh oh!

arichardson commented Oct 3, 2023

Uh oh!

mtvec commented Oct 4, 2023

Uh oh!

Uh oh!

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

[RISCV][MC] Implement evaluateBranch for auipc+jalr pairs #65480

Uh oh!

Conversation

mtvec commented Sep 6, 2023

Uh oh!

Uh oh!

Uh oh!

mtvec commented Sep 20, 2023

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Oct 2, 2023

Uh oh!

arichardson commented Oct 2, 2023

Uh oh!

mtvec commented Oct 2, 2023

Uh oh!

arichardson commented Oct 2, 2023

Uh oh!

mtvec commented Oct 3, 2023

Uh oh!

arichardson commented Oct 3, 2023

Uh oh!

mtvec commented Oct 4, 2023

Uh oh!

Uh oh!