Skip to content

[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv #144620

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

arjunUpatel
Copy link

Resolves #108469. Continuation of #109914

As a result of the force push, the context of the old commit history was erased. In the previous iteration changes were made to llvm-objdump which affected non-riscv targets. It was decided to move these changes to another PR. This commit continues the reversion of that iteration
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-mc
@llvm/pr-subscribers-llvm-binary-utilities

@llvm/pr-subscribers-backend-risc-v

Author: Arjun Patel (arjunUpatel)

Changes

Resolves #108469. Continuation of #109914


Full diff: https://github.com/llvm/llvm-project/pull/144620.diff

11 Files Affected:

  • (modified) llvm/include/llvm/MC/MCInstrAnalysis.h (+5)
  • (modified) llvm/lib/MC/MCInstrAnalysis.cpp (+6)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp (+109-10)
  • (added) llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv-ar ()
  • (added) llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv32-ar-coverage ()
  • (added) llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv64-ar-coverage ()
  • (added) llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg (+2)
  • (added) llvm/test/tools/llvm-objdump/RISCV/riscv-ar.s (+57)
  • (added) llvm/test/tools/llvm-objdump/RISCV/riscv32-ar-coverage.s (+30)
  • (added) llvm/test/tools/llvm-objdump/RISCV/riscv64-ar-coverage.s (+106)
  • (modified) llvm/tools/llvm-objdump/llvm-objdump.cpp (+6-5)
diff --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index 63a4e02a92360..1f05e8546c3d1 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -182,6 +182,11 @@ class LLVM_ABI MCInstrAnalysis {
   evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
                  uint64_t &Target) const;
 
+  /// Given an instruction that accesses a memory address, try to compute
+  /// the target address. Return true on success, and the address in \p Target.
+  virtual bool evaluateInstruction(const MCInst &Inst, uint64_t Addr,
+                                   uint64_t Size, uint64_t &Target) const;
+
   /// Given an instruction tries to get the address of a memory operand. Returns
   /// the address on success.
   virtual std::optional<uint64_t>
diff --git a/llvm/lib/MC/MCInstrAnalysis.cpp b/llvm/lib/MC/MCInstrAnalysis.cpp
index cea905d092e0b..1ae0c91a2590c 100644
--- a/llvm/lib/MC/MCInstrAnalysis.cpp
+++ b/llvm/lib/MC/MCInstrAnalysis.cpp
@@ -30,6 +30,12 @@ bool MCInstrAnalysis::evaluateBranch(const MCInst & /*Inst*/, uint64_t /*Addr*/,
   return false;
 }
 
+bool MCInstrAnalysis::evaluateInstruction(const MCInst &Inst, uint64_t Addr,
+                                          uint64_t Size,
+                                          uint64_t &Target) const {
+  return false;
+}
+
 std::optional<uint64_t> MCInstrAnalysis::evaluateMemoryOperandAddress(
     const MCInst &Inst, const MCSubtargetInfo *STI, uint64_t Addr,
     uint64_t Size) const {
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index f3b93f032588c..1ffdca4125fab 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -29,7 +29,9 @@
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/MathExtras.h"
 #include <bitset>
+#include <cstdint>
 
 #define GET_INSTRINFO_MC_DESC
 #define ENABLE_INSTR_PREDICATE_VERIFIER
@@ -129,6 +131,7 @@ namespace {
 class RISCVMCInstrAnalysis : public MCInstrAnalysis {
   int64_t GPRState[31] = {};
   std::bitset<31> GPRValidMask;
+  unsigned int ArchRegWidth;
 
   static bool isGPR(MCRegister Reg) {
     return Reg >= RISCV::X0 && Reg <= RISCV::X31;
@@ -165,8 +168,8 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
   }
 
 public:
-  explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
-      : MCInstrAnalysis(Info) {}
+  explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info, unsigned int ArchRegWidth)
+      : MCInstrAnalysis(Info), ArchRegWidth(ArchRegWidth) {}
 
   void resetState() override { GPRValidMask.reset(); }
 
@@ -182,6 +185,17 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
     }
 
     switch (Inst.getOpcode()) {
+    case RISCV::C_LUI:
+    case RISCV::LUI: {
+      setGPRState(Inst.getOperand(0).getReg(),
+                  SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
+      break;
+    }
+    case RISCV::AUIPC: {
+      setGPRState(Inst.getOperand(0).getReg(),
+                  Addr + SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
+      break;
+    }
     default: {
       // Clear the state of all defined registers for instructions that we don't
       // explicitly support.
@@ -193,10 +207,6 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
       }
       break;
     }
-    case RISCV::AUIPC:
-      setGPRState(Inst.getOperand(0).getReg(),
-                  Addr + SignExtend64<32>(Inst.getOperand(1).getImm() << 12));
-      break;
     }
   }
 
@@ -234,6 +244,89 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
     return false;
   }
 
+  bool evaluateInstruction(const MCInst &Inst, uint64_t Addr, uint64_t Size,
+                           uint64_t &Target) const override {
+    switch(Inst.getOpcode()) {
+    default:
+      return false;
+    case RISCV::C_ADDI:
+    case RISCV::ADDI: {
+      MCRegister Reg = Inst.getOperand(1).getReg();
+      auto TargetRegState = getGPRState(Reg);
+      if (TargetRegState && Reg != RISCV::X0) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        Target &= maskTrailingOnes<uint64_t>(ArchRegWidth);
+        return true;
+      }
+      break;
+    }
+    case RISCV::C_ADDIW:
+    case RISCV::ADDIW: {
+      MCRegister Reg = Inst.getOperand(1).getReg();
+      auto TargetRegState = getGPRState(Reg);
+      if (TargetRegState && Reg != RISCV::X0) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        Target = SignExtend64<32>(Target);
+        return true;
+      }
+      break;
+    }
+    case RISCV::LB:
+    case RISCV::LH:
+    case RISCV::LD:
+    case RISCV::LW:
+    case RISCV::LBU:
+    case RISCV::LHU:
+    case RISCV::LWU:
+    case RISCV::SB:
+    case RISCV::SH:
+    case RISCV::SW:
+    case RISCV::SD:
+    case RISCV::FLH:
+    case RISCV::FLW:
+    case RISCV::FLD:
+    case RISCV::FSH:
+    case RISCV::FSW:
+    case RISCV::FSD:
+    case RISCV::C_LD:
+    case RISCV::C_SD:
+    case RISCV::C_FLD:
+    case RISCV::C_FSD:
+    case RISCV::C_SW:
+    case RISCV::C_LW:
+    case RISCV::C_FSW:
+    case RISCV::C_FLW:
+    case RISCV::C_LBU:
+    case RISCV::C_LH:
+    case RISCV::C_LHU:
+    case RISCV::C_SB:
+    case RISCV::C_SH:
+    case RISCV::C_LWSP:
+    case RISCV::C_SWSP:
+    case RISCV::C_LDSP:
+    case RISCV::C_SDSP:
+    case RISCV::C_FLWSP:
+    case RISCV::C_FSWSP:
+    case RISCV::C_FLDSP:
+    case RISCV::C_FSDSP:
+    case RISCV::C_LD_RV32:
+    case RISCV::C_SD_RV32:
+    case RISCV::C_SDSP_RV32:
+    case RISCV::LD_RV32:
+    case RISCV::C_LDSP_RV32:
+    case RISCV::SD_RV32: {
+      MCRegister Reg = Inst.getOperand(1).getReg();
+      auto TargetRegState = getGPRState(Reg);
+      if (TargetRegState && Reg != RISCV::X0) {
+        Target = *TargetRegState + Inst.getOperand(2).getImm();
+        return true;
+      }
+      break;
+    }
+    }
+    return false;
+  }
+
   bool isTerminator(const MCInst &Inst) const override {
     if (MCInstrAnalysis::isTerminator(Inst))
       return true;
@@ -327,8 +420,12 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
 
 } // end anonymous namespace
 
-static MCInstrAnalysis *createRISCVInstrAnalysis(const MCInstrInfo *Info) {
-  return new RISCVMCInstrAnalysis(Info);
+static MCInstrAnalysis *createRISCV32InstrAnalysis(const MCInstrInfo *Info) {
+  return new RISCVMCInstrAnalysis(Info, 32);
+}
+
+static MCInstrAnalysis *createRISCV64InstrAnalysis(const MCInstrInfo *Info) {
+  return new RISCVMCInstrAnalysis(Info, 64);
 }
 
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTargetMC() {
@@ -344,12 +441,14 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTargetMC() {
     TargetRegistry::RegisterELFStreamer(*T, createRISCVELFStreamer);
     TargetRegistry::RegisterObjectTargetStreamer(
         *T, createRISCVObjectTargetStreamer);
-    TargetRegistry::RegisterMCInstrAnalysis(*T, createRISCVInstrAnalysis);
-
     // Register the asm target streamer.
     TargetRegistry::RegisterAsmTargetStreamer(*T, createRISCVAsmTargetStreamer);
     // Register the null target streamer.
     TargetRegistry::RegisterNullTargetStreamer(*T,
                                                createRISCVNullTargetStreamer);
   }
+  TargetRegistry::RegisterMCInstrAnalysis(getTheRISCV32Target(),
+                                          createRISCV32InstrAnalysis);
+  TargetRegistry::RegisterMCInstrAnalysis(getTheRISCV64Target(),
+                                          createRISCV64InstrAnalysis);
 }
diff --git a/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv-ar b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv-ar
new file mode 100644
index 0000000000000..bc335bc24f88d
Binary files /dev/null and b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv-ar differ
diff --git a/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv32-ar-coverage b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv32-ar-coverage
new file mode 100755
index 0000000000000..8ac5ea52258a1
Binary files /dev/null and b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv32-ar-coverage differ
diff --git a/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv64-ar-coverage b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv64-ar-coverage
new file mode 100755
index 0000000000000..ceaf049b9a4fe
Binary files /dev/null and b/llvm/test/tools/llvm-objdump/RISCV/Inputs/riscv64-ar-coverage differ
diff --git a/llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg b/llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg
new file mode 100644
index 0000000000000..17351748513d9
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/RISCV/lit.local.cfg
@@ -0,0 +1,2 @@
+if not "RISCV" in config.root.targets:
+    config.unsupported = True
diff --git a/llvm/test/tools/llvm-objdump/RISCV/riscv-ar.s b/llvm/test/tools/llvm-objdump/RISCV/riscv-ar.s
new file mode 100644
index 0000000000000..72e5f586b721e
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/RISCV/riscv-ar.s
@@ -0,0 +1,57 @@
+# RUN: llvm-objdump -d %p/Inputs/riscv-ar | FileCheck %s
+
+# CHECK:   auipc a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   ld a0, {{-?0x[0-9a-fA-F]+}}(a0)
+# CHECK:   auipc a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}} <gdata>
+# CHECK:   auipc	a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}} <gdata>
+# CHECK:   auipc	a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   lw a0, {{-?0x[0-9a-fA-F]+}}(a0) <gdata>
+# CHECK:   auipc	a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}} <ldata>
+# CHECK:   auipc	ra, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   jalr {{-?0x[0-9a-fA-F]+}}(ra) <func>
+# CHECK:   auipc	t1, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   jr {{-?0x[0-9a-fA-F]+}}(t1) <func>
+# CHECK:   lui a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addiw a0, a0, {{-?0x[0-9a-fA-F]+}} <gdata+0x12242678>
+# CHECK:   lui a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addiw	a0, a0, {{-?0x[0-9a-fA-F]+}} <gdata+0x1438ad>
+# CHECK:   slli a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   slli a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   slli a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addi a0, a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   lui a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   lui a0, {{-?0x[0-9a-fA-F]+}}
+# CHECK:   addiw a0, a0, {{-?0x[0-9a-fA-F]+}} <_start+0xfefff>
+
+.global _start
+.text
+_start:
+  la a0, gdata
+  lla a0, gdata
+  lla a0, gdata
+  lw a0, gdata
+  lla a0, ldata
+
+  call func
+  tail func
+
+  li a0, 0x12345678
+  li a0, 0x1234567890abcdef
+  li a0, 0x10000
+  li a0, 0xfffff
+
+  .skip 0x100000
+func:
+  ret
+
+ldata:
+  .int 0
+
+.data
+gdata:
+  .int 0
diff --git a/llvm/test/tools/llvm-objdump/RISCV/riscv32-ar-coverage.s b/llvm/test/tools/llvm-objdump/RISCV/riscv32-ar-coverage.s
new file mode 100644
index 0000000000000..2e1a109f9ec86
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/RISCV/riscv32-ar-coverage.s
@@ -0,0 +1,30 @@
+# RUN: llvm-objdump -d %p/Inputs/riscv32-ar-coverage | FileCheck %s
+
+# CHECK: 00001000 <_start>:
+# CHECK-NEXT:     1000: 00000517     	auipc	a0, 0x0
+# CHECK-NEXT:     1004: 0559         	addi	a0, a0, 0x16 <target>
+# CHECK-NEXT:     1006: 00000517     	auipc	a0, 0x0
+# CHECK-NEXT:     100a: 6910         	ld	a2, 0x10(a0) <target>
+# CHECK-NEXT:     100c: 00000517     	auipc	a0, 0x0
+# CHECK-NEXT:     1010: 00c53523     	sd	a2, 0xa(a0) <target>
+# CHECK-NEXT:     1014: 0000         	unimp
+
+# the structure of this test file is similar to that of riscv64-ar-coverage
+# with the major difference being that these tests are focused on instructions
+# for 32 bit architecture
+
+.global _start
+.text
+_start:
+  auipc a0, 0x0
+  addi a0, a0, 0x16   # addi -- behavior changes with differentr architectures
+
+  auipc a0, 0x0
+  c.ld a2, 0x10(a0)   # zclsd instruction
+
+  auipc a0, 0x0
+  sd a2, 0xa(a0)      # zilsd instruction
+
+.skip 0x2
+target:
+  ret:
diff --git a/llvm/test/tools/llvm-objdump/RISCV/riscv64-ar-coverage.s b/llvm/test/tools/llvm-objdump/RISCV/riscv64-ar-coverage.s
new file mode 100644
index 0000000000000..003defd351dfd
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/RISCV/riscv64-ar-coverage.s
@@ -0,0 +1,106 @@
+# RUN: llvm-objdump -d %p/Inputs/riscv64-ar-coverage | FileCheck %s
+
+# CHECK: 0000000000001000 <_start>:
+# CHECK-NEXT:     1000: 00001517     	auipc	a0, 0x1
+# CHECK-NEXT:     1004: 00450513     	addi	a0, a0, 0x4 <target>
+# CHECK-NEXT:     1008: 00001517     	auipc	a0, 0x1
+# CHECK-NEXT:     100c: 1571         	addi	a0, a0, -0x4 <target>
+# CHECK-NEXT:     100e: 6509         	lui	a0, 0x2
+# CHECK-NEXT:     1010: 0045059b     	addiw	a1, a0, 0x4 <target>
+# CHECK-NEXT:     1014: 6509         	lui	a0, 0x2
+# CHECK-NEXT:     1016: 2511         	addiw	a0, a0, 0x4 <target>
+# CHECK-NEXT:     1018: 00102537     	lui	a0, 0x102
+# CHECK-NEXT:     101c: c50c         	sw	a1, 0x8(a0) <far_target>
+# CHECK-NEXT:     101e: 00102537     	lui	a0, 0x102
+# CHECK-NEXT:     1022: 4508         	lw	a0, 0x8(a0) <far_target>
+# CHECK-NEXT:     1024: 6509         	lui	a0, 0x2
+# CHECK-NEXT:     1026: 6585         	lui	a1, 0x1
+# CHECK-NEXT:     1028: 0306         	slli	t1, t1, 0x1
+# CHECK-NEXT:     102a: 0511         	addi	a0, a0, 0x4 <target>
+# CHECK-NEXT:     102c: 0505         	addi	a0, a0, 0x1
+# CHECK-NEXT:     102e: 00200037     	lui	zero, 0x200
+# CHECK-NEXT:     1032: 00a02423     	sw	a0, 0x8(zero)
+# CHECK-NEXT:     1036: 00101097     	auipc	ra, 0x101
+# CHECK-NEXT:     103a: fd6080e7     	jalr	-0x2a(ra) <func>
+# CHECK-NEXT:     103e: 640d         	lui	s0, 0x3
+# CHECK-NEXT:     1040: 8800         	sb	s0, 0x0(s0) <zcb>
+# CHECK-NEXT:     1042: 4522         	lw	a0, 0x8(sp)
+
+
+.global _start
+.text
+
+# The core of the feature being added was address resolution for instruction 
+# sequences where an register is populated by immediate values via two
+# separate instructions. First by an instruction that provides the upper bits
+# (auipc, lui ...) followed by another instruction for the lower bits (addi,
+# jalr, ld ...).
+
+_start:
+  # Test block 1-3 each focus on a certain starting instruction in a sequences, 
+  # the ones that provide the upper bits. The other sequence is another
+  # instruction the provides the lower bits. The second instruction is
+  # arbitrarily chosen to increase code coverage
+
+  # test block #1
+  lla a0, target     # addi
+  auipc a0, 0x1
+  c.addi a0, -0x4    # c.addi
+
+  # test block #2
+  c.lui a0, 0x2
+  addiw a1, a0, 0x4  # addiw
+  c.lui a0, 0x2
+  c.addiw a0, 0x4    # c.addiw
+
+  # test block #3
+  lui a0, 0x102
+  sw a1, 0x8(a0)     # sw
+  lui a0, 0x102
+  c.lw a0, 0x8(a0)   # lw
+
+  # Test block 4 tests instruction interleaving, essentially the code's
+  # ability to keep track of a valid sequence even if multiple other unrelated
+  # instructions separate the two
+
+  # test #4
+  lui a0, 0x2
+  lui a1, 0x1        # unrelated instruction
+  slli t1, t1, 0x1   # unrelated instruction
+  addi a0, a0, 0x4
+  addi a0, a0, 0x1   # verify register tracking terminates
+
+  # Test 5 ensures that an instruction writing into the zero register does
+  # not trigger resolution because that register's value cannot change and
+  # the sequence is equivalent to never running the first instruction
+
+  # test #5
+  lui x0, 0x200
+  sw a0, 0x8(x0)
+
+  # Test 6 ensures that the newly added functionality is compatible with
+  # code that already worked for branch instructions
+
+  # test #6
+  call func
+
+  # test #7 zcb extension
+  lui x8, 0x3
+  c.sb x8, 0(x8)
+
+  # test #8 stack based load/stores
+  c.lwsp a0, 0x8(sp)
+
+# these are the labels that the instructions above are expecteed to resolve to
+.section .data
+.skip 0x4
+target:
+  .word 1
+.skip 0xff8
+zcb:
+  .word 1
+.skip 0xff004
+far_target:
+  .word 2
+func:
+  ret
diff --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 5ecb33375943f..b5d316e1e3857 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -1520,8 +1520,8 @@ collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
     if (MIA) {
       if (Disassembled) {
         uint64_t Target;
-        bool TargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
-        if (TargetKnown && (Target >= Start && Target < End) &&
+        bool BranchTargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
+        if (BranchTargetKnown && (Target >= Start && Target < End) &&
             !Targets.count(Target)) {
           // On PowerPC and AIX, a function call is encoded as a branch to 0.
           // On other PowerPC platforms (ELF), a function call is encoded as
@@ -2356,8 +2356,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
             llvm::raw_ostream *TargetOS = &FOS;
             uint64_t Target;
             bool PrintTarget = DT->InstrAnalysis->evaluateBranch(
-                Inst, SectionAddr + Index, Size, Target);
-
+                                   Inst, SectionAddr + Index, Size, Target) ||
+                               DT->InstrAnalysis->evaluateInstruction(
+                                   Inst, SectionAddr + Index, Size, Target);
             if (!PrintTarget) {
               if (std::optional<uint64_t> MaybeTarget =
                       DT->InstrAnalysis->evaluateMemoryOperandAddress(
@@ -2430,7 +2431,7 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                   break;
               }
 
-              // Branch targets are printed just after the instructions.
+              // Branch and instruction targets are printed just after the instructions.
               // Print the labels corresponding to the target if there's any.
               bool BBAddrMapLabelAvailable = BBAddrMapLabels.count(Target);
               bool LabelAvailable = AllLabels.count(Target);

Follows up on discussion from linked pull request that has been closed. Essentially, not having a target specific factory method
Copy link

github-actions bot commented Jun 18, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions h,cpp -- llvm/include/llvm/MC/MCInstrAnalysis.h llvm/lib/MC/MCInstrAnalysis.cpp llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp llvm/tools/llvm-objdump/llvm-objdump.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
index 849c7dad8..c98eeadae 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp
@@ -248,7 +248,7 @@ public:
                            uint64_t &Target,
                            const MCSubtargetInfo &STI) const override {
     unsigned int ArchRegWidth = STI.getTargetTriple().getArchPointerBitWidth();
-    switch(Inst.getOpcode()) {
+    switch (Inst.getOpcode()) {
     default:
       return false;
     case RISCV::C_ADDI:

…o register

Handles the case of users writing instructions that access a data in the first page of a program. Such an instruction would not need the upper bits so a user is likely to use the zero register as the base to add the immediate to
Copy link
Member

@lenary lenary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MaskRay MaskRay requested review from aengelke and jh7370 June 21, 2025 07:27
Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for enhancing disassembler. This looks quite good, however, request changes to prevent accidental landing, as I believe this needs further consideration.

#65480 introduced updateState, which does very similar thing. @aengelke just put up #145009 for AArch64, and might have some thoughts as well.

@jh7370 as the regular llvm-objdump reviewer.

@arjunUpatel
Copy link
Author

arjunUpatel commented Jun 23, 2025

i did not mean to THAT. im a dumbass please ignore review pings. i dont think there is any way for me to stop the pings now that they are out. really sorry for all the trouble i was trying to figure out graphite and stacking pull requests and did some really vile stuff while reverting a merge commit which auto sent all the requests. suffices to say i now know when i should be resetting over reverting and be pushing with caution

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's a huge amount I can comment on here. From an llvm-objdump perspective, the approach is quite simple, but I do think there is room for improvement around the details. I can't really provide much better feedback than that, because the disassembler is more the MC layer, which my knowledge doesn't cover especially well.

Regarding the testing, is it really necessary to use a cross-project-test for these, rather than an llvm-mc or similar directly? I'm guessing the key thing is that you want the target addresses evaluated, while llvm-mc simply creates addresses? Does the testing work if you instead rely on relocations to evaluate branch etc targets? Because if so, putting the test under llvm/test/tools/llvm-objdump and using llvm-mc to assemble the instructions would make more sense, since that wouldn't require multiple projects to be enabled to work.

@@ -234,6 +243,91 @@ class RISCVMCInstrAnalysis : public MCInstrAnalysis {
return false;
}

bool evaluateInstruction(const MCInst &Inst, uint64_t Addr, uint64_t Size,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a distinct function to evaluateBranch in the virtual interface? It feels like the two are fundamentally the same from a high-level point of view, since a branch is an instruction. Of course, within the target-specific layer, you could split it then.

If the purpose is purely so you know when a thing is branch specifically, it feels like you could achieve that through some other means, for example an enum return code that has values like Invalid, Branch, Other, or maybe a bitfield with some basic flags.

Copy link
Author

@arjunUpatel arjunUpatel Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes more sense to merge the two since their functionality overlaps heavily. The purpose of splitting the two is that RISCV addi instruction behaves differently when the subtargetinfo is different (32 bit vs 64 bit). Hence, the subtargetinfo is passed as a parameter to evaluateInstruction, while evaluateBranch does not accept this parameter. We can either set the subtargetinfo as a default parameter in the declaration of evaluateBranch or make the subtargetinfo an attribute of MCInstrAnalysis. I like the latter because off ofa quick skim it looks like across all the invocations of the functions that are attributes of MCInstrAnalysis and have subtargetinfo as a parameter, none actively change the subtargetinfo once the parent object is initialized. How about I experiment with these two approaches in a separate PR?

@arjunUpatel
Copy link
Author

Regarding the testing, is it really necessary to use a cross-project-test for these, rather than an llvm-mc or similar directly?

The object file generated by llvm-mc does not seem to be able to encode instructions in the floating point and zcb extensions properly. When the output is disassembled with llvm-objdump, we get <unknown> for these instructions instead of address resolution as seen here:

      2e: 00102427     	<unknown>
      32: 00110097     	auipc	ra, 0x110
      36: fda080e7     	jalr	-0x26(ra) <func>
      3a: 640d         	lui	s0, 0x3
      3c: 8800         	<unknown>

Assembled with llvm-mc -triple=riscv64 -mattr=+f,+c,+zcb riscv64-ar-coverage.s -o riscv64-ar-coverage.s. Could I be missing some options that might be causing this?

Does the testing work if you instead rely on relocations to evaluate branch etc targets?

I'm not sure I understand what you mean by this. Can you please elaborate?

@lenary
Copy link
Member

lenary commented Jun 23, 2025

Regarding the testing, is it really necessary to use a cross-project-test for these, rather than an llvm-mc or similar directly?

The object file generated by llvm-mc does not seem to be able to encode instructions in the floating point and zcb extensions properly. When the output is disassembled with llvm-objdump, we get <unknown> for these instructions instead of address resolution as seen here:

      2e: 00102427     	<unknown>
      32: 00110097     	auipc	ra, 0x110
      36: fda080e7     	jalr	-0x26(ra) <func>
      3a: 640d         	lui	s0, 0x3
      3c: 8800         	<unknown>

Assembled with llvm-mc -triple=riscv64 -mattr=+f,+c,+zcb riscv64-ar-coverage.s -o riscv64-ar-coverage.s. Could I be missing some options that might be causing this?

One way to make this work is to add the same flag as --mattr=+f,+c,+zcb when invoking llvm-objdump, but this is really only used for testing. Another way is to add .attribute arch, "rv64i2p1_c2p1_f2p0_zcb1p0" (e.g. the ISA string you want to use) to the assembly, which emits extension metadata into the object that llvm-objdump can use (and so doesn't need the --mattr= flag).

@jrtc27
Copy link
Collaborator

jrtc27 commented Jun 23, 2025

Regarding the testing, is it really necessary to use a cross-project-test for these, rather than an llvm-mc or similar directly?

The object file generated by llvm-mc does not seem to be able to encode instructions in the floating point and zcb extensions properly. When the output is disassembled with llvm-objdump, we get <unknown> for these instructions instead of address resolution as seen here:

      2e: 00102427     	<unknown>
      32: 00110097     	auipc	ra, 0x110
      36: fda080e7     	jalr	-0x26(ra) <func>
      3a: 640d         	lui	s0, 0x3
      3c: 8800         	<unknown>

Assembled with llvm-mc -triple=riscv64 -mattr=+f,+c,+zcb riscv64-ar-coverage.s -o riscv64-ar-coverage.s. Could I be missing some options that might be causing this?

One way to make this work is to add the same flag as --mattr=+f,+c,+zcb when invoking llvm-objdump, but this is really only used for testing. Another way is to add .attribute arch, "rv64i2p1_c2p1_f2p0_zcb1p0" (e.g. the ISA string you want to use) to the assembly, which emits extension metadata into the object that llvm-objdump can use (and so doesn't need the --mattr= flag).

Or use llvm-mc -riscv-add-build-attributes (which Clang enables by default when used as a driver for assembly files)

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what you mean by this. Can you please elaborate?

In the previous version, you were creating a fully-linked output, so all the address calculations will have been fully resolved. In a typical object generated by llvm-mc, the references to symbols will be made via relocations, which the linker then resolves. See #145009 for an example. NB: I don't know enough about RISCV, nor have I attempted to understand your test enough to determine whether that would actually be the case in your instance.

@@ -0,0 +1,32 @@
# RUN: llvm-mc -riscv-add-build-attributes -triple=riscv32 -filetype=obj -mattr=+zclsd,+zilsd %s -o %t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "ar-coverage" mean in these test file names?

# CHECK-NEXT 10: 00c53523 sd a2, 0xa(a0) <target>
# CHECK-NEXT 14: 0000 unimp

# The structure of this test file is similar to that of riscv64-ar-coverage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in newer llvm-objdump tests, we use ## for true comment markers, versus # for lit and FileCheck directives, to help them stand out better.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature][riscv] handle target address calculation in llvm-objdump disassembly for riscv
6 participants