Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

share: add -pattern-limit to limit analysis effort #4904

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

widlarizer
Copy link
Collaborator

@widlarizer widlarizer commented Feb 14, 2025

The share pass uses SAT to find cells with same observability with regards to inputs of cells other than $mux. First, it expresses the observability conditions as basically a logical function over the S inputs of $mux cells in find_cell_activation_patterns by traversing contiguous $mux regions. This function is stored per cell as a sum of products in activation_patterns_cache[cell]. In some unfortunately shaped designs, the count of these products increases exponentially with the length of a chain of divering-converging muxes. This PR adds a user-configurable limit to how many "patterns" can be stored per cell. When this limit is exceeded, the cell is assumed to be always "active" in the sense that some of its input bits are always observable by outputs or cells other than multiplexers. This is always safe to do so as it can only inhibit merging cells. This limit is set to 1000 by default.

To test, you can observe the linear rise of memory usage beyond any reasonable bounds when running yosys -p "verific -sv slow-share.sv; prep; share" on the following:

module test_case (
  input  wire         clk,
  input  wire         rst_clk,
  output wire [32:0]  rd [8:0],
  input  wire  [5:0]  wa [8:0],
  input  wire  [8:0]  wen
);

  logic [63:0][32:0] big;
  logic [63:0][32:0] wd;
  logic [5:0]        ra [8:0];

  always @(posedge clk) begin
      if (rst_clk) begin
        big <= '0;
      end
      else begin
        for (int i = 0; i < 9; i++) begin
          if (wen[i]) big[wa[i]] <= wd[wa[i]];
        end
      end
  end

  generate
    for (genvar i = 0; i < 9; i++) begin
      assign rd[i] = big[ra[i]];
    end
  endgenerate

endmodule

While this reproducer uses kind of a lot of wire bits, it is a completely reasonable design pattern. Pre-share statistics:

   Number of wires:                699
   Number of wire bits:          43561
   Number of public wires:          95
   Number of public wire bits:    2560
   Number of ports:                 21
   Number of port bits:            362
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                606
     $bmux                           9
     $dff                            1
     $mem_v2                         1
     $mux                          586
     $shl                            9
  • add scratchpad support so that the limit can be adjusted even when share is used from synth commands

@widlarizer widlarizer force-pushed the emil/share-limit-effort branch from 48f16eb to 8968986 Compare February 14, 2025 20:13
@mikesinouye
Copy link
Contributor

Hi @widlarizer, are there any updates on this PR? My initial impression is that this approach will mitigate the issue for these types of designs, but 1000 is too high of a limit. Maybe n = 100 would work better?

However, I believe it's worth adding this fix now, and if there's additional improvement that can be done within the limit later that would be even better.

@widlarizer
Copy link
Collaborator Author

@mikesinouye I'll try to push it forward. It's basically a hotfix, I'm not inclined to lowering it further since it might pessimize some designs. It's opened a can of worms on how to implement share in a more proper way though, see #4920 #4921

{
if (cache.size() + std::distance(begin_pattern, end_pattern) > config.pattern_limit) {
cache.clear();
cache.insert(ssc_pair_t());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this lead to unnecessary work? Once the cache contains the empty ssc_pair_t() then the result is always unshareable and we can skip the rest. At the moment this could potentially result in the cache reaching the pattern limit many times over since each time it just resets the cache size to 1. May be worth adding a boolean return for if the insertion succeeded, and then breaking out of loops if it returns false? Testing locally with the toy example from the PR description that reduces runtime and memory usage by ~10%

Co-authored-by: KrystalDelusion <93062060+KrystalDelusion@users.noreply.github.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants