Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Lotus: Provide a way to lookup messages by EthTxID #1029

Closed
Stebalien opened this issue Oct 26, 2022 · 17 comments
Closed

Lotus: Provide a way to lookup messages by EthTxID #1029

Stebalien opened this issue Oct 26, 2022 · 17 comments

Comments

@Stebalien
Copy link
Member

No description provided.

@Stebalien Stebalien added this to the M2.1 milestone Oct 26, 2022
@Stebalien Stebalien modified the milestones: M2.1, M2.1 Post code-freeze Nov 8, 2022
@Stebalien Stebalien added P3 P3: Might get resolved Topic: Ethereum JSON-RPC labels Nov 8, 2022
@raulk
Copy link
Member

raulk commented Dec 17, 2022

This is a blocker for Hyperspace, unfortunately. It's a blocking usability/compatibility issue.

@jennijuju
Copy link
Member

After implementing this filecoin-project/lotus#9839, we can easily expose an API for it.

@raulk
Copy link
Member

raulk commented Jan 4, 2023

It might not be very unreasonable to add support for calculating transaction hashes to popular libraries as most of them are user-extensible. For example:

Assuming there are similar solutions for web3.js, Foundry, and other libraries.

@scotthconner committed to reviewing the state-of-the-art libraries and checking if we can plug in our own transaction formatter here: https://filecoinproject.slack.com/archives/CP50PPW2X/p1671741786941649?thread_ts=1671733951.808009&cid=CP50PPW2X

Adding him as an assignee here.

@scotthconner
Copy link

scotthconner commented Jan 4, 2023

The root cause is workflows that assume they can counter-factually determine the transaction hash before submission to the network. This acts as a security feature and prevents signers from having to actively trust RPC end-points. In the case for all EVM chains and tools, this is a hash of the transaction inputs, including things like nonce, gas, function selector, call data, etc, but is not a function of the network or blockchain state (something that can only be computed once included in a block).

Web3js, ethers, hardhat plugins, etc, all operate on the ability to check the transaction hash against expectations after submission. Foundry, a huge up-and-comer in EVM development space compared to hard hat - has had discussion on how they want to handle (and avoid) similar workflows and vendor additions here: foundry-rs/foundry#2279

Arbitrum, Optimism, Polygon, Avalanche, and associated test-nets are all fully compatible with Hardhat, Foundry, ethers, web3js, wagmi, WalletConnect, etc. as EVM compatible chains. Their tutorials, examples, corpus of cross-deployed products, and commit contributions to open source toolchains suggest most successful EVM compatible networks use the same tools in the same way.

@maciejwitowski maciejwitowski removed the P3 P3: Might get resolved label Jan 9, 2023
@jennijuju
Copy link
Member

@scotthconner any AI needed from rpc api ep perspective from clients?

@maciejwitowski
Copy link
Contributor

@scotthconner
Copy link

@scotthconner any AI needed from rpc api ep perspective from clients?

We just need to make sure that gateway owners have this indexing turned on.

@maciejwitowski
Copy link
Contributor

ETA Monday

@jennijuju
Copy link
Member

@geoff-vball can you comment with your PR, how can one lookup message by EthTxID here and close the issue with that comment?

@raulk
Copy link
Member

raulk commented Jan 18, 2023

I think we have a problem with the consistency of this index. We treat it as a cache (with GC and all), but we should focus on this index being strongly consistent with the chain and the mpool. Otherwise, if an entry drops we will being returning the wrong hash in a number of places.

For example:

  • On EthGetMessageCidByTransactionHash, we would interpret the supplied transaction hash as a message CID (thinking it's a native message), and would return the CID of a message that doesn't exist, even though in the past we knew it was an Eth tx -- we just forgot. Adding insult to injury, a user calling this API with the same input might get different results before and after a GC.
  • On eth_getTransactionReceipt, we would fail to return the receipt of a message whose receipt we previously returned (but then GC ran and erased the mapping).
  • On eth_getBlock, we might return the wrong transaction hashes.

Furthermore, this lookup table must be compulsorily enabled if the Eth endpoint is enabled. It should not be an independent option.

Sadly, it may also be time to phase out native message support from the Ethereum JSON-RPC API. That way, if an expected hash is not found, we can error due to a data consistency issue, instead of speculating that it might be a native message.

In other words, the transaction hash must be deterministically handled and returned.

Finally, because this needs to be strongly consistent, we will need to provide some way to reconcile chain/mpool data with the index.

@scotthconner
Copy link

For the instances we handle the data in there incorrectly:

  • The case where "we forgot" is when the eth transaction was never actually included into a block, either because the chain state causes a reversion by the time it tries to be included, the fee was too low, or it was otherwise replaced (like a higher nonce getting included first). So the cases where a bad message CID turns up is when someone is trying to request an eth hash that wasn't actually finalized? Is this correct?

  • For EVM compatibility, do they handle receipt permanence as part of chain state? Like, its useful to be able to see a transaction that was added to the mem-pool but never actually included?

  • for getBlock, what is the actual risk that we will return wrong data versus missing data?

I tend to agree that the option should be on by default, and not something that is manually set. I also tend to agree that trying to mix in native message support into the ETH API is likely more trouble than its worth, unless we can identify critical use cases, Deterministic behavior, much like Ethereum itself, is obviously desired.

The question I have is about the trade-offs. The current implementation enables clients to handle ethereum and FEVM transactions in the same manner, using all the same assumptions and toolchains. The current implementation however does not cover cases where transactions are not included into blocks in the same manner that Ethereum does. IMO, it seems worthwhile to gain happy-path parity with ethereum-based developer experience at the expense of failure cases behaving differently when transactions aren't included, as long as the results aren't dangerous or insecure.

@geoff-vball
Copy link
Contributor

geoff-vball commented Jan 18, 2023

@raulk The first bullet is not correct. If we don't find the mapping in the database, we'll do the reversible conversion, and then look up that message to see if it exists. Otherwise we return nil.

I don't think the third point is correct either. If we have all the messages in the block, we recalculate the hashes from the messages.

We will never return the wrong hash. At every point we will either say "The correct hash is X" or "not found". This is how nodes work today for the majority of our data. Operators are given the option to store all mappings for as long as they want. I'm not sure why we would force all node operators to disable the entire Eth API if they don't want to store a massive index.

@Stebalien
Copy link
Member Author

I think @geoff-vball is correct, there's a middle ground as long as we never blindly convert a tx hash into a CID. We can:

  1. Index all inbound messages in the mpool, and all messages found in blocks.
  2. Provide some form of garbage collection logic that clears out all old mappings after some period of time. At worst, we'll return "not found" to the user.

Some nodes will want to keep an index of every message ever seen, most will likely only care about a few days worth of history.

IMO, whether or not we want to continue to support "native" messages in the Ethereum API is an orthogonal question.

@raulk
Copy link
Member

raulk commented Jan 18, 2023

@geoff-vball

@raulk The first bullet is not correct. If we don't find the mapping in the database, we'll do the reversible conversion, and then look up that message to see if it exists. Otherwise we return nil.

👍

I don't think the third point is correct either. If we have all the messages in the block, we recalculate the hashes from the messages.

That's right. I checked the implementation and given that we have fetched the message from the store, we process the message to calculate the EthTxHash -- makes sense!

However, I do think the second point stands:

On eth_getTransactionReceipt, we would fail to return the receipt of a message whose receipt we previously returned (but then GC ran and erased the mapping).

And this applies to eth_getTransactionByHash too.


Taking this into account:

  • Are we OK failing to return transaction and receipt data that we successfully returned in the past?

If the answer is yes, then my concern is resolved as there is no ambiguity possible around transaction IDs. However, I would still advise to turn on the index automatically with the Eth API. Having nodes return different hashes depending on configuration is very confusing.

@scotthconner
Copy link

Fully aligned @raulk

@geoff-vball
Copy link
Contributor

I think the best course of action is to have some top-level config variable that turns on all Eth API functionality including events and the tx hash lookup with no GC. We can then include finer configuration options if users want to turn one off, or set some policy for GC.

With the top level config, we can release new functionality that will work out of the box without users having to turn things on individually. This also saves us from having to set the default of the DB to "on" for all users, which is unnecessary for users that don't want to deal with FEVM.

@geoff-vball
Copy link
Contributor

I also think there is currently ambiguity in EthGetTransactionHashByCid that I will fix promptly.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

6 participants