bitcoin: add bitcoin docs (WIP) #270
base: master
Did a first pass over this. Exciting!
The Bitcoin format consistently uses a double-SHA2-256 hash to produce content digests. This algorithm is simply the SHA2-256 digest of a SHA2-256 digest of the raw bytes. These digests are also used publicly when referring to individual transactions and whole block graphs. The Bitcoin Core CLI as well as the many web-based block explorers allow data look-up by these addresses.
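A minimal sketch of the double hash described above, using Python's standard `hashlib`. The function name `double_sha256` is chosen here for illustration and does not come from the PR.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return the SHA2-256 digest of the SHA2-256 digest of the raw bytes."""
    first_round = hashlib.sha256(data).digest()
    return hashlib.sha256(first_round).digest()


# Illustrative input only; real inputs would be serialized headers or transactions.
digest = double_sha256(b"hello")
print(len(digest), digest.hex())
```

The result is always 32 bytes, which is the byte string that the rest of this discussion treats as an address.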
When published, these addresses are typically presented as big-endian hexadecimal. To represent them in byte form on a little-endian system, the hexadecimal therefore needs to be decoded and the bytes reversed.
Since endianness is usually defined over a multibyte integer type, I am for real not sure which type of "little endian" is meant here (and casual googling doesn't help). If I see the following 128-bit payload on disk:
00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff
What is the actual value:
1. 33 22 11 00 77 66 55 44 bb aa 99 88 ff ee dd cc
2. 77 66 55 44 33 22 11 00 ff ee dd cc bb aa 99 88
3. ff ee dd cc bb aa 99 88 77 66 55 44 33 22 11 00
4. Something else?
Alternatively, if the on-disk structures are explicitly defined over integer types wider than 64 bits, this needs to be called out early, so folks like me get in the right mindset.
3: if you read it entirely as a 32-byte unsigned integer, you read it in the reverse order from how you would if you treated it as LE. "Usually defined over a multibyte integer type" is what's being got at here, but the integer is 32 bytes wide, not some repeating sub-pattern.
The "as if" makes me think this is leaning too heavily on the "uint256" thing. I'm tempted to remove that language entirely and say it's just a byte string that, by convention, gets byte-reversed and turned into hexadecimal when presented publicly.
In truth, I never touch this "uint256" thing myself in any of my code. I treat all of these things as byte arrays and then reverse+hexadecimal whenever I need to present the value. Otherwise they're only useful as byte arrays. So I guess that fact in itself suggests backing out of this concept. It's really just window dressing to make the zeros go at the start of block addresses.
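The convention settled on in this thread (treat the digest as a plain byte string, then byte-reverse and hex-encode it for presentation) can be sketched as follows. The helper names `to_display_hex` and `from_display_hex` are hypothetical, and the input is the 16-byte example payload from the question above rather than a real 32-byte digest.

```python
def to_display_hex(raw: bytes) -> str:
    """Reverse the whole byte string, then hex-encode it for public display."""
    return raw[::-1].hex()


def from_display_hex(display: str) -> bytes:
    """Decode a publicly presented hex string back into its stored byte order."""
    return bytes.fromhex(display)[::-1]


# The example payload from the review comment, as it would sit on disk.
on_disk = bytes.fromhex("00112233445566778899aabbccddeeff")
display = to_display_hex(on_disk)
print(display)  # ffeeddccbbaa99887766554433221100

# The two helpers are inverses of each other.
assert from_display_hex(display) == on_disk
```

The reversal is over the entire byte string, not over 4-byte or 8-byte groups, which corresponds to the third option listed in the question.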
### Transactions
There is at least one transaction in a Bitcoin block graph. The first transaction is called the "coinbase" and represents the miner rewards. A block graph may contain _only_ a coinbase, or it may also contain a number of transactions representing the movement of coins between wallets. Each transaction contains a list of one or more "Transaction Ins" and a list of one or more "Transaction Outs" representing the flow of coins. The coinbase contains a single Transaction In containing the block reward, and its Transaction Outs list represents the destination of the rewards. Non-coinbase transactions contain Transaction Ins representing the source of the coins being transacted, linking to previous transactions, and a list of Transaction Outs containing the details of the destination wallets.
There is at least one transaction in a Bitcoin block graph.
Technically, past ~2140, when everyone working on this is dead, this may no longer be true ;P
Do you mean that it's only true if people are transacting on Bitcoin and beyond ~2140 there may no longer be transactions? It's still going to be true as long as someone is mining Bitcoin because there's always a coinbase. There cannot exist a "bitcoin block graph" without at least one transaction!
I'm looking through Zcash right now and it's kind of sad how many coinbase-only transactions there are near the head. It makes it look like it now exists to be mined ...
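The invariant discussed in this thread (a block graph always has at least one transaction, and the first one is the coinbase) can be sketched with a pair of data classes. Every name here (`Transaction`, `Block`, `tx_ins`, `tx_outs`, `is_coinbase_only`) is hypothetical, and the In/Out payloads are left as opaque bytes rather than modelling the real fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    tx_ins: List[bytes]   # opaque placeholders for "Transaction Ins"
    tx_outs: List[bytes]  # opaque placeholders for "Transaction Outs"


@dataclass
class Block:
    transactions: List[Transaction]

    def __post_init__(self) -> None:
        # There is always a coinbase, so an empty list is never valid.
        if not self.transactions:
            raise ValueError("a block must contain at least a coinbase transaction")

    @property
    def coinbase(self) -> Transaction:
        """The first transaction in the block is the coinbase."""
        return self.transactions[0]

    @property
    def is_coinbase_only(self) -> bool:
        """True when the block carries miner rewards and nothing else."""
        return len(self.transactions) == 1


block = Block(transactions=[Transaction(tx_ins=[b"reward"], tx_outs=[b"miner"])])
print(block.is_coinbase_only)  # True
```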
}

type OutPoint struct {
  hash Bytes # 256-bits
This, together with the #int64 below, almost makes one want to say "ipld schema integers are of arbitrary precision", and leave it up to the codecs when to switch the wire representation and when to use a language-internal bigint versus a native integer.
This has probably been discussed already, so feel free to ignore with no further discussion.
* `version`: a signed 32-bit integer
* `segwit`: is implicit and `false` for all block graphs prior to the SegWit soft fork, which occurred at a height of 481,824. After this height, the two bytes following `version` are inspected: if they are equal to `[0x0, 0x1]`, the bytes are consumed and `segwit` is `true`. If the bytes are not exactly these values, `segwit` is `false`, and the two bytes instead form the beginning of `vin`. The first byte of `vin` is part of the compact size integer and, as `vin` must contain one or more elements, it cannot be `0x00`, which is why the `segwit` flag reliably maintains backward compatibility.
* `vin`: one or more elements, prefixed by a compact size int, then, for each element up to the size:
  * `hash`: an unsigned 256-bit integer / a 32-byte binary string, the OutPoint transaction ID hash identifying the source transaction for the coins
This goes together with the endianness discussion above: being an integer and a string at the same time can't be a thing.
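The `version` and `segwit` rules quoted above, plus the compact size prefix of `vin`, can be sketched as a small reader. This is an illustration under stated assumptions, not the codec from the PR: the function names are hypothetical, integers are assumed to be little-endian as elsewhere in Bitcoin serialization, the compact size layout (`0xfd`, `0xfe`, `0xff` prefixes for 2, 4 and 8 byte values) is the standard Bitcoin one rather than something stated in the quoted text, and "after this height" is read as `height >= 481824`.

```python
from typing import Tuple

SEGWIT_HEIGHT = 481_824  # height quoted in the text; treating it as inclusive is an assumption


def read_compact_size(raw: bytes, offset: int) -> Tuple[int, int]:
    """Read a Bitcoin compact size integer; return (value, new offset)."""
    first = raw[offset]
    if first < 0xFD:
        return first, offset + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    start = offset + 1
    return int.from_bytes(raw[start:start + width], "little"), start + width


def read_tx_prefix(raw: bytes, height: int) -> Tuple[int, bool, int, int]:
    """Return (version, segwit, vin_count, offset of the first vin element)."""
    version = int.from_bytes(raw[0:4], "little", signed=True)
    offset = 4
    segwit = False
    # vin is never empty, so its compact size prefix cannot start with 0x00;
    # a 0x00 0x01 pair here can therefore only be the SegWit marker and flag.
    if height >= SEGWIT_HEIGHT and raw[offset:offset + 2] == b"\x00\x01":
        segwit = True
        offset += 2
    vin_count, offset = read_compact_size(raw, offset)
    return version, segwit, vin_count, offset


# Hand-built prefixes for illustration, not real transactions.
legacy = bytes.fromhex("01000000" "01")
witness = bytes.fromhex("02000000" "0001" "01")
print(read_tx_prefix(legacy, 100_000))
print(read_tx_prefix(witness, 500_000))
```

Reading `hash` as 32 raw bytes at the returned offset, rather than as an integer, would match the byte-string treatment favoured in the endianness discussion above.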
Notes to self arising from discussion so far, for when I do revisions:
what’s the status here? i’d like to get something up on the specs website that i can link to
status is that each time I sit down to attack this I'm overwhelmed by the size of the task to pull it together into a coherent form that covers everything that it needs to; but it does weigh on me that it's outstanding and I need to get it closed out along with js-multiformats reworks of the codec(s). It's not in a worthy state to even merge as a draft tbh, so you're out of luck for now but I'll try and get to it asap.
It would be really cool to merge this, even if we want to put some disclaimer texts in somewhere. This is way more and better information than we have on this topic anywhere else, as far as I know.
Not merge-worthy IMO, it's so far from what it should have been. I think a better approach might be to start from the reverse end, like the Filecoin, and now Ethereum, data specs, and work backward. It turned out to be really hard to work forward like I was doing it here.
Not complete, but it's big enough and very tedious, so I just want to push something. If anyone feels like reviewing as a WIP, feedback would be appreciated, but I've got a lot more to do to connect the pieces to IPLD. Will call for reviews when I think it's ~finished.