今天介绍一下将在 Monoxide 公链系统中采用的区块压缩算法。从比特币的 BIP152 出发的进一步改进算法。使用 Txilm
算法将使的区块传输带宽需求降低到原来的 1/80,同时使得冲突概率控制在 1/1000 左右,消解冲突的计算量可以忽略不计。这个算法不依赖各个全节点之间的 mempool 是高度一致的,无需额外的协议保持 mempool 的同步。

Txilm Protocol: Lossy Block Compression with Salted Short Hashing

Background

Compact blocks carries only TXIDs by assuming most TXes in a newly created
block are already stored in the mempool for most full nodes. This is proposed
in BIP152. More details can be found inbitcoincore.org/en/2016

Compact blocks basically replace each TX (300–400 bytes roughly) with a
32-bytes TXID (e.g. SHA256 of the TX). This yields a nearly 10 times bandwidth
saving.

We aim to further reduce the size of each TX representation in a compact block
to around 40 bits. We can achieve 6.4x compression over the original proposal
of Compact blocks (6.4 = 32 bytes / 40 bits). The new proposal is simply
without using additional data structures like bloom filter or IBLT. Also, the
new proposal doesn’t rely on consistent mempool across full nodes.

Combined with Canonical transaction ordering rule (CTOR), the short hashing
can be further reduced to 32 bits, which yields 8x compression over the
original proposal of Compact blocks. A totally 80 times data size reduction is
realized.

Rationale

We present each TX in a block by a small hash value based on TXID:

[code]

TXID-HASH = h(TXID)

[/code]

In which h is a small hash function that generates 32-bit to 64-bit hash
values. It can be just CRC32, CRC40 or CRC64. The proposed new compact block
scheme includes only a list of TXID_HASH, ordered as the original list of
TX.

Ambiguity may occurs with such a k-bit small-sized hash, which needs to be
resolved by each full node. Once receiving a new block that includes the
TXID-HASH list from the sender, the receiver searches each received TXID-HASH in the hash list produced by its mempool. For each TXID_HASH, three
cases may happen as follows:

  1. Not found: There is no TX in the mempool that matches the received TXID_HASH. The receiver will ask the sender or other peers for the TXID.
  2. A single match is found: the TXID is resolved.
  3. Multiple matches are found: the receiver will collect all matched TXIDs as candidates for a 2nd-stage resolving.

In the 2nd-stage, all combinations of candidates of multiple TXID-HASH are
iterated for recomputing the Merkle tree. A correct combination will result in
a matched Merkle root with the one carried by the block header.

If any of the combinations in (3) or the resolved TXID list in (2) can not
match the Merkle root in the block header, the receiver will fall back to ask
the sender to transfer the complete TXID list of the block, which is
described in the original Compact blocks proposal. The cause of this situation
may happen is that at least one TXID in the receiver mempool has the same
TXID-HASH in the received TXID-HASH list, while this is not the one
included in the block.

An optional optimization to the 2nd-stage search is to add a lightweight pre-check before actually recomputing the Merkle root. We propose a lightweight
Merkle tree by replacing SHA256 with CRC32, the CRC32-Merkle tree, with a
4-byte root. When creating a new block, the 4-byte CRC32-Merkle root will
prepended to the encoded TXID-HASH data. This yields 40x acceleration in
searching the right combination. (SHA256 over 16 bytes v.s. CRC32 over 8
bytes)

Resolving ambiguity may incur additional latency. Iterating the combination of
many ambiguous TXID_HASH may also consume a considerable amount of CPU time.
The feasibility of this proposal highly depends on the probability of hash
collision, which is related to the length of the hash value (the k-bit) and
also the size of mempool.

Probability of a Single Collision

A single collision is defined as any case of the 1 or 3 occurred at least
once. Such collision can be within the TXID-HASH list received, or ones
between the received list and the mempool.

Given a TXID-HASH with k bits, the collision rate is 1/2^k. So the
probability of a single collision occurred in a new block can be formulated as
Generalized Birthday problem. Say, we have total mTX in the mempool in
average, and the new block carries n TX. The probability of a single
collision can be approximated as:

[code]

PSC = 1 - (1 - 1/2^k)^( m*n + n*n/2 ).

[/code]

For example, we have m = 60000, n = 10000:

  • k = 32, PSC = 0.14
  • k = 40, PSC = 0.00059
  • k = 48, PSC = 0.0000023

We recommends k = 40 as a reasonable value with good compression ratio and
pretty low collision rate. A sufficient k is roughly proportional to log( m*n + n*n/2 ).

For example, we enlarge to 100x: m = 6000000, n = 1000000:

  • k = 48, PSC = 0.023
  • k = 56, PSC = 0.000090
  • k = 64, PSC = 0.00000035

Or, we reduce to something much smaller: m = 3000, n = 200:

  • k = 24, PSC = 0.036
  • k = 32, PSC = 0.00014
  • k = 40, PSC = 0.00000056

Combined with CTOR

Canonical transaction ordering rule (CTOR) is an ordering scheme to sort
transactions in a block and in mempools according to the hash of transaction,
which is already deployed in BCH network. Based on CTOR, the proposed scheme
will achieve much lower collision probability and/or much higher compression
ratio.

Since transactions are ordered in both blocks and mempools, the candidate
space of any ambiguous TXID-HASH will be narrowed to a range bound by its
previous and next TXIDwith resolved TXID-HASH, instead of the entire
mempool. Assuming the newly confirmed TXID are evenly distributed in the
sorted mempool, the size of the potential collision space will be reduced from
m to m/n . The collision within block only occurs if the ambiguous TXID-HASHare adjacent after ordering. This dramatically reduces the collision
probability and the cost of resolving ambiguity, which allows even shorter
hash with higher compression ratio.

With transaction ordering by CTOR, the collision probability can be
approximated as:

[code]

PSC = 1 - (1 - 1/2^k)^( m + n/2 )

[/code]

For small blocks: m = 3000, n = 200:

  • k = 16, PSC = 0.046
  • k = 24, PSC = 0.00018
  • k = 32, PSC = 0.00000072

For medium blocks: m = 60000, n = 10000:

  • k = 16, PSC = 0.63
  • k = 24, PSC = 0.0039
  • k = 32, PSC = 0.000015

For large blocks: m = 6000000, n = 1000000:

  • k = 24, PSC = 0.32
  • k = 32, PSC = 0.0015
  • k = 40, PSC = 0.0000059

Collision Attack

It is easy to construct a new transaction with its TXID-HASH matches with
that of an existing transaction. Massively creation of such malicious
transactions for collision may invalidate the collision probability analysis
stated above and makes verification of new blocks costly, which eventually
results in higher fork rate. We propose two simple strategies to address the
attack model.

Short Hashing with Salt A simple strategy for defense is to introduce a salt
when calculate the TXID-HASH:

[code]

TXID-HASH = h( Salt + TXID )

[/code]

The salt should be specific to the block carrying those TXID-HASHes and
included in the encoded data. For example, just take the CRC32-Merkle as the
Salt, or introduce another 4-byte field with random bits.

By introducing Salt: The attacker cannot construct malicious transactions even
that existing transactions are known to all, until a new block carrying them
is out. Malicious transactions are unlikely reach full nodes earlier than the
new block is received and verified. It requires the attacker can flood
malicious transactions to the entire network very fast. Theoretically, it is
possible (e.g. by controlling a large botnet) especially the underlying P2P
network is small. The strategy makes malicious transaction specific to a
single block, thus malicious transactions spreaded previously will not be able
to attack future block creation and makes attack much inefficient
economically.

Fall back when Under Attack Introducing Salt dramatically raised the cost of
collision attack, while in the extreme case, massive collision attack is still
possible, regardless of the cost. We require miners fall back to TXID list
when the entire network is under attack. It is also incentivized because
creating orphan blocks is wasting hash rate.

Such attack can be observed by all full nodes including miners when verifying
incoming new blocks. It can be done by simply counting the number of ambiguous
TXID-HASH per-block. If the counts significantly higher the expected value
and forks are observed. The next block should fall back to TXID list.

P.S. 你是高手看到这里了,本专栏诚邀中文技术翻译的小伙伴 ~~ 有兴趣的联系我哦 !
P.S II. 如果有错误,请不吝指正 ~~

Reference

Compact Blocks FAQ

https://bitcoincore.org/en/2016/06/07/compact-blocks-faq/

BIP152: Compact Blocks

https://github.com/bitcoin/bips/blob/master/bip-0152.mediawiki

Canonical Transaction Ordering for Bitcoin

https://blog.vermorel.com/pdf/canonical-tx-ordering-2018-06-12.pdf

Wiki: Birthday problem

https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem

Big Number Calculation

https://www.ttmath.org/online_calculator

来源链接:mp.weixin.qq.com