今天介绍一下将在 Monoxide 公链系统中采用的区块压缩算法。从比特币的 BIP152 出发的进一步改进算法。使用 Txilm

算法将使的区块传输带宽需求降低到原来的 1/80，同时使得冲突概率控制在 1/1000 左右，消解冲突的计算量可以忽略不计。这个算法不依赖各个全节点之间的 mempool 是高度一致的，无需额外的协议保持 mempool 的同步。

## Txilm Protocol: Lossy Block Compression with Salted Short Hashing

### Background

Compact blocks carries only TXIDs by assuming most TXes in a newly created

block are already stored in the mempool for most full nodes. This is proposed

in BIP152. More details can be found inbitcoincore.org/en/2016

Compact blocks basically replace each TX (300–400 bytes roughly) with a

32-bytes TXID (e.g. SHA256 of the TX). This yields a nearly 10 times bandwidth

saving.

We aim to further reduce the size of each TX representation in a compact block

to around 40 bits. We can achieve 6.4x compression over the original proposal

of Compact blocks (6.4 = 32 bytes / 40 bits). The new proposal is simply

without using additional data structures like bloom filter or IBLT. Also, the

new proposal doesn’t rely on consistent mempool across full nodes.

Combined with Canonical transaction ordering rule (CTOR), the short hashing

can be further reduced to 32 bits, which yields 8x compression over the

original proposal of Compact blocks. A totally 80 times data size reduction is

realized.

### Rationale

We present each TX in a block by a small hash value based on `TXID`

:

[code]

TXID-HASH = h(TXID)

[/code]

In which h is a small hash function that generates 32-bit to 64-bit hash

values. It can be just CRC32, CRC40 or CRC64. The proposed new compact block

scheme includes only a list of `TXID_HASH`

, ordered as the original list of

`TX`

.

Ambiguity may occurs with such a k-bit small-sized hash, which needs to be

resolved by each full node. Once receiving a new block that includes the

`TXID-HASH`

list from the sender, the receiver searches each received `TXID-HASH`

in the hash list produced by its mempool. For each `TXID_HASH`

, three

cases may happen as follows:

- Not found: There is no TX in the mempool that matches the received
`TXID_HASH`

. The receiver will ask the sender or other peers for the`TXID`

. - A single match is found: the
`TXID`

is resolved. - Multiple matches are found: the receiver will collect all matched
`TXIDs`

as candidates for a 2nd-stage resolving.

In the 2nd-stage, all combinations of candidates of multiple `TXID-HASH`

are

iterated for recomputing the Merkle tree. A correct combination will result in

a matched Merkle root with the one carried by the block header.

If any of the combinations in (3) or the resolved `TXID`

list in (2) can not

match the Merkle root in the block header, the receiver will fall back to ask

the sender to transfer the complete `TXID`

list of the block, which is

described in the original Compact blocks proposal. The cause of this situation

may happen is that at least one `TXID`

in the receiver mempool has the same

`TXID-HASH`

in the received `TXID-HASH`

list, while this is not the one

included in the block.

An optional optimization to the 2nd-stage search is to add a lightweight pre-check before actually recomputing the Merkle root. We propose a lightweight

Merkle tree by replacing SHA256 with CRC32, the CRC32-Merkle tree, with a

4-byte root. When creating a new block, the 4-byte CRC32-Merkle root will

prepended to the encoded TXID-HASH data. This yields 40x acceleration in

searching the right combination. (SHA256 over 16 bytes v.s. CRC32 over 8

bytes)

Resolving ambiguity may incur additional latency. Iterating the combination of

many ambiguous `TXID_HASH`

may also consume a considerable amount of CPU time.

The feasibility of this proposal highly depends on the probability of hash

collision, which is related to the length of the hash value (the k-bit) and

also the size of mempool.

### Probability of a Single Collision

A single collision is defined as any case of the 1 or 3 occurred at least

once. Such collision can be within the `TXID-HASH`

list received, or ones

between the received list and the mempool.

Given a `TXID-HASH`

with k bits, the collision rate is 1/2^k. So the

probability of a single collision occurred in a new block can be formulated as

Generalized Birthday problem. Say, we have total m`TX`

in the mempool in

average, and the new block carries n `TX`

. The probability of a single

collision can be approximated as:

[code]

PSC = 1 - (1 - 1/2^k)^( m*n + n*n/2 ).

[/code]

For example, we have m = 60000, n = 10000:

- k = 32, PSC = 0.14
- k = 40, PSC = 0.00059
- k = 48, PSC = 0.0000023

We recommends k = 40 as a reasonable value with good compression ratio and

pretty low collision rate. A sufficient k is roughly proportional to `log( m*n + n*n/2 )`

.

For example, we enlarge to 100x: m = 6000000, n = 1000000:

- k = 48, PSC = 0.023
- k = 56, PSC = 0.000090
- k = 64, PSC = 0.00000035

Or, we reduce to something much smaller: m = 3000, n = 200:

- k = 24, PSC = 0.036
- k = 32, PSC = 0.00014
- k = 40, PSC = 0.00000056

### Combined with CTOR

Canonical transaction ordering rule (CTOR) is an ordering scheme to sort

transactions in a block and in mempools according to the hash of transaction,

which is already deployed in BCH network. Based on CTOR, the proposed scheme

will achieve much lower collision probability and/or much higher compression

ratio.

Since transactions are ordered in both blocks and mempools, the candidate

space of any ambiguous `TXID-HASH`

will be narrowed to a range bound by its

previous and next `TXID`

with resolved `TXID-HASH`

, instead of the entire

mempool. Assuming the newly confirmed `TXID`

are evenly distributed in the

sorted mempool, the size of the potential collision space will be reduced from

m to m/n . The collision within block only occurs if the ambiguous `TXID-HASH`

are adjacent after ordering. This dramatically reduces the collision

probability and the cost of resolving ambiguity, which allows even shorter

hash with higher compression ratio.

With transaction ordering by CTOR, the collision probability can be

approximated as:

[code]

PSC = 1 - (1 - 1/2^k)^( m + n/2 )

[/code]

For small blocks: m = 3000, n = 200:

- k = 16, PSC = 0.046
- k = 24, PSC = 0.00018
- k = 32, PSC = 0.00000072

For medium blocks: m = 60000, n = 10000:

- k = 16, PSC = 0.63
- k = 24, PSC = 0.0039
- k = 32, PSC = 0.000015

For large blocks: m = 6000000, n = 1000000:

- k = 24, PSC = 0.32
- k = 32, PSC = 0.0015
- k = 40, PSC = 0.0000059

### Collision Attack

It is easy to construct a new transaction with its `TXID-HASH`

matches with

that of an existing transaction. Massively creation of such malicious

transactions for collision may invalidate the collision probability analysis

stated above and makes verification of new blocks costly, which eventually

results in higher fork rate. We propose two simple strategies to address the

attack model.

Short Hashing with Salt A simple strategy for defense is to introduce a salt

when calculate the `TXID-HASH`

:

[code]

TXID-HASH = h( Salt + TXID )

[/code]

The salt should be specific to the block carrying those TXID-HASHes and

included in the encoded data. For example, just take the CRC32-Merkle as the

Salt, or introduce another 4-byte field with random bits.

By introducing Salt: The attacker cannot construct malicious transactions even

that existing transactions are known to all, until a new block carrying them

is out. Malicious transactions are unlikely reach full nodes earlier than the

new block is received and verified. It requires the attacker can flood

malicious transactions to the entire network very fast. Theoretically, it is

possible (e.g. by controlling a large botnet) especially the underlying P2P

network is small. The strategy makes malicious transaction specific to a

single block, thus malicious transactions spreaded previously will not be able

to attack future block creation and makes attack much inefficient

economically.

Fall back when Under Attack Introducing Salt dramatically raised the cost of

collision attack, while in the extreme case, massive collision attack is still

possible, regardless of the cost. We require miners fall back to `TXID`

list

when the entire network is under attack. It is also incentivized because

creating orphan blocks is wasting hash rate.

Such attack can be observed by all full nodes including miners when verifying

incoming new blocks. It can be done by simply counting the number of ambiguous

`TXID-HASH`

per-block. If the counts significantly higher the expected value

and forks are observed. The next block should fall back to `TXID`

list.

P.S. 你是高手看到这里了，本专栏诚邀中文技术翻译的小伙伴 ~~ 有兴趣的联系我哦 !

P.S II. 如果有错误，请不吝指正 ~~

**Reference**

Compact Blocks FAQ

https://bitcoincore.org/en/2016/06/07/compact-blocks-faq/

BIP152: Compact Blocks

https://github.com/bitcoin/bips/blob/master/bip-0152.mediawiki

Canonical Transaction Ordering for Bitcoin

https://blog.vermorel.com/pdf/canonical-tx-ordering-2018-06-12.pdf

Wiki: Birthday problem

https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem

Big Number Calculation

https://www.ttmath.org/online_calculator

来源链接：mp.weixin.qq.com