The BFT lens: Hot-Stuff and Casper

This post is the first in a series discussing:title.png


Today I am going to overview a new algorithmic foundation called ‘Hot-Stuff the Linear, Optimal-Resilience, One-Message BFT Devil’ (in short, Hot-Stuff), developed jointly with my colleagues Ittai Abraham and Guy Gueta, and harness it to explain the safety and liveness of Casper the Friendly Finality Gadget.

The key take-aways are:

  • We have excellent foundations for Byzantine fault tolerance (or BFT), yet incredible innovation is actually happening in the arena.
  • At first glance, BFT protocols in the age of blockchains, e.g., TendermintCasper the Friendly GHOST, and Casper the Friendly Finality Gadget, have very different “feel” from classical solutions like DLSPBFT and others. In particular, a new proposer in these protocols makes a proposition without carrying an explicit proof of safety. This is the main source of complexity in traditional BFT solutions, and evidently intimidates folks outside the community. However, we have recently been able to articulate a unified foundation called Hot-Stuff, that captures both worlds.
  • Hot-Stuff can be utilized to reason about Casper, give a simple argument for its safety, and express rigorous conditions for liveness.
  • The Hot-Stuff contains a pluggable `beacon’ abstraction, a mechanism and policy for triggering proposals. In particular, we describe an implementation which uses a rotating-proposer that achieves superior communication complexity than previously known BFT solutions.

In a future post, I will discuss the interplay between finality gadgets and the security of the PoW chain they finalize.


Blockchain and BFT

A blockchain is a state-machine-replication (SMR) engine that has a 3-layer architecture. The foundation is a consensus core that forms agreement over an immutable sequence of updates to a shared state. This is known as the Byzantine fault tolerance problem, or BFT. The consensus algorithm in Bitcoin, Nakamoto Consensus, solves BFT for permissionless settings, with an unknown universe of participants whose behavior is untrusted.

On top of BFT is a layer that specifies a state-machine API for updates. Bitcoin’s state-machine and state-updates use a limited scripting language; Ethereum expands the state-machine and state-updates with a Turing complete abstraction (whose resources are bounded).

The top layer represents the application which in Bitcoin is the shared provenance tracking of digital assets, and in Ethereum, could be anything decentralized.

SMR

The BFT Problem Model

The classical settings for BFT solutions is a permissioned set of n=3f+1 replicas, of which a threshold of 2f+1 is presumed to behave according to spec and not fail. These are called correct replicas. The other f replicas are called Byzantine. The permissioned model is quite different from the settings in which Bitcoin blockchain is set to solve the Consensus problem, where participants are unknown, and they are admitted without explicit permission (hence they are called permissionless). Participation is instead conditioned on Proof-of-Work (PoW). From a foundational point of view the two models are quite close, both solve consensus with a resilience threshold, the permissioned model on the number of Byzantine replicas, the PoW model on the cumulative compute power of Byzantine miners. Another variant is the Proof-of-Stake (PoS) model that puts a resilience threshold on the cumulative stake of Byzantine stake holders.

A key element of the BFT problem definition is a partial synchrony model concerning communication delays. This assumption is worthy of attention:

  • A synchronous model assumes a known upper bound on transmission delays. In practice, for this assumption to hold in a large-scale decentralized system, the bound needs to be very high. Waiting for this bound on each step provides a worst-case upper bound, but it also determines a (pessimistic) lower-bound.
  • On the other hand, if there is no known bound on transmission delays, then it is theoretically possible that an algorithm will keep running into adversarial scheduling that will prevent it from ever converging (see the celebrated FLP impossibility result).

In 1988, a beautiful solution approach was given in DLS that strikes a practical path between these two extremes. DLS guarantees that safety is never compromised, even during periods of asynchrony. Progress is guaranteed during periods of synchrony. This solution approach is extremely robust. First, it allows progress at the network speed, not in a lock-step pessimistic manner. Second, it allows to set very conservative synchronous bounds, which makes partial synchrony assumption realistic.

Safety despite asynchrony is another key aspect in which solutions for the permissioned BFT model differ from solutions in the permissionless one. The PoW model is inherently synchronous, hence may lose safety under asynchrony attacks. For the same reason, as alluded to above, solutions in the synchronous settings such as Nakamoto Consensus inherently suffer from high latency.

Revisiting BFT

There are various reasons to revisit BFT:

Permissionless blockchains.
Within permissionless settings, people are looking for ways to harness BFT to alleviate various deficiencies of Nakamoto Consensus. Several ‘hybrid’ schemes were developed that combine PoW chains with BFT to increase throughput, decrease latency to finality, and promote fairness. ByzcoinBitcoin-NG, and Hybrid Consensus use a permissionless chain to determine a participant/proposer rotation in a reconfigurable BFT engine. Solida/Solidus is a is a chainless BFT protocol in the permissionless settings that uses PoW to generate propositions and rotate members. Thunderella is a BFT engine that uses a permissionless chain for recovery from failures. Casper the Friendly Finality Gadget and Casper the Friendly GHOST use a BFT engine as a finalizing authority over a permissionless chain.
Permissioned blockchains.
People have rekindled interest in traditional BFT solutions for permissioned/“consortium” settings. They revisit the classical foundations, e.g., PBFT and BFT-SMaRt, as well as invent new permissioned solutions targeting performance, fairness and privacy issues, e.g., TendermintHoney Badger, and Algorand.

From PBFT via Hot-Stuff to Casper

The BFT literature explicitly talks about rounds and message; instead, we will look at a chain of propositions made level by level, as depicted here:

BFTinBClens

The figure above shows that at each level of the chain, there may be one or more propositions. Because we are in a Byzantine model, conflicting propositions represent either the case of a faulty proposer that equivocates, or the case where there are concurrently contending proposers. From a safety point of view, there really is no difference between these cases, the algorithm must maintain safety against conflicting propositions. Having (eventually) an uncontested leader (or proposition) is required only for liveness.

In PBFT, at each level, a proposer tries to get a unique value ‘locked’ in two phases, ‘prepare’ and ‘commit’. First, a proposer tries to obtain 2f+1 prepares. A set of 2f+1 signed prepares is called a ‘commit-certificate’. In the second phase, replicas commit to the commit-certificate. If 2f+1 replicas commit to a certificate, it becomes a committed decision.

PBFTfull

At the next level, the proposer for the level tries to reach a decision as well. The proposer needs to choose one branch to extend. For example, in the scenario depicted above, if a decision on X1 was reached at level 1, then the level-2 proposer must choose to extend the ‘X’ branch.

In PBFT, a new proposer collects commit-certificates from 2f+1 replicas. The highest commit-certificate in the set is a safe branch to extend. The proposer sends the set of 2f+1 commit-certificates as proof of safety along with a new proposition. A replica accepts a proposition from a new proposer only if it carries such a proof.

In our example, because 2f+1 replicas have committed to X1, the quorum at level 2 will intersect the quorum that committed to X1 in at least one correct replica. Therefore, the second proposer must choose branch ‘X’.

The PBFT quadratic (all-all) exchange of commit votes with signatures has somewhat bad reputation as being costly and impractical, compared with linear solutions for benign SMR, e.g., Paxos and Raft. Additionally, a new proposer proof incurs a communication cost of O(n3) per proposer-replacement. Even if the system is synchronous, a cascade of f failures may cause f proposer replacements and a quartic communication cost, O(n4).

For completeness, let me briefly mention two classes of improvements that were suggested over PBFT. The first was introduced by the PBFT authors in the PBFT, ACM TOCS version. It replaces signatures on messages with vectors carrying two-way authenticators. There are pros and cons: two-way authenticators are much faster to compute and verify, but complicat the protocol, and increase communication complexity by a factor n. The second one is a line of works that introduce an optimistically fast track to decision. This may be trickier than it seems. We recently surfaced in an ArXiv note on Revisiting Fast PBFT safety and liveness issues in state-of-art solutions. We built a system called SBFT that implements fast decision track correctly. SBFT substantially improves on PBFT performance and scaling. In an accompanying ArXiv note, we provide foundational guidelines on the Thelma, Velma, and Zelma optimistically fast BFT protocols. All of these improvements optimize for good conditions, but their proposer replacement protocol remains the same as PBFT.

Hot-Stuff

Hot-Stuff reduces by factor n the PBFT proposer proof. This does not make the protocol more complex, on the contrary, it considerably simplifies it. In Hot-Stuff, a new proposer carries only one commit-certificate, the highest-level certificate it knows. Replicas reject a proposition if it conflicts with the highest-level certificate they committed to. This modification is very simple, but as I said, suprisingly powerful. It is illustrated below, PBFT on the left, the Hot-Stuff linear proposer protocol on the right:

PBFTfullLVC
An ArXiv manuscript on Hot-Stuff describes how to drive down the communication by another factor n by employing threshold cryptography.

The next modification will make the Hot-Stuff protocol look much more like a blockchain than a BFT protocol. Again, this is a small but powerful modification. In Hot-Stuff, the commit-phase of each level is pushed into the prepare-phase of the next level. We will refer to the single per-level locking phase as ‘vote’. This works as follows.

When a proposer extends the chain with a new value, it optionally includes in the proposition the commit-certificate for the previous level. A vote on the proposition is both an explicit prepare and an implicit commit. It is an explicit prepare on the new value. And if the proposition includes a commit-certificate for the preceding level, it is an implicit commit on the certificate. The complete Hot-Stuff framework is depicted below.

HSfull

It is safe for proposers not to include in a proposition a commit-certificate for the preceding level. But note that a decision may be reached only at levels in which a proposer does include a commit-certificate for the preceding level. The same happens in PBFT: A commit-phase may either complete, or a timeout is reached and a proposer transitions to the next level without a decision.

The Hot-Stuff “Pseudo-code”

The entire Hot-Stuff protocol is a one-message exchange. The Hot-Stuff framework explicitly separates an abstract mechanism for progress, called a ‘beacon’, allowing for different pluggable implementations:

Beacon functionality:
propose: ( level, commit-certificate, new command )
Infinitely often, the beacon must broadcast for two consecutive levels unique propositions that do not conflict with the highest-level existing commit-certificate.
Replica functionality:
vote: ( level, commit-certificate, new command ) 
The replica functionality is the same as before, simply merging the prepare for one level with the (optional) commit of the preceding level. As before, replicas reject a proposition if it conflicts with the highest-level commit-certificate they hold.

The full details are described in an ArXiv manuscript on Hot-Stuff. The full manuscript elaborates on several possible materializations of the beacon functionality: Hardware clocks, proposer rotation, and PoW.

Casper

We first regard Casper as a pure BFT engine, but hint to its intended use-case by referring to participants as ‘validators’ instead of replicas. Having introduced Hot-Stuff, we can describe the Casper BFT protocol in a single-step refinement. The gist of it is quite simple. Casper moves the responsibility for pushing a commit-certificate from the beacon-functionality to peer-to-peer dissemination among the validators.

More specifically, each validator signs its vote and sends it directly to a random subset of the validators. Validators forward votes they receive at random for a while. This gossip-style spreading strategy guarantees with high probability that all validators will collect a commit-certificate within a reasonable time frame.

Once a replica obtains a commit-certificate, it refers to it in its vote at the next level. The figure below depicts Casper and highlights the differences from Hot-Stuff:

Removing the responsibility from the proposer role to collect and distribute commit-certificates is inherent for the setting that Casper addresses, as discussed below.

This variation is safe, replicas can indeed spread votes among themselves. It is live, but just like Hot-Stuff and PBFT, progress depends on certain synchrony. In Hot-Stuff, proposers (occasionally) need to collect a commit-certificate from the preceding level in order to make progress. In Casper, proposers do not collect, nor spread, commit-certificates. Instead, proposers must (occasionally) wait sufficiently long for the validators to spread votes among themselves and obtain commit-certificates in order to make progress. This requires proposers to (occasionally) delay for the worst case dissemination bound. Recall what I said above: Relying on synchrony provides a worst-case upper bound, but sets a (pessimistic) lower-bound.

As mentioned above, the complete Hot-Stuff solution achieves factor n improvement in communication complexity by applying threshold cryptography. This improvement relies on a proposer’s active involvement.

The Casper Finality Engine

The motivation for a finality engine is given by Buterin in an article titled Minimal Slashing Conditions. In a PoW chain, there is no explicit decision on blocks. As a block becomes buried deeper in the chain, it becomes harder to fork the chain below the block’s depth. Hence, the degree of ‘finality’ of a block is proportional to its depth. The idea of a finality engine is to provide an accompanying decision by a BFT engine on blocks at arbitrary depths. The guarantee provided by the BFT engine is indepedent of the depth of the block.

Technically, the interplay between Casper and the PoW chain works as follows. Casper is a BFT mechanism designed to work with a PoW as a ‘beacon’. The public PoW chain serves propositions to the BFT engine, and the engine forms agreement on a single chain. In order for this to work, the finality engine does not need to impose any change of format or content on the public chain. This is already illustrated above, simply replacing internal proposers with an auxiliary source. As explained above, this materialization of the ‘beacon’ functionality is safe.

In order to provide progress, the PoW chain generating propositions for the BFT engine must fulfill the beacon liveness conditions. Casper stipulates that the PoW chain maintains the following three conditions that together guarantee beacon liveness:

    1. First, propositions need to extend a branch from the highest-level committed block, because decisions by the finality engine are irreversible.
    2. Second, the source generating the propositions must (occasionally) allow time for the validators to disseminate votes and collect commit-certificates in between propositions.
swing4
A commit-certificate is formed on the (short) ‘X’ branch. One correct validator is “trapped” by voting on this commit-certificate. In this case, the Casper commandments prohibit this validator from ever participating in votes on the ‘Y’ branch. The PoW chain must therefore extend the ‘X’ branch, or else, Casper may get stuck.
  1. Last, it is not enough to take the longest chain from the last committed, as might be natural to expect. The PoW chain must extend a branch that contains the commit-certificate with the highest level. (See illustration of this condition on the right.)

Under the above three conditions, Casper provides liveness.

Epilogue

When we began our exploration of new-wave BFT protocols like Tendermint and Casper, they appeared quite different from traditional BFT protocols like DLS and PBFT. I hope that Hot-Stuff provided a useful algorithmic framework to discuss both worlds.


Acknowledgements
Most of the material discussed here was developed jointly with Ittai Abraham.
In addition, many thanks to Ittai, Bryan Fink and Linas Tumasonis
for helpful comments with this blogpost.
Advertisements

32 thoughts on “The BFT lens: Hot-Stuff and Casper

  1. Tendermint creator here… our algorithm is derived from DLS (which our 2014 whitepaper and onwards cited), and it has since evolved to become more like PBFT, but in fact is simpler. So, I must object to the characterization that Tendermint is somehow unlike classical BFT algorithms, when I have in fact coined the very phrase “classical BFT consensus” to refer to Tendermint for paving the way forward in introducing classical BFT consensus into the blockchain world.

    Please refer to Ethan Buchman’s thesis or the website Tendermint.com for details. Tendermint does have explicit proposer rotation, btw. The mechanism is superior to PBFTs view change mechanism, and has handled by locking (influence from DLS). Tendermint is not just for permissioned chains. See cosmos.network as well.

    Your analysis in comparing to Casper is insightful. Looking forward to a separate comparison with our current version of Tendermint.

    1. Thanks for your comment! I indeed started this blogpost with an explanation of DLS. I later removed it for brevity’s sake. I might add a post that reviews DLS soon.

      I did read Ethan’s thesis 🙂 .

      Tendermint and Casper greatly inspired our work on Hot-Stuff (we say that in the ArXiv manuscript, coming in a few days).

      I’ll be happy to blog about Tendermint using the Hot-Stuff framework.

    1. Good question!

      First, let’s make sure we agree on the definitions. A synchronous model is defined as follows: There exists a known bound D on message transmission delays, such that every message sent between two correct replicas arrives within time D. In addition, every replica can “measure” time accurately enough to check if a time period D has elapsed (perhaps with a small epsilon margin).

      When I say that PoW is synchronous I mean that its safety depends on an synchrony. Specifically, PoW inherently assumes that the propagation delay of new blocks is less than a known upper bound (e.g., ten minutes in Botcoin).

      What happens if this assumption is violated? Then safety might be compromised. To demonstrate this, let’s use the simplest (but extreme) scenario. Suppose that all the replicas in the system are split into two partitions, A and B. All transmissions among all the A replicas arrive in time. Likewise, all the ones among B arrive in time. However, messages from A take a year to arrive to any replica in B, and vice versa. Clearly, in this case, two separate chains will form within the two partitions and they will never converge. Every time a message about a block solved in partition A arrives at B, the chain in B will already be a year worth of blocks longer; and vice versa.

      What happens in the same settings with PBFT, Hot-Stuff, and other -asynchronous- BFT solutions? They will make no progress until the network partition is fixed.

      Note that the scenario above was given as a demonstration only. In practice, we would expect such a condition to not last for more than a few hours/day/week? The asynchronous design choice is to stay safe during such a period.

      For a formal analysis of the synchronicity of Bitcoin, I would refer to https://eprint.iacr.org/2014/765.pdf .

  2. Very interesting read. Thanks! From what I understand though, Algorand works in a permissionless environment (only its BA* requires a permissioned setting)

    1. HI David,

      Thanks for your comment. In a nutshell, I agree, I should probably classify Algorand as permission-less, or better, as Hybrid/permission-less.

      Sortition scales by reducing the agreement problem to a rotating “committee” of voters. Committee voting is weighed by the assets of participants, rather than by count, but either way, there is a known total amount (of “weight”) at each epoch. The system is bootstrapped with a known committee of users/weights. This committee is then “rotated” based on transaction information inside blocks that are appended to the chain.

      This is very close to Solida/Solidus: Algorand requires you to obtain assets in order to rotate into the universe, and then use Sortition to select a subset of the universe, Solida requires you to demonstrate PoW in order to rotate into the committee, and then you are an explicit member. In both, the rotation of the committee is so frequent and so dynamic that it looks like a permission-less setting, but it is more of a hybrid.

      1. Hey Dahlia,

        I wonder if there are clear definitions for permissioned/permission-less (or even hybrid) settings in a distributed system?
        It seems like there is a very thin line between the meaning of the two.

        Usually people refer to a system with a permission-less setting, as a system where anyone can join and participate in the consensus algorithm. It is, indeed, very intuitive definition.
        Though, things start to get a little bit nasty if we change the above definition a bit. Where we can change the word “participate” to “equally participate”, meaning that every entity in the system have an equal share in the consensus process.

        Although this “new” definition is reasonable it looks unfeasible in any system that enables an anonymous entity to freely join, since Sybil attacks may occur. Therefore any system that allows such free joining has a mechanism for mitigating Sybil attacks like POW and POS.
        Such mechanisms inevitably create unbalanced system where some of the entities have enough resources (hash power, stake, etc.) to effectively participate in the consensus process and others’ participation is practically impossible (though theoretically possible).

        Consider the following example: Assume a blockchain where any entity can join freely and anonymously. The system give an equal chance to any proposer of a block to have its proposed block accepted. BUT, to be a proposer you have to be a company with an estimated market cap of at least 1B dollars.
        Even though these example have its own flaws (who decides which company is eligible to be proposer?) it gives an hint about the problem – just like POW and POS anyone can participate but it is only practical for a few (maybe theoretically anyone can create 1B dollar worth of company).

        One property I heard that is fairly differentiating systems with permissioned setting from a permission-less ones is the property that in a permission setting there is, at any time, a relatively small list of identities that can propose or validate a block. This list is known ahead by everyone.
        This property seems a good differentiator between the two settings but somehow feel inaccurate (what if 99.9999% of the hash power in Bitcoin is centralised in 10 known identities? does it suddenly makes Bitcoin a permissioned system?).

        Maybe the difference is pinned in the governance of the blockchain and therefore the above example I gave is pointless.

        I would appreciate to hear your opinion in that matter.

        Thanks,
        Avi

  3. Thanks for the insightful comment, Avi. I agree that there is a good question here, that may not have been fully addressed yet.

    Here are some thoughts on the matter:

    At a high level, there is one model, a universe of (weighted) participants, whose total size (weight) is (roughly) known, such that the total weight of bad participants is less than some threshold (e.g., a third in asynchronous systems). A “permissioned” model is a fixed set of known participants, each one has an equal one-unit weight. PoW weighs participants by computational power. PoS weighs by a monetary deposit. In all models, safety is guaranteed against some weighted-threshold of bad participants. A mapping of different models to the corresponding threshold functions is described in a recent tutorial we wrote on [Abraham and Malkhi 2017, “Blockchain and BFT”].

    These models differ in what drives your trust in them: A permissioned model relies on knowledge of participants and the belief that no more than a third could be compromised. PoW relies on large-scaling and on market forces to keep two thirds of the computational power in the hands of good participants. PoS relies on the threat of a large deposit being slashed due to misbehavior to keep two thirds behaving correctly.

    Solutions to the consensus in these models use distinct techniques. For example, in a PoW setting, Nakamoto Consensus does not employ explicit quorum voting, and in Casper, “leader” proposals are input from an auxiliary source.

    However, various hybrid solutions emerged recently that mix ingredients from the PoW, PoS, and BFT models, and blur the distinction between them. Several of these are employ `universe reduction’, selecting small sub-committees from a very large universe, to drive decisions within sub-committees using (more or less) standard BFT engines.

    For example, [Abraham et al. 2017, “Solida”] is a BFT engine built around a rotating-committee. Committee decisions are driven by a standard PBFT algorithm. Solida uses PoW as the admission criterion to rotate committees in a manner that preserves a correct threshold. There are algorithmic modifications to PBFT that handle admission of PoW leader, and are outside the scope here.

    Casper selects rotating sub-committees based on explicit stake. The Casper BFT algorithm uses PoW as a “leader” that infinitely often, guarantees a unique proposal. I already discuss Casper as a BFT solution in this blogpost.

    In Algorand, the universe is weighed by assets, and sub-committee sampling is done via a new cryptographic tool called Sortition. Algorand rotates the sub-committee on each consensus decision, and there are algorithmic advances that ensure that each participant need only send one message. These are very important enhancements, but outside the scope of our discussion.

    Earlier non-cryptographic reduction techniques exist that build on the seminal work of [Feige 1999, “Noncryptographic Selection Protocols’’], e.g., [King and Saia 2006, “Scalable Leader Election’’] and other works.

    Looking forward to hearing future research insights on this from you!

    [ Abraham and Malkhi 2017, “Blockchain and BFT”] https://dahliamalkhi.files.wordpress.com/2016/08/blockchainbft-beatcs2017.pdf

    [Abraham et al. 2017, “Solida”] https://arxiv.org/abs/1612.02916

    [Feige 1999, “Noncryptographic Selection Protocols’’] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.225.9799&rep=rep1&type=pdf

    [King and Saia 2006, “Scalable Leader Election’’] http://www.cs.unm.edu/~saia/papers/leader.pdf

  4. Thank you very much for a great reading, I’m looking forward to read the promised continuation! I’m still trying to understand the details of Hot-Stuff, and have a question – is there a chance that the way the proposer makes a “short-cut” in the proposer-proof is reminiscent of Hashgraph’s idea of “gossiping about the gossip”? Also in general, where do you put Hashgraph in your classification?

  5. Hashgraph is an asynchronous BFT protocol in the permissioned settings. Asynchronous protocols solve a harder problem than PBFT, Tendermint, Casper and Hot-Stuff. They use randomization for (expected) progress, not synchrony. Because they try to circumvent asynchrony, they tend to be more theoretical and complicated. The reason PBFT-based protocols are considered practical is because they guarantee a two-step decisions during (normal) periods of synchrony and they are very simple. If you look at Hashgraph’s whitepaper, you see latencies of more than ten seconds when the system grows above a hundred nodes.

    Other relevant asynchronous BFT protocols are Honey Badger [Miller et al., 2016], and Parsimonious Asynchronous BFT Broadcast [Cachin et al., 2006], and Random Oracles in Constantinopole [Cachin et al., 2000].

    Briefly, Hashgraph is a leaderless protocol that extracts a deterministic totally-ordered sequence of blocks from a causally-ordered graph.

    Seeing this paradigm now makes me smile, because it revives a line of works from the early 90’s that heavily occupied many, and has been the topic of my PhD research (see [1,2,3] below).

    I don’t see an easy mapping from Hot-Stuff to Hashgraph. I may get tempted to blog about Hashgraph now.

    1. Louise E. Moser, P. M. Melliar-Smith, and Vivek Agrawala. 1991. Total ordering algorithms. In Proceedings of the 19th annual conference on Computer Science (CSC ’91). ACM, New York, NY, USA, 375-380. DOI=http://dx.doi.org/10.1145/327164.327298

    2. Byzantine-Resistant Total Ordering Algorithms
    In Proceedings of the 19th annual conference on Computer Science (CSC ’91). ACM, New York, NY, USA, 375-380. DOI=http://dx.doi.org/10.1145/327164.327298

    3. Early Delivery Totally Ordered Broadcast in Asynchronous Environments.
    Danny Dolev, Shlomo Kramer and Dalia Malki.
    Proceedings of the 23nd Annual International Symposium on Fault-Tolerant Computing (FTCS), June 1993, pp. 544-553.
    https://dahliamalkhi.files.wordpress.com/2016/08/multicast-ftcs1993.pdf

  6. Thank you, the clarification about the Hashgraph being pure asynchronous! Indeed the diagrams from Hashgraph’s white-paper show not only a significant latency for a 100+ nodes configuration, but also somehow a significant performance drop when 8 nodes are not co-located. The same about the Honey Badger – the latency of their reference implementation floats above 100 sec. It appears that the price for being asynchronous is really high.

    There is something I’m missing about the HS – the example in the Data-Structure section shows that

    “Proposition a(4) carries a CC for a(2) which cannot become committed”

    . Is that because a(2) is not the _direct_ parent of a(4) that happens to carry CC(a(2))? If so, how it would be possible to commit a(2) – if a(3) didn’t bring any CC?

  7. Regarding your question here: Indeed, a commit vote occurs only on a direct parent.

    The proposer for the next level waits for a commit-certificate for a certain duration. If it expires without collecting a commit-certificate, it moves to the next level anyway, in order to allow progress. (Abstractly, this is the same as PBFT not reaching a decision in one view, and moving to the next.)

    In the example here, there is never a commit vote directly on a(2). Later commands on the A branch may become committed. This drives a commit decision on the entire prefix of the branch.

    (I just answered you on stackexchange.)

    1. Let me humbly ask a few more questions.

      The protocol description mentions that a “beacon is allowed to send a COMMIT certificate for a lower level or none at all”. What would be the purpose of sending a CC(cmd(curLevel-2))? How is that information actionable from a replica perspective? I can’t see any special treatment for in the example. On the other hand from the Algorithm 1 it appears that such a PROPOSE would bump replica’s highCC(highLevel) from 1 to 2 – is that principal? Or maybe only for the “branch switching” situation?

      Then about regarding the “finality”. My understanding is that Algorithm 1 dictates a replica to apply commands from CC(ccLevel) to SMR the moment a which is higher than the current highCC(highLevel) at the replica is received. From my limited understanding such a PROPOSE message doesn’t immediately guarantee 2f+1 of COMMIT-votes for ccLevel – similar to the message in the example.

      Could you kindly clarify about these, please?

    2. Apparently one should not use greater/less-than brackets on WordPress. Please ignore/delete the broken text above, and use this message instead…

      Let me humbly ask a few more questions.

      The protocol description mentions that a “beacon is allowed to send a COMMIT certificate for a lower level or none at all”. What would be the purpose of sending a CC(cmd(curLevel-2))? How is that information actionable from a replica perspective? I can’t see any special treatment for [CC(a(2)) | a(4)] in the example. On the other hand from the Algorithm 1 it appears that such a PROPOSE would bump replica’s highCC(highLevel) from 1 to 2 – is that principal? Or maybe only for the “branch switching” situation?

      Then about regarding the “finality”. My understanding is that Algorithm 1 dictates a replica to apply commands from CC(ccLevel) to SMR the moment a [PROPOSE, CC(ccLevel),..] which is higher than the current highCC(highLevel) at the replica is received. From my limited understanding such a PROPOSE message doesn’t immediately guarantee 2f+1 of COMMIT-votes for ccLevel – similar to the [CC(a(2)) | a(4)] message in the example.

      Could you kindly clarify about these, please?

      1. “What would be the purpose of sending a CC(cmd(curLevel-2))? ”

        To cause replicas to switch to the proposer’s branch and vote for it.

        “apply commands from CC(ccLevel) to SMR the moment a [PROPOSE, CC(ccLevel),..] which is higher than the current highCC(highLevel) at the replica is received.”

        No. A command cmd(level) becomes committed when a [PROPOSE, CC(cmd(level)), newcmd(level+1)] receives 2f+1 votes. An SMR is executed only when committed. It may be executed speculatively before it becomes committed with optimism; but then it may need to be rolled back.

      2. Thanks for your time, again!

        I’m glad that my guess about the “branch switching” was relevant, in a way.

        About the “committed” – as you say “A command cmd(level) becomes committed when a [PROPOSE, CC(cmd(level)), newcmd(level+1)] receives 2f+1 votes”. That is the decision is made at the beacon (which collects the vote messages), so there is a need to bring the functionality of a beacon into each replica – is that correct?

        One more thing which I find confusing (and hides in the “omitted for brevity” section) is “retrieve missing ancestors as needed” – in the body of Algorithm 1 and “retreive missing commands as needed” – in Algorithm 2. How do these happen? They are not application specific, are they?

        The last questions are about Algorithm 2. Does “upon pre-enter l > curlevel” mean that no operations performed if the condition doesn’t hold? (I guess I can figure that out from the proof, right?).
        Also in the timeout branch there is an instruction to “pre-enter level rLevel+x”. What is the intention here? In other places pre-enter is passed a single parameter.

        While reading the paper I found a couple of trivial misprints – you can look for “rataion”, “After GST, is some correct replica”, “votes for for level”.

      1. This would be unsafe. A “b” branch might have a block at level 3 with 2f+1 votes, say b(3). Since b(3) is the highest block with a CC, a higher proposal (say, b(5)) that contains CC(b(3)) would cause voters to switch over to the “b” branch. Later, b(5) could even commit, and with it, the entire “b”-branch prefix.

        Popping back to a high-level, intuitive explanation, you should think of the “immediate parent” rule as a two-phase protocol. The two phases are spread over two levels and two blocks. Nevertheless, they are indivisible phases of a protocol, much like PBFT, Tendermint, and other protocols consist of two-phases of voting.

  8. Thank you again. Your explanation totally makes sense!

    I’m interested in building a reference implementation for HS – do you happen to know about someone’s working on that, so we can join forces?

  9. “About the “committed” – as you say “A command cmd(level) becomes committed when
    a [PROPOSE, CC(cmd(level)), newcmd(level+1)] receives 2f+1 votes”. That is the
    decision is made at the beacon (which collects the vote messages), so there is a
    need to bring the functionality of a beacon into each replica – is that
    correct?”

    I stated the condition for a command to become committed, without specifying how
    to spread this information. Clearly, all replicas need to learn about committed commands in
    order to execute them.

    In some implementations, this can be materialized by gossipping votes to everyone.

    In our rotating-proposer paradigm, these 2f+1 will be collected by the proposer
    for level+2, as a COMMIT-certificate on newcmd(level+1). This COMMIT-certificate
    will be included in the PROPOSE for level+2. It a fortiori serves to reach a
    decision on cmd(level).

    On “missing ancsetors”: This is just to say that the protocol works on
    full-predixes. When a replica receives a proposal, say cmd(level), it may ask
    the proposer (or other replicas) to retransmit previous blocks in the chain.

    This type of “state transfer” is intrisic in any implementation: If a replica
    falls behind, and the last block it received is, say, cmd(k), then if it
    receives cmd(k+j) (or votes for it) it has to catch up via state-transfer.

    “Does “upon pre-enter l > curlevel”
    mean that no operations performed if the condition doesn’t hold? (I guess I can
    figure that out from the proof, right?).”

    Yes, this code is intnded to be invoked once.

    “pre-enter level rLevel+x”.

    This is the invokation, and the level paramter equals rLevel+x. It is English,
    not code 🙂 . sorry.

    “While reading the paper I found a couple of trivial misprints – …”

    thx!

    PS Regarding meeting Ittai — I will be happy to put you two in touch over e-mail. Please e-mail me and I’ll follow up.

  10. The safe-cc rule states that the condition for a VOTE-message being sent for a command is that “it does not conflict with the highest-level COMMIT-certificate _prefix_ it saw”. I want to make sure I understand this properly – there term prefix is bit vague for me here.

    Let’s look at the example diagram, and suppose that the messages have arrived at the replica in the following sequence: {a1, a2, b2, b3, …}.

    Then:
    * a1 will be voted for – as there are no CC known at it’s arrival
    * a2 will be voted for – as it carries the highCC by itself, (highCC<=a1)
    * b2 will not be voted for – as there was a vote on level 2 already
    * b3 will not be voted for – as it conflicts with a2

    Here we didn't check for a conflict with highCC directly (which is a1), but rather with a2 which is the "prefix" that introduced highCC. Is that correct? Then, after receiving a4, similarly – the conflicts will be measured against a4, and not a2. Is that correct?

    On the other hand, if the message sequence was {a1, b2, a2, b3, a3, a4, b4, …} then:
    * a1 will be voted for – as there are no CC known at it's arrival
    * b2 will be voted for – as it carries the highCC by itself (highCC<=a1)
    * a2 will not be voted for – as there was a vote on level 2 already
    * b3 will be voted for – there is no conflict with b2
    * a3 will not be voted for – it conflicts w/b2
    * a4 will be voted for – as it carries the highCC by itself (highCC<=a2)
    * b4 will not be voted for – it conflicts with a4, the "highCC's prefix"

    1. “Here we didn’t check for a conflict with highCC directly (which is a1), but rather with a2 which is the “prefix” that introduced highCC. Is that correct? Then, after receiving a4, similarly – the conflicts will be measured against a4, and not a2. Is that correct?”

      Correct.

      The rest is correct, too!

  11. if replica receives a proposal, and the proposal violate the safe-cc rule, what’s the response the replica sendto beacon? According to Lemma 5, the response must have the highest CC the replica had, then new proposer can have the highest commit-certificate by any correct replica. I think only when proposer rotation happens, the correct replica make the response which contains highestCC. why not let the Enter message directly contains highestCC? and one rotation period there is only one proposer, right? expect to your answer, thanks.

    1. and I think only the safe-cc rule is not enough, a correct replica should not vote for the same level twice and should not vote for the level lower than the level he already votes, even if the lower level carries highest CC he doesn’t have.

      1. The beacon functionality is abstract, the proposer may rotate inside or not. The soonest a beacon has the highest-CC, the best, because then the proposal can obtain votes. This is needed for progress (not for safety).

        Your observation regarding the monotonicity of voting is correct. The pseudo-code omitted this by mistake (in fact, note that it does not refer to ‘curlevel’ at all at a replica), and we will fix it.

        We are going to post a more precise and detailed description of Hot-Stuff shortly. Stay tuned!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s