# memberlist [![GoDoc](https://godoc.org/github.com/hashicorp/memberlist?status.png)](https://godoc.org/github.com/hashicorp/memberlist)

memberlist is a [Go](http://www.golang.org) library that manages cluster
membership and member failure detection using a gossip-based protocol.

The use cases for such a library are far-reaching: all distributed systems
require membership, and memberlist is a reusable solution to managing
cluster membership and node failure detection.

memberlist is eventually consistent but converges quickly on average.
The speed at which it converges can be heavily tuned via various knobs
on the protocol. Node failures are detected, and network partitions are partially
tolerated by attempting to communicate with potentially dead nodes through
multiple routes.

## Building

If you wish to build memberlist, you'll need Go version 1.2+ installed.

Please check your installation with:

```
go version
```

## Usage

Memberlist is surprisingly simple to use. An example is shown below:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/memberlist"
)

func main() {
	/* Create the initial memberlist from a safe configuration.
	   Please reference the godoc for other default config types.
	   http://godoc.org/github.com/hashicorp/memberlist#Config
	*/
	list, err := memberlist.Create(memberlist.DefaultLocalConfig())
	if err != nil {
		panic("Failed to create memberlist: " + err.Error())
	}

	// Join an existing cluster by specifying at least one known member.
	// Join also returns the number of hosts successfully contacted, which
	// is discarded here (an unused variable would not compile in Go).
	_, err = list.Join([]string{"1.2.3.4"})
	if err != nil {
		panic("Failed to join cluster: " + err.Error())
	}

	// Ask for members of the cluster
	for _, member := range list.Members() {
		fmt.Printf("Member: %s %s\n", member.Name, member.Addr)
	}

	// Continue doing whatever you need, memberlist will maintain membership
	// information in the background. Delegates can be used for receiving
	// events when members join or leave.
}
```

The most difficult part of memberlist is configuring it, since it has many
available knobs for tuning state propagation delay and convergence times.
Memberlist provides a default configuration that offers a good starting point,
but errs on the side of caution, choosing values that are optimized for
faster convergence at the cost of higher bandwidth usage.
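To get a feel for how two of those knobs (fanout and gossip interval) affect propagation delay, the sketch below simulates an idealized push-gossip broadcast. It is not memberlist code: `roundsToInfect` is a hypothetical helper, the model ignores packet loss, piggybacking and retransmit limits, and the 200ms interval is only an example value.

```go
package main

import (
	"fmt"
	"math/rand"
)

// roundsToInfect simulates an idealized push-gossip broadcast. In each round,
// every node that already knows the update sends it to `fanout` peers chosen
// uniformly at random (possibly itself or an already-informed peer). It
// returns the number of rounds until all n nodes know the update, or -1 if
// fanout < 1. Hypothetical helper: no loss, no piggybacking, no retransmit cap.
func roundsToInfect(n, fanout int, seed int64) int {
	if n <= 1 {
		return 0
	}
	if fanout < 1 {
		return -1
	}
	rng := rand.New(rand.NewSource(seed))
	informed := make([]bool, n)
	informed[0] = true
	count, rounds := 1, 0
	for count < n {
		rounds++
		// The number of senders is fixed at the start of the round, so nodes
		// informed during this round only start gossiping in the next one.
		senders := count
		for s := 0; s < senders; s++ {
			for f := 0; f < fanout; f++ {
				peer := rng.Intn(n)
				if !informed[peer] {
					informed[peer] = true
					count++
				}
			}
		}
	}
	return rounds
}

func main() {
	const gossipIntervalMs = 200 // example value for illustration
	for _, fanout := range []int{1, 3, 5} {
		r := roundsToInfect(1000, fanout, 1)
		// Propagation delay is roughly rounds × gossip interval.
		fmt.Printf("fanout=%d rounds=%d delay~%dms\n", fanout, r, r*gossipIntervalMs)
	}
}
```

Rerunning with a larger cluster size should show the round count growing roughly logarithmically rather than linearly, which is why a small fixed fanout can still converge quickly; raising the fanout or shortening the interval trades bandwidth for lower delay.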

For complete documentation, see the associated [Godoc](http://godoc.org/github.com/hashicorp/memberlist).

## Protocol

memberlist is based on ["SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol"](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf),
with a few minor adaptations, mostly to increase propagation speed and
convergence rate.

A high-level overview of the memberlist protocol (based on SWIM) is
described below, but for details please read the full
[SWIM paper](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
followed by the memberlist source. We welcome any questions related
to the protocol on our issue tracker.

### Protocol Description

memberlist begins by joining an existing cluster or starting a new
cluster. If starting a new cluster, additional nodes are expected to join
it. A new node joining an existing cluster must be given the address of at
least one existing member. The new member does a full state sync with that
existing member over TCP and then begins gossiping its existence to the cluster.
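The full state sync on join boils down to merging a peer's view of the cluster into your own. Below is a minimal sketch assuming SWIM-style incarnation numbers; `State`, `Entry`, `supersedes` and `mergeState` are hypothetical names, and the precedence rule is a simplification rather than memberlist's exact logic. Since a full state exchange goes in both directions, each side would apply a merge like this.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a hypothetical, simplified node state used only for illustration.
type State int

const (
	Alive State = iota
	Suspect
	Dead
)

// Entry is what one node believes about another node.
type Entry struct {
	Incarnation uint32 // bumped by a node when it refutes a suspicion
	State       State
}

// supersedes reports whether a remote entry should replace the local one:
// a higher incarnation always wins, and at equal incarnation the more
// severe state wins (Dead over Suspect over Alive).
func supersedes(remote, local Entry) bool {
	if remote.Incarnation != local.Incarnation {
		return remote.Incarnation > local.Incarnation
	}
	return remote.State > local.State
}

// mergeState folds a peer's full state (as received in a state sync) into
// the local view and returns the sorted names whose entries changed.
func mergeState(local, remote map[string]Entry) []string {
	var changed []string
	for name, r := range remote {
		l, known := local[name]
		if !known || supersedes(r, l) {
			local[name] = r
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	local := map[string]Entry{"a": {1, Alive}, "b": {2, Alive}}
	remote := map[string]Entry{"b": {2, Suspect}, "c": {1, Alive}}
	fmt.Println(mergeState(local, remote)) // prints [b c]
}
```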

Gossip is done over UDP with a configurable but fixed fanout and interval.
This ensures that each node's network usage stays constant as the number of
nodes grows, as opposed to the quadratic growth in total traffic that can occur
with traditional all-to-all heartbeat mechanisms. Complete state exchanges
with a random node are done periodically over TCP, but much less often than
gossip messages. This increases the likelihood that the membership list
converges properly, since the full state is exchanged and merged. The interval
between full state exchanges is configurable, and full state exchanges can be
disabled entirely.
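A back-of-the-envelope comparison makes the scaling claim concrete. This sketch counts messages only (not bytes), uses example values of a fanout of 3 and a 200ms interval, and models a naive all-to-all heartbeat scheme rather than any specific system; `gossipPerNode` and `heartbeatPerNode` are hypothetical helpers.

```go
package main

import "fmt"

// gossipPerNode returns the gossip packets one node sends per second with a
// fixed fanout and interval. The cluster size does not appear at all.
func gossipPerNode(fanout, intervalMs int) float64 {
	return float64(fanout) * 1000 / float64(intervalMs)
}

// heartbeatPerNode returns the packets one node sends per second in a naive
// all-to-all heartbeat scheme, where every node pings every other node once
// per interval.
func heartbeatPerNode(n, intervalMs int) float64 {
	return float64(n-1) * 1000 / float64(intervalMs)
}

func main() {
	const fanout, intervalMs = 3, 200 // example values only
	for _, n := range []int{10, 100, 1000} {
		g := gossipPerNode(fanout, intervalMs)
		h := heartbeatPerNode(n, intervalMs)
		// Cluster-wide totals: linear in n for gossip, quadratic for heartbeats.
		fmt.Printf("n=%4d gossip: %.0f/s per node, %.0f/s total | heartbeat: %.0f/s per node, %.0f/s total\n",
			n, g, g*float64(n), h, h*float64(n))
	}
}
```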

Failure detection is done by periodic random probing using a configurable interval.
If the node fails to ack within a reasonable time (typically some multiple
of RTT), then an indirect probe as well as a direct TCP probe are attempted. An
indirect probe asks a configurable number of random nodes to probe the same node,
in case there are network issues causing our own node to fail the probe. The direct
TCP probe helps identify the common situation where networking is misconfigured
to allow TCP but not UDP. Without the TCP probe, a UDP-isolated node would think
all other nodes were suspect and could cause churn in the cluster when it
attempts a TCP-based state exchange with another node. It is not desirable to
operate with only TCP connectivity because convergence will be much slower, but
the TCP probe is enabled so that memberlist can detect this situation and alert
operators.
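The probe escalation described above can be sketched as follows. The transport hooks are hypothetical function fields standing in for real UDP/TCP calls, and the checks run sequentially for readability; a real implementation would more likely issue the indirect and TCP probes concurrently under one deadline. `Prober`, `ProbeResult` and `IndirectK` are illustrative names, not memberlist's API.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ProbeResult is a hypothetical classification of one probe cycle.
type ProbeResult int

const (
	Failed     ProbeResult = iota // no route answered: candidate for suspicion
	AckUDP                        // direct or indirect UDP ack received
	AckTCPOnly                    // only TCP answered: likely UDP misconfiguration
)

// Prober bundles hypothetical transport hooks; real code would send packets.
type Prober struct {
	DirectUDP   func(target string) bool      // our own ping, waits for an ack
	IndirectUDP func(via, target string) bool // ask `via` to ping target for us
	DirectTCP   func(target string) bool      // fallback ping over TCP
	Peers       []string
	IndirectK   int // how many random peers to ask for indirect probes
}

// Probe runs one probe cycle against target.
func (p *Prober) Probe(target string) ProbeResult {
	if p.DirectUDP(target) {
		return AckUDP
	}
	// Pick up to IndirectK random peers, excluding the target itself.
	var helpers []string
	for _, peer := range p.Peers {
		if peer != target {
			helpers = append(helpers, peer)
		}
	}
	rand.Shuffle(len(helpers), func(i, j int) { helpers[i], helpers[j] = helpers[j], helpers[i] })
	k := p.IndirectK
	if k < 0 {
		k = 0
	}
	if len(helpers) > k {
		helpers = helpers[:k]
	}
	for _, via := range helpers {
		if p.IndirectUDP(via, target) {
			return AckUDP
		}
	}
	if p.DirectTCP(target) {
		return AckTCPOnly // reachable, but operators should be alerted
	}
	return Failed
}

func main() {
	p := &Prober{
		DirectUDP:   func(string) bool { return false },
		IndirectUDP: func(string, string) bool { return false },
		DirectTCP:   func(string) bool { return true },
		Peers:       []string{"a", "b", "c", "d"},
		IndirectK:   3,
	}
	fmt.Println(p.Probe("d") == AckTCPOnly) // prints true
}
```

Returning a distinct `AckTCPOnly` result, rather than a plain boolean, is what lets the caller treat the node as alive while still surfacing the UDP misconfiguration.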

If our probe, the indirect probes, and the direct TCP probe all fail within a
configurable time, then the node is marked "suspicious" and this knowledge is
gossiped to the cluster. A suspicious node is still considered a member of
the cluster. If the suspect member does not dispute the suspicion within a
configurable period of time, the node is finally considered dead, and this
state is then gossiped to the cluster.
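The suspicion lifecycle can be modeled as a tiny state machine. This is a sketch under stated assumptions: disputing a suspicion uses incarnation numbers as described in the SWIM paper, timestamps are plain milliseconds on a logical clock rather than wall-clock time, and `Member`, `MarkSuspect`, `Refute` and `Tick` are hypothetical names.

```go
package main

import "fmt"

// State is a hypothetical member state for this sketch.
type State int

const (
	Alive State = iota
	Suspect
	Dead
)

// Member tracks one peer. Times are plain milliseconds on a logical clock
// so the example stays deterministic.
type Member struct {
	State       State
	Incarnation uint32
	SuspectedAt int64
}

// MarkSuspect moves an alive member to Suspect and records when.
func (m *Member) MarkSuspect(nowMs int64) {
	if m.State == Alive {
		m.State = Suspect
		m.SuspectedAt = nowMs
	}
}

// Refute applies an Alive message from the member itself. Following SWIM,
// the member disputes the suspicion by announcing a higher incarnation
// number; stale announcements are ignored.
func (m *Member) Refute(incarnation uint32) bool {
	if m.State == Suspect && incarnation > m.Incarnation {
		m.State = Alive
		m.Incarnation = incarnation
		return true
	}
	return false
}

// Tick declares a suspect dead once the suspicion timeout has elapsed.
func (m *Member) Tick(nowMs, timeoutMs int64) {
	if m.State == Suspect && nowMs-m.SuspectedAt >= timeoutMs {
		m.State = Dead
	}
}

func main() {
	var m Member
	m.MarkSuspect(0)
	m.Tick(4000, 5000)           // still within the timeout: stays Suspect
	m.Tick(5000, 5000)           // timeout elapsed without a refutation: Dead
	fmt.Println(m.State == Dead) // prints true

	var n Member
	n.MarkSuspect(0)
	n.Refute(1)                                  // n disputes the suspicion in time
	n.Tick(5000, 5000)                           // no effect: n is Alive again
	fmt.Println(n.State == Alive, n.Incarnation) // prints true 1
}
```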

This is a brief and incomplete description of the protocol. For a better idea,
please read the
[SWIM paper](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
in its entirety, along with the memberlist source code.

### Changes from SWIM

As mentioned earlier, the memberlist protocol is based on SWIM but includes
minor changes, mostly to increase propagation speed and convergence rates.

The changes from SWIM are noted here:

* memberlist does a full state sync over TCP periodically. SWIM only propagates
  changes over gossip. While both eventually reach convergence, the full state
  sync increases the likelihood that nodes are fully converged more quickly,
  at the expense of more bandwidth usage. This feature can be totally disabled
  if you wish.

* memberlist has a dedicated gossip layer separate from the failure detection
  protocol. SWIM only piggybacks gossip messages on top of probe/ack messages.
  memberlist also piggybacks gossip messages on top of probe/ack messages, but
  will also periodically send out dedicated gossip messages on their own. This
  feature lets you have a higher gossip rate (for example once per 200ms)
  and a slower failure detection rate (such as once per second), resulting
  in overall faster convergence rates and data propagation speeds. This feature
  can be totally disabled as well, if you wish.

* memberlist keeps the state of dead nodes around for a set amount of time,
  so that when full syncs are requested, the requester also receives information
  about dead nodes. Because SWIM doesn't do full syncs, SWIM deletes dead node
  state immediately upon learning that the node is dead. This change again helps
  the cluster converge more quickly.
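Retaining dead-node state, the last point above, can be sketched as a registry that keeps tombstones for a fixed period and includes them in full syncs. `Registry`, `FullSync`, `Prune` and the 30-second retention in the example are hypothetical; memberlist's actual retention setting and data structures may differ, and timestamps here are plain milliseconds for determinism.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical record; diedAtMs is only meaningful when dead is true.
type node struct {
	dead     bool
	diedAtMs int64
}

// Registry keeps dead nodes around for retentionMs so that full syncs still
// carry the news of recent deaths.
type Registry struct {
	retentionMs int64
	nodes       map[string]node
}

func NewRegistry(retentionMs int64) *Registry {
	return &Registry{retentionMs: retentionMs, nodes: make(map[string]node)}
}

func (r *Registry) SetAlive(name string) { r.nodes[name] = node{} }

func (r *Registry) MarkDead(name string, nowMs int64) {
	r.nodes[name] = node{dead: true, diedAtMs: nowMs}
}

// FullSync lists every known node, dead ones included, as "name=state",
// sorted for deterministic output.
func (r *Registry) FullSync() []string {
	out := make([]string, 0, len(r.nodes))
	for name, n := range r.nodes {
		state := "alive"
		if n.dead {
			state = "dead"
		}
		out = append(out, name+"="+state)
	}
	sort.Strings(out)
	return out
}

// Prune forgets dead nodes whose retention period has expired.
// Deleting map entries while ranging over the map is safe in Go.
func (r *Registry) Prune(nowMs int64) {
	for name, n := range r.nodes {
		if n.dead && nowMs-n.diedAtMs > r.retentionMs {
			delete(r.nodes, name)
		}
	}
}

func main() {
	r := NewRegistry(30000)
	r.SetAlive("a")
	r.MarkDead("b", 0)
	fmt.Println(r.FullSync()) // prints [a=alive b=dead]
	r.Prune(30001)
	fmt.Println(r.FullSync()) // prints [a=alive]
}
```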