[ikiwiki] / trunk / apt-p2p / protocol.mdwn Repository:
ViewVC logotype

Annotation of /trunk/apt-p2p/protocol.mdwn

Parent Directory Parent Directory | Revision Log Revision Log


Revision 123 - (view) (download)
Original Path: trunk/apt-dht/protocol.mdwn

1 : camrdale 118 Apt-DHT's Distributed Hash Table is very similar to BitTorrent's.
2 :     BitTorrent uses a "distributed sloppy hash table" (DHT) for storing peer
3 :     contact information for "trackerless" torrents. In effect, each peer
4 : camrdale 119 becomes a tracker. The protocol is based on [Kademlia][10] and is
5 : camrdale 118 implemented over UDP.
6 :    
7 :     [10]: http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf
8 :    
9 :     Please note the terminology used in this document to avoid confusion. A
10 :     "peer" is a client/server listening on a TCP port that implements the
11 :     BitTorrent protocol. A "node" is a client/server listening on a UDP port
12 :     implementing the distributed hash table protocol. The DHT is composed of
13 :     nodes and stores the location of peers. BitTorrent clients include a DHT
14 :     node, which is used to contact other nodes in the DHT to get the
15 :     location of peers to download from using the BitTorrent protocol.
16 :    
17 :     ## Overview
18 :    
19 :     Each node has a globally unique identifier known as the "node ID." Node
20 :     IDs are chosen at random from the same 160-bit space as file hashes (use
21 :     SHA1 and plenty of entropy to ensure a unique ID). A "distance metric"
22 :     is used to compare two node IDs or a node ID and an infohash for
23 :     "closeness." Nodes must maintain a routing table containing the contact
24 :     information for a small number of other nodes. The routing table becomes
25 :     more detailed as IDs get closer to the node's own ID. Nodes know about
26 :     many other nodes in the DHT that have IDs that are "close" to their own
27 :     but have only a handful of contacts with IDs that are very far away from
28 :     their own.
29 :    
30 :     In Kademlia, the distance metric is XOR and the result is interpreted as
31 : camrdale 121 an unsigned integer. distance(A,B) = |A XOR B| Smaller values are
32 : camrdale 118 closer.
33 :    
34 :     When a node wants to find peers for a file download, it uses the
35 :     distance metric to compare the hash of the file with the IDs of the
36 :     nodes in its own routing table. It then contacts the nodes it knows
37 :     about with IDs closest to the file hash and asks them for the contact
38 :     information of peers that currently have the file. If a contacted node
39 :     knows about peers for the file, the peer contact information is returned
40 :     with the response. Otherwise, the contacted node must respond with the
41 :     contact information of the nodes in its routing table that are closest
42 :     to the hash of the file. The original node iteratively queries nodes
43 :     that are closer to the target hash until it cannot find any closer
44 :     nodes. After the download is completed, the client then inserts the peer
45 :     contact information for itself onto the responding nodes with IDs
46 :     closest to the hash of the file.
47 :    
48 : camrdale 121 The return value for a query for peers includes an opaque value known as
49 :     the "token." For a node to announce that its controlling peer is
50 :     downloading a torrent, it must present the token received from the same
51 :     queried node in a recent query for nodes. When a node attempts to
52 :     "announce" a torrent, the queried node checks the token against the
53 :     querying node's IP address. This is to prevent malicious hosts from
54 :     signing up other hosts for downloads. Since the token is merely returned
55 :     by the querying node to the same node it received the token from, the
56 :     implementation is not defined. Tokens must be accepted for a reasonable
57 :     amount of time after they have been distributed. The current
58 :     implementation uses the SHA1 hash of the IP address concatenated onto a
59 :     secret that changes every 15 minutes and tokens up to one hour old are
60 :     accepted.
61 :    
62 : camrdale 118 ## Routing Table
63 :    
64 :     Every node maintains a routing table of known good nodes. The nodes in
65 :     the routing table are used as starting points for queries in the DHT.
66 :     Nodes from the routing table are returned in response to queries from
67 :     other nodes.
68 :    
69 :     Not all nodes that we learn about are equal. Some are "good" and some
70 :     are not. Many nodes using the DHT are able to send queries and receive
71 :     responses, but are not able to respond to queries from other nodes. It
72 :     is important that each node's routing table must contain only known good
73 :     nodes. A good node is a node that has responded to one of our queries
74 :     within the last 15 minutes. A node is also good if it has ever responded
75 :     to one of our queries and has sent us a query within the last 15
76 :     minutes. After 15 minutes of inactivity, a node becomes questionable.
77 :     Nodes become bad when they fail to respond to multiple queries in a row.
78 :     Nodes that we know are good are given priority over nodes with unknown
79 :     status.
80 :    
81 :     The routing table covers the entire node ID space from 0 to
82 :     2<sup>160</sup>. The routing table is subdivided into "buckets" that
83 :     each cover a portion of the space. An empty table has one bucket with an
84 :     ID space range of min=0, max=2<sup>160</sup>. When a node with ID "N" is
85 :     inserted into the table, it is placed within the bucket that has min
86 :     &lt;= N &lt; max. An empty table has only one bucket so any node must
87 :     fit within it. Each bucket can only hold K nodes, currently eight,
88 :     before becoming "full." When a bucket is full of known good nodes, no
89 :     more nodes may be added unless our own node ID falls within the range of
90 :     the bucket. In that case, the bucket is replaced by two new buckets each
91 :     with half the range of the old bucket and the nodes from the old bucket
92 :     are distributed among the two new ones. For a new table with only one
93 :     bucket, the full bucket is always split into two new buckets covering
94 :     the ranges 0..2<sup>159</sup> and 2<sup>159</sup>..2<sup>160</sup>.
95 :    
96 :     When the bucket is full of good nodes, the new node is simply discarded.
97 :     If any nodes in the bucket are known to have become bad, then one is
98 :     replaced by the new node. If there are any questionable nodes in the
99 :     bucket have not been seen in the last 15 minutes, the least recently
100 :     seen node is pinged. If the pinged node responds then the next least
101 :     recently seen questionable node is pinged until one fails to respond or
102 :     all of the nodes in the bucket are known to be good. If a node in the
103 :     bucket fails to respond to a ping, it is suggested to try once more
104 :     before discarding the node and replacing it with a new good node. In
105 :     this way, the table fills with stable long running nodes.
106 :    
107 :     Each bucket should maintain a "last changed" property to indicate how
108 :     "fresh" the contents are. When a node in a bucket is pinged and it
109 :     responds, or a node is added to a bucket, or a node in a bucket is
110 :     replaced with another node, the bucket's last changed property should be
111 :     updated. Buckets that have not been changed in 15 minutes should be
112 :     "refreshed." This is done by picking a random ID in the range of the
113 :     bucket and performing a find_nodes search on it. Nodes that are able to
114 :     receive queries from other nodes usually do not need to refresh buckets
115 :     often. Nodes that are not able to receive queries from other nodes
116 :     usually will need to refresh all buckets periodically to ensure there
117 :     are good nodes in their table when the DHT is needed.
118 :    
119 :     Upon inserting the first node into its routing table and when starting
120 :     up thereafter, the node should attempt to find the closest nodes in the
121 :     DHT to itself. It does this by issuing find_node messages to closer and
122 :     closer nodes until it cannot find any closer. The routing table should
123 :     be saved between invocations of the client software.
124 :    
125 :     ## KRPC Protocol
126 :    
127 :     The KRPC protocol is a simple RPC mechanism consisting of bencoded
128 :     dictionaries sent over UDP. A single query packet is sent out and a
129 :     single packet is sent in response. There is no retry. There are three
130 :     message types: query, response, and error. For the DHT protocol, there
131 : camrdale 123 are currently five queries: ping, join, find\_node, get\_value, and
132 :     store\_value.
133 : camrdale 118
134 :     A KRPC message is a single dictionary with two keys common to every
135 :     message and additional keys depending on the type of message. Every
136 :     message has a key "t" with a string value representing a transaction ID.
137 :     This transaction ID is generated by the querying node and is echoed in
138 :     the response, so responses may be correlated with multiple queries to
139 :     the same node. The transaction ID should be encoded as a random binary
140 :     string, typically 20 bytes in length corresponding to the SHA1 hash.
141 :    
142 :     The other key contained in every KRPC message is "y" with a single
143 :     character value describing the type of message. The value of the "y" key
144 :     is one of "q" for query, "r" for response, or "e" for error.
145 :    
146 : camrdale 119 ### Queries
147 : camrdale 118
148 :     Queries, or KRPC message dictionaries with a "y" value of "q", contain
149 :     two additional keys; "q" and "a". Key "q" has a string value containing
150 :     the method name of the query. Key "a" has a dictionary value containing
151 :     named arguments to the query.
152 :    
153 : camrdale 119 ### Responses
154 : camrdale 118
155 :     Responses, or KRPC message dictionaries with a "y" value of "r", contain
156 :     one additional key "r". The value of "r" is a dictionary containing
157 :     named return values. Response messages are sent upon successful
158 :     completion of a query.
159 :    
160 : camrdale 119 ### Errors
161 : camrdale 118
162 :     Errors, or KRPC message dictionaries with a "y" value of "e", contain
163 :     one additional key "e". The value of "e" is a list. The first element is
164 :     an integer representing the error code. The second element is a string
165 :     containing the error message. Errors are sent when a query cannot be
166 :     fulfilled. The following table describes the possible error codes:
167 :    
168 :     <table>
169 : camrdale 120 <tr><th>Code</th><th>Description</th></tr>
170 : camrdale 118 <tr><td>201</td><td>Generic Error</td></tr>
171 :     <tr><td>202</td><td>Server Error</td></tr>
172 :     <tr><td>203</td><td>Protocol Error, such as a malformed packet, invalid arguments, or bad token</td></tr>
173 :     <tr><td>204</td><td>Method Unknown</td></tr>
174 :     </table>
175 :    
176 : camrdale 119 #### Example Error Packets
177 : camrdale 118
178 : camrdale 121 generic error = {"t":"12345678901234567890", "y":"e", "e":[201, "A Generic Error Occurred"]}
179 :     bencoded = d1:eli201e24:A Generic Error Occurrede1:t20:123456789012345678901:y1:ee
180 : camrdale 118
181 : camrdale 119 ### Contact Encoding
182 : camrdale 118
183 :     Contact information for peers is encoded as a 6-byte string. Also known
184 :     as "Compact IP-address/port info" the 4-byte IP address is in network
185 :     byte order with the 2 byte port in network byte order concatenated onto
186 :     the end.
187 :    
188 :     Contact information for nodes is encoded as a 26-byte string. Also known
189 :     as "Compact node info" the 20-byte Node ID in network byte order has the
190 :     compact IP-address/port info concatenated to the end.
191 :    
192 :     ## DHT Queries
193 :    
194 :     All queries have an "id" key and value containing the node ID of the
195 :     querying node. All responses have an "id" key and value containing the
196 :     node ID of the responding node.
197 :    
198 : camrdale 119 ### ping
199 : camrdale 118
200 :     The most basic query is a ping. "q" = "ping" A ping query has a single
201 :     argument, "id" the value is a 20-byte string containing the senders node
202 :     ID in network byte order. The appropriate response to a ping has a
203 :     single key "id" containing the node ID of the responding node.
204 :    
205 : camrdale 119 arguments: {"id": "<querying nodes id>"}
206 :     response: {"id": "<queried nodes id>"}
207 : camrdale 118
208 : camrdale 119 #### Example Packets
209 : camrdale 118
210 :     ping Query = {"t":"12345678901234567890", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
211 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t20:123456789012345678901:y1:qe
212 : camrdale 118
213 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
214 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re
215 :    
216 : camrdale 119 ### join
217 : camrdale 118
218 :     join is identical to ping, except in the response provided. It is
219 :     intended to be sent only to bootstrap nodes during the process of a new
220 : camrdale 121 peer joining the system. The response to a "q" = "join" query has two
221 : camrdale 123 additional responses, a key "ip\_addr" whose value is the bootstrap
222 : camrdale 118 node's understanding of the peer's IP address, and a key "port" whose
223 :     value is the port number that the bootstrap node received the request
224 :     from.
225 :    
226 : camrdale 119 arguments: {"id": "<querying nodes id>"}
227 :     response: {"id": "<queried nodes id>", "ip_addr": "<querying nodes IP>", "port": "<querying nodes port>"}
228 : camrdale 118
229 : camrdale 119 #### Example Packets
230 : camrdale 118
231 :     join Query = {"t":"12345678901234567890", "y":"q", "q":"join", "a":{"id":"abcdefghij0123456789"}}
232 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:join1:t20:123456789012345678901:y1:qe
233 : camrdale 118
234 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456", "ip_addr":"123.123.123.123", "port":12345}}
235 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz1234567:ip_addr15:123.123.123.1234:porti12345ee1:t20:123456789012345678901:y1:re
236 :    
237 : camrdale 123 ### find\_node
238 : camrdale 118
239 :     Find node is used to find the contact information for a node given its
240 : camrdale 123 ID. "q" = "find\_node" A find_node query has two arguments, "id"
241 : camrdale 118 containing the node ID of the querying node, and "target" containing the
242 : camrdale 123 ID of the node sought by the querier. When a node receives a find\_node
243 : camrdale 118 query, it should respond with a key "nodes" and value of a string
244 :     containing the compact node info for the target node or the K (8)
245 : camrdale 121 closest good nodes in its own routing table. A "token" key is also
246 :     included in the return value. The token value is a required argument for
247 : camrdale 123 a future store\_value query. The token value should be a short binary
248 : camrdale 121 string.
249 : camrdale 118
250 : camrdale 119 arguments: {"id": "<querying nodes id>", "target": "<id of target node>"}
251 : camrdale 121 response: {"id": "<queried nodes id>", "nodes": "<compact node info>", "token": "<opaque write token>"}
252 : camrdale 118
253 : camrdale 119 #### Example Packets
254 : camrdale 118
255 :     find_node Query = {"t":"12345678901234567890", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
256 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t20:123456789012345678901:y1:qe
257 : camrdale 118
258 : camrdale 121 Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456...", "token":"aoeusnth"}}
259 :     bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...5:token8:aoeusnthe1:t20:123456789012345678901:y1:re
260 : camrdale 118
261 : camrdale 123 ### get\_value
262 : camrdale 118
263 : camrdale 123 Get values associated with a file hash. "q" = "get\_value" A get\_value
264 : camrdale 118 query has two arguments, "id" containing the node ID of the querying
265 :     node, and "key" containing the hash of the file. If the queried node has
266 : camrdale 121 values for the hash, they are returned in a key "values" as a list of
267 :     strings. See the next section for a description of the possible stored
268 :     values. If the queried node has no values for the hash, a key "nodes" is
269 :     returned containing the K nodes in the queried nodes routing table
270 :     closest to the hash supplied in the query.
271 : camrdale 118
272 : camrdale 119 arguments: {"id": "<querying nodes id>", "key": "<20-byte hash of target file>"}
273 : camrdale 121 response: {"id": "<queried nodes id>", "values": ["<peer 1 value string>", "<peer 2 value string>"]}
274 : camrdale 119 or: {"id": "<queried nodes id>", "nodes": "<compact node info>"}
275 : camrdale 118
276 : camrdale 119 #### Example Packets
277 : camrdale 118
278 :     get_value Query = {"t":"12345678901234567890", "y":"q", "q":"get_value", "a": {"id":"abcdefghij0123456789", "key":"mnopqrstuvwxyz123456"}}
279 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz123456e1:q9:get_value1:t20:123456789012345678901:y1:qe
280 : camrdale 118
281 : camrdale 121 Response with values = {"t":"12345678901234567890", "y":"r", "r": {"id":"abcdefghij0123456789", "values": ["d1:c6:456abc", "d1:c6:def456e"]}}
282 :     bencoded = d1:rd2:id20:abcdefghij01234567896:valuesl6:axje.u6:idhtnmee1:t20:123456789012345678901:y1:re
283 : camrdale 118
284 : camrdale 119 Response with nodes = {"t":"12345678901234567890", "y":"r", "r": {"id":"abcdefghij0123456789", "nodes": "def456..."}}
285 :     bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...e1:t20:123456789012345678901:y1:re
286 : camrdale 118
287 : camrdale 123 ### store\_value
288 : camrdale 118
289 : camrdale 123 Store a value in the DHT. store\_value has four arguments: "id"
290 : camrdale 121 containing the node ID of the querying node, "key" containing the hash
291 :     of the file, "value" is a string containing the value to be stored, and
292 :     the "token" received in response to a previous find_node query. The
293 :     queried node must verify that the token was previously sent to the same
294 :     IP address as the querying node. See the next section for a description
295 :     of the possible stored strings. The queried node should store the
296 :     contact information of the querying node under the file hash in its
297 :     store of peer contact information.
298 : camrdale 118
299 : camrdale 121 arguments: {"id" : "<querying nodes id>", "key" : "<20-byte hash of target file>", "value" : "<string value>", "token" : "<opaque token>"}
300 : camrdale 119 response: {"id" : "<queried nodes id>"}
301 : camrdale 118
302 : camrdale 119 #### Example Packets
303 : camrdale 118
304 : camrdale 121 store_value Query = {"t":"12345678901234567890", "y":"q", "q":"store_value", "a": {"id":"abcdefghij0123456789", "key":"mnopqrstuvwxyz123456", "value":"d1:c6:def456e", "token":"aoeusnth"}}
305 :     bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz1234565:token8:aoeusnth5:value13:d1:c6:def456ee1:q13:store_value1:t20:123456789012345678901:y1:qe
306 : camrdale 118
307 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
308 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re
309 :    
310 : camrdale 121 ## Stored Values
311 :    
312 :     All of the values stored and retrieved in the previous section's
313 : camrdale 123 description of the "get\_value" and "store\_value" DHT queries are
314 : camrdale 121 strings. Currently, all stored values are bencoded strings of a
315 :     dictionary containing many possibilities. In the simplest possible
316 :     implementation, the dictionary will have a single key "c", whose value
317 :     is the 6-byte compact contact information for the peer providing the
318 :     file. All other possibilities build on this simple model.
319 :    
320 :     stored value = {"c":"def456"}
321 :     bencoded = d1:c6:def456e
322 :    
323 :     ### Piece Hashes
324 :    
325 :     In order to support the efficient downloading of large files from
326 :     multiple peers, hashes of smaller pieces of the large files are
327 :     necessary.
328 :    
329 :     #### Piece Size
330 :    
331 :     To accomplish this, a piece size must be chosen that is consistent
332 :     across all clients, and is no larger than some acceptable maximum piece
333 :     size. An optimal piece size would have all the pieces be the same size,
334 :     which would be less than or equal to the maximum piece size, to prevent
335 :     the last piece from being very small. However, downloading occurs by
336 :     requesting smaller *chunks* of a piece, so choosing a piece size that is
337 :     a multiple of the chunk size will prevent having to request very small
338 :     chunks. Therefore, the following formula, in which `size` is the size of
339 :     the file, `MAX_SIZE` is the maximum piece size, and `CHUNK_SIZE` is the
340 :     chunk size, will give the number of pieces and optimal piece size:
341 :    
342 :     n = 1 + floor((size - 1) / MAX_SIZE)
343 : camrdale 122 optimal = max(MAX_SIZE/2, ceil((size/n)/CHUNK_SIZE)*CHUNK_SIZE)
344 : camrdale 121
345 :     There will then be `n-1` pieces of size `optimal`, and 1 piece of size
346 :     `size - (n-1)*optimal`. The maximum piece size is 512 KiB (524288
347 :     bytes), and the chunk size is 16 KiB (16384 bytes). So, for example, for
348 :     a file of size 1234567, there will be 2 pieces of size 425984 followed
349 :     by a single piece of size 382599.
350 :    
351 :     #### Piece Hash Storage
352 :    
353 :     Files larger than the maximum piece size will require multiple pieces.
354 :     Each piece of the file will have to be hashed to get these piece hashes.
355 :     Once hashed, the piece hashes will be concatenated and stored in a
356 :     binary string whose length is a multiple of 20 (the length of the SHA1
357 :     hash).
358 :    
359 :     #### Storage in the DHT
360 :    
361 :     Storage of the piece hashes so that peers accessing the DHT can retrieve
362 :     them will depend on the number of pieces. This is due to the limitation
363 :     on the length of a UDP packet to prevent fragmentation, which prevents
364 :     returning very long strings, especially when there are multiple values
365 :     for a single key. This leads to 4 scenarios:
366 :    
367 :     1. The file requires only a single piece. In this case, no
368 :     piece-hashing is needed, so store only the basic dictionary with
369 :     the "c" key.
370 :    
371 : camrdale 122 stored value = {"c":"def456"}
372 :     bencoded = d1:c6:def456e
373 : camrdale 121
374 :     2. The file is 2 to 4 pieces in length. The piece hash strings are
375 :     short enough to store directly with the contact information. A key
376 :     "t" is added to the stored dictionary which is a dictionary
377 :     containing a key "t" which has as a value the concatenated
378 :     string of piece hashes.
379 :    
380 : camrdale 122 stored value = {"c":"def456", "t":{"t":"abcdefghij0123456789..."}}
381 :     bencoded = d1:c6:def4561:td1:t23:abcdefghij0123456789...ee
382 : camrdale 121
383 :     3. The file is 5 to 70 pieces in length. The piece hash strings are too
384 :     long to store with the contact information, as that would limit the
385 :     node to returning very few values. Instead, hash the piece hash string
386 :     using SHA1, and store that string with a key "h" in the stored
387 :     dictionary. Then, store the concatenated piece hash string in the DHT
388 :     using the hash of the string as the key. There, the stored dictionary
389 :     will contain only a single key "t", whose value will be the
390 :     concatenated string of piece hashes.
391 :    
392 : camrdale 122 stored value = {"c":"def456", "h":"12345678901234567890"}
393 :     bencoded = d1:c6:def4561:h20:12345678901234567890e
394 : camrdale 121
395 : camrdale 122 stored value at key 12345678901234567890 = {"t":"abcdefghij0123456789..."}
396 :     bencoded = d1:t23:abcdefghij0123456789...e
397 : camrdale 121
398 :     4. The file is more than 70 pieces in length. The piece hash string is
399 :     now too long for a single UDP packet, and so must be stored outside
400 :     the DHT. Again, hash the piece hash string using SHA1, and store that
401 :     string with a key "l" in the stored dictionary. Then, respond to any
402 :     peer download requests for that hash with a bencoded dictionary
403 :     containing a key "t" whose value is the concatenated string of piece
404 :     hashes.
405 :    
406 : camrdale 122 stored value = {"c":"def456", "l":"12345678901234567890"}
407 :     bencoded = d1:c6:def4561:l20:12345678901234567890e

CVS Admin
ViewVC Help
Powered by ViewVC 1.0.5