[ikiwiki] / trunk / apt-p2p / protocol.mdwn Repository:
ViewVC logotype

Annotation of /trunk/apt-p2p/protocol.mdwn

Parent Directory Parent Directory | Revision Log Revision Log


Revision 125 - (view) (download)
Original Path: trunk/apt-dht/protocol.mdwn

1 : camrdale 118 Apt-DHT's Distributed Hash Table is very similar to BitTorrent's.
2 :     BitTorrent uses a "distributed sloppy hash table" (DHT) for storing peer
3 :     contact information for "trackerless" torrents. In effect, each peer
4 : camrdale 119 becomes a tracker. The protocol is based on [Kademlia][10] and is
5 : camrdale 118 implemented over UDP.
6 :    
7 :     [10]: http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf
8 :    
9 :     Please note the terminology used in this document to avoid confusion. A
10 :     "peer" is a client/server listening on a TCP port that implements the
11 :     BitTorrent protocol. A "node" is a client/server listening on a UDP port
12 :     implementing the distributed hash table protocol. The DHT is composed of
13 :     nodes and stores the location of peers. BitTorrent clients include a DHT
14 :     node, which is used to contact other nodes in the DHT to get the
15 :     location of peers to download from using the BitTorrent protocol.
16 :    
17 :     ## Overview
18 :    
19 :     Each node has a globally unique identifier known as the "node ID." Node
20 :     IDs are chosen at random from the same 160-bit space as file hashes (use
21 :     SHA1 and plenty of entropy to ensure a unique ID). A "distance metric"
22 :     is used to compare two node IDs or a node ID and an infohash for
23 :     "closeness." Nodes must maintain a routing table containing the contact
24 :     information for a small number of other nodes. The routing table becomes
25 :     more detailed as IDs get closer to the node's own ID. Nodes know about
26 :     many other nodes in the DHT that have IDs that are "close" to their own
27 :     but have only a handful of contacts with IDs that are very far away from
28 :     their own.
29 :    
30 :     In Kademlia, the distance metric is XOR and the result is interpreted as
31 : camrdale 121 an unsigned integer. distance(A,B) = |A XOR B| Smaller values are
32 : camrdale 118 closer.
33 :    
34 :     When a node wants to find peers for a file download, it uses the
35 :     distance metric to compare the hash of the file with the IDs of the
36 :     nodes in its own routing table. It then contacts the nodes it knows
37 :     about with IDs closest to the file hash and asks them for the contact
38 :     information of peers that currently have the file. If a contacted node
39 :     knows about peers for the file, the peer contact information is returned
40 :     with the response. Otherwise, the contacted node must respond with the
41 :     contact information of the nodes in its routing table that are closest
42 :     to the hash of the file. The original node iteratively queries nodes
43 :     that are closer to the target hash until it cannot find any closer
44 :     nodes. After the download is completed, the client then inserts the peer
45 :     contact information for itself onto the responding nodes with IDs
46 :     closest to the hash of the file.
47 :    
48 : camrdale 124 The return value for a query for nodes close to a hash includes an opaque value known as
49 : camrdale 121 the "token." For a node to announce that its controlling peer is
50 : camrdale 124 sharing a file, it must present the token received from the same
51 : camrdale 121 queried node in a recent query for nodes. When a node attempts to
52 : camrdale 124 "announce" a shared file, the queried node checks the token against the
53 : camrdale 121 querying node's IP address. This is to prevent malicious hosts from
54 :     signing up other hosts for downloads. Since the token is merely returned
55 :     by the querying node to the same node it received the token from, the
56 :     implementation is not defined. Tokens must be accepted for a reasonable
57 :     amount of time after they have been distributed. The current
58 :     implementation uses the SHA1 hash of the IP address concatenated onto a
59 : camrdale 124 secret that changes every 5 minutes and tokens up to 10 minutes old are
60 : camrdale 121 accepted.
61 :    
62 : camrdale 118 ## Routing Table
63 :    
64 :     Every node maintains a routing table of known good nodes. The nodes in
65 :     the routing table are used as starting points for queries in the DHT.
66 :     Nodes from the routing table are returned in response to queries from
67 :     other nodes.
68 :    
69 :     Not all nodes that we learn about are equal. Some are "good" and some
70 :     are not. Many nodes using the DHT are able to send queries and receive
71 :     responses, but are not able to respond to queries from other nodes. It
72 :     is important that each node's routing table must contain only known good
73 :     nodes. A good node is a node that has responded to one of our queries
74 :     within the last 15 minutes. A node is also good if it has ever responded
75 :     to one of our queries and has sent us a query within the last 15
76 :     minutes. After 15 minutes of inactivity, a node becomes questionable.
77 :     Nodes become bad when they fail to respond to multiple queries in a row.
78 :     Nodes that we know are good are given priority over nodes with unknown
79 :     status.
80 :    
81 :     The routing table covers the entire node ID space from 0 to
82 :     2<sup>160</sup>. The routing table is subdivided into "buckets" that
83 :     each cover a portion of the space. An empty table has one bucket with an
84 :     ID space range of min=0, max=2<sup>160</sup>. When a node with ID "N" is
85 :     inserted into the table, it is placed within the bucket that has min
86 :     &lt;= N &lt; max. An empty table has only one bucket so any node must
87 :     fit within it. Each bucket can only hold K nodes, currently eight,
88 :     before becoming "full." When a bucket is full of known good nodes, no
89 :     more nodes may be added unless our own node ID falls within the range of
90 :     the bucket. In that case, the bucket is replaced by two new buckets each
91 :     with half the range of the old bucket and the nodes from the old bucket
92 :     are distributed among the two new ones. For a new table with only one
93 :     bucket, the full bucket is always split into two new buckets covering
94 :     the ranges 0..2<sup>159</sup> and 2<sup>159</sup>..2<sup>160</sup>.
95 :    
96 :     When the bucket is full of good nodes, the new node is simply discarded.
97 :     If any nodes in the bucket are known to have become bad, then one is
98 :     replaced by the new node. If there are any questionable nodes in the
99 :     bucket have not been seen in the last 15 minutes, the least recently
100 :     seen node is pinged. If the pinged node responds then the next least
101 :     recently seen questionable node is pinged until one fails to respond or
102 :     all of the nodes in the bucket are known to be good. If a node in the
103 :     bucket fails to respond to a ping, it is suggested to try once more
104 :     before discarding the node and replacing it with a new good node. In
105 :     this way, the table fills with stable long running nodes.
106 :    
107 :     Each bucket should maintain a "last changed" property to indicate how
108 :     "fresh" the contents are. When a node in a bucket is pinged and it
109 :     responds, or a node is added to a bucket, or a node in a bucket is
110 :     replaced with another node, the bucket's last changed property should be
111 :     updated. Buckets that have not been changed in 15 minutes should be
112 :     "refreshed." This is done by picking a random ID in the range of the
113 :     bucket and performing a find_nodes search on it. Nodes that are able to
114 :     receive queries from other nodes usually do not need to refresh buckets
115 :     often. Nodes that are not able to receive queries from other nodes
116 :     usually will need to refresh all buckets periodically to ensure there
117 :     are good nodes in their table when the DHT is needed.
118 :    
119 :     Upon inserting the first node into its routing table and when starting
120 :     up thereafter, the node should attempt to find the closest nodes in the
121 :     DHT to itself. It does this by issuing find_node messages to closer and
122 :     closer nodes until it cannot find any closer. The routing table should
123 :     be saved between invocations of the client software.
124 :    
125 :     ## KRPC Protocol
126 :    
127 :     The KRPC protocol is a simple RPC mechanism consisting of bencoded
128 :     dictionaries sent over UDP. A single query packet is sent out and a
129 :     single packet is sent in response. There is no retry. There are three
130 :     message types: query, response, and error. For the DHT protocol, there
131 : camrdale 123 are currently five queries: ping, join, find\_node, get\_value, and
132 :     store\_value.
133 : camrdale 118
134 :     A KRPC message is a single dictionary with two keys common to every
135 :     message and additional keys depending on the type of message. Every
136 :     message has a key "t" with a string value representing a transaction ID.
137 :     This transaction ID is generated by the querying node and is echoed in
138 :     the response, so responses may be correlated with multiple queries to
139 :     the same node. The transaction ID should be encoded as a random binary
140 :     string, typically 20 bytes in length corresponding to the SHA1 hash.
141 :    
142 :     The other key contained in every KRPC message is "y" with a single
143 :     character value describing the type of message. The value of the "y" key
144 :     is one of "q" for query, "r" for response, or "e" for error.
145 :    
146 : camrdale 119 ### Queries
147 : camrdale 118
148 :     Queries, or KRPC message dictionaries with a "y" value of "q", contain
149 :     two additional keys; "q" and "a". Key "q" has a string value containing
150 :     the method name of the query. Key "a" has a dictionary value containing
151 :     named arguments to the query.
152 :    
153 : camrdale 119 ### Responses
154 : camrdale 118
155 :     Responses, or KRPC message dictionaries with a "y" value of "r", contain
156 :     one additional key "r". The value of "r" is a dictionary containing
157 :     named return values. Response messages are sent upon successful
158 :     completion of a query.
159 :    
160 : camrdale 119 ### Errors
161 : camrdale 118
162 :     Errors, or KRPC message dictionaries with a "y" value of "e", contain
163 :     one additional key "e". The value of "e" is a list. The first element is
164 :     an integer representing the error code. The second element is a string
165 :     containing the error message. Errors are sent when a query cannot be
166 :     fulfilled. The following table describes the possible error codes:
167 :    
168 : camrdale 125 <table border="1">
169 :     <tr align="left"><th>Code</th><th>Description</th></tr>
170 :     <tr><td>200</td><td>Generic Error</td></tr>
171 :     <tr><td>201</td><td>Server Error, the node encountered an error</td></tr>
172 :     <tr><td>202</td><td>Malformed Packet, the packet did not match the protocol</td></tr>
173 :     <tr><td>203</td><td>Method Unknown, a request to fulfill an unknown method</td></tr>
174 :     <tr><td>204</td><td>Malformed Request, incorrect arguments to the method</td></tr>
175 :     <tr><td>205</td><td>Invalid Token, the token has expired, get a new one</td></tr>
176 : camrdale 118 </table>
177 :    
178 : camrdale 119 #### Example Error Packets
179 : camrdale 118
180 : camrdale 121 generic error = {"t":"12345678901234567890", "y":"e", "e":[201, "A Generic Error Occurred"]}
181 :     bencoded = d1:eli201e24:A Generic Error Occurrede1:t20:123456789012345678901:y1:ee
182 : camrdale 118
183 : camrdale 119 ### Contact Encoding
184 : camrdale 118
185 :     Contact information for peers is encoded as a 6-byte string. Also known
186 :     as "Compact IP-address/port info" the 4-byte IP address is in network
187 :     byte order with the 2 byte port in network byte order concatenated onto
188 :     the end.
189 :    
190 :     Contact information for nodes is encoded as a 26-byte string. Also known
191 :     as "Compact node info" the 20-byte Node ID in network byte order has the
192 :     compact IP-address/port info concatenated to the end.
193 :    
194 :     ## DHT Queries
195 :    
196 :     All queries have an "id" key and value containing the node ID of the
197 :     querying node. All responses have an "id" key and value containing the
198 :     node ID of the responding node.
199 :    
200 : camrdale 119 ### ping
201 : camrdale 118
202 :     The most basic query is a ping. "q" = "ping" A ping query has a single
203 :     argument, "id" the value is a 20-byte string containing the senders node
204 :     ID in network byte order. The appropriate response to a ping has a
205 :     single key "id" containing the node ID of the responding node.
206 :    
207 : camrdale 119 arguments: {"id": "<querying nodes id>"}
208 :     response: {"id": "<queried nodes id>"}
209 : camrdale 118
210 : camrdale 119 #### Example Packets
211 : camrdale 118
212 :     ping Query = {"t":"12345678901234567890", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
213 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t20:123456789012345678901:y1:qe
214 : camrdale 118
215 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
216 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re
217 :    
218 : camrdale 119 ### join
219 : camrdale 118
220 :     join is identical to ping, except in the response provided. It is
221 :     intended to be sent only to bootstrap nodes during the process of a new
222 : camrdale 121 peer joining the system. The response to a "q" = "join" query has two
223 : camrdale 123 additional responses, a key "ip\_addr" whose value is the bootstrap
224 : camrdale 118 node's understanding of the peer's IP address, and a key "port" whose
225 :     value is the port number that the bootstrap node received the request
226 :     from.
227 :    
228 : camrdale 119 arguments: {"id": "<querying nodes id>"}
229 :     response: {"id": "<queried nodes id>", "ip_addr": "<querying nodes IP>", "port": "<querying nodes port>"}
230 : camrdale 118
231 : camrdale 119 #### Example Packets
232 : camrdale 118
233 :     join Query = {"t":"12345678901234567890", "y":"q", "q":"join", "a":{"id":"abcdefghij0123456789"}}
234 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:join1:t20:123456789012345678901:y1:qe
235 : camrdale 118
236 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456", "ip_addr":"123.123.123.123", "port":12345}}
237 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz1234567:ip_addr15:123.123.123.1234:porti12345ee1:t20:123456789012345678901:y1:re
238 :    
239 : camrdale 123 ### find\_node
240 : camrdale 118
241 :     Find node is used to find the contact information for a node given its
242 : camrdale 123 ID. "q" = "find\_node" A find_node query has two arguments, "id"
243 : camrdale 118 containing the node ID of the querying node, and "target" containing the
244 : camrdale 123 ID of the node sought by the querier. When a node receives a find\_node
245 : camrdale 118 query, it should respond with a key "nodes" and value of a string
246 :     containing the compact node info for the target node or the K (8)
247 : camrdale 121 closest good nodes in its own routing table. A "token" key is also
248 :     included in the return value. The token value is a required argument for
249 : camrdale 123 a future store\_value query. The token value should be a short binary
250 : camrdale 121 string.
251 : camrdale 118
252 : camrdale 119 arguments: {"id": "<querying nodes id>", "target": "<id of target node>"}
253 : camrdale 121 response: {"id": "<queried nodes id>", "nodes": "<compact node info>", "token": "<opaque write token>"}
254 : camrdale 118
255 : camrdale 119 #### Example Packets
256 : camrdale 118
257 :     find_node Query = {"t":"12345678901234567890", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
258 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t20:123456789012345678901:y1:qe
259 : camrdale 118
260 : camrdale 121 Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456...", "token":"aoeusnth"}}
261 :     bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...5:token8:aoeusnthe1:t20:123456789012345678901:y1:re
262 : camrdale 118
263 : camrdale 123 ### get\_value
264 : camrdale 118
265 : camrdale 123 Get values associated with a file hash. "q" = "get\_value" A get\_value
266 : camrdale 118 query has two arguments, "id" containing the node ID of the querying
267 :     node, and "key" containing the hash of the file. If the queried node has
268 : camrdale 121 values for the hash, they are returned in a key "values" as a list of
269 :     strings. See the next section for a description of the possible stored
270 :     values. If the queried node has no values for the hash, a key "nodes" is
271 :     returned containing the K nodes in the queried nodes routing table
272 :     closest to the hash supplied in the query.
273 : camrdale 118
274 : camrdale 119 arguments: {"id": "<querying nodes id>", "key": "<20-byte hash of target file>"}
275 : camrdale 121 response: {"id": "<queried nodes id>", "values": ["<peer 1 value string>", "<peer 2 value string>"]}
276 : camrdale 119 or: {"id": "<queried nodes id>", "nodes": "<compact node info>"}
277 : camrdale 118
278 : camrdale 119 #### Example Packets
279 : camrdale 118
280 :     get_value Query = {"t":"12345678901234567890", "y":"q", "q":"get_value", "a": {"id":"abcdefghij0123456789", "key":"mnopqrstuvwxyz123456"}}
281 : camrdale 119 bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz123456e1:q9:get_value1:t20:123456789012345678901:y1:qe
282 : camrdale 118
283 : camrdale 121 Response with values = {"t":"12345678901234567890", "y":"r", "r": {"id":"abcdefghij0123456789", "values": ["d1:c6:456abc", "d1:c6:def456e"]}}
284 :     bencoded = d1:rd2:id20:abcdefghij01234567896:valuesl6:axje.u6:idhtnmee1:t20:123456789012345678901:y1:re
285 : camrdale 118
286 : camrdale 119 Response with nodes = {"t":"12345678901234567890", "y":"r", "r": {"id":"abcdefghij0123456789", "nodes": "def456..."}}
287 :     bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...e1:t20:123456789012345678901:y1:re
288 : camrdale 118
289 : camrdale 123 ### store\_value
290 : camrdale 118
291 : camrdale 123 Store a value in the DHT. store\_value has four arguments: "id"
292 : camrdale 121 containing the node ID of the querying node, "key" containing the hash
293 :     of the file, "value" is a string containing the value to be stored, and
294 :     the "token" received in response to a previous find_node query. The
295 :     queried node must verify that the token was previously sent to the same
296 :     IP address as the querying node. See the next section for a description
297 :     of the possible stored strings. The queried node should store the
298 :     contact information of the querying node under the file hash in its
299 :     store of peer contact information.
300 : camrdale 118
301 : camrdale 121 arguments: {"id" : "<querying nodes id>", "key" : "<20-byte hash of target file>", "value" : "<string value>", "token" : "<opaque token>"}
302 : camrdale 119 response: {"id" : "<queried nodes id>"}
303 : camrdale 118
304 : camrdale 119 #### Example Packets
305 : camrdale 118
306 : camrdale 121 store_value Query = {"t":"12345678901234567890", "y":"q", "q":"store_value", "a": {"id":"abcdefghij0123456789", "key":"mnopqrstuvwxyz123456", "value":"d1:c6:def456e", "token":"aoeusnth"}}
307 :     bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz1234565:token8:aoeusnth5:value13:d1:c6:def456ee1:q13:store_value1:t20:123456789012345678901:y1:qe
308 : camrdale 118
309 :     Response = {"t":"12345678901234567890", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
310 :     bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re
311 :    
312 : camrdale 121 ## Stored Values
313 :    
314 :     All of the values stored and retrieved in the previous section's
315 : camrdale 123 description of the "get\_value" and "store\_value" DHT queries are
316 : camrdale 121 strings. Currently, all stored values are bencoded strings of a
317 :     dictionary containing many possibilities. In the simplest possible
318 :     implementation, the dictionary will have a single key "c", whose value
319 :     is the 6-byte compact contact information for the peer providing the
320 :     file. All other possibilities build on this simple model.
321 :    
322 :     stored value = {"c":"def456"}
323 :     bencoded = d1:c6:def456e
324 :    
325 :     ### Piece Hashes
326 :    
327 :     In order to support the efficient downloading of large files from
328 :     multiple peers, hashes of smaller pieces of the large files are
329 :     necessary.
330 :    
331 :     #### Piece Size
332 :    
333 :     To accomplish this, a piece size must be chosen that is consistent
334 :     across all clients, and is no larger than some acceptable maximum piece
335 :     size. An optimal piece size would have all the pieces be the same size,
336 :     which would be less than or equal to the maximum piece size, to prevent
337 :     the last piece from being very small. However, downloading occurs by
338 :     requesting smaller *chunks* of a piece, so choosing a piece size that is
339 :     a multiple of the chunk size will prevent having to request very small
340 :     chunks. Therefore, the following formula, in which `size` is the size of
341 :     the file, `MAX_SIZE` is the maximum piece size, and `CHUNK_SIZE` is the
342 :     chunk size, will give the number of pieces and optimal piece size:
343 :    
344 :     n = 1 + floor((size - 1) / MAX_SIZE)
345 : camrdale 122 optimal = max(MAX_SIZE/2, ceil((size/n)/CHUNK_SIZE)*CHUNK_SIZE)
346 : camrdale 121
347 :     There will then be `n-1` pieces of size `optimal`, and 1 piece of size
348 :     `size - (n-1)*optimal`. The maximum piece size is 512 KiB (524288
349 :     bytes), and the chunk size is 16 KiB (16384 bytes). So, for example, for
350 :     a file of size 1234567, there will be 2 pieces of size 425984 followed
351 :     by a single piece of size 382599.
352 :    
353 :     #### Piece Hash Storage
354 :    
355 :     Files larger than the maximum piece size will require multiple pieces.
356 :     Each piece of the file will have to be hashed to get these piece hashes.
357 :     Once hashed, the piece hashes will be concatenated and stored in a
358 :     binary string whose length is a multiple of 20 (the length of the SHA1
359 :     hash).
360 :    
361 :     #### Storage in the DHT
362 :    
363 :     Storage of the piece hashes so that peers accessing the DHT can retrieve
364 :     them will depend on the number of pieces. This is due to the limitation
365 :     on the length of a UDP packet to prevent fragmentation, which prevents
366 :     returning very long strings, especially when there are multiple values
367 :     for a single key. This leads to 4 scenarios:
368 :    
369 :     1. The file requires only a single piece. In this case, no
370 :     piece-hashing is needed, so store only the basic dictionary with
371 :     the "c" key.
372 :    
373 : camrdale 122 stored value = {"c":"def456"}
374 :     bencoded = d1:c6:def456e
375 : camrdale 121
376 :     2. The file is 2 to 4 pieces in length. The piece hash strings are
377 :     short enough to store directly with the contact information. A key
378 :     "t" is added to the stored dictionary which is a dictionary
379 :     containing a key "t" which has as a value the concatenated
380 :     string of piece hashes.
381 :    
382 : camrdale 122 stored value = {"c":"def456", "t":{"t":"abcdefghij0123456789..."}}
383 :     bencoded = d1:c6:def4561:td1:t23:abcdefghij0123456789...ee
384 : camrdale 121
385 :     3. The file is 5 to 70 pieces in length. The piece hash strings are too
386 :     long to store with the contact information, as that would limit the
387 :     node to returning very few values. Instead, hash the piece hash string
388 :     using SHA1, and store that string with a key "h" in the stored
389 :     dictionary. Then, store the concatenated piece hash string in the DHT
390 :     using the hash of the string as the key. There, the stored dictionary
391 :     will contain only a single key "t", whose value will be the
392 :     concatenated string of piece hashes.
393 :    
394 : camrdale 122 stored value = {"c":"def456", "h":"12345678901234567890"}
395 :     bencoded = d1:c6:def4561:h20:12345678901234567890e
396 : camrdale 121
397 : camrdale 122 stored value at key 12345678901234567890 = {"t":"abcdefghij0123456789..."}
398 :     bencoded = d1:t23:abcdefghij0123456789...e
399 : camrdale 121
400 :     4. The file is more than 70 pieces in length. The piece hash string is
401 :     now too long for a single UDP packet, and so must be stored outside
402 :     the DHT. Again, hash the piece hash string using SHA1, and store that
403 :     string with a key "l" in the stored dictionary. Then, respond to any
404 :     peer download requests for that hash with a bencoded dictionary
405 :     containing a key "t" whose value is the concatenated string of piece
406 :     hashes.
407 :    
408 : camrdale 122 stored value = {"c":"def456", "l":"12345678901234567890"}
409 :     bencoded = d1:c6:def4561:l20:12345678901234567890e

CVS Admin
ViewVC Help
Powered by ViewVC 1.0.5