Apt-P2P's Distributed Hash Table is very similar to BitTorrent's. BitTorrent uses a "distributed sloppy hash table" (DHT) for storing peer contact information for "trackerless" torrents. Apt-P2P's implementation stores peer contact information for each file hash. In effect, each peer becomes a 'tracker' for those files that have hashes close to it. The protocol is based on Kademlia and is implemented over UDP.

Please note the terminology used in this document to avoid confusion. A "peer" is a client/server listening on a TCP port that implements the Apt-P2P downloading protocol. A "node" is a client/server listening on a UDP port implementing the distributed hash table protocol. The DHT is composed of nodes and stores the location of peers. Apt-P2P clients include a DHT node, which is used to contact other nodes in the DHT to get the location of peers to download from using the Apt-P2P downloading protocol.

Overview

Each node has a globally unique identifier known as the "node ID." Node IDs are chosen at random from the same 160-bit space as file hashes (use SHA1 and plenty of entropy to ensure a unique ID). A "distance metric" is used to compare two node IDs, or a node ID and a file hash for "closeness." Nodes must maintain a routing table containing the contact information for a small number of other nodes. The routing table becomes more detailed as IDs get closer to the node's own ID. Nodes know about many other nodes in the DHT that have IDs that are "close" to their own but have only a handful of contacts with IDs that are very far away from their own.

In Kademlia, the distance metric is XOR and the result is interpreted as an unsigned integer. distance(A,B) = |A XOR B| Smaller values are closer.

When a node wants to find peers for a file download, it uses the distance metric to compare the hash of the file with the IDs of the nodes in its own routing table. It then contacts the nodes it knows about with IDs closest to the file hash and asks them for the contact information of peers that currently have the file. A contacted node returns the number of peers it knows about for the file, as well as contact information of the nodes in its routing table that are closest to the hash of the file. The original node iteratively queries nodes that are closer to the target hash until it cannot find any closer nodes. The node then queries the closest found nodes to the hash of the file to get the peer contact information the nodes' know about, until it has enough (or all) peers for the file. After the download is completed, the client then inserts the peer contact information for itself onto the responding nodes with IDs closest to the hash of the file.

The return value for a query for nodes close to a hash includes an opaque value known as the "token." For a node to announce that its controlling peer is sharing a file, it must present the token received from the same queried node in a recent query for nodes. When a node attempts to "announce" a shared file, the queried node checks the token against the querying node's IP address. This is to prevent malicious hosts from signing up other hosts for downloads. Since the token is merely returned by the querying node to the same node it received the token from, the implementation is not defined. Tokens must be accepted for a reasonable amount of time after they have been distributed. The current implementation uses the SHA1 hash of the IP address concatenated onto a secret that changes every 5 minutes and tokens up to 10 minutes old are accepted.

Routing Table

Every node maintains a routing table of known good nodes. The nodes in the routing table are used as starting points for queries in the DHT. Nodes from the routing table are returned in response to queries from other nodes.

Not all nodes that we learn about are equal. Some are "good" and some are not. Many nodes using the DHT are able to send queries and receive responses, but are not able to respond to queries from other nodes. It is important that each node's routing table must contain only known good nodes. A good node is a node that has responded to one of our queries within the last 15 minutes. A node is also good if it has ever responded to one of our queries and has sent us a query within the last 15 minutes. After 15 minutes of inactivity, a node becomes questionable. Nodes become bad when they fail to respond to multiple queries in a row. Nodes that we know are good are given priority over nodes with unknown status.

The routing table covers the entire node ID space from 0 to 2160. The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2159 and 2159..2160.

When the bucket is full of good nodes, the new node is simply discarded. If any nodes in the bucket are known to have become bad, then one is replaced by the new node. If there are any questionable nodes in the bucket (have not been seen in the last 15 minutes), the least recently seen node is pinged. If the pinged node fails to respond, it is discarded node and replaced with a new good node. In this way, the table fills with stable long running nodes.

Each bucket should maintain a "last changed" property to indicate how "fresh" the contents are. When a node in a bucket is pinged and it responds, or a node is added to a bucket, or a node in a bucket is replaced with another node, the bucket's last changed property should be updated. Buckets that have not been changed in 15 minutes should be "refreshed." This is done by picking a random ID in the range of the bucket and performing a find_node's search on it. Nodes that are able to receive queries from other nodes usually do not need to refresh buckets often. Nodes that are not able to receive queries from other nodes usually will need to refresh all buckets periodically to ensure there are good nodes in their table when the DHT is needed.

Upon inserting the first node into its routing table and when starting up thereafter, the node should attempt to find the closest nodes in the DHT to itself. It does this by issuing find_node messages to closer and closer nodes until it cannot find any closer. The routing table should be saved between invocations of the client software.

KRPC Protocol

The KRPC protocol is a simple RPC mechanism consisting of bencoded dictionaries sent over UDP. A single query packet is sent out and a single packet is sent in response. There is no retry. There are three message types: query, response, and error. For the DHT protocol, there are currently six queries: ping, join, find_node, find_value, get_value, and store_value.

A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key "t" with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a random binary string, typically 20 bytes in length corresponding to the SHA1 hash.

The other key contained in every KRPC message is "y" with a single character value describing the type of message. The value of the "y" key is one of "q" for query, "r" for response, or "e" for error.

Queries

Queries, or KRPC message dictionaries with a "y" value of "q", contain two additional keys; "q" and "a". Key "q" has a string value containing the method name of the query. Key "a" has a dictionary value containing named arguments to the query.

Responses

Responses, or KRPC message dictionaries with a "y" value of "r", contain one additional key "r". The value of "r" is a dictionary containing named return values. Response messages are sent upon successful completion of a query.

Errors

Errors, or KRPC message dictionaries with a "y" value of "e", contain one additional key "e". The value of "e" is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled. The following table describes the possible error codes:

CodeDescription
200Generic Error
201Server Error, the node encountered an error
202Malformed Packet, the packet did not match the protocol
203Method Unknown, a request to fulfill an unknown method
204Malformed Request, incorrect arguments to the method
205Invalid Token, the token has expired, get a new one
206Response Too Long, the response packet exceeds the UDP limits for fragmentation (1472 bytes)

Example Error Packets

generic error = {"t":"12345678901234567890",
                 "y":"e",
                 "e":[200, "A Generic Error Occurred"]}
     bencoded = d1:eli200e24:A Generic Error Occurrede1:t20:123456789012345678901:y1:ee

Contact Encoding

Contact information for peers is encoded as a 6-byte string. Also known as "Compact peer info", it is the 4-byte IP address in network byte order with the 2 byte port in network byte order concatenated onto the end.

Contact information for nodes is encoded as a 26-byte string. Also known as "Compact node info", it is the 20-byte Node ID in network byte order with the compact peer info concatenated to the end.

DHT Queries

All queries have an "id" named argument with value containing the node ID of the querying node. All responses have an "id" named return value containing the node ID of the responding node.

ping

The most basic query is a ping: "q" = "ping". A ping query has a single argument, "id" the value is a 20-byte string containing the senders node ID in network byte order. The appropriate response to a ping has a single key "id" containing the node ID of the responding node.

arguments: {"id": "<querying nodes id>"}
 response: {"id": "<queried nodes id>"}

Example Packets

ping Query = {"t":"12345678901234567890",
              "y":"q",
              "q":"ping",
              "a":{"id":"abcdefghij0123456789"}}
  bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re

join

join is identical to ping, except in the response provided: "q" = "join". It is intended to be sent only to bootstrap nodes during the process of a new peer joining the system. The response to a join query has two additional responses, a key "ip_addr" whose value is the bootstrap node's understanding of the peer's IP address, and a key "port" whose value is the port number that the bootstrap node received the request from.

arguments: {"id": "<querying nodes id>"}
 response: {"id": "<queried nodes id>",
            "ip_addr": "<querying nodes IP>",
            "port": "<querying nodes port>"}

Example Packets

join Query = {"t":"12345678901234567890",
              "y":"q",
              "q":"join",
              "a":{"id":"abcdefghij0123456789"}}
  bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:join1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"mnopqrstuvwxyz123456",
                  "ip_addr":"123.123.123.123",
                  "port":12345}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz1234567:ip_addr15:123.123.123.1234:porti12345ee1:t20:123456789012345678901:y1:re

find_node

Find node is used to find the contact information for a node given its ID: "q" = "find_node". A find_node query has two arguments, "id" containing the node ID of the querying node, and "target" containing the ID of the node sought by the querier. When a node receives a find_node query, it should respond with a key "nodes" and value of list of strings containing the K (8) closest good nodes in its own routing table. A "token" key is also included in the return value. The token value is a required argument for a future store_value query. The token value should be a short binary string.

arguments: {"id": "<querying nodes id>",
            "target": "<id of target node>"}
 response: {"id": "<queried nodes id>",
            "nodes": ["<compact node info>"],
            "token": "<opaque write token>"}

Example Packets

find_node Query = {"t":"12345678901234567890",
                   "y":"q",
                   "q":"find_node",
                   "a": {"id":"abcdefghij0123456789",
                         "target":"mnopqrstuvwxyz123456"}}
       bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"0123456789abcdefghij",
                  "nodes": ["12345678901234567890def456"],
                  "token":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:0123456789abcdefghij5:nodesl26:12345678901234567890def456e5:token20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re

find_value

Find values associated with a file hash: "q" = "find_value". A find_value query has two arguments, "id" containing the node ID of the querying node, and "key" containing the hash of the file. The queried node returns in a key "num" which is the number of values it has for that key. A key "nodes" is also returned containing a list of compact node info for the K nodes in the queried nodes routing table closest to the "key" hash supplied in the query.

arguments: {"id": "<querying nodes id>",
            "key": "<20-byte hash of target file>"}
 response: {"id": "<queried nodes id>",
            "num": <number of values>,
            "nodes": ["<compact node info>"]}

Example Packets

find_value Query = {"t":"12345678901234567890",
                    "y":"q",
                    "q":"find_value",
                    "a": {"id":"abcdefghij0123456789",
                          "key":"mnopqrstuvwxyz123456"}}
       bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz123456e1:q10:find_value1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"abcdefghij0123456789",
                  "num": 23,
                  "nodes": ["12345678901234567890def456"]}}
bencoded = d1:rd2:id20:abcdefghij01234567895:nodesl26:12345678901234567890def456e3:numi23ee1:t20:123456789012345678901:y1:re

get_value

Get values associated with a file hash: "q" = "get_value". A get_value query has three arguments, "id" containing the node ID of the querying node, "num" containing the maximum number of values to return, and "key" containing the hash of the file. Setting "num" to 0 indicates that the querying node wants as many values as possible. The queried node returns any values for the hash in a key "values" as a list of strings. The queried node may not return as many values as requested, as it may have a limit on the maximum it can send, or the UDP packet size limit may restrict the results. The order of the returned values should be shuffled for each request. See the next section for a description of the possible stored values.

arguments: {"id": "<querying nodes id>",
            "key": "<20-byte hash of target file>",
            "num": <number of values>}
 response: {"id": "<queried nodes id>",
            "values": ["<value 1 string>", "<value 2 string>"]}

Example Packets

get_value Query = {"t":"12345678901234567890",
                   "y":"q",
                   "q":"get_value",
                   "a": {"id":"abcdefghij0123456789",
                         "key":"mnopqrstuvwxyz123456",
                         "num":10}}
       bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz1234563:numi10ee1:q9:get_value1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"abcdefghij0123456789",
                  "values": ["d1:c6:456abc", "d1:c6:def456e"]}}
bencoded = d1:rd2:id20:abcdefghij01234567896:valuesl13:d1:c6:456abce13:d1:c6:def456eee1:t20:123456789012345678901:y1:re

store_value

Store a value in the DHT: "q" = "store_value". A store_value query has four arguments: "id" containing the node ID of the querying node, "key" containing the hash of the file, "value" is a string containing the value to be stored, and the "token" received in response to a previous find_node query. The queried node must verify that the token was previously sent to the same IP address as the querying node. See the next section for a description of the possible stored strings. The queried node should store the contact information of the querying node under the file hash in its store of peer contact information.

arguments: {"id" : "<querying nodes id>",
            "key" : "<20-byte hash of target file>",
            "value" : "<string value>",
            "token" : "<opaque token>"}
 response: {"id" : "<queried nodes id>"}

Example Packets

store_value Query = {"t":"12345678901234567890",
                     "y":"q",
                     "q":"store_value",
                     "a": {"id":"abcdefghij0123456789",
                           "key":"mnopqrstuvwxyz123456",
                           "value":"d1:c6:def456e",
                           "token":"aoeusnth"}}
         bencoded = d1:ad2:id20:abcdefghij01234567893:key20:mnopqrstuvwxyz1234565:token8:aoeusnth5:value13:d1:c6:def456ee1:q11:store_value1:t20:123456789012345678901:y1:qe

Response = {"t":"12345678901234567890",
            "y":"r",
            "r": {"id":"mnopqrstuvwxyz123456"}}
bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t20:123456789012345678901:y1:re

Stored Values

All of the values stored and retrieved in the previous section's description of the "get_value" and "store_value" DHT queries are strings. Currently, all stored values are bencoded strings of a dictionary containing many possibilities. In the simplest possible implementation, the dictionary will have a single key "c", whose value is the 6-byte compact contact information for the peer providing the file. All other possibilities build on this simple model.

stored value = {"c":"def456"}
    bencoded = d1:c6:def456e

Piece Hashes

In order to support the efficient downloading of large files from multiple peers, hashes of smaller pieces of the large files are necessary.

Piece Size

A piece size must be chosen that is consistent across all clients. This piece size will be 512 KB, which is 524288 bytes. Any files larger than this will have hashes of pieces of this size calculated. Any files equal to or smaller than this will not have any piece information stored in the DHT.

Piece Hash Storage

Files larger than the maximum piece size will require multiple pieces. Each piece of the file will have to be hashed to get these piece hashes. Once hashed, the piece hashes will be concatenated and stored in a binary string whose length is a multiple of 20 (the length of the SHA1 hash).

Storage in the DHT

Storage of the piece hashes so that peers accessing the DHT can retrieve them will depend on the number of pieces. This is due to the limitation on the length of a UDP packet to prevent fragmentation, which prevents returning very long strings, especially when there are multiple values for a single key. This leads to 4 scenarios:

  1. The file requires only a single piece. In this case, no piece-hashing is needed, so store only the basic dictionary with the "c" key.

    stored value = {"c":"def456"}
        bencoded = d1:c6:def456e
    
  2. The file is 2 to 4 pieces in length. The piece hash strings are short enough to store directly with the contact information. A key "t" is added to the stored dictionary which is a dictionary containing a key "t" which has as a value the concatenated string of piece hashes.

    stored value = {"c":"def456",
                    "t": {"t":"abcdefghij0123456789..."}}
        bencoded = d1:c6:def4561:td1:t23:abcdefghij0123456789...ee
    
  3. The file is 5 to 70 pieces in length. The piece hash strings are too long to store with the contact information, as that would limit the node to returning very few values. Instead, hash the piece hash string using SHA1, and store that string with a key "h" in the stored dictionary. Then, store the concatenated piece hash string in the DHT using the hash of the string as the key. There, the stored dictionary will contain only a single key "t", whose value will be the concatenated string of piece hashes.

    stored value = {"c":"def456",
                    "h":"12345678901234567890"}
        bencoded = d1:c6:def4561:h20:12345678901234567890e
    
    
    stored value at key 12345678901234567890 = {"t":"abcdefghij0123456789..."}
                                    bencoded = d1:t23:abcdefghij0123456789...e
    
  4. The file is more than 70 pieces in length. The piece hash string is now too long for a single UDP packet, and so must be stored outside the DHT. Again, hash the piece hash string using SHA1, and store that string with a key "l" in the stored dictionary. Then, respond to any peer download requests for that hash with a bencoded dictionary containing a key "t" whose value is the concatenated string of piece hashes.

    stored value = {"c":"def456",
                    "l":"12345678901234567890"}
        bencoded = d1:c6:def4561:l20:12345678901234567890e
    
    
    response to download request for 12345678901234567890 = {"t":"abcdefghij0123456789..."}
                                                 bencoded = d1:t23:abcdefghij0123456789...e