The QUAFFLER Protocol and File Formats (by Clifford Wolf)
=========================================================

QUAFFLER is software and a protocol for distributing large files in
large networks. Unlike typical server-client star topologies, there is
no need for massive upload bandwidth at the central server. Instead,
every downloader also provides some of the upload bandwidth.

The central server (tracker) does not provide the actual data. Instead
it just coordinates the nodes. Usually the person who provides the
tracker also provides at least one node which already has the file.


Roles
-----

There are different roles in a QUAFFLER network:

nodes
    The 'normal' QUAFFLER client. All the real data is only replicated
    between the nodes. The trackers and indexers only manage metadata.

tracker
    A tracker manages the connection graph between the nodes to
    guarantee a fast file transfer. Every file in a QUAFFLER network
    has its own connection graph.

    A tracker does not need to know anything about the files which are
    downloaded using it. All required information is already provided
    in the QUAFFLER file identifier which is sent to the tracker when a
    node connects to it (see QUAFFLER URI below).

indexer
    Usually a user has many more files that he could and would share
    than he is actually sharing using a tracker right now. Users can
    upload the list of files they want to share to an indexer. Other
    users can then search the list of files and use the indexer to tell
    those who already have a file to connect to a tracker and provide
    the data.

    Using QUAFFLER indexers is optional. One can also use the trackers
    only, without ever connecting to an indexer.


Fragmentation Scheme
--------------------

The file is split up into blocks and the blocks are split up into chunks.

    Type   File Size        Block Size   Chunks/Block   Chunk Size
    A        0 -   64 GB       1 MB            16          64 KB
    B       64 -  128 GB       2 MB            32          64 KB
    C      128 -  256 GB       4 MB            64          64 KB
    D      256 -  512 GB       8 MB           128          64 KB
    E      512 - 1024 GB      16 MB           256          64 KB

The fragmentation scheme can be determined from the file size. But since
it is such important information, it is also passed in an extra field in
the QUAFFLER file identifiers.


File Metadata
-------------

The metadata for a file consists of:

size
    Probably the most important file metadata is the file size. The
    file size defines the number of chunks per block (see above) as
    well as the number of blocks.

block metadata array
    Each block consists of chunks. Each block has a metadata array with
    the sha1 checksums of all chunks. The sha1 checksums are stored in
    binary (20 bytes per checksum). Usually (files smaller than 64 GB,
    fragmentation scheme 'A') such a block metadata array is 320 bytes
    big.

attribute block
    The attribute block contains a comma-separated list of key-value
    pairs in the syntax "". The maximum size of the attribute block is
    64 kilobytes. The following attributes are defined:

    mime-type
        The mime type of the file.

    description
        A free-form description of this file.

    multifile
        This is a multifile download (see section about multifile
        download below).

    Non-standard attributes must start with "x-".

index metadata array
    An array with a sha1 checksum for each block. These checksums are
    not checksums for the data in the blocks but checksums for the
    block's block metadata array. At the end of the list there is an
    additional sha1 checksum for the attribute block. For a 1 GB file
    this index metadata array is 20 KB big.

index checksum
    The sha1 checksum of the index metadata array. This index checksum
    is part of the QUAFFLER URI.

This metadata (and some bitmaps) is also stored in an extra file so the
QUAFFLER client can access and cache it easily. The format of this file
is described in a separate section below.
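
The checksum hierarchy described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not a
reference implementation: the function names are made up for this
example, GB/MB/KB are taken as powers of two, a file whose size falls
exactly on a range boundary is assigned to the smaller scheme, the tail
block is zero-padded as described in the padding section, and the
attribute block is hashed as its raw bytes.

```python
import hashlib

KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
CHUNK_SIZE = 64 * KB

# (scheme letter, maximum file size, block size), taken from the table.
SCHEMES = [
    ("A", 64 * GB, 1 * MB),
    ("B", 128 * GB, 2 * MB),
    ("C", 256 * GB, 4 * MB),
    ("D", 512 * GB, 8 * MB),
    ("E", 1024 * GB, 16 * MB),
]


def fragmentation_scheme(file_size):
    """Return (letter, block_size, chunks_per_block, number_of_blocks)."""
    for letter, max_size, block_size in SCHEMES:
        # Assumption: a size exactly on a boundary uses the smaller scheme.
        if file_size <= max_size:
            number_of_blocks = -(-file_size // block_size)  # ceiling division
            return letter, block_size, block_size // CHUNK_SIZE, number_of_blocks
    raise ValueError("file too large for the table above")


def block_metadata_array(block):
    """Concatenated binary sha1 checksums of all chunks of one full block."""
    return b"".join(
        hashlib.sha1(block[off:off + CHUNK_SIZE]).digest()
        for off in range(0, len(block), CHUNK_SIZE)
    )


def index_checksum(data, attribute_block=b""):
    """Hex index checksum of an in-memory file (sensible for small inputs)."""
    _, block_size, _, number_of_blocks = fragmentation_scheme(len(data))
    index = b""
    for n in range(number_of_blocks):
        block = data[n * block_size:(n + 1) * block_size]
        block = block.ljust(block_size, b"\0")  # zero-pad the tail block
        # The index holds the checksum of the block metadata array,
        # not a checksum of the block data itself.
        index += hashlib.sha1(block_metadata_array(block)).digest()
    index += hashlib.sha1(attribute_block).digest()  # assumption: raw bytes
    return hashlib.sha1(index).hexdigest()
```

Because of the zero-padding, two files that differ only in trailing zero
bytes within the same block yield the same index checksum in this
sketch, which is one reason the file size has to travel alongside the
checksum in the file identifier.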


QUAFFLER URI
------------

Example:

    quaffler://tracker23.example.com:4223/333b831aa6e22b52009007bcba4fcc11b1a7b4c2:609167360:A/pr0n.mpeg

The QUAFFLER client connects to a QUAFFLER tracker. This connection is
established using a URI. The protocol name for that is "quaffler" and
there is no default port number.

The first element of the URI path is the QUAFFLER file identifier. It
consists of three colon-separated elements: the index checksum as a
hexadecimal number with lowercase letters, the length of the file in
bytes as a decimal number, and the fragmentation scheme as an uppercase
letter.

The remaining URI path is irrelevant for the QUAFFLER protocol and might
be used by the client as a hint for naming the file.


Connecting and Crypto Handshake
-------------------------------

The following happens after a QUAFFLER TCP connection has been
established:

1. The initiator sends a 2048 bit RSA key which has been created using
   two 1024 bit prime numbers with the highest bit set. So it is
   guaranteed that a 255 byte message block can be encrypted with the
   key. The key is sent in little endian byte order.

2. The responder creates an 8*8 RC4 S-Box (256 bytes), encrypts the
   first 255 bytes of the S-Box with the RSA key received in step 1 and
   sends it to the initiator. The last byte in the S-Box can be easily
   reconstructed from the other bytes on the receiving end.

3. The initiator decrypts the RC4 S-Box and is from now on using that
   S-Box for everything it sends over this connection.

4. Then the initiator creates an 8*8 RC4 S-Box for the responder and
   sends the first 255 bytes of it to the responder (already encrypted
   with the S-Box received from the responder).

5. The responder decrypts the RC4 S-Box and is from now on using that
   S-Box for everything it sends over this connection.

6. Initiator and responder send the packages described in the next
   section over the now encrypted connection.
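
Steps 2 and 4 rely on the last S-Box byte being recoverable from the
first 255. A minimal sketch of why this works, assuming the S-Box is a
permutation of the byte values 0..255 (as the RC4 state is) and that it
is simply chosen at random; the function names are hypothetical and the
RSA and RC4 encryption steps are left out:

```python
import random


def random_sbox(rng=None):
    """Create a random 8*8 S-Box: a permutation of the byte values 0..255."""
    rng = rng or random.SystemRandom()  # OS-provided randomness by default
    sbox = list(range(256))
    rng.shuffle(sbox)
    return bytes(sbox)


def complete_sbox(first_255):
    """Rebuild the full 256 byte S-Box from its first 255 bytes."""
    if len(first_255) != 255:
        raise ValueError("expected exactly 255 bytes")
    # In a permutation every value occurs once, so exactly one is missing.
    missing = set(range(256)) - set(first_255)
    if len(missing) != 1:
        raise ValueError("input is not a prefix of a permutation")
    return bytes(first_255) + bytes(missing)
```

The uniqueness check doubles as a cheap sanity test: a decrypted block
with a repeated value cannot be a valid S-Box prefix.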

The RSA keys used in this cryptographic handshake can be regenerated
anytime since authentication is done using a different keypair (see
'initiator' package below). It is recommended to regenerate the RSA key
for this cryptographic handshake when it is older than 24 hours.

The initiator may send a zero byte instead of the RSA key to disable
encryption for this connection (an RSA key is the product of two odd
numbers, so the first byte can't be zero in little endian
representation). But this should only be allowed in test environments
where one wants to avoid the additional CPU load produced by the crypto
stack.


Protocol Packages
-----------------

Each package starts with a 1 byte package opcode (the hex numbers in the
list below), followed by additional fields. The following data types are
defined for these additional fields:

    $string:  One byte with the length of the string, followed by the
              string. The string must not contain 0x00 bytes.

    $data:    An unsigned 24-bit length (three bytes, big endian),
              followed by that many bytes of data.

    $byte:    A simple plain 8-bit data byte (unsigned).

    $word:    A 16-bit word (big endian, unsigned).

    $dword:   A 32-bit word (big endian, unsigned).

Some packages are only used in node-to-node communications (N-N), others
may only be sent from the tracker to a node (T-N) or vice versa (N-T) or
from an indexer to a node (I-N) or vice versa (N-I).

Keep the syntax of the following list intact. It is used to
auto-generate the low-level state machine for parsing incoming packages.

** 0x01 initiator ALL-ALL
    $string:banner
    $string:connid
    $string:mode
    $string:options
    $string:quafflerid
    $data:auth_key
    $data:auth_token

    This is the first package in a connection. The banner must always
    be set to "QUAFFLER V1".

    For node-node connections the connid is the connid sent by the
    tracker in the 'connect' package. For node-tracker and node-indexer
    connections the connid is the sha1 checksum of the RSA public key
    sent in the auth_key field.

    The mode field specifies the type of the connection. Possible
    values are "node-tracker", "node-indexer" and "node-node".

    The options field contains a comma-separated list of protocol
    options. The possible options are described in a separate section
    below.

    The quafflerid field contains the QUAFFLER file identifier (see
    "URI" above) which is subject of this connection, with the length
    field set to block_size * number_of_blocks. This field is left
    blank in "node-indexer" connections.

    The auth_key and auth_token fields are unused and left blank in
    node-node connections. In node-tracker and node-indexer connections
    the auth_key field contains a 2048 bit RSA public key and the
    auth_token field contains the RC4 S-Box sent to the initiator in
    step 2 of the crypto handshake, encrypted with the private key
    matching the public key in the auth_key field.

    The auth_key and auth_token fields are always left blank when the
    crypto handshake was skipped.

** 0x02 responder ALL-ALL
    $string:options

    The 2nd package in a connection, when the connection is accepted by
    the responder. The options field contains the protocol options for
    the responder.

    When the connection is refused by the responder, an error package
    must be sent back.

** 0x03 ping T-N

    When this package is the first package in a connection to a node, a
    pong package must be sent back and the connection must be closed.
    This mechanism is used by trackers to determine if a node can
    actually receive connections (i.e. is not behind a NAT firewall).

** 0x04 pong N-T

    Response to a ping package.

** 0x05 error ALL-ALL
    $byte:errorcode
    $string:message

    This package is sent when a protocol error or misbehavior of the
    peer has been detected. Usually it is the last package sent before
    immediately closing the connection.

    Possible values for the errorcode field are:

        0   general error
        1   username/password authentication failed
        2   other authentication failed (address based, etc.)
        3   role mismatch error (e.g.
            tracker connection to a node)

    The message field contains a free-form text message describing the
    error.

** 0x06 message ALL-N
    $string:message

    A node receiving such a package should display it in some kind of
    console window. No user interaction should be required to
    acknowledge the message. Such messages should simply scroll by.

** 0x10 reqstatus T-N,N-T

    This package can be used by the tracker to request a status update
    from a node. The node must respond as soon as possible by updating
    all status information on the tracker, sending 'status' and
    'block_got' packages as well as 'connstatus' packages.

    It is illegal for a node to send status updates to the tracker
    without being instructed to do so using a reqstatus package.

    When all status information has been sent to the tracker, a
    reqstatus package must be sent back from the node to the tracker to
    mark the end of the status update.

** 0x11 status N-N,N-T
    $byte:status_bitmap

    The node sending this package informs its peers about its status.
    The bits in status_bitmap have the following meaning:

        0   Got the entire file (don't need to send block_status)
        1   Got the index metadata array
        2   Got the attribute block
        3   unused
        4   unused
        5   unused
        6   unused
        7   unused

    The unused bits must always be set to 0.

** 0x12 block_status N-N,N-T
    $data:block_bitmap

    The block_bitmap has the bits set for those blocks which the node
    sending the package has. Trailing zero bytes might be omitted.

** 0x13 block_got N-N,N-T
    $word:block_number

    The node sending this package informs its peers that it has set a
    bit in its block status bitmap.

** 0x20 request_chunk N-N
    $word:block_number
    $byte:chunk_number

    Requesting a chunk from the other node.

** 0x21 request_block N-N
    $word:block_number

    Requesting a block metadata array from the other node.

** 0x22 request_attr N-N

    Requesting the attribute block from the other node.

** 0x23 request_index N-N

    Requesting the index metadata array from the other node.

** 0x30 send_chunk N-N
    $word:block_number
    $byte:chunk_number
    $data:data

    The response to a request_chunk package.

** 0x31 send_block N-N
    $word:block_number
    $data:data

    The response to a request_block package.

** 0x32 send_attr N-N
    $data:data

    The response to a request_attr package.

** 0x33 send_index N-N
    $data:data

    The response to a request_index package.

** 0x40 connect T-N
    $byte:flags
    $byte:upload_priority
    $string:connid
    $string:initiator
    $string:responder

    The tracker instructs a node to initiate or respond to a
    connection. The connid is used as authentication credential for the
    new connection and also must be used whenever referring to this
    connection.

    The initiator and responder fields have the format "::". E.g.:

        TCP:192.168.23.42:666

    Other protocols than "TCP" might be defined later.

    The bits in the flags field have the following meaning:

        0   Set when this node is initiator
        1   Set when this is an "expensive" connection
        2   Set when this is a new node-tracker connection
        3   unused
        4   unused
        5   unused
        6   unused
        7   unused

    The unused bits must always be set to 0.

    The upload_priority field is used to prioritize the available
    upload bandwidth. Connections with lower priority should not be
    stalled if higher priority connections are active. Instead the
    bandwidth should be shared according to the upload_priority values
    (255=max, 1=min, 0=illegal).

** 0x41 responderok N-T
    $string:connid

    This message is sent back from the node to the tracker after a
    connect package has been sent to a responder. The tracker must wait
    for this package before sending the connect package to the
    initiator.

** 0x42 disconnect T-N
    $string:connid

    The tracker instructs a node to close a connection. The node does
    not need to close the connection immediately. Usually one completes
    the currently running requests first.

** 0x43 connstatus N-T
    $string:connid
    $dword:uploaded
    $dword:downloaded
    $word:peerblocks

    Such a package is sent to the tracker for each connection every now
    and then.
    It contains the number of bytes up- and downloaded since the last
    connstatus has been sent for this connection.

    Only the number of bytes in the 'data' fields of send_chunk,
    send_block, send_attr and send_index is counted here. It is very
    important that these counters are calculated correctly because the
    tracker might think that someone is cheating if they drift apart
    between the peers.

    The peerblocks field contains the number of blocks the peer has.
    This can be used by the tracker to verify that a node is not
    showing other nodes a different picture than the tracker.

** 0x44 conndrop N-T
    $byte:flags
    $string:connid

    A node might also disconnect from a peer without being instructed
    by the tracker to do so. It must then send the tracker such a
    package.

    The tracker does not need to send a 'disconnect' package back to
    the node. But the node must be ready to receive a 'disconnect'
    package after it has sent a 'conndrop'. This is possible because of
    race conditions when the node at the other end of the dropped
    connection has also sent a conndrop package to the tracker.

    The bits in flags have this meaning:

        0   Set when the peer was misbehaving
        1   Set when there have been connectivity problems
        2   Set when there was a protocol error
        3   unused
        4   unused
        5   unused
        6   unused
        7   unused

    The unused bits must always be set to 0.

** 0x45 connflags T-N
    $byte:flags
    $byte:upload_priority
    $string:connid

    This package is sent by the tracker to change the metadata (flags
    and upload_priority) for an existing node-node connection.

** 0x50 indexer_register N-I
    $string:quafflerid
    $dword:slotid

    After connecting to a QUAFFLER indexer, a node must upload the list
    of files it wants to share (unless the indexer-read-only option is
    set). This is done by sending packages of this type to the indexer.

    Only files which are entirely downloaded may be registered with the
    indexer.

    The quafflerid field contains a QUAFFLER file identifier with the
    length field set to block_size * number_of_blocks.

    The slotid identifies this record in the context of this user.
    Sending another indexer_register package with the same slotid
    overwrites this record. Set all bits (-1) in the slotid to
    deactivate the mechanism.

** 0x51 indexer_unregister N-I
    $dword:slotid

    Removes a previously registered record.

** 0x52 indexer_flush N-I

    Removes all previously registered records.

** 0x53 indexer_revision N-I
    $dword:revision

    Register a database revision number. When reconnecting, the server
    will send this number back to the node in the 'indexer-lastrev'
    option. This way the node can do incremental updates and does not
    need to re-send the entire file list when reconnecting.

** 0x54 indexer_join I-N
    $dword:slotid
    $string:tracker
    $string:quafflerid

    The node should create a node-tracker connection for one of its
    files because more seeders for the file are required.

** 0xff extension
    $string:name
    $data:data

    This package is used for extending the protocol. The peers must
    negotiate which extension packages they support using the options
    in the initiator and responder packages. An unknown extension
    package must cause a protocol error.


Protocol Options
----------------

The initiator and responder packages have an options field. This field
contains a comma-separated list of options, such as:

username=hugo
password=secret
    Username and password. The values are URL encoded. These options
    are only legal in the node-tracker and the node-indexer mode.

listenport
    The listening port of this node as a decimal number. This option is
    required in the node-tracker mode.

indexer-read-only
    Do not send indexer_register packages. This option is only legal in
    the node-indexer mode.

indexer-lastrev
    The last revision set with the indexer_revision package. Must be
    passed and set to '0' if the node is recognized but has never sent
    an indexer_revision package to the indexer. This option is only
    legal in the node-indexer mode.

Additional non-standard options must start with the "x-" prefix.
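
Splitting and building such an options field could look roughly like the
following sketch. It assumes that options are either bare names or
name=value pairs, and it applies URL encoding to every value; the text
above only states this for username and password, so treat that as an
assumption. The function names are hypothetical.

```python
from urllib.parse import quote, unquote


def parse_options(field):
    """Split an options field into a dict; bare options map to None."""
    options = {}
    if not field:
        return options
    for item in field.split(","):
        name, sep, value = item.partition("=")
        # Assumption: every value is URL encoded, not only the credentials.
        options[name] = unquote(value) if sep else None
    return options


def format_options(options):
    """Inverse of parse_options; encodes ',' and '=' inside values."""
    parts = []
    for name, value in options.items():
        if value is None:
            parts.append(name)
        else:
            parts.append(f"{name}={quote(value, safe='')}")
    return ",".join(parts)
```

Encoding with an empty safe set keeps a comma or equals sign inside a
password from being mistaken for a separator. Since the field is a
$string, the encoded result has to fit into 255 bytes.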


Protocol Details: Node-Tracker
------------------------------

The node-tracker connection is always initiated by the node. The
connection ID of this connection is the identity of the user and should
be re-used in later connections to the same tracker. The node must have
the "node-tracker" option set, the tracker must have the "tracker-node"
option set.

Right after the connection has been established, the node must send a
"status" and a "block_status" package. After that, the tracker will
instruct the node to connect to other nodes (see "Protocol Details:
Node-Node" below).

The node must keep the tracker up to date about its status using
"status", "block_got" and "connstatus" packages. This can be done
asynchronously with up to a few minutes delay to keep the TCP/IP
overhead in the node-tracker communication as low as possible.

The "conndrop" package may be used to tell the tracker that the node has
problems with a connection and so has closed it.


Protocol Details: Node-Indexer
------------------------------

The QUAFFLER protocol is designed for big files. So the node-indexer
protocol is designed for a small number of files (at most a few
thousand). Too large a number of files will create a huge overhead when
registering all the files with the indexer.

Every tracker is also an indexer. Node-tracker and node-indexer
connections are only distinguished by the value of the mode field of the
initiator package.


Protocol Details: Node-Node
---------------------------

The connection ID in node-node connections is the one specified by the
tracker. The responder must wait for the "connect" package from the
tracker before sending the "OK" line to the initiator. Both nodes must
set the option "node-node" in the QUAFFLER banner.

As soon as the connection has been established, the nodes inform each
other about their current status using the "status" and "block_status"
packages.
They also keep each other up to date using "status" and "block_got"
packages as long as they are connected to each other.

Then the peers request (request_index, request_block and request_chunk)
and send (send_index, send_block and send_chunk) data. The protocol is
completely asynchronous (i.e. a send_chunk need not directly follow a
request_chunk). But a node must not have more than four pending
requests. For the 5th request it must first wait for the first requests
to be answered.

- A node must first download the complete index metadata array and the
  attribute block from other nodes before requesting block metadata
  arrays and chunks.

- A node must prefer finishing blocks it has already started over
  starting new blocks.

- A node must prefer downloading from connections which do not have the
  "expensive" flag set.

- A node must prefer downloading blocks which only one or a few of its
  peers have.

- 'Good' random numbers must be used for the final decision which block
  should be downloaded next, when the other criteria are not sufficient
  to reduce the number of candidates to one block.

There is no benefit for a node in breaking these rules. So there also is
no good reason for cheating. Even if a node does cheat, it does very
limited harm to the network. So it also should be hard to sabotage a
QUAFFLER network.


Metadata File Format
--------------------

The QUAFFLER metadata file format looks like this:

32 bytes (offset: 0)
    The string "-- QUAFFLER METADATA FILE V1 --" followed by a newline
    character.

20 bytes (offset: 32)
    The index checksum

8 bytes (offset: 52)
    The size of the file (big endian)

1 byte (offset: 60)
    The fragmentation scheme (as uppercase ASCII letter)

256 bytes (offset: 61)
    The connect string for the tracker in the format "::". E.g.:

        TCP:tracker23.example.com:4223

    The remaining space in this field must be filled with zero bytes.

    The field may also be unused. In this case it must be filled
    entirely with zero bytes.

1 byte (offset: 317)
    A flag bitmask with the following meanings for the bits:

        0   Got the entire file
        1   Got the index metadata array
        2   Got the attribute block
        3   unused
        4   unused
        5   unused
        6   unused
        7   unused

    The unused bits must always be set to 0.

[number of blocks] bits (zero-padded to a multiple of 8 bits) (offset: 318)
    The block complete bitmap: A bit is set when the corresponding
    block is downloaded completely.

[number of blocks] bits (zero-padded to a multiple of 8 bits)
    The block in-progress bitmap: A bit is set when the corresponding
    block is downloaded partially. This is set when the block metadata
    array has been downloaded and is cleared when the block is
    complete.

[number of blocks + 1] * 20 bytes
    The index metadata array. The last sha1 checksum is the checksum of
    the attribute block.

[number of chunks per block] bits (zero-padded to a multiple of 8 bits)
    The chunk complete bitmap: A bit is set when the corresponding
    chunk is downloaded completely.

[number of chunks per block] * 20 bytes
    The block metadata array

2 bytes
    The size of the attribute block.

variable-length data
    The attribute block.

If the ".qmdf" file is smaller than it should be, the missing data can
be assumed to consist of zero bytes. That makes it possible to download
a ".qmdf" file that only has the header (including the connect string)
and then join the network without the need for specifying a QUAFFLER
URI.

The QUAFFLER metadata files are named like the data files, but with the
".qmdf" file extension appended.

The QUAFFLER metadata files have less than approximately 0.035% of the
size of the data file. So the ".qmdf" file for a 1 GB data file is less
than 350 KB big.


Multifile Downloads
-------------------

Sometimes one wants to share an entire directory tree instead of a
single file using QUAFFLER. In this case the file shared using QUAFFLER
is in fact some kind of virtual archive instead of a real file.

The first part of the file shared using QUAFFLER is a QUAFFLER directory
index (.qdix) file. The format is as follows:

32 bytes
    The string "-- QUAFFLER DIRINDEX FILE V1 --" followed by a newline
    character.

2 bytes
    Length of the file name (big endian).

vardata
    File name with '/' as directory separators.

8 bytes
    Size of the file (big endian).

2 bytes
    Number of the block in the QUAFFLER file holding the first block of
    this file.

A multifile download has the attribute "multifile" set in the attribute
block. The value is the size of the QUAFFLER directory index file as a
decimal number.

A client downloading a file with the "multifile" attribute set must
honor an additional rule: First download the entire directory index
file.

Empty directories are not shared. Non-empty directories are only shared
because they are implicitly created when the files in the directory are
written.


Padding, tail-blocks and tail-chunks
------------------------------------

For the purposes of the QUAFFLER protocol and of calculating the sha1
checksums, the last chunk of a file is zero-padded and the remaining
chunks of the last block are filled with zeros.

Since all clients know the size of the file, they also know what data in
the last chunk is padding and which chunks consist entirely of padding.
Chunks which do not contain any real data also do not need to be
transferred.

A sequence of zeros at the end of the payload of the send_* packages may
always be omitted, regardless of whether it is padding or real data.

The size of the entire file is known because it is part of the QUAFFLER
identifier. The size of the index metadata array can be calculated from
the size and the fragmentation scheme. The size of the attribute block
can be easily determined because zero bytes are not legal inside the
attribute block. In case of multifile downloads the size of the ".qdix"
file is stored in the "multifile" attribute and the size of the real
files is stored in the ".qdix" file.
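
The padding and trailing-zero rules can be illustrated with a short
Python sketch. The helper names are hypothetical; the 64 KB chunk size
is taken from the fragmentation table, and the expected payload size is
assumed to be known to the receiver from the file size as described
above.

```python
import hashlib

CHUNK_SIZE = 64 * 1024


def chunk_checksum(chunk):
    """Binary sha1 of a chunk, zero-padded to the full chunk size first."""
    return hashlib.sha1(chunk.ljust(CHUNK_SIZE, b"\0")).digest()


def strip_payload(payload):
    """Sender side: drop trailing zero bytes, whether padding or real data."""
    return payload.rstrip(b"\0")


def restore_payload(payload, expected_size):
    """Receiver side: re-append the zero bytes the sender may have omitted."""
    if len(payload) > expected_size:
        raise ValueError("payload longer than expected")
    return payload.ljust(expected_size, b"\0")
```

Only trailing zeros are removed, so zero bytes in the middle of a
payload survive the round trip unchanged.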


Multitracker Environments
-------------------------

[ To be written ]


QUAFFLER and Multicasting
-------------------------

[ To be written ]