This feed contains pages in the "GoogleSoC" category. These pages all relate to my work with the 2007 Google Summer of Code.
The next release of DebTorrent is now available. This release includes new functionality for communicating with APT using a new transport method specifically designed for debtorrent, and many bug fixes.
The major changes in this release are all in the communication between APT and DebTorrent. The HTTP server in DebTorrent that listens for APT requests has been upgraded to support HTTP/1.1 persistent connections and pipelining, which allows APT to have multiple outstanding requests for files. This is useful as DebTorrent suffers from the typical bittorrent slow start, so requesting multiple files at a time helps to speed up the download considerably.
HTTP/1.1 is an improvement, but it is still not ideal for DebTorrent: APT's http method maintains a maximum of 10 outstanding requests, and files must still be returned in the order they were requested (a poor fit for bittorrent-type downloading, where files finish downloading in no particular order).
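To illustrate why in-order delivery hurts, here is a minimal sketch (not DebTorrent code) comparing when APT could receive each file under the two rules. The completion times are made-up numbers and both function names are hypothetical.

```python
from itertools import accumulate


def in_order_delivery(finish_times):
    """HTTP/1.1 pipelining: response i cannot be sent before responses 0..i-1.

    finish_times[i] is when file i (in request order) finishes downloading.
    """
    # A running maximum: each file waits for every file requested before it.
    return list(accumulate(finish_times, max))


def any_order_delivery(finish_times):
    """Out-of-order transport: each file is handed over as soon as it is done."""
    return list(finish_times)


# Hypothetical completion times (seconds) for four pipelined requests.
finished = [9, 2, 4, 3]
print(in_order_delivery(finished))   # [9, 9, 9, 9]
print(any_order_delivery(finished))  # [9, 2, 4, 3]
```

In this made-up case one slow file at the front of the queue holds back three files that were ready much earlier.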
To further improve the APT communication I have modified APT's http method to create a debtorrent method. This new debtorrent transport for APT is packaged separately as apt-transport-debtorrent, and once installed APT can be told to use it by replacing "http://" with "debtorrent://" in your sources.list file. This method sends all requests it receives immediately to DebTorrent, and will receive responses from DebTorrent in any order. You can find this new method on the Alioth project, or in my personal repository (only amd64 and i386 versions are currently available).
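As a rough sketch of the sources.list change, the following hypothetical helper swaps the scheme on a single "deb" line. It assumes the simple one-line format used in these posts and leaves every other kind of line alone; it is not part of apt-transport-debtorrent.

```python
def use_debtorrent_transport(line):
    """Replace http:// with debtorrent:// on a 'deb' sources.list line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "deb":
        return line  # comments, blank lines and other entry types are untouched
    if fields[1].startswith("http://"):
        fields[1] = "debtorrent://" + fields[1][len("http://"):]
    return " ".join(fields)


entry = "deb http://localhost:9988/debian.camrdale.org/ unstable main contrib non-free"
print(use_debtorrent_transport(entry))
```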
Unfortunately, the story doesn't end here. The APT code responsible for sending requests to the method also limits the maximum number of outstanding requests that it will send to the method to 10, which is not really necessary since all existing methods limit the requests they send out themselves. I have therefore patched the current APT code to increase this limit to 1000 (a one line change), and released this patched version as 0.7.6-0.1. You can find this patched version in my personal repository (again, only for i386 and amd64). I have tested it with the other methods available and it causes no problems, and I hope to get the change included in the regular APT code soon.
To sum up:
- new DebTorrent over HTTP = fast
- new DebTorrent with new apt-transport-debtorrent = faster
- new DebTorrent with new apt-transport-debtorrent and a patched APT = fastest
The last DebTorrent version (0.1.3.1) is currently in the NEW queue, and judging by the length of it, will be there for about another week. After DebTorrent is added to the archive, I will be upgrading it to this new version. I also hope to get the new apt-transport-debtorrent package into the NEW queue soon.
This brings to an end the Google Summer of Code (which this project was created as a part of), but development of DebTorrent will of course continue (probably a little slower). The next major change will be the addition of unique piece numbers, which is almost complete but needs to be extensively tested. I'd like to thank Anthony Towns, Steve McIntyre, and Michael Vogt for their help over the last 4 months, and also the many others who sent me encouraging emails or engaged in interesting discussions about this project. It's the people who make a project like this a fun and memorable thing to do.
Here's the changelog for the new DebTorrent release:
- APT communication supports HTTP/1.1 connections, including persistent connections and pipelining
- Add support for the new debtorrent APT transport method (see the new apt-transport-debtorrent package)
- Make the Packages decompression and torrent creation threaded
- Improve the startup initialization of files
- Add init and configuration files for the tracker
- bug fixes:
- restarts would fail when downloaded files have been modified
- deleting old cached data would fail
- small tracker bug causing exceptions
- prevent enabling files before the initialization is complete
- only connect to unique peers from the tracker that are not already connected
- tracker would return all torrents' peers for every request
This release fixes 2 bugs, one minor, and one serious. The serious bug would probably have caused anyone using a recent (0.7) version of APT to experience hangs with APT saying "waiting for headers". If you had this issue and you're using APT 0.7, please update to this new version. If you had this issue with an older version of APT, please report it as a bug.
Again, if you do find any bugs or have any problems, please submit them to the DebTorrent mailing list, or come and find me (camrdale) on IRC in the #debtorrent channel on OFTC.
Here's the changelog:
- First debian package release (again) (Closes: #428005)
- fixed: cached HTTP 404 responses get passed properly to APT
- fixed: downloading the same file from a previous torrent works now
I still consider the program to be alpha quality. All the functionality for a beta release is there; it just needs to be tested (so tell your friends). I run the program daily and use it for my apt-based updating, so I'm pretty sure it works, but there are definitely bugs. If you do find one, please submit it to the DebTorrent mailing list.
deb http://debian.camrdale.org/ unstable main contrib non-free
Once installed, it will start running automatically, and will restart on bootup, so all that is needed is to modify your sources.list files to point them at DebTorrent by prepending localhost:9988 to the mirror name. For example, the entry above for my personal repository would become:
deb http://localhost:9988/debian.camrdale.org/ unstable main contrib non-free
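The rewrite shown above can be sketched in a few lines of Python. The helper name is hypothetical, and it assumes DebTorrent is listening on the default localhost:9988 and that the entry is a plain "deb http://..." line.

```python
from urllib.parse import urlsplit, urlunsplit


def point_at_debtorrent(line, proxy="localhost:9988"):
    """Prepend the DebTorrent address to the mirror name of a 'deb' line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "deb":
        return line
    url = urlsplit(fields[1])
    if url.scheme != "http" or url.netloc == proxy:
        return line  # not an HTTP mirror, or already pointing at DebTorrent
    # The original mirror host becomes the first component of the path.
    new_path = "/" + url.netloc + url.path
    fields[1] = urlunsplit((url.scheme, proxy, new_path, url.query, url.fragment))
    return " ".join(fields)


print(point_at_debtorrent(
    "deb http://debian.camrdale.org/ unstable main contrib non-free"))
```

Running the helper twice on the same line leaves it unchanged, since an entry that already points at the proxy is skipped.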
Here's the changelog:
- First debian package release (Closes: #428005)
- Cleanup all the configuration options
- Add a global config file
- Moved all logging to log files
- Stopped displaying periodic updates
- Added init script and default options
This is also the first release that I consider actually usable, as it now listens for HTTP requests from APT for packages to download, and feeds the downloaded packages back to APT. It also includes a backup HTTP downloader that fetches packages from a Debian mirror, but only when no peers can be found that have them. This means your download always works, even if you're an early adopter (which I hope you are) and there aren't that many peers available. Finally, the larger packages have now been split into multiple pieces, which makes downloading them much more efficient.
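To give a feel for what splitting large packages means, here is a minimal sketch that computes piece sizes for one file. The 512 KiB limit and the function name are assumptions for illustration only; the real piece boundaries and hashes come from the extra pieces data credited in the changelog.

```python
def piece_lengths(file_size, max_piece=512 * 1024):
    """Return the piece sizes for one package file.

    Small packages stay a single piece; larger ones are cut into
    max_piece-sized chunks plus a shorter final piece if needed.
    max_piece is an assumed value, not taken from DebTorrent.
    """
    if file_size <= max_piece:
        return [file_size]
    full, rest = divmod(file_size, max_piece)
    return [max_piece] * full + ([rest] if rest else [])


print(piece_lengths(100000))           # one piece
print(len(piece_lengths(30 * 2**20)))  # a 30 MiB package becomes many pieces
```

With many pieces, a large package can be fetched from several peers at once instead of waiting on a single one.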
Here's the changelog:
- Add proxying capability to listen for HTTP requests from APT
- Add caching for all files downloaded
- Add automatic starting of torrents when Packages files are downloaded
- Modify startup to initialize all torrent downloads to download nothing
- Add automatic enabling of files to download based on requests from APT
- Add a backup HTTP download from a mirror when no peers can be found for a package
- Modify torrent creation to break large packages into multiple pieces based on the information from http://merkel.debian.org/~ajt/extrapieces/ (thanks to aj for most of this)
- Add download status information available from http://localhost:9988/
- Add lots more documentation
I've already started work on the next release, which will include almost no new features, but will be much easier (I hope) to use. It will also be distributed in a .deb binary format for the first time, and (again I hope) be available in the Debian archive. Here are the plans I've come up with for the steps to complete for the next release:
- Make a debtorrent daemon script based on btlaunchmany
- Make a config file with lots of explanations
- Load configuration info from /etc
- Save downloads and state to /var/cache
- Log all messages to /var/log
- Clean up the debug logging, possibly add debug levels
- Use bittornado packaging files (debian/)
- Add init script
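As a sketch of how the /etc, /var/cache and /var/log items in that plan could fit together, the snippet below reads an INI-style config with Python's standard configparser. The section name, option names and default paths are all hypothetical; none of them are taken from DebTorrent.

```python
import configparser

# Hypothetical defaults; the real option names and paths may differ.
DEFAULTS = {
    "cache_dir": "/var/cache/debtorrent",
    "log_dir": "/var/log/debtorrent",
    "log_level": "info",
}


def load_config(text):
    """Parse config text, falling back to DEFAULTS for missing options."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read_string(text)
    if not parser.has_section("debtorrent"):
        parser.add_section("debtorrent")
    return dict(parser["debtorrent"])


# A daemon would read this text from a file under /etc instead.
print(load_config("[debtorrent]\nlog_level = debug\n"))
```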
I've never created/packaged a daemon before, so hopefully I haven't made any blunders in those plans.
I've been spending some time on the #bittorrent channel on Freenode, to see if I could get any good information from other bittorrent client developers. After some poor initial contact, I did get some good suggestions, as well as some tidbits of information I wasn't aware of.
One of the tidbits was that most of the bittorrent client developers frowned on the use of selective downloading for torrents, to the point where some refuse to implement it in their clients (apparently the latest mainline client doesn't even have it, though I haven't checked due to license issues).
This concerned me, as the DebTorrent client I'm working on will rely heavily on selective downloading to only download packages to be installed by the user. It's not clear to me that it would be an issue though. I'm sure it will make it more difficult to find a rare package in a large swarm of peers, but that's to be expected, and is nicely solved by using an HTTP download from a backup mirror. This backup downloading could become extreme though, making the bittorrent-like download (peer-to-peer) mostly just an HTTP download (client-server). To see if it will be a problem, I conducted a simulation of different sized swarms to determine the amount of unnecessary HTTP downloading that will occur.
First, some assumptions to make the simulation easier:
- peers join and download sequentially and one at a time
- peers never leave
- peers download all the packages for their system at once (i.e. all fresh installs, no upgrades)
- peers are all interested in the same version and architecture of packages
- there are N total peers, and each can make C connections to other peers
With these assumptions, the simulation becomes quite simple. I used the popcon data to assign packages appropriately to the N peers. The peers download their assigned packages from the C previous peers, if possible, or otherwise by HTTP. I ran this multiple times, varying N and C, and calculating each time how much was downloaded through the debtorrent protocol, and how much through HTTP.
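The simulation described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original code: the popularity list is synthetic rather than popcon data, each peer is assumed to be connected to the C peers that joined immediately before it, and all names are hypothetical.

```python
import random


def simulate(num_peers, connections, popularity, seed=0):
    """Return (http_downloads, total_downloads) for one simulated swarm.

    popularity[p] is the probability that a peer wants package p
    (a stand-in for popcon data). Peers join one at a time and never
    leave; peer i can fetch a package only from the `connections` peers
    that joined just before it, and falls back to HTTP otherwise.
    """
    rng = random.Random(seed)
    wanted = [
        {p for p, prob in enumerate(popularity) if rng.random() < prob}
        for _ in range(num_peers)
    ]
    http = total = 0
    for i, packages in enumerate(wanted):
        neighbours = wanted[max(0, i - connections):i]
        for p in packages:
            total += 1
            if not any(p in other for other in neighbours):
                http += 1
    return http, total


# Synthetic, roughly Zipf-like popularity for 200 hypothetical packages.
popularity = [1.0 / (rank + 1) for rank in range(200)]
for c in (5, 50, 500):
    http, total = simulate(500, c, popularity)
    print(c, round(100.0 * http / total, 1))
```

Setting `connections` to at least `num_peers` gives the fully connected case, where only the first copy of each package comes over HTTP.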
This graph shows the percentage of the total download that used HTTP from a mirror. The optimal line shows the minimum possible (i.e. all peers are connected), which corresponds to only one copy (the first) of every package being downloaded using HTTP.
This verifies my previous thinking, which is that fracturing the download population into multiple small swarms results in lots of inefficiencies, and lots of HTTP downloading. Every effort will need to be made to keep the downloading populations together. (This is especially difficult for unstable, where a new swarm could be created twice a day due to archive updates.) Swarms of 1,000 peers or more seem to be sufficient to minimize the HTTP downloading.
This graph shows the amount of unnecessary HTTP downloading that occurred, which is just the difference of each line from the optimal one in the previous plot. This shows the danger of selective downloading, as all of this unnecessary HTTP downloading occurs because the swarm is too big for a peer to find the others that have the desired package (even though they do exist). Fortunately, this unnecessary downloading seems to approach a maximum as the swarm size increases.
For the large swarm sizes needed to make the downloading efficient, and assuming a reasonable number of connections of 100 (bittorrent defaults to maintaining at least 40, and large swarms usually means a large number of connections), we can expect to be using HTTP for about 4% of the download. This number is surprisingly low due to the popularity distribution of packages (only rare ones are hard to find, but they aren't downloaded very often). I think 4% is manageable in our situation, given that there is a backup method to find the rare packages, though it does show the need to have that backup method available. This is probably more of a problem for regular bittorrent clients, in which there is no backup method.
It's been a while since I've given a status update for my Google Summer of Code project to create a BitTorrent proxy for downloading packages using APT, so here it is.
I've been working hard on integrating support for APT into the DebTorrent program. I've almost got it working perfectly; now it's just a matter of testing to make sure all is well. The functionality works like this:
- DebTorrent listens on a port for HTTP requests
- an http://localhost:port/mirror_name/debian/... entry is added to APT's sources.list file (similar to how apt-cacher works for proxying)
- an apt-get update will then send HTTP requests to the DebTorrent
- DebTorrent proxies these requests, downloading files it doesn't have and saving them in a cache before passing them on to APT
- DebTorrent recognizes requests for Packages files, and also uses them to start the torrents for those files
- an apt-get install will send an HTTP request to DebTorrent for a package (.deb) file
- rather than getting package files from the mirror, DebTorrent finds the file in one of its running torrents and enables it for download
- the package file is downloaded (either from other peers, or using the backup HTTP downloader that gets it from the mirror if no peers can be found)
- once the download is complete, DebTorrent passes the package file to APT
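The dispatch implied by the steps above can be sketched as a small classifier over request paths. The function name, the returned labels and the example paths are hypothetical, and the set of Packages file names checked is an assumption.

```python
import posixpath


def classify_request(path):
    """Decide how a request path from APT should be handled."""
    name = posixpath.basename(path)
    if name.endswith(".deb"):
        # Package files are enabled in a running torrent, not proxied.
        return "torrent"
    if name in ("Packages", "Packages.gz", "Packages.bz2"):
        # Proxied and cached like any other file, and also used to
        # start the torrent for that part of the archive.
        return "proxy+start-torrent"
    # Everything else (Release files, etc.) is simply proxied and cached.
    return "proxy"


# Example paths are made up for illustration.
for path in (
    "/ftp.debian.org/debian/dists/unstable/Release",
    "/ftp.debian.org/debian/dists/unstable/main/binary-i386/Packages.bz2",
    "/ftp.debian.org/debian/pool/main/a/apt/apt_0.7.6_i386.deb",
):
    print(classify_request(path), path)
```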
There are two things I really like about this. The first is the backup HTTP downloader. It ensures that if you're an early adopter and there are no peers, or if the package you're requesting is rare and can't be found in any connected peers, the download will still occur in a reasonable amount of time (taking no more, or less, mirror bandwidth than if you had just been using APT directly). The other thing I like is that you get the BitTorrent-style peer-to-peer downloading, with simple HTTP proxying thrown in for free. You can run DebTorrent on a single computer on your network, and have the others connect to it to initiate downloads and request packages.
I haven't done any serious time testing, but I estimate it currently only takes twice as long to use as a regular APT update and download from a mirror. Most of that slowdown is because it currently only processes a single request at a time from APT, which is not very efficient for BitTorrent systems where downloading from multiple peers is how the highest download speeds are achieved. I have been talking to the APT maintainer, Michael Vogt, about a better way to do this, probably by adding a new APT transport method for DebTorrent (i.e. debtorrent:// instead of http://). This will not only speed up the downloads, but also hopefully provide better feedback to the APT user, as currently it will seem nothing is happening until the download comes in all at once at the end.
My work has unfortunately been slowed by other commitments, and bugs. I have two papers to submit to a conference by Monday, but after that I should be back full time on DebTorrent. I also spent a long time tracking down and fixing a bug in the underlying BitTornado code (which led to much rejoicing at 3am), only to find it was fixed in upstream's CVS. (Doh!)
The start has finally arrived: today is officially the first day of coding for the 2007 Google Summer of Code.
Some of the changes in this release are:
- Added ability to parse dpkg status for priorities of files to download
- Fixed a bug in bittornado that prevented using priorities with pre-allocation
- Directories are no longer pre-allocated when they will contain no files
You can read detailed instructions for how to download using it in the README file, but the basic idea for using DebTorrent as part of an update is to:
- Update your Packages files with apt-get update
- Download packages using DebTorrent and the new
- set it to 1 to download all packages currently installed
- set it to 2 to download only new versions of installed packages
- Point Apt to the location of your downloads
- apt-get update again, and then install
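Both priority settings above depend on knowing which packages are installed. A minimal sketch of reading that from dpkg status text follows; it is an illustration, not DebTorrent's parser, and the function name is hypothetical. On a Debian system the text would come from /var/lib/dpkg/status.

```python
def installed_packages(status_text):
    """Map package name -> version for packages dpkg reports as installed."""
    installed = {}
    for paragraph in status_text.strip().split("\n\n"):
        fields = {}
        for line in paragraph.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # continuation lines, e.g. long descriptions
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        # A Status such as "install ok installed" ends in the package state.
        state = fields.get("Status", "").split()[-1:]
        if state == ["installed"] and "Package" in fields:
            installed[fields["Package"]] = fields.get("Version", "")
    return installed


sample = """Package: apt
Status: install ok installed
Version: 0.7.6
Description: Advanced front-end for dpkg
 A longer description line.

Package: oldpkg
Status: deinstall ok config-files
Version: 1.0
"""
print(installed_packages(sample))
```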
This is still an alpha release, but at least it has some more functionality than the previous one. Feel free to test it, but keep in mind that at this early stage most of the torrents will be unseeded.
I spent most of this past weekend (a long weekend in Canada, thanks to our royal heritage) writing documentation for the DebTorrent code. The original BitTornado code was seriously lacking in this area, having only 670 lines of comments out of almost 20,000 total lines of code, which is 3.5% if you're counting. (In fairness, this may be due in part to it being based on the original bittorrent client.)
Since Python is supposed to be known for its well-documented code, I decided to tackle this shortcoming sooner rather than later. It's also a good opportunity for me to become familiar with all of the code. I'm only about halfway there, but the end result of my weekend's worth of work is this diffstat:
 DebTorrent/BT1/Choker.py | 99 ++++++++
 DebTorrent/BT1/Connecter.py | 313 +++++++++++++++++++++++++++
 DebTorrent/BT1/Downloader.py | 448 ++++++++++++++++++++++++++++++++++++++-
 DebTorrent/BT1/Storage.py | 418 ++++++++++++++++++++++++++++++++++--
 DebTorrent/BT1/__init__.py | 9
 DebTorrent/BT1/btformats.py | 47 ++++
 DebTorrent/BTcrypto.py | 147 ++++++++++++
 DebTorrent/ConfigDir.py | 248 +++++++++++++++++++++
 DebTorrent/ConnChoice.py | 12 -
 DebTorrent/CurrentRateMeasure.py | 69 +++++-
 DebTorrent/__init__.py | 14 +
 DebTorrent/bencode.py | 193 ++++++++++++++++
 DebTorrent/bitfield.py | 89 +++++++
 DebTorrent/clock.py | 53 ++++
 DebTorrent/download_bt1.py | 380 ++++++++++++++++++++++++++++++++-
 DebTorrent/inifile.py | 70 ++++--
 btcompletedir.py | 25 ++
 btcopyannounce.py | 20 +
 btdownloadheadless.py | 111 +++++++++
 btlaunchmany.py | 56 ++++
 btmakemetafile.py | 19 +
 btreannounce.py | 6
 btrename.py | 11
 btsethttpseeds.py | 6
 btshowmetainfo.py | 6
 bttrack.py | 6
 setup.py | 20 +
 27 files changed, 2809 insertions(+), 86 deletions(-)
I've also started using the epydoc program to automatically generate some documentation web pages for easy browsing of the code. So far it's worked out great! So well that I decided to use some of the new features available in a recent beta release that was not yet in the debian archive. The packaging seemed simple enough, so I created my own package containing the new upstream version, which is now available in my repository.
The first release is complete. You may now download debtorrent version 0.1.0 from the alioth project file list.
- it's an alpha release, so don't expect too much functionality
- there's currently no communication with apt, so a download will be the entire archive (unless you write a really long priority list)
- there will be very few people using it, so seed a torrent yourself to test, or check out the tracker to find torrents that are seeded
Here's the changelog:
- Initial release, based on BitTornado 0.3.18
- Added variable-sized pieces capability
- Modified programs to create/use .dtorrent files
- Added ability to use Packages files directly
Well it's been a busy week. Work is truly underway, as I have just about finished the first release of debtorrent. The first step in the process, adding variable-sized pieces the size of each file, has been completed and seems to work properly. I've tested it using a single seed, and with 3 downloaders. All were able to contact each other, and all downloaded the entire archive successfully. The archive I tested it on was stable/contrib/i386, as it is only 112MB.
There are just a few more modifications to make to polish the code, and then the 0.1.0 version of debtorrent should be released in the next week. It would be out already, but I've been plagued with a cold for the last week.
I had a lengthy discussion with Anthony about the eventual need for some modifications to the apt-ftparchive program to support some of the proposed advanced debtorrent functionality. More investigation is needed on the implications of these changes, which will probably require larger Packages/Release files to be generated, and some extra computation time. To collect information about these changes and their implications, I've created a new wiki page.