I just completed a very successful and long overdue dist-upgrade of my unstable machine, using mainly DebTorrent for downloading the packages, so I thought I'd post some of my thoughts and experiences.

The download consisted of 1294 packages to upgrade, totalling 1350 MB and taking 2h12m to download. Here are some of the good and bad things I observed.

Good things

There was a single other peer with me in the same torrent, and I managed to download 182 MB from him, which is about 13.5% of the total download. This is the first time I have noticed this downloading from peers occurring, as usually there are too few peers, too many torrents and too many possible packages to download for any sharing to occur. This will hopefully change in the future when more people start using DebTorrent, and when unique piece numbers are introduced to make the torrents last longer. However, it does show how the use of the backup HTTP downloader can seamlessly integrate with downloading from peers to provide a good user experience, even for early adopters.

The CPU time used was only 10m46s, which translates to an average CPU usage of 8%, which is very reasonable.

The average download speed was 174 KB/s, which is 58% of my maximum download speed. Though this may seem like a bad thing, my goal all along has been to make sure that the download time would not be more than twice as long as using HTTP. Of course, using DebTorrent may never be as fast as a straight HTTP download from a well-provisioned server, but that it not the point. The idea is to reduce the bandwidth needs of hosting a debian archive. But in the future, when there are many peers in a single torrent the download speed may be even faster than using HTTP, especially for peers with very high download rates that could not be matched by a single server.

Things that need improvement

The completion percentage reported by APT during the download was fairly inaccurate. Here are some sample readings I noted, compared with the actual completion percentage from the DebTorrent status page:

APT Reports Actual Completion
10% 40%
20% 70%
30% 85%
40% 90%
50% 93%
60% 96%
70% 97.5%
80% 98%
90% 99%

The discrepancy occurs because larger packages are broken up into pieces, so they can be partially downloaded without APT knowing about it (since only fully downloaded packages are passed to APT). Clearly this situation is far from ideal, and can lead to the user feeling that the download is progressing very slowly, or not at all. There are some plans to add status updates to the communication between APT and the DebTorrent client, but they require changes to the APT code to support them, so it may take some time to implement.

Another problem is the memory usage I saw during the download, which was approximately 213 MB. This is obviously unnecessarily large, though the metainfo that DebTorrent needs to be aware of is quite large (stored in a text file it is about 3 MB). There does seem to be a memory allocation bug in Python 2.4 which causes increased memory usage, so moving to python 2.5 might help. However, my preliminary tests show this only saves you about 20% for DebTorrent. I think I will have to delve deeper into which parts are using all this memory, and unfortunately python doesn't seem to have a good memory profiler to help with this. I will be looking at both PySizer and Guppy/Heapy to start, but if anyone knows of a better solution, please let me know.