I have just uploaded an important update to apt-p2p that is highly recommended for all users.
At some point over the last 48 hours, bittorrent nodes started to infiltrate the apt-p2p DHT (you may have noticed many ValueError tracebacks in your log file). The (mainline) bittorrent DHT protocol is very similar to apt-p2p's, so the nodes are able to partially communicate and pollute each other's routing tables. I didn't think this would ever happen, as there didn't seem to be any reason for them to come into contact, but somehow it did.
I've made some minor changes that exploit the differences between the two protocols to exclude bittorrent nodes from the apt-p2p routing table, and to drop any requests from bittorrent nodes (which should prevent apt-p2p nodes from polluting the bittorrent DHT). However, it is very important that all users upgrade to this new version to prevent any further mixing of the DHTs.
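The post doesn't say which protocol differences were used, so the following is only a hypothetical sketch of the general idea: inspect each decoded incoming query, silently drop anything that looks like a bittorrent-only request, and only add the senders of accepted queries to the routing table. The message layout (`y`/`q` keys, as in mainline KRPC) and the apt-p2p method names are assumptions. Since `ping` and `find_node` exist in both protocols, a real filter would need more than method names.

```python
# Hypothetical sketch: filter already-decoded DHT queries (keys and values as str).
# ASSUMPTION: apt-p2p uses the same "y"/"q" layout as mainline KRPC, and the
# method names below are illustrative, not taken from apt-p2p's source.

APT_P2P_METHODS = {"ping", "find_node", "find_value", "store_value"}  # assumed
BITTORRENT_ONLY_METHODS = {"get_peers", "announce_peer"}              # mainline DHT


def accept_query(msg: dict) -> bool:
    """Return False for messages that are not queries or look bittorrent-only."""
    if msg.get("y") != "q":
        return False                  # not a query at all
    method = msg.get("q")
    if method in BITTORRENT_ONLY_METHODS:
        return False                  # drop without replying
    return method in APT_P2P_METHODS


def handle(msg: dict, sender_id: bytes, routing_table: set) -> bool:
    """Add the sender to the routing table only if its query was accepted."""
    if not accept_query(msg):
        return False
    routing_table.add(sender_id)
    return True
```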
This new version of DebTorrent finalizes the work I previously uploaded to experimental. This new version is the first in unstable to use unique piece numbers to create long-lived torrents for testing and unstable. It also comes with a new version of the apt-transport-debtorrent package, which can now tell APT not to cache any package files downloaded with DebTorrent, which should save some disk space for some people.
The locations of the long-lived torrent files and piece files that DebTorrent uses have also changed, as they are now hosted on my merkel page and should be more up to date. However, I was not able to get the ssh trigger for the sync from ftp-master set up, nor am I able to access the projectb database through pygresql. If anyone knows how I can get those working, please email me.
Here is the changelog for debtorrent:
- Display the torrent identifier on the local status page rather than the info hash (Closes: #465339)
- Add support for the new No Content message to apt-transport-debtorrent so that apt treats debtorrent sources as local
- Make the pieces and unique piece number locations config options, and use the new ones at merkel.debian.org/~camrdale/.
- Decompress the files needed to create torrents so they are created and started faster (this and the previous Closes: #463676)
And for apt-transport-debtorrent:
- Add a config option to make debtorrent sources local (Closes: #477383)
- Upgrade the recommends on debtorrent to 0.1.7.
- Add a config file for APT to load, and a manpage describing the options.
This update brings mostly speed improvements to finding peers to download from and to performing the download. Nodes in the DHT are now checked for responsiveness more thoroughly before being added to the routing table, and are rechecked more frequently afterwards. This should prevent bad nodes from propagating through the system, and stop departed nodes from remaining in it long after they have left. Downloading from peers is also quicker, as bad peers are dropped after a shorter timeout and after a small number of errors. Version 0.1.2 also fixed a bug, revealed by about 10% of the mirrors, that caused downloads to fail.
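A minimal sketch of the "drop bad peers after a few errors" policy, assuming a hypothetical `PeerPool` class and an error limit of 3 (the post only says "a small number"). In a real downloader, each connection timeout (10s, per the changelog) or failed response would call `record_error`.

```python
from dataclasses import dataclass

MAX_ERRORS = 3  # HYPOTHETICAL limit; the post only says "a small number of errors"


@dataclass
class PeerState:
    address: str
    errors: int = 0


class PeerPool:
    """Tracks download peers and drops any that fail too often."""

    def __init__(self, max_errors: int = MAX_ERRORS):
        self.max_errors = max_errors
        self.peers = {}  # address -> PeerState

    def add(self, address: str) -> None:
        self.peers.setdefault(address, PeerState(address))

    def record_error(self, address: str) -> None:
        """Count a timeout or bad response; drop the peer once it hits the limit."""
        state = self.peers.get(address)
        if state is None:
            return
        state.errors += 1
        if state.errors >= self.max_errors:
            del self.peers[address]

    def usable(self) -> list:
        """Addresses of peers that are still eligible for downloads."""
        return sorted(self.peers)
```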
This update is STRONGLY recommended, as keeping unresponsive nodes out of the DHT will make everyone's experience better. If you tried apt-p2p and had some problems, I hope you'll consider trying again now. Unfortunately, these are enhancements I could not predict needing before releasing to the public, as the problems they solve are all caused by having a larger number of users, some of whose nodes are unresponsive.
There are still a large number of peers (maybe 50% of them) that are unreachable, and so cannot share any files with other peers. I don't have a good way to check this yet, but you can use sites such as this one to probe your client for you. Just enter your client's remote IP address and port (available on the status page) in the form of a URL (e.g. http://18.104.22.168:9977/). If the check returns any HTTP headers (even a 404 Not Found response) then you're fine, but if it doesn't return anything then your peer is firewalled or NATted and should probably be fixed.
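To run the same kind of check yourself, here is a minimal sketch that follows the rule above: any HTTP status line, even an error, counts as reachable. The function name and the use of a HEAD request are assumptions. It should be run from a machine outside your own network, since probing your own public address from behind a NAT may not work.

```python
import http.client


def peer_is_reachable(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if something at host:port answers with any HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()  # any parsed status line, even 404, means the client answered
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or garbage: treat as firewalled/NATted.
        return False
    finally:
        conn.close()
```

For example, `peer_is_reachable("18.104.22.168", 9977)` would probe the address used above.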
Here are the changelog entries for 0.1.2 and 0.1.3:
- Speed up downloading from peers
  - Set a new peer's ranking values so they don't get an unfair advantage.
  - Reduce the HTTP connection timeout to 10s.
  - Drop peers after a limited number of errors.
- Speed up the DHT requests when nodes fail
  - Schedule a re-ping message after adding a new node.
  - When a node fails, schedule a future ping to check again.
  - Send periodic finds to nodes that are stale.
  - Increase the stored value redundancy to 6.
  - Increase the concurrency of DHT requests to 8.
  - Add early termination and ignoring slow responses to recursive DHT actions when timeouts occur.
- Remove the debconf note about port forwarding (Closes: #479492)
- Add a NEWS entry for port forwarding
- Fixed a bug in the HTTP downloader that caused errors with some mirrors that always close the connections (Closes: #479455)
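The re-ping entries in the changelog can be pictured as a simple timer queue keyed on when each node should next be checked. This is a sketch only; the class name and both delays are hypothetical, not apt-p2p's actual values.

```python
import heapq

REPING_DELAY = 60.0   # HYPOTHETICAL: seconds after adding a node before re-pinging it
RETRY_DELAY = 300.0   # HYPOTHETICAL: seconds after a failure before checking again


class PingScheduler:
    """Tracks when each DHT node should next be pinged (times in seconds)."""

    def __init__(self):
        self._queue = []  # min-heap of (due_time, node_id)

    def node_added(self, node_id: str, now: float) -> None:
        heapq.heappush(self._queue, (now + REPING_DELAY, node_id))

    def node_failed(self, node_id: str, now: float) -> None:
        heapq.heappush(self._queue, (now + RETRY_DELAY, node_id))

    def due(self, now: float) -> list:
        """Pop and return every node whose ping is due at or before `now`."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[1])
        return ready
```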
After many long months of planning and work, I have completed another peer-to-peer downloader for Debian. If you've been keeping track, that makes 2 now. This one is called apt-p2p, and as of yesterday it is available in unstable.
The functionality is very similar to the first one I wrote, DebTorrent, so if you've used that one you should feel very comfortable. After installation you just add localhost:9977 to the start of your sources.list entries (see the man page). The only differences are the port number (DebTorrent uses 9988 by default) and that you can use it on all your sources.list entries, whether they are for the Debian archive or any other archive. Then an apt-get update gets it started, and you can begin installing packages. Point your web browser at http://localhost:9977 to see what's going on.
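As an illustration of the sources.list change, here is a small helper that rewrites a one-line entry by putting `localhost:9977` in front of the mirror's host, e.g. `deb http://localhost:9977/ftp.us.debian.org/debian unstable main`. It assumes plain `http://` entries without `[option]` brackets, and the helper name is hypothetical; the man page remains the reference for the exact syntax.

```python
def via_apt_p2p(line: str, proxy: str = "localhost:9977") -> str:
    """Prefix the mirror in a one-line sources.list entry with the apt-p2p address."""
    fields = line.split()
    if len(fields) < 3 or fields[0] not in ("deb", "deb-src"):
        return line  # comments, blank lines, etc. are left untouched
    scheme, sep, rest = fields[1].partition("://")
    if scheme != "http" or not sep:
        return line  # assumption: only plain http:// mirrors are rewritten
    fields[1] = f"http://{proxy}/{rest}"
    return " ".join(fields)
```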
IMPORTANT: as with any P2P program, it works much better if you open a port through your NAT or firewall. Without this crucial step, you won't be able to share with any peers, and your lookup of peers to download from may take longer. Make sure to forward port 9977 for both TCP and UDP (or whichever port you set in the config file). For more details, see the port forwarding section of the FAQ.
The similarities with DebTorrent are all external, so let's look at how it differs from DebTorrent internally:
- it's very general: it doesn't matter what you're trying to download, a source package or a Packages file, from constantly-updated unstable or a year-old stable, for i386 or hppa architectures (DebTorrent only works for .deb files, is only supported for the official archive, and breaks downloaders into groups by architecture)
- it doesn't require anything other than what's available to apt (DebTorrent uses piece hashes of large files, and ordering information, both of which are downloaded separately)
- it can be very fast when downloading from mirrors (mirror downloads with DebTorrent are not so fast)
- the code is simple, and makes use of available code, such as Twisted, Khashmir, and python-apt, which should make future enhancements and maintenance easier (DebTorrent is large and monolithic)
- requires less memory and CPU power (50% to 75% less memory than DebTorrent)
Here are some technical details for those interested:
- makes use of hashes to uniquely identify files
- uses a Distributed Hash Table (DHT) to find peers
- also stores piece hashes in the DHT for efficient downloading
- uses HTTP/1.1 requests to download from peers
- if no peers are available, it falls back to downloading from the mirror
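The last few points fit together roughly as follows. This sketch assumes SHA-1 file hashes and takes the DHT lookup and the two download methods as caller-supplied functions (all hypothetical names), so it only shows the control flow: try the peers found for a hash, verify what they send, and fall back to the mirror.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch(file_hash: str,
          find_peers: Callable[[str], Iterable[str]],
          download_from_peer: Callable[[str, str], Optional[bytes]],
          download_from_mirror: Callable[[], bytes]) -> bytes:
    """Try each peer found for file_hash, verifying the data; else use the mirror."""
    for peer in find_peers(file_hash):  # e.g. a DHT lookup keyed on the hash
        data = download_from_peer(peer, file_hash)
        # The hash both identifies the file and lets us reject bad peer data.
        if data is not None and hashlib.sha1(data).hexdigest() == file_hash:
            return data
    return download_from_mirror()  # no (good) peers: fall back to the mirror
```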
Though I know it can be fast, I'm not yet sure whether the peer lookup in the DHT will be quick enough to keep up with the downloading. All my tests so far show that it is, but until there are a number of peers out there trying it, I can't be sure. I also have some improvements in mind to enhance the speed, in particular reducing the wait for timeouts to occur, so this may improve in the future. If you see a delay in downloading where apt seems to be stalled saying 'Waiting for headers', for now be patient and see what happens. It may be that apt-p2p is downloading in the background (it does this sometimes), or that it's waiting for a lookup in the DHT to complete. If it hangs for more than a minute, or there are errors in the log file, please file a bug so that I can look into it.
Finally, for you DebTorrent fans don't worry, I haven't given up on it. Stay tuned for more info on it coming soon.
I have uploaded a new version of DebTorrent to experimental and to my personal repository. This new version implements the unique piece numbers, which has been planned for quite some time now. Well, it's finally here.
To summarize, unique piece numbers keep torrents alive longer by assigning files unique piece numbers that are never reused in that torrent. New files get new piece numbers added to the end, but peers in the old torrent can still share most of the old files with peers in the new torrent (in fact, it's the same torrent, but old and new peers have different ideas about what it contains).
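A toy model of unique piece numbers, assuming each file needs a known number of pieces and using a hypothetical `UniquePieceTorrent` class: numbers are only ever appended, so a file keeps the same pieces across Packages updates, and numbers belonging to removed files are retired rather than reused. Since new package versions get new filenames in the Debian pool, an updated package simply shows up as a new file at the end.

```python
class UniquePieceTorrent:
    """Hypothetical model: assigns each file piece numbers that are never reused."""

    def __init__(self):
        self.next_piece = 0
        self.pieces = {}  # filename -> list of piece numbers (kept forever)

    def update(self, files: dict) -> dict:
        """files maps filename -> pieces needed; returns the current file->pieces view."""
        for name, count in files.items():
            if name not in self.pieces:
                # New files get fresh numbers appended at the end of the torrent.
                self.pieces[name] = list(range(self.next_piece, self.next_piece + count))
                self.next_piece += count
        # Files missing from this update keep their (now retired) numbers.
        return {name: self.pieces[name] for name in files}
```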
The creation of torrents from Packages files has also changed, as now 2 torrents are created: one for whatever architecture the Packages file was for, and one for the Architecture: all files. Since the Arch: all files are the same for all architectures, this also allows for more sharing of common files between peers on different architectures. This change required a change in the cache directories, which is described in more detail in the NEWS file, but is handled almost automatically. The only thing you need to do is run apt-get update after upgrading so that the torrents can be restarted.
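Splitting one Packages file into the two groups amounts to partitioning its stanzas on the Architecture field. The sketch below uses a deliberately naive stanza parser to stay self-contained (the changelog says DebTorrent itself uses python-debian for this), and the function name is hypothetical.

```python
def split_by_arch(packages_text: str):
    """Return (arch_specific, arch_all) lists of Filename values from a Packages file."""
    arch_specific, arch_all = [], []
    for stanza in packages_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip blanks and continuation lines (e.g. long Description text).
            if not line or line[0] in " \t" or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Filename" not in fields:
            continue
        if fields.get("Architecture") == "all":
            arch_all.append(fields["Filename"])
        else:
            arch_specific.append(fields["Filename"])
    return arch_specific, arch_all
```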
There were also some changes in the statistics reporting. In the client, the uploaded and downloaded statistics will now persist over restarts so that you can see how much you've done over a longer period. The tracker status page also got an update, and now shows the total uploaded and downloaded bytes for each torrent, as well as some more descriptive names for some of the torrents.
Due to the large changes in this release, I opted for uploading to experimental (and my repository) for now, so that it can be tested a bit. Please do test it, and let me know of any bugs, problems or concerns that you may have.
Here is the changelog:
- Add support for unique piece numbers
  - increases duration of oft-updated torrents so that more peers can participate
  - currently supported only by debian testing and unstable
  - see http://wiki.debian.org/DebTorrent/UniquePieces for more info
- Switch to using 2 torrents per Packages file: one for architecture-specific files, and one for architecture-independent files
  - also added a new script splitcachefor_all to ease the upgrade
- Use python-debian for all reading of RFC 822 type files
  - also requires python-apt
- Add torrent names to the tracker display
- Make the download/upload statistics persist over restarts
- Report more and better statistics on the tracker's info page
It's been a while, but a new update to DebTorrent is available in unstable. There is also a new version of the helper apt-transport-debtorrent (0.2.0) to go with it, which was mostly the cause of the long delay (due to a pending apt transition to testing). It is now highly recommended to install both; read below for the new reasons why.
One of the previous problems with debtorrent was the incorrect status updates displayed by apt, due to it not being aware of pieces of large files that had already been downloaded. Using these new versions together will help to improve the update status messages shown during a download. Now, apt-get update will display a status at the bottom that will look something like:
DebTorrent: 837MiB left at 319 KiB/s (46m03s)
You may not be able to see it all if there are multiple updates to different mirrors under way. In aptitude, the status will appear on a line by itself at the bottom of the list of downloaded files. The status line will update about once per second, and should give a good indication of how much time is really left. The other (incorrect) status information presented by apt/aptitude will still be there but you can ignore it. I've only tested apt and aptitude, so I'm not sure how (or if) synaptic, adept or gnome-apt will display these messages, though they are general apt status messages, usually used for displaying the status while logging into an FTP server.
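A status line in that format could be produced by a helper like the one below, a hypothetical sketch that simply divides the bytes remaining by the current rate. For the numbers in the example it gives 44m46s rather than 46m03s, so DebTorrent presumably estimates the remaining time somewhat differently (for example, from a different or smoothed rate).

```python
def status_line(bytes_left: int, rate_bytes_per_s: float) -> str:
    """Format a 'DebTorrent: ... left at ... (ETA)' message (hypothetical helper)."""
    mib_left = bytes_left // (1024 * 1024)
    kib_rate = rate_bytes_per_s / 1024
    if rate_bytes_per_s > 0:
        secs = int(bytes_left / rate_bytes_per_s)   # naive time-remaining estimate
        eta = f"{secs // 60}m{secs % 60:02d}s"
    else:
        eta = "unknown"                             # avoid dividing by zero when stalled
    return f"DebTorrent: {mib_left}MiB left at {kib_rate:.0f} KiB/s ({eta})"
```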
This should also fix another problem that debtorrent can have with downloading large files (usually on slower connections). Since apt can't see the download happening for a long time, the connection would sometimes time out. With these status updates the connection is constantly active, so the timeouts shouldn't be a problem when downloading large packages (although they may still occur when doing apt-get update, as there are no status updates during that step).
I did implement a better solution, as suggested by Michał Politowski on the wiki. This solution uses the creation of sparse files to let apt know how much of a file has actually been downloaded. However, apt doesn't understand how a single process can download multiple files in parallel, so it only follows the last file that was started, and complains when a file finishes that was not the last one started. I've disabled that for now while I see if it's possible to modify apt to work better in this situation.
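The sparse-file idea relies on growing a file's apparent size without writing any data, which truncating it to a larger size does on filesystems that support sparse files. A minimal Python sketch with a hypothetical function name; it only illustrates the mechanism, not apt-transport-debtorrent's actual code.

```python
import os


def grow_sparse(path: str, downloaded_bytes: int) -> None:
    """Extend (never shrink) `path` to `downloaded_bytes` without writing data.

    The extended region is a hole on filesystems with sparse-file support, but
    the size reported by stat() grows, which is what a progress display sees.
    """
    current = os.path.getsize(path) if os.path.exists(path) else 0
    if downloaded_bytes <= current:
        return  # never shrink a file that already reports more progress
    with open(path, "ab") as f:
        f.truncate(downloaded_bytes)
```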
Here are the changes for debtorrent:
- Update to support apt debtorrent transport version 0.2
  - send piece downloaded status messages (currently disabled due to apt not liking it)
  - send general status update messages
- Fix some minor packaging issues
  - Upgrade to standards version 3.7.3 (no changes)
  - Remove the unneeded binary-arch rule
  - Changes the XS-Vcs-* headers to Vcs-*
  - Moved Homepage from description to Source package fields
And for apt-transport-debtorrent:
- Upgrade transport version to 0.2
  - Create sparse files based on status messages (102) from DebTorrent
  - Send general status updates to apt (103) from DebTorrent
  - Remove support for Range headers as they may confuse the sparse file allocation
- Fix typo in long description
- Upgrade to standards version 3.7.3 (no changes)
- Changes the XS-Vcs-* headers to Vcs-*
- Moved Homepage from description to Source package fields
Since the conclusion of the Google Summer of Code, the DebTorrent project has been without a sponsor. I was going to email debian-mentors to see if anyone was interested, but I decided to first post to my blog to see if there are any Debian Developers reading this who have the time and are especially interested in sponsoring it.
Here are some of the details of the package:
- Package name: debtorrent
- Current version: 0.1.4.1
- Author: Cameron Dale firstname.lastname@example.org
- URL: http://debtorrent.alioth.debian.org/
- License: MIT
- Section: net
- Priority: optional
- Language: python
- VCS: subversion http://svn.debian.org/wsvn/debtorrent/debtorrent/trunk/
It is currently maintained using an svn-buildpackage subversion repository, but I might be interested in switching to git if a sponsor is interested.
I recently upgraded my Internet service with Telus, which included a new wireless router to replace my old high-speed modem. The new router is a 2Wire 2700HG-E which is configured through a web interface.
The web interface worked fine at first on some other machines, but whenever I accessed it from my main desktop machine the pages would either fail to load, or have the images in the wrong locations. After much debugging (though it seems obvious now) I finally discovered that this was due to my enabling HTTP pipelining in Firefox on that machine (as suggested by Mozilla).
Being the diligent citizen that I am, I of course filed a support request so that this could be fixed in the next version of the 2Wire software. Here is the response I received:
The HomePortal GUI was designed for internet browsers on default settings. We recommend keeping the pipelineing [sic] setting on "false".
Not exactly a good solution, and a little surprising, considering it's relatively straightforward to implement. I know because I recently upgraded a simple HTTP handler from 1.0 to 1.1 specifically to support pipelining, as part of my work on DebTorrent.
Now I've just read "Structured streams: a new transport abstraction", in which the authors state:
implementing pipelining correctly in web servers has proven challenging enough that seven years after the standardization of HTTP/1.1, popular browsers still leave pipelining disabled for compatibility
So it seems pipelining is disabled for a reason, though I had it enabled for years in Firefox and saw no problems (that I know of) until now. Maybe the problems are a thing of the past. Does anyone else have any experiences (good or bad) with pipelining to share?
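For anyone unfamiliar with it, pipelining just means writing several requests on one connection before reading any of the responses; the server must then answer them in the same order. A sketch of what that looks like on the wire (hypothetical helper, placeholder host name). A server that mishandles this ordering could plausibly produce the kind of missing or misplaced images described above.

```python
from typing import Sequence


def pipelined_requests(host: str, paths: Sequence[str]) -> bytes:
    """Serialize several HTTP/1.1 GET requests to send back-to-back on one connection."""
    parts = []
    for i, path in enumerate(paths):
        headers = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if i == len(paths) - 1:
            headers.append("Connection: close")  # ask the server to close after the last one
        parts.append("\r\n".join(headers) + "\r\n\r\n")
    return "".join(parts).encode("ascii")
```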
I just completed a very successful and long overdue dist-upgrade of my unstable machine, using mainly DebTorrent for downloading the packages, so I thought I'd post some of my thoughts and experiences.
The download consisted of 1294 packages to upgrade, totalling 1350 MB and taking 2h12m to download. Here are some of the good and bad things I observed.
There was a single other peer with me in the same torrent, and I managed to download 182 MB from him, which is about 13.5% of the total download. This is the first time I have noticed downloading from peers actually occurring, as usually there are too few peers, too many torrents and too many possible packages to download for any sharing to occur. This will hopefully change in the future when more people start using DebTorrent, and when unique piece numbers are introduced to make the torrents last longer. However, it does show how the backup HTTP downloader can seamlessly integrate with downloading from peers to provide a good user experience, even for early adopters.
The CPU time used was only 10m46s, which translates to an average CPU usage of 8%, which is very reasonable.
The average download speed was 174 KB/s, which is 58% of my maximum download speed. Though this may seem like a bad thing, my goal all along has been to make sure that the download time would not be more than twice as long as using HTTP. Of course, using DebTorrent may never be as fast as a straight HTTP download from a well-provisioned server, but that is not the point. The idea is to reduce the bandwidth needs of hosting a Debian archive. In the future, though, when there are many peers in a single torrent, the download speed may be even faster than using HTTP, especially for peers with very high download rates that could not be matched by a single server.
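The figures above are consistent with one another, assuming 1 MB = 1024 KB (the assumption that reproduces the 174 KB/s figure):

```python
KB_PER_MB = 1024                     # assumption: binary megabytes

total_kb = 1350 * KB_PER_MB          # 1350 MB downloaded
elapsed_s = 2 * 3600 + 12 * 60       # 2h12m of downloading
cpu_s = 10 * 60 + 46                 # 10m46s of CPU time

avg_rate = total_kb / elapsed_s      # about 174.5 KB/s
cpu_share = cpu_s / elapsed_s        # about 0.082, i.e. 8%
peer_share = 182 / 1350              # about 0.135, i.e. 13.5% from the one peer
max_rate = avg_rate / 0.58           # 58% of the maximum implies roughly 300 KB/s

print(f"{avg_rate:.1f} KB/s, CPU {cpu_share:.0%}, peer {peer_share:.1%}, max ~{max_rate:.0f} KB/s")
```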
Things that need improvement
The completion percentage reported by APT during the download was fairly inaccurate. Here are some sample readings I noted, compared with the actual completion percentage from the DebTorrent status page:
|APT Reports||Actual Completion|
The discrepancy occurs because larger packages are broken up into pieces, so they can be partially downloaded without APT knowing about it (since only fully downloaded packages are passed to APT). Clearly this situation is far from ideal, and can lead to the user feeling that the download is progressing very slowly, or not at all. There are some plans to add status updates to the communication between APT and the DebTorrent client, but they require changes to the APT code to support them, so it may take some time to implement.
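A toy calculation shows how far apart the two numbers can drift, assuming APT only counts fully downloaded files while the DebTorrent status page counts every downloaded byte (hypothetical helper):

```python
def completion(files):
    """files: list of (size_bytes, downloaded_bytes); returns (apt_view, actual)."""
    total = sum(size for size, _ in files)
    actual = sum(done for _, done in files)
    # APT only learns about a file once all of its pieces have arrived.
    apt_visible = sum(size for size, done in files if done == size)
    return apt_visible / total, actual / total
```

For a small package that is finished and a large one that is half done, `completion([(100, 100), (900, 450)])` returns `(0.1, 0.55)`: APT reports 10% while 55% of the data is actually on disk.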
Another problem is the memory usage I saw during the download, which was approximately 213 MB. This is obviously unnecessarily large, though the metainfo that DebTorrent needs to be aware of is quite large (stored in a text file it is about 3 MB). There does seem to be a memory allocation bug in Python 2.4 which causes increased memory usage, so moving to Python 2.5 might help. However, my preliminary tests show this only saves about 20% for DebTorrent. I think I will have to delve deeper into which parts are using all this memory, and unfortunately Python doesn't seem to have a good memory profiler to help with this. I will be looking at both PySizer and Guppy/Heapy to start, but if anyone knows of a better solution, please let me know.
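For readers on a modern Python 3 (3.4 or later, so not an option for the Python 2.4/2.5 discussed here), the standard library's `tracemalloc` module answers the "which parts are using the memory" question directly by attributing live allocations to source lines. A minimal sketch with a hypothetical wrapper function:

```python
import tracemalloc


def top_allocations(func, limit=5):
    """Run func() under tracemalloc; return its result and the top allocating lines."""
    tracemalloc.start()
    try:
        result = func()                        # keep the result alive for the snapshot
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = snapshot.statistics("lineno")      # grouped by line, largest first
    return result, stats[:limit]
```

Printing each returned statistic shows the file, line number, total size and allocation count for the biggest consumers.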