Thursday, October 06, 2005

FTP mirror systems considered harmful for SUSE Linux releases (or the other way around)

Today it happened. After the final version of SUSE Linux 10.0 was released, the mirror system broke down, and it is still not reliably functional. Most mirrors either failed to get a complete set of files or became so overloaded that they were no longer reliably available.

Why did this happen? The reason is quite simple: the volume of files being synced out was much too high for the network bandwidth and the mirrors' disk hardware to handle. The irony is that it was not the size of the distribution alone that killed the mirror system, but the fact that every mirror had to sync the same files multiple times. A mirror that mirrored both suse.com and opensuse.org mirrored parts of the 10.0 release four times.
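To make the redundancy argument concrete, here is a minimal sketch that compares the bytes a mirror transfers when every tree is synced independently with the bytes it would transfer if identical content were fetched only once. Everything in it is an assumption for illustration: the tree names, the digests, and the sizes are made up, and each tree is simply described by the payload files it ultimately carries (for an ISO set, the files packed inside the images).

```python
from collections import namedtuple

# Hypothetical description of one payload file: where it lives,
# a content digest, and its size in bytes.
FileEntry = namedtuple("FileEntry", "path digest size")


def transfer_cost(trees):
    """Return (naive_bytes, unique_bytes) for a dict of tree name -> list of FileEntry.

    naive_bytes counts every file in every tree; unique_bytes counts each
    distinct digest only once, no matter how many trees carry it.
    """
    naive = 0
    unique = {}
    for entries in trees.values():
        for entry in entries:
            naive += entry.size
            unique[entry.digest] = entry.size
    return naive, sum(unique.values())


# Made-up example: the same two packages appear in four trees.
payload = [FileEntry("a.rpm", "d1", 100), FileEntry("b.rpm", "d2", 50)]
example_trees = {
    "suse.com/ftp-tree": payload,
    "suse.com/iso": payload,
    "opensuse.org/ftp-tree": payload,
    "opensuse.org/iso": payload,
}
naive_bytes, unique_bytes = transfer_cost(example_trees)
```

With this invented input the naive total is four times the unique total, which is the situation described above for a mirror carrying both suse.com and opensuse.org.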

This leads to a vicious circle. Most mirrors get their files from a GWDG server. When mirrors do not manage to get an almost complete set of files before the official announcement, most users consider them unreliable and go to the GWDG servers instead. That puts more and more load on those servers, which in turn makes the other mirrors receive their files more and more slowly.

The redundancy of files on the mirrors stems from the fact that SUSE started to provide, in addition to the FTP tree, various sets of ISO files that all contain mostly the same files. From a technical point of view, the optimal solution would be to remove these ISO files from the mirrors completely and to provide a tool that creates the ISOs on the client machine from the normal FTP tree. Because most users are considered incapable of using such a tool, this is currently a no-go.

So we can conclude that the current FTP/rsync-based mirror approach is not feasible for the way SUSE Linux is distributed by the mirrors. Instead, mirrors should be provided with a tool (e.g. a script) that automatically syncs a minimal set of files and automatically builds all other files that contain redundant information. That way they could finish syncing with the GWDG server earlier and would be considered more reliable by users, which would take load off the GWDG servers and keep them fully operational.
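The planning step of such a mirror-side tool could look roughly like the sketch below. It is not an existing SUSE tool. It assumes a hypothetical manifest that maps each file to the list of files it can be built from, with an empty list meaning the file has to be synced from upstream. The file names are invented, and the actual transfer and ISO creation are left out.

```python
def plan_mirror_update(manifest):
    """Split a manifest into files to sync and files to build locally.

    manifest: dict mapping file name -> list of source file names.
    An empty source list marks a primary file that must be synced.
    Returns (to_sync, to_build), both sorted by name.
    """
    to_sync = sorted(name for name, sources in manifest.items() if not sources)
    to_build = []
    for name, sources in sorted(manifest.items()):
        if not sources:
            continue
        # A derived file may only depend on primary files listed in the manifest.
        missing = [s for s in sources if s not in manifest or manifest[s]]
        if missing:
            raise ValueError(f"{name} cannot be built, bad sources: {missing}")
        to_build.append(name)
    return to_sync, to_build


# Made-up manifest: one ISO that is assembled from two synced packages.
example_manifest = {
    "rpm/a.rpm": [],
    "rpm/b.rpm": [],
    "iso/cd1.iso": ["rpm/a.rpm", "rpm/b.rpm"],
}
to_sync, to_build = plan_mirror_update(example_manifest)
```

In this example only the two packages would cross the network. The ISO is scheduled for a local build, so the mirror never downloads the same content twice.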

Some first ideas on this topic have already been discussed. Let's hope that the ideas evolve before the problems do.
