Network clustering: Algorithms, modeling, and applications

Date of Completion

January 2010


Engineering, Computer; Computer Science

Recent research has shown that spatial clustering features are present in many large-scale distributed networks, such as the Internet, peer-to-peer (p2p) networks, and wireless sensor networks. The topologies of such networks can be partitioned into clusters that are "densely" connected internally but only "sparsely" connected to one another. Understanding these clustering features could greatly benefit many areas of networking research. However, these features remain far from well studied, mainly because good network clustering algorithms are lacking. In this dissertation, we tackle the challenge of designing network clustering algorithms by introducing a new clustering algorithm, SAGA, and its distributed version, SDC. We then apply network clustering to several other research areas.

Our work consists of three research thrusts: (1) effective clustering algorithm design; (2) clustering-based Internet topology modeling; and (3) scalable and efficient hierarchical p2p file sharing.

In the first thrust, we address the fundamental problem of network clustering. We present a novel centralized clustering algorithm, SAGA, and prove that it satisfies all the desired design goals. One advantage of SAGA over other centralized algorithms is that it does not require global topology information. Inspired by this decentralized nature of SAGA, we develop a fully distributed algorithm, SDC, which can be readily deployed in large-scale distributed systems.

In the second thrust, we apply network clustering to Internet topology modeling. Clustering features are significant properties of the Internet topology, but little research effort has been devoted to its large-scale clustering features, and as a result realistic topology generation models are lacking. We provide a comprehensive characterization of the clustering features of the AS-level Internet topology and present a realistic topology generation model based on the characterized features.
We prove that our model can reproduce all previously known properties of the AS-level Internet topology.

In the third thrust, we use our distributed clustering method SDC to enhance the performance of hierarchical p2p file sharing systems, in which network clustering is a common technique. We develop PPDC, a network clustering protocol based on SDC, for PSON, a p2p file sharing system proposed in our previous work. We show that a good network clustering protocol can significantly improve the scalability and efficiency of PSON. Beyond network clustering, we further improve the performance of PSON with an effective load balancing mechanism.

In this dissertation, we present these three thrusts in detail and discuss future directions closely related to our work.
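The abstract describes clusters as "densely" intra-connected and "sparsely" inter-connected. As a rough illustration only, the sketch below quantifies that property for a given partition of an undirected graph, using intra- and inter-cluster edge densities plus the standard Newman-Girvan modularity score. This is a swapped-in, generic measure: it does not implement SAGA, SDC, or PPDC, whose details are not given here, and the function name `cluster_stats` and the example graph are hypothetical.

```python
from collections import defaultdict


def cluster_stats(edges, membership):
    """Summarize how 'dense inside, sparse between' a partition is.

    Illustrative sketch only (hypothetical helper, not SAGA/SDC/PPDC).
    edges: iterable of (u, v) pairs of an undirected simple graph.
    membership: dict mapping every node to a cluster label.
    Returns (intra_density, inter_density, modularity).
    """
    intra_edges = defaultdict(int)  # edges with both endpoints in cluster c
    degree_sum = defaultdict(int)   # total degree of the nodes in cluster c
    inter = 0                       # edges crossing between two clusters
    m = 0                           # total number of edges
    for u, v in edges:
        m += 1
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
        else:
            inter += 1
    if m == 0:
        return 0.0, 0.0, 0.0

    sizes = defaultdict(int)  # number of nodes per cluster
    for c in membership.values():
        sizes[c] += 1

    # Possible node pairs inside clusters versus across clusters.
    n = len(membership)
    intra_pairs = sum(s * (s - 1) // 2 for s in sizes.values())
    inter_pairs = n * (n - 1) // 2 - intra_pairs

    intra_density = sum(intra_edges.values()) / intra_pairs if intra_pairs else 0.0
    inter_density = inter / inter_pairs if inter_pairs else 0.0

    # Newman-Girvan modularity: Q = sum over clusters c of L_c/m - (d_c/(2m))^2,
    # where L_c is the intra-cluster edge count and d_c the cluster's degree sum.
    modularity = sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in sizes
    )
    return intra_density, inter_density, modularity


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(cluster_stats(edges, split))
```

For two triangles joined by one bridge edge, cutting at the bridge gives an intra-cluster density of 1.0, an inter-cluster density of 1/9, and a modularity of 5/14 (about 0.36), whereas putting all six nodes in one cluster gives a modularity of 0. A good clustering algorithm, whatever its internals, should favor partitions of the first kind.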