Analyzing Information Diffusion Process over Social Networks
Osaka University and AFOSR/AOARD
2009. 12. 14
302-309, 11:00 AM
The rise of the Internet and the World Wide Web accelerates the creation of various large-scale social networks, where nodes (vertices) correspond to people or some social entities, and links (edges) correspond to social interaction between them. Clearly these social networks reflect complex social structures and distributed social trends. A social network can also play an important role as a medium for the spread of various types of information in the form of so-called ``word-of-mouth'' communications. For example, innovation, hot topics and even malicious rumors can propagate through social networks, and computer viruses can diffuse through email networks. Widely-used information diffusion models are not deterministic but probabilistic. Because of this probabilistic nature, influence of a node over a network is defined in terms of its expectation (average number of nodes it influences). Thus, analyzing how information diffuses over social networks involves heavy computation. First, I show that a typical social network is different from a random network and address the difficulty associated with the information diffusion problem. Second, I explain the networks (blog and Wikipedia) and the information diffusion models (independent cascade, linear threshold and their variants) we used for our analyses. Third, I introduce a notion of bond percolation and explain how to efficiently estimate the influence degree of each node in a large social network. Fourth, I explain how to apply this method to the problem of finding a limited number of influential nodes that are effective for the spread of information using a naturally greedy algorithm. Use of the greedy algorithm is justified by the submodular property of node influence. Theoretical comparison of computational complexity show that the proposed method is expected to achieve a great reduction in computational cost. The experiments demonstrate that the proposed method is in deed much more efficient than the conventional method. Further, the nodes identified as influential are substantially different from those identified by centrality heuristics. Fifth, I address the problem of minimizing the propagation of undesirable things by blocking a limited number of links in a network, a converse problem to the influence maximization problem. This minimization problem is alternative to the problem of preventing the spread of contamination by removing nodes in a network. The same naturally greedy approach can be applied to find a good approximate solution to this problem. I demonstrate experimentally that the proposed method significantly outperforms conventional link-removal methods. What is interesting is that blocking links between nodes with high out-degrees is not necessarily effective. Last, I extend the diffusion model that allows multiple activations and explain how to apply the similar idea to efficiently compute the node influence with supportive experimental results.
This page is
maintained by Ji-seon Yoo (email@example.com).