Lecture notes for COMS 6998 - Social Network Economics
Fall 2013, Columbia University
by A. Chaintreau
Part A.1 - Consensus and the Wisdom of the Crowd
General review of results on consensus with landmarks papers
Original paper proposing iterated consensus
M. H. Degroot, “Reaching a Consensus,” Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, Mar. 1974.
Summary of contributions:
- The panel expert problem: You have appointed a panel of experts each with an opinion (x1,…,xn). How can you obtain a representative opinion ∑isixi (where the coefficients (si)i remain to be chosen) that seems a good consensus. The key observation is that not all experts are equally skilled and that, in addition to her own opinion, each expert also has some perception of the skill of other experts.
- Proposes iterated averaging as a consensus method. Using experts' judgment on the quality of their peers, this proposes to ask them to weight the opinion of others with theirs in order to refine their opinion.
- Proves (THM1-2-3) that a consensus is reached as soon as (i) "states communicate" (i.e. the graph formed by drawing a directed edge from expert i to expert j when i takes j's opinion into account is a strongly connected graph) and (ii) this graph is not periodic (no integer larger than 1 divides all lengths of all cycles).
- The proof can be done either by evoking Perron Frobenius theorem on nonnegative matrices or by interpreting the dynamics as the evolution of an Homogeneous Markov Chain). Moreover, this proves that the weights (si)i of experts in the final outcome corresponds to the stationary distribution of this Markov Chain, which is also the eigenvector found in Perron Frobenius theorem.
P. M. DeMarzo, D. Vayanos, and J. Zwiebel, “Persuasion Bias, Social Influence, and Unidimensional Opinions,” The Quarterly Journal of Economics, vol. 118, no. 3, pp. 909–968, Aug. 2003.
Summary of contributions:
- Formulates a model of naive opinion dynamics through synchronous averaging. The first step is a Bayesian updates (i.e. it is optimal given the available information). After the first step, this averaging is myopic. It corresponds to agents updating their opinions using their social circle as if the information obtained was entirely new (while in reality it implicitly depends on previous states).
- Convergence can be shown under (i) strongly connected and (ii) self-confidence assumption (THM1) (using the same Markov Chain argument): nodes influence all others according to the same strength, which is the eigenvector of the influence Matrix. This results holds even when we introduce a time-varying self confidence factor that represents how much weight an agent places on her own opinion.
- Correctness of consensus is not guaranteed. It holds (THM2) under a balanced conditions (e.g., for regular networks of agents with identical precision). This is usually not the case, although (THM3) if agents know all the social networks and are individually rational, they can distinguish over time the right signal by solving an LP problem (a result later revisited in E. Mossel and O. Tamuz, “Efficient Bayesian Learning in Social Networks with Gaussian Estimators,” arXiv.org, vol. stat.AP. 03-Feb-2010.).
- Dimension collapse: Even if you start from a multidimensional space, the relative divergences from the common opinion of an agent are all shown to reside in a unidimensional space (THM4) which is related to the second eigenvalue. Not only are nodes not necessarily correct, but they are relatively speaking polarized, which means that the debate reduces to a unidimensional position (e.g., liberal/conservative) which characterize all issues.
- Additional facts: Speed of convergence to consensus and relative divergence are controlled by second and third eigenvalues respectively (THM5). For bilateral communication qi,j=qj,i, the influence of a node (THM6) corresponds to its degree (or a generalization considering his credit). And relative distance to the mean is same on all dimensions (THM7).
Model and Assumptions:
- Initial belief for each agent xi(0)=θ+εi where θ∈ℝL is true state of the world and εi some zero mean error factor, independent among agents and for every dimensions.
We define π0i=1/E[ϵ2i] as the precision (a higher precision denotes an expert with more accurate information), and we assume that nodes truthfully report their initial value.
- In addition to her own estimation, each expert i has access to the initial belief from a set of other experts N(i) (we generally denote qij=1 iff i listens to j). Expert i estimates the precision of agents j as π0i,j. If this estimation is objectively correct, then π0i,j=π0j. We assume that all agents also decide to weight their opinion Agents may decide to weight their own opinion differently with time (with a weight λt that may depend on the number of iteration made in this process).
- First update: After one round, and receiving all estimation from her neighbors expert i updates her belief to
xi(1)=(1−λ0)∑jqijπ0i,jπ1ixj(0)+λ0xi(0)withπ1i=∑jqijπ0i,j.
Note that π1i represents the sum of "precision" accross all information available to i, so it measures the confidence i has after one step of iteration. It is required to normalize the new belief.
This can be rewritten in a matrix form as x(1)=(1−λ0)Tx(0)+λ0x(0) where Ti,j=qi,jπ0i,jπ1i is called the influence matrix.
- This corresponds to a Bayesian belief update at the first time step, but assuming that nodes naively apply the same rule at any time slot t, potentially with a different value of self-confidence λ we can write the evolution of the system as:
x(t+1)=T(λt)x(t)whereT(λt)=λt×I+(1−λt)×T.
Main results:
- This process converges (THM1) as soon as (i) the graph defined by adding the edge (i,j) whenever i listens to j is strongly connected, and (ii) λt>0 and ∑t≥01λt=∞. The limit is independent of the values of (λt)t≥0 and corresponds to the left eigenvector of T.
- Correctness: Under which conditions is the limit the optimal estimator of θ? It is not hard to find counterexamples of estimation of precision are not accurate, so we assume they are. The estimator is the best
iff ∑jqj,iTj,j=1 (e.g., it holds for regular network with equal precision). This generally is not satisfied by any non-regular structure where some nodes have much larger in-degree and out-degree.
- Dimension collapse: For any t≥0 and any dimension l, we introduce
x¯l(t)=1N∑jxj,l(t) the average opinion in dimension l, and Δl(t)=1N∑j||xi,l(t)−x¯l(t)|| the mean distance to the average opinion.
We can then, for a given i, define
di,l(t)=xti,l−x¯tlΔtl which represents, for this dimension, how much i is relatively speaking, positively or negatively different from the average opinion.
The main result is that there exists a vector D in ℝL and a set of real number (ϕi)i for each i such that limt→∞di,l(t)=ϕi⋅Dl. In other words, the relative disagreement vector di of all nodes becomes with large t arbitrarily closer to the direction defined by D, and occupies a particular position in this unidimensional space that is characterized by ϕi.
Asymptotic correctness: the Wisdom of the crowd
Motivation and origin
In many settings such as the one considered above, the entire population as a whole has access to a very large set of information that, if they were centrally known by a single person, would be sufficient to obtain an extremely good estimator of the true state of the world.
Two celebrated papers argue early on mathematically and empirically for this argument in favor of democratic decision, coining the terms "The wisdom of the crowds" in which an estimator that combines many imperfect signals end up being better than any single individual expert, however skilled this expert might be.
M. J. Condorcet, “Essai sur l‘application de l’analyse à la probabilité des décisions rendus à la pluralité des voix,” Imprimerie Royale, 1785.
Essentially formulates a weak law of large numbers in favor of using a democratic process in court.
F. Galton, “Vox Populi,” Nature, vol. 75, no. 1949, pp. 450–451, Mar. 1907.
Empirically compares the average of all proposed answers to an "ox weight guessing competition" organized in a stock and poultry exhibition. Shows that the median is surprisingly close to the truth and the deviation of the answer is comparable to a normal distribution
However, many social systems do not allow information owned by all individuals to be known in a public manner. Instead, information is implicitly aggregated in an interaction process. Whether or not this aggregation is successful depends on various factors: the limitation of one individual's access to others, and the possibility that information is given too much weight if the social interaction process is unbalanced.
Wisdom of the crowd in iterated consensus
B. Golub and M. O. Jackson, “Naïve Learning in Social Networks and the Wisdom of Crowds,” American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112–149, Feb. 2010.
Can be seen as following up DeMarzo et al. 2003 paper as a starting point and answering the following question: Since the estimator is not necessarily optimal, under which conditions this estimator becomes asymptotically exacts as the population of nodes grow.
Summary of contributions:
- Extends convergence of iterated consensus beyond the case of strongly connected network and aperiodic case. It shows that T is convergent as long as it is strongly aperiodic (i.e., on any closed subsets, greatest common divisor of lengths of cycles is 1). This is also equivalent to the existence of a row vector (si)1≤i≤n estimating the influence of each node in the associated groups, while other nodes converge to a weighted version. (PROP1-THM3-4).
- Defines wise crowds as series of graph/influence matrices for whom the long term evolution always approach the right answer almost surely. Prove that a crowd is wise iff the maximum influence of a node in the crowd asymptotically vanishes as the population grows (i.e., limn→∞maxisi(n)=0) (PROP-2), where si(n) denotes the weights of node i in the system containing n nodes.
- Provides a counterexample: A family (i.e., a collection of subsets Bn defined for all n) that is small (bounded independently of n) and influential (uniformly prominent on all other nodes) is sufficient to block wisdom (PROP 3).
- On the positive side, shows that two conditions (balance and minimal dispersion) are sufficient for the wisdom of the crowd (THM1). Without being strictly speaking necessary, counter-examples show that there is no hope to achieve wisdom in general when only one condition holds.
Model and Assumptions:
- Very similar model to DeMarzo et al. above. Every nodes initially has a signal, independent of others, which is a real number centered on θ with variance ≥σ2. This last assumption allows to avoid cases in which some nodes have individually a perfect estimator and no information aggregation is stricto sensu needed.
- Nodes influence by other according to a matrix (Ti,j)i,j that roughly speaking correspond to self condifence λt that do not depend on t but may differ among the nodes. It is possible that some nodes have no self confidence. A similar convergence result is shown If it is strongly connected, then under aperiodicity we have: ∀x0 initial belief of nodes limt→∞x(t)=Ttx(0)=sTx0 where s is the unique left eigenvector of T associated with eigenvalue 1 whose (positive) entries sum to 1.
- Notation:
- Since we consider sequences of larger and larger populations we generally introduce the notation T(n) for the associated influence matrix, and s(n) for the associated row eigenvector that also represents the weight of each node in the final consensus.
- For any subset A and B, we define TA,B=∑i∈A,j∈BTi,j which is roughly speaking how much as a whole the nodes from subset A weights the opinion of all nodes in B in their updated average.
- Similarly, we define sA=∑i∈Asi as the influence of this subset.
- For any influence matrix T, we say that B is α-prominent if Ti,B≥α for any i∉B.
Proof of counterexample (PROP3):
We assume Bn is finite (i.e. supn|Bn|<∞) and uniformly prominent:
∃α>0such that∀n,∃tnsuch that(T(n)tn)i,Bn≥αfor any i∉B.
(This is equivalent to say that
Bn is always
α-prominent for some iterate of
T(n), with
α>0 independent of
n.)
We wish to prove it implies that the wisdom of the crowd can fail in that case.
- According to PROP2 above, it is sufficient to show that si(n) does not go to zero for all nodes.
- Note first that for any influence matrix, B α-prominent implies that sB≥α1−α.
Indeed, the influence si sums to 1, and we have, since s is a left eigenvector of T that
∀i,si=∑jTj,isj≥∑j∉BTj,isj
hence
∑i∈Bsi≥∑i∈B∑j∉BTj,isj≥α∑j∉Bsj if B is α-prominent.
Since ∑j∉Bsj=1−∑i∈Bsi, this sum is bounded away from zero as follows: sB≥α1+α.
- This implies immediately sBn(n)≥α1+α because Bn is α-prominent for an iterate of T(n), which will have the same left eigenvector associated with 1.
- Finally, we observe that this implies maxi∈Bnsi(n)≥1|B|α1+α. A family that is finite and uniformly prominent hence satisfies that one node in it receives at least a positive fraction of influence and it proves the result.
Proof of sufficient conditions for wisdom of the crowd (THM1):
We wish to prove that a crowd is wise if we have
(i) min dispersion: There exists q,r>0 such that (T(n))Bn,Cn≥r whenever |Bn|≥q and |Cn|n→n→∞1.
(ii) balance: ∃jn→∞ as n→∞ such that Bn≤jn implies ∃A>0 such that
TBn¯,Bn(n)<A×TBn,Bn¯(n).
NB: Min dispersion implies that no family (except very small one) misses to put weight at least r on any subset that misses a negligible fraction of nodes. Balance ensures that the weights given by others to a family cannot be arbitrarily far from the weight this family gives back, except when it is extremely.
We assume without loss of generality that nodes are reordered so that si≥si+1 for all i. Note that we can assume without loss of generality as well that j(n)/n→0.
STEP1. We first deduce there exists kn such that k(n)≥q,k(n)/n→0,k(n)×sk(n)(n)→0.
We note, that for any δ>0 the property {∀i∈[q;j(n)],i×si(n)≥δ} can only be satisfied by a finite number of n. This is because this property implies ∑j(n)i=qsi(n)≥∑j(n)i=qδi, the LHS is finite while the RHS diverges as n grows large.
We deduce ∀δ>0, Zδ,n={∀i∈[q;j(n)],i×si(n)<δ} is non empty as soon as n≥nδ=sup{m|Zδ,m=∅}+1. Note that as δ monotically decreases to zero, n monotically increases to ∞. This implies that for any n, δn=inf{δ|nδ≤n} approaches zero as n grows large.
Note that by monotonicity, Zδ′,n≠∅ as long as δ′>δn. We can then choose k(n) to be an arbitrary element of Zδn+12n,n. It proves the claim above.
STEP2. We now deduce that sq(n) vanishes as n grows large.
We first observe that for any complementary sets H and L=HC we have the weighted balance law that is satisfied:
∑j∈HsjTj,L=∑i∈LsiTi,H.
We now introduce Hn containing the k(n) nodes with most influence and Ln its complementary set. Note that since k(n)/n→0 the condition for balance applies, so coefficient T in above inequality account for a constant.
Let Bn be the set containing the q nodes with most influence, it is contained in Hn for sufficiently large n.
We have
∑j∈Hnsj(n)Tj,Ln≥∑j∈Bnsj(n)Tj,Ln≥sq(n)TBn,Ln≥sq(n)r
using the minimal dispersion. But the LHS is also equal to
∑i∈Lnsi(n)Ti,Hn≤sk(n)TLn,Hn. The last term on the RHS can be large since Ln contains a lot of nodes that may put a lot of weight on Hn, but by balance, it is no more than a constant time THn,Ln which is bounded by |Hn|=k(n).
This together implies that limn→∞sq(n)=0.
STEP3. Let k be the largest integer so that lim supsk(n)>0. If we introduce Hn the set {1,…,k} with its complementary Ln, we can again applied the weighted balance rule to obtain
sk(n)THn,Ln≤∑i∈Hnsi(n)Ti,Ln=∑j∈Lnsj(n)Tj,Hn≤sk+1(n)TLn,Hn.
This implies a contradiction since coefficient Ts are within a constant, the LHS diverges while the RHS converges. It proves that for any k limsk(n)=0, which proves the result.
NB: originally the proof uses that the weighted balance equality may be written
FHTH,L=FLTL,H
where, for any subset A, FA is ∑i∈AsiTi,A¯TA,A¯
but this ratio may somewhat add confusion and is not needed so it's avoided here.
Going further : More on Social Learning
Herding
A classic setting for social learning is the following: each agent receives a private signal and is able to observe some actions made by others (such as in a sequential irreversible public choices). In this setting, even when agents are perfectly rational (i.e. they stricly apply Bayesian rule), the system may enter states in which information is not aggregated. Although collectively the population possess sufficient information to make the best guess with high probability, the process can enter after a few steps a state in which all agents choose the same action although it is suboptimal. This phenomenon is sometimes referred to as herding or information(al) cascade; the former terminology is by far more descriptive and should be preferred.
This first appeared in
A. Banerjee, “A Simple Model of Herd Behavior,” The Quarterly Journal of Economics, vol. 107, no. 3, pp. 797–817, Aug. 1992.
S. Bikhchandani, D. Hirshleifer, and I. Welch, “A theory of fads, fashion, custom, and cultural change as informational cascades,” Journal of political Economy, vol. 100, no. 5, pp. 992–1026, 1992.
And it was recently exposed in textbook:
D. Easley and J. M. Kleinberg, Networks, Crowds, and Markets. Cambridge University Press, 2010. (Chap. 16)
Revisiting herding on social networks
Most works on herding assume that agents see actions of all others in the past, while in many situations each agent may only see a subset (i.e., her friend).
D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, “Bayesian learning in social networks,” The Review of Economic Studies, vol. 78, no. 4, pp. 1201–1236, 2011.
I. Lobel and E. Sadler, “Social learning and network uncertainty,” Available at SSRN 2102401, 2012.
I. Lobel and E. Sadler, “Preferences, Homophily, and Social Learning,” 2013.
D. Gale and S. Kariv, “Bayesian learning in social networks,” Games and Economic Behavior, vol. 45, no. 2, pp. 329–346, Nov. 2003.
A recent
E. Mossel, A. Sly, and O. Tamuz, “Asymptotic learning on Bayesian social networks,” Probability theory and related fields, pp. 1–31, 2012.
Variants of naive learning
G. Ellison and D. Fudenberg, “Rules of Thumb for Social Learning,” Journal of political Economy, vol. 101, no. 4, pp. 612–643, Aug. 1993.
G. Ellison and D. Fudenberg, “Word-of-Mouth Communication and Social Learning,” The Quarterly Journal of Economics, 1995.
V. Bala and S. Goyal, “Learning from Neighbours,” Rev Econ Studies, vol. 65, no. 3, pp. 595–621, Jul. 1998.
A. Banerjee and D. Fudenberg, “Word-of-mouth learning,” Games and Economic Behavior, vol. 46, no. 1, pp. 1–22, Jan. 2004.
A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, “Non-Bayesian social learning,” Games and Economic Behavior, vol. 76, no. 1, pp. 210–225, Sep. 2012.
The robustness and simplicity of iterated averaging lead to a broad interest in designing similar algorithm for various communication computation tasks. They are sometimes refered to as gossip algorithms, rumor mongering, epidemic algorithm, or distributed averaging. It would be impossible to describe all the works done in this area since more than two decades, I'm focusing on very classical results and topics that have seen recent advances.
Gossip algorithms for efficient communication
A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, “Epidemic algorithms for replicated database maintenance,” PODC '87: Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, Dec. 1987.
is generally considered the first examples of application of this principle to the design of distributed system (in this case large databases).
Various works have considered how the topology of the graph constraining the epidemics affects the speed and output of an epidemics process, such as the propagation of a computer virus. Among them:
A. Ganesh, L. Massoulié, and D. Towsley, “The effect of network topology on the spread of epidemics,” INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 2, pp. 1455–1466, 2005.
Present several bounds based on the spectral property of the graph to predict when is an epidemic long-lived and it disappears quickly.
More on distributed consensus
A. Tahbaz-Salehi and A. Jadbabaie, “A Necessary and Sufficient Condition for Consensus Over Random Networks,” IEEE Transactions on Automatic Control, vol. 53, no. 3, pp. 791–795, 2008.
provides a refined conditions based on coefficient of ergodicity of the sequences of matrices that leads to a consensus.
A. Olshevsky and J. N. Tsitsiklis, “Convergence Speed in Distributed Consensus and Averaging,” SIAM J. Control Optim., vol. 48, no. 1, pp. 33–55, Jan. 2009.
Establish general bounds on the convergence time of distributed averaging.
D. Shah, “Gossip Algorithms,” Foundations and Trends® in Networking, vol. 3, no. 1, 2009.
A survey of results in the area with applications to communications, averaging, dealing especially with the case when the gossip algorithms is restricted to a given graph.
Algorithmic complexity of rumor spreading:
R. M. Karp, C. Schindelhauer, S. Shenker, and B. Vocking, “Randomized rumor spreading,” presented at the Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, 2000, pp. 565–574.
Provides tight bounds on the speed and cost (in number of messages) of broadcast in a complete network using gossip algorithms. Introduces a push/pull stages that correspond to the beginning and end of a broadcast. This leaves open the complexity of similar algorithms on arbitrary graphs.
A long standing problem that only got cracked recently is to derive the exact conditions on a graph that allows for a rumor spreading algorithm to achieve a logarithmic time and cost.
F. Chierichetti, S. Lattanzi, and A. Panconesi, “Rumor spreading in social networks,” Proceedings of ICALP, pp. 375–386, 2009.
F. Chierichetti, S. Lattanzi, and A. Panconesi, “Almost tight bounds for rumour spreading with conductance,” Proceedings of the 42nd ACM symposium on Theory of computing, pp. 399–408, 2010.
G. Giakkoupis, “Tight Bounds for Rumor Spreading with Vertex Expansion,” SODA '14: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014.
A recent work analyzed gossip on a variant of the preferential attachment model (that reproduced power law degree distribution as well as some other social properties). It proved the striking result that rumor spreads faster than logarithmically, by exploiting nodes with very large degree or hubs.
B. Doer, M. Fouz, and T. Friedrich, “Why rumors spread so quickly in social networks,” Communications of the ACM, vol. 55, no. 6, p. 70, Jun. 2012.
Written with StackEdit.