Lecture notes for **COMS 6998 - Social Network Economics**

Fall 2013, Columbia University

by A. Chaintreau

M. H. Degroot, “Reaching a Consensus,” Journal of the American Statistical Association, vol. 69, no. 345, pp. 118–121, Mar. 1974.

Summary of contributions:

**The panel expert problem**: You have appointed a panel of experts each with an opinion(x1,…,xn) . How can you obtain a representative opinion∑isixi (where the coefficients(si)i remain to be chosen) that seems a good consensus. The key observation is that not all experts are equally skilled and that, in addition to her own opinion, each expert also has some perception of the skill of other experts.- Proposes
**iterated averaging**as a consensus method. Using experts' judgment on the quality of their peers, this proposes to ask them to weight the opinion of others with theirs in order to refine their opinion. - Proves (THM1-2-3) that
**a consensus is reached**as soon as (i) "states communicate" (i.e. the graph formed by drawing a directed edge from experti to expertj wheni takesj 's opinion into account is a strongly connected graph) and (ii) this graph is not periodic (no integer larger than 1 divides all lengths of all cycles). - The proof can be done either by evoking Perron Frobenius theorem on nonnegative matrices or by interpreting the dynamics as the evolution of an Homogeneous Markov Chain). Moreover, this proves that the weights
(si)i of experts in the final outcome corresponds to the stationary distribution of this Markov Chain, which is also the eigenvector found in Perron Frobenius theorem.

P. M. DeMarzo, D. Vayanos, and J. Zwiebel, “Persuasion Bias, Social Influence, and Unidimensional Opinions,” The Quarterly Journal of Economics, vol. 118, no. 3, pp. 909–968, Aug. 2003.

Summary of contributions:

- Formulates a
**model of naive opinion dynamics**through synchronous averaging. The first step is a Bayesian updates (i.e. it is optimal given the available information). After the first step, this averaging is myopic. It corresponds to agents updating their opinions using their social circle as if the information obtained was entirely new (while in reality it implicitly depends on previous states). **Convergence**can be shown under (i) strongly connected and (ii) self-confidence assumption (THM1) (using the same Markov Chain argument): nodes influence all others according to the same strength, which is the eigenvector of the influence Matrix. This results holds even when we introduce a time-varying self confidence factor that represents how much weight an agent places on her own opinion.**Correctness of consensus**is not guaranteed. It holds (THM2) under a balanced conditions (e.g., for regular networks of agents with identical precision). This is usually not the case, although (THM3) if agents know all the social networks and are individually rational, they can distinguish over time the right signal by solving an LP problem (a result later revisited in E. Mossel and O. Tamuz, “Efficient Bayesian Learning in Social Networks with Gaussian Estimators,” arXiv.org, vol. stat.AP. 03-Feb-2010.).**Dimension collapse**: Even if you start from a multidimensional space, the relative divergences from the common opinion of an agent are all shown to reside in a unidimensional space (THM4) which is related to the second eigenvalue. Not only are nodes not necessarily correct, but they are relatively speaking polarized, which means that the debate reduces to a unidimensional position (e.g., liberal/conservative) which characterize all issues.**Additional facts**: Speed of convergence to consensus and relative divergence are controlled by second and third eigenvalues respectively (THM5). For bilateral communicationqi,j=qj,i , the influence of a node (THM6) corresponds to its degree (or a generalization considering his credit). And relative distance to the mean is same on all dimensions (THM7).

Model and Assumptions:

**Initial belief**for each agentxi(0)=θ+εi whereθ∈ℝL is true state of the world andεi some zero mean error factor, independent among agents and for every dimensions. We defineπ0i=1/E[ϵ2i] as the*precision*(a higher precision denotes an expert with more accurate information), and we assume that nodes truthfully report their initial value.- In addition to her own estimation, each expert
i has**access**to the initial belief from a set of other expertsN(i) (we generally denoteqij=1 iffi listens toj ). Experti estimates the precision of agentsj asπ0i,j . If this estimation is objectively correct, thenπ0i,j=π0j . We assume that all agents also decide to weight their opinion Agents may decide to weight their own opinion differently with time (with a weightλt that may depend on the number of iteration made in this process). **First update:**After one round, and receiving all estimation from her neighbors experti updates her belief toNote thatxi(1)=(1−λ0)∑jqijπ0i,jπ1ixj(0)+λ0xi(0)withπ1i=∑jqijπ0i,j. π1i represents the sum of "precision" accross all information available toi , so it measures the confidencei has after one step of iteration. It is required to normalize the new belief.

This can be rewritten in a matrix form asx(1)=(1−λ0)Tx(0)+λ0x(0) whereTi,j=qi,jπ0i,jπ1i is called the*influence matrix*.- This corresponds to a Bayesian belief update at the first time step, but assuming that nodes naively apply the same rule at any time slot
t , potentially with a different value of self-confidenceλ we can write the evolution of the system as:x(t+1)=T(λt)x(t)whereT(λt)=λt×I+(1−λt)×T.

Main results:

- This process
**converges**(THM1) as soon as (i) the graph defined by adding the edge(i,j) wheneveri listens toj is strongly connected, and (ii)λt>0 and∑t≥01λt=∞ . The limit is independent of the values of(λt)t≥0 and corresponds to the left eigenvector ofT . **Correctness**: Under which conditions is the limit the optimal estimator ofθ ? It is not hard to find counterexamples of estimation of precision are not accurate, so we assume they are. The estimator is the best iff∑jqj,iTj,j=1 (e.g., it holds for regular network with equal precision). This generally is not satisfied by any non-regular structure where some nodes have much larger in-degree and out-degree.**Dimension collapse**: For anyt≥0 and any dimensionl , we introducex¯l(t)=1N∑jxj,l(t) the average opinion in dimensionl , andΔl(t)=1N∑j||xi,l(t)−x¯l(t)|| the mean distance to the average opinion. We can then, for a giveni , definedi,l(t)=xti,l−x¯tlΔtl which represents, for this dimension, how muchi is relatively speaking, positively or negatively different from the average opinion.

The main result is that there exists a vectorD inℝL and a set of real number(ϕi)i for eachi such thatlimt→∞di,l(t)=ϕi⋅Dl . In other words, the relative disagreement vectordi of all nodes becomes with larget arbitrarily closer to the direction defined byD , and occupies a particular position in this unidimensional space that is characterized byϕi .

In many settings such as the one considered above, the entire population as a whole has access to a very large set of information that, if they were centrally known by a single person, would be sufficient to obtain an extremely good estimator of the true state of the world.

Two celebrated papers argue early on mathematically and empirically for this argument in favor of democratic decision, coining the terms "The wisdom of the crowds" in which an estimator that combines many imperfect signals end up being better than any single individual expert, however skilled this expert might be.

M. J. Condorcet, “Essai sur l‘application de l’analyse à la probabilité des décisions rendus à la pluralité des voix,” Imprimerie Royale, 1785.

Essentially formulates a weak law of large numbers in favor of using a democratic process in court.

F. Galton, “Vox Populi,” Nature, vol. 75, no. 1949, pp. 450–451, Mar. 1907.

Empirically compares the average of all proposed answers to an "ox weight guessing competition" organized in a stock and poultry exhibition. Shows that the median is surprisingly close to the truth and the deviation of the answer is comparable to a normal distribution

However, many social systems do not allow information owned by all individuals to be known in a public manner. Instead, information is implicitly aggregated in an interaction process. Whether or not this aggregation is successful depends on various factors: the limitation of one individual's access to others, and the possibility that information is given too much weight if the social interaction process is unbalanced.

B. Golub and M. O. Jackson, “Naïve Learning in Social Networks and the Wisdom of Crowds,” American Economic Journal: Microeconomics, vol. 2, no. 1, pp. 112–149, Feb. 2010.

Can be seen as following up DeMarzo et al. 2003 paper as a starting point and answering the following question: Since the estimator is not necessarily optimal, under which conditions this estimator becomes asymptotically exacts as the population of nodes grow.

Summary of contributions:

- Extends
**convergence**of iterated consensus beyond the case of strongly connected network and aperiodic case. It shows thatT is convergent as long as it is strongly aperiodic (*i.e.*, on any closed subsets, greatest common divisor of lengths of cycles is 1). This is also equivalent to the existence of a row vector(si)1≤i≤n estimating the influence of each node in the associated groups, while other nodes converge to a weighted version. (PROP1-THM3-4). - Defines
**wise crowds**as series of graph/influence matrices for whom the long term evolution always approach the right answer almost surely. Prove that a crowd is wise iff the maximum influence of a node in the crowd asymptotically vanishes as the population grows (*i.e.*,limn→∞maxisi(n)=0 ) (PROP-2), wheresi(n) denotes the weights of nodei in the system containingn nodes. - Provides a
**counterexample**: A*family*(i.e., a collection of subsetsBn defined for alln ) that is small (bounded independently ofn ) and influential (uniformly prominent on all other nodes) is sufficient to block wisdom (PROP 3). - On the positive side, shows that two conditions (balance and minimal dispersion) are
**sufficient**for the wisdom of the crowd (THM1). Without being strictly speaking necessary, counter-examples show that there is no hope to achieve wisdom in general when only one condition holds.

Model and Assumptions:

- Very similar model to DeMarzo et al. above. Every nodes initially has a signal, independent of others, which is a real number centered on
θ with variance≥σ2 . This last assumption allows to avoid cases in which some nodes have individually a perfect estimator and no information aggregation is stricto sensu needed. - Nodes influence by other according to a matrix
(Ti,j)i,j that roughly speaking correspond to self condifenceλt that do not depend ont but may differ among the nodes. It is possible that some nodes have no self confidence. A similar**convergence**result is shown If it is strongly connected, then under aperiodicity we have:∀x0 initial belief of nodeslimt→∞x(t)=Ttx(0)=sTx0 wheres is the unique left eigenvector ofT associated with eigenvalue1 whose (positive) entries sum to1 . **Notation**:- Since we consider sequences of larger and larger populations we generally introduce the notation
T(n) for the associated influence matrix, ands(n) for the associated row eigenvector that also represents the weight of each node in the final consensus. - For any subset
A andB , we defineTA,B=∑i∈A,j∈BTi,j which is roughly speaking how much as a whole the nodes from subsetA weights the opinion of all nodes inB in their updated average. - Similarly, we define
sA=∑i∈Asi as the influence of this subset. - For any influence matrix
T , we say thatB isα -prominent ifTi,B≥α for anyi∉B .

- Since we consider sequences of larger and larger populations we generally introduce the notation

Proof of counterexample (PROP3):

We assume **finite** (i.e. **uniformly prominent**:

We wish to prove it implies that the wisdom of the crowd can fail in that case.

- According to PROP2 above, it is sufficient to show that
si(n) does not go to zero for all nodes. - Note first that for any influence matrix,
B α -prominent implies thatsB≥α1−α .

Indeed, the influencesi sums to 1, and we have, sinces is a left eigenvector ofT that∀i,si=∑jTj,isj≥∑j∉BTj,isj hence∑i∈Bsi≥∑i∈B∑j∉BTj,isj≥α∑j∉Bsj ifB isα -prominent.

Since∑j∉Bsj=1−∑i∈Bsi , this sum is bounded away from zero as follows:sB≥α1+α . - This implies immediately
sBn(n)≥α1+α becauseBn isα -prominent for an iterate ofT(n) , which will have the same left eigenvector associated with1 . - Finally, we observe that this implies
maxi∈Bnsi(n)≥1|B|α1+α . A family that is finite and uniformly prominent hence satisfies that one node in it receives at least a positive fraction of influence and it proves the result.

Proof of sufficient conditions for wisdom of the crowd (THM1):

We wish to prove that a crowd is wise if we have

(i) **min dispersion**: There exists

(ii) **balance**:

NB: Min dispersion implies that no family (except very small one) misses to put weight at least

r on any subset that misses a negligible fraction of nodes. Balance ensures that the weights given by others to a family cannot be arbitrarily far from the weight this family gives back, except when it is extremely.We assume without loss of generality that nodes are reordered so that

si≥si+1 for alli . Note that we can assume without loss of generality as well thatj(n)/n→0 .**STEP1**. We first deduce there existskn such thatk(n)≥q,k(n)/n→0,k(n)×sk(n)(n)→0 .

We note, that for anyδ>0 the property{∀i∈[q;j(n)],i×si(n)≥δ} can only be satisfied by a finite number ofn . This is because this property implies∑j(n)i=qsi(n)≥∑j(n)i=qδi , the LHS is finite while the RHS diverges asn grows large.

We deduce∀δ>0 ,Zδ,n={∀i∈[q;j(n)],i×si(n)<δ} is non empty as soon asn≥nδ=sup{m|Zδ,m=∅}+1 . Note that asδ monotically decreases to zero,n monotically increases to∞ . This implies that for anyn ,δn=inf{δ|nδ≤n} approaches zero asn grows large.

Note that by monotonicity,Zδ′,n≠∅ as long asδ′>δn . We can then choosek(n) to be an arbitrary element ofZδn+12n,n . It proves the claim above.**STEP2**. We now deduce thatsq(n) vanishes asn grows large.

We first observe that for any complementary setsH andL=HC we have the*weighted balance law*that is satisfied:We now introduce∑j∈HsjTj,L=∑i∈LsiTi,H. Hn containing thek(n) nodes with most influence andLn its complementary set. Note that sincek(n)/n→0 the condition for balance applies, so coefficientT in above inequality account for a constant.Let

Bn be the set containing theq nodes with most influence, it is contained inHn for sufficiently largen .We have

∑j∈Hnsj(n)Tj,Ln≥∑j∈Bnsj(n)Tj,Ln≥sq(n)TBn,Ln≥sq(n)r using the minimal dispersion. But the LHS is also equal to∑i∈Lnsi(n)Ti,Hn≤sk(n)TLn,Hn . The last term on the RHS can be large sinceLn contains a lot of nodes that may put a lot of weight onHn , but by balance, it is no more than a constant timeTHn,Ln which is bounded by|Hn|=k(n) . This together implies thatlimn→∞sq(n)=0 .**STEP3**. Letk be the largest integer so thatlim supsk(n)>0 . If we introduceHn the set{1,…,k} with its complementaryLn , we can again applied the weighted balance rule to obtainsk(n)THn,Ln≤∑i∈Hnsi(n)Ti,Ln=∑j∈Lnsj(n)Tj,Hn≤sk+1(n)TLn,Hn . This implies a contradiction since coefficientT s are within a constant, the LHS diverges while the RHS converges. It proves that for anyk limsk(n)=0 , which proves the result.NB: originally the proof uses that the weighted balance equality may be written

FHTH,L=FLTL,H where, for any subsetA ,FA is∑i∈AsiTi,A¯TA,A¯ but this ratio may somewhat add confusion and is not needed so it's avoided here.

**Herding**

A classic setting for social learning is the following: each agent receives a private signal and is able to observe some actions made by others (such as in a sequential irreversible public choices). In this setting, even when agents are perfectly rational (i.e. they stricly apply Bayesian rule), the system may enter states in which information is not aggregated. Although collectively the population possess sufficient information to make the best guess with high probability, the process can enter after a few steps a state in which all agents choose the same action although it is suboptimal. This phenomenon is sometimes referred to as *herding* or *information(al) cascade*; the former terminology is by far more descriptive and should be preferred.

This first appeared in

A. Banerjee, “A Simple Model of Herd Behavior,” The Quarterly Journal of Economics, vol. 107, no. 3, pp. 797–817, Aug. 1992.

S. Bikhchandani, D. Hirshleifer, and I. Welch, “A theory of fads, fashion, custom, and cultural change as informational cascades,” Journal of political Economy, vol. 100, no. 5, pp. 992–1026, 1992.

And it was recently exposed in textbook:

D. Easley and J. M. Kleinberg, Networks, Crowds, and Markets. Cambridge University Press, 2010. (Chap. 16)

**Revisiting herding on social networks**

Most works on herding assume that agents see actions of all others in the past, while in many situations each agent may only see a subset (i.e., her friend).

D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar, “Bayesian learning in social networks,” The Review of Economic Studies, vol. 78, no. 4, pp. 1201–1236, 2011.

I. Lobel and E. Sadler, “Social learning and network uncertainty,” Available at SSRN 2102401, 2012.

I. Lobel and E. Sadler, “Preferences, Homophily, and Social Learning,” 2013.

D. Gale and S. Kariv, “Bayesian learning in social networks,” Games and Economic Behavior, vol. 45, no. 2, pp. 329–346, Nov. 2003.

A recent

E. Mossel, A. Sly, and O. Tamuz, “Asymptotic learning on Bayesian social networks,” Probability theory and related fields, pp. 1–31, 2012.

**Variants of naive learning**

G. Ellison and D. Fudenberg, “Rules of Thumb for Social Learning,” Journal of political Economy, vol. 101, no. 4, pp. 612–643, Aug. 1993.

G. Ellison and D. Fudenberg, “Word-of-Mouth Communication and Social Learning,” The Quarterly Journal of Economics, 1995.

V. Bala and S. Goyal, “Learning from Neighbours,” Rev Econ Studies, vol. 65, no. 3, pp. 595–621, Jul. 1998.

A. Banerjee and D. Fudenberg, “Word-of-mouth learning,” Games and Economic Behavior, vol. 46, no. 1, pp. 1–22, Jan. 2004.

A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi, “Non-Bayesian social learning,” Games and Economic Behavior, vol. 76, no. 1, pp. 210–225, Sep. 2012.

The robustness and simplicity of iterated averaging lead to a broad interest in designing similar algorithm for various communication computation tasks. They are sometimes refered to as gossip algorithms, rumor mongering, epidemic algorithm, or distributed averaging. It would be impossible to describe all the works done in this area since more than two decades, I'm focusing on very classical results and topics that have seen recent advances.

**Gossip algorithms for efficient communication**

A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry, “Epidemic algorithms for replicated database maintenance,” PODC '87: Proceedings of the sixth annual ACM Symposium on Principles of distributed computing, Dec. 1987.

is generally considered the first examples of application of this principle to the design of distributed system (in this case large databases).

Various works have considered how the topology of the graph constraining the epidemics affects the speed and output of an epidemics process, such as the propagation of a computer virus. Among them:

A. Ganesh, L. Massoulié, and D. Towsley, “The effect of network topology on the spread of epidemics,” INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, vol. 2, pp. 1455–1466, 2005.

Present several bounds based on the spectral property of the graph to predict when is an epidemic long-lived and it disappears quickly.

**More on distributed consensus**

A. Tahbaz-Salehi and A. Jadbabaie, “A Necessary and Sufficient Condition for Consensus Over Random Networks,” IEEE Transactions on Automatic Control, vol. 53, no. 3, pp. 791–795, 2008.

provides a refined conditions based on coefficient of ergodicity of the sequences of matrices that leads to a consensus.

A. Olshevsky and J. N. Tsitsiklis, “Convergence Speed in Distributed Consensus and Averaging,” SIAM J. Control Optim., vol. 48, no. 1, pp. 33–55, Jan. 2009.

Establish general bounds on the convergence time of distributed averaging.

D. Shah, “Gossip Algorithms,” Foundations and Trends® in Networking, vol. 3, no. 1, 2009.

A survey of results in the area with applications to communications, averaging, dealing especially with the case when the gossip algorithms is restricted to a given graph.

**Algorithmic complexity of rumor spreading:**

R. M. Karp, C. Schindelhauer, S. Shenker, and B. Vocking, “Randomized rumor spreading,” presented at the Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, 2000, pp. 565–574.

Provides tight bounds on the speed and cost (in number of messages) of broadcast in a complete network using gossip algorithms. Introduces a push/pull stages that correspond to the beginning and end of a broadcast. This leaves open the complexity of similar algorithms on arbitrary graphs.

A long standing problem that only got cracked recently is to derive the exact conditions on a graph that allows for a rumor spreading algorithm to achieve a logarithmic time and cost.

F. Chierichetti, S. Lattanzi, and A. Panconesi, “Rumor spreading in social networks,” Proceedings of ICALP, pp. 375–386, 2009. F. Chierichetti, S. Lattanzi, and A. Panconesi, “Almost tight bounds for rumour spreading with conductance,” Proceedings of the 42nd ACM symposium on Theory of computing, pp. 399–408, 2010. G. Giakkoupis, “Tight Bounds for Rumor Spreading with Vertex Expansion,” SODA '14: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014.

A recent work analyzed gossip on a variant of the preferential attachment model (that reproduced power law degree distribution as well as some other social properties). It proved the striking result that rumor spreads faster than logarithmically, by exploiting nodes with very large degree or hubs.

B. Doer, M. Fouz, and T. Friedrich, “Why rumors spread so quickly in social networks,” Communications of the ACM, vol. 55, no. 6, p. 70, Jun. 2012.

Written with StackEdit.