The veritasium of data.

I came across an interesting fact on the podcast "The Seen and the Unseen", hosted by Amit Varma: if you type "My husband wants me to..." into Google, the top autocomplete suggestion in India and Bangladesh is "breastfeed him". A statistical answer to such a private matter captivated me and sent me down a rabbit hole to Seth Stephens-Davidowitz, a trained economist who mines big data from tech giants to answer some difficult questions.

Let's start by complimenting the cover of the book Everybody Lies (shown below). The camouflaged elephant in the room is an apt image for a book about lying.

Some notable insights that speak to the revolutionary nature of the book:


1. The effect of cold on depression searches: cold places search more for depression-related terms than warm ones. Intuitive, right? The book goes further and explains that warmer weather is twice as effective as antidepressants!


2. Should a partner with the same interests and a shared friend circle lead to a more stable relationship? No. Facebook's study of relationships points the other way. Perhaps having distinct friend circles, to get away from the pressure of always being together, makes for a more stable pairing.


3. Do top-performing athletes come from difficult neighborhoods, like LeBron? No. You are actually more likely to become a top athlete if you come from a well-to-do family. I think this safely extends to every other competitive profession. Cultural capital?



4. Why Google trumped all other search engines. Pre-Google search engines ranked a webpage largely by counting how many times the search term appeared on it, so a search for cars could return a porn page with "car" written on it many times. Larry Page and Sergey Brin devised a better metric: rank a page by the links pointing to it from other websites, weighted by how important those linking sites are, aka PageRank. The rest is history.


5. Looking to buy high-end wine at an auction? You can estimate its price with an equation that uses just three inputs: winter rainfall, average growing-season temperature, and harvest rainfall. It was discovered by the economist Orley Ashenfelter and is known as the first law of viticulture.


6. The sharp shift in usage from "the United States are" to "the United States is" after the Civil War. So much history, politics, and culture distilled into a 1-D signal. Amazing!


7. Women's Facebook posts used "tomorrow" more than men's. Are men bad at planning ahead? Nothing conclusive, but interesting.


8. The power of sentiment analysis, and using it to categorize story plots: rags to riches (rise), riches to rags (fall), Cinderella (rise, fall, rise), etc.


9. A study by Jonah Berger and Katherine L. Milkman showed that the more positive a news article, the more it is shared. Very counterintuitive! We usually assume journalism tends toward the fatalistic, but that seems to be the wrong paradigm.


10. The stark conflict between traditional surveys, in which only 25 percent of men and 8 percent of women admit to watching pornography, and search data showing that "porn" is searched for more than "weather". Are some truths more reliably found on the internet than in surveys? Yes, because we lie.


11. Before having kids, people search "regret not having kids" seven times more than "regret having kids". This flips after having kids: people are then 3.6 times more likely to admit to Google that they "regret having kids" than that they regret "not having kids". Kids are a pain. Period.


12. The search "Is my husband..." is completed with "gay" more often than with "cheating" (10 percent more often) or "alcoholic" (8 times more often).


13. Combining porn and search data points to about 5 percent of men being gay.


14. Among the top Pornhub searches by women is a genre of porn that emphasizes the pain or humiliation of women (about 25 percent of searches by women).


15. New sign-ups to Stormfront (a racist online forum) peaked the day after Obama was elected.


16. "The internet leads to segregation" is the mantra these days: websites cater to the fringes. But there is a caveat. You are more likely to encounter someone with an opposing political view on a website (45.2 percent) than among your friends (34 percent), where 50 percent would mean perfect desegregation, i.e. you are equally likely to meet a conservative or a liberal. Time to rethink our definitions of white nationalists and left-liberals in the internet age.


17. Child abuse usually tracks the unemployment rate, yet according to traditional child-services data it did not rise during the 2008 recession. Search data said otherwise: children's searches about being hit shot up along with the unemployment rate. The explanation is that reporting dropped during that time, because the people who usually report abuse, such as teachers, were overworked or unemployed during the recession.


18. The falsity of social media data, shown by the popularity of The Atlantic (a highbrow magazine) versus the National Enquirer (a sensationalist tabloid) on social media. Their circulations are about equal, but The Atlantic is 30 times more popular than the National Enquirer on social media. People cultivate an image on social media, which exaggerates the lying.


19. Users were outraged by Facebook's News Feed for being stalker-esque, yet the data showed a sharp rise in engagement with the feature. People posture; data doesn't.


20. A table on page 157 of the book tabulates successes that came from questioning conventional wisdom and finding its exceptions, such as "Zuckerberg got rich learning that although people say they don't want to stalk their friends, they actually do: people want to judge their friends and keep up with them" and "People say they don't want to read about bondage, yet Fifty Shades of Grey sold 125 million copies". This was a very enlightening bit.


21. You are most likely to support the sports team that was dominating when you were around 10, and the likelihood drops off on either side of that age. For example, people in their 20s support Chelsea or Manchester United, while Gen Z would be fans of Manchester City, Liverpool, etc. Likewise Vettel versus Hamilton in F1. The same holds for politics.


22. People who cheat on their taxes tend to live near more tax professionals and people who know how to cheat.


23. People born in college towns like Ann Arbor, Madison, and Ithaca, New York, are more likely to become exceptional figures (as measured by having a Wikipedia page). Another obvious correlate is growing up in an urban setting.


24. Should violent movies increase crime rates? No! The exact opposite: crime rates drop when a violent movie is showing. What's the catch? Alcohol. Movie theaters generally don't serve alcohol, so alcohol-related crimes drop while would-be offenders are watching the film. The takeaway: keeping violent, aggressive men apart is the best strategy for combating crime and conflict.


25. Doppelganger search: finding the people most similar to someone in order to predict sports career trajectories, recommend books, choose health procedures, etc.


26. The curse of dimensionality: lots of variables chasing few data points. Imagine you have 1,000 coins, flip each one every day, and check whether it matches the S&P going up or down. Maybe coin 391 gets it right 70 percent of the time, and that looks statistically significant. Do you keep flipping coin 391 from now on to predict the market? No. With that many coins, some will match the market by pure chance. This is the curse of dimensionality.
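
Item 26's coin thought experiment is easy to simulate (statisticians also call it the multiple comparisons problem). Below is a minimal Python sketch, assuming a synthetic market that goes up or down at random and 100 days of history; the function names and parameters are my own, not from the book. It shows that the best of 1,000 fair coins looks impressively "predictive" in hindsight, while a fair coin on fresh days matches only about half the time.

```python
import random

def best_coin_in_sample(n_coins=1000, n_days=100, seed=0):
    """Flip n_coins fair coins once a day for n_days and score each coin by how
    often its flip matches a synthetic up/down market series (an assumption).
    Returns (index of the best coin, its in-sample hit rate)."""
    rng = random.Random(seed)
    market = [rng.random() < 0.5 for _ in range(n_days)]  # True = market went up
    best_idx, best_rate = -1, -1.0
    for i in range(n_coins):
        flips = [rng.random() < 0.5 for _ in range(n_days)]  # True = heads = "predicts up"
        rate = sum(f == m for f, m in zip(flips, market)) / n_days
        if rate > best_rate:
            best_idx, best_rate = i, rate
    return best_idx, best_rate

def fresh_hit_rate(n_days=10000, seed=1):
    """The 'winning' coin is still a fair coin, so on new days it matches a
    random market about half the time, whatever its past record."""
    rng = random.Random(seed)
    hits = sum((rng.random() < 0.5) == (rng.random() < 0.5) for _ in range(n_days))
    return hits / n_days

idx, rate = best_coin_in_sample()
print(f"best coin #{idx} matched the market on {rate:.0%} of past days")
print(f"on fresh days a fair coin matches {fresh_hit_rate():.0%} of the time")
```

The in-sample winner is selected after the fact from 1,000 tries, which is exactly why its record says nothing about the future.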

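Item 4's idea of ranking pages by the links pointing to them can be sketched with the classic power-iteration form of PageRank. This is a simplified textbook version, not Google's production ranking; the damping factor 0.85 is the value suggested in Brin and Page's original paper, and the toy web graph and page names below are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> rank; the ranks sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets a baseline "random jump" share.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank on, split equally among the pages it links to.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank

web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "spam.example": ["c.example"],  # keyword-stuffed page that nobody links to
}
ranks = pagerank(web)
for page, r in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:14s} {r:.3f}")
```

However many times spam.example repeats a search term, it only receives the baseline random-jump share, because no page links to it; c.example, which most pages link to, ends up on top.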
 
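Item 5's wine equation is, at heart, a linear regression: price as a weighted sum of winter rainfall, average growing-season temperature, and harvest rainfall, plus a constant. The sketch below fits such an equation by ordinary least squares in plain Python. The vintages, units, and "true" coefficients are invented for illustration and are not Ashenfelter's estimates (his published model works with the logarithm of price).

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(rows, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept b0
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

# Hypothetical rule used only to generate fake data:
# price = 1.0 + 0.001*winter_rain + 0.06*growing_temp - 0.004*harvest_rain
TRUE_COEFS = [1.0, 0.001, 0.06, -0.004]
vintages = [  # (winter rainfall mm, avg growing-season temp C, harvest rainfall mm), invented
    (600, 17.1, 160), (690, 16.7, 80), (502, 17.1, 130), (420, 16.1, 110),
    (582, 16.4, 187), (485, 17.3, 187), (763, 16.3, 290), (830, 17.0, 38),
]
prices = [predict(TRUE_COEFS, v) for v in vintages]
coefs = fit_linear(vintages, prices)
print([round(c, 4) for c in coefs])
```

Because the fake prices contain no noise, the fit recovers the generating coefficients; with real auction prices you would also want to check how well the equation predicts vintages it was not fitted on.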

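Item 25's doppelganger search is essentially nearest-neighbour search: describe each person as a vector of features, find the most similar people, and use what happened to them as the forecast. A minimal sketch, assuming Euclidean distance and a hypothetical pool of players described by (age, points per game) with a made-up next-season score; the names and numbers are illustrative only.

```python
import math

def doppelgangers(target, pool, k=2):
    """Return the k profiles in pool whose feature vectors are closest to target (Euclidean)."""
    return sorted(pool, key=lambda p: math.dist(target, p["features"]))[:k]

def predict_next(target, pool, k=2):
    """Forecast the target's next value as the mean of its doppelgangers' next values."""
    twins = doppelgangers(target, pool, k)
    return sum(p["next"] for p in twins) / len(twins)

pool = [  # hypothetical players: (age, points per game) and next-season points per game
    {"name": "A", "features": (24, 20.0), "next": 22.0},
    {"name": "B", "features": (25, 21.0), "next": 23.0},
    {"name": "C", "features": (33, 20.5), "next": 15.0},
    {"name": "D", "features": (23, 8.0), "next": 9.0},
]
target = (24, 20.5)
print([p["name"] for p in doppelgangers(target, pool)], predict_next(target, pool))
```

Age and points per game sit on different scales here, so in practice you would standardize each feature before measuring distance and use many more features than two.
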
I structured this review as a set of notes I can quickly go back to for the book's ideas. If any of these insights intrigue you, do check out the book, which lays out much more than what is mentioned here.
