Mark Zuckerberg, CEO of Facebook, testified before the United States Congress yesterday about Facebook's response to the Cambridge Analytica data scandal. We've collected the details data scientists should know about the Zuckerberg testimony.
In a nutshell:
- Zuck performed better than expected by calmly answering all questions and effectively advocating for Facebook; he repeated, “We do not sell data to advertisers,” five times
- The Senators were uninformed about technology issues while being laser-focused on cultural issues, like Senator Ted Cruz’s questions about conservative Facebook pages being shut down
- Facebook is not prepared for GDPR and Zuckerberg agrees additional US regulation is needed
The April 10th, 2018 hearing included 44 Senators from the Commerce and Judiciary Committees. The Senators are investigating the Cambridge Analytica scandal while promoting three bills related to online data privacy.
Senator Chuck Grassley’s opening remarks illustrate clearly why this hearing is critical to all involved in data science:
“Although not unprecedented, this is a unique hearing. The issues we will consider range from data privacy and security to consumer protection and the Federal Trade Commission enforcement touching on jurisdictions of these two committees.”
These are key issues for our industry that we'll discuss in detail, including Zuckerberg's straw-man arguments about data, the implications of this testimony for upcoming regulation of Silicon Valley, and some great nuggets from his notes.
To read a full transcript of the hearing and opening remarks, check out the Washington Post.
Zuckerberg’s Straw Men
There are a few themes Zuckerberg frequently repeated during his first day of testimony: that users have always had full control over their data, that Facebook doesn't sell data to advertisers, and that the data breach only affected publicly available information. The overarching narrative Zuckerberg presented is that you have full control of your data and only license it to Facebook until you revoke it.
Early on, Zuckerberg told Senator Nelson, “Senator, people have a control over how their information is used in ads in the product today.”
Zuckerberg returned to this theme very early in the hearing, “the first line of our Terms of Service say that you control and own the information and content that you put on Facebook.”
None of the Senators asked cogent follow-up questions about the ownership and control of data. It is clear from the public response to the Cambridge Analytica data scraping that people were unaware of how much data they were sharing. This question of data ownership and control is certain to be key to any new US regulations.
Zuckerberg said no fewer than five times, "we do not sell data to advertisers. We don't sell data to anyone." Senator Cornyn picked up on this red herring and immediately replied, "Well, you clearly rent it."
Zuckerberg replied, "What we allow is for advertisers to tell us who they want to reach, and then we do the placement. So, if an advertiser comes to us and says, 'All right, I am a ski shop and I want to sell skis to women,' then we might have some sense, because people shared skiing-related content, or said they were interested in that, they shared whether they're a woman, and then we can show the ads to the right people without that data ever changing hands and going to the advertiser."
This is a key issue that hinges largely on semantics and outdated norms around data ownership. Facebook's official policy is that it sells advertising, and advertisers pay for the placements. Facebook considers the data collection to be underlying infrastructure for the advertising. This obviously conflicts with what we've learned about Cambridge Analytica and at least 600 other apps aggressively scraping Facebook data.
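The targeting model Zuckerberg describes can be sketched as a matching function that runs entirely inside the platform: the advertiser supplies an audience specification, the platform selects who sees the ad, and profile data never changes hands. This is a minimal illustration with hypothetical names and toy data, not Facebook's actual system.

```python
# Illustrative sketch of platform-side ad targeting: the advertiser only
# submits an audience spec; the matching against user data happens inside
# the platform, and no profile data is returned to the advertiser.

users = [
    {"id": 1, "gender": "female", "interests": {"skiing", "cooking"}},
    {"id": 2, "gender": "male", "interests": {"skiing"}},
    {"id": 3, "gender": "female", "interests": {"running"}},
]

def place_ad(ad_spec, users):
    """Return the user IDs that receive the placement. The advertiser
    sees that placements happened, not the underlying profiles."""
    matches = []
    for user in users:
        if ad_spec.get("gender") and user["gender"] != ad_spec["gender"]:
            continue
        if ad_spec["interests"] & user["interests"]:
            matches.append(user["id"])
    return matches

# "I am a ski shop and I want to sell skis to women"
ski_shop_ad = {"gender": "female", "interests": {"skiing"}}
print(place_ad(ski_shop_ad, users))  # [1]
```

Whether this counts as "selling data" is exactly the semantic dispute Senator Cornyn raised: the data never leaves the platform, but access to its predictive value is what is being rented.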
Zuckerberg was also careful to suggest that all of the data shared with apps like Cambridge Analytica's was public profile information, like names, birthdates, and friends. He did not address the recent revelations that developers could see direct messages using the API, nor that Facebook collected information on text messages and phone calls. His notes also spell out, "no credit card or social security numbers were compromised." Zuckerberg is trying to minimize regulatory and legal fallout here by minimizing the perceived value of the information.
Reactive Humans vs. Proactive AI
Zuckerberg spoke at length about Facebook's use of AI twice. Zuckerberg believes A.I. will solve many of the problems Facebook has with data scraping and permissions… eventually.
This long response to Senator Thune's question about using tools to identify hate speech is very interesting. Facebook is transitioning from responding to human complaints to an AI-based toolset for identifying hate speech. The entire response is quoted below:
Yes, Mr. Chairman. I’ll speak to hate speech, and then I’ll talk about enforcing our content policies more broadly. So — actually, maybe, if — if you’re OK with it, I’ll go in the other order.
So, from the beginning of the company in 2004 — I started in my dorm room; it was me and my roommate. We didn’t have A.I. technology that could look at the content that people were sharing. So — so we basically had to enforce our content policies reactively.
People could share what they wanted, and then, if someone in the community found it to be offensive or against our policies, they’d flag it for us, and we’d look at it reactively. Now, increasingly, we’re developing A.I. tools that can identify certain classes of bad activity proactively and flag it for our team at Facebook.
By the end of this year, by the way, we’re going to have more than 20,000 people working on security and content review, working across all these things. So, when content gets flagged to us, we have those — those people look at it. And, if it violates our policies, then we take it down.
Some problems lend themselves more easily to A.I. solutions than others. So hate speech is one of the hardest, because determining if something is hate speech is very linguistically nuanced, right?
It’s — you need to understand, you know, what is a slur and what — whether something is hateful not just in English, but the majority of people on Facebook use it in languages that are different across the world.
Contrast that, for example, with an area like finding terrorist propaganda, which we’ve actually been very successful at deploying A.I. tools on already.
Today, as we sit here, 99 percent of the ISIS and Al Qaida content that we take down on Facebook, our A.I. systems flag before any human sees it. So that’s a success in terms of rolling out A.I. tools that can proactively police and enforce safety across the community.
Hate speech — I am optimistic that, over a 5 to 10-year period, we will have A.I. tools that can get into some of the nuances — the linguistic nuances of different types of content to be more accurate in flagging things for our systems.
But, today, we’re just not there on that. So a lot of this is still reactive. People flag it to us. We have people look at it. We have policies to try to make it as not subjective as possible. But, until we get it more automated, there is a higher error rate than I’m happy with.
Time will tell whether these artificially intelligent bots mitigate or exacerbate the problem, like the algorithm now powering their news plugin.
Does America Need a GDPR? Zuck Agrees
Many of the Senators discussed two impending data protection statutes and the need for an American equivalent to GDPR. Senator Nelson brought up new regulation in his opening remarks:
“Let me just cut to the chase,” said Senator Bill Nelson, a Democrat, before Zuckerberg started giving evidence. “If you and other social media companies do not get your act in order, none of us are going to have any privacy any more. If Facebook and other online companies will not or cannot fix the privacy invasions, then we are going to have to. We, the Congress.”
Later, Senator Markey asked if Zuck would support an American GDPR law, “Europeans have passed that as a law. Facebook’s going to live with that law beginning on May 25th. Would you support that as the law in the United States?”
Zuckerberg replied, “Senator, as a principle, yes, I would.”
Hours later, Senator Cantwell asked, “Do you believe European regulations should be applied here in the U.S.?”
Zuckerberg replied, “Senator, I think everyone in the world deserves good privacy protection. And, regardless of whether we implement the exact same regulation, I would guess that it would be somewhat different, because we have somewhat different sensibilities in the U.S. as to other countries.
We’re committed to rolling out the controls and the affirmative consent and the special controls around sensitive types of technology, like face recognition, that are required in GDPR. We’re doing that around the world.”
Data From Mark’s Notes
An interesting facet of the hearing is that Zuckerberg’s notes were photographed clearly and shared on Twitter:
— Stefan Becket (@becket) April 10, 2018
Here are some interesting tidbits for data scientists:
- “People who used app gave Kogan FB information like public profile, page likes, friend list + birthday; same for friends’ whose settings allowed sharing; NO credit card/SSN info.”
- “Malicious actors linked public info (name, profile photo, gender, user ID) to phone numbers they already had; shut it down. Need to do more to prevent abuse.”
- “Working to be more proactive; AI, hiring more people e.g. terror, e.g. suicide.”
- “(Don’t say we already do what GDPR requires)”
Links to Further Reading on the Zuckerberg Testimony