Some Developers Don't Know What Their Apps Do With Your Data. Here's Why.

Most apps use off-the-shelf code—and some of it can be risky


A couple of years ago, Disconnect, a small tech company in San Francisco, was approached with an enticing offer: For every 100,000 people who used Disconnect’s apps, a company called Elephant Data promised to pay it $1,000 a month. All Disconnect had to do was add a few lines of code to its apps.

Thousands of dollars a month is a tidy sum for a small app company, but Disconnect turned down the offer. It develops apps and research that promote digital privacy—and occasionally collaborates with Consumer Reports on security investigations. Proposals like Elephant's often come from companies trying to collect user data for advertising, which could not be more at odds with Disconnect's mission.

As it turns out, what Elephant was doing was much worse. Last year, an investigation from Upstream, a mobile security company, found that Elephant Data’s code secretly recruited consumers’ phones into a scheme that jacked up their phone bills and contributed to the tens of billions of dollars digital ad networks lose to fraud every year.

Elephant was shopping around a shady SDK, or software development kit. SDKs are important building blocks found in almost every kind of software, including phone apps, and they range from the utterly mundane to the explicitly malicious. In between these extremes lies an entire world of code supplied by data brokers and app analytics companies with cheery tech-startup names the average consumer has probably never heard of.

But these data companies have heard about you: They exist to trade in information about who you are, the things you like, everywhere you go, and what you do on your phone.


Making an app is a lot like putting together a Lego set. Rather than designing every element from scratch, developers spend much of their time assembling bits of code written by other people. SDKs and code libraries allow an app-maker to offer basic functions, like a login page or notifications, without having to cook them up anew. These software tools, which are usually licensed for a fee or at no charge, enable developers to run ads in their app, figure out who’s downloading it, and know when it crashes.

Making an app without these tools would be like building a house by first mining clay for bricks or felling lumber for beams. “Novice and professional programmers alike by necessity only focus on a tiny fraction of the code in the final products they create—far less than 1 percent,” says Cynthia Lee, a Stanford computer science lecturer.

For consumers worried about their data, that's both good and bad.

On one hand, you wouldn’t want every last app developer to try making the most complex or sensitive parts of a program on its own, such as the tools that encrypt personal information before sending it over the internet. Unless the developer is a cryptography expert, it's liable to make a mistake that would leave your data vulnerable to hackers. Instead, a developer will usually rely on code from a trusted company for these difficult jobs.

On the other hand, the hodgepodge of various people’s code in any one app invites privacy perils. An app-maker that wants to add a new feature—a chat function, say—may just choose one based on recommendations on an online forum without thinking too hard about what the new code will do with users’ data.

“We’re at this state now where a lot of developers are pulling in code that they couldn't even explain how it works—they just know the end result,” says Patrick Jackson, Disconnect’s chief technology officer. “That’s dangerous, because if you’re blindly just pulling in any [code] library and you don’t know what it’s doing, you’re putting your users at risk.”

This means when you open up an app, you’re not just dealing with the company that made it. Sensitive information about you gets shipped off to all sorts of other companies whose code is tucked into that app, too. Suddenly, you’ve entered a relationship with a dozen data brokers or marketing firms you’ve probably never heard of, without your knowledge or explicit consent—or even good oversight by the app developer.

Some of these marketing-oriented SDKs have a shockingly expansive reach. Code by AppsFlyer, a marketing company, is found in more than 40 percent of the apps on the Google Play Store, according to data from AppFigures, a research firm. Chartboost, which makes software for displaying in-app ads, shows up in 21 percent of the apps in Apple’s App Store. The big players are omnipresent, too: Several Google and Facebook SDKs pop up in a commanding majority of apps.

“Consumers are basically stuck,” says Serge Egelman, a Berkeley computer science professor and CTO of the privacy research company AppCensus. “If you download an app, there’s absolutely no way for you to know if it’s going to send your data to Facebook or Braze or Flurry,” he says, naming popular marketing and analytics companies.

(Consumer Reports has removed code from a number of third-party vendors, including Flurry, as we've worked on improving the privacy practices in our own mobile app. Our app does not use code from Facebook, but it does incorporate code from Google and Adobe SiteCatalyst that gives us information such as how often each link gets clicked.)

Bad Actors Hiding in the Code

For Elephant Data and similar SDKs, this opacity is valuable cover. Nobody would knowingly sign up for an international ad-fraud conspiracy, but they might stumble into one if they download an app quietly running Elephant’s code in the background.

Upstream’s research focused on a popular file-sharing app called 4Shared that incorporated Elephant Data’s SDK. The app was silently loading and clicking on invisible ads on people’s phones, apparently to defraud companies that pay to have their ads displayed. In some cases, Elephant Data even made fraudulent purchases on behalf of users. Upstream found 2 million devices in 17 countries (including the U.S.) that were behaving this way, and estimated that the activity may have cost their owners as much as $150 million in data charges.

Over the years, Disconnect was contacted by other companies offering money in return for installing their code. One came from a company called AppJolt, which later became part of OneAudience, an app-analytics company. In February, Facebook sued OneAudience over an SDK it claimed was improperly harvesting user data. A spokeswoman from OneAudience's public relations firm tells CR that the company shut down in November, and points to a statement saying the data was "never intended to be collected, never added to our database and never used."

It's unusual for a company to pay developers to use its SDK. More often, the software is free or developers are charged for it. Offering to pay for placement isn't a sure sign that a company is engaging in fraud, but consumers still may not be comfortable with what the SDK provider is doing. For instance, a company called X-Mode pays app developers to use its SDK, which collects users' location data to be aggregated and sold to other businesses.

A rogue SDK's bad behavior can be hard to detect, even for an app developer that has implemented the code, says Dimitris Maniatis, CEO of Upstream. Elephant Data presents itself as a “market intelligence” service that helps app developers understand more about their users. And it goes to great lengths to hide its illicit activity: Its privacy policy makes no mention of it, and 4Shared's Irin Len tells CR that the company "knew nothing" of the Elephant Data SDK's alleged behavior. Len says 4Shared broke off its relationship with Elephant before the Upstream report was published, but would not say why.

It’s not clear how many other apps are running Elephant Data’s SDK. The company, which appears to be based in Hong Kong, did not respond to CR’s repeated requests for comment.

Building From Scratch

Fraud aside, developers that want to build apps that respect their users' privacy can find it difficult to avoid participating in the legal third-party data economy.

Several years ago, one company—Perry Street Software—made the leap: It began stripping other companies’ SDKs out of its products, a pair of popular gay dating apps called Jack’d and Scruff. The effort took a “tremendous amount” of time and money, says Perry Street CEO Eric Silverberg.

But for a company that caters to the gay community in the U.S. and abroad—users who, depending on their circumstances, could be fired, arrested, or assaulted if their identities leaked—plugging those potential data leaks felt important. So the company pulled out vendors’ SDKs for analyzing app performance, tracking installs, and displaying advertisements bought on third-party networks. Now, marketers deal directly with Perry Street if they want to advertise in the dating apps. Facebook, too, got discarded, even though that meant Jack’d and Scruff wouldn’t be able to benefit from the company’s powerful advertising platform.

Silverberg shared a scrap of business-school advice that has stayed with him: Be careful of the company you keep. “There’s just a universe of actors all clamoring to get access to your data, and you need to be careful,” he says.

For the average startup, going cold turkey probably isn’t realistic. “When we got our start, we were using third-party ad networks, and they were a critical source of revenue,” Silverberg says. “We’d never be here if it weren’t for that revenue. I completely understand an app starting today needing revenue from those networks.”

That means the average consumer is constantly dealing with data-hungry companies operating just below the surface of their apps. Experts tell CR there’s little a user can do to protect themselves, beyond avoiding sketchy apps from anonymous developers. “I try to think: Is this developed by a company I’ve heard of? So I’m not just downloading random stuff from the App Store,” says Cynthia Taylor, a computer science professor at Oberlin College.

But that's not much of a defense against abuse, experts say. “Right now the issue is that the burden of determining whether an app is going to be behaving or not is shifted to the end user,” says Berkeley’s Egelman. “Consumers just don’t have the ability to make these decisions. And other stakeholders have abdicated their responsibility.”


Kaveh Waddell

I'm an investigative journalist at CR's Digital Lab, covering algorithmic bias, misinformation, and technology-enabled abuses of power. In the past, I've reported for Axios and The Atlantic, and as a freelancer in Beirut. Outside work, I enjoy biking and hiking in and around San Francisco, where I live, and doing the crossword while cheating as little as possible. Find me on Twitter at @kavehwaddell.