Ever wonder where tech reporters get all their fancy data? For instance, how do we know that 52% of mobile app sessions were for games in the first couple of months of 2012 or that the use of native apps versus the mobile Web is tied? The truth is that a lot of the interesting stats in the mobile ecosystem are provided by marketers and advertisers. Those networks know how consumers are using their devices to a degree of granularity that at times is creepy. How do they know what users are doing?

A new research report tells us how ad networks implement in-app libraries to deliver advertising to consumers and help developers get paid. For the most part, the largest networks are benign, but consumers have learned to never trust an advertiser. In-app libraries often function like the app that hosts them but can have access to far more information than the user ever intended.

NC State Dives Deep Into Android Advertising

For the consumer, the fundamental difference between downloading Android apps and iOS apps is that Android is designed to let the user know all of the explicit permissions that an app can use on the device. For instance, a popular app like Rdio will have access to system tools, read the phone state and have network communications access. Permissions allow developers to be upfront with the consumer about what they are doing and how they are doing it, and allow security checks to be made by third parties.

More so than iOS developers, Android developers rely on advertising networks to make money. Developers connect to ad networks through SDKs and APIs within an app, creating a sub-level of the app that the developers do not necessarily control. Think of it on two levels: there are the permissions that the app can use, which are explicit to the end user. Those same permissions can be used by ad networks, and that use is not explicit to the user. This can lead to privacy issues, as information the user never intended to share ends up on ad networks' servers.

Image: Research from Flurry, one of the top networks found in NC State's research.

A research paper from North Carolina State University studied the behavior of ad networks and analytics services in Google Play (Android Market). Researchers downloaded 100,000 apps from March to May 2011 and examined which were running ads and how the ads interacted with the app, the device and user data. The findings are concerning for users sensitive to being tracked by advertisers.

Researchers then identified the top 100 ad libraries found within those apps, such as AdMob, Mobclix, Flurry, Millennial Media, Tapjoy and others. These top 100 ad libraries were present in 52.1% of the 100,000-app sample.

In-app libraries are tied to the host app and afforded the same permissions. Hence, by using an ad-supported app, a user is also giving the advertiser permission to access whatever information the app itself can access.
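To make the inheritance concrete, here is a minimal toy model in Python. It is not Android code and not taken from the study: the `App` and `AdLibrary` classes, the app name and the library name are all hypothetical. The only assumption it encodes is the one described above, that permission checks are answered for the app as a whole rather than for the individual library making the request.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Toy model of an app: permissions are granted to the app as a whole."""
    name: str
    granted: frozenset
    libraries: list = field(default_factory=list)

    def can_use(self, permission: str) -> bool:
        # The check is made against the app, not against the calling library.
        return permission in self.granted


@dataclass
class AdLibrary:
    """Hypothetical ad library bundled inside a host app."""
    name: str

    def usable_permissions(self, host: App, wanted: set) -> set:
        # Library code runs as part of the host, so every permission check
        # it triggers is answered with the host's grants.
        return {p for p in wanted if host.can_use(p)}


# A hypothetical host app and the permissions its user approved at install.
host = App("music_player", frozenset({"INTERNET", "READ_PHONE_STATE", "READ_CONTACTS"}))
ad_lib = AdLibrary("example_ads")
host.libraries.append(ad_lib)

# The library asks for more than it needs; it gets whatever the host holds.
wanted = {"INTERNET", "READ_CONTACTS", "CAMERA"}
inherited = ad_lib.usable_permissions(host, wanted)
print(sorted(inherited))  # ['INTERNET', 'READ_CONTACTS']
```

In this sketch the user only ever approved permissions for the music player, yet the bundled library ends up able to read contacts. It is stopped only where the host itself lacks a permission, as with the camera.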

"Due to the fact that ad libraries are incorporated into the host apps that use them, they in essence form a symbiotic relationship," the study states. "Based on such relationship, an ad library can effectively leverage it and naturally inherit all permissions a user may grant the host app, thus undermining the app-based privacy and security safeguards."

One example is that some ad networks allow dynamic code loading within an app, meaning that an ad network can send remote code to a device without the user explicitly upgrading the app. With the most benign intentions, this code loading is used to update the ad library for performance, content and optimization. Yet dynamic code loading could also be used to sniff out other permissions within a device and upload user information (such as calendar, phone history or contact lists) or download malicious material.

Three Categories of Creepy

There are three categories of problematic behaviors that NC State researchers came across in their analysis.

Invasively collecting personal information: Where in-app ad libraries request data not directly useful to their purpose. Researchers found that the larger ad libraries do not perform this activity, but smaller networks might, and there is no way for a user to know which network is serving the ads.

Example: Ad network sosceo (which is so tiny that a Google search turns up next to nothing on it) starts when a UI element is activated (a user clicking on a link, for instance). The ad network then queries the device's contact list for the most recent phone call made and stores it in a data field. What does an ad library need with your most recent calls list?

Permissively disclosing data to running ads: This is, "direct exposure of personal information to running ads."

Example: Mobclix, one of the most popular ad libraries for Android, has a variety of ways to obtain personal information in running ads. It uses a method that binds Android APIs to JavaScript, giving the ad library access to a variety of device permissions in a manner similar to how HTML applications are wrapped for native deployment by services like PhoneGap. Researchers note that most of the permissions granted to the network have appropriate use confirmation (through the host app's permissions) but several of them do not.

Of the top ad libraries, Mobclix has the most red-flag behaviors, exhibiting 11 of the 20 fields in the researchers' chart. That includes probing app permissions, using JavaScript, camera and calendar access, the ability to read phone information and use the vibrator.

Image: Infographic from Mobclix on "60 seconds of smartphone use."

Unsafely fetching and loading dynamic code: Mentioned above. According to the study, "dynamically loaded code cannot be reliably analyzed, effectively bypassing existing static analysis efforts." Dynamically loaded code can be changed at any time, which makes it difficult to study ad library behavior. If an ad network also obfuscates much of its source code (such as by using random letters or meaningless words for names), it is almost impossible to determine what the network is doing.

Example: Five of the 100 ad libraries in the study use this practice. One library has the ability to download payloads that can control the app remotely. That is strikingly similar to the ways that spammers and malicious hackers attack mobile devices and PCs to create botnets.
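Why dynamically loaded code slips past static analysis can be shown with a small Python sketch. This is an illustration only, not the study's tooling and not Android code: `read_contacts`, `run_ad`, the shipped library source and the "remote" payload are all made up for the example. The scanner inspects only the code that ships with the app, while the sensitive call arrives later as data.

```python
import ast

# Names the toy scanner treats as sensitive (hypothetical).
SENSITIVE = {"read_contacts"}


def read_contacts():
    """Stand-in for a sensitive API; returns fake data."""
    return ["alice", "bob"]


def static_scan(source: str) -> set:
    """Return the names of sensitive functions called directly in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SENSITIVE:
                found.add(node.func.id)
    return found


# What ships in the app package: it only runs whatever payload it is handed.
SHIPPED_LIBRARY = """
def run_ad(payload, env):
    exec(payload, env)
"""

# What a server could send later (hypothetical); the scanner never sees it.
REMOTE_PAYLOAD = "leaked = read_contacts()"

# The shipped code looks clean to the scanner.
print(static_scan(SHIPPED_LIBRARY))  # set()

# At run time the payload is loaded and the sensitive call happens anyway.
env = {"read_contacts": read_contacts}
exec(SHIPPED_LIBRARY, env)
env["run_ad"](REMOTE_PAYLOAD, env)
print(env["leaked"])  # ['alice', 'bob']
```

Scanning the payload itself would flag the call, but an analyst working from the published app never has that payload in hand, and the server can swap it for a different one at any time.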

The Bottom Line: There Isn't One

The open nature of the Android ecosystem makes this type of analysis possible. It would be interesting to see how these same networks operate on iOS, though it is likely that they do not differ in any meaningful way. The ability of an ad library to be both part of the app and fundamentally distinct from it should give users pause.

It should come as no surprise that ad libraries are searching for ways to get more user information. The ability to know everything possible about a person gives advertisers power that they have never had before. Some of these ad libraries can even trace a person down to their personally identifiable information, a practice that has long been considered shady in the digital advertising world. We can understand how ads that know your location and general behavior can be very beneficial to the user, the developer and the ad network but, as the study points out, there are likely better ways to track and implement rules for ad library behavior.

The natural assumption by the end user is, and has been for a long time, that all advertisers are generally evil. This is not true, as many of them simply want to connect the user to relevant information that helps drive transactions that sustain a great portion of economic activity in the United States.

It also behooves Google to structure Android in such a way that ads are easily delivered and targetable. Unlike iOS or Windows Phone, Google does not directly profit from supplying the platform to manufacturers and app developers. Yet what we see with the research from NC State is that the ecosystem that directly benefits Google (through AdMob) also creates a set of principles that can be abused.

Developers: how much do you know about the ad library that you implement and how it behaves? Is it a matter of loading an SDK and letting it run on its own or are you continually monitoring how it affects your app? Let us know in the comments.