It has been over two months since my initial post discussing the critical topic of location data sharing among various third-party entities. If you have yet to read it, I highly recommend starting from there for context.

Since that time, I have engaged in discussions with numerous professionals working in the field. My conversations included members from non-profit organizations advocating for data privacy rights across different countries, as well as employees from adtech companies who either validated or refuted my claims, though mostly they supported them. I even had the honor of being invited onto the "Lock and Code" podcast to discuss the implications of geolocation data on privacy issues. Enough about my achievements, though; let's dive into the topic.

In my earlier post, I shared a process for analyzing mobile app traffic that I conducted manually. While this method was valuable for educational purposes, it was also time-consuming. I spent hours sifting through hundreds of requests using Charles Proxy, trying to identify "interesting" data from a single application. Realizing the need for a more efficient approach, I have since upgraded my methodology, which I am excited to share with you in this post.

To summarize briefly

I have developed a concise guide and a Python notebook designed for anyone interested in recording mobile app traffic and uncovering some surprising insights within it. You can find the link to my GitHub repository at the end of this article.

The initial setup process should take only 10 to 30 minutes, and I found that analyzing the data using my newly created algorithm took about ten minutes per app. Of course, if you stumble upon something intriguing, you could easily spend hours delving deeper into the data.

Here's the link to the repository: [GitHub Link].

I had a realization during this endeavor: manually reviewing traffic from hundreds of apps would take an enormous amount of time. It is time I would love to spend, but my goal is to uncover the truth rather than rely solely on internet tables. To facilitate this, I took the existing Gravy Analytics Google Sheet and created a simple form where anyone can contribute data (no personal information is collected). This form feeds directly into the Google Sheet.

You can check any app you currently have on your phone or simply pick a random one from the table. If others have already commented on the app, you can add your own observations regarding the traffic.

Furthermore, if you wish to help crowdsource information regarding the types of data that apps on the list collect and share, please fill out the form provided. Remember to redact any private information in your submissions.

Visualizing Domain Power

Before we delve into the specifics of the guide, I wanted to share some fascinating visualizations I created during my analysis. I have developed a keen interest in graph visualizations, particularly those that display complex relationships.

While analyzing app traffic, I became curious about the distribution of adtech company domains within the requests. To explore this, I combined my traffic analysis with graphic visualization using PyVis, which yielded some compelling results.

In this visualization, each circle represents a domain or subdomain, with the hierarchy depicted through nested circles. For example, the domain o-sdk.ads.unity3d.com is illustrated with three circles: o-sdk inside ads inside unity3d. The colors represent the apps I analyzed, and I utilized low opacities to enhance visibility. Interestingly, purple emerged as the color representing the combination of all analyzed apps. The size of each circle indicates the frequency of requests associated with a particular domain.
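To make the nesting concrete, here is a minimal sketch of how hostnames could be turned into circle sizes. It is not the code behind the visualization: the function names and sample hostnames are made up for illustration, and it naively assumes a one-part TLD (so it would mishandle something like .co.uk).

```python
from collections import Counter


def domain_levels(host):
    """Return the nested levels of a hostname, outermost circle first.

    Assumption: the last label is a one-part TLD such as ".com" and is dropped.
    """
    labels = host.lower().strip(".").split(".")[:-1]  # drop the TLD
    labels.reverse()  # e.g. ['unity3d', 'ads', 'o-sdk']
    # Cumulative names keep 'ads' under unity3d distinct from 'ads' elsewhere.
    return [".".join(reversed(labels[: i + 1])) for i in range(len(labels))]


def circle_sizes(hosts):
    """Count how many requests fall inside each circle."""
    sizes = Counter()
    for host in hosts:
        for level in domain_levels(host):
            sizes[level] += 1
    return sizes


# Illustrative sample, not recorded traffic.
hosts = [
    "o-sdk.ads.unity3d.com",
    "o-sdk.ads.unity3d.com",
    "config.unity3d.com",
    "g.doubleclick.net",
]
sizes = circle_sizes(hosts)
for level, count in sizes.most_common():
    print(level, count)
```

Each request increments every circle it sits inside, which is why the outermost circle of a busy network grows fastest.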

The analysis reveals that Unity leads the mobile gaming app traffic landscape, significantly outpacing its competitors. In comparison, Google's g/doubleclick still holds a prominent place in the ad network sphere.

Check out the visualizations in motion: [Link to Visualization].

Additionally, I investigated the density of requests over time, noting that hundreds can occur within just a few seconds. I documented flows from five different games to provide a clearer picture of this traffic.
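Request density is easy to approximate once you have a timestamp per request. The sketch below buckets timestamps into one-second bins; the function name and sample values are illustrative. With mitmproxy, I believe the request's timestamp_start attribute (epoch seconds as a float) would be a suitable input, but treat that as an assumption to verify against your version.

```python
from collections import Counter


def requests_per_second(timestamps):
    """Count requests in each one-second bucket, relative to the first request."""
    if not timestamps:
        return {}
    start = min(timestamps)
    buckets = Counter(int(ts - start) for ts in timestamps)
    return dict(sorted(buckets.items()))


# Illustrative timestamps in seconds, not recorded traffic.
sample = [100.0, 100.2, 100.9, 101.1, 103.5, 103.6, 103.7]
density = requests_per_second(sample)
print(density)
```

Seconds with no requests are simply absent from the result, so gaps in the keys show idle periods.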

When comparing the data, it's evident that Unity dominates, generating the highest number of subdomain calls and requests. AppLovin, which acquired Adjust in 2021, recently reported a net income of $1.58 billion for 2024 and has shifted its focus away from mobile game development to concentrate on advertising services.

Interestingly, while Facebook appears in requests from nearly every game I analyzed, its overall performance was the lowest among the metrics examined across five examples.

Getting Started with Mobile Traffic Analysis

Let's move on to the practical guide for analyzing mobile app traffic.

Step 1: Install mitmproxy on Your PC

Mitmproxy is a powerful open-source tool tailored for intercepting network traffic, complete with a user-friendly web interface called mitmweb. Please be aware that some antivirus programs may flag mitmproxy as malware, which can be alarming but is ultimately harmless.

You can download it from the official site and choose your preferred installation method. For Mac users, the command `brew install mitmproxy` will do the trick. Once installed, launch the web interface using the command: `mitmweb --listen-host 0.0.0.0 --listen-port 8080`.

I recommend using mitmweb for its intuitive interface, which is particularly useful for discovering and understanding the scale of real-time bidding (RTB) requests.

Step 2: Prepare Your Mobile Device

Next, configure your iPhone or Android device to set up a proxy and install a security certificate. I focused primarily on iOS for this guide, but setting up Android is equally straightforward and well-documented.

You may need to enable developer mode to access the Certificate Trust settings on your device. Ensure that both your iPhone and computer are connected to the same Wi-Fi network for the setup to function correctly.

Step 3: Start Collecting Requests!

On macOS, run the command `ipconfig getifaddr en0` to find your computer's local IP address. Then, access your Wi-Fi settings on your iPhone and manually set the proxy with:

  • Server: Your local IP address
  • Port: 8080
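If you are not on macOS, or your Wi-Fi interface is not en0, a small Python snippet can give a best-effort guess of the address to enter as the proxy server. This is a sketch under assumptions: the function name is mine, the probe address is a reserved documentation address chosen only to trigger a routing lookup, and on machines with several network interfaces the result may not be the Wi-Fi one.

```python
import socket


def local_ip(probe_host="192.0.2.1", probe_port=80):
    """Best-effort guess of this machine's LAN address.

    Connecting a UDP socket sends no packets; it only asks the OS which
    local interface it would use to reach the probe address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, fall back to loopback
    finally:
        sock.close()


print(local_ip())
```

If it prints 127.0.0.1, the machine has no route out and you should check the Wi-Fi connection before configuring the phone.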

Next, open your browser on the iPhone and navigate to mitm.it to install the necessary certificate. Detailed instructions are available in various resources, including [Link to Setup Instructions].

The certificate is crucial for decrypting TLS-encrypted traffic; without it, you will only see encrypted packets.

If you have completed these steps correctly, you should now be able to see requests flowing through the mitmweb interface. To focus solely on a specific app, close all other applications, press "Clear flows," and then open the app you wish to analyze.

After sufficient data collection, save the flows from mitmweb using File > Save and rename the resulting file as appname.flow.

Filtering and Analyzing the Data

Next, open the mitm_test.ipynb file in either a local Jupyter Notebook or Google Colab. Working with Python notebooks can be quite efficient, especially in a local environment.

For those unfamiliar with coding, Google Colab is a user-friendly option that allows you to import your recorded flows easily. The notebook contains clear instructions, so I won't repeat every detail here; however, I want to emphasize the importance of filling in as many relevant keywords as possible.

While I primarily searched for IP and geo data, you might want to explore other keywords like "screen_brightness" or "IDFA." The results will reflect exact matches, so consider variations in spelling or abbreviations.
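Because matching is exact, it can help to expand each keyword into a few common spellings before searching. The helper below is a hypothetical sketch, not part of the notebook; it only covers separator and case variants, so abbreviations still have to be added by hand.

```python
import re


def keyword_variants(keyword):
    """Return common spellings of a keyword: snake, kebab, joined, camelCase."""
    words = [w for w in re.split(r"[_\-\s]+", keyword.strip()) if w]
    if not words:
        return set()
    lower = [w.lower() for w in words]
    camel = lower[0] + "".join(w.capitalize() for w in lower[1:])
    return {keyword, "_".join(lower), "-".join(lower), "".join(lower), camel}


for kw in ("screen_brightness", "IDFA"):
    print(kw, sorted(keyword_variants(kw)))
```

The original spelling is always kept in the set, so nothing you typed is lost by the expansion.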

Once you have entered your keywords, run the cells in the notebook to generate a df_filtered table containing requests or responses that correspond to your criteria. This step will also provide a snippet of the matches and highlight the keywords responsible for the hits.
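For readers who want to see the idea outside the notebook, here is a minimal sketch of exact keyword matching with a context snippet. It is not the notebook's implementation: find_hits and iter_flow_texts are names I made up, and the sample body is invented. The loader relies on mitmproxy's Python API (io.FlowReader, http.HTTPFlow, get_text, pretty_url) and needs mitmproxy installed, so it is imported lazily and only when you call it.

```python
def find_hits(text, keywords, context=30):
    """Return (keyword, snippet) pairs for every exact keyword match in text."""
    hits = []
    for kw in keywords:
        if not kw:
            continue  # an empty keyword would match everywhere
        start = text.find(kw)
        while start != -1:
            lo = max(0, start - context)
            hi = min(len(text), start + len(kw) + context)
            hits.append((kw, text[lo:hi]))
            start = text.find(kw, start + len(kw))
    return hits


def iter_flow_texts(path):
    """Yield (url, request_text, response_text) from a saved mitmproxy flow file."""
    from mitmproxy import http, io as mitm_io  # third-party: pip install mitmproxy

    with open(path, "rb") as fh:
        for flow in mitm_io.FlowReader(fh).stream():
            if not isinstance(flow, http.HTTPFlow):
                continue  # skip non-HTTP flows
            req = flow.request.get_text(strict=False) or ""
            resp = flow.response.get_text(strict=False) if flow.response else ""
            yield flow.request.pretty_url, req, resp or ""


# Invented sample body using a documentation IP address.
body = '{"device":{"ip":"203.0.113.7","geo":{"lat":0.0,"lon":0.0}}}'
hits = find_hits(body, ['"ip"', '"geo"'], context=10)
for kw, snippet in hits:
    print(kw, "->", snippet)
```

To run it over a recording, you would loop over iter_flow_texts("appname.flow") and pass each request and response text to find_hits with your own keyword list.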

You can examine individual rows in detail by utilizing the `.loc` command in Pandas to retrieve complete values for specific columns. For longer strings, I recommend copying the text into a text editor for easier searching and context extraction.

Room for Improvement

I recognize that this method is relatively rudimentary; I could automate the process using mitmdump to streamline saving and filtering directly to a CSV file. However, I intentionally kept the process semi-manual to ensure critical insights are not overlooked within the diverse formats of requests and responses.
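For anyone who wants to try the automated route, a mitmproxy addon script could look roughly like the sketch below. I have not used this in the analysis; the class name, keyword list, and output filename are placeholders. It relies on mitmproxy's addon conventions (a response hook, a done hook, and a module-level addons list) and would be loaded with something like mitmdump -s keyword_logger.py.

```python
import csv

# Placeholder keywords; replace with the ones you care about.
KEYWORDS = ['"lat"', '"lon"', "idfa"]


def match_keywords(request_text, response_text, keywords):
    """Return the keywords found in either the request or the response."""
    return [kw for kw in keywords if kw in request_text or kw in response_text]


class KeywordCsvLogger:
    """Append one CSV row per flow whose request or response contains a keyword."""

    def __init__(self, path="hits.csv", keywords=KEYWORDS):
        self.path = path
        self.keywords = list(keywords)
        self.file = None
        self.writer = None

    def _ensure_open(self):
        # Open lazily so nothing is written until the first hit.
        if self.writer is None:
            self.file = open(self.path, "w", newline="", encoding="utf-8")
            self.writer = csv.writer(self.file)
            self.writer.writerow(["url", "matched_keywords"])

    def response(self, flow):
        """Hook called by mitmproxy once a full response has been received."""
        req = flow.request.get_text(strict=False) or ""
        resp = flow.response.get_text(strict=False) or ""
        matched = match_keywords(req, resp, self.keywords)
        if matched:
            self._ensure_open()
            self.writer.writerow([flow.request.pretty_url, ";".join(matched)])
            self.file.flush()

    def done(self):
        """Hook called by mitmproxy on shutdown."""
        if self.file is not None:
            self.file.close()


addons = [KeywordCsvLogger()]
```

The trade-off is the one described above: a script like this only records what its keyword list anticipates, while browsing the flows by hand can surface formats you did not expect.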

In my exploration, I attempted to filter out content types and encodings but discovered that some relevant requests were excluded from the filtered results. This prompted me to disable certain filters, leading to intriguing findings: I identified gs-loc.apple.com, an endpoint utilized by Apple to request user location information. This endpoint was called during a three-minute traffic recording from a single app, which I had not observed in my previous analyses.

However, I want to clarify that I cannot definitively attribute this request's occurrence to the specific app, as this endpoint is accessible only through designated Apple API methods and requires appropriate permissions. More on this in future posts. Stay tuned!

In conclusion, I hope this guide has piqued your interest enough to explore at least one app and contribute to this collaborative effort or simply to experiment for fun. If you have any comments, thoughts, or would like to engage in further discussion, feel free to reach out at hello@timsh.org.