Mindfulness App: How to Analyse User Behaviour Using Google Firebase Events Data
Since 2015, more than 2,500 meditation or mindfulness mobile apps have launched. The demand for app-supported mental self-care is clear, and the market is likely to see an incremental opportunity of US$ 184 million between 2019 and 2029. Although prominent brands such as Headspace and Calm claim nearly 70% of the overall market share, emerging players are trying to strengthen their market standing by offering premium or specialised services. The key to this process is understanding user behaviour and providing services that cater to users' preferences.
This article shows how to generate insights into user behaviour using Google Firebase events data, drawing on examples from a recently completed project for the Norbu Stress Control app. Norbu won the #GooglePlayBestOf 2020 Best App of the Year award in the “Personal growth” category. The project is part of the Practicum by Yandex data analyst program.
Data and tasks
Key tasks for the Norbu project are to examine how users navigate and use a range of Norbu premium services. The company is also keen to find out where there might be potential barriers to app navigation.
The data consists of information about user events from 2021-04-04 to 2021-05-31. The app is linked to Firebase, a platform developed by Google for event tracking. Firebase lets businesses track what users are doing in their app and provides reporting for up to 500 distinct events. The Firebase Realtime Database is a cloud-hosted NoSQL database where data is stored as JSON and synchronised in real time to every connected client.
Moreover, Firebase sets up daily syncs of the app data from the Firebase project to BigQuery. With BigQuery, data can be analysed with BigQuery SQL or exported to be analysed using other tools. In this article, the focus is on how to analyse Firebase events data using Python.
In what follows, the structure of Firebase events data will first be examined. Following that, the most common challenges in analysing Firebase events data are discussed, including:
- keeping data structure in local files
- accessing event parameter information
- visualizing the user journey
Solutions to the above challenges are presented using examples from the Norbu project exploring the following issues:
- Locations of the app users
- User stress management journey
- Effectiveness of training
- When do users remove the app?
Firebase events data structure
Google provides detailed instructions on downloading events data from BigQuery to pandas. Python’s pandas_gbq library also supports reading data from BigQuery into Jupyter Notebook. The screenshot below shows what the events data looks like after being imported.
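As a rough illustration of the import step, the sketch below builds a query over the daily events tables and passes it to pandas_gbq.read_gbq. The project and dataset names are hypothetical placeholders, and the download itself needs Google Cloud credentials, so only the query builder is exercised here.

```python
def build_events_query(project, dataset, start, end):
    """Build a BigQuery SQL string selecting the daily events tables in a date range."""
    return (
        f"SELECT * FROM `{project}.{dataset}.events_*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'"
    )


def load_events(project, dataset, start, end):
    """Download the selected events into a pandas DataFrame (needs credentials)."""
    import pandas_gbq  # imported here so the query builder works without the package

    query = build_events_query(project, dataset, start, end)
    return pandas_gbq.read_gbq(query, project_id=project)


# Hypothetical project and dataset names; the dates match the period analysed here.
query = build_events_query(
    "my-project", "analytics_123456789", "20210404", "20210531"
)
print(query)
```

Calling load_events with real project details would return the events as a dataframe; the exact schema depends on the export settings of the Firebase project.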
As Firebase events data is stored as JSON, each value in the “event_params” column is in fact a list of dictionaries containing the event information. Each unique “event_name” has its own corresponding event parameters. There are also columns whose values are themselves dictionaries, such as “device” and “geo”.
If the events data needs to be exported and stored locally for later access and analysis, it’s important to export it in the right format. A csv file will not work because it flattens the JSON data. As a result, the values in the dictionary and list-of-dictionaries columns become strings, making it difficult to access their keys and values.
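The flattening can be seen on a tiny made-up dataframe. This is a minimal sketch with invented sample values, written to an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Made-up one-row sample that mimics the nested "geo" column.
events_sample = pd.DataFrame({
    "event_name": ["scr_stress_level"],
    "geo": [{"country": "Iran", "continent": "Asia"}],
})

# Round-trip through CSV using an in-memory buffer.
buffer = io.StringIO()
events_sample.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

original_type = type(events_sample.loc[0, "geo"])
csv_type = type(from_csv.loc[0, "geo"])
print(original_type.__name__, "->", csv_type.__name__)
```

After the round trip, the “geo” value is plain text, so a lookup such as x['country'] no longer works on it.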
One way to address this challenge is to export and import the data as a pickle file, which keeps the data structure as it is. To save a dataframe as a pickle file, pandas provides the DataFrame.to_pickle method.
To read the file back, pandas provides the read_pickle function.
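A minimal sketch of the pickle round trip, assuming the data is held in a pandas dataframe. The sample row and the file name events.pkl are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Made-up one-row sample with a list-of-dictionaries column and a dictionary column.
events_sample = pd.DataFrame({
    "event_name": ["result_session"],
    "event_params": [[{"key": "before", "value": {"double_value": 7.0}}]],
    "geo": [{"country": "Iran", "continent": "Asia"}],
})

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "events.pkl")
events_sample.to_pickle(path)
restored = pd.read_pickle(path)

# The nested values come back as Python objects, not strings.
restored_geo = restored.loc[0, "geo"]
restored_params = restored.loc[0, "event_params"]
print(type(restored_geo).__name__, type(restored_params).__name__)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.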
Locations of app users
One piece of information Norbu wants is where its users come from. This information is stored in the “geo” column in dictionary format. Among all the key-value pairs, only the country and continent are needed, so only these are extracted.
This can be achieved by applying a lambda function to the “geo” column to create the “country” and “continent” columns.
events['country'] = events['geo'].apply(lambda x: x['country'])
events['continent'] = events['geo'].apply(lambda x: x['continent'])
The result is the dataframe below.
With the help of the new columns, the distribution of app users by country can be easily visualised as below. As the graph shows, Iran stands out as the top source country where almost half of the Norbu users come from.
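One possible way to get the numbers behind such a chart is to count unique users per country. The sketch below uses made-up sample rows and assumes user_pseudo_id identifies a user.

```python
import pandas as pd

# Made-up sample: four users, one of whom has two events.
events_sample = pd.DataFrame({
    "user_pseudo_id": ["u1", "u1", "u2", "u3", "u4"],
    "geo": [
        {"country": "Iran", "continent": "Asia"},
        {"country": "Iran", "continent": "Asia"},
        {"country": "Iran", "continent": "Asia"},
        {"country": "Russia", "continent": "Europe"},
        {"country": "India", "continent": "Asia"},
    ],
})
events_sample["country"] = events_sample["geo"].apply(lambda g: g["country"])

# Count each user once per country, largest group first.
users_by_country = (
    events_sample.groupby("country")["user_pseudo_id"]
    .nunique()
    .sort_values(ascending=False)
)

# Share of users per country (assumes each user appears in one country only).
share_by_country = users_by_country / users_by_country.sum()
print(share_by_country)
```

Counting unique users rather than rows avoids giving more weight to users who generate many events.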
User stress management journey
Norbu also wants to find out how users use each of its core services. One such service is Norbu’s stress management training.
To start the training, users first arrive at the stress home screen. Then they are invited to rate their current stress level from 1 to 10. Following this step, users can proceed to play a neuron massage ball game to activate their peripheral attention, participate in a breathing exercise, complete a meditation session, and finally rate their stress level again.
The first step in examining the user journey is to find out how many users proceed from the home screen, “scr_stress_level”, where they are asked to rate their stress level, to completing the training. If users complete the training, a “result_session” event is triggered, which stores their before and after self-assessment results. The funnel chart below shows the completion rate.
The chart above shows that 13% of the users who reached the stress level home screen actually completed the training session. This gives the completion rate, but where did the rest of the users go?
To obtain this insight, a function to plot the user journey is needed. The details of this function are not discussed in this article. For those who are interested, the code for the function can be viewed and downloaded from this GitHub repo. If your Firebase events dataframe is named events, user_pseudo_id is used to distinguish unique users, and the column with the event timestamp is named datetime, the function is ready to be applied.
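A completion rate of this kind can be approximated by comparing the sets of users who triggered each event. This is a minimal sketch on made-up rows, not necessarily how the funnel chart was produced.

```python
import pandas as pd

# Made-up sample: four users reach the home screen, one completes the session.
events_sample = pd.DataFrame({
    "user_pseudo_id": ["u1", "u1", "u2", "u3", "u4", "u4"],
    "event_name": [
        "scr_stress_level", "result_session", "scr_stress_level",
        "scr_stress_level", "scr_stress_level", "result_game",
    ],
})


def users_with_event(events, event_name):
    """Return the set of users who triggered the given event at least once."""
    return set(events.loc[events["event_name"] == event_name, "user_pseudo_id"])


started = users_with_event(events_sample, "scr_stress_level")
# Only count completions by users who also reached the home screen.
completed = users_with_event(events_sample, "result_session") & started
completion_rate = len(completed) / len(started)
print(f"{completion_rate:.0%}")
```

This version ignores event order, so a stricter funnel would also check that “result_session” happened after “scr_stress_level”.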
The plot_user_flow function to visualize user journey is adapted from code in this article, with modifications made to make it suitable for firebase analytics data.
The function takes 5 parameters:
- the events dataset
- name of the event that serves as the start point
- number of steps to follow from that event onwards
- number of paths to be shown for each step (the rest will be grouped as ‘Other’), and
- name for the plot.
To show where users go from the stress level home screen, the function is run on the events data with “scr_stress_level” as the starting event, following 5 steps onwards from the home screen and showing the 5 main paths at each step. The title of the plot is “User journey from stress homescreen”.
plot_user_flow(events, 'scr_stress_level', 5, 5, "User journey from stress homescreen")
The Sankey diagram above shows that most users go from the stress home screen to the ball game home screen, a logical next step towards completing the training. From there, almost all users start playing the ball game, but only about half complete the game and arrive at “result_game”.
To view the complete journey from the home screen to the end of training, the 3rd parameter of the function (the number of steps) can be increased from 5 to, for example, 10, or more if necessary. There is no limit on the number of steps that can be shown.
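For readers who want a feel for the data behind such a diagram, the sketch below counts step-by-step transitions after each user's first occurrence of a start event. It is not the plot_user_flow function from the repo; next_step_counts is a hypothetical helper, the sample rows are made up, and the grouping of minor paths into ‘Other’ is left out.

```python
import pandas as pd


def next_step_counts(events, start_event, n_steps):
    """Count event-to-event transitions for the n_steps after each user's first start_event."""
    ordered = events.sort_values(["user_pseudo_id", "datetime"])
    rows = []
    for _, user_events in ordered.groupby("user_pseudo_id"):
        names = user_events["event_name"].tolist()
        if start_event not in names:
            continue  # this user never reached the start event
        start = names.index(start_event)
        path = names[start:start + n_steps + 1]
        # Pair each event with the one that follows it.
        for step, (source, target) in enumerate(zip(path, path[1:]), start=1):
            rows.append({"step": step, "source": source, "target": target})
    transitions = pd.DataFrame(rows, columns=["step", "source", "target"])
    return (
        transitions.groupby(["step", "source", "target"])
        .size()
        .reset_index(name="users")
    )


# Made-up sample journeys for three users.
events_sample = pd.DataFrame({
    "user_pseudo_id": ["u1", "u1", "u1", "u2", "u2", "u3", "u3"],
    "event_name": [
        "scr_stress_level", "result_game", "result_session",
        "scr_stress_level", "result_game",
        "scr_stress_level", "app_remove",
    ],
    "datetime": pd.to_datetime([
        "2021-04-04 10:00", "2021-04-04 10:05", "2021-04-04 10:20",
        "2021-04-05 09:00", "2021-04-05 09:04",
        "2021-04-06 18:00", "2021-04-06 18:01",
    ]),
})

flows = next_step_counts(events_sample, "scr_stress_level", 2)
print(flows)
```

Each row of the result corresponds to one link in a Sankey diagram: a source event, a target event, and the number of users who moved between them at that step.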
Sankey diagrams of the user journey help to show how users navigate the app and whether they progress through the steps as expected. If they don’t, it is worth investigating what might have prevented them from doing so.
Effectiveness of training
After looking at the user journey, the next question is: did the training session help to reduce users' stress levels? To answer it, the “result_session” event and its parameters need to be examined. The screenshot below shows a slice of the dataset that includes only rows with “result_session” as the event_name.
Each value in the “event_params” column is a list of dictionaries. How can the values from this list be extracted? First, the screenshot below shows what the list looks like. The information between the 2 yellow lines shows the full list of event_params dictionaries for “result_session”.
For the dictionary that has “before” as the key, its value is also a dictionary, in which the stress level is saved as the value to the double_value key. This is the self rated stress level before the session. Similarly, the self rated stress level after the session is stored in the dictionary that has “after” as its key.
In order to compare the before and after stress level, a function is needed to extract the scores and save them in two new columns.
The function extract_stress_before is applied to each value in the “event_params” column, which is a list of dictionaries. For each dictionary in the list, if its key is “before”, the “double_value” stored in its value dictionary is returned. A second function does the same for the “after” key.
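A minimal sketch of such extraction functions follows, assuming each parameter is a dictionary with “key” and “value” entries as described above. The name extract_stress_after and the shared helper extract_double_param are assumptions for illustration, and the sample values are made up.

```python
def extract_double_param(event_params, key):
    """Return the double_value stored under `key` in an event_params list, or None."""
    for param in event_params:
        if param["key"] == key:
            return param["value"]["double_value"]
    return None  # the event carries no parameter with this key


def extract_stress_before(event_params):
    """Self-rated stress level before the session."""
    return extract_double_param(event_params, "before")


def extract_stress_after(event_params):
    """Self-rated stress level after the session (hypothetical function name)."""
    return extract_double_param(event_params, "after")


# Made-up event_params list in the shape described in the text.
sample_params = [
    {"key": "before", "value": {"string_value": None, "int_value": None, "double_value": 7.0}},
    {"key": "after", "value": {"string_value": None, "int_value": None, "double_value": 4.0}},
    {"key": "ga_session_number", "value": {"string_value": None, "int_value": 2, "double_value": None}},
]

before = extract_stress_before(sample_params)
after = extract_stress_after(sample_params)
print(before, after)
```

Either function can then be passed to the apply method of the “event_params” column to build a new column.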
Applying the two functions to the “event_params” column, two columns are created containing the before and after self assessed stress level for each user. The graph below plots the difference (after-before) in stress level for each user.
The plot above shows that, of the 22913 users in the data who completed the stress management training session, the majority (78.8%) experienced a reduced stress level. In a nutshell, the training session helped most users to reduce their perceived stress level.
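The share of users with reduced stress can be computed from the two extracted columns along these lines. The column names stress_before and stress_after are assumptions, and the values are made up.

```python
import pandas as pd

# Made-up before/after scores for four sessions.
sessions = pd.DataFrame({
    "stress_before": [7.0, 5.0, 6.0, 3.0],
    "stress_after": [4.0, 5.0, 2.0, 4.0],
})

# Negative change means the stress level went down after the session.
sessions["stress_change"] = sessions["stress_after"] - sessions["stress_before"]
share_reduced = (sessions["stress_change"] < 0).mean()
print(f"{share_reduced:.1%}")
```

Taking the mean of a boolean column gives the fraction of rows where the condition holds.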
When do users remove the app?
Another matter of interest for Norbu is app removal: when and why do users remove the app? While it’s hard to tell from the events data why users remove the app, the data can show when they do so.
In Firebase events data, one dictionary appears in the “event_params” list regardless of the event_name: “ga_session_number”. The screenshot below shows an example of the full list of event_params dictionaries for the event named “app_remove”.
“ga_session_number”, according to Google support, is the monotonically increasing identifier (starting with 1) of the ordinal position of a session as it relates to a user (e.g., a user’s 1st or 5th session) associated with each event that occurs in a session. Extracting the value for this key therefore can tell at which session the user removed the app.
A similar function to those in the earlier section can be defined to extract this information.
Slicing the dataset to include only those relevant to “app_remove”, and then applying the function to the “event_params” column, a column containing the session_number at which the users removed the app is created.
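These two steps could look roughly like the sketch below, which assumes the session number is stored under int_value. The function name extract_session_number, the params helper, and the sample rows are made up for illustration.

```python
import pandas as pd


def extract_session_number(event_params):
    """Return the ga_session_number stored in an event_params list, or None."""
    for param in event_params:
        if param["key"] == "ga_session_number":
            return param["value"]["int_value"]
    return None


def params(session_number):
    """Build a made-up event_params list carrying only ga_session_number."""
    return [{
        "key": "ga_session_number",
        "value": {"int_value": session_number, "double_value": None},
    }]


events_sample = pd.DataFrame({
    "event_name": ["app_remove", "app_remove", "result_session", "app_remove", "app_remove"],
    "event_params": [params(1), params(2), params(3), params(4), params(9)],
})

# Keep only removal events, then extract the session at which each happened.
removals = events_sample[events_sample["event_name"] == "app_remove"].copy()
removals["session_number"] = removals["event_params"].apply(extract_session_number)

third_quartile = removals["session_number"].quantile(0.75)
print(removals["session_number"].describe())
```

The describe output includes the quartiles, which is one way to read off the session by which 75% of removals have happened.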
A quick summary of this column shows that 75% of the users removed the app during or before their 4th session. This finding is consistent with research on app removal: on average, mobile apps lose 77% of their daily active users within the first three days after download. After 30 days, that number rises to 90%. 47% of those users will delete the app after that. Norbu is not alone in the battle to keep users.
To sum up, this article presents solutions to some of the most common challenges in analysing Firebase events data to generate insights into app use.
In particular, it shows how to save the events data locally while keeping the data structures intact, how to extract the information needed from columns containing a dictionary or a list of dictionaries, and how to analyse and visualize the user journey. The methods and functions introduced can be readily applied when analysing Firebase events data for any mobile application.