By: Gagik Chakhoyan, Data Science Team Lead
In this series of blog posts we will cover how SoloLearn uses data to make business decisions.
We hope this article will be both familiar and helpful to those who work on products like ours or who struggle with the same kinds of problems: building a product from scratch, iterating on it, or changing it in ways that make users’ lives easier by helping them discover and learn what they are looking for while making their experience as engaging and immersive as it can be.
In this first post, we begin with a high-level introduction to how we approach data. Subsequent posts will cover more specific topics and include technical details.
SoloLearn Is Unique
SoloLearn is the world’s largest mobile coding community. It is not only where millions of learners come for programming lessons, but also where these learners make friends, communicate with peers, challenge each other and support each other. In this way, SoloLearn is both an edtech and a social application. This duality is great for the learner, but often a challenge for the data team.
Typically, the data and product teams in a SaaS business focus on finding and perfecting the experience of the user’s “core action”: the action that correlates best with retention and ongoing use of the product. Conversely, a user who does not perform this action is likely to leave the application. In products designed around a single use case, which is common in the edtech industry, the core action is obvious: complete the lesson. As a product adds use cases, as SoloLearn has, identifying the core action becomes more challenging because there are more candidate actions and far more data to sift through.
For example, SoloLearn provides a Code Playground where learners can test their knowledge at any time, even after only a couple of lessons in a course. Learners can also become friends with peers and use SoloLearn’s social feed to post activities or other interesting events. On SoloLearn, the user learns to code, writes code, and takes part in social activities, so finding the user’s core action becomes much harder.
Finding Patterns Through Data Aggregation
If you are a data scientist working in a dynamic environment such as the one we enjoy at SoloLearn, it is very easy to focus on the numbers and lose sight of the fact that each data point represents a specific, unique human with their own motivations who hopes SoloLearn will solve their problems or meet their needs. We remind ourselves of this as often as possible: each Sololearner is a unique individual at a certain point in their life journey, who found SoloLearn with personal expectations of how they will use the app and what value it will provide. However, no product can offer a solution for every unique user and need, so the way to provide an exceptional experience for all users lies in data aggregation.
Behavioural patterns always emerge across a pool of users. Your job as a data scientist is to identify those patterns and the metrics that summarize them as cleanly as possible. Aggregating data is a sacrifice you must make every day, because keeping things simple and preserving as much information as possible about every user are in constant conflict. The decision of which metrics to track is crucial for every product; these decisions should be made carefully and in close collaboration with the product and marketing teams.
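To make the trade-off concrete, here is a minimal sketch of what aggregation can look like. It is not SoloLearn’s actual pipeline: the event names, the `(user_id, event_name)` log format, and both helper functions are assumptions made up for illustration. The raw log keeps every individual action; the aggregate collapses it into one number per question we care about.

```python
from collections import Counter, defaultdict

# Hypothetical raw event log: one (user_id, event_name) pair per action.
events = [
    ("u1", "lesson_completed"),
    ("u1", "lesson_completed"),
    ("u1", "code_run"),
    ("u2", "feed_post"),
    ("u3", "code_run"),
]


def aggregate_per_user(event_log):
    """Collapse the raw log into per-user counts of each event type."""
    per_user = defaultdict(Counter)
    for user_id, event_name in event_log:
        per_user[user_id][event_name] += 1
    return per_user


def share_of_users_with(per_user, event_name):
    """Fraction of users who performed the given event at least once."""
    if not per_user:
        return 0.0
    active = sum(1 for counts in per_user.values() if counts[event_name] > 0)
    return active / len(per_user)


per_user = aggregate_per_user(events)
for name in ("lesson_completed", "code_run", "feed_post"):
    print(name, share_of_users_with(per_user, name))
```

Each step throws information away: the per-user counts lose the order of events, and the final share loses everything about who the users are. That loss is the sacrifice described above.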
A good rule of thumb for choosing a key metric (a.k.a. the North Star Metric) is to start with your product strategy. As the saying goes, “what you measure, you optimize”. It is wise to start with your expectations about the product and its desired usage from the learner’s perspective.
Let’s take a popular product such as Zoom. It is not hard to guess what its key metric is. Zoom’s primary use is to host or attend a meeting, so its key metric is tied to the number of meetings hosted (per week). In a product with multiple use cases, such as SoloLearn, the choice of metric becomes harder. There are several ways to use data to confirm or reject your hypotheses about your key metric, almost all of which combine qualitative research with quantitative analysis (mainly correlational analysis). We will cover these concepts and how SoloLearn applies them in future blog posts.
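As a rough illustration of the correlational side, the sketch below ranks candidate actions by how strongly they correlate with retention. Everything in it is an assumption for the sake of the example: the per-user fields (`lessons`, `code_runs`, `posts`, `retained`), the toy data, and the choice of a plain Pearson coefficient. It is not a description of the analysis SoloLearn runs, and a high correlation is only a hint, not proof that an action causes retention.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_candidate_actions(users, actions, outcome="retained"):
    """Sort candidate actions by their correlation with the outcome, strongest first."""
    outcome_values = [u[outcome] for u in users]
    scores = {a: pearson([u[a] for u in users], outcome_values) for a in actions}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-user activity counts in the first week, plus a 0/1 retention flag.
users = [
    {"lessons": 6, "code_runs": 2, "posts": 0, "retained": 1},
    {"lessons": 1, "code_runs": 0, "posts": 1, "retained": 0},
    {"lessons": 4, "code_runs": 5, "posts": 0, "retained": 1},
    {"lessons": 0, "code_runs": 1, "posts": 2, "retained": 0},
]

for action, score in rank_candidate_actions(users, ["lessons", "code_runs", "posts"]):
    print(f"{action}: {score:.2f}")
```

The action at the top of such a ranking is a candidate for the core action, which qualitative research and experiments can then confirm or reject.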
All SoloLearn Decisions Are Supported By Data
Some people say data is a mirror of reality, and others claim it is a window through which to look at the world. Either way, one thing is clear: it is difficult to make good decisions if they are not grounded in carefully selected and analyzed data.
Make sure you track all necessary data. The answer to “when should we implement data tracking?” is “always”. Even if there is no urgent need for tracking, it is best to create a framework for collecting and mining data from the start. Doing so will pay off: the more data you have, the more tools you have in your arsenal to inform future decisions.
There is always a question about what data to track. Strike a balance between not overloading your applications with data-logging requests and having enough data to support the ideas and hypotheses that you want to test.
Even Great Data Still Requires a Great Team
There are two primary inputs when it comes to growth: execution and problem definition. Many companies emphasize the role of an experimental growth engine, where you test as many ideas as quickly as possible. In a future post we will cover how we run multiple A/B tests simultaneously while keeping the data and results clean. Execution is critically important, but success hinges on properly defining the problem and crafting a clear hypothesis about its causes and how we propose to solve it.
After we define a specific problem or propose an idea about how to make the users’ experience better, the team moves on to the stage of initial data validation to see what we can ascertain based on observational data analysis. Observational studies are generally less costly, since they do not require input from other teams, but the tradeoff is often very noisy data. More often than not, the team determines that a properly designed experiment is needed to validate or reject the hypotheses. At SoloLearn, a Product Manager leads the process of designing an experiment that, if properly implemented, will determine whether the problem is solved or the idea really does improve the users’ experience on the app. This design includes the metrics by which we will determine success and the specific data we will track to make that determination objectively.
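To show what an objective determination can look like, here is a minimal sketch of one common way to evaluate a two-variant experiment on a conversion-style metric: a two-proportion z-test. This post does not say which statistical test we use, so treat the test itself, the function names, and the counts below as assumptions chosen for illustration.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B.

    Returns (z, p_value). A positive z means variant B converted at a higher rate.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0:
        # Both variants converted at 0% or 100%: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical result: users who completed a second lesson, control (A) vs. variant (B).
z, p_value = two_proportion_z_test(success_a=100, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The important design choice is made before any data comes in: the success metric and the significance threshold are fixed as part of the experiment design, so the result cannot be reinterpreted after the fact.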
In future posts, we will discuss how to achieve growth in SaaS apps and describe SoloLearn’s experimental culture.
Gagik holds a Ph.D. in Mathematical Economics, serves as SoloLearn’s Data Science Team Lead, and currently teaches Data Science at Yerevan State University. His areas of interest are Statistics, Probability, Data Visualization and Deep Learning. He began his career in financial risk management before shifting to software at SoloLearn.