A Data-Driven Approach to Improving our Customer-Professional Matching
By: Ben Anderson & Xin Liu
When a customer posts a request on Thumbtack, we want to match them with the right professional for the job. When the marketplace was small, this was easy: just blast the request out to all of the pros in the request’s category and location. Today, with millions of requests a year and hundreds of thousands of active pros, we can’t rely on that simple algorithm anymore. The definition of “right” is no longer obvious: the pro and the customer each have their own preferences, and we need to balance how we benefit customers and pros to grow a healthy marketplace in the long run.
In this post, we’ll explore some of the early work we did to improve our marketplace by building systems to leverage our growing historical data. In particular, we’ll discuss our effort to model pro interests in order to send pros more compelling requests.
How we used to think about the model
In the old days of Thumbtack, we only matched on logistics: is this pro in the right category (wedding photography, house cleaning, etc.) and geographical area to serve this customer’s request? As we grew, we introduced a simple binary “limiting” system on the pro side: if a pro didn’t engage with our platform for a certain amount of time, we would limit them to a low fixed number of requests per week. This was much better than nothing, but it had a long list of issues. Pros who went on a long vacation would suddenly find they weren’t getting requests. Pros in very active markets would get more requests than they could handle. And when we did limit requests, we weren’t necessarily sending the requests the pro was most interested in.
Last year, we took the first step toward a more efficient marketplace by improving our understanding of Pros’ interests. The goal of our new system was to encourage Pros who were not engaged with Thumbtack to come back by showing them more relevant requests. We did this by building a model that used our historical data on Pro engagement to estimate a Pro’s interest in a request. Specifically, we predicted the probability that a given Pro would quote on a given request if we sent it to them, P(quote|pro,request), and used that information to determine whom to send the request to.
Model
There are a variety of approaches we could use to model the attractiveness of customer requests to Professionals. We mainly considered collaborative filtering and class probability estimation, and we chose logistic regression for its interpretability, ease of implementation, and extensibility for future improvements. We aimed to predict whether or not a professional would quote on a given request by looking at historical Pro engagement data.
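To make this concrete, here is a minimal sketch of fitting such a model with scikit-learn. The feature names and toy data are illustrative assumptions, not Thumbtack’s actual schema or pipeline.

```python
# A minimal sketch of class probability estimation with logistic regression.
# Feature names and the toy data are assumptions for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Each row is one historical (pro, request) notification;
# `quoted` is 1 if the pro quoted after being notified.
history = pd.DataFrame({
    "category_quote_rate": [0.30, 0.05, 0.22, 0.01, 0.40, 0.10],
    "location_quote_rate": [0.25, 0.10, 0.30, 0.02, 0.35, 0.05],
    "job_size":            [2,    1,    3,    1,    2,    1],
    "quoted":              [1,    0,    1,    0,    1,    0],
})

feature_cols = ["category_quote_rate", "location_quote_rate", "job_size"]

# Fit P(quote | pro, request) on past notification outcomes.
model = LogisticRegression()
model.fit(history[feature_cols], history["quoted"])

# Score a new candidate (pro, request) pair.
candidate = pd.DataFrame(
    {"category_quote_rate": [0.28], "location_quote_rate": [0.20], "job_size": [2]}
)
p_quote = model.predict_proba(candidate[feature_cols])[:, 1][0]
print(p_quote)
```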
Features
One way to measure how interested a Professional is likely to be in a request is to check whether the professional has engaged with similar requests before. For example, if a professional had a much higher quote rate (quotes/notifications) for House Cleaning around the SoMa area (in San Francisco) than in the other categories (e.g., Carpet Cleaning) and locations they serve, we may consider this Professional to be interested in future requests in House Cleaning and SoMa. In addition to category and location, we also considered the Professional’s past engagement along other dimensions such as request time, job size, etc.
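The following sketch shows how quote-rate features like these could be computed from a notification log. The column names and the tiny example log are assumptions made for illustration.

```python
# Illustrative computation of per-category and per-location quote rates.
import pandas as pd

# One row per notification sent to a pro; `quoted` marks whether it led to a quote.
log = pd.DataFrame({
    "pro_id":   [1, 1, 1, 1, 2],
    "category": ["House Cleaning", "House Cleaning", "Carpet Cleaning",
                 "Carpet Cleaning", "House Cleaning"],
    "location": ["SoMa", "SoMa", "Mission", "SoMa", "SoMa"],
    "quoted":   [1, 1, 0, 0, 0],
})

# Quote rate (quotes / notifications) per pro and category.
category_rate = (
    log.groupby(["pro_id", "category"])["quoted"].mean().rename("category_quote_rate")
)

# Quote rate per pro and location.
location_rate = (
    log.groupby(["pro_id", "location"])["quoted"].mean().rename("location_quote_rate")
)

print(category_rate)
print(location_rate)
```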
Counting
Counting is fundamental to computing the engagement-based features above. Intuitively, a quote from six months ago should not carry the same weight as a quote from three days ago. One straightforward approach is to have several versions of a feature based on different tracking time windows. However, this approach has several downsides: the number of features increases quickly, we can only have a limited number of time windows (1 day, 1 week, or 1 month), and we cannot track a time window in between (e.g., 2 weeks). To address these issues, we use exponentially decaying counters with configurable half lives to represent engagement. For example, if we set the half life of a quote to be one month, then a quote submitted last month will be counted as 0.5 quotes in today’s feature computation. Different half lives can be used for different types of features to capture different decay speeds. For example, customer review ratings should have a longer half life (e.g., a few months) than quotes. This approach has been an effective, generic way to count a variety of metrics in our system.
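Here is a small sketch of such a half-life decayed counter. The incremental-update implementation below is one possible way to realize the decay described above, not necessarily how our production counters are stored.

```python
# Sketch of a counter whose value decays with a configurable half life:
# an event one half life old contributes 0.5, two half lives old 0.25, etc.
class DecayedCounter:
    def __init__(self, half_life_days):
        self.half_life_days = half_life_days
        self.value = 0.0
        self.last_update_day = None

    def _decay(self, day):
        # Apply the decay accumulated since the last update.
        if self.last_update_day is not None:
            elapsed = day - self.last_update_day
            self.value *= 0.5 ** (elapsed / self.half_life_days)
        self.last_update_day = day

    def add(self, day, amount=1.0):
        """Record an event (e.g., a quote) on the given day."""
        self._decay(day)
        self.value += amount

    def read(self, day):
        """Current decayed count as of the given day."""
        self._decay(day)
        return self.value

# A quote submitted 30 days ago, with a one-month half life, counts as ~0.5 today.
quotes = DecayedCounter(half_life_days=30)
quotes.add(day=0)
print(quotes.read(day=30))  # 0.5
```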
Online Tuning
Now that we have a model for predicting P(quote|pro,request), we need to decide how to use this number to determine whether to notify a Pro of a request. The easy thing to do might be to simply rank pros by this score and notify the ones who appear most interested. However, doing this would result in some bad outcomes for the marketplace. We ultimately want to notify Pros probabilistically so that they have some chance of being notified of every request. This accounts for changing preferences and corrects for overfitting. We calculate the probability that we notify a given pro of a request, P_{notify}, using a combination of factors (see the sketch after this list). Consider the following groups of pros:
- New Pros: we don’t have historical engagement data for new Pros, but we want to notify them of requests to give them a chance to engage. We override the regular P_{notify} for these pros until we understand what they’re interested in.
- Disengaged Pros: we know that Pros disengage with our platform for various reasons, and will often re-engage in the future. We want to always send a minimum number of weekly requests to a Pro to give them the option of re-engaging when they’re ready.
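The sketch below shows one way these pieces could fit together. The override value, weekly floor, and fallback probability are assumptions for illustration; the post does not specify the actual formula.

```python
# Sketch of turning the model score into a probabilistic notification decision.
# Constants and the fallback logic are illustrative assumptions.
import random

NEW_PRO_P_NOTIFY = 0.8     # assumed override for pros with no engagement history
MIN_WEEKLY_REQUESTS = 3    # assumed minimum number of weekly requests per pro

def p_notify(p_quote, is_new_pro, requests_sent_this_week):
    """Probability of notifying a pro about a request."""
    if is_new_pro:
        # No historical data yet: give them a chance to engage.
        return NEW_PRO_P_NOTIFY
    if requests_sent_this_week < MIN_WEEKLY_REQUESTS:
        # Guarantee a minimum flow of requests so disengaged pros
        # can re-engage when they are ready.
        return max(p_quote, 0.5)
    # Otherwise notify probabilistically based on predicted interest,
    # so every pro keeps some chance of seeing each request.
    return p_quote

def should_notify(p_quote, is_new_pro, requests_sent_this_week):
    return random.random() < p_notify(p_quote, is_new_pro, requests_sent_this_week)
```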
A/B Testing and Next Steps
We ran this new model against the old one as an A/B test and saw a significant increase in quotes/pro with a drop in notifications sent, resulting in a huge improvement in quotes/notification.
Of course, we’re not done here. Ultimately, our goal is to optimize the match for both the customer and the pro, and we’re only beginning to build out our data-driven systems. Join Thumbtack and help us build them.
Originally published at https://engineering.thumbtack.com on July 1, 2016.