RFM Analysis for Customer Segmentation

RFM Metrics for Customer Segmentation (Photo by Odorite.com)
  1. Recency: How much time has elapsed since a customer’s last activity or transaction with the brand?
  2. Frequency: How often has the customer transacted or interacted with the brand during a particular period of time?
  3. Monetary: How much has a customer spent with the brand during a particular period of time?
  1. It employs objective numerical scales to produce a high-level picture of consumers that is both succinct and instructive.
  2. It’s simple enough that marketers can utilize it without expensive tools.
  3. It’s intuitive: the segmentation method’s output is easy to comprehend and analyze.
  • Champion Customer: bought recently, buys often, and spends the most
  • Loyal/Committed: spends good money, buys often, and is responsive to promotions
  • Potential: recent customers who spent a good amount and bought more than once
  • Promising: recent shoppers who haven’t spent much
  • Requires Attention: above average recency, frequency, and monetary values; may not have bought very recently though
  • Demands Activation: below average recency, frequency, and monetary values; will be lost if not reactivated
  • Can’t Lose Them: made the biggest purchases, and often, but haven’t returned for a long time
# Importing Required Libraries

import pandas as pd
import numpy as np
from datetime import timedelta
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import squarify  # used for the treemap plot later on
import warnings
warnings.filterwarnings('ignore')
Invoices Dataset
Pandas DataFrame Showing Invoices Dataset
  • Invoice: This is a unique number generated by this FMCG store to help trace payment details.
  • StockCode: This is a unique number assigned to each product in a particular category to help in stock-keeping/tracking purposes.
  • Description: This describes the product and provides information about it.
  • Quantity: This gives the number of products purchased.
  • InvoiceDate: This represents the time stamp (time and date) on which the invoice has been billed and the transaction officially recorded.
  • Price: This refers to the price of each product.
  • CustomerID: This refers to the unique number assigned to each customer.
  • Country: This refers to the country in which the purchase is being made.
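To make the column list concrete, here is a minimal sketch of a frame with the same columns. The rows and the name `sample_invoices` are invented for illustration and are not taken from the real dataset; the customer column is spelled `Customer ID` here to match the grouping code used later in this article.

```python
import pandas as pd

# Hypothetical sample rows; every value here is made up for illustration.
sample_invoices = pd.DataFrame({
    "Invoice": ["1001", "1001", "1002"],
    "StockCode": ["A1", "B2", "A1"],
    "Description": ["MUG", "PLATE", "MUG"],
    "Quantity": [2, 1, 5],
    "InvoiceDate": ["2021-01-05 10:00", "2021-01-05 10:00", "2021-02-01 15:30"],
    "Price": [3.5, 7.0, 3.5],
    "Customer ID": [1, 1, 2],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

# One row per product line on an invoice, so an invoice can span several rows.
print(sample_invoices)
```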
  • First, we’ll be using the Python Script below to convert the InvoiceDate Feature from Object format to DateTime format.
# Converting InvoiceDate from object to datetime format

invoices_data['InvoiceDate'] = pd.to_datetime(invoices_data['InvoiceDate'])
  • Now, we will drop rows with missing (NA) values from the DataFrame using the Python Script below:
# Drop NA Values

invoices_data.dropna(inplace=True)
  • Now, when we generate descriptive statistics of Dataset we have the following information:
# Generate descriptive stats of dataset

invoices_data.describe()
Descriptive Stats for Invoices DataFrame
  • In the above picture, we can see that some rows have a negative quantity, which is not a valid purchase, so we keep only rows with quantity > 0 using the Python Script below:
# Keep only rows with a positive quantity

positive_quantity = invoices_data['Quantity'] > 0
invoices_data = invoices_data[positive_quantity]
Descriptive Stats after Filtering Quantity
  • We create a new TotalSum column with the Python Script below:
# Creating TotalSum column for Invoices dataset

invoices_data['TotalSum']= invoices_data['Quantity']*invoices_data['Price']
  • We then create a snapshot of the date, with the Python Script below:
# Create snapshot date

snapshot_date = invoices_data['InvoiceDate'].max() + timedelta(days=1)
print(snapshot_date)
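As a small worked example of why the snapshot date is shifted by one day, the sketch below computes recency on made-up purchase dates. The frame `purchases` and its values are assumptions for illustration only; with the extra day, the most recent customer gets a recency of 1 rather than 0.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical purchase dates for two customers (invented values).
purchases = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2021-01-05", "2021-03-01", "2021-02-10"]),
})

# The snapshot is one day after the latest purchase in the data.
snapshot = purchases["InvoiceDate"].max() + timedelta(days=1)

# Recency = days between the snapshot and each customer's latest purchase.
last_purchase = purchases.groupby("Customer ID")["InvoiceDate"].max()
recency = (snapshot - last_purchase).dt.days

print(recency.to_dict())  # {1: 1, 2: 20}
```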
  • Now, we drop the records for returned items, which are indicated with a C, by filtering:
# Drop the returned items Records

invoices_data= invoices_data[~invoices_data['StockCode'].str.contains('C')]
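One thing worth checking on your own data: `str.contains('C')` matches a C anywhere in the value, not only at the start. The sketch below uses invented codes to show the difference from `str.startswith('C')`. Which column carries the return marker, and where in the value it appears, is an assumption to verify against your dataset.

```python
import pandas as pd

# Hypothetical codes, invented to show the two matching behaviours.
codes = pd.Series(["C1001", "85123A", "84029C", "22423"])

anywhere = codes.str.contains("C")        # True wherever a C appears
prefix_only = codes.str.startswith("C")   # True only for a leading C

print(anywhere.tolist())     # [True, False, True, False]
print(prefix_only.tolist())  # [True, False, False, False]
```

If only a leading C marks a return, the `contains` filter would also drop ordinary rows whose code merely includes the letter.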
  • We can group customers by CustomerID after creating the snapshot date using the python script below:
# Grouping by CustomerID

invoices_rfm = invoices_data.groupby(['Customer ID']).agg({
'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
'Invoice': 'count',
'TotalSum': 'sum'})
Invoices Dataset after Aggregating Fields
  1. Recency: How recently a customer has interacted or transacted with the brand, that is, how long it has been since the customer engaged in an activity or made a purchase. The most common activity is a purchase for an FMCG store, though other examples include the most recent visit to a website or the use of a mobile app for other scenarios/industries.
  2. Frequency: During a given time period, how many times has a consumer transacted or interacted with the brand? Customers who participate in activities regularly are clearly more involved and loyal than those who do so infrequently. This metric answers the question: how often?
  3. Monetary: This factor, also known as “monetary value,” reflects how much a customer has spent with the brand over a given period of time. Those who spend a lot of money should be handled differently from customers who spend little. The average purchase amount, calculated by dividing Monetary by Frequency, is a significant secondary factor to consider when segmenting customers.
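The average purchase amount mentioned above can be sketched as a simple column division. The frame `rfm_example`, its values, and the column name `AvgPurchase` are hypothetical and only serve to show the arithmetic.

```python
import pandas as pd

# Hypothetical RFM values for two customers (invented for illustration).
rfm_example = pd.DataFrame(
    {"Recency": [3, 40], "Frequency": [10, 2], "Monetary": [500.0, 300.0]},
    index=pd.Index([1, 2], name="Customer ID"),
)

# Average purchase amount = Monetary / Frequency.
rfm_example["AvgPurchase"] = rfm_example["Monetary"] / rfm_example["Frequency"]

print(rfm_example)
```

Here customer 1 spends more in total (500 vs. 300) but less per purchase (50 vs. 150), which is the kind of distinction this secondary metric brings out.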
  • Here is a Python Script to rename the columns:
invoices_rfm.columns = ['Recency', 'Frequency', 'Monetary']
invoices_rfm.head()
RFM Dataframe after Renaming Field
  • We can plot the distribution using the Python Script below:
# Plot RFM distributions
plt.figure(figsize=(12,10))

# Plot distribution of R
plt.subplot(3, 1, 1); sns.distplot(invoices_rfm['Recency'])

# Plot distribution of F
plt.subplot(3, 1, 2); sns.distplot(invoices_rfm['Frequency'])

# Plot distribution of M
plt.subplot(3, 1, 3); sns.distplot(invoices_rfm['Monetary'])

# Show the plot
plt.show()
Plot Distribution of Recency, Frequency, Monetary Value
  • We’ll calculate the R, F, and M groups by:
  • Creating labels for Recency, Frequency, and Monetary value (the Recency labels run in reverse, so more recent customers score higher),
  • Assigning those labels to 5 equal quantile groups (quintiles),
  • Creating new columns R, F, and M.
R_labels, F_labels, M_labels = range(5,0,-1),range(1,6),range(1,6)

invoices_rfm['R'] = pd.qcut(invoices_rfm['Recency'],q=5,labels=R_labels)
invoices_rfm['F'] = pd.qcut(invoices_rfm['Frequency'],q=5,labels=F_labels)
invoices_rfm['M'] = pd.qcut(invoices_rfm['Monetary'],q=5,labels=M_labels)

invoices_rfm.head()
Pandas Data frame Showing the Calculated R, F, and M groups of the data frame
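To see what the quantile step does, here is a minimal sketch on ten invented recency values. It also shows a commonly used workaround, ranking with `rank(method='first')` before binning, for the case where many customers share the same value and `pd.qcut` cannot form unique bin edges. That workaround is not part of the original script; it is included as an assumption about what you may run into.

```python
import pandas as pd

# Hypothetical recency values in days (invented for illustration).
recency_days = pd.Series([1, 5, 9, 14, 20, 33, 47, 60, 90, 200])

# Reversed labels: the most recent fifth of customers gets a score of 5.
r_score = pd.qcut(recency_days, q=5, labels=list(range(5, 0, -1)))
print(r_score.tolist())  # [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]

# Heavily tied values (typical for purchase counts) can produce duplicate bin edges.
frequency = pd.Series([1, 1, 1, 1, 1, 2, 2, 3, 5, 9])
try:
    pd.qcut(frequency, q=5, labels=list(range(1, 6)))
    qcut_failed = False
except ValueError:
    qcut_failed = True

# Workaround sketch: rank first so every value is distinct, then bin the ranks.
f_score = pd.qcut(frequency.rank(method="first"), q=5, labels=list(range(1, 6)))
print(f_score.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Note that ranking with `method='first'` splits tied customers across adjacent groups by row order, so it trades exactness for equally sized groups.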
  • We have to concatenate the RFM quintile values to create RFM segments using the python scripts below:
# Concatenating the RFM quintile values to create RFM Segments

def concat_rfm(x): return str(x['R']) + str(x['F']) + str(x['M'])

invoices_rfm['RFM_Concat'] = invoices_rfm.apply(concat_rfm, axis=1)
invoices_rfm.head()
Pandas Data frame Showing the Created RFM Segments of the data frame
  • Now let’s count the number of unique segments
  • Then Calculate the RFM score with the python scripts below.
# Count num of unique segments
rfm_count_unique = invoices_rfm['RFM_Concat'].nunique()
print(rfm_count_unique)

# Calculate RFM_Score (cast the categorical R, F, M labels to integers before summing)
invoices_rfm['RFM_Score'] = invoices_rfm[['R','F','M']].astype(int).sum(axis=1)
invoices_rfm.head()
Pandas Data frame Showing the Calculated RFM score of each customer in the data frame
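The difference between the concatenated segment and the summed score can be sketched on two invented customers. The frame `scores` is hypothetical, and the string concatenation here is a vectorized variant rather than the row-wise `apply` used above.

```python
import pandas as pd

# Two hypothetical customers with opposite profiles (invented values).
scores = pd.DataFrame({"R": [5, 1], "F": [1, 1], "M": [1, 5]}, index=["a", "b"])

# Concatenation keeps track of which dimension each digit came from.
scores["RFM_Concat"] = (
    scores["R"].astype(str) + scores["F"].astype(str) + scores["M"].astype(str)
)

# The summed score collapses the three dimensions into one number.
scores["RFM_Score"] = scores[["R", "F", "M"]].sum(axis=1)

print(scores)
```

Both customers end up with a score of 7, although one is a recent low spender ("511") and the other a lapsed high spender ("115"), so the score alone cannot tell them apart.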
  • Then we create a conditional Statement using the python scripts below to segment Customers (by CustomerID column) as one of the segments: “Can’t Lose Them”, “Champions”, “Loyal/Committed”, “Potential”, “Promising”, “Requires attention”, or “Demands Activation”:
# Define invoices_rfm_level function

def invoices_rfm_level(df):
    # Map a row's RFM_Score to a segment name, checking thresholds from highest to lowest
    if df['RFM_Score'] >= 9:
        return "Can't Lose Them"
    elif df['RFM_Score'] >= 8:
        return 'Champions'
    elif df['RFM_Score'] >= 7:
        return 'Loyal/Committed'
    elif df['RFM_Score'] >= 6:
        return 'Potential'
    elif df['RFM_Score'] >= 5:
        return 'Promising'
    elif df['RFM_Score'] >= 4:
        return 'Requires Attention'
    else:
        return 'Demands Activation'

# Create a new column RFM_Segment
invoices_rfm['RFM_Segment']= invoices_rfm.apply(invoices_rfm_level, axis=1)

# Printing the header with top 15 rows
invoices_rfm.head(15)
  • We have a Pandas Data frame Showing the Calculated RFM Segment of each customer in the data frame below:
Pandas Data frame Showing the Calculated RFM Segment of each customer in the data frame
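As an alternative to the row-wise conditional, the same score thresholds can be sketched with `pd.cut`, which assigns all rows at once. The series `rfm_scores` holds invented scores, and this is a sketch of an equivalent mapping under the thresholds shown above, not the script used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical scores (invented for illustration).
rfm_scores = pd.Series([3, 4, 5, 6, 7, 8, 9, 15], name="RFM_Score")

# Left-closed bins: [-inf, 4), [4, 5), [5, 6), [6, 7), [7, 8), [8, 9), [9, inf)
bins = [-np.inf, 4, 5, 6, 7, 8, 9, np.inf]
labels = [
    "Demands Activation",
    "Requires Attention",
    "Promising",
    "Potential",
    "Loyal/Committed",
    "Champions",
    "Can't Lose Them",
]

segments = pd.cut(rfm_scores, bins=bins, labels=labels, right=False)
print(segments.tolist())
```

Keeping the thresholds in one list makes them easier to adjust than a chain of conditions, at the cost of being a little less explicit.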
  • Calculate the average values for each RFM Segment, and return the size of each segment, using the python script below:
# Calculate average values for each RFM_Segment,
# and return the size of each segment

rfm_segment_agg = invoices_rfm.groupby('RFM_Segment').agg({
'Recency': 'mean',
'Frequency': 'mean',
'Monetary': ['mean', 'count']
}).round(1)

# Print the aggregated dataset
rfm_segment_agg
  • We have a Pandas Data frame showing the calculated values for each RFM_Segment below:
Pandas Data frame Showing the Calculated values for each RFM_Segment in the data frame
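The dictionary form of `agg` above produces two-level column names such as `("Monetary", "count")`. A hedged alternative is pandas named aggregation, which yields flat column names directly. The frame `toy` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical segmented customers (invented values).
toy = pd.DataFrame({
    "RFM_Segment": ["Champions", "Champions", "Promising"],
    "Recency": [2, 4, 30],
    "Frequency": [10, 20, 1],
    "Monetary": [100.0, 300.0, 20.0],
})

# Named aggregation: output_column=(input_column, function)
summary = toy.groupby("RFM_Segment").agg(
    RecencyMean=("Recency", "mean"),
    FrequencyMean=("Frequency", "mean"),
    MonetaryMean=("Monetary", "mean"),
    Count=("Monetary", "count"),
).round(1)

print(summary)
```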
  • Plotting the RFM Segment on the Bar plot using the Python Script below:
plt.bar(x=rfm_segment_agg.index, height=rfm_segment_agg["Monetary"]["count"])
plt.xticks(rotation=90)
Bar plot Representing the count of each Segment
  • Squarify library: I chose Squarify because it is built on top of Matplotlib and it uses space efficiently.
  • Plotting the RFM level on the Squarify plot using the Python Script below:
rfm_segment_agg.columns = ['RecencyMean','FrequencyMean',
'MonetaryMean', 'Count']

#Create our plot and resize it.
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 9)
# Take the labels from the index so that each label matches its segment's count
squarify.plot(sizes=rfm_segment_agg['Count'],
              label=rfm_segment_agg.index.tolist(), alpha=.6)

plt.title("RFM Segments by Count")
plt.axis('off')
plt.show()
A Squarify Plot of Customer RFM Segmentation
  • Who are your best customers?
  • Which of your customers could contribute to your churn rate?
  • Who has the potential to become valuable customers?
  • Which of your customers can be retained?
  • Which of your customers are most likely to respond to engagement campaigns?
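Questions like these can be answered by filtering on the segment column. The sketch below uses an invented frame `segmented`; treating "Champions" as the best customers and "Demands Activation" as churn risks follows the segment descriptions at the start of this article, and is an interpretation rather than a fixed rule.

```python
import pandas as pd

# Hypothetical segmented customers (invented values).
segmented = pd.DataFrame(
    {
        "Monetary": [650.0, 40.0, 900.0],
        "RFM_Segment": ["Champions", "Demands Activation", "Champions"],
    },
    index=pd.Index([101, 102, 103], name="Customer ID"),
)

# Best customers: the Champions segment, highest spenders first.
best = segmented[segmented["RFM_Segment"] == "Champions"].sort_values(
    "Monetary", ascending=False
)

# Possible churn risks: customers in the Demands Activation segment.
at_risk = segmented[segmented["RFM_Segment"] == "Demands Activation"]

print(best.index.tolist())     # [103, 101]
print(at_risk.index.tolist())  # [102]
```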

Harshal Kakaiya
