Optimizing Retailer Revenue with Sales Forecasting AI

Ahmed is a senior data scientist who loves to dig into clients' problems and solve them using state-of-the-art data-driven solutions.

Forecasting is a technique that uses historical data and events to build estimates about future trends, potential disasters, and the overall behavior of any subject. It can serve as probabilistic support for decision analysis and for estimating expenses, revenues, and budget plans.

Forecasting in business can be divided into two distinct categories: qualitative forecasting and quantitative forecasting.

  • Qualitative forecasting. Qualitative forecasting is concerned with market research and market strategies, so it is more expert-driven and influenced by human factors. It is usually aimed at short-term strategy-building.
  • Quantitative forecasting. Quantitative forecasting excludes any human factor. It depends only on the historical data an entity has and aims to predict factors such as sales, prices, and other financial metrics over the long run.

For more information, you can take a look at Investopedia’s Financial Forecasting primer.

Both types of forecasting have shown a lot of promise and managed to create business enhancements for many entities.

If you would like to learn more about how forecasting can affect market decisions, a good place to start is Prediction Markets: Fundamentals, Designs, and Applications by Stefan Luckner et al.

One problem that we can address using quantitative forecasting is demand forecasting or sales forecasting.

Demand Forecasting and Sales Forecasting Approaches

Suppose you are a retailer operating many stores, and each store has a static stock replenishment system driven by human decisions, which are in turn based on events such as seasons and market trends.

Occasionally, such a system will lead to one of two major problems:

  • Overstocked products. Having a substantial stock of product planned to be sold during a certain timeframe but not sold.
  • Out-of-stock products. Having an opportunity to sell product but being unable because the product is not available.

According to an IHL Group survey of 600 households and retailers, retailers are losing nearly $1 trillion in sales annually because of out-of-stock problems.

“Shoppers encounter out-of-stocks in as often as one in three shopping trips, according to the report, which was emailed to Retail Dive. At food, drug and mass retailers, they encounter out-of-stock items in one in five trips, at department store and specialty retailers it’s one in four, and at electronics stores one in three,” Retail Dive reported, summarizing IHL Group’s findings.

Both of these problems decrease revenue: either we lose a sales opportunity, or we tie up money in unsold products, which means holding assets that won’t generate revenue any time soon to compensate for their costs.

This is clearly detrimental to the entity’s cash flow, and to address this risk, we need two things:

  • More inputs to help us make the decision
  • A forecasting team that can do long-term strategic planning for stock replenishment systems

So, the question is: What are the indications that you need to adopt AI in your company to help your forecasting process?

To make this decision, you need expert answers to the following questions:

  • Is predicting your sales pipeline difficult?
  • Is your sales forecasting inaccurate, or not accurate enough (even though you have historical data)?
  • Do you suffer from out-of-stock or overstock problems?
  • Are you unable to extract descriptive and inferential insights from the data you possess to drive your decisions and planning?

The answers to these questions should give you a clear signal as to whether to start employing AI in your forecasting strategies.

How Can AI Benefit the Sales Forecasting Process?

AI has shown great results in outperforming human forecasting in many companies, enabling faster decision-making and planning as well as more reliable risk management strategies. This is why top companies are adopting AI in their planning.

When dealing with a demand forecasting problem, time series forecasting can be used to predict the sales for each product, allowing companies to optimize stock replenishment and minimize the occurrence of the aforementioned problems. However, many models struggle with forecasting at an individual product level, or product category level, because they lack the necessary features. So, the question is: How can we make it work and make the most out of our data?

For real-life retailers, these problems are anything but trivial. Either you have 1,000+ products that introduce a lot of non-linearity and multivariate dependencies into the dataset, or you need to know the projected replenishment volume far enough in advance to produce it, buy it, or otherwise acquire it by the time demand materializes.

In this case, classical models like ARIMA and ETS won’t perform well, and we will need a more robust method such as RNNs or XGBoost, which is why we need a lot of feature creation to tackle this problem.

For this to work, we need to:

  • Acquire the necessary input features required to explain the variety and diversity of the products.
  • Categorize our data, so each category is of the same time series behavior, and each category will be addressed using a standalone model.
  • Train our models on the categorized input features acquired.

For the sake of this article, we will take XGBoost as an example of such a model.

Required Features in Sales Forecasting Models

The set of features needed for this problem is classified into four main groups:

  • Time-related features
  • Sales-related features
  • Price-related features
  • Stock-related features

Time-related Features

Unlike deep learning models such as recurrent neural networks (RNNs), machine learning models such as XGBoost cannot capture long-term or short-term dependencies within a time series unless we manually extract features from the datetime column.

Many features can be extracted from the date, such as:

  • Year
  • Month
  • Day
  • Hour
  • Weekend or weekday (whether the day is a weekday or a weekend)
  • Dayofweek

Many approaches simply extract those time features and feed them to the model, but further engineering can be done. Features such as month, day, hour, and dayofweek are periodic: their values repeat over a fixed range. How can a model deal with this?

The short answer is that it can’t, because the model sees 00:00 as 23 hours away from 23:00 when in fact it is one hour away. One way to solve this is to apply a cyclic transformation to these features.

Using sine and cosine, one can convert each hour of the 24-hour day into an angle. Feeding the sine and cosine of that angle to the model makes it much easier for it to detect the true distance between hours, regardless of the periodicity.

This removes the discontinuity in periodic time features, or in any periodic feature.
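To make the discontinuity concrete, here is a minimal sketch comparing the gap between 23:00 and 00:00 before and after the encoding. The helper name `encode_hour` is purely illustrative.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour in [0, 24) to a point on the unit circle."""
    theta = 2 * np.pi * hour / 24
    return np.array([np.sin(theta), np.cos(theta)])

# Raw values: 23:00 and 00:00 look 23 units apart.
raw_gap = abs(23 - 0)

# Encoded values: distance between the two points on the circle.
gap_23_to_0 = np.linalg.norm(encode_hour(23) - encode_hour(0))
gap_0_to_1 = np.linalg.norm(encode_hour(0) - encode_hour(1))
gap_0_to_12 = np.linalg.norm(encode_hour(0) - encode_hour(12))

print(raw_gap, gap_23_to_0, gap_0_to_1, gap_0_to_12)
```

After encoding, 23:00 is exactly as close to 00:00 as 01:00 is, while hours on opposite sides of the clock (00:00 and 12:00) end up farthest apart.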

For our article, we will use the publicly available Sample Superstore dataset and try to predict the monthly sales for a certain product category.

We will use a Python 3.7 environment with the following libraries:

  • NumPy
  • Pandas
  • XGBoost
  • scikit-learn
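The snippets in this article work on a dataframe `df` with one row per month and, at least, `Order Date`, `Sales`, and `month` columns. The following is only a sketch of how a frame of that shape might be built with pandas: the four orders are made up, the name `df_example` is hypothetical, and the `Category` column and its values are assumptions about how the raw file is laid out.

```python
import pandas as pd

# Tiny stand-in for the raw orders table (values are made up for illustration).
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-11", "2017-02-15"]),
    "Category": ["Furniture", "Furniture", "Furniture", "Technology"],
    "Sales": [100.0, 50.0, 80.0, 999.0],
})

# Keep one product category, then sum its sales per calendar month.
furniture = orders[orders["Category"] == "Furniture"]
df_example = (
    furniture.set_index("Order Date")["Sales"]
    .resample("MS")  # "MS" = month-start frequency
    .sum()
    .reset_index()
)

# Date parts that a model can consume directly.
df_example["year"] = df_example["Order Date"].dt.year
df_example["month"] = df_example["Order Date"].dt.month

print(df_example)
```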

Now, I will show you how to build the periodic feature conversion function and test whether it helps.

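import numpy as np

def convert_periodic(val, period):
    # Map a periodic value onto the unit circle so the end of a cycle sits next to its start.
    theta = 2 * np.pi * val / period
    sin_period = np.sin(theta)
    cos_period = np.cos(theta)
    return sin_period, cos_period

def convert_month(x):
    # Months repeat every 12 steps.
    return convert_periodic(x, 12)

# Unzip the (sin, cos) pairs into two new columns.
df['sin_month'], df['cos_month'] = zip(*df['month'].map(convert_month))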

With this in place, we are ready to test whether the added feature will improve performance or not.

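from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X = df.drop(['Order Date', 'Sales', 'sin_month', 'cos_month'], axis=1)
y = np.log1p(df['Sales'])

# shuffle=False keeps the row order, so with date-sorted rows the test set is the latest 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

As we can see, we applied a log1p transform, log(1 + x), to the target sales feature because its distribution is skewed (not normally distributed).

Now, we will fit an XGBoost regressor on the data.

# Hyperparameters are not listed in this article; library defaults are an assumption.
model = XGBRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f'Loss without cyclic conversion on testing set is {sqrt(mean_squared_error(y_test, y_pred))}')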

Loss without cyclic conversion on testing set is 0.4313676193485837
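Because the target is transformed with `np.log1p`, this loss is expressed in log units, and predictions have to be mapped back with `np.expm1` before they can be read as sales amounts. A small sketch with made-up numbers:

```python
import numpy as np

sales = np.array([0.0, 120.0, 3500.0])  # made-up sales figures
log_sales = np.log1p(sales)             # log(1 + x), defined even at x = 0
recovered = np.expm1(log_sales)         # inverse of log1p

# Pretend the model's predictions are off by 0.1 in log space.
log_pred = log_sales + 0.1
pred_sales = np.expm1(log_pred)

# In the original scale, that error is multiplicative on (1 + sales).
ratio = (1 + pred_sales) / (1 + sales)

print(recovered)
print(pred_sales)
print(ratio)
```

An error of 0.1 in log space corresponds to a factor of e^0.1 (roughly 10.5%) on 1 + sales, whatever the size of the sale, which is one reason an RMSE in log space behaves like a relative error.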

Next, we will try with our created feature.

# Keep the cyclic month features this time.
X = df.drop(['Order Date', 'Sales'], axis=1)
y = np.log1p(df['Sales'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model.fit(X_train, y_train)  # refit on the new feature set before predicting
y_pred = model.predict(X_test)

print(f'Loss with cyclic conversion on testing set is {sqrt(mean_squared_error(y_test, y_pred))}')

Loss with cyclic conversion on testing set is 0.33868030449130826

As we can see, the RMSE improved from about 0.43 to about 0.34.

Some other time-related features that you can think about, depending on your problem, are:

  • Number of months since the item was in the store
  • Number of days since the last sale

Sales-related Features

Sales history is the core input needed to predict our sales, so how do we get the most out of sales data? We can achieve this using the concepts of lag and autocorrelation.

Lag features are historical sales records for the products. For example, if we use 12 lag features on monthly sales to predict sales for May 2020, we provide the model with the records from May 2019 through April 2020.

Autocorrelation plots show how strongly the target is correlated with each of its lagged values. This helps us keep only the lags that are actually correlated with the target, which reduces memory usage and feature redundancy.

This is how we can add lag features to our dataframe:

for i in range(3):
    # lag_1 is the previous month's sales, lag_2 the month before that, and so on.
    df[f'lag_{i+1}'] = df['Sales'].shift(i+1)
# The first three rows have no complete history, so drop them.
df = df.dropna()
df.head()

Here, I chose to include three lag features in our training set. The number of lags is a hyperparameter: you can choose it based on the autocorrelation plot, or try several values and keep the best one in the tuning stage.
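One way to back the choice of lags with numbers rather than a plot is to compute the autocorrelation at each candidate lag. The sketch below uses pandas' `Series.autocorr` on a synthetic monthly series with a yearly pattern. The data is generated for illustration and is not taken from the Superstore file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
months = np.arange(48)
# Synthetic monthly sales: a 12-month seasonal cycle plus a little noise.
sales = pd.Series(
    100 + 20 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 2, size=48)
)

# Pearson correlation between the series and itself shifted by each lag.
acf = {lag: sales.autocorr(lag=lag) for lag in range(1, 13)}

# Rank lags by the strength of the correlation, positive or negative.
best_lags = sorted(acf, key=lambda lag: abs(acf[lag]), reverse=True)[:3]

for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.2f}")
print("strongest lags:", best_lags)
```

On a series like this one, lag 12 is strongly positive and lag 6 strongly negative, which is the kind of signal that would justify keeping those lags as features.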

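# X now contains the lag features as well as the cyclic month features.
X = df.drop(['Order Date', 'Sales'], axis=1)
y = np.log1p(df['Sales'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
model.fit(X_train, y_train)  # refit on the new feature set before predicting
y_pred = model.predict(X_test)

print(f'Loss with lag features on testing set is {sqrt(mean_squared_error(y_test, y_pred))}')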

Loss with lag and aggregated sales features on testing set is 0.2862175857169188

Now, the RMSE has improved to about 0.29, using both lag features and cyclic conversions.

Some additional sales-related features you may add:

  • Item fraction sold (the fraction of items sold in terms of the total sales in a store)
  • Frequency of sale events for the item’s category
  • Adding the concept of seniority

Seniority is a concept introduced to assign a seniority level to new items in a store:

  • Seniority 0: items new to the company
  • Seniority 1: items never sold in this store but sold in other stores of the company
  • Seniority 2: items that have been sold in this store before
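The three levels can be derived from sales history alone. A minimal sketch, assuming a hypothetical `history` table with one row per (store, item) pair that has ever recorded a sale:

```python
import pandas as pd

# Hypothetical sales history: store/item pairs that have sold at least once.
history = pd.DataFrame({
    "store": ["S1", "S1", "S2"],
    "item": ["A", "B", "A"],
})

def seniority(store, item, history):
    """Return 0, 1, or 2 following the three levels listed above."""
    item_rows = history[history["item"] == item]
    if item_rows.empty:
        return 0  # never sold anywhere in the company
    if (item_rows["store"] == store).any():
        return 2  # already sold in this store
    return 1      # sold in other stores, new to this one

levels = [
    seniority("S2", "C", history),  # item C has no history at all
    seniority("S2", "B", history),  # item B sold only in S1
    seniority("S1", "A", history),  # item A already sold in S1
]
print(levels)
```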

Price-related Features

A simple argument is that price and promotions are among the direct causes of rises and declines in sales. Price is also one of the best ways to differentiate between categories, subcategories, and super-categories of products.

For example, assuming that a category and a subcategory have been assigned to each product, one can create the following price features:

  • (Mean, Max, Min, Median) prices across category
  • (Mean, Max, Min, Median) prices across subcategory
  • Comparisons between those statistics, such as the difference between each statistic in both category and subcategory

This aggregation can be repeated over several groupings (assuming we aim to predict monthly demand), such as:

  • Month, store, category
  • Month, store, subcategory
  • Month, store, item, category
  • Month, store, item, subcategory

More features can also be added by dropping the monthly grouping to study the overall behavior of prices.
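A sketch of how such grouped price statistics might be attached to each row with pandas follows. The table, its column names, and the `add_price_stats` helper are all illustrative assumptions.

```python
import pandas as pd

# Made-up price table for a single month and store.
prices = pd.DataFrame({
    "month": ["2020-01"] * 4,
    "store": ["S1"] * 4,
    "category": ["Office", "Office", "Office", "Tech"],
    "subcategory": ["Paper", "Paper", "Binders", "Phones"],
    "price": [4.0, 6.0, 11.0, 300.0],
})

stats = ["mean", "max", "min", "median"]

def add_price_stats(frame, keys, prefix):
    """Attach group-level price statistics to every row of `frame`."""
    agg = (
        frame.groupby(keys)["price"]
        .agg(stats)
        .add_prefix(f"{prefix}_price_")
        .reset_index()
    )
    return frame.merge(agg, on=keys, how="left")

features = add_price_stats(prices, ["month", "store", "category"], "cat")
features = add_price_stats(features, ["month", "store", "subcategory"], "subcat")

# Comparison feature: how far the subcategory mean sits from the category mean.
features["mean_price_gap"] = features["subcat_price_mean"] - features["cat_price_mean"]

print(features[["category", "subcategory", "price", "cat_price_mean", "mean_price_gap"]])
```

Dropping `"month"` from the key lists would give the overall, non-monthly versions of the same statistics.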

Stock-related Features

Stock data is not commonly used by retailers and sales forecasters, but it makes a lot of difference in sales forecasting models. Stock datasets mainly hold the daily inventory level of each product in each store. Combining them with sales data gives a monthly turnover ratio for each product. This ratio indicates how fast the stock of a product sells out, and it has two main benefits:

  • It can help the model forecast sales based on the current inventory level.
  • It lets us cluster products into slow-, medium-, and fast-moving products. This clustering will help us with decision-making and modeling.

For this, you need daily inventory data for each product, along with the sales data, and then you can calculate the inventory turnover ratio (ITO) as follows:

ITO = total sales / average inventory value

Hint: These aggregations are done over a time range. For example, if we are forecasting monthly sales, then ITO is calculated as total sales in the last month over the average inventory value during the same month.
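A minimal sketch of that monthly calculation with pandas, using made-up daily records. The column names are assumptions, and sales and inventory are both expressed in units so the ratio is unit-free.

```python
import pandas as pd

# Made-up daily records for two products over one month.
days = pd.date_range("2020-05-01", "2020-05-31", freq="D")
daily = pd.concat([
    pd.DataFrame({"date": days, "item": "A", "units_sold": 10, "inventory": 100}),
    pd.DataFrame({"date": days, "item": "B", "units_sold": 1, "inventory": 200}),
])

daily["month"] = daily["date"].dt.to_period("M")
monthly = daily.groupby(["item", "month"]).agg(
    total_sales=("units_sold", "sum"),
    avg_inventory=("inventory", "mean"),
).reset_index()

# Inventory turnover: total sales in the month over average inventory in that month.
monthly["ito"] = monthly["total_sales"] / monthly["avg_inventory"]

print(monthly)
```

Here item A turns over its average stock about three times in the month while item B barely moves, which is the contrast the slow-, medium-, and fast-moving clustering is meant to capture.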

Sales Forecasting Can Turn Data Into Opportunity

In summary, sales forecasting can help firms increase revenues and turn a profit, provided they have the right data pipelines and use the correct feature engineering methods. This article was an attempt to show that all kinds of data can be useful in solving this problem.

Every company should investigate whether AI is needed for its forecasting problems. If it is, the company will need experienced AI and machine learning engineers to build a sales forecasting system of its own.

If you are a company or retailer looking to apply this sales forecasting technique, start by gathering all the data you can, especially daily sales, daily inventory, and daily transactions.

Once you have this data, you can use it to increase your revenues and optimize stock replenishment strategies, allowing your business to make the highest profit possible with available resources, as demonstrated by the examples above and by the sales forecasting practices of leading retailers.

Understanding the basics

How do you calculate a sales forecast (on a product level)?

Gather the sales, stock, and price data, store them in a database, preprocess them, and perform feature engineering to create explainable features. Then apply a forecasting method like XGBoost or an RNN.

What are the four steps to preparing a sales forecast?

The sales forecasting process is divided into four steps: data gathering, data preprocessing, feature engineering, and data modeling.

What is the best forecasting method for sales?

ARIMA and ETS work well for total sales, but on the product level, something like XGBoost or an RNN performs better.

Why is demand/sales forecasting important?

It addresses the two main problems of demand and sales: excessive stock and out-of-stock products. This leads to higher revenue and better cash flow.

What is the difference between sales potential and sales forecast?

Sales potential answers the question, “How many units of a certain brand could be sold?” On the other hand, a sales forecast answers the question, “How many units will be sold?”