Ahmed Khaled
Ahmed is a senior data scientist who loves to dig into clients’ problems and solve them using state-of-the-art data-driven solutions.
Retailers often face supply and demand issues that cause them to miss out on potential sales or tie up a lot of money in overstocked products.
In this article, Toptal Data Scientist Ahmed Khaled explains how retailers can boost revenues and cut costs with sales forecasts backed by artificial intelligence.
Forecasting is a technique that uses historical data and events to build estimates of future trends, potential disasters, and the overall behavior of any subject. It can provide probabilistic support for decision analysis and help estimate expenses, revenues, and budget plans.
Forecasting in business can be divided into two distinct categories: qualitative forecasting and quantitative forecasting.
For more information, you can take a look at Investopedia’s Financial Forecasting primer.
Both types of forecasting have shown a lot of promise and have delivered business improvements for many organizations.
If you would like to learn more about how forecasting can affect market decisions, a good place to start is Prediction Markets: Fundamentals, Designs, and Applications by Stefan Luckner et al.
One problem that we can address using quantitative forecasting is demand forecasting or sales forecasting.
Suppose you are a retailer operating many stores, and each store replenishes its stock through a static system driven by human decisions that respond to events such as seasons and market trends.
Occasionally, this approach leads to one of two major problems: out-of-stock products, which cost you potential sales, and overstocked products, which tie up a lot of money in unsold inventory.
According to an IHL Group survey of 600 households and retailers, retailers are losing nearly $1 trillion in sales annually because of out-of-stock problems.
“Shoppers encounter out-of-stocks in as often as one in three shopping trips, according to the report, which was emailed to Retail Dive. At food, drug and mass retailers, they encounter out-of-stock items in one in five trips, at department store and specialty retailers it’s one in four, and at electronics stores one in three,” IHL Group found.
Both of these problems reduce revenue: either we lose a potential sale, or we invest more money in unsold products, which means holding assets that won't generate revenue any time soon to cover their costs.
This is clearly detrimental to the entity’s cash flow, and to address this risk, we need two things:
So, the question is: What are the indications that you need to adopt AI in your company to help your forecasting process?
To make this decision, you need expert answers to the following questions:
The answers to these questions should give you a clear signal about whether to start employing AI in your forecasting strategies.
AI has outperformed human forecasting in many companies, enabling faster decision-making and planning as well as more reliable risk management strategies. This is why many top companies are adopting AI in their planning.
When dealing with a demand forecasting problem, time series forecasting methods can be used to predict the sales of each product, allowing companies to optimize stock replenishment and minimize the occurrence of the problems described above. However, many models struggle to forecast at the individual product or product category level because they lack the necessary features. So, the question is: How can we make it work and get the most out of our data?
For real-life retailers, these problems are anything but trivial. You may have 1,000+ products that introduce a lot of non-linearity and multivariate dependencies into the dataset, or you may need to know the projected stock replenishment far enough in advance to produce it, buy it, or otherwise acquire it by the time demand materializes.
In this case, classical models like ARIMA and ETS tend to perform poorly, and we need more robust methods such as recurrent neural networks (RNNs) or XGBoost. Models like XGBoost, in turn, need a lot of feature creation to tackle this problem.
For this to work, we need to:
For the sake of this article, we will take XGBoost as an example of such a model.
The set of features needed for this problem is classified into four main groups:
Unlike deep learning models such as recurrent neural networks, classical machine learning models like XGBoost cannot capture long-term or short-term dependencies within a time series unless we add a manual feature extraction layer for the datetime feature.
Many features can be extracted from the date, such as:
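As a minimal sketch (not the author's code), pandas' `.dt` accessor can extract several such calendar features. It assumes a datetime-like column named `Order Date`, as in the Sample Superstore code later in this article; `extract_date_features` is a hypothetical helper name.

```python
import pandas as pd

def extract_date_features(df: pd.DataFrame, col: str = "Order Date") -> pd.DataFrame:
    """Return a copy of df with basic calendar features derived from `col`."""
    out = df.copy()
    dates = pd.to_datetime(out[col])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month          # 1..12
    out["day"] = dates.dt.day              # 1..31
    out["dayofweek"] = dates.dt.dayofweek  # Monday=0 .. Sunday=6
    out["hour"] = dates.dt.hour            # 0..23
    return out

sample = pd.DataFrame({"Order Date": ["2020-05-04 13:00", "2020-12-31 23:30"]})
features = extract_date_features(sample)
print(features)
```

Each of these columns is periodic, which is exactly the issue discussed next.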
Many approaches simply extract these time features, feed them to a model as inputs, and train it, but further engineering is possible. Features such as day, hour, and dayofweek are periodic: their values repeat within a fixed range. How can a model deal with this?
The short answer is that it can't. What the model sees is that hour 00:00 is 23 hours away from 23:00, when in fact it is only one hour away. One way to solve this is to apply a cyclic transformation to these features.
Using sine and cosine (a vector representation), we can map each of the 24 hours to an angle on a circle. Using the sine and cosine of that angle as features makes it much easier for the model to detect the real distances between hours, regardless of periodicity.
This will remove the discontinuity happening in the periodic time features, or any periodic feature.
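A quick numerical check of this idea, as a sketch with hypothetical helper names: after mapping hours onto the unit circle, 23:00 and 00:00 end up exactly as far apart as 00:00 and 01:00 (about 0.261, the chord length 2·sin(π/24)), while hours 12 apart land on opposite sides of the circle at distance 2.

```python
import numpy as np

def cyclic_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    theta = 2 * np.pi * np.asarray(value) / period
    return np.sin(theta), np.cos(theta)

def circle_distance(a, b, period):
    """Euclidean distance between two values after cyclic encoding."""
    sa, ca = cyclic_encode(a, period)
    sb, cb = cyclic_encode(b, period)
    return float(np.hypot(sa - sb, ca - cb))

raw_gap = abs(23 - 0)                  # raw encoding: 23:00 looks 23 hours from 00:00
gap_23_0 = circle_distance(23, 0, 24)  # cyclic: 23:00 vs 00:00
gap_0_1 = circle_distance(0, 1, 24)    # cyclic: 00:00 vs 01:00
gap_0_12 = circle_distance(0, 12, 24)  # cyclic: opposite sides of the clock
print(raw_gap, round(gap_23_0, 4), round(gap_0_1, 4), round(gap_0_12, 4))
```

The same encoding works for any periodic feature by changing `period` (12 for months, 7 for days of the week).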
For this article, we will use the publicly available Sample Superstore dataset and try to predict monthly sales for a certain product category.
Also, we will use a Python 3.7 environment with the following libraries:
Now, I will show you how to build the periodic feature conversion function and test whether it helps.
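import numpy as np

def convert_periodic(val, period):
    # Map a periodic value to an angle on the unit circle
    theta = 2 * np.pi * val / period
    sin_period = np.sin(theta)
    cos_period = np.cos(theta)
    return sin_period, cos_period

def convert_month(x):
    # Months repeat with a period of 12
    return convert_periodic(x, 12)

# Assumes df has a numeric 'month' column (1-12)
df['sin_month'], df['cos_month'] = zip(*df['month'].map(convert_month))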
With this in place, we are ready to test whether the added feature will improve performance or not.
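from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
X = df.drop(['Order Date', 'Sales', 'sin_month', 'cos_month'], axis=1)
y = np.log1p(df['Sales'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
As we can see, we applied a log1p transform, log(1 + x), to the target Sales feature because it is skewed (not normally distributed). Setting shuffle=False keeps the split chronological, assuming the rows are sorted by date, so the model is tested on the most recent 20% of records.
Now, we will fit an XGBoost regressor on the data.
model = XGBRegressor()  # default hyperparameters; the original settings are not shown
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f'Loss without cyclic conversion on testing set is {np.sqrt(mean_squared_error(y_test, y_pred))}')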
Loss without cyclic conversion on testing set is 0.4313676193485837
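Because the target was transformed with `np.log1p`, the RMSE values in this section are measured on the log1p scale, not in sales units. The sketch below shows the round trip with NumPy; the sales and prediction values are made up for illustration. `np.expm1` inverts `np.log1p`, so model predictions should pass through it before being reported as sales. As a rough reading, an error of d on this scale means that 1 + predicted sales is off by a factor of about e^d.

```python
import numpy as np

sales = np.array([0.0, 9.0, 99.0, 9999.0])  # illustrative, right-skewed values
log_sales = np.log1p(sales)                 # log(1 + x): defined at 0, compresses large values
restored = np.expm1(log_sales)              # exact inverse: back to sales units

# Hypothetical model outputs in log1p space, converted back to sales units
y_pred_log = np.array([2.0, 4.5])
y_pred_sales = np.expm1(y_pred_log)
print(log_sales.round(3), restored, y_pred_sales.round(2))
```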
Next, we will train the model with the cyclic month features included.
X = df.drop(['Order Date','Sales'],axis = 1)
y = np.log1p(df['Sales'])
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, shuffle=False)
model.fit(X_train, y_train)  # retrain on the feature set that includes sin_month and cos_month
y_pred = model.predict(X_test)
print(f'Loss with cyclic conversion on testing set is {np.sqrt(mean_squared_error(y_test, y_pred))}')
Loss with cyclic conversion on testing set is 0.33868030449130826
As we can see, the loss improved from 0.43 RMSE to 0.34 RMSE.
Some other time-related features that you can think about, depending on your problem, are:
Past sales are the core input feature needed to predict future sales, so how do we get the most out of sales data? We can do this using the concepts of lag and autocorrelation.
Lag features are historical sales records for the products. For example, if we use 12 lag features of monthly sales as inputs to predict sales for May 2020, we provide the model with the records from May 2019 through April 2020. This can be really helpful.
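The lag window in this example can be checked with pandas monthly periods. This is a minimal sketch; `lag_months` is a hypothetical helper that lists which months feed lag_1 through lag_n for a given target month.

```python
import pandas as pd

def lag_months(target: str, n_lags: int) -> list:
    """Monthly periods supplied as lag_1 .. lag_n when predicting `target`."""
    target_period = pd.Period(target, freq="M")
    return [target_period - k for k in range(1, n_lags + 1)]

window = lag_months("2020-05", 12)
print(window[0], window[-1])  # most recent and oldest lagged months
```

For May 2020 with 12 lags, lag_1 is April 2020 and lag_12 is May 2019.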
Lag features can also be interpreted using autocorrelation plots, which show how strongly the target feature correlates with each of its lagged values. This also helps us select only the correlated features among the lagged ones, reducing memory usage and feature redundancy.
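A minimal sketch of autocorrelation-based lag selection, using `pandas.Series.autocorr`. The synthetic series (a pure 12-month sine cycle) and the 0.8 threshold are illustrative assumptions, and `select_lags` is a hypothetical helper. It keeps lags by the magnitude of the correlation, since a strong negative correlation is also informative for a tree model.

```python
import numpy as np
import pandas as pd

def select_lags(series: pd.Series, max_lag: int, threshold: float = 0.3) -> list:
    """Keep lags whose absolute autocorrelation is at least `threshold`."""
    selected = []
    for lag in range(1, max_lag + 1):
        r = series.autocorr(lag=lag)  # Pearson correlation of series with series.shift(lag)
        if not np.isnan(r) and abs(r) >= threshold:
            selected.append(lag)
    return selected

# Synthetic monthly sales with a 12-month seasonal cycle (illustrative only)
months = np.arange(48)
sales = pd.Series(100 + 20 * np.sin(2 * np.pi * months / 12))
selected = select_lags(sales, max_lag=12, threshold=0.8)
print(selected)
```

On real data, an autocorrelation plot of the sales series serves the same purpose visually.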
This is how we can add lag features into our dataframe:
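# Add the previous three months' sales as lag features
for i in range(3):
    df[f'lag_{i+1}'] = df['Sales'].shift(i+1)
# The first rows lack a full history for every lag, so drop them
df = df.dropna()
df.head()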
Here, I chose to include three lag features in the training set. The number of lags is a hyperparameter: you can choose it based on the autocorrelation plot, or try several values and keep the best one during the tuning stage.
X = df.drop(['Order Date','Sales'],axis = 1)
y = np.log1p(df['Sales'])
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, shuffle=False)
model.fit(X_train, y_train)  # retrain with the lag features added
y_pred = model.predict(X_test)
print(f'Loss with lag features on testing set is {np.sqrt(mean_squared_error(y_test, y_pred))}')
Loss with lag and aggregated sales features on testing set is 0.2862175857169188
Now, using both lag features and cyclic conversion, RMSE has improved to 0.29.
Some additional sales-related features you may add:
Seniority is a concept introduced to assign a seniority level to new items in a store:
A simple argument is that price and promotions are among the direct causes of rising and falling sales. Price is also one of the best ways to differentiate between categories, subcategories, and super-categories of products.
For example, assuming that a category and a subcategory have been assigned to each product, one can create the following price features:
This aggregation can be performed several times using different groupings (assuming we aim to predict monthly demand), such as:
More features can also be added without the monthly grouping to study the overall behavior of prices.
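A minimal pandas sketch of such price aggregations, using `groupby(...).transform`. The column names `Category` and `Sub-Category` follow the Sample Superstore dataset, but the `price` (unit price) and `month` columns, the chosen statistics, and the `add_price_features` helper are illustrative assumptions rather than the article's exact feature list.

```python
import pandas as pd

GROUP_PREFIXES = {"Category": "cat", "Sub-Category": "subcat"}

def add_price_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add price aggregates per category/subcategory, with and without monthly grouping."""
    out = df.copy()
    for col, prefix in GROUP_PREFIXES.items():
        monthly = out.groupby([col, "month"])["price"]
        out[f"{prefix}_month_mean_price"] = monthly.transform("mean")
        out[f"{prefix}_month_min_price"] = monthly.transform("min")
        out[f"{prefix}_month_max_price"] = monthly.transform("max")
        # Relative price: > 1 means pricier than the group's average that month
        out[f"{prefix}_month_price_ratio"] = out["price"] / out[f"{prefix}_month_mean_price"]
        # Overall level, ignoring the month, to capture long-run price behavior
        out[f"{prefix}_overall_mean_price"] = out.groupby(col)["price"].transform("mean")
    return out

sample = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Furniture", "Technology"],
    "Sub-Category": ["Chairs", "Tables", "Chairs", "Phones"],
    "month": [1, 1, 2, 1],
    "price": [100.0, 300.0, 120.0, 500.0],
})
features = add_price_features(sample)
print(features.filter(like="month_mean_price"))
```

Using `transform` rather than `agg` keeps one row per original record, so the aggregates can be fed straight to the model as columns.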
This feature group is not common among retailers and sales forecasters, but it makes a big difference in sales forecasting models. Stock datasets mainly contain the daily inventory of each product in each store. We can combine this with sales data to get a monthly turnover ratio for each product. This ratio indicates how quickly a product's stock sells out, and it has two main benefits:
For this, you need daily inventory data for each product, along with the sales data, and then you can calculate the inventory turnover ratio as follows:
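Written out, following the monthly definition given in the hint, the ratio for product p in month m could look like this. The notation is ours: S_{p,d} is the sales of product p on day d, I_{p,d} is its inventory value on day d, and |m| is the number of days in month m.

$$
\mathrm{ITO}_{p,m} = \frac{\sum_{d \in m} S_{p,d}}{\frac{1}{|m|}\sum_{d \in m} I_{p,d}}
$$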
Hint: These aggregations are done based on a time range. For example, if we are working on forecasting monthly sales, then ITO will be calculated as total sales in the last month over average inventory value during the same month.
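A minimal pandas sketch of this monthly ITO calculation. The table layouts (`date`, `product`, `sales`, and `inventory_value` columns), the toy numbers, and the `monthly_inventory_turnover` helper are assumptions for illustration; real data would also carry a store identifier.

```python
import pandas as pd

def monthly_inventory_turnover(sales: pd.DataFrame, inventory: pd.DataFrame) -> pd.DataFrame:
    """ITO per product and month: total sales / average daily inventory value.

    sales:     columns ['date', 'product', 'sales']
    inventory: columns ['date', 'product', 'inventory_value'], one row per product per day
    """
    s = sales.assign(month=pd.to_datetime(sales["date"]).dt.to_period("M"))
    inv = inventory.assign(month=pd.to_datetime(inventory["date"]).dt.to_period("M"))
    total_sales = s.groupby(["product", "month"])["sales"].sum()
    avg_inventory = inv.groupby(["product", "month"])["inventory_value"].mean()
    # Index alignment pairs each product-month; missing pairs become NaN
    ito = (total_sales / avg_inventory).rename("ito")
    return ito.reset_index()

sales = pd.DataFrame({
    "date": ["2020-01-05", "2020-01-20", "2020-02-10"],
    "product": ["A", "A", "A"],
    "sales": [300.0, 300.0, 200.0],
})
inventory = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-02-01", "2020-02-02"],
    "product": ["A", "A", "A", "A"],
    "inventory_value": [1000.0, 1400.0, 800.0, 800.0],
})
result = monthly_inventory_turnover(sales, inventory)
print(result)
```

Here January gives 600 / 1200 = 0.5 and February gives 200 / 800 = 0.25. Shifting this column by one month (as with the sales lags) would keep the feature available at prediction time.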
In summary, sales forecasting can help firms increase revenues and turn a profit, provided they have the right data pipelines and use the correct feature engineering methods. This article aimed to show that all kinds of data can be useful in solving this problem.
Every company should investigate whether it needs AI for its forecasting problems. If it does, it will need expert AI engineers and advice from machine learning engineers to build a sales forecasting system of its own.
If you are a company or retailer willing to apply this sales forecasting technique, start by gathering all the data you can, especially daily sales, daily inventory, and daily transactions.
Once you have this data, you can use it to increase revenues and optimize stock replenishment strategies, allowing your business to make the highest profit possible with the available resources, as the examples above and the sales forecasting practices of leading retailers show.
Sales forecasting works by gathering sales, stock, and price data, building a database for them, preprocessing the data, and performing feature engineering to create explainable features, then applying a forecasting method like XGBoost or an RNN.
The sales forecasting process is divided into four steps: data gathering, data preprocessing, feature engineering, and data modeling.
ARIMA and ETS work well for forecasting total sales, but at the product level, models like XGBoost or RNNs perform better.
Sales forecasting addresses the two main demand and sales problems, excess stock and out-of-stock products. This leads to higher revenue and better cash flow.
Sales potential answers the question, “How many units of a certain brand could be sold?” On the other hand, a sales forecast answers the question, “How many units will be sold?”
Located in Cairo, Cairo Governorate, Egypt
Member since March 23, 2020