Our client is a popular retailer, with over 3000 physical stores in the United States as well an online store. They wanted to increase revenue from each of their stores by improving merchandise effectiveness in their physical stores . They wanted to determine what products should be introduced, stored, or discontinued at each store based on forecasted revenue and profits from each product.


We divided the challenge into two separate issues:
  • For existing products: We defined the catchment area for each store to apportion online sales to the physical store and identify the opportunity to sell those products offline.
  • For new products, we used offline and online sales data of similar existing products to project the sales potential, and therefore make stocking decisions.


For existing products:
  • To begin with, the model determines the catchment area for each store based on census data, population density, geographical factors, driving distance etc.
  • For each such defined catchment area, we estimate the online-offline sales ratio of each SKU to identify gaps in the merchandise mix.
  • For each SKU, the model factors in product attributes and buyers’ demographic profiles, to predict offline sales.
  • A stocking recommendation is finally based on incremental sales potential as compared to the current mix in the store.
  • The ML model continuously optimizes the merchandize mix based on the correlation between product level demand forecasts and actual sales at individual stores.
To forecast demand for new products, we needed to define its similarity to comparable existing products.
  • Similarity between products is best defined by the similarity based on attributes important to the target customer segment.
  • A genetic algorithm was designed to find the optimum weights to be assigned to each product attributes based on historical data.
  • The optimal weights were used to identify existing products similar to the new target product.
    Using historical sales data of these similar products a sales forecasting mode was developed for the new product.
  • Based on this sales forecast, a stocking recommendation for each stores was finally based on the consumer demographics for that store’s catchment area.


Python, SQL server, Machine learning – Random Forests, Spark, Genetic Algorithm, Neural Network


Testing of this model for a small geography showed great potential to optimize the merchandise mix in the store hence, leading to significant increase in sales.