Pareto Analysis Tool
Last version update is AutoPareto21 – 13/05/2024
File Name: AutoPareto21.py
Number of lines: 1,069
Introduction
Pareto analysis, often referred to as 80/20 rule, is a foundational technique in warehousing and inventory management that helps identify which stock keeping units (SKUs) contribute most significantly to overall sales or activity. Typically, a small percentage of SKUS (e.g. 20%) account for the majority (e.g. 80%) of sales or operational throughput. By categorizing SKUs based on their contribution, warehouse planners can optimize storage layouts, picking strategies, and replenishment priorities. This targeted approach enhances operational efficiency, reduces handling times, and ensures that high-impact items receive appropriate focus in design and resourcing decisions.
The Korouei Pareto Analysis Tool is a Python-based solution developed to support data-driven inventory classification and warehouse optimization. It enables users to perform SKU-level analysis by integrating sales and dimensional data to identify high-impact items across customizable mover categories (AA, A, B, C, D). The tool offers built-in volumetric calculations, tote capacity estimation, filtering by custom parameters, and dynamic threshold adjustment. It also includes optional XYZ demand consistency analysis to complement the value-based categorization. With automated visualizations, comprehensive Excel exports, and user-guided inputs, the tool provides a powerful and adaptable framework for understanding SKU performance and informing operational decisions.
Running the code
Upon execution, the tool prompts the user to confirm whether external SKU information should be imported. This step is essential for enabling the volumetric analysis module, which estimates the number of units that can fit into a standard tote, as well as the total number of totes required per SKU category.
To perform this analysis, the imported SKU file must contain the following three parameters:
· Each Width
· Each Height
· Each Length
Without these three fields, the volumetric analysis cannot proceed.
Additionally, the tool merges two selected Excel files and consolidates all columns into a single Pandas DataFrame, ensuring the data is prepared for downstream analysis and reporting.
The output file shown above includes key calculated fields that support volumetric and sales analysis at the SKU level. The columns SKU Volume and SKU Sold Volume represent the individual and aggregated volumes sold, based on the input dimensions and sales quantity.
Additional calculated fields include:
· SKU/TOTE: Indicates how many units of each SKU can physically fit into a single tote, based on the item’s volume.
· TOTES/SKU: Reflects the total number of totes required to accommodate all units of that SKU sold within the selected period.
These insights are essential for downstream warehousing tasks such as tote planning, storage optimization, and throughput estimation.
Following data import, the next user prompt asks whether to adjust the Movers Category thresholds. These thresholds define how SKUs are grouped based on their contribution to total sales volume, using a tiered classification model: AA, A, B, C and D.
This categorization helps identify high-performing SKUs (e.g., top sellers) versus long-tail items, which is essential for optimizing picking strategies, storage allocation, and replenishment cycles.
By default, the tool applied the following percentage-based classifications (ranked by cumulative SKU contribution):
· AA – TOP 0% to 5% of SKU
· A – 5% to 15%
· B – 15% to 30%
· C – 30% to 50%
· D – 50% to 100%
While the tool allows full customization of these thresholds, it is recommended to use the standard configuration unless a specific business case requires adjustment.
After confirming the Movers Category settings, the tool provides an optional filtering function. This allows users to narrow down the dataset based on specific column values before running the analysis.
If the user selects ‘Y’ to apply a filter, the tool prompts for the name of the column to filter. Once a valid column name is entered, all unique values in that column are listed for selection.
The following filtering options are available.
· Int – Select values by their corresponding index number for quicker entry.
· Manual entry: Type in the exact value(s) directly.
· d – Deletes the last entered value in case of a mistake.
· n – Ends the filter selection process and moves on to the next input prompt.
This flexible filtering system allows for efficient targeting of relevant subsets of data, such as a specific data range, product category, or customer type; helping users focus their analysis on the most meaningful slices of information.
Once filtering is complete, the tool requests the tree core input parameters required to perform the Pareto analysis. These inputs define how SKU performance will be calculated and categorized:
· SKU: The unique identifier for each SKU sold during the selected period.
· Items Sold (Sales): The quantity of each SKU sold during the selected period.
· Date : The date field used for temporal segmentation of sales data.
These inputs form the foundation for the sales volume ranking, allowing the tool to assign each SKU to its corresponding movers category based on its contribution to total sales.
After the required inputs (SKU, Sales, and Date) are entered, the tool begins the initial Pareto analysis. Upon completion, the user is prompted to optionally enable Volume Analysis by responding with Y/N.
If selected, this module calculates:
· The total volume of SKUs sold
· How many units of each SKU fit into a tote
· The total number of totes required for each SKU, based on volume
Note: The tote dimension parameters are currently hardcoded within the Python script and cannot be adjusted via the user interface. Future iterations should include user-defined tote dimensions as part of the input prompts.
Additionally, the current version assumes predefined column names for dimensional fields (SKU Length, SKU Height, SKU Width). To improve usability and flexibility, future updates should allow users to dynamically select these column names at runtime.
After the volumetric analysis is complete, the tool prompts the user to specify whether any additional columns should be carried over to the final export Excel file. This feature is useful for retaining supplementary information; for example, including a SKU Description column to better interpret which items are driving performance in the final output.
Users can continue adding columns in sequence by selecting ‘Y’, and entering the desired column names one by one. The loop will continue until any key other than ‘Y’ is pressed, at which point the process will exit and the export will proceed.
This optional step ensures the final dataset is enriched with user-defined context, improving the flexibility and value of downstream analysis.
Export Files
This chart visualizes the Pareto distribution of SKU sales across the defined move categories (AA, A, B, C, D), along with the cumulative sales contribution for each segment.
For example:
· The AA category, representing the top 5% of SKUs by volume, accounts for 34% of the total sales during the 4-day analysis period.
· The A category (next 10% of SKU’s from 5% to 15%) contributes an additional 24.6% of total sales.
The graph includes:
· A cumulative percentage curve (blue), showing how quickly a small number of SKUs contribute to the majority of sales.
· Red dashed lines to indicate the cumulative percentage thresholds for each category.
· Clear vertical markers identifying the SKU distribution breakpoints.
This visualization supports strategic decision-making around SKU prioritization, storage optimization, and process design; reinforcing the insight that a relatively small subset of SKUs drives the bulk warehouse activity.
The second chart represents a standard 80/20 Pareto analysis, which is commonly used to illustrate that approximately 80% of outcomes (e.g. sales volume) are driven by roughly 20% of causes (e.g. SKUs). While this principle serves as a useful generelization, actual warehouse data often varies.
In the scenario shown, 80% of total sales are generated by approximately 33.4% of the SKUs, indicating a broader spread of demand than the classic 80/20 assumption. This insight helps organizations evaluate SKU performance concentration and informs decisions around stock management, slotting, and automation strategy.
The chart features:
· A cumulative sales curve (blue) plotting the contribution of each SKU.
· A vertical red line marking the SKU percentage responsible for 80% of sales.
· A horizontal red line at the 80% sales level to highlight the crossover point.
This visualization complements the categorized mover graph by providing a high-level snapshot of sales concentration dynamics.
XYZ Demand Consistency Analysis
The XYZ analysis classified SKUs based on the consistency of their demand over a defined time period; While Paret (ABC) analysis focuses on value contribution, XYZ adds a temporal dimension, revealing which SKUs are ordered frequently versus sporadically. This dual analysis provides deeper insight into inventory planning and operational strategy.
XYZ Analysis Breakdown:
X Items (Regular Demand):
Criteria: Ordered on more than 67% of the analyzed days. (³ 2/3 of the time)
Profile: These items have a stable and predictable demand. These items are typically high-priority for consistent stock availability and safety stock planning.
Y Items (Intermittent Demand):
Criteria: Ordered between 33% and 67% of days. (1/3 to 2/3)
Profile: These items have less regular demand (moderate consistency) and may be influenced by factors like seasonality or promotions. Inventory strategies for these items might involve keeping a buffer stock but not as much as X items. These items benefit from flexible stocking strategies.
Z Items (Irregular Demand):
Criteria: Ordered on fewer than 33% of days. (<1/3)
Profile: Highly erratic and unpredictable. These are often niche, obsolete, or slow-moving items, which may require just-in-time ordering or minimal stock levels.
Understanding the Thresholds
The 67% and 33% threshold are standard benchmarks in inventory analytics, allowing organizations to quickly differentiate demand patterns. While they may be adjusted for specific use cases, these values strike a balance between analytical clarity and operational relevance.
67% Threshold: This means that an item categorized as X is ordered at least two-thirds of the time. It reflects high and consistent demand.
33% Threshold: This marks the boundary between Y and Z items. An item that falls below this threshold is considered irregular in its demand pattern, making it less predictable.
Operational Impact
Understanding these categories allows businesses to optimize their inventory levels:
X items demand high availability and may justify higher safety stock.
Y items require more nuanced planning and may be influenced by forecast accuracy.
Z items can introduce inefficiencies if overstocked; businesses may choose to fulfil these manually or consider phasing them out.
Chart Interpretation
The bar chart above visualizes the volume distribution across XYZ categories. Each group’s SKU count, sales total, and volume are broken down, helping planners identify where operational focus should lie.
In essence, the XYZ analysis, combined with Pareto (ABC) analysis, helps create a nuanced view of inventory management, balancing both the value and demand regularity of items.
The XYZ analysis results can also be displayed as percentage values, as shown in the chart above. This view provides a comparative breakdown of each ABC-XYZ combination by SKU count, order lines, and total items sold, giving a more digestible perspective on demand regularity and item value contribution.
Displaying data in percentage format helps quickly identify trends such as:
· The dominance of AX (high value, regularly ordered items), ideal for automation and prime storage locations.
· The presence of DZ or CX items, which may indicate inefficient stock or potential rationalization opportunities.
This combined visualization enhances decision making for inventory control, warehouse zoning, and SKU prioritization strategies.
Movers Category Distribution
The final set of visualizations provides a clear breakdown of SKU distribution and sales volume across the movers categories (AA to D).
The pie charts on the left display:
· % of SKUs per category: showing how SKUs are distributed across the mover segments.
· % of total sales per category: highlighting the contribution of each segment to overall item throughput.
The bar charts on the right show:
· Number of SKUs per category (left bar chart), emphasizing the volume of SKUs that fall into each classification.
· Total items sold per category (right bar chart), offering insight into demand concentration across categories.
These visualizations reinforce a key Pareto principle: a small number of high-performing SKUs (e.g. AA and A) account for the majority of sales, while a large number of low-impact SKUs (e.g., D) contribute relatively little to total volume.
This insight supports strategic decisions in slotting, automation, storage allocation, and SKU rationalization.
Export Excel Data
The first tab pareto analysis provides a full review of all the data and graphs that were used in the output.
This second tab pareto details is used for further filtering and offline use in excel. Any additional excel analysis can be done via this tab.
Finally Raw filtered data is the initial filtered data that was used to carry out the pareto analysis.