Manufacturing Operations Dataset Documentation
Overview
This package contains anonymised manufacturing operations datasets extracted from an MES (Manufacturing Execution System). These files support the analysis of cycle times, routing/operation durations, stoppage/downtime, and order due-date performance.
Files Summary
| File | Rows | Columns | Approx Size (MB) |
| :---- | :---- | :---- | :---- |
| df_unit_level_cycle_downtime_anonymised.csv | 165,384 | 6 | 6.34 |
| order_analysis_anonymised.csv | 53,164 | 9 | 3.83 |
| product_cycle_anonymised.csv | 952,730 | 17 | 156.75 |
| stoppage_all_anonymised.csv | 10,891 | 14 | 1.93 |
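Before analysis, it can help to check that a fresh extract matches the counts above. This is a minimal sketch. It assumes all four CSVs sit in one directory and are read with default `pd.read_csv` settings. `EXPECTED_SHAPES`, `load_all`, and `shape_mismatches` are hypothetical helper names.

```python
from pathlib import Path

import pandas as pd

# Expected (rows, columns) per file, copied from the Files Summary table.
EXPECTED_SHAPES = {
    "df_unit_level_cycle_downtime_anonymised.csv": (165_384, 6),
    "order_analysis_anonymised.csv": (53_164, 9),
    "product_cycle_anonymised.csv": (952_730, 17),
    "stoppage_all_anonymised.csv": (10_891, 14),
}


def load_all(directory="."):
    """Read every file in EXPECTED_SHAPES from one directory (assumed layout)."""
    return {name: pd.read_csv(Path(directory) / name) for name in EXPECTED_SHAPES}


def shape_mismatches(frames):
    """Return {filename: (expected, actual)} for frames whose shape differs."""
    problems = {}
    for name, df in frames.items():
        expected = EXPECTED_SHAPES.get(name)
        if expected is not None and df.shape != expected:
            problems[name] = (expected, df.shape)
    return problems


# Usage (requires the CSV files in the working directory):
# frames = load_all(".")
# print(shape_mismatches(frames))  # {} when every file matches the table
```

A mismatch does not necessarily mean a file is damaged. It may simply indicate a different extract, which is worth noting in the provenance record.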
Anonymisation & Standards
- Anonymisation: Company names are replaced with Company_XX. Identifiers such as orderId, unitId, stockId, operationId, eventId, and the stoppage reason id are anonymised tokens. No personally identifiable information (PII) is present.
- Timezones: Timestamp fields include a UTC offset (e.g., +00:00). When loading with pandas, parse them with pd.to_datetime(col, utc=True, errors='coerce'). This converts every value to UTC and turns unparseable entries into NaT instead of raising an error.
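- Negative Values: Some actual-duration fields (notably routingActualDurationMinutes and operationActualDurationMinutes in product_cycle_anonymised.csv) contain negative values due to system clock issues. Filter or investigate these during the cleaning phase.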
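The loading and cleaning advice above can be sketched as follows. This is a minimal sketch. `load_with_timestamps` and `flag_negative_durations` are hypothetical helper names, and the inline CSV contains made-up values standing in for a real file. Whether to drop negative rows or only flag them for investigation is an analysis choice; here they are dropped.

```python
import io

import pandas as pd


def load_with_timestamps(source, timestamp_cols):
    """Read a CSV and parse timestamp columns as UTC-aware datetimes.

    errors="coerce" turns unparseable values into NaT instead of raising.
    """
    df = pd.read_csv(source)
    for col in timestamp_cols:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], utc=True, errors="coerce")
    return df


def flag_negative_durations(df, duration_cols):
    """True for rows where any of the listed duration columns is negative."""
    present = [c for c in duration_cols if c in df.columns]
    return (df[present] < 0).any(axis=1)


# Made-up rows standing in for stoppage_all_anonymised.csv.
sample = io.StringIO(
    "eventId,eventStartDate,eventEndDate,eventDurationMinutes\n"
    "E1,2024-01-01T08:00:00+00:00,2024-01-01T08:30:00+00:00,30\n"
    "E2,2024-01-01T09:00:00+00:00,not-a-date,-5\n"
)
events = load_with_timestamps(sample, ["eventStartDate", "eventEndDate"])
negative = flag_negative_durations(events, ["eventDurationMinutes"])
clean = events[~negative]
```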
Data Dictionary
1. product_cycle_anonymised.csv
Used for unit-level and operation-level benchmarking, bottleneck identification, and variability analysis across routing steps.
| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| stockId | object | No | Anonymised SKU identifier. | |
| avgEstimatedDurationMinutes | float64 | No | Average estimated duration for this stock/route context. | Range: 1 to ~6,660 |
| avgActualDurationMinutes | float64 | No | Average actual duration for this stock/route context. | |
| unitId | object | No | Anonymised unit/item identifier. | |
| unitEstimatedDurationMinutes | float64 | No | Estimated duration for the unit. | |
| unitActualDurationMinutes | float64 | No | Actual duration for the unit. | |
| routingIndex | int64 | No | Sequence index of the step within the route. | Range: 1 to 19 |
| routingEstimatedDurationMinutes | float64 | No | Estimated duration for the routing step. | |
| routingActualDurationMinutes | float64 | No | Actual duration for the routing step. | Contains 139 negative values. |
| operationId | object | No | Anonymised operation identifier. | |
| operationEstimatedDurationMinutes | float64 | No | Estimated duration for the operation. | |
| operationActualDurationMinutes | float64 | No | Actual duration for the operation. | Contains 165 negative values. |
| itemRoutingOperationId | object | No | Unique unit-operation instance ID. | Primary key; joins to stoppage_all_anonymised.csv. |
| Date | object | Yes | Production or extract date. | ISO timestamp; 3.35% missing. |
| Company | object | No | Anonymised company identifier. | |
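For the benchmarking and bottleneck use cases above, a per-operation summary is a common starting point. This is a minimal sketch that assumes one row per unit-operation instance. The metric choices (median, coefficient of variation, actual/estimated overrun ratio) and the helper name `operation_benchmark` are illustrative and are not defined by the dataset.

```python
import pandas as pd


def operation_benchmark(cycle):
    """Summarise actual vs estimated operation durations per operationId."""
    # Drop negative actuals (documented as data quality errors); NaN is dropped too.
    valid = cycle[cycle["operationActualDurationMinutes"] >= 0]
    summary = valid.groupby("operationId").agg(
        runs=("operationActualDurationMinutes", "count"),
        median_actual=("operationActualDurationMinutes", "median"),
        mean_actual=("operationActualDurationMinutes", "mean"),
        std_actual=("operationActualDurationMinutes", "std"),
        mean_estimated=("operationEstimatedDurationMinutes", "mean"),
    )
    # Coefficient of variation: spread relative to the mean (variability).
    summary["cv_actual"] = summary["std_actual"] / summary["mean_actual"]
    # Overrun ratio > 1 means the operation typically takes longer than planned.
    summary["overrun_ratio"] = summary["mean_actual"] / summary["mean_estimated"]
    # Longest typical operations first: candidate bottlenecks.
    return summary.sort_values("median_actual", ascending=False)


# Made-up rows; the -3.0 value mimics a clock-error record.
cycle = pd.DataFrame({
    "operationId": ["OP_A", "OP_A", "OP_B", "OP_B", "OP_B"],
    "operationEstimatedDurationMinutes": [10.0, 10.0, 5.0, 5.0, 5.0],
    "operationActualDurationMinutes": [12.0, 14.0, 4.0, 6.0, -3.0],
})
bench = operation_benchmark(cycle)
```

The same pattern works at routing-step level by grouping on routingIndex and using the routing duration columns.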
2. df_unit_level_cycle_downtime_anonymised.csv
Used for unit-level rollups to compare flow time against downtime contribution.
| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| Company | object | No | Anonymised company identifier. | |
| unitId | object | No | Anonymised unit/item identifier. | |
| unitActualMin | float64 | No | Actual elapsed time for the unit (minutes). | |
| totalStopMinutes | float64 | Yes | Total stoppage/downtime minutes for the unit. | 96% missing (treat as 0). |
| numEvents | float64 | Yes | Number of stoppage events linked to the unit. | 96% missing (treat as 0). |
| numUnits | int64 | No | Count of units represented (usually 1). | |
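The unit-level rollup can be sketched as follows, applying the "treat as 0" rule from the Notes column. `add_downtime_share` and the `downtimeShare` column are hypothetical names, and the sample rows are made up. The share could exceed 1 if stoppage minutes overlap or fall outside the unit's elapsed window, so such values are worth checking rather than assuming away.

```python
import pandas as pd


def add_downtime_share(units):
    """Fill missing downtime fields with 0 and add a per-unit downtime share."""
    out = units.copy()
    # Per the data dictionary, nulls mean no linked stoppage events.
    out["totalStopMinutes"] = out["totalStopMinutes"].fillna(0.0)
    out["numEvents"] = out["numEvents"].fillna(0).astype(int)
    # Fraction of the unit's elapsed time spent stopped; left as NaN when
    # unitActualMin is not positive, because the ratio is undefined there.
    ratio = out["totalStopMinutes"] / out["unitActualMin"]
    out["downtimeShare"] = ratio.where(out["unitActualMin"] > 0)
    return out


units = pd.DataFrame({
    "Company": ["Company_01", "Company_01", "Company_02"],
    "unitId": ["U1", "U2", "U3"],
    "unitActualMin": [120.0, 60.0, 0.0],
    "totalStopMinutes": [30.0, None, None],
    "numEvents": [2.0, None, None],
    "numUnits": [1, 1, 1],
})
result = add_downtime_share(units)
# Company-level totals for comparing flow time with downtime.
by_company = result.groupby("Company")[["unitActualMin", "totalStopMinutes"]].sum()
```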
3. order_analysis_anonymised.csv
Used for analysing on-time delivery performance and lateness patterns for completed vs open orders.
| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| orderId | object | No | Anonymised sales/work order identifier. | |
| requiredByDate | object | Yes | Required-by (due) date/time for the order. | ISO timestamp. |
| completedDate | object | Yes | Completion date/time. | Null for open orders (81%). |
| lateByDays | float64 | Yes | Order lateness in days (completedDate - requiredByDate). | |
| lateByDays_raw | float64 | Yes | Raw lateness in days, including negative values (orders completed before requiredByDate). | |
| open_days_late | float64 | Yes | Lateness relative to extract time for open orders. | |
| Is_Perfect | float64 | Yes | Perfect-order flag (1=on time, 0=late). | |
| combinedLateDays | float64 | Yes | Single metric combining open and closed lateness. | |
| Company | object | No | Anonymised company identifier. | |
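The completed-vs-open split described above can be sketched as follows. This minimal sketch assumes that `completedDate` is null exactly for open orders. It also assumes that a positive `open_days_late` means the order is already past its required-by date. `order_performance` is a hypothetical helper, and the sample values are made up.

```python
import pandas as pd


def order_performance(orders):
    """Summarise on-time delivery for completed orders and lateness for open ones."""
    is_completed = orders["completedDate"].notna()
    completed = orders[is_completed]
    open_orders = orders[~is_completed]
    return {
        "completed_orders": int(is_completed.sum()),
        "open_orders": int((~is_completed).sum()),
        # Mean of the 1/0 Is_Perfect flag = share of completed orders on time.
        "on_time_rate": completed["Is_Perfect"].mean(),
        # Assumes open_days_late > 0 means the order is already past due.
        "open_late_share": (open_orders["open_days_late"] > 0).mean(),
        "open_median_days_late": open_orders["open_days_late"].median(),
    }


orders = pd.DataFrame({
    "orderId": ["O1", "O2", "O3", "O4"],
    "completedDate": ["2024-02-01T00:00:00+00:00", "2024-02-10T00:00:00+00:00", None, None],
    "Is_Perfect": [1.0, 0.0, None, None],
    "open_days_late": [None, None, 5.0, -2.0],
})
stats = order_performance(orders)
```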
4. stoppage_all_anonymised.csv
Used for identifying top downtime reasons by operation or company.
| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| id | object | No | Anonymised stoppage reason identifier. | |
| description | object | No | Stoppage reason description (e.g. Machine Issue). | |
| operationId | object | No | Anonymised operation identifier. | |
| operationName | object | No | Human-readable operation name. | |
| eventId | object | No | Anonymised stoppage event identifier. | |
| eventStartDate | object | No | Stoppage event start timestamp. | |
| eventEndDate | object | No | Stoppage event end timestamp. | |
| eventTimeTrackedInMinutes | float64 | No | Tracked time as recorded by the source system. | |
| eventDurationMinutes | float64 | No | Duration computed as eventEndDate - eventStartDate. | |
| itemRoutingOperationId | object | No | Unique unit-operation instance ID. | Link to product_cycle. |
| itemUnit | object | No | Anonymised unit/item identifier. | Link to unitId. |
| Company | object | No | Anonymised company identifier. | |
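To rank downtime reasons by company or operation, a grouped sum over `eventDurationMinutes` is a reasonable first pass. This is a minimal sketch, and `top_stoppage_reasons` is a hypothetical helper. Negative durations are excluded on the assumption that they are clock errors, as with the cycle data. `eventTimeTrackedInMinutes` could be substituted if the source system's tracked time is preferred.

```python
import pandas as pd


def top_stoppage_reasons(stoppages, by=("Company",), n=5):
    """Rank stoppage reasons by total minutes and keep the top n per group."""
    group_cols = list(by)
    # Exclude negative durations, treated here as clock errors (also drops NaN).
    valid = stoppages[stoppages["eventDurationMinutes"] >= 0]
    totals = valid.groupby(group_cols + ["description"], as_index=False).agg(
        events=("eventId", "nunique"),
        minutes=("eventDurationMinutes", "sum"),
    )
    totals = totals.sort_values("minutes", ascending=False)
    # head(n) per group keeps the n largest because rows are already sorted.
    return totals.groupby(group_cols).head(n).reset_index(drop=True)


# Illustrative rows only; real descriptions come from the source system.
stops = pd.DataFrame({
    "Company": ["Company_01", "Company_01", "Company_01", "Company_02"],
    "description": ["Machine Issue", "Machine Issue", "Other", "Machine Issue"],
    "eventId": ["E1", "E2", "E3", "E4"],
    "eventDurationMinutes": [30.0, 15.0, 20.0, 10.0],
})
top = top_stoppage_reasons(stops, by=["Company"], n=1)
```

Pass `by=["operationName"]` (or `["Company", "operationName"]`) for the per-operation view.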
Suggested Join Keys
- Unit Tracking: unitId ↔ itemUnit
- Operation Specifics: itemRoutingOperationId (the best key for connecting duration records to specific stoppage events).
- Organisation: Company
- Process Step: operationId
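Because one operation instance can have several stoppage events, joining raw stoppage rows onto product_cycle_anonymised.csv would duplicate cycle rows. This minimal sketch aggregates stoppages per `itemRoutingOperationId` first and then left-joins. `attach_stoppages`, `stopEvents`, and `stopMinutes` are hypothetical names, and the sample rows are made up.

```python
import pandas as pd


def attach_stoppages(cycle, stoppages):
    """Add per-operation-instance stoppage totals to product_cycle rows."""
    # One row per itemRoutingOperationId, so the join cannot fan out.
    per_op = stoppages.groupby("itemRoutingOperationId", as_index=False).agg(
        stopEvents=("eventId", "nunique"),
        stopMinutes=("eventDurationMinutes", "sum"),
    )
    # validate= raises a MergeError if the right-hand keys are not unique.
    merged = cycle.merge(per_op, on="itemRoutingOperationId", how="left",
                         validate="many_to_one")
    # Operation instances with no linked stoppage get zero downtime.
    cols = ["stopEvents", "stopMinutes"]
    merged[cols] = merged[cols].fillna(0)
    return merged


cycle = pd.DataFrame({
    "itemRoutingOperationId": ["IRO1", "IRO2", "IRO3"],
    "operationActualDurationMinutes": [50.0, 20.0, 35.0],
})
stops = pd.DataFrame({
    "itemRoutingOperationId": ["IRO1", "IRO1", "IRO3"],
    "eventId": ["E1", "E2", "E3"],
    "eventDurationMinutes": [5.0, 10.0, 7.5],
})
joined = attach_stoppages(cycle, stops)
```

For unit-level joins, the same pattern applies after grouping stoppages by `itemUnit` and merging with `left_on="unitId", right_on="itemUnit"`.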
Data Quality Notes
- Missing Data: About 96% of records in df_unit_level_cycle_downtime_anonymised.csv have null totalStopMinutes and numEvents. A null typically indicates that no stoppage events occurred for that unit, so treat it as 0.
- Open Orders: A high proportion (81%) of orders were open at extract time. Assess these with open_days_late rather than completedDate, which is null for open orders.
- Negatives: Negative durations in product_cycle_anonymised.csv (specifically in actual duration columns) should be treated as data quality errors.
Provenance: Please record the extract date, source system version, and any applied filters (date range, plant, or operation types) externally to ensure benchmarking is reproducible.
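One lightweight way to record this is a JSON sidecar file saved next to the extract. This is a minimal sketch. The field names, `write_provenance`, and the placeholder values are all hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_provenance(path, extract_date, source_system_version, filters):
    """Write a JSON sidecar recording how an extract was produced."""
    record = {
        "extract_date": extract_date,
        "source_system_version": source_system_version,
        "filters": filters,
        # When this sidecar was written, in UTC.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
    return record


# Placeholder values; replace them with the real extract details.
out_path = os.path.join(tempfile.mkdtemp(), "provenance.json")
write_provenance(
    out_path,
    extract_date="YYYY-MM-DD",
    source_system_version="<MES version>",
    filters={"date_range": None, "plant": None, "operation_types": None},
)
```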