Manufacturing Operations Dataset

Manufacturing Operations Dataset Documentation

Overview

This package contains anonymised manufacturing operations datasets extracted from an MES (Manufacturing Execution System). These files support the analysis of cycle times, routing/operation durations, stoppage/downtime, and order due-date performance.

Files Summary

| File | Rows | Columns | Approx. Size (MB) |
| :---- | :---- | :---- | :---- |
| df_unit_level_cycle_downtime_anonymised.csv | 165,384 | 6 | 6.34 |
| order_analysis_anonymised.csv | 53,164 | 9 | 3.83 |
| product_cycle_anonymised.csv | 952,730 | 17 | 156.75 |
| stoppage_all_anonymised.csv | 10,891 | 14 | 1.93 |

Anonymisation & Standards

  • Anonymisation: Company names are replaced with Company_XX. Identifiers such as orderId, unitId, stockId, operationId, eventId, and the stoppage reason id are anonymised tokens. No personally identifiable information (PII) is present.
  • Timezones: Timestamp fields include a UTC offset (e.g., +00:00). When loading with pandas, parse them with pd.to_datetime(col, utc=True, errors='coerce'), which converts all values to UTC and turns unparseable values into NaT.
  • Negative Values: Some duration fields contain negative values due to system clock issues. These should be filtered or investigated during the cleaning phase.
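The sketch below covers both cleaning points above. It assumes pandas is available and that the affected columns are the two the data dictionary flags: routingActualDurationMinutes and operationActualDurationMinutes in product_cycle_anonymised.csv. The helper names are illustrative and do not come with the dataset. Rows with negative durations are returned separately instead of being dropped silently, so they can still be investigated.

```python
import pandas as pd

# Columns the data dictionary flags as containing negative values (assumption: the only ones).
NEGATIVE_PRONE_COLUMNS = ["routingActualDurationMinutes", "operationActualDurationMinutes"]


def parse_timestamps(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Parse offset-aware ISO timestamps to UTC; unparseable values become NaT."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col], utc=True, errors="coerce")
    return out


def split_negative_durations(df: pd.DataFrame, columns=NEGATIVE_PRONE_COLUMNS):
    """Return (clean, rejected): any row with a negative value in `columns` is rejected.

    Missing values (NaN) are not treated as negative and stay in `clean`.
    """
    present = [c for c in columns if c in df.columns]
    is_negative = (df[present] < 0).any(axis=1)
    return df.loc[~is_negative].copy(), df.loc[is_negative].copy()
```

For example, `clean, rejected = split_negative_durations(parse_timestamps(cycle, ["Date"]))` keeps the rejected rows available for a later data-quality review.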

Data Dictionary

1. product_cycle_anonymised.csv

Used for unit-level and operation-level benchmarking, bottleneck identification, and variability analysis across routing steps.

| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| stockId | object | No | Anonymised SKU identifier. | |
| avgEstimatedDurationMinutes | float64 | No | Average estimated duration for this stock/route context. | Range: 1 to 6.66e+03 |
| avgActualDurationMinutes | float64 | No | Average actual duration for this stock/route context. | |
| unitId | object | No | Anonymised unit/item identifier. | |
| unitEstimatedDurationMinutes | float64 | No | Estimated duration for the unit. | |
| unitActualDurationMinutes | float64 | No | Actual duration for the unit. | |
| routingIndex | int64 | No | Sequence index of the step within the route. | Range: 1 to 19 |
| routingEstimatedDurationMinutes | float64 | No | Estimated duration for the routing step. | |
| routingActualDurationMinutes | float64 | No | Actual duration for the routing step. | Contains 139 negative values. |
| operationId | object | No | Anonymised operation identifier. | |
| operationEstimatedDurationMinutes | float64 | No | Estimated duration for the operation. | |
| operationActualDurationMinutes | float64 | No | Actual duration for the operation. | Contains 165 negative values. |
| itemRoutingOperationId | object | No | Unique unit-operation instance ID. | Primary key for stoppage joins. |
| Date | object | Yes | Production or extract date. | ISO timestamp; 3.35% missing. |
| Company | object | No | Anonymised company identifier. | |
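A per-operation summary is one way to use this file for bottleneck and variability analysis. The sketch assumes each row is one unit-operation instance, since itemRoutingOperationId is documented as the primary key. It excludes negative actual durations and ranks operations by the ratio of median actual to median estimated duration. The function name and the metrics chosen (median ratio, coefficient of variation) are illustrative assumptions, not a prescribed method.

```python
import pandas as pd


def operation_benchmark(cycle: pd.DataFrame) -> pd.DataFrame:
    """Summarise actual vs estimated duration per operationId (hypothetical helper)."""
    # Negative actual durations are documented data-quality errors, so exclude them.
    valid = cycle[cycle["operationActualDurationMinutes"] >= 0]
    g = valid.groupby("operationId")
    actual = g["operationActualDurationMinutes"]
    summary = pd.DataFrame({
        "n": g.size(),
        "median_estimated": g["operationEstimatedDurationMinutes"].median(),
        "median_actual": actual.median(),
        # Coefficient of variation: spread of actual durations relative to their mean.
        "cv_actual": actual.std() / actual.mean(),
    })
    # Values > 1 mean the operation typically runs longer than estimated.
    # A zero median estimate would produce inf here.
    summary["actual_to_estimated"] = summary["median_actual"] / summary["median_estimated"]
    return summary.sort_values("actual_to_estimated", ascending=False)
```

Operations at the top of the result, especially those that also have a high `cv_actual`, are candidate bottlenecks. The same grouping can be repeated on routingIndex to compare variability across routing steps.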

2. df_unit_level_cycle_downtime_anonymised.csv

Used for unit-level rollups to compare flow time against downtime contribution.

| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| Company | object | No | Anonymised company identifier. | |
| unitId | object | No | Anonymised unit/item identifier. | |
| unitActualMin | float64 | No | Actual elapsed time for the unit (minutes). | |
| totalStopMinutes | float64 | Yes | Total stoppage/downtime minutes for the unit. | 96% missing (treat as 0). |
| numEvents | float64 | Yes | Number of stoppage events linked to the unit. | 96% missing (treat as 0). |
| numUnits | int64 | No | Count of units represented (usually 1). | |
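The sketch below implements the unit-level rollup and follows the note that missing downtime should be treated as 0. Function names are hypothetical. The company-level figure divides total stoppage minutes by total flow minutes (a ratio of sums). As a result, long-running units carry more weight than they would in a simple average of per-unit shares.

```python
import pandas as pd


def add_downtime_share(units: pd.DataFrame) -> pd.DataFrame:
    """Fill missing downtime with 0 and add a per-unit downtime share (hypothetical helper)."""
    out = units.copy()
    # Per the data dictionary, nulls here mean no stoppage events were recorded.
    out[["totalStopMinutes", "numEvents"]] = out[["totalStopMinutes", "numEvents"]].fillna(0)
    # Zero or negative flow times would give meaningless ratios, so leave them as NaN.
    flow = out["unitActualMin"].where(out["unitActualMin"] > 0)
    out["downtimeShare"] = out["totalStopMinutes"] / flow
    return out


def company_downtime_share(units: pd.DataFrame) -> pd.Series:
    """Company-level downtime share computed as a ratio of sums."""
    filled = units.assign(totalStopMinutes=units["totalStopMinutes"].fillna(0))
    totals = filled.groupby("Company")[["totalStopMinutes", "unitActualMin"]].sum()
    return totals["totalStopMinutes"] / totals["unitActualMin"]
```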

3. order_analysis_anonymised.csv

Used for analysing on-time delivery performance and lateness patterns for completed vs open orders.

| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| orderId | object | No | Anonymised sales/work order identifier. | |
| requiredByDate | object | Yes | Required-by (due) date/time for the order. | ISO timestamp. |
| completedDate | object | Yes | Completion date/time. | Null for open orders (81%). |
| lateByDays | float64 | Yes | Order lateness in days (completed - required). | |
| lateByDays_raw | float64 | Yes | Raw lateness (includes negative values). | |
| open_days_late | float64 | Yes | Lateness relative to extract time for open orders. | |
| Is_Perfect | float64 | Yes | Perfect-order flag (1 = on time, 0 = late). | |
| combinedLateDays | float64 | Yes | Single metric combining open and closed lateness. | |
| Company | object | No | Anonymised company identifier. | |
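Here is a minimal on-time summary that keeps completed and open orders separate. It rests on two assumptions drawn from the column descriptions: Is_Perfect is populated for completed orders, and a positive open_days_late means an open order is already past its required-by date. Check both against the data before relying on the result. The function name is hypothetical.

```python
import pandas as pd


def delivery_summary(orders: pd.DataFrame) -> dict:
    """On-time rate for completed orders and overdue count for open orders (sketch)."""
    completed = orders["completedDate"].notna()
    closed = orders[completed]
    open_orders = orders[~completed]
    return {
        "n_completed": int(completed.sum()),
        "n_open": int((~completed).sum()),
        # Mean of a 1/0 flag is the on-time fraction; NaN flags are skipped.
        "on_time_rate": float(closed["Is_Perfect"].mean()),
        # Assumption: positive open_days_late means past due at extract time.
        "open_overdue": int((open_orders["open_days_late"] > 0).sum()),
    }
```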

4. stoppage_all_anonymised.csv

Used for identifying top downtime reasons by operation or company.

| Column | Type | Nullable | Description | Notes |
| :---- | :---- | :---- | :---- | :---- |
| id | object | No | Anonymised stoppage reason identifier. | |
| description | object | No | Stoppage reason description (e.g., Machine Issue). | |
| operationId | object | No | Anonymised operation identifier. | |
| operationName | object | No | Human-readable operation name. | |
| eventId | object | No | Anonymised stoppage event identifier. | |
| eventStartDate | object | No | Stoppage event start timestamp. | |
| eventEndDate | object | No | Stoppage event end timestamp. | |
| eventTimeTrackedInMinutes | float64 | No | Tracked time as recorded by the source system. | |
| eventDurationMinutes | float64 | No | Duration computed from end - start. | |
| itemRoutingOperationId | object | No | Unique unit-operation instance ID. | Link to product_cycle. |
| itemUnit | object | No | Anonymised unit/item identifier. | Link to unitId. |
| Company | object | No | Anonymised company identifier. | |
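To rank downtime reasons, you can sum stoppage minutes per Company and description (e.g., Machine Issue). This sketch uses eventDurationMinutes (end minus start). Swap in eventTimeTrackedInMinutes if you prefer the source system's tracked time. The function name and the default of five reasons per company are illustrative.

```python
import pandas as pd


def top_stoppage_reasons(stops: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Top-n stoppage reasons by total minutes within each company (hypothetical helper)."""
    totals = stops.groupby(["Company", "description"], as_index=False)["eventDurationMinutes"].sum()
    # Sort so the largest totals come first within each company, then keep the first n rows.
    totals = totals.sort_values(["Company", "eventDurationMinutes"], ascending=[True, False])
    return totals.groupby("Company").head(n).reset_index(drop=True)
```

Replacing "Company" with "operationName" in the grouping gives the per-operation view instead.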

Suggested Join Keys

  • Unit Tracking: unitId ↔ itemUnit
  • Operation Specifics: itemRoutingOperationId (the best key for connecting duration records to specific stoppage events).
  • Organisation: Company
  • Process Step: operationId
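Here is a sketch of the itemRoutingOperationId join, assuming pandas. Stoppage events are first aggregated to one row per unit-operation instance. That way, an operation with several stoppages does not duplicate rows from product_cycle_anonymised.csv. The validate="one_to_one" argument makes pandas raise a MergeError if the key turns out not to be unique on either side. Operations with no linked stoppage get 0. The function and output column names are hypothetical.

```python
import pandas as pd


def attach_stoppages(cycle: pd.DataFrame, stops: pd.DataFrame) -> pd.DataFrame:
    """Add per-operation stoppage minutes and event counts to product_cycle rows (sketch)."""
    # Collapse stoppage events to one row per unit-operation instance before merging.
    per_op = (
        stops.groupby("itemRoutingOperationId")
        .agg(stopMinutes=("eventDurationMinutes", "sum"), stopEvents=("eventId", "nunique"))
        .reset_index()
    )
    merged = cycle.merge(per_op, on="itemRoutingOperationId", how="left", validate="one_to_one")
    # Operations with no linked stoppage events had no recorded downtime.
    merged[["stopMinutes", "stopEvents"]] = merged[["stopMinutes", "stopEvents"]].fillna(0)
    return merged
```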

Data Quality Notes

  • Missing Data: About 96% of records in df_unit_level_cycle_downtime_anonymised.csv have nulls in totalStopMinutes and numEvents. This typically means no stoppage events were recorded for that unit, so these nulls can be treated as 0.
  • Open Orders: About 81% of orders were open at extract time (completedDate is null). Analysis of these orders should use open_days_late rather than completedDate.
  • Negatives: Negative durations in product_cycle_anonymised.csv should be treated as data quality errors. They occur in the actual duration columns: 139 in routingActualDurationMinutes and 165 in operationActualDurationMinutes.

Provenance: Please record the extract date, source system version, and any applied filters (date range, plant, or operation types) externally to ensure benchmarking is reproducible.
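A JSON sidecar file stored next to the extract is one lightweight way to keep that record. The field names below form a hypothetical schema that mirrors the items listed above (extract date, source system version, date range, plant, operation types). The dataset itself does not define this format.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class ExtractProvenance:
    """Hypothetical provenance record for one MES extract."""
    extract_date: str                            # ISO date of the extract
    source_system_version: str
    date_range: Optional[Tuple[str, str]] = None  # (start, end) filter, if applied
    plants: List[str] = field(default_factory=list)
    operation_types: List[str] = field(default_factory=list)


def write_provenance(record: ExtractProvenance, path) -> None:
    """Write the record as pretty-printed JSON."""
    Path(path).write_text(json.dumps(asdict(record), indent=2, sort_keys=True), encoding="utf-8")


def read_provenance(path) -> ExtractProvenance:
    """Read a record back; JSON stores tuples as lists, so restore date_range."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if data.get("date_range") is not None:
        data["date_range"] = tuple(data["date_range"])
    return ExtractProvenance(**data)
```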

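Additional Information

| Field | Value |
| :---- | :---- |
| Author | SMDH Data Science Team |
| Maintainer | Dermot Kerr |
| Last Updated | February 20, 2026, 17:46 (UTC) |
| Created | February 20, 2026, 17:42 (UTC) |
| Shared with SMDH data scientists | False |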