Manufacturing Operations Dataset Documentation
Overview
This package contains anonymised manufacturing operations datasets extracted from an MES (Manufacturing Execution System). These files support the analysis of cycle times, routing/operation durations, stoppage/downtime, and order due-date performance.
Anonymisation & Standards
- Anonymisation: Company names are replaced with Company_XX. Identifiers such as orderId, unitId, stockId, operationId, eventId, and stoppage id are anonymised tokens. No personally identifying information (PII) is present.
- Timezones: Timestamp fields include a UTC offset (e.g., +00:00). When loading into Pandas, it is recommended to use: pd.to_datetime(col, utc=True, errors='coerce').
- Negative Values: Some duration fields contain negative values due to system clock issues. These should be filtered or investigated during the cleaning phase.
Data Quality Notes
- Missing Data: Many records in df_unit_level_cycle_downtime_anonymised.csv have nulls for downtime. This typically indicates zero downtime events occurred for that unit.
- Open Orders: A high percentage (81%) of orders are currently open. Analysis of these should focus on open_days_late rather than completedDate.
- Negatives: Negative durations in product_cycle_anonymised.csv (specifically in actual duration columns) should be treated as data quality errors.
Provenance: Please record the extract date, source system version, and any applied filters (date range, plant, or operation types) externally to ensure benchmarking is reproducible.