
Changes



On 20 February 2026, 17:48:01 UTC, Dermot Kerr:
  • Updated description of Manufacturing Operations Dataset from

    # **Manufacturing Operations Dataset Documentation**

    ## **Overview**

    This package contains anonymised manufacturing operations datasets extracted from an MES (Manufacturing Execution System). These files support the analysis of **cycle times**, **routing/operation durations**, **stoppage/downtime**, and **order due-date performance**.

    ## **Files Summary**

    | File | Rows | Columns | Approx Size (MB) |
    | :---- | :---- | :---- | :---- |
    | df\_unit\_level\_cycle\_downtime\_anonymised.csv | 165,384 | 6 | 6.34 |
    | order\_analysis\_anonymised.csv | 53,164 | 9 | 3.83 |
    | product\_cycle\_anonymised.csv | 952,730 | 17 | 156.75 |
    | stoppage\_all\_anonymised.csv | 10,891 | 14 | 1.93 |

    ## **Anonymisation & Standards**

    * **Anonymisation:** Company names are replaced with Company\_XX. Identifiers such as orderId, unitId, stockId, operationId, eventId, and the stoppage id are anonymised tokens. No personally identifiable information (PII) is present.
    * **Time Zones:** Timestamp fields include a UTC offset (e.g., \+00:00). When loading into pandas, parse them with pd.to\_datetime(col, utc=True, errors='coerce'), which converts every value to UTC and turns unparseable values into NaT.
    * **Negative Values:** Some duration fields contain negative values caused by system clock issues. Filter or investigate these rows during cleaning.

    ## **Data Dictionary**

    ### **1\. product\_cycle\_anonymised.csv**

    Used for unit-level and operation-level benchmarking, bottleneck identification, and variability analysis across routing steps.

    | Column | Type | Nullable | Description | Notes |
    | :---- | :---- | :---- | :---- | :---- |
    | **stockId** | object | No | Anonymised SKU identifier. | |
    | **avgEstimatedDurationMinutes** | float64 | No | Average estimated duration for this stock/route context. | Range: 1 to 6.66e+03 |
    | **avgActualDurationMinutes** | float64 | No | Average actual duration for this stock/route context. | |
    | **unitId** | object | No | Anonymised unit/item identifier. | |
    | **unitEstimatedDurationMinutes** | float64 | No | Estimated duration for the unit. | |
    | **unitActualDurationMinutes** | float64 | No | Actual duration for the unit. | |
    | **routingIndex** | int64 | No | Sequence index of the step within the route. | Range: 1 to 19 |
    | **routingEstimatedDurationMinutes** | float64 | No | Estimated duration for the routing step. | |
    | **routingActualDurationMinutes** | float64 | No | Actual duration for the routing step. | Contains 139 negative values. |
    | **operationId** | object | No | Anonymised operation identifier. | |
    | **operationEstimatedDurationMinutes** | float64 | No | Estimated duration for the operation. | |
    | **operationActualDurationMinutes** | float64 | No | Actual duration for the operation. | Contains 165 negative values. |
    | **itemRoutingOperationId** | object | No | Unique unit-operation instance ID. | **Primary key**; use for stoppage joins. |
    | **Date** | object | Yes | Production or extract date. | ISO timestamp; 3.35% missing. |
    | **Company** | object | No | Anonymised company identifier. | |

    ### **2\. df\_unit\_level\_cycle\_downtime\_anonymised.csv**

    Used for unit-level rollups that compare flow time against downtime contribution.

    | Column | Type | Nullable | Description | Notes |
    | :---- | :---- | :---- | :---- | :---- |
    | **Company** | object | No | Anonymised company identifier. | |
    | **unitId** | object | No | Anonymised unit/item identifier. | |
    | **unitActualMin** | float64 | No | Actual elapsed time for the unit (minutes). | |
    | **totalStopMinutes** | float64 | Yes | Total stoppage/downtime minutes for the unit. | 96% missing (treat as 0). |
    | **numEvents** | float64 | Yes | Number of stoppage events linked to the unit. | 96% missing (treat as 0). |
    | **numUnits** | int64 | No | Count of units represented (usually 1). | |

    ### **3\. order\_analysis\_anonymised.csv**

    Used for analysing on-time delivery performance and lateness patterns for completed vs. open orders.

    | Column | Type | Nullable | Description | Notes |
    | :---- | :---- | :---- | :---- | :---- |
    | **orderId** | object | No | Anonymised sales/work order identifier. | |
    | **requiredByDate** | object | Yes | Required-by (due) date/time for the order. | ISO timestamp. |
    | **completedDate** | object | Yes | Completion date/time. | Null for open orders (81%). |
    | **lateByDays** | float64 | Yes | Order lateness in days (completed \- required). | |
    | **lateByDays\_raw** | float64 | Yes | Raw lateness (includes negative values). | |
    | **open\_days\_late** | float64 | Yes | Lateness relative to extract time for open orders. | |
    | **Is\_Perfect** | float64 | Yes | Perfect-order flag (1 = on time, 0 = late). | |
    | **combinedLateDays** | float64 | Yes | Single metric combining open and closed lateness. | |
    | **Company** | object | No | Anonymised company identifier. | |

    ### **4\. stoppage\_all\_anonymised.csv**

    Used for identifying top downtime reasons by operation or company.

    | Column | Type | Nullable | Description | Notes |
    | :---- | :---- | :---- | :---- | :---- |
    | **id** | object | No | Anonymised stoppage reason identifier. | |
    | **description** | object | No | Stoppage reason description (e.g. Machine Issue). | |
    | **operationId** | object | No | Anonymised operation identifier. | |
    | **operationName** | object | No | Human-readable operation name. | |
    | **eventId** | object | No | Anonymised stoppage event identifier. | |
    | **eventStartDate** | object | No | Stoppage event start timestamp. | |
    | **eventEndDate** | object | No | Stoppage event end timestamp. | |
    | **eventTimeTrackedInMinutes** | float64 | No | Tracked time as recorded by the source system. | |
    | **eventDurationMinutes** | float64 | No | Duration computed as end \- start. | |
    | **itemRoutingOperationId** | object | No | Unique unit-operation instance ID. | Link to product\_cycle. |
    | **itemUnit** | object | No | Anonymised unit/item identifier. | Link to unitId. |
    | **Company** | object | No | Anonymised company identifier. | |

    ## **Suggested Join Keys**

    * **Unit Tracking:** unitId ↔ itemUnit
    * **Operation Specifics:** itemRoutingOperationId (the best key for connecting duration records to specific stoppage events).
    * **Organisation:** Company
    * **Process Step:** operationId

    ## **Data Quality Notes**

    * **Missing Data:** Many records in df\_unit\_level\_cycle\_downtime\_anonymised.csv have nulls for downtime. This typically indicates that no downtime events occurred for that unit.
    * **Open Orders:** A high percentage (81%) of orders were open at extract time. For these orders, analyse open\_days\_late rather than completedDate, which is null while an order is open.
    * **Negatives:** Negative durations in product\_cycle\_anonymised.csv (specifically in the actual-duration columns) should be treated as data quality errors.

    **Provenance:** Please record the extract date, source system version, and any applied filters (date range, plant, or operation types) externally so that benchmarking is reproducible.
    to
    # **Manufacturing Operations Dataset Documentation**

    ## **Overview**

    This package contains anonymised manufacturing operations datasets extracted from an MES (Manufacturing Execution System). These files support the analysis of **cycle times**, **routing/operation durations**, **stoppage/downtime**, and **order due-date performance**.

    ## **Files Summary**

    | File | Rows | Columns | Approx Size (MB) |
    | :---- | :---- | :---- | :---- |
    | df\_unit\_level\_cycle\_downtime\_anonymised.csv | 165,384 | 6 | 6.34 |
    | order\_analysis\_anonymised.csv | 53,164 | 9 | 3.83 |
    | product\_cycle\_anonymised.csv | 952,730 | 17 | 156.75 |
    | stoppage\_all\_anonymised.csv | 10,891 | 14 | 1.93 |

    ## **Anonymisation & Standards**

    * **Anonymisation:** Company names are replaced with Company\_XX. Identifiers such as orderId, unitId, stockId, operationId, eventId, and the stoppage id are anonymised tokens. No personally identifiable information (PII) is present.
    * **Time Zones:** Timestamp fields include a UTC offset (e.g., \+00:00). When loading into pandas, parse them with pd.to\_datetime(col, utc=True, errors='coerce'), which converts every value to UTC and turns unparseable values into NaT.
    * **Negative Values:** Some duration fields contain negative values caused by system clock issues. Filter or investigate these rows during cleaning.

    ## **Data Quality Notes**

    * **Missing Data:** Many records in df\_unit\_level\_cycle\_downtime\_anonymised.csv have nulls for downtime. This typically indicates that no downtime events occurred for that unit.
    * **Open Orders:** A high percentage (81%) of orders were open at extract time. For these orders, analyse open\_days\_late rather than completedDate, which is null while an order is open.
    * **Negatives:** Negative durations in product\_cycle\_anonymised.csv (specifically in the actual-duration columns) should be treated as data quality errors.

    **Provenance:** Please record the extract date, source system version, and any applied filters (date range, plant, or operation types) externally so that benchmarking is reproducible.
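The time zone and negative-value guidance in the description above can be sketched in pandas as follows. This is a minimal sketch, not part of the dataset package: the helper names parse_timestamps and split_negative_durations are hypothetical, and the small in-memory frame stands in for product_cycle_anonymised.csv (with the real file you would start from pd.read_csv). Only the pd.to_datetime(col, utc=True, errors='coerce') call comes from the description itself.

```python
import pandas as pd


def parse_timestamps(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Parse ISO timestamps carrying UTC offsets into tz-aware UTC datetimes.

    errors="coerce" turns missing or unparseable values into NaT instead of raising.
    """
    out = df.copy()
    for col in cols:
        out[col] = pd.to_datetime(out[col], utc=True, errors="coerce")
    return out


def split_negative_durations(df: pd.DataFrame, cols: list):
    """Split rows into (clean, flagged); a row is flagged if any column in `cols` is negative.

    NaN compares as False, so rows with missing durations stay in `clean`.
    """
    is_negative = (df[cols] < 0).any(axis=1)
    return df[~is_negative].copy(), df[is_negative].copy()


# Hypothetical sample rows standing in for product_cycle_anonymised.csv.
product_cycle = pd.DataFrame({
    "itemRoutingOperationId": ["op_a", "op_b", "op_c"],
    "routingActualDurationMinutes": [12.0, -3.0, 8.5],
    "operationActualDurationMinutes": [10.0, 4.0, -1.0],
    "Date": ["2025-01-06T08:00:00+00:00", None, "2025-01-07T09:30:00+01:00"],
})

parsed = parse_timestamps(product_cycle, ["Date"])
clean, flagged = split_negative_durations(
    parsed, ["routingActualDurationMinutes", "operationActualDurationMinutes"]
)
print(parsed["Date"].tolist())
print(clean["itemRoutingOperationId"].tolist())    # rows kept for analysis
print(flagged["itemRoutingOperationId"].tolist())  # rows to investigate
```

Keeping the flagged rows in a separate frame, rather than dropping them outright, leaves them available for the investigation the description recommends.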
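The missing-data note above says that null downtime values in df_unit_level_cycle_downtime_anonymised.csv typically mean no downtime events occurred. A minimal pandas sketch of filling them with zero follows. The function fill_missing_downtime and the derived columns hasStoppageRecord and stopShare are hypothetical names chosen for illustration, and the sample rows are invented.

```python
import pandas as pd


def fill_missing_downtime(units: pd.DataFrame) -> pd.DataFrame:
    """Treat null downtime fields as zero, keeping a flag so the assumption stays visible."""
    out = units.copy()
    # Hypothetical flag: record which units had a downtime value before filling.
    out["hasStoppageRecord"] = out["totalStopMinutes"].notna()
    cols = ["totalStopMinutes", "numEvents"]
    out[cols] = out[cols].fillna(0)
    # Hypothetical metric: fraction of elapsed unit time spent stopped.
    # NaN where elapsed time is zero or negative, to avoid dividing by it.
    elapsed = out["unitActualMin"].where(out["unitActualMin"] > 0)
    out["stopShare"] = out["totalStopMinutes"] / elapsed
    return out


# Hypothetical sample rows standing in for df_unit_level_cycle_downtime_anonymised.csv.
units = pd.DataFrame({
    "Company": ["Company_01", "Company_01", "Company_02"],
    "unitId": ["u1", "u2", "u3"],
    "unitActualMin": [120.0, 60.0, 0.0],
    "totalStopMinutes": [30.0, None, None],
    "numEvents": [2.0, None, None],
    "numUnits": [1, 1, 1],
})

filled = fill_missing_downtime(units)
print(filled[["unitId", "totalStopMinutes", "numEvents", "hasStoppageRecord", "stopShare"]])
```

The hasStoppageRecord flag is a design choice. If the "null means zero" reading turns out to be wrong for some company or period, the affected units can still be separated out after filling.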
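The open-orders note above recommends measuring open orders with open_days_late rather than completedDate. The sketch below summarises completed and open orders separately. It rests on two stated assumptions: a null completedDate marks an open order, and a positive lateByDays means a completed order finished after its required-by date. The function summarise_lateness and its output keys are hypothetical, and the sample rows are invented.

```python
import pandas as pd


def summarise_lateness(orders: pd.DataFrame) -> dict:
    """Summarise lateness separately for completed and open orders.

    Assumptions: a null (or unparseable) completedDate marks an open order, and
    lateByDays > 0 means a completed order finished after its required-by date.
    """
    completed_at = pd.to_datetime(orders["completedDate"], utc=True, errors="coerce")
    is_open = completed_at.isna()
    closed_late = orders.loc[~is_open, "lateByDays"]  # completed orders: lateByDays
    open_late = orders.loc[is_open, "open_days_late"]  # open orders: open_days_late
    return {
        "n_open": int(is_open.sum()),
        "n_closed": int((~is_open).sum()),
        "closed_late_share": float((closed_late > 0).mean()),
        "open_median_days_late": float(open_late.median()),
    }


# Hypothetical sample rows standing in for order_analysis_anonymised.csv.
orders = pd.DataFrame({
    "orderId": ["o1", "o2", "o3", "o4"],
    "completedDate": ["2025-02-01T10:00:00+00:00", None, "2025-02-03T12:00:00+00:00", None],
    "lateByDays": [2.0, None, 0.0, None],
    "open_days_late": [None, 5.0, None, 11.0],
    "Company": ["Company_01"] * 4,
})

summary = summarise_lateness(orders)
print(summary)
```

Because errors="coerce" maps unparseable completion dates to NaT, such rows are counted as open here. Check for them separately if completedDate is not always clean.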
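The previous version of the description, shown above, names itemRoutingOperationId as the best key for connecting duration records in product_cycle_anonymised.csv to stoppage events in stoppage_all_anonymised.csv. A minimal pandas sketch of that join follows. It aggregates stoppages per itemRoutingOperationId before merging, so an operation with several stoppage events does not duplicate its duration row. The function attach_stoppages and the stopMinutes/stopEvents columns are hypothetical, and the sample rows are invented.

```python
import pandas as pd


def attach_stoppages(product_cycle: pd.DataFrame, stoppages: pd.DataFrame) -> pd.DataFrame:
    """Add total stoppage minutes and event counts to each unit-operation row."""
    # One row per itemRoutingOperationId, so the merge cannot fan out duration rows.
    per_op = (
        stoppages.groupby("itemRoutingOperationId", as_index=False)
        .agg(stopMinutes=("eventDurationMinutes", "sum"),
             stopEvents=("eventId", "count"))
    )
    # validate="one_to_one" raises MergeError if itemRoutingOperationId is not
    # unique in product_cycle, which checks the primary-key claim.
    merged = product_cycle.merge(
        per_op, on="itemRoutingOperationId", how="left", validate="one_to_one"
    )
    # Operations with no stoppage events get zero rather than NaN.
    merged[["stopMinutes", "stopEvents"]] = merged[["stopMinutes", "stopEvents"]].fillna(0)
    return merged


# Hypothetical sample rows standing in for the two CSV files.
product_cycle = pd.DataFrame({
    "itemRoutingOperationId": ["iro_1", "iro_2", "iro_3"],
    "unitId": ["u1", "u1", "u2"],
    "operationActualDurationMinutes": [30.0, 45.0, 20.0],
})
stoppages = pd.DataFrame({
    "eventId": ["e1", "e2", "e3"],
    "itemRoutingOperationId": ["iro_1", "iro_1", "iro_3"],
    "itemUnit": ["u1", "u1", "u2"],
    "eventDurationMinutes": [5.0, 7.5, 3.0],
})

merged = attach_stoppages(product_cycle, stoppages)
print(merged)
```

This sketch sums eventDurationMinutes, the end-minus-start duration. eventTimeTrackedInMinutes could be used instead if the source system's tracked time is preferred.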