Centralized Core Data Lake: Revolutionizing Data Management in the Process Industry

Introduction

In today's rapidly evolving industrial landscape, data management has become paramount for organizations aiming to enhance efficiency, streamline operations, and enable data-driven decision-making. One example of this shift is the concept known as the "Centralized Core Data Lake". This approach promises to revolutionize how data is collected, stored, and utilized across the range of operations domains in manufacturing.

It aims to provide a centralized repository for accessing and utilizing data from various applications, including SAP, LIMS, CMMS, and planning and scheduling systems, as well as aggregated historian data (that is, OT or time-series data from DCS, PLC, IoT, and similar sources), to cater to specific departmental needs. In other words, it enables a domain-driven approach to architectural design while maintaining access control and security.

Democratizing data-driven initiatives within an organization

The crux of this concept lies in how it accommodates diversity. The data lake is designed to be a one-stop hub that caters to the distinct requirements of each department within the organization.

Different functions may use various best-of-breed tools for their needs:

Power BI for procurement and planning.
AVEVA PI Vision for process.
Siemens XHQ for operations.
No-code app development platforms like Mendix and OutSystems for creating their own applications, such as a Rolling Plan application.
Specialized apps for HSE (Health Safety and Environment Management).

These functions or departments can tap into this data lake for IT/OT data and keep using their domain-specific applications without being locked into a specific application's ecosystem or excessively dependent on the IT team. For example, it allows functional end-users to gain insights through DSV applications such as AVEVA PI Vision or Siemens XHQ. These applications can pull data that originates in systems such as SAP (PM, MM, FICO, SD), LIMS, HSE, and planning and scheduling (P&S) applications from the central IT/OT data lake in their desired format (tabular, time-series, CSV, Parquet, etc.). The central IT team manages this data lake, which can be hosted on-premises or in any cloud.
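To make the idea of serving one set of lake records in several formats concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the record layout, the source and tag names, and the `export` function are hypothetical and do not describe any specific product's API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical records as they might sit in the central lake.
# Source names, tags, timestamps, and values are illustrative only.
LAKE = [
    {"source": "historian", "tag": "FIC101.PV", "ts": "2024-01-01T00:00:00", "value": 12.5},
    {"source": "historian", "tag": "FIC101.PV", "ts": "2024-01-01T00:01:00", "value": 12.7},
    {"source": "lims", "tag": "SAMPLE_PH", "ts": "2024-01-01T00:00:30", "value": 7.1},
]


def to_csv(records):
    """Render records as CSV text for tabular tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["source", "tag", "ts", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_time_series(records):
    """Group records by tag into time-ordered (timestamp, value) lists."""
    series = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["ts"]):
        series[rec["tag"]].append((rec["ts"], rec["value"]))
    return dict(series)


def export(records, fmt):
    """Dispatch to the exporter for the requested output format."""
    exporters = {"csv": to_csv, "time-series": to_time_series}
    if fmt not in exporters:
        raise ValueError(f"unsupported format: {fmt}")
    return exporters[fmt](records)
```

A planning team could call `export(LAKE, "csv")` to feed a tabular reporting tool, while a process team could call `export(LAKE, "time-series")` for trend displays. Both read the same underlying records, which is the point of keeping the store central and the consumption tools departmental.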

The data lake is integrated with Active Directory and ensures robust authorization & authentication mechanisms. Additionally, it offers the flexibility to determine how much data processing is done centrally, thus striking a balance between centralized and departmental responsibilities.
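As a rough illustration of department-level access control, the Python sketch below maps directory groups to the datasets they may read. The group names, dataset names, and the `GROUP_PERMISSIONS` table are hypothetical; in practice the group memberships would be resolved from Active Directory rather than hard-coded, and this sketch does not call any directory service.

```python
# Hypothetical mapping from directory groups to readable datasets.
# In a real deployment, group membership would come from Active Directory.
GROUP_PERMISSIONS = {
    "procurement": {"sap_mm", "planning"},
    "process": {"historian", "lims"},
    "hse": {"hse_incidents"},
}


def allowed_datasets(user_groups):
    """Return the union of datasets readable by any of the user's groups."""
    allowed = set()
    for group in user_groups:
        # Unknown groups grant nothing.
        allowed |= GROUP_PERMISSIONS.get(group, set())
    return allowed


def can_read(user_groups, dataset):
    """Check whether a user in the given groups may read a dataset."""
    return dataset in allowed_datasets(user_groups)
```

With this table, `can_read(["process"], "lims")` is true and `can_read(["procurement"], "lims")` is false, so each department sees only the data relevant to its domain while the store itself stays central.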

The system schema in Figure 1 shows how these components work together.

Figure 1: (1) Leverage any IT/OT data source; (2) store the data in a central data lake managed by the IT department; and (3) individual users and departments tap the data for their own use (visualization, app development, analytics, etc.).

Aligning with Industry Needs

This proposal aligns seamlessly with the industry's need for more flexible and agile data management. It encourages organizations to:

Leverage data from a multitude of IT and OT sources.
Store this data in a central data lake managed by the IT department.
The central data lake can be on-premises, in the cloud (e.g., SQL PaaS), or in a hybrid configuration.
Allow individual users and departments to access the data, utilizing their preferred tools, thanks to a new breed of Citizen Developers.

Conclusion

The Centralized Core Data Lake is poised to be a game-changer. It exemplifies how data management can evolve to meet the diverse needs of different organizational functions. With this new trend gaining traction, it's only a matter of time before more companies embrace this innovative approach to data management, driving them toward a brighter and more data-driven future.