PNEC 2019

AI based Data Pipeline- Expanding Upon Traditional Data Management #welldata #machinelearning #bigdata #digitize #casestudy

21 May 19
1:00 PM - 1:30 PM

Tracks: Technical Trends and Innovation

A daunting business challenge involves dealing with increasing amounts of data, retrieving the information contained in various data types, and making sense of that data to support informed business decisions. A data pipeline is a workflow developed to handle varied and complicated data types, including structured, unstructured, and scanned data. Combining machine learning, scanning, search, and data management technologies enables information discovery in addition to data access. Only then can more comprehensive data analysis and better insights be achieved, so that data becomes a real asset and produces business value. It is critical to develop a proper workflow and data platform to process the data.

This solution was initially proven in our Petroleum business for well data discovery. In this scenario, data was sourced from multiple business domains, databases, applications, and file locations, in a variety of formats. The primary goal was to identify all unique wells and their associated information on a condensed timeline that could not be achieved manually.

The data pipeline consists of the following steps: 1) scan and index data; 2) data identification; 3) data classification; 4) data mastering. The workflow is not purely waterfall or sequential, as iterations enhance the process under various scenarios. Step 1, scan and index data, involves categorizing data types to determine which processes will be applied in subsequent steps, including Optical Character Recognition (OCR), image and table retrieval, image recognition, structured data decoding, metadata retrieval, and content indexing. In Steps 2 and 3, traditional taxonomy and modern neural networks are combined to identify and then classify the data. Finally, in Step 4, a BHP unique-identification-based data mastering process organizes the data for consumption by analytics platforms.
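The hybrid identification/classification idea in Steps 2 and 3 might be sketched as follows: a rule-based taxonomy pass runs first, and a trained classifier handles what the rules cannot. Everything here is an illustrative assumption, not BHP's actual implementation; the taxonomy entries, function names, and the keyword heuristic standing in for a trained neural network are all hypothetical.

```python
# Hypothetical sketch of a taxonomy-first, model-fallback classifier.
# In a production pipeline the fallback would be a trained neural network
# applied to OCR'd text; here a trivial keyword heuristic stands in.

# Rule-based taxonomy: known file extensions map directly to a class.
TAXONOMY = {
    "las": "well_log",
    "dlis": "well_log",
    "segy": "seismic",
}


def classify_by_taxonomy(filename):
    """Rule-based pass: map a known file extension to a document class."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return TAXONOMY.get(ext)  # None when the rules don't match


def classify_by_model(text):
    """Stand-in for a trained neural-network text classifier."""
    # Placeholder heuristic only; a real pipeline would score OCR'd text
    # with a trained model and return the highest-probability class.
    return "well_report" if "well" in text.lower() else "unknown"


def classify(filename, ocr_text):
    """Taxonomy first; fall back to the model when no rule matches."""
    return classify_by_taxonomy(filename) or classify_by_model(ocr_text)


print(classify("B-12.las", ""))                            # taxonomy hit
print(classify("scan_004.pdf", "Well completion report"))  # model fallback
```

Running rules before the model keeps the cheap, auditable path in front, while the learned fallback covers scanned and unstructured files the taxonomy cannot describe.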
This process has been tested and utilized in multiple projects, and the results show significantly reduced processing time and improved accuracy. This in turn has improved business workflows and created new business value. Going forward, combining AI with established data management processes will become a more standardized practice. This solution has proven to greatly enhance our data processes and to adapt well to changing business requirements. The business is very receptive to this solution and plans to apply it to more projects in the near future.