PNEC 2019


Automation of Analytics & Data Pipelines #digitize #analytics #machinelearning #automation #bigdata (Room Salon A-D)

21 May 19

10:45 AM - 11:15 AM

Tracks: Case Studies and Solutions

Speaker(s): Stephen McAleer, Kubernetes Platform Specialist; John Archer, OpenShift Container Platform Solutions Architect, Red Hat; Audrey Reznik, ExxonMobil URC - Data Scientist, ExxonMobil

In the last few years, container technology has opened the door to application collaboration platforms. These platforms have matured to the point that they show great promise as Data Science collaboration platforms. Fundamentally, Data Scientists face challenges very similar to those seen in IT and software development processes. For Data Scientists, it is crucial to be able to share results and collaborate quickly and efficiently with colleagues and users. We would like to share our knowledge of one particular platform, Red Hat OpenShift, which we have found useful for Data Science collaboration at ExxonMobil.

We will discuss the desired platform requirements: an interactive environment; the ability to store and share code and data without others needing to set up a new environment on their laptops or PCs; security and access controls; and, when needed, the ability to “burst” the environment by increasing RAM, CPU and/or storage. From our journey, we would like to share how one can distinguish between an “Enterprise-ready” and a “non-Enterprise-ready” platform.

With respect to “Enterprise-ready” platforms, we will give an overview of the Cloud Native Computing Foundation (CNCF) and its role in defining Kubernetes, which is the backbone of container orchestration. We will also highlight the roadmap for Kubernetes, including support for the specialized performance computing requirements demanded by Data Science workloads, along with Artificial Intelligence, Machine Learning and Neural Network workloads and what has traditionally been considered typical HPC compute.

With an understanding of what an “Enterprise-ready” platform is, we can then describe the path for Data Science users to leverage this computing platform, which supports sharing of expensive resources such as GPUs, FPGAs and InfiniBand, quotas and thresholds, and CI/CD integration. In particular, CI/CD integration is giving rise to the emerging notion of ScienceOps workflows for automating data science analysis and moving from Data Science development to production Data Science applications.
