Decouple Storage from Compute
Big Data LDN 2019 | Decouple Storage from Compute – Put Data in the Hands of Decision Makers | Matt Houghton
What you'll learn
Kaggle’s State of Data Science and Machine Learning survey showed that, in some industries, up to 92% of respondents cite bad, unavailable, or difficult-to-access data as one of the top barriers to success in data and analytics projects. This is a problem because without data there is no AI. The volume and variety of data that organisations need to store and process are increasing, and large data appliances that run 24/7 are often unsuitable for a cost-effective data architecture. Fortunately, cloud platforms offer a number of services that can help us develop and deliver a modern, cost-effective and flexible data architecture.

This talk will outline and demonstrate a number of AWS services that achieve this. It starts with S3 as a storage layer that is decoupled from our compute services. We will look at discovery and classification of data in S3 using Glue and Macie, and at how we can apply appropriate lifecycle management and security. Moving on to the various types of processing a data project may require, Matt Houghton will look at Lambda and Glue for ETL, and Athena and Redshift for query and analysis. We will look at the ready-made machine learning services that can jump-start your AI/ML capabilities. Finally, we will look at how we can put data directly into the hands of decision makers using QuickSight and Alexa, delivering a self-service BI capability.
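To make the lifecycle management point concrete, here is a minimal sketch (not taken from the talk) of how an S3 lifecycle configuration might be built and applied with boto3. The bucket name, the `raw/` prefix, the rule ID, and the day thresholds are all hypothetical; `put_bucket_lifecycle_configuration` is the boto3 S3 client call assumed here, and it needs AWS credentials to run.

```python
import json


def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=365):
    """Build a lifecycle configuration dict for objects under `prefix`.

    The thresholds are illustrative defaults, not recommendations.
    """
    if not (0 < ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("thresholds must be positive and strictly increasing")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/')}",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to cheaper storage classes as they age.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                # Delete objects once they pass the retention period.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle_config(bucket, config):
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Print the configuration only; nothing is sent to AWS here.
    print(json.dumps(build_lifecycle_config("raw/"), indent=2))
    # apply_lifecycle_config("example-data-lake-bucket", build_lifecycle_config("raw/"))
```

Because storage is decoupled from compute, a rule like this changes only what the data costs to keep; query services such as Athena continue to address the same bucket and prefix.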