PySpark - Python and Spark
Data engineering at scale using Python and the Big Data ecosystem of tools.
Here is the high-level outline for the workshop:
1) Revision of basic Python programming
2) Overview of the Big Data ecosystem
3) Data engineering at scale with Spark core APIs, using Python as the programming language
4) Overview of Spark SQL and DataFrames
5) Development life cycle and execution life cycle
Training will be hands-on and delivered on a state-of-the-art 10-node Big Data cluster.
- A laptop (a 64-bit operating system and 4 GB of RAM are highly desirable)
- Browser - Chrome or Firefox
- Basic understanding of Python programming: loops, exceptions, file handling, and collections
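As a rough self-check for the Python prerequisite above, the short sketch below touches each listed topic: a loop, exception handling, file handling, and a collection. It is not workshop material. The function name `count_words`, the sample text, and the file names are all made up for illustration.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Count lowercase words in a text file; return an empty Counter if the file is missing."""
    counts = Counter()  # collection: maps each word to its number of occurrences
    try:
        # file handling: the with-statement closes the file automatically
        with open(path, encoding="utf-8") as handle:
            for line in handle:            # loop over lines
                for word in line.split():  # loop over whitespace-separated words
                    counts[word.lower()] += 1
    except FileNotFoundError:
        # exception handling: a missing file is treated as "no words"
        pass
    return counts


# Hypothetical sample data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("spark python spark\nbig data\n")

    word_counts = count_words(sample_path)
    missing_counts = count_words(os.path.join(tmp, "no_such_file.txt"))

print(word_counts.most_common(1))
print(len(missing_counts))
```

If reading and modifying a snippet like this feels comfortable, the Python prerequisite is likely met.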
Durga Gadiraju is a technology evangelist and consultant with close to 14 years of experience in building data-driven applications at scale. For the past 4 years, Durga has focused primarily on Big Data in the areas of consulting, delivery, and training. His online platform, itversity, is well known in the IT community in the areas of Big Data and Cloud. itversity will be a free continuous learning platform for IT professionals.