

September 16, 2020


Course Description

One of the most valuable technology skills is the ability to store and process huge data sets. This course is designed to bring you up to speed on two of the most widely used technologies for the task: Hadoop and Apache Spark. Many leading technology companies use Hadoop and Spark to solve their big data problems. The course teaches the most popular Big Data and Hadoop technologies, including HDFS, MapReduce, Spark, MLlib and Spark Streaming. It is built around hands-on projects from industries such as transportation, advertising and entertainment.

Course Outline

Unit 1 – Big Data Ecosystem
- What is Big Data?
- Big Data Characteristics
- Data Processing Challenges
- What is a Distributed File System (DFS)?
- Solving the Speed Problem with DFS
- What is Hadoop?
- HDFS
- Big Data Processing with MapReduce
- Hive vs. Pig
- Introducing Apache Spark
- MapReduce, Hive, Pig vs. Spark
- Ambari Web UI
- Hadoop Ecosystem

Unit 2 – Linux Operating System Review
- Create virtual machines using VirtualBox
- Install Linux on virtual machines
- Run simple Linux commands using the shell
- Manage files and directories from the shell prompt
- Create, view, and edit text files from the command line with the vi editor
- Set Linux permissions on files and directories
- Access remote systems securely using SSH
- Configure basic Linux networking
- Archive files and copy them from one system to another
- Download, install, update, and manage software packages

Unit 3 – Environment Setup
- Install and set up a Hadoop cluster on Linux
- Install and set up a Spark cluster on Linux
- Configure HDFS
- Configure PySpark
- Configure Jupyter Notebook to access the Hadoop and Spark clusters

Unit 4 – Spark DataFrames
- What is Spark?
- Spark DataFrame Basics
- Spark DataFrame Basic Operations
- GroupBy and Aggregate Operations
- Handling Missing Data
- Working with Dates and Timestamps
- Practical Project: Processing a 20-million-record dataset

Unit 5 – Machine Learning with Spark MLlib
- Implement a Linear Regression Model with Spark's MLlib
- Practical Project: Build a regression model for a shipping company
- Implement a Logistic Regression Model with Spark's MLlib
- Practical Project: Build a classification model for a marketing agency

Unit 6 – Recommender Systems
- Introduction to Recommender Systems
- Collaborative Filtering Recommender Systems
- Practical Project: Building a movie recommender system with Spark

Unit 7 – Spark Streaming
- Introduction to Streaming with Spark
- Processing Unstructured Data with Spark Streaming
- Practical Project: Processing Twitter Feeds using Spark Streaming
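For readers new to the MapReduce model listed in Unit 1, here is a minimal, single-machine Python sketch of its three phases (map, shuffle, reduce), applied to word counting. It is an illustration only. It is not course material and not Hadoop's API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical. A real Hadoop job runs each phase in parallel across cluster nodes and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key (the framework does this in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine all values for one key into a single result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, mirroring a classic MapReduce word-count job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop stores data in HDFS"]
    print(word_count(docs))
```

In Hadoop, the shuffle happens automatically between the map and reduce tasks. The sketch makes it an explicit function only so that each phase is visible on its own.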










LOCATION Innosoft Gulf FZ LLC, Dubai Knowledge Village, Block 6, Office F02, Al Sufouh, Dubai, United Arab Emirates





