Big Data Engineering for Analytics

Data Pipelining Integration in Structuring Big Data Projects for Analytics and Model Development

Overview

Reference No	TGS-2020001444
Part of	Graduate Certificate in Big Data Analytics, Graduate Certificate in Engineering Big Data
Duration	5 days
Course Time	9:00am - 5:00pm
Enquiry	Please contact ask-iss@nus.edu.sg for more details.

This 5-day course helps data engineers focus on essential design and architecture while building a data lake and its associated processing platform.

Participants will learn various aspects of data engineering while working with Resilient Distributed Datasets (RDDs). They will learn to apply key practices, identify multiple data sources and appraise them against their business value, design the right storage, and implement appropriate access models. Finally, participants will build a scalable data pipeline solution with a pluggable component architecture, based on a combination of requirements and in a vendor- and technology-agnostic manner. Participants will also become familiar with the Spark platform, with additional focus on its query and streaming libraries.

This course is part of the Software Systems series, Data Science series, Graduate Certificate in Big Data Engineering & Web Analytics as well as Graduate Certificate in Engineering Big Data series offered by NUS-ISS.

Key Takeaways

Upon successful completion of the course, participants will be able to:

Understand the growth of big data and the need for a scalable processing framework. Understand its fundamental characteristics, storage, analysis techniques and the relevant distributions.
Understand distributed storage essentials, storage needs and the architectural mechanisms for processing large amounts of structured, semi-structured and unstructured data.
Gain expertise with a fault-tolerant computing framework (e.g. YARN) by setting up pseudo-distributed cluster nodes or cloud-based nodes for processing big data.
Construct configurable and executable tasks using in-memory processing frameworks (e.g. Spark Core). Understand the nuances of writing functional programs, and use the core libraries to manipulate large corpora of unstructured data held as Resilient Distributed Datasets (RDDs).
Organize, store and manipulate the collected data using processing libraries, for example specialized statistical operations and stream processing tools (e.g. Spark's specialized libraries).
Understand the data processing, querying and persistence APIs (e.g. the Spark SQL APIs) available for use with RDDs. Perform tasks such as filtering, selection and categorization.
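The functional style referred to above (transform, aggregate, then filter) can be illustrated without a cluster. The sketch below is plain Python, not Spark: it mimics the flatMap, map, reduceByKey and filter chain that a Spark word count typically uses, applied to a small in-memory list of made-up customer-feedback lines. The function names `word_count` and `frequent_words` are hypothetical and for illustration only; in PySpark the same steps would be written as RDD transformations such as `flatMap`, `map`, `reduceByKey` and `filter`.

```python
from functools import reduce
from typing import Dict, Iterable, Tuple

# Hypothetical helpers: plain-Python stand-ins for Spark RDD transformations,
# not part of any Spark API.


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # "flatMap": split every line into lowercase words
    words = (word.lower() for line in lines for word in line.split())
    # "map": pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)

    # "reduceByKey": fold the (word, 1) pairs into per-word totals
    def add(totals: Dict[str, int], pair: Tuple[str, int]) -> Dict[str, int]:
        word, n = pair
        totals[word] = totals.get(word, 0) + n
        return totals

    return reduce(add, pairs, {})


def frequent_words(counts: Dict[str, int], min_count: int) -> Dict[str, int]:
    # "filter": keep only words seen at least min_count times
    return {w: n for w, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    feedback = [
        "Great product fast delivery",
        "Delivery was slow",
        "great support",
    ]
    counts = word_count(feedback)
    print(frequent_words(counts, 2))  # {'great': 2, 'delivery': 2}
```

In Spark, the equivalent transformations are evaluated lazily and run in parallel across partitions; here the reduce step runs sequentially in a single process, so the sketch shows only the shape of the computation.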

Who Should Attend

This is an intermediate course, suitable for professionals with some experience in any programming language and in data design. Participants with some business exposure will better appreciate the case studies discussed.

This course targets analytics professionals, including:

Business and IT professionals seeking analytical skills to handle large amounts of unstructured data held in data lakes (e.g. customer feedback, product reviews on social media, phone call recordings) to gain insights that improve business processes and decision-making.
Individuals who have no knowledge or experience in data engineering for analytics and would like to gain practical skills in this area so that they can explore work opportunities in data engineering.
Data analysts and data engineers who want to move from engineering structured data to engineering large amounts of unstructured data.

Prerequisites

This is an intensive, intermediate course. It targets professionals higher in the value chain, such as data engineers, data application architects, integration architects, software engineers working on data pipeline processing, and key technology decision makers.

Participants with experience in programming languages such as Python, Java or Scala will benefit more from the course. Participants also need a strong interest in building functional pipelines and should be comfortable working with the Hadoop platform and the Spark framework.

NUS-ISS also offers a range of other foundational analytics courses for participants who are new to analytics.

What to Bring

No printed copies of course materials are issued.

Participants must bring their own internet-enabled computing device (laptop, tablet, etc.) with a power charger to access and download course materials.

If you are bringing a laptop, please see below for the tech specs:

	Minimum	Recommended
Computer and processor	2 cores or more; Intel Core i7 or higher preferred	4 cores or more; Intel Core i7 or higher preferred
Memory	16 GB RAM (at least 4 GB free memory while running the cluster)	32 GB RAM (at least 4 GB free memory while running the cluster)
Hard disk	1 TB (at least 20 GB free for use)	2 TB or more
Display	Screen resolution of at least 1280 x 768 (32-bit requires hardware acceleration for 4K and higher)	DirectX 10 graphics card for graphics hardware acceleration
Cluster management software	Ability to run a local Kubernetes cluster (e.g. minikube or kind) on Windows, Linux or macOS
Container or virtual machine manager	Kubernetes preferred. Alternatives: Docker, HyperKit, Hyper-V, KVM, Parallels, Podman, VirtualBox, or VMware Fusion/Workstation
Others	An internet connection (broadband, wired or wireless); speakers and a microphone (built-in, USB plug-in or wireless Bluetooth); a webcam or HD webcam (built-in or USB plug-in)

What Will Be Covered

The course objective is to explore the engineering aspects of big data storage, querying and processing techniques. The course aims to have participants apply their newly acquired skills by developing data-intensive applications on a distributed compute platform (e.g. the Hadoop platform, the Spark framework and related tools).

A brief module description is provided below:

Agenda
Module 1: Introduction to Data Science, Data Engineering and Big Data
Module 2: Understand Big Data from an Analytics Perspective
Module 3: Architectural Viewpoints in Big Data
Module 4: The Hadoop Ecosystem for Big Data
Module 5: Distributed File Storage
Module 6: NoSQL Databases for Big Data
Module 7: Spark and Functional Programming for Big Data
Module 8: Spark and Resilient Distributed Datasets
Module 9: Spark SQL for Big Data
Module 10: Spark and Real-Time Stream Processing
Module 11: Management of Big Data Initiatives
Discussion and Project Requirement Elaboration
Project and Assessment: Project Demonstration, Report Submission and Presentations. Each team will work on a practical case study and submit and present its work on the assigned Big Data project.
Closing Remarks

Fees & Subsidies

Fees for 2025

	Full Fee	Singaporeans & PRs (self-sponsored)
Full course fee	S$4500	S$4500
ISS Subsidy	-	(S$450)
Nett course fee	S$4500	S$4050
9% GST on nett course fee	S$405	S$364.50
Total nett course fee payable, including GST	S$4905	S$4414.50

Note:

All fees and subsidies are valid from January 2024, unless otherwise advised.
All self-sponsored Singaporeans aged 25 and above can use their SkillsFuture Credit to pay for course fees.
With effect from 1 January 2024, GST was increased to 9%.
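The fee table's arithmetic can be checked with a short calculation: the nett course fee is the full fee minus any subsidy, and 9% GST is charged on the nett fee. The sketch below is an illustrative Python helper; the name `fee_breakdown` is hypothetical and not an NUS-ISS tool. It uses `Decimal` to avoid floating-point error on currency amounts, takes the subsidy as a fixed amount from the table rather than assuming how it is derived, and assumes GST is rounded to the nearest cent.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.09")  # 9% GST, as stated in the fee notes
CENT = Decimal("0.01")      # assumption: GST rounded to the nearest cent


def fee_breakdown(full_fee: Decimal, subsidy: Decimal = Decimal("0")) -> dict:
    """Return the nett fee, GST on the nett fee, and total payable."""
    nett = full_fee - subsidy
    # GST is computed on the nett (post-subsidy) fee
    gst = (nett * GST_RATE).quantize(CENT, rounding=ROUND_HALF_UP)
    return {"nett": nett, "gst": gst, "total": nett + gst}


if __name__ == "__main__":
    print(fee_breakdown(Decimal("4500")))
    print(fee_breakdown(Decimal("4500"), subsidy=Decimal("450")))
```

For the self-sponsored column this gives a nett fee of S$4050, GST of S$364.50 and a total of S$4414.50; for the full-fee column, GST of S$405 and a total of S$4905, matching the table.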

Certificate

Certificate of Completion
Participants have to meet a minimum attendance rate of 75% and are required to pass the assessment to be issued a Certificate of Completion.

Join Us

Explore methods for structuring big data analytics projects for successful adoption. Register now to gain expertise in strategic approaches tailored to big data projects.

Preparing for Your Course

NUS-ISS Course Registration Terms and Conditions

NUS-ISS and Learner’s Commitment and Responsibilities

WIFI Access

WIFI access will be made available to participants.

Venue

NUS-ISS
25 Heng Mui Keng Terrace
Singapore 119615

In the event of a change of venue, participants are advised to refer to the acceptance email sent one week prior to the commencement date.

Course Confirmation

All classes are subject to confirmation, and NUS-ISS will send an acceptance email to participants one week prior to the commencement date. Confirmed registrants are to attend and complete all lectures, class exercises, workshops and assessments (where applicable). Additionally, participants must respond to all feedback requests and surveys conducted by NUS-ISS and its partners. All training and assessments will be delivered as described on the course webpage.

General Enquiry

Please feel free to write to ask-iss@nus.edu.sg if you have any enquiry or feedback.

Develop Your Career in the Following Training Roadmap(s)

Refer to the discipline(s) below to view the training roadmaps of related courses and assess your training needs and goals.

Driving business decisions using insights from Data

Architecting the backbones of smart cities
