Key Takeaways
Upon successful completion of the course, participants will be able to:
- Understand the growth of big data and the need for a scalable processing framework. Understand the fundamental characteristics, storage, analysis techniques and the relevant distributions.
- Understand distributed storage essentials, storage needs, and the relevant architectural mechanisms for processing large amounts of structured, semi-structured and unstructured data.
- Gain expertise with a fault-tolerant computing framework (e.g. YARN) by setting up pseudo-cluster nodes or cloud-based nodes for processing big data.
- Construct configurable and executable tasks using in-memory processing frameworks (e.g. Spark Core). Understand the nuances of writing functional programs and use the core libraries to manipulate a large corpus of unstructured data residing as Resilient Distributed Datasets.
- Organize, store and manipulate the collected data using processing libraries, for example the specialised statistical and stream-processing tools (e.g. Spark special libraries).
- Understand the various data processing, querying and persistence APIs (e.g. Spark SQL APIs) available for use in the context of RDDs. Perform tasks such as filtering, selection and categorization.
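The functional style mentioned in the takeaways above (filtering, selecting and aggregating records) can be tried without a cluster. The following is a minimal sketch in plain Python, not Spark code and not course material: the sample reviews and the function names `tokenize` and `word_counts` are hypothetical, chosen only to illustrate the filter, map and reduce pattern that distributed frameworks apply to much larger datasets.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of unstructured text records (illustrative only).
REVIEWS = [
    "Great product, fast delivery",
    "Poor packaging but great support",
    "",
    "Fast refund, great service",
]

def tokenize(line):
    """Split a line into lowercase words, stripping basic punctuation."""
    return [word.strip(".,!?").lower() for word in line.split()]

def word_counts(lines):
    """Filter, map and reduce a list of lines into per-word counts."""
    non_empty = filter(None, lines)    # filtering: drop empty records
    tokens = map(tokenize, non_empty)  # selection: extract the words of each record
    # aggregation: merge the per-record counts into a single Counter
    return reduce(lambda acc, words: acc + Counter(words), tokens, Counter())

counts = word_counts(REVIEWS)
```

Each step takes a collection and returns a new one without modifying its input, which is the property that lets a framework split the same pipeline across many machines.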
Who Should Attend
This is an intermediate course, suitable for professionals with some experience in any programming language and data design. If the participants have some business exposure, they can appreciate the case studies discussed better.
This course targets analytics professionals, including:
- Business and IT professionals seeking analytical skills to handle large amounts of unstructured data (data lakes, e.g. customer feedback, product reviews on social media, phone call recordings, etc.) for insights to improve business processes and decision-making.
- Individuals who have no knowledge or experience in data engineering for analytics and would like to gain some practical skills in this area so that they may explore work opportunities in data engineering.
- Data analysts and data engineers who want to move from structured data to engineering large amounts of unstructured data.
Prerequisites
This is an intensive, intermediate course. It targets professionals higher up the value chain, such as data engineers, data application architects, integration architects, software engineers working on data pipeline processing and key technology decision makers.
Participants with experience in programming languages such as Python, Java or Scala will benefit more from the course. Participants also need to have a strong interest in building functional pipelines and be comfortable working with the Hadoop platform and the Spark framework.
NUS-ISS also offers a range of other basic courses in analytics for participants new to analytics.
What to Bring
No printed copies of course materials are issued.
Participants must bring their own internet-enabled computing device (laptop, tablet, etc.) with a power charger to access and download course materials.
If you are bringing a laptop, please see below for the tech specs:
| | Minimum | Recommended |
|---|---|---|
| Computer and processor | 2 cores or more. i7 (Intel) or higher preferred. | 4 cores or more. i7 (Intel) or higher preferred. |
| Memory | 16 GB RAM (minimum 4 GB free memory while running the cluster) | 32 GB RAM (minimum 4 GB free memory while running the cluster) |
| Hard Disk | 1 TB (minimum 20 GB free for use) | 2 TB or more |
| Display | Minimum of 1280 x 768 screen resolution (32-bit requires hardware acceleration for 4K and higher) | |
| Cluster Management Software | Run local Kubernetes clusters (for example, minikube or kind) on Windows, Linux, or Mac environments. | |
| Container or virtual machine manager | Kubernetes preferred. Other alternatives: Docker, Hyperkit, Hyper-V, KVM, Parallels, Podman, VirtualBox, or VMware Fusion/Workstation | DirectX 10 graphics card for graphics hardware acceleration |
| Others | An internet connection – broadband wired or wireless; speakers and a microphone – built-in or USB plug-in or wireless Bluetooth; a webcam or HD webcam – built-in or USB plug-in | |
What Will Be Covered
The course objective is to explore the engineering aspects of big data storage, querying and processing techniques. The course aims to teach participants to apply their newly acquired proficiencies by developing data-intensive applications on a distributed compute platform (e.g. the Hadoop platform, the Spark framework and relevant tools).
A brief module description is provided below:
Agenda
- Module 1: Introduction to Data Science, Data Engineering and Big Data
- Module 2: Understand Big Data from an Analytics Perspective
- Module 3: Architectural Viewpoints in Big Data
- Module 4: The Hadoop Ecosystem for Big Data
- Module 5: Distributed File Storage
- Module 6: NoSQL Databases for Big Data
- Module 7: Spark and Functional Programming for Big Data
- Module 8: Spark and Resilient Distributed Data Sets
- Module 9: Spark QL for Big Data
- Module 10: Spark and Real Time Stream Processing
- Module 11: Management of Big Data initiatives
- Discussion and Project Requirement Elaboration
- Project and Assessment: Project Demonstration, Report Submission and Presentations. Each team will work on a practical case study and submit/present their work done regarding the assigned Big Data project.
- Closing Remarks
Fees & Subsidies
Fees for 2025
| | Full Fee | Singaporeans & PRs (self-sponsored) |
|---|---|---|
| Full course fee | S$4500 | S$4500 |
| ISS Subsidy | - | (S$450) |
| Nett course fee | S$4500 | S$4050 |
| 9% GST on nett course fee | S$405 | S$364.50 |
| Total nett course fee payable, including GST | S$4905 | S$4414.50 |
Note:
- All fees and subsidies are valid from January 2024, unless otherwise advised.
- All self-sponsored Singaporeans aged 25 and above can use their SkillsFuture Credit to pay for course fees. For more information about SkillsFuture Credit, click here.
- From 1st January 2024, the GST will be increased to 9%.
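The arithmetic behind the fee figures can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the 9% GST rate and the S$450 subsidy stated above, and the function name `total_payable` is hypothetical. Actual invoices are issued by NUS-ISS.

```python
from decimal import Decimal

GST_RATE = Decimal("0.09")  # 9% GST, as stated in the fee information

def total_payable(full_fee, subsidy=Decimal("0")):
    """Return (nett fee, GST, total payable) for a given full fee and subsidy."""
    nett = full_fee - subsidy
    gst = (nett * GST_RATE).quantize(Decimal("0.01"))  # round GST to the nearest cent
    return nett, gst, nett + gst

full = total_payable(Decimal("4500"))                     # full fee, no subsidy
subsidised = total_payable(Decimal("4500"), Decimal("450"))  # with the ISS subsidy
```

GST is applied to the nett fee, that is, after the subsidy has been deducted, which is why the subsidised total is lower by more than the subsidy amount alone.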
Certificate
Certificate of Completion
Participants have to meet a minimum attendance rate of 75% and are required to pass the assessment to be issued a Certificate of Completion.
Join Us
Explore the methods to structure big data analytics projects for successful adoption.
Register now to acquire expertise in strategic approaches tailored for big data projects.
Preparing for Your Course
NUS-ISS Course Registration Terms and Conditions
Find out more.
NUS-ISS and Learner’s Commitment and Responsibilities
Find out more.
WIFI Access
WIFI access will be made available to participants.
Venue
NUS-ISS
25 Heng Mui Keng Terrace
Singapore 119615
Click HERE for directions to NUS-ISS
In the event of a change of venue, participants are advised to refer to the acceptance email sent one week prior to the commencement date.
Course Confirmation
All classes are subject to confirmation, and NUS-ISS will send an acceptance email to participants one week prior to the commencement date. Confirmed registrants are to attend and complete all lectures, class exercises, workshops and assessments (where applicable). Additionally, all feedback forms and surveys conducted by NUS-ISS and its partners must be completed and submitted. All training and assessments will be delivered as described on the course webpage.
General Enquiry
Please feel free to write to ask-iss@nus.edu.sg if you have any enquiries or feedback.