
Digital Resilience

Keep your IT running smoothly and error-free, scale well to meet growing business demand, and avoid the losses and penalties arising from resilience issues.

Overview

Duration: 3 days
Course Time: 9:00am - 5:00pm
Enquiry: Please email ask-iss@nus.edu.sg for more details.

Digital systems have become mission-critical for businesses and indispensable for individuals.

This is evident from the significant public impact, widespread media coverage, and severe penalties faced by companies and their leaders when digital resilience issues occur, such as unscheduled system downtime, poor system performance, or major functional errors.

Notable overseas examples include penalties imposed on banks and their leaders for disruptive system downtime; chaotic system performance during online booking for popular concerts; and the miscalculation of financial grants for hundreds of thousands of students.

Singapore has also had its share of digital resilience incidents.

As a result, organisations recognise the importance of system resilience and are investing heavily in hardware, software, and skilled personnel to keep their systems robust.

These efforts aim to maintain smooth business operations, meet customer expectations, scale effectively with growing demand, and avoid the financial and reputational damage caused by system failures.

This course caters to the need for skilled personnel and equips IT professionals with best practices across the entire system lifecycle, to enable them to manage systems effectively and ensure high resilience.


Upcoming Classes

Class 1 25 Nov 2025 to 27 Nov 2025 (Full Time)

Duration: 3 days

Time:
09:00am to 05:00pm



Key Takeaways

 

At the end of this course, participants will have the competencies below to OPTIMISE and IMPROVE the RESILIENCE of their digital systems:

  • DEFINE optimal cost-effective resilience targets for digital services and manage expectations accordingly, as well as collaborate with users on business requirements that facilitate such resiliency 
  • DESIGN for the capabilities below using industry best practices:
    • Availability
    • Recoverability/Maintainability
    • Performance
  • BUILD & TEST the services using best practices (e.g. "shift-left", scaling tests, failure recovery tests)
  • OBSERVE through relevant observability and monitoring tools and best practices, including the use of AI
  • OPERATE with best practices to keep up with changing risks and rising costs to ensure high resilience





Who Should Attend

The course is designed for:
• IT Managers
• Solution Architects
• Application System Team Leads
• DevSecOps Team Leads/Managers
• Senior Developers
• Ops / SRE members
• Incident/Crisis team members

Pre-requisites
• Currently in one of the above roles, or equivalent
• Have at least 3 years of working experience in designing, developing or managing digital services (the 3 years can include relevant time from earlier job roles)




What Will Be Covered

  • DEFINE optimal, cost-effective availability and performance targets for digital services, using error budgets (from Google’s Site Reliability Engineering methodology) or other techniques that keep tech debt low enough to achieve them; and collaborate with users on the choice of business requirements (e.g. whether real-time updates are really needed, or near real-time would suffice) that potentially reduce complexity and facilitate resiliency
  • DESIGN for capabilities such as:
    • Availability - through applying relevant best practices for redundancy, distribution etc
    • Recoverability/Maintainability - through modularity and flexibility
    • Performance - through relevant scaling and optimised design for good performance, as well as designing to avoid bottlenecks 
  • BUILD & TEST the services using best practices such as "shift-left" development; thorough automated functional and non-functional testing including relevant scaling tests, performance tests, failure recovery tests and observability tests; use of safe progressive releases
  • OBSERVE through relevant observability and monitoring tools and best practices including monitoring for infra, applications and users; also, use of AI to speed up problem diagnosis and resolution
  • OPERATE to ensure high resilience through best practices such as ongoing proactive risk management; continuous improvements (e.g. chaos engineering from Netflix, for instance via AWS Fault Injection Simulator; tuning; capacity planning and upgrades; Google SRE’s Elimination of Toil); and good incident/crisis management plans and processes (including Google SRE’s Blameless Post-Mortem)
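
As a rough illustration of the error-budget idea mentioned under DEFINE above, the sketch below converts an availability target into the downtime it allows over a period, and shows how much of that allowance is left after some downtime has been used. The function names, the 30-day window and the sample figures are assumptions made for illustration only; they are not taken from the course material.

```python
def error_budget_minutes(availability_target: float, period_days: int = 30) -> float:
    """Allowed downtime in minutes for a given availability target.

    availability_target is a fraction, e.g. 0.999 for "three nines".
    The 30-day default window is an illustrative assumption.
    """
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    total_minutes = period_days * 24 * 60
    return (1.0 - availability_target) * total_minutes


def budget_remaining(availability_target: float,
                     downtime_minutes: float,
                     period_days: int = 30) -> float:
    """Minutes of error budget left; a negative value means the target was missed."""
    return error_budget_minutes(availability_target, period_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the budget shrinks.
    for target in (0.99, 0.999, 0.9999):
        minutes = error_budget_minutes(target)
        print(f"{target:.2%} over 30 days allows {minutes:.1f} minutes of downtime")
```

For example, a 99.9% target over 30 days allows about 43 minutes of downtime, so a single 30-minute outage would consume most of the budget for that period. Teams can use the remaining budget to decide whether to continue releasing changes or to prioritise stability work.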



Fees & Subsidies

 




Certificate

The ISS Certificate of Completion will be issued to participants who have attended at least 75% of the course and passed the required assessments.




Preparing for Your Course

NUS-ISS Course Registration Terms and Conditions

Find out more.

NUS-ISS and Learner’s Commitment and Responsibilities

Find out more.

WIFI Access

WIFI access will be made available to participants.

Venue

NUS-ISS
25 Heng Mui Keng Terrace
Singapore 119615

Click HERE for directions to NUS-ISS

In the event of a change of venue, participants are advised to refer to the acceptance email sent one week prior to the commencement date.

Course Confirmation

All classes are subject to confirmation, and NUS-ISS will send an acceptance email to participants one week prior to the commencement date. Confirmed registrants are to attend and complete all lectures, class exercises, workshops and assessments (where applicable). Additionally, responses to all feedback forms and surveys conducted by NUS-ISS and its partners must be submitted. All training and assessments will be delivered as described in the course webpage.

General Enquiry

Please feel free to write to ask-iss@nus.edu.sg if you have any enquiries or feedback.




Course Resources

Develop your Career in the Following
Training Roadmap(s)

Please click on the discipline(s) to view the training roadmap of related courses to assess your training needs and goals.

Software Systems

Architecting the backbones of smart cities


