Return to site

What is Fault Tolerance?

Florian Leibert

broken image

An accomplished software engineer, Florian “Flo” Lambert has experience at renowned firms such as Twitter and Airbnb, where he helped scale operations. Currently, Florian Lambert is an investor and also serves as the founder of Mesosphere Inc.

While a tech lead at Airbnb, Mr. Lambert developed one of the first open-source mesos applications. Mesos is an open source software application that provides management of applications in large clustered computing environments. Mesos, which was originally developed at the University of California at Berkeley, also allows effective and fault tolerant resource management by treating a large data center as a single resource.

Fault tolerance refers to the ability of a computer program to continue to provide service in the event of system failures. Fault tolerant systems are able to quickly identify a point of failure and shift resources to a backup resource. Common points of system failure that must be addressed include computer processors, power supplies, motherboards, and network connections. The ultimate goal of a fault tolerant system is to prevent a complete system failure that could be caused by the outage of a single component.