Thursday, September 16, 2010

[CS 704D] Design Issues of a Distributed OS

We discussed these issues in class, yet some students wanted me to add a note. I guess it will have to be about the "why" of the issues: why each of these issues needs to be considered. This is the crux of the design problem anyway. We will study these issues in detail, and how to achieve them, through the course!

In a centralized OS, for a computer system contained in one location, management of resources is comparatively easy, as you can get the status of each item whenever you ask. There is hardly any delay involved (things happen at electronic speed over various kinds of buses!). The most bothersome problem with distributed systems is that you depend on the communication system to issue commands and gather information, so you are never sure you have the latest information. Without the latest information about the state of the resources, you have difficulty assigning resources and managing them in general. For example, the OS may have last heard from the n-th processor that it is hale and hearty and ready to execute a process; yet just as the OS is about to assign the process, the processor may go down without the OS knowing. So the design of a DOS is about making sure you are able to manage the resources and deliver "accurate" processed data despite these problems. Designing a distributed system is about getting around these problems effectively.
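To make the staleness problem concrete, here is a minimal sketch (in Python; the class name, node names, and timeout are all invented for this example, not part of any real DOS) of the kind of heartbeat-based status table a distributed OS might keep. Note that the table can only ever say a node was alive recently, never that it is alive right now:

```python
# Hypothetical heartbeat-based status table: the OS records the last time
# each processor reported "alive". Names and the timeout are illustrative.
HEARTBEAT_TIMEOUT = 2.0  # seconds of silence before we suspect a node

class StatusTable:
    def __init__(self):
        self.last_seen = {}  # processor id -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def is_alive(self, node, now):
        # This can only say "probably alive": the node may have crashed
        # right after sending its last heartbeat.
        last = self.last_seen.get(node)
        return last is not None and (now - last) <= HEARTBEAT_TIMEOUT

table = StatusTable()
table.heartbeat("cpu-3", now=100.0)
print(table.is_alive("cpu-3", now=101.0))  # True: heartbeat is recent
print(table.is_alive("cpu-3", now=105.0))  # False: silent for too long
```

The `True` answer at time 101.0 is exactly the trap described above: the OS believes "cpu-3" is ready even though it may already have gone down.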

We discussed in class that the following are the relevant issues.

  1. Transparency
  2. Reliability
  3. Flexibility
  4. Performance
  5. Scalability
  6. Heterogeneity
  7. Security
  8. Emulation of existing OS
Transparency: This has several dimensions that need to be handled. The topmost transparency requirement is that the geographically distributed system should look like one integrated, monolithic system to the user. Users do not want to be bothered with the details of which resource is located where, let alone how to access it. Other needs for transparency arise from the requirement of uniformity in how the OS addresses resources.

As a user I need to access the resources of the system without bothering about where they are. This is location transparency, and name transparency and user mobility are its two dimensions. Since we are dealing with a diverse set of machines in such systems, the naming scheme should be uniform; otherwise, knowledge about the location of a resource would be required in order to name it. The user mobility property lets a user move to any machine in the system and log in the same way.
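As a rough illustration of name transparency (every name below is invented for the example), a simple name service lets user code mention only the resource's name, while the service maps it to a machine and a local path internally:

```python
# Illustrative name service: users refer to resources by a flat name;
# the (hypothetical) service resolves it to a location internally.
class NameService:
    def __init__(self):
        self._registry = {}  # resource name -> (host, local_path)

    def register(self, name, host, local_path):
        self._registry[name] = (host, local_path)

    def resolve(self, name):
        # Location transparency: the caller never supplies a host.
        return self._registry[name]

ns = NameService()
ns.register("project-report", "machine-17", "/data/reports/r1.txt")
host, path = ns.resolve("project-report")  # user names only the resource
```

The point is the call site: `resolve("project-report")` would look exactly the same if the report later migrated to "machine-42".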

Replication transparency makes sure that even when resources are replicated, their location and number remain transparent to the user; the system must take care of it. Failures should again be transparent, meaning that even when failures do take place they should not affect users as far as possible (it is often difficult to make them completely transparent). Many times we need to migrate resources or processes, but even that should not be noticed by the user. Concurrency-related activities carried out by the OS should also be transparent, meaning that the user need not be aware of where or how the processes are being carried out, or how concurrency is being managed.

Performance transparency requires that resources get suitably reconfigured so that reduced performance does not affect the user. In the worst case, user demands should still be met with reduced performance, but nothing disastrous should happen. Scaling transparency calls for not affecting users even when resources are added to increase the performance of the system.

Reliability: A failure can be fail-soft or Byzantine. In either case we should have means of fault avoidance, the ability to tolerate failures, and fault detection and recovery. This derives directly from the original requirement that we deliver accurate processing.
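One small sketch of fault detection and recovery (the "remote call" here is a stand-in written for this example, not a real API): the wrapper detects a transient failure through an exception and recovers by retrying.

```python
# Toy illustration of tolerating transient failures by retrying.
class TransientFailure(Exception):
    pass

def make_flaky_call(failures_before_success):
    # Simulated remote call that fails a few times, then succeeds.
    state = {"remaining": failures_before_success}
    def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientFailure("node did not respond")
        return "result"
    return call

def call_with_retry(call, attempts=3):
    for _ in range(attempts):
        try:
            return call()          # fault detection: the exception
        except TransientFailure:
            continue               # recovery: simply try again
    raise TransientFailure("gave up after %d attempts" % attempts)

flaky = make_flaky_call(failures_before_success=2)
print(call_with_retry(flaky))  # succeeds on the third attempt
```

Retrying only masks fail-soft (crash/omission) behaviour; a Byzantine node that returns wrong answers needs stronger machinery, such as voting across replicas.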

Flexibility: Ease of maintenance and ease of enhancement are definitely required, except that the qualifying clause is "as far as possible". The effort is to get as close to 100% as possible. That takes quite a bit of careful design effort, though.

Performance: This is always a consideration. We should be able to squeeze as much performance as possible out of a given set of resources.

Scalability: As far as possible, a linear increase in resources should result in a linear increase in performance. Thus adding another processor should, ideally, double performance. As we know, this does not happen in practice, but we want as high an increase as possible.
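Amdahl's law gives one standard way to quantify why doubling the processors does not double performance: if a fraction s of the work is inherently serial, the best speedup with n processors is 1 / (s + (1 - s)/n). A quick computation:

```python
# Amdahl's law: with serial fraction s and n processors,
# speedup = 1 / (s + (1 - s) / n).
def speedup(serial_fraction, n_processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even with only 10% serial work, 2 processors fall short of 2x:
print(round(speedup(0.10, 2), 3))    # 1.818
print(round(speedup(0.10, 100), 3))  # 9.174 -- far below 100x
```

The second line shows how quickly the serial fraction dominates: with s = 0.10, no number of processors can push the speedup past 10x.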

Heterogeneity: We know the distributed system is going to contain all kinds of resources (machines, memory and everything else), but that should not affect the performance or the accurate result delivery of the total system.

Security: Since, in the general case, data gets exchanged over networks distributed all over, security is definitely a concern, as all three crucial questions of security gain prominence: Is the sender really who he claims to be? Is the receiver the right receiver? And is the data in pristine form, not tampered with?
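As one small illustration using Python's standard hmac module (the key and messages are made up for the example): a message authentication code over a shared secret addresses two of the three questions, since only a key holder could have produced the tag (sender authenticity) and any tampering changes it (integrity).

```python
import hmac
import hashlib

# Illustrative shared secret; real systems would distribute keys securely.
KEY = b"shared-secret-key"

def sign(message):
    # Tag depends on both the key and every byte of the message.
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()

def verify(message, tag):
    # compare_digest avoids leaking information via timing differences.
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer 100 units to account 42"
tag = sign(msg)
print(verify(msg, tag))                        # True: untampered
print(verify(b"transfer 900 units to 42", tag))  # False: tampered
```

The third question, whether the receiver is the right one, needs more than a MAC; it is typically handled by authentication protocols and key distribution, which we take up later in the course.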

Emulation of existing OS: Unless the OS is being developed for the very first time, this is very important. Typically, a particular generation (version) of the OS would have been used to develop a lot of applications. Changing/upgrading these for the newer version takes time. So, until all these applications are ported to the newer version, you would need the older version to be emulated within the newer one.

That, in a nutshell, is why these are the design issues that should be addressed when creating a distributed OS.

If you still have questions, please use the comments section to pose them. I'll try my best to address them as soon as possible.
