THE GRID QUESTIONS

Feb 9, 2000

WARNING: This document is under construction and subject to change.

Questions to be answered in the context of reading various
chapters of:

    The Grid: Blueprint for a New Computing Infrastructure
    Ed. by Ian Foster and Carl Kesselman
    Morgan Kaufmann 1999
    ISBN 1-55860-475-8


Chapter 1: Grids in Context

1.1 Where should computing power be placed in the grid?

1.2 Where should data be stored in the grid?

1.3 Where should data be cached in the grid?

1.4 What arguments are there that anything needs to be done to
    develop the grid except to increase the bandwidth of the
    internet?


Chapter 2: Computational Grids

2.1 Is the notion that readily available computing power will
    increase x1000 in 5 years and x100,000 in 10 years
    (Section 2.1.1) realistic?

2.2 What needs to be standardized to create a grid infrastructure?

2.3 Which grid applications require new infrastructure other than
    increased internet bandwidth?

2.4 What things would you personally like to do that would require
    remote computing or remote experimental facilities?

2.5 Examine the differences between cluster and internet computing
    environments. For each difference, what changes in programming
    tools might it motivate?

2.6 Examine the differences between end user and internet computing
    environments. For each difference, what changes in resource
    management and scheduling might it motivate?


Chapter 3: Distributed Supercomputing Applications

3.1 What is the maximum rate at which a collection of N computers
    distributed around the edges of the United States can reliably
    make decisions? What does this signify for distributed
    computing?

3.2 What extremely important uses of distributed supercomputing
    (as opposed to data intensive, real-time, or teleimmersive
    computing) do you foresee? Which of these involve several
    physically separated supercomputers?
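
A possible starting point for question 3.1 is the signal travel time
between the most distant machines. The Python sketch below estimates
a lower bound on the time per decision if every decision needs at
least one round trip between the two farthest nodes. The 4,500 km
separation, the two-thirds-of-c signal speed, and the one-round-trip
model are all assumptions made for illustration, not figures from the
book; real agreement protocols need more rounds plus processing time.

```python
# Rough latency bound for question 3.1. All constants are assumptions.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FRACTION = 2.0 / 3.0         # assumed signal speed relative to c
FARTHEST_PAIR_KM = 4_500.0         # assumed distance between farthest nodes


def round_trip_seconds(distance_km, speed_fraction=FIBER_FRACTION):
    """Time for a signal to cross distance_km and come back."""
    one_way = distance_km / (SPEED_OF_LIGHT_KM_S * speed_fraction)
    return 2.0 * one_way


def max_decisions_per_second(distance_km, round_trips_per_decision=1):
    """Upper bound on sequential decisions per second.

    Assumes each decision costs the given number of round trips
    between the two farthest machines and nothing else.
    """
    return 1.0 / (round_trips_per_decision * round_trip_seconds(distance_km))


if __name__ == "__main__":
    rtt = round_trip_seconds(FARTHEST_PAIR_KM)
    print(f"round trip: {rtt * 1000:.1f} ms")
    for trips in (1, 2, 3):
        rate = max_decisions_per_second(FARTHEST_PAIR_KM, trips)
        print(f"{trips} round trip(s) per decision: at most {rate:.1f} per second")
```

Changing round_trips_per_decision shows how quickly the bound drops
for protocols that need several message exchanges per decision.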


Chapter 4: Real-Time Widely Distributed Instrumentation Systems

4.1 Look up WALDO on the web and explain the structure and role of
    metadata in WALDO.

4.2 Likewise, explain the structure and role of DPSS in WALDO.

4.3 Look up JAMM, Java Agents for Monitoring and Management, and
    explain its structure and role when used with DPSS.

4.4 Look up SSL, Secure Socket Layer, and explain its structure and
    function.


Chapter 5: Data-Intensive Computing

5.1 How might one demonstrate whether or not it is cheaper to
    co-locate supercomputers and large data repositories and move
    the program to the data rather than vice versa?

5.2 What terabyte and petabyte datasets can you think of that would
    interest you? That would interest many people?

5.3 Give an example of some type of information you would like to
    have readily available to you and that requires a very large
    dataset. How would you organize the data/metadata in a system
    that provides this information?

5.4 How fast a network connection is possible? What sort of answer
    can you find on the web?


Chapter 6: Teleimmersion

6.1 What economic model would you use to determine whether high
    speed internet services are really needed for distributed CAD
    with virtual reality interfaces?

6.2 If teleimmersion is just a very fancy terminal, what data
    formats should one consider for it?

6.3 Which applications of teleimmersion do you think are realistic?
    Which are unrealistic? Why?


Chapter 7: Application Specific Tools

7.1 Look up NETSOLVE on the web and find out what kinds of actual
    things can be done with NETSOLVE as it now stands. For example,
    can it multiply matrices? What kinds of scientific calculations
    can NETSOLVE now do well, and what kinds does it not seem to do
    well at this time?

7.2 Look up the SCIRun system and study its data types and how they
    can be visualized. Is there a natural visual input/output
    mechanism for each data type? Give examples.


Chapter 8: Compilers, Languages, and Libraries

8.1 How successful do you think parallel languages have been to
    date? What percentage of parallelizable problems do they
    handle? How do special parallel languages compare with ordinary
    languages augmented by communications libraries?

8.2 List as many different kinds of communication system as you can
    think of, giving a brief description of each. Find out what a
    `blackboard' communication system is and what it does, and
    include it in your list.

8.3 Design your own communication system that is optimized to do
    simulations. Is it similar to any existing communication
    system? How do you handle virtual time in your system? How do
    you handle virtual space?


Chapter 9: Object-Based Approaches

9.1 Look up CORBA and/or ActiveX on the web and make a list of the
    kinds of objects that are now available to perform
    computations. What can you say about the nature of these?

9.2 Compare and contrast any two of these: CORBA, ActiveX,
    JavaBeans.

9.3 Look up the Illinois Concert runtime system and explain how it
    is supposed to improve the performance of networked OO.

9.4 Compare and contrast the OO approach with protocol based server
    approaches such as FTP, X-windows, or whatever you like.

9.5 What is the difference between lightweight and heavyweight
    objects? Can you give examples of uses of each? Look up an OO
    database system used for CAD and see what weight objects it
    supports. Look up LEGION and see what weight objects it
    supports.


Chapter 10: High Performance Commodity Computing

10.1 Explain the difference between two- and three-tier
    architecture. What are the advantages of each? What are the
    differences and advantages in security? In performance?

10.2 CORBA is sometimes accused of being inefficient. Look up TAO,
    a project to make efficient CORBA ORBs, check the CORBA
    specification (including IIOP), and explain the pros and cons
    of CORBA efficiency.


Chapter 11: The Globus Toolkit

11.1 Compare Globus to an operating system, such as UNIX. What are
    the similarities and differences? How should Globus be more
    like UNIX, and vice versa? (Note: DCE is a system with
    ambitions similar to Globus that is more like UNIX.)

11.2 Compare Nexus to the message passing API of your choice (e.g.
    TCP, MPI, etc.). You may find the technical paper "Nexus:
    Runtime Support for Task Parallel Programming Languages"
    helpful (see Globus web page documentation).

11.3 Compare the Globus resource manager to any other resource
    manager. You may choose the matchmaker described in detail in
    Chapter 13. Documentation of the Globus resource manager is
    under `details' on the Globus web page.

11.4 Compare the Globus security system to Kerberos.

11.5 Learn more about MDS, the Globus metadata system (actually
    Metacomputing Directory System). You may find the technical
    paper "A Directory Service for Configuring High-Performance
    Distributed Computations" helpful (see Globus web page
    documentation). Describe those features of MDS that you most
    like or dislike, or find most significant.


Chapter 12: High-Performance Schedulers

12.1 Consider the case study in 12.4. What could happen to make an
    individual run using this method fail? What are the
    characteristics of the computers and of the computing task that
    are required to make this method work?

12.2 Compare AppLeS with one of the other schedulers mentioned in
    this chapter, getting details from the web. What are the
    differences and similarities? Which features do you like or
    dislike or think most significant?


Chapter 13: High-Throughput Resource Management

13.1 The Condor pool at the University of Wisconsin appears to
    consist of around 500 computers assigned to university
    employees. Suppose there were a separate Condor pool consisting
    of the computers of ALL students, where students are required
    to enter this pool when they connect to the university network.
    How would the student pool differ from the employee pool?

13.2 Discuss the economic tradeoffs of using a workstation pool of
    PCs versus a centralized pool of PCs such as the IBM SP
    computers. These latter consist of a bunch of PC motherboards
    in a single (very large) cabinet connected by a very high
    performance, cheap message switch, with a cost per motherboard
    perhaps only 50% of the cost of a normal PC. Should a
    university invest in individual PCs, or SP type computers, or
    both, and if both, how much of each?


Chapter 16: Security, Accounting, and Assurance

16.1 One philosophy of security is to keep your users well behaved
    by tightly controlling permissions. A conflicting philosophy is
    to be permissive, but keep an excellent audit trail to catch
    intentional and unintentional abusers. With regard to the grid,
    and with respect to attacks on privacy, abuse of service
    attacks (using service for illegal purposes), and denial of
    service (resource) attacks, what permissions would you tightly
    control, and what audit trails would you keep?

16.2 Find out about any one of the following and describe its
    principal and/or most interesting features:

        CISCO Secure PIX Firewall
        Generic Security Service (GSS)
        IETF IP Security Protocol (IPSec)
        Pretty Good Privacy (PGP)
        Platform for Internet Content Selection (PICS)
        Microsoft Authenticode
        NetCheque


Chapter 17: Computing Platforms

17.1 What is the `Memory Wall' (look it up on the web) and how will
    it influence the notion that personal computers will achieve
    8 GIPS by 2003 and 64 GIPS by 2008? What are the personal
    computer processor chips likely to contain in view of the
    memory wall problem?

17.2 Look up the IBM SP computers on the web, and describe their
    principal features. Suppose you lived in a dorm and shared a
    64 processor SP with 63 other students, instead of each student
    having their own computer. What would be the pros and cons of
    this setup? What if instead of students in a dorm we were
    talking about coworkers at a business?

17.3 Look up VIA on the web, describe its principal features, and
    explain why it is supposed to be higher performance than other
    communications interfaces. Does VIA have any long term defects
    (compare with some other research fast communication systems if
    you have time)?

17.4 Is the IBM SP a Shared Controlled-Performance SCE? If not,
    what are its deficiencies? Are these really important?

17.5 Should the Harvard Division of Applied Sciences build a system
    like the Berkeley NOW system, by using its existing desktop
    computers with new network hardware and some new software? What
    are the costs, advantages, and disadvantages? (Look up the NOW
    system on the web and be sure you understand it.)


Chapter 18: Network Protocols

18.1 Consider a messaging system in which each of N processes can
    send a message to any subset of the N processes, and all
    messages must be delivered in the same order at every receiver.
    What is the intrinsic overhead of such a system?

18.2 Imagine a communication system in which a list of the messages
    that will be sent and their receivers becomes available
    somewhat in advance of the messages actually being sent. How
    much better than existing communications systems could such a
    communication system be? How would you organize the
    infrastructure for such a system?

18.3 Find out more about any one of the protocols mentioned in this
    chapter and describe the principal and/or most interesting
    features of this protocol. Or look up two protocols designed to
    do about the same thing (e.g. multicast or realtime streaming)
    and compare them.


Chapter 19: Network Quality of Service

19.1 Look up the details of RSVP and design a system to permit you
    and a friend to hold a 2-way audio conversation. Give details
    of the RSVP reservations that must be made.

19.2 Suppose you were designing a network with 100 nodes
    geographically distributed in a perfect 10x10 square array,
    with each node connected to its four nearest neighbors by
    200 kilobit per second phone lines.
    Suppose there is unlimited buffer memory and processing power
    at each node. Suppose audio links require 16 kilobits per
    second of almost completely reliable communication, with a
    maximum delay of 300 milliseconds. How does packet size affect
    communication delay? What is the maximum feasible packet size?
    Look up the ATM network link protocol and find out what its
    natural packet size is, and speculate on why it is that size.
    Can you characterize the capacity of the network described
    above in terms of the number of audio links it can handle
    between given pairs of nodes simultaneously?

19.3 List all the possible defects you can think of that are fairly
    likely to occur in a resource reservation system. Which ones
    occur in RSVP?


Chapter 20: Operating Systems and Network Interfaces

20.1 If you did not do so for a previous question, look up VIA.
    Explain how it does or does not conform to the communication
    system requirements given in this chapter.

20.2 Draw a parallel between communication systems and graphic
    systems. Provide some details.

20.3 Look up fbufs, or any other concept you feel is not adequately
    explained in this chapter, and explain it more carefully.


Chapter 21: Network Infrastructure

21.1 Using the Web or any other resource to discover your answer,
    describe how some of the current internet backbone is actually
    constructed.

21.2 What do you think the internet driving applications of the
    future might be? If you think of the Web as a database, what
    new kinds of databases might be among these applications? What
    else?

21.3 Look up IETF on the web. Look at the RFCs. Make an annotated
    list of part of what the IETF is working on.
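

Note on question 19.2: one way to explore how packet size affects
delay is a small model like the Python sketch below. The rates, delay
limit, and array size come from the question. The model itself is an
assumption made for illustration: store-and-forward at every hop, no
packet headers, no queueing behind other traffic, and no propagation
delay. A fuller answer would have to revisit each of those
simplifications.

```python
# Delay model for question 19.2. The numbers are from the question;
# the model (store-and-forward, no headers, no queueing, no
# propagation delay) is an assumption.
AUDIO_BPS = 16_000    # audio stream rate
LINK_BPS = 200_000    # rate of each phone line
MAX_DELAY_S = 0.300   # maximum tolerated delay
GRID_SIDE = 10        # nodes per side of the square array


def worst_case_hops(side=GRID_SIDE):
    """Hops between opposite corners of a side x side mesh."""
    return 2 * (side - 1)


def end_to_end_delay(payload_bytes, hops):
    """Seconds to fill one packet at the source, then send it over each hop."""
    bits = 8 * payload_bytes
    fill_time = bits / AUDIO_BPS            # waiting for enough audio samples
    transmit_time = hops * bits / LINK_BPS  # full retransmission at every hop
    return fill_time + transmit_time


def largest_payload(hops, limit=MAX_DELAY_S):
    """Largest whole-byte payload whose modeled delay stays within limit."""
    seconds_per_byte = 8 / AUDIO_BPS + hops * 8 / LINK_BPS
    return int(limit / seconds_per_byte)


if __name__ == "__main__":
    hops = worst_case_hops()
    for size in (48, 128, 256, 512):
        delay_ms = 1000 * end_to_end_delay(size, hops)
        print(f"{size:4d} bytes over {hops} hops: {delay_ms:6.1f} ms")
    print("largest payload under this model:", largest_payload(hops), "bytes")
```

Under this model the delay grows linearly with packet size, so the
delay limit translates directly into a ceiling on payload size for a
given hop count.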