MODERN OPERATING SYSTEMS

One of the best books of its kind for understanding operating systems.


ANDREW S. TANENBAUM, HERBERT BOS


1137 Pages

111040 Reads

68 Downloads

English

PDF Format

6.25 MB


  • 21 Feb 2015
  • Page - 1

    read more..

  • Page - 2

    MODERN OPERATING SYSTEMS FOURTH EDITION read more..

  • Page - 3

    Trademarks AMD, the AMD logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Android and Google Web Search are trademarks of Google Inc. Apple and Apple Macintosh are registered trademarks of Apple Inc. ASM, DESPOOL, DDT, LINK-80, MAC, MP/M, PL/1-80 and SID are trademarks of Digital Research. BlackBerry®, RIM®, Research In Motion® and related trademarks, names read more..

  • Page - 4

    MODERN OPERATING SYSTEMS FOURTH EDITION ANDREW S. TANENBAUM HERBERT BOS Vrije Universiteit Amsterdam, The Netherlands Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo read more..

  • Page - 5

    Vice President and Editorial Director, ECS: Marcia Horton Executive Editor: Tracy Johnson Program Management Team Lead: Scott Disanno Program Manager: Carole Snyder Project Manager: Camille Trentacoste Operations Specialist: Linda Sager Cover Design: Black Horse Designs Cover art: Jason Consalvo Media Project Manager: Renata Butera Copyright © 2015, 2008 by Pearson Education, Inc., Upper Saddle read more..

  • Page - 6

    To Suzanne, Barbara, Daniel, Aron, Nathan, Marvin, Matilde, and Olivia. The list keeps growing. (AST) To Marieke, Duko, Jip, and Spot. Fearsome Jedi, all. (HB) read more..

  • Page - 7

    This page intentionally left blank read more..

  • Page - 8

    CONTENTS PREFACE xxiii 1 INTRODUCTION 1 1.1 WHAT IS AN OPERATING SYSTEM? 3 1.1.1 The Operating System as an Extended Machine 4 1.1.2 The Operating System as a Resource Manager 5 1.2 HISTORY OF OPERATING SYSTEMS 6 1.2.1 The First Generation (1945–55): Vacuum Tubes 7 1.2.2 The Second Generation (1955–65): Transistors and Batch Systems 8 1.2.3 The Third Generation (1965–1980): ICs and read more..

  • Page - 9

    viii CONTENTS 1.4 THE OPERATING SYSTEM ZOO 35 1.4.1 Mainframe Operating Systems 35 1.4.2 Server Operating Systems 35 1.4.3 Multiprocessor Operating Systems 36 1.4.4 Personal Computer Operating Systems 36 1.4.5 Handheld Computer Operating Systems 36 1.4.6 Embedded Operating Systems 36 1.4.7 Sensor-Node Operating Systems 37 1.4.8 Real-Time Operating Systems 37 1.4.9 Smart Card Operating Systems 38 read more..

  • Page - 10

    CONTENTS ix 1.9 RESEARCH ON OPERATING SYSTEMS 77 1.10 OUTLINE OF THE REST OF THIS BOOK 78 1.11 METRIC UNITS 79 1.12 SUMMARY 80 2 PROCESSES AND THREADS 85 2.1 PROCESSES 85 2.1.1 The Process Model 86 2.1.2 Process Creation 88 2.1.3 Process Termination 90 2.1.4 Process Hierarchies 91 2.1.5 Process States 92 2.1.6 Implementation of Processes 94 2.1.7 Modeling Multiprogramming 95 2.2 THREADS read more..

  • Page - 11

    x CONTENTS 2.3.7 Monitors 137 2.3.8 Message Passing 144 2.3.9 Barriers 146 2.3.10 Avoiding Locks: Read-Copy-Update 148 2.4 SCHEDULING 148 2.4.1 Introduction to Scheduling 149 2.4.2 Scheduling in Batch Systems 156 2.4.3 Scheduling in Interactive Systems 158 2.4.4 Scheduling in Real-Time Systems 164 2.4.5 Policy Versus Mechanism 165 2.4.6 Thread Scheduling 165 2.5 CLASSICAL IPC PROBLEMS 167 2.5.1 read more..

  • Page - 12

    CONTENTS xi 3.4 PAGE REPLACEMENT ALGORITHMS 209 3.4.1 The Optimal Page Replacement Algorithm 209 3.4.2 The Not Recently Used Page Replacement Algorithm 210 3.4.3 The First-In, First-Out (FIFO) Page Replacement Algorithm 211 3.4.4 The Second-Chance Page Replacement Algorithm 211 3.4.5 The Clock Page Replacement Algorithm 212 3.4.6 The Least Recently Used (LRU) Page Replacement Algorithm 213 read more..

  • Page - 13

    xii CONTENTS 4 FILE SYSTEMS 263 4.1 FILES 265 4.1.1 File Naming 265 4.1.2 File Structure 267 4.1.3 File Types 268 4.1.4 File Access 269 4.1.5 File Attributes 271 4.1.6 File Operations 271 4.1.7 An Example Program Using File-System Calls 273 4.2 DIRECTORIES 276 4.2.1 Single-Level Directory Systems 276 4.2.2 Hierarchical Directory Systems 276 4.2.3 Path Names 277 4.2.4 Directory Operations 280 read more..

  • Page - 14

    CONTENTS xiii 5 INPUT/OUTPUT 337 5.1 PRINCIPLES OF I/O HARDWARE 337 5.1.1 I/O Devices 338 5.1.2 Device Controllers 339 5.1.3 Memory-Mapped I/O 340 5.1.4 Direct Memory Access 344 5.1.5 Interrupts Revisited 347 5.2 PRINCIPLES OF I/O SOFTWARE 351 5.2.1 Goals of the I/O Software 351 5.2.2 Programmed I/O 352 5.2.3 Interrupt-Driven I/O 354 5.2.4 I/O Using DMA 355 5.3 I/O SOFTWARE LAYERS 356 read more..

  • Page - 15

    xiv CONTENTS 5.8.2 Operating System Issues 419 5.8.3 Application Program Issues 425 5.9 RESEARCH ON INPUT/OUTPUT 426 5.10 SUMMARY 428 6 DEADLOCKS 435 6.1 RESOURCES 436 6.1.1 Preemptable and Nonpreemptable Resources 436 6.1.2 Resource Acquisition 437 6.2 INTRODUCTION TO DEADLOCKS 438 6.2.1 Conditions for Resource Deadlocks 439 6.2.2 Deadlock Modeling 440 6.3 THE OSTRICH ALGORITHM 443 6.4 DEADLOCK read more..

  • Page - 16

    CONTENTS xv 6.7.3 Livelock 461 6.7.4 Starvation 463 6.8 RESEARCH ON DEADLOCKS 464 6.9 SUMMARY 464 7 VIRTUALIZATION AND THE CLOUD 471 7.1 HISTORY 473 7.2 REQUIREMENTS FOR VIRTUALIZATION 474 7.3 TYPE 1 AND TYPE 2 HYPERVISORS 477 7.4 TECHNIQUES FOR EFFICIENT VIRTUALIZATION 478 7.4.1 Virtualizing the Unvirtualizable 479 7.4.2 The Cost of Virtualization 482 7.5 ARE HYPERVISORS MICROKERNELS DONE read more..

  • Page - 17

    xvi CONTENTS 7.12.3 Challenges in Bringing Virtualization to the x86 500 7.12.4 VMware Workstation: Solution Overview 502 7.12.5 The Evolution of VMware Workstation 511 7.12.6 ESX Server: VMware’s type 1 Hypervisor 512 7.13 RESEARCH ON VIRTUALIZATION AND THE CLOUD 514 8 MULTIPLE PROCESSOR SYSTEMS 517 8.1 MULTIPROCESSORS 520 8.1.1 Multiprocessor Hardware 520 8.1.2 Multiprocessor Operating System read more..

  • Page - 18

    CONTENTS xvii 9 SECURITY 593 9.1 THE SECURITY ENVIRONMENT 595 9.1.1 Threats 596 9.1.2 Attackers 598 9.2 OPERATING SYSTEMS SECURITY 599 9.2.1 Can We Build Secure Systems? 600 9.2.2 Trusted Computing Base 601 9.3 CONTROLLING ACCESS TO RESOURCES 602 9.3.1 Protection Domains 602 9.3.2 Access Control Lists 605 9.3.3 Capabilities 608 9.4 FORMAL MODELS OF SECURE SYSTEMS 611 9.4.1 Multilevel Security read more..

  • Page - 19

    xviii CONTENTS 9.9 MALWARE 660 9.9.1 Trojan Horses 662 9.9.2 Viruses 664 9.9.3 Worms 674 9.9.4 Spyware 676 9.9.5 Rootkits 680 9.10 DEFENSES 684 9.10.1 Firewalls 685 9.10.2 Antivirus and Anti-Antivirus Techniques 687 9.10.3 Code Signing 693 9.10.4 Jailing 694 9.10.5 Model-Based Intrusion Detection 695 9.10.6 Encapsulating Mobile Code 697 9.10.7 Java Security 701 9.11 RESEARCH ON SECURITY 703 read more..

  • Page - 20

    CONTENTS xix 10.3.3 Implementation of Processes and Threads in Linux 739 10.3.4 Scheduling in Linux 746 10.3.5 Booting Linux 751 10.4 MEMORY MANAGEMENT IN LINUX 753 10.4.1 Fundamental Concepts 753 10.4.2 Memory Management System Calls in Linux 756 10.4.3 Implementation of Memory Management in Linux 758 10.4.4 Paging in Linux 764 10.5 INPUT/OUTPUT IN LINUX 767 10.5.1 Fundamental Concepts read more..

  • Page - 21

    xx CONTENTS 11 CASE STUDY 2: WINDOWS 8 857 11.1 HISTORY OF WINDOWS THROUGH WINDOWS 8.1 857 11.1.1 1980s: MS-DOS 857 11.1.2 1990s: MS-DOS-based Windows 859 11.1.3 2000s: NT-based Windows 859 11.1.4 Windows Vista 862 11.1.5 2010s: Modern Windows 863 11.2 PROGRAMMING WINDOWS 864 11.2.1 The Native NT Application Programming Interface 867 11.2.2 The Win32 Application Programming Interface 871 read more..

  • Page - 22

    CONTENTS xxi 11.10 SECURITY IN WINDOWS 8 966 11.10.1 Fundamental Concepts 967 11.10.2 Security API Calls 969 11.10.3 Implementation of Security 970 11.10.4 Security Mitigations 972 11.11 SUMMARY 975 12 OPERATING SYSTEM DESIGN 981 12.1 THE NATURE OF THE DESIGN PROBLEM 982 12.1.1 Goals 982 12.1.2 Why Is It Hard to Design an Operating System? 983 12.2 INTERFACE DESIGN 985 12.2.1 Guiding read more..

  • Page - 23

    xxii CONTENTS 12.5 PROJECT MANAGEMENT 1018 12.5.1 The Mythical Man Month 1018 12.5.2 Team Structure 1019 12.5.3 The Role of Experience 1021 12.5.4 No Silver Bullet 1021 12.6 TRENDS IN OPERATING SYSTEM DESIGN 1022 12.6.1 Virtualization and the Cloud 1023 12.6.2 Manycore Chips 1023 12.6.3 Large-Address-Space Operating Systems 1024 12.6.4 Seamless Data Access 1025 12.6.5 Battery-Powered Computers read more..

  • Page - 24

    PREFACE The fourth edition of this book differs from the third edition in numerous ways. There are large numbers of small changes everywhere to bring the material up to date as operating systems are not standing still. The chapter on Multimedia Operating Systems has been moved to the Web, primarily to make room for new material and keep the book from growing to a read more..

  • Page - 25

    xxiv PREFACE • Chapter 5 has seen a lot of changes. Older devices like CRTs and CD-ROMs have been removed, while new technology, such as touch screens, has been added. • Chapter 6 is pretty much unchanged. The topic of deadlocks is fairly stable, with few new results. • Chapter 7 is completely new. It covers the important topics of virtualization and the cloud. read more..

  • Page - 26

    PREFACE xxv sheets, software tools for studying operating systems, lab experiments for students, simulators, and more material for use in operating systems courses. Instructors using this book in a course should definitely take a look. The Companion Website for this book is also located at www.pearsonhighered.com/tanenbaum. The specific site for this book is password protected. To use read more..

  • Page - 27

    xxvi PREFACE We were also fortunate to have several reviewers who read the manuscript and also suggested new end-of-chapter problems. These were Trudy Levine, Shivakant Mishra, Krishna Sivalingam, and Ken Wong. Steve Armstrong did the PowerPoint sheets for instructors teaching a course using the book. Normally copyeditors and proofreaders don’t get acknowledgements, but Bob Lentz read more..

  • Page - 28

    ABOUT THE AUTHORS Andrew S. Tanenbaum has an S.B. degree from M.I.T. and a Ph.D. from the University of California at Berkeley. He is currently a Professor of Computer Science at the Vrije Universiteit in Amsterdam, The Netherlands. He was formerly Dean of the Advanced School for Computing and Imaging, an interuniversity graduate school doing research on advanced parallel, read more..

  • Page - 29

    This page intentionally left blank read more..

  • Page - 30

    MODERN OPERATING SYSTEMS read more..

  • Page - 31

    This page intentionally left blank read more..

  • Page - 32

    1 INTRODUCTION A modern computer consists of one or more processors, some main memory, disks, printers, a keyboard, a mouse, a display, network interfaces, and various other input/output devices. All in all, a complex system. If every application programmer had to understand how all these things work in detail, no code would ever get written. Furthermore, managing all these read more..

  • Page - 33

    2 INTRODUCTION CHAP. 1 complete access to all the hardware and can execute any instruction the machine is capable of executing. The rest of the software runs in user mode, in which only a subset of the machine instructions is available. In particular, those instructions that affect control of the machine or do I/O (Input/Output) are forbidden to user-mode programs. We will come read more..

  • Page - 34

    SEC. 1.1 WHAT IS AN OPERATING SYSTEM? 3 system (such as the file system) run in user space. In such systems, it is difficult to draw a clear boundary. Everything running in kernel mode is clearly part of the operating system, but some programs running outside it are arguably also part of it, or at least closely associated with it. Operating systems differ from user (i.e., read more..

  • Page - 35

    4 INTRODUCTION CHAP. 1 providing application programmers (and application programs, naturally) a clean abstract set of resources instead of the messy hardware ones and managing these hardware resources. Depending on who is doing the talking, you might hear mostly about one function or the other. Let us now look at both. 1.1.1 The Operating System as an Extended Machine The read more..

  • Page - 36

    SEC. 1.1 WHAT IS AN OPERATING SYSTEM? 5 [Figure 1-2. Operating systems turn ugly hardware into beautiful abstractions. Labels: application programs, beautiful interface, operating system, ugly interface, hardware.] It should be noted that the operating system’s real customers are the application programs (via the application programmers, of course). They are the ones who deal directly with the operating read more..

  • Page - 37

    6 INTRODUCTION CHAP. 1 few lines of printout might be from program 1, the next few from program 2, then some from program 3, and so forth. The result would be utter chaos. The operating system can bring order to the potential chaos by buffering all the output destined for the printer on the disk. When one program is finished, the operating system can then copy its read more..

  • Page - 38

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 7 run, we will look at successive generations of computers to see what their operating systems were like. This mapping of operating system generations to computer generations is crude, but it does provide some structure where there would otherwise be none. The progression given below is largely chronological, but it has been a bumpy read more..

  • Page - 39

    8 INTRODUCTION CHAP. 1 straightforward mathematical and numerical calculations, such as grinding out tables of sines, cosines, and logarithms, or computing artillery trajectories. By the early 1950s, the routine had improved somewhat with the introduction of punched cards. It was now possible to write programs on cards and read them in instead of using plugboards; otherwise, the procedure read more..

  • Page - 40

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 9 [Figure 1-3. An early batch system. (a) Programmers bring cards to 1401. (b) 1401 reads batch of jobs onto tape. (c) Operator carries input tape to 7094. (d) 7094 does computing. (e) Operator carries output tape to 1401. (f) 1401 prints output. Labels: card reader, tape drive, input tape, system tape, output tape, printer.] read more..

  • Page - 41

    10 INTRODUCTION CHAP. 1 [Figure 1-4. Structure of a typical FMS job. Control cards: $JOB, 10,7710802, MARVIN TANENBAUM; $FORTRAN; $LOAD; $RUN; $END. Labels: FORTRAN program, data for program.] character-oriented, commercial computers, such as the 1401, which were widely used for tape sorting and printing by banks and insurance companies. Developing and maintaining two completely different product lines was an expensive read more..

  • Page - 42

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 11 was an immediate success, and the idea of a family of compatible computers was soon adopted by all the other major manufacturers. The descendants of these machines are still in use at computer centers today. Nowadays they are often used for managing huge databases (e.g., for airline reservation systems) or as servers for World Wide read more..

  • Page - 43

    12 INTRODUCTION CHAP. 1 [Figure 1-5. A multiprogramming system with three jobs in memory. Labels: Job 3, Job 2, Job 1, operating system, memory partitions.] Another major feature present in third-generation operating systems was the ability to read jobs from cards onto the disk as soon as they were brought to the computer room. Then, whenever a running job finished, the operating system could load read more..

  • Page - 44

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 13 of simultaneous timesharing users. Their model was the electricity system—when you need electric power, you just stick a plug in the wall, and within reason, as much power as you need will be there. The designers of this system, known as MULTICS (MULTiplexed Information and Computing Service), envisioned one huge machine providing computing read more..

  • Page - 45

    14 INTRODUCTION CHAP. 1 and Saltzer, 1974). It also has an active Website, located at www.multicians.org, with much information about the system, its designers, and its users. Another major development during the third generation was the phenomenal growth of minicomputers, starting with the DEC PDP-1 in 1961. The PDP-1 had only 4K of 18-bit words, but at $120,000 per machine (less read more..

  • Page - 46

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 15 1.2.4 The Fourth Generation (1980–Present): Personal Computers With the development of LSI (Large Scale Integration) circuits (chips containing thousands of transistors on a square centimeter of silicon), the age of the personal computer dawned. In terms of architecture, personal computers (initially called microcomputers) were not all that different read more..

  • Page - 47

    16 INTRODUCTION CHAP. 1 attempt to sell CP/M to end users one at a time (at least initially). After all this transpired, Kildall died suddenly and unexpectedly from causes that have not been fully disclosed. By the time the successor to the IBM PC, the IBM PC/AT, came out in 1984 with the Intel 80286 CPU, MS-DOS was firmly entrenched and CP/M was on its last legs. read more..

  • Page - 48

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 17 complete rewrite from scratch internally. It was a full 32-bit system. The lead designer for Windows NT was David Cutler, who was also one of the designers of the VAX VMS operating system, so some ideas from VMS are present in NT. In fact, so many ideas from VMS were present in it that the owner of VMS, DEC, sued Microsoft. read more..

  • Page - 49

    18 INTRODUCTION CHAP. 1 x86-based computers, Linux is becoming a popular alternative to Windows for students and increasingly many corporate users. As an aside, throughout this book we will use the term x86 to refer to all modern processors based on the family of instruction-set architectures that started with the 8086 in the 1970s. There are many such processors, read more..

  • Page - 50

    SEC. 1.2 HISTORY OF OPERATING SYSTEMS 19 differ in certain critical ways. Distributed systems, for example, often allow applications to run on several processors at the same time, thus requiring more complex processor scheduling algorithms in order to optimize the amount of parallelism. Communication delays within the network often mean that these (and other) algorithms must run with read more..

  • Page - 51

    20 INTRODUCTION CHAP. 1 of the town (although not nearly as dominant as Symbian had been), but it did not take very long for Android, a Linux-based operating system released by Google in 2008, to overtake all its rivals. For phone manufacturers, Android had the advantage that it was open source and available under a permissive license. As a result, they could tinker with read more..

  • Page - 52

    SEC. 1.3 COMPUTER HARDWARE REVIEW 21 1.3.1 Processors The ‘‘brain’’ of the computer is the CPU. It fetches instructions from memory and executes them. The basic cycle of every CPU is to fetch the first instruction from memory, decode it to determine its type and operands, execute it, and then fetch, decode, and execute subsequent instructions. The cycle is repeated until read more..
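
The fetch, decode, execute cycle described above can be sketched as a small interpreter loop. This is only an illustration: the three-instruction machine, its opcode names (`OP_LOAD`, `OP_ADD`, `OP_HALT`), and the function `run_toy_cpu` are all invented here and do not come from the book or from any real CPU.

```c
#include <stdint.h>

/* Hypothetical three-instruction machine, invented for illustration only. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

/* Each instruction occupies two words: an opcode followed by one operand.
 * Returns the accumulator value when the program halts. */
int32_t run_toy_cpu(const int32_t *memory)
{
    int32_t pc = 0;   /* program counter: index of the next instruction */
    int32_t acc = 0;  /* accumulator register */

    for (;;) {
        int32_t opcode  = memory[pc];      /* fetch the instruction */
        int32_t operand = memory[pc + 1];
        pc += 2;                           /* advance to the following one */

        switch (opcode) {                  /* decode */
        case OP_LOAD:                      /* execute: acc = operand */
            acc = operand;
            break;
        case OP_ADD:                       /* execute: acc += operand */
            acc += operand;
            break;
        case OP_HALT:
        default:                           /* stop on HALT or unknown opcode */
            return acc;
        }
    }
}
```

Running it on the program `{ OP_LOAD, 5, OP_ADD, 7, OP_HALT, 0 }` loads 5, adds 7, and halts with 12 in the accumulator. A real CPU does the same three steps in hardware, usually overlapped in a pipeline.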

  • Page - 53

    22 INTRODUCTION CHAP. 1 Pipelines cause compiler writers and operating system writers great headaches because they expose the complexities of the underlying machine to them and they have to deal with them. [Figure labels: fetch unit, decode unit, holding buffer, execute unit.] Figure 1-7. (a) A three-stage pipeline. (b) A read more..

  • Page - 54

    SEC. 1.3 COMPUTER HARDWARE REVIEW 23 of procedure call that has the additional property of switching from user mode to kernel mode. As a note on typography, we will use the lower-case Helvetica font to indicate system calls in running text, like this: read. It is worth noting that computers have traps other than the instruction for executing a system call. Most of the read more..

  • Page - 55

    24 INTRODUCTION CHAP. 1 time, it may inadvertently schedule two threads on the same CPU, with the other CPU completely idle. This choice is far less efficient than using one thread on each CPU. Beyond multithreading, many CPU chips now have four, eight, or more complete processors or cores on them. The multicore chips of Fig. 1-8 effectively carry four minichips on them, read more..

  • Page - 56

    SEC. 1.3 COMPUTER HARDWARE REVIEW 25 [Figure 1-9. A typical memory hierarchy. The numbers are very rough approximations. Typical access time and typical capacity: registers, 1 nsec, <1 KB; cache, 2 nsec, 4 MB; main memory, 10 nsec, 1-8 GB; magnetic disk, 10 msec, 1-4 TB.] typically 32 × 32 bits on a 32-bit CPU and 64 × 64 bits on a 64-bit CPU. Less read more..

  • Page - 57

    26 INTRODUCTION CHAP. 1 Not every question is relevant to every caching situation. For caching lines of main memory in the CPU cache, a new item will generally be entered on every cache miss. The cache line to use is generally computed by using some of the high-order bits of the memory address referenced. For example, with 4096 cache lines of 64 bytes and 32 bit read more..
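
One way to picture how a cache line is selected from an address is the following sketch. It takes the 4096 lines of 64 bytes and the 32-bit addresses from the example above, but the exact bit split is an assumption, because the excerpt is cut off before it gives one. The layout assumed here is a plain direct-mapped cache: the low 6 bits select the byte within the line, the next 12 bits select the line, and the remaining bits form a tag. The function names are hypothetical.

```c
#include <stdint.h>

#define LINE_SIZE   64u    /* bytes per cache line (from the text) */
#define NUM_LINES   4096u  /* number of cache lines (from the text) */
#define OFFSET_BITS 6u     /* log2(64): assumed byte-within-line field */
#define INDEX_BITS  12u    /* log2(4096): assumed line-index field */

/* Byte position inside the cache line: the low 6 address bits. */
uint32_t cache_byte_offset(uint32_t addr)
{
    return addr & (LINE_SIZE - 1u);
}

/* Which of the 4096 lines the address maps to: the next 12 bits. */
uint32_t cache_line_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & (NUM_LINES - 1u);
}

/* Remaining upper bits, stored with the line to tell apart the many
 * addresses that share one index. */
uint32_t cache_tag(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

Under these assumptions, address 0x12345678 splits into tag 0x48D, line 0x159, and byte offset 0x38. Real caches are often set-associative, which changes how many index bits are used.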

  • Page - 58

    SEC. 1.3 COMPUTER HARDWARE REVIEW 27 Flash memory is also commonly used as the storage medium in portable electronic devices. It serves as film in digital cameras and as the disk in portable music players, to name just two uses. Flash memory is intermediate in speed between RAM and disk. Also, unlike disk memory, if it is erased too many times, it wears out. Yet another read more..

  • Page - 59

    28 INTRODUCTION CHAP. 1 Information is written onto the disk in a series of concentric circles. At any given arm position, each of the heads can read an annular region called a track. Together, all the tracks for a given arm position form a cylinder. Each track is divided into some number of sectors, typically 512 bytes per sector. On modern disks, the outer read more..

  • Page - 60

    SEC. 1.3 COMPUTER HARDWARE REVIEW 29 read sector 11,206 from disk 2. The controller then has to convert this linear sector number to a cylinder, sector, and head. This conversion may be complicated by the fact that outer cylinders have more sectors than inner ones and that some bad sectors have been remapped onto other ones. Then the controller has to determine which read more..
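
The conversion from a linear sector number to a cylinder, head, and sector can be sketched for the simplest possible case. The code below assumes an idealized disk where every track holds the same number of sectors and nothing has been remapped, which is exactly what the text says real disks do not guarantee. The geometry used in the example (16 heads, 63 sectors per track) and all names are hypothetical.

```c
#include <stdint.h>

/* Position on an idealized disk. All three fields are 0-based here. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;
};

/* Convert a linear sector number to cylinder/head/sector, assuming
 * every track holds sectors_per_track sectors and no bad-sector
 * remapping. Real controllers must handle zoning and remapping. */
struct chs linear_to_chs(uint32_t linear, uint32_t heads,
                         uint32_t sectors_per_track)
{
    struct chs pos;

    pos.cylinder = linear / (heads * sectors_per_track);
    pos.head     = (linear / sectors_per_track) % heads;
    pos.sector   = linear % sectors_per_track;
    return pos;
}
```

With the assumed geometry, linear sector 11,206 lands on cylinder 11, head 1, sector 55, since 11 × 1008 + 1 × 63 + 55 = 11,206.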

  • Page - 61

    30 INTRODUCTION CHAP. 1 drivers while running and install them on the fly without the need to reboot. This way used to be rare but is becoming much more common now. Hot-pluggable devices, such as USB and IEEE 1394 devices (discussed below), always need dynamically loaded drivers. Every controller has a small number of registers that are used to communicate with it. For read more..

  • Page - 62

    SEC. 1.3 COMPUTER HARDWARE REVIEW 31 puts the number of the device on the bus so the CPU can read it and know which device has just finished (many devices may be running at the same time). [Figure labels: CPU, interrupt controller, disk controller, disk drive; current instruction, next instruction, interrupt handler; 1. Interrupt, 2. Dispatch to handler, 3. Return.] Figure 1-11. (a) The read more..

  • Page - 63

    32 INTRODUCTION CHAP. 1 1.3.5 Buses The organization of Fig. 1-6 was used on minicomputers for years and also on the original IBM PC. However, as processors and memories got faster, the ability of a single bus (and certainly the IBM PC bus) to handle all the traffic was strained to the breaking point. Something had to give. As a result, additional buses were added, both read more..

  • Page - 64

    SEC. 1.3 COMPUTER HARDWARE REVIEW 33 a message through a single connection, known as a lane, much like a network packet. This is much simpler, because you do not have to ensure that all 32 bits arrive at the destination at exactly the same time. Parallelism is still used, because you can have multiple lanes in parallel. For instance, we may use 32 lanes to carry 32 read more..

  • Page - 65

    34 INTRODUCTION CHAP. 1 I/O addresses 0x60 to 0x64, the floppy disk controller was interrupt 6 and used I/O addresses 0x3F0 to 0x3F7, and the printer was interrupt 7 and used I/O addresses 0x378 to 0x37A, and so on. So far, so good. The trouble came in when the user bought a sound card and a modem card and both happened to use, say, interrupt 4. They would conflict read more..

  • Page - 66

    SEC. 1.3 COMPUTER HARDWARE REVIEW 35 operating system loads them into the kernel. Then it initializes its tables, creates whatever background processes are needed, and starts up a login program or GUI. 1.4 THE OPERATING SYSTEM ZOO Operating systems have been around now for over half a century. During this time, quite a variety of them have been developed, not all of them read more..

  • Page - 67

    36 INTRODUCTION CHAP. 1 service. Internet providers run many server machines to support their customers and Websites use servers to store the Web pages and handle the incoming requests. Typical server operating systems are Solaris, FreeBSD, Linux and Windows Server 201x. 1.4.3 Multiprocessor Operating Systems An increasingly common way to get major-league computing power is to connect read more..

  • Page - 68

    SEC. 1.4 THE OPERATING SYSTEM ZOO 37 1.4.6 Embedded Operating Systems Embedded systems run on the computers that control devices that are not generally thought of as computers and which do not accept user-installed software. Typical examples are microwave ovens, TV sets, cars, DVD recorders, traditional phones, and MP3 players. The main property which distinguishes embedded systems read more..

  • Page - 69

    38 INTRODUCTION CHAP. 1 occur at a certain moment (or within a certain range), we have a hard real-time system. Many of these are found in industrial process control, avionics, military, and similar application areas. These systems must provide absolute guarantees that a certain action will occur by a certain time. A soft real-time system is one where missing an occasional read more..

  • Page - 70

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 39 an introduction. We will come back to each of them in great detail later in this book. To illustrate these concepts we will, from time to time, use examples, generally drawn from UNIX. Similar examples typically exist in other systems as well, however, and we will study some of them later. 1.5.1 Processes A key concept in all read more..

  • Page - 71

    40 INTRODUCTION CHAP. 1 typed a command requesting that a program be compiled. The shell must now create a new process that will run the compiler. When that process has finished the compilation, it executes a system call to terminate itself. If a process can create one or more other processes (referred to as child processes) and these processes in turn can create child read more..

  • Page - 72

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 41 One UID, called the superuser (in UNIX), or Administrator (in Windows), has special power and may override many of the protection rules. In large installations, only the system administrator knows the password needed to become superuser, but many of the ordinary users (especially students) devote considerable effort seeking flaws in the system read more..

  • Page - 73

    42 INTRODUCTION CHAP. 1 nice, clean abstract model of device-independent files. System calls are obviously needed to create files, remove files, read files, and write files. Before a file can be read, it must be located on the disk and opened, and after being read it should be closed, so calls are provided to do these things. To provide a place to keep files, most PC read more..

  • Page - 74

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 43 access a child process, but mechanisms nearly always exist to allow files and directories to be read by a wider group than just the owner. Every file within the directory hierarchy can be specified by giving its path name from the top of the directory hierarchy, the root directory. Such absolute path names consist of the list of read more..

  • Page - 75

    44 INTRODUCTION CHAP. 1 [Figure 1-15. (a) Before mounting, the files on the CD-ROM are not accessible. (b) After mounting, they are part of the file hierarchy. Labels: root, CD-ROM; nodes a, b, c, d, x, y.] Another important concept in UNIX is the special file. Special files are provided in order to make I/O devices look like files. That way, they can be read and written using the read more..

  • Page - 76

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 45 1.5.4 Input/Output All computers have physical devices for acquiring input and producing output. After all, what good would a computer be if the users could not tell it what to do and could not get the results after it did the work requested? Many kinds of input and output devices exist, including keyboards, monitors, printers, and so on. read more..

  • Page - 77

    46 INTRODUCTION CHAP. 1 between a user sitting at his terminal and the operating system, unless the user is using a graphical user interface. Many shells exist, including sh, csh, ksh, and bash. All of them support the functionality described below, which derives from the original shell (sh). When any user logs in, a shell is started up. The shell has the terminal as read more..

  • Page - 78

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 47 1.5.7 Ontogeny Recapitulates Phylogeny After Charles Darwin’s book On the Origin of Species was published, the German zoologist Ernst Haeckel stated that ‘‘ontogeny recapitulates phylogeny.’’ By this he meant that the development of an embryo (ontogeny) repeats (i.e., recapitulates) the evolution of the species (phylogeny). In other words, read more..

  • Page - 79

    48 INTRODUCTION CHAP. 1 is not always crucial because network delays are so great that they tend to dominate. Thus the pendulum has already swung several cycles between direct execution and interpretation and may yet swing again in the future. Large Memories Let us now examine some historical developments in hardware and how they have affected software repeatedly. The first read more..

  • Page - 80

    SEC. 1.5 OPERATING SYSTEM CONCEPTS 49 hardware was added and multiprogramming became possible. To this day, many embedded systems have no protection hardware and run just a single program. Now let us look at operating systems. The first mainframes initially had no protection hardware and no support for multiprogramming, so they ran simple operating systems that handled one read more..

  • Page - 81

    50 INTRODUCTION CHAP. 1 40 cm in diameter and 5 cm high. But it, too, had a single-level directory initially. When microcomputers came out, CP/M was initially the dominant operating system, and it, too, supported just one directory on the (floppy) disk. Virtual Memory Virtual memory (discussed in Chap. 3) gives the ability to run programs larger than the machine’s physical read more..

  • Page - 82

    SEC. 1.6 SYSTEM CALLS 51 mechanics of issuing a system call are highly machine dependent and often must be expressed in assembly code, a procedure library is provided to make it possible to make system calls from C programs and often from other languages as well. It is useful to keep the following in mind. Any single-CPU computer can execute only one instruction at a read more..

  • Page - 83

    52 INTRODUCTION CHAP. 1 [Figure 1-17. The 11 steps in making the system call read(fd, buffer, nbytes). Labels: user space, kernel space (operating system), address 0xFFFFFFFF, address 0; user program calling read, library procedure read, dispatch, sys call handler; push nbytes, push &buffer, push fd, call read, put code for read in register, trap to the kernel, return to caller, increment SP.] read more..

  • Page - 84

    does, the compiled code increments the stack pointer exactly enough to remove the parameters pushed before the call to read. The program is now free to do whatever it wants to do next.

    In step 9 above, we said “may be returned to the user-space library procedure” for good reason. The system call may block the caller, preventing it from continu-...

  • Page - 85

    Process management

    Call                                    Description
    pid = fork( )                           Create a child process identical to the parent
    pid = waitpid(pid, &statloc, options)   Wait for a child to terminate
    s = execve(name, argv, environp)        Replace a process’ core image
    exit(status)                            Terminate process execution and return status

    File management

    Call                                    Description
    fd = open(file, how, ...)               Open a ...

  • Page - 86

    the parent executes a waitpid system call, which just waits until the child terminates (any child if more than one exists). Waitpid can wait for a specific child, or for any old child by setting the first parameter to −1. When waitpid completes, the address pointed to by the second parameter, statloc, will be set to the child process’ exit status ...
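
The fork and waitpid behavior described above can be sketched in C as follows. This is a minimal illustration for a POSIX system, not the book's own code; the helper name run_child_and_wait and the choice to have the child exit immediately are assumptions made for the example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait
 * for it and return the exit status the parent observes.
 * Returns -1 if fork or waitpid fails or the child did not exit normally. */
int run_child_and_wait(int code)
{
    int statloc;
    pid_t pid;

    assert(code >= 0 && code <= 255);   /* exit statuses are 8 bits wide */

    pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0)
        _exit(code);                    /* child: terminate immediately */

    /* Parent: wait for this specific child; passing -1 instead of pid
     * would wait for any child. */
    if (waitpid(pid, &statloc, 0) < 0)
        return -1;
    return WIFEXITED(statloc) ? WEXITSTATUS(statloc) : -1;
}
```

Calling run_child_and_wait(7) from a parent process should return 7, the status the child passed to _exit, as decoded from statloc by the WEXITSTATUS macro.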

  • Page - 87

    The main program of cp (and main program of most other C programs) contains the declaration

    main(argc, argv, envp)

    where argc is a count of the number of items on the command line, including the program name. For the example above, argc is 3.

    The second parameter, argv, is a pointer to an array. Element i of that array is a pointer to the ...
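
The relationship between argc and argv can be illustrated with a short sketch. The helper names count_args and print_args are hypothetical, and the command line cp file1 file2 is assumed here only because it yields the argc of 3 mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the entries of an argv-style array.
 * Relies on the C guarantee that argv[argc] is a null pointer. */
int count_args(char *const argv[])
{
    int n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}

/* Print each argument the way a program might inspect its command line. */
void print_args(int argc, char *const argv[])
{
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
}
```

For an array holding "cp", "file1", "file2", and a terminating null pointer, count_args returns 3, which matches the value of argc that main would receive.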

  • Page - 88

    [Figure 1-20: address space from 0000 to FFFF (hex), with regions labeled Stack, Gap, Data, and Text.]

    Figure 1-20. Processes have three segments: text, data, and stack.

    The file descriptor returned can then be used for reading or writing. Afterward, the file can be closed by close, which makes the file descriptor available for reuse on a subsequent open.

    The most heavily used calls are undoubtedly read and wr...

  • Page - 89

    a shared file means that changes that any member of the team makes are instantly visible to the other members; there is only one file. When copies are made of a file, subsequent changes made to one copy do not affect the others.

    To see how link works, consider the situation of Fig. 1-21(a). Here are two users, ast and jim, each having his own ...

  • Page - 90

    By executing the mount system call, the USB file system can be attached to the root file system, as shown in Fig. 1-22. A typical statement in C to mount is

    mount("/dev/sdb0", "/mnt", 0);

    where the first parameter is the name of a block special file for USB drive 0, the second parameter is the place in the tree where it is to ...

  • Page - 91

    run. If the process is not prepared to handle a signal, then its arrival kills the process (hence the name of the call).

    POSIX defines a number of procedures for dealing with time. For example, time just returns the current time in seconds, with 0 corresponding to Jan. 1, 1970 at midnight (just as the day was starting, not ending). On computers ...
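
The meaning of the value returned by time can be checked with a small sketch. The helper names utc_year_of and seconds_since_epoch are hypothetical; the code uses only the standard time and gmtime functions.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: return the calendar year (UTC) that corresponds
 * to `seconds` seconds after the start of Jan. 1, 1970. */
int utc_year_of(time_t seconds)
{
    struct tm *utc = gmtime(&seconds);

    assert(utc != NULL);
    return utc->tm_year + 1900;    /* tm_year counts years since 1900 */
}

/* Current time in seconds since the epoch, as returned by time(). */
long long seconds_since_epoch(void)
{
    return (long long)time(NULL);
}
```

Passing 0 to utc_year_of yields 1970, which is the zero point described above.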

  • Page - 92

    The number of Win32 API calls is extremely large, numbering in the thousands. Furthermore, while many of them do invoke system calls, a substantial number are carried out entirely in user space. As a consequence, with Windows it is impossible to see what is a system call (i.e., performed by the kernel) and what is simply a user-space library ...

  • Page - 93

    UNIX      Win32                 Description
    fork      CreateProcess         Create a new process
    waitpid   WaitForSingleObject   Can wait for a process to exit
    execve    (none)                CreateProcess = fork + execve
    exit      ExitProcess           Terminate execution
    open      CreateFile            Create a file or open an existing file
    close     CloseHandle           Close a file
    read      ReadFile              Read data from a file
    write     WriteFile             Write data to a ...

  • Page - 94

    SEC. 1.7 OPERATING SYSTEM STRUCTURE 63

    1.7.1 Monolithic Systems

    In the monolithic approach, by far the most common organization, the entire operating system runs as a single program in kernel mode. The operating system is written as a collection of procedures, linked together into a single large executable binary program. When this technique is used, each procedure in the system is ...

  • Page - 95

    [Figure 1-24: diagram with three levels labeled Main procedure, Service procedures, and Utility procedures.]

    Figure 1-24. A simple structuring model for a monolithic system.

    1.7.2 Layered Systems

    A generalization of the approach of Fig. 1-24 is to organize the operating system as a hierarchy of layers, each one constructed upon the one below it. The first system constructed in this way was the THE system built ...

  • Page - 96

    took care of making sure pages were brought into memory at the moment they were needed and removed when they were not needed.

    Layer 2 handled communication between each process and the operator console (that is, the user). On top of this layer each process effectively had its own operator console. Layer 3 took care of managing the ...

  • Page - 97

    course, since some bugs may be things like issuing an incorrect error message in a situation that rarely occurs. Nevertheless, operating systems are sufficiently buggy that computer manufacturers put reset buttons on them (often on the front panel), something the manufacturers of TV sets, stereos, and cars do not do, despite the large amount of software in ...

  • Page - 98

    [Figure 1-26: layered diagram. In user mode: user programs (Shell, Make, Other, ...), servers (FS, Proc., Reinc., Other, ...), and drivers (Disk, TTY, Netw, Print, Other, ...), each running as a process. Below them: the microkernel, which handles interrupts, processes, scheduling, and interprocess communication, with components labeled Sys and Clock.]

    Figure 1-26. Simplified structure of the MINIX system.

    the kernel to do the write. This approach means that the kernel can check to see that the ...

  • Page - 99

    highest-priority process that is runnable. The mechanism, in the kernel, is to look for the highest-priority process and run it. The policy, assigning priorities to processes, can be done by user-mode processes. In this way, policy and mechanism can be decoupled and the kernel can be made smaller.

    1.7.4 Client-Server Model

    A slight variation of the microkernel ...

  • Page - 100

    1.7.5 Virtual Machines

    The initial releases of OS/360 were strictly batch systems. Nevertheless, many 360 users wanted to be able to work interactively at a terminal, so various groups, both inside and outside IBM, decided to write timesharing systems for it. The official IBM timesharing system, TSS/360, was delivered late, and when it finally ...

  • Page - 101

    transaction-processing operating systems, while others ran a single-user, interactive system called CMS (Conversational Monitor System) for interactive timesharing users. The latter was popular with programmers.

    When a CMS program executed a system call, the call was trapped to the operating system in its own virtual machine, not to VM/370, just as it would ...

  • Page - 102

    “virtual machine monitor” requires more keystrokes than people are prepared to put up with now. Note that many authors use the terms interchangeably though.

    [Figure with panels (a), (b), and (c). Labels: Type 1 hypervisor; Host operating system; Linux, Windows, ...; Excel, Word, Mplayer, Apollon; Machine simulator; Guest OS; Guest OS process; Host OS process; Type 2 ...]

  • Page - 103

    The next step in improving performance was to add a kernel module to do some of the heavy lifting, as shown in Fig. 1-29(c). In practice now, all commercially available hypervisors, such as VMware Workstation, use this hybrid strategy (and have many other improvements as well). They are called type 2 hypervisors by everyone, so we will (somewhat ...

  • Page - 104

    1.7.6 Exokernels

    Rather than cloning the actual machine, as is done with virtual machines, another strategy is partitioning it, in other words, giving each user a subset of the resources. Thus one virtual machine might get disk blocks 0 to 1023, the next one might get blocks 1024 to 2047, and so on.

    At the bottom layer, running in ...

  • Page - 105

    One feature C has that Java and Python do not is explicit pointers. A pointer is a variable that points to (i.e., contains the address of) a variable or data structure. Consider the statements

    char c1, c2, *p;
    c1 = 'c';
    p = &c1;
    c2 = *p;

    which declare c1 and c2 to be character variables and p to be a variable that points to (i.e., ...
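
The four statements above can be wrapped in a function to make their effect observable. The function name copy_through_pointer is hypothetical and exists only so that the final value of c2 can be returned and checked.

```c
#include <assert.h>

/* The statements from the text, wrapped in a hypothetical function
 * so that the result can be inspected. */
char copy_through_pointer(void)
{
    char c1, c2, *p;

    c1 = 'c';      /* c1 holds the character code for the letter c */
    p = &c1;       /* p holds the address of c1 */
    c2 = *p;       /* c2 gets the contents of the variable p points to */

    assert(p == &c1);
    return c2;
}
```

The function returns 'c', because dereferencing p reads the value stored in c1.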

  • Page - 106

    SEC. 1.8 THE WORLD ACCORDING TO C 75

    i = max(j, k+1)

    and get

    i = (j > k+1 ? j : k+1)

    to store the larger of j and k+1 in i. Headers can also contain conditional compilation, for example

    #ifdef X86
    intel_int_ack();
    #endif

    which compiles into a call to the function intel_int_ack if the macro X86 is defined and nothing otherwise. Conditional compilation is heavily ...
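
The macro expansion and conditional compilation discussed above can be tried out with a short sketch. The macro below adds extra parentheses around its arguments, which is an addition made here for operator-precedence safety rather than the form quoted in the text; DEMO_FEATURE, FEATURE_ENABLED, and larger_of are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Parenthesized variant of the max macro; the extra parentheses are an
 * addition in this sketch, not part of the expansion shown in the text. */
#define max(a, b) ((a) > (b) ? (a) : (b))

/* DEMO_FEATURE is a hypothetical macro: FEATURE_ENABLED is 1 only when
 * the compiler is invoked with DEMO_FEATURE defined. */
#ifdef DEMO_FEATURE
#define FEATURE_ENABLED 1
#else
#define FEATURE_ENABLED 0
#endif

/* Store the larger of j and k+1 in i, as in the text's example. */
int larger_of(int j, int k)
{
    int i = max(j, k + 1);

    assert(i >= j && i >= k + 1);
    return i;
}
```

With j = 3 and k = 5 the function returns 6, since the preprocessor rewrites the macro call into a conditional expression before the compiler proper sees it.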

  • Page - 107

    recompile them, thus reducing the number of compilations to the bare minimum. In large projects, creating the Makefile is error prone, so there are tools that do it automatically.

    Once all the .o files are ready, they are passed to a program called the linker to combine all of them into a single executable binary file. Any library functions called ...

  • Page - 108

    and file systems. At run time the operating system may consist of multiple segments, for the text (the program code), the data, and the stack. The text segment is normally immutable, not changing during execution. The data segment starts out at a certain size and is initialized with certain values, but it can change and grow as need be. ...

  • Page - 109

    the past 5 to 10 years, just to give a flavor of what might be on the horizon. This introduction is certainly not comprehensive. It is based largely on papers that have been published in the top research conferences because these ideas have at least survived a rigorous peer review process in order to get published. Note that in computer ...

  • Page - 110

    SEC. 1.10 OUTLINE OF THE REST OF THIS BOOK 79

    some key abstractions, the most important of which are processes and threads, address spaces, and files. Accordingly the next three chapters are devoted to these critical topics.

    Chapter 2 is about processes and threads. It discusses their properties and how they communicate with one another. It also gives a number of detailed ...

  • Page - 111

    Exp.     Explicit                   Prefix   Exp.    Explicit                    Prefix
    10^-3    0.001                      milli    10^3    1,000                       Kilo
    10^-6    0.000001                   micro    10^6    1,000,000                   Mega
    10^-9    0.000000001                nano     10^9    1,000,000,000               Giga
    10^-12   0.000000000001             pico     10^12   1,000,000,000,000           Tera
    10^-15   0.000000000000001          femto    10^15   1,000,000,000,000,000       Peta
    10^-18   0.000000000000000001       atto     10^18   1,000,000,000,000,000,000   Exa
    10^-21   0.000000000000000000001    ...

  • Page - 112

    SEC. 1.12 SUMMARY 81

    The heart of any operating system is the set of system calls that it can handle. These tell what the operating system really does. For UNIX, we have looked at four groups of system calls. The first group of system calls relates to process creation and termination. The second group is for reading and writing files. The third group is for directory ...

  • Page - 113

    12. Which of the following instructions should be allowed only in kernel mode?
        (a) Disable all interrupts.
        (b) Read the time-of-day clock.
        (c) Set the time-of-day clock.
        (d) Change the memory map.

    13. Consider a system that has two CPUs, each CPU having two threads (hyperthreading). Suppose three programs, P0, P1, and P2, are started with run times of 5, ...

  • Page - 114

    CHAP. 1 PROBLEMS 83

    where the lseek call makes a seek to byte 3 of the file. What does buffer contain after the read has completed?

    24. Suppose that a 10-MB file is stored on a disk on the same track (track 50) in consecutive sectors. The disk arm is currently situated over track number 100. How long will it take to retrieve this file from the disk? Assume ...

  • Page - 115

    ruining the file system. You can also do the experiment safely in a virtual machine. Note: Do not try this on a shared system without first getting permission from the system administrator. The consequences will be instantly obvious so you are likely to be caught and sanctions may follow.

    36. Examine and try to interpret the contents of a UNIX-like ...

  • Page - 116

    2 PROCESSES AND THREADS

    We are now about to embark on a detailed study of how operating systems are designed and constructed. The most central concept in any operating system is the process: an abstraction of a running program. Everything else hinges on this concept, and the operating system designer (and student) should have a thorough understanding of what a process ...

  • Page - 117

    in. If there are multiple disks present, some or all of the newer ones may be fired off to other disks long before the first request is satisfied. Clearly some way is needed to model and control this concurrency. Processes (and especially threads) can help here.

    Now consider a user PC. When the system is booted, many processes are se-...

  • Page - 118

    SEC. 2.1 PROCESSES 87

    a long enough time interval, all the processes have made progress, but at any given instant only one process is actually running.

    [Figure 2-1: three panels. (a) One program counter switching among programs A, B, C, and D (process switch). (b) Four program counters, one per process A, B, C, and D. (c) Processes A to D plotted against time.]

    Figure 2-1. (a) Multiprogramming four programs. (b) Conceptual model of four independent, sequential processes. (c) Only one ...

  • Page - 119

    and the cake ingredients are the input data. The process is the activity consisting of our baker reading the recipe, fetching the ingredients, and baking the cake.

    Now imagine that the computer scientist’s son comes running in screaming his head off, saying that he has been stung by a bee. The computer scientist records where he was in the ...

  • Page - 120

    example, one background process may be designed to accept incoming email, sleeping most of the day but suddenly springing to life when email arrives. Another background process may be designed to accept incoming requests for Web pages hosted on that machine, waking up when a request arrives to service the request. Processes that stay in the background to ...

  • Page - 121

    program. For example, when a user types a command, say, sort, to the shell, the shell forks off a child process and the child executes sort. The reason for this two-step process is to allow the child to manipulate its file descriptors after the fork but before the execve in order to accomplish redirection of standard input, standard output, ...

  • Page - 122

    Windows. Screen-oriented programs also support voluntary termination. Word processors, Internet browsers, and similar programs always have an icon or menu item that the user can click to tell the process to remove any temporary files it has open and then terminate.

    The second reason for termination is that the process discovers a fatal error. For example, ...

  • Page - 123

    per terminal. These processes wait for someone to log in. If a login is successful, the login process executes a shell to accept commands. These commands may start up more processes, and so forth. Thus, all the processes in the whole system belong to a single tree, with init at the root.

    In contrast, Windows has no concept of a process ...

  • Page - 124

    [Figure 2-2: state diagram with the states Running, Blocked, and Ready, connected by transitions 1 to 4.]

    1. Process blocks for input
    2. Scheduler picks another process
    3. Scheduler picks this process
    4. Input becomes available

    Figure 2-2. A process can be in running, blocked, or ready state. Transitions between these states are as shown.

    Four transitions are possible among these three states, as shown. Transition 1 occurs when ...

  • Page - 125

    the interrupt handling and details of actually starting and stopping processes are hidden away in what is here called the scheduler, which is actually not much code. The rest of the operating system is nicely structured in process form. Few real systems are as nicely structured as this, however.

    [Figure: processes 0, 1, ..., n – 2, n – 1 shown above a scheduler.] ...

  • Page - 126

    Process management       Memory management                File management
    Registers                Pointer to text segment info     Root directory
    Program counter          Pointer to data segment info     Working directory
    Program status word      Pointer to stack segment info    File descriptors
    Stack pointer                                             User ID
    Process state                                             Group ID
    Priority
    Scheduling parameters
    Process ID
    Parent process
    Process group
    Signals
    Time when ...

  • Page - 127

    1. Hardware stacks program counter, etc.
    2. Hardware loads new program counter from interrupt vector.
    3. Assembly-language procedure saves registers.
    4. Assembly-language procedure sets up new stack.
    5. C interrupt service runs (typically reads and buffers input).
    6. Scheduler decides which process is to run next.
    7. C procedure returns to the assembly ...

  • Page - 128

    For the sake of accuracy, it should be pointed out that the probabilistic model just described is only an approximation. It implicitly assumes that all n processes are independent, meaning that it is quite acceptable for a system with five processes in memory to have three running and two waiting. But with a single CPU, we cannot have three ...
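
The excerpt above does not state the formula of the probabilistic model it refers to. Assuming it is the usual multiprogramming approximation, in which each of n independent processes spends a fraction p of its time waiting for I/O so that CPU utilization is 1 − p^n, the model can be evaluated with the following sketch. The function name cpu_utilization is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: CPU utilization 1 - p^n, assuming n independent
 * processes that each wait for I/O a fraction p of the time. */
double cpu_utilization(double p, int n)
{
    double all_waiting = 1.0;      /* probability that all n are waiting */

    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    for (int i = 0; i < n; i++)
        all_waiting *= p;
    return 1.0 - all_waiting;
}
```

Under this assumption, with p = 0.8 a single process keeps the CPU busy about 20% of the time, while five such processes raise the estimate to roughly 67%. The independence assumption is exactly the approximation the text cautions about.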

  • Page - 129

    We have seen this argument once before. It is precisely the argument for having processes. Instead of thinking about interrupts, timers, and context switches, we can think about parallel processes. Only now with threads we add a new element: the ability for the parallel entities to share an address space and all of its data among ...

  • Page - 130

    SEC. 2.2 THREADS 99

    Threads can help here. Suppose that the word processor is written as a two-threaded program. One thread interacts with the user and the other handles reformatting in the background. As soon as the sentence is deleted from page 1, the interactive thread tells the reformatting thread to reformat the whole book. Meanwhile, the interactive thread ...

  • Page - 131

    An analogous situation exists with many other interactive programs. For example, an electronic spreadsheet is a program that allows a user to maintain a matrix, some of whose elements are data provided by the user. Other elements are computed based on the input data using potentially complex formulas. When a user changes one element, many ...

  • Page - 132

    When the thread blocks on the disk operation, another thread is chosen to run, possibly the dispatcher, in order to acquire more work, or possibly another worker that is now ready to run.

    This model allows the server to be written as a collection of sequential threads. The dispatcher’s program consists of an infinite loop for getting a work request ...

  • Page - 133

    reply processed. With nonblocking disk I/O, a reply probably will have to take the form of a signal or interrupt.

    In this design, the “sequential process” model that we had in the first two cases is lost. The state of the computation must be explicitly saved and restored in the table every time the server switches from working on ...

  • Page - 134

    separate them; this is where threads come in. First we will look at the classical thread model; after that we will examine the Linux thread model, which blurs the line between processes and threads.

    One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, ...

  • Page - 135

    [Figure 2-11: two panels showing user space above kernel space. (a) Process 1, Process 2, and Process 3, each containing one thread, above the kernel. (b) A single process containing three threads, above the kernel.]

    Figure 2-11. (a) Three processes each with one thread. (b) One process with three threads.

    same global variables. Since every thread can access every memory address within the process’ address space, one thread can read, write, or even wipe out another ...

  • Page - 136

    of resource management, not the thread. If each thread had its own address space, open files, pending alarms, and so on, it would be a separate process. What we are trying to achieve with the thread concept is the ability for multiple threads of execution to share a set of resources so that they can work together closely to perform some task. ...

  • Page - 137

    address space of the creating thread. Sometimes threads are hierarchical, with a parent-child relationship, but often no such relationship exists, with all threads being equal. With or without a hierarchical relationship, the creating thread is usually returned a thread identifier that names the new thread.

    When a thread has finished its work, it can ...

  • Page - 138

    a few of the major ones to give an idea of how it works. The calls we will describe below are listed in Fig. 2-14.

    Thread call          Description
    Pthread_create       Create a new thread
    Pthread_exit         Terminate the calling thread
    Pthread_join         Wait for a specific thread to exit
    Pthread_yield        Release the CPU to let another thread run
    Pthread_attr_init    Create ...

  • Page - 139

    a new thread on each iteration, after announcing its intention. If the thread creation fails, it prints an error message and then exits. After creating all the threads, the main program exits.

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUMBER_OF_THREADS 10

    void *print_hello_world(void *tid)
    {
        /* This function ...
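
Because the listing above is cut off, the following is a separate, minimal sketch of thread creation with Pthreads rather than a completion of it. Unlike the program described in the text, this variant waits for every thread with pthread_join so that the result can be checked; the names SKETCH_THREADS, say_hello, and run_hello_threads are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define SKETCH_THREADS 10           /* hypothetical thread count */

static int visits[SKETCH_THREADS];  /* one slot per thread, nothing shared */

/* Thread body: announce itself and record that it ran. */
static void *say_hello(void *arg)
{
    int *slot = arg;

    printf("Hello from thread %d\n", (int)(slot - visits));
    *slot = 1;
    return NULL;
}

/* Create the threads, wait for all of them, and return how many ran.
 * Returns -1 if some thread could not be created. */
int run_hello_threads(void)
{
    pthread_t threads[SKETCH_THREADS];
    int created = 0, ran = 0;

    while (created < SKETCH_THREADS &&
           pthread_create(&threads[created], NULL, say_hello,
                          &visits[created]) == 0)
        created++;

    for (int i = 0; i < created; i++) {
        int status = pthread_join(threads[i], NULL);

        assert(status == 0);
        (void)status;
        ran += visits[i];           /* safe: the join orders this read */
    }
    return created == SKETCH_THREADS ? ran : -1;
}
```

The order of the printed greetings is not fixed, since the threads run concurrently; only the total count returned after all joins is deterministic.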

  • Page - 140

    The first method is to put the threads package entirely in user space. The kernel knows nothing about them. As far as the kernel is concerned, it is managing ordinary, single-threaded processes. The first, and most obvious, advantage is that a user-level threads package can be implemented on an operating system that does not support threads. All operating ...

  • Page - 141

    the machine happens to have an instruction to store all the registers and another one to load them all, the entire thread switch can be done in just a handful of instructions. Doing thread switching like this is at least an order of magnitude (maybe more) faster than trapping to the kernel and is a strong argument in favor of ...

  • Page - 142

    Somewhat analogous to the problem of blocking system calls is the problem of page faults. We will study these in Chap. 3. For the moment, suffice it to say that computers can be set up in such a way that not all of the program is in main memory at once. If the program calls or jumps to an instruction that is not in memory, a page fault ...

  • Page - 143

    The kernel’s thread table holds each thread’s registers, state, and other information. The information is the same as with user-level threads, but now kept in the kernel instead of in user space (inside the run-time system). This information is a subset of the information that traditional kernels maintain about their single-threaded processes, ...

  • Page - 144

    When this approach is used, the programmer can determine how many kernel threads to use and how many user-level threads to multiplex on each one. This model gives the ultimate in flexibility.

    [Figure 2-17: user space above kernel space; multiple user threads are shown on a kernel thread, and the kernel threads sit in the kernel.]

    Figure 2-17. Multiplexing user-level threads onto kernel-level threads.

    With this approach, ...

  • Page - 145

    kernel-user transition. The user-space run-time system can block the synchronizing thread and schedule a new one by itself.

    When scheduler activations are used, the kernel assigns a certain number of virtual processors to each process and lets the (user-space) run-time system allocate threads to processors. This mechanism can also be used on a ...

  • Page - 146

    call waiting for an incoming message. When a message arrives, it accepts the message, unpacks it, examines the contents, and processes it.

    However, a completely different approach is also possible, in which the arrival of a message causes the system to create a new thread to handle the message. Such a thread is called a pop-up thread and is ...

  • Page - 147

    2.2.9 Making Single-Threaded Code Multithreaded

    Many existing programs were written for single-threaded processes. Converting these to multithreading is much trickier than it may at first appear. Below we will examine just a few of the pitfalls.

    As a start, the code of a thread normally consists of multiple procedures, just like a process. These ...

  • Page - 148

    new scoping level, variables visible to all the procedures of a thread (but not to other threads), in addition to the existing scoping levels of variables visible only to one procedure and variables visible everywhere in the program.

    [Figure 2-20: regions labeled Thread 1's code, Thread 2's code, Thread 1's stack, Thread 2's stack, Thread 1's globals, and Thread 2's globals.]

    Figure 2-20. Threads can have ...

  • Page - 149

    The next problem in turning a single-threaded program into a multithreaded one is that many library procedures are not reentrant. That is, they were not designed to have a second call made to any given procedure while a previous call has not yet finished. For example, sending a message over the network may well be programmed to ...

  • Page - 150

    These problems are certainly not insurmountable, but they do show that just introducing threads into an existing system without a fairly substantial system redesign is not going to work at all. The semantics of system calls may have to be redefined and libraries rewritten, at the very least. And all of these things must be done in such a way as to ...

  • Page - 151

    wants to print a file, it enters the file name in a special spooler directory. Another process, the printer daemon, periodically checks to see if there are any files to be printed, and if there are, it prints them and then removes their names from the directory.

    Imagine that our spooler directory has a very large number of slots, ...

  • Page - 152

    SEC. 2.3 INTERPROCESS COMMUNICATION 121

    never comes. Situations like this, where two or more processes are reading or writing some shared data and the final result depends on who runs precisely when, are called race conditions. Debugging programs containing race conditions is no fun at all. The results of most test runs are fine, but once in a blue moon something weird and ...

  • Page - 153

    [Figure 2-22: timeline for Process A and Process B with times T1, T2, T3, and T4 marked. Labeled events: A enters critical region; B attempts to enter critical region; B blocked; A leaves critical region; B enters critical region; B leaves critical region.]

    Figure 2-22. Mutual exclusion using critical regions.

    2.3.3 Mutual Exclusion with Busy Waiting

    In this section we will examine various proposals for achieving mutual ...

  • Page - 154

    often a useful technique within the operating system itself but is not appropriate as a general mutual exclusion mechanism for user processes.

    The possibility of achieving mutual exclusion by disabling interrupts, even within the kernel, is becoming less every day due to the increasing number of multicore chips even in low-end PCs. Two cores ...

  • Page - 155

    (a)
    while (TRUE) {
        while (turn != 0) /* loop */ ;
        critical_region( );
        turn = 1;
        noncritical_region( );
    }

    (b)
    while (TRUE) {
        while (turn != 1) /* loop */ ;
        critical_region( );
        turn = 0;
        noncritical_region( );
    }

    Figure 2-23. A proposed solution to the critical-region problem. (a) Process 0. (b) Process 1. In both cases, be sure to ...
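
The strict alternation of Fig. 2-23 can be packaged into functions to show its main weakness: a process cannot enter twice in a row. The sketch below uses a C11 atomic for turn, which is an assumption added here so that the busy-wait loop is well defined on modern compilers; the helper may_enter is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int turn = 0;        /* whose turn it is to enter: 0 or 1 */

/* Non-blocking probe (hypothetical helper): may `process` enter now? */
bool may_enter(int process)
{
    assert(process == 0 || process == 1);
    return atomic_load(&turn) == process;
}

/* Busy-wait until it is this process' turn, as in Fig. 2-23. */
void enter_region(int process)
{
    while (!may_enter(process))
        ;                          /* spin */
}

/* Hand the turn to the other process. */
void leave_region(int process)
{
    assert(may_enter(process));
    atomic_store(&turn, 1 - process);
}
```

After process 0 has entered and left once, may_enter(0) is false until process 1 has taken its turn, even if process 1 has no interest in its critical region.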

  • Page - 156

    In 1981, G. L. Peterson discovered a much simpler way to achieve mutual exclusion, thus rendering Dekker’s solution obsolete. Peterson’s algorithm is shown in Fig. 2-24. This algorithm consists of two procedures written in ANSI C, which means that function prototypes should be supplied for all the functions defined and used. However, to ...
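
Fig. 2-24 itself is not reproduced in this excerpt. The following is a sketch of the well-known two-process form of Peterson's algorithm, written with C11 sequentially consistent atomics. The atomics are an assumption added here, because plain variables do not protect against compiler and CPU reordering on current hardware; the helper is_interested is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N 2                        /* number of processes */

static atomic_int turn;            /* which process set turn last */
static atomic_bool interested[N];  /* all values initially false */

/* process is 0 or 1; returns once the caller may enter its critical region */
void enter_region(int process)
{
    int other = 1 - process;       /* number of the other process */

    assert(process == 0 || process == 1);
    atomic_store(&interested[process], true);   /* show interest */
    atomic_store(&turn, process);               /* last to set turn waits */
    while (atomic_load(&turn) == process &&
           atomic_load(&interested[other]))
        ;                                       /* busy wait */
}

/* process is 0 or 1; indicate departure from the critical region */
void leave_region(int process)
{
    assert(process == 0 || process == 1);
    atomic_store(&interested[process], false);
}

/* Hypothetical helper for inspection: is `process` currently interested? */
bool is_interested(int process)
{
    return atomic_load(&interested[process]);
}
```

A process calls enter_region with its own number before touching shared data and leave_region afterward. If only one process is interested, enter_region returns immediately; if both are, the one that stored its number into turn last is the one that waits.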

  • Page - 157

    The TSL Instruction

    Now let us look at a proposal that requires a little help from the hardware. Some computers, especially those designed with multiple processors in mind, have an instruction like

    TSL RX,LOCK

    (Test and Set Lock) that works as follows. It reads the contents of the memory word lock into register RX and then stores a nonzero ...

  • Page - 158

    enter_region:
        TSL REGISTER,LOCK    | copy lock to register and set lock to 1
        CMP REGISTER,#0      | was lock zero?
        JNE enter_region     | if it was not zero, lock was set, so loop
        RET                  | return to caller; critical region entered

    leave_region:
        MOVE LOCK,#0         | store a 0 in lock
        RET                  | return to caller

    Figure 2-25. Entering and leaving a critical ...
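
The assembly routines of Fig. 2-25 have a close portable counterpart in C11, where atomic_flag_test_and_set plays the role of the TSL instruction. The sketch below is an analogy under that assumption, not a translation of any particular machine's code; try_enter_region and the holders debug counter are hypothetical additions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;  /* clear means unlocked */
static int holders;   /* debug only: threads inside the critical region */

/* One test-and-set attempt (hypothetical helper): true if lock acquired. */
bool try_enter_region(void)
{
    /* atomic_flag_test_and_set returns the old value and sets the flag */
    if (atomic_flag_test_and_set(&lock_flag))
        return false;              /* lock was already set */
    assert(holders == 0);
    holders = 1;
    return true;
}

/* Spin until the lock was observed clear, like the TSL/CMP/JNE loop. */
void enter_region(void)
{
    while (!try_enter_region())
        ;                          /* lock was set, so loop */
}

/* Store "unlocked" into the flag, like MOVE LOCK,#0. */
void leave_region(void)
{
    assert(holders == 1);
    holders = 0;
    atomic_flag_clear(&lock_flag);
}
```

A thread brackets its critical region with enter_region and leave_region. As with the assembly version, a thread that fails to acquire the lock keeps spinning, so this is only suitable when waits are expected to be short.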

  • Page - 159

    scheduled while H is running, L never gets the chance to leave its critical region, so H loops forever. This situation is sometimes referred to as the priority inversion problem.

    Now let us look at some interprocess communication primitives that block instead of wasting CPU time when they are not allowed to enter their critical regions. ...

  • Page - 160

    #define N 100                          /* number of slots in the buffer */
    int count = 0;                         /* number of items in the buffer */

    void producer(void)
    {
        int item;

        while (TRUE) {                     /* repeat forever */
            item = produce_item( );        /* generate next item */
            if (count == N) sleep( );      /* if buffer is full, go to sleep */
            insert_item(item);             /* put item in buffer */
            ...

  • Page - 161

    While the wakeup waiting bit saves the day in this simple example, it is easy to construct examples with three or more processes in which one wakeup waiting bit is insufficient. We could make another patch and add a second wakeup waiting bit, or maybe 8 or 32 of them, but in principle the problem is still there.

    2.3.5 Semaphores

    This was ...

  • Page - 162

    system briefly disabling all interrupts while it is testing the semaphore, updating it, and putting the process to sleep, if necessary. As all of these actions take only a few instructions, no harm is done in disabling interrupts. If multiple CPUs are being used, each semaphore should be protected by a lock variable, with the TSL or XCHG ...

  • Page - 163

    132 PROCESSES AND THREADS CHAP. 2 This solution uses three semaphores: one called full for counting the number of slots that are full, one called empty for counting the number of slots that are empty, and one called mutex to make sure the producer and consumer do not access the buffer at the same time. Full is initially 0, empty is initially equal to the number of slots read more..
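
The three-semaphore scheme can be tried out with POSIX threads. The sketch below is an illustration under stated assumptions, not the book's figure: it uses `sem_wait` and `sem_post` for down and up, a deliberately small buffer, and a fixed item count `ITEMS` (a hypothetical constant) so that the program terminates. The initial value of 1 for `mutex` is assumed here, since the excerpt is cut off before stating it.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define N 4              /* number of slots in the buffer (small for the demo) */
#define ITEMS 1000       /* hypothetical: how many items to pass through */

static int buffer[N];
static int in_pos, out_pos;
static sem_t mutex, empty, full;

static void *producer(void *arg)
{
    (void)arg;
    for (int item = 1; item <= ITEMS; item++) {
        sem_wait(&empty);              /* down(&empty): wait for a free slot */
        sem_wait(&mutex);              /* down(&mutex): enter critical region */
        buffer[in_pos] = item;
        in_pos = (in_pos + 1) % N;
        sem_post(&mutex);              /* up(&mutex): leave critical region */
        sem_post(&full);               /* up(&full): one more full slot */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    long *sum = arg;
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);               /* down(&full): wait for an item */
        sem_wait(&mutex);
        int item = buffer[out_pos];
        out_pos = (out_pos + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);              /* up(&empty): one more empty slot */
        *sum += item;                  /* "consume" the item */
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum of consumed items. */
long run_producer_consumer(void)
{
    pthread_t p, c;
    long sum = 0;
    in_pos = out_pos = 0;
    sem_init(&mutex, 0, 1);            /* assumed initial value: 1 */
    sem_init(&empty, 0, N);            /* all slots empty */
    sem_init(&full, 0, 0);             /* no slots full */
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, &sum);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&mutex);
    sem_destroy(&empty);
    sem_destroy(&full);
    return sum;
}
```

Because every item from 1 to `ITEMS` is consumed exactly once, the returned sum is `ITEMS * (ITEMS + 1) / 2` regardless of how the two threads interleave.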

  • Page - 164

    SEC. 2.3 INTERPROCESS COMMUNICATION 133 Two procedures are used with mutexes. When a thread (or process) needs access to a critical region, it calls mutex_lock. If the mutex is currently unlocked (meaning that the critical region is available), the call succeeds and the calling thread is free to enter the critical region. On the other hand, if the mutex is already read more..

  • Page - 165

    134 PROCESSES AND THREADS CHAP. 2 The mutex system that we have described above is a bare-bones set of calls. With all software, there is always a demand for more features, and synchronization primitives are no exception. For example, sometimes a thread package offers a call mutex_trylock that either acquires the lock or returns a code for failure, but does not block. This read more..

  • Page - 166

    SEC. 2.3 INTERPROCESS COMMUNICATION 135 really has to. Since switching to the kernel and back is quite expensive, doing so improves performance considerably. A futex consists of two parts: a kernel service and a user library. The kernel service provides a ‘‘wait queue’’ that allows multiple processes to wait on a lock. They will not run, unless the kernel explicitly un- read more..

  • Page - 167

    136 PROCESSES AND THREADS CHAP. 2
    Thread call              Description
    Pthread_mutex_init       Create a mutex
    Pthread_mutex_destroy    Destroy an existing mutex
    Pthread_mutex_lock       Acquire a lock or block
    Pthread_mutex_trylock    Acquire a lock or fail
    Pthread_mutex_unlock     Release a lock
    Figure 2-30. Some of the Pthreads calls relating to mutexes.
    In addition to mutexes, Pthreads offers a second read more..
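
To make the difference between "acquire or block" and "acquire or fail" concrete, here is a small sketch using the real POSIX calls `pthread_mutex_trylock` and `pthread_mutex_unlock`. The helper `try_increment` and its counter are hypothetical, and the sketch assumes a default (non-recursive) mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: bump *counter inside the critical region if the
 * mutex can be taken without blocking; otherwise report failure. */
static bool try_increment(pthread_mutex_t *m, int *counter)
{
    int rc = pthread_mutex_trylock(m);
    if (rc == EBUSY)
        return false;          /* lock is held: the caller may do other work */
    if (rc != 0)
        return false;          /* some other error */
    (*counter)++;              /* critical region */
    pthread_mutex_unlock(m);
    return true;
}
```

A thread that gets `false` back is free to pick an alternative instead of waiting, which is the flexibility that a trylock call is meant to provide.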

  • Page - 168

    SEC. 2.3 INTERPROCESS COMMUNICATION 137
    Thread call              Description
    Pthread_cond_init        Create a condition variable
    Pthread_cond_destroy     Destroy a condition variable
    Pthread_cond_wait        Block waiting for a signal
    Pthread_cond_signal      Signal another thread and wake it up
    Pthread_cond_broadcast   Signal multiple threads and wake all of them
    Figure 2-31. Some of the Pthreads calls relating to read more..

  • Page - 169

    138 PROCESSES AND THREADS CHAP. 2
    #include <stdio.h>
    #include <pthread.h>
    #define MAX 1000000000                /* how many numbers to produce */
    pthread_mutex_t the_mutex;
    pthread_cond_t condc, condp;          /* used for signaling */
    int buffer = 0;                       /* buffer used between producer and consumer */
    void *producer(void *ptr)             /* produce data */
    {
        int i;
        for (i = 1; i <= MAX; i++) {
            pthread_mutex read more..

  • Page - 170

    SEC. 2.3 INTERPROCESS COMMUNICATION 139 Monitors have an important property that makes them useful for achieving mutual exclusion: only one process can be active in a monitor at any instant. Monitors are a programming-language construct, so the compiler knows they are special and can handle calls to monitor procedures differently from other procedure calls. Typically, when a read more..

  • Page - 171

    140 PROCESSES AND THREADS CHAP. 2
    monitor example
        integer i;
        condition c;
        procedure producer();
        ...
        end;
        procedure consumer();
        ...
        end;
    end monitor;
    Figure 2-33. A monitor.
    waiting on it, the signal is lost forever. In other words, the wait must come before the signal. This rule makes the implementation much simpler. In practice, it is not a problem because it is easy to keep read more..

  • Page - 172

    SEC. 2.3 INTERPROCESS COMMUNICATION 141
    monitor ProducerConsumer
        condition full, empty;
        integer count;
        procedure insert(item: integer);
        begin
            if count = N then wait(full);
            insert_item(item);
            count := count + 1;
            if count = 1 then signal(empty)
        end;
        function remove: integer;
        begin
            if count = 0 then wait(empty);
            remove = remove_item;
            count := count - 1;
            if count = N - 1 then signal(full)
        end;
    read more..

  • Page - 173

    142 PROCESSES AND THREADS CHAP. 2
    public class ProducerConsumer {
        static final int N = 100;                      // constant giving the buffer size
        static producer p = new producer();            // instantiate a new producer thread
        static consumer c = new consumer();            // instantiate a new consumer thread
        static our_monitor mon = new our_monitor();    // instantiate a new monitor
        public static void main(String read more..

  • Page - 174

    SEC. 2.3 INTERPROCESS COMMUNICATION 143 The producer and consumer threads are functionally identical to their counterparts in all our previous examples. The producer has an infinite loop generating data and putting it into the common buffer. The consumer has an equally infinite loop taking data out of the common buffer and doing some fun thing with it. The interesting part of read more..

  • Page - 175

    144 PROCESSES AND THREADS CHAP. 2 inapplicable. The conclusion is that semaphores are too low level and monitors are not usable except in a few programming languages. Also, none of the primitives allow information exchange between machines. Something else is needed. 2.3.8 Message Passing That something else is message passing. This method of interprocess communication uses two read more..

  • Page - 176

    SEC. 2.3 INTERPROCESS COMMUNICATION 145 At the other end of the spectrum, there are also design issues that are important when the sender and receiver are on the same machine. One of these is performance. Copying messages from one process to another is always slower than doing a semaphore operation or entering a monitor. Much work has gone into making message passing read more..

  • Page - 177

    146 PROCESSES AND THREADS CHAP. 2
    #define N 100                            /* number of slots in the buffer */
    void producer(void)
    {
        int item;
        message m;                           /* message buffer */
        while (TRUE) {
            item = produce_item();           /* generate something to put in buffer */
            receive(consumer, &m);           /* wait for an empty to arrive */
            build_message(&m, item);         /* construct a message to send */
            send(consumer, &m);              /* send read more..

  • Page - 178

    SEC. 2.3 INTERPROCESS COMMUNICATION 147 [Figure 2-37: three panels (a), (b), (c), each showing processes A, B, C, and D along a time axis with a barrier.] Figure 2-37. Use of a barrier. (a) Processes approaching a barrier. (b) All processes but one blocked at the barrier. (c) When the last process arrives at the barrier, all of them are let through. In Fig. 2-37(a) we see four processes approaching a read more..

  • Page - 179

    148 PROCESSES AND THREADS CHAP. 2 is to program each process to execute a barrier operation after it has finished its part of the current iteration. When all of them are done, the new matrix (the input to the next iteration) will be finished, and all processes will be simultaneously released to start the next iteration. 2.3.10 Avoiding Locks: Read-Copy-Update The fastest read more..

  • Page - 180

    SEC. 2.4 SCHEDULING 149 [Figure: four tree diagrams, panels (a) to (d), with nodes A, B, C, D, E, and X.] (a) Original tree. (b) Initialize node X and connect E to X. Any readers in A and E are not affected. (c) When X is completely initialized, connect X to A. Readers currently in E will have read the old version, while readers in A will pick up the new version of the tree. (d) Decouple B from A. Note read more..

  • Page - 181

    150 PROCESSES AND THREADS CHAP. 2 2.4.1 Introduction to Scheduling Back in the old days of batch systems with input in the form of card images on a magnetic tape, the scheduling algorithm was simple: just run the next job on the tape. With multiprogramming systems, the scheduling algorithm became more complex because there were generally multiple users waiting for service. Some read more..

  • Page - 182

    SEC. 2.4 SCHEDULING 151 In addition to picking the right process to run, the scheduler also has to worry about making efficient use of the CPU because process switching is expensive. To start with, a switch from user mode to kernel mode must occur. Then the state of the current process must be saved, including storing its registers in the process table so they can be read more..

  • Page - 183

    152 PROCESSES AND THREADS CHAP. 2 The former are called compute-bound or CPU-bound; the latter are called I/O-bound. Compute-bound processes typically have long CPU bursts and thus infrequent I/O waits, whereas I/O-bound processes have short CPU bursts and thus frequent I/O waits. Note that the key factor is the length of the CPU burst, not the length of the I/O read more..

  • Page - 184

    SEC. 2.4 SCHEDULING 153 respect to how they deal with clock interrupts. A nonpreemptive scheduling algorithm picks a process to run and then just lets it run until it blocks (either on I/O or waiting for another process) or voluntarily releases the CPU. Even if it runs for many hours, it will not be forcibly suspended. In effect, no scheduling decisions are made during read more..

  • Page - 185

    154 PROCESSES AND THREADS CHAP. 2 In systems with real-time constraints, preemption is, oddly enough, sometimes not needed because the processes know that they may not run for long periods of time and usually do their work and block quickly. The difference with interactive systems is that real-time systems run only programs that are intended to further the application at hand. read more..

  • Page - 186

    SEC. 2.4 SCHEDULING 155 done per second than if some of the components are idle. In a batch system, for example, the scheduler has control of which jobs are brought into memory to run. Having some CPU-bound processes and some I/O-bound processes in memory together is a better idea than first loading and running all the CPU-bound jobs and then, when they are finished, read more..

  • Page - 187

    156 PROCESSES AND THREADS CHAP. 2 On the other hand, when a user clicks on the icon that breaks the connection to the cloud server after the video has been uploaded, he has different expectations. If it has not completed after 30 sec, the user will probably be swearing a blue streak, and after 60 sec he will be foaming at the mouth. This behavior is due to the com- read more..

  • Page - 188

    SEC. 2.4 SCHEDULING 157 The great strength of this algorithm is that it is easy to understand and equally easy to program. It is also fair in the same sense that allocating scarce concert tickets or brand-new iPhones to people who are willing to stand on line starting at 2 A.M. is fair. With this algorithm, a single linked list keeps track of all ready processes. read more..

  • Page - 189

    158 PROCESSES AND THREADS CHAP. 2 jobs, with execution times of a, b, c, and d, respectively. The first job finishes at time a, the second at time a + b, and so on. The mean turnaround time is (4a + 3b + 2c + d)/4. It is clear that a contributes more to the average than the other times, so it should be the shortest job, with b next, then c, and finally d as read more..
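
The formula (4a + 3b + 2c + d)/4 is easy to check numerically. The following sketch, with hypothetical function names, computes the mean turnaround time of jobs run to completion in a given order and sorts a job list into shortest-job-first order. It assumes all jobs are available at time 0.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending run time. */
static int cmp_run_time(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* Mean turnaround time when the jobs run to completion in array order. */
double mean_turnaround(const double *run_time, int n)
{
    double finish = 0.0, total = 0.0;
    for (int i = 0; i < n; i++) {
        finish += run_time[i];     /* job i finishes when all earlier jobs are done */
        total += finish;
    }
    return total / n;
}

/* Reorder the jobs so that the shortest runs first. */
void sjf_order(double *run_time, int n)
{
    qsort(run_time, (size_t)n, sizeof run_time[0], cmp_run_time);
}
```

For example, with run times of 8, 4, 4, and 4 minutes in that order the mean is (8 + 12 + 16 + 20)/4 = 14 minutes, whereas after `sjf_order` it is (4 + 8 + 12 + 20)/4 = 11 minutes.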

  • Page - 190

    SEC. 2.4 SCHEDULING 159 [Figure 2-42 panels: (a) current process B, next process F, followed by D, G, A; (b) current process F, followed by D, G, A, B.] Figure 2-42. Round-robin scheduling. (a) The list of runnable processes. (b) The list of runnable processes after B uses up its quantum. various tables and lists, flushing and reloading the memory cache, and so on. Suppose that this process switch or context switch, as it is sometimes read more..

  • Page - 191

    160 PROCESSES AND THREADS CHAP. 2 pecking order may be the president first, the faculty deans next, then professors, secretaries, janitors, and finally students. The need to take external factors into account leads to priority scheduling. The basic idea is straightforward: each process is assigned a priority, and the runnable process with the highest priority is allowed to read more..

  • Page - 192

    SEC. 2.4 SCHEDULING 161 [Figure 2-43: queue headers for Priority 4 (highest priority) down to Priority 1 (lowest priority), each heading a list of runnable processes.] Figure 2-43. A scheduling algorithm with four priority classes. Multiple Queues One of the earliest priority schedulers was in CTSS, the M.I.T. Compatible TimeSharing System that ran on the IBM 7094 (Corbató et al., 1962). CTSS had the problem that read more..

  • Page - 193

    162 PROCESSES AND THREADS CHAP. 2 Shortest Process Next Because shortest job first always produces the minimum average response time for batch systems, it would be nice if it could be used for interactive processes as well. To a certain extent, it can be. Interactive processes generally follow the pattern of wait for command, execute command, wait for command, execute com- read more..

  • Page - 194

    SEC. 2.4 SCHEDULING 163 Lottery Scheduling While making promises to the users and then living up to them is a fine idea, it is difficult to implement. However, another algorithm can be used to give similarly predictable results with a much simpler implementation. It is called lottery scheduling (Waldspurger and Weihl, 1994). The basic idea is to give processes lottery tickets read more..
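
One way to picture the drawing step is shown below. This is a sketch under assumptions, not a description of any real scheduler: each process holds some number of tickets, tickets are numbered consecutively in process order, and the caller supplies the number that was drawn (for instance from a random-number generator). The function name `lottery_winner` is hypothetical.

```c
/* Returns the index of the process that holds ticket number `draw`
 * (0-based, counted across all processes in order), or -1 if the
 * number is outside the range of tickets issued. */
int lottery_winner(const int *tickets, int nproc, int draw)
{
    if (draw < 0)
        return -1;
    for (int i = 0; i < nproc; i++) {
        if (draw < tickets[i])
            return i;              /* the drawn ticket belongs to process i */
        draw -= tickets[i];        /* skip past process i's tickets */
    }
    return -1;
}
```

With ticket counts of 20, 30, and 50, draws 0 to 19 select the first process, 20 to 49 the second, and 50 to 99 the third, so over many uniformly random draws each process wins in proportion to the tickets it holds.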

  • Page - 195

    164 PROCESSES AND THREADS CHAP. 2 the CPU and the scheduler picks processes in such a way as to enforce it. Thus if two users have each been promised 50% of the CPU, they will each get that, no matter how many processes they have in existence. As an example, consider a system with two users, each of which has been promised 50% of the CPU. User 1 has four read more..

  • Page - 196

    SEC. 2.4 SCHEDULING 165
    \[ \sum_{i=1}^{m} \frac{C_i}{P_i} \le 1 \]
    A real-time system that meets this criterion is said to be schedulable. This means it can actually be implemented. A process that fails to meet this test cannot be scheduled because the total amount of CPU time the processes want collectively is more than the CPU can deliver. As an example, consider a soft real-time system with read more..
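
The schedulability test is a one-line sum, sketched here in C. The struct and function names are hypothetical, and the sample numbers in the note after the code are illustrative values chosen for this sketch. C_i is taken as the CPU time event i needs per occurrence and P_i as its period, and context-switch overhead is ignored.

```c
#include <stdbool.h>

/* One periodic event: it needs cpu_ms of CPU time every period_ms. */
typedef struct {
    double cpu_ms;      /* C_i */
    double period_ms;   /* P_i */
} periodic_event;

/* Total fraction of the CPU requested: the sum of C_i / P_i. */
double cpu_utilization(const periodic_event *e, int m)
{
    double u = 0.0;
    for (int i = 0; i < m; i++)
        u += e[i].cpu_ms / e[i].period_ms;
    return u;
}

/* The set is schedulable only if the requested fraction does not exceed 1. */
bool is_schedulable(const periodic_event *e, int m)
{
    return cpu_utilization(e, m) <= 1.0;
}
```

Three events with periods of 100, 200, and 500 msec that need 50, 30, and 100 msec of CPU time request 0.5 + 0.15 + 0.2 = 0.85 of the CPU, so the set passes. Adding a fourth event that needs 160 msec every 1000 msec pushes the sum to 1.01, and the test fails.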

  • Page - 197

    166 PROCESSES AND THREADS CHAP. 2 2.4.6 Thread Scheduling When several processes each have multiple threads, we have two levels of parallelism present: processes and threads. Scheduling in such systems differs substantially depending on whether user-level threads or kernel-level threads (or both) are supported. Let us consider user-level threads first. Since the kernel is not read more..

  • Page - 198

    SEC. 2.4 SCHEDULING 167 Now consider the situation with kernel-level threads. Here the kernel picks a particular thread to run. It does not have to take into account which process the thread belongs to, but it can if it wants to. The thread is given a quantum and is forcibly suspended if it exceeds the quantum. With a 50-msec quantum but threads that block after 5 read more..

  • Page - 199

    168 PROCESSES AND THREADS CHAP. 2 primitive is by showing how elegantly it solves the dining philosophers problem. The problem can be stated quite simply as follows. Five philosophers are seated around a circular table. Each philosopher has a plate of spaghetti. The spaghetti is so slippery that a philosopher needs two forks to eat it. Between each pair of plates is one read more..

  • Page - 200

    SEC. 2.5 CLASSICAL IPC PROBLEMS 169
    #define N 5                           /* number of philosophers */
    void philosopher(int i)               /* i: philosopher number, from 0 to 4 */
    {
        while (TRUE) {
            think();                      /* philosopher is thinking */
            take_fork(i);                 /* take left fork */
            take_fork((i+1) % N);         /* take right fork; % is modulo operator */
            eat();                        /* yum-yum, spaghetti */
            put_fork(i);                  /* put left fork back read more..

  • Page - 201

    170 PROCESSES AND THREADS CHAP. 2
    #define N 5                  /* number of philosophers */
    #define LEFT (i+N-1)%N       /* number of i's left neighbor */
    #define RIGHT (i+1)%N        /* number of i's right neighbor */
    #define THINKING 0           /* philosopher is thinking */
    #define HUNGRY 1             /* philosopher is trying to get forks */
    #define EATING 2             /* philosopher is eating */
    typedef int semaphore;       /* read more..

  • Page - 202

    SEC. 2.5 CLASSICAL IPC PROBLEMS 171 2.5.2 The Readers and Writers Problem The dining philosophers problem is useful for modeling processes that are competing for exclusive access to a limited number of resources, such as I/O devices. Another famous problem is the readers and writers problem (Courtois et al., 1971), which models access to a database. Imagine, for example, an read more..

  • Page - 203

    172 PROCESSES AND THREADS CHAP. 2 leave, they decrement the counter, and the last to leave does an up on the semaphore, allowing a blocked writer, if there is one, to get in. The solution presented here implicitly contains a subtle decision worth noting. Suppose that while a reader is using the database, another reader comes along. Since having two readers at the same read more..

  • Page - 204

    SEC. 2.6 RESEARCH ON PROCESSES AND THREADS 173 Similarly, much research in the operating systems community these days focuses on security issues. Numerous incidents have demonstrated that users need better protection from attackers (and, occasionally, from themselves). One approach is to track and restrict carefully the information flows in an operating system (Giffin et al., read more..

  • Page - 205

    174 PROCESSES AND THREADS CHAP. 2 PROBLEMS 1. In Fig. 2-2, three process states are shown. In theory, with three states, there could be six transitions, two out of each state. However, only four transitions are shown. Are there any circumstances in which either or both of the missing transitions might occur? 2. Suppose that you were to design an advanced computer architecture read more..

  • Page - 206

    CHAP. 2 PROBLEMS 175 14. In Fig. 2-12 the register set is listed as a per-thread rather than a per-process item. Why? After all, the machine has only one set of registers. 15. Why would a thread ever voluntarily give up the CPU by calling thread_yield? After all, since there is no periodic clock interrupt, it may never get the CPU back. 16. Can a thread ever be read more..

  • Page - 207

    176 PROCESSES AND THREADS CHAP. 2 28. When a computer is being developed, it is usually first simulated by a program that runs one instruction at a time. Even multiprocessors are simulated strictly sequentially like this. Is it possible for a race condition to occur when there are no simultaneous events like this? 29. The producer-consumer problem can be extended to a system read more..

  • Page - 208

    CHAP. 2 PROBLEMS 177 37. Suppose that we have a message-passing system using mailboxes. When sending to a full mailbox or trying to receive from an empty one, a process does not block. Instead, it gets an error code back. The process responds to the error code by just trying again, over and over, until it succeeds. Does this scheme lead to race conditions? 38. read more..

  • Page - 209

    178 PROCESSES AND THREADS CHAP. 2 (a) Round robin. (b) Priority scheduling. (c) First-come, first-served (run in order 10, 6, 2, 4, 8). (d) Shortest job first. For (a), assume that the system is multiprogrammed, and that each job gets its fair share of the CPU. For (b) through (d), assume that only one job at a time runs, until it finishes. All jobs are completely CPU read more..

  • Page - 210

    CHAP. 2 PROBLEMS 179 script in the background and one in the foreground, each accessing the same file. How long does it take before a race condition manifests itself? What is the critical region? Modify the script to prevent the race. (Hint: use ln file file.lock to lock the data file.) 58. Assume that you have an operating system that provides semaphores. Implement a message read more..

  • Page - 211

    180 PROCESSES AND THREADS CHAP. 2 true if the number is a perfect number and false otherwise. The main program will read the numbers N and P from the command line. The main process will spawn a set of P threads. The numbers from 1 to N will be partitioned among these threads so that two threads do not work on the same number. For each number in this set, the read more..

  • Page - 212

    3 MEMORY MANAGEMENT Main memory (RAM) is an important resource that must be very carefully managed. While the average home computer nowadays has 10,000 times more memory than the IBM 7094, the largest computer in the world in the early 1960s, programs are getting bigger faster than memories. To paraphrase Parkinson’s Law, ‘‘Programs expand to fill the memory available to hold read more..

  • Page - 213

    182 MEMORY MANAGEMENT CHAP. 3 In this chapter we will investigate several different memory management models, ranging from very simple to highly sophisticated. Since managing the lowest level of cache memory is normally done by the hardware, the focus of this chapter will be on the programmer’s model of main memory and how it can be managed. The abstractions for, and the read more..

  • Page - 214

    SEC. 3.1 NO MEMORY ABSTRACTION 183 [Figure 3-1: panels (a), (b), (c), with addresses running from 0 to 0xFFF…; labels: user program, operating system in RAM, operating system in ROM, device drivers in ROM.] Figure 3-1. Three simple ways of organizing memory with an operating system and one user process. Other possibilities also exist. One way to get some parallelism in a system with no memory read more..

  • Page - 215

    184 MEMORY MANAGEMENT CHAP. 3 first program starts out by jumping to address 24, which contains a MOV instruction. The second program starts out by jumping to address 28, which contains a CMP instruction. The instructions that are not relevant to this discussion are not shown. When the two programs are loaded consecutively in memory starting at address 0, we have the read more..

  • Page - 216

    SEC. 3.1 NO MEMORY ABSTRACTION 185 can reference a private set of addresses local to it. We will show how this can be accomplished shortly. What the IBM 360 did as a stop-gap solution was modify the second program on the fly as it loaded it into memory using a technique known as static relocation. It worked like this. When a program was loaded at address 16,384, the read more..

  • Page - 217

    186 MEMORY MANAGEMENT CHAP. 3 3.2.1 The Notion of an Address Space Two problems have to be solved to allow multiple applications to be in memory at the same time without interfering with each other: protection and relocation. We looked at a primitive solution to the former used on the IBM 360: label chunks of memory with a protection key and compare the key read more..

  • Page - 218

    SEC. 3.2 A MEMORY ABSTRACTION: ADDRESS SPACES 187 programs are loaded into consecutive memory locations wherever there is room and without relocation during loading, as shown in Fig. 3-2(c). When a process is run, the base register is loaded with the physical address where its program begins in memory and the limit register is loaded with the length of the program. In Fig. read more..
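
What the hardware does on every memory reference under this scheme can be sketched in a few lines of C. The type and function names are hypothetical, and the sketch assumes that an address is legal only if it is strictly smaller than the limit register; real hardware would raise a fault rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two per-process relocation registers. */
typedef struct {
    uint32_t base;     /* physical address where the program begins */
    uint32_t limit;    /* length of the program in bytes */
} relocation_regs;

/* Translate a program-generated address. Returns false (a protection
 * fault) if the address lies outside the program. */
bool translate(const relocation_regs *r, uint32_t addr, uint32_t *phys)
{
    if (addr >= r->limit)
        return false;          /* beyond the limit: fault */
    *phys = r->base + addr;    /* add the base to form the physical address */
    return true;
}
```

With base = 16384 and limit = 16384, a jump to address 28 is turned into a jump to physical address 16412, while a reference to address 16384 or above is rejected.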

  • Page - 219

    188 MEMORY MANAGEMENT CHAP. 3 [Figure 3-3: panel (c), two programs loaded at addresses 0–16380 and 16384–32764; the base register and the limit register both hold 16384.] Figure 3-3. Base and limit registers can be used to give each process a separate address space. processes or more may be started up as soon as the computer is booted. For example, read more..

  • Page - 220

    SEC. 3.2 A MEMORY ABSTRACTION: ADDRESS SPACES 189 The operation of a swapping system is illustrated in Fig. 3-4. Initially, only process A is in memory. Then processes B and C are created or swapped in from disk. In Fig. 3-4(d) A is swapped out to disk. Then D comes in and B goes out. Finally A comes in again. Since A is now at a different location, addresses con- read more..

  • Page - 221

    190 MEMORY MANAGEMENT CHAP. 3 If it is expected that most processes will grow as they run, it is probably a good idea to allocate a little extra memory whenever a process is swapped in or moved, to reduce the overhead associated with moving or swapping processes that no longer fit in their allocated memory. However, when swapping processes to disk, only the memory read more..

  • Page - 222

    SEC. 3.2 A MEMORY ABSTRACTION: ADDRESS SPACES 191 Chapter 10, we will look at some specific memory allocators used in Linux (like buddy and slab allocators) in more detail. Memory Management with Bitmaps With a bitmap, memory is divided into allocation units as small as a few words and as large as several kilobytes. Corresponding to each allocation unit is a bit in the bitmap, read more..
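
Allocating k units from a bitmap means finding a run of k consecutive free bits. The sketch below shows one straightforward way to do it. It is an illustration only: the function names are hypothetical, it assumes that a 1 bit marks an occupied unit and a 0 bit a free one, and it numbers bits from the most significant bit of the first byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed convention: bit = 1 means the allocation unit is occupied.
 * Unit 0 is the most significant bit of map[0]. */
static int unit_in_use(const uint8_t *map, size_t i)
{
    return (map[i / 8] >> (7 - i % 8)) & 1;
}

/* Returns the index of the first run of k free units, or -1 if none exists. */
long find_free_run(const uint8_t *map, size_t nunits, size_t k)
{
    size_t run = 0;
    if (k == 0)
        return -1;
    for (size_t i = 0; i < nunits; i++) {
        run = unit_in_use(map, i) ? 0 : run + 1;   /* length of current free run */
        if (run == k)
            return (long)(i + 1 - k);
    }
    return -1;
}

/* Marks units start .. start+k-1 as occupied (in_use != 0) or free. */
void mark_units(uint8_t *map, size_t start, size_t k, int in_use)
{
    for (size_t i = start; i < start + k; i++) {
        uint8_t mask = (uint8_t)(1u << (7 - i % 8));
        if (in_use)
            map[i / 8] |= mask;
        else
            map[i / 8] &= (uint8_t)~mask;
    }
}
```

The linear scan in `find_free_run` is the weak point of the bitmap approach: the search may have to walk the whole map, and a run can straddle byte or word boundaries.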

  • Page - 223

    192 MEMORY MANAGEMENT CHAP. 3 Memory Management with Linked Lists Another way of keeping track of memory is to maintain a linked list of allocated and free memory segments, where a segment either contains a process or is an empty hole between two processes. The memory of Fig. 3-6(a) is represented in Fig. 3-6(c) as a linked list of segments. Each entry in the list read more..

  • Page - 224

    SEC. 3.2 A MEMORY ABSTRACTION: ADDRESS SPACES 193 Another well-known and widely used algorithm is best fit. Best fit searches the entire list, from beginning to end, and takes the smallest hole that is adequate. Rather than breaking up a big hole that might be needed later, best fit tries to find a hole that is close to the actual size needed, to best match the request read more..
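
The difference between the two search strategies shows up clearly in code. In the sketch below the holes are kept in a plain array sorted by address rather than in a linked list, purely to keep the example short. The `hole` type and the function names are hypothetical.

```c
#include <stddef.h>

/* A free segment of memory. */
typedef struct {
    size_t start;     /* first allocation unit of the hole */
    size_t length;    /* size of the hole in allocation units */
} hole;

/* First fit: take the first hole that is big enough. */
int first_fit(const hole *holes, int n, size_t need)
{
    for (int i = 0; i < n; i++)
        if (holes[i].length >= need)
            return i;
    return -1;                       /* no hole is large enough */
}

/* Best fit: search the entire list and take the smallest adequate hole. */
int best_fit(const hole *holes, int n, size_t need)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (holes[i].length < need)
            continue;
        if (best < 0 || holes[i].length < holes[best].length)
            best = i;
    }
    return best;
}
```

With holes of 20, 8, 12, and 5 units and a request for 7 units, first fit stops at the 20-unit hole, whereas best fit scans everything and picks the 8-unit hole, leaving a 1-unit remnant.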

  • Page - 225

    194 MEMORY MANAGEMENT CHAP. 3 is possible is quite expensive. If merging is not done, memory will quickly fragment into a large number of small holes into which no processes fit. 3.3 VIRTUAL MEMORY While base and limit registers can be used to create the abstraction of address spaces, there is another problem that has to be solved: managing bloatware. While memory sizes are read more..

  • Page - 226

    SEC. 3.3 VIRTUAL MEMORY 195 memory, the hardware performs the necessary mapping on the fly. When the program references a part of its address space that is not in physical memory, the operating system is alerted to go get the missing piece and re-execute the instruction that failed. In a sense, virtual memory is a generalization of the base-and-limit-register idea. The 8088 read more..

  • Page - 227

    196 MEMORY MANAGEMENT CHAP. 3 is put directly onto the memory bus and causes the physical memory word with the same address to be read or written. When virtual memory is used, the virtual addresses do not go directly to the memory bus. Instead, they go to an MMU (Memory Management Unit) that maps the virtual addresses onto the physical memory addresses, as illustrated read more..

  • Page - 228

    SEC. 3.3 VIRTUAL MEMORY 197
    Virtual page   Virtual address space   Page frame (X = not mapped)
    15             60K–64K                 X
    14             56K–60K                 X
    13             52K–56K                 X
    12             48K–52K                 X
    11             44K–48K                 7
    10             40K–44K                 X
    9              36K–40K                 5
    8              32K–36K                 X
    7              28K–32K                 X
    6              24K–28K                 X
    5              20K–24K                 3
    4              16K–20K                 4
    3              12K–16K                 0
    2              8K–12K                  6
    1              4K–8K                   1
    0              0K–4K                   2
    Physical memory address: eight page frames, 0K–4K through 28K–32K.
    Figure 3-9. The relation between virtual addresses read more..

  • Page - 229

    198 MEMORY MANAGEMENT CHAP. 3 trap to the operating system. This trap is called a page fault. The operating system picks a little-used page frame and writes its contents back to the disk (if it is not already there). It then fetches (also from the disk) the page that was just referenced into the page frame just freed, changes the map, and restarts the trapped instruc- read more..

  • Page - 230

    SEC. 3.3 VIRTUAL MEMORY 199
    Page table:
    Virtual page   Page frame (binary)   Present/absent bit
    15             000                   0
    14             000                   0
    13             000                   0
    12             000                   0
    11             111                   1
    10             000                   0
    9              101                   1
    8              000                   0
    7              000                   0
    6              000                   0
    5              011                   1
    4              100                   1
    3              000                   1
    2              110                   1
    1              001                   1
    0              010                   1
    12-bit offset copied directly from input to output. Virtual page = 2 is used as an index into the page table. Incoming virtual address (8196). Outgoing physical address (24580). read more..
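
The mapping performed by the MMU in this example can be reproduced with a short C sketch. It hard-codes the 16-entry page table listed above and assumes 16-bit virtual addresses with 4-KB pages. The names `pte` and `mmu_translate` are hypothetical, and a real MMU would trap to the operating system instead of returning false.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12     /* 4-KB pages: low 12 bits are the offset */
#define NUM_PAGES  16     /* 16-bit virtual addresses: 16 virtual pages */

typedef struct {
    uint8_t frame;        /* page frame number (3 bits used) */
    bool    present;      /* present/absent bit */
} pte;

/* Page table of the example, indexed by virtual page number 0..15. */
static const pte page_table[NUM_PAGES] = {
    {2, true},  {1, true},  {6, true},  {0, true},
    {4, true},  {3, true},  {0, false}, {0, false},
    {0, false}, {5, true},  {0, false}, {7, true},
    {0, false}, {0, false}, {0, false}, {0, false}
};

/* Returns false to signal a page fault; otherwise stores the physical address. */
bool mmu_translate(uint16_t vaddr, uint16_t *paddr)
{
    unsigned page   = vaddr >> PAGE_SHIFT;                 /* high 4 bits */
    unsigned offset = vaddr & ((1u << PAGE_SHIFT) - 1);    /* low 12 bits */
    if (!page_table[page].present)
        return false;                                      /* page fault */
    *paddr = (uint16_t)(((unsigned)page_table[page].frame << PAGE_SHIFT) | offset);
    return true;
}
```

Virtual address 8196 is page 2 with offset 4. Page 2 maps to frame 6, so the outgoing address is 6 × 4096 + 4 = 24580. A reference to page 8, such as address 32780, finds the present/absent bit clear and faults.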

  • Page - 231

    200 MEMORY MANAGEMENT CHAP. 3 varies from computer to computer, but 32 bits is a common size. The most important field is the Page frame number. After all, the goal of the page mapping is to output this value. Next to it we have the Present/absent bit. If this bit is 1, the entry is valid and can be used. If it is 0, the virtual page to which the read more..

  • Page - 232

    SEC. 3.3 VIRTUAL MEMORY 201 Information the operating system needs to handle page faults is kept in software tables inside the operating system. The hardware does not need it. Before getting into more implementation issues, it is worth pointing out again that what virtual memory fundamentally does is create a new abstraction—the address space—which is an abstraction of physical read more..

  • Page - 233

    202 MEMORY MANAGEMENT CHAP. 3 large; it is just not practical most of the time. Another one is that having to load the full page table at every context switch would completely kill performance. At the other extreme, the page table can be entirely in main memory. All the hardware needs then is a single register that points to the start of the page table. This design read more..

  • Page - 234

    SEC. 3.3 VIRTUAL MEMORY 203
    Valid   Virtual page   Modified   Protection   Page frame
    1       140            1          RW           31
    1       20             0          RX           38
    1       130            1          RW           29
    1       129            1          RW           62
    1       19             0          RX           50
    1       21             0          RX           45
    1       860            1          RW           14
    1       861            1          RW           75
    Figure 3-12. A TLB to speed up paging.
    Let us now see how the TLB functions. When a virtual address is presented to the MMU for translation, the hardware first read more..

  • Page - 235

    204 MEMORY MANAGEMENT CHAP. 3 instruction that faulted. And, of course, all of this must be done in a handful of instructions because TLB misses occur much more frequently than page faults. Surprisingly enough, if the TLB is moderately large (say, 64 entries) to reduce the miss rate, software management of the TLB turns out to be acceptably efficient. The main gain here is read more..

  • Page - 236

    SEC. 3.3 VIRTUAL MEMORY 205 occurs if the page needs to be brought in from disk. Third, it is possible that the program simply accessed an invalid address and no mapping needs to be added in the TLB at all. In that case, the operating system typically kills the program with a segmentation fault. Only in this case did the program do something wrong. All other cases are read more..

  • Page - 237

    206 MEMORY MANAGEMENT CHAP. 3 [Figure 3-13: (a) address fields PT1 (10 bits), PT2 (10 bits), Offset (12 bits); (b) a top-level page table with entries 0 to 1023 pointing to second-level page tables, each with entries 0 to 1023, which point to pages; one second-level table is labeled "Page table for the top 4M of memory".] Figure 3-13. (a) A 32-bit address with two page table fields. (b) Two-level page tables. level page table and obtain entry 1, which corresponds to addresses 4M to 8M − 1. It then read more..

  • Page - 238

    SEC. 3.3 VIRTUAL MEMORY 207 taken from the second-level page table is combined with the offset (4) to construct the physical address. This address is put on the bus and sent to memory. The interesting thing to note about Fig. 3-13 is that although the address space contains over a million pages, only four page tables are needed: the top-level table, and the second-level read more..
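
Splitting a 32-bit virtual address into the PT1, PT2, and Offset fields is a matter of shifts and masks. The sketch below uses a hypothetical struct and function name, and the sample address 0x00403004 is chosen here because it yields top-level entry 1 and offset 4, matching the values mentioned in the text.

```c
#include <stdint.h>

/* The three fields of a 32-bit virtual address with 10 + 10 + 12 bits. */
typedef struct {
    uint32_t pt1;       /* index into the top-level page table (bits 31..22) */
    uint32_t pt2;       /* index into the second-level page table (bits 21..12) */
    uint32_t offset;    /* byte offset within the 4-KB page (bits 11..0) */
} va_fields;

va_fields split_va(uint32_t va)
{
    va_fields f;
    f.pt1    = (va >> 22) & 0x3FFu;
    f.pt2    = (va >> 12) & 0x3FFu;
    f.offset = va & 0xFFFu;
    return f;
}
```

For 0x00403004 (4,206,596 decimal) this gives PT1 = 1, PT2 = 3, and Offset = 4. Each top-level entry covers 1024 pages of 4 KB, that is, 4 MB, which is why entry 1 corresponds to addresses 4M to 8M − 1.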

  • Page - 239

    208 MEMORY MANAGEMENT CHAP. 3 example, with 64-bit virtual addresses, a 4-KB page size, and 4 GB of RAM, an inverted page table requires only 1,048,576 entries. The entry keeps track of which (process, virtual page) is located in the page frame. Although inverted page tables save lots of space, at least when the virtual address space is much larger than the physical read more..
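
The figure of 1,048,576 entries follows directly from the sizes given: an inverted page table has one entry per page frame, so only the amount of RAM and the page size matter. For comparison, the second line counts the virtual pages that a conventional single-level table would have to cover for a 64-bit address space.

```latex
\[
\text{page frames} = \frac{4\ \text{GB}}{4\ \text{KB}} = \frac{2^{32}}{2^{12}} = 2^{20} = 1{,}048{,}576
\]
\[
\text{virtual pages} = \frac{2^{64}}{2^{12}} = 2^{52}
\]
```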

  • Page - 240

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 209 3.4 PAGE REPLACEMENT ALGORITHMS When a page fault occurs, the operating system has to choose a page to evict (remove from memory) to make room for the incoming page. If the page to be removed has been modified while in memory, it must be rewritten to the disk to bring the disk copy up to date. If, however, the page has read more..

  • Page - 241

    210 MEMORY MANAGEMENT CHAP. 3 be referenced until 10, 100, or perhaps 1000 instructions later. Each page can be labeled with the number of instructions that will be executed before that page is first referenced. The optimal page replacement algorithm says that the page with the highest label should be removed. If one page will not be used for 8 million instructions and another read more..

  • Page - 242

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 211 The R and M bits can be used to build a simple paging algorithm as follows. When a process is started up, both page bits for all its pages are set to 0 by the operating system. Periodically (e.g., on each clock interrupt), the R bit is cleared, to distinguish pages that have not been referenced recently from those that read more..

  • Page - 243

    212 MEMORY MANAGEMENT CHAP. 3 3.4.4 The Second-Chance Page Replacement Algorithm A simple modification to FIFO that avoids the problem of throwing out a heavily used page is to inspect the R bit of the oldest page. If it is 0, the page is both old and unused, so it is replaced immediately. If the R bit is 1, the bit is cleared, the page is put onto the end of read more..
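A minimal sketch of second chance over an array-based FIFO queue is shown below. The structure `sc_page`, the function `second_chance_evict`, and the use of -1 to mark the freed slot are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical page record: an identifier plus its R (referenced) bit. */
struct sc_page { int id; int r; };

/* queue[0] is the oldest page and queue[n-1] the newest.
 * Returns the id of the evicted page; the last slot is left free
 * (id = -1) so the caller can store the incoming page there. */
static int second_chance_evict(struct sc_page queue[], size_t n)
{
    for (;;) {
        struct sc_page oldest = queue[0];
        for (size_t i = 1; i < n; i++)      /* everyone else moves up */
            queue[i - 1] = queue[i];
        if (oldest.r == 0) {                /* old and unused: evict */
            queue[n - 1].id = -1;
            queue[n - 1].r = 0;
            return oldest.id;
        }
        oldest.r = 0;                       /* clear R and requeue the */
        queue[n - 1] = oldest;              /* page as if newly loaded */
    }
}
```

If every page has R = 1, each one is requeued once with R cleared and the loop then evicts the original oldest page, so the algorithm degenerates to pure FIFO.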

  • Page - 244

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 213 Figure 3-16. The clock page replacement algorithm. Pages A through L are arranged in a circle, with a hand pointing to one of them. When a page fault occurs, the page the hand is pointing to is inspected. The action taken depends on the R bit: R = 0: evict the page; R = 1: clear R and advance the hand. When a page fault occurs, the page being pointed to by the hand read more..
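The rule in Fig. 3-16 can be sketched as a loop over a circular array of frames. The names `clock_frame` and `clock_select` are hypothetical, and the sketch assumes at least one frame exists.

```c
#include <stddef.h>

/* Hypothetical frame record: the page it holds and that page's R bit. */
struct clock_frame { int page; int r; };

/* Inspect the frame under the hand. If R = 0, that frame is the victim;
 * if R = 1, clear R and advance. The hand is left one position past the
 * victim, where the search resumes on the next page fault. */
static size_t clock_select(struct clock_frame frames[], size_t n, size_t *hand)
{
    for (;;) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;
        if (frames[i].r == 0)
            return i;
        frames[i].r = 0;
    }
}
```

Unlike the queue-based second chance, nothing is moved in memory here; only the hand index changes.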

  • Page - 245

    214 MEMORY MANAGEMENT CHAP. 3 page table entry for the page just referenced. When a page fault occurs, the operating system examines all the counters in the page table to find the lowest one. That page is the least recently used. 3.4.7 Simulating LRU in Software Although the previous LRU algorithm is (in principle) realizable, few, if any, machines have the required read more..

  • Page - 246

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 215
    Clock tick   R bits for pages 0-5   Counters for pages 0-5
    0 (a)        1 0 1 0 1 1            10000000 00000000 10000000 00000000 10000000 10000000
    1 (b)        1 1 0 0 1 0            11000000 10000000 01000000 00000000 11000000 01000000
    2 (c)        1 1 0 1 0 1            11100000 11000000 00100000 10000000 01100000 10100000
    R bits read more..
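The counter values in the figure are consistent with shifting each 8-bit counter right by one position at every clock tick and placing the page's R bit in the leftmost position. A minimal sketch of that update follows; the function names `aging_tick` and `aging_victim` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One clock tick: shift every counter right by one bit and insert the
 * page's R bit as the new leftmost bit. */
static void aging_tick(uint8_t counters[], const int r_bits[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        counters[i] = (uint8_t)((counters[i] >> 1) | (r_bits[i] ? 0x80 : 0));
}

/* On a page fault, the page with the lowest counter is the candidate. */
static size_t aging_victim(const uint8_t counters[], size_t n)
{
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (counters[i] < counters[victim])
            victim = i;
    return victim;
}
```

Feeding in the R bits listed for clock ticks 0, 1, and 2 reproduces the counters shown for tick 2, and page 2 (counter 00100000) has the lowest value at that point.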

  • Page - 247

    216 MEMORY MANAGEMENT CHAP. 3 enough memory to hold them all. Fortunately, most processes do not work this way. They exhibit a locality of reference, meaning that during any phase of execution, the process references only a relatively small fraction of its pages. Each pass of a multipass compiler, for example, references only a fraction of all the pages, and a different read more..

  • Page - 248

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 217 Figure 3-18. The working set is the set of pages used by the k most recent memory references. The function w(k, t) is the size of the working set at time t. (The figure plots w(k, t) against k.) put it differently, there exists a wide range of k values for which the working set is unchanged. Because the working set varies slowly with time, it is read more..

  • Page - 249

    218 MEMORY MANAGEMENT CHAP. 3 used during the past 100 msec of execution time. In practice, such a definition is just as good and much easier to work with. Note that for each process, only its own execution time counts. Thus if a process starts running at time T and has had 40 msec of CPU time at real time T + 100 msec, for working set purposes its time is 40 read more..

  • Page - 250

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 219 page was in use at the time the fault occurred. Since the page has been referenced during the current clock tick, it is clearly in the working set and is not a candidate for removal (τ is assumed to span multiple clock ticks). If R is 0, the page has not been referenced during the current clock tick and may be a read more..
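A page-table scan along these lines can be sketched as follows. The excerpt is cut off before the full rule is stated, so the details here are assumptions: a page with R = 1 has its time of last use set to the current virtual time, a page with R = 0 whose age exceeds τ is a candidate for removal, and if no such page exists the oldest page with R = 0 is chosen. The names `ws_entry` and `ws_scan` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical page table entry: time of last use and the R bit. */
struct ws_entry { unsigned long last_use; int r; };

/* Scan the whole table at virtual time `now` with window `tau`.
 * Returns the index of the page to evict, or -1 if every page was
 * referenced during the current clock tick. */
static long ws_scan(struct ws_entry pt[], size_t n,
                    unsigned long now, unsigned long tau)
{
    long victim = -1;   /* first page with R = 0 and age > tau */
    long oldest = -1;   /* fallback: oldest page with R = 0 */
    for (size_t i = 0; i < n; i++) {
        if (pt[i].r) {                  /* in use: record the time, skip */
            pt[i].last_use = now;
            continue;
        }
        if (victim < 0 && now - pt[i].last_use > tau)
            victim = (long)i;
        if (oldest < 0 || pt[i].last_use < pt[oldest].last_use)
            oldest = (long)i;
    }
    return victim >= 0 ? victim : oldest;
}
```

The scan continues past the first candidate so that the remaining entries with R = 1 still get their time of last use updated.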

  • Page - 251

    220 MEMORY MANAGEMENT CHAP. 3 Current virtual time: 2204. Each entry is shown as (time of last use, R bit). (a) (1213, 0) (2084, 1) (2032, 1) (1620, 0) (2020, 1) (2003, 1) (1980, 1) (2014, 1). (b) (1213, 0) (2084, 1) (2032, 1) (1620, 0) (2020, 1) (2003, 1) (1980, 1) (2014, 0). (c) (1213, 0) (2084, 1) (2032, 1) (1620, 0) (2020, 1) (2003, 1) (1980, 1) (2014, 0). (d) With the new page: (2204, 1) (2084, 1) (2032, 1) (1620, 0) (2020, 1) (2003, 1) (1980, 1) (2014, 0). Figure 3-20. Operation of read more..

  • Page - 252

    SEC. 3.4 PAGE REPLACEMENT ALGORITHMS 221 1. At least one write has been scheduled. 2. No writes have been scheduled. In the first case, the hand just keeps moving, looking for a clean page. Since one or more writes have been scheduled, eventually some write will complete and its page will be marked as clean. The first clean page encountered is evicted. This page is read more..

  • Page - 253

    222 MEMORY MANAGEMENT CHAP. 3 Second chance is a modification to FIFO that checks if a page is in use before removing it. If it is, the page is spared. This modification greatly improves the performance. Clock is simply a different implementation of second chance. It has the same performance properties, but takes a little less time to execute the algorithm. LRU is an read more..

  • Page - 254

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 223 Pages and their ages in the original configuration (a): A0 10, A1 7, A2 5, A3 4, A4 6, A5 3, B0 9, B1 4, B2 6, B3 2, B4 5, B5 6, B6 12, C1 3, C2 5, C3 6. With local page replacement (b), A6 takes the place of A5. With global page replacement (c), A6 takes the place of B3. Figure 3-22. Local versus global page replacement. (a) Original configuration. (b) Local page replacement. (c) Global page replacement. read more..

  • Page - 255

    224 MEMORY MANAGEMENT CHAP. 3 machines, for example, a single two-operand instruction may need as many as six pages because the instruction itself, the source operand, and the destination operand may all straddle page boundaries. With an allocation of only five pages, programs containing such instructions cannot execute at all. If a global algorithm is used, it may be read more..

  • Page - 256

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 225 On the other hand, for other page replacement algorithms, only a local strategy makes sense. In particular, the working set and WSClock algorithms refer to some specific process and must be applied in that context. There really is no working set for the machine as a whole, and trying to use the union of all the working sets read more..

  • Page - 257

    226 MEMORY MANAGEMENT CHAP. 3 Determining the best page size requires balancing several competing factors. As a result, there is no overall optimum. To start with, two factors argue for a small page size. A randomly chosen text, data, or stack segment will not fill an integral number of pages. On the average, half of the final page will be empty. The extra space in that read more..

  • Page - 258

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 227 must lie somewhere in between. By taking the first derivative with respect to p and equating it to zero, we get the equation −se/p² + 1/2 = 0. From this equation we can derive a formula that gives the optimum page size (considering only memory wasted in fragmentation and page table size). The result is: p = √ read more..
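The step from the derivative to the optimum can be written out as follows. This is a sketch that assumes the overhead being differentiated is the sum of a page-table term se/p and an internal-fragmentation term p/2, with s the average process size and e the size of a page table entry, both in bytes; those meanings are inferred from the derivative shown rather than quoted from the text.

```latex
\begin{align*}
\text{overhead}(p) &= \frac{se}{p} + \frac{p}{2} \\
\frac{d}{dp}\,\text{overhead}(p) &= -\frac{se}{p^{2}} + \frac{1}{2} = 0 \\
p^{2} &= 2se \\
p &= \sqrt{2se}
\end{align*}
```

As an illustrative check with assumed values s = 1 MB (2^20 bytes) and e = 8 bytes, 2se = 2^24, so p = 2^12 = 4096 bytes, that is, 4 KB.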

  • Page - 259

    228 MEMORY MANAGEMENT CHAP. 3 While address spaces these days are large, their sizes used to be a serious problem. Even today, though, separate I- and D-spaces are still common. However, rather than for the normal address spaces, they are now used to divide the L1 cache. After all, in the L1 cache, memory is still plenty scarce. 3.5.5 Shared Pages Another design issue is read more..

  • Page - 260

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 229 Figure 3-25. Two processes sharing the same program sharing its page tables. (The figure shows the process table, the page tables, the shared program, and the separate data areas Data 1 and Data 2.) made of the offending page so that each process now has its own private copy. Both copies are now set to READ/WRITE, so subsequent writes to either copy proceed without trapping. This strategy means that read more..

  • Page - 261

    230 MEMORY MANAGEMENT CHAP. 3 library clear, first consider traditional linking. When a program is linked, one or more object files and possibly some libraries are named in the command to the linker, such as the UNIX command ld *.o -lc -lm which links all the .o (object) files in the current directory and then scans two libraries, /usr/lib/libc.a and /usr/lib/libm.a. Any read more..

  • Page - 262

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 231 process 2 it starts at 12K. Suppose that the first thing the first function in the library has to do is jump to address 16 in the library. If the library were not shared, it could be relocated on the fly as it was loaded so that the jump (in process 1) could be to virtual address 36K + 16. Note that the physical read more..

  • Page - 263

    232 MEMORY MANAGEMENT CHAP. 3 the process exits, or explicitly unmaps the file, all the modified pages are written back to the file on disk. Mapped files provide an alternative model for I/O. Instead of doing reads and writes, the file can be accessed as a big character array in memory. In some situations, programmers find this model more convenient. If two or more read more..
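On POSIX systems the "big character array" model can be tried with the mmap system call. The sketch below is illustrative only: it maps a file read-only and scans it as an array, without any read calls. The function name `count_byte_mapped` is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how often byte c occurs in a file by mapping the file into the
 * address space and indexing it like an array. Returns -1 on error. */
static long count_byte_mapped(const char *path, char c)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* a zero-length mapping is invalid */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)  /* pages are faulted in on demand */
        if (data[i] == c)
            count++;

    munmap(data, (size_t)st.st_size);
    return count;
}
```

A writable shared mapping (PROT_WRITE with MAP_SHARED) would be needed for modified pages to be written back to the file; the read-only private mapping is used here only to keep the sketch short.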

  • Page - 264

    SEC. 3.5 DESIGN ISSUES FOR PAGING SYSTEMS 233 that is true, but in some advanced systems, programmers have some control over the memory map and can use it in nontraditional ways to enhance program behavior. In this section, we will briefly look at a few of these. One reason for giving programmers control over their memory map is to allow two or more processes to share read more..

  • Page - 265

    234 MEMORY MANAGEMENT CHAP. 3 for them. Space has to be allocated in memory for the page table and it has to be initialized. The page table need not be resident when the process is swapped out but has to be in memory when the process is running. In addition, space has to be allocated in the swap area on disk so that when a page is swapped out, it has some- read more..

  • Page - 266

    SEC. 3.6 IMPLEMENTATION ISSUES 235 must retrieve the program counter, fetch the instruction, and parse it in software to figure out what it was doing when the fault hit. 4. Once the virtual address that caused the fault is known, the system checks to see if this address is valid and the protection is consistent with the access. If not, the process is sent a signal or read more..

  • Page - 267

    236 MEMORY MANAGEMENT CHAP. 3 is 6 bytes, for example (see Fig. 3-27). In order to restart the instruction, the operating system must determine where the first byte of the instruction is. The value of the program counter at the time of the trap depends on which operand faulted and how the CPU’s microcode has been implemented. MOVE 6 2 1000 1002 1004 Opcode First operand read more..

  • Page - 268

    SEC. 3.6 IMPLEMENTATION ISSUES 237 3.6.4 Locking Pages in Memory Although we have not discussed I/O much in this chapter, the fact that a computer has virtual memory does not mean that I/O is absent. Virtual memory and I/O interact in subtle ways. Consider a process that has just issued a system call to read from some file or device into a buffer within its address read more..

  • Page - 269

    238 MEMORY MANAGEMENT CHAP. 3 However, this simple model has a problem: processes can increase in size after starting. Although the program text is usually fixed, the data area can sometimes grow, and the stack can always grow. Consequently, it may be better to reserve separate swap areas for the text, data, and stack and allow each of these areas to consist of read more..

  • Page - 270

    SEC. 3.6 IMPLEMENTATION ISSUES 239 room for one disk address per virtual page) is updated accordingly. A page in memory has no copy on disk. The pages’ entries in the disk map contain an invalid disk address or a bit marking them as not in use. Having a fixed swap partition is not always possible. For example, no disk partitions may be available. In this case, one read more..

  • Page - 271

    240 MEMORY MANAGEMENT CHAP. 3 Figure 3-29. Page fault handling with an external pager. The steps shown are: 1. Page fault. 2. Needed page. 3. Request page. 4. Page arrives. 5. Here is page. 6. Map page in. (The figure shows the user process and external pager in user space, the fault handler and MMU handler in kernel space, plus main memory and the disk.) This implementation leaves open where the page read more..

  • Page - 272

    SEC. 3.7 SEGMENTATION 241 1. The source text being saved for the printed listing (on batch systems). 2. The symbol table, containing the names and attributes of variables. 3. The table containing all the integer and floating-point constants used. 4. The parse tree, containing the syntactic analysis of the program. 5. The stack used for procedure calls within the compiler. Each of read more..

  • Page - 273

    242 MEMORY MANAGEMENT CHAP. 3 maximum address allowed. Different segments may, and usually do, have different lengths. Moreover, segment lengths may change during execution. The length of a stack segment may be increased whenever something is pushed onto the stack and decreased whenever something is popped off the stack. Because each segment constitutes a separate address space, read more..

  • Page - 274

    SEC. 3.7 SEGMENTATION 243 If the procedure in segment n is subsequently modified and recompiled, no other procedures need be changed (because no starting addresses have been modified), even if the new version is larger than the old one. With a one-dimensional memory, the procedures are packed tightly right up next to each other, with no address space between them. read more..

  • Page - 275

    244 MEMORY MANAGEMENT CHAP. 3 Consideration Paging Segmentation Need the programmer be aware that this technique is being used? How many linear address spaces are there? Can the total address space exceed the size of physical memory? Can procedures and data be distinguished and separately protected? Can tables whose size fluctuates be accommodated easily? Is sharing of procedures between users read more..

  • Page - 276

    SEC. 3.7 SEGMENTATION 245 (c) (b) (a) (d) (e) Segment 0 (4K) Segment 7 (5K) Segment 2 (5K) Segment 5 (4K) (3K) Segment 3 (8K) Segment 6 (4K) (3K) Segment 0 (4K) Segment 7 (5K) Segment 2 (5K) Segment 3 (8K) (3K) Segment 2 (5K) Segment 0 (4K) Segment 1 (8K) Segment 4 (7K) Segment 4 (7K) Segment 3 (8K) Segment 0 (4K) Segment 7 (5K) Segment 2 (5K) (3K) Segment 5 (4K) (3K) (4K) Segment 0 (4K) Segment read more..

  • Page - 277

    246 MEMORY MANAGEMENT CHAP. 3 (a) (b) Main memory address of the page table Segment length (in pages) 18 9 1 1 1 3 3 Page size: 0 = 1024 words 1 = 64 words 0 = segment is paged 1 = segment is not paged Miscellaneous bits Protection bits Segment 6 descriptor Segment 5 descriptor Segment 4 descriptor Segment 3 descriptor Segment 2 descriptor Segment 1 descriptor Segment 0 descriptor read more..

  • Page - 278

    SEC. 3.7 SEGMENTATION 247 3. The page table entry for the requested virtual page was examined. If the page itself was not in memory, a page fault was triggered. If it was in memory, the main-memory address of the start of the page was extracted from the page table entry. 4. The offset was added to the page origin to give the main memory address where the word was read more..

  • Page - 279

    248 MEMORY MANAGEMENT CHAP. 3 Segment number Page number Offset Descriptor segment Segment number Page number MULTICS virtual address Page table Page Word Offset Descriptor Page frame Figure 3-36. Conversion of a two-part MULTICS address into a main memory address. Segment number Virtual page Page frame Comparison field Protection Age Is this entry used? 4 6 12 2 2 1 0 3 1 2 7 2 1 0 12 Read/write Read read more..

  • Page - 280

    SEC. 3.7 SEGMENTATION 249 mechanisms are still available in x86-64’s native mode, mostly for compatibility, they no longer serve the same role and no longer offer true segmentation. The x86-32, however, still comes equipped with the whole shebang and it is the CPU we will discuss in this section. The heart of the x86 virtual memory consists of two tables, called the LDT read more..

  • Page - 281

    250 MEMORY MANAGEMENT CHAP. 3 Privilege level (0-3) Relative address 0 4 Base 0-15 Limit 0-15 Base 24-31 Base 16-23 Limit 16-19 G D 0 P DPL Type 0: Li is in bytes 1: Li is in pages 0: 16-Bit segment 1: 32-Bit segment 0: Segment is absent from memory 1: Segment is present in memory Segment type and protection S 0: System 1: Application 32 Bits Figure 3-39. x86 code segment read more..

  • Page - 282

    SEC. 3.7 SEGMENTATION 251 If paging is disabled (by a bit in a global control register), the linear address is interpreted as the physical address and sent to the memory for the read or write. Thus with paging disabled, we have a pure segmentation scheme, with each segment’s base address given in its descriptor. Segments are not prevented from overlapping, probably read more..

  • Page - 283

    252 MEMORY MANAGEMENT CHAP. 3 Each page table has entries for 1024 4-KB page frames, so a single page table handles 4 megabytes of memory. A segment shorter than 4M will have a page directory with a single entry, a pointer to its one and only page table. In this way, the overhead for short segments is only two pages, instead of the million pages that would be read more..

  • Page - 284

    SEC. 3.8 RESEARCH ON MEMORY MANAGEMENT 253 paging for performance (Lee et al., 2013), and latency reasons (Saito and Oikawa, 2012), and because they wear out if used too much (Bheda et al., 2011, 2012). More generally, research on paging is still ongoing, but it focuses on newer kinds of systems. For example, virtual machines have rekindled interest in memory management read more..

  • Page - 285

    254 MEMORY MANAGEMENT CHAP. 3 protection for different segments. Sometimes segmentation and paging are combined to provide a two-dimensional virtual memory. The MULTICS system and the 32-bit Intel x86 support segmentation and paging. Still, it is clear that few operating system developers care deeply about segmentation (because they are married to a different memory model). read more..

  • Page - 286

    CHAP. 3 PROBLEMS 255 9. What kind of hardware support is needed for a paged virtual memory to work? 10. Copy on write is an interesting idea used on server systems. Does it make any sense on a smartphone? 11. Consider the following C program: int X[N]; int step = M; /* M is some predefined constant */ for (int i = 0; i < N; i += step) X[i] = X[i] + 1; read more..

  • Page - 287

    256 MEMORY MANAGEMENT CHAP. 3 18. Section 3.3.4 states that the Pentium Pro extended each entry in the page table hierarchy to 64 bits but could still address only 4 GB of memory. Explain how this statement can be true when page table entries have 64 bits. 19. A computer with a 32-bit address uses a two-level page table. Virtual addresses are split into a read more..

  • Page - 288

    CHAP. 3 PROBLEMS 257 (a) Why will the standard replacement algorithms (LRU, FIFO, clock) not be effective in handling this workload for a page allocation that is less than the sequence length? (b) If this program were allocated 500 page frames, describe a page replacement approach that would perform much better than the LRU, FIFO, or clock algorithms. 28. If FIFO page read more..

  • Page - 289

    258 MEMORY MANAGEMENT CHAP. 3 35. How long does it take to load a 64-KB program from a disk whose average seek time is 5 msec, whose rotation time is 5 msec, and whose tracks hold 1 MB (a) for a 2-KB page size? (b) for a 4-KB page size? The pages are spread randomly around the disk and the number of cylinders is so large that the chance of two pages being on read more..

  • Page - 290

    CHAP. 3 PROBLEMS 259 39. You have been hired by a cloud computing company that deploys thousands of servers at each of its data centers. They have recently heard that it would be worthwhile to handle a page fault at server A by reading the page from the RAM memory of some other server rather than its local disk drive. (a) How could that be done? (b) Under read more..

  • Page - 291

    260 MEMORY MANAGEMENT CHAP. 3 paged virtual memory system with virtual addresses that have a 4-bit page number, and a 10-bit offset. The page tables and protection are as follows (all numbers in the table are in decimal): Segment 0 Segment 1 Read/Execute Read/Write Virtual Page # Page frame # Virtual Page # Page frame # 0 2 0 On Disk 1 On Disk 1 14 2 112 read more..

  • Page - 292

    CHAP. 3 PROBLEMS 261 realistic), and process termination and creation are ignored (eternal life). The inputs will be: • The reclamation age threshold • The clock interrupt interval expressed as number of memory references • A file containing the sequence of page references (a) Describe the basic data structures and algorithms in your implementation. (b) Show that your simulation read more..

  • Page - 293

    262 MEMORY MANAGEMENT CHAP. 3 (a) Describe the basic data structures and algorithms in your implementation. (b) Show that your simulation behaves as expected for a simple (but nontrivial) input example. (c) Plot the number of TLB updates per 1000 references. read more..

  • Page - 294

    4 FILE SYSTEMS All computer applications need to store and retrieve information. While a process is running, it can store a limited amount of information within its own address space. However, the storage capacity is restricted to the size of the virtual address space. For some applications this size is adequate, but for others, such as airline reservations, banking, or read more..

  • Page - 295

    264 FILE SYSTEMS CHAP. 4 moving parts that may break. Also, they offer fast random access. Tapes and optical disks have also been used extensively, but they have much lower performance and are typically used for backups. We will study disks more in Chap. 5, but for the moment, it is sufficient to think of a disk as a linear sequence of fixed-size blocks and read more..

  • Page - 296

    SEC. 4.1 FILES 265 or bitmaps are used to keep track of free storage and how many sectors there are in a logical disk block are of no interest, although they are of great importance to the designers of the file system. For this reason, we have structured the chapter as several sections. The first two are concerned with the user interface to files and directories, read more..

  • Page - 297

    266 FILE SYSTEMS CHAP. 4 fact, there is a second file system for Windows 8, known as ReFS (or Resilient File System), but it is targeted at the server version of Windows 8. In this chapter, when we refer to the MS-DOS or FAT file systems, we mean FAT-16 and FAT-32 as used on Windows unless specified otherwise. We will discuss the FAT file systems later in read more..

  • Page - 298

    SEC. 4.1 FILES 267 insist that files it is to compile end in .c, and it may refuse to compile them if they do not. However, the operating system does not care. Conventions like this are especially useful when the same program can handle several different kinds of files. The C compiler, for example, can be given a list of several files to compile and link together, read more..

  • Page - 299

    268 FILE SYSTEMS CHAP. 4 unusual things, the latter can be very important. All versions of UNIX (including Linux and OS X) and Windows use this file model. The first step up in structure is illustrated in Fig. 4-2(b). In this model, a file is a sequence of fixed-length records, each with some internal structure. Central to the idea of a file being a sequence of records is read more..

  • Page - 300

    SEC. 4.1 FILES 269 The great advantage of ASCII files is that they can be displayed and printed as is, and they can be edited with any text editor. Furthermore, if large numbers of programs use ASCII files for input and output, it is easy to connect the output of one program to the input of another, as in shell pipelines. (The interprocess plumbing is not any easier, read more..

  • Page - 301

    270 FILE SYSTEMS CHAP. 4 Figure 4-3. (a) An executable file, consisting of a header (magic number, text size, data size, BSS size, symbol table size, entry point, flags) followed by the text, data, relocation bits, and symbol table. (b) An archive, consisting of a sequence of object modules, each preceded by a header (module name, date, owner, protection, size). 4.1.4 File Access Early operating systems provided only one kind of file access: sequential read more..

  • Page - 302

    SEC. 4.1 FILES 271 Random access files are essential for many applications, for example, database systems. If an airline customer calls up and wants to reserve a seat on a particular flight, the reservation program must be able to access the record for that flight without having to read the records for thousands of other flights first. Two methods can be used for read more..

  • Page - 303

    272 FILE SYSTEMS CHAP. 4
    Attribute        Meaning
    Protection       Who can access the file and in what way
    Password         Password needed to access the file
    Creator          ID of the person who created the file
    Owner            Current owner
    Read-only flag   0 for read/write; 1 for read only
    Hidden flag      0 for normal; 1 for do not display in listings
    System flag      0 for normal files; 1 for system file
    Archive flag     read more..

  • Page - 304

    SEC. 4.1 FILES 273 maximum number of open files on processes. A disk is written in blocks, and closing a file forces writing of the file’s last block, even though that block may not be entirely full yet. 5. Read. Data are read from file. Usually, the bytes come from the current position. The caller must specify how many data are needed and must also provide a read more..

  • Page - 305

    274 FILE SYSTEMS CHAP. 4
    /* File copy program. Error checking and reporting is minimal. */
    #include <sys/types.h>                /* include necessary header files */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>
    int main(int argc, char *argv[]);     /* ANSI prototype */
    #define BUF_SIZE 4096                 /* use a buffer size of 4096 bytes */
    #define OUTPUT_MODE 0700              /* protection bits read more..

  • Page - 306

    SEC. 4.1 FILES 275 The four #include statements near the top of the program cause a large number of definitions and function prototypes to be included in the program. These are needed to make the program conformant to the relevant international standards, but will not concern us further. The next line is a function prototype for main, something required by ANSI C, but also read more..

  • Page - 307

    276 FILE SYSTEMS CHAP. 4 number of bytes actually read. Normally, this will be 4096, except if fewer bytes are remaining in the file. When the end of the file has been reached, it will be 0. If rd_count is ever zero or negative, the copying cannot continue, so the break statement is executed to terminate the (otherwise endless) loop. The call to write outputs the read more..
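The read/write loop described here can be sketched as a standalone function. This is not the book's program, which is only partly visible in this excerpt; the function name `copy_fd` and its return-code convention are hypothetical, and a small inner loop for short writes has been added as an assumption about what a careful version would do.

```c
#include <unistd.h>

#define BUF_SIZE 4096   /* same buffer size as in the text */

/* Copy everything from in_fd to out_fd.
 * Returns 0 at end of file, -1 on a read or write error. */
static int copy_fd(int in_fd, int out_fd)
{
    char buffer[BUF_SIZE];

    for (;;) {
        ssize_t rd_count = read(in_fd, buffer, BUF_SIZE);
        if (rd_count == 0)
            return 0;                   /* end of file reached */
        if (rd_count < 0)
            return -1;                  /* read error */

        ssize_t done = 0;
        while (done < rd_count) {       /* write may accept fewer bytes */
            ssize_t wt_count = write(out_fd, buffer + done,
                                     (size_t)(rd_count - done));
            if (wt_count <= 0)
                return -1;              /* write error */
            done += wt_count;
        }
    }
}
```

A caller would open the source with open(path, O_RDONLY), create the destination with creat(path, OUTPUT_MODE), pass both descriptors to `copy_fd`, and close them afterward.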

  • Page - 308

    SEC. 4.2 DIRECTORIES 277 Figure 4-6. A single-level directory system containing four files (A, B, C, and D, all in the root directory). Consequently, a way is needed to group related files together. A professor, for example, might have a collection of files that together form a book that he is writing, a second collection containing student programs submitted for another course, a third group read more..

  • Page - 309

    278 FILE SYSTEMS CHAP. 4 root directory to the file. As an example, the path /usr/ast/mailbox means that the root directory contains a subdirectory usr, which in turn contains a subdirectory ast, which contains the file mailbox. Absolute path names always start at the root directory and are unique. In UNIX the components of the path are separated by /. In Windows the separator read more..

  • Page - 310

    SEC. 4.2 DIRECTORIES 279 work since its assumption about where it is may now suddenly be invalid. For this reason, library procedures rarely change the working directory, and when they must, they always change it back again before returning. Most operating systems that support a hierarchical directory system have two special entries in every directory, ‘‘.’’ and ‘‘..’’, read more..

  • Page - 311

    280 FILE SYSTEMS CHAP. 4 that directory. Of course, a more normal way to do the copy would be to use the full absolute path name of the source file: cp /usr/lib/dictionary . Here the use of dot saves the user the trouble of typing dictionary a second time. Nevertheless, typing cp /usr/lib/dictionary dictionary also works fine, as does cp /usr/lib/dictionary /usr/ast/dictionary read more..

  • Page - 312

    SEC. 4.2 DIRECTORIES 281 name, and creates a link from the existing file to the name specified by the path. In this way, the same file may appear in multiple directories. A link of this kind, which increments the counter in the file’s i-node (to keep track of the number of directory entries containing the file), is sometimes called a hard link. 8. Unlink. A directory read more..

  • Page - 313

    282 FILE SYSTEMS CHAP. 4 partition starts with a boot block, even if it does not contain a bootable operating system. Besides, it might contain one in the future. Other than starting with a boot block, the layout of a disk partition varies a lot from file system to file system. Often the file system will contain some of the items shown in Fig. 4-9. The first one is read more..

  • Page - 314

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 283 was empty. Then a file A, of length four blocks, was written to disk starting at the beginning (block 0). After that a six-block file, B, was written starting right after the end of file A. Note that each file begins at the start of a new block, so that if file A was really 3½ blocks, some space is wasted at the end of the last read more..

  • Page - 315

    284 FILE SYSTEMS CHAP. 4 would take hours or even days with large disks. As a result, the disk ultimately consists of files and holes, as illustrated in the figure. Initially, this fragmentation is not a problem, since each new file can be written at the end of disk, following the previous one. However, eventually the disk will fill up and it will become necessary to read more..

  • Page - 316

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 285 File A: file blocks 0 through 4 are stored in physical blocks 4, 7, 2, 10, and 12; the pointer in the last block is 0. File B: file blocks 0 through 3 are stored in physical blocks 6, 3, 11, and 14; the pointer in the last block is 0. Figure 4-11. Storing a file as a linked list of disk blocks. Unlike contiguous allocation, every disk block can be used in this method. No space is lost to disk read more..

  • Page - 317

    286 FILE SYSTEMS CHAP. 4 Table entries for physical blocks 0 through 15 (physical block → next block): 2 → 10, 3 → 11, 4 → 7 (File A starts here), 6 → 3 (File B starts here), 7 → 2, 10 → 12, 11 → 14, 12 → -1, 14 → -1; the remaining entries belong to unused blocks. Figure 4-12. Linked-list allocation using a file-allocation table in main memory. Using this organization, the entire block is available for data. Furthermore, random access is much easier. Although the chain must still be read more..
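Following a chain in such a table can be sketched as below. The table contents are the entries of Fig. 4-12 as far as they can be read from this excerpt (file A starting at block 4, file B at block 6, and -1 marking the end of a chain); storing 0 in the entries of unused blocks is an assumption of this sketch, and the names `fat`, `FAT_EOF`, and `fat_lookup` are hypothetical.

```c
#define FAT_EOF (-1)    /* end-of-chain marker, as in the figure */

/* In-memory file-allocation table for physical blocks 0..15.
 * Entries not listed belong to unused blocks and default to 0. */
static const int fat[16] = {
    [2] = 10, [3] = 11, [4] = 7, [6] = 3, [7] = 2,
    [10] = 12, [11] = 14, [12] = FAT_EOF, [14] = FAT_EOF
};

/* Return the physical block holding file block n of the file whose
 * first block is `start`, or FAT_EOF if the file is shorter than that.
 * The chain is followed entirely in memory, with no disk references. */
static int fat_lookup(const int table[], int start, int n)
{
    int block = start;
    while (n-- > 0 && block != FAT_EOF)
        block = table[block];
    return block;
}
```

With this table, file block 2 of file A is found in physical block 2 after two in-memory steps (4 → 7 → 2), and file block 3 of file B is in physical block 14.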

  • Page - 318

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 287 The big advantage of this scheme over linked files using an in-memory table is that the i-node need be in memory only when the corresponding file is open. If each i-node occupies n bytes and a maximum of k files may be open at once, the total memory occupied by the array holding the i-nodes for the open files is only kn bytes. read more..

  • Page - 319

    288 FILE SYSTEMS CHAP. 4 4.3.3 Implementing Directories Before a file can be read, it must be opened. When a file is opened, the operating system uses the path name supplied by the user to locate the directory entry on the disk. The directory entry provides the information needed to find the disk blocks. Depending on the system, this information may be the disk address read more..

  • Page - 320

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 289 The simplest approach is to set a limit on file-name length, typically 255 characters, and then use one of the designs of Fig. 4-14 with 255 characters reserved for each file name. This approach is simple, but wastes a great deal of directory space, since few files have such long names. For efficiency reasons, a different structure is read more..

  • Page - 321

    290 FILE SYSTEMS CHAP. 4 only now compacting the directory is feasible because it is entirely in memory. Another problem is that a single directory entry may span multiple pages, so a page fault may occur while reading a file name. Another way to handle variable-length names is to make the directory entries themselves all fixed length and keep the file names together in a read more..

  • Page - 322

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 291 link. The file system itself is now a Directed Acyclic Graph, or DAG, rather than a tree. Having the file system be a DAG complicates maintenance, but such is life. [Figure: directory tree under the root directory, with directories A, B, and C and one shared file.] Figure 4-16. File system containing a shared file. Sharing files is convenient, but it also introduces some problems. read more..

  • Page - 323

    292 FILE SYSTEMS CHAP. 4 [Figure: B's and C's directories and the i-node they point to, with Owner = C throughout: (a) Count = 1, (b) Count = 2, (c) Count = 1.] Figure 4-17. (a) Situation prior to linking. (b) After the link is created. (c) After the original owner removes the file. an invalid i-node. If the i-node is later reassigned to another file, B’s link will point to read more..

  • Page - 324

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 293 in a directory and its subdirectories onto a tape may make multiple copies of a linked file. Furthermore, if the tape is then read into another machine, unless the dump program is clever, the linked file will be copied twice onto the disk, instead of being linked. 4.3.5 Log-Structured File Systems Changes in technology are putting pressure read more..

  • Page - 325

    294 FILE SYSTEMS CHAP. 4 thus contain i-nodes, directory blocks, and data blocks, all mixed together. At the start of each segment is a segment summary, telling what can be found in the segment. If the average segment can be made to be about 1 MB, almost the full bandwidth of the disk can be utilized. In this design, i-nodes still exist and even have the same read more..

  • Page - 326

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 295 4.3.6 Journaling File Systems While log-structured file systems are an interesting idea, they are not widely used, in part due to their being highly incompatible with existing file systems. Nevertheless, one of the ideas inherent in them, robustness in the face of failure, can be easily applied to more conventional file systems. The basic idea read more..

  • Page - 327

    296 FILE SYSTEMS CHAP. 4 To make journaling work, the logged operations must be idempotent, which means they can be repeated as often as necessary without harm. Operations such as “Update the bitmap to mark i-node k or block n as free” can be repeated until the cows come home with no danger. Similarly, searching a directory and removing any entry called foobar is read more..
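A small sketch of the distinction, using hypothetical in-memory structures. Marking a block free in a bitmap gives the same state no matter how often it is replayed; appending the block to a free list is shown only for contrast, since replaying it changes the result.

```c
#include <stdint.h>

#define NBLOCKS 64

/* Hypothetical free-block bitmap: bit n set means block n is free. */
typedef struct { uint64_t free_bits; } bitmap_t;

/* Idempotent: replaying this any number of times yields the same state. */
void mark_free(bitmap_t *bm, unsigned n)
{
    if (n < NBLOCKS)
        bm->free_bits |= (uint64_t)1 << n;
}

/* Hypothetical free list, for contrast. */
typedef struct { unsigned blocks[NBLOCKS]; unsigned count; } free_list_t;

/* Not idempotent: replaying adds the same block a second time. */
void append_free(free_list_t *fl, unsigned n)
{
    if (fl->count < NBLOCKS)
        fl->blocks[fl->count++] = n;
}
```

A journal replayed after a crash may repeat operations that had already reached the disk, which is why only the first kind of operation is safe to log as-is.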

  • Page - 328

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 297 1986), most UNIX systems have used the concept of a VFS (virtual file system) to try to integrate multiple file systems into an orderly structure. The key idea is to abstract out that part of the file system that is common to all file systems and put that code in a separate layer that calls the underlying concrete file systems to read more..

  • Page - 329

    298 FILE SYSTEMS CHAP. 4 normally supported. These include the superblock (which describes a file system), the v-node (which describes a file), and the directory (which describes a file system directory). Each of these has associated operations (methods) that the concrete file systems must support. In addition, the VFS has some internal data structures for its own use, including read more..

  • Page - 330

    SEC. 4.3 FILE-SYSTEM IMPLEMENTATION 299 are shown in Fig. 4-19. Starting with the caller’s process number and the file descriptor, successively the v-node, read function pointer, and access function within the concrete file system are located. [Figure: process table entry, file descriptors, v-nodes, and function pointers (open, read, write) inside the VFS; a call from the VFS into FS 1 reaches the read function of FS 1.] Figure read more..
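The lookup chain described above (descriptor, v-node, function pointer, concrete file system) can be sketched as follows. All type and function names here (`file_ops`, `vnode`, `fs1_read`, `vfs_read`, `fd_table`) are hypothetical and far simpler than any real kernel's; "FS 1" is a toy file system whose files live in memory.

```c
#include <stddef.h>
#include <string.h>

struct vnode;

/* Hypothetical operations table that a concrete file system registers. */
struct file_ops {
    long (*read)(struct vnode *vn, void *buf, size_t len, size_t off);
};

/* Hypothetical v-node: per-file state plus a pointer to the FS's functions. */
struct vnode {
    const struct file_ops *ops;
    const char *data;   /* stand-in for the file's on-disk contents */
    size_t size;
};

/* "FS 1": read from the in-memory stand-in, clipping at end of file. */
static long fs1_read(struct vnode *vn, void *buf, size_t len, size_t off)
{
    if (off >= vn->size)
        return 0;
    if (len > vn->size - off)
        len = vn->size - off;
    memcpy(buf, vn->data + off, len);
    return (long)len;
}

static const struct file_ops fs1_ops = { fs1_read };

#define MAX_FD 8
static struct vnode *fd_table[MAX_FD];   /* per-process descriptor table */

/* VFS layer: descriptor -> v-node -> function pointer -> concrete FS. */
long vfs_read(int fd, void *buf, size_t len, size_t off)
{
    if (fd < 0 || fd >= MAX_FD || fd_table[fd] == NULL)
        return -1;
    struct vnode *vn = fd_table[fd];
    return vn->ops->read(vn, buf, len, off);
}
```

The VFS code never names `fs1_read` directly; supporting another file system would only mean supplying another `file_ops` table.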

  • Page - 331

    300 FILE SYSTEMS CHAP. 4 4.4.1 Disk-Space Management Files are normally stored on disk, so management of disk space is a major concern to file-system designers. Two general strategies are possible for storing an n-byte file: n consecutive bytes of disk space are allocated, or the file is split up into a number of (not necessarily) contiguous blocks. The same trade-off is read more..

  • Page - 332

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 301
    Length   VU 1984   VU 2005   Web      Length   VU 1984   VU 2005   Web
    1        1.79      1.38      6.67     16 KB    92.53     78.92     86.79
    2        1.88      1.53      7.67     32 KB    97.21     85.87     91.65
    4        2.01      1.65      8.33     64 KB    99.18     90.84     94.80
    8        2.31      1.80      11.30    128 KB   99.84     93.73     96.93
    16       3.32      2.15      11.46    256 KB   99.96     96.12     98.48
    32       5.13      3.15      12.33    512 KB   100.00    97.73     98.99
    64       8.71      4.98      read more..

  • Page - 333

    302 FILE SYSTEMS CHAP. 4 [Figure: plot of data rate (MB/sec, 0 to 60) and disk-space utilization (0% to 100%) against block size from 1 KB to 1 MB.] Figure 4-21. The dashed curve (left-hand scale) gives the data rate of a disk. The solid curve (right-hand scale) gives the disk-space efficiency. All files are 4 KB. Hence the data rate goes up almost linearly with block size read more..

  • Page - 334

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 303 Keeping Track of Free Blocks Once a block size has been chosen, the next issue is how to keep track of free blocks. Two methods are widely used, as shown in Fig. 4-22. The first one consists of using a linked list of disk blocks, with each block holding as many free disk block numbers as will fit. With a 1-KB read more..

  • Page - 335

    304 FILE SYSTEMS CHAP. 4 consecutive free blocks. In the best case, a basically empty disk could be represented by two numbers: the address of the first free block followed by the count of free blocks. On the other hand, if the disk becomes severely fragmented, keeping track of runs is less efficient than keeping track of individual blocks because not only must the read more..

  • Page - 336

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 305 [Figure: three panels, (a) to (c), each showing main memory and the disk.] Figure 4-23. (a) An almost-full block of pointers to free disk blocks in memory and three blocks of pointers on disk. (b) Result of freeing a three-block file. (c) An alternative strategy for handling the three free blocks. The shaded entries represent pointers to free disk blocks. Since the bitmap is read more..

  • Page - 337

    306 FILE SYSTEMS CHAP. 4 [Figure: an entry in the open file table (attributes, disk addresses, User = 8, quota pointer) points to the quota record for user 8 in the quota table. The record holds: soft block limit, hard block limit, current # of blocks, # block warnings left, soft file limit, hard file limit, current # of files, # file warnings left.] Figure 4-24. Quotas are kept track of on a per-user basis in a quota table. If either limit has been read more..

  • Page - 338

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 307 Most people do not think making backups of their files is worth the time and effort—until one fine day their disk abruptly dies, at which time most of them undergo a deathbed conversion. Companies, however, (usually) well understand the value of their data and generally do a backup at least once a day, often to tape. read more..

  • Page - 339

    308 FILE SYSTEMS CHAP. 4 Third, since immense amounts of data are typically dumped, it may be desirable to compress the data before writing them to tape. However, with many compression algorithms, a single bad spot on the backup tape can foil the decompression algorithm and make an entire file or even an entire tape unreadable. Thus the decision to compress the read more..

  • Page - 340

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 309 However, sometimes blocks go bad after formatting, in which case the operating system will eventually detect them. Usually, it solves the problem by creating a “file” consisting of all the bad blocks, just to make sure they never appear in the free-block pool and are never assigned. Needless to say, this file is read more..

  • Page - 341

    310 FILE SYSTEMS CHAP. 4 [Figure: tree of directories and files numbered 1 to 32. Legend: root directory, directory that has not changed, file that has changed, file that has not changed.] Figure 4-25. A file system to be dumped. The squares are directories and the circles are files. The shaded items have been modified since the last dump. Each di- read more..

  • Page - 342

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 311 [Figure: rows labeled (a) to (d), each indexed by the numbers 1 to 32.] read more..

  • Page - 343

    312 FILE SYSTEMS CHAP. 4 4.4.3 File-System Consistency Another area where reliability is an issue is file-system consistency. Many file systems read blocks, modify them, and write them out later. If the system crashes before all the modified blocks have been written out, the file system can be left in an inconsistent state. This problem is especially critical if some of the read more..

  • Page - 344

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 313
    Block number        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    (a) Blocks in use   1  1  0  1  0  1  1  1  1  0  0  1  1  1  0  0
        Free blocks     0  0  1  0  1  0  0  0  0  1  1  0  0  0  1  1
    (c) Blocks in use   1  1  0  1  0  1  1  1  1  0  0  1  1  1  0  0
        Free blocks     0  0  1  0  2  0  0  0  0  1  1  0  0  0  1  1
    1 1 0 1 0 1 1 1 1 read more..
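A sketch of the block check that these two counter tables support, assuming one counter per block for occurrences in files and one for occurrences in the free list. The category names and the functions `classify` and `count_inconsistent` are illustrative, not taken from the excerpt.

```c
enum block_state { CONSISTENT, MISSING, DUP_FREE, DUP_IN_USE, USED_AND_FREE };

/* Classify one block from its two counters: how many times it appears
 * in files (in_use) and how many times in the free list (free_cnt). */
enum block_state classify(int in_use, int free_cnt)
{
    if (in_use + free_cnt == 0)
        return MISSING;              /* in neither table */
    if (in_use > 1)
        return DUP_IN_USE;           /* same block in two files */
    if (in_use == 1 && free_cnt >= 1)
        return USED_AND_FREE;        /* in a file and on the free list */
    if (free_cnt > 1)
        return DUP_FREE;             /* listed as free more than once */
    return CONSISTENT;               /* exactly one 1 across both tables */
}

/* Count the blocks whose counters are not consistent. */
int count_inconsistent(const int in_use[], const int free_cnt[], int n)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (classify(in_use[i], free_cnt[i]) != CONSISTENT)
            bad++;
    return bad;
}
```

With the counters of panel (a) every block is consistent; with those of panel (c), block 4 is reported as a duplicate in the free list.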

  • Page - 345

    314 FILE SYSTEMS CHAP. 4 file system marks it as unused and releases all of its blocks. This action will result in one of the directories now pointing to an unused i-node, whose blocks may soon be assigned to other files. Again, the solution is just to force the link count in the i-node to the actual number of directory entries. These two operations, checking blocks read more..

  • Page - 346

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 315 Caching The most common technique used to reduce disk accesses is the block cache or buffer cache. (Cache is pronounced “cash” and is derived from the French cacher, meaning to hide.) In this context, a cache is a collection of blocks that logically belong on the disk but are being kept in memory for performance read more..

  • Page - 347

    316 FILE SYSTEMS CHAP. 4 the crashes and file-system consistency discussed in the previous section. If a critical block, such as an i-node block, is read into the cache and modified, but not rewritten to the disk, a crash will leave the file system in an inconsistent state. If the i-node block is put at the end of the LRU chain, it may be quite a while before it read more..

  • Page - 348

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 317 in which all modified blocks are written back to the disk immediately are called write-through caches. They require more disk I/O than nonwrite-through caches. The difference between these two approaches can be seen when a program writes a 1-KB block full, one character at a time. UNIX will collect all the characters in the read more..

  • Page - 349

    318 FILE SYSTEMS CHAP. 4 benefit of the doubt and put in sequential-access mode. However, whenever a seek is done, the bit is cleared. If sequential reads start happening again, the bit is set once again. In this way, the file system can make a reasonable guess about whether it should read ahead or not. If it gets it wrong once in a while, it is not a disas- read more..
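The heuristic can be sketched with one bit per open file. The structure and function names below are hypothetical, and a seek is inferred here from a read that does not continue where the previous one stopped, which is a simplification.

```c
#include <stdbool.h>

/* Hypothetical per-open-file state for the read-ahead heuristic. */
struct open_file {
    long next_block;    /* block a sequential reader would ask for next */
    bool sequential;    /* the per-file access-mode bit */
};

void file_open(struct open_file *f)
{
    f->next_block = 0;
    f->sequential = true;           /* benefit of the doubt */
}

/* Record a read of `block`; returns true if block + 1 should be prefetched. */
bool file_read(struct open_file *f, long block)
{
    /* A read that is not where the last one left off implies a seek:
     * the bit is cleared. A read that continues the run sets it again. */
    f->sequential = (block == f->next_block);
    f->next_block = block + 1;
    return f->sequential;
}
```

A wrong guess costs only some wasted disk bandwidth, so a one-bit predictor is enough.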

  • Page - 350

    SEC. 4.4 FILE-SYSTEM MANAGEMENT AND OPTIMIZATION 319 [Figure: two disk layouts, (a) and (b); a cylinder group is marked in (b).] Figure 4-29. (a) I-nodes placed at the start of the disk. (b) Disk divided into cylinder groups, each with its own blocks and i-nodes. For instance, SSDs have peculiar properties when it comes read more..

  • Page - 351

    320 FILE SYSTEMS CHAP. 4 more trouble than it is worth. In some systems, these are fixed-size contiguous areas anyway, so they do not have to be defragmented. The one time when their lack of mobility is a problem is when they happen to be near the end of the partition and the user wants to reduce the partition size. The only way to solve this problem is to read more..

  • Page - 352

    SEC. 4.5 EXAMPLE FILE SYSTEMS 321 file size. File names shorter than 8 + 3 characters are left justified and padded with spaces on the right, in each field separately. The Attributes field is new and contains bits to indicate that a file is read-only, needs to be archived, is hidden, or is a system file. Read-only files cannot be written. This is to protect them from read more..

  • Page - 353

    322 FILE SYSTEMS CHAP. 4 of a misnomer, since only the low-order 28 bits of the disk addresses are used. It should have been called FAT-28, but powers of two sound so much neater. Another variant of the FAT file system is exFAT, which Microsoft introduced for large removable devices. Apple licensed exFAT, so that there is one modern file system that can be read more..

  • Page - 354

    SEC. 4.5 EXAMPLE FILE SYSTEMS 323
    Block size   FAT-12   FAT-16    FAT-32
    0.5 KB       2 MB
    1 KB         4 MB
    2 KB         8 MB     128 MB
    4 KB         16 MB    256 MB    1 TB
    8 KB                  512 MB    2 TB
    16 KB                 1024 MB   2 TB
    32 KB                 2048 MB   2 TB
    Figure 4-31. Maximum partition size for different block sizes. The empty boxes represent forbidden combinations. In addition to supporting larger disks, the FAT-32 file system has two read more..
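The entries in Fig. 4-31 follow from multiplying the number of addressable blocks by the block size. The sketch below assumes 12, 16, and 28 usable address bits for the three variants, and assumes the 2-TB ceiling comes from a 32-bit count of 512-byte sectors; the function name is illustrative, and the function does not know which combinations are forbidden.

```c
#include <stdint.h>

/* Maximum partition size in bytes for a FAT variant.
 * addr_bits:  usable bits in a block address (12, 16, or 28 for FAT-32).
 * block_size: bytes per block.
 * Assumption: sizes are capped at 2^32 sectors of 512 bytes, i.e. 2 TB. */
uint64_t fat_max_partition(unsigned addr_bits, uint64_t block_size)
{
    const uint64_t cap = ((uint64_t)1 << 32) * 512;          /* 2 TB */
    uint64_t size = ((uint64_t)1 << addr_bits) * block_size;
    return size < cap ? size : cap;
}
```

For example, FAT-16 with 2-KB blocks gives 2^16 x 2 KB = 128 MB, and FAT-32 with 4-KB blocks gives 2^28 x 4 KB = 1 TB, matching the table.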

  • Page - 355

    324 FILE SYSTEMS CHAP. 4 characters and can contain any ASCII characters except / (because that is the separator between components in a path) and NUL (because that is used to pad out names shorter than 14 characters). NUL has the numerical value of 0. A UNIX directory entry contains one entry for each file in that directory. Each entry is extremely simple because UNIX read more..

  • Page - 356

    SEC. 4.5 EXAMPLE FILE SYSTEMS 325 [Figure: an i-node holding attributes and disk addresses, with a single indirect block, a double indirect block, and a triple indirect block leading to the addresses of data blocks.] Figure 4-33. A UNIX i-node. an i-node from its number is straightforward, since each one has a fixed location on the disk. From this i-node, the system locates the directory for /usr and looks up the next component, ast, in it. When read more..

  • Page - 357

    326 FILE SYSTEMS CHAP. 4 [Figure: lookup steps. Root directory: looking up usr yields i-node 6. I-node 6 is for /usr and says that /usr is in block 132. Block 132 is the /usr directory: /usr/ast is i-node 26. I-node 26 is for /usr/ast and says that /usr/ast is in block 406. Block 406 is the /usr/ast directory: /usr/ast/mbox is i-node 60. Root directory entries (i-node, name): 1 ., 1 .., 4 bin, 7 dev, 14 lib, 9 etc, 6 usr, 8 tmp. /usr directory i-node numbers: 6 1 19 30 51 26 45, for the names . .. dick erik read more..

  • Page - 358

    SEC. 4.5 EXAMPLE FILE SYSTEMS 327 (although seeks across the spiral are possible). The bits along the spiral are divided into logical blocks (also called logical sectors) of 2352 bytes. Some of these are for preambles, error correction, and other overhead. The payload portion of each logical block is 2048 bytes. When used for music, CDs have leadins, leadouts, and intertrack read more..

  • Page - 359

    328 FILE SYSTEMS CHAP. 4 then people from companies whose products were big endian would have felt like second-class citizens and would not have accepted the standard. The emotional content of a CD-ROM can thus be quantified and measured exactly in kilobytes/hour of wasted space. The format of an ISO 9660 directory entry is illustrated in Fig. 4-35. Since directory entries read more..

  • Page - 360

    SEC. 4.5 EXAMPLE FILE SYSTEMS 329 extension, a semicolon, and a binary version number (1 or 2 bytes). The base name and extension may use uppercase letters, the digits 0–9, and the underscore character. All other characters are forbidden to make sure that every computer can handle every file name. The base name can be up to eight characters; the extension can be up to read more..

  • Page - 361

    330 FILE SYSTEMS CHAP. 4 The extensions use the System use field in order to make Rock Ridge CD-ROMs readable on any computer. All the other fields retain their normal ISO 9660 meaning. Any system not aware of the Rock Ridge extensions just ignores them and sees a normal CD-ROM. The extensions are divided up into the following fields: 1. PX - POSIX attributes. 2. PN - read more..

  • Page - 362

    SEC. 4.5 EXAMPLE FILE SYSTEMS 331 Therefore Microsoft invented some extensions that were called Joliet. They were designed to allow Windows file systems to be copied to CD-ROM and then restored, in precisely the same way that Rock Ridge was designed for UNIX. Virtually all programs that run under Windows and use CD-ROMs support Joliet, including programs that burn CD-recordables. read more..

  • Page - 363

    332 FILE SYSTEMS CHAP. 4 2012; and Vrable et al., 2012). Another area that has been getting attention recently is provenance, that is, keeping track of the history of the data, including where they came from, who owns them, and how they have been transformed (Ghoshal and Plale, 2013; and Sultana and Bertino, 2013). Keeping data safe and useful for decades is also of interest to read more..

  • Page - 364

    CHAP. 4 PROBLEMS 333 3. In early UNIX systems, executable files (a.out files) began with a very specific magic number, not one chosen at random. These files began with a header, followed by the text and data segments. Why do you think a very specific number was chosen for executable files, whereas other file types had a more-or-less random magic number as the first read more..

  • Page - 365

    334 FILE SYSTEMS CHAP. 4 16. Consider the i-node shown in Fig. 4-13. If it contains 10 direct addresses and these were 8 bytes each and all disk blocks were 1024 KB, what would the largest possible file be? 17. For a given class, the student records are stored in a file. The records are randomly accessed and updated. Assume that each student’s record is of read more..

  • Page - 366

    CHAP. 4 PROBLEMS 335 27. Oliver Owl’s night job at the university computing center is to change the tapes used for overnight data backups. While waiting for each tape to complete, he works on writing his thesis that proves Shakespeare’s plays were written by extraterrestrial visitors. His text processor runs on the system being backed up since that is the only one read more..

  • Page - 367

    336 FILE SYSTEMS CHAP. 4 38. Given a disk-block size of 4 KB and block-pointer address value of 4 bytes, what is the largest file size (in bytes) that can be accessed using 10 direct addresses and one indirect block? 39. Files in MS-DOS have to compete for space in the FAT-16 table in memory. If one file uses k entries, that is k entries that are not read more..

  • Page - 368

    5 INPUT/OUTPUT In addition to providing abstractions such as processes, address spaces, and files, an operating system also controls all the computer’s I/O (Input/Output) devices. It must issue commands to the devices, catch interrupts, and handle errors. It should also provide an interface between the devices and the rest of the system that is simple and easy to use. To the read more..

  • Page - 369

    338 INPUT/OUTPUT CHAP. 5 presented to the software: the commands the hardware accepts, the functions it carries out, and the errors that can be reported back. In this book we are concerned with programming I/O devices, not designing, building, or maintaining them, so our interest is in how the hardware is programmed, not how it works inside. Nevertheless, the programming of read more..

  • Page - 370

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 339 I/O devices cover a huge range in speeds, which puts considerable pressure on the software to perform well over many orders of magnitude in data rates. Figure 5-1 shows the data rates of some common devices. Most of these devices tend to get faster as time goes on. Device and data rate: Keyboard, 10 bytes/sec; Mouse, 100 bytes/sec; 56K modem read more..

  • Page - 371

    340 INPUT/OUTPUT CHAP. 5 The interface between the controller and the device is often a very low-level one. A disk, for example, might be formatted with 2,000,000 sectors of 512 bytes per track. What actually comes off the drive, however, is a serial bit stream, starting with a preamble, then the 4096 bits in a sector, and finally a checksum, or ECC (Error-Correcting read more..

  • Page - 372

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 341 each control register is assigned an I/O port number, an 8- or 16-bit integer. The set of all the I/O ports forms the I/O port space, which is protected so that ordinary user programs cannot access it (only the operating system can). Using a special I/O instruction such as IN REG,PORT, the CPU can read in control register PORT and read more..

  • Page - 373

    342 INPUT/OUTPUT CHAP. 5 The x86 uses this architecture, with addresses 640K to 1M − 1 being reserved for device data buffers in IBM PC compatibles, in addition to I/O ports 0 to 64K − 1. How do these schemes actually work in practice? In all cases, when the CPU wants to read a word, either from memory or from an I/O port, it puts the address it needs on the read more..

  • Page - 374

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 343 the loop given above, a fourth instruction has to be added, slightly slowing down the responsiveness of detecting an idle device. In computer design, practically everything involves trade-offs, and that is the case here, too. Memory-mapped I/O also has its disadvantages. First, most computers nowadays have some form of caching of memory read more..

  • Page - 375

    344 INPUT/OUTPUT CHAP. 5 buses. One possibility is to first send all memory references to the memory. If the memory fails to respond, then the CPU tries the other buses. This design can be made to work but requires additional hardware complexity. A second possible design is to put a snooping device on the memory bus to pass all addresses presented to potentially interested read more..

  • Page - 376

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 345 [Figure: CPU, DMA controller (address, count, and control registers), disk controller (buffer, drive), and main memory connected by a bus. Steps: 1. CPU programs the DMA controller. 2. DMA requests transfer to memory. 3. Data transferred. 4. Ack. The DMA controller interrupts the CPU when done.] Figure 5-4. Operation of a DMA transfer. Then the controller causes an interrupt. When the operating system starts running, it can read the disk read more..

  • Page - 377

    346 INPUT/OUTPUT CHAP. 5 use a different device controller. After each word is transferred (steps 2 through 4 in Fig. 5-4), the DMA controller decides which device to service next. It may be set up to use a round-robin algorithm, or it may have a priority scheme designed to favor some devices over others. Multiple requests to different device controllers may be pending at read more..

  • Page - 378

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 347 the controller tried to write data directly to memory, it would have to go over the system bus for each word transferred. If the bus were busy due to some other device using it (e.g., in burst mode), the controller would have to wait. If the next disk word arrived before the previous one had been stored, the controller read more..

  • Page - 379

    348 INPUT/OUTPUT CHAP. 5 the device is just ignored for the moment. In this case it continues to assert an interrupt signal on the bus until it is serviced by the CPU. To handle the interrupt, the controller puts a number on the address lines specifying which device wants attention and asserts a signal to interrupt the CPU. The interrupt signal causes the CPU to read more..

  • Page - 380

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 349 Precise and Imprecise Interrupts Another problem is caused by the fact that most modern CPUs are heavily pipelined and often superscalar (internally parallel). In older systems, after each instruction was finished executing, the microprogram or hardware checked to see if there was an interrupt pending. If so, the program counter and PSW were read more..

  • Page - 381

    350 INPUT/OUTPUT CHAP. 5 However, it must be clear which case applies. Often, if the interrupt is an I/O interrupt, the instruction will not yet have started. However, if the interrupt is really a trap or page fault, then the PC generally points to the instruction that caused the fault so it can be restarted later. The situation of Fig. 5-6(a) illustrates a precise read more..

  • Page - 382

    SEC. 5.1 PRINCIPLES OF I/O HARDWARE 351 point are allowed to have any noticeable effect on the machine state. Here the price is paid not in time, but in chip area and in complexity of the design. If precise interrupts were not required for backward compatibility purposes, this chip area would be available for larger on-chip caches, making the CPU faster. On the other read more..

  • Page - 383

    352 INPUT/OUTPUT CHAP. 5 are not able to deal with the problem should the upper layers be told about it. In many cases, error recovery can be done transparently at a low level without the upper levels even knowing about the error. Still another important issue is that of synchronous (blocking) vs. asynchronous (interrupt-driven) transfers. Most physical I/O is read more..

  • Page - 384

    SEC. 5.2 PRINCIPLES OF I/O SOFTWARE 353 [Figure: panels (a) to (c) showing the string to be printed (ABCD EFGH) in user space and kernel space, and the printed page holding nothing, then A, then AB, with a Next marker.] Figure 5-7. Steps in printing a string. The user process then acquires the printer for writing by making a system call to open it. If the printer is currently in use by another process, this call will fail and return read more..

  • Page - 385

    354 INPUT/OUTPUT CHAP. 5 tight loop, outputting the characters one at a time. The essential aspect of programmed I/O, clearly illustrated in this figure, is that after outputting a character, the CPU continuously polls the device to see if it is ready to accept another one. This behavior is often called polling or busy waiting. copy_from_user(buffer, p, count); /* p is the read more..
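A runnable sketch of programmed I/O against a simulated device. Since no real printer registers are available here, `struct printer`, `read_status`, and `write_data` are hypothetical stand-ins: the fake device reports busy for three status reads after each character, so the cost of busy waiting can be counted.

```c
#include <stddef.h>

#define READY 1
#define BUSY  0

/* Simulated printer; all names are stand-ins for real device registers. */
struct printer {
    int    busy_reads_left;   /* status reads that will still return BUSY */
    char   paper[64];         /* what has been printed so far */
    size_t printed;
};

static int read_status(struct printer *p)
{
    if (p->busy_reads_left > 0) {
        p->busy_reads_left--;
        return BUSY;
    }
    return READY;
}

static void write_data(struct printer *p, char c)
{
    if (p->printed < sizeof p->paper)
        p->paper[p->printed++] = c;
    p->busy_reads_left = 3;   /* device needs time for each character */
}

/* Programmed I/O: poll until ready, output one character, repeat.
 * Returns how many status reads found the device busy (wasted CPU work). */
size_t print_polled(struct printer *p, const char *buf, size_t count)
{
    size_t wasted = 0;
    for (size_t i = 0; i < count; i++) {
        while (read_status(p) != READY)
            wasted++;         /* busy waiting */
        write_data(p, buf[i]);
    }
    return wasted;
}
```

The CPU does nothing useful during the wasted status reads, which is the cost that interrupt-driven I/O is meant to remove.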

  • Page - 386

    SEC. 5.2 PRINCIPLES OF I/O SOFTWARE 355 (a) copy_from_user(buffer, p, count); enable_interrupts( ); while (*printer_status_reg != READY) ; *printer_data_register = p[0]; scheduler( ); (b) if (count == 0) { unblock_user( ); } else { *printer_data_register = p[i]; count = count - 1; i = i + 1; } acknowledge_interrupt( ); return_from_interrupt( ); Figure 5-9. Writing a read more..

  • Page - 387

    356 INPUT/OUTPUT CHAP. 5 5.3 I/O SOFTWARE LAYERS I/O software is typically organized in four layers, as shown in Fig. 5-11. Each layer has a well-defined function to perform and a well-defined interface to the adjacent layers. The functionality and interfaces differ from system to system, so the discussion that follows, which examines all the layers starting at the bottom, is read more..

  • Page - 388

    SEC. 5.3 I/O SOFTWARE LAYERS 357 system dependent, so some of the steps listed below may not be needed on a particular machine, and steps not listed may be required. Also, the steps that do occur may be in a different order on some machines. 1. Save any registers (including the PSW) that have not already been saved by the interrupt hardware. 2. Set up a context read more..

  • Page - 389

    358 INPUT/OUTPUT CHAP. 5 have to know all about sectors, tracks, cylinders, heads, arm motion, motor drives, head settling times, and all the other mechanics of making the disk work properly. Obviously, these drivers will be very different. Consequently, each I/O device attached to a computer needs some device-specific code for controlling it. This code, called the device driver, read more..

  • Page - 390

    SEC. 5.3 I/O SOFTWARE LAYERS 359 does and how it interacts with the rest of the operating system. Device drivers are normally positioned below the rest of the operating system, as is illustrated in Fig. 5-12. User space Kernel space User process User program Rest of the operating system Printer driver Camcorder driver CD-ROM driver Printer controller Hardware Devices Camcorder controller CD-ROM read more..

  • Page - 391

    360 INPUT/OUTPUT CHAP. 5 with UNIX systems because they were run by computer centers and I/O devices rarely changed. If a new device was added, the system administrator simply recompiled the kernel with the new driver to build a new binary. With the advent of personal computers, with their myriad I/O devices, this model no longer worked. Few users are capable of recompiling read more..

  • Page - 392

    SEC. 5.3 I/O SOFTWARE LAYERS 361 have some data to pass to the device-independent software (e.g., a block just read). Finally, it returns some status information for error reporting back to its caller. If any other requests are queued, one of them can now be selected and started. If nothing is queued, the driver blocks waiting for the next request. This simple model is only read more..

  • Page - 393

    362 INPUT/OUTPUT CHAP. 5 The basic function of the device-independent software is to perform the I/O functions that are common to all devices and to provide a uniform interface to the user-level software. We will now look at the above issues in more detail. Uniform Interfacing for Device Drivers A major issue in an operating system is how to make all I/O devices and driv- read more..

  • Page - 394

    SEC. 5.3 I/O SOFTWARE LAYERS 363 on and off, formatting, and other disky things. Often the driver holds a table with pointers into itself for these functions. When the driver is loaded, the operating system records the address of this table of function pointers, so when it needs to call one of the functions, it can make an indirect call via this table. This table of read more..

  • Page - 395

    364 INPUT/OUTPUT CHAP. 5 [Figure: four panels, (a) to (d), each showing a user process in user space, kernel space, and a modem.] Figure 5-15. (a) Unbuffered input. (b) Buffering in user space. (c) Buffering in the kernel followed by copying to user space. (d) Double buffering in the kernel. Yet another approach is to create a buffer inside the kernel and have the interrupt handler put the read more..

  • Page - 396

    SEC. 5.3 I/O SOFTWARE LAYERS 365 but this leads to an even worse problem: how does the user process know that the output has been completed and it can reuse the buffer? The system could generate a signal or software interrupt, but that style of programming is difficult and prone to race conditions. A much better solution is for the kernel to copy the data to a kernel read more..

  • Page - 397

    366 INPUT/OUTPUT CHAP. 5 Error Reporting Errors are far more common in the context of I/O than in other contexts. When they occur, the operating system must handle them as best it can. Many errors are device specific and must be handled by the appropriate driver, but the framework for error handling is device independent. One class of I/O errors is programming errors. These read more..

  • Page - 398

    SEC. 5.3 I/O SOFTWARE LAYERS 367 Device-Independent Block Size Different disks may have different sector sizes. It is up to the device-independent software to hide this fact and provide a uniform block size to higher layers, for example, by treating several sectors as a single logical block. In this way, the higher layers deal only with abstract devices that all use the read more..
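A sketch of the translation this implies, assuming the logical block size is a whole multiple of the sector size. The structure `disk_geom` and the function `block_to_sectors` are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical description of one disk as seen by the device-independent layer. */
struct disk_geom {
    uint32_t sector_size;          /* bytes per physical sector */
    uint32_t logical_block_size;   /* uniform size shown to higher layers */
};

/* Translate a logical block number into a run of physical sectors.
 * Returns 0 on success, or -1 if the logical block size is not a
 * multiple of the sector size. */
int block_to_sectors(const struct disk_geom *g, uint64_t block,
                     uint64_t *first_sector, uint32_t *nsectors)
{
    if (g->sector_size == 0 || g->logical_block_size % g->sector_size != 0)
        return -1;
    *nsectors = g->logical_block_size / g->sector_size;
    *first_sector = block * *nsectors;
    return 0;
}
```

With 4096-byte logical blocks, block 10 maps to sectors 80 through 87 on a disk with 512-byte sectors and to sector 10 alone on a disk with 4096-byte sectors; the caller sees the same block number either way.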

  • Page - 399

    368 INPUT/OUTPUT CHAP. 5 Instead what is done is to create a special process, called a daemon, and a special directory, called a spooling directory. To print a file, a process first generates the entire file to be printed and puts it in the spooling directory. It is up to the daemon, which is the only process having permission to use the printer’s special file, read more..

  • Page - 400

    SEC. 5.3 I/O SOFTWARE LAYERS 369 When the disk is finished, the hardware generates an interrupt. The interrupt handler is run to discover what has happened, that is, which device wants attention right now. It then extracts the status from the device and wakes up the sleeping process to finish off the I/O request and let the user process continue. 5.4 DISKS Now we will read more..

  • Page - 401

    370 INPUT/OUTPUT CHAP. 5 same time. (Reading or writing requires the controller to move bits on a microsecond time scale, so one transfer uses up most of its computing power.) The situation is different for hard disks with integrated controllers, and in a system with more than one of these hard drives they can operate simultaneously, at least to the extent of read more..

  • Page - 402

    SEC. 5.4 DISKS 371 Figure 5-19. (a) Physical geometry of a disk with two zones. (b) A possible virtual geometry for this disk. The controller then remaps a read more..

  • Page - 403

    372 INPUT/OUTPUT CHAP. 5 Yes! As we have seen, parallel processing is increasingly being used to speed up CPU performance. It has occurred to various people over the years that parallel I/O might be a good idea, too. In their 1988 paper, Patterson et al. suggested six specific disk organizations that could be used to improve disk performance, reliability, or both (Patterson read more..

  • Page - 404

    SEC. 5.4 DISKS 373 proper disks in the right sequence and then assemble the results in memory correctly. Performance is excellent and the implementation is straightforward. RAID level 0 works worst with operating systems that habitually ask for data one sector at a time. The results will be correct, but there is no parallelism and hence no performance gain. Another disadvantage read more..

  • Page - 405

    374 INPUT/OUTPUT CHAP. 5 Figure 5-20. RAID levels 0 through 6. Backup and parity drives are shown shaded. read more..

  • Page - 406

    SEC. 5.4 DISKS 375 crashes, the controller just pretends that all its bits are 0s. If a word has a parity error, the bit from the dead drive must have been a 1, so it is corrected. Although both RAID levels 2 and 3 offer very high data rates, the number of separate I/O requests per second they can handle is no better than for a single drive. RAID levels 4 read more..
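
The recovery rule described above (treat the dead drive's bits as 0 and flip any bit whose word shows a parity error) is equivalent to XOR-ing the parity with the surviving drives. A minimal sketch of that idea, assuming byte-wide XOR parity; the names `raid_parity` and `raid_rebuild` and the strip layout are hypothetical, not a real controller interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the parity strip as the bytewise XOR of all data strips.
 * strips[d] points to strip_len bytes belonging to drive d
 * (hypothetical layout chosen for this sketch). */
void raid_parity(const uint8_t *const strips[], size_t ndrives,
                 size_t strip_len, uint8_t *parity)
{
    for (size_t i = 0; i < strip_len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndrives; d++)
            p ^= strips[d][i];
        parity[i] = p;
    }
}

/* Rebuild the strip of the crashed drive `dead`. The dead drive's
 * contents are never read: XOR-ing the parity with every surviving
 * strip leaves exactly the missing bits. */
void raid_rebuild(const uint8_t *const strips[], size_t ndrives, size_t dead,
                  size_t strip_len, const uint8_t *parity, uint8_t *out)
{
    assert(dead < ndrives);
    for (size_t i = 0; i < strip_len; i++) {
        uint8_t v = parity[i];
        for (size_t d = 0; d < ndrives; d++)
            if (d != dead)
                v ^= strips[d][i];
        out[i] = v;
    }
}
```

Calling `raid_parity` once and then `raid_rebuild` with any single `dead` index should reproduce that drive's original strip; two simultaneous failures cannot be recovered this way.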

  • Page - 407

    376 INPUT/OUTPUT CHAP. 5 The preamble starts with a certain bit pattern that allows the hardware to recognize the start of the sector. It also contains the cylinder and sector numbers and some other information. The size of the data portion is determined by the low-level formatting program. Most disks use 512-byte sectors. The ECC field contains redundant information that read more..

  • Page - 408

    SEC. 5.4 DISKS 377 [Figure: sector numbers 0 to 31 on successive tracks, each track offset from the previous one; caption not recoverable] read more..

  • Page - 409

    378 INPUT/OUTPUT CHAP. 5 memory. While this transfer is taking place, the next sector will fly by the head. When the copy to memory is complete, the controller will have to wait almost an entire rotation time for the second sector to come around again. This problem can be eliminated by numbering the sectors in an interleaved fashion when formatting the disk. In Fig. read more..

  • Page - 410

    SEC. 5.4 DISKS 379 The final step in preparing a disk for use is to perform a high-level format of each partition (separately). This operation lays down a boot block, the free storage administration (free list or bitmap), root directory, and an empty file system. It also puts a code in the partition table entry telling which file system is used in the partition because many read more..

  • Page - 411

    380 INPUT/OUTPUT CHAP. 5 [Figure: initial position, pending requests, and sequence of seeks plotted by cylinder (0 to 35) against time.] Figure 5-24. Shortest Seek First (SSF) disk scheduling algorithm. Alternatively, it could always handle the closest request next, to minimize seek time. Given the requests of Fig. 5-24, the sequence is 12, 9, 16, 1, 34, and 36, shown as the jagged line read more..
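
The closest-request-first rule can be sketched in a few lines of C. This is an illustration, not the book's code: the function name `ssf_schedule`, the 64-request limit, and the starting cylinder of 11 used in the usage note are assumptions (11 is consistent with the first arm motion of 1 quoted for the elevator algorithm).

```c
#include <assert.h>
#include <stdlib.h>

/* Shortest Seek First: repeatedly service the pending request closest
 * to the current arm position. Writes the service order to order[] and
 * returns the total arm motion in cylinders. Handles up to 64 requests
 * (an arbitrary limit for this sketch). */
int ssf_schedule(int start, const int req[], int n, int order[])
{
    int done[64] = {0};
    int pos = start, total = 0;

    assert(n >= 0 && n <= 64);
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            if (best < 0 || abs(req[i] - pos) < abs(req[best] - pos))
                best = i;           /* closer than the best seen so far */
        }
        done[best] = 1;
        total += abs(req[best] - pos);
        pos = req[best];
        order[k] = pos;
    }
    return total;
}
```

With pending requests for cylinders 1, 36, 16, 34, 9, and 12 and the arm assumed to start at cylinder 11, the function visits 12, 9, 16, 1, 34, 36, matching the sequence quoted above.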

  • Page - 412

    SEC. 5.4 DISKS 381 Figure 5-25 shows the elevator algorithm using the same seven requests as Fig. 5-24, assuming the direction bit was initially UP. The order in which the cylinders are serviced is 12, 16, 34, 36, 9, and 1, which yields arm motions of 1, 4, 18, 2, 27, and 8, for a total of 60 cylinders. In this case the elevator algorithm is slightly better read more..
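
The elevator rule can be sketched as follows: sort the pending requests, sweep in the current direction, then reverse. The function name `elevator_schedule`, the 64-request limit, and the starting cylinder 11 are assumptions made for illustration (11 is inferred from the first arm motion of 1 quoted above).

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Elevator algorithm: keep moving in the current direction until no
 * requests remain that way, then reverse. up != 0 means the arm is
 * initially moving toward higher cylinder numbers. Writes the service
 * order to order[] and returns the total arm motion in cylinders. */
int elevator_schedule(int start, int up, const int req[], int n, int order[])
{
    int sorted[64];
    int pos = start, total = 0, k = 0;
    int lo = 0, hi = 0;

    assert(n >= 0 && n <= 64);
    for (int i = 0; i < n; i++)
        sorted[i] = req[i];
    qsort(sorted, (size_t)n, sizeof sorted[0], cmp_int);

    for (int i = 0; i < n; i++) {
        if (sorted[i] < start)
            lo++;                   /* requests strictly below the arm */
        if (sorted[i] <= start)
            hi++;                   /* requests at or below the arm */
    }
    /* Indices >= split are served on the upward sweep, the rest on the
     * downward sweep; a request at the arm's position goes first. */
    int split = up ? lo : hi;

    for (int pass = 0; pass < 2; pass++) {
        if (up) {
            for (int i = split; i < n; i++) {
                total += abs(sorted[i] - pos);
                pos = sorted[i];
                order[k++] = pos;
            }
        } else {
            for (int i = split - 1; i >= 0; i--) {
                total += abs(sorted[i] - pos);
                pos = sorted[i];
                order[k++] = pos;
            }
        }
        up = !up;                   /* reverse direction for the second sweep */
    }
    return total;
}
```

For requests 1, 36, 16, 34, 9, and 12 with the arm at cylinder 11 moving up, this yields the order 12, 16, 34, 36, 9, 1 and a total of 60 cylinders, in line with the figures quoted above.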

  • Page - 413

    382 INPUT/OUTPUT CHAP. 5 space is available in the controller’s cache memory. The hard disk described in Fig. 5-18 has a 4-MB cache, for example. The use of the cache is determined dynamically by the controller. In its simplest mode, the cache is divided into two sections, one for reads and one for writes. If a subsequent read can be satisfied out of the controller’s read more..

  • Page - 414

    SEC. 5.4 DISKS 383 a few bits, it is possible to use the bad sector and just let the ECC correct the errors every time. If the defect is bigger, the error cannot be masked. There are two general approaches to bad blocks: deal with them in the controller or deal with them in the operating system. In the former approach, before the disk is shipped from the read more..

  • Page - 415

    384 INPUT/OUTPUT CHAP. 5 system must do the same thing in software. This means that it must first acquire a list of bad sectors, either by reading them from the disk, or simply testing the entire disk itself. Once it knows which sectors are bad, it can build remapping tables. If the operating system wants to use the approach of Fig. 5-26(c), it must shift the data in read more..

  • Page - 416

    SEC. 5.4 DISKS 385 recalibrations insert gaps into the bit stream and are unacceptable. Special drives, called AV disks (Audio Visual disks), which never recalibrate, are available for such applications. Anecdotally, a highly convincing demonstration of how advanced disk controllers have become was given by the Dutch hacker Jeroen Domburg, who hacked a modern disk controller to make read more..

  • Page - 417

    386 INPUT/OUTPUT CHAP. 5 rare that having the same sector go bad on a second (independent) drive during a reasonable time interval (e.g., 1 day) is small enough to ignore. The model also assumes the CPU can fail, in which case it just stops. Any disk write in progress at the moment of failure also stops, leading to incorrect data in one sector and an incorrect ECC read more..

  • Page - 418

    SEC. 5.4 DISKS 387 in the presence of CPU crashes during stable writes? It depends on precisely when the crash occurs. There are five possibilities, as depicted in Fig. 5-27. [Figure panels, contents of disk 1/disk 2 at each crash point: (a) Old/Old, (b) ECC error/Old, (c) New/Old, (d) New/ECC error, (e) New/New.] Figure 5-27. Analysis of the influence of crashes on stable writes. In read more..

  • Page - 419

    388 INPUT/OUTPUT CHAP. 5 block number, for example, −1. Under these conditions, after a crash the recovery program can check the nonvolatile RAM to see if a stable write happened to be in progress during the crash, and if so, which block was being written when the crash happened. The two copies of the block can then be checked for correctness and consistency. If read more..

  • Page - 420

    SEC. 5.5 CLOCKS 389 this base signal can be multiplied by a small integer to get frequencies up to several gigahertz or even more. At least one such circuit is usually found in any computer, providing a synchronizing signal to the computer’s various circuits. This signal is fed into the counter to make it count down to zero. When the counter gets to zero, it causes a read more..

  • Page - 421

    390 INPUT/OUTPUT CHAP. 5 5.5.2 Clock Software All the clock hardware does is generate interrupts at known intervals. Everything else involving time must be done by the software, the clock driver. The exact duties of the clock driver vary among operating systems, but usually include most of the following: 1. Maintaining the time of day. 2. Preventing processes from running longer read more..

  • Page - 422

    SEC. 5.5 CLOCKS 391 [Figure panels: (a) time of day in ticks, 64 bits; (b) time of day in seconds, 32 bits, plus number of ticks in current second; (c) counter in ticks, 32 bits, plus system boot time in seconds.] Figure 5-29. Three ways to maintain the time of day. how long the process has run. To do things right, the second timer should be saved when an interrupt occurs and restored afterward. A less read more..

  • Page - 423

    392 INPUT/OUTPUT CHAP. 5 [Figure values: Current time 4200; Next signal 3; Clock header 3, 4, 6, 2, 1, X.] Figure 5-30. Simulating multiple timers with a single clock. Note that during a clock interrupt, the clock driver has several things to do: increment the real time, decrement the quantum and check for 0, do CPU accounting, and decrement the alarm counter. However, each of these operations has read more..
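
One way to read the values in Fig. 5-30 is as a delta list: each entry in the clock header holds the number of ticks after the previous entry, so only the first entry needs decrementing on each clock interrupt. The sketch below illustrates that reading under stated assumptions; the names `timer_list`, `timer_add`, and `timer_tick` and the fixed-size array are hypothetical, and a real driver would more likely use a linked list.

```c
#include <assert.h>

#define MAX_TIMERS 16

/* Pending timers kept as a delta list: delta[0] is the number of ticks
 * until the first signal, delta[i] the ticks between signal i-1 and
 * signal i. */
struct timer_list {
    int n;
    int delta[MAX_TIMERS];
};

/* Insert a timer that should fire `ticks` ticks from now. */
void timer_add(struct timer_list *tl, int ticks)
{
    int i = 0;

    assert(tl->n < MAX_TIMERS && ticks > 0);
    /* Walk past every entry that fires no later than the new one,
     * converting `ticks` into a delta relative to its predecessor. */
    while (i < tl->n && ticks >= tl->delta[i]) {
        ticks -= tl->delta[i];
        i++;
    }
    /* Shift later entries up by one slot to make room. */
    for (int j = tl->n; j > i; j--)
        tl->delta[j] = tl->delta[j - 1];
    tl->delta[i] = ticks;
    /* The successor (if any) is now relative to the new entry. */
    if (i < tl->n)
        tl->delta[i + 1] -= ticks;
    tl->n++;
}

/* Called once per clock interrupt; returns how many timers expired. */
int timer_tick(struct timer_list *tl)
{
    int fired = 0;

    if (tl->n == 0)
        return 0;
    tl->delta[0]--;
    while (tl->n > 0 && tl->delta[0] <= 0) {
        fired++;
        for (int j = 1; j < tl->n; j++)   /* pop the expired head entry */
            tl->delta[j - 1] = tl->delta[j];
        tl->n--;
    }
    return fired;
}
```

Adding timers due 13, 3, 16, 7, and 15 ticks from now (in any order) leaves the list holding 3, 4, 6, 2, 1, the same values as the clock header in the figure; the third call to `timer_tick` then reports one expired timer.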

  • Page - 424

    SEC. 5.5 CLOCKS 393 timer is very high. Below we will briefly describe a software-based timer scheme that works well under many circumstances, even at fairly high frequencies. The idea is due to Aron and Druschel (1999). For more details, please see their paper. Generally, there are two ways to manage I/O: interrupts and polling. Interrupts have low latency, that is, they read more..

  • Page - 425

    394 INPUT/OUTPUT CHAP. 5 an occasional missed deadline. Being 10 μsec late from time to time is often better than having interrupts eat up 35% of the CPU. Of course, there will be periods when there are no system calls, TLB misses, or page faults, in which case no soft timers will go off. To put an upper bound on these intervals, the second hardware timer can be set to go off, read more..

  • Page - 426

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 395 a key release. It is up to the driver to keep track of the status of each key (up or down). So all the hardware does is give press and release interrupts. Software does the rest. When the A key is struck, for example, the scan code (30) is put in an I/O register. It is up to the driver to determine read more..

  • Page - 427

    396 INPUT/OUTPUT CHAP. 5 requested input, so the characters must be buffered to allow type ahead. Either a dedicated buffer can be used or buffers can be allocated from a pool. The former puts a fixed limit on type ahead; the latter does not. This issue arises most acutely when the user is typing to a shell window (command-line window in Windows) and has just issued a read more..

  • Page - 428

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 397 standard. The defaults are all control characters that should not conflict with text input or codes used by programs; all except the last two can be changed under program control.
    Character   POSIX name   Comment
    CTRL-H      ERASE        Backspace one character
    CTRL-U      KILL         Erase entire line being typed
    CTRL-V      LNEXT        Interpret next character read more..

  • Page - 429

    398 INPUT/OUTPUT CHAP. 5 twice consecutively. After seeing a CTRL-V, the driver sets a flag saying that the next character is exempt from special processing. The LNEXT character itself is not entered in the character queue. To allow users to stop a screen image from scrolling out of view, control codes are provided to freeze the screen and restart it later. In UNIX these read more..

  • Page - 430

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 399 and make continuous low-resolution photos of the surface under them, looking for changes from image to image. Whenever a mouse has moved a certain minimum distance in either direction or a button is depressed or released, a message is sent to the computer. The minimum distance is about 0.1 mm (although it can be set in read more..

  • Page - 431

    400 INPUT/OUTPUT CHAP. 5 own escape sequences. As a consequence, it was difficult to write software that worked on more than one terminal type. One solution, which was introduced in Berkeley UNIX, was a terminal database called termcap. This software package defined a number of basic actions, such as moving the cursor to (row, column). To move the cursor to a particular read more..

  • Page - 432

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 401 The X Window System Nearly all UNIX systems base their user interface on the X Window System (often just called X), developed at M.I.T. as part of project Athena in the 1980s. It is very portable and runs entirely in user space. It was originally intended for connecting a large number of remote user terminals with a read more..

  • Page - 433

    402 INPUT/OUTPUT CHAP. 5 [Figure labels: remote host running the window manager and application program on top of Motif, Intrinsics, and Xlib (X client), UNIX, and hardware; X server on UNIX and hardware showing the window; user space and kernel space; X protocol over the network.] Figure 5-33. Clients and servers in the M.I.T. X Window System. elements, called widgets. To make a true GUI interface, with a uniform look and feel, another layer is needed (or several of them). One read more..

  • Page - 434

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 403 program itself. X considers this connection to be reliable in the sense that lost and duplicate messages are handled by the networking software and it does not have to worry about communication errors. Usually, TCP/IP is used between the client and server. Four kinds of messages go over the connection: 1. Drawing commands from read more..

  • Page - 435

    404 INPUT/OUTPUT CHAP. 5
    #include <X11/Xlib.h>
    #include <X11/Xutil.h>

    int main(int argc, char *argv[])
    {
        Display *disp;    /* server identifier (XOpenDisplay returns a pointer) */
        Window win;       /* window identifier */
        GC gc;            /* graphic context identifier */
        XEvent event;     /* storage for one event */
        int running = 1;

        disp = XOpenDisplay("display name");    /* connect to the X server */
        win = XCreateSimpleWindow(disp, read more..

  • Page - 436

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 405 It is worth mentioning that not everyone likes a GUI. Many programmers prefer a traditional command-line oriented interface of the type discussed in Sec. 5.6.1 above. X handles this via a client program called xterm. This program emulates a venerable VT102 intelligent terminal, complete with all the escape sequences. Thus editors read more..

  • Page - 437

    406 INPUT/OUTPUT CHAP. 5 appear on the screen. Graphics adapters often have powerful 32- or 64-bit CPUs and up to 4 GB of their own RAM, separate from the computer’s main memory. Each graphics adapter supports some number of screen sizes. Common sizes (horizontal × vertical in pixels) are 1280 × 960, 1600 × 1200, 1920 × 1080, 2560 × 1600, and 3840 × 2160. Many read more..

  • Page - 438

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 407 [Figure labels: title bar; menu bar (File, Edit, View, Tools, Options, Help); tool bar; client area; scroll bar with thumb; window corner at (200, 100); screen corners (0, 0), (1023, 0), (0, 767), (1023, 767).] Figure 5-35. A sample window located at (200, 100) on an XGA display. To make this programming model clearer, consider the example of Fig. 5-36. Here we read more..

  • Page - 439

    408 INPUT/OUTPUT CHAP. 5
    #include <windows.h>

    int WINAPI WinMain(HINSTANCE h, HINSTANCE hprev, char *szCmd, int iCmdShow)
    {
        WNDCLASS wndclass;    /* class object for this window */
        MSG msg;              /* incoming messages are stored here */
        HWND hwnd;            /* handle (pointer) to the window object */

        /* Initialize wndclass */
        wndclass.lpfnWndProc = WndProc;    /* tells which procedure to call */ read more..

  • Page - 440

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 409 also a 32-bit signed integer), s (string), sz (string terminated by a zero byte), p (pointer), fn (function), and h (handle). Thus szCmd is a zero-terminated string and iCmdShow is an integer, for example. Many programmers believe that encoding the type in variable names this way has little value and makes Windows code read more..

  • Page - 441

    410 INPUT/OUTPUT CHAP. 5 There are two ways Windows can get a program to do something. One way is to post a message to its message queue. This method is used for keyboard input, mouse input, and timers that have expired. The other way, sending a message to the window, involves having Windows directly call WndProc itself. This method is used for all other events. read more..

  • Page - 442

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 411 A complete treatment of the GDI is out of the question here. For the interested reader, the references cited above provide additional information. Nevertheless, given how important it is, a few words about the GDI are probably worthwhile. GDI has various procedure calls to get and release device contexts, obtain information read more..

  • Page - 443

    412 INPUT/OUTPUT CHAP. 5 Windows metafile and is widely used to transmit drawings from one Windows program to another. Such files have extension .wmf. Many Windows programs allow the user to copy (part of) a drawing and put it on the Windows clipboard. The user can then go to another program and paste the contents of the clipboard into another document. One way of read more..

  • Page - 444

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 413 Figure 5-38. Copying bitmaps using BitBlt. (a) Before. (b) After. have file and information headers and a color table before the pixels. This information makes it easier to move bitmaps read more..

  • Page - 445

    414 INPUT/OUTPUT CHAP. 5 [Figure: sample outlines at 20 pt, 53 pt, and 81 pt.] Figure 5-39. Some examples of character outlines at different point sizes. error. To improve the quality still more, it is possible to embed hints in each character telling how to do the rasterization. For example, both serifs on the top of the letter T should be identical, something that might not otherwise be the case read more..

  • Page - 446

    SEC. 5.6 USER INTERFACES: KEYBOARD, MOUSE, MONITOR 415 plastic. However, a thin film of ITO (Indium Tin Oxide), or some similar conductive material, is printed in thin lines onto the surface’s underside. Beneath it, but not quite touching it, is a second surface also coated with a layer of ITO. On the top surface, the charge runs in the vertical direction and there read more..

  • Page - 447

    416 INPUT/OUTPUT CHAP. 5 Manipulating a touch screen with just a single finger is still fairly WIMPy: you just replace the mouse pointer with your stylus or index finger. Multitouch is a bit more complicated. Touching the screen with five fingers is like pushing five mouse pointers across the screen at the same time and clearly changes things for the window manager. read more..

  • Page - 448

    SEC. 5.7 THIN CLIENTS 417 and other things that used to require PC software. It is even possible that eventually the only software people run on their PC is a Web browser, and maybe not even that. It is probably a fair conclusion to say that most users want high-performance interactive computing but do not really want to administer a computer. This has led researchers to read more..

  • Page - 449

    418 INPUT/OUTPUT CHAP. 5 an industry used to a doubling of performance every 18 months (Moore’s law), having no progress at all seems like a violation of the laws of physics, but that is the current situation. As a consequence, making computers use less energy so existing batteries last longer is high on everyone’s agenda. The operating system plays a major role here, as read more..

  • Page - 450

    SEC. 5.8 POWER MANAGEMENT 419 nothing except send a signal to the operating system, which does the rest in software. In some countries, electrical devices must, by law, have a mechanical power switch that breaks a circuit and removes power from the device, for safety reasons. To comply with this law, another switch may be needed. Power management brings up a number read more..

  • Page - 451

    420 INPUT/OUTPUT CHAP. 5 annoying delay while it is restarted. On the other hand, if it waits too long to shut down a device, energy is wasted for nothing. The trick is to find algorithms and heuristics that let the operating system make good decisions about what to shut down and when. The trouble is that “good” is highly subjective. One user may find it read more..

  • Page - 452

    SEC. 5.8 POWER MANAGEMENT 421 Figure 5-41. The use of zones for backlighting the display. (a) When window 2 is selected, it is not moved. (b) When window 1 is selected, it moves to reduce the number of zones illuminated. When it is next needed, it is spun up again. Unfortunately, a stopped disk is hibernating rather than read more..

  • Page - 453

    422 INPUT/OUTPUT CHAP. 5 On many computers, there is a relationship between CPU voltage, clock cycle, and power usage. The CPU voltage can often be reduced in software, which saves energy but also reduces the clock cycle (approximately linearly). Since power consumed is proportional to the square of the voltage, cutting the voltage in half makes the CPU about half as fast read more..
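
The arithmetic behind this trade-off can be written out as a short derivation. It uses only the two proportionalities stated above (power proportional to the square of the voltage, clock speed roughly linear in voltage); the symbols P, V, f, N, t, and E, and the assumption of a fixed job of N cycles, are introduced here for illustration.

```latex
\begin{align*}
  &P \propto V^{2}, \qquad f \propto V
     && \text{(relations stated in the text)} \\
  &V' = \tfrac{1}{2}V \;\Rightarrow\; P' = \tfrac{1}{4}P, \quad f' = \tfrac{1}{2}f
     && \text{(half the speed at a quarter of the power)} \\
  &t' = \frac{N}{f'} = 2\,\frac{N}{f} = 2t
     && \text{(a job of $N$ cycles takes twice as long)} \\
  &E' = P'\,t' = \tfrac{1}{4}P \cdot 2t = \tfrac{1}{2}E
     && \text{(energy for the same job is halved)}
\end{align*}
```

Under these idealized assumptions, running slower at lower voltage saves energy only if the job can tolerate taking twice as long; the estimate ignores power drawn by the rest of the system during the extra time.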

  • Page - 454

    SEC. 5.8 POWER MANAGEMENT 423 more data to transmit. Finally, it will give up and go to sleep, because continuous polling is very bad for power consumption. Shortly after, the producer provides more data, but now the network stack is fast asleep. Waking up the stack takes time and slows down the throughput. One possible solution is never to sleep, but this is not attractive read more..

  • Page - 455

    424 INPUT/OUTPUT CHAP. 5 when it switches on the radio again. At that point any accumulated messages can be sent to it. Outgoing messages that are generated while the radio is off are buffered on the mobile computer. If the buffer threatens to fill up, the radio is turned on and the queue transmitted to the base station. When should the radio be switched off? One read more..

  • Page - 456

    SEC. 5.8 POWER MANAGEMENT 425 more. Most mobile devices have programs that can be run to query and display all these parameters. Smart batteries can also be instructed to change various operational parameters under control of the operating system. Some notebooks have multiple batteries. When the operating system detects that one battery is about to go, it has to arrange for read more..

  • Page - 457

    426 INPUT/OUTPUT CHAP. 5 In order to measure the energy usage, Flinn and Satyanarayanan devised a software tool called PowerScope. What it does is provide a power-usage profile of a program. To use it, a computer must be hooked up to an external power supply through a software-controlled digital multimeter. Using the multimeter, software is able to read out the number of read more..

  • Page - 458

    SEC. 5.9 RESEARCH ON INPUT/OUTPUT 427 existing buffering systems (DeBruijn and Bos, 2008). Streamline is especially useful for demanding network applications. Megapipe (Han et al., 2012) is another network I/O architecture for message-oriented workloads. It creates per-core bidirectional channels between the kernel and user space, on which the systems layers abstractions like lightweight read more..

  • Page - 459

    428 INPUT/OUTPUT CHAP. 5 leads to substantial overhead. Getting rid of this overhead is where the research comes in (Tsafrir et al., 2005). Similarly, interrupt latency is still a concern for research groups, especially in the area of real-time operating systems. Since these are often found embedded in critical systems (like controls of brake and steering systems), permitting interrupts read more..

  • Page - 460

    SEC. 5.10 SUMMARY 429 Character-oriented terminals have a variety of issues concerning special characters that can be input and special escape sequences that can be output. Input can be in raw mode or cooked mode, depending on how much control the program wants over the input. Escape sequences on output control cursor movement and allow for inserting and deleting text on read more..

  • Page - 461

    430 INPUT/OUTPUT CHAP. 5 controller, how long will it take to transfer 1000 words from the disk controller to main memory, if (a) word-at-a-time mode is used, (b) burst mode is used? Assume that commanding the disk controller requires acquiring the bus to send one word and acknowledging a transfer also requires acquiring the bus to send one word. 7. One mode that read more..

  • Page - 462

    CHAP. 5 PROBLEMS 431 Then it copies the data to the network controller board. When all the bytes are safely inside the controller, they are sent over the network at a rate of 10 megabits/sec. The receiving network controller stores each bit a microsecond after it is sent. When the last bit arrives, the destination CPU is interrupted, and the kernel copies the newly arri- read more..

  • Page - 463

    432 INPUT/OUTPUT CHAP. 5 29. A disk manufacturer has two 5.25-inch disks that each have 10,000 cylinders. The newer one has double the linear recording density of the older one. Which disk properties are better on the newer drive and which are the same? Are any worse on the newer one? 30. A computer manufacturer decides to redesign the partition table of a Pentium read more..

  • Page - 464

    CHAP. 5 PROBLEMS 433 39. A system simulates multiple clocks by chaining all pending clock requests together as shown in Fig. 5-30. Suppose the current time is 5000 and there are pending clock requests for time 5008, 5012, 5015, 5029, and 5037. Show the values of Clock header, Current time, and Next signal at times 5000, 5005, and 5013. Suppose a new (pending) signal read more..

  • Page - 465

    434 INPUT/OUTPUT CHAP. 5 Rectangle(hdc, xleft, ytop, xright, ybottom); Is there any real need for the first parameter (hdc), and if so, what? After all, the coordinates of the rectangle are explicitly specified as parameters. 50. A thin-client terminal is used to display a Web page containing an animated cartoon of size 400 pixels × 160 pixels running at 10 frames/sec. What read more..

  • Page - 466

    6 DEADLOCKS Computer systems are full of resources that can be used only by one process at a time. Common examples include printers, tape drives for backing up company data, and slots in the system’s internal tables. Having two processes simultaneously writing to the printer leads to gibberish. Having two processes using the same file-system table slot invariably will lead to read more..

  • Page - 467

    436 DEADLOCKS CHAP. 6 Deadlocks can also occur in a variety of other situations. In a database system, for example, a program may have to lock several records it is using, to avoid race conditions. If process A locks record R1 and process B locks record R2, and then each process tries to lock the other one’s record, we also have a deadlock. Thus, deadlocks can read more..

  • Page - 468

    SEC. 6.1 RESOURCES 437 swapping it out and swapping A in. Now A can run, do its printing, and then release the printer. No deadlock occurs. A nonpreemptable resource, in contrast, is one that cannot be taken away from its current owner without potentially causing failure. If a process has begun to burn a Blu-ray, suddenly taking the Blu-ray recorder away from it and read more..

  • Page - 469

    438 DEADLOCKS CHAP. 6 semaphores are all initialized to 1. Mutexes can be used equally well. The three steps listed above are then implemented as a down on the semaphore to acquire the resource, the use of the resource, and finally an up on the resource to release it. These steps are shown in Fig. 6-1(a).
        typedef int semaphore;           typedef int semaphore;
        semaphore resource_1; read more..

  • Page - 470

    SEC. 6.2 INTRODUCTION TO DEADLOCKS 439
        typedef int semaphore;
        semaphore resource_1;            semaphore resource_1;
        semaphore resource_2;            semaphore resource_2;

        void process_A(void) {           void process_A(void) {
            down(&resource_1);               down(&resource_1);
            down(&resource_2);               down(&resource_2);
            use_both_resources( );           use_both_resources( );
            up(&resource_2);                 up(&resource_2);
            up(&resource_1); read more..

  • Page - 471

    440 DEADLOCKS CHAP. 6 6.2.1 Conditions for Resource Deadlocks Coffman et al. (1971) showed that four conditions must hold for there to be a (resource) deadlock: 1. Mutual exclusion condition. Each resource is either currently assigned to exactly one process or is available. 2. Hold-and-wait condition. Processes currently holding resources that were granted earlier can request new read more..

  • Page - 472

    SEC. 6.2 INTRODUCTION TO DEADLOCKS 441 Figure 6-3. Resource allocation graphs. (a) Holding a resource. (b) Requesting a resource. (c) Deadlock. requests and releases of the three processes are given in Fig. 6-4(a)–(c). The operating system is free to run any unblocked process at any instant, so it could decide to run A until A finished all its read more..

  • Page - 473

    442 DEADLOCKS CHAP. 6
    A: Request R, Request S, Release R, Release S
    B: Request S, Request T, Release S, Release T
    C: Request T, Request R, Release T, Release R
    1. A requests R  2. B requests S  3. C requests T  4. A requests S  5. B requests T  6. C requests R  (deadlock)
    1. A requests R  2. C requests T  3. A requests read more..

  • Page - 474

    SEC. 6.2 INTRODUCTION TO DEADLOCKS 443 leads to deadlock. We just carry out the requests and releases step by step, and after every step we check the graph to see if it contains any cycles. If so, we have a deadlock; if not, there is no deadlock. Although our treatment of resource graphs has been for the case of a single resource of each type, resource graphs can read more..

  • Page - 475

    444 DEADLOCKS CHAP. 6 recover after the fact. In this section we will look at some of the ways deadlocks can be detected and some of the ways recovery from them can be handled. 6.4.1 Deadlock Detection with One Resource of Each Type Let us begin with the simplest case: there is only one resource of each type. Such a system might have one scanner, one Blu-ray read more..

  • Page - 476

    SEC. 6.4 DEADLOCK DETECTION AND RECOVERY 445 Figure 6-5. (a) A resource graph. (b) A cycle extracted from (a). uses one dynamic data structure, L, a list of nodes, as well as a list of arcs. During the algorithm, to prevent repeated inspections, arcs will be marked to indicate that they have already been inspected. The read more..

  • Page - 477

    446 DEADLOCKS CHAP. 6 any cycles. If this property holds for all nodes, the entire graph is cycle free, so the system is not deadlocked. To see how the algorithm works in practice, let us use it on the graph of Fig. 6-5(a). The order of processing the nodes is arbitrary, so let us just inspect them from left to right, top to bottom, first running the algorithm read more..
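
Checking a resource graph for cycles can be sketched with a standard depth-first search. Note that this swaps in per-node states ("on the current path" versus "fully explored") in place of the list L and marked arcs described in the text; the adjacency-matrix representation, the names `has_cycle` and `visit`, and the 16-node limit are assumptions made for illustration.

```c
#include <assert.h>

#define MAX_NODES 16

/* adj[u][v] != 0 means there is an arc u -> v. Processes and resources
 * are both just numbered nodes in this sketch. */
static int visit(int u, int n, int adj[][MAX_NODES], int state[])
{
    state[u] = 1;                   /* u is on the current path */
    for (int v = 0; v < n; v++) {
        if (!adj[u][v])
            continue;
        if (state[v] == 1)
            return 1;               /* reached a node already on the path: cycle */
        if (state[v] == 0 && visit(v, n, adj, state))
            return 1;
    }
    state[u] = 2;                   /* fully explored, no cycle through u */
    return 0;
}

/* Returns 1 if the directed graph contains a cycle (deadlock), else 0. */
int has_cycle(int n, int adj[][MAX_NODES])
{
    int state[MAX_NODES] = {0};     /* 0 = unvisited */

    assert(n >= 0 && n <= MAX_NODES);
    for (int s = 0; s < n; s++)
        if (state[s] == 0 && visit(s, n, adj, state))
            return 1;
    return 0;
}
```

As a usage example with a hypothetical four-node graph, the arcs 0 -> 1, 1 -> 2, 2 -> 3 contain no cycle, while adding 3 -> 0 closes one and makes `has_cycle` return 1.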

  • Page - 478

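    SEC. 6.4 DEADLOCK DETECTION AND RECOVERY 447
    Resources in existence: $E = (E_1, E_2, E_3, \ldots, E_m)$
    Resources available: $A = (A_1, A_2, A_3, \ldots, A_m)$
    Current allocation matrix (row $n$ is current allocation to process $n$):
    $$C = \begin{pmatrix} C_{11} & C_{12} & C_{13} & \cdots & C_{1m} \\ C_{21} & C_{22} & C_{23} & \cdots & C_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & C_{n3} & \cdots & C_{nm} \end{pmatrix}$$
    Request matrix:
    $$R = \begin{pmatrix} R_{11} & R_{12} & R_{13} & \cdots & R_{1m} \\ R_{21} & R_{22} & R_{23} & \cdots & R_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ R_{n1} & R_{n2} & R_{n3} & \cdots & R_{nm} \end{pmatrix}$$
    Row 2 is what process 2 needs read more..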

  • Page - 479

    448 DEADLOCKS CHAP. 6 finish, they are deadlocked. Although the algorithm is nondeterministic (because it may run the processes in any feasible order), the result is always the same. As an example of how the deadlock detection algorithm works, see Fig. 6-7. Here we have three processes and four resource classes, which we have arbitrarily labeled tape drives, plotters, scanners, read more..
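
The matrix-based detection idea can be sketched as follows: repeatedly look for an unmarked process whose outstanding requests fit within what is available, assume it runs to completion and returns what it holds, and count whatever is left over as deadlocked. The function name `count_deadlocked`, the array bounds, and the values in the usage note are assumptions for illustration and are not taken from the figure.

```c
#include <assert.h>

#define MAXP 8   /* maximum number of processes (arbitrary for this sketch) */
#define MAXR 8   /* maximum number of resource classes */

/* A: available vector, C: current allocation matrix, R: request matrix,
 * for n processes and m resource classes. Returns the number of
 * processes that can never finish (0 means no deadlock). */
int count_deadlocked(int n, int m, const int A[],
                     int C[][MAXR], int R[][MAXR])
{
    int avail[MAXR], marked[MAXP] = {0};
    int progress = 1, unmarked = n;

    assert(n >= 0 && n <= MAXP && m >= 0 && m <= MAXR);
    for (int j = 0; j < m; j++)
        avail[j] = A[j];

    while (progress) {
        progress = 0;
        for (int i = 0; i < n; i++) {
            if (marked[i])
                continue;
            int ok = 1;
            for (int j = 0; j < m; j++) {
                if (R[i][j] > avail[j]) {   /* request cannot be met now */
                    ok = 0;
                    break;
                }
            }
            if (!ok)
                continue;
            /* Process i can finish: it releases everything it holds. */
            for (int j = 0; j < m; j++)
                avail[j] += C[i][j];
            marked[i] = 1;
            unmarked--;
            progress = 1;
        }
    }
    return unmarked;
}
```

With illustrative values A = (2, 1, 0, 0), C rows (0 0 1 0), (2 0 0 1), (0 1 2 0), and R rows (2 0 0 1), (1 0 1 0), (2 1 0 0), the third process can run first, then the second, then the first, so the result is 0. Changing the third request row to (2 1 0 1) leaves all three processes stuck.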

  • Page - 480

    SEC. 6.4 DEADLOCK DETECTION AND RECOVERY 449 6.4.3 Recovery from Deadlock Suppose that our deadlock detection algorithm has succeeded and detected a deadlock. What next? Some way is needed to recover and get the system going again. In this section we will discuss various ways of recovering from deadlock. None of them are especially attractive, however. Recovery through Preemption In read more..

  • Page - 481

    450 DEADLOCKS CHAP. 6 Recovery through Killing Processes The crudest but simplest way to break a deadlock is to kill one or more processes. One possibility is to kill a process in the cycle. With a little luck, the other processes will be able to continue. If this does not help, it can be repeated until the cycle is broken. Alternatively, a process not in the cycle read more..

  • Page - 482

    SEC. 6.5 DEADLOCK AVOIDANCE 451 In Fig. 6-8 we see a model for dealing with two processes and two resources, for example, a printer and a plotter. The horizontal axis represents the number of instructions executed by process A. The vertical axis represents the number of instructions executed by process B. At I1 A requests a printer; at I2 it needs a plotter. The printer read more..

  • Page - 483

    452 DEADLOCKS CHAP. 6 point t the only safe thing to do is run process A until it gets to I4. Beyond that, any trajectory to u will do. The important thing to see here is that at point t, B is requesting a resource. The system must decide whether to grant it or not. If the grant is made, the system will enter an unsafe region and eventually deadlock. To avoid read more..

  • Page - 484

    SEC. 6.5 DEADLOCK AVOIDANCE 453 [Figure 6-10. Demonstration that the state in (b) is not safe. Has/Max per process: (a) A 3/9, B 2/4, C 2/7, Free: 3. (b) A 4/9, B 2/4, C 2/7, Free: 2. (c) A 4/9, B 4/4, C 2/7, Free: 0. (d) A 4/9, B -/-, C 2/7, Free: 4.] processes needs five. There is no sequence that guarantees completion. Thus, the allocation decision that moved the read more..

  • Page - 485

    454 DEADLOCKS CHAP. 6 [Figure 6-11. Three resource allocation states: (a) Safe. (b) Safe. (c) Unsafe. Max for A, B, C, D is 6, 5, 4, 7 in all three states. Has: (a) 0, 0, 0, 0 with Free: 10. (b) 1, 1, 2, 4 with Free: 2. (c) 1, 2, 2, 4 with Free: 1.] the customers suddenly asked for their maximum loans, the banker could not satisfy any of them, and we would have a deadlock. An read more..
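The safety check behind the single-resource banker's algorithm can be sketched as follows. This is an illustrative sketch only: the function name `is_safe` and the dictionary layout are assumptions, while the numbers are the Has/Max/Free values of the three states in Fig. 6-11. A state is treated as safe if the customers can be run to completion one at a time, each returning its loan before the next is served.

```python
def is_safe(has, maximum, free):
    """Return True if some order lets every customer reach its maximum and finish."""
    remaining = {name: maximum[name] - has[name] for name in has}
    unfinished = set(has)
    while unfinished:
        # Customers whose remaining need fits in the free units right now.
        runnable = [p for p in unfinished if remaining[p] <= free]
        if not runnable:
            return False          # nobody can be guaranteed to finish
        p = min(runnable, key=lambda q: remaining[q])
        free += has[p]            # the finished customer repays everything it held
        unfinished.remove(p)
    return True


MAX = {"A": 6, "B": 5, "C": 4, "D": 7}
state_a = {"A": 0, "B": 0, "C": 0, "D": 0}   # Free: 10
state_b = {"A": 1, "B": 1, "C": 2, "D": 4}   # Free: 2
state_c = {"A": 1, "B": 2, "C": 2, "D": 4}   # Free: 1

print(is_safe(state_a, MAX, 10), is_safe(state_b, MAX, 2), is_safe(state_c, MAX, 1))
```

Picking the runnable customer with the smallest remaining need is just one convenient choice; any runnable customer would do, because finishing a customer only increases the free pool.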

  • Page - 486

    SEC. 6.5 DEADLOCK AVOIDANCE 455 These matrices are just C and R from Fig. 6-6. As in the single-resource case, processes must state their total resource needs before executing, so that the system can compute the right-hand matrix at each instant. The three vectors at the right of the figure show the existing resources, E, the possessed resources, P, and the available resources, A, read more..

  • Page - 487

    456 DEADLOCKS CHAP. 6 those of the banker’s algorithm to prevent deadlock. For instance, networks may throttle traffic when buffer utilization reaches higher than, say, 70%—estimating that the remaining 30% will be sufficient for current users to complete their service and return their resources. 6.6 DEADLOCK PREVENTION Having seen that deadlock avoidance is essentially impossible, because read more..

  • Page - 488

    SEC. 6.6 DEADLOCK PREVENTION 457 all processes to request all their resources before starting execution. If everything is available, the process will be allocated whatever it needs and can run to completion. If one or more resources are busy, nothing will be allocated and the process will just wait. An immediate problem with this approach is that many processes do not know read more..

  • Page - 489

    458 DEADLOCKS CHAP. 6 resources whenever they want to, but all requests must be made in numerical order. A process may request first a printer and then a tape drive, but it may not request first a plotter and then a printer. [Panel (a): 1. Imagesetter, 2. Printer, 3. Plotter, 4. Tape drive, 5. Blu-ray drive. Panel (b): processes A and B, resources i and j.] Figure 6-13. (a) Numerically ordered resources. (b) A resource read more..
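The numerical-ordering rule is easy to express as a check on a process's request sequence. The sketch below is illustrative: the name `requests_in_order` and the lowercase resource keys are assumptions, while the numbering follows the list in Fig. 6-13(a).

```python
# Global numbering of resources, as listed in Fig. 6-13(a).
ORDER = {
    "imagesetter": 1,
    "printer": 2,
    "plotter": 3,
    "tape drive": 4,
    "blu-ray drive": 5,
}


def requests_in_order(requests):
    """Return True if every request has a higher number than the one before it."""
    numbers = [ORDER[r] for r in requests]
    return all(a < b for a, b in zip(numbers, numbers[1:]))


print(requests_in_order(["printer", "tape drive"]))   # allowed by the rule
print(requests_in_order(["plotter", "printer"]))      # violates the rule
```

If every process passes this check, no process can hold a higher-numbered resource while waiting for a lower-numbered one, so a cycle in the resource graph cannot form.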

  • Page - 490

    SEC. 6.7 OTHER ISSUES 459
    Condition           Approach
    Mutual exclusion    Spool everything
    Hold and wait       Request all resources initially
    No preemption       Take resources away
    Circular wait       Order resources numerically
    Figure 6-14. Summary of approaches to deadlock prevention.
    6.7.1 Two-Phase Locking Although both avoidance and prevention are not terribly promising in the general case, for specific read more..

  • Page - 491

    460 DEADLOCKS CHAP. 6 complete service if their execution were not interleaved with competing processes. A process locks resources in order to prevent inconsistent resource states caused by interleaved access to resources. Interleaved access to locked resources, however, enables resource deadlock. In Fig. 6-2 we saw a resource deadlock where the resources were semaphores. A semaphore is read more..

  • Page - 492

    SEC. 6.7 OTHER ISSUES 461 Readers interested in network protocols might be interested in another book by one of the authors, Computer Networks (Tanenbaum and Wetherall, 2010). Not all deadlocks occurring in communication systems or networks are communication deadlocks. Resource deadlocks can also occur there. Consider, for example, the network of Fig. 6-15. It is a simplified view read more..

  • Page - 493

    462 DEADLOCKS CHAP. 6 Consider an atomic primitive try_lock in which the calling process tests a mutex and either grabs it or returns failure. In other words, it never blocks. Programmers can use it together with acquire_lock, which also tries to grab the lock, but blocks if the lock is not available. Now imagine a pair of processes running in parallel (perhaps on read more..

  • Page - 494

    SEC. 6.7 OTHER ISSUES 463 Now suppose that a UNIX system has 100 process slots. Ten programs are running, each of which needs to create 12 children. After each process has created 9 processes, the 10 original processes and the 90 new processes have exhausted the table. Each of the 10 original processes now sits in an endless loop forking and failing—a livelock. The read more..
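The arithmetic of this process-table example can be replayed with a small simulation. This is a sketch under stated assumptions: the function name `fork_until_full` is hypothetical, and the parents are assumed to fork in round-robin turns, which is what makes each of them end up with exactly 9 children.

```python
def fork_until_full(slots=100, parents=10, children_needed=12):
    """Let parents fork in turn until no fork can succeed; return (used, children)."""
    used = parents                 # the original programs already occupy slots
    children = [0] * parents
    forked = True
    while forked:
        forked = False
        for p in range(parents):
            if children[p] < children_needed and used < slots:
                children[p] += 1   # this fork succeeds and takes one slot
                used += 1
                forked = True
    return used, children


used, children = fork_until_full()
print(used, children)
```

Under these assumptions the table fills at 10 + 90 = 100 entries while every parent still needs 3 more children, so each further fork attempt fails and none of the parents can make progress.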

  • Page - 495

    464 DEADLOCKS CHAP. 6 It is worth mentioning that some people do not make a distinction between starvation and deadlock because in both cases there is no forward progress. Others feel that they are fundamentally different because a process could easily be programmed to try to do something n times and, if all of them failed, try something else. A blocked process does not read more..

  • Page - 496

    SEC. 6.9 SUMMARY 465 forever. Commonly the event that the processes are waiting for is the release of some resource held by another member of the set. Another situation in which deadlock is possible is when a set of communicating processes are all waiting for a message and the communication channel is empty and no timeouts are pending. Resource deadlock can be avoided by read more..

  • Page - 497

    466 DEADLOCKS CHAP. 6 Gridlock is a resource deadlock and a problem in competition synchronization. New York City’s prevention algorithm, called "don’t block the box," prohibits cars from entering an intersection unless the space following the intersection is also available. Which prevention algorithm is this? Can you provide any other prevention algorithms for gridlock? 7. Suppose read more..

  • Page - 498

    CHAP. 6 PROBLEMS 467 15. Explain how the system can recover from the deadlock in the previous problem using (a) recovery through preemption. (b) recovery through rollback. (c) recovery through killing processes. 16. Suppose that in Fig. 6-6 C_ij + R_ij > E_j for some i. What implications does this have for the system? 17. All the trajectories in Fig. 6-8 are horizontal or read more..

  • Page - 499

    468 DEADLOCKS CHAP. 6 28. Two processes, A and B, each need three records, 1, 2, and 3, in a database. If A asks for them in the order 1, 2, 3, and B asks for them in the same order, deadlock is not possible. However, if B asks for them in the order 3, 2, 1, then deadlock is possible. With three resources, there are 3! or six possible combinations in read more..

  • Page - 500

    CHAP. 6 PROBLEMS 469 ions. In the Ethernet protocol, stations requesting the shared channel do not transmit frames if they sense the medium is busy. When such transmission has terminated, waiting stations each transmit their frames. Two frames that are transmitted at the same time will collide. If stations immediately and repeatedly retransmit after collision detection, they will read more..

  • Page - 501

    470 DEADLOCKS CHAP. 6 41. Program a simulation of the banker’s algorithm. Your program should cycle through each of the bank clients asking for a request and evaluating whether it is safe or unsafe. Output a log of requests and decisions to a file. 42. Write a program to implement the deadlock detection algorithm with multiple resources of each type. Your program should read more..

  • Page - 502

    7 VIRTUALIZATION AND THE CLOUD In some situations, an organization has a multicomputer but does not actually want it. A common example is where a company has an email server, a Web server, an FTP server, some e-commerce servers, and others. These all run on different computers in the same equipment rack, all connected by a high-speed network, in other words, a multicomputer. read more..

  • Page - 503

    472 VIRTUALIZATION AND THE CLOUD CHAP. 7 1960s. Even so, the way we use it today is definitely new. The main idea is that a VMM (Virtual Machine Monitor) creates the illusion of multiple (virtual) machines on the same physical hardware. A VMM is also known as a hypervisor. As discussed in Sec. 1.7.5, we distinguish between type 1 hypervisors which run on the bare metal, read more..

  • Page - 504

    SEC. 7.1 HISTORY 473 amount of critical state information about every process is kept in operating system tables, including information relating to open files, alarms, signal handlers, and more. When migrating a virtual machine, all that have to be moved are the memory and disk images, since all the operating system tables move, too. Another use for virtual machines is to run read more..

  • Page - 505

    474 VIRTUALIZATION AND THE CLOUD CHAP. 7 SIMMON and CP-40. While CP-40 was a research project, it was reimplemented as CP-67 to form the control program of CP/CMS, a virtual machine operating system for the IBM System/360 Model 67. Later, it was reimplemented again and released as VM/370 for the System/370 series in 1972. The System/370 line was replaced by IBM in the 1990s read more..

  • Page - 506

    SEC. 7.2 REQUIREMENTS FOR VIRTUALIZATION 475 hypervisor to provide this illusion and to do it efficiently. Indeed, hypervisors should score well in three dimensions: 1. Safety: the hypervisor should have full control of the virtualized resources. 2. Fidelity: the behavior of a program on a virtual machine should be identical to that of the same program running on bare hardware. read more..

  • Page - 507

    476 VIRTUALIZATION AND THE CLOUD CHAP. 7 sensitive state in user mode without causing a trap. For example, on x86 processors prior to 2005, a program can determine whether it is running in user mode or kernel mode by reading its code-segment selector. An operating system that did this and discovered that it was actually in user mode might make an incorrect decision read more..

  • Page - 508

    SEC. 7.2 REQUIREMENTS FOR VIRTUALIZATION 477 machine-like software interface that explicitly exposes the fact that it is a virtualized environment. For instance, it offers a set of hypercalls, which allow the guest to send explicit requests to the hypervisor (much as a system call offers kernel services to applications). Guests use hypercalls for privileged sensitive operations read more..

  • Page - 509

    478 VIRTUALIZATION AND THE CLOUD CHAP. 7 [Figure 7-1. Location of type 1 and type 2 hypervisors.] on the x86 market was VMware read more..

  • Page - 510

    SEC. 7.4 TECHNIQUES FOR EFFICIENT VIRTUALIZATION 479 runs on the bare metal. The virtual machine runs as a user process in user mode, and as such is not allowed to execute sensitive instructions (in the Popek-Goldberg sense). However, the virtual machine runs a guest operating system that thinks it is in kernel mode (although, of course, it is not). We will call this read more..

  • Page - 511

    480 VIRTUALIZATION AND THE CLOUD CHAP. 7 ring 0. The remaining two rings are not used by any current operating system. In other words, hypervisors were free to use them as they pleased. As shown in Fig. 7-4, many virtualization solutions therefore kept the hypervisor in kernel mode (ring 0) and the applications in user mode (ring 3), but put the guest operating system in read more..

  • Page - 512

    SEC. 7.4 TECHNIQUES FOR EFFICIENT VIRTUALIZATION 481 it can be executed immediately. Otherwise, it is first translated, cached, then executed. Eventually, most of the program will be in the cache and run at close to full speed. Various optimizations are used. For example, if a basic block ends by jumping to (or calling) another one, the final instruction can be replaced by a read more..

  • Page - 513

    482 VIRTUALIZATION AND THE CLOUD CHAP. 7 needs to clean it up and restore the original processor context. Suppose, for instance, that the guest is running when an interrupt arrives from an external device. Since a type 2 hypervisor depends on the host’s device drivers to handle the interrupt, it needs to reconfigure the hardware completely to run the host operating system read more..

  • Page - 514

    SEC. 7.4 TECHNIQUES FOR EFFICIENT VIRTUALIZATION 483 instruction that may take as little as one to three cycles. Thus, the translated code is faster. Still, with modern VT hardware, usually the hardware beats the software. On the other hand, if the guest operating system modifies its page tables, this is very costly. The problem is that each guest operating system on a virtual read more..

  • Page - 515

    484 VIRTUALIZATION AND THE CLOUD CHAP. 7 (or wrong)—yet—we think we do right by exploring the similarity between hypervisors and microkernels a bit more. The main reason the first hypervisors emulated the complete machine was the lack of availability of source code for the guest operating system (e.g., for Windows) or the vast number of variants (e.g., for Linux). Perhaps in the read more..

  • Page - 516

    SEC. 7.5 ARE HYPERVISORS MICROKERNELS DONE RIGHT? 485 Paravirtualizing the guest operating system raises a number of issues. First, if the sensitive instructions are replaced with calls to the hypervisor, how can the operating system run on the native hardware? After all, the hardware does not understand these hypercalls. And second, what if there are multiple hypervisors read more..

  • Page - 517

    486 VIRTUALIZATION AND THE CLOUD CHAP. 7 7.6 MEMORY VIRTUALIZATION So far we have addressed the issue of how to virtualize the CPU. But a computer system has more than just a CPU. It also has memory and I/O devices. They have to be virtualized, too. Let us see how that is done. Modern operating systems nearly all support virtual memory, which is basically a read more..

  • Page - 518

    SEC. 7.6 MEMORY VIRTUALIZATION 487 a shadow page table at this point and also map the top-level page table and the page tables it points to as read only. Subsequent attempts by the guest operating system to modify any of them will cause a page fault and thus give control to the hypervisor, which can analyze the instruction stream, figure out what the guest OS is read more..

  • Page - 519

    488 VIRTUALIZATION AND THE CLOUD CHAP. 7 Hardware Support for Nested Page Tables The cost of handling shadow page tables led chip makers to add hardware support for nested page tables. Nested page tables is the term used by AMD. Intel refers to them as EPT (Extended Page Tables). They are similar and aim to remove most of the overhead by handling the additional read more..

  • Page - 520

    SEC. 7.6 MEMORY VIRTUALIZATION 489 [Figure residue: a 64-bit guest virtual address with bits 63 to 48 unused, level 1 offset in bits 47 to 39, level 2 offset in bits 38 to 30, level 3 offset in bits 29 to 21, level 4 offset in bits 20 to 12, and page offset in bits 11 to 0. Each guest pointer (to the level 1 page table, to an entry in the level 1 page table, to an entry in the level 2 page table, etc.) is looked up in the nested page tables.] Figure 7-7. Extended/nested page read more..
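The bit boundaries shown in Fig. 7-7 (47 to 39, 38 to 30, 29 to 21, 20 to 12, and 11 to 0) translate directly into shifts and masks. The sketch below only illustrates that field split; the function name `split_guest_virtual_address` and the sample address are assumptions, and no page-table walk is performed.

```python
def split_guest_virtual_address(va):
    """Split a 48-bit guest virtual address into four 9-bit indices and a 12-bit offset."""
    return {
        "level1": (va >> 39) & 0x1FF,   # bits 47..39
        "level2": (va >> 30) & 0x1FF,   # bits 38..30
        "level3": (va >> 21) & 0x1FF,   # bits 29..21
        "level4": (va >> 12) & 0x1FF,   # bits 20..12
        "offset": va & 0xFFF,           # bits 11..0
    }


# Hypothetical address built so that each field holds a recognizable value.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_guest_virtual_address(va))
```

Each of the four indices selects an entry in a guest page table whose own address is a guest-physical address, which is why every level of the guest walk needs a further lookup in the nested page tables.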

  • Page - 521

    490 VIRTUALIZATION AND THE CLOUD CHAP. 7 For instance, suppose that the hypervisor pages out a page P. A little later, the guest OS also decides to page out this page to disk. Unfortunately, the hypervisor’s swap space and the guest’s swap space are not the same. In other words, the hypervisor must first page the contents back into memory, only to see the guest read more..

  • Page - 522

    SEC. 7.7 I/O VIRTUALIZATION 491 hardware devices was one of the reasons VM/370 became popular: companies wanted to buy new and faster hardware but did not want to change their software. Virtual machine technology made this possible. Another interesting trend related to I/O is that the hypervisor can take the role of a virtual switch. In this case, each virtual machine has a MAC read more..

  • Page - 523

    492 VIRTUALIZATION AND THE CLOUD CHAP. 7 the CPU that currently runs the virtual machine and with the vector number that the VM expects (e.g., 66). Finally, having an I/O MMU also helps 32-bit devices access memory above 4 GB. Normally, such devices are unable to access (e.g., DMA to) addresses beyond 4 GB, but the I/O MMU can easily remap the device’s lower addresses read more..

  • Page - 524

    SEC. 7.7 I/O VIRTUALIZATION 493 exclusive access to its own device is much easier if the hardware actually does the virtualization for you. On PCIe, this is known as single root I/O virtualization. Single root I/O virtualization (SR-IOV) allows us to bypass the hypervisor’s involvement in the communication between the driver and the device. Devices that support SR-IOV provide an read more..

  • Page - 525

    494 VIRTUALIZATION AND THE CLOUD CHAP. 7 7.9 VIRTUAL MACHINES ON MULTICORE CPUS The combination of virtual machines and multicore CPUs creates a whole new world in which the number of CPUs available can be set by the software. If there are, say, four cores, and each can run, for example, up to eight virtual machines, a single (desktop) CPU can be configured as a 32-node read more..

  • Page - 526

    SEC. 7.10 LICENSING ISSUES 495 the software on multiple virtual machines all running on the same physical machine? Many software vendors are somewhat unsure of what to do here. The problem is much worse in companies that have a license allowing them to have n machines running the software at the same time, especially when virtual machines come and go on demand. In some read more..

  • Page - 527

    496 VIRTUALIZATION AND THE CLOUD CHAP. 7 7.11.1 Clouds as a Service In this section, we will look at clouds with a focus on virtualization and operating systems. Specifically, we consider clouds that offer direct access to a virtual machine, which the user can use in any way he sees fit. Thus, the same cloud may run different operating systems, possibly on the same read more..

  • Page - 528

    SEC. 7.11 CLOUDS 497 in significant downtime. The challenge is to move the virtual machine from the hardware that needs servicing to the new machine without taking it down at all. A slightly better approach might be to pause the virtual machine, rather than shut it down. During the pause, we copy over the memory pages used by the virtual machine to the new hardware as read more..

  • Page - 529

    498 VIRTUALIZATION AND THE CLOUD CHAP. 7 7.12 CASE STUDY: VMWARE Since 1999, VMware, Inc. has been the leading commercial provider of virtualization solutions with products for desktops, servers, the cloud, and now even on cell phones. It provides not only hypervisors but also the software that manages virtual machines on a large scale. We will start this case study with a read more..

  • Page - 530

    SEC. 7.12 CASE STUDY: VMWARE 499 similarly ran on top of Windows NT. Both variants had identical functionality: users could create multiple virtual machines by specifying first the characteristics of the virtual hardware (such as how much memory to give the virtual machine, or the size of the virtual disk) and could then install the operating system of their choice within the read more..

  • Page - 531

    500 VIRTUALIZATION AND THE CLOUD CHAP. 7 authors the ACM Software System Award for VMware Workstation 1.0 for Linux. The original VMware Workstation is described in a detailed technical article (Bugnion et al., 2012). Here we provide a summary of that paper. The idea was that a virtualization layer could be useful on commodity platforms built from x86 CPUs and primarily read more..

  • Page - 532

    SEC. 7.12 CASE STUDY: VMWARE 501 1. Compatibility. The notion of an ‘‘essentially identical environment’’ meant that any x86 operating system, and all of its applications, would be able to run without modifications as a virtual machine. A hypervisor needed to provide sufficient compatibility at the hardware level such that users could run whichever operating system (down to the read more..

  • Page - 533

    502 VIRTUALIZATION AND THE CLOUD CHAP. 7 legacy support for multiple decades of backward compatibility. Over the years, it had introduced four main modes of operations (real, protected, v8086, and system management), each of which enabled in different ways the hardware’s segmentation model, paging mechanisms, protection rings, and security features (such as call gates). 3. x86 read more..

  • Page - 534

    SEC. 7.12 CASE STUDY: VMWARE 503 safely, on the hardware. When this is not possible, one approach is to specify a virtualizable subset of the processor architecture, and port the guest operating systems to that newly defined platform. This technique is known as paravirtualization (Barham et al., 2003; Whitaker et al., 2002) and requires source-code level modifications of the read more..

  • Page - 535

    504 VIRTUALIZATION AND THE CLOUD CHAP. 7 [Figure 7-8. High-level components of the VMware virtual machine monitor (in the absence of hardware support): the decision algorithm, direct execution, binary translation, and shared modules (shadow MMU, I/O handling, …).] 1. The virtual machine is currently running in kernel mode (ring 0 in the x86 architecture). 2. The virtual machine can disable interrupts read more..

  • Page - 536

    SEC. 7.12 CASE STUDY: VMWARE 505 with the identical instruction. This is possible only because the VMware VMM (of which the binary translator is a component) has previously configured the hardware to match the exact specification of the virtual machine: (a) the VMM uses shadow page tables, which ensures that the memory management unit can be used directly (rather than emulated) read more..

  • Page - 537

    506 VIRTUALIZATION AND THE CLOUD CHAP. 7 how to correctly virtualize ring 1 and ring 2, the VMware VMM simply had code to detect if a guest was trying to enter into ring 1 or ring 2, and, in that case, would abort execution of the virtual machine. This not only removed unnecessary code, but more importantly it allowed the VMware VMM to assume that ring 1 and ring 2 read more..

  • Page - 538

    SEC. 7.12 CASE STUDY: VMWARE 507
    Virtual Hardware (front end) -> Back end
    Multiplexed:
      1 virtual x86 CPU, with the same instruction set extensions as the underlying hardware CPU -> Scheduled by the host operating system on either a uniprocessor or multiprocessor host
      Up to 512 MB of contiguous DRAM -> Allocated and managed by the host OS (page-by-page)
    Emulated:
      PCI Bus -> Fully emulated compliant read more..

  • Page - 539

    508 VIRTUALIZATION AND THE CLOUD CHAP. 7 across physical boundaries (Nelson et al., 2005). In the cloud, it allows customers to deploy their virtual machines on any available server, without having to worry about the details of the underlying hardware. The Role of the Host Operating System The final critical design decision in VMware Workstation was to deploy it ‘‘on top’’ of read more..

  • Page - 540

    SEC. 7.12 CASE STUDY: VMWARE 509 [Figure 7-10. The VMware Hosted Architecture and its three components: VMX, VMM driver and VMM.] These components each have different functions and operate independently read more..

  • Page - 541

    510 VIRTUALIZATION AND THE CLOUD CHAP. 7 VMware Workstation appears to run on top of an existing operating system, and, in fact, its VMX does run as a process of that operating system. However, the VMM operates at system level, in full control of the hardware, and without depending in any way on the host operating system. Figure 7-10 shows the relationship between the read more..

  • Page - 542

    SEC. 7.12 CASE STUDY: VMWARE 511 The careful reader will have wondered: what of the guest operating system’s kernel address space? The answer is simply that it is part of the virtual machine address space, and is present when running in the VMM context. Therefore, the guest operating system can use the entire address space, and in particular the same locations in read more..

  • Page - 543

    512 VIRTUALIZATION AND THE CLOUD CHAP. 7 support 64-bit systems, the fundamental idea of having totally separate address spaces for the host operating system and the VMM remains valid today. In contrast, the approach to the virtualization of the x86 architecture changed rather dramatically with the introduction of hardware-assisted virtualization. Hardware-assisted virtualizations, such as read more..

  • Page - 544

    SEC. 7.12 CASE STUDY: VMWARE 513 [Figure 7-12. ESX Server: VMware’s type 1 hypervisor.] ESX Server (unlike VMware Workstation) required users to install a new system image on a boot partition. Despite the drawbacks, the trade-off made sense for dedicated deployments of virtualization in data centers, consisting of hundreds or thousands read more..

  • Page - 545

    514 VIRTUALIZATION AND THE CLOUD CHAP. 7 system optimized specifically to store virtual machine images and ensure high I/O throughput. This allows for extreme levels of performance. For example, VMware demonstrated back in 2011 that a single ESX Server could issue 1 million disk operations per second (VMware, 2011). 5. ESX Server made it easy to introduce new capabilities, which read more..

  • Page - 546

    SEC. 7.13 RESEARCH ON VIRTUALIZATION AND THE CLOUD 515 One of the nice things about virtualization hardware is that untrusted code can get direct but safe access to hardware features like page tables, and tagged TLBs. With this in mind, the Dune project (Belay, 2012) does not aim to provide a machine abstraction, but rather it provides a process abstraction. The process is read more..

  • Page - 547

    516 VIRTUALIZATION AND THE CLOUD CHAP. 7 17. Give one case where a translated code can be faster than the original code, in a system using binary translation. 18. VMware does binary translation one basic block at a time, then it executes the block and starts translating the next one. Could it translate the entire program in advance and then execute it? If so, what are read more..

  • Page - 548

    8 MULTIPLE PROCESSOR SYSTEMS Since its inception, the computer industry has been driven by an endless quest for more and more computing power. The ENIAC could perform 300 operations per second, easily 1000 times faster than any calculator before it, yet people were not satisfied with it. We now have machines millions of times faster than the ENIAC and still there is a read more..

  • Page - 549

    518 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 in all, going from 1 MHz to 1 GHz simply required incrementally better engineering of the chip manufacturing process. Going from 1 GHz to 1 THz is going to require a radically different approach. One approach to greater speed is through massively parallel computers. These machines consist of many CPUs, each of which runs at read more..

  • Page - 550

    SEC. 8.1 MULTIPROCESSORS 519 main memory (and sometimes even sharing caches). In other words, the model of shared-memory multicomputers may be implemented using physically separate CPUs, multiple cores on a single CPU, or a combination of the above. While this model, illustrated in Fig. 8-1(a), sounds simple, actually implementing it is not really so simple and usually involves read more..

  • Page - 551

    520 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 introduction to the relevant hardware. Then we move on to the software, especially the operating system issues for that type of system. As we will see, in each case different issues are present and different approaches are needed. 8.1 MULTIPROCESSORS A shared-memory multiprocessor (or just multiprocessor henceforth) is a computer system in which read more..

  • Page - 552

    SEC. 8.1 MULTIPROCESSORS 521 [Figure 8-2. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.] The solution to this problem is to add a cache to each CPU, as depicted in Fig. 8-2(b). The cache can be inside the CPU chip, next to the read more..

  • Page - 553

    522 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 memories is the crossbar switch, shown in Fig. 8-3. Crossbar switches have been used for decades in telephone switching exchanges to connect a group of incoming lines to a set of outgoing lines in an arbitrary way. At each intersection of a horizontal (incoming) and vertical (outgoing) line is a crosspoint. A crosspoint is a small read more..

  • Page - 554

    SEC. 8.1 MULTIPROCESSORS 523 Contention for memory is still possible, of course, if two CPUs want to access the same module at the same time. Nevertheless, by partitioning the memory into n units, contention is reduced by a factor of n compared to the model of Fig. 8-2. One of the worst properties of the crossbar switch is the fact that the number of crosspoints grows as read more..

  • Page - 555

    524 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 [Figure 8-5. An omega switching network: eight CPUs (000 to 111) connected to eight memories (000 to 111) through three stages of switches, 1A to 1D, 2A to 2D, and 3A to 3D.] All the second-stage switches, including 2D, use the second bit for routing. This, too, is a 1, so the message is now forwarded via the lower output to 3D. Here the read more..
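The per-stage routing decision described here depends only on the bits of the destination module number: stage k inspects bit k, sending the message to the upper output on a 0 and the lower output on a 1. A minimal sketch follows; the function name `omega_route` and the string representation of the module number are assumptions made for illustration.

```python
def omega_route(module_bits):
    """Return the output taken at each stage for a destination given as a bit string."""
    # Stage 1 looks at the leftmost bit, stage 2 at the next one, and so on.
    return ["upper" if bit == "0" else "lower" for bit in module_bits]


# Destination module 110: the first two bits are 1 (lower), the third is 0 (upper).
print(omega_route("110"))
```

The source CPU plays no part in these decisions, which is why every switch in a stage can apply the same rule to the same bit position.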

  • Page - 556

    SEC. 8.1 MULTIPROCESSORS 525 mostly accesses full 32-bit words. The 2 low-order bits will usually be 00, but the next 3 bits will be uniformly distributed. By using these 3 bits as the module number, consecutive words will be in consecutive modules. A memory system in which consecutive words are in different modules is said to be interleaved. Interleaved memories read more..
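The interleaving rule above amounts to dropping the 2 low-order bits of a byte address and keeping the next 3 as the module number. The sketch below assumes eight modules and 32-bit (4-byte) words, as in the passage; the function name `module_number` is hypothetical.

```python
def module_number(byte_address):
    """Return the memory module (0 to 7) that holds the word at this byte address."""
    # Skip the 2 low-order bits (byte within the word), keep the next 3 bits.
    return (byte_address >> 2) & 0b111


# Nine consecutive word addresses: 0, 4, 8, ..., 32.
print([module_number(a) for a in range(0, 36, 4)])
```

Consecutive words walk through modules 0 to 7 and then wrap around, so a sequential scan of memory spreads its requests evenly over all modules.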

  • Page - 557

    526 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 as shown in Fig. 8-6(a). Each node also holds the directory entries for the 2^18 64-byte cache lines comprising its 2^24-byte memory. For the moment, we will assume that a line can be held in at most one cache. [Figure 8-6 residue: (a) nodes 0 through 255, each with a CPU, memory, and directory on a local bus, all joined by an interconnection network. (b) An address split into fields of 8, 18, and 6 bits. (c) A directory.] read more..
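The sizes quoted here fit together: 2^24 bytes per node divided into 64-byte lines gives 2^18 lines, and the 8, 18, and 6 bit fields of Fig. 8-6(b) add up to a 32-bit address. The sketch below shows that split; the function name `split_address` and the sample address are assumptions for illustration, and the field order (node, line, offset from high bits to low) is inferred from the figure.

```python
def split_address(addr):
    """Split a 32-bit address into (node, line, offset) fields of 8, 18, and 6 bits."""
    node = (addr >> 24) & 0xFF       # which of the 256 nodes owns the memory
    line = (addr >> 6) & 0x3FFFF     # which of the 2**18 cache lines in that node
    offset = addr & 0x3F             # byte within the 64-byte line
    return node, line, offset


# Each node: 2**24 bytes of memory in 64-byte lines.
print((2 ** 24) // 64 == 2 ** 18)

# Hypothetical address used only to exercise the split.
print(split_address(0x24000108))
```

The node field tells the hardware which node's directory to consult, and the line field indexes into that directory.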

  • Page - 558

    SEC. 8.1 MULTIPROCESSORS 527 Now let us consider a second request, this time asking about node 36’s line 2. From Fig. 8-6(c) we see that this line is cached at node 82. At this point the hardware could update directory entry 2 to say that the line is now at node 20 and then send a message to node 82 instructing it to pass the line to node 20 and invalidate read more..

  • Page - 559

    528 MULTIPLE PROCESSOR SYSTEMS CHAP. 8 sure that if a word is present in two or more caches and one of the CPUs modifies the word, it is automatically and atomically removed from all the caches in order to maintain consistency. This process is known as snooping. The result of this design is that multicore chips are just very small multiprocessors. In fact, multicore read more..

  • Page - 560

    Consider, for instance, our directory-based cache-coherency solution discussed above. If each directory entry contains a bit vector to indicate which cores contain a particular cache line, the directory entry for a CPU with 1024 cores will be at least 128 bytes long. Since cache lines themselves are rarely larger than 128 bytes, this leads to the awkward read more..
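The 128-byte figure follows from one bit per core. A small sketch of the arithmetic, with the hypothetical helper name `directory_entry_bytes`:

```python
def directory_entry_bytes(num_cores: int) -> int:
    """Bytes needed for a presence bit vector with one bit per core."""
    return (num_cores + 7) // 8  # round up to whole bytes


entry_size = directory_entry_bytes(1024)

# Ratio of directory-entry size to cache-line size for two illustrative line sizes.
overhead_128 = entry_size / 128
overhead_64 = entry_size / 64
```

With 1024 cores the bookkeeping for a line is as large as a 128-byte line itself, and twice as large as a 64-byte line.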

  • Page - 561

    of processors in a single chip are collectively known as heterogeneous multicore processors. An example of a heterogeneous multicore processor is the line of IXP network processors originally introduced by Intel in 2000 and updated regularly with the latest technology. The network processors typically contain a single general-purpose control core read more..

  • Page - 562

    8.1.2 Multiprocessor Operating System Types Let us now turn from multiprocessor hardware to multiprocessor software, in particular, multiprocessor operating systems. Various approaches are possible. Below we will study three of them. Note that all of these are equally applicable to multicore systems as well as systems with discrete CPUs. Each CPU Has Its Own read more..

  • Page - 563

    Third, there is no sharing of physical pages. It can happen that CPU 1 has pages to spare while CPU 2 is paging continuously. There is no way for CPU 2 to borrow some pages from CPU 1 since the memory allocation is fixed. Fourth, and worst, if the operating system maintains a buffer cache of recently used disk blocks, each read more..

  • Page - 564

    idle while another is overloaded. Similarly, pages can be allocated among all the processes dynamically and there is only one buffer cache, so inconsistencies never occur. The problem with this model is that with many CPUs, the master will become a bottleneck. After all, it must handle all system calls from all CPUs. If, say, 10% of all time is read more..

  • Page - 565

    example, there is no problem with one CPU running the scheduler while another CPU is handling a file-system call and a third one is processing a page fault. This observation leads to splitting the operating system up into multiple independent critical regions that do not interact with one another. Each critical region is protected by its read more..

  • Page - 566

    touching the table. It can then do its work knowing that it will be able to finish without any other process sneaking in and touching the table before it is finished. On a multiprocessor, disabling interrupts affects only the CPU doing the disable. Other CPUs continue to run and can still touch the critical table. As a consequence, a proper read more..

  • Page - 567

    implement TSL correctly. This is why Peterson’s protocol was invented: to synchronize entirely in software (Peterson, 1981). If TSL is correctly implemented and used, it guarantees that mutual exclusion can be made to work. However, this mutual exclusion method uses a spin lock because the requesting CPU just sits in a tight loop testing read more..
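The tight testing loop of a spin lock can be sketched as follows. Python has no TSL instruction, so this sketch assumes that a non-blocking `threading.Lock.acquire(blocking=False)` may stand in for it: it atomically tests the flag and sets it, reporting success only to the one caller that found it free. The names `SpinLock` and `run_workers` are hypothetical.

```python
import threading
import time


class SpinLock:
    """Spin-lock sketch; a non-blocking acquire stands in for the TSL instruction."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Assumption: acquire(blocking=False) plays the role of TSL. It tests and
        # sets the flag atomically and returns True only if the flag was free.
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # yield to other threads; a real CPU would simply retest

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=500):
    """Have several threads increment a shared counter inside the critical region."""
    lock = SpinLock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            lock.acquire()
            total += 1  # critical region
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Because every increment happens while the lock is held, no update is lost, at the cost of waiting threads burning time in the loop.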

  • Page - 568

    Another way to reduce bus traffic is to use the well-known Ethernet binary exponential backoff algorithm (Anderson, 1990). Instead of continuously polling, as in Fig. 2-25, a delay loop can be inserted between polls. Initially the delay is one instruction. If the lock is still busy, the delay is doubled to two instructions, then four instructions, and read more..
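The doubling delay can be sketched as a generator plus a polling loop. The excerpt above breaks off before stating any upper limit, so the cap `max_delay` is an assumption of this sketch, as are the names `backoff_delays`, `acquire_with_backoff`, `try_lock`, and `pause`.

```python
def backoff_delays(max_delay=64):
    """Yield delays 1, 2, 4, ... doubling each time, capped at max_delay (assumed cap)."""
    delay = 1
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


def acquire_with_backoff(try_lock, max_delay=64, pause=lambda n: None):
    """Poll try_lock() until it succeeds, pausing longer after each failure.

    try_lock is a hypothetical TSL-like callable returning True on success.
    pause receives the delay (in abstract 'instructions'). Returns the poll count.
    """
    polls = 0
    for delay in backoff_delays(max_delay):
        polls += 1
        if try_lock():
            return polls
        pause(delay)


# Example: the lock becomes free on the fourth poll.
attempts = iter([False, False, False, True])
waits = []
polls_needed = acquire_with_backoff(lambda: next(attempts), pause=waits.append)
```

The fewer polls a waiting CPU issues, the less bus traffic it generates, though a long delay may mean the lock sits free for a while before it is noticed.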

  • Page - 569

    ready list to pick a process to run. If the ready list is locked, the CPU cannot just decide to suspend what it is doing and run another process, as doing that would require reading the ready list. It must wait until it can acquire the ready list. However, in other cases, there is a choice. For example, if some thread on a read more..

  • Page - 570

    maximum of 2 msec, but observe how long it actually spun. If it fails to acquire a lock and sees that on the previous three runs it waited an average of 200 μsec, it should spin for 2 msec before switching. However, if it sees that it spun for the full 2 msec on each of the previous attempts, it should switch immediately and not spin read more..
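The history-based decision described above might be sketched like this. The rule coded here is one reading of the passage (switch at once only when each of the last three attempts used the full 2 msec; otherwise spin up to the maximum); the class name `SpinHistory` and its methods are hypothetical.

```python
from collections import deque

MAX_SPIN_USEC = 2000  # the 2-msec ceiling from the passage


class SpinHistory:
    """Remember how long the last few spins lasted and pick the next spin budget."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)

    def record(self, spun_usec):
        """Store the time actually spent spinning on one attempt."""
        self.recent.append(min(spun_usec, MAX_SPIN_USEC))

    def spin_budget(self):
        """Return how long to spin (in microseconds) before switching threads."""
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and all(s >= MAX_SPIN_USEC for s in self.recent):
            return 0  # spinning has not paid off lately: switch immediately
        return MAX_SPIN_USEC


history = SpinHistory()
for spun in (150, 250, 200):  # previous three runs averaged 200 microseconds
    history.record(spun)
budget_after_short_spins = history.spin_budget()

for spun in (2000, 2000, 2000):  # three attempts that spun for the full 2 msec
    history.record(spun)
budget_after_long_spins = history.spin_budget()
```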

  • Page - 571

    An example of the latter situation occurs regularly in program development environments. Large systems often consist of some number of header files containing macros, type definitions, and variable declarations that are used by the actual code files. When a header file is changed, all the code files that include it must be recompiled. The read more..

  • Page - 572

    Figure 8-12. Using a read more..

  • Page - 573

    keep a thread on the same CPU for its entire lifetime, cache affinity is maximized. However, if a CPU has no threads to run, it takes one from another CPU rather than go idle. Two-level scheduling has three benefits. First, it distributes the load roughly evenly over the available CPUs. Second, advantage is taken of cache read more..

  • Page - 574

    Periodically, scheduling decisions have to be made. In uniprocessor systems, shortest job first is a well-known algorithm for batch scheduling. The analogous algorithm for a multiprocessor is to choose the process needing the smallest number of CPU cycles, that is, the thread whose CPU-count × run-time is the smallest of the candidates. However, in read more..
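The selection rule above (smallest CPU-count × run-time) is easy to state in code. This is a minimal sketch; the `Job` record, its field names, and the sample numbers are made up for illustration.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    cpus_needed: int   # how many CPUs the job wants at once
    run_time: float    # expected run time, in arbitrary units


def pick_next(jobs):
    """Pick the candidate with the smallest CPU-count x run-time product."""
    return min(jobs, key=lambda j: j.cpus_needed * j.run_time)


candidates = [
    Job("A", cpus_needed=4, run_time=10.0),  # 40 CPU-time units
    Job("B", cpus_needed=1, run_time=30.0),  # 30 CPU-time units
    Job("C", cpus_needed=2, run_time=12.0),  # 24 CPU-time units
]
chosen = pick_next(candidates)
```

Note that job B has the longest run time yet is not the most expensive, because the product rather than the run time alone is compared.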

  • Page - 575

    Figure 8-14. Communication between two threads belonging to process A that are running out of phase. The solution to this problem is gang scheduling, which is an outgrowth of co-scheduling read more..

  • Page - 576

    SEC. 8.2 MULTICOMPUTERS 545 Figure 8-15. Gang scheduling (axes: CPU, time slot). 8.2 MULTICOMPUTERS Multiprocessors are popular and attractive because read more..

  • Page - 577

    8.2.1 Multicomputer Hardware The basic node of a multicomputer consists of a CPU, memory, a network interface, and sometimes a hard disk. The node may be packaged in a standard PC case, but the monitor, keyboard, and mouse are nearly always absent. Sometimes this configuration is called a headless workstation because there is no user with read more..

  • Page - 578

    As an alternative to the single-switch design, the nodes may form a ring, with two wires coming out the network interface card, one into the node on the left and one going into the node on the right, as shown in Fig. 8-16(b). In this topology, no switches are needed and none are shown. The grid or mesh of Fig. 8-16(c) is a two-dimensional read more..

  • Page - 579

    Figure 8-17. Store-and-forward packet switching. design a network in which a packet can be logically divided into smaller units. As soon as the first unit arrives at a switch, it can be forwarded, even before the read more..

  • Page - 580

    constant rate. If the packet is in the main RAM, this continuous flow out onto the network cannot be guaranteed due to other traffic on the memory bus. Using a dedicated RAM on the interface board eliminates this problem. This design is shown in Fig. 8-18. [Figure labels: nodes, each with CPUs and main RAM; interface board with optional on-board CPU; switch.] read more..

  • Page - 581

    However, having two CPUs means that they must synchronize to avoid race conditions, which adds extra overhead and means more work for the operating system. Copying data across layers is safe, but not necessarily efficient. For instance, a browser requesting data from a remote web server will create a request in the browser’s address read more..

  • Page - 582

    buffer on the interface board, and then, due to a time slice, B runs and claims the same buffer, disaster results. Some kind of synchronization mechanism is needed, but these mechanisms, such as mutexes, work only when the processes are assumed to be cooperating. In a shared environment with multiple users all in a hurry to get their work done, read more..

  • Page - 583

    packet to it, not only will the incoming packet be lost, but also a page of innocent memory will be ruined, probably with disastrous consequences shortly. These problems can be avoided by having system calls to pin and unpin pages in memory, marking them as temporarily unpageable. However, having to make a system call to pin the page read more..

  • Page - 584

    8.2.3 User-Level Communication Software Processes on different CPUs on a multicomputer communicate by sending messages to one another. In the simplest form, this message passing is exposed to the user processes. In other words, the operating system provides a way to send and receive messages, and library procedures make these underlying calls available to read more..

  • Page - 585

    the receiver can specify from whom it wishes to receive, in which case it remains blocked until a message from that sender arrives. [Figure labels: sender running; trap to kernel, sender blocked; message being sent; return, sender running; message copied to a kernel buffer; return from kernel.] read more..

  • Page - 586

    course, the message will not yet have been sent, but the sender is not hindered by this fact. The disadvantage of this method is that every outgoing message has to be copied from user space to kernel space. With many network interfaces, the message will have to be copied to a hardware transmission buffer later anyway, so the first copy is read more..

  • Page - 587

    for incoming messages using a procedure, poll, that tells whether any messages are waiting. If so, the caller can call get_message, which returns the first arrived message. In some systems, the compiler can insert poll calls in the code at appropriate places, although knowing how often to poll is tricky. Yet another option is a scheme read more..
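The polling interface can be sketched with a simple in-memory queue. Only the two procedure names come from the passage (the second rendered here as `get_message`); the `Mailbox` class, the `deliver` hook that simulates message arrival, and the `drain` helper are assumptions made for illustration.

```python
from collections import deque


class Mailbox:
    """In-memory stand-in for a per-process queue of incoming messages."""

    def __init__(self):
        self._queue = deque()

    def deliver(self, message):
        """Hypothetical hook: called when a message arrives from the network."""
        self._queue.append(message)

    def poll(self):
        """Tell whether any messages are waiting, without blocking."""
        return len(self._queue) > 0

    def get_message(self):
        """Return the first arrived message."""
        return self._queue.popleft()


def drain(mailbox):
    """Collect every waiting message, polling before each retrieval."""
    received = []
    while mailbox.poll():
        received.append(mailbox.get_message())
    return received


box = Mailbox()
was_empty = not box.poll()
box.deliver("m1")
box.deliver("m2")
messages = drain(box)
```

The caller decides when to call `drain`, which is exactly the difficulty the passage mentions: poll too rarely and messages wait, poll too often and time is wasted.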

  • Page - 588

    represents the server procedure in the client’s address space. Similarly, the server is bound with a procedure called the server stub. These procedures hide the fact that the procedure call from the client to the server is not local. The actual steps in making an RPC are shown in Fig. 8-20. Step 1 is the client calling the client stub. This read more..

  • Page - 589

    space. With RPC, passing pointers is impossible because the client and server are in different address spaces. In some cases, tricks can be used to make it possible to pass pointers. Suppose that the first parameter is a pointer to an integer, k. The client stub can marshal k and send it along to the server. The server stub then read more..

  • Page - 590

    many of the issues and complications in distributed systems. Moreover, the idea itself has been very influential. With DSM, each page is located in one of the memories of Fig. 8-1(b). Each machine has its own virtual memory and page tables. When a CPU does a LOAD or STORE on a page it does not have, a trap to the operating system read more..

  • Page - 591

    and the DSM software fetches the page containing the address and restarts the faulting instruction, which now completes successfully. This concept is illustrated in Fig. 8-22(a) for an address space with 16 pages and four nodes, each capable of holding six pages. [Figure labels: globally shared virtual memory consisting of 16 pages; memory; network; panels (a), (b), (c).] read more..

  • Page - 592

    Replication One improvement to the basic system that can improve performance considerably is to replicate pages that are read only, for example, program text, read-only constants, or other read-only data structures. For example, if page 10 in Fig. 8-22 is a section of program text, its use by CPU 0 can result in a copy being sent to CPU 0 read more..

  • Page - 593

    Figure 8-23. False sharing of a page containing two unrelated variables. (In the figure, A and B are unrelated shared variables that just happen to be on the same page; code on CPU 1 uses A and code on CPU 2 uses B.) The problem here is that although the variables are unrelated, they appear by accident on the same page, read more..

  • Page - 594

    all other CPUs holding a copy of the page telling them to unmap and discard the page. After all of them have replied that the unmap has finished, the original CPU can now do the write. It is also possible to tolerate multiple copies of writable pages under carefully restricted circumstances. One way is to allow a process to acquire a lock on read more..

  • Page - 595

    once a process has been assigned to a node, the decision about which process should go on which node is important. This is in contrast to multiprocessor systems, in which all processes live in the same memory and can be scheduled on any CPU at will. Consequently, it is worth looking at how processes can be assigned to nodes in read more..

  • Page - 596

    Figure 8-24. Two ways of allocating nine processes to three nodes. A Sender-Initiated Distributed Heuristic Algorithm Now let us look at some distributed algorithms. One read more..

  • Page - 597

    Eager et al. constructed an analytical queueing model of this algorithm. Using this model, it was established that the algorithm behaves well and is stable under a wide range of parameters, including various threshold values, transfer costs, and probe limits. Nevertheless, it should be observed that under conditions of heavy load, all machines will read more..

  • Page - 598

    SEC. 8.3 DISTRIBUTED SYSTEMS 567 each node has its own private memory, with no shared physical memory in the system. However, distributed systems are even more loosely coupled than multicomputers. To start with, each node of a multicomputer generally has a CPU, RAM, a network interface, and possibly a disk for paging. In contrast, each node in a distributed system read more..

  • Page - 599

    Typical Internet applications include access to remote computers (using telnet, ssh, and rlogin), access to remote information (using the World Wide Web and FTP, the File Transfer Protocol), person-to-person communication (using email and chat programs), and many emerging applications (e.g., e-commerce, telemedicine, and distance learning). The trouble with read more..

  • Page - 600

    Figure 8-27. Positioning of middleware in a distributed system. (In the figure, applications on Pentium/Windows, Pentium/Linux, SPARC/Solaris, and Macintosh/Mac OS each sit on a middleware layer that forms a common base for applications above the network.) Networks), which can be citywide, countrywide, or worldwide. The most important kind of LAN is Ethernet, so we will read more..

  • Page - 601

    Figure 8-28. (a) Classic Ethernet. (b) Switched Ethernet. With many computers hooked up to the same cable, a protocol is needed to prevent chaos. To send a packet on an Ethernet, a computer first listens to the cable to see if any other computer is currently transmitting. If not, read more..

  • Page - 602

    The Internet The Internet evolved from the ARPANET, an experimental packet-switched network funded by the U.S. Dept. of Defense Advanced Research Projects Agency. It went live in December 1969 with three computers in California and one in Utah. It was designed at the height of the Cold War to be a highly fault-tolerant network that would read more..

  • Page - 603

    Figure 8-29. A portion of the Internet. 8.3.2 Network Services and Protocols All computer networks provide certain services to their users (hosts and processes), which they implement using read more..

  • Page - 604

    Each service can be characterized by a quality of service. Some services are reliable in the sense that they never lose data. Usually, a reliable service is implemented by having the receiver confirm the receipt of each message by sending back a special acknowledgement packet so the sender is sure that it arrived. The acknowledgement process read more..

  • Page - 605

    Figure 8-30. Six different types of network service. Connection-oriented: reliable message stream (example: sequence of pages of a book), reliable byte stream (example: remote login), unreliable connection (example: digitized voice). Connectionless: unreliable datagram (example: network test packets), acknowledged datagram (example: registered mail), request-reply (example: database query). Network Protocols All networks have read more..

  • Page - 606

    in the range 0–255 separated by dots, as in 192.31.231.65. When a packet arrives at a router, the router extracts the IP destination address and uses that for routing. Since IP datagrams are not acknowledged, IP alone is not sufficient for reliable communication in the Internet. To provide reliable communication, another protocol, TCP read more..
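The dotted notation above is just a readable form of a 32-bit (IPv4) address, one decimal number per byte. A minimal sketch of the conversion in both directions follows; the function names `parse_ipv4` and `format_ipv4` are hypothetical.

```python
def parse_ipv4(text: str) -> int:
    """Convert dotted-decimal text such as '192.31.231.65' to a 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four numbers separated by dots")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError("each number must be in the range 0-255")
        value = (value << 8) | octet  # append the next byte
    return value


def format_ipv4(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal text."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = parse_ipv4("192.31.231.65")
round_trip = format_ipv4(address)
```

A router works with the integer form; the dotted form exists only for people.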

  • Page - 607

    addresses are of the form user-name@DNS-host-name. This naming system allows the mail program on the sending host to look up the destination host’s IP address in the DNS database, establish a TCP connection to the mail daemon process there, and send the message as a file. The user-name is sent along to identify which mailbox to put read more..

  • Page - 608

    The way the whole system hangs together is as follows. The Web is fundamentally a client-server system, with the user being the client and the Website being the server. When the user provides the browser with a URL, either by typing it in or clicking on a hyperlink on the current page, the browser takes certain steps to fetch the read more..

  • Page - 609

    read, the file is then read locally, for high performance. If the file is to be written, it is written locally. When the process is done with it, the updated file is put back on the server. With the remote-access model, the file stays on the server and the client sends commands there to get work done there, as shown in Fig. read more..

  • Page - 610

    Figure 8-34. (a) Two file servers. The squares are directories and the circles are files. (b) A system in which all clients have the same view of the file system. (c) A system in which read more..

  • Page - 611

    that x is located on server 1, but it does not tell where that server is located. The server is free to move anywhere it wants to in the network without the path name having to be changed. Thus this system has location transparency. However, suppose that file x is extremely large and space is tight on server 1. Furthermore, read more..

  • Page - 612

    Figure 8-35. (a) Sequential consistency. (b) In a distributed system with caching, reading a file may return an obsolete read more..

  • Page - 613

    upload/download model shown in Fig. 8-33. This semantic rule is widely implemented and is known as session semantics. Using session semantics raises the question of what happens if two or more clients are simultaneously caching and modifying the same file. One solution is to say that as each file is closed in turn, its value is sent read more..

  • Page - 614

    When a CORBA object is created, a reference to it is also created and returned to the creating process. This reference is how the process identifies the object for subsequent invocations of its methods. The reference can be passed to other processes or stored in an object directory. To invoke a method on an object, a client process read more..

  • Page - 615

    A serious problem with CORBA is that every object is located on only one server, which means the performance will be terrible for objects that are heavily used on client machines around the world. In practice, CORBA functions acceptably only in small-scale systems, such as to connect processes on one computer, one LAN, or within a read more..

  • Page - 616

    Tuples are retrieved from the tuple space by the in primitive. They are addressed by content rather than by name or address. The fields of in can be expressions or formal parameters. Consider, for example, in("abc", 2, ?i); This operation "searches" the tuple space for a tuple consisting of the string "abc", the integer 2, read more..
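Content-based retrieval of this kind can be sketched with a small in-memory tuple space. This is an illustration, not Linda itself: the class names `TupleSpace` and `Formal` are hypothetical, `in_` is spelled with an underscore because `in` is a Python keyword, and the sketch returns `None` when nothing matches, whereas Linda's in blocks until a matching tuple appears. The `out` method mirrors Linda's primitive for depositing tuples.

```python
class Formal:
    """Stands for a formal parameter such as ?i: matches any value of one type."""

    def __init__(self, kind):
        self.kind = kind


class TupleSpace:
    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Put a tuple into the tuple space."""
        self._tuples.append(tuple(fields))

    def in_(self, *template):
        """Remove and return the first tuple matching the template, else None."""
        for stored in self._tuples:
            if self._matches(template, stored):
                self._tuples.remove(stored)
                return stored
        return None  # simplification: real Linda would block here

    @staticmethod
    def _matches(template, stored):
        if len(template) != len(stored):
            return False
        for want, have in zip(template, stored):
            if isinstance(want, Formal):
                if type(have) is not want.kind:  # formal: only the type must agree
                    return False
            elif type(want) is not type(have) or want != have:
                return False  # actual value: type and value must both agree
        return True


space = TupleSpace()
space.out("abc", 2, 5)
space.out("matrix-1", 1, 6, 3.14)
first = space.in_("abc", 2, Formal(int))   # plays the role of in("abc", 2, ?i)
second = space.in_("abc", 2, Formal(int))  # the tuple was removed by the first call
```

In the first call the formal parameter is bound to the third field of the matching tuple, here 5.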

  • Page - 617

    is also a primitive eval, which causes its parameters to be evaluated in parallel and the resulting tuple to be put in the tuple space. This mechanism can be used to perform an arbitrary computation. This is how parallel processes are created in Linda. Publish/Subscribe Our next example of a coordination-based model was inspired by Linda read more..

  • Page - 618

    necessary to store old tuples in case they are needed later. One way to store them is to hook up a database system to the system and have it subscribe to all tuples. This can be done by wrapping the database system in an adapter, to allow an existing database to work with the publish/subscribe model. As tuples come by, the adapter read more..

  • Page - 619

    Much systems research in recent years has also gone into making large applications scale to multicore and multiprocessor environments. One example is the scalable database engine described by Salomie et al. (2011). Again, the solution is to achieve scalability by replicating the database rather than trying to hide the parallel nature of read more..

  • Page - 620

    SEC. 8.5 SUMMARY 589 often put on top of the operating system to provide a uniform layer for applications to interact with. The various kinds include document-based, file-based, object-based, and coordination-based middleware. Some examples are the World Wide Web, CORBA, and Linda. PROBLEMS 1. Can the USENET newsgroup system or the SETI@home project be considered distributed systems? read more..

  • Page - 621

    consists of 16 32-bit words, each of which requires one bus cycle to transfer, and the bus runs at 400 MHz, what fraction of the bus bandwidth is eaten up by moving the cache block back and forth? 11. In the text, it was suggested that a binary exponential backoff algorithm be used between uses of TSL to poll a lock. It was read more..

  • Page - 622

    CHAP. 8 PROBLEMS 591 21. When a procedure is scooped up from one machine and placed on another to be called by RPC, some problems can occur. In the text, we pointed out four of these: pointers, unknown array sizes, unknown parameter types, and global variables. An issue not discussed is what happens if the (remote) procedure executes a system call. What problems might read more..

  • Page - 623

    35. Copying buffers takes time. Write a C program to find out how much time it takes on a system to which you have access. Use the clock or times functions to determine how long it takes to copy a large array. Test with different array sizes to separate copying time from overhead time. 36. Write C functions that could be used read more..

  • Page - 624

    9 SECURITY Many companies possess valuable information they want to guard closely. Among many things, this information can be technical (e.g., a new chip design or software), commercial (e.g., studies of the competition or marketing plans), financial (e.g., plans for a stock offering) or legal (e.g., documents about a potential merger or takeover). Most of this information is stored read more..

  • Page - 625

    594 SECURITY CHAP. 9 others’ hair. If Tracy and Camille were both registered users of the same computer the trick was to make sure that neither could read or tamper with the other’s files, yet allow them to share those files they wanted shared. Elaborate models and mechanisms were developed to make sure no user could get access rights he or she was not entitled to. read more..

  • Page - 626

    SEC. 9.1 THE SECURITY ENVIRONMENT 595 Phrased differently: while Camille may think she is the only user on the computer, she really is not alone at all! Attackers may launch exploits manually or automatically, by means of a virus or a worm. The difference between a virus and worm is not always very clear. Most people agree that a virus needs at least some user interaction read more..

  • Page - 627

    the term protection mechanisms to refer to the specific operating system mechanisms used to safeguard information in the computer. The boundary between them is not well defined, however. First we will look at security threats and attackers to see what the nature of the problem is. Later on in the chapter we will look at the protection mechanisms and read more..

  • Page - 628

    have. Even so, the original three still have a special place in the hearts and minds of most (elderly) security experts. Systems are under constant threat from attackers. For instance, an attacker may sniff the traffic on a local area network and break the confidentiality of the information, especially if the communication protocol does read more..

  • Page - 629

    consist of thousands (and sometimes millions) of compromised computers, often normal computers of innocent and ignorant users. There are all-too-many ways in which attackers can compromise a user’s machine. For instance, they may offer free, but malicious versions of popular software. The sad truth is that the promise of free ("cracked") versions of expensive read more..

  • Page - 630

    9.1.2 Attackers Most people are pretty nice and obey the law, so why worry about security? Because there are unfortunately a few people around who are not so nice and want to cause trouble (possibly for their own commercial gain). In the security literature, people who are nosing around places where they have no business being are read more..

  • Page - 631

    In general, we distinguish between attacks that passively try to steal information and attacks that actively try to make a computer program misbehave. An example of a passive attack is an adversary that sniffs the network traffic and tries to break the encryption (if any) to get to the data. In an active attack, the intruder may take control of a read more..

  • Page - 632

    SEC. 9.2 OPERATING SYSTEMS SECURITY 601 Here are two fairly simple examples. The first email systems sent messages as ASCII text. They were simple and could be made fairly secure. Unless there are really dumb bugs in the email program, there is little an incoming ASCII message can do to damage a computer system (we will actually see some attacks that may be possible later read more..

  • Page - 633

    An important part of the TCB is the reference monitor, as shown in Fig. 9-2. The reference monitor accepts all system calls involving security, such as opening files, and decides whether they should be processed or not. The reference monitor thus allows all the security decisions to be put in one place, with no possibility of bypassing it. Most operating read more..

  • Page - 634

    SEC. 9.3 CONTROLLING ACCESS TO RESOURCES 603 9.3.1 Protection Domains A computer system contains many resources, or "objects," that need to be protected. These objects can be hardware (e.g., CPUs, memory pages, disk drives, or printers) or software (e.g., processes, files, databases, or semaphores). Each object has a unique name by which it is referenced, and a finite set read more..

  • Page - 635

    To make the idea of a protection domain more concrete, let us look at UNIX (including Linux, FreeBSD, and friends). In UNIX, the domain of a process is defined by its UID and GID. When a user logs in, his shell gets the UID and GID contained in his entry in the password file and these are inherited by all its children. Given any (UID, read more..

  • Page - 636

    This situation models executing a SETUID program in UNIX. No other domain switches are permitted in this example. Figure 9-5. A protection matrix with domains as read more..

  • Page - 637

    Figure 9-6. Use of access control lists to manage file access. (In the figure, processes owned by A, B, and C run in user space, and the kernel holds one ACL per file. F1: A: RW; B: R. F2: A: R; B: RW; C: R. F3: B: RWX; C: RX.) This example illustrates the most basic form of protection with ACLs. More sophisticated systems are often used in practice. To start with, we have shown only read more..
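The three ACLs of Fig. 9-6 can be written down directly as a table keyed by file, and an access check is then a lookup. The sketch below copies the ACL contents from the figure; the names `ACLS` and `is_allowed` are hypothetical, and the rule that an unlisted user gets no access is the usual reading of an ACL.

```python
# One ACL per file, copied from Fig. 9-6: user -> rights (R, W, X).
ACLS = {
    "F1": {"A": "RW", "B": "R"},
    "F2": {"A": "R", "B": "RW", "C": "R"},
    "F3": {"B": "RWX", "C": "RX"},
}


def is_allowed(user: str, filename: str, right: str) -> bool:
    """Return True if the file's ACL grants `right` ('R', 'W', or 'X') to `user`."""
    if right not in ("R", "W", "X"):
        raise ValueError("right must be one of R, W, X")
    rights = ACLS.get(filename, {}).get(user, "")  # unlisted users get no rights
    return right in rights


a_can_write_f1 = is_allowed("A", "F1", "W")
b_can_write_f1 = is_allowed("B", "F1", "W")
c_can_read_f1 = is_allowed("C", "F1", "R")
```

Because the rights are stored with the file, changing who may use a file means editing one list, whoever the user is.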

  • Page - 638

    (File Password: tana, sysadm: RW. File Pigeon data: bill, pigfan: RW; tana, pigfan: RW; ...) Figure 9-7. Two access control lists. names and/or passwords to keep them separate. The point of this scheme is to prevent Tana from accessing the password file when she currently has her pigeon fancier’s hat on. She can do read more..

  • Page - 639

    done is edit the ACL to make the change. However, if the ACL is checked only when a file is opened, most likely the change will take effect only on future calls to open. Any file that is already open will continue to have the rights it had when it was opened, even if the user is no longer authorized to access the file.
    9.3.3 ...

  • Page - 640

    The second way is to keep the C-list inside the operating system. Capabilities are then referred to by their position in the capability list. A process might say: “Read 1 KB from the file pointed to by capability 2.” This form of addressing is similar to using file descriptors in UNIX. Hydra (Wulf et al., 1974) worked ...
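
    Addressing a capability by its position in a kernel-held C-list can be sketched as follows. The struct names, the fixed table size, and clist_use are hypothetical; the point is only that the process supplies an index and the kernel checks the rights stored at that slot, much as it does with a file descriptor.

```c
#include <assert.h>
#include <stddef.h>

enum { CAP_READ = 1, CAP_WRITE = 2, CAP_EXEC = 4 };

/* Hypothetical capability: which object, and which rights on it. */
struct capability { int object_id; unsigned rights; };

#define CLIST_MAX 8
/* The C-list lives in the kernel; a process only ever sees indices. */
struct clist { struct capability caps[CLIST_MAX]; size_t count; };

/*
 * Kernel-side lookup for a request such as "read via capability 2".
 * Returns the object id if slot `index` exists and grants every right
 * in `wanted`, or -1 otherwise.
 */
int clist_use(const struct clist *cl, size_t index, unsigned wanted)
{
    assert(cl != NULL);
    if (index >= cl->count)
        return -1;                              /* no such capability */
    if ((cl->caps[index].rights & wanted) != wanted)
        return -1;                              /* right not granted */
    return cl->caps[index].object_id;
}
```

    Because the process never holds the capability itself, it cannot forge or alter the rights bits; it can only name a slot.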

  • Page - 641

    the fourth field. Note that the original Check value is used because other outstanding capabilities depend on it. This new capability is sent back to the requesting process. The user can now give this to a friend by just sending it in a message. If the friend turns on rights bits that should be off, the server will detect this when the ...

  • Page - 642

    other hand, ACLs allow selective revocation of rights, which capabilities do not. Finally, if an object is removed and the capabilities are not or vice versa, problems arise. ACLs do not suffer from this problem. Most users are familiar with ACLs, because they are common in operating systems like Windows and UNIX. However, ...

  • Page - 643

    [Figure 9-10: two protection matrices, (a) and (b), with the users Eric, Henry, and Robert as rows and the objects Compiler, Mailbox 7, and Secret as columns; matrix (b) contains one Read entry more than matrix (a).]
    Figure 9-10. (a) An authorized state. (b) An unauthorized state.
    It should now be clear that the set of all possible matrices can be ...

  • Page - 644

    SEC. 9.4 FORMAL MODELS OF SECURE SYSTEMS 613
    The Bell-LaPadula Model
    The most widely used multilevel security model is the Bell-LaPadula model, so we will start there (Bell and LaPadula, 1973). This model was designed for handling military security, but it is also applicable to other organizations. In the military world, documents (objects) can have a security level, such as ...

  • Page - 645

    [Figure 9-11: processes A to E and objects 1 to 6 arranged by security level (1 to 4), with arrows for read and write operations; the legend distinguishes processes, objects, reads, and writes.]
    Figure 9-11. The Bell-LaPadula multilevel security model.
    a path that moves information downward, thus guaranteeing the security of the model. The Bell-LaPadula model refers to organizational structure, but ultimately has to be enforced by the operating system. One way this could be done ...

  • Page - 646

    1. The simple integrity property: A process running at security level k can write only objects at its level or lower (no write up).
    2. The integrity * property: A process running at security level k can read only objects at its level or higher (no read down).
    Together, these properties ensure that the programmer can update the ...
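
    The two integrity properties above reduce to a pair of comparisons. The sketch below assumes security levels are small nonnegative integers with larger meaning higher; the function names are made up for the illustration.

```c
#include <assert.h>

/* Simple integrity property: write only at the process's level or lower. */
int integrity_may_write(int process_level, int object_level)
{
    assert(process_level >= 0 && object_level >= 0);
    return object_level <= process_level;       /* no write up */
}

/* Integrity * property: read only at the process's level or higher. */
int integrity_may_read(int process_level, int object_level)
{
    assert(process_level >= 0 && object_level >= 0);
    return object_level >= process_level;       /* no read down */
}
```

    A process at level 3 may therefore write a level-2 object but not read it, and may read a level-4 object but not write it; only objects at its own level are both readable and writable.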

  • Page - 647

    [Figure 9-12: two panels, (a) and (b), showing the client, server, and collaborator processes above the kernel; in (b) the server is encapsulated and a covert channel leads to the collaborator.]
    Figure 9-12. (a) The client, server, and collaborator processes. (b) The encapsulated server can still leak to the collaborator via covert channels.
    a 1 bit, it computes as hard as it can for a fixed interval of time. To send a 0 bit, it goes to sleep for the same length ...

  • Page - 648

    [Figure 9-13: a timeline of the server and the collaborator; the server locks the file to send a 1 and unlocks it to send a 0. Bit stream sent: 1 1 0 1 0 1 0 0.]
    Figure 9-13. A covert channel using file locking.
    another bit is present in S. Since timing is no longer involved, this protocol is fully reliable, even in a busy system, and can proceed as fast as the two processes ...
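
    The encoding in Fig. 9-13 can be simulated without touching any real file locks. In the sketch below an array of eight integers stands in for the lock state the collaborator would observe in eight successive time slots (1 meaning locked); the function names and the most-significant-bit-first order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Server side: one lock state per bit, most significant bit first. */
void covert_encode(unsigned char byte, int locked[8])
{
    assert(locked != NULL);
    for (int i = 0; i < 8; i++)
        locked[i] = (byte >> (7 - i)) & 1;      /* locked = 1, unlocked = 0 */
}

/* Collaborator side: rebuild the byte from the observed lock states. */
unsigned char covert_decode(const int locked[8])
{
    unsigned char byte = 0;
    assert(locked != NULL);
    for (int i = 0; i < 8; i++)
        byte = (unsigned char)((byte << 1) | (locked[i] ? 1 : 0));
    return byte;
}
```

    Encoding the byte 0xD4 yields the lock pattern 1 1 0 1 0 1 0 0 shown in the figure.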

  • Page - 649

    As a case in point, consider Fig. 9-14(a). This photograph, taken by the author in Kenya, contains three zebras contemplating an acacia tree. Fig. 9-14(b) appears to be the same three zebras and acacia tree, but it has an extra added attraction. It contains the complete, unabridged text of five of Shakespeare’s plays embedded in it: Hamlet, King Lear, ...

  • Page - 650

    Viewing the two images in black and white with low resolution does not do justice to how powerful the technique is. To get a better feel for how steganography works, one of the authors (AST) has prepared a demonstration for Windows systems, including the full-color image of Fig. 9-14(b) with the five plays embedded in ...

  • Page - 651

    network packets, and most operating systems scramble passwords to prevent attackers from recovering them. Moreover, in Sec. 9.6, we will discuss the role of encryption in another important aspect of security: authentication. We will look at the basic primitives used by these systems. However, a serious discussion of cryptography is beyond the scope of ...

  • Page - 652

    SEC. 9.5 BASICS OF CRYPTOGRAPHY 621
    [Figure 9-15: the plaintext P enters the encryption algorithm E together with the encryption key K_E, giving the ciphertext C = E(P, K_E); the decryption algorithm D, with the decryption key K_D, recovers the plaintext as P = D(C, K_D).]
    Figure 9-15. Relationship between the plaintext and the ciphertext.
    plaintext ATTACK would be transformed into the ciphertext QZZQEA. The decryption key tells how to get back ...
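
    The ATTACK to QZZQEA example is a monoalphabetic substitution, which can be sketched as below. The 26-letter key is an assumption: the excerpt does not show it, and it was chosen here only because it reproduces the ciphertext in the text. The function names are likewise made up, and the code assumes an ASCII-like character set in which A to Z are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed key: entry i is the ciphertext letter for the i-th letter A..Z. */
static const char SUBST_KEY[] = "QWERTYUIOPASDFGHJKLZXCVBNM";

/* Encrypts uppercase letters; other characters are copied unchanged.
 * `out` must have room for strlen(in) + 1 bytes. */
void subst_encrypt(const char *in, char *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; in[i] != '\0'; i++)
        out[i] = (in[i] >= 'A' && in[i] <= 'Z') ? SUBST_KEY[in[i] - 'A'] : in[i];
    out[i] = '\0';
}

/* Decrypts by looking up each ciphertext letter's position in the key. */
void subst_decrypt(const char *in, char *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; in[i] != '\0'; i++) {
        const char *p = (in[i] >= 'A' && in[i] <= 'Z') ? strchr(SUBST_KEY, in[i]) : NULL;
        out[i] = p ? (char)('A' + (p - SUBST_KEY)) : in[i];
    }
    out[i] = '\0';
}
```

    Here the encryption key is the table itself and the decryption key is its inverse, so knowing one immediately gives the other.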

  • Page - 653

    system has the property that distinct keys are used for encryption and decryption and that given a well-chosen encryption key, it is virtually impossible to discover the corresponding decryption key. Under these circumstances, the encryption key can be made public and only the private decryption key kept secret. Just to give a feel for public-key ...

  • Page - 654

    9.5.4 Digital Signatures
    Frequently it is necessary to sign a document digitally. For example, suppose a bank customer instructs the bank to buy some stock for him by sending the bank an email message. An hour after the order has been sent and executed, the stock crashes. The customer now denies ever having sent the email. The bank can ...

  • Page - 655

    public-key cryptography only to a relatively small piece of data, the hash. Note carefully that this method works only if for all x
    E(D(x)) = x
    It is not guaranteed a priori that all encryption functions will have this property since all that we originally asked for was that
    D(E(x)) = x
    that is, E is the encryption function and D is the ...
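
    A toy RSA instance shows a pair of functions for which both E(D(x)) = x and D(E(x)) = x hold. The parameters below (n = 61 × 53 = 3233, public exponent 17, private exponent 2753) are a standard textbook-sized example chosen for this sketch and are far too small to be secure; they are not taken from the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Toy parameters (assumed for illustration): n = 61 * 53. */
enum { RSA_N = 3233, RSA_E = 17, RSA_D = 2753 };

/* Square-and-multiply modular exponentiation. */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

/* E uses the public exponent; D uses the private one. */
uint64_t E(uint64_t x) { assert(x < RSA_N); return modpow(x, RSA_E, RSA_N); }
uint64_t D(uint64_t x) { assert(x < RSA_N); return modpow(x, RSA_D, RSA_N); }

/* Signing applies D to the hash; anyone can check it by applying E. */
int signature_valid(uint64_t hash, uint64_t signature)
{
    return E(signature) == hash;
}
```

    Signing a hash value h means computing D(h) with the private key; the receiver applies the public E and compares the result with the hash it computed itself, which is exactly where the E(D(x)) = x property is needed.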

  • Page - 656

    Many computers already have TPM chips and many more are likely to have them in the future. TPM is extremely controversial because different parties have different ideas about who will control the TPM and what it will protect from whom. Microsoft has been a big advocate of this concept and has developed a series of technologies to use ...

  • Page - 657

    First, the challenging party creates an unpredictable value of, for example, 160 bits. This value, known as a nonce, is simply a unique identifier for this verification request. It serves to prevent an attacker from recording the response to one remote attestation request, changing the configuration on the attesting party and then simply replaying the ...

  • Page - 658

    SEC. 9.6 AUTHENTICATION 627
    have a login procedure, but more sophisticated personal computer operating systems, such as Linux and Windows 8, do (although foolish users can disable it). Machines on corporate LANs almost always have a login procedure configured so that users cannot bypass it. Finally, many people nowadays (indirectly) log into remote computers to do Internet ...

  • Page - 659

    (a)                      (b)                    (c)
    LOGIN: mitch             LOGIN: carol           LOGIN: carol
    PASSWORD: FooBar!-7      INVALID LOGIN NAME     PASSWORD: Idunno
    SUCCESSFUL LOGIN         LOGIN:                 INVALID LOGIN
                                                    LOGIN:
    Figure 9-17. (a) A successful login. (b) Login rejected after name is entered. (c) Login rejected after name and password are typed.
    feedback about whether the login name itself is valid. All she learns is that the ...

  • Page - 660

    Lest anyone think that better-quality users pick better-quality passwords, rest assured that they do not. When in 2012, 6.4 million LinkedIn (hashed) passwords leaked to the Web after a hack, many people had fun analyzing the results. The most popular password was “password”. The second most popular was “123456” (“1234”, “12345”, and ...

  • Page - 661

    otherwise weak password enable attackers to harvest a large number of accounts, sometimes with full administrator rights.
    UNIX Password Security
    Some (older) operating systems keep the password file on the disk in unencrypted form, but protected by the usual system protection mechanisms. Having all the passwords in a disk file in unencrypted form is just ...

  • Page - 662

    users, Bobbie, Tony, Laura, Mark, and Deborah. Each user has one line in the file, with three entries separated by commas: login name, salt, and encrypted password + salt. The notation e(Dog, 4238) represents the result of concatenating Bobbie’s password, Dog, with her randomly assigned salt, 4238, and running it through the encryption function, e. It ...
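
    The e(Dog, 4238) notation can be made concrete with a small sketch. The excerpt does not say which function e is, so the code substitutes 64-bit FNV-1a purely as a stand-in; it is not cryptographically secure and a real system would use a dedicated password-hashing function. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for e(): 64-bit FNV-1a. Illustration only, NOT secure. */
static uint64_t toy_hash(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* e(password, salt): hash the password concatenated with the salt. */
uint64_t salted_hash(const char *password, unsigned salt)
{
    char buf[128];
    assert(password != NULL && strlen(password) < 100);
    snprintf(buf, sizeof buf, "%s%u", password, salt);
    return toy_hash(buf);
}

/* One line of the password file: login name, salt, e(password, salt). */
struct pw_entry { const char *login; unsigned salt; uint64_t hash; };

/* Login check: recompute with the stored salt and compare. */
int check_login(const struct pw_entry *entry, const char *password)
{
    assert(entry != NULL);
    return salted_hash(password, entry->salt) == entry->hash;
}
```

    Because the salt is mixed in before hashing, two users who both choose Dog end up with different stored values, so a single precomputed table of hashed guesses no longer matches every account at once.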

  • Page - 663

    The algorithm is based on a one-way function, that is, a function y = f(x) that has the property that given x it is easy to find y, but given y it is computationally infeasible to find x. The input and output should be the same length, for example, 256 bits. The user picks a secret password that he memorizes. He also picks an integer, n, which ...

  • Page - 664

    1. Who is Marjolein’s sister?
    2. On what street was your elementary school?
    3. What did Mrs. Ellis teach?
    At login, the server asks one of them at random and checks the answer. To make this scheme practical, though, many question-answer pairs would be needed. Another variation is challenge-response. When this is used, the user picks an algorithm when ...

  • Page - 665

    contains the user’s password (e.g., PIN code) so the terminal can perform an identity check even if the link to the main computer is down. Typically the password is encrypted by a key known only to the bank. These cards cost about $0.10 to $0.50, depending on whether there is a hologram sticker on the front and the production volume. As a way ...

  • Page - 666

    Smart cards have many other potentially valuable uses (e.g., encoding the bearer’s allergies and other medical conditions in a secure way for use in emergencies), but this is not the place to tell that story. Our interest here is how they can be used for secure login authentication. The basic concept is simple: a smart card is a small, ...

  • Page - 667

    the card is used, new software is installed on it. A disadvantage of this approach is that it makes an already slow card even slower, but as technology improves, this method is very flexible. Another disadvantage of smart cards is that a lost or stolen one may be subject to a side-channel attack, for example a power analysis attack. By observing the ...

  • Page - 668

    has a device like the one of Fig. 9-20. The user inserts his hand into it, and the length of all his fingers is measured and checked against the database.
    [Figure 9-20: a hand-measuring device with a spring and a pressure plate.]
    Figure 9-20. A device for measuring finger length.
    Finger-length measurements are not perfect, however. The system can be attacked with hand molds made out of plaster of ...

  • Page - 669

    used. Amsterdam Airport has been using iris recognition technology since 2001 to enable frequent travelers to bypass the normal immigration line. A somewhat different technique is signature analysis. The user signs his name with a special pen connected to the computer, and the computer compares it to a known specimen stored online or on a smart card. Even ...

  • Page - 670

    SEC. 9.7 EXPLOITING SOFTWARE 639
    9.7 EXPLOITING SOFTWARE
    One of the main ways to break into a user’s computer is by exploiting vulnerabilities in the software running on the system to make it do something different than the programmer intended. For instance, a common attack is to infect a user’s browser by means of a drive-by-download. In this attack, the cybercriminal ...

  • Page - 671

    Although every exploit involves a specific bug in a specific program, there are several general categories of bugs that occur over and over and are worth studying to see how attacks work. In the following sections we will examine not only a number of these methods, but also countermeasures to stop them, and counter countermeasures to evade these measures, ...

  • Page - 672

    it easier to search the log later). Assume that function A is part of a privileged process, for instance a program that is SETUID root. An attacker who is able to take control of such a process essentially has root privileges himself. The code above has a severe bug, although it may not be immediately obvious. The problem is caused by ...

  • Page - 673

    not represent a valid code address. As soon as the function A returns, the program would try to jump to an invalid target, something the system would not like at all. In most cases, the program would crash immediately. Now assume that this is not a benign user who provides an overly long message by mistake, but an attacker who provides a tailored ...

  • Page - 674

    In the past, miners therefore brought canaries into the mine as early warning systems. Any build up of toxic gases would kill the canary before harming its owner. If your bird died, it was probably time to go up. Modern computer systems still use (digital) canaries as early warning systems. The idea is very simple. At places where the ...
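
    The canary idea can be demonstrated safely by simulating a stack frame inside an ordinary byte array, so that the "overflow" never leaves memory the program owns. Everything in the sketch is an assumption made for illustration: the frame layout (buffer, then canary, then return address), the 8-byte buffer, and the fixed canary constant, where real compilers typically use a random value chosen at program start.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8
#define CANARY_VALUE 0xDEADC0DEu    /* arbitrary constant for the demo */

/* Simulated frame: [ buffer | canary | return address ]. */
struct sim_frame {
    unsigned char bytes[BUF_SIZE + 2 * sizeof(uint32_t)];
};

/* Function entry: place the canary just below the return address. */
void frame_init(struct sim_frame *fr, uint32_t return_address)
{
    uint32_t canary = CANARY_VALUE;
    assert(fr != NULL);
    memset(fr->bytes, 0, sizeof fr->bytes);
    memcpy(fr->bytes + BUF_SIZE, &canary, sizeof canary);
    memcpy(fr->bytes + BUF_SIZE + sizeof canary, &return_address,
           sizeof return_address);
}

/* Copy with no check against BUF_SIZE, like the buggy code. The length
 * is clamped to the simulated frame so the demo stays well defined. */
void frame_unsafe_copy(struct sim_frame *fr, const void *src, size_t len)
{
    assert(fr != NULL && src != NULL);
    if (len > sizeof fr->bytes)
        len = sizeof fr->bytes;
    memcpy(fr->bytes, src, len);
}

/* Function exit: a changed canary means the buffer was overrun. */
int frame_canary_intact(const struct sim_frame *fr)
{
    uint32_t canary;
    assert(fr != NULL);
    memcpy(&canary, fr->bytes + BUF_SIZE, sizeof canary);
    return canary == CANARY_VALUE;
}
```

    A write that stays inside the buffer leaves the canary alone; a write long enough to reach the return address has to pass over the canary first, and the check before returning notices the damage.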

  • Page - 675

    Let us suppose the system uses stack canaries. How could we possibly change the return address? The trick is that when the attacker overflows buffer B, he does not try to hit the return address immediately. Instead, he modifies the variable len that is located just above it on the stack. In line 9, len serves as an offset that determines where ...

  • Page - 676

    schemes. A generic name for this security measure is DEP (Data Execution Prevention). Some hardware does not support the NX bit. In that case, DEP still works but the enforcement takes place in software. DEP prevents all of the attacks discussed so far. The attacker can inject as much shellcode into the process as he wants. Unless ...

  • Page - 677

    to? Since the attacker has control over the stack, he can again make the code return anywhere he wants to. Moreover, after he has done it twice, he may as well do it three times, or four, or ten, etc. Thus, the trick of return-oriented programming is to look for small sequences of code that (a) do something useful, and (b) end with a return ...

  • Page - 678

    good enough for the job. For instance, Fig. 9-23(b) suggests that gadget A has a check as part of the instruction sequence. The attacker may not care for the check at all, but since it is there, he will have to accept it. For most purposes, it is perhaps good enough to pop any nonnegative number into register 1. The next gadget pops ...

  • Page - 679

    A more important attack against ASLR is formed by memory disclosures. In this case, the attacker uses one vulnerability not to take control of the program directly, but rather to leak information about the memory layout, which he can then use to exploit a second vulnerability. As a trivial example, consider the following code:
    01. void C( ) {
    02. int ...

  • Page - 680

    10. } else
    11. printf ("Sorry %s, but you are not authorized.\n");
    12. }
    13. }
    The code is meant to do an authorization check. Only users with the right credentials are allowed to see the top secret data. The function check_credentials is not a function from the C library, but we assume that it exists somewhere in the program ...

  • Page - 681

    char *s="Hello World";
    printf("%s", s);
    In this program, the character string variable s is declared and initialized to a string consisting of “Hello World” and a zero-byte to indicate the end of the string. The call to the function printf has two arguments, the format string “%s”, which instructs it to print a string, and the ...

  • Page - 682

    When this program is compiled and run, the output it produces on the screen is:
    Hello world
    i=6
    Note that the variable i has been modified by a call to printf, something not obvious to everyone. While this feature is useful once in a blue moon, it means that printing a format string can cause a word (or many words) to be stored ...
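
    The behavior described here comes from the %n conversion, which stores the number of characters produced so far into the integer its argument points to. The function below is a small sketch written for this note, not the program the text refers to; it uses snprintf so the result can be inspected instead of printed. Some hardened C libraries restrict or disable %n, so the sketch assumes a conventional implementation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats "Hello world" into buf. The %n directive writes nothing to
 * buf; instead it stores the count of characters produced so far
 * ("Hello " is 6 characters) into *count. Returns snprintf's result.
 */
int hello_with_count(char *buf, size_t size, int *count)
{
    assert(buf != NULL && count != NULL);
    return snprintf(buf, size, "Hello %nworld", count);
}
```

    The danger arises when the format string itself comes from the user: calling printf(user_input) lets the input contain %n, whereas printf("%s", user_input) treats it as plain data.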

  • Page - 683

    [Figure 9-24: the stack, showing buffer B holding the format string "AAAA %08x %08x [...] %08x %n" and the stack frame of printf, whose first parameter is a pointer to the format string.]
    Figure 9-24. A format string attack. By using exactly the right number of %08x, the attacker can use the first four characters of the format string as an address.
    string (stored in buffer B) itself. In other words, printf will then use ...

  • Page - 684

    to a newly allocated chunk of memory. Later, when the program no longer needs it, it calls free to release the memory. A dangling pointer error occurs when the program accidentally uses the memory after it has already freed it. Consider the following code that discriminates against (really) old people:
    01. int *A = (int *) malloc (128); ...

  • Page - 685

    starts executing to handle a system call, it will run in the process’ address space. On a 32-bit system, user space occupies the bottom 3 GB of the address space and the kernel the top 1 GB. The reason for this cohabitation is efficiency: switching between address spaces is expensive. Normally this arrangement does not cause any problems. The situation ...

  • Page - 686

    This ability to cause undetected numerical overflows can be turned into an attack. One way to do this is to feed a program two valid (but large) parameters in the knowledge that they will be added or multiplied and result in an overflow. For example, some graphics programs have command-line parameters giving the height and width of an ...

  • Page - 687

    user types in “abc” and “xyz” respectively, then the command that the shell will execute is
    cp abc xyz
    which indeed copies the file. Unfortunately this code opens up a gigantic security hole using a technique called command injection. Suppose that the user types “abc” and “xyz; rm -rf /” instead. The command that is constructed ...
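
    The problem is easy to see by building the command string without ever running it. In the sketch below, build_cp_command reproduces the vulnerable pattern of pasting user input into a shell command, and name_is_safe is one possible whitelist check; both names and the particular set of allowed characters are assumptions for illustration, not something the text prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * The vulnerable pattern: untrusted names pasted into a shell command.
 * This only builds the string; it never executes it.
 * Returns 0 on success, -1 if the command does not fit in `cmd`.
 */
int build_cp_command(char *cmd, size_t size, const char *src, const char *dst)
{
    int n;
    assert(cmd != NULL && src != NULL && dst != NULL);
    n = snprintf(cmd, size, "cp %s %s", src, dst);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/*
 * One possible check (an assumption): accept only letters, digits,
 * '.', '_', '-', and '/', and reject names that are empty or start
 * with '-'. Anything else, such as ';' or a space, is refused.
 */
int name_is_safe(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-/";
    assert(name != NULL);
    if (name[0] == '\0' || name[0] == '-')
        return 0;
    return strspn(name, allowed) == strlen(name);
}
```

    A sturdier design avoids the shell altogether and passes the file names as separate arguments to the program being run, so that no character in them is ever interpreted as shell syntax.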

  • Page - 688

    To prevent this, the program performs a check to make sure the user has write access to the file by means of the access system call. The call checks the actual file (i.e., if it is a symbolic link, it will be dereferenced), returning 0 if the requested access is allowed and an error value of -1 otherwise. Moreover, the check is ...

  • Page - 689

    from the premises without warning, the next day (or next week) the logic bomb does not get fed its daily password, so it goes off. Many variants on this theme are also possible. In one famous case, the logic bomb checked the payroll. If the personnel number of the programmer did not appear in it for two consecutive payroll periods, it went off ...

  • Page - 690

    SEC. 9.8 INSIDER ATTACKS 659
    inserted by a programmer working for a computer manufacturer and then shipped with its computers, the programmer could log into any computer made by his company, no matter who owned it or what was in the password file. The same holds for a programmer working for the OS vendor. The back door simply bypasses the whole authentication process. One ...

  • Page - 691

    triggers the real login program to start and display the prompt of Fig. 9-27(a). The user assumes that she made a typing error and just logs in again. This time, however, it works. But in the meantime, Mal has acquired another (login name, password) pair. By logging in at many computers and starting the login spoofer on all of them, he ...

  • Page - 692

    SEC. 9.9 MALWARE 661
    GREETINGS FROM GENERAL ENCRYPTION! TO PURCHASE A DECRYPTION KEY FOR YOUR HARD DISK, PLEASE SEND $100 IN SMALL, UNMARKED BILLS TO BOX 2154, PANAMA CITY, PANAMA. THANK YOU. WE APPRECIATE YOUR BUSINESS.
    Another common application of malware has it install a keylogger on the infected machine. This program simply records all keystrokes typed in and periodically ...

  • Page - 693

    at a competitor’s factory and with no system administrator currently logged in. If the coast was clear, it would interfere with the production process, reducing product quality, thus causing trouble for the competitor. In all other cases it would do nothing, making it hard to detect. Another example of targeted malware is a program that could be written ...

  • Page - 694

    9.9.1 Trojan Horses
    Writing malware is one thing. You can do it in your bedroom. Getting millions of people to install it on their computers is quite something else. How would our malware writer, Mal, go about this? A very common practice is to write some genuinely useful program and embed the malware inside of it. Games, music players, ...

  • Page - 695

    Most common programs are in /bin or /usr/bin, so putting a Trojan horse in /usr/bin/X11/ls does not work for a common program because the real one will be found first. However, suppose the cracker inserts la into /usr/bin/X11. If a user mistypes la instead of ls (the directory listing program), now the Trojan horse will run, do its dirty work, and then ...

  • Page - 696

    biological viruses reproduce. The virus can also do other things in addition to reproducing itself. Worms are like viruses but are self replicating. That difference will not concern us for the moment, so we will use the term “virus” to cover both. We will look at worms in Sec. 9.9.3.
    How Viruses Work
    Let us now see what kinds of viruses there ...

  • Page - 697

    A somewhat related attack uses the Windows desktop, which contains shortcuts (symbolic links) to programs. A virus can change the target of a shortcut to make it point to the virus. When the user double clicks on an icon, the virus is executed. When it is done, the virus just runs the original target program.
    Executable Program Viruses
    One step up ...

  • Page - 698

    the entire file system starting at the root directory by changing to the root directory and calling search with the root directory as parameter. The recursive procedure search processes a directory by opening it, then reading the entries one at a time using readdir until a NULL is returned, indicating that there are no more entries. If the entry is ...

  • Page - 699

    virus some more, but it does not do what it is supposed to do, and the user will notice this instantly. Consequently, many viruses attach themselves to the program and do their dirty work, but allow the program to function normally afterward. Such viruses are called parasitic viruses. Parasitic viruses can attach themselves to the front, the back, or ...

  • Page - 700

    Memory-Resident Viruses
    So far we have assumed that when an infected program is executed, the virus runs, passes control to the real program, and then exits. In contrast, a memory-resident virus stays in memory (RAM) all the time, either hiding at the very top of memory or perhaps down in the grass among the interrupt vectors, the last few hundred ...

  • Page - 701

    When the computer is booted, the virus copies itself to RAM, either at the top or down among the unused interrupt vectors. At this point the machine is in kernel mode, with the MMU off, no operating system, and no antivirus program running. Party time for viruses. When it is ready, it boots the operating system, usually staying memory resident so it ...

  • Page - 702

    Device Driver Viruses
    Getting into memory like this is a little like spelunking (exploring caves): you have to go through contortions and keep worrying about something falling down and landing on your head. It would be much simpler if the operating system would just kindly load the virus officially. With a little bit of work, that goal can be achieved ...

  • Page - 703

    Source Code Viruses
    Parasitic and boot sector viruses are highly platform specific; document viruses are somewhat less so (Word runs on Windows and Macs, but not on UNIX). The most portable viruses of all are source code viruses. Imagine the virus of Fig. 9-28, but with the modification that instead of looking for binary executable files, it looks for C ...

  • Page - 704

    to infect the boot sector of the hard disk. Once the boot sector is infected, it is easy to start a kernel-mode memory-resident virus on subsequent boots. Nowadays, other options are also available to Virgil. The virus can be written to check if the infected machine is on a (wireless) LAN, something that is very likely. The virus can then start ...

  • Page - 705

    just as devastating. Another category of virus writers is the military, which sees viruses as a weapon of war potentially able to disable an enemy’s computers. Another issue related to spreading viruses is avoiding detection. Jails have notoriously bad computing facilities, so Virgil would prefer avoiding them. Posting a virus from his home machine is ...

  • Page - 706

    the Internet to their knees within a few hours of its release. Morris’ motivation is unknown, but it is possible that he intended the whole idea as a high-tech practical joke, one which, due to a programming error, got completely out of hand. Technically, the worm consisted of two programs, the bootstrap and the worm proper. The bootstrap was 99 lines ...

  • Page - 707

    propagating even if the system administrator there started up his own version of the worm to fool the real worm. The use of one in seven created far too many worms, and was the reason all the infected machines ground to a halt: they were infested with worms. If Morris had left this out and just exited whenever another worm was sighted (or made it ...

  • Page - 708

    background. Neither of these is considered spyware. If Potter Stewart were alive, he would probably say: “I can’t define spyware, but I know it when I see it.”† Others have tried harder to define it (spyware, not pornography). Barwinski et al. (2006) have said it has four characteristics. First, it hides, so the victim cannot find it easily. ...

  • Page - 709

    which causes the browser to download and execute the software. At this point, the machine is infected and the spyware is free to do anything it wants to. The second common route is the infected toolbar. Both Internet Explorer and Firefox support third-party toolbars. Some spyware writers create a nice toolbar that has some useful features and then widely ...

  • Page - 710

    but not to anyone subsequent to him outside the legal profession. Once the user has accepted the license, he may lose his right to sue the spyware vendor because he has just agreed to let the spyware run amok, although sometimes local laws override such licenses. (If the license says “Licensee hereby irrevocably grants to licensor the right to kill ...

  • Page - 711

    Spyware should not be confused with adware, in which legitimate (but small) software vendors offer two versions of their product: a free one with ads and a paid one without ads. These companies are very clear about the existence of the two versions and always offer users the option to upgrade to the paid version to get rid of the ads.
    9.9.5 Rootkits ...

  • Page - 712

    4. Library rootkits. Another place a rootkit can hide is in the system library, for example, in libc in Linux. This location gives the malware the opportunity to inspect the arguments and return values of system calls, modifying them as need be to keep itself hidden.
    5. Application rootkits. Another place to hide a rootkit is inside a large application ...

  • Page - 713

    on the TLB, observe the performance, and compare it to previously measured performance on the bare hardware. Another class of detection methods relates to timing, especially of virtualized I/O devices. Suppose that it takes 100 clock cycles to read out some PCI device register on the real machine and this time is highly reproducible. In a virtual envi- ...

  • Page - 714

    “NO ROOTKITS FOUND!” These require more complicated measures, but fortunately no active rootkits have appeared in the wild yet. There are two schools of thought about what to do after a rootkit has been discovered. One school says the system administrator should behave like a surgeon treating a cancer: cut it out very carefully. The other says ...

  • Page - 715

    studied the extent and discovered that computers on over 500,000 networks worldwide had been infected by the rootkit. When the news broke, Sony’s initial reaction was that it had every right to protect its intellectual property. In an interview on National Public Radio, Thomas Hesse, the president of Sony BMG’s global digital business, said: “Most ...

  • Page - 716

    SEC. 9.10 DEFENSES 685
    Properly secured computer systems are like this house, with multiple layers of security. We will now look at some of the layers. The defenses are not really hierarchical, but we will start roughly with the more general outer ones and work our way to more specific ones.
    9.10.1 Firewalls
    The ability to connect any computer, anywhere, to any other ...

  • Page - 717

    interface (most firewalls have a mini-Web server built in to allow this). In the simplest kind of firewall, the stateless firewall, the header of each packet passing through is inspected and a decision is made to pass or discard the packet based solely on the information in the header and the firewall’s rules. The information in the packet header ...
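
    A stateless firewall's decision can be sketched as a pure function of the packet header and a rule list. The header fields, the rule format (destination address, mask, and port, with port 0 meaning any), the first-match-wins order, and the default of discarding unmatched packets are all assumptions chosen for this example; real firewalls differ in each of these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the header fields a firewall might inspect. */
struct packet_header { uint32_t src_ip; uint32_t dst_ip; uint16_t dst_port; };

/* A rule matches on masked destination address and port (0 = any port). */
struct fw_rule { uint32_t dst_ip; uint32_t dst_mask; uint16_t dst_port; int allow; };

/*
 * Returns 1 to pass the packet and 0 to discard it. The first matching
 * rule decides; a packet that matches no rule is discarded. No state
 * is kept between packets.
 */
int fw_pass(const struct fw_rule *rules, size_t n, const struct packet_header *h)
{
    assert(rules != NULL && h != NULL);
    for (size_t i = 0; i < n; i++) {
        int ip_ok = (h->dst_ip & rules[i].dst_mask) ==
                    (rules[i].dst_ip & rules[i].dst_mask);
        int port_ok = rules[i].dst_port == 0 ||
                      rules[i].dst_port == h->dst_port;
        if (ip_ok && port_ok)
            return rules[i].allow;
    }
    return 0;
}
```

    For instance, a rule list that allows port 80 on one internal host and denies everything else on its subnet passes Web traffic to that host and drops an SSH attempt to the same address.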

  • Page - 718

    SEC. 9.10 DEFENSES 687 In addition to stateless firewalls, there are also stateful firewalls, which keep track of connections and what state they are in. These firewalls are better at defeating certain kinds of attacks, especially those relating to establishing connections. Yet other kinds of firewalls implement an IDS (Intrusion Detection System), in which the firewall inspects not read more..

  • Page - 719

    688 SECURITY CHAP. 9 up false alarms (false positives), that is, warnings about legitimate files that just happen to contain some code vaguely similar to a virus reported in Pakistan 7 years ago. What is the user supposed to do with the message: WARNING! File xyz.exe may contain the lahore-9x virus. Delete? The more viruses in the database and the broader the criteria for read more..

  • Page - 720

    SEC. 9.10 DEFENSES 689 [Figure residue, panels (a)-(e), labels only: Executable program, Header, Virus, Compressed executable program, Compressor, Decompressor, Encryptor, Decryptor, Encrypted, Key, Original size, File is longer] read more..

  • Page - 721

    690 SECURITY CHAP. 9
    MOV A,R1     MOV A,R1     MOV A,R1     MOV A,R1     MOV A,R1
    ADD B,R1     NOP          ADD #0,R1    OR R1,R1     TST R1
    ADD C,R1     ADD B,R1     ADD B,R1     ADD B,R1     ADD C,R1
    SUB #4,R1    NOP          OR R1,R1     MOV R1,R5    MOV R1,R5
    MOV R1,X     ADD C,R1     ADD C,R1     ADD C,R1     ADD B,R1
                 NOP          SHL #0,R1    SHL R1,0     CMP R2,R5
                 SUB #4,R1    SUB #4,R1    SUB #4,R1    SUB #4,R1
                 NOP          JMP .+1      ADD R5,R5    JMP .+1
                 MOV R1,X     MOV R1,X     read more..

  • Page - 722

    SEC. 9.10 DEFENSES 691 built-in device drivers for SATA, USB, SCSI, and other common disks, making the antivirus program less portable and subject to failure on computers with unusual disks. Furthermore, since bypassing the operating system to read the boot sector is possible, but bypassing it to read all the executable files is not, there is also some danger that the virus can read more..

  • Page - 723

    692 SECURITY CHAP. 9 Viruses do not have to passively lie around waiting for an antivirus program to kill them, like cattle being led off to slaughter. They can fight back. A particularly exciting battle can occur if a memory-resident virus and a memory-resident antivirus meet up on the same computer. Years ago there was a game called Core Wars in which two programmers read more..

  • Page - 724

    SEC. 9.10 DEFENSES 693 Finally, sixth, resist the temptation to download and run glitzy new free software from an unknown source. Maybe there is a reason it is free: the maker wants your computer to join his zombie army. If you have virtual machine software, running unknown software inside a virtual machine is safe, though. The industry should also take the virus threat read more..

  • Page - 725

    694 SECURITY CHAP. 9 guarding the latter. In order to sign a piece of software, the vendor first computes a hash function of the code to get a 160-bit or 256-bit number, depending on whether SHA-1 or SHA-256 is used. It then signs the hash value by encrypting it with its private key (actually, decrypting it using the notation of Fig. 9-15). This signature accompanies the read more..

  • Page - 726

    SEC. 9.10 DEFENSES 695 the signature merely proves where it came from, not what it does. A technique for doing this is called jailing and illustrated in Fig. 9-36. [Figure labels: Kernel, Jailer, Prisoner, Sys] Figure 9-36. The operation of a jail. The newly acquired program is run as a process labeled ‘‘prisoner’’ in the figure. The ‘‘jailer’’ is a trusted (system) process that read more..

  • Page - 727

    696 SECURITY CHAP. 9 In Fig. 9-37(a) we see a small program that opens a file called data and reads it one character at a time until it hits a zero byte, at which time it prints the number of nonzero bytes at the start of the file and exits. In Fig. 9-37(b) we see a graph of the system calls made by this program (where print calls write). int main(int argc read more..

  • Page - 728

    SEC. 9.10 DEFENSES 697 When this kind of static model-based intrusion detection is used, the jailer has to know the model (i.e., the system-call graph). The most straightforward way for it to learn it is to have the compiler generate it and have the author of the program sign it and attach its certificate. In this way, any attempt to modify the executable program in read more..

  • Page - 729

    698 SECURITY CHAP. 9 to draw certain curves and then fill them in, but it can do anything else it wants to as well. Applets, agents, and PostScript files are just three examples of mobile code, but there are many others. Given the long discussion about viruses and worms earlier, it should be clear that allowing foreign code to run on your machine is more than a wee read more..

  • Page - 730

    SEC. 9.10 DEFENSES 699 usually is. Each applet is given two sandboxes, one for the code and one for the data, as illustrated in Fig. 9-38(a) for the case of 16 sandboxes of 16 MB each. Figure labels: (a) address scale 256, 224, 192, 160, 128, 96, 64, 32, 0; Ref. Mon.; Code 1, Data 1, Code 2, Data 2; Reference monitor for checking system; Applet 2, Applet 1. (b) MOV R1, S1; SHR #24, S1; CMP S1, S2; TRAPNE; JMP (R1) read more..

  • Page - 731

    700 SECURITY CHAP. 9 example of such a test is shown in Fig. 9-38(b). Remember that all valid addresses have the same upper k bits, so this prefix can be stored in a scratch register, say S2. Such a register cannot be used by the applet itself, which may require rewriting it to avoid this register. The code works as follows: First the target address under inspection read more..
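
The prefix test can be written out in C as a rough equivalent of the MOV/SHR/CMP sequence in Fig. 9-38(b). The sketch assumes 32-bit addresses and 16-MB sandboxes, so that the upper k = 8 bits form the prefix; the name in_sandbox is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 16-MB sandboxes in a 32-bit space, so the low 24 bits are
 * the offset inside the sandbox and the upper 8 bits are the prefix. */
#define SANDBOX_SHIFT 24u

/* Returns true if addr has the given sandbox prefix. The shift operates on
 * an unsigned value, so the vacated upper bits are filled with zeros. */
bool in_sandbox(uint32_t addr, uint32_t prefix)
{
    assert(prefix <= 0xFFu);            /* the prefix must fit in 8 bits */
    return (addr >> SANDBOX_SHIFT) == prefix;
}
```

Where the hardware code traps on a mismatch (TRAPNE), this version simply returns false and leaves the reaction to the caller.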

  • Page - 732

    SEC. 9.10 DEFENSES 701 [Figure labels: Untrusted applet, Trusted applet, Web browser, Sandbox, Interpreter, Virtual address space, 0xFFFFFFFF, 0] Figure 9-39. Applets can be interpreted by a Web browser. also caught and interpreted. How these calls are handled is a matter of the security policy. For example, if an applet is trusted (e.g., it came from the local disk), its system calls could be carried out read more..

  • Page - 733

    702 SECURITY CHAP. 9 Java, constructions that mix types like this are forbidden by the grammar. In addition, Java has no pointer variables, casts, or user-controlled storage allocation (such as malloc and free), and all array references are checked at run time. Java programs are compiled to an intermediate binary code called JVM (Java Virtual Machine) byte code. JVM has read more..

  • Page - 734

    SEC. 9.10 DEFENSES 703 Each rule may list a URL, a signer, an object, and an action that the applet may perform on the object if the applet’s URL and signer match the rule. Conceptually, the information provided is shown in the table of Fig. 9-40, although the actual formatting is different and is related to the Java class hierarchy. URL Signer Object Action read more..

  • Page - 735

    704 SECURITY CHAP. 9 in security, both in academia and in industry, is not likely to waver in the next few years either. One important topic is the protection of binary programs. Control Flow Integrity (CFI) is a fairly old technique to stop all control flow diversions and, hence, all ROP exploits. Unfortunately, the overhead is very high. Since ASLR, DEP, and read more..

  • Page - 736

    SEC. 9.12 SUMMARY 705 not tampered with, which rapidly leads to the requirement that operating systems must provide good security. In general, the security of a system is inversely proportional to the size of the trusted computing base. A fundamental component of security for operating systems concerns access control to resources. Access rights to information can be modeled as a read more..

  • Page - 737

    706 SECURITY CHAP. 9 PROBLEMS 1. Confidentiality, integrity, and availability are three components of security. Describe an application that requires integrity and availability but not confidentiality, an application that requires confidentiality and integrity but not (high) availability, and an application that requires confidentiality, integrity, and availability. 2. One of the techniques to build a read more..

  • Page - 738

    CHAP. 9 PROBLEMS 707 10. Modify the ACL from the previous problem for one file to grant or deny an access that cannot be expressed using the UNIX rwx system. Explain this modification. 11. Suppose there are three security levels, 1, 2, and 3. Objects A and B are at level 1, C and D are at level 2, and E and F are at level 3. Processes 1 and 2 are at level read more..

  • Page - 739

    708 SECURITY CHAP. 9 B[i], A[1] is copied to B[j], A[2] is copied to B[k], etc. After all n bytes are copied to the B array, that array is written to the output file and n more bytes are read into A. This procedure continues until the entire file has been encrypted. Note that here encryption is not being done by replacing characters with other ones, but by read more..

  • Page - 740

    CHAP. 9 PROBLEMS 709 27. After getting your degree, you apply for a job as director of a large university computer center that has just put its ancient mainframe system out to pasture and switched over to a large LAN server running UNIX. You get the job. Fifteen minutes after you start work, your assistant bursts into your office screaming: ‘‘Some students have read more..

  • Page - 741

    710 SECURITY CHAP. 9 38. Can the Trojan-horse attack work in a system protected by capabilities? 39. When a file is removed, its blocks are generally put back on the free list, but they are not erased. Do you think it would be a good idea to have the operating system erase each block before releasing it? Consider both security and performance factors in your answer, read more..

  • Page - 742

    CHAP. 9 PROBLEMS 711 52. Section 9.10.1 describes a set of firewall rules that limit outside access to only three services. Describe another set of rules that you can add to this firewall to further restrict access to these services. 53. On some machines, the SHR instruction used in Fig. 9-38(b) fills the unused bits with zeros; on others the sign bit is extended to the read more..

  • Page - 743

    712 SECURITY CHAP. 9 62. Write a program that emulates overwriting viruses outlined in Sec. 9.9.2 under the heading ‘‘Executable Program Viruses’’. Choose an existing executable file that you know can be overwritten without any harm. For the virus binary, choose any harmless executable binary. read more..

  • Page - 744

    10 CASE STUDY 1: UNIX, LINUX, AND ANDROID In the previous chapters, we took a close look at many operating system principles, abstractions, algorithms, and techniques in general. Now it is time to look at some concrete systems to see how these principles are applied in the real world. We will begin with Linux, a popular variant of UNIX, which runs on a wide variety of read more..

  • Page - 745

    714 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 and data structures are similar, but there are some differences. To make the examples concrete, it is best to choose one of them and describe it consistently. Since most readers are more likely to have encountered Linux than any of the others, we will use it as our running example, but again be aware that except read more..

  • Page - 746

    SEC. 10.1 HISTORY OF UNIX AND LINUX 715 discarded PDP-7 minicomputer. Despite the tiny size of the PDP-7, Thompson’s system actually worked and could support Thompson’s development effort. Consequently, one of the other researchers at Bell Labs, Brian Kernighan, somewhat jokingly called it UNICS (UNiplexed Information and Computing Service). Despite puns about ‘‘EUNUCHS’’ read more..

  • Page - 747

    716 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 speakers getting up in front of the room to tell about some obscure kernel bug they had found and fixed. An Australian professor, John Lions, wrote a commentary on the UNIX source code of the type normally reserved for the works of Chaucer or Shakespeare (reprinted as Lions, 1996). The book described Version 6, so named read more..

  • Page - 748

    SEC. 10.1 HISTORY OF UNIX AND LINUX 717 located on the fifth floor at Bell Labs. The Interdata was on the first floor. Generating a new version meant compiling it on the fifth floor and then physically carrying a magnetic tape down to the first floor to see if it worked. After several months of tape carrying, an unknown person said: ‘‘You know, we’re the read more..

  • Page - 749

    718 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 Berkeley also added a substantial number of utility programs to UNIX, including a new editor (vi), a new shell (csh), Pascal and Lisp compilers, and many more. All these improvements caused Sun Microsystems, DEC, and other computer vendors to base their versions of UNIX on Berkeley UNIX, rather than on AT&T’s read more..

  • Page - 750

    SEC. 10.1 HISTORY OF UNIX AND LINUX 719 BSD, namely Version 7. The 1003.1 document is written in such a way that both operating system implementers and software writers can understand it, another novelty in the standards world, although work is already underway to remedy this. Although the 1003.1 standard addresses only the system calls, related docu- ments standardize threads, the read more..

  • Page - 751

    720 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 quickly became a collective undertaking by large numbers of users over the Internet. It was a prototype of other collaborative efforts that came later. In 1997, Version 2.0 of MINIX was released and the base system, now including networking, had grown to 62,200 lines of code. Around 2004, the direction of MINIX read more..

  • Page - 752

    SEC. 10.1 HISTORY OF UNIX AND LINUX 721 Linux rapidly grew in size and evolved into a full, production UNIX clone, as virtual memory, a more sophisticated file system, and many other features were added. Although it originally ran only on the 386 (and even had embedded 386 assembly code in the middle of C procedures), it was quickly ported to other platforms and now read more..

  • Page - 753

    722 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 One unusual feature of Linux is its business model: it is free software. It can be downloaded from various sites on the Internet, for example: www.kernel.org. Linux comes with a license devised by Richard Stallman, founder of the Free Software Foundation. Despite the fact that Linux is free, this license, the GPL (GNU read more..

  • Page - 754

    SEC. 10.2 OVERVIEW OF LINUX 723 10.2 OVERVIEW OF LINUX In this section we will provide a general introduction to Linux and how it is used, for the benefit of readers not already familiar with it. Nearly all of this material applies to just about all UNIX variants with only small deviations. Although Linux has several graphical interfaces, the focus here is on how Linux read more..

  • Page - 755

    724 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 is a complete waste of valuable hacking time. To extract all the lines containing the string ‘‘ard’’ from the file f, the Linux programmer merely types grep ard f The opposite approach is to have the programmer first select the grep program (with no arguments), and then have grep announce itself by saying: read more..

  • Page - 756

    SEC. 10.2 OVERVIEW OF LINUX 725 proper place, then executes the trap instruction. Thus to execute the read system call, a C program can call the read library procedure. As an aside, it is the library interface, and not the system call interface, that is specified by POSIX. In other words, POSIX tells which library procedures a conformant system must supply, what their parameters read more..

  • Page - 757

    726 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 10.2.3 The Shell Although Linux systems have a graphical user interface, most programmers and sophisticated users still prefer a command-line interface, called the shell. Often they start one or more shell windows from the graphical user interface and just work in them. The shell command-line interface is much faster to use, read more..

  • Page - 758

    SEC. 10.2 OVERVIEW OF LINUX 727 To make it easy to specify multiple file names, the shell accepts magic characters, sometimes called wild cards. An asterisk, for example, matches all possible strings, so ls *.c tells ls to list all the files whose name ends in .c. If files named x.c, y.c, and z.c all exist, the above command is equivalent to typing ls x.c y.c z.c read more..
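
To experiment with wild-card matching outside a shell, the POSIX library function fnmatch applies the same style of pattern to a single name. The wrapper below is a minimal sketch; the name matches_wildcard is hypothetical, and nothing here claims that any particular shell is implemented this way.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if name matches the shell-style pattern (e.g., "*.c").
 * fnmatch returns 0 on a match and FNM_NOMATCH otherwise. */
bool matches_wildcard(const char *pattern, const char *name)
{
    assert(pattern != NULL && name != NULL);
    return fnmatch(pattern, name, 0) == 0;
}
```

Under this sketch, matches_wildcard("*.c", "x.c") is true while matches_wildcard("*.c", "x.h") is false, mirroring which names ls *.c would receive.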

  • Page - 759

    728 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 temp and print them on standard output, which defaults to the terminal. Finally, the temporary file is removed. It is not recycled. It is gone with the wind, forever. It frequently occurs that the first program in a command line produces output that is used as input to the next program. In the above example, we used read more..

  • Page - 760

    SEC. 10.2 OVERVIEW OF LINUX 729 10.2.4 Linux Utility Programs The command-line (shell) user interface to Linux consists of a large number of standard utility programs. Roughly speaking, these programs can be divided into six categories, as follows: 1. File and directory manipulation commands. 2. Filters. 3. Program development tools, such as editors and compilers. 4. Text processing. 5. read more..

  • Page - 761

    730 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 columns of text to be cut and pasted into files; od, which converts its (usually binary) input to ASCII text, in octal, decimal, or hexadecimal; tr, which does character translation (e.g., lowercase to uppercase), and pr, which formats output for the printer, including options to include running heads, page numbers, and so read more..

  • Page - 762

    SEC. 10.2 OVERVIEW OF LINUX 731 10.2.5 Kernel Structure In Fig. 10-1 we saw the overall structure of a Linux system. Now let us zoom in and look more closely at the kernel as a whole before examining the various parts, such as process scheduling and the file system. Figure labels: System calls, Interrupts, Dispatcher, Virtual file system, Terminals, Sockets, File systems, Network protocols, Network device read more..

  • Page - 763

    732 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 memory or on disk, is the same as performing a read operation to retrieve a character from a terminal input. At the lowest level, all I/O operations pass through some device driver. All Linux drivers are classified as either character-device drivers or block-device drivers, the main difference being that seeks and read more..

  • Page - 764

    SEC. 10.2 OVERVIEW OF LINUX 733 While the three components are represented separately in the figure, they are highly interdependent. File systems typically access files through the block devices. However, in order to hide the large latencies of disk accesses, files are copied into the page cache in main memory. Some files may even be dynamically created and may have only read more..

  • Page - 765

    734 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 Linux is a multiprogramming system, so multiple, independent processes may be running at the same time. Furthermore, each user may have several active processes at once, so on a large system, there may be hundreds or even thousands of processes running. In fact, on most single-user workstations, even when the user read more..

  • Page - 766

    SEC. 10.3 PROCESSES IN LINUX 735
        pid = fork();              /* if the fork succeeds, pid > 0 in the parent */
        if (pid < 0) {
            handle_error();        /* fork failed (e.g., memory or some table is full) */
        } else if (pid > 0) {
            /* parent code goes here. */
        } else {
            /* child code goes here. */
        }
    Figure 10-4. Process creation in Linux.
    just finished. This can be important read more..
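
A compilable variant of the Fig. 10-4 skeleton, given here as a sketch only: the child exits immediately with a chosen status and the parent collects it with waitpid. The function name run_child and the convention of returning -1 on any failure are assumptions of this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with child_status. Returns the exit status the
 * parent observes, or -1 if fork or waitpid fails. */
int run_child(int child_status)
{
    assert(child_status >= 0 && child_status <= 255);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* fork failed */
    } else if (pid > 0) {
        int statloc;                    /* parent: wait for this child */
        if (waitpid(pid, &statloc, 0) < 0)
            return -1;
        return WIFEXITED(statloc) ? WEXITSTATUS(statloc) : -1;
    } else {
        _exit(child_status);            /* child: terminate at once */
    }
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers inherited from the parent.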

  • Page - 767

    736 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10
    Signal    Cause
    SIGABRT   Sent to abort a process and force a core dump
    SIGALRM   The alarm clock has gone off
    SIGFPE    A floating-point error has occurred (e.g., division by 0)
    SIGHUP    The phone line the process was using has been hung up
    SIGINT    The user has hit the DEL key to interrupt the process
    SIGQUIT   The user has read more..

  • Page - 768

    SEC. 10.3 PROCESSES IN LINUX 737
    System call                          Description
    pid = fork()                         Create a child process identical to the parent
    pid = waitpid(pid, &statloc, opts)   Wait for a child to terminate
    s = execve(name, argv, envp)         Replace a process’ core image
    exit(status)                         Terminate process execution and return status
    s = sigaction(sig, &act, &oldact)    Define action to take on signals
    s = read more..

  • Page - 769

    738 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10
        while (TRUE) {                          /* repeat forever */
            type_prompt();                      /* display prompt on the screen */
            read_command(command, params);      /* read input line from keyboard */
            pid = fork();                       /* fork off a child process */
            if (pid < 0) {
                printf("Unable to fork\n");     /* error condition */
                continue;                       /* repeat the loop */
            }
            if (pid read more..

  • Page - 770

    SEC. 10.3 PROCESSES IN LINUX 739 Several system calls relate to signals, which are used in a variety of ways. For example, if a user accidentally tells a text editor to display the entire contents of a very long file, and then realizes the error, some way is needed to interrupt the editor. The usual choice is for the user to hit some special key (e.g., DEL or read more..

  • Page - 771

    740 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 10.3.3 Implementation of Processes and Threads in Linux A process in Linux is like an iceberg: you only see the part above the water, but there is also an important part underneath. Every process has a user part that runs the user program. However, when one of its threads makes a system call, it traps to kernel mode read more..

  • Page - 772

    SEC. 10.3 PROCESSES IN LINUX 741
    1. Scheduling parameters. Process priority, amount of CPU time consumed recently, amount of time spent sleeping recently. Together, these are used to determine which process to run next.
    2. Memory image. Pointers to the text, data, and stack segments, or page tables. If the text segment is shared, the text pointer points to the shared text read more..

  • Page - 773

    742 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 with the address of the process descriptor. By storing the process descriptor’s address at a fixed location, Linux needs only a few efficient operations to locate the task structure for a running process. The majority of the process-descriptor contents are filled out based on the parent’s descriptor values. Linux then read more..

  • Page - 774

    SEC. 10.3 PROCESSES IN LINUX 743 Figure residue (labels only). Processes: sh, sh, ls; PID = 501, PID = 748, PID = 748; New process, Same process; 1. Fork call, 3. exec call, 4. sh overlaid with ls. Fork code: Allocate child’s task structure; Fill child’s task structure from parent; Allocate child’s stack and user area; Fill child’s user area from parent; Allocate PID for child; Set up child to share parent’s text read more..

  • Page - 775

    744 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 File I/O is another problem area. Suppose that one thread is blocked reading from a file and another thread closes the file or does an lseek to change the current file pointer. What happens next? Who knows? Signal handling is another thorny issue. Should signals be directed at a specific thread or just at the process? A read more..

  • Page - 776

    SEC. 10.3 PROCESSES IN LINUX 745 structure or shares it with the calling thread. Fig. 10-9 shows some of the items that can be shared or copied according to bits in sharing flags.
    Flag          Meaning when set                       Meaning when cleared
    CLONE_VM      Create a new thread                    Create a new process
    CLONE_FS      Share umask, root, and working dirs    Do not share them
    CLONE_FILES   Share the file read more..

  • Page - 777

    746 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 is easy to make a new task structure for each cloned thread and have it point either to the old thread’s scheduling, memory, and other data structures or to copies of them. The fact that such fine-grained sharing is possible does not mean that it is useful, however, especially since traditional UNIX versions do not read more..

  • Page - 778

    SEC. 10.3 PROCESSES IN LINUX 747 In Linux, time is measured as the number of clock ticks. In older Linux versions, the clock ran at 1000 Hz and each tick was 1 ms, called a jiffy. In newer versions, the tick frequency can be configured to 500, 250 or even 1 Hz. In order to avoid wasting CPU cycles for servicing the timer interrupt, the kernel can even be configured read more..

  • Page - 779

    748 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 Figure residue (labels only): Flags, CPU, Static_prio; Active and Expired; Array[0] and Array[1], each spanning Priority 0 to Priority 139. (a) Per CPU runqueue in the Linux O(1) scheduler. (b) Per CPU red-black tree in the CFS read more..

  • Page - 780

    SEC. 10.3 PROCESSES IN LINUX 749 to dynamically map the task’s bonus to values from −5 to +5. The scheduler recalculates the new priority level as a thread is moved from the active to the expired list. The O(1) scheduling algorithm refers to the scheduler made popular in the early versions of the 2.6 kernel, and was first introduced in the unstable 2.5 kernel. Prior read more..

  • Page - 781

    750 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 of tasks in the system. Given the levels of load in current systems, this continues to be acceptable, but as the compute capacity of the nodes, and the number of tasks they can run, increase, particularly in the server space, it is possible that new scheduling algorithms will be proposed in the future. Besides the basic read more..

  • Page - 782

    SEC. 10.3 PROCESSES IN LINUX 751 multicore systems. Threads that are allowed to or need to block use constructs like mutexes and semaphores. Linux supports nonblocking calls like mutex_trylock and sem_trywait to determine the status of the synchronization variable without blocking. Other types of synchronization variables, like futexes, completions, ‘‘read-copy-update’’ (RCU) read more..

  • Page - 783

    752 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 see which ones actually are present. If a probed device responds to the probe, it is added to a table of attached devices. If it fails to respond, it is assumed to be absent and ignored henceforth. Unlike traditional UNIX versions, Linux device drivers do not need to be statically linked and may be loaded dynamically read more..

  • Page - 784

    SEC. 10.3 PROCESSES IN LINUX 753 [Figure labels: Process 0, Process 1, Process 2; Page daemon, init, getty, login, sh, cp; Terminal 0, Terminal 1, Terminal 2; Login:, Password:, % cp f1 f2] Figure 10-11. The sequence of processes used to boot some Linux systems. In the figure, the getty process running for terminal 0 is still waiting for input. On terminal 1, a user has typed a login name, so getty has read more..

  • Page - 785

    754 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 10.4.1 Fundamental Concepts Every Linux process has an address space that logically consists of three segments: text, data, and stack. An example process’ address space is illustrated in Fig. 10-12(a) as process A. The text segment contains the machine instructions that form the program’s executable code. It is produced by the read more..

  • Page - 786

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 755 initial value is 0. In practice, most global variables are not initialized explicitly, and are thus 0. This could be implemented by simply having a section of the executable binary file exactly equal to the number of bytes of data, and initializing all of them, including the ones that have defaulted to 0. However, to save read more..

  • Page - 787

    756 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 When two users are running the same program, such as the editor, it would be possible, but inefficient, to keep two copies of the editor’s program text in memory at once. Instead, Linux systems support shared text segments. In Fig. 10-12(a) and Fig. 10-12(c) we see two processes, A and B, that have the same text read more..

  • Page - 788

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 757 [Figure labels: panels (a), (b), (c); Process A, Process B; Text, Data, BSS, Stack pointer, Mapped file, Unused memory; OS, Physical memory; address marks 0K, 8K, 20K, 24K] Figure 10-13. Two processes can share a mapped file. How malloc is implemented is thus moved outside the scope of the POSIX standard. In some circles, this approach is known as read more..

  • Page - 789

    758 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 the file to be mapped. Only open files can be mapped, so to map a file in, it must first be opened. Finally, offset tells where in the file to begin the mapping. It is not necessary to start the mapping at byte 0; any page boundary will do. The other call, unmap, removes a mapped file. If only a portion of read more..
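
Because a mapping has to begin on a page boundary, a caller holding an arbitrary byte position first rounds it down to the start of its page. A minimal sketch of that step follows; the helper name page_align_down is hypothetical, and the page size is queried at run time with sysconf rather than assumed.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Rounds a file offset down to the nearest page boundary, so that the
 * result is usable as the offset argument of a mapping call. */
off_t page_align_down(off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);   /* page size of this system */
    assert(page > 0 && offset >= 0);
    return offset - (offset % page);
}
```

The caller then adds the discarded remainder (offset minus the aligned value) to the returned mapping address to reach the byte it actually wanted.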

  • Page - 790

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 759 operations, and ZONE_DMA32 marks this region. In addition, if the hardware, like older-generation i386, cannot directly map memory addresses above 896 MB, ZONE_HIGHMEM corresponds to anything above this mark. ZONE_NORMAL is anything in between them. Therefore, on 32-bit x86 platforms, the first 896 MB of the Linux address space are directly read more..

  • Page - 791

    760 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 First of all, Linux maintains an array of page descriptors, of type page, one for each physical page frame in the system, called mem_map. Each page descriptor contains a pointer to the address space that it belongs to, in case the page is not free, a pair of pointers which allow it to form doubly linked lists with other read more..

  • Page - 792

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 761 [Figure labels: Virtual address with fields Global directory, Upper directory, Middle directory, Page, Offset; Page global directory, Page upper directory, Page middle directory, Page table, Page] Figure 10-16. Linux uses four-level page tables. Physical memory is used for various purposes. The kernel itself is fully hardwired; no part of it is ever paged out. The rest of memory is read more..
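
The field layout named in Fig. 10-16 can be illustrated by splitting a virtual address into its four table indices and the page offset. The excerpt gives no field widths, so this sketch assumes the split commonly used on x86-64 with 4-KB pages: a 48-bit address with four 9-bit indices and a 12-bit offset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Indices into the four table levels plus the offset within the page. */
struct va_parts {
    unsigned pgd;      /* page global directory index */
    unsigned pud;      /* page upper directory index  */
    unsigned pmd;      /* page middle directory index */
    unsigned pte;      /* page table index            */
    unsigned offset;   /* byte offset within the page */
};

/* Assumed layout: bits 47-39, 38-30, 29-21, 20-12, and 11-0. */
struct va_parts split_va(uint64_t va)
{
    assert((va >> 48) == 0);            /* only 48-bit addresses handled */
    struct va_parts p;
    p.offset = (unsigned)(va & 0xFFFu);
    p.pte    = (unsigned)((va >> 12) & 0x1FFu);
    p.pmd    = (unsigned)((va >> 21) & 0x1FFu);
    p.pud    = (unsigned)((va >> 30) & 0x1FFu);
    p.pgd    = (unsigned)((va >> 39) & 0x1FFu);
    return p;
}
```

A page-table walk would use pgd to select an entry in the global directory, follow it to an upper directory indexed by pud, and so on down to the page, where offset selects the byte.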

  • Page - 793

    762 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 The basic idea for managing a chunk of memory is as follows. Initially memory consists of a single contiguous piece, 64 pages in the simple example of Fig. 10-17(a). When a request for memory comes in, it is first rounded up to a power of 2, say eight pages. The full memory chunk is then divided in half, as read more..
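
The first two steps of this scheme, rounding the request up to a power of 2 and halving the chunk until a piece of that size appears, can be sketched as below. This is not the kernel's allocator: it only counts the halvings, the function names are hypothetical, and the chunk size is assumed to be a power of 2 as in the 64-page example.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a request (in pages) up to the next power of 2. */
size_t round_up_pow2(size_t pages)
{
    assert(pages > 0 && pages <= ((size_t)1 << 30));
    size_t p = 1;
    while (p < pages)
        p <<= 1;
    return p;
}

/* Counts how many times a chunk of chunk_pages must be divided in half
 * before a piece of the rounded-up request size is obtained. */
int splits_needed(size_t chunk_pages, size_t request_pages)
{
    size_t want = round_up_pow2(request_pages);
    assert(want <= chunk_pages);
    int splits = 0;
    while (chunk_pages > want) {
        chunk_pages /= 2;               /* divide the current piece in half */
        splits++;
    }
    return splits;
}
```

For the example in the text, a request that rounds up to eight pages against a 64-page chunk needs three halvings: 64 to 32, 32 to 16, and 16 to 8.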

  • Page - 794

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 763 slab is available, it looks through the list of empty slabs. Finally, if necessary, it will allocate a new slab, place the new task structure there, and link this slab with the task-structure object cache. The kmalloc kernel service, which allocates physically contiguous memory regions in the kernel address space, is in fact built on read more..

  • Page - 795

    764 CASE STUDY 1: UNIX, LINUX, AND ANDROID CHAP. 10 The vm_area_struct also records whether the area has backing storage on disk assigned, and if so, where. Text segments use the executable binary as backing storage and memory-mapped files use the disk file as backing storage. Other areas, such as the stack, do not have backing storage assigned until they have to be read more..

  • Page - 796

    SEC. 10.4 MEMORY MANAGEMENT IN LINUX 765 may be needed soon, in the hope it will be there when needed). Text segments and mapped files are paged to their respective files on disk. Everything else is paged to either the paging partition (if present) or one of the fixed-length paging files, called the swap area. Paging files can be added and removed dynamically and each read more..

  • Page - 797

    Each time PFRA executes, it first tries to reclaim easy pages, then proceeds with the harder ones. Many people also grab the low-hanging fruit first. Discardable and unreferenced pages can be reclaimed immediately by moving them onto the zone’s freelist. Next it looks for pages with backing store which have not been ...

  • Page - 798

    [Figure 10-18. Page states considered in the page-frame replacement algorithm. Pages on the inactive list (PG_active = 0) and the active list (PG_active = 1), each with PG_referenced = 0 or 1, change state on Used, Timeout, and Refill transitions.] back to disk very old dirty pages, or (2) are explicitly awakened ...

  • Page - 799

    device is assigned a path name, usually in /dev. For example, a disk might be /dev/hd1, a printer might be /dev/lp, and the network might be /dev/net. These special files can be accessed the same way as any other files. No special commands or system calls are needed. The usual open, read, and write system calls will ...

  • Page - 800

    SEC. 10.5 INPUT/OUTPUT IN LINUX 769 handles tab expansion, enabling and disabling of character echoing, conversion between carriage return and line feed, and similar items. The system call is not permitted on regular files or block special files. 10.5.2 Networking Another example of I/O is networking, as pioneered by Berkeley UNIX and taken over by Linux more or less verbatim. ...

  • Page - 801

    The second type is rather similar to the first one, except that it preserves packet boundaries. If the sender makes five separate calls to write, each for 512 bytes, and the receiver asks for 2560 bytes, with a type 1 socket all 2560 bytes will be returned at once. With a type 2 socket, only 512 bytes ...
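The difference in boundary handling can be observed with a small experiment. The sketch below is only an approximation of the scenario: it assumes a POSIX system, uses a local AF_UNIX stream socket pair for the byte-stream type, and uses a local datagram socket pair as a stand-in for the boundary-preserving type. The function name is hypothetical.

```python
import socket

CHUNK = 512
WRITES = 5


def first_read_sizes():
    """Send five 512-byte messages on each socket kind and report the reads."""
    payload = b"x" * CHUNK

    # Byte stream: message boundaries are not preserved, so the receiver
    # simply drains bytes until it has all 2560 of them.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        for _ in range(WRITES):
            a.sendall(payload)
        stream_total = 0
        while stream_total < CHUNK * WRITES:
            stream_total += len(b.recv(CHUNK * WRITES - stream_total))

    # Datagram pair: each recv returns at most one message, even though
    # the receiver asks for 2560 bytes.
    c, d = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with c, d:
        for _ in range(WRITES):
            c.send(payload)
        first_packet = len(d.recv(CHUNK * WRITES))

    return stream_total, first_packet
```

On the boundary-preserving socket the first read yields a single 512-byte message, so the receiver must read five times to collect everything that was sent.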

  • Page - 802

    functions into separate function calls primarily for terminal devices. In Linux and modern UNIX systems, whether each one is a separate system call or they share a single system call or something else is implementation dependent. The first four calls listed in Fig. 10-20 are used to set and get the terminal speed. Different calls are provided ...

  • Page - 803

    a parameter. Adding a new device type to Linux means adding a new entry to one of these tables and supplying the corresponding procedures to handle the various operations on the device. Some of the operations which may be associated with different character devices are shown in Fig. 10-21. Each row refers to a single ...

  • Page - 804

    the cache, the block is taken from there and a disk access is avoided, thereby resulting in great improvements in system performance. [Figure: the Virtual File System above regular files, char special files, and network sockets; below it the cache, file systems, I/O schedulers, optional line discipline, and protocol drivers; at the bottom the block, char, and network device drivers ...]

  • Page - 805

    and 5 sec for writes. If a system-defined deadline for the oldest write operation is about to expire, that write request will be serviced before any of the requests on the main doubly linked list. In addition to regular disk files, there are also block special files, also called raw block files. These files allow programs ...

  • Page - 806

    high-end workstations, with their small and unchanging sets of I/O devices, this scheme worked well. Basically, a computer center built a kernel containing drivers for the I/O devices and that was it. If next year the center bought a new disk, it relinked the kernel. No big deal. With the arrival of Linux on the PC platform, suddenly all ...

  • Page - 807

    disks of its era), there was interest in better file systems almost from the beginning of the Linux development, which began about 5 years after MINIX 1 was released. The first improvement was the ext file system, which allowed file names of 255 characters and files of 2 GB, but it was slower than the MINIX 1 file ...

  • Page - 808

    SEC. 10.6 THE LINUX FILE SYSTEM 777 example of an absolute path is /usr/ast/books/mos4/chap-10. This tells the system to look in the root directory for a directory called usr, then look there for another directory, ast. In turn, this directory contains a directory books, which contains the directory mos4, which contains the file chap-10. Absolute path names are often long and read more..

  • Page - 809

    In the example just discussed, we suggested that before linking, the only way for Fred to refer to Lisa’s file x was by using its absolute path. Actually, this is not really true. When a directory is created, two entries, . and .., are automatically made in it. The former refers to the working directory itself. The ...

  • Page - 810

    [Figure 10-25. (a) Separate file systems. (b) After mounting.] critical regions. However, if the processes belong to independent users who do not even know each other, this kind of coordination is generally inconvenient. Consider, for example, a database consisting of many ...

  • Page - 811

    lock on bytes 6 through 9, as shown in Fig. 10-26(b). Finally, C locks bytes 2 through 11. As long as all these locks are shared, they can coexist. [Figure 10-26: panels (a), (b), and (c) show the shared locks of processes A, B, and C on bytes 0 through 15 of a file ...]

  • Page - 812

    nonnegative integer called a file descriptor, fd in the example above. If a creat is done on an existing file, that file is truncated to length 0 and its contents are discarded. Files can also be created using the open call with appropriate arguments. Now let us continue looking at the main file-system calls, which are listed in Fig. ...

  • Page - 813

    pointer that indicates the current position in the file. When reading (or writing) sequentially, it normally points to the next byte to be read (written). If the pointer is at, say, 4096, before 1024 bytes are read, it will automatically be moved to 5120 after a successful read system call. The lseek call changes the ...
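The movement of the file pointer can be checked directly through Python's thin wrappers around the read and lseek system calls. This is a sketch only: the temporary file, its 8-KB size, and the function name are assumptions made for the illustration.

```python
import os
import tempfile


def read_and_seek():
    """Return the file offsets observed around a read and an lseek."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 8192)                # make the file 8 KB long
        os.lseek(fd, 4096, os.SEEK_SET)           # put the pointer at 4096
        before = os.lseek(fd, 0, os.SEEK_CUR)     # query the current position
        data = os.read(fd, 1024)                  # sequential read of 1024 bytes
        after = os.lseek(fd, 0, os.SEEK_CUR)      # pointer has advanced
        moved = os.lseek(fd, -1024, os.SEEK_END)  # reposition relative to the end
        return before, len(data), after, moved
    finally:
        os.close(fd)
        os.remove(path)
```

The pointer starts at 4096, the read returns 1024 bytes, and the pointer is then at 5120, matching the numbers in the text. The final lseek shows that the position can also be set relative to the end of the file.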

  • Page - 814

    being aware that these have been redirected. If they have not been redirected, sort will automatically read from the keyboard and write to the screen (the default devices). Similarly, when head reads from file descriptor 0, it is reading the data sort put into the pipe buffer without even knowing that a pipe is in use. This is ...

  • Page - 815

    10.6.3 Implementation of the Linux File System In this section we will first look at the abstractions supported by the Virtual File System layer. The VFS hides from higher-level processes and applications the differences among many types of file systems supported by Linux, whether they are residing on local devices or are ...

  • Page - 816

    are cached in what is called the dentry cache. For instance, the dentry cache would contain entries for /, /usr, /usr/ast, and the like. If multiple processes access the same file through the same hard link (i.e., same path), their file object will point to the same entry in this cache. Finally, the file data structure is an in-memory ...

  • Page - 817

    Two bitmaps are used to keep track of the free blocks and free i-nodes, respectively, a choice inherited from the MINIX 1 file system (and in contrast to most UNIX file systems, which use a free list). Each map is one block long. With a 1-KB block, this design limits a block group to 8192 blocks and ...
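The 8192 limit follows from simple arithmetic: a one-block bitmap has one bit per tracked item, and a 1-KB block holds 1024 × 8 bits. The sketch below works this out and shows a linear scan for a free entry. The helper names are hypothetical, and the bit ordering (least significant bit first within each byte) is an assumption made for the illustration, not a statement about the on-disk layout.

```python
BITS_PER_BYTE = 8


def max_tracked_items(block_size_bytes: int) -> int:
    """Number of blocks (or i-nodes) a one-block bitmap can describe."""
    return block_size_bytes * BITS_PER_BYTE


def find_first_free(bitmap: bytearray) -> int:
    """Index of the first 0 bit in the bitmap, or -1 if every bit is set.

    Bit i is assumed to live in byte i // 8 at bit position i % 8.
    """
    for i in range(len(bitmap) * BITS_PER_BYTE):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            return i
    return -1
```

With a 1-KB block, `max_tracked_items(1024)` gives 8192, the block-group limit quoted above.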

  • Page - 818

    [Figure 10-32. (a) A Linux directory with three files. (b) The same directory after the file voluminous has been removed. Each entry holds an i-node number, entry size, type, and file name length, followed by the file name.] name is padded by an unknown length. That is the meaning of the ...

  • Page - 819

    number for the /usr/ast directory can be taken from it. Armed with the i-node number of the /usr/ast directory, this i-node can be read and the directory blocks located. Finally, “file” is looked up and its i-node number found. Thus, the use of a relative path name is not only more convenient for the ...

  • Page - 820

    The problem is as follows. Associated with every file descriptor is a file position that tells at which byte the next read (or write) will start. Where should it go? One possibility is to put it in the i-node table. However, this approach fails if two or more unrelated processes happen to open the same file at the same time because ...

  • Page - 821

    [Figure: the parent’s, the child’s, and an unrelated process’ file-descriptor tables point to open file descriptions (file position, R/W, pointer to i-node), which in turn point to the i-node (mode, link count, uid, gid, file size, times, addresses of first 12 disk blocks, and single, double, and triple indirect pointers to disk blocks) ...]

  • Page - 822

    ext3, it changes the block addressing scheme used by its predecessors, thereby supporting both larger files and larger overall file-system sizes. We will describe some of its features next. The basic idea behind a journaling file system is to maintain a journal, which describes all file-system operations in sequential order. By sequentially ...

  • Page - 823

    scheme also reduces fragmentation for large files. As a result, ext4 can provide faster file system operations and support larger files and file system sizes. For instance, for a block size of 1 KB, ext4 increases the maximum file size from 16 GB to 16 TB, and the maximum file system size to 1 EB (Exabyte). The /proc ...

  • Page - 824

    wide area network if the server is far from the client. For simplicity we will speak of clients and servers as though they were on distinct machines, but in fact, NFS allows every machine to be both a client and a server at the same time. Each NFS server exports one or more of its directories for access by remote clients. When ...

  • Page - 825

    NFS Protocols Since one of the goals of NFS is to support a heterogeneous system, with clients and servers possibly running different operating systems on different hardware, it is essential that the interface between the clients and servers be well defined. Only then is anyone able to write a new client ...

  • Page - 826

    also access file attributes, such as file mode, size, and time of last modification. Most Linux system calls are supported by NFS, with the perhaps surprising exceptions of open and close. The omission of open and close is not an accident. It is fully intentional. It is not necessary to open a file before reading it, nor to close it ...

  • Page - 827

    [Figure 10-36. The NFS layer structure. In both the client kernel and the server kernel, a virtual file system layer with v-nodes sits above the local file systems and the NFS client or NFS server code; buffer caches and drivers connect to the local disks, and the NFS client sends messages to the NFS server.] file system ...

  • Page - 828

    When a remote file is opened on the client, at some point during the parsing of the path name, the kernel hits the directory on which the remote file system is mounted. It sees that this directory is remote and in the directory’s v-node finds the pointer to the r-node. It then asks the NFS client code to open the file. The NFS ...

  • Page - 829

    one of them modifies it. When the other one reads the block, it gets the old (stale) value. The cache is not coherent. Given the potential severity of this problem, the NFS implementation does several things to mitigate it. For one, associated with each cache block is a timer. When the timer expires, the entry is ...

  • Page - 830

    SEC. 10.7 SECURITY IN LINUX 799 UID of their owner. By default, the owner of a file is the person who created the file, although there is a way to change ownership. Users can be organized into groups, which are also numbered with 16-bit integers called GIDs (Group IDs). Assigning users to groups is done manually (by the system administrator) and consists of making entries ...

  • Page - 831

    The user with UID 0 is special and is called the superuser (or root). The superuser has the power to read and write all files in the system, no matter who owns them and no matter how they are protected. Processes with UID 0 also have the ability to make a small number of protected system calls denied to ordinary ...

  • Page - 832

    10.7.2 Security System Calls in Linux There are only a small number of system calls relating to security. The most important ones are listed in Fig. 10-38. The most heavily used security system call is chmod. It is used to change the protection mode. For example, s = chmod("/usr/ast/newgame", 0755); sets newgame to rwxr-xr-x so that ...
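The correspondence between the octal mode 0755 and the string rwxr-xr-x can be confirmed with a short sketch that applies the mode to a scratch file rather than to /usr/ast/newgame. It assumes a POSIX system; the temporary file and the function name are assumptions of the illustration.

```python
import os
import stat
import tempfile


def chmod_demo(mode: int = 0o755) -> str:
    """Apply a protection mode to a scratch file and return it in ls style."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, mode)  # wraps the chmod system call
        return stat.filemode(os.stat(path).st_mode)
    finally:
        os.remove(path)
```

Each octal digit encodes one rwx triple: 7 is rwx for the owner, and each 5 is r-x for the group and for others. The leading character in the returned string is the file type, a hyphen for a regular file.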

  • Page - 833

    from being stored in unencrypted form anywhere in the system. If the password is correct, the login program looks in /etc/passwd to see the name of the user’s preferred shell, possibly bash, but possibly some other shell such as csh or ksh. The login program then uses setuid and setgid to give itself the user’s UID ...

  • Page - 834

    SEC. 10.8 ANDROID 803 level libraries are written in C and C++. However, a large amount of the system is written in Java and, but for some small exceptions, the entire application API is written and published in Java as well. The parts of Android written in Java tend to follow a very object-oriented design as encouraged by that language. 10.8.1 Android and Google Android ...

  • Page - 835

    10.8.2 History of Android Google developed Android in the mid-2000s, after acquiring Android as a startup company early in its development. Nearly all the development of the Android platform that exists today was done under Google’s management. Early Development Android, Inc. was a software company founded to build software to ...

  • Page - 836

    applications as a single process on a host computer. In fact there are still some remnants of this old implementation around today, with things like the Application.onTerminate method still in the SDK (Software Development Kit), which Android programmers use to write applications. In June 2006, two hardware devices were selected as software-development targets ...

  • Page - 837

    Android also contained quite a few novel design ideas that had never been done before, and it was not clear how they would pan out. This all needed to come together as a stable product, and the team spent a few nail-biting months wondering if all of this stuff would actually come together and work as intended. ...

  • Page - 838

    Google code. However, the implementation of Google’s proprietary code was often not yet cleaned up, having dependencies on internal parts of the platform. Often the platform did not even have facilities that Google’s proprietary code needed in order to integrate well with it. A series of projects were soon undertaken to address these issues: 1. In ...

  • Page - 839

    designed to be neutral as much as possible to the higher-level system features built on top of it, from access to cloud services (such as data sync or cloud-to-device messaging APIs), to libraries (such as Google’s mapping library) and rich services like application stores. 4. Provide an application security model in which ...

  • Page - 840

    10.8.4 Android Architecture Android is built on top of the standard Linux kernel, with only a few significant extensions to the kernel itself that will be discussed later. Once in user space, however, its implementation is quite different from a traditional Linux distribution and uses many of the Linux features you already understand in very different ...

  • Page - 841

    is called system_server, which contains all of the core operating system services. Key parts of this are the power manager, package manager, window manager, and activity manager. Other processes will be created from zygote as needed. Some of these are “persistent” processes that are part of the basic operating ...

  • Page - 842

    [Figure 10-40. Publishing and interacting with system services. Application code in the application process uses PackageManager to reach PackageManagerService in the system server, which is published under the name "package" in the service manager; all three interactions use Binder IPC.] executing without an external interrupt such as pressing a power key. While running, secondary pieces of hardware may be turned on or off as needed, but the ...

  • Page - 843

    their new dog and turn on the device to take a picture of her. In this kind of typical mobile usage, any delay from pulling the device out until it is ready for use has a significant negative impact on the user experience. Given these requirements, one solution would be to just not have the CPU go to sleep ...

  • Page - 844

    opportunity there to acquire its own wake lock. This flow may continue across subsystems in user space as well; as long as something is holding a wake lock, we continue performing the desired processing to respond to the event. Once no more wake locks are held, however, the entire system falls back to sleep and all processing stops. Out-Of-Memory ...

  • Page - 845

    Instead of trying to guess which processes should be killed, the Android out-of-memory killer relies very strictly on information provided to it by user space. The traditional Linux out-of-memory killer has a per-process oom_adj parameter that can be used to guide it toward the best process to kill by modifying the ...

  • Page - 846

    The use of Linux processes and security greatly simplifies the Dalvik environment, since it is no longer responsible for these critical aspects of system stability and robustness. Not incidentally, it also allows applications to freely use native code in their implementation, which is especially important for games which are usually built with C++-based engines. ...

  • Page - 847

    [Figure 10-41. Creating a new Dalvik process from zygote. The app process shares zygote’s Dalvik, preloaded classes, and preloaded resources copy-on-write and adds the application classes and resources.] right. Android’s Binder interprocess communication mechanism is a rich general-purpose IPC facility that most of the Android system ...

  • Page - 848

    [Figure 10-42. Binder IPC architecture. From top to bottom: platform / application (method calls); interface definitions (IInterface / aidl); transact() and onTransact() (IBinder / Binder); Binder user space, which exchanges command codes and result codes through ioctl(); Binder kernel module.] is executed in the receiving process; the sender may block while the receiver executes, allowing a result to be returned back from the call. ...

  • Page - 849

    the sender. It determines which process is responsible for the target of the transaction and wakes up a thread in the process to receive it. Once the receiving process is executing, it determines the appropriate target of the transaction and delivers it. [Figure: a transaction (To: Object1, From: Process 1, data) exchanged between Process 1 and Process 2 ...]

  • Page - 850

    Process 2; this is known by the kernel to be associated with Process 2, and further the kernel has assigned Handle 2 for it in Process 1. Process 1 can thus submit a transaction to the kernel targeted to its Handle 2, and from that the kernel can determine this is being sent to Process 2 and specifically Object2a in that process. Process 1 ...
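The handle translation described here can be modeled with a per-process table that maps small integers to an owning process and object. The following is a toy sketch only, not the Binder kernel module's interface: the class, its methods, and the dictionary layout are all hypothetical.

```python
class BinderKernelSketch:
    """Toy model of per-process handle tables; not the real Binder driver."""

    def __init__(self):
        # handles[pid][handle] -> (owner_pid, object_name)
        self.handles = {}

    def grant(self, pid, owner_pid, obj):
        """Give process `pid` a handle for `obj`, which lives in `owner_pid`."""
        table = self.handles.setdefault(pid, {})
        for handle, target in table.items():
            if target == (owner_pid, obj):
                return handle        # the process already has a handle for it
        handle = len(table) + 1      # handles are small per-process integers
        table[handle] = (owner_pid, obj)
        return handle

    def transact(self, sender_pid, handle, data):
        """Route a transaction addressed to a handle to its real target."""
        owner_pid, obj = self.handles[sender_pid][handle]
        return {"to_pid": owner_pid, "to": obj, "from": sender_pid, "data": data}
```

In this model a sender only ever names a handle in its own table. The mapping to a process and an object is held by the kernel-side table, so a process cannot address an object for which it was never given a handle.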

  • Page - 851

    [Figure: numbered steps of transactions between Process 1 and Process 2, in which references to Object1b and Object2a are translated by the kernel to and from per-process handles (Handle 1, Handle 2, Handle 3) ...]

  • Page - 852

    Binder User-Space API Most user-space code does not directly interact with the Binder kernel module. Instead, there is a user-space object-oriented library that provides a simpler API. The first level of these user-space APIs maps fairly directly to the kernel concepts we have covered so far, in the form of three classes: 1. IBinder is an abstract interface ...

  • Page - 853

    [Figure: Process 1 calls transact() with a Parcel on a BinderProxy (Handle 2); the kernel delivers the transaction (To: Binder2a, From: Process 1) to Process 2, where onTransact() receives the Parcel; Binder objects such as Binder1b, Binder2a, and Binder2b are represented in the other process by BinderProxy objects and handles (Handle 1, Handle 2, Handle 3) ...]

  • Page - 854

    An interface description like that in Fig. 10-47 is compiled by AIDL to generate three Java-language classes illustrated in Fig. 10-48: 1. IExample supplies the Java-language interface definition. 2. IExample.Stub is the base class for implementations of this interface. It inherits from Binder, meaning it can be the recipient of IPC calls; it inherits from ...

  • Page - 855

    4. The transaction is decoded back into a Parcel and onTransact called on the appropriate local object, here ExampleImpl (which is a subclass of IExample.Stub). 5. IExample.Stub decodes the Parcel into the appropriate method and arguments to call, here calling print. 6. The concrete implementation of print in ExampleImpl finally ...

  • Page - 856

    package android.os interface IServiceManager { IBinder getService(String name); void addService(String name, IBinder binder); } Figure 10-50. Basic service manager AIDL interface. An Android application by convention is a file with the apk extension, for Android Package. This file is actually a normal zip archive, containing everything about the application. The important ...

  • Page - 857

    <?xml version="1.0" encoding="utf-8"?> <manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.email"> <application> <activity android:name="com.example.email.MailMainActivity"> <intent-filter> <action android:name="android.intent.action.MAIN" /> <category ...

  • Page - 858

    Applications statically declare their entry points in their manifest so they do not need to execute code at install time that registers them with the system. This design makes the system more robust in many ways: installing an application does not require running any application code, the top-level capabilities of the application can always be determined by ...

  • Page - 859

    with the appropriate ActivityRecord. This activity is in a state called resumed since it is now in the foreground of the user interface. [Figure 10-52. Starting an email application’s main activity. The activity manager in the system_server process holds the Email task with an ActivityRecord (MailMainActivity) in the RESUMED state; the email app process runs MailMainActivity.] If the user ...

  • Page - 860

    manager and stores in the system_server process, in the ActivityRecord associated with that activity. The saved state for an activity is generally small, containing for example where you are scrolled in an email message, but not the message itself, which will be stored elsewhere by the application in its persistent storage. Recall that although Android does ...

  • Page - 861

    be started and given the picture to be shared. (Later we will see how the camera application is able to find the email application’s ComposeActivity.) Performing that share option while in the activity state seen in Fig. 10-54 will lead to the new state in Fig. 10-55. There are a number of important things to note: 1. ...

  • Page - 862

    application. Figure 10-56 shows the new state the system will be in. Note that we have brought the email task with its main activity back to the foreground. This makes MailMainActivity the foreground activity, but there is currently no instance of it running in the application’s process. [Figure: activity manager in system_server process, email app process, camera app ...]

  • Page - 863

    2. It can serve as a connection point for other applications or the system to perform rich interaction with the application. This can be used by applications to provide secure APIs for other applications, such as to perform image or audio processing, provide text to speech, etc. The example email manifest shown in Fig. ...

  • Page - 864

    1. The client application tells the activity manager that it would like to bind to the service. 2. If the service is not already created, the activity manager creates it in the service application’s process. 3. The service returns the IBinder for its interface back to the activity manager, which now holds that IBinder in its ServiceRecord. 4. Now that ...

  • Page - 865

    for a list of all receivers interested in the event, which is placed in a BroadcastRecord representing that broadcast. The activity manager will then proceed to step through each entry in the list, having each associated application’s process create and execute the appropriate receiver class. [Figure: activity manager in system_server ...]

  • Page - 866

    content://com.example.email.provider.email/messages means the list of all email messages, while content://com.example.email.provider.email/messages/1 provides access to a single message at key number 1. To interact with a content provider, applications always go through a system API called ContentResolver, where most methods have an initial URI argument indicating the data to ...
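The structure of these URIs can be made concrete with a short sketch that separates the authority, which names the provider, from the path, which names the data within it. This is only an illustration using Python's generic URL parser, not Android's ContentResolver or Uri classes; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def parse_content_uri(uri: str):
    """Split a content: URI into its authority and its path segments."""
    parts = urlsplit(uri)
    if parts.scheme != "content":
        raise ValueError("not a content URI: " + uri)
    segments = [s for s in parts.path.split("/") if s]
    return parts.netloc, segments
```

For the two URIs in the text, the authority is com.example.email.provider.email in both cases, while the path segments are `["messages"]` for the whole list and `["messages", "1"]` for the single message with key 1.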

  • Page - 867

    [Figure 10-61. Interacting with a content provider. Participants: the client app process (ContentResolver, IContentProvider.Proxy), the activity manager in the system_server process (ProviderRecord for EmailProvider), and the email app process (EmailProvider, IContentProvider.Stub). Steps shown: 1. query(); 2. Look up authority; 3. Create; 4. Return IBinder; 5. Return IBinder; 6. query().] 3. The system ...

  • Page - 868

    most important part of such an intent is a pair of strings naming the component: the package name of the target application and class name of the component within that application. Now referring back to the activity of Fig. 10-52 in the application of Fig. 10-51, an explicit intent for this component would be one with package name com.example.email and class name ...

  • Page - 869

    10.8.10 Application Sandboxes Traditionally in operating systems, applications are seen as code executing as the user, on the user’s behalf. This behavior has been inherited from the command line, where you run the ls command and expect that to run as your identity (UID), with the same access rights as you have on the ...

  • Page - 870

    caller. Binder IPC explicitly includes this information in every transaction delivered across processes so a recipient of the IPC can easily ask for the UID of the caller. Android predefines a number of standard UIDs for the lower-level parts of the system, but most applications are dynamically assigned a UID, at first boot or install time, from a ...

  • Page - 871

    access that data, which is what we want since the pictures there may be sensitive data to the user. After the user has taken a picture, she may want to email it to a friend. Email is a separate application, in its own sandbox, with no access to the pictures in the camera application. How can the email application ...

  • Page - 872

    on processes and UIDs, so a security barrier always happens at a process boundary, and permissions themselves are associated with UIDs. Given this, a permission check can be performed by retrieving the UID associated with the incoming IPC and asking the package manager whether that UID has been granted the corresponding permission. For example, permissions for ...
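The check described above, taking the caller's UID from the incoming IPC and asking the package manager whether that UID holds a permission, can be outlined as follows. This is a toy model under stated assumptions: the class, the function, the permission string, and the UID values are all hypothetical and do not reproduce Android's actual package manager API.

```python
class PackageManagerSketch:
    """Toy record of which permissions have been granted to which UID."""

    def __init__(self):
        self._granted = {}  # uid -> set of permission names

    def grant(self, uid, permission):
        self._granted.setdefault(uid, set()).add(permission)

    def check_permission(self, permission, uid):
        return permission in self._granted.get(uid, set())


def handle_ipc(package_manager, caller_uid, required_permission):
    """Serve a request only if the caller's UID holds the permission."""
    if not package_manager.check_permission(required_permission, caller_uid):
        raise PermissionError(
            "UID %d lacks %s" % (caller_uid, required_permission))
    return "ok"
```

Because the decision is keyed on the UID rather than on anything the caller supplies in the request, every process running under the same UID receives the same answer, which is consistent with the security barrier being at the process boundary.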

  • Page - 873

    receives a URI of the data to share, but does not know where it came from; in the figure here it comes from the camera, but any other application could use this to let the user email its data, from audio files to word-processing documents. The email application only needs to read that URI as a byte stream to add ...

  • Page - 874

    [Figure 10-65. Sharing a picture using ... The camera app process (PicturesProvider, authority "pics") sends a SEND intent for content://pics/1 to ComposeActivity; the activity manager in the system_server process keeps the granted URIs and the ActivityRecords for CameraActivity (STOPPED, saved state) and ComposeActivity (RESUMED) in the Pictures task; the email app process opens content://pics/1, the access is checked and allowed, and it receives the data.]

  • Page - 875

    [Figure residue. Diagram labels: Activity manager in system_server process (Granted URIs; Task: Pictures; ActivityRecord (PicturePickerActivity); ActivityRecord (ComposeActivity); Saved state); Camera app process (PicturesProvider, Authority: "pics"); Email app process (ComposeActivity); To: ComposeActivity, URI: content://pics/1; Open content://pics/1; Check; Allow; Receive data ...]

  • Page - 876

    Starting Processes

    In order to launch new processes, the activity manager must communicate with the zygote. When the activity manager first starts, it creates a dedicated socket with zygote, through which it sends a command when it needs to start a process. The command primarily describes the sandbox to be created: the UID that the new process should run as ...

  • Page - 877

    8. Activity manager sends to the new process any pending operations, in this case "start activity X."
    9. New process receives the command to start an activity, instantiates the appropriate Java class, and executes it.

    [Figure residue. Diagram labels: System_server process; Application process; Activity instance; Application code; Android framework; PackageManagerService ...]

  • Page - 878

    based on the state of that process, by classifying them into major categories of use. Figure 10-68 shows the main categories, with the most important category first. The last column shows a typical oom_adj value that is assigned to processes of this type.

    Category     Description                       oom_adj
    SYSTEM       The system and daemon processes   −16
    PERSISTENT   Always-running ...

  • Page - 879

    implemented by a content provider in the camera application. Other applications may want to access that picture data, becoming a client of the camera application. Dependencies between processes can happen with both content providers (through simple access to the provider) and services (by binding to a service). In either case, the ...

  • Page - 880

    SEC. 10.9 SUMMARY 849

    Process    State                                              Importance
    system     Core part of operating system                      SYSTEM
    phone      Always running for telephony stack                 PERSISTENT
    email      Current foreground application                     FOREGROUND
    camera     In use by email to load attachment                 FOREGROUND
    music      Running background service playing music           PERCEPTIBLE
    media      In use by music app for accessing user’s music     PERCEPTIBLE
    download   Downloading a file ...

  • Page - 881

    Process management in Linux is different compared to other UNIX systems in that Linux views each execution entity (a single-threaded process, or each thread within a multithreaded process or the kernel) as a distinguishable task. A process, or a single task in general, is then represented via two key components, the task ...

  • Page - 882

    PROBLEMS

    1. Explain how writing UNIX in C made it easier to port it to new machines.
    2. The POSIX interface defines a set of library procedures. Explain why POSIX standardizes library procedures instead of the system-call interface.
    3. Linux depends on gcc compiler to be ported to new architectures. Describe one advantage and one disadvantage of this ...

  • Page - 883

    12. Why are negative arguments to nice reserved exclusively for the superuser?
    13. A non-real-time Linux process has priority levels from 100 to 139. What is the default static priority and how is the nice value used to change this?
    14. Does it make sense to take away a process’ memory when it enters zombie state? ...

  • Page - 884

    CHAP. 10 PROBLEMS 853
    28. In Linux, the data and stack segments are paged and swapped to a scratch copy kept on a special paging disk or partition, but the text segment uses the executable binary file instead. Why?
    29. Describe a way to use mmap and signals to construct an interprocess-communication mechanism.
    30. A file is mapped in using the following mmap system call: ...

  • Page - 885

    41. In Fig. 10-24, both Fred and Lisa have access to the file x in their respective directories after linking. Is this access completely symmetrical in the sense that anything one of them can do with it the other one can, too?
    42. As we have seen, absolute path names are looked up starting at the root directory ...

  • Page - 886

    running the program special rights only with respect to access to files. Why is this feature useful?
    56. On a Linux system, go to /proc/#### directory, where #### is a decimal number corresponding to a process currently running in the system. Answer the following along with an explanation: (a) What is the size of most of the files in this ...

  • Page - 887

    mythreads_exit, mythreads_yield, mythreads_self, and perhaps a few others. Next, implement these synchronization variables to enable safe concurrent operations: mythreads_mutex_init, mythreads_mutex_lock, mythreads_mutex_unlock. Before starting, clearly define the API and specify the semantics of each of the calls. Next ...

  • Page - 888

    11 CASE STUDY 2: WINDOWS 8

    Windows is a modern operating system that runs on consumer PCs, laptops, tablets and phones as well as business desktop PCs and enterprise servers. Windows is also the operating system used in Microsoft’s Xbox gaming system and Azure cloud computing infrastructure. The most recent version is Windows 8.1. In this chapter we will examine various ...

  • Page - 889

    858 CASE STUDY 2: WINDOWS 8 CHAP. 11

    Year   MS-DOS   MS-DOS-based Windows   NT-based Windows   Modern Windows   Notes
    1981   1.0      -                      -                  -                Initial release for IBM PC
    1983   2.0      -                      -                  -                Support for PC/XT
    1984   3.0      -                      -                  -                Support for PC/AT
    1990   -        3.0                    -                  -                Ten million copies in 2 years
    1991   5.0      -                      -                  -                Added memory management
    1992   -        3.1                    -                  -                Ran only on 286 and later
    1993   -        -                      NT 3.1             -
    1995   7.0      95                     -                  -                MS-DOS embedded in Win 95
    1996   -        -                      NT 4.0             -
    1998   -        98 ...

  • Page - 890

    SEC. 11.1 HISTORY OF WINDOWS THROUGH WINDOWS 8.1 859 MS-DOS was a 16-bit real-mode, single-user, command-line-oriented operating system consisting of 8 KB of memory resident code. Over the next decade, both the PC and MS-DOS continued to evolve, adding more features and capabilities. By 1986, when IBM built the PC/AT based on the Intel 286, MS-DOS had grown to be 36 KB, ...

  • Page - 891

    Cutler’s system was called NT for New Technology (and also because the original target processor was the new Intel 860, code-named the N10). NT was designed to be portable across different processors and emphasized security and reliability, as well as compatibility with the MS-DOS-based versions of Windows. Cutler’s background at DEC ...

  • Page - 892

    NT did meet its portability goals, with additional releases in 1994 and 1995 adding support for (little-endian) MIPS and PowerPC architectures. The first major upgrade to NT came with Windows NT 4.0 in 1996. This system had the power, security, and reliability of NT, but also sported the same user interface as the by-then ...

  • Page - 893

    Microsoft followed up Windows XP by embarking on an ambitious release to kindle renewed excitement among PC consumers. The result, Windows Vista, was completed in late 2006, more than five years after Windows XP shipped. Windows Vista boasted yet another redesign of the graphical interface, and new security features under the covers. ...

  • Page - 894

    the same time, processor performance ceased to improve at the same rate it had previously, due to the difficulties in dissipating the heat created by ever-increasing clock speeds. Moore’s Law continued to hold, but the additional transistors were going into new features and multiple processors rather than improvements in sin- ...

  • Page - 895

    applied to Windows Phone 8, which shares most of the core binaries with desktop and server Windows. Support of phones and tablets by Windows required support for the popular ARM architecture, as well as new Intel processors targeting those devices. What makes Windows 8 part of the Modern Windows era are the fundamental changes in the ...

  • Page - 896

    SEC. 11.2 PROGRAMMING WINDOWS 865
    [Figure residue. Layer diagram labels, kernel mode: Hardware abstraction layer (hal.dll); Hypervisor (hvix, hvax); Drivers: devices, file systems, network; NTOS executive layer (ntoskrnl.exe); GUI driver (Win32k.sys); NTOS kernel layer (ntoskrnl.exe). User mode: Native NT API, C/C++ run-time (ntdll.dll); NT services: smss, lsass, services, winlogon; Win32 subsystem process (csrss.exe); Modern broker processes; Windows ...]

  • Page - 897

    shifting programmers away from a threading model to a task model in order to disentangle resource management (priorities, processor affinities) from the programming model (specifying concurrent activities). Other omitted Win32 APIs include most of the Win32 virtual memory APIs. Programmers are expected to rely on the Win32 heap-management APIs ...

  • Page - 898

    the only remaining subsystem supported, Windows still maintains the subsystem model, including the csrss.exe Win32 subsystem process.

    [Figure residue. Diagram labels: Subsystem process; Program process; Subsystem libraries; Subsystem run-time library (CreateProcess hook); Subsystem kernel support; NTOS Executive; Local procedure call (LPC); Native NT system services; User-mode; Kernel-mode; Native NT API, C/C++ run-time ...]

  • Page - 899

    11.2.1 The Native NT Application Programming Interface

    Like all other operating systems, Windows has a set of system calls it can perform. In Windows, these are implemented in the NTOS executive layer that runs in kernel mode. Microsoft has published very few of the details of these native system calls. They are used internally by ...

  • Page - 900

    access requested. When handles are duplicated between processes, new access restrictions can be added that are specific to the duplicated handle. Thus, a process can duplicate a read-write handle and turn it into a read-only version in the target process. Not all system-created data structures are objects and not all objects are kernel-mode objects. ...
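    The idea of narrowing access rights when a handle is duplicated can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the Windows API: Access, HandleTable, and duplicate_handle are hypothetical names, handles are plain integers, and the duplicate's rights are modeled as the intersection of the source rights and a requested mask.

    ```python
    from enum import IntFlag


    class Access(IntFlag):
        READ = 1
        WRITE = 2


    class HandleTable:
        """Toy per-process table mapping handle values to (object, granted rights)."""

        def __init__(self):
            self._entries = {}
            self._next = 1

        def insert(self, obj, access):
            handle = self._next
            self._next += 1
            self._entries[handle] = (obj, access)
            return handle

        def lookup(self, handle):
            return self._entries[handle]

        def allows(self, handle, wanted):
            _, granted = self._entries[handle]
            return (granted & wanted) == wanted


    def duplicate_handle(source, handle, target, access_mask):
        # The duplicate refers to the same object but can only narrow,
        # never widen, the rights of the source handle.
        obj, granted = source.lookup(handle)
        return target.insert(obj, granted & access_mask)
    ```

    For example, duplicating a READ|WRITE handle with a mask of READ gives the target process a handle to the same object on which a write request would be refused.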

  • Page - 901

    devices, processes, and interprocess communication (IPC) facilities like shared memory, message ports, semaphores, and I/O devices. In UNIX there are a variety of ways of naming and accessing objects, such as file descriptors, process IDs, and integer IDs for System V IPC objects, and i-nodes for devices. The implementation of each class of UNIX ...

  • Page - 902

    handle for the object. Such objects can even extend the NT namespace by providing parse routines that allow the objects to function somewhat like mount points in UNIX. File systems and the registry use this facility to mount volumes and hives onto the NT namespace. Accessing the device object for a volume gives access to the raw volume, but ...

  • Page - 903

    Win32 call          Native NT API call
    CreateProcess       NtCreateProcess
    CreateThread        NtCreateThread
    SuspendThread       NtSuspendThread
    CreateSemaphore     NtCreateSemaphore
    ReadFile            NtReadFile
    DeleteFile          NtSetInformationFile
    CreateFileMapping   NtCreateSection
    VirtualAlloc        NtAllocateVirtualMemory
    MapViewOfFile       NtMapViewOfSection
    DuplicateHandle     NtDuplicateObject
    CloseHandle         NtClose

    Figure 11-8. Examples of ...

  • Page - 904

    Win32 has calls for creating and managing both processes and threads. There are also many calls that relate to interprocess communication, such as creating, destroying, and using mutexes, semaphores, events, communication ports, and other IPC objects. Although much of the memory-management system is invisible to programmers, one important feature is ...

  • Page - 905

    storage. Modifications to files or directory subtrees can be detected through a notification mechanism, or by reading the journal that NTFS maintains for each volume. Each file-system volume is implicitly mounted in the NT namespace, according to the name given to the volume, so a file \foo\bar might be named, for example, ...

  • Page - 906

    drawing geometric figures, filling them in, managing the color palettes they use, dealing with fonts, and placing icons on the screen. Finally, there are calls for dealing with the keyboard, mouse and other human-input devices as well as audio, printing, and other output devices. The GUI operations work directly with the win32k.sys driver using special ...

  • Page - 907

    Hive file   Mounted name          Use
    SYSTEM      HKLM\SYSTEM           OS configuration information, used by kernel
    HARDWARE    HKLM\HARDWARE         In-memory hive recording hardware detected
    BCD         HKLM\BCD*             Boot Configuration Database
    SAM         HKLM\SAM              Local user account information
    SECURITY    HKLM\SECURITY         lsass’ account and other security information
    DEFAULT     HKEY_USERS\.DEFAULT   ...

  • Page - 908

    Win32 API function   Description
    RegCreateKeyEx       Create a new registry key
    RegDeleteKey         Delete a registry key
    RegOpenKeyEx         Open a key to get a handle to it
    RegEnumKeyEx         Enumerate the subkeys subordinate to the key of the handle
    RegQueryValueEx      Look up the data for a value within a key

    Figure 11-10. Some of the Win32 API calls for using the ...

  • Page - 909

    The division of NTOS into kernel and executive is a reflection of NT’s VAX/VMS roots. The VMS operating system, which was also designed by Cutler, had four hardware-enforced layers: user, supervisor, executive, and kernel, corresponding to the four protection modes provided by the VAX processor architecture. The Intel CPUs also support four ...

  • Page - 910

    SEC. 11.3 SYSTEM STRUCTURE 879 firmware represents configuration information and deals with differences in the CPU support chips, such as various interrupt controllers. The lowest software layer is the hypervisor, which Windows calls Hyper-V. The hypervisor is an optional feature (not shown in Fig. 11-11). It is available in many versions of Windows, including the professional desktop ...

  • Page - 911

    The Hardware Abstraction Layer

    One goal of Windows is to make the system portable across hardware platforms. Ideally, to bring up an operating system on a new type of computer system it should be possible to just recompile the operating system on the new platform. Unfortunately, it is not this simple. While many of the components in ...

  • Page - 912

    By using the HAL services and not addressing the hardware directly, drivers and the kernel require fewer changes when being ported to new processors, and in most cases can run unmodified on systems with the same processor architecture, despite differences in versions and support chips. The HAL does not provide abstractions or services for specific I/O ...

  • Page - 913

    way, without having to know anything about which interrupt vector is for which bus. Interrupt request level management is also handled in the HAL. Another HAL service is setting up and managing DMA transfers in a device-independent way. Both the systemwide DMA engine and DMA engines on specific I/O cards can be handled. Devices are ...

  • Page - 914

    ntoskrnl.exe file which contains NTOS, the core of the Windows operating system. Or it can refer to the kernel layer within NTOS, which is how we use it in this section. It is even used to name the user-mode Win32 library that provides the wrappers for the native system calls: kernel32.dll. In the Windows operating system the kernel ...

  • Page - 915

    The system hardware assigns a hardware priority level to interrupts. The CPU also associates a priority level with the work it is performing. The CPU responds only to interrupts at a higher-priority level than it is currently using. The normal priority level, including the priority level of all user-mode work, is 0. Device interrupts occur ...
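    The rule that the CPU takes only interrupts above its current priority level can be modeled in a few lines. The sketch below is a toy Python simulation, not Windows code: the Cpu class, its method names, and the policy of delivering the highest deferred interrupt when a handler finishes are assumptions made for illustration. Only the comparison rule and the level 0 for normal work come from the text.

    ```python
    class Cpu:
        """Toy CPU that accepts an interrupt only if it outranks the current level."""

        def __init__(self):
            self.level = 0       # normal work, including all user-mode work, runs at 0
            self._stack = []     # levels of the work that was interrupted
            self.pending = []    # interrupts deferred because their level was too low

        def raise_interrupt(self, irq_level):
            if irq_level > self.level:
                self._stack.append(self.level)
                self.level = irq_level
                return True      # handler starts running now
            self.pending.append(irq_level)
            return False         # deferred until the level drops

        def finish_handler(self):
            self.level = self._stack.pop()
            # Assumed policy: deliver the highest deferred interrupt that now qualifies.
            if self.pending and max(self.pending) > self.level:
                nxt = max(self.pending)
                self.pending.remove(nxt)
                self.raise_interrupt(nxt)
    ```

    In this model an interrupt at level 3 that arrives while a level 5 handler is running stays pending, and runs once the level 5 handler finishes and the level falls back to 0.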

  • Page - 916

    response to a timer interrupt. To avoid blocking threads, timer events which need to run for an extended time should queue requests to the pool of worker threads the kernel maintains for background activities.

    Asynchronous Procedure Calls

    The other special kernel control object is the APC (Asynchronous Procedure Call) object. APCs are like DPCs in that ...

  • Page - 917

    as entering critical regions to defer APCs when acquiring locks or other resources, so that they cannot be terminated while still holding the resource.

    Dispatcher Objects

    Another kind of synchronization object is the dispatcher object. This is any ordinary kernel-mode object (the kind that users can refer to with handles) that contains a data ...

  • Page - 918

    locking primitives, like mutexes. When a thread that is waiting for a lock begins running again, the first thing it does is to retry acquiring the lock. If only one thread can hold the lock at a time, all the other threads made runnable might immediately block, incurring lots of unnecessary context switching. The difference between dispatcher ...

  • Page - 919

    pool of high-priority worker threads mentioned earlier which can be used to run bounded tasks by queuing a request and signaling the synchronization event that the worker threads are waiting on. The object manager manages most of the interesting kernel-mode objects used in the executive layer. These include processes, threads, files, semaphores, ...

  • Page - 920

    drivers that can be moved into user-mode processes, where a bug will only trigger the failure of a single driver (rather than bringing down the entire system), the better. The trend of moving code from the kernel to user-mode processes is expected to accelerate in the coming years. The I/O manager also includes the plug-and-play and device ...

  • Page - 921

    in terms of their location in their files. This differs from physical block caching, as in UNIX, where the system maintains a cache of the physically addressed blocks of the raw disk volume. Cache management is implemented using mapped files. The actual caching is performed by the memory manager. The cache manager need be concerned only ...

  • Page - 922

    tells the operating system how long to maintain it (e.g., until the next reboot or permanently). A publisher atomically updates the state as appropriate. Subscribers can arrange to run code whenever an instance of state data is modified by a publisher. Because the WNF state instances contain a fixed amount of preallocated data, there is no queuing ...

  • Page - 923

    routines to use for the I/O request packets that flow through the device stack. In some cases the devices in the stack represent drivers whose sole purpose is to filter I/O operations aimed at a particular device, bus, or network driver. Filtering is used for a number of reasons. Sometimes preprocessing or postprocessing I/O operations ...

  • Page - 924

    The network protocols, such as Windows’ integrated IPv4/IPv6 TCP/IP implementation, are also loaded as drivers using the I/O model. For compatibility with the older MS-DOS-based Windows, the TCP/IP driver implements a special protocol for talking to network interfaces on top of the Windows I/O model. There are other drivers that also implement such ...

  • Page - 925

    kernel, and executive layers, link in the driver images, and access/update configuration data in the SYSTEM hive. After all the kernel-mode components are initialized, the first user-mode process is created for running the smss.exe program (which is like /etc/init in UNIX systems). Recent versions of Windows provide support for ...

  • Page - 926

    to manage the NT namespace and implement objects using a common facility. These are directory, symbolic link, and object-type objects. The uniformity provided by the object manager has various facets. All these objects use the same mechanism for how they are created, destroyed, and accounted for in the quota system. They can all be accessed from ...

  • Page - 927

    [Figure residue. Diagram labels. Object header: Object name; Directory in which the object lives; Security information (which can use object); Quota charges (cost to use the object); List of processes with handles; Reference counts; Pointer to the type object. Object data: Object-specific data. Type object: Type name; Access types; Access rights; Quota charges; Synchronizable?; Pageable; Open method; Close method; Delete ...]

  • Page - 928

    Handles

    User-mode references to kernel-mode objects cannot use pointers because they are too difficult to validate. Instead, kernel-mode objects must be named in some other way so the user code can refer to them. Windows uses handles to refer to kernel-mode objects. Handles are opaque values which are converted by the object manager into references to ...

  • Page - 929

    [Figure labels: Handle-table descriptor; Table pointer; A: Handle-table entries [512]; B: Handle-table pointers [1024]; C: Handle-table entries [512]; D: Handle-table pointers [32]; E: Handle-table pointers [1024]; F: Handle-table entries [512]; Object.]

    Figure 11-17. Handle-table data structures for a maximal table of up to 16 million handles.

    at the time the object is ...
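    The "16 million" in the caption of Figure 11-17 follows from the three table sizes shown: 32 top-level pointers, each leading to 1024 mid-level pointers, each leading to a table of 512 entries. The Python sketch below checks that arithmetic and shows how a flat entry index could be split across the three levels. The function name locate and the idea of a zero-based flat index are assumptions for illustration; the excerpt does not say how Windows encodes handle values.

    ```python
    ENTRIES_PER_TABLE = 512    # handle-table entries per leaf table
    POINTERS_PER_MID = 1024    # leaf-table pointers per mid-level table
    POINTERS_PER_TOP = 32      # mid-level pointers in the top-level table

    MAX_HANDLES = POINTERS_PER_TOP * POINTERS_PER_MID * ENTRIES_PER_TABLE


    def locate(index):
        """Split a flat entry index into (top, mid, leaf) positions."""
        if not 0 <= index < MAX_HANDLES:
            raise ValueError("index outside the maximal table")
        top, rest = divmod(index, POINTERS_PER_MID * ENTRIES_PER_TABLE)
        mid, leaf = divmod(rest, ENTRIES_PER_TABLE)
        return top, mid, leaf
    ```

    Here MAX_HANDLES is 32 × 1024 × 512 = 2^24 = 16,777,216. A small process needs only a single 512-entry table; the extra levels are added as the number of handles grows.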

  • Page - 930

    Procedure   When called                                   Notes
    Open        For every new handle                          Rarely used
    Parse       For object types that extend the namespace    Used for files and registry keys
    Close       At last handle close                          Clean up visible side effects
    Delete      At last pointer dereference                   Object is about to be deleted
    Security    Get or set object’s security descriptor       Protection
    QueryName   Get ...

  • Page - 931

    Apart from the object-type callbacks, the object manager also provides a set of generic object routines for operations like creating objects and object types, duplicating handles, getting a referenced pointer from a handle or name, adding and subtracting reference counts to the object header, and NtClose (the generic function that closes all ...

  • Page - 932

    object is closed it is important to delete the exclusive access at that point rather than wait for any incidental kernel references to eventually go away (e.g., after the last flush of data from memory). Otherwise closing and reopening a file from user mode may not work as expected because the file still appears to be in use. Though the ...

  • Page - 933

    [Figure residue, parts (a) and (b). Diagram labels: User mode; Kernel mode; Win32 CreateFile(C:\foo\bar); NtCreateFile(\??\C:\foo\bar); OpenObjectByName(\??\C:\foo\bar); I/O manager; Object manager; IopParseDevice(DeviceObject, \foo\bar); IRP; IoCallDriver; File system filters; C:'s Device stack; NTFS NtfsCreateFile(); IoCompleteRequest; namespace nodes \, Devices, ??, C:, Harddisk1; SYMLINK: \Devices\Harddisk1 ...]

  • Page - 934

    6. The device objects encountered as the IRP heads toward the file system represent file-system filter drivers, which may modify the I/O operation before it reaches the file-system device object. Typically these intermediate devices represent system extensions like antivirus filters.
    7. The file-system device object has a link to the file-system driver ...

  • Page - 935

    Type        Description
    Process     User process
    Thread      Thread within a process
    Semaphore   Counting semaphore used for interprocess synchronization
    Mutex       Binary semaphore used to enter a critical region
    Event       Synchronization object with persistent state (signaled/not)
    ALPC port   Mechanism for interprocess message passing
    Timer       Object allowing a thread to sleep for a ...

  • Page - 936

    links allow a name in one part of the object namespace to refer to an object in a different part of the object namespace. Each device known to the operating system has one or more device objects that contain information about it and are used to refer to the device by the system. Finally, each device driver that has been loaded has a ...

  • Page - 937

    The implementation of DLLs is simple in concept. Instead of the compiler emitting code that calls directly to subroutines in the same executable image, a level of indirection is introduced: the IAT (Import Address Table). When an executable is loaded it is searched for the list of DLLs that must also be loaded (this will be a graph ...

  • Page - 938

    kernel and services implemented in user-mode processes. Both the kernel and process provide private address spaces where data structures can be protected and service requests can be scrutinized. However, there can be significant performance differences between services in the kernel vs. services in user-mode processes. Entering the kernel from user mode is ...

  • Page - 939

    from an attacker attempting to exploit a vulnerability. As a result more and more services in Windows are turned off by default, particularly on versions of Windows Server.

    11.4 PROCESSES AND THREADS IN WINDOWS

    Windows has a number of concepts for managing the CPU and grouping resources together. In the following sections we will examine ...

  • Page - 940

    SEC. 11.4 PROCESSES AND THREADS IN WINDOWS 909 but code that relies on these fields must query them often just to see if they have changed. As with many performance hacks, it is a bit ugly, but it works.

    Processes

    Processes are created from section objects, each of which describes a memory object backed by a file on disk. When a process is created, the creating ...

  • Page - 941

    resource management is that once a process is in a job, all processes that threads in those processes create will also be in the job. There is no escape. As suggested by the name, jobs were designed for situations that are more like batch processing than ordinary interactive computing. In Modern Windows, jobs are used to group together ...

  • Page - 942

    are completely unaware of fibers, and applications that attempt to use fibers as if they were threads will encounter various failures. The kernel has no knowledge of fibers, and when a fiber enters the kernel, the thread it is executing on may block and the kernel will schedule an arbitrary thread on the processor, making it ...

  • Page - 943

    model that UNIX has. Each of these threads is allocated its own stack and its own memory to save its registers when not running. The two threads appear to be a single thread because they do not run at the same time. The user thread operates as an extension of the kernel thread, running only when the kernel thread switches to ...

  • Page - 944

    Name               Description                                              Notes
    Job                Collection of processes that share quotas and limits    Used in AppContainers
    Process            Container for holding resources
    Thread             Entity scheduled by the kernel
    Fiber              Lightweight thread managed entirely in user space       Rarely used
    Thread pool        Task-oriented programming model                          Built on top of threads
    User-mode thread   Abstraction ...

  • Page - 945

    token so it can perform operations on the client’s behalf. (In general a service cannot use the client’s actual token, as the client and server may be running on different systems.) Threads are also the normal focal point for I/O. Threads block when performing synchronous I/O, and the outstanding I/O request packets for ...

  • Page - 946

    1. The actual search path for finding the program to execute is buried in the library code for Win32, but managed more explicitly in UNIX.
    2. The current working directory is a kernel-mode concept in UNIX but a user-mode string in Windows. Windows does open a handle on the current directory for each process, with the same ...

  • Page - 947

    left to user-mode code that can use the handle on the new process to manipulate its virtual address space directly. To support the POSIX subsystem, native process creation has an option to create a new process by copying the virtual address space of another process rather than mapping a section object for a new program. This is ...

  • Page - 948

    be used over a network but do not provide guaranteed delivery. Finally, they allow the sending process to broadcast a message to many receivers, instead of to just one receiver. Both mailslots and named pipes are implemented as file systems in Windows, rather than executive functions. This allows them to be accessed over the ...

  • Page - 949

    Semaphores are kernel-mode objects and thus have security descriptors and handles. The handle for a semaphore can be duplicated using DuplicateHandle and passed to another process so that multiple processes can synchronize on the same semaphore. A semaphore can also be given a name in the Win32 namespace and have an ACL set to ...

  • Page - 950

    the event is cleared. An alternative operation is PulseEvent, which is like SetEvent except that if nobody is waiting, the pulse is lost and the event is cleared. In contrast, a SetEvent that occurs with no waiting threads is remembered by leaving the event in the signaled state so a subsequent thread that calls a wait ...
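    The difference between SetEvent and PulseEvent when no thread is waiting can be made concrete with a toy model. The Python class below is a single-threaded simulation, not the Win32 API: ToyEvent and its method names are hypothetical, and it assumes an event that releases one waiter at a time and is cleared when a waiter is released. It only illustrates which signals are remembered and which are lost.

    ```python
    class ToyEvent:
        """Single-threaded model of an event that releases one waiter per signal."""

        def __init__(self):
            self.signaled = False
            self.waiters = []    # names of threads blocked on the event
            self.released = []   # names of threads that have been let through

        def wait(self, thread):
            if self.signaled:
                # A remembered SetEvent lets this thread through immediately.
                self.signaled = False
                self.released.append(thread)
            else:
                self.waiters.append(thread)

        def set_event(self):
            if self.waiters:
                self.released.append(self.waiters.pop(0))
            else:
                self.signaled = True     # remembered for a later wait

        def pulse_event(self):
            if self.waiters:
                self.released.append(self.waiters.pop(0))
            self.signaled = False        # with no waiters, the pulse is simply lost
    ```

    In this model a pulse issued before anyone waits has no effect, so a thread that waits afterwards blocks, whereas a set issued before anyone waits lets the next waiter proceed at once.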

  • Page - 951

    Win32 API Function   Description
    CreateProcess        Create a new process
    CreateThread         Create a new thread in an existing process
    CreateFiber          Create a new fiber
    ExitProcess          Terminate current process and all its threads
    ExitThread           Terminate this thread
    ExitFiber            Terminate this fiber
    SwitchToFiber        Run a different fiber on the current thread
    SetPrior ...

  • Page - 952

    5. The memory manager creates the address space for the new process by allocating and initializing the page directories and the virtual address descriptors which describe the kernel-mode portion, including the process-specific regions, such as the self-map page-directory entries that give each process kernel-mode access to the physical ...

  • Page - 953

    922 CASE STUDY 2: WINDOWS 8 CHAP. 11 15. If NtCreateUserProcess was successful, there is still some work to be done. Win32 processes have to be registered with the Win32 subsystem process, csrss.exe. Kernel32.dll sends a message to csrss telling it about the new process along with the process and thread handles so it can duplicate itself. The process and threads are entered read more..

  • Page - 954

    SEC. 11.4 PROCESSES AND THREADS IN WINDOWS 923 In case 1, the thread is already in the kernel to carry out the operation on the dispatcher or I/O object. It cannot possibly continue, so it calls the scheduler code to pick its successor and load that thread’s CONTEXT record to resume running it. In case 2, the running thread is in the kernel, too. However, after read more..

  • Page - 955

    924 CASE STUDY 2: WINDOWS 8 CHAP. 11 class of its process. The allowed values are: time critical, highest, above normal, normal, below normal, lowest, and idle. Time-critical threads get the highest non-real-time scheduling priority, while idle threads get the lowest, irrespective of the priority class. The other priority values adjust the base priority of a thread with respect read more..

  • Page - 956

    SEC. 11.4 PROCESSES AND THREADS IN WINDOWS 925 To improve the scalability of the scheduling algorithm for multiprocessors with a high number of processors, the scheduler tries hard not to have to take the lock that protects access to the global array of priority lists. Instead, it sees if it can directly dispatch a thread that is ready to run to the processor where read more..

  • Page - 957

    926 CASE STUDY 2: WINDOWS 8 CHAP. 11 Figure 11-26. Windows supports 32 priorities for threads. (Figure: priority levels 31 down to 0, labeled system priorities, user priorities, zero page thread, and idle thread, with a pointer to the next thread to run.) busy. The amount of boost depends on the I/O device, typically 1 for a disk, 2 for a serial line, 6 for the keyboard, and 8 for the sound card. Second, if a thread was waiting read more..
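
The typical boost amounts quoted above fit in a small lookup table. The sketch below is illustrative only: the function name `boosted_priority` is hypothetical, and the cap at priority 15 is an assumption (the top of the user-priority range), not something stated in this excerpt.

```python
# Typical I/O-completion boosts quoted in the text, by device type.
IO_BOOST = {
    "disk": 1,
    "serial line": 2,
    "keyboard": 6,
    "sound card": 8,
}

# Assumption: a boost never lifts a thread above the user-priority range.
MAX_USER_PRIORITY = 15


def boosted_priority(base_priority, device):
    """Return the temporary priority after an I/O completion on `device`."""
    return min(base_priority + IO_BOOST[device], MAX_USER_PRIORITY)


print(boosted_priority(8, "disk"))         # 8 + 1
print(boosted_priority(8, "keyboard"))     # 8 + 6
print(boosted_priority(12, "sound card"))  # 12 + 8, capped by the assumed limit
```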

  • Page - 958

    SEC. 11.4 PROCESSES AND THREADS IN WINDOWS 927 Figure 11-27. An example of priority inversion. (Figure, panels (a) and (b): threads at priorities 4, 8, and 12 in the blocked, running, and ready states; one thread does a down on the semaphore and blocks, one is waiting on the semaphore, and one would like to do an up on the semaphore but never gets scheduled.) problem is well known under the name priority inversion. Windows addresses priority inversion between read more..

  • Page - 959

    928 CASE STUDY 2: WINDOWS 8 CHAP. 11 11.5.1 Fundamental Concepts In Windows, every user process has its own virtual address space. For x86 machines, virtual addresses are 32 bits long, so each process has 4 GB of virtual address space, with the user and kernel each receiving 2 GB. For x64 machines, both the user and kernel receive more virtual addresses than they read more..

  • Page - 960

    SEC. 11.5 MEMORY MANAGEMENT 929 memory is accessible only while running in kernel mode. The reason for sharing the process’ virtual memory with the kernel is that when a thread makes a system call, it traps into kernel mode and can continue running without changing the memory map. All that has to be done is switch to the thread’s kernel stack. From a performance read more..

  • Page - 961

    930 CASE STUDY 2: WINDOWS 8 CHAP. 11 Pagefiles An interesting trade-off occurs with assignment of backing store to committed pages that are not being mapped to specific files. These pages use the pagefile. The question is how and when to map the virtual page to a specific location in the pagefile. A simple strategy would be to assign each virtual page to a page in one of read more..

  • Page - 962

    SEC. 11.5 MEMORY MANAGEMENT 931 Windows supports up to 16 pagefiles, normally spread out over separate disks to achieve higher I/O bandwidth. Each one has an initial size and a maximum size it can grow to later if needed, but it is better to create these files to be the maximum size at system installation time. If it becomes necessary to grow a pagefile when the read more..

  • Page - 963

    932 CASE STUDY 2: WINDOWS 8 CHAP. 11 sequence of two or more pages that are consecutive in the virtual address space. Of course, processes do not have to manage their memory; paging happens automatically, but these calls give processes additional power and flexibility.
    Win32 API function    Description
    VirtualAlloc          Reserve or commit a region
    VirtualFree           Release or decommit a read more..

  • Page - 964

    SEC. 11.5 MEMORY MANAGEMENT 933 11.5.3 Implementation of Memory Management Windows, on the x86, supports a single linear 4-GB demand-paged address space per process. Segmentation is not supported in any form. Theoretically, page sizes can be any power of 2 up to 64 KB. On the x86 they are normally fixed at 4 KB. In addition, the operating system can use 2-MB large pages to read more..

  • Page - 965

    934 CASE STUDY 2: WINDOWS 8 CHAP. 11 particular address can be found efficiently. This scheme supports sparse address spaces. Unused areas between the mapped regions use no resources (memory or disk) so they are essentially free. Page-Fault Handling When a process starts on Windows, many of the pages mapping the program’s EXE and DLL image files may already be in memory because read more..

  • Page - 966

    SEC. 11.5 MEMORY MANAGEMENT 935 format and is determined by the memory manager. For example, for an unmapped page that must be allocated and zeroed before it may be used, that fact is noted in the page-table entry.
    Bit(s)    Field
    63        NX
    62-52     AVL
    51-12     Physical page number
    11-9      AVL
    8         G
    7         PAT
    6         D
    5         A
    4         PCD
    3         PWT
    2         U/S
    1         R/W
    0         P
    NX No eXecute; AVL AVaiLable to the OS; G read more..

  • Page - 967

    936 CASE STUDY 2: WINDOWS 8 CHAP. 11 an access violation and usually results in termination of the process. Access violations are often the result of bad pointers, including accessing memory that was freed and unmapped from the process. The third case has the same symptoms as the second one (an attempt to write to a read-only page), but the treatment is different. Because read more..

  • Page - 968

    SEC. 11.5 MEMORY MANAGEMENT 937 The memory manager can allocate pages as needed using either the free list or the standby list. Before allocating a page and copying it in from disk, the memory manager always checks the standby and modified lists to see if it already has the page in memory. The prepaging scheme in Windows thus converts future hard faults into soft faults by read more..

  • Page - 969

    938 CASE STUDY 2: WINDOWS 8 CHAP. 11
    (Figure: CR3 points to the page directory (PD); entry 0x300 of the PD is the self-map.)
    Self-map: PD[0xc0300000>>22] is PD (page directory)
    Virtual address (a): (PTE *)(0xc0300c00) points to PD[0x300], which is the self-map page-directory entry
    Virtual address (b): (PTE *)(0xc0390c84) points to the PTE for virtual address 0xe4321000
    Virtual address c0390c84 = 1100 0000 00 | 11 1001 0000 | 1100 1000 01 00
    1100 0000 00 11 read more..
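
The two addresses in the figure can be reproduced with a little arithmetic. The sketch below assumes 32-bit x86 paging without PAE (4-KB pages, 4-byte entries, a 10/10/12 address split) and a self-map region starting at 0xc0000000, as the figure's numbers imply; the helper names `pte_address` and `pde_address` are hypothetical.

```python
PTE_BASE = 0xC0000000   # start of the 4-MB region where the page tables appear
ENTRY_SIZE = 4          # bytes per 32-bit page-table or page-directory entry


def pte_address(va):
    """Virtual address of the PTE that maps `va`."""
    return PTE_BASE + (va >> 12) * ENTRY_SIZE


def pde_address(va):
    """Virtual address of the page-directory entry that covers `va`."""
    # The page directory is simply the page table that maps the
    # self-map region itself, so its base is the PTE address of PTE_BASE.
    pd_base = pte_address(PTE_BASE)
    return pd_base + (va >> 22) * ENTRY_SIZE


print(hex(0xC0300000 >> 22))         # index of the self-map entry in the PD
print(hex(pde_address(0xC0300000)))  # virtual address (a) in the figure
print(hex(pte_address(0xE4321000)))  # virtual address (b) in the figure
```

With these assumptions the three values come out as 0x300, 0xc0300c00, and 0xc0390c84, matching the figure.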

  • Page - 970

    SEC. 11.5 MEMORY MANAGEMENT 939 The working-set manager runs every second, called from the balance set manager thread. The working-set manager throttles the amount of work it does to keep from overloading the system. It also monitors the writing of pages on the modified list to disk to be sure that the list does not grow too large, waking the ModifiedPageWriter thread read more..

  • Page - 971

    940 CASE STUDY 2: WINDOWS 8 CHAP. 11 Figure 11-33. Some of the major fields in the page-frame database for a valid page. (Figure: the page-frame number database, with State, Cnt, WS, PT, Other, and Next fields per entry, linked from the Standby, Modified, Free, and Zeroed list headers and from the page tables.) faulted back to read more..

  • Page - 972

    SEC. 11.5 MEMORY MANAGEMENT 941 kernel stacks are unpinned from physical memory and their pages are moved to the standby or modified lists, also shown as (1). Two other system threads, the mapped page writer and the modified page writer, wake up periodically to see if there are enough clean pages. If not, they take pages from the top of the modified list, write them read more..

  • Page - 973

    942 CASE STUDY 2: WINDOWS 8 CHAP. 11 The store manager optimizes where and how physical memory pages are backed by the persistent stores in the system. It also implements optimization techniques such as copy-on-write sharing of identical physical pages and compression of the pages in the standby list to effectively increase the available RAM. Another change in memory management in read more..

  • Page - 974

    SEC. 11.6 CACHING IN WINDOWS 943 The Windows cache-manager facilities are shared among all the file systems. Because the cache is virtually addressed according to individual files, the cache manager is easily able to perform read-ahead on a per-file basis. Requests to access cached data come from each file system. Virtual caching is convenient because the file systems do not read more..

  • Page - 975

    944 CASE STUDY 2: WINDOWS 8 CHAP. 11 and play) and power management for devices and the CPU, all using a fundamentally asynchronous structure that allows computation to overlap with I/O transfers. There are many hundreds of thousands of devices that work with Windows. For a large number of common devices it is not even necessary to install a driver, because there is read more..

  • Page - 976

    SEC. 11.7 INPUT/OUTPUT IN WINDOWS 945 a convenient point for making a clean backup of their persistent state on the volume. Once all the applications are ready, the system initializes the snapshot of the volume and then tells the applications that they can continue. The backup is made of the volume state at the point of the snapshot. And the applications were only blocked read more..

  • Page - 977

    946 CASE STUDY 2: WINDOWS 8 CHAP. 11 operations for setting parameters, as well as calls for flushing system buffers, and so on. At the Win32 layer these APIs are wrapped by interfaces that provide higher-level operations specific to particular devices. At the bottom, though, these wrappers open devices and perform these basic types of operations. Even some metadata operations, read more..

  • Page - 978

    SEC. 11.7 INPUT/OUTPUT IN WINDOWS 947 the directory or detailed information about each file that is needed for an extended directory listing. Since this is really an I/O operation, all the standard ways of reporting that the I/O completed are supported. NtQueryVolumeInformationFile is like the directory query operation, but expects a file handle which represents an open volume which read more..

  • Page - 979

    948 CASE STUDY 2: WINDOWS 8 CHAP. 11 the attributes of a specific file, but these are just wrappers around the other I/O manager operations we have listed and did not really need to be implemented as separate system calls. There are also system calls for dealing with I/O completion ports, a queuing facility in Windows that helps multithreaded servers make efficient use of read more..

  • Page - 980

    SEC. 11.7 INPUT/OUTPUT IN WINDOWS 949 Driver Framework) for writing drivers as services that execute in the kernel, but with many of the details of WDM made automagical. Since underneath it is the WDM that provides the driver model, that is what we will focus on in this section. Devices in Windows are represented by device objects. Device objects are also used to represent read more..

  • Page - 981

    950 CASE STUDY 2: WINDOWS 8 CHAP. 11 I/O Request Packets Figure 11-37 shows the major fields in the IRP. The bottom of the IRP is a dynamically sized array containing fields that can be used by each driver for the device stack handling the request. These stack fields also allow a driver to specify the routine to call when completing an I/O request. During completion read more..

  • Page - 982

    SEC. 11.7 INPUT/OUTPUT IN WINDOWS 951 IRP to devices while it is being processed are reused when the I/O operation has finally completed to provide memory for the APC control object used to call the I/O manager’s completion routine in the context of the original thread. There is also a link field used to link all the outstanding IRPs to the thread that initiated them. read more..

  • Page - 983

    952 CASE STUDY 2: WINDOWS 8 CHAP. 11 separating this work from the device-specific part, driver writers are freed from learning how to control the bus. They can just use the standard bus driver in their stack. Similarly, USB and SCSI drivers have a device-specific part and a generic part, with common drivers being supplied by Windows for the generic part. Another use of read more..

  • Page - 984

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 953 use them. FAT-32 uses 32-bit disk addresses and supports disk partitions up to 2 TB. There is no security in FAT-32 and today it is really used only for transportable media, like flash drives. NTFS is the file system developed specifically for the NT version of Windows. Starting with Windows XP it became the default file read more..

  • Page - 985

    954 CASE STUDY 2: WINDOWS 8 CHAP. 11 NTFS is a hierarchical file system, similar to the UNIX file system. The separator between component names is ‘‘\’’, however, instead of ‘‘/’’, a fossil inherited from the compatibility requirements with CP/M when MS-DOS was created (CP/M used the slash for flags). Unlike UNIX the concept of the current working directory, read more..

  • Page - 986

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 955 CP/M, where each directory entry was called an extent. A bitmap keeps track of which MFT entries are free. The MFT is itself a file and as such can be placed anywhere within the volume, thus eliminating the problem with defective sectors in the first track. Furthermore, the file can grow as needed, up to a maximum size of 2^48 read more..

  • Page - 987

    956 CASE STUDY 2: WINDOWS 8 CHAP. 11 Record 1 is a duplicate of the early portion of the MFT file. This information is so precious that having a second copy can be critical in the event one of the first blocks of the MFT ever becomes unreadable. Record 2 is the log file. When structural changes are made to the file system, such as adding a new directory or read more..

  • Page - 988

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 957
    Attribute               Description
    Standard information    Flag bits, timestamps, etc.
    File name               File name in Unicode; may be repeated for MS-DOS name
    Security descriptor     Obsolete. Security information is now in $Extend$Secure
    Attribute list          Location of additional MFT records, if needed
    Object ID               64-bit file identifier unique to this volume
    Reparse point read more..

  • Page - 989

    958 CASE STUDY 2: WINDOWS 8 CHAP. 11 The next three attributes deal with how directories are implemented. Small ones are just lists of files but large ones are implemented using B+ trees. The logged utility stream attribute is used by the encrypting file system. Finally, we come to the attribute that is the most important of all: the data stream (or in some cases, streams). read more..

  • Page - 990

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 959 Figure 11-41. An MFT record for a three-run, nine-block stream. (Figure: the MFT record holds a record header, a standard info header and standard info, a file name header and file name, and a data header followed by info about the data blocks: a header (0, 9) and three runs, (20, 4), (64, 2), and (80, 3), covering disk blocks 20-23, 64-65, and 80-82; the rest of the record is unused.) In this figure we have an MFT record for a short stream of read more..
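
The run list in Fig. 11-41 can be expanded back into individual disk blocks with a few lines of code. This is a sketch of the idea only, not the on-disk NTFS encoding: each run is modeled as a hypothetical (first block, length) pair, and the function name `expand_runs` is an assumption.

```python
def expand_runs(runs):
    """Expand (first_block, length) runs into the list of disk blocks."""
    blocks = []
    for first_block, length in runs:
        blocks.extend(range(first_block, first_block + length))
    return blocks


# The three runs of the figure: blocks 20-23, 64-65, and 80-82.
runs = [(20, 4), (64, 2), (80, 3)]
blocks = expand_runs(runs)
print(blocks)
print(len(blocks))  # number of blocks in the stream
```

Three pairs are enough to describe all nine blocks, which is why a short, mostly contiguous stream fits inside a single MFT record.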

  • Page - 991

    960 CASE STUDY 2: WINDOWS 8 CHAP. 11 Figure 11-42. A file that requires three MFT records to store all its runs. (Figure: MFT records 100 to 109; the base record points to the extension records MFT 105 and MFT 108 and holds runs #1 to #k; the first extension record holds runs #k+1 to m; the second extension record holds runs #m+1 to n.) Note that Fig. 11-42 contains some redundancy. In theory, it should not be necessary to specify the end read more..

  • Page - 992

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 961 Figure 11-43. The MFT record for a small directory. (Figure: the record holds a record header, a standard info header and standard info, and an index root header, followed by directory entries and unused space. A directory entry contains the MFT index for the file, the length of the file name, the file name itself, and various fields and flags.) We now have enough information to finish describing how file-name lookup read more..

  • Page - 993

    962 CASE STUDY 2: WINDOWS 8 CHAP. 11 in the device stack inserted into the IRP as the request was being made. A driver that wants to tag a file associates a reparse tag and then watches for completion requests for file open operations that failed because they encountered a reparse point. From the block of data that is passed back with the IRP, the driver can tell read more..

  • Page - 994

    SEC. 11.8 THE WINDOWS NT FILE SYSTEM 963 (Figure labels: the original uncompressed file and its compressed and uncompressed pieces with their disk addresses; an MFT record with header, standard info, file name, and five runs, of which two are empty, followed by unused space.) Figure 11-44. (a) An example of a 48-block file being compressed to 32 blocks. (b) The MFT record for the file after read more..

  • Page - 995

    964 CASE STUDY 2: WINDOWS 8 CHAP. 11 new files moved to them or created in them to be encrypted as well. The actual encryption and decryption are not managed by NTFS itself, but by a driver called EFS (Encryption File System), which registers callbacks with NTFS. EFS provides encryption for specific files and directories. There is also another encryption facility in read more..

  • Page - 996

    SEC. 11.9 WINDOWS POWER MANAGEMENT 965 the current generation of multiprocessors, both hibernation and resume can be performed in a few seconds even on systems with many gigabytes of RAM. An alternative to hibernation is standby mode where the power manager reduces the entire system to the lowest power state possible, using just enough power to refresh the dynamic RAM. read more..

  • Page - 997

    966 CASE STUDY 2: WINDOWS 8 CHAP. 11 when to run such background activities. For example, checking for updates might occur only once a day or at the next time the device is charging its battery. A set of system brokers provide a variety of conditions which can be used to limit when background activity is performed. If a background task needs to access a low-cost network read more..

  • Page - 998

    SEC. 11.10 SECURITY IN WINDOWS 8 967 failed. Windows prevents this attack by instructing users to hit CTRL-ALT-DEL to log in. This key sequence is always captured by the keyboard driver, which then invokes a system program that puts up the genuine login screen. This procedure works because there is no way for user processes to disable CTRL-ALT-DEL processing in the keyboard read more..

  • Page - 999

    968 CASE STUDY 2: WINDOWS 8 CHAP. 11 access control list assigned to objects created by the process if no other ACL is specified. The user SID tells who owns the process. The restricted SIDs are to allow untrustworthy processes to take part in jobs with trustworthy processes but with less power to do damage. Finally, the privileges listed, if any, give the process read more..

  • Page - 1000

    SEC. 11.10 SECURITY IN WINDOWS 8 969 access. This simple example is illustrated in Fig. 11-46. The SID Everyone refers to the set of all users, but it is overridden by any explicit ACEs that follow. (Figure 11-46 labels: a file with its security descriptor, made up of a header, the owner's SID, the group SID, a DACL, and a SACL; ACE labels in the DACL: Allow Everyone, Deny Elvis 111111, Allow Cathy 110000, Allow Ida; ACE label in the SACL: Audit Marilyn 111111.) read more..

  • Page - 1001

    970 CASE STUDY 2: WINDOWS 8 CHAP. 11 initialized using InitializeSecurityDescriptor. This call fills in the header. If the owner SID is not known, it can be looked up by name using LookupAccountName. It can then be inserted into the security descriptor. The same holds for the group SID, if any. Normally, these will be the caller’s own SID and one of the caller’s read more..

  • Page - 1002

    SEC. 11.10 SECURITY IN WINDOWS 8 971 performs this check by looking at the caller’s access token and the DACL associated with the object. It goes down the list of ACEs in the ACL in order. As soon as it finds an entry that matches the caller’s SID or one of the caller’s groups, the access found there is taken as definitive. If all the rights the caller read more..
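
The in-order scan described above can be sketched as a first-match search. This is a simplified model of the description in the text, not the actual Windows access-check routine: the function `check_access`, the tuple layout of an ACE, the string rights, the sample ACL, and the choice to deny when no entry matches are all assumptions made for illustration.

```python
def check_access(caller_sids, dacl, requested):
    """Scan the ACEs in order; the first one naming the caller decides.

    caller_sids: set of SIDs in the caller's token (user plus groups).
    dacl: list of (kind, sid, rights) with kind "allow" or "deny".
    requested: set of rights the caller is asking for.
    """
    for kind, sid, rights in dacl:
        if sid in caller_sids:
            if kind == "deny":
                return False
            # The first matching allow entry is definitive: every
            # requested right must be granted by it.
            return requested <= rights
    return False  # assumption: no matching entry means no access


# Hypothetical ACL; explicit entries come before the catch-all one.
dacl = [
    ("deny", "Elvis", {"read", "write"}),
    ("allow", "Cathy", {"read", "write"}),
    ("allow", "Everyone", {"read"}),
]

print(check_access({"Elvis", "Everyone"}, dacl, {"read"}))
print(check_access({"Cathy", "Everyone"}, dacl, {"read", "write"}))
print(check_access({"Ida", "Everyone"}, dacl, {"write"}))
```

Because the first match wins, the order of the entries matters: in this sample the deny entry for Elvis takes effect only because it precedes the entry for Everyone.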

  • Page - 1003

    972 CASE STUDY 2: WINDOWS 8 CHAP. 11 Yet another change was the introduction of what Microsoft calls UAC (User Account Control). This is to address the chronic problem in Windows where most users run as administrators. The design of Windows does not require users to run as administrators, but neglect over many releases had made it just about impossible to use Windows read more..

  • Page - 1004

    SEC. 11.10 SECURITY IN WINDOWS 8 973 11.10.4 Security Mitigations It would be great for users if computer software did not have any bugs, particularly bugs that are exploitable by hackers to take control of their computer and steal their information, or use their computer for illegal purposes such as distributed denial-of-service attacks, compromising other computers, and read more..

  • Page - 1005

    974 CASE STUDY 2: WINDOWS 8 CHAP. 11 the address space. Recent work shows how running programs can be rerandomized every few seconds, making attacks even more difficult (Giuffrida et al., 2012). Heap hardening is a series of mitigations added to the Windows implementation of the heap that make it more difficult to exploit vulnerabilities such as writing beyond the boundaries read more..

  • Page - 1006

    SEC. 11.10 SECURITY IN WINDOWS 8 975 Many of these mitigations are under the control of compiler and linker flags. If applications, kernel device drivers, or plug-in libraries read data into executable memory or include code without /GS and ASLR enabled, the mitigations are not present and any vulnerabilities in the programs are much easier to exploit. Fortunately, in recent read more..

  • Page - 1007

    976 CASE STUDY 2: WINDOWS 8 CHAP. 11 I/O performance for many applications because read operations can be satisfied without accessing the disk. I/O is performed by device drivers, which follow the Windows Driver Model. Each driver starts out by initializing a driver object that contains the addresses of the procedures that the system can call to manipulate devices. The actual read more..

  • Page - 1008

    CHAP. 11 PROBLEMS 977 6. Win32 does not have signals. If they were to be introduced, they could be per process, per thread, both, or neither. Make a proposal and explain why it is a good idea. 7. An alternative to using DLLs is to statically link each program with precisely those library procedures it actually calls, no more and no less. If this scheme were to read more..

  • Page - 1009

    978 CASE STUDY 2: WINDOWS 8 CHAP. 11 18. Windows uses a facility called Autoboost to temporarily raise the priority of a thread that holds the resource that is required by a higher-priority thread. How do you think this works? 19. In Windows it is easy to implement a facility where threads running in the kernel can temporarily attach to the address space of a different read more..

  • Page - 1010

    CHAP. 11 PROBLEMS 979 Give an example of how Windows might do something similar using NtCreateProcess. (Hint: Consider processes that host DLLs to implement functionality provided by a third party). 31. A file has the following mapping. Give the MFT run entries.
    Offset          0   1   2   3   4   5   6   7   8   9   10
    Disk address    50  51  52  22  24  25  26  53  54  -   60
    32. Consider the MFT record of Fig. 11-41. read more..

  • Page - 1011

    980 CASE STUDY 2: WINDOWS 8 CHAP. 11 41. The regedit command can be used to export part or all of the registry to a text file under all current versions of Windows. Save the registry several times during a work session and see what changes. If you have access to a Windows computer on which you can install software or hardware, find out what changes when a program read more..

  • Page - 1012

    12 OPERATING SYSTEM DESIGN In the past 11 chapters, we have covered a lot of ground and taken a look at many concepts and examples relating to operating systems. But studying existing operating systems is different from designing a new one. In this chapter we will take a quick look at some of the issues and trade-offs that operating systems designers have to consider read more..

  • Page - 1013

    982 OPERATING SYSTEM DESIGN CHAP. 12 12.1 THE NATURE OF THE DESIGN PROBLEM Operating system design is more of an engineering project than an exact science. It is hard to set clear goals and meet them. Let us start with these points. 12.1.1 Goals In order to design a successful operating system, the designers must have a clear idea of what they want. Lack of a goal read more..

  • Page - 1014

    SEC. 12.1 THE NATURE OF THE DESIGN PROBLEM 983 manipulate these data structures. For example, users can read and write files. The primitive operations are implemented in the form of system calls. From the user’s point of view, the heart of the operating system is formed by the abstractions and the operations on them available via the system calls. Since on some computers read more..

  • Page - 1015

    984 OPERATING SYSTEM DESIGN CHAP. 12 Or even a few years. All current versions of UNIX contain millions of lines of code; Linux has hit 15 million, for example. Windows 8 is probably in the range of 50–100 million lines of code, depending on what you count (Vista was 70 million, but changes since then have both added code and removed it). No one person can read more..

  • Page - 1016

    SEC. 12.1 THE NATURE OF THE DESIGN PROBLEM 985 regard to one another. An example of where this diversity causes problems is the need for an operating system to run on both little-endian and big-endian machines. A second example was seen constantly under MS-DOS when users attempted to install, say, a sound card and a modem that used the same I/O ports or interrupt request read more..

  • Page - 1017

    986 OPERATING SYSTEM DESIGN CHAP. 12 If you want to get really picky, he didn’t say that. He said: Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher. But you get the idea. Memorize it either way. This principle says that less is better than more, at least in the operating system read more..

  • Page - 1018

    SEC. 12.2 INTERFACE DESIGN 987 to be sent and the reply to be requested with only one kernel trap. Everything else is done by requesting some other process (e.g., the file-system process or the disk driver) to do the work. The most recent version of MINIX added two additional calls, both for asynchronous communication. The senda call sends an asynchronous message. The kernel read more..

  • Page - 1019

    988 OPERATING SYSTEM DESIGN CHAP. 12 mostly deal with the system call interface. If the intention is to have a single GUI that pervades the complete system, as in the Macintosh, the design should begin there. If, on the other hand, the intention is to support many possible GUIs, such as in UNIX, the system-call interface should be designed first. Doing the GUI first is read more..

  • Page - 1020

    SEC. 12.2 INTERFACE DESIGN 989 Execution Paradigms Architectural coherence is important at the user level, but equally important at the system-call interface level. It is often useful to distinguish between the execution paradigm and the data paradigm, so we will do both, starting with the former. Two execution paradigms are widespread: algorithmic and event driven. The algorithmic read more..

  • Page - 1021

    990 OPERATING SYSTEM DESIGN CHAP. 12 batch systems, everything was modeled as a sequential magnetic tape. Card decks read in were treated as input tapes, card decks to be punched were treated as output tapes, and output for the printer was treated as an output tape. Disk files were also treated as tapes. Random access to a file was possible only by rewinding the tape read more..

  • Page - 1022

    SEC. 12.2 INTERFACE DESIGN 991 system, Plan 9 from Bell Labs, has not compromised and does not provide specialized interfaces for network sockets and such. As a result, the Plan 9 design is arguably cleaner. Windows tries to make everything look like an object. Once a process has acquired a valid handle to a file, process, semaphore, mailbox, or other kernel object, it read more..

  • Page - 1023

    992 OPERATING SYSTEM DESIGN CHAP. 12 system call keeps the operating system simple, yet the programmer gets the convenience of various ways to call exec. Of course, trying to have one call to handle every possible case can easily get out of hand. In UNIX creating a process requires two calls: fork followed by exec. The former has no parameters; the latter has three. read more..

  • Page - 1024

    SEC. 12.2 INTERFACE DESIGN 993 On the other hand, some remote file-access protocols are connectionless. The Web protocol (HTTP) is connectionless. To read a Web page you just ask for it; there is no advance setup required (a TCP connection is required, but this is at a lower level of protocol. HTTP itself is connectionless). The trade-off between any connection-oriented mechanism read more..

  • Page - 1025

    994 OPERATING SYSTEM DESIGN CHAP. 12 of them is more a way of trying to describe the system than a real guiding principle that was used in building the system. For a new system, designers choosing to go this route should first very carefully choose the layers and define the functionality of each one. The bottom layer should always try to hide the worst idiosyncrasies read more..

  • Page - 1026

    SEC. 12.3 IMPLEMENTATION 995 Exokernels While layering has its supporters among system designers, another camp has precisely the opposite view (Engler et al., 1995). Their view is based on the end-to-end argument (Saltzer et al., 1984). This concept says that if something has to be done by the user program itself, it is wasteful to do it in a lower layer as well. Consider read more..

  • Page - 1027

    996 OPERATING SYSTEM DESIGN CHAP. 12 Figure 12-3. Client-server computing based on a microkernel. (Figure: client processes and the process server, file server, and memory server run in user mode above the microkernel, which runs in kernel mode; a client obtains service by sending messages to server processes.) process could have the page for its device mapped in, but no other device pages. If the I/O port space can be read more..

  • Page - 1028

    SEC. 12.3 IMPLEMENTATION 997 military systems, where very high reliability is absolutely essential. Also, Apple’s OS X, which runs on all Macs and Macbooks, consists of a modified version of FreeBSD running on top of a modified version of the Mach microkernel. Extensible Systems With the client-server systems discussed above, the idea was to remove as much out of the kernel as read more..

  • Page - 1029

    998 OPERATING SYSTEM DESIGN CHAP. 12 Even if the policy module has to be kept in the kernel, it should be isolated from the mechanism, if possible, so that changes in the policy module do not affect the mechanism module. To make the split between policy and mechanism clearer, let us consider two real-world examples. As a first example, consider a large company that has a read more..

  • Page - 1030

    SEC. 12.3 IMPLEMENTATION 999 data types, including arrays, structures, and unions. These ideas combine independently, allowing arrays of integers, arrays of characters, structures and union members that are floating-point numbers, and so forth. In fact, once a new data type has been defined, such as an array of integers, it can be used as if it were a primitive data type, read more..

  • Page - 1031

    1000 OPERATING SYSTEM DESIGN CHAP. 12 www.cs.vu.nl/~ast/ indicates a specific machine (www) in a specific department (cs) at a specific university (vu) in a specific country (nl). The part after the slash indicates a specific file on the designated machine, in this case, by convention, www/index.html in ast’s home directory. Note that URLs (and DNS addresses in general, including read more..

  • Page - 1032

    SEC. 12.3 IMPLEMENTATION 1001 handles and MFT entries. Although the names in the external namespaces are all Unicode strings, looking up a file name in the registry will not work, just as using an MFT index in the object table will not work. In a good design, considerable thought is given to how many namespaces are needed, what the syntax of names is in each one, how read more..

  • Page - 1033

    1002 OPERATING SYSTEM DESIGN CHAP. 12 12.3.6 Static vs. Dynamic Structures Operating system designers are constantly forced to choose between static and dynamic data structures. Static ones are always simpler to understand, easier to program, and faster in use; dynamic ones are more flexible. An obvious example is the process table. Early systems simply allocated a fixed array of read more..

  • Page - 1034

    SEC. 12.3 IMPLEMENTATION 1003 space should each one get? The trade-offs here are similar to those for the process table. Making key data structures like these dynamic is possible, but complicated. Another static-dynamic trade-off is process scheduling. In some systems, especially real-time ones, the scheduling can be done statically in advance. For example, an airline knows what read more..

  • Page - 1035

    1004 OPERATING SYSTEM DESIGN CHAP. 12 write different modules. Each one tests its own work in isolation. When all the pieces are ready, they are integrated and tested. The problem with this line of attack is that if nothing works initially, it may be hard to isolate whether one or more modules are malfunctioning, or one group misunderstood what some other module was read more..

  • Page - 1036

    SEC. 12.3 IMPLEMENTATION 1005 Whenever the request requires the server to contact other servers for further processing it sends an asynchronous message of its own and, rather than block, continues with the next request. Multiple threads are not needed. With only a single thread processing events, the problem of multiple threads accessing shared data structures cannot occur. On read more..

  • Page - 1037

    1006 OPERATING SYSTEM DESIGN CHAP. 12 A third approach is to immediately convert an interrupt into a message to some thread. The low-level code just builds a message telling where the interrupt came from, enqueues it, and calls the scheduler to (potentially) run the handler, which was probably blocked waiting for the message. All these techniques, and others like them, try to read more..

  • Page - 1038

    SEC. 12.3 IMPLEMENTATION 1007

    (a)
        #include "config.h"
        init( )
        {
        #if (CPU == IA32)
        /* IA32 initialization here. */
        #endif
        #if (CPU == ULTRASPARC)
        /* UltraSPARC initialization here. */
        #endif
        }

    (b)
        #include "config.h"
        #if (WORD_LENGTH == 32)
        typedef int Register;
        #endif
        #if (WORD_LENGTH == 64)
        typedef long Register;
        #endif
        Register R0, R1, R2, R3;

    Figure 12-6. (a) CPU-dependent read more..

  • Page - 1039

    1008 OPERATING SYSTEM DESIGN CHAP. 12 Furthermore, when the key is released later, a second interrupt is generated, also with the key number. This indirection allows the operating system the possibility of using the key number to index into a table to get the ASCII character, which makes it easy to handle the many keyboards used around the world in different countries. read more..
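    The table indirection described above can be sketched in C. This is only an illustration: the key numbers, the table contents, and the names keymap_us and key_to_ascii are invented for the example and do not correspond to any real keyboard or driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key numbers; a real keyboard defines its own. */
enum { KEY_A = 0, KEY_B = 1, KEY_1 = 2, NUM_KEYS = 3 };

/* One table per keyboard layout; another country means another table. */
const char keymap_us[NUM_KEYS] = { 'a', 'b', '1' };

/* Translate the key number delivered by the interrupt into ASCII. */
char key_to_ascii(const char *keymap, int key_number)
{
    assert(keymap != NULL);
    if (key_number < 0 || key_number >= NUM_KEYS)
        return 0;                  /* unknown key: no character */
    return keymap[key_number];
}
```

    Supporting a different layout then means supplying a different table, while the interrupt-handling code stays unchanged.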

  • Page - 1040

    SEC. 12.3 IMPLEMENTATION 1009 Reentrancy Reentrancy refers to the ability of code to be executed two or more times simultaneously. On a multiprocessor, there is always the danger that while one CPU is executing some procedure, another CPU will start executing it as well, before the first one has finished. In this case, two (or more) threads on different CPUs might be executing read more..

  • Page - 1041

    1010 OPERATING SYSTEM DESIGN CHAP. 12 Check for Errors First Many system calls can fail for a variety of reasons: the file to be opened belongs to someone else; process creation fails because the process table is full; or a signal cannot be sent because the target process does not exist. The operating system must painstakingly check for every possible error before read more..

  • Page - 1042

    SEC. 12.4 PERFORMANCE 1011 12.4.1 Why Are Operating Systems Slow? Before talking about optimization techniques, it is worth pointing out that the slowness of many operating systems is to a large extent self-inflicted. For example, older operating systems, such as MS-DOS and UNIX Version 7, booted within a few seconds. Modern UNIX systems and Windows 8 can take several minutes read more..

  • Page - 1043

    1012 OPERATING SYSTEM DESIGN CHAP. 12 Here is a true story of where an optimization did more harm than good. One of the authors (AST) had a former student (who shall here remain nameless) who wrote the original MINIX mkfs program. This program lays down a fresh file system on a newly formatted disk. The student spent about 6 months optimizing it, including putting in read more..

  • Page - 1044

    SEC. 12.4 PERFORMANCE 1013 obvious procedure is given in Fig. 12-7(a). It loops over the bits in a byte, counting them one at a time. It is pretty simple and straightforward.

        #define BYTE_SIZE 8 /* A byte contains 8 bits */
        int bit_count(int byte)
        { /* Count the bits in a byte. */
            int i, count = 0;
            for (i = 0; i < BYTE_SIZE; i++) /* loop over the bits in read more..
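    The excerpt above breaks off inside the loop. A minimal complete version might look like the following sketch, which assumes the loop body tests each bit with a shift and a mask (the visible text does not confirm this) and adds a range check that is not part of the original.

```c
#include <assert.h>

#define BYTE_SIZE 8               /* A byte contains 8 bits */

/* Count the 1 bits in a byte by testing each bit position in turn. */
int bit_count(int byte)
{
    int i, count = 0;

    assert(byte >= 0 && byte < 256);   /* added check, not in the excerpt */
    for (i = 0; i < BYTE_SIZE; i++)    /* loop over the bits in the byte */
        if ((byte >> i) & 1)           /* is bit i set? */
            count++;
    return count;
}
```

    Calling bit_count(0x55) examines all eight positions and returns 4, so the cost is eight shift-and-test steps per byte regardless of its value.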

  • Page - 1045

    1014 OPERATING SYSTEM DESIGN CHAP. 12 value. With this approach no computation at all is needed at run time, just one indexing operation. A macro to do the job is given in Fig. 12-7(c). This is a clear example of trading computation time against memory. However, we could go still further. If the bit counts for whole 32-bit words are needed, using our bit_count macro, read more..
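    The table-lookup idea, and its extension to 32-bit words, can be sketched as follows. This is not the code of Fig. 12-7(c): to avoid a 256-entry literal, the table is filled at startup, and the names bits_table, init_bits_table, BIT_COUNT, and word_bit_count are chosen for this example only.

```c
#include <stdint.h>

unsigned char bits_table[256];    /* bits_table[b] = number of 1 bits in b */

/* Fill the table once; entry i reuses the already computed entry for i/2. */
void init_bits_table(void)
{
    int i;

    bits_table[0] = 0;
    for (i = 1; i < 256; i++)
        bits_table[i] = (unsigned char)(bits_table[i >> 1] + (i & 1));
}

/* Run-time cost per byte: one indexing operation. */
#define BIT_COUNT(b) ((int) bits_table[(b) & 0xFF])

/* Bit count of a 32-bit word as the sum of its four byte counts. */
int word_bit_count(uint32_t w)
{
    return BIT_COUNT(w) + BIT_COUNT(w >> 8) +
           BIT_COUNT(w >> 16) + BIT_COUNT(w >> 24);
}
```

    init_bits_table() must run once before the first lookup. The 256-byte table is the memory spent to save computation time; a table indexed by a full 32-bit word would be far too large, which is why the word count is assembled from four byte lookups here.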

  • Page - 1046

    SEC. 12.4 PERFORMANCE 1015 For example, if there is a rectangular block of pixels all the same color in an image, a PostScript program for the image would carry instructions to place a rectangle at a certain location and fill it with a certain color. Only a handful of bits are needed to issue this command. When the image is received at the printer, an interpreter read more..

  • Page - 1047

    1016 OPERATING SYSTEM DESIGN CHAP. 12

    Path                 I-node number
    /usr                 6
    /usr/ast             26
    /usr/ast/mbox        60
    /usr/ast/books       92
    /usr/bal             45
    /usr/bal/paper.ps    85

    Figure 12-9. Part of the i-node cache for Fig. 4-34.

    /usr/ast/grants/erc is presented, the cache returns the fact that /usr/ast is i-node 26, so the search can start there, eliminating four disk accesses. A problem with caching paths is that the read more..
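    The lookup described above amounts to finding the longest cached prefix of a path. The sketch below illustrates this with the entries of Fig. 12-9; the linear scan and the names path_entry, inode_cache, and cache_lookup are assumptions made for the example, not the book's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct path_entry {
    const char *path;      /* cached path name */
    int inode;             /* its i-node number */
};

/* Contents taken from Fig. 12-9. */
const struct path_entry inode_cache[] = {
    { "/usr", 6 }, { "/usr/ast", 26 }, { "/usr/ast/mbox", 60 },
    { "/usr/ast/books", 92 }, { "/usr/bal", 45 }, { "/usr/bal/paper.ps", 85 },
};

/*
 * Return the i-node of the longest cached prefix of path that ends on a
 * component boundary, or -1 if no prefix is cached. *matched receives
 * the length of that prefix, so the caller knows where to resume.
 */
int cache_lookup(const char *path, size_t *matched)
{
    size_t i, n = sizeof(inode_cache) / sizeof(inode_cache[0]);
    size_t best_len = 0;
    int best_inode = -1;

    assert(path != NULL && matched != NULL);
    for (i = 0; i < n; i++) {
        size_t len = strlen(inode_cache[i].path);

        if (strncmp(path, inode_cache[i].path, len) == 0 &&
            (path[len] == '/' || path[len] == '\0') && len > best_len) {
            best_len = len;
            best_inode = inode_cache[i].inode;
        }
    }
    *matched = best_len;
    return best_inode;
}
```

    For /usr/ast/grants/erc the function returns 26 with a matched length of 8, so the remaining search covers only grants/erc. The boundary test keeps a path such as /usr/astx from being treated as a hit on /usr/ast.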

  • Page - 1048

    SEC. 12.4 PERFORMANCE 1017 12.4.6 Exploiting Locality Processes and programs do not act at random. They exhibit a fair amount of locality in time and space, and this information can be exploited in various ways to improve performance. One well-known example of spatial locality is the fact that processes do not jump around at random within their address spaces. They tend to use read more..

  • Page - 1049

    1018 OPERATING SYSTEM DESIGN CHAP. 12 One way to do this is to keep a bit in the process table that tells whether an alarm is pending. If the bit is off, the easy path is followed (just add a new timer-queue entry without checking). If the bit is on, the timer queue must be checked. 12.5 PROJECT MANAGEMENT Programmers are perpetual optimists. Most of them think that read more..

  • Page - 1050

    SEC. 12.5 PROJECT MANAGEMENT 1019 a project takes 15 people 2 years to build, it is inconceivable that 360 people could do it in 1 month and probably not possible to have 60 people do it in 6 months. There are three reasons for this effect. First, the work cannot be fully parallelized. Until the planning is done and it has been determined what modules are needed read more..

  • Page - 1051

    1020 OPERATING SYSTEM DESIGN CHAP. 12 their desire to put their stamp on the project to carry out the initial architect’s plans. The result is an architectural coherence unmatched in other European cathedrals. In the 1970s, Harlan Mills combined the observation that some programmers are much better than others with the need for architectural coherence to propose the chief read more..

  • Page - 1052

    SEC. 12.5 PROJECT MANAGEMENT 1021 In practice, large companies, which have had long experience producing software and know what happens if it is produced haphazardly, have a tendency to at least try to do it right. In contrast, smaller, newer companies, which are in a huge rush to get to market, do not always take the care to produce their software carefully. This read more..

  • Page - 1053

    1022 OPERATING SYSTEM DESIGN CHAP. 12 [Figure 12-11. (a) Traditional software design progresses in stages. (b) Alternative design produces a working system (that does nothing) starting on day 1. Labels: (a) Plan, Code, Test modules, Test system, Deploy; (b) Main program, Dummy procedure 1, Dummy procedure 2, Dummy procedure 3.] 12.5.4 No Silver Bullet In addition to The Mythical Man Month, Brooks also read more..

  • Page - 1054

    SEC. 12.6 TRENDS IN OPERATING SYSTEM DESIGN 1023 Generally, when new hardware arrives, what everyone does is just plop the old software (Linux, Windows, etc.) down on it and call it a day. In the long run, this is a bad idea. What we need is innovative software to deal with innovative hardware. If you are a computer science or engineering student or an ICT read more..

  • Page - 1055

    1024 OPERATING SYSTEM DESIGN CHAP. 12 One obvious question is: what do you do with all the cores? If you run a popular server handling many thousands of client requests per second, the answer may be relatively simple. For instance, you may decide to dedicate a core to each request. Assuming you do not run into locking issues too much, this may work. But what do we read more..

  • Page - 1056

    SEC. 12.6 TRENDS IN OPERATING SYSTEM DESIGN 1025 share the objects in a general way. Such a design would clearly lead to very different operating systems than we now have. Another operating system issue that will have to be rethought with 64-bit addresses is virtual memory. With 2^64 bytes of virtual address space and 8-KB pages we have 2^51 pages. Conventional page read more..
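    The page count quoted above follows from dividing powers of two: an 8-KB page is 2^13 bytes, so a 64-bit address space holds 2^64 / 2^13 = 2^51 pages. The short sketch below reproduces that arithmetic; the function names pages_log2 and page_count are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of the number of pages: 2^address_bits / 2^page_shift. */
unsigned pages_log2(unsigned address_bits, unsigned page_shift)
{
    assert(page_shift <= address_bits);
    return address_bits - page_shift;
}

/* The page count itself; valid only while the result fits in 64 bits. */
uint64_t page_count(unsigned address_bits, unsigned page_shift)
{
    unsigned e = pages_log2(address_bits, page_shift);

    assert(e < 64);
    return (uint64_t)1 << e;
}
```

    With 64 address bits and a page shift of 13, pages_log2 yields 51 and page_count yields 2,251,799,813,685,248, a number that gives a sense of how large a table with one entry per virtual page would be.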

  • Page - 1057

    1026 OPERATING SYSTEM DESIGN CHAP. 12 continues, their operating systems will have to be appreciably different from cur- rent ones to handle all these demands. In addition, they must balance the power budget and ‘‘keep cool.’’ Heat dissipation and power consumption are some of the most important challenges even in high-end computers. However, an even faster growing segment of read more..

  • Page - 1058

    SEC. 12.7 SUMMARY 1027 12.7 SUMMARY Designing an operating system starts with determining what it should do. The interface should be simple, complete, and efficient. It should have a clear user-interface paradigm, execution paradigm, and data paradigm. The system should be well structured, using one of several known techniques, such as layering or client-server. The internal read more..

  • Page - 1059

    1028 OPERATING SYSTEM DESIGN CHAP. 12 4. Corbató’s dictum is that the system should provide minimal mechanism. Here is a list of POSIX calls that were also present in UNIX Version 7. Which ones are redundant, that is, could be removed with no loss of functionality because simple combinations of other ones could do the same job with about the same performance? Access, read more..

  • Page - 1060

    CHAP. 12 PROBLEMS 1029 You may use up to 256 KB of RAM for tables if need be. Write a macro to carry out your algorithm. Extra Credit: Write a procedure to do the computation by looping over the 32 bits. Measure how many times faster your macro is than the procedure. 16. In Fig. 12-8, we saw how GIF files use 8-bit values to index into a color palette. The read more..

  • Page - 1061

    1030 OPERATING SYSTEM DESIGN CHAP. 12 27. Write programs that enter randomly generated short strings into an array and then can search the array for a given string using (a) a simple linear search (brute force), and (b) a more sophisticated method of your choice. Recompile your programs for array sizes ranging from small to as large as you can handle on your system. Evaluate read more..

  • Page - 1062

    13 READING LIST AND BIBLIOGRAPHY In the previous 12 chapters we have touched upon a variety of topics. This chapter is intended to aid readers interested in pursuing their study of operating systems further. Section 13.1 is a list of suggested readings. Section 13.2 is an alphabetical bibliography of all books and articles cited in this book. In addition to the references read more..

  • Page - 1063

    1032 READING LIST AND BIBLIOGRAPHY CHAP. 13 13.1.1 Introduction Silberschatz et al., Operating System Concepts, 9th ed., A general textbook on operating systems. It covers processes, memory management, storage management, protection and security, distributed systems, and some special-purpose systems. Two case studies are given: Linux and Windows 7. The cover is full of dinosaurs. These read more..

  • Page - 1064

    SEC. 13.1 SUGGESTIONS FOR FURTHER READING 1033 Zhuravlev et al., ‘‘Survey of Scheduling Techniques for Addressing Shared Resources in Multicore Processors’’ Multicore systems have started to dominate the general-purpose computing world. One of the most important challenges is shared resource contention. In this survey, the authors present different scheduling techniques for read more..

  • Page - 1065

    1034 READING LIST AND BIBLIOGRAPHY CHAP. 13 Silberschatz et al., Operating System Concepts, 9th ed., Chapters 10–12 are about storage hardware and file systems. They cover file operations, interfaces, access methods, directories, and implementation, among other topics. Stallings, Operating Systems, 7th ed., Chapter 12 contains a fair amount of material about file systems and a little bit about read more..

  • Page - 1066

    SEC. 13.1 SUGGESTIONS FOR FURTHER READING 1035 Touch screens have become ubiquitous in a short time span. This article traces the history of the touch screen with easy-to-understand explanations and nice vintage pictures and videos. Fascinating stuff! Walker and Cragon, ‘‘Interrupt Processing in Concurrent Processors’’ Implementing precise interrupts on superscalar computers read more..

  • Page - 1067

    1036 READING LIST AND BIBLIOGRAPHY CHAP. 13 detail what is hidden behind acronyms like IAAS, PAAS, SAAS, and similar ‘‘X’’ As A Service family members. Rosenblum and Garfinkel, ‘‘Virtual Machine Monitors: Current Technology and Future Trends’’ Starting with a history of virtual machine monitors, this article then goes on to discuss the current state of CPU, memory, and I/O read more..

  • Page - 1068

    SEC. 13.1 SUGGESTIONS FOR FURTHER READING 1037 Kumar et al., ‘‘Heterogeneous Chip Multiprocessors’’ The multicore chips used for desktop computers are symmetric—all the cores are identical. However, for some applications, heterogeneous CMPs are widespread, with cores for computing, video decoding, audio decoding, and so on. This paper discusses some issues related to heterogeneous CMPs. read more..

  • Page - 1069

    1038 READING LIST AND BIBLIOGRAPHY CHAP. 13 Bratus et al., ‘‘From Buffer Overflows to Weird Machines and Theory of Computation’’ Connecting the humble buffer overflow to Alan Turing. The authors show that hackers program vulnerable programs like weird machines with strange-looking instruction sets. In doing so, they come full circle to Turing’s seminal research on ‘‘What is read more..

  • Page - 1070

    SEC. 13.1 SUGGESTIONS FOR FURTHER READING 1039 Milojicic, ‘‘Security and Privacy’’ Security has many facets, including operating systems, networks, implications for privacy, and more. In this article, six security experts are interviewed on their thoughts on the subject. Nachenberg, ‘‘Computer Virus-Antivirus Coevolution’’ As soon as the antivirus developers find a way to detect and read more..

  • Page - 1071

    1040 READING LIST AND BIBLIOGRAPHY CHAP. 13 Maxwell, Linux Core Kernel Commentary The first 400 pages of this book contain a subset of the Linux kernel code. The last 150 pages consist of comments on the code, very much in the style of John Lions’ classic book. If you want to understand the Linux kernel in all its gory detail, this is the place to begin, but be read more..

  • Page - 1072

    SEC. 13.1 SUGGESTIONS FOR FURTHER READING 1041 Cooke et al., ‘‘UNIX and Beyond: An Interview with Ken Thompson’’ Designing an operating system is much more of an art than a science. Consequently, listening to experts in the field is a good way to learn about the subject. They do not come much more expert than Ken Thompson, co-designer of UNIX, Inferno, and Plan 9. read more..

  • Page - 1073

    1042 READING LIST AND BIBLIOGRAPHY CHAP. 13 ADAMS, G.B. III, AGRAWAL, D.P., and SIEGEL, H.J.: ‘‘A Survey and Comparison of Fault-Tolerant Multistage Interconnection Networks,’’ Computer, vol. 20, pp. 14–27, June 1987. ADAMS, K., and AGESEN, O.: ‘‘A Comparison of Software and Hardware Techniques for X86 Virtualization,’’ Proc. 12th Int’l Conf. on Arch. Support read more..

  • Page - 1074

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1043 BAKER, F.T.: ‘‘Chief Programmer Team Management of Production Programming,’’ IBM Systems J., vol. 11, pp. 1, 1972. BAKER, M., SHAH, M., ROSENTHAL, D.S.H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T.J., and BUNGALE, P.: ‘‘A Fresh Look at the Reliability of Long-Term Digital Storage,’’ Proc. First European Conf. on Computer Systems read more..

  • Page - 1075

    1044 READING LIST AND BIBLIOGRAPHY CHAP. 13 BHEDA, R.A., BEU, J.G., RAILING, B.P., and CONTE, T.M.: ‘‘Extrapolation Pitfalls When Evaluating Limited Endurance Memory,’’ Proc. 20th Int’l Symp. on Modeling, Analysis, & Simulation of Computer and Telecomm. Systems, IEEE, pp. 261–268, 2012. BHEDA, R.A., POOVEY, J.A., BEU, J.G., and CONTE, T.M.: ‘‘Energy Efficient Phase Change Memory read more..

  • Page - 1076

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1045 BRATUS, S., LOCASTO, M.E., PATTERSON, M., SASSAMAN, L., SHUBINA, A.: ‘‘From Buffer Overflows to Weird Machines and Theory of Computation,’’ ;Login:, USENIX, pp. 11–21, December 2011. BRINCH HANSEN, P.: ‘‘The Programming Language Concurrent Pascal,’’ IEEE Trans. on Software Engineering, vol. SE-1, pp. 199–207, June 1975. BROOKS, F.P., Jr.: ‘‘No read more..

  • Page - 1077

    1046 READING LIST AND BIBLIOGRAPHY CHAP. 13 CHEN, Z., XIAO, N., and LIU, F.: ‘‘SAC: Rethinking the Cache Replacement Policy for SSD-Based Storage Systems,’’ Proc. Fifth Int’l Systems and Storage Conf., ACM, Art. 13, 2012. CHERVENAK, A., VELLANKI, V., and KURMAS, Z.: ‘‘Protecting File Systems: A Survey of Backup Techniques,’’ Proc. 15th IEEE Symp. on Mass Storage Systems, IEEE, read more..

  • Page - 1078

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1047 CROWLEY, C.: Operating Systems: A Design-Oriented Approach, Chicago: Irwin, 1997. CUSUMANO, M.A., and SELBY, R.W.: ‘‘How Microsoft Builds Software,’’ Commun. of the ACM, vol. 40, pp. 53–61, June 1997. DABEK, F., KAASHOEK, M.F., KARGER, D., MORRIS, R., and STOICA, I.: ‘‘Wide-Area Cooperative Storage with CFS,’’ Proc. 18th Symp. on read more..

  • Page - 1079

    1048 READING LIST AND BIBLIOGRAPHY CHAP. 13 DIJKSTRA, E.W.: ‘‘Co-operating Sequential Processes,’’ in Programming Languages, Genuys, F. (Ed.), London: Academic Press, 1965. DIJKSTRA, E.W.: ‘‘The Structure of THE Multiprogramming System,’’ Commun. of the ACM, vol. 11, pp. 341–346, May 1968. DUBOIS, M., SCHEURICH, C., and BRIGGS, F.A.: ‘‘Synchronization, Coherence, and Event Ordering in read more..

  • Page - 1080

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1049 FANDRICH, M., AIKEN, M., HAWBLITZEL, C., HODSON, O., HUNT, G., LARUS, J.R., and LEVI, S.: ‘‘Language Support for Fast and Reliable Message-Based Communication in Singularity OS,’’ Proc. First European Conf. on Computer Systems (EUROSYS), ACM, pp. 177–190, 2006. FEELEY, M.J., MORGAN, W.E., PIGHIN, F.H., KARLIN, A.R., LEVY, H.M., and THEKKATH, C.A.: read more..

  • Page - 1081

    1050 READING LIST AND BIBLIOGRAPHY CHAP. 13 GEIST, R., and DANIEL, S.: ‘‘A Continuum of Disk Scheduling Algorithms,’’ ACM Trans. on Computer Systems, vol. 5, pp. 77–92, Feb. 1987. GELERNTER, D.: ‘‘Generative Communication in Linda,’’ ACM Trans. on Programming Languages and Systems, vol. 7, pp. 80–112, Jan. 1985. GHOSHAL, D., and PLALE, B.: ‘‘Provenance from Log Files: read more..

  • Page - 1082

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1051 HAND, S.M., WARFIELD, A., FRASER, K., KOTTSOVINOS, E., and MAGENHEIMER, D.: ‘‘Are Virtual Machine Monitors Microkernels Done Right?,’’ Proc. 10th Workshop on Hot Topics in Operating Systems, USENIX, pp. 1–6, 2005. HARNIK, D., KAT, R., MARGALIT, O., SOTNIKOV, D., and TRAEGER, A.: ‘‘To Zip or Not to Zip: Effective Resource Usage read more..

  • Page - 1083

    1052 READING LIST AND BIBLIOGRAPHY CHAP. 13 HOCKING, M.: ‘‘Feature: Thin Client Security in the Cloud,’’ J. Network Security, vol. 2011, pp. 17–19, June 2011. HOHMUTH, M., PETER, M., HAERTIG, H., and SHAPIRO, J.: ‘‘Reducing TCB Size by Using Untrusted Components: Small Kernels Versus Virtual-Machine Monitors,’’ Proc. 11th ACM SIGOPS European Workshop, ACM, Art. 22, 2004. HOLMBACKA, S., read more..

  • Page - 1084

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1053 JANTZ, M.R., STRICKLAND, C., KUMAR, K., DIMITROV, M., and DOSHI, K.A.: ‘‘A Framework for Application Guidance in Virtual Memory Systems,’’ Proc. Ninth Int’l Conf. on Virtual Execution Environments, ACM, pp. 155–166, 2013. JEONG, J., KIM, H., HWANG, J., LEE, J., and MAENG, S.: ‘‘Rigorous Rental Memory Management for Embedded Systems,’’ ACM read more..

  • Page - 1085

    1054 READING LIST AND BIBLIOGRAPHY CHAP. 13 KATO, S., ISHIKAWA, Y., and RAJKUMAR, R.: ‘‘Memory Management for Interactive Real-Time Applications,’’ Real-Time Systems, vol. 47, pp. 498–517, May 2011. KAUFMAN, C., PERLMAN, R., and SPECINER, M.: Network Security, 2nd ed., Upper Saddle River, NJ: Prentice Hall, 2002. KELEHER, P., COX, A., DWARKADAS, S., and ZWAENEPOEL, W.: read more..

  • Page - 1086

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1055 KUMAR, R., TULLSEN, D.M., JOUPPI, N.P., and RANGANATHAN, P.: ‘‘Heterogeneous Chip Multiprocessors,’’ Computer, vol. 38, pp. 32–38, Nov. 2005. KUMAR, V.P., and REDDY, S.M.: ‘‘Augmented Shuffle-Exchange Multistage Interconnection Networks,’’ Computer, vol. 20, pp. 30–40, June 1987. KWOK, Y.-K., AHMAD, I.: ‘‘Static Scheduling Algorithms for read more..

  • Page - 1087

    1056 READING LIST AND BIBLIOGRAPHY CHAP. 13 LI, D., LIAO, X., JIN, H., ZHOU, B., and ZHANG, Q.: ‘‘A New Disk I/O Model of Virtualized Cloud Environment,’’ IEEE Trans. on Parallel and Distributed Systems, vol. 24, pp. 1129–1138, June 2013b. LI, K.: Shared Virtual Memory on Loosely Coupled Multiprocessors, Ph.D. Thesis, Yale Univ., 1986. LI, K., and HUDAK, P.: ‘‘Memory read more..

  • Page - 1088

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1057 LU, L., ARPACI-DUSSEAU, A.C., and ARPACI-DUSSEAU, R.H.: ‘‘Fault Isolation and Quick Recovery in Isolation File Systems,’’ Proc. Fifth USENIX Workshop on Hot Topics in Storage and File Systems, USENIX, 2013. LUDWIG, M.A.: The Little Black Book of Email Viruses, Show Low, AZ: American Eagle Publications, 2002. LUO, T., MA, S., LEE, R., ZHANG, X., read more..

  • Page - 1089

    1058 READING LIST AND BIBLIOGRAPHY CHAP. 13 MIKHAYLOV, K., and TERVONEN, J.: ‘‘Energy Consumption of the Mobile Wireless Sensor Network’s Node with Controlled Mobility,’’ Proc. 27th Int’l Conf. on Advanced Networking and Applications Workshops, IEEE, pp. 1582–1587, 2013. MILOJICIC, D.: ‘‘Security and Privacy,’’ IEEE Concurrency, vol. 8, pp. 70–79, April–June 2000. MOODY, G.: read more..

  • Page - 1090

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1059 NIST (National Institute of Standards and Technology): ‘‘The NIST Definition of Cloud Computing,’’ Special Publication 800-145, Recommendations of the National Institute of Standards and Technology, 2011. NO, J.: ‘‘NAND Flash Memory-Based Hybrid File System for High I/O Performance,’’ J. Parallel and Distributed Computing, vol. 72, pp. 1680–1695, Dec. read more..

  • Page - 1091

    1060 READING LIST AND BIBLIOGRAPHY CHAP. 13 PATTERSON, D.A., GIBSON, G., and KATZ, R.: ‘‘A Case for Redundant Arrays of Inexpensive Disks (RAID),’’ Proc. ACM SIGMOD Int’l. Conf. on Management of Data, ACM, pp. 109–166, 1988. PEARCE, M., ZEADALLY, S., and HUNT, R.: ‘‘Virtualization: Issues, Security Threats, and Solutions,’’ Computing Surveys, ACM, vol. 45, Art. 17, Feb. read more..

  • Page - 1092

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1061 RECTOR, B.E., and NEWCOMER, J.M.: Win32 Programming, Boston: Addison-Wesley, 1997. REEVES, R.D.: Windows 7 Device Driver, Boston: Addison-Wesley, 2010. RENZELMANN, M.J., KADAV, A., and SWIFT, M.M.: ‘‘SymDrive: Testing Drivers without Devices,’’ Proc. 10th Symp. on Operating Systems Design and Implementation, USENIX, pp. 279–292, 2012. RIEBACK, M.R., read more..

  • Page - 1093

    1062 READING LIST AND BIBLIOGRAPHY CHAP. 13 ROZIER, M., ABROSSIMOV, V., ARMAND, F., BOULE, I., GIEN, M., GUILLEMONT, M., HERRMANN, F., KAISER, C., LEONARD, P., LANGLOIS, S., and NEUHAUSER, W.: ‘‘Chorus Distributed Operating Systems,’’ Computing Systems, vol. 1, pp. 305–379, Oct. 1988. RUSSINOVICH, M., and SOLOMON, D.: Windows Internals, Part 1, Redmond, WA: Microsoft Press, 2012. RYZHYK, read more..

  • Page - 1094

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1063 SCOTT, M., LEBLANC, T., and MARSH, B.: ‘‘Multi-Model Parallel Programming in Psyche,’’ Proc. Second ACM Symp. on Principles and Practice of Parallel Programming, ACM, pp. 70–78, 1990. SEAWRIGHT, L.H., and MACKINNON, R.A.: ‘‘VM/370—A Study of Multiplicity and Usefulness,’’ IBM Systems J., vol. 18, pp. 4–17, 1979. SEREBRYANY, K., read more..

  • Page - 1095

    1064 READING LIST AND BIBLIOGRAPHY CHAP. 13 STALLINGS, W.: Operating Systems, 7th ed., Upper Saddle River, NJ: Prentice Hall, 2011. STAN, M.R., and SKADRON, K.: ‘‘Power-Aware Computing,’’ Computer, vol. 36, pp. 35–38, Dec. 2003. STEINMETZ, R., and NAHRSTEDT, K.: Multimedia: Computing, Communications and Applications, Upper Saddle River, NJ: Prentice Hall, 1995. STEVENS, R.W., and RAGO, read more..

  • Page - 1096

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1065 TANENBAUM, A.S., VAN RENESSE, R., VAN STAVEREN, H., SHARP, G.J., MULLENDER, S.J., JANSEN, J., and VAN ROSSUM, G.: ‘‘Experiences with the Amoeba Distributed Operating System,’’ Commun. of the ACM, vol. 33, pp. 46–63, Dec. 1990. TANENBAUM, A.S., and VAN STEEN, M.R.: Distributed Systems, 2nd ed., Upper Saddle River, NJ: Prentice Hall, 2007. read more..

  • Page - 1097

    1066 READING LIST AND BIBLIOGRAPHY CHAP. 13 UR, B., KELLEY, P.G., KOMANDURI, S., LEE, J., MAASS, M., MAZUREK, M.L., PASSARO, T., SHAY, R., VIDAS, T., BAUER, L., CHRISTIN, N., and CRANOR, L.F.: ‘‘How Does Your Password Measure Up? The Effect of Strength Meters on Password Creation,’’ Proc. 21st USENIX Security Symp., USENIX, 2012. VAGHANI, S.B.: ‘‘Virtual Machine File read more..

  • Page - 1098

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1067 VON EICKEN, T., CULLER, D., GOLDSTEIN, S.C., and SCHAUSER, K.E.: ‘‘Active Messages: A Mechanism for Integrated Communication and Computation,’’ Proc. 19th Int’l Symp. on Computer Arch., ACM, pp. 256–266, 1992. VOSTOKOV, D.: Windows Device Drivers: Practical Foundations, Opentask, 2009. VRABLE, M., SAVAGE, S., and VOELKER, G.M.: read more..

  • Page - 1099

    1068 READING LIST AND BIBLIOGRAPHY CHAP. 13 WEISER, M., WELCH, B., DEMERS, A., and SHENKER, S.: ‘‘Scheduling for Reduced CPU Energy,’’ Proc. First Symp. on Operating Systems Design and Implementation, USENIX, pp. 13–23, 1994. WEISSEL, A.: Operating System Services for Task-Specific Power Management: Novel Approaches to Energy-Aware Embedded Linux, AV Akademikerverlag, 2012. WENTZLAFF, D., read more..

  • Page - 1100

    SEC. 13.2 ALPHABETICAL BIBLIOGRAPHY 1069 YUAN, W., and NAHRSTEDT, K.: ‘‘Energy-Efficient CPU Scheduling for Multimedia Systems,’’ ACM Trans. on Computer Systems, ACM, vol. 24, pp. 292–331, Aug. 2006. ZACHARY, G.P.: Showstopper, New York: Maxwell Macmillan, 1994. ZAHORJAN, J., LAZOWSKA, E.D., and EAGER, D.L.: ‘‘The Effect of Scheduling Discipline on Spin Overhead in Shared read more..

  • Page - 1101

    1070 READING LIST AND BIBLIOGRAPHY CHAP. 13 ZWICKY, E.D.: ‘‘Torture-Testing Backup and Archive Programs: Things You Ought to Know But Probably Would Rather Not,’’ Proc. Fifth Conf. on Large Installation Systems Admin., USENIX, pp. 181–190, 1991. read more..

  • Page - 1102

    INDEX read more..

  • Page - 1103

    This page intentionally left blank read more..

  • Page - 1104

    INDEX A Absolute path, 776 Absolute path name, 277 Abstraction, 982 Access, 116, 617, 657, 672, 801 Access control entry, Windows, 968 Access control list, 605–608, 874 Access to resources, 602–611 Access token, 967 Access violation, 936 Accountability, 596 ACE (see Access Control Entry) Acknowledged datagram service, 573 Acknowledgement message, 144 Acknowledgement packet, 573 ACL (see Access read more..

  • Page - 1105

    1074 INDEX Algorithmic paradigm, 989 Allocating dedicated devices, 366 ALPC (see Advanced LPC) Alternate data stream, 958 Amoeba, 610 Analytical engine, 7 Andreesen, Marc, 77 Android, 20, 802–849 history, 803–807 Android 1.0, 805 Android activity, 827–831 Android application, 824–836 Android application sandbox, 838 Android architecture, 809–810 Android binder, 816–822 Android binder IPC, read more..

  • Page - 1106

    INDEX 1075 B B programming language, 715 Babbage, Charles, 7, 13 Back door, 658–660 Backing store for paging, 237–239 Backing up a file system, 306–311 Bad disk sector, 383 Bad-news diode, 1020 Balance set manager, 939 Ballooning, 490 Bandwidth reservation, Windows, 945 Banker’s algorithm multiple resources, 454–456 single resource, 453–454 Barrier, 146–148 Base priority, Windows read more..

  • Page - 1107

    1076 INDEX Busy waiting, 30, 122, 124, 354 Bypassing ASLR, 647–648 Byron, Lord, 7 Byte code, 702 C C language, introduction, 73–77 C preprocessor, 75 C programming language, 715 C-list, 608 CA (see Certification Authority) Cache, 100 Linux, 772 Windows, 942–943 write-through, 317 Cache (L1, L2, L3), 527 Cache hit, 25 Cache line, 25, 521 Cache manager, 889 Cache-coherence protocol, 521 read more..

  • Page - 1108

    INDEX 1077 Code review, 659 Code signing, 693–694 Coherency wall, 528 Colossus, 7 COM (see Component Object Model) Command injection attack, 655–656 Command interpreter, 39 Committed page, Windows, 929 Common criteria, 890 Common object request broker architecture, 582–584 Communication, synchronous vs. asynchronous, 1004–1005 Communication deadlock, 459–461 Communication software, 550–552 Companion read more..

  • Page - 1109

    1078 INDEX Dalvik, 814–815 Dangling pointer attack, 652–653 Darwin, Charles, 47 Data confidentiality, 596 Data execution prevention, 644, 645–645 Data paradigm, 989–991 Data rate for devices, 339 Data segment, 56, 754 Datagram service, 573 Deadlock, 435–465 banker’s algorithm for multiple resources, 454–456 banker’s algorithm for single resource, 453–454 checkpointing to recover from, 449 read more..

  • Page - 1110

    INDEX 1079 Disco, 474 Discretionary access control, 612 Discretionary ACL, 967 Disk, 27–28, 49–50 Disk controller cache, 382 Disk driver, 4 Disk error handling, 382–385 Disk formatting, 375–379 Disk hardware, 369–375 Disk interleaving, 378 Disk operating system, 15 Disk properties, 370 Disk quota, 305–306 Disk recalibration, 384 Disk scheduling algorithms, 379–382 elevator, 380–382 read more..

  • Page - 1111

    1080 INDEX Evolution of Linux, 714–703 Evolution of Windows, 857–864 Example file systems, 320–331 ExceptPortHandle, 869 Exclusive lock, 779 Exec, 55, 56, 82, 112, 604, 642, 669, 737, 738, 742, 758, 815, 844 Executable, 862 Executable file (UNIX), 269–270 Executable program virus, 666–668 Executive, Windows, 877, 887 Execution paradigm, 989 Execve, 54–56, 89 ExFAT file system, 266 read more..

  • Page - 1112

    INDEX 1081 File-system fragmentation, 283–284 File-system implementation, 281–299 File-system layout, 281–282 File-system management, 299–320 File-system performance, 314–319 File-system structure, Windows NT, 954–958 Linux, 785–792 File-system-based middleware, 577–582 File type, 268–270 File usage, example, 273–276 Filter, 892 Filter driver, Windows, 952 Finger daemon, 675 Finite-state machine, 102 read more..

  • Page - 1113

    1082 INDEX Guest operating system, 72, 477, 505 Guest physical address, 488 Guest virtual address, 488 Guest-induced page fault, 487 GUI (see Graphical User Interface) GUID partition table, 378 H Hacker, 597 HAL (see Hardware Abstraction Layer) Handheld computer operating system, 36 Handle, 92, 868, 897–898 Hard fault, 936 Hard link, 281 Hard miss, 204 Hard real-time system, 38, 164 read more..

  • Page - 1114

    INDEX 1083 I/O software layers, 356–369 I/O system calls in Linux, 770–771 I/O system layers, 368 I/O using DMA, 355 I/O virtualization, 490–493 I/O-bound process, 152 IAAS (see Infrastructure As A Service) IAT (see Import Address Table) IBinder, Android, 821 IBM AS/400, 609 IBM PC, 15 IC (see Integrated Circuit) Icon, 405 IDE (see Integrated Drive Electronics) Ideal processor, Windows, read more..

  • Page - 1115

    1084 INDEX Interrupt service routine, 883 Interrupt vector, 31, 94, 348 Interrupt-driven I/O, 354–355 Introduction to scheduling, 150 Intruder, 599 Intrusion detection system, 687, 695 model-based, 695–697 Invalid page, Windows, 929 Inverted page table, 207–208 IoCallDrivers, 948–949 IoCompleteRequest, 948, 961, 962 Ioctl, 770, 771 IopParseDevice, 902, 903 iOS, 19 IP (see Internet Protocol) IP read more..

  • Page - 1116

    INDEX 1085 Limit register, 186 Limits to clock speed, 517 Linda, 584–587 Line discipline, 774 Linear address, 250 Link, 54, 57, 58, 280, 783 file, 291, 777 Linked lists for memory management, 192–194 Linked-list allocation, 284 Linked-list allocation using a table in memory, 285 Linker, 76 Linux, 14, 713–802 history, 720–722 overview, 723–733 Linux booting, 751 Linux buddy algorithm, read more..

  • Page - 1117

    1086 INDEX Locking, 778–779 Locking pages in memory, 237 Log-structured file system, 293–293 Logic bomb, 657–658 Logical block addressing, 371 Logical dump, file system, 309 Login Linux, 752 Login spoofing, 659–660 LookupAccountSid, 970 Loosely coupled distributed system, 519 Lord Byron, 7 Lottery scheduling, 163 Lovelace, Ada, 7 Low-level communication software, 550–552 Low-level format, 375 read more..

  • Page - 1118

    INDEX 1087 Mesh, multicomputer, 547 Message passing, 144–146 Message-passing interface, 146 Metadata, file, 271 Metafile, Windows, 412 Method, 407, 582 Metric units, 79–80 MFT (see Master File Table) Mickey, 399 Microcomputer, 15 Microkernel, 65–68, 995–997 Microkernels vs. hypervisors, 483–485 Microsoft Development Kit, 865 Microsoft disk operating system, 15 Middleware, 568 document-based, 576–577 read more..

  • Page - 1119

    1088 INDEX Multithreaded Web server, 100–101 Multithreaded word processor, 99–100 Multithreading, 23–24, 103 Multitouch, 415 Munmap, 757 Murphy’s law, 120 Mutation engine, 690 Mutex, 132–134 Mutexes in Pthreads, 135–137 Mutual exclusion, 121 busy waiting, 124 disabling interrupts, 122–123 lock variable, 123 Peterson’s solution, 124–125 priority inversion, 128 sleep and wakeup, 127–130 spin read more..

  • Page - 1120

    INDEX 1089 NtWriteFile, 899, 946 NtWriteVirtualMemory, 869 Null pointer dereference attack, 653 NUMA (see NonUniform Memory Access) NUMA multiprocessor, 525–527 NX bit, 644 O ObCreateObjectType, 903 Object, 582 security, 605 Object adapter, 583 Object cache, 762 Object file, 75 Object manager, 870, 888 Object manager implementation, 894–896 Object namespace, 898–905 Object request broker, 582 read more..

  • Page - 1121

    1090 INDEX Operating system structure, 62–73, 993–997 client-server, 68 client-server system, 995–997 exokernel, 73, 995 extensible system, 997 layered, 64–65 layered system, 993–994 microkernel, 65–68 virtual machine, 69–72 Operating system type, 35–38 Operating systems security, 599–602 Optimal page replacement algorithm, 209–210 Optimize the common case, 1017 ORB (see Object Request read more..

  • Page - 1122

    INDEX 1091 Paging daemon, 232 Paradigm, data, 989–991 operating system, 987–993 Parallel bus architecture, 32 Parallels, 474 Parasitic virus, 668 Paravirt op, 485 Paravirtualization, 72, 476, 483 Parcel, Android, 821 Parent process, 90, 734 Parse routine, 898 Partition, 59, 879 Passive attack, 600 Password security, 628–632 Password strength, 628–629 Patchguard, 974 Path name, 43, 277–280 read more..

  • Page - 1123

    1092 INDEX Power management (continued) thermal management, 424 Windows, 964–966 wireless communication, 423–424 PowerShell, 876 Pre-copy memory migration, 497 Preamble, 340 Precise interrupt, 349–351 Preemptable resource, 436 Preemptive scheduling, 153 Prepaging, 216 Present/absent bit, 197, 200 Primary volume descriptor, 327 Principal, security, 605 Principle of least authority, 603 Principles of read more..

  • Page - 1124

    INDEX 1093 Q Quality of service, 573 Quantum, scheduling, 158 QueueUserAPC, 885 Quick fit algorithm, 193 R R-node, NFS, 796 Race condition, 119–121, 121, 656 RAID (see Redundant Array of Inexpensive Disks) RAM (see Random Access Memory) Random access memory, 26 Random-access file, 270 Raw block file, 774 Raw mode, 395 RCU (see Read-Copy-Update) RDMA (see Remote DMA) RDP (see Remote Desktop read more..

  • Page - 1125

    1094 INDEX Research on operating systems, 77–78 Research on security, 703 Research on virtualization and the cloud, 514 Reserved page, Windows, 929 ResetEvent, 918 Resilient file system, 266 Resistive screen, 414 ResolverActivity, 837 Resource, 436–439 nonpreemptable, 437 preemptable, 436 X, 403 Resource access, 602–611 Resource acquisition, 437–439 Resource allocation graph, 440–441 Resource read more..

  • Page - 1126

    INDEX 1095 Scheduling mechanism, 165 Scheduling of processes Linux, 746–751 Windows, 922–927 Scheduling policy, 165 Script kiddy, 599 Scroll bar, 406 SCSI (see Small Computer System Interface) SDK (see Software Development Kit) Seamless data access, 1025 Seamless live migration, 497 Second system effect, 1021 Second-chance page replacement algorithm, 212 Secret-key cryptography, 620–621 read more..

  • Page - 1127

    1096 INDEX Shell pipe symbol, 728 Shell pipeline, 728 Shell prompt, 726 Shell script, 728 Shell wild card, 727 Shellcode, 642 Shim, 922 Short name, NTFS, 957 Shortest job first scheduling, 157–158 Shortest process next scheduling, 162 Shortest remaining time next scheduling, 158 Shortest seek first disk scheduling, 380 SID (see Security ID) Side-by-side DLLs, 906 Side-channel attack, 636 read more..

  • Page - 1128

    INDEX 1097 Stateless firewall, 686 Static relocation, 185 Static vs. dynamic structures, 1002–1003 Steganography, 617–619 Storage allocation, NTFS, 958–962 Store manager, Windows, 941 Store-and-forward packet switching, 547–548 Stored-value card, 634 Strict alternation, 123–124 Striping, RAID, 372 Structure, operating system, 993–997 Stuxnet attack on nuclear facility, 598 Subject, security, 605 read more..

  • Page - 1129

    1098 INDEX Thread (continued) user-space, 108–111 Windows, 908–927 Thread environment block, 908 Thread local storage, 908 Thread management API calls in Windows, 914–919 Thread of execution, 103 Thread pool, Windows, 911–914 Thread scheduling, 166–167 Thread table, 109 Thread usage, 97–102 Threads, POSIX, 106–108 Threat, 596–598 Throughput, 155 Tightly coupled distributed system, 519 Time, read more..

  • Page - 1130

    INDEX 1099 Unsafe state, 452–453 Up operation on semaphore, 130 Upcall, 114 Upload/download model, 577 URL (see Uniform Resource Locator) USB (see Universal Serial Bus) Useful techniques, 1005–1010 User account control, 972 User datagram protocol, 770 User ID, 40, 604, 798 User interface paradigm, 988 User interfaces, 394–399 User mode, 2 User shared data, 908 User-friendly software, 16 read more..

  • Page - 1131

    1100 INDEX VMware workstation, 478 VMware Workstation, 498–500 Linux, 498 Windows, 498 VMX, 509 VMX driver, 509 Volume shadow copy, Windows, 944 VT (see Virtualization Technology) Vulnerability, 594 W Wait, 139, 140, 356 WaitForMultipleObjects, 886, 895, 918, 977 WaitForSingleObject, 918 WaitOnAddress, 919 Waitpid, 54–55, 55, 56, 736, 737, 738 Waitqueue, 750 Wake lock, Android, 810–813 read more..

  • Page - 1132

    INDEX 1101 Windows IPC, 916–917 Windows job, 909–911 Windows kernel, 882 Windows Me, 17, 859 Windows memory management, 927–942 implementation, 933–942 introduction, 928–931 Windows memory management API calls, 931–932 Windows metafile, 412 Windows notification facility, 890 Windows NT, 16, 860 Windows NT 4.0, 861, 891 Windows NT file system, 265–266, 952–964 introduction, 952–954 read more..

  • Page - 1133

    This page intentionally left blank read more..

  • Page - 1134

    Also by Andrew S. Tanenbaum and Albert S. Woodhull Operating Systems: Design and Implementation, 3rd ed. All other textbooks on operating systems are long on theory and short on practice. This one is different. In addition to the usual material on processes, memory management, file systems, I/O, and so on, it contains a CD-ROM with the source code (in C) of a small, but read more..

  • Page - 1135

    Also by Andrew S. Tanenbaum and David J. Wetherall Computer Networks, 5th ed. This widely read classic, with a fifth edition co-authored with David Wetherall, provides the ideal introduction to today’s and tomorrow’s networks. It explains in detail how modern networks are structured. Starting with the physical layer and working up to the application layer, the book covers a vast read more..

  • Page - 1136

    Also by Andrew S. Tanenbaum and Todd Austin Structured Computer Organization, 6th ed. Computers are getting more complicated every year but this best-selling book makes computer architecture and organization easy to understand. It starts at the very beginning explaining how a transistor works and from there explains the basic circuits from which computers are built. Then it moves read more..

  • Page - 1137

    Also by Andrew S. Tanenbaum and Maarten van Steen Distributed Systems: Principles and Paradigms, 2nd ed. Distributed systems are becoming ever-more important in the world and this book explains their principles and illustrates them with numerous examples. Among the topics covered are architectures, processes, communication, naming, synchronization, consistency, fault tolerance, and security. read more..
