Game Scripting Mastery

If game scripting is something you've been interested in, and you want to learn it in some serious detail, then this book is the book for you.


Alex Varanese


1273 Pages

English

PDF Format

52.0 MB

Game Development

  • 19 Feb 2015
  • Page - 1

    TEAMFLY read more..

  • Page - 2

    Game Scripting Mastery Alex Varanese read more..

  • Page - 3

    © 2003 by Premier Press, a division of Course Technology. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system without written permission from Premier Press, except for the inclusion of brief quotations in a review. The Premier read more..

  • Page - 4

    This book is dedicated to my parents, Ray and Sue, and to my sister Katherine, if for no other reason than the simple fact that they'd put me in a body bag if I forgot to do so. read more..

  • Page - 5

    iv Foreword Programming games is so fun! The simple reason is that you get to code so many different types of subsystems in a game, regardless of whether it's a simple Pac Man clone or a complex triple-A tactical shooter. Coding experience is very enriching, whether you’re writing a renderer, sound system, AI system, or the game code itself; all of these types of read more..

  • Page - 6

    v Acknowledgments It all started as I was standing around with some friends of mine on the second day of the 2001 Xtreme Game Developer's Conference in Santa Clara, California, discussing the Premier Press game development series. At the time, I'd been doing a lot of research on the subject of compiler theory—specifically, how it could be applied to game scripting—and at the read more..

  • Page - 7

    vi Of course, due to my relatively young age and penchant for burning through cash like NASA, I've relied on others to provide a roof over my head. The honor here, not surprisingly, goes to my parents. I'd like to thank my mom for spreading news of my book deal to every friend, relative, teacher, and mailman our family has ever known, and my dad for deciding that read more..

  • Page - 8

    vii About the Author Alex Varanese has been obsessed with game development since the mid-1980's when, at age five, he first laid eyes—with both fascination and a strange and unexplainable sense of familiarity—on the 8-bit Nintendo Entertainment System. He's been an avid artist since birth as well, but didn't really get going as a serious coder until later in life, at around read more..

  • Page - 9

    viii Letter from the Series Editor A long, long, time ago on an 8-bit computer far, far, away, you could get away with hard coding all your game logic, artificial intelligence, and so forth. These days, as they say on the Sopranos "forget about it.…" Games are simply too complex to even think about coding anymore—in fact, 99 percent of all commercial games work like read more..

  • Page - 10

    ix your existing C/C++ game engine—in essence, you will have mastered game scripting! Also, you will never want to write another parser as long as you live. In conclusion, if game scripting is something you’ve been interested in, and you want to learn it in some serious detail, then this book is the book for you. Moreover, this is the only book on the market (as we read more..

  • Page - 11

    x CONTENTS AT A GLANCE Contents at a Glance Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xliv Part One Scripting Fundamentals ..........................1 Chapter 1 An Introduction to Scripting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Chapter 2 Applications read more..

  • Page - 12

    Chapter 6 Integration: Using Existing Scripting Systems . . . . . . . . . . . . . 173 Chapter 7 Designing a Procedural Scripting Language . . . . . . . . . . . . . . . 335 Part Four Designing and Implementing a Low-Level Language ..........................367 Chapter 8 Assembly Language Primer. . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 13

    xii CONTENTS AT A GLANCE Chapter 15 Parsing and Semantic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 983 Part Seven Completing Your Training ..................1137 Chapter 16 Applying the System to a Full Game . . . . . . . . . . . . . . . . . . . 1139 Chapter 17 Where to Go From Here . . . . . . . . . . . . read more..

  • Page - 14

    xiii CONTENTS Contents Introduction ..................................xliv Part One Scripting Fundamentals ...............1 Chapter 1 An Introduction to Scripting ................3 What Is Scripting? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Structured Game Content—A Simple Approach. . . . . . . . . . . . . 6 read more..

  • Page - 15

    xiv CONTENTS Chapter 2 Applications of Scripting Systems .......29 The General Purpose of Scripting . . . . . . . . . . . . . . . . . . . . . . . 30 Role Playing Games (RPGs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Complex, In-Depth Stories . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 16

    xv CONTENTS Implementing a Command-Based Language . . . . . . . . . . . . . . . 74 Designing the Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Writing the Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 17

    xvi CONTENTS Simple Iterative and Conditional Logic . . . . . . . . . . . . . . . . . . 125 Conditional Logic and Game Flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Grouping Code with Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 The read more..

  • Page - 18

    xvii CONTENTS Parsing/Syntactic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Semantic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Intermediate Code Generation . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 19

    xviii CONTENTS Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 Advanced String Features. . . . . . . . . . read more..

  • Page - 20

    xix CONTENTS Integrating Python with C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Compiling a Python Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 Initializing Python . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 21

    xx CONTENTS Which Scripting System Should You Use? . . . . . . . . . . . . . . . . 331 Scripting an Actual Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 On the CD . . . read more..

  • Page - 22

    xxi CONTENTS Part Four Designing and Implementing a Low-Level Language .............367 Chapter 8 Assembly Language Primer ..............369 What Is Assembly Language? . . . . . . . . . . . . . . . . . . . . . . . . . . 370 Why Assembly Now? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371 How Assembly read more..

  • Page - 23

    xxii CONTENTS XASM Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404 Stack and Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 Functions . . . . . . . . . . . . . . . . . . . read more..

  • Page - 24

    xxiii CONTENTS Implementing the Assembler . . . . . . . . . . . . . . . . . . . . . . . . . . 455 Basic Lexing/Parsing Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456 Lexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 25

    xxiv CONTENTS Part Five Designing and Implementing a Virtual Machine ..................565 Chapter 10 Basic VM Design and Implementation ..567 Ghost in the Virtual Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . 568 Mimicking Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 26

    xxv CONTENTS An .XSE Format Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590 The Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594 The Instruction Stream . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 27

    xxvi Loading and Storing Multiple Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667 The g_Script Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667 Loading Scripts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 28

    xxvii Embedding the XVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Defining the Host API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742 The Main Program . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 29

    xxviii Chapter 13 Lexical Analysis ............................783 The Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 From Characters to Lexemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785 Tokenization . . . . . . . . . . . . read more..

  • Page - 30

    xxix Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 Breaking Operators Down. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832 Building Operator State Transition Tables . . . . . . . . . . . . . . . read more..

  • Page - 31

    xxx Preprocessing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867 Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867 Code Emission . . . . . . . . . . . . . . . . . . read more..

  • Page - 32

    xxxi The Function Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 910 The FuncNode Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 911 The Interface . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 33

    xxxii Adding Source Code Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 947 Retrieving I-Code Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 948 The Code-Emitter Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 949 read more..

  • Page - 34

    xxxiii How Parsing Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 993 Recursive Descent Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 994 The XtremeScript Parser Module . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 35

    xxxiv New Unary Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053 New Binary Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1054 Logical and Relational Operators . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 36

    xxxv The Bouncing Head Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106 Anatomy of the Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1107 The Host Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 37

    xxxvi The Red Droid’s Behavior Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1167 Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1171 Loading and Running the Scripts . . . . . . . . . . . . . . . . . . . . . . . read more..

  • Page - 38

    xxxvii Dynamic Memory Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1189 The Compiler and High-Level Language . . . . . . . . . . . . . . . . . . . . . . . 1190 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199 read more..

  • Page - 39

    Introduction If you've been programming games for any reasonable amount of time, you've probably learned that at the end of the day, the really hard part of the job has nothing to do with illumination models, Doppler shift, file formats, or frame rates, as the majority of game development books on the shelves would have you believe. These days, it's more or less read more..

  • Page - 40

    xxxix INTRODUCTION community of rabid, photosensitive code junkies can't tear it open and rewire its guts. The problem is, you can't just pop up an Open File dialog box and let the player choose a DLL or other dynamically linked solution, because doing so opens you up to all sorts of security holes. What if a malicious mod author decides that the penalty for taking a read more..

  • Page - 41

    this section are complete tutorials on using the Lua, Python and Tcl languages, as well as integrating their associated runtime environments with a host application. • Part Four: Designing and Implementing a Low-Level Language. At the bottom of our scripting system will lie an assembly language and corresponding machine code (or bytecode). The design and implementation of this read more..

  • Page - 42

    Part One Scripting Fundamentals read more..

  • Page - 43

    This page intentionally left blank read more..

  • Page - 44

    An Introduction to Scripting “We’ll bring you the thrill of victory, the agony of defeat, and because we’ve got soccer highlights, the sheer pointlessness of a zero-zero tie.” ——Dan Rydel, Sports Night CHAPTER 1 read more..

  • Page - 45

    4 It goes without saying that modern game development is a multi-faceted task. As so many books on the subject love to ask, what other field involves such a perfect synthesis of art, music and sound, choreography and direction, and hardcore programming? Where else can you find each of these subjects sharing such equal levels of necessity, while at the same time working in read more..

  • Page - 46

    5 But enough with the drama! It’s time to roll up your sleeves, take one last look at the real world, and dive headlong into the almost entirely uncharted territory that programmers call “game scripting.” In this chapter you will find
    ■ An overview of what scripting is and how it works.
    ■ Discussion on the fundamental types of scripting systems.
    ■ Brief coverage read more..

  • Page - 47

    6 and fun to play? The answer is the content—the quest and the storyline, the dialogue, the descriptions of each weapon, spell, enemy, and all those other details that separate a demo from the next platinum seller. STRUCTURED GAME CONTENT— A SIMPLE APPROACH So how exactly do you create a complete game? The programmer uses a compiler to code the design document read more..

  • Page - 48

    7
    const ARMOR_REPAIR = 2;
    const TELEPORT = 3;

    This provides a modest but useful selection of item types. If an item is of type HEAL, it restores the player’s health points (or HP as they’re often called). Items of type MAGIC_RESTORE are similar; they restore a player’s magic points (MP). ARMOR_REPAIR repairs armor (not surprisingly), and TELEPORT lets read more..

  • Page - 49

    8
    // To be honest, I have no idea what on earth this thing is:
    ItemArray [ 3 ].pstrName = "Orb of Sayjack";
    ItemArray [ 3 ].iType = TELEPORT;
    ItemArray [ 3 ].iPrice = 3000;
    ItemArray [ 3 ].iPower = NULL;

    Upon recompiling the game, four unique items will be available for use. With them in place, let’s imagine you take them out for a field test, to make read more..

  • Page - 50

    9 given enough opportunities to collect gold, and thus gets stuck. This is very important and must be fixed immediately.

    ItemArray [ 1 ].iPrice = 80; // This tweaking is getting old.

    Once again, (say it with me now) you recompile the engine to reflect the changes. The balancing of items in an RPG is not a trivial task, and requires a great deal of field read more..

  • Page - 51

    10 Think about it this way—coding game content directly into your engine is a little like wearing a tuxedo every day of your life. Not only does it take a lot longer to put on a tux in the morning than it does to throw on a v-neck and some khakis, but it’s inappropriate except for a few rare occasions. You’re only going to go to a handful of weddings in read more..

  • Page - 52

    11 game’s only connection with this data is the code that reads it from the disk. They’re loaded at runtime. At compile-time, they don’t even have to be on the same hard drive, because they’re unrelated to the source code. The game engine doesn’t care what the data actually is, it just reads it and tosses it out there. So somehow, you need to offload your read more..

  • Page - 53

    12 pretty simple; each attribute of the item gets its own line. Let’s take a look at the steps you might take to load this into the game: 1. Open the file and determine which index of the item array to store its contents in. You’ll probably be loading these in a loop, so it should just be a matter of referring to the loop counter. 2. Read the first string read more..
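The loading steps above might look roughly like the following C sketch. The excerpt is cut off before the file format is fully described, so the layout used here (name, type, price, and power, one per line) is an assumption, as are the `Item` structure, the helper names `ReadLine`, `ParseItemType`, and `LoadItem`, and the numeric values of HEAL and MAGIC_RESTORE. Only the field names and the values of ARMOR_REPAIR and TELEPORT come from the excerpts.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME_LEN 64

/* Item types. Only ARMOR_REPAIR = 2 and TELEPORT = 3 are visible in the
   excerpt; the values of HEAL and MAGIC_RESTORE are assumed. */
enum { HEAL = 0, MAGIC_RESTORE = 1, ARMOR_REPAIR = 2, TELEPORT = 3 };

/* Hypothetical item record; field names follow the ItemArray excerpt. */
typedef struct
{
    char pstrName[MAX_NAME_LEN];
    int  iType;
    int  iPrice;
    int  iPower;
} Item;

/* Read one line and strip the trailing newline. Returns 1 on success. */
static int ReadLine(FILE *pFile, char *pstrBuffer, size_t iSize)
{
    if (fgets(pstrBuffer, (int)iSize, pFile) == NULL)
        return 0;
    pstrBuffer[strcspn(pstrBuffer, "\r\n")] = '\0';
    return 1;
}

/* Map a type name from the file to its constant, or -1 if unknown. */
static int ParseItemType(const char *pstrType)
{
    if (strcmp(pstrType, "HEAL") == 0)          return HEAL;
    if (strcmp(pstrType, "MAGIC_RESTORE") == 0) return MAGIC_RESTORE;
    if (strcmp(pstrType, "ARMOR_REPAIR") == 0)  return ARMOR_REPAIR;
    if (strcmp(pstrType, "TELEPORT") == 0)      return TELEPORT;
    return -1;
}

/* Load one item (four lines) from an open file. Returns 1 on success. */
int LoadItem(FILE *pFile, Item *pItem)
{
    char pstrLine[MAX_NAME_LEN];

    if (!ReadLine(pFile, pItem->pstrName, sizeof pItem->pstrName))
        return 0;

    if (!ReadLine(pFile, pstrLine, sizeof pstrLine))
        return 0;
    pItem->iType = ParseItemType(pstrLine);
    if (pItem->iType < 0)
        return 0;

    if (!ReadLine(pFile, pstrLine, sizeof pstrLine))
        return 0;
    pItem->iPrice = atoi(pstrLine);

    if (!ReadLine(pFile, pstrLine, sizeof pstrLine))
        return 0;
    pItem->iPower = atoi(pstrLine);

    return 1;
}
```

Calling `LoadItem` in a loop, with the loop counter as the array index, would fill an item array without recompiling the engine whenever a price changes.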

  • Page - 54

    13
        iArray [ iIndex - 1 ] = iElement;
        iChecksum += iElement;
    }
    iArray [ MAX_ARRAY_SIZE - 1 ] = iChecksum;

    Regardless of what it’s actually supposed to be doing, the important thing to notice is that the size of the array, which is referred to a number of times, is stored in a handy constant beforehand. Why is this important? Well, imagine if you suddenly wanted the read more..

  • Page - 55

    14 STORING FUNCTIONALITY IN EXTERNAL FILES Sooner or later, you’re going to want more unique and complex items. The common thread between all of the items described so far is that they basically just increase or decrease various stats. It’s something that’s very easy to do, because each item only needs to tell the engine which stats it wants to change, and by how much. read more..

  • Page - 56

    15 The answer is scripting. Scripting actually lets you write code outside of your engine, load that code into the engine, and execute it. Generally, scripts are written in their own language, which is often very similar to C/C++ (but usually simpler). These two types of code are separate—scripts use their own compiler and have no effect on your engine (unless you want read more..

  • Page - 57

    16 One of the most popular solutions to this problem literally involves designing and implementing a new language from the ground up. This language is called a scripting language, and as I’ve mentioned a number of times, is compiled with its own special compiler (so don’t expect Microsoft Visual Studio to do this for you). Once this language is designed and implemented, read more..

  • Page - 58

    17 That’s where compilers come in. A compiler’s job is to turn the C/C++, Java, or Pascal code that your brain can easily interpret and understand into machine code; a set of numeric codes (called opcodes, short for operation code) that tell the processor to perform extremely fine-grained tasks like moving individual bytes of memory from one place to another or jumping to read more..
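To make the idea of "a set of numeric codes" concrete, here is a minimal C sketch of a program stored as plain integers and executed by a loop. The four opcodes and the `RunProgram` function are invented for illustration; this is neither real processor machine code nor the book's own virtual machine, and it assumes a well-formed program (there are no bounds checks).

```c
/* Hypothetical opcodes: each instruction is just a number. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Execute a stream of opcodes on a small operand stack and return the
   value on top of the stack when OP_HALT is reached. */
int RunProgram(const int *pCode)
{
    int aStack[16];
    int iTop = 0;   /* number of values currently on the stack */
    int iIP  = 0;   /* instruction pointer into pCode */

    for (;;)
    {
        switch (pCode[iIP++])
        {
            case OP_PUSH:   /* the next integer is the operand */
                aStack[iTop++] = pCode[iIP++];
                break;

            case OP_ADD:    /* pop two values, push their sum */
                --iTop;
                aStack[iTop - 1] += aStack[iTop];
                break;

            case OP_MUL:    /* pop two values, push their product */
                --iTop;
                aStack[iTop - 1] *= aStack[iTop];
                break;

            case OP_HALT:
                return iTop > 0 ? aStack[iTop - 1] : 0;

            default:        /* unknown opcode */
                return -1;
        }
    }
}
```

A compiler for an expression such as `(2 + 3) * 4` could emit the sequence `OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT` for this toy machine.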

  • Page - 59

    18 Anyway, once the code is compiled, it’s ready to fly. The compiler hands all the compiled code to a program called a linker, which takes that massive volume of instructions, packages them all into a nice, tidy executable file along with a considerable amount of header information and slaps an .EXE on the end (or whatever extension your OS uses). When you run that read more..

  • Page - 60

    19 When you write a script, you write it just like you write a normal program. You open up a text editor of some sort (or maybe even an actual Visual Studio-style IDE if you go so far as to make one), and input your code in a high-level language, just like you do now with C/C++. When you’re done, you hand that source file to a compiler, which reduces it read more..

  • Page - 61

    20 script and allow it to process it, and ultimately the script would perform whatever functionality was associated with the item. Host applications provide running scripts with a group of functions, called an API (which stands for Application Programming Interface), which they can call to affect the game. This API for an RPG might allow the script to move the player around in the read more..
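One way a host application could expose such an API is a table that maps function names to C functions, sketched below. Everything here is hypothetical: the entry names `MovePlayer` and `HealPlayer`, the single-integer parameter, and the `CallHostFunc` lookup are assumptions for illustration, not the interface the book defines.

```c
#include <string.h>

/* Signature shared by every host function in this sketch. */
typedef void (*HostFunc)(int iParam);

typedef struct
{
    const char *pstrName;   /* name a script would use */
    HostFunc    pfnFunc;    /* C function inside the game engine */
} HostAPIEntry;

/* A tiny slice of game state that scripts may affect. */
static int g_iPlayerX  = 0;
static int g_iPlayerHP = 50;

static void HAPI_MovePlayer(int iDelta)  { g_iPlayerX  += iDelta;  }
static void HAPI_HealPlayer(int iAmount) { g_iPlayerHP += iAmount; }

/* The API table: the only functions a script is allowed to reach. */
static const HostAPIEntry g_HostAPI[] =
{
    { "MovePlayer", HAPI_MovePlayer },
    { "HealPlayer", HAPI_HealPlayer },
};

/* Look up a host function by name and call it.
   Returns 1 if the function exists, 0 otherwise. */
int CallHostFunc(const char *pstrName, int iParam)
{
    size_t i;

    for (i = 0; i < sizeof g_HostAPI / sizeof g_HostAPI[0]; ++i)
    {
        if (strcmp(g_HostAPI[i].pstrName, pstrName) == 0)
        {
            g_HostAPI[i].pfnFunc(iParam);
            return 1;
        }
    }
    return 0;
}
```

Because scripts can only reach functions listed in the table, the host decides exactly how much of the game a script is able to change.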

  • Page - 62

    21 ed user communities, and almost all of which are free to download and use. Even after attaining scripting mastery, you still might feel that an existing package is right for you. Regardless of the details, however, the motivation behind any choice in a scripting system should always be to match the project appropriately. With the huge number of features that can be read more..

  • Page - 63

    22 Unreal is a high-profile example of a game that’s really put this method of scripting to good use. Its proprietary scripting language, UnrealScript, was designed specifically for use in Unreal, and provides a highly object-oriented language similar to C/C++. Check out Figure 1.5.

    Figure 1.5: Unreal, a first-person shooter based around a proprietary read more..

  • Page - 64

    23 language to script another type of program, like a word processor. In that case, you’d want to revise the command set to be more appropriate. For example:

    MoveCursor 2, 2
    SetFont "Times New Roman", 24, BLACK
    PrintText "Newsletter"
    LineBreak
    SetFontSize 12
    PrintDate
    LineBreak

    Once again, the key characteristic behind these read more..
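A command-based script like this one can be read a line at a time: the first word is the command and the rest are its parameters. The C sketch below shows one possible way to split such a line. The `Command` structure and `ParseCommand` function are hypothetical, and for brevity the sketch only handles integer parameters, so lines with string or constant parameters (such as `SetFont "Times New Roman", 24, BLACK`) are rejected. Parameters beyond `MAX_PARAMS` are silently ignored.

```c
#include <ctype.h>
#include <stdlib.h>

#define MAX_COMMAND_LEN 32
#define MAX_PARAMS      4

/* One parsed script line: a command name plus integer parameters. */
typedef struct
{
    char pstrName[MAX_COMMAND_LEN];
    int  aiParams[MAX_PARAMS];
    int  iParamCount;
} Command;

/* Parse a line such as "MoveCursor 2, 2".
   Returns 1 on success, 0 if the line is empty or malformed. */
int ParseCommand(const char *pstrLine, Command *pCmd)
{
    size_t iLen = 0;

    pCmd->iParamCount = 0;

    /* Skip leading whitespace, then copy the command name. */
    while (isspace((unsigned char)*pstrLine))
        ++pstrLine;
    while (*pstrLine && !isspace((unsigned char)*pstrLine) &&
           iLen < MAX_COMMAND_LEN - 1)
        pCmd->pstrName[iLen++] = *pstrLine++;
    pCmd->pstrName[iLen] = '\0';
    if (iLen == 0)
        return 0;

    /* Read comma- or space-separated integer parameters. */
    while (pCmd->iParamCount < MAX_PARAMS)
    {
        char *pstrEnd;
        long  lValue;

        while (isspace((unsigned char)*pstrLine) || *pstrLine == ',')
            ++pstrLine;
        if (*pstrLine == '\0')
            break;

        lValue = strtol(pstrLine, &pstrEnd, 10);
        if (pstrEnd == pstrLine)
            return 0;   /* not an integer */

        pCmd->aiParams[pCmd->iParamCount++] = (int)lValue;
        pstrLine = pstrEnd;
    }
    return 1;
}
```

Once a line is parsed, the host would typically compare `pstrName` against its list of known commands and call the matching engine function with the collected parameters.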

  • Page - 65

    24 called an SDK (Software Development Kit), so that other programmers can add to the game by writing their own modules. These add-ons are often called mods (an abbreviation for “modification”) and are very popular with the previously mentioned games (Quake and Half-Life). At first, dynamically linked modules seem like the ultimate scripting solution; they’re separate and read more..

  • Page - 66

    25 Rather, they literally have to process and understand the exact same human-written, high-level C/C++ code you and I deal with every day. If you think that sounds like a tough job, you’re right. Interpreters are no picnic to implement. On the one hand, they’re based on almost all of the complex language-parsing functionality of compilers, but on the other hand, they read more..

  • Page - 67

    26 As a result, this book will only casually mention interpreted code here and there, and instead focus entirely on compiled code. Again, while interpreters do function extremely well as debuggers and other development tools, the work involved in creating them outweighs their long-term usefulness (at least in the context of this book). Existing Scripting Solutions Creating your own read more..

  • Page - 68

    27 Lua http://www.lua.org/ As described by the official Lua web site, “Lua is a powerful, lightweight programming language designed for extending applications.” Lua is a procedural scripting system that works well in any number of applications, including games. One of its most distinguishing features, however, lies in its ability to be expanded by programs written with it. As a read more..

  • Page - 69

    This page intentionally left blank read more..

  • Page - 70

    Applications of Scripting Systems “What’s wrong with science being practical? Even profitable?” ——Dr. David Drumlin, Contact CHAPTER 2 read more..

  • Page - 71

    30 As I mentioned in the last chapter, scripting systems should be designed to do as much as is necessary and no more. Because of this, understanding what the various forms of scripting systems can do, as well as their common applications, is essential in the process of attaining scripting mastery. So that’s what this chapter is all about: giving you some insight into read more..

  • Page - 72

    31 carried out without constant recompilation of the entire project. It also allows the game to be easily expanded even after it’s been compiled, packaged, and shipped (see Figure 2.1). Modifications and extensions can be downloaded by players and immediately recognized by the game. With a system like this, gameplay can be extended indefinitely (so long as people produce new read more..

  • Page - 73

    32 With this in mind, scripting seems applicable to all sorts of games; don’t let the example from the first chapter imply that only RPGs need this sort of technology. Just about any type of game can benefit from scripting; even a PacMan clone could give the different colored ghosts their own unique AI by assigning them individual scripts to control their movement. So read more..

  • Page - 74

    33 At any given point in the player’s adventure, the game is going to need to know every major thing the player has done up until that point in order to determine the current state of the game world, and thus, what will happen next. For example, if players can’t stop the villain from burning the bridge to the hideout early in the game, they might be forced read more..
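A common way to remember "every major thing the player has done" is a set of game flags, one bit per event. The C sketch below is an assumption about how that could look rather than the book's implementation; the flag names, the `GameFlags` type, and the `IsBridgeRouteOpen` helper are all hypothetical.

```c
/* One bit per story event (hypothetical names). */
enum
{
    FLAG_BRIDGE_BURNED    = 1 << 0,
    FLAG_VILLAIN_DEFEATED = 1 << 1
};

typedef unsigned int GameFlags;

/* Record that an event has happened. */
void SetFlag(GameFlags *pFlags, unsigned int iFlag)
{
    *pFlags |= iFlag;
}

/* Check whether an event has happened. Returns 1 or 0. */
int IsFlagSet(GameFlags Flags, unsigned int iFlag)
{
    return (Flags & iFlag) != 0;
}

/* Example of branching on the recorded history: the bridge can only be
   crossed if the villain never managed to burn it. */
int IsBridgeRouteOpen(GameFlags Flags)
{
    return !IsFlagSet(Flags, FLAG_BRIDGE_BURNED);
}
```

A script can then test these flags with ordinary conditional logic to decide which part of the story to present next.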

  • Page - 75

    34 engine entirely untouched. This is technically the ideal way to do it, because all game logic is offloaded from the main engine, but either way is certainly acceptable. Non-Player Characters (NPCs) One of the most commonly identifiable aspects of any RPG is the constant conversation with the characters that inhabit the game world. Whether it be the friendly population of the read more..

  • Page - 76

    35 The Solution

    First, let’s discuss some of the simpler NPC conversations that you’ll find in RPGs. In the case of conversations that don’t require branching, a command-based language system is more than enough. For example, imagine you’d like the following exchange in your game:

    NPC: “You look like you could use some garlic.”
    Player: “Excuse me?”
    NPC: “You’re the read more..

  • Page - 77

    36 Pretty straightforward, huh? Once written, this script would then be associated with the NPC, telling the game to run it whenever the player talks to him (or her, or it, or whatever your NPCs are classified as). It’s a simple but elegant solution; all you need to establish is a one-to-one mapping of scripts to NPCs and you’ve got an easy and reasonably read more..

  • Page - 78

    37
    (Player talks to NPC for the first time)
    NPC: “Hey, you look familiar.” (Squints at player’s face)
    Player: “Do I? I don’t believe we’ve met.”
    NPC: “Wait a sec— you’re the guy who’s gonna save the world from the vampires, right?”
    NPC: (If player says Yes) “I knew it! Here, take this garlic!” (Gives player garlic)
    Player: “Thanks!”
    (Player talks read more..

  • Page - 79

    38
    ■ As you can see in the first exchange, the NPC needs the ability to ask the player a question. At the very least, he needs to prompt the player for a yes or no response and branch out through the script’s code depending on the result. It’d be nice to provide a custom list of possible answers as well, however, because not everything is going to be a read more..

  • Page - 80

    39 macro-esque, command-based script and a lot more like the beginnings of a C/C++ program! In essence, it will be. Let’s take a look at some C/C++-like script code that you might write to implement this conversation.

    static int iConverseCount = 0;
    static bool bIsPlayerHero = FALSE;

    main ()
    {
        string strAnswer;
        if ( iConverseCount == 0 )
        {
            NPCTalk ( "Hey, you look read more..

  • Page - 81

    40
            PlayAnim ( PLAYER, STAMP_FEET );
        }
        elseif ( iConverseCount == 2 )
        {
            NPCTalk ( "Dude I told you, I gave you all my garlic. Leave me alone!" );
            PlayerTalk ( "But I ran out, and there's still like 10 more vampires that need to be valiantly defeated!" );
            NPCTalk ( "Hmm... well, my brother lives in the next town over, and he owns a garlic processing plant. I'll read more..

  • Page - 82

    41 even more later. Throughout the course of an RPG project, you’ll most likely find use for even more advanced features like arrays, pointers, dynamic resource allocation, and so on. It’s a lot easier to decide to go with a C/C++-style syntax from the beginning and just add new things as you need them than it is to design both the syntax and overall structure of read more..

  • Page - 83

    42 Furthermore, items and weapons in modern RPGs need to be attention-grabbers. Gone are the days of casting a spell or attacking with a sword that simply causes some lost hit points; today, gamers expect grandiose animations with detailed effects like glowing, morphing, and lens flares. Because graphics programming is a demanding and complicated field, a feature-rich scripting read more..

  • Page - 84

    43 selected and used it. But that’s not enough; like I mentioned earlier, these things need to be experienced—they need to be seen and heard. What’s the fun in using a weapon if you don’t get to see some fireworks? So, the other thing you need to worry about when scripting items and weapons are the visuals. This is where command-based languages fall short. read more..

  • Page - 85

    44 ■ The player needs to see an actual fireball being launched from the player’s on-screen location to that of the enemy, as well as hear an explosion-like sound effect that’s played upon impact. Because you’re now dealing with animation and sound, you’re definitely going to need conditional logic and iteration. Command-based languages are no longer an option. In read more..

  • Page - 86

    45 Enemies I’ve covered the friendlier characters, like NPCs, and you understand the basis for the items and weapons you use to combat the forces of darkness, but what about the forces of darkness themselves? Enemies are the evil, hostile characters in RPGs. They roam the game world and repeatedly attack the players in an attempt to stop them from fulfilling whatever it is read more..

  • Page - 87

    46 have the functional and destructive characteristics of items and weapons. As a result, determining how to define an enemy for use in your RPG engine is basically just a matter of combining the concepts behind these two other entities. The Solution You could approach this situation in any number of ways, but they all boil down to pretty familiar territory. As was the read more..

  • Page - 88

    47
    ■ Endurance. How well the enemy will hold up after taking a significant amount of damage. Higher endurance allows enemies to maintain their intensity when the going gets rough.
    ■ Armor/Defense. How much damage incoming attacks will cause. The lower the armor/defense level, the faster its hit points will decrease over the course of the battle due to its vulnerability.
    ■ read more..

  • Page - 89

    48 decisions based on it. It’s in a constant state of activity, and as such, its script must be written in a different manner. Basically, the difference is that you need to think of the code as being part of a larger, constant loop rather than a single, self-contained event. Check out Figure 2.9 for a visual idea of this. Like read more..

  • Page - 90

    49
    else
    {
        iWeakestPlayer = GetWeakestPlayer ();
        if ( Player [ iWeakestPlayer ].iHitPoints < 20 )
            Attack ( iWeakestPlayer, METEOR_SHOWER );
        else
        {
            iLastAttacker = GetLastAttacker ();
            switch ( Player [ iLastAttacker ].iType )
            {
                case NINJA:
                {
                    Attack ( iLastAttacker, THROW_FIREBALL );
                    break;
                }
                case MAGE:
                {
                    Attack ( iLastAttacker, BROADSWORD );
                    break;
                }
                case WARRIOR:
                {
                    Attack ( iLastAttacker, read more..

  • Page - 91

    50 feels strong enough to keep fighting, however, it calls a function provided by the battle engine to determine the identity of the weakest player. If the enemy deems the player suitably close to defeat (in this case, if his HP is less than 20), it wipes him out with the devastating “Meteor Shower” attack (whatever that is). If the weakest player isn’t quite weak read more..

  • Page - 92

    51 Objects, Puzzles, and Switches (Obligatory Oh My!) The world of a highly developed FPS needs to feel “alive.” Ideally, everything around you should properly react to your interaction with it, whether you’re using it, activating it, shooting it, throwing grenades at it, or whatever else you like doing with crates and computer terminals. If you see a light switch on the read more..

  • Page - 93

    52 lower the shields surrounding the reactor you want to destroy, or whatever. In these cases, objects are no longer self-contained, privately-operating entities. They now work together to create complex, interconnected systems, and can even be combined to form elaborate puzzles. Check out Figure 2.11. Figure 2.11 A mock-up hallway scene from an read more..

  • Page - 94

    53 crate to violently explode when gently pushed, and it’d be equally confusing if the crate only slid over a few inches after being struck by a nuclear missile. Events in a typical FPS relate to the abilities of the players and enemies who inhabit the game world. For example, players might be able to perform the following actions: ■ Fire. Fires the weapon the player read more..

  • Page - 95

    54 { case SHOT: { /* The crate has been shot and thus destroyed, so first let's make it disappear. */ this.bIsVisibile = FALSE; /* Now let's tell the game engine to spawn an explosion in its place. */ CreateExplosion ( this.iX, this.iY, this.iZ ); /* To complete the effect, we'll tell the game engine to spawn a particle system of wooden shards, emanating from the explosion. */ read more..

  • Page - 96

    55 } } } And the door switch: /* * Door Switch * * Can be shot and destroyed, and is also * used to open and close a door. */ main ( Event InvokingEvent ) { switch ( InvokingEvent.Type ) { case SHOT: { /* Just to be evil, let's make the switch very fragile. Shooting it will destroy it and render it useless! Ha ha! */ this.bIsBroken = TRUE; /* And just to read more..

  • Page - 97

    56 /* This is the primary function of the switch. Let's assume that the level's doors exist in an array, and the one we want to open or close is at index zero. */ if ( Door [ 0 ].IsOpen ) CloseDoor ( 0 ); else OpenDoor ( 0 ); break; } } } And finally, the electric fence. /* * Electric Fence * * Simply exists to shock whoever or whatever comes in * read more..

  • Page - 98

    57 Entity [ InvokingEvent.iEntityIndex ].Health -= 10; /* But what fun is electrocution without the visuals? */ CreateParticleSystem ( this.iX, this.iY, this.iZ, SPARKS ); /* And to really drive the point home... */ PlaySound ( ZAP_AND_SIZZLE ); } } } And there you go. Three fully-functional FPS game world objects, ready to be dropped into an alien corridor, a military compound, or a read more..

  • Page - 99

    58 AI, or artificial intelligence, is what makes a good FPS such a convincing experience. Games just aren’t fun if enemies don’t seem lifelike and unique; if you’re simply bombarded with lemming-like creatures that dive headlong into your gunfire, you’re going to become very bored, very quickly. So, not surprisingly, the AI of FPS bad guys is a rapidly evolving read more..

  • Page - 100

    59 tives into the desired ones. So if one enemy, designated as the “leader” of sorts, decides that surrounding the player would be the most effective strategy, that leader needs the ability to spread that message around. The Solution If enemies need to communicate, and enemies are based on scripts, what I’m really talking about here is inter-script communication. So, for example, read more..

  • Page - 101

    60 An actual discussion of artificial intelligence, however, would be lengthy at best and is well beyond the scope of this book. The main lesson here is that script-to-script communication is a must for any FPS, because it’s required for group-based enemy AI. SUMMARY With any luck, your interest in scripting has taken on a more focused and educated form over the course of read more..

  • Page - 102

    Part Two Command- Based Scripting read more..

  • Page - 103

    This page intentionally left blank read more..

  • Page - 104

    Introduction to Command- Based Scripting “It’s not Irish, it’s not English, it’s just... well... it’s just Pikey.” ——Turkish, Snatch CHAPTER 3 read more..

  • Page - 105

    64 With the introductory stuff behind you, it’s time to roll up your sleeves and take a stab at some basic scripting. To get started, you’re going to explore a simple but useful method of scripting known as command-based scripting. Command-based scripts starkly contrast the types of scripts you’ll ultimately write: they don’t support common programming language features read more..

  • Page - 106

    65 in these terms, game engines really only perform a limited number of tasks. Even a game like Quake, for example, is based primarily on only a few major actions, such as: ■ Player and robot movement within the game world. ■ The firing of player and robot (bot) weapons. ■ Managing the damage taken by collisions between players, bots, and projectiles. ■ Assigning read more..

  • Page - 107

    66 ■ The file containing the new arena’s geometry, textures, shadow maps, and other such resources is opened. ■ The file format is parsed, headers are verified, and data is carefully extracted. ■ New structures are allocated to store the arena, which are incrementally filled with the data from the file. ■ The existing background music fades out. ■ The existing read more..

  • Page - 108

    67 The point to all this is that writing a command-based script is like articulating the high-level explanation of the process in a reasonably structured way. Let’s just jump right in and see how the previous process would look as a command-based script: ShowBitmap "Gfx/LevelLoading.bmp" LoadLevel "Levels/Level4.lev" FadeBGMusicOut PlaySound "Sounds/LevelLoaded.wav" read more..

  • Page - 109

    68 Commands Specifically, a command is a symbolic name given to a specific game engine function or action. Commands can accept zero or more parameters, which can vary in data types but must always be literal values (command-based languages don’t support variables or other methods of indirection). Here’s the general syntax: Command Param0 Param1 Param2 Imagine writing a C read more..

  • Page - 110

    69 Actually Getting Something Done With all of these restrictions, you may be wondering if command-based languages (or CBLs, as the street kids are saying nowadays) are actually useful for anything. Admittedly, the inability to define or use variables, expressions, loops, branches, and other common features of programming languages is a serious setback. What this means, however, is read more..

  • Page - 111

    70 environment of each location in the game and could scroll in all four directions. On top of these maps, sprite-based characters would move around and interact with one another, as well as the underlying background map. As you learned in the last chapter, one major issue of such games is the non-player characters (NPCs). NPCs need to appear lifelike, at least to some extent, read more..

  • Page - 112

    71 Pause 400 SetNPCDir "Down" ShowTextBox "Hmmmmm... I know I left it here somewhere..." Pause 400 Can you tell what this does just by looking at it? In only a few lines of simplistic script code, I’ve defined the behavior for an NPC who’s clearly looking for something. He starts off in a given position, facing a given direction, and turns “up” (which actually read more..

  • Page - 113

    72 the NPC data within the game engine. At this point, the script has succeeded in controlling the game engine. The execution of command-based scripts is always purely sequential. This means that execution starts with the first command (line 0) and runs until the last command (line 5, in the case of Figure 3.4). At each step of the way, a global variable representing the read more..
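The purely sequential model described here can be sketched in a few lines of C. This is only an illustration under assumptions, not the book's code: `RunSequential`, `LineHandler`, `CountLine`, `g_iCurrLine`, and `g_iLinesSeen` are hypothetical names, and the "handler" simply counts lines where a real engine would parse and execute each command.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked once for each script line. */
typedef void ( * LineHandler ) ( const char * pstrLine );

/* Global index of the line currently being executed (assumed name). */
int g_iCurrLine = 0;

/* Number of lines the example handler has seen so far. */
int g_iLinesSeen = 0;

/* Example handler: a real engine would parse and run the command here. */
void CountLine ( const char * pstrLine )
{
    ( void ) pstrLine;
    ++ g_iLinesSeen;
}

/* Execute every line in order, from line 0 through the last line.
   Returns the number of lines that were executed. */
int RunSequential ( const char ** ppstrScript, int iScriptSize, LineHandler Handler )
{
    for ( g_iCurrLine = 0; g_iCurrLine < iScriptSize; ++ g_iCurrLine )
        Handler ( ppstrScript [ g_iCurrLine ] );

    return g_iCurrLine;
}
```

Because the only state is a single line index, there is nothing to branch on or jump to, which is what keeps an interpreter of this kind so small.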

  • Page - 114

    73 executes quickly and continually, simulating the execution of actual code. Once the last command in the script is reached, the script can either stop or loop back to the beginning and run again. Figure 3.5 illustrates the execution of a script. Figure 3.5 The execution of a script. Looping Scripts So should your scripts loop or stop when the read more..

  • Page - 115

    74 otherwise ambient entities. For example, NPCs represent the living inhabitants of the game world, which means they should be constantly moving to keep the player’s suspension of disbelief intact. NPC scripts, therefore, should immediately revert to the first command after executing the last so that their actions never cease. Granted, this means that looped scripts will read more..

  • Page - 116

    75 Writing the Script It won’t take much to test this language, because you can deem it functional after implementing just four commands. Here’s a reasonable test script, though, that will help determine whether everything is working right in the following pages: PrintString "This is a command-based language." PrintString "Therefore, this is a command-based script." Newline read more..

  • Page - 117

    76 LoadScript () is used to load scripts into memory. It works like this: ■ The file is opened in binary mode, and every instance of the '\n' (newline) character is counted to determine how many lines it contains. ■ A string array is then allocated to hold the script based on this number. ■ The script is then loaded, line-by-line, and the file is closed. read more..
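The three steps listed above (count the newlines, allocate a string array of that size, then load the script line by line) can be sketched as follows. To stay self-contained, this version works on text that has already been read into memory rather than on a file, so it is an assumption-laden stand-in for `LoadScript ()`, not a copy of it. `CountLines` and `SplitLines` are hypothetical names, only newline-terminated lines are counted, and carriage returns are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Count newline characters to determine how many lines the text holds. */
int CountLines ( const char * pstrText )
{
    int iCount = 0;

    for ( ; * pstrText; ++ pstrText )
        if ( * pstrText == '\n' )
            ++ iCount;

    return iCount;
}

/* Split newline-terminated text into a freshly allocated string array.
   The caller frees each line and then the array itself.
   Returns NULL if an allocation fails. */
char ** SplitLines ( const char * pstrText, int * piLineCount )
{
    int iLineCount = CountLines ( pstrText );

    /* Allocate at least one slot so an empty script still yields a valid array */
    char ** ppstrLines = ( char ** ) malloc ( ( iLineCount > 0 ? iLineCount : 1 ) * sizeof ( char * ) );
    if ( ! ppstrLines )
        return NULL;

    for ( int iLine = 0; iLine < iLineCount; ++ iLine )
    {
        /* A newline is guaranteed to exist, because it was counted above */
        const char * pstrEnd = strchr ( pstrText, '\n' );
        size_t iLength = ( size_t ) ( pstrEnd - pstrText );

        /* Allocate space for the line and a null terminator */
        ppstrLines [ iLine ] = ( char * ) malloc ( iLength + 1 );
        if ( ! ppstrLines [ iLine ] )
        {
            while ( iLine > 0 )
                free ( ppstrLines [ -- iLine ] );
            free ( ppstrLines );
            return NULL;
        }

        memcpy ( ppstrLines [ iLine ], pstrText, iLength );
        ppstrLines [ iLine ][ iLength ] = '\0';

        /* Continue just past the newline */
        pstrText = pstrEnd + 1;
    }

    * piLineCount = iLineCount;
    return ppstrLines;
}
```

Counting first means the array can be allocated once at the correct size, at the cost of scanning the text twice.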

  • Page - 118

    77 // Allocate a script of the proper size g_ppstrScript = ( char ** ) malloc ( g_iScriptSize * sizeof ( char * ) ); // Load each line of code for ( int iCurrLineIndex = 0; iCurrLineIndex < g_iScriptSize; ++ iCurrLineIndex ) { // Allocate space for the line and a null terminator g_ppstrScript [ iCurrLineIndex ] = ( char * ) malloc ( MAX_SOURCE_LINE_SIZE + 1 ); // read more..

  • Page - 119

    78 // Free each line of code individually for ( int iCurrLineIndex = 0; iCurrLineIndex < g_iScriptSize; ++ iCurrLineIndex ) free ( g_ppstrScript [ iCurrLineIndex ] ); // Free the script structure itself free ( g_ppstrScript ); } The function first makes sure the g_ppstrScript [] array is valid, and then manually frees each line of code. After this step, the string array pointer read more..

  • Page - 120

    79 void RunScript () { // Allocate strings for holding source substrings char pstrCommand [ MAX_COMMAND_SIZE ]; char pstrStringParam [ MAX_PARAM_SIZE ]; // Loop through each line of code and execute it for ( g_iCurrScriptLine = 0; g_iCurrScriptLine < g_iScriptSize; ++ g_iCurrScriptLine ) { // ---- Process the current line // Reset the current character g_iCurrScriptLineChar = 0; // Read read more..

  • Page - 121

    80 iCurrString < iLoopCount; ++ iCurrString ) printf ( "\t%d: %s\n", iCurrString, pstrStringParam ); } // Newline else if ( stricmp ( pstrCommand, COMMAND_NEWLINE ) == 0 ) { // Print a newline printf ( "\n" ); } // WaitForKeyPress else if ( stricmp ( pstrCommand, COMMAND_WAITFORKEYPRESS ) == 0 ) { // Suspend execution until a key is pressed while ( kbhit () ) getch (); read more..

  • Page - 122

    81 GetCommand () that fills pstrCommand with a string containing the specified command (you’ll learn more about g_iCurrScriptLineChar momentarily). A series of if/else if’s is then entered to determine which command was found. stricmp () is used to make the language case-insensitive, which I find convenient. As you can see, each comparison is made to a constant relating read more..

  • Page - 123

    82 GetCommand () The key to everything is g_iCurrScriptLineChar. Although g_iCurrScriptLine keeps track of the current line within the script, g_iCurrScriptLineChar keeps track of the current character within that line. Whenever a new line is executed by the execution loop, g_iCurrScriptLineChar is immediately set to zero. This puts the index within the source line string at the very read more..

  • Page - 124

    83 // Move to the next character in the current line ++ g_iCurrScriptLineChar; } // Skip the trailing space ++ g_iCurrScriptLineChar; // Append a null terminator pstrDestString [ iCommandSize ] = '\0'; // Convert it all to uppercase strupr ( pstrDestString ); } Just as expected, this function is little more than a character-reading loop that incrementally builds a new string containing read more..

  • Page - 125

    84 The process followed by GetCommand () is repeated for both GetIntParam () and GetStringParam (), so you should have no trouble following them. The only real difference is that unlike GetCommand (), both of these functions convert their substring in some form to create a “final value” that the command handler will use. For example, integer parameters found in the script will, read more..

  • Page - 126

    85 // Otherwise, append it to the current command pstrString [ iParamSize ] = cCurrChar; // Increment the length of the command ++ iParamSize; // Move to the next character in the current line ++ g_iCurrScriptLineChar; } // Move past the trailing space ++ g_iCurrScriptLineChar; // Append a null terminator pstrString [ iParamSize ] = '\0'; // Convert the string to an integer int read more..

  • Page - 127

    86 // Move past the opening double quote ++ g_iCurrScriptLineChar; // Read all characters until the closing double quote to isolate // the string while ( g_iCurrScriptLineChar < ( int ) strlen ( g_ppstrScript [ g_iCurrScriptLine ] ) ) { // Read the next character from the line cCurrChar = g_ppstrScript [ g_iCurrScriptLine ][ g_iCurrScriptLineChar ]; // If a double quote (or read more..

  • Page - 128

    87 g_iCurrScriptLineChar to avoid the first quote. It then runs until the next quote is found, without including it. This is why it’s very important to note that GetStringParam () reads characters until a quote or newline character is encountered, rather than a space or newline, as the last two functions did. Lastly, the function increments g_iCurrScriptLineChar by two. This read more..

  • Page - 129

    88 // PrintStringLoop else if ( stricmp ( pstrCommand, COMMAND_PRINTSTRINGLOOP ) == 0 ) { // Get the string GetStringParam ( pstrStringParam ); // Get the loop count int iLoopCount = GetIntParam (); // Print the string the specified number of times for ( int iCurrString = 0; iCurrString < iLoopCount; ++ iCurrString ) printf ( "\t%d: %s\n", iCurrString, pstrStringParam ); } // read more..

  • Page - 130

    89 are just used to make sure the keyboard buffer is clear beforehand. Also, just to make things a bit more interesting, PrintStringLoop prints each string after a tab and a number that marks where it is in the loop. Figure 3.6 illustrates this general process of the script controlling the text console. IMPLEMENTING A COMMAND-BASED LANGUAGE Figure 3.6 The process of commands in read more..

  • Page - 131

    90 Before moving on, there’s an important lesson to be learned here about command-based languages. Because these languages consist entirely of domain-specific commands, the actual body of RunScript () has to change almost entirely from project to project. Otherwise, the existing command handlers will almost invariably have to be removed entirely and replaced with new ones. This read more..

  • Page - 132

    91 The Language In addition to displaying these images and performing transitions, the intro program plays sounds as well. Table 3.3 lists each of the commands the language will offer to facilitate everything you need. I just added an Exit command on a whim here; it doesn’t really serve a direct purpose because the script will end anyway upon the execution of the file read more..

  • Page - 133

    92 The Script You know what you want the intro to look like, roughly at least, so you can now write the script: DrawBitmap "gfx/copyright.bmp" PlaySound "sound/ambient.wav" Pause 3000 PlaySound "sound/wipe.wav" FoldCloseEffectY DrawBitmap "gfx/ynh_presents.bmp" PlaySound "sound/ambient.wav" Pause 3000 PlaySound "sound/wipe.wav" FoldCloseEffectX DrawBitmap read more..

  • Page - 134

    93 screen for a few seconds thanks to Pause. FoldCloseEffect transitions to the next screen, along with a transition sound effect. Finally, the title screen (which plays a different effect) is displayed and remains on-screen until a key is pressed. It may be simple, but this is the same idea behind just about any game intro sequence. Add some commands for playing .MPEG or read more..

  • Page - 135

    94 The actual demo code is rather cluttered with calls to my wrapper API, so I’ve chosen to leave it out here, rather than risk the confusion it might cause. I strongly encourage you to check it out on the CD, however, although you can rest assured that the implementation of each command is simple either way. Here’s the code to the new version of RunScript () read more..

  • Page - 136

    95 SCRIPTING AN RPG CHARACTER’S BEHAVIOR The game intro was an interesting application for command-based scripting, but it’s time to set your sights on something a bit more game-like. As you learned in the last chapter, and as was mentioned earlier in this chapter, RPGs have a number of non-player characters, called NPCs, that need to be automated in some way so they appear read more..

  • Page - 137

    96 Using these commands, you can move the character around in all directions, change the direction the player’s facing, display text in a text box to simulate dialogue, and cause the player to stand still for arbitrary periods. All of these abilities come together to form a lifelike character that seems to be functioning entirely under his or her own control (and in a read more..

  • Page - 138

    97 // Do something else ShowTextBox "This is something else." PlaySound "Buzzer.wav" Much nicer, eh? And all it takes is the following addition to RunScript (), which is added to the beginning of the function just before the command is read with GetCommand (): if ( strlen ( g_NPC.ppstrScript [ g_NPC.iCurrScriptLine ] ) == 0 || ( g_NPC.ppstrScript [ read more..

  • Page - 139

    98 So, you’ll instead give the NPC two fields within his structure that define his current movement along the X and Y movements. For example, if you want the NPC to move north 20 pixels, you set his Y-movement to 20. At each iteration of the game loop, the NPC’s Y-movement would be evaluated. If it was greater than zero, he would move up one pixel, and the read more..

  • Page - 140

    99 must remain synchronous with the game loop, the Pause command can’t simply enter an empty loop within RunScript () until the duration elapses. Rather, RunScript () will check the script’s pause status and end times each time it’s called. This way, the script can pause arbitrarily without stalling the rest of the game loop. The Script The script for the character is read more..
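A non-blocking pause of this kind can be sketched as a small piece of state that is checked once per game-loop iteration. The sketch below is an assumption, not the demo's code: `PauseState`, `StartPause`, and `UpdatePause` are hypothetical names, and the current time is passed in as a plain millisecond count so that no particular timer API is implied.

```c
/* Hypothetical per-script pause state. */
typedef struct
{
    int iIsPaused;                  /* Nonzero while a Pause is active */
    unsigned long iPauseEndTime;    /* Time (in ms) at which the pause ends */
} PauseState;

/* Begin a pause of iDuration milliseconds, starting at iCurrTime. */
void StartPause ( PauseState * pState, unsigned long iCurrTime, unsigned long iDuration )
{
    pState->iIsPaused = 1;
    pState->iPauseEndTime = iCurrTime + iDuration;
}

/* Called once per game-loop iteration. Returns 1 if the script may
   execute its next command, or 0 if it is still paused. */
int UpdatePause ( PauseState * pState, unsigned long iCurrTime )
{
    if ( pState->iIsPaused )
    {
        /* Still waiting: return immediately so the game loop keeps running */
        if ( iCurrTime < pState->iPauseEndTime )
            return 0;

        /* The pause has elapsed, so clear it */
        pState->iIsPaused = 0;
    }

    return 1;
}
```

The important property is that `UpdatePause` always returns right away; the waiting happens across many game-loop iterations rather than inside one call.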

  • Page - 141

    100 SetCharDir "Left" MoveChar -80 0 MoveChar -8 -8 SetCharDir "Up" MoveChar 0 -80 MoveChar 8 -8 SetCharDir "Right" MoveChar 40 0 Pause 800 // Random movement with text box ShowTextBox "WE CAN EVEN MOVE AROUND WITH THE TEXT BOX ACTIVE!" Pause 2400 ShowTextBox "WHEEEEEEEEEEE!!!" Pause 800 SetCharDir "Down" MoveChar 12, 38 SetCharDir "Left" read more..

  • Page - 142

    101 The Implementation The demo requires two major resources to run: the castle background image and the NPC’s animation frames. Figure 3.11 displays some of these. These of course come together to form a basic but convincing scene, as shown in Figure 3.12. Figure 3.11 Resources used by the NPC demo. Figure 3.12 The running NPC demo. read more..

  • Page - 143

    102 Of course, the real changes lie in RunScript (). In addition to the new command handlers, which should be pretty much no-brainers, there are some other general changes as well. Here’s the function, with the command handlers this time (notice I left them in this time because the graphics-intensive code has been offloaded to the main loop): void RunScript () { // Only read more..

  • Page - 144

    103 // Reset the current character g_NPC.iCurrScriptLineChar = 0; // Read the command GetCommand ( pstrCommand ); // ---- Execute the command // MoveChar if ( stricmp ( pstrCommand, COMMAND_MOVECHAR ) == 0 ) { // Move the player to the specified X, Y location g_NPC.iMoveX = GetIntParam (); g_NPC.iMoveY = GetIntParam (); } // SetCharLoc if ( stricmp ( pstrCommand, COMMAND_SETCHARLOC ) read more..

  • Page - 145

    104 if ( stricmp ( pstrStringParam, "Up" ) == 0 ) g_NPC.iDir = UP; if ( stricmp ( pstrStringParam, "Down" ) == 0 ) g_NPC.iDir = DOWN; if ( stricmp ( pstrStringParam, "Left" ) == 0 ) g_NPC.iDir = LEFT; if ( stricmp ( pstrStringParam, "Right" ) == 0 ) g_NPC.iDir = RIGHT; } // ShowTextBox else if ( stricmp ( pstrCommand, COMMAND_SHOWTEXTBOX ) == 0 ) { // read more..

  • Page - 146

    105 // Activate the pause g_NPC.iIsPaused = TRUE; g_NPC.iPauseEndTime = iPauseEndTime; } // Move to the next line ++ g_NPC.iCurrScriptLine; } The function begins by checking the NPC’s X and Y movement. If he’s currently in motion, the function returns without evaluating the line or incrementing the line counter. This allows the character to complete his current task without the rest read more..

  • Page - 147

    106 ■ Updates the current frame of animation, so the character always appears to be walking (even when he’s standing still, heh). ■ Sets the direction the character is facing, in case it was changed within the last frame by RunScript (). ■ Blits the appropriate character animation sprite based on the direction he’s facing and the current frame. ■ Draws the text box read more..

  • Page - 148

    107 else phCurrFrame = & g_hCharDown1; break; case LEFT: if ( iCurrAnimFrame ) phCurrFrame = & g_hCharLeft0; else phCurrFrame = & g_hCharLeft1; break; case RIGHT: if ( iCurrAnimFrame ) phCurrFrame = & g_hCharRight0; else phCurrFrame = & g_hCharRight1; break; } W_BlitImage ( * phCurrFrame, g_NPC.iX, g_NPC.iY ); // Draw the text box if active if ( g_iIsTextBoxActive ) { // Draw the read more..

  • Page - 149

    108 // Handle X-axis movement if ( g_NPC.iMoveX > 0 ) { ++ g_NPC.iX; -- g_NPC.iMoveX; } if ( g_NPC.iMoveX < 0 ) { -- g_NPC.iX; ++ g_NPC.iMoveX; } // Handle Y-axis movement if ( g_NPC.iMoveY > 0 ) { ++ g_NPC.iY; -- g_NPC.iMoveY; } if ( g_NPC.iMoveY < 0 ) { -- g_NPC.iY; ++ g_NPC.iMoveY; } } // If a key was pressed, exit if ( g_iExitApp || W_GetAnyKeyState () ) break; So that read more..

  • Page - 150

    109 CONCURRENT SCRIPT EXECUTION Unless your game has some sort of Twilight Zone-like premise in which your character and one NPC are the only humans left on the planet, you’re probably going to want more than one game entity active at once. The problem with this is that so far, this scripting system has been designed with a single script in mind. Fortunately, command-based read more..
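One way to picture several scripts running at once is to give every entity its own script and its own current-line counter, then advance each of them by one command per game-loop iteration. The preview does not show how the book implements this, so the following is purely an assumed sketch: the `NPC` fields, `StepNPC`, and `StepAllNPCs` are hypothetical, and the command dispatch is replaced by a counter.

```c
/* Hypothetical per-NPC script state. */
typedef struct
{
    const char ** ppstrScript;  /* This NPC's script lines */
    int iScriptSize;            /* Number of lines in the script */
    int iCurrScriptLine;        /* Next line to execute */
    int iCommandsRun;           /* Illustration only: commands executed so far */
} NPC;

/* Execute one command of one NPC's script, then advance to the next
   line, looping back to the start after the last one. */
void StepNPC ( NPC * pNPC )
{
    if ( pNPC->iScriptSize <= 0 )
        return;

    /* A real engine would parse and execute
       pNPC->ppstrScript [ pNPC->iCurrScriptLine ] here. */
    ++ pNPC->iCommandsRun;

    pNPC->iCurrScriptLine = ( pNPC->iCurrScriptLine + 1 ) % pNPC->iScriptSize;
}

/* Called once per game-loop iteration: every NPC gets one turn. */
void StepAllNPCs ( NPC * pNPCs, int iNPCCount )
{
    for ( int iCurrNPC = 0; iCurrNPC < iNPCCount; ++ iCurrNPC )
        StepNPC ( & pNPCs [ iCurrNPC ] );
}
```

Nothing here is truly parallel; the scripts only appear to run simultaneously because each one gets a short turn every frame.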

  • Page - 151

    110 Once these changes have been made (which you can see for yourself on the demo included on the CD), it becomes possible to create any number of NPCs, all of which will seem to move around simultaneously. Check out Figure 3.14. 3. INTRODUCTION TO COMMAND-BASED SCRIPTING Figure 3.14 The multiple NPC demo. SUMMARY You must admit; this is pretty cool. You’re only just getting read more..

  • Page - 152

    111 Overall, command-based languages are a lot of fun to play with. They can be implemented extremely quickly, and once up and running, can be used to solve a reasonable number of basic scripting problems. After the next chapter, you’ll have command-based languages behind you and can move on to designing and implementing a C-style language and truly becoming a game scripting read more..

  • Page - 153

    112 ■ Intermediate: Add escape sequences that allow the double-quote symbol (") to appear within string literals without messing up the interpreter. Naturally, this can be important when scripting dialogue sequences. ■ Difficult: Implement anything from the next chapter (after reading it, of course). 3. INTRODUCTION TO COMMAND-BASED SCRIPTING read more..

  • Page - 154

    Advanced Command- Based Scripting “We gotta take it up a notch or shut it down for good.” ——Tyler Durden, Fight Club CHAPTER 4 read more..

  • Page - 155

    114 The last chapter introduced command-based scripting, and was a gentle introduction to the process of writing code in a custom-designed language and executing it from within the game engine. Although this form of scripting is among the simplest possible solutions, it has proven quite capable of handling basic scripting problems, like the details of a game’s intro sequence or read more..

  • Page - 156

    115 NEW DATA TYPES The current command-based scripting system is decidedly simple in its support for data types. Parameters can be integers or strings, with no real middle ground. You can simulate symbolic constants in a brute-force sort of manner using descriptive string literals, like "Up" and "Down", for example, but this is obviously a messy way to solve the read more..

  • Page - 157

    116 General-Purpose Symbolic Constants Having built-in TRUE and FALSE constants is great, but there will be times when an enumeration of arbitrary symbolic constants will be necessary. You’ve already seen an example of this in the last chapter, when you were forced to use the string literal values "Up", "Down", "Left", and "Right" to represent the read more..

  • Page - 158

    117 a puzzle game or a flight simulator, can use it in the same way (as illustrated in Figure 4.3). Here’s an example: DefConst UP 0 DefConst DOWN 1 DefConst LEFT 2 DefConst RIGHT 3 NEW DATA TYPES Figure 4.3 DefConst is a domain-independent command. An Internal Constant List The question is, how does the interpreter “make a record” of the constant? The easiest approach is read more..

  • Page - 159

    118 From this point on, whenever a command is executed, constants can be accepted in the place of integer parameters. In these cases, the specified identifier is used as a key to search the constant list and find its associated value. In fact, a slick way to add constants to your existing commands without changing them is to simply rewrite GetIntParam () to transparently read more..
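The lookup described here can be sketched as a small linked list plus a resolver that accepts either a numeric literal or a constant name. This is a hedged illustration rather than the book's implementation: `Constant`, `g_pConstList`, `AddConstant`, and `ResolveIntParam` are hypothetical names, the resolver works on a token that has already been isolated from the line, names are compared case-sensitively, and an unknown identifier simply resolves to 0.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CONST_NAME_SIZE 32

/* One entry in the (hypothetical) constant list. */
typedef struct Constant
{
    char pstrName [ MAX_CONST_NAME_SIZE ];
    int iValue;
    struct Constant * pNext;
} Constant;

/* Head of the constant list */
Constant * g_pConstList = NULL;

/* Record a constant, as a DefConst handler might. Returns 1 on success. */
int AddConstant ( const char * pstrName, int iValue )
{
    Constant * pConst = ( Constant * ) malloc ( sizeof ( Constant ) );
    if ( ! pConst )
        return 0;

    /* Copy the name, always leaving room for the null terminator */
    strncpy ( pConst->pstrName, pstrName, MAX_CONST_NAME_SIZE - 1 );
    pConst->pstrName [ MAX_CONST_NAME_SIZE - 1 ] = '\0';

    pConst->iValue = iValue;
    pConst->pNext = g_pConstList;
    g_pConstList = pConst;
    return 1;
}

/* Resolve an integer parameter token: numeric literals are converted
   directly, anything else is looked up in the constant list. */
int ResolveIntParam ( const char * pstrToken )
{
    if ( isdigit ( ( unsigned char ) pstrToken [ 0 ] ) || pstrToken [ 0 ] == '-' )
        return atoi ( pstrToken );

    for ( Constant * pConst = g_pConstList; pConst; pConst = pConst->pNext )
        if ( strcmp ( pConst->pstrName, pstrToken ) == 0 )
            return pConst->iValue;

    /* Unknown identifier: this sketch falls back to zero */
    return 0;
}
```

Since the command handlers only ever see the resulting integer, they need no changes to benefit from constants.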

  • Page - 160

    119 So, to summarize, the implementation of constants is twofold. First, DefConst must be used to define the constant by assigning it an integer value. This value is added to the constant list and ready to go. Then, GetIntParam () is rewritten to transparently handle constant references, which allows existing commands to keep functioning without even having to know such constants read more..

  • Page - 161

    120 // Cause an NPC to pace back and forth SetNPCDir LEFT MoveNPC -20 0 Pause PAUSE_DUR SetNPCDir RIGHT MoveNPC 20 0 Pause PAUSE_DUR Cool, huh? Now the NPC can be moved around using actual directional constants, and the duration for which he rests after each movement can even be stored in a constant. This will come in particularly handy if you want to use the same pause read more..

  • Page - 162

    121 One easy way around this is to maintain a flag that monitors whether the script is in its first iteration; if so, constant declarations are handled; if not, they’re ignored because the constant list has already been built. Check out Figure 4.9. This is a reasonable solution, and will be necessary if you stick to a single-pass approach. However, the two-pass approach read more..

  • Page - 163

    122 string in the script array that contains a DefConst command, and tell the interpreter to check for and ignore null pointers. Now, the comparison of each line’s command to DefConst can be eliminated entirely, saving time when large numbers of scripts are running concurrently. So one benefit of the two-pass approach is that it alleviates a small string comparison read more..

  • Page - 164

    123 mentioning nonetheless. A real application of two-pass execution, however, is eliminating the idea of constants altogether at runtime. If you think about it, constants don’t provide any additional functionality that wasn’t available before as far as actual script execution goes. For example, consider the following script fragment: DefConst MY_CONST 20 MyCommand MY_CONST This could be read more..

  • Page - 165

    124 Now, with the preprocessed code entirely devoid of constant references, the constant list can be disposed of entirely and any extra code written into GetIntParam () for handling constants can be removed. The finished script will now appear to the interpreter as if it were written entirely by hand, and execute just as fast. How cool is that? Loading Before Executing Aside read more..
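The preprocessing pass amounts to a textual substitution: every reference to a constant is replaced by its value, so a line such as `MyCommand MY_CONST` becomes `MyCommand 20` when `MY_CONST` was defined as 20. The sketch below shows one possible way to do that for a single constant and a single line. `ReplaceConstant` is a hypothetical helper, and skipping the contents of string literals is an assumption made here so that dialogue text is left alone.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy pstrLine into pstrOut, replacing each whole-identifier occurrence
   of pstrName outside of string literals with the decimal form of iValue.
   Output is truncated if it does not fit in iOutSize bytes. */
void ReplaceConstant ( const char * pstrLine, const char * pstrName, int iValue,
                       char * pstrOut, size_t iOutSize )
{
    size_t iOut = 0;
    size_t iNameLen = strlen ( pstrName );
    int iInString = 0;

    if ( iOutSize == 0 )
        return;

    while ( * pstrLine && iOut + 1 < iOutSize )
    {
        /* Track whether we're inside a double-quoted string literal */
        if ( * pstrLine == '"' )
            iInString = ! iInString;

        /* Outside of strings, a letter or underscore starts an identifier */
        if ( ! iInString && ( isalpha ( ( unsigned char ) * pstrLine ) || * pstrLine == '_' ) )
        {
            size_t iLen = 0;
            while ( isalnum ( ( unsigned char ) pstrLine [ iLen ] ) || pstrLine [ iLen ] == '_' )
                ++ iLen;

            if ( iLen == iNameLen && strncmp ( pstrLine, pstrName, iLen ) == 0 )
            {
                /* Emit the constant's value in place of its name */
                int iWritten = snprintf ( pstrOut + iOut, iOutSize - iOut, "%d", iValue );
                if ( iWritten < 0 )
                    iWritten = 0;
                iOut += ( size_t ) iWritten;
                if ( iOut >= iOutSize )
                    iOut = iOutSize - 1;
            }
            else
            {
                /* Some other identifier: copy it through unchanged */
                for ( size_t iChar = 0; iChar < iLen && iOut + 1 < iOutSize; ++ iChar )
                    pstrOut [ iOut ++ ] = pstrLine [ iChar ];
            }

            pstrLine += iLen;
            continue;
        }

        pstrOut [ iOut ++ ] = * pstrLine ++;
    }

    pstrOut [ iOut ] = '\0';
}
```

Running this over every line for every defined constant leaves a script with no constant references, which is what allows the constant list to be thrown away before execution begins.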

  • Page - 166

    125 long before they’re actually used, scripts should be both loaded and preprocessed before running. This allows the first of the two passes to take as much time as it needs without intruding on the script’s overall runtime performance. What this does mean, however, is that your engine should be designed specifically to determine all of the scripts it will need for read more..

  • Page - 167

    126 been talked to already. In the second case, the NPC now lives in a world no longer threatened by “the ultimate evil,” and can probably react in a much cheerier manner. As discussed in Chapter 2, these are all examples of game flags. Game flags are set and cleared as various events transpire, and persist throughout the lifespan of the game. Each flag corresponds read more..
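Game flags of this kind can be sketched as a simple array of on/off values indexed by integer. The names below (`MAX_GAME_FLAGS`, `g_GameFlags`, `SetFlag`, `ClearFlag`, `IsFlagSet`, and the example index `ULTIMATE_EVIL_DEFEATED`) are all hypothetical; the text does not specify how the flags are stored.

```c
#define MAX_GAME_FLAGS 64

/* Hypothetical flag index, used only for illustration */
enum { ULTIMATE_EVIL_DEFEATED = 0 };

/* One byte per flag: 0 means clear, 1 means set */
static unsigned char g_GameFlags [ MAX_GAME_FLAGS ];

/* Returns nonzero if iIndex refers to an existing flag */
static int IsValidFlag ( int iIndex )
{
    return iIndex >= 0 && iIndex < MAX_GAME_FLAGS;
}

/* Set a flag when its event has occurred */
void SetFlag ( int iIndex )
{
    if ( IsValidFlag ( iIndex ) )
        g_GameFlags [ iIndex ] = 1;
}

/* Clear a flag */
void ClearFlag ( int iIndex )
{
    if ( IsValidFlag ( iIndex ) )
        g_GameFlags [ iIndex ] = 0;
}

/* Returns 1 if the flag is set; 0 if it is clear or out of range */
int IsFlagSet ( int iIndex )
{
    return IsValidFlag ( iIndex ) && g_GameFlags [ iIndex ];
}
```

Because the array has static storage, every flag starts out clear and keeps its value for as long as the game runs, which matches the idea of flags persisting throughout the game.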

  • Page - 168

    127 Furthermore, you can use the symbolic constants described in the previous section to give each flag a descriptive name such as ED_TALKED_TO or NUKE_DEFUSED. Specifying a flag with either an integer parameter or constant is easy. The real issue is determining how to group code in such a way that the interpreter knows it’s part of a specific condition. One solution read more..

  • Page - 169

    128 In this simple example, the new If command works as follows. First, its single integer parameter (which, of course, can also be a constant) is evaluated. The following two lines of code provide both the true and false actions. If the flag is set, the first of these two lines is executed and the second is skipped. Otherwise, the reverse takes place. This is read more..
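The rule described here (run the first of the two following lines if the flag is set, otherwise the second, then carry on after both) can be sketched as a tiny line-selection routine. This is an assumed illustration: `ExecuteWithIf` is a hypothetical function that records which line indexes would be executed instead of running real commands, and it expects the flag to be written as a plain integer index.

```c
#include <stdio.h>

/* Walk a script and record the indexes of the lines that would execute.
   An "If <flag>" line selects exactly one of the two lines after it.
   Returns the number of indexes written to piExecuted. */
int ExecuteWithIf ( const char ** ppstrScript, int iScriptSize,
                    const int * piFlags, int iFlagCount,
                    int * piExecuted, int iMaxExecuted )
{
    int iExecCount = 0;
    int iLine = 0;

    while ( iLine < iScriptSize && iExecCount < iMaxExecuted )
    {
        int iFlagIndex;

        if ( sscanf ( ppstrScript [ iLine ], "If %d", & iFlagIndex ) == 1 )
        {
            /* Out-of-range flags are treated as clear */
            int iIsSet = iFlagIndex >= 0 && iFlagIndex < iFlagCount && piFlags [ iFlagIndex ];

            /* The first following line is the true action, the second the false action */
            int iBranch = iIsSet ? iLine + 1 : iLine + 2;
            if ( iBranch < iScriptSize )
                piExecuted [ iExecCount ++ ] = iBranch;

            /* Skip the If line and both action lines */
            iLine += 3;
        }
        else
        {
            /* Any other command executes normally */
            piExecuted [ iExecCount ++ ] = iLine;
            ++ iLine;
        }
    }

    return iExecCount;
}
```

The limitation is visible in the `iLine += 3` step: each branch can only ever be a single line, which is what motivates grouping commands into named blocks.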

  • Page - 170

    129 SetNPCDir UP MoveNPC 0 -24 } These blocks provide much fuller reactions to each condition, and can be referred to with a single name. Now, if the If command is rewritten to instead accept three parameters (an integer flag index and two block names), you could rewrite the previous code like this: If NUKE_DEFUSED NukeDefused NukePrimed Slick, eh? Now, with one line of read more..

  • Page - 171

    130 be performed after loading the script, because attempting to collect information about a script’s blocks while executing that same script is tricky and error-prone at best. Naturally, you’ll store this information in another linked list called the block list. This list will contain the name of each block, as well as the indexes of the first and last commands (or, read more..
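A pass that builds such a block list might look like the sketch below. Several things are assumptions made for illustration: a fixed-size array stands in for the linked list, the opening brace is expected alone on the line after `Block Name`, and `Block`, `SkipSpace`, and `BuildBlockList` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 16
#define MAX_BLOCK_NAME_SIZE 32

/* One block-list entry: a name plus the first and last command lines */
typedef struct
{
    char pstrName [ MAX_BLOCK_NAME_SIZE ];
    int iFirstLine;
    int iLastLine;
} Block;

/* Skip leading spaces and tabs */
static const char * SkipSpace ( const char * pstrLine )
{
    while ( * pstrLine == ' ' || * pstrLine == '\t' )
        ++ pstrLine;
    return pstrLine;
}

/* Scan a loaded script and record every block it defines.
   Returns the number of blocks found. */
int BuildBlockList ( const char ** ppstrScript, int iScriptSize,
                     Block * pBlocks, int iMaxBlocks )
{
    int iBlockCount = 0;
    int iLine = 0;

    while ( iLine < iScriptSize && iBlockCount < iMaxBlocks )
    {
        char pstrName [ MAX_BLOCK_NAME_SIZE ];

        if ( sscanf ( ppstrScript [ iLine ], " Block %31s", pstrName ) == 1 )
        {
            /* Commands start after the "Block Name" line and the "{" line */
            int iFirst = iLine + 2;

            /* Find the line holding the closing brace */
            int iEnd = iFirst;
            while ( iEnd < iScriptSize && * SkipSpace ( ppstrScript [ iEnd ] ) != '}' )
                ++ iEnd;

            strcpy ( pBlocks [ iBlockCount ].pstrName, pstrName );
            pBlocks [ iBlockCount ].iFirstLine = iFirst;
            pBlocks [ iBlockCount ].iLastLine = iEnd - 1;
            ++ iBlockCount;

            /* Resume scanning after the closing brace */
            iLine = iEnd + 1;
        }
        else
            ++ iLine;
    }

    return iBlockCount;
}
```

With the first and last line indexes recorded ahead of time, a command that names a block only has to look the name up and run that range of lines.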

  • Page - 172

    131 Iterative Logic Getting back to the original topic, there’s the separate issue of looping and iteration. Much like the If command, a command for looping needs the capability to stop at a certain point, in response to some event. Because this simple scripting system is designed only to have access to binary game flags, these will have to do. Looping can be implemented read more..

  • Page - 173

    132
    Block RunLikeHell
    {
        // Run to the left/east, away from the reactor
        MoveNPC 80 0
        // Stop for a moment to scream bloody murder
        ShowTextBox "WE'RE ALL GONNA DIE!!!"
        Pause 300
        // Keep moving!
        MoveNPC 80 0
        // Scream some more
        ShowTextBox "SERIOUSLY! IT'S ALL OVER!!!"
        Pause 300
        // As long as the loop runs, this block will be executed over and over
    }
    // If the nuke read more..

  • Page - 174

    133 This is a decent solution, but it’s a bit complex; you now have to test for optional parameters, which is more logic than you’re used to. Instead, it’s easier to just add another looping command, one that will provide the converse of While: Until NUKE_DEFUSED RunLikeHell Simple, huh? Instead of looping while a flag is set, Until loops until a flag is set. read more..

  • Page - 175

    134
    Block BlockY
    {
        ShowTextBox "Block Y called."
        Pause 400
        While FLAG_Z BlockZ
    }
    Block BlockZ
    {
        ShowTextBox "Block Z called."
        Pause 400
    }
    First BlockX is called, which will push the index of the first While line onto the stack. Then, BlockY is called, which pushes the index of BlockX’s While line onto the stack. The same is done for BlockY and its While command, read more..

  • Page - 176

    135 As you can see, support for nested block invocation is not a trivial matter, so I won’t discuss it past this. Besides, as the book progresses, you’ll get into real functions and function calls, and learn all about how this process works for serious scripting languages. Until then, nesting is a luxury that isn’t necessary for the basic scripting that command-based read more..

  • Page - 177

    136 you start getting lower and lower on the hierarchy, and events become more and more specific, it gets cumbersome to implement each of these events’ scripts in separate files. For example, if an NPC named Steve can react to three events—being talked to, being pushed, and being offered money—your current system would force you to write the following scripts: read more..

  • Page - 178

    137 COMPILING SCRIPTS TO A BINARY FORMAT Thus far you’ve seen a number of ways to enhance a script’s power and flexibility, but what about the script data itself? You’re currently subjecting your poor real-time game engine to a lot of string processing that, at least when compared to dealing strictly with integer values, is slow. Just as you learned in Chapter 1, read more..

  • Page - 179

    138 ■ The string buffer containing the command is then compared to each possible command name, which is another operation that requires traversing each character in the string. Each character is read from the string buffer and compared to the corresponding character in the specified command name to make sure the strings match overall. ■ Once a command has been matched, read more..

  • Page - 180

    139 Detecting Compile-Time Errors The fastest script format in the world doesn’t matter if it has errors that cause everything to choke and die at runtime. Despite the simplicity of a command-based language, there’s still plenty of room for error, both logic errors that simply cause unexpected behavior, and more serious errors that bring everything to a screeching halt. For read more..

  • Page - 181

    140 Compiled scripts are not in a format that’s easily readable by humans, nor are they even easily opened in a text editor in the first place. Unless the player is willing to crack them open in a hex editor and understands your compiled script format, you can sleep tight knowing that your game is safe and all is well. How a CBL Compiler Works A command-based read more..

  • Page - 182

    141 Table 4.2 Domain-Independent Commands
    Command     Description
    DefConst    Defines a constant and assigns it the specified integer value.
    If          Evaluates the specified flag and executes one of the two specified blocks based on the result.
    While       Executes the specified block until the specified flag is cleared.
    Until       Executes the specified block until the read more..

  • Page - 183

    142 This means that, if the compiler were fed a script that consisted of the following sequence of commands (ignore parameters for now): DefConst DefConst MovePlayer MoveNPC PlaySound MovePlayer GetItem PlaySound The compiler would translate this to the following numeric sequence (see for yourself by comparing it to the previous table): 0 0 4 7 9 4 5 9 As long as you keep read more..
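The name-to-code translation described above can be sketched in C as a small table lookup. This is only an illustration, not the book's compiler: the codes for DefConst, MovePlayer, GetItem, MoveNPC, and PlaySound are inferred from the example sequence in the excerpt, the remaining codes are left out because the excerpt does not show them, and the names `CommandEntry`, `lookup_command_code`, and `compile_commands` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical command table: a command name and its code. */
typedef struct {
    const char *name;
    int code;
} CommandEntry;

/* Codes inferred from the example sequence in the text (0 0 4 7 9 4 5 9).
   Commands whose codes are not shown there are omitted rather than guessed. */
static const CommandEntry g_command_table[] = {
    { "DefConst",   0 },
    { "MovePlayer", 4 },
    { "GetItem",    5 },
    { "MoveNPC",    7 },
    { "PlaySound",  9 },
};

/* Returns the numeric code for a command name, or -1 if it is unknown. */
int lookup_command_code(const char *name)
{
    size_t count = sizeof g_command_table / sizeof g_command_table[0];
    assert(name != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(g_command_table[i].name, name) == 0)
            return g_command_table[i].code;
    }
    return -1;
}

/* Translates an array of command names into codes.
   Returns how many names were translated; stops at the first unknown name. */
size_t compile_commands(const char *const *names, size_t count, int *codes_out)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        int code = lookup_command_code(names[i]);
        if (code < 0)
            break;
        codes_out[i] = code;
    }
    return i;
}
```

The string comparisons happen once, at compile time; the runtime then only has to switch on integers, which is the speed advantage the text is pointing at.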

  • Page - 184

    143 case COMMAND_MOVEPLAYER: // MovePlayer handler break; case COMMAND_PAUSE: // Pause handler break; } These new numeric “command codes” make everything much faster, smaller, easier, and more robust. Of course, you are still skipping one major benefit that you can easily take advantage of when compiling. Compile-Time Preprocessing You’ve already seen the advantage of preprocessing the DefConst read more..

  • Page - 185

    144 will be preceded with the number of entries it contains, just like you did with the command list itself. For example, imagine a script has two blocks. The first block begins at the seventh command and ends at the twelfth, and the second begins at the 22nd and ends at the 34th. The block list would then be written out like this: 2 7 12 22 34 The leading 2 read more..
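The block-list layout in that example (a count followed by first/last command indexes for each block) can be sketched as follows. This is a minimal illustration under assumptions: it formats the list as text so the result matches the digits shown in the excerpt, whereas a real compiled script would more likely write binary integers, and `BlockEntry` and `format_block_list` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical block-list entry: indexes of a block's first and last commands. */
typedef struct {
    int first_command;
    int last_command;
} BlockEntry;

/* Writes the entry count followed by each block's first and last index,
   separated by spaces, into buf. Returns the number of characters written
   (excluding the terminating NUL), or -1 if buf is too small. */
int format_block_list(const BlockEntry *blocks, int count,
                      char *buf, size_t buf_size)
{
    int written;
    size_t used;

    assert(blocks != NULL && buf != NULL);

    /* The list is preceded by the number of entries it contains. */
    written = snprintf(buf, buf_size, "%d", count);
    if (written < 0 || (size_t)written >= buf_size)
        return -1;
    used = (size_t)written;

    for (int i = 0; i < count; ++i) {
        written = snprintf(buf + used, buf_size - used, " %d %d",
                           blocks[i].first_command, blocks[i].last_command);
        if (written < 0 || (size_t)written >= buf_size - used)
            return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

Writing the count first lets a loader allocate the whole block list before reading any entries.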

  • Page - 186

    145 simple to compile, because they’re already in an irreducible format. Strings, although more complex, really can’t be compiled much either, aside from attempting to perform some sort of compression (but then, that’s not compiling, it’s just compressing). The first and most important step when compiling parameters is ensuring that the command has been supplied with both read more..

  • Page - 187

    146 These parameters can then be stored in a static array, which is itself part of a larger structure that represents a compiled command:
    typedef struct Command                      // A compiled command
    {
        int iCommandCode;                       // The command code
        Param ParamList [ MAX_PARAM_COUNT ];    // The parameter list
    }
    Remember, read more..

  • Page - 188

    147 DefConst MY_CONST 256 MyCommand MY_CONST Is basically doing the same thing as the small C/C++ code fragment: #define MY_CONST 256 MyCommand ( MY_CONST ); DefConst can therefore be viewed as a way to define simple macros, especially because the compiler will literally perform the same macro expansion that C/C++’s #define does. Of course, there’s one other extremely useful read more..

  • Page - 189

    148 Directions and other miscellaneous constants are one thing, but the real attraction here is game flags. Remember, games may have hundreds or even thousands of flags, the constants for which need to be available to all scripts. Declaring all of your flags in a single file means every script can easily reference various events and states. For example, here’s a file read more..

  • Page - 190

    149 File-Inclusion Implementation A file-inclusion preprocessor command is simple to implement, at least on a basic level. The idea is that, whenever an IncludeFile command is found, that particular line of code is removed from the script and replaced with the contents of the file it specifies. This means that a single line of code can be expanded to N lines, which in turn read more..

  • Page - 191

    150 As you can see, even the comments were included, but of course, that doesn’t matter to the compiler. The contents of the source code linked list after every file has been included would most likely appear cluttered and disorganized if you were to print it, but of course, the compiler couldn’t care less as long as the code is syntactically valid. Check out read more..

  • Page - 192

    151 ier to master when I’ve had a chance to think about it on a more simplistic level beforehand. That was the idea of this chapter—whether you try to implement any of this stuff or not, it will hopefully get the gears turning in your head a bit, so by the time you reach real compiler issues, the light bulbs will already be flashing and you’ll find yourself read more..

  • Page - 194

    Part Three Introduction to Procedural Scripting Languages read more..

  • Page - 196

    CHAPTER 5 Introduction to Procedural Scripting Systems “Well, when all else fails, fresh tactics!” -- Castor Troy, Face/Off read more..

  • Page - 197

    156 In the last section, you took your first steps towards developing your own scripting system by designing and implementing a command-based language from the ground up. Although the finished product was rather modest, many of the concepts behind basic script execution were illustrated firsthand. The following chapters take things to the next level, however. In fact, it’d read more..

  • Page - 198

    157 High-Level Code High-level code is the most widely recognized part of a scripting system. Because it’s what scripts are written with in the first place, it’s the human interface to the script module and perhaps the system’s most useful component. High-level languages (HLLs), which include examples such as C, C++, Pascal and Java, were created so that problems could be read more..

  • Page - 199

    158 not quite all). This is great news because it means you can write your script code in almost the same language you’d use to write a game engine itself. The downside, however, is that C is a complex language, and writing a program that compiles C code is anything but a trivial task. The extra effort involved, however, will be more than worth it in the end. In read more..

  • Page - 200

    159 PC developers often turn to assembly language coding for an extra speed boost when maximum performance is required (such as in the case of graphics routines), scripts stand to gain little from it by comparison. In accordance with my continuing theme of borrowing syntax from popular languages to make your script system as familiar and easy-to-use as possible, the assembly read more..

  • Page - 201

    160 The XtremeScript virtual machine closely mirrors a hardware-based computer in many ways. For example, it provides its own threading system to allow multiple scripts to run simultaneously; it manages protected memory and other resources required by a running script; it allows scripts to communicate with one another via a message system; and perhaps most importantly, it provides an read more..

  • Page - 202

    161 Machine) has been written for a vast number of systems, allowing Java code to run on any of them without rewriting a single line. The XtremeScript Virtual Machine, referred to as the XVM, will be implemented as a static library that can be dropped into any game project with minimal setup. It will be highly portable from one project to the next, making it an read more..

  • Page - 203

    162 High-Level Code/Compilation Once again, you can start with the high-level code. This is without a doubt the most profoundly convoluted step in the entire process of passing a script through the XtremeScript system, and that’s no coincidence. In all of computer science, the most difficult problems faced by software engineers are often the ones that deal with the complexities read more..

  • Page - 204

    163 and computers. Natural language synthesis, image recognition, and artificial intelligence are but a few of the fields of study that have puzzled programmers for decades. Not surprisingly, the area of scripting that involves understanding and translating a human-readable language like C (or a derivative of that language like XtremeScript) is significantly more complex than read more..

  • Page - 205

    164 Lexical Analysis The first and most basic operation the compiler performs is breaking the source file into meaningful chunks called tokens. Tokens are the fine-grained components that languages are based on. Examples include reserved words like C’s if, while, else, and void. Tokens also include arithmetic and logic operators, structure symbols like commas and parentheses, as read more..
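To make the idea of breaking source into tokens concrete, here is a minimal C sketch of a tokenizer. It is not XtremeScript's lexer: the token categories, the four-keyword list (taken from the reserved words named in the excerpt), and the names `TokenType`, `Token`, and `next_token` are all assumptions for illustration, and every operator or structure symbol is treated as a single character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories for this sketch. */
typedef enum {
    TOKEN_END,      /* end of the source string */
    TOKEN_KEYWORD,  /* reserved word such as if or while */
    TOKEN_IDENT,    /* any other name */
    TOKEN_NUMBER,   /* run of decimal digits */
    TOKEN_SYMBOL    /* any other single character, e.g. ( ) , = ; */
} TokenType;

typedef struct {
    TokenType type;
    const char *start;  /* first character of the token in the source */
    size_t length;      /* number of characters in the token */
} Token;

/* Returns 1 if the given text is one of the reserved words. */
static int is_keyword(const char *text, size_t length)
{
    static const char *const keywords[] = { "if", "while", "else", "void" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strlen(keywords[i]) == length &&
            strncmp(keywords[i], text, length) == 0)
            return 1;
    }
    return 0;
}

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p;
    Token token;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;

    /* Whitespace separates tokens but is not a token itself. */
    while (isspace((unsigned char)*p))
        ++p;
    token.start = p;

    if (*p == '\0') {
        token.type = TOKEN_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            ++p;
        token.type = is_keyword(token.start, (size_t)(p - token.start))
                         ? TOKEN_KEYWORD : TOKEN_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            ++p;
        token.type = TOKEN_NUMBER;
    } else {
        ++p;
        token.type = TOKEN_SYMBOL;
    }

    token.length = (size_t)(p - token.start);
    *cursor = p;
    return token;
}
```

Calling `next_token` repeatedly on the same cursor walks the source one token at a time until `TOKEN_END` is returned.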

  • Page - 206

    165 Syntax Tree. The AST is a convenient way to internally represent source code, and allows for more structured analysis later. Semantic Analysis Although the syntax of a language tells you what valid source code looks like, the semantics of a language is focused on what that code means. Let’s look at another example line of code: int Q = "Hello" + 3.14159; The read more..

  • Page - 207

    166 performs at virtually the same level as the code written by a human (or better). In this case, optimization is far less important, however. The speed overhead associated with scripts is so great (relative to native machine code like 80X86, that is) that the difference between optimized and unoptimized script code is usually unnoticeable. Regardless, it’s still a topic read more..

  • Page - 208

    167 Low-Level Code/Assembly Turning an ASCII-formatted assembly language source file into a binary, machine-code version is far simpler than compiling high-level code, but it’s still a reasonably involved process. This process is called assembly, and is naturally handled by a program called an assembler. The Assembler Assembly language is significantly simpler than higher-level code for read more..

  • Page - 209

    168 pivotal and recurring role in the development of software. Although programmers still usually spend far more time debugging a program than they do writing it, many tools have been invented to help ease and accelerate the process of hunting bugs down and squashing them. These tools are called debuggers. In the low-level world, debuggers usually work by loading an assembly read more..

  • Page - 210

    169 Although you’ve already learned about the XVM for the most part, there are a few things that could use some elaboration. For instance, we haven’t really decided on how exactly a script will communicate with the host application. You know that one of the primary features of a VM is its interface with the game engine, but how this will actually work is still read more..

  • Page - 211

    170 High-Level The high-level aspect of XtremeScript can be summarized with the following points: ■ Based around XtremeScript, a C-subset language our scripts will be written in. The language will be designed to resemble C and C++ as much as possible, in order to keep the environment familiar to the programmer. ■ High-level code will be compiled with the XtremeScript read more..

  • Page - 212

    171 ■ Each running script is given a protected environment with its own block of memory, code, stack space, and message queue. Scripts cannot read or write outside of their own address space, ensuring a higher level of stability. ■ Other general security schemes can be put in place, such as loop timeout limits. That pretty much wraps things up. This list, although read more..

  • Page - 214

    CHAPTER 6 Integration: Using Existing Scripting Systems “This will feel... a little weird.” -- Morpheus, The Matrix read more..

  • Page - 215

    174 The last chapter introduced you to scripting in a more technical manner through a general overview of how the pieces fit together, with a focus on exactly how they do so in XtremeScript. Armed with this information, you’re now ready for your first hands-on encounter with “real” scripting, which will be the integration of some of the more popular existing scripting systems read more..

  • Page - 216

    175 between two or more entities, interpreting and routing their input and output instead of letting them communicate directly (which may not even be possible). To understand this concept better, consider the analogy of a human translator. A translator for English and Japanese, for example, is someone who is fluent in both languages and allows English-only speakers to communicate read more..

  • Page - 217

    176 It’s called a layer because, for example, the translator is “wedged” in between the English and Japanese speaking parties, much like a layer of adhesive sits between two surfaces. It’s considered abstract because neither entity knows all the details of the others; in this case, the Japanese speakers don’t know English, and the gai-jin don’t know Japanese. Regardless, read more..

  • Page - 218

    177 (), which accepts two numeric values for moving the player along the X- and Y-axes, the following code certainly won’t compile in C: int X = 16, Y = 32; MovePlayer ( X, Y ); Why not? Because from the perspective of your C compiler, MovePlayer () doesn’t exist. More importantly, even if the compiler knew about the function, how would the function be called? Python read more..

  • Page - 219

    178 example, 80x86 machine code, Python expects just the opposite and deals only with other Python scripts, which are far more high-level and “virtual” because they run inside the Python runtime environment. The problem is that these two languages exist in “parallel dimensions” so to speak, and therefore have no intrinsic methods of communication. If you’re in the mood for a read more..

  • Page - 220

    179 Again, this is an abstraction because Python and C still haven’t learned how to talk to each other. Rather, they’ve simply learned how to talk to a translator, which in turn is capable of talking to the other party for them. IMPLEMENTATION OF SCRIPTING SYSTEMS Generally, a scripting system is implemented in the form of a static library or something similar, although a read more..

  • Page - 221

    180 functions the script defines as well as how to call them. This information, coupled with the fact that it already has an intrinsic connection to the C host application, explains exactly how function calls can be translated back and forth from the script to the host. In other words, both the C program and the Python script can now break up their function calls read more..

  • Page - 222

    181 running, perhaps a few general functions for initializing and shutting down the runtime environment itself, and of course, functions for calling other functions defined by the script. If you write a script called my_script.scr, for example, that consists of three functions, DoThing0 (), DoThing1 (), and DoThing2 (), the pseudocode for a small C program that loads and read more..

  • Page - 223

    182 In a nutshell, the demo is composed of three phases: initialization, the main loop, and shutdown. Let’s first look at the steps taken by the initialization phase: ■ The Wrappuh API is initialized, which provides the program with simple access to DirectX for graphics, sound, and input. ■ The video mode is set. In this case, 640x480 is used with 16-bit color. ■ read more..

  • Page - 224

    183 of handling a small detail you’ve overlooked, and as a result, you’ll end up finding out that your language of choice was inappropriate halfway into the process of writing the actual scripts. This certainly isn’t a fun revelation, so plan ahead. Now that you’ve nailed down exactly what the initialization phase can do (and what the other two phases will do in a read more..

  • Page - 225

    184 of things will be inactive. A better design is to keep the actual main program loop run- ning in C and give the script only a small portion of each loop iteration to keep the sprites bouncing around. Also, the random number generator can be seeded in C. This is another operation that’s done only once and is so basic and obscure that there’s no need for the read more..

  • Page - 226

    185 LUA (AND BASIC SCRIPTING CONCEPTS) The first stop on your scripting language tour is the quaint little town of Lua. Lua is a simple, easy-to-use language and scripting system designed to extend any sort of program by giving it the capability to load and execute optionally compiled scripts (which, really, is the goal of virtually any scripting system). Lua the language is read more..

  • Page - 227

    186 where Filename is the name of the script. The script will be compiled into a file called luac.out by default, but this can be changed with the -o switch. For example, if you have a script called test.lua that you want compiled to a file with the same name, you type this: luac -o test.out test.lua What may surprise you about all this, however, is that you read more..

  • Page - 228

    187 You’d see the following output: 96 The last piece of information regarding the lua interactive interpreter worth mentioning is that it can also be used to immediately run simple scripts without the need to embed the lua.lib runtime environment into a C program. Simply call lua with a filename as the single command-line parameter, like so: lua my_script.lua and it will read more..

  • Page - 229

    188 Comments I like to introduce comment syntax first when describing a language, because it generally shows up in the code examples anyway. Lua’s single comment type is denoted with a double-dash: -- This is a comment. Just like the // comment in C++, Lua’s comments cause everything from the double-dashes to the end of the line to be ignored by the compiler. Lua has read more..

  • Page - 230

    189 This little example also illustrates another quirk of Lua’s syntax: that semicolons aren’t required to terminate lines. However, the semicolon can still be used and is still required in the case of statements that span multiple lines. Consider the following: MyVar0 = 128 -- Valid statement; semicolons are optional. MyVar1 = 256; -- Also valid; semicolons read more..

  • Page - 231

    190 The last issue of variables to cover now is the concept of multiple assignment, which Lua supports. Multiple assignment allows you to put more than one variable on the left side of the assignment operator, like so: X, Y, Z = 2, 4, 8; After this line executes, X will equal 2, Y will equal 4, and Z will equal 8. This left-to-right order allows you to tell which read more..

  • Page - 232

    191 Overall, multiple assignment is a convenient shorthand but definitely has potential to make your code less-than-readable. Only use it in cases when you’re sure that the code is clearly understand- able, and try not to do it for too many variables at once. Don’t try to get cute and impress your friends with huge tangles of multiple assignment; it will only result in read more..

  • Page - 233

    192 If you happen to have the Lua interpreter open at the time, try using the type () function to examine various identifiers. The type () function returns a string describing the data type of whatever identifier is passed to it, so consider the following: print ( type ( 256 ) ); \ print ( type ( 3.14159 ) ); \ print ( type ( "It's a trap!" ) ); Upon read more..

  • Page - 234

    193 attempt to convert it to a number and find that it has no numeric equivalent, thus stopping execution to report the error of attempting to use a string in an arithmetic expression: error: attempt to perform arithmetic on a string value Tables Tables in Lua are, first and foremost, associative arrays not unlike the ones found in other scripting languages like Perl read more..

  • Page - 235

    194 indexes 1 through 4, but you can still expand the array to cover 0 through 4 by simply assigning a value to the desired index. Lua will automatically expand the array to accommodate the new values. In fact, virtually any index you can imagine will already be accessible the moment you create a new table. For example: print ( IntArray [ 0 ] ); print ( IntArray read more..

  • Page - 236

    195 Which will output the following: ABC MNO YZ It’s important to know exactly how things are working under the hood when working with tables that contain tables, however. When working with Lua, don’t think of tables as values, but rather as references. Any time you access a table index or assign a table to another table index, you’re actually dealing with the references read more..

  • Page - 237

    196 Enemy [ "Weapon" ] = "Pulse Cannon"; Enemy [ "Sprite" ] = "../gfx/enemies/security_droid.bmp"; print ( "Enemy Profile:" ); print ( "\n Type:", Enemy [ "Name" ], "\n HP:", Enemy [ "HP" ], "\nWeapon:", Enemy [ "Weapon" ] ); Which will print out the following: Enemy Profile: Type: Security Droid HP: 200 Weapon: Pulse read more..

  • Page - 238

    197 Enemy.Weapon = "Pulse Cannon"; Enemy.Sprite = "../gfx/enemies/security_droid.bmp"; print ( "Enemy Profile:" ); print ( "\n Type:", Enemy.Name, "\n HP:", Enemy.HP, "\nWeapon:", Enemy.Weapon ); As you can see, the string keys are now being used as if they were fields of a struct-like structure. In this case, that’s exactly what they are. read more..

  • Page - 239

    198 There are a number of escape sequences supported by Lua in addition to the previous one, but most are related to text formatting and are therefore not particularly useful when scripting games. However, I personally find the following useful: \\ (Backslash), \' (Single Quote), and \XXX, where XXX is a three-digit decimal value that corresponds to the ASCII code of the read more..

  • Page - 240

    199 Table 6.1 Lua Arithmetic Operators
    Operator    Function
    +           Add
    -           Subtract
    *           Multiply
    /           Divide
    ^           Exponent
    -           Unary negation
    ..          Concatenate (strings)
    Table 6.2 Lua Relational Operators
    Operator    Function
    ==          Equal
    ~=          Not equal
    <           Less than
    >           Greater than
    <=          Less than or equal
    >=          Greater than or equal
    Table 6.3 Lua Logical Operators
    Operator    Function
    and read more..

  • Page - 241

    200 Major differences from C worth noting are as follows: the != (Not Equal) operator is replaced with the equivalent ~= operator, and the logical operators are now mnemonics instead of symbols (and instead of &&). These are important to remember, as it’s easy to forget details like this and have a “C lapse”. :) Conditional Logic Now that you have a handle on read more..

  • Page - 242

    201 else -- Unknown item end As you can see, the final else clause mimics C’s default case for switch blocks. As a gentle reminder, remember that the logical operators in Lua follow a different syntax from C: X = 1; Y = nil; if X ~= Y then print ( "X does not equal Y." ); end if X and Y then print ( "Both X and Y are true." ); end if X or Y then read more..

  • Page - 243

    202 That should all look pretty reasonable, although the exact syntax of the for loop might be a bit confusing. Unlike C, which allows you to use entire statements (or even multiple statements) to define the loop’s starting condition, stopping condition, and iterator, Lua allows only simple numeric values (in this regard, it’s a lot like BASIC). The step value is also read more..

  • Page - 244

    203 MyTable [ "Key1" ] = "Value1"; MyTable [ "Key2" ] = "Value2"; for MyKey, MyValue in MyTable do print ( MyKey, MyValue ); end produces the following output: Key0 Value0 Key2 Value2 Key1 Value1 Functions Functions in Lua follow a pattern similar to that of most languages, in that they’re defined with an initial declaration line, containing an read more..

  • Page - 245

    204 You once again get the proper output of 48. This is because GlobalVar is automatically created in the global scope, and therefore is visible even after Add () returns. To suppress this and create local variables, the local keyword is used. So, if you simply add one instance of local to the previous example: function Add ( X, Y ) local GlobalVar = X + Y; end read more..

  • Page - 246

    205 One last detail; because functions can be assigned to table elements, you can take advantage of the same notational shorthands. For example: function PrintHello () print ( "Hello, World!" ); end MyTable = {}; MyTable [ "Greeting" ] = PrintHello; At this point, the "Greeting" element of MyTable contains a reference to PrintHello (), which can now be called in read more..

  • Page - 247

    206 Compiling a Lua Project Understanding how to compile a Lua project is the first and most important thing to understand for obvious reasons. Not surprisingly, the first step is to include lua.h in your main source file and make sure the compiler knows where to find the lua.lib library. In the case of Microsoft Visual C++ users, this is a simple matter of selecting read more..

  • Page - 248

    207 Remember, this will work only if you properly set your path as described previously. NOTE In case you’re not familiar with it, extern is a directive that informs the linker that the identifiers (namely functions) defined within its braces follow the conventions of another language and should be treated as such. In this case, because read more..

  • Page - 249

    208 This example creates a new state called pLuaState that refers to an instance of the runtime environment with a stack of 1024 elements. This state is now valid, and is capable of loading and executing scripts. Of course, no initialization function is complete without its corresponding shutdown function. Once you’re done with your Lua state, be sure to close it read more..

  • Page - 250

    209 you have to learn how to call C functions from Lua. Once you can do this, you just wrap a function that wraps printf () or something along those lines, and you can print the output of your scripts to the console and actually watch it run. As such, pretty much everything following this point deals with how Lua and C are integrated, starting with the read more..

  • Page - 251

    210 At any time, the index of the stack’s top element will be equal to the stack’s overall size. This is because Lua indexes the stack starting from 1; therefore, a stack of one element can be indexed from 1-1, a stack of 16 elements can be indexed from 1-16, and so on. This is a stark contrast from C and most other languages, in which arrays and other aggregate read more..

  • Page - 252

    211 So to sum things up, Lua will virtually always appear to portray an empty stack starting from 1 when you attempt to access it from C. That being said, let’s look at the functions that actually provide the stack interface. Lua features a rich collection of stack-related functions, but the majority of them won’t be particularly useful for your purpose and as such, read more..

  • Page - 253

    212 even alert you until it’s too late. lua_stackspace () should be used in any case where large numbers of values will be pushed onto the stack, especially when the pushing will be done inside loops, which are especially prone to overflow errors. The next set of functions you will read about is one of the most important. It provides the classic push/pop interface read more..

  • Page - 254

    213 Actually, because Lua doesn’t provide a particularly convenient way to directly pop a value off the stack in the traditional context of the stack interface, let’s write some macros to do it now. Using the existing Lua functions, you have to do three things in order to simulate a stack pop: ■ Get the index of the stack’s top element using lua_gettop (). ■ Use read more..

  • Page - 255

    214 Because of this, 0 is never a valid index (unlike tables) and should not be used. Past that, valid indexes run from 1 to the size of the stack. So, if you have a stack of four elements, 1, 2, 3, and 4 are all valid indexes. One interesting facet of Lua stack access, however, is using a negative number. At first this may seem strange, but using a negative read more..
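The index rules above can be sketched as a small C helper. The excerpt is cut off before it explains negative indexes, so the sketch relies on Lua's documented stack convention that negative indexes count back from the top (-1 is the top element). The function `to_absolute_index` is a hypothetical helper for illustration, not part of the Lua API, and it works on plain integers rather than a real Lua state.

```c
#include <assert.h>

/* Converts a Lua-style stack index into a 1-based absolute index.
   Positive indexes count up from the bottom (1 is the first element);
   negative indexes count down from the top (-1 is the top element).
   Returns 0, which is never a valid stack index, if the index is out of range. */
int to_absolute_index(int stack_size, int index)
{
    assert(stack_size >= 0);
    if (index > 0 && index <= stack_size)
        return index;
    if (index < 0 && -index <= stack_size)
        return stack_size + index + 1;
    return 0;
}
```

With a stack of four elements, index -1 maps to 4 and index -4 maps to 1, so either form can reach any element.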

  • Page - 256

    215 The first function, lua_type (), returns one of a number of constants referring to the type of the element at the given index. These constants are shown with a description of their meanings in Table 6.5.
    Table 6.5 lua_type () Return Constants
    Constant       Description
    LUA_TNIL       nil
    LUA_TNUMBER    Numeric: int, long, float, or double.
    LUA_TSTRING    String read more..

  • Page - 257

    216 function MyFunc0 ( X, Y ) -- ... end function MyFunc1 ( Z ) -- ... end MyFunc0 ( 16, 32 ); MyFunc1 ( "String Parameter" ); CFunc ( 2, 4.8, "String Parameter" ); Of course, if CFunc () is not exported, this will produce a runtime error. Notice, however, that the syntax for calling the C function is identical to any other Lua function, including parameter read more..

  • Page - 258

    217 currently empty (whether it is or not), so all of your stack accessing will be relative to element index 1. At the beginning of your C function, the stack will be entirely empty except for any parameters that the Lua caller may have passed. Because of this, the size of the stack is always synonymous with the number of parameters the caller passed, and thus, you read more..

  • Page - 259

    218 So you’re now capable of registering a C function with Lua, as well as receiving parameters and returning results. That’s pretty much everything you need, so let’s have a go at implementing that printf () wrapper mentioned earlier. I’ll just show you the code up front and I’ll dissect it afterwards: int PrintStringList ( lua_State * pLuaState ) { // Get the read more..

  • Page - 260

    219 and halts the current script just before printing the supplied message. Here’s the prototype, just for reference: void lua_error ( lua_State * pLuaState, char * pstrMssg ); Getting back on track, the rest of the loop deals with reading the string from the stack using lua_tostring () and printing it to the screen (in between the tab and newline characters). The function read more..

  • Page - 261

    220 if X then Logic = "X is true."; -- Remember, only nil is considered false in Lua else Logic = "X is false."; end -- Now call your exported C function for printing the strings PrintStringList ( "Random Strings:", "" ); -- The extra empty -- string is just to -- create a blank line PrintStringList ( FullName, PiString, Logic ); The first part of read more..

  • Page - 262

    221 All that’s necessary to run this script is to initialize Lua with a call to lua_open (), register the PrintStringList () function with lua_register (), and finally load and execute the script in one fell swoop with lua_dofile (). The output of this program will look like this: Lua Integration Example Executing Script test_0.lua: Random Strings: Name: Alex Varanese Pi: 3.14159 read more..

  • Page - 263

    222 Let’s get this new script started, which is called test_1.lua, with the Exponent () function: -- Manually computes exponents in the form of X ^ Y function Exponent ( X, Y ) -- First, let's just print out the parameters PrintStringList ( "Calculating " .. X .. " to the power of " .. Y ); -- Now manually compute the result Exponent = 1; if Y < 0 then read more..

  • Page - 264

    223 PrintStringList ( "Multiplying string \"" .. String .. "\" by " .. Factor ); -- Multiply the string NewString = ""; for X = 1, Factor do NewString = NewString .. String; end -- Return the multiplied string to C return NewString; end This function is even simpler than Exponent. All it does is create a variable called NewString and assign it the read more..

  • Page - 265

    224 In Lua, functions can be thought of as globals, just as much as global variables can be thought of as globals. This doesn’t mean they’re any more like variables than C functions are, but they can be referred to this way. The first thing you need to do when calling a function is push a reference to the function onto the stack. Because functions are simply read more..

  • Page - 266

    225 That’s everything there is to know about basic Lua function calls from the host application. Now that you know what you’re doing, let’s go back to test_1.lua and try calling your Exponent () and MultiplyString () functions. printf ( "\nLoading Script test_1.lua:\n\n" ); lua_dofile ( pLuaState, "test_1.lua" ); // Call the exponent function // Call lua_getglobal () to read more..

  • Page - 267

    226 much prepared for anything. Most of the interaction between these two entities will lie in function calls. Because you’ve learned the language as well, you should be familiar enough with Lua in general to get started with your own experiments and exploration. Of course, you still need to get back to the bouncing alien head demo, but before that, there’s one last read more..

  • Page - 268

    227 GlobalInt = 256; GlobalFloat = 2.71828; GlobalString = "I'm an obtuse man..."; This gives you three globals to work with, all of differing types. To get things started, let’s just try reading their values and printing them from C: // Read some global variables printf ( "\n\tReading global variables...\n\n" ); // Read an integer global by pushing it onto the stack read more..

  • Page - 269

    228 // Write and read the float global lua_pushnumber ( pLuaState, 3.14159 ); lua_setglobal ( pLuaState, "GlobalFloat" ); lua_getglobal ( pLuaState, "GlobalFloat" ); printf ( "\t\tGlobalFloat: %f\n", lua_tonumber ( pLuaState, 1 ) ); lua_pop ( pLuaState, 1 ); // Write and read the string global lua_pushstring ( pLuaState, "...so I'll try to be oblique." ); lua_setglobal ( read more..

  • Page - 270

    229 In short, you need to create a table within the script that will hold all of your bouncing alien heads; each element of the array needs to describe its corresponding alien head in the same way that the Alien struct did in the hardcoded version. Obviously, table manipulation is built in to Lua, so you don’t need to provide any functionality for that from the host read more..

  • Page - 271

    230 ■ HAPI_BlitBG () is a simple function that causes the background image to be blitted to the framebuffer. No parameters are necessary. ■ HAPI_BlitSprite () accepts parameters referring to an X, Y location and an index into the array of frames of the spinning alien head animation. ■ HAPI_BlitFrame () is another simple function that blits the framebuffer to the screen. read more..

  • Page - 272

    231 numeric value. At least, that’s how things are working internally. All you need to worry about, however, is that the function is returning a parameter. So right off the bat, wrapping it in a macro that provides a more descriptive name will result in improved code readability. Second, you have to cast the value the function returns to an int because Lua works only read more..

  • Page - 273

    232 // Blit sprite W_BlitImage ( g_AlienAnim [ iIndex ], iX, iY ); // Return nothing return 0; } Again, you see a similar process. First you read in three integer parameters with your handy GetIntParam () macro. You then pass those parameters directly to the Wrappuh function W_BlitImage (), which performs the blit. Unlike HAPI_GetRandomNumber (), this function does not return read more..

  • Page - 274

    233 What can I say? I’m a bit of a neat-freak, so InitLua () had to have a matching ShutDown () function, whether it was necessary or not. :) It would just seem lopsided without one! After the call to InitLua (), you’ll have a valid Lua state and your host API will be locked and loaded. It’s here where the scripting really begins. After all of your C-side read more..

  • Page - 275

    234 Another call to CallLuaFunc (), and another script function you haven’t yet seen. This one is called HandleFrame (), and naturally, handles the current frame by moving the sprites around. Once again, you’ll see these two functions in the next section. That’s it! In summary, the new host application works by first defining a series of functions that collectively form read more..

  • Page - 276

    235 in this respect; their loading phase is costly and should only be done outside of speed-critical code (i.e., outside of your main loop). Calling lua_dofile () to execute a script on a per-frame basis would be frame rate homicide (which is only legal in Texas). Getting back to the topic at hand, let’s look at the script’s constant declaration section: ALIEN_COUNT read more..

  • Page - 277

    236 -- Set the X, Y location CurrAlien.X = GetRandomNumber ( 0, 639 - ALIEN_WIDTH ); CurrAlien.Y = GetRandomNumber ( 0, 479 - ALIEN_HEIGHT ); -- Set the X, Y velocity CurrAlien.XVel = GetRandomNumber ( MIN_VEL, MAX_VEL ); CurrAlien.YVel = GetRandomNumber ( MIN_VEL, MAX_VEL ); -- Set the spin direction CurrAlien.SpinDir = GetRandomNumber ( 0, 2 ); -- Copy the reference to the new read more..

  • Page - 278

    237 To solve this problem, you simply start the loop with this line: local CurrAlien = {}; Assigning {} to CurrAlien forces Lua to allocate a new table and therefore provide a fresh, unused reference. You can then fill the values of this instance of CurrAlien and freely assign it to the next element of Aliens, without worrying about overwriting the values you set in read more..

  • Page - 279

    238 The rest of the alien head initialization loop is pretty much what you would expect; each element of CurrAlien is set to a random value, using the GetRandomNumber () function that the previously discussed host API provides. Once this loop completes, Init () is finished and the global Aliens table contains a record of every bouncing alien head. The script is now fully read more..

  • Page - 280

    239 -- Get the X, Y velocities local XVel = Aliens [ CurrAlienIndex ].XVel; local YVel = Aliens [ CurrAlienIndex ].YVel; -- Increment the paths of the aliens X = X + XVel; Y = Y + YVel; Aliens [ CurrAlienIndex ].X = X; Aliens [ CurrAlienIndex ].Y = Y; -- Check for wall collisions if X > 640 - HALF_ALIEN_WIDTH or X < -HALF_ALIEN_WIDTH then XVel = -XVel; end if Y > read more..

  • Page - 281

    to handle the logic. This means moving the alien heads along their paths and checking for collisions, among other things. The first thing to do after blitting the new frame to the screen is update CurrAnimFrame. You do this by incrementing the variable, and resetting it to zero if the increment pushes it past ALIEN_MAX_FRAME. Of course, you want to perpetuate the read more..

  • Page - 282

    241 script and watch the executable change with it. As a challenge, try adding a gravity constant to the bouncing movement of the heads; perhaps something that will slowly cause them to fall to the ground. Once they’re all at the bottom of the screen, reverse the polarity and watch them “fall” back up. This shouldn’t take too much effort to implement given what read more..

  • Page - 283

    242 Web Links For more general information on Lua, as well as the Lua user community, check out the following links. These are also great places to begin your investigation of the advanced topics described previously: ■ The Official Lua Web Site: http://www.lua.org/. This is the official source for Lua documentation and distributions. Check here for updates on the language read more..

  • Page - 284

    243 Directory Structure When the installation is complete, check out the Python22/ directory (which should be the root of your Python installation). In it, you’ll find the following subdirectories: ■ DLLs/. DLLs necessary for runtime support. Nothing you need to worry about. ■ Doc/. Extensive HTML-based documentation of Python and the Python system. Definitely worth your attention. read more..

  • Page - 285

    244 Also, similar to Lua, python can run entire Python scripts from text files, which is of course much easier when you want it to execute large scripts, because it would quickly become tedious to retype them over and over. It’s also a good way to validate your scripts; the interpreter will flag any compile-time errors it finds in your code and provide reasonably read more..

  • Page - 286

    245 Int = 16 # Set Int to 16 Float = 3.14159 # Set Float to 3.14159 String = "Hello, world!" # Set String to "Hello, world!" Note the lack of semicolons. Python does allow them, but they aren’t useful in the same way they are in Lua and are rarely seen in the Python scripts read more..

  • Page - 287

    246 Data Types Python has a rich selection of data types, even directly supporting advanced mathematical concepts like complex numbers. However, your experience with Python in the context of game scripting will be primarily limited to the following: ■ Numeric— Integer and floating-point values are directly supported, with any necessary casting that may arise handled transparently by read more..

  • Page - 288

    247 Basic Strings As stated, Python has extensive support for strings, both in terms of their representation and the built-in operations that can be performed on them. To get things started, consider the multiple ways in which a Python string literal can be expressed. First off is the traditional double-quote syntax we all know and love: MyString = "Hello, world!" This read more..

  • Page - 289

    248 At this point it should be clear why the aforementioned technique for simulating block comments works the way it does. Because Python (like many languages) allows isolated expressions to appear outside of a larger statement like an assignment, these “comments” are really just string literals left untouched by the compiler that don’t have any effect at runtime. read more..

  • Page - 290

    249 In addition to simple array notation, however, slice notation can also be used to easily extract substrings, which has this general form: StringName [ StartIndex : EndIndex ] Get the idea? Here’s an example: MyString = "Stringtastic!" print "Slicing from index 3 to 8:", MyString [ 3 : 8 ] Here’s its output: Slicing from index 3 to 8: ingta Just provide two read more..
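The slice boundaries are easy to misread, so here is a short sketch in Python 3 syntax (print is a function there, unlike the print statement used in the book's examples). It assumes only standard slicing behaviour: the start index is included, the end index is not, and either one can be left out.

```python
my_string = "Stringtastic!"

# The slice starts at index 3 and stops just before index 8.
middle = my_string[3:8]
print("Slicing from index 3 to 8:", middle)

# Leaving out an index means "from the start" or "to the end".
head = my_string[:6]
tail = my_string[6:]
```

Because the end index is exclusive, head + tail always rebuilds the original string without duplicating a character.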

  • Page - 291

    250 At which point MyString will contain “So I said 'Goodbye!'”. What you can’t do, however, is attempt to change individual characters or slices of a string. The compiler won’t like either of these cases: MyString [ 3 ] = "X" MyString [ 0 : 2 ] = "012" This sort of substring alteration must be simulated instead by creating a new string based on read more..
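A minimal sketch of that workaround, in Python 3 syntax. The sample string is hypothetical. In current CPython the rejected assignment surfaces as a TypeError at run time, which the sketch catches only to show that the original string is left untouched.

```python
my_string = "Hello, world!"

# Strings are immutable, so assigning to an index is rejected.
try:
    my_string[3] = "X"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string from the pieces on either side of index 3 instead.
replaced = my_string[:3] + "X" + my_string[4:]
```

The original value survives unchanged; replaced is a separate string that differs from it at index 3 only.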

  • Page - 292

    251 Lastly, check out the built-in function len (), which Python provides to return the length of a given string: MyString = "Plaza de toros de Madrid" print "MyString is", len ( MyString ), "characters long." This example will output: MyString is 24 characters long. Lists Lists are the main aggregate data structure in Python. Something of a cross between C’s read more..

  • Page - 293

    252 Python provides a large assortment of built-in functions for dealing with lists. I only cover a select few here, but be aware that there are many more. Consult the documentation that came with your Python distribution for more information if you’re interested. Just like strings, the len () function can be used to return the number of elements in a list. Here’s an read more..

  • Page - 294

    253 This example produces the following output: [0, 1, 2, 3] [4, 5, 6, 7] [0, 1, 2, 3, 4, 5, 6, 7] Lastly, let’s take a look at insert (). This function allows a new element to be inserted into the list at a specific index, pushing everything beyond that index over by one to make room. MyList = [ "Game", "Mastery." ] print MyList MyList.insert ( 1, read more..
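Both list operations can be sketched in a few lines of Python 3. The numeric lists match the output shown above. The word list and the inserted value are hypothetical stand-ins, not the book's example.

```python
first = [0, 1, 2, 3]
second = [4, 5, 6, 7]

# The + operator builds a new list; neither operand is modified.
combined = first + second

# insert () places a new element at the given index and shifts
# everything from that index onward one slot to the right.
words = ["alpha", "gamma"]      # hypothetical sample data
words.insert(1, "beta")
```

Unlike +, insert () changes the list in place and returns None, so its result should not be assigned back to the list.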

  • Page - 295

    254 Of course, just as you saw in Lua, the issue of references rears its ugly head again. After assigning SubList1 to SuperList [ 1 ] in the last example, check out what happens when I make a change to SubList1: print "SubList1: ", SubList1 print "SuperList [ 1 ]:", SuperList [ 1 ] SubList1 [ 1 ] = "XYZ"; print "SubList1: read more..
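The aliasing effect can be reproduced in a self-contained Python 3 sketch. The list contents are hypothetical. The point is only that storing a list inside another list stores a reference, and that list () makes a shallow copy when an independent snapshot is wanted.

```python
sub_list = ["A", "B", "C"]              # hypothetical contents
super_list = [0, sub_list, 2]           # stores a reference, not a copy

sub_list[1] = "XYZ"
seen_through_super = super_list[1][1]   # the change is visible here too

# list () makes a shallow copy, which decouples the two lists.
independent = [0, list(sub_list), 2]
sub_list[1] = "changed again"
seen_through_copy = independent[1][1]   # still the value at copy time
```

For lists that themselves contain lists, a shallow copy still shares the inner lists; copy.deepcopy () from the standard library covers that case.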

  • Page - 296

    255 PYTHON Table 6.8 Python Bitwise Operators Operator Function << Shift left >> Shift right & And ^ Xor | Or ~ Unary not Table 6.9 Python Relational Operators Operator Function < Less than > Greater than <= Less than or equal >= Greater than or equal !=, <> Not equal (<> is obsolete) == Equal Table 6.10 Python Logical Operators Operator Function and And or read more..

  • Page - 297

    256 Here are a few general-purpose notes to keep in mind when dealing with Python expressions: ■ Like Lua, Python’s logical operators are spelled out as short mnemonics, rather than symbols. For example, logical and is and rather than &&. ■ Assignments cannot occur in expressions. Python has removed this because of its significant probability of leading to logic read more..

  • Page - 298

    257 Python’s form looks like this: if Expression: Also, else if has been replaced with elif, a more compact version of the same thing. Make sure to note that every clause (the initial if, the zero or more elifs, and the optional else) must end with a colon (:). The other important lesson to learn here is how a code block is denoted in Python. In C, you rely read more..
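A small sketch of the full if / elif / else form, in Python 3 syntax. The function name and the values it returns are hypothetical. Each clause ends with a colon, and the body of each clause is marked by indentation alone.

```python
def describe(value):
    """Classify a number using all three clause types."""
    if value > 0:
        return "positive"
    elif value < 0:
        return "negative"
    else:
        return "zero"


results = [describe(n) for n in (5, -3, 0)]
```

Only the first clause whose condition is true runs, so the order of the elif clauses matters when conditions overlap.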

  • Page - 299

    258 Here are a few more examples to help the paint dry: X = 0 Y = 1 if X > 0: print "X is greater than zero." if X <= 0 or Y != 1: print "X is less than or equal to zero." if X or Y: print "Between X and Y, one, the other, or both are true." Z = "Quantum Foam" if ( X + Y ) and Z: print "X + Y and Z are both true." And read more..

  • Page - 300

    259 Loop Iteration: 5 Loop Iteration: 6 Loop Iteration: 7 Loop Iteration: 8 Loop Iteration: 9 Loop Iteration: 10 Loop Iteration: 11 Loop Iteration: 12 Loop Iteration: 13 Loop Iteration: 14 Loop Iteration: 15 While I am on the topic of loops, I should cover some of the required loop-handling statements that most languages provide. Like C, Python gives you break and continue, and read more..

  • Page - 301

    260 And here’s the output: First Loop - No Break Loop Iteration: 0 Loop Iteration: 1 Loop Iteration: 2 Loop Iteration: 3 Loop Iteration: 4 Loop Iteration: 5 Loop Iteration: 6 Loop Iteration: 7 Else clause activated. Second Loop - With Break Loop Iteration: 0 Loop Iteration: 1 Loop Iteration: 2 Next up are for loops, which work slightly differently than they do in C. In Python, read more..
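The behaviour in that output can be reproduced with a short Python 3 sketch. run_loop () and its parameters are hypothetical names. The sketch assumes the rule the output illustrates: a while loop's else clause runs only when the loop ends because its condition became false, not when it is left through break.

```python
def run_loop(limit, break_at=None):
    """Collect the lines a while/else loop would print."""
    lines = []
    index = 0
    while index < limit:
        if index == break_at:
            break                       # leaving via break skips the else clause
        lines.append(f"Loop Iteration: {index}")
        index += 1
    else:
        lines.append("Else clause activated.")
    return lines


no_break = run_loop(8)                  # runs to completion, else clause fires
with_break = run_loop(8, break_at=3)    # stops early, else clause is skipped
```

This makes the else clause a convenient place for "nothing was found" handling in search loops.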

  • Page - 302

    261 a human hardcoding each value individually). For example, say you want to loop through a list 1024 times. Rather than type out all 1024 comma-separated list elements, you simply do this: for X in range ( 0, 1024 ): print X (You’ll have to run this yourself; my editors wouldn’t appreciate a dump of 1024 lines. :) range () automatically generates and returns a list read more..
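Because range () stops just before its end value, the bounds are worth checking. A quick Python 3 sketch follows. One difference from the Python 2.2 used in the book: in Python 3, range () returns a lazy sequence object rather than a list, although it still supports len () and indexing.

```python
values = range(0, 1024)

count = len(values)             # number of iterations a for loop would make
first_value = values[0]
last_value = values[-1]         # the end value itself is never produced

# Passing 1023 as the end value yields one iteration fewer.
short_count = len(range(0, 1023))
```

So a loop that must run exactly N times starting from zero uses range ( 0, N ), or simply range ( N ).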

  • Page - 303

    262 This simple example uses the def keyword (short for define) to create a new function called GetMax (). This function accepts two parameters, X and Y. As you can see, parameters need only be listed; the typeless nature of Python means you don’t have to declare them with data types or anything like that. As for the function body itself, it follows the same form read more..
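A sketch of such a function in Python 3 syntax, using the GetMax name and the X and Y parameters from the text. The sample calls are hypothetical. Because parameters carry no declared types, the same definition works for any pair of values that supports the > comparison.

```python
def GetMax(X, Y):
    """Return whichever of the two arguments is greater."""
    if X > Y:
        return X
    else:
        return Y


max_int = GetMax(16, 32)
max_float = GetMax(2.5, -1.0)
max_string = GetMax("apple", "pear")    # strings compare alphabetically
```

Mixing incomparable types, such as a number and a string, raises a TypeError in Python 3.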

  • Page - 304

    263 When MyFunc () is entered, it gives both global variables new values. It then prints them out, and you can see that both variables are indeed different. However, when the function returns and you print the globals again from within their native global scope, you find that GlobalInt has seemingly gone from 128, the value MyFunc () set it to, back to 256. read more..
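The effect comes from Python's scoping rule: assigning to a name inside a function creates a local variable unless the name is declared with the global keyword. A minimal Python 3 sketch, reusing the GlobalInt name and the values 256 and 128 from the text; the two function names are hypothetical.

```python
GlobalInt = 256


def shadowing():
    # Without a global declaration, this assignment creates a new local
    # variable that merely shares the global's name.
    GlobalInt = 128
    return GlobalInt


def rebinding():
    # The global keyword makes the assignment target the module-level name.
    global GlobalInt
    GlobalInt = 128
    return GlobalInt


inside_shadow = shadowing()
after_shadow = GlobalInt        # the global is untouched

inside_rebind = rebinding()
after_rebind = GlobalInt        # the global really changed this time
```

Reading a global from inside a function needs no declaration; only assignment does.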

  • Page - 305

    264 The Debug Library In practice, there’s a slight issue with the Python.org 2.2 distribution; the python22_d.lib file is missing, at least in its compiled form. You can download the source and build it yourself, but for now, running any Python program will result in the following linker error: LINK : fatal error LNK1104: cannot open file "python22_d.lib" The reason read more..

  • Page - 306

    265 #define Py_DEBUG #endif */ That’s everything, so save pyconfig.h with the changes and the Python library will use the non-debug version of python22.lib in all cases. Everything should run smoothly from here on out. Initializing Python Within your program, the initialization and shut down of Python is quite simple. Just call Py_Initialize () at the outset, and Py_Finalize () read more..

  • Page - 307

    266 The actual objects are created by functions in the Python integration API, so you don’t have to worry about that just yet. Reference Counting Python objects are vital to the overall scripting system, and as such, are often used in a number of places at once. Because of this, you can’t safely free a Python object arbitrarily, because you have no idea whether something read more..

  • Page - 308

    267 Simply put, this code loads a script called test_0.py into the pModule object. What’s all this extra junk, though? The first thing you’ll notice is that you’re creating a Python object called pName. It’s created in a function called PyString_FromString (), which takes a C-string and creates a Python object around it. This allows the string to be accessed and read more..

  • Page - 309

    268 IntVar = 256 FloatVar = 3.14159 StringVar = "Python String" # Test out some conditional logic X = 0 Logic = "" if X: Logic = "X is true" else: Logic = "X is false" # Print the variables out to make sure everything is working print "Random Stuff:" print "\tInteger:", IntVar print "\t Float:", FloatVar print "\t String: " + read more..

  • Page - 310

    269 if X > Y: return X else: return Y The GetMax () function accepts two integer parameters and returns whichever value is greater. The question is: how can this function be called from C? The Module Dictionary To understand the solution to this problem, you need to understand a script module’s dictionary. The dictionary of a module is a data structure that maps all of read more..
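The module dictionary can also be inspected from the Python side, which may help make the idea concrete before working with it from C. The sketch below is a Python 3 analogue, not the C integration API: it builds a throwaway module (the name test_script is hypothetical), runs the GetMax () source inside it, then finds the function by name in the module's dictionary and calls it with a tuple of parameters.

```python
import types

# Source text for the script-defined function.
source = '''
def GetMax(X, Y):
    if X > Y:
        return X
    else:
        return Y
'''

# Create an empty module and execute the script inside its namespace.
module = types.ModuleType("test_script")    # hypothetical module name
exec(source, module.__dict__)

# The module's dictionary maps each name to the object it refers to.
func = module.__dict__["GetMax"]

# Parameters travel as a tuple, which is unpacked at the call site.
params = (16, 32)
result = func(*params)
```

The lookup is an ordinary dictionary access, so a misspelled function name fails with a KeyError rather than anything specific to functions.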

  • Page - 311

    270 You have the function, so now what? Now, you need to worry about parameters. You know GetMax () accepts two of them, but how are you going to pass them? You’ll see how in just a moment, when you learn how to call the function, but for now, you need to focus on how the parameters are stored during this process. For this, I’ll briefly cover another Python read more..

  • Page - 312

    271 locally defined Python object pointer. The second call extracts the raw value from this object. Check it out: PyObject * pMax = PyObject_CallObject ( pFunc, pParams ); int iMax = PyInt_AsLong ( pMax ); printf ( "\tResult from call to GetMax ( 16, 32 ): %d\n\n", iMax ); PyObject_CallObject () is the call to make when invoking a script-defined function, provided you read more..

  • Page - 313

    272 // Read in the string and integer parameters if ( ! PyArg_ParseTuple ( pParams, "si", & pstrString, & iRepCount ) ) { printf ( "Unable to parse parameter tuple.\n" ); exit ( 0 ); } // Print out the string repetitions for ( int iCurrStringRep = 0; iCurrStringRep < iRepCount; ++ iCurrStringRep ) printf ( "\t\t%d: %s\n", iCurrStringRep, pstrString ); // Return read more..

  • Page - 314

    273 The last order of business within a host API function (aside from the intended logic itself) is the return value. Because you’re returning Python objects, you have to send something back. If there’s nothing you want to return, just use PyInt_FromLong () to generate the integer value zero. In your case, however, you’ll return the specified repetition count just for the read more..

  • Page - 315

    274 // Create a new module to hold the host API's functions if ( ! PyImport_AddModule ( "HostAPI" ) ) printf ( "Host API module could not be created." ); This function simply accepts a string containing the module’s desired name. In this case, name it HostAPI. You already have the function table prepared, so add it to the module: if ( ! Py_InitModule ( read more..

  • Page - 316

    275 will terminate the loading process. To remedy this, remember to define any modules you’d like your scripts to use before loading the scripts: // Create a new module to hold the host API's functions if ( ! PyImport_AddModule ( "HostAPI" ) ) printf ( "Host API module could not be created." ); // Create a function table to store the host API PyMethodDef HostAPIFuncs read more..

  • Page - 317

    276 Everything should look simple enough, but notice that in the call to RepeatString (), you had to prefix it with HostAPI, the name of the module in which it resides, forming HostAPI.RepeatString (). This is done for the same reason you prefixed the Lua host API functions in the last section with HAPI_—to help prevent name clashes. This way, if the script already read more..

  • Page - 318

    277 Re-coding the Alien Head Demo You’ve hopefully become comfortable by now with the basic process of Python integration, so you can now try something a bit more dynamic and use Python to rewrite the central logic behind the bouncing alien head demo initially coded in C earlier in the chapter. I already covered a lot of the general theory behind how this recoding process read more..

  • Page - 319

    278 Remember, for a host API function to be compatible with Python, it must return a PyObject pointer and accept two PyObject pointers as parameters. Also remember that you always prefix host API functions with HAPI_ to ensure that they don’t clash with any of the other names in the program. Within each function, parameters are extracted using a format string and read more..

  • Page - 320

    279 // Store the host API function table static PyMethodDef HostAPIFuncs [] = { { "GetRandomNumber", HAPI_GetRandomNumber, METH_VARARGS, NULL }, { "BlitBG", HAPI_BlitBG, METH_VARARGS, NULL }, { "BlitSprite", HAPI_BlitSprite, METH_VARARGS, NULL }, { "BlitFrame", HAPI_BlitFrame, METH_VARARGS, NULL }, { "GetTimerState", HAPI_GetTimerState, METH_VARARGS, NULL }, { NULL, NULL, NULL, NULL } }; // read more..

  • Page - 321

    280 void ShutDownPython () { // Decrement object reference counts Py_XDECREF ( g_pFunc ); Py_XDECREF ( g_pDict ); Py_XDECREF ( g_pModule ); Py_XDECREF ( g_pName ); // Shut down Python Py_Finalize (); } Whether or not you’d like to keep all of your main Python objects global in a real project is up to you; I primarily chose to do it here because it helps illustrate the read more..

  • Page - 322

    281 Because Init () won’t take any parameters, you just pass NULL instead of a python object array when calling PyObject_CallObject. This is a flag to the function that lets it know not to look for a parameter list. The last section of code implements the main loop and shuts down Python upon the loop’s termination. It starts by reusing the g_pFunc pointer from the read more..

  • Page - 323

    282 ALIEN_WIDTH = 128 # Width of the alien sprite ALIEN_HEIGHT = 128 # Height of the alien sprite HALF_ALIEN_WIDTH = ALIEN_WIDTH / 2 # Half of the sprite width HALF_ALIEN_HEIGHT = ALIEN_HEIGHT / 2 # Half of the sprite height ALIEN_FRAME_COUNT = 32 read more..

  • Page - 324

    283 while CurrAlienIndex < ALIEN_COUNT: # Set a random X, Y location X = HostAPI.GetRandomNumber ( 0, 639 - ALIEN_WIDTH ) Y = HostAPI.GetRandomNumber ( 0, 479 - ALIEN_HEIGHT ) # Set a random X, Y velocity XVel = HostAPI.GetRandomNumber ( MIN_VEL, MAX_VEL ) YVel = HostAPI.GetRandomNumber ( MIN_VEL, MAX_VEL ) # Set a random spin direction SpinDir = HostAPI.GetRandomNumber ( 0, 2 ) read more..

  • Page - 325

    284 # Blit the background HostAPI.BlitBG () # Update the current frame of animation if HostAPI.GetTimerState ( ANIM_TIMER_INDEX ): CurrAnimFrame = CurrAnimFrame + 1 if CurrAnimFrame > ALIEN_MAX_FRAME: CurrAnimFrame = 0 # Loop through each alien and draw it CurrAlienIndex = 0 while CurrAlienIndex < ALIEN_COUNT: # Get the X, Y location X = Aliens [ CurrAlienIndex ][ 0 ] Y = Aliens [ read more..
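The frame update in that script is a wraparound counter, which can be tested in isolation. A minimal Python 3 sketch follows. NextAnimFrame () is a hypothetical helper, and the value of ALIEN_MAX_FRAME is an assumption (one less than a 32-frame animation), since its definition is not shown here.

```python
ALIEN_MAX_FRAME = 31    # assumed value: last valid index of a 32-frame animation


def NextAnimFrame(CurrAnimFrame, MaxFrame=ALIEN_MAX_FRAME):
    """Advance the animation by one frame, wrapping back to frame 0."""
    CurrAnimFrame = CurrAnimFrame + 1
    if CurrAnimFrame > MaxFrame:
        CurrAnimFrame = 0
    return CurrAnimFrame


# Stepping through one full cycle returns to the starting frame.
frame = 0
for _ in range(ALIEN_MAX_FRAME + 1):
    frame = NextAnimFrame(frame)
```

The same result can be written as ( CurrAnimFrame + 1 ) % ( MaxFrame + 1 ), at some cost in readability.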

  • Page - 326

    285 # Loop through each alien and move it, checking for collisions CurrAlienIndex = 0 while CurrAlienIndex < ALIEN_COUNT: # Get the X, Y location X = Aliens [ CurrAlienIndex ][ 0 ] Y = Aliens [ CurrAlienIndex ][ 1 ] # Get the X, Y velocity XVel = Aliens [ CurrAlienIndex ][ 2 ] YVel = Aliens [ CurrAlienIndex ][ 3 ] # Move the alien along its path X = X + XVel read more..
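The movement step can be sketched as a standalone Python 3 function. This is an illustration under stated assumptions, not the script's actual code: MoveAlien () is a hypothetical helper, the X bounds follow the Lua version's check against 640 and HALF_ALIEN_WIDTH, and the Y bounds are assumed to mirror them using a 480-pixel screen height.

```python
SCREEN_WIDTH = 640
SCREEN_HEIGHT = 480                     # assumed to mirror the X-axis check
ALIEN_WIDTH = 128
ALIEN_HEIGHT = 128
HALF_ALIEN_WIDTH = ALIEN_WIDTH // 2
HALF_ALIEN_HEIGHT = ALIEN_HEIGHT // 2


def MoveAlien(X, Y, XVel, YVel):
    """Move one alien along its path and reverse velocity at the walls."""
    X = X + XVel
    Y = Y + YVel

    # A head may slide halfway off the screen before it bounces.
    if X > SCREEN_WIDTH - HALF_ALIEN_WIDTH or X < -HALF_ALIEN_WIDTH:
        XVel = -XVel
    if Y > SCREEN_HEIGHT - HALF_ALIEN_HEIGHT or Y < -HALF_ALIEN_HEIGHT:
        YVel = -YVel

    return X, Y, XVel, YVel


bounced = MoveAlien(575, 100, 5, 5)     # crosses the right-hand limit
free = MoveAlien(100, 100, 3, -2)       # stays inside the limits
```

Returning the updated values keeps the function free of side effects; the caller decides where to store them, for example back into the Aliens list.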

  • Page - 327

    286 The logic here should speak for itself, and has been covered in the Lua section anyway. Speaking of Lua, you’ll notice that this was one of many references to the Lua version of this demo. If you were to compare the scripts and even the host applications of each of these demos to one another, you’d find that they’re almost exactly alike. This is because, read more..

  • Page - 328

    287 ■ Jython.org: http://www.jython.org/. Jython is an interesting project to port Python in its entirety to the Java platform, opening Python scripting to a whole new set of applications and users. ■ ActiveState: http://www.activestate.com/. Makers of the ActiveState ActivePython distribution. TCL So far this chapter has been dealing with languages that bear at least a reasonable read more..

  • Page - 329

    288 ActiveStateTcl You’ll be using the ActiveStateTcl distribution throughout the course of this chapter. ActiveStateTcl is available for Linux, Solaris, and Windows, implementing Tcl 8.3 (the latest version at the time of this writing). You can download ActiveStateTcl for free from www.activestate.com. It’s a clean and easy-to-use package, which can be installed in Windows simply by read more..

  • Page - 330

    289 ■ include/. The header files necessary to use both the Tcl implementation of ActiveStateTcl, as well as the extensions it provides. You’ll find quite a bit of stuff in here, but the only file in this folder you really need is tcl.h. ■ lib/. The compiled library (.lib) files necessary to use Tcl within your programs. Like include/, it’s a crowded folder, but read more..

  • Page - 331

    290 What, No Compiler? That’s right, most pure versions of Tcl do not ship with a compiler, which means all scripts are loaded by the game directly as human-readable source. Because you should know by now that loading a script at runtime is not a good way to handle compile-time errors, remember to use tclsh to attempt to execute your file beforehand; this will help read more..

  • Page - 332

    291 The Tcl Language Now that you’re familiar with the Tcl distribution, you can move on to the language. Tcl can be difficult to get comfortable with, because there are some deceptively subtle differences in its fundamental nature when compared to the more conventional languages studied thus far. Ironically, Tcl’s incredible simplicity and generic design end up making it read more..

  • Page - 333

    292 example, whereas the previous example would set X to the desired value, the following would cause an error: set 256 X For obvious reasons, I might add. Putting X “into” 256 doesn’t make any more sense than the following would in C: 256 = X; Also, like functions, commands generally return a value. Even set does this; it returns whatever value was set to the read more..

  • Page - 334

    293 As you can see, the output of expr 256 * 256 is 65536, the product of the multiplication. When evaluating the following command: set X [ expr 256 * 256 ] the Tcl interpreter takes the following steps: 1. The first word is read, informing Tcl that a set command is being issued. 2. The second word is read, which tells the set command that the variable X is the read more..

  • Page - 335

    294 It’s just a simple function for adding two integers and returning the result. However, imagine you called it like this: int Sum = Add ( 16 * 16, 128 / 4 ); Both parameters in this case are not immediate integer values, but are rather expressions. Rather than sending the string representation of these expressions to the Add () function, the runtime environment will read more..

  • Page - 336

    295 I haven’t covered the details of expressions yet, but this should help you understand how complex programming can be done using a language based entirely on commands, provided those commands can be nested within one another. What you see is known as command substitution. This is a useful technique and is one of the cornerstones of Tcl programming, but another equally read more..

  • Page - 337

    296 after the dollar sign with its value. So, this too is considered identical from the perspective of set: set X $Y set X 256 Assuming Y is equal to 256, of course. Lastly, let’s see how this can be used to correct the first example: % set X [ expr $Y / 8 ] 32 Presto! The expression now evaluates as intended, without error, and properly assigns 32 to X. One read more..

  • Page - 338

    297 down. It’s like trying to learn trigonometry or calculus without first learning algebra—without that basis firmly in place, you won’t get very far. Anyway, with this initial Tcl philosophy out of the way, let’s get on to actually examining the language (which, as I mentioned previously, is primarily just a matter of learning about the commands in the Tcl read more..

  • Page - 339

    298 many people do (including me, for that matter, when I’m declaring a constant or global) and will be forced to use them in at least some cases. Because I think consistency is important, I suggest you either don’t use semicolons at all (and therefore give all of your comments their own line), or use them everywhere. Variables In Tcl, all values are stored internally read more..

  • Page - 340

    299 tells you is that the purpose of strings in Tcl is different than other languages. The concept of a string in Tcl is less about data and data types, and more about simply grouping words. Anything surrounded in double quotes is interpreted by Tcl to be a single word, even if it includes spaces. This is also the reason why assigning a variable to another variable read more..

  • Page - 341

    300 # Add 15 to MyInt incr MyInt 15 puts $MyInt # Decrement MyInt by 24 incr MyInt -24 puts $MyInt Here’s the example’s output: 16 17 32 8 The last variable-related command I’ll discuss here is append, which you can think of as incr for strings. Because incr only alters the value of integer variables, you’ll get an error if you try passing a string or float to it. read more..

  • Page - 342

    301 immediately (but permanently) change that variable into a string containing the string representation of the number. One thing about append is that its functionality seems redundant; after all, the following append example: # Append a variable using the append command set Title "Running Down " append Title "the Way Up" could be written just as easily with only the read more..

  • Page - 343

    302 # Create an array with four indexes set MyArray(0) 256 set MyArray(1) 512 set MyArray(2) 1024 set MyArray(3) 2048 This creates an array of four elements called MyArray and assigns values to each index. You may notice that, in a departure from my normal coding style, there aren’t spaces around the parentheses and index in the array reference. Normally I’d use MyArray read more..

  • Page - 344

    303 # Create a seemingly two-dimensional array set MyArray(0,0) "This is 0, 0" set MyArray(0,1) "This is 0, 1" set MyArray(1,0) "This is 1, 0" set MyArray(1,1) "This is 1, 1" # Print two of its indexes puts $MyArray(0,0) puts $MyArray(1,1) # Now print two more, using variables as indexes set X 0 set Y 1 puts $MyArray($X,$Y) set X 1 set Y 0 puts read more..

  • Page - 345

    304 Basically, what I’m driving at is the fact that the Tcl language doesn’t support expressions in any way. As you’ve seen, all Tcl really does is pass space-delimited words to commands and perform substitution with the $ and [] notation. So, to provide expression-parsing support, the expr command was created. This seems like a trivial detail, but it’s very important. read more..

  • Page - 346

    305 Table 6.12 Tcl Bitwise Operators

    Operator  Description  Supported Data Types
    <<        Shift Left   Integer
    >>        Shift Right  Integer
    &         And          Integer
    ^         Xor          Integer
    |         Or           Integer
    ~         Unary Not    Integer

    Table 6.13 Tcl Relational Operators

    Operator  Description         Supported Data Types
    <         Less Than           Integer, Float, String
    >         Greater Than        Integer, Float, String
    <=        Less Than or Equal  Integer, Float, read more..

  • Page - 347

    306 Something you can quickly ascertain from these tables is that string operands are only permitted when using the relational operators (<, >, <=, >=, !=, ==). Something you may be wondering, though, is why or how the data type of an operand even matters, because I’ve belabored the fact that Tcl sees everything as strings. This may be true, and Tcl does indeed read more..

  • Page - 348

    307 # Create a variable set X 0 # Print different strings depending on its value if { $X > 0 } { puts "X is greater than zero." } else { puts "X is zero or less." } Which outputs: X is zero or less. What you’re seeing here is a command whose parameters are chunks of Tcl code. The syntax that provides this, the {} notation, is actually a special type read more..

  • Page - 349

    308 Note also that the first parameter passed to an if command is an expression; like expr, if provides its own expression-evaluation capabilities. Lastly, you may be wondering why I’ve again deviated from my usual coding style by putting the opening and closing curly-braces of each code block in unusual places. This is another syntax imposition on behalf of Tcl. read more..

  • Page - 350

    309 Iteration: 2 Iteration: 1 Iteration: 0 Almost identical to C, right? Indeed, while has been implemented in a familiar way. The command takes two parameters, an expression and a code block to execute as long as that expression evaluates to true (which, if you remember, is defined in Tcl as any nonzero value). Here’s the while from the previous example rewritten in a read more..

  • Page - 351

    310 Everything is in roughly the same place, so you should feel pretty much at home. Lastly, just like the other two languages, Tcl gives you break and continue for obvious purposes. break causes the loop to immediately terminate, causing program flow to resume just after the last line of the loop. continue causes the current iteration of the loop to terminate prematurely, read more..

  • Page - 352

    311 from to exit, and its single parameter is returned as the return value. For example, if you changed the custom Add command to look like this: proc Add { X Y } { return 0 expr $X + $Y } puts [ Add 32 32 ] The command would always return 0, no matter what parameters you pass it. The last issue to discuss with custom commands is that of global variables. read more..

  • Page - 353

    312 #Import the global variable global GlobalVar # Print out both the global and local puts $GlobalVar puts $LocalVar } # Call your command TestGlobal The error will no longer occur, and the output will look like this: I'm global variable. Not me, I'm into the local scene. This works because global brings the specified global variable into the function’s local scope until it read more..

  • Page - 354

    313 Once your paths are set, include the main Tcl header: #include <tcl.h> Finally, physically include the tcl83.lib library with your project (remember, of course, that your distribution’s main .LIB file might not be tcl83.lib exactly, unless you’re using ActiveStateTcl version 8.3 like me). At this point, you should be ready to get started. Initializing Tcl Just as Lua read more..

  • Page - 355

    314 Loading and Running Scripts Just as in Lua, Tcl immediately attempts to execute scripts when they’re loaded. Because most of the time, you will simply load a script once and deal with it later, the issue of code in the global scope once again becomes significant. Any code in the global scope of the script will run upon the script’s loading; user-defined commands, read more..

  • Page - 356

    315 set Logic "X is false." } # Print the variables out to make sure everything is working puts "Random Stuff:" puts "\tInteger: $IntVar" puts "\t Float: $FloatVar" puts "\t String: \"$StringVar\"" puts "\t Logic: $Logic" Running the host application with the call to Tcl_EvalFile () will produce the following output: Random Stuff: read more..

  • Page - 357

    316 Remember, the proc command is a Tcl-core command for creating your user-defined commands (or procedures, if you want to think of them like that). Here’s the code to call it: Tcl_Eval ( pTclInterp, "PrintStuff" ); Note that Tcl_Eval () requires you to pass the pointer to your interpreter as well as the command. When this program is run, the following will read more..

  • Page - 358

    317 // Set the return value to an integer Tcl_SetObjResult ( pTclInterp, Tcl_NewIntObj ( iRepCount ) ); // Return the success code to Tcl return TCL_OK; } Everything should look more or less understandable at first, but the function’s signature certainly demands some explanation. Any function exported to a Tcl interpreter is required to match this prototype: int RepeatString ( read more..

  • Page - 359

    318 Returning Values Lastly, values can be returned to the script using the Tcl_SetObjResult () function. This function requires a pointer to the Tcl interpreter in which the function’s caller is executing, and a pointer to a Tcl_Obj structure. You can create this structure on the fly to return an integer value with the Tcl_NewIntObj () function: Tcl_Obj * Tcl_NewIntObj ( read more..

  • Page - 360

    319 Calling the Exported Function from Tcl The RepeatString function exported to Tcl can be called just like any other command. Let’s modify the PrintStuff command a bit to call it: proc PrintStuff {} { # Print some stuff to show we're alive puts "\tPrintStuff was called from the host." # Call the host API command RepeatString and print out its return value set read more..

  • Page - 361

    320 } else { return $Y } } This command is called like any other, using the techniques you’ve already seen. As a test, let’s call it with the integer values 16 and 32: Tcl_Eval ( pTclInterp, "GetMax 16 32" ); The command will of course return 32, but how exactly will it do so? At any time, the last command’s return value can be extracted from the Tcl read more..

  • Page - 362

    321 pGlobalIntObj and pGlobalStringObj are pointers to integer and string Tcl objects, respectively. Reading values from a Tcl script’s global variables into these structures is done with the Tcl_GetVar2Ex () function, like this: pGlobalIntObj = Tcl_GetVar2Ex ( pTclInterp, "GlobalInt", NULL, NULL ); pGlobalStringObj = Tcl_GetVar2Ex ( pTclInterp, "GlobalString", NULL, NULL ); As has been read more..

  • Page - 363

    322 Only the first, second, and fourth parameters matter in the context of this example. As always, start by passing the Tcl interpreter instance you’d like to use. This is followed by the name of the global you’re interested in, a NULL parameter, and a Tcl object structure containing the value you’d like to update the global with. In this case, you use Tcl_NewIntObj read more..

  • Page - 364

    323 each new frame) and rewrite it using Tcl. This will require a host API that wraps the core functionality of the host that the script will need access to, and the body of the C-version of the demo will be almost entirely gutted and replaced with calls to Tcl. The Host API The host API will be the same as it was in the Lua version, but here are the read more..

  • Page - 365

    324 Tcl_CreateObjCommand ( g_pTclInterp, "BlitSprite", HAPI_BlitSprite, ( ClientData ) NULL, NULL ); Tcl_CreateObjCommand ( g_pTclInterp, "BlitFrame", HAPI_BlitFrame, ( ClientData ) NULL, NULL ); Tcl_CreateObjCommand ( g_pTclInterp, "GetTimerState", HAPI_GetTimerState, ( ClientData ) NULL, NULL ); } g_pTclInterp is a global pointer to the Tcl interpreter, and the multiple calls to read more..

  • Page - 366

    325 // Start the current loop iteration HandleLoop { // Let Tcl handle the frame Tcl_Eval ( g_pTclInterp, "HandleFrame" ); // Check for the Escape key and exit if it's down if ( W_GetKeyState ( W_KEY_ESC ) ) W_Exit (); } } By running this command once per frame, the aliens will move around and be redrawn consis- tently. This wraps up the host application, so let’s read more..

  • Page - 367

    326 set ANIM_TIMER_INDEX 0; # Animation timer index set MOVE_TIMER_INDEX 1; # Movement timer index You also need two globals: an array to hold the alien heads, and a counter to track the current frame of the animation. Remember, Tcl’s lack of multidimensionality can be easily sidestepped by cleverly naming indexes, so don’t worry about read more..

  • Page - 368

    327 # Set the X, Y velocity set Aliens($CurrAlienIndex,XVel) [ GetRandomNumber $MIN_VEL $MAX_VEL ]; set Aliens($CurrAlienIndex,YVel) [ GetRandomNumber $MIN_VEL $MAX_VEL ]; # Set the spin direction set Aliens($CurrAlienIndex,SpinDir) [ GetRandomNumber 0 2 ]; } } Remember that your “constants” are actually just typical globals, which need to be imported into the command’s local scope with the read more..

  • Page - 369

    328 # Blit the background image BlitBG; # Increment the current frame in the animation if { [ GetTimerState $ANIM_TIMER_INDEX ] == 1 } { incr CurrAnimFrame; if { $CurrAnimFrame >= $ALIEN_FRAME_COUNT } { set CurrAnimFrame 0; } } # Blit each sprite for { set CurrAlienIndex 0; } { $CurrAlienIndex < $ALIEN_COUNT } { incr CurrAlienIndex; } { # Get the X, Y location set X read more..

  • Page - 370

    329 # Get the X, Y location set X $Aliens($CurrAlienIndex,X); set Y $Aliens($CurrAlienIndex,Y); # Get the X, Y velocities set XVel $Aliens($CurrAlienIndex,XVel); set YVel $Aliens($CurrAlienIndex,YVel); # Increment the paths of the aliens incr X $XVel incr Y $YVel set Aliens($CurrAlienIndex,X) $X set Aliens($CurrAlienIndex,Y) $Y # Check for wall collisions if { $X > 640 - $HALF_ALIEN_WIDTH || $X < read more..

  • Page - 371

    330 Advanced Topics As usual, I couldn’t possibly fit a full description of the language here, so there’s still plenty to learn if you’re interested. Here are some of the semi-advanced to advanced topics to consider pursuing as you expand your knowledge of Tcl: ■ Tk. Naturally, Tk is the logical next step now that you’ve attained familiarity and comfort with the Tcl read more..

  • Page - 372

    331 ■ ActiveState: http://www.activestate.com/. Makers of the ActiveStateTcl distribution used throughout this chapter. ■ The Tcl’ers Wiki: http://mini.net/tcl/. A collaboratively edited Web site dedicated to Tcl and its user community. Good source of reference material, discussions, and projects. WHICH SCRIPTING SYSTEM SHOULD YOU USE? You’ve learned quite a bit about these three scripting read more..

  • Page - 373

    332 ■ Speed of development. Aside from difficulty, building a scripting system from scratch takes a long time. If you find yourself working on a commercial project for an established game company, or just don’t want to spend two years from start to finish on a personal project, you may find that there simply aren’t enough hours in the day to do both. Because read more..

  • Page - 374

    333 ■ No one knows your game better than you. Optimization and freedom of creativity are two things that are always on the minds of game developers. You may find that the only way to get a scripting language small enough, fast enough, or specific enough for your game is to build it yourself. To put it simply, scripting languages are sometimes better off when read more..

  • Page - 375

    334 huh? Along the way, you’ve learned a lot about how these three scripting systems work, which means you’ll be much better prepared for the coming chapters, in which you design your own scripting language. ON THE CD We built three major projects throughout the course of this chapter by recoding the original bouncing alien head demo in three different scripting languages. read more..

  • Page - 376

    Designing a Procedural Scripting Language “It’s a Cosby sweater. A COSBY SWEATAH!!!” ——Barry, High Fidelity CHAPTER 7 read more..

  • Page - 377

    336 Now that you’ve learned how scripting systems are generally laid out, and even gained some hands-on experience with a few of the existing solutions, you’re finally on the verge of getting started with the design and construction of your own scripting engine. As you’ve learned, the high-level language is quite possibly the most important, or more specifically, the read more..

  • Page - 378

    337 GENERAL TYPES OF LANGUAGES Programming languages, like people, come in a wide variety of shapes and sizes. Also like people, certain languages are better at doing certain things than others. Some languages have broad and far-reaching applications, and seem to do pretty much everything well. Other languages are narrow and focused, being applicable to only a small read more..

  • Page - 379

    338 a small, simple operation like moving the value of a variable around or performing arithmetic. Operands further describe instructions; like the parameters of a function, they tell the virtual machine exactly which data or values the instruction should operate on. Let’s start with an example. Say you’re writing a script that maintains three variables: X, Y, and Z. Just to read more..

  • Page - 380

    339 This makes endless loops very easy to code. Consider the following: Move X, 0 Label: Add X, 1 Jump Label This simple code snippet will set a variable called X to zero, and then increment it infinitely. As soon as the virtual machine hits the Jump instruction, it will jump back to the instruction immediately following Label, read more..
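
    To make the Move/Add/Jump snippet concrete, here is a minimal sketch of a virtual machine loop in C. It is not the book's implementation: the OpCode values, the Instr layout, the single variable X, and the max_steps cap (added so the "endless" loop can be stopped) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for the three instructions in the snippet. */
typedef enum { OP_MOVE, OP_ADD, OP_JUMP } OpCode;

typedef struct {
    OpCode op;
    int    operand;  /* immediate value, or target instruction index for OP_JUMP */
} Instr;

/* Executes at most max_steps instructions against a single variable X
   and returns the final value of X. */
int run_program(const Instr *program, size_t count, int max_steps)
{
    int    x  = 0;
    size_t ip = 0;  /* instruction pointer */

    assert(program != NULL && count > 0);

    for (int step = 0; step < max_steps && ip < count; ++step) {
        const Instr *cur = &program[ip];
        switch (cur->op) {
        case OP_MOVE: x  = cur->operand; ++ip; break;
        case OP_ADD:  x += cur->operand; ++ip; break;
        case OP_JUMP: ip = (size_t)cur->operand; break;
        }
    }
    return x;
}

/* Move X, 0 / Label: Add X, 1 / Jump Label, with Label resolved to index 1. */
static const Instr endless_loop[] = {
    { OP_MOVE, 0 },
    { OP_ADD,  1 },
    { OP_JUMP, 1 },
};
```

    Calling run_program ( endless_loop, 3, 7 ) executes the Move once and the Add three times before the step cap stops it, so X ends at 3. Without the cap, the Jump would keep sending control back to the Add forever.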

  • Page - 381

    340 high-level language that will sit on top of it. As I mentioned previously, however, the one real advantage to a language like this is that it’s really quite easy to compile. As you can probably imagine, code that looks like this: X = Y * Z + ( Q / 10.5 ) + P - 2 Is considerably harder for a compiler to parse and understand than something simpler (albeit read more..

  • Page - 382

    341 game is just silly, at least from the perspective of the script coder. Scripts are usually slow compared to true, compiled machine code whether they’re in the form of an assembly-style language or a higher level language, so you might as well make them easier to use. The first thing to add, then, is support for more complex expressions. This in itself is a rather read more..

  • Page - 383

    342 TrueBlock: ; Handle true condition here Add X, Y Mul Z, 2 Mov Y, Z SkipTrueBlock: It works, and it works much better thanks to the ability to code Boolean expressions directly into scripts, but it’s a bit backwards, and it’s still too low level. First of all, you still have to use labels and jumps to route the flow read more..

  • Page - 384

    343 Looks like the same problem, huh? You’re being forced to emulate the nice, tidy organization of code blocks with labels and jumps, and the expression that you evaluate at each iteration of the loop to determine whether you should keep going is below the loop body, which is backwards from the while loop in C. Once again, these are things that the language should be read more..

  • Page - 385

    344 ing your way up from what is virtually the simplest type of language you could implement. Now that you know exactly why you should aim for a language like this, let’s have a look at some of the more complex language features. FUNCTIONS What if you wanted to add trigonometry to your expressions? In other words, what if you wanted to do something like this: Theta = read more..

  • Page - 386

    345 This function of course computes the Fibonacci Sequence, a sequence defined such that each element X is defined as the sum of the previous two elements (in other words, X - 1 and X - 2). The Fibonacci Sequence is a common example of basic recursive algorithms. For example, here are the first few terms from the sequence: 1, 1, 2, 3, 5, 8, 13, … In general, functions read more..
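
    The function this page refers to is not visible in the excerpt, so what follows is only a sketch of the recursive definition it describes, written in C. The name fib and the 1-based indexing are assumptions.

```c
#include <assert.h>

/* Returns the n-th term (1-based) of 1, 1, 2, 3, 5, 8, 13, ...
   Each term after the second is the sum of the previous two. */
unsigned long fib(unsigned int n)
{
    assert(n >= 1);
    if (n <= 2)
        return 1;  /* the first two terms are both 1 */
    return fib(n - 1) + fib(n - 2);
}
```

    With this indexing, fib ( 7 ) gives 13, matching the last term listed above. The naive recursion recomputes smaller terms many times, which is fine for a demonstration but slow for large n.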

  • Page - 387

    346 players, you may want to write that algorithm once in a function, and then call that function whenever you need to level-up a player from any subsequent scripts. C programmers are certainly familiar with the concept of a standard library, so you should be able to imagine the possibilities as they would relate to games, once a game project gets complicated enough. read more..

  • Page - 388

    347 structure. Generally speaking, objects manage both a set of data known as the properties that describe a given entity (such as an enemy in your game) as well as a group of functions known as methods that operate specifically on that data and implement the entity’s behavior and functionality (see Figure 7.3). Figure 7.3 Objects combine data and code into single read more..

  • Page - 389

    348 entities in a game to physical objects and therefore control them and their behavior in a very intuitive and lifelike manner. For example, UnrealScript, the scripting language used for the Unreal series of games, is based entirely around this concept. However, OOP-based languages are not only far more complex to design than their procedural counterparts, they’re also much more read more..

  • Page - 390

    349 see in the following sections, you’ll have no shortage of flexibility when you actually start scripting. Regardless, OOP is still something important to keep in mind. XtremeScript Language Overview XtremeScript is the name of the scripting system you’re going to build, but more importantly, it’s the name of the language the system is based around. As a result, I’ll read more..

  • Page - 391

    350 You already know of some fairly serious differences from C, such as the fact that the language will be typeless and have built-in support for strings. These are more along the lines of additions to the language, however, as opposed to removals. The real differences will be in the form of features that will not be supported, such as pointers. Pointers not read more..

  • Page - 392

    351 Syntax and Features Fortunately, I’ve done the (somewhat) hard part already and put together a full language specification for you to work from. As I said, it’s a clear derivative of C, which gives it a familiar syntax and most of its popular features. There are a number of cutbacks here and there, in addition to a few small additions or modifications, but I read more..

  • Page - 393

    352 Makes things easier, huh? The only restriction is that variables must be declared before using them, which contrasts with a number of other scripting languages that don’t force you to do this. The reason I’ve chosen to enforce this policy is that positively evil logic errors can be the result of simple variable typos, such as the following: MyValue = 256; if ( read more..

  • Page - 394

    353 X = "Hello"; // Set X to a greeting Y = "Goodbye"; // Set Y to the opposite X = Y; // Now X and Y both contain "Goodbye" Which is the same way you’d deal with other data types, such as integers and Booleans. However, in the event that you need to access individual characters or substrings from read more..

  • Page - 395

    354 Remember, even though more complex structures like C’s struct aren’t supported, you can simulate them with relative ease simply by using different elements of the array. For example, imagine that you wanted to port a structure like this from C++: struct MyStruct { bool X; int Y; float Z; }; MyStruct Foo; Foo.X = true; Foo.Y = 32; Foo.Z = 3.14159; // I've read more..
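
    The XtremeScript version of this port is cut off in the excerpt, so the following is only a rough C analogue of the idea: each field of the structure gets a fixed array index. The index names and init_foo are hypothetical, and because a C array holds a single type (unlike a typeless XtremeScript array), double stands in for all three fields.

```c
#include <assert.h>
#include <stddef.h>

/* One named index per field of MyStruct. */
enum { FOO_X, FOO_Y, FOO_Z, FOO_FIELD_COUNT };

/* Fills the array the same way the struct assignments do. */
void init_foo(double foo[FOO_FIELD_COUNT])
{
    assert(foo != NULL);
    foo[FOO_X] = 1;        /* Foo.X = true    */
    foo[FOO_Y] = 32;       /* Foo.Y = 32      */
    foo[FOO_Z] = 3.14159;  /* Foo.Z = 3.14159 */
}
```

    The named indexes play the role of field names, so foo [ FOO_Y ] reads almost like Foo.Y. The trade-off is that nothing stops code from using a wrong index, which a real struct would catch at compile time.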

  • Page - 396

    355 Notice that unlike C, this language provides a built-in exponent operator using the familiar caret (^). Also, as is the case with C, the increment (++) and decrement (--) operators come in both pre- and post- forms, so both of the following are legal: X ++; ++ X;

    Table 7.1 XtremeScript Arithmetic Operators

    Operator  Description
    +         Addition (Binary)
    -         Subtraction (Binary)
    $         read more..

  • Page - 397

    356 Bitwise Bitwise operations are generally used for manipulating the individual bits of integer variables. XtremeScript’s bitwise operators are listed in Table 7.2: In another slight divergence from C, notice that the exclusive or operator is no longer the caret. I swapped that with the exponent operator. It is now the hash mark (#) instead. read more..

  • Page - 398

    357 Precedence Lastly, let’s quickly touch on operator precedence. Precedence is a set of rules that determines the order in which operators are evaluated. For example, recall the PEMDAS mnemonic from school, which taught us that, for example, multiplication (M) is evaluated before subtraction (S). So, 8 - 4 * 2 is equal to zero, [Table 7.3 XtremeScript Logical Operators] read more..

  • Page - 399

    358 because 4 * 2 is evaluated first, the result of which is then subtracted from 8. If subtraction had higher precedence, the answer would be 8, because 8 - 4 would be multiplied by 2. XtremeScript operators follow pretty much the same precedence rules as other languages like C and Java, as illustrated in Table 7.5 (operators are listed in order of decreasing precedence, read more..
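
    The two outcomes described here can be checked directly in C, which uses the same rule for these two operators. The function names below are made up for the example.

```c
#include <assert.h>

/* a - b * c under the usual rules: the multiplication happens first. */
int multiply_first(int a, int b, int c)
{
    return a - b * c;
}

/* The result if subtraction bound more tightly, forced with parentheses. */
int subtract_first(int a, int b, int c)
{
    return (a - b) * c;
}

/* Checks the two results quoted in the text; returns 1 when both hold. */
int check_text_example(void)
{
    assert(multiply_first(8, 4, 2) == 0);  /* 8 - (4 * 2) */
    assert(subtract_first(8, 4, 2) == 8);  /* (8 - 4) * 2 */
    return 1;
}
```

    Parentheses are the usual way to override precedence when the default grouping is not what you want.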

  • Page - 400

    359 Branching First up is if, which works just like most other languages. It accepts a single Boolean expression and can route program flow to either a true or a false block, with the help of the optional else keyword: if ( Expression ) { // True } else { // False } Iteration XtremeScript supports two simple methods for iteration. First up is the while loop, which looks like read more..

  • Page - 401

    360 The funny thing about the for loop is that it’s really just another way to write a while loop. Consider the following code example: for ( X = 0; X < 16; ++ X ) { Print ( X ); } This code could be just as easily written as a while loop, and behave in the exact same way: X = 0; while ( X < 16 ) { Print ( X ); ++ X; } Nifty, huh? You might be read more..
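
    The same equivalence holds in C. In this sketch the two loops from the text are wrapped in functions that record each value of X into a caller-supplied buffer instead of printing it, so the results can be compared; the function names and the buffer parameters are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Records 0..15 using a for loop; returns how many values were written. */
size_t count_with_for(int *out, size_t capacity)
{
    size_t n = 0;
    assert(out != NULL);
    for (int x = 0; x < 16 && n < capacity; ++x)
        out[n++] = x;
    return n;
}

/* The same loop as a while: initialization moves above the loop,
   and the increment moves to the end of the body. */
size_t count_with_while(int *out, size_t capacity)
{
    size_t n = 0;
    int    x = 0;
    assert(out != NULL);
    while (x < 16 && n < capacity) {
        out[n++] = x;
        ++x;
    }
    return n;
}
```

    Given buffers of at least 16 elements, both functions write the same 16 values in the same order.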

  • Page - 402

    361 Functions Functions are an important part of XtremeScript, and are the very reason why you call it a procedural language to begin with. You’ll notice a small amount of deviation from C syntax, when dealing with XtremeScript functions, however, so take note of those details. Functions are declared with the func keyword, unlike C functions, which are declared with the read more..

  • Page - 403

    362 This would cause a compile-time error because at the time Func1 () is called in Func0 (), Func1 () hasn’t been defined yet and the compiler has no evidence that it ever will be. C++ solves this problem with function prototypes, which are basically declarations of the function that precede its actual definition and look like this: void Func0 (); void Func1 (); void read more..

  • Page - 404

    363 Escape Sequences One important but often unnoticed addition to a language is the escape sequence. Escape sequences allow, most notably, double quotes to be used within string literal values without confusing the compiler. XtremeScript’s escape sequence syntax is familiar, although we’ll only be implementing two: \" for escaping double-quotes, and \\, for escaping the read more..
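
    As a rough illustration of what handling just these two escape sequences involves, here is a small C function that resolves them in a string. It is a sketch, not the book's compiler code: the name unescape, the buffer-size convention, and the choice to copy any other backslash through unchanged are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, turning \" into " and \\ into \.
   Any other backslash is copied unchanged.
   Returns the output length, or (size_t)-1 if dst is too small. */
size_t unescape(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; src[i] != '\0'; ++i) {
        char c = src[i];
        if (c == '\\' && (src[i + 1] == '"' || src[i + 1] == '\\'))
            c = src[++i];       /* drop the backslash, keep the escaped char */
        if (out + 1 >= dst_size)
            return (size_t)-1;  /* no room for this char plus the terminator */
        dst[out++] = c;
    }
    if (dst_size == 0)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

    For example, the ten characters say \"hi\" come out as the eight characters say "hi".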

  • Page - 405

    364 Note the use of quotation marks. The XtremeScript compiler won’t contain any default path information, so the greater-than/less-than symbol syntax used in C won’t be included. We’ll also include a watered-down version of #define, which will be useful for declaring constants: #define THIS_IS_A_CONSTANT 32 var X = THIS_IS_A_CONSTANT; I say watered-down because this will be the read more..

  • Page - 406

    365 SUMMARY This chapter has been a relatively easy one due to its largely theoretical nature, and I hope it’s been fun (or at least interesting), because designing the language itself is usually the most enjoyable and creative part of creating a scripting system (in my opinion). More importantly, however, I hope that you’ve learned that creating a language even as simple read more..

  • Page - 407

    This page intentionally left blank read more..

  • Page - 408

    Part Four Designing and Implementing a Low-Level Language read more..

  • Page - 409

    This page intentionally left blank read more..

  • Page - 410

    Assembly Language Primer “Are you insane in the membrane?” ——Principal Blackman, Strangers with Candy CHAPTER 8 read more..

  • Page - 411

    370 In the last chapter, we finally sat down and designed the language you’re ultimately going to implement later in the book. This was the first major step towards building your own scripting system, and it was a truly important one. Obviously, a scripting system hinges on the design of the language around which it’s based; failing to take the design of this read more..

  • Page - 412

    371 you can see is the 2 foot x 2 foot surrounding area, it’d be hard to execute a plan like “walk to the center of the park.” However, if someone broke it down into simple instructions, like “take four steps forward, and then take two steps right (to avoid the tree), and then take another 10 steps forward, turn 90 degrees, and stop” you’d find it to read more..

  • Page - 413

    372 Besides, high-level code compilation is a large and complicated task and is orders of magnitude more difficult than the assembly of low-level code. It’ll be nice to see a working version of your system early on to give you the motivation to push through such a difficult subject later. HOW ASSEMBLY WORKS Assembly language is often perceived by newcomers as awkward to read more..

  • Page - 414

    373 saw operands in the Mov example. Mov is a general-purpose instruction for moving memory from one area to another. Without operands, you’d have no way to tell Mov what to move, or where to move it. Imagine a Mov instruction that simply looked like this: Mov Doesn’t make much sense, does it? Mov does require operands, of course—two of them to be exact—the read more..

  • Page - 415

    374 So, this means you need to perform three arithmetic instructions: an addition, a multiplication, and a division. The result of these three operations will be the same as the single expression listed previously. You can then put this value in X and your task will be complete. Here’s one question though: step two says you have to multiply the sum of Y and Z by read more..

  • Page - 416

    375 So, using only a handful of instructions (Mov, Add, Mul, and Div), you’ve managed to recreate the majority of the expression parsing abilities of C using assembly. Granted, it’s a far less intuitive way to code, but once you get some practice and experience it becomes second nature. Jump Instructions Normally, assembly language executes in a sequential fashion from the read more..

  • Page - 417

    376 You can refer to the “top” of this block of code as the while line, whereas the “bottom” of the block is the closing bracket (}). Everything in between represents the actual loop itself. So, to rewrite this loop in assembly-like terms, consider the following: LoopStart: ; ... ; ... ; ... Jmp LoopStart Just like in C, you can define line labels in assembly. The Jmp read more..

  • Page - 418

    377 Here, the code is almost identical, right? As you can see, assembly doesn’t have to be all that different. In a lot of ways it strongly parallels C (which, in fact, was one of C’s original design goals back in the ultra old-school K&R days). Conditional Logic Of course, unconditional jumps are about as useful as infinite loops are in C, so you need a more read more..

  • Page - 419

    378 in assembly, you first need an instruction that facilitates comparisons. In the case of Intel 80X86 assembly, this instruction is called Cmp (short for Compare). Here’s an example: Cmp X, Y This instruction will compare the two values, just like you need. The question, though, is where does the result of the comparison go? For now, let’s not worry about that. read more..

  • Page - 420

    379 When performing conditional logic in assembly, there are basically two ways to go about it. Both methods involve marking blocks of code with line labels, but the exact placement of the code blocks and labels differs. Here’s the first approach (check out Figure 8.5 to see it graphically): Cmp X, Y JG TrueCase ; Execute false case Jmp SkipTrueCase read more..

  • Page - 421

    380 you want. Because of this, you need to put an unconditional jump (Jmp) after the false case to skip past the true case. This ensures that no matter what, only one of the two cases will be executed based on the outcome of the comparison. This approach works well, but there is one little gripe; the code blocks are upside down, at least compared to their usual read more..
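
    The label layout of this first approach can be mirrored in C with goto, which makes the "upside down" ordering easy to see. This is only an illustration: the function name and the 0/1 return value are assumptions, and the if/goto pair stands in for the Cmp/JG pair.

```c
#include <assert.h>

/* Returns 1 if the true case ran, 0 if the false case ran. */
int compare_and_branch(int x, int y)
{
    int result;

    if (x > y) goto TrueCase;  /* Cmp X, Y / JG TrueCase */

    result = 0;                /* false case comes first in the listing */
    goto SkipTrueCase;         /* Jmp SkipTrueCase */

TrueCase:
    result = 1;                /* true case */

SkipTrueCase:
    assert(result == 0 || result == 1);  /* exactly one case has run */
    return result;
}
```

    Without the unconditional goto after the false case, execution would fall through into TrueCase and run both blocks.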

  • Page - 422

    381 LoopStart: ; ... ; ... ; ... Jmp LoopStart Here, the loop executes exactly from the declaration of the LoopStart label, all the way down to the Jmp, before moving back to the label and reiterating. Once again, however, this loop would run indefinitely and therefore be of little use to you. Fortunately, you learned how conditional logic works in the last read more..

  • Page - 423

    382 Once again you’re introduced to another instruction, Sub, which Subtracts the second operand from the first. As for the code itself, the example starts by Moving 16 into X, which implements the assignment statement in the C version. You then create a line label to denote the top of the loop block; this is what you’ll jump back to at each iteration. Following the read more..

  • Page - 424

    383 And because you’ve already managed to translate a while loop to assembly (albeit a slightly reversed one), you can certainly manage for loops as well. You’ve made a lot of progress so far; understanding how expressions, conditional logic, and iteration work in assembly is a huge step forward. Now, let’s dig a bit deeper and see how assembly will actually interact read more..

  • Page - 425

    384 but noble missions to harass the girls of the neighborhood. Now neighborhood spying is risky business, and requires a secure method of communication in order to properly get orders to field agents without enemy forces intercepting the message. Because of this, we had to devise what is without a doubt the most foolproof, airtight method of encryption man has ever dared to read more..

  • Page - 426

    385 to numeric codes. Of course, these numeric codes have a name—they’re called opcodes. “Opcode” is an abbreviation of Operation Code. This makes pretty good sense, because each numeric code corresponds to a specific operation, as you’ve seen. These are important terms, however, and a lot of people screw them up. Instructions can come in two forms; the numeric opcode that read more..
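To make the mnemonic-to-opcode relationship concrete, here is a minimal lookup sketch. The specific opcode numbers, the InstrLookup type, and the GetOpcodeByMnemonic name are all assumptions made for illustration; they are not taken from the book.

```c
#include <stddef.h>
#include <string.h>

/* One entry pairs the human-readable mnemonic with its numeric opcode. */
typedef struct {
    const char *pstrMnemonic;
    int iOpcode;
} InstrLookup;

/* Hypothetical opcode assignments, chosen only for this example. */
static const InstrLookup g_InstrTable[] = {
    { "Mov", 0 }, { "Add", 1 }, { "Sub", 2 }, { "Mul", 3 }
};

/* Returns the opcode for a mnemonic (case-sensitive), or -1 if unknown. */
int GetOpcodeByMnemonic(const char *pstrMnemonic)
{
    size_t i;
    for (i = 0; i < sizeof g_InstrTable / sizeof g_InstrTable[0]; ++i)
        if (strcmp(g_InstrTable[i].pstrMnemonic, pstrMnemonic) == 0)
            return g_InstrTable[i].iOpcode;
    return -1;
}
```

A linear search is fine for a handful of instructions; a larger instruction set might prefer a sorted table or hash.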

  • Page - 427

    386 RISC versus CISC So, now you understand how assembly language programming basically works and you have a good idea of the overall process of converting assembly to machine code. Throughout the last few pages you’ve had a lot of interaction with various instructions, from the arithmetic instructions (Add and Mul) to the conditional branching family (Cmp, JG, and so on). read more..

  • Page - 428

    387 CISC system has reduced the overhead of instruction processing by a factor of four (despite the fact that the instruction itself will take longer to execute and be more complex on the CISC processor). Electrical engineering is an interesting subject, but you’re here to build a virtual machine for a scripting system, so let’s shift the focus back to software. In a read more..

  • Page - 429

    388 as possible. You shouldn’t do so much in C that you end up restricting the freedom of the scripts, because that’d defeat the whole purpose of this project, but you must remember that scripting involves significant overhead and should be minimized wherever possible. Orthogonal Instruction Sets In addition to the RISC versus CISC decision when designing an instruction set, read more..

  • Page - 430

    389 grammer, so it’s one of a few subtle details you’ll be ironing out in the design of your own assembly language. Registers Before moving on, I’d like to address the issue of registers. Those of you who have some assembly experience might be wondering if the virtual machine of a scripting system has any sort of analog to a real CPU’s register set. Before read more..

  • Page - 431

    390 Most runtime environments, whether they’re virtual or physical machines, provide some sort of a runtime stack (also known simply as a stack). The stack’s inherent ability to grow and shrink, along with the rigid and predictable order in which data is pushed on and popped off, makes it the ideal data structure for managing frequently changing data, namely, the read more..

  • Page - 432

    391 stack itself is a global structure, meaning it’s available to all parts of the program. That’s why you can push something on before calling a function and still access it from within that function. Figure 8.10 shows general stack use in assembly. Getting back on track, you don’t need to “mark” the end of the function. Instead, you can just end it with another read more..

  • Page - 433

    392 adds one to its own address to make sure the function returns to the following instruction, not itself; otherwise you’d have an infinite loop on your hands. Ret, on the other hand, is a bit different. It also performs an unconditional jump, but you don’t have to pass it a label. Instead, it jumps to whatever address it finds on the top of the stack. In read more..
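The interplay between Call and Ret can be sketched with a toy machine state. Everything here (the MiniVM structure, its field names, and the fixed stack size) is a hypothetical illustration rather than the XVM's real design, and bounds checking is omitted for brevity.

```c
#define MINI_STACK_SIZE 64

typedef struct {
    int aiStack[MINI_STACK_SIZE]; /* the runtime stack */
    int iTop;                     /* number of elements currently pushed */
    int iInstrPtr;                /* index of the instruction being executed */
} MiniVM;

/* Call: push the index of the *following* instruction, then jump. */
void ExecCall(MiniVM *pVM, int iTargetIndex)
{
    pVM->aiStack[pVM->iTop++] = pVM->iInstrPtr + 1;
    pVM->iInstrPtr = iTargetIndex;
}

/* Ret: pop whatever address is on top of the stack and jump to it. */
void ExecRet(MiniVM *pVM)
{
    pVM->iInstrPtr = pVM->aiStack[--pVM->iTop];
}
```

Pushing iInstrPtr + 1 rather than iInstrPtr is what prevents Ret from landing back on the Call itself.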

  • Page - 434

    393 ing whichever method you choose. Following the parameters, the return address is pushed, as already discussed. The function is then invoked, and execution begins at its entry point. As the function executes, it will of course refer to these parameters you’ve sent it, which means it’ll need to read the stack. Rather than pop the values off, however, it’ll instead access read more..

  • Page - 435

    394 it’ll be found in the reverse order if you move from the top of the stack down. The return address will be at the top, with everything else following it, so it’ll look like this: Return Address Parameter Z Parameter Y Parameter X This means that return address is at the top of the stack, Z is at the top of the stack minus 1, Y is at the top of the read more..
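Reading parameters relative to the top of the stack, without popping them, might look like the following sketch. The PeekFromTop name and the plain int array standing in for the stack are assumptions for illustration.

```c
/* piStack: stack storage, growing toward higher indices.
   iTop:    index of the topmost element.
   iOffset: distance below the top (0 = return address, 1 = Z, 2 = Y, 3 = X
            for the layout described above). */
int PeekFromTop(const int *piStack, int iTop, int iOffset)
{
    return piStack[iTop - iOffset];
}
```

For example, if the caller pushed X, Y, and Z and then the return address, PeekFromTop with an offset of 1 reads Z and an offset of 3 reads X, leaving the stack untouched.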

  • Page - 436

    395 way to pass a return value on the stack would involve the function pushing it with the intention of the caller popping it back off. Unfortunately, you’d push this value after the parameters and return address, meaning the return value would now be above everything else, on the top of the stack. The problem is that once the Ret instruction is executed, it’ll attempt read more..

  • Page - 437

    396 example, the function might be nested into itself six levels deep and thus have six stack frames on the stack. The code for the function is not repeated anywhere, because it doesn’t change from one instance to the next. However, the data the function acts upon (namely, its locally defined variables) does change from one instance to another quite significantly. This is read more..

  • Page - 438

    397 All in all, this section is meant to show you how important the stack is when discussing runtime environments. Your language won’t support dynamically allocated data, which means that the only structure you need to store an entire script’s variables and arrays is a single runtime stack (in addition to a single register for returning values from functions to callers). In read more..

  • Page - 439

    398 XtremeScript system. You’ll get started on that in this chapter by designing the assembly language of the XtremeScript virtual machine, which I like to call XVM Assembly. XVM Assembly is what your scripts will ultimately be reduced to when you run them through the XtremeScript compiler that you’ll develop later on in this book. For now, however, it’ll be your first read more..

  • Page - 440

    399 tion. Therefore, even the assembler must statically allocate arrays, and should therefore have array functionality built-in. So, in addition to variable references like this: Mov X, Y XVM Assembly will also directly support array indexing like this: Mov X, MyArray [ Y ] I’ll talk about how to declare arrays a bit later. The last real issue regarding data is read more..

  • Page - 441

    400 specialized ones. One thing to note about Mov, however, is that its name is somewhat misleading. The instruction doesn’t actually move anything, in the sense that the Source operand will no longer exist in its original location afterwards. A more logical name would be Copy, because the result of the instruction is two instances of Source. Expect Mov to be your most read more..

  • Page - 442

    401 calculate the exponent using XVM assembly itself. This means you’d have to perform a loop of repetitive multiplication. This would be significantly slower than simply providing an Exp instruction that takes direct advantage of a far faster C implementation. These extra instructions are good examples of how to offload more of the work to C, while preserving the flexibility of read more..

  • Page - 443

    402 String Processing Concat String0, String1 GetChar Destination, Source, Index SetChar Index, Destination, Source XtremeScript is a typeless language with built-in support for strings. In another example of a CISC-like design decision, I’ve chosen to provide a set of dedicated string-processing functions for easy manipulation of string data as read more..

  • Page - 444

    403 itself, which are as follows: Jump if Equal (JE), Jump if Not Equal (JNE), Jump if Greater (JG), Jump if Less (JL), Jump if Greater or Equal (JGE), and Jump if Less or Equal (JLE). In all cases, Label must be a line label. The Stack Interface Push Source Pop Destination As you have learned, the runtime stack is vital to read more..

  • Page - 445

    404 provide a registered function of the same name. Without going into too much more detail, you can safely assume that this is how XtremeScript interacts with the host API. You’ll find that this approach is rather similar to the scripting systems discussed in Chapter 6. I’ll discuss the exact nature of the host interface in the coming chapters. Miscellaneous Pause read more..

  • Page - 446

    405 All of these questions can be answered with directives. A directive is a special part of the script’s source code that is not reduced to machine code and therefore is not part of the final executable. However, the information a directive provides helps the assembler shape the final version of the machine code output, and is therefore just as important as the read more..

  • Page - 447

    406 Functions The instruction set lets you write code, the var directives let you statically allocate data, so all that’s really left is declaring functions. The Func directive can be used to “wrap” a block of code that collectively is considered a function with somewhat C-style notation. Here’s an example: Func Add { Param Y Param X Var Sum Mov read more..

  • Page - 448

    407 Escape Sequences Because game scripting often involves scripted dialogue sequences, it’s not uncommon to find a heavy use of the double quote (“) symbol for quotes. Unfortunately, because strings themselves are delimited with that same symbol, you need a way for the assembler to tell the difference between a quotation mark that’s part of the string’s content, and the read more..

  • Page - 449

    408 SUMMARY OF XVM ASSEMBLY You’ve covered a lot of ground here in a fairly short space, so here are a few important bullet points to remember just to make sure you stay sharp: ■ Assembly language and machine code are basically the same thing; the only real difference is how they’re expressed. Assembly is the human readable version that is fed to the assembler, and read more..

  • Page - 450

    409 SUMMARY Out of all the theoretical chapters in the book, this has hopefully been among the most informative. In only a few pages you’ve learned quite a lot about basic assembly language, different approaches to instruction set design, and even gotten your first taste of how an assembler works. I then moved on to cover the design of XVM Assembly, the low-level read more..

  • Page - 452

    Building the XASM Assembler “It’s fair to say I’m stepping out on a limb, but I am on the edge. And that’s where it happens.” ——Max Cohen, Pi CHAPTER 9 read more..

  • Page - 453

    412 Over the course of the last eight chapters, you’ve been introduced to what scripting is and how it works, you’ve built a simple command-based language scripting system, you’ve learned how real procedural scripting is done on a conceptual level, you’ve learned how to use a number of existing scripting systems in real programs, and you’ve even designed both the high- read more..

  • Page - 454

    413 With the pleasantries out of the way, it’s time to roll up your sleeves and get started. This chapter will cover ■ A much more in-depth exploration of how a generic assembler works. ■ The exact details of how XASM works. ■ An overall design plan for the construction of the assembler. ■ A file format specification for the output of XASM, the XVM executable read more..

  • Page - 455

    414 The next section discusses how the instructions of a script file are processed by a generic assembler, in reasonably complete detail. The output of this generic, theoretical assembler is known as an instruction stream, a term representing the resulting data when you combine all of the opcodes and operands and pack them together sequentially and contiguously. It represents read more..

  • Page - 456

    415 I also mentioned previously that in addition to the mnemonic string and the opcode, each entry in the table can contain additional information. Specifically, I like to store an instruction’s opcode list here. The opcode list is just a series of flags of some sort (usually stored in a simple array of bit vectors) that the assembler uses to make sure the operands read more..

  • Page - 457

    416 Element 0, corresponding to Destination, only allows memory references and would therefore have the MEMORY_REF flag set (for example), whereas the LITERAL_VALUE flag would be unset. Element 1, on the other hand, because it corresponds to Source, would have both the MEMORY_REF and LITERAL_VALUE flags set. Other operand types would exist as well, such as LINE_LABEL and read more..
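The per-operand flags described for Mov can be sketched as bit vectors. The flag names MEMORY_REF and LITERAL_VALUE come from the text; their numeric values, the g_aiMovOpList array, and the IsOperandAllowed helper are assumptions for this example.

```c
/* Hypothetical bit values for the operand type flags. */
#define MEMORY_REF     1
#define LITERAL_VALUE  2

/* Mov Destination, Source:
   element 0 (Destination) accepts memory references only,
   element 1 (Source) accepts memory references or literal values. */
static const int g_aiMovOpList[2] = {
    MEMORY_REF,
    MEMORY_REF | LITERAL_VALUE
};

/* Returns nonzero if the operand at iOpIndex may be of type iOperandType. */
int IsOperandAllowed(const int *piOpList, int iOpIndex, int iOperandType)
{
    return (piOpList[iOpIndex] & iOperandType) != 0;
}
```

With this table, an assembler could reject a line such as Mov 4, X, because element 0 does not have the LITERAL_VALUE bit set.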

  • Page - 458

    417 If you remember back to the discussion of Lua in Chapter 6, you may recall that the Lua stack can be accessed in two ways; with positive indices and with negative indices. Positive indices start from the bottom, so that the higher the index, the higher up you go into the stack. Negative indices, however, are used to read from the stack relative to the top. read more..
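A Lua-style index can be converted to a plain array slot with a few lines of C. This sketch assumes Lua's convention that index 1 is the bottom element and -1 is the top; the ResolveStackIndex name and the 0-based backing array are hypothetical.

```c
/* iIndex: Lua-style index (positive counts up from the bottom, starting at 1;
           negative counts down from the top, starting at -1).
   iCount: number of elements currently on the stack.
   Returns the 0-based array slot, or -1 if the index is out of range. */
int ResolveStackIndex(int iIndex, int iCount)
{
    int iSlot;

    if (iIndex > 0)
        iSlot = iIndex - 1;        /* relative to the bottom */
    else if (iIndex < 0)
        iSlot = iCount + iIndex;   /* relative to the top */
    else
        return -1;                 /* zero is not a valid index */

    return (iSlot >= 0 && iSlot < iCount) ? iSlot : -1;
}
```

With four elements on the stack, indices 4 and -1 both resolve to the topmost slot, which is why negative indices are convenient when you do not know how deep the stack currently is.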

  • Page - 459

    418 You can use this information to replace a variable name with a stack index. Let’s assume the following code was used to declare the function’s variables, and that variables are placed on the stack in the order they’re declared (therefore, the first one declared is the lowest on the stack): var X var Y var Z var W X would be placed on the stack first, read more..

  • Page - 460

    419 index) and an assembled integer variable. For example, let’s say the code for a stack index is 0, and the code for an integer literal is 1. The new output of the assembler would look like this: 0 0 -2 1 4 As you can see, the new format for the Mov instruction is opcode, operand type, operand data, operand type, and operand data. Lastly, there’s the issue of read more..
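The opcode, operand type, operand data layout can be sketched as a small emitter. The type codes (0 for a stack index, 1 for an integer literal) and the Mov opcode of 0 follow the example above; the constant names, the int-array stream, and the EmitMovStackIndexInt function are assumptions for illustration.

```c
#define OPCODE_MOV           0  /* opcode used for Mov in the example */
#define OP_TYPE_STACK_INDEX  0  /* operand is a stack index */
#define OP_TYPE_INT_LITERAL  1  /* operand is an integer literal */

/* Writes "Mov <variable at stack index>, <integer literal>" to piStream as
   opcode, type, data, type, data. Returns the number of values written. */
int EmitMovStackIndexInt(int *piStream, int iDestStackIndex, int iLiteral)
{
    int n = 0;
    piStream[n++] = OPCODE_MOV;
    piStream[n++] = OP_TYPE_STACK_INDEX;
    piStream[n++] = iDestStackIndex;
    piStream[n++] = OP_TYPE_INT_LITERAL;
    piStream[n++] = iLiteral;
    return n;
}
```

Calling it with a stack index of -2 and a literal of 4 fills the stream with 0 0 -2 1 4, matching the sequence shown above.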

  • Page - 461

    420 An assembled global variable reference is just like a local one; the only difference is the sign of the index. Assembling Operands You’ve already seen the first steps in assembling operands in the last section with the codes you used to distinguish variable stack indices from integer literals, but let’s round the discussion out with coverage of other operand types. As read more..

  • Page - 462

    421 The list should be pretty straightforward, although you might be a bit confused by the idea of arrays indexed by literal values being considered different than arrays indexed by variables. The reason this is an issue is that the two operand types must be written to the output file with different pieces of information. For example, an array with an integer index must read more..

  • Page - 463

    422 and MyVar is found at stack index -8, the machine-code equivalent would look like this: 0 2 3 -8 0 16384 Now, the order is basically this: first you output the opcode (0), and then you output the newly- added operand count (2, for two operands), and then the operand type of the first operand (a variable in this case, the code for which let’s assume is 3), read more..

  • Page - 464

    423 Mov X, "This is a string literal." Mov Y, 16384 The instruction stream would look something like this: 0 2 3 8 This is a string literal 0 2 3 9 0 16384 I personally happen to find this implementation a bit messy; loading the instruction stream from the disk when the script is loaded into the runtime environment will become a more complicated read more..

  • Page - 465

    424 Line labels and jumps are often approached with one of two popular methods when assembling code for a real hardware system. The first method is called the two-pass method, because the calculation of line labels is handled in one complete pass over the source file, whereas the second pass assigns the results of the first pass (the index of each line label) to read more..

  • Page - 466

    425 According to the diagram, these nine instructions are indexed from 0-8, and any lines that do not contain instructions (even if they contain a label declaration) don’t count. Also, notice that line labels can be declared after references to them are made, as in the case of Label1. Here, notice that Label1 is referenced in the JLE instruction on line 5 before being read more..

  • Page - 467

    426 ■ The first pass begins with the assembler scanning through the entire source code file and assigning a sequential index to each instruction. It’s important to note that the number of lines in the file is not necessarily equal to the number of instructions it contains (in fact, this is rarely the case and will ultimately be impossible when the final XVM read more..

  • Page - 468

    427 machine code you output. Just like with labels, this is called resolving the jump. Note also that if the label cannot be found in the label array, you know it’s an invalid label (or again, just a misspelling) and must alert the users with an error. That, in a nutshell, is how line labels are processed in a two-pass fashion. The end results are jump instructions read more..
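The two passes can be sketched in C under some simplifying assumptions: the source is already split into an array of lines, a label declaration is any line ending in a colon, and every other line counts as exactly one instruction. The Label structure and the CollectLabels and ResolveLabel names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 16

typedef struct {
    char pstrName[32];   /* label identifier without the trailing colon */
    int  iTargetIndex;   /* index of the instruction the label points to */
} Label;

static int IsLabelDecl(const char *pstrLine)
{
    size_t iLen = strlen(pstrLine);
    return iLen > 0 && pstrLine[iLen - 1] == ':';
}

/* First pass: number the instructions and record where each label lands.
   Returns the number of labels found, or -1 if a limit is exceeded. */
int CollectLabels(const char **ppstrLines, int iLineCount, Label *pLabels)
{
    int iLine, iInstrIndex = 0, iLabelCount = 0;

    for (iLine = 0; iLine < iLineCount; ++iLine) {
        if (IsLabelDecl(ppstrLines[iLine])) {
            size_t iLen = strlen(ppstrLines[iLine]) - 1;   /* drop the ':' */
            if (iLabelCount == MAX_LABELS || iLen >= sizeof pLabels[0].pstrName)
                return -1;
            memcpy(pLabels[iLabelCount].pstrName, ppstrLines[iLine], iLen);
            pLabels[iLabelCount].pstrName[iLen] = '\0';
            pLabels[iLabelCount].iTargetIndex = iInstrIndex;
            ++iLabelCount;
        } else {
            ++iInstrIndex;   /* only instruction lines advance the index */
        }
    }
    return iLabelCount;
}

/* Second pass helper: resolve a jump operand to an instruction index.
   Returns -1 for an undeclared (or misspelled) label. */
int ResolveLabel(const Label *pLabels, int iLabelCount, const char *pstrName)
{
    int i;
    for (i = 0; i < iLabelCount; ++i)
        if (strcmp(pLabels[i].pstrName, pstrName) == 0)
            return pLabels[i].iTargetIndex;
    return -1;
}
```

Because every label is recorded before any jump is resolved, a jump can refer to a label declared later in the file, and a failed lookup is the point at which the assembler would report an invalid label.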

  • Page - 469

    428 Simple, right? Of course, functions are more than just labels, and calling a function is more than just jumping to it—otherwise, you’d just use the jump instructions and typical line labels instead. A function also brings with it a concept of scope and builds itself an appropriate stack frame upon its invocation—containing the parameters passed, return address and local read more..

  • Page - 470

    429 Memory Management First and foremost, it’s important to be aware of the different ways in which both the script source data, as well as the final executable data, can be stored. Early compilers and assemblers ran on machines with claustrophobically small amounts of memory, and as a result, kept as much information on the hard drive as possible at all times. Source read more..

  • Page - 471

    430 Either method will serve you well if it’s implemented correctly. However, for the purpose of this book, you’ll load the entire script into memory, rather than constantly making references to an external file, for a number of reasons: ■ It’s a lot easier to learn the concepts involved when everything is loaded into a structured memory location rather than the disk, read more..

  • Page - 472

    431 data that software simply chokes on. Whitespace? Hmph! Line breaks? Hmph! An assembler craves not these things. It’s your job to filter them out. Parsing and understanding human-readable data of any sort is always a tricky affair. Style and technique differ wildly from human to human, which means you have to make all sorts of generalizations and minimize your assumptions read more..

  • Page - 473

    432 will set the size of the script’s stack to 1024 elements. Here are some notes to keep in mind: ■ 0 can be passed as a stack size as well, which is a special flag for the VM to allocate the default stack size to the script. ■ The directive does not have to appear in the script at all; just like requesting a stack size of zero elements, this is read more..

  • Page - 474

    433 ; This function will run automatically when a script is executed Func _Main { ; Script execution begins here } XASM will need to take note of whether a _Main () function was found, and set the proper flags in the output file accordingly so as to pass the information on to the XVM. Because identifiers, including function names, are not preserved after the assembly read more..

  • Page - 475

    434 Func Super { ; Code Func Sub { ; Code } ; Code } The last issue in regards to Func is that Ret is not explicitly required at the end of a function. A Ret instruction will always be appended to the end of a function (even if you put one there yourself, not that it’d make a difference), to save the user having to add it to each function manually. Generally read more..

  • Page - 476

    435 ables in two different functions can’t use the same identifier; that’d be silly. Perhaps I should phrase it this way: no two variables within the same or overlapping scope can share a name. Var also has a modified form that can be used to declare arrays, which has the following syntax: Var ArrayName [ ArraySize ] All variable and array declarations in XtremeScript read more..

  • Page - 477

    436 I strongly advise against this for two reasons, however: ■ The code is far less readable, especially if there’s a considerable amount of code between the variable’s reference and its declaration. Although forward referencing is a must for line labels, it’s in no way required with variables. ■ It’s generally good practice to declare all variables before code anyway, read more..

  • Page - 478

    437 ; Begin function code Mov MyArray [ 0 ], U Mov MyArray [ 1 ], V Mov MyArray [ 2 ], W } This function is now designed to accept three parameters. This means that, in addition to the single stack element reserved for the return address, as well as the 18 stack elements’ worth of local data, the total size of this function’s stack frame read more..

  • Page - 479

    438 The stack indices will be assigned to the parameter names in the order they’re encountered, which explains why it’s so important. Note, however, that I implied you might want to list the parameters in reverse order, like this: Func MyFunc { Param W Param V Param U } This is actually preferable to the first method, because it allows the caller to push the parameters read more..

  • Page - 480

    439 And this is just bad practice. The two names are so close that you’re only going to end up confusing yourself, so I’ve taken it out of the realm of possibilities altogether. Instructions Despite the obvious importance of directives, instructions are what you’re really interested in. Because they ultimately drive the output of machine code, instructions are the read more..

  • Page - 481

    440 ■ Integer and floating-point literals. Integer literals are defined as strings of digits, optionally preceded by a negative sign. Floats are similar, although they can additionally contain one (and only one) radix point. Exponential notation and other permutations of floating-point form are not supported, but can be added rather easily. ■ String literals. These are read more..

  • Page - 482

    441 Everything about calling a host API function is syntactically identical to calling a script function. You pass parameters by pushing them onto the stack, you receive return values via _RetVal, and so on. The only major difference lies within the assembler, because you can’t just check the specified function name against an array of function information. In fact, you read more..

  • Page - 483

    442 in the global scope, its value isn’t changed or erased as functions are called and returned; this is what makes it so useful for returning values. Comments Lastly, let’s talk about comments. Comments are somewhat flexible in XVM Assembly, in the sense that they can easily appear both on their own lines, or can follow the instruction on a line of code. For example: read more..

  • Page - 484

    443 ; ---- Functions ---------------------------------------------- ; A simple addition function Func MyAdd { ; Import our parameters Param Y Param X ; Declare local data Var Sum Mov Sum, X Add Sum, Y ; Put the result in the _RetVal register Mov _RetVal, Sum ; Remember, Ret will be automatically added } ; Just a bizarre function that does nothing read more..

  • Page - 485

    444 ; The special _Main () function, which will be automatically executed Func _Main { ; Call the MyFunc test function Call MyFunc } Whew! Think you’re clever enough to write an assembler that can understand everything here, and more? There’s only one way to find out, so let’s keep moving. Output: Structure of an XVM Executable So you know what sort of input to expect, read more..

  • Page - 486

    445 Each field of the file is prefixed by a size field, rather than followed by a terminating flag of some sort. This, for example, allows entire blocks of the file to be loaded into memory very quickly by C’s buffered input routines in a single call. In addition to the speed and simplicity by which a file can be loaded, the .XSE format is of course far read more..

  • Page - 487

    446 9. BUILDING THE XASM ASSEMBLER
    Table 9.2 XSE Main Header
    Name | Size (in Bytes) | Description
    ID String | 4 | Four-character string containing the .XSE ID, “XSE0”
    Version | 2 | Version number (first byte is major, second byte is minor)
    Stack Size | 4 | Requested stack size (set by SetStackSize directive; 0 means use default)
    Global Data Size | 4 | The total size of all global data
    Is _Main () read more..

  • Page - 488

    447 SetStackSize directive, and defaults to zero if the directive was not present in the script. Following this field is the size of all global data in the program, which is collected incrementally during the assembly phase. Lastly, we store information regarding the _Main () function-- the first is a 1-byte flag that just lets us know if it was present at all. If it read more..
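The leading header fields can be written to a byte buffer along these lines. This is a partial sketch only: it stops at the 1-byte _Main () flag because the remaining fields are not shown here, and the little-endian byte order, the helper names, and the use of a memory buffer instead of a file are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Store a 32-bit value least-significant byte first (byte order is assumed). */
static void WriteU32LE(unsigned char *pBuf, unsigned long ulValue)
{
    pBuf[0] = (unsigned char)(ulValue & 0xFF);
    pBuf[1] = (unsigned char)((ulValue >> 8) & 0xFF);
    pBuf[2] = (unsigned char)((ulValue >> 16) & 0xFF);
    pBuf[3] = (unsigned char)((ulValue >> 24) & 0xFF);
}

/* Writes ID string (4), version (2), stack size (4), global data size (4),
   and the _Main () present flag (1). Returns the number of bytes written. */
size_t WriteXSEHeaderStart(unsigned char *pBuf,
                           unsigned char ucMajor, unsigned char ucMinor,
                           unsigned long ulStackSize,
                           unsigned long ulGlobalDataSize,
                           int iIsMainPresent)
{
    size_t iPos = 0;

    memcpy(pBuf + iPos, "XSE0", 4);          iPos += 4;  /* ID string */
    pBuf[iPos++] = ucMajor;                              /* version, major */
    pBuf[iPos++] = ucMinor;                              /* version, minor */
    WriteU32LE(pBuf + iPos, ulStackSize);     iPos += 4;  /* 0 = default */
    WriteU32LE(pBuf + iPos, ulGlobalDataSize); iPos += 4;
    pBuf[iPos++] = (unsigned char)(iIsMainPresent ? 1 : 0);

    return iPos;
}
```

Writing each field at a fixed offset like this is what lets a loader read the header back with a handful of block reads rather than scanning for terminators.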

  • Page - 489

    448
    Table 9.4 The Instruction Structure
    Name | Size (in Bytes) | Description
    Opcode | 2 | The instruction’s opcode, corresponding to a specific VM action
    Operand Stream | N | Contains the instruction’s operand data
    Table 9.5 The Operand Stream Structure
    Name | Size (in Bytes) | Description
    Size | 1 | The number of operands in the stream (the operand count)
    Stream | N | A read more..

  • Page - 490

    449 Operand Types The last issue regarding the instruction stream is one of the various operand types the operands can assume. In addition to the code for each type, you also need to know what form the operand data itself will be found in. Let’s first take a look at the operand type codes themselves, found in Table 9.7. You’ll notice this list differs slightly from read more..

  • Page - 491

    450 because the value of X won’t be known until runtime. So, you instead write the base address of MyArray [] to the file, followed by the stack index at which X resides, so that the VM can add the value of X to MyArray []’s base address at runtime and find the absolute index. I know this can all come across as complicated, but remember—it’s just one level read more..

  • Page - 492

    451 The Function Index code is similar, and is used as the operand for the Call instruction. Rather than provide a direct instruction index to jump to, however, a function index refers to an element within the function table, which I’ll discuss in detail later. Similar to the Function Call Index is the Host API Call Index. Because the names of the host API’s read more..

  • Page - 493

    452 in the table is preceded by its own individual four-byte header specifying the string length. The string length is then followed by the string’s characters. Note that the strings are not padded or aligned in any way; if a string’s header contains the value 37, the string is exactly 37 characters (not including a null-terminator, because it’s not needed here), which read more..
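A single string table entry, with its four-byte length header and unpadded characters, could be written as in this sketch. The WriteTableString name, the little-endian length, and the in-memory buffer are assumptions; only the layout (length first, then exactly that many characters, no null terminator) comes from the text.

```c
#include <stddef.h>
#include <string.h>

/* Writes a four-byte length followed by the string's characters, with no
   padding and no null terminator. Returns the number of bytes written. */
size_t WriteTableString(unsigned char *pBuf, const char *pstrString)
{
    unsigned long ulLength = (unsigned long)strlen(pstrString);

    /* Four-byte length header, least-significant byte first (assumed). */
    pBuf[0] = (unsigned char)(ulLength & 0xFF);
    pBuf[1] = (unsigned char)((ulLength >> 8) & 0xFF);
    pBuf[2] = (unsigned char)((ulLength >> 16) & 0xFF);
    pBuf[3] = (unsigned char)((ulLength >> 24) & 0xFF);

    /* The characters follow immediately. */
    memcpy(pBuf + 4, pstrString, ulLength);

    return 4 + (size_t)ulLength;
}
```

Because the length is known up front, a loader can allocate exactly the right amount of memory and read the characters in one call.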

  • Page - 494

    453 The Function Table The function table is the .XSE format’s next structure and maintains a profile of each function in the script. Each element of the table contains the function’s entry point (the index of its first instruction), the number of parameters it takes, and the total size of its local data. This information is used at runtime to prepare stack frames, read more..

  • Page - 495

    454 The _Main () function is also contained in this table, and is always stored at index zero (unless the script doesn’t implement _Main (), in which case index zero can be used for something else). The main header of the .XSE file contains a field that lets the VM know whether the _Main () method is present. Note also that the _Main () method will always set read more..

  • Page - 496

    455 That’s basically it. Aside from maybe the instruction stream, which gets a bit tricky, the .XSE format overall is a simple and straightforward structure for storing executable scripts. It’s an easy and clean format to both read and write, so you shouldn’t have much trouble working with it. Despite its simplicity, however, it’s still quite powerful and complete, and read more..

  • Page - 497

    456 Before moving on, I’d like to say that what you’re about to work on is going to be your first real taste of compiler theory. I discussed some of these principles in a much more simplistic manner back in the command-based language chapters, but what you’re about to build is far more complex and a great deal more powerful. The scripts you’ll be able to read more..

  • Page - 498

    457 concepts form the basis for a language processor capable of understanding, validating, and translating XVM Assembly Language. Lexing To get things started, let’s once again consider the Add function, a common example throughout the last two chapters: Func MyAdd { Param X ; Assign names to the two parameters Param Y Var Sum read more..

  • Page - 499

    458 The unfiltered source code, as it enters your assembler’s processing pipeline, is called a character stream, because it’s a stream of raw source code expressed as a sequence of characters. Once it passes through the first phase of the lexer, it becomes a lexeme stream, because each element in the stream is now a separate lexeme. Figure 9.22 helps visualize this. 9. read more..

  • Page - 500

    459 a comma (TOKEN_TYPE_COMMA), and finally another identifier. These tokens of course directly correspond to Mov, Sum, ,, and X, respectively. This process of turning the lexeme stream into a token stream is known as tokenization, and because of this, lexers are often referred to as tokenizers. Without getting into the nitty gritties, I can tell you that the lexer is one read more..
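Tokenization of a single lexeme can be sketched as a classification function. TOKEN_TYPE_COMMA, TOKEN_TYPE_IDENT, and TOKEN_TYPE_OPEN_BRACKET are named in this chapter; the other token types, the numeric values, and the GetTokenType function are assumptions. This simplified version also treats mnemonics such as Mov as ordinary identifiers, since it performs no instruction lookup.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum {
    TOKEN_TYPE_INVALID,       /* hypothetical: lexeme fits no class */
    TOKEN_TYPE_INT,           /* hypothetical: unsigned run of digits */
    TOKEN_TYPE_IDENT,
    TOKEN_TYPE_COMMA,
    TOKEN_TYPE_OPEN_BRACKET
};

/* Maps one lexeme to a token type. */
int GetTokenType(const char *pstrLexeme)
{
    size_t i, iLen = strlen(pstrLexeme);

    if (iLen == 0)
        return TOKEN_TYPE_INVALID;
    if (strcmp(pstrLexeme, ",") == 0)
        return TOKEN_TYPE_COMMA;
    if (strcmp(pstrLexeme, "{") == 0)
        return TOKEN_TYPE_OPEN_BRACKET;

    /* All digits: an integer literal. */
    for (i = 0; i < iLen && isdigit((unsigned char)pstrLexeme[i]); ++i)
        ;
    if (i == iLen)
        return TOKEN_TYPE_INT;

    /* Letter or underscore, then letters, digits, or underscores. */
    if (isalpha((unsigned char)pstrLexeme[0]) || pstrLexeme[0] == '_') {
        for (i = 1; i < iLen; ++i)
            if (!isalnum((unsigned char)pstrLexeme[i]) && pstrLexeme[i] != '_')
                return TOKEN_TYPE_INVALID;
        return TOKEN_TYPE_IDENT;
    }

    return TOKEN_TYPE_INVALID;
}
```

Running each lexeme of Mov Sum, X through this function yields identifier, identifier, comma, identifier; a fuller lexer would add a distinct token type for instructions.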

  • Page - 501

    460 single token. Based on this initial token’s type, you can predict what tokens should theoretically come next, and compare that to the actual token stream. If the tokens match up the way you think they do, you can group them as a logical unit and consider them valid and ready to assemble. Figure 9.24 illustrates this. Figure 9.24 Each read more..

  • Page - 502

    461 symbol. So, the following two tokens must be TOKEN_TYPE_IDENT and TOKEN_TYPE_OPEN_BRACKET. If either of these tokens is incorrect, or if they appear in the wrong order, you’ve detected a syntax error and can halt the assembly process to alert the users. If these two tokens are successfully read, on the other hand, you know the function declaration is valid and can record read more..
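The check described for a function declaration can be sketched as a comparison of the token stream against the expected sequence. TOKEN_TYPE_IDENT and TOKEN_TYPE_OPEN_BRACKET are taken from the text; TOKEN_TYPE_FUNC, TOKEN_TYPE_OTHER, the numeric values, and IsValidFuncHeader are hypothetical names for this example.

```c
enum {
    TOKEN_TYPE_FUNC,          /* hypothetical: the Func directive itself */
    TOKEN_TYPE_IDENT,
    TOKEN_TYPE_OPEN_BRACKET,
    TOKEN_TYPE_OTHER          /* hypothetical: anything else */
};

/* Returns nonzero if the stream starts with Func, an identifier, and an
   opening bracket, in exactly that order. */
int IsValidFuncHeader(const int *piTokens, int iTokenCount)
{
    return iTokenCount >= 3
        && piTokens[0] == TOKEN_TYPE_FUNC
        && piTokens[1] == TOKEN_TYPE_IDENT
        && piTokens[2] == TOKEN_TYPE_OPEN_BRACKET;
}
```

A zero result corresponds to the syntax error case, where the assembler would halt and report the problem.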

  • Page - 503

    462 Hopefully this has helped you understand the general process of parsing. Along with lexing and tokenization, you should at least have a conceptual idea of how this process works. Once you’ve properly parsed a given group of tokens, you’re all set to translate it. After parsing an instruction, for example, you use the instruction lookup table to verify its operands and read more..

  • Page - 504

    463 starting index and the ending index. The substring data itself is defined as all characters between and including the indices. Whitespace Whitespace can exist in any string, and is usually defined simply as non-visible characters such as spaces, tabs, and line breaks. However, it is often important to distinguish between whitespace that includes line breaks, and whitespace that read more..

  • Page - 505

    464 radix point (.) somewhere within the string (but not before the sign, if present, and not after the last digit), you can create another class called signed floating-point numeric strings. See Figure 9.25 for a visual. Figure 9.25 String classification. As you can see, this sort of classification is a useful and frequent operation when read more..

  • Page - 506

    465 if ( cChar >= '0' && cChar <= '9' ) return TRUE; else return FALSE; } // Determines if a character is whitespace int IsCharWhitespace ( char cChar ) { // Return true if the character is a space or tab. if ( cChar == ' ' || cChar == '\t' ) return TRUE; else return FALSE; } // Determines if a character could be part of a valid identifier int IsCharIdent ( read more..

  • Page - 507

    466 Simple enough, right? Each function basically works by comparing the character in question to either a set of specific characters or a range of characters and returning TRUE or FALSE based on the results. Now that you can classify individual characters, let’s expand the library to include functions for doing the same with strings. Because these functions are a bit more read more..

  • Page - 508

    467 sure that all characters are either numeric digits or the negation sign. Of course, at this stage, a number like -867-5309 would be considered valid. So, to complete the process, you make one more scan through to make sure that the negation sign, if present at all, is only the first character. So you can classify integer strings, but what about floats? Well, it’s more read more..
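The two-scan integer check described above might look something like the following. This is a sketch rather than the assembler's own listing: IsSignedIntString () is a hypothetical name, and the rejection of a lone "-" is an extra assumption that the passage doesn't spell out.

```c
#include <string.h>

/* Returns 1 if pstrString is a signed integer string, 0 otherwise. */
int IsSignedIntString ( const char * pstrString )
{
    /* The usual checks for bad strings */
    if ( ! pstrString )
        return 0;

    size_t iLength = strlen ( pstrString );
    if ( iLength == 0 )
        return 0;

    /* First scan: every character must be a digit or the negation sign */
    for ( size_t i = 0; i < iLength; ++ i )
    {
        char cChar = pstrString [ i ];
        if ( ! ( ( cChar >= '0' && cChar <= '9' ) || cChar == '-' ) )
            return 0;
    }

    /* Second scan: the negation sign may only be the first character,
       which rules out strings like -867-5309 */
    for ( size_t i = 1; i < iLength; ++ i )
        if ( pstrString [ i ] == '-' )
            return 0;

    /* Assumption beyond the text: a sign with no digits isn't a number */
    if ( iLength == 1 && pstrString [ 0 ] == '-' )
        return 0;

    return 1;
}
```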

  • Page - 509

    468 else iRadixPointFound = TRUE; for ( iCurrCharIndex = 1; iCurrCharIndex < strlen ( pstrString ); ++ iCurrCharIndex ) if ( pstrString [ iCurrCharIndex ] == '-' ) return FALSE; if ( iRadixPointFound ) return TRUE; else return FALSE; } Once again, you start off with the typical checks for bad strings. You then move on to make sure the number consists solely of numbers, radix points, read more..

  • Page - 510

    469 This is a very simple function; all that’s necessary is to pass each character in the string to our previously defined IsCharWhitespace () function and exit if non-whitespace is found. One extra note, however: unlike the last two functions you’ve written, this function returns TRUE in the event of an empty string. You do this because a lack of characters read more..
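A whitespace-string check along these lines could be sketched as follows. The function names match the ones mentioned in the text, but the bodies are a reconstruction under the assumption that only spaces and tabs count as whitespace, and they return 1 and 0 in place of the TRUE and FALSE constants.

```c
#include <stddef.h>

/* Returns 1 if the character is a space or tab */
int IsCharWhitespace ( char cChar )
{
    return cChar == ' ' || cChar == '\t';
}

/* Returns 1 if every character in the string is whitespace. An empty
   string also returns 1, since it contains no non-whitespace at all. */
int IsStringWhitespace ( const char * pstrString )
{
    if ( ! pstrString )
        return 0;

    for ( size_t i = 0; pstrString [ i ] != '\0'; ++ i )
        if ( ! IsCharWhitespace ( pstrString [ i ] ) )
            return 0;

    return 1;
}
```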

  • Page - 511

    470 The General Interface Just to get it out of the way, let’s start with a description of how the assembler will be implemented specifically. XASM will be a simple console application, which makes the code portable and the interface easy to design. The user will specify the input and output files using command-line parameters, and all messages to be displayed (error read more..

  • Page - 512

    471 At load time, the number of lines in the source file will be counted, and a suitably sized array of static strings called g_ppstrSourceCode will be allocated. These static strings will be large enough to hold what you predefine as the largest possible line the assembler supports. I usually use 4096 for this value. Chances are this is much bigger than anything you read more..

  • Page - 513

    472 Instructions The instruction structure will need to contain the instruction’s opcode, the number of operands it accepts, and a pointer to the operand data itself: typedef struct _Instr // An instruction { int iOpcode; // Opcode int iOpCount; // Number of operands Op * pOpList; read more..

  • Page - 514

    473 iOffsetIndex is only used when the active data type within the union is iStackIndex. In the cases where an operand is defined as a relative stack index, we need to store the base index and the offset. Since we can’t have two members of the union active at the same time without overwriting each other, the offset field is kept separate. During the first pass, read more..

  • Page - 515

    474 A Simple Linked List Implementation All of the remaining structures in XASM are built on linked lists to allow them to grow dynamically as the source file is assembled. Before we go any further, I’m going to cover a simple C linked list implementation that will be the basis for the remaining tables. Linked lists consist of two structures: the list itself, and the read more..

  • Page - 516

    475 All this function does is set the head and tail pointers to NULL, and set the node count to zero. Once the list is initialized, you can start adding nodes to it with AddNode (): int AddNode ( LinkedList * pList, void * pData ) { // Create a new node LinkedListNode * pNewNode = ( LinkedListNode * ) malloc ( sizeof ( LinkedListNode ) ); // Set the node's data read more..

  • Page - 517

    476 // Alter the tail's next pointer to point to the new node pList->pTail->pNext = pNewNode; // Update the list's tail pointer pList->pTail = pNewNode; } // Increment the node count ++ pList->iNodeCount; // Return the new size of the linked list - 1, which is the node's index return pList->iNodeCount - 1; } The function begins by allocating space for the node and initializing read more..
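Since the listing is split across several pages here, a compact end-to-end sketch of the same list may help. The field names (pHead, pTail, iNodeCount, pData, pNext) come from the excerpts; the struct layouts, the malloc () failure check, and the final reset in FreeLinkedList () are assumptions added for the sketch.

```c
#include <stdlib.h>

typedef struct LinkedListNode
{
    void * pData;                       /* Data owned by the node */
    struct LinkedListNode * pNext;      /* Next node, or NULL at the tail */
} LinkedListNode;

typedef struct
{
    LinkedListNode * pHead, * pTail;
    int iNodeCount;
} LinkedList;

void InitLinkedList ( LinkedList * pList )
{
    pList->pHead = pList->pTail = NULL;
    pList->iNodeCount = 0;
}

/* Appends a node and returns its index, or -1 if allocation fails */
int AddNode ( LinkedList * pList, void * pData )
{
    LinkedListNode * pNewNode =
        ( LinkedListNode * ) malloc ( sizeof ( LinkedListNode ) );
    if ( ! pNewNode )
        return -1;

    pNewNode->pData = pData;
    pNewNode->pNext = NULL;

    /* An empty list gets a new head; otherwise link after the tail */
    if ( pList->iNodeCount == 0 )
        pList->pHead = pNewNode;
    else
        pList->pTail->pNext = pNewNode;

    pList->pTail = pNewNode;
    ++ pList->iNodeCount;

    return pList->iNodeCount - 1;
}

/* Frees every node along with its data, then resets the list */
void FreeLinkedList ( LinkedList * pList )
{
    LinkedListNode * pCurrNode = pList->pHead;

    while ( pCurrNode )
    {
        /* Save the next pointer before freeing the current node */
        LinkedListNode * pNextNode = pCurrNode->pNext;
        free ( pCurrNode->pData );
        free ( pCurrNode );
        pCurrNode = pNextNode;
    }

    InitLinkedList ( pList );
}
```

Because FreeLinkedList () calls free () on each node's data, anything passed to AddNode () in this sketch needs to be heap-allocated.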

  • Page - 518

    477 { // Save the pointer to the next node before freeing the current one pNextNode = pCurrNode->pNext; // Clear the current node's data if ( pCurrNode->pData ) free ( pCurrNode->pData ); // Clear the node itself if ( pCurrNode ) free ( pCurrNode ); // Move to the next node if it exists; otherwise, exit the loop if ( pNextNode ) pCurrNode = pNextNode; else break; } } } read more..

  • Page - 519

    478 int AddString ( LinkedList * pList, char * pstrString ) { // ---- First check to see if the string is already in the list // Create a node to traverse the list LinkedListNode * pNode = pList->pHead; // Loop through each node in the list for ( int iCurrNode = 0; iCurrNode < pList->iNodeCount; ++ iCurrNode ) { // If the current node's string equals the specified read more..

  • Page - 520

    479 it pretty much is identical. Since we can really just think of it as another string table, there’s no point in writing the same function twice just so it can have a different name. Because of this, I used AddString () in both places, and thus, the caller has to specify which list to add to. The Function Table The next table of interest is the function table, read more..

  • Page - 521

    480 int AddFunc ( char * pstrName, int iEntryPoint ) { // If a function already exists with the specified name, exit and return // an invalid index if ( GetFuncByName ( pstrName ) ) return -1; // Create a new function node FuncNode * pNewFunc = ( FuncNode * ) malloc ( sizeof ( FuncNode ) ); // Initialize the new function strcpy ( pNewFunc->pstrName, pstrName ); read more..

  • Page - 522

    481 The function you’ll use to add this remaining data looks like this: void SetFuncInfo ( char * pstrName, int iParamCount, int iLocalDataSize ) { // Based on the function's name, find its node in the list FuncNode * pFunc = GetFuncByName ( pstrName ); // Set the remaining fields pFunc->iParamCount = iParamCount; pFunc->iLocalDataSize = iLocalDataSize; } Again the function begins read more..

  • Page - 523

    482 // Otherwise move to the next node pCurrNode = pCurrNode->pNext; } // The structure was not found, so return a NULL pointer return NULL; } With this function, you can immediately retrieve any function’s node at any time, based solely on its name. For example, when parsing a Call instruction, you simply need to grab the function name string from the source code, pass read more..

  • Page - 524

    483 int AddSymbol ( char * pstrIdent, int iSize, int iStackIndex, int iFuncIndex ) { // If a label already exists if ( GetSymbolByIdent ( pstrIdent, iFuncIndex ) ) return -1; // Create a new symbol node SymbolNode * pNewSymbol = ( SymbolNode * ) malloc ( sizeof ( SymbolNode ) ); // Initialize the new label strcpy ( pNewSymbol->pstrIdent, pstrIdent ); pNewSymbol->iSize = iSize; read more..

  • Page - 525

    484 if ( CurrNode.FuncIndex == FuncIndex || CurrNode.StackIndex >= 0 ) return CurrNode; // Otherwise move on to the next in the list CurrNode = CurrNode.Next; } // The specified symbol was not found, so return NULL return NULL; } Just pass it the symbol’s identifier and function index, and this function will return the full node, allowing you access to anything you need. read more..

  • Page - 526

    485 // Return its size return pSymbol->iSize; } IMPLEMENTING THE ASSEMBLER NOTE Technically, the term symbol table is usually applied to a much broader range of information and stores information for all of the program’s symbols (the term symbol just being a synonym for identifier).This means that symbol tables usually store information regarding functions, line labels, etc. However, I read more..

  • Page - 527

    486 int AddLabel ( char * pstrIdent, int iTargetIndex, int iFuncIndex ) { // If a label already exists, return -1 if ( GetLabelByIdent ( pstrIdent, iFuncIndex ) ) return -1; // Create a new label node LabelNode * pNewLabel = ( LabelNode * ) malloc ( sizeof ( LabelNode ) ); // Initialize the new label strcpy ( pNewLabel->pstrIdent, pstrIdent ); pNewLabel->iTargetIndex = read more..

  • Page - 528

    487 // If the names and scopes match, return the current pointer if ( strcmp ( pCurrLabel->pstrIdent, pstrIdent ) == 0 && pCurrLabel->iFuncIndex == iFuncIndex ) return pCurrLabel; // Otherwise move to the next node pCurrNode = pCurrNode->pNext; } // The structure was not found, so return a NULL pointer return NULL; } As you’d imagine, it traverses the list until a suitable read more..

  • Page - 529

    488 allocated array of InstrLookup structures. The InstrLookup structure encapsulates a single instruction, and looks like this: typedef struct _InstrLookup // An instruction lookup { char pstrMnemonic [ MAX_INSTR_MNEMONIC_SIZE ]; // Mnemonic string int iOpcode; // Opcode int iOpCount; // Number of operands read more..

  • Page - 530

    489 #define MAX_INSTR_MNEMONIC_SIZE 16 // Maximum size of an instruction // mnemonic's string InstrLookup g_InstrTable [ MAX_INSTR_LOOKUP_COUNT ]; Adding Instructions Two functions will be necessary to populate the table-- one to add new instructions, and one to define the individual operands. Let’s look at the function for adding instructions first, which is of read more..

  • Page - 531

    490 Given a mnemonic, opcode, and operand count, AddInstrLookup () will create the specified instruction at the next free index within the table (maintained via the static int) and return the index to the caller. It also allocates a dynamic array of OpTypes, giving the instruction room to define each of its operands. That process is facilitated with a function called SetOpType read more..

  • Page - 532

    491 instruction did accept. For example, the Mov instruction’s destination operand can be a variable or array index. The parser doesn’t care which it is; it only wants to make sure it’s one of them. So we’ve got the two functions we need, as well as our bitfield flags. Let’s look at an example of how a few instructions in the set are defined. Here’s Mov: read more..
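To be clear, what follows is not the book's Mov definition, just a rough illustration of how a bitfield lets the parser ask "is this one of the acceptable types?" without caring which one. The flag names and values, InstrDef, MAX_OP_COUNT, and both functions are all hypothetical, and the masks chosen for Mov are assumptions.

```c
/* Hypothetical operand type flags; each occupies its own bit so that
   several can be combined in a single mask */
#define OP_FLAG_TYPE_INT        1
#define OP_FLAG_TYPE_FLOAT      2
#define OP_FLAG_TYPE_STRING     4
#define OP_FLAG_TYPE_MEM_REF    8       /* A variable or array index */

#define MAX_OP_COUNT            3       /* Assumed limit for this sketch */

typedef int OpTypes;

typedef struct
{
    int iOpcode;
    int iOpCount;
    OpTypes OpList [ MAX_OP_COUNT ];    /* One mask per operand */
} InstrDef;

/* Returns 1 if the given operand accepts the given type, 0 otherwise */
int IsOpTypeAllowed ( const InstrDef * pInstr, int iOpIndex,
                      OpTypes iActualType )
{
    if ( iOpIndex < 0 || iOpIndex >= pInstr->iOpCount )
        return 0;

    return ( pInstr->OpList [ iOpIndex ] & iActualType ) != 0;
}

/* Builds an illustrative two-operand Mov: the destination must be a
   memory reference, while the source can be nearly anything */
InstrDef MakeMovDef ( void )
{
    InstrDef Mov;
    Mov.iOpcode = 0;
    Mov.iOpCount = 2;
    Mov.OpList [ 0 ] = OP_FLAG_TYPE_MEM_REF;
    Mov.OpList [ 1 ] = OP_FLAG_TYPE_INT | OP_FLAG_TYPE_FLOAT |
                       OP_FLAG_TYPE_STRING | OP_FLAG_TYPE_MEM_REF;
    Mov.OpList [ 2 ] = 0;
    return Mov;
}
```

The single bitwise AND in IsOpTypeAllowed () is the whole trick: a non-zero result means the operand's actual type appears somewhere in the mask.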

  • Page - 533

    492 Of course, if you really want to go all out, you could store your language description in an external file that is read in by the assembler when it initializes. This would literally allow a single assembler to implement multiple instruction sets, which may be advantageous if you have a number of different virtual machines that you use in various game projects. read more..

  • Page - 534

    493 Accessing Instruction Definitions Once the table is populated, the parser (and even the lexer) will need to be able to easily retrieve the instruction lookup structure based on a supplied mnemonic. This will be enabled with a function called GetInstrByMnemonic (). Here’s the code: int GetInstrByMnemonic ( char * pstrMnemonic, InstrLookup * pInstr ) { // Loop through each read more..

  • Page - 535

    494 // Compare the instruction's mnemonic to the specified one if ( strcmp ( g_InstrTable [ iCurrInstrIndex ].pstrMnemonic, pstrMnemonic ) == 0 ) { // Set the instruction definition to the user-specified pointer * pInstr = g_InstrTable [ iCurrInstrIndex ]; // Return TRUE to signify success return TRUE; } } // A match was not found, so return FALSE return FALSE; } Structural Overview read more..

  • Page - 536

    495 Each (or most) of these global structures also has a small interface of functions used to manipulate the data it contains. Let’s run through them one more time to make sure you’re clear with everything. Starting with the string table: int AddString ( LinkedList * pList, char * pstrString ); Next up is the function table: int AddFunc ( char * pstrName, int read more..

  • Page - 537

    496 The Lexer’s Interface and Implementation The implementation of the lexical analyzer is embodied by a small group of functions and structures. The primary interface will come down to a few main functions: GetNextToken (), GetCurrLexeme (), GetLookAheadChar (), SkipToNextLine (), and ResetLexer (). GetNextToken () GetNextToken () returns the current token and advances the token stream read more..

  • Page - 538

    497 GetCurrLexeme () GetCurrLexeme () returns a character pointer to the string containing the current lexeme. For example, if GetNextToken () returns TOKEN_TYPE_IDENT, GetCurrLexeme () will return the actual identifier itself. Its prototype looks like this: char * GetCurrLexeme (); The string returned by GetCurrLexeme () belongs to the g_Tokenizer structure, however, which means you read more..

  • Page - 539

    498 declaration is in fact an array declaration and that the line isn’t finished. Of course, if an open bracket isn’t found, it means that the current line is indeed finished, and you can move on to the next token without fear of the stream being out of sync. As you’ll see throughout the development of the parser, you’ll only need a one-character look-ahead. read more..

  • Page - 540

    499 Table 9.15 Token Type Constants

    Constant                  Description
    TOKEN_TYPE_INT            An integer literal
    TOKEN_TYPE_FLOAT          A floating-point literal
    TOKEN_TYPE_STRING         A string literal value, not including the surrounding quotes. Quotes are considered separate tokens.
    TOKEN_TYPE_QUOTE          A double quote "
    TOKEN_TYPE_IDENT          An identifier
    TOKEN_TYPE_COLON          A colon :
    TOKEN_TYPE_OPEN_BRACKET   An opening read more..

  • Page - 541

    500 Note the END_OF_TOKEN_STREAM constant, which actually isn’t a token in itself but rather a sign that the token stream has ended. Even though the token type is just a simple integer value, it’s often convenient to wrap primitive data types in more descriptive names using typedef (plus it looks cool!). In the case of your tokenizer, you can create a Token type based read more..

  • Page - 542

    501 This is not only a different string than was intended, but it won’t even assemble. You therefore need a way to make sure that the scanner knows when it’s inside a string, so it can ignore any semicolons until the string ends. Fortunately, this is easily solved: as the scanner moves through the string, it also needs to keep watch for double-quote characters. When read more..
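The in-string flag described above can be sketched in a few lines. This isn't the book's StripComments () listing; StripCommentsSketch () is a hypothetical stand-in that simply terminates the line at the first semicolon found outside of a string, and it assumes the line contains no escaped double-quotes.

```c
#include <stddef.h>

/* Cuts the line off at the first semicolon that isn't inside a string
   literal. Works in place on a null-terminated, writable buffer. */
void StripCommentsSketch ( char * pstrSourceLine )
{
    int iInString = 0;      /* 1 while the scanner is between quotes */

    for ( size_t i = 0; pstrSourceLine [ i ] != '\0'; ++ i )
    {
        if ( pstrSourceLine [ i ] == '"' )
        {
            /* Each double-quote toggles the in-string flag */
            iInString = ! iInString;
        }
        else if ( pstrSourceLine [ i ] == ';' && ! iInString )
        {
            /* A semicolon outside a string starts a comment */
            pstrSourceLine [ i ] = '\0';
            break;
        }
    }
}
```

Handling escaped quotes (\") inside strings would take one more check, which is left out here to keep the flag logic easy to follow.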

  • Page - 543

    502 Running the initial line of code through this function will yield the correct output: Mov X, "This curse; it is your birthright." See a visual of this process in Figure 9.32. Figure 9.32 StripComments () maintains a flag that is set and cleared as double quotes are read, since they presumably denote the beginnings and endings of string read more..

  • Page - 544

    503 for ( iCurrCharIndex = iPadLength; iCurrCharIndex < iStringLength; ++ iCurrCharIndex ) pstrString [ iCurrCharIndex - iPadLength ] = pstrString [ iCurrCharIndex ]; for ( iCurrCharIndex = iStringLength - iPadLength; iCurrCharIndex < iStringLength; ++ iCurrCharIndex ) pstrString [ iCurrCharIndex ] = ' '; } // Terminate string at the start of right hand whitespace for ( iCurrCharIndex = read more..

  • Page - 545

    504 Lexing and Tokenizing Here’s where the real work begins. At this point you have a list of token type constants to produce, your line of source code has been prepped and is ready to go, so all that’s left to do is isolate the next lexeme and identify its token type. This, of course, is the most complicated part. The first thing to understand is where read more..

  • Page - 546

    505 3: Param Y 4: Var Product ; Declare a local 5: Mov Product, X ; Multiply X by Y 6: Mul Product, Y 7: } And would look like this after each line was prepped: 0: Func MyFunc 1: { 2: Param X 3: Param Y 4: Var Product 5: Mov Product, X 6: Mul Product, Y 7: } The assembly process moves from read more..

  • Page - 547

    506 simple premise: for example, that all lexemes are separated by whitespace. This would make your job very simple, and perhaps even let you use the standard C library tokenizing function, strtok (). Unfortunately, one of the four lexemes found previously was not separated from the lexeme before it by a space. Look at the Product and comma lexemes: Mov Product, X read more..

  • Page - 548

    507 beginning of the next lexeme. The second pointer is then repositioned to equal the first. Both pointers are now positioned on the first character of the lexeme. The second pointer then scans forward until the first delimiter character is found, and stops just before that character is read. At this point, the two pointers will exactly surround the lexeme. Check out read more..
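Here's a rough sketch of the two-index scan just described, using array indices in place of pointers. FindNextLexeme () and IsCharDelimiter () are hypothetical names, the delimiter set is an assumption, and the range is half-open: iIndex1 ends up one past the lexeme's last character.

```c
#include <stddef.h>

int IsCharWhitespace ( char cChar )
{
    return cChar == ' ' || cChar == '\t';
}

/* Assumed delimiter set: whitespace plus the single-character tokens */
int IsCharDelimiter ( char cChar )
{
    return cChar == ',' || cChar == ':' || cChar == '"' ||
           cChar == '[' || cChar == ']' || cChar == '{' ||
           cChar == '}' || cChar == '\n' || IsCharWhitespace ( cChar );
}

/* Finds the next lexeme at or after iStart, which must not exceed the
   line's length. On success, stores the range [ *piIndex0, *piIndex1 )
   and returns 1. Returns 0 if only whitespace remains on the line. */
int FindNextLexeme ( const char * pstrLine, size_t iStart,
                     size_t * piIndex0, size_t * piIndex1 )
{
    /* Move the first index past any leading whitespace */
    size_t iIndex0 = iStart;
    while ( pstrLine [ iIndex0 ] != '\0' &&
            IsCharWhitespace ( pstrLine [ iIndex0 ] ) )
        ++ iIndex0;

    if ( pstrLine [ iIndex0 ] == '\0' )
        return 0;

    /* Start the second index at the first, then scan to a delimiter */
    size_t iIndex1 = iIndex0;
    while ( pstrLine [ iIndex1 ] != '\0' &&
            ! IsCharDelimiter ( pstrLine [ iIndex1 ] ) )
        ++ iIndex1;

    /* A delimiter such as a comma is itself a one-character lexeme */
    if ( iIndex1 == iIndex0 )
        ++ iIndex1;

    * piIndex0 = iIndex0;
    * piIndex1 = iIndex1;
    return 1;
}
```

Calling it repeatedly on Mov Product, X, feeding each returned iIndex1 back in as the next iStart, walks through the four lexemes Mov, Product, the comma, and X, even though the comma isn't separated from Product by a space.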

  • Page - 549

    508 Table 9.16 Single-Character Tokens

    Token                      Description
    TOKEN_TYPE_QUOTE           A quotation mark "
    TOKEN_TYPE_COMMA           A comma ,
    TOKEN_TYPE_COLON           A colon :
    TOKEN_TYPE_OPEN_BRACKET    An opening bracket [
    TOKEN_TYPE_CLOSE_BRACKET   A closing bracket ]
    TOKEN_TYPE_NEWLINE         A line break
    TOKEN_TYPE_OPEN_BRACE      An opening curly brace {
    TOKEN_TYPE_CLOSE_BRACE     A closing curly brace }

    Table 9.17 read more..

  • Page - 550

    509 To check for integers, floats, and identifiers, you can use the functions covered earlier: IsStringInt (), IsStringFloat (), and IsStringIdent (). Every other token is a specific string like "VAR" or "_RETVAL" and can be tested with a simple string comparison. What I’ve described so far is a lexer capable of isolating and identifying all of the read more..

  • Page - 551

    510 ■ Should replace the \" and \\ escape sequences with their respective single-character values.
    ■ Should only stop scanning when it hits a double-quote that isn’t part of an escape sequence.
    As you can see, strings add quite a bit of complexity to the otherwise simplistic lexer, so let’s discuss the solutions to each of these problems. First of all, you need the ability to tell read more..
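The escape sequence requirement listed above can be sketched as an in-place pass over the lexeme. UnescapeString () is a hypothetical helper written for illustration; it handles only the \" and \\ sequences named in the list and leaves any other backslash untouched.

```c
#include <stddef.h>

/* Replaces \" with " and \\ with \ in place. The result is never longer
   than the input, so no extra buffer is needed. */
void UnescapeString ( char * pstrString )
{
    size_t iRead = 0, iWrite = 0;

    while ( pstrString [ iRead ] != '\0' )
    {
        /* If a backslash introduces one of the two known sequences,
           skip it so only the escaped character gets copied */
        if ( pstrString [ iRead ] == '\\' &&
             ( pstrString [ iRead + 1 ] == '"' ||
               pstrString [ iRead + 1 ] == '\\' ) )
            ++ iRead;

        pstrString [ iWrite ++ ] = pstrString [ iRead ++ ];
    }

    pstrString [ iWrite ] = '\0';
}
```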

  • Page - 552

    511 The solution is to design the lexer with three states in mind, rather than two. The first state, LEX_STATE_NO_STRING, is active by default and is used for all non-string lexemes. When a double-quote is read, this state switches to LEX_STATE_IN_STRING, which allows it to properly handle string lexemes. When the next double quote is read, it will know that read more..

  • Page - 553

    512 Token GetNextToken () { // ---- Lexeme Extraction // Move the first index (Index0) past the end of the last token, // which is marked by the second index (Index1). g_Lexer.iIndex0 = g_Lexer.iIndex1; // Make sure we aren't past the end of the current line. If a string is // 8 characters long, it's indexed from 0 to 7; therefore, indices 8 // and beyond lie outside of read more..

  • Page - 554

    513 while ( TRUE ) { // If the current character is not whitespace, exit the loop // because the lexeme is starting. if ( ! IsCharWhitespace ( g_ppstrSourceCode [ g_Lexer.iCurrSourceLine ][ g_Lexer.iIndex0 ] ) ) break; // It is whitespace, however, so move to the next character and // continue scanning ++ g_Lexer.iIndex0; } } // Bring the second index (Index1) to the lexeme's read more..

  • Page - 555

    514 // If the current character is a backslash, move ahead two // characters to skip the escape sequence and jump to the next // iteration of the loop if ( g_ppstrSourceCode [ g_Lexer.iCurrSourceLine ] [ g_Lexer.iIndex1 ] == '\\' ) { g_Lexer.iIndex1 += 2; continue; } // If the current character isn't a double-quote, move to the // next, otherwise exit the loop, because the read more..

  • Page - 556

    515 // Single-character lexemes will appear to be zero characters at this // point (since Index1 will equal Index0), so move Index1 over by one // to give it some noticeable width if ( g_Lexer.iIndex1 - g_Lexer.iIndex0 == 0 ) ++ g_Lexer.iIndex1; // The lexeme has been isolated and lies between Index0 and Index1 // (inclusive), so make a local copy for the lexer unsigned int read more..

  • Page - 557

    516 // We'll set the type to invalid now just in case the lexer doesn't // match any token types g_Lexer.CurrToken = TOKEN_TYPE_INVALID; // The first case is the easiest-- if the string lexeme state is // active, we know we're dealing with a string token. However, if the // string is the double-quote sign, it means we've read an empty string // and should return a read more..

  • Page - 558

    517 // If we're in a string, tell the lexer we just ended a // string case LEX_STATE_IN_STRING: g_Lexer.iCurrLexState = LEX_STATE_END_STRING; break; } g_Lexer.CurrToken = TOKEN_TYPE_QUOTE; break; // Comma case ',': g_Lexer.CurrToken = TOKEN_TYPE_COMMA; break; // Colon case ':': g_Lexer.CurrToken = TOKEN_TYPE_COLON; break; // Opening Bracket case '[': g_Lexer.CurrToken = TOKEN_TYPE_OPEN_BRACKET; break; // read more..

  • Page - 559

    518 // Closing Brace case '}': g_Lexer.CurrToken = TOKEN_TYPE_CLOSE_BRACE; break; // Newline case '\n': g_Lexer.CurrToken = TOKEN_TYPE_NEWLINE; break; } } // Now let's check for the multi-character tokens // Is it an integer? if ( IsStringInteger ( g_Lexer.pstrCurrLexeme ) ) g_Lexer.CurrToken = TOKEN_TYPE_INT; // Is it a float? if ( IsStringFloat ( g_Lexer.pstrCurrLexeme ) ) g_Lexer.CurrToken = read more..

  • Page - 560

    519 // Is it Func? if ( strcmp ( g_Lexer.pstrCurrLexeme, "FUNC" ) == 0 ) g_Lexer.CurrToken = TOKEN_TYPE_FUNC; // Is it Param? if ( strcmp ( g_Lexer.pstrCurrLexeme, "PARAM" ) == 0 ) g_Lexer.CurrToken =TOKEN_TYPE_PARAM; // Is it RetVal? if ( strcmp ( g_Lexer.pstrCurrLexeme, "_RETVAL" ) == 0 ) g_Lexer.CurrToken = TOKEN_TYPE_REG_RETVAL; // Is it an instruction? InstrLookup Instr; if ( read more..

  • Page - 561

    520 character. The character is analyzed, and it’s identified as whitespace. The lexer now knows that it has an arbitrary amount of leading whitespace to deal with, so it switches into STATE_WHITESPACE, which will consume whitespace until a non-whitespace is found. Finally a non-whitespace character is found. If this is a number, the state will switch into STATE_INT. It turns read more..

  • Page - 562

    521 brute force method on my own, long before learning about state machines, and I think that’s indicative of a lot of aspiring compiler/assembler writers. These ad-hoc methods just come more naturally, so I like the idea of covering them instead of pretending they don’t exist like a lot of textbooks tend to do. In a lot of ways, the XASM assembler implementation read more..

  • Page - 563

    522 // Turn off string lexeme mode, since strings can't span multiple lines g_Lexer.iCurrLexState = LEX_STATE_NO_STRING; // Return TRUE to indicate success return TRUE; } It starts by incrementing the pointer to the current line, which moves us to the next line. It then makes sure we haven’t moved beyond the last line in the file by comparing the new position to read more..

  • Page - 564

    523 The last function in our lexer interface is GetLookAheadChar (), which scans through the source code from the current position until it finds the first character of the next token. Let’s have a look at its implementation: char GetLookAheadChar () { // We don't actually want to move the lexer's indices, so we'll // make a copy of them int iCurrSourceLine = read more..

  • Page - 565

    524 // If the current character is not whitespace, return it, since // it's the first character of the next lexeme and is thus the // look-ahead if ( ! IsCharWhitespace ( g_ppstrSourceCode [ iCurrSourceLine ][ iIndex ] ) ) break; // It is whitespace, however, so move to the next character // and continue scanning ++ iIndex; } } // Return whatever character the loop left read more..

  • Page - 566

    525 Error Handling We’re just about ready to dive into parsing, but before we do, there’s one important issue to address-- how will we handle errors? There are three major aspects of error handling: detection, resynchronization, and message output. Detection is all about determining when an error has occurred in the first place, as well as what type of error it was. read more..

  • Page - 567

    526 void ExitOnCodeError ( char * pstrErrorMssg ) { // Print the message printf ( "Error: %s.\n\n", pstrErrorMssg ); printf ( "Line %d\n", g_Lexer.iCurrSourceLine ); // Reduce all of the source line's spaces to tabs so it takes less // space and so the caret lines up with the current token properly char pstrSourceLine [ MAX_SOURCE_LINE_SIZE ]; strcpy ( pstrSourceLine, read more..

  • Page - 568

    527 There are times, however, when all that’s necessary is to let the user know that a specific character was expected but not found. For this, there’s ExitOnCharExpectedError (): void ExitOnCharExpectedError ( char cChar ) { // Create an error message based on the character, leaving room for the null terminator char * pstrErrorMssg = ( char * ) malloc ( strlen ( "' ' expected" ) + 1 ); sprintf ( read more..
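As an alternative way of building this kind of message, here's a small sketch that formats into a caller-supplied buffer with the standard snprintf () function, which never writes past the size it's given. FormatCharExpectedMssg () is a hypothetical helper, not part of XASM.

```c
#include <stdio.h>

/* Writes a message such as '{' expected into pstrDest. Returns 1 if the
   whole message fit (including the null terminator), 0 if it was
   truncated or formatting failed. */
int FormatCharExpectedMssg ( char cChar, char * pstrDest, size_t iDestSize )
{
    int iWritten = snprintf ( pstrDest, iDestSize, "'%c' expected", cChar );

    return iWritten >= 0 && ( size_t ) iWritten < iDestSize;
}
```

The design choice here is to let the caller own the buffer, so there's nothing to free afterward and the size check happens in one place.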

  • Page - 569

    528 which you’ll detect them), but you don’t have to worry about mixed caps, spacing, or anything along those lines. The actual process of parsing the token stream is relatively simple. As mentioned in the parsing introduction, the main principle is identifying the initial token and predicting what should follow based on how that initial token fits into the rules of the read more..

  • Page - 570

    529 // ---- Set some initial variables g_iInstrStreamSize = 0; g_iIsSetStackSizeFound = FALSE; g_ScriptHeader.iGlobalDataSize = 0; // Set the current function's flags and variables int iIsFuncActive = FALSE; FuncNode * pCurrFunc; int iCurrFuncIndex; char pstrCurrFuncName [ MAX_IDENT_SIZE ]; int iCurrFuncParamCount = 0; int iCurrFuncLocalDataSize = 0; // Create an instruction definition structure to read more..

  • Page - 571

    530 that follows the parsing of a directive really just means storing its information in the appropriate tables and moving on. At each iteration of the first pass, an initial token is read with a call to GetNextToken (), like this: if ( GetNextToken () == END_OF_TOKEN_STREAM ) break; Note that before doing anything, we make sure we haven’t passed the end of the token read more..

  • Page - 572

    531 // Convert the lexeme to an integer value from its string // representation and store it in the script header g_ScriptHeader.iStackSize = atoi ( GetCurrLexeme () ); // Mark the presence of SetStackSize for future encounters g_iIsSetStackSizeFound = TRUE; break; That wasn’t so bad, huh? That’s how parsing works. This pattern, as simple as it seems, can be applied to the read more..

  • Page - 573

    532 check for any number of line breaks, from 0 to N, between the name of the function and the opening brace. This will allow the users to use whatever style they’re used to. Let’s look at some code to parse it (also check out Figure 9.40): case TOKEN_TYPE_FUNC: { // First make sure we aren't in a function already, since nested functions // are illegal if ( read more..

  • Page - 574

    533 // Read any number of line breaks until the opening brace is found while ( GetNextToken () == TOKEN_TYPE_NEWLINE ); // Make sure the lexeme was an opening brace if ( g_Lexer.CurrToken != TOKEN_TYPE_OPEN_BRACE ) ExitOnCharExpectedError ( '{' ); // All functions are automatically appended with Ret, so increment the // required size of the instruction stream ++ g_iInstrStreamSize; read more..

  • Page - 575

    534 While parsing the function’s body, you need to count its parameters and local variables, which is why we initialize iCurrFuncParamCount and iCurrFuncLocalDataSize to zero. When the end of the function is reached, you can send this information to SetFuncInfo () to finalize the function’s entry in the table. Speaking of the read more..

  • Page - 576

    535 Var/Var [] The Var and Var [] directives can occur both inside and outside of functions. As you’ve learned, those found outside declare variables and arrays within the global scope, and those found inside declare them in a scope local to that function. Like I mentioned earlier when discussing the lexer, you’ll need to utilize a one-character look-ahead when parsing the read more..

  • Page - 577

    536 Things are different in the case of globals. Global variables should get their own counter, because they’re separate from locals and because, technically, global declarations can appear in between function declarations. This is why we initialized g_ScriptHeader.iGlobalDataSize to zero earlier. Every time a global variable is encountered, the current global data size is used as its read more..

  • Page - 578

    537 With all that sorted out, let’s take a look at the code for parsing single variables, in both the local and global scope: case TOKEN_TYPE_VAR: { // Get the variable's identifier if ( GetNextToken () != TOKEN_TYPE_IDENT ) ExitOnCodeError ( ERROR_MSSG_IDENT_EXPECTED ); char pstrIdent [ MAX_IDENT_SIZE ]; strcpy ( pstrIdent, GetCurrLexeme () ); // This version of the code only read more..

  • Page - 579

    538 the variable size (stored in iSize) to 1 by default since this initial code won’t handle arrays. The variable’s stack index is then calculated using the same algorithm described earlier, and saved in iStackIndex. Using this information, a new symbol is added using AddSymbol (), which reports an error in the event of a variable redefinition. Lastly, the current read more..

  • Page - 580

    539 // We're parsing an array, so the next lexeme should be an integer // describing the array's size if ( GetNextToken () != TOKEN_TYPE_INT ) ExitOnCodeError ( ERROR_MSSG_INVALID_ARRAY_SIZE ); // Convert the size lexeme to an integer value iSize = atoi ( GetCurrLexeme () ); // Make sure the size is valid, in that it's greater than zero if ( iSize <= 0 ) ExitOnCodeError ( read more..

  • Page - 581

    540 Pretty simple addition, huh? It was just a matter of taking the new variable size into account. If the look-ahead reveals an open bracket, two tokens are read. The first should be the bracket itself, and the second should be an integer token correlating to the size of the array. The lexeme is translated into a real integer with atoi (), and the value is saved in read more..
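The size conversion and validation step described above could be sketched like this. The text uses atoi (); this sketch swaps in the standard strtol () function instead, purely because it can also report lexemes that aren't numbers at all. ParseArraySize () is a hypothetical helper.

```c
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

/* Converts an array-size lexeme to an int. Returns 1 and stores the
   size on success; returns 0 if the lexeme isn't a whole number or the
   size isn't greater than zero. */
int ParseArraySize ( const char * pstrLexeme, int * piSize )
{
    char * pEnd;

    errno = 0;
    long lValue = strtol ( pstrLexeme, & pEnd, 10 );

    /* Reject empty input and trailing junk such as 12x */
    if ( pEnd == pstrLexeme || * pEnd != '\0' )
        return 0;

    /* Reject overflow and sizes that aren't greater than zero */
    if ( errno == ERANGE || lValue <= 0 || lValue > INT_MAX )
        return 0;

    * piSize = ( int ) lValue;
    return 1;
}
```

In the parser itself the lexer has already confirmed the token is an integer, so atoi () plus a greater-than-zero check is sufficient there; the extra checks only matter if the helper is used on unvalidated input.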

  • Page - 582

    541 Just as in the first pass, the second pass will keep track of which function it’s in, which is helpful so you can assign it to the parameter with the proper scope. You’ll also need to once again keep track of iCurrFuncParamCount for each function, because the current parameter count will help you determine the stack index. The stack index for a parameter is read more..

  • Page - 583

    542 Line Labels Line labels will first appear to the parser in the form of an identifier token, since that’s what a label is. This means that any time your initial token is TOKEN_TYPE_IDENT, the look-ahead character can be used to find out if the following token is a colon. If so, it’s definitely a line label declaration. Here’s an example of a line label: read more..

  • Page - 584

    543 sure it’s not being declared globally, which is illegal. Both of these cases result in errors. The current lexeme contains the label itself, and the current instruction (which is always equal to the current size of the instruction stream minus one) is locally saved as the label’s target instruction index. The function in which the label resides is also recorded, read more..

  • Page - 585

    Notice that I’ve pretty much glossed over the process of parsing the operands. This is because operand parsing is a rather huge job and would only end up cluttering this example. In fact, it’s easily the most complex part of parsing an instruction, and therein lies the problem. Think about it: any given operand can be one of any number of types. Some of read more..

  • Page - 586

    Since we’ve already designed and implemented the instruction lookup table, we have everything we need to get started. Just as a refresher, each entry in the instruction lookup table contains:
    ■ The instruction’s mnemonic, which is used to map instructions in the source file to their entries in the table.
    ■ The opcode.
    ■ The number of operands the instruction read more..

  • Page - 587

    546 structure. This is why we declared the CurrInstr structure when the parser was initialized. This structure is initially used to write the opcode to the instruction stream at the index specified by g_iCurrInstrIndex. The parser thus far will produce an assembled instruction stream that represents each source code instruction as an opcode. There aren’t any operands yet, but read more..

  • Page - 588

    547 I mentioned originally that these masks don’t match up directly with the specific operand types we’ve established because the parser only needs a general idea of which operands are acceptable, as opposed to the exact type that was used. The XVM, however, will need to know exactly what type of operand was actually used at runtime, because variables, arrays indexed with read more..
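    To make the idea of operand type masks concrete, here's a minimal sketch of how a parser might test a category against a bitfield. The OpTypes name comes from the parsing code; the OP_FLAG_* constants, their values, and IsOpTypeAllowed () are hypothetical names invented for this illustration.

```c
#include <assert.h>

/* Hypothetical flag values: one bit per general operand category */
#define OP_FLAG_INT      0x01    /* Integer literal */
#define OP_FLAG_FLOAT    0x02    /* Float literal */
#define OP_FLAG_STRING   0x04    /* String literal */
#define OP_FLAG_MEM_REF  0x08    /* Variable or array element */
#define OP_FLAG_REG      0x10    /* The _RetVal register */

typedef int OpTypes;             /* Bitfield of accepted categories */

/* Returns nonzero if the category iFlag is accepted by Mask */
int IsOpTypeAllowed ( OpTypes Mask, int iFlag )
{
    /* A category must be exactly one of the flags above */
    assert ( iFlag != 0 );

    return ( Mask & iFlag ) != 0;
}
```

    An instruction whose first operand is a destination might carry a mask like OP_FLAG_MEM_REF | OP_FLAG_REG, so a literal in that position would be rejected. The parser only needs this yes-or-no answer; the exact operand type is recorded separately for the XVM.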

  • Page - 589

    548 // Loop through each operand, read it from the source and assemble it for ( int iCurrOpIndex = 0; iCurrOpIndex < CurrInstr.iOpCount; ++ iCurrOpIndex ) { // Read the operand's type bitfield OpTypes CurrOpTypes = CurrInstr.OpList [ iCurrOpIndex ]; // Read in the next token, which is the initial token of the operand Token InitOpToken = GetNextToken (); // --- Process the read more..

  • Page - 590

    549 // Make sure there's no extraneous stuff ahead if ( GetNextToken () != TOKEN_TYPE_NEWLINE ) ExitOnCodeError ( ERROR_MSSG_INVALID_INPUT ); // Copy the operand list pointer into the assembled stream g_pInstrStream [ g_iCurrInstrIndex ].pOpList = pOpList; // Move along to the next instruction in the stream ++ g_iCurrInstrIndex; This actually brings you closer than you might think to a read more..

  • Page - 591

    550 { // Set an integer operand type pOpList [ iCurrOpIndex ].iType = OP_TYPE_INT; // Copy the value into the operand list from the current // lexeme pOpList [ iCurrOpIndex ].iIntLiteral = atoi ( GetCurrLexeme () ); } else ExitOnCodeError ( ERROR_MSSG_INVALID_OP ); break; } This code implements an integer operand parse-and-assemble sequence. Of course, that leaves a number of other read more..

  • Page - 592

    551 ■ The _RetVal register. _RetVal is another easy one. It exists as a single, deterministic token, which means all you need to do is make sure the initial token is TOKEN_TYPE_REG, and write the register code zero to the operand list. The operand type is set to OP_TYPE_REG. ■ Line labels. This is the first operand type that involves an identifier, which makes it read more..

  • Page - 593

    Building the .XSE Executable
    The source file has been fully assembled, so all that remains is dumping everything into an .XSE file. We already know what the structure of the file is like, so let’s look at some code. To get started, the file is opened for binary output (assume pstrFilename contains the name of the executable file): FILE * pExecFile; if ( read more..

  • Page - 594

    553 Notice that the function makes a number of local copies of the data before writing it. This is done to ensure that the variable written to the file occupies the exact number of bytes specified by the format. Even though 32-bit integers are used to store most integer values internally, many of these values are represented more efficiently in the file as 8- and read more..
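    The local-copy trick is easy to see in isolation. The following is only a sketch: WriteByteField () and WriteWordField () are hypothetical helpers, not functions from XASM, and uint16_t is used here as an assumption to guarantee a two-byte field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes iValue as exactly one byte; returns nonzero on success */
int WriteByteField ( FILE * pFile, int iValue )
{
    assert ( iValue >= 0 && iValue <= 255 );

    /* Copy into a 1-byte variable so only one byte reaches the file */
    unsigned char cValue = ( unsigned char ) iValue;
    return fwrite ( & cValue, 1, 1, pFile ) == 1;
}

/* Writes iValue as exactly two bytes, in the host's byte order */
int WriteWordField ( FILE * pFile, int iValue )
{
    assert ( iValue >= 0 && iValue <= 65535 );

    /* Copy into a 2-byte variable for the same reason */
    uint16_t sValue = ( uint16_t ) iValue;
    return fwrite ( & sValue, sizeof ( sValue ), 1, pFile ) == 1;
}
```

    Writing the original int directly with a size of 1 would emit whichever byte happens to come first in memory, which depends on the machine's byte order. The narrow copy sidesteps that.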

  • Page - 595

    554 // Loop through the operand list and print each one out for ( int iCurrOpIndex = 0; iCurrOpIndex < iOpCount; ++ iCurrOpIndex ) { // Make a copy of the operand pointer for convenience Op CurrOp = g_pInstrStream [ iCurrInstrIndex ].pOpList [ iCurrOpIndex ]; // Create a character for holding operand types (1 byte) char cOpType = CurrOp.iType; fwrite ( & cOpType, 1, 1, pExecFile read more..

  • Page - 596

    555 // Relative stack index case OP_TYPE_REL_STACK_INDEX: fwrite ( & CurrOp.iStackIndex, sizeof ( int ), 1, pExecFile ); fwrite ( & CurrOp.iOffsetIndex, sizeof ( int ), 1, pExecFile ); break; // Function index case OP_TYPE_FUNC_INDEX: fwrite ( & CurrOp.iFuncIndex, sizeof ( int ), 1, pExecFile ); break; // Host API call index case OP_TYPE_HOST_API_CALL_INDEX: fwrite ( & CurrOp.iHostAPICallIndex, read more..

  • Page - 597

    556 // Create a character for writing parameter counts char cParamCount; // Loop through each node in the list and write out its string for ( iCurrNode = 0; iCurrNode < g_StringTable.iNodeCount; ++ iCurrNode ) { // Copy the string and calculate its length char * pstrCurrString = ( char * ) pNode->pData; int iCurrStringLength = strlen ( pstrCurrString ); // Write the length (4 read more..

  • Page - 598

    557 // Write the parameter count (1 byte) cParamCount = pFunc->iParamCount; fwrite ( & cParamCount, 1, 1, pExecFile ); // Write the local data size (4 bytes) fwrite ( & pFunc->iLocalDataSize, sizeof ( int ), 1, pExecFile ); // Move to the next node pNode = pNode->pNext; } For convenience the function creates a local copy of the function at each iteration of the loop, and read more..

  • Page - 599

    558 Since the host API call table is really just a glorified string table, the procedure is more or less identical. Also like the string table, the length of each host API call string is calculated just before being written out. With this table written, the entire .XSE file is complete, along with the rest of the assembly process for that matter! It’s been a pretty read more..

  • Page - 600

    559 The initialization of the program then begins. This is where the master instruction lookup table is initialized. This can either be done in the code itself by an initialization function, or loaded from a file containing a description of the instruction set. The lexer is also reset with a call to ResetLexer (). The First Pass With the source code loaded into memory, the read more..

  • Page - 601

    The Second Pass
    The second pass is responsible for actually assembling the code into an instruction stream capable of being dumped into the executable file. This pass makes heavy use of data collected in the first pass, but, all things considered, is the more vital of the two. Directives are largely ignored in the second pass, and regardless of function declarations, read more..

  • Page - 602

    proper register code. Any reference to a variable, parameter, array, function, or line label that’s either not in the current scope or doesn’t exist results in an error that terminates the assembly process and is displayed for the user. That brings you to literal values. Integer and float literals are dumped directly into the instruction stream, whereas strings are read more..

  • Page - 603

    Producing the .XSE
    The last step in the assembly process is dumping everything into the executable. This process begins by writing out the main header, including the ID string, major and minor version numbers, requested stack size, and a single integer value representing whether a _Main () method was implemented. After the main header, the instruction stream is dumped read more..

  • Page - 604

    SUMMARY
    You’ve done well, apprentice. Against all odds, you rose to the challenge and took your first major step towards attaining scripting mastery by building your own assembler (or, at least, you read about how it’s done and hopefully understood it). If you haven’t already, I strongly urge you to check out the working XASM implementation on the accompanying CD. read more..

  • Page - 605

    don’t mind (or even enjoy) coding in assembly, nothing will stop you from immediately putting the system to use. That’s why I made sure you designed the language with human coders in mind as well. Remember, the syntax may be a bit funky, but assembly languages can do everything higher-level languages can. That means XASM and the XVM alone will be enough to read more..

  • Page - 606

    Part Five Designing and Implementing a Virtual Machine read more..

  • Page - 607

    This page intentionally left blank read more..

  • Page - 608

    CHAPTER 10
    Basic VM Design and Implementation
    “They’re gonna build it.”
    Palmer Joss, Contact read more..

  • Page - 609

    XASM is up and running, which means you’re now capable of turning XVM Assembly scripts into executables. However, despite your ability to create neat-looking binary files that amaze and confuse your friends, you can’t actually do anything with them. Fortunately, this chapter is all about changing that. An executable produced by the XASM assembler is designed for a runtime read more..

  • Page - 610

    569 The common thread among all of these examples is that without using the hardware processor itself, these pieces of software are capable of executing programs in the form of scripts and pro- viding them with the necessary memory address space and other such facilities. This is exactly what the virtual machine will do. Check out Figure 10.1. GHOST IN THE VIRTUAL MACHINE read more..

  • Page - 611

    570 But much like Blade, with his combination of human and vampire blood, your VM will enjoy most of the strengths and few of the weaknesses of a real computing system. On the one hand, you can take advantage of the tried-and-true architecture that already runs so well on real hardware. On the other hand, however, you can discard many of the low-level complexities of read more..

  • Page - 612

    The Instruction Stream
    The first and most obvious, of course, is the instruction stream: an array of compiled opcodes and operands that describes the logic of the script. The instruction stream embodies the script’s runtime activity; as execution progresses, the script’s opcodes determine exactly what will happen. Figure 10.3 illustrates the instruction stream. read more..

  • Page - 613

    Figure 10.4 A general memory map of the VM’s runtime stack. Globals always start at the base, followed by function stack frames. In between frames may exist 0-N elements pushed on by code using the Push and Pop instructions.
    Figure 10.5 Global data tables.
    generally not shared; rather, scripts exist within their own self-contained read more..

  • Page - 614

    Multithreading
    Especially in the context of game scripting, it’s extremely important that a VM support multithreading to allow the concurrent execution of multiple scripts. If each enemy on the screen is controlled by a separate script, and the level environment is scripted as well, it’s obvious that all of these entities must be able to execute at once without stepping read more..

  • Page - 615

    As you saw in Chapter 6, this usually comes down to a translation mechanism that can facilitate inter-language function calls; in other words, an abstraction layer that lets the host call script functions, and vice versa, without either side knowing the details of the other. See Figure 10.7. Like multithreading, I’ll also discuss the host/script interface in the next chapter. read more..

  • Page - 616

    This process starts by reading the script’s header data. In the case of your predefined .XSE executable format, you begin by reading the four-byte ID string and comparing it to "XSE0". This is done to ensure that the file in question is indeed a valid XVM executable. Once the ID string is validated, you can proceed to read out the version number, which read more..
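    The ID check is small enough to sketch on its own. This isn't the XVM prototype's loader, just an illustration under the assumption that the ID occupies the first four bytes of the file; HasValidXSEID () and XSE_ID_STRING are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define XSE_ID_STRING "XSE0"    /* The four-byte ID named in the text */

/* Reads four bytes from the current file position and compares them
   to the expected ID. Returns nonzero only on an exact match. */
int HasValidXSEID ( FILE * pFile )
{
    assert ( pFile != NULL );

    char pstrID [ 4 ];

    /* A file shorter than four bytes can't be a valid executable */
    if ( fread ( pstrID, 1, 4, pFile ) != 4 )
        return 0;

    /* memcmp () is used because the ID isn't null-terminated on disk */
    return memcmp ( pstrID, XSE_ID_STRING, 4 ) == 0;
}
```

    A loader would call this right after opening the file in binary mode and bail out before touching anything else if it returns zero.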

  • Page - 617

    Beginning Execution at the Entry Point
    Every script with a _Main () function has an entry point by nature, whereas those without _Main () do not. This term refers to the first instruction of _Main (), which is where the automatic execution of the script begins. Not every script needs an entry point. In the case of these scripts, execution doesn’t begin until read more..

  • Page - 618

    language is largely typeless, an Add instruction may be required to “add” an integer to a string, because of the data types of the operands. Because of cases like this, the first step in dealing with operands is converting them to a common type. Because the integer and string values can’t actually be added, you’ll need to temporarily cast the string to an integer. read more..

  • Page - 619

    578 Function Calls One major aspect of a script’s runtime behavior is the calling of and returning from functions. Naturally, since the XtremeScript system is based around a procedural language, a reliable method of handling function calls is crucial. Up until now, we’ve learned quite a bit about stack frames, how functions are described and stored in the .XSE executable, and read more..

  • Page - 620

    579 So, we can solve this problem by pushing another stack element on after the stack frame. This element will have the index of the function to which the frame belongs written to its iFuncIndex field, which means that all Ret has to do is read the element at the top of the current stack frame, grab the value of its iFuncIndex field, and use that to get the read more..

  • Page - 621

    580 We now have all the information we need to safely call a function, so let’s review. When calling a function: ■ The function’s information is retrieved in the form of a Func structure from the function table. ■ The return address is pushed onto the stack. ■ The stack frame is pushed. The size of this frame is large enough to hold the function’s local read more..

  • Page - 622

    581 With the return address saved, the entire stack frame—meaning the function’s local data, return address, and parameters—is popped off. The function’s stack frame is now entirely removed, so the stack structure’s iTopIndex and iFrameIndex values are updated. With the stack in the state it was in before the function was called, an unconditional jump is made to the return read more..

  • Page - 623

    STRUCTURAL OVERVIEW OF THE XVM PROTOTYPE
    A VM’s structure is extremely important. Because scripting is already burdened by a considerable performance overhead, you should do all you can to design your runtime environment to minimize bottlenecks and maximize efficiency. You’ve already taken a brief tour of the virtual machine’s major components, so let’s take a deeper look and read more..

  • Page - 624

    583 The Script Header Just as an executable file maintains a script header area, a script’s representation in memory will involve a header-like structure that manages miscellaneous high-level attributes. Here’s a list of what a script in the XVM prototype will need to properly maintain itself: ■ A Pause Flag. The Pause instruction can be used at any time to temporarily read more..

  • Page - 625

    584 typedef struct _Value // A runtime value { int iType; // Type union // The value { int iIntLiteral; // Integer literal float fFloatLiteral; // Float literal char * read more..

  • Page - 626

    585 typedef struct _InstrStream // An instruction stream { Instr * pInstrs; // The instructions themselves int iSize; // The number of instructions in the // stream int iCurrInstr; // The instruction pointer } read more..

  • Page - 627

    Ultimately, the stack is just an array of runtime values. However, because it doesn’t have the ability to physically grow or shrink as the script executes, you must augment this otherwise simple structure with an extra data member: a simple integer value that tracks the current top index. This value will initially be set to zero, as the stack will start off read more..

  • Page - 628

    The Function Table
    Fortunately, the function table marks the first of the easy structures. The function table never changes during the execution of the script, which means you can allocate it once at the time the script is loaded and can forget about it. A script won’t somehow add, remove, or change its functions, so once it’s initialized, the table is good to read more..

  • Page - 629

    588 The Final Script Structure All of these structures I’ve discussed are brought together to describe the script as a whole. It’s therefore convenient to wrap them into a single main structure that allows you to refer to each of the script’s elements relative to a common name. This structure is simply called Script, and looks like this: typedef struct _Script read more..

  • Page - 630

            RuntimeStack Stack;                  // The runtime stack
            Func * pFuncTable;                   // The function table
            HostAPICallTable HostAPICallTable;   // The host API call table
        } Script;

    For now, this is just an easy way to refer to your single script, but as you’ll see in the next chapter, wrapping everything like this makes read more..

  • Page - 631

    590 Loading an .XSE Executable The first thing to do, naturally, is write a function that will give you the ability to load executable script files and populate the VM script structure’s major structures with their data. This will account for the first major phase of the XVM prototype’s lifecycle. An .XSE Format Overview To get things started, refresh yourself on the read more..

  • Page - 632

    BUILDING THE XVM PROTOTYPE

    Table 10.2 The Instruction Stream Structure
    Name    Size (in Bytes)  Description
    Size    4                The number of instructions in the stream (not the stream size in bytes)
    Stream  N                A variable-length stream of instruction structures

    Table 10.3 The Instruction Structure
    Name    Size (in Bytes)  Description
    Opcode  2                The instruction’s opcode, corresponding to a specific VM action read more..

  • Page - 633

    Table 10.5 The Operand Structure
    Name  Size (in Bytes)  Description
    Type  1                The type of operand (integer literal, variable, and so on)
    Data  N                The operand data itself, which may be any size

    Table 10.6 The String Table Structure
    Name  Size (in Bytes)  Description
    Size  4                The number of strings in the table (not the total table size in bytes) read more..

  • Page - 634

    Table 10.9 The Function Structure
    Name             Size (in Bytes)  Description
    Entry Point      4                The index of the first instruction of the function
    Parameter Count  1                The number of parameters the function accepts
    Local Data Size  4                The total size of the function’s local data (the sum of all local variables and arrays)

    Table 10.10 The Host API Call Table Structure read more..

  • Page - 635

    The Header
    The header is probably the easiest part of the executable to load. It’s read from the file simply by reading the first four elements and saving a few of them. Here’s the XVM prototype’s implementation:

        // Create a buffer to hold the file's ID string
        // (4 bytes + 1 null terminator = 5)
        char * pstrIDString;
        pstrIDString = ( char * ) malloc ( 5 read more..

  • Page - 636

    595 // Allocate the runtime stack int iStackSize = g_Script.Stack.iSize; g_Script.Stack.pElmnts = ( Value * ) malloc ( iStackSize * sizeof ( Value ) ); // Read the global data size (4 bytes) fread ( & g_Script.iGlobalDataSize, 4, 1, pScriptFile ); // Check for presence of _Main () (1 byte) fread ( & g_Script.iIsMainFuncPresent, 1, 1, pScriptFile ); // Read _Main ()'s function index (4 read more..

  • Page - 637

    596 That was easy, but loading the stream itself is considerably more complex. For the most part, it’s just a simple loop, but just like always, the details of the operand lists are going to make things tough. The basic idea is to start a loop that will iterate through each instruction in the stream. At each iteration, the opcode and operand count are read from the read more..

  • Page - 638

    597 { // Integer literal case OP_TYPE_INT: fread ( & pOpList [ iCurrOpIndex ].iIntLiteral, sizeof ( int ), 1, pScriptFile ); break; // Floating-point literal case OP_TYPE_FLOAT: fread ( & pOpList [ iCurrOpIndex ].fFloatLiteral, sizeof ( float ), 1, pScriptFile ); break; // String index case OP_TYPE_STRING: // Since there's no field in the Value structure for string // table // indices, read more..

  • Page - 639

    598 fread ( & pOpList [ iCurrOpIndex ].iOffsetIndex, sizeof ( int ), 1, pScriptFile ); break; // Function index case OP_TYPE_FUNC_INDEX: fread ( & pOpList [ iCurrOpIndex ].iFuncIndex, sizeof ( int ), 1, pScriptFile ); break; // Host API call index case OP_TYPE_HOST_API_CALL_INDEX: fread ( & pOpList [ iCurrOpIndex ].iHostAPICallIndex, sizeof ( int ), 1, pScriptFile ); break; // Register case read more..

  • Page - 640

    into the integer’s slot and forget about it. In the next section, when you load the string table, you’ll put this information to use.
    The String Table
    At runtime, strings are stored directly in the Value structure, which is different from their storage on disk, where strings are organized in a separate table and only indirectly referenced in the instruction read more..

  • Page - 641

    600 { // Get the instruction's operand count and a copy of its operand list int iOpCount = g_Script.InstrStream.pInstrs [ iCurrInstrIndex ].iOpCount; Value * pOpList = g_Script.InstrStream.pInstrs [ iCurrInstrIndex ].pOpList; // Loop through each operand for ( int iCurrOpIndex = 0; iCurrOpIndex < iOpCount; ++ iCurrOpIndex ) { // If the operand is a string index, make a local copy of read more..

  • Page - 642

        // Get the current operand type
        int OpType = g_Script.InstrStream.Instrs [ CurrInstr ].OpList [ CurrOp ].Type;

        // Is this a string operand?
        if ( OpType == OP_TYPE_STRING )
        {
            // The string index is in the IntLiteral field
            int StringIndex = g_Script.InstrStream.Instrs [ CurrInstr ].OpList [ CurrOp ].IntLiteral;

            // Get the string from the table
            string StringOp = read more..

  • Page - 643

        // Read the function count (4 bytes)
        int iFuncTableSize;
        fread ( & iFuncTableSize, 4, 1, pScriptFile );

        // Allocate the table
        g_Script.pFuncTable = ( Func * ) malloc ( iFuncTableSize * sizeof ( Func ) );

    Next is a loop that reads each function from the file:

        // Read each function
        for ( int iCurrFuncIndex = 0; iCurrFuncIndex < iFuncTableSize; ++ iCurrFuncIndex )
        {
            // Read the read more..

  • Page - 644

    I’ll just let the code speak for itself. Here’s the allocation:

        // Read the host API call count
        fread ( & g_Script.HostAPICallTable.iSize, 4, 1, pScriptFile );

        // Allocate the table
        g_Script.HostAPICallTable.ppstrCalls = ( char ** ) malloc ( g_Script.HostAPICallTable.iSize * sizeof ( char * ) );

    Next is a loop that reads each host API call from the file:

        for ( int iCurrCallIndex = read more..

  • Page - 645

    Figure 10.18 illustrates the concept of adequate interfaces for script structures:
    NOTE
    The details and purpose of this section may be somewhat confusing at first, so you might have to take some of this on faith. The following section, “The Execution Cycle,” will be considerably easier to understand and implement with this under your read more..

  • Page - 646

    605 to move the source data. It’d be nice to make a single function call that essentially tells the VM “give me the stack index of the first operand”. Of course, because the destination may also be the _RetVal register, which doesn’t reside on the stack, you might first want to say “tell me the type of the first operand.” This would just be a simple read more..

  • Page - 647

    First, you’ll need a function that will simply return the type of a given operand in the current instruction:

        int GetOpType ( int iOpIndex )
        {
            // Get the current instruction
            int iCurrInstr = g_Script.InstrStream.iCurrInstr;

            // Return the type
            return g_Script.InstrStream.pInstrs [ iCurrInstr ].pOpList [ iOpIndex ].iType;
        }

    Simple, huh? All you had to do was grab the iType field read more..

  • Page - 648

    607 So you can read the type of the current instruction’s operands. What about the operand values themselves? You can start by writing a function that returns exactly that: int GetOpType ( int iOpIndex ) { // Get the current instruction int iCurrInstr = g_Script.InstrStream.iCurrInstr; // Return the type return g_Script.InstrStream.pInstrs [ iCurrInstr ].pOpList [ iOpIndex ].iType; } All read more..

  • Page - 649

    608 // Return a function table index int GetOpAsFuncIndex ( int OpIndex ); // Return a host API call index string GetOpAsHostAPICallIndex ( int OpIndex ); // Return a register code string GetOpAsReg ( int OpIndex ); These functions are only so useful, however. Remember, most instructions not only accept literal values, but also _RetVal and variables that refer to values on the read more..

  • Page - 650

                // It's in _RetVal
                case OP_TYPE_REG:
                    return g_Script._RetVal;

                // Anything else can be returned as-is
                default:
                    return OpValue;
            }
        }

    How cool is this function? Just pass it an operand index, and it’ll return the Value structure that contains it, no matter where it is: directly in the instruction stream, on the stack via both absolute and relative indices, or in _RetVal. The read more..

  • Page - 651

    Being able to load a specific data type from any operand with a single call is a great help, but you need to take it one step further for it to do everything you’ll ultimately need. In addition to simply reading a given field from an operand’s Value structure, you’ll also need these functions to automatically perform coercions. For example, imagine you’re read more..

  • Page - 652

    611 int CoerceValueToInt ( Value Val ) { // Determine which type the Value currently is switch ( Val.iType ) { // It's an integer, so return it as-is case OP_TYPE_INT: return Val.iIntLiteral; // It's a float, so cast it to an integer case OP_TYPE_FLOAT: return ( int ) Val.fFloatLiteral; // It's a string, so convert it to an integer case OP_TYPE_STRING: return atoi ( read more..

  • Page - 653

    612 // It's a string, so convert it to a float case OP_TYPE_STRING: return ( float ) atof ( Val.pstrStringLiteral ); // Anything else is invalid default: return 0; } } Looks simple enough. Here’s the string version: char * CoerceValueToString ( Value Val ) { char * pstrCoercion; if ( Val.iType != OP_TYPE_STRING ) pstrCoercion = ( char * ) malloc ( MAX_COERCION_STRING_SIZE + 1 read more..

  • Page - 654

    Now this function is a bit different and deserves some explanation. The issue here is that unlike the primitive data types int and float, strings are not allocated statically; whenever an operand must be converted to a string, its space must be allocated immediately. Unfortunately, we can’t very easily tell how long the string needs to be that will hold the read more..
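    For comparison with the fixed MAX_COERCION_STRING_SIZE buffer, here's a sketch of a different technique: asking snprintf () for the required length first. This is an alternative offered for illustration, not the XVM prototype's approach; it assumes a C99 compiler, and IntToNewString () is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Converts an integer to a newly allocated string of exactly the
   right size. The caller must free () the result. Returns NULL on
   failure. */
char * IntToNewString ( int iValue )
{
    /* With a NULL buffer and zero size, C99 snprintf () writes
       nothing and returns the length the output would have had */
    int iLength = snprintf ( NULL, 0, "%d", iValue );
    if ( iLength < 0 )
        return NULL;

    /* One extra byte for the null terminator */
    char * pstrResult = ( char * ) malloc ( ( size_t ) iLength + 1 );
    if ( ! pstrResult )
        return NULL;

    /* The second call does the real conversion */
    int iWritten = snprintf ( pstrResult, ( size_t ) iLength + 1, "%d", iValue );
    assert ( iWritten == iLength );

    return pstrResult;
}
```

    The trade-off is two formatting passes per coercion in exchange for never over- or under-allocating. A fixed maximum is simpler and probably fast enough for a prototype.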

  • Page - 655

    There is one exception, however, and that’s GetOpType (), which must actually exist in two forms. The reason for this is an operand can potentially have two types at once, in a manner of speaking. On the one hand, all values ultimately come down to one of the direct types: integers, strings, line labels, whatever. However, the single level of indirection allowed read more..

  • Page - 656

    615 easy and automated. Fortunately, this part of the job is easier by nature, and we’ll only need to write one new function to handle it. Reading operands is complicated because their location within the runtime environment must be resolved, and their data types must be coerced. Writing them, however, is quite a bit simpler because they can only go to one of two places: read more..

  • Page - 657

    616 case OP_TYPE_REL_STACK_INDEX: { int iStackIndex = ResolveOpStackIndex ( iOpIndex ); return & g_Script.Stack.pElmnts [ ResolveStackIndex ( iStackIndex ) ]; } // It's _RetVal case OP_TYPE_REG: return & g_Script._RetVal; } // Return NULL for anything else return NULL; } With this function, any destination operand can be easily written to by writing a Value structure to the pointer it read more..

  • Page - 658

    The way this works is simple: if iIndex is less than zero, meaning it’s a negative stack index and is therefore relative to the top of the current stack frame, it’s added to the stack’s iFrameIndex index. Otherwise, it’s left alone because positive indices are already in their fully resolved form. Remember, negative stack indices are relative to the top of the read more..

  • Page - 659

    To push a runtime value onto the stack, you copy the Value structure into the array index pointed to by the iTopIndex field of the Stack structure, and then increment that value. Here’s an example:

        void Push ( Value Val )
        {
            // Get the current top element
            int iTopIndex = g_Script.Stack.iTopIndex;

            // Put the value into the current top index
            g_Script.Stack.pElmnts [ read more..

  • Page - 660

    need a good way to quickly push a large block of new elements onto the stack. You can create a function called PushFrame () to do the job for you:

        void PushFrame ( int iSize )
        {
            // Increment the top index by the size of the frame
            g_Script.Stack.iTopIndex += iSize;

            // Move the frame index to the new top of the stack
            g_Script.Stack.iFrameIndex = g_Script.Stack.iTopIndex; read more..

  • Page - 661

    Once an empty frame has been established with PushFrame (), you can use the random access SetStackValue () and GetStackValue () to manipulate its elements. But, like everything you push onto the stack, stack frames must eventually be popped back off when the function returns. This is just as easy as the PushFrame () function; all you do is decrement TopIndex by the read more..

  • Page - 662

    621 // Copy the object * pDest = Source; // Make a physical copy of the source string, if necessary if ( Source.iType == OP_TYPE_STRING ) { pDest->pstrStringLiteral = ( char * ) malloc ( strlen ( Source.pstrStringLiteral ) + 1 ); strcpy ( pDest->pstrStringLiteral, Source.pstrStringLiteral ); } } Cool, huh? Now, instead of directly assigning anything to the stack or _RetVal, we just pass read more..

  • Page - 663

    Summary
    Just to round out the discussion and provide a reference, here are all of the functions you’ve created (directly or indirectly) in this section:
    The Instruction Stream
    The following code returns the type of the specified operand in the current instruction. Note the difference between GetOpType () and ResolveOpType (). The first returns the type of the operand as read more..

  • Page - 664

    623 Lastly, once we’ve done all of our operand reading, it’s time to do some writing. We can do this easily with ResolveOpPntr (), which returns a pointer to the Value structure of any operand: Value * ResolveOpPntr ( int iOpIndex ); The Runtime Stack Above all else, stack indices need to be interpreted properly since they can come in positive and negative forms. This read more..

  • Page - 665

    This wraps up the interfaces the XVM prototype’s major structures will need. With these in place, we can get back to executing scripts.
    Initializing the VM
    Before the script can begin execution, the runtime environment must be prepared, which is a simple but vital process. Here’s a rundown of what must be done to set the stage for the script to run: ■ The read more..

  • Page - 666

    625 // If the function table is present, set the entry point if ( g_Script.FuncTable.pFuncs ) { // If _Main () is present, read _Main ()'s index of the function // table to get its entry point if ( g_Script.iIsMainFuncPresent ) { g_Script.InstrStream.iCurrInstr = g_Script.FuncTable.pFuncs [ iMainFuncIndex ].iEntryPoint; } } // Clear the stack g_Script.Stack.iTopIndex = 0; read more..

  • Page - 667

    626 The next two sections require a bit more explanation. The global data in a script always resides at the bottom, which means that if there are four global variables and a global array of 12 elements, declared like this: Var GlobalVar0 Var GlobalVar1 Var GlobalVar2 Var GlobalVar3 Var GlobalArray [ 12 ] The script will need to maintain a total of 16 stack indices, relative read more..

  • Page - 668

    627 Once the global data region has been added, the stack is almost ready to go. The only detail that remains is the _Main () function’s stack frame. _Main () may be a special function, but it needs a stack frame just like any other function the script may define. The stack frame itself is used for slightly simpler purposes, however. Since _Main () doesn’t have read more..

  • Page - 669

    628 On a basic level, this primitive version of the VM will consist mainly of a while loop that encapsulates the entire execution cycle and runs until a key is pressed. At each iteration of the loop, a new instruction is processed in full; its effects on the stack and string table are managed and any jumps or function calls it makes are handled. After executing the read more..

  • Page - 670

    629 In a lot of ways the function method is more flexible; for example, DLLs or other forms of dynamic libraries could be written that allow the VM to “swap out” entire instruction sets. It also provides better overall encapsulation, because each instruction is in an isolated scope. However, I prefer the switch method for smaller languages like this one and mostly read more..

  • Page - 671

    630 In order to switch to the proper instruction, it helps to assign each opcode to a constant that gives it a more intelligible name. The code then becomes much more readable. Consider this: switch ( Opcode ) { case 0: // Implement Mov break; case 1: // Implement Add break; case 2: // Implement Sub break; } And compare it to this: switch ( Opcode ) { case INSTR_MOV: // read more..

  • Page - 672

    631 case INSTR_SUB: // Implement Sub break; } The latter is obviously a lot easier to follow and understand. Table 10.13 lists these constants.

    BUILDING THE XVM PROTOTYPE

    Table 10.13 Instruction Opcode Constants

    Mnemonic  Opcode  Constant
    Mov       0       INSTR_MOV
    Add       1       INSTR_ADD
    Sub       2       INSTR_SUB
    Mul       3       INSTR_MUL
    Div       4       INSTR_DIV
    Mod       5       INSTR_MOD
    Exp       6       INSTR_EXP
    Neg       7       INSTR_NEG
    Inc       8       INSTR_INC
    Dec       9       INSTR_DEC
    And       10      read more..

  • Page - 673

    632 With this table, you can easily set up a basic instruction-handling skeleton, like so: // Check the current opcode value switch ( iOpcode ) { case INSTR_MOV: // Implement Mov break; case INSTR_ADD: // Implement Add break; case INSTR_SUB: // Implement Sub break;

    10. BASIC VM DESIGN AND IMPLEMENTATION

    Table 10.13 Continued

    Mnemonic  Opcode  Constant
    JE        20      INSTR_JE
    JNE       21      INSTR_JNE
    JG        22      read more..

  • Page - 674

    633 // ... case INSTR_PAUSE: // Implement Pause break; case INSTR_EXIT: // Implement Exit break; } As you can see, you’re working your way in from the outside. You started with nothing but data structures, and then created a main loop, and now you have an instruction-handling skeleton. The next stop is each instruction’s behavior. But first, let’s take a quick detour into a read more..
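
To make the skeleton a little more concrete, here is a tiny filled-in version of the same switch. It is a sketch under simplifying assumptions rather than the XVM's actual handler: it covers only four opcodes, works on plain ints instead of Value structures, and the helper name ExecIntInstr is hypothetical. The opcode values are the ones listed in Table 10.13.

```c
#include <assert.h>

/* Opcode constants, using the values from Table 10.13 */
#define INSTR_MOV 0
#define INSTR_ADD 1
#define INSTR_SUB 2
#define INSTR_MUL 3

/* Hypothetical helper: applies one instruction to an integer destination.
   Returns 1 if the opcode was handled and 0 if it was not recognized. */
int ExecIntInstr(int iOpcode, int *piDest, int iSource)
{
    assert(piDest);

    switch (iOpcode) {
    case INSTR_MOV:
        *piDest = iSource;      /* Mov Dest, Source */
        break;
    case INSTR_ADD:
        *piDest += iSource;     /* Add Dest, Source */
        break;
    case INSTR_SUB:
        *piDest -= iSource;     /* Sub Dest, Source */
        break;
    case INSTR_MUL:
        *piDest *= iSource;     /* Mul Dest, Source */
        break;
    default:
        return 0;               /* Unknown opcode: leave Dest untouched */
    }

    return 1;
}
```

Because every case refers to a named constant, adding another instruction means adding one #define and one case, which is the readability win the named constants are meant to provide.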

  • Page - 675

    634 // Yes, so unpause the script g_Script.iIsPaused = FALSE; } else { // No, so skip this iteration of the execution cycle continue; } } Simple, huh? Either the pause is over and the flag is cleared, or we just skip this iteration of the loop with continue. You may be wondering where the iCurrTime variable gets its value, however. At each iteration of the execution loop, read more..

  • Page - 676

    635 executing Call or Jmp (for example), IP will point to the function’s entry point or the jump’s target instruction. This means that IP shouldn’t be changed before the next instruction is executed, because it’s already where it needs to be for the next cycle. However, if our code blindly increments IP after executing all instructions, we’re going to run into read more..
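
One way to avoid blindly incrementing IP is to remember its value before the instruction runs and only advance it if the instruction left it alone. The following is a minimal sketch of that idea, not the book's code: the two-instruction set (OP_NOP and OP_JMP), the DemoInstr structure, and the Step () function are all hypothetical.

```c
#include <assert.h>

enum { OP_NOP, OP_JMP };   /* Hypothetical two-instruction set */

typedef struct {
    int iOpcode;
    int iTarget;           /* Jump target; unused by OP_NOP */
} DemoInstr;

/* Executes the instruction at iCurrInstr and returns the new instruction pointer */
int Step(const DemoInstr *pInstrs, int iCurrInstr)
{
    assert(pInstrs && iCurrInstr >= 0);

    /* Remember where IP was before the instruction ran */
    int iInstrBefore = iCurrInstr;

    switch (pInstrs[iCurrInstr].iOpcode) {
    case OP_JMP:
        /* The instruction moves IP itself */
        iCurrInstr = pInstrs[iCurrInstr].iTarget;
        break;
    case OP_NOP:
    default:
        break;
    }

    /* Only advance if the instruction didn't already change IP */
    if (iCurrInstr == iInstrBefore)
        ++iCurrInstr;

    return iCurrInstr;
}
```

One caveat of this comparison-based check is that a jump whose target is its own address looks like "no change" and falls through to the next instruction.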

  • Page - 677

    636 With this final detail out of the way, the skeleton of the execution cycle is pretty much taken care of, so we can get back to the real meat of things: implementing the instruction set. Operand Resolution As you saw, each instruction’s implementation resides in a case. Within this case, you can break the implementation into phases, as discussed earlier, like this: case read more..

  • Page - 678

    637 Instruction Execution and Result Storage You’ve seen a generic method for resolving operands, so you’re ready to move into the next phase of the instruction’s implementation, which is the execution of its logic and the storage of its results. As you’ll see, storing the results of an instruction is so simple it barely deserves its own phase, so it’ll be almost read more..

  • Page - 679

    638 // Copy the source operand into the destination CopyValue ( & Dest, Source ); // Use ResolveOpPntr () to get a pointer to the destination Value // structure and move the result there * ResolveOpPntr ( 0 ) = Dest; break; Figure 10.31 illustrates how Mov works. [Figure 10.31: Mov in action.] Pretty simple, huh? All it does is the read more..

  • Page - 680

    639 case INSTR_ADD: // Add Op0, Op1 // Get a local copy of the destination operand (operand index 0) Value Dest = ResolveOpValue ( 0 ); // Add the source to the destination if ( Dest.iType == OP_TYPE_INT ) Dest.iIntLiteral += ResolveOpAsInt ( 1 ); else Dest.fFloatLiteral += ResolveOpAsFloat ( 1 ); // Use ResolveOpPntr () to get a pointer to the destination Value // structure read more..

  • Page - 681

    640 Conditional Branching Implementation The jump instructions are a little bit different than Mov and the binary operations, but they’re nothing you can’t handle. To start things off, let’s look at what’s by far the simplest branch instruction, Jmp, the unconditional jump. case INSTR_JMP: { // Jmp Label // Get the index of the target instruction (operand index 0) int read more..

  • Page - 682

    641 switch ( Op0.iType ) { case OP_TYPE_INT: if ( Op0.iIntLiteral == Op1.iIntLiteral ) iJump = TRUE; break; case OP_TYPE_FLOAT: if ( Op0.fFloatLiteral == Op1.fFloatLiteral ) iJump = TRUE; break; case OP_TYPE_STRING: if ( strcmp ( Op0.pstrStringLiteral, Op1.pstrStringLiteral ) == 0 ) iJump = TRUE; break; } // If the comparison evaluated to TRUE, make the jump if ( iJump ) read more..

  • Page - 683

    642 Function Call Implementation After all you’ve seen, you may be under the impression that the implementation of your function call system will be right up there with the more complex aspects of your virtual machine. Fortunately, this is not the case. You’ve written such a powerful base of helper functions already for working with the stack and routing the flow of read more..

  • Page - 684

    643 ReturnAddr.iInstrIndex = g_Script.InstrStream.iCurrInstr; Push ( ReturnAddr ); // Push the stack frame + 1 (the extra space is for the function index // we'll put on the stack after it) PushFrame ( DestFunc.iLocalDataSize + 1 ); // Write the function index and old stack frame to the top of the stack Value FuncIndex; FuncIndex.iFuncIndex = iFuncIndex; FuncIndex.iOffsetIndex = read more..

  • Page - 685

    644 Ret Of course, you don’t want to strand your script inside a function. Once a Ret instruction is encountered by the VM, it’s time to go home. Take a look at the implementation: case INSTR_RET: // Ret // Get the current function index off the top of the stack and use it to // get the corresponding function structure Value FuncIndex = Pop (); Func CurrFunc = read more..

  • Page - 686

    645 // Make the jump to the return address g_Script.InstrStream.iCurrInstr = ReturnAddr.iInstrIndex; break; The instruction begins by popping the function table index off the top of the stack that Call placed there just before it invoked the function. Remember, this value must be on top of the stack when Ret is called, or else none of its logic will work. Because of this, read more..

  • Page - 687

    646 Pause Implementation The last instruction I want to take a look at is Pause, because it has more of an effect on the main loop of the virtual machine than the other instructions. Once Pause is executed, the execution cycle will ignore the current instruction until the pause duration has elapsed. Here’s the implementation: case INSTR_PAUSE: { // Pause Duration // Get read more..

  • Page - 688

    647 The only real job left at this point is to free the dynamically allocated data structures. This includes the following: ■ The instruction stream and each instruction’s operand list. ■ The runtime stack. ■ The function table. ■ The host API call table. Note that some structures like the script header can be ignored in this phase due to their static allocation. One read more..

  • Page - 689

    648 // Free any strings that are still on the stack for ( int iCurrElmtnIndex = 0; iCurrElmtnIndex < g_Script.Stack.iSize; ++ iCurrElmtnIndex ) if ( g_Script.Stack.pElmnts [ iCurrElmtnIndex ].iType == OP_TYPE_STRING ) free ( g_Script.Stack.pElmnts [ iCurrElmtnIndex ].pstrStringLiteral ); // Now free the stack itself if ( g_Script.Stack.pElmnts ) free ( g_Script.Stack.pElmnts ); // ---- Free read more..

  • Page - 690

    649 awesome, and will give you plenty of power to play with for a while. The finished XtremeScript Virtual Machine will be a fast, powerful, and best of all, multithreaded virtual machine that can communicate easily with the host application. Once the next chapter is finished, ending this section of the book, you’ll be rounding the home stretch and find yourself hip-deep read more..

  • Page - 692

    CHAPTER 11: Advanced VM Concepts and Issues “After Fleet gasses the planet, M.I. mops up.” (Lieutenant Rasczak, Starship Troopers) read more..

  • Page - 693

    652 It’s on now. Chapter 10 introduced you to the design and implementation of a virtual machine’s core logic, and now you’re going to finish the job by adding the much-needed features that will allow your runtime environment to fully integrate itself with a game engine. By the time this chapter is through, the XtremeScript Virtual Machine (XVM) will be finished read more..

  • Page - 694

    653 MULTITHREADING The current VM is single-threaded, which means that only one script’s bytecode can be executed at once. Furthermore, the runtime environment’s internal structures only allow for a single script to be stored in memory at any given time, using the g_Script structure. However, because games are naturally based around large numbers of autonomous entities that all read more..

  • Page - 695

    654 these scripts are extremely lightweight—so much so, in fact, that a custom-built threading system would be the best way to capitalize on their small footprints and maximize efficiency. Besides, actually implementing threads is a far better learning experience. Multithreading Fundamentals Let’s start at the beginning. Virtually all operating systems these days are multitasking read more..

  • Page - 696

    655 problem with the cooperative approach is that it relies on programs to govern themselves. If you’ve ever read Lord of the Flies, you know this can only end badly. Figure 11.3 displays the uneven behavior of a cooperative multitasking system. [Figure 11.3: Cooperative multitasking leads to an uneven distribution of processor time.] NOTE The term context switch read more..

  • Page - 697

    656 Because of this lack of equality among tasks, a cooperative multitasking system tends to lag and feel noticeably uneven. This is brought on by the fact that each program in memory can potentially run at wildly varying intervals, resulting in certain programs with perfect responsiveness and others that feel sluggish and jerky. This issue is significant when dealing with read more..

  • Page - 698

    657 This is known as round-robin scheduling, because each thread is executed in the same sequence every time, as illustrated in Figure 11.5. The mechanism within the operating system that manages context switches among tasks and threads is known as the scheduler. [Figure 11.5: Round-robin time slice scheduling.] NOTE The actual definition of a task’s priority can vary. read more..

  • Page - 699

    658 being interrupted. Figure 11.6 illustrates how a single function or procedure can be transparently broken into multiple time slices. [Figure 11.6: A single function can be executed over the course of multiple time slices without the program’s knowledge.] From Tasks to Threads Multitasking is great, but modern applications need even more flexibility read more..

  • Page - 700

    659 Concurrent Execution Issues Despite its obvious utility value and necessity for game development, multithreading is a technology that brings with it a number of serious issues and caveats. Just as roommates sharing a single bathroom and refrigerator tend to get in each other’s way, threads that share common or global data run a significant risk of stepping on one read more..

  • Page - 701

    660 within the game world to the player’s statistics like the amount of damage the ship has taken or how much ammo is left in the sniper rifle. All of this data is vital to a game’s execution. If the player’s on-screen Y-location were to suddenly jump 400 pixels, for example, it would have a significant effect on the game’s overall playability. Naturally, threads read more..

  • Page - 702

    661 Atomic Operations One approach to the problem presented by race conditions is to wrap all modifications of shared data in atomic operations. An atomic operation is a block of code that is guaranteed to execute in full without fear of a context switch occurring. Atomic operations are implemented in many ways, varying from one platform to the next, but I’ll discuss a read more..

  • Page - 703

    662 if ( g_Enemy.iX > g_Player.iX ) -- g_Enemy.iX; // Move the enemy closer on the Y-axis if ( g_Enemy.iY < g_Player.iY ) ++ g_Enemy.iY; if ( g_Enemy.iY > g_Player.iY ) -- g_Enemy.iY; With these two threads running concurrently, it won’t be long before they slip out of sync (if they’re even in sync to begin with, which is unlikely). When this happens, the comparisons and read more..

  • Page - 704

    663 if ( g_Enemy.iY > g_Player.iY ) -- g_Enemy.iY; } The scripting system knows now that both of these blocks are critical to the integrity of the game engine’s data overall and will allow them to run in full before a pending context switch can take effect. Figure 11.10 illustrates atomic operations. Figure 11.10 Atomic operations allow code blocks to execute in read more..

  • Page - 705

    664 in the last example, neither of them can be active at the same time as the other. If there were three such blocks in the example, two of them would have to remain inactive while the third was performing its operation. No matter how many blocks of code attempt to access a single shared resource, they’re all part of the same critical section and therefore cannot read more..

  • Page - 706

    665 operation that takes place within its particular part of the critical section, this flag is read. If it’s clear, the thread sets the flag and begins its operation. During this time, context switches will regularly occur and interrupt the thread with the time slices of other threads. These other threads may themselves attempt to access the same resource, and therefore read more..

  • Page - 707

    666 semaphore and a mutex is that a mutex treats a resource as either locked or unlocked, thereby allowing only a single thread access to a resource at one time. A semaphore, on the other hand, lets a specific number of threads access the resource concurrently before it denies subsequent requests. Because of this, mutexes are often known as binary semaphores. Race Conditions read more..
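
The difference between the two comes down to a counter and its limit, which a short sketch can show. This is bookkeeping only, under a big assumption: it is meant for a custom scheduler like the one discussed in this chapter, where a context switch can't land in the middle of these functions. It is not safe for native operating system threads, which need real atomic primitives. The names DemoSemaphore, TryAcquire, and Release are hypothetical.

```c
#include <assert.h>

/* Hypothetical counting semaphore; a mutex is the special case iMaxCount == 1 */
typedef struct {
    int iMaxCount;   /* How many threads may hold the resource at once */
    int iCurrCount;  /* How many threads hold it right now */
} DemoSemaphore;

/* Tries to take the resource. Returns 1 on success, or 0 if the limit has
   been reached and the caller has to wait and try again later. */
int TryAcquire(DemoSemaphore *pSem)
{
    assert(pSem->iCurrCount <= pSem->iMaxCount);

    if (pSem->iCurrCount == pSem->iMaxCount)
        return 0;

    ++pSem->iCurrCount;
    return 1;
}

/* Gives the resource back so another thread can take it */
void Release(DemoSemaphore *pSem)
{
    assert(pSem->iCurrCount > 0);
    --pSem->iCurrCount;
}
```

Initialized with a maximum of 1, the structure behaves like a mutex (locked or unlocked); with a maximum of 2 or more, it lets that many threads in before turning the next one away.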

  • Page - 708

    667 Loading and Storing Multiple Scripts Now that you have a basic understanding of the concepts behind multithreading, it’s time to get back to reality. Before I get into the serious stuff, I still have to address the basic issue of loading and storing multiple scripts at once. All the multithreading theory in the world won’t matter if you can’t even get more than read more..

  • Page - 709

    668 Straight C arrays, on the other hand, offer the following advantages: ■ Very easy implementation. ■ Extremely fast and simple random or sequential access. And, as expected, the following disadvantages: ■ General inflexibility due to a limit being placed on the number of scripts that can theoretically be in memory at once. ■ Inefficient memory usage that doesn’t read more..

  • Page - 710

    669 // Runtime tracking int iIsPaused; // Is the script currently paused? int iPauseEndTime; // If so, when should it resume? // Register file Value _RetVal; // The _RetVal register // Script data InstrStream InstrStream; // The instruction stream RuntimeStack Stack; // The runtime stack Func * read more..

  • Page - 711

    670 typedef struct _InstrStream // An instruction stream { Instr * pInstrs; // The instructions themselves int iSize; // The number of instructions in the // stream int iCurrInstr; // The instruction pointer } InstrStream; Two ints and a 32-bit pointer add up to another 12 bytes for this structure, read more..

  • Page - 712

    671 And there you have it. For only 72KB, which isn’t even a tenth of a megabyte, you can support up to 1024 scripts at once—more than enough for most games. So, the first moral of the story is that arrays will hardly waste memory. Secondly, 1024 script structures is huge, which is hardly limiting either. Chances are your game will never even approach that limit, read more..

  • Page - 713

    672 More Robust Error Handling LoadScript () has always returned an error code to the caller in the event that something went wrong, but has glossed over the potential memory allocation errors that can occur when using malloc (). Until now this wasn’t an issue, but the XVM will soon be an embeddable module, and therefore have a public interface. A module’s read more..

  • Page - 714

    673 if ( ! ( g_Scripts [ iThreadIndex ].Stack.pElmnts = ( Value * ) malloc ( iStackSize * sizeof ( Value ) ) ) ) return LOAD_ERROR_OUT_OF_MEMORY; Note again the transition from g_Script to g_Scripts []. Let’s now take a look at the code for determining the next free thread index: // ---- Find the next free script index int iFreeThreadFound = FALSE; for ( int read more..

  • Page - 715

    674 Initialization and Shutdown In addition to LoadScript (), it’s now necessary to make some changes to the Init () and ShutDown () functions. Because these functions are primarily responsible for initializing the script structure to the proper default values and freeing it when the XVM exits, they’ll have to be rewritten to work with the entire g_Scripts [] array. read more..

  • Page - 716

    675 PushFrame (), ResolveOpAsInt (), and so on and so forth, are designed to work with the same script. I don’t mean the same script in the sense that they all work with the g_Script structure. Rather, I mean that they all work with the script that is currently executing, which could be any of the scripts in the new g_Scripts [] array. What this means is that read more..

  • Page - 717

    676 As an example, here’s the old version of PushFrame (): void PushFrame ( int iSize ) { // Increment the top index by the size of the frame g_Script.Stack.iTopIndex += iSize; // Move the frame index to the new top of the stack g_Script.Stack.iFrameIndex = g_Script.Stack.iTopIndex; } Here’s the updated version: void PushFrame ( int iSize ) { // Increment the top index by read more..

  • Page - 718

    677 Value GetStackValue ( int iIndex ); void SetStackValue ( int iIndex, Value Val ); void Push ( Value Val ); Value Pop (); void PushFrame ( int iSize ); void PopFrame ( int iSize ); And the function table/host API call table interface: Func GetFunc ( int iIndex ); char * GetHostAPICall ( int iIndex ); There are, however, cases where g_iCurrThread won’t be enough, and a read more..

  • Page - 719

    678 This process loops until either a key is pressed or every thread exits by reaching an Exit instruction. As you can see, this custom-built multithreading system is really quite simple; all it takes is the capability to maintain a thread index and a time slice timer. Now that you understand the overall strategy, let’s break down the details. Tracking Active Threads read more..

  • Page - 720

    679 typedef struct _Script // Encapsulates a full script { int iIsActive; // Is this script structure in use? // Header data int iGlobalDataSize; // The size of the script's global data int iIsMainFuncPresent; // Is _Main () present? int iMainFuncIndex; // _Main ()'s function index // Runtime tracking int iIsRunning; read more..

  • Page - 721

    680 // Set the activation time for the current thread // to get things rolling g_iCurrThreadActiveTime = GetCurrTime (); Now that the first time slice has been invoked, the main loop can begin. Performing a Context Switch At each iteration of the main loop, the first order of business is to determine whether the current time slice has elapsed, and perform a context switch read more..

  • Page - 722

    681 A while loop is entered that cycles through each element of the array. Notice that the current thread is incremented at the top of the loop rather than the bottom; this is because when the loop initially starts, g_iCurrThread will point to the thread that is currently ending, so you need to immediately move past it. The thread index then wraps around to zero if read more..
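
The search for the next thread described here can be sketched as a small standalone function. This is an illustration under assumptions, not the XVM's loop: it uses a for loop with a step counter instead of a while loop, returns -1 when nothing can run, and the names DemoThread, DEMO_THREAD_COUNT, and NextThread are hypothetical. The iIsActive and iIsRunning flags mirror the fields of the same names in the Script structure.

```c
#include <assert.h>

#define DEMO_THREAD_COUNT 4   /* Hypothetical number of thread slots */

typedef struct {
    int iIsActive;    /* Is a script loaded in this slot? */
    int iIsRunning;   /* Is that script currently running? */
} DemoThread;

/* Returns the index of the next runnable thread after iCurrThread,
   wrapping around the end of the array, or -1 if no thread can run. */
int NextThread(const DemoThread *pThreads, int iCurrThread)
{
    assert(iCurrThread >= 0 && iCurrThread < DEMO_THREAD_COUNT);

    /* Start one past the current thread; the current thread itself is
       checked last, so it keeps running if it is the only candidate. */
    for (int iStep = 1; iStep <= DEMO_THREAD_COUNT; ++iStep) {
        int iCandidate = (iCurrThread + iStep) % DEMO_THREAD_COUNT;

        if (pThreads[iCandidate].iIsActive && pThreads[iCandidate].iIsRunning)
            return iCandidate;
    }

    return -1;
}
```

Bounding the search to one full pass over the array means it can't spin forever when every thread has stopped, which is a case an open-ended loop has to guard against separately.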

  • Page - 723

    682 if ( ! iIsStillActive ) iExitExecLoop = TRUE; // Print the exit code PrintOpValue ( 0 ); break; After extracting the exit code operand as usual, the instruction handler sets the current thread’s iIsRunning flag to FALSE. It then creates a flag variable called iIsStillActive, sets it to FALSE, and loops through each thread in the g_Scripts [] array to find out if any read more..

  • Page - 724

    683 Running Scripts in Parallel with the Host So far, every incarnation of the XVM has been a standalone program that executes scripts in an uninterrupted loop until they terminate, or until the user presses a key. This is fine for demos, as well as standalone virtual machines, but it’s not particularly conducive to embeddable runtime environments that need to execute in read more..

  • Page - 725

    684 Manual Time Slicing vs. Native Threads There are two ways to go about implementing this approach. You could use the operating system’s native threading system to physically run the game engine and virtual machine in separate threads, allowing you to leave the XVM’s design as it is and forget about it entirely, or you can do everything yourself and manually implement read more..

  • Page - 726

    685 Thinking in Multiple Dimensions It’s extremely important that you not confuse the XVM’s time slice with the time slices assigned to each script. Remember, regardless of how many scripts are in memory, or what their time slices may be, the XVM itself will only run for the duration specified by RunScripts ()'s caller. Within the XVM’s overall time slice of the game read more..

  • Page - 727

    686 Introducing the Integration Interface The integration interface between the host application and the scripts running inside the VM comes down to two major aspects in most scripting systems: the capability to make inter-language function calls, as well as the capability to “track” global variables. Function calls are the most obvious way to communicate, because they allow you to read more..

  • Page - 728

    687 the host’s. Furthermore, because these parameters have no explicit type, special functions must be used to read parameters from a specific stack index and with a specific data type in mind. Return values are much easier; all that’s necessary is to set the value of the _RetVal register stored within the script’s Script structure. Calling Script Functions from the Host read more..

  • Page - 729

    688 [Figure 11.21: Asynchronous function calls interrupt the flow of execution for both the game engine and the script.] [Figure 11.22: Synchronous function calls follow the existing flow of the scripts and game engine, and therefore execute over time as opposed to immediately.] read more..

  • Page - 730

    689 To put it another way, synchronous calls are a way for the host application to simulate the Call instruction. If a script were to call one of its own functions just before the XVM’s time slice ended, the called function wouldn’t begin executing until the next time slice rolled around. Also, unless it was extremely small, it probably wouldn’t return for at least read more..

  • Page - 731

    690 host global, such that the script-defined variable always mirrors its value. This way, if the script wants to constantly refer to a host application variable’s value, whether for the purpose of reading, writing, or both, it can do so in a more natural way without making a ton of function calls. Figure 11.24 illustrates this concept. read more..

  • Page - 732

    691 you can’t just refer to it like this within the assembler (assume BindToHostVar is an XASM directive for binding script variables to host variables): Var MyVar BindToHostVar MyVar, g_iGlobalInt One solution to this problem is to give the host application the capability to assign a numeric index to each of the globals it’d like to expose to the script, perhaps with a read more..

  • Page - 733

    692 Binding Stack Indexes to the Pointer Array Even variables that the script binds to the host application reside somewhere on the stack. This stack index, therefore, is all you need to keep the script’s variable in sync with the global defined by the host. Therefore, the BindToHostVar directive discussed earlier needs to save the specified variable’s stack index in a read more..

  • Page - 734

    693 Keeping the Values Synchronized At each frame of the game loop, the game engine and scripting system will execute in almost entirely separate phases. With the exception of inter-language function calls, which I won’t be addressing in this section, the game engine will be entirely halted while RunScripts () is running. For the rest of the frame, the scripting system is read more..
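
Since the game engine and the scripting system run in separate phases, one plausible way to keep bound variables in sync is to copy values across at the boundaries between those phases. The sketch below shows that idea only; it is an assumption about how the synchronization could work, not the book's implementation. It handles ints alone, treats the stack as a plain int array, and the names DemoTrackedVar, CopyHostToScript, and CopyScriptToHost are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TRACKED_COUNT 4   /* Hypothetical number of tracked-variable slots */

typedef struct {
    int *piHostVar;    /* Pointer to the host's global, or NULL if the slot is unused */
    int  iStackIndex;  /* Stack index of the script variable bound to it */
} DemoTrackedVar;

/* Before the scripts run: copy each host global into its bound stack slot */
void CopyHostToScript(const DemoTrackedVar *pVars, int *piStack)
{
    for (int i = 0; i < DEMO_TRACKED_COUNT; ++i) {
        if (pVars[i].piHostVar == NULL)
            continue;

        assert(pVars[i].iStackIndex >= 0);
        piStack[pVars[i].iStackIndex] = *pVars[i].piHostVar;
    }
}

/* After the scripts run: copy each bound stack slot back into the host global */
void CopyScriptToHost(const DemoTrackedVar *pVars, const int *piStack)
{
    for (int i = 0; i < DEMO_TRACKED_COUNT; ++i) {
        if (pVars[i].piHostVar == NULL)
            continue;

        assert(pVars[i].iStackIndex >= 0);
        *pVars[i].piHostVar = piStack[pVars[i].iStackIndex];
    }
}
```

In this arrangement the host would call CopyHostToScript () just before handing control to the scripts and CopyScriptToHost () just after they yield, so each side always sees the other's latest writes.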

  • Page - 735

    694 #define HOST_VAR_TYPE_INT 0 #define HOST_VAR_TYPE_FLOAT 1 #define HOST_VAR_TYPE_STRING 2 BindVarToIndex ( g_iGlobalInt, 0, HOST_VAR_TYPE_INT ); Of course, this also means the g_pGlobalVars [] array needs to become an array of structures, wherein each element stores both the pointer and its type: typedef struct _TrackedVar { void * pVar; int iType; } TrackedVar; TrackedVar g_TrackedVars [ read more..

  • Page - 736

    695 void Init (); void ShutDown (); int LoadScript ( char * pstrFilename, int & iScriptIndex, int iThreadTimeslice ); void UnloadScript ( int iThreadIndex ); void ResetScript ( int iThreadIndex ); void RunScripts ( int iTimesliceDur ); With these functions, the host application can initialize and shut down the system, load and unload scripts, reset them arbitrarily, and execute a read more..

  • Page - 737

    696 Because of this, it’s important to transform or mangle your function names in such a way that they’re less likely to step on the host application’s toes. The easiest way to do it is to follow the time-honored tradition of prefixing your function names with a brief abbreviation of your scripting system’s name (usually two letters) and an underscore. So, in the read more..

  • Page - 738

    697 tion of the integration interface. This will mostly boil down to the ability to make inter-language function calls, but as you’ll see, this is hardly a trivial matter. Basic Script Control Functions Just before getting into the nitty-gritties of the host API and other such issues, however, let’s start off with something simple and talk about the functions the host will read more..

  • Page - 739

    698 void XS_StartScript ( int iThreadIndex ) { // Make sure the thread index is valid and active if ( ! IsThreadActive ( iThreadIndex ) ) return; // Set the thread's execution flag g_Scripts [ iThreadIndex ].iIsRunning = TRUE; // Set the current thread to the script g_iCurrThread = iThreadIndex; // Set the activation time for the current // thread to get things rolling read more..

  • Page - 740

    699 Of course, this macro calls another macro, IsValidThreadIndex (). This one just makes sure that the specified thread index is within the proper range, which runs from 0 to MAX_THREAD_COUNT - 1: #define IsValidThreadIndex( iIndex ) \ ( ( iIndex ) < 0 || ( iIndex ) >= MAX_THREAD_COUNT ? FALSE : TRUE ) Together, these two macros provide an easy and quick way to make the public script control functions more robust. The last read more..

  • Page - 741

    700 Host API Calls You can begin your descent into the maddening world of the integration layer with host API calls. Host API calls are made from the script, and allow it to call functions written in C (or whatever the host application is written with) just like it’d call a typical script-defined function. The only difference is the use of the CallHost instruction read more..

  • Page - 742

    701 The Structure The first order of business is creating a structure to store the API within the XVM. As mentioned, this is really just an array of structures, wherein each structure represents a single API function. Let’s start with this structure’s definition: typedef struct _HostAPIFunc // Host API function { int iIsActive; // Is read more..

  • Page - 743

    702 Adding Host API Functions With the array decided upon, the host application needs an easy way to add functions to it. This process is called registering a host API function, and is handled with the function XS_RegisterHostAPIFunc (): void XS_RegisterHostAPIFunc ( int iThreadIndex, char * pstrName, HostAPIFuncPntr fnFunc ) { // Loop through each function in the host API until a read more..

  • Page - 744

    703 // Set the function to active g_HostAPI [ iCurrHostAPIFunc ].iIsActive = TRUE; } } } This function makes use of the usual technique of looping through an array until the first free element is found. Once an inactive structure is located, it’s populated with the function’s data, which pretty much comes directly from XS_RegisterHostAPIFunc ()’s parameters, and the read more..

  • Page - 745

    704 case INSTR_CALLHOST: { // Use operand zero to index into the host API call table and // get the host API function name Value HostAPICall = ResolveOpValue ( 0 ); int iHostAPICallIndex = HostAPICall.iHostAPICallIndex; // Get the name of the host API function char * pstrFuncName = GetHostAPICall ( iHostAPICallIndex ); // Search through the host API until the read more..

  • Page - 746

    705 if ( iMatchFound ) g_HostAPI [ iHostAPIFuncIndex ].fnFunc ( g_iCurrThread ); break; } The first task is reading the value of operand zero, which is an index into the script’s host API call table where the function name string can be found. This index is passed to GetHostAPICall () to retrieve the name of the function the instruction is trying to call. This string is read more..

  • Page - 747

    706 string to the console. For further illustrative purposes, the function will return a string value as well. Here’s the function: void HAPI_PrintString ( int iThreadIndex ) { char * pstrString = XS_GetParamAsString ( iThreadIndex, 0 ); int iCount = XS_GetParamAsInt ( iThreadIndex, 1 ); for ( int iCurrString = 0; iCurrString < iCount; ++ iCurrString ) printf ( "%s\n", read more..

  • Page - 748

    707 Reading Parameters Remember, even from the perspective of a C-defined function, the parameters passed from a script always reside on the thread’s runtime stack and are thus inaccessible as formally defined C parameters. For this reason, a number of functions exist to extract parameters and cast them to a specific data type. Remember also that although the XVM is typeless, read more..

  • Page - 749

    708 CoerceValueToInt () and returns it. This pattern is followed by the other two functions, but you can see for yourself by checking out the included XVM source code on the accompanying CD. Returning Values Returning values is almost criminally easy. Because the Value structure behind the _RetVal register is freely available in the thread’s corresponding Script structure, all you read more..

  • Page - 750

    709 { // Clear the parameters off the stack g_Scripts [ iThreadIndex ].Stack.iTopIndex -= iParamCount; // Put the return value and type in _RetVal Value ReturnValue; ReturnValue.iType = OP_TYPE_STRING; ReturnValue.pstrStringLiteral = pstrString; CopyValue ( & g_Scripts [ iThreadIndex ]._RetVal, ReturnValue ); } Instead of simply assigning the string pointer to _RetVal, it’s first encapsulated by read more..

  • Page - 751

    710 void XS_ReturnFromHost ( int iThreadIndex, int iParamCount ) { // Clear the parameters off the stack g_Scripts [ iThreadIndex ].Stack.iTopIndex -= iParamCount; } #define XS_Return( iThreadIndex, iParamCount ) \ { \ XS_ReturnFromHost ( read more..

  • Page - 752

    711 Remember, parameters are always pushed in the reverse order in which they’re read, so you push the count before the string in this case (because HAPI_PrintString () read the string first). CallHost then calls the function, and the XVM takes over from there. After the function returns, any return value it may have issued will be available in _RetVal. In this example, read more..

  • Page - 753

    712 Updating XASM The changes that must be made to XASM are minimal to say the least; it’s just a matter of writing the name string along with each function record that’s written to the .XSE’s function table. Here’s the code responsible for emitting the assembled function table in the assembler’s BuildXSE () function, with the new code in bold: // Write out the read more..

  • Page - 754

    713 // Write the entry point (4 bytes) fwrite ( & pFunc->iEntryPoint, sizeof ( int ), 1, pExecFile ); // Write the parameter count (1 byte) cParamCount = pFunc->iParamCount; fwrite ( & cParamCount, 1, 1, pExecFile ); // Write the local data size (four bytes) fwrite ( & pFunc->iLocalDataSize, sizeof ( int ), 1, pExecFile ); // Write the function name length (1 byte) char read more..

  • Page - 755

    714 I also like to refer to synchronous calls as invoking a script function, and asynchronous calls as calling a script function. For this reason, synchronous calls are made with the XS_InvokeScriptFunc () function: void XS_InvokeScriptFunc ( int iThreadIndex, char * pstrName ); Simple, huh? Pass it the thread index in which the function resides, as well as the function’s name, read more..

  • Page - 756

    715 // Push the stack frame + 1 (the extra space is // for the function index we'll put on the stack after it) PushFrame ( iThreadIndex, DestFunc.iLocalDataSize + 1 ); // Write the function index and old stack frame // to the top of the stack Value FuncIndex; FuncIndex.iFuncIndex = iIndex; FuncIndex.iOffsetIndex = iFrameIndex; SetStackValue ( iThreadIndex, g_Scripts [ iThreadIndex read more..

  • Page - 757

    716 With the function call logic of the XVM embodied in a more modular way, you can implement XS_InvokeScriptFunc () easily. Here’s the code: void XS_InvokeScriptFunc ( int iThreadIndex, char * pstrName ) { // Make sure the thread index is valid and active if ( ! IsThreadActive ( iThreadIndex ) ) return; // Get the function's index based on its name int iFuncIndex = read more..

  • Page - 758

    717 [ iFuncIndex ].pstrName ) == 0 ) return iFuncIndex; } // A match wasn't found, so return -1 return -1; } Nothing to it—just scan through the array until the specified function name matches something, and return the corresponding index. Return -1 if a match isn’t found. Passing Parameters Calling functions without parameters is a decent capability, and is more than useful read more..
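The name lookup described above is a plain linear search. Here is a minimal standalone sketch of it; FuncEntry and FindFuncIndexByName are hypothetical names, and the comparison here is case-sensitive, which may not match what the XVM does.

```c
#include <string.h>

typedef struct
{
    const char * pstrName;   /* function name as stored in the function table */
    int iEntryPoint;         /* index of the function's first instruction */
} FuncEntry;

/* Scans the table for a matching name; returns its index, or -1 if absent */
int FindFuncIndexByName ( const FuncEntry * pFuncs, int iFuncCount, const char * pstrName )
{
    for ( int iFuncIndex = 0; iFuncIndex < iFuncCount; ++ iFuncIndex )
    {
        if ( strcmp ( pstrName, pFuncs [ iFuncIndex ].pstrName ) == 0 )
            return iFuncIndex;
    }

    /* A match wasn't found */
    return -1;
}
```

A caller should always check for -1 before using the result as an array index.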

  • Page - 759

    718 Nothing tricky going on here. The parameter comes in, it’s stuffed into a Value structure called Param, and is pushed onto the stack, as shown in Figure 11.31. Done deal. Of course, like always, strings have to ruin the fun and require a bit of special attention: void XS_PassStringParam ( int iThreadIndex, char * pstrString ) { // Create a Value structure to read more..

  • Page - 760

    719 Fortunately, this really isn’t a problem. Synchronous calls aren’t meant to be used to calculate values or return information about the script; rather, they’re meant for long-term behavior and actions. For example, if an enemy’s AI was implemented in functions that corresponded to each of its major behavioral states, each of which contained an infinite loop that would run read more..

  • Page - 761

    720 halt the program until it’s finished, and optionally return a value. Asynchronous calls are good for making quick or immediate changes to the script, reading the value of a script variable wrapped in a “getter” function, or any other task that must be executed within the script, but immediately. Figure 11.32 demonstrates asynchronous calls. read more..

  • Page - 762

    721 ■ The function must return upon execution of the proper Ret instruction. When the asynchronously called function returns, control must be returned to the host application, not the script. On the surface the solution to this problem may seem as easy as halting execution of the script when the first Ret is encountered, but this won’t work if the function ends up read more..

  • Page - 763

    722 if ( iCurrTime > g_iCurrThreadActiveTime + g_Scripts [ g_iCurrThread ].iTimesliceDur || ! g_Scripts [ g_iCurrThread ].iIsRunning ) { // Loop until the next thread is found while ( TRUE ) { // Move to the next thread in the array ++ g_iCurrThread; // If we're past the end of the array, loop back around if ( g_iCurrThread >= MAX_THREAD_COUNT ) g_iCurrThread = 0; // If the read more..

  • Page - 764

    723 The Stack Base The next issue is a bit more subtle, but vitally important nonetheless. As I said, it’s important that the asynchronous function return control to the host as soon as it finishes executing, rather than returning control to the originally running part of the script. Like I also said, it’s tempting to simply solve this problem by creating a flag read more..

  • Page - 765

    724 You can implement the stack base marker as a simple Value type constant. In addition to OP_TYPE_INT and OP_TYPE_REG, you now have OP_TYPE_STACK_BASE_MARKER: #define OP_TYPE_STACK_BASE_MARKER 9 // Marks a stack base Creating the marker is a simple matter of setting the iType field of the function index’s Value structure at the top of the stack to this read more..

  • Page - 766

    725 The instruction works just like it always did, except that any function index element whose iType field has been modified to mark the base of the stack will cause the execution loop to terminate. This, in combination with the capability to run in a single-threaded mode, is almost everything you need to safely execute an asynchronous function call. An Infinite Time Slice read more..

  • Page - 767

    726 void XS_CallScriptFunc ( int iThreadIndex, char * pstrName ) { // Make sure the thread index is valid and active if ( ! IsThreadActive ( iThreadIndex ) ) return; // ---- Calling the function // Preserve the current state of the VM int iPrevThreadMode = g_iCurrThreadMode; int iPrevThread = g_iCurrThread; // Set the threading mode for single-threaded execution g_iCurrThreadMode = read more..

  • Page - 768

    727 // Restore the VM state g_iCurrThreadMode = iPrevThreadMode; g_iCurrThread = iPrevThread; } The function begins with the usual check to determine whether the specified thread index is valid and active. It then saves the current threading mode and thread index. This is done to restore the XVM to the exact state it was in before the call was made. As for exactly why the read more..

  • Page - 769

    728 marker is then set; the top stack element, containing the function index, is read from the stack and changed to OP_TYPE_STACK_BASE_MARKER. The modified function index is then written back to the stack in the same position, and XS_RunScripts () is called with an infinite time slice. This will execute the function in isolation until it returns, at which point the state of read more..

  • Page - 770

    729 For example, consider a scenario in a game involving four computer-controlled enemy characters and a floating power-up item. The enemies, power-up, and the level itself are all scripted separately, meaning there are currently six threads executing within the XVM. It’s most likely that the player is directly interacting with the enemies; he or she may be engaged in a read more..

  • Page - 771

    730 the enemies suddenly began to falter, however, the game play experience would be severely jarred. The time graph of Figure 11.37 shows how priority-based threading helps distribute the virtual CPU’s load more effectively. [Figure 11.37: Priority-based multithreading over time.] Because of this, it’s important that the scheduler recognize the read more..

  • Page - 772

    731 There are two ways to define a script’s priority. On the simplest level, each thread would simply request a specific time slice duration, expressed in milliseconds. For example, a medium-priority thread might ask for 20 milliseconds, whereas a high-priority thread would ask for 50. A lower-priority thread might be content with just 10. This approach can become tedious, however, if read more..

  • Page - 773

    732 Table 11.3 Updated 0.8 .XSE Main Header
        Code   Definition
        0      User-defined time slice duration (no priority rank)
        1      Low priority
        2      Medium priority
        3      High priority
    Table 11.4 Updated 0.8 .XSE Main Header
        Name        Size (in Bytes)   Description
        ID String   4                 Four-character string containing the .XSE ID, “XSE0”
        Version     2                 Version number; (first byte is major, read more..

  • Page - 774

    733 Updating XASM Of course, in order to generate this updated version of the 0.8 .XSE format, XASM will need a bit of an upgrade too. Specifically, it needs to produce executables using the new format, and interpret a new directive, SetPriority. The SetPriority directive accepts a single parameter: either an integer literal value corresponding to the desired time slice read more..

  • Page - 775

    734 Parsing the SetPriority Directive The SetPriority directive is parsed in a manner similar to SetStackSize, so the code should look rather familiar. Let’s take an initial look: case TOKEN_TYPE_SETPRIORITY: // SetPriority can only be found in the global scope, so make // sure you aren't in a function. if ( iIsFuncActive ) ExitOnCodeError ( ERROR_MSSG_LOCAL_SETPRIORITY ); // It can read more..

  • Page - 776

    735 else if ( stricmp ( g_Lexer.pstrCurrLexeme, PRIORITY_HIGH_KEYWORD ) == 0 ) g_ScriptHeader.iPriorityType = PRIORITY_HIGH; else ExitOnCodeError ( ERROR_MSSG_INVALID_PRIORITY ); break; // Anything else should cause an error default: ExitOnCodeError ( ERROR_MSSG_INVALID_PRIORITY ); } // Mark the presence of SetPriority for future encounters g_iIsSetPriorityFound = TRUE; break; When SetPriority is the read more..

  • Page - 777

    736 The Script Structure First up, the Script structure needs to be augmented with a field specifying the script’s time slice duration in milliseconds. Here’s the new structure with the added field in bold: typedef struct _Script // Encapsulates a full script { int iIsActive; // Is this script structure in use? // Header data int read more..

  • Page - 778

    737 #define THREAD_PRIORITY_DUR_LOW 20 // Low-priority thread time slice #define THREAD_PRIORITY_DUR_MED 40 // Medium-priority thread time slice #define THREAD_PRIORITY_DUR_HIGH 80 // High-priority thread time slice Loading Version 0.8 Scripts XS_LoadScript () is then updated to recognize the .XSE format modification. There is one twist though; as an added bonus, I thought it’d be read more..

  • Page - 779

    738 case XS_THREAD_PRIORITY_HIGH: g_Scripts [ iThreadIndex ].iTimesliceDur = THREAD_PRIORITY_DUR_HIGH; break; } The priority-type code is read in first. If it specifies a user-defined thread, that value is immediately stuffed into iTimesliceDur. Otherwise, a switch block is entered to assign the proper THREAD_PRIORITY_* constant. Either way, by the time all scripts are loaded, their read more..
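To make the mapping concrete, here is a small sketch that turns a priority code into a time slice duration. The code values follow Table 11.3 and the durations follow the THREAD_PRIORITY_DUR_* constants shown earlier; the PRIORITY_CODE_* names and the ResolveTimesliceDur () function itself are assumptions of the sketch, not part of the XVM.

```c
/* Priority codes as listed in Table 11.3 (names are hypothetical) */
#define PRIORITY_CODE_USER 0
#define PRIORITY_CODE_LOW  1
#define PRIORITY_CODE_MED  2
#define PRIORITY_CODE_HIGH 3

/* Time slice durations in milliseconds */
#define THREAD_PRIORITY_DUR_LOW  20
#define THREAD_PRIORITY_DUR_MED  40
#define THREAD_PRIORITY_DUR_HIGH 80

/* Returns the time slice duration for a script, or -1 for an unknown code.
   iUserDur is only consulted when the code says "user-defined". */
int ResolveTimesliceDur ( int iPriorityCode, int iUserDur )
{
    switch ( iPriorityCode )
    {
        case PRIORITY_CODE_USER: return iUserDur;
        case PRIORITY_CODE_LOW:  return THREAD_PRIORITY_DUR_LOW;
        case PRIORITY_CODE_MED:  return THREAD_PRIORITY_DUR_MED;
        case PRIORITY_CODE_HIGH: return THREAD_PRIORITY_DUR_HIGH;
        default:                 return -1;
    }
}
```

Resolving the duration once at load time means the scheduler only ever deals with milliseconds and never needs to know about ranks.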

  • Page - 780

    739 if ( g_Scripts [ g_iCurrThread ].iIsActive && g_Scripts [ g_iCurrThread ].iIsRunning ) break; } // Reset the time slice g_iCurrThreadActiveTime = iCurrTime; } } As you can see, the only major change is the fact that the former generic time slice duration constant has been replaced with the script’s own iTimesliceDur field. DEMONSTRATING THE FINAL XVM To wrap things up, read more..
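The scheduling logic can be boiled down to two small decisions: has the current time slice run out, and which thread goes next? The sketch below isolates both. ThreadSlot, IsTimesliceOver, and FindNextThread are hypothetical names, and one deliberate difference from a while ( TRUE ) search is that this version gives up and returns -1 when no thread is runnable instead of looping forever.

```c
typedef struct
{
    int iIsActive;    /* is a script loaded into this slot? */
    int iIsRunning;   /* is the script currently allowed to execute? */
} ThreadSlot;

/* A context switch is due when the slice has elapsed or the thread stopped */
int IsTimesliceOver ( int iCurrTime, int iActiveTime, int iTimesliceDur, int iIsRunning )
{
    return iCurrTime > iActiveTime + iTimesliceDur || ! iIsRunning;
}

/* Round-robin search starting just after iCurrThread and wrapping around.
   Returns the next runnable thread (possibly iCurrThread itself), or -1. */
int FindNextThread ( const ThreadSlot * pThreads, int iThreadCount, int iCurrThread )
{
    for ( int iStep = 1; iStep <= iThreadCount; ++ iStep )
    {
        int iCandidate = ( iCurrThread + iStep ) % iThreadCount;

        if ( pThreads [ iCandidate ].iIsActive && pThreads [ iCandidate ].iIsRunning )
            return iCandidate;
    }

    return -1;
}
```

Because each thread's own duration is passed to IsTimesliceOver (), a high-priority thread simply survives more iterations of the main loop before the search for its successor begins.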

  • Page - 781

    740 ; 8.28.2002 ; Author. ; Alex Varanese ; ---- Directives -------- SetStackSize 512 SetPriority Low ; ---- Functions ----- ; ---- Simple function for doing random stuff Func DoStuff { ; Print a string sequence on the host side Push 1 Push "The following string sequence was printed by the host app:" CallHost PrintString Push 4 Push read more..

  • Page - 782

    741 ; Return a value to the host Push 1 Push "Returning Pi to the host..." CallHost PrintString Mov _RetVal, 3.14159 } ; ---- Function to be invoked and run alongside a host application loop Func InvokeLoop { ; Print a string infinitely LoopStart: Push 1 Push "Looping..." CallHost PrintString Pause 200 Jmp read more..

  • Page - 783

    742 Defining the Host API The demo’s “host API” is really just one function, called HAPI_PrintString (), which will allow the script to print output to the console. Here’s its definition: void HAPI_PrintString ( int iThreadIndex ) { // Read in the parameters char * pstrString = XS_GetParamAsString ( iThreadIndex, 0 ); int iCount = XS_GetParamAsInt ( iThreadIndex, 1 ); // read more..

  • Page - 784

    743 It then declares integer variables to hold an error code and thread index, and calls XS_LoadScript () to load the assembled .XSE demo script: // Declare the thread indexes int iThreadIndex; // An error code int iErrorCode; // Load the demo script iErrorCode = XS_LoadScript ( "script.xse", iThreadIndex, XS_THREAD_PRIORITY_USER ); Multithreading won’t play a role in this demo, read more..

  • Page - 785

    744 printf ( ".\n" ); return 0; } else { // Print a success message printf ( "Script loaded successfully.\n" ); } printf ( "\n" ); To get things going, the HAPI_PrintString () function is registered with the XVM under the simpler name PrintString (), and the script is started to let the XVM know that its code is executable: // Start up the script read more..

  • Page - 786

    745 At this point, the script’s functionality has been demonstrated, so you can shut everything down with a simple call to XS_ShutDown (). // Free resources and perform general cleanup XS_ShutDown (); The Output What fun would all this be if you couldn’t see the output, huh? Upon running the host application demo, you’ll see this (of course, it’s more interesting to read more..

  • Page - 787

    746 Looping... Looping... Looping... Looping... Looping... Looping... Looping... Looping... Cool, huh? It may be simple, but this output represents a totally finished and fully integrated virtual machine. Game scripting ahoy! SUMMARY Whew! With priority-based multithreading and a feature-rich integration interface, your now-embeddable XVM has become quite a slick little piece of software. You read more..

  • Page - 788

    747 CHALLENGES ■ Intermediate: Implement a mutex and/or semaphore system to protect shared resources within the game engine, and create a set of host API functions for locking and unlocking them. ■ Intermediate: Implement a thread priority system in which all threads are given the same time slice, but are invoked more or less frequently depending on their rank. ■ Difficult: read more..

  • Page - 790

    Part Six Compiling High-Level Code read more..

  • Page - 792

    Compiler Theory Overview “I didn’t say it would be easy, Neo. I just said it would be the truth.” ——Morpheus, The Matrix CHAPTER 12 read more..

  • Page - 793

    752 At last. After working your way through page after page of prerequisite information and concepts, after enduring an 11-chapter build-up, and after completing two thirds of the XtremeScript system, you’re finally on the brink of what will undoubtedly be both the most complex and most rewarding aspect of designing a custom scripting language. Compiling high-level code is one read more..

  • Page - 794

    753 Just to make sure you’re up to speed on a few things, let’s review some of the terms and concepts I’ve attempted to drill into your head over the course of the chapters that have led up to now: ■ High-Level Languages, or HLLs, are languages that are designed to mimic human-readable languages like English for the purpose of clearly describing algorithms, expressions, read more..

  • Page - 795

    754 to be confused with passes, which I’ll cover separately). The first phase involved a basic processing of the incoming source code; whitespace was removed, comments were stripped away, and so on. The next phase was known as lexical analysis, in which the source code stream was broken into streams of tokens and lexemes. This stream was then fed into a parser, which was read more..

  • Page - 796

    755 simplistic components, the magic is demystified. If there’s one thing that all software engineers should understand, it’s that incredible complexity can be attained simply by combining the right pieces in the right way. This is exactly how the construction of a compiler is approached. Chapter 5 saw your first real introduction to the phases of a compiler. In Chapter 9, you even read more..

  • Page - 797

    756 and produces two forms of output; a stream of lexemes and a stream of tokens. The lexeme stream is rather similar to the original source code, except that each unique “word” or “component” has been isolated. The previous line would be returned from the lexer in this order: X = MyVar * 2 ; This is definitely an improvement, because it’s a lot easier to analyze each read more..

  • Page - 798

    757 The lexer allows you to think of the source code in much higher-level, abstracted terms. No longer is it necessary to hunt and peck your way through a raw chunk of character data; instead, you now have a simple but significant glimpse of what the source code means. It doesn’t take a PhD to understand that anything becomes simpler if you can isolate and group read more..

  • Page - 799

    758 Of course, the lexer doesn’t physically delete anything; but this is how it will perceive the string from now on. With the whitespace out of the way, the first character of the lexeme itself will be read and the lexeme extraction process will begin. It will start with the character M and work its way through yVar until the first whitespace character is encountered. read more..

  • Page - 800

    759 is in one of a finite number of states, and contains code that allows it to transition to other states based on certain circumstances. State machines are great because they’re written in such a generic manner that any number of tokens can be processed by a single character-processing loop. At each iteration of the loop, a new character is read from the input stream read more..

  • Page - 801

    760 Parsing The stream of tokens and lexemes generated by the lexer in the lexical analysis phase is fed directly to the parser for the parsing phase. Parsing is the process of analyzing incoming tokens and determining how they fit into the language. Parsing is quite possibly the most complex part of a basic compiler, and there are numerous ways to go about doing it. read more..

  • Page - 802

    761 As humans, we know upon first glance that we’re dealing with a while loop. We know this because the first word we saw on the line was the while keyword itself. If that keyword had been anything else, we wouldn’t have come to the conclusion that we were looking at such a loop. Beyond this, we know that the criterion of the loop is a Boolean expression. read more..

  • Page - 803

    762 it will hit the closing parenthesis, which is back within the jurisdiction of the while loop parsing logic. This is all very visual, so check out Figure 12.7. As you may be starting to suspect, a recursive descent parser has a separate parsing mechanism for every major language feature. For example, when a while token is read, the while loop parser is activated. When read more..

  • Page - 804

    763 So, recursive descent parsing is heavily defined by its repetitious use of nested, and often times, recursive function calls. For a specific example of recursion, consider the following expression: X = 8 * ( 16 + Y ) / Z; You have one overall expression, but there are definitely “sub-expressions” within it. 8 * ( 16 + Y ) / Z is a large expression, with smaller read more..
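To see the recursion in action, here is a minimal sketch of a recursive descent evaluator for arithmetic expressions. It is not the XtremeScript parser: it handles only integer literals, the four basic operators, and parentheses, it evaluates directly instead of generating code, and all of its names (EvalExpression, ParseExpr, ParseTerm, ParseFactor) are assumptions of the sketch. Error handling is deliberately thin.

```c
#include <ctype.h>

static const char * g_pstrCursor;   /* current read position in the source */

static int ParseExpr ( void );

static void SkipSpaces ( void )
{
    while ( isspace ( ( unsigned char ) * g_pstrCursor ) )
        ++ g_pstrCursor;
}

/* Factor: an integer literal or a parenthesized sub-expression */
static int ParseFactor ( void )
{
    int iValue = 0;

    SkipSpaces ();
    if ( * g_pstrCursor == '(' )
    {
        ++ g_pstrCursor;
        iValue = ParseExpr ();          /* the recursive step */
        SkipSpaces ();
        if ( * g_pstrCursor == ')' )
            ++ g_pstrCursor;
        return iValue;
    }

    while ( isdigit ( ( unsigned char ) * g_pstrCursor ) )
        iValue = iValue * 10 + ( * g_pstrCursor ++ - '0' );
    return iValue;
}

/* Term: factors joined by * or / */
static int ParseTerm ( void )
{
    int iValue = ParseFactor ();

    for ( ;; )
    {
        SkipSpaces ();
        char cOp = * g_pstrCursor;
        if ( cOp != '*' && cOp != '/' )
            return iValue;
        ++ g_pstrCursor;

        int iRight = ParseFactor ();
        if ( cOp == '*' )
            iValue *= iRight;
        else if ( iRight != 0 )         /* division by zero is skipped */
            iValue /= iRight;
    }
}

/* Expression: terms joined by + or - */
static int ParseExpr ( void )
{
    int iValue = ParseTerm ();

    for ( ;; )
    {
        SkipSpaces ();
        char cOp = * g_pstrCursor;
        if ( cOp != '+' && cOp != '-' )
            return iValue;
        ++ g_pstrCursor;

        int iRight = ParseTerm ();
        iValue = ( cOp == '+' ) ? iValue + iRight : iValue - iRight;
    }
}

int EvalExpression ( const char * pstrSource )
{
    g_pstrCursor = pstrSource;
    return ParseExpr ();
}
```

Notice that each level of operator precedence gets its own function, and that the parenthesized case in ParseFactor () calls back up to ParseExpr (); that call is where the "sub-expressions" get handled.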

  • Page - 805

    764 and bottom-up parsing is analogous to that of brute force and state machine based tokenization. Both brute force tokenization and top-down parsing require the compiler to be written in a specific, nearly hard-coded fashion that gets a very specific job done with no fuss. State machine tokenizers and bottom-up parsers, on the other hand, are based around simplistic and read more..

  • Page - 806

    765 I-Code The result of the parsing and semantic analysis phases is a version of the script’s source code, represented entirely in I-code. I-code stands for Intermediate Code, and is a clean and structured way to store the script internally without worrying about the details of the source language. I-code is very similar to assembly language or machine code, because it’s based read more..

  • Page - 807

    766 Single-Pass versus Multi-Pass Compilers As initially explained in Chapter 9, compilers and assemblers can be categorized based on the number of passes they make over the source code. A pass is defined as any complete scan of the entire source code, regardless of what information it’s used to collect. Single-pass compilers are capable of fully understanding and translating the read more..

  • Page - 808

    767 future passes will have this information readily available in full no matter where they are, thereby allowing Func0 () to call Func1 (). Once again, referring to an identifier before its declaration is called a forward reference, and is very important in the context of functions. Of course, especially in the case of particularly huge programs (which high-end compilers deal read more..

  • Page - 809

    768 Target Code Emission Implementing the last phase of a compiler requires a solid understanding of the target platform, because it revolves around the translation of I-code to executable machine code or assembly language. In either case, because the I-code is often such a simplified representation of the program, considerable work is involved with this conversion. The 80x86, for read more..

  • Page - 810

    769 Compiler Compilers Compilers are all over the place; they exist in huge numbers and have limitless applications in all sorts of language and data translation fields. Because of this, it was inevitable that someone would finally sit down and create a set of tools to help automate the process of creating a new compiler. These tools usually consist of programs that can read more..

  • Page - 811

    770 Because XASM is such a high-level assembler, with built-in support for variables, arrays, and even functions, it’d be silly not to leverage all that power. So, rather than directly produce a finished .XSE, the XtremeScript compiler instead generates an ASCII-based .XASM file containing XVM assembly code that will be automatically fed to XASM to get the final executable read more..

  • Page - 812

    771 Advanced Compiler Theory Topics You should understand how the basics work, at least conceptually, and what specific topics will apply most significantly to the XtremeScript compiler and how. But compiler theory is a huge subject—one that I couldn’t hope to do justice in the context of a book like this, so don’t forget that even at their most complex, the things read more..

  • Page - 813

    772 to notice the large-scale patterns and relationships that ultimately lead to the optimizations that you might notice at first glance alone. Of course, real compilers like Microsoft Visual C++ have been in a constant state of evolution, the brunt of which has been focused specifically on optimizing the code they generate. Scores of math-heavy algorithms and techniques have read more..

  • Page - 814

    773 Preprocessing Anyone who’s used a C compiler before will be familiar with the concept of preprocessing. A preprocessor is a special layer of software that sits between the source code and the lexical analyzer, adding an additional early phase to the compilation process. The preprocessor filters and transforms the incoming source code according to special directives written read more..

  • Page - 815

    774 file1.c void Func1 () { printf ( "This is function one." ); } file2.c #include "file0.c" #include "file1.c" void Func2 () { printf ( "This is function two." ); } main () { Func0 (); Func1 (); Func2 (); return 0; } Without the help of the preprocessor and its #include directive, file2.c would not compile. Even if the functions Func0 () and Func1 () read more..

  • Page - 816

    775 void Func1 () { printf ( "This is function one." ); } void Func2 () { printf ( "This is function two." ); } main () { Func0 (); Func1 (); Func2 (); return 0; } Check out Figure 12.15 to see a more visual take on this process. Because the #include lines were physically replaced with the contents of the files they specified, the compiler never knew they were read more..

  • Page - 817

    776 Macro Expansion Macros are another popular feature of the C preprocessor, and are a great way to define symbolic constants or encapsulate logic without using a function. Macros in C are defined with the #define directive, which simply replaces all instances of the macro’s name with its value. For instance, consider the following constants, each of which is defined with read more..

  • Page - 818

    777 The previous line of code associates char * with the name String, so you could declare a string-returning function like this: String MyFunc (); The preprocessor will automatically expand this out to the following before the compiler gets its hands on it: char * MyFunc (); Notice that here, the macro name was replaced with an entire string of text, containing spaces and read more..

  • Page - 819

    778 Retargeting You learned earlier that a compiler can be split into two distinct halves: the front end and the back end. The front end is in charge of turning the source language into I-code, whereas the back end is in charge of translating that I-code to a specific assembly language like XVM assembly. What you may notice here, however, is that the front and back read more..

  • Page - 820

    779 Retargeting has become a ubiquitous practice with the emergence of so many new platforms. Specifically in the case of console gaming, C and C++ compilers are needed for multitudes of hardware, ranging from the Game Boy Advance to the PlayStation 2 to the Xbox. Many of the compilers used to write code for these systems are simply retargeted versions of typical 80x86 read more..

  • Page - 821

    780 Imagine that the second thread contained three jumps, to addresses 22, 481, and 1906. Because these addresses are relative to a base address of zero, which the second thread doesn’t have, the real base address will need to be added to each jump target address so that the jumps will once again point to the proper instructions. The new jump targets will therefore be read more..
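The adjustment described above is nothing more than adding the base address to every jump target. A tiny sketch, with a hypothetical function name and with the base address left entirely up to the caller:

```c
/* Shifts each zero-relative jump target so it is relative to the real
   base address at which the code was actually loaded. */
void RelocateJumpTargets ( int * piTargets, int iTargetCount, int iBaseAddr )
{
    for ( int iCurrTarget = 0; iCurrTarget < iTargetCount; ++ iCurrTarget )
        piTargets [ iCurrTarget ] += iBaseAddr;
}
```

For instance, with an arbitrary base address of 1000 picked purely for illustration, the targets 22, 481, and 1906 would become 1022, 1481, and 2906.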

  • Page - 822

    781 plistic and custom-designed language like XtremeScript. The script editor for Quake 3, for example, is capable of producing both hardware machine code DLLs and virtual machine-compatible executable scripts. Targeting a hardware platform is hardly a trivial matter, however. The virtual machine in this book is designed with the utmost of simplicity and ease of use in mind; chief read more..

  • Page - 823

    782 SUMMARY If anything, this chapter has served as a much-deserved break after pressing through the workload of Chapters 9 through 11. Unfortunately, it’s more like the calm before the storm, because there won’t be a moment’s rest in the upcoming chapters. Now that you can talk the talk of compiler writers, it’s time to see whether you can handle the read more..

  • Page - 824

    Lexical Analysis “I’m a geneticist—I write code. A, G, T, P, in different combinations.” ——Burchenal, Red Planet CHAPTER 13 read more..

  • Page - 825

    784 After all the build-up and preparation, it’s time to really get your hands dirty by building the first major component of the XtremeScript compiler: the lexical analyzer. As you learned in Chapter 9, the lexer is one of the most pivotal phases of a compiler’s pipeline; despite its semi-trivial implementation, it provides one of the most straightforward and effective read more..

  • Page - 826

    785 THE BASICS You’ve already learned about the theory and concepts behind lexical analysis fairly thoroughly (in Chapters 9 and 12). The construction of the XASM assembler in Chapter 9 required a structured and robust lexical analyzer, so you should already have a reasonable grasp of what’s going on here. For the sake of completeness, however, and to make these chapters a read more..

  • Page - 827

    786 In one fell swoop, it’s isolated the statement’s major components and separated them so they can be parsed sequentially. It’s also done a bit of clean-up by discarding whitespace and converting everything to uppercase. If you can imagine reading a book one character at a time, perhaps by having a friend look at the pages and verbally tell you each character read more..

  • Page - 828

    787 Tokenization Of course, even with the character stream grouped into lexemes, there’s still a lot the compiler has to do in order to determine what each word means. At the very least, it’ll have to constantly perform string comparisons with strcmp () to determine the difference between 3.14159, IF, and +=. It’d be nice if the lexer would not only produce the lexemes, read more..
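The point of a token is that the string analysis happens exactly once, after which everything downstream can compare plain integers. Here is a rough sketch of that idea using the three lexemes mentioned above; the TokenType names and ClassifyLexeme () are assumptions of the sketch, and a real lexer would recognize far more than this.

```c
#include <ctype.h>
#include <string.h>

typedef enum
{
    TOKEN_TYPE_INT,
    TOKEN_TYPE_FLOAT,
    TOKEN_TYPE_IDENT,
    TOKEN_TYPE_KEYWORD_IF,
    TOKEN_TYPE_OP_ADD_ASSIGN,
    TOKEN_TYPE_INVALID
} TokenType;

/* Performs the string analysis once and boils the lexeme down to a token */
TokenType ClassifyLexeme ( const char * pstrLexeme )
{
    int iDigits = 0, iPoints = 0, iOthers = 0;

    /* Reserved words and operators are matched as whole strings */
    if ( strcmp ( pstrLexeme, "IF" ) == 0 ) return TOKEN_TYPE_KEYWORD_IF;
    if ( strcmp ( pstrLexeme, "+=" ) == 0 ) return TOKEN_TYPE_OP_ADD_ASSIGN;

    /* Count character classes to recognize numeric literals */
    for ( const char * p = pstrLexeme; * p; ++ p )
    {
        if ( isdigit ( ( unsigned char ) * p ) ) ++ iDigits;
        else if ( * p == '.' ) ++ iPoints;
        else ++ iOthers;
    }
    if ( iDigits > 0 && iOthers == 0 && iPoints == 0 ) return TOKEN_TYPE_INT;
    if ( iDigits > 0 && iOthers == 0 && iPoints == 1 ) return TOKEN_TYPE_FLOAT;

    /* Identifiers start with a letter or underscore */
    if ( ! isalpha ( ( unsigned char ) pstrLexeme [ 0 ] ) && pstrLexeme [ 0 ] != '_' )
        return TOKEN_TYPE_INVALID;
    for ( const char * p = pstrLexeme + 1; * p; ++ p )
        if ( ! isalnum ( ( unsigned char ) * p ) && * p != '_' )
            return TOKEN_TYPE_INVALID;
    return TOKEN_TYPE_IDENT;
}
```

Once a lexeme has been classified, a parser can switch on the token value instead of repeating these comparisons every time it needs to know what it is looking at.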

  • Page - 829

    788 framework (see Figure 13.3). Examples of such utilities are lex, a common UNIX and Linux utility, and Flex, a free reimplementation of lex that also runs on Win32. Lexer Generation Utilities I won’t be covering the use of programs like lex and Flex to generate lexers; they’re invaluable when creating real-world compilers, but they obviously don’t shed much light on a lexer’s inner workings. From the read more..

  • Page - 830

    789 Brute Force The lexer you built for the XASM assembler in Chapter 9 was what I like to call a brute-force lexer. It got the job done in a simple and straightforward manner by grouping every character in the stream up to the next delimiter or instance of whitespace into a lexeme, and then performed some basic string analysis to determine exactly what it was. The read more..

  • Page - 831

    790 out brute force approach, but still not completely there. For this reason, I call them “semi-state machine” lexical analyzers. The basic idea is to start by reading the first character from the next lexeme. Based on this initial character, a number of paths can be taken; if a digit or radix point is detected, a numeric token is probably being read. If a letter or read more..

  • Page - 832

    791 State Machines State machines work on a simple principle—perform a task only once at each iteration of a loop, but do it differently depending on the situation. State machines can be applied effectively to string processing, because strings have to be iteratively analyzed—in other words, they must be dealt with on a sequential character-by-character basis. During this read more..

  • Page - 833

    792 As the loop progresses, more and more characters are read in. Each time, their values are used to make a possible state transition. However, as long as digits are read in, the integer state is maintained. Furthermore, each newly read character is added to the end of an accumulating lexeme buffer. Finally, a character is read. This invokes another state transition—the read more..
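The loop described above can be sketched in a few lines of C. This is only a minimal illustration, not the book's GetNextToken (): the names LexNumber, ST_* and TOK_* are hypothetical, the source position is passed in as a parameter rather than kept in globals, and the whitespace that ends a lexeme is simply consumed instead of being retracted.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical state and token names, chosen for this sketch only. */
enum { ST_START, ST_INT, ST_FLOAT };
enum { TOK_END, TOK_INT, TOK_FLOAT, TOK_INVALID };

/* Reads one numeric lexeme from src starting at *pos, copies it into
   lexeme (capacity cap, which must be at least 1), advances *pos, and
   returns the token type. On TOK_INVALID the lexeme is unspecified. */
int LexNumber(const char *src, size_t *pos, char *lexeme, size_t cap)
{
    int state = ST_START;
    size_t len = 0;

    for (;;) {
        char c = src[*pos];
        int add = 1;    /* assume the character joins the lexeme */
        int done = 0;   /* set when the lexeme is complete */

        if (c == '\0')
            break;

        switch (state) {
        case ST_START:
            if (isspace((unsigned char)c)) add = 0;          /* skip leading whitespace */
            else if (isdigit((unsigned char)c)) state = ST_INT;
            else if (c == '.') state = ST_FLOAT;
            else return TOK_INVALID;
            break;

        case ST_INT:
            if (isdigit((unsigned char)c)) { /* stay in the integer state */ }
            else if (c == '.') state = ST_FLOAT;             /* radix point: now a float */
            else if (isspace((unsigned char)c)) { add = 0; done = 1; }
            else return TOK_INVALID;
            break;

        case ST_FLOAT:
            if (isdigit((unsigned char)c)) { /* stay in the float state */ }
            else if (isspace((unsigned char)c)) { add = 0; done = 1; }
            else return TOK_INVALID;
            break;
        }

        /* Append the character unless the state suppressed it */
        if (add && len + 1 < cap)
            lexeme[len++] = c;
        ++*pos;

        if (done)
            break;
    }

    lexeme[len] = '\0';

    /* The state the machine stopped in determines the token type */
    if (state == ST_INT) return TOK_INT;
    if (state == ST_FLOAT) return TOK_FLOAT;
    return TOK_END;
}
```

Calling LexNumber () repeatedly on the same string with the same position variable walks through the numbers one at a time; once only whitespace remains, the machine never leaves the start state and TOK_END comes back.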

  • Page - 834

    793 THE LEXER’S FRAMEWORK You’re going to begin by setting up a basic framework within which you can build the lexer. Specifically, you need a way to: ■ Read a text file from the hard drive, line by line. ■ Store the contents of the text file in a single, contiguous region of memory for easy processing. ■ Display the output of the lexer’s processing—both read more..

  • Page - 835

    794 printf ( "Usage:\tLEXER Source.TXT\n" ); return 0; } // Create a file pointer for the script FILE * pSourceFile; // Open the script and print an error if it's not found if ( ! ( pSourceFile = fopen ( argv [ 1 ], "rb" ) ) ) { printf ( "File I/O error.\n" ); return 0; } With the file open in binary mode, you can use the fseek () command to read more..

  • Page - 836

    795 char cCurrChar; for ( int iCurrCharIndex = 0; iCurrCharIndex < iSourceSize; ++ iCurrCharIndex ) { // Analyze the current character cCurrChar = fgetc ( pSourceFile ); if ( cCurrChar == 13 ) { // If a two-character line break is found, replace // it with a single newline fgetc ( pSourceFile ); -- iSourceSize; g_pstrSource [ iCurrCharIndex ] = '\n'; } else { // Otherwise use it read more..
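The same line-break clean-up can also be done after the file is already in memory. The following is a hedged sketch rather than the book's loader: NormalizeLineBreaks is a hypothetical helper, and unlike the loop above it only collapses a carriage return when a line feed actually follows it, so a lone character 13 is left untouched.

```c
#include <stddef.h>

/* Collapses each CR LF pair in buf (len bytes) into a single '\n',
   in place, and returns the new length. Hypothetical helper for
   illustration; a lone '\r' is copied through unchanged. */
size_t NormalizeLineBreaks(char *buf, size_t len)
{
    size_t in = 0;   /* next byte to read  */
    size_t out = 0;  /* next byte to write */

    while (in < len) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n') {
            buf[out++] = '\n';
            in += 2;             /* skip both bytes of the pair */
        } else {
            buf[out++] = buf[in++];
        }
    }
    return out;
}
```

Because the write index never overtakes the read index, the buffer can be rewritten in place without a second allocation. The caller is expected to use the returned length (and write its own terminator if it treats the buffer as a string).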

  • Page - 837

    796 // Tokenize the entire source file while ( TRUE ) { // Get the next token CurrToken = GetNextToken (); // Make sure the token stream hasn't ended if ( CurrToken == TOKEN_TYPE_END_OF_STREAM ) break; // Convert the token code to a descriptive string switch ( CurrToken ) { // Create a string to represent the token } // Print the token and the lexeme printf ( "%d: Token: read more..

  • Page - 838

    797 Error Handling Error handling won’t be a particularly huge concern of these small demos, but just to keep things clean, unexpected character input will be flagged with the following function: void ExitOnInvalidInputError ( char cInput ) { printf ( "Error: '%c' unexpected.\n", cInput ); exit ( 0 ); } Whenever the lexer reads something it doesn’t understand, it’ll use this read more..

  • Page - 839

    798 As you can see, it has an intentionally extreme amount of whitespace irregularity to make sure the lexer’s robustness is really put through its paces. There are few things in the world more irritating than a compiler whose acceptance of whitespace can’t be trusted; we should go to great lengths to ensure that using XtremeScript is just as easy and natural as read more..

  • Page - 840

    799 Seems like a pretty straightforward process, huh? Now that you have a conceptual overview of what the lexer will do, let’s jump into the code. The lexer is primarily implemented with the GetNextToken () function, which performs the previous steps and returns a Token value to the user, indicating the type of the lexeme it read. Just like in XASM, the lexeme is not read more..

  • Page - 841

    800 States and Token Types As the lexer executes, it will frequently transition from one state to the other to follow the format of the input. Rather than just refer to these states as arbitrary numbers, it helps to use symbolic constants to make everything easier to read. The same goes for token types—as you already saw in Chapter 9, tokens can be represented well read more..

  • Page - 842

    801 These indexes are global so that functions like this and others, as well as GetNextToken (), can access them easily. Here’s their declaration: int g_iCurrLexemeStart; int g_iCurrLexemeEnd; With the initialization out of the way, let’s get back to the lexer itself. First, however, let’s quickly cover the string buffer that will be filled with the lexeme by GetNextToken (). read more..

  • Page - 843

    802 // ---- Flag to determine when the lexeme is done int iLexemeDone = FALSE; // ---- Loop until a token is completed // Current character char cCurrChar; // Current position in the lexeme string buffer int iNextLexemeCharIndex = 0; // Should the current character be included in the lexeme? int iAddCurrChar; Once the lexeme indexes have been synchronized, iCurrLexState is set to read more..

  • Page - 844

    803 if ( cCurrChar == '\0' ) break; // Assume the character will be added to the lexeme iAddCurrChar = TRUE; Next, the current state is used to determine what should be done with the character. Naturally, to determine what the current state is, you use a switch block. The first state to consider is the start state, represented by the LEX_STATE_START constant. From this read more..

  • Page - 845

    804 // A float is starting else if ( cCurrChar == '.' ) { iCurrLexState = LEX_STATE_FLOAT; } // It's invalid else ExitOnInvalidInputError ( cCurrChar ); break; The first thing the LEX_STATE_START state handler does is look for whitespace. Remember, the beginning of the lexeme is the only place whitespace is valid (because what you call “trailing whitespace” is actually the leading read more..

  • Page - 846

    805 After the check for whitespace, the state handler looks for a numeric digit. No matter what the lexeme turns out to ultimately be (integer or float), the occurrence of a digit in the start state is always interpreted as an integer lexeme, so the LEX_STATE_INT state is transitioned to. Of course, certain floating-point values can still be detected here, if they begin read more..

  • Page - 847

    806 helps readability. If the character isn’t a digit, the handler determines whether it’s a radix point. This isn’t a valid integer character, but it indicates a state transition should be made to LEX_STATE_FLOAT. This should be a good indication of the elegance of the state machine approach—with only a few lines of code, you’ve got a lexer capable of seamlessly read more..

  • Page - 848

    807 { iCurrLexState = LEX_STATE_FLOAT; } // If whitespace is read, the lexeme is done else if ( IsCharWhitespace ( cCurrChar ) ) { iLexemeDone = TRUE; iAddCurrChar = FALSE; } // Anything else is invalid else ExitOnInvalidInputError ( cCurrChar ); break; This state is even simpler than the integer state. Any valid integer digit is added to the lexeme buffer, whitespace terminates the read more..

  • Page - 849

    808 All you’re doing here is appending the current character to the lexeme buffer, assuming the current state didn’t suppress it, and ending the loop if the lexeme has been flagged as complete. Once the loop ends, there’s a tiny bit of extra housekeeping to do as well: // Complete the lexeme string g_pstrCurrLexeme [ iNextLexemeCharIndex ] = '\0'; // Retract the read more..

  • Page - 850

    809 A Token variable is declared, and a switch is used to determine which state the lexer was in when it finished. It’s pretty self-explanatory. If it ended in LEX_STATE_INT, the token type is TOKEN_TYPE_INT. If it ended in LEX_STATE_FLOAT, the token type is TOKEN_TYPE_FLOAT. If anything else was returned, it must be a pure whitespace string (because if it wasn’t pure read more..

  • Page - 851

    810 // Float case TOKEN_TYPE_FLOAT: strcpy ( pstrToken, "Float" ); break; } // Print the token and the lexeme printf ( "%d: Token: %s, Lexeme: \"%s\"\n", iTokenCount, pstrToken, GetCurrLexeme () ); // Increment the token count ++ iTokenCount; } // Print the token count printf ( "\n" ); printf ( "\tToken count: %d\n", iTokenCount ); The token is used to fill read more..

  • Page - 852

    811 15: Token: Float, Lexeme: "1.0" 16: Token: Integer, Lexeme: "0" 17: Token: Integer, Lexeme: "02345" 18: Token: Integer, Lexeme: "63246" 19: Token: Float, Lexeme: "0.2346" 20: Token: Float, Lexeme: "34.0" Token count: 21 Cool, huh? Using state machines, you’ve lexed a highly free-form source file containing a number of different read more..

  • Page - 853

    812 AND ARRAY DO DOWNTO RECORD REPEAT Because each letter of each of these words is a different state, you can imagine how many transitions are represented here. Right off the bat, AND and ARRAY both start with A. So, when A is read, its state has to recognize transitions initiated by both N and R. DO and DOWNTO are even worse, because they share two initial letters; read more..

  • Page - 854

    813 That’s right, just one state needed. From start to finish, every character of an identifier is classified the same way (an alphanumeric digit or underscore), so state transitions aren’t necessary. Furthermore, because reserved words are treated as identifiers until after the lexing phase, they don’t need separate states. Next are the new tokens: #define TOKEN_TYPE_IDENT read more..

  • Page - 855

    814 rEtUrN TRUE false 22 .5 .35 2.0 while 1 0.0 var 1.0 var 0 This_is_an_identifier 02345 _so_is_this___ 63246 0.2346 34.0 Upgrading the Lexer Adding identifier and reserved word support to the lexer is actually quite simple. All that you really need to do is look for valid identifier characters in the start state, use them to transition to an identifier state, and keep reading read more..

  • Page - 856

    815 cChar == '_' ) return TRUE; else return FALSE; } Armed with this function, adding identifier support to the lexer will be a snap. The first thing to do is add a check for identifier characters to the start state: case LEX_STATE_START: // Just loop past whitespace, and don't add it to the lexeme if ( IsCharWhitespace ( cCurrChar ) ) { ++ g_iCurrLexemeStart; iAddCurrChar = read more..

  • Page - 857

    816 Observant readers may have noticed, however, that making a call to IsCharIdent () in the start state isn’t technically correct, because it accepts characters 0-9, even though identifiers can’t start with numbers. Fortunately, if you notice the order in which the start state evaluates the input character, it checks for digits first. This effectively weeds out any read more..

  • Page - 858

    817 Token TokenType; switch ( iCurrLexState ) { // Integer case LEX_STATE_INT: TokenType = TOKEN_TYPE_INT; break; // Float case LEX_STATE_FLOAT: TokenType = TOKEN_TYPE_FLOAT; break; // Identifier/Reserved Word case LEX_STATE_IDENT: // Set the token type to identifier in case none // of the reserved words match TokenType = TOKEN_TYPE_IDENT; // ---- Determine if the "identifier" is actually a read more..

  • Page - 859

    818 // break if ( stricmp ( g_pstrCurrLexeme, "break" ) == 0 ) TokenType = TOKEN_TYPE_RSRVD_BREAK; // continue if ( stricmp ( g_pstrCurrLexeme, "continue" ) == 0 ) TokenType = TOKEN_TYPE_RSRVD_CONTINUE; // for if ( stricmp ( g_pstrCurrLexeme, "for" ) == 0 ) TokenType = TOKEN_TYPE_RSRVD_FOR; // while if ( stricmp ( g_pstrCurrLexeme, "while" ) == 0 ) TokenType = read more..
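The chain of stricmp () calls above can also be expressed as a table lookup. What follows is a sketch under stated assumptions, not the book's code: the TOK_* values, the ReservedWord struct, and the function names are hypothetical, only three reserved words are listed, and because stricmp () is not part of standard C, a small portable case-insensitive comparison is written out by hand.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch only. */
enum { TOK_IDENT = 0, TOK_RSRVD_IF, TOK_RSRVD_WHILE, TOK_RSRVD_RETURN };

typedef struct {
    const char *word;   /* reserved word, stored in lowercase */
    int token;          /* token code it maps to */
} ReservedWord;

static const ReservedWord g_Reserved[] = {
    { "if",     TOK_RSRVD_IF },
    { "while",  TOK_RSRVD_WHILE },
    { "return", TOK_RSRVD_RETURN },
};

/* Returns 1 if a and b are equal, ignoring ASCII case. */
static int EqualsIgnoreCase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;    /* both strings must end at the same point */
}

/* Treats the lexeme as an identifier unless it matches a reserved word. */
int ClassifyIdent(const char *lexeme)
{
    for (size_t i = 0; i < sizeof g_Reserved / sizeof g_Reserved[0]; ++i)
        if (EqualsIgnoreCase(lexeme, g_Reserved[i].word))
            return g_Reserved[i].token;
    return TOK_IDENT;
}
```

With a table, adding a reserved word means adding one row instead of another if statement, and the default of "identifier unless proven otherwise" stays in exactly one place.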

  • Page - 860

    819 Completing the Demo To test the new lexer, let’s add some code to the main () function that prints out the lexer’s results. As you can see, the additions are similar to those made to the end of GetNextToken ()— mostly just comparisons to determine which reserved word was found: while ( TRUE ) { // Get the next token CurrToken = GetNextToken (); // Make sure read more..

  • Page - 861

    820 case TOKEN_TYPE_RSRVD_FALSE: strcpy ( pstrToken, "false" ); break; case TOKEN_TYPE_RSRVD_IF: strcpy ( pstrToken, "if" ); break; case TOKEN_TYPE_RSRVD_ELSE: strcpy ( pstrToken, "else" ); break; case TOKEN_TYPE_RSRVD_BREAK: strcpy ( pstrToken, "break" ); break; case TOKEN_TYPE_RSRVD_CONTINUE: strcpy ( pstrToken, "continue" ); break; case TOKEN_TYPE_RSRVD_FOR: strcpy ( pstrToken, read more..

  • Page - 862

    821 // Increment the token count ++ iTokenCount; } // Print the token count printf ( "\n" ); printf ( "\tToken count: %d\n", iTokenCount ); With this code in place, the source file listed previously will produce the following results: Lexical Analyzer Demo 0: Token: Integer, Lexeme: "293048" 1: Token: Integer, Lexeme: "24" 2: Token: Integer, Lexeme: read more..

  • Page - 863

    822 28: Token: Identifier, Lexeme: "_so_is_this___" 29: Token: Integer, Lexeme: "63246" 30: Token: Float, Lexeme: "0.2346" 31: Token: Float, Lexeme: "34.0" Token count: 32 How cool is that? It not only lexes the file, but also detects and prints the reserved word associated with each lexeme (if applicable). You’re closely approaching a complete lexer read more..

  • Page - 864

    823 Like reserved words, however, each delimiter gets its own token type. #define TOKEN_TYPE_DELIM_COMMA 16 #define TOKEN_TYPE_DELIM_OPEN_PAREN 17 #define TOKEN_TYPE_DELIM_CLOSE_PAREN 18 #define TOKEN_TYPE_DELIM_OPEN_BRACE 19 #define TOKEN_TYPE_DELIM_CLOSE_BRACE 20 #define TOKEN_TYPE_DELIM_OPEN_CURLY_BRACE 21 read more..

  • Page - 865

    824 #define MAX_DELIM_COUNT 24 char cDelims [ MAX_DELIM_COUNT ] = { ',', '(', ')', '[', ']', '{', '}', ';' }; IsCharDelim () can now scan through this array to determine whether the specified character is a delimiter: int IsCharDelim ( char cChar ) { // Loop through each delimiter in the array and compare // it to the specified character read more..
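As a point of comparison, the delimiter test can be written against a string of delimiter characters instead of a fixed-size array. This is an alternative sketch, not the book's IsCharDelim (); the name IsDelimChar is hypothetical. One detail worth noticing: strchr () also matches the terminating null character, so it has to be excluded explicitly.

```c
#include <string.h>

/* Returns nonzero if cChar is one of the eight delimiters listed in
   the array above. Hypothetical alternative for illustration. */
int IsDelimChar(char cChar)
{
    static const char delims[] = ",()[]{};";

    /* strchr() would report a match for '\0', so rule it out first */
    return cChar != '\0' && strchr(delims, cChar) != NULL;
}
```

The same caution applies to any array that is declared larger than its initializer list: the unused slots are zero-filled, so a scan over the full capacity would treat a null character as a delimiter.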

  • Page - 866

    825 // A float is starting else if ( cCurrChar == '.' ) { iCurrLexState = LEX_STATE_FLOAT; } // An identifier is starting else if ( IsCharIdent ( cCurrChar ) ) { iCurrLexState = LEX_STATE_IDENT; } // A delimiter has been read else if ( IsCharDelim ( cCurrChar ) ) { iCurrLexState = LEX_STATE_DELIM; } // It's invalid else ExitOnInvalidInputError ( cCurrChar ); This is easy enough, read more..

  • Page - 867

    826 case LEX_STATE_DELIM: // Determine which delimiter was found switch ( g_pstrCurrLexeme [ 0 ] ) { case ',': TokenType = TOKEN_TYPE_DELIM_COMMA; break; case '(': TokenType = TOKEN_TYPE_DELIM_OPEN_PAREN; break; case ')': TokenType = TOKEN_TYPE_DELIM_CLOSE_PAREN; break; case '[': TokenType = TOKEN_TYPE_DELIM_OPEN_BRACE; break; case ']': TokenType = TOKEN_TYPE_DELIM_CLOSE_BRACE; break; case '{': TokenType = read more..

  • Page - 868

    827 Lexing Strings Strings represent a subtle departure from the types of lexemes you’ve been handling in the lexer so far. Integers, floating-point values, identifiers, reserved words, and delimiters are all implemented with a single state—the state is entered in the start state, and continues onwards until the lexeme is done. The only exceptions to this rule are integers read more..

  • Page - 869

    828 #define LEX_STATE_STRING 8 #define LEX_STATE_STRING_ESCAPE 9 #define LEX_STATE_STRING_CLOSE_QUOTE 10 Remember, the opening quote isn’t represented by an explicit state. This is because once the quote is detected by the start state, it immediately transitions to LEX_STATE_STRING. Here’s the new token type strings will be read more..

  • Page - 870

    829 // A delimiter has been read else if ( IsCharDelim ( cCurrChar ) ) { iCurrLexState = LEX_STATE_DELIM; } // A string is starting, but don't add the // opening quote to the lexeme else if ( cCurrChar == '"' ) { iAddCurrChar = FALSE; iCurrLexState = LEX_STATE_STRING; } // It's invalid else ExitOnInvalidInputError ( cCurrChar ); break; Remember, you have to set iAddCurrChar to read more..

  • Page - 871

    830 { iAddCurrChar = FALSE; iCurrLexState = LEX_STATE_STRING_ESCAPE; } // Anything else gets added to the string break; The cool thing about lexing a string is that you literally don’t need to do anything—the way the state machine is set up, characters are added to the lexeme automatically, so by literally doing nothing, the string lexeme is populated. One character of interest, read more..

  • Page - 872

    831 You know something’s easy when the comment lines out-number the code. That’s right, all the escape sequence state does is transition back to the normal string state. Remember, all states automatically append the current character to the lexeme unless they explicitly request otherwise, so all you have to do is let the current character be written (which is the character read more..
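To see the string and escape states working together, here is a compact standalone sketch. It is an assumption-laden simplification rather than the book's handler: LexString and the ST_* names are hypothetical, the closing-quote state is used only as a "finished" marker, and an unterminated string is reported by returning 0.

```c
#include <stddef.h>

/* Hypothetical state names for this sketch only. */
enum { ST_START, ST_STRING, ST_ESCAPE, ST_CLOSE };

/* Lexes one double-quoted string starting at src[*pos]. The quotes and
   the escape backslashes are left out of the lexeme. Returns 1 on
   success, 0 if no string starts here or the closing quote is missing
   (the lexeme is unspecified in that case). cap must be at least 1. */
int LexString(const char *src, size_t *pos, char *lexeme, size_t cap)
{
    int state = ST_START;
    size_t len = 0;

    while (state != ST_CLOSE) {
        char c = src[*pos];
        int add = 1;    /* by default every character is appended */

        if (c == '\0')
            return 0;   /* input ended before the closing quote */

        switch (state) {
        case ST_START:
            if (c != '"')
                return 0;
            add = 0;                /* drop the opening quote */
            state = ST_STRING;
            break;

        case ST_STRING:
            if (c == '"') { add = 0; state = ST_CLOSE; }
            else if (c == '\\') { add = 0; state = ST_ESCAPE; }
            /* anything else is appended by doing nothing at all */
            break;

        case ST_ESCAPE:
            /* the escaped character is appended as-is */
            state = ST_STRING;
            break;
        }

        if (add && len + 1 < cap)
            lexeme[len++] = c;
        ++*pos;
    }

    lexeme[len] = '\0';
    return 1;
}
```

Notice that the escape state has no conditions at all: whatever follows the backslash, including a quote or another backslash, falls through to the append step, which is why escaped quotes never end the string.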

  • Page - 873

    832 You could also apply brute force to the whole situation and spend a good six hours hard-coding each of the states a set of 34 operators would require. The amount of permutations and transitions between them would boil down to an astronomical number of separate states, but it’d work. But you can do better than this. It sounds a bit strange to think of it this read more..

  • Page - 874

    833 These indexes are numbered 0-2: > is 0, > is 1, and = is 2. When lexing the following operators: > >> >>= The state transitions are sequential—the first character, >, transitions into the second, > . This then transitions into the third, =, and the process is complete. It’s not possible for the first > to transition to =; in other read more..

  • Page - 875

    834 Now back to the conclusions. First of all, each of the 12 single characters has a number of proper- ties. These properties can be used to determine how many states they’re capable of transitioning to, as well as what those states are. For example, the + character, if it’s the first character of the lexeme, is associated with three states. First, it can be its read more..

  • Page - 876

    835 Note that the Substates column doesn’t list full operators; rather, it lists the characters that can immediately follow to invoke the transition to the substate. The first + row says that its substates are + and =, meaning that if either of these characters is read after the +, it’ll invoke a transition to the ++ or += substates. Armed with this table, read more..

  • Page - 877

    836 such as < and +. Characters in the second group—index 1—represent both the final characters of double-character operators, like the = in +=, and the second character in triple-character operators, like the second < in <<=. Characters in the final group, index 2, only represent the final character of triple-character operators. Because there are no read more..

  • Page - 878

    837 typedef struct _OpState // Operator state { char cChar; // State character int iSubStateIndex; // Index into substate array where // sub states begin int iSubStateCount; // Number of substates int iIndex; // Operator index } OpState; First read more..

  • Page - 879

    838 // ---- Second operator characters OpState g_OpChars1 [ MAX_OP_STATE_COUNT ] = { { '=', 0, 0, 14 }, { '+', 0, 0, 15 }, // ++ { '=', 0, 0, 16 }, // -= { '-', 0, 0, 17 }, // -- { '=', 0, 0, 18 }, // *= { '=', 0, 0, 19 }, // /= { '=', 0, 0, 20 }, // %= { '=', 0, 0, 21 }, // ^= { '=', 0, read more..

  • Page - 880

    839 ■ The first character of the new lexeme is read in by the lexer, and it’s a <. Because you haven’t started lexing an operator yet, you’re still at character zero. You therefore look for < in the cChar element of each OpState structure in the g_OpChars0 [] array. It’s found, so you know an operator is beginning. You set the current character index of read more..
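The table walk described above can be tried out on a much smaller operator set. The sketch below reuses the field layout of OpState but is otherwise hypothetical: the tables g_Op0, g_Op1 and g_Op2 cover only +, >, +=, ++, >=, >> and >>=, the operator indexes 0 through 6 are made up for this example, and LexOperator is not a function from the book.

```c
#include <stddef.h>

typedef struct {
    char cChar;          /* state character */
    int iSubStateIndex;  /* where its substates begin in the next table */
    int iSubStateCount;  /* number of substates */
    int iIndex;          /* operator index */
} OpState;

/* First characters: + (0) and > (1) */
static const OpState g_Op0[] = {
    { '+', 0, 2, 0 },    /* substates: = and + */
    { '>', 2, 2, 1 },    /* substates: = and > */
};

/* Second characters */
static const OpState g_Op1[] = {
    { '=', 0, 0, 2 },    /* += */
    { '+', 0, 0, 3 },    /* ++ */
    { '=', 0, 0, 4 },    /* >= */
    { '>', 0, 1, 5 },    /* >> (substate: =) */
};

/* Third characters */
static const OpState g_Op2[] = {
    { '=', 0, 0, 6 },    /* >>= */
};

/* Consumes the longest operator starting at src[*pos] and returns its
   operator index, or -1 if no operator starts there. */
int LexOperator(const char *src, size_t *pos)
{
    static const OpState *const levels[3] = { g_Op0, g_Op1, g_Op2 };
    int start = 0;
    int count = (int)(sizeof g_Op0 / sizeof g_Op0[0]);
    int op = -1;

    for (int level = 0; level < 3 && count > 0; ++level) {
        const OpState *match = NULL;

        /* Only the substates of the previous character are candidates */
        for (int i = start; i < start + count; ++i) {
            if (levels[level][i].cChar == src[*pos]) {
                match = &levels[level][i];
                break;
            }
        }
        if (match == NULL)
            break;       /* the operator ended with the previous character */

        op = match->iIndex;
        start = match->iSubStateIndex;
        count = match->iSubStateCount;
        ++*pos;
    }
    return op;
}
```

Restricting each search to the range given by iSubStateIndex and iSubStateCount is what keeps + from ever transitioning to >: the > entry in the second table is simply outside the range that + points at.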

  • Page - 881

    840 New States and Tokens So, with a firm grasp on the logic behind the state transition tables and the code that will utilize them, let’s specify some new lexer states and tokens for GetNextToken () to work with. Here’s the new lexer state: #define LEX_STATE_OP 6 You need only one new state because all operators will be read more..

  • Page - 882

    841 #define OP_TYPE_ASSIGN_AND 22 // &= #define OP_TYPE_ASSIGN_OR 24 // |= #define OP_TYPE_ASSIGN_XOR 26 // #= #define OP_TYPE_ASSIGN_SHIFT_LEFT 33 // <<= #define OP_TYPE_ASSIGN_SHIFT_RIGHT read more..

  • Page - 883

    842 Here’s GetCurrOp (), which simply returns it: int GetCurrOp () { return g_iCurrOp; } With these variables in place, you can start writing the state handlers. Here are the additions that need to be made to the start state (in bold, as usual): case LEX_STATE_START: // Just loop past whitespace, and don't add it to the lexeme if ( IsCharWhitespace ( cCurrChar ) ) { ++ read more..

  • Page - 884

    843 // An operator is starting else if ( IsCharOpChar ( cCurrChar, 0 ) ) { // Get the index of the initial operand state iCurrOpStateIndex = GetOpStateIndex ( cCurrChar, 0, 0, 0 ); if ( iCurrOpStateIndex == -1 ) ExitOnInvalidInputError ( cCurrChar ); // Get the full state structure CurrOpState = GetOpState ( 0, iCurrOpStateIndex ); // Move to the next character in the operator read more..

  • Page - 885

    844 check, for which you pass cCurrChar, as well as the character index group to which the character may belong. For this, you pass zero. Here’s the code to IsCharOpChar (): int IsCharOpChar ( char cChar, int iCharIndex ) { // Loop through each state in the specified character // index and look for a match for ( int iCurrOpStateIndex = 0; iCurrOpStateIndex < read more..

  • Page - 886

    845 Once the start state knows an operator character from the first character index has been found, it knows an operator is starting. It then calls GetOpStateIndex () to find the index into the g_OpChars0 [] array where the character’s OpState structure resides (I’ll explain what each of those zeroed parameters following cCurrChar mean in a moment.) You technically know this read more..

  • Page - 887

    846 case 1: cOpChar = g_OpChars1 [ iCurrOpStateIndex ].cChar; break; case 2: cOpChar = g_OpChars2 [ iCurrOpStateIndex ].cChar; break; } // If the character is a match, return the index if ( cChar == cOpChar ) return iCurrOpStateIndex; } // Return -1 if no match is found return -1; } This function does almost the same thing IsCharOpChar () does, except it returns the specific index read more..

  • Page - 888

    847 // No, so save the substate information iStartStateIndex = iSubStateIndex; iEndStateIndex = iStartStateIndex + iSubStateCount; } You then call GetOpState () to use the index returned by GetOpStateIndex () to retrieve the actual OpState structure associated with the character read. You pass it zero, along with this index, to tell it to return the structure found at the specified read more..

  • Page - 889

    848 case LEX_STATE_OP: // If the current character within the operator // has no substates, we're done if ( CurrOpState.iSubStateCount == 0 ) { iAddCurrChar = FALSE; iLexemeDone = TRUE; break; } // Otherwise, find out if the new character is a possible substate if ( IsCharOpChar ( cCurrChar, iCurrOpCharIndex ) ) { // Get the index of the next substate iCurrOpStateIndex = read more..

  • Page - 890

    849 The first check this handler makes is for the possibility that the current operator state has no substates. In this case, no matter what the current character is, you know you’re done. Next, it compares the current character to the current operator state’s substates to determine whether the operator is being further developed. If so, you basically repeat the read more..

  • Page - 891

    850 case TOKEN_TYPE_DELIM_OPEN_BRACE: strcpy ( pstrToken, "Opening Brace" ); break; case TOKEN_TYPE_DELIM_CLOSE_BRACE: strcpy ( pstrToken, "Closing Brace" ); break; case TOKEN_TYPE_DELIM_OPEN_CURLY_BRACE: strcpy ( pstrToken, "Opening Curly Brace" ); break; case TOKEN_TYPE_DELIM_CLOSE_CURLY_BRACE: strcpy ( pstrToken, "Closing Curly Brace" ); break; case TOKEN_TYPE_DELIM_SEMICOLON: strcpy ( pstrToken, read more..

  • Page - 892

    851 3 ++ 2 ( 4 / 2 ) * 2 -22 .5 -.35 2.0 > >> >>= While "Hello, world!" 1 0.0 var 1.0 var 0 This_is_an_identifier 02345 _so_is_this___ if ( X < Y ) Z; 63246 -0.2346 34.0 When this file is passed through the final lexer, it produces the following results: Lexical Analyzer Demo 0: Token: Integer, Lexeme: "293048" 1: Token: Integer, Lexeme: "24" read more..

  • Page - 893

    852 11: Token: Opening Curly Brace, Lexeme: "{" 12: Token: Closing Curly Brace, Lexeme: "}" 13: Token: Opening Brace, Lexeme: "[" 14: Token: Identifier, Lexeme: "MyVar0" 15: Token: Comma, Lexeme: "," 16: Token: Identifier, Lexeme: "MyVar1" 17: Token: Comma, Lexeme: "," 18: Token: Identifier, Lexeme: "MyVar2" 19: Token: Closing read more..

  • Page - 894

    853 52: Token: Float, Lexeme: "0.0" 53: Token: var, Lexeme: "var" 54: Token: Float, Lexeme: "1.0" 55: Token: var, Lexeme: "var" 56: Token: Integer, Lexeme: "0" 57: Token: Identifier, Lexeme: "This_is_an_identifier" 58: Token: Integer, Lexeme: "02345" 59: Token: Identifier, Lexeme: "_so_is_this___" 60: Token: if, Lexeme: "if" read more..

  • Page - 895

    854 Here are the results: Lexical Analyzer Demo 0: Token: func, Lexeme: "func" 1: Token: Identifier, Lexeme: "MyFunc" 2: Token: Opening Parenthesis, Lexeme: "(" 3: Token: Identifier, Lexeme: "Param0" 4: Token: Comma, Lexeme: "," 5: Token: Identifier, Lexeme: "Param1" 6: Token: Comma, Lexeme: "," 7: Token: Identifier, Lexeme: read more..

  • Page - 896

    855 37: Token: Integer, Lexeme: "256" 38: Token: Semicolon, Lexeme: ";" 39: Token: Identifier, Lexeme: "MyFunc" 40: Token: Opening Parenthesis, Lexeme: "(" 41: Token: Identifier, Lexeme: "MyString" 42: Token: Comma, Lexeme: "," 43: Token: Float, Lexeme: "3.14159" 44: Token: Comma, Lexeme: "," 45: Token: Identifier, Lexeme: read more..

  • Page - 897

    856 Each of the lexer executables accepts a command-line argument to specify which file to lex. Go ahead and write your own source files to test out its robustness. CHALLENGES ■ Easy: Add some extra multi-character operators, and see whether you can properly insert them into the operator transition state tables. Remember to add them to the end of the tables so they don’t read more..

  • Page - 898

    Building the XtremeScript Compiler Framework “Telephone, computer, fax machine, fifty-two weekly paychecks and forty-eight flight coupons… we now had corporate sponsorship.” ——Jack, Fight Club CHAPTER 14 read more..

  • Page - 899

    858 With 13 chapters behind you, the moment has finally arrived. You’re now ready to dive headlong into the real inner-workings of the XtremeScript Compiler—the high-level, human interface to our nearly complete scripting system. Regardless of the reasonable complexity associated with both the virtual machine and assembler, no scripting system is really worth using without a read more..

  • Page - 900

    859 interfaces are generally a good thing, you don’t need to follow this rule too strictly. There might still be a handful of globals floating around, or other such “cheating,” but the final result will be more than clean enough for the purposes here. As I’ve demonstrated frequently throughout the book so far, compilers are generally built as two separate “ends,” read more..

  • Page - 901

    860 The Loader Module The loader module is responsible for initially loading the source code from an .XSS (XtremeScript Source) file into memory. Although this may seem like a trivial job at first, there are still some important details to consider. Storing the Source Code Unlike the simplified examples of the lexer built in the last chapter, the XtremeScript compiler will not read more..

  • Page - 902

    861 The Preprocessor Module Once the loader has populated the compiler’s internal source code linked list, you’re almost ready to pass things to the lexer and parser so the compilation process can begin. Before doing so, however, you have the opportunity to filter and convert the source code to a more convenient format via the preprocessor. By inserting a preprocessor read more..

  • Page - 903

    862 The Parser Module In addition to being the most complex aspect of the compiler, the parser also takes center stage among the various modules of the front end, and is its final phase. The parser is responsible for converting the stream of tokens and lexemes produced by the lexical analyzer into I-code, which is then converted to XVM assembly by the back end. The read more..

  • Page - 904

    863 The Back End The back end is responsible for converting the contents of the I-code module to XVM assembly and invoking the XASM assembler to create a ready-to-use .XSE executable from it. The Code Emitter Module The XtremeScript compiler doesn’t generate actual .XSE executables; rather, it generates an ASCII-formatted XVM assembly file and relies on the XASM assembler built read more..

  • Page - 905

    864 the following line of code is represented as a single node within the list, even though it contains multiple statements: X = 256; Y = 512; MyString = "Hello, world!"; Furthermore, single statements can often span multiple lines, such as the following: X = 256; This would be stored internally as three nodes. The Script Header Much like the XASM assembler and the XVM, read more..
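The idea of one node per physical line can be sketched with a small singly linked list. This is an illustration only, not the compiler's actual list implementation: LineNode, LineList, AppendLine and FreeLines are hypothetical names, and each node here stores nothing but a heap-allocated copy of the line's text.

```c
#include <stdlib.h>
#include <string.h>

typedef struct LineNode {
    char *text;              /* copy of one line of source code */
    struct LineNode *next;
} LineNode;

typedef struct {
    LineNode *head;
    LineNode *tail;
    int count;
} LineList;

/* Appends a copy of text to the end of the list. Returns 1 on
   success and 0 if memory could not be allocated. */
int AppendLine(LineList *list, const char *text)
{
    LineNode *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;

    node->text = malloc(strlen(text) + 1);
    if (node->text == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->text, text);
    node->next = NULL;

    /* Link the node in after the current tail, or make it the head */
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    ++list->count;
    return 1;
}

/* Frees every node and resets the list to empty. */
void FreeLines(LineList *list)
{
    LineNode *node = list->head;
    while (node != NULL) {
        LineNode *next = node->next;
        free(node->text);
        free(node);
        node = next;
    }
    list->head = list->tail = NULL;
    list->count = 0;
}
```

Appending "X", "=" and "256;" as three separate lines yields a list of three nodes, which mirrors how a single statement spread over three lines would be stored. Keeping a tail pointer makes each append constant-time, which matters when a script is loaded line by line.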

  • Page - 906

    865 The Function Table The function table is similar to the symbol table, but maintains a record of the script’s functions, rather than its variables and arrays. The function table stores each function’s name, parameter count, and other such information. Like the symbol table, it’s written to as functions are initially parsed, and read from as they’re called. One major read more..

  • Page - 907

    866 The String Table Like an XVM assembly script, a script written in XtremeScript is also likely to contain a number of string literals. In addition to converting the script’s statements and declarations to valid XVM assembly, the parser will also be responsible for collecting these strings and storing them in a table (as well as filter out any duplicates it may come read more..
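Filtering duplicates while collecting string literals amounts to "search first, append only if missing." The sketch below shows that idea with a fixed-size array; the StringTable layout, the size limits, and AddString are assumptions made for this example and are not the compiler's own table structure.

```c
#include <string.h>

#define MAX_STRINGS     64    /* hypothetical limits for this sketch */
#define MAX_STRING_LEN  128

typedef struct {
    char strings[MAX_STRINGS][MAX_STRING_LEN];
    int count;
} StringTable;

/* Returns the index of s in the table, adding it first if it isn't
   already there. Returns -1 if the table is full or s is too long. */
int AddString(StringTable *table, const char *s)
{
    /* Reuse the existing entry if this literal was seen before */
    for (int i = 0; i < table->count; ++i)
        if (strcmp(table->strings[i], s) == 0)
            return i;

    if (table->count >= MAX_STRINGS || strlen(s) >= MAX_STRING_LEN)
        return -1;

    strcpy(table->strings[table->count], s);
    return table->count++;
}
```

Because the function returns the index either way, the caller can replace every occurrence of a literal with its table index without caring whether the string was new or already present.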

  • Page - 908

    867 structure in this same way, the entire process of compiling a source file can be broken into a rather straightforward hierarchy of function calls. Although object-oriented programming is generally the best foundation for the interfaces and implementation of a large program’s structures, this compiler still certainly falls within the boundaries of what C is capable of. Because read more..

  • Page - 909

    868 example. It’s also important to mention that the XtremeScript compiler will work strictly in a single pass; rather than scanning through the file multiple times like XASM, it will work its way from top to bottom in a straight line. This brings with it some restrictions—for example, forward references of functions will be illegal. It will help you understand the read more..

  • Page - 910

    869 // Verify the filenames VerifyFilenames ( argc, argv ); // Initialize the compiler Init (); // Read in the command line parameters ReadCmmndLineParams ( argc, argv ); // ---- Begin the compilation process (front end) // Load the source file into memory LoadSourceFile (); // Preprocess the source file PreprocessSourceFile (); // ---- Compile the source code to I-code printf ( read more..

  • Page - 911

    870 Even without an understanding of the rest of the program, this should make reasonable sense. You start by printing the program’s “logo,” which is really just its title and version information. The number of command-line arguments is then checked; if it’s less than 2 (meaning only the name of the program was passed), the user hasn’t specified any action to be taken. In read more..

  • Page - 912

    871 printf ( "XtremeScript Compiler Version %d.%d\n", VERSION_MAJOR, VERSION_MINOR ); printf ( "Written by Alex Varanese\n" ); printf ( "\n" ); } After printing the logo, main () then prints the program’s usage info and exits if the user didn’t supply any command-line arguments: void PrintUsage () { printf ( "Usage:\tXSC Source.XSS [Output.XASM] [Options]\n" ); printf ( read more..

  • Page - 913

    872 Implementation According to the compiler’s main () function, the compiler calls a function called VerifyFilenames () to read the filenames from the command line, append file extensions if necessary, and store them for subsequent use by other modules. Regardless of how many filenames were initially specified by the user, VerifyFilenames () produces two separate strings and read more..

  • Page - 914

    873 // Check for the presence of the .XSE extension and add it if it's not // there if ( ! strstr ( g_pstrOutputFilename, OUTPUT_FILE_EXT ) ) { // The extension was not found, so add it to string strcat ( g_pstrOutputFilename, OUTPUT_FILE_EXT ); } } else { // No, so base it on the source filename // First locate the start of the extension, and use pointer subtraction // read more..

  • Page - 915

    874 This takes care of the first filename, so the second one is read next. This part is a twofold job; in addition to checking for the presence of the extension, it must be determined whether a second filename was specified. If not, the filename is based on the first. The second filename should be located at index 2 of the argv [] array. To find out if read more..
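Basing the second filename on the first can be sketched roughly as follows. This is only an illustration, not the book's VerifyFilenames () code: BuildOutputFilename () is a hypothetical helper, the value of OUTPUT_FILE_EXT is assumed from the usage string, and the extension is located with strrchr () rather than whatever the original uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed value, taken from the usage string "XSC Source.XSS [Output.XASM]" */
#define OUTPUT_FILE_EXT ".XASM"

/* Hypothetical helper: derive the output filename from the source filename */
void BuildOutputFilename ( const char * pstrSource, char * pstrOutput, size_t iBufSize )
{
    assert ( pstrSource != NULL && pstrOutput != NULL && iBufSize > 0 );

    /* Locate the start of the extension (the last '.'), if there is one */
    const char * pstrExt = strrchr ( pstrSource, '.' );

    /* Pointer subtraction gives the length of the name without its extension */
    size_t iBaseLength = pstrExt ? ( size_t ) ( pstrExt - pstrSource )
                                 : strlen ( pstrSource );

    /* Copy the base name and append the output extension, truncating if needed */
    snprintf ( pstrOutput, iBufSize, "%.*s%s",
               ( int ) iBaseLength, pstrSource, OUTPUT_FILE_EXT );
}
```

Note that a dot inside a directory name (with no extension on the file itself) would fool this sketch; a real implementation would want to look only at the final path component.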

  • Page - 916

    875 All command-line options must be preceded with a dash (-) to differentiate them from filenames. Each of them is optional, and although A and N are technically mutually exclusive, this isn’t enforced. Lastly, the options can appear in any order (unlike the filenames, which must always be either the first, or first and second in the list). As is shown in the read more..

  • Page - 917

    876 pstrCurrOption, pstrCurrValue, and pstrErrorMssg are just local copies of various strings read from the arrays as the loop executes. Once it’s determined that the current argument is a valid option, its actual data is extracted. This can be either a one- or two-step process, depending on whether the option accepts a value. Both the -S and -P options do, but -A and -N read more..

  • Page - 918

    877 // Make sure the value is valid if ( ! strlen ( pstrCurrValue ) ) { sprintf ( pstrErrorMssg, "Invalid value for -%s option", pstrCurrOption ); ExitOnError ( pstrErrorMssg ); } } This is handled in two loops. The first reads all characters until the end of the string or the first instance of the : character and adds them to the pstrCurrOption string. When this loop read more..
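The two-loop split described above might look something like this. It is a minimal sketch under assumptions: SplitOption () is a hypothetical name, the argument is assumed to have already had its leading dash removed, and the bounds checking is my addition.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits "S:1024" into the option "S" and the value
   "1024". Returns 1 if a ':' separator was found, 0 otherwise. */
int SplitOption ( const char * pstrArg, char * pstrOption, char * pstrValue,
                  size_t iBufSize )
{
    assert ( pstrArg != NULL && pstrOption != NULL && pstrValue != NULL );
    assert ( iBufSize > 0 );

    size_t iSrc = 0, iDst = 0;

    /* First loop: read up to the end of the string or the first ':' */
    while ( pstrArg [ iSrc ] != '\0' && pstrArg [ iSrc ] != ':' )
    {
        if ( iDst < iBufSize - 1 )
            pstrOption [ iDst ++ ] = pstrArg [ iSrc ];
        ++ iSrc;
    }
    pstrOption [ iDst ] = '\0';

    /* No separator means the option carries no value */
    pstrValue [ 0 ] = '\0';
    if ( pstrArg [ iSrc ] != ':' )
        return 0;

    /* Second loop: everything after the ':' is the value */
    ++ iSrc;
    iDst = 0;
    while ( pstrArg [ iSrc ] != '\0' )
    {
        if ( iDst < iBufSize - 1 )
            pstrValue [ iDst ++ ] = pstrArg [ iSrc ];
        ++ iSrc;
    }
    pstrValue [ iDst ] = '\0';

    return 1;
}
```

A caller would still need to reject an empty value for options that require one, as the strlen () check in the original snippet does.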

  • Page - 919

    878 // Set the priority else if ( stricmp ( pstrCurrOption, "P" ) == 0 ) { // ---- Determine what type of priority was specified // Low rank if ( stricmp ( pstrCurrValue, PRIORITY_LOW_KEYWORD ) == 0 ) { g_ScriptHeader.iPriorityType = PRIORITY_LOW; } // Medium rank else if ( stricmp ( pstrCurrValue, PRIORITY_MED_KEYWORD ) == 0 ) { g_ScriptHeader.iPriorityType = PRIORITY_MED; } // read more..

  • Page - 920

    879 #define PRIORITY_LOW_KEYWORD "Low" #define PRIORITY_MED_KEYWORD "Med" #define PRIORITY_HIGH_KEYWORD "High" If the option’s value doesn’t match any of the keywords, pstrCurrValue is unconditionally converted to an integer and assigned to g_ScriptHeader.iUserPriority. The iPriorityType field is then set to PRIORITY_USER to read more..
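The keyword-or-integer logic can be sketched as a small standalone function. Everything beyond the three keyword macros is an assumption made for illustration: the numeric values of the PRIORITY_* constants, the PrioritySetting structure, ParsePriority (), and the portable EqualsNoCase () helper standing in for the non-standard stricmp ().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define PRIORITY_LOW_KEYWORD  "Low"
#define PRIORITY_MED_KEYWORD  "Med"
#define PRIORITY_HIGH_KEYWORD "High"

/* Assumed numeric values; the real constants live in the compiler's headers */
enum { PRIORITY_USER = 0, PRIORITY_LOW = 1, PRIORITY_MED = 2, PRIORITY_HIGH = 3 };

/* Hypothetical stand-in for the two priority fields of the script header */
typedef struct _PrioritySetting
{
    int iPriorityType;
    int iUserPriority;
} PrioritySetting;

/* Portable case-insensitive comparison (replaces stricmp () in this sketch) */
static int EqualsNoCase ( const char * pstrA, const char * pstrB )
{
    while ( * pstrA && * pstrB )
    {
        if ( tolower ( ( unsigned char ) * pstrA ) !=
             tolower ( ( unsigned char ) * pstrB ) )
            return 0;
        ++ pstrA;
        ++ pstrB;
    }
    return * pstrA == * pstrB;
}

PrioritySetting ParsePriority ( const char * pstrValue )
{
    assert ( pstrValue != NULL );

    PrioritySetting Result = { PRIORITY_USER, 0 };

    if ( EqualsNoCase ( pstrValue, PRIORITY_LOW_KEYWORD ) )
        Result.iPriorityType = PRIORITY_LOW;
    else if ( EqualsNoCase ( pstrValue, PRIORITY_MED_KEYWORD ) )
        Result.iPriorityType = PRIORITY_MED;
    else if ( EqualsNoCase ( pstrValue, PRIORITY_HIGH_KEYWORD ) )
        Result.iPriorityType = PRIORITY_HIGH;
    else
        /* No keyword matched, so treat the value as a user-defined time slice */
        Result.iUserPriority = atoi ( pstrValue );

    return Result;
}
```

Because atoi () returns 0 for non-numeric text, a value such as "Fast" silently becomes a user priority of 0 here, which mirrors the "unconditional" conversion the text describes.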

  • Page - 921

    880 ELEMENTARY DATA STRUCTURES Before proceeding, I’d like to get one thing out of the way. This chapter, as well as the next, will make heavy use of both the stack and linked list data types. Although these are obviously both simple to understand and implement, I think it’s a good idea to briefly cover their specific implementation in the XtremeScript compiler, so read more..

  • Page - 922

    881 This simple structure consists of three fields. The two pointers, pHead and pTail, point to the head and tail nodes of the list. iNodeCount keeps track of how many nodes the list contains. Check out Figure 14.11. [Figure 14.11: The linked list structure.] The Interface The linked list interface is rather simple; it has a handful of functions for read more..

  • Page - 923

    882 Freeing Lists Initializing a list is easy, but freeing one is a bit more complex: void FreeLinkedList ( LinkedList * pList ) { // If the list is empty, exit if ( ! pList ) return; // If the list is not empty, free each node if ( pList->iNodeCount ) { // Create a pointer to hold each current node and the next node LinkedListNode * pCurrNode, * pNextNode; // Set read more..

  • Page - 924

    883 The function takes a single LinkedList structure pointer. The list is traversed with two node pointers, pCurrNode and pNextNode. pCurrNode is set to the head of the list, and the traversal begins. At each iteration of the loop, the pointer to the next node is saved in pNextNode. The current node’s data is then freed, as well as the structure representing the node read more..
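The two-pointer traversal is easier to follow in a compact, self-contained form. The following is a sketch rather than the book's listing: the field names match the ones mentioned in the text, but the structure layouts and the simplified AddNode () are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _LinkedListNode
{
    void * pData;                       /* Heap-allocated payload */
    struct _LinkedListNode * pNext;     /* Next node, or NULL at the tail */
} LinkedListNode;

typedef struct _LinkedList
{
    LinkedListNode * pHead, * pTail;
    int iNodeCount;
} LinkedList;

void InitLinkedList ( LinkedList * pList )
{
    pList->pHead = pList->pTail = NULL;
    pList->iNodeCount = 0;
}

/* Appends a node and returns its index */
int AddNode ( LinkedList * pList, void * pData )
{
    LinkedListNode * pNewNode =
        ( LinkedListNode * ) malloc ( sizeof ( LinkedListNode ) );
    assert ( pNewNode != NULL );

    pNewNode->pData = pData;
    pNewNode->pNext = NULL;

    if ( pList->pTail )
        pList->pTail->pNext = pNewNode;
    else
        pList->pHead = pNewNode;
    pList->pTail = pNewNode;

    return pList->iNodeCount ++;
}

void FreeLinkedList ( LinkedList * pList )
{
    if ( ! pList )
        return;

    LinkedListNode * pCurrNode = pList->pHead;
    while ( pCurrNode )
    {
        /* Save the next pointer BEFORE freeing the current node */
        LinkedListNode * pNextNode = pCurrNode->pNext;

        free ( pCurrNode->pData );
        free ( pCurrNode );

        pCurrNode = pNextNode;
    }

    /* Leave the list in a reusable, empty state */
    InitLinkedList ( pList );
}
```

Saving pNextNode first is the whole point of the second pointer: once the current node has been freed, reading its pNext field would be undefined behavior.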

  • Page - 925

    884 // Update the list's tail pointer pList->pTail = pNewNode; } // Increment the node count ++ pList->iNodeCount; // Return the new size of the linked list - 1, // which is the node's index return pList->iNodeCount - 1; } The first thing the function does is allocate space for the new node’s LinkedListNode structure. It then sets the node’s data pointer to the pData read more..

  • Page - 926

    885 else { // Otherwise, traverse the list until the specified node's previous // node is found LinkedListNode * pTravNode = pList->pHead; for ( int iCurrNode = 0; iCurrNode < pList->iNodeCount; ++ iCurrNode ) { // Determine if the current node's next node is the specified one if ( pTravNode->pNext == pNode ) { // Determine if the specified node is the tail if ( read more..

  • Page - 927

    886 The function accepts a linked list pointer, pList, and a node pointer, pNode. It starts by determining whether the node to be deleted is the head node. If so, it sets the base structure’s head pointer to the node just after the current head pointer. If the node to be deleted isn’t the head, it creates a new node pointer called pTravNode to traverse the read more..

  • Page - 928

    887 // ---- Add the new string, since it wasn't added // Create space on the heap for the specified string char * pstrStringNode = ( char * ) malloc ( strlen ( pstrString ) + 1 ); strcpy ( pstrStringNode, pstrString ); // Add the string to the list and return its index return AddNode ( pList, pstrStringNode ); } This function accepts two parameters—a linked list read more..

  • Page - 929

    888 This function accepts a linked list pointer, pList, as well as an integer index, iIndex, which it uses to find the desired string. It does so by iterating through each node in the list and comparing the current node counter, iCurrNode, to the specified index. If a match is found, the node’s pData pointer is cast to a string pointer and returned to the caller. read more..

  • Page - 930

    889 Initializing Stacks Because the initialization of a stack really just means the initialization of its underlying linked list, all this function boils down to is a call to InitLinkedList (): void InitStack ( Stack * pStack ) { // Initialize the stack's internal list InitLinkedList ( & pStack->ElmntList ); } Freeing Stacks The same goes for freeing a stack; all that’s read more..

  • Page - 931

    890 void Push ( Stack * pStack, void * pData ) { // Add a node to the end of the stack's internal list AddNode ( & pStack->ElmntList, pData ); } Popping Elements off a Stack The opposite of pushing, of course, is popping. Unlike traditional stacks, however, the stack will not return the data member it removes from the top of the stack; rather, it will simply delete read more..
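Here's a rough sketch of a stack whose Pop () discards the top element instead of returning it, so callers are expected to read the top first and pop second. The list is inlined to keep the example self-contained; Peek () and IsStackEmpty () are names I'm assuming, as is the choice to have Pop () free the element's data.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct _LinkedListNode
{
    void * pData;
    struct _LinkedListNode * pNext;
} LinkedListNode;

typedef struct _LinkedList
{
    LinkedListNode * pHead, * pTail;
    int iNodeCount;
} LinkedList;

typedef struct _Stack
{
    LinkedList ElmntList;               /* The tail of the list is the top */
} Stack;

void InitStack ( Stack * pStack )
{
    pStack->ElmntList.pHead = pStack->ElmntList.pTail = NULL;
    pStack->ElmntList.iNodeCount = 0;
}

int IsStackEmpty ( Stack * pStack )
{
    return pStack->ElmntList.iNodeCount == 0;
}

void Push ( Stack * pStack, void * pData )
{
    LinkedList * pList = & pStack->ElmntList;
    LinkedListNode * pNewNode =
        ( LinkedListNode * ) malloc ( sizeof ( LinkedListNode ) );
    assert ( pNewNode != NULL );

    pNewNode->pData = pData;
    pNewNode->pNext = NULL;

    if ( pList->pTail )
        pList->pTail->pNext = pNewNode;
    else
        pList->pHead = pNewNode;
    pList->pTail = pNewNode;
    ++ pList->iNodeCount;
}

/* Returns the top element's data without removing it (NULL if empty) */
void * Peek ( Stack * pStack )
{
    LinkedListNode * pTop = pStack->ElmntList.pTail;
    return pTop ? pTop->pData : NULL;
}

/* Removes the top element and frees its data; returns nothing */
void Pop ( Stack * pStack )
{
    LinkedList * pList = & pStack->ElmntList;
    LinkedListNode * pTop = pList->pTail;
    if ( ! pTop )
        return;

    if ( pList->pHead == pTop )
    {
        pList->pHead = pList->pTail = NULL;
    }
    else
    {
        /* Walk to the node just before the tail; this is O(n) */
        LinkedListNode * pPrev = pList->pHead;
        while ( pPrev->pNext != pTop )
            pPrev = pPrev->pNext;
        pPrev->pNext = NULL;
        pList->pTail = pPrev;
    }

    free ( pTop->pData );
    free ( pTop );
    -- pList->iNodeCount;
}
```

The linear walk in Pop () is the price of building a stack on a singly linked list with the top at the tail; for the small stacks a compiler front end typically needs, that trade-off is usually harmless.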

  • Page - 932

    891 LinkedList g_SourceCode; LinkedList g_FuncTable; LinkedList g_SymbolTable; LinkedList g_StringTable; The script header, however, is an instance of the ScriptHeader structure. Here’s its definition: typedef struct _ScriptHeader // Script header data { int iStackSize; // Requested stack size int iIsMainFuncPresent; // Is _Main () present? int read more..

  • Page - 933

    892 // Mark the assembly file for deletion g_iPreserveOutputFile = FALSE; // Generate an .XSE executable g_iGenerateXSE = TRUE; // Initialize the source code list InitLinkedList ( & g_SourceCode ); // Initialize the tables InitLinkedList ( & g_FuncTable ); InitLinkedList ( & g_SymbolTable ); InitLinkedList ( & g_StringTable ); } The function should be pretty clear. It starts by initializing read more..

  • Page - 934

    893 // Give allocated resources a chance to be freed ShutDown (); // Exit the program exit ( 0 ); } This decidedly trivial function simply allows the caller to run the compiler’s shutdown sequence and exit the program in a single call. THE COMPILER’S MODULES Because the compiler is a decidedly more complex project than the XASM assembler or the XVM, it’s broken into a read more..

  • Page - 935

    894 By breaking the project down like this, it’s simply a matter of knocking out each module, one by one, until they’re all finished. You’ve already seen some of this; much of xsc.cpp|h has been explained in the earlier sections (although the rest will be revisited), I just finished a thorough discussion of both linked_list.cpp|h and stack.cpp|h, and lexer.cpp|h will be read more..

  • Page - 936

    895 While you’re at it, you might as well get globals.h taken out too. As Table 14.2 mentions, this just contains some basic global data that everyone needs, which really just boils down to the TRUE and FALSE macros, as well as some useful #includes: #ifndef XSC_GLOBALS #define XSC_GLOBALS // ---- Include Files --------------------------------------- #include <stdlib.h> #include read more..

  • Page - 937

    896 void LoadSourceFile () { // ---- Open the input file FILE * pSourceFile; if ( ! ( pSourceFile = fopen ( g_pstrSourceFilename, "r" ) ) ) ExitOnError ( "Could not open source file for input" ); // ---- Load the source code // Loop through each line of code in the file while ( ! feof ( pSourceFile ) ) { // Allocate space for the next line char * pstrCurrLine read more..

  • Page - 938

    897 At this point, you’ve loaded a source file into memory and are ready to go. Let’s move on to see how the file will be transformed and converted as it passes through the compiler’s remaining modules. THE PREPROCESSOR MODULE The preprocessor is the source code’s first stop on its trip through the system. The preprocessor is implemented as a single function called read more..

  • Page - 939

    898 // Traverse the source code while ( TRUE ) { // Create local copy of the current line char * pstrCurrLine = ( char * ) pNode->pData; The iInBlockComment and iInString flags are there so the preprocessor can tell at all times whether it’s currently inside a string or block comment. You’ll see why the former of these two flags is important in a moment, but you read more..

  • Page - 940

    899 to this: ScreenX = X / Z; Of course, there’s still the whitespace in between the semicolon and the start of the former comment, but that obviously doesn’t matter. Here’s the code: // Check for a single-line comment, and terminate the rest // of the line if one is found if ( pstrCurrLine [ iCurrCharIndex ] == '/' && pstrCurrLine [ iCurrCharIndex + 1 ] read more..
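As a standalone illustration of the idea, the following sketch truncates a line at the first // that isn't inside a string literal. StripLineComment () is a hypothetical name, the line is simply cut with a null terminator, and escaped quotes inside strings are deliberately not handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cut the line at the first "//" outside a string */
void StripLineComment ( char * pstrLine )
{
    assert ( pstrLine != NULL );

    int iInString = 0;

    for ( size_t i = 0; pstrLine [ i ] != '\0'; ++ i )
    {
        if ( pstrLine [ i ] == '"' )
        {
            /* Toggle the flag on every quote (escapes are ignored here) */
            iInString = ! iInString;
        }
        else if ( ! iInString &&
                  pstrLine [ i ] == '/' && pstrLine [ i + 1 ] == '/' )
        {
            /* Terminate the line where the comment begins */
            pstrLine [ i ] = '\0';
            break;
        }
    }
}
```

The iInString check is what keeps a string such as "http://x" from being mistaken for the start of a comment. Reading pstrLine [ i + 1 ] is safe because the loop only runs while pstrLine [ i ] is not the terminator.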

  • Page - 941

    900 14. BUILDING THE XTREMESCRIPT COMPILER FRAMEWORK Figure 14.15 The preprocessor identifying and deleting block comments. TIP If you really want to physically remove comments, the algorithm is conceptually simple but might be a bit tricky to implement. The key is understanding that block comments can result in a number of different “line types”. The first line type is just read more..

  • Page - 942

    901 Here’s the code for replacing a block comment with whitespace: // Check for a block comment if ( pstrCurrLine [ iCurrCharIndex ] == '/' && pstrCurrLine [ iCurrCharIndex + 1 ] == '*' && ! iInString && ! iInBlockComment ) { iInBlockComment = TRUE; } // Check for the end of a block comment if ( pstrCurrLine [ iCurrCharIndex ] == '*' && pstrCurrLine read more..

  • Page - 943

    902 Preprocessor Directives The language specification from Chapter 7 included two preprocessor directives, #include and #define. #include replaces itself with the contents of the file it specifies, whereas #define defines a symbolic constant and assigns it a value. The preprocessor then scans over the entire source code and replaces all instances of the symbol’s name with the read more..

  • Page - 944

    903 Nested #include Directives The only caveat left is the issue of nested #include directives, wherein the file you’re including includes files of its own. Because this is a vital feature of file inclusion directives, it’s important to support this feature. I personally think the best way to solve this issue is to make the #include directive’s handler function recursive. read more..

  • Page - 945

    904 The implementation of #define is a bit more in-depth than #include. Let’s first review its syntax. Although C’s #define is capable of macros that span multiple lines and even accept parameters, this version of #define is relegated to symbolic constants that map a single-line string to an identifier, like these: #define MY_NAME "Alex" #define PI read more..

  • Page - 946

    905 The Symbol Table As the source code is parsed, perhaps the most obvious collection of data that needs to be organized, maintained, and tracked is the script’s variables and arrays (see Figure 14.17). As you can imagine, high-level programming wouldn’t get very far without them, so it’s a logical place to start. THE COMPILER’S TABLES Figure 14.17 The symbol table read more..

  • Page - 947

    906 As will be the case with most node structures, the first field is an integer index called iIndex. The reason you need an explicit field for this, as opposed to simply basing a node’s index on its physical position within the list, is to prepare for the possibility of the list’s order changing arbitrarily. If this were to happen for whatever reason, it would read more..

  • Page - 948

    907 #define SCOPE_GLOBAL 0 #define SYMBOL_TYPE_VAR 0 #define SYMBOL_TYPE_PARAM 1 The Interface The symbol table interface is more or less what you expect: it provides functions for adding a new symbol, retrieving symbols based on their indexes and identifiers, and so on. read more..

  • Page - 949

    908 The first thing the function does is call GetSymbolByIdent () to find out if the symbol already exists. I haven’t covered this function yet, so rest assured that it does just what it says: it returns a pointer to the symbol matching the specified identifier if one was found, and returns NULL otherwise. If this function returns a valid pointer, it means the symbol read more..

  • Page - 950

    909 same thing as GetSymbolByIdent (), except it returns the symbol corresponding to the specified index (obviously). Once the symbol has been read from the table, its identifier is compared to the specified one, as well as its scope. If a match is found, the structure is returned; otherwise, NULL is returned. Moving on, the next function is GetSymbolByIdent (), which does read more..

  • Page - 951

    910 that the specified identifier is indeed an array). In these cases, GetSizeByIdent () is called—pass it the variable’s identifier, and it returns its size: int GetSizeByIdent ( char * pstrIdent, int iScope ) { // Get the symbol's information SymbolNode * pSymbol = GetSymbolByIdent ( pstrIdent, iScope ); // Return its size return pSymbol->iSize; } Pretty simple, huh? With read more..

  • Page - 952

    911 The FuncNode Structure Just as symbols needed a separate structure to store each of their nodes, so do functions: typedef struct _FuncNode // A function table node { int iIndex; // Index char pstrName [ MAX_IDENT_SIZE ]; // Name int iIsHostAPI; // Is this a host read more..

  • Page - 953

    912 Adding Functions Let’s start at the beginning, with the predictably titled AddFunc (): int AddFunc ( char * pstrName, int iIsHostAPI ) { // If a function already exists with the specified name, // exit and return an invalid index if ( GetFuncByName ( pstrName ) ) return -1; // Create a new function node FuncNode * pNewFunc = ( FuncNode * ) malloc ( sizeof ( read more..

  • Page - 954

    913 The basic strategy here is just the same as it was in AddSymbol (): ■ Determine whether the function being added is already in the table. If so, return an invalid index (-1) to the caller (note I haven’t covered GetFuncByName () yet). ■ Allocate a FuncNode structure and initialize it based on the parameters passed. ■ Add the node to the table. ■ Return read more..

  • Page - 955

    914 // Return the function if the name matches if ( pCurrFunc && stricmp ( pCurrFunc->pstrName, pstrName ) == 0 ) return pCurrFunc; } // The function was not found, so return a NULL pointer return NULL; } Again, just as was the case with the symbol table interface, this function is making repeated calls to GetFuncByIndex () as it iterates through the function table. As read more..

  • Page - 956

    915 At this point, the function should be entirely self-explanatory. A local node pointer is used to traverse the list, node by node, until a match is found by comparing the specified index to each node’s iIndex field. In the event of a match, the pointer is returned; otherwise, NULL is returned when the end of the loop is reached. Updating a Function’s Parameter read more..

  • Page - 957

    916 INTEGRATING THE LEXICAL ANALYZER MODULE The last chapter saw you through the design and implementation of a lexical analyzer capable of lexing the entire XtremeScript language. Although the lexer you built was complete, the issue of integrating it smoothly with the compiler framework you’re building in this chapter is still significant. Rewinding the Token Stream In the last read more..

  • Page - 958

    917 Lexer States So how is the token stream rewound? The key to understanding how it works is to simply realize that at any given time, the lexer is in a particular “state” (not to be confused with the states of the state machine in GetNextToken ()). By “state”, I mean the lexeme stream contains a specific lexeme, the current token contains a specific token code, and read more..

  • Page - 959

    918 // ---- Operators int g_iCurrOp; // Current operator int g_iPrevOp; As you can see by the bold code, each variable has been duplicated and prefixed with Prev. Now, GetNextToken () can save each of the Curr versions to the Prev versions of the variable, like g_iCurrLexemeStart to g_iPrevLexemeStart, for example. Once this is done, the caller read more..
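To make the save-then-rewind idea concrete, here's a toy sketch built around a whitespace-only tokenizer. It isn't the XtremeScript lexer: the state is grouped into a small LexerState structure holding just two of the fields mentioned in the text, the source is a single string, and RewindTokenStream () and this simplified ResetLexer () are names assumed for the example.

```c
#include <assert.h>
#include <string.h>

/* A cut-down lexer state: just the current lexeme's boundaries */
typedef struct _LexerState
{
    int iCurrLexemeStart;
    int iCurrLexemeEnd;
} LexerState;

static const char * g_pstrSource = NULL;
static LexerState g_CurrLexerState;
static LexerState g_PrevLexerState;

void ResetLexer ( const char * pstrSource )
{
    g_pstrSource = pstrSource;
    g_CurrLexerState.iCurrLexemeStart = 0;
    g_CurrLexerState.iCurrLexemeEnd = 0;
    g_PrevLexerState = g_CurrLexerState;
}

/* Reads the next space-delimited lexeme; returns 0 at the end of the source */
int GetNextToken ( char * pstrLexeme, size_t iBufSize )
{
    assert ( g_pstrSource != NULL && pstrLexeme != NULL && iBufSize > 0 );

    /* Save the current state so the caller can rewind by one token */
    g_PrevLexerState = g_CurrLexerState;

    int iPos = g_CurrLexerState.iCurrLexemeEnd;
    while ( g_pstrSource [ iPos ] == ' ' )
        ++ iPos;
    g_CurrLexerState.iCurrLexemeStart = iPos;

    size_t iLength = 0;
    while ( g_pstrSource [ iPos ] != '\0' && g_pstrSource [ iPos ] != ' ' )
    {
        if ( iLength < iBufSize - 1 )
            pstrLexeme [ iLength ++ ] = g_pstrSource [ iPos ];
        ++ iPos;
    }
    pstrLexeme [ iLength ] = '\0';
    g_CurrLexerState.iCurrLexemeEnd = iPos;

    return iLength > 0;
}

/* Moves the stream back by exactly one token */
void RewindTokenStream ( void )
{
    g_CurrLexerState = g_PrevLexerState;
}
```

With only one saved copy, the stream can be rewound by a single token; calling RewindTokenStream () twice in a row has no further effect.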

  • Page - 960

    919 Now that you can instantiate lexer states at will, an important operation will be copying the contents of one state to another. This is facilitated with the CopyLexerState () function, which accepts two LexerState references and copies the source into the destination: void CopyLexerState ( LexerState & pDestState, LexerState & pSourceState ) { // Copy each field individually to read more..

  • Page - 961

    920 node represents a separate line from the original source file. Getting the lexer to work with this new format will be the next challenge to face. If you recall, the lexer in the last chapter relied heavily on a function called GetNextChar (). At any time, this function could be called to both read and return the next character from the source buffer, but would read more..

  • Page - 962

    921 int iCurrLexemeEnd; // Current lexeme's // ending index int iCurrOp; // Current operator } LexerState; Let’s take a look at the new version of GetNextChar (), capable now of reading the next character from the source buffer in linked list format: char GetNextChar () { // Make a local copy of the string read more..

  • Page - 963

    922 // No, so return a null terminator to alert the lexer that the end // of the source code has been reached return '\0'; } } // Return the character and increment the pointer return pstrCurrLine [ g_CurrLexerState.iCurrLexemeEnd ++ ]; } Simply to keep the code readable, the first thing the function does is make a local copy of the pCurrLine pointer in the g_CurrLexerState read more..

  • Page - 964

    923 within the linked list, writing a look-ahead function is no problem. Here’s the code to GetLookAheadChar (): char GetLookAheadChar () { // Save the current lexer state LexerState PrevLexerState; CopyLexerState ( PrevLexerState, g_CurrLexerState ); // Skip any whitespace that may exist and return the // first non-whitespace character char cCurrChar; while ( TRUE ) { cCurrChar = read more..

  • Page - 965

    924 still not the lexer’s place to terminate the program and display error messages. That task is handled by the error-handling functions defined in error.cpp|h, which you should be mindful of. Because the lexer is now but a single part in a much larger system, it should now simply return an error flag that signifies invalid tokens, allowing the caller (most likely read more..

  • Page - 966

    925 iCurrLexState = LEX_STATE_FLOAT; } // If whitespace or a delimiter is read, the lexeme is done else if ( IsCharWhitespace ( cCurrChar ) || IsCharDelim ( cCurrChar ) ) { iLexemeDone = TRUE; iAddCurrChar = FALSE; } // Anything else is invalid else iCurrLexState = LEX_STATE_UNKNOWN; break; As soon as a non-float character is read, the state transitions to unknown. Upon the next read more..

  • Page - 967

    926 operator. However, even though GetNextToken () returns it initially, it would be nice to be able to read the token again at any time. As you might imagine, this is an easy feature to add. All you need to do is expand the LexerState structure to track the current token, make sure GetNextToken () saves the token type there before returning, and add a new one-line read more..

  • Page - 968

    927 void CopyCurrLexeme ( char * pstrBuffer ) { strcpy ( pstrBuffer, g_CurrLexerState.pstrCurrLexeme ); } Error-Printing Helper Functions The error-handling functions discussed later in the chapter will require that the lexer expose a few key pieces of information to help make its messages more verbose and informative for the users. As with XASM, it’s helpful to print the actual line read more..

  • Page - 969

    928 Resetting the Lexer One last modification to the lexer worth mentioning is that InitLexer () is now known as ResetLexer (). It’s the same function, but because the compiler may need to reset the lexer multiple times during its lifespan, I feel the name change is appropriate for the new environment. THE PARSER MODULE The parser will be left blank for this chapter, read more..

  • Page - 970

    929 compiler will display the current line, print the line number, and use a caret symbol to point out the offending character/lexeme: void ExitOnCodeError ( char * pstrErrorMssg ) { // Print the message printf ( "Error: %s.\n\n", pstrErrorMssg ); printf ( "Line %d\n", GetCurrSourceLineIndex () ); // Reduce all of the source line's spaces to tabs so it takes less space // read more..

  • Page - 971

    930 // Print message indicating that the script could not be assembled printf ( "Could not compile %s.", g_pstrSourceFilename ); // Exit the program Exit (); } The function first prints the error message and the line number. It then statically allocates a local string buffer to hold the current line of code, the pointer to which it gets from GetCurrSourceLine (). Once it has read more..
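The caret line itself is simple to build once every character of the source line occupies exactly one column. The sketch below makes two assumptions: tabs are flattened to single spaces so the caret lines up (my reading of the intent, not a quote of the original code), and both TabsToSpaces () and BuildCaretLine () are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Replace each tab with one space so one character equals one column */
void TabsToSpaces ( char * pstrLine )
{
    assert ( pstrLine != NULL );

    for ( size_t i = 0; pstrLine [ i ] != '\0'; ++ i )
        if ( pstrLine [ i ] == '\t' )
            pstrLine [ i ] = ' ';
}

/* Fill the buffer with iColumn spaces followed by a caret */
void BuildCaretLine ( char * pstrBuffer, size_t iBufSize, size_t iColumn )
{
    assert ( pstrBuffer != NULL && iBufSize >= 2 );

    /* Clamp the column so the caret and terminator always fit */
    if ( iColumn > iBufSize - 2 )
        iColumn = iBufSize - 2;

    memset ( pstrBuffer, ' ', iColumn );
    pstrBuffer [ iColumn ] = '^';
    pstrBuffer [ iColumn + 1 ] = '\0';
}
```

Printing the flattened source line and then the caret line directly beneath it puts the ^ under the character at index iColumn, which in the compiler would come from the lexer's record of where the current lexeme starts.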

  • Page - 972

    931 However, it’s an interesting topic and one that I’ll discuss briefly. To implement a cascading error system, the parser needs to be able to resynchronize itself after detecting an error. On a basic level, this means finding the next valid token with which it can pick itself up, dust itself off, and resume a normal parsing process. For example, imagine the following read more..

  • Page - 973

    932 The parser will read the next line without a problem, because it’s perfectly valid. The line after that, however, in which Square () is passed two parameters, presents another error. And because the parser is still active, it will catch it as well as the first one. The end result is two errors printed where a more simplistic error mechanism would print only one. Of read more..

  • Page - 974

    933 structures that are similar to the source language. Low-level I-code implementations, like lists of pseudo-instructions, are closer to the back end and resemble Assembly far more than they would C++ or Pascal. To keep things simple but still useful, I’ve chosen to base XtremeScript’s I-code module on the latter of the two options. The intermediate code generated by the read more..

  • Page - 975

    934 For example, the 80x86 has a multiplication operator that differs strongly from the XVM’s Mul operator (even though they share the same name). Here’s an example of multiplying two vari- ables, X and Y, and storing the result in X: MOV EAX, X MUL Y MOV X, EAX The first thing to remember is that the 80x86 platform has a number of hardware read more..

  • Page - 976

    935 The XtremeScript I-Code Instruction Set The funny thing, however, is that the XVM is already designed around an intentionally simplistic and easy-to-use instruction set. Although I’m sure it’s possible to find ways to make it even simpler (within reason), I designed it intentionally from day one to iron out the difficulties associated with many native hardware assembly read more..

  • Page - 977

    936 Instructions Each node of this list will represent a single instruction, complete with an opcode and operands. To keep things as simple as possible, these opcodes will map directly to XVM assembly opcodes, so you can copy and paste the list of instruction constants directly from XASM: #define INSTR_MOV 0 #define INSTR_ADD read more..

  • Page - 978

    937 #define INSTR_CALL 28 #define INSTR_RET 29 #define INSTR_CALLHOST 30 #define INSTR_PAUSE 31 #define INSTR_EXIT 32 This takes care of the I-code instructions, but you of course need operands as well. Like the instructions, you can copy these read more..

  • Page - 979

    938 member will contain it at all times. The operand list still needs an operand structure to embody each of its nodes, however. For this, you need the Op structure: typedef struct _Op // An I-code operand { int iType; // Type union // The value { int read more..

  • Page - 980

    939 drawbacks. For example, it is possible that at some point, you’ll need to insert an instruction arbitrarily into the stream. If, by chance, this insertion must take place in between the jump target and the instruction to which that target is bound, you’re hosed: there’s no way to separate the target from its instruction, because they both occupy the same structure. read more..

  • Page - 981

    940 { ICodeInstr Instr; // The I-code instruction int iJumpTargetIndex; // The jump target index }; } ICodeNode; Now, the ICodeStream linked list found in the FuncNode structure discussed earlier can be filled with ICodeNode structures. Each node is capable of functioning as either a jump target or instruction, allowing for a complete read more..
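A node that can be either an instruction or a jump target is a classic tagged union: iType says which member of the union is currently meaningful. The sketch below is a simplified illustration; the ICODE_NODE_* constants, the one-field ICodeInstr, and the helper functions are assumptions, and the anonymous union requires C11 (or C++).

```c
#include <assert.h>
#include <stdio.h>

/* Assumed node type tags */
#define ICODE_NODE_INSTR        0
#define ICODE_NODE_JUMP_TARGET  1

#define INSTR_MOV               0

/* Simplified: the real instruction structure also carries an operand list */
typedef struct _ICodeInstr
{
    int iOpcode;
} ICodeInstr;

typedef struct _ICodeNode
{
    int iType;                      /* Which union member is valid */
    union
    {
        ICodeInstr Instr;           /* Valid when iType is ICODE_NODE_INSTR */
        int iJumpTargetIndex;       /* Valid when iType is ICODE_NODE_JUMP_TARGET */
    };
} ICodeNode;

ICodeNode MakeInstrNode ( int iOpcode )
{
    ICodeNode Node;
    Node.iType = ICODE_NODE_INSTR;
    Node.Instr.iOpcode = iOpcode;
    return Node;
}

ICodeNode MakeJumpTargetNode ( int iTargetIndex )
{
    ICodeNode Node;
    Node.iType = ICODE_NODE_JUMP_TARGET;
    Node.iJumpTargetIndex = iTargetIndex;
    return Node;
}

/* Writes a short description of the node; always check iType first */
void DescribeNode ( const ICodeNode * pNode, char * pstrBuffer, size_t iBufSize )
{
    assert ( pNode != NULL && pstrBuffer != NULL && iBufSize > 0 );

    switch ( pNode->iType )
    {
        case ICODE_NODE_INSTR:
            snprintf ( pstrBuffer, iBufSize, "instr %d", pNode->Instr.iOpcode );
            break;

        case ICODE_NODE_JUMP_TARGET:
            snprintf ( pstrBuffer, iBufSize, "target %d", pNode->iJumpTargetIndex );
            break;

        default:
            snprintf ( pstrBuffer, iBufSize, "unknown" );
            break;
    }
}
```

Because a jump target is now its own node in the stream, inserting an instruction between a target and the instruction that follows it is just an ordinary list insertion.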

  • Page - 982

    941 return 0; } With source code annotation turned on, the Microsoft VC++ disassembler produces the following: ; 5 : Y = 4; mov DWORD PTR _Y$[ebp], 4 ; 6 : X = Y * 8; mov eax, DWORD PTR _Y$[ebp] shl eax, 3 mov DWORD PTR _X$[ebp], eax ; 7 : Y = X / 2; mov eax, DWORD PTR _X$[ebp] cdq sub read more..

  • Page - 983

    942 typedef struct _ICodeNode // An I-code node { int iType; // The node type union { ICodeInstr Instr; // The I-code instruction char * pstrSourceLine; // The source line with // which this instruction // is annotated int iJumpTargetIndex; // The jump target index read more..

  • Page - 984

    943 Adding Instructions The first and most basic I-code module operation is the addition of an instruction. Remember, all I-code must exist within the scope of a specific FuncNode structure in the function table. In other words, code only exists within functions. Because of this, a function for adding an I-code instruction needs both the opcode to add, as well as a function read more..

  • Page - 985

    944 The index returned by AddICodeInstr () is actually of significant importance; because operands will be added to the instruction in subsequent function calls, the caller needs to be able to specify which node index in the stream the operands should be added to. Adding Operands Speaking of which, adding operands to preexisting instructions in an I-code stream is the focus of read more..

  • Page - 986

    945 be nice to simply pass the function an integer value. If you’re adding a symbol table index operand, it would be easier if you could just pass the index itself. To do this, you can create a number of helper functions that will wrap AddICodeOp () to make the addition of specific operand values easier. Let’s start with AddIntICodeOp (), which adds integer values read more..

  • Page - 987

    946 // Traverse the list until the matching index is found for ( int iCurrNode = 0; iCurrNode < pInstr->Instr.OpList.iNodeCount; ++ iCurrNode ) { // If the index matches, return the operand if ( iOpIndex == iCurrNode ) return ( Op * ) pCurrNode->pData; // Otherwise move to the next node pCurrNode = pCurrNode->pNext; } // The operand was not found, so return a NULL read more..

  • Page - 988

    947 // Set the jump target pSourceLineNode->iJumpTargetIndex = iTargetIndex; // Add the instruction node to the list and get the index AddNode ( & pFunc->ICodeStream, pSourceLineNode ); } Predictably, this function accepts a function index and a jump target index. It calls GetFuncByIndex () to retrieve a pointer to the function node, and then allocates space for the new I-code node. read more..

  • Page - 989

    948 void AddICodeSourceLine ( int iFuncIndex, char * pstrSourceLine ) { // Get the function to which the source line should be added FuncNode * pFunc = GetFuncByIndex ( iFuncIndex ); // Create an I-code node structure to hold the line ICodeNode * pSourceLineNode = ( ICodeNode * ) malloc ( sizeof ( ICodeNode ) ); // Set the node type to source line pSourceLineNode->iType = read more..

  • Page - 990

    949 // Create a pointer to traverse the list LinkedListNode * pCurrNode = pFunc->ICodeStream.pHead; // Traverse the list until the matching index is found for ( int iCurrNode = 0; iCurrNode < pFunc->ICodeStream.iNodeCount; ++ iCurrNode ) { // If the implicit index matches, return the instruction if ( iInstrIndex == iCurrNode ) return ( ICodeNode * ) pCurrNode->pData; // Otherwise read more..

  • Page - 991

    950 emitted as variable identifiers, entries in the function table are emitted as formal XASM function declarations, and so on. Although you could just emit a bare-bones, completely unformatted chunk of borderline unreadable text, there are a number of reasons to expend some extra effort formatting the generated assembly file for both general aesthetics and readability: ■ You read more..

  • Page - 992

    951 ; ---- Functions ---------------------------------- ; Non-_Main () function declarations go here ; ---- Main --------------------------------------- ; _Main ()'s function declaration goes here, if present As you can see, this is designed to mimic the formatting style I’ve been using throughout the book. Each segment of the file is partitioned in a very visual, verbose manner that read more..

  • Page - 993

    952 "Push", "Pop", "Call", "Ret", "CallHost", "Pause", "Exit" }; Last is a single constant that is used to track the width of tab stops: #define TAB_STOP_WIDTH 8 This will come in handy when aligning the columns of instructions and their operands in the outputted code. Emitting the read more..

  • Page - 994

    953 g_pstrSourceFilename, and the version of the compiler, found in VERSION_MAJOR and VERSION_MINOR (defined in xsc.h): #define VERSION_MAJOR 0 #define VERSION_MINOR 8 The final line of the header is the timestamp calculated earlier. To convert the contents of the structure pointed to by pCurrTime to something that can read more..

  • Page - 995

    954 // Medium rank case PRIORITY_MED: fprintf ( g_pOutputFile, PRIORITY_MED_KEYWORD ); break; // High rank case PRIORITY_HIGH: fprintf ( g_pOutputFile, PRIORITY_HIGH_KEYWORD ); break; // User-defined time slice case PRIORITY_USER: fprintf ( g_pOutputFile, "%d", g_ScriptHeader.iUserPriority ); break; } fprintf ( g_pOutputFile, "\n" ); iAddNewline = TRUE; } // If necessary, insert an extra line read more..

  • Page - 996

    955 The next directive is SetPriority, whose value is represented within the script header by two separate fields. Before doing anything, the function determines whether the script header’s iPriorityType field is PRIORITY_TYPE_NONE. If so, it’s taken as a sign that the user never entered a priority. Otherwise, it’s assumed to be the type of priority requested. fprintf () is read more..

  • Page - 997

    956 EmitScopeSymbols () and can be used to emit both the global declarations at the top of the script, and the local declarations within each function: void EmitScopeSymbols ( int iScope, int iType ) { // If declarations were emitted, this is set to TRUE so we remember to // insert extra line breaks after them int iAddNewline = FALSE; // Local symbol node pointer SymbolNode read more..

  • Page - 998

    957 { // Get the current symbol structure pCurrSymbol = GetSymbolByIndex ( iCurrSymbolIndex ); // If the scopes and parameter flags match, emit the declaration if ( pCurrSymbol->iScope == iScope && pCurrSymbol->iType == iType ) { // Print one tab stop for global declarations, and two for locals fprintf ( g_pOutputFile, "\t" ); if ( iScope != SCOPE_GLOBAL ) fprintf ( read more..

  • Page - 999

    958 appropriate number of tab stops. The function can be used for both global and local declarations, and this fact is reflected here. Globals and functions are both indented by a single tab. So, a global variable declaration only needs one tab stop to precede it. However, because a local declaration’s function is one tab in as well, the declaration itself needs two read more..

  • Page - 1000

    959 Everything except the code is a snap; the function declaration itself is just a matter of emitting the Func directive, the function node’s pstrName field, and the curly braces. Parameters and local variables can each be emitted with two calls to the EmitScopeSymbols () function developed in the last section. The code, however, is where things get tricky. Because you’re read more..

  • Page - 1001

    960 Two calls to EmitScopeSymbols (), used to emit the function’s parameters and variables (in that order), are then made. At this point, all that remains is the code. The function node stores this code in its nested ICodeStream linked list, so you begin by determining whether it contains anything: // Does the function have an I-code block? if ( read more..

  • Page - 1002

    961 ■ I-code instruction. An I-code instruction in the XtremeScript compiler has a one-to-one mapping with the XVM instruction set, so all you have to do here is emit the proper mnemonic and each of its operands. ■ Jump target. Jump targets are ultimately translated to labels by the code emitter, which must generate a unique label name on the fly. You’ll learn how this read more..

  • Page - 1003

    962 I-Code Instructions Instructions are hands down the most complex part about emitting I-code. Fortunately, the process is really just a regurgitation of the ones performed many times during the implementation of XASM and the XVM. Naturally, the first thing to do when emitting an instruction is to map the opcode to its corresponding string in the mnemonic array declared read more..

  • Page - 1004

    963 // Determine the number of operands int iOpCount = pCurrNode->Instr.OpList.iNodeCount; // If there are operands to emit, follow the instruction with some space if ( iOpCount ) { // All instructions get at least one tab fprintf ( g_pOutputFile, "\t" ); // If it's less than a tab stop's width in characters, however, they // get a second if ( strlen ( ppstrMnemonics read more..
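
The column-alignment rule in the comments above can be sketched in isolation. The excerpt cuts off before the actual comparison, so the `strlen ( ... ) < TAB_STOP_WIDTH` test below is an assumption drawn from the comment text, and `TabsAfterMnemonic ()` and `EmitMnemonic ()` are hypothetical helper names, not functions from the compiler:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAB_STOP_WIDTH 8    /* same value as the constant defined earlier */

/* Hypothetical helper: number of tabs to emit after a mnemonic so the
   operand column lines up for every instruction */
int TabsAfterMnemonic ( const char * pstrMnemonic )
{
    assert ( pstrMnemonic != NULL );

    /* Every instruction gets one tab. A mnemonic shorter than one tab
       stop has not crossed into the next column yet, so it gets a second.
       (Assumed condition; the excerpt is truncated before this test.) */
    return strlen ( pstrMnemonic ) < TAB_STOP_WIDTH ? 2 : 1;
}

/* Hypothetical helper: writes the mnemonic followed by its padding */
void EmitMnemonic ( FILE * pFile, const char * pstrMnemonic )
{
    int iTabCount = TabsAfterMnemonic ( pstrMnemonic );

    fprintf ( pFile, "%s", pstrMnemonic );
    for ( int iCurrTab = 0; iCurrTab < iTabCount; ++ iCurrTab )
        fputc ( '\t', pFile );
}
```

With eight-column tab stops, a short mnemonic such as Mov and an eight-character one such as CallHost both leave the cursor two tab stops past the start of the mnemonic, which is what keeps the operands in one column.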

  • Page - 1005

    964 // String literal case OP_TYPE_STRING_INDEX: fprintf ( g_pOutputFile, "\"%s\"", GetStringByIndex ( & g_StringTable, pOp->iStringIndex ) ); break; // Variable case OP_TYPE_VAR: fprintf ( g_pOutputFile, "%s", GetSymbolByIndex ( pOp->iSymbolIndex )->pstrIdent ); break; // Array index absolute case OP_TYPE_ARRAY_INDEX_ABS: fprintf ( g_pOutputFile, "%s [ %d ]", GetSymbolByIndex ( read more..

  • Page - 1006

    965 // If the operand isn't the last one, append it with a comma and space if ( iCurrOpIndex != iOpCount - 1 ) fprintf ( g_pOutputFile, ", " ); } This should look pretty straightforward, but here’s a quick rundown. Integer operands are printed by simply emitting the iIntLiteral field of the Op structure. Floats are handled the same way; they come directly out read more..

  • Page - 1007

    966 case ICODE_NODE_JUMP_TARGET: { // Emit a label in the format _LX, where X is the jump target fprintf ( g_pOutputFile, "\t_L%d:\n", pCurrNode->iJumpTargetIndex ); } It’s simply a matter of prefixing the jump target index with _L to make a valid label, and then following it with a colon to turn it into a declaration. Finishing Up The rest of the operand emission read more..
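
As a small illustration of the label format described above, the sketch below builds the `_LX:` text in a buffer instead of writing it to the output file. `FormatJumpLabel ()` is a hypothetical helper introduced only for this example; the leading tab and trailing newline that the emitter adds are left out:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a jump target index as a label
   declaration of the form _LX: and returns the label's length */
int FormatJumpLabel ( char * pstrBuffer, size_t iBufferSize, int iJumpTargetIndex )
{
    assert ( pstrBuffer != NULL && iBufferSize > 0 );

    /* The _L prefix makes the index a valid identifier; the colon turns
       the identifier into a label declaration */
    return snprintf ( pstrBuffer, iBufferSize, "_L%d:", iJumpTargetIndex );
}
```

Because each jump target index is unique within the I-code stream, the generated names cannot collide with one another.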

  • Page - 1008

    967 void EmitCode () { // ---- Open the output file if ( ! ( g_pOutputFile = fopen ( g_pstrOutputFilename, "wb" ) ) ) ExitOnError ( "Could not open output file for output" ); // ---- Emit the header EmitHeader (); Immediately following the header are the directives: // ---- Emit directives fprintf ( g_pOutputFile, "; ---- Directives ---------------------------\n\n" ); read more..

  • Page - 1009

    968 // Pointer to hold the _Main () function, if it's found FuncNode * pMainFunc = NULL; // Loop through each function and emit its declaration and code, if functions // exist if ( g_FuncTable.iNodeCount > 0 ) { while ( TRUE ) { // Get a pointer to the node pCurrFunc = ( FuncNode * ) pNode->pData; // Don't emit host API function nodes if ( ! pCurrFunc->iIsHostAPI ) { read more..

  • Page - 1010

    969 The table traversal then begins, assuming it’s not empty, and pCurrFunc is set to pNode’s current pData member at each iteration. The first thing to determine is whether the current function is defined by the script, or whether it belongs to the host API. Host API functions are simply kept in the function table for the parser’s benefit so it can validate read more..

  • Page - 1011

    970 void AssmblOutputFile () { // Command-line parameters to pass to XASM char * ppstrCmmndLineParams [ 3 ]; // Set the first parameter to "XASM" (not that it really matters) ppstrCmmndLineParams [ 0 ] = ( char * ) malloc ( strlen ( "XASM" ) + 1 ); strcpy ( ppstrCmmndLineParams [ 0 ], "XASM" ); // Copy the .XASM filename into the second parameter read more..

  • Page - 1012

    971 The AssmblOutputFile () function begins by declaring a string array of three elements called ppstrCmmndLineParams []. You allocate three elements because the argv [] array passed to a console application’s main () function always includes the name of the executable as typed at the command line at index zero of the array. The second element in the array is the filename read more..

  • Page - 1013

    972 WRAPPING IT ALL UP At this point, you’ve seen how every component of the XtremeScript compiler was designed and implemented from the ground up, and for the most part, seen how they fit together. This section covers a few loose ends left over from the previous discussion. Initiating the Compilation Process Earlier in the chapter, the compiler’s main () function was read more..

  • Page - 1014

    973 // Traverse the list to count each symbol type for ( int iCurrSymbolIndex = 0; iCurrSymbolIndex < g_SymbolTable.iNodeCount; ++ iCurrSymbolIndex ) { // Create a pointer to the current symbol structure SymbolNode * pCurrSymbol = GetSymbolByIndex ( iCurrSymbolIndex ); // It's an array if the size is greater than 1 if ( pCurrSymbol->iSize > 1 ) ++ iArrayCount; // It's a variable read more..

  • Page - 1015

    974 // Print out final calculations printf ( "%s created successfully!\n\n", g_pstrOutputFilename ); printf ( "Source Lines Processed: %d\n", g_SourceCode.iNodeCount ); printf ( " Stack Size: " ); if ( g_ScriptHeader.iStackSize ) printf ( "%d", g_ScriptHeader.iStackSize ); else printf ( "Default" ); printf ( "\n" ); printf ( " read more..

  • Page - 1016

    975 printf ( " _Main () Present: " ); if ( g_ScriptHeader.iIsMainFuncPresent ) printf ( "Yes (Index %d)\n", g_ScriptHeader.iMainFuncIndex ); else printf ( "No\n" ); printf ( "\n" ); } It should all be pretty self-explanatory. A number of variables are created to hold various totals that are either read directly from the iNodeCount of tables or read more..

  • Page - 1017

    976 // Calculate 2^8 and put the result in Y [ 1 ] Y [ 1 ] = MyGlobal ^ X; } By hand-compiling this code, meaning to compile it manually without the aid of an actual compiler, you can see pretty easily that the script should compile down to something along these lines: Var MyGlobal Func _Main { Var X Var Y [ 4 ] Mov MyGlobal, 2 Mov X, 8 Exp read more..

  • Page - 1018

    977 These can be hard-coded into the table with repeated calls to AddSymbol (). Once again, it’s important to save their indexes for later use: // Hard-code symbols into the table and save their indexes int iMyGlobalIndex = AddSymbol ( "MyGlobal", 1, SCOPE_GLOBAL, SYMBOL_TYPE_VAR ); int iXIndex = AddSymbol ( "X", 1, iMainIndex, SYMBOL_TYPE_VAR ); int iYIndex = AddSymbol ( read more..

  • Page - 1019

    978 These three string buffers now contain the three lines of executable, non-declaration code from the original high-level script we hand-compiled. With these ready to go, let’s add them, along with the instructions, to the I-code module. The First Instruction Here’s the first instruction: Mov MyGlobal, 2 And this is the line of high-level code it was hand-compiled from: read more..

  • Page - 1020

    979 Here’s the code used to hard-code it into the I-code module: // X = 8; AddICodeSourceLine ( iMainIndex, pstrLine1 ); iInstrIndex = AddICodeInstr ( iMainIndex, INSTR_MOV ); AddVarICodeOp ( iMainIndex, iInstrIndex, iXIndex ); AddIntICodeOp ( iMainIndex, iInstrIndex, 8 ); Once again, AddICodeSourceLine () is called first to add the source line annotation. This is followed by the read more..

  • Page - 1021

    980 The Results When the compiler runs, the CompileSourceFile () function will hard-code the data covered previously into the function table, symbol table, and I-code stream. From that point on, it will be as if that was read directly from the source file and converted by the parser. The rest of the compiler has no idea where any of it came from, and doesn’t care. read more..

  • Page - 1022

    981 How cool is this? You’ve created a perfectly valid, XASM-ready assembly file with full source code annotation. You now know the framework for the ever-evolving compiler works (at least, with as much certainty as you can derive from a single, simplistic test). With everything working so far, you can plow through the parser in the next chapter and create a finished, read more..

  • Page - 1023

    982 CHALLENGES Being that this chapter was mostly about preparation for the parser you’ll build in the next, there isn’t much room for improvement or enhancement just yet. As a result, there’s just one challenge for this chapter: ■ Intermediate: Implement the missing #include and #define preprocessor directives discussed earlier. 14. BUILDING THE XTREMESCRIPT COMPILER FRAMEWORK read more..

  • Page - 1024

    Parsing and Semantic Analysis “Boy, a month in Europe with Elaine. That guy’s coming home in a body bag.” ——Kramer, Seinfeld CHAPTER 15 read more..

  • Page - 1025

    984 This is the last of this book’s three chapters on the construction of the XtremeScript compiler. You started in Chapter 13 with the development of a complete lexical analyzer module, and integrated it with a full compiler framework in Chapter 14. You now have a compiler that, with the exception of a parser, is finished. Along the path from the source code to read more..

  • Page - 1026

    985 WHAT IS PARSING? In almost oversimplified terms, a script’s code can be said to exist in three primary forms as it passes through the compiler, and you’ve studied them exhaustively throughout this book’s recent chapters. The code begins as a raw stream of characters presented by the loader and preprocessor modules. The lexical analyzer module then “elevates” this raw read more..

  • Page - 1027

    986 Syntax goes a long way towards helping you understand both what a language is saying, as well as whether it’s valid. Speaking in terms of syntax alone, you can determine that the two lines of code listed previously are expressions, and, furthermore, that they’re valid ones. To understand the shortcomings of syntactic analysis, however, you need to understand exactly how read more..

  • Page - 1028

    987 ing the context in which these constructs appear. It can perform tasks such as ensuring that an identifier is valid in an expression, like you saw previously, as well as preventing identifier redefinition, ensuring that the value returned from an expression is valid for its destination, and so on. Expressing Syntax Semantics are important, but I’m going to start with read more..

  • Page - 1029

    988 Notice that until the optional array notation is reached, there’s only one arrow to follow from one token to the next. Once the identifier is passed, however, the path forks to allow one of two possibilities. Lastly, note the difference between the rectangular and rounded nodes. Tokens enclosed in a rectangle refer to literal strings that must appear as-is, exactly, such read more..

  • Page - 1030

    989 To wrap this section up, let’s look quickly at what the grammar might look like if you threw some extra non-terminals in. Although these additions aren’t necessary in this specific example, they help demonstrate the flexibility of BNF more clearly. In this example, you’ll make the '[' Int ']' a non-terminal of its own, so it can be nested in the VarDecl read more..

  • Page - 1031

    990 The general constant declaration is the somewhat abstract root of the tree, whereas its child nodes are each of the terminal symbols—const, an identifier, =, and an integer literal. This particular tree is a bit messy, however; it’s bogged down by useless nodes that only serve to get in the way. You can prune the tree a bit to remove the implicit and therefore read more..

  • Page - 1032

    991 const and = nodes are therefore implied. To make this example a bit more interesting, however, let’s expand the const keyword to define entire arrays of integer constants. The syntax diagram for this new version appears in Figure 15.6. Here’s an example of its usage: const MyArray [ 4 ] = { 0, 16, -4, 8192 }; WHAT IS PARSING? Figure 15.6 The syntax diagram for read more..

  • Page - 1033

    992 Note that now, due to their respective added complexity, I’ve abstracted the identifier and array size to an “L-Value,” and the array of values to an “R-Value”. Within the syntax tree, the Ident node under L-Value corresponds to the constant’s identifier, and the Int corresponds to its size. Under the R-Value, I simply filled in three values; there could actually be read more..

  • Page - 1034

    993 Once a program or script has been converted into a parse tree, it can be easily scanned and analyzed by other modules, such as the semantic analyzer. Semantics are easy to verify in a parse tree, because the analyzer can rest assured that the tree is free of syntax errors, and can focus entirely on traversing the nodes and ensuring that they can legally appear read more..

  • Page - 1035

    994 if ( U > V ) { Z [ 0 ] = FuncX ( U / 2, V / 2 ); Z [ 1 ] = 0; } else { Z [ 0 ] = 0; Z [ 1 ] = FuncY ( U * V ); } return U * ( V << 8 ) + ( Z [ 0 ] + 3.14159 * FuncY ( Z [ 1 ] / V ) ); } It looks like a formidable challenge, and indeed it is, but the key is to approach it in an incremental manner that read more..

  • Page - 1036

    995 func X () { { DoStuff (); } return DoEvenMoreStuff (); } So, it’s clear that while loops aren’t the only places you’ll need the ability to parse expressions and code blocks. And because it’s obvious that both of these operations will be complex, it’s a good idea to abstract them into their own functions anyway. So, you’ll add the ParseExpr () and ParseBlock () read more..

  • Page - 1037

    996 THE XTREMESCRIPT PARSER MODULE The XtremeScript parser module will be implemented in parser.cpp|h, and will consist primarily of functions for parsing specific non-terminals of the XtremeScript grammar as expressed by the syntax diagrams. By putting all of these together, you’ll have a complete parser that understands the entire language and can translate token and lexeme streams read more..

  • Page - 1038

    997 As you might have already guessed, this variable will work just like the iScope field of the SymbolNode structure discussed in the last chapter—a value of zero means the scope is currently global, whereas any positive, nonzero value is interpreted as an index into the function table corresponding to the current function. Check out Figure 15.11. THE XTREMESCRIPT PARSER read more..

  • Page - 1039

    998 void ReadToken ( Token ReqToken ) { // Determine if the next token is the required one if ( GetNextToken () != ReqToken ) { // If not, exit on a specific error char pstrErrorMssg [ 256 ]; switch ( ReqToken ) { case TOKEN_TYPE_INT: strcpy ( pstrErrorMssg, "Integer" ); break; case TOKEN_TYPE_FLOAT: strcpy ( pstrErrorMssg, "Float" ); break; case TOKEN_TYPE_IDENT: strcpy read more..

  • Page - 1040

    999 case TOKEN_TYPE_RSRVD_WHILE: strcpy ( pstrErrorMssg, "while" ); break; case TOKEN_TYPE_RSRVD_FUNC: strcpy ( pstrErrorMssg, "func" ); break; case TOKEN_TYPE_RSRVD_RETURN: strcpy ( pstrErrorMssg, "return" ); break; case TOKEN_TYPE_OP: strcpy ( pstrErrorMssg, "Operator" ); break; case TOKEN_TYPE_DELIM_COMMA: strcpy ( pstrErrorMssg, "," ); break; case TOKEN_TYPE_DELIM_OPEN_PAREN: strcpy read more..

  • Page - 1041

    1000 // Finish the message strcat ( pstrErrorMssg, " expected" ); // Display the error ExitOnCodeError ( pstrErrorMssg ); } } The function is long, but simple. As I mentioned, it reads the token, compares it to the token specified by ReqToken, and formulates the proper error message if they don’t match. For each token type, it creates a string that mentions the token read more..

  • Page - 1042

    1001 more sophisticated parser modules that can handle more and more of the language. After mastering declarations, you’ll start with simple expressions, and then move on to the entire expression vocabulary of the language, and finally wrap it all up with general statements like loops, branching, and assignments. The result will be a finished parser module that completes read more..

  • Page - 1043

    1002 Syntax Diagrams The first thing to do is devise an initial syntax diagram that lays out the exact syntax for empty statements and code blocks. Figure 15.13 contains this diagram. [Figure 15.13: The syntax diagrams for empty statements and code blocks.] This is an understandably simple diagram, but it’s still very effective in its description. A read more..

  • Page - 1044

    1003 ; { } ; { ; } {;} ;;; {}{}{} ;;; {;} { ; { { ; } ; } } In contrast, the following code blocks are illegal: { ; { { {{ } } It can be tricky to grasp at first, but the basic summary is this—Blocks are types of Statements, but they can also contain Statements. This recursive relationship gives the compiler the flexibility to parse arbitrary levels of nesting. Let’s read more..

  • Page - 1045

    1004 The Implementation Remember, the next step after creating syntax diagrams is committing them to code by implementing a parsing function for each non-terminal. The grammar so far has two non-terminals— Statement and Code Block. These will map directly to the two parsing functions you’ll write in this section, ParseStatement () and ParseBlock (). It’s also important to read more..

  • Page - 1046

    1005 // If we're at the end of the token stream, break the parsing loop if ( GetNextToken () == TOKEN_TYPE_END_OF_STREAM ) break; else RewindTokenStream (); } } That wasn’t so bad, huh? The function starts with a call to ResetLexer () to prep the lexical analyzer module before everything begins. It then sets the current scope to SCOPE_GLOBAL, which makes sense because a read more..

  • Page - 1047

    1006 { // Unexpected end of file case TOKEN_TYPE_END_OF_STREAM: ExitOnCodeError ( "Unexpected end of file" ); break; // Block case TOKEN_TYPE_DELIM_OPEN_CURLY_BRACE: ParseBlock (); break; // Anything else is invalid default: ExitOnCodeError ( "Unexpected input" ); break; } } The logic here is simple. The first thing the function does is use the look-ahead to determine whether a semicolon read more..

  • Page - 1048

    1007 Blocks Blocks are handled by the ParseBlock () function, which performs the simple task of parsing every statement within a pair of curly braces. The great thing about this function is that even when the parser reaches its final, most sophisticated state, this will still be a profoundly simple function that’s little more than a single loop. Let’s look at the code read more..

  • Page - 1049

    1008 PARSING DECLARATIONS Taking a small step up from the decidedly dull world of empty statements and code blocks brings you to the language’s declarations. XtremeScript currently supports two fundamental types of declarations (although you’ll be adding a third before this section is over), variables and arrays (data declarations), and functions (logic/code declarations). Variables are read more..

  • Page - 1050

    1009 void ParseStatement () { // If the next token is a semicolon, the statement // is empty so return if ( GetLookAheadChar () == ';' ) { ReadToken ( TOKEN_TYPE_DELIM_SEMICOLON ); return; } // Determine the initial token of the statement Token InitToken = GetNextToken (); // Branch to a parse function based on the token switch ( InitToken ) { // Unexpected end of file case read more..

  • Page - 1051

    1010 As you can see, the parsing function you’ll use to handle function declarations is called ParseFunc (). It should be pretty clear that you’ll be continually updating the Statement non-terminal as each new statement type is added. With that out of the way, let’s talk about what this new function will do. Parsing the func token is obviously easy, as is the read more..

  • Page - 1052

    1011 // Add the non-host API function to the function // table and get its index int iFuncIndex = AddFunc ( GetCurrLexeme (), FALSE ); // Check for a function redefinition if ( iFuncIndex == -1 ) ExitOnCodeError ( "Function redefinition" ); // Set the scope to the function g_iCurrScope = iFuncIndex; By the time ParseFunc () is called, the func keyword has already been read more..

  • Page - 1053

    1012 of this, a specific convention must be agreed upon beforehand; once a language has defined such a convention, parameters are pushed onto the stack using the chosen order, and are popped off within the function’s code in the reverse order. XtremeScript will pass parameters in the left-to-right order, which means functions will have to read them from right to left. What read more..
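
The consequence of that convention can be seen with a toy stack. The sketch below is an illustration only: the fixed-size integer stack and the names `PushArgs ()` and `ReadParams ()` are assumptions made for this example and do not come from the XVM or the compiler:

```c
#include <assert.h>

#define MAX_STACK_SIZE 16

/* A toy integer stack standing in for the runtime stack */
static int g_piStack [ MAX_STACK_SIZE ];
static int g_iTopIndex = 0;

static void Push ( int iValue )
{
    assert ( g_iTopIndex < MAX_STACK_SIZE );
    g_piStack [ g_iTopIndex ++ ] = iValue;
}

static int Pop ( void )
{
    assert ( g_iTopIndex > 0 );
    return g_piStack [ -- g_iTopIndex ];
}

/* Caller side: arguments are pushed in left-to-right order */
void PushArgs ( const int * piArgs, int iArgCount )
{
    for ( int iCurrArg = 0; iCurrArg < iArgCount; ++ iCurrArg )
        Push ( piArgs [ iCurrArg ] );
}

/* Callee side: the last argument pushed is the first one popped, so the
   parameter slots are filled from right to left */
void ReadParams ( int * piParams, int iParamCount )
{
    for ( int iCurrParam = iParamCount - 1; iCurrParam >= 0; -- iCurrParam )
        piParams [ iCurrParam ] = Pop ();
}
```

Pushing 10, 20, 30 and then reading three parameters leaves them in their original positions, because the right-to-left read undoes the reversal that the stack introduces.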

  • Page - 1054

    1013 // Create an array to store the parameter list locally char ppstrParamList [ MAX_FUNC_DECLARE_PARAM_COUNT ][ MAX_IDENT_SIZE ]; // Read the parameters while ( TRUE ) { // Read the identifier ReadToken ( TOKEN_TYPE_IDENT ); // Copy the current lexeme to the parameter list array CopyCurrLexeme ( ppstrParamList [ iParamCount ] ); // Increment the parameter count ++ iParamCount; // Check read more..

  • Page - 1055

    1014 You can begin by attempting to read an opening parenthesis token. If it’s found, you immediately use the look-ahead to determine whether a closing parenthesis follows it. If so, the parameter list is empty and you can skip past the parameter parsing logic entirely. If not, you have to make sure the current function isn’t _Main (), because _Main () can’t legally read more..

  • Page - 1056

    1015 Parsing the Function’s Body The last order of business is parsing the function’s body. Fortunately, function bodies are really just code blocks, and because you’ve already written ParseBlock (), all you need to do is call it. Of course, before doing so, you need to use ReadToken () to ensure that an opening curly brace is next in the token stream. If so, read more..

  • Page - 1057

    1016 Once ParseBlock () returns, you know the entire function body has been handled. Of course, at this point, all it can contain are empty statements and nested blocks, but even after the rest of the statement types are added, ParseFunc () will remain the same. Because you’re now back outside the function’s body, you can set the scope back to SCOPE_GLOBAL. Remember, if the read more..

  • Page - 1058

    1017 Nothing’s changed except the initial g_iCurrScope check. If the scope is currently global, an error is presented to alert the users that the block is illegal. Variable and Array Declarations Now that you can determine whether you’re inside a function, and get hold of the current function’s index at any time, you’re ready to tackle variable declarations. Figure 15.17 read more..

  • Page - 1059

    1018 { ReadToken ( TOKEN_TYPE_DELIM_SEMICOLON ); return; } // Determine the initial token of the statement Token InitToken = GetNextToken (); // Branch to a parse function based on the token switch ( InitToken ) { // Unexpected end of file case TOKEN_TYPE_END_OF_STREAM: ExitOnCodeError ( "Unexpected end of file" ); break; // Block case TOKEN_TYPE_DELIM_OPEN_CURLY_BRACE: ParseBlock (); break; // read more..

  • Page - 1060

    1019 Here’s the code to do so: void ParseVar () { // Read an identifier token ReadToken ( TOKEN_TYPE_IDENT ); // Copy the current lexeme into a local string buffer // to save the variable's identifier char pstrIdent [ MAX_LEXEME_SIZE ]; CopyCurrLexeme ( pstrIdent ); // Set the size to 1 for a variable (an array will // update this value) int iSize = 1; // Is the read more..

  • Page - 1061

    1020 // Read the closing brace ReadToken ( TOKEN_TYPE_DELIM_CLOSE_BRACE ); } // Add the identifier and size to the symbol table if ( AddSymbol ( pstrIdent, iSize, g_iCurrScope, SYMBOL_TYPE_VAR ) == -1 ) ExitOnCodeError ( "Identifier redefinition" ); // Read the semicolon ReadToken ( TOKEN_TYPE_DELIM_SEMICOLON ); } As you can see, it’s a pretty simple process, and definitely easier read more..

  • Page - 1062

    1021 Host API Function Declarations The last type of declaration to cover might not seem immediately obvious. What’s a host API declaration? If you recall the development of XASM in Chapter 9, you’ll remember that host API function calls were obvious to the assembler because they were also used in the context of a CallHost instruction. As a result, their differentiation read more..

  • Page - 1063

    1022 application never defines, at least this rules out the possibilities of accidental misspellings that the compiler doesn’t flag. To do this, you need to make a small addition to the XtremeScript language by adding the host keyword. The host Keyword The purpose of host is to allow the script writer to declare a host API function before its subsequent use. read more..

  • Page - 1064

    1023 Then, under the LEX_STATE_IDENT case in the switch block that GetNextToken () uses to convert the terminal lexer state to a token type, you simply add this small block of code: // host if ( stricmp ( g_CurrLexerState.pstrCurrLexeme, "host" ) == 0 ) TokenType = TOKEN_TYPE_RSRVD_HOST; That’s all it takes. The lexer now understands the new keyword, and you’re ready to read more..

  • Page - 1065

    1024 // Function definition case TOKEN_TYPE_RSRVD_FUNC: ParseFunc (); break; // Host API function import case TOKEN_TYPE_RSRVD_HOST: ParseHost (); break; // Variable/array declaration case TOKEN_TYPE_RSRVD_VAR: ParseVar (); break; // Anything else is invalid default: ExitOnCodeError ( "Unexpected input" ); break; } } Figure 15.20 is a more recent version of the ever-evolving Statement non-terminal read more..

  • Page - 1066

    1025 Note that I refer to it as a “host API function import”. If you recall from Chapter 11, the process of exposing a function on behalf of the host application is called exporting. This means that from the script’s perspective, the function is being imported. Anyway, whenever the host token appears as the initial token of a new statement, ParseHost () is called to read more..

  • Page - 1067

    1026 Once the function name has been parsed and added to the function table with the host API flag, two more tokens are read to ensure that the statement ends with a (). Finally, a semicolon is read, and the declaration is fully parsed. Testing Code Emitter Module So far, the parser is shaping up quite nicely. It understands the fundamental structure of a script through read more..

  • Page - 1068

    1027 By saving this file as declare.xss and running it through the Programs/Chapter 15/15-01/ version of the compiler on the CD with the -A switch, you’ll get the following output: ; DECLARE.XASM ; Source File: TEST.XSS ; XSC Version: 0.8 ; Timestamp: Fri Sep 13 14:53:08 2002 ; ---- Directives ------------------------------------------- ; ---- Global Variables read more..

  • Page - 1069

    1028 Notice also that the host declaration seems to have disappeared. This is because such declarations only exist for the compiler’s benefit, not the assembler’s. However, by giving the compiler a record of which functions are which, it will know whether to ultimately emit the function call with the Call instruction or the CallHost instruction. PARSING SIMPLE EXPRESSIONS read more..

  • Page - 1070

    1029 Here you have two operands, separated by the + operator. You know, as a well-trained, arithmetic-loving human, that this expression is saying “add 16 to 32.” You also know, thanks to the human brain’s modest computation facilities, that the sum is 48. But how can you get the parser to do the same thing? You can start by applying the same approach used for read more..

  • Page - 1071

    1030 the same thing. The parsing process would begin just as it did in the last example—16, +, and 32 would be read from the token stream, and after the integer lexemes were converted to their actual values, the addition operator would be applied to them. This would yield 48. From here, you can simply continue the process, by conceptually “collapsing” 16 + 32 into read more..

  • Page - 1072

    1031 same precedence levels are meant to be handled in a sequential, left-to-right order. Imagine if you tried parsing the following expression using the current technique: 16 + 32 * 2 You’d first add 16 to 32, and then multiply the resulting 48 by 2. The “result” would be 96, even though the real result is 80. The pure left-to-right method doesn’t take operator read more..

  • Page - 1073

    1032 For the purpose of this example, you’ll use two separate stacks. One will store the operands, and the other will store the operators. The following is a walk-through of the process of parsing the previous expression with these stacks. 16 is read as the first token, and is pushed onto the operand stack. The next token is +, which is pushed onto the operator stack. read more..

  • Page - 1074

    1033 This section covers the theory and code behind parsing expressions supporting: ■ Integer and floating-point literal values. ■ Basic arithmetic operators with the proper precedence rules: +, -, *, /. ■ The unary negation and plus operators. ■ Nesting with parentheses. ■ Variable and array references. This initial version of the expression parser doesn’t support the read more..

  • Page - 1075

    1034 ■ Sub-expressions. Sub-expressions, in the context of this first parser, are synonymous with expressions. The next version of the parser will differentiate between the two, but for now, they’re identical. A sub-expression is composed of a number of terms, each separated by + or - operators. For example, X + Y - Z is an example of a sub-expression. X, Y, and Z read more..

  • Page - 1076

    1035 The same rule applies here; any element of the expression separated by the / or * operator that’s not nested within parentheses is considered a separate factor. Because factors are the lowest-level entities within this expression, you can evaluate them. The first is Y, which is simply a variable. The second is a nested expression within parentheses. What I’m driving at read more..

  • Page - 1077

    1036 by parsing each term and adding or subtracting it. Terms are parsed by multiplying or dividing each of the factors they contain. Let’s apply this to an example that combines the previous two. Consider the following expression: 10 + 128 * 4 - 16 + 10 / 2 Here you see multiple levels of operator precedence, which means things will be more complicated this time read more..
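    The sub-expression, term, and factor hierarchy maps naturally onto three mutually recursive functions. The following C sketch is only an illustration: it computes values directly rather than generating stack code, it accepts integer literals, unary + and -, and parentheses only, and the names EvalString, EvalSubExpr, EvalTerm, EvalFactor, and g_p are hypothetical rather than taken from the XSC source.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *g_p;               /* hypothetical input cursor */

static void SkipSpace(void) { while (isspace((unsigned char)*g_p)) ++g_p; }

static int EvalSubExpr(void);

/* Factor: unary +/-, a parenthesized sub-expression, or an integer. */
static int EvalFactor(void) {
    SkipSpace();
    if (*g_p == '-') { ++g_p; return -EvalFactor(); }
    if (*g_p == '+') { ++g_p; return EvalFactor(); }
    if (*g_p == '(') {
        ++g_p;
        int v = EvalSubExpr();        /* recurse for the nested expression */
        SkipSpace();
        if (*g_p == ')') ++g_p;       /* no error reporting in this sketch */
        return v;
    }
    char *end;
    int v = (int)strtol(g_p, &end, 10);
    g_p = end;
    return v;
}

/* Term: factors separated by * or /. */
static int EvalTerm(void) {
    int v = EvalFactor();
    for (;;) {
        SkipSpace();
        char op = *g_p;
        if (op != '*' && op != '/') return v;
        ++g_p;
        int rhs = EvalFactor();
        v = (op == '*') ? v * rhs : v / rhs;
    }
}

/* Sub-expression: terms separated by + or -. */
static int EvalSubExpr(void) {
    int v = EvalTerm();
    for (;;) {
        SkipSpace();
        char op = *g_p;
        if (op != '+' && op != '-') return v;
        ++g_p;
        int rhs = EvalTerm();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Entry point: evaluate a whole string. */
int EvalString(const char *s) {
    g_p = s;
    return EvalSubExpr();
}
```

    For the expression above, the terms are 10, 128 * 4, 16, and 10 / 2, so EvalString ( "10 + 128 * 4 - 16 + 10 / 2" ) works out to 10 + 512 - 16 + 5, which is 511.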

  • Page - 1078

    1037 Coding the Expression Parser As you might have guessed, you can code an expression parser by creating Parse* () functions for each of the expression entities covered in the last section. Specifically, you need ParseExpr () for parsing expressions, which calls ParseSubExpr () for parsing sub-expressions, which subsequently calls ParseTerm () for parsing terms, which finally calls read more..

  • Page - 1079

    1038 I solved this problem by “simulating” a pair of general-purpose registers called _T0 and _T1 (T standing for “temporary”). This is accomplished by forcing the declaration of _T0 and _T1 as globals in every script. In other words, all XVM assembly scripts produced by the XSC compiler contain this at the top of their global definitions: Var _T0 Var _T1 Now, after pushing read more..

  • Page - 1080

    1039 Where Expr0 is the first operand and Expr1 is the second. This would be parsed by calling ParseExpr () to parse the first operand. The top element of the stack now contains the result of this expression (or at least, it will at runtime). The division operator would then be parsed and saved in a local variable. A second call would be made to ParseExpr (), and read more..

  • Page - 1081

    1040 // Parse the second term ParseTerm (); // Pop the first operand into _T1 iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_POP ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar1SymbolIndex ); // Pop the second operand into _T0 iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_POP ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); // Perform the binary operation read more..

  • Page - 1082

    1041 erated to perform either an addition or subtraction based on the operator token. After the operation is performed, the value is pushed back onto the stack. ParseTerm () was called by ParseSubExpr () to handle each operand in between its additive operators, so let’s take a look at it now: void ParseTerm () { int iInstrIndex; // The current operator type int iOpType; read more..

  • Page - 1083

    1042 // Perform the binary operation associated with the specified operator int iOpInstr; switch ( iOpType ) { // Binary multiplication case OP_TYPE_MUL: iOpInstr = INSTR_MUL; break; // Binary division case OP_TYPE_DIV: iOpInstr = INSTR_DIV; break; } iInstrIndex = AddICodeInstr ( g_iCurrScope, iOpInstr ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); AddVarICodeOp ( g_iCurrScope, read more..

  • Page - 1084

    1043 { // If it was found, save it and set the unary operator flag iUnaryOpPending = TRUE; iOpType = GetCurrOp (); } else { // Otherwise rewind the token stream RewindTokenStream (); } Factors can be preceded by unary operators, so the first thing the function does is check for one. You’re currently just supporting the unary + and -, so those are the only checks that read more..

  • Page - 1085

    1044 // Does an array index follow the identifier? if ( GetLookAheadChar () == '[' ) { // Ensure the variable is an array if ( pSymbol->iSize == 1 ) ExitOnCodeError ( "Invalid array" ); // Verify the opening brace ReadToken ( TOKEN_TYPE_DELIM_OPEN_BRACE ); // Make sure an expression is present if ( GetLookAheadChar () == ']' ) ExitOnCodeError ( "Invalid expression" ); read more..

  • Page - 1086

    1045 iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_PUSH ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, pSymbol->iIndex ); } else { ExitOnCodeError ( "Arrays must be indexed" ); } } } else { // It's not a variable or array ExitOnCodeError ( "Unknown identifier" ); } break; } // It's a nested expression, so call ParseExpr () recursively and validate // the presence of the closing read more..

  • Page - 1087

    1046 If an open bracket is found, the identifier is probably an array. The first check here is to make sure that the identifier’s symbol table record is indeed of the array type; otherwise, an error is flagged. If the symbol is a valid array, ParseExpr () is called again to parse the expression that lies in between the braces. The closing brace is then validated. The array read more..

  • Page - 1088

    1047 If a negation operator is present, the value on top of the stack (the value of the factor) is popped into _T0, negated with a Neg instruction, and pushed back on. For simplicity’s sake, I’ve left out the unary + operator; I hardly consider it common enough to worry about here, even though it’s accepted by the syntax. That’s all the code you need to parse read more..

  • Page - 1089

    1048 Push _T0 Push 4 Push 4 Pop _T0 Pop _T1 Mul _T0, _T1 Push _T0 Pop _T0 Pop _T1 Add _T0, _T1 Push _T0 Quite a bit of code for such a simple statement, eh? Unfortunately, such is the nature of a non-optimizing compiler. Fortunately, the code it does emit is quite easy to read, read more..

  • Page - 1090

    1049 expanding the ParseFactor () function is perhaps the easiest way to expand the parser, because factors lie at the bottom of the expression entity hierarchy and therefore don’t require any further parsing. All you need to do is determine the factor type’s value, and push it onto the stack. The new factor types are: string literal values, function calls, and the read more..

  • Page - 1091

    1050 // Make sure an expression is present if ( GetLookAheadChar () == ']' ) ExitOnCodeError ( "Invalid expression" ); // Parse the index as an expression recursively ParseExpr (); // Make sure the index is closed ReadToken ( TOKEN_TYPE_DELIM_CLOSE_BRACE ); // Pop the resulting value into _T0 and use it as the index // variable iInstrIndex = AddICodeInstr ( g_iCurrScope, read more..

  • Page - 1092

    1051 if ( GetFuncByName ( GetCurrLexeme () ) ) { // It is, so parse the call ParseFuncCall (); // Push the return value iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_PUSH ); AddRegICodeOp ( g_iCurrScope, iInstrIndex, REG_CODE_RETVAL ); } } break; } The TRUE and FALSE constants are the first newcomers, and are handled easily thanks to the lexer’s capability to directly return the read more..

  • Page - 1093

    1052 // It is, so start the parameter count at zero int iParamCount = 0; // Attempt to read the opening parenthesis ReadToken ( TOKEN_TYPE_DELIM_OPEN_PAREN ); // Parse each parameter and push it onto the stack while ( TRUE ) { // Find out if there's another parameter to push if ( GetLookAheadChar () != ')' ) { // There is, so parse it as an expression ParseExpr (); // read more..

  • Page - 1094

    1053 // Call the function, but make sure the right call instruction is used int iCallInstr = INSTR_CALL; if ( pFunc->iIsHostAPI ) iCallInstr = INSTR_CALLHOST; int iInstrIndex = AddICodeInstr ( g_iCurrScope, iCallInstr ); AddFuncICodeOp ( g_iCurrScope, iInstrIndex, pFunc->iIndex ); } In a nutshell, the logic simply scans through each parameter and calls ParseExpr () to parse it. It also read more..

  • Page - 1095

    1054 What you’ve seen here will play a large role in the logical operators you’ll develop in the next section, so make sure you understand what’s going on. Just to reiterate, the idea here is to generate code that implements the logic behind the operator. In this case, because you want to push the logical not of the factor onto the stack, you want to push read more..

  • Page - 1096

    1055 The Logical And Operator As an example of a logical operator, let’s look at logical and. Due to the compression of the XtremeScript operator precedence levels, you’re going to handle this operator in ParseExpr (), where the lowest-precedence level operators are handled. Here’s the code for converting a binary and operator expression into assembly: case OP_TYPE_LOGICAL_AND: { // read more..

  • Page - 1097

    1056 // L1: (Exit) AddICodeJumpTarget ( g_iCurrScope, iExitJumpTargetIndex ); break; } The basic logic here is as follows. Given an example line of XtremeScript like the following: X && Y; // Logical X and Y Assembly code should be generated that adheres to the following format: JE _T0, 0, False JE _T1, 0, False Push 1 Jmp Exit False: Push 0 read more..
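    To see why this jump pattern computes a logical and, it can help to mirror it with C gotos. The sketch below is an illustration only: the function name LogicalAndViaJumps is hypothetical, its parameters stand in for _T0 and _T1, the local result stands in for the value pushed onto the stack, and it assumes the block that pushes 0 is the one both JE instructions jump to.

```c
/* Mirrors the emitted pattern for X && Y, one C statement per
   assembly instruction. t0 and t1 play the roles of _T0 and _T1. */
int LogicalAndViaJumps(int t0, int t1) {
    int result;

    if (t0 == 0) goto False;   /* JE _T0, 0, False */
    if (t1 == 0) goto False;   /* JE _T1, 0, False */
    result = 1;                /* Push 1           */
    goto Exit;                 /* Jmp Exit         */
False:
    result = 0;                /* Push 0           */
Exit:
    return result;
}
```

    Only when neither operand is zero does execution fall through both comparisons and reach the code that produces 1.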

  • Page - 1098

    1057 // Add the jump instruction's operands (_T0 and _T1) AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar1SymbolIndex ); AddJumpTargetICodeOp ( g_iCurrScope, iInstrIndex, iTrueJumpTargetIndex ); // Generate the outcome for falsehood iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_PUSH ); AddIntICodeOp ( g_iCurrScope, iInstrIndex, 0 ); read more..

  • Page - 1099

    1058 Jmp Exit True: Push 1 Exit: Presto! The Rest You’ve seen an example of logical operators, and one of the relationals. Once you understand how these two work, you’re definitely prepared to understand the rest. Again, to save space in an already rather large chapter, I’ve omitted the remaining operators in print and instead encourage you to check them out read more..

  • Page - 1100

    1059 The next section will change all that, with the implementation of all the missing features I mentioned previously. Before you go ahead and do that, however, it’s important to note that you don’t have a particularly convenient or readily available venue for testing the output of the compiler. Although the XVM is indeed finished, it’s not much good without a host read more..

  • Page - 1101

    1060 Of course, in order to use the XVM, you’ll need to link console.cpp with xvm.cpp and include xvm.h. This is done easily in Visual C++ by simply loading both console.cpp and xvm.cpp into the same Console Application project. Both the project and workspace files for accomplishing this are located in the Programs/Chapter 15/XVM Console/Source/ directory. The rest of this read more..

  • Page - 1102

    1061 Loading the Script Once you know a filename is present on the command line, you can start the XVM with a call to XS_Init () and load the script. Remember, it’s important to save the script’s index, and for the sake of completeness, you need to check for load errors as well: // Initialize the runtime environment XS_Init (); // Declare the thread indexes int read more..

  • Page - 1103

    1062 } printf ( ".\n" ); return 0; } Running the Script Once the thread is in memory, it’s time to run it. The script is initially started with a call to XS_StartScript (), and kept in motion with repeated calls to XS_RunScripts (). You call XS_RunScripts () repeatedly in a while loop that runs until a key is pressed. This way, any scripts that involve infinite read more..

  • Page - 1104

    1063 PrintString () As mentioned previously, you’ll wrap printf () to do the printing: void HAPI_PrintString ( int iThreadIndex ) { // Read in the parameters char * pstrString = XS_GetParamAsString ( iThreadIndex, 0 ); // Print the string printf ( "%s", pstrString ); // Return to the XVM XS_Return ( iThreadIndex, 1 ); } This simple function operates in three steps. First, it read more..

  • Page - 1105

    1064 // Return to the XVM XS_Return ( iThreadIndex, 0 ); } Remember, however, that even without parameters it’s vital to return from the function with XS_Return (). Forgetting to do so will lead to a corrupted stack and most likely crash the machine. Registering the API The last step is of course to register the three functions you just created. As you should remember read more..

  • Page - 1106

    1065 Assignment Statements I intentionally decided not to support C/C++-style assignments, as they can appear anywhere in an expression and often lead to confusion. Rather, you’re taking a simpler route and making assignments their own specific type of statement. This lends itself to a cleaner language that’s easier to parse. Of course, you’ll still support the full range of read more..

  • Page - 1107

    1066 { // It's an identifier, so treat the statement as an assignment ParseAssign (); } else { // It's invalid ExitOnCodeError ( "Invalid identifier" ); } break; } Once again the Statement syntax diagram grows, as shown in Figure 15.29. 15. PARSING AND SEMANTIC ANALYSIS Figure 15.29 The syntax diagram for Statements with assignments added. If an identifier is read, you can assume read more..

  • Page - 1108

    1067 void ParseAssign () { // Make sure we're inside a function if ( g_iCurrScope == SCOPE_GLOBAL ) ExitOnCodeError ( "Assignment illegal in global scope" ); int iInstrIndex; // Assignment operator int iAssignOp; // Annotate the line AddICodeSourceLine ( g_iCurrScope, GetCurrSourceLine () ); // ---- Parse the variable or array SymbolNode * pSymbol = GetSymbolByIdent ( GetCurrLexeme (), read more..

  • Page - 1109

    1068 else { // Make sure the variable isn't an array if ( pSymbol->iSize > 1 ) ExitOnCodeError ( "Arrays must be indexed" ); } The function begins by making sure the current scope isn’t global, and declaring a few variables. iInstrIndex will be used when generating the statement’s I-code to keep track of the current instruction node. iAssignOp will also be used later read more..

  • Page - 1110

    1069 // ---- Parse the assignment operator if ( GetNextToken () != TOKEN_TYPE_OP || ( GetCurrOp () != OP_TYPE_ASSIGN && GetCurrOp () != OP_TYPE_ASSIGN_ADD && GetCurrOp () != OP_TYPE_ASSIGN_SUB && GetCurrOp () != OP_TYPE_ASSIGN_MUL && GetCurrOp () != OP_TYPE_ASSIGN_DIV && GetCurrOp () != OP_TYPE_ASSIGN_MOD && GetCurrOp () != OP_TYPE_ASSIGN_EXP read more..

  • Page - 1111

    1070 if ( iIsArray ) { iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_POP ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar1SymbolIndex ); } // ---- Generate the I-code for the assignment instruction switch ( iAssignOp ) { // = case OP_TYPE_ASSIGN: iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_MOV ); break; // += case OP_TYPE_ASSIGN_ADD: iInstrIndex = AddICodeInstr ( g_iCurrScope, read more..

  • Page - 1112

    1071 // &= case OP_TYPE_ASSIGN_AND: iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_AND ); break; // |= case OP_TYPE_ASSIGN_OR: iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_OR ); break; // #= case OP_TYPE_ASSIGN_XOR: iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_XOR ); break; // <<= case OP_TYPE_ASSIGN_SHIFT_LEFT: iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_SHL ); break; // >>= read more..

  • Page - 1113

    1072 AddArrayIndexVarICodeOp () is called to generate an array indexed with a variable. This function is passed pSymbol->iIndex as well as g_iTempVar1SymbolIndex. Finally, you generate the source operand, which is just _T0. As an example, check out the following fragment of XtremeScript code: var MyArray [ 4 ]; var Radius; Radius = 4; MyArray [ 1 ] = 3.14159 * Radius ^ 2; read more..

  • Page - 1114

    1073 Function Calls Even though you’ve already written logic to call functions from within an expression, you still need to support statements that are themselves single function calls. Fortunately, this is extremely simple. The ParseFuncCall () function you wrote for the expression parser already encapsulates virtually all of the logic you need. All you need to do is update read more..

  • Page - 1115

    1074 { // It's an identifier, so treat the statement as an assignment ParseAssign (); } else if ( GetFuncByName ( GetCurrLexeme () ) ) { // It's a function // Annotate the line and parse the call AddICodeSourceLine ( g_iCurrScope, GetCurrSourceLine () ); ParseFuncCall (); // Verify the presence of the semicolon ReadToken ( TOKEN_TYPE_DELIM_SEMICOLON ); } else { // It's invalid read more..

  • Page - 1116

    1075 The host API function PrintString () is imported, followed by the definition of a script-defined function called PrintStringWrap () that wraps the host API version of the function to print a string as well. Within _Main (), both functions are called via function call statements. Here’s an excerpt of the compiled code: ; PrintStringWrap ( "This is a read more..

  • Page - 1117

    1076 As always, let’s start by adding the proper update to ParseStatement (), as shown in the code list- ing here and in Figure 15.32: // return case TOKEN_TYPE_RSRVD_RETURN: ParseReturn (); break; 15. PARSING AND SEMANTIC ANALYSIS Figure 15.32 The syntax diagram for Statements with return taken into account. With that out of the way, let’s check out ParseReturn (): void ParseReturn read more..

  • Page - 1118

    1077 // Annotate the line AddICodeSourceLine ( g_iCurrScope, GetCurrSourceLine () ); // If a semicolon doesn't appear to follow, parse the // expression and place it in _RetVal if ( GetLookAheadChar () != ';' ) { // Parse the expression to calculate the return value and // leave the result on the stack. ParseExpr (); // Determine which function we're returning from if ( read more..

  • Page - 1119

    1078 if ( g_ScriptHeader.iIsMainFuncPresent && g_ScriptHeader.iMainFuncIndex == g_iCurrScope ) { // It's _Main, so exit the script with _T0 as the exit code iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_EXIT ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); } else { // It's not _Main, so return from the function AddICodeInstr ( g_iCurrScope, INSTR_RET ); } } Of read more..

  • Page - 1120

    1079 Pop _RetVal Ret } As you can see, the X ^ 2 expression is emitted first, followed by the Pop _RetVal and Ret instructions, which is everything you need. while Loops Expression parsing and assignment statements represented the line-by-line nature of the language: individual statements that perform specific tasks on their own. With the exception of function read more..

  • Page - 1121

    1080 conditional jump and will either fall through into the loop if the expression evaluates to true, or jump to a label set beyond the last instruction of the loop body if the expression evaluates to false. This allows the first iteration of the loop to execute if the expression is true, and results in the loop being skipped entirely otherwise. The only problem is that read more..

  • Page - 1122

    1081 Parsing while Loops Now that you understand the theory and assembly language representation behind the while loop, you can write a ParseWhile () function that will parse it. To kick things off, check out the while loop’s syntax diagram in Figure 15.35. PARSING ADVANCED STATEMENTS AND CONSTRUCTS Figure 15.34 The assembly-language representation of a while loop. Figure 15.35 The read more..

  • Page - 1123

    1082 // Block case TOKEN_TYPE_DELIM_OPEN_CURLY_BRACE: ParseBlock (); break; // Function definition case TOKEN_TYPE_RSRVD_FUNC: ParseFunc (); break; // Host API function import case TOKEN_TYPE_RSRVD_HOST: ParseHost (); break; // Variable/array declaration case TOKEN_TYPE_RSRVD_VAR: ParseVar (); break; // while loop block case TOKEN_TYPE_RSRVD_WHILE: ParseWhile (); break; // Anything else is invalid default: read more..

  • Page - 1124

    1083 // Annotate the line AddICodeSourceLine ( g_iCurrScope, GetCurrSourceLine () ); // Get two jump targets; for the top and bottom of the loop int iStartTargetIndex = GetNextJumpTargetIndex (), iEndTargetIndex = GetNextJumpTargetIndex (); // Set a jump target at the top of the loop AddICodeJumpTarget ( g_iCurrScope, iStartTargetIndex ); // Read the opening parenthesis ReadToken ( read more..

  • Page - 1125

    1084 // Read the closing parenthesis ReadToken ( TOKEN_TYPE_DELIM_CLOSE_PAREN ); // Pop the result into _T0 and jump out of the loop if it's zero iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_POP ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_JE ); AddVarICodeOp ( g_iCurrScope, iInstrIndex, g_iTempVar0SymbolIndex ); read more..

  • Page - 1126

    1085 Once the value is in _T0, you can generate the jump instruction that will determine whether to execute the next iteration of the loop. You therefore generate a JE instruction (jump if equal) that essentially looks like this: JE _T0, 0, <Loop End Jump Target> In other words, if the result of the loop’s expression was zero (false), exit the loop. You can now read more..
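    The overall shape of the code emitted for a while loop can be summarized with a small C helper that writes the skeleton as text. This is a sketch under stated assumptions, not XSC's I-code interface: EmitWhileLayout is a hypothetical name, the expression and body are shown only as placeholder comments, and the label format _L<n> is borrowed from the compiled listings that appear elsewhere in the chapter.

```c
#include <stdio.h>

/* Write the skeleton of a compiled while loop into out.
   startLabel marks the top of the loop and endLabel marks the first
   instruction after it. Returns the value of snprintf (). */
int EmitWhileLayout(char *out, size_t size, int startLabel, int endLabel) {
    return snprintf(out, size,
        "_L%d:\n"                       /* top of the loop                */
        "    ; <loop expression>\n"     /* leaves its result on the stack */
        "    Pop _T0\n"
        "    JE _T0, 0, _L%d\n"         /* zero (false) exits the loop    */
        "    ; <loop body>\n"
        "    Jmp _L%d\n"                /* back to re-test the expression */
        "_L%d:\n",
        startLabel, endLabel, startLabel, endLabel);
}
```

    Passing 4 and 5 as the labels, for instance, produces a skeleton that begins at _L4, exits through JE _T0, 0, _L5, and closes with Jmp _L4 followed by the _L5 label.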

  • Page - 1127

    1086 break Once inside a loop, it might become necessary to pull the panic switch and immediately terminate it. Fortunately, XtremeScript supports C’s break statement for doing just this. At first glance, break seems like it should be an easy addition; after all, it’s just an unconditional jump to the loop’s ending jump target, right? For the most part, this is read more..

  • Page - 1128

    1087 The first step in implementing this solution is declaring a global instance of the Stack structure you created in the last chapter called g_LoopStack: Stack g_LoopStack; This stack needs to be initialized when the parser starts, so you can add the following line of code to ParseSourceCode (), just before it enters its statement parsing loop: InitStack ( & g_LoopStack ); read more..

  • Page - 1129

    1088 // Parse the loop body ParseStatement (); // Pop the loop instance off the stack Pop ( & g_LoopStack ); Quite simply, the code allocates a new Loop structure to hold the loop instance, writes the jump targets to it, and pushes it onto the stack. ParseStatement () is then called, as usual, but with the added benefit of the loop stack. When the function returns, you read more..
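    The reason a stack is needed is that loops nest, and break or continue must always refer to the innermost one. The following C sketch shows the idea with a fixed-size array in place of the general-purpose Stack structure. The field names iStartTargetIndex and iEndTargetIndex follow the Loop structure used above, while MAX_LOOP_DEPTH, PushLoop, PopLoop, GetBreakTarget, and GetContinueTarget are hypothetical names introduced for illustration.

```c
#define MAX_LOOP_DEPTH 32            /* hypothetical nesting limit */

typedef struct {
    int iStartTargetIndex;           /* jump target at the top of the loop */
    int iEndTargetIndex;             /* jump target just past the loop     */
} Loop;

static Loop g_Loops[MAX_LOOP_DEPTH];
static int  g_iLoopDepth = 0;

/* Record a loop being entered; returns 0 if nesting is too deep. */
int PushLoop(int iStart, int iEnd) {
    if (g_iLoopDepth >= MAX_LOOP_DEPTH) return 0;
    g_Loops[g_iLoopDepth].iStartTargetIndex = iStart;
    g_Loops[g_iLoopDepth].iEndTargetIndex = iEnd;
    ++g_iLoopDepth;
    return 1;
}

/* Forget the innermost loop once its body has been parsed. */
void PopLoop(void) {
    if (g_iLoopDepth > 0) --g_iLoopDepth;
}

/* Target for break: the end of the innermost loop, or -1 outside a loop. */
int GetBreakTarget(void) {
    return g_iLoopDepth > 0 ? g_Loops[g_iLoopDepth - 1].iEndTargetIndex : -1;
}

/* Target for continue: the top of the innermost loop, or -1 outside a loop. */
int GetContinueTarget(void) {
    return g_iLoopDepth > 0 ? g_Loops[g_iLoopDepth - 1].iStartTargetIndex : -1;
}
```

    With an outer loop using targets 0 and 1 and an inner loop using 4 and 5, a break parsed inside the inner body jumps to 5. Once the inner loop is popped, the same statement jumps to 1.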

  • Page - 1130

    1089 // Unconditionally jump to the end of the loop int iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_JMP ); AddJumpTargetICodeOp ( g_iCurrScope, iInstrIndex, iTargetIndex ); } PARSING ADVANCED STATEMENTS AND CONSTRUCTS Figure 15.37 The syntax diagram for break. You first ensure that the statement hasn’t occurred outside of a function, which is of course illegal. The source code read more..

  • Page - 1131

    1090 Pop _T0 JE _T0, 0, _L1 ; while ( U > V ) _L4: Push U Push V Pop _T1 Pop _T0 Pop _T0 JE _T0, 0, _L5 ; break; Jmp _L5 Jmp _L4 _L5: ; break; Jmp read more..

  • Page - 1132

    1091 // Annotate the line AddICodeSourceLine ( g_iCurrScope, GetCurrSourceLine () ); // Attempt to read the semicolon ReadToken ( TOKEN_TYPE_DELIM_SEMICOLON ); // Get the jump target index for the start of the loop int iTargetIndex = ( ( Loop * ) Peek ( & g_LoopStack ) )->iStartTargetIndex; // Unconditionally jump to the start of the loop int iInstrIndex = AddICodeInstr ( read more..

  • Page - 1133

    1092 for Loops Although for loops were mentioned in Chapter 7’s XtremeScript language specification, I won’t be implementing them here. Rather, they’re left as a roughly intermediate-level challenge to you. I did this for a number of reasons. First of all, the for loop is really just a different way to package the while loop. For example, the following for loop: for ( X read more..

  • Page - 1134

    1093 to true, the flow of execution “falls into” the true block, which resides just under the expression, and skips the false block when it reaches the end. Otherwise, the true block is skipped and the false block is executed. When the false block terminates, execution continues sequentially, because the rest of the code lies directly below it. The false block is of read more..

  • Page - 1135

    1094 Parsing if Blocks Now that you understand the form the emitted code should take, you can put together a parser rather easily. It’s primarily a matter of emitting the code blocks in the right order and keeping track of the jump targets. Figure 15.41 presents the syntax diagram for if blocks. Here’s the addition you make to ParseStatement () (reflected in Figure read more..

  • Page - 1136

    1095 PARSING ADVANCED STATEMENTS AND CONSTRUCTS Figure 15.41 The Statement syntax diagram, updated to include if blocks. Figure 15.42 The syntax diagram for if blocks. read more..

  • Page - 1137

    1096 // Create a jump target to mark the beginning of the false block int iFalseJumpTargetIndex = GetNextJumpTargetIndex (); // Read the opening parenthesis ReadToken ( TOKEN_TYPE_DELIM_OPEN_PAREN ); // Parse the expression and leave the result on the stack ParseExpr (); // Read the closing parenthesis ReadToken ( TOKEN_TYPE_DELIM_CLOSE_PAREN ); The next step is creating the jump target read more..

  • Page - 1138

    1097 { // If it's found, append the true block with an // unconditional jump past the false block int iSkipFalseJumpTargetIndex = GetNextJumpTargetIndex (); iInstrIndex = AddICodeInstr ( g_iCurrScope, INSTR_JMP ); AddJumpTargetICodeOp ( g_iCurrScope, iInstrIndex, iSkipFalseJumpTargetIndex ); // Place the false target just before the false block AddICodeJumpTarget ( g_iCurrScope, iFalseJumpTargetIndex ); read more..

  • Page - 1139

    1098 Here, the token following the true block (which is a single line in this example) is Exp, which begins with E. Even though this doesn’t necessarily have anything to do with the if block that preceded it, the parser will interpret it as the start of an else clause going by the look-ahead alone. It will then attempt to parse the block, resulting in confusing read more..

  • Page - 1140

    1099 ; X = Y; Push Y Pop _T0 Mov X, _T0 _L1: X is pushed onto the stack and popped into _T0. _T0 is then compared to zero, and if it’s equal, a jump is made to _L0, which marks the top of the false block. Otherwise, the execution falls into the true block and executes sequentially until its last line, when an read more..

  • Page - 1141

    1100 15. PARSING AND SEMANTIC ANALYSIS Figure 15.43 A syntax diagram for the entire XtremeScript language. read more..

  • Page - 1142

    1101 /* Hello, world! */ host PrintString (); func _Main () { PrintString ( "Hello, world!" ); } Surreal, huh? Remember of course that because you’re running this on the XVM console, you need to import the PrintString () function. By saving it as hello.xss and passing it through the compiler like so: XSC hello.xss -A you can create both an .XSE and the .XASM file from read more..

  • Page - 1143

    1102 And of course, by running it in the XVM console, you’ll get the following: Hello, world! Drawing Rectangles I personally find coding for the XVM console to be a fun little exercise; it reminds me of the text-mode demo programs you find in the older books on languages such as Pascal and C. In addition to Hello, world!, however, I remember a lot of the read more..

  • Page - 1144

    1103 while ( Y < g_YSize ) { // X-loop X = 0; while ( X < g_XSize ) { // Draw the next asterisk PrintString ( "*" ); // Move to the next column X += 1; } // Move to the next row PrintNewline (); Y += 1; } } After drawing each row of XSize asterisks, a call is made to PrintNewline () to move to the next line. X is incremented at each iteration of the read more..

  • Page - 1145

    1104 Var g_XSize Var g_YSize ; ---- Functions ----------------------------------------- ; ---- Main ---------------------------------------------- Func _Main { Var X Var Y ; g_XSize = 32; Push 32 Pop _T0 Mov g_XSize, _T0 ; g_YSize = 16; Push 16 Pop _T0 Mov g_YSize, _T0 ; Y read more..

  • Page - 1146

    1105 ; X = 0; Push 0 Pop _T0 Mov X, _T0 ; while ( X < g_XSize ) _L4: Push X Push g_XSize Pop _T1 Pop _T0 JL _T0, _T1, _L6 Push 0 Jmp _L7 _L6: Push 1 _L7: Pop read more..

  • Page - 1147

    1106 Lastly, by running rectangle.xse in the XVM, you get this, a 32x16 rectangle of asterisks: ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** ******************************** read more..

  • Page - 1148

    1107 when the compiler in question performs at least basic optimizations. Given the complexity of writing an optimizing compiler, as well as the fact that XSC was only one of many components described in this book, you’ll have to settle for a compiler whose sole goal is simply to work properly. Fortunately, XSC does that. For an idea of what the demo will look like read more..

  • Page - 1149

    1108 To put it simply, this chapter’s implementation of the demo will consist of two major parts. The first is of course the host application, whose job is to perform low-level tasks like loading graphics and managing the program’s main loop, as well as to expose a host API. The second is the script, which will focus on the actual functionality and logic of the read more..

  • Page - 1150

    1109 Even without structures, this problem could be solved fairly easily with the help of two-dimensional arrays. For example, you could allocate storage for the on-screen aliens with something like this: var Aliens [ MAX_ALIEN_COUNT ][ 5 ]; Each element of this array is actually five elements, which allows you to store X in element 0, Y in element 1, XVel in element 2, read more..
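    When only one-dimensional arrays are available, the same five-fields-per-alien layout can be emulated by computing a flat index. The C sketch below illustrates that arithmetic. It assumes the flat layout stores each alien's fields contiguously. Only the X, Y, and XVel offsets come from the text above, the remaining two fields are deliberately left unnamed, and the identifiers ALIEN_FIELD_* and AlienFieldIndex are hypothetical.

```c
/* Offsets of the fields named in the text; two more fields exist
   per alien but are not named here. */
enum {
    ALIEN_FIELD_X     = 0,
    ALIEN_FIELD_Y     = 1,
    ALIEN_FIELD_XVEL  = 2,
    ALIEN_FIELD_COUNT = 5            /* elements stored per alien */
};

/* Map (alien, field) to an index into a one-dimensional array of
   MAX_ALIEN_COUNT * ALIEN_FIELD_COUNT elements. */
int AlienFieldIndex(int iAlien, int iField) {
    return iAlien * ALIEN_FIELD_COUNT + iField;
}
```

    Under this layout, alien 3's XVel lives at index 3 * 5 + 2, or 17, and walking the fields of one alien is just a matter of incrementing the index.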

  • Page - 1151

    1110 Specifically, the host application will need to do the following: ■ Define the host API’s functions. ■ Initialize the XVM, register the host API, and shut everything down when the program ends. ■ Load the necessary graphics. ■ Load the script, and call it on a regular basis within the main loop. The Host API The host API’s primary functions are graphical, but read more..

  • Page - 1152

    1111 Defining a Host API Function As you learned in Chapter 11, a host API function is a typical C function that follows a specific prototype: void FuncName ( int iThreadIndex ) This signature allows the XVM to pass the function the index of the thread that called it, which is used within the function for various tasks such as reading parameters and returning values. read more..

  • Page - 1153

    1112 After calling W_BlitImage (), the function uses XS_Return () to return nothing and clean up its three parameters. BlitBG () BlitBG () is just a simple function that accepts no parameters and returns no values. Its sole concern is blitting the background image to the screen with a call to W_BlitImage (): void HAPI_BlitBG ( int iThreadIndex ) { // Blit the background image read more..

  • Page - 1154

    1113 tion within the script will be performed with a call to GetRandomNumber (), which returns a random number between iMin and iMax: void HAPI_GetRandomNumber ( int iThreadIndex ) { // Read in parameters int iMin = XS_GetParamAsInt ( iThreadIndex, 1 ); int iMax = XS_GetParamAsInt ( iThreadIndex, 0 ); // Return a random number between iMin and iMax XS_ReturnInt ( iThreadIndex, 2, read more..
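    The range calculation itself is ordinary C. The sketch below is one common way to do it and is not necessarily the exact expression the demo uses: RandomInRange is a hypothetical helper, it assumes both bounds are inclusive, and it relies on the standard library's rand (), whose modulo reduction is slightly biased but adequate for a demo like this.

```c
#include <stdlib.h>

/* Return a pseudo-random integer in [iMin, iMax], bounds inclusive
   (an assumption). Swaps the bounds if they arrive reversed. The span
   must fit in an int; no overflow check is made. */
int RandomInRange(int iMin, int iMax) {
    if (iMax < iMin) {
        int iTemp = iMin;
        iMin = iMax;
        iMax = iTemp;
    }
    int iSpan = iMax - iMin + 1;     /* number of possible values */
    return rand() % iSpan + iMin;
}
```

    Note that the host function above reads iMin from parameter index 1 and iMax from index 0, so the last parameter pushed by the script is the first one read.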

  • Page - 1155

    1114 The parameter it reads with XS_GetParamAsInt () is an index corresponding to a specific timer. This index is then used in a switch block to read the timer’s state. The value is returned with XS_ReturnInt (). g_AnimSpeed and g_MoveSpeed are both handles to internal Wrappuh API timers, so check out the source on the companion CD if you want to learn more. Initialization read more..

  • Page - 1156

    1115 the details of how the demo deals with this, but there’s not much worth explaining here. Suffice it to say, the host application loads the required graphics and makes them globally available to the rest of the program. Handling the Script Lastly, there’s the issue of the script itself. The script is initially loaded with a call to XS_LoadScript (), which loads the read more..

  • Page - 1157

    1116 // Start the main loop MainLoop { // Start the current loop iteration HandleLoop { // Let XtremeScript handle the frame XS_CallScriptFunc ( iThreadIndex, "HandleFrame" ); // Check for the escape key and exit if it's down if ( W_GetKeyState ( W_KEY_ESC ) ) W_Exit (); } } XS_CallScriptFunc () is used in both cases instead of XS_InvokeScriptFunc (), because you want these read more..

  • Page - 1158

    1117 Here are the constants the script will use, in the form of their global declarations: Var ALIEN_COUNT ; Number of aliens on-screen Var MIN_VEL ; Minimum velocity Var MAX_VEL ; Maximum velocity Var ALIEN_WIDTH ; Width of the alien sprite Var ALIEN_HEIGHT ; Height of the alien sprite Var read more..

  • Page - 1159

    1118 Aliens [] array and updating each sprite’s pseudo-structure. It also resets CurrAnimFrame to zero. Remember, XtremeScript variables are not initialized and therefore contain unpredictable garbage values until they’re explicitly defined. The process of initializing the Aliens [] array is simple but may not appear immediately straightforward. Because you’re working with a read more..

  • Page - 1160

    1119 Mov HALF_ALIEN_WIDTH, ALIEN_WIDTH Div HALF_ALIEN_WIDTH, 2 Mov HALF_ALIEN_HEIGHT, ALIEN_HEIGHT Div HALF_ALIEN_HEIGHT, 2 Mov ALIEN_FRAME_COUNT, 32 Mov ALIEN_MAX_FRAME, ALIEN_FRAME_COUNT Dec ALIEN_MAX_FRAME Mov ANIM_TIMER_INDEX, 0 Mov read more..

  • Page - 1161

    1120 Push Y CallHost GetRandomNumber Mov Y, _RetVal Mov Aliens [ CurrArrayIndex ], X Inc CurrArrayIndex Mov Aliens [ CurrArrayIndex ], Y Inc CurrArrayIndex The GetRandomNumber () function was specifically written to read its parameters in reverse order; that read more..

  • Page - 1162

    1121 CallHost GetRandomNumber Mov SpinDir, _RetVal Mov Aliens [ CurrArrayIndex ], SpinDir Inc CurrArrayIndex ; ---- Move to the next alien Inc CurrAlienIndex ; Keep looping until the last alien is reached JL CurrAlienIndex, ALIEN_COUNT, InitLoopStart After the last increment of read more..

  • Page - 1163

    1122 Notice that the second Mov instruction isn’t followed by an Inc. This is because when drawing the sprites, you don’t need to know their velocities. All you care about is their X, Y locations, which reside within the pseudo-structure at offsets 0 and 1, and the direction in which they’re spinning, which is found at offset 4. Because of this, offsets 2 and 3 are read more..

  • Page - 1164

    1123 ; Move to the next alien Inc CurrAlienIndex ; Keep looping until the last alien is reached JL CurrAlienIndex, ALIEN_COUNT, DrawLoopStart Once the frame drawing process is complete, you can call the host API function BlitFrame () to blit the final frame to the screen: ; ---- Blit the completed frame to the screen CallHost read more..

  • Page - 1165

    1124 zero once it reaches ALIEN_MAX_FRAME, so after each increment you compare the new frame to the maximum. If it’s less, a jump is made to SkipClipFrame, which prevents the frame index from wrapping around. Otherwise, you set it to zero. The last major task is moving the sprites along their paths, which is done in sync with the movement timer. Therefore, this code read more..
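The increment-compare-reset pattern described here is easy to express in C. This is a minimal sketch that follows the description literally (increment, keep the result if it's still below the maximum, otherwise wrap to zero); NextAnimFrame () is a hypothetical helper name, not something from the demo's source.

```c
#include <assert.h>

/* Advances an animation frame index by one.
   If the incremented index is still below iMaxFrame it is kept;
   otherwise the index wraps back around to zero. */
int NextAnimFrame ( int iCurrFrame, int iMaxFrame )
{
    assert ( iMaxFrame > 0 );

    int iNextFrame = iCurrFrame + 1;
    if ( iNextFrame < iMaxFrame )
        return iNextFrame;      /* The "skip the reset" path */

    return 0;                   /* Wrap around */
}
```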

  • Page - 1166

    1125 Strangely, the first instruction in this block of code pushes CurrArrayIndex onto the stack. You’ll see why this is done shortly. For now, the real purpose of this code is setting the X, Y, XVel, and YVel locals with the appropriate values. XVel and YVel are then added to X and Y, respectively, which moves the sprite along its path. Now that you’ve moved the read more..

  • Page - 1167

    1126 Now that you have the updated sprite locations and velocities calculated in the local variables, you need to store them in the Aliens [] array so they’ll be available for the next frame. However, after all of the array reading you’ve done, CurrArrayIndex has been incremented beyond the base index of the alien. Because you need to write back to the X, Y, XVel, read more..
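For comparison, here's how the same five-element pseudo-structure might be handled in C, where each field's index can be computed from a fixed base instead of a running counter. This is only a sketch under the layout described in this section (X and Y at offsets 0 and 1, the velocities at 2 and 3, the spin direction at 4); the enum names and the AlienBase (), GetAlienField (), and MoveAlien () helpers are hypothetical.

```c
#include <assert.h>

/* Field offsets inside one alien's five-element slice of the flat array. */
enum
{
    ALIEN_X      = 0,
    ALIEN_Y      = 1,
    ALIEN_XVEL   = 2,
    ALIEN_YVEL   = 3,
    ALIEN_SPIN   = 4,
    ALIEN_STRIDE = 5    /* Elements per alien */
};

/* Returns the index of the first element belonging to the given alien. */
int AlienBase ( int iAlienIndex )
{
    assert ( iAlienIndex >= 0 );
    return iAlienIndex * ALIEN_STRIDE;
}

/* Reads a single field of a single alien. */
int GetAlienField ( const int * piAliens, int iAlienIndex, int iField )
{
    return piAliens [ AlienBase ( iAlienIndex ) + iField ];
}

/* Adds the alien's velocity to its location and writes the result straight
   back to the array; the base index never changes, so nothing needs saving. */
void MoveAlien ( int * piAliens, int iAlienIndex )
{
    int iBase = AlienBase ( iAlienIndex );
    piAliens [ iBase + ALIEN_X ] += piAliens [ iBase + ALIEN_XVEL ];
    piAliens [ iBase + ALIEN_Y ] += piAliens [ iBase + ALIEN_YVEL ];
}
```

The base-plus-offset form trades a multiplication per access for not having to remember where the running index started.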

  • Page - 1168

    1127 The High-Level XtremeScript Script XtremeScript is very similar to C in most respects, which means that writing the script you labored over in the last section will be a breeze this time through. Most of C’s familiar amenities, such as while loops, expressions, and so on, are readily at your disposal. You can capitalize on these features thoroughly to express the read more..

  • Page - 1169

    1128 HALF_ALIEN_WIDTH = ALIEN_WIDTH / 2; HALF_ALIEN_HEIGHT = ALIEN_HEIGHT / 2; ALIEN_FRAME_COUNT = 32; ALIEN_MAX_FRAME = ALIEN_FRAME_COUNT - 1; ANIM_TIMER_INDEX = 0; MOVE_TIMER_INDEX = 1; The first noteworthy difference between what’s going on here and what went on in the assembly version is the expressions used to define the constants. In assembly, the definition of HALF_ALIEN_WIDTH as read more..

  • Page - 1170

    1129 Aliens [ CurrArrayIndex + 2 ] = XVel; Aliens [ CurrArrayIndex + 3 ] = YVel; Aliens [ CurrArrayIndex + 4 ] = SpinDir; // Move to the next alien CurrAlienIndex += 1; CurrArrayIndex += 5; } Although the assembly version of the loop is using Inc and JL instructions to regulate iterations, while allows you to do everything with a single conditional expression. Furthermore, read more..

  • Page - 1171

    1130 else FinalAnimFrame = CurrAnimFrame; // Blit the sprite BlitSprite ( FinalAnimFrame, X, Y ); // Move to the next alien CurrAlienIndex += 1; CurrArrayIndex += 5; } // Blit the completed frame to the screen BlitFrame (); Again, you can’t help but appreciate the huge gains in clarity and brevity that are attributed to high-level code. In only a few lines, you’re expressing read more..

  • Page - 1172

    1131 CurrAlienIndex = 0; CurrArrayIndex = 0; while ( CurrAlienIndex < ALIEN_COUNT ) { // Get the X, Y location X = Aliens [ CurrArrayIndex ]; Y = Aliens [ CurrArrayIndex + 1 ]; // Get the X, Y velocities XVel = Aliens [ CurrArrayIndex + 2 ]; YVel = Aliens [ CurrArrayIndex + 3 ]; // Increment the paths of the aliens X += XVel; Y += YVel; Aliens [ CurrArrayIndex ] = read more..

  • Page - 1173

    1132 The Results Unfortunately, XtremeScript’s impressive usability comes at a significant price. The simple fact of the matter is that in the absence of any form of code optimization on behalf of the compiler, the high-level equivalent to a hand-coded assembly script will be hugely inefficient and run at a fraction of the speed. You’ve seen the evidence for this read more..

  • Page - 1174

    1133 Fortunately, it won’t take a particularly massive amount of brainpower to determine at least basic optimizations. Any ad hoc optimization you can notice will help, so give it a shot! To get you started, here are a few general tips to keep in mind: ■ The stack is utilized to an almost criminal degree when parsing an expression, which is the primary reason that read more..

  • Page - 1175

    1134 games can also benefit from compiled scripts in the same way. Such games often idle for long periods of time, waiting for the player to react, and also involve lots of complex logic. XtremeScript would once again provide a perfectly adequate solution in these cases. SUMMARY This is it! After all the buildup and anticipation, you’ve finally created a real, fully read more..

  • Page - 1176

    1135 ■ 15_04/ is the final and complete parser module, which subsequently completes the compiler. It adds the full range of XtremeScript statements: assignments, loops, branching, and so on. ■ XVM Console/ is a standalone version of the XVM that exposes a simple console output API, used for testing scripts as XSC compiles them. This is also where you’ll find the source read more..

  • Page - 1178

    Part Seven Completing Your Training read more..

  • Page - 1180

    Applying the System to a Full Game “I told many, many people.” ——Jeremy Goodwin, Sports Night CHAPTER 16 read more..

  • Page - 1181

    1140 XtremeScript is now a finished, ready-to-use scripting system. From start to finish, you’ve seen how every aspect of each of its three major components (the assembler, virtual machine, and compiler) is assembled. All that’s left is applying your work to an actual game, to get a feel for how scripting really works. The process of doing so is the focus of this read more..

  • Page - 1182

    1141 scattered keys and use them to activate some underlying machinery that allows you to escape. Your character is a levitating droid-type thing designed somewhat after the probe droids sent by the Empire to Hoth in The Empire Strikes Back. You float around the fortress, picking up keys, and battling your way to freedom. Along the way, other, different colored droids use read more..

  • Page - 1183

    1142 Initial Planning and Setup Lockdown is a simple game, so there wasn’t a whole lot that needed to be sorted out beforehand. I had an idea in my head and knew what it took to make it happen. However, it doesn’t take much for an attitude like that to degenerate into full-on cockiness, so I decided to avoid the unfortunate fate that awaits all unprepared game read more..

  • Page - 1184

    1143 Lockdown takes place in a prison-like fortress inhabited by floating droids. There are three types of these droids, each of which attacks the player in a different way. The player is also a droid, and is equipped with a built-in laser cannon that can be used to ward off the attackers. In addition to destroying the evil droids, the player’s goal is to collect four read more..

  • Page - 1185

    1144 games suffer from the problem of enemies and other hazards “rushing in” from the side of the screen, because the player’s view restricts him or her from seeing enough of what’s ahead. By the time the player is able to react, these obstacles have already done their damage. By limiting the immediate action to a single screen, the player is always aware of the read more..

  • Page - 1186

    1145 key. Each time a key is used to activate a panel, it lights up with the color of the key. The player wins when all four panels are illuminated. I should also mention that as an extra atmospheric effect, I decided to make the light in each room flicker at random, resulting in a subtle but effective visual cue in the style of games like Resident Evil. The Enemy read more..

  • Page - 1187

    1146 droid design; I ended up making a number of changes in the final model, but this was a reasonably close approximation. The aesthetic differences from one droid to another are actually quite simple. In another decision made by deadlines, I decided not to waste the time designing three genuinely unique droid types, and to instead just vary the color. The blue droid read more..

  • Page - 1188

    1147 keyboard, allowing the users to move him around and fire his laser at will. This section will cover the major aspects of controlling the player droid, but the majority of what I’ll discuss here applies to the enemies as well. I’ll explain this relationship in more detail as the chapter progresses. Movement and Firing The two primary actions of the player are moving read more..

  • Page - 1189

    1148 Damage and Destruction Naturally, a big part of the game is taking damage and occasionally being destroyed. Because of this, each droid in the game maintains an “energy level” that determines how close it is to destruction. The maximum amount of energy allowed is eight points. Furthermore, because the game doesn’t feature power-ups of any kind, I decided to constantly read more..

  • Page - 1190

    1149 The Overall Package Lastly, it was important to sketch out what the average game screen would look like, especially with the interface superimposed over it. The end result is what I call “the overall package,” and attempts to prototype what the game will actually look like when running. Figure 16.7 is a sketch of the overall package I was going for. Note the read more..

  • Page - 1191

    1150 Phase Two: Asset Requirement Assessment So you know what everything needs to look like, more or less. The reality of graphics, however, is that even simple objects are often reduced to countless individual bitmaps, all of which must be stored and managed somehow. Ultimately, the assets of Lockdown were reduced to three major groups: graphics, sound, and scripts. read more..

  • Page - 1192

    1151 Graphics The graphics of the game are stored in the Gfx/ directory, so feel free to check out the individual .BMP files as you read (as stated in the “On the CD” section at the end of the chapter, you can find the finished, ready-to-play Lockdown game in Programs/Chapter 16/Lockdown/Executable/). The Fortress The first and most important graphical step was creating read more..

  • Page - 1193

    1152 Figure 16.9 The droid model, rendered in 3ds max. Figure 16.8 A typical room background. read more..

  • Page - 1194

    1153 The Keys The keys were also rendered in max, and were composed of very basic geometry. Again, however, the aid of a 3D modeler allowed me to convert my simple mesh into a complete animation quite easily. Figure 16.11 is a rendering of a Lockdown key. The Explosions Explosions are always tricky when making a game. They usually require too much fluid detail and read more..

  • Page - 1195

    1154 demonstrating the scripting engine, I figured it’d be worth throwing in a few effects to make things feel more complete. Lockdown’s sound effects can be found in the Sound/ directory. Effects The sound effects are your typical fare: lasers, explosions, and so on and so forth. They originally came from the General 6000 sound collection by Sound Ideas read more..

  • Page - 1196

    1155 Scripts The last of the game’s major assets are the scripts. Deciding what to script was a somewhat tricky issue, as the final decision can easily lie anywhere on the spectrum between too much and too little. For the sake of simplicity, however, I decided to choose a small and focused domain for the scripts to handle exclusively, rather than pummel the engine read more..

  • Page - 1197

    1156 Lockdown’s State Machine One of the benefits of the state machine approach to game design is that it allows the entire lifespan of the game, from beginning to end, to be planned out with a single state diagram. This is a great way to quickly and easily get a handle on exactly how things relate to each other before writing any actual code, and was my read more..

  • Page - 1198

    1157 SCRIPTING STRATEGY Because this isn’t a book about general game programming, the actual development of Lockdown’s engine isn’t particularly relevant (or even really all that interesting; it’s not exactly a Halo-killer). Assuming the engine works, which it does, all you really care about now is using XtremeScript to control the droids. Figure 16.13 The read more..

  • Page - 1199

    1158 The scripting strategy is simple; you want to run a single script in the background that controls the environment’s ambient effects (in other words, makes the room lights flicker), as well as run any of three droid-controlling behavior scripts. These scripts need to be loaded up front, and, during the execution of the game, run for as long as they’re needed. The read more..

  • Page - 1200

    1159 lines anyway. Besides, they should all be self-explanatory to begin with; anyone with even a basic understanding of 2D game programming should feel right at home. Miscellaneous Functions int GetRandInRange ( int Min, int Max ) This function returns a random integer value between Min and Max, inclusive. void ToggleRoomLights () Calling this function will toggle the lights in the read more..
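An inclusive random range like the one GetRandInRange () promises is usually a one-liner in C. The sketch below is one common way to write it with the standard library's rand (), not necessarily how Lockdown does it; RandInRange () is a hypothetical name, and the argument swap is an extra safety net added for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a pseudo-random integer between iMin and iMax, inclusive.
   Assumes the range is small enough that iMax - iMin + 1 doesn't overflow. */
int RandInRange ( int iMin, int iMax )
{
    /* Tolerate arguments passed in either order */
    if ( iMin > iMax )
    {
        int iTemp = iMin;
        iMin = iMax;
        iMax = iTemp;
    }

    int iSpan = iMax - iMin + 1;    /* Number of possible values */
    assert ( iSpan > 0 );

    /* rand () % iSpan yields 0 .. iSpan - 1; adding iMin shifts it into range */
    return iMin + rand () % iSpan;
}
```

The modulo trick is slightly biased when the span doesn't divide evenly into rand ()'s output range, which is rarely a concern for game logic like this.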

  • Page - 1201

    1160 Registering the Functions The Lockdown host API is registered with the XVM in the game’s Init () function, right after the call to XS_Init (). As you can see, each of the functions is global, because there’s really no practical reason to fence certain functions off to certain scripts: XS_RegisterHostAPIFunc ( XS_GLOBAL_FUNC, "GetRandInRange", HAPI_GetRandInRange ); read more..

  • Page - 1202

    1161 XS_RegisterHostAPIFunc ( XS_GLOBAL_FUNC, "GetPlayerDroidX", HAPI_GetPlayerDroidX ); XS_RegisterHostAPIFunc ( XS_GLOBAL_FUNC, "GetPlayerDroidY", HAPI_GetPlayerDroidY ); Writing the Scripts Writing the scripts is the fun part, and isn’t particularly difficult. You have four scripts to write in total: the ambience script, which runs constantly, and three droid behavior scripts that run read more..

  • Page - 1203

    1162 All the script needs is a _Main () function that starts a simple loop. This loop runs infinitely, allowing it to continually execute the game’s main loop. At each iteration, the host API function GetRandInRange () is called to get a random number between 0 and 50. If this number is 1, the lights toggle. When this is executed at runtime, the frequency of 1’s read more..
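To see how rarely the lights actually toggle, it helps to isolate one iteration of that loop. In this C sketch the random roll is passed in as a parameter so the logic can be checked without any randomness; FlickerStep () and CountTogglingRolls () are hypothetical names. Assuming the roll is uniform over the 51 values from 0 to 50, exactly one of them triggers a toggle, which works out to roughly a 2 percent chance per iteration.

```c
#include <assert.h>

/* One iteration of the ambience loop.
   iLightsOn is the current light state (0 or 1) and iRoll is the random
   number for this iteration. Returns the new light state. */
int FlickerStep ( int iLightsOn, int iRoll )
{
    assert ( iRoll >= 0 && iRoll <= 50 );

    if ( iRoll == 1 )
        return ! iLightsOn;     /* Toggle */

    return iLightsOn;           /* Leave the lights alone */
}

/* Counts how many of the possible rolls cause a toggle. */
int CountTogglingRolls ( void )
{
    int iCount = 0;
    for ( int iRoll = 0; iRoll <= 50; ++ iRoll )
        if ( FlickerStep ( 1, iRoll ) != 1 )
            ++ iCount;
    return iCount;
}
```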

  • Page - 1204

    1163 // Calculate a new direction, distance and speed var Dir; var Dist; var Speed; Dir = GetRandInRange ( 0, 7 ); Dist = GetRandInRange ( 3, 20 ); Speed = GetRandInRange ( 5, 12 ); // Move the droid along the path while ( Dist > 0 ) { MoveEnemyDroid ( CurrDroid, Dir, Speed ); Dist -= 1; } } // Move to the next droid CurrDroid += 1; if ( CurrDroid > 7 ) CurrDroid read more..

  • Page - 1205

    1164 // ---- Host API Imports ------------- host GetRandInRange (); host MoveEnemyDroid (); host GetEnemyDroidX (); host GetEnemyDroidY (); host GetEnemyDroidDir (); host IsEnemyDroidAlive (); host FireEnemyDroidGun (); host GetPlayerDroidX (); host GetPlayerDroidY (); host GetPlayerDroidDir (); // ---- Constants ------------ // Directions var NORTH; var SOUTH; var EAST; var WEST; // ---- Main read more..

  • Page - 1206

    1165 // Enter the main loop while ( true ) { // If the current droid is alive, handle its behavior if ( IsEnemyDroidAlive ( CurrDroid ) ) { // The current direction, distance and speed // of the droid's movement var Dir; var Dist; var Speed; // The droid's X, Y location var EnemyDroidX; var EnemyDroidY; // The player's X, Y location var PlayerDroidX; var PlayerDroidY; // read more..

  • Page - 1207

    1166 // Use these locations to face the // droid in the proper direction when shooting if ( EnemyDroidX < PlayerDroidX ) { Dir = EAST; MoveEnemyDroid ( CurrDroid, Dir, 0 ); } else if ( EnemyDroidY < PlayerDroidY ) { Dir = SOUTH; MoveEnemyDroid ( CurrDroid, Dir, 0 ); } else if ( EnemyDroidX > PlayerDroidX ) { Dir = WEST; MoveEnemyDroid ( CurrDroid, Dir, 0 ); } else if ( read more..

  • Page - 1208

    1167 For the most part, this script mirrors the functionality of blue_droid.xss. The major difference is that now, as the droid moves, it randomly fires at the player. Once again, you use GetRandInRange () to give the droid a 1 in N chance to fire at each step. Instead of simply firing the weapon, however, the enemy’s and player’s locations are used to determine read more..

  • Page - 1209

    1168 var EAST; var WEST; // ---- Functions -------------- /************************************* * * GetPlayerFaceDir () * * Returns the direction in which an enemy droid * should face in order to face the player. */ func GetPlayerFaceDir ( CurrDroid ) { // The specified enemy's location, as well as the player's var EnemyDroidX; var EnemyDroidY; var PlayerDroidX; var read more..

  • Page - 1210

    1169 // ---- Main ---------------------------------------------------------------- func _Main () { // Initialize our "constants" to values that correspond // with Lockdown's internal direction constants NORTH = 0; EAST = 2; SOUTH = 4; WEST = 6; // Droid index counter var CurrDroid; CurrDroid = 0; // Enter the main loop while ( true ) { // If the droid is active, move it if ( read more..

  • Page - 1211

    1170 // Increment the droid's positions MoveEnemyDroid ( CurrDroid, Dir, Speed ); Dist -= 1; } } // Move to the next droid CurrDroid += 1; if ( CurrDroid > 7 ) CurrDroid = 0; } } This final script runs the gamut of host API functions, importing them all. It also defines a function of its own, GetPlayerFaceDir (). Because the red droid needs to both move and fire read more..

  • Page - 1212

    1171 Within the _Main () function, things look more or less familiar. At each cycle through the loop, the next droid in the list is assigned a path to follow, except you’re now using GetPlayerFaceDir () to determine which direction to use. This is how the red droid manages to track the players as they move around the room. Within the movement loop, the frequency of read more..
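The direction-picking logic is simple enough to restate in C. This sketch mirrors the comparison order used by the droid scripts (east, then south, then west) and uses the same direction values the scripts assign to their "constants"; FaceDirToward () is a hypothetical name, and falling through to NORTH in the last case is an assumption on my part rather than something taken from the script.

```c
/* Direction values matching the scripts' NORTH, EAST, SOUTH, and WEST */
enum
{
    NORTH = 0,
    EAST  = 2,
    SOUTH = 4,
    WEST  = 6
};

/* Picks the direction an enemy should face in order to face the player.
   Screen coordinates are assumed, so a larger Y means further south. */
int FaceDirToward ( int iEnemyX, int iEnemyY, int iPlayerX, int iPlayerY )
{
    if ( iEnemyX < iPlayerX )
        return EAST;            /* Player is to the right */

    if ( iEnemyY < iPlayerY )
        return SOUTH;           /* Player is below */

    if ( iEnemyX > iPlayerX )
        return WEST;            /* Player is to the left */

    return NORTH;               /* Assumed fallback: player is above (or on top of us) */
}
```

Note that because the horizontal test comes first, a player who is both to the right and below is always treated as being to the east.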

  • Page - 1213

    1172 Within the main loop of the game, XS_RunScripts () is called once per frame. This handles any and all running scripts, but the real issue is when and how these scripts should be initially activated. In the case of the ambience script, you want it running at all times—regardless of what room the player is in. Because of this, the following line appears whenever the read more..

  • Page - 1214

    1173 In conjunction with the constant calling of XS_RunScripts () in the main loop, the logic discussed in this section regulates the activity of the loaded scripts. The ambient script runs at all times during the game play state, and three droid scripts are flipped on and off as the player navigates through the rooms of the fortress. Figure 16.16 shows the player in read more..

  • Page - 1215

    1174 Minimizing Expressions As stated, the inherent simplicity of while loops, if blocks, and other language constructs allows them to be translated to nearly optimal assembly language by nature. Because XSC converts them into little more than a few jumps and labels, the emitted code isn’t much different than what you might code by hand. Even functions and function calls are read more..

  • Page - 1216

    1175 own tick-counting system, but one that was based on the execution of instructions, rather than the passing of time. This modification was actually very simple. Remember, the XVM function GetCurrTime () was designed from the beginning as a “black box” that can be implemented with any timing mechanism without disrupting the virtual machine overall. All I had to do was read more..
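To make the idea of an instruction-based clock a bit more concrete, here's a minimal C sketch. It is not the actual change made to the XVM; CountInstr (), GetCurrTicks (), IsTimesliceOver (), and the INSTRS_PER_TICK value are all hypothetical, and simply show how a "black box" time function can be backed by an instruction counter instead of the system clock.

```c
#include <assert.h>

/* How many executed instructions make up one "tick" (arbitrary for this sketch) */
#define INSTRS_PER_TICK 100

/* Total instructions executed so far */
static unsigned long g_ulInstrCount = 0;

/* Hypothetical hook: call once for every instruction the VM executes. */
void CountInstr ( void )
{
    ++ g_ulInstrCount;
}

/* The "black box": callers just ask for the current time and never learn
   that it's measured in executed instructions rather than milliseconds. */
unsigned long GetCurrTicks ( void )
{
    return g_ulInstrCount / INSTRS_PER_TICK;
}

/* Returns 1 if a timeslice that began at ulStartTick and lasts ulDuration
   ticks has expired, 0 otherwise. */
int IsTimesliceOver ( unsigned long ulStartTick, unsigned long ulDuration )
{
    unsigned long ulNow = GetCurrTicks ();
    assert ( ulNow >= ulStartTick );
    return ulNow - ulStartTick >= ulDuration;
}
```

Since nothing outside GetCurrTicks () knows where the numbers come from, the rest of the scheduler can stay exactly as it was.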

  • Page - 1217

    1176 other words, the player can’t shoot while facing or moving diagonally). At any time, Escape can be pressed to exit the game and return to the title screen. Interacting with Objects There are three major objects the player interacts with throughout the game, aside from the enemy droids. These are the keys, the doors, and the key panels. All of these objects can be read more..

  • Page - 1218

    1177 in the key panel room. This allows me to initially activate the yellow and blue key panels. Also, because the key room is uninhabited, I use this stop as a chance to let my energy recharge without being disturbed. I then dive back into the fray, and head to the northeast corner where I pick up the red key. From there I move south until I hit the southwest read more..

  • Page - 1219

    1178 CHALLENGES ■ Intermediate: Change the scripts of one of the droids to include all three behavior types. For example, modify Red_Droid.xse so each of the on-screen droids behaves with one of the three existing attack methods, making them seem more random and lifelike. ■ Intermediate: Modify the behavior of the existing droids. For example, give the blue droids the capability read more..

  • Page - 1220

    Where to Go From Here “Now that you’ve found Robert Porter, take good care of him.” ——Prot, K-Pax CHAPTER 17 read more..

  • Page - 1221

    1180 Well, well, well. Look at you, Mr. Fancypants. You started with nothing, and after 16 chapters of theory, design explanations, implementation details, and more exposure to my pompous and self-serving sense of humor than anyone should have to endure, you have walked away with a feature-rich, high-level, custom-designed-and-implemented scripting system that’s ready to be dropped read more..

  • Page - 1222

    1181 ■ Use the XtremeScript system developed over the course of the book and included on the CD, with a 100 percent understanding of how it’s all working. You’re of course free, if not encouraged, to make changes wherever you see fit, or use it as-is, right out of the box. ■ Modify XtremeScript to work with a language of your own design, geared towards your own read more..

  • Page - 1223

    1182 To get you started, though, let’s discuss some places to immediately go from here. The following topics will be, more or less, listed in order of increasing complexity, so try and pursue them in order. More Advanced Parsing Methods Like I’ve mentioned numerous times, this book has focused on recursive descent parsing because it’s among the most natural and intuitive read more..

  • Page - 1224

    1183 Yes, “OOP” has been quite a hot buzzword lately and will probably remain so for a while. But like all buzzwords, the subject should be approached with great caution. Do you really need objects to make your language work the way you want it to? Is it necessary, or are you just doing it to impress your message board buddies? A strong argument can be made both read more..

  • Page - 1225

    1184 example, while the quality of the compiler’s generated code does indeed play a large role, the simple fact that scripts run in a virtual environment rather than directly on the native processor takes a significant toll as well. Here are some facts to keep in mind: ■ Many scripts in full-scale game projects aren’t particularly complex to begin with, like ambient read more..

  • Page - 1226

    1185 One thing to keep in mind, however, is that the JVM is designed to mimic a far lower level of processing than the XVM. In the context of game scripting, speed and relative simplicity are far more important than low-level control in most circumstances, so there are certain aspects of the system that you should recognize as inappropriate in the context of game read more..

  • Page - 1227

    1186 Operating System Theory Aside from familiarizing yourself with the details of alternative operating systems for the purpose of porting, an understanding of general operating system theory can be invaluable when designing or redesigning your scripting system’s runtime environment. After all, virtual machines are very closely related to operating systems, both in terms of architecture read more..

  • Page - 1228

    1187 coded scripts from going nuts and blowing everything up. What follows are some ideas to consider if you decide to build an assembler with lower-level access in mind. Random Access to the Stack This can be as easy as defining a built-in array, perhaps called _Stack [], wherein each element maps directly to its corresponding stack index. This would allow any part of read more..

  • Page - 1229

    1188 specialized memory architecture, limits some of the lower-level tasks and capabilities often associated with assembly language programming. Check out some of these ideas for developing a lower-level VM. Unified Memory Currently, the XVM enforces separate regions of memory for a script’s code and stack. Most hardware machines, as well as many virtual ones, take the opposite read more..

  • Page - 1230

    1189 resides in the host application’s memory, making individual characters inaccessible unless GetChar and SetChar are used. A lower-level approach would be to give each element in memory the capability to hold a single character, rather than an entire string, so that contiguous regions of memory would be used to store strings character-by-character. This approach gives read more..

  • Page - 1231

    1190 The Compiler and High-Level Language The XtremeScript compiler is undoubtedly powerful, and definitely well suited for the task of game scripting. Of course, there are countless ways to improve it and enhance its features, so let’s talk about a few of them. You may find that attempting to implement some of the following suggestions will help you advance in your read more..

  • Page - 1232

    1191
    ; Comparisons/jumps
    JE X, 0, _Case0
    JE X, 1, _Case1
    JE X, 2, _Case2
    Jmp _Default
    ; Case implementations
    _Case0:
    ; X equals 0
    Jmp _Break
    _Case1:
    ; X equals 1
    Jmp _Break
    _Case2:
    ; X equals 2
    Jmp _Break
    ; Default case
    _Default:
    ; X is none of the above
    ; read more..
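If the jump-table style of this listing is hard to picture, the same lowering can be written in C using goto, which maps almost one-to-one onto compare-and-jump instructions. This is just an illustrative sketch; ClassifyWithJumps (), ClassifyWithSwitch (), and the result values are made up so the two forms can be compared side by side.

```c
/* A switch written the way a simple compiler might emit it:
   a run of comparisons and jumps, followed by the case bodies. */
int ClassifyWithJumps ( int iX )
{
    int iResult;

    /* Comparisons/jumps */
    if ( iX == 0 ) goto Case0;
    if ( iX == 1 ) goto Case1;
    if ( iX == 2 ) goto Case2;
    goto Default;

    /* Case implementations */
Case0:
    iResult = 100;
    goto Break;
Case1:
    iResult = 101;
    goto Break;
Case2:
    iResult = 102;
    goto Break;

    /* Default case */
Default:
    iResult = -1;

    /* Everything ends up here */
Break:
    return iResult;
}

/* The same logic written as an ordinary switch, for comparison. */
int ClassifyWithSwitch ( int iX )
{
    switch ( iX )
    {
        case 0:  return 100;
        case 1:  return 101;
        case 2:  return 102;
        default: return -1;
    }
}
```

Each case body ends with a jump to the shared exit label, which is exactly the job a break statement does in the high-level version.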

  • Page - 1233

    1192 struct MyStruct { var X; var Y; var Z [ 16 ]; } This structure can really be seen as an 18-element array, wherein X and Y are elements 0 and 1, and Z [ 0 ] through Z [ 15 ] are elements 2 through 17. Figure 17.5 presents an example of a structure and its representation on the stack. The only syntactic difference is that instead of using array index read more..
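The mapping from structure members to array elements is just a table of offsets. With 18 elements in total and X and Y occupying elements 0 and 1, Z [ 0 ] has to land at element 2 and Z [ 15 ] at element 17. A small C sketch of that arithmetic follows; the enum names and MyStructZIndex () are hypothetical.

```c
#include <assert.h>

/* Offsets of each member within the flattened 18-element array */
enum
{
    MYSTRUCT_X      = 0,
    MYSTRUCT_Y      = 1,
    MYSTRUCT_Z      = 2,    /* Offset of Z [ 0 ] */
    MYSTRUCT_Z_SIZE = 16,   /* Number of elements in Z */
    MYSTRUCT_SIZE   = 18    /* Total elements in the structure */
};

/* Converts an index into Z to an index into the flattened array. */
int MyStructZIndex ( int iIndex )
{
    assert ( iIndex >= 0 && iIndex < MYSTRUCT_Z_SIZE );
    return MYSTRUCT_Z + iIndex;
}
```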

  • Page - 1234

    1193 Implementing structures up to this point is rather easy, because it really is just a reworked version of the already existing array feature. The real issues arise when you allow structures to contain references to other structures, and arrays of structures to be declared. Imagine the following scenario: struct StructX { var Elmnt0; var Elmnt1; var Elmnt2; } struct StructY { read more..

  • Page - 1235

    1194 Pointers and References Currently, the only method of indirection supported by XtremeScript is the use of variables and arrays to reference literal values. Pointers and references, however, add an additional level of indirection wherein variables can point to other variables. As an example, consider the following pointer syntax for XtremeScript: var MyVar; read more..

  • Page - 1236

    1195 I took the mnemonic from the 80x86’s LEA instruction, an acronym that stands for Load Effective Address. This instruction is used to determine the address of the specified identifier, and is more or less analogous to what you’re doing here. Once an XASM variable has been assigned the stack index of another, you need a way to tell instructions like Mov and Add read more..
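The "pointer is just a stack index" idea can be modeled in a few lines of C. In this sketch the stack is a plain integer array and a pointer is nothing more than an index stored in one of its elements; g_iStack, AddrOf (), Deref (), and StoreThrough () are hypothetical names for illustration, not part of XASM or the XVM.

```c
#include <assert.h>

#define STACK_SIZE 8

/* A toy stack in which every variable occupies one element */
static int g_iStack [ STACK_SIZE ];

/* Plays the role of a load-address instruction: the "address" of a
   variable is simply the stack index at which it lives. */
int AddrOf ( int iStackIndex )
{
    assert ( iStackIndex >= 0 && iStackIndex < STACK_SIZE );
    return iStackIndex;
}

/* Reads the value a pointer refers to. */
int Deref ( int iPntr )
{
    assert ( iPntr >= 0 && iPntr < STACK_SIZE );
    return g_iStack [ iPntr ];
}

/* Writes a value to the variable a pointer refers to. */
void StoreThrough ( int iPntr, int iValue )
{
    assert ( iPntr >= 0 && iPntr < STACK_SIZE );
    g_iStack [ iPntr ] = iValue;
}
```

For example, if the variable at index 3 holds 42 and index 5 is assigned AddrOf ( 3 ), then reading through the value at index 5 yields 42, and storing through it changes the variable at index 3.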

  • Page - 1237

    1196 the scope of the function, but is referenced anyway. This is because class methods and properties share the same scope. Remember, aside from the addition of functions, a class is implemented just like a struct, so you can get a basic idea of how they work from the previous section on structures. Methods are really quite an easy addition; they can be represent- read more..

  • Page - 1238

    1197 ■ A simplified compiler that can directly leverage the features of the assembler in the code it outputs. ■ Minimized redundancy; because the assembler is already translating assembly to executable files, there’s no need to bend over backwards to make the compiler do the same thing. ■ The ability to directly hand-tune, optimize, or otherwise modify the output of read more..

  • Page - 1239

    1198 The advantages of this approach should be obvious: ■ Eased development process. By making the standalone compiler optional, the constant tweaking and updating that will invariably be a large part of game scripting can be eased by eliminating the intermediate compile step. Scripts can be immediately loaded by the game engine, which tends to be much faster and easier when read more..

  • Page - 1240

    1199 SUMMARY Well, this has been quite a little journey, eh? If you’re anything like I was, you probably thought the idea of building a high-level compiler and suitable runtime environment was impossible for mere mortals, and yet here you are—as long as you’ve followed along all this way, you too have ascended to the rank of scripting master. Sure, you’ve still got a read more..

  • Page - 1241

    1200 utmost of clarity. A game’s greatest asset is its suspension of disbelief—its ability to remove the players from reality and drop them head-first into a self-contained world—and this is what scripting is all about. So, that’s that. I hope you’ve learned as much from this book as I attempted to explain. When I first set out to solve the mystery of high-level read more..

  • Page - 1242

    What’s on the CD APPENDIX A read more..

  • Page - 1243

    1202 The included CD-ROM contains a number of supplemental materials to enhance your experience with the book. They’re organized into the simple directory structure listed here: ■ Articles/ - A small collection of articles that discuss aspects of scripting not directly covered in the book. ■ Programs/ - Contains the entire set of code and executable demos for the read more..

  • Page - 1244

    1203 INSTALLATION Installation is simple; some programs included have their own executable installers or self-extracting archives, while the rest of the content—namely the program demos and code—are “installed” by simply dragging them from the CD to your hard drive. The GUI should run automatically on its own, but if it doesn’t, just use a program like Windows Explorer or read more..

  • Page - 1245

    This page intentionally left blank read more..

  • Page - 1246

    A abstraction layer, 174–179 ActiveStateTcl, 288 AI (artificial intelligence), 57 compilers, 1184 enemies, 57–60 allocating memory directly, 1189 analysis parsing, 985–987 semantic (compiling), 764 APIs, 20 hosts. See host APIs SDKs, 24 applications. See host applications architecture. See also structure hardware, targeting, 780–781 modular, 31 procedural scripting systems, 156–157 XVM, 569–570, read more..

  • Page - 1247

    1206 assembly languages (continued) XVM Assembly arithmetic, 400–401 bitwise, 401 comments, 407 conditional logic, 402–403 defined, 397–399 directives, 404–407 escape sequences, 407 functions, 403–406 instructions, 399–404 memory, 399–400 overview, 408 stacks, 403–405 strings, 402 assignment statements, parsing, 1065–1072 associative arrays Lua, 193–197 Tcl, 301–303 asynchronous script function read more..

  • Page - 1248

    1207 preprocessing, 120–124 data types, 115–125 designing, 74 domains, 68 engines functionality, 69–71 high-level control, 65–67 events, 69 hierarchy, 135–137 executing, 78–81 floating points, 115–116 game flags, 125–128 game intro, 90 implementing, 93–94 language, 91–92 script, 92–93 hacking, 139–140 implementing, 74–90 interfaces, 75–78 internal constant lists, 117–120 logic (iterative), read more..

  • Page - 1249

    1208 code (continued) low-level procedural scripting systems, 158–159 XtremeScript, 167–168 machine, 17, 753 opcodes, 17 relocatable, 779–780 source compilers, 863–864, 919–922 I-code, 940–942 XASM, 470–471 code-emitter module compiler, 863, 950–969 directives, 953–955 format, 950–951 functions, 958–966 headers, 952–953 parsing functions, 1026–1028 symbols, 955–958 XVM Assembly files, 966–969 read more..

  • Page - 1250

    1209 interface, 870 command-line options, 874–879 filenames, 871–874 logos, 870–871 interfaces, 866–867 life-span, 867–870 low-level languages, 753 luac, 185–186 machine code, 753 modules code-emitter, 950–969 code-emitter. See code-emitter module I-code, 932–949 lexer, 916–928 loader, 895–897 overview, 893–895 parser, 928 parser. See parsing preprocessor, 897–904 OOP, 1182–1183 optimizing, read more..

  • Page - 1251

    1210 context switches, 655 multithreading, 679–682 cooperative multitasking, 654–658 core (Tcl), 290 counting references (Python), 266 critical sections, multithreading, 663–664 D data structures compilers linked lists, 880–888 stacks, 888–890 Lua, 241 XtremeScript, 351–354 data types CBS, 115–125 Lua, 191–193 coercion, 192 Python, 246 debug libraries (Python), 264–265 declarations, parsing, 1008 read more..

  • Page - 1252

    1211 events CBS, 69 FPSs, 52 hierarchy, 135–137 exceptions Python, 286 Tcl, 330 executables XASM, 444–455 XSE assembling, 558–563 compilers, 969–971 functions, 556–558, 601–603 header, 552–553, 594–595 host APIs, 557–558, 602–603 host applications, 731–732 instructions, 553–555, 595–599 strings, 555–556, 599–601 executing CBS, 78–81 concurrent multithreading, 659–666 concurrently, 109–110 read more..

  • Page - 1253

    1212 framework, XASM, 469, 494–495 functions, 479–482 headers, 473 host API, 487 instructions, 471–473, 487–494 interface, 470 labels, 485–487 linked lists, 474–477 source code, 470–471 strings, 477–479 symbols, 482–485 front end compiling, 768, 859 lexer module, 861 loader module, 860 parser module, 862 preprocessor module, 861 Func directive, 432–434 functionality engines (CBS), 69–71 read more..

  • Page - 1254

    1213 G games content, 15 engines, 15 flags (CBS), 125–128 intro sequence, 90 implementing, 93–94 language, 91–92 script, 92–93 Lockdown code, 1155–1157 graphics, 1151–1153 host API, 1158–1161 logic, 1142–1150 playing, 1175–1177 premise, 1140–1141 scripts, 1161–1173 sound, 1153–1154 speed, 1173–1175 state, 1155–1157 storyboards, 1142–1150 XtremeScript, 1158 logic, modular, 31 GetCommand read more..

  • Page - 1255

    1214 host applications (continued) XVM, 573–574, 682 asynchronous script function calls, 719–728 calling functions, 686–689 control functions, 697–699 embedding, 741 host API function calls, 699–711 host APIs, 742 integration interface, 686–728 multithreading, 728–739 native threads, 684 output, 745–746 priorities, 730–731, 734–735 public interface, 694–696 running scripts, 683–685 script read more..

  • Page - 1256

    1215 XASM, 471–473, 487–494 parsing, 543–551 XVM Assembly input, 439–440 XVM Assembly output, 447–451 XSE executable, 553–555, 595–599 XVM, 571, 584–585, 595–599 executing, 628–633, 637–647 pointers, 634–636 structure interfaces, 604–616, 622–623 XVM Assembly, 399–404 input, 439–440 output, 447–451 integration abstraction layer, 174–179 C Lua, 205 Python, 263 Tcl, 312 interfaces, read more..

  • Page - 1257

    1216 languages assembly. See assembly languages high-level, 753 inter-language functions, 180 intra-language functions, 180 low-level, 753 procedural scripting, 336–337 XtremeScript. See XtremeScript layer, abstraction, 174–179 lexemes, 785–786 lexer module compiler, 916–928 compilers, 861 lexers. See also lexing CD, 855–856 delimiters, 822–826 demo, 849–855 error handling, 797 identifiers, 811–822 read more..

  • Page - 1258

    1217 CBS, 71–78 Lua, 208–209 Python, 266–268 Tcl, 314–315 XVM, 574–575 local variables (assembly languages), 395–397 Lockdown CD, 1177 code, 1155–1157 graphics, 1151–1153 host API, 1158–1161 logic, 1142–1150 playing, 1175–1177 premise, 1140–1141 scripts, 1161–1173 sound, 1153–1154 speed, 1173–1175 state, 1155–1157 storyboards, 1142–1150 XtremeScript, 1158 logic conditional assembly read more..

  • Page - 1259

    1218 Lua (continued) statements, 189 states, 207–208 strings, 193–198 tables, 193–197 tag methods, 241 variables, 188–191 global, 226–228 Web sites, 242 Lua data structures, 241 lua interactive interpreter, 186–187 luac compiler, 185–186 M machine code, 17, 753 macro assemblers, 374 macros, preprocessing, 776–777 managing memory (XASM), 429–430 memory direct allocation, 1189 XASM, managing, read more..

  • Page - 1260

    1219 O object-oriented scripting systems, 21–22 objects. See also OOP FPSs, 51–57 Python, 265–276 RPGs, 41–45 OOP. See also objects assembly languages, 346–349 compilers, 1182–1183 Lua, 241 Python, 286 opcodes, 17 assembly languages, 383–385 operands assembling, 420–422 assembly languages, 337–344, 372–373 parameters, 373 XASM (XVM Assembly output), 449–451 XVM, executing, 636 operations, read more..

  • Page - 1261

    1220 parsing (continued) host applications, 1058–1062 loops for, 1092 while, 1079–1091 overview, 984–985 recursive descent, 994–996 scope, 996–997 statements, 1001–1007 conditional, 1092–1099 if, 1092–1099 strategy, 1000–1001 syntax diagrams, 987–988 tokens, 997–1000 trees, 989–993 XASM, 456–462, 527–528 directives, 529–541 functions, 531–534 initializing, 528–529 instructions, 543–551 line read more..

  • Page - 1262

    1221 debug library, 264–265 directories, 243 exceptions, 286 expressions, 254–256 functions, 261–263 calling, 268–271 exporting, 271–276 list, 286 host APIs, 273–278 host applications, 278–281 initializing, 265 interactive interpreter, 243–244 iterating, 258–261 lists, 251–254 loops, 258–261 module dictionaries, 269–270 objects, 265–276 OOP, 286 operators, 254–256 overview, 242 packages, 286 read more..

  • Page - 1263

    1222 scripting CBS. See CBS defined, 14–15 languages. See languages overview, 5–6, 15–20 purpose, 30–32 systems, 20–27 abstraction layer, 174–179 CD, 334 code, 24–26 command-based, 22–23 dynamically linked modules, 23–24 embeddable, 179 implementing, 179–181 integration, 174–179 interfaces, 174–179 interpreters, 24 Java, 27 Lua. See Lua object-oriented, 21–22 procedural. See procedural read more..

  • Page - 1264

    1223 stacks assembly languages, 389–397 compilers, 888–890 Lua, 209–215 XASM parsing, 530–531 XVM Assembly input, 431–432 XVM, 571, 585–586 structure interfaces, 616–623 XVM Assembly, 403–405, 431–432 state diagrams (lexers), 799 state machines (lexers), 791–792 statements Lua, 189 elseif, 200–201 if, 200–201 while, 201–203 parsing, 1001–1007 assignment, 1065–1072 conditional, 1092–1099 read more..

  • Page - 1265

    1224 Tcl ActiveStateTcl, 288 arrays, 301–303 bouncing sprite demo, 322–329 C, integrating, 312 case-sensitivity, 291 commands, 290–292 C functions, 316–320 calling, 315–316 comments, 297–298 compiling, 290 concepts, 184 conditional logic, 306–308 core, 290 directories, 288–289 exception handling, 330 expressions, 303–306 extensions, 290, 330 functions, 310–312, 316 global variables, 320–322 hash read more..

  • Page - 1266

    1225 U updating XVM host applications, 735–739 upgrading lexers, 814–818 utilities, lexing, 788 V values assembly languages, 392–395 expression parser, 1058 XVM, 583–584 Var directive, 434–436 variables assembling, 416–420 assembly languages, 395–397 global, 226–228 compilers, 890–891 Tcl, 320–322 tracking, 689–694 Lua, 188–191 parsing, 535–540, 1017–1020 Python, 244–246 Tcl, 298–301 XASM read more..

  • Page - 1267

    1226 XASM (continued) parsing, 456–462, 527–528 directives, 529–541 functions, 531–534 initializing, 528–529 instructions, 543–551 line labels, 542–543 parameters, 540–541 stacks, 530–531 variables, 535–540 strings, 462–469 tokenizer, 495–524 XSE executable assembling, 558–563 functions, 556–558 header, 552–553 host APIs, 557–558 instructions, 553–555 strings, 555–556 XtremeScript, 769–770 read more..

  • Page - 1268

    1227 executing, 627–628 binary operations, 638–639 conditional logic, 640–641 functions, 642–645 instruction pointers, 634–636 instructions, 628–633, 637–647 operands, 636 pauses, 633–634, 646 terminating, 646–648 functions, 587–588 calling, 578–581 global data tables, 571–572 headers, 583 host APIs, 587–588 host applications, 573–574, 682 asynchronous script function calls, 719–728 calling read more..

  • Page - 1269

    1228 XVM Assembly arithmetic, 400–401 bitwise, 401 code-emitter module, 966–969 comments, 407 conditional logic, 402–403 defined, 397–399 directives, 404–407 escape sequences, 407 functions, 403–406 instructions, 399–404 memory, 399–400 overview, 408 stacks, 403–405 strings, 402 XASM input comments, 442 directives, 431–439 functions, 432–434, 440–442 host API, 440–441 identifiers, 438–439 read more..

  • Page - 1270

    “Game programming is without a doubt the most intellectually challenging field of Computer Science in the world. However, we would be fooling ourselves if we said that we are ‘serious’ people! Writing (and reading) a game programming book should be an exciting adventure for both the author and the reader.” —André LaMothe, Series Editor Premier Press, Inc. www.premierpressbooks.com ™ read more..

  • Page - 1271

    TEAMFLY Team-Fly® read more..

  • Page - 1272

    Take Your Game to the XTREME! Xtreme Games LLC was founded to help small game developers around the world create and publish their games on the commercial market. Xtreme Games helps younger developers break into the field of game programming by insulating them from complex legal and business issues. Xtreme Games has hundreds of developers around the world. If you’re interested read more..

  • Page - 1273

    License Agreement/Notice of Limited Warranty By opening the sealed disc container in this book, you agree to the following terms and conditions. If, upon reading the following license agreement and notice of limited warranty, you cannot agree to the terms and conditions set forth, return the unused book with unopened disc to the place where you purchased it for a refund. License: read more..
