Like in nearly all of my other tutorials, let me begin with a disclaimer. :) I just taught myself basic 80x86 assembly less than a week ago and only spent 2 days actually reading the reference section of one of my CS course books. So, uh, I don't claim to be a "Local Expert" or anything on this. :) Secondly, most people reading this are probably even lazier than I am and don't want to spend those two days reading a real ASM guide, so, instead of focusing on the specifics of ASM, I'm just going to kind of "make up" some wishy-washy easier-to-understand stuff that will just enable you to write StarGraft actions/conditions. NOTE THAT THIS MEANS IF YOU REALLY WANT TO LEARN REAL 80X86 ASSEMBLY, DON'T READ THIS. IT IS WRONG. INTENTIONALLY! But if I explained everything correctly, you would probably be very confused. :)
Nevertheless, this isn't going to be for the weak-of-mind. If you're struggling to just get CS tools working properly, aren't aware of the various bugs, haven't read a decent hex editing tutorial/don't understand hex, then you probably want to get the hang of those things first. In addition, I'm going to move pretty fast, so you'll have to mull over stuff yourself.
Basics: Binary/Hexadecimal number systems
If you know anything about hexadecimal, then this should be a quick review. Hexadecimal is a base 16 number system instead of the usual base 10 (decimal). I.e., there are sixteen digits, 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F, instead of only ten, with A-F standing for decimal 10-15. Here are some examples (#####h or 0x#### means it's a hex number, ####d or ####t means it's a base 10 or decimal number):
10h = 16t
100h = 256t
21h = 33t
Got it? If not, review a good hex tutorial first.
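If you'd rather check conversions like these than trust them, here is a minimal Python sketch. The helper name parse_hex is made up for this illustration; the only real machinery is Python's built-in int() with base 16.

```python
# Python writes hex with a 0x prefix; the tutorial's "h" suffix means the same thing.
examples = {"10h": 0x10, "100h": 0x100, "21h": 0x21}

def parse_hex(text):
    """Convert tutorial-style hex text such as '21h' into an integer."""
    return int(text.rstrip("hH"), 16)

for label, value in examples.items():
    # Print each example in the tutorial's "hex = decimal" style.
    print(f"{label} = {value}t")
```

Running it prints the same three lines as the examples above.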
Next of note is the binary number system. It's just like hexadecimal or decimal, but now it's base 2. I.e., the only digits are 0 and 1. (So whenever a digit would reach 2, you carry to a new place, just like you carry whenever a digit would reach 10 in decimal.) Here are some examples (####b means binary number):
1b = 1h = 1t
10b = 2h = 2t
11b = 3h = 3t
101010b = 2Ah = 42t
Should be pretty easy to grasp if you already understand hexadecimal. Now why are these important? Because all the variables we're going to work with are going to be in one or the other when you deal with them (mostly hex).
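The binary examples above can be double-checked the same way. This is just a sketch; to_notations is a hypothetical helper that formats one number in all three of the tutorial's notations.

```python
def to_notations(value):
    """Return (binary, hex, decimal) strings using the tutorial's b/h/t suffixes."""
    return (f"{value:b}b", f"{value:X}h", f"{value}t")

for bits in ["1", "10", "11", "101010"]:
    value = int(bits, 2)  # read the text as a base 2 number
    print(" = ".join(to_notations(value)))
```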
Review: Bytes, Words, Double Words
Recall from a hex editing tutorial that computers usually organize data into distinct groups of numbers. A byte is a hexadecimal number of 2 digits, ##h. A word or integer is a hexadecimal number of 4 digits or 2 bytes, ####h. Here might be some new terminology for you. The high order byte of a word (or H.O. byte for short) is the byte on the left-side of the word (if the number reads from left to right). For example, the high order byte of word 1234h is 12h. The H.O. Byte of ABCDh is ABh. The low order (L.O.) Byte is the one on the right side. The L.O. Byte of 1234h is 34h. The L.O. Byte of ABCDh is CDh.
A double word (dword for short) is a hexadecimal number of 8 digits or two words or 4 bytes, ########h. The H.O. word of a dword is the word on the left side. The L.O. Word is the one on the right.
Oh, and a bit is one binary digit.
Got it? Nothing too tricky yet.
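Picking out the H.O. and L.O. parts of a number is just shifting and masking. Here is a small sketch of that idea; the function names are made up for this illustration, and the dword 12345678h is only a sample value.

```python
def high_byte(word):
    """H.O. byte of a word: the left two hex digits."""
    return (word >> 8) & 0xFF

def low_byte(word):
    """L.O. byte of a word: the right two hex digits."""
    return word & 0xFF

def high_word(dword):
    """H.O. word of a dword: the left four hex digits."""
    return (dword >> 16) & 0xFFFF

def low_word(dword):
    """L.O. word of a dword: the right four hex digits."""
    return dword & 0xFFFF

print(f"{high_byte(0x1234):02X}h {low_byte(0x1234):02X}h")
print(f"{high_word(0x12345678):04X}h {low_word(0x12345678):04X}h")
```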
How Your Computer Works (Kinda)
At the deepest, most concrete, least visible level, your computer works by "moving around" a bunch of binary numbers represented by digital circuits. And that's it. That's all it ever does (the CPU anyway). Of course, since this happens many millions of times every second, it can do lots of cool stuff. :)
What we're going to do is "talk" to your CPU at this level, by feeding it instructions it can understand at the very lowest level, and have it do "stuff." Since this is lower than the level of Starcraft, heck, lower than your OS or whatever you're running, we can use it to instruct your CPU to do stuff to Starcraft while it's running.
Here are the basic things you have to understand. There are basically 3 different things in your computer: the CPU (Pentium III, Athlon, AMD K6, whatever chip you got), memory (RAM), and Input/Output devices (keyboard, monitor, mouse, etc.). We're only interested in the first two for this.
Believe it or not, all "programs" on your computer are just data. They're a bunch of bits stored on your Hard Disk (an Input/Output device) that, when run, get loaded into your memory (RAM), and then your CPU (Processor) takes chunks of those bits and reads various instructions out of them.
E.g., let's say you run Starcraft. All of Starcraft -- its MPQ data, its EXE data, the program code itself -- is loaded into memory. Then, one instruction at a time, the bits of the code get moved from memory to your CPU, which tries to understand the instruction (just a jumble of binary gibberish mainly :). The instruction may tell it to move some data around in memory, it may tell it to add some numbers together and then move them into some place in memory, it may tell it to jump to another place in memory to receive the next code instruction (remember, code is just data, like any other number). Once it completes an instruction, it picks out the next instruction from memory and executes that. And so on.
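To make the "grab an instruction, do it, grab the next one" loop concrete, here is a toy sketch. It is NOT real 80x86: the instruction names (set, add, jump, halt), the single register A, and the tuple format are all invented for this illustration.

```python
def run(program, max_steps=100):
    """Fetch and execute toy instructions one at a time until 'halt'."""
    registers = {"A": 0}
    pc = 0  # position in "memory" of the next instruction to fetch
    for _ in range(max_steps):
        op, *args = program[pc]  # fetch the instruction at pc
        pc += 1                  # by default, continue with the one after it
        if op == "set":
            registers[args[0]] = args[1]
        elif op == "add":
            registers[args[0]] += args[1]
        elif op == "jump":
            pc = args[0]         # the next fetch happens somewhere else
        elif op == "halt":
            break
    return registers

program = [
    ("set", "A", 1),     # position 0
    ("jump", 3),         # position 1: skip over position 2
    ("add", "A", 100),   # position 2: never executed
    ("add", "A", 2),     # position 3
    ("halt",),           # position 4
]
result = run(program)
print(result)
```

The jump is the interesting part: it only changes where the next instruction is fetched from, which is the same idea the button code relies on.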
This is how we are going to create "button actions" for StarGraft to use when patching Starcraft. When you press a button in Starcraft, something happens. Namely, some low level instruction like the ones above gets sent to your computer. Specifically, it tells your computer to "go to this button's code in memory." When it gets to that code, it executes the button's code, and the results show up on your screen (after the button is executed, it returns to the main code, sends some signals to your monitor -- I/O -- to refresh the screen, etc., etc. -- none of which you need to concern yourself with :). We're going to create this "button code." When we make a new action in StarGraft, and then input some instructions into that black box, StarGraft will put them into appropriate places in memory so that when you press a button that has that action, it will go to your code, and execute it.
Got it? Let's go.
80x86 Assembly is what people call the "language" your computer understands (unless you have a Mac :P). In essence, it's just a bunch of specific binary number combinations that Intel designed your CPU to understand. But we don't need to get into the nitty gritty of that because there are "assembler" programs out there that will allow you to use (slightly) more readable text code and then "assemble" it into binary code for you.
Before we get into the code, there are some specifics you need to understand about how your CPU works. First, you must know about registers. A register is a little spot on your CPU chip which can hold data (a number). There are several registers (not as many as you might think), but for now we'll only concern ourselves with 6 of them (well, more, but not really). To make it easier, let's give them names: EAX, EBX, ECX, EDX, ESI, and EDI. If you only want to remember a couple of them, just remember the first 4 and you should be fine (the more you know, the more flexible you can make your action "programs"). You don't have to really worry about the specific function of each one, because you only need to know what each register specializes in if you want to optimize your code (for speed). :P For our purposes, they all are the same type of thing.
Each of these registers can hold one dword of data (a number of 8 hex digits). No more, no less. Never thought your computer would reduce down to such a low level state, huh? :) We can also access the different "parts" of each of these registers. For example, we call the L.O. Word of EAX just AX. We call the L.O. word of EBX just BX. And the L.O. word of ECX, EDX, ESI, and EDI, are called CX, DX, SI, and DI respectively. Here's a little diagram:
EAX: [ 1028 | AB00 ]    AX = the right half, AB00
EBX: [ 0000 | 0001 ]    BX = the right half, 0001
In this picture, the value inside EAX is 1028AB00h. The value inside AX is AB00h. The value inside EBX is 00000001h or just 1h. The value inside BX is 0001h, or also just 1h. Note that if we change the value inside, say AX, it will also change the value of EAX. In actuality, you can also access the H.O. and L.O. Byte of AX, BX, CX, and DX. They are called AH and AL for H.O. AX byte and L.O. AX byte, BH and BL for H.O. BX byte and L.O. BX byte, etc. Not as important as the other two bigger values, but you may want to remember them anyway. Just things to keep in mind. It may be helpful to draw a diagram yourself to help yourself remember what's what.
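If drawing a diagram isn't your thing, here is a rough Python sketch of the same relationship. The class Register32 and its property names are made up for this illustration; it only models how AX, AH, and AL are views onto parts of EAX.

```python
class Register32:
    """A pretend 32-bit register with views onto its smaller parts."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF  # a register holds exactly one dword

    @property
    def low_word(self):   # AX, if this register is EAX
        return self.value & 0xFFFF

    @low_word.setter
    def low_word(self, new_word):
        # Writing AX replaces only the right half; the left half is untouched.
        self.value = (self.value & 0xFFFF0000) | (new_word & 0xFFFF)

    @property
    def high_byte(self):  # AH: the H.O. byte of AX
        return (self.value >> 8) & 0xFF

    @property
    def low_byte(self):   # AL: the L.O. byte of AX
        return self.value & 0xFF

eax = Register32(0x1028AB00)
ax_before = eax.low_word   # AB00h
ah_before = eax.high_byte  # ABh
al_before = eax.low_byte   # 00h
eax.low_word = 0x1234      # changing AX also changes EAX
print(f"{eax.value:08X}h")
```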
What are these registers for? Well, the fundamental fact of life is that there aren't many things in your computer that can do "stuff." In fact, almost all the brain action happens on these few little registers. How it works is like this:
The first instruction (a dword usually) from memory is moved from memory onto a specialized register by the CPU (not one of the ones above). Once on the register, the CPU can read it and do what it tells it to do. More often than not, instructions will instruct the CPU to grab other pieces of data from memory and put them on another register (like one of the ones above). This is because the CPU can't really do stuff in memory; it has to grab the stuff and put it on a register to mess with it. But what good is messing with a couple of numbers in all these tiny registers? I mean, we don't even have 64 bytes to work with here. ;) Well, the CPU can also put the values that are in these registers back into some place in memory, and there's a whole lot of that (RAM, remember?). So basically, that's all that ever happens: the CPU gets instructions, grabs tiny bytes/words/dwords from memory, messes with them in the registers (add two things, compare two values, etc.), and then puts them back. (Occasionally sending some signals to your monitor so you can see what's going on :). You can also move data from one register to another. I.e., you can move a value from EAX to EBX. (But you can't move a value from EAX to, say, CX, since EAX has a dword in it, and CX can only hold a word, so it won't fit.)
That shouldn't be too hard to digest. It's actually really simple, which is really the power behind your entire system. Now, let's talk about memory. The easiest way to think of it is like this: memory is one really, really big "file." Ever hex edited a file? Well, memory is just like that. It starts at offset 0x0000 (or just 0h) and then is straight data until, say, offset 0x99999999999999999999999999999999999999999999999 or something to that effect. :) To access a piece of data in memory, like finding a certain byte/word in a file in hex view, you just need to look up its offset.
It's actually a bit more complicated than that (not much), but we're not going to worry about that since it doesn't really matter. [To be somewhat correct, when the CPU looks up the offset of a piece of data in memory, it doesn't normally start counting from 0x0000; it first gets the offset of the start of some segment (usually the place that your Operating System designates to the specific program you're running, plus the location of where the "data" is, as opposed to "code") and then adds the offset you give the CPU to obtain the actual offset in memory. But that doesn't really matter so you can just forget about it for now :]
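The "memory is one big file" picture, and the bracketed note about a starting point being added to your offset, can be sketched like this. Everything here is pretend: the memory size, the stored bytes, and especially HYPOTHETICAL_BASE, which is a made-up segment start and not anything Starcraft or your OS actually uses.

```python
memory = bytearray(0x2000)   # a tiny pretend "RAM", filled with zeros
memory[0x0068] = 0x2A        # put one byte at offset 0x0068

HYPOTHETICAL_BASE = 0x1000   # made-up start of "our" segment
memory[HYPOTHETICAL_BASE + 0x0068] = 0x99

def read_byte(offset, base=0):
    """Look up the byte at base + offset, like jumping to an offset in a hex editor."""
    return memory[base + offset]

print(f"{read_byte(0x0068):02X}h")                     # counting from 0
print(f"{read_byte(0x0068, HYPOTHETICAL_BASE):02X}h")  # counting from the base
```

Same offset, different starting point, different byte. That is all the bracketed note is saying.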
So let's do a quick pseudo example. Say we want to move the word value starting at offset 0x0068 in memory to the AX register. (Notice that we're using the AX register here, not EAX, since we're moving a word value, not a dword value; thus we want to move it to a register with word length only.) Here's the "instruction" we'd give to the CPU:
Move word at [0x0068] to AX.
The [...] brackets show that we're referencing the offset (if we didn't put the brackets then we would mean, move the value 0x0068 or 68h into AX). Let's assume that the word starting from 0x0068 was 0010h. Hence, we just moved the value 0010h into the AX register, so it now contains that value. Got it? Let's try another one.
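Here is a rough Python model of that one pseudo-instruction. The helper load_word is made up for this sketch. One detail the tutorial hasn't covered yet: 80x86 chips store the low byte of a number first (so 0010h sits in memory as the bytes 10 00), which is why the sketch uses "little" byte order.

```python
memory = bytearray(0x100)  # pretend memory, all zeros
# Place the word 0010h at offset 0x0068 (low byte first, as on 80x86).
memory[0x0068:0x006A] = (0x0010).to_bytes(2, "little")

def load_word(mem, offset):
    """Read the 2-byte word that starts at the given offset."""
    return int.from_bytes(mem[offset:offset + 2], "little")

ax = load_word(memory, 0x0068)  # with brackets: AX gets what is stored AT 0x0068
ax_without_brackets = 0x0068    # without brackets: AX would get the number 68h itself
print(f"{ax:04X}h vs {ax_without_brackets:04X}h")
```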
Suppose we want to move the value in EBX to memory offset 0x0068. Here's the pseudo code:
Move EBX to [0x0068].
Note that we didn't have to say "move dword at EBX." Because EBX by default is a dword, the CPU assumes we mean move the entire dword from EBX and move it into memory, overwriting the data starting at offset 0x0068. Actually, we didn't have to specify word in the first example either, since AX holds a word by default, the CPU will load all it can from the offset and move it into AX, which just happens to be a word.
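Going the other direction looks like this in the same pretend-memory style. The helper store_dword and the value in EBX are assumptions made for the sketch; the point is only that a dword register overwrites four bytes starting at the offset.

```python
memory = bytearray(0x100)  # pretend memory, all zeros
ebx = 0x00000001           # sample value sitting in EBX

def store_dword(mem, offset, value):
    """Overwrite the 4 bytes starting at offset with a dword (low byte first)."""
    mem[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

store_dword(memory, 0x0068, ebx)  # Move EBX to [0x0068]
print(memory[0x0068:0x006C].hex())
```

Exactly four bytes change (0x0068 through 0x006B) because EBX is a dword; the byte at 0x006C is left alone.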
Let's try a more complicated example, but one you'll have to deal with a lot in making real SG action programs. Let's say that the value of an offset we want to get some data from is in memory offset 0x0068. Read that carefully. The data at 0x0068 isn't the data we want, it's just another offset, a number (in this case, we'll assume it's a dword). But we want to go to that offset. In other words, we don't know what is at 0x0068, but we know there is a number there. And we also know that that number is actually the value of another offset in memory. So first we have to grab that value, look at it, and then process it as an offset and grab the value from that offset. Here's the pseudo-code:
Move [0x0068] to EAX.
Move [EAX] to EBX.
Now the value we were looking for is in EBX. See how this works: Let's say 0x0068 contained the value 0x00001000 (which, remember is an offset to another place in memory, in particular 0x1000 :). First, we move that value to EAX. EAX now contains the value 0x00001000. Nothing new there. Now, we use EAX as the offset reference (remember, that's what the brackets are for), so we go to 0x00001000 in memory and grab the value there, and then put it in EBX. If you're a bit confused, here's one more attempt to clarify. Suppose I had done this:
Move EAX to EBX.
What would that have done? It would have moved the value in EAX to EBX. That is, if EAX contained the value 0x00001000, EBX would now also contain 0x00001000. Just treat them like variables. Now, instead I did:
Move [EAX] to EBX.
The [...] brackets tell your CPU to treat the stuff inside as an offset instead of just a value. So first, it reads EAX, gets the value 0x00001000, sees the brackets, so it goes to the offset 0x00001000 in memory, grabs the dword there, and then puts it in EBX. It's a bit to think about, but it really isn't that complicated. (FYI, this is called a pointer, since the original offset in memory just contained another offset to another place in memory; this can happen several times -- i.e., you can have an offset in memory that contains an offset to another place in memory that contains yet another offset to another place in memory :)
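The difference between "Move EAX to EBX" and "Move [EAX] to EBX" is easy to see in a small simulation. This is only a sketch: load_dword and store_dword are made-up helpers, and DEADBEEFh is an arbitrary sample value standing in for "the data we actually want."

```python
memory = bytearray(0x2000)  # pretend memory, all zeros

def store_dword(mem, offset, value):
    """Write a dword at offset (low byte first, as on 80x86)."""
    mem[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def load_dword(mem, offset):
    """Read the dword that starts at offset."""
    return int.from_bytes(mem[offset:offset + 4], "little")

store_dword(memory, 0x0068, 0x00001000)  # 0x0068 holds a pointer to 0x1000
store_dword(memory, 0x1000, 0xDEADBEEF)  # sample data living at 0x1000

eax = load_dword(memory, 0x0068)  # Move [0x0068] to EAX
ebx_plain = eax                   # Move EAX to EBX: copies the offset itself
ebx = load_dword(memory, eax)     # Move [EAX] to EBX: follows the offset
print(f"{ebx_plain:08X}h vs {ebx:08X}h")
```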
One last pseudo-exercise, and then we can start doing some real assembly. Let's say we have the example above, but instead of going to the offset stored at 0x0068, we want to go to that offset plus 4 bytes. In other words, 0x0068 contains the value of an offset, but we want to go to that offset plus 4 additional bytes. So if 0x0068 contained the value 0x00001000, we actually want to go to 0x00001004. How would we go about doing this? (assuming that we don't know explicitly that the offset is 0x00001004 already, of course :)
Move [0x0068] to EAX.
Move [EAX+0x04] to EBX.
Simple enough. Just a little more syntax. Here, I added 4h to the value of EAX before the CPU evaluates it as an offset, so it goes to the correct place. There's another way I could have done it:
Move [0x0068] to EAX.
Add 0x04 to EAX.
Move [EAX] to EBX.
Here, we explicitly added 4h to EAX as a separate instruction. Yes, you can do this. :)
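Both versions can be checked side by side in the same kind of pretend memory. As before, this is a sketch with made-up helpers (load_dword, store_dword) and an arbitrary sample value (CAFEh) placed at 0x1004.

```python
memory = bytearray(0x2000)  # pretend memory, all zeros

def store_dword(mem, offset, value):
    """Write a dword at offset (low byte first, as on 80x86)."""
    mem[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def load_dword(mem, offset):
    """Read the dword that starts at offset."""
    return int.from_bytes(mem[offset:offset + 4], "little")

store_dword(memory, 0x0068, 0x00001000)  # the stored offset
store_dword(memory, 0x1004, 0x0000CAFE)  # sample data 4 bytes past it

# Way 1: add 4 inside the brackets.
eax = load_dword(memory, 0x0068)           # Move [0x0068] to EAX
ebx_way1 = load_dword(memory, eax + 0x04)  # Move [EAX+0x04] to EBX
eax_after_way1 = eax                       # EAX itself was not changed

# Way 2: add 4 to EAX first, then use it as the offset.
eax = load_dword(memory, 0x0068)           # Move [0x0068] to EAX
eax = (eax + 0x04) & 0xFFFFFFFF            # Add 0x04 to EAX
ebx_way2 = load_dword(memory, eax)         # Move [EAX] to EBX
print(f"{ebx_way1:08X}h {ebx_way2:08X}h")
```

EBX ends up the same either way. The one difference worth noticing is that the first way leaves EAX holding 0x00001000, while the second leaves it holding 0x00001004.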
If you got a feel for all the stuff above, you're ready for some real assembly!