Optimisation

Optimisation
'''WARNING: SPOILERS! While no particular solution to any particular puzzle is given here, no guarantee is made that examples aren't very relevant to certain puzzles. You may find it is more fun to discover these strategies for yourself. You are recommended to try optimising as best you can without this guide, and then - once you are stuck for ways to find improvements - come back and see what other tricks you can try.'''

Introduction
Optimisation is trying to improve one (or more) of the three level metrics (cycles, space, activity). The level score is given for the worst-case performance. The metrics are:
 * 1) Cycles - the time it takes (in cycles) to complete the task.
 * 2) Space - the number of lines of instructions it takes to complete the task.
 * 3) Activity - the number of successful LINKs performed while completing a task.

General tricks

 * Watch performance closely, and try to identify wasted cycles / instructions / links.
 * It may be necessary to sacrifice one metric to help another. Improved cycles performance can often be found using methods that use more space (paradoxically), or more LINKs

Activity optimisation

 * Any time you use a link twice, consider whether it is necessary.
 * Save two jumps through a link by jumping once, and then using a REPL to create your second worker.
 * Save a return trip by sending information required back via GLOBAL M, and then dying.

Space optimisation

 * If you have repeated sections, consider whether you can create a loop, or a unified section to JUMP to.
 * Avoid doing things like DROP ; HALT at the end of your file
 * DROP ; HALT does the same thing as simply HALT.
 * This also applies in places where the EXA would crash trying to go into the next section.
 * Re-order code so your EXA can just drop into the next section without having to JUMP there. (You may be able to avoid the MARK too).
 * A pattern like COPY M X ; ADDI X val X can be replaced with ADDI M val X.
 * A pattern like FJMP END ... MARK END ; DROP ; HALT is very similar to DIVI X T X
 * MARK takes up space, which may be counter intuitive for some people coming from other languages.
 * Using Canaries and Terminators can avoid writing clean-up code.
 * For example:

NOTE 19 INSTRUCTIONS REPL WRITER REPL READER HALT

MARK WRITER COPY M X GRAB 300 MARK WLOOP TEST X = F FJMP WLOOP MODE COPY F X DROP HALT

MARK READER GRAB 301 MODE COPY M F DROP HALT

can be simplified to:

NOTE 10 INSTRUCTIONS @REP 0 SWITCHING MODES HERE AVOIDS DUPLICATING MODE CALL LATER @END COPY M X MODE REPL READER NOTE NO NEED FOR REPL WRITER

NOTE WRITER MARK UNNECCESSARY MARK WLOOP TEST X = F FJMP WLOOP COPY F M

NOTE FALLING THROUGH TO READER NOTE CODE WILL CRASH ANYWAY

MARK READER GRAB 301 COPY M F NOTE DROP AND HALT UNNEEDED

Cycle optimisation
This is, to my mind, the most in depth and satisfying area to optimise. (Opinions may vary.)

Choose the right algorithm
For each challenge, split it up into tasks, and handle each task individually. Some tasks may require others to be completed first.

Some consideration should be given to how tasks are split up. For example: a task like:

"Find out which values of A between 10 and 29 can have some operation performed on them" could be split into "Can 10 have this operation performed on it?"; "Can 11 have this operation performed on it?" ... and so on.

Potential problems

 * Execution paths: if you have a task that can be done, but no EXA working on it, you could try adding more EXAs.
 * Communications: time spent sitting on the M channel to read/write is time spent not doing anything. This can be true even when your EXAs are well synchronised, since a send down M always takes at least two cycles.
 * Blocking conditions generally: if you find that any of these come up, see if you can figure a way to avoid waiting for stuff to clear. Sometimes this involves carefully synchronising your EXAs to dance a very precise dance. When this is done correctly, the low cycle count is is the second best feeling in the world (right behind seeing the dance you choreographed go correctly).
 * Excessive bookkeeping: mostly this affects loops and branches.
 * Too many or unnecessary instructions. Fewer instructions (normally) take less time to execute.

Micro-optimisations
Many of the tricks to reduce size in the Space Optimisation section are useful here. Reducing instruction count as well as saving cycles (e.g. ADDI F 10 X over COPY F X ; ADD X 10 X) will also free up space for when you want to trade space for cycles with other optimisations.

It's always worth examining your COPY commands, to see if the copied data has any work done it. You may be able to combine some of the work with the copy.

Re-ordering instructions
It is worth examining instruction orders, particularly around M read/writes, and REPL instructions.

For example, the earlier an EXA starts work, the earlier it finishes, so if there are instructions before a REPL that could be done after the REPL, it might be worth moving the REPL forward. (Of course, if the new EXA then shows up too early and messes with orderings, or blocks another EXA that's not so good.)

When doing an M read or write, you want to do this as close to the other EXA doing the write or read. If you find an EXA is waiting for a cycle or two before reading M, consider moving the read so it is an instruction or two later. If there are useful things to do in this time even better. It might also be worth looking to see if the write can be done earlier.

Clamping in two instructions
The naive method of clamping is something like:

NOTE X CANNOT BE HIGHER THAN 100 TEST X > 100 FJMP CONTINUE COPY 100 X MARK CONTINUE

This is 3 cycles, 4 instructions, and clobbers T. It can be done with two cycles and no additional registers, however: the EXA only stores values between -9999 and 9999, so trying to make it larger than 9999 will clamp it at 9999 (and making it smaller than -9999 will clamp at the other end). So you can limit X to 100 with:

NOTE 9899 IS (9999 - 100) ADDI X 9899 X SUBI X 9899 X

If you're passing this value to another EXA, you can even do the clamp for free by doing the operations in the messaging:

NOTE SENDER ADDI X 9899 M

NOTE RECEIVER SUBI M 9899 X

Use M and F in math instructions
If you need to manipulate a value before interacting with a file or a message, try doing so with the F or M register directly in the math operation's arguments. For example:

COPY F X ADDI X 100 X COPY X M
 * INSTEAD OF

ADDI F 100 M
 * CAN USE A SINGLE INSTR

As mentioned above in Clamping, when sending messages, you can run two operations for free - one on the sender and one on the receiver.

SWIZ Packing
If you need to store or message several small values, you can use SWIZ to easily pack into and extract them from a single register value.

For example, passwords are generally three-digit values that need to be successively incremented. SWIZ can pull out each digit of the value with a single instruction each, without using additional intermediate storage.

Phone numbers, too, are 11 digits long, and take a significant time to transfer over message channels. You can message them faster by using SWIZ to pack 4 digits into each value and transferring that instead; this is somewhat faster if done on the fly, but especially faster if the sender can do the packing ahead of time. This can also significantly speed up phone number file handling, if repeatedly writing and erasing.

There are lots of tricks with SWIZ.

Use Illegal Operations
Rather than keeping your program flow consistent and correct, judiciously use illegal operations to kill EXAs when you're done with them.


 * Rather than testing for the end of a file only to DROP and HALT, just continue reading from the file so you die and drop the file automatically.
 * Rather than sending an explicit "more data" heartbeat message, if the valid data doesn't include 0, consider sending 0 when data is done, and have the worker EXAs just do DIVI 1 X T on the received data, automatically dying when the data is done without additional message overhead.
 * MODI with a 0 as its second argument is also illegal, so if a worker EXA knows what value it should send but needs to know whether or not to send it, consider MODI M, where M is either 0 (if they should die) or larger than (if they should send to ) for a single-op control that doesn't clobber any additional registers.

Loop unrolling
When to use it: when you have loops - particularly small loops - and lots of spare space.

Consider this code, which writes the numbers 0 to 9 into a file: COPY 10 T MARK WRITELOOP SUBI T 1 T SUBI 9 T F TJMP WRITELOOP

This takes 31 cycles to run, and spends about 2/3rds of the time doing bookkeeping instead of writing to the file.

We can do this in 10 cycles (the theoretical minimum - the file cannot receive values faster than that), and (as a bonus) this doesn't even clobber T:

@REP 10 COPY @{0,1} F @END

The main disadvantage here is that the first option is 5 instructions, and the second is 9. In this case it's not such a big difference, but if you were writing 0 to 99 to a file this would probably use too much space (100 instructions). In that case, though, you can always adopt a hybrid solution:

COPY 100 T MARK WRITELOOP SUBI 100 T X SUBI T 10 T @REP 10 ADDI X @{0,9} F @END TJMP WRITELOOP

This is 15 instructions altogether, but runs in 141 cycles, as opposed to the 301 it would take using a simple loop (and compared to the 100 cycles of the theoretical best case).

Loop unrolling can also be used when you don't know the full size of the loop, by testing if the loop size is greater than a certain threshold and then using an unrolled loop, and then having a standard loop for the last few items.

Parallelisation
Doing lots of things at once is often better than doing one thing at a time. The main way to parallelise in these puzzles is to form a pipeline. Messengers are good for this as you can have one EXA at every step of transferring the message. It means the message takes longer to get there, but you no longer stall reading from M, so the actual number of digits processed per second increases. If you had a loop where you did this: COPY #INPT M, you might stall waiting for M to be read. However, if instead you did a COPY #INPT X ; REPL MESSAGE, and have MESSAGE transfer the data directly, this would be less of a problem. Of course, it's two cycles to do this, but the other parallelisation strategy is "going wide", where two EXAs do the same thing in sync with each other. This means you could have one EXA reading from #INPT while the other REPLs, and then they switch. (This doesn't work if #INPT is a file.)

When things need to be done in a particular order, parallelisation can be tricky since you need to introduce synchronisation, but they are difficult fun problems to solve. While M channels can be used to synchronise EXAs, it's more fun to make it work without them.

One trick that can work is to save yourself a TEST and TJMP/FJMP by assuming either case could be true, and REPLing an EXA to follow the FALSE path, while following the TRUE path locally. This works if the wrong EXA will crash without causing any damage.