(edit 2014-02-07 : some stupid typos crept into the tables)

(edit 2014-02-08 : dropping all the pre-modifications)

The instruction set of the YASEP architecture is finally frozen, after years of fine-tuning and exploration !

In August 2013, during a discussion with JCH, I came up with a new encoding for the 4 remaining bits of the extended instructions that were reserved for register auto-updates. I had been struggling with the one big shortcoming of the architecture : the very limited range of Imm4, particularly for conditional relative jumps. I had hacked a few tricks but none were really satisfying.

JCH pointed out some auto-update codes that made no sense in combination with other flags, and that's how he found a way to get 2 more bits for SI4/Imm4.

I tried to simplify the system down to a few simpler codes, following these principles :

  • A little reminder : when a D register (a memory access register) is referenced, it's the corresponding A register that gets updated, according to the size of the accessed word (1, 2 or 4 bytes). Otherwise, A registers are incremented by 2 or 4 bytes (depending on the datapath width, 16 or 32 bits) and R registers are incremented by 1. It's not very orthogonal but quite efficient.
  • any register may be post-incremented or post-decremented with one instruction (handy for string/vector code)
  • There must be "room" for 2 Imm bits and it should not break existing compiled code (NOP=0000)
  • Any of the 4 register fields may be affected
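
As a rough sketch of the reminder above, the step applied by one auto-update can be written as a small function. The names `auto_update_step`, `reg_kind`, `access_bytes` and `datapath_bits` are made up for this illustration and do not come from the YASEP tools.

```python
def auto_update_step(reg_kind, access_bytes=None, datapath_bits=16):
    """Return (updated_register_kind, step) for one auto-update.

    reg_kind      : "D", "A" or "R" (hypothetical labels for this sketch)
    access_bytes  : size of the memory access (1, 2 or 4), only used for "D"
    datapath_bits : 16 or 32
    """
    if datapath_bits not in (16, 32):
        raise ValueError("datapath must be 16 or 32 bits wide")
    if reg_kind == "D":
        # Referencing a D register updates the matching A register,
        # by the size of the word that was accessed.
        if access_bytes not in (1, 2, 4):
            raise ValueError("access size must be 1, 2 or 4 bytes")
        return ("A", access_bytes)
    if reg_kind == "A":
        # A registers move by one datapath word: 2 or 4 bytes.
        return ("A", datapath_bits // 8)
    if reg_kind == "R":
        # Plain R registers move by 1.
        return ("R", 1)
    raise ValueError("unknown register kind")
```

For example, `auto_update_step("D", access_bytes=1)` gives `("A", 1)`, while `auto_update_step("A", datapath_bits=32)` gives `("A", 4)`. The sign of the step (post-increment or post-decrement) is chosen by the update code and is left out here.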

The important trick that JCH found is that the Imm/Reg field invalidates certain auto-updates and frees some bits. In particular, it makes no sense to update SI4 when this source operand is immediate, so SI4 is associated with NOP in certain cases.

There is very little room and I had to make some compromises. For example, the CND field can't be updated when other registers are. Pre-increments are also avoided (see why at the bottom). It's not possible to increment one register and decrement another.

The resulting format provides Imm6 and one post-update for all extended instructions, and one to three post-updates when no immediate is present.
  • iRR instructions use 2 bits to encode Imm6 along with 2 bits for updates :
        00  NOP
        01  SND+
        10  DST+
        11  CND-  (this helps loops)
  • RRR instructions use 4 bits to encode more complex updates :
              00          01               10     11
        00    NOP         SND+,SI4+,DST+   SI4-   SI4+
        01    SND-,SI4-   SND+,SI4+        SND-   SND+
        10    DST-,SI4-   DST+,SI4+        DST-   DST+
        11    DST-,SND-   DST+,SND+        CND-   CND+
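
The two tables translate directly into lookup tables. This is a minimal sketch, not the actual YASEP decoder: the function names are invented for the example, and which instruction bits select the row and the column is left to the caller because the tables do not say.

```python
# Update codes copied from the two tables.
IRR_TABLE = ["NOP", "SND+", "DST+", "CND-"]

RRR_TABLE = [
    # col 00      col 01            col 10  col 11
    ["NOP",       "SND+,SI4+,DST+", "SI4-", "SI4+"],  # row 00
    ["SND-,SI4-", "SND+,SI4+",      "SND-", "SND+"],  # row 01
    ["DST-,SI4-", "DST+,SI4+",      "DST-", "DST+"],  # row 10
    ["DST-,SND-", "DST+,SND+",      "CND-", "CND+"],  # row 11
]


def parse_updates(entry):
    """Turn 'SND+,SI4+' into [('SND', 1), ('SI4', 1)]; 'NOP' gives []."""
    if entry == "NOP":
        return []
    return [(item[:-1], 1 if item[-1] == "+" else -1)
            for item in entry.split(",")]


def decode_irr(code):
    """Decode the 2-bit update code of an iRR instruction (0..3)."""
    return parse_updates(IRR_TABLE[code])


def decode_rrr(row, col):
    """Decode the 4-bit update code of an RRR instruction as (row, col)."""
    return parse_updates(RRR_TABLE[row][col])
```

For instance `decode_rrr(0, 1)` lists the three post-increments of SND, SI4 and DST, and `decode_irr(3)` gives the single CND post-decrement. No entry mixes increments and decrements, which matches the compromise described earlier.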
- The big advantage of this encoding is that it increases code density for a lot of very common sequences : stack manipulation, string/vector processing, counters... Higher code density does not always mean faster execution, but it helps. Different microarchitectures might implement these flags with different approaches (serial or parallel).

- There are several drawbacks as well : the encoding favors density over decoding ease (but what can we do with only 4 bits ?). The new encoding also breaks Imm4, and a new assembler must be written from scratch (the current one is aging and its flexibility has been stretched to its limits).

- In the end, this is progress :

  • Code density will increase again (maybe 20%).
  • Auto-updates are an optional feature but we have freed 2 more immediate bits (Imm4 becomes Imm6) for general use. This will benefit all the YASEPs out there (which must be updated, fortunately there are not a lot yet ;-D ). Post-update is a first level of compatibility, and pre-update is more difficult to implement so it's a second level (less expected to be available).
  • This helps extend the range of PC-relative conditional jumps
  • This solves the limitation of Shift/rotate operations
  • No more unused bits in the instructions !
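
To put numbers on the gain, here is the plain arithmetic for a 4-bit versus a 6-bit immediate. Whether a given YASEP instruction reads the field as signed or unsigned is an assumption here, so both readings are shown; `imm_range` is a name made up for this sketch.

```python
def imm_range(bits, signed=True):
    """Return (lowest, highest) value that fits in an immediate of `bits` bits."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        half = 1 << (bits - 1)
        return (-half, half - 1)
    return (0, (1 << bits) - 1)


for bits in (4, 6):
    print(bits, imm_range(bits), imm_range(bits, signed=False))
```

With a signed reading, the field goes from -8..7 to -32..31. With an unsigned reading, it goes from 0..15 to 0..63, which is enough to express any shift count on a 32-bit datapath.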

- Some questions remain :

  • It makes sense to update the A register of a D register that has just been written to (to update the destination for the next write, in a string-copy sequence for example). What about the case where an instruction writes to an R register with post-increment ? What is the priority ? Auto-updates were initially meant for address registers only but were later extended. Should this be restricted again ? If so, would that break even more symmetry and create more complexity ?

Right now, the priority is to rewrite the assembler/disassembler and keep the simulator and VHDL up-to-date. My work system is in a bad state and it will take time to get everything back in order.


Why no pre-increment or pre-decrement ?


Pre-modifications were removed because they break the very important rule that an instruction should not trap (or be able to trap) in the middle of the execution pipeline.
In the case of a pre-modified address register, such as MOV -D1, R1, the validity of the new address in A1 is known only after it has been computed, but there is no way to gracefully stop the instruction in the middle, or even restart it. The proper way to do it is to fold the -D1 update into a previous instruction that uses A1 or D1, or simply to emit a short ADD -1 A1 instruction before the actual move to R1.

Remember : all the operands must be directly ready for use (at decode stage) before the instruction can proceed to execution stage.
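
The second workaround (a separate ADD before the move) is mechanical enough to sketch as a source-level rewrite. This is only an illustration: `split_pre_update` is a hypothetical helper, the operand syntax is inferred from the MOV -D1, R1 example and is not the real assembler grammar, and the step of 1 simply follows the ADD -1 A1 example.

```python
import re

# Matches e.g. "MOV -D1, R1": opcode, sign, register kind, number, destination.
PRE_UPDATE = re.compile(r"^(\w+)\s+([+-])([ADR])(\d+)\s*,\s*(\w+)$")


def split_pre_update(line):
    """Split a pre-modified instruction into an explicit ADD plus the plain one.

    Lines without a pre-modified first operand are returned unchanged.
    """
    text = line.strip()
    match = PRE_UPDATE.match(text)
    if not match:
        return [text]
    opcode, sign, kind, number, dest = match.groups()
    # Updating a D register really means updating the matching A register.
    target = ("A" if kind == "D" else kind) + number
    return [f"ADD {sign}1 {target}", f"{opcode} {kind}{number}, {dest}"]


print(split_pre_update("MOV -D1, R1"))
```

The point of the split is that the address in A1 is already final when the move is decoded, so the move itself has nothing left to compute that could trap.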

The previous table was :
              00          01     10     11
        00    NOP         +SI4   SI4+   SI4-
        01    SND+,SI4+   +SND   SND+   SND-
        10    DST+,SI4+   +DST   DST+   DST-
        11    DST+,SND+   +CND   CND+   CND-
The new table reuses the 4 pre-increment entries for three double post-decrements and one triple post-increment.