YASEP news


Monday 29 June 2009

YASEP@HSF2009

On June 26th, I presented a joint project with Laura, called "GPL" (Gaming Platform Libre), at the HackerSpace Festival (HSF2009) near Paris. See http://www.hackerspace.net/gaming-platform-libre

The talk is in French, and the slides are here.

I present the latest thoughts about how cryptographic protection of contents could be compatible with the gamer's and the game editor's freedom and cooperation. Some slides also present the latest updates in the YASEP instruction set.

Friday 24 April 2009

First Layout of a custom FPGA+SRAM board

I have not been fully satisfied by all the boards that I have seen. There are always details that don't match a project or requirements that are not met (size, price, features, whatever). So I finally decided to start my own board(s).

First route of a TSOP-2 SRAM to an A3P125 FPGA in VQ100

It seems that YASEP could easily replace microcontrollers that I already use. The flexibility offered by FPGAs and the ability to strip a thing down to the minimum, then expand on that depending on the needs, make this solution more and more attractive. No difficult selection of features and package (as with fixed-function chips) : just put the FPGA on the board and route the pins...

I can't solder BGA packages, or even build suitable PCBs myself, but I'm already able to make double-sided PCBs that can be fitted with an FPGA in a 100, 144 or 208-pin QFP package. I'll be able to reuse these designs in the future, or make my own cheap modules.

Saturday 4 April 2009

First details of the new "extended" long instruction

A previous post summarised the available "instruction forms", with or without immediate field (4 or 16 bits), with 2, 3 or 4 register addresses. Here we look at the "long form" (32-bit) using the "extended" fields that add 2 register addresses, conditional (speculative) execution and pointer updates.

Let's now examine the structure of the 16 bits that are added to the basic instruction word :

  • One bit indicates if the source is Imm4 (it replaces the corresponding field in the basic instruction).
  • 2 bits indicate a condition (LSB, MSB, Zero, Always) and another bit negates the result (The condition "never" will be used later but I'm not sure how).
  • 4 bits indicate which register is being tested
  • 4 bits indicate the destination register (replacing the src/dest field in the basic instruction)
  • 2 fields of 2 bits each encode the auto-update functions of one source register and the destination register (nop, post-inc, post-dec, pre-dec)

These fields are mostly orthogonal and can work in almost any combination. One can auto-update 2 registers (whether they are normal or belong to a memory access register pair), perform a 3-address operation and enable write-back depending on 97 conditions. It also preserves the availability of short immediate values, which further reduces code size. However it can increase the core's complexity.
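The field list above adds up to exactly 16 bits (1 + 2 + 1 + 4 + 4 + 2 + 2). As a rough sketch in Python, here is how such a word could be unpacked. The bit positions, the field names and the numeric codes of the conditions and update modes are assumptions made for illustration; only the field widths come from the list above. The last function shows one way to arrive at 97 conditions : 3 register tests, each possibly negated, on any of 16 registers, plus "always".

```python
from typing import NamedTuple

# Hypothetical layout (LSB first); only the field widths come from the post.
CONDITIONS = ("LSB", "MSB", "Zero", "Always")
UPDATES = ("nop", "post-inc", "post-dec", "pre-dec")


class Extension(NamedTuple):
    imm4_source: bool   # 1 bit: the source operand is Imm4
    condition: str      # 2 bits: which test is applied
    negate: bool        # 1 bit: negate the result of the test
    test_reg: int       # 4 bits: register being tested
    dest_reg: int       # 4 bits: destination register
    src_update: str     # 2 bits: auto-update of one source register
    dest_update: str    # 2 bits: auto-update of the destination register


def decode_extension(word: int) -> Extension:
    """Unpack the 16 extension bits, assuming the layout above."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("extension word must fit in 16 bits")
    return Extension(
        imm4_source=bool(word & 1),
        condition=CONDITIONS[(word >> 1) & 3],
        negate=bool((word >> 3) & 1),
        test_reg=(word >> 4) & 0xF,
        dest_reg=(word >> 8) & 0xF,
        src_update=UPDATES[(word >> 12) & 3],
        dest_update=UPDATES[(word >> 14) & 3],
    )


def count_conditions() -> int:
    """3 tests x 2 polarities x 16 registers, plus the single 'always'."""
    register_tests = 3 * 2 * 16
    return register_tests + 1
```

The "never" condition (negated "always") is left out of the count since its use is not decided yet.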

One unexpected bonus is that this new architecture iteration is more compiler-friendly. At least, it's much less awkward or embarrassing.

One bit could have been saved : the imm4 flag could be merged in the auto-update field for a source register. However this increases the logic overhead and prevents simultaneous use of auto-update AND imm4.

Stay tuned...

Yet another Instruction Set Architecture change

I wish it could stabilize soon, but at least movement is a sign of activity (or the reverse :-))

I was annoyed by the ASU operations :

  ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX

These instructions were the last ones that used the skip technique, which is being progressively dropped in favor of relative branches done by conditional add/sub to the PC register.

How is it possible to provide the same functionality without skip ? It's the same old question that decades of research have not yet answered definitively. The Carry Flag is the obvious solution but I have just dropped the "status/mode register" in favor of another general purpose register. So where can I find a stupid bit of room ?

The answer is there under my eyes : the LSB of the PC ...

OK OK I know it's ugly. But consider these aspects :

  • The PC points to the next instruction and never uses the LSB because all the YASEP instructions are aligned on 2-byte boundaries.
  • Any write to the PC register modifies the bits 1 to 31. Bit 0 comes from the ASU's carry output.
  • We can declare that only the ASU operations (or context changes) can change the PC's LSB. All the other instructions can read it and test it, so the information is easily available.
  • Since we dropped the 4 instructions that used skip, these "slots" can be filled by other instructions :
 CMPS, CMPU, SMIN, SMAX

CMPx are just like SUB but don't write the result back. I wish it could set the LSB of any register but the current architecture doesn't allow this, so please keep the destination field to PC when encoding the assembly instruction.

3 new instructions deal with signed comparison : CMPS, SMIN & SMAX. They were missing from the previous opcode maps but the elimination of the skip-instructions leaves enough room. I have to update the VHDL now...

  • Keeping the carry bit in the LSB of the PC can have a curious side effect : relative jumps with odd values will make the carry bit ripple to the other bits of the result, so the destination address that is written in the PC will depend on the value of the carry bit. In practice, there is no speed or size advantage (compared to condition codes in the new opcode extension) but the possibility is there...
  • Clearing the carry flag is done with
  CMP Rx, Rx
  • Setting the carry flag is done with
  CMP -1, Rx

(or something like that)
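To make the "curious side effect" of odd relative jumps more concrete, here is a small Python model. It is only a sketch : the function names are made up, and it assumes that the adder sees the whole PC register (carry bit included) as an ordinary operand, which is my reading of the description above and not a statement about the VHDL.

```python
WIDTH = 32                  # the post mentions bits 1 to 31 of the PC
MASK = (1 << WIDTH) - 1


def write_pc(result: int, carry: int) -> int:
    """A write to PC keeps bits 1 and up of the result; bit 0 is the carry."""
    return (result & MASK & ~1) | (carry & 1)


def relative_jump(pc: int, offset: int) -> int:
    """Add an offset to the full PC register, carry bit included."""
    total = (pc & MASK) + (offset & MASK)
    carry_out = (total >> WIDTH) & 1
    return write_pc(total, carry_out)


def target_address(pc: int) -> int:
    """The address used for fetching ignores bit 0."""
    return pc & MASK & ~1
```

With the next instruction at 0100h and an odd offset of 3, a clear carry (PC register = 0100h) lands on 0102h, while a set carry (PC register = 0101h) ripples into bit 1 and lands on 0104h.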

Usually, I would end the post with something along the lines of "this is good and everybody is happy". Now, I feel a bit disappointed that YASEP looks more like other architectures, and has fewer distinguishing features. It is less groundbreaking and it will have to face the same problems as the others, on top of its inherent quirks. But it's still better than nothing and I do my best to keep the system rather coherent and orthogonal.

Thursday 19 March 2009

what about YASEP2009 ?

Development of and around YASEP is going on in a weird way, but it still continues...

Why so much caution ? Because the changes to the architecture are quite deep. The instruction forms are increasingly complex and I've pushed the design beyond what I intended in the beginning.

If you don't remember, YASEP had only two ways to address data previously :

short form :

 Reg1 OP Reg2 => Reg1  (16 bits)

long form :

  Reg1 OP Imm16 => Reg2 (32 bits)

Now a few bits are freed and this gives much more "flexibility", so I added :

Short Immediate :

  Reg1 OP Imm4 => Reg1 (16 bits)

Long Register :

  Reg1 OP Reg2 => Reg3 (32 bits)

And because there was still some room, this last form has more elaborate versions :

Long conditional :

  Reg1 OP Reg2 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

And other versions come up when the Reg2 field is interpreted as Imm4 :

Long conditional short Imm: (excuse the name)

  Reg1 OP Imm4 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

Or without condition :

  Reg1 OP Imm4 => Reg3 (32 bits)

This applies to the computation instructions; the control instructions are still too undefined.

Code density should increase, which is worth the effort. I don't know if it will reach the level of ARM or x86 but it is certainly a major advance. However, this breaks a lot of the assembler's mechanisms, so I prefer to rewrite it. This takes a while because the rest must be adapted too : the Instruction Set, the manual pages, the validators...

If you can't stand the wait, have a look at a recent, broken version at http://yasep.org/~whygee/yasep2009/, at least it is more recent than the main site.

Wednesday 18 February 2009

Listed : the dynamic LISTing EDitor

So I've been busy again...

This time, it's all about JavaScript. The preliminary version is available from http://yasep.org/~whygee/listed/listed.html

What is it really ? It's an interactive assembler in dynamic HTML, loaded with JavaScript and CSS stuff. It's also an interface to the JavaScript assembler and the simulator.

  • The little windowing system allows one to break a whole program into small chunks that are easier to manage. Assembly language listings can easily get messy, but local symbols and hideable sections reduce the usual clutter on one's window/screen.
  • As the user edits each line, the modifications are committed to the rest of the page : the instructions are re-assembled, the labels are updated where they are used, the simulator can reinterpret the sequence and give preliminary results for given testcases...
  • The assembler is not limited to YASEP : the CPU interface is going to be generic, and LISTED could support any CPU that can be described in JavaScript (that means : all, provided enough adaptations are coded). A dummy, overly simple and dumb CPU architecture will be given as an example, so somebody can easily adapt it for x86, PIC, Alpha, MIPS, POWER, or RCA1802 ...
  • This is going to be linked directly with ARF, which is another graphic coding interface.

I have been working on this for more than 3 weeks and a lot of work still remains. I focus on user comfort and UI design but I keep flexibility and expandability in mind. For example, I have developed YGWM to handle the windowing part, which will be reused by the whole yasep.org website. The assembler and simulator will remain completely decoupled.

In the end, it only confirms what I believed for some time : JavaScript is a fantastic opportunity for really new ideas, it provides portability and rapid design. However, after trying to make it compatible with different browsers, my strong recommendation is : use Firefox and stick to it.

Friday 23 January 2009

YASEP2009 : "It's gonna be big"... when it comes

The YASEP architecture has changed so much that a big rewrite is necessary.

My local copy is so... broken here and there that I prefer to not update yasep.org. The modifications are so deep that it's not possible to just patch a few things.

The organisation of the website should evolve a lot and I'm thinking about new techniques.

The documentation must be partially rewritten, not simply updated here and there.

Today's site structure dates back to 2006, maybe the big rewrite is a good thing in fact.

However, this is so much work, and my concentration is so volatile, that I wonder when the website will be updated with something stable enough to be almost publishable. In fact, I'd rather not wonder, the answer would scare me. Anyway, I see that many efforts I have made in the past years have been fruitful and helped build the project as it is now. So I keep faith and continue.

Monday 19 January 2009

Yet another new Actel toy \o/

As you may know, YASEP16 will probably be used in my girlfriend's "pet project" Ours Agile. This involves lots of real-time computations, countless sensors and more than 30 actuators... Sure, YASEP could handle that, probably. But the interfacing was giving me headaches : so many analog components (on top of high-speed memory) seem expensive and/or difficult.

Then I spotted a second-hand AFS600 evaluation kit from Actel, that I got for a fair price. It was a bit risky and I first thought it was broken. But since it's 2nd hand, somebody has probably played with it, and just uploaded a new configuration bitstream. With the help of a French rep., I found and uploaded the original demo bitstream and ... Magic happens !

Actel AFS600 eval board plugged

This FPGA family comes at "premium price" but it's a damn great opportunity for robotics projects :

  • 512KB of program space as Flash EEPROM (no need to download from external SPI !)
  • onchip 100MHz RC clock generator (exactly what I'm aiming at !)
  • RTC, temperature sensors, low power...
  • high-speed 30-channel ADC !
  • several integrated MOSFET gate drivers
  • 13K tiles vs 6K on the A3P250
  • 24 SRAM blocks vs 8 on the A3P250

This is definitely a great toy for robots...

Tuesday 6 January 2009

Evolution of the instruction set

As the execution units mature and get integrated as one block, things become clear, at least concerning the computation instructions. I'm currently focusing on the 16-bit flavour of YASEP and I expect that the following will hold true for YASEP32.

The ALU16 is nearing completion, though feature creep is still rampant. But I have identified a bunch of instructions that will not change much in the future, and they are gathered here :

- ROP2 : AND, OR, XOR, ANDN, ORN, XNOR, NAND, NOR
- ASU : ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX
- SHL : SHR, SHL, ROR, ROL, SAR  + MUL : MUL8L, MUL8H, MULINIT
- IE : MOV, SB, LSB, LZB (16/32b) SH, SHH, LSH, LZH (32 bits only)

This nice and square table represents the large majority of the used instructions, and this fits into 4 groups of 8 instead of the planned 8 groups. So...

This saves a bit that is used to encode other addressing modes. In 2008, there were 2 modes : short mode (RR) and long mode (RRImm16). Now, it is also possible to encode a short immediate in the short mode (RImm4, the register is replaced by a value), or use another register as a destination in the long mode (but 12 bits are unused).

Yes there are now 4 addressing modes and most code should see its binary size shrink ! Furthermore, the datapath complexity is not impacted and the 3-register version should reduce the number of cycles for a given portion of code.

How this affects usual code :

- add 1, r1 ==> r1 += 1

now takes 2 bytes instead of 4. The constant can range from -8 to +7.

- add r1, r2, r3 ==> r1 = r2 + r3

It takes 4 bytes as previously but it saves 1 clock cycle, compared to

- mov r2, r1
- add r3, r1
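The -8 to +7 range is what a signed 4-bit field gives. A minimal Python sketch of how an assembler could decide between the 2-byte and the 4-byte encoding of an "add constant" follows; the function names are hypothetical and the two's complement encoding of Imm4 is an assumption (only the range is stated above).

```python
def sign_extend_imm4(field: int) -> int:
    """Interpret a 4-bit field as a signed value in -8..+7 (two's complement assumed)."""
    field &= 0xF
    return field - 16 if field & 0x8 else field


def fits_imm4(value: int) -> bool:
    """True when the constant can use the short immediate form."""
    return -8 <= value <= 7


def add_immediate_size(value: int) -> int:
    """Bytes needed for 'add value, r1': short Imm4 form or long Imm16 form."""
    return 2 if fits_imm4(value) else 4
```

So "add 1, r1" and "add -8, r1" take 2 bytes each, while "add 8, r1" still needs the 4-byte long form.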

Note that the yasep.org site is not yet updated, I'll wait until things settle down.

Thursday 1 January 2009

Barrel Shifter : SHL16 ready

Hello and Happy New Year Everybody !

I took some time to work on the next major building block of the YASEP16 execution unit : the shift/rotate unit is now ready in 16-bit flavour.

I concentrate now on YASEP16 because it is smaller and marginally faster, and consumes less bandwidth. It can fit easily in the A3P250 and its 6K 3-input tiles, though I don't know how many tiles are needed in the end.

SHL_16 uses about 220 tiles, and Actel's place&route estimates the unit to run at 140MHz in pipelined version. This is slightly faster and smaller than ASU_ROP2 that performs Add/Sub and boolean operations (115 MHz and about 350 tiles). The overall ALU (ASU_ROP2 + SHL + IE) is going to take roughly 700 tiles, or 1/8th of the A3P250's surface. Speed is looking satisfying, as I intend to clock the thing at 96MHz on the ACME boards (64MHz * 1.5 with the PLL).
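The figures above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic : "6K" tiles is taken as 6 × 1024 (an assumption), and the 15 to 20% derating is the margin mentioned in the note at the end of this post.

```python
A3P250_TILES = 6 * 1024        # "6K" tiles, taken here as 6144 (assumption)

alu_tiles = 700                # rough estimate for ASU_ROP2 + SHL + IE
alu_share = alu_tiles / A3P250_TILES   # a bit more than 11%, in the region of 1/8th

pll_clock_mhz = 64 * 1.5       # target clock on the ACME boards

# Slowest unit quoted above (ASU_ROP2), derated by the 15 to 20% margin.
asu_rop2_mhz = 115
derated_low = asu_rop2_mhz * (1 - 0.20)    # about 92 MHz
derated_high = asu_rop2_mhz * (1 - 0.15)   # about 98 MHz
```

The 96MHz target sits inside the derated window, so it is tight but plausible.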

Overall, the following operations are ready for the 16-bit flavor :

  • ASU : ADD, SUB and compares as side effects.
  • ROP2 : AND/OR/XOR/NAND/NOR/XNOR/ANDN/ORN as well as comparison for equality (XOR followed by a OR reduction tree)
  • SHL : SHR/SHL/ROR/ROL/SAR
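The equality test mentioned for ROP2 (XOR followed by an OR reduction tree) can be modelled in a few lines of Python. This is a behavioural sketch with made-up function names, not a transcription of the VHDL : each loop iteration stands for one level of the tree.

```python
def or_reduce(value: int, width: int = 16) -> int:
    """OR all bits together, one tree level per iteration (width must be a power of 2)."""
    value &= (1 << width) - 1
    span = width
    while span > 1:
        span //= 2
        # Fold the upper half onto the lower half: one level of the tree.
        value = (value | (value >> span)) & ((1 << span) - 1)
    return value


def equal16(a: int, b: int) -> bool:
    """Two words are equal when no bit of (a XOR b) is set."""
    return or_reduce(a ^ b) == 0
```

A 16-bit comparison needs 4 levels of 2-input OR after the XOR, or fewer levels with wider gates.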

The next part to be developed is the IE (Insert/Extract) unit, for the loads and stores of bytes into a half-word. Stay tuned...

Note : some P&R runs give a bit higher working frequencies but I reserve 15 or 20% of margin, since I expect that all the units put together will need even more MUX2 all over the place, longer wires etc. resulting in slower operation. Furthermore, it is only YASEP16 yet, and the 32-bit flavor will double the design's size...

Friday 19 December 2008

How to double the SRAM capacity of a FPGA board ?

The FoxVHDL and the Colibri boards from ACME Systems come with 2 SRAM chips of 512K Bytes, so one application can benefit from one megabyte of 32-bit low-latency access. But even 1 megabyte may be too small for some uses. Some time ago, I found a way to extend the capacity : piggy-back soldering of another SRAM chip.

2 FoxVHDL FPGA boards from ACME Systems (modified by YG)

To keep the chips identical and avoid timing imbalance, I had to take the SRAMs from another board, but it is not a concern since this second board will be used for some purpose that does not need SRAMs.

I should stress of course that not only unsoldering, but also re-soldering is difficult, but it went well, thanks to special, adapted tools.

Of course, there is a trick : memory is not simply expanded this way. One has to reserve a new address bit, or both memories will be mapped to the same addresses. I have chosen to not connect the Chip Select pins of the additional chips, so they will be wired later to another unused FPGA output.
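Here is a minimal Python sketch of the address split that the extra address bit implies. It assumes 256K-word chips (as on the FoxVHDL board) and that the new bit sits just above the existing address lines; the actual pin and bit assignment is not decided in this post, so take the names and constants as illustration only.

```python
from typing import Tuple

CHIP_WORDS = 256 * 1024     # 256K words per chip (assumed from the board's SRAMs)
OFFSET_BITS = 18            # log2(256K)


def select_chip(address: int) -> Tuple[int, int]:
    """Split a word address into (chip index, offset inside that chip)."""
    if not 0 <= address < 2 * CHIP_WORDS:
        raise ValueError("address outside the doubled memory")
    return address >> OFFSET_BITS, address & (CHIP_WORDS - 1)
```

Chip index 0 would drive the original Chip Select and index 1 the Chip Select of the piggy-backed chip, through the spare FPGA output.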

Two SRAM chips soldered on the footprint of one

If you want to attempt this hack on your board (whether ACME's or any of the other FPGA boards with static RAM), don't forget that adding pins on a bus adds capacitance, and slows the signals. The clock frequency won't be as fast as before, so run extensive tests to ascertain the new working parameters.

One way to keep the frequency high is either to use a larger SRAM chip (like Cypress or IDT 512Kx16 or 1Mx16, but they are difficult to find and expensive), or faster SRAMs : ACME uses 12ns chips, but other compatible chips are available with 10ns and 8ns access times (try Farnell). Also, you can control the rise/fall times with the I/O current options of the ProASIC3 pads, they can be set to several values.

Next step : using even higher-frequency, synchronous Static RAM, because they have a much higher bandwidth. However I don't know yet how to control the tight timings...

Thursday 18 December 2008

Site update, architecture modifications, and new FPGA boards

I recently got 3 Colibri boards ! When you think about Italy, you think Ferrari and other excellent things, now I'll also think prototyping boards ;-)

Thanks to ACME systems, I bought 2* A3P250 and one A3P1000 boards for a friendly price. These are pre-series units and may slightly differ from later versions, but they are really as cool as the pictures let you think.

3 prototype Colibri boards from ACME Systems

The website is also updated : the JavaScript engine is now mostly functional for YASEP16 and YASEP32 versions. The documentation is not updated and many dark corners remain in the architecture definition. I have chosen to publish the latest versions, since I don't know when I'll do this next time.

Thursday 11 December 2008

The new Colibri board is almost here !

I just received ACME Systems' invoice for some of their new Actel-based boards. So I went to their website and noticed that it has been updated : ACME Colibri board

It is lighter, better and slimmer than the FoxVHDL board, as it seems that the VGA output was rarely used. The optional composite encoder seems to have been even less used, and it used some space. So the Colibri should be a bit cheaper too :-)

I don't know when I'll get mine, but I ordered both the A3P250 and A3P1000 versions.

Monday 1 December 2008

YASEP is published under the AGPL

Recently, I have finally chosen the licence for the YASEP project : it's the Affero GPL as published by the Free Software Foundation.

It is practically the same as the GNU GPL but with one interesting twist : You have to provide all the (derived) source code if you use it on a server.

For the YASEP project, it's not a problem because all the "intelligence" is provided by client-side JavaScript code, and the rest is static or dynamic HTML (not server-generated pages). However, using the AGPL is a clear sign that YASEP is not just a bunch of RTL files packed with documentation pages. It is a living, dynamic, organic set of files that interact with each other...

Also, I would like potential contributors to keep the structure of the files and directories, so the whole archive remains available to anyone visiting the sites. YASEP directly provides the link to the current archive on the main page, and I believe that this is a good thing that others will do in the future.

Concerning the VHDL source code : since the only difference between AGPL and GPL is the server clause, well, I distribute the RTL files with AGPL too. One licence to rule them all...

YASEP2009 in preparation

A new big update of the YASEP website package is under development. Several improvements are already done but not uploaded yet :

- I corrected a small "bug" with Opera with the floating window (thanks to Laura for the help !)

- I added several pages, about Special Registers, the AGPL licence, the differences between YASEP16 and YASEP32... And a new VHDL directory appeared.

- The opcodes are undergoing major changes, too many to explain here

- The architecture abandons the CQ register but the documentation is not yet updated...

I hope that the package will reach stability in early January. There's a lot of work to be done...

Friday 14 November 2008

Open Graphics board needs more preorders !

I got a nice contact with the Open Graphics project :

http://www.traversaltech.com/store.phtml

http://www.opengraphics.org

http://www.openhardwarefoundation.org

They need more than 40 preorders before they can do the first batch of boards ! With the biiiig FPGA on this, helped with the fat and fast SDRAM, it's not just a good candidate for a graphics card, it's a dream for CPU designers !! And look at those nice extension connectors...

It's going to cost roughly $1500. If I had the money, I'd buy it right away. However I'm broke AND the free Xilinx software tools don't seem to work for this large FPGA. Again, I'll have to do what I have done for years : wait...

Building momentum

A few days ago, I was contacted by a teacher from a French research laboratory (http://www.femto-st.fr/) who wants to integrate a softcore into a Xilinx FPGA. He tried to integrate a LEON into a 200k gates array without success. Knowing that I worked on F-CPU and now on something else, we started to talk. I had estimated that YASEP-16 could fit in one half of a 250kG Actel chip.

Now it seems that his student is starting to dive into the whole mess that I've accumulated on http://yasep.org :-D It's very intriguing because I intended YASEP to be a 1-man project. I'll have to slowly give up on this idea... But this is good because it can only get better : external points of view can spot inconsistencies or weak points, test assertions that I thought valid...

It's getting quite interesting now and I am even more motivated and excited ! YASEP is slowly growing, it's not just a little personal hack anymore. Until now, I was alone on board, even when http://ours-agile.org asked for the core. But other people now look deeper at the source code...

Sunday 26 October 2008

No news, good news ?

So I've been busy.

I've spent most of the summer developing YASEP and now I'm almost broke, so I'm hunting more "mundane" activities (of the kind that will maybe help feed my geekette and myself).

Fortunately, before I blew up my savings, I was able to buy toys that will be useful "in the future", like those über-sexy 4Mx36bit synchronous SRAMs that clock at 225MHz (unfortunately, available only in BGA, but I've grown up recently)...

But seriously, I'm stuck :-(

I would LOVE to spend all my time developing YASEP, because... well it's so easy now, and everything is coming together at last ! I would just think about it, hack a bit more and get new results... Furthermore, I receive some very positive feedback now, particularly from ACME systems. Their new FPGA board (called COLIBRI and equipped with an A3P1000) is almost ideal for a 32-bit YASEP ! I can even swap the 12ns SRAM with 8ns versions and get a boost from 66MHz to 100MHz...

As a side note, I have just found how to integrate the UMIN/UMAX instructions in the FPGA implementation, without bloating the pipeline.

So I'm OK but it could be much better, in a less material and pragmatic world :-/


PS : Not everything is bleak : after last year's breathtaking series of concerts, I joined Satine's new project "Satine ünder philharmonëën", which involves 40 classical musicians. I'll add LED lights all over the stage !

Look at the singer in this video : Mia wears a blinking LED earring that I designed especially for her last year. If you're interested in a custom version, don't hesitate to ask me !

I know, it's not related directly to CPU design, but the new jewel I'm preparing will have 8 cores...

Monday 22 September 2008

A suitable HW platform for CPU design ?

My order to Lextronic.fr is delivered : I purchased a couple of "FOX VHDL" boards from http://www.lextronic.notebleue.com/P1792-platine-dextension-fox-vhdl.html

The FoxVHDL board (image courtesy of Acme Systems, Italia)

This small board has all I need, and only that, for YASEP : an A3P250 FPGA, and a couple of 256Kx16 SRAMs at 12ns. More information can be found at http://www.acmesystems.it/?id=120 and a new version of the board will appear soon ! (without the VGA connector that takes a lot of space) Even at 100 Euros each, with only 1MB of memory, this is an excellent board with a lot of potential !

Only problem : the SRAM has "only" 12ns of access time. YASEP is aimed at 100MHz. So either I run at 64MHz only (too bad) or I find 8ns versions of the SRAM chips. I've spent lots of effort in the 2nd option and it is not fruitful yet. The 12ns parts are so much easier to get !

I could try to play tricks with the pipeline gates (doing "wave pipelining" in the memory chip) but it's very risky. Well, it would be better to make my own board... with those crazy 200MHz synchronous SRAM chips that I also recently found. But that will come later : It's better to use the Fox VHDL boards first, write the VHDL code, and later only make a custom board.

In the meantime, I should get accustomed to the use of my new shiny hot air rework station. BGA chips are soon going to be solderable :-)

Monday 18 August 2008

New register organisation

The architecture of YASEP is very unorthodox. It is a living experiment and evolves in many unexpected directions.

However, one known uncertainty has always been how to implement the instruction fetch mechanisms. The memory queues have been a guideline but no organisation has been tested yet, and validated. Back when VSP emerged from the chaos of my brain, I wanted to use one of four queues to fetch instructions, and to indicate the current queue in the 2-bit CQ register. This idea was already implemented in the RCA1802 processor (the 4-bit P register) but this adds some overhead (and YASEP's instruction stream was never meant to be ultracompact).

Funny : I find more and more common traits between YASEP and 1802 :-)

The CQ register (just like the COSMAC's P register) also slows down the core as a whole cycle is necessary to fetch the opcode from the queue. This goes against the idea of a pipelined processor, the pipeline being the implementation of a sequential principle and sequence occurs a lot in an instruction flow.

However, the availability of several queues as potential pre-cooked jump destinations (address as well as corresponding data) is very interesting and this remains in the YASEP architecture. A jump instruction with a direct immediate address remains possible but with some (future and planned) architectures, there is the risk of a high execution latency.

I recently came to the conclusion that a compromise between the completely weird and the classical approaches is necessary.

So I keep the memory queues but the first one is modified and assigned to the instruction pointer and a status register. I had sworn that I would never do that, but I'm forced to admit that in a sense, and in the current situation (where no cache can support parallel memory accesses) something "looking like that" is necessary. And I'll do my best to avoid the inherent traps !

First, why do I need the registers #0 and #1 to hold these values ? In the currently planned first implementation, I can use a bank of 512 registers, or 32 banks of 16 registers. This means that context swapping can be very fast (1 major cycle) and I need to save a lot of information at once. If this information is stored in the SR space (as previously planned), more cycles are needed to save/restore the "whole" context. So the best place to store this critical information is in the register set itself. I could have chosen to create another parallel register bank but this would consume too much memory. The availability of the "Current/Next IP" is also very useful for computing addresses in position-independent code.
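The "32 banks of 16 registers" view boils down to a simple index computation, sketched here in Python. The names are hypothetical and the sketch assumes that the bank number simply forms the upper bits of the physical register address, which is not spelled out in this post.

```python
REGS_PER_BANK = 16
BANKS = 32


def physical_index(bank: int, reg: int) -> int:
    """Map (context bank, 4-bit register number) to one of the 512 physical registers."""
    if not 0 <= bank < BANKS or not 0 <= reg < REGS_PER_BANK:
        raise ValueError("bank or register number out of range")
    return bank * REGS_PER_BANK + reg
```

Under this assumption a context switch only changes the bank number, and since registers #0 and #1 live inside each bank, every context carries its own copy of them.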

So the new register map is :

0h: IP (replaces A0)
1h: ST (replaces D0)
2h: A1 \ Q1
3h: D1 /
4h: A2 \ Q2
5h: D2 /
6h: A3 \ Q3 
7h: D3 /
8h: A4 \ Q4
9h: D4 /
Ah: A5 \ Q5
Bh: D5 /
Ch: R0
Dh: R1
Eh: R2
Fh: R3

Second: what does the Status Register contain ? Of course, I avoid the storage of carry flags and such. But I can't avoid the auto-update bits of the 5 remaining queues. They use 2x5 bits and 6 bits are still unassigned (for how long ?). These two bits per queue represent the following codes :

00 : no update
10 : post-incrementation
11 : post-decrementation

2 queues are able to implement a normal stack (LIFO) and 2 additional bits represent this ability. So Q4 and Q5 have the following properties in the Status Register:

bit N   : update on/off
bit N+1 : update up/down
bit N+2 : stack on/off (pre/post modification)
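As a small illustration of the auto-update codes, here is a Python sketch of what happens to a queue's address register after one access. The codes are taken exactly as listed above ("01" is not listed, so the sketch rejects it); the step of 1, the 16-bit wrap-around and the function name are assumptions. The stack bit of Q4/Q5 is not modelled.

```python
# Codes exactly as listed in the post; "01" is not listed there.
UPDATE_CODES = {"00": "no update", "10": "post-increment", "11": "post-decrement"}


def apply_update(address: int, code: str, step: int = 1, width: int = 16) -> int:
    """Return the queue's address register after one access."""
    mode = UPDATE_CODES.get(code)
    if mode is None:
        raise ValueError(f"update code {code!r} is not defined in the post")
    mask = (1 << width) - 1
    if mode == "post-increment":
        return (address + step) & mask
    if mode == "post-decrement":
        return (address - step) & mask
    return address
```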

Third: These registers are not really real registers. They are "shadowed" registers with a physical instance copied somewhere else. This is necessary because the register set can't have enough ports and these 2 specific registers are critical and accessed every cycle. Their incorporation in the register map makes them persist easily through context switches and IRQs, and makes them easy to alter (without going through get/put instructions), but the register bank is updated only when these new registers are accessed. Some new datapaths must be reserved for them.

sigh...

This means that most of the opcode map (the part with the jump instructions) must be redesigned.

re-sigh...
