YASEP news


Monday 19 November 2012

YASim is working

Another quick announcement : YASim, the YASep Simulator, is now able to compute data, loop and call, and even display pixels on the screen.

It had to be said ;-) Particularly since the presentation is so soon...

Thursday 15 November 2012

The YASEP goes to Berlin

It's official : the YASEP is invited to the Exceptionally Hard & Soft Meeting held in Berlin on December 28-30, 2012.

A 30-minute talk will present the project. It's shorter than at JM2L but it will be entirely in English. It will also be the official transition between YASEP2011 and YASEP2013.

Hopefully, better live demos will be possible by then ;-)

Thursday 4 October 2012

YASEP@JM2L2012

It's official !

The YASEP will be presented for two hours on Saturday, Nov. 24th at Sophia Antipolis, during the Journées Méditerranéennes du Logiciel Libre organised by Linux Azur.

Information is available at http://jm2l.linux-azur.org/content/2012/yasep

The slides will be published during the presentation and will start the transition between the YASEP2011 and YASEP2013 architectures.

There is still a lot of work but it's looking good so far. See you there !

(updated 2012/11/15)

Monday 24 September 2012

Virtual Load and Store

 

The Instruction Set Manual is nearing completion.

As if things weren't serious enough, I'm now reviewing the Load and Store instructions. And since there is none, I removed them.

Confused ? It's normal : the YASEP has no load or store operation. But there were opcodes named after them, because I thought they would make people feel more comfortable. Which was not the brightest idea ever.

So I removed them and kept the aliases I had created : the Insert and Extract opcodes. As the name says, they insert and extract bytes or half-words (8 or 16 bits).

So far the YASEP had these opcodes : IB IH IHH ESB EZB ESH EZH

End of story ? NO !

 

Register set ports

The review of the Instruction Set Manual unearthed some concerns I had had for a while : the "Load" and "Store" instructions get auxiliary data from other implicit registers. Which can create quite an electronic mess... And it's not flexible enough. It was created for the sake of loads and stores, using the "register pair" system where the Address register provides its LSB to the shifting unit, so it knows how much alignment is necessary.

While reviewing the ESB / EZB instructions, I noticed that it was a one-read, one-write instruction (in FORM_RR or FORM_IR). What a waste of coding space ! Let's make it an RRR or IRR instruction and get rid of the implicit operand that must be fetched from the other registers. Easy, and architecturally much better.

But the Insert instructions (IB and IH) are another beast, they need 3 operands and a destination. The first operand is the register that contains the data to insert, the second is the word that will receive the data, and the 3rd operand is the implicit shift count that comes from the corresponding A register if a D register is written...

It's not cool. First there is this rule, that I thought was critical, of having only two read ports for the register set. Second, splitting the register set for performing parallel reads is a trick that can backfire mercilessly later. Going 3reads-1write would be great... At what cost ?

But wait, the YASEP is already 3reads-1write because there are the conditions to read. The microYASEP implements this by passing both halves of the extended instruction through the 2-reads register set. And there is one result that remains unused during the second cycle... That's it !
 
So in practice, despite the limitations, the microYASEP is a 4-reads 1-write engine. 1 read for condition, 3 reads for operands and the destination register is one of the operands. Great. Now let's create the new flags "READ_DST3" and "IMM_LSB" and change a lot of the code that is already around... Lots of work indeed.
 
 

The return of the carry flag

The YASEP has a carry flag and a zero flag so let's put them to good use.

One concern of the YASEP32 when inserting and extracting unaligned half-words, is the case when the offset is 3 bytes : only one byte is available in this case. This means that the result is partial and not working as expected. The simulator sends an error message but that's not very handy. The assembly code that detects this case could take at least a pair of instructions, then there is the leftover to align...

The little stroke of genius is to change the carry flag when such a situation occurs. Then, for the user's code, it's just a matter of a few conditional instructions to increment the pointer and store the remaining byte at the new address (or just skip the alignment sequence). Something like

 ; Unaligned write :
; memory is pointed to by A2:D2,
; the code stores 16 bits located in R1
IH A2 R1 D2 ; set the carry flag if A2's LSB are 11

EZB 1 R1 R2 CARRY ; If out-of-word then
; extract the high byte to temporary register
ADD 1 A2 CARRY ; point to the next byte
IB R2 A2 D2 CARRY ; now D2 points to the next word
; and its value, R1's high byte is inserted
; in D2's lower byte

It's a bit far-fetched but it's still RISC.
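
To make the idea concrete, here is a minimal Python sketch of the unaligned 16-bit store described above. It is only a model under stated assumptions, not the YASEP's actual logic : the function names are hypothetical, bytes are assumed to be numbered little-endian inside a 32-bit word, and the carry is assumed to be set exactly when the byte offset is 3.

```python
MASK32 = 0xFFFFFFFF

def insert_halfword(word, half, offset):
    """Model of a 32-bit IH : put the 16-bit `half` into `word` at byte `offset`.

    Returns (new_word, carry). The carry is True when the offset is 3,
    in which case only the low byte of `half` fits in this word.
    """
    shift = 8 * (offset & 3)
    field = (0xFFFF << shift) & MASK32            # bits that get overwritten
    new_word = (word & ~field & MASK32) | (((half & 0xFFFF) << shift) & MASK32)
    return new_word, (offset & 3) == 3

def insert_byte(word, byte, offset):
    """Model of IB : put the 8-bit `byte` into `word` at byte `offset`."""
    shift = 8 * (offset & 3)
    return (word & ~(0xFF << shift) & MASK32) | ((byte & 0xFF) << shift)

def store_halfword(memory, address, half):
    """Store 16 bits at a byte address in a list of 32-bit words."""
    index, offset = divmod(address, 4)
    memory[index], carry = insert_halfword(memory[index], half, offset)
    if carry:
        # Out of word : the high byte goes to byte 0 of the next word.
        memory[index + 1] = insert_byte(memory[index + 1], half >> 8, 0)

memory = [0x11111111, 0x22222222]
store_halfword(memory, 3, 0xBEEF)
print([hex(w) for w in memory])  # ['0xef111111', '0x222222be']
```

The conditional branch in `store_halfword` plays the role of the three CARRY-conditional instructions in the listing.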

 

Is IHH still needed ?

The instruction "Insert Halfword High" is a special case of IH that was created to supplement MOV so the user could overwrite the high half-word of a register. The intended use was this:

 ; put 12345678h in R1
MOV 5678h R1 ; LSB
IHH 1234h R1 ; MSB

A special opcode was needed because IH would get the shift from one of the A registers, or use 0, yet we needed to shift by 2 bytes...

Now we have a new IH that gets the shift amount from a register or an immediate field so things would be great in theory. In practice, there can be only one immediate number so IH is ruled out.

What we need is a MOV instruction that can shift the immediate value by 16 bits. So here we have it : MOVH. The instruction sequence is modified a bit :

 ; put 12345678h in R1
MOVH 1234h R1 ; MSB
OR 5678h R1 ; LSB

YASEP2013 looks better and better...
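
For the record, the arithmetic behind the MOVH / OR pair can be checked with a few lines of Python. This is only a sketch : `movh` and `or_imm` are hypothetical helper names, and MOVH is assumed to write the immediate shifted left by 16 bits with the low half cleared, which is why it has to come first.

```python
def movh(imm16):
    """Assumed behaviour of MOVH : immediate shifted into the high half-word."""
    return (imm16 & 0xFFFF) << 16

def or_imm(reg, imm16):
    """OR a 16-bit immediate into the register's low half-word."""
    return reg | (imm16 & 0xFFFF)

r1 = or_imm(movh(0x1234), 0x5678)
print(hex(r1))  # 0x12345678
```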

Monday 10 September 2012

el YASEP en español

The YASEP site is already 3/4 available in French (some less important material is not translated) and the language picker and all the machinery behind it are working well so... we started to translate a first page into Spanish ! Thanks Kris !

Other languages are welcome and desired : if you want to help, a page explains how translations work.

Friday 10 August 2012

The Zero status flag

Things are getting quite messy in the architecture now. Sorry for the long rant !


It started with the Carry flag. I turned the problem over in every direction and from every perspective, and couldn't find a proper way to deal with it. The Golden Rule of all modern architectures is : no f****** carry flag or status register. MIPS has an overflow trap, F-CPU has a 2-reads-2-writes instruction. But the YASEP can't afford this luxury...


I thought I solved it with a dirty trick : storing the carry flag in the Program Counter's Least Significant Bit (PC's bit #0, to make the number odd or even). But it was really too ugly (I want this critical bit for other purposes later) so I moved the carry bit somewhere else, out of the register set.

It's still messy but... OK, it still works. For example, if an exception occurs, it's still easy for the same HW thread to save and restore the Carry flag's value. It's still longer than the single byte needed to save all the flags of an x86 CPU (PUSHF) but hey, we're RISC aren't we ?...

; save Carry to R1 :
mov 0 R1
mov 1 R1 CARRY
; restore carry from R1 :
add -1 R1

This works for a few entangled reasons. The carry flag is updated by only a handful of adder-specific instructions. This means that it would not be updated by mundane MOV instructions, for example, if the register set is manually saved or restored. The PC's bit #0 can also host the carry flag when it is automatically saved in hardware for an exception handler (that is, if I totally forget the fact that it's SMT and the handler could simply execute from another hardware thread).
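
The save/restore trick can be modelled in a few lines of Python. It is a sketch under assumptions : registers are taken as 16 bits wide, the helper names are hypothetical, and ADD is assumed to set the carry to the adder's carry-out. Adding -1 (FFFFh) to R1 overflows exactly when R1 is 1, which recreates the saved flag.

```python
MASK16 = 0xFFFF

def add16(a, b):
    """16-bit add returning (result, carry_out)."""
    total = (a & MASK16) + (b & MASK16)
    return total & MASK16, total > MASK16

def save_carry(carry):
    r1 = 0            # mov 0 R1
    if carry:         # mov 1 R1 CARRY
        r1 = 1
    return r1

def restore_carry(r1):
    _, carry = add16(r1, -1)   # add -1 R1
    return carry

for flag in (False, True):
    print(flag, restore_carry(save_carry(flag)))
```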

And before the carry stuff, there was no such flag in the early YASEP/VSP, true to the Holy Dogma. There were carry-generating versions of ADD and SUB, where ADDC and SUBB would set the destination to 0 or 1 depending on the result's overflow. The problem is that unlike most RISC architectures, the YASEP does not have a lot of available registers. ADD and SUB are often used to compare values' magnitudes and the results require a temporary register. Since the YASEP has only 5 "normal" registers, it means that 20% of that space gets screwed up. Compare that to 3% for MIPS...

ADDC and SUBB were thus replaced by CMPU and CMPS around 2009. They work by adjusting sign bits, subtracting, and not writing the result, so a register is saved. The result goes to the carry flag, which is read by a new condition code.
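
The "adjusting sign bits" part deserves a small illustration. The Python sketch below shows the classic trick : flipping the sign bit of both operands turns a signed comparison into an unsigned one. The function names and the flag polarity (flag set when the first operand is smaller) are assumptions made for the example, not a statement about the real YASEP encoding.

```python
def cmpu(a, b, width=16):
    """Unsigned compare : True when a - b would borrow, that is a < b."""
    mask = (1 << width) - 1
    return (a & mask) < (b & mask)

def cmps(a, b, width=16):
    """Signed compare built on the unsigned one by flipping both sign bits."""
    sign = 1 << (width - 1)
    return cmpu(a ^ sign, b ^ sign, width)

# FFFFh is 65535 when unsigned but -1 when signed.
print(cmpu(0xFFFF, 1), cmps(0xFFFF, 1))  # False True
```

No destination register is involved : only the flag comes out, which is the whole point of replacing ADDC and SUBB.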

...

OK, so now the YASEP has a Carry flag. It has found a nice place in the condition codes map. But wait, there is another condition bit available...

At first, I thought that it could be useful as an "overflow" flag (that is : if the result's sign was different from the operands' signs). But to this day, I have yet to find a case where it is useful. Furthermore, the CMPS and CMPU instructions already deal with the operands' types.

Another useful bit would be a flag that indicates whether a critical section (opened by the CRIT opcode) had been aborted. This is the quintessential flag because it is totally context-dependent and write-only. No need to save it or restore it : if a trap occurs inside a critical section, just flush the bit and hopefully the critical section will be restarted by the application.

But I got lazy. Or, instead, I programmed real code and found that I would be happy with a flag that indicates if the last result was zero.

I got lazy and wrote this in the first microYASEP VHDL code :

  if WB_en='1' and FlagChangeCarry(int_opcode)='1' then
    Carry <= Carry_out;
    FlagZero <= zero_out;
  end if;

Yes, I realise just now that it gets updated at the same time as the Carry flag. Worse : I see only now that the zero flag gets its value not from the Adder's result, but from the binary difference of both operands.

  zero_out <= '1' when ROP2_xor=(ROP2_xor'range=>'1')
         else '0';

It's a nice trick because it doesn't increase the critical datapath for the adder : the big combining ANDN (with 16 or 32 input bits) is computed in parallel with the carry chain of the adder. But then it means that the Zero flag makes sense ONLY with the CMPU/CMPS instructions, not ADD or SUB... whereas updating it after ADD or SUB is the standard and expected behaviour in most other architectures :-/
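
Here is a small Python model of why this flag only makes sense for comparisons. It rests on an assumption that the VHDL excerpt does not spell out : ROP2_xor is taken to be the XOR of the first operand with the inverted second operand (as prepared for a subtraction), so it is all ones exactly when both operands are equal.

```python
def zero_from_xor(a, b, width=16):
    """Flag derived from the operands only, never from the adder's result."""
    mask = (1 << width) - 1
    rop2_xor = (a ^ ~b) & mask      # assumed : second operand is inverted
    return rop2_xor == mask         # all ones <=> a == b

def add_result_is_zero(a, b, width=16):
    """What a conventional Zero flag would report after ADD."""
    mask = (1 << width) - 1
    return ((a + b) & mask) == 0

print(zero_from_xor(5, 5), zero_from_xor(5, 6))                    # True False
print(zero_from_xor(1, 0xFFFF), add_result_is_zero(1, 0xFFFF))     # False True
```

The second line shows the mismatch : 1 + FFFFh wraps to zero, yet the XOR-based flag stays clear because the operands differ.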

But wait, in the above VHDL code, FlagZero is also updated for ADD and SUB ? Now this could possibly explain certain curious bugs I had...

...

OK, this is messy now. But it's not finished.

What I would ideally want is to have the zero flag updated any time a register is written, too. This corresponds to tying a big OR to the write bus of the register set. This potentially adds some significant latency to the pipeline. But... Is that useful ? If the result is written to a register then this register can be tested anyway with the usual conditions.

So really, the Zero flag is useful only for CMPU and CMPS, that indeed are the only computation instructions that don't write back the result. The VHDL code must be corrected.

And how/where can one save both the carry flag and the Zero flag ?

Saving one flag was already complex enough, saving two flags is still possible but longer.

; save Carry and Zero to R1 : 10 bytes
mov 0 R1
mov 1 R1 ZERO
add 2 R1 CARRY

; restore carry from R1 : 6 bytes
add -2 R1
; restore Zero
and 1 R1
CMPU 1 R1
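
As a sanity check of the encoding (Zero in bit 0, Carry in bit 1), here is a hedged Python model of the slow method. It only tracks the values : registers are assumed to be 16 bits wide, the function names are hypothetical, and the way each instruction of the restore sequence may itself disturb the flags on real hardware is not modelled.

```python
def save_flags(carry, zero):
    r1 = 0                      # mov 0 R1
    if zero:                    # mov 1 R1 ZERO
        r1 = 1
    if carry:                   # add 2 R1 CARRY
        r1 = (r1 + 2) & 0xFFFF
    return r1

def restore_flags(r1):
    total = r1 + 0xFFFE         # add -2 R1 : overflows when bit 1 was set
    carry = total > 0xFFFF
    r1 = total & 0xFFFF
    r1 &= 1                     # and 1 R1 : keep the saved Zero bit
    zero = (r1 == 1)            # CMPU 1 R1 : equal operands give Zero
    return carry, zero

for carry in (False, True):
    for zero in (False, True):
        print((carry, zero), restore_flags(save_flags(carry, zero)))
```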

Wouldn't it be easier if the flags were accessible as a single normal register ? There are no registers left but Special Registers could do it.

; save Carry and Zero to R1 : 2 bytes
GET -1 R1

; restore carry from R1 : 2 bytes
PUT R1 -1

But should it ? Moving data to the Special Registers is a slippery slope and when we start doing it, we want to apply this principle over and over and... it gets even more messy ! How many status flags will end up there and will this special register be called "status register" ?

Obviously I don't want this so I'll just avoid it and use the slow method for now.

Friday 27 July 2012

Nikolay's questions

I just received an email from another "hacker" who raised a lot of interesting questions. I answered by email and I share here some important ideas and insights.

Nikolay wrote :
> To be honest, I was more interested in YASEP16, because that would
> much harder task to solve compared to YASEP32 (32-bits are generally
> much more easier to define an instruction format, and if one sticks to
> fixed-length 32-bit instructions & encoded instruction format, things
> will look generally at least acceptable, if not even good).

I don't see what is so hard. It just happens that if you can do more, you can do less. It's explained here :

http://yasep.org/#!doc/16-32

http://yasep.org/#!doc/forms

> Let me start few steps away - I enjoyed to see that you're finally
> pissed-off of using separate tools and sources, and you had started to
> play with an integrated model for the CPU, that's used for
> rtl/docs/assembler/disassembler. I think this is so major step, that I
> can't even find any cool words for this. My understanding was that
> lots of interesting CPU projects die too early because the burden of
> supporting all the separate tools in a compatible state for both the
> developers usage and the community just quickly exausts the peolpe and
> they stop pushing forward. Having the model-based approach will
> hopefully leave more time for fun for the hobby-CPU designers :D.

I suppose that you refer to a hobby project in particular, right ? :-) In fact, building a whole, free, self-contained and EASY TO USE toolset was the main motivation : I think that the YASEP is a sub-standard architecture as it is now (yet it is being polished), but since it is so simple, I can also work on getting the tools RIGHT for a more "potent architecture" (whether totally new or dusted off from the archives...)

> Another interesting thing to see was that the YASEP16 & YASEP32
> partially share the instruction format. I was wondering whether the
> actual intention was to have forward binary compatibility with the
> YASEP32?

This part of the architecture is not well understood and I still struggle to explain it correctly. It is explained there http://yasep.org/#!doc/16-32

YASEP16 and YASEP32 are binary compatible on many levels (see the link above). They do not "partially share" the instruction format. The instruction format and decoder are 95% identical. Bus widths change and some 32-bits specific instructions are invalid in 16-bits mode. That's all.

In fact, you could create a single CPU core that executes 16-bits and 32-bits code with only minor alterations of the decoder. Switching from 16-bits to 32-bits mode should only affect memory addressing and organisation.

Furthermore, I once needed a MCU that would only handle 14-bits data : I see no real limitation that prevents me from making an N-bit CPU (N<=32) as long as the bus width is equal to or larger than the instruction address bus. Then YASEP16 becomes just a particular subset of the architecture family.

> Now some questions for the instructions - I looked at the instruction
> formats and also at several instructions. Is there any difference
> between FORM_iR and FORM_Ri (used by GET & PUT)?

This is addressed in http://yasep.org/#!doc/forms#FORM_iR :

"This is just another way of writing iR when the instruction uses the immediate value as a destination address."

In other words, it's just to make the assembly language conform to rule 1, that the destination is always the last part. See http://yasep.org/#!doc/asm#asm_ila

Physically, iR is encoded as Ri :

http://yasep.org/#!ASM/asm#a?PUT%20A5%204 : PUT A5 4 => 4762h
http://yasep.org/#!ASM/asm#a?GET%204%20A5 : GET 4 A5 => 4742h

The immediate "4" stays at the same place. However, it is clearer to keep a writing rule and the source operand is easier to spot, it does not depend on the opcode.
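
A quick check on the two encodings quoted above shows how close they are. This Python snippet only compares the two hexadecimal words given in the links; it assumes nothing about which bit field holds which operand.

```python
put_a5_4 = 0x4762   # PUT A5 4
get_4_a5 = 0x4742   # GET 4 A5

diff = put_a5_4 ^ get_4_a5
changed_bits = bin(diff).count("1")
print(hex(diff), changed_bits)  # 0x20 1
```

A single bit differs between the two words, so every other field, the immediate included, sits at the same position in both.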

> According to basic math instructions - I saw that ADD & SUB generate
> Carry/Borrow, but I'm not sure whether these instructions actually use
> this flag. Generally it's usefull when implementing arithmetics with
> integers wider than the CPU registers (and it's major pain in the ass
> when it's not available).

Carry/Borrow are used only by conditional instructions.

In practice I have seen that ADDC and SUBB are quite rare and I couldn't justify the added opcodes. Having a specific set of conditions, however, is far more interesting. So if you want to add with carry/borrow, it's a bit awkward but here is one way :

; R1 + R2 => R3:R4
ADD R1 R2 R3 ; generate the carry
ADD 1 R4 carry ; suppose R4 was cleared.

It's a bit unusual, but possible. All the necessary data are here, and if you want to do number crunching, use a more appropriate CPU :-) This one is for doing "control" stuff.
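
For those who prefer to see the numbers, here is a minimal Python model of that two-instruction sequence. It is a sketch with assumptions : 16-bit registers, a hypothetical function name, and R4 taken as the high word that receives the carry.

```python
def add_wide(r1, r2, width=16):
    """Model of : ADD R1 R2 R3 then ADD 1 R4 carry. Returns (R3, R4)."""
    mask = (1 << width) - 1
    total = (r1 & mask) + (r2 & mask)
    r3 = total & mask           # ADD R1 R2 R3 ; generate the carry
    carry = total > mask
    r4 = 0                      # R4 is supposed to be cleared beforehand
    if carry:                   # ADD 1 R4 carry
        r4 = (r4 + 1) & mask
    return r3, r4

print(add_wide(0xFFFF, 0x0001))  # (0, 1)
print(add_wide(0x1234, 0x1111))  # (9029, 0)
```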

> Other things for commenting - I was a little surprised to see the
> assembler format like "MOV Rx Ry" and I was wondering what influenced
> you when designing the ASM syntax?

Experience with several architectures :-) Before F-CPU I had already made a few assemblers for existing and hypothetical architectures. Simpler is better but expressiveness and consistency must not be forgotten : one must see the intent in the source code.

> SHR, SAR, SHL, ROR, ROL - I was pleasantly surprised to see a full set
> of shift operations on a small micro. I have also one question about
> the shift/rotate operations - I didn't see anywhere that the Carry
> flag is used/updated by these opcodes. Generally playing with the
> Carry is used when converting between receiving/transmitting 1-bit
> data (think for software SPI/I2C/UART). I suppose that your intention
> is to use instead logical operations to extract the LSB, like "AND 1
> R3" for example?

Shift/rotate don't take the carry. I can't remember how many times I had to clear the carry flag before doing a shift on "another widespread architecture". How annoying.

If you want to shift a bit out, there's an easy way :
ADD R1 R1 ; => carry !
OR 1 R2 CARRY;

Or you can :

SHL 1 R1
OR 1 R2 MSB1 R1

Or the other way :

SHR 1 R1
OR 1 R2 LSB1 R1

Another method, if you want to have an arbitrary bit order, is to use a condition on R1's individual bits :
; count the number of bits set in R1's 16 LSB:
MOV 0 R2
ADD 1 R2  BIT0 0
ADD 1 R2  BIT0 1
ADD 1 R2  BIT0 2
ADD 1 R2  BIT0 3
ADD 1 R2  BIT0 4
ADD 1 R2  BIT0 5
ADD 1 R2  BIT0 6
ADD 1 R2  BIT0 7
ADD 1 R2  BIT0 8
ADD 1 R2  BIT0 9
ADD 1 R2  BIT0 10
ADD 1 R2  BIT0 11
ADD 1 R2  BIT0 12
ADD 1 R2  BIT0 13
ADD 1 R2  BIT0 14
ADD 1 R2  BIT0 15

(just an example, there are better and faster ways)
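
In high-level terms, the unrolled sequence above computes the following. This Python sketch assumes, as the listing's comment says, that each conditional ADD executes when the numbered bit of R1 is set; the function name is hypothetical.

```python
def count_bits(r1):
    """Count the bits set in the 16 LSB of r1, one conditional ADD per bit."""
    r2 = 0                          # MOV 0 R2
    for bit in range(16):
        if (r1 >> bit) & 1:         # condition on bit number `bit` of R1
            r2 += 1                 # ADD 1 R2
    return r2

print(count_bits(0xAAAA))  # 8
```

Since each bit is tested independently, the conditions could be listed in any order, which is what makes the method suitable for arbitrary bit orders.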

Note that this feature makes sense in a microcontroller, whose purpose is to handle bits. It could be unavailable in a more "streamlined" version, because of pipeline delays. But you never know.

> CMPU, CMPS, UMIN, UMAX, SMIN, SMAX - just plain awesome. "Dude, this
> is not your grandfather's PIC!"

That's one way of seeing this. Notice that MIN/MAX are not available on certain implementations though. I can play with the inhibition of writeback to the destination register because I don't want to add a conditional MUX in the datapath : too much wire load. That would slow the whole system down. But it can "partially work" with the inhibition trick on some operand combinations.

> The signed 4-bit immediate is the victim of the short instruction
> format :D.

I think I have found a good balance because short immediates appear very often. However I wish I had room for 8-bits operands. Anyway, the whole thing remains orthogonal and (almost) simple. It's a compromise and there will always be annoying cases.

> I was joking before several days that if we don't have
> support for immediate values, we won't have any data to load because
> there's no way to create the data in RAM in the first place (which is
> not true of course, reading hard-wired constants from a special reg
> and/or incrementing/shifting/performing bit ops will still provide
> valuable way to enter data in the programs, but still... it sucks
> compared to the immediates. Actually, my opinion is that the immediate
> is always a victim of any ISA - when having fixed instructions you
> either sacrifice immediates for register addresses and more operands,
> or you sacrifice the operand count (and reuse one of the operands as
> both source & destination, doh), or you have multiple instruction
> formats that had to be decoded simultaneously in order to provide the
> needed flexibility (that's not cool, but it's inevitable price to
> pay).

These are the same dilemmas for everybody :-)

> About the instruction condition codes - are these available only for
> the YASEP32? And also, are they emulated by the high-level assembler
> when generating machine code for YASEP16?

They are available for both YASEP16 and YASEP32. The datapath width is not the instruction width.

> Btw, I didn't found a description of all the programmer-visible
> registers, so I'm not sure what are these used for (they look somehow
> like memory index & memory window registers, but that's a wild guess).

I am working on the related page at this moment. I work on both the French and English versions to keep them in sync. An older version is there : http://yasep.org/yasep2009/docs/registers.html However several things have changed. I'm adding "register parking" and numbers have been changed (R1-R5 instead of R0-R4), plus other subtleties.

> About call/return functionality - I saw that you have done some
> preliminary work on it. I'm by no way expert on VHDL (I typically
> write for my hobby in Verilog), but I checked the RTL and to me it
> looks like that there's no other way to modify the PC register - it's
> not part of the general purpose register file, so instructions like
> CALL/RETURN will be inevitable, imho. Nevertheless, I would be happy
> if you can share your thoughts on this important topic.

It is very important and I resisted for a long while before adopting a particular solution. My main problem is that it's inherently a 2-writes operation. It is necessary to treat PC independently, which raises a lot of issues. But for now, it seems to work in the microYASEP pipeline. It may evolve, for example I have not implemented "call with offset" (CALL2) because I believe it's a slippery slope, but it's "technically possible" so hey...

> PUT, GET - I'm not sure how these functions access the SFRs (they look
> unimplemented in the VHDL).

They were implemented for a prototype and the SR map has not yet been standardised.

> Btw, I would typically advise against
> separating the address space (separation in any form) like memory and
> I/O spaces - it's much more straight-forward to move & manipulate data
> around with unified instructions and to access memory-mapped
> peripherals (but in the same address space). Of course, the inevitable
> price for peripherals/SFRs is the address decoding - it's either
> partial & ugly, or full and expensive :).

I make a distinction because :

  • memory is meant for high-speed, bulk transfers that MAY be performed out of order and with latency. Memory is for data and instructions.
  • SRs are "serialising" and take effect immediately, which is critical for control. Memory mappings, inter-threads protection, configuration of peripherals etc. will be done there.

I hope this clears some misunderstandings :-)

Wednesday 20 June 2012

The old new YASEP site

After a long wait and even longer prototyping phase, I finally decided that there was no point in going further with 2 more and more unrelated websites. The http://yasep.org site, on one side, was the "official" yet completely out of touch and outdated reference. For a year now, development had been progressing in another place, in my personal account.

This separation was doing more harm than good but I didn't want to put unfinished work at the forefront. I wanted to translate ALL the pages into the new format and into French. But lately, the prototype has been "good enough" and is fresh, quite accurate and useful.

The "old" site is still available at http://archives.yasep.org/yasep2009/ if you want to compare and enjoy the leap.

We're already in mid-2012 and the current version is still labeled 2011, I'll simply jump to YASEP2013 in early 2013 :-) A lot of work is needed before I reach this new milestone...

Friday 11 May 2012

The social YASEP

The quest for more gadgets never ceases !

Among some of the newest and shiniest gizmos is a connexion to a world-class bouchot, directly integrated into the YGWM GUI. You can test it yourself at

http://yasep.org/yasep2011/#!bouchot

For the record it is a direct connexion with YGLLO's bouchot, one of the many hangouts of the "mussel" community :-) You can even watch a demonstration video made by Finss.

More unrelated surprises are in store...

Sunday 29 April 2012

More JavaScript gadgets

There, I've done it !

Now you can click on example source code in the documentation windows. It creates a new editor window with this listing. No copy-paste needed and you can test and modify the examples ! And soon you'll be able to move blocks of code to/from different editor windows...

I am also adding the "Examples" menu, which imports external .yas source files. More and more examples will be added in the future :-)

Saturday 7 April 2012

Licensing freedom

How does one license freedom ? And what freedom is there to license ?

I recently got an email that contained this bit :

"with your current license, could your microYASEP be linked into a commercial product without releasing the product VHDL ?"

I think there is a valid point here. I chose the AGPL (a slightly modified GPL) for certain reasons, and one of my goals is to foster totally free and open source designs, a bit like Arduino does. It's somehow a mission statement and I stick to that.

I know well that a good CPU with good support is a gift, a fantastic tool not only for hobbyists but also for industries, and they play by different rules that are sometimes opposite. Choosing a different licence for the whole thing is not considered, it's too late anyway, and I like the AGPL in the context of the YASEP project. I believe that "as is", the AGPL is not ill-suited to hardware designs, as it is very close to the GPL, which also spawned the LGPL that many HDL designs use. Furthermore, I believe it is best to use only one license for the whole project, otherwise it can become confusing.

I've seen other projects use "dual licensing" but I am not sure that it would work for a hardware project. It's still a good idea so I thought about something slightly different, like a "partners program". It's still an ongoing thought and it will certainly evolve but my idea looks like this :

Commercial entities who want to integrate the YASEP core in commercial products (along with other HDL) would submit their designs (HDL and finished product) to me for a confidential evaluation and certification. They would also disclose on their website all the YASEP source code that they used, in exchange for an exemption agreement and a mention on the YASEP site.

I know that most companies feel more comfortable with cores from ARM or Microchip or Atmel... But I already know that there are exceptions and those excited by the YASEP are sensitive to my perspective so we'll tune the partnership details together.

So far, I haven't given much thought to this "issue" because without an advanced enough design, there is no point in licensing. I don't want to waste time in endless conversations about hypotheses and what-ifs. And after all I am the author so I have the final word :-P

Monday 19 March 2012

24 bits per instruction

I started writing the microYASEP for a specific project, one month ago, and started to write more software.

The benefits of writing code while designing the architecture can't be overemphasised. When you eat your own food, you are more careful of the recipes and the ingredients !

I had thought about how to use the last remaining condition code. An "overflow" bit came to mind, but it's useless because the comparisons already contain the signed/unsigned information. Writing code made me realise that the ZERO condition was not enough, as many operations require a scratch register for the result of a comparison (using CMPU). So I made an "EQ" (equal) flag and reduced the register pressure a bit.

About the title : my first real program is a bit more than 200 bytes so it is time to evaluate my early estimations. In 2009 I guessed that on average one half of the instructions are "long" (4 bytes) and the other half are short (2 bytes), and now I have a first result :

The program comprises 70 instructions, 34 are short and 36 are long. Pretty good guess so one can consider the YASEP architecture as a "roughly 24-bit instruction machine" :-)
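
The arithmetic behind the "24-bit" figure, as a small Python check. Only the instruction counts and sizes quoted above are used; the variable names are mine.

```python
short_count, long_count = 34, 36      # instructions in the program
short_size, long_size = 2, 4          # bytes per instruction

total_bytes = short_count * short_size + long_count * long_size
average_bits = 8 * total_bytes / (short_count + long_count)
print(total_bytes, round(average_bits, 1))  # 212 24.2
```

That gives 212 bytes, consistent with "a bit more than 200 bytes", and an average very close to 24 bits per instruction.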

Furthermore, I know that I had a loooooot of time to get used to the architecture and learn how to use it efficiently, but overall the YASEP is pretty comfortable to use. I could do almost everything with just 8 opcodes :

ADD MOV GET PUT CALL AND XOR CMPU

The only limitation I found was the very short (-8 to +7) 4 bits immediate range. Sometimes it would be very handy to extend it to 5 or 6 bits, for short jumps or loops for example, but it was a conscious compromise from the start, as finding more bits somewhere would make the architecture more complex and less orthogonal... Food for thought for the next architecture (wink...)
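
To put numbers on that compromise, here is a short Python sketch that sign-extends an N-bit immediate field and lists the reachable range. The 5 and 6 bit cases are purely hypothetical, they only illustrate what an extension would buy.

```python
def sign_extend(value, bits=4):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def immediate_range(bits):
    values = [sign_extend(v, bits) for v in range(1 << bits)]
    return min(values), max(values)

for bits in (4, 5, 6):
    print(bits, immediate_range(bits))
# 4 (-8, 7)
# 5 (-16, 15)
# 6 (-32, 31)
```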

Tuesday 28 February 2012

at a glance...

How did I write 350 lines of code in one day to create a (mostly) working CPU ? It's thanks to a long and careful preparation, using diagrams like this one :

I use different colors to show the separate paths for control, address, data etc.

With this kind of general diagram, it's easy to code : just follow the wires and code the corresponding function...

A cleaner version will appear later :-)

Wednesday 22 February 2012

microYASEP's first boot !

Today is a big milestone : a tiny implementation of the YASEP has executed tens of instructions :

phase='1' PC=3FE RAM=0000 Result=0000 DST=0 R1=???? R2=???? R3=???? R4=????
phase='1' PC=3FE RAM=0000 Result=0000 DST=0 R1=???? R2=???? R3=???? R4=????
*** releasing reset ***
phase='1' PC=3FE RAM=0000 Result=0000 DST=0 R1=???? R2=???? R3=???? R4=????
phase='1' PC=3FE RAM=1009 Result=0000 DST=0 R1=???? R2=???? R3=???? R4=????
phase='0' PC=000 RAM=1009 Result=0000 DST=0 R1=???? R2=???? R3=???? R4=????
phase='1' PC=002 RAM=1234 Result=1234 DST=1 R1=???? R2=???? R3=???? R4=????
writing 1234 to R1
phase='0' PC=004 RAM=4009 Result=4009 DST=1 R1=1234 R2=???? R3=???? R4=????
phase='1' PC=006 RAM=5678 Result=5678 DST=4 R1=1234 R2=???? R3=???? R4=????
writing 5678 to R4
phase='0' PC=008 RAM=1115 Result=1115 DST=4 R1=1234 R2=???? R3=???? R4=5678
phase='1' PC=00A RAM=4321 Result=5555 DST=1 R1=1234 R2=???? R3=???? R4=5678
writing 5555 to R1
phase='0' PC=00C RAM=1117 Result=234B DST=1 R1=5555 R2=???? R3=???? R4=5678
phase='1' PC=00E RAM=0100 Result=AAAA DST=1 R1=5555 R2=???? R3=???? R4=5678
writing AAAA to R1
phase='0' PC=010 RAM=220A Result=5556 DST=2 R1=AAAA R2=???? R3=???? R4=5678
phase='1' PC=010 RAM=330A Result=0002 DST=2 R1=AAAA R2=???? R3=???? R4=5678
writing 0002 to R2
phase='0' PC=012 RAM=330A Result=0002 DST=2 R1=AAAA R2=0002 R3=???? R4=5678
phase='1' PC=012 RAM=2408 Result=0003 DST=3 R1=AAAA R2=0002 R3=???? R4=5678
writing 0003 to R3
phase='0' PC=014 RAM=2408 Result=0003 DST=3 R1=AAAA R2=0002 R3=0003 R4=5678
phase='1' PC=014 RAM=3408 Result=0002 DST=4 R1=AAAA R2=0002 R3=0003 R4=5678
writing 0002 to R4
phase='0' PC=016 RAM=3408 Result=0002 DST=4 R1=AAAA R2=0002 R3=0003 R4=0002
phase='1' PC=016 RAM=1317 Result=0003 DST=4 R1=AAAA R2=0002 R3=0003 R4=0002
writing 0003 to R4
....
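
A trace like this one can be checked mechanically. The following is only a minimal sketch, assuming the line format shown above (it is not part of the YASEP tools, and the names check_trace, WRITE_RE and STATE_RE are made up for the illustration). It replays each "writing" line and verifies that the next state dump shows the written value :

```python
import re

# A few lines copied from the trace above.
TRACE = """\
phase='1' PC=002 RAM=1234 Result=1234 DST=1 R1=???? R2=???? R3=???? R4=????
writing 1234 to R1
phase='0' PC=004 RAM=4009 Result=4009 DST=1 R1=1234 R2=???? R3=???? R4=????
phase='1' PC=006 RAM=5678 Result=5678 DST=4 R1=1234 R2=???? R3=???? R4=????
writing 5678 to R4
phase='0' PC=008 RAM=1115 Result=1115 DST=4 R1=1234 R2=???? R3=???? R4=5678
"""

WRITE_RE = re.compile(r"^writing ([0-9A-F]{4}) to (R\d)$")
STATE_RE = re.compile(r"(R\d)=([0-9A-F?]{4})")  # matches R1=..., not RAM= or Result=


def check_trace(text):
    """Return (writes, mismatches) for a microYASEP-style trace."""
    expected = {}    # register name -> last value written, as 4 hex digits
    writes = []      # (register, value) in order of appearance
    mismatches = []  # (line, register, expected, shown)
    for line in text.splitlines():
        line = line.strip()
        m = WRITE_RE.match(line)
        if m:
            value, reg = m.group(1), m.group(2)
            expected[reg] = value
            writes.append((reg, int(value, 16)))
        elif line.startswith("phase="):
            for reg, shown in STATE_RE.findall(line):
                # Registers never written so far are still "????" : skip them.
                if reg in expected and shown != expected[reg]:
                    mismatches.append((line, reg, expected[reg], shown))
    return writes, mismatches


writes, mismatches = check_trace(TRACE)
```

On the excerpt above, writes lists the two register updates (R1 then R4) and mismatches stays empty.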

You can find the source code there and play with the parameters :-)

The "microYASEP" is a compatible subset of the usual YASEP but with many limitations : only 23 instructions (not all implemented yet), 2 cycles per instruction (no pipeline) and no data memory access at all... It is designed for tiny FPGAs and the core source code takes about 350 lines of VHDL. Data widths are 16 bits but could potentially be even smaller if needed (I'll have to check). I think it will run at around 12 MIPS in the first system that will use it. It could be faster but that system does not need more.
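
As a quick sanity check of these figures : without a pipeline, the clock frequency is simply the instruction rate multiplied by the number of cycles per instruction. The sketch below assumes that every instruction takes exactly 2 cycles, and the function name is made up for the illustration :

```python
def required_clock_hz(mips, cycles_per_instruction):
    """Clock frequency needed for a given instruction rate (no pipeline)."""
    return mips * 1_000_000 * cycles_per_instruction


# 12 MIPS at 2 cycles per instruction
clock_hz = required_clock_hz(12, 2)
```

Under that assumption, 12 MIPS corresponds to a 24 MHz clock.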

This would not be possible without all the software tools I have written in the last months and years ! I can now assemble and export in hexadecimal or VHDL, create new custom configuration files with a few clicks, or tweak details at will. I have created a new system of "CPU profiles" that goes beyond the basic YASEP16/YASEP32 distinction.

The microYASEP is just one of several microarchitectures possible with the YASEP. Later configurations will be faster, larger and with more features like the multiplier, shifter and memory interfaces... But with a first application and a basic, running core, the details of the whole YASEP design can now be tuned with more real-life feedback !

Monday 13 February 2012

Interactive Assembler, take 2

The YASEP is progressing toward the 2012 revision, with great features and hopefully a first micro-YASEP soon. This burst of productivity has no secret : I just NEED a YASEP as soon as possible for another project. At this moment, I'm working on the assembly environment, a quick development that reuses some code from listed (the LISTing Editor created and stopped in 2009).

So far it can already import and assemble source code from a textarea, not bad for a whole day of coding... I've also been surprised by what I could do in one day for the VHDL code ! This new interface is also able to edit the imported data and export the assembled listing, but I want it to move lines from one editor window to another, and much more... I'll even reuse the recycle bin, just like I did for listed ! But I changed the name to YASMed. Don't ask me why...

Hopefully, a new release should be online on the "prototype area" in a few weeks.

Monday 28 November 2011

A YASEP assembler in C by DeforaOS

Today, khorben from DeforaOS sent me a surprise : this screenshot !



He is implementing his assembler/disassembler in C for his operating system project. A graphic interface is also available, among the many features in development ! In parallel, I am implementing some features in the YGWM interface that synthesise and export the information needed by his assembler. In the end we'll both have the tools to create a fully working and autonomous system :-)

Thanks again for the screenshot !

Tuesday 8 November 2011

Register Parking

 

  Warning : this is partially deprecated since 2013-08-09, see http://yasep.org/#!doc/reg-mem#parking

As the YASEP architecture specifies, there are 5 normal registers (R1-R5) and 5 pairs of data/address registers (A1-D1, A2-D2...) and it's quite difficult to find the right balance between both : each application and approach requires a different optimal number of registers.

When more registers are needed (if you need R6 or R7) then you could assign them to D1 and D2 for example. However you have to set A1 and A2 to a safe location, otherwise chaos could propagate in the software. Another issue is that each write to the D registers will update the memory. A similar situation appears if we use the Ax registers as normal registers : each write will trigger a memory read. And in paged/protected memory systems, this would kill the TLB...

This is now "solved" with today's system, which defines hardwired "parking" addresses and internal behaviour (this is still preliminary but looking promising).

  • "Parking" addresses are defined as "negative" addresses (that is : all the MSB are set to 1). This addressing range, at the "top" of the memory space, is normally not used, or used for special purposes, such as "fast constants" addressed by the short immediate values :
    MOV -7, A3 ; mem[-7] contains a constant or a scratch value,
    MOV D3,... ; the address fits in 3 bits
  • To keep the "parking" system compatible with non-parked versions, the addresses are defined globally for all software. They are easy to remember, as the following code shows :
    ; Park all the registers
    MOV -1, A1
    MOV -2, A2
    MOV -3, A3
    MOV -4, A4
    MOV -5, A5
    These will become macros or pseudo-instructions.
  • The internal numbering of the registers is changed to ease hardware implementation. There is a direct match between the binary register number (bits 1 to 3) and the 3 low bits of the parking address :

    park address   binary   reg.bin   reg.number   register
         -1         1111     1111        15           A1
         -2         1110     1101        13           A2
         -3         1101     1011        11           A3
         -4         1100     1001         9           A4
         -5         1011     0111         7           A5
  • Architecturally, it does not change much. The Data registers are "cached" by the register set. What the hardware parking system adds is just an inhibition of the "data write" signal that would occur normally each time the core writes to a D register.
  • Aliasing : No alias detection is expected. If A4/D4 writes to -2, D2 is not updated. Otherwise it would mean that the result bus could write to 5 registers in parallel, which is not reasonable.
  • Thread backup and restoration : the register set contains the cached version of the memory, it must be refreshed when a thread is restored (swapped in). If the Ax register matches a parked address, the memory doesn't need to be fetched to refresh the cache. Another solution is to save the Dx register through another Ax/Dx, so there is nothing to test during restoration (but memory read cycles could not be spared).
  • This system where the "parking" is defined by an auxiliary value (that is inherently preserved through context switches) is "cleaner" than a more radical approach where "status bits" (one per A/D pair) park the registers. The advantage of the radical approach is that two registers can be parked at once (instead of one) but it gets harder to use with a compiler or from user software (you can play with pointers in C or Pascal easily, though you won't be able to define which pair is used). On top of that, adding status/control bits is usually a nightmare.

In the end, it's not very complex (not as much as it seems). The hardware price is a few logic gates that detect the parking addresses to inhibit memory writes. For the software writer, it just means more registers on demand and it will work whether the YASEP has the parking hardware or not. You CAN have R6, R7 or R8 but then you'll have to restrict data access and give up A1/D1, A2/D2 and A3/D3. You make the choice !
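
The detection logic can be modelled in a few lines. This is only a sketch under stated assumptions : a 16-bit address width, the register numbers from the table above, and a detector that only recognises a register's own parking address (whether real hardware also reacts to another pair's address is not specified here). The function names are hypothetical :

```python
WIDTH = 16                 # assumed address width (YASEP16)
MASK = (1 << WIDTH) - 1

# Register numbers of A1..A5, copied from the table above.
A_REG_NUMBER = {"A1": 15, "A2": 13, "A3": 11, "A4": 9, "A5": 7}


def park_address(reg_number):
    """Parking address of an A register, as an unsigned WIDTH-bit value.

    Bits 1 to 3 of the register number give the 3 low bits of the
    address; all the other address bits are set to 1.
    """
    low3 = (reg_number >> 1) & 0b111
    return (MASK & ~0b111) | low3


def is_parked(reg_number, address):
    """True when Ax holds its own parking address."""
    return (address & MASK) == park_address(reg_number)


def data_write_enable(reg_number, address):
    """Memory write strobe for a write to Dx : inhibited when Ax is parked."""
    return not is_parked(reg_number, address)
```

For example, park_address(A_REG_NUMBER["A2"]) gives 0xFFFE, which is -2 on 16 bits, and data_write_enable() returns False for that address, so D2 behaves like a plain register.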

Sunday 25 September 2011

The YASEP and Defora

Today I think that one big issue with the YASEP project has been solved.

I met Pierre this week, and I am starting to discover the awesomeness of his Defora project. "Debian For All" turned into creating a whole new, compact, totally GPLv3 system. With almost no dependency on existing systems, yet compatible with them... Perfect for embedded computing too !

We just started to work on a C version of the existing JS assembler and we are considering writing a C99-compliant compiler.

YES, you have read it : the YASEP will have binaries generated from C code ! And good code, at that, since it does not go through GCC !

Many roadblocks are now removed. When the code generation tools are in place, we can then simulate/emulate the core and start to write a microkernel...

What this means for me is that I can finally stop worrying about the operating system and application layer. The YASEP will not use Linux and I won't be forced to use the huge GCC armada. I will also have more time to focus on the hardware architecture and implementation. And Pierre is a security specialist...

Oh, by the way : YGWM took 2nd place (ex aequo with Pierre's Defora) at the Open World Forum Code Contest this week. A new, shiny, professional laptop was given away by HP and will become my main workstation. Going from an Atom to a Core i5 makes me feel spoiled :-)

Wednesday 7 September 2011

YASEP2011

Development is still happening, at a slow pace (due to work duties) but nothing is forsaken.

I'm still working toward a cheap Actel board that can be easily replicated and cheaply fabricated, and the professional projects might bring some interesting results.

On another front, I resumed work on YGWM and extended its functionality. You can even test the results at http://ygdes.com/~whygee/yasep2011/ and the whole website will be reimplemented with this new paradigm. No more tabs ! Everything in one browser window with a huge virtual desk !


Sunday 8 May 2011

This little Least Significant Bit

(update : 2011/05/11)

I've been wondering since March of this year if the Least Significant Bit (LSB) of the Next Instruction Pointer (NIP or NPC) could be put to better use.

The YASEP instructions are aligned on 16 bits and the instruction addresses have their LSB cleared by convention. This bit is usually wasted in word-aligned byte-oriented computer architectures.

In the current YASEP architecture, this LSB holds the carry flag of ADD/SUB operations. It is the only status flag that I couldn't get rid of with the usual architectural tricks. As a reminder, instructions can check 3 conditions : the register is cleared, has its LSB cleared (odd/even) or has its MSB (sign) cleared. Every condition can be negated and a 4th condition serves as the "always" or "reserved" case. Reading the LSB and MSB is easy, checking for a cleared register is more costly. In some implementations, the register set has "shadow" bits with precomputed/cached "register is clear" bits. But otherwise, no dirty trick is employed.
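
The three register conditions can be written down as a small model. This is a hedged sketch, not the actual encoding : the condition names ("zero", "lsb0", "msb0", "always") are invented for the illustration, the width is assumed to be 16 bits, and the negated "always" case is treated as reserved :

```python
def condition_met(kind, value=0, negate=False, width=16):
    """Evaluate one condition on a register value (illustrative model)."""
    value &= (1 << width) - 1
    if kind == "always":
        if negate:
            # Assumption : the negated form of "always" is the reserved case.
            raise ValueError("reserved condition")
        return True
    if kind == "zero":        # register is cleared
        met = value == 0
    elif kind == "lsb0":      # LSB cleared (even value)
        met = (value & 1) == 0
    elif kind == "msb0":      # MSB cleared (non-negative value)
        met = (value >> (width - 1)) == 0
    else:
        raise ValueError("unknown condition : " + kind)
    return met != negate      # negation flips the result
```

The "zero" test is the only one that needs to look at every bit of the register, which is why some implementations cache it in shadow bits.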

The Carry bit is less easy to handle : it's a dynamic result that can't be reconstructed from the 16 or 32 bits of the registers. It is not possible to restore it after a thread switch. It can't be added to the "condition cache" because it will have to be saved and restored (16 more bits to save ? Bleh...)

Here come the latest changes :

  • The carry bit is now "hidden", not available from the register set for computations (that would make other things more difficult). It exists as a bit that can only be tested via a specific condition code in the conditional instruction forms (certainly one that tests NIP).
  • The LSB of NIP is always cleared. However, when saving/restoring the state in memory, it will hold the carry bit. This is the only case when the two functions (carry and pointer) are mixed.
  • Writing a "1" to the LSB of NIP (other than for saving/restoring the state) triggers a trap. There are several uses :
  1. Breakpointing / tracing / debugging : inject a "1" in the LSB and you can see where the pointer is used.
  2. Safety : for example if the stack is corrupted, there is a chance that the LSB will be set and trigger the trap.

In future iterations, this bit could be used for something more pertinent (such as a second instruction memory bank selector) so it must be carefully handled by programmers now.
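
The rules above can be summarised with a small behavioural model. It is only a sketch, assuming a 16-bit NIP; the class, method and exception names are hypothetical and do not come from the YASEP sources :

```python
class NipTrap(Exception):
    """Raised when a "1" is written to the LSB of NIP outside save/restore."""


class NextInstructionPointer:
    MASK = 0xFFFF  # assumed 16-bit pointer

    def __init__(self):
        self.nip = 0    # LSB always cleared
        self.carry = 0  # hidden bit, only testable through a condition code

    def write(self, value):
        """Normal write (jump, return...) : a set LSB triggers the trap."""
        if value & 1:
            raise NipTrap("LSB of NIP set : 0x%04X" % (value & self.MASK))
        self.nip = value & self.MASK

    def save(self):
        """State save : the carry bit travels in the LSB of the saved word."""
        return self.nip | self.carry

    def restore(self, word):
        """State restore : split the saved word back into pointer and carry."""
        self.carry = word & 1
        self.nip = word & self.MASK & ~1
```

With this model, a saved word of 0x0125 restores NIP to 0x0124 with the carry set, while writing 0x0125 through write() raises the trap, which is the debugging and safety behaviour described in the list.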
