I just received an email from another "hacker" who raised a lot of interesting questions. I answered by email, and I am sharing the important ideas and insights here.

Nikolay wrote :
> To be honest, I was more interested in YASEP16, because that would
> much harder task to solve compared to YASEP32 (32-bits are generally
> much more easier to define an instruction format, and if one sticks to
> fixed-length 32-bit instructions & encoded instruction format, things
> will look generally at least acceptable, if not even good).

I don't see what is so hard. It just happens that if you can do more, you can do less. It's explained there :

http://yasep.org/#!doc/16-32

http://yasep.org/#!doc/forms

> Let me start few steps away - I enjoyed to see that you're finally
> pissed-off of using separate tools and sources, and you had started to
> play with an integrated model for the CPU, that's used for
> rtl/docs/assembler/disassembler. I think this is so major step, that I
> can't even find any cool words for this. My understanding was that
> lots of interesting CPU projects die too early because the burden of
> supporting all the separate tools in a compatible state for both the
> developers usage and the community just quickly exausts the peolpe and
> they stop pushing forward. Having the model-based approach will
> hopefully leave more time for fun for the hobby-CPU designers :D.

I suppose that you refer to a hobby project in particular, right ? :-) In fact, building a whole, free, self-contained and EASY TO USE toolset was the main motivation : I think that the YASEP is a sub-standard architecture as it is now (yet it is being polished), but since it is so simple, I can also work on getting the tools RIGHT for a more "potent architecture" (whether totally new or dusted off from the archives...)

> Another interesting thing to see was that the YASEP16 & YASEP32
> partially share the instruction format. I was wondering whether the
> actual intention was to have forward binary compatibility with the
> YASEP32?

This part of the architecture is not well understood and I still struggle to explain it correctly. It is explained there http://yasep.org/#!doc/16-32

YASEP16 and YASEP32 are binary compatible on many levels (see the link above). They do not "partially share" the instruction format. The instruction format and decoder are 95% identical. Bus widths change and some 32-bits specific instructions are invalid in 16-bits mode. That's all.

In fact, you could create a single CPU core that executes both 16-bits and 32-bits code with only minor alterations of the decoder. Switching from 16-bits to 32-bits mode should only affect memory addressing and organisation.

Furthermore, I once needed a MCU that would only handle 14-bits data : I see no real limitation that prevents me from making a N-bit CPU (N<=32) as long as the bus width is equal to or larger than the instruction address bus. Then YASEP16 becomes just a particular subset of the architecture family.
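To make the "N-bit family" idea concrete, here is a rough Python model (not YASEP code) of one addition performed on datapaths of different widths. The function name `add_n` and the way the carry is returned are assumptions made for this illustration only; the point is that the width is just a parameter.

```python
def add_n(a, b, width):
    """Add two unsigned values on a datapath that is `width` bits wide.

    Hypothetical helper for illustration: returns (result, carry),
    where carry is the bit that falls out of the top of the datapath.
    """
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> width


# The same operation on a 14-bit, a 16-bit and a 32-bit datapath:
# adding 1 to the largest value wraps to 0 and raises the carry.
for width in (14, 16, 32):
    result, carry = add_n((1 << width) - 1, 1, width)
    print(width, result, carry)
```

Nothing in the model depends on the width being 16 or 32, which is the sense in which YASEP16 is only one member of the family.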

> Now some questions for the instructions - I looked at the instruction
> formats and also at several instructions. Is there any difference
> between FORM_iR and FORM_Ri (used by GET & PUT)?

This is addressed in http://yasep.org/#!doc/forms#FORM_iR :

"This is just another way of writing iR when the instruction uses the immediate value as a destination address."

In other words, it's just there to make the assembly language conform to rule 1 : the destination is always the last part. See http://yasep.org/#!doc/asm#asm_ila

Physically, iR is encoded as Ri :

http://yasep.org/#!ASM/asm#a?PUT%20A5%204 : PUT A5 4 => 4762h
http://yasep.org/#!ASM/asm#a?GET%204%20A5 : GET 4 A5 => 4742h

The immediate "4" stays at the same place. However, it is clearer to keep a single writing rule : the source operand is easier to spot because its position does not depend on the opcode.
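A quick way to check how close the two encodings are is to XOR the words given above. This is only arithmetic on the two hexadecimal values quoted here; the sketch does not claim which instruction field the differing bit belongs to.

```python
put_word = 0x4762  # PUT A5 4, as produced by the assembler link above
get_word = 0x4742  # GET 4 A5, as produced by the assembler link above

# Bits that differ between the two instruction words.
diff = put_word ^ get_word

print(f"{put_word:016b}")
print(f"{get_word:016b}")
print(f"{diff:016b}")
```

The two words differ in a single bit (0020h), so the register and the immediate occupy the same positions in both instructions.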

> According to basic math instructions - I saw that ADD & SUB generate
> Carry/Borrow, but I'm not sure whether these instructions actually use
> this flag. Generally it's usefull when implementing arithmetics with
> integers wider than the CPU registers (and it's major pain in the ass
> when it's not available).

Carry/Borrow are used only by conditional instructions.

In practice I have seen that ADDC and SUBB are quite rare and I couldn't justify the added opcodes. Having a specific set of conditions, however, is far more interesting. So if you want to add with carry/borrow, it's a bit awkward but here is one way :

; R1 + R2 => R3:R4
ADD R1 R2 R3 ; generate the carry
ADD 1 R4 carry ; suppose R4 was cleared.

It's a bit unusual, but possible. All the necessary data are here, and if you want to do number crunching, use a more appropriate CPU :-) This one is for doing "control" stuff.
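For readers who prefer to see the data flow, the two-instruction sequence above can be modelled in Python. This is a sketch under assumptions : registers are taken to be 16 bits wide, R4 is read as the high word of the result, and the names `add16` and `add_wide` are invented for the illustration.

```python
MASK16 = 0xFFFF  # assumption: 16-bit registers, as on YASEP16


def add16(a, b):
    """16-bit addition returning (result, carry)."""
    total = a + b
    return total & MASK16, total > MASK16


def add_wide(r1, r2):
    """Model of: ADD R1 R2 R3 followed by ADD 1 R4 carry."""
    r4 = 0                        # "suppose R4 was cleared"
    r3, carry = add16(r1, r2)     # ADD R1 R2 R3 ; generates the carry
    if carry:                     # the second ADD only executes on carry
        r4, _ = add16(1, r4)      # ADD 1 R4 carry
    return r4, r3                 # (high word, low word)


print(add_wide(0xFFFF, 0x0001))
print(add_wide(0x1234, 0x1111))
```

The carry never becomes an operand of the adder : it only decides whether the second instruction runs.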

> Other things for commenting - I was a little surprised to see the
> assembler format like "MOV Rx Ry" and I was wondering what influenced
> you when designing the ASM syntax?

Experience with several architectures :-) Before F-CPU I had already made a few assemblers for existing and hypothetical architectures. Simpler is better but expressiveness and consistency must not be forgotten : one must see the intent in the source code.

> SHR, SAR, SHL, ROR, ROL - I was pleasantly surprised to see a full set
> of shift operations on a small micro. I have also one question about
> the shift/rotate operations - I didn't see anywhere that the Carry
> flag is used/updated by these opcodes. Generally playing with the
> Carry is used when converting between receiving/transmitting 1-bit
> data (think for software SPI/I2C/UART). I suppose that your intention
> is to use instead logical operations to extract the LSB, like "AND 1
> R3" for example?

Shift/rotate don't take the carry. I can't remember how many times I had to clear the carry flag before doing a shift on "another widespread architecture". How annoying.

If you want to shift a bit out, there's an easy way :

ADD R1 R1 ; => carry !
OR 1 R2 CARRY

Or you can :

SHL 1 R1
OR 1 R2 MSB1 R1

Or the other way :

SHR 1 R1
OR 1 R2 LSB1 R1

Another method, if you want to have an arbitrary bit order, is to use a condition on R1's individual bits :

; count the number of bits set in R1's 16 LSB:
MOV 0 R2
ADD 1 R2  BIT0 0
ADD 1 R2  BIT0 1
ADD 1 R2  BIT0 2
ADD 1 R2  BIT0 3
ADD 1 R2  BIT0 4
ADD 1 R2  BIT0 5
ADD 1 R2  BIT0 6
ADD 1 R2  BIT0 7
ADD 1 R2  BIT0 8
ADD 1 R2  BIT0 9
ADD 1 R2  BIT0 10
ADD 1 R2  BIT0 11
ADD 1 R2  BIT0 12
ADD 1 R2  BIT0 13
ADD 1 R2  BIT0 14
ADD 1 R2  BIT0 15

(just an example, there are better and faster ways)
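The intent of the listing above, as stated in its comment, is one conditional increment per bit of R1. A Python model of that intent (not of the exact condition encoding, which is not reproduced here) could look like this; the function name `count_set_bits` is made up for the sketch.

```python
def count_set_bits(r1):
    """Count the bits set in the 16 LSB of r1, one test per bit."""
    r2 = 0                        # MOV 0 R2
    for bit in range(16):         # one conditional ADD per bit in the listing
        if (r1 >> bit) & 1:       # condition on an individual bit of R1
            r2 += 1               # ADD 1 R2, executed only if the bit is set
    return r2


print(count_set_bits(0b1011))
```

Since each test names its bit explicitly, the bits can be visited in any order, which is what makes this approach handy for arbitrary bit orders.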

Note that this feature makes sense in a microcontroller, whose purpose is to handle bits. It could be unavailable in a more "streamlined" version because of pipeline delays, but you never know.

> CMPU, CMPS, UMIN, UMAX, SMIN, SMAX - just plain awesome. "Dude, this
> is not your grandfather's PIC!"

That's one way of seeing this. Notice that MIN/MAX are not available on certain implementations though. I can play with the inhibition of writeback to the destination register because I don't want to add a conditional MUX in the datapath : that is too much wire load and it would slow the whole system down. But it can "partially work" with the inhibition trick on some operand combinations.
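One possible reading of the inhibition trick, sketched in Python : when the destination is also one of the two operands, the comparison result can simply gate the writeback, so no multiplexer has to select between two results. The register-file dictionary and the name `umin_in_place` are assumptions for this sketch, not a description of the actual VHDL.

```python
def umin_in_place(regs, src, dst):
    """Unsigned minimum where dst is both an operand and the destination.

    The comparison gates the writeback: either src is written to dst,
    or the write is inhibited and dst keeps its (already minimal) value.
    """
    write_enable = regs[src] < regs[dst]
    if write_enable:
        regs[dst] = regs[src]


regs = {"R1": 2, "R2": 9}
umin_in_place(regs, "R1", "R2")
print(regs["R2"])
```

With a third, distinct destination register the same trick falls short : inhibiting the write would leave the destination's old value instead of copying the other operand, which fits the "partially work" caveat.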

> The signed 4-bit immediate is the victim of the short instruction
> format :D.

I think I have found a good balance because short immediates appear very often. However I wish I had room for 8-bits operands. Anyway, the whole thing remains orthogonal and (almost) simple. It's a compromise and there will always be annoying cases.
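To show what a signed 4-bit immediate can express, here is a small Python sketch of the sign extension. It assumes a two's complement encoding of the field, and the name `sign_extend4` is invented for the example.

```python
def sign_extend4(field):
    """Interpret the low 4 bits of `field` as a two's complement number."""
    field &= 0xF
    return field - 16 if field & 0x8 else field


# All sixteen encodings of the 4-bit field and the values they stand for.
print([sign_extend4(n) for n in range(16)])
```

Under that assumption the short form covers -8 to +7, enough for the small increments, decrements and shift counts that appear most often.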

> I was joking before several days that if we don't have
> support for immediate values, we won't have any data to load because
> there's no way to create the data in RAM in the first place (which is
> not true of course, reading hard-wired constants from a special reg
> and/or incrementing/shifting/performing bit ops will still provide
> valuable way to enter data in the programs, but still... it sucks
> compared to the immediates. Actually, my opinion is that the immediate
> is always a victim of any ISA - when having fixed instructions you
> either sacrifice immediates for register addresses and more operands,
> or you sacrifice the operand count (and reuse one of the operands as
> both source & destination, doh), or you have multiple instruction
> formats that had to be decoded simultaneously in order to provide the
> needed flexibility (that's not cool, but it's inevitable price to
> pay).

Those are the same dilemmas for everybody :-)

> About the instruction condition codes - are these available only for
> the YASEP32? And also, are they emulated by the high-level assembler
> when generating machine code for YASEP16?

They are available for both YASEP16 and YASEP32. The datapath width is not the instruction width.

> Btw, I didn't found a description of all the programmer-visible
> registers, so I'm not sure what are these used for (they look somehow
> like memory index & memory window registers, but that's a wild guess).

I am working on the related page at this moment. I work on both the French and English versions to keep them in sync. An older version is there : http://yasep.org/yasep2009/docs/registers.html However, several things have changed. I'm adding "register parking" and the numbers have changed (R1-R5 instead of R0-R4), plus other subtleties.

> About call/return functionality - I saw that you have done some
> preliminary work on it. I'm by no way expert on VHDL (I typically
> write for my hobby in Verilog), but I checked the RTL and to me it
> looks like that there's no other way to modify the PC register - it's
> not part of the general purpose register file, so instructions like
> CALL/RETURN will be inevitable, imho. Nevertheless, I would be happy
> if you can share your thoughts on this important topic.

It is very important and I resisted for a long while before adopting a particular solution. My main problem is that it's inherently a 2-writes operation. It is necessary to treat PC independently, which raises a lot of issues. But for now, it seems to work in the microYASEP pipeline. It may evolve : for example, I have not implemented "call with offset" (CALL2) because I believe it's a slippery slope, but it's "technically possible" so hey...

> PUT, GET - I'm not sure how these functions access the SFRs (they look
> unimplemented in the VHDL).

They were implemented for a prototype and the SR map has not yet been standardised.

> Btw, I would typically advise against
> separating the address space (separation in any form) like memory and
> I/O spaces - it's much more straight-forward to move & manipulate data
> around with unified instructions and to access memory-mapped
> peripherals (but in the same address space). Of course, the inevitable
> price for peripherals/SFRs is the address decoding - it's either
> partial & ugly, or full and expensive :).

I make a distinction because :

  • memory is meant for high-speed, bulk transfers that MAY be performed out of order and with latency. Memory is for data and instructions.
  • SRs are "serialising" and take effect immediately, which is critical for control. Memory mappings, inter-thread protection, configuration of peripherals etc. will be done there.

I hope this clears some misunderstandings :-)