I just received an email from another "hacker" who raised a lot of
interesting questions. I answered by email, and I am sharing some of the
important ideas and insights here.
Nikolay wrote :
> To be honest, I was more interested in YASEP16, because that would be a
> much harder task to solve compared to YASEP32 (32-bits are generally
> much more easier to define an instruction format, and if one sticks to
> fixed-length 32-bit instructions & encoded instruction format, it
> will look generally at least acceptable, if not even good).
I don't see what is so hard. It just happens that if you can do more, you
can do less. It's explained there :
> Let me start few steps away - I enjoyed to see that you're
> pissed-off of using separate tools and sources, and you had started to
> play with an integrated model for the CPU, that's used for
> rtl/docs/assembler/disassembler. I think this is so major step, that I
> can't even find any cool words for this. My understanding was that
> lots of interesting CPU projects die too early because the burden of
> supporting all the separate tools in a compatible state for both the
> developers usage and the community just quickly exhausts the people and
> they stop pushing forward. Having the model-based approach will
> hopefully leave more time for fun for the hobby-CPU designers :D.
I suppose that you refer to a hobby project in particular, right ? :-) In
fact, building a whole, free, self-contained and EASY TO USE toolset was the
main motivation : I think that the YASEP is a sub-standard architecture as it
is now (yet it is being polished), but since it is so simple, I can also work
on getting the tools RIGHT for a more "potent architecture" (whether totally
new or dusted off from the archives...)
> Another interesting thing to see was that the YASEP16 & YASEP32
> partially share the instruction format. I was wondering whether the
> actual intention was to have forward binary compatibility with the
This part of the architecture is not well understood and I still struggle to
explain it correctly. It is explained there http://yasep.org/#!doc/16-32
YASEP16 and YASEP32 are binary compatible on many levels (see the link
above). They do not "partially share" the instruction format. The instruction
format and decoder are 95% identical. Bus widths change and some 32-bits
specific instructions are invalid in 16-bits mode. That's all.
In fact, you could create a single CPU core that executes 16-bits and
32-bits code with only minor alterations of the decoder. Switching from 16-bits
to 32-bits mode should only affect memory addressing and organisation.
Furthermore, I once needed an MCU that would only handle 14-bits data : I see
no real limitation that prevents me from making an N-bit CPU (N<=32) as long as
the bus width is equal to or larger than the instruction address bus. Then YASEP16
becomes just a particular subset of the architecture family.
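To make the "N-bit family" idea concrete, here is a minimal Python sketch (not YASEP code) of a datapath whose width is just a parameter. The helper name make_adder is hypothetical; the point is only that the same operation, masked to N bits, gives a 14-bits, 16-bits or 32-bits variant.

```python
def make_adder(width):
    """Return an ADD model for a hypothetical N-bit datapath (1 <= N <= 32)."""
    if not 1 <= width <= 32:
        raise ValueError("width must be between 1 and 32")
    mask = (1 << width) - 1

    def add(a, b):
        # Operands are truncated to the datapath width before adding.
        total = (a & mask) + (b & mask)
        # Return the N-bit result and the carry out of bit N-1.
        return total & mask, total >> width

    return add


add14 = make_adder(14)
add16 = make_adder(16)
print(add14(0x3FFF, 1))  # (0, 1): wraps around on a 14-bit datapath
print(add16(0x3FFF, 1))  # (16384, 0): fits on a 16-bit datapath
```

Only the mask changes between variants, which is the sense in which YASEP16 is one member of a wider family.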
> Now some questions for the instructions - I looked at the
> formats and also at several instructions. Is there any difference
> between FORM_iR and FORM_Ri (used by GET & PUT)?
This is addressed in http://yasep.org/#!doc/forms#FORM_iR
"This is just another way of writing iR when the instruction uses the
immediate value as a destination address."
In other words, it just makes the assembly language conform to rule 1, that
the destination is always the last part. See http://yasep.org/#!doc/asm#asm_ila
Physically, iR is encoded as Ri :
http://yasep.org/#!ASM/asm#a?PUT%20A5%204 : PUT A5 4 => 4762h
http://yasep.org/#!ASM/asm#a?GET%204%20A5 : GET 4 A5 => 4742h
The immediate "4" stays at the same place. However, it is clearer to keep a
writing rule and the source operand is easier to spot, it does not depend on
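As a quick check of the two encodings quoted above, this small Python sketch compares the two 16-bit words. The helper name nibbles is hypothetical, and the exact field layout is not given here, so reading the unchanged nibbles as the operand fields is an assumption.

```python
PUT_A5_4 = 0x4762  # "PUT A5 4", value taken from the assembler link above
GET_4_A5 = 0x4742  # "GET 4 A5", value taken from the assembler link above


def nibbles(word):
    """Split a 16-bit word into four nibbles, most significant first."""
    return [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]


diff = PUT_A5_4 ^ GET_4_A5
print(nibbles(PUT_A5_4))  # [4, 7, 6, 2]
print(nibbles(GET_4_A5))  # [4, 7, 4, 2]
print(f"{diff:04X}h")     # 0020h: a single bit differs
```

The two words differ by one bit only, and the two upper nibbles are identical, which is consistent with the operands keeping their place while the assembly syntax swaps their order.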
> According to basic math instructions - I saw that ADD & SUB
> Carry/Borrow, but I'm not sure whether these instructions actually
> this flag. Generally it's useful when implementing arithmetics with
> integers wider than the CPU registers (and it's major pain in the
> when it's not available).
Carry/Borrow are used only by conditional instructions.
In practice I have seen that ADDC and SUBB are quite rare and I couldn't
justify the added opcodes. Having a specific set of conditions, however, is far
more interesting. So if you want to add with carry/borrow, it's a bit awkward
but here is one way :
; R1 + R2 => R3:R4
ADD R1 R2 R3 ; R3 = R1 + R2, generate the carry
ADD 1 R4 CARRY ; R4 = R4 + 1 only if carry (suppose R4 was cleared)
It's a bit unusual, but possible. All the necessary data are here, and if
you want to do number crunching, use a more appropriate CPU :-) This one is for
doing "control" stuff.
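For readers who want to check the logic, here is a rough Python model of that two-instruction sequence on a 16-bit datapath. It is only a sketch: the names add16 and add_widening are hypothetical, and it assumes the conditional ADD executes only when the previous ADD produced a carry.

```python
MASK16 = 0xFFFF


def add16(a, b):
    """16-bit ADD model: returns (result, carry flag)."""
    total = (a & MASK16) + (b & MASK16)
    return total & MASK16, total > MASK16


def add_widening(r1, r2):
    """Model of: ADD R1 R2 R3 ; ADD 1 R4 carry (with R4 cleared first)."""
    r4 = 0                      # R4 is supposed to be cleared
    r3, carry = add16(r1, r2)   # ADD R1 R2 R3: generates the carry
    if carry:                   # conditional execution on the carry
        r4, _ = add16(1, r4)    # ADD 1 R4
    return r3, r4               # (sum word, carry word)


print(add_widening(0xFFFF, 0x0001))  # (0, 1)
print(add_widening(0x1234, 0x1111))  # (9029, 0), i.e. 0x2345 with no carry
```

The second instruction costs one opcode slot instead of a dedicated ADDC, at the price of needing a cleared register.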
> Other things for commenting - I was a little surprised to see the
> assembler format like "MOV Rx Ry" and I was wondering what influenced
> you when designing the ASM syntax?
Experience with several architectures :-) Before F-CPU I had already made a
few assemblers for existing and hypothetical architectures. Simpler is better
but expressiveness and consistency must not be forgotten : one must see the
intent in the source code.
> SHR, SAR, SHL, ROR, ROL - I was pleasantly surprised to see a full
> of shift operations on a small micro. I have also one question about
> the shift/rotate operations - I didn't see anywhere that the Carry
> flag is used/updated by these opcodes. Generally playing with the
> Carry is used when converting between receiving/transmitting 1-bit
> data (think for software SPI/I2C/UART). I suppose that your
> is to use instead logical operations to extract the LSB, like "AND 1
> R3" for example?
Shift/rotate don't take the carry. I can't remember how many times I had to
clear the carry flag before doing a shift on "another widespread architecture".
If you want to shift a bit out, there's an easy way :
ADD R1 R1 ; => carry !
OR 1 R2 CARRY;
Or you can :
SHL 1 R1
OR 1 R2 MSB1 R1
Or the other way :
SHR 1 R1
OR 1 R2 LSB1 R1
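The first variant (ADD R1 R1, then a carry-conditional instruction) can be modelled in Python as below. This is an illustrative sketch for a 16-bit register, not YASEP code; shift_out_msb and serialize are hypothetical names, and the list of bits stands in for whatever the conditional instruction would do with each bit (for example drive a software SPI pin).

```python
MASK16 = 0xFFFF


def shift_out_msb(r1):
    """Model of 'ADD R1 R1': doubles R1; the carry is the old MSB."""
    total = (r1 & MASK16) * 2
    return total & MASK16, total >> 16


def serialize(word, nbits=16):
    """Shift `nbits` bits out of a 16-bit word, most significant bit first."""
    bits = []
    r1 = word & MASK16
    for _ in range(nbits):
        r1, carry = shift_out_msb(r1)
        bits.append(carry)  # stands in for the carry-conditional instruction
    return bits


print(serialize(0xA000, 4))  # [1, 0, 1, 0]
```

No flag has to be cleared beforehand, because the carry is produced by the addition itself.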
Another method, if you want to have an arbitrary bit order, is to use a
condition on R1's individual bits :
; count the number of bits set in R1's 16 LSB:
MOV 0 R2
ADD 1 R2 BIT0 0
ADD 1 R2 BIT0 1
ADD 1 R2 BIT0 2
ADD 1 R2 BIT0 3
ADD 1 R2 BIT0 4
ADD 1 R2 BIT0 5
ADD 1 R2 BIT0 6
ADD 1 R2 BIT0 7
ADD 1 R2 BIT0 8
ADD 1 R2 BIT0 9
ADD 1 R2 BIT0 10
ADD 1 R2 BIT0 11
ADD 1 R2 BIT0 12
ADD 1 R2 BIT0 13
ADD 1 R2 BIT0 14
ADD 1 R2 BIT0 15
(just an example, there are better and faster ways)
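The unrolled sequence above behaves like the following Python model, given here only as a sketch. It assumes each conditional ADD executes when the selected bit of R1 is set, as the comment in the listing says; the function name is hypothetical.

```python
def popcount16_unrolled(r1):
    """Model of 16 conditional ADDs, one per bit of R1's 16 LSB."""
    r2 = 0                       # MOV 0 R2
    for bit in range(16):        # one conditional instruction per bit
        if (r1 >> bit) & 1:      # condition on bit number `bit` of R1
            r2 += 1              # ADD 1 R2
    return r2


print(popcount16_unrolled(0xF0F0))  # 8
```

The cost is one instruction per tested bit with no branch, which is why it suits short bit-handling sequences more than bulk computation.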
Note that this feature makes sense in a microcontroller, whose purpose is to
handle bits. It could be unavailable in a more "streamlined" version, because
of pipeline delays, but you never know.
> CMPU, CMPS, UMIN, UMAX, SMIN, SMAX - just plain awesome. "Dude,
> is not your grandfather's PIC!"
That's one way of seeing this. Notice that MIN/MAX are not available on
certain implementations though. I can play with the inhibition of writeback to
the destination register because I don't want to add a conditional MUX in the
datapath : that would add too much wire load and slow the whole system down.
But it can "partially work" with the inhibition trick on some operand combinations.
> The signed 4-bit immediate is the victim of the short
> format :D.
I think I have found a good balance because short immediates appear very
often. However I wish I had room for 8-bits operands. Anyway, the whole thing
remains orthogonal and (almost) simple. It's a compromise and there will always
be annoying cases.
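To show what a signed 4-bit immediate can hold, here is a small Python sketch of two's-complement sign extension of a 4-bit field. The function name sign_extend4 is hypothetical, and two's-complement encoding of the field is an assumption.

```python
def sign_extend4(field):
    """Interpret the low 4 bits of `field` as a two's-complement value."""
    field &= 0xF
    # Bit 3 is the sign bit: values 8..15 map to -8..-1.
    return field - 16 if field & 0x8 else field


print([sign_extend4(n) for n in range(16)])
# [0, 1, 2, 3, 4, 5, 6, 7, -8, -7, -6, -5, -4, -3, -2, -1]
```

Under that assumption the short form covers -8 to 7, which is enough for the small increments and offsets that appear most often.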
> I was joking before several days that if we don't have
> support for immediate values, we won't have any data to load because
> there's no way to create the data in RAM in the first place (which
> not true of course, reading hard-wired constants from a special reg
> and/or incrementing/shifting/performing bit ops will still provide
> valuable way to enter data in the programs, but still... it sucks
> compared to the immediates. Actually, my opinion is that the
> is always a victim of any ISA - when having fixed instructions you
> either sacrifice immediates for register addresses and more
> or you sacrifice the operand count (and reuse one of the operands as
> both source & destination, doh), or you have multiple
> formats that had to be decoded simultaneously in order to provide
> needed flexibility (that's not cool, but it's inevitable price to
Those are the same dilemmas for everybody :-)
> About the instruction condition codes - are these available only
> the YASEP32? And also, are they emulated by the high-level assembler
> when generating machine code for YASEP16?
They are available for both YASEP16 and YASEP32. The datapath width is not
the instruction width.
> Btw, I didn't found a description of all the
> registers, so I'm not sure what are these used for (they look
> like memory index & memory window registers, but that's a wild
I am working on the related page at this moment, on both the French and
English versions to keep them in sync. An older version is there :
However, several things have changed. I'm adding "register parking" and the
numbers have changed (R1-R5 instead of R0-R4), plus other subtleties.
> About call/return functionality - I saw that you have done
> preliminary work on it. I'm by no way expert on VHDL (I typically
> write for my hobby in Verilog), but I checked the RTL and to me it
> looks like that there's no other way to modify the PC register -
> not part of the general purpose register file, so instructions like
> CALL/RETURN will be inevitable, imho. Nevertheless, I would be happy
> if you can share your thoughts on this important topic.
It is very important and I resisted for a long while before adopting a
particular solution. My main problem is that it's inherently a 2-writes
operation. It is necessary to treat PC independently, which raises a lot of
issues. But for now, it seems to work in the microYASEP pipeline. It may
evolve : for example, I have not implemented "call with offset" (CALL2) because
I believe it's a slippery slope, but it's "technically possible" so hey...
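The "2-writes" problem can be pictured with a small Python model. This is only a sketch under assumptions that are not stated above: the return address is saved in a general-purpose register (here called the link register), and instructions are 2 bytes long. The names call, ret and insn_bytes are hypothetical.

```python
def call(cpu, target, link_reg, insn_bytes=2):
    """Model of a CALL: two state updates in a single instruction."""
    # Write 1: save the address of the next instruction (assumed link register).
    cpu["regs"][link_reg] = cpu["pc"] + insn_bytes
    # Write 2: update PC, which is handled apart from the register file.
    cpu["pc"] = target


def ret(cpu, link_reg):
    """Model of a return: copy the saved address back into PC."""
    cpu["pc"] = cpu["regs"][link_reg]


cpu = {"pc": 0x0100, "regs": {"R1": 0}}
call(cpu, 0x0200, "R1")
print(hex(cpu["pc"]), hex(cpu["regs"]["R1"]))  # 0x200 0x102
ret(cpu, "R1")
print(hex(cpu["pc"]))  # 0x102
```

An ordinary instruction performs one write; here both the saved address and PC change in the same step, which is why PC needs separate treatment.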
> PUT, GET - I'm not sure how these functions access the SFRs (they
> unimplemented in the VHDL).
They were implemented for a prototype and the SR map has not yet been
> Btw, I would typically advise against
> separating the address space (separation in any form) like memory
> I/O spaces - it's much more straight-forward to move & manipulate
> around with unified instructions and to access memory-mapped
> peripherals (but in the same address space). Of course, the
> price for peripherals/SFRs is the address decoding - it's either
> partial & ugly, or full and expensive :).
I make a distinction because :
- memory is meant for high-speed, bulk transfers that MAY be performed out of
order and with latency. Memory is for data and instructions.
- SRs are "serialising" and take effect immediately, which is critical for control. Memory
mappings, inter-threads protection, configuration of peripherals etc. will be
I hope this clears some misunderstandings :-)