YASEP news

To content | To menu | To search

Sunday 27 July 2014

The first PCB designed for the YASEP

My development time is dedicated to the "WizYASEP", a board containing a FPGA (A3P250VQ100) and a Wiznet WZ5300 (high speed TCP/IP module).

The first prototype board, using Dipole's services, is received and soldered :

In the coming months, a batch will be produced and deployed. They will display videos on outdoor LED screens, see some previous similar projects at https://www.facebook.com/pages/YGDES/101308076630787

Thanks to its connectors, this small board is quite versatile and will find many applications, as is or with extension boards. The PCB itself can also be customised and adapted for specific projects.

Later, this board might become the next development tool in the YASEP ecosystem : equiped with a HTTP server (with HTTAP support), storage and probes, it will be a great in-circuit-emulator for other YASEP boards, such as the Actuino

Due to time constraints, the first version will use an old microYASEP core (pre-2014) but the newer, faster and enhanced miniYASEP will follow in 2015, implementing the new instruction set format.

Oh, and Microsemi has released a QFP144 version of its Igloo2 FPGA. It looks very promising...

Monday 28 April 2014

Baby steps

The YASEP's development progresses on many fronts so big results are not visible yet.

I just started http://httap.org/ to document and publish the work on a TAP protocol over HTTP. A lot of multi-language development is going on, while writing articles on many connected subjects.

HTTaP is critical to rewrite the microYASEP from scratch with "Design For Test" in mind. In fact I'm investing time on debug/design tools in the hope that further development will be much faster, like on many occasions in the YASEP project !

In need to deliver a microYASEP VHDL core in a few months so it's pretty critical. The YASEP2014 architecture is still fresh in my head and there are so many details to correct. But some good things emerge anyway, such as my recent rewrite of the register set code (it now allows a "side channel" for address updates) or the removal of the ugly SWAP_IR flag (probably one of the most confusing aspects of the architecture that became finally pointless with the latest instruction format changes).

Translation (to spanish and german) is still on the table though no meaningful progress has occured lately.

As usual, pretty unstable snapshots are uploaded to http://ygdes.com/~whygee/yasep2014 whenever possible, so you can look at the evolution.

Wednesday 26 March 2014

Why ? Pourquoi ?

(Français plus bas)

I get asked this question often : Why design another CPU softcore ?

I (Whygee) design electronic circuits to address the needs of my clients. The commercial solutions are often lacking in flexibility. It's often hard to find the right part number to match a project with its fixed feature set, including I/Os, speed, processing power, memory, peripherals... The chips may be cheap but the whole toolset can become expensive, and requires a Windows computer. On top of that, parts stock obsoletes quickly and long term availability is never certain (will the company go bankrupt or be bought next year ?).

On the other hand, the existing open source softcores do not solve all these problems. Some of them are little student projects with no continuous development, the toolset is often minimal, the requirements or usefulness might be arbitrarily specific, the documentation may be unsufficient...

The YASEP's purpose is not only to create a processor family but also develop the tools that I need and that may be re-used for other projects, in particular to design other CPU architectures. Many softcores need to reinvent the wheel and can't afford to make a sleek integrated IDE. The YASEP is also a reusable modular toolset.

Today, I can wrap the YASEP around customer projects' needs, instead of warping my technical solution around some manufacturers' constraints. In fact, I can solve most of my microcontroller needs and I stock only a handful of FPGA part numbers. If the YASEP solves my problems, it might also solve yours and that's why I have open sourced all the necessary resources.

You will find some background on this project in these presentations: EHSM2012 (english) and JMLL2012 (french).


On me demande souvent : Pourquoi créer encore un autre microprocesseur ?

Mon métier est de réaliser des circuits électroniques pour répondre aux besoins de mes clients. La plupart du temps, les produits commerciaux ne sont pas assez flexible. Il est difficile de trouver la bonne référence de circuit intégré avec les bonnes caractéristiques, qui dispose du bon nombre d'entrées-sorties, qui soit juste assez rapide, avec juste assez de mémoire et les bons périphériques... Les puces sont souvent abordables mais l'ensemble des outils de développements peut coûter très cher, surtout si on doit acheter un ordinateur dédié sous Windows. En plus, les stocks de composants deviennent vite obsolètes et la pérennité à long terme n'est jamais assurée (est-ce que le compagnie va faire faillite ou être rachetée l'année prochaine ?).

D'un autre côté, les softcores open source ne résolvent pas tous ces problèmes. Certains de ces projets sont abandonnés, les outils de développement sont insuffisants ou beaucoup trop restrictifs, la documentation est insuffisante...

L'objectif du YASEP n'est pas juste de créer une famille de processeurs mais aussi de développer des outils dont j'ai besoin et qui peuvent être réutilisés par d'autres projets, en particulier pour concevoir d'autres architectures de processeurs. Beaucoup de projets de softcores doivent réinventer la roue et ne peuvent pas investir le temps pour créer une interface utilisateur aboutie. Le YASEP est donc aussi un ensemble d'outils modulaires et réutilisables.

Aujourdhui, je peux adapter totalement un YASEP pour répondre aux besoins de mes clients, au lieu de plier mes projets aux contraintes des fabricants. En fait, le YASEP convient pour la plupart des projets où un microcontrôleur est requis et je peux garder en stock un nombre plus réduit de références de circuits intégrés. Et si le YASEP résout mes problèmes, il pourrait aussi vous aider et c'est pour cela que je diffuse tous les outils en avec une licence libre.

Pour plus d'informations sur la génèse de ce projet, regardez ces deux présentations : EHSM2012 (anglais) and JMLL2012 (français).

Thursday 13 February 2014

Flag polarity

RISC does not like status flags, but it was determined to be a necessary evil for the YASEP. Yet something more important than cheduling has confused the flags : their values. They have often created more problems than they solve because I always remember the wrong ideas about them, or I forget that I changed a detail.

The Carry flag

In the beginning, it's very simple : the carry flag is set to 1 if the result of an addition generates a carry (overflows). It's valid both for signed and unsigned, thanks to the magic of 2s complement.

Then comes the subtraction : a little electronic quirk made me chose to not complement the borrow flag when there is a borrow. So the flag is set when there is no borrow. It's not usual but it saves maybe half a nanosecond and it's hidden by the symbolic treatment of the assembler with the "BORROW" condition.

Comparisons and MIN/MAX are even more confusing and I never know what to expect or how to come with the right thought process... The fact that the operands can often be swapped does not help !

(to be continued)

The Equal flag

This one just changed polarity, again. So now it's simple : it is set to 1 when the operands are equal.

The value's calculation is a bit subtle : it reuses the ROP2's XOR layer, but since CMP also performs a SUB in parallel, the SND operand is negated, so it's actually a XORN.
Additionally, the reduction was done by a OR, and now it's done by AND (otherwise it won't work).

The documentation at http://yasep.org/#!doc/reg-mem#equal is updated with a new diagram. This time, I should not forget the subtleties anymore...

Sunday 9 February 2014

Zero becomes Equal

I'm currently reviewing all the documentation of the YASEP. Going back to what I've done in the past, sometimes in a rush, and looking with the perspective of experience, gives me the opportunity to correct details that have become confusing in practice.

One example is the "Zero flag". It was created without thinking it through, during a frantic coding session.

It is affected only by the CMP instructions. The flag's value is the direct result of XORing both operands, then ORing all the bits. It uses the ROP2 unit's circuits, not the ALU's and it's independent from the Carry.

Why was it called "Zero" ? Maybe by habit. In fact it does not test if the operands are clear, even though it could. It tests if both operands are equal : it should be renamed to EQUAL / EQ and it would become coherent with the NEQ/EQ condition keywords. So let's do this !

Wednesday 5 February 2014

The yasep2014 milestone

With the resolution of the auto-update flags problem, other fundamental issues were ripe to be addressed. And they are now solved ! So here is the overview of the next major revision of the YASEP architecture :

The update fields are not the only one that changed. The immediate fields have been harmonised and the condition and destination fields have been swapped, which reduces the instruction decoder's complexity a bit (fewer wires and MUXes).

I'm also changing the rules that govern the assembly language's syntax. This will put the immediate values always after the opcode, which will save some code by dropping the Ri/RRi/RI/RRI forms that are seldom used. Less code makes better code :-)

New memory-handling instructions will also appear, like BSWAP, and a pair of "shift & OR" opcodes for bitstream extraction.

The VHDL code also needs a full rewrite that also offers a debugging and testing through an auxiliary port.

These changes are quite deep and require a full review of the whole code base, but it was necessary and now seems to be the right time to do things right, before new features are added on top. The changes will take a lot of time to mature and will not be published on the main site for a while. Stay tuned !

Friday 31 January 2014

Definition of the auto-update fields

(edit 20140207: some stupid typos crept into the tables)

(edit 2014-02-08 : dropping all the pre-modifications)

The instruction set of the YASEP architecture is finally frozen, after years of fine-tuning and exploration !

In august 2013, during a discussion with JCH, I came up with a new encoding for the 4 remaining bits of the extended instructions that were reserved for register auto-updates. I've been struggling with the one big shortcoming of the architecture : the very limited range of Imm4, particularly for conditional relative jumps. I had hacked a few tricks but none were really satisfying.

JCH pointed to some autoupdate codes that didn't make sense in combination with other flags and that's how he found a way to get 2 more bit for SI4/Imm4.

I tried to simplify the system down to a few simpler codes, following these principles :

  • A little reminder : when a D register (a memory access register) is referenced, it's the corresponding A register that gets updated, according to the size of the accessed word (1, 2 or 4 bytes). Otherwise, A registers are incremented by 2 or 4 bytes (depending on the datapath width, 16 or 32 bits) and R registers are incremented by 1. It's not very orthogonal but quite efficient.
  • any register may be post-incremented or post-decremented with one instruction (handy for string/vector code)
  • There must be "room" for 2 Imm bits and it should not break existing compiled code (NOP=0000)
  • Any of the 4 register fields may be affected

The important trick that JCH found is that the Imm/Reg field invalidates certain auto-updates and frees some bits. In particular, it makes no sense to update SI4 when this source operand is immediate, so SI4 is associated with NOP in certain cases.

There is very little room and I had to make some compromises. For example, the CND field can't be updated when other registers are. Pre-incrementations are also avoided (see at the bottom why). It's not possible to increment one register and decrement another.

The resulting format provides Imm6 and one post-update for all extended instructions, and one to three post-updates when no immediate is present.
  • iRR instructions use 2 bits to encode Imm6 along with 2 bits for updates :
00 NOP
01 SND+
10 DST+
11 CND- (this helps loops)
  • RRR instructions use 4 bits to encode more complex updates
        00           01         10    11
00 NOP SND+,SI4+,DST+ SI4- SI4+
01 SND-,SI4- SND+,SI4+ SND- SND+
10 DST-,SI4- DST+,SI4+ DST- DST+
- The big advantage of this encoding is that it increases code density for a lot of very common sequences : stack manipulation, string/vector processing, counters... Code density increase does not always mean faster execution but it helps. Different microarchitectures might implement these flags with different approaches (serial or parallel)

- There are several drawbacks as well : the encoding favors density over decoding ease (but what can we do with only 4 bits ?). The new encoding also breaks Imm4 and a new assembler must be recoded from scratch (the current one is aging and its flexibility has been stretched to its limits).

- In the end, it is a progress :

  • Code density will increase again (maybe 20%).
  • Auto-updates are an optional feature but we have freed 2 Imm6 bits for general consumption. This will benefit all the YASEPs out there (which must be updated, fortunately there are not a lot yet ;-D ). Post-update is a first level of compatibility, and pre-update is more difficult to implement so it's a second level (less expected to be available).
  • This help extend the range of PC-relative conditional jumps
  • This solves the limitation of Shift/rotate operations
  • No more unused bits in the instructions !

- Some questions remain :

  • It makes sense to update the A register of a D register that has just been written to (to update the destination for the next write, in a string-copy sequence for example). What about the case where an instruction writes to a R register with post-increment ? What is the priority ? Auto-updates were initially meant for address registers only but later extended, should this be restricted again ? If so, would that break even more symmetry and create more complexity ?

Right now, the priority is to rewrite the assembler/disassembler and keep the simulator and VHDL up-to-date. My work system is in a bad state and it will take time to get everything back in order.

Why no pre-increment or pre-decrement ?

Pre-modification are removed because they break the very important rule that an instruction should not trap (or be able to trap) in the middle of the execution pipeline.
In the case of pre-incrementing an address register, such as MOV -D1, R1, the validity of the new address in A1 is known only after it is being computed, but there is no way to gracefully stop the instruction in the middle or even restart it. The proper way to do it is to move the -D1 into either a previous instruction using A1 or D1, or simply emit a short ADD -1 A1 instruction before the actual move to R1.

Remember : all the operands must be directly ready for use (at decode stage) before the instruction can proceed to execution stage.

The previous table was :
        00       01   10    11
00 NOP +SI4 SI4+ SI4-
The new table uses the 4 pre-inc entries for 2-post-decrement and 3-post-increment.

Friday 9 August 2013

A new register parking system

In two years, I had some time to think about the "register parking" system that was introduced in this post : http://news.yasep.org/post/2011/11/08/Register-Parking

Now, I have understood the shortcomings and have found a slightly simpler system : the "parking address" is the same for everybody, it's -1.

Read the new description in the updated documentation at http://yasep.org/#!doc/reg-mem#parking

Tuesday 6 August 2013

Project split

A new website is now dedicated to the YGWM toolkit : http://ygwm.org/

It's empty now but it will host the project-independent files created for the YASEP's website, making them reusable by other projects. This also splits the development into manageable parts and reduces the size of the YASEP archives (that can get rid of YGWM-specific tutorials etc.)

I had the intention to "reboot" the whole yasep.org site (like i did in 2009 when i started including ygwm) and this is a great opportunity to do that again, reviewing all the existing code and solving the problems that had accumulated all those years ;-)

Finally, looking beyond the reboot, this is also a great way to start other projects, like other CPU families in the far future...

Friday 26 July 2013

There is always crazier than you.

Today, I received an email from Jean-Christophe who asked me to copy-paste some source code he attached, open a framebuffer and simulator, and let it run a few hours...

This is what his code displayed :

Yes, this is really a Mandelbrot set, computed in integer mode with 8-bit multiplies...

Of course it's slow but it's extremely interesting and gives ideas about how to speed up the simulator. It runs at about 2500 instructions / second on a Core i3 @2.3GHz on Firefox and so far it's behaving well... I think about :

 * adding a 16x16 bits multiply for YASEP32

 * adding register parking in the simulator

 * creating a data memory area for the simulator (so one does not have to use the framebuffer to store variables)

 * buffering access to the DOM so the browser is refreshed less often, and spends less time managing graphic elements that are not even seen or displayed...

Thanks Jean-Christophe, and keep blowing my mind !


JCH sent me a slightly faster version of the code :-) 518 million cycles instead of 820 millions...

Saturday 20 July 2013

The tracker's backend

The YASEP's website is slowly integrating a tracking system : it will keep a list of open tasks, bugs, features and other stuffs to do. This is a much clearer replacement for the http://yasep.org/changes.txt file that is growing out of hand...

The web interface at http://yasep.org#!track is only embryonic and will take time (months...) to develop. It's a front-end to a publicly accessible bugtracker located at https://bitbucket.org/dahozer/yasep/issues

A "cached" version of the list of opened bugs and tasks will be provided in each version, and it will get updated when opening the window, if the browser can access Internet.

So far, it's a "read only" system : I will keep the control of the bug tracker's contents, because the messages need special formatting to add extra metadata that are useful to me : for example, the estimated effort to complete a task (hours, day, week...). I also want to prevent injection of arbitrary data into the project. If you find any bug, don't hesitate to contact me directly.

Sunday 14 July 2013

More about the IPC instructions

Jean-Christophe sent me an interesting email loaded with questions about the IPC instructions. Before I address them, I felt that I should provide some background in the precedent post about "limiting YASEP32's threads code sizes to 16 bits". Go read it now !

Done ? OK. And now, the (translated) email.

> Concerning the 3 instructions IPC, IPE et IPR of the YASEP, I have read that you designed them with the HURD's needs in mind. However, I'm not sure to see how it solves the problem.

This story started long ago, in the F-CPU era, and the encounter with the HURD team at RMLL2002 in Bordeaux. They were trying to solve the problem of slow inter-server calls that crippled the efficiency of their system. That's more than 10 years ago...

Since then, many things have evolved and the question is quite a bit different, now that I can redesign the WHOLE computing platform, not even being forced to "run Linux" or "run the HURD". I make YASEP run whatever I need or want and I don't care as much about others' whims. But the question remained.

The IPC instructions solve one part of the problem of switching fast to another thread. These instructions make sense when the YASEP is implemented with a large register bank that can hold 8, 16 or 32 tread contexts. In this case, and if the called thread is preloaded, the switch is almost instantaneous.

Of course, you can't limit a system to run only 8, 16 or 32 contexts. This is only suitable for a SMT architecture (see : "barrel processor") but software size could grow beyond that. My actually running Linux laptop has about 177 tasks right now, and only a few are actually using the CPU. So, for large implementations, the YASEP must store the corresponding, actual thread ID as well as a smaller, 5-bit ID for the cache. Or 6 bits, if you want to emulate Ubicom.
Some associative memory and you implement the cache mechanism. And if your code calls a thread that is not already loaded in the CPU register bank, you "get hit by the miss", but "this should not happen too often". And embedded systems that don't need 32 simultaneous threads can just use the 5-bit ID directly (and trap if the thread ID is too large).

For the rest of this post, bear in mind that I am not a microkernel specialist. There are even several points of view about them and I will only speak about mine, despite having never created a full operating system. At least I know what features and behaviour my OS will have.

> If these instructions are meant to call routines in safe code sections (TCB) it might be a good, flexible solution.

That was the initial purpose : call shared libraries and fast system calls. Later, it evolved with the addition of some additional security checks. It was possible because the YASEP is not created to use classic paged memory only.

> But I understand that the HURD's problem was to provide a platform where same-level users (less privileged than the machine or the admin) could run their own servers and exchange services safely. One typical example being User A creating a USB file system and letting User B access files on the USB flash dongle. B must call functions (readdir, read, write, etc.) in A's server.

There the problem is not totally solved by the IPC instructions but they help by making the context switch faster. However the big problem is the transfer of data across memory addressing spaces, and that requires a later, deeper analysis and smart design. It's way too early for this now.

> What happens when the server does not return to the client ? This could happen if an error or bug occurs in the server, or if it is malicious. There is a need of a mechanism that lets the caller get his control back but it complicates the design of the servers, that could be interrupted at any moment (which is not impossible or desired).

Reliable code MUST be resilient. And code by definition may be interrupted at any point. Exception handling is an integral feature of high-level languages and low-level systems are naturally prone to failures : flash memory errors (worn out ?), file system capacity saturated, network down for any reason, USB cable that is removed without notice... And bugs happen.

Even critical sections could fail. This is why I consider an instruction design (for http://yasep.org/#!ISM/CRIT) where you can check if the critical section has been interrupted or not. In that case, you re-start the critical section. It should be more resilient and safer than blindly disabling interrupts. Expect things to fail instead of relying on assumptions.

> One of the underlying questions is : who "pays" for the resources that are necessary to run the request ? The client, who lends his resources (and that must get them back from the server in case of a failure) or the user who provides the server (who then risks being DDOSed)...

My opinion is that the requester must pay for the resources to run the request, for example by providing CPU time, access rights and memory, that are necessary to complete the request. Of course, certain protections are necessary to prevent abuses or to limit the effects of bugs.

How the resources are accounted is another important thing to define, along with how to easily transfer data blocks or credentials between servers or their instances. For example, if a server is called by User 1, it should not be able to share these informations with the instance that services User 2. I'm interested by hardware-based solutions that speed-up and simplify software design, without turning into a mess like the iAPX432 :-)

A crippled YASEP ? It's for your own good...

The YASEP is my playground so I can experiment freely with computer science and architecture. It lets me consider the whole software and hardware stack, not just some pieces here and there.

A recent conversation about the IPC system (Inter Process Call) reminded me of a characteristic that I once considered. It is quite extreme, but it opens interesting possibilities : limit the code size in YASEP32 to 16 bits addresses. That makes 32K half-words or about 20K instructions per thread. Of course, there would be no absolute limit in the number of threads.

But why cripple the architecture ?

Because the YASEP is meant to be an embedded CPU, running a microkernel OS, and here is why it is possible and desirable.

 - The imm16 field is available to most opcodes and it provides the best code density.

 - Embedded systems and microcontrollers usually run small applications. There is little chance that 20K instructions get exhausted by a single thread, for a single purpose. Once you reach 10K instructions, you have already included libraries, system management code, and of course tons of data, and all of these can (and should) be kept separated.

 - Microkernels spread, divide and separate system functions, using "servers". My idea would be to push this approach and mentality down to the user software level. And there is no way to coerce SW developers to NOT write bloatware without "some sacrifices".

 - GPUs already have such an approach/organisation (small chunks of code for discrete functions) for hardware and software reasons. "computational kernels" are quite short.

 - In the "microkernel" spirit, shared libraries become a special type of server. In POSIX systems, the flat address space of each thread maps to each shared library, and memory paging flips the cards in the background. A huge pointer (and big arithmetics) is necessary. In an early YASEP32 implementing multiple threads, shared libraries become possible without adding memory paging. A bit like a hardened uClinux. Code (and the return stack) would be a little bit more compact (no 32-bit constants for code addresses) and reaching external resources would just require knowing the "extension code", the ID of the thread that provides the feature in a "black box".

 - Such an approach is possible because "IPC is cheap" now. In the "flat linear space" systems used in POSIX, the burden of protection is on the paged memory (TLB) and the fault handlers, who can take hundreds of cycles to switch to a different task... But the YASEP's IPC is marginally more complex than a CALL.

 - Only code would have limited address ranges. Data pointers are not affected. I have programmed and used MS-DOS systems in 16 bits environments and the constant problem was how to access data, not code. Of course there are certain applications that need more than 20K instructions but they need to be split, the PC of these times had "overlays" and other kludgy mechanisms that I don't want to revive.

 - Modularity. Design software so its components are interchangeable, hence a bug here has less chances to affect code there. Or hot-swap a module/server/lib while the system is running. Microkernels can do it, why not applications ?

 - Safety is increased by the extra degree of separation between any "sufficiently large chunk of bugs^Wcode" to prevent or reduce both accidental and malicious malfunction.

But for this approach to be realistict, cheap IPC is not enough. Sharing data, fast and safely, between "servers" is another key !

TBD (to be designed)

Wednesday 2 January 2013

The YASEP2013 season is open !

Happy new year everybody !

EHSM2012 just finished and it was a great opportunity to meet likeminded hackers, and the MP3, video and slides are available in the archives.

The YASEP2011 period is now closed and archived in its glory and gory, YASEP2013 will try to keep the high level that it inherited :-)

And to start this new year, a new contribution will be integrated : the "firstrun" is translated to german, the 4th language that the interface now supports !

It will take a few weeks to update the site but a lot of small incremental enhancements are expected soon.

Enjoy this new year !

Thursday 29 November 2012

YASEP2011 is almost finished

JMLL2012 was great !

  •  The conference had no "demo effect" !
  •  MP3 + transcription with the slides will be available (one day...)
  •  microYASEP is coded both in 16-bits and 32-bits
  •  microYASEP simulates in JavaScript and VHDL
  •  the framebuffer (graphic output) works in JavaScript and VHDL too (but VHDL only on 32-bits linux)
  •  the JavaScript simulator runs (almost well) under the webkit-based Midori browser, hence on the Raspberry Pi !
  •  The project turned 10 !


A lot needs to be done :
  •  solve one GUI bug
  •  add new tutorials and translate the others
  •  transcribe and translate the conference
  •  translate all the website to Spanish
  •  adapt it for webkit
  •  finish the last 2 instruction set manual pages
  •  more (virtual and real) interfaces for the CPU : servo-motor, alphanumeric LCD, LEDs&Keys...
  •  create the enhanced serial interface for programming & debug
  •  prepare the Berlin presentation and demos
  •  transition to YASEP2013 
Any help is welcome !

Monday 19 November 2012

YASim is working

Another quick announcement : YASim, the YASep Simulator, is now able to compute data, loop&call, and even display pixels on the screen.

It had to be said ;-) Particularly since the presentation is so soon...

Thursday 15 November 2012

The YASEP goes to Berlin

It's official : the YASEP is invited at the Exceptionally Hard & Soft Meeting held in Berlin on december 28-30 2012. 

A 30 minutes talk will present the project, it's shorter than at JM2L but it will be completely in english. It will also be the official transition between YASEP2011 and YASEP2013.

Hopefully, better live demos will be possible by then ;-)

Thursday 4 October 2012


It's official !

The YASEP will be presented during two hours, on saturday, Nov. 24th at Sophia Antipolis, during the Journées Méditerranéennes du Logiciel Libre organised by Linux Azur

Informations are available at http://jm2l.linux-azur.org/content/2012/yasep

The slides will be published during the presentation and will start the transition between the YASEP2011 and YASEP2013 architectures.

There is still a lot of work but it's looking good so far. See you there !

(updated 2012/11/15)

Monday 24 September 2012

Virtual Load and Store


The Instruction Set Manual is nearing completion.

As if things weren't serious enough, I'm now reviewing the Load and Store instructions. And since there is none, I removed them.

Confused ? It's normal : the YASEP has no load or store operation. But there were opcodes named after this, thinking they would make people feel more comforable. Which was not the brightest idea ever.

So I removed them and kept the aliases I had created : the Insert and Extract opcodes. As the name says, they insert and extract bytes or half-words (8 or 16 bits).

So far the YASEP had these opcodes : IB IH IHH ESB EZB ESH EZH

End of story ? NO !


Register set ports

The review of the Instruction Set Manual unearthed some concerns I had for a while : the "Load" and "Store" instructions get auxiliary data from other implicit registers. Which can create quite an electronic mess... And it's not flexible enough. It was created for the sake of load and stores, using the "register pair" system where the Address register provides its LSB to the shifting unit, so it knows how much alignment is necessary.

While reviewing the ESB / EZB instructions, I remarked that it was a one-read, one-write instruction (in FORM_RR or FORM_IR). What a waste of coding space, let's make it a RRR or IRR instruction and get rid of the implicit operand that must be fetched in the other registers. Easy, and architecturally much better.

But the Insert instructions (IB and IH) are another beast, they need 3 operands and a destination. The first operand is the register that contains the data to insert, the second is the word that will receive the data, and the 3rd operand is the implicit shift count that comes from the corresponding A register if a D register is written...

It's not cool. First there is this rule, that I thought was critical, of having only two read ports for the register set. Second, splitting the register set for performing parallel reads is a trick that can backfire mercilessly later. Going 3reads-1write would be great... At what cost ?

But wait, the YASEP is already 3reads-1write because there are the conditions to read. The microYASEP implements this by passing both halves of the extended instruction through the 2-reads register set. And there is one result that remains unused during the second cycle... That's it !
So in practice, despite the limitations, the microYASEP is a 4-reads 1-write engine. 1 read for condition, 3 reads for operands and the destination register is one of the operands. Great. Now let's create the new flags "READ_DST3" and "IMM_LSB" and change a lot of the code that is already around... Lots of work indeed.

The return of the carry flag

The YASEP has a carry flag and a zero flag so let's put them to good use.

One concern of the YASEP32 when inserting and extracting unaligned half-words, is the case when the offset is 3 bytes : only one byte is available in this case. This means that the result is partial and not working as expected. The simulator sends an error message but that's not very handy. The assembly code that detects this case could take at least a pair of instructions, then there is the leftover to align...

The little stroke of genius is to change the carry flag when such a situation occurs. Then, for the user's code, it's just a matter of a few conditional instructions to increment the pointer and store the remaining byte at the new address (or just skip the alignment sequence). Something like

 ; Unaligned write :
; memory is pointed to by A2:D2,
; the code stores 16 bits located in R1
IH A2 R1 D2 ; set the carry flag if A2's LSB are 11

EZB 1 R1 R2 CARRY ; If out-of-word then
; extract the high byte to temporary register
ADD 1 A2 CARRY ; point to the next byte
IB R2 A2 D2 CARRY ; now D2 points to the next word
; and its value, R1's high byte is inserted
; in D2's lower byte

It's a bit far fetched but it's still RISC.


Is IHH still needed ?

The instruction "Insert Halfword High" is a special case of IH that was created to supplement MOV so the user could overwrite the high half-word of a register. The intended use was this:

 ; put 12345678h in R1
MOV 5678h R1 ; LSB
IHH 1234h R1 ; MSB

A special opcode was needed because IH would get the shift from one of the A registers, or use 0, yet we needed to shift by 2 bytes...

Now we have a new IH that gets the shift amount from a register or an immediate field so things would be great in theory. In practice, there can be only one immediate number so IH is ruled out.

What we need is a MOV instruction that can shift the immediate value by 16 bits. So here we have it : MOVH. The instruction sequence is modified a bit :

 ; put 12345678h in R1
MOVH 1234h R1 ; MSB
OR 5678h R1 ; LSB
YASEP2013 looks better and better...

Monday 10 September 2012

el YASEP en español

The YASEP site is already available 3/4 in french (some less than important stuffs are not translated) and the language picker and all the machinery behind it are working well so... we started to translate a first page in Spanish ! Thanks Kris !

Other languages are welcome and desired : if you want to help, a page explains how translations work.

- page 1 of 4