YASEP news


Wednesday 8 December 2010

ACTUINO day 1

Yesterday, while talking with Jeff about our respective and converging goals, a new idea emerged.

Today, actuino.org is registered. The website will appear later, but the name has been found and secured while we work toward the next milestone : an electronic board that is DIY-friendly, very powerful and affordable, and that paves the way for developing the YASEP.

The one big issue for me, though, is that I'll have to cope with the Atmel architecture, which I don't "speak", so any help is appreciated :-)




Saturday 20 November 2010

Fast and secure InterProcess Communications

(post version : 20110108)

(update : 20110515 : environment inheritance)

Recently (2010/11/20) I found the critical elements that solve a crucial problem that the Hurd team submitted to me in ... 2002. It took time and many attempts but I think that the YASEP is a great place to experiment with this idea and prove its worth.

The Hurd uses a lot of processes to separate functions, enforce security and modularize the operating system. It uses "Inter Process Communication" (IPC), such as message passing, and this is snail-slow on x86 and most other architectures.

The YASEP uses hardware threads, a concept close to, but not identical to, the processes of an operating system. In the last few days I have found what was missing : the "execution context" ! So with the YASEP, a process is a hardware thread (a set of registers and special registers) associated with an execution context (the memory mapping, the access rights etc.)

Repeat after me : a process is a thread in a context.

This distinction is necessary because threads are activated for handling interrupts, operating system functions, library function calls and communication between programs. It is a major feature for a processor that should provide functionality beyond that of a mere microcontroller...

So IPC is necessary to make a decent OS, and it requires several hardware threads (threads can be interleaved at the hardware level to provide concurrency and better performance) and several contexts (for the operating system, device drivers, libraries, interrupt handlers...). The processor can switch at will from one to the other with much less latency than a usual CPU.

The antagonistic requirements are as follows :

  1. A process must be able to call code from another context FAST, as fast as possible.
  2. The mechanism must be totally SAFE and SECURE.
  3. The physical implementation must be SIMPLE.

Simple and fast go hand in hand (ask Seymour Cray. Oh, wait, too late...). In the YASEP, communication takes place with a restricted variant of the function call instruction. Function calls are difficult to "harden", so other architectures usually provide more generic, dedicated instructions for IPC or system calls. These are quite simple to implement in a CISC architecture like x86 because microcode can do whatever is required... But they are slow because several dependent memory fetches must be performed (read the access rights table, then find the address of the code to execute, etc.)

The YASEP is a RISC-inspired architecture and requires a new approach. What I have found requires just 3 new opcodes :
  1. IPC : InterProcess Call
  2. IPE : InterProcess Call Entry
  3. IPR : InterProcess Call Return
Since the YASEP has a bank of several threads in the register set, a context switch takes only a few cycles. One way to further reduce the execution time is to pre-calculate the destination address of the called code : no call table or anything else that requires several chained/dependent memory accesses. In order to obtain the jump address, a thread must register itself with the called process and obtain the context number and the effective address. The calling thread can then modify its own code (update the constants) or variables to make the proper IPC later. Here is how simple it gets :
  IPC R1, R2     ; call context number R2 at address R1
  IPC 1234h, R2  ; call context number R2 at immediate address 1234h
Security is a bigger beast, and just changing the TID (Thread ID) value is not a good method. The first big problem is that any code can call any context at ANY address, so a security mechanism is required to block unwanted calls. The policies could be arbitrarily complex (depending on the OS strategies) and don't belong in hardware (unlike x86), so a software-based authorisation system is preferred (like MIPS !). This is the role of the IPE instruction :
  1. IPE provides the Thread ID and Process ID of the calling thread (it's a kind of GET). From this, the callee can choose to accept or refuse the call, provide a specific service or even choose not to check at all. Any software can create its own policy, call by call !
  2. IPE is NECESSARY for the IPC instruction to complete. If IPC points to an instruction that is NOT IPE, an error is triggered. This prevents applications from jumping to arbitrary locations in other code.
  3. Each thread can restrict the range of callable addresses so calls can't enter data sections. This is the role of additional registers.
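The three rules above can be modelled in a few lines. This is a minimal Python sketch, not the YASEP specification : the `Context` class, the `call_lo`/`call_hi` names for the range registers of point 3, the `IPCFault` trap and the `policy` callback are all hypothetical, and the real checks would of course be done by hardware in a few cycles.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class IPCFault(Exception):
    """Raised when the hardware refuses an IPC (hypothetical trap)."""


@dataclass
class Context:
    cid: int                      # context number
    code: Dict[int, str]          # toy instruction memory: address -> mnemonic
    call_lo: int                  # lowest callable address (hypothetical register)
    call_hi: int                  # highest callable address (hypothetical register)
    # Software policy run by the callee once IPE has returned the caller's IDs.
    policy: Optional[Callable[[int, int], bool]] = None


def ipc(contexts: Dict[int, Context], caller_tid: int, caller_pid: int,
        target_cid: int, address: int) -> str:
    """Model 'IPC address, target_cid' followed by the callee's IPE."""
    ctx = contexts[target_cid]
    # Hardware check: the target must lie inside the callable window,
    # so that calls can't enter data sections.
    if not ctx.call_lo <= address <= ctx.call_hi:
        raise IPCFault("target outside the callable range")
    # Hardware check: the first instruction executed must be IPE.
    if ctx.code.get(address) != "IPE":
        raise IPCFault("target is not an IPE instruction")
    # IPE hands the caller's TID/PID to software; the policy belongs to the callee.
    if ctx.policy is not None and not ctx.policy(caller_tid, caller_pid):
        return "refused"
    return "accepted"


# Usage: a service in context 3 that refuses calls coming from process 7.
service = Context(cid=3, code={0x1234: "IPE", 0x1236: "ADD"},
                  call_lo=0x1000, call_hi=0x1FFF,
                  policy=lambda tid, pid: pid != 7)
contexts = {3: service}
print(ipc(contexts, caller_tid=1, caller_pid=2, target_cid=3, address=0x1234))
```

Note how the hardware part stays trivial (two comparisons and an opcode check) while the policy, however complex, lives in the callee's software.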
When the thread calls code from another context at the right address, the register set is preserved (untouched), so passing parameters takes no effort. However, several new issues arise.

For example, how can a thread running in a different context access data from the previous context ? The proposed solution is to give each Address register an attribute : the context number. Upon a call, the newly spawned process modifies the necessary attributes to access both the current and the calling process. This means that all the previous contexts must be kept in the processor (since interthread calls must be reentrant). Before the call, the calling process should mark the memory ranges it accepts to share with the called process (marking them as "shareable"). This way, no data copy is necessary !
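A toy Python model of the context-number attribute and of "shareable" ranges may help. The page granularity (`PAGE_SIZE` = 256 bytes), the class names and the `load` helper are hypothetical; the post does not define how ranges are marked or checked.

```python
PAGE_SIZE = 256  # hypothetical sharing granularity


class ContextMemory:
    """Private memory of one context, plus the pages its owner agreed to share."""

    def __init__(self, cid: int, size: int):
        self.cid = cid
        self.data = bytearray(size)
        self.shareable = set()  # page numbers marked "shareable" by the owner

    def share(self, start: int, length: int) -> None:
        """Mark every page touched by [start, start + length) as shareable."""
        first, last = start // PAGE_SIZE, (start + length - 1) // PAGE_SIZE
        self.shareable.update(range(first, last + 1))


class AddressRegister:
    """An address register carrying a context-number attribute."""

    def __init__(self, cid: int, addr: int):
        self.cid = cid
        self.addr = addr


def load(memories, running_cid: int, areg: AddressRegister) -> int:
    """Read one byte through areg on behalf of the running context."""
    mem = memories[areg.cid]
    foreign = areg.cid != running_cid
    if foreign and areg.addr // PAGE_SIZE not in mem.shareable:
        raise PermissionError("page not shared by context %d" % areg.cid)
    return mem.data[areg.addr]


# Usage: the caller (context 1) shares a buffer, the callee (context 2) reads it in place.
caller = ContextMemory(1, 1024)
caller.data[0x10] = 42
caller.share(0x00, 0x20)
memories = {1: caller, 2: ContextMemory(2, 1024)}
print(load(memories, running_cid=2, areg=AddressRegister(1, 0x10)))
```

The callee never copies the buffer: it only points one of its address registers at the caller's context, and the check is a single page-flag lookup.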

The return address and thread/process/context IDs must be managed by the CPU core itself to prevent tampering by the caller or callee. This is the last point that needs some big work and HW real estate ... A classic stack, with a stack pointer, stack base and stack limit, is a necessary hardware addition.
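Here is a small Python sketch of such a hardware-managed return stack, using the equality checks listed below (pointer against limit on a call, pointer against base on a return). The frame layout (return address, TID, PID, context number) and the class name are assumptions for illustration.

```python
class IPCStack:
    """Hardware-managed IPC return stack (hypothetical model).
    Frames live in storage that neither the caller nor the callee can address."""

    def __init__(self, base: int, limit: int):
        self.base = base        # first slot
        self.limit = limit      # one past the last slot
        self.pointer = base     # next free slot
        self._slots = {}

    def push(self, frame):
        """Done by IPC: pointer == limit means the stack is full."""
        if self.pointer == self.limit:
            raise OverflowError("IPC stack overflow")
        self._slots[self.pointer] = frame
        self.pointer += 1

    def pop(self):
        """Done by IPR: pointer == base means there is nothing to return to."""
        if self.pointer == self.base:
            raise IndexError("IPC stack underflow")
        self.pointer -= 1
        return self._slots.pop(self.pointer)


# Usage: two nested calls, then the first return.
stack = IPCStack(base=0, limit=2)
stack.push((0x0102, 1, 2, 0))  # return to 0102h in context 0 (thread 1, process 2)
stack.push((0x2040, 1, 2, 3))  # nested call made from context 3
ret_addr, tid, pid, cid = stack.pop()  # IPR goes back to context 3 first
```

Equality comparisons are enough because the hardware is the only one moving the pointer, one slot at a time.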

So let's sum up the added hardware :
  • Each context must be able to mark memory ranges as data-read and/or data-write by other threads. This can be indicated by flags for each page in the page table. How this can be restricted to certain threads (those in the call stack) is still uncertain. A token scheme should be created where a permission can be passed to (and inherited from) another thread.
  • Each context has 2 registers that are compared to the called address to restrict unwanted calls.
  • Each process has a set of 3 registers for the IPC stack (pointer, base, limit). Pointer and limit are compared for equality upon call, and pointer and base are compared upon return.
  • There are also 5 new thread-private registers that determine the owner (thread number) of a pointer. They must be preserved in HW if the caller and callee do not trust each other.
That makes about 10 new registers ! How this will be implemented is still uncertain. Maybe a hardcoded sequence of instructions will be streamed through the instruction decoder, unless everything is done in parallel in big enough chips. This reminds me that in the past, I wanted to add "attributes" to the address generators of the VSP, with base/index/limit/stride. Now there is the context number, which is a kind of "address space number" (ASN). We can finally merge these ideas, and in 16-bit code we can use ASNs like segments in x86 : one for executable code, one for the stack, several for data, and no opcode prefix is needed.

Whatever the implementation, we're going here from a system initially designed for libraries and system calls, extended to the next level : a micro-kernel oriented architecture where processes can share memory they own so others can work on it, with little overhead. Will the Hurd people finally be happy now ?

Saturday 8 May 2010

YASEP2010


The main YASEP site has not been updated for a while...
Worse : the f-cpu.seul.org mirror has been down since January !

Is the project dead ?

No :-)

In fact a lot of things are being prepared, mostly in the commercial, infrastructure and very-low-level hardware (like : where are those 0402 capacitors ?) fronts. It's really exciting but it takes a lot of time and money ! Fortunately I'm not completely alone.

A string of good news will probably come in 2011. It will help bootstrap the whole YASEP project with different kinds of support and broad public exposure. It will be possible to hold a YASEP implementation in hand, as I work on both the hardware and software sides :-) BTW, a recent Wikipedia article gives a short summary of the YASEP's architecture.

Another critical part of the project (the VHDL source code and its infrastructure) is in active development : GHDL is now the officially supported simulator. I interviewed its main developer (Tristan Gingold) for GNU/Linux Magazine France n°127. Laura and I started a series of articles about VHDL development under Linux, and I am proposing increasingly advanced ... hacks :-) The first YASEP implementations will be designed with "design for test" in mind.

In parallel, another subproject is the design of a Libre, affordable, compact and Ethernet-enabled JTAG programming probe. More on this subject in the future, but it's critical for the rest of the project : my JTAG probes are either USB (restricted to Actel parts, and not working under Linux) or parallel-port (no new consumer-grade computer has this port anymore).

Finally, after the seul.org debacle (the main server was compromised because of its participation in the Tor network), I have opened a new mirror at TuxFamily.

So I'm still polishing the tools and gathering the parts. It's not a visible activity but it's probably the most important. What does an architecture mean if there is no infrastructure behind it ? Or no physical implementation that one can buy and hack oneself ?

Friday 13 November 2009

Support of Alphanumeric LCD with YASEP

I have been very busy since August, unfortunately not with YASEP, but I keep an eye on this project. Even though I can't dedicate days and weeks to it, I try to gather things here and there when they appear, like electronic parts, ideas, and ways to implement them.

For example, I've been thinking about how to display information with a simple FPGA kit. I already have a growing collection of alphanumeric LCD modules, and they make a good, cheap output peripheral.

From there, at least three things follow :

  1. The modules I own have different resolutions, from 1x8 to 4x20 characters, but there is no electronic means to tell them apart. So I recently imagined a method, discussed it a bit on USENET and decided that it was worth implementing. I am writing an RFC about it now.
  2. I'm going to add a set of Special Registers that drive the parallel interface of an LCD module in nibble mode. This will provide automatic strobes and ease application software development. This unit will also support readback of the LCD resolution, using the protocol defined in 1. The contrast voltage is controlled by a simple PWM/PD circuit instead of a trimpot.
  3. While looking for more information about HD44780-compatible modules, Wikipedia sent me to a JavaScript HD44780 simulator designed years ago by Dincer Aydin. He has done even crazier things, like a graphic LCD simulator or a PIC assembler in JavaScript ! I asked if I could reuse the alphanumeric code and Dincer kindly accepted :-D I have not looked at the source code yet, but I presume that it's going to need a lot of work (particularly for updating the display engine, because updates are "optimised out" in Firefox). Anyway, the YASEP simulator is not mature enough yet, so there is no hurry...
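For reference, here is the sequence of bus states that such a unit would have to generate for one write to an HD44780-compatible module in 4-bit mode : the high nibble goes first, then the low nibble, each latched on the falling edge of E. This minimal Python sketch assumes R/W is tied low and leaves out the setup/hold delays, the busy-flag polling and the initialisation sequence that switches the module to 4-bit mode; `nibble_write` is a hypothetical name.

```python
def nibble_write(value: int, rs: int):
    """Bus states (RS, E, D7..D4) for one 8-bit write in 4-bit mode.
    rs=0 selects the instruction register, rs=1 the data register."""
    if not 0 <= value <= 0xFF:
        raise ValueError("HD44780 transfers are 8 bits wide")
    states = []
    for nibble in ((value >> 4) & 0xF, value & 0xF):  # high nibble first
        states.append((rs, 1, nibble))  # E high, nibble on D7..D4
        states.append((rs, 0, nibble))  # E falling edge: the module latches it
    return states


# Usage: display the letter 'A' (41h), then send "Clear display" (01h).
for state in nibble_write(ord("A"), rs=1) + nibble_write(0x01, rs=0):
    print(state)
```

With Special Registers doing this, software would only write one byte and the strobes would follow automatically.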
Everything seems to be in place for future use of alphanumeric LCD modules. I have more than 20 of them available, I have already used some in a past PIC project, and the JavaScript framework will support them. I'm not saying it's going to be easy, but it's far easier than I thought !

Friday 2 October 2009

When you connect the power supply, it works...

As one can guess from the past messages on this "*log", I have been slowly preparing custom FPGA boards as a background activity. It's not an easy thing and can be quite expensive. So I patiently gathered the necessary parts through online stores and eBay, looking for interesting deals.

Finally I have all the necessary parts for a cheap and repeatable prototype. Among others :

* A bunch of A3P250VQG100 : I got them from a really nice Canadian guy and I'll use this specific reference as the main target for future work. I originally intended to target the A3P125 but I got a more powerful part for less money, so why refuse ? :-D The A3P250 has enough logic for moderately complex stuff (though SRAM is really TIGHT) and can replace microcontrollers in many cases.

* QFP100 adapter boards : FUTURLEC has cheap and good proto boards. The tinned pads make soldering easy : just add some liquid flux, no need for additional solder.

With the help of Actel's docs and the schematics of other boards, including ACME's Colibri, I easily wired the power rails. The board is not recommended for high-speed signals but the goal is only to check the schematics for more ambitious boards (probably manufactured through FUTURLEC too, as their PCB pooling service looks great).

I created a small, dumb VHDL design (an 8-bit counter with clock, reset and increment/decrement inputs) and a small companion board. The companion board also provides 3.3V from a battery, so I could avoid long wires from power supplies.

And in order to be programmed, the FPGA needs a JTAG interface. I soldered everything correctly but the JTAG/USB interface refused to work. After a short nap and many hypotheses, the problem was obvious : the JTAG signals were correctly wired but I had forgotten to wire the power supplies :-/ Obviously, once this was fixed, things worked considerably better... I'm amazed that it was the only error, considering my sleep deprivation :-D

First Actel proto board \o/

No, really, it just works as expected. I may finally have become good at this, after all the failures and false starts of the past :-D It even reproduces the strange behaviour that I had seen in other designs : the pins are REALLY sensitive ! Don't forget the pull-downs ! I made a basic/passive anti-bounce (just an RC filter) but it is useless : a single press of the clock button creates many strobes and the counter advances unpredictably. But I half-expected it : I did not even register the inputs in VHDL, so the design is naturally glitch-prone, and I don't care. It "works".
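The usual cure in logic is to register the button input (synchronise it to the clock) and only accept a new level once it has been stable for some number of clock cycles. Below is a behavioural sketch of that idea in Python rather than VHDL; the 4-cycle threshold and the sample waveform are made up for illustration, and a real design would wait for milliseconds' worth of clock cycles.

```python
def debounce(samples, stable_cycles=4):
    """Digital debouncer: the clean output only follows the (already
    synchronised) input once it has held a new level for
    `stable_cycles` consecutive clock samples."""
    out = samples[0] if samples else 0
    count = 0
    clean = []
    for s in samples:
        if s != out:
            count += 1
            if count >= stable_cycles:
                out, count = s, 0
        else:
            count = 0  # a bounce back to the old level restarts the wait
        clean.append(out)
    return clean


def rising_edges(levels):
    """Number of 0 -> 1 transitions, i.e. counter increments."""
    return sum(1 for prev, cur in zip(levels, levels[1:]) if not prev and cur)


# Usage: one bouncy press of the clock button, sampled once per clock.
press = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(rising_edges(press), rising_edges(debounce(press)))  # 4 1
```

The raw input would advance the counter four times for a single press, while the filtered one advances it once.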

What does this mean ?

* When enough resources are gathered, complex things become easier. I have invested a lot of money and time in the past years just to get to that point and ... it feels good !

* Great things that were "possible" now become "available" for future projects. This includes YASEP and other (commercial ?) designs.

* FPGAs are damn cool ! Actel's chips are certainly slower and less capable than those of other makers, but their products make this little board possible and easy : once the power supplies and the JTAG are (hum, correctly) wired, the board can be plugged into other cheap prototypes. No need for an external Flash chip, bootstrapping EPLD, or whatever...

Next in line : a parallel port interface (hooked to a computer) then the SRAM chips :-D Then I'll try to develop an embedded CPU design with Ethernet that could replace the Rabbit, PIC and AVR. Finally, I wish to create an Ethernet-based JTAG programmer that will replace the proprietary and USB-bound FlashPro3. This proprietary probe is not extremely expensive (Actel has wisely created even cheaper versions) but USB is such an annoyance !

Sunday 23 August 2009

Back from vacations...

The lack of Internet access during 2 weeks of vacation was a very good thing for the YASEP : development was stimulating and efficient !

I should mention that the environment helped me stay in a great mood, if you don't count all the insects. Have a look at this picture or this video if you wonder what it's like to develop in VHDL in the countryside, under a wonderful tree, sitting next to the tent. BTW, thanks Toshiba for the extended-life battery pack for the Portégé 3490 : I could work about 5 hours in a row, but it recharges very slowly.

I did a lot of cleanup, completed some pages, integrated the first extended instructions and re-enabled the disassembler. I also examined the multiply instructions and created an algorithm that initialises the multiply lookup tables ! I also added an algorithm that generates random opcode examples, instead of the previous fixed strings. It's more efficient at finding bugs !

Before I upload the new site, I still have to change some fields and remove the _X forms (as they are useless now, because the "always" condition has the same effect).

I'm also working in parallel on the VHDL source code. I'm adding a CRC32 unit mapped in the SRs so communications and files get better and faster checks. Unfortunately, I lost a few days of work on a dying hard disk...

Stay tuned !

edit :

The site is updated, enjoy !

I also recovered the few days of work locked in one of the computers, the disk is not completely dead (it's just dead slow so a Slackware LiveCD is necessary)

The next steps are : website minification, VHDL code development, further development of Listed, pointer update, short jump/call instructions...

I'm also looking at compression/decompression algorithms such as deflate and range coding.

Friday 24 July 2009

YASEP in French

Thanks to Laura's help, part of the YASEP website is being translated into French. So far, the following pages have been integrated : the index, the registers, the instructions, the interactive opcode map and YASEP16/32. Other pages should follow; I am waiting for Laura to submit more.

At the start of the project, I had decided to do everything in English only. Experience has shown me that supporting several formats or languages increases the workload, and therefore reduces the time spent creating useful things. Moreover, there is always one version lagging behind, which makes the project look inconsistent from the outside. People then tend to refer only to the "main" (English) version, and the translated version sinks into uselessness.

This time, it is quite clear that the "official" version of the project is in English. The French translation will probably lag behind on an unknown number of points as time goes by. But the translation process brings several benefits :

* First, I tend to write English in an abstruse way, and in the end I am the only one who understands what I wrote. Translation confronts me with my bad habits and forces me to rephrase my sentences to make them clearer. This is in line with my commitment to accessibility, especially since the translator, Laura, is less proficient than me in English and in technical matters, and I would like to be understood by even greater beginners.

* Next, Laura is closer and more demanding than previous collaborators. I expect better quality and better follow-up.

* Also, having two versions of the same web page forces me to separate presentation, content and scripts : modularity and non-redundancy become important necessities.

Besides, it lets me review, and therefore improve, the original pages and sort through them...

As usual, I'm interested in any constructive remark to improve the site.

Saturday 4 July 2009

Probable new features

When a project has practical uses and implications, it is interesting to see how it evolves and fills the gaps that a purely theoretical design would leave. For YASEP, the modifications have been very deep, while many of the neat original ideas remain. Lately, a few new ideas have come up that may or may not be implemented.

  • A new CRIT instruction :

This is a method to perform atomic instruction sequences. It opens a hardware-guaranteed CRITical section that lasts a small, constant number of instructions (1 to 16 depending on the imm4 argument). IRQs and other events are checked before and after the section, to prevent the system from hanging because of back-to-back CRIT instructions...
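A behavioural sketch of one possible reading of CRIT, in Python. The rules below are assumptions, not the post's definition : `CRIT n` protects the next n+1 instructions (imm4 = 0..15 giving 1 to 16), a CRIT executed inside an open section does not extend it, and a pending IRQ is serviced as soon as a section closes, before the next instruction, even if that instruction is another CRIT.

```python
def run(program, irq_cycles):
    """Return the execution trace, with "IRQ" where an interrupt is serviced.
    program: list of mnemonics, "CRIT n" opening a critical section.
    irq_cycles: cycles at which an interrupt request becomes pending."""
    trace = []
    pending = False
    protected = 0  # instructions left in the current critical section
    for cycle, insn in enumerate(program):
        if cycle in irq_cycles:
            pending = True
        if pending and protected == 0:  # IRQs are only taken outside sections
            trace.append("IRQ")
            pending = False
        trace.append(insn)
        if protected:                   # inside a section: count it down
            protected -= 1
        elif insn.startswith("CRIT"):   # outside: open a new section
            protected = int(insn.split()[1]) + 1
    return trace


# Usage: back-to-back CRITs can't starve the interrupt.
print(run(["CRIT 0", "LOAD", "CRIT 0", "STORE"], irq_cycles={1}))
# ['CRIT 0', 'LOAD', 'IRQ', 'CRIT 0', 'STORE']
```

Because the section length is fixed and short, the worst-case interrupt latency stays bounded (at most 17 instructions in this model).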

  • External bus expansion with off-chip buffers

In the case where the number of FPGA pins is low, a lot of them are used by external SRAM. The address and data buses could be used to expand the I/O count by adding a few 74LVC574 and 74LVC245 chips. In this case, a few specific instructions are required because the GET and PUT instructions work only with internal resources. Another issue is the bus loading, which might affect timing and/or speed. Inputs and outputs could easily be separated : the output latches can be tied to the address bus (because it is unidirectional) while the input buffers can only be tied to the data bus. Voltage translation is also a desired feature.

  • CRC32 accelerator

As the need for a zlib port arises, checking CRC32 signatures becomes a problem. I have already designed CRC routines and... well... they can become quite heavy. OTOH, it is rather straightforward to do in hardware. I don't want to add yet another instruction here because this would make the pipeline more complex (and the number of registers is already too small), but a small set of SRs will do the trick.
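CRC32 is cheap in hardware because each input bit only costs a shift and a conditional XOR. The Python sketch below is the standard reflected CRC-32 used by zlib and Ethernet (polynomial EDB88320h, initial value and final XOR FFFFFFFFh), written bit-serially to mirror what such a unit would do for each byte. The SR interface is not defined in this post, so `crc32_update` only stands in for a hypothetical "write one byte to the CRC data SR".

```python
def crc32_update(crc: int, byte: int) -> int:
    """Feed one byte into the CRC register: 8 shift/conditional-XOR steps,
    LSB first, with the reflected CRC-32 polynomial."""
    crc ^= byte
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ 0xEDB88320
        else:
            crc >>= 1
    return crc


def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF            # standard initial value
    for b in data:
        crc = crc32_update(crc, b)
    return crc ^ 0xFFFFFFFF     # standard final inversion


print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
```

In logic, the 8-iteration loop unrolls into a purely combinational XOR network, so one byte per clock is easy to reach.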

  • DMA for SPI

SPI is used when booting the CPU from an SPI Flash memory, or when communicating with Ethernet or 2.4GHz interfaces. Adding a simple DMA capability would save a lot of cycles and latency.

Other things will certainly come later...

Monday 29 June 2009

YASEP@HSF2009

On June 26th, I presented a joint project with Laura, called "GPL" (Gaming Platform Libre), at the HackerSpace Festival (HSF2009) near Paris. See http://www.hackerspace.net/gaming-platform-libre

The talk is in French, and the slides are here.

I present the latest thoughts about how cryptographic protection of contents could be compatible with the gamer's and the game editor's freedom and cooperation. Some slides also present the latest updates in the YASEP instruction set.

Friday 24 April 2009

First Layout of a custom FPGA+SRAM board

I have not been fully satisfied by all the boards that I have seen. There are always details that don't match a project or requirements that are not met (size, price, features, whatever). So I finally decided to start my own board(s).

First route of a TSOP-2 SRAM to an A3P125 FPGA in VQ100

It seems that YASEP could easily replace microcontrollers that I already use. The flexibility offered by FPGAs and the ability to strip a thing down to the minimum, then expand on that depending on the needs, makes this solution more and more attractive. No difficult selection of features and package (as with fixed-function chips), put the FPGA on the board and route the pins...

I can't solder BGA packages, or even build suitable PCBs for them myself, but I'm already able to make double-sided PCBs that can be fitted with an FPGA in a 100-, 144- or 208-pin QFP package. I'll be able to reuse these designs in the future, or make my own cheap modules.

Saturday 4 April 2009

First details of the new "extended" long instruction

A previous post summarised the available "instruction forms", with or without an immediate field (4 or 16 bits), with 2, 3 or 4 register addresses. Here we look at the "long form" (32 bits) using the "extended" fields that add 2 register addresses, conditional (speculative) execution and pointer updates.

Let's now examine the structure of the 16 bits that are added to the basic instruction word :

  • One bit indicates if the source is Imm4 (it replaces the corresponding field in the basic instruction).
  • 2 bits indicate a condition (LSB, MSB, Zero, Always) and another bit negates the result (The condition "never" will be used later but I'm not sure how).
  • 4 bits indicate which register is being tested
  • 4 bits indicate the destination register (replacing the src/dest field in the basic instruction)
  • 2 fields of 2 bits each encode the auto-update functions of one source register and the destination register (nop, post-inc, post-dec, pre-dec)

These fields are mostly orthogonal and can work in almost any combination. One can auto-update 2 registers (whether they are normal or belong to a memory access register pair), perform a 3-address operation and enable write-back depending on 97 conditions. It also preserves the availability of short immediate values, which further reduces code size. However it can increase the core's complexity.
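To make the bit budget concrete, here is a Python sketch that packs and unpacks the 16 extension bits and counts the distinct write-back conditions. The bit order (imm4 flag in the MSB, then condition, negate, tested register, destination, and the two update fields) is an assumption, since the post only lists the fields. The count agrees with the figure above : 3 register-based conditions × 16 registers × 2 polarities give 96, plus "always" makes 97 ("never", the negated "always", is left out).

```python
# Hypothetical bit layout, MSB first: (field name, width in bits).
FIELDS = [("imm4", 1), ("cond", 2), ("neg", 1), ("treg", 4),
          ("dest", 4), ("src_upd", 2), ("dst_upd", 2)]
CONDS = ("LSB", "MSB", "ZERO", "ALWAYS")
UPDATES = ("nop", "post-inc", "post-dec", "pre-dec")  # 2-bit auto-update codes

assert sum(width for _, width in FIELDS) == 16


def pack(**values: int) -> int:
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word: int) -> dict:
    fields = {}
    for name, width in reversed(FIELDS):  # peel the fields off from the LSB
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def condition(fields: dict) -> str:
    """Canonical text of the write-back condition encoded in `fields`."""
    cond = CONDS[fields["cond"]]
    if cond == "ALWAYS":  # the tested register is irrelevant here
        return "never" if fields["neg"] else "always"
    return ("NOT " if fields["neg"] else "") + f"R{fields['treg']}.{cond}"


usable = {condition(unpack(w)) for w in range(1 << 16)} - {"never"}
print(len(usable))  # 97

f = unpack(pack(cond=2, neg=1, treg=5, dest=3, src_upd=1))
print(condition(f), UPDATES[f["src_upd"]])  # NOT R5.ZERO post-inc
```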

One unexpected bonus is that this new architecture iteration is more compiler-friendly. At least, it's much less awkward or embarrassing.

One bit could have been saved : the imm4 flag could be merged in the auto-update field for a source register. However this increases the logic overhead and prevents simultaneous use of auto-update AND imm4.

Stay tuned...

Yet another Instruction Set Architecture change

I wish it could stabilize soon, but at least movement is a sign of activity (or the reverse :-))

I was annoyed by the ASU operations :

  ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX

These instructions were the last ones that used the skip technique, which is progressively being dropped in favor of relative branches by conditional add/sub to the PC register.

How is it possible to provide the same functionality without skip ? It's the same old question that decades of research have not yet answered definitively. The Carry Flag is the obvious solution, but I have just dropped the "status/mode register" in favor of another general purpose register. So where can I find a stupid bit of room ?

The answer is there under my eyes : the LSB of the PC ...

OK OK I know it's ugly. But consider these aspects :

  • The PC points to the next instruction and never uses the LSB because all the YASEP instructions are aligned on 2-byte boundaries.
  • Any write to the PC register modifies the bits 1 to 31. Bit 0 comes from the ASU's carry output.
  • We can declare that only the ASU operations (or context changes) can change the PC's LSB. All the other instructions can read it and test it, so the information is easily available.
  • Since we dropped the 4 instructions that used skip, these "slots" can be filled by other instructions :
 CMPS, CMPU, SMIN, SMAX

CMPx are just like SUB but don't write the result back. I wish they could set the LSB of any register, but the current architecture doesn't allow this, so please set the destination field to PC when encoding the assembly instruction.

3 new instructions deal with signed comparison : CMPS, SMIN & SMAX. They were missing from the previous opcode maps but the elimination of the skip-instructions leaves enough room. I have to update the VHDL now...

  • Keeping the carry bit in the LSB of the PC can have a curious side effect : relative jumps with odd values will make the carry bit ripple to the other bits of the result, so the destination address that is written in the PC will depend on the value of the carry bit. In practice, there is no speed or size advantage (compared to condition codes in the new opcode extension) but the possibility is there...
  • Clearing the carry flag is done with
  CMP Rx, Rx
  • Setting the carry flag is done with
  CMP -1, Rx

(or something like that)
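The "odd relative jump" side effect is easier to see in a small Python model. It relies on my own reading of the rules above : an ADD to the PC uses the whole register (carry in bit 0 included) as an operand, bits 31..1 of the sum become the new PC, and bit 0 receives the carry out of the addition. `add_to_pc` is a hypothetical helper, not part of any YASEP tool.

```python
MASK32 = 0xFFFFFFFF


def add_to_pc(pc: int, offset: int) -> int:
    """Model 'ADD offset, PC' when the carry flag lives in PC bit 0."""
    total = pc + (offset & MASK32)            # the PC, carry included, is an operand
    carry_out = total >> 32                   # the ASU carry output...
    return (total & MASK32 & ~1) | carry_out  # ...lands in bit 0


for carry in (0, 1):
    pc = 0x100 | carry
    print(carry, hex(add_to_pc(pc, 4) & ~1), hex(add_to_pc(pc, 3) & ~1))
# 0 0x104 0x102
# 1 0x104 0x104
```

With an even offset the destination is the same whatever the carry; with an odd offset it moves by one instruction when the carry is set. In this model a backward jump (negative offset) also usually leaves the carry set, because the two's-complement addition carries out.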

Usually, I would end the post with something along the lines of "this is good and everybody is happy". Now, I feel a bit disappointed that YASEP looks more like other architectures and has fewer distinguishing features. It is less groundbreaking and it will have to face the same problems as the others, on top of its inherent quirks. But it's still better than nothing and I do my best to keep the system rather coherent and orthogonal.

Thursday 19 March 2009

What about YASEP2009 ?

Development of and around YASEP is going on in a weird way, but it still continues...

Why so much caution ? Because the changes to the architecture are quite deep. The instruction forms are increasingly complex and I've pushed the design beyond what I intended in the beginning.

As you may remember, YASEP previously had only two ways to address data :

short form :

  Reg1 OP Reg2 => Reg1 (16 bits)

long form :

  Reg1 OP Imm16 => Reg2 (32 bits)

Now a few bits are freed and this gives much more "flexibility", so I added :

Short Immediate :

  Reg1 OP Imm4 => Reg1 (16 bits)

Long Register :

  Reg1 OP Reg2 => Reg3 (32 bits)

And because there was still some room, this last form has more elaborate versions :

Long conditional :

  Reg1 OP Reg2 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

And other versions come up when the Reg2 field is interpreted as Imm4 :

Long conditional short Imm: (excuse the name)

  Reg1 OP Imm4 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

Or without condition :

  Reg1 OP Imm4 => Reg3 (32 bits)

This applies to the computation instructions; the control instructions are still too loosely defined.
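The gain in code density comes from letting the assembler pick the shortest form for each operation. A possible selection rule, as a Python sketch : the form names follow the list above, while the operand ranges (Imm4 as an unsigned 0..15 value, Imm16 as any value that fits in 16 bits, signed or not) and the `pick_form` name are assumptions.

```python
IMM4 = range(0, 16)           # assumption: unsigned 4-bit immediate
IMM16 = range(-32768, 65536)  # assumption: signed or unsigned 16-bit immediate


def pick_form(dest: int, src1: int, operand, conditional: bool = False):
    """Return (form name, size in bits) for 'src1 OP operand => dest'.
    operand is ("reg", n) or ("imm", value)."""
    kind, value = operand
    if kind == "imm" and value not in IMM16:
        raise ValueError("immediate does not fit in 16 bits")
    small = kind == "imm" and value in IMM4
    if conditional:  # conditions only exist in the register/Imm4 long forms
        if kind == "reg":
            return "long conditional", 32
        if small:
            return "long conditional short Imm", 32
        raise ValueError("no conditional form with Imm16")
    if dest == src1:  # 2-address forms fit in 16 bits
        if kind == "reg":
            return "short", 16
        if small:
            return "short immediate", 16
    if kind == "reg":
        return "long register", 32
    if small:
        return "long short Imm", 32
    return "long", 32  # Reg1 OP Imm16 => Reg2


# Usage: R1 += R2, R1 += 3, R3 = R1 + 3, R1 += 1000
for args in [(1, 1, ("reg", 2)), (1, 1, ("imm", 3)),
             (3, 1, ("imm", 3)), (1, 1, ("imm", 1000))]:
    print(pick_form(*args))
```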

Code density should increase, which is worth the effort. I don't know if it will reach the level of ARM or x86, but it is certainly a major advance. However, this breaks a lot of the assembler's mechanisms, so I prefer to rewrite it. This takes a while because the rest must be adapted too : the Instruction Set, the manual pages, the validators...

If you can't stand the wait, have a look at a recent, broken version at http://yasep.org/~whygee/yasep2009/ : at least it is more recent than the main site.

Wednesday 18 February 2009

Listed : the dynamic LISTing EDitor

So I've been busy again...

This time, it's all about JavaScript. The preliminary version is available from http://yasep.org/~whygee/listed/listed.html

What is it really ? It's an interactive assembler in dynamic HTML, loaded with JavaScript and CSS stuff. It's also an interface to the JavaScript assembler and the simulator.

  • The little windowing system allows one to break a whole program into small chunks that are easier to manage. Assembly language listings can easily get messy, but local symbols and hideable sections reduce the usual clutter on one's window/screen.
  • As the user edits each line, the modifications are committed to the rest of the page : the instructions are re-assembled, the labels are updated where they are used, the simulator can reinterpret the sequence and give preliminary results for given testcases...
  • The assembler is not limited to YASEP : the CPU interface is going to be generic, and LISTED could support any CPU that can be described in JavaScript (that means : all, provided enough adaptations are coded). A dummy, overly simple and dumb CPU architecture will be given as an example, so somebody can easily adapt it for x86, PIC, Alpha, MIPS, POWER, or RCA1802 ...
  • This is going to be linked directly with ARF, which is another graphic coding interface.
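The "labels are updated where they are used" behaviour boils down to re-running a classic two-pass assembly whenever a line changes. Here is a minimal Python sketch of that idea (LISTED itself is JavaScript), for a hypothetical dummy CPU whose instructions are all 2 bytes long; `assemble` and the listing syntax are made up for the example.

```python
def assemble(lines, insn_size=2):
    """Two-pass assembly of a toy listing.
    Lines are either 'label:' or 'MNEMONIC [operand]'.
    Returns a list of (address, resolved instruction text)."""
    labels, placed, addr = {}, [], 0
    for line in lines:                    # pass 1: assign addresses
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = addr
        elif line:
            placed.append((addr, line))
            addr += insn_size
    out = []
    for addr, line in placed:             # pass 2: substitute label values
        mnemonic, _, operand = line.partition(" ")
        operand = operand.strip()
        if operand in labels:
            operand = hex(labels[operand])
        out.append((addr, f"{mnemonic} {operand}".strip()))
    return out


listing = ["start:", "NOP", "JMP loop", "loop:", "NOP"]
print(assemble(listing))  # JMP points to 0x4
listing.insert(1, "NOP")  # the user adds a line...
print(assemble(listing))  # ...and JMP now points to 0x6
```

For listings of the size shown in small windows, re-assembling everything on each keystroke is cheap enough that no finer dependency tracking is needed.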

I have been working on this for more than 3 weeks and a lot of work still remains. I focus on user comfort and UI design, but I keep flexibility and expandability in mind. For example, I have developed YGWM to handle the windowing part, which will be reused by the whole yasep.org website. The assembler and simulator will remain completely decoupled.

In the end, it only confirms what I have believed for some time : JavaScript is a fantastic opportunity for really new ideas, and it provides portability and rapid design. However, after trying to make it compatible with different browsers, my strong recommendation is : use Firefox and stick to it.

Friday 23 January 2009

YASEP2009 : "It's gonna be big"... when it comes

The YASEP architecture has changed so much that a big rewrite is necessary.

My local copy is so... broken here and there that I prefer not to update yasep.org. The modifications are so deep that it's not possible to just patch a few things.

The organisation of the website should evolve a lot and I'm thinking about new techniques.

The documentation must be partially rewritten, not simply updated here and there.

Today's site structure dates back to 2006, so maybe the big rewrite is a good thing after all.

However, this is so much work, and my concentration is so volatile, that I wonder when the website will be updated with something stable enough to be almost publishable. In fact, I'd rather not wonder; the answer would scare me. Anyway, I see that much of the work I have done in the past years has been fruitful and has helped build the project as it is now. So I keep faith and continue.

Monday 19 January 2009

Yet another new Actel toy \o/

As you may know, YASEP16 will probably be used in my girlfriend's "pet project", Ours Agile. This involves lots of real-time computations, countless sensors and more than 30 actuators... Sure, YASEP could probably handle that. But the interfacing was giving me headaches : so many analog components (on top of high-speed memory) seemed expensive and/or difficult to integrate.

Then I spotted a second-hand AFS600 evaluation kit from Actel, which I got for a fair price. It was a bit risky, and at first I thought it was broken. But since it was second-hand, somebody had probably played with it and uploaded a new configuration bitstream. With the help of a French representative, I found and uploaded the original demo bitstream and ... magic happens !

Actel AFS600 eval board plugged

This FPGA family comes at a "premium price", but it's a damn great opportunity for robotics projects :

  • 512KB of program space as Flash EEPROM (no need to download from external SPI !)
  • on-chip 100MHz RC clock generator (exactly what I'm aiming at !)
  • RTC, temperature sensors, low power...
  • high-speed 30-channel ADC !
  • several integrated MOSFET gate drivers
  • 13K tiles vs 6K on the A3P250
  • 24 SRAM blocks vs 8 on the A3P250

This is definitely a great toy for robots...

Tuesday 6 January 2009

Evolution of the instruction set

As the execution units mature and get integrated as one block, things become clear, at least concerning the computation instructions. I'm currently focusing on the 16-bit flavour of YASEP and I expect that the following will hold true for YASEP32.

The ALU16 is nearing completion, though feature creep is still rampant. But I have identified a bunch of instructions that will not change much in the future, and they are gathered here :

- ROP2 : AND, OR, XOR, ANDN, ORN, XNOR, NAND, NOR
- ASU : ADD, SUB, ADDS1, SUBB1, ADDS2, SUBS2, MIN, MAX
- SHL : SHR, SHL, ROR, ROL, SAR + MUL : MUL8L, MUL8H, MULINIT
- IE : MOV, SB, LSB, LZB (16 and 32 bits), SH, SHH, LSH, LZH (32 bits only)

This nice, square table covers the large majority of the instructions actually used, and it fits into 4 groups of 8 instead of the planned 8 groups. So...

This frees a bit, which is now used to encode more addressing modes. In 2008, there were 2 modes : short mode (RR) and long mode (RRImm16). Now, it is also possible to encode a short immediate in the short mode (RImm4, where one register operand is replaced by a value), or to use another register as the destination in the long mode (but 12 bits are then unused).
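The bit accounting can be checked with a few lines of Python: the four groups listed above hold 32 operations, which need 5 opcode bits, whereas 8 groups of 8 would have needed 6. The numbering scheme (group index * 8 + position in the list) is a hypothetical illustration; the post does not give the actual binary encoding.

```python
import math

# The four groups from the table above, in the order listed.
GROUPS = {
    "ROP2": ["AND", "OR", "XOR", "ANDN", "ORN", "XNOR", "NAND", "NOR"],
    "ASU": ["ADD", "SUB", "ADDS1", "SUBB1", "ADDS2", "SUBS2", "MIN", "MAX"],
    "SHL+MUL": ["SHR", "SHL", "ROR", "ROL", "SAR", "MUL8L", "MUL8H", "MULINIT"],
    "IE": ["MOV", "SB", "LSB", "LZB", "SH", "SHH", "LSH", "LZH"],
}

opcodes = [name for names in GROUPS.values() for name in names]
group_bits = math.ceil(math.log2(len(GROUPS)))                   # 4 groups -> 2 bits
op_bits = math.ceil(math.log2(max(map(len, GROUPS.values()))))   # 8 ops -> 3 bits
planned_bits = math.ceil(math.log2(8 * 8))                       # 8 groups of 8 -> 6 bits


def opcode_number(name):
    """Hypothetical numbering: group index * 8 + position within the group."""
    for g, names in enumerate(GROUPS.values()):
        if name in names:
            return g * 8 + names.index(name)
    raise KeyError(name)


# Number of opcodes, bits needed now, bits saved versus the planned layout.
print(len(opcodes), group_bits + op_bits, planned_bits - (group_bits + op_bits))
```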

Yes, there are now 4 addressing modes, and most code should see its binary size shrink ! Furthermore, the datapath complexity is not affected, and the 3-register form should reduce the number of cycles for a given portion of code.
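A small model makes the size argument concrete. It picks the smallest of the four modes for each operation and totals the bytes under the 2008 rules (RR and RRImm16 only) and the new rules. The sizes (2 bytes short, 4 bytes long) and the -8..+7 short-immediate range come from this post; "RRR" is only a placeholder label for the long form with a register destination, and the sample program is made up.

```python
# Instruction sizes in bytes: short forms take 2 bytes, long forms 4.
SIZES = {"RR": 2, "RImm4": 2, "RRImm16": 4, "RRR": 4}  # "RRR": placeholder name


def mode_for(operands, new_rules):
    """Return the smallest addressing mode able to encode `operands`.

    operands: register names (str) and/or one integer constant.
    new_rules: False = 2008 (RR, RRImm16), True = 2009 (all four modes).
    """
    consts = [op for op in operands if isinstance(op, int)]
    if consts:
        if new_rules and -8 <= consts[0] <= 7:
            return "RImm4"           # small constant replaces a register
        return "RRImm16"
    if len(operands) == 3:
        if not new_rules:
            raise ValueError("no 3-register form in 2008: use mov + op")
        return "RRR"                 # long form with a register destination
    return "RR"


# Made-up sample: operands of a few ADD instructions.
program = [(1, "r1"), (-3, "r2"), (1000, "r1"), ("r2", "r1")]
for new_rules in (False, True):
    total = sum(SIZES[mode_for(ops, new_rules)] for ops in program)
    print("2009" if new_rules else "2008", total, "bytes")
```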

How this affects usual code :

- add 1, r1 ==> r1 += 1

now takes 2 bytes instead of 4. The constant can range from -8 to +7.
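The -8..+7 range is exactly what a 4-bit two's-complement field can hold. Assuming the short immediate is stored that way and sign-extended when decoded (the post gives the range, not the bit layout), encoding and decoding can be sketched as:

```python
def encode_imm4(value):
    """Pack a constant into a 4-bit two's-complement field (assumed format)."""
    if not -8 <= value <= 7:
        raise ValueError(f"{value} does not fit in a short immediate")
    return value & 0xF


def decode_imm4(field):
    """Sign-extend a 4-bit field back to a value in -8..+7."""
    field &= 0xF
    return field - 16 if field & 0x8 else field


# Every value in range survives the round trip.
assert all(decode_imm4(encode_imm4(v)) == v for v in range(-8, 8))
print(encode_imm4(-1), decode_imm4(0b1000))
```

A constant such as 8 or -9 does not fit and has to use the 4-byte long form.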

- add r1, r2, r3 ==> r1 = r2 + r3

It takes 4 bytes, as before, but it saves 1 clock cycle compared to

- mov r2, r1
- add r3, r1
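A toy interpreter (a hedged sketch, not the YASEP simulator) can run both sequences and check that they leave the same value in r1. It assumes one instruction per clock cycle and follows the operand order used above: source first for the two-operand forms, and `add r1, r2, r3` meaning r1 = r2 + r3. The register values are arbitrary.

```python
MASK16 = 0xFFFF


def run(program, regs):
    """Execute (op, operands...) tuples on a 16-bit register file.

    Assumes one cycle per instruction; returns (registers, cycles).
    """
    regs = dict(regs)                      # work on a copy
    cycles = 0
    for op, *args in program:
        if op == "mov":                    # mov src, dst
            src, dst = args
            regs[dst] = regs[src]
        elif op == "add" and len(args) == 2:   # add src, dst : dst += src
            src, dst = args
            regs[dst] = (regs[dst] + regs[src]) & MASK16
        elif op == "add" and len(args) == 3:   # add dst, a, b : dst = a + b
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & MASK16
        else:
            raise ValueError(f"unsupported: {op} {args}")
        cycles += 1
    return regs, cycles


start = {"r1": 0, "r2": 40, "r3": 2}
two_ops = run([("mov", "r2", "r1"), ("add", "r3", "r1")], start)
three_reg = run([("add", "r1", "r2", "r3")], start)
print(two_ops, three_reg)
```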

Note that the yasep.org site is not yet updated, I'll wait until things settle down.

Thursday 1 January 2009

Barrel Shifter : SHL16 ready

Hello and Happy New Year Everybody !

I took some time to work on the next major building block of the YASEP16 execution unit : the shift/rotate unit is now ready in 16-bit flavour.
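The post doesn't describe how SHL_16 is built inside. A common structure for such a unit is the logarithmic barrel shifter: for 16 bits, four stages shift or rotate by 1, 2, 4 and 8 positions, each enabled by one bit of the shift count, and each stage corresponds to a row of 2-input multiplexers. The behavioral Python model below assumes that structure and covers the five operations listed in this post (SHR, SHL, ROR, ROL, SAR).

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def stage(value, amount, op):
    """One mux stage: shift or rotate `value` by a fixed power-of-two amount."""
    if op == "SHL":
        return (value << amount) & MASK
    if op == "SHR":
        return value >> amount
    if op == "SAR":
        out = value >> amount
        if value & 0x8000:                           # negative: copy the sign bit
            out |= (MASK << (WIDTH - amount)) & MASK  # into the vacated top bits
        return out
    if op == "ROL":
        return ((value << amount) | (value >> (WIDTH - amount))) & MASK
    if op == "ROR":
        return ((value >> amount) | (value << (WIDTH - amount))) & MASK
    raise ValueError(op)


def barrel_shift(value, count, op):
    """Logarithmic barrel shifter: stages of 1, 2, 4 and 8 positions.

    Bit k of `count` enables stage k; only the low 4 bits of `count` are used.
    """
    value &= MASK
    for k in range(4):
        if (count >> k) & 1:
            value = stage(value, 1 << k, op)
    return value


print(hex(barrel_shift(0x1234, 4, "ROR")), hex(barrel_shift(0x8000, 15, "SAR")))
```

With this structure the delay grows with log2 of the width (4 stages for 16 bits), which is one reason a shifter can stay small and fast compared to an adder.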

I'm now concentrating on YASEP16 because it is smaller, marginally faster, and consumes less bandwidth. It can easily fit in the A3P250 and its 6K 3-input tiles, though I don't know yet how many tiles it will need in the end.

SHL_16 uses about 220 tiles, and Actel's place-and-route tool estimates that the pipelined version can run at 140MHz. This is slightly faster and smaller than ASU_ROP2, which performs Add/Sub and boolean operations (115 MHz and about 350 tiles). The overall ALU (ASU_ROP2 + SHL + IE) is going to take roughly 700 tiles, or about 1/8th of the A3P250's surface. The speed looks satisfactory, as I intend to clock the thing at 96MHz on the ACME boards (64MHz * 1.5 with the PLL).

Overall, the following operations are ready for the 16-bit flavor :

  • ASU : ADD, SUB and compares as side effects.
  • ROP2 : AND/OR/XOR/NAND/NOR/XNOR/ANDN/ORN, as well as comparison for equality (XOR followed by an OR reduction tree)
  • SHL : SHR/SHL/ROR/ROL/SAR
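The equality test described in the ROP2 bullet can be modeled bit by bit: XOR exposes the differing bits and a reduction tree ORs them together. The `fan_in` parameter is a hypothetical knob: with 2-input ORs the 16-bit tree is 4 levels deep, and with 3-input gates (the A3P tiles mentioned above have 3 inputs) it drops to 3. How the synthesizer actually maps the tree onto tiles is not something this sketch claims to know.

```python
def or_reduce_tree(bits, fan_in=2):
    """OR-reduce bits level by level, as a tree of fan_in-input OR gates.

    Returns (result, depth) where depth is the number of gate levels.
    """
    depth = 0
    while len(bits) > 1:
        bits = [int(any(bits[i:i + fan_in])) for i in range(0, len(bits), fan_in)]
        depth += 1
    return bits[0], depth


def equal16(a, b, fan_in=2):
    """16-bit equality: XOR the operands, then OR-reduce the difference.

    XOR yields a 1 wherever the operands differ, so the OR tree outputs 0
    only when a == b.
    """
    diff = (a ^ b) & 0xFFFF
    bits = [(diff >> i) & 1 for i in range(16)]
    any_diff, depth = or_reduce_tree(bits, fan_in)
    return any_diff == 0, depth


print(equal16(0x1234, 0x1234), equal16(0x1234, 0x1235))
print(equal16(0xBEEF, 0xBEEF, fan_in=3))
```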

The next part to be developed is the IE (Insert/Extract) unit, which handles loading and storing bytes within a half-word. Stay tuned...

Note : some P&R runs give slightly higher working frequencies, but I reserve a 15 to 20% margin, since I expect that all the units put together will need even more MUX2s all over the place, longer wires etc., resulting in slower operation. Furthermore, this is only YASEP16 so far, and the 32-bit flavor will double the design's size...
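Since the IE unit isn't written yet, here is only a behavioral sketch of what inserting and extracting a byte within a 16-bit half-word involves. Mapping it to the IE mnemonics (SB assumed to mean store byte, LSB/LZB assumed to mean load byte with sign/zero extension) is an interpretation of the names, and the byte numbering (0 = low byte) is also an assumption.

```python
def extract_byte(halfword, byte_index, signed):
    """Extract byte 0 (low) or 1 (high) of a 16-bit half-word.

    signed=True sign-extends to 16 bits (assumed LSB behaviour),
    signed=False zero-extends (assumed LZB behaviour).
    """
    byte = (halfword >> (8 * byte_index)) & 0xFF
    if signed and byte & 0x80:
        byte |= 0xFF00                 # replicate the sign bit into bits 8..15
    return byte


def insert_byte(halfword, byte_index, value):
    """Replace one byte of `halfword` with the low 8 bits of `value` (assumed SB)."""
    shift = 8 * byte_index
    mask = 0xFF << shift
    return (halfword & ~mask & 0xFFFF) | ((value & 0xFF) << shift)


print(hex(extract_byte(0x12F0, 0, True)), hex(insert_byte(0x1234, 1, 0xAB)))
```

In hardware, both directions boil down to byte-wide multiplexers steered by the low address bit, plus a sign-extension path for the signed load.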

Friday 19 December 2008

How to double the SRAM capacity of an FPGA board ?

The FoxVHDL and the Colibri boards from ACME Systems come with 2 SRAM chips of 512K Bytes, so one application can benefit from one megabyte of 32-bit low-latency access. But even 1 megabyte may be too small for some uses. Some time ago, I found a way to extend the capacity : piggy-back soldering of another SRAM chip.

2 FoxVHDL FPGA boards from ACME Systems (modified by YG)

To keep the chips identical and avoid timing imbalance, I had to take the SRAMs from another board, but that is not a concern since this second board will be used for a purpose that does not need SRAMs.

I should stress, of course, that desoldering is difficult, and so is re-soldering, but it went well thanks to special, adapted tools.

Of course, there is a trick : stacking chips alone does not expand the memory. One has to reserve a new address bit, or both memories will be mapped to the same addresses. I have chosen not to connect the Chip Select pins of the additional chips, so that they can later be wired to another, unused FPGA output.
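To make the "new address bit" concrete, here is a hypothetical decode for the modified board: the original pair of chips forms bank 0 (1 MB on the 32-bit bus), the piggy-backed pair forms bank 1, and byte-address bit 20 selects which pair gets its chip select asserted. The bit assignment is an assumption, and so is the active-low /CS convention (typical of asynchronous SRAMs, but not stated in the post).

```python
BANK_BYTES = 1 << 20      # 1 MB per bank: two 512 KB chips side by side
WORD_BYTES = 4            # 32-bit data bus


def decode(address):
    """Split a byte address into (bank, word_address, byte_lane).

    Hypothetical wiring: bank 0 = original chips, bank 1 = piggy-backed
    chips, selected by the extra address bit (bit 20).
    """
    if not 0 <= address < 2 * BANK_BYTES:
        raise ValueError("address outside the 2 MB space")
    bank = address >> 20                           # the new address bit
    word = (address % BANK_BYTES) // WORD_BYTES     # 18-bit SRAM address
    lane = address % WORD_BYTES                     # byte within the 32-bit word
    return bank, word, lane


def chip_selects(bank):
    """Active-low /CS levels for (original pair, added pair); 0 = selected."""
    return (0, 1) if bank == 0 else (1, 0)


print(decode(0x0FFFFF), decode(0x100000), chip_selects(1))
```

Because the shared address and data lines now see both pairs, only the chip selects differ, which is exactly why the additional /CS pins must stay disconnected from the original ones.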

Two SRAM chips soldered on the footprint of one

If you want to attempt this hack on your board (whether ACME's or any other FPGA board with static RAM), don't forget that adding pins on a bus adds capacitance and slows the signals. The clock frequency won't be as high as before, so run extensive tests to determine the new working parameters.

To keep the frequency high, you can either use a larger SRAM chip (like Cypress or IDT 512Kx16 or 1Mx16 parts, but they are difficult to find and expensive) or faster SRAMs : ACME uses 12ns chips, but other compatible chips are available with 10ns and 8ns access times (try Farnell). You can also control the rise/fall times with the I/O current options of the ProASIC3 pads, which can be set to several values.

Next step : using even faster synchronous static RAMs, because they offer much higher bandwidth. However, I don't know yet how to meet their tight timings...

Thursday 18 December 2008

Site update, architecture modifications, and new FPGA boards

I recently got 3 Colibri boards ! When you think about Italy, you think of Ferrari and other excellent things; now I'll also think of prototyping boards ;-)

Thanks to ACME Systems, I got two A3P250 boards and one A3P1000 board at a friendly price. These are pre-series units and may differ slightly from later versions, but they really are as cool as the pictures suggest.

3 prototype Colibri boards from ACME Systems

The website is also updated : the JavaScript engine is now mostly functional for the YASEP16 and YASEP32 versions. The documentation has not been updated, and many dark corners remain in the architecture definition. I have chosen to publish the latest versions anyway, since I don't know when I'll do this next time.
