YASEP news

To content | To menu | To search

Monday 28 November 2011

A YASEP assembler in C by DeforaOS

Today, khorben from DeforaOS, sent me a surprise : this screenshot !

defora disassembler screenshot

He is implementing his assembler/disassembler in C for his operating system project. A graphic interface is also available, among the many features in development ! In parallel, I implement some features in the YGWM interface that synthesise and export the relevant informations needed by his assembler. In the end we'll both have the tools to create a full working and autonomous system :-)

Thanks again for the screenshot !

Tuesday 8 November 2011

Register Parking

As the YASEP architecture specifies, there are 5 normal registers (R1-R5) and 5 pairs of data/address registers  (A1-D1, A2-D2...) and it's quite difficult to find the right balance between both : each application and approach requires a different optimal number of registers.

When more registers are needed (if you need R6 or R7) then you could assign them to D1 and D2 for example. However you have to set A1 and A2 to a safe location otherwise chaos could propagate in the software. Another issue is that each write to the A registers will update the memory. A similar situation appears if we use the Ax registers as normal registers : each write will trigger a memory read. And in paged/protected memory systems, this would kill the TLB...

This is now "solved" with today's system, which defines hardwired "parking" addresses and internal behaviour (this is still preliminary but looking promising).

  • "Parking" addresses are defined as "negative" addresses (that is : all the MSB are set to 1). This addressing range, at the "top" of the memory space, is normally not used, or used for special purposes, such as "fast constants" addressed by the short immediate values :
    MOV -7, A3 ; mem[-7] contains a constant or a scratch value,
    MOV D3,... ; the address fits in 3 bits
  • To keep the "parking" system compatible with non-parked versions, the addresses are defined globally for all software. They are easy to remember, as the following code shows :
    ; Park all the registers
    MOV -1, A1
    MOV -2, A2
    MOV -3, A3
    MOV -4, A4
    MOV -5, A5
    These will become macros or pseudo-instructions.
  • The internal numbering of the registers is changed to ease hardware implementation. There is a direct match between the binary register number and the binary code of the address (bits 1 to 3) :

    park address  binary    reg.bin       reg.number   register
          -1             1111       1111              15              A1
          -2             1110       1101              13              A2
          -3             1101       1011              11              A3
          -4             1100       1001                9              A4
          -5             1011       0111                7              A5
  • Architecturally, it does not change much. The Data registers are "cached" by the register set. What the hardware parking system adds is just an inhibition of the "data write" signal that would occur normally each time the core writes to a D register.
  • Aliasing : No alias detection is expected. If A4/D4 writes to -2, D2 is not updated. Otherwise it would mean that the result bus could write to 5 registers in parallel, which is not reasonable.
  • Thread backup and restoration : the register set contains the cached version of the memory, it must be refreshed when a thread is restored (swapped in). If the Ax register matches a parked address, the memory doesn't need to be fetched to refresh the cache. Another solution is to save the Dx register through another Ax/Dx, so there is nothing to test during restoration (but memory read cycles could not be spared).
  • This sytem where the "parking" is defined by an auxiliary value (that is inherently preserved through context switches) is "cleaner" than a more radical approach where "status bits" (one per A/D pair) park the registers. The advantage of the radical approach is that two registers can be parked at once (instead of one) but it gets harder to use with a compiler or from user software (you can play with pointers in C or Pascal easily, though you won't be able to define which pair is used). On top of that, adding status/control bits is usually a nightmare
In the end, it's not very complex (not as much as it seems). The hardware price is a few logic gates that detect the parking addresses to inhibit memory writes. For the software writer, it just means more registers on demand and it will work whether the YASEP has the parking hardware or not. You CAN have R6, R7 or R8 but then you'll have to restrict data access and give up A1/D1, A2/D2 and A3/D3. You make the choice !

Sunday 25 September 2011

The YASEP and Defora

Today I think that one big issue with the YASEP project has been solved.

I met Pierre this week, and I start to discover the awesomeness of his Defora project. "Debian For All" turned into creating a whole new, compact, totally GPLv3 system. With almost no dependency from existing systems, yet compatible with them... Perfect for embedded computing too !

We just started to work on a C version of the existing JS assembler and we consider writing a C99-compliant compiler.

YES, you have read it : the YASEP will have binaries generated from C code ! And good code, at that, since it does not go through GCC !

Many roadblocks are now removed. When the code generation tools are in place, we can then simulate/emulate the core and start to write a microkernel...

What this means for me is that I can finally stop worrying about the operating system and application layer. The YASEP will not use Linux and I won't be forced to use the huge GCC armada. I will also have more time to focus on the hardware architecture and implementation. And Pierre is a security specialist...

Oh, by the way : YGWM won the 2nd rank (ex aequo with Pierre's Defora) at the Open World Forum Code Contest this week. A new, shiny, professional laptop was given away by HP and will become my main workstation. Going from an Atom to a Core i5 makes me feel spoiled :-)

Wednesday 7 September 2011

YASEP2011

Development is still happening, at a slow pace (due to work duties) but nothing is forsaken.

I'm still working toward a cheap Actel board that can be easily replicated and cheaply fabricated, and the professional projects might bring some interesting results.

On another front, I resumed work on YGWM and extended the functionalities. You can even test the results at http://ygdes.com/~whygee/yasep2011/ and the whole website will be reimplemented with this new paradigm. No more tabs ! Everything in one browser window with a huge virtual desk !


Sunday 8 May 2011

This little Least Significant Bit

(update : 2011/05/11)

I've been wondering since march of this year if the Least Significant Bit (LSB) of the Next Instruction Pointer (NIP or NPC) could be better used than now.

The YASEP instructions are 16-bits aligned and the instruction addresses have their LSB cleared by convention. This bit is usually wasted in word-aligned byte-oriented computer architectures.

In the current YASEP architecture, this LSB holds the carry flag of ADD/SUB operations. It is the only status flag that I couldn't get rid of with the usual architectural tricks. As a reminder, instructions can check 3 conditions : register is cleared, has its LSB cleared (odd/even) or MSB (sign) cleared. Every condition can be negated and a 4th condition serves as "always" or "reserved" case. Reading the LSB and MSB is easy, checking for a cleared register is more costly. In some implementation, the register set has "shadow" bits with precomputed/cached "register is clear" bits. But otherwise, no dirty trick is employed.

The Carry bit is less easy to handle : it's a dynamic result that can't be reconstructed from the 16 or 32 bits of the registers. It is not possible to restore it after a thread switch. It can't be added to the "condition cache" because it will have to be saved and restored (16 more bits to save ? Bleh...)

Here come the latest changes :

  • The carry bit is now "hidden", not available from the register set for computations (that would make other things more difficult). It exists as a bit that can only be tested via a specific condition code in the conditional instruction forms (certainly one that tests NIP).
  • The LSB of NIP is always cleared. However, when saving/restoring the state in memory, it will hold the carry bit. This is the only case when the two functions (carry and pointer) are mixed.
  • Writing a "1" to the LSB of NIP (other than for saving/restoring the state) triggers a trap. There are several uses :
  1. Breakpointing / tracing / debugging : inject a "1" in the LSB and you can see where the pointer is used.
  2. Safety : for example if the stack is corrupted, there is a chance that the LSB will be set and trigger the trap
In future iterations, this bit could be used for something else more pertinent (such as a second instruction memory bank selector) so it must be carefully handled by programers now.

Wednesday 8 December 2010

ACTUINO day 1

Yesterday, while talking with Jeff about our respective and converging goals, a new idea came.

Today, actuino.org is registered. The website will appear later, one day, but the name is found and secured while we work toward the new milestone of an electronic board that is DIY-friendly, very powerful, affordable and paving the way for developing the YASEP.

The one big issue for me though is that I'll have to cope with the Atmel architecture, which I don't "speak" so any help is appreciated :-)




Saturday 20 November 2010

Fast and secure InterProcess Communications

(post version : 20110108)

(update : 20110515 : environment inheritance)

Recently (2010/11/20) I found the critical elements that solve a crucial problem that the Hurd team submitted to me in ... 2002. It took time and many attempts but I think that the YASEP is a great place to experiment with this idea and prove its worth.

The Hurd uses a lot of processes to separate functions, enforce security and modularize the operating system. It uses "Inter Process Communication" (IPC) such as message passing and this is snail slow on x86 and most other architectures.

The YASEP uses hardware threads which is a concept close, but not identical, to the processes of an operating system. And these last days I have found what was missing : the "execution context" ! So with the YASEP, a process is a hardware thread (a set of registers and special registers) associated to an execution context (the memory mapping, the access rights etc.)

Repeat after me : a process is a thread in a context.

This distinction is necessary because threads are activated for handling interrupts, operating system functions, library function calls and communication between the programs. It's a major feature of the processor which should provide functionalities that go beyond a mere microcontroller...

So IPC is necessary to make a decent OS and it requires several hardware threads (threads can be interleaved at the hardware level to provide with concurrency and better performance) and several contexts (for the operating system, device drivers, libraries, interrupt handlers...). The processor state can jump at will from one to the other with much less latency than an usual CPU.

The antagonistic requirements are as follows :

  1. A process must be able to call code from another context FAST, as fast as possible.
  2. The mechanism must be totally SAFE and SECURE.
  3. The physical implementation must be SIMPLE.

Simple and fast go hand in hand (ask Seymour Cray. Oh, wait, too late...). In the YASEP, communication takes place with a restricted variant of the function call instruction. Function calls are difficult to "harden" and more generic and specific instructions are usually found in other architectures to provide IPC or system calls. These are quite simple to implement in a CISC architecture like x86 because microcodes can do whatever is required... But they are slow because several dependent memory fetches must be performed (read the access rights table then find the address of the code to execute, whatever...)

The YASEP is a RISC-inspired architecture and requires a new approach. What I have found requires just 3 new opcodes :
  1. IPC : InterProcess Call
  2. IPE : InterProcess Call Entry
  3. IPR : InterProcess Call Return
Since the YASEP has a bank of several threads in the register set, the context switch is a matter of a few cycles only. One way to further reduce the execution time is to pre-calculate the destination address of the called code : no call table or things that require several chained/dependent memory accesses. In order to obtain the jump address, a thread must register itself in the called process and obtain the context number and the effective address. The calling thread can then modify its own code (update the constants) or variables to make the proper IPC later. Here is how simple it gets :
     IPC R1, R2    ; call context number R2 at address R1
IPC 1234h, R2 ; call context number R2 at immediate address 1234
Security is a bigger beast and just changing the TID (Thread ID) value is not a good method. The first big problem is that any code can call any context at ANY address and a security mechanism is required to block unwanted calls from succeeding. The policies could be arbitrarily complex (depending on the OS strategies) and don't belong in hardware (unlike x86), a software-based authorisation system is preferred (like MIPS !). This is the role of the IPE instruction :
  1. IPE provides the Thread ID and Process ID of the calling thread (it's a kind of GET). From this, the callee can choose to accept or refuse the call, provide a specific service or even choose to not check at all. Any software can create its own policy, call by call !
  2. IPE is NECESSARY for the IPC instruction to complete. If IPC points to an instruction that is NOT IPE, an error is triggered. This prevents all applications from jumping anywhere in any code.
  3. Each thread can restrict the range of callable addresses so calls can't enter data sections. This is the role of additional registers.
When the thread calls code from another context at the right address, the register set is preserved (not touched) so the transmission of parameters takes no effort. However several new issues appear.

For example, how can one thread in a different context access data from the previous context ? The proposed solution is to provide an attribute to each Address register : the context number. Upon call, the newly spawned process will modify the necessary attributes to access to both the current and the calling process. Which means that all the previous contexts must be kept in the processor (since interthread calls must be reentrant). Before the call, the calling process should mark the memory ranges it accepts to share with the called process (marking the range as "shareable"). This way, no data copy is necessary !

The return address and thread/process/context IDs must be managed by the CPU core itself to prevent tampering by the caller or callee. This is the last point that needs some big work and HW real estate ... A classic stack, with a stack pointer, stack base and stack limit, are necessary hardware resources to add.

So let's sum up the added hardware :
  • Each context must be able to mark memory ranges as data-read and/or data-write by other threads. This can be indicated by flags for each page in the page table. How this can be restricted to certain threads (that are in the call stack) is still uncertain, a token scheme should be created where a permission can be passed to (and inherited from) another thread.
  • Each context has 2 registers that are compared to the called address to restrict unwanted calls.
  • Each process has a set of 3 registers for the IPC stack (pointer, base, limit). Pointer and limit are compared for equality upon call and pointer and base are compared during return.
  • There are also 5 new thread-private registers that determine the owner (thread number) of a pointer. They must be preserved in HW if the caller or callee are not trusting or trusted.
That makes about 10 new registers ! How this will be implemented is still uncertain. Maybe a hardcoded sequence of instructions will be streamed through the instruction decoder, unless everything is done in parallel in big enough chips. This reminds me that in the past, I wanted to add "attributes" to the address generators of the VSP, with base/index/limit/stride, now there is the context number that is some kind of "address space number" (ASN). We can finally merge these ideas and in 16 bits code, we can use ASNs like segments in x86 : one for executable data, one for the stack, several for data, and no opcode prefix is needed.

Whatever the implementation, we're going here from a system initially designed for libraries and system calls, extended to the next level : a micro-kernel oriented architecture where processes can share memory they own so others can work on it, with little overhead. Will the Hurd people be finally happy now ?

Saturday 8 May 2010

YASEP2010


The main YASEP site has not been updated for a while...
Worse : the f-cpu.seul.org miror is down since january !

Is the project dead ?

No :-)

In fact a lot of things are being prepared, mostly in the commercial, infrastructure and very-low-level hardware (like : where are those 0402 capacitors ?) fronts. It's really exciting but it takes a lot of time and money ! Fortunately I'm not completely alone.

A string of good news will probably come in 2011, they will help the bootstrap of the whole YASEP project with different kinds of support, with broad public exposure. It will be possible to have a YASEP implementation in hand, I work both on the hardware and software sides :-) BTW, a recent Wikipedia article has appeared with a short summary of the YASEP's architecture.

Another critical part of the project (the VHDL source code and its infrastructure) is in active development : GHDL is now the officially supported simulator. I have interviewed the main developer (Tristan Gingold) for GNU/Linux Magazine France n°127. With Laura, we started a series of articles about VHDL development under Linux and I am proposing increasingly advanced ... hacks :-) The first YASEP implementations will be designed with "design for test" in mind.

In parallel, another subproject is the design of a Libre, affordable, compact and Ethernet-enabled JTAG programming probe. More on this subject in the future, but it's critical for the rest of the whole project : my JTAG probes are either USB (and constrained to Actel parts, and don't work under Linux) or parallel-port (no new consumer-grade computer today has this port anymore).

Finally, after the seul.org debacle (due to main server being compromised because of its participation in the tor network), I have opened a new miror at TuxFamily.

So I'm still polishing the tools and gathering the parts. It's not a visible activity but it's probably the most important. What does an architecture mean if there is no infrastructure behind ? With no physical implementation that one can buy and hack oneself ?

Friday 13 November 2009

Support of Alphanumeric LCD with YASEP

I have been very busy since august, unfortunately not with YASEP but I keep an eye on this project. Even though I can't dedicate days and weeks to this, I try to gather things here and there when they appear, like electronic parts, ideas, and ways to implement them.

For example I've been thinking about how to display informations with a simple FPGA kit.  I already have a nice collection of alphanumeric LCD modules that is expanding, so they are a good and cheap output peripheral.

From there, at least three things follow :

  1. The modules I own have different resolutions : from 1x8char to 4x20 but there is no electronic means to distinguish them from the others. So I recently imagined a method, discussed a bit about it on USENET and decided that it was worth implementing it. I am writing a RFC about this now.
  2. I'm going to add a set of Special Registers that support the parallel interface to a LCD module in nibble mode. This is going to provide automatic strobes, and ease application software development. This unit will also support readback of LCD resolution, supporting the protocol defined in 1. Contrast voltage is controlled by a simple PWM/PD circuit instead of a trimpot.
  3. While looking around for more informations about the HD44780-compatible modules, wikipedia sent me to a JavaScript HD44780 simulator designed years ago by Dincer Aydin. He has done even crazier things like a graphic LCD simulator or a PIC assembler in JavaScript ! I asked if I could reuse the alphanumeric code and Dincer kindly accepted :-D I have not looked at the source code but I presume that it's going to need a lot of work (particularly for updating the display engine, because updates are "optimised out" in Firefox). Anyway the YASEP simulator is not even mature enough so there is no hurry... 
Everything seems to be in place for a future use of alphanumeric LCD modules. I have more than 20pc available, I have already used some of them on a past PIC project, and the JavaScript framework will support them. I'm not saying it's going to be easy, but it's far easier than I thought !

Friday 2 October 2009

When you connect the power supply, it works...

As one can guess from the past messages on this "*log", I have been slowly preparing custom FPGA boards as a background activity. It's not an easy thing and can be quite expensive. So I patiently gathered the necessary parts through online stores and eBay, looking for interesting deals.

Finally I have all the necessary parts for a cheap and repeatable prototype. Among others :

* A bunch of A3P250VQG100 : I got them from a really nice Canadian guy and I'll use this specific reference as the main target for the future works. I originally intended to target the A3P125 but I got more powerful for less money so why refuse ? :-D The A3P250 has enough logic for moderately complex stuff (though SRAM is really TIGHT) and can replace microcontrollers in many cases.

* QFP100 adapter boards : FUTURLEC has cheap and good proto boards. The tin makes soldering easy, just add some liquid flux, no need for aditional solder.

With the help of Actel's docs and the schematics of other boards, including ACME's Colibri, I easily wired the power rails. The board is not recommended for high-speed signals but the goal is only to check the schematics for more ambitious boards (probably manufactured through FUTURLEC too, as their PCB pooling service looks great).

I created a small dumb VHDL design (8-bit counter with clock, reset and increment/decrement inputs) backed by a small board. The additional board also provides 3.3V from a battery, so I could avoid long wires from power supplies.

And in order to be programmed, the FPGA needs a JTAG interface. I soldered everything correctly but the JTAG/USB interface would refuse to work. After a small nap and many hypothesis, the problem was obvious : the JTAG signals were correctly wired but I forgot to wire the power supplies :-/ Obviously, when it's fixed, the things work considerably better... I'm amazed that it is the only error, considering my sleep deprivation :-D

First Actel proto board \o/

No, really, it just works as expected. I may have finally become good at this, after all the failures and false starts of the past :-D It even reproduces the strange behaviour that I had seen in other designs : the pins are REALLY sensitive ! Don't forget the pull-down's ! I made a basic/passive anti-bounce (just a RC filter) but it is useless : a single clock push creates many strobes and the counter advances unpredictably. But I half-expected it and I did not even register the inputs in VHDL so it is naturally glitch-prone, so I don't care. It "works".

What does this mean ?

* When enough resources are gathered, complex things become easier. I have invested a lot of money and time in the past years just to get to that point and ... it feels good !

* Great things that were "possible" now become "available" for future projects. This includes YASEP and other (commercial ?) designs.

* FPGA are damn cool ! Actel's chips are certainly slower and less capable than other makers but their products make this little board possible and easy : once the power supplies and the JTAG are (hum, correctly) wired, the board can be plugged in other cheap prototypes. No need of external Flash chip, bootstrapping EPLD, or whatever...

Next in line : a parallel port interface (hooked to a computer) then the SRAM chips :-D Then I'll try to develop an embedded CPU design with Ethernet that could replace the Rabbit, PIC and AVR. Finally, I wish to create an Ethernet-based JTAG programmer that will replace the proprietary and USB-bound FlashPro3. This proprietary probe is not extremely expensive (Actel has wisely created even cheaper versions) but USB is such an annoyance !

Sunday 23 August 2009

Back from vacations...

The lack of Internet access during 2 weeks of vacations was a very good thing for the YASEP, the development was stimulated and efficient !

I should mention that the environment helped being in a great mood, if you don't count all the insects. Have a look at this picture or this video if you wonder what it's like to develop in VHDL in the country, under a wonderful tree and sitting next to the tent. BTW, thanks Toshiba for the extra-life-battery pack for the Portégé 3490, I could work about 5 hours in a row but it recharges very slowly.

I did a lot of cleanup, completed some pages, integrated the first extended instructions and re-enabled the disassembler. I also examined the multiply instructions and created an algorithm that initialises the multiply lookup-tables ! I also added an algorithm that generates random opcode examples, instead of the fixed strings of before. It's more efficient at finding bugs !

Before I upload the new site, I still have to change some fields and remove the _X forms (as they are useless now, because the "always" condition has the same effect).

I'm also working in parallel on the VHDL source code. I'm adding a CRC32 unit mapped in the SRs so communications and files will have better and faster checks. Unfortunately, I lost a few days of work in a defunct hard disk...

Stay tuned !

edit :

The site is updated, enjoy !

I also recovered the few days of work locked in one of the computers, the disk is not completely dead (it's just dead slow so a Slackware LiveCD is necessary)

The next steps are : website minification, VHDL code development,  further development of listed, pointer update, short jump/call instructions...

I'm also looking at compression/decompression algorithms such as deflate and range coding.

Friday 24 July 2009

YASEP en français

Grâce au concours de Laura, une partie des pages web du site YASEP est en cours de traduction en français. Pour l'instant, ont été intégrées les pages suivantes : l'index, les registres, les instructions, la carte interactive des opcodes et YASEP16/32. D'autres pages devraient suivre, j'attends que Laura soumette d'autres pages.

Au début du projet, j'avais décidé de tout faire uniquement en anglais. Mon expérience m'a montré que le support de plusieurs formats ou langues différentes augmente la charge de travail, donc réduit le temps passé à créer des choses utiles. De plus, il y a toujours une version qui est à la traine et cela rend le projet incohérent, vu de l'extérieur. On a alors tendance à ne plus se référer qu'à la version "principale" (en anglais) et la version traduite sombre dans l'inutilité.

Ce coup-ci, il est bien clair que la version "officielle" du projet est en anglais. La traduction française sera probablement en retard sur un nombre inconnu de points, à mesure que le temps passe. Mais la démarche de traduction apporte plusieurs avantages :

* D'abord, j'ai tendance à écrire en anglais de manière absconse et à la fin je suis le seul à comprendre ce que j'ai écrit. La traduction me confronte à mes mauvaises manies et m'oblige à reformuler mes phrases, pour les rendre plus claires. C'est en accord avec mon exigence d'accessibilité, d'autant plus que la traductrice, Laura, est moins bonne en anglais et en technique que moi, et je voudrais être compris par des personnes encore plus débutantes.

* Ensuite, Laura est plus proche et plus exigente que les collaborateurs précédents. J'en attends une meilleure qualité et un meilleur suivi.

* Aussi, avoir deux versions d'une même page web force à séparer la présentation, le contenu et les scripts : c'est la nécessité de modularité et de non-redondance qui deviennent importants.

En plus, cela me permet de revoir et donc améliorer les pages originales, d'y faire du tri...

Comme d'habitude, je suis intéressé par toute remarque constructive pour améliorer le site.

Saturday 4 July 2009

Probable new features

When a project has practical uses and implications, it is interesting to see how it evolves and better fill de gaps that a purely theoretical design would address. For YASEP, the modifications have been very deep, while many of the neat original ideas remain. Lately, there have been a few new ideas that may or may not be implemented.

  • A new CRIT instruction :

This is a method to perform atomic instruction sequences. It opens a HW-garanteed CRITical section, that lasts a few and constant number of instructions (1 to 16 depending on the imm4 argument). After/before this, IRQs and other things are checked, to prevent the system from hanging because of back-to-back CRIT instructions...

  • External bus expansion with off-chip buffers

In the case where the number of FPGA pins is low, a lot of them are used by external SRAM. The address and data bus could be used to expand the I/O count, by adding a few 74LVC574 and 74LCV245. In this case, a few specific instructions are required because the GET and PUT instructions work only with internal resources. Another issue is the bus loading that might affect the timings and/or speed. The Inputs and outputs could be easily separated, the output latches can be tied to the address bus (because it is unidirectional) while the Input buffers can only be tied to the data bus. Voltage translation is also a desired feature.

  • CRC32 accelerator

As the need for a zlib port arises, the necessity to check CRC32 signatures becomes a problem. I have already designed CRC routines and... well... they can become quite heavy. OTOH, it is rather straight-forward to do in hardware. I don't want to make yet another instruction here because this would make the pipeline more complex (and the number of registers is already too small) but a small set of SR will do the trick.

  • DMA for SPI

SPI is used when booting the CPU from a SPI Flash memory, or when communicating with Ethernet or 2.4GHz interfaces. Adding a simple DMA capability would save a lot of cycles and latency.

Other things will certainly come later...

Monday 29 June 2009

YASEP@HSF2009

On June 26th, I have presented a joint project with Laura, called "GPL" (Gaming Platform Libre), at the HackerSpace Festival (HSF2009) near Paris. See http://www.hackerspace.net/gaming-platform-libre

This is a french talk, and the slides are here.

I present the latest thoughts about how cryptographic protection of contents could be compatible with the gamer's and the game editor's freedom and cooperation. Some slides also present the latest updates in the YASEP instruction set.

Friday 24 April 2009

First Layout of a custom FPGA+SRAM board

I have not been fully satisfied by all the boards that I have seen. There are always details that don't match a project or requirements that are not met (size, price, features, whatever). So I finally decided to start my own board(s).

Firt route of a TSOP-2 SRAM to a A3P125 FPGA in VQ100

It seems that YASEP could easily replace microcontrollers that I already use. The flexibility offered by FPGAs and the ability to strip a thing down to the minimum, then expand on that depending on the needs, makes this solution more and more attractive. No difficult selection of features and package (as with fixed-function chips), put the FPGA on the board and route the pins...

I can't solder BGA package, or even build suitable PCBs myself, but I'm already able to make double-sided PCBs that can be fitted with a FPGA in 100, 144 or 208 pin in QFP package. I'll be able to reuse these designs in the future, or make my own cheap modules.

Saturday 4 April 2009

First details of the new "extended" long instruction

A precedent post has summarised the available "instruction forms", with or without immediate field (4 or 16-bits), with 2, 3 or 4 register addresses. Here we look at the "long form" (32-bit) using the "extended" fields that add 2 register addresses, conditional (speculative) execution and pointer updates.

Let's now examine the structure of the 16 bits that are added to the basic instruction word :

  • One bit indicates if the source is Imm4 (it replaces the corresponding field in the basic instruction).
  • 2 bits indicate a condition (LSB, MSB, Zero, Always) and another bit negates the result (The condition "never" will be used later but I'm not sure how).
  • 4 bits indicate which register is being tested
  • 4 bits indicate the destination register (replacing the src/dest field in the basic instruction)
  • 2 fields of 2 bits each encode the auto-update functions of one source register and the destination register (nop, post-inc, post-dec, pre-dec)

These fields are mostly orthogonal and can work in almost any combination. One can auto-update 2 registers (whether they are normal or belong to a memory access register pair), perform a 3-address operation and enable write-back depending on 97 conditions. It also preserves the availability of short immediate values, which further reduces code size. However it can increase the core's complexity.

One unexpected bonus is that this new architecture iteration is more compiler-friendly. At least, it's much less awkward or embarassing.

One bit could have been saved : the imm4 flag could be merged in the auto-update field for a source register. However this increases the logic overhead and prevents simultaneous use of auto-update AND imm4.

Stay tuned...

Yet another Instruction Set Architecture change

I wish it could stabilize soon, but at least movement is a sign of activity (or the reverse :-))

I was annoyed by the ASU operations :

  ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX

These instructions were the last ones that used skip technique, since it is progressively dropped in favor of relative branches by conditional add/sub to the PC register.

How is it possible to provide the same functionality without skip ? It's the same old question that decades of research has not yet answered definitively. The Carry Flag is the obvious solution but I have just dropped the "status/mode register" in favor of another general purpose register. So where can I find a stupid bit of room ?

The answer is there under my eyes : the LSB of the PC ...

OK OK I know it's ugly. But consider these aspects :

  • The PC points to the next instruction and never uses the LSB because all the YASEP instructions are aligned on 2-bytes boundaries.
  • Any write to the PC register modifies the bits 1 to 31. Bit 0 comes from the ASU's carry output.
  • We can declare that only the ASU operations (or context changes) can change the PC's LSB. All the other instructions can read it and test it, so the informations is easily available.
  • Since we dropped the 4 instructions that used skip, these "slots" can be filled by other instructions :
 CMPS, CMPU, SMIN, SMAX

CMPx are just like SUB but don't write the result back. I wish it could set the LSB of any register but the current architecture doesn't allow this, so please keep the destination field to PC when encoding the assembly instruction.

3 new instructions deal with signed comparison : CMPS, SMIN & SMAX. They were missing from the previous opcode maps but the elimination of the skip-instructions leaves enough room. I have to update the VHDL now...

  • Keeping the carry bit in the LSB of the PC can have a curious side effect : relative jumps with odd values will make the carry bit ripple to the other bits of the result, so the destination address that is written in the PC will depend on the value of the carry bit. In practice, there is no speed or size advantage (compared to condition codes in the new opcode extension) but the possibility is there...
  • Clearing the carry flag is done with
  CMP Rx, Rx
  • Setting the carry flag is done with
  CMP -1, Rx

(or something like that)

Usually, I would end the post with something along the lines of "this is good and everybody is happy". Now, I feel a bit disapointed that YASEP looks more like other architectures, and has less distinguishing features. It is less groundbreaking and it will have to face the same problems as the others, on top of its inherent quirks. But it's still better than nothing and I do my best to keep the system rather coherent and orthogonal.

Thursday 19 March 2009

what about YASEP2009 ?

Development of and around YASEP is going on in a weird way, but it still continues...

Why so much caution ? Because the changes to the architecture are quite deep. The instructions forms are increasingly complex and I've pushed the design beyond what I intended in the beginning.

If you don't remember, YASEP had only two ways to address data previously :

short form :

 Reg1 OP Reg2 => Reg1  (16 bits)

long form :

  Reg1 OP Imm16 => Reg2 (32 bits)

Now a few bits are freed and this gives much more "flexibility", so I added :

Short Immediate :

  Reg1 OP Imm4 => Reg1 (16 bits)

Long Register :

  Reg1 OP Reg2 => Reg3 (32 bits)

And because there was still some room, this last form has more elaborate versions :

Long conditional :

  Reg1 OP Reg2 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

And other versions come up when the Reg2 field is interpreted as Imm4 :

Long conditional short Imm: (excuse the name)

  Reg1 OP Imm4 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

Or without condition :

  Reg1 OP Imm4 => Reg3 (32 bits)

This applies to the computation instructions, the control instructions are still too undefined yet.

Code density should increase, which is worth the efforts. I don't know if it will reach the level of ARM or x86 but it is certainly a major advance. However, this breaks a lot of the assembler's mechanisms, so I prefer to rewrite it. This takes a while because the rest must be adapted too : the Instruction Set, the manual pages, the validators...

If you can't stand the wait, have a look at a precent, broken version at http://yasep.org/~whygee/yasep2009/, at least it is more recent than the main site.

Wednesday 18 February 2009

Listed : the dynamic LISTing EDitor

So I've been busy again...

This time, it's all about JavaScript. The preliminary version is available from http://yasep.org/~whygee/listed/listed.html

What is it really ? It's an interactive assembler in dynamic HTML, loaded with JavaScript and CSS stuff. It's also an interface to the JavaScript assembler and the simulator.

  • The little windowing system allows one to break a whole program into small chunks, that are easier to manage. Assembly langage listings can easily get messy, but local symbols and hideable sections reduce the usual clutter on one's window/screen.
  • As the user edits each line, the modifications are committed to the rest of the page : the instructions are re-assembled, the labels are updated where they are used, the simulator can reinterpret the sequence and give preliminary results for given testcases...
  • The assembler is not limited to YASEP : the CPU interface is going to be generic, and LISTED could support any CPU that can be described in JavaScript (that means : all, provided enough adaptations are coded). A dummy, overly simple and dumb CPU architecture will be given as an example, so somebody can easily adapt it for x86, PIC, Alpha, MIPS, POWER, or RCA1802 ...
  • This is going to be linked directly with ARF, which is another graphic coding interface.

I have been working on this for more than 3 weeks and a lot of work still remains. I focus on user comfort and UI design but I keep flexibility and expandability in mind. For example, I have developped YGWM to handle the windowing part, which will be reused by the whole yasep.org website. The assembler and simulator will remain completely decoupled.

In the end, it only confirms what I believed for some time : JavaScript is a fantastic opportunity for really new ideas, it provides portability and rapid design. However, after trying to make it compatible with different browsers, my strong recommendation is : use Firefox and stick to it

Friday 23 January 2009

YASEP2009 : "It's gonna be big"... when it comes

The YASEP architecture has changed so much that a big rewrite is necessary.

My local copy is so... broken here and there that I prefer to not update yasep.org. The modifications are so deep that it's not possible to just patch a few things.

The organisation of the website should evolve a lot and I'm thinking about new techniques.

The documentation must be partially rewritten, not simply updated here and there.

Today's site structure dates back to 2006, maybe the big rewrite is a good thing in fact.

However, this is so much work, and my concentration is so volatile, that I wonder when the website will be updated with something stable enough to be almost publishable. In fact, I'd rather not wonder, the answer would scare me. Anyway, I see that many efforts I have done in the past years have been fruitful and helped build the project as it is now. So I keep faith and continue.

- page 1 of 2