YASEP news


Friday 13 November 2009

Support of Alphanumeric LCD with YASEP

I have been very busy since August, unfortunately not with YASEP, but I keep an eye on this project. Even though I can't dedicate days and weeks to this, I try to gather things here and there when they appear, like electronic parts, ideas, and ways to implement them.

For example I've been thinking about how to display information with a simple FPGA kit. I already have a nice, growing collection of alphanumeric LCD modules, so they are a good and cheap output peripheral.

From there, at least three things follow :

  1. The modules I own have different resolutions : from 1x8 characters to 4x20, but there is no electronic means to distinguish one from another. So I recently imagined a method, discussed it a bit on USENET and decided that it was worth implementing. I am writing an RFC about this now.
  2. I'm going to add a set of Special Registers that support the parallel interface to an LCD module in nibble mode. This is going to provide automatic strobes, and ease application software development. This unit will also support readback of the LCD resolution, using the protocol defined in 1. Contrast voltage is controlled by a simple PWM/PD circuit instead of a trimpot.
  3. While looking around for more information about the HD44780-compatible modules, Wikipedia sent me to a JavaScript HD44780 simulator designed years ago by Dincer Aydin. He has done even crazier things like a graphic LCD simulator or a PIC assembler in JavaScript ! I asked if I could reuse the alphanumeric code and Dincer kindly accepted :-D I have not looked at the source code but I presume that it's going to need a lot of work (particularly for updating the display engine, because updates are "optimised out" in Firefox). Anyway the YASEP simulator is not even mature enough so there is no hurry...
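To give an idea of what "nibble mode with automatic strobes" spares the application software, here is a minimal Python sketch of the usual HD44780 4-bit transfer : each byte goes out as two nibbles, high nibble first, and each nibble is latched by a pulse on the E line. The function name and the tuple layout are purely illustrative; they are not the planned Special Register interface.

```python
def nibble_transfers(byte, rs):
    """Return the bus states needed to send one byte to an HD44780
    in 4-bit mode, as (rs, data_nibble, e) tuples.

    rs = 0 for a command, 1 for character data.
    The high nibble goes first; E is raised then lowered for each
    nibble (the module latches the data on the falling edge of E).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    states = []
    for nibble in ((byte >> 4) & 0xF, byte & 0xF):
        states.append((rs, nibble, 1))  # E high, nibble presented on the bus
        states.append((rs, nibble, 0))  # E low : the nibble is latched
    return states


# Sending the character 'A' (0x41) as data takes four bus states.
for state in nibble_transfers(0x41, rs=1):
    print(state)
```

Without hardware support, the program has to produce every one of these states itself, with the right delays in between. With automatic strobes, a single register write would presumably be enough.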
Everything seems to be in place for a future use of alphanumeric LCD modules. I have more than 20 of them available, I have already used some of them on a past PIC project, and the JavaScript framework will support them. I'm not saying it's going to be easy, but it's far easier than I thought !

Saturday 4 July 2009

Probable new features

When a project has practical uses and implications, it is interesting to see how it evolves and fills the gaps that a purely theoretical design would not address. For YASEP, the modifications have been very deep, while many of the neat original ideas remain. Lately, there have been a few new ideas that may or may not be implemented.

  • A new CRIT instruction :

This is a method to perform atomic instruction sequences. It opens a HW-guaranteed CRITical section that lasts for a small, constant number of instructions (1 to 16 depending on the imm4 argument). After/before this, IRQs and other things are checked, to prevent the system from hanging because of back-to-back CRIT instructions...

  • External bus expansion with off-chip buffers

In the case where the number of FPGA pins is low, a lot of them are used by external SRAM. The address and data bus could be used to expand the I/O count, by adding a few 74LVC574 and 74LVC245. In this case, a few specific instructions are required because the GET and PUT instructions work only with internal resources. Another issue is the bus loading that might affect the timings and/or speed. The inputs and outputs could be easily separated : the output latches can be tied to the address bus (because it is unidirectional) while the input buffers can only be tied to the data bus. Voltage translation is also a desired feature.

  • CRC32 accelerator

As the need for a zlib port arises, the necessity to check CRC32 signatures becomes a problem. I have already designed CRC routines and... well... they can become quite heavy. OTOH, it is rather straightforward to do in hardware. I don't want to make yet another instruction here because this would make the pipeline more complex (and the number of registers is already too small) but a small set of SRs (Special Registers) will do the trick.
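To show why software CRC gets heavy, here is a small Python sketch of the bit-serial CRC-32 used by zlib (reflected polynomial 0xEDB88320). Each input byte costs 8 shift/XOR steps in software, while in hardware the same update is a handful of XOR gates around a 32-bit register. The two-function split below (feed a byte, read the result) only mimics how a pair of Special Registers could be used; that interface is an assumption, not a YASEP specification.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used by zlib


def crc32_feed(crc, byte):
    """Update the running CRC register with one byte (8 shift/XOR steps)."""
    crc ^= byte
    for _ in range(8):
        if crc & 1:
            crc = (crc >> 1) ^ POLY
        else:
            crc >>= 1
    return crc


def crc32(data):
    """CRC-32 of a bytes object : preset to all ones, invert at the end."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32_feed(crc, byte)
    return crc ^ 0xFFFFFFFF


message = b"123456789"
# Cross-check against the reference implementation in zlib.
print(hex(crc32(message)), crc32(message) == zlib.crc32(message))
```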

  • DMA for SPI

SPI is used when booting the CPU from an SPI Flash memory, or when communicating with Ethernet or 2.4GHz interfaces. Adding a simple DMA capability would save a lot of cycles and latency.

Other things will certainly come later...

Monday 29 June 2009

YASEP@HSF2009

On June 26th, I presented a joint project with Laura, called "GPL" (Gaming Platform Libre), at the HackerSpace Festival (HSF2009) near Paris. See http://www.hackerspace.net/gaming-platform-libre

The talk is in French, and the slides are here.

I present the latest thoughts about how cryptographic protection of contents could be compatible with the gamer's and the game editor's freedom and cooperation. Some slides also present the latest updates in the YASEP instruction set.

Friday 24 April 2009

First Layout of a custom FPGA+SRAM board

I have not been fully satisfied by all the boards that I have seen. There are always details that don't match a project or requirements that are not met (size, price, features, whatever). So I finally decided to start my own board(s).

First route of a TSOP-2 SRAM to an A3P125 FPGA in VQ100

It seems that YASEP could easily replace microcontrollers that I already use. The flexibility offered by FPGAs and the ability to strip a thing down to the minimum, then expand on that depending on the needs, makes this solution more and more attractive. No difficult selection of features and package (as with fixed-function chips), put the FPGA on the board and route the pins...

I can't solder BGA packages, or even build suitable PCBs myself, but I'm already able to make double-sided PCBs that can be fitted with an FPGA in a 100-, 144- or 208-pin QFP package. I'll be able to reuse these designs in the future, or make my own cheap modules.

Saturday 4 April 2009

First details of the new "extended" long instruction

A previous post summarised the available "instruction forms", with or without immediate field (4 or 16 bits), with 2, 3 or 4 register addresses. Here we look at the "long form" (32-bit) using the "extended" fields that add 2 register addresses, conditional (speculative) execution and pointer updates.

Let's now examine the structure of the 16 bits that are added to the basic instruction word :

  • One bit indicates if the source is Imm4 (it replaces the corresponding field in the basic instruction).
  • 2 bits indicate a condition (LSB, MSB, Zero, Always) and another bit negates the result (The condition "never" will be used later but I'm not sure how).
  • 4 bits indicate which register is being tested
  • 4 bits indicate the destination register (replacing the src/dest field in the basic instruction)
  • 2 fields of 2 bits each encode the auto-update functions of one source register and the destination register (nop, post-inc, post-dec, pre-dec)

These fields are mostly orthogonal and can work in almost any combination. One can auto-update 2 registers (whether they are normal or belong to a memory access register pair), perform a 3-address operation and enable write-back depending on 97 conditions. It also preserves the availability of short immediate values, which further reduces code size. However it can increase the core's complexity.
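The field widths listed above add up to exactly 16 bits (1 + 2 + 1 + 4 + 4 + 2 + 2). The Python sketch below unpacks such a word and also reproduces the count of 97 conditions, presumably obtained as 3 tests on any of 16 registers, each optionally negated, plus "always". Beware : the bit positions and the numeric codes of the conditions and update modes are my own assumptions for illustration; only the widths and the names come from the list above.

```python
from collections import namedtuple

Ext = namedtuple("Ext", "imm4 cond negate test_reg dest_reg upd_src upd_dst")

CONDITIONS = ("LSB", "MSB", "Zero", "Always")          # 2-bit field
UPDATES = ("nop", "post-inc", "post-dec", "pre-dec")   # 2-bit fields


def decode_ext(word):
    """Unpack the 16 extension bits (HYPOTHETICAL bit positions)."""
    return Ext(
        imm4=(word >> 15) & 1,              # 1 bit  : source is Imm4
        cond=CONDITIONS[(word >> 13) & 3],  # 2 bits : condition
        negate=(word >> 12) & 1,            # 1 bit  : negate the condition
        test_reg=(word >> 8) & 0xF,         # 4 bits : register being tested
        dest_reg=(word >> 4) & 0xF,         # 4 bits : destination register
        upd_src=UPDATES[(word >> 2) & 3],   # 2 bits : source auto-update
        upd_dst=UPDATES[word & 3],          # 2 bits : destination auto-update
    )


def count_conditions():
    """Count the distinct usable conditions."""
    distinct = set()
    for cond in CONDITIONS:
        for negate in (0, 1):
            for reg in range(16):
                if cond == "Always":
                    if negate:
                        continue            # "never" is reserved for later
                    distinct.add(("Always",))  # the tested register is irrelevant
                else:
                    distinct.add((cond, negate, reg))
    return len(distinct)


print(decode_ext(0x2531))
print(count_conditions())
```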

One unexpected bonus is that this new architecture iteration is more compiler-friendly. At least, it's much less awkward or embarrassing.

One bit could have been saved : the imm4 flag could be merged in the auto-update field for a source register. However this increases the logic overhead and prevents simultaneous use of auto-update AND imm4.

Stay tuned...

Yet another Instruction Set Architecture change

I wish it could stabilize soon, but at least movement is a sign of activity (or the reverse :-))

I was annoyed by the ASU operations :

  ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX

These instructions were the last ones that used the skip technique, which is progressively being dropped in favor of relative branches by conditional add/sub to the PC register.

How is it possible to provide the same functionality without skip ? It's the same old question that decades of research have not yet answered definitively. The Carry Flag is the obvious solution but I have just dropped the "status/mode register" in favor of another general purpose register. So where can I find a stupid bit of room ?

The answer is there under my eyes : the LSB of the PC ...

OK OK I know it's ugly. But consider these aspects :

  • The PC points to the next instruction and never uses the LSB because all the YASEP instructions are aligned on 2-byte boundaries.
  • Any write to the PC register modifies the bits 1 to 31. Bit 0 comes from the ASU's carry output.
  • We can declare that only the ASU operations (or context changes) can change the PC's LSB. All the other instructions can read it and test it, so the information is easily available.
  • Since we dropped the 4 instructions that used skip, these "slots" can be filled by other instructions :
 CMPS, CMPU, SMIN, SMAX

CMPx are just like SUB but don't write the result back. I wish it could set the LSB of any register but the current architecture doesn't allow this, so please keep the destination field to PC when encoding the assembly instruction.

3 new instructions deal with signed comparison : CMPS, SMIN & SMAX. They were missing from the previous opcode maps but the elimination of the skip-instructions leaves enough room. I have to update the VHDL now...

  • Keeping the carry bit in the LSB of the PC can have a curious side effect : relative jumps with odd values will make the carry bit ripple to the other bits of the result, so the destination address that is written in the PC will depend on the value of the carry bit. In practice, there is no speed or size advantage (compared to condition codes in the new opcode extension) but the possibility is there...
  • Clearing the carry flag is done with
  CMP Rx, Rx
  • Setting the carry flag is done with
  CMP -1, Rx

(or something like that)
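The "curious side effect" of odd jump offsets is easy to check with a few lines of Python. This is only a toy model under my reading of the rules above : the PC register holds the carry in bit 0, a relative jump adds the offset to the whole register, and only bits 1 and up are used as the destination address. The function names are made up, and what happens to bit 0 after the jump is left out.

```python
def pc_with_carry(address, carry):
    """PC register contents : instruction address (even) plus carry in bit 0."""
    return (address & ~1) | (carry & 1)


def jump_target(pc, offset, width=16):
    """Destination of a relative jump : add the offset to the whole PC
    register, then keep only bits 1 and up as the new address."""
    mask = (1 << width) - 1
    return (pc + offset) & mask & ~1


for carry in (0, 1):
    pc = pc_with_carry(0x0100, carry)
    # An even offset ignores the carry, an odd one lets it ripple upwards.
    print(carry, hex(jump_target(pc, 4)), hex(jump_target(pc, 3)))
```

With an even offset both runs land on 0x104. With the odd offset 3, the target is 0x102 when the carry is clear and 0x104 when it is set, so the jump behaves like a two-way branch on the carry bit.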

Usually, I would end the post with something along the lines of "this is good and everybody is happy". Now, I feel a bit disappointed that YASEP looks more like other architectures, and has fewer distinguishing features. It is less groundbreaking and it will have to face the same problems as the others, on top of its inherent quirks. But it's still better than nothing and I do my best to keep the system rather coherent and orthogonal.

Thursday 19 March 2009

what about YASEP2009 ?

Development of and around YASEP is going on in a weird way, but it still continues...

Why so much caution ? Because the changes to the architecture are quite deep. The instruction forms are increasingly complex and I've pushed the design beyond what I intended in the beginning.

If you don't remember, YASEP had only two ways to address data previously :

short form :

 Reg1 OP Reg2 => Reg1  (16 bits)

long form :

  Reg1 OP Imm16 => Reg2 (32 bits)

Now a few bits are freed and this gives much more "flexibility", so I added :

Short Immediate :

  Reg1 OP Imm4 => Reg1 (16 bits)

Long Register :

  Reg1 OP Reg2 => Reg3 (32 bits)

And because there was still some room, this last form has more elaborate versions :

Long conditional :

  Reg1 OP Reg2 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

And other versions come up when the Reg2 field is interpreted as Imm4 :

Long conditional short Imm: (excuse the name)

  Reg1 OP Imm4 IF{NOT} Reg4{LSB/MSB/Zero/ready} => Reg3 (32 bits)

Or without condition :

  Reg1 OP Imm4 => Reg3 (32 bits)

This applies to the computation instructions; the control instructions are still too loosely defined.

Code density should increase, which is worth the effort. I don't know if it will reach the level of ARM or x86 but it is certainly a major advance. However, this breaks a lot of the assembler's mechanisms, so I prefer to rewrite it. This takes a while because the rest must be adapted too : the Instruction Set, the manual pages, the validators...

If you can't stand the wait, have a look at a recent, broken version at http://yasep.org/~whygee/yasep2009/, at least it is more recent than the main site.

Wednesday 18 February 2009

Listed : the dynamic LISTing EDitor

So I've been busy again...

This time, it's all about JavaScript. The preliminary version is available from http://yasep.org/~whygee/listed/listed.html

What is it really ? It's an interactive assembler in dynamic HTML, loaded with JavaScript and CSS stuff. It's also an interface to the JavaScript assembler and the simulator.

  • The little windowing system allows one to break a whole program into small chunks that are easier to manage. Assembly language listings can easily get messy, but local symbols and hideable sections reduce the usual clutter on one's window/screen.
  • As the user edits each line, the modifications are committed to the rest of the page : the instructions are re-assembled, the labels are updated where they are used, the simulator can reinterpret the sequence and give preliminary results for given testcases...
  • The assembler is not limited to YASEP : the CPU interface is going to be generic, and LISTED could support any CPU that can be described in JavaScript (that means : all, provided enough adaptations are coded). A dummy, overly simple and dumb CPU architecture will be given as an example, so somebody can easily adapt it for x86, PIC, Alpha, MIPS, POWER, or RCA1802 ...
  • This is going to be linked directly with ARF, which is another graphic coding interface.

I have been working on this for more than 3 weeks and a lot of work still remains. I focus on user comfort and UI design but I keep flexibility and expandability in mind. For example, I have developed YGWM to handle the windowing part, which will be reused by the whole yasep.org website. The assembler and simulator will remain completely decoupled.

In the end, it only confirms what I have believed for some time : JavaScript is a fantastic opportunity for really new ideas, it provides portability and rapid design. However, after trying to make it compatible with different browsers, my strong recommendation is : use Firefox and stick to it.

Friday 23 January 2009

YASEP2009 : "It's gonna be big"... when it comes

The YASEP architecture has changed so much that a big rewrite is necessary.

My local copy is so... broken here and there that I prefer to not update yasep.org. The modifications are so deep that it's not possible to just patch a few things.

The organisation of the website should evolve a lot and I'm thinking about new techniques.

The documentation must be partially rewritten, not simply updated here and there.

Today's site structure dates back to 2006, maybe the big rewrite is a good thing in fact.

However, this is so much work, and my concentration is so volatile, that I wonder when the website will be updated with something stable enough to be almost publishable. In fact, I'd rather not wonder, the answer would scare me. Anyway, I see that many efforts I have made in the past years have been fruitful and helped build the project as it is now. So I keep faith and continue.

Monday 19 January 2009

Yet another new Actel toy \o/

As you may know, YASEP16 will probably be used in my girlfriend's "pet project" Ours Agile. This involves lots of real-time computations, countless sensors and more than 30 actuators... Sure, YASEP could handle that, probably. But the interfacing was giving me headaches : so many analog components (on top of high-speed memory) seem expensive and/or difficult.

Then I spotted a second-hand AFS600 evaluation kit from Actel, that I got for a fair price. It was a bit risky and I first thought it was broken. But since it's 2nd hand, somebody had probably played with it, and just uploaded a new configuration bitstream. With the help of a French rep., I found and uploaded the original demo bitstream and ... Magic happens !

Actel AFS600 eval board plugged

This FPGA family comes at "premium price" but it's a damn great opportunity for robotics projects :

  • 512KB of program space as Flash EEPROM (no need to download from external SPI !)
  • onchip 100MHz RC clock generator (exactly what I'm aiming at !)
  • RTC, temperature sensors, low power...
  • high-speed 30-channel ADC !
  • several integrated MOSFET gate drivers
  • 13K tiles vs 6K on the A3P250
  • 24 SRAM blocks vs 8 on the A3P250

This is definitely a great toy for robots...

Tuesday 6 January 2009

Evolution of the instruction set

As the execution units mature and get integrated as one block, things become clear, at least concerning the computation instructions. I'm currently focusing on the 16-bit flavour of YASEP and I expect that the following will hold true for YASEP32.

The ALU16 is nearing completion, though feature creep is still rampant. But I have identified a bunch of instructions that will not change much in the future, and they are gathered here :

- ROP2 : AND, OR, XOR, ANDN, ORN, XNOR, NAND, NOR
- ASU : ADD, SUB, ADDS1, SUBS1, ADDS2, SUBS2, MIN, MAX
- SHL : SHR, SHL, ROR, ROL, SAR  + MUL : MUL8L, MUL8H, MULINIT
- IE : MOV, SB, LSB, LZB (16/32b) SH, SHH, LSH, LZH (32 bits only)

This nice and square table represents the large majority of the used instructions, and this fits into 4 groups of 8 instead of the planned 8 groups. So...

This saves a bit that is used to encode other addressing modes. In 2008, there were 2 modes : short mode (RR) and long mode (RRImm16). Now, it is also possible to encode a short immediate in the short mode (RImm4, the register is replaced by a value), or use another register as a destination in the long mode (but 12 bits are unused).

Yes there are now 4 addressing modes and most code should see its binary size shrink ! Furthermore, the datapath complexity is not impacted and the 3-register version should reduce the number of cycles for a given portion of code.

How this affects usual code :

- add 1, r1 ==> r1 += 1

now takes 2 bytes instead of 4. The constant can range from -8 to +7.

- add r1, r2, r3 ==> r1 = r2 + r3

It takes 4 bytes as previously but it saves 1 clock cycle, compared to

- mov r2, r1
- add r3, r1

Note that the yasep.org site is not yet updated, I'll wait until things settle down.

Thursday 1 January 2009

Barrel Shifter : SHL16 ready

Hello and Happy New Year Everybody !

I took some time to work on the next major building block of the YASEP16 execution unit : the shift/rotate unit is now ready in 16-bit flavour.

I concentrate now on YASEP16 because it is smaller and marginally faster, and consumes less bandwidth. It can fit easily in the A3P250 and its 6K 3-input tiles, though I don't know how many tiles are needed in the end.

SHL_16 uses about 220 tiles, and Actel's place&route estimates the unit to run at 140MHz in pipelined version. This is slightly faster and smaller than ASU_ROP2 that performs Add/Sub and boolean operations (115 MHz and about 350 tiles). The overall ALU (ASU_ROP2 + SHL + IE) is going to take roughly 700 tiles, or 1/8th of the A3P250's surface. Speed is looking satisfying, as I intend to clock the thing at 96MHz on the ACME boards (64MHz * 1.5 with the PLL).

Overall, the following operations are ready for the 16-bit flavor :

  • ASU : ADD, SUB and compares as side effects.
  • ROP2 : AND/OR/XOR/NAND/NOR/XNOR/ANDN/ORN as well as comparison for equality (XOR followed by a OR reduction tree)
  • SHL : SHR/SHL/ROR/ROL/SAR
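The equality test mentioned for ROP2 (XOR followed by an OR reduction tree) can be modelled in a few lines of Python. This is a behavioural sketch, not the VHDL : the function names are invented, and the pairwise reduction simply mirrors the log2(16) = 4 levels that a balanced tree of 2-input ORs would have.

```python
def or_reduce(bits):
    """OR all the bits together, level by level, like a balanced tree."""
    while len(bits) > 1:
        if len(bits) % 2:
            bits = bits + [0]  # pad odd levels with a neutral 0
        bits = [bits[i] | bits[i + 1] for i in range(0, len(bits), 2)]
    return bits[0]


def equal16(a, b):
    """Two 16-bit words are equal when no bit of (a XOR b) is set."""
    diff = (a ^ b) & 0xFFFF
    bits = [(diff >> i) & 1 for i in range(16)]
    return or_reduce(bits) == 0


print(equal16(0x1234, 0x1234), equal16(0x1234, 0x1235))
```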

The next part to be developed is the IE (Insert/Extract) unit, for the loads and stores of bytes into a half-word. Stay tuned...

Note : some P&R runs give slightly higher working frequencies but I reserve 15 or 20% of margin, since I expect that all the units put together will need even more MUX2 all over the place, longer wires etc., resulting in slower operation. Furthermore, this is only YASEP16 so far, and the 32-bit flavor will double the design's size...

Friday 19 December 2008

How to double the SRAM capacity of a FPGA board ?

The FoxVHDL and the Colibri boards from ACME Systems come with 2 SRAM chips of 512K Bytes, so one application can benefit from one megabyte of 32-bit low-latency access. But even 1 megabyte may be too small for some uses. Some time ago, I found a way to extend the capacity : piggy-back soldering of another SRAM chip.

2 FoxVHDL FPGA boards from ACME Systems (modified by YG)

To keep the chips identical and avoid timing unbalance, I had to take the SRAMs from another board, but it is not a concern since this second board will be used for some purpose that does not need SRAMs.

I should stress of course that not only unsoldering, but also re-soldering is difficult, but it went well, thanks to special, adapted tools.

Of course, there is a trick : memory is not simply expanded this way. One has to reserve a new address bit, or both memories will be mapped to the same addresses. I have chosen to not connect the Chip Select pins of the additional chips, so they will be wired later to another unused FPGA output.
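Here is how the extra address bit could be used, as a small Python sketch. It assumes that each pair of chips forms one bank of 1 megabyte seen as 256K words of 32 bits (which follows from the figures above), and that the new address bit simply selects which bank's Chip Select is driven. The names and the numbering of the banks are mine.

```python
WORDS_PER_BANK = 256 * 1024  # 1 MB seen as 32-bit words
BANKS = 2                    # original chips + piggy-backed chips


def decode(word_address):
    """Split a word address into (bank, offset).

    The offset goes to the address pins shared by both banks,
    the bank number decides which Chip Select gets activated.
    """
    if not 0 <= word_address < BANKS * WORDS_PER_BANK:
        raise ValueError("address out of range")
    bank = word_address // WORDS_PER_BANK    # the new, extra address bit
    offset = word_address % WORDS_PER_BANK   # the 18 original address bits
    return bank, offset


print(decode(0x00000), decode(0x3FFFF), decode(0x40000))
```

If the Chip Selects were left tied together instead, both banks would answer to the same offset, which is exactly the problem described above.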

Two SRAM chips soldered on the footprint of one

If you want to attempt this hack on your board (whether ACME's or any of the other FPGA boards with static RAM), don't forget that adding pins on a bus adds capacitance, and slows the signals. The clock frequency won't be as high as before, so run extensive tests to determine the new working parameters.

One way to keep the frequency high is to use either a larger SRAM chip (like Cypress or IDT 512Kx16b or 1Mx16, but they are difficult to find and expensive), or faster SRAMs : ACME uses 12ns chips, but other compatible chips are available with 10ns and 8ns access times (try Farnell). Also, you can control the rise/fall times with the I/O current options of the ProASIC3 pads, they can be set to several values.

Next step : using even higher-frequency, synchronous Static RAM, because they have a much higher bandwidth. However I don't know yet how to control the tight timings...

Thursday 18 December 2008

Site update, architecture modifications, and new FPGA boards

I recently got 3 Colibri boards ! When you think about Italy, you think Ferrari and other excellent things, now I'll also think prototyping boards ;-)

Thanks to ACME Systems, I bought 2* A3P250 and one A3P1000 boards for a friendly price. These are pre-series units and may slightly differ from later versions, but they are really as cool as the pictures let you think.

3 prototype Colibri boards from ACME Systems

The website is also updated : the JavaScript engine is now mostly functional for YASEP16 and YASEP32 versions. The documentation is not updated and many dark corners remain in the architecture definition. I have chosen to publish the latest versions, since I don't know when I'll do this next time.

Thursday 11 December 2008

The new Colibri board is almost here !

I just received ACME Systems' invoice for some of their new Actel-based boards. So I went to their website and noticed that it has been updated : ACME Colibri board

It is lighter, better and slimmer than the FoxVHDL board, as it seems that the VGA output was rarely used. The optional composite encoder seems to have been even less used, and it used some space. So the Colibri should be a bit cheaper too :-)

I don't know when I'll get mine, but I ordered both the A3P250 and A3P1000 versions.

Monday 1 December 2008

YASEP is published under the AGPL

Recently, I have finally chosen the licence for the YASEP project : it's the Affero GPL as published by the Free Software Foundation.

It is practically the same as the GNU GPL but with one interesting twist : You have to provide all the (derived) source code if you use it on a server.

For the YASEP project, it's not a problem because all the "intelligence" is provided by client-side JavaScript code, and the rest is static or dynamic HTML (not server-generated pages). However, using the AGPL is a clear sign that YASEP is not just a bunch of RTL files packed with documentation pages. It is a living, dynamic, organic set of files that interact with each other...

Also, I would like potential contributors to keep the structure of the files and directories, so the whole archive remains available to anyone visiting the sites. YASEP directly provides the link to the current archive on the main page, and I believe that this is a good thing that others will do in the future.

Concerning the VHDL source code : since the only difference between AGPL and GPL is the server clause, well, I distribute the RTL files with AGPL too. One licence to rule them all...

YASEP2009 in preparation

A new big update of the YASEP website package is under development. Several improvements are already done but not uploaded yet :

- I corrected a small "bug" with Opera with the floating window (thanks to Laura for the help !)

- I added several pages, about Special Registers, the AGPL licence, the differences between YASEP16 and YASEP32... And a new VHDL directory appeared.

- The opcodes are undergoing major changes, too many to explain here

- The architecture abandons the CQ register but the documentation is not yet updated...

I hope that the package will reach stability in early January. There's a lot of work to be done...

Friday 14 November 2008

Open Graphics board needs more preorders !

I got a nice contact with the Open Graphics project :

http://www.traversaltech.com/store.phtml

http://www.opengraphics.org

http://www.openhardwarefoundation.org

They need more than 40 preorders before they can do the first batch of boards ! With the biiiig FPGA on this, helped with the fat and fast SDRAM, it's not just a good candidate for a graphics card, it's a dream for CPU designers !! And look at those nice extension connectors...

It's going to cost roughly $1500. If I had the money, I'd buy it right away. However I'm broke AND the free Xilinx software tools don't seem to work for this large FPGA. Again, I'll have to do what I have done for years : wait...

Building momentum

A few days ago, I was contacted by a teacher from a French research laboratory (http://www.femto-st.fr/) who wants to integrate a softcore into a Xilinx FPGA. He tried to integrate a LEON into a 200k gates array without success. Knowing that I worked on F-CPU and now on something else, we started to talk. I had estimated that YASEP-16 could fit in one half of a 250kG Actel chip.

Now it seems that his student is starting to dive into the whole mess that I've accumulated on http://yasep.org :-D It's very intriguing because I intended YASEP to be a 1-man project. I'll have to slowly give up on this idea... But this is good because it can only get better : external points of view can spot inconsistencies or weak points, test assertions that I thought valid...

It's getting quite interesting now and I am even more motivated and excited ! YASEP is slowly growing, it's not just a little personal hack anymore. Until now, I was alone on board, even when http://ours-agile.org asked for the core. But other people now look deeper at the source code...

Sunday 26 October 2008

No news, good news ?

So I've been busy.

I've spent most of the summer developing YASEP and now I'm almost broke, so I'm hunting more "mundane" activities (of the kind that will maybe help feed my geekette and myself).

Fortunately, before I blew up my savings, I was able to buy toys that will be useful "in the future", like those über-sexy 4Mx36bit synchronous static SRAMs that clock at 225MHz (unfortunately, available only in BGA, but I've grown up recently)...

But seriously, I'm stuck :-(

I would LOVE to spend all my time developing YASEP, because... well it's so easy now, and everything is coming together at last ! I would just think about it, hack a bit more and get new results... Furthermore, I receive some very positive feedback now, particularly from ACME Systems. Their new FPGA board (called COLIBRI and equipped with an A3P1000) is almost ideal for a 32-bit YASEP ! I can even swap the 12ns SRAM with 8ns versions and get a boost from 66MHz to 100MHz...

As a side note, I have just found how to integrate the UMIN/UMAX instructions in the FPGA implementation, without bloating the pipeline.

So I'm OK but it could be much better, in a less material and pragmatic world :-/


PS : Not everything is bleak : after last year's breathtaking series of concerts, I jumped into Satine's new project "Satine ünder philharmonëën", which involves 40 classical musicians. I'll add LED lights all over the stage !

Look at the singer in this video : Mia wears a blinking LED earring that I designed especially for her last year. If you're interested in a custom version, don't hesitate to ask me !

I know, it's not related directly to CPU design, but the new jewel I'm preparing will have 8 cores...
