OutputLogic.com

LFSR Counters – Part 3

May 11th, 2009 Evgeni 4 comments

Here is how the LFSR Counter Generator works:

(1) Specify counter value, e.g. 200. It’s 8 bits, so the tool selects 8-bit LFSR with polynomial coefficients taken from the table in [1].

(2) Reset LFSR to 0, run a loop that shifts the LFSR 200 times. Then latch its value (LFSR_COUNT_VAL).

(3) Use that 8-bit LFSR and LFSR_COUNT_VAL to generate a Verilog code. When the LFSR hits LFSR_COUNT_VAL, it counted 200.

This approach is working because the polynomial selected in (1) has a maximum-length property. That is it generates a sequence of unique values from 0 to 2ⁿ-1.

I synthesized a 32-bit LFSR counter for Xilinx Virtex5 chip and compared its size with a regular 32-bit counter.

Here are the results:

Module	Slices	Regs	LUTs
regular_counter	17	32	44
lfsr_counter	10	32	7

References

Peter Alfke, Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators, Xilinx application note Xapp052

Tags: fpga, lfsr counter, verilog, vhdl

Because generating the code for LFSR counter is a computation- and memory-intensive operation, and it’s running on a server, the server usually times out after the counter value exceeds ~22 bits. I rearchitected the tool in such a way that if requested counter size is greater than 20 bit, it’s sent to a server in chunks ot 20 bits. To implement that I was using a standard AJAX approach: XmlHTPRequest and callback. That also allowed me to put a progress bar – I used jsProgressBarHandler from Bram.us.

Still, it’s a quite slow operation, so I limited the LFSR Counter size to 31 bit for practical purposes. There is no fundamental problem with that. It can be as large as 168 bit, it’d just take forever to complete.

I also created a stand-alone application that can generate large LFSR counters faster. Download it from SourceForge.

Tags: fpga, lfsr counter, verilog, vhdl

Parallel CRC Generator

May 5th, 2009 Evgeni 218 comments

Download a full version of this article

Every modern communication protocol uses one or more error detection algorithms. Cyclic Redundancy Check, or CRC, is by far the most popular one. CRC properties are defined by the generator polynomial length and coefficients. The protocol specification usually defines CRC in hex or polynomial notation. For example, CRC5 used in USB 2.0 protocol is represented as 0x5 in hex notation or as G(x)=x⁵+x²+1 in the polynomial. This CRC is implemented in hardware as a shift register as shown in the following picture. crc5

The problem is that in many cases shift register implementation is suboptimal. It only allows the calculation of one bit every clock. If a design has 32-bit wide datapath, meaning that every clock CRC module has to calculate CRC on 32-bit of data, this scheme will not work. Somehow this serial shift register implementation has to be converted into a parallel N-bit wide circuit, where N is the design datapath width, so that every clock N bits are processed.

I started researching the available literature on parallel CRC calculation methods and found only a handful of papers ([2], [3]) that deal with this issue. Most sources are academic and focus on the theoretical aspect of the problem. They are too impractical to implement in software or hardware for a quick code generation.

I came up with the following scheme that I’ve used to build an online Parallel CRC Generator tool. Here is a description of the steps in which I make use USB CRC5 mentioned above. crc5-parallel

(1) Let’s denote N=data width, M=CRC width. For example, if we want to generate parallel USB CRC5 for 4-bit datapath, N=4, M=5.

(2) Implement serial CRC generator routine using given polynomial or hex notation. It’s easy to do in any programming language or script: C, Java, Perl, Verilog, etc.

(3) Parallel CRC implementation is a function of N-bit data input as well as M-bit current state CRC, as shown in the above figure. We’re going to build two matrices: Mout (next state CRC) as a function of Min(current state CRC) when N=0 and Mout as a function of Nin when M=0.

(4) Using the routine from (2) calculate CRC for the N values when Min=0. Each value is one-hot encoded, that is there is only one bit set. For N=4 the values are 0x1, 0x2, 0x4, 0x8. Mout = F(Nin,Min=0)

(5) Build NxM matrix, Each row contains the results from (3) in increasing order. For example, 1’st row contains the result of input=0x1, 2’nd row is input=0x2, etc. The output is M-bit wide, which the desired CRC width. Here is the matrix for USB CRC5 with N=4.

matrix1

(6) Each column in this matrix, and that’s the interesting part, represents an output bit Mout[i] as a function of Nin.

(7) Using the routine from (3) calculate CRC for the M values when Nin=0. Each value is one-hot encoded, that is there is only one bit set. For M=5 the values are 0x1, 0x2, 0x4, 0x8, 0x10. Mout = F(Nin=0,Min)

(8) Build MxM matrix, Each row contains the results from (7) in increasing order. Here is the matrix for USB CRC5 with N=4

matrix2 (9) Now, build an equation for each Mout[i] bit: all Nin[j] and Min[k] bits in column [i] participate in the equation. The participating inputs are XORed together.

Mout[0] = Min[1]^Min[4]^Nin[0]^Nin[3]

Mout[1] = Min[2]^Nin[1]

Mout[2] = Min[1]^Min[3]^Min[4]^Nin[0]^Nin[2]^Nin[3]

Mout[3] = Min[2]^Min[4]^Nin[1]^Nin[3]

Mout[4] = Min[0]^Min[3]^Nin[2]

That is our parallel CRC.

I presume since the invention of the CRC algorithm more than 40 years ago, somebody has already came up with this approach. I just coulnd’t find it and “reinvented the wheel”.

Keep me posted if the CRC Generation tool works for you, or you need more clarifications on the algorithm.

[September 29th, 2010] Users frequently ask why their implementation of serial CRC doesn’t match the generated parallel CRC with the same polynomial. There are few reasons for that:
– bit order into serial CRC isn’t the same as how the data is fed into the parallel CRC
– input data bits are inverted
– LFSR is not initialized the same way. A lot of protocols initialize it with F-s, and that’s what is done in the parallel CRC.

Download a full version of this article

References

CRC on Wikipedia
G. Campobello, G Patane, M Russo, “Parallel CRC Realization”
W Lu, S. Wong, “A Fast CRC Update Implementation”

Tags: parallel crc, verilog, vhdl, Xilinx

LFSR Counters

May 4th, 2009 Evgeni 37 comments

Most of the EE or CS graduates know or at least have heard about different types of hardware counters: prescaled, Johnson, ripple carry, linear feedback shift register (LFSR), and others.
The majority of logic designers use the first two types, because they’re simple to implement in Verilog or VHDL. However, for some applications LFSR counters offer a significant advantage in terms of logic utilization and maximum frequency.
The other day I run into Xilinx LFSR Counter core and decided to explore its advantages. I was so impressed with its area saving comparing with regular counters that I decided to write an online tool that generates a Verilog code for an LFSR counter of an arbitrary value.
This LFSR Counter Generator tool is running on the server. The time it takes to generate the code depends exponentially on the counter size. It takes several seconds to generate a 20-bit counter. But bigger counters cause the server to timeout with the current tool implementation.
I’m planning to tweak the implementation to be able to generate counters up to ~30 bits. More than that would take too long no matter what approach is taken.

Please post you comments about the experience with the tool, features you’d like to add, and the issues you’ve seen.

References:

Peter Alfke, Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators,
Xilinx application note Xapp052
Maria George and Peter Alfke, Linear Feedback Shift Registers in Virtex Devices, Xilinx application note Xapp210
Xilinx Linear Feedback Shift Register (LFSR) Logic Core

Tags: lfsr counter, verilog, vhdl, Xilinx

Welcome to OutputLogic.com

May 2nd, 2009 Evgeni 7 comments

Juggling between my full time job as a logic designer and my personal life I found some time to bootstrap OutputLogic.com. I wanted to share some of the tools and ideas I’ve accumulated over the years and now it’s a good time to do so.

An important design choice that I’ve made was to have those tools entirely web-based. In my opinion, it’s so much more convenient and user-friendly than having lots of scattered applications, written in different languages, for different operating systems, and with inconsistent user interfaces.

It looks like a web browser is becoming the focal point of the user interaction with the computer. More and more high-quality web application are showing up and getting a widespread adoption. Just to mention a few: Gmail, Google Maps, Google Documents. Just a few years ago you’d use a standalone application for sending an email, finding a direction, or creating a spreadsheet.

But writing a decent web-based application requires a set of very specialized skills that takes time to acquire and master. It’s no more just cranking an HTML code and peppering it with some JavaScript. One needs to be familiar with half a dozen scripting languages, several application frameworks, databases. For lots of people it is a full-time job.
Nevertheless, another design choice was to develop everything myself. I knew it’d be quite a challenge, but that’s exactly what makes the process so fun.

I don’t have much experience doing web design, which means that implementing things that I want and how I want takes a lot more time than it should. Things like getting around JavaScript quirks, figuring out why GCI doesn’t work, debugging Perl scripts, reverse-engineering piles of PHP code, finding the right framework for the site and development tools to work with, and many others. That’s the disadvantage: the learning curve.

Sometimes it’s hard to implement a low-level feature with a high-level language, which is not designed for that task. On the other hand, it’s often so much easier to design a piece of user interface to be displayed in a browser with just a few lines of script rather than writing it almost from scratch in Java, C++, or whatever language is used for standalone applications.

In any case, that was a short introduction. Thanks for taking you time reading this post. Your valuable input to the OutputLogic.com is greatly appreciated.

Tags: fpga, lfsr counter, parallel crc generator, parallel scrambler generator, verilog, vhdl

Archive

LFSR Counters – Part 3

LFSR Counters – Part 2

Parallel CRC Generator

LFSR Counters

Welcome to OutputLogic.com

Posts