Every modern communication protocol uses one or more error detection algorithms. Cyclic Redundancy Check, or CRC, is by far the most popular one. CRC properties are defined by the generator polynomial length and coefficients. The protocol specification usually defines CRC in hex or polynomial notation. For example, CRC5 used in USB 2.0 protocol is represented as 0x5 in hex notation or as G(x)=x5+x2+1 in the polynomial. This CRC is implemented in hardware as a shift register as shown in the following picture.
The problem is that in many cases shift register implementation is suboptimal. It only allows the calculation of one bit every clock. If a design has 32-bit wide datapath, meaning that every clock CRC module has to calculate CRC on 32-bit of data, this scheme will not work. Somehow this serial shift register implementation has to be converted into a parallel N-bit wide circuit, where N is the design datapath width, so that every clock N bits are processed.
I started researching the available literature on parallel CRC calculation methods and found only a handful of papers ([2], [3]) that deal with this issue. Most sources are academic and focus on the theoretical aspect of the problem. They are too impractical to implement in software or hardware for a quick code generation.
I came up with the following scheme that I’ve used to build an online Parallel CRC Generator tool. Here is a description of the steps in which I make use USB CRC5 mentioned above.
(1) Let’s denote N=data width, M=CRC width. For example, if we want to generate parallel USB CRC5 for 4-bit datapath, N=4, M=5.
(2) Implement serial CRC generator routine using given polynomial or hex notation. It’s easy to do in any programming language or script: C, Java, Perl, Verilog, etc.
(3) Parallel CRC implementation is a function of N-bit data input as well as M-bit current state CRC, as shown in the above figure. We’re going to build two matrices: Mout (next state CRC) as a function of Min(current state CRC) when N=0 and Mout as a function of Nin when M=0.
(4) Using the routine from (2) calculate CRC for the N values when Min=0. Each value is one-hot encoded, that is there is only one bit set. For N=4 the values are 0x1, 0x2, 0x4, 0x8. Mout = F(Nin,Min=0)
(5) Build NxM matrix, Each row contains the results from (3) in increasing order. For example, 1’st row contains the result of input=0x1, 2’nd row is input=0x2, etc. The output is M-bit wide, which the desired CRC width. Here is the matrix for USB CRC5 with N=4.
(6) Each column in this matrix, and that’s the interesting part, represents an output bit Mout[i] as a function of Nin.
(7) Using the routine from (3) calculate CRC for the M values when Nin=0. Each value is one-hot encoded, that is there is only one bit set. For M=5 the values are 0x1, 0x2, 0x4, 0x8, 0x10. Mout = F(Nin=0,Min)
(8) Build MxM matrix, Each row contains the results from (7) in increasing order. Here is the matrix for USB CRC5 with N=4
(9) Now, build an equation for each Mout[i] bit: all Nin[j] and Min[k] bits in column [i] participate in the equation. The participating inputs are XORed together.
Mout[0] = Min[1]^Min[4]^Nin[0]^Nin[3]
Mout[1] = Min[2]^Nin[1]
Mout[2] = Min[1]^Min[3]^Min[4]^Nin[0]^Nin[2]^Nin[3]
Mout[3] = Min[2]^Min[4]^Nin[1]^Nin[3]
Mout[4] = Min[0]^Min[3]^Nin[2]
That is our parallel CRC.
I presume since the invention of the CRC algorithm more than 40 years ago, somebody has already came up with this approach. I just coulnd’t find it and “reinvented the wheel”.
Keep me posted if the CRC Generation tool works for you, or you need more clarifications on the algorithm.
[September 29th, 2010] Users frequently ask why their implementation of serial CRC doesn’t match the generated parallel CRC with the same polynomial. There are few reasons for that:
– bit order into serial CRC isn’t the same as how the data is fed into the parallel CRC
– input data bits are inverted
– LFSR is not initialized the same way. A lot of protocols initialize it with F-s, and that’s what is done in the parallel CRC.
References
- CRC on Wikipedia
- G. Campobello, G Patane, M Russo, “Parallel CRC Realization”
- W Lu, S. Wong, “A Fast CRC Update Implementation”