Home > Logic Design > Parallel CRC Generator

Parallel CRC Generator


Download a full version of this article


Every modern communication protocol uses one or more error detection algorithms. Cyclic Redundancy Check, or CRC, is by far the most popular one. CRC properties are defined by the generator polynomial length and coefficients. The protocol specification usually defines CRC in hex or polynomial notation. For example, CRC5 used in USB 2.0 protocol is represented as 0×5 in hex notation or as G(x)=x5+x2+1 in the polynomial. This CRC is implemented in hardware as a shift register as shown in the following picture.crc5

The problem is that in many cases shift register implementation is suboptimal. It only allows the calculation of one bit every clock. If a design has 32-bit wide datapath, meaning that every clock CRC module has to calculate CRC on 32-bit of data, this scheme will not work. Somehow this serial shift register implementation has to be converted into a parallel N-bit wide circuit, where N is the design datapath width, so that every clock N bits are processed.

I started researching the available literature on parallel CRC calculation methods and found only a handful of papers ([2], [3]) that deal with this issue. Most sources are academic and focus on the theoretical aspect of the problem. They are too impractical to implement in software or hardware for a quick code generation.

I came up with the following scheme that I’ve used to build an online Parallel CRC Generator tool. Here is a description of the steps in which I make use USB CRC5 mentioned above.    crc5-parallel

(1) Let’s denote N=data width, M=CRC width. For example, if we want to generate parallel USB CRC5 for 4-bit datapath, N=4, M=5.

(2) Implement serial CRC generator routine using given polynomial or hex notation. It’s easy to do in any programming language or script: C, Java, Perl, Verilog, etc.

(3) Parallel CRC implementation is a function of N-bit data input as well as M-bit current state CRC, as shown in the above figure. We’re going to build two matrices: Mout (next state CRC) as a function of Min(current state CRC) when N=0 and Mout as a function of Nin when M=0.

(4) Using the routine from (2) calculate CRC for the N values when Min=0. Each value is one-hot encoded, that is there is only one bit set. For N=4 the values are 0×1, 0×2, 0×4, 0×8.  Mout = F(Nin,Min=0)

(5) Build NxM matrix, Each row contains the results from (3) in increasing order. For example, 1’st row contains the result of input=0×1, 2′nd row is input=0×2, etc. The output is M-bit wide, which the desired CRC width. Here is the matrix for USB CRC5 with N=4.

matrix1

(6) Each column in this matrix, and that’s the interesting part, represents an output bit Mout[i] as a function of Nin.

(7) Using the routine from (3) calculate CRC for the M values when Nin=0. Each value is one-hot encoded, that is there is only one bit set. For M=5 the values are 0×1, 0×2, 0×4, 0×8, 0×10.  Mout = F(Nin=0,Min)

(8) Build MxM matrix, Each row contains the results from (7) in increasing order. Here is the matrix for USB CRC5 with N=4

matrix2(9) Now, build an equation for each Mout[i] bit: all Nin[j] and Min[k] bits in column [i] participate in the equation. The participating inputs are XORed together.

Mout[0] = Min[1]^Min[4]^Nin[0]^Nin[3]

Mout[1] = Min[2]^Nin[1]

Mout[2] = Min[1]^Min[3]^Min[4]^Nin[0]^Nin[2]^Nin[3]

Mout[3] = Min[2]^Min[4]^Nin[1]^Nin[3]

Mout[4] = Min[0]^Min[3]^Nin[2]

That is our parallel CRC.

I presume since the invention of the CRC algorithm more than 40 years ago, somebody has already came up with this approach. I just coulnd’t find it and “reinvented the wheel”.

Keep me posted if the CRC Generation tool works for you, or you need more clarifications on the algorithm.


[September 29th, 2010] Users frequently ask why their implementation of serial CRC doesn’t match the generated parallel CRC with the same polynomial. There are few reasons for that:
– bit order into serial CRC isn’t the same as how the data is fed into the parallel CRC
– input data bits are inverted
– LFSR is not initialized the same way. A lot of protocols initialize it with F-s, and that’s what is done in the parallel CRC.


Download a full version of this article


Error Control Coding


References

  1. CRC on Wikipedia
  2. G. Campobello, G Patane, M Russo, “Parallel CRC Realization”
  3. W Lu, S. Wong, “A Fast CRC Update Implementation”






  1. Serjik
    March 7th, 2012 at 02:26 | #1

    Hi Evgeni,
    Thank you for your good website and also good tools, keep it and grow it,
    Dear Sir, i want to use “CRC32 for 802.3″ in my application, communication between PC and FPGA(in Verilog),
    i use your CRC generator from “http://outputlogic.com/?page_id=321″ and choose “CRC32 for 802.3″, then i simulate it with this commands:

    `timescale 1ns / 1ns

    ////////////////////////////////////////////////////////
    //////////////// \ START TEST BENCH/ \\\\\\\\\\\\\\\\
    ////////////////////////////////////////////////////////

    module TB__CRC32();

    //———————————————————-
    crc U_004__CRC32(
    .data_in(Data_in),
    .crc_en(Crc_en),
    .crc_out(Crc_out),
    .rst(rst_n),
    .clk(clk)
    );

    wire [31:0]Crc_out;
    reg Crc_en;
    reg [7:0]Data_in;
    //———————————————————-
    reg clk;
    reg rst_n;

    initial // Initialize all variables
    begin: TEST_CASE

    rst_n = 0;
    clk = 0;
    Crc_en = 0;//disable
    Data_in = 8′h41;//”A”

    #10 @(negedge clk);
    rst_n = 1;
    #10 @(negedge clk);

    rst_n = 0;
    #10 @(negedge clk);

    Data_in = 8′h41;//”A”
    Crc_en = 0;//disable

    #10 @(negedge clk);
    Crc_en = 1;//enable
    #1 @(negedge clk);
    Crc_en = 0;//disable

    #50 @(negedge clk);

    #3000 @(negedge clk);
    $finish; // Terminate simulation
    end

    always #1 clk = ~clk;

    endmodule
    ////////////////////////////////////////////////////////
    ////////////////// \ END TEST BENCH/ \\\\\\\\\\\\\\\\
    ////////////////////////////////////////////////////////

    OK, after this simulation and Data_in = 8′h41;//”A”
    , Crc_out is: 32′h7e4fd274

    >>>>BUT WHEN TEST IN PC SIDE

    **1) i have my CRC32 module in C++ and my program
    gets 8bit unsigned char data type
    it calculate CRC32 with this polynomial
    **0×04C11DB7 is the official polynomial used by PKZip, WinZip and Ethernet.
    **x^32 + x^26 + x^23 + x^22 + x^16 + x^12 +
    x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
    >>>>>>> i get for (character A) 8′h41 hex value this CRC32 value: 32′hd3d99e8b <<<<<<<<>>>>>> i get for (character A) 8′h41 hex value this CRC32 value: 32′hD3D99E8B <<<<<<<<>>>>>> i get for (character A) 8′h41 hex value this CRC32 value: 32′hd3d99e8b <<<<<<<<>verilog simulation say’s 32′h7e4fd274<>other calculations
    say 32′hD3D99E8B<<, can you help me what is wrong here?
    sorry if my question is very simple, but it is a big problem for me,

    Thanks in advance,
    Best Regards Serjik.

  2. Tennirva
    March 7th, 2012 at 04:01 | #2

    Hello Evgeni, I’m posting this message to check if it will be shown immediately. Because I already posted a message before.

    Tennirva

  3. Fabien_782
    March 12th, 2012 at 07:43 | #3

    Hi everyone,

    i’m currently working on an hardware implementation of crc32 algorithm using the 2-step method, but it seems that the functions given in this paper by A. Simionescu: “CRC Tool: Computing CRC in Parallel for Ethernet,” Nobug Consulting,
    are not doing fine.

    If i’m not missunderstanding, here is he thing:

    Using a polynomial’s degree of 123, each intermediate values of CRC calculated on each clock cycles with incoming words are on 123 bits. the initial value of CRC must be (others=>’0′) using this methode. The huge advantage of this way to compute CRC is that on last word, you don’t have to pay attention on empty bytes, because either you have 0 or 1 or 2 … empty bytes, if your word is filled with ‘0′ where it’s empty, your final CRC value will be the same.
    => Much less logical elements used.

    But, what is weird with functions given on this article, is that the final value of CRC isn’t generated properly.

    Does anyone implemented these functions and did it worked fine on first try ?

    Thanks in advance,
    Fabien

  4. Fabien_782
    March 12th, 2012 at 08:48 | #4

    Here are the functions with issues:

    function [31:0] next_crc32_data122;
    input [122:0] inp;
    integer i;
    begin
    for(i=0; i<123; i=i+1)
    next_crc32_data122 = next_crc32_data1(32’h00000000, inp[122-i]);
    end
    endfunction

    the function called in the loop is:

    function [31:0] next_crc32_data1; //remainder of M(x)*x^32/P(x)
    input [31:0] crc;
    // previous CRC value
    input B;
    // input data bit (MSB first)
    begin
    next_crc32_data1 = {crc[30:0], 1’b0} ^ ({32{(crc[31]^B)}} &
    32’b00000100110000010001110110110111);
    end
    endfunction

    The first function with the loop may contain an error in it…

    Regards,
    Fabien

  5. Serjik
    April 10th, 2012 at 09:55 | #5

    Hi, if any body needs “CRC32 for 802.3″ with equal results whith this sites crc32 generator,

    http://crc32-checksum.waraxe.us/
    or
    http://www.lammertbies.nl/comm/info/crc-calculation.html

    can use crc_chk.v and crc_gen.v files at this opensource project:
    https://opencores.org/project,ethernet_tri_mode

    Regards Serjik.

  6. DG
    April 22nd, 2012 at 09:54 | #6

    hi ,site is really great especially for beginner like me.can any one tell how the reg value lfsr_q linked with crc_out and also what abt its initial value.
    thanks
    DG

  7. April 22nd, 2012 at 19:57 | #7

    Hi,

    LFSR value is the same as CRC. Its initial value depends on the protocol.

    Thanks,
    Evgeni

  8. Rean
    October 8th, 2012 at 19:40 | #8

    Hi,

    The article is really useful. Thanks alot.

    A little question. Why do we use ONE HOT values for the two matrices input ?

    Regards

  9. October 9th, 2012 at 08:47 | #9

    Because it enables any combination of the input values.

    Thanks,
    Evgeni

  10. Rean
    October 22nd, 2012 at 23:16 | #10

    Thanks for reply — (I am nearly half way to the correct understanding)

    Does it means that — it mimics the behaviour of LFSR in a sense that : There is only one input to a CRC and it matters only when it is SET (=1) — & since it is a shift register it shifts towards the output.

    @Evgeni

Comment pages
1 2 3 158
  1. No trackbacks yet.