CRC parallel calculation
I have implemented a verilog code for CRC8 parallel with help of your document “A Practical Parallel CRC Generation Method”. For this many thanks.

I do have a question regarding CRC polynomial: How can I insert the CRC polynomial (0x 107) as a parameter in my design? Is it possible?

Hi Evgeni,
I read you paper and understood the general idea of CRC parallel calculation, but not in depth. However, is it possible to implement the CRC parallel calculation using purely combinational logic (XOR arrary)? That is, if I load in the data and I can get the CRC checksum instantly. In the code generated by the website, it looks like the CRC of next block of data will depend on the CRC of previous block of data, which is not reasonable to me.
Thanks for any help.
Kevin

This is the idea behind parallel CRC. The disadvantage of the parallel CRC is that it requires more XOR gates and more levels of logic as the data goes wider.
It’s possible to implement, for example, 4096-bit CRC, and calculate it instantly on 4096-bit data. But in practice it’s implemented as 32-bit data (can be other value as well), and so it’d take 4096/32=128 clocks. That’s why there are registers on the output of the CRC to store intermediate results.

@Evgeni
OK, thanks. I am only using 8-13 bit data, so area is not a concern. If I want it calculated instantly, I just need to follow the method you described in paper and ignore the regs, right? In this case, I should just need to fix the signals which are used to be previous CRC state to all ‘1’s, right?

@Evgeni
Thank you so much. I think I figured out the problem. Initialization with ‘1′ or ‘0′ may lead to different CRC checksum, but both of them are compliant to the same polynominal. In the standard arithmetic computation, we are used to putting ‘0′ when doing the long division, that’s the only difference in practical design.

Hello. Your arcticle about CRC generator is very interesting. But replacement “for … loop” algorithm with “XOR” have advantage only for bad synthesizer like Xilinx. I use Altera Quartus for your example and two methods have the same result.
Denis

Hi Evgeni
Thanks very much for your article in outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf
I found it very helpful in starting to grasp the concepts behind the design of parallel CRC generation.
I did notice that maybe there is a typo on page 41 where you present a set of equations for Mout. I think the first line which is:
Mout[0]=Min[1]^Min[1]^Min[0]^Min[3]
should be
Mout[0]=Min[1]^Min[1]^Nin[0]^Nin[3]

Has anyone tried the CRC for the CoaxPress standard? It is a 32bit CRC based on the IEEE_802.3. But, as noted in the standard, the input bits are reversed, the output crc is reversed, etc, etc. They give an example message packet in the standard and its computed CRC, but I can’t for the life of me get their CRC in their example. The standard is here and the example is on page 35 in the Comment. As they mention in the body of the text, the 8B/10B codes of K27.7 are treated as the D27.7 which is hex FB. They don’t exactly say what the input data length is, but I have tried 8bit, 16bit and 32bit inputs to no avail. I have tried negating the output crc….no good. If anyone has had success at this, please let me know. Thanks.

Evgeni,
I’m not sure what you mean. I have been trying to recreate the example in the coaxpress standard so that I know my crc calculation code is correct. They give an example message with its calculated crc. It seems simple enough. I just can’t seem to recreate it.

Hi. Great site and really useful.
I’m trying to generate a scrambler for 100GE. I am using a 640 bit bus w/ a 57 bit LSFR 1+x^38+x^57. I am getting the following error when it completes:

Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

The time it takes to generate scrambler code grows quadratically with the number of data bits. Therefore, it’s quite possible that server times out. There is a command-line utility to generate CRC with wide data to overcome this kind of a problem. But unfortunately, I haven’t made public such a utility to generate scrambler code.

In scrambler when we are giving input as all 0’s.then the generator giving output as all 0’s. but as per the scrambler defination long sequences can be changed with 0’s and 1’s.

But in above case it is not effecting with all 0’s input.

Can you please explain why it is not changing output with all 0’s input.

Thanks for your response,
As per your comments i tried with scrambler initial value with all 1’s and and scrambler input is all 0’s. then also its giving all 0’s output.because it it xor’ed with x^39+x^58(1 xor 1 =0) ,so 0(taps xor’ed) xor 0(input) output is 0. so it is not effecting by initial value(if the initial value like 10101010..seguence then it will effect).

And iam using self synchronous scrambler(10gbase-R IEEE 802.3-2012_section4.pdf 49.2.6 scrambler)

Seeding with all-1’s is just one option to implement a scrambler. Ethernet specification you’re referring to implements it differently to avoid the problem with long 0 sequences. The specification requires new scrambler seed for each packet. In addition, the data is NRZ and differentially encoded.

Hi Evgeni, I want to understand how I can design a circuit, and variants of it with different polynomials, similar to the one your software produced ( scrambler module for data[31:0], lfsr[22:0]=1+x^2+x^5+x^8+x^16+x^21+x^23;); what paper or textbook would be the place to find the information? I want to understand the details of the math and algorithms that your software implemented, so that I can understnad and design the circuits myself.

Hi Evgeni,
I was going through your paper “A Practical Parallel CRC Generation Method” at http://outputlogic.com/my-stuff/parallel_crc_generator_whitepaper.pdf
and saw there is a typo in equation
Mout[0]= Min[1] ^ Min[4] ^ Min[0] ^ Min[3] at page 43
instead the equation should come out to be
Mout[0]= Min[1] ^ Min[4] ^ Nin[0] ^ Nin[3]

Thank you so much for the wonderful tool. It’s really useful.

I generated a verilog code for 16-bit CRC with polynomial X^16 + X^12 + X^5 + 1. The input data width is 16-bit. It seems that, in the verilog code, the input data is XORed with shift register bit by bit first, then the shift register shifts for 16 time, resulting the new shift register values for next cycle. Is this the standard generation process? I also see some methods that the input data is fed into the shift register bit by bit during the shifting operation. The results from two methods are different. I get confused which one I should use. If I use one method, and the receiver uses another, will it work correctly?

Hi Evgeni,
I just implemented your code to calculate CRC-6. I had the following question : Since the Parallel CRC generation method uses a serial CRC generator(called recursively) to calculate the H1 and H2 tables values which it then uses to come up with a refined parallel implementation, if I wanted to calculate the CRC register with the initial values set to all 0’s instead of 1’s as we are doing, can I just do so by setting the initial lfsr value to all 0’s during reset or would i need to calculate the table values again aswell and then come up with a XOR implementation using that?

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please. Thanks in advance

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please.

Many thanks for your work on this topic – but I need you to check it for me.

I put your generated (VHDL) in SIMULINK for the polynomial X^7 + X^6 + 1 (which is used on SONET SDH) for 4 bit data,
I take it with additive (side) PRBS scramblers, the scrambler and the descrambler are the same circuit, unlike in multiplicative scramblers where the data is fed in and out at opposite ends of the delay chain.

In which case I put a scrambler in series with a descrambler, i.e. 2x the same implementation of your VHDL
but only 1 bit worked…the other 3 remained scrambled.

I can implement and try the VHDL, but that will take me a bit longer, I just need a create a test-bench to drive it.
I can send you a picture of the SIMULINK Circuit I constructed as an email attachment for you to see what I have done.
or even the SIMULINK .SLX file in case you have SIMULINK or maybe a free SIMULINK viewer ?
It appears to be quite simple in SIMULINK to construct a circuit mode from your VHDL, (i.e. I think my circuit is right)
which is why I am wondering how these parallel forms of the polynomial are generated….?
Is there a book or paper or standard maths that describes changing the LFSR to a parallel implantation.

I don’t understand how you generate the terms for a parallel scrambler architecture,
it doesn’t seem to match what I see in a text=book “high speed serdes devices and applications”, page 142.
Actually yours looks more correct, theirs looks over simplified.

IF you engage with me on this, I will buy your book and our company could offer you some consultancy.
We are using MATLAB/SIMULINK for a software-Defined-Radio project. the Scrambler block is not supported
for auto HDL-coding I simlink, so we’re creating the scrambler as a circuits of XORs and D types that will code.

I can get additive (Fibonacci) and Galois forms working for bit-serial LFSRs,
but so far, no joy in turning things into parallel.

Also, I don’t quite know why I can’t just split the input stream (say 1 byte wide) into 8 bit-serial scramblers
and the recombine it in the same that way post the descramblers.
It perhaps uses more hardware resource, but overall, not much hardware resource, either way…????
If I did use multiple bit-serial scramblers, my XOR gates reduce in fan out and depth,
allowing faster operation..

Best Regards,
Mike Brewin
Design Engineer, ECS, England.

Hi Dmitry,

Yes, there is a typo.

Swapping bits in 0×6522df69 will get 0×9add2096

Swizzling all bits in 0×9add2096 will get 0×6904bb59

Thanks,

Evgeni

CRC parallel calculation

I have implemented a verilog code for CRC8 parallel with help of your document “A Practical Parallel CRC Generation Method”. For this many thanks.

I do have a question regarding CRC polynomial: How can I insert the CRC polynomial (0x 107) as a parameter in my design? Is it possible?

Thank you

Jamil

@Jamil

Hi Jamil,

One way to parametrize CRC polynomial is to do bitwise logic AND with the taps, for example, using “for” loop in Verilog.

Thanks,

Evgeni

Hi Evgeni,

I read you paper and understood the general idea of CRC parallel calculation, but not in depth. However, is it possible to implement the CRC parallel calculation using purely combinational logic (XOR arrary)? That is, if I load in the data and I can get the CRC checksum instantly. In the code generated by the website, it looks like the CRC of next block of data will depend on the CRC of previous block of data, which is not reasonable to me.

Thanks for any help.

Kevin

Hi Kevin,

This is the idea behind parallel CRC. The disadvantage of the parallel CRC is that it requires more XOR gates and more levels of logic as the data goes wider.

It’s possible to implement, for example, 4096-bit CRC, and calculate it instantly on 4096-bit data. But in practice it’s implemented as 32-bit data (can be other value as well), and so it’d take 4096/32=128 clocks. That’s why there are registers on the output of the CRC to store intermediate results.

Thanks,

Evgeni

@Evgeni

OK, thanks. I am only using 8-13 bit data, so area is not a concern. If I want it calculated instantly, I just need to follow the method you described in paper and ignore the regs, right? In this case, I should just need to fix the signals which are used to be previous CRC state to all ‘1’s, right?

Hi Kevin,

That’s right. Most of the CRC specifications use ‘1’s to initialize the CRC. But strictly speaking, it doesn’t need to be the case.

Thanks,

Evgeni

@Evgeni

Thank you so much. I think I figured out the problem. Initialization with ‘1′ or ‘0′ may lead to different CRC checksum, but both of them are compliant to the same polynominal. In the standard arithmetic computation, we are used to putting ‘0′ when doing the long division, that’s the only difference in practical design.

Hello. Your arcticle about CRC generator is very interesting. But replacement “for … loop” algorithm with “XOR” have advantage only for bad synthesizer like Xilinx. I use Altera Quartus for your example and two methods have the same result.

Denis

Hi Evgeni

Thanks very much for your article in outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf

I found it very helpful in starting to grasp the concepts behind the design of parallel CRC generation.

I did notice that maybe there is a typo on page 41 where you present a set of equations for Mout. I think the first line which is:

Mout[0]=Min[1]^Min[1]^Min[0]^Min[3]

should be

Mout[0]=Min[1]^Min[1]^Nin[0]^Nin[3]

Thanks again,

Terry

Hi Terry,

That’s right, there is a typo. It was found a long time ago, here: http://outputlogic.com/?p=158#comment-8235

Thanks,

Evgeni

Has anyone tried the CRC for the CoaxPress standard? It is a 32bit CRC based on the IEEE_802.3. But, as noted in the standard, the input bits are reversed, the output crc is reversed, etc, etc. They give an example message packet in the standard and its computed CRC, but I can’t for the life of me get their CRC in their example. The standard is here and the example is on page 35 in the Comment. As they mention in the body of the text, the 8B/10B codes of K27.7 are treated as the D27.7 which is hex FB. They don’t exactly say what the input data length is, but I have tried 8bit, 16bit and 32bit inputs to no avail. I have tried negating the output crc….no good. If anyone has had success at this, please let me know. Thanks.

Have you tried 1-bit data, all-zeros?

Evgeni,

I’m not sure what you mean. I have been trying to recreate the example in the coaxpress standard so that I know my crc calculation code is correct. They give an example message with its calculated crc. It seems simple enough. I just can’t seem to recreate it.

I found my error. All is right with the world again.

Hi. Great site and really useful.

I’m trying to generate a scrambler for 100GE. I am using a 640 bit bus w/ a 57 bit LSFR 1+x^38+x^57. I am getting the following error when it completes:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Thanks,

Frank

Hi Frank,

The time it takes to generate scrambler code grows quadratically with the number of data bits. Therefore, it’s quite possible that server times out. There is a command-line utility to generate CRC with wide data to overcome this kind of a problem. But unfortunately, I haven’t made public such a utility to generate scrambler code.

Thanks,

Evgeni

Hi

In scrambler when we are giving input as all 0’s.then the generator giving output as all 0’s. but as per the scrambler defination long sequences can be changed with 0’s and 1’s.

But in above case it is not effecting with all 0’s input.

Can you please explain why it is not changing output with all 0’s input.

Thanks,

Shravan kumar

Hi Evgeni,

Thanks for your response,

As per your comments i tried with scrambler initial value with all 1’s and and scrambler input is all 0’s. then also its giving all 0’s output.because it it xor’ed with x^39+x^58(1 xor 1 =0) ,so 0(taps xor’ed) xor 0(input) output is 0. so it is not effecting by initial value(if the initial value like 10101010..seguence then it will effect).

And iam using self synchronous scrambler(10gbase-R IEEE 802.3-2012_section4.pdf 49.2.6 scrambler)

So can you please refer the above document.

Thanks,

Shravan kumar

Hi Shravan,

Seeding with all-1’s is just one option to implement a scrambler. Ethernet specification you’re referring to implements it differently to avoid the problem with long 0 sequences. The specification requires new scrambler seed for each packet. In addition, the data is NRZ and differentially encoded.

Thanks,

Evgeni

Hi Evgeni, I just wanted to thank you for this wonderful website. You totally rock!

Best,

John

Hi Evgeni, I want to understand how I can design a circuit, and variants of it with different polynomials, similar to the one your software produced ( scrambler module for data[31:0], lfsr[22:0]=1+x^2+x^5+x^8+x^16+x^21+x^23;); what paper or textbook would be the place to find the information? I want to understand the details of the math and algorithms that your software implemented, so that I can understnad and design the circuits myself.

Thanks,

John

Hi John,

You can start with this article:

http://outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf

It’s about CRC, but math used in scramblers is similar.

Thanks,

Evgeni

Hi Evgeni,

I was going through your paper “A Practical Parallel CRC Generation Method” at http://outputlogic.com/my-stuff/parallel_crc_generator_whitepaper.pdf

and saw there is a typo in equation

Mout[0]= Min[1] ^ Min[4] ^ Min[0] ^ Min[3] at page 43

instead the equation should come out to be

Mout[0]= Min[1] ^ Min[4] ^ Nin[0] ^ Nin[3]

Can you please have a look at it.

Thanks,

sanjay

Hi Sanjay,

Yes, it’s a known typo.

Thanks,

Evgeni

Hi Evgeni,

Thank you so much for the wonderful tool. It’s really useful.

I generated a verilog code for 16-bit CRC with polynomial X^16 + X^12 + X^5 + 1. The input data width is 16-bit. It seems that, in the verilog code, the input data is XORed with shift register bit by bit first, then the shift register shifts for 16 time, resulting the new shift register values for next cycle. Is this the standard generation process? I also see some methods that the input data is fed into the shift register bit by bit during the shifting operation. The results from two methods are different. I get confused which one I should use. If I use one method, and the receiver uses another, will it work correctly?

Thanks,

Fenghua

Hi Evgeni,

I figured it out now. It seems that MSB first and LSB first will give different results.

Thanks.

Fenghua

Hi Evgeni,

I just implemented your code to calculate CRC-6. I had the following question : Since the Parallel CRC generation method uses a serial CRC generator(called recursively) to calculate the H1 and H2 tables values which it then uses to come up with a refined parallel implementation, if I wanted to calculate the CRC register with the initial values set to all 0’s instead of 1’s as we are doing, can I just do so by setting the initial lfsr value to all 0’s during reset or would i need to calculate the table values again aswell and then come up with a XOR implementation using that?

Hi Daniyal,

H1 and H2 matrices are independent of the initial value of CRC.

Thanks,

Evgeni

how we can generate pseudo – randonm number of length 79 by using lfsr

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please. Thanks in advance

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please.

Hi Evgeni,

Many thanks for your work on this topic – but I need you to check it for me.

I put your generated (VHDL) in SIMULINK for the polynomial X^7 + X^6 + 1 (which is used on SONET SDH) for 4 bit data,

I take it with additive (side) PRBS scramblers, the scrambler and the descrambler are the same circuit, unlike in multiplicative scramblers where the data is fed in and out at opposite ends of the delay chain.

In which case I put a scrambler in series with a descrambler, i.e. 2x the same implementation of your VHDL

but only 1 bit worked…the other 3 remained scrambled.

I can implement and try the VHDL, but that will take me a bit longer, I just need a create a test-bench to drive it.

I can send you a picture of the SIMULINK Circuit I constructed as an email attachment for you to see what I have done.

or even the SIMULINK .SLX file in case you have SIMULINK or maybe a free SIMULINK viewer ?

It appears to be quite simple in SIMULINK to construct a circuit mode from your VHDL, (i.e. I think my circuit is right)

which is why I am wondering how these parallel forms of the polynomial are generated….?

Is there a book or paper or standard maths that describes changing the LFSR to a parallel implantation.

I don’t understand how you generate the terms for a parallel scrambler architecture,

it doesn’t seem to match what I see in a text=book “high speed serdes devices and applications”, page 142.

Actually yours looks more correct, theirs looks over simplified.

See the following link and arrow to page 142.

https://books.google.co.uk/books?id=Cx3r0H-4AhEC&pg=PA141&lpg=PA141&dq=scrambler+and+descramblers&source=bl&ots=voSExBbg_H&sig=ACfU3U1SXzU09_1AmbAsEP-lF2ZZ90NDdQ&hl=en&sa=X&ved=2ahUKEwjwx7Kx64HgAhXdTBUIHchPC8Q4FBDoATADegQIBRAB#v=onepage&q=scrambler%20and%20descramblers&f=false

IF you engage with me on this, I will buy your book and our company could offer you some consultancy.

We are using MATLAB/SIMULINK for a software-Defined-Radio project. the Scrambler block is not supported

for auto HDL-coding I simlink, so we’re creating the scrambler as a circuits of XORs and D types that will code.

I can get additive (Fibonacci) and Galois forms working for bit-serial LFSRs,

but so far, no joy in turning things into parallel.

Also, I don’t quite know why I can’t just split the input stream (say 1 byte wide) into 8 bit-serial scramblers

and the recombine it in the same that way post the descramblers.

It perhaps uses more hardware resource, but overall, not much hardware resource, either way…????

If I did use multiple bit-serial scramblers, my XOR gates reduce in fan out and depth,

allowing faster operation..

Best Regards,

Mike Brewin

Design Engineer, ECS, England.

Hi Mike,

I’m going to reply to your email.

Thanks,

Evgeni