CRC parallel calculation
I have implemented a verilog code for CRC8 parallel with help of your document “A Practical Parallel CRC Generation Method”. For this many thanks.

I do have a question regarding CRC polynomial: How can I insert the CRC polynomial (0x 107) as a parameter in my design? Is it possible?

Hi Evgeni,
I read you paper and understood the general idea of CRC parallel calculation, but not in depth. However, is it possible to implement the CRC parallel calculation using purely combinational logic (XOR arrary)? That is, if I load in the data and I can get the CRC checksum instantly. In the code generated by the website, it looks like the CRC of next block of data will depend on the CRC of previous block of data, which is not reasonable to me.
Thanks for any help.
Kevin

This is the idea behind parallel CRC. The disadvantage of the parallel CRC is that it requires more XOR gates and more levels of logic as the data goes wider.
It’s possible to implement, for example, 4096-bit CRC, and calculate it instantly on 4096-bit data. But in practice it’s implemented as 32-bit data (can be other value as well), and so it’d take 4096/32=128 clocks. That’s why there are registers on the output of the CRC to store intermediate results.

@Evgeni
OK, thanks. I am only using 8-13 bit data, so area is not a concern. If I want it calculated instantly, I just need to follow the method you described in paper and ignore the regs, right? In this case, I should just need to fix the signals which are used to be previous CRC state to all ‘1’s, right?

@Evgeni
Thank you so much. I think I figured out the problem. Initialization with ‘1’ or ‘0’ may lead to different CRC checksum, but both of them are compliant to the same polynominal. In the standard arithmetic computation, we are used to putting ‘0’ when doing the long division, that’s the only difference in practical design.

Hello. Your arcticle about CRC generator is very interesting. But replacement “for … loop” algorithm with “XOR” have advantage only for bad synthesizer like Xilinx. I use Altera Quartus for your example and two methods have the same result.
Denis

Hi Evgeni
Thanks very much for your article in outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf
I found it very helpful in starting to grasp the concepts behind the design of parallel CRC generation.
I did notice that maybe there is a typo on page 41 where you present a set of equations for Mout. I think the first line which is:
Mout[0]=Min[1]^Min[1]^Min[0]^Min[3]
should be
Mout[0]=Min[1]^Min[1]^Nin[0]^Nin[3]

Has anyone tried the CRC for the CoaxPress standard? It is a 32bit CRC based on the IEEE_802.3. But, as noted in the standard, the input bits are reversed, the output crc is reversed, etc, etc. They give an example message packet in the standard and its computed CRC, but I can’t for the life of me get their CRC in their example. The standard is here and the example is on page 35 in the Comment. As they mention in the body of the text, the 8B/10B codes of K27.7 are treated as the D27.7 which is hex FB. They don’t exactly say what the input data length is, but I have tried 8bit, 16bit and 32bit inputs to no avail. I have tried negating the output crc….no good. If anyone has had success at this, please let me know. Thanks.

Evgeni,
I’m not sure what you mean. I have been trying to recreate the example in the coaxpress standard so that I know my crc calculation code is correct. They give an example message with its calculated crc. It seems simple enough. I just can’t seem to recreate it.

Hi. Great site and really useful.
I’m trying to generate a scrambler for 100GE. I am using a 640 bit bus w/ a 57 bit LSFR 1+x^38+x^57. I am getting the following error when it completes:

Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

The time it takes to generate scrambler code grows quadratically with the number of data bits. Therefore, it’s quite possible that server times out. There is a command-line utility to generate CRC with wide data to overcome this kind of a problem. But unfortunately, I haven’t made public such a utility to generate scrambler code.

In scrambler when we are giving input as all 0’s.then the generator giving output as all 0’s. but as per the scrambler defination long sequences can be changed with 0’s and 1’s.

But in above case it is not effecting with all 0’s input.

Can you please explain why it is not changing output with all 0’s input.

Thanks for your response,
As per your comments i tried with scrambler initial value with all 1’s and and scrambler input is all 0’s. then also its giving all 0’s output.because it it xor’ed with x^39+x^58(1 xor 1 =0) ,so 0(taps xor’ed) xor 0(input) output is 0. so it is not effecting by initial value(if the initial value like 10101010..seguence then it will effect).

And iam using self synchronous scrambler(10gbase-R IEEE 802.3-2012_section4.pdf 49.2.6 scrambler)

Seeding with all-1’s is just one option to implement a scrambler. Ethernet specification you’re referring to implements it differently to avoid the problem with long 0 sequences. The specification requires new scrambler seed for each packet. In addition, the data is NRZ and differentially encoded.

Hi Evgeni, I want to understand how I can design a circuit, and variants of it with different polynomials, similar to the one your software produced ( scrambler module for data[31:0], lfsr[22:0]=1+x^2+x^5+x^8+x^16+x^21+x^23;); what paper or textbook would be the place to find the information? I want to understand the details of the math and algorithms that your software implemented, so that I can understnad and design the circuits myself.

Hi Evgeni,
I was going through your paper “A Practical Parallel CRC Generation Method” at http://outputlogic.com/my-stuff/parallel_crc_generator_whitepaper.pdf
and saw there is a typo in equation
Mout[0]= Min[1] ^ Min[4] ^ Min[0] ^ Min[3] at page 43
instead the equation should come out to be
Mout[0]= Min[1] ^ Min[4] ^ Nin[0] ^ Nin[3]

Thank you so much for the wonderful tool. It’s really useful.

I generated a verilog code for 16-bit CRC with polynomial X^16 + X^12 + X^5 + 1. The input data width is 16-bit. It seems that, in the verilog code, the input data is XORed with shift register bit by bit first, then the shift register shifts for 16 time, resulting the new shift register values for next cycle. Is this the standard generation process? I also see some methods that the input data is fed into the shift register bit by bit during the shifting operation. The results from two methods are different. I get confused which one I should use. If I use one method, and the receiver uses another, will it work correctly?

Hi Evgeni,
I just implemented your code to calculate CRC-6. I had the following question : Since the Parallel CRC generation method uses a serial CRC generator(called recursively) to calculate the H1 and H2 tables values which it then uses to come up with a refined parallel implementation, if I wanted to calculate the CRC register with the initial values set to all 0’s instead of 1’s as we are doing, can I just do so by setting the initial lfsr value to all 0’s during reset or would i need to calculate the table values again aswell and then come up with a XOR implementation using that?

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please. Thanks in advance

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please.

Many thanks for your work on this topic – but I need you to check it for me.

I put your generated (VHDL) in SIMULINK for the polynomial X^7 + X^6 + 1 (which is used on SONET SDH) for 4 bit data,
I take it with additive (side) PRBS scramblers, the scrambler and the descrambler are the same circuit, unlike in multiplicative scramblers where the data is fed in and out at opposite ends of the delay chain.

In which case I put a scrambler in series with a descrambler, i.e. 2x the same implementation of your VHDL
but only 1 bit worked…the other 3 remained scrambled.

I can implement and try the VHDL, but that will take me a bit longer, I just need a create a test-bench to drive it.
I can send you a picture of the SIMULINK Circuit I constructed as an email attachment for you to see what I have done.
or even the SIMULINK .SLX file in case you have SIMULINK or maybe a free SIMULINK viewer ?
It appears to be quite simple in SIMULINK to construct a circuit mode from your VHDL, (i.e. I think my circuit is right)
which is why I am wondering how these parallel forms of the polynomial are generated….?
Is there a book or paper or standard maths that describes changing the LFSR to a parallel implantation.

I don’t understand how you generate the terms for a parallel scrambler architecture,
it doesn’t seem to match what I see in a text=book “high speed serdes devices and applications”, page 142.
Actually yours looks more correct, theirs looks over simplified.

IF you engage with me on this, I will buy your book 🙂 and our company could offer you some consultancy.
We are using MATLAB/SIMULINK for a software-Defined-Radio project. the Scrambler block is not supported
for auto HDL-coding I simlink, so we’re creating the scrambler as a circuits of XORs and D types that will code.

I can get additive (Fibonacci) and Galois forms working for bit-serial LFSRs,
but so far, no joy in turning things into parallel.

Also, I don’t quite know why I can’t just split the input stream (say 1 byte wide) into 8 bit-serial scramblers
and the recombine it in the same that way post the descramblers.
It perhaps uses more hardware resource, but overall, not much hardware resource, either way…????
If I did use multiple bit-serial scramblers, my XOR gates reduce in fan out and depth,
allowing faster operation..

Best Regards,
Mike Brewin
Design Engineer, ECS, England.

I implemented your algorithm and generated CRC-64-ECMA for single bit and for 256-bit. I made a testbench in which i gave the data input as 0 for both the systems. In the serial implementation, I get different values of CRC as the clock keeps advancing. I do not get why that happens. And if so, what is the final value of my CRC. Also, the serial and parallel CRC do not seem to match in any clock cycle. What am i doing wrong? Could you please email me at your convenience to rucha_95@yahoo.com

There are several reasons why serial and parallel implementation don’t match. One is the incorrect bit and byte order.
I think most of the questions in this thread deal with exact same issue. You might want to review it first.

I have the parallel implementation and I have my CRC getting generated in the first clock for some input data. Now, I want to test for the correctness of the data: Take the data, append the CRC at the end , and XOR this with the CRC that we received. And the result should be zero. How do i go about implementing this?

The result of the CRC check depends on the way CRC is appended to the original data. But in most cases it’s not zero for a reason – to detect trailing zero sequences in data. It’s usually called “magic number” that you need to compare against.

Thanks for sharing the knowledge.
I have seen CRC implementation based on the mask table, is there any such way of implementing ? If yes, then the implementation of CRC-16 through which method is better either polynomial or mask_table.
Also can I get the reference of the Mask Table implementation procedure?

I have been using your parallel CRC generator and I am getting a correct output when I compare it to a serial crc generator. I notice however that the result I’m getting are noted as CRC-32(MPEG-2) which means the input and output are not reflected. CRC-32(JAMCRC) uses the same polynomial and initial value but it has a reflected input and output. I’ve tried swapping the bit order of my incoming data and the output CRC but I am not able to reproduce the serial result for this version. Do you know if it possible to generate this output using this calculator or does it require different logic for the data to be operated on?

Hi.
At first,your RTL code is wonderful source.
When i use your code for generating a 16 bit (ccitt) with poly x16+x12+x5+x0 and compare with result at :https://crccalc.com/
It show that the result of your code (poly x16+x12+x5+x0) is belong to: CRC-16/CCITT-FALSE Algorithm.
Whilst, my reference book show the result as CRC-16/MCRF4XX Algorithm which has the same Poly 0x1021 and Init 0xFFFF.
I wondering that, do you have any code for this situation.
Pls, this will help me a lot.
Thanks

Hi John,
I did a quick search of JAMCRC. It says that it’s simply the bitwise-not of the standard CRC-32, whatever you call “reflected output”.
If that’s the case, it’s trivial to change generated output by adding “~” (bitwise-not).
Thanks,
Evgeni

I am trying to generate the scrambler for 1+x^43 as *User define* with poly mentioned. However, the generated coded seems does not do anything at all since lfsr_q *NEVER* change regardless of what the input are.

So if the start seed is all FFF’s then the output just an invert of the input, nothing else..

Hi Evgeni,
I need to design a parallel CRC generator with 32 bits of data width and a 20 bits polynomial. I tried your CRC generator with the parameters a mentioned but if my message word has also 32 bits and I need compute the checksums of each word per cycle, the initial value of the lsfr_q register must be initialized for every word right? I mean, I dont have to assign them to the lfsr_c value for every clock cycle, rather what I have to do is for every cycle you have to assert reset to 1.
In the other hand, I was reading the Campobello papar that you have referenced in your article and says that for this method the number of bits of your message (k) and your degree polynomial (m) must be multiples of the number of bits to be procesed in parallel
(w). But in my case I have k=w=32 and m=20 and that condition does not acomplished but I tried anyways, using asserting reset to 1 for every clock cycle as I mentioned above, and it seems to work. So I’m not sure what is going on.
I hope you can clarify my doubts
Greetings,
Diego

Diego :
Hi Evgeni,
I need to design a parallel CRC generator with 32 bits of data width and a 20 bits polynomial. I tried your CRC generator with the parameters mentioned before but if my message word has also 32 bits and I need compute the checksums of each word per cycle, the initial value of the lsfr_q register must be initialized for every word right? I mean, I dont have to assign them to the lfsr_c value for every clock cycle, rather what I have to do is for every cycle I have to assert reset to 1.
In the other hand, I was reading the Campobello paper that you have referenced in your article and says that for this method the number of bits of your message (k) and your degree polynomial (m) must be multiples of the number of bits to be procesed in parallel
(w). But in my case I have k=w=32 and m=20 and that condition does not acomplished but I tried anyways, using asserting reset to 1 for every clock cycle as I mentioned above, and it seems to work. So I’m not sure what is going on.
I hope you can clarify my doubts
Greetings,
Diego

20-bit CRC output becomes LFSR initialization value for the next clock. In the very first clock, LFSR is typically initialized with all-1s. But it’s not always the case. Some protocols might use a different value.

@Evgeni
I think I understand now, the checksum bits of a frame are computed, not of a single word.
On the other hand, I’m not sure if I’m misunderstanding it but there is a restriction in the Campobello’s paper that says that the that the degree of polynomial generator (m) and the length of the message to be processed (k) are both multiples of the number of bits to be processed in parallel (w). So, according that, I cannot design a CRC generator with m=20 and w=32, is that right? Regardless that restriction, I’ve tried anyways with this polynomial 0xC1ACF and a data_width=32 and it seems it works.
Thank you very much for your time and patience, I’m a desperate student learning on the way about this.

You can have CRC and data of any width and any with polynomial, there is absolutely no restriction. But only certain class of polynomials have desired error detection properties. If your CRC and data are too wide, you’d have performance issues (clock frequency in FPGA or ASIC). Also, if CRC is too wide, it’s going to increase the communication overhead. For parallel CRCs, it’s exactly the same as serial 1-bit CRC with unroll loop.

The generators have been invaluable to me. It’s a terrific resource!

The website is definitely running, but it looks like the generators aren’t running. I specify the parameters and in step 2 I only get the message:
Specify polynomial coefficients

It’s a lot ask, but would you consider sharing your code?

Hi Dmitry,

Yes, there is a typo.

Swapping bits in 0×6522df69 will get 0x9add2096

Swizzling all bits in 0x9add2096 will get 0x6904bb59

Thanks,

Evgeni

CRC parallel calculation

I have implemented a verilog code for CRC8 parallel with help of your document “A Practical Parallel CRC Generation Method”. For this many thanks.

I do have a question regarding CRC polynomial: How can I insert the CRC polynomial (0x 107) as a parameter in my design? Is it possible?

Thank you

Jamil

@Jamil

Hi Jamil,

One way to parametrize CRC polynomial is to do bitwise logic AND with the taps, for example, using “for” loop in Verilog.

Thanks,

Evgeni

Hi Evgeni,

I read you paper and understood the general idea of CRC parallel calculation, but not in depth. However, is it possible to implement the CRC parallel calculation using purely combinational logic (XOR arrary)? That is, if I load in the data and I can get the CRC checksum instantly. In the code generated by the website, it looks like the CRC of next block of data will depend on the CRC of previous block of data, which is not reasonable to me.

Thanks for any help.

Kevin

Hi Kevin,

This is the idea behind parallel CRC. The disadvantage of the parallel CRC is that it requires more XOR gates and more levels of logic as the data goes wider.

It’s possible to implement, for example, 4096-bit CRC, and calculate it instantly on 4096-bit data. But in practice it’s implemented as 32-bit data (can be other value as well), and so it’d take 4096/32=128 clocks. That’s why there are registers on the output of the CRC to store intermediate results.

Thanks,

Evgeni

@Evgeni

OK, thanks. I am only using 8-13 bit data, so area is not a concern. If I want it calculated instantly, I just need to follow the method you described in paper and ignore the regs, right? In this case, I should just need to fix the signals which are used to be previous CRC state to all ‘1’s, right?

Hi Kevin,

That’s right. Most of the CRC specifications use ‘1’s to initialize the CRC. But strictly speaking, it doesn’t need to be the case.

Thanks,

Evgeni

@Evgeni

Thank you so much. I think I figured out the problem. Initialization with ‘1’ or ‘0’ may lead to different CRC checksum, but both of them are compliant to the same polynominal. In the standard arithmetic computation, we are used to putting ‘0’ when doing the long division, that’s the only difference in practical design.

Hello. Your arcticle about CRC generator is very interesting. But replacement “for … loop” algorithm with “XOR” have advantage only for bad synthesizer like Xilinx. I use Altera Quartus for your example and two methods have the same result.

Denis

Hi Evgeni

Thanks very much for your article in outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf

I found it very helpful in starting to grasp the concepts behind the design of parallel CRC generation.

I did notice that maybe there is a typo on page 41 where you present a set of equations for Mout. I think the first line which is:

Mout[0]=Min[1]^Min[1]^Min[0]^Min[3]

should be

Mout[0]=Min[1]^Min[1]^Nin[0]^Nin[3]

Thanks again,

Terry

Hi Terry,

That’s right, there is a typo. It was found a long time ago, here: http://outputlogic.com/?p=158#comment-8235

Thanks,

Evgeni

Has anyone tried the CRC for the CoaxPress standard? It is a 32bit CRC based on the IEEE_802.3. But, as noted in the standard, the input bits are reversed, the output crc is reversed, etc, etc. They give an example message packet in the standard and its computed CRC, but I can’t for the life of me get their CRC in their example. The standard is here and the example is on page 35 in the Comment. As they mention in the body of the text, the 8B/10B codes of K27.7 are treated as the D27.7 which is hex FB. They don’t exactly say what the input data length is, but I have tried 8bit, 16bit and 32bit inputs to no avail. I have tried negating the output crc….no good. If anyone has had success at this, please let me know. Thanks.

Have you tried 1-bit data, all-zeros?

Evgeni,

I’m not sure what you mean. I have been trying to recreate the example in the coaxpress standard so that I know my crc calculation code is correct. They give an example message with its calculated crc. It seems simple enough. I just can’t seem to recreate it.

I found my error. All is right with the world again.

Hi. Great site and really useful.

I’m trying to generate a scrambler for 100GE. I am using a 640 bit bus w/ a 57 bit LSFR 1+x^38+x^57. I am getting the following error when it completes:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Thanks,

Frank

Hi Frank,

The time it takes to generate scrambler code grows quadratically with the number of data bits. Therefore, it’s quite possible that server times out. There is a command-line utility to generate CRC with wide data to overcome this kind of a problem. But unfortunately, I haven’t made public such a utility to generate scrambler code.

Thanks,

Evgeni

Hi

In scrambler when we are giving input as all 0’s.then the generator giving output as all 0’s. but as per the scrambler defination long sequences can be changed with 0’s and 1’s.

But in above case it is not effecting with all 0’s input.

Can you please explain why it is not changing output with all 0’s input.

Thanks,

Shravan kumar

Hi Evgeni,

Thanks for your response,

As per your comments i tried with scrambler initial value with all 1’s and and scrambler input is all 0’s. then also its giving all 0’s output.because it it xor’ed with x^39+x^58(1 xor 1 =0) ,so 0(taps xor’ed) xor 0(input) output is 0. so it is not effecting by initial value(if the initial value like 10101010..seguence then it will effect).

And iam using self synchronous scrambler(10gbase-R IEEE 802.3-2012_section4.pdf 49.2.6 scrambler)

So can you please refer the above document.

Thanks,

Shravan kumar

Hi Shravan,

Seeding with all-1’s is just one option to implement a scrambler. Ethernet specification you’re referring to implements it differently to avoid the problem with long 0 sequences. The specification requires new scrambler seed for each packet. In addition, the data is NRZ and differentially encoded.

Thanks,

Evgeni

Hi Evgeni, I just wanted to thank you for this wonderful website. You totally rock!

Best,

John

Hi Evgeni, I want to understand how I can design a circuit, and variants of it with different polynomials, similar to the one your software produced ( scrambler module for data[31:0], lfsr[22:0]=1+x^2+x^5+x^8+x^16+x^21+x^23;); what paper or textbook would be the place to find the information? I want to understand the details of the math and algorithms that your software implemented, so that I can understnad and design the circuits myself.

Thanks,

John

Hi John,

You can start with this article:

http://outputlogic.com/my-stuff/circuit-cellar-january-2010-crc.pdf

It’s about CRC, but math used in scramblers is similar.

Thanks,

Evgeni

Hi Evgeni,

I was going through your paper “A Practical Parallel CRC Generation Method” at http://outputlogic.com/my-stuff/parallel_crc_generator_whitepaper.pdf

and saw there is a typo in equation

Mout[0]= Min[1] ^ Min[4] ^ Min[0] ^ Min[3] at page 43

instead the equation should come out to be

Mout[0]= Min[1] ^ Min[4] ^ Nin[0] ^ Nin[3]

Can you please have a look at it.

Thanks,

sanjay

Hi Sanjay,

Yes, it’s a known typo.

Thanks,

Evgeni

Hi Evgeni,

Thank you so much for the wonderful tool. It’s really useful.

I generated a verilog code for 16-bit CRC with polynomial X^16 + X^12 + X^5 + 1. The input data width is 16-bit. It seems that, in the verilog code, the input data is XORed with shift register bit by bit first, then the shift register shifts for 16 time, resulting the new shift register values for next cycle. Is this the standard generation process? I also see some methods that the input data is fed into the shift register bit by bit during the shifting operation. The results from two methods are different. I get confused which one I should use. If I use one method, and the receiver uses another, will it work correctly?

Thanks,

Fenghua

Hi Evgeni,

I figured it out now. It seems that MSB first and LSB first will give different results.

Thanks.

Fenghua

Hi Evgeni,

I just implemented your code to calculate CRC-6. I had the following question : Since the Parallel CRC generation method uses a serial CRC generator(called recursively) to calculate the H1 and H2 tables values which it then uses to come up with a refined parallel implementation, if I wanted to calculate the CRC register with the initial values set to all 0’s instead of 1’s as we are doing, can I just do so by setting the initial lfsr value to all 0’s during reset or would i need to calculate the table values again aswell and then come up with a XOR implementation using that?

Hi Daniyal,

H1 and H2 matrices are independent of the initial value of CRC.

Thanks,

Evgeni

how we can generate pseudo – randonm number of length 79 by using lfsr

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please. Thanks in advance

For 1001 data and 1011 generator polynomial i got a sequence of CRC_output like 7,2,1,0,6,4,3. for this actual output is 110 how to know which is the CRC and explain me someone please.

Hi Evgeni,

Many thanks for your work on this topic – but I need you to check it for me.

I put your generated (VHDL) in SIMULINK for the polynomial X^7 + X^6 + 1 (which is used on SONET SDH) for 4 bit data,

I take it with additive (side) PRBS scramblers, the scrambler and the descrambler are the same circuit, unlike in multiplicative scramblers where the data is fed in and out at opposite ends of the delay chain.

In which case I put a scrambler in series with a descrambler, i.e. 2x the same implementation of your VHDL

but only 1 bit worked…the other 3 remained scrambled.

I can implement and try the VHDL, but that will take me a bit longer, I just need a create a test-bench to drive it.

I can send you a picture of the SIMULINK Circuit I constructed as an email attachment for you to see what I have done.

or even the SIMULINK .SLX file in case you have SIMULINK or maybe a free SIMULINK viewer ?

It appears to be quite simple in SIMULINK to construct a circuit mode from your VHDL, (i.e. I think my circuit is right)

which is why I am wondering how these parallel forms of the polynomial are generated….?

Is there a book or paper or standard maths that describes changing the LFSR to a parallel implantation.

I don’t understand how you generate the terms for a parallel scrambler architecture,

it doesn’t seem to match what I see in a text=book “high speed serdes devices and applications”, page 142.

Actually yours looks more correct, theirs looks over simplified.

See the following link and arrow to page 142.

https://books.google.co.uk/books?id=Cx3r0H-4AhEC&pg=PA141&lpg=PA141&dq=scrambler+and+descramblers&source=bl&ots=voSExBbg_H&sig=ACfU3U1SXzU09_1AmbAsEP-lF2ZZ90NDdQ&hl=en&sa=X&ved=2ahUKEwjwx7Kx64HgAhXdTBUIHchPC8Q4FBDoATADegQIBRAB#v=onepage&q=scrambler%20and%20descramblers&f=false

IF you engage with me on this, I will buy your book 🙂 and our company could offer you some consultancy.

We are using MATLAB/SIMULINK for a software-Defined-Radio project. the Scrambler block is not supported

for auto HDL-coding I simlink, so we’re creating the scrambler as a circuits of XORs and D types that will code.

I can get additive (Fibonacci) and Galois forms working for bit-serial LFSRs,

but so far, no joy in turning things into parallel.

Also, I don’t quite know why I can’t just split the input stream (say 1 byte wide) into 8 bit-serial scramblers

and the recombine it in the same that way post the descramblers.

It perhaps uses more hardware resource, but overall, not much hardware resource, either way…????

If I did use multiple bit-serial scramblers, my XOR gates reduce in fan out and depth,

allowing faster operation..

Best Regards,

Mike Brewin

Design Engineer, ECS, England.

Hi Mike,

I’m going to reply to your email.

Thanks,

Evgeni

Hi Evgeni,

I implemented your algorithm and generated CRC-64-ECMA for single bit and for 256-bit. I made a testbench in which i gave the data input as 0 for both the systems. In the serial implementation, I get different values of CRC as the clock keeps advancing. I do not get why that happens. And if so, what is the final value of my CRC. Also, the serial and parallel CRC do not seem to match in any clock cycle. What am i doing wrong? Could you please email me at your convenience to rucha_95@yahoo.com

Hi Rucha,

There are several reasons why serial and parallel implementation don’t match. One is the incorrect bit and byte order.

I think most of the questions in this thread deal with exact same issue. You might want to review it first.

Thanks,

Evgeni

Hi,

I have the parallel implementation and I have my CRC getting generated in the first clock for some input data. Now, I want to test for the correctness of the data: Take the data, append the CRC at the end , and XOR this with the CRC that we received. And the result should be zero. How do i go about implementing this?

Hi Rucha,

The result of the CRC check depends on the way CRC is appended to the original data. But in most cases it’s not zero for a reason – to detect trailing zero sequences in data. It’s usually called “magic number” that you need to compare against.

Thanks,

Evgeni

Hi Evgeni,

Thanks for sharing the knowledge.

I have seen CRC implementation based on the mask table, is there any such way of implementing ? If yes, then the implementation of CRC-16 through which method is better either polynomial or mask_table.

Also can I get the reference of the Mask Table implementation procedure?

Thanks,

M.M.Naveen Kumar

Hi Evgeni,

I have been using your parallel CRC generator and I am getting a correct output when I compare it to a serial crc generator. I notice however that the result I’m getting are noted as CRC-32(MPEG-2) which means the input and output are not reflected. CRC-32(JAMCRC) uses the same polynomial and initial value but it has a reflected input and output. I’ve tried swapping the bit order of my incoming data and the output CRC but I am not able to reproduce the serial result for this version. Do you know if it possible to generate this output using this calculator or does it require different logic for the data to be operated on?

Thanks,

John

Hi.

At first,your RTL code is wonderful source.

When i use your code for generating a 16 bit (ccitt) with poly x16+x12+x5+x0 and compare with result at :https://crccalc.com/

It show that the result of your code (poly x16+x12+x5+x0) is belong to: CRC-16/CCITT-FALSE Algorithm.

Whilst, my reference book show the result as CRC-16/MCRF4XX Algorithm which has the same Poly 0x1021 and Init 0xFFFF.

I wondering that, do you have any code for this situation.

Pls, this will help me a lot.

Thanks

Hi John,

I did a quick search of JAMCRC. It says that it’s simply the bitwise-not of the standard CRC-32, whatever you call “reflected output”.

If that’s the case, it’s trivial to change generated output by adding “~” (bitwise-not).

Thanks,

Evgeni

HI,

I am trying to generate the scrambler for 1+x^43 as *User define* with poly mentioned. However, the generated coded seems does not do anything at all since lfsr_q *NEVER* change regardless of what the input are.

So if the start seed is all FFF’s then the output just an invert of the input, nothing else..

Any comment on that?

Thanks,

Tony

Hi Evgeni,

I need to design a parallel CRC generator with 32 bits of data width and a 20 bits polynomial. I tried your CRC generator with the parameters a mentioned but if my message word has also 32 bits and I need compute the checksums of each word per cycle, the initial value of the lsfr_q register must be initialized for every word right? I mean, I dont have to assign them to the lfsr_c value for every clock cycle, rather what I have to do is for every cycle you have to assert reset to 1.

In the other hand, I was reading the Campobello papar that you have referenced in your article and says that for this method the number of bits of your message (k) and your degree polynomial (m) must be multiples of the number of bits to be procesed in parallel

(w). But in my case I have k=w=32 and m=20 and that condition does not acomplished but I tried anyways, using asserting reset to 1 for every clock cycle as I mentioned above, and it seems to work. So I’m not sure what is going on.

I hope you can clarify my doubts

Greetings,

Diego

Hi Diego,

20-bit CRC output becomes LFSR initialization value for the next clock. In the very first clock, LFSR is typically initialized with all-1s. But it’s not always the case. Some protocols might use a different value.

Thanks,

Evgeni

@Evgeni

I think I understand now, the checksum bits of a frame are computed, not of a single word.

On the other hand, I’m not sure if I’m misunderstanding it but there is a restriction in the Campobello’s paper that says that the that the degree of polynomial generator (m) and the length of the message to be processed (k) are both multiples of the number of bits to be processed in parallel (w). So, according that, I cannot design a CRC generator with m=20 and w=32, is that right? Regardless that restriction, I’ve tried anyways with this polynomial 0xC1ACF and a data_width=32 and it seems it works.

Thank you very much for your time and patience, I’m a desperate student learning on the way about this.

Hi Diego,

You can have CRC and data of any width and any with polynomial, there is absolutely no restriction. But only certain class of polynomials have desired error detection properties. If your CRC and data are too wide, you’d have performance issues (clock frequency in FPGA or ASIC). Also, if CRC is too wide, it’s going to increase the communication overhead. For parallel CRCs, it’s exactly the same as serial 1-bit CRC with unroll loop.

Thanks,

Evgeni

@Evgeni

Understood, thank you very much Evgeni!

Greetings

The generators have been invaluable to me. It’s a terrific resource!

The website is definitely running, but it looks like the generators aren’t running. I specify the parameters and in step 2 I only get the message:

Specify polynomial coefficients

It’s a lot ask, but would you consider sharing your code?

Thanks!