Book: 100 Power Tips for FPGA Designers

This book is a collection of articles on various aspects of FPGA design: synthesis, simulation, porting ASIC designs, floorplanning and timing closure, design methodologies, performance, area and power optimizations, RTL coding, IP core selection, and many others.

The book is intended for system architects, design engineers, and students who want to improve their FPGA design skills. Both novice and seasoned logic and hardware engineers can find bits of useful information.

This book is written by a practicing FPGA logic designer, and contains many illustrations, code examples, and scripts. Rather than providing information applicable to all FPGA vendors, this edition focuses on the Xilinx Virtex-6 and Spartan-6 FPGA families. Code examples are written in Verilog HDL.

Download excerpt from the book
Download source code, projects, and scripts

Paperback edition on Amazon.com, Amazon.de, and Amazon.co.uk

Number of pages: 474
Publisher: CreateSpace

Kindle edition on Amazon.com

The book can be read in color on a PC or Mac using the free Kindle for PC or Kindle for Mac application.
It can also be read on an iPhone or iPad using the free Kindle for iPhone or Kindle for iPad application.

Readers based in India can purchase the book on Flipkart.com

Chinese-speaking readers can purchase the book on PHEI

Google eBook edition
The book can be read in color on a PC, Mac, or tablet/iPad. An extensive preview is available.

ePub edition on Barnes and Noble
The book can also be read using the free Nook for PC or Adobe Digital Editions applications, or on other eReaders that support the ePub format.

Questions, comments, and suggestions about the book are welcome.

  1. Sam Reaves
    August 21st, 2013 at 17:50 | #1


    I am trying to implement a synchronous decade counter with CE in and CE out, and with a 50% duty cycle output, in a Spartan-3 device. Do you know where I can find a schematic or code for such a device? I have the counter working, and all of the outputs look as they should, even on the target FPGA. But when I cascade two counters, only the first one in the chain has the proper outputs; the others do not clock. Any suggestions would be greatly appreciated.


  2. August 21st, 2013 at 20:14 | #2

    Hi Sam,

    I’d presume that your design can be easily simulated to find the problem.
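    As a rough illustration of what such a simulation would check, here is a minimal behavioral sketch in Python (not HDL). It assumes the usual cascading scheme: every stage shares one clock, and a stage's CE out is asserted only while its CE in is high and its count is 9. The names `DecadeCounter` and `step` are hypothetical, chosen for this example only.

```python
class DecadeCounter:
    """Behavioral model of a synchronous decade counter with CE in / CE out."""

    def __init__(self):
        self.count = 0

    def ce_out(self, ce_in):
        # Combinational output: high only while enabled and at terminal count.
        return ce_in and self.count == 9

    def out(self):
        # High for counts 5..9, which gives a divide-by-10 output
        # with a 50% duty cycle.
        return self.count >= 5

    def clock(self, ce_in):
        # State changes only on a clock edge where CE in is high.
        if ce_in:
            self.count = 0 if self.count == 9 else self.count + 1


def step(chain, ce_in=True):
    """Apply one rising edge of the shared clock to every stage."""
    # Enables are sampled from the pre-edge state, as in synchronous logic.
    enables = []
    ce = ce_in
    for stage in chain:
        enables.append(ce)
        ce = stage.ce_out(ce)
    for stage, en in zip(chain, enables):
        stage.clock(en)


chain = [DecadeCounter(), DecadeCounter()]
for _ in range(123):
    step(chain)
print([s.count for s in chain])  # -> [3, 2]
```

    If the second stage never advances in a model like this, the enable chaining is the first thing to inspect; the same check applies to the HDL simulation.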


  3. Sam Reaves
    August 22nd, 2013 at 13:54 | #3

    Well, I just got it working today, and I strung up a chain of them that divides down a 100 MHz clock (I used the DCM to multiply an external 10 MHz clock to 100 MHz). Works like gangbusters!

    I do have one question. I used a second clock buffer in an attempt to bring the multiplied 100 MHz clock out to an external pin. Do you know if this should work? I did not see any activity on the pin, even though the counter chain was working properly.



  4. August 22nd, 2013 at 15:21 | #4

    Hi Sam,

    I don’t think you need to manually insert any buffer. Just wire the clock to the IO; tools should automatically insert it.


  5. Rakhi Thakur
    September 22nd, 2013 at 08:51 | #5

    I have one question. When we generate scrambled data using some polynomial, can the receiver determine which polynomial was used to send the data? In other words, can we recover the polynomial at the receiver?

  6. September 22nd, 2013 at 10:02 | #6

    Hi Rakhi,

    If the user knows neither the data being sent nor the scrambler polynomial, I presume there is nothing that can be done.
    If the data is known, the user can collect a lot of data and try to sweep different polynomials, hoping that one of them will work. But I’m not aware of a generic algorithm that does that.
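    A minimal Python sketch of such a sweep, under assumptions that are not stated in the thread: the scrambler is self-synchronous, and only two-tap polynomials of the form 1 + x^a + x^n are tried. The function names are hypothetical.

```python
import random


def scramble(bits, taps):
    """Self-synchronous scrambler: out[i] = in[i] XOR out[i-t] for each tap t."""
    state = [0] * max(taps)      # state[k] is the output from k+1 bits ago
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= state[t - 1]
        out.append(s)
        state = [s] + state[:-1]
    return out


def descramble(bits, taps):
    """Matching descrambler: its shift register is fed by the received bits."""
    state = [0] * max(taps)
    out = []
    for b in bits:
        d = b
        for t in taps:
            d ^= state[t - 1]
        out.append(d)
        state = [b] + state[:-1]
    return out


def sweep(known, received, max_degree):
    """Return every (a, n) for which 1 + x^a + x^n descrambles `received` to `known`."""
    matches = []
    for n in range(2, max_degree + 1):
        for a in range(1, n):
            # Ignore the first n bits, while the descrambler state is filling.
            if descramble(received, (a, n))[n:] == known[n:]:
                matches.append((a, n))
    return matches


random.seed(1)
data = [random.randint(0, 1) for _ in range(400)]
rx = scramble(data, (3, 7))      # "unknown" polynomial: 1 + x^3 + x^7
print(sweep(data, rx, 10))
```

    The search space grows quickly with the polynomial degree and the number of taps, so this brute-force approach is only practical for small candidate sets.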


  7. September 28th, 2013 at 02:19 | #7

    Hello Evgeni,

    What machine did you use as a build server for the build runtime benchmarks in your book? Is it something that is available off-the-shelf, like an HP Z800?


  8. September 29th, 2013 at 12:18 | #8

    Hi Guy,

    Yes, it was an off-the-shelf Dell server.


  9. Rajdeep Mukherjee
    December 18th, 2013 at 02:19 | #9


    I am working with behavioral Verilog designs. Can you help me
    understand how control flow is flattened out in behavioral Verilog? People
    usually claim that control flow in Verilog is obscure, and that it
    is encoded in a data-encoded way.

    Can you please give me a small example (say an FSM or a counter) and help
    me understand how control flow in Verilog is encoded in a
    data-driven way?

    I would appreciate any help in this regard.

    Many thanks in advance.
    Thank You.

  10. December 18th, 2013 at 03:30 | #10

    Hi Rajdeep,

    This is the first time I have come across the terms “data-encoded way” and “data-driven way”.

    But there are at least a couple of different ways to implement control flow constructs, e.g. an FSM.
    This document describes those on page 79: http://www.xilinx.com/itp/xilinx10/books/docs/sim/sim.pdf


  11. Rajdeep Mukherjee
    December 19th, 2013 at 02:08 | #11

    Hello Evgeni,

    Many thanks for your reply.
    I would like to clarify what I meant by “data-encoded way”. I am working
    with the behavioral synthesizable subset of Verilog that allows control-flow statements like if-else and switch (case), but does not allow repeat, for, while, or continue statements. So, in a sense, behavioral code in Verilog has a flattened control-flow structure (without these loop constructs). This is easy to see, because you can model the effect of while or for loops using only if-then-else and switch (case), but in a data-encoded way.

    So, the FSM examples you referred to use the same modeling with flattened control flow. Can you please give me some more insight or references on this?
    From your experience, have you come across any behavioral Verilog designs that have an explicit control-flow structure which is not flattened? Also, please let me know
    whether any behavioral synthesis tools allow loop constructs like for, while, repeat, and forever.

    Many Thanks in advance.
    Looking forward.


  12. December 19th, 2013 at 04:44 | #12

    Hi Rajdeep,

    The best reference would be the manual for the synthesis tool itself with supported constructs and examples.
    As far as I know, pretty much all synthesis tools support the for loop, but not while, repeat, and forever. At least the ones I have worked with do: Xilinx ISE/Vivado, Altera Quartus, Synopsys Synplify, and Design Compiler.

    I think another way to call flattening of control-flow structure is loop unrolling – this is the term I’m more familiar with.


  13. Rajdeep Mukherjee
    December 19th, 2013 at 11:51 | #13

    Hello Evgeni,

    Many thanks for your reply.
    I agree that loop-unrolling is a popular term used in this context.

    Further along the same lines, I am curious to know the following.
    Since “for” loops are synthesizable by behavioral synthesis tools,
    given a behavioral Verilog design with a for loop inside it, how does a
    behavioral synthesis tool deal with it? Does it always fully unroll the loop, or
    does it perform partial unrolling? What factors does that depend on?
    Can you please share something on this?

    Many thanks in advance.

  14. December 19th, 2013 at 12:50 | #14

    Hi Rajdeep,

    I think the exact behavior and limitations are not part of the specification, and depend on the synthesis tool. You need to consult the manual and even talk to the tech support of that tool.
    But generally speaking, if the loop is infeasible, for example if it has so many iterations that the unrolled logic cannot fit the chip, then it’s going to fail, either during synthesis or place and route.

    Other sources of information are HLS (high-level synthesis) tools such as Xilinx Vivado ESL. HLS tools heavily use loop constructs.


  15. Rajdeep Mukherjee
    December 20th, 2013 at 04:10 | #15

    Hello Evgeni,

    Many thanks for your ideas and references.
    Will surely keep in touch.


  16. January 23rd, 2014 at 19:18 | #16

    This is Srinivas Reddy.
    I did a project on FPGA implementation of a pipelined 2D-DCT and quantization architecture for JPEG image compression,
    and I have some questions about it:

    Which software and hardware are used to implement such a project?
    Which algorithm and which language are used?
    What is the main use of the project?

    I am asking because I am preparing SOPs to send to US universities.
    Please send answers to these questions.

    Thank you, sir.

    yours faithfully

  17. Rajdeep Mukherjee
    April 28th, 2014 at 10:57 | #17

    Hi Evgeni,

    Hope you are fine.
    I have a query regarding the development of control-path intensive behavioral Verilog designs. Currently, I have developed an IEEE 754 32-bit floating-point Add/Sub unit in Verilog, which is quite data-path intensive. But I am looking for a control-path intensive design in Verilog, such as a USB controller or a memory controller. I got a few designs from Opencores, but I cannot tell whether these designs have enough control path in them just by looking at the code. Can you please tell me what the major characteristics of control-path intensive designs in Verilog are? If I spot
    some data-path units, and a FSM in a design, can I consider it as design with

    Looking forward to your reply.

    Many Thanks
    Best regards,

  18. April 28th, 2014 at 16:42 | #18

    Hi Rajdeep,

    As far as I know, there is no clear metric that distinguishes data-path and control-path intensive designs. You can get some idea by looking at the fanouts of control or clock enable signals.
    If the design has 8 stages of 256-bit data, that’s a fanout of 2K. Perhaps the ratio of registers to LUTs is going to be higher in data-path intensive designs.


  19. Rajdeep Mukherjee
    April 30th, 2014 at 00:29 | #19

    Hello Evgeni,

    Thank you for your reply.
    If a design has a separate data path and control path, then the basic
    characteristic of such a design is that the controller is an FSM which controls
    the operations in the data path. But not all designs that mix control path and
    data path reflect this characteristic, due to design complexity.
    But isn’t it the case that a control-path intensive design will always have an FSM
    controller to control circuit operations, and that the complexity of the design depends
    on the number of states in the FSM controller? Isn’t this a metric to characterise
    control-path intensive designs?

    Please correct me if I am wrong.

    Best regards,

  20. April 30th, 2014 at 01:45 | #20

    Hi Rajdeep,

    Such a control-path intensive design might also have a lot of control logic with FSMs inside the datapath.
    One example is packet processor, which does packet matching, classification, and filtering in each stage of the datapath. In addition, there is a large FSM that controls datapath operation.
    So by looking at characteristics of such a design, it’s not clear that it is control-path intensive.


  21. Rajdeep Mukherjee
    April 30th, 2014 at 02:44 | #21

    Hello Evgeni,

    Many thanks for the clarification.
    Also, I got a USB 1.1 physical interface core from Opencores,
    but I am not at all sure whether it contains enough control path.
    Could you please let me know if the design (link below) meets the
    requirement? (Link: http://opencores.org/project,usb_phy)
    Could you also please suggest a control-path intensive
    Verilog design which is available as open source?
    That would be of great help.

    best regards,

  22. April 30th, 2014 at 07:47 | #22

    Hi Rajdeep,

    I’d say memory controller, processor and encryption cores have a lot of control logic.


  23. Rajdeep Mukherjee
    May 16th, 2014 at 03:42 | #23

    Hi Evgeni,

    Hope you are fine.
    Could you tell me the basic difference between simulation
    semantics and synthesis semantics? Could you point me to some
    resource where I can learn about the difference between these
    two semantics?

    Many thanks in anticipation.

  24. Rick G
    October 13th, 2014 at 08:56 | #24

    Hi Evgeni,

    Thanks for publishing your book.

    Is there an errata list available for download somewhere? I like to print errata out and insert them in the book.


  25. October 13th, 2014 at 09:43 | #25

    Hi Rick,

    There is no errata for this book.


  26. Avinash
    November 6th, 2015 at 03:19 | #26

    Hi Evgeni,

    Does the DSP48 support SIMD?
    I know the DSP48E does.

    The problem is that when I instantiate the DSP core for my Virtex-6 design, I’m not able to use a DSP48E macro.

  27. bk
    December 27th, 2019 at 10:59 | #27

    Hi Evgeni,

    Polynomial G(x) = 1 + x^39 + x^58
    I generated parallel scrambler code by selecting:
    Step 1:
    Data width = 64
    Polynomial width = 58
    Step 2:
    Chose box-1 and box-39 (x^58 was selected automatically)
    Is this compatible with IEEE 802.3 10GBASE-R?
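
    For reference, 10GBASE-R (64b/66b) uses a self-synchronous scrambler with the same polynomial, 1 + x^39 + x^58. Whether generated code is compatible also depends on details not shown in this comment, such as bit ordering and whether the scrambler structure is self-synchronous. A minimal bit-serial Python model can serve as a reference to compare simulation output against. It assumes 64-bit payload words processed LSB first; the function names and the state convention are hypothetical.

```python
MASK58 = (1 << 58) - 1


def scramble64(word, state):
    """Scramble one 64-bit word, LSB first, with G(x) = 1 + x^39 + x^58.

    `state` holds the last 58 scrambled bits; bit 0 is the most recent.
    Returns (scrambled_word, new_state).
    """
    out = 0
    for i in range(64):
        d = (word >> i) & 1
        # Taps: scrambled bits from 39 and 58 bit-times ago.
        s = d ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out |= s << i
        state = ((state << 1) | s) & MASK58
    return out, state


def descramble64(word, state):
    """Inverse operation; `state` holds the last 58 received bits."""
    out = 0
    for i in range(64):
        s = (word >> i) & 1
        d = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out |= d << i
        state = ((state << 1) | s) & MASK58
    return out, state


words = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0]
tx_state, rx_state = MASK58, 0   # deliberately mismatched start states
recovered = []
for w in words:
    s, tx_state = scramble64(w, tx_state)
    d, rx_state = descramble64(s, rx_state)
    recovered.append(d)
print(recovered[1:] == words[1:])
```

    Because the descrambler state is built from received bits, it resynchronizes after 58 bits even when the two sides start from different states, which is why the comparison skips the first word.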
