Home  >  Edison  >  Setting Up

i686 or x86_64?

32 bits or 64 bits operation

The Edison main SoC is a 22 nm Intel Atom “Tangier” (Z34XX) that includes two Atom Silvermont (SLM) cores. Although never advertised by Intel, the CPU is known to be 64 bits (x86_64) capable.

Advantages of running in x86_64 mode

  1. In x86_64 mode certain instructions become available that might substantially speed up your application. Using these instructions will probably require effort on your part. But take care: on Silvermont it appears to be quite easy to destroy the obtained performance enhancement, see Disadvantage … below.
  2. In x86_64 there are more registers available to the compiler, making it easier to generate optimized code.
  3. In the far future linux might drop 32 bit support entirely. Assuming you are still using your Edison then, having 64 bit support might become a necessity. In the meanwhile you may experience less rigorously tested code on i686 with associated bugs.

Examples where code optimizations have been done specific to Edison / x86_64 have been implemented 1):
Base64 encoding/decoding
CRC32C encoding/decoding

1) Feel free to add Edison / NUC E3815 or other Baytrail examples here.

Disdvantages of running in x86_64 mode

There are quite a number of disadvantages:

  1. All code compiled for x86_64 is larger in size, both on disk as well as in memory.
  2. Certain code may actually run substantially slower on SLM architectures.

From the Intel® 64 and IA-32 Architectures Optimization Reference Manual F.8.1.2 Front End High IPC Considerations:

  • The total length of the instruction bytes that can be decoded each cycle varies by microarchitecture.
    SLM: up to 16 bytes per cycle with instruction not more than 8 bytes in length. For an instruction length exceeding 8 bytes, only one instruction per cycle is decoded on decoder 0.
  • An instruction with multiple prefixes can restrict decode throughput. The restriction is on the length of bytes combining prefixes and escape bytes. There is a 3 cycle penalty when the escape/prefix count exceeds the following limits as specified per microarchitectures.
    SLM: the limit is 3 bytes.
  • Only decoder 0 can decode an instruction exceeding the limit of prefix/escape byte restriction on the Silvermont and Goldmont microarchitectures.
  • The maximum number of branches that can be decoded each cycle is 1 for SLM. Prevent a re-steer penalty by avoiding back-to-back conditional branches.

Unfortunately x86_64 mode will add a prefix byte to instructions that are already long. For instance CRC32Q will exceed the limit causing a 3 cycle penalty, which totally destroys the obtained performance enhancement.

Fortunately there is a way around this restriction. Again from the Optimization Reference Manual F.8.1.4 Loop Unrolling and Loop Stream Detector, engauging the Loop Stream Detector (LSD):

The Silvermont and Goldmont microarchitectures include a Loop Stream Detector (LSD) that provides the back end with uops that are already decoded. This provides performance and power benefits. When the LSD is engaged, front end decode restrictions, such as number of prefix/escape bytes and instruction length, no longer apply.

It appears the LSD can kick in for short loops, and after a certain amount of loops occured (although this is not clearly documented the number is probably 64). To use this, take care not to have the compiler unroll your loop. The effect can be quite dramatic, as the 3 cycle penalty is eliminated after 64 iterations a a 3x speed up can be observed for long running loops.

Enabling x86_64 mode

If you checked out master, in meta-intel-edison/meta-intel-edison-bsp/conf/machine/edison.conf change KBUILD_DEFCONFIG="x86_64_defconfig" and set DEFAULTTUNE = "core2-64".

Alternatively you can checkout scarthgap which does this for you.