x86 is a family of instruction set architectures (ISAs) based on the Intel 8086 CPU.

Before x86

Intel 4004 and 8008


The Datapoint 2200, designed by CTC, was one of the first personal computers. In 1969, CTC contracted two companies, Intel and Texas Instruments, to build its processor as a single chip. TI was unable to make a reliable part and dropped out. Intel was unable to meet CTC's deadline. Intel and CTC renegotiated their contract, ending up with CTC keeping its money and Intel keeping the eventually completed processor. In 1970, the Datapoint 2200 was released with a processor built from discrete TTL modules instead of a single chip.

In 1971, Intel released the 4-bit Intel 4004, its first commercially available microprocessor. The 4004 was a separate design, originally built for a Busicom calculator, rather than a cut-down Datapoint 2200 CPU. In 1972, Intel finally released the Datapoint 2200 compatible CPU, the 8-bit Intel 8008.

Intel 8080 and its modified models

After the Intel 4004 and 8008, Intel released their successors in 1974: the Intel 4040 and 8080. The 4-bit Intel 4040 was actually released later than the 8-bit Intel 8080. One of the reasons for releasing the Intel 4040 was simply to lower costs for motherboard manufacturers.

The Intel 8080 and its variants were among the most successful CPUs of the 1970s. One widely used variant is the Zilog Z80. The CPU of the Game Boy, the Sharp LR35902, is also a variant of the Intel 8080.


Intel also licensed third-party manufacturers such as NEC to produce Intel 8080 compatible CPUs.


x86 Family

After the success of the Intel 8080, Intel started to design a 16-bit successor, which later became one of the most successful ISA families: the x86 family.



The x86 architecture kept almost all features of the Intel 8080, including some that applications rarely use, such as the BCD instructions, but expanded them with a much richer feature set.


Addressing Mode

  • Real-address Mode
  • Protected Mode (Introduced in Intel 80286)
  • System Management Mode (Introduced in Intel 80386)

One noticeable feature of x86 CPUs is the set of operating modes listed above, each with its own way of addressing memory. Modern operating systems like Linux, FreeBSD or Windows all run in protected mode, where physical memory is mapped into virtual address spaces, so that the operating system can isolate applications and keep their memory accesses safe.
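
As a rough illustration of how addressing differs between modes, real-address mode on the original 8086 forms a 20-bit physical address from a 16-bit segment and a 16-bit offset. The C sketch below shows only that calculation. The function name `real_mode_physical` is made up for this example, and the 20-bit wrap-around models the 8086 specifically (later CPUs can carry into bit 20, depending on the A20 line).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of any real API.
 * Real-address mode on the 8086: physical = segment * 16 + offset,
 * truncated to the 20 address lines the 8086 has.
 */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;  /* assumption: 8086-style wrap at 1 MiB */
}
```

For instance, segment 0xB800 with offset 0 gives 0xB8000, and segment 0xFFFF with offset 0x0010 wraps around to 0 under this 8086 assumption.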

x86-16 and x87

The earlier models of x86 CPUs relied on a separate floating-point unit (FPU) called the x87. Some third-party units, such as the Cyrix FPUs, often outperformed the original Intel-made FPU. The Intel 80486 then brought a tightly pipelined core and a built-in FPU, which greatly increased the performance of x86 CPUs. One very famous beneficiary of this generation of CPUs is the game DOOM: its developers built one of the world's first 3D games with optimizations specific to the 386 and 486 architectures, without the help of a PPU or GPU.


x86-32 (IA-32)

After the success of the x86 architecture, Intel expanded the ISA with 32-bit support. The official name of this architecture is IA-32 (Intel Architecture, 32-bit). During this era, Intel stopped using 80x86 as its product name. Instead, it used Celeron and Pentium for home CPUs, and Xeon for server and workstation CPUs.

The last x86-32 microarchitecture designed by Intel is called NetBurst. Since x86 is a CISC architecture, some of its instructions take dozens or even hundreds of cycles to finish. Following the projection of Moore's Law, Intel expected to reach up to 10 GHz with the new design by increasing the number of pipeline stages. The problem is that whenever a branch is mispredicted, the whole pipeline must be flushed, and the deeper the pipeline, the higher the performance cost. As a result, when Intel produced the 130 nm version of the Pentium 3, the Tualatin core, it outperformed the state-of-the-art Pentium 4 NetBurst parts built on the same 130 nm process. Meanwhile, RISC architectures like MIPS and ARM were delivering products with performance closer and closer to Intel's in the embedded and desktop markets.
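
To see why a deeper pipeline makes mispredictions more expensive, a common back-of-the-envelope model adds the flush penalty, weighted by how often it occurs, to the ideal cycles per instruction (CPI). The sketch below is only that simple model. The function name and any numbers fed into it are illustrative assumptions, not measurements of NetBurst or the Pentium 3.

```c
/*
 * Simplified textbook-style model (an assumption, not vendor data):
 *   effective CPI = base CPI
 *                 + (fraction of instructions that are branches)
 *                 * (fraction of branches mispredicted)
 *                 * (cycles lost per pipeline flush)
 */
double effective_cpi(double base_cpi, double branch_fraction,
                     double mispredict_rate, double flush_penalty_cycles)
{
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles;
}
```

With made-up inputs of a base CPI of 1.0, 25% branches and a 12.5% misprediction rate, doubling the flush penalty from 16 to 32 cycles raises the modeled CPI from 1.5 to 2.0, which is the kind of cost a longer pipeline has to win back through clock speed.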

Finally, Intel acknowledged that the NetBurst architecture was a failure and returned to the Pentium 3 design, with Hyper-Threading and MMX support, as the base of later Intel models.

x86-64 (amd64)

While Intel was still struggling with NetBurst, AMD had already started working on a 64-bit extension of the x86 ISA, known as x86-64 and officially named AMD64 (amd64).

Note that IA-64 (Intel Architecture, 64-bit) has no relationship to x86-64 or amd64. It is actually the name of Intel's EPIC architecture, a total failure that we mentioned in Dependency Graphs (advanced).

One very important optimization technique for the x86-64 architecture is the use of SIMD instructions. Thanks to compiler improvements over the past years, modern compilers like gcc and clang can turn many loops into SIMD code automatically, which is called auto-vectorization.

For example, consider the following C code.

 int foo(int *A, int n) {
   unsigned sum = 0;
   for (int i = 0; i < n; ++i)
     sum += A[i] + 5;
   return sum;
 }

After compilation with clang 8.0.0 (x86-64, -O3), you would get the following assembly code.

 .LCPI0_0:
         .long   5                       # 0x5
         .long   5                       # 0x5
         .long   5                       # 0x5
         .long   5                       # 0x5
 foo(int*, int):                              # @foo(int*, int)
         test    esi, esi
         jle     .LBB0_1
         mov     ecx, esi
         cmp     esi, 7
         ja      .LBB0_4
         xor     edx, edx
         xor     eax, eax
         jmp     .LBB0_12
 .LBB0_1:
         xor     eax, eax
         ret
 .LBB0_4:
         mov     edx, ecx
         and     edx, -8
         lea     rsi, [rdx - 8]
         mov     rax, rsi
         shr     rax, 3
         add     rax, 1
         mov     r8d, eax
         and     r8d, 1
         test    rsi, rsi
         je      .LBB0_5
         mov     esi, 1
         sub     rsi, rax
         lea     rax, [r8 + rsi]
         add     rax, -1
         pxor    xmm0, xmm0
         xor     esi, esi
         movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [5,5,5,5]
         pxor    xmm2, xmm2
 .LBB0_7:                                # =>This Inner Loop Header: Depth=1
         paddd   xmm0, xmm1
         paddd   xmm2, xmm1
         movdqu  xmm3, xmmword ptr [rdi + 4*rsi]
         movdqu  xmm4, xmmword ptr [rdi + 4*rsi + 16]
         movdqu  xmm5, xmmword ptr [rdi + 4*rsi + 32]
         movdqu  xmm6, xmmword ptr [rdi + 4*rsi + 48]
         paddd   xmm3, xmm1
         paddd   xmm0, xmm3
         paddd   xmm0, xmm5
         paddd   xmm4, xmm1
         paddd   xmm2, xmm4
         paddd   xmm2, xmm6
         add     rsi, 16
         add     rax, 2
         jne     .LBB0_7
         movdqa  xmm3, xmm0
         paddd   xmm3, xmm1
         paddd   xmm1, xmm2
         test    r8, r8
         je      .LBB0_11
 .LBB0_10:
         movdqu  xmm0, xmmword ptr [rdi + 4*rsi]
         paddd   xmm3, xmm0
         movdqu  xmm0, xmmword ptr [rdi + 4*rsi + 16]
         paddd   xmm1, xmm0
         movdqa  xmm0, xmm3
         movdqa  xmm2, xmm1
 .LBB0_11:
         paddd   xmm0, xmm2
         pshufd  xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
         paddd   xmm1, xmm0
         pshufd  xmm0, xmm1, 229         # xmm0 = xmm1[1,1,2,3]
         paddd   xmm0, xmm1
         movd    eax, xmm0
         cmp     rdx, rcx
         je      .LBB0_13
 .LBB0_12:                               # =>This Inner Loop Header: Depth=1
         mov     esi, dword ptr [rdi + 4*rdx]
         lea     eax, [rax + rsi]
         add     eax, 5
         add     rdx, 1
         cmp     rcx, rdx
         jne     .LBB0_12
 .LBB0_13:
         ret
 .LBB0_5:
         movdqa  xmm3, xmmword ptr [rip + .LCPI0_0] # xmm3 = [5,5,5,5]
         xor     esi, esi
         movdqa  xmm1, xmm3
         test    r8, r8
         jne     .LBB0_10
         jmp     .LBB0_11
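
To make the listing easier to follow, the core idea of the vectorized loop can be approximated by hand with SSE2 intrinsics. This is a simplified sketch, not what clang emits: it uses a single accumulator with no unrolling, and the name `foo_sse2` is made up for this example. It assumes an x86 target with SSE2 (always present on x86-64) and the standard `<emmintrin.h>` header.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical hand-vectorized counterpart of foo(); a sketch only. */
int foo_sse2(const int *A, int n)
{
    const __m128i five = _mm_set1_epi32(5);  /* [5,5,5,5], like .LCPI0_0 */
    __m128i acc = _mm_setzero_si128();       /* four partial sums */
    int i = 0;

    /* Vector body: four ints per iteration (movdqu + paddd). */
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(A + i));
        acc = _mm_add_epi32(acc, _mm_add_epi32(v, five));
    }

    /* Horizontal sum with the same two shuffles as the listing. */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, 78));   /* [2,3,0,1] */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, 229));  /* [1,1,2,3] */
    unsigned sum = (unsigned)_mm_cvtsi128_si32(acc);

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; ++i)
        sum += (unsigned)A[i] + 5u;

    return (int)sum;
}
```

The compiler output above follows the same pattern of a vector body, a horizontal reduction and a scalar tail, but keeps two accumulators (xmm0 and xmm2) and processes 16 elements per loop iteration.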



Last-modified: 2019-06-29 (Sat) 12:04:17