#author("2019-06-29T12:04:17+09:00","","")
* Introduction [#uc80f297]

x86 is a family of instruction set architectures (ISAs) based on the Intel 8086 CPU.

* Before x86 [#bd5da225]

** Intel 4004 and 8008 [#x95db670]

#ref(https://upload.wikimedia.org/wikipedia/commons/9/9c/Datapoint2200img.jpg,center,wrap)

The Datapoint 2200 is one of the first personal computers, designed by [[CTC:https://en.wikipedia.org/wiki/Datapoint]]. In 1969, CTC contracted two companies, Intel and Texas Instruments, to build the CPU of this computer as a single chip. TI was unable to make a reliable part and dropped out, and Intel was unable to meet CTC's deadline. Intel and CTC renegotiated their contract, with CTC keeping its money and Intel keeping the eventually completed processor. In 1970, the Datapoint 2200 was released with a processor built from discrete TTL modules instead of a single chip.

#ref(https://www.intel.com/content/dam/www/public/us/en/images/photography-business/16x9/60592-1971-4004-processor-16x9.jpg.rendition.intel.web.720.405.jpg,center,wrap)

In 1971, Intel released the 4-bit [[Intel 4004:https://www.intel.com/content/www/us/en/history/museum-story-of-intel-4004.html]], its first commercially available microprocessor. The 4004 was designed for a Busicom calculator rather than derived from the Datapoint 2200 CPU. In 1972, Intel finally released the Datapoint 2200 compatible 8-bit CPU, the [[Intel 8008:https://en.wikipedia.org/wiki/Intel_8008]].

** Intel 8080 and its modified models [#gc3b0be7]

After the Intel 4004 and 8008, Intel released their successors, the Intel 4040 and 8080, in 1974. The 4-bit Intel 4040 was actually released later than the 8-bit Intel 8080. One reason for releasing the Intel 4040 was simply to reduce costs for motherboard manufacturers.

The Intel 8080 and its variants were among the most successful CPUs of the 1970s. One widely used variant is the Zilog Z80. The CPU of the Game Boy, the Sharp LR35902, is also a variant of the Intel 8080.

#ref(https://i.imgur.com/pqz6UDKl.png,center,wrap)

Intel also authorized third-party manufacturers such as NEC to produce Intel 8080 compatible CPUs.

#ref(https://i.imgur.com/bBhgRg3l.png,center,wrap)

* x86 Family [#m951cacb]

After the success of the Intel 8080, Intel started to design a 16-bit successor, which later became one of the most successful ISA families: the x86 family.

** Walkthrough [#uf77ceb1]

*** ISA [#hfc7406c]

The x86 architecture kept almost all features of the Intel 8080, including some that applications rarely use, such as the BCD instructions, and extended them with a much richer feature set.

#ref(https://i.stack.imgur.com/VTxd0h.jpg,center,wrap)
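
To make the BCD remark concrete, here is a minimal C sketch of what an addition followed by a decimal adjust (the DAA instruction on the 8080 and x86) achieves for packed BCD, where each byte holds two decimal digits. The function name bcd_add and its interface are hypothetical and chosen only for illustration. It models the arithmetic result, not the exact flag behavior of the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two packed-BCD bytes (two decimal digits each).
 * Models the result of a binary ADD followed by a decimal adjust.
 * *carry_out is set to 1 when the decimal sum exceeds 99. */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry_out) {
    assert(carry_out != NULL);
    /* Both inputs must be valid packed BCD: every nibble is 0..9. */
    assert((a & 0x0F) <= 9 && (a >> 4) <= 9);
    assert((b & 0x0F) <= 9 && (b >> 4) <= 9);

    unsigned sum = (unsigned)a + (unsigned)b;   /* plain binary add */

    /* Low digit overflowed past 9: skip the six unused codes A..F. */
    if ((a & 0x0F) + (b & 0x0F) > 9)
        sum += 0x06;

    /* High digit overflowed past 9: adjust it and report a decimal carry. */
    if (sum > 0x99) {
        sum += 0x60;
        *carry_out = 1;
    } else {
        *carry_out = 0;
    }
    return (uint8_t)sum;
}
```

For example, bcd_add(0x19, 0x28, &c) gives 0x47 (19 + 28 = 47), while a plain binary add would give 0x41.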

*** Operating Modes [#v34bda5f]

- Real-address Mode
- Protected Mode (Introduced in Intel 80286)
- System Management Mode (Introduced in Intel 80386)

One notable feature of x86 CPUs is their operating modes. Modern operating systems such as Linux, FreeBSD, and Windows run in protected mode (or its 64-bit successor, long mode), where the virtual addresses used by programs are mapped onto physical memory, so that the operating system can manage applications and keep their memory accesses isolated from one another.
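
As a contrast with protected mode, real-address mode does no such mapping: the CPU forms a physical address directly from a 16-bit segment and a 16-bit offset. The C sketch below shows that calculation as it behaves on the original 8086, which has 20 address lines. The function name real_mode_address is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: physical address of segment:offset in
 * real-address mode on an 8086 (20 address lines, 1 MiB). */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset) {
    /* The segment is shifted left by 4 bits (multiplied by 16)
     * and the offset is added to it. */
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    /* The largest possible value is FFFF:FFFF = 0x10FFEF. */
    assert(linear <= 0x10FFEFu);

    /* The 8086 drops the 21st bit, so addresses wrap around at 1 MiB. */
    return linear & 0xFFFFFu;
}
```

Different segment:offset pairs can name the same byte: both 0000:7C00 and 07C0:0000 resolve to 0x07C00.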

** x86-16 and x87 [#uaf8a975]

Earlier x86 CPUs relied on a separate floating-point unit (FPU) called the x87. Some third-party units, such as the Cyrix FPU, often outperformed Intel's own FPU. The Intel 80486 introduced a pipelined core and a built-in FPU, which greatly increased the performance of x86 CPUs. One famous example that benefited is the game DOOM: its developers built one of the world's first 3D games with optimizations specific to the 386 and 486 architectures, without the help of a PPU or GPU.

#ref(https://i.imgur.com/z0X4Q09l.png,center,wrap)

** x86-32 (IA-32) [#i5d62354]

After the success of the x86 architecture, Intel extended the ISA with 32-bit support. The official name of the architecture is IA-32 (Intel Architecture, 32-bit). At this stage, Intel stopped using 80x86 product names. Instead, it used Celeron and Pentium for home CPUs, and Xeon for server and workstation CPUs.

The last x86-32 microarchitecture designed by Intel is [[NetBurst:https://en.wikipedia.org/wiki/NetBurst_(microarchitecture)]]. x86 is a CISC architecture, which means that some instructions take dozens or even hundreds of cycles to finish. Extrapolating from Moore's Law, Intel expected to reach up to 10 GHz with the new architecture by increasing the number of pipeline stages. The problem is that on a branch misprediction the whole pipeline must be flushed, which carries a serious performance cost that grows with pipeline depth. As a result, when Intel produced the 130 nm version of the Pentium 3, code-named Tualatin, it outperformed the state-of-the-art NetBurst-based Pentium 4 built on the same 130 nm process. Meanwhile, RISC architectures such as MIPS and ARM were delivering products whose performance came closer and closer to Intel's in the embedded and desktop markets.
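
The trade-off between clock speed and flush cost can be sketched with a simple textbook stall model: every mispredicted branch adds a fixed number of penalty cycles, roughly proportional to pipeline depth. The function names below are hypothetical, and any numbers fed into them are illustrative assumptions rather than measurements of real Pentium 3 or Pentium 4 parts.

```c
#include <assert.h>

/* Hypothetical model: average cycles per instruction when every
 * mispredicted branch costs flush_penalty_cycles extra cycles. */
static double effective_cpi(double base_cpi, double branch_fraction,
                            double mispredict_rate, int flush_penalty_cycles) {
    assert(base_cpi > 0.0);
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
    assert(flush_penalty_cycles >= 0);
    return base_cpi
         + branch_fraction * mispredict_rate * flush_penalty_cycles;
}

/* Average time per instruction in nanoseconds at a clock given in GHz. */
static double ns_per_instruction(double cpi, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return cpi / clock_ghz;
}
```

Under this model, with 20% branches and a 5% misprediction rate, doubling the flush penalty from 10 to 20 cycles raises the CPI from about 1.1 to about 1.2. The deeper pipeline only pays off if its clock speed rises by more than that ratio.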

Finally, Intel conceded that the NetBurst architecture was a failure and went back to the Pentium 3 architecture, with Hyper-threading and MMX support, as the base of its later models.

** x86-64 (amd64) [#e10734cd]

While Intel was struggling with the failure of NetBurst, AMD had already started working on a 64-bit extension of the x86 ISA, known as x86-64, or AMD64 as its official name.

One notable point is that IA-64 (Intel Architecture, 64-bit) is unrelated to x86-64 (amd64). It is actually the name of Intel's EPIC architecture (Itanium), which was a total failure, as mentioned in [[Dependency Graphs]] (advanced).

One very important optimization technique introduced for the x86-64 architecture in the past decade is the use of SIMD instructions. As compilers have improved in recent years, modern compilers such as gcc and clang can optimize many loops into SIMD code automatically, a technique called [[Auto-Vectorization:https://llvm.org/docs/Vectorizers.html]].

For example, consider the following C code.

  int foo(int *A, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
      sum += A[i] + 5;
    return sum;
  }

After compilation with clang 8.0.0 (x86-64, -O3), you would get the following assembly code.

  .LCPI0_0:
          .long   5                       # 0x5
          .long   5                       # 0x5
          .long   5                       # 0x5
          .long   5                       # 0x5
  foo(int*, int):                              # @foo(int*, int)
          test    esi, esi
          jle     .LBB0_1
          mov     ecx, esi
          cmp     esi, 7
          ja      .LBB0_4
          xor     edx, edx
          xor     eax, eax
          jmp     .LBB0_12
  .LBB0_1:
          xor     eax, eax
          ret
  .LBB0_4:
          mov     edx, ecx
          and     edx, -8
          lea     rsi, [rdx - 8]
          mov     rax, rsi
          shr     rax, 3
          add     rax, 1
          mov     r8d, eax
          and     r8d, 1
          test    rsi, rsi
          je      .LBB0_5
          mov     esi, 1
          sub     rsi, rax
          lea     rax, [r8 + rsi]
          add     rax, -1
          pxor    xmm0, xmm0
          xor     esi, esi
          movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [5,5,5,5]
          pxor    xmm2, xmm2
  .LBB0_7:                                # =>This Inner Loop Header: Depth=1
          paddd   xmm0, xmm1
          paddd   xmm2, xmm1
          movdqu  xmm3, xmmword ptr [rdi + 4*rsi]
          movdqu  xmm4, xmmword ptr [rdi + 4*rsi + 16]
          movdqu  xmm5, xmmword ptr [rdi + 4*rsi + 32]
          movdqu  xmm6, xmmword ptr [rdi + 4*rsi + 48]
          paddd   xmm3, xmm1
          paddd   xmm0, xmm3
          paddd   xmm0, xmm5
          paddd   xmm4, xmm1
          paddd   xmm2, xmm4
          paddd   xmm2, xmm6
          add     rsi, 16
          add     rax, 2
          jne     .LBB0_7
          movdqa  xmm3, xmm0
          paddd   xmm3, xmm1
          paddd   xmm1, xmm2
          test    r8, r8
          je      .LBB0_11
  .LBB0_10:
          movdqu  xmm0, xmmword ptr [rdi + 4*rsi]
          paddd   xmm3, xmm0
          movdqu  xmm0, xmmword ptr [rdi + 4*rsi + 16]
          paddd   xmm1, xmm0
          movdqa  xmm0, xmm3
          movdqa  xmm2, xmm1
  .LBB0_11:
          paddd   xmm0, xmm2
          pshufd  xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
          paddd   xmm1, xmm0
          pshufd  xmm0, xmm1, 229         # xmm0 = xmm1[1,1,2,3]
          paddd   xmm0, xmm1
          movd    eax, xmm0
          cmp     rdx, rcx
          je      .LBB0_13
  .LBB0_12:                               # =>This Inner Loop Header: Depth=1
          mov     esi, dword ptr [rdi + 4*rdx]
          lea     eax, [rax + rsi]
          add     eax, 5
          add     rdx, 1
          cmp     rcx, rdx
          jne     .LBB0_12
  .LBB0_13:
          ret
  .LBB0_5:
          movdqa  xmm3, xmmword ptr [rip + .LCPI0_0] # xmm3 = [5,5,5,5]
          xor     esi, esi
          movdqa  xmm1, xmm3
          test    r8, r8
          jne     .LBB0_10
          jmp     .LBB0_11
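
The core of the compiler output above can be read more easily as hand-written SSE2 intrinsics. The sketch below is a simplified approximation, not a transcription of clang's code: it uses one accumulator and adds 5 inside the loop, whereas the compiler unrolls further and folds the constant differently. The names foo_scalar and foo_sse2 are hypothetical, and the code falls back to the scalar loop when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: the same computation as foo in the text. */
static int foo_scalar(const int *A, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (unsigned)A[i] + 5u;
    return (int)sum;
}

/* Hand-vectorized variant: four 32-bit lanes per iteration. */
static int foo_sse2(const int *A, int n) {
    assert(A != NULL || n <= 0);
    unsigned sum = 0;
    int i = 0;
#if defined(__SSE2__)
    const __m128i five = _mm_set1_epi32(5);   /* [5,5,5,5], like .LCPI0_0 */
    __m128i acc = _mm_setzero_si128();        /* pxor xmm0, xmm0 */
    for (; n - i >= 4; i += 4) {
        /* movdqu: unaligned load of A[i..i+3] */
        __m128i v = _mm_loadu_si128((const __m128i *)(A + i));
        /* paddd: lane-wise 32-bit add */
        acc = _mm_add_epi32(acc, _mm_add_epi32(v, five));
    }
    /* Horizontal sum, using the same pshufd constants as the listing. */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, 78));   /* [2,3,0,1] */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, 229));  /* [1,1,2,3] */
    sum = (unsigned)_mm_cvtsi128_si32(acc);                 /* movd eax, xmm0 */
#endif
    /* Scalar tail for the remaining 0..3 elements, like .LBB0_12. */
    for (; i < n; ++i)
        sum += (unsigned)A[i] + 5u;
    return (int)sum;
}
```

Both functions should return the same value for any input. For the array 1 to 10 the result is 55 + 10 * 5 = 105.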

* References [#t9ca68c7]

+ https://youtu.be/HyzD8pNlpwI
+ http://www.pastraiser.com/cpu/gameboy/gameboy_opcodes.html
+ https://stackoverflow.com/questions/6401586/intel-x86-opcode-reference
+ https://www.felixcloutier.com/x86/
+ https://software.intel.com/en-us/articles/intel-sdm
+ https://developer.amd.com/resources/developer-guides-manuals/
+ https://pcper.com/2011/08/yes-netburst-really-was-that-bad-cpu-architectures-tested/
