Why do we need to be careful about binary data formats? Because different computers store data differently, and we want those computers to interoperate and exchange data with one another.
In fact, this is such a big problem that FTP has a binary mode that forces files to be transferred exactly as they are; it is primarily useful between machines of the same type, and usually the same operating system.
Data files may contain one of two types of data: text (typically ASCII, or JIS, Shift-JIS, etc.) or binary data, meaning numbers. One useful tool for seeing the contents of files is the Unix utility od.
[rdv@localhost network-programming-in-c]$ od -t x4z lec07.html | more
0000000 783f3c0a 76206c6d 69737265 223d6e6f >.<?xml version="<
0000020 22302e31 636e6520 6e69646f 69223d67 >1.0" encoding="i<
0000040 322d6f73 2d323230 3f22706a 213c0a3e >so-2022-jp"?>.<!<
0000060 54434f44 20455059 6c6d7468 42555020 >DOCTYPE html PUB<
[rdv@localhost network-programming-in-c]$ od -t x1z lec07.html | more
0000000 0a 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 >.<?xml version="<
0000020 31 2e 30 22 20 65 6e 63 6f 64 69 6e 67 3d 22 69 >1.0" encoding="i<
0000040 73 6f 2d 32 30 32 32 2d 6a 70 22 3f 3e 0a 3c 21 >so-2022-jp"?>.<!<
0000060 44 4f 43 54 59 50 45 20 68 74 6d 6c 20 50 55 42 >DOCTYPE html PUB<
The simplest way to make data transportable is to convert it from binary to ASCII text, using a function like printf() instead of write(). However, this approach has two large disadvantages:
Floating point numbers present a special problem, because different types of processors may represent them differently. Simply maintaining byte order is not enough.
Sun Microsystems developed NFS, the Network File System, in the mid-1980s. They needed a way to correctly transfer data among their machines. (Given that all of their machines at the time used the same processor type, this was a visionary, and very fortunate, decision.) They developed XDR, the eXternal Data Representation, for use in RPC, or Remote Procedure Call. XDR serializes data of different types:
The most basic APIs for converting data from host format to network format and back are htonl() and ntohl() for 32-bit integers, and htons() and ntohs() for 16-bit integers.
Again, for examples, we will work from the IBM article.
When transporting mixed types of data, such as text and binary, the sender and receiver must agree on the boundaries of the data.
Some of you may have programmed in Java already. If so, you may not even have been aware of the issues around binary data representations, because Java RMI takes care of them for you. Java defines its class files and serialized data streams to be big-endian regardless of the host processor, and uses that format for transport across the network as well.
CORBA is a distributed object system that also handles the details internally for you. CORBA is quite complex.