Jamsa Chapter 7: The Socket Interface

An application program interface (API) is a group of functions that programmers use to develop application programs for a particular computer environment. The UNIX Berkeley socket interface is an API developed at the University of California at Berkeley.

 Berkeley sockets were built into UNIX, but Windows sockets are built as a separate software library.

 Basic concepts of sockets

 Sockets use the open-read-write-close model, similar to file I/O. When a socket is opened, the library returns a file descriptor or socket handle, which identifies a record that contains information about the connection. Sockets differ from file I/O in two essential ways, however.

 Passive and active processes. Unlike file I/O, sockets required a new concept: the server, a process that waits passively. Someone has to be home to "answer the phone" when the active (client) program issues a socket request.

 Connectionless operation. Fixed addressing is used for connection oriented communication of byte streams, but what about IP-style connectionless traffic? We'll see...

 Creating Sockets.

 File descriptors usually refer to a real file (they are valid) upon creation, but sockets (and their handles) are created first and connected afterwards, in two separate operations. For example:

 

socket_handle = socket(protocol_family,socket_type,protocol);
protocol_family: e.g. TCP/IP is represented by the symbolic value PF_INET. Other protocol families include Novell's NetWare and XNS, Xerox's multilayer protocol and an ancestor of NetWare.

 socket_type: datagram (SOCK_DGRAM), byte-stream (SOCK_STREAM), or raw data (SOCK_RAW). The last is used by programs such as Ping to bypass the transport layer and reach protocols such as ICMP, the Internet Control Message Protocol.

 protocol: e.g. TCP is designated by IPPROTO_TCP. The whole call would then look like this:

 

socket_handle = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
The handle 'socket_handle' comes back identifying a formatted data structure, the socket descriptor, but no I/O has actually taken place yet. This data structure will ultimately contain the local and remote IP addresses and the local and remote protocol port numbers. These values are collectively called socket addresses, and at this point the fields are still empty.
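
A minimal C sketch of the creation step, assuming a POSIX system. The helper name open_tcp_socket is hypothetical and not from the chapter. It is only meant to show that the handle is a plain integer and that nothing has been addressed or sent when socket() returns.

```c
#include <stdio.h>
#include <sys/socket.h>   /* socket(), PF_INET, SOCK_STREAM */
#include <netinet/in.h>   /* IPPROTO_TCP */

/* Create a TCP socket and return its handle, or -1 on failure.
 * open_tcp_socket is a hypothetical helper name. */
int open_tcp_socket(void)
{
    int socket_handle = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (socket_handle == -1)
        perror("socket");     /* report why creation failed */
    return socket_handle;     /* no address attached, no I/O done yet */
}
```

A caller would check for -1 and eventually release the handle with close().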

 Connecting a Socket.

 A client will execute code such as the following to initiate an active connection.

 

result = connect(socket_handle, remote_socket_address, address_length);
The first argument is the handle we just created. The second points to a special socket address structure, whose format and length (address_length) depend on the protocol family in use; we'll deal with it later. Its contents identify the remote communications partner. The local ("self") information will be filled in automatically.
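
As a rough illustration of the address structure for the PF_INET family, the following C sketch fills in a struct sockaddr_in and passes it to connect(). It assumes a POSIX system and IPv4; the helper name connect_to and its parameters are hypothetical.

```c
#include <string.h>
#include <arpa/inet.h>    /* htons(), inet_pton() */
#include <netinet/in.h>   /* struct sockaddr_in, IPPROTO_TCP */
#include <sys/socket.h>   /* socket(), connect() */
#include <unistd.h>       /* close() */

/* Open a TCP connection to a dotted-decimal IPv4 address and port.
 * Returns the connected handle, or -1 on any failure.
 * connect_to is a hypothetical helper name. */
int connect_to(const char *ip_text, unsigned short port)
{
    struct sockaddr_in remote;            /* the remote socket address */
    memset(&remote, 0, sizeof remote);
    remote.sin_family = AF_INET;
    remote.sin_port = htons(port);        /* ports travel in network byte order */
    if (inet_pton(AF_INET, ip_text, &remote.sin_addr) != 1)
        return -1;                        /* text was not a valid IPv4 address */

    int socket_handle = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (socket_handle == -1)
        return -1;

    /* address_length is simply the size of the structure we filled in */
    if (connect(socket_handle, (struct sockaddr *)&remote, sizeof remote) == -1) {
        close(socket_handle);
        return -1;
    }
    return socket_handle;                 /* local address was chosen for us */
}
```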

 The other end: servers, and connectionless systems.

 Both for servers, which will wait passively for traffic, and for connectionless systems (which ALWAYS wait for traffic), we have to create an entity to listen to a protocol port. That is done as follows:

 

result = bind(socket_handle, local_socket_address,
address_length)

This can be seen to be quite similar to the "connect" function.
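
To make the similarity concrete, here is a hedged C sketch of the bind step for the PF_INET family on a POSIX system. The helper name bind_local is hypothetical. Note that the address structure now describes the local end rather than the remote one.

```c
#include <string.h>
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <sys/socket.h>   /* socket(), bind() */
#include <unistd.h>       /* close() */

/* Create a socket of the given type (SOCK_STREAM or SOCK_DGRAM) and
 * bind it to a local protocol port. Returns the handle, or -1.
 * bind_local is a hypothetical helper name. */
int bind_local(int socket_type, unsigned short port)
{
    struct sockaddr_in local;                   /* the local socket address */
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    local.sin_port = htons(port);               /* 0 lets the system pick a port */

    int socket_handle = socket(PF_INET, socket_type, 0);  /* 0: default protocol */
    if (socket_handle == -1)
        return -1;

    if (bind(socket_handle, (struct sockaddr *)&local, sizeof local) == -1) {
        close(socket_handle);
        return -1;
    }
    return socket_handle;
}
```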

 

Transmitting Data through A Socket

Berkeley sockets use five functions to transmit data: three for connected sockets and two for unconnected sockets.

Connected sockets' commands don't specify an address, because the "connect" call has already specified it. Example:

 

result=write(socket_handle, message_buffer, buffer_length)

This sends data from a simple buffer. But a more structured way to send data is provided by writev, which uses a mixed list of pointers and block sizes to designate a collection of data sources.

 

result=writev(socket_handle, io_vector, vector_length)

In this case, io_vector is a table of paired 32-bit values. The first value of each pair points to a data block; the second tells how long that block is. The next pair describes another data block, and so on. "vector_length" tells how many entries are in the table (not bytes or words, but separate data blocks).

 The third means of transmitting data through connected sockets is with the send function, which is similar to the write function.

 

result=send(socket_handle, message_buffer, buffer_length,
special_flags)

The main difference is the special flags, which can be used to do things such as designate urgent or "out-of-band" data.

 The result of all these operations is an integer value telling how many bytes were transmitted through the socket. If an error occurs, the returned value is -1.
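
Because the result is a byte count, it can be smaller than buffer_length, so callers often loop until everything has gone out. A minimal C sketch of that idea, assuming a POSIX system and a connected socket; the helper name send_all is hypothetical, and retrying after an interrupted call is left out for brevity.

```c
#include <sys/types.h>    /* ssize_t, size_t */
#include <sys/socket.h>   /* send() */

/* Keep calling send() until the whole buffer has been transmitted.
 * Returns 0 on success, -1 as soon as send() reports an error.
 * send_all is a hypothetical helper name. */
int send_all(int socket_handle, const char *message_buffer, size_t buffer_length)
{
    size_t sent = 0;

    while (sent < buffer_length) {
        ssize_t result = send(socket_handle, message_buffer + sent,
                              buffer_length - sent, 0);   /* 0: no special flags */
        if (result == -1)
            return -1;             /* error: give up */
        sent += (size_t)result;    /* result is the number of bytes accepted */
    }
    return 0;
}
```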

Connectionless sockets use other commands, to wit:

 result=sendto(socket_handle, message_buffer, buffer_length,
special_flags, socket_address_structure,
address_structure_length)

This is like the send command except you provide the address via two additional fields.

 Sendmsg is analogous to writev in that it uses a more elaborate storage mechanism for the outgoing message, which includes both the data and the address to which to send the data.

result=sendmsg(socket_handle, message_structure,
special_flags)
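
On POSIX systems the message structure is a struct msghdr, which carries both the destination address and an iovec table like the one writev uses. The C sketch below assumes IPv4 and an unconnected datagram socket; the helper name send_in_parts is hypothetical.

```c
#include <string.h>
#include <netinet/in.h>   /* struct sockaddr_in */
#include <sys/socket.h>   /* struct msghdr, sendmsg() */
#include <sys/types.h>    /* ssize_t */
#include <sys/uio.h>      /* struct iovec */

/* Send two buffers as one datagram to dest using sendmsg().
 * Returns the number of bytes sent, or -1 on error.
 * send_in_parts is a hypothetical helper name. */
ssize_t send_in_parts(int socket_handle, const struct sockaddr_in *dest,
                      const char *part1, const char *part2)
{
    struct iovec parts[2];
    struct msghdr message_structure;

    parts[0].iov_base = (void *)part1;
    parts[0].iov_len  = strlen(part1);
    parts[1].iov_base = (void *)part2;
    parts[1].iov_len  = strlen(part2);

    memset(&message_structure, 0, sizeof message_structure);
    message_structure.msg_name    = (void *)dest;    /* where to send */
    message_structure.msg_namelen = sizeof *dest;    /* address length */
    message_structure.msg_iov     = parts;           /* what to send */
    message_structure.msg_iovlen  = 2;               /* number of blocks */

    return sendmsg(socket_handle, &message_structure, 0);   /* 0: no flags */
}
```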

 

Receiving Data Through A Socket

There are five receiving functions, which correspond to the five transmitting functions described above, but you don't have to USE corresponding functions to communicate. You do have to use connection-oriented receivers with connection-oriented senders, and connectionless receivers with connectionless senders.

Transmit Function    Corresponding Receive Function
send                 recv
write                read
writev               readv
sendto               recvfrom
sendmsg              recvmsg

The connection-oriented process works out via the following sequence of events at the server and client sites. Note that not all the commands in this example (e. g. listen(), accept()) have been explained yet.

 
Connection Oriented Server                Connection Oriented Client
socket()                                  socket()
bind()
listen()
accept()
  blocks until it receives a
  client connection request
           <-- negotiate connection -->   connect()
read()     <----- data (request) ------   write()
  process the service request
write()    ------ data (reply) ------->   read()

First, the server and the client both create sockets. Then the server binds its socket to a local protocol port with the bind() function. The "listen" command tells the server to stand ready for incoming connections and to acknowledge them. The "accept" command tells the server to accept any offered connection.

 The client doesn't call the bind function, because it doesn't actually need to specify a local port or IP address. The client will directly communicate via the connect() and write() and read() functions. Connect() puts the local port and IP information into the socket descriptor.
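
The sequence above can be walked through in C. This is only a sketch under stated assumptions: a POSIX system, both ends running inside one process over the loopback address (a real server and client would be separate programs), a short non-empty request, and a server that simply echoes the request back. The function name stream_round_trip is hypothetical, and error paths leak handles for brevity.

```c
#include <string.h>
#include <arpa/inet.h>    /* htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <sys/socket.h>
#include <unistd.h>       /* read(), write(), close() */

/* Run one request/reply exchange over TCP on the loopback address.
 * The reply is stored NUL-terminated in reply (reply_size must be > 0).
 * Returns 0 on success, -1 on any failure. Hypothetical helper. */
int stream_round_trip(const char *request, char *reply, size_t reply_size)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    char buf[128];

    /* Server side: socket(), bind(), listen() */
    int listener = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the system choose a port */
    if (listener == -1 ||
        bind(listener, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(listener, 1) == -1 ||
        getsockname(listener, (struct sockaddr *)&addr, &addr_len) == -1)
        return -1;

    /* Client side: socket(), connect(); no bind() needed */
    int client = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (client == -1 ||
        connect(client, (struct sockaddr *)&addr, sizeof addr) == -1)
        return -1;

    /* Server side: accept() hands back a new handle for this client */
    int conn = accept(listener, NULL, NULL);
    if (conn == -1)
        return -1;

    /* Client writes the request; server reads it */
    if (write(client, request, strlen(request)) == -1)
        return -1;
    ssize_t n = read(conn, buf, sizeof buf);
    if (n <= 0)
        return -1;

    /* Server "processes" the request by echoing it; client reads the reply */
    if (write(conn, buf, (size_t)n) == -1)
        return -1;
    ssize_t m = read(client, reply, reply_size - 1);
    if (m < 0)
        return -1;
    reply[m] = '\0';

    close(conn);
    close(client);
    close(listener);
    return 0;
}
```

The single-process layout works here only because listen() queues the connection request until accept() is called.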

 The connectionless version of this transaction looks like this:

 
Connectionless Server                     Connectionless Client
socket()                                  socket()
bind()                                    bind()
recvfrom()
  blocks until it receives
  data from a client
           <----- data (request) ------   sendto()
  process the service request
sendto()   ------ data (reply) ------->   recvfrom()

This all makes sense in terms of the description above.
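
A hedged C sketch of the connectionless exchange, again assuming a POSIX system with both ends inside one process on the loopback address, and a server that echoes the request back. The names bound_udp_socket and datagram_round_trip are hypothetical. Requests longer than the 128-byte buffer would be truncated.

```c
#include <string.h>
#include <arpa/inet.h>    /* htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <sys/socket.h>
#include <unistd.h>       /* close() */

/* socket() + bind(): create a UDP socket on a system-chosen loopback
 * port and report the chosen address through *addr. Hypothetical helper. */
static int bound_udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int h = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (h == -1)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                   /* let the system choose a port */
    if (bind(h, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        getsockname(h, (struct sockaddr *)addr, &len) == -1) {
        close(h);
        return -1;
    }
    return h;
}

/* One request/reply exchange over UDP. The reply is stored NUL-terminated
 * in reply (reply_size must be > 0). Returns 0 on success, -1 on failure. */
int datagram_round_trip(const char *request, char *reply, size_t reply_size)
{
    struct sockaddr_in server_addr, client_addr, from;
    socklen_t from_len = sizeof from;
    char buf[128];
    int status = -1;

    int server = bound_udp_socket(&server_addr);
    int client = bound_udp_socket(&client_addr);

    /* Client: sendto() names the destination on every call */
    if (server != -1 && client != -1 &&
        sendto(client, request, strlen(request), 0,
               (struct sockaddr *)&server_addr, sizeof server_addr) != -1) {
        /* Server: recvfrom() returns the data and the sender's address */
        ssize_t n = recvfrom(server, buf, sizeof buf, 0,
                             (struct sockaddr *)&from, &from_len);
        /* Server: reply goes to whoever sent the request */
        if (n >= 0 &&
            sendto(server, buf, (size_t)n, 0,
                   (struct sockaddr *)&from, from_len) != -1) {
            ssize_t m = recvfrom(client, reply, reply_size - 1, 0, NULL, NULL);
            if (m >= 0) {
                reply[m] = '\0';
                status = 0;
            }
        }
    }
    if (server != -1) close(server);
    if (client != -1) close(client);
    return status;
}
```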

 

Using Sockets with Servers

Servers can be iterative or concurrent; we assume concurrent for this discussion. The "listen" function prepares the socket system to handle multiple (essentially) simultaneous requests for service, by placing incoming service requests in a queue.
result = listen(socket_handle, queue_length)

After the queue gets full, the socket will refuse additional connections and the client will receive an error message.

 Servers have no idea which client will request services, so the server cannot specify the client's host address or protocol port in advance. The "accept" function blocks the server until an incoming connection arrives.

 Accept has three parameters.

 result=accept(socket_handle, socket_address, address_length)

 An iterative server would call 'accept', service the request, and then call 'accept' again.

 A concurrent server should have a "wildcard" port, for receiving service requests in the form of TCP segments specifying ports that do not currently exist. It would create a new process to handle each service request. The child process receives a copy of the new socket and manages the service request. Then the master server closes its copy of the socket and calls 'accept' again.

Checking up on Sockets

 The 'select' function is capable of checking the status of multiple sockets without actually reading their data. It can be used to check for errors, or to determine the number and identity of sockets that are ready for I/O.

 

Mini-Quiz for Chapter 7.

1. Discuss the passive role of servers and the 'bind' function, with regard to connection-oriented communication.

 2. Does the 'socket' function actually establish a connection to a remote host? What is this function's primary effect?

 3. Compare the 'write' and 'writev' functions.