Direct communication
From MosixWiki
DIRECT COMMUNICATION(M7) MOSIX Description DIRECT COMMUNICATION(M7) NAME DIRECT COMMUNICATION - migratable sockets between MOSIX processes PURPOSE Normally, MOSIX processes do all their I/O (and most system-calls) via their home-node: this can be slow because operations are limited by the network speed and latency. Direct communication allows processes to pass messages directly between them, bypassing their home-nodes. For example, if process X whose home-node is A and runs on node B wishes to send a message over a socket to process Y whose home-node is C and runs on node D, then the message has to pass over the network from B to A to C to D. Using direct communication, the message will pass directly send messages to mailboxes of other processes anywhere in the Grid (that are willing to accept them). Direct communication makes the location of processes transparent, so the senders do not need to know where the receivers run, but only to identify them by their home-node and process-ID (PID) in their home-node. Direct communication guarantees that the order of messages per receiver is preserved, even when the sender(s) and receiver migrate - no matter where to and how many times they migrate. SENDING MESSAGES To start sending messages to another process, use: them = open("/proc/mosix/mbox/{a.b.c.d}/{pid}", 1); where {a.b.c.d} is the IP address of the receiver's home-node and {pid} is the process-ID of the receiver. To send messages to a process with the same home-node, you can use 0.0.0.0 instead of the local IP address (this is even preferable, because it allows the communication to proceed in the rare event when the home-node is shut-down from its cluster). The returned value (them) is not a standard (POSIX) file-descriptor: it can only be used within the following system calls: w = write(them, message, length); fcntl(them, F_SETFL, O_NONBLOCK); fcntl(them, F_SETFL, 0); dup2(them, 1); dup2(them, 2); close(them); Zero-length messages are allowed. Each process may at any time have up to 128 open direct communication file-descriptors for sending messages to other processes. These file- descriptors are inherited by child processes (after fork(2)). When dup2 is used as above, the corresponding file-descriptor (1 for standard-output; 2 for standard-error) is associated with sending mes- sages to the same process as them. In that case, only the above calls (write, fcntl, close, but not dup2) can then be used with that descriptor. RECEIVING MESSAGES To start receiving messages, create a mailbox: my_mbox = open("/proc/mosix/mybox", O_CREAT, flags); where flags is any combination (bitwise OR) of the following: 1 Allow receiving messages from other users of the same group (GID). 2 Allow receiving messages from all other users. 4 Allow receiving messages from processes with other home-nodes. 8 Do not delay: normally when attempting to receive a message and no fitting message was received, the call blocks until either a message or a signal arrives, but with this flag, the call returns immedi- ately a value of -1 (with errno set to EAGAIN). 16 Receive a SIGIO signal (See signal(7)) when a message is ready to be read (for assynchroneous operation). 32 Normally, when attempting to read and the next message does not fit in the read buffer (the message length is bigger than the count parameter of the read(2) system-call), the next message is trun- cated. When this bit is set, the first message that fits the read- buffer will be read (even if out of order): if none of the pending messages fits the buffer, the receiving process either waits for a new message that fits the buffer to arrive, or if bit 8 ("do not delay") is also set, returns -1 with errno set to EAGAIN. 64 Treat zero-length messages as an end-of-file condition: once a zero- length message is read, all further reads will return 0 (pending and future messages are not deleted, so they can still be read once this flag is cleared). The returned value (my_mbox) is not a standard (POSIX) file-descriptor: it can only be used within the following system calls: r = read(my_mbox, buf, count); dup2(my_mbox, 0); close(my_mbox); Reading my_mbox always reads a single message at a time, even when count allows reading more messages. A message can have zero-length, but count must be positive. unlike in "SENDING MESSAGES" above, my_mbox is NOT inherited by child processes. When dup2 is used as above, file-descriptor 0 (standard-input) is associ- ated with receiving messages from other processes. In that case, only the above calls (read, close, but not dup2) can then be used with file- descriptor 0. Closing my_mbox (or close(0) if dup2(my_mbox, 0) was used - whichever is closed last) discards all pending messages. To change the flags of the mailbox without losing any pending messages, open it again (without using close): my_mbox = open("/proc/mosix/mybox", O_CREAT, new_flags); Note that when removing permission-flags (1, 2 and 4) from new_flags, messages that were already sent earlier will still arrive, even from senders that are no longer allowed to send messages to the current pro- cess. Re-opening always returns the same value (my_mbox) as the initial open (unless an error occurs and -1 is returned). Also note that if dup2(my_mbox, 0) was used, new_flags will immediately apply to file- descriptor 0 as well. Extra information is available about the latest message that was read. To get this information, you should first define the following macro: static inline unsigned int GET_IP(char *file_name) { int ip = open(file_name, 0); return((unsigned int)((ip==-1 && errno>255) ? -errno: ip)); } To find the IP address of the sender's home, use: sender_home = GET_IP("/proc/self/sender_home"); To find the process-ID (PID) of the sender, use: sender_pid = open("/proc/self/sender_pid", 0); To find the IP address of the node where the sender was running when the message was sent, use: sender_location = GET_IP("/proc/self/sender_location"); (this can be used, for example, to make a manual migration request in order to bring together communicating processes to the same node) To find the length of the last message, use: bytes = open("/proc/self/message_length", 0); (this makes it possible to detect truncated messages: if the last message was truncated, bytes will contain the original length) There is one more tool to check whether the last message arrived from the local cluster or not (eg. from the rest of the Grid): in_my_cluster = open("/proc/self/in_cluster", O_CREAT, sender_home); This is a general-purpose tool that can be used to find whether any given IP address is part of the caller's cluster: it is not strictly part of direct communication and should not be called too often (eg. per received-message) because it requires the home-node's assistance, so using it extensively would defeat the purpose and efficiency of direct communication. ERRORS Sender errors: ENOENT Invalid pathname in open: the specified IP address is not part of this cluster/Grid, or the process-ID is out of range (must be 2-32767). ESRCH No such process (this error is detected only when attempting to send - not when opening the connection). EACCES No permission to send to that process. ENOSPC Non-blocking (O_NONBLOCK) was requested and the receiver has no more space to accept this message - perhaps try again later. ECONNABORTED The home-node of the receiver is no longer in our Grid. EMFILE The maximum of 128 direct communicaiton file-descriptors is already in use. EINVAL When opening, the second parameter does not contain the bit "1"; When writing, the length is negative or more than 32MB. ETIMEDOUT Failed to establish connection with the mail-box managing daemon (postald). ECONNREFUSED The mail-box managing (postald) refused to serve the call (proba- bly a MOSIX installation error). EIO Communication breakdown with the mail-box managing daemon (postald). Receiver errors: EAGAIN No message is currently available for reading and the "Do not delay" flag is set. EXFULL Messages were possibly lost (usually due to insufficient memory): the receiver may still be able to receive new messages. ENOMSG The receiver had insufficient memory to store the last message. Despite this error, it is still possible to find out who sent the last message and its original length. Errors that are common to both sender and receiver: EINTR Read/write interrupted by a signal. ENOMEM Insufficient memory to complete the operation. EFAULT Bad read/write buffer address. ENETUNREACH Could not establish a connection with the mail-box managing dea- mon (postald). ECONNRESET Connection lost with the mail-box managing daemon (postald). POSSIBLE APPLICATIONS The scope of direct communication is very wide: almost any program that requires communication between related processes can benefit. Following are a few examples: 1. Use direct communication within standard communication packages and libraries, such as MPI. 2. Pipe-like applications where one process' output is the other's input: write your own code or use the existing mospipe(1) MOSIX utility. 3. Direct communiction can be used to implement fast I/O for migrated processes (with the cooperation of a local process on the node where the migrated process is running). In particular, it can be used to give migrated processes access to data from a common NFS server without causing their home-node to become a bottleneck. LIMITATIONS Processes that are involved in direct communication (having open file- descriptors for either sending or receiving messages) cannot be check- pointed and cannot execute mosrun recursively or native (see mosrun(1)). SEE ALSO mosrun(1), mospipe(1), mosix(7). MOSIX May 2006 MOSIX