Difference between revisions of "Direct communication"

From MosixWiki
Jump to: navigation, search
(New page: DIRECT COMMUNICATION(M7) MOSIX Description DIRECT COMMUNICATION(M7) '''NAME''' DIRECT COMMUNICATION - migratable sockets between MOSIX processes '''PURPOSE''' ...)
 
Line 14: Line 14:
 
     runs on node D, then the message has to pass over the network from B to A
 
     runs on node D, then the message has to pass over the network from B to A
 
     to C to D.  Using direct communication, the message will pass directly
 
     to C to D.  Using direct communication, the message will pass directly
     send messages to mailboxes of other processes anywhere in the Grid (that
+
     send messages to mailboxes of other processes anywhere within the multi-    cluster Grid (that are willing to accept them).
    are willing to accept them).
+
 
    
 
    
 
     Direct communication makes the location of processes transparent, so the
 
     Direct communication makes the location of processes transparent, so the
Line 21: Line 20:
 
     them by their home-node and process-ID (PID) in their home-node.
 
     them by their home-node and process-ID (PID) in their home-node.
 
    
 
    
      Direct communication guarantees that the order of messages per receiver
+
    Direct communication guarantees that the order of messages per receiver
 
     is preserved, even when the sender(s) and receiver migrate - no matter
 
     is preserved, even when the sender(s) and receiver migrate - no matter
      where to and how many times they migrate.
+
    where to and how many times they migrate.
 
      
 
      
 
  '''SENDING MESSAGES'''
 
  '''SENDING MESSAGES'''
Line 143: Line 142:
 
    
 
    
 
     There is one more tool to check whether the last message arrived from the
 
     There is one more tool to check whether the last message arrived from the
     local cluster or not (eg. from the rest of the Grid):
+
     local cluster or not (e.g., from other clusters in the Grid):
 
           in_my_cluster = open("/proc/self/in_cluster", O_CREAT, sender_home);
 
           in_my_cluster = open("/proc/self/in_cluster", O_CREAT, sender_home);
 
     This is a general-purpose tool that can be used to find whether any given
 
     This is a general-purpose tool that can be used to find whether any given
Line 168: Line 167:
 
    
 
    
 
     ECONNABORTED
 
     ECONNABORTED
             The home-node of the receiver is no longer in our Grid.
+
             The home-node of the receiver is no longer in our multi-            cluster Grid.
 
    
 
    
 
     EMFILE  The maximum of 128 direct communicaiton file-descriptors is
 
     EMFILE  The maximum of 128 direct communicaiton file-descriptors is

Revision as of 13:41, 23 November 2007

DIRECT COMMUNICATION(M7)       MOSIX Description      DIRECT COMMUNICATION(M7)
  
NAME
    DIRECT COMMUNICATION - migratable sockets between MOSIX processes
  
PURPOSE
    Normally, MOSIX processes do all their I/O (and most system-calls) via
    their home-node: this can be slow because operations are limited by the
    network speed and latency.  Direct communication allows processes to pass
    messages directly between them, bypassing their home-nodes.
  
    For example, if process X whose home-node is A and runs on node B wishes
    to send a message over a socket to process Y whose home-node is C and
    runs on node D, then the message has to pass over the network from B to A
    to C to D.  Using direct communication, the message will pass directly
    send messages to mailboxes of other processes anywhere within the multi-     cluster Grid (that are willing to accept them).
  
    Direct communication makes the location of processes transparent, so the
    senders do not need to know where the receivers run, but only to identify
    them by their home-node and process-ID (PID) in their home-node.
  
    Direct communication guarantees that the order of messages per receiver
    is preserved, even when the sender(s) and receiver migrate - no matter
    where to and how many times they migrate.
    
SENDING MESSAGES
    To start sending messages to another process, use:
          them = open("/proc/mosix/mbox/{a.b.c.d}/{pid}", 1);
    where {a.b.c.d} is the IP address of the receiver's home-node and {pid}
    is the process-ID of the receiver.  To send messages to a process with
    the same home-node, you can use 0.0.0.0 instead of the local IP address
    (this is even preferable, because it allows the communication to proceed
    in the rare event when the home-node is shut-down from its cluster).
    
    The returned value (them) is not a standard (POSIX) file-descriptor: it
    can only be used within the following system calls:
  
          w = write(them, message, length);
          fcntl(them, F_SETFL, O_NONBLOCK);
          fcntl(them, F_SETFL, 0);
          dup2(them, 1);
          dup2(them, 2);
          close(them);
    
    Zero-length messages are allowed.
    
    Each process may at any time have up to 128 open direct communication
    file-descriptors for sending messages to other processes.  These file-
    descriptors are inherited by child processes (after fork(2)).
    
    When dup2 is used as above, the corresponding file-descriptor (1 for
    standard-output; 2 for standard-error) is associated with sending mes-
    sages to the same process as them.  In that case, only the above calls
    (write, fcntl, close, but not dup2) can then be used with that descriptor.
   
RECEIVING MESSAGES
    To start receiving messages, create a mailbox:
          my_mbox = open("/proc/mosix/mybox", O_CREAT, flags);
    where flags is any combination (bitwise OR) of the following:
   
    1    Allow receiving messages from other users of the same group (GID).
    2    Allow receiving messages from all other users.
    4    Allow receiving messages from processes with other home-nodes.
    8    Do not delay: normally when attempting to receive a message and no
         fitting message was received, the call blocks until either a message
         or a signal arrives, but with this flag, the call returns immedi-
         ately a value of -1 (with errno set to EAGAIN).
    16   Receive a SIGIO signal (See signal(7)) when a message is ready to be
         read (for assynchroneous operation).
    32   Normally, when attempting to read and the next message does not fit
         in the read buffer (the message length is bigger than the count
         parameter of the read(2) system-call), the next message is trun-
         cated.  When this bit is set, the first message that fits the read-
         buffer will be read (even if out of order): if none of the pending
         messages fits the buffer, the receiving process either waits for a
         new message that fits the buffer to arrive, or if bit 8 ("do not
         delay") is also set, returns -1 with errno set to EAGAIN.
    64   Treat zero-length messages as an end-of-file condition: once a zero-
         length message is read, all further reads will return 0 (pending and
         future messages are not deleted, so they can still be read once this
         flag is cleared).
   
    The returned value (my_mbox) is not a standard (POSIX) file-descriptor:
    it can only be used within the following system calls:
   
          r = read(my_mbox, buf, count);
          dup2(my_mbox, 0);
          close(my_mbox);
   
    Reading my_mbox always reads a single message at a time, even when count
    allows reading more messages.  A message can have zero-length, but count
    must be positive.
   
    unlike in "SENDING MESSAGES" above, my_mbox is NOT inherited by child
    processes.
   
    When dup2 is used as above, file-descriptor 0 (standard-input) is associ-
    ated with receiving messages from other processes.  In that case, only
    the above calls (read, close, but not dup2) can then be used with file-
    descriptor 0.
   
    Closing my_mbox (or close(0) if dup2(my_mbox, 0) was used - whichever is
    closed last) discards all pending messages.
   
    To change the flags of the mailbox without losing any pending messages,
    open it again (without using close):
   
          my_mbox = open("/proc/mosix/mybox", O_CREAT, new_flags);
   
    Note that when removing permission-flags (1, 2 and 4) from new_flags,
    messages that were already sent earlier will still arrive, even from
    senders that are no longer allowed to send messages to the current pro-
    cess.  Re-opening always returns the same value (my_mbox) as the initial
    open (unless an error occurs and -1 is returned).  Also note that if
    dup2(my_mbox, 0) was used, new_flags will immediately apply to file-
    descriptor 0 as well.
   
    Extra information is available about the latest message that was read.
    To get this information, you should first define the following macro:
          static inline unsigned int GET_IP(char *file_name)
          {
                int ip = open(file_name, 0);
                return((unsigned int)((ip==-1 && errno>255) ? -errno: ip));
          }
  
    To find the IP address of the sender's home, use:
          sender_home = GET_IP("/proc/self/sender_home");
   
    To find the process-ID (PID) of the sender, use:
          sender_pid = open("/proc/self/sender_pid", 0);
 
    To find the IP address of the node where the sender was running when the
    message was sent, use:
          sender_location = GET_IP("/proc/self/sender_location");
    (this can be used, for example, to make a manual migration request in
    order to bring together communicating processes to the same node)
 
    To find the length of the last message, use:
          bytes = open("/proc/self/message_length", 0);
    (this makes it possible to detect truncated messages: if the last message
    was truncated, bytes will contain the original length)
 
    There is one more tool to check whether the last message arrived from the
    local cluster or not (e.g., from other clusters in the Grid):
         in_my_cluster = open("/proc/self/in_cluster", O_CREAT, sender_home);
    This is a general-purpose tool that can be used to find whether any given
    IP address is part of the caller's cluster: it is not strictly part of
    direct communication and should not be called too often (eg. per
    received-message) because it requires the home-node's assistance, so
    using it extensively would defeat the purpose and efficiency of direct
    communication.
 
ERRORS
    Sender errors:
 
    ENOENT  Invalid pathname in open: the specified IP address is not part of
            this cluster/Grid, or the process-ID is out of range (must be
            2-32767).
 
    ESRCH   No such process (this error is detected only when attempting to
            send - not when opening the connection).
 
    EACCES  No permission to send to that process.
 
    ENOSPC  Non-blocking (O_NONBLOCK) was requested and the receiver has no
            more space to accept this message - perhaps try again later.
  
    ECONNABORTED
            The home-node of the receiver is no longer in our multi-             cluster Grid.
 
    EMFILE  The maximum of 128 direct communicaiton file-descriptors is
            already in use.
 
    EINVAL  When opening, the second parameter does not contain the bit "1";
            When writing, the length is negative or more than 32MB.
 
    ETIMEDOUT
            Failed to establish connection with the mail-box managing daemon
            (postald).
 
    ECONNREFUSED
            The mail-box managing (postald) refused to serve the call (proba-
            bly a MOSIX installation error).
 
    EIO     Communication breakdown with the mail-box managing daemon (postald).
 
    Receiver errors:
 
    EAGAIN  No message is currently available for reading and the "Do not
            delay" flag is set.
 
    EXFULL  Messages were possibly lost (usually due to insufficient memory):
            the receiver may still be able to receive new messages.
 
    ENOMSG  The receiver had insufficient memory to store the last message.
            Despite this error, it is still possible to find out who sent the
            last message and its original length.
 
    Errors that are common to both sender and receiver:
 
    EINTR   Read/write interrupted by a signal.
 
    ENOMEM  Insufficient memory to complete the operation.
 
    EFAULT  Bad read/write buffer address.
 
    ENETUNREACH
            Could not establish a connection with the mail-box managing dea-
            mon (postald).
 
    ECONNRESET
            Connection lost with the mail-box managing daemon (postald).
 
POSSIBLE APPLICATIONS
    The scope of direct communication is very wide: almost any program that
    requires communication between related processes can benefit.  Following
    are a few examples:
 
    1.   Use direct communication within standard communication packages and
         libraries, such as MPI.
 
    2.   Pipe-like applications where one process' output is the other's
         input: write your own code or use the existing mospipe(1) MOSIX
         utility.
 
    3.   Direct communiction can be used to implement fast I/O for migrated
         processes (with the cooperation of a local process on the node where
         the migrated process is running).  In particular, it can be used to
         give migrated processes access to data from a common NFS server
         without causing their home-node to become a bottleneck.
 
LIMITATIONS
    Processes that are involved in direct communication (having open file-
    descriptors for either sending or receiving messages) cannot be check-
    pointed and cannot execute mosrun recursively or native (see mosrun(1)).
 
SEE ALSO
    mosrun(1), mospipe(1), mosix(7).
 
MOSIX                              May 2006                              MOSIX