Mosix

From MosixWiki
Revision as of 12:16, 13 November 2007 by Amnon

MOSIX(M7)                      MOSIX Description                     MOSIX(M7)
 
NAME
     MOSIX - sharing the power of clusters and multiclusters (grids)
     
INTRODUCTION
    MOSIX is a generic solution for dynamic management of resources in a
    cluster or in a grid. MOSIX allows users to draw the most out of all the
    connected computers, including utilization of idle computers.
  
    At the core of MOSIX are adaptive resource sharing algorithms, applying
    preemptive process migration based on processor loads, memory and I/O
    demands of the processes, thus causing the cluster or the grid to work
    cooperatively similar to a single computer with many processors.
  
    The "cluster" concept of MOSIX need not correspond to a particular con-
    figuration of computers: each MOSIX cluster may range from a single work-
    station to a large combination of computers - workstations, servers,
    blades, multi-core computers, etc. possibly of different speeds and num-
    ber of processors.
  
    A MOSIX "grid" is a collection of clusters that belong to different enti-
    ties (owners) who wish to share their resources subject to certain admin-
    istrative conditions, the most prevailing condition being that when an
    owner needs its computers - these computers are returned immediately to
    the exclusive use of their owner. An owner can also assign priorities to
    guest processes of other owners, defining who can use their computers and
    when.  Typically, an owner is an individual user, a group of users or a
    department that owns the computers.  The grid is usually restricted, due
    to trust and security reasons, to a single organization, possibly in var-
    ious sites/branches, even across the world.
  
    MOSIX supports dynamic grid configurations, where clusters can join and
    leave the grid at any time.  When there are plenty of resources in the
    grid, the MOSIX queuing system allows more processes to start.  When
    resources become scarce (because other clusters leave or claim their
    resources and processes must migrate back to their home-clusters), MOSIX
    has a freezing feature that can automatically freeze excess processes to
    prevent memory-overload on the home-nodes.
  
    This version of MOSIX is based on Linux for the x86 family of processors.
    Unlike earlier versions of MOSIX, only programs that are started by the
    mosrun(1) utility are affected and can be considered "migratable" - other
    programs are considered as "standard Linux programs" and are not affected
    by MOSIX.
  
    MOSIX maintains a high level of compatibility with standard Linux, so that
    binaries of almost every application that runs under Linux can run com-
    pletely unmodified under the MOSIX "migratable" category.  The exceptions
    are usually system-administration or graphic utilities that would not
    benefit from process-migration anyway.  If a "migratable" program that
    was started by mosrun(1) attempts to use unsupported features, it will
    either be killed with an appropriate error message, or if a "do not
    kill" option is selected, an error is returned to the program: such
    programs should probably run as standard Linux programs.
   
    In order to improve the overall resource usage, processes of "migratable"
    programs may be moved automatically and transparently to other nodes
    within the cluster and the grid.  As the demands for resources change,
    processes may move again, as many times as necessary, to continue opti-
    mizing the overall resource utilization, subject to the inter-grid prior-
    ities and policies.  Manual-control over process migration is also sup-
    ported.
     
    MOSIX is particularly suitable for running CPU-intensive computational
    programs with unpredictable resource usage and run times, and programs
    with moderate amounts of I/O.  Programs that perform large amounts of I/O
    are better run as standard Linux programs.
    
    Apart from process-migration, MOSIX can provide both "migratable" and
    "standard Linux" programs with the benefits of optimal initial assignment
    and live-queuing.  The unique feature of live-queuing means that although
    a job is queued to run later, when resources are available, once it
    starts, it remains attached to its original Unix/Linux environment (stan-
    dard-input/output/error, signals, etc.).
  
REQUIREMENTS
    1.   All the nodes in the cluster must be connected to a network that
         supports TCP/IP and UDP/IP under Linux and be accessible to each
         other using unique IP addresses in the range 0.1.0.0 to
         255.255.254.255.
  
    2.   The architecture of all nodes can be either i386 (32-bit) or x86_64
         (64-bit).  Processes that are started on a 32-bit node can migrate
         to a 64-bit node, but not the opposite.
 
    3.   In Multiprocessor nodes (SMP), all the processors must be of the
         same speed.
 
    4.   The system-administrators of all the connected nodes must be able to
         trust each other (see more on SECURITY below).
   
CONFIGURATION
    To configure MOSIX interactively, simply run mosconf: it will lead you
    step-by-step through the various configuration items.
 
    The following describes the MOSIX configuration files in depth for manual
    configuration.  The directory /etc/mosix should be created, with at least
    the following files:
 
    /etc/mosix/mosix.map
           This file defines which computers participate in your MOSIX clus-
           ter.  The file contains up to 256 data lines and/or alias lines
           that can be in any order.  It may also include any number of com-
           ment lines beginning with a '#', as well as empty lines.
 
           Data lines have 2 or 3 fields:
 
           1.   The IP ("a.b.c.d" or host-name) of the first node in a range
                of nodes with consecutive IPs.
 
           2.   The number of nodes in that range.
 
           3.   Optional combination of letter-flags:
                p[roximate]  do not use compression on migration, e.g., over
                             fast networks or slow CPUs.
                o[utsider]   inaccessible to local-class processes.
 
           Alias lines are of the form:
               a.b.c.d=e.f.g.h
           or
               a.b.c.d=host-name
 
           They mean that the IP address on the left-hand-side refers to the
           same node as the right-hand-side.
 
           NOTES:
 
           1.   It is an error to attempt to declare the local node an "out-
                sider".
 
           2.   When using host names, the first result of gethostbyname(3)
                must return their IP address that is to be used by MOSIX: if
                in doubt - specify the IP address.
 
           3.   The right-hand-side in alias lines must appear within the
                data lines.
 
           4.   IP addresses 0.0.x.x and 255.255.255.x are not allowed in
                MOSIX.
 
           5.   If you change /etc/mosix/mosix.map while MOSIX is running,
                you need to run setpe to notify MOSIX of the changes.
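
           The format above can be read mechanically.  The following is a
           minimal Python sketch, not part of MOSIX: the names
           parse_mosix_map, expand_range and SAMPLE are hypothetical, the
           sample addresses are invented, and host-names are passed through
           without being resolved.

```python
import ipaddress

def parse_mosix_map(text):
    """Parse mosix.map text into (ranges, aliases)."""
    ranges, aliases, used = [], {}, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                          # empty or comment line
        used += 1
        if used > 256:
            raise ValueError("more than 256 data/alias lines")
        if "=" in line:                       # alias line: a.b.c.d=target
            left, right = (part.strip() for part in line.split("=", 1))
            aliases[left] = right
            continue
        fields = line.split()
        if len(fields) not in (2, 3):
            raise ValueError(f"bad data line: {raw!r}")
        flags = fields[2] if len(fields) == 3 else ""
        ranges.append((fields[0], int(fields[1]), flags))
    return ranges, aliases

def expand_range(first_ip, count):
    """List the consecutive IPv4 addresses covered by one data line."""
    start = ipaddress.IPv4Address(first_ip)
    return [str(start + i) for i in range(count)]

# Invented sample: two ranges and one alias.
SAMPLE = """\
# hypothetical two-range cluster
192.168.3.1    4
192.168.5.10   2  p
10.0.0.7=192.168.3.2
"""
ranges, aliases = parse_mosix_map(SAMPLE)
```

           In this sample the alias target 192.168.3.2 falls inside the
           first data line, as note 3 requires; the sketch itself does not
           enforce that rule.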
 
    /etc/mosix/secret
           This is a security file that is used to prevent ordinary users
           from interfering and/or compromising security by connecting to the
           internal MOSIX TCP ports.  The file should contain just a single
           line with a password that must be identical on all the nodes of
           the cluster/grid.  This file must be accessible to ROOT only
           (chmod 600!)
 
    /etc/mosix/ecsecret
           Like /etc/mosix/secret, but used for running batch jobs as a
           client (see mosrun(1)).  If you do not wish to allow this node to
           send batch-jobs, do not create this file.
 
    /etc/mosix/essecret
           Like /etc/mosix/secret, but used for running batch jobs as a
           server (see mosrun(1)).  The password must match the client's
           /etc/mosix/ecsecret.  If you do not wish to allow this node to be
           a batch-server, do not create this file.
 
    /etc/mosix/partners
           This directory specifies other clusters that this cluster wishes
           to cooperate with.  If this cluster is not part of a grid, this
           directory must exist, but remain empty.
 
    /etc/mosix/grid
           This directory must be created.
 
    The following files are optional:
 
    /etc/mosix/mosip
           This file should contain our IP address, to be used for MOSIX pur-
           poses, in the regular format - a.b.c.d.  This file can be omitted
           only if the output of ifconfig(8) ("inet addr:") matches exactly
           one of the IP addresses listed in the data lines of
           /etc/mosix/mosix.map.
 
    /etc/mosix/myfeatures
           This file contains one line of comma-separated topological fea-
           tures for this node (if any).  For example: yellow,wood,chicken.
 
           The list of all 32 features (one line per feature) can be found in
           /etc/mosix/features.
 
           If this file is missing, this node is assumed to have no topologi-
           cal features.  (see topology(7))
 
    /etc/mosix/freeze.conf
           This file sets the automatic freezing policies on a per-class
           basis for MOSIX processes originating in this node.  Each line
           describes the policy for one class of processes.  The lines can be
           in any order and classes that are not mentioned are not touched by
           the automatic freezing mechanisms.
 
           The space-separated constants in each line are as follows:
           1. class-number
                  A positive integer identifying a class of processes
           2. load-units:
                  Used in fields #3-#6 below: 0=processes; 1=standard-load
           3. RED-MARK (floating point)
                  Freeze when load is higher
           4. BLUE-MARK (floating point)
                  Unfreeze when load is lower
           5. minautofreeze (floating point)
                  Freeze processes that are evacuated back home on arrival if
                  load gets equal to or above this
           6. minclustfreeze (floating point)
                  Freeze processes that are evacuated back to this cluster on
                  arrival if load gets equal or above this
           7. min-keep
                  Keep running at least this number of processes - even if
                  load is above RED-MARK.
           8. max-procs
                  Freeze excess processes above this number - even if load is
                  below BLUE-MARK.
           9. slice
                  Time (in minutes) that a process of this class is allowed
                  to run while there are automatically-frozen process(es) of
                  this class. After this period, the running process will be
                  frozen and a frozen process will start to run.
 
           NOTES:
 
           1.   The load-units in fields #3-#6 depend on field #2.  If 0,
                each unit represents the load created by a CPU-bound process
                on this computer.  If 1, each unit represents the load cre-
                ated by a CPU-bound process on a "standard" MOSIX computer
                (e.g.  a 3GHz Pentium-IV).  The difference is that the faster
                the computer and the more processors it has, the load created
                by each CPU process decreases proportionally.
 
           2.   Fields #3,#4,#5,#6 are floating-point, the rest are integers.
 
           3.   A value of "-1" in fields #3,#5,#6,#8 means ignoring that
                feature.
 
           4.   The first 4 fields are mandatory: omitted fields beyond them
                have the following values: minautofreeze=-1,
                minclustfreeze=-1, min-keep=0, max-procs=-1, slice=20.
 
           5.   The RED-MARK must be significantly higher than BLUE-MARK:
                otherwise a perpetual cycle of freezing and unfreezing could
                occur.  You should allow at least 1.1 processes difference
                between them.
 
           6.   Frozen processes do not respond to anything, except an
                unfreeze request or a signal that kills them.
 
           7.   Processes that were frozen manually are not unfrozen automat-
                ically.
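
           The nine fields and their defaults can be illustrated with a
           short Python sketch.  It is not part of MOSIX: FreezePolicy,
           parse_policy_line and marks_too_close are hypothetical names, and
           applying the 1.1 recommendation of note 5 regardless of the
           load-units field is an assumption.

```python
from dataclasses import dataclass

@dataclass
class FreezePolicy:
    class_number: int
    load_units: int               # 0 = processes, 1 = standard-load
    red_mark: float               # freeze when load is higher (-1 = ignore)
    blue_mark: float              # unfreeze when load is lower
    minautofreeze: float = -1.0   # defaults follow note 4
    minclustfreeze: float = -1.0
    min_keep: int = 0
    max_procs: int = -1
    slice_minutes: int = 20

# Fields 3-6 are floating-point, the rest are integers (note 2).
_KINDS = [int, int, float, float, float, float, int, int, int]

def parse_policy_line(line):
    """Parse one space-separated policy line of freeze.conf."""
    tokens = line.split()
    if not 4 <= len(tokens) <= 9:
        raise ValueError("a policy line has 4 to 9 fields")
    values = [kind(tok) for kind, tok in zip(_KINDS, tokens)]
    return FreezePolicy(*values)

def marks_too_close(policy):
    """True if RED-MARK is less than 1.1 above BLUE-MARK (see note 5)."""
    return policy.red_mark != -1 and policy.red_mark - policy.blue_mark < 1.1
```

           For example, parse_policy_line("1 0 4.0 2.0") yields a policy for
           class 1 with all optional fields at their defaults.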
 
           This file may also contain lines starting with '/' to indicate
           freezing-directory names.  A "Freezing directory" is an existing
           directory (often a mount-point) where the memory contents of
           frozen processes are saved.  For successful freezing, the
           disk-partition of freezing-directories should have sufficient free
           disk-space to contain the memory images of all the frozen processes.
 
           If more than one freezing directory is listed, the freezing direc-
           tory is chosen at random by each freezing process.  It is also
           possible to assign selection probabilities by adding a numeric
           weight after the directory-name, for example:
 
                /tmp       2
                /var/tmp   0.5
                /mnt/tmp   2.5
 
                In this example, the total weight is 2+0.5+2.5=5, so out of
                every 10 frozen processes, an average of 4 (10*2/5) will be
                frozen to /tmp, an average of 1 (10*0.5/5) to /var/tmp and an
                average of 5 (10*2.5/5) to /mnt/tmp.
 
           When the weight is missing, it defaults to 1.  A weight of 0 means
           that this directory should be used only if all others cannot be
           accessed.
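
           The selection rule can be sketched in Python as follows.  This is
           an illustration under stated assumptions, not the MOSIX
           implementation: choose_freeze_dir and its accessible callback are
           hypothetical, and the way MOSIX tests whether a directory can be
           accessed is not described here.

```python
import random

def choose_freeze_dir(dirs, accessible=lambda path: True, rng=random):
    """Pick a freezing directory from (path, weight) pairs.

    A weight of None stands for a missing weight and defaults to 1.
    Directories with weight 0 are used only when no other directory
    is accessible.
    """
    usable = [(path, 1.0 if weight is None else float(weight))
              for path, weight in dirs if accessible(path)]
    weighted = [(path, weight) for path, weight in usable if weight > 0]
    if weighted:
        paths, weights = zip(*weighted)
        return rng.choices(paths, weights=weights, k=1)[0]
    fallback = [path for path, weight in usable if weight == 0]
    return rng.choice(fallback) if fallback else None
```

           With the weights of the example above, repeated calls should
           select /mnt/tmp about half of the time, /tmp about 40% of the
           time and /var/tmp about 10% of the time.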
 
           If no freezing directories are specified, all freezing will be to
           the /freeze directory (or symbolic-link).
 
           Freezing files are usually created with "root" (Super-User) per-
           missions, but if /etc/mosix/freeze.conf contains a line of the
           form:
                 U {UID}
           then they are created with permissions of the given numeric UID
           (this is sometimes needed when freezing to NFS directories that do
           not allow "root" access).
 
    /etc/mosix/partners/*
           If your cluster is part of a grid, then you should designate one
           file in this directory for each of the other clusters in the grid.
 
           The file-names should indicate the corresponding cluster-names
           (maximum 128 characters), for example: "geography", "chemistry",
           "management", "development", "sales", "students-lab-A", etc.  The
           format of each file is as follows:
 
           Line #1:
                  A verbal human-readable description of the cluster.
           Line #2:
                  Four space-separated integers as follows:
 
                  1. Priority:
                           0-65535, the lower the better.
                           The priority of the local cluster is always 0.
                           MOSIX gives precedence to processes with higher
                           priority - if they arrive, guests with lower pri-
                           ority will be expelled.
                  2. Cango:
                           0=never send local processes to that cluster.
                           1=local processes may go to that cluster.
                  3. Cantake:
                           0=do not accept guest-processes from that cluster.
                           1=accept guest-processes from that cluster.
                  4. Canexpand:
                           0=no:   Only nodes listed in the lines below may
                                   be recognized as part of that cluster: if
                                   a core node from that cluster tells us
                                   about other nodes in their cluster -
                                   ignore those unlisted nodes.
                           1=yes:  Core-nodes of that cluster may specify
                                   other nodes that are in that cluster, and
                                   this node should believe them even if they
                                   are not listed in the lines below.
                           -1=do not ask the other cluster:
                                   do not consult the other cluster to find
                                   out which nodes are in that cluster:
                                   instead just rely on and use the lines
                                   below.
           Following lines:
                  Each line describes a range of consecutive IP addresses
                  that are believed to be part of the other cluster, contain-
                  ing 5 space-separated items as follows:
 
                  1. IP1 (or host-name):
                       First node in range.
                  2. n:
                       Number of nodes in this range.
                  3. Core:
                       0=no:   This range of nodes may not inform us about
                               who else is in that cluster.
                       1=yes:  This range of nodes could inform us of who
                               else is in that cluster.
                  4. Participate:
                       0=no    This range is (as far as this node is con-
                               cerned) not part of that cluster.
                       1=yes   This range is probably a part of that cluster.
                  5. Proximate:
                       0=no    Use compression on migration to/from that
                               cluster.
                       1=yes   Do not use compression when migrating to/from
                               that cluster (network is very fast and CPU is
                               slow).
           NOTES:
 
           1.   From time-to-time, MOSIX will consult one or more of the
                "core" nodes to find the actual map of their cluster.  It is
                recommended to list such core nodes. The alternative is to
                set canexpand to -1, causing the map of that cluster to be
                determined solely by this file.
 
           2.   Nodes that do not "participate" are excluded even if listed
                as part of their cluster by the core-nodes (but they could
                possibly still be used as "core-nodes" to list other nodes)
 
           3.   All core-nodes must have the same value for "proximate",
                because the "proximate" field of unlisted nodes is copied
                from that of the core-node from which we happened to find
                about them and this cannot be ambiguous.
 
           4.   When using host names rather than IP addresses, the first
                result of gethostbyname(3) must return their IP address that
                is used by MOSIX: if in doubt - specify the IP address
                instead.
 
           5.   IP addresses 0.0.x.x and 255.255.255.x cannot be used in
                MOSIX.
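
           As an illustration of the layout above, here is a minimal Python
           sketch of a reader for one partner file.  It is not part of
           MOSIX: parse_partner_file is a hypothetical name, skipping blank
           lines is an assumption, and host-names are not resolved.

```python
def parse_partner_file(text):
    """Parse one /etc/mosix/partners/* file into a dictionary."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    description = lines[0].strip()                       # line #1
    priority, cango, cantake, canexpand = (int(x) for x in lines[1].split())
    if not 0 <= priority <= 65535:
        raise ValueError("priority must be 0-65535")
    if cango not in (0, 1) or cantake not in (0, 1) or canexpand not in (-1, 0, 1):
        raise ValueError("bad cango/cantake/canexpand value")
    ranges = []
    for ln in lines[2:]:                                 # following lines
        ip1, n, core, participate, proximate = ln.split()
        ranges.append({"ip1": ip1, "n": int(n), "core": core == "1",
                       "participate": participate == "1",
                       "proximate": proximate == "1"})
    # Note 3: all core-nodes must share the same "proximate" value.
    if len({r["proximate"] for r in ranges if r["core"]}) > 1:
        raise ValueError("core ranges disagree on 'proximate'")
    return {"description": description, "priority": priority,
            "cango": bool(cango), "cantake": bool(cantake),
            "canexpand": canexpand, "ranges": ranges}
```

           A file for a hypothetical "chemistry" cluster could then consist
           of a description line, the line "10 1 1 0" and one or more range
           lines such as "192.168.7.1 8 1 1 0".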
 
    /etc/mosix/userview.map
           Although you could use only IP numbers and host-names to specify
           and display nodes of your cluster and grid, it is more convenient
           to use small integers as node numbers.  This file allows you to
           map integers to IP addresses.  Each line in this file contains 3
           elements:
 
           1.   A node number (1-65535)
           2.   IP1 (or host-name, clearly identifiable by gethostbyname(3))
           3.   Number of nodes in range (the number of the last one must not
                exceed 65535)
 
           Not all the nodes in the grid (or even the cluster) need to have a
           mapping - you should specify only the ones that you often use.
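
           The mapping can be sketched in Python as below.  This is an
           illustration only: parse_userview is a hypothetical name, and the
           sketch assumes that consecutive node numbers correspond to
           consecutive IP addresses within each range.  Host-names are not
           handled.

```python
import ipaddress

def parse_userview(text):
    """Return {node_number: ip} from lines of 'number ip1 count'."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        number, ip1, count = line.split()
        number, count = int(number), int(count)
        if number < 1 or number + count - 1 > 65535:
            raise ValueError("node numbers must stay within 1-65535")
        start = ipaddress.IPv4Address(ip1)
        for offset in range(count):
            mapping[number + offset] = str(start + offset)
    return mapping
```

           For example, the line "1 192.168.3.1 3" would map node numbers
           1, 2 and 3 to 192.168.3.1, 192.168.3.2 and 192.168.3.3.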
 
    /etc/mosix/queue.conf
           This file configures the queueing system (see mosrun(1), mosq(1)).
           All lines in this file are optional and may appear in any order.
           Usually, one node in each cluster is assigned by the system-admin-
           istrator to manage the queue, while the remaining nodes point to
           that manager.
 
           Defining the queue manager:
 
           The line:
                C {hostname}
           assigns a specific node from the cluster (hostname) to manage the
           job queue.  This line should appear in every node of the cluster.
           In the absence of this line, each node manages its own queue.
 
           Defining the default priority:
 
           The line:
                P {priority}
           assigns a default job-priority to all the jobs from this node.
           The lower this value - the higher the priority.  In the absence of
           this line, the default priority is 50.
 
           Commonly, user-ID's are identical on all the nodes in the cluster.
           The line (with a single letter):
                S
           indicates that this is not the case, so users on other nodes
           (except the Super-User) will be prevented from sending requests to
           modify the status of queued jobs from this node.
 
           Configuring the queue manager:
 
           The following lines are relevant only in the queue manager node
           and are ignored on all other nodes:
 
           The MOSIX queueing system determines dynamically how many pro-
           cesses to run.  The line:
                M {maxproc}
           if present, imposes a maximal number of processes that are allowed
           to run from the queue simultaneously on top of the regular queue-
           ing policy.  For example,
                M 20
           sets the upper limit to 20 processes, even if the cluster/grid has
           more resources.
 
           The line:
                X {1 <= x <= 8}
           defines the maximal number of queued processes that may run simul-
           taneously per CPU.  This option applies only to processors within
           the cluster and is not available for the rest of the grid, where
           the queueing system may run at most one process per CPU.  In the
           absence of this line the default is
                X 1
 
           The line:
                Z {n}
           causes the first n jobs of priority 0 to start immediately (out of
           order), without checking if resources are available, leaving that
           responsibility to the system administrator.
 
           Example: the cluster has 10 dual-CPU nodes, so the queueing system
           normally allows 20 jobs to run.  In order to allow urgent jobs to
           run immediately (without waiting for regular jobs to complete),
           the system administrator configures a line: Z 10, thus allowing
           each node to run a maximum of 3 jobs.
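
           The lines described so far can be read with a short Python
           sketch.  It is not part of MOSIX: parse_queue_conf and the
           dictionary keys are hypothetical, and lines starting with other
           letters are collected without interpretation.

```python
def parse_queue_conf(text):
    """Parse the C, P, S, M, X and Z lines of queue.conf."""
    conf = {"manager": None,        # C: absent = each node has its own queue
            "priority": 50,         # P: default job-priority
            "uids_differ": False,   # S: user-IDs are not identical everywhere
            "maxproc": None,        # M: absent = no extra upper limit
            "per_cpu": 1,           # X: default is X 1
            "urgent": None,         # Z: absent = no out-of-order jobs
            "other": []}            # lines not interpreted by this sketch
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key == "C":
            conf["manager"] = args[0]
        elif key == "P":
            conf["priority"] = int(args[0])
        elif key == "S":
            conf["uids_differ"] = True
        elif key == "M":
            conf["maxproc"] = int(args[0])
        elif key == "X":
            per_cpu = int(args[0])
            if not 1 <= per_cpu <= 8:
                raise ValueError("X must be between 1 and 8")
            conf["per_cpu"] = per_cpu
        elif key == "Z":
            conf["urgent"] = int(args[0])
        else:
            conf["other"].append(line)
    return conf

# The example above: 10 dual-CPU nodes, X 1 and Z 10.
conf = parse_queue_conf("Z 10\n")
regular = 10 * 2 * conf["per_cpu"]                  # 20 regular jobs
per_node = (regular + conf["urgent"]) // 10         # 3 jobs per node
```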
 
           Fair-share policy:
                The fairness policy determines the order in which jobs are
                initially placed in the queue.  Note that fairness should not
                be confused with priority (as defined by the P {priority}
                line or by mosrun -q{pri} and possibly modified by mosq(1)):
                priorities always take precedence, so here we only discuss
                the initial placement in the queue of jobs with the same pri-
                ority.
 
                The default queueing policy is "first-come-first-served".
                Alternatively, jobs of different users can be placed in the
                queue in an interleaved manner.
 
                The line (with a single letter):
                     F
                switches the queueing policy to the interleaved policy.
 
                The advantage of the interleaved approach is that a user
                wishing to run a relatively small number of processes, does
                not need to wait for all the jobs that were already placed in
                the queue.  The disadvantage is that older jobs need to wait
                longer.
 
                Normally, the interleaving ratio is equal among all users.
                For example, with two users (A and B) the queue may look like
                A-B-A-B-A-B-A-B.
 
                Each user is assigned an interleave ratio which determines
                (proportionally) how well their jobs will be placed in the
                queue relative to other users: the smaller that ratio - the
                better placement they will get (and vice versa).  Normally
                all users receive the same default interleave-ratio of 10 per
                process.  However, lines of the form:
                     U {UID} {1 <= interleave <= 100}
                can set a different interleave ratio for different users.
                UID can be either numeric or symbolic and there is no limit
                on the number of these 'U' lines.  Examples:
                1.   Two users (A & B):
                     U userA 5
                     (userB is not listed, hence it gets the default of 10)
                     The queue looks like: A-A-B-A-A-B-A-A-B...
                2.   Two users (A & B):
                     U userA 20
                     U userB 15
                     The queue looks like: B-A-B-A-B-A-B-B-A-B-A-B-A-B-B-A...
                3.   Three users (A, B & C):
                     U userA 25
                     U userB 7
                     (userC is not listed, hence it gets the default of 10)
                     The queue looks like: B-C-B-C-B-A-B-C-B-C-B-A-B-C-B-C...
 
                Note that since the interleave ratio is determined per pro-
                cess (and not per job), different (more complex) results will
                occur when multi-process jobs are submitted to the queue.
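
                The exact placement algorithm is not specified here.  As a
                rough illustration only, the Python sketch below assumes a
                simple model in which each process of a user advances that
                user's position by the user's interleave ratio, with ties
                broken by user name.  The name interleave_order is
                hypothetical.  Under this assumed model the sketch
                reproduces the queues shown in examples 1 and 2 and the
                beginning of example 3, but diverges slightly later in
                example 3, so it should not be taken as the actual MOSIX
                policy.

```python
import heapq

def interleave_order(users, ratios, slots, default=10):
    """Return the first `slots` queue positions under the assumed model.

    users:  names of users with single-process jobs waiting
    ratios: {user: interleave ratio}; unlisted users get `default`
    """
    heap = []
    for name in users:
        ratio = ratios.get(name, default)
        heap.append((ratio, name, ratio))     # (next position, user, step)
    heapq.heapify(heap)
    order = []
    for _ in range(slots):
        position, name, ratio = heapq.heappop(heap)
        order.append(name)
        heapq.heappush(heap, (position + ratio, name, ratio))
    return order

example1 = "-".join(interleave_order(["A", "B"], {"A": 5}, 6))
example2 = "-".join(interleave_order(["A", "B"], {"A": 20, "B": 15}, 9))
```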
 
    /etc/mosix/private.conf
           This file specifies where Private Temporary Files (PTFs) are
           stored: PTFs are an important feature of mosrun(1) and may consume
           a significant amount of disk-space.  It is important to ensure
           that sufficient disk-space is reserved for PTFs, but without
           allowing them to disturb other jobs by filling up disk-partitions.
           Guest processes can also demand unpredictable amounts of disk-
           space for their PTFs, so we must make sure that they do not dis-
           turb local operations.
 
           Up to 3 different directories can be specified: for local
           processes; guest-processes from the local cluster; and
           guest-processes from the rest of the grid.  Accordingly, each line
           in this file has 3 fields:
 
           1.   A combination of the letters: 'O' (own node), 'C' (own clus-
                ter) and 'G' (rest of the grid).  For example, OC, C, CG or
                OCG.
           2.   A directory name (usually a mount-point) starting with '/',
                where PTFs for the above processes are to be stored.
           3.   An optional numeric limit, in Megabytes, of the total size of
                PTFs per-process.
 
           If /etc/mosix/private.conf does not exist, then all PTFs will be
           stored in "/private".  If the directory "/private" also does not
           exist, or if /etc/mosix/private.conf exists but does not contain a
           line with an appropriate letter in the first field ('O', 'C' or
           'G'), then no disk-space is allocated for PTFs of the affected
           processes, which usually means that processes requiring PTFs will
           not be able to run on this node.  Such guest processes that start
           using PTFs will migrate back to their home-nodes.
 
           When the third field is missing, it defaults to:
                   5 Gigabytes for local processes.
                   2 Gigabytes for processes from the same cluster.
                   1 Gigabyte for processes from the rest of the grid.
           In any case, guest processes cannot exceed the size limit of their
           home-node even on nodes that allow them more space.
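
           The fields and defaults above can be sketched in Python as
           follows.  This is an illustration, not MOSIX code:
           parse_private_conf and effective_limit are hypothetical names,
           and treating one Gigabyte as 1024 Megabytes is an assumption.

```python
# Default per-process limits in Megabytes (assuming 1 Gigabyte = 1024 MB).
DEFAULT_LIMIT_MB = {"O": 5 * 1024, "C": 2 * 1024, "G": 1 * 1024}

def parse_private_conf(text):
    """Return {'O' | 'C' | 'G': (directory, limit_mb)}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if not 2 <= len(fields) <= 3:
            raise ValueError(f"bad line: {line!r}")
        letters, directory = fields[0], fields[1]
        if set(letters) - set("OCG") or not directory.startswith("/"):
            raise ValueError(f"bad line: {line!r}")
        for letter in letters:
            limit = int(fields[2]) if len(fields) == 3 else DEFAULT_LIMIT_MB[letter]
            table[letter] = (directory, limit)
    return table

def effective_limit(home_limit_mb, local_limit_mb):
    """A guest process cannot exceed the limit of its home-node."""
    return min(home_limit_mb, local_limit_mb)
```

           A category whose letter is missing from the resulting table has
           no disk-space allocated for PTFs on this node.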
 
    /etc/mosix/retainpri
           This file contains an integer, specifying a delay in seconds: how
           long after all MOSIX processes of a certain priority (see above,
           /etc/mosix/priority) finish (or leave) to allow processes of lower
           priority (higher numbers) to start.  When this file is absent,
           there is no delay and processes with lower priority may arrive as
           soon as there are no processes with a higher priority.
 
    /etc/mosix/speed
           If this file exists, it should contain a positive integer
           (1-10,000,000), providing the relative speed of the processor: the
           bigger the faster, where 10,000 units of speed are equivalent to a
           3GHz Pentium-IV, and AMD (Athlon or Opteron) processors are, as a
           rule of thumb, 1.5 times faster than Intel processors of the same
           frequency.
 
           Normally this file is not necessary because the speed of the pro-
           cessor is automatically detected by the kernel when it boots.
           There are however two cases when you should consider using this
           option:
           1.   When you have a heterogeneous cluster and always use MOSIX to
                run a specific program (or programs) that perform better on
                certain processor-types than on others.
           2.   On Virtual-Machines that run over a hosting operating-system:
                in this case, the speed that the kernel detects is unreliable
                and can vary significantly depending on the load of the
                underlying operating-system when it boots.
 
    /etc/mosix/maxguests
           If this file exists, it should contain an integer limit on the
           number of simultaneous guest-processes from the grid.  Otherwise,
           the maximum number of guest-processes from the grid is set to the
           default of 10.
 
    /etc/mosix/.log_mosrun
           When this file is present, information about invocations of
           mosrun(1) and process migrations will be recorded in the system-
           log (by default "/var/log/messages" on most Linux distributions).
 
    /etc/mosix/mostune
           Tuning constants optimize MOSIX performance by telling it
           about the costs of networked operations.  MOSIX has built-in tun-
           ing default constants.  This file is used to override them to suit
           your particular hardware and networks.
  
           For most users, this file is difficult to set up manually. Thus,
           MOSIX comes with a program to assemble it.  For more information,
           see topology(7).
 
INTERFACE FOR PROGRAMS
    The following interface is provided for programs running under mosrun(1)
    that wish to interface with their MOSIX run-time environment:
 
    All access to MOSIX is performed via the "open" system call, but the use
    of "open" is incidental and does not involve actual opening of files.  If
    the program were to run as a regular Linux program, those "open" calls
    would fail, returning -1, since the quoted files never exist, and
    errno(3) would be set to ENOENT.
 
    open("/proc/self/{special}", 0)
    reads a value from the MOSIX run-time environment.
 
    open("/proc/self/{special}", 1|O_CREAT, newval)
    writes a value to the MOSIX run-time environment.
 
    open("/proc/self/{special}", 2|O_CREAT, newval)
    both writes a new value and returns the previous value.
 
    (the O_CREAT flag is only required when your program is compiled with the
    64-bit file-size option, but is harmless otherwise).
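
    As an illustration of the read form, the following Python sketch calls
    the C library "open" directly through ctypes rather than os.open, which
    may add flags of its own.  The helper name mosix_read is hypothetical.
    The sketch assumes that -1 always indicates failure, and it returns the
    raw integer, so an unsigned value above 2^31-1 would appear negative.

```python
import ctypes

# Handle to the C library of the running process.
_libc = ctypes.CDLL(None, use_errno=True)

def mosix_read(special):
    """Read /proc/self/{special} from the MOSIX run-time environment.

    Returns the integer that "open" returned (a value, not a file
    descriptor), or None when the call fails, for example when the
    program runs as a regular Linux program and the "file" does not exist.
    """
    path = f"/proc/self/{special}".encode()
    value = _libc.open(path, 0)       # open("/proc/self/{special}", 0)
    return None if value == -1 else value
```

    A program could call mosix_read with the name of one of the readable
    "files" described in this section and treat None as "not running under
    mosrun(1)".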
 
    Some "files" are read-only, some are write-only and some can do both
    (rw).  The "files" are as follows:
  
    /proc/self/migrate
           writing a 0 migrates back home; writing -1 causes a migration con-
           sideration; writing the unsigned value of an IP address or a logi-
           cal node number attempts to migrate there.  Successful migration
           returns 0, failure returns -1. (write only)
 
    /proc/self/lock
           When locked (1), no automatic migration may occur (except when
           running on the current node is no longer allowed); when unlocked
           (0), automatic migration can occur. (rw)
 
    /proc/self/whereami
           reads where the program is running: 0 if at home, otherwise usu-
           ally an unsigned IP address, but if possible, its corresponding
           logical node number. (read only)
 
    /proc/self/nmigs
           reads the total number of migrations performed by this process and
           its MOSRUN ancestors before it was born. (read only)
 
    /proc/self/sigmig
           Reads/sets a signal number (1-64 or 0 to cancel) to be received
           after each migration. (rw)
 
    /proc/self/glob
           Reads/modifies the process class.  Processes of class 0 are not
           allowed to migrate outside the local cluster.  Classes can also
           affect the automatic-freezing policy. (rw)
  
    /proc/self/needmem
           Reads/modifies the process's memory requirement in Megabytes, so
           it does not automatically migrate to nodes with less free memory.
           Acceptable values are 0-262143.  (rw)
 
    /proc/self/unsupportok
           when 0, unsupported system-calls cause the process to be killed;
           when 1 or 2, unsupported system-calls return -1 with errno set to
           ENOSYS; when 2, an appropriate error-message will also be written
           to stderr. (rw)
 
    /proc/self/clear
           clears process statistics. (write only)
 
    /proc/self/cpujob
           Normally when 0, system-calls and I/O are taken into account for
           migration considerations.  When set to 1, they are ignored. (rw)
 
    /proc/self/localtime
           When 0, gettimeofday(2) is always performed on the home node.
           When 1, the date/time is taken from where the process is running.
           (rw)
 
    /proc/self/decayrate
           Reads/modifies the decay-rate per second (0-10000): programs can
           alternate between periods of intensive CPU and periods of demand-
           ing I/O.  Decisions to migrate should be based neither on momen-
           tary program behaviour nor on extremely long term behaviour, so a
           balance must be struck, where old process statistics gradually
           decay in favour of newer statistics.  The lower the decay rate,
           the more weight is given to new information.  The higher the decay
           rate, the more weight is given to older information.  This option
           is provided for users who know well the cyclic behavior of their
           program. (rw)
 
    /proc/self/checkpoint
           When writing (any value) - perform a checkpoint.  When only read-
           ing - return the version number of the next checkpoint to be made.
           When reading and writing - perform a checkpoint and return its
           version.  Returns -1 if the checkpoint fails, 0 if writing only
           and checkpoint is successful. (rw)
 
    /proc/self/checkpointfile
           The third argument (newval) is a pointer to a file-name to be used
           as the basis for future checkpoints (see mosrun(1)).  (write only)
 
    /proc/self/checkpointlimit
           Reads/modifies the maximal number of checkpoint files to create
           before recycling the checkpoint version number.  A value of 0
           removes the limit on the number of checkpoint files.  The maximal
           value allowed is 10000000. (rw)
 
    /proc/self/checkpointinterval
           When writing, sets the interval in minutes for automatic check-
           points (see mosrun(1)).  A value of 0 cancels automatic check-
           points.  The maximal value allowed is 10000000.  Note that writing
           has the side effect of resetting the time left until the next
           checkpoint, so writing too frequently is not recommended.  (rw)
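
    The checkpoint entries above might be driven from a program roughly as
    follows.  This is only a sketch: ckpt_value_ok, ckpt_configure and
    ckpt_now are hypothetical names, the checkpoint file-name is assumed to
    have been set already (for example via mosrun(1)), and a result of -1
    from a write is assumed to indicate failure.

```c
#include <fcntl.h>

#define CKPT_MAX 10000000L  /* maximal value stated for limit and interval */

/* Returns 1 if v is acceptable for checkpointlimit or
 * checkpointinterval (0..10000000), 0 otherwise. */
int ckpt_value_ok(long v)
{
    return v >= 0 && v <= CKPT_MAX;
}

/* Set the number of checkpoint files to keep and the automatic
 * checkpoint interval in minutes.  Returns 0 on success, -1 on bad
 * arguments or if a call fails (as it does outside mosrun). */
int ckpt_configure(long limit, long minutes)
{
    if (!ckpt_value_ok(limit) || !ckpt_value_ok(minutes))
        return -1;
    if (open("/proc/self/checkpointlimit", 1 | O_CREAT, (int)limit) == -1)
        return -1;
    /* Writing the interval resets the time left until the next
     * automatic checkpoint, so call this once rather than in a loop. */
    if (open("/proc/self/checkpointinterval", 1 | O_CREAT, (int)minutes) == -1)
        return -1;
    return 0;
}

/* Take a checkpoint now (read and write form).
 * Returns the checkpoint version, or -1 if the checkpoint fails. */
int ckpt_now(void)
{
    return open("/proc/self/checkpoint", 2 | O_CREAT, 0);
}
```

    A long-running program could call ckpt_configure(5, 30) once at start-up
    to keep five checkpoint files and checkpoint every 30 minutes, and call
    ckpt_now() before entering a phase that would be expensive to repeat.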
 
    More functions are available through the direct_communication(7) feature.
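
    As an illustration of how several of the /proc/self entries listed
    above (needmem, sigmig and lock) might be combined, consider the
    following sketch.  All function names are hypothetical, the choice of
    SIGUSR1 is arbitrary, and the return values of the write calls are not
    relied upon.

```c
#include <fcntl.h>
#include <signal.h>

#define NEEDMEM_MAX_MB 262143L  /* upper bound stated for needmem */

static volatile sig_atomic_t migrations_seen = 0;

/* Signal handler: only counts, which is async-signal-safe. */
static void on_migrated(int signo)
{
    (void)signo;
    migrations_seen++;
}

/* Clamp a memory requirement (in Megabytes) to the accepted range. */
long clamp_needmem(long mb)
{
    if (mb < 0)
        return 0;
    if (mb > NEEDMEM_MAX_MB)
        return NEEDMEM_MAX_MB;
    return mb;
}

/* Declare the memory requirement of this process. */
int set_needmem(long mb)
{
    return open("/proc/self/needmem", 1 | O_CREAT, (int)clamp_needmem(mb));
}

/* Ask to receive SIGUSR1 after each migration. */
int notify_on_migration(void)
{
    if (signal(SIGUSR1, on_migrated) == SIG_ERR)
        return -1;
    return open("/proc/self/sigmig", 1 | O_CREAT, SIGUSR1);
}

/* Run work(arg) with automatic migration locked, then restore the
 * previous lock state if it could be read. */
void run_locked(void (*work)(void *), void *arg)
{
    int prev = open("/proc/self/lock", 2 | O_CREAT, 1);
    work(arg);
    if (prev == 0 || prev == 1)
        (void)open("/proc/self/lock", 1 | O_CREAT, prev);
}

/* Example unit of work: increments the int that arg points to. */
void bump(void *arg)
{
    ++*(int *)arg;
}
```

    Because run_locked uses the read-and-write form, a section that was
    already locked by its caller stays locked afterwards.  Outside mosrun(1)
    the "open" calls fail and the work is simply executed unchanged.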
 
    The following information is available via the /proc file system for
    everyone to read (not just within the MOSIX run-time environment):
 
    /proc/{pid}/from
           The IP address (a.b.c.d) of the process' home-node ("0" if a local
           process).
 
    /proc/{pid}/where
           The IP address (a.b.c.d) where the process is running ("0" if run-
           ning here).
 
    /proc/{pid}/class
           The class of the process.
 
    /proc/{pid}/origipid
           The original PID of the process on its home-node ("0" if a local
           process).
 
    /proc/{pid}/freezer
           Whether and why the process was frozen:
 
           0      Not frozen
 
           1      Frozen automatically due to high load.
 
           2      Frozen by the evacuation policy, to prevent flooding by
                  arriving processes when clusters are disconnected.
 
           3      Frozen due to manual request.
 
           -66    This is a guest process from another home-node (freezing is
                  always on the home-node, hence not applicable here).
 
    Attempting to read the above for non-MOSIX processes returns the string "-3".
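
    Since these entries are ordinary readable files, any program can inspect
    them.  The sketch below is illustrative only: mosix_proc_read and
    freezer_reason are hypothetical names, and whether the kernel appends a
    trailing newline to each value is an assumption handled defensively.

```c
#include <stdio.h>
#include <string.h>

/* Map a /proc/{pid}/freezer value to the meaning listed above. */
const char *freezer_reason(int code)
{
    switch (code) {
    case 0:   return "not frozen";
    case 1:   return "frozen automatically due to high load";
    case 2:   return "frozen by the evacuation policy";
    case 3:   return "frozen due to manual request";
    case -66: return "guest process (frozen only on its home-node)";
    case -3:  return "not a MOSIX process";
    default:  return "unknown";
    }
}

/* Read /proc/<pid>/<name> (e.g. "where", "from", "class") into buf as a
 * string without any trailing newline.  Returns 0 on success, -1 if
 * the file cannot be read. */
int mosix_proc_read(long pid, const char *name, char *buf, size_t size)
{
    char path[64];
    FILE *f;
    int n = snprintf(path, sizeof path, "/proc/%ld/%s", pid, name);

    if (n < 0 || (size_t)n >= sizeof path || size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\r\n")] = '\0';
    return 0;
}
```

    A monitoring tool could, for instance, read "freezer" for a given PID,
    convert the string with atoi(3) and print freezer_reason() of the
    result.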
 
STARTING MOSIX
    To start MOSIX, run /etc/init.d/mosix start.  Alternatively, run mosd.
 
SECURITY
    All nodes within a MOSIX cluster and a MOSIX grid must trust each other's
    super-user(s), otherwise the security of the whole cluster or grid is
    compromised.

    Hostile computers must not be allowed physical access to the internal
    MOSIX network where they could masquerade as having IP addresses of
    trusted nodes.
 
SEE ALSO
    mosrun(1), mosctl(1), migrate(1), setpe(1), mon(1), mosps(1),
    moskillall(1), mosq(1), bestnode(1), mospipe(1), direct_communication(7),
    topology(7).

HISTORY
    This is the 10th version of MOSIX.  The MOSIX FAQ web page has more
    information about the previous releases.

MOSIX                              May 2006                              MOSIX