Prof. Amnon Barak  
The MOSIX System R&D Projects  
2018 - 2024: Collective operations for MPI
  • Goals: optimization of MPI collective operations (a generic collective call is sketched below for context).
  • Development platform: a multi-core cluster.
  • Selected publication:
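
For context only (an illustrative sketch, not code from this project): a minimal MPI collective call in C, the kind of operation such optimizations target. Every rank contributes a value and all ranks receive the reduced result.

    /* Minimal illustrative MPI collective example (not project code). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int local = rank + 1;      /* each rank's contribution */
        int global_sum = 0;

        /* Collective operation: all ranks call it, all ranks get the result. */
        MPI_Allreduce(&local, &global_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %d\n", size, global_sum);

        MPI_Finalize();
        return 0;
    }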

2014 - 2021: MOSIX-4  

2013 - 2019: The FFMK Project - A Fast and Fault-tolerant Microkernel-based System for Exascale Computing
  • Goals: Parallel algorithms for resilient computing in Exascale systems.
  • Joint research with TUDOS - Operating Systems Group, TU Dresden, Prof. Dr. H. Härtig (PI); ZIH - Center for Information Services and HPC, TU Dresden, Prof. Dr. W. E. Nagel (PI); ZIB - Zuse Institute Berlin, Prof. Dr. A. Reinefeld (PI).
  • Selected publications:
    • C. Weinhold, A. Lackorzynski, J. Bierbaum, M. Küttler, M. Planeta, H. Weisbach, M. Hille, H. Härtig, A. Margolin, D. Sharf, E. Levy, P. Gak, A. Barak, M. Gholami, F. Schintke, T. Schütt, A. Reinefeld, M. Lieber and W.E. Nagel. FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing. In: H-J. Bungartz, S. Reiz, B. Uekermann, P. Neumann, W.E. Nagel (Eds.), Software for Exascale Computing - SPPEXA 2016-2019, LNCSE, Vol. 136. Springer, Cham, July 2020.

    • M. Küttler, M. Planeta, J. Bierbaum, C. Weinhold, H. Härtig, A. Barak, T. Hoefler. Corrected Trees for Reliable Group Communication. Proc. Principles and Practice of Parallel Programming (PPoPP'19), Washington, DC, Feb. 2019.

    • C. Weinhold, A. Lackorzynski, J. Bierbaum, M. Küttler, M. Planeta, H. Härtig, A. Shiloh, E. Levy, T. Ben-Nun, A. Barak, T. Steinke, T. Schütt, J. Fajerski, A. Reinefeld, M. Lieber, W.E. Nagel. FFMK: A Fast and Fault-tolerant Microkernel-based System for Exascale Computing. In: H-J. Bungartz, P. Neumann, W.E. Nagel (Eds.), Software for Exascale Computing - SPPEXA 2013 - 2015, LNCSE, Vol. 113, Springer, Oct. 2016.

2012 - 2018: The MAPS Framework for optimizing GPU and multi-GPU applications
  • Goals: Develop optimized GPU and multi-GPU applications using Memory Access Pattern Specification.
  • Joint research with T. Ben-Nun, E. Levy and E. Rubin.
  • Development platform: our GPU clusters.
  • Selected publication:

2009 - 2018: VirtualCL (VCL) Cluster Platform  
  • Goals: allow applications to transparently utilize many OpenCL devices (CPUs, GPUs, accelerators) in a cluster; a generic OpenCL host-side device query is sketched after this entry for context.
    • Subproject: The Many GPUs Package (MGP) that provides extended OpenMP and C++ APIs for running OpenCL kernels.
    • Subproject: SuperCL, an extension of OpenCL that allows OpenCL micro-programs to run efficiently on accelerator devices in remote nodes.
  • Joint research with T. Ben-Nun, E. Levy, A. Shiloh and J. Smith.
  • Development platform: a cluster of InfiniBand-connected nodes (4-way Core i7), each with NVIDIA and AMD GPU devices.
  • Selected publication:
    • A. Barak, T. Ben-Nun, E. Levy and A. Shiloh. A Package for OpenCL Based Heterogeneous Computing on Clusters with Many GPU Devices, Workshop on Parallel Programming and Applications on Accelerator Clusters (PPAAC), IEEE Cluster 2010, Crete, Sept. 2010.
    • A. Barak and A. Shiloh, The MOSIX VCL Cluster Platform (abstract), Proc. Intel European Research & Innovation Conf., p. 196, Leixlip, Oct. 2011.
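
For context only (a generic OpenCL host-side sketch, not VCL source code): standard platform and device enumeration as an ordinary application would perform it; VCL's aim was to let unmodified host code like this see and use OpenCL devices that physically reside on other cluster nodes.

    /* Generic OpenCL host code (illustrative only): enumerate platforms and
     * devices the standard way; under VCL the devices returned may be remote. */
    #include <CL/cl.h>
    #include <stdio.h>

    int main(void)
    {
        cl_uint num_platforms = 0;
        clGetPlatformIDs(0, NULL, &num_platforms);
        if (num_platforms == 0) return 0;
        if (num_platforms > 8) num_platforms = 8;

        cl_platform_id platforms[8];
        clGetPlatformIDs(num_platforms, platforms, NULL);

        for (cl_uint p = 0; p < num_platforms; p++) {
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
            if (num_devices == 0) continue;
            if (num_devices > 16) num_devices = 16;

            cl_device_id devices[16];
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, num_devices, devices, NULL);

            for (cl_uint d = 0; d < num_devices; d++) {
                char name[256];
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
                printf("platform %u, device %u: %s\n", p, d, name);
            }
        }
        return 0;
    }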

2008 - 2009: MOSIX Reach the Clouds (MRC)  
  • Goals: allow applications to start on a local node and then run in clouds, without pre-copying their files to the clouds.

2004 - 2008: MOSIX for Linux-2.6 & Linux 3.x  
  • Goals: Redesign and upgrade of MOSIX, including R&D of on-line algorithms for fair-share resource allocation in clusters, data compression and virtual-machine migration.
  • Joint research with A. Shiloh, L. Amar, E. Levy, T. Maoz, E. Meiri and M. Okun.
  • Development platform: a 96 node cluster.
  • Outcome: a production system - in use worldwide.
  • Selected publications:
    • A. Barak, A. Shiloh and L. Amar. An Organizational Grid of Federated MOSIX Clusters. Proc. 5th IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid'05), pp. 350-357, Cardiff, May 2005.
    • L. Amar, A. Barak, E. Levy and M. Okun. An On-line Algorithm for Fair-Share Node Allocations in a Cluster. Proc. 7th IEEE Int. Symp. on Cluster Computing and the Grid (CCGrid'07), pp. 83-91, Rio de Janeiro, May 2007.
    • E. Meiri and A. Barak. Parallel Compression of Correlated Files, IEEE Cluster 2007, pp. 285-292, Austin, Sept. 2007.
    • T. Maoz, A. Barak and L. Amar. Combining Virtual Machine Migration with Process Migration for HPC on Multi-Clusters and Grids, IEEE Cluster 2008, pp. 89-98, Tsukuba, Sept. 2008.

2000 - 2003: MOSIX for Linux-2.4 and DFSA/MFS  
  • Goals: upgrade MOSIX to Linux kernel 2.4, including R&D of scalable cluster file systems and parallel I/O.
  • Joint research with A. Shiloh and L. Amar.
  • Development platform: a 72 node cluster.
  • Outcome: a production system - used worldwide.
  • Selected publications:
    • L. Amar, A. Barak and A. Shiloh. The MOSIX Parallel I/O System for Scalable I/O Performance. Proc. 14th Int. Conf. on Parallel and Distributed Computing and Systems (PDCS'02), pp. 495-500, Cambridge, MA, Nov. 2002.
    • L. Amar, A. Barak and A. Shiloh. The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems. Cluster Computing, Vol. 7(2):141-150, April 2004.

1998 - 2000: MOSIX for Linux-2.2  
  • Goals: major redesign of MOSIX for Linux kernel 2.2 on x86 computers.
  • Joint research with A. Shiloh and O. La'adan.
  • Development platform: a 32 node cluster.
  • Outcome: a production system - used worldwide.
  • Selected publication:
    • A. Barak, O. La'adan and A. Shiloh. Scalable Cluster Computing with MOSIX for Linux, Proc. Linux Expo '99, pp. 95-100, Raleigh, N.C., May 1999.

1991 - 1998: MOSIX for the i486/Pentium and BSD/OS
  • Goals: develop MOSIX for the i486/Pentium and BSDI's BSD/OS, then study the performance of parallel applications and TCP/IP over the Myrinet LAN.
  • Joint research with A. Shiloh, Y. Yarom, O. La'adan, A. Braverman, I. Gilderman and I. Metrik.
  • Development platforms: initially, a cluster of 8 i486 workstations and 10 i486 DX-2 (Multibus II) Single Board Computers; later, a cluster of 16 Pentium-100 and 64 Pentium II workstations connected by Myrinet.
  • Outcome: a production system - used in several countries.
  • Selected publications:
    • A. Barak, O. La'adan and Y. Yarom, The NOW MOSIX and its Preemptive Process Migration Scheme, Bull. IEEE Tech. Committee on Operating Systems and Application Environments, Vol. 7(2):5-11, Summer 1995.
    • A. Barak and O. La'adan. Experience with a Scalable PC Cluster for HPC, Proc. Cluster Computing Conf. (CCC 97), Emory Univ., Atlanta, GA, March 1997.
    • A. Barak and A. Braverman. Memory Ushering in a Scalable Computing Cluster, Microprocessors and Microsystems, Vol. 22(3-4):175-182, Aug. 1998.
    • A. Barak, I. Gilderman and I. Metrik. Performance of the Communication Layers of TCP/IP with the Myrinet Gigabit LAN, Computer Communications, Vol. 22(11), July 1999.

1988 - 1990: MOSIX for the VME532 and SVR2  
  • Goals: develop MOSIX for the VME532 using AT&T System V release 2, including R&D of distributed algorithms, monitoring tools and evaluation of the network performance.
  • Joint research with A. Shiloh, R. Wheeler and S. Guday.
  • Development platform: a parallel computer made of 4 VME enclosures, each with 4-6 VME532 (NS32532) computers that communicated via the VME-bus. The VME enclosures were connected by the ProNET-80 LAN.
  • Outcome: a production system - installed in several sites.
  • Selected publications:

1988: MOSIX for the VAX and SVR2  
  • Goals: port MOS to the VAX architecture using AT&T System V release 2, including R&D of communication protocols.
  • Joint research with A. Shiloh and G. Shwed.
  • Development platform: a cluster of a VAX-780 and 4 VAX-750 computers connected by Ethernet.
  • Outcome: a running system - used internally.

1983 - 1988: MOS for the M68K and Unix Version 7
  • Goals: develop MOS for the M68K and Bell Labs' Unix Version 7 with some BSD 4.1 extensions, including R&D of load-balancing algorithms, distributed file systems and scalability.
  • Joint research with A. Shiloh, O.G. Paradise, D. Malki, and R. Wheeler.
  • Development platform: a cluster of 7 CADMUS/PCS MC68K computers connected by the ProNET-10 LAN.
  • Outcome: a running system - used for follow-up R&D projects.
  • Selected publications:
    • A. Barak and O.G. Paradise. MOS - Scaling Up UNIX. Proc. Summer 1986 USENIX Conf., pp. 414-418, Atlanta, GA, June 1986.
    • A. Barak, D. Malki and R. Wheeler. AFS, BFS, CFS ... or Distributed File Systems for UNIX, Proc. Autumn 86 EUUG Conf. on Dist. Systems, pp. 214-222, Manchester, Sept. 1986.
    • A. Barak and O.G. Paradise. MOS - a Load Balancing UNIX, Proc. Autumn 86 EUUG Conf. on Dist. Systems, pp. 273-280, Manchester, Sept. 1986.

1981 - 1983: MOS for the PDP-11 and Unix Version 7
  • Goals: design and implementation of a multi-computer operating system that provides a single-system image.
  • Joint research with A. Litman, R. Hofner, J. Mizel and A. Shiloh.
  • Development platform: a cluster of a PDP-11/45 and 4 PDP-11/23 computers, initially connected by DRV11-J (Parallel Line Interface) and later by ProNET-10, running Bell Labs' Unix Version 7.
  • Outcome: a working prototype.
  • Selected publication:

1977 - 1980: UNIX with Satellite Processors  
  • Goals: study performance gains when distributing Unix processes to remote computers.
  • Joint research with A. Shapir, G. Steinberg and A.I. Karshmer.
  • Development platform: a "cluster" of a PDP-11/45 and a PDP-11/10, connected by parallel I/O, running Bell Labs' Unix Version 6.
  • Outcome: a working prototype.
  • Selected publications:
    • A. Barak and A. Shapir. UNIX with Satellite Processors. Software: Practice & Experience, Vol. 10(5):383-392, May 1980.
    • A. Barak, A. Shapir, G. Steinberg and A.I. Karshmer. A Modular, Distributed UNIX. Proc. 14th Hawaii Int. Conf. on System Sciences, pp. 740-747, January 1981.