Distributed Deep Learning

The Watson Machine Learning Community Edition (Watson ML CE) stack uses Distributed Deep Learning technology to deliver high training performance by distributing a single training job across a cluster of servers. Distributed Deep Learning (DDL) is aware of the structure and layout of the underlying cluster (its topology), including where the cluster's compute resources, such as Graphics Processing Units (GPUs) and CPUs, are located. Watson ML CE is unique in that this capability is built into the deep learning frameworks as an integrated binary, which reduces complexity for clients as they adopt high-performance clusters.

Because of this topology awareness, Watson ML CE with DDL can scale jobs across large numbers of cluster resources while losing very little performance to communication overhead.
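To make topology awareness concrete, here is a minimal Python sketch of a generic two-level (hierarchical) gradient sum, one common way topology-aware training reduces cross-server traffic. It is an illustration under stated assumptions, not DDL's actual algorithm or API: the function name `hierarchical_allreduce` and the nested-list layout (`grads[server][gpu]`) are hypothetical. GPUs on the same server first combine their gradients over fast local links, only one partial sum per server is combined over the slower network, and the result is copied back to every GPU.

```python
from typing import List

Vector = List[float]


def hierarchical_allreduce(grads: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical sketch: grads[server][gpu] is one GPU's local gradient.

    Returns the same layout, with every GPU holding the global sum.
    """
    # Step 1: sum gradients within each server (fast intra-server links).
    server_sums: List[Vector] = []
    for server in grads:
        total = [0.0] * len(server[0])
        for g in server:
            total = [a + b for a, b in zip(total, g)]
        server_sums.append(total)

    # Step 2: combine only one partial sum per server across the network.
    global_sum = [0.0] * len(server_sums[0])
    for s in server_sums:
        global_sum = [a + b for a, b in zip(global_sum, s)]

    # Step 3: copy the global sum back to every GPU on every server.
    return [[list(global_sum) for _ in server] for server in grads]


if __name__ == "__main__":
    # Two servers with two GPUs each; every GPU ends with [16.0, 20.0].
    grads = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    print(hierarchical_allreduce(grads))
```

In this toy model, a flat scheme would send one gradient per GPU across the network, while the two-level scheme sends one per server, which is why knowing which GPUs share a server matters as clusters grow.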

DDL is currently compatible with the bundled TensorFlow and PyTorch. For a getting-started tutorial on using DDL, see IBM's links for TensorFlow, PyTorch, and IBM Caffe.