publications
2023
- Under Review. Fast Hardware-Aware Matrix-Free Algorithm for Higher-Order Finite-Element Discretized Matrix Multivector Products on Distributed Systems. Gourab Panigrahi, Nikhil Kodali, Debashis Panda, and 1 more author. Oct 2023.
Recent hardware-aware matrix-free algorithms for higher-order finite-element (FE) discretized matrix-vector multiplications reduce floating-point operations and data access costs compared to traditional sparse matrix approaches. This work proposes efficient matrix-free algorithms for evaluating FE discretized matrix-multivector products on both multi-node CPU and GPU architectures. We address a critical gap in existing matrix-free implementations, which are well suited only for the action of FE discretized matrices on a single vector. We employ batched evaluation strategies, with the batch size tailored to the underlying hardware architecture, leading to better data locality and enabling further parallelization. On CPUs, we utilize even-odd decomposition, SIMD vectorization, and overlapping computation and communication strategies. On GPUs, we employ strategies to overlap compute and data movement in conjunction with GPU shared memory, constant memory, and kernel fusion to reduce data accesses. Our implementation outperforms the baselines for the Helmholtz operator action, achieving up to 1.4x improvement on one CPU node and up to 2.8x on one GPU node, while reaching up to 4.4x and 1.5x improvement on multiple nodes for CPUs (~3000 cores) and GPUs (~25 GPUs), respectively. We further benchmark the performance of the proposed implementation for solving a model eigenvalue problem for the 1024 smallest eigenvalue-eigenvector pairs by employing the Chebyshev Filtered Subspace Iteration method, achieving up to 1.5x improvement on one CPU node and up to 2.2x on one GPU node, while reaching up to 3.0x and 1.4x improvement on multi-node CPUs (~3000 cores) and GPUs (~25 GPUs), respectively.
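As a rough illustration of the idea of a batched matrix-free multivector product, the toy sketch below applies a 1D linear-element stiffness operator to a block of vectors cell by cell, without ever forming the global matrix, and processes the vectors in column batches. It is not the paper's algorithm: the 1D operator, the unit mesh size, and the names `apply_matrix_free`, `apply_assembled`, and `batch_size` are assumptions made purely for illustration, and none of the CPU or GPU optimizations mentioned in the abstract are modelled.

```python
def apply_matrix_free(X, n_cells, batch_size):
    """Compute Y = A X for a 1D linear-element stiffness operator (unit mesh
    size) without assembling A. X is a list of rows, one row per degree of
    freedom, one column per vector. Hypothetical toy example."""
    n_dofs = n_cells + 1
    n_vecs = len(X[0])
    Y = [[0.0] * n_vecs for _ in range(n_dofs)]
    k_local = ((1.0, -1.0), (-1.0, 1.0))  # 2x2 cell-level stiffness matrix

    # Outer loop over batches of vectors: every cell is visited once per
    # batch and its local matrix is reused for all vectors in that batch.
    for start in range(0, n_vecs, batch_size):
        cols = range(start, min(start + batch_size, n_vecs))
        for cell in range(n_cells):
            left, right = cell, cell + 1  # global dofs of this cell
            for j in cols:
                u0, u1 = X[left][j], X[right][j]                   # gather
                Y[left][j] += k_local[0][0] * u0 + k_local[0][1] * u1   # scatter-add
                Y[right][j] += k_local[1][0] * u0 + k_local[1][1] * u1
    return Y


def apply_assembled(X, n_cells):
    """Reference: assemble the dense global matrix, then multiply."""
    n = n_cells + 1
    A = [[0.0] * n for _ in range(n)]
    for cell in range(n_cells):
        A[cell][cell] += 1.0
        A[cell][cell + 1] -= 1.0
        A[cell + 1][cell] -= 1.0
        A[cell + 1][cell + 1] += 1.0
    n_vecs = len(X[0])
    return [[sum(A[i][k] * X[k][j] for k in range(n)) for j in range(n_vecs)]
            for i in range(n)]


# Small usage example: 6 cells (7 dofs), 5 vectors, batches of 2 vectors.
X_demo = [[float(i * i + j) for j in range(5)] for i in range(7)]
Y_demo = apply_matrix_free(X_demo, n_cells=6, batch_size=2)
```

The result does not depend on `batch_size`; in this sketch the batch size only changes the order in which vectors are traversed, which is the knob a real implementation would tune to the hardware.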
2017
- PhysRevB. Short-range atomic ordering in nonequilibrium silicon-germanium-tin semiconductors. S. Mukherjee, N. Kodali, D. Isheim, and 5 more authors. Physical Review B, Apr 2017. Publisher: American Physical Society.
Precise knowledge of the atomic order in monocrystalline alloys is fundamental to understanding and predicting their physical properties. With this perspective, we utilized laser-assisted atom probe tomography to investigate the three-dimensional distribution of atoms in nonequilibrium epitaxial Sn-rich group-IV SiGeSn ternary semiconductors. Different atom probe statistical analysis tools, including frequency distribution analysis, partial radial distribution functions, and nearest-neighbor analysis, were employed to evaluate and compare the behavior of the three elements to their spatial distributions in an ideal solid solution. This atomistic-level analysis provided clear evidence of an unexpected repulsive interaction between Sn and Si leading to the deviation of Si atoms from the theoretical random distribution. This departure from an ideal solid solution is supported by first-principles calculations and attributed to the tendency of the system to reduce its mixing enthalpy throughout the layer-by-layer growth process.
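To make the notion of a frequency distribution analysis concrete, the sketch below shows one common generic form of it: the atom sequence is cut into blocks of equal size, the solute atoms per block are counted, and the resulting histogram is compared against the binomial distribution expected for a random solid solution using a chi-square statistic. This is an illustration under stated assumptions, not the procedure used in the paper; the function names, the block size, and the synthetic 0/1 labels are all hypothetical.

```python
import math
import random


def block_counts(labels, block_size):
    """Count solute atoms (label 1) in consecutive blocks of block_size atoms.
    A trailing incomplete block is ignored."""
    n_blocks = len(labels) // block_size
    return [sum(labels[b * block_size:(b + 1) * block_size])
            for b in range(n_blocks)]


def binomial_pmf(k, n, p):
    """Probability of k solute atoms in a block of n for a random solution."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def chi_square_vs_binomial(labels, block_size):
    """Chi-square distance between the observed block histogram and the
    binomial expectation at the measured mean solute fraction."""
    counts = block_counts(labels, block_size)
    n_blocks = len(counts)
    p = sum(counts) / (n_blocks * block_size)
    observed = [0] * (block_size + 1)
    for c in counts:
        observed[c] += 1
    chi2 = 0.0
    for k in range(block_size + 1):
        expected = n_blocks * binomial_pmf(k, block_size, p)
        if expected > 0.0:  # skip bins the binomial model rules out
            chi2 += (observed[k] - expected) ** 2 / expected
    return chi2


# Synthetic data only (assumption): 10% solute, randomly placed versus
# fully segregated, to show how the statistic reacts to non-random ordering.
rng = random.Random(0)
random_labels = [1 if rng.random() < 0.1 else 0 for _ in range(20000)]
segregated_labels = sorted(random_labels, reverse=True)
chi2_random = chi_square_vs_binomial(random_labels, block_size=50)
chi2_segregated = chi_square_vs_binomial(segregated_labels, block_size=50)
```

A large statistic indicates a departure from the random distribution. In practice, bins with very small expected counts are usually merged before the comparison, which this sketch omits for brevity.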