On the Efficiency of Supernodal Factorization in Interior-Point Method Using CPU-GPU Collaboration

descriptionPublicationkeyboard_double_arrow_right Article , Other literature type 01 Jan 2020Publisher:Institute of Electrical and Electronics Engineers (IEEE)Journal:IEEE Access, volume 8, pages 120,892-120,904 (eissn: 2169-3536,

Copyright policy )

Authors: Usman Ali Shah; Suhail Yousaf; Iftikhar Ahmad; Muhammad Ovais Ahmad;

doi: 10.1109/access.2020.3006353 , 10.60692/8ez8p-vjt40 , 10.60692/2t5e6-ta668

On the Efficiency of Supernodal Factorization in Interior-Point Method Using CPU-GPU Collaboration

- Summary
- Subjects
- Related research
  (8)
- Metrics

Abstract

El método de punto interior primo-dual (PDIPM) es la técnica más eficiente para resolver problemas de programación lineal dispersa (LP). A pesar de su eficiencia, PDIPM sigue siendo un algoritmo intensivo en cómputo. Afortunadamente, las unidades de procesamiento gráfico (GPU) tienen el potencial de cumplir con este requisito. Sin embargo, su peculiar arquitectura implica una relación positiva entre la densidad del problema y la aceleración, lo que implica una afinidad limitada de las GPU por la dispersión del problema. Para superar esta dificultad, la implementación híbrida de última generación (CPU-GPU) de PDIPM explota la presencia de supernodos en matrices dispersas durante la factorización. Los supernodos son grupos de columnas similares que pueden tratarse como submatrices densas. El método de factorización utilizado en el solucionador de última generación solo realiza operaciones seleccionadas relacionadas con supernodos grandes en la GPU. Se sabe que este método infrautiliza la potencia de cálculo de la GPU al tiempo que aumenta la sobrecarga de comunicación CPU-GPU. Estas deficiencias nos animaron a adaptar otro método de factorización, que procesa conjuntos de supernodos relacionados en GPU, e introducirlo en la implementación PDIPM de un popular solucionador de código abierto. Nuestra adaptación permitió que el método de factorización mitigara mejor los efectos de los errores de redondeo acumulados en múltiples iteraciones de PDIPM. Para aumentar las ganancias de rendimiento, también utilizamos un método eficiente de multiplicación de matrices basado en CPU. Cuando se probó para un conjunto de problemas dispersos bien conocidos, el solucionador adaptado mostró aceleraciones promedio de aproximadamente 55X, 1.14X y 1.05X sobre la versión original del solucionador de código abierto, el solucionador de última generación y un solucionador patentado altamente optimizado conocido como CPLEX, respectivamente. Estos resultados indican claramente que nuestro enfoque híbrido propuesto puede conducir a ganancias de rendimiento significativas para resolver grandes problemas dispersos.

La méthode du point intérieur primal-dual (PDIPM) est la technique la plus efficace pour résoudre les problèmes de programmation linéaire clairsemée (LP). Malgré son efficacité, PDIPM reste un algorithme gourmand en calcul. Heureusement, les unités de traitement graphique (GPU) ont le potentiel de répondre à cette exigence. Cependant, leur architecture particulière implique une relation positive entre la densité du problème et l'accélération, impliquant à l'inverse une affinité limitée des GPU pour la rareté du problème. Pour surmonter cette difficulté, l'implémentation hybride de pointe (CPU-GPU) de PDIPM exploite la présence de supernœuds dans des matrices éparses lors de la factorisation. Les supernœuds sont des groupes de colonnes similaires qui peuvent être traités comme des sous-matrices denses. La méthode de factorisation utilisée dans le solveur de pointe n'effectue que des opérations sélectionnées liées aux grands supernœuds sur le GPU. Cette méthode est connue pour sous-utiliser la puissance de calcul du GPU tout en augmentant la surcharge de communication CPU-GPU. Ces lacunes nous ont incités à adapter une autre méthode de factorisation, qui traite des ensembles de supernœuds connexes sur GPU, et à l'introduire dans l'implémentation PDIPM d'un solveur open source populaire. Notre adaptation a permis à la méthode de factorisation de mieux atténuer les effets des erreurs d'arrondi accumulées sur plusieurs itérations de PDIPM. Pour augmenter les gains de performance, nous avons également utilisé une méthode efficace de multiplication matricielle basée sur le processeur. Lorsqu'il a été testé pour un ensemble de problèmes rares bien connus, le solveur adapté a montré des accélérations moyennes d'environ 55X, 1,14X et 1,05X sur la version originale du solveur open source, le solveur de pointe et un solveur propriétaire hautement optimisé connu sous le nom de CPLEX, respectivement. Ces résultats indiquent fortement que notre approche hybride proposée peut conduire à des gains de performance significatifs pour résoudre de grands problèmes épars.

Primal-dual interior-point method (PDIPM) is the most efficient technique for solving sparse linear programming (LP) problems. Despite its efficiency, PDIPM remains a compute-intensive algorithm. Fortunately, graphics processing units (GPUs) have the potential to meet this requirement. However, their peculiar architecture entails a positive relationship between problem density and speedup, conversely implying a limited affinity of GPUs for problem sparsity. To overcome this difficulty, the state-of-the-art hybrid (CPU-GPU) implementation of PDIPM exploits presence of supernodes in sparse matrices during factorization. Supernodes are groups of similar columns that can be treated as dense submatrices. Factorization method used in the state-of-the-art solver performs only selected operations related to large supernodes on GPU. This method is known to underutilize GPU's computational power while increasing CPU-GPU communication overhead. These shortcomings encouraged us to adapt another factorization method, which processes sets of related supernodes on GPU, and introduce it to the PDIPM implementation of a popular open-source solver. Our adaptation enabled the factorization method to better mitigate the effects of round-off errors accumulated over multiple iterations of PDIPM. To augment performance gains, we also used an efficient CPU-based matrix multiplication method. When tested for a set of well-known sparse problems, the adapted solver showed average speed-ups of approximately 55X, 1.14X and 1.05X over the open-source solver's original version, the state-of-the-art solver, and a highly optimized proprietary solver known as CPLEX, respectively. These results strongly indicate that our proposed hybrid approach can lead to significant performance gains for solving large sparse problems.

طريقة النقطة الداخلية البدائية المزدوجة (PDIPM) هي التقنية الأكثر فعالية لحل مشكلات البرمجة الخطية المتفرقة (LP). على الرغم من كفاءته، لا يزال PDIPM خوارزمية كثيفة الحوسبة. لحسن الحظ، تتمتع وحدات معالجة الرسومات (GPUs) بالقدرة على تلبية هذا المطلب. ومع ذلك، فإن بنيتها الغريبة تستلزم وجود علاقة إيجابية بين كثافة المشكلة والتسريع، مما يعني على العكس من ذلك تقاربًا محدودًا لوحدات معالجة الرسومات لتناثر المشكلة. للتغلب على هذه الصعوبة، يستغل التنفيذ الهجين المتطور (CPU - GPU) لـ PDIPM وجود العقد الفائقة في المصفوفات المتناثرة أثناء التحليل. العقد الفائقة هي مجموعات من الأعمدة المتشابهة التي يمكن التعامل معها على أنها مصفوفات فرعية كثيفة. تقوم طريقة التحليل المستخدمة في جهاز الحل المتطور بتنفيذ عمليات محددة فقط تتعلق بالعقد الفائقة الكبيرة على وحدة معالجة الرسومات. ومن المعروف أن هذه الطريقة تقلل من استخدام الطاقة الحسابية لوحدة معالجة الرسومات مع زيادة الاتصالات بين وحدة المعالجة المركزية ووحدة المعالجة المركزية. شجعتنا أوجه القصور هذه على تكييف طريقة أخرى لتحليل العوامل، والتي تعالج مجموعات من العقد الفائقة ذات الصلة على GPU، وتعريفها بتنفيذ PDIPM لآلة حل شائعة مفتوحة المصدر. مكّن تكييفنا طريقة التحليل إلى عوامل للتخفيف بشكل أفضل من آثار أخطاء التقريب المتراكمة على التكرارات المتعددة لـ PDIPM. لزيادة مكاسب الأداء، استخدمنا أيضًا طريقة ضرب مصفوفة فعالة قائمة على وحدة المعالجة المركزية. عند اختباره لمجموعة من المشكلات المتفرقة المعروفة، أظهر المحلل المتكيف متوسط تسارع يبلغ حوالي 55x و 1.14X و 1.05X على الإصدار الأصلي لآلة الحل مفتوحة المصدر، وآلة الحل الحديثة، وآلة الحل المحسنة للغاية والمعروفة باسم CPLEX، على التوالي. تشير هذه النتائج بقوة إلى أن نهجنا المختلط المقترح يمكن أن يؤدي إلى مكاسب كبيرة في الأداء لحل المشكلات الكبيرة المتناثرة.

Related Organizations

University of Engineering and Technology Peshawar
Pakistan
Islamia College University
Pakistan
Information Technology University
Pakistan
Islamia College
Pakistan
University of Engineering and Technology
Pakistan

View all View all

Keywords

General-purpose computing on graphics processing units, Quadratic Programming, GPU, FOS: Mechanical engineering, Speedup, Sparse matrix, Engineering, Interior-Point Methods, Factorization, Eigenvalues and eigenvectors, Numerical Analysis, Nonlinear Programming, Numerical Optimization Techniques, Computer graphics (images), Physics, Computational science, linear programming, Programming language, Algorithm, Computational Theory and Mathematics, Solver, Sparse matrices, Physical Sciences, Gaussian, Graphics, Electrical engineering. Electronics. Nuclear engineering, Sparse Linear Systems, Parallel computing, LU decomposition, Multiplication (music), Aerospace Engineering, Quantum mechanics, Matrix decomposition, Parallel Computing, Satellite Communication Networks and Systems, FOS: Mathematics, Interior point method, Matrix Algorithms and Iterative Methods, GPGPU, Computer science, TK1-9971, Overhead (engineering), Operating system, Combinatorics, Computer Science, primal-dual interior-point method, Mathematics

8 Research products, page 1 of 1

Toward online multi‐period power dispatch with AC constraints and renewable energy
2018IsAmongTopNSimilarDocuments
Comparison of total variation algorithms for electrical impedance tomography
2015IsAmongTopNSimilarDocuments
Regularised primal–dual interior‐point method for dynamic optimal power flow with block‐angular structures
2020IsAmongTopNSimilarDocuments
Efficient two-level interior point method for optimal pumped hydrostorage scheduling exploiting the nonsparse matrix structure
2001IsAmongTopNSimilarDocuments
Model-based Volt-Var Optimization using Advanced Metering Infrastructure in distribution networks
2015IsAmongTopNSimilarDocuments
Experimental/clinical evaluation of EIT image reconstruction with ℓ₁data and image norms
2013IsAmongTopNSimilarDocuments
Vectorization implementation of optimal power flow in rectangular form based on interior point method
2008IsAmongTopNSimilarDocuments
glpk software on GNU
IsRelatedTo

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	2
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

2

Average

gold

Fields of Science (3) View all

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

View all