International Journal of Applied Science https://j.ideasspread.org/index.php/ijas <p>International Journal of Applied Science (IJAS) is an international, double-blind peer-reviewed, open-access journal, published by IDEAS SPREAD INC. It publishes original research, applied, and educational articles in all areas of applied science. It provides an academic platform for professionals and researchers to contribute innovative work in the field.<br>Authors are encouraged to submit complete, unpublished, original works that are not under review at any other journal. The scope of the journal includes, but is not limited to, the following fields: Agriculture, Biological Engineering and Application, Applied Mathematics and Statistics, Applied Physics and Engineering, Applied Chemistry and Materials Sciences, Civil Engineering and Architecture, Computer and Information Sciences and Application, Energy, Environmental Science and Engineering, Mechanics, Metrology, Military Science, Space Science, Sports Science, Ergonomics, Health Sciences, Fisheries Science, Food Science, Forestry, and all fields related to applied science.<br>The journal is published in both print and online versions. 
The online version is free to access and download.</p> IDEAS SPREAD INC en-US International Journal of Applied Science 2576-7240 <p>Copyright for this article is retained by the author(s), with first publication rights granted to the journal.<br>This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).</p> Optimizing Logcumsumexp on Cambricon MLU: Architecture-Aware Scheduling and Memory Management https://j.ideasspread.org/index.php/ijas/article/view/1652 <p class="text"><span lang="EN-US">Logcumsumexp is a core method for numerically stable cumulative summation in logarithmic space, and it is especially suited to computations involving extremely small or extremely large values. By working in the log domain, the algorithm avoids the underflow and overflow that commonly arise in probability calculations, deep learning, and statistical modeling, which makes it an important routine in high-performance computing. In recent years, China's chip industry has grown steadily, and the domestic MLU computing platform from Cambricon Technologies has given users worldwide a new option. Building on the Cambricon MLU computing platform and its hardware structure, this paper presents a Logcumsumexp implementation named MLULCSE, which performs Logcumsumexp on tensors of any dimensionality along a specified dimension and is optimized for different types of Logcumsumexp tasks. By categorizing tasks into four types and applying a different strategy to each, tailored to the hardware architecture, we achieve efficient Logcumsumexp computation. 
This work enables efficient probabilistic computing on domestic AI accelerators. Experimental results show that the hardware time of MLULCSE running on the MLU 370-X4 stays within 7 times that of PyTorch Logcumsumexp running on a Tesla V100 and, in some cases, is as low as 0.42 times.</span></p> Xiaohu Xu Yusen Zhu http://creativecommons.org/licenses/by/4.0 2025-07-03 2025-07-03 8 3 p1 p1 10.30560/ijas.v8n3p1
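For readers unfamiliar with the operation named in the abstract above, the following is a minimal reference sketch of Logcumsumexp on a one-dimensional sequence, written in plain Python. It is not the MLULCSE implementation and says nothing about the paper's four task types or its MLU scheduling; the function names `logaddexp` and `logcumsumexp` are illustrative choices made here. It only shows the numerical idea: each output is log(exp(x_0) + ... + exp(x_i)), accumulated pairwise so that no intermediate value leaves the log domain.

```python
import math


def logaddexp(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without forming exp(a) or exp(b) directly."""
    # -inf is the log-domain identity, since exp(-inf) == 0.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    if m == math.inf:
        return m
    # Factor out the larger term: log(e^a + e^b) = m + log(1 + e^{-|a - b|}).
    # The exponent is never positive, so exp() cannot overflow here.
    return m + math.log1p(math.exp(-abs(a - b)))


def logcumsumexp(xs):
    """Running log-sum-exp: out[i] = log(sum(exp(xs[j]) for j in range(i + 1)))."""
    out = []
    acc = -math.inf  # log of an empty sum
    for x in xs:
        acc = logaddexp(acc, x)
        out.append(acc)
    return out
```

As a usage example, `logcumsumexp([1000.0, 1000.0])` yields values close to 1000 and 1000 + ln 2, whereas a naive `math.log(math.exp(1000.0) + math.exp(1000.0))` raises `OverflowError` in Python. The sequential loop is chosen for clarity only; an accelerator implementation would typically use a parallel scan instead.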