[WIP]add mluop cholesky #1146

dglr · 2024-11-13T17:49:33Z

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

Please describe your motivation and the goal you want to achieve through this pull request.

2. Modification

Please briefly describe what modification is made in this pull request, and indicate where to make the modification.

Are new test cases added? If so, please post the corresponding generator-PR link here.

3. Test Report

If you want to know how to do operator testing, you can see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

For static threshold standard details, see: MLU-OPS™ Accuracy Acceptance Standard.

static threshold
- diff1
  - float32 mlu diff1 <= 1e-5
  - float32 mlu diff1 <= 3e-3
  - float16 mlu diff1 <= 3e-3
- diff2
  - float32 mlu diff2 <= 1e-5
  - float32 mlu diff2 <= 3e-3
  - float16 mlu diff2 <= 3e-3
- diff3
  - mlu diff3 == 0
  - mlu diff3_1 == 0
  - mlu diff3_2 == 0
dynamic threshold
- diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
- diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
- diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
  - float32, threshold = 1e-5
  - float16, threshold = 1e-3
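
The dynamic threshold rule above can be sketched as a small check. This is only an illustration of the stated formula `max(baseline diff * 10, static threshold)`; the function name `dynamic_threshold_passes` and the `STATIC_THRESHOLD` table are hypothetical and not part of the MLU-OPS test framework, and it is an assumption that the float32/float16 thresholds listed under the dynamic rule apply as the static fallback.

```python
# Static fallbacks quoted in the checklist above for the dynamic rule
# (assumed here to be the "static threshold" term of the formula).
STATIC_THRESHOLD = {"float32": 1e-5, "float16": 1e-3}


def dynamic_threshold_passes(mlu_diff, baseline_diff, dtype, rate=10):
    """Return True if mlu_diff <= max(baseline_diff * rate, static threshold)."""
    limit = max(baseline_diff * rate, STATIC_THRESHOLD[dtype])
    return mlu_diff <= limit


# A baseline error of 2e-6 allows an MLU error of up to about 2e-5 for float32.
print(dynamic_threshold_passes(1.5e-5, 2e-6, "float32"))  # True
print(dynamic_threshold_passes(3e-5, 2e-6, "float32"))    # False
```

When the baseline error is zero, the limit falls back to the static threshold, so a perfect baseline does not force the MLU result to be bit-exact.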

3.1.2 Operator Scheme checklist

Supported hardware
- MLU370
- MLU590
Job types
- BLOCK
- UNION1
- UNION2
- UNION4
- The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill your test results(Error Message) in here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform：MLU370

# The test results should contain Op name, Shape, Data type,  
#   MLU Hardware Time(us), MLU Interface Time(us), MLU IO Efficiency, 
#   MLU Compute Efficiency, and Mlu Workspace Size(Bytes)
# 
# for example:
#
# ----------- case0 -----------
# case0
# [Op name                ]: abs
# [Shape                  ]: input.shape=[1024,1024,3,4], output.shape=[1024,1024,3,4]
# [Data type              ]: float32
# [MLU Hardware Time      ]: 15728 (us)
# [MLU Interface Time     ]: 369.008 (us)
# [MLU IO Efficiency      ]: 0.23275
# [MLU Compute Efficiency ]: 0.5
# [Mlu Workspace Size     ]: -1 (Bytes)
# 
# ----------- case1 -----------
# ...

Platform：MLU590

# ----------- case0 -----------
# ----------- case1 -----------
# ...

3.4 Summary Analysis

Please give a brief overview here, if you want to note and summarize the content.

ArtIntAI · 2024-11-22T08:23:23Z

kernels/cholesky/cholesky_union1.mlu

+}
+
+__mlu_global__ void inverse_kernel(int batch, float* d_input, int ld_input,
+                                   int stride_input, float* d_output,


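Some common issues in the .mlu files:
1. Key steps lack the necessary comments.
2. sync_cluster is overused. For every sync and sync_cluster, add a comment explaining which operations it synchronizes and why.
3. Do not put logic that calls CNNL into .mlu files. A .mlu file essentially holds device-side functions, whereas calling the CNNL interface is host-side behavior.
4. Do not create your own cnrtQueue; always use the handle->queue passed in from outside.
5. Wrap anything involving cnrtDim in a policyFunc function.
6. Variable names are unclear. Avoid abbreviated names to improve readability.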

ArtIntAI · 2024-11-22T08:24:04Z

kernels/cholesky/complex_cholesky_union1.mlu

+                              const int lda, int width, float* sram_buffer,
+                              float* dst) {
+  int id = taskId % 4;
+  int span = CPOTF_NB;


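Some common issues in the .mlu files:
1. Key steps lack the necessary comments.
2. sync_cluster is overused. For every sync and sync_cluster, add a comment explaining which operations it synchronizes and why.
3. Do not put logic that calls CNNL into .mlu files. A .mlu file essentially holds device-side functions, whereas calling the CNNL interface is host-side behavior.
4. Do not create your own cnrtQueue; always use the handle->queue passed in from outside.
5. Wrap anything involving cnrtDim in a policyFunc function.
6. Variable names are unclear. Avoid abbreviated names to improve readability.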

ArtIntAI · 2024-11-22T08:24:19Z

kernels/cholesky/complex_cholesky_union1.mlu

+ *************************************************************************/
+
+#include "cholesky.h"
+#define COMPLEX_OFFSET(A, off) (((float*)A) + (2 * (off)))


What is the purpose here? Please add a comment.

ArtIntAI · 2024-11-22T08:25:00Z

kernels/cholesky/cholesky.h

+#define CLUSTER_NUM 1
+#define M (TASK_NUM * POTF_NB)
+#define ZERO 0.0
+#define SHARED_MEM_SIZE (((M * POTF_NB / TASK_NUM * 4) + (POTF_NB * POTF_NB)))


Please add a comment explaining how this space is used.

ArtIntAI · 2024-11-22T08:25:24Z

kernels/cholesky/cholesky.h

+#define M (TASK_NUM * POTF_NB)
+#define ZERO 0.0
+#define SHARED_MEM_SIZE (((M * POTF_NB / TASK_NUM * 4) + (POTF_NB * POTF_NB)))
+#define OFFSET_ROW(A, i, j) A + ((i) * (lda) + (j))


Please add a comment explaining the purpose of these offset macros.

ArtIntAI · 2024-11-22T08:30:15Z

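Please also update the test report. In addition, for the performance section of the test report, please measure float/complex performance with both upper=false and upper=true.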

ArtIntAI · 2024-11-25T06:19:43Z

Random testing shows accuracy problems and core dumps in quite a few cases. Please use the script below to generate cases and test them yourself.

```python
import sys
import os
import numpy as np
def dShape(shapes):
    shape_val = '"shape":['
    for i in range(len(shapes)-1):
        shape_val += str(shapes[i])+','
    shape_val += str(shapes[len(shapes)-1]) + ']'
    return  shape_val

def dType(data_type):
    return '"dtype":"' + data_type + '"'

def dRandomDistribution(start, end):
    return '"random_distribution":{"uniform":[' + str(start) + ',' + str(end) + ']}'

def dlayout(data_layout):
    return '"layout":"' + data_layout + '"'

def dNanInf():
    naninf_bool_list = ['true', 'false']
    has_nan = np.random.choice(naninf_bool_list)
    has_inf = np.random.choice(naninf_bool_list)
    result_str = f'"contain_nan": {has_nan}, "contain_inf": {has_inf}'
    return result_str

def dInplace():
    inplace_bool_list = ['true', 'false']
    is_inplace = np.random.choice(inplace_bool_list)
    result_str = f'"inplace": {is_inplace}'
    return result_str

def dUpper():
    upper_bool_list = ['true', 'false']
    is_upper = np.random.choice(upper_bool_list)
    result_str = f'"upper": {is_upper}'
    return result_str

def genSingleCase(dtype='float32', params_list=[1,1,False]):
    B = params_list[0]
    N = params_list[1]
    nan_inf = params_list[2]

    input_shape = [B, N, N]
    output_shape = [B, N, N]

    inputs = '     {\n       "inputs":['
    if nan_inf:
      input1 = '{' + dShape(input_shape) + ',' + dType(dtype) + ',' + dRandomDistribution(0,100) + ","+ dNanInf() + "," + dlayout("ARRAY") + '}'
    else :
      input1 = '{' + dShape(input_shape) + ',' + dType(dtype) + ',' + dRandomDistribution(0,100) + "," + dlayout("ARRAY") + '}'

    outputs = '       "outputs":['
    output1 = '{' + dShape(output_shape) + ',' + dType(dtype) + ',' + dlayout("ARRAY") + '}'


    inputs += input1 + '],\n'
    outputs += output1  + '],\n'

    op_param = '       "op_params":{' + dInplace() + ","+ dUpper() + '},\n'

    proto_param = '       "proto_params":{"write_data":true}'

    cur_res = inputs + outputs + op_param + proto_param + '\n     }'
    return cur_res

def genCase(dtype = "float32", nan_inf=False):
    count = 1
    cur_res = '     "manual_data":[\n'
    B = np.random.randint(1,32)
    N = np.random.randint(1,128)
    param = [B, N, nan_inf]
    cur_res += genSingleCase(dtype = dtype, params_list=param)

    for i in range(5):
        if i % 2 == 0:
            count += 1
            B = np.random.randint(1,32)
            N = np.random.randint(128,256)
            param = [B, N, nan_inf]
            cur_res += ',\n' + genSingleCase(dtype = dtype, params_list=param)

        if i % 3 == 0:
            count += 1
            B = np.random.randint(1,32)
            N = np.random.randint(256,512)
            param = [B, N, nan_inf]
            cur_res += ',\n' + genSingleCase(dtype = dtype, params_list=param)

        if i % 5 == 0:
            count += 1
            B = np.random.randint(1,32)
            N = np.random.randint(512,1024)
            param = [B, N, nan_inf]
            cur_res += ',\n' + genSingleCase(dtype = dtype, params_list=param)
    cur_res += '\n     ]\n}'
    print("the count of cases:", count)
    return cur_res

if __name__ == "__main__":
    res = '{\n\
    "op_name":"cholesky",\n\
    "device":"gpu",\n\
    "require_value":true,\n\
    "evaluation_criterion":["diff1","diff2", "diff4"],\n\
    "threshold_rate":[10,10,1],\n\
    "if_dynamic_threshold": true,\n\
    "supported_mlu_platform":["370", "590"],\n'
    # Write one JSON file per (dtype, nan/inf) combination. The with-block
    # closes each file, so no file handle is leaked.
    for dtype, tag in (("float32", "float32"), ("complex_float", "complex_fp32")):
        for nan_inf, suffix in ((False, ""), (True, "_nan_and_inf")):
            with open(f"./cholesky_random_{tag}{suffix}.json", "w") as file:
                file.write(res + genCase(dtype, nan_inf))
```

JSON configurations for the failing scenarios:
- [cholesky_random_float32.json](https://github.com/user-attachments/files/17898254/cholesky_random_float32.json)
- [cholesky_random_float32_nan_and_inf.json](https://github.com/user-attachments/files/17898256/cholesky_random_float32_nan_and_inf.json)
- [cholesky_random_complex_fp32.json](https://github.com/user-attachments/files/17898257/cholesky_random_complex_fp32.json)
- [cholesky_random_complex_fp32_nan_and_inf.json](https://github.com/user-attachments/files/17898258/cholesky_random_complex_fp32_nan_and_inf.json)

ArtIntAI · 2024-11-25T06:40:24Z

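In addition, for the user-visible tensor properties listed below, please test the ones that are supported. For the ones that are not supported, follow other operators and reject them in the parameter checks.
1. Large tensors (a single dimension exceeding 2G elements, or the product of all dimensions exceeding 2G elements).
2. In-place: the input and output tensors share the same address.
3. Stride: if unsupported, report an error in the parameter check.
4. Broadcasting: if unsupported, report an error in the parameter check.
5. Whether accuracy matches the GPU when the input and output contain nan/inf.
6. Zero-element input tensors, that is, some dimension is 0.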

ArtIntAI · 2024-11-25T06:55:02Z

The test generator code also has a problem, shown below. Please fix it as well.
ERROR:root [06:50:42.801] [builder.py:249] got exception when running case {'inputs': [{'require_value': True, 'shape': [23, 872, 872], 'random_distribution': {'uniform': [0, 100]}, 'layout': 'ARRAY', 'dtype': 'complex_float', 'position': None, 'scale': None, 'offset': None, 'onchip_dtype': 'unset', 'contain_nan': None, 'contain_inf': None}], 'outputs': [{'require_value': True, 'shape': [23, 872, 872], 'layout': 'ARRAY', 'dtype': 'complex_float', 'position': None, 'scale': None, 'offset': None, 'onchip_dtype': 'unset', 'handle_param': HandleParam(round_mode=<QuantizeRoundMode.ROUND_OFF_ZERO: 2>)}], 'handle_param': HandleParam(round_mode=<QuantizeRoundMode.ROUND_OFF_ZERO: 2>), 'src_schema': 'json', 'cast_mode': None}, reason: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 779 is not positive-definite).
Traceback (most recent call last):
  File "/ict/mlu-ops-generator/framework/builder.py", line 247, in run
    is_success = runner(case)
  File "/ict/mlu-ops-generator/framework/builder.py", line 231, in <lambda>
    runner = lambda case: case.run()
  File "/ict/mlu-ops-generator/framework/builder.py", line 349, in run
    op_test.run()
  File "/ict/mlu-ops-generator/nonmlu_ops/base/optest.py", line 59, in run
    outputs_baseline = self.compute()
  File "/ict/mlu-ops-generator/nonmlu_ops/cholesky/compute.py", line 273, in compute
    result_L_complex64 = torch.linalg.cholesky(A_complex64,upper=upper)
torch._C._LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 779 is not positive-definite).
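
The error above says the randomly generated input is not positive-definite, which the baseline `torch.linalg.cholesky` call requires. One common way to obtain valid input is to build `X @ X^H + n * I` from a random `X`. The sketch below illustrates that construction in plain Python; the function name `random_hpd_matrix` is hypothetical, and this is not the actual code in `nonmlu_ops/cholesky/compute.py`.

```python
import random


def random_hpd_matrix(n, complex_valued=False, seed=None):
    """Return an n x n Hermitian positive-definite matrix as nested lists.

    Built as X @ X^H + n * I: X @ X^H is positive semi-definite, and adding
    n * I shifts every eigenvalue up by n, so the result is positive-definite.
    """
    rng = random.Random(seed)

    def entry():
        real = rng.uniform(0.0, 1.0)
        return complex(real, rng.uniform(0.0, 1.0)) if complex_valued else real

    x = [[entry() for _ in range(n)] for _ in range(n)]
    # (X @ X^H)[i][j] = sum_k X[i][k] * conj(X[j][k]); float also has conjugate().
    a = [[sum(x[i][k] * x[j][k].conjugate() for k in range(n))
          for j in range(n)]
         for i in range(n)]
    for i in range(n):
        a[i][i] += n
    return a


matrix = random_hpd_matrix(3, complex_valued=True, seed=0)
# Hermitian: matrix[i][j] equals the conjugate of matrix[j][i].
print(abs(matrix[0][1] - matrix[1][0].conjugate()) < 1e-12)
```

The same idea carries over to a batched tensor by applying the construction to each batch element.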

dglr · 2024-11-25T13:19:55Z

At which sizes exactly did the accuracy or core dump problems occur? Could you give some examples so that I can reproduce and fix them first?

ArtIntAI · 2024-11-26T02:29:07Z

Failing cases:
- 14 * 679 * 679, upper
- 18 * 925 * 925, upper

ArtIntAI · 2024-11-27T12:31:39Z

kernels/cholesky/cholesky_union1.mlu

+    if (batch == 1) {
+      func_type = CNRT_FUNC_TYPE_UNION1;
+    } else if (batch == 2) {
+      func_type = CNRT_FUNC_TYPE_UNION2;


The board may not have this type. Please set it with reference to the following:

mlu-ops/kernels/dynamic_point_to_voxel/dynamic_point_to_voxel_backward/dynamic_point_to_voxel_backward.cpp

Line 191 in 5ae8c94

*k_type = mluop::runtime::getJobLimitCapabilityCnrtFuncType(handle);

mlu-ops/kernels/fft/c2c_fft/c2c_fft_host.cpp

Line 1668 in 5ae8c94

int task_type = mluop::runtime::getJobLimitCapability(handle);

Changed to the U1 type.

ArtIntAI · 2024-11-27T12:42:06Z

kernels/cholesky/cholesky.cpp

+                       type_size * size_a * lda * ((uint64_t)batch_size - 16),
+                       CNRT_MEM_TRANS_DIR_DEV2DEV));
+      } else {
+        CNRT_CHECK(cnrtMemcpy(d_output, workspace,


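Using cnrtMemcpy, cnrtMemset, and cnrtQueueSync is not recommended, because they cause problems for upper layers that use mlu_graph.
Replace cnrtMemcpy with the on-chip __memcpy.
Replace cnrtMemset with setting the data on chip.
cnrtQueueSync can be removed: within a single queue, kernel launches (using <<<>>>) execute serially.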

ArtIntAI · 2024-12-06T08:29:43Z

kernels/cholesky/cholesky_union1.mlu

+      func_type = CNRT_FUNC_TYPE_UNION4;
+      carry_batch = 4;
+    } else {
+      func_type = CNRT_FUNC_TYPE_UNION8;


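This should depend on the board's actual maximum number of clusters. U8 is hard-coded here, but some boards do not have the U8 type.
You can refer to the code here: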

mlu-ops/kernels/fft/rfft/rfft_host.cpp

Line 1032 in 662a162

int task_type = mluop::runtime::getJobLimitCapability(handle);

Please also fix the other places that hard-code U8 in the same way.
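
The selection logic requested above can be sketched in Python as follows. This is a language-neutral illustration only, not the host code of this PR: `select_job_type` and `UNION_CLUSTERS` are hypothetical names, the numbers are cluster counts rather than real CNRT enum values, and `max_clusters` stands in for whatever limit the board reports (for example through `mluop::runtime::getJobLimitCapability(handle)` in the referenced files).

```python
# Hypothetical stand-ins for the union job types. The values are the number
# of clusters each type spans, not real CNRT enum values.
UNION_CLUSTERS = {"UNION1": 1, "UNION2": 2, "UNION4": 4, "UNION8": 8}


def select_job_type(batch, max_clusters):
    """Pick the largest union type that fits both the batch and the board."""
    limit = max(1, min(batch, max_clusters))
    best = "UNION1"
    for name, clusters in UNION_CLUSTERS.items():  # ascending cluster count
        if clusters <= limit:
            best = name
    return best


print(select_job_type(16, 8))  # UNION8 on a board with 8 clusters
print(select_job_type(16, 4))  # UNION4 on a board with only 4 clusters
```

The point of the sketch is that the board limit caps the choice, so a large batch never selects a job type the hardware does not provide.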

ArtIntAI · 2025-01-03T03:41:37Z

test/mlu_op_gtest/pb_gtest/src/zoo/cholesky/cholesky.cpp

+
+
+
+  if (result_mul) {


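For result acceptance, follow svd: we need to verify the output L or U, L@LT, and that the output is lower or upper triangular. The current handling covers only the first of these, so tests for the other two need to be added.
On result comparison: result_mul currently defaults to false, so only the upper/lower triangular part of the result is tested.
The same case also needs to be tested with result_mul set to true, to check that the result reconstructs the input.
In addition, add a test that the result is always upper or lower triangular. See https://github.com/pytorch/pytorch/blob/main/test/test_linalg.py#L622 for reference.

Please also update the generator logic according to the comments above.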
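
The two additional checks requested above (triangular shape and reconstruction of the input) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gtest code in `cholesky.cpp`: all function names are hypothetical, `cholesky_lower` is a simple reference factorization, and the tolerance is an arbitrary placeholder rather than the project's accuracy standard.

```python
import math


def cholesky_lower(a):
    """Reference lower-triangular factor L with a == L @ L^H (pure Python)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k].conjugate() for k in range(j))
            if i == j:
                pivot = (a[i][i] - s).real
                if pivot <= 0.0:
                    raise ValueError("input is not positive-definite")
                l[i][j] = math.sqrt(pivot)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def conj_transpose(m):
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def is_triangular(m, upper):
    """True if every entry outside the requested triangle is exactly zero."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(n)
               if (i > j if upper else i < j))


def max_abs_diff(x, y):
    return max(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def check_factor(a, factor, upper, tol=1e-5):
    """Triangular-shape check plus reconstruction check for one matrix."""
    h = conj_transpose(factor)
    rebuilt = matmul(h, factor) if upper else matmul(factor, h)  # U^H U or L L^H
    return is_triangular(factor, upper) and max_abs_diff(a, rebuilt) <= tol


a = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
lower_factor = cholesky_lower(a)
upper_factor = conj_transpose(lower_factor)
print(lower_factor)  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
print(check_factor(a, lower_factor, upper=False),
      check_factor(a, upper_factor, upper=True))  # True True
```

Running both checks on the same case catches an output that matches the baseline inside the triangle but leaves non-zero values outside it.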

ArtIntAI · 2025-01-03T03:46:46Z

test/mlu_op_gtest/pb_gtest/src/zoo/cholesky/cholesky.cpp

+void cpu_compute(float* cpu_c, int n_, int ldda_, bool upper_, bool trans_,
+mluOpDataType_t type_) {
+  if (trans_) {
+    for (int64_t i = 0; i < n_; i++) {


Please add the key computation steps to the CPU computation here, to make later maintenance and reading easier.

ArtIntAI · 2025-01-03T03:47:18Z

test/mlu_op_gtest/pb_gtest/src/zoo/cholesky/cholesky.cpp

+  if (parser_->device() != CPU) {
+    if (result_mul) {
+      for (int i = 0; i < batch_size_; i++) {
+        if (type_ == MLUOP_DTYPE_FLOAT) {


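Please add comments explaining the intent of the trans, the fill_zero, and the setting to 1 done here, to make the code easier to understand and maintain.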

dglr force-pushed the cholesky1 branch from 99441cd to 284e6e8 Compare November 13, 2024 20:09

dglr and others added 29 commits November 14, 2024 11:56

complete the float type cholesky operator

fc47e70

[WIP]add mluop cholesky

77db74c

add cholesky doc

0e9a1f8

modify mathematical formula

6051cce

add complex type

86a2c41

finish complex batch

efa3d08

fix ang bugs

4872a42

fix nram workspace, update doc

0f12676

add pseudocode

86ceaba

add comments

beb7e53

add index.rst

40f62ba

format code

d935fb9

[Fix](mluOpCholesky): fix format

e42d270

[Fix](mluOpCholesky): add mluoplog when sqrt

bfcf2b2

[Fix](mluOpCholesky): reset workspace

e4f330b

[Fix](mluOpCholesky): rename getworkspace size function

9eb9dc4

[Fix](mluOpCholesky): rewrite description in mlu_op

76631ee

[Docs](mluOpCholesky): update docs

23acef7

[Fix](mluOpCholesky): del printf

40661cc

[Docs](mluOpCholesky): rewrite Conjugate transpose symbol

4d34d54

[Fix](mluOpCholesky): format

fc1a0ac

[Fix](mluOpCholesky): add layout check

b0d5b6e

[Fix](mluOpCholesky): fix mem check

9f7dcd5

[Docs](mluOpCholesky): add test doc

7888578

[Docs](mluOpCholesky): add coverage test

435f829

[Fix](mluOpCholesky): add dimension equals test

4f9a1af

[Fix](mluOpCholesky): add coverage function

43bbe67

[Fix](mluOpCholesky): test

976a88b

[Fix](mluOpCholesky): test

284e6e8

ArtIntAI reviewed Nov 22, 2024


ArtIntAI reviewed Nov 27, 2024


dglr added 6 commits November 30, 2024 08:49

[Fix](mluOpCholesky): add policy func

2b5822d

[Fix](mluOpCholesky): rename variables

ecc80ec

[Fix](mluOpCholesky): add some comments

0938f55

[Fix](mluOpCholesky): mv cnnl to cpp

3213104

[Fix](mluOpCholesky): add new memcpy

a9c9db6

[Fix](mluOpCholesky): remove useless sync

876ebc2

ArtIntAI reviewed Dec 6, 2024


dglr added 7 commits December 9, 2024 06:08

[Fix](mluOpCholesky): remove cnrtmemcpy cnrtqueuesync

94664e5

[Fix](mluOpCholesky): update cholesky_test

2c96f6f

[Fix](mluOpCholesky): fix bugs

9718b02

[Docs](mluOpCholesky): update doc

a472285

[Fix](mluOpCholesky): fix sync bugs

5d35eca

[Docs](mluOpCholesky): update doc

55ff179

[Fix](mluOpCholesky): add param check

b52a27a

ArtIntAI reviewed Jan 3, 2025
