MAGMA
Published: 2019-06-12


 

Installing MAGMA 1.1.0 with GotoBLAS2 + CUDA (227)

Preparation:

1 Install CUDA

2 Install a CPU BLAS

3 Install LAPACK

Installation:

1 Follow the README to install

2 Add -lgfortran to the LIB line in make.inc

3 An error appears

gcc -O3 -DADD_ -DGPUSHMEM=130 -fPIC -Xlinker -zmuldefs -DGPUSHMEM=130  testing_zhetrd.o  -o testing_zhetrd lin/liblapacktest.a -L../lib \

          -lcuda -lmagma -lmagmablas -lmagma -L/opt/GotoBLAS2 -L/usr/local/cuda/lib64 -L/usr/lib64   /opt/GotoBLAS2/libgoto.a -lgoto -lpthread -lcublas -lcudart -llapack  -lm -lgfortran
../lib/libmagma.a(zlatrd.o): In function `magma_zlatrd':
zlatrd.cpp:(.text+0x3be): undefined reference to `zdotc'
collect2: ld returned 1 exit status
make: *** [testing_zhetrd] 错误 1

Solution, quoted from the references:

The forum post linked above talks about how to fix the issue in zlatrd.cpp and clatrd.cpp by replacing blasf77_*dotc with cblas_*dotc_sub.

Be aware that the function is used twice. The first around line 256, and the second around line 325. Here are the changes to be made in zlatrd.cpp (in the src directory):

 

  cblas_zdotc_sub(i, W(0, iw), ione, A(0, i), ione, &value);  // Line 256
  //blasf77_zdotc(&value, &i, W(0, iw), &ione, A(0, i), &ione);
  ...
  ...
  cblas_zdotc_sub(i_n, W(i+1, i), ione, A(i+1, i), ione, &value);  // Line 326
  //blasf77_zdotc(&value, &i_n, W(i+1, i), &ione, A(i+1, i), &ione);

 
Reason:
This problem comes from zdot not having the same interface in the different BLAS implementations. We didn't realize there would be a problem for GotoBLAS with this change. The way it is now will work for MKL. If you open file zlatrd.cpp, before calling blasf77_zdotc, there is a call to cblas_zdotc_sub that is commented out. This is an alternative to calling blasf77_zdotc, but you would have to add linking to CBLAS (if it is not part of GotoBLAS). The other way is to see what the ZDOT interface is in GotoBLAS and call it the correct way. Meanwhile, probably for the next release, we will make all BLAS functions use CBLAS and require linking to CBLAS to avoid problems like this.
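To make the interface difference concrete, here is a minimal sketch in C of the "_sub" calling convention: the conjugated dot product is written through an output pointer instead of being returned by value, so the result does not depend on how a given Fortran ABI returns complex values. The function name my_zdotc_sub is hypothetical and is only a stand-in for illustration; it is not the CBLAS or GotoBLAS implementation, and it assumes positive strides.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for a CBLAS-style "_sub" routine (not the real
 * library code): computes sum(conj(x[k]) * y[k]) and stores it in *result.
 * Assumption for this sketch: incx and incy are positive. */
void my_zdotc_sub(int n, const double complex *x, int incx,
                  const double complex *y, int incy,
                  double complex *result)
{
    assert(n >= 0 && incx > 0 && incy > 0 && result != NULL);

    double acc_re = 0.0;
    double acc_im = 0.0;
    for (int k = 0; k < n; ++k) {
        double complex xv = x[(size_t)k * (size_t)incx];
        double complex yv = y[(size_t)k * (size_t)incy];
        /* conj(xv) * yv, expanded into real and imaginary parts */
        acc_re += creal(xv) * creal(yv) + cimag(xv) * cimag(yv);
        acc_im += creal(xv) * cimag(yv) - cimag(xv) * creal(yv);
    }
    *result = acc_re + acc_im * I;
}
```

The commented-out blasf77_zdotc call passes &value as a hidden first argument, which is one of several conventions Fortran compilers use for complex-valued functions; the output-pointer form sidesteps that choice.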
4 libgoto.so cannot be found at run time
Fix: use export to add the directory containing libgoto.so to LD_LIBRARY_PATH
 
Reference installation guide:

OPTIONS

Firstly, MAGMA needs a CPU LAPACK and BLAS backend installed on your machine.

There are four options for this.

  1. Intel’s MKL
  2. AMD’s ACML
  3. Netlib’s LAPACK + ATLAS
  4. Netlib’s LAPACK + GotoBLAS2

Each of the four options can be configured by one of the files make.inc.$(LIB). LIB is either mkl, acml, atlas or goto. I wanted to go open source all the way with this. For reasons inexplicable, I chose GOTOBLAS2 over ATLAS.

GOTOBLAS2

That meant I had to build GOTOBLAS2 first. It was mostly painless, except that I had gcc 4.6, which meant the compiler started complaining about -l flags with nothing mentioned to the right. It was quickly evident that a parser was broken in the pipeline. After digging through Perl code (with which I have *no* experience) for a few minutes, I had the fix. The following patch had to be made to f_check inside the root directory of gotoblas.

$link =~ s/\-rpath\s+/\-rpath\@/g;
$link =~ s/\-l\ /\-l/g; # Add this new line around line 237.

MAGMA

Finally, with everything set up, I had to make a change or two to make.inc.goto.

- Change GPU_TARGET = 1 (because I use a Fermi card. Leave as 0 if you have pre-Fermi cards).
- Change -lgoto to -lgoto2
- Copy make.inc.goto to make.inc

Doing a make at this point halts with an error.
The fix is the one described under step 3 above: in zlatrd.cpp, replace the two blasf77_zdotc calls (around lines 256 and 325) with cblas_zdotc_sub.

 
 
Make similar changes in clatrd.cpp, then do a make. Add -j if you are in a hurry. You are good to go!

 

2 Installing MAGMA on the Shenzhen supercomputer

./testing_sgeqrf: error while loading shared libraries: libcublas.so.4: failed to map segment from shared object: Cannot allocate memory

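I don't know whether CUDA was not installed properly (the driver could not be installed because of permission problems) or whether this is a problem with the system.

3 Installing clMAGMA

Requires an OpenCL BLAS (available from AMD)

Requires a CPU BLAS and a CPU LAPACK (MKL was used)

Probably also requires AMD APP

Errors came up during testing, so I gave up.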

 

  
 
3 Testing MAGMA
To use multiple GPUs: setenv MAGMA_NUM_GPUS 4
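As an illustration of how a program could pick up such a setting, the following C sketch reads MAGMA_NUM_GPUS from the environment and falls back to a default when the variable is missing or malformed. Both function names are hypothetical, and the fallback rule is an assumption made for this sketch; MAGMA's own handling of the variable may differ.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not MAGMA code): turn the text of an environment
 * variable into a GPU count, returning `fallback` for a missing, empty,
 * non-numeric, or non-positive value. */
int parse_gpu_count(const char *text, int fallback)
{
    assert(fallback >= 1);
    if (text == NULL || *text == '\0') {
        return fallback;
    }
    char *end = NULL;
    long n = strtol(text, &end, 10);
    if (*end != '\0' || n < 1 || n > INT_MAX) {
        return fallback;
    }
    return (int)n;
}

/* Reads MAGMA_NUM_GPUS; the default of 1 is an assumption of this sketch. */
int gpu_count_from_env(void)
{
    return parse_gpu_count(getenv("MAGMA_NUM_GPUS"), 1);
}
```

Note that setenv is csh/tcsh syntax; in a bash-like shell the equivalent is export MAGMA_NUM_GPUS=4.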
 
#### In the testing directory, the tests use several versions of the same routine, such as magma_sgeqrf2_gpu and magma_sgeqrf_gpu. This is generally because their storage strategies differ.
sgeqrf2_gpu is LAPACK consistent in terms of input and output data layout. The sgeqrf_gpu version stores the triangular matrices used in the factorization. sgeqrf3_gpu stores the triangular matrices but also modifies the storage for the Householder vectors used in the factorization: 0s are put in the upper triangular parts of the panels, 1s on the diagonal, and the upper triangular parts are stored separately.
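For reference, "LAPACK consistent" output means the layout that LAPACK's sgeqrf produces: R sits on and above the diagonal of the column-major array, and the Householder vectors sit below the diagonal (with the scalar factors in a separate tau array). The C sketch below copies R out of such an array on the host. The function name extract_r is hypothetical and is not part of MAGMA or LAPACK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the upper-triangular factor R out of a
 * column-major m-by-n array `a` (leading dimension lda) that holds a
 * LAPACK-layout QR factorization. `r` is k-by-n, column-major, with
 * leading dimension ldr, where k = min(m, n). Entries below the
 * diagonal of `r` are set to zero. */
void extract_r(int m, int n, const float *a, int lda, float *r, int ldr)
{
    int k = (m < n) ? m : n;
    assert(m >= 0 && n >= 0 && lda >= m && ldr >= k);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < k; ++i) {
            /* On or above the diagonal: part of R. Below: Householder data. */
            r[i + (size_t)j * (size_t)ldr] =
                (i <= j) ? a[i + (size_t)j * (size_t)lda] : 0.0f;
        }
    }
}
```

With the sgeqrf_gpu and sgeqrf3_gpu variants this simple copy would not be enough on its own, since those versions keep extra triangular blocks in separate storage as described above.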
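#### Running the tests in the testing directory, we find that the results of both testing_sgeqrf and testing_sgeqrf_gpu include CPU and GPU performance figures. The reason is that the two programs test the CPU interface and the GPU interface of sgeqrf respectively, but that does not mean either one uses only the CPU or only the GPU. In fact both use the CPU and the GPU: for testing_sgeqrf the input and output are stored in CPU memory, while for testing_sgeqrf_gpu they are stored in GPU memory.
##### testing_sgeqrf vs. testing_sgeqrf_gpu: the overall performance ratio testing_sgeqrf/testing_sgeqrf_gpu is about 96%, the CPU performance ratio testing_sgeqrf/testing_sgeqrf_gpu is 1.02, and the GPU performance ratio testing_sgeqrf/testing_sgeqrf_gpu is 0.96 (problem sizes 1000-20000).
*_gpu means the input and output reside in GPU memory; routines without _gpu keep them in CPU memory.
4 QR factorization code (CUDA version)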
 
 

Reposted from: https://www.cnblogs.com/catkins/archive/2012/06/11/5270787.html
