Simplest way to test for the presence of a CUDA-capable GPU from CMake?

weixin_39662228 · Published 2020-12-19 22:27:02

We have some nightly build machines that have the cuda libraries installed, but which do not have a cuda-capable GPU installed. These machines are capable of building cuda-enabled programs, but they are not capable of running these programs.

In our automated nightly build process, our cmake scripts use the cmake command

find_package(CUDA)

to determine whether the cuda software is installed. This sets the cmake variable CUDA_FOUND on platforms that have cuda software installed. This is great and it works perfectly. When CUDA_FOUND is set, it is OK to build cuda-enabled programs, even when the machine has no cuda-capable GPU.

But cuda-using test programs naturally fail on the non-GPU cuda machines, causing our nightly dashboards to look "dirty". So I want cmake to avoid running those tests on such machines. But I still want to build the cuda software on those machines.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get cmake to test for the presence of a cuda-capable gpu?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That's why we use cmake in the first place)

EDIT: There are a couple of good-looking suggestions in the answers for how to write a program to test for the presence of a GPU. What is still missing is the means of getting CMake to compile and run this program at configuration time. I suspect that the TRY_RUN command in CMake will be critical here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be a much more difficult question. Perhaps I should have asked this as two separate questions...

Solution

The answer to this question consists of two parts:

1. A program to detect the presence of a cuda-capable GPU.

2. CMake code to compile, run, and interpret the result of that program at configuration time.

For part 1, the gpu sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main() {
        int deviceCount, device;
        int gpuDeviceCount = 0;
        struct cudaDeviceProp properties;
        cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
        if (cudaResultCode != cudaSuccess)
            deviceCount = 0;
        /* machines with no GPUs can still report one emulation device */
        for (device = 0; device < deviceCount; ++device) {
            /* skip devices whose properties cannot be queried */
            if (cudaGetDeviceProperties(&properties, device) != cudaSuccess)
                continue;
            if (properties.major != 9999) /* 9999 means emulation only */
                ++gpuDeviceCount;
        }
        printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);

        /* don't just return the number of gpus, because other runtime cuda
           errors can also yield non-zero return values */
        if (gpuDeviceCount > 0)
            return 0; /* success */
        else
            return 1; /* failure */
    }

Notice that the return code is zero in the case where a cuda-enabled GPU is found. This is because on one of my has-cuda-but-no-GPU machines, this program generates a runtime error with non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".

You might ask why I don't use cuda emulation mode on non-GPU machines. It is because emulation mode is buggy. I only want to debug my code, and work around bugs in cuda GPU code. I don't have time to debug the emulator.

The second part of the problem is the cmake code to use this test program. After some struggle, I have figured it out. The following block is part of a larger CMakeLists.txt file:

    find_package(CUDA)
    if(CUDA_FOUND)
        try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
            ${CMAKE_BINARY_DIR}
            ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
            CMAKE_FLAGS
                -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
                -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
            COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
            RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
        message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
        # COMPILE_RESULT_VAR is TRUE when compile succeeds
        # RUN_RESULT_VAR is zero when a GPU is found
        if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
            set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
        else()
            set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
        endif()
    endif(CUDA_FOUND)

This sets a CUDA_HAVE_GPU boolean variable in cmake that can subsequently be used to trigger conditional operations.
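As one illustration of such a conditional operation, the sketch below always builds a CUDA test program but registers it with CTest only when CUDA_HAVE_GPU is true. The target name my_cuda_test and the source file my_cuda_test.cu are hypothetical placeholders, not names from the original project, and cuda_add_executable is assumed to be available from the same FindCUDA module that find_package(CUDA) loads.

```cmake
# Sketch only: "my_cuda_test" and "my_cuda_test.cu" are hypothetical names.
enable_testing()

if(CUDA_FOUND)
    # Always build the CUDA test program, even on machines without a GPU,
    # so that compile errors still show up on the nightly dashboard.
    cuda_add_executable(my_cuda_test my_cuda_test.cu)

    if(CUDA_HAVE_GPU)
        # Register the test only where a GPU was detected at configure time.
        add_test(NAME my_cuda_test COMMAND my_cuda_test)
    else()
        message(STATUS "No CUDA-capable GPU detected; my_cuda_test will be built but not run")
    endif()
endif()
```

With this arrangement the GPU-less build machines still compile the CUDA code, but their test results are not polluted by failures that only reflect missing hardware.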

It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS stanza, and what the syntax should be. The try_run documentation is very light, but there is more information in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.

Another tricky but important detail is the third argument to try_run, the "bindir". You should probably always set this to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to naturally reside.
