CUDA-aware MPI multi-GPU test
using MPI
using CUDA
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
# select the GPU based on the node-local rank
comm_l = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, rank)
rank_l = MPI.Comm_rank(comm_l)
gpu_id = CUDA.device!(rank_l)
# neighbour ranks for the ring exchange
size = MPI.Comm_size(comm)
dst = mod(rank+1, size)
src = mod(rank-1, size)
println("rank=$rank rank_loc=$rank_l (gpu_id=$gpu_id), size=$size, dst=$dst, src=$src")
N = 4
send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))
CUDA.synchronize()
rank==0 && println("start sending...")
MPI.Sendrecv!(send_mesg, dst, 0, recv_mesg, src, 0, comm)
println("recv_mesg on proc $rank_l: $recv_mesg")
rank==0 && println("done.")
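As a usage sketch (the script name cuda_mpi_test.jl is hypothetical), the test could be launched on two ranks with MPI.jl's mpiexecjl wrapper:

mpiexecjl -n 2 julia --project cuda_mpi_test.jl

With two ranks, each process should print the buffer received from its neighbour, i.e. rank 0 prints an array filled with 1.0 and rank 1 an array filled with 0.0.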
Hi, it could be that your MPI has no CUDA-aware support. For CUDA-aware MPI to work, you should use a system MPI that was linked against the local CUDA install during compilation. If you are using a system MPI that is supposed to be CUDA-aware, you can check its functionality by calling MPI.has_cuda()
(see here).
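For reference, a minimal check could look like the sketch below. It assumes MPI.jl is pointed at the system MPI via the MPIPreferences mechanism (the use_system_binary call only needs to run once per project, followed by a Julia restart):

# one-time setup: select the CUDA-aware system MPI
using MPIPreferences
MPIPreferences.use_system_binary()

Then, in a fresh session:

# verify that MPI.jl reports CUDA support
using MPI
MPI.Init()
println("CUDA-aware MPI: ", MPI.has_cuda())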
When I run this code, here is the running info:
And my Julia environment is:
Is there anything wrong with my Julia environment? I have two NVIDIA 3070 GPUs in my workstation. Could you help me solve it?