Hi all,
I am looking for suggestions on resolving an error message I get on a Linux server when I run an Open MPI program. Specifically, I entered the following two commands in the terminal:
make mpi
mpirun -np 2 ./filename
Both commands run fine in the terminal on my own MacBook. However, on a Linux server (provided by my university), the first one runs fine, but the second one prints the error messages below, although the program still seems to run (both ranks print their "Hello" lines).
Any suggestions for understanding the messages or fixing the issue would be highly appreciated!
SERVER:rank0.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank0.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank0: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank0: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank0.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank0: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank1.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank1: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank1.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank1: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank1.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank1: PSM3 can't open nic unit: 0 (err=23)
SERVER:rank0.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank0: PSM3 can't open nic unit: 0 (err=23)
--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly
unusual; your job may behave unpredictably (and/or abort) after this.
Local host: SERVER
Location: mtl_ofi_component.c:513
Error: Invalid argument (22)
--------------------------------------------------------------------------
SERVER:rank1.FILENAME: Failed to get bond0 (unit 0) cpu set
SERVER:rank1: PSM3 can't open nic unit: 0 (err=23)
Hello. I am processor 0 out of 2
Hello. I am processor 1 out of 2
[SERVER:1022598] 1 more process has sent help message help-mtl-ofi.txt / OFI call fail
[SERVER:1022598] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Sorry, I don’t know how to block-quote all of this output, so I decided to use code formatting for the error messages.]
Thanks, and I look forward to hearing from you!
Best,
Long