Scientific and Educational IT Forum at KNRTU-KAI

MPI.NET + Windows Server 2012 R2


1

SOLUTION: http://landwatersun.ru/viewtopic.php?id=308#p1195

On a server running MS Windows Server 2012 R2, I installed the packages needed to compile and run MPI.NET applications (http://www.osl.iu.edu/research/mpi.net/files/1.0.0/MPI.NET SDK.msi) and successfully compiled the PingPong test program (c:\Program Files (x86)\MPI.NET\Examples\PingPong.cs) with Visual Studio Professional 2012.

When run locally with the command:

mpiexec -n 2 PingPong.exe

the program works correctly and prints:
Rank 0 is alive and running on gpucluster4.cs.kstu-kai.ru
Pinging process with rank 1... Pong!
Rank 1 is alive and running on gpucluster4.cs.kstu-kai.ru
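Both launch forms used in this thread follow a fixed pattern: `-n <count>` for a purely local run, and, as far as we know the smpd-based mpiexec syntax of MS-MPI, `-hosts <k>` followed by k host names, each optionally followed by a process count, and then the executable. A minimal Python sketch that builds these commands as argument lists, assuming `mpiexec` is on PATH; `local_args` and `hosts_args` are hypothetical helper names:

```python
import shlex
from typing import Dict, List


def local_args(nprocs: int, exe: str) -> List[str]:
    """argv for a single-machine launch: mpiexec -n <nprocs> <exe>."""
    return ["mpiexec", "-n", str(nprocs), exe]


def hosts_args(hosts: Dict[str, int], exe: str) -> List[str]:
    """argv for the 'mpiexec -hosts k host1 m1 ... hostk mk exe' form.

    `hosts` maps each host name to the number of processes to start there
    (dict insertion order is preserved, so hosts keep the order given).
    """
    args = ["mpiexec", "-hosts", str(len(hosts))]
    for host, nprocs in hosts.items():
        args += [host, str(nprocs)]
    args.append(exe)
    return args


if __name__ == "__main__":
    # Printed only; subprocess.run(<list>) could execute these lists directly.
    print(shlex.join(local_args(2, "PingPong.exe")))
    # mpiexec -n 2 PingPong.exe
    print(shlex.join(hosts_args({"gpucluster4.cs.kstu-kai.ru": 2}, "PingPong.exe")))
    # mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru 2 PingPong.exe
```

Passing a list instead of a typed string avoids shell quoting and keeps every flag an ASCII `-`; a typographic dash copied from a web page would most likely not be recognized as an option.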

When run with the command:

mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru 2 PingPong.exe

the processes hang.
The smpd.exe log at that point is as follows:
[-1:3168]...\smpd_get_opt_int
[-1:3168]....\smpd_get_opt
[-1:3168]..../smpd_get_opt
[-1:3168]....\smpd_get_opt_int
[-1:3168].....\smpd_get_opt_int
[-1:3168]......\smpd_get_opt
[-1:3168]....../smpd_get_opt
[-1:3168]...../smpd_parse_command_args
[-1:3168].....\smpd_entry_point
[00:3168]......created a set for the listener: 352
[00:3168]......\smpd_create_context
[00:3168].......\smpd_init_context
[00:3168]......./smpd_init_context
[00:3168]....../smpd_create_context
[00:3168]......smpd listening on port 8677
[00:3168]......\smpd_enter_at_state
[00:3168].......sock_waiting for the next event.
[00:3168].......\smpd_state_smpd_listening
[00:3168]........authenticating new connection
[00:3168]........\smpd_create_context
[00:3168].........\smpd_init_context
[00:3168]........./smpd_init_context
[00:3168]......../smpd_create_context
[00:3168]......./smpd_state_smpd_listening
[00:3168].......sock_waiting for the next event.
[00:3168].......\smpd_state_reading_job_context
[00:3168]........read job context: 'job'
[00:3168]........calling smpd_server_auth_connection_ex
[00:3168]........\smpd_sspi_server_context_init
[00:3168].........calling QuerySecurityPackageInfo
[00:3168].........NTLM package, NTLM Security Package, with: max 2888 byte token, capabilities bitmask 0x882b37
[00:3168].........calling AcquireCredentialsHandle
[00:3168]......../smpd_sspi_server_context_init
[00:3168]......./smpd_state_reading_job_context
[00:3168].......sock_waiting for the next event.
[00:3168].......\smpd_server_read_sspi_buffer
[00:3168]........read sspi header: '53'
[00:3168]......./smpd_server_read_sspi_buffer
[00:3168].......sock_waiting for the next event.
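With long traces like this one, it helps to pull out the last thing smpd actually did before going idle. A minimal sketch, assuming every trace line has the shape `[<id>:<thread>]<dots><text>` seen above, where the dots encode call depth, a leading backslash marks entering a function and `/` marks leaving it; `parse_line` and `last_activity` are hypothetical names:

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed trace shape: "[<id>:<thread>]<dots><text>". The number of dots is the
# call depth; text starting with "\" enters a function, "/" returns from it.
LINE_RE = re.compile(r"^\[(-?\d+):(\d+)\](\.*)(.*)$")
IDLE = "sock_waiting for the next event."


def parse_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (depth, text) for one trace line, or None for anything else."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return len(m.group(3)), m.group(4)


def last_activity(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last non-idle message, last function entry/exit) in the trace."""
    last_msg: Optional[str] = None
    last_call: Optional[str] = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. console-wrapped continuation lines
        _depth, text = parsed
        if text.startswith(("\\", "/")):
            last_call = text
        elif text != IDLE:
            last_msg = text
    return last_msg, last_call
```

Fed the last eight lines of the log above, it returns `("read sspi header: '53'", "/smpd_server_read_sspi_buffer")`: in this excerpt the last step is reading an SSPI (NTLM) authentication header, and no process-launch commands appear before smpd starts waiting.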

What could be causing the processes to hang? Note that on an ordinary low-powered node running Windows XP or Windows 7, everything works correctly.

The question has also been posted at http://social.technet.microsoft.com/For … orum=WS8ru

2

When the command mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru 2 PingPong.exe is issued, the process hangs because it cannot correctly determine the number of processes. After reading several articles online, I learned that the command for launching an application on a cluster has to be written slightly differently:
mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru : -n 2 PingPong.exe

3

Дерюжова Н.В. wrote:

mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru : -n 2 PingPong.exe

An interesting theory. On an ordinary personal node, launching it this way produces an error:

Error: no executable specified
Unable to parse the mpiexec command arguments.

I'll check it on the server tomorrow.
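The error is consistent with how mpiexec treats `:`. In the MPMD syntax, `:` separates independent sections, each naming its own executable, so `-hosts 1 gpucluster4.cs.kstu-kai.ru` before the colon forms a section with no program. The Python sketch below is a hypothetical, heavily simplified parser (not the real mpiexec code) that models only `-n` and `-hosts` and reproduces the same message:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class ParseError(Exception):
    """Raised when a section of the toy command line cannot be parsed."""


@dataclass
class Section:
    n: Optional[int] = None
    hosts: List[Tuple[str, Optional[int]]] = field(default_factory=list)
    exe: Optional[str] = None
    args: List[str] = field(default_factory=list)


def parse_section(tokens: List[str]) -> Section:
    """Parse one ':'-separated section; only -n and -hosts are modelled."""
    sec = Section()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-n":  # -n <count>
            sec.n = int(tokens[i + 1])
            i += 2
        elif tok == "-hosts":  # -hosts <k> host1 [m1] ... hostk [mk]
            k = int(tokens[i + 1])
            i += 2
            for _ in range(k):
                host, procs = tokens[i], None
                i += 1
                if i < len(tokens) and tokens[i].isdigit():
                    procs = int(tokens[i])
                    i += 1
                sec.hosts.append((host, procs))
        else:  # first remaining token is the program, the rest its arguments
            sec.exe, sec.args = tok, tokens[i + 1:]
            break
    if sec.exe is None:
        raise ParseError("no executable specified")
    return sec


def parse_command(argv: List[str]) -> List[Section]:
    """Split argv (after 'mpiexec') on ':' and parse every section."""
    sections: List[List[str]] = [[]]
    for tok in argv:
        if tok == ":":
            sections.append([])
        else:
            sections[-1].append(tok)
    return [parse_section(s) for s in sections]
```

Under this model, `-hosts 1 gpucluster4.cs.kstu-kai.ru 2 PingPong.exe` parses into one section with host `gpucluster4.cs.kstu-kai.ru`, 2 processes and `PingPong.exe`, while the colon variant fails on its first section before the `-n 2 PingPong.exe` part is even looked at.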

4

I forgot to mention that when the command

mpiexec -hosts 1 MACHINE_NAME 2 PingPong.exe

executes correctly, the smpd log should end roughly like this:

[01:3864].........../smpd_create_command
[01:3864]...........\smpd_add_command_int_arg
[01:3864].........../smpd_add_command_int_arg
[01:3864]...........\smpd_add_command_arg
[01:3864].........../smpd_add_command_arg
[01:3864]...........\smpd_add_command_arg
[01:3864].........../smpd_add_command_arg
[01:3864]...........sending reply to barrier command 'C628E4E1-CBE3-4763-8661-469490AA4935'.
[01:3864]...........\smpd_add_command_arg
[01:3864].........../smpd_add_command_arg
[01:3864]...........sending result command to pmi server context: "cmd=result src=1 dest=1 tag=14 cmd_tag=5 cmd_orig=barrier ctx_key=1 result=DBS_SUCCESS "
[01:3864]...........\smpd_post_write_command
[01:3864]............\smpd_package_command
[01:3864]............/smpd_package_command
[01:3864]............smpd_post_write_command on the pmi server context sock 476: 101 bytes for command: "cmd=result src=1 dest=1 tag=14 cmd_tag=5 cmd_orig=barrier ctx_key=1 result=DBS_SUCCESS "
[01:3864].........../smpd_post_write_command
[01:3864]........../smpd_handle_barrier_command
[01:3864]........./smpd_handle_command
[01:3864].........\smpd_post_read_command
[01:3864]..........posting a read for a command header on the pmi server context, sock 476
[01:3864]........./smpd_post_read_command
[01:3864]......../smpd_state_reading_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_writing_cmd
[01:3864].........wrote command
[01:3864].........command written to pmi server: "cmd=result src=1 dest=1 tag=13 cmd_tag=5 cmd_orig=barrier ctx_key=0 result=DBS_SUCCESS "
[01:3864].........\smpd_free_command
[01:3864]........./smpd_free_command
[01:3864]......../smpd_state_writing_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd_header
[01:3864].........read command header
[01:3864].........command header read, posting read for data: 39 bytes
[01:3864]......../smpd_state_reading_cmd_header
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_writing_cmd
[01:3864].........wrote command
[01:3864].........command written to pmi server: "cmd=result src=1 dest=1 tag=14 cmd_tag=5 cmd_orig=barrier ctx_key=1 result=DBS_SUCCESS "
[01:3864].........\smpd_free_command
[01:3864]........./smpd_free_command
[01:3864]......../smpd_state_writing_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd_header
[01:3864].........read command header
[01:3864].........command header read, posting read for data: 39 bytes
[01:3864]......../smpd_state_reading_cmd_header
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd
[01:3864].........read command
[01:3864].........\smpd_parse_command
[01:3864]........./smpd_parse_command
[01:3864].........read command: "cmd=done src=1 dest=1 tag=6 ctx_key=0 "
[01:3864].........\smpd_handle_command
[01:3864]..........handling command:
[01:3864].......... src  = 1
[01:3864].......... dest = 1
[01:3864].......... cmd  = done
[01:3864].......... tag  = 6
[01:3864].......... ctx  = pmi server
[01:3864].......... len  = 39
[01:3864].......... str  = cmd=done src=1 dest=1 tag=6 ctx_key=0
[01:3864]..........\smpd_command_destination
[01:3864]...........1 -> 1 : returning NULL context
[01:3864]........../smpd_command_destination
[01:3864]........./smpd_handle_command
[01:3864].........not posting read for another command because SMPD_CLOSE returned
[01:3864]......../smpd_state_reading_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd
[01:3864].........read command
[01:3864].........\smpd_parse_command
[01:3864]........./smpd_parse_command
[01:3864].........read command: "cmd=done src=1 dest=1 tag=6 ctx_key=1 "
[01:3864].........\smpd_handle_command
[01:3864].........ReadFile failed, error 109
[01:3864].........*** smpd_piothread finishing pid:5892 ***
[01:3864]..........[01:3864]..........*** smpd_piothread finishing pid:5892 ***[01:3864].......... src  = 1
othread finishing pid:5892 ***
[01:3864].......... src  = 1
[01:3864].......... dest = 1
[01:3864].......... cmd  = done
[01:3864].......... tag  = 6
[01:3864].......... ctx  = pmi server
[01:3864].......... len  = 39
[01:3864].......... str  = cmd=done src=1 dest=1 tag=6 ctx_key=1
[01:3864]..........\smpd_command_destination
[01:3864]...........1 -> 1 : returning NULL context
[01:3864]........../smpd_command_destination
[01:3864]........./smpd_handle_command
[01:3864].........not posting read for another command because SMPD_CLOSE returned
[01:3864]......../smpd_state_reading_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the pmi server context.
[01:3864].........process refcount == 2, pmi server closed.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864].........ReadFile failed, error 109
[01:3864].........*** smpd_piothread finishing pid:7116 ***
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_stdouterr
[01:3864]......../smpd_state_reading_stdouterr
[01:3864]........Operation failed - result = -1, closing stderr context.
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_stdouterr
[01:3864]......../smpd_state_reading_stdouterr
[01:3864]........Operation failed - result = -1, closing stdout context.
[01:3864]........ReadFile failed, error 109
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the pmi server context.[01:3864].........socket closed for the pmi server context.
[01:3864].........process refcount == 2, pmi server closed.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_stdouterr
[01:3864]......../smpd_state_reading_stdouterr
[01:3864]........Operation failed - result = -1, closing stdout context.
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stderr context.
[01:3864].........process refcount == 1, stderr closed.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stdout context.
[01:3864].........process refcount == 0, waiting for the process to finish exiting.
[01:3864].........\smpd_wait_process
[01:3864]........./smpd_wait_process
[01:3864].........*** smpd_pinthread finishing pid:5892 ***
[01:3864].........\smpd_create_command
[01:3864]........./smpd_create_command
[01:3864].........\smpd_add_command_int_arg
[01:3864]........./smpd_add_command_int_arg
[01:3864].........\smpd_add_command_int_arg
[01:3864]........./smpd_add_command_int_arg
[01:3864].........\smpd_add_command_arg
[01:3864]........./smpd_add_command_arg
[01:3864].........creating an exit command for rank 1, pid 5892, exit code 0.
[01:3864].........\smpd_post_write_command
[01:3864]..........\smpd_package_command
[01:3864]........../smpd_package_command
[01:3864]..........smpd_post_write_command on the parent context sock 348: 98 bytes for command: "cmd=exit src=1 dest=0 tag=15 rank=1 code=0 kvs=C628E4E1-CBE3-4763-8661-469490AA4935 "
[01:3864]........./smpd_post_write_command
[01:3864].........\smpd_free_process_struct
[01:3864]........./smpd_free_process_struct
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_stdouterr
[01:3864]......../smpd_state_reading_stdouterr
[01:3864]........Operation failed - result = -1, closing stderr context.
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stdout context.
[01:3864].........process refcount == 1, stdout closed.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stdin context.
[01:3864].........Unaffiliated stdin context closing.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_writing_cmd
[01:3864].........wrote command
[01:3864].........command written to parent: "cmd=exit src=1 dest=0 tag=15 rank=1 code=0 kvs=C628E4E1-CBE3-4763-8661-469490AA4935 "
[01:3864].........\smpd_free_command
[01:3864]........./smpd_free_command
[01:3864]......../smpd_state_writing_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stderr context.
[01:3864].........process refcount == 0, waiting for the process to finish exiting.
[01:3864].........\smpd_wait_process
[01:3864]........./smpd_wait_process
[01:3864].........\smpd_create_command
[01:3864].........*** smpd_pinthread finishing pid:7116 ***
[01:3864]........./smpd_create_command
[01:3864].........\smpd_add_command_int_arg
[01:3864]........./smpd_add_command_int_arg
[01:3864].........\smpd_add_command_int_arg
[01:3864]........./smpd_add_command_int_arg
[01:3864].........\smpd_add_command_arg
[01:3864]........./smpd_add_command_arg
[01:3864].........creating an exit command for rank 0, pid 7116, exit code 0.
[01:3864].........\smpd_post_write_command
[01:3864]..........\smpd_package_command
[01:3864]........../smpd_package_command
[01:3864]..........smpd_post_write_command on the parent context sock 348: 98 bytes for command: "cmd=exit src=1 dest=0 tag=16 rank=0 code=0 kvs=C628E4E1-CBE3-4763-8661-469490AA4935 "
[01:3864]........./smpd_post_write_command
[01:3864].........\smpd_free_process_struct
[01:3864]........./smpd_free_process_struct
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the stdin context.
[01:3864].........Unaffiliated stdin context closing.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864]......../smpd_handle_op_close
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_writing_cmd
[01:3864].........wrote command
[01:3864].........command written to parent: "cmd=exit src=1 dest=0 tag=16 rank=0 code=0 kvs=C628E4E1-CBE3-4763-8661-469490AA4935 "
[01:3864].........\smpd_free_command
[01:3864]........./smpd_free_command
[01:3864]......../smpd_state_writing_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd_header
[01:3864].........read command header
[01:3864].........command header read, posting read for data: 30 bytes
[01:3864]......../smpd_state_reading_cmd_header
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_reading_cmd
[01:3864].........read command
[01:3864].........\smpd_parse_command
[01:3864]........./smpd_parse_command
[01:3864].........read command: "cmd=close src=0 dest=1 tag=7 "
[01:3864].........\smpd_handle_command
[01:3864]..........handling command:
[01:3864].......... src  = 0
[01:3864].......... dest = 1
[01:3864].......... cmd  = close
[01:3864].......... tag  = 7
[01:3864].......... ctx  = parent
[01:3864].......... len  = 30
[01:3864].......... str  = cmd=close src=0 dest=1 tag=7
[01:3864]..........\smpd_command_destination
[01:3864]...........1 -> 1 : returning NULL context
[01:3864]........../smpd_command_destination
[01:3864]..........\smpd_handle_close_command
[01:3864]...........\smpd_create_command
[01:3864].........../smpd_create_command
[01:3864]...........sending closed command to parent: "cmd=closed src=1 dest=0 tag=17 "
[01:3864]...........\smpd_post_write_command
[01:3864]............\smpd_package_command
[01:3864]............/smpd_package_command
[01:3864]............smpd_post_write_command on the parent context sock 348: 45 bytes for command: "cmd=closed src=1 dest=0 tag=17 "
[01:3864].........../smpd_post_write_command
[01:3864]...........posted closed command.
[01:3864]........../smpd_handle_close_command
[01:3864]........./smpd_handle_command
[01:3864].........not posting read for another command because SMPD_CLOSE returned
[01:3864]......../smpd_state_reading_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_state_writing_cmd
[01:3864].........wrote command
[01:3864].........command written to parent: "cmd=closed src=1 dest=0 tag=17 "
[01:3864].........closed command written, posting close of the sock.
[01:3864].........MPIDU_Sock_post_close(348)
[01:3864].........\smpd_free_command
[01:3864]........./smpd_free_command
[01:3864]......../smpd_state_writing_cmd
[01:3864]........sock_waiting for the next event.
[01:3864]........\smpd_handle_op_close
[01:3864].........socket closed for the parent context.
[01:3864].........Unaffiliated parent context closing.
[01:3864].........\smpd_free_context
[01:3864]........./smpd_free_context
[01:3864].........all contexts closed, exiting state machine.
[01:3864]......../smpd_handle_op_close
[01:3864]......./smpd_enter_at_state
[01:3864]....../smpd_parse_command_args
[01:3864]......\smpd_exit
[01:3864].......\smpd_kill_all_processes
[01:3864]......./smpd_kill_all_processes
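A healthy tail like this can be checked mechanically. A minimal sketch, assuming the smpd output was captured to a text file and that the message wording matches the excerpt above: it collects the exit code smpd reports for each rank ("creating an exit command for rank ...") and looks for the "all contexts closed, exiting state machine." line. `RunSummary` and `summarize` are hypothetical names:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Message wording copied from the healthy smpd trace (assumed to be stable).
EXIT_RE = re.compile(
    r"creating an exit command for rank (\d+), pid (\d+), exit code (-?\d+)\."
)
DONE_MARK = "all contexts closed, exiting state machine."


@dataclass
class RunSummary:
    exit_codes: Dict[int, int] = field(default_factory=dict)  # rank -> exit code
    finished: bool = False  # smpd reported leaving its state machine

    def healthy(self, nprocs: int) -> bool:
        """True if ranks 0..nprocs-1 all exited with 0 and smpd shut down."""
        return (
            self.finished
            and sorted(self.exit_codes) == list(range(nprocs))
            and all(code == 0 for code in self.exit_codes.values())
        )


def summarize(lines: Iterable[str]) -> RunSummary:
    """Scan smpd trace lines for per-rank exit codes and the shutdown marker."""
    summary = RunSummary()
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            summary.exit_codes[int(m.group(1))] = int(m.group(3))
        if DONE_MARK in line:
            summary.finished = True
    return summary
```

For example, `summarize(open("smpd.log", encoding="utf-8", errors="replace")).healthy(2)` (with `smpd.log` a hypothetical capture file) should be true for the log above; the hanging trace in the first post contains neither kind of line.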

5

C:\Users\Администратор\Documents\Visual Studio 2012\Projects\ConsoleApplication1\ConsoleApplication1\bin\Debug>mpiexec -hosts 1 gpucluster4.cs.kstu-kai.ru : -n 2 PingPong.exe
Error: no executable specified.

6

Issue resolved.

I removed the Microsoft Compute Cluster Pack SDK (sdk_x64) package; it turned out that it does NOT work properly on the server. I kept the package that had been installed earlier, HPC Pack 2012 R2 MS-MPI Redistributable Package (http://www.microsoft.com/en-us/download … x?id=41634). This package also includes the mpiexec and smpd modules, and they work correctly on the server.
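Here the hang went away once only one package's `mpiexec`/`smpd` remained installed. Which copy a command prompt runs depends on the PATH search order, and Python's `shutil.which()` reports only the first match. A minimal diagnostic sketch, not something the thread's author ran, that lists every PATH directory containing a given file, in search order; `find_all_on_path` is a hypothetical helper, and it inspects only PATH, not the current directory or the location a running smpd service was started from:

```python
import os
from typing import List, Optional


def find_all_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Return every PATH entry that contains `name`, in search order.

    Unlike shutil.which(), which stops at the first hit, this lists all of
    them, so two MPI packages each providing mpiexec/smpd become visible.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits: List[str] = []
    seen = set()
    for directory in path.split(os.pathsep):
        key = os.path.normcase(directory)  # case-insensitive on Windows
        if not directory or key in seen:
            continue
        seen.add(key)
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("mpiexec.exe", "smpd.exe"):
        print(tool, find_all_on_path(tool))
```

More than one hit for either tool would suggest checking which package each copy belongs to before launching across hosts.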

Conclusion: be careful when installing software on a server, because software intended for ordinary personal computers does not always work correctly on server operating systems.