Linux cluster distributed computing: no speedup
Posted May 8, 2012, 11:50 p.m. EDT Cluster & Cloud Computing 5 Replies
Cluster configuration: 1 head node + 8 compute nodes, 32 cores and 64 GB RAM per node.
COMSOL: COMSOL 4.0 with a floating network license installed on the head node, Intel MPI.
mph file: COMSOL40\models\ACDC_Module\Verification_Models\parallel_wires.mph, with the air domain enlarged to get more DOF; stationary study, direct solver MUMPS.
Test process:
Step 1. Solve the problem on 1 node using 16 cores.
Step 2. Solve it on 4 nodes using 16 cores on each node.
Test result: the solution time is almost the same (about 100 seconds), so no speedup is obtained. It seems that all the nodes do the same work.
Please check the following logs and help me out. Thanks a lot.
PBS log for Step 1
------------------------------------------------------------------------------
--- Starting job at: Wed May 9 10:58:22 CST 2012
--- Current working directory is: /home/lubo/comsol
--- Running on 16 processes (cores) on the following nodes:
16 node5
--- mpd BOOT
/home/lubo/comsol/COMSOL40/bin//comsol -nn 1 mpd boot -f /var/torque/aux//1723.mgmt -mpirsh ssh --verbose
running mpdallexit on node5
LAUNCHED mpd on node5 via
RUNNING: mpd on node5
--- mpd TRACE
--- Parallel COMSOL RUN
/home/lubo/comsol/COMSOL40/bin//comsol -nn 1 -np 16 batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log
--- mpd ALLEXIT
--- Job finished at: Wed May 9 10:59:57 CST 2012
------------------------------------------------------------------------------
PBS log for step 2
------------------------------------------------------------------------------
--- Starting job at: Wed May 9 11:04:36 CST 2012
--- Current working directory is: /home/lubo/comsol
--- Running on 64 processes (cores) on the following nodes:
16 node5
16 node3
16 node2
16 node1
--- mpd BOOT
/home/lubo/comsol/COMSOL40/bin//comsol -nn 4 mpd boot -f /var/torque/aux//1724.mgmt -mpirsh ssh --verbose
running mpdallexit on node5
LAUNCHED mpd on node5 via
RUNNING: mpd on node5
LAUNCHED mpd on node3 via node5
LAUNCHED mpd on node2 via node5
LAUNCHED mpd on node1 via node5
RUNNING: mpd on node3
RUNNING: mpd on node2
RUNNING: mpd on node1
--- mpd TRACE
--- Parallel COMSOL RUN
/home/lubo/comsol/COMSOL40/bin//comsol -nn 4 -np 16 batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log
--- mpd ALLEXIT
--- Job finished at: Wed May 9 11:06:31 CST 2012
------------------------------------------------------------------------------
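As a cross-check on the launch commands above, here is a minimal sketch of how the `-nn` and `-np` values could be derived from the scheduler's node list instead of being typed by hand. It assumes the file passed with `-f` has the usual Torque layout of one hostname per allocated core; the helper name `comsol_cluster_args` is hypothetical and not part of COMSOL or PBS.

```python
from collections import Counter


def comsol_cluster_args(nodefile_lines):
    """Return (number_of_nodes, cores_per_node) from a Torque-style node list.

    Assumption: each allocated core appears as one line holding its hostname.
    """
    counts = Counter(line.strip() for line in nodefile_lines if line.strip())
    if not counts:
        raise ValueError("node list is empty")
    per_node = set(counts.values())
    if len(per_node) != 1:
        # A single -np value cannot describe an uneven allocation.
        raise ValueError(f"uneven core counts per node: {dict(counts)}")
    return len(counts), per_node.pop()


# Node list equivalent to the step 2 allocation shown in the PBS log.
lines = ["node5"] * 16 + ["node3"] * 16 + ["node2"] * 16 + ["node1"] * 16
nn, np_ = comsol_cluster_args(lines)
print(f"-nn {nn} -np {np_}")
```

For the step 2 allocation this gives the same `-nn 4 -np 16` that the job script used, so the launch arguments are at least consistent with what PBS granted.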
license logs for step 2
.....
11:01:05 (lmgrd) LMCOMSOL using TCP-port 46447
11:03:00 (LMCOMSOL) TCP_NODELAY NOT enabled
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node5
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node2
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node3
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node1
11:03:08 (LMCOMSOL) OUT: "ACDC" lubo@node5
11:03:08 (LMCOMSOL) OUT: "COMSOL" lubo@node5
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node5
11:04:49 (LMCOMSOL) IN: "ACDC" lubo@node5
11:04:49 (LMCOMSOL) IN: "COMSOL" lubo@node5
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node1
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node3
11:04:50 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node2
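To confirm from the license log that every compute node actually checked out a cluster license, the lines can be scanned for `OUT:` entries. The following is a rough sketch that assumes the line layout shown above (time, daemon name in parentheses, `OUT:` or `IN:`, quoted feature, `user@host`); the function name `checked_out_hosts` is made up for illustration.

```python
import re

# Matches lines such as: 11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node5
LINE = re.compile(
    r'^(\d\d:\d\d:\d\d) \((\w+)\) (OUT|IN): "([^"]+)" (\S+)@(\S+)$'
)


def checked_out_hosts(log_lines, feature):
    """Return the set of hosts that checked out the given license feature."""
    hosts = set()
    for line in log_lines:
        m = LINE.match(line.strip())
        if m and m.group(3) == "OUT" and m.group(4) == feature:
            hosts.add(m.group(6))
    return hosts


sample = [
    '11:03:00 (LMCOMSOL) TCP_NODELAY NOT enabled',
    '11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node5',
    '11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node2',
    '11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node3',
    '11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node1',
    '11:03:08 (LMCOMSOL) OUT: "ACDC" lubo@node5',
    '11:03:08 (LMCOMSOL) OUT: "COMSOL" lubo@node5',
]
hosts = checked_out_hosts(sample, "CLUSTERNODE")
print(sorted(hosts))
```

In the log above, all four nodes of the step 2 job appear with a `CLUSTERNODE` checkout, while `ACDC` and `COMSOL` are checked out only on node5.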
comsol batch output for step 1
*******************************************
********COMSOL progress output file********
*******************************************
Wed May 09 10:58:25 CST 2012
---------- Current Progress: 100 %
Memory: 348/348 1048/1048
Current Progress: 0 %
Memory: 370/370 1065/1065
---------- Current Progress: 100 %
Memory: 387/387 1082/1082
Current Progress: 0 %
Memory: 527/527 1359/1359
Linear solver
Number of degrees of freedom solved for: 1484195
- Current Progress: 10 %
Memory: 526/527 1379/1379
- Current Progress: 13 %
Memory: 932/932 1885/1885
- Current Progress: 15 %
Memory: 1125/1125 2057/2057
-- Current Progress: 27 %
Memory: 1068/1125 2003/2057
--- Current Progress: 30 %
Memory: 876/1125 1812/2057
Symmetric matrices found.
--- Current Progress: 31 %
Memory: 1230/1230 2958/2958
--- Current Progress: 32 %
Memory: 1254/1254 2958/2958
--- Current Progress: 33 %
Memory: 1279/1279 2958/2958
--- Current Progress: 34 %
Memory: 1298/1298 2958/2958
--- Current Progress: 36 %
Memory: 1318/1318 2958/2958
--- Current Progress: 37 %
Memory: 1359/1359 2958/2958
--- Current Progress: 38 %
Memory: 1383/1383 2958/2958
--- Current Progress: 39 %
Memory: 1399/1399 2958/2958
---- Current Progress: 40 %
Memory: 1421/1421 2958/2958
---- Current Progress: 41 %
Memory: 1448/1448 2958/2958
---- Current Progress: 42 %
Memory: 1460/1460 2958/2958
---- Current Progress: 43 %
Memory: 1481/1481 2958/2958
---- Current Progress: 46 %
Memory: 1497/1497 2958/2958
---- Current Progress: 48 %
Memory: 1514/1514 2958/2958
----- Current Progress: 50 %
Memory: 1533/1533 2958/2958
----- Current Progress: 52 %
Memory: 1549/1549 2958/2958
----- Current Progress: 54 %
Memory: 1562/1562 2958/2958
----- Current Progress: 56 %
Memory: 1580/1580 2958/2958
----- Current Progress: 59 %
Memory: 1597/1597 2958/2958
------ Current Progress: 60 %
Memory: 1622/1622 2958/2958
------ Current Progress: 62 %
Memory: 1628/1628 2958/2958
------ Current Progress: 64 %
Memory: 1647/1647 2958/2958
------ Current Progress: 67 %
Memory: 1662/1662 2958/2958
------ Current Progress: 68 %
Memory: 1682/1682 2958/2958
------- Current Progress: 70 %
Memory: 1693/1693 2958/2958
------- Current Progress: 73 %
Memory: 1710/1710 2958/2958
------- Current Progress: 75 %
Memory: 1726/1726 2958/2958
------- Current Progress: 77 %
Memory: 1753/1753 2958/2958
------- Current Progress: 78 %
Memory: 1762/1762 2958/2958
-------- Current Progress: 81 %
Memory: 1777/1777 2958/2958
-------- Current Progress: 83 %
Memory: 1796/1796 2958/2958
-------- Current Progress: 85 %
Memory: 1811/1811 2958/2958
-------- Current Progress: 87 %
Memory: 1828/1828 2958/2958
--------- Current Progress: 90 %
Memory: 1844/1844 2958/2958
Iter Damping Stepsize #Res #Jac #Sol
---------- Current Progress: 100 %
Memory: 2161/2161 3284/3284
1 1.0000000 0.53 1 1 1
Total time: 91.721 s.
comsol batch output for step 2
*******************************************
********COMSOL progress output file********
*******************************************
Wed May 09 11:04:41 CST 2012
---------- Current Progress: 100 %
Memory: 361/361 1061/1061
Current Progress: 0 %
Memory: 383/383 1082/1082
---------- Current Progress: 100 %
Memory: 395/395 1091/1091
Current Progress: 0 %
Memory: 543/543 1375/1375
Linear solver
Number of degrees of freedom solved for: 1484195
- Current Progress: 10 %
Memory: 657/657 1592/1592
- Current Progress: 13 %
Memory: 1269/1269 2365/2365
-- Current Progress: 20 %
Memory: 1122/1269 2057/2365
-- Current Progress: 27 %
Memory: 1079/1269 2016/2365
--- Current Progress: 30 %
Memory: 904/1269 1842/2365
Symmetric matrices found.
--- Current Progress: 31 %
Memory: 1141/1269 2391/2391
--- Current Progress: 33 %
Memory: 1158/1269 2391/2391
--- Current Progress: 35 %
Memory: 1178/1269 2391/2391
--- Current Progress: 37 %
Memory: 1195/1269 2391/2391
--- Current Progress: 39 %
Memory: 1216/1269 2391/2391
---- Current Progress: 41 %
Memory: 1227/1269 2391/2391
---- Current Progress: 43 %
Memory: 1244/1269 2391/2391
---- Current Progress: 45 %
Memory: 1261/1269 2391/2391
---- Current Progress: 47 %
Memory: 1283/1283 2391/2391
---- Current Progress: 49 %
Memory: 1305/1305 2391/2391
----- Current Progress: 52 %
Memory: 1321/1321 2391/2391
----- Current Progress: 54 %
Memory: 1338/1338 2391/2391
----- Current Progress: 55 %
Memory: 1363/1363 2391/2391
Iter Damping Stepsize #Res #Jac #Sol
--------- Current Progress: 94 %
Memory: 1318/1363 2545/2545
1 1.0000000 0.53 1 1 1
---------- Current Progress: 100 %
Memory: 1853/1853 3262/3262
Node 1:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Node 2:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Node 3:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Total time: 108.638 s.
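To put a number on "no speedup", the two total times reported in the batch logs can be turned into a speedup factor and a parallel efficiency. This is only a small arithmetic sketch; the function names are illustrative, and the efficiency is taken relative to the 1-node, 16-core run as the baseline.

```python
def speedup(t_base, t_new):
    """Baseline wall time divided by new wall time (greater than 1 is faster)."""
    return t_base / t_new


def efficiency(t_base, n_base, t_new, n_new):
    """Speedup divided by the factor by which the node count grew."""
    return speedup(t_base, t_new) / (n_new / n_base)


t1 = 91.721   # seconds, step 1: 1 node x 16 cores
t4 = 108.638  # seconds, step 2: 4 nodes x 16 cores

s = speedup(t1, t4)
e = efficiency(t1, 1, t4, 4)
print(f"speedup = {s:.3f}, efficiency = {e:.1%}")
```

With these times the speedup is about 0.84, meaning the 4-node run is actually slower than the 1-node run, and the efficiency relative to the single node is roughly 21%.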
5 Replies Last Post Feb 4, 2015, 10:29 a.m. EST