===== Performance and scalability =====
  
The performance and scalability of simulations in Morpheus heavily depend on the type of (multi-scale) model that is being simulated. It is therefore difficult to make general statements on computational efficiency. However, we can test the performance on a set of "benchmark" models that form the modules from which more complex models can be constructed.
  
We have tested the performance of ODE lattices, reaction-diffusion (PDE) models, cellular Potts models (CPM) and a multiscale model (CPM+PDE), using the available [[examples:examples|Example models]]. The [[documentation:performance#Results|results]] show the execution time and memory consumption for these models as well as their scalability in terms of problem size and the efficiency of multi-threading.
  
==== Performance measurements ====
  
To quantify performance, we measured the following aspects for each simulation:
  * **Memory usage** in terms of the physical memory (RAM) used by the simulation, using the resident set size (RSS) from the ''/proc/self/stat'' pseudo-file.
  
==== Scalability with problem size ====
  
We investigated the scalability with respect to problem size to see how the performance in terms of execution time and memory usage (RAM) scales with increasing population size or lattice size.
  
We calculated and plotted both the execution time and memory usage in:
  * **Absolute** terms: time in seconds (sec) and memory in megabytes (MB).
  * **Relative** terms: time and memory per cell or lattice site in milliseconds (msec) and kilobytes (kB).
  
Performance in absolute terms indicates which problem sizes are practically manageable within given time and memory constraints.
Performance in relative terms shows how the simulation scales with problem size. Ideally, the time and memory per cell or lattice site stay constant or decrease with increasing problem size.
    
==== Scalability of parallel processing ====
  
We also measured the scalability with respect to the number of OpenMP threads to see how performance scales with the number of concurrent threads.
===== Methods =====
  
==== Benchmark models ====
  
The models used in the performance tests are available as [[examples:examples|Example models]]:
  
The models are run without analysis and visualization tools, and execution time is measured from ''StartTime'' to ''StopTime''. The time for initialization is excluded since it becomes negligible for large jobs.
==== Hardware ====

All simulations were performed on an [[http://ark.intel.com/products/41316|Intel Core i7-860 vPro]].

++++ Hardware specification |
| # of Cores | 4 |
| # of Threads | 8 (hyperthreading) |
===== Results =====
  
==== Benchmark tests ====
^ ODE | {{:documentation:performance:ode_25.png?link&125| }} | {{:documentation:performance:ode_100.png?link&125| }} | {{:documentation:performance:ode_400.png?link&125| }} | {{:documentation:performance:ode_2500.png?link&125| }} | {{:documentation:performance:ode_10000.png?link&125| }} | {{:documentation:performance:ode_40000.png?link&125| }} |
| Cells | 25 | 100 | 400 | 2500 | 10000 | 40000 |
| Cells | 8 | 50 | 200 | 800 | 5000 |
  
-----

==== Performance statistics ====
  
^ ^ Problem size \\ (absolute) ^ Problem size \\ (relative) ^ ^ Multi-threading ^
^ CPM \\ {{:documentation:performance:cpm_2000.png?link&100| }}| {{:documentation:performance:performance_cpm_problemsize_absolute.png?direct&300|}} ((Execution time scales almost linearly with the number of ''CPM'' cells. Small memory footprint, despite ''edgelist'' tracking.)) | {{:documentation:performance:performance_cpm_problemsize_relative.png?direct&300|}} ((Execution time per ''CPM'' cell is almost constant, although performance decreases for larger systems. The decrease in memory usage per cell is here mostly due to the use of a large lattice in all cases.)) | | {{:documentation:performance:performance_cpm_multithreading.png?direct&300|}} ((Parallel processing is not available for ''CPM'' simulations. Therefore, multi-threading does not result in a speed-up. Instead, the multi-threading overhead even slightly decreases performance.)) |
^ CPM + PDE \\ {{:documentation:performance:cpmpde_400.png?link&125| }} | {{:documentation:performance:performance_cpmpde_problemsize_absolute.png?direct&300|}} | {{:documentation:performance:performance_cpmpde_problemsize_relative.png?direct&300|}} | | {{:documentation:performance:performance_cpmpde_multithreading.png?direct&300|}} |
documentation/performance · Last modified: 20:25 18.02.2014 by Walter
