HydroGeoSphere User Conference 2011 Leibniz Universitat Hannover, April 11-13, 2011 How to Speed Up HGS without Parallel Computing A Parallel Computational Framework to Simulate Flow and Transport in Integrated Hydrologic Systems GUI: Visual HydroGeoSphere How to Speed Up HGS without Parallel Computing 2 3 Control Volume Finite Element Method For Solving Groundwater and Surface Water Flow ∆Vw ∆hi ∆Si Q ji = ≅ Ss + n ∑ ∆t ∆t ∆t j = j1 j= j4 = Q ji C ji (h j − hi ) j4 Q j 4i Aij = Cij (hi , h j ) Q j 3i Q j1i i j1 Q j 2i j2 j= j4 j3 Ss − ∑ Cij − Aii = ∆t j = j1 A(h) ⋅ h = b(h) 4 Newton-Raphson Method For Solving Nonlinear Equations Root finding: find hs that satisfies the following: A(h s ) ⋅ h s = b(h s ) or R (h s= ) b(h s ) − A(h s ) ⋅ h= 0 s R ∂R R(h + ∆= h) R ( h) + = ∆h 0 ∂h −1 ∂R ∆h =− R ( h) ∂h R (h0 ) R (h1 ) hk +1= hk + ∆hk +1 if ∆hk +1 < ε abs , then hk +1 hs R (h2 ) R (h3 ) or h3∆h3 h2 hs ∆h2 h1 ∆h1 h0 h if R(h ) < ε , then h h k +1 rel k +1 s 5 Newton-Raphson Method For Solving Nonlinear Equations ∂R R(h + ε J ) − R (h) = J = εJ ∂h if ∆hk +1 < ε abs , then hk +1 hs or if R(hk +1 ) < ε rel , then hk +1 hs ∆hallowed ∆= t new ∆t old max(∆h) ∆t new α ≤ old ≤ β ∆t 6 Newton-Raphson Method For Solving Nonlinear Equations −1 ∂R −1 ∆h =− = − R ( h ) J R ( h) ∂h hk +1= hk + ∆hk +1 hk +1 = hk + κ∆hk +1 where κ ≤ 1 κ= 1 if (∆hk ∆hk +1 ) > 0 R R (h0 ) R (h2 ) ∆h3 < 0 h1 ∆h1 < 0 R (h3 ) h3 R (h1 ) ∆h2 > 0 h 2 hs h0 7 Newton-Raphson Method For Solving Nonlinear Equations −1 ∂R −1 ∆h =− = − R ( h ) J R ( h) ∂h hk +1= hk + ∆hk +1 hk +1 = hk + κ∆hk +1 where κ ≤ 1 κ= 1 if (∆hk ∆hk +1 ) > 0 κ= (∆hk +1 ) allowed or 1 ∆hk +1 Control Volume Finite Element Method For Solving Groundwater and Surface Water Flow ∆Vw ∆hi ∆Si Q ji = ≅ Ss + n ∑ ∆t ∆t ∆t j = j1 j= j4 = Q ji C ji (h j − hi ) j4 Q j 4i Aij = Cij (hi , h j ) Q j 3i Q j1i i j1 Q j 2i j2 j= j4 j3 Ss − ∑ Cij − Aii = ∆t j = j1 A(h) ⋅ h = b(h) 9 Control Volume Finite Element Method For Solving Groundwater and Surface Water Flow = Q C (h − h ) ji ji j i if C ji 1, then hi ≈ h j d o5/ 3 C ji is derived from Qo = − 1/ 2 ∇(d o + zo ) nΦ 3 d o min[d o2 / 3 , d o2,/allowed ] d o5/ 3 1/ 2 ≈ 2 nΦ n max[Φ1/ 2 , Φ1/allowed ] 10 Root finding: find hs that satisfies the following: R (h s= ) b(h s ) − A(h s ) ⋅ h= 0 s if ∆hk +1 < ε abs , then hk +1 hs Guarantees solutions to be close enough Not guarantee achievement of objective Can be problematic for highly sensitive cases if R(hk +1 ) < ε rel , then hk +1 hs Not guarantee close enough solutions Guarantees achievement of objective Can be problematic for insensitive cases 11 no nodal flow check remove negative coefficient finite difference mode etc Aij = Cij (hi , h j ) j= j4 Ss − ∑ Cij − Aii = ∆t j = j1 A(h) ⋅ h = b(h) 12 Interactions at the Interface Between Surface and Subsurface Flow Regimes Rill Storage and Storage Exclusion Manning’s Equation for Surface Flow ∂ho vox = − Kox ∂x ∂ho voy = − Koy ∂y d o2 / 3 1 Kox = n x [ ∂ho ∂s ]1/ 2 d o2 / 3 1 Koy = n y [ ∂ho ∂s ]1/ 2 0.3 Kox ⋅ n x [∂ho ∂s ] 1/ 2 0.2 d o2 / 3 0.1 0 0 0.02 0.04 0.06 do 0.08 0.1 Modified Manning’s Equation: Rill Storage and Storage Exclusion vox = −k H Kox ∂ho ∂x 1 ∂ho voy = −k H Koy ∂y kH 0 k H S 2(1− S ) = 1 S 2(1− S ) do < H d H d < do < H s do > H s 0 S= (d o − H d ) ( H s − H d ) Hd Ho Hs do Relative Permeability for Fluid Exchange qexch = −kV ,up K exch ( ho − h pm ) / lexch 1 K exch = K zz , pm kV ,up k r ,o k r , pm = k r ,o if h pm > ho if ho > h pm Sexch 2(1− Sexch ) = 1 Sexch = d o / H s Sexch 2(1− Sexch ) k r ,o do < H s do > H s 0 Hd Ho Hs do Volume Depth and Surface Porosity Hs do 2 H − H d − − H d H d − s o o o 1 − s s asin s Hs H s 2 2 dV ( d o ) = π ( d H ) Hs − + s o 4 dV ( d o ) ∈ C 1 φo = dV / d o do ≤ H s do > H s 3 1 0.8 π 4 2 φo 0.6 ( dV / d o ) dV / H s 0.4 1 0.2 0 0 0 1 2 do / H s 3 0 1 2 3 do / H s 4 5 HydroGeoSphere: A three-dimensional numerical model describing fully-integrated subsurface and surface flow and transport Attempt to account for all interactions between surface and subsurface flow regimes Conceuptually Superior to linked simulators or iteratively coupled simulators Complex (more processes, highly nonlinear) Overview of HydroGeoSphere Features 2D overland/stream flow (Diffusion-wave equation), including stream/surface drainage network genesis. 3D variably-saturated flow (Richards’ equation + ET) in porous medium. 3D variably-saturated flow in macropores, fractures and karst conduits (dual-porosity, dual-permeability) Advective-dispersive, reactive solute/thermal transport in all continua. Allows for complex topography, irregular surface & subsurface pr operties, density-dependent flow, pumping wells, tiles, etc. Fully-coupled, simultaneous solution of surface/subsurface flow and transport via Control-Volume Finite Element Method. Overview of HydroGeoSphere Computing flow for serial HGS Domain p artitioning 23 Computing time for serial HGS Distribution of computing time Matrix assembly 20.7 % ILUC 1.5 % Matrix solver 74.5 % • The targets of parallelization are matrix assembly and iterativ e 24 matrix solver Parallel schemes applied to HGS: Global matrix assembly Coarse grain parallelism Assembling Matrix is highly independent, so coarse grai n parallelism is applied, leading to high parallel efficiency. Thread 1 Thread 2 Thread 3 Thread 4 25 Parallel schemes applied to HGS: Matrix solver Algorithm of BiCGSTAB 26 Parallel schemes applied to HGS: Matrix solver Floating-point operation (Chin and Forsyth, 1993) (Chin and Forsyth, MV DP199 SV 8) LU Floating-point operation 2nzp 2nz 4n 6n 3D, level 0-ILU 12n 12n 4n 6n > 75 % Computing time using Scalasca Computing tim e per iteration ( %) LU MV DP SV 55.8 37.4 2.4 0.02 > 93 % LU and MV are hot spots for BiCGSTAB • MV is independent computation, so is easy to parallelize • LU needs information on neighboring nodes, so a new parall el method is necessary 27 Parallelization of LU solve: Domain partitioning and node reordering - Domain partitioning scheme is applied to make independent sub-matrix - Node reordering is performed to narrow the matrix bandwidth. Schematic of a) domain partitioning method and b) Numbering scheme 28 Parallelization of LU solve: Data structure for domain decomposition The Sparse Matrix Storage Scheme (Yale format) ia : the array of row pointers into ja. ja : the array of column indices a : the array of all the matrix nonzero scalar elements Example code of Matrix-Vector multiplication DO 20 jind = ia(ibf) + 1,ia(ibf+1) - 1 x(ibf) = x(ibf) - a( jind)*x( ja( ji nd)) 20 CONTINUE DO 20 jind = iaf(ibf) + 1,iaf(ibf+1) - 1 x(ibf) = x(ibf) - af( jind)*x( jaf( j ind)) 20 CONTINUE 29 (After Clift et al., 1996) Parallelization of LU solve Results of domain partitioning and node reordering Original matri x reordered matri x (4 Threads) 30 Parallelization of LU solve: Matrix and array chopping for privatization • Original parallel scheme for LU solve 31 Parallelization of LU solve: Matrix and array chopping for privatization Matrix chopping (After Ruud van der Pas, 20 09) 32 Parallelization of LU solve: Matrix and array chopping for privatization Array chopping 33 Scalability tests of Parallel HydroGeoSphere Saturated flow no Nx ny nz Simulation type Variably-saturated flow Heterogeneity (m/day) Mean ln(K) Var ln(K) Test machine 1 100 33 10 S1) , VS2) -4.6 10.6 GPC3) 2 330 330 10 S, VS -4.6 10.6 GPC 3 330 330 10 VS -4.6 10.6 TCS4) S1) : Saturated flow; VS2) : Variably-saturated flow GPC3) : General Purpose Cluster (GPC) at Scinet/University of Toronto TCS4) : Tightly Coupled System (TCS) at Scinet/University of Toronto Parallel HydroGeoSphere (PHGS) is compiled by Intel® FORTRAN compiler 10.1 for GPC (-O0 and –O3) and IBM xlf compiler for TCS (-O5) 34 Scalability tests of Parallel HydroGeoSphere Scalability of conceptual models (simulation #1, GPC, 105 node s) Privatization No Privatization 6.3 Saturated flow (no opt.) 4.5 Saturated flow (opt.) 35 Scalability tests of Parallel HydroGeoSphere Scalability of conceptual models (simulation #1, GPC, 105 node s) Privatization No Privatization 6.9 Variablysaturated flow (no opt.) 5.1 Variablysaturated flow (opt.) 36 Scalability tests of Parallel HydroGeoSphere Scalability of conceptual models (simulation #2, GPC, 106 nodes) Privatization No Privatization 6.8 Saturated flow (no opt.) 5.1 Saturated flow (opt.) 37 Scalability tests of Parallel HydroGeoSphere Scalability of conceptual models (simulation #2, GPC, 106 node s) Privatization No Privatization 6.8 Variablysaturated flow (no opt.) 5.9 Variablysaturated flow (opt.) 38 Scalability tests of Parallel HydroGeoSphere (optimization) Scalability of conceptual models (simulation #3, TCS, 106 nodes) Privatization No Privatization 12.2 Variablysaturate d flow (opt.) 8.8 39 Conclusions For surface and subsurface flow simulations, the computational hot spots are matrix assembly and the iterative solver. To obtain high parallel efficiency, a coarse grain parallelism and privatization scheme is applied to matrix assembly and iterative matrix solver (BiCGSTAB), respectively. Privatization scheme increases parallel efficiency. 40 GUI: Visual HydroGeoSphere Flow Chart for Visual HGS 42 2D Grid Generation Grid generation in 2D H-plane Grid generation in 2D V-plane 43 Assigning Material Properties 44 Simulation Parameters Subsurface flow Medium Type Material Properties Initial Condition Boundary Condition Surface flow porous medium surface runoff hydraulic conductivity, specific storage, porosity, etc friction factor, Rill storage height, Bottom leakance, etc initial head initial water depth specified heads, specified flux critical depth boundary 45 Post-Processing 46 Triangulation CGAL(Computation Geometry Algorithm Library) Delaunay Refinement Package - PLSG (Planar straight line graph): segment, isolated point - seed point - Triangle Constraints: size and angle for triangles Angular contraints Size contraints Internal boundaries 47 Geo-Modelling 48 Geo-Modelling using Borehole Information 49 Main Interface menu toolbar object browser 3D view property view 50 Geometry Selection by layer node segment highlighting face element 51 Simulation parameters Material properties 52 Examples 53
© Copyright 2025