MERLIN-Expo Wiki

Introduction

Uncertainty analysis involves propagating the uncertainty in model inputs (parameters) to estimate its impact on model outputs. From an uncertainty analysis you can also learn how important specific parameters (and other model components) are to an end result.

It is recommended that uncertainty analysis is performed iteratively. Collecting detailed data for parameters is time consuming (and therefore costly). Sensitivity analysis can help identify which parameters are the most important, and therefore where the data gathering effort should be focused.

An uncertainty analysis in MERLIN-Expo starts with identifying uncertain parameters. These parameters are assigned probability density functions that describe the knowledge you have about each parameter value. Before a simulation is run, you enter settings such as the number of iterations and which parameters to include. You then run the simulations, after which you create charts and tables.

Identifying uncertain parameters

Many of the uncertain parameters in the MERLIN-Expo library have been given probability density functions (PDFs) for several contaminants. The documentation of each model describes not only how the PDFs have been derived but also suggests how you can improve the data for your site and how you can derive PDFs for contaminants that are not included in the library.

Not all parameters are associated with (relevant) uncertainties, though it depends on your situation. If you study a specific agricultural land, the parameter for the surface area of the soil would not be uncertain. If you have a single local measurement of the soil density, this parameter could also be considered certain.

The pre-defined PDFs for site-specific parameters often have very wide distributions because they are based on measurements from many sites. If you have access to local data you should use it instead of the pre-defined PDF. If you have only one measurement, use it and state that the parameter is not uncertain. If you have several measurements you should use them to derive a new PDF.

MERLIN-Expo does not include functions to fit your data to a distribution, but the software resources section lists some tools you can use. MERLIN-Expo also has generic distributions, such as the general distribution and the histogram distribution, that allow you to enter measurements directly.
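As a rough illustration of deriving distribution parameters outside MERLIN-Expo, the sketch below estimates a mean and a standard deviation from a handful of measurements. The measurement values are hypothetical, and treating the data as normally distributed is an assumption that should be checked for your own data.

```python
import statistics

# Hypothetical local measurements of one parameter (values invented for illustration).
measurements = [1450.0, 1520.0, 1390.0, 1480.0, 1510.0]

# Sample statistics; stdev() uses the n - 1 denominator.
mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)

print(f"mean={mean:.1f}, sd={sd:.1f}")
```

With so few measurements the estimated standard deviation is itself quite uncertain, so a dedicated fitting tool is preferable when more data are available.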

Assigning probability density functions

The parameters screen lists all parameters used by the sub-systems of your model. The table where you enter data for a specific parameter has a column named PDF.

A PDF is given using a named parameter approach. The following PDF is for a normal distribution, with a mean value of 0.025 and a standard deviation of 0.012. It is also truncated at 0 in order to avoid potential negative values:

PDF
norm(mean=0.025,sd=0.012,trmin=0.0)
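To make the effect of the truncation concrete, the following sketch draws samples from a normal distribution with the same mean and standard deviation and discards any value below the lower bound. Rejection sampling is only one possible way to truncate a distribution and is an assumption here; the source does not state how MERLIN-Expo implements trmin. The function name is hypothetical.

```python
import random

def sample_truncated_normal(mean, sd, trmin, rng):
    """Draw one value from a normal distribution, rejecting values below trmin."""
    while True:
        value = rng.gauss(mean, sd)
        if value >= trmin:
            return value

rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [sample_truncated_normal(0.025, 0.012, 0.0, rng) for _ in range(1000)]

# No sample is negative; the sample mean ends up slightly above 0.025
# because the lower tail has been cut off.
print(min(samples), sum(samples) / len(samples))
```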

Remembering the names of all distribution functions and their parameters is difficult, so you rarely enter distributions this way. Instead you use the PDF editor which appears when you click a cell in the table for which a PDF is required.

The editor asks you for the type of distribution (in this case normal). When a distribution type is selected, a row of boxes appears into which you enter the parameters for the distribution.

The editor offers more functionality; read more here.

Setting up a probabilistic simulation

From the simulation screen you can open the probabilistic settings window by clicking the Probabilistic settings button in the toolbar.

General settings

Number of simulations - Choose the number of iterations to run. The more uncertain parameters you include in your analysis, the more iterations are needed to cover all the combinations of randomized parameter values.
Seed - The seed for the random number generator is by default itself random. This means that completely new data sets will be generated each time you start a simulation. If you wish to get the same sampling each time you start a simulation, enter any number in this box.
Sampling - Specifies the scheme for generating parameter samples, either Monte Carlo (pseudo random numbers) or Latin hypercube sampling. Latin hypercube sampling gives you better coverage of each distribution, but can result in extreme outliers.
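The difference between the two sampling schemes, and the role of the seed, can be sketched for a single uniform parameter. This is a minimal textbook-style illustration, not MERLIN-Expo's implementation; all function names are hypothetical.

```python
import random

def monte_carlo(n, rng):
    """n independent pseudo-random values in [0, 1)."""
    return [rng.random() for _ in range(n)]

def latin_hypercube(n, rng):
    """One value from each of n equally wide strata of [0, 1), in shuffled order."""
    values = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(values)
    return values

def to_uniform(u, low, high):
    """Map a value in [0, 1) onto a uniform distribution between low and high."""
    return low + u * (high - low)

n = 10
unit_mc = monte_carlo(n, random.Random(1))
unit_lhs = latin_hypercube(n, random.Random(1))
unit_lhs_again = latin_hypercube(n, random.Random(1))  # same seed, same sampling

width_mc = [to_uniform(u, 10, 30) for u in unit_mc]
width_lhs = [to_uniform(u, 10, 30) for u in unit_lhs]

print(sorted(round(w, 1) for w in width_mc))
print(sorted(round(w, 1) for w in width_lhs))
```

With Latin hypercube sampling every tenth of the range receives exactly one sample, whereas plain Monte Carlo sampling may leave some intervals empty and place several samples in others.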

Selecting parameters for the uncertainty study

The Parameters page lets you select which of the uncertain parameters to include in your study. The more uncertain parameters you have, the more iterations you must run in order to cover a realistic combination of all parameters. More iterations mean longer simulations and more data to process after the simulation completes.

All parameters that have been assigned PDFs are listed on the left-hand side. Choose the parameters you wish to include by selecting them in the list and clicking the > button.

Defining parameter correlations

When running a probabilistic simulation, a set of values is generated for each uncertain parameter by random sampling of its corresponding distribution. Because each parameter is sampled independently, the random sampling can result in very unlikely combinations.

Example:

You are studying a generic river and have identified the width and depth of the river as uncertain:

Parameter	Distribution	Min	Max
Width	uniform	10	30
Depth	uniform	2	6

When the simulation starts, a set of values is generated for each:

Iteration	Width	Depth
1	15	2
2	10	6
3	29	2
4	14	5
5	23	3
…	…	…

This would mean that the river in the second iteration is narrow but deep, in the third iteration it is wide but very shallow. These might be very unlikely situations, and your results would not be realistic. The depth and width are correlated - when there is much water in the river both the depth and width should grow and vice versa.

Parameter correlations are described by assigning weights between -1 and 1 for each parameter pair. A value larger than zero implies a positive correlation - when the first parameter has a large value, the second parameter should also have a large value. A negative weight implies a negative correlation - a large value for the first parameter should be combined with a small value for the second parameter.

After the sampling has been performed, MERLIN-Expo will try to sort the samples to accommodate the correlation weights. With a value of 0.9 between width and depth, we would get the following sets of values:

Iteration	Width	Depth
1	10	2
4	14	2
2	15	3
3	23	5
5	29	6
…	…	…
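The idea of reordering already drawn samples can be sketched as follows. The source does not describe the algorithm MERLIN-Expo uses, so this is only an assumption-laden illustration in the spirit of rank-based reordering methods such as Iman-Conover: correlated normal scores define a target rank pattern, and the sorted samples are placed according to those ranks. The values of each parameter are unchanged; only their pairing changes. All function names are hypothetical.

```python
import math
import random

def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def correlate(first, second, weight, rng):
    """Reorder both samples so that their rank correlation is close to weight."""
    n = len(first)
    # Correlated normal scores define the target rank pattern.
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [weight * a + math.sqrt(1.0 - weight ** 2) * rng.gauss(0.0, 1.0) for a in z1]
    sorted_first, sorted_second = sorted(first), sorted(second)
    new_first = [sorted_first[r] for r in ranks(z1)]
    new_second = [sorted_second[r] for r in ranks(z2)]
    return new_first, new_second

rng = random.Random(7)
width = [rng.uniform(10, 30) for _ in range(1000)]
depth = [rng.uniform(2, 6) for _ in range(1000)]

before = pearson(ranks(width), ranks(depth))
width_c, depth_c = correlate(width, depth, 0.9, rng)
after = pearson(ranks(width_c), ranks(depth_c))
print(f"rank correlation before: {before:.2f}, after: {after:.2f}")
```

Because only the order of the samples changes, each parameter keeps exactly the distribution it was sampled from.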

The Correlation page lets you set up correlation for parameters. You must first click Enabled to enable correlations. Then click the Add button to add each correlation.

Running simulations

Probabilistic simulations generate a lot of data. It is a good idea to select only the simulation outputs and time points you are interested in.

Selecting time points

Click the Simulation settings… button in the toolbar of the simulation screen.
By default, MERLIN-Expo will output values for each day: the Time series is a linear series with an increment of 1 day.
Edit the time series by clicking the Edit button.
Either change the increment (for instance to 10 or 50), or choose some other type of series. The Custom series allows you to enter exactly the time points for which you want results. Read more here.
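A quick estimate shows how much the increment matters for the amount of data. The sketch assumes a one-year simulation, 1000 iterations and 20 selected outputs; these numbers are hypothetical, and the count is simply iterations × outputs × time points, not a statement about how MERLIN-Expo stores results.

```python
def linear_series(start, end, increment):
    """Time points from start to end (inclusive) with a fixed increment."""
    return list(range(start, end + 1, increment))

iterations = 1000  # hypothetical number of simulations
outputs = 20       # hypothetical number of selected outputs

for increment in (1, 10, 50):
    points = linear_series(0, 365, increment)
    stored = iterations * outputs * len(points)
    print(f"increment {increment:>2} days: {len(points):>3} time points, {stored:>9,} values")
```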

Selecting outputs

In the Simulation settings window, click the Outputs tab.
Remove all the blocks you are not interested in.

Starting the simulation

To run a probabilistic simulation, you first need to change Simulation type from Deterministic to Probabilistic. Then click the Run button.

Creating charts and tables

Many new types of charts and tables are available for probabilistic results. Please refer to the charts screen and the tables screen pages.

Sensitivity analysis: random methods

After a probabilistic simulation you have access to the same charts and tables as after a sensitivity analysis using random methods.

Troubleshooting

A lot of problems can arise during a probabilistic simulation which would not happen in a deterministic case. When parameter samples are drawn from the probability density functions, you will inevitably end up with some extreme values. As discussed in the correlation section, it is also easy to end up with combinations of parameter values which are unlikely or extreme. This can cause iterations to take hundreds of times longer than iterations with less extreme parameter values. In bad cases, the numerical solver has to abort because it cannot meet error tolerances.

Problems are not always easy to identify - is it a specific PDF which causes problems or a combination of samples?

Problem	Solution
Simulation message: X is NaN at t=Y	NaN
Simulation message: could not without reducing	Error tolerance
Simulation never finishes	Memory problem, Time out

NaN

NaN (Not a Number, see http://en.wikipedia.org/wiki/NaN) is the result of an undefined calculation. Examples are 0/0 (zero divided by zero), log(-3) (logarithm of a negative number) and (-1.5)^(10/3) (a negative number raised to a fractional power).
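How NaN values behave in floating-point arithmetic can be seen in a short Python sketch. It illustrates general floating-point behaviour only, not MERLIN-Expo internals. Note that Python raises an exception for some undefined operations where other environments return NaN.

```python
import math

inf = float("inf")

nan_a = inf - inf         # undefined: infinity minus infinity
nan_b = 0.0 * inf         # undefined: zero times infinity
propagated = nan_a + 1.0  # any arithmetic on a NaN yields a NaN again

print(math.isnan(nan_a), math.isnan(nan_b), math.isnan(propagated))
print(nan_a == nan_a)     # a NaN never equals itself, so test with math.isnan()

# Python raises exceptions for these operations instead of returning NaN.
for label, operation in (("0/0", lambda: 0.0 / 0.0), ("log(-3)", lambda: math.log(-3))):
    try:
        operation()
    except (ZeroDivisionError, ValueError) as error:
        print(f"{label}: {type(error).__name__}")
```

The propagation shown here is why a single undefined intermediate result can make every block that depends on it report NaN.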

MERLIN-Expo will abort a simulation as soon as a NaN is detected in any intermediate calculation result. For debugging purposes however, it is sometimes useful to continue the simulation.

NaN is sometimes used as a default value for parameters to ensure that users have entered values. So, if the NaN is reported for a parameter, verify that its value is correctly set in the parameters screen.

Debugging NaN and Infinite values

Step 1

The error message contains the name of the block for which the NaN/Infinity value was discovered. In the information box of the model screen, find the equation for the block. This will show you which parameters are candidates for causing the error.

Example:

River.Mass transfer coefficient at the surface water-sediment interface is NaN at t=0

The equation is D_water·φ_sed^(4.0/3.0)/(Δ_sed+Δw·φ_sed^(4.0/3.0))

Raising a negative number to the fractional power 4/3 is undefined, so a NaN is reported if φ_sed < 0.
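The equation above can be evaluated outside MERLIN-Expo to confirm which inputs produce a NaN. In this sketch the parameter values are hypothetical, and the helper power() returns NaN where Python's math.pow would raise an error, which mimics the behaviour described in this section.

```python
import math

def power(base, exponent):
    """Real-valued power that returns NaN where the result is undefined."""
    try:
        return math.pow(base, exponent)
    except ValueError:
        return math.nan

def mass_transfer_coefficient(d_water, phi_sed, delta_sed, delta_w):
    """Evaluate D_water*phi_sed^(4/3) / (delta_sed + delta_w*phi_sed^(4/3))."""
    phi_term = power(phi_sed, 4.0 / 3.0)
    return d_water * phi_term / (delta_sed + delta_w * phi_term)

# Hypothetical values: a valid porosity and a negative (invalid) one.
ok = mass_transfer_coefficient(1e-9, 0.6, 0.05, 0.001)
bad = mass_transfer_coefficient(1e-9, -0.1, 0.05, 0.001)
print(ok, bad)
```

A PDF for φ_sed that allows negative samples, for example an untruncated normal distribution, would therefore be a likely cause of this message.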

Step 2

MERLIN-Expo can be told to continue simulations even when infinity/NaN occurs and to produce statistics on which iterations failed. After the simulation is finished, a raw data table can be created to see exactly for which parameter sets the simulation fails.

Open Simulation settings window
1. In the Output page, make sure that all uncertain parameters are among the simulation outputs.
2. In the Advanced page
  1. Deselect Halt on error. This will make MERLIN-Expo continue with the next iteration instead of aborting the simulation.
  2. Select Output statistics. After the simulation is complete, a set of outputs will be available with information on which iterations failed.
Open Probabilistic settings window.
1. In the General page, decrease the number of simulations as much as you can - there is no need to wait for 1000 iterations if the first error is reported for iteration 10.
2. Enter a number for the Seed instead of using Auto. This way the same sampling will be repeated every time you run a new simulation.
Start a simulation
Go to the tables screen.
1. Among the simulation outputs there should be a folder named _Statistics. In it, right click Failed_NaN and select Raw data table from the menu.
2. Failed_NaN will be 1.0 for each failed iteration.
Finally create a table with all the parameters: in the Type drop down list (below the results), select Parameter.
1. Select all parameters with CTRL+A.
2. Right-click one of the parameters and choose Raw data table from the menu.
3. Edit the table by right-clicking and choosing Edit…
4. Add a column for Failed_NaN by clicking the Add column button. The new column will be located last in the table.
5. Select this new, blank, column. In the Output field above, click the … button, and add Failed_NaN.
6. Sort the table by clicking the Failed_NaN column so that the failed iterations end up at the top of the table.
7. Now, try to see if there is anything obviously wrong with any of the parameter values.
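If the raw data table is inspected outside MERLIN-Expo, the comparison in the last step can be automated. The sketch below uses a small, entirely hypothetical table (parameter names and values are invented) and compares the range of each parameter in failed and successful iterations.

```python
# Hypothetical raw data table: one row per iteration.
rows = [
    {"iteration": 1, "Porosity": 0.41, "Depth": 3.2, "Failed_NaN": 0.0},
    {"iteration": 2, "Porosity": -0.03, "Depth": 4.1, "Failed_NaN": 1.0},
    {"iteration": 3, "Porosity": 0.55, "Depth": 2.4, "Failed_NaN": 0.0},
    {"iteration": 4, "Porosity": -0.10, "Depth": 5.7, "Failed_NaN": 1.0},
    {"iteration": 5, "Porosity": 0.28, "Depth": 5.0, "Failed_NaN": 0.0},
]

def value_ranges(table, parameters):
    """Min and max of each parameter, separately for failed and successful rows."""
    summary = {}
    for name in parameters:
        failed = [row[name] for row in table if row["Failed_NaN"] == 1.0]
        passed = [row[name] for row in table if row["Failed_NaN"] != 1.0]
        summary[name] = {
            "failed": (min(failed), max(failed)) if failed else None,
            "passed": (min(passed), max(passed)) if passed else None,
        }
    return summary

# Failed iterations first, as when sorting the table on the Failed_NaN column.
rows_sorted = sorted(rows, key=lambda row: row["Failed_NaN"], reverse=True)
summary = value_ranges(rows, ["Porosity", "Depth"])
for name, ranges in summary.items():
    print(name, ranges)
```

In this invented example the failed iterations are exactly those with a negative Porosity, while the Depth ranges overlap, which points to the Porosity PDF as the place to look.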

Step 3

When it is not possible to understand by observing the parameter values what is wrong, only one thing remains: the process of elimination.

In the probabilistic settings window, go to the Parameters page.
Remove all except for one parameter.
Run a simulation.
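If the simulation is successful, go back to the probabilistic settings window and add another parameter.
Continue until a simulation fails. The parameter added last is then the likely cause, either on its own or in combination with the parameters already included.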
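The elimination procedure amounts to a simple loop. In the sketch below, run_simulation is a hypothetical stand-in for starting a probabilistic simulation by hand with the given parameters included; MERLIN-Expo is operated through its user interface, so the code only illustrates the logic.

```python
def find_failing_parameter(parameters, run_simulation):
    """Add parameters one at a time; return the first one whose addition fails."""
    included = []
    for name in parameters:
        included.append(name)
        if not run_simulation(included):
            return name, included
    return None, included

# Hypothetical stand-in for a probabilistic run: fails when "Porosity" is sampled.
def fake_run(included):
    return "Porosity" not in included

culprit, tested = find_failing_parameter(
    ["Width", "Depth", "Porosity", "Density"], fake_run
)
print(culprit, tested)
```

Using a fixed seed for every run keeps the sampling of the already included parameters comparable between runs, which makes the result of each step easier to interpret.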