Note
Go to the end to download the full example code.
Latin hypercube sampling of the objective function in three survey line example#
If you are running this notebook locally, make sure you’ve followed steps here to set up the environment. (This environment.yml file specifies a list of packages required to run the notebooks)
# -------------------------------------------------------- #
# #
# Uncomment below to set up environment on "colab" #
# #
# -------------------------------------------------------- #
# !pip install -U cofi
# !pip install git+https://github.com/JuergHauser/PyP223.git
# !pip install smt
# !git clone https://github.com/inlab-geo/cofi-examples.git
# %cd cofi-examples/examples/vtem_max
# If this notebook is run locally PyP223 and smt need to be installed separately by uncommenting the following lines,
# that is by removing the # and the white space between it and the exclamation mark.
# !pip install git+https://github.com/JuergHauser/PyP223.git
# !pip install smt
import numpy
import smt
import smt.sampling_methods
import tqdm
from vtem_max_forward_lib import (
problem_setup,
system_spec,
survey_setup,
ForwardWrapper
)
numpy.random.seed(42)
Background#
The time required to solve the forward problem is what frequently dominates the time required to solve an inverse problem. An approximate mathematical model also known as a surrogate model may be constructed and used instead of the full forward problem with the advantage that evaluating the approximate model typically only takes a fraction of the time required to solve the full forward problem. The surrogate modelling toolbox (SMTorg/smt) is a Python library that provides a range of surrogate modelling methods.
Here we use the surrogate modelling toolbox to creata surrogate model for the objective function used in the three survey line example. This notebook generates training and test/validation samples of the objective function using latin hypercube sampling. Compared to random sampling latin hypercube sampling seeks to ensure that the set of random numbers is representative of the real variability. The training samples are used to create the surrogate model and the test samples are used to assess its predictive power.
For large numbers of samples it can be be convenient to convert the notebook into a script and run it from the comand line, using the following command to create the script.
jupyter nbconvert --to script three_survey_lines_latin_hypercube_sampling.ipynb
# set the number of training and test samples
ntrain=100
ntest=25
Problem definition#
tx_min = 115
tx_max = 281
tx_interval = 15
ty_min = 25
ty_max = 176
ty_interval = 75
tx_points = numpy.arange(tx_min, tx_max, tx_interval)
ty_points = numpy.arange(ty_min, ty_max, ty_interval)
n_transmitters = len(tx_points) * len(ty_points)
tx, ty = numpy.meshgrid(tx_points, ty_points)
tx = tx.flatten()
ty = ty.flatten()
fiducial_id = numpy.arange(len(tx))
line_id = numpy.zeros(len(tx), dtype=int)
line_id[ty==ty_points[0]] = 0
line_id[ty==ty_points[1]] = 1
line_id[ty==ty_points[2]] = 2
survey_setup = {
"tx": tx, # transmitter easting/x-position
"ty": ty, # transmitter northing/y-position
"tz": numpy.array([50]*n_transmitters), # transmitter height/z-position
"tazi": numpy.deg2rad(numpy.array([90]*n_transmitters)), # transmitter azimuth
"tincl": numpy.deg2rad(numpy.array([6]*n_transmitters)), # transmitter inclination
"rx": tx, # receiver easting/x-position
"ry": numpy.array([100]*n_transmitters), # receiver northing/y-position
"rz": numpy.array([50]*n_transmitters), # receiver height/z-position
"trdx": numpy.array([0]*n_transmitters), # transmitter receiver separation inline
"trdy": numpy.array([0]*n_transmitters), # transmitter receiver separation crossline
"trdz": numpy.array([0]*n_transmitters), # transmitter receiver separation vertical
"fiducial_id": fiducial_id, # unique id for each transmitter
"line_id": line_id # id for each line
}
true_model = {
"res": numpy.array([300, 1000]),
"thk": numpy.array([25]),
"peast": numpy.array([175]),
"pnorth": numpy.array([100]),
"ptop": numpy.array([30]),
"pres": numpy.array([0.1]),
"plngth1": numpy.array([100]),
"plngth2": numpy.array([100]),
"pwdth1": numpy.array([0.1]),
"pwdth2": numpy.array([90]),
"pdzm": numpy.array([75]),
"pdip": numpy.array([60])
}
forward = ForwardWrapper(true_model, problem_setup, system_spec, survey_setup,
["pdip","pdzm", "peast", "ptop", "pwdth2"], data_returned=["vertical"])
['pdip', 'pdzm', 'peast', 'ptop', 'pwdth2']
# check the order of parameters in a model vector
forward.params_to_invert
['pdip', 'pdzm', 'peast', 'ptop', 'pwdth2']
true_param_value = numpy.array([60,65, 175, 30, 90])
xtrue=true_param_value
Generate synthetic data
# The data
absolute_noise= 0.05
# create data and ad a realisation of the noise
data_pred_true = forward(true_param_value)
data_obs = data_pred_true + numpy.random.randn(len(data_pred_true))*absolute_noise
# define data covariance matrix
sigma=absolute_noise
Cdinv=numpy.identity(len(data_obs))*(1.0/(sigma*sigma))
Perform Latin Hypercube sampling#
Define objective function
def my_objective(model):
dpred = forward(model)
residual = dpred - data_obs
return residual.T @ Cdinv @ residual
ndim=len(true_param_value)
xlimits=numpy.array([[10,80],[30,150],[150,190],[25,45],[60,120]])
sampling = smt.sampling_methods.LHS(xlimits=xlimits,seed=42)
xtrain=sampling(ntrain)
ytrain=[]
xtest=sampling(ntest)
ytest=[]
for x in tqdm.tqdm(xtrain):
ytrain.append(my_objective(x))
for x in tqdm.tqdm(xtest):
ytest.append(my_objective(x))
0%| | 0/100 [00:00<?, ?it/s]
1%| | 1/100 [00:01<02:57, 1.80s/it]
2%|▏ | 2/100 [00:03<03:00, 1.84s/it]
3%|▎ | 3/100 [00:06<03:21, 2.08s/it]
4%|▍ | 4/100 [00:08<03:40, 2.30s/it]
5%|▌ | 5/100 [00:10<03:20, 2.11s/it]
6%|▌ | 6/100 [00:13<03:34, 2.28s/it]
7%|▋ | 7/100 [00:15<03:41, 2.38s/it]
8%|▊ | 8/100 [00:17<03:18, 2.16s/it]
9%|▉ | 9/100 [00:19<03:25, 2.26s/it]
10%|█ | 10/100 [00:22<03:28, 2.32s/it]
11%|█ | 11/100 [00:24<03:19, 2.24s/it]
12%|█▏ | 12/100 [00:26<03:04, 2.09s/it]
13%|█▎ | 13/100 [00:28<03:03, 2.10s/it]
14%|█▍ | 14/100 [00:30<03:01, 2.11s/it]
15%|█▌ | 15/100 [00:32<03:01, 2.14s/it]
16%|█▌ | 16/100 [00:35<03:11, 2.28s/it]
17%|█▋ | 17/100 [00:37<03:03, 2.20s/it]
18%|█▊ | 18/100 [00:39<02:57, 2.16s/it]
19%|█▉ | 19/100 [00:41<02:51, 2.12s/it]
20%|██ | 20/100 [00:43<02:56, 2.20s/it]
21%|██ | 21/100 [00:45<02:46, 2.10s/it]
22%|██▏ | 22/100 [00:48<02:55, 2.25s/it]
23%|██▎ | 23/100 [00:50<03:02, 2.37s/it]
24%|██▍ | 24/100 [00:52<02:47, 2.20s/it]
25%|██▌ | 25/100 [00:54<02:44, 2.19s/it]
26%|██▌ | 26/100 [00:56<02:39, 2.16s/it]
27%|██▋ | 27/100 [00:59<02:48, 2.31s/it]
28%|██▊ | 28/100 [01:01<02:32, 2.12s/it]
29%|██▉ | 29/100 [01:02<02:23, 2.02s/it]
30%|███ | 30/100 [01:05<02:22, 2.04s/it]
31%|███ | 31/100 [01:06<02:15, 1.97s/it]
32%|███▏ | 32/100 [01:08<02:10, 1.92s/it]
33%|███▎ | 33/100 [01:10<02:06, 1.89s/it]
34%|███▍ | 34/100 [01:12<02:10, 1.98s/it]
35%|███▌ | 35/100 [01:14<02:10, 2.01s/it]
36%|███▌ | 36/100 [01:16<02:07, 1.99s/it]
37%|███▋ | 37/100 [01:18<02:09, 2.05s/it]
38%|███▊ | 38/100 [01:21<02:09, 2.09s/it]
39%|███▉ | 39/100 [01:23<02:07, 2.08s/it]
40%|████ | 40/100 [01:24<01:59, 1.98s/it]
41%|████ | 41/100 [01:27<02:07, 2.16s/it]
42%|████▏ | 42/100 [01:29<01:57, 2.03s/it]
43%|████▎ | 43/100 [01:31<02:04, 2.19s/it]
44%|████▍ | 44/100 [01:33<02:02, 2.19s/it]
45%|████▌ | 45/100 [01:35<01:53, 2.07s/it]
46%|████▌ | 46/100 [01:37<01:53, 2.11s/it]
47%|████▋ | 47/100 [01:40<01:59, 2.25s/it]
48%|████▊ | 48/100 [01:42<01:55, 2.22s/it]
49%|████▉ | 49/100 [01:44<01:47, 2.11s/it]
50%|█████ | 50/100 [01:47<01:52, 2.25s/it]
51%|█████ | 51/100 [01:49<01:54, 2.34s/it]
52%|█████▏ | 52/100 [01:51<01:42, 2.14s/it]
53%|█████▎ | 53/100 [01:53<01:45, 2.25s/it]
54%|█████▍ | 54/100 [01:55<01:36, 2.10s/it]
55%|█████▌ | 55/100 [01:57<01:35, 2.12s/it]
56%|█████▌ | 56/100 [01:59<01:34, 2.15s/it]
57%|█████▋ | 57/100 [02:02<01:35, 2.21s/it]
58%|█████▊ | 58/100 [02:04<01:29, 2.14s/it]
59%|█████▉ | 59/100 [02:05<01:22, 2.01s/it]
60%|██████ | 60/100 [02:08<01:20, 2.02s/it]
61%|██████ | 61/100 [02:10<01:25, 2.20s/it]
62%|██████▏ | 62/100 [02:13<01:28, 2.33s/it]
63%|██████▎ | 63/100 [02:15<01:23, 2.27s/it]
64%|██████▍ | 64/100 [02:17<01:20, 2.23s/it]
65%|██████▌ | 65/100 [02:20<01:22, 2.34s/it]
66%|██████▌ | 66/100 [02:22<01:17, 2.29s/it]
67%|██████▋ | 67/100 [02:24<01:14, 2.26s/it]
68%|██████▊ | 68/100 [02:26<01:09, 2.17s/it]
69%|██████▉ | 69/100 [02:28<01:04, 2.08s/it]
70%|███████ | 70/100 [02:30<01:06, 2.23s/it]
71%|███████ | 71/100 [02:33<01:07, 2.31s/it]
72%|███████▏ | 72/100 [02:36<01:07, 2.40s/it]
73%|███████▎ | 73/100 [02:38<01:06, 2.45s/it]
74%|███████▍ | 74/100 [02:40<01:01, 2.35s/it]
75%|███████▌ | 75/100 [02:42<00:57, 2.30s/it]
76%|███████▌ | 76/100 [02:45<00:53, 2.25s/it]
77%|███████▋ | 77/100 [02:47<00:53, 2.33s/it]
78%|███████▊ | 78/100 [02:50<00:52, 2.40s/it]
79%|███████▉ | 79/100 [02:52<00:49, 2.33s/it]
80%|████████ | 80/100 [02:54<00:43, 2.18s/it]
81%|████████ | 81/100 [02:56<00:43, 2.31s/it]
82%|████████▏ | 82/100 [02:58<00:40, 2.27s/it]
83%|████████▎ | 83/100 [03:01<00:38, 2.24s/it]
84%|████████▍ | 84/100 [03:03<00:35, 2.20s/it]
85%|████████▌ | 85/100 [03:05<00:32, 2.18s/it]
86%|████████▌ | 86/100 [03:07<00:32, 2.29s/it]
87%|████████▋ | 87/100 [03:09<00:29, 2.25s/it]
88%|████████▊ | 88/100 [03:12<00:26, 2.19s/it]
89%|████████▉ | 89/100 [03:14<00:24, 2.19s/it]
90%|█████████ | 90/100 [03:16<00:23, 2.32s/it]
91%|█████████ | 91/100 [03:18<00:19, 2.15s/it]
92%|█████████▏| 92/100 [03:21<00:18, 2.29s/it]
93%|█████████▎| 93/100 [03:22<00:14, 2.13s/it]
94%|█████████▍| 94/100 [03:25<00:12, 2.14s/it]
95%|█████████▌| 95/100 [03:27<00:11, 2.28s/it]
96%|█████████▌| 96/100 [03:29<00:08, 2.24s/it]
97%|█████████▋| 97/100 [03:31<00:06, 2.10s/it]
98%|█████████▊| 98/100 [03:33<00:04, 2.00s/it]
99%|█████████▉| 99/100 [03:35<00:02, 2.05s/it]
100%|██████████| 100/100 [03:38<00:00, 2.23s/it]
100%|██████████| 100/100 [03:38<00:00, 2.18s/it]
0%| | 0/25 [00:00<?, ?it/s]
4%|▍ | 1/25 [00:01<00:42, 1.77s/it]
8%|▊ | 2/25 [00:03<00:46, 2.00s/it]
12%|█▏ | 3/25 [00:05<00:41, 1.89s/it]
16%|█▌ | 4/25 [00:08<00:45, 2.18s/it]
20%|██ | 5/25 [00:10<00:43, 2.17s/it]
24%|██▍ | 6/25 [00:12<00:39, 2.10s/it]
28%|██▊ | 7/25 [00:14<00:37, 2.07s/it]
32%|███▏ | 8/25 [00:17<00:37, 2.23s/it]
36%|███▌ | 9/25 [00:19<00:37, 2.33s/it]
40%|████ | 10/25 [00:21<00:34, 2.28s/it]
44%|████▍ | 11/25 [00:24<00:33, 2.36s/it]
48%|████▊ | 12/25 [00:26<00:29, 2.29s/it]
52%|█████▏ | 13/25 [00:28<00:26, 2.24s/it]
56%|█████▌ | 14/25 [00:30<00:24, 2.21s/it]
60%|██████ | 15/25 [00:32<00:21, 2.16s/it]
64%|██████▍ | 16/25 [00:34<00:19, 2.14s/it]
68%|██████▊ | 17/25 [00:36<00:16, 2.04s/it]
72%|███████▏ | 18/25 [00:39<00:15, 2.20s/it]
76%|███████▌ | 19/25 [00:40<00:12, 2.08s/it]
80%|████████ | 20/25 [00:43<00:10, 2.10s/it]
84%|████████▍ | 21/25 [00:45<00:08, 2.24s/it]
88%|████████▊ | 22/25 [00:47<00:06, 2.13s/it]
92%|█████████▏| 23/25 [00:49<00:04, 2.20s/it]
96%|█████████▌| 24/25 [00:52<00:02, 2.32s/it]
100%|██████████| 25/25 [00:54<00:00, 2.16s/it]
100%|██████████| 25/25 [00:54<00:00, 2.17s/it]
xtrain=numpy.array(xtrain)
ytrain=numpy.array(ytrain)
xtest=numpy.array(xtest)
ytest=numpy.array(ytest)
with open('three_survey_lines_lhs.npy', 'wb') as f:
numpy.save(f,ndim)
numpy.save(f,xlimits)
numpy.save(f,xtrain)
numpy.save(f,ytrain)
numpy.save(f,xtest)
numpy.save(f,ytest)
Watermark#
watermark_list = ["cofi", "numpy", "scipy", "matplotlib","smt"]
for pkg in watermark_list:
pkg_var = __import__(pkg)
print(pkg, getattr(pkg_var, "__version__"))
cofi 0.2.11+71.gb28b5b0
numpy 2.2.6
scipy 1.17.1
matplotlib 3.10.9
smt 2.13.0
sphinx_gallery_thumbnail_number = -1
Total running time of the script: (4 minutes 34.975 seconds)