RestoreIO is an online computational service to Restore Incomplete Oceanographic datasets, with a specific focus on ocean surface velocity data. This gateway can also generate data ensembles and perform statistical analysis, which allows uncertainty quantification of such datasets.
This user guide offers both a quick overview of the application usage and more in-depth details. We recommend starting with the Getting Started section and then acquainting yourself with the interface detailed in the following sections.
When accessing our online service, your web browser communicates with our computing server through ports 3000 to 3010. However, firewalls on certain internet connections, such as public WiFi networks (including eduroam), may block these ports. In the event of a connection failure, you will receive a notification message. To ensure uninterrupted access to our service, we recommend switching to an alternative network connection.
To get a quick overview of the interface, we recommend watching a short demo video.
To quickly familiarize yourself with the application, you can follow these steps to run a minimalistic example on a sample dataset:
This section offers comprehensive guidance on preparing your datasets to ensure their compatibility as input for RestoreIO.
The input dataset can consist of one or multiple files, which should adhere to the following formats:
- NetCDF files with the extension .nc, .nc4, .ncd, or .nc.gz.
- NcML files with the extension .ncml or .ncml.gz. For more information on NcML files, see Single Dataset Stored Across Multiple Files.

Note that it is also acceptable to provide a NetCDF file without a file extension.
It is highly recommended to save your data file in NetCDF4 format instead of NetCDF3. For more information, refer to NetCDF4 Package documentation in Python or NetCDF Files in MATLAB.
Also, you may follow the best practice of preparing a NetCDF file, which involves passing the CF 1.8 compliance test using the online CF-convention compliance checker. In particular, this involves adding the standard_name attribute to your variables (see the Required NetCDF Variables section below). For reference, you can consult the comprehensive list of CF compliance standard names.
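For illustration, the following is a minimal sketch, assuming the netCDF4 Python package and illustrative file and variable names, of creating a NetCDF4 file whose variables carry standard_name attributes recognized by RestoreIO (see the tables in the following section):

```python
import netCDF4

# Create a new file in NetCDF4 format (file name is illustrative)
nc = netCDF4.Dataset('my_input.nc', 'w', format='NETCDF4')

# Dimensions of a small rectilinear grid with a single time frame
nc.createDimension('time', 1)
nc.createDimension('lat', 50)
nc.createDimension('lon', 60)

# Coordinate variables with CF standard names
time = nc.createVariable('time', 'f8', ('time',))
time.standard_name = 'time'
time.units = 'days since 1970-01-01 00:00:00 UTC'
time.calendar = 'gregorian'

lon = nc.createVariable('lon', 'f8', ('lon',))
lon.standard_name = 'longitude'
lon.units = 'degrees_east'

lat = nc.createVariable('lat', 'f8', ('lat',))
lat.standard_name = 'latitude'
lat.units = 'degrees_north'

# Velocity variables with CF standard names; the declared fill value
# marks points where the velocity is unknown
east_vel = nc.createVariable('east_vel', 'f8', ('time', 'lat', 'lon'),
                             fill_value=9999.0)
east_vel.standard_name = 'surface_eastward_sea_water_velocity'
east_vel.units = 'm s-1'

north_vel = nc.createVariable('north_vel', 'f8', ('time', 'lat', 'lon'),
                              fill_value=9999.0)
north_vel.standard_name = 'surface_northward_sea_water_velocity'
north_vel.units = 'm s-1'

# ... write the coordinate and velocity values here, then close the file
nc.close()
```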
An input NetCDF file to our application should include all the variables listed in the table below. To ensure proper detection of these variables by our application, each variable should include at least one of the attributes standard_name or name (or both), as listed in the table. Note that checking the (standard) name is done in a case-insensitive manner.
Variable | Acceptable Standard Names | Acceptable Names |
---|---|---|
Time | time | t, time, datetime |
Longitude | longitude | lon, long, longitude |
Latitude | latitude | lat, latitude |
Ocean's Surface East Velocity | surface_eastward_sea_water_velocity, eastward_sea_water_velocity, surface_geostrophic_eastward_sea_water_velocity, surface_geostrophic_sea_water_x_velocity, surface_geostrophic_eastward_sea_water_velocity_assuming_sea_level_for_geoid, surface_eastward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid, surface_geostrophic_sea_water_x_velocity_assuming_mean_sea_level_for_geoid, surface_geostrophic_sea_water_x_velocity_assuming_sea_level_for_geoid, surface_geostrophic_eastward_sea_water_velocity_assuming_mean_sea_level_for_geoid, sea_water_x_velocity, x_sea_water_velocity | east_vel, eastward_vel, u, ugos, east_velocity, eastward_velocity, u-velocity |
Ocean's Surface North Velocity | surface_northward_sea_water_velocity, northward_sea_water_velocity, surface_geostrophic_northward_sea_water_velocity, surface_geostrophic_sea_water_y_velocity, surface_geostrophic_northward_sea_water_velocity_assuming_sea_level_for_geoid, surface_northward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid, surface_geostrophic_sea_water_y_velocity_assuming_mean_sea_level_for_geoid, surface_geostrophic_sea_water_y_velocity_assuming_sea_level_for_geoid, surface_geostrophic_northward_sea_water_velocity_assuming_mean_sea_level_for_geoid, sea_water_y_velocity, y_sea_water_velocity | north_vel, northward_vel, v, vgos, north_velocity, northward_velocity, v-velocity |
Apart from the required variables mentioned above, you have the option to include the following additional variables in your input file. Note that there is no standard name established for these variables, so you should provide a name attribute according to the table. These variables are used exclusively for the purposes of uncertainty quantification by generating data ensembles. For more details, you may refer to the Generate Ensemble section.
Variable | Acceptable Standard Names | Acceptable Names |
---|---|---|
Ocean's Surface East Velocity Error | N/A | east_err , east_error |
Ocean's Surface North Velocity Error | N/A | north_err , north_error |
Geometric Dilution of Precision (East Component) | N/A | dopx , gdopx |
Geometric Dilution of Precision (North Component) | N/A | dopy , gdopy |
The following provides further details for each of the variables listed in the tables above.
The time variable should be a one-dimensional array with strictly increasing values. The time variable may have the following attributes:

- units: a string specifying both the time unit (such as years, months, days, hours, minutes, seconds, or microseconds) and the origin of the time axis (such as since 1970-01-01 00:00:00 UTC). If this attribute is not provided, the default assumption is days since 1970-01-01 00:00:00 UTC.
- calendar: a string indicating the time calendar. If this attribute is not provided, the default assumption is gregorian.

Ensure that the time variable is not masked. If the _FillValue attribute is included, the variable will be masked. Therefore, make sure this attribute is not present for the time variable.
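To check how the time values in a file will be interpreted, a minimal sketch such as the following, assuming the netCDF4 Python package and an illustrative file name, converts them to datetime objects using the declared (or default) units and calendar attributes:

```python
import netCDF4

nc = netCDF4.Dataset('my_input.nc')  # illustrative file name
time = nc.variables['time']

# Fall back to the defaults mentioned above if the attributes are absent
units = getattr(time, 'units', 'days since 1970-01-01 00:00:00 UTC')
calendar = getattr(time, 'calendar', 'gregorian')

# Convert the numeric time values to datetime objects
dates = netCDF4.num2date(time[:], units=units, calendar=calendar)
print(dates[0], dates[-1])

nc.close()
```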
These variables should be one-dimensional arrays, each representing an axis of a rectilinear grid. The values in both longitude and latitude arrays should either strictly increase or strictly decrease. The units of the arrays should be degrees positive eastward (for longitude) and degrees positive northward (for latitude).
Our application is designed to process data on rectilinear grids which are presented by one-dimensional longitude and latitude arrays. However, if your data is on irregular grids represented by two-dimensional longitude and latitude arrays, you can remap the data to a rectilinear grid by using interpolation functions such as scipy.interpolate.griddata in Python or griddata in MATLAB.
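For instance, the following is a minimal sketch of such a remapping using scipy.interpolate.griddata; the two-dimensional arrays lon2d, lat2d, and vel2d stand in for one time frame of your curvilinear-grid data and are filled with placeholder values here:

```python
import numpy as np
from scipy.interpolate import griddata

# Placeholder two-dimensional coordinate and velocity arrays; in practice,
# load these from your own irregular-grid dataset
lon2d = np.random.uniform(-122.5, -121.5, size=(40, 50))
lat2d = np.random.uniform(36.3, 37.0, size=(40, 50))
vel2d = np.random.randn(40, 50)

# Define one-dimensional longitude and latitude axes of a rectilinear grid
lon1d = np.linspace(lon2d.min(), lon2d.max(), 60)
lat1d = np.linspace(lat2d.min(), lat2d.max(), 50)
lon_grid, lat_grid = np.meshgrid(lon1d, lat1d)

# Interpolate the scattered values onto the rectilinear grid
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
vel_rect = griddata(points, vel2d.ravel(), (lon_grid, lat_grid),
                    method='linear')
print(vel_rect.shape)  # (lat, lon) on the new rectilinear grid
```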
Ensure that the longitude and latitude variables are not masked. The presence of the _FillValue attribute, for example, will cause these variables to be masked. Therefore, make sure this attribute is not present for the longitude and latitude variables.
There is no restriction on the physical unit of the velocity variables; however, they should be oriented positive eastward (for the east component) and positive northward (for the north component).
The east and north ocean's surface velocity variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.
The order of dimensions for a velocity variable, named east_vel for instance, is as follows:

- For three-dimensional arrays: east_vel[time, lat, lon] in Python and east_vel(lon, lat, time) in MATLAB.
- For four-dimensional arrays: east_vel[time, depth, lat, lon] in Python and east_vel(lon, lat, depth, time) in MATLAB.

Note that the order of dimensions in MATLAB is reversed compared to Python.
In areas where the velocity is unknown (either due to being located on land or having incomplete data coverage), the velocity variable should be masked using one of the following methods (a minimal sketch of both follows the list):

- Set the velocity value at the masked points to a large number, such as 9999.0, and assign the attribute missing_value or _FillValue with this value.
- Set the velocity value at the masked points to NaN.
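The following is a minimal sketch of both methods, assuming the netCDF4 Python package and illustrative file, variable, and index values:

```python
import numpy as np
import netCDF4

nc = netCDF4.Dataset('my_input.nc', 'w', format='NETCDF4')
nc.createDimension('time', 1)
nc.createDimension('lat', 50)
nc.createDimension('lon', 60)

# Method 1: declare a fill value and assign it to the unknown points
east_vel = nc.createVariable('east_vel', 'f8', ('time', 'lat', 'lon'),
                             fill_value=9999.0)
data = np.full((1, 50, 60), 9999.0)   # start with all points masked
data[0, 10:40, 5:55] = 0.1            # placeholder values for known points
east_vel[:] = data

# Method 2 (alternative): assign NaN to the unknown points
north_vel = nc.createVariable('north_vel', 'f8', ('time', 'lat', 'lon'))
north_vel[:] = np.full((1, 50, 60), np.nan)

nc.close()
```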
When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the east and north velocity error variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the Geometric Dilution of Precision Variables instead of the velocity error variables. For further details, refer to the Generate Ensemble section.
The velocity error variables should be expressed as non-negative values and use the same unit as the velocity variables, such as both being in meters per second. If your velocity error values are not in the same unit as the velocity variables (e.g., velocity in meters per second and velocity error in centimeters per second), you can convert the velocity error unit by using the Scale velocity error entry within the interface. The error variables in your files are multiplied directly by this scale factor.
The east and north ocean's surface velocity error variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.
The order of dimensions for a velocity error variable, named east_err for instance, is as follows:

- For three-dimensional arrays: east_err[time, lat, lon] in Python and east_err(lon, lat, time) in MATLAB.
- For four-dimensional arrays: east_err[time, depth, lat, lon] in Python and east_err(lon, lat, depth, time) in MATLAB.

Note that the order of dimensions in MATLAB is reversed compared to Python.
Unlike the velocity variable, masking the velocity error variables is not mandatory. However, if you choose to apply masks to the velocity error variables, the same rules that apply to the velocity variable should also be followed for the velocity error variables.
The Geometric Dilution of Precision (GDOP) is relevant to HF radar datasets, and it quantifies the effect of the geometric configuration of the HF radars on the uncertainty in velocity estimates. To gain a better understanding of the GDOP variables, we recommend referring to Section 2 of [2].
When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the Ocean's East and North Velocity Error Variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the GDOP variables instead of the velocity error variables.
For further details on the usage of GDOP variables, refer to Generate Ensemble section.
When utilizing the GDOP variables instead of the velocity error variables, be sure to specify the Scale velocity error entry within the interface. This value should be set to the radial error of the HF radars. The velocity error is then calculated as the product of this scale factor and the GDOP variables.

The GDOP variables should be expressed as non-negative values. Although the GDOP variables are dimensionless, when they are provided instead of the velocity error, the Scale velocity error quantity should have the same unit as your velocity variable.
The east and north GDOP variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

The order of dimensions for a GDOP variable, named dopx for instance, is as follows:

- For three-dimensional arrays: dopx[time, lat, lon] in Python and dopx(lon, lat, time) in MATLAB.
- For four-dimensional arrays: dopx[time, depth, lat, lon] in Python and dopx(lon, lat, depth, time) in MATLAB.

Note that the order of dimensions in MATLAB is reversed compared to Python.

Unlike the velocity variable, masking the GDOP variables is not mandatory. However, if you choose to apply masks to the GDOP variables, the same rules that apply to the velocity variable should also be followed.
You can provide the input dataset in two different ways: by uploading your files or by providing an OpenDap URL to a remote dataset.
To select the desired input method, simply use the tabs available in the Input Type section. Details of each method will be described in the following.
Note About Tabs:
During the execution of a computation, only the currently active tab among the available tab options will be taken into consideration, regardless of whether you have entered data in other tabs. For example, if you have uploaded a file under the Upload File tab, but later you accidentally switched to the OpenDap URL tab, your selection will be interpreted as using the OpenDap URL option. However, the file you have uploaded will not be discarded. This rule applies to all tab selections within the form.
Animated clip: Finding an OpenDap URL from a THREDDS catalog.
You can provide an OpenDap URL of a remote dataset through the Enter URL text box under the OpenDap URL tab within the Input Type section of the interface.
Many providers of geophysical data host their datasets on THREDDS Data servers, which offer OpenDap protocols. The following steps guide you to obtain the OpenDap URL of a remote dataset hosted on a THREDDS server. In the example below, we use a sample HF radar data hosted on our THREDDS server available at https://transport.me.berkeley.edu/thredds.
https://transport.me.berkeley.edu/thredds/dodsC/root/WHOI-HFR/WHOI_HFR_2014_restored.nc
For a visual demonstration of the steps described above, you may refer to the animated clip.
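As a quick check that an OpenDap URL is readable, the following minimal sketch, assuming the netCDF4 Python package is built with OpenDap support, opens the sample URL above directly without downloading the file:

```python
import netCDF4

# OpenDap URL of the sample dataset from the steps above
url = ('https://transport.me.berkeley.edu/thredds/dodsC/root/'
       'WHOI-HFR/WHOI_HFR_2014_restored.nc')

# A remote OpenDap dataset is opened the same way as a local file
nc = netCDF4.Dataset(url)
print(nc.variables.keys())
nc.close()
```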
To help users explore the application, we have included the OpenDap URLs for two sample datasets in the Input Type section of the interface. Simply click on the drop-down menu located on the right side of the Enter URL text box, and choose one of the sample datasets. This selection will automatically populate the URL text box with the corresponding OpenDap URL.
You can upload your files under the Upload File tab from the Input Type entry. You can upload one or multiple files (see Multi-File Dataset section below).
To upload data files, you will need to provide an 8-character token string (refer to Enter Token within the interface). Tokens are required to prevent bots from uploading malicious files to the server. To obtain a token, please make a request to ude.yelekreb@ilemas. Tokens are available free of charge and offer unlimited uploads without any expiration date.
You have the option to provide multiple files. A multi-file dataset can arise in two scenarios:
If your dataset is divided into multiple files, where each file represents a distinct part of the data (e.g., different time frames), you can use the NetCDF Markup Language (NcML) to create an ncml
file that aggregates all the individual NetCDF files into a single dataset. To provide this multi-file dataset, simply specify the URL of the NcML file. For detailed guidance on using NcML, you can consult the NcML Tutorial.
Alternatively, you may have several files, with each file representing an independent dataset. In this case, you can provide all these datasets simultaneously to the interface. An example of such multiple files could be an ensemble obtained from ocean models, where each file corresponds to a velocity ensemble.
The following steps guide you to provide multiple files.
When providing multiple files, the names of your files (or the URLs) should include a numeric pattern. For instance, you can use a file name format like MyInputxxxxFile.nc, where xxxx is the numeric pattern. An example of such data URLs, where the pattern ranges from 0000 to 0020, could be:
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0000File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0001File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0002File.nc
...
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0020File.nc
There are two methods for providing your multi-file data:

- Upload File: upload all of the files under the Upload File tab.
- OpenDap URL: provide the URL of only one of the remote files, for instance, https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0015File.nc. Our application can parse the URL pattern and generate the URLs for the other remote files automatically.
Uploading multiple files alone does not automatically enable processing of multiple files. To enable multi-file processing, you need to explicitly enable this feature within the application's interface (refer to Enable processing multiple files (separate datasets) option).
Within the interface, enter the Minimum file index and Maximum file index to define the range of files to be processed. This allows the application to search through your uploaded files or generate new URLs based on the provided URL to access the other datasets.
For example, in the case of the URLs mentioned earlier, you can enter 0 as the minimum file index and 20 as the maximum file index within the interface. Alternatively, you can specify the full iterator pattern with the leading zeros, that is, 0000 to 0020.
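For illustration, the following small sketch expands the zero-padded pattern above into the full list of URLs; the base URL is the one from the example earlier in this section:

```python
# Expand the numeric pattern 0000 to 0020 into the full list of URLs
base = ('https://transport.me.berkeley.edu/thredds/dodsC/public/'
        'SomeDirectory/MyInput{:04d}File.nc')
urls = [base.format(i) for i in range(0, 21)]
print(urls[0])   # ends with MyInput0000File.nc
print(urls[-1])  # ends with MyInput0020File.nc
```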
After providing the input data, it is recommended that you perform a scan of your dataset by clicking on Scan Data. This button performs a simple check on your data to make sure the required variables exist and are readable. In addition, it reads the time span and spatial extent of the data and fills the interface form with this information, which is often useful if you do not have a priori knowledge of the time and spatial extent of your data. In particular, the scan populates the time span and spatial domain entries of the form. You may then adjust these settings to a desired configuration.
To visualize your input data, click on the Visualize button located in the input data section. This will display the east and north components of the velocity data on the map.
Please keep in mind that if your input dataset is hosted remotely and the remote data server does not allow Cross-Origin Resource Sharing (CORS), the visualization of your dataset may be slower on our end, as our server will proxy the remote data server to access the data. If you experience slow rendering or encounter an error during visualization, you can choose to skip visualization and proceed with the computation.
In the Time section of the interface, you can specify the time span of the data to be processed. There are two options available under the Specify Time Span tabs: Single time and Time interval.
Select this option to process a specific time within the input dataset. You can use the Time point entry to specify the desired time. If the chosen time does not exactly match any time stamp in your input data, the closest available time will be used for processing.
This option allows you to process a specific time interval within the input dataset. You can define the start and end times by providing the Initial time and Final time entries. If the specified times do not exactly match any time stamps in your input data, the closest available times will be used for processing.
Alternatively, you can select the Process whole dataset times option to process the entire time span within your input data.
Please note that the time interval option cannot be used if you have enabled generating ensemble (refer to Generate Ensemble).
In the Domain section of the interface, you have the option to specify a spatial subset of the data for processing. By choosing a subset of the domain, the output file will only contain the specified spatial region.
If you select the Use whole dataset bounds option, the entire spatial extent within your input data will be processed. However, please be aware that for large spatial datasets, this option might require a significant amount of time for processing. To optimize efficiency, we recommend subsetting your input dataset to a relevant segment that aligns with your analysis.
The input dataset's grid comprises two distinct sets of points: locations with available velocity data and locations where velocity data is not provided. These regions are referred to as the known domain \(\Omega_k\) and the unknown domain \(\Omega_u\), respectively. Therefore, the complete grid of the input data \(\Omega\) can be decomposed into \(\Omega = \Omega_k \cup \Omega_u\).
The primary objective of data reconstruction is to fill the data gaps within the regions where velocity data is missing. The region of missing data, \(\Omega_m\), is part of the unknown domain \(\Omega_u\). However, the unknown domain contains additional points that are not necessarily missing, such as points located on land, denoted as \(\Omega_l\), or regions of the ocean that are not included in the dataset, which we denote as \(\Omega_o\).
Before proceeding with reconstructing the missing velocity data, it is essential to first identify the missing domain \(\Omega_m\). This involves segmenting the unknown domain \(\Omega_u\) into \(\Omega_u = \Omega_m \cup \Omega_l \cup \Omega_o\). These tasks require the knowledge of the ocean's domain and land domain. You can configure these steps within the interface, as described in Detect Data Domain and Detect Land below.
For detailed information on domain segmentation, we recommend referring to [1].
By the data domain, \(\Omega_d\), we refer to the union of both the known domain \(\Omega_k\) and the missing domain \(\Omega_m\), namely, \(\Omega_d = \Omega_k \cup \Omega_m\). Once the missing velocity field is reconstructed, the combination of both the known and missing domains will become the data domain.
The purpose of the Detect Data Domain section of the interface is to identify \(\Omega_d\). This can be done in two ways:
By selecting the Convex hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a convex hull around the known domain \(\Omega_k\). As such, any unknown point inside the convex hull is flagged as missing, and all points outside this convex hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).
By selecting the Concave hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a concave hull around the known domain \(\Omega_k\). As such, any unknown point inside the concave hull is flagged as missing, and all points outside this concave hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).
Note that a concave hull (also known as an alpha shape) is not unique and is characterized by a radius parameter (refer to the Alpha shape radius (in Km) entry within the interface). The radius is the inverse of the \(\alpha\) parameter in alpha shapes. A smaller radius causes the concave hull to shrink more toward the set of points it encompasses. Conversely, a larger radius yields a concave hull that is closer to a convex hull. We recommend setting the radius (in km) to a few multiples of the grid spacing. For instance, for an HF radar dataset with a 2 km resolution, where the grid points are spaced 2 km apart, a radius of approximately 10 km works well for most datasets.
We recommend choosing concave hull over convex hull as it can better identify the data domain within your input files, provided that the radius parameter is tuned appropriately.
In some cases, a part of the convex or concave hull might overlap with the land domain, leading to the mistaken flagging of such intersections as missing domains to be reconstructed. To avoid this issue, it is recommended to detect the land domain \(\Omega_l\) and exclude it from the data domain \(\Omega_d\) if there is any intersection. Three options are available within the interface regarding the treatment of the land domain, ranging from not detecting the land at all to detecting it with a high-accuracy map.
The land boundaries are queried using the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG). For large datasets, we advise against using the third option, as using a high-accuracy map can significantly increase the processing time for detecting land. For most datasets, we recommend using the second option, as it offers sufficient accuracy while remaining relatively fast.
Domain segmentation
The following figure serves as an illustration of the domain segmentation for HF radar data in the region of Monterey Bay. In the left panel, the green domain represents the known area where velocity data is available, while the red domain signifies the region without velocity data. In the right panel, the missing domain is highlighted in red. This domain is determined by the red points from the left panel that fall within the concave hull around the green points. The points located outside the concave hull are considered non-data points, representing the ocean domain. The reconstruction of the velocity field is performed within the red points shown in the right panel.
If your dataset's data domain is close to land (e.g., in HF radar datasets spanning across coastlines), you can extend the data domain beyond the region identified by the convex or concave hulls, reaching up to the coastline. To achieve this, you can enable the Extend data restoration up to the coastline option within the interface.
Extending Data Domain up to Coastline
By extending the data domain to the land, a zero boundary condition for the velocity field on the land is imposed. However, note that this assumption results in a less credible reconstructed field when the coastal gap is large.
The illustration below showcases the impact of activating this feature for the example of the Monterey Bay region. Notably, the alteration can be observed in the right panel, where the area between the data domain and the coastline is highlighted in red. This signifies that the gaps extending up to the coastlines will be comprehensively reconstructed.
With the Refine Grid entry in the interface, you can refine the dataset's grid by an integer factor along both the longitude and latitude axes. This process interpolates the data onto a finer grid. It is important to note that this refinement does not enhance the true resolution of the data.
We advise keeping the refinement level at the default value of 1, unless there's a specific reason to refine the grid size. Increasing the refinement level can significantly increase computation time and may not provide additional benefits in most cases.
In addition to reconstructing missing data, RestoreIO offers the functionality to generate ensembles of the velocity vector field. These ensembles serve the purpose of uncertainty quantification, which can be valuable for various applications. For more details on the ensemble generation algorithm, you may refer to [2].

To generate a velocity ensemble, simply enable the Generate ensemble of velocity field for uncertainty quantification option within the interface. Note that ensembles can only be generated for a single time point. As a result, selecting this option will automatically activate the Single Time tab in the Specify Time Span section.
To generate an ensemble, you should provide one of the following sets of additional variables in your input file:

- the east and north velocity error variables, or
- the east and north Geometric Dilution of Precision (GDOP) variables.
If you choose to provide GDOP variables instead of the velocity error variables, the velocity errors are calculated from GDOP as follows:
$$
\begin{align}
\sigma_e &= \sigma_r \, \mathrm{GDOP}_e, \\
\sigma_n &= \sigma_r \, \mathrm{GDOP}_n,
\end{align}
$$

where \(\sigma_e\) and \(\sigma_n\) are the east and north components of the velocity error, \(\mathrm{GDOP}_e\) and \(\mathrm{GDOP}_n\) are the east and north components of the GDOP, respectively, and \(\sigma_r\) is the radar's radial error. You can specify \(\sigma_r\) using the Scale velocity error entry within the interface (also refer to the Scale Velocity Errors section below).
The following settings can be set within the Generate Ensemble section of the interface.
The Write samples to output option allows you to save the entire population of ensemble vector fields to the output file. If this option is not enabled, only the mean and standard deviation of the ensemble will be stored. For more details, please refer to the Output Variables section.
The Number of (Monte-Carlo) samples entry within the interface enables you to specify the number of samples to be generated. This value should be greater than the number of velocity data points. Keep in mind that the processing time increases linearly with larger sample sizes.
To generate an ensemble, the eigenvalues and eigenvectors of the covariance matrix of the velocity data need to be computed. For a velocity data with \(n\) data points, this means the eigenvalues and eigenvectors of an \(n \times n\) matrix have to be calculated. However, such a computation has a complexity of \(\mathcal{O}(n^3)\), which can be infeasible for large datasets.
To handle this, we employ a practical approach where we only compute a reduced number of \(m\) eigenvalues and eigenvectors of the covariance matrix, where \(m\) can be much smaller than \(n\). This simplification reduces the complexity to \(\mathcal{O}(n m^2)\), which enables us to process larger datasets while maintaining a reasonable level of accuracy. For a better understanding of this concept, we refer the interested reader to Section 4 of [2].
The Number of eigen-modes (in percent) entry within the interface allows you to specify the number of eigenvectors of the data covariance to be utilized in the computations. The number of modes should be given as a percentage of the ratio \(m/n\).
Keep in mind that the processing time increases quadratically with the number of eigenmodes. We recommend setting this value to around 5% to 10% for most datasets.
The Kernel width entry within the interface represents the width of a spatial kernel used to construct the covariance matrix of the velocity data. The kernel width is measured in units of grid points of the velocity data. For example, a kernel width of 5 on an HF radar dataset with a 2 km spatial resolution implies a kernel width of 10 km.
It is assumed that spatial distances larger than the kernel width are uncorrelated. Therefore, reducing the kernel width makes the covariance matrix of the data more sparse, resulting in more efficient processing. However, a smaller kernel width may lead to information loss within the dataset. As a general recommendation, we suggest setting this value to 5 to 20 data points.
The Scale velocity error entry serves two purposes:

- If the velocity error variables are provided, this value scales them, for instance to convert their unit to that of the velocity variables (refer to East and North Velocity Errors).
- If the GDOP variables are provided instead, this value specifies the radars' radial error \(\sigma_r\), and the velocity error is computed as the product of this value and the GDOP variables (refer to Geometric Dilution of Precision Variables).
The results of RestoreIO are stored in a NetCDF file with the .nc format. Depending on the chosen configuration, this file contains a selection of the variables described below.
In addition to the detailed description of each variable provided below, you can explore a compilation of output variable names, along with their corresponding NetCDF dimensions, in the NetCDF Variables section available within the tutorials.
The mask variable is a three-dimensional array with dimensions for time, longitude, and latitude. This variable is stored under the name mask
in the output file.
The mask variable includes information about the result of domain segmentation (refer to the Domain Segmentation section). This array contains the integer values -1, 0, 1, and 2, which are interpreted as follows:

- -1 indicates the location is identified to be on the land domain \(\Omega_l\). In these locations, the output velocity variable is masked.
- 0 indicates the location is identified to be on the known domain \(\Omega_k\). These locations have velocity data in the input file. The same velocity values are preserved in the output file.
- 1 indicates the location is identified to be on the missing domain \(\Omega_m\). These locations do not have velocity data in the input file, but they do have reconstructed velocity data in the output file.
- 2 indicates the location is identified to be on the ocean domain \(\Omega_o\). In these locations, the output velocity variable is masked.

The reconstructed east and north velocity variables are stored in the output file under the names east_vel and north_vel, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.
The velocity variables on each of the segmented domains are defined as follows:

- On locations where the mask value is -1 or 2, the output velocity variables are masked.
- On locations where the mask value is 0, the output velocity variables have the same values as the corresponding variables in the input file.
- On locations where the mask value is 1, the output velocity variables are reconstructed. If the Generate ensemble of velocity field for uncertainty quantification option is enabled, these output velocity variables are obtained as the mean of the velocity ensemble, where the missing domain of each ensemble member is reconstructed.

If the Generate ensemble of velocity field for uncertainty quantification option is enabled within the interface, the east and north velocity error variables will be included in the output file under the names east_err and north_err, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.
The velocity error variables on each of the segmented domains are defined as follows:

- On locations where the mask value is -1 or 2, the output velocity error variables are masked.
- On locations where the mask value is 0, the output velocity error variables are obtained from either the corresponding velocity error or GDOP variables in the input file, scaled by the value of Scale velocity error.
- On locations where the mask value is 1, the output velocity error variables are obtained from the standard deviation of the ensemble, where the missing domain of each ensemble member is reconstructed.

When you activate the Generate ensemble of velocity field for uncertainty quantification option, a collection of velocity field ensembles is created. Yet, by default, the output file only contains the mean and standard deviation of the ensemble. To incorporate all ensemble samples into the output file, you should additionally enable the Write samples to output option within the interface. This action saves the east and north velocity ensemble variables in the output file as east_vel_ensemble and north_vel_ensemble, respectively. These variables are four-dimensional arrays with dimensions for ensemble, time, longitude, and latitude.
The ensemble dimension of the array has the size \(s+1\) where \(s\) is the number of samples specified by Number of (Monte-Carlo) samples entry within the interface (also refer to Number of (Monte-Carlo) Samples section). The first ensemble with the index \(0\) (assuming zero-based numbering) corresponds to the original input dataset. The other samples with the indices \(1, \dots, s\) correspond to the generated samples.
The velocity ensemble variables on each of the segmented domains are defined similarly to those presented for the Reconstructed East and North Velocities. In particular, the missing domain of each ensemble member is reconstructed independently.
Note that the mean and standard deviation of the velocity ensemble arrays over the ensemble dimension yield the Reconstructed East and North Velocities and East and North Velocity Errors variables, respectively.
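For instance, the following minimal sketch, assuming the netCDF4 Python package and that both the ensemble generation and Write samples to output options were enabled, extracts the samples and computes these quantities from the output file; the file name is illustrative:

```python
import netCDF4

nc = netCDF4.Dataset('output-dataset.nc')

# The first dimension of the ensemble array is the ensemble index
ens_u = nc.variables['east_vel_ensemble'][:]

# Index 0 holds the original input field; indices 1, ..., s are the samples
original_field = ens_u[0]
num_samples = ens_u.shape[0] - 1

# Mean and standard deviation over the ensemble dimension, corresponding to
# the reconstructed east velocity and the east velocity error, respectively
u_mean = ens_u.mean(axis=0)
u_err = ens_u.std(axis=0)

nc.close()
```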
After the computation is complete, you can visualize the results by clicking on the Visualize button. This action loads an animated visualization of your results in the map section.
To gain more control over visualization, make use of the Show Workbench button located on the top-left corner of the visualization screen. This feature enables you to set specific color ranges and styles, tailoring the display to your preferences. If you find yourself working with large datasets on a slower internet connection, consider switching to a 2D map view (as opposed to the spherical globe) by clicking the Map button in the upper-right corner of the map screen. Additionally, to reset the view to the standard angle, simply click the circle at the center of the zoom buttons on the right panel of the screen.
Within the workbench, you have the ability to select variables for visualization according to your needs. These options include the original east and north velocities from your input dataset, the reconstructed east and north velocities, and the mask variable. Additionally, if you have enabled the generation of ensemble (refer to the Generate Ensemble section), you can also visualize the east and north velocity errors.
Utilizing the workbench sidebar, you can also engage in an interactive comparison between input and reconstructed velocity datasets. To achieve this, choose two pairs of original and reconstructed velocity variables, such as the original and reconstructed east velocities. This action will overlay both sets of variable fields. By adjusting the opacity of the reconstructed velocity through the slider on the left sidebar, you can manage the visibility of one field while exposing the overlaid field. To see this process in action, you can refer to the provided demo video.
Note that the output data is stored only during the lifetime of your browser session. Once you refresh or close your browser, both the uploaded input data (if any) and the generated output files will be completely purged from the server.
You can download the output result using the Download button. Depending on your selection of a single or multi-file dataset (refer to Multi-File Datasets), the output file you download can be one of the following:
If your input consists of a single dataset (either as a single input file or multiple files representing a single dataset), the output result will be a single .nc
file.
If your input files represent multiple separate datasets (refer to Multiple Separate Dataset, Each within a File), a distinct output file with a .nc
format is generated for each input file (or URL). These output files are named similarly to their corresponding input files. All of these files are then bundled into a .zip
file. When you download your results, you will receive this zip file.
During the generation of output results, the output NetCDF files are temporarily hosted on our THREDDS data server for the duration of your browser session. As such, you have the option to view the THREDDS catalog page corresponding to your data. The catalog page contains useful information about your output file, allowing you to explore its contents before downloading. You can use services such as OPENDAP to explore the variables, NetcdfSubset to subset your data, and Godiva3 as an alternative visualization tool.
To access the THREDDS catalog of your data, navigate to the dropdown menu on the right corner of the Download button and select View THREDDS Catalog.
Every user's session is assigned a distinct ID, represented by a 32-character string. This ID serves as the name for the output files and can also be useful for reporting any bugs or issues. You can locate your unique identifier in the About section below.
To conveniently save and retrieve your configuration settings within your session, our application offers to save and load the form contents. By clicking on the Save Form button, you can store all the settings you have entered into a file named FormData.json
. This file captures your configuration and can be downloaded for future use. In subsequent visits to the web page, you can easily restore your previous settings by uploading the FormData.json
file using the Load Form button. Examples of form files for various tasks can be found in the Produce Output File section within the tutorials provided below.
The following concise tutorial offers guidance on reading the output files using the NetCDF library in both Python and MATLAB. If you require general information on utilizing the NetCDF library, you can consult the NetCDF API in Python or the NetCDF Files tutorial in MATLAB.
To install the NetCDF package in Python, you have two options:
Using conda
to install the NetCDF package distributed on Anaconda Cloud:
$ conda install -c conda-forge netcdf4
Using pip
to install the NetCDF package distributed on PyPI:
$ sudo pip install netCDF4
For MATLAB users, the NetCDF library is already included in the MATLAB installation.
To begin, you can utilize the form files provided below, each of which includes preset settings for a specific task:
Download the desired form and load it into the interface using the Load Form button. This action pre-configures the interface settings, but you can adjust them as needed. Once the settings are ready, scroll to the bottom of the interface and click on the Restore button to initiate the computation. Once the computation is complete, use the Download button to retrieve the output results. The resulting output file will be saved under the name output-dataset.nc
.
Below are brief demonstrations on how to read the output file and view its variables using both Python and MATLAB. Keep in mind that the list of output variables in the code examples below may differ based on the specific example you selected.
```python
# Import the netCDF4 package
>>> import netCDF4

# Create a NetCDF object from the data
>>> nc = netCDF4.Dataset('output-dataset.nc')

# Print the list of all dimensions in the file
>>> nc.dimensions.keys()
dict_keys(['time', 'lon', 'lat'])

# Print the list of all variables in the file
>>> nc.variables.keys()
dict_keys(['time', 'lon', 'lat', 'mask', 'east_vel', 'north_vel'])

# Assigning nc objects to variables
>>> t = nc.variables['time']
>>> lon = nc.variables['lon']
>>> lat = nc.variables['lat']
>>> u = nc.variables['east_vel']
>>> v = nc.variables['north_vel']
```
```matlab
>> % Display NetCDF file information
>> ncdisp('output-dataset.nc')

>> % Alternative way of getting a summary of file information
>> info = ncinfo('output-dataset.nc');

>> % Get dimensions name
>> info.Dimensions

>> % Get variables info
>> info.Variables

>> % Assigning nc objects to variables
>> t = ncread('output-dataset.nc', 'time');
>> lon = ncread('output-dataset.nc', 'lon');
>> lat = ncread('output-dataset.nc', 'lat');
>> u = ncread('output-dataset.nc', 'east_vel');
>> v = ncread('output-dataset.nc', 'north_vel');
```
The NetCDF dimensions and variables in the above code are further detailed below.
The output dataset contains some or all of the following NetCDF dimensions, depending on the configuration:
Name | Size | Description |
---|---|---|
ensemble | \(s+1\), where \(s\) is the number of (Monte-Carlo) samples. | The ensemble index of arrays. This dimension is present in the output file only when the ensemble generation option is enabled. |
time | The count of time points extracted from the input dataset within the designated time interval. If the ensemble generation option is activated or the Single time tab is chosen, the size is one. | The time index of arrays. |
lon | The count of longitude points extracted from the input dataset within the specified subdomain bounds. | The longitude index of arrays. |
lat | The count of latitude points extracted from the input dataset within the specified subdomain bounds. | The latitude index of arrays. |
The output dataset contains some or all of the following NetCDF variables, depending on the configuration:
Variable | Name and Dimensions | Notes |
---|---|---|
Time | time[time] | This comprises the time points extracted from the input dataset within the specified time span. |
Longitude | lon[lon] | This comprises the longitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds. |
Latitude | lat[lat] | This comprises the latitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds. |
Mask | mask[lat, lon] | This variable represents the segmentation of the domain into ocean, land, known velocity, and unknown velocity domains. |
Reconstructed East Velocity | east_vel[time, lat, lon] | If the ensemble generation option is enabled, this variable represents the mean of east_vel_ensemble over the ensemble dimension, with a time dimension size of one. |
Reconstructed North Velocity | north_vel[time, lat, lon] | If the ensemble generation option is enabled, this variable represents the mean of north_vel_ensemble over the ensemble dimension, with a time dimension size of one. |
East Velocity Error | east_err[time, lat, lon] | This variable represents the standard deviation of east_vel_ensemble over the ensemble dimension, with a time dimension size of one. This variable is present in the output dataset only if the ensemble generation option is enabled. |
North Velocity Error | north_err[time, lat, lon] | This variable represents the standard deviation of north_vel_ensemble over the ensemble dimension, with a time dimension size of one. This variable is present in the output dataset only if the ensemble generation option is enabled. |
East Velocity Ensemble | east_vel_ensemble[ensemble, lat, lon] | This variable is present in the output dataset only if the ensemble generation option is enabled. |
North Velocity Ensemble | north_vel_ensemble[ensemble, lat, lon] | This variable is present in the output dataset only if the ensemble generation option is enabled. |
The indexing for the variables listed in the above table is presented for Python. It's important to note that the indexing of NetCDF variables in MATLAB follows a reverse ordering. For instance, a variable like east_vel[time, lat, lon]
in Python should be interpreted as east_vel(lon, lat, time)
in MATLAB.
If you find RestoreIO valuable for your research, we kindly ask that you recognize the original contributions by citing the following references:
[1] Ameli, S. and Shadden, S. C. (2019). A transport method for restoring incomplete ocean current measurements. Journal of Geophysical Research: Oceans, 124, 227-242. doi: https://doi.org/10.1029/2018JC014254.

[2] Ameli, S. and Shadden, S. C. (2023). Stochastic Modeling of HF Radar Data for Uncertainty Quantification and Gap Filling. arXiv: 2206.09976 [physics.ao-ph].

[3] RestoreIO: An online computational service to restore incomplete oceanographic datasets. Available at https://restoreio.org.
We are delighted to introduce TraceFlows, a companion tool to RestoreIO that offers high-performance computational services for Lagrangian analysis of geophysical fluid flows, with a primary focus on ocean surface currents. TraceFlows is designed to be seamlessly compatible with RestoreIO outputs, allowing users to preprocess their data and reconstruct missing spatial gaps with RestoreIO before transitioning to TraceFlows for in-depth Lagrangian analysis.
You can access TraceFlows at https://traceflows.org.
Issues and comments may be addressed to ude.yelekreb@ilemas. Please provide the Session ID within your email to better identify issues. Your current session ID is:
This material is based upon work supported by the National Science Foundation grant No. 1520825.