Restore Incomplete Oceanographic Data

Restoring Missing Coverages of HF Radar Dataset



User Guide


RestoreIO is an online computational service to restore incomplete oceanographic datasets, with a specific focus on ocean surface velocity data. This gateway can also generate data ensembles and perform statistical analysis, which allows uncertainty quantification of such datasets.

This user guide offers both a quick overview of the application usage and more in-depth details. We recommend starting with the Getting Started section and then acquainting yourself with the interface detailed in the following sections.

Getting Started

Network Connection

When accessing our online service, your web browser communicates with our computing server through ports 3000 to 3010. However, firewalls on certain internet connections, such as public WiFi networks (including eduroam), may block these ports. In the event of a connection failure, you will receive a notification message. To ensure uninterrupted access to our service, we recommend switching to an alternative network connection.

Demo Video

To get a quick overview of the interface, we recommend watching the short demo video.

Quick Start Using a Sample Dataset

To quickly familiarize yourself with the application, you can follow these steps to run a minimalistic example on a sample dataset:

  1. Click on the dropdown menu located on the right side of the Enter URL text box (note that for this example, you don't need to provide any URL manually). From the menu, choose an example dataset that you would like to use. For instance, you can select Martha's Vineyard 2014. This selection will automatically fill the text box with the dataset's URL and populate the rest of the form with example settings that align with this dataset.
  2. Scroll to the end of the form and click on the Restore button. Allow a few moments for the computation to complete.
  3. Once the computation finishes, you can click on the Download button to save the output file.
  4. To visualize the result, click on the Visualize button. This will direct you to the map section. Within the map interface, you will find a bottom panel bar with a play button. Click on the play button to initiate the animation and explore the visual representation of the results.

Preparing Input Data

This section offers comprehensive guidance on preparing your datasets to ensure their compatibility as input for RestoreIO.

File Format

The input dataset can consist of one or multiple files, which should adhere to the following formats:

  1. NetCDF file format with file extensions .nc, .nc4, .ncd, or .nc.gz.
  2. NcML file format with file extensions .ncml or .ncml.gz. For more information on NcML files, see Single Dataset Stored Across Multiple Files.

Note that it is also acceptable to provide a NetCDF file without a file extension.

Best Practice for NetCDF Files

It is highly recommended to save your data file in NetCDF4 format instead of NetCDF3. For more information, refer to NetCDF4 Package documentation in Python or NetCDF Files in MATLAB.

Also, you may follow the best practice of preparing a NetCDF file, which involves passing the CF 1.8 compliance test using the online CF-convention compliance checker. In particular, this entails adding the standard_name attribute to your variables (see the Required NetCDF Variables section below). For reference, you can consult the comprehensive list of CF compliance standard names.
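
For instance, the following minimal Python sketch adds the standard_name attribute to velocity variables using the netCDF4 package (the file and variable names here are hypothetical):

# Add CF standard_name attributes to existing velocity variables.
# The file name and variable names below are hypothetical.
import netCDF4

nc = netCDF4.Dataset('my-dataset.nc', 'r+')   # open for in-place editing
nc.variables['u'].standard_name = 'surface_eastward_sea_water_velocity'
nc.variables['v'].standard_name = 'surface_northward_sea_water_velocity'
nc.close()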

Required NetCDF Variables

An input NetCDF file to our application should include all the variables listed in the table below. To ensure proper detection by our application, each variable should carry at least one of the attributes standard_name or name, with a value listed in the table. Note that the (standard) names are matched in a case-insensitive manner.

Variable: Time
  Acceptable standard names: time
  Acceptable names: t, time, datetime

Variable: Longitude
  Acceptable standard names: longitude
  Acceptable names: lon, long, longitude

Variable: Latitude
  Acceptable standard names: latitude
  Acceptable names: lat, latitude

Variable: Ocean's Surface East Velocity
  Acceptable standard names:
    surface_eastward_sea_water_velocity
    eastward_sea_water_velocity
    surface_geostrophic_eastward_sea_water_velocity
    surface_geostrophic_sea_water_x_velocity
    surface_geostrophic_eastward_sea_water_velocity_assuming_sea_level_for_geoid
    surface_eastward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_sea_water_x_velocity_assuming_mean_sea_level_for_geoid
    surface_geostrophic_sea_water_x_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_eastward_sea_water_velocity_assuming_mean_sea_level_for_geoid
    sea_water_x_velocity
    x_sea_water_velocity
  Acceptable names: east_vel, eastward_vel, u, ugos, east_velocity, eastward_velocity, u-velocity

Variable: Ocean's Surface North Velocity
  Acceptable standard names:
    surface_northward_sea_water_velocity
    northward_sea_water_velocity
    surface_geostrophic_northward_sea_water_velocity
    surface_geostrophic_sea_water_y_velocity
    surface_geostrophic_northward_sea_water_velocity_assuming_sea_level_for_geoid
    surface_northward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_sea_water_y_velocity_assuming_mean_sea_level_for_geoid
    surface_geostrophic_sea_water_y_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_northward_sea_water_velocity_assuming_mean_sea_level_for_geoid
    sea_water_y_velocity
    y_sea_water_velocity
  Acceptable names: north_vel, northward_vel, v, vgos, north_velocity, northward_velocity, v-velocity

Optional NetCDF Variables

Apart from the required variables mentioned above, you have the option to include the following additional variables in your input file. Note that there are no established standard names for these variables, so you should provide a name attribute according to the table. These variables are used exclusively for uncertainty quantification by generating data ensembles. For more details, you may refer to the Generate Ensemble section.

Variable: Ocean's Surface East Velocity Error
  Acceptable standard names: N/A
  Acceptable names: east_err, east_error

Variable: Ocean's Surface North Velocity Error
  Acceptable standard names: N/A
  Acceptable names: north_err, north_error

Variable: Geometric Dilution of Precision (East Component)
  Acceptable standard names: N/A
  Acceptable names: dopx, gdopx

Variable: Geometric Dilution of Precision (North Component)
  Acceptable standard names: N/A
  Acceptable names: dopy, gdopy

The following provides further details for each of the variables listed in the tables above.

1. Time Variable

The time variable should be a one-dimensional array with strictly increasing values.

Optional Attributes

Masking

Ensure that the time variable is not masked. If the _FillValue attribute is present, the variable will be read as masked; therefore, make sure this attribute is not set on the time variable.
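
As a quick sanity check in Python (a sketch; the file name is hypothetical), you can verify that the time variable is not read back as a masked array:

# Check whether the time variable is masked (it should not be)
import netCDF4
import numpy as np

nc = netCDF4.Dataset('my-dataset.nc')
t = nc.variables['time'][:]   # netCDF4 masks values if _FillValue is set

is_masked = np.ma.isMaskedArray(t) and bool(np.ma.getmaskarray(t).any())
print(is_masked)              # expect False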

2. Longitude and Latitude Variables

These variables should be one-dimensional arrays, each representing an axis of a rectilinear grid. The values in both longitude and latitude arrays should either strictly increase or strictly decrease. The units of the arrays should be degrees positive eastward (for longitude) and degrees positive northward (for latitude).

Data on Irregular Grids

Our application is designed to process data on rectilinear grids, which are represented by one-dimensional longitude and latitude arrays. However, if your data is on an irregular grid represented by two-dimensional longitude and latitude arrays, you can remap the data to a rectilinear grid using interpolation functions such as scipy.interpolate.griddata in Python or griddata in MATLAB.
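
For example, the following Python sketch remaps data from a toy irregular grid onto a rectilinear grid with scipy.interpolate.griddata (all arrays here are synthetic placeholders for your own coordinates and data):

import numpy as np
from scipy.interpolate import griddata

# Toy two-dimensional (irregular) coordinates and data, for illustration only
rng = np.random.default_rng(0)
lon2d = rng.uniform(-123.0, -121.0, size=(50, 60))
lat2d = rng.uniform(36.0, 37.0, size=(50, 60))
vel2d = np.sin(lon2d) * np.cos(lat2d)

# Target rectilinear grid, defined by one-dimensional longitude/latitude axes
lon1d = np.linspace(lon2d.min(), lon2d.max(), 120)
lat1d = np.linspace(lat2d.min(), lat2d.max(), 100)
lon_t, lat_t = np.meshgrid(lon1d, lat1d)

# Remap the scattered values onto the rectilinear grid
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
vel_rect = griddata(points, vel2d.ravel(), (lon_t, lat_t), method='linear')
print(vel_rect.shape)   # (100, 120)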

Masking

Ensure that the longitude and latitude variables are not masked. The presence of the _FillValue attribute, for example, will cause these variables to be masked. Therefore, make sure this attribute is not present for the longitude and latitude variables.

3. Ocean's Surface East and North Velocity Variables

Unit

There is no restriction on the physical unit of the velocity variables; however, they should be oriented positive eastward (for the east component) and positive northward (for the north component).

Array Dimensions

The east and north ocean's surface velocity variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a velocity variable, named east_vel for instance, is as follows (in Python's ordering):

east_vel[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.
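
For instance, here is a sketch of defining a velocity variable with this dimension order using the netCDF4 Python package (the file name, dimension sizes, and fill value are hypothetical):

# Create a velocity variable with dimensions ordered as (time, lat, lon)
import netCDF4

nc = netCDF4.Dataset('my-dataset.nc', 'w')
nc.createDimension('time', None)   # unlimited time dimension
nc.createDimension('lat', 100)
nc.createDimension('lon', 150)
east_vel = nc.createVariable('east_vel', 'f8', ('time', 'lat', 'lon'),
                             fill_value=-999.0)
east_vel.standard_name = 'surface_eastward_sea_water_velocity'
nc.close()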

Masking

In areas where the velocity is unknown (either due to being located on land or having incomplete data coverage), the velocity variable should be masked. Common NetCDF masking conventions include declaring a _FillValue (or missing_value) attribute and storing that fill value at the unknown points, or storing NaN at those points.
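
As an illustration, the following Python sketch applies fill-value masking (one common convention, not necessarily the only accepted method; the file and variable names are hypothetical):

# Mask unknown velocity entries using the variable's _FillValue
import netCDF4
import numpy as np

nc = netCDF4.Dataset('my-dataset.nc', 'r+')
east_vel = nc.variables['east_vel']   # assumed created with a _FillValue

data = east_vel[:]                    # read back as a numpy masked array
data = np.ma.masked_invalid(data)     # additionally mask NaN/Inf entries
east_vel[:] = data                    # masked entries are stored as _FillValue
nc.close()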

4. Ocean's Surface East and North Velocity Error Variables (Optional)

When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the east and north velocity error variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the Geometric Dilution of Precision Variables instead of the velocity error variables.

For further details, refer to Generate Ensemble section.

Unit

The velocity error variables should be expressed as non-negative values and use the same unit as the velocity variables, such as both being in meters per second. If your velocity error values are not in the same unit as the velocity variables (e.g., velocity in meters per second and velocity error in centimeters per second), you can convert the velocity error unit using the Scale velocity error entry within the interface. This scale factor is applied directly as a multiplier to the error variables in your files.

Array Dimensions

The east and north ocean's surface velocity error variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a velocity error variable, named east_err for instance, is as follows (in Python's ordering):

east_err[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.

Masking

Unlike the velocity variable, masking the velocity error variables is not mandatory. However, if you choose to apply masks to the velocity error variables, the same rules that apply to the velocity variable should also be followed for the velocity error variables.

5. Geometric Dilution of Precision Variables (Optional)

The Geometric Dilution of Precision (GDOP) is relevant to HF radar datasets, and it quantifies the effect of the geometric configuration of the HF radars on the uncertainty in velocity estimates. To gain a better understanding of the GDOP variables, we recommend referring to Section 2 of [2].

When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the Ocean's East and North Velocity Error Variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the GDOP variables instead of the velocity error variables.

For further details on the usage of GDOP variables, refer to Generate Ensemble section.

Set Scale Velocity Error Entry

When utilizing the GDOP variables instead of the velocity error variables, ensure to specify the Scale velocity error entry within the interface. This value should be set to the radial error of HF radars. The velocity error is then calculated as the product of this scale factor and the GDOP variables.

Unit

The GDOP variables should be expressed as non-negative values. The GDOP variables are dimensionless; however, when they are provided instead of the velocity errors, the Scale velocity error quantity should carry the same unit as your velocity variable.

Array Dimensions

The east and north GDOP variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a GDOP variable, named dopx for instance, is as follows (in Python's ordering):

dopx[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.

Masking

Unlike the velocity variable, masking the GDOP variables is not mandatory. However, if you choose to apply masks to the GDOP variables, the same rules that apply to the velocity variable should also be followed.

Providing Input Data

You can provide the input dataset in two different ways:

  1. Providing the OpenDap URL of a remote dataset.
  2. Uploading your data files.

To select the desired input method, simply use the tabs available in the Input Type section. Details of each method are described in the following.

1. Providing URL of Remote Data


You can provide an OpenDap URL of a remote dataset through the Enter URL text box under the OpenDap URL tab within the Input Type section of the interface.

Finding the OpenDap URL from THREDDS Catalogs

Many providers of geophysical data host their datasets on THREDDS Data servers, which offer OpenDap protocols. The following steps guide you to obtain the OpenDap URL of a remote dataset hosted on a THREDDS server. In the example below, we use a sample HF radar dataset hosted on our THREDDS server, available at https://transport.me.berkeley.edu/thredds.

  1. Open the THREDDS catalog page and navigate through the directory listing to your dataset.
  2. Click on the dataset to open its access page.
  3. Among the listed access services, select OPENDAP.
  4. Copy the address shown in the Data URL field; this is the URL to paste into the Enter URL text box.

For a visual demonstration of the steps described above, you may refer to the animated clip.

To help users explore the application, we have included the OpenDap URLs for two sample datasets in the Input Type section of the interface. Simply click on the drop-down menu located on the right side of the Enter URL text box, and choose one of the sample datasets. This selection will automatically populate the URL text box with the corresponding OpenDap URL.

2. Upload Files

You can upload your files under the Upload File tab from the Input Type entry. You can upload one or multiple files (see the Multi-File Datasets section below).

Token

To upload data files, you will need to provide an 8-character token string (refer to Enter Token within the interface). Tokens are required to prevent bots from uploading malicious files to the server. To obtain a token, please send a request to sameli@berkeley.edu. Tokens are available free of charge and offer unlimited uploads without any expiration date.

Multi-File Datasets

You have the option to provide multiple files. A multi-file dataset can arise in two scenarios:

1. Single Dataset Stored Across Multiple Files

If your dataset is divided into multiple files, where each file represents a distinct part of the data (e.g., different time frames), you can use the NetCDF Markup Language (NcML) to create an ncml file that aggregates all the individual NetCDF files into a single dataset. To provide this multi-file dataset, simply specify the URL of the NcML file. For detailed guidance on using NcML, you can consult the NcML Tutorial.

2. Multiple Separate Datasets, Each within a File

Alternatively, you may have several files, with each file representing an independent dataset. In this case, you can provide all these datasets simultaneously to the interface. An example of such multiple files could be an ensemble obtained from ocean models, where each file corresponds to a velocity ensemble.

The following steps guide you to provide multiple files.

1. Name Your Files with a Numeric Pattern

When providing multiple files, the name of your files (or the URLs) should include a numeric pattern. For instance, you can use the file name format like MyInputxxxxFile.nc where xxxx is the numeric pattern. An example of such data URLs where the pattern ranges from 0000 to 0020 could be:

https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0000File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0001File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0002File.nc
...
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0020File.nc
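
For illustration, such a zero-padded list of URLs can be generated programmatically; the snippet below is a hypothetical Python example:

# Generate zero-padded URLs for a hypothetical multi-file dataset
base = ('https://transport.me.berkeley.edu/thredds/dodsC/public/'
        'SomeDirectory/MyInput{:04d}File.nc')
urls = [base.format(i) for i in range(21)]   # indices 0000 through 0020
print(urls[0])                               # ...MyInput0000File.nc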

2. Provide Multiple Files

There are two methods for providing your multi-file data:

  1. Upload all of the files under the Upload File tab.
  2. Provide the URL of one of the files; the URLs of the remaining files are generated from the numeric pattern in the provided URL.

3. Enable Multi-File Processing

Uploading multiple files alone does not automatically enable processing of multiple files. To enable multi-file processing, you need to explicitly enable this feature within the application's interface (refer to Enable processing multiple files (separate datasets) option).

4. Provide File Iterator Range

Within the interface, enter the Minimum file index and Maximum file index to define the range of files to be processed. This allows the application to search through your uploaded files or generate new URLs based on the provided URL to access the other datasets.

For example, in the case of the URLs mentioned earlier, you can enter 0 as the minimum file index and 20 as the maximum file index within the interface. Alternatively, you can specify the full iterator pattern with the leading zeros as 0000 to 0020.

Scan Input Data

After providing the input data, it is recommended that you perform a scan of your dataset by clicking on Scan Data. This button performs a simple check on your data to make sure the required variables exist and are readable. In addition, it reads the time span and spatial extent of the data and fills the interface form with this information. This is often useful if you do not have a priori knowledge of the time and spatial extent of your data. In particular, the scan applies the following information to the form:

  1. The Initial time and Final time entries in the Time section are set to the dataset's time span.
  2. The west, east, south, and north bound entries in the Domain section are set to the dataset's spatial extent.

You might adjust these settings to a desired configuration.

Visualize Input Data

To visualize your input data, click on the Visualize button located in the input data section. This will display the east and north components of the velocity data on the map.

Please keep in mind that if your input dataset is hosted remotely and the remote data server does not allow Cross-Origin Resource Sharing (CORS), visualization of your dataset may be slower on our end, as our server will proxy the remote data server to access the data. If you experience slow rendering or encounter an error during visualization, you can skip visualization and proceed with the computation.

Time

In the Time section of the interface, you can specify the time span of the data to be processed. There are two options available under the Specify Time Span tabs: Single time and Time interval.

1. Single Time

Select this option to process a specific time within the input dataset. You can use the Time point entry to specify the desired time. If the chosen time does not exactly match any time stamp in your input data, the closest available time will be used for processing.

2. Time Interval

This option allows you to process a specific time interval within the input dataset. You can define the start and end times by providing the Initial time and Final time entries. If the specified times do not exactly match any time stamps in your input data, the closest available times will be used for processing.

Alternatively, you can select the Process whole dataset times option to process the entire time span within your input data.

Please note that the time interval option cannot be used if you have enabled generating ensemble (refer to Generate Ensemble).

Domain

In the Domain section of the interface, you have the option to specify a spatial subset of the data for processing. By choosing a subset of the domain, the output file will only contain the specified spatial region.

If you select the Use whole dataset bounds option, the entire spatial extent within your input data will be processed. However, please be aware that for large spatial datasets, this option might require a significant amount of time for processing. To optimize efficiency, we recommend subsetting your input dataset to a relevant segment that aligns with your analysis.

Domain Segmentation

The input dataset's grid comprises two distinct sets of points: locations with available velocity data and locations where velocity data is not provided. These regions are referred to as the known domain \(\Omega_k\) and the unknown domain \(\Omega_u\), respectively. Therefore, the complete grid of the input data \(\Omega\) can be decomposed into \(\Omega = \Omega_k \cup \Omega_u\).

The primary objective of data reconstruction is to fill the data gaps within the regions where velocity data is missing. The region of missing data, \(\Omega_m\), is part of the unknown domain \(\Omega_u\). However, the unknown domain contains additional points that are not necessarily missing, such as points located on land, denoted as \(\Omega_l\), or regions of the ocean that are not included in the dataset, which we denote as \(\Omega_o\).

Before proceeding with reconstructing the missing velocity data, it is essential to first identify the missing domain \(\Omega_m\). This involves segmenting the unknown domain \(\Omega_u\) into \(\Omega_u = \Omega_m \cup \Omega_l \cup \Omega_o\). These tasks require the knowledge of the ocean's domain and land domain. You can configure these steps within the interface, as described in Detect Data Domain and Detect Land below.

For detailed information on domain segmentation, we recommend referring to [1].

Detect Data Domain

By the data domain, \(\Omega_d\), we refer to the union of both the known domain \(\Omega_k\) and the missing domain \(\Omega_m\), namely, \(\Omega_d = \Omega_k \cup \Omega_m\). Once the missing velocity field is reconstructed, the combination of both the known and missing domains will become the data domain.

The purpose of the Detect Data Domain section of the interface is to identify \(\Omega_d\). This can be done in two ways:

1. Using Convex Hull

By selecting the Convex hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a convex hull around the known domain \(\Omega_k\). As such, any unknown point inside the convex hull is flagged as missing, and all points outside this convex hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).

2. Using Concave Hull

By selecting the Concave hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a concave hull around the known domain \(\Omega_k\). As such, any unknown point inside the concave hull is flagged as missing, and all points outside this concave hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).

Note that a concave hull (also known as an alpha shape) is not unique and is characterized by a radius parameter (refer to the Alpha shape radius (in Km) entry within the interface). The radius is the inverse of the \(\alpha\) parameter of alpha shapes. A smaller radius causes the concave hull to shrink more tightly toward the set of points it encompasses. Conversely, a larger radius yields a concave hull closer to the convex hull. We recommend setting the radius (in km) to a few multiples of the grid spacing. For instance, for an HF radar dataset with a 2 km resolution, where the grid points are spaced 2 km apart, a radius of approximately 10 km works well for most datasets.

We recommend choosing concave hull over convex hull as it can better identify the data domain within your input files, provided that the radius parameter is tuned appropriately.

Detect Land

In some cases, a part of the convex or concave hull might overlap with the land domain, leading to the mistaken flagging of such intersections as missing domains to be reconstructed. To avoid this issue, it is recommended to detect the land domain \(\Omega_l\) and exclude it from the data domain \(\Omega_d\) if there is any intersection. There are three options available within the interface regarding the treatment of the land domain:

The land boundaries are queried using the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG). For large datasets, we advise against using the third option, as using a high-accuracy map can significantly increase the processing time for detecting land. For most datasets, we recommend the second option, as it offers sufficient accuracy while remaining relatively fast.

Illustration of Domain Segmentation

[Figure: Domain segmentation of an HF radar dataset in the Monterey Bay region]

The figure above illustrates the domain segmentation for HF radar data in the region of Monterey Bay, as an example. In the left panel, the green domain represents the known area where velocity data is available, while the red domain signifies the region without velocity data. In the right panel, the missing domain is highlighted in red. This domain is determined by the red points from the left panel that fall within the concave hull around the green points. The points located outside the concave hull are considered non-data points, representing the ocean domain. The reconstruction of the velocity field is performed on the red points shown in the right panel.

Extend Data Domain to Coastline

If your dataset's data domain is close to land (e.g., in HF radar datasets spanning across coastlines), you can extend the data domain beyond the region identified by the convex or concave hulls, reaching up to the coastline. To achieve this, you can enable the Extend data restoration up to the coastline option within the interface.

[Figure: Extending the data domain up to the coastline]

By extending the data domain to the land, a zero boundary condition for the velocity field on the land is imposed. However, note that this assumption results in a less credible reconstructed field when the coastal gap is large.

The illustration above showcases the impact of activating this feature for the example of the Monterey Bay region. Notably, the alteration can be observed in the right panel, where the area between the data domain and the coastline is highlighted in red. This signifies that the gaps extending up to the coastline will be comprehensively reconstructed.

Refine Grid

With the Refine Grid entry in the interface, you can refine the dataset's grid by an integer factor along both the longitude and latitude axes. This process interpolates the data onto a finer grid. It is important to note that this refinement does not enhance the actual resolution of the data; it only increases the number of grid points.

We advise keeping the refinement level at the default value of 1, unless there's a specific reason to refine the grid size. Increasing the refinement level can significantly increase computation time and may not provide additional benefits in most cases.

Generate Ensemble

In addition to reconstructing missing data, RestoreIO offers the functionality to generate ensembles of the velocity vector field. The ensembles serve the purpose of uncertainty quantification, which can be valuable for various applications. For more details on the ensemble generation algorithm, you may refer to [2].

To generate a velocity ensemble, simply enable the Generate ensemble of velocity field for uncertainty quantification option within the interface. Note that ensembles can only be generated for a single time point. As a result, selecting this option will automatically activate the Single time tab in the Specify Time Span section.

Required Variables

To generate an ensemble, you should provide one of the following additional variables in your input file:

  1. The east and north velocity error variables (refer to Ocean's Surface East and North Velocity Error Variables).
  2. The east and north GDOP variables (refer to Geometric Dilution of Precision Variables).

If you choose to provide GDOP variables instead of the velocity error variables, the velocity errors are calculated from GDOP as follows:

$$ \begin{align} \sigma_e &= \sigma_r \mathrm{GDOP}_e, \\ \sigma_n &= \sigma_r \mathrm{GDOP}_n, \end{align} $$

where \(\sigma_e\) and \(\sigma_n\) are the east and north components of the velocity error, \(\mathrm{GDOP_e}\) and \(\mathrm{GDOP}_n\) are the east and north components of the GDOP, respectively, and \(\sigma_r\) is the radar's radial error. You can specify \(\sigma_r\) using the Scale velocity error entry within the interface (also refer to Scale Velocity Errors section below).
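
In code form, this conversion is just an elementwise scaling; below is a minimal numpy sketch (the radial error and GDOP values are toy numbers):

import numpy as np

sigma_r = 0.05                      # radial error, in the unit of the velocity
gdop_e = np.array([1.2, 1.5, 2.0])  # east GDOP (dimensionless), toy values
gdop_n = np.array([1.1, 1.4, 1.9])  # north GDOP (dimensionless), toy values

sigma_e = sigma_r * gdop_e          # east velocity error
sigma_n = sigma_r * gdop_n          # north velocity error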

Ensemble Generation Settings

The following settings can be set within the Generate Ensemble section of the interface.

Write Samples to Output

The Write samples to output option allows you to save the entire population of ensemble vector fields to the output file. If this option is not enabled, only the mean and standard deviation of the ensemble will be stored. For more details, please refer to the Output Variables section.

Number of (Monte-Carlo) Samples

The Number of (Monte-Carlo) samples entry within the interface enables you to specify the number of samples to be generated. This value should be greater than the number of velocity data points. Keep in mind that the processing time increases linearly with larger sample sizes.

Number of Eigen-Modes

To generate an ensemble, the eigenvalues and eigenvectors of the covariance matrix of the velocity data need to be computed. For a velocity data with \(n\) data points, this means the eigenvalues and eigenvectors of an \(n \times n\) matrix have to be calculated. However, such a computation has a complexity of \(\mathcal{O}(n^3)\), which can be infeasible for large datasets.

To handle this, we employ a practical approach where we only compute a reduced number of \(m\) eigenvalues and eigenvectors of the covariance matrix, where \(m\) can be much smaller than \(n\). This simplification reduces the complexity to \(\mathcal{O}(n m^2)\), which enables us to process larger datasets while maintaining a reasonable level of accuracy. For a better understanding of this concept, we refer the interested reader to Section 4 of [2].

The Number of eigen-modes (in percent) entry within the interface allows you to specify the number of eigenvectors of the data covariance to be utilized in the computations. The number of modes should be given as a percentage of the ratio \(m/n\).

Keep in mind that the processing time quadratically increases with the number of eigenmodes. We recommend setting this value to around 5% to 10% for most datasets.
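
To illustrate the idea of truncating the eigen-decomposition (a toy sketch, not the service's exact algorithm), the snippet below computes only the leading \(m\) eigenpairs of a covariance matrix with scipy:

import numpy as np
from scipy.sparse.linalg import eigsh

n = 500                      # number of velocity data points (toy value)
m = int(0.05 * n)            # use 5% of the eigen-modes, as recommended above

# A toy symmetric positive-definite "covariance" matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
cov = A @ A.T / n + np.eye(n)

# Compute only the m largest eigenpairs instead of all n of them
eigvals, eigvecs = eigsh(cov, k=m, which='LM')
print(eigvals.shape, eigvecs.shape)   # (25,) (500, 25)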

Kernel Width

The Kernel width entry within the interface represents the width of a spatial kernel used to construct the covariance matrix of the velocity data. The kernel width is measured in units of data points (grid spacings). For example, a kernel width of 5 on an HF radar dataset with a 2 km spatial resolution corresponds to a kernel width of 10 km.

It is assumed that spatial distances larger than the kernel width are uncorrelated. Therefore, reducing the kernel width makes the covariance matrix of the data more sparse, resulting in more efficient processing. However, a smaller kernel width may lead to information loss within the dataset. As a general recommendation, we suggest setting this value to 5 to 20 data points.
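
As a toy illustration of this sparsity (not the service's actual kernel), the snippet below builds a covariance in which correlations vanish beyond the kernel width, measured in data points:

import numpy as np
from scipy.sparse import csr_matrix

width = 5                                    # kernel width, in data points
idx = np.arange(200)                         # toy 1-D arrangement of points
dist = np.abs(idx[:, None] - idx[None, :])   # pairwise distance in grid points

# A compactly supported kernel: correlation decays with distance and
# vanishes entirely beyond the kernel width, making the matrix sparse
kernel = np.clip(1.0 - dist / width, 0.0, None)
cov = csr_matrix(kernel)
print(f'nonzero fraction: {cov.nnz / dist.size:.1%}')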

Scale Velocity Errors

The Scale velocity error entry serves two purposes:

  1. When the velocity error variables are provided, it acts as a unit conversion factor that is multiplied by the error values (refer to Ocean's Surface East and North Velocity Error Variables).
  2. When the GDOP variables are provided instead, it specifies the radar's radial error \(\sigma_r\), which multiplies the GDOP values to yield the velocity errors (refer to Geometric Dilution of Precision Variables).

Output Variables

The results of RestoreIO are stored in a NetCDF file with a .nc format. This file encompasses a selection of the following variables, contingent on the chosen configuration:

  1. Mask
  2. Reconstructed East and North Velocities
  3. East and North Velocity Errors
  4. East and North Velocity Ensemble

In addition to the detailed description of each variable provided below, you can explore a compilation of output variable names, along with their corresponding NetCDF dimensions, in the NetCDF Variables section available within the tutorials.

1. Mask

The mask variable is a three-dimensional array with dimensions for time, longitude, and latitude. This variable is stored under the name mask in the output file.

Interpreting Variable over Segmented Domains

The mask variable includes information about the result of domain segmentation (refer to Domain Segmentation section). This array contains integer values -1, 0, 1, and 2 that are interpreted as follows:

2. Reconstructed East and North Velocities

The reconstructed east and north velocity variables are stored in the output file under the names east_vel and north_vel, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Variable over Segmented Domains

The velocity variables on each of the segmented domains are defined as follows:

3. East and North Velocity Errors

If the Generate ensemble of velocity field for uncertainty quantification option is enabled within the interface, the east and north velocity error variables will be included in the output file under the names east_err and north_err, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Variable over Segmented Domains

The velocity error variables on each of the segmented domains are defined as follows:

4. East and North Velocity Ensemble

When you activate the Generate ensemble of velocity field for uncertainty quantification option, a collection of velocity field ensembles is created. However, by default, the output file contains only the mean and standard deviation of the ensemble. To incorporate all ensemble samples into the output file, you should additionally enable the Write samples to output option within the interface. This action saves the east and north velocity ensemble variables in the output file as east_vel_ensemble and north_vel_ensemble, respectively. These variables are four-dimensional arrays with dimensions for ensemble, time, longitude, and latitude.

Ensemble Dimension

The ensemble dimension of the array has the size \(s+1\) where \(s\) is the number of samples specified by Number of (Monte-Carlo) samples entry within the interface (also refer to Number of (Monte-Carlo) Samples section). The first ensemble with the index \(0\) (assuming zero-based numbering) corresponds to the original input dataset. The other samples with the indices \(1, \dots, s\) correspond to the generated samples.
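
For instance, the original data and the generated samples can be separated as follows (a Python sketch, assuming samples were written to the output):

import netCDF4

nc = netCDF4.Dataset('output-dataset.nc')
ens = nc.variables['east_vel_ensemble']
original = ens[0, ...]    # index 0: the original input data
samples = ens[1:, ...]    # indices 1 through s: generated Monte-Carlo samples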

Interpreting Variable over Segmented Domains

The velocity ensemble variables on each of the segmented domains are defined similarly to those presented for Reconstructed East and North Velocities. In particular, the missing domain of each ensemble member is reconstructed independently.

Mean and Standard Deviation of Ensemble

Note that the mean and standard deviation of the velocity ensemble arrays over the ensemble dimension yield the Reconstructed East and North Velocities and East and North Velocity Errors variables, respectively.
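
You can check this relationship from the output file in Python (a sketch, assuming the ensemble generation and Write samples to output options were enabled):

import netCDF4
import numpy as np

nc = netCDF4.Dataset('output-dataset.nc')
ens = nc.variables['east_vel_ensemble'][:]   # ensemble, time, lat, lon

# Mean and standard deviation over the ensemble axis should reproduce the
# reconstructed east velocity and the east velocity error, respectively
print(np.ma.allclose(ens.mean(axis=0), nc.variables['east_vel'][:]))
print(np.ma.allclose(ens.std(axis=0), nc.variables['east_err'][:]))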

Visualize Output Results

After the computation is complete, you can visualize the results by clicking on the Visualize button. This action loads an animated visualization of your results in the map section.

To gain more control over visualization, make use of the Show Workbench button located on the top-left corner of the visualization screen. This feature enables you to set specific color ranges and styles, tailoring the display to your preferences. If you find yourself working with large datasets on a slower internet connection, consider switching to a 2D map view (as opposed to the spherical globe) by clicking the Map button in the upper-right corner of the map screen. Additionally, to reset the view to the standard angle, simply click the circle at the center of the zoom buttons on the right panel of the screen.

Within the workbench, you have the ability to select variables for visualization according to your needs. These options include the original east and north velocities from your input dataset, the reconstructed east and north velocities, and the mask variable. Additionally, if you have enabled the generation of ensemble (refer to the Generate Ensemble section), you can also visualize the east and north velocity errors.

Interactively Compare Original and Reconstructed Datasets

Utilizing the workbench sidebar, you can also engage in an interactive comparison between input and reconstructed velocity datasets. To achieve this, choose two pairs of original and reconstructed velocity variables, such as the original and reconstructed east velocities. This action will overlay both sets of variable fields. By adjusting the opacity of the reconstructed velocity through the slider on the left sidebar, you can manage the visibility of one field while exposing the overlaid field. To see this process in action, you can refer to the provided demo video.

Download Output Results

Note that the output data is stored only during the lifetime of your browser session. Once you refresh or close your browser, both the uploaded input data (if any) and the generated output files will be completely purged from the server.

Download NetCDF File(s)

You can download the output result using the Download button. Depending on your selection of a single or multi-file dataset (refer to Multi-File Datasets), the output file you download can be one of the following:

1. Output for Single Dataset

If your input consists of a single dataset (either as a single input file or multiple files representing a single dataset), the output result will be a single .nc file.

2. Output for Multi-File Dataset

If your input files represent multiple separate datasets (refer to Multiple Separate Datasets, Each within a File), a distinct output file in .nc format is generated for each input file (or URL). These output files are named after their corresponding input files. All of these files are then bundled into a single .zip file, which you receive when downloading your results.

View THREDDS Catalog

During the generation of output results, the output NetCDF files are temporarily hosted on our THREDDS data server for the duration of your browser session. As such, you have the option to view the THREDDS catalog page corresponding to your data. The catalog page contains useful information about your output file, allowing you to explore its contents before downloading. You can use services such as OPENDAP to explore the variables, NetcdfSubset to subset your data, and Godiva3 as an alternative visualization tool.

To access the THREDDS catalog of your data, navigate to the dropdown menu on the right corner of the Download button and select View THREDDS Catalog.

Sessions

Session ID

Every user's session is assigned a distinct ID, represented by a 32-character string. This ID serves as both the name for the output files and can be useful for reporting any bugs or issues. You can locate your exclusive identifier in the About section below.

Save and Load Sessions

To conveniently save and retrieve your configuration settings within your session, our application offers to save and load the form contents. By clicking on the Save Form button, you can store all the settings you have entered into a file named FormData.json. This file captures your configuration and can be downloaded for future use. In subsequent visits to the web page, you can easily restore your previous settings by uploading the FormData.json file using the Load Form button. Examples of form files for various tasks can be found in the Produce Output File section within the tutorials provided below.

Tutorial


The following concise tutorial offers guidance on reading the output files using the NetCDF library in both Python and MATLAB. If you require general information on using the NetCDF library, you can consult the NetCDF API in Python or the NetCDF Files tutorial in MATLAB.

Install NetCDF

To install the NetCDF package in Python, you have two options:

  1. Install with pip: pip install netCDF4
  2. Install with conda: conda install -c conda-forge netCDF4

For MATLAB users, the NetCDF library is already included in the MATLAB installation.

Produce Output File

To begin, you can utilize the form files provided below, each of which includes preset settings for a specific task:

Download the desired form and load it into the interface using the Load Form button. This action pre-configures the interface settings, but you can adjust them as needed. Once the settings are ready, scroll to the bottom of the interface and click on the Restore button to initiate the computation. Once the computation is complete, use the Download button to retrieve the output results. The resulting output file will be saved under the name output-dataset.nc.

Read Output File

Below are brief demonstrations on how to read the output file and view its variables using both Python and MATLAB. Keep in mind that the list of output variables in the code examples below may differ based on the specific example you selected.

In Python:

# Import the netCDF4 package
>>> import netCDF4

# Create a NetCDF object from the data
>>> nc = netCDF4.Dataset('output-dataset.nc')

# Print the list of all dimensions in the file
>>> nc.dimensions.keys()
dict_keys(['time', 'lon', 'lat'])

# Print the list of all variables in the file
>>> nc.variables.keys()
dict_keys(['time', 'lon', 'lat', 'mask', 'east_vel', 'north_vel'])

# Assigning nc objects to variables
>>> t = nc.variables['time']
>>> lon = nc.variables['lon']
>>> lat = nc.variables['lat']
>>> u = nc.variables['east_vel']
>>> v = nc.variables['north_vel']
In MATLAB:

>> % Display NetCDF file information
>> ncdisp('output-dataset.nc')

>> % Alternative way of getting a summary of file information
>> info = ncinfo('output-dataset.nc');

>> % Get dimensions name
>> info.Dimensions

>> % Get variables info
>> info.Variables

>> % Assigning nc objects to variables
>> t = ncread('output-dataset.nc', 'time');
>> lon = ncread('output-dataset.nc', 'lon');
>> lat = ncread('output-dataset.nc', 'lat');
>> u = ncread('output-dataset.nc', 'east_vel');
>> v = ncread('output-dataset.nc', 'north_vel');

The NetCDF dimensions and variables in the above code are further detailed below.

Output NetCDF Dimensions

The output dataset contains some or all of the following NetCDF dimensions, depending on the configuration:

ensemble
  Size: \(s+1\), where \(s\) is the number of (Monte-Carlo) samples.
  Description: the ensemble index of arrays. This dimension is present in the output file only when the ensemble generation option is enabled.

time
  Size: the count of time points extracted from the input dataset within the designated time interval. If the ensemble generation option is activated or the Single time tab is chosen, the size is one.
  Description: the time index of arrays.

lon
  Size: the count of longitude points extracted from the input dataset within the specified subdomain bounds.
  Description: the longitude index of arrays.

lat
  Size: the count of latitude points extracted from the input dataset within the specified subdomain bounds.
  Description: the latitude index of arrays.

Output NetCDF Variables

The output dataset contains some or all of the following NetCDF variables, depending on the configuration:

Time: time[time]
  This comprises the time points extracted from the input dataset within the specified time span.

Longitude: lon[lon]
  This comprises the longitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds.

Latitude: lat[lat]
  This comprises the latitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds.

Mask: mask[lat, lon]
  This variable represents the segmentation of the domain into ocean, land, known velocity, and unknown velocity domains.

Reconstructed East Velocity: east_vel[time, lat, lon]
  If the ensemble generation option is enabled, this variable represents the mean of east_vel_ensemble over the ensemble dimension, with a time dimension size of one.

Reconstructed North Velocity: north_vel[time, lat, lon]
  If the ensemble generation option is enabled, this variable represents the mean of north_vel_ensemble over the ensemble dimension, with a time dimension size of one.

East Velocity Error: east_err[time, lat, lon]
  This variable represents the standard deviation of east_vel_ensemble over the ensemble dimension, with a time dimension size of one. It is present in the output dataset only if the ensemble generation option is enabled.

North Velocity Error: north_err[time, lat, lon]
  This variable represents the standard deviation of north_vel_ensemble over the ensemble dimension, with a time dimension size of one. It is present in the output dataset only if the ensemble generation option is enabled.

East Velocity Ensemble: east_vel_ensemble[ensemble, time, lat, lon]
  This variable is present in the output dataset only if the ensemble generation option is enabled.

North Velocity Ensemble: north_vel_ensemble[ensemble, time, lat, lon]
  This variable is present in the output dataset only if the ensemble generation option is enabled.

The indexing for the variables listed in the above table is presented for Python. It's important to note that the indexing of NetCDF variables in MATLAB follows a reverse ordering. For instance, a variable like east_vel[time, lat, lon] in Python should be interpreted as east_vel(lon, lat, time) in MATLAB.

Gallery

Comparison of Original and Restored Datasets





Martha's Vineyard 2014

Martha's Vineyard, MA


Comparison of original versus reconstructed velocity data for the HF radar dataset at Martha's Vineyard, MA, in July 2014.

Select the Visualize button and utilize the Opacity slider on the left sidebar of the map to dynamically compare the superimposed representations of the original and reconstructed data fields.

Monterey Bay

Monterey Bay, CA


Comparison of original versus reconstructed velocity data for the HF radar dataset in Monterey Bay, CA, in January 2017.

Select the Visualize button and utilize the Opacity slider on the left sidebar of the map to dynamically compare the superimposed representations of the original and reconstructed data fields.



About


Citations

If you find RestoreIO valuable for your research, we kindly ask that you recognize the original contributions by citing the following references:


[1] Ameli, S. and Shadden, S. C. (2019). A transport method for restoring incomplete ocean current measurements. Journal of Geophysical Research: Oceans, 124(1), 227-242. doi: 10.1029/2018JC014254
@article{https://doi.org/10.1029/2018JC014254,
    author  = {Ameli, Siavash and Shadden, Shawn C.},
    title   = {A Transport Method for Restoring Incomplete Ocean Current Measurements},
    journal = {Journal of Geophysical Research: Oceans},
    volume  = {124},
    number  = {1},
    pages   = {227-242},
    doi     = {https://doi.org/10.1029/2018JC014254},
    url     = {https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018JC014254},
    eprint  = {https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2018JC014254},
    year    = {2019}
}
[2] Ameli, S. and Shadden, S. C. (2023). Stochastic Modeling of HF Radar Data for Uncertainty Quantification and Gap Filling. arXiv: 2206.09976 [physics.ao-ph].
@misc{arxiv.2206.09976,
    doi           = {10.48550/ARXIV.2206.09976},
    title         = {Stochastic Modeling of HF Radar Data for Uncertainty Quantification and Gap Filling},
    author        = {Ameli, S. and Shadden, S. C.},
    year          = {2023},
    archivePrefix = {arXiv},
    eprint        = {2206.09976},
    primaryClass  = {physics.ao-ph},
    howpublished  = {\emph{arXiv}: 2206.09976 [physics.ao-ph]},
}
[3] RestoreIO - An online computational service to restore incomplete oceanographic datasets. Available at https://restoreio.org
@misc{restoreio,
    title = {RestoreIO - An online computational service to restore incomplete oceanographic datasets},
    howpublished = {Available at \url{https://restoreio.org}},
}

Companion Applications

We are delighted to introduce TraceFlows, a companion tool to RestoreIO that offers high-performance computational services for Lagrangian analysis of geophysical fluid flows, with a primary focus on ocean surface currents. TraceFlows is designed to be seamlessly compatible with RestoreIO outputs, allowing users to preprocess their data and reconstruct missing spatial gaps with RestoreIO before transitioning to TraceFlows for in-depth Lagrangian analysis.

You can access TraceFlows at https://traceflows.org.


Reporting Issues

Issues and comments may be addressed to sameli@berkeley.edu. Please include your Session ID in the email to help us better identify the issue. Your current session ID is:

Session ID:

Privacy Policy


Acknowledgement

This material is based upon work supported by the National Science Foundation grant No. 1520825.