Restore Incomplete Oceanographic Data

Restoring Missing Coverages of HF Radar Dataset



User Guide


RestoreIO is an online computational service to restore incomplete oceanographic datasets, with a specific focus on ocean surface velocity data. This gateway can also generate data ensembles and perform statistical analysis, which allows uncertainty quantification of such datasets.

This user guide offers both a quick overview of the application usage and more in-depth details. We recommend starting with the Getting Started section and then acquainting yourself with the interface detailed in the following sections.

Getting Started

Network Connection

When accessing our online service, your web browser communicates with our computing server through ports 3000 to 3010. However, firewalls on certain internet connections, such as public WiFi networks (including eduroam), may block these ports. In the event of a connection failure, you will receive a notification message. To ensure uninterrupted access to our service, we recommend switching to an alternative network connection.

Demo Video

To get a quick overview of the interface, we recommend watching the short demo video.

Quick Start Using a Sample Dataset

To quickly familiarize yourself with the application, you can follow these steps to run a minimalistic example on a sample dataset:

  1. Click on the dropdown menu located on the right side of the Enter URL text box (note that for this example, you don't need to provide any URL manually). From the menu, choose an example dataset that you would like to use. For instance, you can select Martha's Vineyard 2014. This selection will automatically fill the text box with the dataset's URL and populate the rest of the form with example settings that align with this dataset.
  2. Scroll to the end of the form and click on the Restore button. Allow a few moments for the computation to complete.
  3. Once the computation finishes, you can click on the Download button to save the output file.
  4. To visualize the result, click on the Visualize button. This will direct you to the map section. Within the map interface, you will find a bottom panel bar with a play button. Click on the play button to initiate the animation and explore the visual representation of the results.

Preparing Input Data

This section offers comprehensive guidance on preparing your datasets to ensure their compatibility as input for RestoreIO.

File Format

The input dataset can consist of one or multiple files, which should adhere to the following formats:

  1. NetCDF file format with file extensions .nc, .nc4, .ncd, or .nc.gz.
  2. NcML file format with file extensions .ncml or .ncml.gz. For more information on NcML files, see Single Dataset Stored Across Multiple Files.

Note that it is also acceptable to provide a NetCDF file without a file extension.

Best Practice for NetCDF Files

It is highly recommended to save your data file in NetCDF4 format instead of NetCDF3. For more information, refer to NetCDF4 Package documentation in Python or NetCDF Files in MATLAB.

Also, you may follow the best practice of preparing a NetCDF file, which involves passing the CF 1.8 compliance test using the online CF-convention compliance checker. In particular, this entails adding the standard_name attribute to your variables (see the Required NetCDF Variables section below). For reference, you can consult the comprehensive list of CF compliance standard names.
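
For instance, the following minimal Python sketch adds the standard_name attribute to velocity variables using the netCDF4 package (the file and variable names here are hypothetical):

# Add CF standard_name attributes to existing velocity variables.
# The file name and variable names below are hypothetical.
import netCDF4

nc = netCDF4.Dataset('my-dataset.nc', 'r+')   # open for in-place editing
nc.variables['u'].standard_name = 'surface_eastward_sea_water_velocity'
nc.variables['v'].standard_name = 'surface_northward_sea_water_velocity'
nc.close()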

Required NetCDF Variables

An input NetCDF file to our application should include all the variables listed in the table below. To ensure proper detection by our application, each variable should carry at least one of the attributes standard_name or name, with a value listed in the table. Note that the (standard) names are matched in a case-insensitive manner.

Variable: Time
  Acceptable standard names: time
  Acceptable names: t, time, datetime

Variable: Longitude
  Acceptable standard names: longitude
  Acceptable names: lon, long, longitude

Variable: Latitude
  Acceptable standard names: latitude
  Acceptable names: lat, latitude

Variable: Ocean's Surface East Velocity
  Acceptable standard names:
    surface_eastward_sea_water_velocity
    eastward_sea_water_velocity
    surface_geostrophic_eastward_sea_water_velocity
    surface_geostrophic_sea_water_x_velocity
    surface_geostrophic_eastward_sea_water_velocity_assuming_sea_level_for_geoid
    surface_eastward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_sea_water_x_velocity_assuming_mean_sea_level_for_geoid
    surface_geostrophic_sea_water_x_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_eastward_sea_water_velocity_assuming_mean_sea_level_for_geoid
    sea_water_x_velocity
    x_sea_water_velocity
  Acceptable names: east_vel, eastward_vel, u, ugos, east_velocity, eastward_velocity, u-velocity

Variable: Ocean's Surface North Velocity
  Acceptable standard names:
    surface_northward_sea_water_velocity
    northward_sea_water_velocity
    surface_geostrophic_northward_sea_water_velocity
    surface_geostrophic_sea_water_y_velocity
    surface_geostrophic_northward_sea_water_velocity_assuming_sea_level_for_geoid
    surface_northward_geostrophic_sea_water_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_sea_water_y_velocity_assuming_mean_sea_level_for_geoid
    surface_geostrophic_sea_water_y_velocity_assuming_sea_level_for_geoid
    surface_geostrophic_northward_sea_water_velocity_assuming_mean_sea_level_for_geoid
    sea_water_y_velocity
    y_sea_water_velocity
  Acceptable names: north_vel, northward_vel, v, vgos, north_velocity, northward_velocity, v-velocity

Optional NetCDF Variables

Apart from the required variables mentioned above, you have the option to include the following additional variables in your input file. Note that there are no established standard names for these variables, so you should provide a name attribute according to the table. These variables are used exclusively for uncertainty quantification by generating data ensembles. For more details, you may refer to the Generate Ensemble section.

Variable: Ocean's Surface East Velocity Error
  Acceptable standard names: N/A
  Acceptable names: east_err, east_error

Variable: Ocean's Surface North Velocity Error
  Acceptable standard names: N/A
  Acceptable names: north_err, north_error

Variable: Geometric Dilution of Precision (East Component)
  Acceptable standard names: N/A
  Acceptable names: dopx, gdopx

Variable: Geometric Dilution of Precision (North Component)
  Acceptable standard names: N/A
  Acceptable names: dopy, gdopy

The following provides further details for each of the variables listed in the tables above.

1. Time Variable

The time variable should be a one-dimensional array with strictly increasing values.

Optional Attributes

Masking

Ensure that the time variable is not masked. If the _FillValue attribute is present, the variable will be read as masked; therefore, make sure this attribute is not set on the time variable.
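
As a quick sanity check in Python (a sketch; the file name is hypothetical), you can verify that the time variable is not read back as a masked array:

# Check whether the time variable is masked (it should not be)
import netCDF4
import numpy as np

nc = netCDF4.Dataset('my-dataset.nc')
t = nc.variables['time'][:]   # netCDF4 masks values if _FillValue is set

is_masked = np.ma.isMaskedArray(t) and bool(np.ma.getmaskarray(t).any())
print(is_masked)              # expect False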

2. Longitude and Latitude Variables

These variables should be one-dimensional arrays, each representing an axis of a rectilinear grid. The values in both longitude and latitude arrays should either strictly increase or strictly decrease. The units of the arrays should be degrees positive eastward (for longitude) and degrees positive northward (for latitude).

Data on Irregular Grids

Our application is designed to process data on rectilinear grids, which are represented by one-dimensional longitude and latitude arrays. However, if your data is on an irregular grid represented by two-dimensional longitude and latitude arrays, you can remap the data to a rectilinear grid using interpolation functions such as scipy.interpolate.griddata in Python or griddata in MATLAB.
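
For example, the following Python sketch remaps data from a toy irregular grid onto a rectilinear grid with scipy.interpolate.griddata (all arrays here are synthetic placeholders for your own coordinates and data):

import numpy as np
from scipy.interpolate import griddata

# Toy two-dimensional (irregular) coordinates and data, for illustration only
rng = np.random.default_rng(0)
lon2d = rng.uniform(-123.0, -121.0, size=(50, 60))
lat2d = rng.uniform(36.0, 37.0, size=(50, 60))
vel2d = np.sin(lon2d) * np.cos(lat2d)

# Target rectilinear grid, defined by one-dimensional longitude/latitude axes
lon1d = np.linspace(lon2d.min(), lon2d.max(), 120)
lat1d = np.linspace(lat2d.min(), lat2d.max(), 100)
lon_t, lat_t = np.meshgrid(lon1d, lat1d)

# Remap the scattered values onto the rectilinear grid
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])
vel_rect = griddata(points, vel2d.ravel(), (lon_t, lat_t), method='linear')
print(vel_rect.shape)   # (100, 120)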

Masking

Ensure that the longitude and latitude variables are not masked. The presence of the _FillValue attribute, for example, will cause these variables to be masked. Therefore, make sure this attribute is not present for the longitude and latitude variables.

3. Ocean's Surface East and North Velocity Variables

Unit

There is no restriction on the physical unit of the velocity variables; however, they should be oriented positive eastward (for the east component) and positive northward (for the north component).

Array Dimensions

The east and north ocean's surface velocity variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a velocity variable, named east_vel for instance, is as follows (in Python's ordering):

east_vel[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.
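
For instance, here is a sketch of defining a velocity variable with this dimension order using the netCDF4 Python package (the file name, dimension sizes, and fill value are hypothetical):

# Create a velocity variable with dimensions ordered as (time, lat, lon)
import netCDF4

nc = netCDF4.Dataset('my-dataset.nc', 'w')
nc.createDimension('time', None)   # unlimited time dimension
nc.createDimension('lat', 100)
nc.createDimension('lon', 150)
east_vel = nc.createVariable('east_vel', 'f8', ('time', 'lat', 'lon'),
                             fill_value=-999.0)
east_vel.standard_name = 'surface_eastward_sea_water_velocity'
nc.close()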

Masking

In areas where the velocity is unknown (either due to being located on land or having incomplete data coverage), the velocity variable should be masked. Common NetCDF masking conventions include declaring a _FillValue (or missing_value) attribute and storing that fill value at the unknown points, or storing NaN at those points.
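
As an illustration, the following Python sketch applies fill-value masking (one common convention, not necessarily the only accepted method; the file and variable names are hypothetical):

# Mask unknown velocity entries using the variable's _FillValue
import netCDF4
import numpy as np

nc = netCDF4.Dataset('my-dataset.nc', 'r+')
east_vel = nc.variables['east_vel']   # assumed created with a _FillValue

data = east_vel[:]                    # read back as a numpy masked array
data = np.ma.masked_invalid(data)     # additionally mask NaN/Inf entries
east_vel[:] = data                    # masked entries are stored as _FillValue
nc.close()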

4. Ocean's Surface East and North Velocity Error Variables (Optional)

When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the east and north velocity error variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the Geometric Dilution of Precision Variables instead of the velocity error variables.

For further details, refer to Generate Ensemble section.

Unit

The velocity error variables should be expressed as non-negative values and use the same unit as the velocity variables, such as both being in meters per second. If your velocity error values are not in the same unit as the velocity variables (e.g., velocity in meters per second and velocity error in centimeters per second), you can convert the velocity error unit using the Scale velocity error entry within the interface. This scale factor is applied directly as a multiplier to the error variables in your files.

Array Dimensions

The east and north ocean's surface velocity error variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a velocity error variable, named east_err for instance, is as follows (in Python's ordering):

east_err[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.

Masking

Unlike the velocity variable, masking the velocity error variables is not mandatory. However, if you choose to apply masks to the velocity error variables, the same rules that apply to the velocity variable should also be followed for the velocity error variables.

5. Geometric Dilution of Precision Variables (Optional)

The Geometric Dilution of Precision (GDOP) is relevant to HF radar datasets, and it quantifies the effect of the geometric configuration of the HF radars on the uncertainty in velocity estimates. To gain a better understanding of the GDOP variables, we recommend referring to Section 2 of [2].

When you select the Generate ensemble of velocity field for uncertainty quantification option within the interface, the Ocean's East and North Velocity Error Variables are used. However, for uncertainty quantification purposes, you have the alternative option of providing the GDOP variables instead of the velocity error variables.

For further details on the usage of GDOP variables, refer to Generate Ensemble section.

Set Scale Velocity Error Entry

When utilizing the GDOP variables instead of the velocity error variables, ensure to specify the Scale velocity error entry within the interface. This value should be set to the radial error of HF radars. The velocity error is then calculated as the product of this scale factor and the GDOP variables.

Unit

The GDOP variables should be expressed as non-negative values. The GDOP variables are dimensionless; however, when they are provided instead of the velocity errors, the Scale velocity error quantity should carry the same unit as your velocity variable.

Array Dimensions

The east and north GDOP variables should be three-dimensional arrays that include dimensions for time, longitude, and latitude. However, you can also provide four-dimensional arrays, where an additional dimension represents depth. In the latter case, only the first index of the depth dimension (representing the surface at near zero depth) will be read from these variables.

Dimensions Order

The order of dimensions for a GDOP variable, named dopx for instance, is as follows (in Python's ordering):

dopx[time, lat, lon]

Note that the order of dimensions in MATLAB is reversed compared to Python.

Masking

Unlike the velocity variable, masking the GDOP variables is not mandatory. However, if you choose to apply masks to the GDOP variables, the same rules that apply to the velocity variable should also be followed.

Providing Input Data

You can provide the input dataset in two different ways:

  1. Providing the OpenDap URL of a remote dataset.
  2. Uploading your data files.

To select the desired input method, simply use the tabs available in the Input Type section. Details of each method are described in the following.

1. Providing URL of Remote Data


You can provide an OpenDap URL of a remote dataset through the Enter URL text box under the OpenDap URL tab within the Input Type section of the interface.

Finding the OpenDap URL from THREDDS Catalogs

Many providers of geophysical data host their datasets on THREDDS Data servers, which offer OpenDap protocols. The following steps guide you to obtain the OpenDap URL of a remote dataset hosted on a THREDDS server. In the example below, we use a sample HF radar dataset hosted on our THREDDS server, available at https://transport.me.berkeley.edu/thredds.

  1. Open the THREDDS catalog page and navigate through the directory listing to your dataset.
  2. Click on the dataset to open its access page.
  3. Among the listed access services, select OPENDAP.
  4. Copy the address shown in the Data URL field; this is the URL to paste into the Enter URL text box.

For a visual demonstration of the steps described above, you may refer to the animated clip.

To help users explore the application, we have included the OpenDap URLs for two sample datasets in the Input Type section of the interface. Simply click on the drop-down menu located on the right side of the Enter URL text box, and choose one of the sample datasets. This selection will automatically populate the URL text box with the corresponding OpenDap URL.

2. Upload Files

You can upload your files under the Upload File tab from the Input Type entry. You can upload one or multiple files (see the Multi-File Datasets section below).

Token

To upload data files, you will need to provide an 8-character token string (refer to Enter Token within the interface). Tokens are required to prevent bots from uploading malicious files to the server. To obtain a token, please send a request to sameli@berkeley.edu. Tokens are available free of charge and offer unlimited uploads without any expiration date.

Multi-File Datasets

You have the option to provide multiple files. A multi-file dataset can arise in two scenarios:

1. Single Dataset Stored Across Multiple Files

If your dataset is divided into multiple files, where each file represents a distinct part of the data (e.g., different time frames), you can use the NetCDF Markup Language (NcML) to create an ncml file that aggregates all the individual NetCDF files into a single dataset. To provide this multi-file dataset, simply specify the URL of the NcML file. For detailed guidance on using NcML, you can consult the NcML Tutorial.

2. Multiple Separate Datasets, Each within a File

Alternatively, you may have several files, with each file representing an independent dataset. In this case, you can provide all these datasets simultaneously to the interface. An example of such multiple files could be an ensemble obtained from ocean models, where each file corresponds to a velocity ensemble.

The following steps guide you to provide multiple files.

1. Name Your Files with a Numeric Pattern

When providing multiple files, the name of your files (or the URLs) should include a numeric pattern. For instance, you can use the file name format like MyInputxxxxFile.nc where xxxx is the numeric pattern. An example of such data URLs where the pattern ranges from 0000 to 0020 could be:

https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0000File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0001File.nc
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0002File.nc
...
https://transport.me.berkeley.edu/thredds/dodsC/public/SomeDirectory/MyInput0020File.nc
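
For illustration, such a zero-padded list of URLs can be generated programmatically; the snippet below is a hypothetical Python example:

# Generate zero-padded URLs for a hypothetical multi-file dataset
base = ('https://transport.me.berkeley.edu/thredds/dodsC/public/'
        'SomeDirectory/MyInput{:04d}File.nc')
urls = [base.format(i) for i in range(21)]   # indices 0000 through 0020
print(urls[0])                               # ...MyInput0000File.nc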

2. Provide Multiple Files

There are two methods for providing your multi-file data:

  1. Upload all of the files under the Upload File tab.
  2. Provide the URL of one of the files; the URLs of the remaining files are generated from the numeric pattern in the provided URL.

3. Enable Multi-File Processing

Uploading multiple files alone does not automatically enable processing of multiple files. To enable multi-file processing, you need to explicitly enable this feature within the application's interface (refer to Enable processing multiple files (separate datasets) option).

4. Provide File Iterator Range

Within the interface, enter the Minimum file index and Maximum file index to define the range of files to be processed. This allows the application to search through your uploaded files or generate new URLs based on the provided URL to access the other datasets.

For example, in the case of the URLs mentioned earlier, you can enter 0 as the minimum file index and 20 as the maximum file index within the interface. Alternatively, you can specify the full iterator pattern with the leading zeros as 0000 to 0020.

Scan Input Data

After providing the input data, it is recommended that you perform a scan of your dataset by clicking on Scan Data. This button performs a simple check on your data to make sure the required variables exist and are readable. In addition, it reads the time span and spatial extent of the data and fills the interface form with this information. This is often useful if you do not have a priori knowledge of the time and spatial extent of your data. In particular, the scan applies the following information to the form:

  1. The Initial time and Final time entries in the Time section are set to the dataset's time span.
  2. The west, east, south, and north bound entries in the Domain section are set to the dataset's spatial extent.

You might adjust these settings to a desired configuration.

Visualize Input Data

To visualize your input data, click on the Visualize button located in the input data section. This will display the east and north components of the velocity data on the map.

Please keep in mind that if your input dataset is hosted remotely and the remote data server does not allow Cross-Origin Resource Sharing (CORS), visualization of your dataset may be slower on our end, as our server will proxy the remote data server to access the data. If you experience slow rendering or encounter an error during visualization, you can skip visualization and proceed with the computation.

Time

In the Time section of the interface, you can specify the time span of the data to be processed. There are two options available under the Specify Time Span tabs: Single time and Time interval.

1. Single Time

Select this option to process a specific time within the input dataset. You can use the Time point entry to specify the desired time. If the chosen time does not exactly match any time stamp in your input data, the closest available time will be used for processing.

2. Time Interval

This option allows you to process a specific time interval within the input dataset. You can define the start and end times by providing the Initial time and Final time entries. If the specified times do not exactly match any time stamps in your input data, the closest available times will be used for processing.

Alternatively, you can select the Process whole dataset times option to process the entire time span within your input data.

Please note that the time interval option cannot be used if you have enabled generating ensemble (refer to Generate Ensemble).

Domain

In the Domain section of the interface, you have the option to specify a spatial subset of the data for processing. By choosing a subset of the domain, the output file will only contain the specified spatial region.

If you select the Use whole dataset bounds option, the entire spatial extent within your input data will be processed. However, please be aware that for large spatial datasets, this option might require a significant amount of time for processing. To optimize efficiency, we recommend subsetting your input dataset to a relevant segment that aligns with your analysis.

Domain Segmentation

The input dataset's grid comprises two distinct sets of points: locations with available velocity data and locations where velocity data is not provided. These regions are referred to as the known domain \(\Omega_k\) and the unknown domain \(\Omega_u\), respectively. Therefore, the complete grid of the input data \(\Omega\) can be decomposed into \(\Omega = \Omega_k \cup \Omega_u\).

The primary objective of data reconstruction is to fill the data gaps within the regions where velocity data is missing. The region of missing data, \(\Omega_m\), is part of the unknown domain \(\Omega_u\). However, the unknown domain contains additional points that are not necessarily missing, such as points located on land, denoted as \(\Omega_l\), or regions of the ocean that are not included in the dataset, which we denote as \(\Omega_o\).

Before proceeding with reconstructing the missing velocity data, it is essential to first identify the missing domain \(\Omega_m\). This involves segmenting the unknown domain \(\Omega_u\) into \(\Omega_u = \Omega_m \cup \Omega_l \cup \Omega_o\). These tasks require the knowledge of the ocean's domain and land domain. You can configure these steps within the interface, as described in Detect Data Domain and Detect Land below.

For detailed information on domain segmentation, we recommend referring to [1].

Detect Data Domain

By the data domain, \(\Omega_d\), we refer to the union of both the known domain \(\Omega_k\) and the missing domain \(\Omega_m\), namely, \(\Omega_d = \Omega_k \cup \Omega_m\). Once the missing velocity field is reconstructed, the combination of both the known and missing domains will become the data domain.

The purpose of the Detect Data Domain section of the interface is to identify \(\Omega_d\). This can be done in two ways:

1. Using Convex Hull

By selecting the Convex hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a convex hull around the known domain \(\Omega_k\). As such, any unknown point inside the convex hull is flagged as missing, and all points outside this convex hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).

2. Using Concave Hull

By selecting the Concave hull around available points option, the data domain \(\Omega_d\) is defined as the region enclosed by a concave hull around the known domain \(\Omega_k\). As such, any unknown point inside the concave hull is flagged as missing, and all points outside this concave hull are considered as part of the ocean domain \(\Omega_o\) or land \(\Omega_l\).

Note that a concave hull (also known as an alpha shape) is not unique and is characterized by a radius parameter (refer to the Alpha shape radius (in Km) entry within the interface). The radius is the inverse of the \(\alpha\) parameter of alpha shapes. A smaller radius causes the concave hull to shrink more tightly toward the set of points it encompasses. Conversely, a larger radius yields a concave hull closer to the convex hull. We recommend setting the radius (in km) to a few multiples of the grid spacing. For instance, for an HF radar dataset with a 2 km resolution, where the grid points are spaced 2 km apart, a radius of approximately 10 km works well for most datasets.

We recommend choosing concave hull over convex hull as it can better identify the data domain within your input files, provided that the radius parameter is tuned appropriately.

Detect Land

In some cases, a part of the convex or concave hull might overlap with the land domain, leading to the mistaken flagging of such intersections as missing domains to be reconstructed. To avoid this issue, it is recommended to detect the land domain \(\Omega_l\) and exclude it from the data domain \(\Omega_d\) if there is any intersection. There are three options available within the interface regarding the treatment of the land domain:

The land boundaries are queried using the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG). For large datasets, we advise against using the third option, as using a high-accuracy map can significantly increase the processing time for detecting land. For most datasets, we recommend the second option, as it offers sufficient accuracy while remaining relatively fast.

Illustration of Domain Segmentation

[Figure: Domain segmentation of an HF radar dataset in the Monterey Bay region]

The figure above illustrates the domain segmentation for HF radar data in the region of Monterey Bay, as an example. In the left panel, the green domain represents the known area where velocity data is available, while the red domain signifies the region without velocity data. In the right panel, the missing domain is highlighted in red. This domain is determined by the red points from the left panel that fall within the concave hull around the green points. The points located outside the concave hull are considered non-data points, representing the ocean domain. The reconstruction of the velocity field is performed on the red points shown in the right panel.

Extend Data Domain to Coastline

If your dataset's data domain is close to land (e.g., in HF radar datasets spanning across coastlines), you can extend the data domain beyond the region identified by the convex or concave hulls, reaching up to the coastline. To achieve this, you can enable the Extend data restoration up to the coastline option within the interface.

[Figure: Extending the data domain up to the coastline]

By extending the data domain to the land, a zero boundary condition for the velocity field on the land is imposed. However, note that this assumption results in a less credible reconstructed field when the coastal gap is large.

The illustration above showcases the impact of activating this feature for the example of the Monterey Bay region. Notably, the alteration can be observed in the right panel, where the area between the data domain and the coastline is highlighted in red. This signifies that the gaps extending up to the coastline will be comprehensively reconstructed.

Refine Grid

With the Refine Grid entry in the interface, you can refine the dataset's grid by an integer factor along both the longitude and latitude axes. This process interpolates the data onto a finer grid. It is important to note that this refinement does not enhance the actual resolution of the data; it only increases the number of grid points.

We advise keeping the refinement level at the default value of 1, unless there's a specific reason to refine the grid size. Increasing the refinement level can significantly increase computation time and may not provide additional benefits in most cases.

Generate Ensemble

In addition to reconstructing missing data, RestoreIO offers the functionality to generate ensembles of the velocity vector field. The ensembles serve the purpose of uncertainty quantification, which can be valuable for various applications. For more details on the ensemble generation algorithm, you may refer to [2].

To generate a velocity ensemble, simply enable the Generate ensemble of velocity field for uncertainty quantification option within the interface. Note that ensembles can only be generated for a single time point. As a result, selecting this option will automatically activate the Single time tab in the Specify Time Span section.

Required Variables

To generate an ensemble, you should provide one of the following additional variables in your input file:

  1. The east and north velocity error variables (refer to Ocean's Surface East and North Velocity Error Variables).
  2. The east and north GDOP variables (refer to Geometric Dilution of Precision Variables).

If you choose to provide GDOP variables instead of the velocity error variables, the velocity errors are calculated from GDOP as follows:

$$ \begin{align} \sigma_e &= \sigma_r \mathrm{GDOP}_e, \\ \sigma_n &= \sigma_r \mathrm{GDOP}_n, \end{align} $$

where \(\sigma_e\) and \(\sigma_n\) are the east and north components of the velocity error, \(\mathrm{GDOP_e}\) and \(\mathrm{GDOP}_n\) are the east and north components of the GDOP, respectively, and \(\sigma_r\) is the radar's radial error. You can specify \(\sigma_r\) using the Scale velocity error entry within the interface (also refer to Scale Velocity Errors section below).
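
In code form, this conversion is just an elementwise scaling; below is a minimal numpy sketch (the radial error and GDOP values are toy numbers):

import numpy as np

sigma_r = 0.05                      # radial error, in the unit of the velocity
gdop_e = np.array([1.2, 1.5, 2.0])  # east GDOP (dimensionless), toy values
gdop_n = np.array([1.1, 1.4, 1.9])  # north GDOP (dimensionless), toy values

sigma_e = sigma_r * gdop_e          # east velocity error
sigma_n = sigma_r * gdop_n          # north velocity error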

Ensemble Generation Settings

The following settings can be set within the Generate Ensemble section of the interface.

Write Samples to Output

The Write samples to output option allows you to save the entire population of ensemble vector fields to the output file. If this option is not enabled, only the mean and standard deviation of the ensemble will be stored. For more details, please refer to the Output Variables section.

Number of (Monte-Carlo) Samples

The Number of (Monte-Carlo) samples entry within the interface enables you to specify the number of samples to be generated. This value should be greater than the number of velocity data points. Keep in mind that the processing time increases linearly with larger sample sizes.

Number of Eigen-Modes

To generate an ensemble, the eigenvalues and eigenvectors of the covariance matrix of the velocity data need to be computed. For a velocity data with \(n\) data points, this means the eigenvalues and eigenvectors of an \(n \times n\) matrix have to be calculated. However, such a computation has a complexity of \(\mathcal{O}(n^3)\), which can be infeasible for large datasets.

To handle this, we employ a practical approach where we only compute a reduced number of \(m\) eigenvalues and eigenvectors of the covariance matrix, where \(m\) can be much smaller than \(n\). This simplification reduces the complexity to \(\mathcal{O}(n m^2)\), which enables us to process larger datasets while maintaining a reasonable level of accuracy. For a better understanding of this concept, we refer the interested reader to Section 4 of [2].

The Number of eigen-modes (in percent) entry within the interface allows you to specify the number of eigenvectors of the data covariance to be utilized in the computations. The number of modes should be given as a percentage of the ratio \(m/n\).

Keep in mind that the processing time quadratically increases with the number of eigenmodes. We recommend setting this value to around 5% to 10% for most datasets.
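
To illustrate the idea of truncating the eigen-decomposition (a toy sketch, not the service's exact algorithm), the snippet below computes only the leading \(m\) eigenpairs of a covariance matrix with scipy:

import numpy as np
from scipy.sparse.linalg import eigsh

n = 500                      # number of velocity data points (toy value)
m = int(0.05 * n)            # use 5% of the eigen-modes, as recommended above

# A toy symmetric positive-definite "covariance" matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
cov = A @ A.T / n + np.eye(n)

# Compute only the m largest eigenpairs instead of all n of them
eigvals, eigvecs = eigsh(cov, k=m, which='LM')
print(eigvals.shape, eigvecs.shape)   # (25,) (500, 25)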

Kernel Width

The Kernel width entry within the interface represents the width of a spatial kernel used to construct the covariance matrix of the velocity data. The kernel width is measured in units of data points (grid spacings). For example, a kernel width of 5 on an HF radar dataset with a 2 km spatial resolution corresponds to a kernel width of 10 km.

It is assumed that spatial distances larger than the kernel width are uncorrelated. Therefore, reducing the kernel width makes the covariance matrix of the data more sparse, resulting in more efficient processing. However, a smaller kernel width may lead to information loss within the dataset. As a general recommendation, we suggest setting this value to 5 to 20 data points.
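
As a toy illustration of this sparsity (not the service's actual kernel), the snippet below builds a covariance in which correlations vanish beyond the kernel width, measured in data points:

import numpy as np
from scipy.sparse import csr_matrix

width = 5                                    # kernel width, in data points
idx = np.arange(200)                         # toy 1-D arrangement of points
dist = np.abs(idx[:, None] - idx[None, :])   # pairwise distance in grid points

# A compactly supported kernel: correlation decays with distance and
# vanishes entirely beyond the kernel width, making the matrix sparse
kernel = np.clip(1.0 - dist / width, 0.0, None)
cov = csr_matrix(kernel)
print(f'nonzero fraction: {cov.nnz / dist.size:.1%}')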

Scale Velocity Errors

The Scale velocity error entry serves two purposes:

  1. When the velocity error variables are provided, it acts as a unit conversion factor that is multiplied by the error values (refer to Ocean's Surface East and North Velocity Error Variables).
  2. When the GDOP variables are provided instead, it specifies the radar's radial error \(\sigma_r\), which multiplies the GDOP values to yield the velocity errors (refer to Geometric Dilution of Precision Variables).

Output Variables

The results of RestoreIO are stored in a NetCDF file with a .nc format. This file encompasses a selection of the following variables, contingent on the chosen configuration:

  1. Mask
  2. Reconstructed East and North Velocities
  3. East and North Velocity Errors
  4. East and North Velocity Ensemble

In addition to the detailed description of each variable provided below, you can explore a compilation of output variable names, along with their corresponding NetCDF dimensions, in the NetCDF Variables section available within the tutorials.

1. Mask

The mask variable is a three-dimensional array with dimensions for time, longitude, and latitude. This variable is stored under the name mask in the output file.

Interpreting Variable over Segmented Domains

The mask variable includes information about the result of domain segmentation (refer to Domain Segmentation section). This array contains integer values -1, 0, 1, and 2 that are interpreted as follows:

2. Reconstructed East and North Velocities

The reconstructed east and north velocity variables are stored in the output file under the names east_vel and north_vel, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Variable over Segmented Domains

The velocity variables on each of the segmented domains are defined as follows:

3. East and North Velocity Errors

If the Generate ensemble of velocity field for uncertainty quantification option is enabled within the interface, the east and north velocity error variables will be included in the output file under the names east_err and north_err, respectively. These variables are three-dimensional arrays with dimensions for time, longitude, and latitude.

Interpreting Variable over Segmented Domains

The velocity error variables on each of the segmented domains are defined as follows:

4. East and North Velocity Ensemble

When you activate the Generate ensemble of velocity field for uncertainty quantification option, a collection of velocity field ensembles is created. However, by default, the output file contains only the mean and standard deviation of the ensemble. To incorporate all ensemble samples into the output file, you should additionally enable the Write samples to output option within the interface. This action saves the east and north velocity ensemble variables in the output file as east_vel_ensemble and north_vel_ensemble, respectively. These variables are four-dimensional arrays with dimensions for ensemble, time, longitude, and latitude.

Ensemble Dimension

The ensemble dimension of the array has the size \(s+1\) where \(s\) is the number of samples specified by Number of (Monte-Carlo) samples entry within the interface (also refer to Number of (Monte-Carlo) Samples section). The first ensemble with the index \(0\) (assuming zero-based numbering) corresponds to the original input dataset. The other samples with the indices \(1, \dots, s\) correspond to the generated samples.
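
For instance, the original data and the generated samples can be separated as follows (a Python sketch, assuming samples were written to the output):

import netCDF4

nc = netCDF4.Dataset('output-dataset.nc')
ens = nc.variables['east_vel_ensemble']
original = ens[0, ...]    # index 0: the original input data
samples = ens[1:, ...]    # indices 1 through s: generated Monte-Carlo samples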

Interpreting Variable over Segmented Domains

The velocity ensemble variables on each of the segmented domains are defined similarly to those presented for Reconstructed East and North Velocities. In particular, the missing domain of each ensemble member is reconstructed independently.

Mean and Standard Deviation of Ensemble

Note that the mean and standard deviation of the velocity ensemble arrays over the ensemble dimension yield the Reconstructed East and North Velocities and East and North Velocity Errors variables, respectively.
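
You can check this relationship from the output file in Python (a sketch, assuming the ensemble generation and Write samples to output options were enabled):

import netCDF4
import numpy as np

nc = netCDF4.Dataset('output-dataset.nc')
ens = nc.variables['east_vel_ensemble'][:]   # ensemble, time, lat, lon

# Mean and standard deviation over the ensemble axis should reproduce the
# reconstructed east velocity and the east velocity error, respectively
print(np.ma.allclose(ens.mean(axis=0), nc.variables['east_vel'][:]))
print(np.ma.allclose(ens.std(axis=0), nc.variables['east_err'][:]))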

Visualize Output Results

After the computation is complete, you can visualize the results by clicking on the Visualize button. This action loads an animated visualization of your results in the map section.

To gain more control over visualization, make use of the Show Workbench button located on the top-left corner of the visualization screen. This feature enables you to set specific color ranges and styles, tailoring the display to your preferences. If you find yourself working with large datasets on a slower internet connection, consider switching to a 2D map view (as opposed to the spherical globe) by clicking the Map button in the upper-right corner of the map screen. Additionally, to reset the view to the standard angle, simply click the circle at the center of the zoom buttons on the right panel of the screen.

Within the workbench, you have the ability to select variables for visualization according to your needs. These options include the original east and north velocities from your input dataset, the reconstructed east and north velocities, and the mask variable. Additionally, if you have enabled the generation of ensemble (refer to the Generate Ensemble section), you can also visualize the east and north velocity errors.

Interactively Compare Original and Reconstructed Datasets

Utilizing the workbench sidebar, you can also engage in an interactive comparison between input and reconstructed velocity datasets. To achieve this, choose two pairs of original and reconstructed velocity variables, such as the original and reconstructed east velocities. This action will overlay both sets of variable fields. By adjusting the opacity of the reconstructed velocity through the slider on the left sidebar, you can manage the visibility of one field while exposing the overlaid field. To see this process in action, you can refer to the provided demo video.

Download Output Results

Note that the output data is stored only during the lifetime of your browser session. Once you refresh or close your browser, both the uploaded input data (if any) and the generated output files will be completely purged from the server.

Download NetCDF File(s)

You can download the output result using the Download button. Depending on your selection of a single or multi-file dataset (refer to Multi-File Datasets), the output file you download can be one of the following:

1. Output for Single Dataset

If your input consists of a single dataset (either as a single input file or multiple files representing a single dataset), the output result will be a single .nc file.

2. Output for Multi-File Dataset

If your input files represent multiple separate datasets (refer to Multiple Separate Datasets, Each within a File), a distinct output file in .nc format is generated for each input file (or URL). These output files are named after their corresponding input files. All of these files are then bundled into a single .zip file, which you receive when downloading your results.

View THREDDS Catalog

During the generation of output results, the output NetCDF files are temporarily hosted on our THREDDS data server for the duration of your browser session. As such, you have the option to view the THREDDS catalog page corresponding to your data. The catalog page contains useful information about your output file, allowing you to explore its contents before downloading. You can use services such as OPENDAP to explore the variables, NetcdfSubset to subset your data, and Godiva3 as an alternative visualization tool.

To access the THREDDS catalog of your data, navigate to the dropdown menu on the right corner of the Download button and select View THREDDS Catalog.

Sessions

Session ID

Every user's session is assigned a distinct ID, represented by a 32-character string. This ID serves as both the name for the output files and can be useful for reporting any bugs or issues. You can locate your exclusive identifier in the About section below.

Save and Load Sessions

To conveniently save and retrieve your configuration settings within your session, our application offers to save and load the form contents. By clicking on the Save Form button, you can store all the settings you have entered into a file named FormData.json. This file captures your configuration and can be downloaded for future use. In subsequent visits to the web page, you can easily restore your previous settings by uploading the FormData.json file using the Load Form button. Examples of form files for various tasks can be found in the Produce Output File section within the tutorials provided below.

Tutorial


The following concise tutorial offers guidance on reading the output files using the NetCDF library in both Python and MATLAB. If you require general information on using the NetCDF library, you can consult the NetCDF API in Python or the NetCDF Files tutorial in MATLAB.

Install NetCDF

To install the NetCDF package in Python, you have two options:

  1. Install with pip: pip install netCDF4
  2. Install with conda: conda install -c conda-forge netCDF4

For MATLAB users, the NetCDF library is already included in the MATLAB installation.

Produce Output File

To begin, you can utilize the form files provided below, each of which includes preset settings for a specific task:

Download the desired form and load it into the interface using the Load Form button. This action pre-configures the interface settings, but you can adjust them as needed. Once the settings are ready, scroll to the bottom of the interface and click on the Restore button to initiate the computation. Once the computation is complete, use the Download button to retrieve the output results. The resulting output file will be saved under the name output-dataset.nc.

Read Output File

Below are brief demonstrations on how to read the output file and view its variables using both Python and MATLAB. Keep in mind that the list of output variables in the code examples below may differ based on the specific example you selected.

In Python:

# Import the netCDF4 package
>>> import netCDF4

# Create a NetCDF object from the data
>>> nc = netCDF4.Dataset('output-dataset.nc')

# Print the list of all dimensions in the file
>>> nc.dimensions.keys()
dict_keys(['time', 'lon', 'lat'])

# Print the list of all variables in the file
>>> nc.variables.keys()
dict_keys(['time', 'lon', 'lat', 'mask', 'east_vel', 'north_vel'])

# Assigning nc objects to variables
>>> t = nc.variables['time']
>>> lon = nc.variables['lon']
>>> lat = nc.variables['lat']
>>> u = nc.variables['east_vel']
>>> v = nc.variables['north_vel']
In MATLAB:

>> % Display NetCDF file information
>> ncdisp('output-dataset.nc')

>> % Alternative way of getting a summary of file information
>> info = ncinfo('output-dataset.nc');

>> % Get dimensions name
>> info.Dimensions

>> % Get variables info
>> info.Variables

>> % Assigning nc objects to variables
>> t = ncread('output-dataset.nc', 'time');
>> lon = ncread('output-dataset.nc', 'lon');
>> lat = ncread('output-dataset.nc', 'lat');
>> u = ncread('output-dataset.nc', 'east_vel');
>> v = ncread('output-dataset.nc', 'north_vel');

The NetCDF dimensions and variables in the above code are further detailed below.

Output NetCDF Dimensions

The output dataset contains some or all of the following NetCDF dimensions, depending on the configuration:

ensemble
  Size: \(s+1\), where \(s\) is the number of (Monte-Carlo) samples.
  Description: the ensemble index of arrays. This dimension is present in the output file only when the ensemble generation option is enabled.

time
  Size: the count of time points extracted from the input dataset within the designated time interval. If the ensemble generation option is activated or the Single time tab is chosen, the size is one.
  Description: the time index of arrays.

lon
  Size: the count of longitude points extracted from the input dataset within the specified subdomain bounds.
  Description: the longitude index of arrays.

lat
  Size: the count of latitude points extracted from the input dataset within the specified subdomain bounds.
  Description: the latitude index of arrays.

Output NetCDF Variables

The output dataset contains some or all of the following NetCDF variables, depending on the configuration:

Time: time[time]
  This comprises the time points extracted from the input dataset within the specified time span.

Longitude: lon[lon]
  This comprises the longitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds.

Latitude: lat[lat]
  This comprises the latitudes extracted from the input dataset's rectangular grid, within the specified subdomain bounds.

Mask: mask[lat, lon]
  This variable represents the segmentation of the domain into ocean, land, known velocity, and unknown velocity domains.

Reconstructed East Velocity: east_vel[time, lat, lon]
  If the ensemble generation option is enabled, this variable represents the mean of east_vel_ensemble over the ensemble dimension, with a time dimension size of one.

Reconstructed North Velocity: north_vel[time, lat, lon]
  If the ensemble generation option is enabled, this variable represents the mean of north_vel_ensemble over the ensemble dimension, with a time dimension size of one.

East Velocity Error: east_err[time, lat, lon]
  This variable represents the standard deviation of east_vel_ensemble over the ensemble dimension, with a time dimension size of one. It is present in the output dataset only if the ensemble generation option is enabled.

North Velocity Error: north_err[time, lat, lon]
  This variable represents the standard deviation of north_vel_ensemble over the ensemble dimension, with a time dimension size of one. It is present in the output dataset only if the ensemble generation option is enabled.

East Velocity Ensemble: east_vel_ensemble[ensemble, time, lat, lon]
  This variable is present in the output dataset only if the ensemble generation option is enabled.

North Velocity Ensemble: north_vel_ensemble[ensemble, time, lat, lon]
  This variable is present in the output dataset only if the ensemble generation option is enabled.

The indexing for the variables listed in the above table is presented for Python. It's important to note that the indexing of NetCDF variables in MATLAB follows a reverse ordering. For instance, a variable like east_vel[time, lat, lon] in Python should be interpreted as east_vel(lon, lat, time) in MATLAB.

Gallery

Comparison of Original and Restored Datasets





Martha's Vineyard 2014

Martha's Vineyard, MA


Comparison of original versus reconstructed velocity data for the HF radar dataset at Martha's Vineyard, MA, in July 2014.

Select the Visualize button and utilize the Opacity slider on the left sidebar of the map to dynamically compare the superimposed representations of the original and reconstructed data fields.

Monterey Bay

Monterey Bay, CA


Comparison of original versus reconstructed velocity data for the HF radar dataset in Monterey Bay, CA, in January 2017.

Select the Visualize button and utilize the Opacity slider on the left sidebar of the map to dynamically compare the superimposed representations of the original and reconstructed data fields.



About


Citations

If you find RestoreIO valuable for your research, we kindly ask that you recognize the original contributions by citing the following references:


[1] Ameli, S. and Shadden, S. C. (2019). A transport method for restoring incomplete ocean current measurements. Journal of Geophysical Research: Oceans, 124(1), 227-242. doi: 10.1029/2018JC014254
@article{https://doi.org/10.1029/2018JC014254,
    author  = {Ameli, Siavash and Shadden, Shawn C.},
    title   = {A Transport Method for Restoring Incomplete Ocean Current Measurements},
    journal = {Journal of Geophysical Research: Oceans},
    volume  = {124},
    number  = {1},
    pages   = {227-242},
    doi     = {https://doi.org/10.1029/2018JC014254},
    url     = {https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018JC014254},
    eprint  = {https://agupubs.onlinelibrary.wiley.com/doi/pdf/10.1029/2018JC014254},
    year    = {2019}
}
[2] Ameli, S. and Shadden, S. C. (2023). Stochastic Modeling of HF Radar Data for Uncertainty Quantification and Gap Filling. arXiv: 2206.09976 [physics.ao-ph].
@misc{arxiv.2206.09976,
    doi           = {10.48550/ARXIV.2206.09976},
    title         = {Stochastic Modeling of HF Radar Data for Uncertainty Quantification and Gap Filling},
    author        = {Ameli, S. and Shadden, S. C.},
    year          = {2023},
    archivePrefix = {arXiv},
    eprint        = {2206.09976},
    primaryClass  = {physics.ao-ph},
    howpublished  = {\emph{arXiv}: 2206.09976 [physics.ao-ph]},
}
[3] RestoreIO - An online computational service to restore incomplete oceanographic datasets. Available at https://restoreio.org
@misc{restoreio,
    title = {RestoreIO - An online computational service to restore incomplete oceanographic datasets},
    howpublished = {Available at \url{https://restoreio.org}},
}

Companion Applications

We are delighted to introduce TraceFlows, a companion tool to RestoreIO that offers high-performance computational services for Lagrangian analysis of geophysical fluid flows, with a primary focus on ocean surface currents. TraceFlows is designed to be seamlessly compatible with RestoreIO outputs, allowing users to preprocess their data and reconstruct missing spatial gaps with RestoreIO before transitioning to TraceFlows for in-depth Lagrangian analysis.

You can access TraceFlows at https://traceflows.org.


Reporting Issues

Issues and comments may be addressed to sameli@berkeley.edu. Please include your Session ID in the email to help us better identify the issue. Your current session ID is:

Session ID:

Privacy Policy


Acknowledgement

This material is based upon work supported by the National Science Foundation grant No. 1520825.