Areas of Interest driven by Cloud Raster Format and webtools in the ESRI portal

Context

IMPORTANT NOTE: This documentation was written using ArcGIS Pro version 2.8.3 and ArcGIS Enterprise Portal version 10.9 (May 2021 release).

Creating a CRF

Check the document on CRF creation and storage for detailed steps. Before starting, make sure you are NOT building pyramids; this can add hours to the processing. Your Pro installation may be set to do this automatically. You can turn it off in Options (accessed by clicking on the Project tab in the top left corner). If you don't want to make the change through the Options, simply make sure you untick the Build pyramids option when running a geoprocessing service.

In summary, the geoprocessing steps are:

  • Create Mosaic Dataset (creates a container inside the geodatabase to hold the rasters)
  • Add Rasters To Mosaic Dataset (adds the rasters to the container)
  • Calculate Field (creates a unique id field)
  • Build Multidimensional Info (adds dimension and variable information)
  • Table To Table (optional; creates a lookup table matching the unique id and the raster name)
  • Copy Raster (creates the CRF with all the previous information; this is the long step)

(This notebook has the sequence of geoprocessing commands in Python.)
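For orientation, here is a minimal Python sketch of that sequence, in the spirit of the notebook. All names, paths and the coordinate system are assumptions to adapt to your own data; the Build Multidimensional Info call reuses the encroachment example shown below.

import arcpy

gdb = r"C:\data\cubes.gdb"          # hypothetical geodatabase
md = gdb + r"\land_encroachment"    # the mosaic dataset inside it

arcpy.env.parallelProcessingFactor = "90%"
arcpy.env.pyramid = "NONE"          # make sure pyramids are NOT built

# 1-2. Create the container and add the rasters to it
arcpy.management.CreateMosaicDataset(gdb, "land_encroachment",
                                     arcpy.SpatialReference(3857))  # match your rasters' CRS
arcpy.management.AddRastersToMosaicDataset(md, "Raster Dataset", r"C:\data\rasters")

# 3. Create a unique id field (here simply copied from OBJECTID)
arcpy.management.CalculateField(md, "SliceNumber", "!OBJECTID!", "PYTHON3")

# 4. Declare which field holds the variable and which the dimension
#    (Variable_new is a text field added beforehand holding the variable name)
arcpy.md.BuildMultidimensionalInfo(md, "Variable_new", "SliceNumber # #", "ghm # #")

# 5. Optional lookup table matching the unique id and the raster name
arcpy.conversion.TableToTable(md, gdb, "lookup_land_encroachment")

# 6. The long step: write the CRF
arcpy.management.CopyRaster(md, r"C:\data\land_encroachment.crf", format="CRF",
                            process_as_multidimensional="ALL_SLICES")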

Things to be aware of:

  • The projection of the rasters and the CRF should match. Set the projection in the Environments tab of the first step, Create Mosaic Dataset (you can use one of the rasters as input for the Coordinate System; by default ArcGIS Pro uses Pseudo Mercator, EPSG:3857).
  • This is a multidimensional processing operation. Set the Parallel processing option to 90% in the Environments tab every time.
  • The most critical step is Build Multidimensional Info, because it must be clear which fields are the variables and which are the dimensions. After creating a new field in the attribute table of the mosaic dataset, use the following input as a guide. This example shows how the encroachment datacube was created.
1. The geoprocessing parameters

IMPORTANT NOTE: Be aware that when you check the completed geoprocess in the History, the information in Variable Field is automatically updated to the new variable field. Always check the Python snippet to see what was actually run.

arcpy.md.BuildMultidimensionalInfo("land_encroachment", "Variable_new", "SliceNumber # #", "ghm # #")

2. The multidimensional info properties of the mosaic dataset should then look like this:

3. And finally, the changes in the attribute table showing the multidimensional information:

Where do the CRFs live?

They live in an Azure storage container in the cloud and are managed using Microsoft Azure Storage Explorer. There are different ways to access the CRFs.

Accessing the CRFs from ArcGIS Pro

The virtual machine already has the .acs file that makes the connection; the yaleCube.acs file is located in Documents/ArcGIS/Projects. When a new project is created, add the connection via the ribbon (Insert > Connections > Add Cloud Storage Connection). The yaleCube.acs then appears in the Catalogue, within the Cloud Stores folder, and the CRFs stored there can be added to the map and used in ArcGIS Pro.
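The connection can also be used from Python. A small sketch (the user folder and CRF name are assumptions; Make Multidimensional Raster Layer requires the Image Analyst extension):

import arcpy

# Reference a CRF through the cloud storage connection file
crf = r"C:\Users\<user>\Documents\ArcGIS\Projects\yaleCube.acs\birds_equal_area_20211003.crf"
lyr = arcpy.md.MakeMultidimensionalRasterLayer(crf, "birds_layer")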

Accessing the CRFs from the webtools

Once the webtools are created (the process is explained in the following sections), we can test and use them by calling the CRFs directly from the Azure storage container. For that, we need to use the path /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/ and add the name of the CRF we need to use.

Currently (March 2023), the services in use require these CRFs:

  • For the biodiversity GP services:
| CRF | Path |
| --- | --- |
| Birds | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/birds_equal_area_20211003.crf |
| Amphibians | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/amphibians_equal_area_20211003.crf |
| Mammals | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/mammals_equal_area_20211003.crf |
| Reptiles | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/reptiles_equal_area_20211003.crf |
  • For the contextual GP services:
| CRF | Path |
| --- | --- |
| Ecological Land Units (ELU) | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/ELU.crf |
| Population | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/population2020.crf |
| WDPA | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/WDPA_Terrestrial_CEA_June2021.crf |
| Human Pressure: energy and extractive resources | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/Extraction_TimeSeries_Reclassify_20230501.crf |
| Human Pressure: Transportation | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/Transportation_TimeSeries_Reclassify_20230515.crf |
| Human Pressure: Agriculture | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/Agriculture_TimeSeries_Reclassify_20230501.crf |
| Human Pressure: Human intrusion | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/HumanIntrusion_TimeSeries_Reclassify_20230501.crf |
| Human Pressure: Urban and built-up | /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/Builtup_TimeSeries_Reclassify_20230501.crf |

Building and publishing a Geoprocessing Service or webtool

A webtool is a geoprocessing tool that lives in the ArcGIS Enterprise Portal and can be called from the Half Earth Application to make calculations on the fly. The data shown when a user draws their own area of interest relies completely on these tools.

To create these tools, the first step is to build a model locally in ArcGIS Pro, and then publish it to the Portal.

Create a Model in ArcGIS Pro

In the Catalogue, create a new model (Model Builder) in the project's Toolbox.

Once the model is ready, mark which ovals are parameters by right-clicking on them (a small P appears), and rename them so they are legible to the front end. Currently we are using: geometry, crf_name and output_table.

Use the Calculate Value tool (a Model Builder utility) as much as possible; Python is quicker than adding extra geoprocessing tools.

Avoid the Calculate Field tool, because it generates problems when publishing due to version incompatibilities between ArcGIS Pro and the ArcGIS Enterprise Portal.

Since we are using multidimensional cubes, it is key to set Parallel processing to 90% in every geoprocessing tool added to the model. That will speed up the processing.

IMPORTANT NOTE: Take the input and output coordinate reference systems into account. In this case, we provide a Pseudo Mercator input (the geometry) and use it against an Equal Area projection (the CRF). The geoprocessing tools automatically serve the output in the raster's projection, without us having to add a manual re-projection step for the geometry. However, it is good practice to make the CRS explicit in the Environments tab of each geoprocessing tool.
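In Python, the equivalent environment settings look like this (a sketch; 54034, World Cylindrical Equal Area, is an assumption, use the CRS of your own CRF):

import arcpy

arcpy.env.parallelProcessingFactor = "90%"
# The geometry arrives in Pseudo Mercator (EPSG:3857); the output follows
# the CRF's equal-area CRS, made explicit here
arcpy.env.outputCoordinateSystem = arcpy.SpatialReference(54034)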

Publish a geoprocessing service

Once our model (made with Model Builder) is ready, we can publish it as a geoprocessing service by following these steps:

  1. Create a small polygon using the Sample tool (click on the pencil that shows up to the left of Input location raster or features and draw a small polygon in an area of interest). In the Table of Contents, right-click on the new polygon and click Zoom To Layer to show only the polygon.
  2. Create a small subset of the CRF using Subset Multidimensional Raster, setting the extent environment to "Current Display". This avoids copying the entire datasets to the portal: since the model cannot be published without layers, we generate samples of the CRF limited to the extent of the test geometry. The smaller the extent, the better, but make sure the CRF samples are slightly larger than the polygon used for testing, otherwise the tool will fail. (A Python sketch of this subsetting step appears after this list.)

  3. Run the model as a geoprocessing tool, using the polygon and the subset CRF as inputs (click on the model inside the Toolbox directly). Select 90% for parallel processing.
  4. From the History, right-click on the model run that has just finished and choose Share As Web Tool (make sure you are logged into the Production Portal, shown at the top right; otherwise the option won't appear).

  5. In the General panel:
    1. Name the model as ModelNameProd
    2. Add the model to the Production folder
    3. Click on Copy all data
    4. Click on Share with Everyone
  6. In the Configuration panel:
    1. Change the Message level to Info (this will give more details in case of an error).
    2. Increase the Maximum number of records returned by server to 100000. This is very important; otherwise the response to the front end may be truncated.
  7. In the Content panel, configure the tool properties (click on the pencil on the right):
    • Set the geometry to User defined value
    • Set the crf as Choice list and make sure only the subset CRF is selected by clicking on Only use default layers. This keeps the amount of copied data to a minimum and avoids having several elements in the choice list.
    • Add the description to the different parameters
  8. Untick the option Add optional output Feature Service Parameter (we are not using this).
  9. Analyse before publishing to check which parameters or information are missing from the tool description. Sometimes Analyse has to be run a couple of times without changing anything between runs. There will always be a warning saying that the data will be copied to the Portal; that is expected and OK.
  10. Click on Publish 🚀
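The subsetting in step 2 can also be scripted. A minimal sketch, assuming a test polygon layer called test_polygon and replacing "Current Display" with an explicit extent:

import arcpy

# Use an extent slightly larger than the test polygon, otherwise the tool fails
ext = arcpy.Describe("test_polygon").extent
arcpy.env.extent = arcpy.Extent(ext.XMin - 1000, ext.YMin - 1000,
                                ext.XMax + 1000, ext.YMax + 1000)

arcpy.md.SubsetMultidimensionalRaster(
    r"yaleCube.acs\birds_equal_area_20211003.crf",  # full CRF in the cloud store
    r"C:\temp\birds_subset.crf")                    # small sample used for publishing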

Find the published GP Service on the Portal

Once a webtool has been published successfully, it appears in the Portal section of the Catalogue panel. Hovering over the tool shows the URL to the item in the Portal. Follow the URL and it takes you to the Portal on the web (you can also log in directly to the Portal with your credentials, go to Content and find the tool you want to check). The "look and feel" of the Portal is identical to ArcGIS Online. In the settings, protect the tool from deletion and set the sharing options to Public. Then, in the Overview panel, at the bottom right, you will find the URL of the service. Click to view the tool in a new window.

In this new window, click on Tasks. The URL that appears in the address bar is the one the front end must use. It should look like this: https://heportal.esri.com/server/rest/services/<Tool name>/GPServer/<task name>.

Another way to get the URL is to click on the tool and, in the Overview panel under Tools, click on Service URL; this takes you directly to the Tasks view with the URL at the top.

Test the new GP Service

In order to check that the GP service works correctly before passing the URL to the front end, we can simulate the call the front end would make using the ArcGIS REST API.

Access the tool

  • Log in to the Production Portal (https://heportal.esri.com/portal/home) with the required credentials
  • Go to Content > Production folder > choose a GP service
  • Click on Service URL
  • Go to the bottom of the page and click on Submit Job

Fill parameters

To test the service, we need to provide a geometry and the path to the CRFs used by the webtool.

  • Geometry: the geometry needs a very specific format. The structure passed is a JSON that can be obtained with the Features To JSON tool in ArcGIS Pro: make sure the output path is set outside of the gdb so it can be accessed easily, and tick the boxes for Formatted JSON and Include Z values.
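The export can also be scripted; a sketch with assumed layer and path names:

import arcpy

arcpy.conversion.FeaturesToJSON(
    "custom_aoi",                   # hypothetical polygon layer
    r"C:\temp\custom_aoi.json",     # outside the gdb so it is easy to open
    format_json="FORMATTED",        # the "Formatted JSON" tick box
    include_z_values="Z_VALUES")    # the "Include Z values" tick box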

This is an example of the geometry you need to add to the box:

 {
  "displayFieldName" : "",
  "hasZ" : true,
  "fieldAliases" : {
    "OBJECTID" : "OBJECTID",
    "Name" : "Name",
    "Text" : "Text",
    "IntegerValue" : "Integer Value",
    "DoubleValue" : "Double Value",
    "DateTime" : "Date Time",
    "Shape_Length" : "Shape_Length",
    "Shape_Area" : "Shape_Area"
  },
  "geometryType" : "esriGeometryPolygon",
  "spatialReference" : {
    "wkid" : 102100,
    "latestWkid" : 3857
  },
  "fields" : [
    {
      "name" : "OBJECTID",
      "type" : "esriFieldTypeOID",
      "alias" : "OBJECTID"
    },
    {
      "name" : "Name",
      "type" : "esriFieldTypeString",
      "alias" : "Name",
      "length" : 255
    },
    {
      "name" : "Text",
      "type" : "esriFieldTypeString",
      "alias" : "Text",
      "length" : 255
    },
    {
      "name" : "IntegerValue",
      "type" : "esriFieldTypeInteger",
      "alias" : "Integer Value"
    },
    {
      "name" : "DoubleValue",
      "type" : "esriFieldTypeDouble",
      "alias" : "Double Value"
    },
    {
      "name" : "DateTime",
      "type" : "esriFieldTypeDate",
      "alias" : "Date Time",
      "length" : 8
    },
    {
      "name" : "Shape_Length",
      "type" : "esriFieldTypeDouble",
      "alias" : "Shape_Length"
    },
    {
      "name" : "Shape_Area",
      "type" : "esriFieldTypeDouble",
      "alias" : "Shape_Area"
    }
  ],
  "features" : [
    {
      "attributes" : {
        "OBJECTID" : 1,
        "Name" : null,
        "Text" : null,
        "IntegerValue" : null,
        "DoubleValue" : null,
        "DateTime" : null,
        "Shape_Length" : 231978.71016606738,
        "Shape_Area" : 3338690868.7937865
      },
      "geometry" : {
        "hasZ" : true,
        "rings" : [
          [
            [
              -818493.72899999842,
              5383774.996100001,
              0
            ],
            [
              -755549.04740000144,
              5382225.2217999995,
              0
            ],
            [
              -756854.20630000159,
              5329215.6889000013,
              0
            ],
            [
              -819798.88789999858,
              5330765.4632999972,
              0
            ],
            [
              -818493.72899999842,
              5383774.996100001,
              0
            ]
          ]
        ]
      }
    }
  ]
}
  • CRFs:

Depending on the GP service, we will need to provide the path to one or more CRFs. For example, the biodiversity GP services only use one CRF, the one corresponding to their taxon. The contextual GP service, on the other hand, extracts information from several CRFs, so we need to provide the paths to all of them.

Note that the default values that appear in the boxes represent the data used to publish the service, that is, small subsets of the original CRFs. So if we test the tool with a different geometry but do not change the default CRF paths, the new geometry will fall outside the extent of the subset data and we will get an error. For that reason, we need to make sure we substitute the default names with the complete path to the Azure store /cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/ and the name of the corresponding CRF.

Submit job

Once the boxes have the geometry in JSON format and the paths to the required CRFs, we can click on Submit Job (POST). To see how the process is going and check for any errors, we can click on Check Job details and get updates on the progress.

When the process finishes, we get links to the output tables, which provide results in a JSON format.
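The same submit/poll/fetch cycle can be reproduced from Python with the requests library; a hedged sketch (authentication/token handling omitted, URL placeholders as above, and crf_name as the parameter name used by our tools):

import time
import requests

task_url = "https://heportal.esri.com/server/rest/services/<Tool name>/GPServer/<task name>"
params = {
    "f": "json",
    "geometry": open("custom_aoi.json").read(),  # the Features To JSON output
    "crf_name": "/cloudStores/HECloudstore_ds_vwkuvgmvcfqewwft/birds_equal_area_20211003.crf",
}

# Submit the job, then poll until it succeeds or fails
job = requests.post(f"{task_url}/submitJob", data=params).json()
status = {}
while status.get("jobStatus") not in ("esriJobSucceeded", "esriJobFailed"):
    time.sleep(5)
    status = requests.get(f"{task_url}/jobs/{job['jobId']}", params={"f": "json"}).json()

# Each output parameter (e.g. output_table) is read from the results endpoint
table = requests.get(f"{task_url}/jobs/{job['jobId']}/results/output_table",
                     params={"f": "json"}).json()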

Current GP services in use

The section AOI_summaries provides information about the most recent Geoprocessing Services. They were created in March 2023 for the implementation of the AOI richer summaries, which included new calculations such as the SPS and the incorporation of new human pressure layers.

Historic AOIs and their maintenance

The AOIs created by users can be shared with a URL. When the URL is created, the data is also sent to an AGOL table where it is stored. When the recipient uses the URL, the same data is displayed without having to call the GP service. Currently, this is the service being tested, but there are previous versions in the folder #2 aois (aoi-historic and aoi-historic-dev).

Cleaning the historic AOIs service

The notebook saved in the organisation is ready to be activated to start the cleaning on the first of every month. A version for reference can be found in the he-scratchfolder. The important variable to check is feature_limit, the maximum number of features the service should hold.
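The core of that clean-up can be sketched with the ArcGIS API for Python (the item id and field names are assumptions; the reference notebook in he-scratchfolder is the source of truth):

from arcgis.gis import GIS

gis = GIS("home")  # run inside the organisation's notebook environment
table = gis.content.get("<item_id>").tables[0]   # the historic AOIs AGOL table

feature_limit = 100000  # check this value against the reference notebook
fs = table.query(out_fields="OBJECTID", order_by_fields="OBJECTID DESC")
too_old = [f.attributes["OBJECTID"] for f in fs.features][feature_limit:]
if too_old:
    # Delete everything beyond the newest feature_limit features
    table.edit_features(deletes=",".join(map(str, too_old)))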


Information from the first iteration of geoprocessing services

Just in case some of this information is needed in the future, we have kept the documentation written when the first GP services were created (2021):

Some details about the tools used within the GP Services when working with CRFs

NOTE: This documentation has been written using the ArcGIS Pro version 2.6.4 and the Portal version 10.8.1, so there might be some differences with newer versions.

  • Sample: This is a super powerful tool. Its power lies in the fact that it was the first tool developed to deal with multidimensional data. Our testing showed that processing time increases as the area of interest increases. To use it in the portal server, it was necessary to add an integer field to the polygon.

  • Zonal Statistics as Table: Our testing showed that processing time increased with the number of slices, not with the area of interest.

  • Polygon to Raster: When rasterizing a polygon for the purpose of calculating proportions, it is key that the cell size matches the input CRF. In this case, unlike Sample, the field that worked well was OBJECTID. In the ArcGIS Pro version we were working with, the raster created did not have an attribute table. Within Model Builder we got the number of pixels using Calculate Value and the following Python code block:

    getAreaRaster(r"%custom_raster%"). It is key to use straight (not curly) double quotes.

    import arcpy

    def getAreaRaster(rst):
        # Read the COUNT field of the rasterized polygon and return its single
        # value: the number of pixels inside the area of interest
        arr = arcpy.da.TableToNumPyArray(rst, "COUNT")
        a, = arr.tolist()[0]
        return a
    

    %custom_raster% refers to the output from Polygon to Raster. The % uses ESRI's in-line variable scripting.

  • Filtering a table using SQL: To obtain only the necessary rows, we have used Table Select. This tool takes an SQL expression that is built using Calculate Value with a Python code block.

Example 1: Getting the top 20% most prevalent species

getTopRows(r"%table_in%")

import arcpy
import numpy as np

def getTopRows(table, prop=0.2):
    # Read SliceNumber and COUNT, keep the `prop` fraction of rows with the
    # highest COUNT (np.sort is ascending, hence the reversal), and build
    # the SQL expression for Table Select
    arr = arcpy.da.TableToNumPyArray(table, ["SliceNumber", "COUNT"])
    n = int(round(prop * len(arr), 0))
    sort_arr = np.sort(arr, order="COUNT")[::-1][0:n]
    arr_lit = sort_arr["SliceNumber"].tolist()
    arr_int = map(int, arr_lit)
    res = ", ".join(map(str, arr_int))
    return f"SliceNumber IN ({res})"

Example 2: Getting only the rows with presence. This might be unnecessary if used in Table Select, which allows a WHERE query.

getPresentSpecies(r"%table_in%")

import arcpy

def getPresentSpecies(table):
    # Keep only the slices where the species is present (presence > 0)
    # and build the SQL expression for Table Select
    arr = arcpy.da.TableToNumPyArray(table, ["SliceNumber", "presence"])
    out_arr = arr[arr["presence"] > 0]
    arr_lit = out_arr["SliceNumber"].tolist()
    arr_int = map(int, arr_lit)
    res = ", ".join(map(str, arr_int))
    return f"SliceNumber IN ({res})"

About the limit of records returned: if you forget to set a high limit for the records returned, the front end might hit this limit. In the returned object, check value.exceededTransferLimit.
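A two-line sketch of that check, assuming result holds the parsed JSON of the output parameter request:

# If this flag is set, raise "Maximum number of records returned by server"
if result.get("value", {}).get("exceededTransferLimit"):
    print("Response truncated: increase the records limit on the service")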

Models from Model Builder as Python code

The process inside the Geoprocessing service can be found in the he-scratchfolder repo.

Creating the lookup tables

The tables are created by getting the slice number and matching raster name from the mosaic dataset, and then merging with data from MOL using the scientific name (notebook).
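A hedged sketch of that merge with pandas (file and field names are assumptions; the notebook is the source of truth):

import pandas as pd

slices = pd.read_csv("lookup_land_encroachment.csv")  # SliceNumber + raster Name
mol = pd.read_csv("mol_species.csv")                  # hypothetical MOL export
lookup = slices.merge(mol, left_on="Name", right_on="scientific_name", how="left")
lookup.to_csv("lookup_table.csv", index=False)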

First iteration of GP Services

| Front end element | CRF name | CRF variable | GP service | Output to use | Field to use from response | AGOL table to use | AGOL field to use |
|---|---|---|---|---|---|---|---|
| population | population2020.crf | none | GP ContextualLayersProd20220131 | output_table_population | SUM | none | none |
| climate_regime | ELU.crf | none | GP ContextualLayersProd20220131 | output_table_elu_majority | MAJORITY | agol link | cr_type contains the name of the type of climate regime |
| land_cover | ELU.crf | none | GP ContextualLayersProd20220131 | output_table_elu_majority | MAJORITY | agol link | lc_type contains the name of the type of land cover |
| human_encroachment | land_encroachment.crf | none | GP ContextualLayersProd20220131 | output_table_encroachment | SliceNumber has the code of the type of human activity; percentage_land_encroachment gives the percentage of each type | agol link | SliceNumber to join, then Name |
| Protection_percentage | WDPA_Terrestrial_CEA_June2021.crf | none | GP ContextualLayersProd20220131 | output_table_wdpa_percentage | percentage_protected | none | none |
| WDPA list | none | none | GP ContextualLayersProd | output_table_wdpa | ORIG_NAME, DESIG_TYPE, IUCN_CAT, GOV_TYPE, AREA_KM, NAME_0 | agol link (not whitelisted yet) | WDPA_PID |
| mammal_data | mammals_equal_area_20211003.crf | presence | GP SampleMamProd20220131 | output_table | SliceNumber has the code of the species; per_global shows the area relative to the global species range; per_aoi shows the % area present inside the aoi | FS lookup table (whitelisted) | SliceNumber, scientific_name, percent_protected, conservation_target, has_image, common_name |
| amphibian_data | amphibians_equal_area_20211003.crf | amphibians | GP SampleAmphProd20220131 | output_table | SliceNumber has the code of the species; per_global shows the area relative to the global species range; per_aoi shows the % area present inside the aoi | FS lookup table (whitelisted) | SliceNumber, scientific_name, percent_protected, conservation_target, has_image, common_name |
| bird_data | birds_equal_area_20211003.crf | birds | GP SampleBirdsProd20220131 | output_table | SliceNumber has the code of the species; per_global shows the area relative to the global species range; per_aoi shows the % area present inside the aoi | FS lookup table (whitelisted) | SliceNumber, scientific_name, percent_protected, conservation_target, has_image, common_name |
| reptile_data | reptiles_equal_area_20211003.crf | reptiles | GP SampleReptProd20220131 | output_table | SliceNumber has the code of the species; per_global shows the area relative to the global species range; per_aoi shows the % area present inside the aoi | FS lookup table (whitelisted) | SliceNumber, scientific_name, percent_protected, conservation_target, has_image, common_name |

Source of data

  • Population: WorldPop 2020 (web)
  • World Terrestrial Ecosystems: Living Atlas

Querying the AGOL tables

For those geoprocessing services that need to query information from a table in ArcGIS Online, Arcade can be used to return the information (more about Arcade in these docs). The Filter function accepts an SQL expression and a layer.

The SQL expression is composed of the name of the field to query (in our case SliceNumber), followed by the IN condition and, in parentheses, all the ids of the species returned by the geoprocessing service.

var lay = $layer
var sqlExpr = 'SliceNumber IN (164, 250)'
var val = Filter(lay, sqlExpr)
return val