Resource Import/Export

Currently, all data import and export operations happen through the Arches command line interface.

Importing a CSV

To do a bulk import of resources to your Arches database, you must create a correctly formatted CSV (comma-separated values) file. We recommend MS Excel or Open Office for this task. More advanced users will likely find a custom scripting effort to be worthwhile.

The workflow for creating a CSV should be something like this:

  1. Identify which Resource Model you are loading data into
  2. Download the mapping file and concepts file for that resource model
  3. Modify the mapping file to reference your CSV
  4. Populate the CSV with your data
  5. Import the CSV using the import_business_data command.

Note

Your CSV should be encoded into UTF-8. These steps will help you if you are using MS Excel.

CSV File Requirements

Each row in the CSV can contain the attribute values of one and only one resource.

The first column in the CSV must be named ResourceID. ResourceID is a user-generated unique ID for each individual resource. If ResourceID is a valid UUID, Arches will adopt it internally as the new resource’s identifier. If ResourceID is not a valid UUID Arches will create a new UUID and use that as the resource’s identifier. Subsequent columns can have any name.

ResourceIDs must be unique among all resources imported, not just within each csv, for this reason we suggest using UUIDs.

Resource ID attribute 1 attribute 2 attribute 3
1 attr. 1 value attr. 2 value attr. 3 value
2 attr. 1 value attr. 2 value attr. 3 value
3 attr. 1 value attr. 2 value attr. 3 value

Simple CSV with three resources, each with three different attributes.

Or, in a raw format (if you open the file in a text editor), the CSV should look like this:

Resource ID,attribute 1,attribute 2,attribute 3
1,attr. 1 value,attr. 2 value,attr. 3 value
2,attr. 1 value,attr. 2 value,attr. 3 value
3,attr. 1 value,attr. 2 value,attr. 3 value

Multiple lines may be used to add multiple attributes to a single resource. You must make sure these lines are contiguous, and every line must have a ResourceID. Other cells are optional.

Resource ID attribute 1 attribute 2 attribute 3
1 attr. 1 value attr. 2 value attr. 3 value
2 attr. 1 value attr. 2 value attr. 3 value
2   attr. 2 additional value  
3 attr. 1 value attr. 2 value attr. 3 value

CSV with three resources, one of which has two values for attribute 2.

Depending on your Resource Model’s graph structure, some attributes will be handled as “groups”. For example, Name and Name Type attributes would be a group. Attributes that are grouped must be on the same row. However, a single row can have many different groups of attributes in it, but there may be only one of each group type per row. (e.g. you cannot have two names and two name types in one row).

Resource ID name name_type description
1 Yucca House Primary “this house, built in…”
2 Big House Primary originally a small cabin
2 Old Main Building Historic  
3 Writer’s Cabin Primary housed resident authors

CSV with three resources, one of which has two groups of name and name_type attributes. Note that “Primary” and “Historic” are the prefLabels for two different concepts in the RDM.

You must have values for any required nodes in your resource models.

Value Formats

The data_type attribute in the mapping file will help you determine what the format of the value(s) in the column for a given node must be.

  • string - Strings need not be single-quoted unless they contain a comma or contain HTML tags (as in the case strings display/collected with the rich text editor widget).
  • number - Numbers don’t need quotes.
  • date - All dates must be formatted as YYYY-MM-DD. Dates that are not four digits must be zero padded (01-02-1999). No quotes.
  • geojson-feature-collection - All geometry must be formatted in “well-known text” (WKT), and all coordinates stored as WGS 84 (EPSG:4326) decimal degrees. Multi geometries must be single-quoted.
  • concept - If the values in your concept collection are unique you can use the label (prefLabel) for a concept. If not, you will get an error during import and you must use UUIDs instead of labels (if this happens, see Concepts File below). If a prefLabel has a comma in it, it must be triple-quoted: """Shingles, original""".
  • concept-list - This must be a single-quoted list of prefLabels (or UUIDs if necessary): "Slate,Thatch". If a prefLabel contains a comma, then that prefLabel must have double-quotes: "Slate,""Shingles, original"",Thatch".
  • domain-value - A string that matches a valid domain value for this node, single-quoted if it contains a comma.
  • domain-value-list - A single-quoted list of strings that match valid domain values for this node. Follow quoting guidelines for concept-list if any of the values contain commas.

Note

If you are using MS Excel to create your CSV files, double-quotes will automatically be added to any cell value that contains a comma.

Mapping File

All CSV files must be accompanied by a mapping file. This is a JSON-structured file that indicates which node in a Resource Model’s graph each column in the CSV file should map to. The mapping file should contain the source column name populated in the file_field_name property for all nodes in a graph the user wishes to map to. The mapping file should be named exactly the same as the CSV file but with the extension ‘.mapping’, and should be in the same directory as the CSV.

To create a mapping file for a Resource Model in your database, go to the Arches Designer landing page. Find the Resource Model into which you plan to load resources, and choose Export Mapping File from the Manage menu.

../_images/create_mapping_file.gif

Unzip the download, and you’ll find a .mapping file as well as a _concepts.json file (see Concepts File). The contents of the mapping file will look something like this:

{
    "resource_model_id": "bbc5cee8-fa16-11e6-9e3e-026d961c88e6",
    "resource_model_name": "HER Buildings",
    "nodes": [
        {
            "arches_nodeid": "bbc5cf1f-fa16-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Name",
            "file_field_name": "",
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
        {
            "arches_nodeid": "d4896e3b-fa30-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Name Type",
            "file_field_name": "",
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
        ...
    ]
}

The mapping file contains cursory information about the resource model (name and resource model id) and a listing of the nodes that compose that resource model. Each node contains attributes to help you import your business data (not all attributes are used on import, some are there simply to assist you). The concept_export_value attribute is only present for nodes with datatypes of concept, concept-list, domain, and domain-list - this attribute is not used for import. It is recommended that you not delete any attributes from the mapping file. If you do not wish to map to a specfic node simply set the file_field_name attribute to "".

You will now need to enter the column name from your CSV into the file_field_name in appropriate node in the mapping file. For example, if your CSV has a column named “activity_type” and you want the values in this column to populate “Activity Type” nodes in Arches, you would add that name to the mapping file like so:

{
    ...
        {
            "arches_nodeid": "bbc5cf1f-fa16-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Activity Type",
            "file_field_name": "activity_type", <-- place column name here
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
   ...
}

To map more than one column to a single node, simply copy and paste that node within the mapping file.

Concepts File

When populating concept nodes from a CSV you should generally use the prefLabel for that concept. However, in rare instances there may be two or more concepts in your collection that have identical prefLabels (this is allowed in Arches). In this case you will need to replace the prefLabel in your CSV with the UUID for the Value that represents that prefLabel.

To aid with the process, a “concepts file” is created every time you download a mapping file, which lists the valueids and corresponding labels for all of the concepts in all of the concept collections associated with any of the Resource Model’s nodes. For example:

"Name Type": {
    "ecb20ae9-a457-4011-83bf-1c936e2d6b6a": "Historic",
    "81dd62d2-6701-4195-b74b-8057456bba4b": "Primary"
},

You would then need to use 81dd62d2-6701-4195-b74b-8057456bba4b instead of Primary in your CSV.

Shapefile Import

Uploading a shapefile to Arches is very similar to uploading a csv file with a few exceptions. The same rules apply to rich text, concept data, grouped data, and contiguousness. And, like csv import, shapefile import requires a mapping file.

The shapefile must contain a field with a unique identifier for each resource named ‘ResourceID’.

The shapefile must be in WGS 84 (EPSG:4326) decimal degrees.

The shapefile must consist of at least a .shp, .dbf, .shx, and .prj file. It may be zipped or unzipped.

Dates in a shapefile can be in Esri Shapefile date format, Arches will convert them to the appropriate date format.

In your mapping file, the node you wish to map the geometry to must have a file_field_name value of ‘geom’.

To import a shapefile use the same command as csv import, simply point to your shapefile instead of your csv file. import_business_data

Note: More complex geometries may encounter a mapping_parser_exception error. This error occurs when a geometry is not valid in elasticsearch. To resolve this, first make sure your geometry is valid using ArcMap, QGIS, or PostGIS. Next, you can modify the precision of your geometry to 5 decimals or you can simplify your geometry using the QGIS simplify geometry geoprocessing tool, or the PostGIS st_snaptogrid function.

JSON Import

JSON import of business data is primarily intended for transferring business data between arches instances. Because of this it’s especially user-friendly to create or interpret, but doing so is not impossible.

First, there are at least two ways you can familiarize yourself with the format. The system settings in an Arches package is stored in this json format, you can open one of those up and take a look. Perhaps a better way in your case is to create some business data via the ui in your instance of arches and export it to the json format using the business data export command defined here https://github.com/archesproject/arches/wiki/Command-Line-Reference#export-commands. This can act as a template json for data creation. For the rest of this section it may be helpful to have one of these files open to make it easier to follow along.

General structure of the entire file:

{
    "business_data": {
        "resources": [
            {
                "resourceinstance": {. . .},
                "tiles": [. . .],
            }
        ]
    }
}

The json format is primarily a representation of the tiles table in the arches postgres database with some information about the resource instance(s) included. Within the business_data object of the json are two objects, the tiles object and the resourceinstance object. Let’s start with the resource instance object.

Structure of the resourceinstance object:

{
    "graph_id": "<uuid,
    "resourceinstanceid": uuid,
    "legacyid": uuid or text
}
  • graph_id - the id of the resource model for which this data was created
  • resourceinstanceid - the unique identifier of this resource instance within Arches (this will need to be unique for every resource in Arches)
  • legacyid - an identifier that was used for this resource before its inclusion in Arches. This can be the same as the resourceinstanceid (this is the case when you provide a UUID to the ResourceID column in a csv) or it can be another id. Either way it has to be unique among every resource in Arches.

The tiles object is a list of tiles that compose a resource instance. The tiles object is a bit more complicated than the resourceinstance object, and the structure can vary depending on the cardinality of your nodes. The following cardinality examples will be covered below:

  1. 1 card
  2. n cards
  3. 1 parent card with 1 child card
  4. 1 parent card with n child cards
  5. n parent cards with 1 child card
  6. n parent cards with n child cards

But first a description of the general structure of a single tile:

{
    "tileid": "<uuid>",
    "resourceinstance_id": "<uuid>",
    "nodegroupid": "<uuid>",
    "sortorder": 0,
    "parenttile_id": "<uuid>" or null,
    "data": {. . .}
}
  • tileid - unique identifier of the tile this is the primary key in the tiles table and must be a unique uuid

  • resourceinstance_id - the uuid corresponding to the instance this tile belongs to (this should be the same as the resourceinstance_id from the resourceinstance object.

  • nodegroup_id - the node group for which the nodes within the data array participate

  • sortorder - the sort order of this data in the form/report relative to other tiles (only applicable if cardinality is n)

  • parenttile_id - unique identifier of the parenttile of this tile (will be null if this is a parent tile or the tile has no parent)

  • data - json structure of a node group including the nodeid and data populating that node. For example:

    {
        "<uuid for building name node>":"Smith Cottage"
    }
    

The tile object is tied to a resource model in two ways: 1) through the nodegroup_id 2) in the data object where nodeids are used as keys for the business data itself.

Now for a detailed look at the actual contents of tiles. Note that below we are using simplified values for tileid, like "A" and "B", to clearly illustrate parent/child relationships. In reality these must be valid UUIDs.

1 card

1: There is one and only one instance of this nodegroup/card in a resource:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {
            "nodeid": "some data",
            "nodeid": "some other data"
        }
    }
]

This structure represents a tile for a nodegroup (consisting of two nodes) with no parents collecting data with a cardinality of 1.

n cards

n: There are multiple instances of this nodegroup/card in a resource:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid">,
        "nodegroupid": "<uuid from resource model">,
        "sortorder": 0,
        "parenttile_id": null,
        "data" {
            "nodeid": "some data",
            "nodeid": "some other data"
        }
    },
        {
        "tileid": "B",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {
            "nodeid": "more data",
            "nodeid": "more other data"
        }
    }
]

1 parent card with 1 child card

1-1: One and only one parent nodegroup/card contains one and only one child nodegroup/card:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "X",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "A",
        "data" {
            "nodeid": "data",
            "nodeid": "other data"
        }
    }
]

1 parent card with n child cards

1-n: One and only one parent nodegroup/card containing multiple instances of child nodegroups/cards:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "X",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "A",
        "data" {
            "nodeid": "data",
            "nodeid": "other data"
        }
    },
    {
        "tileid": "Y",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "A",
        "data" {
            "nodeid": "more data",
            "nodeid": "more other data"
        }
    }
]

n parent cards with 1 child card

n-1: Many parent nodegroups/cards each with one child nodegroup/card:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "X",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "A",
        "data" {
            "nodeid": "data",
            "nodeid": "other data"
        }
    },
    {
        "tileid": "B",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "Y",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "B",
        "data" {
            "nodeid": "more data",
            "nodeid": "more other data"
        }
    }
]

n parent cards with n child cards

n-n: Many parent nodegroups/cards containing many child nodegroups/cards:

[
    {
        "tileid": "A",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "X",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "A",
        "data" {
            "nodeid": "data",
            "nodeid": "other data"
        }
    },
    {
        "tileid": "B",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid":  uuid from resource model,
        "sortorder": 0,
        "parenttile_id": null,
        "data" {}
    },
    {
        "tileid": "Y",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "B",
        "data" {
            "nodeid": "more data",
            "nodeid": "more other data"
        }
    },
    {
        "tileid": "Z",
        "resourceinstance_id": "<uuid from resourceinstance.resourceinstanceid>",
        "nodegroupid": "<uuid from resource model>",
        "sortorder": 0,
        "parenttile_id": "B",
        "data" {
            "nodeid": "even more data",
            "nodeid": even "more other data"
        }
    }
]

Exporting Arches Data

Currently all export must happen through the command line interface. Alternatively, users with database access have the option of writing a resource view, adding the view to a GIS client, and exporting data from there. Export commands are listed in the Wiki documentation.

Resource Database Views

To export to tabular formats such as shapefile, it is necessary to flatten the graph structure of your resources. One way to do this is to create a database view of your resource models. Arches does not do this automatically because there are many ways to design a flattened table depending on your needs.

You can add any number of database views representing a given resource model either for export, or to connect directly to a GIS client such as QGIS or ArcGIS. When writing a view to support shapefile export be sure that your view does not violate any shapefile restrictions. For example, shapefile field names are limited to 10 characters with no special characters and text fields cannot store more than 255 characters.

If you plan to use the arches export command to export your view as a shapefile, you also need to be sure that your view contains 2 fields: geom with the geometry representing your resource instance’s location and geom_type with the postgis geometry type of your geom column.

To write your view, you should start by getting a mapping file for your resource. You can do that by going to the Arches Designer page and then in the manage dropdown of your resource model select Create Mapping File. A zip file will be downloaded and within that file you will find your .mapping file. This file lists all the ids that you will need to design your view.

Below is an example of a simple resource model view. If a resource instance has a tile with geojson saved to it, that tile will be represented as a record in the view along with the corresponding nodeid and tileid. A unique id (gid) is assigned to each row. If a node has more than one geometry, the geometries are combined into a multipart geometry. If a node has more than one geometry of different types, a record will be created for each type. The UUID (ab74af76-fa0e-11e6-9e3e-026d961c88e6) in the last line of this this example is the id of the view’s resource model.

  1. When creating your own view, you will need to replace this UUID with your own resource model’s id. You can find this UUID in your mapping file assigned to the property: resource_model_id.

    CREATE OR REPLACE VIEW vw_monuments_simple AS
    WITH mv AS (SELECT tileid, resourceinstanceid, nodeid, ST_Union(geom) as geom, ST_GeometryType(geom) AS geom_type
    FROM mv_geojson_geoms
    GROUP BY tileid, nodeid, resourceinstanceid, ST_GeometryType(geom))
    SELECT row_number() OVER () AS gid,
        mv.resourceinstanceid,
        mv.tileid,
        mv.nodeid,
        ST_GeometryType(geom) AS geom_type,
        geom
    FROM mv
    WHERE (SELECT graphid FROM resource_instances WHERE mv.resourceinstanceid = resourceinstanceid) = 'ab74af76-fa0e-11e6-9e3e-026d961c88e6'
    
  2. Here is a more complete example which includes columns with tile data:

    CREATE OR REPLACE VIEW vw_monuments AS
    WITH mv AS (select tileid, resourceinstanceid, nodeid, ST_Union(geom) AS geom, ST_GeometryType(geom) AS geom_type
    FROM mv_geojson_geoms
    GROUP BY tileid, nodeid, resourceinstanceid, ST_GeometryType(geom))
    SELECT
        row_number() over () AS gid,
        mv.resourceinstanceid,
        mv.tileid,
        mv.nodeid,
        ST_GeometryType(geom) AS geom_type,
        name_tile.tiledata ->> '677f303d-09cc-11e7-9aa6-6c4008b05c4c' AS name,
        (SELECT value FROM values WHERE cast(name_tile.tiledata ->> '677f39a8-09cc-11e7-834a-6c4008b05c4c' AS uuid) = valueid ) AS nametype,
        (SELECT value FROM values WHERE cast(component.tiledata ->>'ab74b009-fa0e-11e6-9e3e-026d961c88e6' AS uuid) = valueid ) AS construction_type,
        array_to_string((select array_agg(v.value) FROM unnest(ARRAY(SELECT jsonb_array_elements_text(component.tiledata -> 'ab74afec-fa0e-11e6-9e3e-026d961c88e6'))::uuid[]) item_id LEFT JOIN values v ON v.valueid=item_id), ',') AS const_tech,
        (SELECT value FROM values WHERE cast(record.tiledata ->> '677f2c0f-09cc-11e7-b412-6c4008b05c4c' AS uuid) = valueid ) AS record_type,
        geom
    FROM mv
    LEFT JOIN tiles name_tile
        ON mv.resourceinstanceid = name_tile.resourceinstanceid
        AND name_tile.tiledata->>'677f39a8-09cc-11e7-834a-6c4008b05c4c'
        != ''
    LEFT JOIN tiles component
        ON name_tile.resourceinstanceid = component.resourceinstanceid
        AND component.tiledata->>'ab74afec-fa0e-11e6-9e3e-026d961c88e6'
        != ''
    LEFT JOIN tiles record
        ON name_tile.resourceinstanceid = record.resourceinstanceid
        AND record.tiledata->>'677f2c0f-09cc-11e7-b412-6c4008b05c4c'
        != ''
    WHERE (SELECT graphid FROM resource_instances WHERE mv.resourceinstanceid = resourceinstanceid) = 'ab74af76-fa0e-11e6-9e3e-026d961c88e6'
    
  3. You will notice that for each node added as a column in the table, we perform a LEFT JOIN to the tiles table and the nodeid from which we want data. Here is an example joining to the tile containing the record node which has a nodeid of 677f2c0f-09cc-11e7-b412-6c4008b05c4c.

    LEFT JOIN tiles record
        ON name_tile.resourceinstanceid = record.resourceinstanceid
        AND record.tiledata->>'677f2c0f-09cc-11e7-b412-6c4008b05c4c'
        != ''
    
  4. We can then define a field be referencing that tile:

    (SELECT value FROM values WHERE cast(record.tiledata ->> '677f2c0f-09cc-11e7-b412-6c4008b05c4c' AS uuid) = valueid ) AS record_type
    
  5. How you define your fields depends largely on what the node datatype is:

    A node with a string datatype:

    name_tile.tiledata ->> '677f303d-09cc-11e7-9aa6-6c4008b05c4c' AS name
    

    A node with a concept value id. The following returns the concept values label:

    (SELECT value FROM values WHERE cast(name_tile.tiledata ->> '677f39a8-09cc-11e7-834a-6c4008b05c4c' AS uuid) = valueid ) AS nametype
    

    A node with a concept-list. The following returns a concatenated string of concept value labels:

    array_to_string((SELECT array_agg(v.value) FROM unnest(ARRAY(SELECT jsonb_array_elements_text(component.tiledata -> 'ab74afec-fa0e-11e6-9e3e-026d961c88e6'))::uuid[]) item_id LEFT JOIN values v ON v.valueid=item_id), ',') AS const_tech