Resource Import/Export

Currently, all data import and export operations happen through the Arches command line interface.

Uploading a CSV to Arches

To do a bulk import of resources to your Arches database, you must create a correctly formatted CSV (comma-separated values) file. We recommend MS Excel or Open Office for this task. More advanced users will likely find a custom scripting effort to be worthwhile.

To upload a shapefile see Uploading a Shapefile to Arches below.

All values containing commas must be quoted so the value is not misinterpreted by the importer as multiple csv columns. This includes strings, concept-lists, and multi-geoms.

The workflow for creating a CSV should be something like this:

  1. Identify which Resource Model you are loading data into
  2. Download the mapping file and concepts file for that resource model
  3. Modify the mapping file to reference your CSV
  4. Populate the CSV with your data
  5. Import the CSV using the import_business_data command.

CSV File Requirements

Each row in the CSV can contain the attribute values of one and only one resource.

The first column in the CSV must be named ResourceID. ResourceID is a user-generated unique ID for each individual resource. If ResourceID is a valid UUID, Arches will adopt it internally as the new resource’s identifier. If ResourceID is not a valid UUID Arches will create a new UUID and use that as the resource’s identifier. Subsequent columns can have any name.

ResourceIDs must be unique among all resources imported, not just within each csv, for this reason we suggest using UUIDs.

Resource ID attribute 1 attribute 2 attribute 3
1 attr. 1 value attr. 2 value attr. 3 value
2 attr. 1 value attr. 2 value attr. 3 value
3 attr. 1 value attr. 2 value attr. 3 value

Simple CSV with three resources, each with three different attributes.

Or, in a raw format (if you open the file in a text editor), the CSV should look like this:

Resource ID,attribute 1,attribute 2,attribute 3
1,attr. 1 value,attr. 2 value,attr. 3 value
2,attr. 1 value,attr. 2 value,attr. 3 value
3,attr. 1 value,attr. 2 value,attr. 3 value

Multiple lines may be used to add multiple attributes to a single resource. You must make sure these lines are contiguous, and every line must have a ResourceID. Other cells are optional.

Resource ID attribute 1 attribute 2 attribute 3
1 attr. 1 value attr. 2 value attr. 3 value
2 attr. 1 value attr. 2 value attr. 3 value
2   attr. 2 additional value  
3 attr. 1 value attr. 2 value attr. 3 value

CSV with three resources, one of which has two values for attribute 2.

Depending on your Resource Model’s graph structure, some attributes will be handled as “groups”. For example, Name and Name Type attributes would be a group. Attributes that are grouped must be on the same row. However, a single row can have many different groups of attributes in it, but there may be only one of each group type per row. (e.g. you cannot have two names and two name types in one row).

Resource ID name name_type description
1 Yucca House Primary* “this house, built in...”
2 Big House Primary* originally a small cabin
2 Old Main Building Historic*  
3 Writer’s Cabin Primary* housed resident authors

CSV with three resources, one of which has two groups of name and name_type attributes. See below for more on the ‘Primary’ and ‘Historic’ labels.

You must have values for any required nodes in your resource models.

The data_type attribute in the mapping file(see below) can help you determine what data type (string, number, conceptid, geometry, etc.) is valid for each column in your csv.

Some data types require specially formatted values:

  • Dates All dates must be formatted as YYYY-MM-DD. Dates that are not four digits must be zero padded.

  • Geometry All geometry must be formatted in “well-known text” (WKT), and that all coordinates stored as WGS 84 (EPSG:4326) decimal degrees. Multi geoms must be in quotes.

  • Rich Text Rich text must be quoted html.

  • Concepts See the Concepts section below.

  • concept-list and domain-value-list Values must be represented as quoted, comma-separated strings in the source CSV file. Each value within the concept/domain list does not need to be in quotation marks.

    concept-list example:

    "3b4bddbb-0ca6-44d5-8622-ce9982412883,10ac2b9b-3425-4c3c-9b5c-983bb16f93d3,052e8714-3f19-4179-b1f9-dcbe45df673e"
    

Mapping File Requirements

All CSV files must be accompanied by a mapping file. This is a JSON-structured file that indicates which node in a Resource Model’s graph each column in the CSV file should map to. The mapping file should contain the source column name populated in the file_field_name property for all nodes in a graph the user wishes to map to. The mapping file should be named exactly the same as the CSV file but with the extension ‘.mapping’, and should be in the same directory as the CSV.

To create a mapping file for a Resource Model in your database, go to the Arches Designer landing page. Find the Resource Model into which you plan to load resources, and choose Export Mapping File from the Manage menu.

../_images/create_mapping_file.gif

Unzip the download, and you’ll find a .mapping file as well as a ...concepts.json file (more on the latter below). The contents of the mapping file will look something like this:

{
    "resource_model_id": "bbc5cee8-fa16-11e6-9e3e-026d961c88e6",
    "resource_model_name": "HER Buildings",
    "nodes": [
        {
            "arches_nodeid": "bbc5cf1f-fa16-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Name",
            "file_field_name": "",
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
        {
            "arches_nodeid": "d4896e3b-fa30-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Name Type",
            "file_field_name": "",
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
        ...
    ]
}

The mapping file contains cursory information about the resource model (name and resource model id) and a listing of the nodes that comprise that resource model. Each node contains attributes to help you import your business data (not all attributes are used on import, some are there simply to assist you). The concept_export_value attribute is only present for nodes with datatypes of concept, ‘concept-list’, ‘domain’, and ‘domain-list’ - this attribute is not used for import. It is recommended that you not delete any attributes from the mapping file. If you do not wish to map to a specfic node simply leave the file_field_name attribute blank.

You will now need to enter the column name from your CSV into the file_field_name in appropriate node in the mapping file. For example, if your CSV has a column named “activity_type” and you want the values in this column to populate “Activity Type” nodes in Arches, you would add that name to the mapping file like so:

{
    ...
        {
            "arches_nodeid": "bbc5cf1f-fa16-11e6-9e3e-026d961c88e6",
            "arches_node_name": "Activity Type",
            "file_field_name": "activity_type", <-- place column name here
            "data_type": "concept",
            "concept_export_value": "label",
            "export": false
        },
   ...
}

To map more than one column to a single node, simply copy and paste that node within the mapping file.

Concept Values in CSVs, and the concepts file

Values in the CSV that represent concepts in the RDM (Reference Data Manager) must not use the label of that concept, but its actual valueid that represents that concept in the RDM. Thus, in the above example, ‘Primary’ and ‘Historic’, because these are actually concepts stored in the RDM, would actually need to be replaced by the valueid (a UUID) for their respective concepts.

To aid with the process, a concepts file is created every time you download a mapping file, which lists the concept valueids and corresponding labels for all of the Concept Collections associated with any of the Resource Model’s nodes. For example:

"Name Type": {
    "ecb20ae9-a457-4011-83bf-1c936e2d6b6a": "Historic",
    "81dd62d2-6701-4195-b74b-8057456bba4b": "Primary"
},

Thus, using the concepts file, the example above should be changed to look like this:

Resource ID name name_type description
1 Yucca House 81dd62d2-6701-4195-b74b-8057456bba4b “this house, built in...”
2 Big House 81dd62d2-6701-4195-b74b-8057456bba4b originally a small cabin
2 Old Main Building ecb20ae9-a457-4011-83bf-1c936e2d6b6a  
3 Writer’s Cabin 81dd62d2-6701-4195-b74b-8057456bba4b housed resident authors

Uploading a Shapefile to Arches

Uploading a shapefile to Arches is very similar to uploading a csv file with a few exceptions. The same rules apply to rich text, concept data, grouped data, and contiguousness. And, like csv import, shapefile import requires a mapping file.

The shapefile must contain a field with a unique identifier for each resource named ‘ResourceID’.

The shapefile must be in WGS 84 (EPSG:4326) decimal degrees.

The shapefile must consist of at least a .shp, .dbf, .shx, and .prj file. It may be zipped or unzipped.

Dates in a shapefile can be in Esri Shapefile date format, Arches will convert them to the appropriate date format.

In your mapping file, the node you wish to map the geometry to must have a file_field_name value of ‘geom’.

To import a shapefile use the same command as csv import, simply point to your shapefile instead of your csv file. import_business_data

Note: More complex geometries may encounter a mapping_parser_exception error. This error occurs when a geometry is not valid in elasticsearch. To resolve this, first make sure your geometry is valid using ArcMap, QGIS, or PostGIS. Next, you can modify the precision of your geometry to 5 decimals or you can simplify your geometry using the QGIS simplify geometry geoprocessing tool, or the PostGIS st_snaptogrid function.

Exporting Arches Data

More documentation coming soon... Currently all export commands are listed in the Wiki documentation.