To create a Dataset, click the Create New Dataset tile on your current project's datasets page. You will be guided through a set of steps in which you define the types of Concepts you want your Dataset to contain.
Step 1: Dataset settings
Choose an appropriate, unambiguous name for the Dataset; for example, name it after the collection of Concepts that it will contain. The namespace for the Dataset is a URN and is set automatically (urn:test:countries in the example below). The namespace matters because it is used as part of the API endpoint URL if you want to access this Dataset via the API.
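As a small illustration of the namespace format, the sketch below splits the example URN into its parts and shows what it looks like when percent-encoded for a URL path segment. The helper name split_urn is hypothetical, and this guide does not state the exact endpoint URL format or whether Data Graphs expects the namespace to be encoded, so treat the encoding step as an assumption.

```python
from urllib.parse import quote


def split_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<namespace-id>:<specific-part>' into its two parts.

    Hypothetical helper for illustration; not part of any Data Graphs client.
    """
    scheme, nid, nss = urn.split(":", 2)  # raises ValueError if parts are missing
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    return nid, nss


namespace = "urn:test:countries"
nid, nss = split_urn(namespace)

# Assumption: if the namespace is placed in a URL path, the colons may need
# percent-encoding. Check the API documentation for the real endpoint format.
encoded = quote(namespace, safe="")
```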
Choose whether the Dataset is public or private with respect to API access. A public Dataset can be accessed using the API from any client on the internet using an API key. A private Dataset can only be accessed using the API with OAuth / OpenID authentication.
When you are happy with the main settings, click Add Types to configure the Concept types you want the Dataset to contain.
Step 2: Add a Concept type
Next you must add one or more types of Concepts that you want to include in the Dataset. We call this the Dataset Schema. You don't have to be exhaustive at this point as you can edit your schema at any stage before you add your data.
Choose the first type you want to add, either selecting a type you have already defined or one of Data Graphs' own types, or create a brand new type. In the example below we are naming a new type Country as we want to include Country Concepts in the Dataset. When happy, click the Add Type button.
Step 3: Customise the type
Now you need to choose whether this Concept type is a sub-class of an existing type. If it is a sub-class, you can select the Parent Type from the list of existing types. If you do choose a parent type, you can then choose to include the properties from the parent Concept type in this new (child) type. Typically you should choose to inherit the parent Concept properties.
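To make the effect of inheriting parent properties concrete, here is a minimal sketch that merges a parent type's property names into a child type. The function name effective_properties and the parent type Place are hypothetical, and the merge order is an assumption; Data Graphs may present inherited properties differently.

```python
def effective_properties(own, parent=None, inherit=True):
    """Return the property names a child type ends up with.

    Parent properties come first (an assumed ordering), followed by the
    child's own properties, skipping names that are already present.
    """
    merged = list(parent) if (parent and inherit) else []
    for name in own:
        if name not in merged:
            merged.append(name)
    return merged


# Hypothetical parent type "Place" and child type "Country".
place = ["id", "label", "lat", "longitude"]
country = ["id", "label", "isoCode", "capitalCity"]

with_inheritance = effective_properties(country, parent=place)
without_inheritance = effective_properties(country, parent=place, inherit=False)
```

With inheritance switched on, the child carries the parent's lat and longitude properties as well as its own; with it switched off, only the child's own properties remain.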
Automatic or manual identifiers
You must choose whether to use manual or automatic identifiers. If you choose manual, you will need to supply an identifier for every Concept of this type that you create, and it must be unique with respect to the class / type of Concept. If you choose automatic, you will not need to supply identifiers for each Concept, and Data Graphs will generate a globally unique identifier for you.
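The difference between the two modes can be sketched as follows. This is an illustration only: the ConceptStore class is hypothetical, and a random UUID stands in for whatever identifier scheme Data Graphs actually uses, which this guide does not specify.

```python
import uuid
from typing import Optional


class ConceptStore:
    """Toy model of identifier handling; not the Data Graphs implementation."""

    def __init__(self, manual_ids: bool):
        self.manual_ids = manual_ids
        self._ids_by_type = {}  # type name -> set of identifiers already used

    def add(self, concept_type: str, identifier: Optional[str] = None) -> str:
        ids = self._ids_by_type.setdefault(concept_type, set())
        if self.manual_ids:
            # Manual: the caller supplies the id, unique within the type.
            if identifier is None:
                raise ValueError("manual identifiers: an id is required")
            if identifier in ids:
                raise ValueError(f"duplicate id {identifier!r} for {concept_type}")
        else:
            # Automatic: generate a globally unique id (UUID4 is an assumption).
            identifier = str(uuid.uuid4())
        ids.add(identifier)
        return identifier


manual = ConceptStore(manual_ids=True)
manual.add("Country", "GB")
manual.add("City", "GB")  # allowed: uniqueness is per type

automatic = ConceptStore(manual_ids=False)
generated = automatic.add("Country")
```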
Defining the Concept properties
Now add the properties to your Concept schema. By default each Concept has an id and a label. The id property name cannot be changed, but you can change the label property name to something else (e.g. name is a good alternative).
Now you can define the custom properties that will describe each instance of your Concept. In our example Country has capitalCity, isoCode, flagImage, lat, longitude and borders properties.
For each property you must choose its data type. Property data types can be primitive types like text, decimal, integer and boolean, or specialised data types like latitude / longitude. Alternatively you can make a relationship property that references a different Concept type defined in this or another Dataset.
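As a rough illustration of what choosing a data type implies for the values you enter later, the sketch below checks a value against some of the data types named above. The function is_valid is hypothetical, and the latitude / longitude ranges are the conventional geographic ones rather than rules taken from Data Graphs.

```python
def is_valid(datatype: str, value) -> bool:
    """Check a value against a named data type (illustrative rules only)."""
    if datatype == "text":
        return isinstance(value, str)
    if datatype == "boolean":
        return isinstance(value, bool)
    # bool is a subclass of int in Python, so exclude it from numeric types.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if datatype == "integer":
        return is_number and isinstance(value, int)
    if datatype == "decimal":
        return is_number
    if datatype == "latitude":
        return is_number and -90 <= value <= 90
    if datatype == "longitude":
        return is_number and -180 <= value <= 180
    raise ValueError(f"unknown data type: {datatype}")


checks = [
    is_valid("text", "Paris"),
    is_valid("latitude", 48.86),
    is_valid("latitude", 120),   # out of range
    is_valid("integer", True),   # a boolean is not an integer here
]
```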
For each property, you can choose modifiers that describe whether the property:
- is optional – you do not need to supply a value for an optional field when editing data
- is identifier – if you have selected manual identifiers, you may select one custom property as the field that will uniquely identify Concepts of this type. You do not need to choose a field to be the identifier, and instead you can just use the default id field if this is preferable. If you do define an identifier property, the property datatype must be a keyword or an integer.
- allow multiple – indicates if the property allows multiple values (an array).
- preview image – if you have selected an imageURL datatype, you can choose if this property holds a canonical preview image of the Concept.
- is nested – if you have defined a relationship property that references another Concept, you can optionally choose this to be nested inside the Concept itself. This allows you to create complex data structures within the Concept. A more typical scenario is simply to link to Concepts elsewhere in your Datasets (the knowledge graph and linked-data pattern); in that case, leave this option unchecked.
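The modifier rules above can be summarised as a small validation sketch. The Property class and validate function are hypothetical and model only the constraints stated in the list. The data types assigned to the Country properties, the "relationship" type name, and the choice of which properties are optional or multi-valued are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Property:
    name: str
    datatype: str
    optional: bool = False
    identifier: bool = False
    multiple: bool = False
    preview_image: bool = False
    nested: bool = False


def validate(props, manual_ids: bool) -> list:
    """Return a list of rule violations (empty if the schema is consistent)."""
    errors = []
    identifiers = [p for p in props if p.identifier]
    if identifiers and not manual_ids:
        errors.append("identifier property requires manual identifiers")
    if len(identifiers) > 1:
        errors.append("only one property may be the identifier")
    for p in props:
        if p.identifier and p.datatype not in ("keyword", "integer"):
            errors.append(f"{p.name}: identifier must be keyword or integer")
        if p.preview_image and p.datatype != "imageURL":
            errors.append(f"{p.name}: preview image requires imageURL")
        if p.nested and p.datatype != "relationship":
            errors.append(f"{p.name}: only relationships can be nested")
    return errors


# Country example; the data types shown here are assumed, not from the guide.
country = [
    Property("isoCode", "keyword", identifier=True),
    Property("capitalCity", "text"),
    Property("flagImage", "imageURL", optional=True, preview_image=True),
    Property("lat", "latitude"),
    Property("longitude", "longitude"),
    Property("borders", "relationship", optional=True, multiple=True),
]
problems = validate(country, manual_ids=True)
```

Here borders is left un-nested, matching the more typical pattern of linking to Concepts defined elsewhere.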
Our Country example uses manual identifiers, which means a Concept should either use the standard id property as its identifier or use a newly defined property. In this case we have defined an isoCode field and marked it as the identifier. You can see the other custom properties we have chosen for our Country type below:
When you are happy with this Concept schema definition, click Review Types.
Step 4: Review types and save the Dataset
Finally, you get to review the Concept types you have defined for this Dataset. You can edit, remove, or add more types from this final screen. If you choose to add another type, you will repeat Steps 2 and 3 for the additional Concept type.
When you have defined all your types, click the Save Dataset button and Data Graphs will create an empty Dataset ready for you to add Concepts.