A data source binds external data to Chioro. It reads or receives the data from an external endpoint and converts it to Chioro’s internal format during execution.
Data sources represent the starting points of a flow, so to speak. Consequently, each flow requires at least one data source.
The configuration of a data source covers three main aspects:
- In what format is the data available?
- With which character set should the data be read?
- Where does the data come from?
Type of the data source
Currently there are four basic ways to import product data into Chioro:
- Upload a file via the browser. Mainly intended for testing and experimenting.
- Remote file. Direct (pull) access to data via a number of protocols, including FTP, SFTP, S3 (various providers), and Microsoft Azure. See below for more details.
- API Push. Using the Push API, it is possible to actively send data to a Chioro data source. See below for more details.
- Operation. The result of an operation from another flow can be imported into the data source and used as a source.
Upload file
If this type is selected, a gray area appears where a file can be dragged and dropped. Alternatively, clicking on the area opens a file selection dialog.
- While uploading the file please do not navigate to another page, close the browser or refresh the page. Otherwise, the file upload will have to be performed again.
- Please also note that uploading is only useful for smaller files up to about 100 MB.
Remote file
For direct access to a remote file, several additional settings are required:
- An endpoint is selected in the Storage field. A storage is configured in the Admin menu; for example, the access credentials for an S3 bucket are stored there. The name assigned there appears here for selection.
- The Path field then specifies the path of the file to be used relative to this endpoint.
At least the root path must be entered, i.e. a single slash: /
If the file is in a subfolder, the path could look like this: /Merchant/Subfolder/Input.
- In Filename, the file can be specified directly, e.g. CustomerData.csv, or a wildcard can be used. If the wildcard matches more than one file, Chioro takes the newest one.
An example, in which * stands for any sequence of characters:
The following files are in a folder:
filename | modification date |
---|---|
data1.csv | 1.1.2020 10:00 |
data2.csv | 1.1.2020 10:30 |
data3.csv | 1.1.2020 10:45 |
data1.json | 1.1.2020 11:30 |
data2.json | 1.1.2020 11:00 |
data1.xml | 1.1.2020 12:30 |
Examples of which file is selected for import:
wildcard | selected file | comment |
---|---|---|
*.csv | data3.csv | The newest file with the extension .csv |
*2.csv | data2.csv | The only file to which *2.csv applies |
* | data1.xml | The most recent/newest of all files |
data1.* | data1.xml | The newest file starting with data1 |
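The selection rule from the table can be reproduced with a short Python sketch. It is an illustration only, not Chioro's implementation: it assumes shell-style matching as provided by Python's `fnmatch` module (which also gives `?` and `[...]` a special meaning), and `pick_newest` is a hypothetical helper name.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# The folder listing from the table above: (filename, modification date).
FILES = [
    ("data1.csv", datetime(2020, 1, 1, 10, 0)),
    ("data2.csv", datetime(2020, 1, 1, 10, 30)),
    ("data3.csv", datetime(2020, 1, 1, 10, 45)),
    ("data1.json", datetime(2020, 1, 1, 11, 30)),
    ("data2.json", datetime(2020, 1, 1, 11, 0)),
    ("data1.xml", datetime(2020, 1, 1, 12, 30)),
]


def pick_newest(pattern, files=FILES):
    """Return the newest filename matching the pattern, or None if nothing matches."""
    matches = [(modified, name) for name, modified in files
               if fnmatchcase(name, pattern)]
    if not matches:
        return None
    # Tuples compare by modification date first, so max() yields the newest file.
    return max(matches)[1]


for pattern in ("*.csv", "*2.csv", "*", "data1.*"):
    print(pattern, "->", pick_newest(pattern))
```

Running the loop gives the same four results as the table, which makes it a convenient way to check a wildcard before entering it in the Filename field.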
Endpoints represent a kind of “base URL”. Since these usually contain confidential information such as passwords or API keys, their configuration is done in the admin area. Appropriate admin rights are required for this. Please contact your Chioro administrator or our Support if you do not have the appropriate rights.
API Push
If this type is selected, a URL specific to this data source will be generated and displayed. This URL can be used to send data to the data source via HTTP POST.
Data sets that are sent to the Push API are first buffered in the data source. The data source becomes “active” only when a Ready-For-Processing message is sent. Only then is the data analyzed and made available for subsequent operations, and only then does a trigger recognize the data source as “ready with new data”.
A push data source can, however, be activated immediately by starting the import manually (via the “Import” button). In that case, only partial data may be available.
More about using the Push API can be found below under External data sources.
Operation
With this option, the result of an operation from another flow can be selected and used as a source.
In this example, the data source is the output of a split named “Split” imported from “Flow number 1”. This data can then be used for any further operations.
However, it is also possible to use another operation as a source:
A data source of another flow can also be set as the source for the current flow. All operations except “Data destination” can be used as a source.
Data formats
Currently Chioro supports the following import formats:
- JSON
- Excel (xlsx)
- CSV
- BMEcat v1.2
- BMEcat 2005
- Data formats for connection to the commercetools product data API
- User-specific formats, if applicable, provided these are activated
For further, e.g. user-specific data formats, please contact our Support.
External data sources
This type of data source enables the integration of external systems. To use these sources, an external program must be able to transfer records in a specific format (JSON) via “REST over HTTP”. Unlike the rest of the Chioro API, this endpoint requires a special API key, which must be sent as part of the HTTP headers. The following describes the header and body format in detail.
Header
To create an API token, please go to the admin area and create a configuration of type ‘API Key’. Please note down the API key that is displayed immediately after creation. This key is displayed only once and cannot be viewed afterwards. To use the external API, please send this key in the HTTP header in the following format (replace <<API_TOKEN>> below):
Authorization: Token <<API_TOKEN>>
Body
The request body must be formatted as follows:
{
"executionId": "an ID, not required but recommended",
"operation": "one of CLEAR, APPEND or READY_TO_PROCESS. Description below",
"data" : [{}, {}, {}, ...]
}
Please note that only the “POST” method is supported.
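As a minimal sketch of the header and body format described above, the following Python function builds such a request with the standard library. The function name `build_push_request` and the `Content-Type` header are assumptions, not part of the documented API, and the push URL must be the one displayed for the data source.

```python
import json
import urllib.request

OPERATIONS = ("CLEAR", "APPEND", "READY_TO_PROCESS")


def build_push_request(push_url, api_token, operation, data=None, execution_id=None):
    """Build (but do not send) a POST request for a push data source."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    body = {"operation": operation}
    if execution_id is not None:
        body["executionId"] = execution_id  # not required but recommended
    if data is not None:
        body["data"] = list(data)  # always sent as an array
    return urllib.request.Request(
        push_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_token}",
            # Assumption: the content type is not stated in this documentation.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to actually send it. Building and sending are kept separate here so the request can be inspected or logged first.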
Operation
- CLEAR: the data source is recreated and emptied. All existing data will be removed.
- APPEND: adds a record or a set of records to the data source. If the data source does not exist, it will be created.
- READY_TO_PROCESS: indicates the logical end of the data stream. At this point the data source is ready to be used and the import will be performed.
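One plausible way to combine the three operations is a full reload: clear the data source, append the records in batches, then signal the end of the stream. The sketch below only generates the request bodies. The helper name `push_sequence` and the default batch size are arbitrary assumptions, and the documentation does not prescribe this order.

```python
def push_sequence(records, batch_size=500, execution_id=None):
    """Yield request bodies: one CLEAR, the APPEND batches, then READY_TO_PROCESS."""

    def body(operation, data=None):
        payload = {"operation": operation}
        if execution_id is not None:
            payload["executionId"] = execution_id
        if data is not None:
            payload["data"] = data
        return payload

    yield body("CLEAR")
    # Send the records in slices of at most batch_size elements.
    for start in range(0, len(records), batch_size):
        yield body("APPEND", records[start:start + batch_size])
    yield body("READY_TO_PROCESS")
```

Using the same `executionId` for every body of one run keeps the requests of that run identifiable as belonging together.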
Data
data must always be specified as an array, even if it contains only one element. For some operations, such as CLEAR, data is optional; when it is provided, the data source is first cleared and then filled with the data.
The format of the data is not predefined and depends on the client. However, an important limitation of Chioro is that once an attribute has been transmitted in a certain format, all subsequent lines must send that attribute in the same format. For example, if the attribute “color” is sent as the string “red”, no other line may send the same attribute “color” in a nested format like {“hue” : 12, “saturation”: “22”, …}. So “color” must not appear once as a simple string and another time as a nested structure.
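A client can check this constraint before sending anything. The following sketch compares the JSON kind of each top-level attribute across all records. The function names are hypothetical, and treating null as its own kind is an assumption, since how Chioro compares formats internally is not described here.

```python
def json_kind(value):
    """Map a Python value to the name of its JSON kind."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before numbers: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def find_format_conflicts(records):
    """Return (record index, attribute, first kind, conflicting kind) tuples."""
    first_seen = {}
    conflicts = []
    for index, record in enumerate(records):
        for attribute, value in record.items():
            kind = json_kind(value)
            expected = first_seen.setdefault(attribute, kind)
            if kind != expected:
                conflicts.append((index, attribute, expected, kind))
    return conflicts
```

For the “color” example, a second record that sends a nested object after a first record sent a string is reported as a conflict between "string" and "object".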
Response
If everything was imported successfully, the operation returns an empty list. If one of the lines causes an error, that error appears in the response list.
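A client might evaluate the response along the following lines. This sketch assumes the response body is a JSON list, and it makes no assumption about the shape of the individual error entries, which is not specified here. `import_errors` is a hypothetical helper name.

```python
import json


def import_errors(response_text):
    """Parse the response body and return the reported errors (empty on success)."""
    errors = json.loads(response_text)
    if not isinstance(errors, list):
        raise ValueError("expected a JSON list in the response")
    return errors
```

An empty result means the request was imported without errors. Any entries should be logged or shown to the user as they are.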