Buildfiles
A buildfile contains a list of operations to be performed on data. Think of it as a script for a spreadsheet.
Both JSON and YAML formats are supported. databuild guesses the format from the file extension.
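Extension-based detection of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not databuild's actual code: the `guess_format` function and the `EXTENSION_TO_FORMAT` table are hypothetical names, and the set of recognised extensions is an assumption.

```python
from pathlib import Path

# Hypothetical mapping, for illustration only; the extensions that
# databuild really recognises may differ.
EXTENSION_TO_FORMAT = {".json": "json", ".yaml": "yaml"}


def guess_format(filename):
    """Return 'json' or 'yaml' depending on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"Unsupported buildfile extension: {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


print(guess_format("buildfile.json"))
print(guess_format("1_import.yaml"))
```

The practical consequence is that a buildfile should carry the extension that matches its content, since the name is what decides which parser is used.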
An example buildfile could look like this:
[
    {
        "operation": "sheets.import_data",
        "description": "Importing data from csv file",
        "params": {
            "sheet": "dataset1",
            "format": "csv",
            "filename": "dataset1.csv",
            "skip_last_lines": 1
        }
    },
    {
        "operation": "columns.add_column",
        "description": "Calculate the gender ratio",
        "params": {
            "sheet": "dataset1",
            "name": "Gender Ratio",
            "expression": {
                "language": "python",
                "content": "return float(row['Male Total']) / float(row['Female Total'])"
            }
        }
    },
    {
        "operation": "sheets.export_data",
        "description": "save the data",
        "params": {
            "sheet": "dataset1",
            "format": "csv",
            "filename": "dataset2.csv"
        }
    }
]
The same buildfile in YAML:
- operation: sheets.import_data
  description: Importing data from csv file
  params:
    sheet: dataset1
    format: csv
    filename: dataset1.csv
    skip_last_lines: 1
- operation: columns.add_column
  description: Calculate the gender ratio
  params:
    sheet: dataset1
    name: Gender Ratio
    expression:
      language: python
      content: "return float(row['Male Total']) / float(row['Female Total'])"
- operation: sheets.export_data
  description: save the data
  params:
    sheet: dataset1
    format: csv
    filename: dataset2.csv
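To make the columns.add_column step more concrete, the expression above can be read as the body of a function that is called once per row. The following is a minimal sketch in plain Python, not databuild's implementation: the `gender_ratio` function name and the sample rows are made up for illustration, and rows are assumed to behave like dictionaries whose values arrive as strings from the CSV file.

```python
def gender_ratio(row):
    # Same body as the "content" field of the add_column operation.
    return float(row['Male Total']) / float(row['Female Total'])


# Made-up sample rows; values are strings, as they would be when read from CSV.
rows = [
    {"Male Total": "120", "Female Total": "100"},
    {"Male Total": "45", "Female Total": "90"},
]

for row in rows:
    # The new column takes its name from the "name" parameter.
    row["Gender Ratio"] = gender_ratio(row)

print([row["Gender Ratio"] for row in rows])
```

Note that an expression written this way fails on any row where 'Female Total' is zero or is not a number, so real data may need a guard around the division.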
You can split a buildfile into several files and have databuild process the directory that contains them.
Buildfiles are executed in alphabetical order, so it is recommended to start each file name with a number indicating its order of execution. For example:
├── buildfiles
│   ├── 1_import.yaml
│   ├── 2_add_column.yaml
│   └── 3_export.yaml
└── data
    └── dataset1.csv
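One caveat with numeric prefixes: if "alphabetical order" means plain string comparison (an assumption, as the exact sort databuild uses is not stated here), then 10 sorts before 2 once there are more than nine files. The sketch below uses Python's built-in `sorted` with hypothetical file names to show the effect and how zero-padding avoids it.

```python
# Hypothetical file names, used only to illustrate string ordering.
unpadded = ["1_import.yaml", "2_add_column.yaml", "10_export.yaml"]
padded = ["01_import.yaml", "02_add_column.yaml", "10_export.yaml"]

# Plain string comparison goes character by character, so "10_..." < "1_...".
print(sorted(unpadded))
# ['10_export.yaml', '1_import.yaml', '2_add_column.yaml']

# With zero-padded prefixes the string order matches the intended order.
print(sorted(padded))
# ['01_import.yaml', '02_add_column.yaml', '10_export.yaml']
```

Under that assumption, zero-padded prefixes such as 01_, 02_, ... keep the execution order stable as the number of buildfiles grows.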