Reguła automatyzacji Jira podczas scalania pull requestu
Posted by: AJ Welch
While the potential alterations made to the schema of a table in Google BigQuery are fairly limited, there are a essentially two types of changes you are allowed to perform. You may add a new NULLABLE
or REPEATED
column, or you may alter an existing column’s mode
(from REQUIRED
to NULLABLE
). Beyond that, BigQuery does not allow many other changes, such as column removal or renaming (though these can be performed indirectly by copying the columns you wish to retain to a new table, destroying the original, then creating a replacement table with the new data).
For the time being we’ll go over the methods for adding a new column to a table in this tutorial.
Nie ograniczanie się do Agile
Adding a column through the BigQuery WebUI is a very simple process:
- Open the BigQuery WebUI.
- Select the
project
,dataset
, and finallytable
you wish to alter. - Click the
Add New Fields
button. - For each field you wish to add, enter the
name
, select thetype
, and alter themode
(if necessary). Once complete, click the
Add to Table
button to finalize your updates.
Adding nested records
For more complex schema additions such as nested fields within a Record
type field, click the +
symbol next to the Record
type selection box. Now the new field box that was added will be nested within the Record
field above.
Nie ograniczanie się do Agile
To update a table and add a new column using the BigQuery API, you have two options for which method to utilize: Tables.patch or Tables.update
. Tables.patch
only updates the fields that are added/modified, whereas Tables.update
replaces the entire table resource with the new schema you provided. Therefore, in almost all cases, Tables.patch
is the preferred method.
As with all API calls in BigQuery, the key is formatting your configuration string appropriately before submitting it to the API method. In this case, the call to Tables.patch
requires the full table schema
be supplied, which just contains the fields
list, which is a list of fields containing name
, mode
, and type
values.
The important thing to note is that whether you are adding columns or just updating modes
for existing columns, you must supply the full schema to the API call.
For example, we have the simple melville
table with a few existing fields: BookMeta_Title
, BookMeta_Date
, BookMeta_Creator
, BookMeta_Language
, and BookMeta_Publisher
. We want to add a last_updated
field of the TIMESTAMP type
, so our entire configuration script for the API looks like this:
{
"schema": {
"fields": [
{
"mode": "NULLABLE",
"name": "BookMeta_Title",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "BookMeta_Date",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "BookMeta_Creator",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "BookMeta_Language",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "BookMeta_Publisher",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "last_updated",
"type": "TIMESTAMP"
}
]
}
}
In essence, all we’ve done is listed all the existing fields exactly, then added our new last_updated
field to the bottom. Once submitted via the API patch call, we get the expected result:
{
"kind": "bigquery#table",
"etag": "\"T7tifokHjXTVbjJPVpstHNnQ7nk/MTQ2OTMyMzMyOTIxMQ\"",
"id": "bookstore-1382:exports.melville",
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/bookstore-1382/datasets/exports/tables/melville",
"tableReference": {
"projectId": "bookstore-1382",
"datasetId": "exports",
"tableId": "melville"
},
"schema": {
"fields": [
{
"name": "BookMeta_Title",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "BookMeta_Date",
"type": "TIMESTAMP",
"mode": "NULLABLE"
},
{
"name": "BookMeta_Creator",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "BookMeta_Language",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "BookMeta_Publisher",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "last_updated",
"type": "TIMESTAMP",
"mode": "NULLABLE"
}
]
},
"numBytes": "1344",
"numLongTermBytes": "0",
"numRows": "12",
"creationTime": "1469323329211",
"lastModifiedTime": "1469323329211",
"type": "TABLE"
}
And sure enough the table schema now has our new field added on!