Close

Reguła automatyzacji Jira podczas scalania pull requestu

Posted by: AJ Welch

While the potential alterations made to the schema of a table in Google BigQuery are fairly limited, there are a essentially two types of changes you are allowed to perform. You may add a new NULLABLE or REPEATED column, or you may alter an existing column’s mode (from REQUIRED to NULLABLE). Beyond that, BigQuery does not allow many other changes, such as column removal or renaming (though these can be performed indirectly by copying the columns you wish to retain to a new table, destroying the original, then creating a replacement table with the new data).

For the time being we’ll go over the methods for adding a new column to a table in this tutorial.


Nie ograniczanie się do Agile


Adding a column through the BigQuery WebUI is a very simple process:

  • Open the BigQuery WebUI.
  • Select the projectdataset, and finally table you wish to alter.
  • Click the Add New Fields button.
  • For each field you wish to add, enter the name, select the type, and alter the mode (if necessary).
  • Once complete, click the Add to Table button to finalize your updates.

Adding nested records

For more complex schema additions such as nested fields within a Record type field, click the + symbol next to the Record type selection box. Now the new field box that was added will be nested within the Record field above.

Nie ograniczanie się do Agile


To update a table and add a new column using the BigQuery API, you have two options for which method to utilize: Tables.patch or Tables.updateTables.patch only updates the fields that are added/modified, whereas Tables.update replaces the entire table resource with the new schema you provided. Therefore, in almost all cases, Tables.patch is the preferred method.

As with all API calls in BigQuery, the key is formatting your configuration string appropriately before submitting it to the API method. In this case, the call to Tables.patch requires the full table schema be supplied, which just contains the fields list, which is a list of fields containing namemode, and type values.

The important thing to note is that whether you are adding columns or just updating modes for existing columns, you must supply the full schema to the API call.

For example, we have the simple melville table with a few existing fields: BookMeta_TitleBookMeta_DateBookMeta_CreatorBookMeta_Language, and BookMeta_Publisher. We want to add a last_updated field of the TIMESTAMP type, so our entire configuration script for the API looks like this:

{
  "schema": {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "BookMeta_Title",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "BookMeta_Date",
        "type": "TIMESTAMP"
      },
      {
        "mode": "NULLABLE",
        "name": "BookMeta_Creator",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "BookMeta_Language",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "BookMeta_Publisher",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "last_updated",
        "type": "TIMESTAMP"
      }
    ]
  }
}

In essence, all we’ve done is listed all the existing fields exactly, then added our new last_updated field to the bottom. Once submitted via the API patch call, we get the expected result:

{
 "kind": "bigquery#table",
 "etag": "\"T7tifokHjXTVbjJPVpstHNnQ7nk/MTQ2OTMyMzMyOTIxMQ\"",
 "id": "bookstore-1382:exports.melville",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/bookstore-1382/datasets/exports/tables/melville",
 "tableReference": {
  "projectId": "bookstore-1382",
  "datasetId": "exports",
  "tableId": "melville"
 },
 "schema": {
  "fields": [
   {
    "name": "BookMeta_Title",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "BookMeta_Date",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
   },
   {
    "name": "BookMeta_Creator",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "BookMeta_Language",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "BookMeta_Publisher",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "last_updated",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
   }
  ]
 },
 "numBytes": "1344",
 "numLongTermBytes": "0",
 "numRows": "12",
 "creationTime": "1469323329211",
 "lastModifiedTime": "1469323329211",
 "type": "TABLE"
}

And sure enough the table schema now has our new field added on!


Następny temat