Close

Règle d'automatisation Jira pour le merge d'une pull request

Posted by: AJ Welch

As with most relational databases, there may often be situations where you need to combine the results of multiple queries into one single dataset when using Google BigQuery. Typically in BigQuery, this occurs when you’re gathering data from multiple tables or even across datasets, and this is where the power of using a UNION comes into play.

In this tutorial we’ll examine uniting results in BigQuery using both the default Legacy SQL syntax as well as the optional Standard SQL syntax.


Au-delà d'Agile


The default syntax of Legacy SQL in BigQuery makes uniting results rather simple. In fact, all it requires at the most basic level is listing the various tables in a comma-delimited list within the FROM clause.

For example, assuming all data sources contain identical columns, we can query three different tables in the gdelt-bq:hathitrustbooks dataset and combine the result set with the following query:

SELECT
  BookMeta_Identifier,
  BookMeta_Date,
  BookMeta_Title
FROM
  [gdelt-bq:hathitrustbooks.1920],
  [gdelt-bq:hathitrustbooks.1921],
  [gdelt-bq:hathitrustbooks.1922]
ORDER BY
  BookMeta_Identifier DESC
LIMIT
  1000

Since each of the tables contain the same columns and in the same order, we don’t need to specify anything extra in either the SELECT clause nor the filter options that follow, and yet BigQuery is intelligent enough to translate this query into a UNION ALL to combine all the results into one dataset.

The gathered results from all three tables are ordered, as intended, by the BookMeta_Identifier column:

[
  {
    "BookMeta_Identifier": "yul.11995746_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Foundations a statement of Christian belief in terms of modern thought: by seven Oxford men."
  },
  {
    "BookMeta_Identifier": "yul.11729715_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Frankenstein or, The modern Prometheus, by Mary W. Shelley."
  },
  {
    "BookMeta_Identifier": "yul.11368340_000_00",
    "BookMeta_Date": "1920",
    "BookMeta_Title": "Pilgrim tercentenary, 1620-1920 "
  },
  {
    "BookMeta_Identifier": "yul.11165435_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Social and diplomatic memories, 1884-1893 by the Right Hon. Sir James Rennell Rodd ..."
  },
  {
    "BookMeta_Identifier": "yale.39002044555168",
    "BookMeta_Date": "1921",
    "BookMeta_Title": "South India and her Muhammadan invaders, by S. Krishnaswami Aiyangar."
  },
  ...
]

Au-delà d'Agile


While the comma-delimited union method seen above for Legacy SQL is convenient, using the Standard SQL option with Google BigQuery requires a more verbose (yet also more familiar) method when combining result sets.

Instead of using comma-delimation, we must revert back to the normal use of the UNION option (followed by the ALL or DISTINCT keyword, as appropriate). For example, to perform the exact same uniting query on the gdelt-bq:hathitrustbooks dataset as seen above, our full query will look like this:

SELECT
    BookMeta_Identifier,
    BookMeta_Date,
    BookMeta_Title
  FROM
    `gdelt-bq.hathitrustbooks.1920`

UNION ALL

  SELECT
    BookMeta_Identifier,
    BookMeta_Date,
    BookMeta_Title
  FROM
    `gdelt-bq.hathitrustbooks.1921`

UNION ALL

  SELECT
    BookMeta_Identifier,
    BookMeta_Date,
    BookMeta_Title
  FROM
    `gdelt-bq.hathitrustbooks.1922`
  ORDER BY
    BookMeta_Identifier DESC
  LIMIT
    1000

While the spacing and indentation in the example above is irrelevant, it better illustrates what we’re doing: Taking the results of (3) unique queries, uniting them via (2) UNION ALL options, and then sorting and limiting the result set afterward.

As expected, the final united results are identical to the comma-delimited method using Legacy SQL:

[
  {
    "BookMeta_Identifier": "yul.11995746_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Foundations a statement of Christian belief in terms of modern thought: by seven Oxford men."
  },
  {
    "BookMeta_Identifier": "yul.11729715_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Frankenstein or, The modern Prometheus, by Mary W. Shelley."
  },
  {
    "BookMeta_Identifier": "yul.11368340_000_00",
    "BookMeta_Date": "1920",
    "BookMeta_Title": "Pilgrim tercentenary, 1620-1920 "
  },
  {
    "BookMeta_Identifier": "yul.11165435_000_00",
    "BookMeta_Date": "1922",
    "BookMeta_Title": "Social and diplomatic memories, 1884-1893 by the Right Hon. Sir James Rennell Rodd ..."
  },
  {
    "BookMeta_Identifier": "yale.39002044555168",
    "BookMeta_Date": "1921",
    "BookMeta_Title": "South India and her Muhammadan invaders, by S. Krishnaswami Aiyangar."
  },
  ...
]

Thème suivant