JINIers
220607_Using BigQuery to do Analysis 본문
BigQuery를 사용하여 분석하기
[학습내용]
- bigquery 콘솔에서 대화형 쿼리 수행
- 다양한 데이터세트를 조합하여 분석을 실행
1. 자전거 대여 데이터 살펴보기
bigquery > add data > explore public datasets > 'NYC bike' 검색 > view datasets
* 테이블에 뜨지 않으면 pin a project > enter project name > project name 입력
bigquery-public-data > new_york_citibike > citibike_trips 선택
# 쿼리 입력
SELECT
MIN(start_station_name) AS start_station_name,
MIN(end_station_name) AS end_station_name,
APPROX_QUANTILES(tripduration, 10)[OFFSET (5)] AS typical_duration,
COUNT(tripduration) AS num_trips
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`
WHERE
start_station_id != end_station_id
GROUP BY
start_station_id,
end_station_id
ORDER BY
num_trips DESC
LIMIT
10
# 각 자전거의 총 운행거리 쿼리 입력
WITH
trip_distance AS (
SELECT
bikeid,
ST_Distance(ST_GeogPoint(s.longitude,
s.latitude),
ST_GeogPoint(e.longitude,
e.latitude)) AS distance
FROM
`bigquery-public-data.new_york_citibike.citibike_trips`,
`bigquery-public-data.new_york_citibike.citibike_stations` as s,
`bigquery-public-data.new_york_citibike.citibike_stations` as e
WHERE
start_station_id = s.station_id
AND end_station_id = e.station_id )
SELECT
bikeid,
SUM(distance)/1000 AS total_distance
FROM
trip_distance
GROUP BY
bikeid
ORDER BY
total_distance DESC
LIMIT
5
2. 날씨 데이터 세트 탐색
ghcnd_d > ghcnd_2015 선택
# 쿼리에서 제공한 ID와 일치하는 뉴욕 시내 기상관측소에서 관측된 2015년 모든 날짜의 강수량 표시
SELECT
wx.date,
wx.value/10.0 AS prcp
FROM
`bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
WHERE
id = 'USW00094728'
AND qflag IS NULL
AND element = 'PRCP'
ORDER BY
wx.date
3. 강우 여부와 자전거 대여 간의 연관성 찾기
# 소스가 전혀 다른 자전거 대여 데이터세트 + 날씨 데이터세트를 결합한 결과 확인
WITH bicycle_rentals AS (
SELECT
COUNT(starttime) as num_trips,
EXTRACT(DATE from starttime) as trip_date
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
GROUP BY trip_date
),
rainy_days AS
(
SELECT
date,
(MAX(prcp) > 5) AS rainy
FROM (
SELECT
wx.date AS date,
IF (wx.element = 'PRCP', wx.value/10, NULL) AS prcp
FROM
`bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
WHERE
wx.id = 'USW00094728'
)
GROUP BY
date
)
SELECT
ROUND(AVG(bk.num_trips)) AS num_trips,
wx.rainy
FROM bicycle_rentals AS bk
JOIN rainy_days AS wx
ON wx.date = bk.trip_date
GROUP BY wx.rainy
'GCP > Qwiklabs' 카테고리의 다른 글
220608_Loading Taxi Data into Google Cloud SQL 2.5 (0) | 2022.06.09 |
---|---|
220609_Loading data into BigQuery (0) | 2022.06.09 |
220603_Cloud Functions: Qwik Start - 콘솔 (0) | 2022.06.03 |
220524_BigQuery in JupyterLab on Vertex AI 2.5 (0) | 2022.05.24 |
220524_Analyzing Natality Data Using Vertex AI and BigQuery (0) | 2022.05.24 |
Comments