Notice
Recent Posts
Recent Comments
Link
«   2024/12   »
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
Tags
more
Archives
Today
Total
관리 메뉴

JINIers

220607_Using BigQuery to do Analysis 본문

GCP/Qwiklabs

220607_Using BigQuery to do Analysis

JINIers 2022. 6. 7. 13:56

BigQuery를 사용하여 분석하기

 

 

[학습내용]

  1. bigquery 콘솔에서 대화형 쿼리 수행
  2. 다양한 데이터세트를 조합하여 분석을 실행

1. 자전거 대여 데이터 살펴보기

bigquery > add data > explore public datasets > 'NYC bike' 검색 > view datasets


* 테이블에 뜨지 않으면 pin a project > enter project name > project name 입력



bigquery-public-data > new_york_citibike > citibike_trips 선택


# 쿼리 입력

SELECT
  MIN(start_station_name) AS start_station_name,
  MIN(end_station_name) AS end_station_name,
  APPROX_QUANTILES(tripduration, 10)[OFFSET (5)] AS typical_duration,
  COUNT(tripduration) AS num_trips
FROM
  `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE
  start_station_id != end_station_id
GROUP BY
  start_station_id,
  end_station_id
ORDER BY
  num_trips DESC
LIMIT
  10



# 각 자전거의 총 운행거리 쿼리 입력

WITH
  trip_distance AS (
SELECT
  bikeid,
  ST_Distance(ST_GeogPoint(s.longitude,
      s.latitude),
    ST_GeogPoint(e.longitude,
      e.latitude)) AS distance
FROM
  `bigquery-public-data.new_york_citibike.citibike_trips`,
  `bigquery-public-data.new_york_citibike.citibike_stations` as s,
  `bigquery-public-data.new_york_citibike.citibike_stations` as e
WHERE
  start_station_id = s.station_id
  AND end_station_id = e.station_id )
SELECT
  bikeid,
  SUM(distance)/1000 AS total_distance
FROM
  trip_distance
GROUP BY
  bikeid
ORDER BY
  total_distance DESC
LIMIT
  5


2. 날씨 데이터 세트 탐색

ghcnd_d > ghcnd_2015 선택


# 쿼리에서 제공한 ID와 일치하는 뉴욕 시내 기상관측소에서 관측된 2015년 모든 날짜의 강수량 표시

SELECT
  wx.date,
  wx.value/10.0 AS prcp
FROM
  `bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
WHERE
  id = 'USW00094728'
  AND qflag IS NULL
  AND element = 'PRCP'
ORDER BY
  wx.date


3. 강우 여부와 자전거 대여 간의 연관성 찾기

 

# 소스가 전혀 다른 자전거 대여 데이터세트 + 날씨 데이터세트를 결합한 결과 확인

WITH bicycle_rentals AS (
  SELECT
    COUNT(starttime) as num_trips,
    EXTRACT(DATE from starttime) as trip_date
  FROM `bigquery-public-data.new_york_citibike.citibike_trips`
  GROUP BY trip_date
),
rainy_days AS
(
SELECT
  date,
  (MAX(prcp) > 5) AS rainy
FROM (
  SELECT
    wx.date AS date,
    IF (wx.element = 'PRCP', wx.value/10, NULL) AS prcp
  FROM
    `bigquery-public-data.ghcn_d.ghcnd_2015` AS wx
  WHERE
    wx.id = 'USW00094728'
)
GROUP BY
  date
)
SELECT
  ROUND(AVG(bk.num_trips)) AS num_trips,
  wx.rainy
FROM bicycle_rentals AS bk
JOIN rainy_days AS wx
ON wx.date = bk.trip_date
GROUP BY wx.rainy

Comments