Notice
Recent Posts
Recent Comments
Link
«   2024/07   »
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Tags
more
Archives
Today
Total
관리 메뉴

JINIers

220524_BigQuery in JupyterLab on Vertex AI 2.5 본문

GCP/Qwiklabs

220524_BigQuery in JupyterLab on Vertex AI 2.5

JINIers 2022. 5. 24. 13:58

개요

  • vertex ai에서 jupyter 노트북 인스턴스화
  • jupyter 노트북 내에서 bigquery 쿼리를 실행하고 pandas를 사용하여 출력 처리

작업 1. jupyterLab 노트북 인스턴스 시작

 

vertex ai > workbench > user-managed notebooks > new notebooks > python3 > create > jupyterlab 열기


작업 2. bigquery 쿼리실행

 

[1] google cloud 모듈설치

!pip install google-cloud-bigquery==1.25.0 --use-feature=2020-resolver



실행 후 restart kernel > restart


[2] 마술기능 사용 쿼리

%%bigquery df                                    # %%bigquery : 마술기능제공
SELECT
  depdelay as departure_delay,
  COUNT(1) AS num_flights,
  APPROX_QUANTILES(arrdelay, 10) AS arrival_delay_deciles
FROM
  `cloud-training-demos.airline_ontime_data.flights`
WHERE
 depdelay is not null
GROUP BY
  depdelay
HAVING
  num_flights > 100
ORDER BY
  depdelay ASC

 


[3] 상위 5개의 행을 출력

df.head()

 


작업 3. make a plot with pandas


[4] 원시 쿼리 출력 랭글링

import pandas as pd
percentiles = df['arrival_delay_deciles'].apply(pd.Series)
percentiles.rename(columns = lambda x : '{0}%'.format(x*10), inplace=True)
percentiles.head()




[5] percentiles table - departure_delay dataframe 연결

df = pd.concat([df['departure_delay'], percentiles], axis=1)
df.head()

 


[6] 필드에 저장된 극단값 삭제

df.drop(labels=['0%', '100%'], axis=1, inplace=True)
df.plot(x='departure_delay', xlim=(-30,50), ylim=(-50,50));

Comments