Notice
Recent Posts
Recent Comments
Link
«   2024/07   »
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
Tags
more
Archives
Today
Total
관리 메뉴

JINIers

220524_Analyzing Natality Data Using Vertex AI and BigQuery 본문

GCP/Qwiklabs

220524_Analyzing Natality Data Using Vertex AI and BigQuery

JINIers 2022. 5. 24. 09:51

개요

  • vertex ai 노트북 실행
  • bigquery 쿼리 호출
  • jupyter에서 차트 만들기
  • 기계 학습을 위한 데이터 내보내기

작업 1. bigquery 호출

bigquery > 쿼리편집기

 

# 00 ~ 05년 사이에 미국에서 태어난 세쌍둥이 찾기

SELECT
  plurality,
  COUNT(1) AS num_babies,
  AVG(weight_pounds) AS ave_weight
FROM
  `bigquery-public-data.samples.natality`
WHERE
  year > 2000 AND year < 2005
GROUP BY
  plurality

작업 2. vertex ai 노트북 실행

vertex ai > workbench > new notebook > python3 > create
region : us-central1
나머지 : 기본값

jupyter 열기


작업 3. vertex ai에서 데이터 시각화

 

[1] 모듈 설치 및 업그레이드

!pip install google-cloud-bigquery==1.25.0
!pip install --upgrade google-cloud-bigquery-storage

 

# 실행 시 오류가 날 것 → 다시시작을 하면 된다.


[2] 클라이언트 라이브러리 가져오기 & 클라이언트 초기화

from google.cloud import bigquery
client = bigquery.Client()

 


[3] BigQuery natality 공개 데이터세트에 대한 쿼리 실행

# 69 ~ 08까지 등록된 모든 미국출생을 설명하는 쿼리

sql = """
  SELECT
    plurality,
    COUNT(1) AS count,
    year
  FROM
    `bigquery-public-data.samples.natality`
  WHERE
    NOT IS_NAN(plurality) AND plurality > 1
  GROUP BY
    plurality, year
  ORDER BY
    count DESC
"""
df = client.query(sql).to_dataframe()
df.head() # 상위 5개의 행 출력

 


[4] 시간경과에 따른 복수 출생 수 누적 그래프 생성

pivot_table = df.pivot(index='year', columns='plurality', values='count')
pivot_table.plot(kind='bar', stacked=True, figsize=(15,7));


[5] 성별에 따른 아기 체중

sql = """
  SELECT
    is_male,
    AVG(weight_pounds) AS ave_weight
  FROM
    `bigquery-public-data.samples.natality`
  GROUP BY
    is_male
"""
df = client.query(sql).to_dataframe()
df.plot(x='is_male', y='ave_weight', kind='bar');

 

[6] 임신 주수에 따라 아기 체중이 어떻게 변하는지 확인

sql = """
  SELECT
    gestation_weeks,
    AVG(weight_pounds) AS ave_weight
  FROM
    `bigquery-public-data.samples.natality`
  WHERE
    NOT IS_NAN(gestation_weeks) AND gestation_weeks <> 99
  GROUP BY
    gestation_weeks
  ORDER BY
    gestation_weeks
"""
df = client.query(sql).to_dataframe()
df.plot(x='gestation_weeks', y='ave_weight', kind='bar');

 

Comments