GCP Associate Cloud Engineer Cheatsheet

Organization & Resource Hierarchy

Resource Hierarchy

  • Organisation - tied to a G Suite or Cloud Identity domain
  • Folder - contains any number of projects & subfolders
  • Project - container for a set of related resources
  • Projects are separate from billing accounts; free tier limits apply to resource usage, not to individual projects.
  • There is no way to clone a project; you must re-create each resource as needed.
  • Trust boundaries
    • Resources within a project “trust” each other. For instance, by default a GCE instance in a project can access a GCS bucket in the same project.

Permission Hierarchy

  • Permissions follow the pattern Service.Resource.Verb and usually correspond to REST API methods, e.g. pubsub.subscriptions.consume or pubsub.topics.publish
  • Roles: a collection of permissions
    • primitive role: project level and often too broad
      • viewer - read only
      • editor - view and edit
      • owner - editor + access control and billing
    • predefined roles: granular access to specific resources, for example roles/pubsub.subscriber
      • exam tip: read through the list of roles for each product and think about why each exists
    • custom roles
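
A minimal sketch of creating a custom role with gcloud (the role ID, project, and permission list are illustrative):

    gcloud iam roles create pubsubViewer --project=my-project \
      --title="Pub/Sub Viewer" \
      --permissions=pubsub.topics.get,pubsub.topics.list \
      --stage=GA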

Audit

Cloud Audit Logs provides the following log types:

| Log Type | Roles needed to view | Description | Retention (days) |
|---|---|---|---|
| Admin Activity | Logging/Logs Viewer or Project/Viewer | Log entries for API calls or actions that modify the config or metadata of resources - it’s always on | 400 |
| Data Access | Logging/Private Logs Viewer or Project/Owner | Contain API calls that read the configuration or metadata of resources, as well as user-driven CRUD API calls - on for BigQuery only, need to be turned on manually elsewhere | 30 |
| System Event | Logging/Logs Viewer or Project/Viewer | Generated by Google systems, not by users - always on | 400 |
| Policy Denied | Logging/Logs Viewer or Project/Viewer | Generated when access to a resource is denied due to a policy violation | 30 |
  • Private Logs Viewer roles/logging.privateLogViewer – includes roles/logging.viewer plus:
    • read Access Transparency logs
    • read Data Access audit logs
  • Publicly available resources that have the IAM policies allAuthenticatedUsers or allUsers don’t generate audit logs.
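
As a quick way to inspect these logs, something like the following should work (the project name is illustrative; note the URL-encoded %2F in the log name):

    # read the 5 most recent Admin Activity audit log entries
    gcloud logging read \
      'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"' \
      --limit=5 --project=my-project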

Audit logs can be exported to a log sink. Supported sink destinations:

  • BigQuery
  • GCS
  • Pub/Sub

Pay attention to pricing, especially with Data Access audit logs, as their volume can be large.

An aggregated sink can route log entries from all the projects, folders, or billing accounts of an organization - useful if you want to route logs to a central destination.

  • The includeChildren param needs to be set to true when configuring the sink (see the sketch below)
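
A sketch of creating an organization-level aggregated sink to BigQuery (org ID, dataset, and filter are illustrative; gcloud spells includeChildren as --include-children):

    gcloud logging sinks create org-audit-sink \
      bigquery.googleapis.com/projects/my-project/datasets/audit_logs \
      --organization=123456789012 \
      --include-children \
      --log-filter='logName:"cloudaudit.googleapis.com"'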

Billing

  • Large enterprises should use invoicing when incurring large charges.
  • A self-service account is appropriate only for amounts that are within the credit limits of credit cards.
  • Who can delete a project? Project owner
| Action | Role needed on Billing Account | Role needed on Project |
|---|---|---|
| Enable or change billing account | roles/billing.user or roles/billing.admin | roles/owner or roles/billing.projectManager |
| Disable billing account | roles/billing.admin | same as above |

source: https://cloud.google.com/billing/docs/how-to/modify-project

Important IAM roles:

https://cloud.google.com/billing/docs/how-to/billing-access

IAM

  • To create a custom role, a user must possess the iam.roles.create permission
  • Role management permissions follow the pattern iam.*
  • Important basic roles:
    • roles/browser - applicable to projects only; allows browsing of the project hierarchy, including folders.

IAM Bindings

  • Policy binds members to roles for some scope of resources
  • Answers who can do what to which things
  • Roles & members are listed in the policy, but the resource is identified by where the policy is attached
  • Always additive (allow) never subtractive (deny)
  • Managing policy bindings
    • use gcloud add-iam-policy-binding and remove-iam-policy-binding
    • Pattern: gcloud [RESOURCE-KIND] add-iam-policy-binding [RESOURCE-NAME] --role [ROLE] --member user:[USER_EMAIL]
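
For example, granting a user the Pub/Sub subscriber role on a project (project and email are illustrative):

    gcloud projects add-iam-policy-binding my-project \
      --role=roles/pubsub.subscriber \
      --member=user:alice@example.com

    # and the inverse
    gcloud projects remove-iam-policy-binding my-project \
      --role=roles/pubsub.subscriber \
      --member=user:alice@example.com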

Service Account

  • You can create at most 10 user-managed keys per service account - not unlimited
  • Copying roles from one project to another: gcloud iam roles copy
  • To track spending for a department, a user needs to be allowed to link projects to billing account. What role is appropriate? Billing Account User
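
Sketches of both operations (service account, projects, and role names are illustrative):

    # create a user-managed key; counts toward the 10-key limit
    gcloud iam service-accounts keys create key.json \
      --iam-account=my-sa@my-project.iam.gserviceaccount.com

    # copy a custom role from one project to another
    gcloud iam roles copy --source=myCustomRole --source-project=proj-a \
      --destination=myCustomRole --dest-project=proj-b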

Compute

GCE

Pre-defined machine types are referred to with the notation {machine-family}-{machine-type}-{no-of-cpus}. The number of CPUs is typically a power of 2, like 8, 16, 32 etc.

  • Create an instance with a custom CPU & RAM config: gcloud compute instances create vm --custom-cpu=10 --custom-memory=60GB
  • IMPORTANT: instance goes to running state after OS boot - even before the startup script runs.
  • An instance doesn’t need to have an associated service account
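
For instance, creating a VM without any service account should look roughly like this (--no-service-account must be combined with --no-scopes; names are illustrative):

    gcloud compute instances create vm-no-sa \
      --zone=us-central1-a \
      --no-service-account --no-scopes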

The cost of a machine is determined by its number of CPUs & amount of RAM. An example of machine types with the same CPU count:

| Machine | Price | RAM |
|---|---|---|
| n1-highcpu-2 | $36.20 | 1.8GB |
| n1-standard-2 | $48.54 | 7.5GB |
| n1-highmem-2 | $60.45 | 13GB |

Login to GCE instance

If you need to manage user access to your Linux VM instances, you can use one of the following methods:

  • OS Login (recommended method)
  • Managing SSH keys in metadata - gcloud compute instances add-metadata [INSTANCE_NAME] --metadata-from-file ssh-keys=[LIST_PATH]
  • Temporarily grant a user access to an instance

OS Login is the recommended way. The OS Login feature lets you use Compute Engine IAM roles to manage SSH access to Linux instances.

How OS Login works:

  • Set enable-oslogin metadata on the instance (or on the project)
  • Grant roles/compute.osLogin or roles/compute.osAdminLogin to the IAM account.
  • If a service account is used, grant roles/iam.serviceAccountUser (and use service account impersonation?)
  • Associate your local SSH key to the IAM account by: gcloud compute os-login ssh-keys add --key-file=KEY_FILE_PATH --ttl=EXPIRE_TIME
  • SSH to instance
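
Putting the steps together, a minimal sketch (instance, project, user, and TTL are illustrative):

    # 1. enable OS Login on the instance (or project-wide)
    gcloud compute instances add-metadata my-vm --zone=us-central1-a \
      --metadata enable-oslogin=TRUE

    # 2. grant the login role
    gcloud projects add-iam-policy-binding my-project \
      --member=user:alice@example.com --role=roles/compute.osLogin

    # 3. associate a public SSH key with the IAM account
    gcloud compute os-login ssh-keys add \
      --key-file=$HOME/.ssh/id_rsa.pub --ttl=30d

    # 4. connect
    gcloud compute ssh my-vm --zone=us-central1-a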

GCE Logging with Stackdriver

  • Stackdriver is not installed by default on GCE.
  • Stackdriver does not automatically send files named stackdriver.log.
  • Logging to stdout and stderr on GCE is not the recommended way to get logs to Stackdriver.
  • The recommended way is to configure the application to write to a log file and configure the Stackdriver agent to monitor that file.

Scenario:

You are thinking through all the things that happen when a Compute Engine instance starts up with a startup script that installs the Stackdriver agent and runs gsutil to retrieve a large amount of data from Cloud Storage. Of the following steps, which is the last one to happen? “Stackdriver Logging shows the first log line from the startup script” is a wrong answer, because logs are streamed continuously while the script runs.

The sequence:

  • VM powered on & OS booting up -> instance is Running.
  • metadata service will provide the startup script to the OS boot process.
  • The gsutil command also needs to fetch metadata (such as the service account token), but since it is synchronous by default and transferring the volume of data to the instance takes time, the Stackdriver agent has a chance to push logs showing the startup script’s progress.
  • When the transfer is done, the startup script completes, and the remaining logs are eventually pushed to Stackdriver Logging.

When attaching a GPU to a VM, you need to consider: GPU libraries must be installed, and the CPU & GPU must be compatible.

Instance Groups

Managed Instance Groups (MIG)

  • Autoscale, autoheal, regional (multiple zones), automatic update
  • User specify config in instance template and optional stateful config
  • Works with load balancer to distribute traffic across all of the instances
  • Zonal MIG - deploys to single zone
  • Regional MIG - deploys to multi zones in a region
  • Autoscaling policy can be based on CPU utilisation, load balancing capacity, Cloud Monitoring metrics, schedules, or (for zonal MIGs) queue-based workloads like Pub/Sub.

Troubleshooting: if a MIG cannot create or re-create instances, it may be due to:

  • The boot disk already exists. If a persistent disk already exists with that name, the request fails. To resolve this issue, you can optionally take a snapshot, and then delete the existing persistent disk.
  • Instance template is not valid

Instance Templates

Instance templates define the machine type, boot disk image or container image, labels, and other instance properties.
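
A sketch of creating a template, a regional MIG from it, and an autoscaling policy (all names, sizes, and thresholds are illustrative):

    gcloud compute instance-templates create web-template \
      --machine-type=e2-medium \
      --image-family=debian-12 --image-project=debian-cloud

    gcloud compute instance-groups managed create web-mig \
      --template=web-template --size=3 --region=us-central1

    gcloud compute instance-groups managed set-autoscaling web-mig \
      --region=us-central1 --max-num-replicas=10 \
      --target-cpu-utilization=0.6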

Unmanaged Instance Groups

  • You’d use unmanaged instance groups if the instances are heterogeneous.
  • You need to add and remove instances from the group manually.
  • They do not offer autoscaling, autohealing, rolling updates, multi-zone deployment, instance templates, and all that good stuff.

GKE

Daemonset

DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes and do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd

Statefulset

Like a Deployment, it manages pods based on the same spec; however, it maintains a sticky identity for each pod. Pods are created from the same spec but are not interchangeable - each has a persistent identifier maintained across rescheduling.

Services

Services expose a set of pods as a network service, within the cluster or externally:

  • ClusterIP: exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
  • NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting {NodeIP}:{NodePort}.
  • LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
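
For example, exposing an existing Deployment through a cloud load balancer could look like this (deployment name and ports are illustrative):

    kubectl expose deployment web --type=LoadBalancer \
      --port=80 --target-port=8080

    # EXTERNAL-IP is provisioned by the cloud provider's load balancer
    kubectl get service web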

Misc

  • What’s the recommended way to enable logs? Get the developers to write log lines to stdout & stderr

  • If a Pod is stuck in Pending status, it means that it can’t be scheduled onto a node. Generally, this is because there are insufficient resources of one type or another that prevent scheduling.

  • GKE cluster types:

    • Single-zone clusters - control plane in one zone, workloads also in the same zone.
    • Multi-zonal clusters - control plane in one zone, but nodes are in multiple zones.
    • Regional clusters - control plane in multiple zones in a region, nodes can be configured in single or multiple zones.
  • GKE operation mode options:

    • Autopilot - GKE provisions & manages the cluster’s infra (nodes, node pools, etc.)
    • Standard - user-managed infra, for flexibility.
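
Sketches of creating each flavour (cluster names and locations are illustrative):

    # zonal cluster: control plane & nodes in one zone
    gcloud container clusters create my-cluster --zone=us-central1-a

    # regional cluster: control plane replicated across the region's zones
    gcloud container clusters create my-cluster --region=us-central1

    # Autopilot cluster: GKE manages the infra
    gcloud container clusters create-auto my-cluster --region=us-central1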

App Engine

  • App Engine flexible environment allows you to run dockerised applications.
  • App Engine standard supports the following runtimes: Python, Java, Node.js, PHP, Ruby, Go.
  • When a new version of the app has been deployed, use the gcloud app services set-traffic command with the --migrate param to direct users to the new version of the app.
  • You can specify the split of traffic going to the new/old versions of the app with gcloud app services set-traffic --splits (see the sketch below)
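
Sketches of both commands (service name, versions, and weights are illustrative):

    # send 10% of traffic to v2, 90% to v1
    gcloud app services set-traffic default --splits=v2=0.1,v1=0.9

    # move all traffic to v2 using gradual traffic migration
    gcloud app services set-traffic default --splits=v2=1 --migrate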

Automatic scaling settings:

  • target_cpu_utilization
  • target_throughput_utilization
  • min_idle_instances (if App Engine works out that 5 instances are needed to serve traffic, it will add x more as specified by this setting). Idle instances mean pending latency has less effect - with the trade-off of higher cost (see the sketch below).
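
A sketch of these settings in a standard environment app.yaml (runtime and values are illustrative):

    cat > app.yaml <<'EOF'
    runtime: python39
    automatic_scaling:
      target_cpu_utilization: 0.65
      target_throughput_utilization: 0.6
      min_idle_instances: 2
    EOF
    gcloud app deploy app.yaml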

Cloud Functions

Storage

Disks

Local SSD

  • For high performance with the trade off on durability and availability (not distributed)
  • High IOPS and low latency disk processing
  • Physically attached to the server that hosts the instance

Persistent Disk (PD)

  • Zonal & Regional resource - snapshots are regional though
  • Distributed across several physical disks (high durability)
  • Block based network attached
  • Boot disk for every GCE instance
  • Support snapshots for backup
  • Automatically encrypted at rest & in transit
  • Types:
    • Standard - HDD
    • Balanced & SSD - SSD
  • IMPORTANT: NOT file-based NAS; it can only be mounted to multiple instances if all of them mount it read-only (see the sketch below)
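
A sketch of attaching one disk to two instances read-only (disk and instance names are illustrative):

    gcloud compute instances attach-disk vm-a --disk=shared-data \
      --mode=ro --zone=us-central1-a
    gcloud compute instances attach-disk vm-b --disk=shared-data \
      --mode=ro --zone=us-central1-a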

Cloud Filestore

  • Zonal based
  • It is NAS/NFS - file-based storage
  • Use case for content sharing - also for GKE containers sharing the same data
  • Multiple instances can write and read
  • Accessible to GCE & GKE through VPC, via NFSv3 protocol
  • Primary use is application migration (lift & shift)
  • Not backed up - you have to work out your own backup strategy (no snapshotting like PD)
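
A sketch of creating a Filestore instance and mounting it from a GCE VM (names, tier, and capacity are illustrative; FILESTORE_IP stands for the instance’s IP):

    gcloud filestore instances create nfs-server \
      --zone=us-central1-a --tier=BASIC_HDD \
      --file-share=name=vol1,capacity=1TB \
      --network=name=default

    # on the client VM (NFSv3)
    sudo mount -t nfs FILESTORE_IP:/vol1 /mnt/filestore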

Google Cloud Storage (GCS)

  • Object based - like S3

  • Can be configured as multi-region, dual-region, region.

  • Standard - frequently accessed data. Highest availability SLA: 99.95% (multi-region). Lowest operations charge, highest storage cost.

  • Nearline - low cost, data that can be stored for at least 30 days, use case: backup.

  • Coldline - very low cost, data that can be stored for at least 90 days, use case: disaster recovery.

  • Archive - lowest cost for storage, highest cost to access, data that can be stored for at least 365 days, use case: regulatory requirements.

  • Archive - no availability SLA.

  • An early deletion fee is chargeable if you delete an object before the minimum storage duration for its class, e.g. deleting a Nearline object before 30 days will incur a fee.

  • This service is enabled by default (no need to enable it manually)

  • Is there a limit on storage or throughput? Yes:

    • 1 request per 2 seconds for bucket creation & deletion
    • 5 TiB size limit for individual objects
    • Combined size limit for all custom metadata keys & values of 8 KiB per object
  • Allow someone read access to a file in a bucket: gsutil acl ch -u [email protected]:r gs://mybucket/myfile

  • There are access costs for Nearline, Coldline and Archive.

  • The storage.objects.setIamPolicy permission allows a user to update access control on Cloud Storage objects.

  • Lifecycle management

    • SetStorageClass switches an object to a different storage class; when multiple SetStorageClass rules match, the one switching to the class with the lowest at-rest storage price takes precedence
    • The Delete action deletes an object when the object meets the conditions in the lifecycle rule
    • NOTE: lifecycle actions don’t rewrite objects in a bucket, so the basis for future transitions or deletion is calculated from the original creation time of the object. For example, if you want an object to change class after 90 days and be deleted after 365 days, configure SetStorageClass at 90 days and the Delete action at 365 days (not 275 days). See the sketch below.
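
A sketch of the 90/365-day example as a lifecycle config applied with gsutil (bucket name is illustrative):

    cat > lifecycle.json <<'EOF'
    {
      "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 90}},
        {"action": {"type": "Delete"},
         "condition": {"age": 365}}
      ]
    }
    EOF
    gsutil lifecycle set lifecycle.json gs://mybucket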

Databases

Relational

These all support SQL.

  • Cloud SQL - transactional database. Need to provision VM.
  • Cloud SQL supports point-in-time recovery (PITR) by enabling automated backups & binary logging. It’s enabled by default (see the sketch after this list).
  • PITR is supported for MySQL & PostgreSQL but not SQL Server.
  • Cloud Spanner - globally distributed database that supports ANSI 2011 SQL and global transactions (CloudSQL does not)
  • BigQuery - data warehousing and analytics, not designed for transaction processing.
    • BigQuery charges for data scanned in queries.
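
A sketch of enabling the PITR prerequisites on an existing MySQL instance (instance name and backup window are illustrative):

    gcloud sql instances patch my-instance \
      --backup-start-time=02:00 \
      --enable-bin-log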

BigQuery

  • Access can be granted on organisation, project, dataset or table/view levels.
  • Roles cannot be applied to routines & models.
  • Supports external datasource (federated data source):
    • Cloud BigTable
    • Cloud Storage with supported format: CSV, JSON, Avro, ORC, Parquet, Datastore & Firestore exports.
    • Cloud SQL
  • Use DML statements to perform bulk inserts.
  • To estimate BigQuery cost in GCP pricing calculator, you need to estimate:
    • Storage needed
    • Query cost
  • Estimate query cost by acquiring the estimated bytes read using the query validator, or by running the query via the API with the dryRun param; then use the pricing calculator (see the sketch below).
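
A sketch of a dry run with the bq CLI, which prints the bytes the query would process without actually running it (project, dataset, and table are illustrative):

    bq query --use_legacy_sql=false --dry_run \
      'SELECT name FROM `my-project.mydataset.mytable`'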

Important IAM roles:

| Role | Description |
|---|---|
| roles/bigquery.user | On dataset: read dataset metadata & list tables. On project: run jobs/queries, list/cancel their own jobs, enumerate datasets in a project, create new datasets. |
| roles/bigquery.dataViewer | On table/view: read data (cannot be applied to individual models or routines). On dataset: read metadata, list & read tables. On project: enumerate datasets. |
| roles/bigquery.dataEditor | |
| roles/bigquery.dataOwner | |

https://cloud.google.com/bigquery/docs/access-control

NoSQL

  • Memorystore - think of Redis, designed to cache data in memory. Max size 300GB.
  • Cloud Datastore - like DynamoDB; eventually consistent, regional & multi-regional. GQL is the query language.
  • Google BigTable - high-throughput NoSQL, designed to accept billions of rows of data; use case: IoT. You need to provision nodes.
    • Priced according to the nodes provisioned.
    • Bigtable is made for large analytical workloads.
    • With Cloud Storage, you pay for read operations, so it can get quite expensive when it’s not the right fit for the data and access patterns
    • cbt is the command-line tool for Google BigTable (see the sketch below).
  • Firebase Realtime Database. Zonal. It’s a DB plus servers that manage websockets (otherwise you’d need to manage your own websocket servers + DynamoDB).
  • Firestore is a mobile document-based database service that can synchronize data between mobile devices and centralized storage. Multi regional.

BigQuery, Datastore & Firebase are serverless.
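
A sketch of basic cbt usage (project, instance, and table names are illustrative):

    # create a table and list tables in a Bigtable instance
    cbt -project my-project -instance my-instance createtable sensor-data
    cbt -project my-project -instance my-instance ls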

ETL

  • Cloud DataProc - managed service for Apache Spark and Hadoop - used for big data analytics.
    • With Hadoop, Java & Hive are the usual choice. With Spark, Java, Scala, Python & R are commonly used.
  • DataProc can execute workflows in both batch and streaming modes (streaming via Spark Streaming).
  • Cloud DataFlow - allows for stream and batch processing of data and well suited for ETL work.

Networking

VPC

  • VPCs are global resources
  • However subnets are regional (not zonal)
  • Use VPC Flow Logs to record a sample of network flows sent & received from VM instances, including those used as GKE nodes.
  • Subnets can be expanded but not shrunk
    • gcloud compute networks subnets expand-ip-range is the command to expand
  • Google reserves 4 IP addresses in a subnet, so the total available addresses are the CIDR calculation minus 4 (see the sketch below)
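
A sketch of expanding a subnet, with the resulting address math (names and prefixes are illustrative):

    # /24 = 256 - 4 reserved = 252 usable addresses
    # /20 = 4096 - 4 reserved = 4092 usable addresses
    gcloud compute networks subnets expand-ip-range my-subnet \
      --region=us-central1 --prefix-length=20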

CIDR cheatsheet

  • /32 - 1 address
  • /30 - 4 addresses
  • /22 - 1024 addresses

Cloud Load Balancing

To decide which load balancer, consider:

  • Global vs Regional
  • External vs Internal
  • Traffic type

Types:

  • External HTTP(s)
  • SSL Proxy - traffic: TCP w/ SSL offload
  • TCP Proxy - traffic: TCP w/o SSL offload
  • External Network - traffic: TCP/UDP, ESP or ICMP
  • Internal TCP/UDP
  • Internal HTTP(s)

Global:

  • External HTTP(s)
  • SSL Proxy
  • TCP Proxy

Features:

  • Supports multi region failover
  • Prioritises low-latency connections to the region nearest the user (via a single anycast IP)
  • Reacts quickly to changes in user/traffic/network/health etc unlike DNS

Regional:

  • Internal HTTP(s)
  • Internal TCP/UDP
  • External Network

Cloud Interconnect

  • Connect external networks to Google’s network

Cloud DNS

  • Can be used for public and private managed zones
  • Supports DNSSEC

Static IP

  • Regional IPs used for GCE Instances & Network Load Balancers
  • Global IPs used for Global Load Balancers (HTTP(S), SSL Proxy, TCP Proxy)
    • These are Anycast IP
  • You pay for a reserved static IP while it is not in use; it’s free while in use (see the sketch below)
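
Sketches of reserving both kinds of address (names are illustrative):

    # regional IP, e.g. for a GCE instance or network load balancer
    gcloud compute addresses create my-regional-ip --region=us-central1

    # global anycast IP, e.g. for an HTTP(S)/SSL/TCP proxy load balancer
    gcloud compute addresses create my-global-ip --global

    gcloud compute addresses list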

Cloud CDN

  • Supports HTTP/2 & HTTPS but no custom origins (GCP only)

Specialised Services

  • Cloud Natural Language Processing provides functionality for analyzing text.
  • Data analytics set of specialized services include products that help with extraction, transformation, and loading (ETL) and work with both batch and streaming data.
  • Cloud Armor builds on GCP’s load balancing services to provide the ability to allow or restrict access based on IP address, deploy rules to counter cross-site scripting attacks, and provide counter measures to SQL injection attacks.
  • Google Cloud Marketplace - integrated solutions vetted by Google Cloud to cover your enterprise’s IT needs. Scale procurement for your enterprise via online discovery, purchasing, and fulfillment of enterprise-grade cloud solutions.
  • Cloud Launcher - it’s the old name of Google Cloud Marketplace.
  • Google Cloud Interconnect - Dedicated - provides dedicated connection between your data center to GCP to allow for large data transfer.

GCloud

  • Enable new API - gcloud services enable xyz.googleapis.com or gcloud services enable xyz
  • Init GCloud - gcloud init
  • You can set default compute/zone and compute/region in gcloud config. Note: notice the compute/ prefix!
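
For example:

    gcloud config set compute/region us-central1
    gcloud config set compute/zone us-central1-a
    gcloud config list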

Things to look out for

  • Cloud Shell does not need extra authentication - it runs as your already-authenticated console user

What APIs are enabled by default in a new GCP project?

  • BigQuery
  • Google Cloud API
  • Datastore
  • Cloud SQL
  • GCS (Cloud Storage), Cloud Storage JSON API
  • Service Management
  • Service Usage
  • Cloud Debugger
  • Cloud Logging
  • Cloud Monitoring
  • Cloud Trace

Cloud Monitoring

Cloud Monitoring uses Workspaces to organize monitoring information. A Workspace always monitors its Google Cloud host project, and it can be configured to monitor up to 100 Google Cloud projects and AWS accounts in total. However, a Google Cloud project or an AWS account can only be associated with one Workspace at a time.

Misc

When encountering a transient error, gsutil will retry using a truncated binary exponential backoff strategy.

billing.accounts.update - the permission needed to modify billing account settings. When a project is created on the console, will it be associated with a billing account straight away? See the last note in this section.

n1-standard-8, n1-highcpu-8, and n1-highmem-16 - how do they compare cost-wise? The type tells you where in the range of allowable RAM that machine falls, from minimum (highcpu) to balanced (standard) to maximum (highmem). n1-highcpu-8 will have the least RAM, so it is the cheapest. Between n1-highcpu-8 and n1-highmem-8, highcpu is still the cheapest.

Does a default service account (GCE) have automatic access to objects in GCS in the same project? Yes - by default, both the default service account and the default scopes allow reading from GCS buckets in the same project.

Creating a project via the console will link it to your only billing account, or ask you to choose one if you have more than one. Creating a project through gcloud does not automatically link it to a billing account; you can use gcloud beta billing to do that (see the sketch below).
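
A sketch of linking a project to a billing account (project and account IDs are illustrative):

    gcloud beta billing projects link my-project \
      --billing-account=0X0X0X-0X0X0X-0X0X0X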