GCP Associate Cloud Engineer Cheatsheet

Organization & Resource Hierarchy

Resource Hierarchy

  • Organisation - tied to a G Suite or Cloud Identity domain
  • Folder - contains any number of projects & subfolders
  • Project - container for a set of related resources
  • Projects are separate from billing accounts; free tier limits apply to resource usage, not to individual projects.
  • There is no way to clone a project; you must re-create each resource as needed.
  • Trust boundaries
    • Resources within a project “trust” each other. For instance, by default a GCE instance in a project can access a GCS bucket in the same project.

Permission Hierarchy

  • Permissions follow the pattern Service.Resource.Verb and usually correspond to REST API methods, e.g. pubsub.subscriptions.consume or pubsub.topics.publish
  • Roles: a collection of permissions
    • primitive role: project level and often too broad
      • viewer - read only
      • editor - view and edit
      • owner - editor + access control and billing
    • predefined roles: granular access to specific resources, for example roles/pubsub.subscriber
      • exam tip: read through the list of roles for each product and think about why each exists
    • custom roles
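
A minimal sketch of creating a custom role with gcloud (the role ID, project, and permission list are illustrative):

    gcloud iam roles create pubsubViewer --project=my-project \
      --title="Pub/Sub Viewer" \
      --permissions=pubsub.topics.get,pubsub.topics.list \
      --stage=GA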

Audit

Cloud Audit Logs provides the following log types:

| Log Type | Roles needed to view | Description | Retention (days) |
|---|---|---|---|
| Admin Activity | Logging/Logs Viewer or Project/Viewer | Log entries for API calls or actions that modify the config or metadata of resources - it’s always on | 400 |
| Data Access | Logging/Private Logs Viewer or Project/Owner | Contain API calls that read the configuration or metadata of resources, as well as user-driven CRUD API calls - on for BigQuery only, need to be turned on manually elsewhere | 30 |
| System Event | Logging/Logs Viewer or Project/Viewer | Generated by Google systems, not by users - always on | 400 |
| Policy Denied | Logging/Logs Viewer or Project/Viewer | Generated when access to a resource is denied due to a policy violation | 30 |
  • Private Logs Viewer roles/logging.privateLogViewer – includes roles/logging.viewer plus:
    • read Access Transparency logs
    • read Data Access audit logs
  • Publicly available resources that have the IAM policies allAuthenticatedUsers or allUsers don’t generate audit logs.
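
As a quick way to inspect these logs, something like the following should work (the project name is illustrative; note the URL-encoded %2F in the log name):

    # read the 5 most recent Admin Activity audit log entries
    gcloud logging read \
      'logName="projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"' \
      --limit=5 --project=my-project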

Audit logs can be exported to a log sink. Supported sink destinations:

  • BigQuery
  • GCS
  • Pub/Sub

Pay attention to pricing, especially with Data Access audit logs, as their volume can be large.

An aggregated sink can route log entries from all the projects, folders, or billing accounts of an organization - useful if you want to route logs to a central destination.

  • The includeChildren param needs to be set to true when configuring the sink (see the sketch below)
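
A sketch of creating an organization-level aggregated sink to BigQuery (org ID, dataset, and filter are illustrative; gcloud spells includeChildren as --include-children):

    gcloud logging sinks create org-audit-sink \
      bigquery.googleapis.com/projects/my-project/datasets/audit_logs \
      --organization=123456789012 \
      --include-children \
      --log-filter='logName:"cloudaudit.googleapis.com"'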

Billing

  • Large enterprises should use invoicing when incurring large charges.
  • A self-service account is appropriate only for amounts that are within the credit limits of credit cards.
  • Who can delete a project? Project owner
| Action | Role needed on Billing Account | Role needed on Project |
|---|---|---|
| Enable or change billing account | roles/billing.user or roles/billing.admin | roles/owner or roles/billing.projectManager |
| Disable billing account | roles/billing.admin | same as above |

source: https://cloud.google.com/billing/docs/how-to/modify-project

Important IAM roles:

https://cloud.google.com/billing/docs/how-to/billing-access

IAM

  • To create a custom role, a user must possess the iam.roles.create permission
  • Role management permissions follow the pattern iam.*
  • Important basic roles:
    • roles/browser - applicable to projects only; allows browsing of the project hierarchy, including folders.

IAM Bindings

  • Policy binds members to roles for some scope of resources
  • Answers who can do what to which things
  • Roles & members are listed in the policy, but the resource is identified by where the policy is attached
  • Always additive (allow) never subtractive (deny)
  • Managing policy bindings
    • use gcloud add-iam-policy-binding and remove-iam-policy-binding
    • Pattern: gcloud [RESOURCE-KIND] add-iam-policy-binding [RESOURCE-NAME] --role [ROLE] --member user:[USER_EMAIL]
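
For example, granting a user the Pub/Sub subscriber role on a project (project and email are illustrative):

    gcloud projects add-iam-policy-binding my-project \
      --role=roles/pubsub.subscriber \
      --member=user:alice@example.com

    # and the inverse
    gcloud projects remove-iam-policy-binding my-project \
      --role=roles/pubsub.subscriber \
      --member=user:alice@example.com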

Service Account

  • You can create at most 10 user-managed keys per service account - not unlimited
  • Copying roles from one project to another: gcloud iam roles copy
  • To track spending for a department, a user needs to be allowed to link projects to billing account. What role is appropriate? Billing Account User
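
Sketches of both operations (service account, projects, and role names are illustrative):

    # create a user-managed key; counts toward the 10-key limit
    gcloud iam service-accounts keys create key.json \
      --iam-account=my-sa@my-project.iam.gserviceaccount.com

    # copy a custom role from one project to another
    gcloud iam roles copy --source=myCustomRole --source-project=proj-a \
      --destination=myCustomRole --dest-project=proj-b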

Compute

GCE

Pre-defined machine types are referred to with the notation {machine-family}-{machine-type}-{no-of-cpus}. The number of CPUs is typically a power of 2, like 8, 16, 32 etc.

  • Create an instance with a custom CPU & RAM config: gcloud compute instances create vm --custom-cpu=10 --custom-memory=60GB
  • IMPORTANT: instance goes to running state after OS boot - even before the startup script runs.
  • An instance doesn’t need to have an associated service account
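
For instance, creating a VM without any service account should look roughly like this (--no-service-account must be combined with --no-scopes; names are illustrative):

    gcloud compute instances create vm-no-sa \
      --zone=us-central1-a \
      --no-service-account --no-scopes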

The cost of a machine is determined by its number of CPUs & amount of RAM. An example of machine types with the same CPU count:

| Machine | Price | RAM |
|---|---|---|
| n1-highcpu-2 | $36.20 | 1.8GB |
| n1-standard-2 | $48.54 | 7.5GB |
| n1-highmem-2 | $60.45 | 13GB |

Login to GCE instance

If you need to manage user access to your Linux VM instances, you can use one of the following methods:

  • OS Login (recommended method)
  • Managing SSH keys in metadata - gcloud compute instances add-metadata [INSTANCE_NAME] --metadata-from-file ssh-keys=[LIST_PATH]
  • Temporarily grant a user access to an instance

OS Login is the recommended way. The OS Login feature lets you use Compute Engine IAM roles to manage SSH access to Linux instances.

How OS Login works:

  • Set enable-oslogin metadata on the instance (or on the project)
  • Grant roles/compute.osLogin or roles/compute.osAdminLogin to the IAM account.
  • If a service account is used, grant roles/iam.serviceAccountUser (and use service account impersonation?)
  • Associate your local SSH key to the IAM account by: gcloud compute os-login ssh-keys add --key-file=KEY_FILE_PATH --ttl=EXPIRE_TIME
  • SSH to instance
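
Putting the steps together, a minimal sketch (instance, project, user, and TTL are illustrative):

    # 1. enable OS Login on the instance (or project-wide)
    gcloud compute instances add-metadata my-vm --zone=us-central1-a \
      --metadata enable-oslogin=TRUE

    # 2. grant the login role
    gcloud projects add-iam-policy-binding my-project \
      --member=user:alice@example.com --role=roles/compute.osLogin

    # 3. associate a public SSH key with the IAM account
    gcloud compute os-login ssh-keys add \
      --key-file=$HOME/.ssh/id_rsa.pub --ttl=30d

    # 4. connect
    gcloud compute ssh my-vm --zone=us-central1-a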

GCE Logging with Stackdriver

  • Stackdriver is not installed by default on GCE.
  • Stackdriver does not automatically send files named stackdriver.log.
  • Logging to stdout and stderr on GCE is not the recommended way to get logs to Stackdriver.
  • The recommended way is to configure the application to write to a log file and configure the Stackdriver agent to monitor that file.

Scenario:

You are thinking through all the things that happen when a Compute Engine instance starts up with a startup script that installs the Stackdriver agent and runs gsutil to retrieve a large amount of data from Cloud Storage. Of the following steps, which is the last one to happen? “Stackdriver Logging shows the first log line from the startup script” is a wrong answer, because logs are streamed continuously while the script runs.

The sequence:

  • VM powered on & OS booting up -> instance is Running.
  • metadata service will provide the startup script to the OS boot process.
  • The gsutil command also needs to fetch metadata (such as the service account token), but since it is synchronous by default and transferring the volume of data to the instance takes time, the Stackdriver agent has a chance to push logs showing the startup script’s progress.
  • When the transfer is done, the startup script completes, and the remaining logs are eventually pushed to Stackdriver Logging.

When attaching a GPU to a VM, you need to consider: GPU libraries must be installed, and the CPU & GPU must be compatible.

Instance Groups

Managed Instance Groups (MIG)

  • Autoscale, autoheal, regional (multiple zones), automatic update
  • User specify config in instance template and optional stateful config
  • Works with load balancer to distribute traffic across all of the instances
  • Zonal MIG - deploys to single zone
  • Regional MIG - deploys to multi zones in a region
  • Autoscaling policy can be based on CPU utilisation, load balancing capacity, Cloud Monitoring metrics, schedules, or (for zonal MIGs) queue-based workloads like Pub/Sub.

Troubleshooting: if a MIG cannot create or re-create instances, it may be due to:

  • The boot disk already exists. If a persistent disk already exists with that name, the request fails. To resolve this issue, you can optionally take a snapshot, and then delete the existing persistent disk.
  • Instance template is not valid

Instance Templates

Instance templates define the machine type, boot disk image or container image, labels, and other instance properties.
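
A sketch of creating a template, a regional MIG from it, and an autoscaling policy (all names, sizes, and thresholds are illustrative):

    gcloud compute instance-templates create web-template \
      --machine-type=e2-medium \
      --image-family=debian-12 --image-project=debian-cloud

    gcloud compute instance-groups managed create web-mig \
      --template=web-template --size=3 --region=us-central1

    gcloud compute instance-groups managed set-autoscaling web-mig \
      --region=us-central1 --max-num-replicas=10 \
      --target-cpu-utilization=0.6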

Unmanaged Instance Groups

  • You’d use unmanaged instance groups if the instances are heterogeneous.
  • You need to add and remove instances from the group manually.
  • They do not offer autoscaling, autohealing, rolling updates, multi-zone deployment, instance templates, and all that good stuff.

GKE

Daemonset

DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes and do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd

Statefulset

Like a Deployment, it manages pods based on the same spec; however, it maintains a sticky identity for each pod. Pods are created from the same spec but are not interchangeable - each has a persistent identifier maintained across rescheduling.

Services

Services expose a set of pods as a network service, within the cluster or externally:

  • ClusterIP: exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
  • NodePort: Exposes the Service on each Node’s IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You’ll be able to contact the NodePort Service, from outside the cluster, by requesting {NodeIP}:{NodePort}.
  • LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
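
For example, exposing an existing Deployment through a cloud load balancer could look like this (deployment name and ports are illustrative):

    kubectl expose deployment web --type=LoadBalancer \
      --port=80 --target-port=8080

    # EXTERNAL-IP is provisioned by the cloud provider's load balancer
    kubectl get service web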

Misc

  • What’s the recommended way to enable logs? Get the developers to write log lines to stdout & stderr

  • If a Pod is stuck in Pending status, it means that it can’t be scheduled onto a node. Generally, this is because there are insufficient resources of one type or another that prevent scheduling.

  • GKE cluster types:

    • Single-zone clusters - control plane in one zone, workloads also in the same zone.
    • Multi-zonal clusters - control plane in one zone, but nodes are in multiple zones.
    • Regional clusters - control plane in multiple zones in a region, nodes can be configured in single or multiple zones.
  • GKE operation mode options:

    • Autopilot - GKE provisions & manages the cluster’s infra (nodes, node pools, etc.)
    • Standard - user-managed infra, for flexibility.
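
Sketches of creating each flavour (cluster names and locations are illustrative):

    # zonal cluster: control plane & nodes in one zone
    gcloud container clusters create my-cluster --zone=us-central1-a

    # regional cluster: control plane replicated across the region's zones
    gcloud container clusters create my-cluster --region=us-central1

    # Autopilot cluster: GKE manages the infra
    gcloud container clusters create-auto my-cluster --region=us-central1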

App Engine

  • App Engine flexible environment allows you to run dockerised applications.
  • App Engine standard supports the following runtimes: Python, Java, Node.js, PHP, Ruby, Go.
  • When a new version of the app has been deployed, use the gcloud app services set-traffic command with the --migrate param to direct users to the new version of the app.
  • You can specify the split of traffic going to the new/old versions of the app with gcloud app services set-traffic --splits (see the sketch below)
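
Sketches of both commands (service name, versions, and weights are illustrative):

    # send 10% of traffic to v2, 90% to v1
    gcloud app services set-traffic default --splits=v2=0.1,v1=0.9

    # move all traffic to v2 using gradual traffic migration
    gcloud app services set-traffic default --splits=v2=1 --migrate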

Automatic scaling settings:

  • target_cpu_utilization
  • target_throughput_utilization
  • min_idle_instances (if App Engine works out that 5 instances are needed to serve traffic, it will add x more as specified by this setting). Idle instances mean pending latency has less effect - with the trade-off of higher cost (see the sketch below).
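
A sketch of these settings in a standard environment app.yaml (runtime and values are illustrative):

    cat > app.yaml <<'EOF'
    runtime: python39
    automatic_scaling:
      target_cpu_utilization: 0.65
      target_throughput_utilization: 0.6
      min_idle_instances: 2
    EOF
    gcloud app deploy app.yaml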

Cloud Functions

Storage

Disks

Local SSD

  • For high performance with the trade off on durability and availability (not distributed)
  • High IOPS and low latency disk processing
  • Physically attached to the server that hosts the instance

Persistent Disk (PD)

  • Zonal & Regional resource - snapshots are regional though
  • Distributed across several physical disks (high durability)
  • Block based network attached
  • Boot disk for every GCE instance
  • Support snapshots for backup
  • Automatically encrypted at rest & in transit
  • Types:
    • Standard - HDD
    • Balanced & SSD - SSD
  • IMPORTANT: NOT file-based NAS; it can only be mounted to multiple instances if all of them mount it read-only (see the sketch below)
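
A sketch of attaching one disk to two instances read-only (disk and instance names are illustrative):

    gcloud compute instances attach-disk vm-a --disk=shared-data \
      --mode=ro --zone=us-central1-a
    gcloud compute instances attach-disk vm-b --disk=shared-data \
      --mode=ro --zone=us-central1-a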

Cloud Filestore

  • Zonal based
  • It is NAS/NFS - file-based storage
  • Use case for content sharing - also for GKE containers sharing the same data
  • Multiple instances can write and read
  • Accessible to GCE & GKE through VPC, via NFSv3 protocol
  • Primary use is application migration (lift & shift)
  • Not backed up - you have to work out your own backup strategy (no snapshotting like PD)
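
A sketch of creating a Filestore instance and mounting it from a GCE VM (names, tier, and capacity are illustrative; FILESTORE_IP stands for the instance’s IP):

    gcloud filestore instances create nfs-server \
      --zone=us-central1-a --tier=BASIC_HDD \
      --file-share=name=vol1,capacity=1TB \
      --network=name=default

    # on the client VM (NFSv3)
    sudo mount -t nfs FILESTORE_IP:/vol1 /mnt/filestore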

Google Cloud Storage (GCS)

  • Object based - like S3

  • Can be configured as multi-region, dual-region, region.

  • Standard - frequently accessed data. Highest availability SLA: 99.95% (multi-region). Lowest operations charge, highest storage cost.

  • Nearline - low cost, data that can be stored for at least 30 days, use case: backup.

  • Coldline - very low cost, data that can be stored for at least 90 days, use case: disaster recovery.

  • Archive - lowest cost for storage, highest cost to access, data that can be stored for at least 365 days, use case: regulatory requirements.

  • Archive - no availability SLA.

  • An early deletion fee is chargeable if you delete an object before the minimum storage duration for its class, e.g. deleting a Nearline object before 30 days will incur a fee.

  • This service is enabled by default (no need to enable it manually)

  • Is there a limit on storage or throughput? Yes:

    • 1 request per 2 seconds for bucket creation & deletion
    • 5 TiB size limit for individual objects
    • Combined size limit for all custom metadata keys & values of 8 KiB per object
  • Allow someone read access to a file in a bucket: gsutil acl ch -u [email protected]:r gs://mybucket/myfile

  • There are access costs for Nearline, Coldline and Archive.

  • The storage.objects.setIamPolicy permission allows a user to update access control on Cloud Storage objects.

  • Lifecycle management

    • SetStorageClass switches an object to a different storage class; when multiple SetStorageClass rules match, the one switching to the class with the lowest at-rest storage price takes precedence
    • The Delete action deletes an object when the object meets the conditions in the lifecycle rule
    • NOTE: lifecycle actions don’t rewrite objects in a bucket, so the basis for future transitions or deletion is calculated from the original creation time of the object. For example, if you want an object to change class after 90 days and be deleted after 365 days, configure SetStorageClass at 90 days and the Delete action at 365 days (not 275 days). See the sketch below.
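
A sketch of the 90/365-day example as a lifecycle config applied with gsutil (bucket name is illustrative):

    cat > lifecycle.json <<'EOF'
    {
      "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 90}},
        {"action": {"type": "Delete"},
         "condition": {"age": 365}}
      ]
    }
    EOF
    gsutil lifecycle set lifecycle.json gs://mybucket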

Databases

Relational

These all support SQL.

  • Cloud SQL - transactional database. Need to provision VM.
  • Cloud SQL supports point-in-time recovery (PITR) by enabling automated backups & binary logging. It’s enabled by default (see the sketch after this list).
  • PITR is supported for MySQL & PostgreSQL but not SQL Server.
  • Cloud Spanner - globally distributed database that supports ANSI 2011 SQL and global transactions (CloudSQL does not)
  • BigQuery - data warehousing and analytics, not designed for transaction processing.
    • BigQuery charges for data scanned in queries.
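
A sketch of enabling the PITR prerequisites on an existing MySQL instance (instance name and backup window are illustrative):

    gcloud sql instances patch my-instance \
      --backup-start-time=02:00 \
      --enable-bin-log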

BigQuery

  • Access can be granted on organisation, project, dataset or table/view levels.
  • Roles cannot be applied to routines & models.
  • Supports external datasource (federated data source):
    • Cloud BigTable
    • Cloud Storage with supported format: CSV, JSON, Avro, ORC, Parquet, Datastore & Firestore exports.
    • Cloud SQL
  • Use DML statements to perform bulk inserts.
  • To estimate BigQuery cost in GCP pricing calculator, you need to estimate:
    • Storage needed
    • Query cost
  • Estimate query cost by acquiring the estimated bytes read using the query validator, or by running the query via the API with the dryRun param; then use the pricing calculator (see the sketch below).
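
A sketch of a dry run with the bq CLI, which prints the bytes the query would process without actually running it (project, dataset, and table are illustrative):

    bq query --use_legacy_sql=false --dry_run \
      'SELECT name FROM `my-project.mydataset.mytable`'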

Important IAM roles:

| Role | Description |
|---|---|
| roles/bigquery.user | On dataset: read dataset metadata & list tables. On project: run jobs/queries, list/cancel their own jobs, enumerate datasets in a project, create new datasets. |
| roles/bigquery.dataViewer | On table/view: read data (cannot be applied to individual models or routines). On dataset: read metadata, list & read tables. On project: enumerate datasets. |
| roles/bigquery.dataEditor | |
| roles/bigquery.dataOwner | |

https://cloud.google.com/bigquery/docs/access-control

NoSQL

  • Memorystore - think of Redis, designed to cache data in memory. Max size 300GB.
  • Cloud Datastore - like DynamoDB; eventually consistent, regional & multi-regional. GQL is the query language.
  • Google BigTable - high-throughput NoSQL, designed to accept billions of rows of data; use case: IoT. You need to provision nodes.
    • Priced according to the nodes provisioned.
    • Bigtable is made for large analytical workloads.
    • With Cloud Storage, you pay for read operations, so it can get quite expensive when it’s not the right fit for the data and access patterns
    • cbt is the command-line tool for Google BigTable (see the sketch below).
  • Firebase Realtime Database. Zonal. It’s a DB plus servers that manage websockets (otherwise you’d need to manage your own websocket servers + DynamoDB).
  • Firestore is a mobile document-based database service that can synchronize data between mobile devices and centralized storage. Multi regional.

BigQuery, Datastore & Firebase are serverless.
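
A sketch of basic cbt usage (project, instance, and table names are illustrative):

    # create a table and list tables in a Bigtable instance
    cbt -project my-project -instance my-instance createtable sensor-data
    cbt -project my-project -instance my-instance ls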

ETL

  • Cloud DataProc - managed service for Apache Spark and Hadoop - used for big data analytics.
    • With Hadoop, Java & Hive are the usual choice. With Spark, Java, Scala, Python & R are commonly used.
  • DataProc can execute workflows in both batch and streaming modes (streaming via Spark Streaming).
  • Cloud DataFlow - allows for stream and batch processing of data and well suited for ETL work.

Networking

VPC

  • VPCs are global resources
  • However subnets are regional (not zonal)
  • Use VPC Flow Logs to record a sample of network flows sent & received from VM instances, including those used as GKE nodes.
  • Subnets can be expanded but not shrunk
    • gcloud compute networks subnets expand-ip-range is the command to expand
  • Google reserves 4 IP addresses in a subnet, so the total available addresses are the CIDR calculation minus 4 (see the sketch below)
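
A sketch of expanding a subnet, with the resulting address math (names and prefixes are illustrative):

    # /24 = 256 - 4 reserved = 252 usable addresses
    # /20 = 4096 - 4 reserved = 4092 usable addresses
    gcloud compute networks subnets expand-ip-range my-subnet \
      --region=us-central1 --prefix-length=20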

CIDR cheatsheet

  • /32 - 1 address
  • /30 - 4 addresses
  • /22 - 1024 addresses

Cloud Load Balancing

To decide which load balancer, consider:

  • Global vs Regional
  • External vs Internal
  • Traffic type

Types:

  • External HTTP(s)
  • SSL Proxy - traffic: TCP w/ SSL offload
  • TCP Proxy - traffic: TCP w/o SSL offload
  • External Network - traffic: TCP/UDP, ESP or ICMP
  • Internal TCP/UDP
  • Internal HTTP(s)

Global:

  • External HTTP(s)
  • SSL Proxy
  • TCP Proxy

Features:

  • Supports multi region failover
  • Prioritises low-latency connections to the region nearest the user (via a single anycast IP)
  • Reacts quickly to changes in user/traffic/network/health etc unlike DNS

Regional:

  • Internal HTTP(s)
  • Internal TCP/UDP
  • External Network

Cloud Interconnect

  • Connect external networks to Google’s network

Cloud DNS

  • Can be used for public and private managed zones
  • Supports DNSSEC

Static IP

  • Regional IPs used for GCE Instances & Network Load Balancers
  • Global IPs used for Global Load Balancers (HTTP(S), SSL Proxy, TCP Proxy)
    • These are Anycast IP
  • You pay for a reserved static IP while it is not in use; it’s free while in use (see the sketch below)
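
Sketches of reserving both kinds of address (names are illustrative):

    # regional IP, e.g. for a GCE instance or network load balancer
    gcloud compute addresses create my-regional-ip --region=us-central1

    # global anycast IP, e.g. for an HTTP(S)/SSL/TCP proxy load balancer
    gcloud compute addresses create my-global-ip --global

    gcloud compute addresses list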

Cloud CDN

  • Supports HTTP/2 & HTTPS but no custom origins (GCP only)

Specialised Services

  • Cloud Natural Language Processing provides functionality for analyzing text.
  • Data analytics set of specialized services include products that help with extraction, transformation, and loading (ETL) and work with both batch and streaming data.
  • Cloud Armor builds on GCP’s load balancing services to provide the ability to allow or restrict access based on IP address, deploy rules to counter cross-site scripting attacks, and provide counter measures to SQL injection attacks.
  • Google Cloud Marketplace - integrated solutions vetted by Google Cloud to cover your enterprise’s IT needs. Scale procurement for your enterprise via online discovery, purchasing, and fulfillment of enterprise-grade cloud solutions.
  • Cloud Launcher - it’s the old name of Google Cloud Marketplace.
  • Google Cloud Interconnect - Dedicated - provides dedicated connection between your data center to GCP to allow for large data transfer.

GCloud

  • Enable new API - gcloud services enable xyz.googleapis.com or gcloud services enable xyz
  • Init GCloud - gcloud init
  • You can set default compute/zone and compute/region in gcloud config. Note: notice the compute/ prefix!
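
For example:

    gcloud config set compute/region us-central1
    gcloud config set compute/zone us-central1-a
    gcloud config list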

Things to look out for

  • Cloud Shell does not need extra authentication - it runs as your already-authenticated console user

What APIs are enabled by default in a new GCP project?

  • BigQuery
  • Google Cloud API
  • Datastore
  • Cloud SQL
  • GCS (Cloud Storage), Cloud Storage JSON API
  • Service Management
  • Service Usage
  • Cloud Debugger
  • Cloud Logging
  • Cloud Monitoring
  • Cloud Trace

Cloud Monitoring

Cloud Monitoring uses Workspaces to organize monitoring information. A Workspace always monitors its Google Cloud host project, and it can be configured to monitor up to 100 Google Cloud projects and AWS accounts in total. However, a Google Cloud project or an AWS account can only be associated with one Workspace at a time.

Misc

When encountering a transient error, gsutil will retry using a truncated binary exponential backoff strategy.

billing.accounts.update - the permission needed to modify billing account settings. When a project is created on the console, will it be associated with a billing account straight away? See the last note in this section.

n1-standard-8, n1-highcpu-8, and n1-highmem-16 - how do they compare cost-wise? The type tells you where in the range of allowable RAM that machine falls, from minimum (highcpu) to balanced (standard) to maximum (highmem). n1-highcpu-8 will have the least RAM, so it is the cheapest. Between n1-highcpu-8 and n1-highmem-8, highcpu is still the cheapest.

Does a default service account (GCE) have automatic access to objects in GCS in the same project? Yes - by default, both the default service account and the default scopes allow reading from GCS buckets in the same project.

Creating a project via the console will link it to your only billing account, or ask you to choose one if you have more than one. Creating a project through gcloud does not automatically link it to a billing account; you can use gcloud beta billing to do that (see the sketch below).
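
A sketch of linking a project to a billing account (project and account IDs are illustrative):

    gcloud beta billing projects link my-project \
      --billing-account=0X0X0X-0X0X0X-0X0X0X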