Saturday, 26 March 2022

Google BigQuery vs BigTable

 

BigQuery

BigTable

  • BigQuery is Google Cloud’s fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real-time.
  • You can access BigQuery by using the Cloud Console, the bq command-line tool, or calls to the BigQuery REST API through client libraries for languages such as Java, .NET, or Python.
  • A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to your tables and views.
  • You specify a location for storing your BigQuery data when you create a dataset. After you create the dataset, the location cannot be changed, but you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.
  • You can control access to datasets in BigQuery at the table and view level, at the column level, or by using IAM.
  • There are several ways to ingest data into BigQuery:
    • Batch load a set of data records.
    • Stream individual records or batches of records.
    • Use queries to generate new data and append or overwrite the results to a table.
    • Use a third-party application or service.
  • Data loaded in BigQuery can be exported in several formats. BigQuery can export up to 1 GB of data to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.
  • Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.
  • An external data source (also known as a federated data source) is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading or streaming the data, you create a table that references the external data source.
  • Cloud Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads.
  • You can use the cbt command-line tool or the Google Cloud Console to interact with Bigtable.
  • Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key.
  • Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
  • Cloud Bigtable stores data in massively scalable tables, each of which is a sorted key/value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped together into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
  • To use Cloud Bigtable, you create instances, which contain up to 4 clusters that your applications can connect to. Each cluster contains nodes, the compute units that manage your data and perform maintenance tasks.
  • A Cloud Bigtable instance is a container for your data. Instances have one or more clusters, located in different zones. Each cluster has at least 1 node.
  • Cloud Bigtable backups let you save a copy of a table’s schema and data, then restore from the backup to a new table at a later time.
  • Dataflow templates let you export data from Cloud Bigtable in a variety of formats and then import the data back into Cloud Bigtable.
  • Replication for Cloud Bigtable enables you to increase the availability and durability of your data by copying it across multiple regions or multiple zones within the same region. You can also isolate workloads by routing different types of requests to different clusters.
  • You can use Dataproc to create one or more Compute Engine instances that can connect to a Cloud Bigtable instance and run Hadoop jobs.
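The Bigtable data model described above (a sorted key/value map, one indexed row key per row, columns identified by family plus qualifier) can be illustrated with a small in-memory sketch. This is not the Bigtable client API; the class name `SketchTable`, its methods, and the example row keys are all hypothetical and exist only to show why sorted row keys make prefix scans cheap.

```python
from bisect import bisect_left, insort


class SketchTable:
    """Toy in-memory model of a sorted key/value table (not the Bigtable API)."""

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted order
        self._rows = {}  # row key -> {(family, qualifier): value}

    def write(self, row_key, family, qualifier, value):
        """Set one cell; a column is addressed by (family, qualifier)."""
        if row_key not in self._rows:
            insort(self._keys, row_key)  # insert while preserving sort order
            self._rows[row_key] = {}
        self._rows[row_key][(family, qualifier)] = value

    def read_row(self, row_key):
        """Look up a single row by its key, the only indexed value."""
        return dict(self._rows.get(row_key, {}))

    def scan_prefix(self, prefix):
        """Return row keys starting with `prefix`, in sorted order."""
        start = bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            matches.append(key)
        return matches


table = SketchTable()
table.write("sensor#002#20220326", "metrics", "temp_c", 21.5)
table.write("sensor#001#20220326", "metrics", "temp_c", 19.0)
table.write("sensor#001#20220327", "metrics", "temp_c", 18.2)
table.write("device#001", "meta", "owner", "ops")

print(table.scan_prefix("sensor#001#"))
```

Rows are sparse in this model as well: each row stores only the cells that were written, so the `device#001` row has no `metrics` columns at all.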

Google Cloud Functions vs App Engine vs Cloud Run vs GKE

Serverless compute platforms like Cloud Functions, App Engine, and Cloud Run let you build, develop, and deploy applications while simplifying the developer experience by eliminating all infrastructure management.

On the other hand, Google Kubernetes Engine (GKE) runs Certified Kubernetes, which facilitates the orchestration of containers via declarative configuration and automation.

Both the Google serverless platforms and GKE allow you to scale your application based on your infrastructure requirements. The comparison below helps you identify when to use each service.

Cloud Functions

App Engine

  • Cloud Functions is a fully managed, serverless platform for creating stand-alone functions that respond to real-time events without the need to manage servers, configure software, update frameworks, and patch operating systems.
  • With Cloud Functions, you write simple, single-purpose functions that are attached to events produced from your cloud infrastructure and services.
  • Cloud Functions can be written in JavaScript, Python 3, Go, or Java, which keeps functions portable and makes local testing familiar.
  • Functions are stateless. The execution environment is often initialized from scratch, which is called a cold start, and cold starts can take a significant amount of time to complete.
  • It is a serverless execution environment that can be used for building and connecting your cloud services. It can serve IoT workloads, ETL, webhooks, Kafka messages, analytics, and event-driven services.
  • Cloud Functions are great for building serverless backends, doing real-time data processing, and creating intelligent apps.
  • App Engine is a fully managed, serverless platform for hosting and developing highly scalable web applications. It lets you focus on your code while App Engine manages infrastructure concerns.
  • You can scale your applications from zero to planet-scale without having to worry about or manage infrastructure.
  • You can build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP runtimes. Moreover, you can also bring any library and framework to App Engine by supplying a Docker container.
  • Each Cloud project can only contain a single App Engine application. Once App Engine is created on a project, you are not allowed to change the location of your application.
  • App Engine can seamlessly host different versions of your application, and help you effortlessly create development, test, staging, and production environments.
  • With App Engine, you can route incoming traffic to different versions of your application, A/B test it, and perform incremental feature rollouts by using traffic splitting.
  • App Engine easily integrates with Cloud Monitoring and Cloud Logging to monitor your app’s health and performance. It also works with Cloud Debugger and Error Reporting to help you diagnose and fix bugs quickly.
  • You can run your applications in App Engine using the standard or flexible environments. You are allowed to simultaneously use both environments for your application to take advantage of each environment’s individual benefits.
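To make the traffic-splitting idea above more concrete, here is a minimal sketch of weighted routing between versions. It is not App Engine's actual algorithm; the function name `pick_version`, the use of a SHA-256 hash, and the version names are assumptions chosen for illustration. The point it shows is that hashing a stable identifier keeps each user on the same version while the overall traffic follows the configured weights.

```python
import hashlib


def pick_version(user_id, split):
    """Map a user to a version name according to fractional weights.

    `split` maps a (hypothetical) version name to its share of traffic;
    the shares must sum to 1.0.
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in sorted(split.items()):
        cumulative += weight
        if point < cumulative:
            return version
    return sorted(split)[-1]  # guard against floating-point rounding


split = {"v1": 0.9, "v2": 0.1}  # send roughly 10% of users to the new version
print(pick_version("user-42", split))
```

Because the choice depends only on the identifier and the weights, the same user lands on the same version on every request, which is what an A/B test or incremental rollout needs.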

Cloud Run

Google Kubernetes Engine (GKE)

  • Cloud Run is a managed serverless compute platform that helps you run highly scalable containerized applications that can be invoked via web requests or Pub/Sub events.
  • It is built on Knative, an open standard that enables the portability of your applications.
  • You can pick the programming language of your choice, any operating system libraries, or even bring your own binaries.
  • You can leverage container workflows since Cloud Run integrates well with services in the container ecosystem like Cloud Build, Artifact Registry, and Docker.
  • Your container instances run in a secure sandbox environment isolated from other resources.
  • With Cloud Run, you can automatically scale up or down from zero to N depending on traffic. 
  • Cloud Run services are regional and are automatically replicated across multiple zones.
  • Cloud Run provides an out-of-the-box integration with Cloud Monitoring, Cloud Logging, Cloud Trace, and Error Reporting to monitor the health and performance of an application.
  • Google Kubernetes Engine (GKE) is a managed Kubernetes service that facilitates the orchestration of containers via declarative configuration and automation.
  • It integrates with Identity and Access Management (IAM) to control access in the cluster with your Google accounts and the role permissions you set.
  • GKE runs Certified Kubernetes. This enables portability to other Kubernetes platforms across cloud and on-premises workloads.
  • You can reduce operational overhead by enabling auto-repair, auto-upgrade, and release channels.
  • GKE lets you reserve a CIDR range for your cluster, allowing your cluster IPs to coexist with private network IPs via Google Cloud VPN.
  • With GKE, you can choose clusters tailored to the availability, version stability, isolation, and pod traffic requirements of your mission-critical workloads.
  • You can automatically scale your application deployment up and down based on CPU and memory utilization.
  • By default, your cluster nodes are automatically updated with the latest release version of Kubernetes. Kubernetes release updates are quickly made available within GKE.
  • Google Kubernetes Engine integrates well with Cloud Logging and Cloud Monitoring via Cloud Console, making it easy to gain insight into your application.
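Scaling a deployment on CPU or memory utilization, as mentioned above, is typically handled by the Kubernetes Horizontal Pod Autoscaler. The sketch below follows the formula given in the Kubernetes documentation, desired = ceil(current replicas × current metric / target metric). The function name, parameter names, and the default bounds and tolerance are assumptions for this example, not GKE settings.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Estimate the replica count for a utilization-based autoscaler.

    Utilization values are percentages (for example 90 for 90% CPU).
    The min/max bounds and the tolerance are illustrative defaults.
    """
    ratio = current_utilization / target_utilization
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% CPU with a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))
```

The same calculation scales in when utilization drops: 4 pods at 30% with a 60% target would be reduced to 2.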

Google Cloud Storage vs Persistent Disks vs Local SSD vs Cloud Filestore

 

Google Cloud Storage

Persistent Disks

Local SSD

Cloud Filestore

  • Cloud Storage is a service for storing your objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format. You store objects in containers called buckets.
  • You specify a location for storing your object data when you create a bucket. You can select a region, dual-region, or multi-region as the location. Objects stored in a multi-region or dual-region are geo-redundant.
  • Cloud Storage offers different storage classes for various storage requirements: Standard, Nearline, Coldline, and Archive.
  • GCS offers unlimited storage with no minimum object size.
  • Cloud Storage offers two systems for granting users permission to access your buckets and objects: IAM and Access Control Lists (ACLs). These systems act in parallel – in order for a user to access a Cloud Storage resource, only one of the systems needs to grant the user permission.
  • Cloud Storage always encrypts your data by default on the server-side before it is written to disk, at no additional charge. You also have an option to do your own encryption before uploading it to Cloud Storage.
  • Persistent Disk is a block storage service, fully integrated with Google Cloud products like Compute Engine and GKE.
  • Persistent disks can be attached to virtual machine (VM) instances running in Compute Engine or Google Kubernetes Engine.
  • Persistent disks can be resized transparently, backed up quickly, and support simultaneous readers.
  • Persistent disks ensure data integrity by storing data redundantly in zones or regions and are designed for high durability.
  • They are located independently from your virtual machine instances. This means you can detach or move your disks to retain your data even after deleting your instances.
  • You can create snapshots to back up data from your zonal or regional persistent disks.
  • Snapshots are geo-replicated and available for restore in all regions by default. Snapshots of a block device can take place in minutes rather than hours.
  • You can resize your existing persistent disks to scale based on performance and storage space requirements.
  • Persistent disks are automatically encrypted to protect your data, in transit or at rest. You can supply your own key, or Google will automatically generate one for you.
  • Local SSD is ephemeral, locally attached block storage for virtual machines and containers.
  • Local SSDs have higher throughput and lower latency than standard persistent disks or SSD persistent disks.
  • The data that you store on a local SSD persists only until the instance is stopped or deleted. 
  • Local SSDs are designed to offer very high IOPS and low latency.
  • Compute Engine automatically encrypts your data when it is written to local SSD storage space. You can’t use customer-supplied encryption keys with local SSDs.
  • You can create an instance with 16 or 24 local SSD partitions for 6 TB or 9 TB of local SSD space, respectively.
  • Instances with shared-core machine types can’t attach any local SSD partitions.
  • Filestore is a fully managed service for file migration and storage. You can easily mount file shares on Compute Engine VMs.
  • Filestore instances are fully managed NFS file servers on Google Cloud for use with applications running on Compute Engine virtual machine (VM) instances or Google Kubernetes Engine clusters.
  • A Filestore share can be accessed either from a Compute Engine instance within the same VPC or from remote clients.
  • File shares can also be accessed from a Google Kubernetes Engine cluster. The cluster must be in the same Google Cloud project and VPC network as the Filestore instance unless the Filestore instance is on a shared VPC network. Currently, Filestore instances can only be created on a shared VPC network from the host project.
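The local SSD capacity figures quoted above (16 or 24 partitions for 6 TB or 9 TB) can be checked with a short calculation. The sketch assumes each local SSD partition is a fixed 375 GB, which is the size implied by those figures; the constant and function names are made up for this example.

```python
LOCAL_SSD_PARTITION_GB = 375  # assumed fixed size of one local SSD partition


def local_ssd_capacity_tb(partitions):
    """Total local SSD space in TB (decimal, 1 TB = 1000 GB)."""
    return partitions * LOCAL_SSD_PARTITION_GB / 1000


for count in (16, 24):
    print(count, "partitions:", local_ssd_capacity_tb(count), "TB")
```

With the assumed partition size, 16 partitions give 6.0 TB and 24 partitions give 9.0 TB, matching the numbers in the list.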

Google Compute Engine vs App Engine

 

Google Compute Engine

Google App Engine

Compute Engine delivers configurable virtual machines running in Google’s data centers with access to high-performance networking infrastructure and block storage solutions.

App Engine is a fully managed, serverless platform for developing and hosting web applications at scale.

Delivered as Infrastructure-as-a-Service (IaaS)

Delivered as Platform-as-a-Service (PaaS)

Supported Languages: Any

Supported Languages: Go, Python, Java, Node.js, PHP, Ruby (.NET and custom runtimes in the flexible environment)

A machine type is a set of virtualized hardware resources available to a virtual machine (VM) instance, including the system memory size, virtual CPU (vCPU) count, and persistent disk limits. In Compute Engine, machine types are grouped and curated by families for different workloads. You can choose from general-purpose, memory-optimized, and compute-optimized families.

You can run your applications in App Engine using the flexible environment or standard environment. You can also choose to simultaneously use both environments for your application and allow your services to take advantage of each environment’s benefits.

You can create a collection of virtual instances and manage them as a single entity by creating instance groups. Instance groups can be managed instance groups (MIGs) or unmanaged instance groups.

Instances are the basic building blocks of App Engine, providing all the resources needed to successfully host your application. App Engine can automatically create and shut down instances as traffic fluctuates, or you can specify the number of instances to run regardless of the amount of traffic.

Compute Engine offers autoscaling to automatically add or remove VM instances from a managed instance group based on increases or decreases in load. Autoscaling lets your apps gracefully handle increases in traffic, and it reduces cost when the need for resources is lower.

You can specify the type of scaling you want to use: basic scaling, automatic scaling, or manual scaling.

App Engine can scale down to 0 instances when no one is using your application.

General Workloads, VM migration to Compute Engine, Genomics data processing, BYOL or use license-included images

Modern web applications, Scalable mobile back ends

Google Cloud Build

 

  • Build, test, and deploy on Google Cloud Platform’s serverless CI/CD platform.

Features

  • Cloud Build is a fully serverless platform that helps you create custom development workflows for building, testing, and deploying.
  • Cloud Build can import source code from:
    • Cloud Storage
    • Cloud Source Repositories
    • GitHub
    • Bitbucket
  • Supports native Docker.
    • You can import your existing Dockerfile.
    • You can push images directly to Docker image storage repositories such as Docker Hub and Container Registry.
  • You can also automate deployments to Google Kubernetes Engine (GKE) or Cloud Run for continuous delivery.
  • Automatically scans images for package vulnerabilities, based on policies set by your DevSecOps team.
  • You can package source into containers or non-container artifacts like Maven, Gradle, Go, or Bazel.
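As a rough illustration of the Docker support described above, the sketch below generates a minimal Cloud Build configuration in JSON (Cloud Build accepts build config files in YAML or JSON). It uses the public `gcr.io/cloud-builders/docker` builder image and the built-in `$PROJECT_ID` substitution; the image name `hello-app` and the helper function are hypothetical.

```python
import json


def docker_build_config(image):
    """Return a minimal build config that builds one image and pushes it."""
    return {
        "steps": [
            {
                # Builder image that runs the Docker CLI inside Cloud Build.
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
            }
        ],
        # Images listed here are pushed after all steps succeed.
        "images": [image],
    }


# `hello-app` is a placeholder; $PROJECT_ID is filled in by Cloud Build.
config = docker_build_config("gcr.io/$PROJECT_ID/hello-app")
print(json.dumps(config, indent=2))
```

Saved as `cloudbuild.json` next to a Dockerfile, a config like this would usually be submitted with `gcloud builds submit --config cloudbuild.json .`; treat it as a starting point rather than a complete pipeline.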

Pricing

  • The first 120 build-minutes per day are free.
  • Build time beyond the free allotment is charged.

Google Container Registry

 

  • Container Registry is a container image repository to manage Docker images, perform vulnerability analysis, and define fine-grained access control.

Features

  • Automatically build and push images to a private registry when you commit code to Cloud Source Repositories, GitHub, or Bitbucket.
  • You can push and pull Docker images to your private Container Registry utilizing the standard Docker command-line interface.
  • The system creates a Cloud Storage bucket to store all of your images the first time you push an image to Container Registry.
  • You have the ability to maintain control over who can access, view, or download images.
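Pushing and pulling with the standard Docker CLI, as noted above, relies on tagging images with a registry path of the form HOSTNAME/PROJECT-ID/IMAGE:TAG. The helper below is a small sketch that assembles such a name for the Container Registry hostnames; the function name, project ID, and image name are placeholders.

```python
# Container Registry hostnames; the regional ones store images in that region.
REGISTRY_HOSTS = ("gcr.io", "us.gcr.io", "eu.gcr.io", "asia.gcr.io")


def image_reference(project_id, image, tag="latest", host="gcr.io"):
    """Build a HOSTNAME/PROJECT-ID/IMAGE:TAG reference for docker tag/push."""
    if host not in REGISTRY_HOSTS:
        raise ValueError("unknown Container Registry host: " + host)
    return f"{host}/{project_id}/{image}:{tag}"


# Placeholder project and image names.
ref = image_reference("my-project", "hello-app", tag="v1")
print(ref)
```

The resulting string is what you would pass to `docker tag` and then `docker push`; the first push to a given hostname is what triggers creation of the backing Cloud Storage bucket.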

Pricing

  • Container Registry charges for the following:
    • Storing images on Cloud Storage
    • Network egress for containers stored in the registry.
  • Network ingress is free.
  • If the Container Scanning API is enabled in either Container Registry or Artifact Registry, vulnerability scanning is turned on and billed for both products.

GCP Developer Tools

 

  • Cloud Source Repositories is a fully managed Git repository service where you can securely manage your code.

Features

  • You can extend your Git workflow with Cloud Source Repositories. Set up a repository as a Git remote, then push, pull, clone, log, and perform other Git operations as required by your workflow.
  • You can create multiple repositories for a single Google Cloud project. This allows you to organize the code associated with your cloud project in the best way.
  • View repository files from within the Cloud Source Repositories using Source Browser. You can filter your view to focus on a specific branch, tag, or commit.
  • Private repositories are free.
  • Repositories can be automatically synced with GitHub and Bitbucket repositories.
  • Integrates with Cloud Build to automatically build and test an image when changes are pushed to Cloud Source Repositories.
  • You can get insights on actions performed on your repository with Cloud Audit Logs.
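Setting up a repository as a Git remote, as described above, comes down to pointing Git at the repository's HTTPS URL. The sketch below assembles that URL and the matching `git remote add` command as an argument list without running it. The URL pattern follows the form used in the Cloud Source Repositories documentation; the function names, remote name, project ID, and repository name are placeholders.

```python
def remote_url(project_id, repo_name):
    """HTTPS URL of a repository hosted in Cloud Source Repositories."""
    return f"https://source.developers.google.com/p/{project_id}/r/{repo_name}"


def add_remote_command(project_id, repo_name, remote_name="google"):
    """Argument list for `git remote add` (built here, not executed)."""
    return ["git", "remote", "add", remote_name, remote_url(project_id, repo_name)]


# Placeholder project and repository names.
print(" ".join(add_remote_command("my-project", "hello-repo")))
```

Once the remote exists, the usual `git push google main` style of command applies, assuming Git is already authenticated against Google Cloud.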

Pricing

  • Cloud Source Repositories charges are based on:
    • Number of users
    • Storage
    • Network egress