Saturday, 26 March 2022

Google BigQuery

  • A fully managed data warehouse into which you can load petabyte-scale datasets and run SQL queries.

Features

  • BigQuery is a serverless data warehousing technology.
  • It integrates with the Apache big data ecosystem, allowing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using the Storage API.
  • BigQuery supports a standard SQL dialect that is ANSI:2011 compliant, which reduces the need for code rewrites.
  • It automatically replicates data and keeps a seven-day history of changes, which facilitates restoring data and comparing it across points in time.

Loading data into BigQuery

You must first load your data into BigQuery before you can run queries. To do this you can:

  • Load a set of data records from Cloud Storage or from a local file. The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.
  • Export data from Datastore or Firestore and load the exported data into BigQuery.
  • Load data from other Google services, such as
    • Google Ad Manager
    • Google Ads
    • Google Play
    • Cloud Storage
    • YouTube Channel Reports
    • YouTube Content Owner Reports
  • Stream data one record at a time using streaming inserts.
  • Write data from a Dataflow pipeline to BigQuery.
  • Use DML statements to perform bulk inserts. Note that BigQuery charges for DML queries. See Data Manipulation Language pricing.
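As a sketch of the batch-load path above, a newline-delimited CSV in Cloud Storage can be loaded with the bq CLI (the bucket, dataset, and table names here are hypothetical):

```shell
# Load a CSV file from Cloud Storage into a BigQuery table,
# autodetecting the schema (names are placeholders).
bq load \
  --source_format=CSV \
  --autodetect \
  my_dataset.my_table \
  gs://my-bucket/data.csv
```

The same command accepts Avro, newline-delimited JSON, ORC, and Parquet via `--source_format`.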

Querying from external data sources

  • BigQuery offers support for querying data directly from:
    • Cloud Bigtable
    • Cloud Storage
    • Cloud SQL
  • Supported formats are:
    • Avro
    • CSV
    • JSON (newline delimited only)
    • ORC
    • Parquet
  • To query data in external sources, you have to create an external table definition file that contains the schema definition and metadata.
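One way to produce such a definition file is with the bq CLI; a sketch, assuming CSV exports in a hypothetical Cloud Storage bucket:

```shell
# Generate an external table definition file for CSV files in
# Cloud Storage, then create an external table that uses it
# (bucket and dataset names are hypothetical).
bq mkdef --source_format=CSV --autodetect \
  "gs://my-bucket/exports/*.csv" > table_def.json
bq mk --external_table_definition=table_def.json \
  my_dataset.my_external_table
```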

Monitoring

  • BigQuery creates log entries for actions such as creating or deleting a table, purchasing slots, or running a load job.

Pricing

  • On-demand pricing lets you pay only for the storage and compute that you use.
  • Flat-rate pricing with reservations lets high-volume users pay a fixed price for predictable workloads.
  • To estimate query costs, it is best practice to obtain the estimated bytes read by using the query validator in the Cloud Console or by submitting a query job through the API with the dryRun parameter, then use that figure in the Pricing Calculator to compute the query cost.
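The arithmetic behind that estimate is simple; a minimal sketch, assuming a hypothetical on-demand rate of $5 per TiB scanned (check current BigQuery pricing before relying on it):

```python
# Estimate on-demand query cost from the bytes a dry run reports.
# The $5/TiB rate is an assumption; check current BigQuery pricing.
PRICE_PER_TIB = 5.00
TIB = 1024 ** 4

def estimate_query_cost(bytes_processed: int) -> float:
    """Return the estimated on-demand cost in USD for a query."""
    return bytes_processed / TIB * PRICE_PER_TIB

# e.g. a dry run reporting 250 GiB scanned costs about $1.22:
cost = estimate_query_cost(250 * 1024 ** 3)
print(f"${cost:.2f}")
```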

Google Cloud Spanner


  • A fully managed relational database service that scales horizontally with strong consistency.

Features

  • Offers an availability SLA of up to 99.999% for multi-regional instances, i.e., 10x less downtime than four nines.
  • Provides transparent, synchronous replication across regional and multi-regional configurations.
  • Optimizes performance by automatically sharding data based on request load and data size, so you can spend less time scaling your database and more time scaling your business.
  • You can run instances at regional scope, or at multi-regional scope where your database can survive a regional failure.
  • All tables must have a declared primary key (PK), which can be composed of multiple table columns.
  • Can make schema changes like adding a column or adding an index while serving live traffic with zero downtime.
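The primary-key requirement above can be illustrated with Spanner DDL; a sketch with made-up table and column names, showing a composite key:

```sql
-- Every Cloud Spanner table declares a primary key; here a
-- composite key spanning two columns (names are hypothetical).
CREATE TABLE Albums (
  SingerId   INT64 NOT NULL,
  AlbumId    INT64 NOT NULL,
  AlbumTitle STRING(MAX)
) PRIMARY KEY (SingerId, AlbumId);
```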

Pricing

  • Pricing for Cloud Spanner is simple and predictable. You are only charged for:
    • number of nodes in your instance
    • amount of storage that your tables and secondary indexes use (not pre-provisioned)
    • amount of network bandwidth (egress) used
  • Note that there is no additional charge for replication.

Google Cloud SQL


  • A fully managed relational database service. Cloud SQL is available for:
    • MySQL
    • PostgreSQL
    • SQL Server

Features

  • Scale instantly with a single API call as your data grows.
  • Automated and on-demand backups are available.
  • You can restore your database instance to its state at an earlier point in time by enabling binary logging.
  • Data replication between multiple zones with automatic failover.
  • You can run analytics jobs by using BigQuery to directly query your Cloud SQL instance.
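Querying Cloud SQL from BigQuery works through federated queries with `EXTERNAL_QUERY`; a sketch, where the connection ID and table names are hypothetical and the connection resource must already exist:

```sql
-- Run a query inside a Cloud SQL instance from BigQuery via a
-- pre-created connection resource (IDs here are hypothetical).
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT customer_id, created_at FROM customers;'
);
```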

Networking

  • Can be easily connected to App Engine, Compute Engine, Google Kubernetes Engine, and your workstation.

Security

  • Data is encrypted at rest and in transit and can be encrypted using customer-managed encryption keys.
  • It supports private connectivity with Virtual Private Cloud.
  • Every Cloud SQL instance includes a network firewall that lets you control public network access to your database instance.

Pricing

  • Price varies depending on how much storage, memory, and CPU you provision.
  • Cloud SQL offers per-second billing for database instances.
  • Committed use discounts are offered for continuous use of database instances in a particular region for a one-year or three-year term.

Google Cloud Storage (GCS)


  • An object storage service that stores data within buckets.

Buckets

  • The data you upload to Cloud Storage is stored as objects.
  • An object is an immutable piece of data consisting of a file in any format.
  • You store objects inside containers called buckets.
  • All buckets belong to a project.
  • Each project can have multiple buckets.
  • You can also configure a Cloud Storage bucket to host a static website for a domain you own.

Bucket Configurations

  • Life Cycle Management
    • You can define conditions that trigger data deletion, or transition to a cheaper storage class with object life cycle management.
  • Versioning
    • Retain old copies of objects when they are deleted or overwritten.
  • Retention Policies
    • Define minimum retention periods for which objects must be stored.
  • Object holds
    • Place a hold on an object to prevent deletion.
  • Encryption keys
    • Customer-managed
    • Customer-supplied
  • Access Permissions
    • Access Control List
    • Uniform bucket level access
    • Object and Bucket Level Permissions
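As an example of life cycle management, a bucket lifecycle configuration that moves objects to Nearline after 30 days and deletes them after a year might look like the following (the ages are arbitrary):

```json
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 365}
    }
  ]
}
```

A file like this can be applied to a bucket with `gsutil lifecycle set config.json gs://my-bucket` (bucket name hypothetical).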

Storage Classes

  • Standard Storage
    • Good for hot data that is accessed frequently.
  • Nearline Storage
    • Good for use cases that need to store objects for at least 30 days.
    • Ideal for data that you plan to access once per month or less.
  • Coldline Storage
    • A low-cost storage option for infrequently accessed data that you plan to access at most once every 90 days.
  • Archive Storage
    • Is the coldest storage among the storage classes.
    • Designed for storing archive data and disaster recovery data that is expected to be accessed once per 365 days or less.

gsutil tool

  • A Python application that enables you to manage your Cloud Storage from the command line.
  • You can use gsutil to perform bucket and object management tasks like:
    • creating and deleting buckets
    • uploading, downloading, and deleting objects
    • listing buckets and objects
    • moving, copying, and renaming objects
    • editing object and bucket ACLs
  • gsutil performs all operations using HTTPS and TLS.
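The management tasks above map onto gsutil subcommands; a sketch with a hypothetical bucket and object:

```shell
# Common gsutil operations (bucket and object names are hypothetical).
gsutil mb gs://my-example-bucket                 # create a bucket
gsutil cp report.csv gs://my-example-bucket/     # upload an object
gsutil ls gs://my-example-bucket                 # list objects
gsutil mv gs://my-example-bucket/report.csv \
  gs://my-example-bucket/archive/report.csv      # move/rename an object
gsutil rm gs://my-example-bucket/archive/report.csv   # delete an object
gsutil rb gs://my-example-bucket                 # delete the (empty) bucket
```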

Uploading objects to GCS

You can send upload requests to Google Cloud Storage via the following methods:

  • Simple Upload – utilize this if the file is small enough to upload again if the connection fails, and if there is no object metadata to send as part of the upload request.
  • Multipart Upload – utilize this if the file is small enough to upload again if the connection fails, and you need to include object metadata as part of the upload request.
  • Resumable Upload – utilize this for a more reliable transfer, which is especially important with large files.
  • Parallel composite uploads – utilize these if network and disk speed are not limiting factors. In a parallel composite upload, a file is divided into up to 32 chunks that are uploaded in parallel to temporary objects; the final object is then composed from the temporary objects, which are deleted afterwards.
  • Alternatively, for uploading large volumes of data (from hundreds of terabytes up to 1 petabyte), you can utilize the Transfer Appliance. It is a hardware appliance you can use to securely migrate to Google Cloud Platform without disrupting business operations.
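The chunking step of a parallel composite upload can be sketched as follows; this only illustrates the up-to-32-chunk split, not the actual upload or composition:

```python
# Split a payload into at most 32 roughly equal contiguous chunks,
# mirroring how a parallel composite upload divides a file into
# temporary objects before composing the final object.
MAX_COMPONENTS = 32

def split_into_chunks(data: bytes, max_chunks: int = MAX_COMPONENTS) -> list[bytes]:
    """Divide data into up to max_chunks contiguous pieces."""
    n = min(max_chunks, max(1, len(data)))
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

chunks = split_into_chunks(b"x" * 1000)
# Concatenating the chunks recreates the original payload.
assert b"".join(chunks) == b"x" * 1000
assert len(chunks) <= MAX_COMPONENTS
```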

Pricing

  • Pricing for Cloud Storage services is based on what you use, including:
    • the amount of data you store,
    • the duration for which you store it,
    • the number of operations you perform on your data,
    • the network resources used when moving or accessing your data.
  • For “cold” storage classes meant to store long-term, infrequently accessed data, there are also charges for retrieving data and early deletion of data.
  • With Requester Pays, you can require accessors of your data to include a project ID to bill for network charges, operation charges, and retrieval fees.

Google Cloud Filestore


  • Fully managed NFS file servers on Google Cloud for Compute Engine and Google Kubernetes Engine instances.
  • Most commonly used for media rendering, data analytics, and managing shared content.

Features

  • Simple, fast, consistent, scalable, and easy to use network-attached storage.
  • You can copy data from Cloud Storage to a Filestore file share that is mounted on a Compute Engine instance.
  • Data is encrypted at rest and in transit with Google-managed or customer-managed encryption keys.
  • Filestore instances are zonal resources that feature in-zone storage redundancy only.
  • It is tightly integrated with Google Kubernetes Engine (GKE) so containers can reference the same shared data.
  • You can easily grow or shrink your Filestore instances via the Google Cloud Console GUI, gcloud command line, or via API-based controls.

Filestore Performance Service Tiers

  • You can pick a performance tier to support your workload requirements.
    • Basic (HDD) – General purpose, test/dev
    • Basic (SSD) – High performance, limited capacity
    • High Scale (SSD) – High performance, large capacity

Pricing

  • Filestore is priced based on the following factors:
    • Service Tier – Basic Standard, Basic Premium, or High Scale SSD
    • Instance Capacity – refers to the storage capacity allocation of your instance
    • Region – the location to which the instance is provisioned
  • There is no charge for ingress traffic to Filestore or egress traffic to a client within the same zone as the Filestore instance. However, there is a charge for egress from Filestore when network traffic leaves the zone of the Filestore instance.

Persistent Disks


  • Are durable network storage devices that your virtual machine instances can access like physical disks.

Features

  • Data on each persistent disk is distributed across several physical disks and is designed for high durability. It stores data redundantly to ensure data integrity.
  • Persistent disks are resizable to accommodate larger storage requirements.
  • It can be attached to virtual machines running on Compute Engine (GCE) or Google Kubernetes Engine (GKE).
  • You cannot attach a persistent disk to an instance on another project.
  • Your storage is independent of your virtual machine instances so you can detach or move your PDs to keep your data even after you delete your instances.
  • You can increase the size of a persistent disk, but you can never shrink it.

Zonal and Regional Persistent Disks

You can configure your PD to be zonal or regional.

  • Zonal Disks
    • Store data in a single zone and are generally faster than regional disks.
  • Regional Disks
    • Provide replication of data between two zones in the same region.
    • Are designed for building robust, highly available systems on Compute Engine.

Persistent Disk Types

  • Standard (pd-standard)
    • Backed by standard hard disk drives (HDD).
    • Efficient and economical for handling sequential read/write operations, but not optimized for high rates of random input/output operations per second (IOPS).
  • Balanced and SSD Disks
    • Backed by solid-state drives (SSD).
    • SSD persistent disks are designed for single-digit millisecond latencies.

Encryption

  • Data on persistent disks is automatically encrypted at rest and in transit with system-defined encryption keys or with customer-supplied keys.
  • To control your data encryption, you can create PDs with your own encryption keys.

Snapshots

  • Persistent disk snapshots can be created to protect against data loss.
  • Snapshots are incremental and take only minutes to create even if you snapshot disks that are attached to running instances.
  • You can set up a snapshot schedule to back up your data on a regular basis.
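A snapshot schedule like the one described above can be sketched with gcloud; the policy, disk, region, and timing values here are all made up:

```shell
# Create a daily snapshot schedule, then attach it to a disk
# (policy, disk, zone, and region names are hypothetical).
gcloud compute resource-policies create snapshot-schedule daily-backup \
  --region=us-central1 \
  --max-retention-days=14 \
  --daily-schedule \
  --start-time=04:00
gcloud compute disks add-resource-policies my-disk \
  --resource-policies=daily-backup \
  --zone=us-central1-a
```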

Pricing

  • Provisioning persistent disks incurs cost based on the following factors:
    • Amount and location of provisioned space per disk
    • Snapshot Storage
    • Network charges for snapshot creation

Friday, 25 March 2022

Local SSD


  • Is a local solid-state drive storage physically attached to the server that hosts your virtual machine (VM) instances.

Features

  • Tightly coupled to a physical server, offering superior performance, very high input/output operations per second (IOPS), and very low latency compared to other block storage options.
  • Each local SSD is 375 GB. You can attach a maximum of 24 Local SSD partitions to an instance, and you can format and mount several local SSD partitions into one logical volume.
  • Local SSDs are designed for temporary storage use cases which makes them suitable for workloads like:
    • Media Rendering
    • Data Analytics
    • Caches
    • Processing Space
  • Data stored in the GCP infrastructure, including on Local SSDs, is automatically encrypted at rest.
  • The performance gains of Local SSDs come with trade-offs in availability, durability, and flexibility: the storage is not automatically replicated, and all data on the local SSD may be lost if the instance stops for any reason.
  • You are not able to stop and restart an instance that has a local SSD. This means that if you shut down an instance with a local SSD through the guest OS, you cannot restart the instance and all the data stored on the local SSD will be lost.