Saturday, 26 March 2022

Google Cloud Console

  • Google Cloud Console is a web-based admin interface for managing your Google Cloud infrastructure.

Features

  • You can create projects on Google Cloud Console.
  • With Cloud Console, you can quickly find and check the health of all your cloud resources in one place, including virtual machines, network settings, and data storage.
  • Logging
    • Manage and audit user access to project resources.
    • Track down production issues quickly by viewing logs.
  • You can explore the Google Cloud Marketplace and launch cloud solutions with just a few clicks.
  • Billing
    • View a detailed breakdown of your bills.
    • Set spending budgets to avoid unexpected charges.
  • Cloud Console enables you to connect to your virtual machines via Cloud Shell. You can quickly handle admin tasks using this instant-on Linux machine, which comes equipped with your favorite tools, including a preconfigured and authenticated Google Cloud SDK.

Pricing

  • Cloud Console is available at no cost to Google Cloud Platform customers.

 

Google Cloud Dataproc

 

  • Build fully managed Apache Spark, Apache Hadoop, Presto, and other OSS clusters on the Google Cloud Platform using Cloud Dataproc.

Features

  • You can spin up resizable clusters quickly with various virtual machine types, disk sizes, number of nodes, and networking options on Cloud Dataproc.
  • Dataproc provides autoscaling features to help you automatically manage the addition and removal of cluster workers.
  • Cloud Dataproc has built-in integration with the following Google Cloud services for a more complete and robust platform.
    • Cloud Storage
    • BigQuery
    • Cloud Bigtable
    • Cloud Logging
    • Cloud Monitoring
    • AI Hub
  • Image versioning lets you switch between different versions of the tools you want to use.
  • To avoid charges for inactive clusters, you can use Dataproc’s scheduled deletion.
  • You can manage your clusters via
    • Cloud Console Web UI
    • Cloud SDK
    • RESTful APIs
    • SSH access.
  • Dataproc can be provisioned with custom images according to your needs.
  • Workflow templates provide a flexible and simple mechanism for managing and executing workflows.

Pricing

  • Pay only for the resources you use, and lower the total cost of ownership of OSS.
  • Dataproc pricing is based on the number of vCPUs and the duration that they run.
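
The vCPU-and-duration rule above can be sketched as simple arithmetic. This is a minimal illustration, not a billing tool: the function name and the rate of 0.01 per vCPU-hour are placeholders chosen for the example, so check the current Dataproc pricing page for real figures. It covers only the Dataproc fee, not the cost of the underlying virtual machines.

```python
def dataproc_fee(vcpus_per_node: int, nodes: int, hours: float,
                 rate_per_vcpu_hour: float) -> float:
    """Return the Dataproc fee: total vCPUs x hours run x rate per vCPU-hour."""
    total_vcpus = vcpus_per_node * nodes
    return total_vcpus * hours * rate_per_vcpu_hour


# Hypothetical cluster: 6 nodes with 4 vCPUs each, running for 2 hours.
# The rate below is a placeholder, not a quoted price.
fee = dataproc_fee(vcpus_per_node=4, nodes=6, hours=2.0, rate_per_vcpu_hour=0.01)
print(f"{fee:.2f}")  # prints 0.48
```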

Google Cloud Dataflow

 

  • Cloud Dataflow is a fully managed data processing service for executing a wide variety of data processing patterns.

Features

  • Dataflow templates allow you to easily share your pipelines with team members and across your organization.
  • You can also take advantage of Google-provided templates to implement useful but simple data processing tasks.
  • Autoscaling lets Dataflow automatically choose the appropriate number of worker instances required to run your job.
  • You can build a batch or streaming pipeline protected with a customer-managed encryption key (CMEK) or access CMEK-protected data in sources and sinks.
  • Dataflow is integrated with VPC Service Controls to provide additional security on data processing environments by improving the ability to mitigate the risk of data exfiltration.

Pricing

  • Dataflow jobs are billed per second, based on the actual use of Dataflow batch or streaming workers. Additional resources, such as Cloud Storage or Pub/Sub, are each billed per that service’s pricing.

Google Cloud Dataprep

 

  • Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.

Features

  • You can transform structured or unstructured datasets of any size, from megabytes to petabytes, with equal ease and simplicity.
  • Cloud Dataprep can transform datasets stored in CSV, JSON, or relational table formats.
  • You can process data stored in Cloud Storage, BigQuery, or from your desktop, then export the refined data to BigQuery or Cloud Storage for storage, analysis, visualization, or machine learning.
  • Uses a proprietary algorithm that interprets the data transformation intent of a user’s data selection.
  • You can leverage hundreds of transformation functions readily available to turn your data into the asset you want.
  • Cloud Dataprep enables users to collaborate on similar flow objects in real-time or to create copies for other team members to use for independent tasks.
  • Explore your data through interactive visual distributions to assist in your discovery, cleansing, and transformation process.
  • Cloud Dataprep automatically generates one or more samples of the data for display and manipulation in the client application to achieve performance optimization.

Pricing

  • Pricing is split across two components:
    • Design: priced on a per-project basis for an unlimited number of users.
    • Execution: consists of the Dataflow usage for running jobs in Dataprep.

Google BigQuery

  • A fully managed data warehouse where you can feed petabyte-scale data sets and run SQL-like queries.

Features

  • Cloud BigQuery is a serverless data warehousing technology.
  • It integrates with the Apache big data ecosystem, allowing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using the Storage API.
  • BigQuery supports a standard SQL dialect that is ANSI SQL:2011 compliant, which reduces the need for code rewrites.
  • Automatically replicates data and keeps a seven-day history of changes, which facilitates restoration and comparison of data from different points in time.

Loading data into BigQuery

You must first load your data into BigQuery before you can run queries. To do this you can:

  • Load a set of data records from Cloud Storage or from a local file. The records can be in Avro, CSV, JSON (newline delimited only), ORC, or Parquet format.
  • Export data from Datastore or Firestore and load the exported data into BigQuery.
  • Load data from other Google services, such as
    • Google Ad Manager
    • Google Ads
    • Google Play
    • Cloud Storage
    • YouTube Channel Reports
    • YouTube Content Owner Reports
  • Stream data one record at a time using streaming inserts.
  • Write data from a Dataflow pipeline to BigQuery.
  • Use DML statements to perform bulk inserts. Note that BigQuery charges for DML queries. See Data Manipulation Language pricing.
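
The "newline delimited only" requirement for JSON means one complete JSON object per line, with no enclosing array. A minimal sketch of producing such a payload with the Python standard library follows; the helper name `to_ndjson` and the sample records are made up for illustration.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(record, separators=(",", ":")) + "\n" for record in records)


# Hypothetical sample rows; any JSON-serializable dicts would work.
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
payload = to_ndjson(rows)
print(payload, end="")
```

Each line of the result parses on its own with `json.loads`, which is what distinguishes this layout from a single JSON array.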

Querying from external data sources

  • BigQuery offers support for querying data directly from:
    • Cloud Bigtable
    • Cloud Storage
    • Cloud SQL
  • Supported formats are:
    • Avro
    • CSV
    • JSON (newline delimited only)
    • ORC
    • Parquet
  • To query data in an external source, you have to create a table definition file that contains the schema definition and metadata.
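
As a rough idea of what a table definition file holds, the sketch below builds one as JSON for a CSV source in Cloud Storage. The helper name, bucket path, and column names are hypothetical, and the keys shown (sourceFormat, sourceUris, schema) are only the basic ones; consult the BigQuery documentation for the full set of options for each format.

```python
import json


def make_table_definition(source_uri, fields, source_format="CSV"):
    """Build a minimal external table definition: format, location, and schema."""
    return {
        "sourceFormat": source_format,
        "sourceUris": [source_uri],
        "schema": {"fields": fields},
    }


definition = make_table_definition(
    "gs://example-bucket/sales/*.csv",  # hypothetical bucket and path
    [
        {"name": "order_id", "type": "INTEGER"},  # hypothetical columns
        {"name": "amount", "type": "FLOAT"},
    ],
)
print(json.dumps(definition, indent=2))
```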

Monitoring

  • BigQuery creates log entries for actions such as creating or deleting a table, purchasing slots, or running a load job.

Pricing

  • On-demand pricing lets you pay only for the storage and compute that you use.
  • Flat-rate pricing with reservations lets high-volume users pay a fixed, predictable price for their workloads.
  • To estimate query costs, it is best practice to get the estimated bytes read by using the query validator in Cloud Console or by submitting a query job through the API with the dryRun parameter. Use this figure in the Pricing Calculator to calculate the query cost.
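
Once you have the estimated bytes read, the on-demand cost is a straightforward calculation. The sketch below assumes a price per TiB that is purely a placeholder, and the function name is made up for the example; use the Pricing Calculator for actual numbers.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Convert an estimated byte count into an on-demand query cost."""
    return bytes_processed / TIB * price_per_tib


# Hypothetical dry-run result of 250 GiB and a placeholder price of 5.0 per TiB.
estimated_bytes = 250 * 2 ** 30
cost = estimate_query_cost(estimated_bytes, price_per_tib=5.0)
print(f"{cost:.2f}")  # prints 1.22
```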

Google Cloud Pub/Sub

 

  • Cloud Pub/Sub is a fully managed, real-time messaging service for event-driven systems that allows you to send and receive messages between independent applications.

Features

  • Capable of global message routing to simplify multi-region systems.
  • Synchronous, cross-zone message replication and per-message receipt tracking ensure at-least-once delivery at any scale. Pub/Sub delivers each message at least once, so the Pub/Sub service might redeliver messages.
  • You can declare independent quota and billing for publishers and subscribers.
  • Cloud Pub/Sub doesn’t have shards or partitions. You just need to set your quota, publish, and consume.
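
Because delivery is at least once, a subscriber may see the same message more than once and should handle it idempotently. One common approach is to track the IDs of messages already processed. The sketch below is a plain-Python illustration that does not use the Pub/Sub client library; the class name is hypothetical, and the in-memory set would be replaced by a durable store in a real system.

```python
class DedupingHandler:
    """Process each message ID at most once, ignoring redeliveries."""

    def __init__(self, process):
        self._process = process
        self._seen = set()  # in-memory only; a real system needs durable storage

    def handle(self, message_id, data):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._process(data)
        self._seen.add(message_id)
        return True


processed = []
handler = DedupingHandler(processed.append)

# Simulated delivery stream in which message "m1" is redelivered.
for message_id, data in [("m1", b"a"), ("m2", b"b"), ("m1", b"a")]:
    handler.handle(message_id, data)

print(processed)  # prints [b'a', b'b']
```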

Key Concepts

  • Topic
    • A named resource to which publishers send messages.
  • Subscription
    • A named resource representing the stream of messages from a specific topic, to be sent to the subscribing application.
  • Message
    • The combination of data and attributes that a publisher sends to a topic and is eventually sent to subscribers.
  • Message attribute
    • A key-value pair that a publisher can define for a message.

Publisher-subscriber relationships

  • A publisher application creates and sends messages to a topic.
  • Subscriber applications then create a subscription to a topic to receive messages from the topic.
  • Communication can be
    • one-to-many
    • many-to-one
    • many-to-many
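
The concepts above can be illustrated with a toy in-memory model. This is not the Pub/Sub client library, and every class and name here is hypothetical; it only shows how a topic fans each message (data plus attributes) out to all of its subscriptions, which gives the one-to-many case.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """Data plus optional key-value attributes set by the publisher."""
    data: bytes
    attributes: dict = field(default_factory=dict)


class Subscription:
    """Holds the stream of messages from one topic for one subscriber."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def pull(self):
        """Return the next message, or None if nothing is waiting."""
        return self.queue.popleft() if self.queue else None


class Topic:
    """Named resource that publishers send messages to."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = []

    def subscribe(self, name):
        subscription = Subscription(name)
        self._subscriptions.append(subscription)
        return subscription

    def publish(self, message):
        # Every subscription receives its own copy of the stream.
        for subscription in self._subscriptions:
            subscription.queue.append(message)


orders = Topic("orders")
billing = orders.subscribe("billing-sub")
shipping = orders.subscribe("shipping-sub")
orders.publish(Message(b"order-1", {"priority": "high"}))
print(billing.pull().data, shipping.pull().attributes)
```

Several publishers calling `publish` on the same topic would give the many-to-one and many-to-many cases without any change to the model.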

Pricing

  • Pub/Sub pricing is calculated based upon monthly data volumes:
    • Message ingestion and delivery
    • Snapshots and retained acknowledged messages
  • The first 10 GB of data per month is offered free of charge.

Google Cloud Secret Manager

 

  • Secret Manager is a secure and convenient method to store API keys, passwords, certificates, and other sensitive data.
  • It provides a central place as the source of truth to manage, access, and audit secrets across Google Cloud.

Features

  • Secret names are project-global resources, but secret data is stored in regions.
  • You can choose specific regions in which to store your secrets.
  • Secret data is immutable and most operations take place on secret versions.
  • Secret Manager integrates with IAM.
  • With Cloud Audit Logs enabled, every interaction with Secret Manager generates an audit entry, which helps you detect system anomalies.
  • You can enable context-aware access to Secret Manager from hybrid environments using VPC Service Controls.

Pricing

  • Secret Manager charges for operations and active secret versions.
  • A version is considered active if it is in the ENABLED or DISABLED state.
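
To make the "active version" rule concrete, the sketch below counts how many versions in a list would be considered active. The function name and the sample list of states are made up for illustration; only the ENABLED and DISABLED states count, so a destroyed version does not.

```python
ACTIVE_STATES = {"ENABLED", "DISABLED"}


def count_active_versions(version_states):
    """Count the secret versions whose state is ENABLED or DISABLED."""
    return sum(1 for state in version_states if state in ACTIVE_STATES)


# Hypothetical states of four versions of one secret.
states = ["ENABLED", "DISABLED", "DESTROYED", "ENABLED"]
print(count_active_versions(states))  # prints 3
```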