News About GCP Services
At Google Cloud, we work with businesses in a range of industries, and we’ve seen nearly every business experience peak events when their online traffic skyrockets. For retailers, the peak event is Black Friday and Cyber Monday (or BFCM)—the period right after Thanksgiving in the U.S., when holiday shopping starts. The weekend kicks off the all-important holiday shopping season of November and December, when an estimated 20% of all annual retail sales occur.
During an average day, online retail sales in the U.S. total about $1.4 billion, CNET reports. In contrast, on Black Friday 2018, U.S. online sales totaled $6.22 billion (up 24% from 2017). Cyber Monday 2018 sales surged to $7.9 billion (up 19% from 2017)—the biggest online sales day ever in the U.S., according to Adobe Analytics.
Traffic to retailers’ mobile and shopping apps surges to levels unmatched during the rest of the year, and availability or scalability issues can result in millions of dollars of lost sales. Every year, there are well-publicized retail website crashes, so avoiding downtime—along with the accompanying reputation damage, unhappy customers and stressed, overworked IT teams—is particularly important for retailers.
We know that a solid technology infrastructure is the foundation for retailers to stay ahead of demand and succeed during this busy season. Beyond that, though, support for that infrastructure is essential. Support isn’t just activated if something goes wrong. Support for an event like Black Friday and Cyber Monday involves preparation well ahead of time, and includes testing, architecture reviews, capacity planning, operational drills, and war rooms during the event itself. We took a prescriptive approach to BFCM support, setting expectations and ownership early (more than six months ahead), to understand what each retail customer needed, both on their side and from our team.
We’ll go through the steps that helped our retail customers have a fruitful and disaster-free season. These steps can generally help you prepare for your own peak event. We’ll also describe how one large-scale retail platform in particular—Shopify—had a successful BFCM using Google Cloud.
Preparing to support retailers on Black Friday/Cyber Monday
We started planning for Black Friday and Cyber Monday for our retail customers in the spring of 2018 to align with their typical preparation timeline. We formed a task force composed of representatives from Google Cloud’s Professional Services, Customer Engineering, Support, Customer Reliability Engineering (CRE), and Product and Engineering teams. We met regularly to strategize, develop tactics, and execute on those tactics with the goal of making sure Google team members and our GCP retail customers were well-prepared.
We focused on a few key technology areas where planning could help prevent any issues.
1. Early capacity planning
As early as May 2018, our account teams began reaching out to GCP retail customers. We discussed high-level planning, such as their particular holiday shopping objectives and the infrastructure capacity they might need to meet those goals.
We worked closely with retailers to review their architectures and advise on techniques to forecast and plan for increases in capacity before Black Friday, since scalability is essential when planning for traffic spikes. We conducted tests across teams and services, and stress-tested systems to uncover any constraints or weaknesses and remediate as needed. Those tailored preparations paid off across the board. With GCP capacity status firmly green—available—throughout Black Friday and Cyber Monday, shoppers visiting our retail customers’ sites could make their purchases without running into a slow or unresponsive site.
2. Reliability testing
Identifying potential reliability issues in a “pre-mortem” (an important component of CRE) was another preemptive step we took. Early on, our CRE team partnered with our retail customers to analyze the reliability of their infrastructures, and run through tabletop exercises to see how well-prepared the customer was in the face of a failure. In some cases, the Professional Services team helped perform load testing to make sure retailers’ platforms could handle expected levels of peak traffic, and in others we encouraged regular load testing and evaluation. And given how important mobile commerce has become, we also tested the performance and reliability of customers’ mobile apps. We also employed Apigee’s API monitoring tools to ensure API stability. We’ve seen APIs become more important in retail technology, since they allow more flexible, microservice-based e-commerce sites.
3. Operational war rooms
“What could possibly go wrong?”
That’s the million-dollar question to ask before a big IT event. We got together with our retail customers’ IT and engineering teams to explore and test for possible worst-case scenarios, like an entire site crash. We created a central Black Friday/Cyber Monday war room staffed with senior-level, experienced Googlers from the Professional Services, Support, and Site Reliability Engineering (SRE) teams. This team of first responders was prepared to use real-time communications to stay connected and address any problems as soon as they arose. This was in addition to understanding customer and vendor integrations and making sure escalation paths were defined ahead of time, so that customer expectations were clear for various channels.
During that weekend, we doubled the number of on-call support staff available to retail customers. In some cases, we placed account teams on-site at GCP and Apigee retail customer locations to help as needed. We monitored whether any retail customers were starting to have reliability or latency problems. If something needed to be triaged, the war room team kicked into action, tackling issues and advising on next steps. The Google war room team also had direct, open access to Google engineers and executives for additional support.
Apigee team members kept a close eye on API traffic during the Black Friday period. The number of API calls for Apigee’s customers (excluding those who host the platform on-premises) grew 95% compared to the same span of time in 2017. Peak API traffic running through Apigee more than doubled, from 48,000 transactions per second (TPS) to 108,000 TPS this year, and the platform remained 99.999% available.
How retailers sailed through Black Friday and Cyber Monday
One of our retail partners, Shopify, is an e-commerce platform supporting more than 600,000 independent retailers. The complexity of managing all those storefronts makes predicting holiday site traffic and sales spikes even more challenging. Shopify provides a platform with 99.98% uptime, and calls BFCM their annual “World Cup” event.
Developing applications today comes with lots of choices and plenty to learn, whether you’re exploring serverless computing or managing a raft of APIs. In today’s post, we’re sharing some of our top videos on what’s new in application development on Google Cloud Platform (GCP), full of tips and tricks you can use.
This demo-packed session walks you through the use of Knative, our Kubernetes-based platform for building and deploying serverless apps. The session shows how to get started with Knative so you can focus on writing code. You’ll see how it uses APIs that are familiar from GKE, and auto-scales and auto-builds to remove added tasks and overhead. The demos show how Knative spins up prebuilt containers, builds custom images, previews new versions of your apps, migrates traffic to those versions, and auto-scales to meet unpredictable usage patterns, among other steps in the build and deploy pipeline. You’ll see the cold start experience, along with preconfigured monitoring dashboards and how auto-termination works.
The takeaway: Get an up-close view into how a serverless platform like Knative works, and what it looks like to further abstract code from the underlying infrastructure.
You have a lot of key choices to make when deciding how and which technology to adopt to meet your application development needs. In this session, you’ll hear about various options for running code and the tradeoffs that may come with your decisions. Considerations include what your code is used for: Does it connect to the internet? Are there licensing considerations? Is it part of a push toward CI/CD? Is it language-dependent or kernel-limited? It’s also important to consider your team’s skills and interests as you decide where you want to focus, and where you want to run your code.
The takeaway: Understand the full spectrum of compute models (and related Google Cloud products) first, then consider the right tool for the job when choosing where to run your code.
Kubernetes empowers developers by making hard tasks possible, rather than just making simple tasks easier. Starting from that premise, this session introduces Kubernetes as a workload-level abstraction that lets you build your own deployment pipeline. The session walks through how to deploy containers with Kubernetes and how to configure a deployment pipeline with Cloud Build. Deployment strategy advice includes using probes to check container integrity and connectedness, using configuration as code for a robust production deployment environment, setting up a CI/CD pipeline, and requesting that the scheduler provision the right resources for your container. It concludes with some tips on preparing for growth by configuring automated scaling using the requests per second (RPS) metric.
The takeaway: Kubernetes can help you automate deployment operations in a highly flexible and customizable way, but needs to be configured correctly for maximum benefit. Help Kubernetes help you for best results.
There’s a lot of advice out there about APIs, so this session recommends focusing on what your goals are for each API you create. That could be updating or integrating software, among others. Choose a problem that’s important to solve with your API, and weigh your team and organization’s particular priorities when you’re creating that API. This session also points out some areas where common API mistakes happen, like version control or naming, and recommends using uniform API structure. When in doubt, keep it simple and don’t mess up how HTTP is actually used.
The takeaway: APIs have to do a lot of heavy lifting these days. Design the right API for the job and future-proof it as much as you can for the people and organizations who will use it down the road.
This session takes a top-to-bottom look at how we define and run serverless here at Google. Serverless compute platforms make it easy to quickly build applications, but sometimes identifying and diagnosing issues can be difficult without a good understanding of how the underlying machinery is working. In this session, you’ll learn how Google runs untrusted code at scale in a shared computing infrastructure, and what that means for you and your applications. You’ll learn how to build serverless applications that are optimized for high performance at scale, learn the tips and pitfalls associated with this, and see a live demo of optimization on Cloud Functions.
The takeaway: When you’re running apps on a serverless platform, you’re focusing on managing those things that elevate your business. See how it actually works so you’re ready for this stage of cloud computing.
Here’s a look at what serverless is, and what it is specifically on GCP. The bottom line is that serverless brings invisible infrastructure that automatically scales, and where you’re only charged for what you use. Serverless tools from GCP are designed to spring to life when they’re needed, and scale very closely to usage needs. In this session, you’ll get a look at how the serverless pieces come together with machine learning in a few interesting use cases, including medical data transcription and building an e-commerce recommendation engine that works even when no historical data is available. Make sure to stay for the cool demo from the CEO of Smart Parking, who shows a real-time, industrial-grade IoT system that’s improving parking for cities and drivers—without a server to be found.
The takeaway: Serverless helps workloads beyond just compute: learn how, why, and when you might use it for your own apps.
As California’s recent wildfires have shown, it’s often hard to predict where fire will travel. While firefighters rely heavily on third-party weather data sources like NOAA, they often benefit from complementing weather data with other sources of information. (In fact, there’s a good chance there’s no nearby weather station to actively monitor weather properties in and around a wildfire.) How, then, is it possible to leverage modern technology to help firefighters plan for and contain blazes?
Last June, we chatted with Aditya Shah and Sanjana Shah, two students at Monta Vista High School in Cupertino, California, who’ve been using machine learning in an effort to better predict the future path of a wildfire. These high school seniors had set about building a fire estimator, based on a model trained in TensorFlow, that measures the amount of dead fuel on the forest floor—a major wildfire risk. This month we checked back in with them to learn more on how they did it.
Why pick this challenge?
Aditya spends a fair bit of time outdoors in the Rancho San Antonio Open Space Preserve near where he lives, and wanted to protect it and other areas of natural beauty so close to home. Meanwhile, after being evacuated from Lawrence Berkeley National Lab in the summer of 2017 due to a nearby wildfire, Sanjana wanted to find a technical solution that reduces the risk of fire before it even occurs. Wildfires not only destroy natural habitat but also displace people, impact jobs, and cause extensive damage to homes and other property. Just as prevention is better than a cure, preventing a potential wildfire from occurring is more effective than fighting it.
With a common goal, the two joined forces to explore available technologies that might prove useful. They began by taking photos of the underbrush around the Rancho San Antonio Open Space Preserve, cataloguing a broad spectrum of brush samples—from dry and easily ignited, to green or wet brush, which would not ignite as easily. In all, they captured 200 photos across three categories of underbrush: “gr1” (humid), “gr2” (dry shrubs and leaves), and “gr3” (no fuel, plain dirt/soil, or burnt fuel).
Aditya and Sanjana then trained a successful model with 150 sample (training) images (roughly 50 in each of the three classes) plus a 50-image test (evaluation) set. For training, the pair turned to Keras, their preferred Python-based, easy-to-use deep learning library. Training the model in Keras has two benefits: it lets you export the model as a TensorFlow estimator, which you can run on a variety of platforms and devices, and it allows for easy and fast prototyping, since it runs seamlessly on either CPU or GPU.
Preparing the data
Before training the model, Aditya and Sanjana ran a preprocessing step on the data: resizing and flattening the images. They used an image_to_feature_vector function, which accepts the raw pixel intensities of an input bitmap image and resizes that image to a fixed size, to ensure each image in the input dataset has the same feature-vector size. As many of the captured images were of different sizes, the pair resized them all to 32×32 pixels. Since Keras models take a 1-dimensional feature vector as input, they needed to flatten each 32x32x3 image into a 3,072-dimensional feature vector. Further, they defined a list of image paths to initialize the data and label lists, then looped over the image paths, loading each image with the cv2.imread function. Next, the pair extracted the class label (such as gr3) from each image’s name. They then converted the images to feature vectors using the image_to_feature_vector function and updated the data and label lists to match.
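The post doesn’t reproduce their preprocessing code, but a minimal sketch of such a helper, assuming OpenCV is used for image handling, might look like this:

import cv2

def image_to_feature_vector(image, size=(32, 32)):
    # Resize the image to a fixed 32x32 size, then flatten the
    # 32x32x3 pixel array into a 3,072-dimensional feature vector.
    return cv2.resize(image, size).flatten()

# Hypothetical usage while looping over the captured photos:
# image = cv2.imread(image_path)
# features = image_to_feature_vector(image)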
Aditya and Sanjana next discovered that the simplest way to build the model was to linearly stack layers to form a sequential model, which simplified the organization of the hidden layers. They were able to use the img2vec function, built into TensorFlow, as well as a support-vector-machine (SVM) layer.
Next, the pair trained the model using a stochastic gradient descent (SGD) optimizer with a learning rate of 0.05. SGD is an iterative method that optimizes the model by updating its parameters along noisy (stochastic) estimates of the loss gradient. There are a number of other gradient descent methods in common use, including adam. Aditya and Sanjana tried rmsprop, which yielded very low accuracy (~47%). Some methods like adagrad yielded slightly higher accuracy but took more time to run, so they settled on SGD, which offered good optimization with solid accuracy and a fast running time. In terms of hyperparameters, the pair tried different numbers of training epochs (50, 100, 200) and batch_size values (10, 35, 50), and achieved their highest accuracy (94%) with epochs = 200 and batch_size = 35.
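Their full training script isn’t shown in the post; the sketch below is an assumption of what a simple Keras sequential model with these hyperparameters could look like (the dense layer sizes and data loading are placeholders, and it substitutes plain dense layers for the SVM layer mentioned above):

from tensorflow import keras

# train_data: N x 3072 flattened image vectors; train_labels: N one-hot
# vectors over the three classes (gr1, gr2, gr3). Loading is assumed.
num_classes = 3

model = keras.Sequential([
    keras.layers.Dense(768, activation="relu", input_shape=(3072,)),
    keras.layers.Dense(384, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.05),  # learning rate from the text
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# The hyperparameters that gave the pair their best accuracy (94%):
# model.fit(train_data, train_labels, epochs=200, batch_size=35)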
Unfortunately, the order counts still don’t match. But this time the restaurant website has more orders (11) than the customer site (9). By reviewing the restaurant’s order list, we see the reason for this situation: duplicate orders. For example, an order for Hawaii, Broccoli Salad, Water was created by the customer only once but appears twice on the restaurant site, assigned to two different cooks! Customers may be fine with that, but just like missing customer orders, delivering extra pizzas is not good for Kale Pizza & Pasta’s business.
Why are we getting duplicate orders? Looking into the reported errors, we see that not only does the chooseCook function return transient errors, but the prepareMeal function does as well. Now, looking into the processOrder function source code again, we see that a new order document is added to Cloud Firestore every time the function executes. This results in duplicates in the following scenario: an order is added to Cloud Firestore, the subsequent call to the prepareMeal function fails, and the function is retried, writing the same order (potentially with a different cook assigned) to Cloud Firestore as a separate document.
We discussed situations like this in our blog post about idempotency, showing how you must make a function idempotent if you want to apply retries without duplicate results or side effects.
In this case, to make the processOrder function idempotent, we can use a Cloud Firestore transaction in place of the add() call. The transaction first checks whether the given order has already been stored (using the event ID to uniquely identify an order), and only creates a document in the database if the order does not exist yet:
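(The original snippet isn’t reproduced in this copy of the post; the following is a rough sketch of the pattern using the Python Cloud Firestore client, with collection and variable names as assumptions.)

from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def store_order_once(transaction, event_id, order_data):
    # Use the event ID as the document ID, so the same event always
    # maps to the same document no matter how often the function retries.
    doc_ref = db.collection("orders").document(event_id)
    snapshot = doc_ref.get(transaction=transaction)
    if not snapshot.exists:
        # Create the order only if this event hasn't been processed yet.
        transaction.create(doc_ref, order_data)

# Called from the (hypothetical) processOrder handler:
# store_order_once(db.transaction(), event_id, order_data)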
In October at Next ’18 London, we announced Cloud Identity for Customers and Partners (CICP) to help you add Google-grade identity and access management (IAM) functionality to your apps, protect user accounts, and scale with confidence—even if those users are customers, partners, and vendors who might be outside of your organization. CICP is now available in public beta.
Adding Google-grade authentication to your apps
All users expect simple and secure sign-up, sign-in, and self-service experiences from all their devices. While you could build an IAM system for your apps yourself, doing so can be hard and expensive. Just think about the complexity of building and maintaining an IAM system that stays up-to-date with evolving authentication requirements, keeping user accounts secure in the face of threats that are increasing in frequency and sophistication, and scaling the system reliably as demand for your app grows.
Knative, the open-source framework that provides serverless building blocks for Kubernetes, is on a roll, and the GKE serverless add-on, the first commercial Knative offering, which we announced this summer, is enjoying strong uptake with our customers. Today, we are announcing that we’ve updated the GKE serverless add-on to support Knative 0.2. In addition, today at KubeCon, Red Hat, IBM, and SAP announced their own commercial offerings based on Knative. We are excited for this growing ecosystem of products based on Knative.
Knative allows developers to easily leverage the power of Kubernetes, the de facto cross-cloud container orchestrator. Although Kubernetes provides a rich toolkit for empowering the application operator, it offers less built-in convenience for application developers. Knative solves this by integrating automated container builds, fast serving, autoscaling, and eventing capabilities on top of Kubernetes, so you get the benefits of serverless on the extensible Kubernetes platform. In addition, Knative applications are fully portable, enabling hybrid applications that can run both on-prem and in the public cloud.
Knative plus Kubernetes together form a general-purpose platform with the unique ability to run serverless, stateful, batch, and machine learning (ML) workloads alongside one another. That means developers can use existing Kubernetes capabilities for monitoring, logging, authentication, identity, security and more, across all their modern applications. This consistency saves time and effort, reduces errors and fragmentation, and improves your time to market. As a user, you get the ease of use of Knative where you want it, with the power of Kubernetes when you need it.
In the four months since we announced Knative, an active and diverse community of companies has contributed to the project. Google Kubernetes Engine (GKE) users have been actively using the GKE serverless add-on since its launch in July and have provided valuable feedback leading to many of the improvements in Knative 0.2.
In addition to Google, multiple partners are now delivering commercial offerings based on Knative. Red Hat announced that you can now start trying Knative as part of its OpenShift container application platform. IBM has committed to supporting Knative on its IBM Cloud Kubernetes Service. SAP is using Knative as part of its SAP Cloud Platform and open-source Kyma project.
A consistent experience, with the flexibility to run where you want, resonates with many enterprises and startups. We are pleased that Red Hat, IBM, and SAP are embracing Knative as a powerful open industry-wide approach to serverless. Here’s what Knative brings to each of the new commercial offerings:
“The serverless paradigm has already demonstrated that it can accelerate developer productivity and significantly optimize compute resources utilization. However, serverless offerings have also historically come with deep vendor lock-in. Red Hat believes that Knative, with its availability on Red Hat OpenShift, and collaboration within the open source community behind the project, will enable enterprises to benefit from the advantages of serverless while also minimizing lock-in, both from a perspective of application portability, as well as that of day-2 operations management.” – Reza Shafii, VP of product, platform services, at Red Hat
“IBM believes open standards are key to success as enterprises are shifting to the era of hybrid multi-cloud, where portability and no vendor lock-in are crucial. We think Knative is a key technology that enables the community to unify containers, apps, and functions deployment on Kubernetes.” – Jason McGee, IBM Fellow, VP and CTO, Cloud Platform.
“SAP’s focus has always been centered around simplifying and facilitating end-to-end business processes. SAP Cloud Platform Extension Factory is addressing the need to integrate and extend business solutions by providing a central point of control, allowing developers to react on business events and orchestrate complex workflows across all connected systems. Under the hood, we are leveraging cloud-native technologies such as Knative, Kubernetes, Istio and Kyma. Knative tremendously simplifies the overall architecture of SAP Cloud Platform Extension Factory and we will continue to collaborate and actively contribute to the Knative codebase together with Google and other industry leaders.” – Michael Wintergerst, SVP, SAP Cloud Platform
We’re excited to deliver enterprise-grade Knative functionality as part of Google Kubernetes Engine, and excited by its momentum in the industry. To get started, take part in the GKE serverless add-on alpha. To learn more about the Knative ecosystem, check out our post on the Google Open Source blog.
In this post, we’ll help you get started deploying the Cloud Storage connector for your CDH clusters. The methods and steps we discuss here apply to both on-premise clusters and cloud-based clusters. Keep in mind that the Cloud Storage connector uses Java, so you’ll want to make sure that the appropriate Java 8 packages are installed on your CDH cluster; Java 8 should come pre-configured as your default Java Development Kit. [Check out this post if you’re deciding how and when to use Cloud Storage over the Hadoop Distributed File System (HDFS).]
Here’s how to get started:
Distribute using the Cloudera parcel
If you’re running a large Hadoop cluster or more than one cluster, it can be hard to deploy libraries and configure Hadoop services to use those libraries without making mistakes. Fortunately, Cloudera Manager provides a way to install packages with parcels. A parcel is a binary distribution format that consists of a gzipped (compressed) tar archive file with metadata.
We recommend using a parcel to install the Cloud Storage connector. There are some big advantages to using a parcel instead of manual deployment and configuration to deploy the Cloud Storage connector on your Hadoop cluster:
Self-contained distribution: All related libraries, scripts and metadata are packaged into a single parcel file. You can host it at an internal location that is accessible to the cluster or even upload it directly to the Cloudera Manager node.
No need for sudo or root access: The parcel is not deployed under /usr or any other system directory. Cloudera Manager deploys it through agents, which eliminates the need for sudo or root access during deployment.
Create your own Cloud Storage connector parcel
To create the parcel for your clusters, download and use this script. You can do this on any machine with access to the internet.
This script will execute the following actions:
Download Cloud Storage connector to a local drive
Package the connector Java Archive (JAR) file into a parcel
Place the parcel under the Cloudera Manager’s parcel repo directory
If you’re connecting an on-premise CDH cluster or cluster on a cloud provider other than Google Cloud Platform (GCP), follow the instructions from this page to create a service account and download its JSON key file.
Create the Cloud Storage parcel
Next, you’ll want to run the script to create the parcel file and checksum file and let Cloudera Manager find it with the following steps:
1. Place the service account JSON key file and the create_parcel.sh script under the same directory. Make sure that there are no other files under this directory.
2. Run the script, which will look something like this:
$ ./create_parcel.sh -f <parcel_name> -v <version> -o <os_distro_suffix>
- parcel_name is the name of the parcel as a single string without any spaces or special characters (e.g., gcsconnector)
- version is the version of the parcel in the format x.x.x (e.g., 1.0.0)
- os_distro_suffix: Like RPM and deb packages, parcels follow a distribution-specific naming convention. A full list of possible distribution suffixes can be found here.
- -d is an optional flag you can use to deploy the parcel to the Cloudera Manager parcel repo folder; if not provided, the parcel file is created in the same directory where the script ran.
3. The script’s logs can be found in /var/log/build_script.log.
Distribute and activate the parcel
Once you’ve created the Cloud Storage parcel, Cloudera Manager has to recognize the parcel and install it on the cluster.
The script you ran generated a .parcel file and a .parcel.sha checksum file. Put these two files on the Cloudera Manager node under directory /opt/cloudera/parcel-repo. If you already host Cloudera parcels somewhere, you can just place these files there and add an entry in the manifest.json file.
On the Cloudera Manager interface, go to Hosts -> Parcels and click Check for New Parcels to refresh the list to load any new parcels. The Cloud Storage connector parcel should show up like this:
[Editor’s note: This post originally appeared on the Velostrata blog. Velostrata has since come into the Google Cloud fold, and we’re pleased to now bring you their seasoned perspective on deciding to migrate to cloud. There’s more here on how Velostrata’s accelerated migration technology works.]
At Velostrata, we’ve spent a lot of time talking about how to optimize the cloud migration process. But one of the questions we also get a lot is: What drives an enterprise’s cloud migration in the first place? For this post, we chatted with customers and dug into our own data, along with market data from organizations like RightScale and others to find the most common reasons businesses move to the cloud. If you think moving to the cloud may be in your future, this can help you determine what kinds of events may result in starting a migration plan.
1. Data center contract renewals
Many enterprises have contracts with private data centers that need to be periodically renewed. When you get to renegotiation time for these contracts, considerations like cost adjustments or other limiting factors often come up. Consequently, it’s during these contract renewal periods that many businesses begin to consider moving to the cloud.
2. Mergers and acquisitions
When companies merge, it’s often a challenge to match up application landscapes and data—and doing this across multiple on-prem data centers can be all the more challenging. Lots of enterprises undergoing mergers find that moving key applications and data into the cloud makes the process easier. Using cloud also makes it easier to accommodate new geographies and employees, ultimately resulting in a smoother transition.
3. Increased capacity requirements
Whether it’s the normal progression of a growing business or the need to accommodate huge capacity jumps during seasonal shifts, your enterprise can benefit from being able to rapidly increase or decrease compute. Instead of having to pay the maximum for on-prem capacity, you can shift your capacity on-demand with cloud and pay as you go.
4. Software and hardware refresh cycles
When you manage an on-prem data center, it’s up to you to keep everything up to date. This can mean costly on-prem software licenses and hardware upgrades to handle the requirements of newly upgraded software. We’ve seen that when evaluating an upcoming refresh cycle, many enterprises find it’s significantly less expensive to decommission on-prem software and hardware and consider either a SaaS subscription or a lift-and-shift of that application into the public cloud. Which path you choose will depend greatly on the app (and available SaaS options), but either way it’s the beginning of a cloud migration project.
5. Security threats
With security threats only increasing in scale and severity, we know many enterprises that are migrating to the cloud to mitigate risk. Public cloud providers offer vast resources for protecting against threats—more than nearly any single company could invest in.
6. Compliance needs
If you’re working in industries like financial services and healthcare, ensuring data compliance is essential for business operations. Moving to the cloud means businesses are using cloud-based tools and services that are already compliant, helping remove some of the burden of compliance from enterprise IT teams.
7. Product development benefits
By taking advantage of benefits like a pay-as-you-go cost model and dynamic provisioning for product development and testing, many enterprises are finding that the cloud helps them get products to market faster. We see businesses migrating to the cloud not just to save time and money, but also to realize revenue faster.
8. End-of-life events
All good things must come to an end—software included. Increasingly, when critical data center software has an end-of-life event announcement, it can be a natural time for enterprise IT teams to look for ways to replicate those services in the cloud instead of trying to extend the life cycle on-prem. This means enterprises can decommission old licenses and hardware along with getting the other benefits of cloud.
As you can see, there are a lot of reasons why organizations decide to kick off their cloud journeys. In some cases, they’re already in the migration process when they find even more ways to use cloud services in the best way. Understanding the types of events that frequently result in a cloud migration can help you determine the right cloud architecture and migration strategy to get your workloads to the cloud.
Learn more here about cloud migration with Velostrata.
For engineers or developers in charge of integrating, transforming, and loading a variety of data from an ever-growing collection of sources and systems, Cloud Composer has dramatically reduced the number of cycles spent on workflow logistics. Built on Apache Airflow, Cloud Composer makes it easy to author, schedule, and monitor data pipelines across multiple clouds and on-premises data centers.
Let’s walk through an example of how Cloud Composer makes building a pipeline across public clouds easier. As you design your new workflow that’s going to bring data from another cloud (Microsoft Azure’s ADLS, for example) into Google Cloud, you notice that upstream Apache Airflow already has an ADLS hook that you can use to copy data. You insert an import statement into your DAG file, save, and attempt to test your workflow. “ImportError – no module named x.” Now what?
As it turns out, functionality that has been committed upstream—such as brand new Hooks and Operators—might not have made its way into Cloud Composer just yet. Don’t worry, though: you can still use these upstream additions by leveraging the Apache Airflow Plugin interface.
Using the upstream AzureDataLakeHook as an example, all you have to do is the following:
Copy the code into a separate file (ensuring adherence to the Apache License)
Import the Airflow plugin interface (from airflow.plugins_manager import AirflowPlugin)
Add the below snippet to the bottom of the file:
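(The snippet itself is missing from this copy of the post; below is a rough sketch of the standard AirflowPlugin registration, with the plugin class and name as assumptions.)

from airflow.plugins_manager import AirflowPlugin

# AzureDataLakeHook is the upstream class you copied into this file above.
class AzureDataLakePlugin(AirflowPlugin):
    name = "azure_data_lake_plugin"
    hooks = [AzureDataLakeHook]

With the file placed in your environment’s plugins/ directory, the hook becomes importable in your DAGs even though it hasn’t shipped in Cloud Composer yet.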
AWS also has a free tier—it’s like giving someone their first hit of ecstasy for free. Why not use this free server? Then that server needs to expand, and you make plans, and you’re hooked, and you know the AWS cloud better than Google’s.
Google Cloud offers a $300 credit right now to try to get you involved, but it’s not the same as a free tier of service. Once the $300 is gone, it’s always going to cost you, whereas on AWS you can downgrade a server back to the free tier if you decide to do that.
There are also some wonky decisions Google made that leave me annoyed almost daily. The fact that you can’t use the servers’ SMTP ports leaves me going all around to get a WordPress site to send emails…or the inability to easily transfer a project between accounts. I’ve landed myself in a situation where I transferred ownership but didn’t remember to transfer billing, and since I was no longer a project owner, I couldn’t transfer billing anymore. Customer service just acted like it made sense that I couldn’t use or configure the resource but that my credit card was still going to be charged.
SSH and SFTP into AWS are fairly standardized and relatively seamless. Google makes these difficult.
Then there’s the way they only give out one static IP address per zone. They have a beta project to decide whether to allow multiple IPs, but…what took so long? IP aliases or multiple network IP addresses…on AWS I just added the IP addresses. Why do I need more than one? Because my name servers need to have different IP addresses, but again, I can’t do it right now.
So with all these limits here and there, I personally pay for my servers with AWS (it’s just easier to use), but I use Google Cloud for short experiments where I may need more than one IP, and for a site that doesn’t ever send an email.
You can request a Transfer Appliance directly from your GCP console. The service will be available in beta in the EU in a 100TB configuration with total usable capacity of 200TB. And it’ll soon be available in a 480TB configuration with a total usable capacity of a petabyte.
Moving HDFS clusters with Transfer Appliance
Customers have been using Transfer Appliance to move everything from audio and satellite imagery archives to geographic and wind data. One popular use case is migrating Hadoop Distributed File System (HDFS) clusters to GCP.
We see lots of users run their powerful Apache Spark and Apache Hadoop clusters on GCP with Cloud Dataproc, a managed Spark and Hadoop service that allows you to create clusters quickly, then hand off cluster management to the service. Transfer Appliance is an easy way to migrate petabytes of data from on-premise HDFS clusters to GCP.
Earlier this year, we announced the ability to configure Transfer Appliance with one or more NFS volumes. This lets you push HDFS data to Transfer Appliance using Apache DistCp (also known as Distributed Copy)—an open source tool commonly used for intra/inter-cluster data copy. To copy HDFS data onto a Transfer Appliance, configure it with an NFS volume and mount it from the HDFS cluster. Then run DistCp with the mount point as the copy target. Once your data is copied to Transfer Appliance, ship it to us and we’ll load your data into Cloud Storage.
Using Transfer Appliance in production
EU customers such as Candour Creative, which helps its clients tell stories through films and photographs, wanted to take advantage of having their content readily available in the cloud. But Zac Crawley, Director at Candour, was facing some challenges with the move.
“Multiple physical backups of our data were taking up space and becoming costly,” Crawley says. “But when we looked at our network, we figured it would take a matter of months to move the 40TBs of large file data. Transfer Appliance reduced that time significantly.”
Tier 1: Opportunistic (especially to maximize ROI)
The first tier—strong candidates to migrate first—revolves around current opportunities that could help you maximize an application’s ROI in some fashion. That’s especially useful if you’re under pressure to justify cloud business value or otherwise provide cost assessments for a workload or app. Here are some questions to ask to identify the applications to prioritize:
Is this application significantly more expensive to run on-prem than it would be to migrate and run in the public cloud?
Will this application require an upcoming hardware refresh, making it more attractive to move to the public cloud sooner rather than later?
Are there services (or regions/instances, etc.) in the cloud that would make this application perform significantly better?
Identifying these options to migrate can create some quick wins that yield tangible, immediate benefits for users and the business.
Tier 2: Minimize your migration risk
Where our first tier focuses on opportunity, our second tier focuses on risk. What applications can you move with relatively low risk to your greater IT operations? There are a number of questions IT can ask to help evaluate which applications are the least risky to migrate, making them the most attractive to migrate in the early phases of a cloud migration project. For example:
What is the business criticality of this application?
Do large swaths of employees and/or customers depend on this application?
What is the production level of this application (development vs. production)?
How many dependencies and/or integrations does this application have?
What is my IT team’s understanding of this application?
Does my IT team have proper, up-to-date, thorough documentation for this application and its architecture?
What are the operational requirements (SLAs) for this application?
What are the legal or governmental compliance requirements for this application?
What are the downtime and/or latency sensitivities for this application?
Are there line-of-business owners eager and willing to migrate their apps early?
Going through this list of questions can help you rank applications from lowest to highest risk. Low-risk applications should be migrated first, and higher-risk applications should come later.
Tier 3: Ease of migration to the public cloud
The third tier in this framework revolves around the ease with which you can potentially migrate an application to the cloud. Unlike risk, which is all about that application’s relative importance, ease of migration is about how frictionless the application’s journey to the cloud will be. Some good questions to ask include:
How new is this application, and was it designed to run on-prem or in the public cloud?
Can this application be migrated using straightforward approaches like lift-and-shift?
Is this application standardized for one OS type, or does it have flexible requirements?
Does this application (or its data) have regulatory, compliance, or SLA-based requirements to run on-prem?
When plotting out which applications to migrate to the cloud, you may find that sometimes applications from Tier 3 may go before Tier 2 (or even Tier 1). This is completely fine. Tier 2 and Tier 3 both involve a lot of variables, so it’s common to have some mixing and matching along your migration path.
Tier 4: Custom applications
The fourth and final tier of this framework—representing the applications you should migrate last—comprises your custom applications. These are applications that were written in-house or by third parties, but which will pose some potentially unique migration questions, like:
Was this application built specifically for its current hardware? For on-prem?
Do we have proper comments in the code to help us re-architect for the cloud if needed?
How is this application intertwined within our total application landscape?
Do we have the in-house expertise to migrate this application to the cloud successfully?
Answering these questions will help you have a sense of how (and whether) you’ll migrate these applications, and what challenges you might encounter along the way.
These four tiers represent a generic framework for you to decide the order in which to migrate your applications to the cloud. It’s crucial to get early wins during the migration process, and this framework can help you identify which applications represent the highest likelihood of early success. You can progress through your application landscape knowing which apps are most likely to yield success and which will be more challenging.
Learn more about cloud migration here, and happy planning!
At Google Cloud, we work hard to give you the controls you need to tailor your network and security configurations to your organization’s needs. Today, we’re excited to announce the general availability of a few important networking features for Google Kubernetes Engine (GKE) that provide additional security and privacy for your container infrastructure: private clusters, master authorized networks, and Shared Virtual Private Cloud (VPC).
These new features enable you to limit access to your Kubernetes clusters from the public internet, confining them within the secure perimeter of your VPC, and to share common resources across your organization without compromising on isolation. Specifically:
Private clusters let you deploy GKE clusters privately as part of your VPC, restricting access to within the secure boundaries of your VPC.
Master authorized networks block access to your clusters’ master API endpoint from the internet, limiting access to a set of IP addresses you control.
Shared VPC eases cluster maintenance and operation by separating responsibilities: it gives centralized control of critical network resources to network or security admins, and cluster responsibilities to project admins.
Credit Karma, a personal finance company that keeps track of its users’ credit scores, has been eagerly testing out these advanced GKE networking capabilities, especially as they work to meet compliance requirements such as PCI-DSS (Payment Card Industry Data Security Standard).
“GKE gives us the features we need to move faster. The private cluster capability enables us to meet strict security and compliance requirements without compromising on functionality. With private IPs and pod IP aliasing, we are able to communicate with other services in GCP while staying within Google’s private network.” – Kevin Jones, Staff engineer, Credit Karma
Now that we’ve been introduced to the new features, let’s take a look at each one in more detail.
Enable more secure Kubernetes deployments
Private clusters on GKE use only private IP addresses for your workloads, so they’re reachable only from within your VPC, making the communication between the master and the nodes completely private.
To access your GKE master for administrative purposes, you can connect privately to the Kubernetes master from your on-prem environment via VPN or private Interconnect.
You can also use master authorized networks to whitelist a set of public internet IPs that are allowed to access the master endpoint, blocking traffic from unauthorized IP sources.
Access to images in Google Container Registry, or to Stackdriver for sending logs, also happens privately through Private Google Access, without leaving Google’s network. To give the private cluster’s nodes internet access, you can either set up additional services, such as a NAT gateway, or use Google’s managed version, Cloud NAT.
Check out the documentation on how to create a private cluster to confine your workloads within the secure boundaries of your VPC.
Control access to critical network resources with Shared VPC
Shared VPC allows many different GKE cluster admins in an Organization to carry out their cluster management duties autonomously while communicating and sharing common resources securely.
For example, you can assign administrative responsibilities such as creating and managing a GKE cluster to project admins, while tasking security and network admin teams with the responsibility for critical network resources like subnets, routes, and firewalls. Learn how to create Kubernetes clusters in a Shared VPC model and set appropriate access controls for critical network resources.
In conclusion, GKE provides centralized network and security management for your enterprise deployments, and allows your sensitive workloads to remain secure and private within the boundaries of your VPC. Read more about how to think holistically about networking for your GKE deployments.
To show off the power of Cloud ML Engine we built two versions of the model independently—one in Scikit-learn and one in TensorFlow—and built a web app to easily generate predictions from both versions. Because these models were built with entirely different frameworks and have different dependencies, it previously required a lot of code to build even a simple app that queried both models. Cloud ML Engine provides a centralized place for us to host multiple types of models, and streamlines the process of querying them.
And before we get into the details, you may be wondering why you’d need multiple versions of the same model. If you’ve got data scientists or ML engineers on your team, they may want to experiment independently with different model inputs and frameworks. Or, maybe they’ve built an initial prototype of a model and will then obtain additional training data and train a new version. A web app like the one we’ve built provides an easy way to compare output, or even load test across multiple versions.
For the frontend, we needed a way to make predictions directly from our web app. Because we wanted the demo to focus on Cloud ML Engine serving, and not on boilerplate details like authenticating our Cloud ML Engine API request, Cloud Functions was a great fit. The frontend consists of a single HTML page hosted on Cloud Storage. When a user enters a movie description in the web app and clicks “Get Prediction,” it invokes a cloud function using an HTTP trigger. The function sends the text to ML Engine, and parses the genres returned from the model to display them in the web UI.
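As a rough illustration of that flow, a Python version of such a function might look like the sketch below (the project, model, and response field names are assumptions; the demo’s actual function may differ):

from googleapiclient import discovery

def get_prediction(request):
    # Hypothetical HTTP-triggered Cloud Function that sends a movie
    # description to a Cloud ML Engine model version for prediction.
    text = request.get_json()["description"]
    ml = discovery.build("ml", "v1")
    # Pointing at a specific version lets you query, say, the
    # Scikit-learn and TensorFlow variants side by side.
    name = "projects/PROJECT_ID/models/movie_genres/versions/v1"
    response = ml.projects().predict(name=name, body={"instances": [text]}).execute()
    # Parse the predicted genres out of the model's response.
    return {"genres": response["predictions"]}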
Here’s an architecture diagram of how it all fits together:
640 Cloud TPUs in GKE powering Minigo
Internally, we use Cloud TPUs to run one of the most iconic Google machine learning workloads: Go. Specifically, we run Minigo, an open-source and independent implementation of Google DeepMind’s AlphaGo Zero algorithm, which was the first computer program to defeat a professional human Go player and world champion. Minigo was started by Googlers as a 20% project, written only from existing published papers after DeepMind retired AlphaGo.
Go is a strategy board game that was invented in China more than 2,500 years ago and that has fascinated humans ever since—and in recent years challenged computers. Players alternate placing stones on a grid of lines in an attempt to surround the most territory. The large number of choices available for each move and the very long horizon of their effects combine to make Go very difficult to analyze. Unlike chess or shogi, which have clear rules that determine when a game is finished (e.g., checkmate), a Go game is only over when both players agree. That’s a difficult problem for computers. It’s also very hard, even for skilled human players, to determine which player is winning or losing at a given point in the game.
Minigo plays a game of Go using a neural network, or a model, that answers two questions: “Which move is most likely to be played next?” called the policy, and “Which player is likely to win?” called the value. It uses the policy and value to search through the possible future states of the game and determine the best move to be played.
The neural network provides these answers using reinforcement learning, which iteratively improves the model in a two-step process. First, the best network plays games against itself, recording the results of its search at each move. Second, the network is updated to better predict the results from step one. Then the updated model plays more games against itself, and the cycle repeats, with the self-play process producing new data for the training process to build better models, and so on ad infinitum.
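To make the policy and value “heads” concrete, here’s a toy Keras sketch of such a two-headed network (the layer shapes are illustrative assumptions; Minigo’s real network is a much deeper residual net):

import tensorflow as tf

BOARD = 19  # standard Go board size

inputs = tf.keras.Input(shape=(BOARD, BOARD, 17))  # stacked board-state planes
trunk = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
flat = tf.keras.layers.Flatten()(trunk)

# Policy head: a probability for each of the 361 points, plus "pass".
policy = tf.keras.layers.Dense(BOARD * BOARD + 1, activation="softmax", name="policy")(flat)
# Value head: a scalar in [-1, 1] estimating which player will win.
value = tf.keras.layers.Dense(1, activation="tanh", name="value")(flat)

model = tf.keras.Model(inputs, [policy, value])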
Key Components & Services
There are two custom services running on the deployed machines that are essential for the solution to function properly. These services are gcs-sync (running on WordPress instances – both Admin and Content) and cloudsql-proxy (running on the SQL Proxy instances).
The gcs-sync service runs a script /opt/c2d/downloads/gcs-sync that, depending on the role the VM is assigned (Content or Admin), will check in with the GCS bucket tied to the deployment and determine if content needs to be pushed to or pulled from GCS. If you need to interact with the service, you can do so via systemctl. For example:
systemctl stop gcs-sync
will kill the script checking GCS, and the node will not receive any updates that come from the Administrator Node. Conversely, if the service needs to be started you can do so with the following command:
systemctl start gcs-sync
The cloudsql-proxy service makes use of the Cloud SQL Proxy binary so you can connect to your Cloud SQL instance without having to whitelist IP addresses, which can change when instances are deleted and recreated in a Managed Instance Group. The Cloud SQL binary is located at /opt/c2d/downloads/cloud_sql_proxy and the script that executes the binary is located at /opt/c2d/downloads/cloudsql-proxy. Like the service that runs gcs-sync, it can be interacted with using systemctl. Stopping the service can be done with:
systemctl stop cloudsql-proxy
At this point your instance will not be able to communicate with the Cloud SQL instance, and the application will not function. If you needed to manually start the service for any reason you can do so with the following command:
systemctl start cloudsql-proxy
When choosing a cloud to host your applications, you want a portfolio of database options—SQL, NoSQL, relational, non-relational, scale up/down, scale in/out, you name it—so you can use the right tool for the job. Google Cloud Platform (GCP) offers a full complement of managed database services to address a variety of workload needs, and of course, you can run your own database in Google Compute Engine or Kubernetes Engine if you prefer.
Today, we’re introducing some new database features along with partnerships, beta news and other improvements that can help you get the most out of your databases for your business.
Here’s what we’re announcing today:
- Oracle workloads can now be brought to GCP
- SAP HANA workloads can run on GCP persistent-memory VMs
- Cloud Firestore launching for all users developing cloud-native apps
- Regional replication, visualization tool available for Cloud Bigtable
- Cloud Spanner updates, by popular demand
Managing Oracle® workloads with Google partners
Until now, it’s been a challenge for customers to bring some of their most common workloads to GCP. Today, we’re excited to announce that we are partnering with managed service providers (MSPs) to provide a fully managed service for Oracle workloads for GCP customers. Partner-managed services like this unlock the ability to run Oracle workloads and take advantage of the rest of the GCP platform. You can run your Oracle workloads on dedicated hardware and connect them to the applications you’re running on GCP.
By partnering with a trusted managed service provider, we can offer fully managed services for Oracle workloads with the same advantages as GCP services. You can select the offering that meets your requirements, as well as use your existing investment in Oracle software licenses.
We are excited to open the doors to customers and partners whose technical requirements do not fit neatly into the public cloud. By working with partners, you’ll have the option to move these workloads to GCP and take advantage of the benefits of not having to manage hardware and software. Learn more about managing your Oracle workloads with Google partners, available this fall.
Partnering with Intel and SAP
This week we announced our collaboration with Intel and SAP to offer Compute Engine virtual machines backed by the upcoming Intel Optane DC Persistent Memory for SAP HANA workloads. Google Compute Engine VMs with this Intel Optane DC persistent memory will offer higher overall memory capacity and lower cost compared to instances with only dynamic random-access memory (DRAM). Google Cloud instances on Intel Optane DC Persistent Memory for SAP HANA and other in-memory database workloads will soon be available through an early access program. To learn more, sign up here.
We’re also continuing to scale our instance size roadmap for SAP HANA production workloads. With 4TB machine types now in general availability, we’re working on new virtual machines that support 12TB of memory by next summer, and 18TB of memory by the end of 2019.
Accelerate app development with Cloud Firestore
For app developers, Cloud Firestore brings the ability to easily store and sync app data at global scale. Today, we’re announcing that we’ll soon expand the availability of the Cloud Firestore beta to more users by bringing the UI to the GCP console. Cloud Firestore is a serverless, NoSQL document database that simplifies storing, syncing and querying data for your cloud-native apps at global scale. Its client libraries provide live synchronization and offline support, while its security features and integrations with Firebase and GCP accelerate building truly serverless apps.
We’re also announcing that Cloud Firestore will support Datastore Mode in the coming weeks. Cloud Firestore, currently available in beta, is the next generation of Cloud Datastore, and offers compatibility with the Datastore API and existing client libraries. With the newly introduced Datastore mode on Cloud Firestore, you don’t need to make any changes to your existing Datastore apps to take advantage of the added benefits of Cloud Firestore. After general availability of Cloud Firestore, we will transparently live-migrate your apps to the Cloud Firestore backend, and you’ll see better performance right away, for the same pricing you have now, with the added benefit of always being strongly consistent. It’ll be a simple, no-downtime upgrade. Read more here about Cloud Firestore.
Simplicity, speed and replication with Cloud Bigtable
For your analytical and operational workloads, an excellent option is Google Cloud Bigtable, a high-throughput, low-latency, and massively scalable NoSQL database. Today, we are announcing that regional replication is generally available. You can easily replicate your Cloud Bigtable data set asynchronously across zones within a GCP region, for additional read throughput, higher durability and resilience in the face of zonal failures. Get more information about regional replication for Cloud Bigtable.
Additionally, we are announcing the beta version of Key Visualizer, a visualization tool for Cloud Bigtable key access patterns. Key Visualizer helps debug performance issues due to unbalanced access patterns across the key space, or single rows that are too large or receiving too much read or write activity. With Key Visualizer, you get a heat map visualization of access patterns over time, along with the ability to zoom into specific key or time ranges, or select a specific row to find the full row key ID that’s responsible for a hotspot. Key Visualizer is automatically enabled for Cloud Bigtable clusters with sufficient data or activity, and does not affect Cloud Bigtable cluster performance. Learn more about using Key Visualizer on our website.
Key Visualizer, now in beta, shows an access pattern heat map so you can debug performance issues in Cloud Bigtable.
Finally, we launched client libraries for Node.js (beta) and C# (beta) this month. We will continue working to provide stronger language support for Cloud Bigtable, and look forward to launching Python (beta), C++ (beta), native Java (beta), Ruby (alpha) and PHP (alpha) client libraries in the coming months. Learn more about Cloud Bigtable client libraries.
Cloud Spanner updates, by popular request
Last year, we launched our Cloud Spanner database, and we’ve already seen customers do proof-of-concept trials and deploy business-critical apps to take advantage of Cloud Spanner’s benefits, which include simplified database administration and management, strong global consistency, and industry-leading SLAs.
Today we’re announcing a number of new updates to Cloud Spanner that our customers have requested. First, we recently announced the general availability of import/export functionality. With this new feature, you can move your data using Apache Avro files, which are transferred with our recently released Apache Beam-based Cloud Dataflow connector. This feature makes Cloud Spanner easier to use for a number of important use cases such as disaster recovery, analytics ingestion, testing and more.
We are also previewing data manipulation language (DML) for Cloud Spanner to make it easier to reuse existing code and tool chains. In addition, you’ll see introspection improvements with Top-N Query Statistics support to help database admins tune performance. DML (in the API as well as in the JDBC driver) and Top-N Query Stats will be released for Cloud Spanner later this year.
Your cloud data is essential to whatever type of app you’re building with GCP. You’ve now got more options than ever when picking the database to power your business.