News About GCP Services
There are lots of great tools for machine learning, and there are lots of ways to solve a machine learning problem. The key is using the right tool for the right situation. While Scikit-learn, TensorFlow, Keras, and PyTorch all have their merits, for this case, BigQuery ML’s ease and speed can’t be beat.
Not convinced? Try this Qwiklab designed for basketball analysis and you’ll see what we mean!
Since we didn’t have to design our architecture from scratch, we wanted to expand and collaborate with more basketball enthusiasts. College students were a natural fit. We started by hosting the first-ever Google Cloud and NCAA Hackathon at MIT this past January, and after seeing some impressive work, we recruited about 30 students from across the country to join our data analyst ranks.
The students have split into two teams, looking at the concepts of ‘explosiveness’ and ‘competitiveness,’ each hoping to build a viable metric to evaluate college basketball teams. By iterating over Google Docs, BigQuery, and Colab, the students have been homing in on ways to use data and quantitative analysis to create definition around previously qualitative ideas.
For example, sportscasters often mention how ‘explosive’ a team is at various points in a game. But aside from watching endless hours of basketball footage, how might you go about determining whether a team was, in fact, playing explosively? Our student analysts considered the various factors that contribute to explosive play, like dunks and scoring runs. By pulling up play-by-play data in BigQuery, they could easily query boxscore data, with timestamps, for all historical games, and compute a score differential over time. Using %%bigquery magic, they pivoted to Colab and explored the pace of play, creating time boundaries that isolated when teams went on a run in a game. From there, they created an expression of explosiveness, which will be used for game analysis during the tournament. You can read more about their analysis and insight at g.co/marchmadness.
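To make this concrete, here's a minimal sketch of the kind of Colab cell involved, using the %%bigquery cell magic (run %load_ext google.cloud.bigquery first); the table and column names are illustrative, not the actual NCAA dataset schema:

%%bigquery score_diffs
-- Score differential over time for every historical game (illustrative schema)
SELECT
  game_id,
  elapsed_time_sec,
  home_score - away_score AS score_differential
FROM `my-project.ncaa.play_by_play`
ORDER BY game_id, elapsed_time_sec

The magic runs the query in BigQuery and stores the result in a pandas DataFrame named score_diffs, ready for the windowing and plotting needed to isolate scoring runs.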
Moving data is an essential task for enterprises today, especially if you’re using lots of different systems across many locations. One product for this is IBM MQ, which helps you move data dependably with secure messaging. Deploying a highly available IBM MQ cluster in the cloud is not a straightforward task: IBM provides many clustering configurations that you can combine in various ways to achieve your high-availability goals, and it’s challenging to deploy such a cluster in a way that takes advantage of the cloud’s benefits, like multi-zone availability, load balancers, and vertical scaling. But once you’ve got it set up, you can quickly and securely move, integrate, and process data from applications across your organization, at a scale of multiple petabytes.
We’ve heard that you want to know how to use a tool like IBM MQ as part of your Google Cloud Platform (GCP) deployment, specifically Compute Engine. We’re pleased to introduce this guide to deploying a highly available IBM MQ queue manager cluster on Compute Engine using GlusterFS. GlusterFS is an open-source, scale-out storage system that works well for this purpose because it is designed for high-throughput storage shared between instances and requires little effort to set up.
Using queue managers to build the IBM MQ cluster
In this solution, we recommend combining queue manager clusters with multi-instance queue managers (instances of the same queue manager configured on different servers) to create a highly available, scalable IBM MQ deployment. Multi-instance queue managers run in an active/standby configuration, using a shared volume for configuration and state data. Clustered queue managers share configuration information using a network channel and can perform load balancing on incoming messages. However, the message state is not shared between the two queue managers.
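As a rough sketch, standing up one multi-instance queue manager on a shared GlusterFS volume looks something like this (the queue manager name and mount paths are illustrative):

# On server A: create the queue manager with data and logs on the shared volume
crtmqm -md /mnt/mqm/data -ld /mnt/mqm/logs QM1
strmqm -x QM1    # starts as the active instance

# On server B: register the same queue manager, then start it in standby mode
addmqinf -s QueueManager -v Name=QM1 -v Directory=QM1 \
  -v Prefix=/var/mqm -v DataPath=/mnt/mqm/data/QM1
strmqm -x QM1    # detects the active instance and waits as standby

With the -x flag, the second instance sees that the shared data is locked by the active instance and waits as a standby, taking over automatically if the active instance fails.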
By using both IBM MQ cluster deployment models at once, you can achieve redundancy at the queue manager level and then scale by distributing the load of the IBM MQ cluster across one or more queue managers. You can see the architecture of the deployment in this diagram:
Brand-new or second-hand? Diesel or electric? Convertible or SUV? Buying a car means choosing from a plethora of options, and that can be hard for some people to navigate. As a result, retailers are constantly rethinking their technology offerings—which means digital transformation must move just as fast.
As the UK’s largest digital automotive marketplace and the country’s 16th largest website, Auto Trader UK prides itself on how simple it is for its consumers and retailers alike to buy and sell cars on their platform. To do it, they rely on Google Kubernetes Engine (GKE) plus Istio, an open-source, transparent service mesh that is integrated into GKE. Istio has helped to enable visibility, increase agility and effectively secure their production environment, without sacrificing developer productivity.
Improving security and agility with GKE and Istio
Since 2013, Auto Trader has been a completely digital business, and they are now the UK’s market leader, with 55 million cross-platform visits every month and an audience four times larger than their nearest competitor. In total, they offer 300 applications including valuation tools, detailed reviews of dealerships and new cars, and integrations with car finance and insurance partners.
Auto Trader’s journey began 17 years ago on-premises with its own data centers. Then in 2018, they decided to move to the public cloud as part of their digital transition to create a more agile architecture that enables faster innovation. Their first choice was Google Cloud Platform (GCP).
As a part of this journey, Auto Trader moved their back-end applications to GKE and implemented Istio. They were looking for a trusted partner to off-load management of Kubernetes, and they chose Google Cloud because, as Karl Stoney, Delivery Engineering Lead at Auto Trader, put it: “Who could manage it better than the company that created it?” Many of the capabilities that Auto Trader were looking for come out of the box with Istio, as it enables visibility into applications in terms of response times and other important service metrics.
“Over the last 14 months we have worked directly with Google’s Kubernetes product managers with ongoing access to the Google Cloud Istio teams,” says Russell Warman, Head of Infrastructure at Auto Trader. “From a business perspective, migrating to Google Cloud Platform means we can get ideas up and running quickly, enabling us to build brilliant new products, helping us to continue to lead in this space.”
Since adopting Kubernetes and Istio, Auto Trader has seen significant gains in efficiency. For example, they are 75 percent more efficient in terms of their compute resources, without impacting performance. Auto Trader has also lowered their monthly bill and can now predict future spending more accurately. Istio, meanwhile, has helped them improve security and visibility, with no extra developer effort or training needed.
Auto Trader is now planning to complete its migration to the public cloud. With about a third of its services already running in production on GCP, they plan to migrate their remaining workloads over the next year to ensure everything is built, managed and monitored in the same way.
Auto Trader are certainly in the driver’s seat when it comes to their Istio journey.
To find out more about the other benefits of migrating to GCP, both from an operational and development perspective, including improved security, see the Auto Trader UK case study.
Good technologies solve problems. Great ones deliver new ways to think about ourselves and the future.
Think of the rich worlds seen through the telescope and the microscope, or space explorations that have broadened the sense of our place in creation. There is that “bicycle for the mind,” as Steve Jobs called the personal computer. Romance, entertainment, and personal needs have been transformed by online life. In every case, the attributes of the machines fire and empower human imagination.
Cloud computing is another one of these great technologies. Today, we’re pleased to publish The Future of Cloud Computing, a look at some of the ways that the tools and attributes of cloud computing are transforming work, business, and markets.
Cloud computing, which has become a standard at many of the world’s largest companies, is much more than just a cheaper and easier way to access computers, storage, networks and software. The power and ubiquity of the cloud mean easy two-way interactions of data and analysis from virtually any point on the planet. Software innovations enable companies to work at a scale we could not have imagined even a decade ago.
The defaults of this computing architecture, particularly in public clouds like Google Cloud, are choice, flexibility, responsiveness, and strong analytic capability. It’s notable that these are the same values that increasingly drive organizations, in everything from distributed teams and on-demand collaboration to real-time customer service and product upgrades.
To take one example: large-scale clouds like Google Cloud, and smaller private clouds inside companies, increasingly use management software like Kubernetes and Istio, which seek to observe and manage lots of workloads, moving them efficiently through lots of computing hardware in the most standardized and automated way possible.
On one level, this is simply technology at its best, getting hassles out of the way so people can do more creative work. It’s no accident, though, that both products are open source, a transparent, collaborative, and high-velocity way of working that has come into its own in the cloud era.
Other examples draw on existing trends. In global manufacturing, the collaboration of outsourcing, partnering, subcontracting, and alliances is supercharged by the cloud. The transformation of our workspaces, with expensive closed-door offices giving way to cubicles, then open-plan offices and telecommuting, can be effected in new and better ways when both work products and communications tools reside in the cloud, accessible anywhere.
Mobility and the Internet of Things mean all kinds of sensors are capturing data, and products are better connected to the cloud, responding appropriately to new information. There are better forms of security and privacy protection. These are borderless values, designed for protection within the context of an overall system rather than in weaker, localized computers.
There’s much more in the report, and I hope you check it out.
Where is this taking us? Our world has challenges and opportunities, but we have eliminated a few fatal diseases, educated people around the world with cloud-based remote learning, and watched videos from Mars. It’s possible to be quite an optimist.
In that spirit, cloud computing is about discovering more of life, reacting to it, and building on our discoveries faster and better. It’s about making more dreams real.
Click here to download the full report.
From a configuration, operation, and troubleshooting standpoint, the cloud has a lot of roads to the proverbial top of Mount Fuji. Figure 5 shows the different paths available. You are free to choose the one that best maps to your skill level and is most appropriate for the task at hand. While Google Cloud has a CLI-based SDK for shell scripting or interactive terminal sessions, you don’t have to use it. If you are developing an application or prefer a programmatic approach, you can use one of many client libraries that expose a wealth of functionality. If you’re an experienced programmer with specific needs, you can even write directly to the REST API itself. And of course, on the other end of the spectrum, if you are learning or prefer a visual approach, there’s always the console.
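For instance, creating a custom-mode VPC and a subnet takes two short commands in the CLI (the network name, region, and range here are illustrative):

gcloud compute networks create demo-vpc --subnet-mode=custom
gcloud compute networks subnets create demo-subnet \
    --network=demo-vpc --region=us-central1 --range=10.0.0.0/24

The same resources can be created through the client libraries, the REST API, or the console; the result is identical, so you can mix approaches as your comfort level grows.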
In addition to the tools above, if you need to create larger topologies on a regular cadence, you may want to look at Google’s Cloud Deployment Manager. If you want a vendor-agnostic tool that works across cloud providers, you can investigate the open-source program Terraform. Both solutions offer a jump from imperative to declarative infrastructure programming. This may be a good fit if you need a more consistent workflow across developers and operators as they provision resources.
Putting it all together
If this sounds like a lot, that’s because it is. Don’t despair though, there’s a readily available resource that will really help you grok these foundational network concepts: the documentation.
You are most likely very familiar with the documentation sections of several network vendors’ websites. To get up to speed on networking on Google Cloud, your best bet is to familiarize yourself with Google’s documentation as well. There is documentation for high-level concepts like network tier levels, network peering, and hybrid connectivity. Then, each cloud service also has its own individual set of documentation, subdivided into concepts, how-tos, quotas, pricing, and other areas. Reviewing how it is structured and creating bookmarks will make studying and the certification process much easier. Better yet, it will also make you a better cloud engineer.
Finally, I want to challenge you to stretch beyond your comfort zone. Moving from networking to the cloud is about virtualization, automation, programming, and developing new areas of expertise. Your journey into the cloud should not stop at learning how GCP implements VPCs. Set long-term as well as short-term goals. There are so many new areas where your skill sets are needed and where you can provide value. You can do it; don’t doubt that for one minute.
In my next blog post, I’ll discuss an approach to structuring your cloud learning. This will make the learning and certification process easier, as well as prepare you for the Cloud Network Engineer role. Until then, the Google Cloud training team has lots of ways for you to increase your Google Cloud know-how. Join our webinar on preparing for the Professional Cloud Network Engineer certification exam on February 22, 2019 at 9:45am PST. Now go visit the Google certification page and set your first certification goal! Best of luck!
In contrast to a static algorithm coded by a software developer, an ML model is an algorithm that is learned and dynamically updated. You can think of a software application as an amalgamation of algorithms, defined by design patterns and coded by software engineers, that perform planned tasks. Once an application is released to production, it may not perform as planned, prompting developers to rethink, redesign, and rewrite it (continuous integration/continuous delivery).
We are entering an era of replacing some of these static algorithms with ML models, which are essentially dynamic algorithms. This dynamism presents a host of new challenges for planners, who work in conjunction with product owners and quality assurance (QA) teams.
For example, how should the QA team test and report metrics? An ML model’s quality is often expressed as a confidence or accuracy score. Let’s suppose a model shows that it is 97% accurate on an evaluation data set. Does it pass the quality test? If we built a calculator using static algorithms and it got the answer right 97% of the time, we would want to know about the 3% of the time it does not.
Similarly, how does a daily standup work with machine learning models? It’s not like the training process is going to give a quick update each morning on what it learned yesterday and what it anticipates learning today. It’s more likely your team will be giving updates on data gathering/cleaning and hyperparameter tuning.
When the application is released and supported, one usually develops policies to address user issues. But with continuous learning and reinforcement learning, the model is learning the policy. What policy do we want it to learn? For example, you may want it to observe and detect user friction in navigating the user interface and learn to adapt the interface (Auto A/B) to reduce the friction.
Within an effective ML lifecycle, planning needs to be embedded in all stages to start answering these questions specific to your organization.
Data engineering is where the majority of the development budget is spent—as much as 70% to 80% of engineering funds in some organizations. Learning is dependent on data—lots of data, and the right data. It’s like the old software engineering adage: garbage in, garbage out. The same is true for modeling: if bad data goes in, what the model learns is noise.
In addition to software engineers and data scientists, you really need a data engineering organization. These skilled engineers will handle data collection (e.g., billions of records), data extraction (e.g., SQL, Hadoop), data transformation, data storage and data serving. It’s the data that consumes the vast majority of your physical resources (persistent storage and compute). Typically due to the magnitude in scale, these are now handled using cloud services versus traditional on-prem methods.
Effective deployment and management of data cloud operations are handled by those skilled in data operations (DataOps). Data collection and serving are handled by those skilled in data warehousing (DBAs), data extraction and transformation by those skilled in data engineering (Data Engineers), and data analysis by those skilled in statistical analysis and visualization (Data Analysts).
Modeling is integrated throughout the software development lifecycle. You don’t just train a model once and you’re done. The concept of one-shot training, while appealing in budget terms and simplification, is only effective in academic and single-task use cases.
Until fairly recently, modeling was the domain of data scientists. The initial ML frameworks (like Theano and Caffe) were designed for data scientists. ML frameworks are evolving, though, and today’s frameworks (like Keras and PyTorch) are more in the realm of software engineers. Data scientists play an important role in researching the classes of machine learning algorithms and their amalgamation, advising on business policy and direction, and moving into roles leading data-driven teams.
But as ML frameworks and AI as a Service (AIaaS) evolve, the majority of modeling will be performed by software engineers. The same goes for feature engineering, a task performed by today’s data engineers: with its similarities to conventional tasks related to data ontologies, namespaces, self-defining schemas, and contracts between interfaces, it too will move into the realm of software engineering. In addition, many organizations will move model building and training to cloud-based services used by software engineers and managed by data operations. Then, as AIaaS evolves further, modeling will transition to a combination of turnkey solutions accessible via cloud APIs, such as for Cloud Vision and Cloud Speech-to-Text, and customizing pre-trained algorithms using transfer learning tools such as AutoML.
Frameworks like Keras and PyTorch have already transitioned away from symbolic programming to imperative programming (the dominant form in software development), and incorporate object-oriented programming (OOP) principles such as inheritance, encapsulation, and polymorphism. One should anticipate that other ML frameworks will evolve to apply object-relational mapping (ORM) concepts, which we already use for databases, to data sources and inference (prediction). Common best practices will evolve, and industry-wide design patterns will become defined and published, much like how Design Patterns by the Gang of Four influenced the evolution of OOP.
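As a small illustration of that imperative, object-oriented style, here's a minimal PyTorch model (a generic sketch, not tied to any system described in this post):

import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    # A model defined as a class: inheritance and encapsulation at work.
    def __init__(self, in_features, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # executed eagerly, line by line

model = TinyClassifier(in_features=4, n_classes=3)
logits = model(torch.randn(8, 4))  # no separate graph-compilation step

Because the forward pass is ordinary Python, software engineers can step through it with the same debuggers and tools they use for any other code.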
Like continuous integration and delivery, continuous learning will also move into build processes, and be managed by build and reliability engineers. Then, once your application is released, its usage and adaptation in the wild will provide new insights in the form of data, which will be fed back to the modeling process so the model can continue learning.
As you can see, adopting machine learning isn’t simply a question of learning to train a model, and you’re done. You need to think deeply about how those ML models will fit into your existing systems and processes, and grow your staff accordingly. I, and all the staff here at Google, wish you the best in your machine learning journey as you upgrade your software development lifecycle to accommodate machine learning. To learn more about machine learning on Google Cloud, visit our Cloud AI products page.
Part and parcel of modern enterprise development is building APIs that enable you to expose your services to developers both inside and outside your organization. But just building APIs isn’t enough. Getting APIs and API programs to market successfully hinges on convincing your developers to actually use them. And the key driver of getting developers to adopt and consume APIs, both within a company or among the wider developer community, is the developer portal.
To help enterprises create great developer experiences, we’re announcing several enhancements to the Apigee Developer Portal, a comprehensive, customizable solution that helps API providers seamlessly onboard developers and admins who use APIs managed by Google Cloud’s Apigee platform. Here is what’s included in this round of updates:
- A new version of SmartDocs API reference documentation
- An enhanced theme editor and redesigned default portal theme
- Improvements to managing developer accounts
Apigee’s SmartDocs automatically creates beautiful API reference documentation for your developers, and features a new, three-pane view. The left pane helps developers navigate between areas of the API, while the center pane gives detailed documentation for a given operation. The right pane enables you to make API requests directly from the docs, and it includes an “expand” button so you can focus on the details of the request itself.
Documentation is built on the OpenAPI Specification and supports both versions 2.0 and 3.0.x. Every operation defined in the OpenAPI spec gets its own page, which makes it easy for users to share and discuss specific areas of the docs and for your API team to deep-link users to the exact content they need.
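For example, a minimal OpenAPI 3.0 spec like this hypothetical one would yield a dedicated SmartDocs page for its getPet operation:

openapi: "3.0.0"
info:
  title: Pet Store
  version: "1.0"
paths:
  /pets/{petId}:
    get:
      operationId: getPet
      summary: Retrieve a single pet
      parameters:
        - name: petId
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: A single pet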
We’re pleased to announce that The Site Reliability Workbook is available in HTML now! Site Reliability Engineering (SRE), as it has come to be generally defined at Google, is what happens when you ask a software engineer to solve an operational problem. SRE is an essential part of engineering at Google. It’s a mindset, and a set of practices, metrics, and prescriptive ways to ensure systems reliability. The new workbook is designed to give you actionable tips on getting started with SRE and maturing your SRE practice. We’ve included links to specific chapters of the workbook that align with our tips throughout this post.
We’re often asked what implementing SRE means in practice, since our customers face challenges quantifying their success when setting up their own SRE practices. In this post, we’re sharing a couple of checklists to be used by members of an organization responsible for any high-reliability services. These will be useful when you’re trying to move your team toward an SRE model. Implementing this model at your organization can benefit both your services and teams due to higher service reliability, lower operational cost, and higher-value work for the humans.
But how can you tell how far you have progressed along this journey? While there is no simple or canonical answer, below is a non-exhaustive set of checklists you can use to gauge your progress, organized in ascending order of team maturity. Within every checklist, the items are roughly in chronological order, but we recognize that any given team’s actual needs and priorities may vary.
If you’re part of a mature SRE team, these checklists can be useful as a form of industry benchmark, and we’d love to encourage others to publish theirs as well. Of course, SRE isn’t an exact science, and challenges arise along the way. You may not get to 100% completion of the items here, but we’ve learned at Google that SRE is an ongoing journey.
SRE: Just getting started
The following three practices are key principles of SRE, but can largely be adopted by any team responsible for production systems, regardless of its name, before and in parallel to staffing an SRE team.
Beginner SRE teams
Most, if not all, SRE teams at Google have established the following practices and characteristics. We generally view these as fundamental to an effective SRE team, unless there are good reasons why they aren’t feasible for a specific team’s circumstances.
The following practices are also common for SRE teams starting out. If they don’t exist, that can be a sign of poor team health and sustainability issues:
Intermediate SRE teams
These characteristics are common in mature teams and generally indicate that the team is taking a proactive approach to efficient management of its services.
Advanced SRE teams
These practices are common in more senior teams, or sometimes can be achieved when an organization or set of SRE teams share a broader charter.
Another set of SRE “features” that may be desirable but are unlikely to be implemented by most companies:
- SREs are not on-call 24×7. SRE teams are geographically distributed across two locations, such as the U.S. and Europe. It’s worth pointing out that neither half is treated as secondary.
- SRE and developer organizations share common goals and may have separate reporting chains up to SVP level or higher. This arrangement helps to avoid conflicts of interest.
What should I do next?
Once you’ve looked through these checklists, your next step is to think about whether they match your company’s needs.
For those without an SRE team where most of the beginner list is unfilled, we’d highly recommend reading the associated SRE Workbook chapters in the order they have been presented. If you happen to be a Google Cloud Platform (GCP) customer and would like to request CRE involvement, contact your account manager to apply for this program. But to be clear, SRE is a methodology that will work on a huge variety of infrastructures, and using Google Cloud is not a prerequisite for pursuing this set of engineering practices.
We’d also recommend attending existing conferences and organizing summits with other companies in order to share best practices on how to solve some of the blockers, such as recruiting.
We have also seen teams struggle to complete the advanced list because of churn. The rate of systems and personnel changes may be a deterrent to getting there. To avoid teams reverting to the beginner stage, among other problems, our SRE leadership reviews key metrics per team every six months. The scope is narrower than the checklists above because several of the items have become standard.
As you may have guessed by now, answering the central question in this article involves addressing and attempting to assess a given team’s impact, health, and most importantly, how the actual work is done. After all, as we wrote in our first book on SRE: “If we are engineering processes and solutions that are not automatable, we continue having to staff humans to maintain the system. If we have to staff humans to do the work, we are feeding the machines with the blood, sweat, and tears of human beings.”
So yes, you might have an SRE team already. Is it effective? Is it scalable? Are people happy? Wherever you are in your SRE journey, you can likely continue to evolve, grow and hone your team’s work and your company’s services. Learn more here about getting started building an SRE team.
Thanks to Adrian Hilton, Alec Warner, David Ferguson, Eric Harvieux, Matt Brown, Myk Taylor, Stephen Thorne, Todd Underwood and Vivek Rau among others for their contributions to this post.
To operate machine learning systems at scale, teams need access to a wealth of feature data, both to train their models and to serve them in production. GO-JEK and Google Cloud are pleased to announce the release of Feast, an open source feature store that allows teams to manage, store, and discover features for use in machine learning projects.
Developed jointly by GO-JEK and Google Cloud, Feast aims to solve a set of common challenges facing machine learning engineering teams by becoming an open, extensible, unified platform for feature storage. It gives teams the ability to define and publish features to this unified store, which in turn facilitates discovery and feature reuse across machine learning projects.
“Feast is an essential component in building end-to-end machine learning systems at GO-JEK,” says Peter Richens, Senior Data Scientist at GO-JEK. “We are very excited to release it to the open source community. We worked closely with Google Cloud in the design and development of the product, and this has yielded a robust system for the management of machine learning features, all the way from idea to production.”
For production deployments, machine learning teams need a diverse set of systems working together. Kubeflow is a project dedicated to making these systems simple, portable and scalable and aims to deploy best-of-breed open-source systems for ML to diverse infrastructures. We are currently in the process of integrating Feast with Kubeflow to address the feature storage needs inherent in the machine learning lifecycle.
Feature data are signals about a domain entity. For example, for GO-JEK, a driver entity might have a feature for the daily count of trips completed. Other interesting features might be the distance between the driver and a destination, or the time of day. A combination of multiple features is used as input to a machine learning model.
In large teams and environments, how features are maintained and served can diverge significantly across projects, which introduces infrastructure complexity and can result in duplicated work. Common challenges include:
- Features not being reused: Features representing the same business concepts are being redeveloped many times, when existing work from other teams could have been reused.
- Feature definitions vary: Teams define features differently and there is no easy access to the documentation of a feature.
- Hard to serve up-to-date features: Combining streaming and batch derived features, and making them available for serving, requires expertise that not all teams have. Ingesting and serving features derived from streaming data often requires specialised infrastructure. As such, teams are deterred from making use of real-time data.
- Inconsistency between training and serving: Training requires access to historical data, whereas models that serve predictions need the latest values. Inconsistencies arise when data is siloed into many independent systems requiring separate tooling.
Feast solves these challenges by providing a centralized platform on which to standardize the definition, storage and access of features for training and serving. It acts as a bridge between data engineering and machine learning.
Feast handles the ingestion of feature data from both batch and streaming sources. It also manages both warehouse and serving databases for historical and the latest data. Using a Python SDK, users are able to generate training datasets from the feature warehouse. Once their model is deployed, they can use a client library to access feature data from the Feast Serving API.
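In code, that flow looks roughly like the following sketch; the client methods and feature names here are illustrative assumptions, since the exact Feast API has evolved across releases:

from feast import Client  # hypothetical import path

client = Client(core_url="feast-core:6565", serving_url="feast-serving:6566")

# Training: pull historical feature values from the warehouse store.
training_df = client.get_batch_features(
    entity_dataset,  # entity keys plus timestamps
    feature_ids=["driver.trips_completed_daily", "driver.distance_to_destination"],
)

# Serving: fetch the latest values of the same features at prediction time.
online = client.get_online_features(
    entity_rows=[{"driver_id": 42}],
    feature_ids=["driver.trips_completed_daily"],
)

Because training and serving read from the same feature definitions, the training/serving inconsistency described above largely disappears.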
At Google Cloud, we work with businesses in a range of industries, and we’ve seen nearly every business experience peak events when their online traffic skyrockets. For retailers, their peak events are Black Friday and Cyber Monday (or BFCM)—the period right after Thanksgiving in the U.S., when holiday shopping starts. The weekend kicks off the all-important holiday shopping season of November and December, when an estimated 20% of all annual retail sales occur.
During an average day, online retail sales in the U.S. total about $1.4 billion, CNET reports. In contrast, on Black Friday 2018, U.S. online sales totaled $6.22 billion (up 24% from 2017). Cyber Monday 2018 sales surged to $7.9 billion (up 19% from 2017)—the biggest online sales day ever in the U.S., according to Adobe Analytics.
Traffic to retailers’ mobile and shopping apps surges to levels unmatched during the rest of the year, and availability or scalability issues can result in millions of dollars of lost sales. Every year, there are well-publicized retail website crashes, so avoiding downtime—along with the accompanying reputation damage, unhappy customers and stressed, overworked IT teams—is particularly important for retailers.
We know that a solid technology infrastructure is the foundation for retailers to stay ahead of demand and succeed during this busy season. Beyond that, though, support for that infrastructure is essential. Support isn’t just activated if something goes wrong. Support for an event like Black Friday and Cyber Monday involves preparation well ahead of time, and includes testing, architecture reviews, capacity planning, operational drills, and war rooms during the event itself. We took a prescriptive approach to BFCM support, setting expectations and ownership early (more than six months ahead), to understand what each retail customer needed, both on their side and from our team.
We’ll go through the steps that helped our retail customers have a fruitful and disaster-free season. These steps can generally help you prepare for your own peak event. We’ll also describe how one large-scale retail platform in particular—Shopify—had a successful BFCM using Google Cloud.
Preparing to support retailers on Black Friday/Cyber Monday
We started planning for Black Friday and Cyber Monday for our retail customers in the spring of 2018 to align with their typical preparation timeline. We formed a task force composed of representatives from Google Cloud’s Professional Services, Customer Engineering, Support, Customer Reliability Engineering (CRE), and Product and Engineering teams. We met regularly to strategize, develop tactics, and execute on those tactics with the goal of making sure Google team members and our GCP retail customers were well-prepared.
We focused on a few key technology areas where planning could help prevent any issues.
1. Early capacity planning
As early as May 2018, our account teams began reaching out to GCP retail customers. We discussed high-level planning, such as their particular holiday shopping objectives and the infrastructure capacity they might need to meet those goals.
We worked closely with retailers to review their architectures and advise on techniques to forecast and plan for increases in capacity before Black Friday, since scalability is essential when planning for traffic spikes. We conducted tests across teams and services, and stress-tested systems to uncover any constraints or weaknesses and remediate as needed. Those tailored preparations paid off across the board. With GCP capacity status firmly green—available—throughout Black Friday and Cyber Monday, shoppers visiting our retail customers’ sites could make their purchases without running into a slow or unresponsive site.
2. Reliability testing
Identifying potential reliability issues in a “pre-mortem” (an important component of CRE) was another preemptive step we took. Early on, our CRE team partnered with our retail customers to analyze the reliability of their infrastructures, and run through tabletop exercises to see how well-prepared the customer was in the face of a failure. In some cases, the Professional Services team helped perform load testing to make sure retailers’ platforms could handle expected levels of peak traffic, and in others we encouraged regular load testing and evaluation. And given how important mobile commerce has become, we also tested the performance and reliability of customers’ mobile apps. We also employed Apigee’s API monitoring tools to ensure API stability. We’ve seen APIs become more important in retail technology, since they allow more flexible, microservice-based e-commerce sites.
3. Operational war rooms
“What could possibly go wrong?”
That’s the million-dollar question to ask before a big IT event. We got together with our retail customers’ IT and engineering teams to explore and test for possible worst-case scenarios, like an entire site crash. We created a central Black Friday/Cyber Monday war room staffed with senior-level, experienced Googlers from the Professional Services, Support, and Site Reliability Engineering (SRE) teams. This team of first responders was prepared to use real-time communications to stay connected and address any problems as soon as they arose. This was in addition to understanding customer and vendor integrations and making sure escalation paths were defined ahead of time, so that customer expectations were clear for various channels.
During that weekend, we doubled the number of on-call support staff available to retail customers. In some cases, we placed account teams on-site at GCP and Apigee retail customer locations to help as needed. We monitored whether any retail customers were starting to have reliability or latency problems. If something needed to be triaged, the war room team kicked into action, tackling issues and advising on next steps. The Google war room team also had direct, open access to Google engineers and executives for additional support.
Apigee team members kept a close eye on API traffic during the Black Friday period. The number of API calls for Apigee’s customers (excluding those who host the platform on-premises) grew 95% compared to the same span of time in 2017. Peak API traffic running through Apigee more than doubled, from 48,000 transactions per second (TPS) to 108,000 TPS this year, and the platform remained 99.999% available.
How retailers sailed through Black Friday and Cyber Monday
One of our retail partners, Shopify, is an e-commerce platform supporting more than 600,000 independent retailers. The complexity of managing all those storefronts makes predicting holiday site traffic and sales spikes even more challenging. Shopify provides a platform with 99.98% uptime, and calls BFCM their annual “World Cup” event.
Developing applications today comes with lots of choices and plenty to learn, whether you’re exploring serverless computing or managing a raft of APIs. In today’s post, we’re sharing some of our top videos on what’s new in application development on Google Cloud Platform (GCP), where you’ll find tips and tricks you can use.
This demo-packed session walks you through Knative, our Kubernetes-based platform for building and deploying serverless apps. It shows how to get started with Knative so you can stay focused on writing code. You’ll see how it uses APIs that are familiar from GKE, and auto-scales and auto-builds to remove added tasks and overhead. The demos show how Knative spins up prebuilt containers, builds custom images, previews new versions of your apps, migrates traffic to those versions, and auto-scales to meet unpredictable usage patterns, among other steps in the build and deploy pipeline. You’ll see the cold-start experience, along with preconfigured monitoring dashboards and how auto-termination works.
The takeaway: Get an up-close view into how a serverless platform like Knative works, and what it looks like to further abstract code from the underlying infrastructure.
You have a lot of key choices to make when deciding how and which technology to adopt to meet your application development needs. In this session, you’ll hear about various options for running code and the tradeoffs that may come with your decisions. Considerations include what your code is used for: Does it connect to the internet? Are there licensing considerations? Is it part of a push toward CI/CD? Is it language-dependent or kernel-limited? It’s also important to consider your team’s skills and interests as you decide where you want to focus, and where you want to run your code.
The takeaway: Understand the full spectrum of compute models (and related Google Cloud products) first, then consider the right tool for the job when choosing where to run your code.
Kubernetes empowers developers by making hard tasks possible, rather than making simple tasks easier. Starting from that premise, this session introduces Kubernetes as a workload-level abstraction that lets you build your own deployment pipeline, and walks through how to deploy containers with Kubernetes and configure a deployment pipeline with Cloud Build. Deployment strategy advice includes using probes to check container integrity and connectedness, using configuration as code for a robust production deployment environment, setting up a CI/CD pipeline, and requesting that the scheduler provision the right resources for your container. It concludes with some tips on preparing for growth by configuring automated scaling using the requests per second (RPS) metric; a sketch of these ideas follows below.
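A bare-bones Deployment fragment illustrates several of those recommendations at once (the image, paths, and numbers are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: gcr.io/my-project/web:1.0.0
          readinessProbe:        # probe verifies container integrity/connectedness
            httpGet:
              path: /healthz
              port: 8080
          resources:
            requests:            # tells the scheduler what to provision
              cpu: 250m
              memory: 256Mi

Checked into source control, a manifest like this is exactly the "configuration as code" the session recommends.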
The takeaway: Kubernetes can help you automate deployment operations in a highly flexible and customizable way, but needs to be configured correctly for maximum benefit. Help Kubernetes help you for best results.
There’s a lot of advice out there about APIs, so this session recommends focusing on what your goals are for each API you create. That could be updating or integrating software, among others. Choose a problem that’s important to solve with your API, and weigh your team and organization’s particular priorities when you’re creating that API. This session also points out some areas where common API mistakes happen, like version control or naming, and recommends using uniform API structure. When in doubt, keep it simple and don’t mess up how HTTP is actually used.
The takeaway: APIs have to do a lot of heavy lifting these days. Design the right API for the job and future-proof it as much as you can for the people and organizations who will use it down the road.
This session takes a top-to-bottom look at how we define and run serverless here at Google. Serverless compute platforms make it easy to quickly build applications, but sometimes identifying and diagnosing issues can be difficult without a good understanding of how the underlying machinery is working. In this session, you’ll learn how Google runs untrusted code at scale in a shared computing infrastructure, and what that means for you and your applications. You’ll learn how to build serverless applications that are optimized for high performance at scale, learn the tips and pitfalls associated with this, and see a live demo of optimization on Cloud Functions.
The takeaway: When you’re running apps on a serverless platform, you’re focusing on managing those things that elevate your business. See how it actually works so you’re ready for this stage of cloud computing.
Here’s a look at what serverless is, and what it is specifically on GCP. The bottom line is that serverless gives you invisible infrastructure that scales automatically and charges you only for what you use. Serverless tools from GCP are designed to spring to life when they’re needed and to scale closely with usage. In this session, you’ll get a look at how the serverless pieces come together with machine learning in a few interesting use cases, including medical data transcription and building an e-commerce recommendation engine that works even when no historical data is available. Make sure to stay for the cool demo from the CEO of Smart Parking, who shows a real-time, industrial-grade IoT system that’s improving parking for cities and drivers—without a server to be found.
The takeaway: Serverless helps workloads beyond just compute: learn how, why, and when you might use it for your own apps.
As California’s recent wildfires have shown, it’s often hard to predict where fire will travel. While firefighters rely heavily on third-party weather data sources like NOAA, they often benefit from complementing weather data with other sources of information. (In fact, there’s a good chance there’s no nearby weather station to actively monitor weather properties in and around a wildfire.) How, then, is it possible to leverage modern technology to help firefighters plan for and contain blazes?
Last June, we chatted with Aditya Shah and Sanjana Shah, two students at Monta Vista High School in Cupertino, California, who’ve been using machine learning in an effort to better predict the future path of a wildfire. These high school seniors had set about building a fire estimator, based on a model trained in TensorFlow, that measures the amount of dead fuel on the forest floor—a major wildfire risk. This month we checked back in with them to learn more on how they did it.
Why pick this challenge?
Aditya spends a fair bit of time outdoors in the Rancho San Antonio Open Space Preserve near where he lives, and wanted to protect it and other areas of natural beauty so close to home. Meanwhile, after being evacuated from Lawrence Berkeley National Lab in the summer of 2017 due to a nearby wildfire, Sanjana wanted to find a technical solution that reduces the risk of fire even before one occurs. Wildfires not only destroy natural habitat but also displace people, impact jobs, and cause extensive damage to homes and other property. Just as prevention is better than a cure, preventing a potential wildfire is more effective than fighting it.
With a common goal, the two joined forces to explore available technologies that might prove useful. They began by taking photos of the underbrush around the Rancho San Antonio Open Space Preserve, cataloguing a broad spectrum of brush samples—from dry and easily ignited, to green or wet brush, which would not ignite as easily. In all, they captured 200 photos across three categories of underbrush: “gr1” (humid), “gr2” (dry shrubs and leaves), and “gr3” (no fuel, plain dirt/soil, or burnt fuel).
Aditya and Sanjana then trained a successful model with 150 training images (roughly 50 in each of the three classes) plus a 50-image test (evaluation) set. For training, the pair turned to Keras, their preferred Python-based, easy-to-use deep learning library. Training the model in Keras has two benefits: it permits you to export to a TensorFlow estimator, which you can run on a variety of platforms and devices, and it allows for easy and fast prototyping, since it runs seamlessly on either CPU or GPU.
Preparing the data
Before training the model, Aditya and Sanjana ran a preprocessing step on the data: resizing and flattening the images. They used an image_to_feature_vector function, which accepts the raw pixel intensities of an input bitmap image and resizes that image to a fixed size, ensuring that each image in the input dataset has the same feature-vector size. Because the captured images came in different sizes, the pair resized them all to 32×32 pixels. Since Keras models take a 1-dimensional feature vector as input, they then needed to flatten each 32x32x3 image into a 3,072-dimensional feature vector. Next, they defined the image paths to initialize the data and label lists, looped over the image paths to load each image using OpenCV’s cv2.imread function, extracted the class labels (such as gr3) from each image’s file name, converted the images to feature vectors using the image_to_feature_vector function, and updated the data and label lists to match.
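A reconstruction of that preprocessing might look like the following (our sketch based on the description above, not the students' actual code):

import cv2

def image_to_feature_vector(image, size=(32, 32)):
    # Resize to a fixed size, then flatten the 32x32x3 pixels into a
    # 3,072-dimensional feature vector.
    return cv2.resize(image, size).flatten()

image = cv2.imread("gr2_0001.jpg")         # one brush photo (illustrative name)
features = image_to_feature_vector(image)  # shape: (3072,)
label = "gr2"                              # class label parsed from the file name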
Aditya and Sanjana next discovered that the simplest way to build the model was to linearly stack layers to form a sequential model, which simplified the organization of the hidden layers. They were able to use the img2vec function, built into TensorFlow, as well as a support-vector-machine (SVM) layer.
Next, the pair trained the model using a stochastic gradient descent (SGD) optimizer with a learning rate of 0.05. SGD is an iterative optimization method that finds a good solution by taking gradient steps computed on randomly sampled batches of training data. There are a number of gradient descent methods typically used, including adam. Aditya and Sanjana tried rmsprop, which yielded very low accuracy (~47%). Some methods, like adagrad, yielded slightly higher accuracy but took more time to run. So they settled on SGD, which offered good accuracy with fast running time. In terms of hyperparameters, the pair tried different numbers of training epochs (50, 100, 200) and batch_size values (10, 35, 50), and they achieved the highest accuracy (94%) with epochs = 200 and batch_size = 35.
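Put together, the training setup they describe corresponds roughly to this Keras sketch (the layer sizes and data variables are our assumptions):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(3072,)),
    keras.layers.Dense(3, activation="softmax"),  # gr1, gr2, gr3
])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.05),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_data, train_labels, epochs=200, batch_size=35,
          validation_data=(test_data, test_labels))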
Unfortunately, the order counts still don’t match. But this time the restaurant website has more orders (11) than the customer site (9). By reviewing the restaurant’s order list, we see the reason for this situation: duplicate orders. For example, an order for Hawaii, Broccoli Salad, Water was created by the customer only once but appears twice on the restaurant site, assigned to two different cooks! Customers may be fine with that but just like missing customer orders, delivering extra pizzas is not good for Kale Pizza & Pasta’s business.
Why are we getting duplicate orders? By looking into the reported errors, we see that not only does the chooseCook function return transient errors, but the prepareMeal function does as well. Now, looking into the processOrder function source code again, we see that a new order document is added to Cloud Firestore every time the function executes. This results in duplicates in the following scenario: an order is added to Cloud Firestore, the subsequent call to the prepareMeal function fails, and the function is retried, causing the same order (potentially with a different cook assigned) to be written to Cloud Firestore as a separate document.
We discussed situations like this in our blog post about idempotency, showing how you must make a function idempotent if you want to apply retries without duplicate results or side effects.
In this case, to make the processOrder function idempotent, we can use a Cloud Firestore transaction in place of the add() call. The transaction first checks whether the given order has already been stored (using the event ID to uniquely identify an order), and then creates a document in the database only if the order does not exist yet. Here is a sketch of that pattern using the Firebase Admin SDK for Node.js (collection and function names are illustrative, not the post’s original code):
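const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function saveOrderOnce(eventId, order) {
  // Use the event ID as the document ID so every retry maps to the same document.
  const orderRef = db.collection('orders').doc(eventId);
  await db.runTransaction(async (tx) => {
    const snapshot = await tx.get(orderRef);
    if (!snapshot.exists) {
      tx.set(orderRef, order); // create only if this event hasn't been stored yet
    }
  });
}

Because the existence check and the write happen atomically inside the transaction, a retried execution finds the existing document and simply does nothing.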
In October at Next ’18 London, we announced Cloud Identity for Customers and Partners (CICP) to help you add Google-grade identity and access management (IAM) functionality to your apps, protect user accounts, and scale with confidence—even if those users are customers, partners, and vendors who might be outside of your organization. CICP is now available in public beta.
Adding Google-grade authentication to your apps
All users expect simple and secure sign-up, sign-in, and self-service experiences from all their devices. While you could build an IAM system for your apps, it can be hard and expensive. Just think about the complexity of building and maintaining an IAM system that stays up-to-date with evolving authentication requirements, keeping user accounts secure in the face of threats that increase in occurrence and sophistication, and scaling the system reliably when the demand for your app grows.
Knative, the open-source framework that provides serverless building blocks for Kubernetes, is on a roll, and the GKE serverless add-on, the first commercial Knative offering, which we announced this summer, is enjoying strong uptake with our customers. Today, we are announcing that we’ve updated the GKE serverless add-on to support Knative 0.2. In addition, today at KubeCon, Red Hat, IBM, and SAP announced their own commercial offerings based on Knative. We are excited for this growing ecosystem of products based on Knative.
Knative allows developers to easily leverage the power of Kubernetes, the de-facto cross-cloud container orchestrator. Although Kubernetes provides a rich toolkit for empowering the application operator, it offers less built-in convenience for application developers. Knative solves this by integrating automated container build, fast serving, autoscaling and eventing capabilities on top of Kubernetes so you get the benefits of serverless, all on the extensible Kubernetes platform. In addition, Knative applications are fully portable, enabling hybrid applications that can run both on-prem and in the public cloud.
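To give a flavor of the developer experience, deploying a service can be as simple as applying a short manifest like this sketch (the apiVersion and image are illustrative and vary by Knative release):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
spec:
  template:
    spec:
      containers:
        - image: gcr.io/my-project/hello:latest
          env:
            - name: TARGET
              value: "world"

Knative then handles revisioning, routing, and scale-to-zero on top of the cluster's existing Kubernetes machinery.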
Knative plus Kubernetes together form a general-purpose platform with the unique ability to run serverless, stateful, batch, and machine learning (ML) workloads alongside one another. That means developers can use existing Kubernetes capabilities for monitoring, logging, authentication, identity, security, and more, across all their modern applications. This consistency saves time and effort, reduces errors and fragmentation, and improves your time to market. As a user, you get the ease of use of Knative where you want it, with the power of Kubernetes when you need it.
In the four months since we announced Knative, an active and diverse community of companies has contributed to the project. Google Kubernetes Engine (GKE) users have been actively using the GKE serverless add-on since its launch in July and have provided valuable feedback leading to many of the improvements in Knative 0.2.
In addition to Google, multiple partners are now delivering commercial offerings based on Knative. Red Hat announced that you can now start trying Knative as part of its OpenShift container application platform. IBM has committed to supporting Knative on its IBM Cloud Kubernetes Service. SAP is using Knative as part of its SAP Cloud Platform and open-source Kyma project.
A consistent experience, with the flexibility to run where you want, resonates with many enterprises and startups. We are pleased that Red Hat, IBM, and SAP are embracing Knative as a powerful open industry-wide approach to serverless. Here’s what Knative brings to each of the new commercial offerings:
“The serverless paradigm has already demonstrated that it can accelerate developer productivity and significantly optimize compute resources utilization. However, serverless offerings have also historically come with deep vendor lock-in. Red Hat believes that Knative, with its availability on Red Hat OpenShift, and collaboration within the open source community behind the project, will enable enterprises to benefit from the advantages of serverless while also minimizing lock-in, both from a perspective of application portability, as well as that of day-2 operations management.” – Reza Shafii, VP of product, platform services, at Red Hat
“IBM believes open standards are key to success as enterprises are shifting to the era of hybrid multi-cloud, where portability and no vendor lock-in are crucial. We think Knative is a key technology that enables the community to unify containers, apps, and functions deployment on Kubernetes.” – Jason McGee, IBM Fellow, VP and CTO, Cloud Platform.
“SAP’s focus has always been centered around simplifying and facilitating end-to-end business processes. SAP Cloud Platform Extension Factory is addressing the need to integrate and extend business solutions by providing a central point of control, allowing developers to react on business events and orchestrate complex workflows across all connected systems. Under the hood, we are leveraging cloud-native technologies such as Knative, Kubernetes, Istio and Kyma. Knative tremendously simplifies the overall architecture of SAP Cloud Platform Extension Factory and we will continue to collaborate and actively contribute to the Knative codebase together with Google and other industry leaders.” – Michael Wintergerst, SVP, SAP Cloud Platform
We’re excited to deliver enterprise-grade Knative functionality as part of Google Kubernetes Engine, and by its momentum in the industry. To get started, take part in the GKE serverless add-on alpha. To learn more about the Knative ecosystem, check out our post on the Google Open Source blog.
In this post, we’ll help you get started deploying the Cloud Storage connector for your CDH clusters. The methods and steps we discuss here apply to both on-premise clusters and cloud-based clusters. Keep in mind that the Cloud Storage connector uses Java, so you’ll want to make sure that the appropriate Java 8 packages are installed on your CDH cluster. Java 8 should come pre-configured as your default Java Development Kit. [Check out this post if you’re deciding how and when to use Cloud Storage over the Hadoop Distributed File System (HDFS).]
Here’s how to get started:
Distribute using the Cloudera parcel
If you’re running a large Hadoop cluster or more than one cluster, it can be hard to deploy libraries and configure Hadoop services to use those libraries without making mistakes. Fortunately, Cloudera Manager provides a way to install packages with parcels. A parcel is a binary distribution format that consists of a gzipped (compressed) tar archive file with metadata.
We recommend using a Cloudera parcel to install the Cloud Storage connector. There are some big advantages to using a parcel, rather than manual deployment and configuration, to deploy the Cloud Storage connector on your Hadoop cluster:
Self-contained distribution: All related libraries, scripts and metadata are packaged into a single parcel file. You can host it at an internal location that is accessible to the cluster or even upload it directly to the Cloudera Manager node.
No need for sudo or root access: The parcel is not deployed under /usr or any other system directory. Cloudera Manager deploys it through its agents, which eliminates the need for sudo or root privileges during deployment.
Create your own Cloud Storage connector parcel
To create the parcel for your clusters, download and use this script. You can do this on any machine with access to the internet.
This script will execute the following actions:
Download the Cloud Storage connector to a local drive
Package the connector Java Archive (JAR) file into a parcel
Place the parcel under Cloudera Manager’s parcel repo directory
If you’re connecting an on-premises CDH cluster, or a cluster on a cloud provider other than Google Cloud Platform (GCP), follow the instructions on this page to create a service account and download its JSON key file.
Create the Cloud Storage parcel
Next, you’ll want to run the script to create the parcel file and its checksum file, then let Cloudera Manager find them. Follow these steps:
1. Place the service account JSON key file and the create_parcel.sh script under the same directory. Make sure that there are no other files under this directory.
2. Run the script, which will look something like this:
$ ./create_parcel.sh -f <parcel_name> -v <version> -o <os_distro_suffix>
- parcel_name is the name of the parcel as a single string, with no spaces or special characters (e.g., gcsconnector)
- version is the version of the parcel in the format x.x.x (e.g., 1.0.0)
- os_distro_suffix is the OS distribution suffix. Parcels follow naming conventions similar to RPM and deb packages; a full list of possible distribution suffixes can be found here.
- -d is an optional flag that deploys the parcel to the Cloudera Manager parcel repo folder. If it’s not provided, the parcel file is created in the same directory where the script ran. (See the example invocation after these steps.)
3. You can find the script’s logs in /var/log/build_script.log.
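For example, to build a parcel named gcsconnector at version 1.0.0 for a RHEL/CentOS 7 cluster and deploy it straight to the parcel repo, the invocation would look like the following (the el7 suffix is illustrative; substitute your cluster’s distribution suffix):

$ ./create_parcel.sh -f gcsconnector -v 1.0.0 -o el7 -d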
Distribute and activate the parcel
Once you’ve created the Cloud Storage parcel, Cloudera Manager has to recognize the parcel and install it on the cluster.
The script you ran generated a .parcel file and a .parcel.sha checksum file. Put these two files on the Cloudera Manager node under the /opt/cloudera/parcel-repo directory. If you already host Cloudera parcels elsewhere, you can place the files there instead and add an entry to the manifest.json file.
On the Cloudera Manager interface, go to Hosts -> Parcels and click Check for New Parcels to refresh the list. The Cloud Storage connector parcel should then show up in the list of available parcels.
[Editor’s note: This post originally appeared on the Velostrata blog. Velostrata has since come into the Google Cloud fold, and we’re pleased to now bring you their seasoned perspective on deciding to migrate to cloud. There’s more here on how Velostrata’s accelerated migration technology works.]
At Velostrata, we’ve spent a lot of time talking about how to optimize the cloud migration process. But one of the questions we get a lot is: what drives an enterprise’s cloud migration in the first place? For this post, we chatted with customers and dug into our own data, along with market data from organizations like RightScale, to find the most common reasons businesses move to the cloud. If you think moving to the cloud may be in your future, this list can help you determine what kinds of events may result in starting a migration plan.
1. Data center contract renewals
Many enterprises have contracts with private data centers that need to be periodically renewed. When you get to renegotiation time for these contracts, considerations like cost adjustments or other limiting factors often come up. Consequently, it’s during these contract renewal periods that many businesses begin to consider moving to the cloud.
2. Mergers and acquisitions
When companies merge, it’s often a challenge to match up application landscapes and data—and doing this across multiple on-prem data centers can be all the more challenging. Lots of enterprises undergoing mergers find that moving key applications and data into the cloud makes the process easier. Using cloud also makes it easier to accommodate new geographies and employees, ultimately resulting in a smoother transition.
3. Increased capacity requirements
Whether it’s the normal progression of a growing business or the need to accommodate huge seasonal capacity jumps, your enterprise can benefit from being able to rapidly increase or decrease compute. Instead of paying for peak capacity on-prem year-round, you can shift your capacity on demand in the cloud and pay as you go.
4. Software and hardware refresh cycles
When you manage an on-prem data center, it’s up to you to keep everything up to date. This can mean costly on-prem software licenses and hardware upgrades to handle the requirements of newly upgraded software. We’ve seen that when evaluating an upcoming refresh cycle, many enterprises find it’s significantly less expensive to decommission on-prem software and hardware and consider either a SaaS subscription or a lift-and-shift of that application into the public cloud. Which path you choose will depend greatly on the app (and available SaaS options), but either way it’s the beginning of a cloud migration project.
5. Security threats
With security threats only increasing in scale and severity, we know many enterprises that are migrating to the cloud to mitigate risk. Public cloud providers offer vast resources for protecting against threats—more than nearly any single company could invest in.
6. Compliance needs
If you’re working in industries like financial services and healthcare, ensuring data compliance is essential for business operations. Moving to the cloud means businesses are using cloud-based tools and services that are already compliant, helping remove some of the burden of compliance from enterprise IT teams.
7. Product development benefits
By taking advantage of benefits like a pay-as-you-go cost model and dynamic provisioning for product development and testing, many enterprises are finding that the cloud helps them get products to market faster. We see businesses migrating to the cloud not just to save time and money, but also to realize revenue faster.
8. End-of-life events
All good things must come to an end—software included. Increasingly, when critical data center software has an end-of-life event announcement, it can be a natural time for enterprise IT teams to look for ways to replicate those services in the cloud instead of trying to extend the life cycle on-prem. This means enterprises can decommission old licenses and hardware along with getting the other benefits of cloud.
As you can see, there are a lot of reasons why organizations decide to kick off their cloud journeys. In some cases, they’re already in the migration process when they find even more ways to use cloud services in the best way. Understanding the types of events that frequently result in a cloud migration can help you determine the right cloud architecture and migration strategy to get your workloads to the cloud.
Learn more here about cloud migration with Velostrata.
For engineers or developers in charge of integrating, transforming, and loading a variety of data from an ever-growing collection of sources and systems, Cloud Composer has dramatically reduced the number of cycles spent on workflow logistics. Built on Apache Airflow, Cloud Composer makes it easy to author, schedule, and monitor data pipelines across multiple clouds and on-premises data centers.
Let’s walk through an example of how Cloud Composer makes building a pipeline across public clouds easier. As you design your new workflow that’s going to bring data from another cloud (Microsoft Azure’s ADLS, for example) into Google Cloud, you notice that upstream Apache Airflow already has an ADLS hook that you can use to copy data. You insert an import statement into your DAG file, save, and attempt to test your workflow. “ImportError – no module named x.” Now what?
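As a concrete sketch, the import that trips the error might look like this (the contrib path below is the upstream Airflow 1.x location for this hook; whether it resolves depends on your Composer image):

# In your DAG file: if the Cloud Composer image predates this
# upstream contrib hook, this line raises ImportError.
from airflow.contrib.hooks.azure_data_lake_hook import AzureDataLakeHook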
As it turns out, functionality that has been committed upstream—such as brand new Hooks and Operators—might not have made its way into Cloud Composer just yet. Don’t worry, though: you can still use these upstream additions by leveraging the Apache Airflow Plugin interface.
Using the upstream AzureDataLakeHook as an example, all you have to do is the following:
Copy the code into a separate file (ensuring adherence to the Apache License)
Add an AirflowPlugin import to that file (from airflow.plugins_manager import AirflowPlugin)
Add the below snippet to the bottom of the file:
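A minimal sketch of that registration snippet, assuming AzureDataLakeHook is the hook class you copied into the file in the first step (the class and plugin names here are illustrative):

from airflow.plugins_manager import AirflowPlugin

class AzureDataLakePlugin(AirflowPlugin):
    # Airflow re-exports the listed hooks under airflow.hooks.<plugin name>
    name = "azure_data_lake_plugin"
    hooks = [AzureDataLakeHook]

Once the file is placed in your environment’s plugins/ folder, the hook can be imported in a DAG as from airflow.hooks.azure_data_lake_plugin import AzureDataLakeHook.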
AWS also has a free tier; it’s like giving someone their first hit for free. Why not use the free server? Then that server needs to expand, you make plans, and soon you’re hooked and know the AWS cloud better than Google’s.
Google Cloud currently offers a $300 credit to try to get you involved, but it’s not the same as a free tier of service. Once the $300 is gone, it’s always going to cost you, whereas on AWS you can downgrade a server back to the free tier if you decide to.
There are also some wonky decisions Google made that leave me annoyed almost daily. The fact that you can’t use the servers’ SMTP ports leaves me working around the edges just to get a WordPress site to send email. Then there’s the inability to easily transfer a project between accounts: I landed in a situation where I transferred ownership but forgot to transfer billing, and since I was no longer a project owner, I couldn’t transfer billing anymore. Customer service just acted like it made sense that I couldn’t use or configure the resource but that my credit card was still going to be charged.
SSH and SFTP into AWS are fairly standardized and relatively seamless. Google makes both difficult.
Then there’s the way they give out only one static IP address per zone. There’s a beta to allow multiple IPs (IP aliases, or multiple network IP addresses), but what took so long? On AWS I just added the IP addresses I needed. Why do I need more than one? Because my name servers need different IP addresses, and I still can’t do that right now.
So with all these limits here and there, I personally pay for my servers on AWS (it’s just easier to use), but I use Google Cloud for short experiments where I may need more than one IP, and for a site that doesn’t ever send an email.
You can request a Transfer Appliance directly from your GCP console. The service will be available in beta in the EU in a 100TB configuration with total usable capacity of 200TB. And it’ll soon be available in a 480TB configuration with a total usable capacity of a petabyte.
Moving HDFS clusters with Transfer Appliance
Customers have been using Transfer Appliance to move everything from audio and satellite imagery archives to geographic and wind data. One popular use case is migrating Hadoop Distributed File System (HDFS) clusters to GCP.
We see lots of users run their powerful Apache Spark and Apache Hadoop clusters on GCP with Cloud Dataproc, a managed Spark and Hadoop service that allows you to create clusters quickly, then hand off cluster management to the service. Transfer Appliance is an easy way to migrate petabytes of data from on-premises HDFS clusters to GCP.
Earlier this year, we announced the ability to configure Transfer Appliance with one or more NFS volumes. This lets you push HDFS data to Transfer Appliance using Apache DistCp (also known as Distributed Copy)—an open source tool commonly used for intra/inter-cluster data copy. To copy HDFS data onto a Transfer Appliance, configure it with an NFS volume and mount it from the HDFS cluster. Then run DistCp with the mount point as the copy target. Once your data is copied to Transfer Appliance, ship it to us and we’ll load your data into Cloud Storage.
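As a minimal sketch, assuming the appliance’s NFS share is mounted at /mnt/appliance on the cluster nodes (both paths below are illustrative), the copy would look like:

$ hadoop distcp hdfs:///user/archive file:///mnt/appliance/archive

Because DistCp runs as a distributed job, the file:// destination needs to be mounted at the same path on every node that executes copy tasks.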
Using Transfer Appliance in production
EU customers such as Candour Creative, which helps its clients tell stories through films and photographs, want to take advantage of having their content readily available in the cloud. But Zac Crawley, Director at Candour, faced some challenges with the move.
“Multiple physical backups of our data were taking up space and becoming costly,” Crawley says. “But when we looked at our network, we figured it would take a matter of months to move the 40TBs of large file data. Transfer Appliance reduced that time significantly.”