News About GCP Services
Unity launched the first iteration of the Obstacle Tower Challenge in February, and the reception from the AI research community has been very positive. The competition has received more than 2,000 entries from several hundred teams around the world, including both established research institutions and collegiate student teams. The 50 highest-scoring teams will receive an award sponsored by Google Cloud and advance to the second round.
Completing the first round was a significant milestone: teams had to clear a difficult hurdle, advancing past several levels of increasing difficulty. None of these levels were available to the researchers or their agents during training, so the agents had to learn complex behaviors and generalize them to handle previously unseen situations.
The contest’s second round features a set of additional levels. These new three-dimensional environments incorporate brand-new puzzles and graphical elements that force contestant research teams to develop more sophisticated machine learning models. New obstacles may stymie many of the agents that passed the levels from the first phase.
How Google Cloud can help
Developing complex game agents is a computationally demanding task, which is why we hope that the availability of Cloud credits will help participating teams. Google Cloud offers the same infrastructure that trained AlphaGo’s world-class machine learning models to any developer around the world. In particular, we recently announced the availability of Cloud TPU Pods; for more information, you can read this blog post.
All of us at Google Cloud AI would like to congratulate the first batch of successful contestants of the Unity AI challenge, and we wish them the best of luck as they enter the second phase. We are excited to learn from the winning strategies.
There’s never a dull moment in the big world of Google, and we came across a few especially interesting stories in the past month for you tech lovers out there. Read on for the latest in new technology and new ideas.
Neural networks help create kiss detection technology
That’s right, “kiss detection” is an actual feature in the Pixel 3 Camera app in Photobooth mode, part of its improved selfie-taking capabilities. Photobooth mode is optimized for the front-facing camera, and developing this new detection mode required the use of two models: one for facial expressions and one to detect when people kiss. The team worked with photographers to identify key facial expressions that would trigger capture, then trained a neural network to classify those expressions. The new feature means the camera automatically takes a photo when the camera is steady and can tell that the subjects are kissing, resulting in better selfies.
Build your own smart device
Our brand-new Coral platform, designed to make AI hardware development easier, is now available through several global distributors. The Coral products include a dev board, USB accelerator, and camera, all powered by Google AI’s Edge TPU, a custom-designed ASIC chip that provides high-performance ML inferencing for low-power devices. For example, the Edge TPU chip can execute state-of-the-art mobile vision models such as MobileNet V2 at 100+ fps in a power-efficient manner. Last month, the new Environmental Sensor Board became available, so developers can bring sensor input into models. It has integrated light, temperature, humidity, and barometric sensors, and the ability to add more sensors via its four Grove connectors. There’s also an updated Edge TPU model compiler and new C++ API.
Witnessing happy little cloud moments
Here’s a bird’s-eye view from a Google cloud architect who’s worked with lots of companies getting started with cloud, and his take on what makes users really happy when they start using Google Cloud Platform (GCP). Some are simple, like the concept of a project in GCP, which is a namespace that groups resources together and by default isn’t available to any other project. There are also network tags to make firewall rule creation easier, and some console features that users often love.
Serverless and containers, better together
Serverless should really be called “service-ful,” says one interviewee on this GCP podcast about the new Cloud Run, since running serverless containers in the cloud lets you focus on code and building services, not infrastructure. Cloud Run lets you run any language, binary, or code in a container in the cloud, and delivers the pay-per-use model serverless is known for. There are two versions: Cloud Run, the fully managed service for running serverless containers, and Cloud Run for GKE, which runs the compute inside your GKE cluster.
Google Earth Timelapse shows the world go round
The Google Earth team released some new updates last month to the Google Earth Timelapse, a video visualization of our planet’s surface from 1984 to 2018. If you haven’t checked this out yet, it’s now available to see on mobile devices and tablets. It’s a very cool look at how the Earth has changed, for example how Las Vegas has grown or how landslides have increased on one island. The visualization uses Google Earth Engine to analyze more than 15 million satellite images, and uses technology from Carnegie Mellon’s CREATE Lab to make the video interactively explorable. And for an extra dash of inspiration, see how a high school student is using Google Earth Engine at her NASA internship to monitor mangrove ecosystems.
Let us know what you’ve been reading lately. Tell us your recommendations here.
There were lots of product announcements along with learning opportunities. Not surprisingly, our top stories from April were all from the big event. Read on to catch up!
Next ‘19 at a glance
Whether you attended Next ‘19 or not, you can catch up on all that happened from our list of all 122 announcements from the show. Read about the news in compute and infrastructure, as well as a ton of launches tied to identity and security on GCP. There are also new features to explore in data analytics and AI/ML, details on running Windows workloads on GCP and improvements in productivity and collaboration with G Suite. Finally, scroll down the list to learn how customers are using GCP. Our blog now even has its own dedicated Next section where you can find all the posts from the event.
The future of the cloud is open
At Next ‘19, we introduced Anthos, our hybrid cloud platform that will let you write once, run anywhere. It’s designed so you can write an app and run it without modifying the code across platforms: GCP, Google Kubernetes Engine (GKE), GKE On-Prem, and, soon, third-party clouds. Anthos is completely software-based, using open APIs so users can easily build and manage hybrid clouds. In addition, Anthos Migrate (available in beta) can auto-migrate VMs into GKE containers.
Also at Next, we announced new partnerships with seven open source-centric database and analytics providers. This means that you can use their managed services through GCP, with the added benefits of unified management, billing and support. A ton of new applications being developed today run these partners’ open-source database systems, ranging from general purpose databases to specific ones for time-series, graph and search use cases.
No servers, no problem
Cloud Run, our new serverless compute offering, also entered beta last month. Cloud Run lets you run stateless HTTP-driven containers, while handling all infrastructure management, including server provisioning, configuring, scaling and management behind the scenes. With Cloud Run, you can scale your containers up or down quickly (even to zero), giving you the flexibility of containers and velocity of serverless. Cloud Run is based on Knative, an open-source API and runtime environment.
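To make “stateless HTTP-driven containers” concrete, here is a minimal sketch (not Google’s sample code) of the kind of service Cloud Run runs: a plain HTTP server that binds to the port Cloud Run injects through the PORT environment variable and keeps no state between requests. The handler body and the `make_server` helper are illustrative assumptions, not part of any Cloud Run SDK.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Stateless request handler: nothing survives between requests."""

    def do_GET(self):
        body = b"Hello from a serverless container!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence per-request logging for this sketch.
        pass

def make_server(port=None):
    """Bind on $PORT (the port Cloud Run sets), defaulting to 8080 locally."""
    port = port if port is not None else int(os.environ.get("PORT", "8080"))
    return HTTPServer(("", port), Handler)

# A real container entrypoint would finish with:
#     make_server().serve_forever()
```

Packaged in a container image, a service like this scales with request volume (including to zero) because it holds no local state that a new instance would need to recover.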
Extending the tools developers use
As cloud evolves, so does application development. Cloud Code is a set of plug-ins for IntelliJ and Visual Studio (VS) Code that bring automation and assistance to every phase of the software development lifecycle, using the tools developers already use. Integrated development environments (IDEs) can automate a lot of a developer’s work, but can be challenging for cloud development. Cloud Code uses command-line container tools such as Skaffold, Jib and Kubectl under the hood, so you see continuous feedback as you’re building your project in a Kubernetes environment.
That’s a wrap for April. We’ll see you next month.
This may seem off topic, but it’s on topic: technical SEO is imperative … you’re not going to rank number one on Google using Shopify or Wix. It just isn’t going to happen.
It’s also apparently difficult to get solid advice on SEO hosting from “experts.” Best Blog Hosting for SEO is junk … reciting features doesn’t make a hosting plan the best. One quote notes that WordPress is already installed with InMotionHosting.com … so what! Our web servers are preconfigured to install WordPress in every new account as well … it only saves maybe 5 minutes per user, but for a web host that time adds up very quickly. But you aren’t a web host, so it’s not that big of a deal. I’d like to hear about benchmarking tests they may have run to decide who is the best.
Features Aren’t Technical Specs
Unlimited bandwidth … sounds great, but what are the limits? There are limits, and they are sometimes beyond the host’s control. For instance, if someone uses a CAT5 cable instead of a CAT6, everything will be more speed-limited, especially if a bottleneck is designed into the infrastructure. Unlimited bandwidth means nothing to me because there are limits … physical limits exist and can’t be avoided. And WordPress preinstalled saves someone 5 minutes, but nothing else. These aren’t important to the hosting platform.
Cloud Computing: Be Your Own Host
The industry standard in web hosting is cPanel. No way around it: with cPanel your support options are bountiful, whereas dreamhost.com has its own proprietary server software … it’s no better; in actuality it’s just far less supported by third parties. Ultimate SEO is hosted on a variety of cPanel servers that were easy to build and deploy. I made them from scratch and with templates, but all in all there are 4 AWS servers, 2 Google Cloud Platform, and 4 Digital Ocean servers currently powering hundreds of sites, including this site. Cost varies wildly…
It’s important to note that your web host is honestly likely run on one of these three services. You’re sharing their share of the cloud environment. Why not just skip ahead and be the master of your domain? Sure, it will cost more than $3 a month … but that $3-a-month hosting plan is shit.
Amazon Web Services
I don’t even know what I am spending, where, and how it is being spent. AWS charges you for every little thing, and no matter what steps you might take, it may seem like rising project costs are simply unavoidable. Their platform is NOT intuitive, and it will require some play time to remember that you have to leave the virtual server’s configuration area to select an IP address (that will cost you money … each IP address, not talking about bandwidth, I’m just saying the number) and then return to that original area to associate it. Don’t even think about swapping hard drives and knowing what is attached to what unless you are prepared to write down long strings of numbers and letters.
AWS does provide greater flexibility than the others on options beyond just a virtual server … but unless you plan to send 100,000 emails a day, you won’t benefit from their email service, as an example. Technical-SEO-wise, I’d give AWS a D overall. Infrastructure and computing power is an obvious A+, but it’s how you interact with it that weighs the grade.
Poor navigation, and the nickel-and-dime pricing is absurd. Want to monitor your usage so you can understand your bill? Monitoring costs more … it’s ridiculous.
They do offer reserved instances, and I loaded up on those, but still my costs never decreased. AWS is so hard to understand billing-wise that IT managed service providers offer free Excel templates to figure out your AWS monthly costs. Think I’m being over the top? Check out this calculator spreadsheet from AWS to forecast your expenses.
Here’s something crazy … why my April bill was $167 but AWS forecasts it will be $1,020 in May, I have no idea. I’m not adding servers…
Google Cloud Platform
Google Cloud Platform is easier to use and wrap your head around, but it is considerably more expensive than either of the other options. For this simple reason, they receive an F. The additional costs come with fewer options and fewer features than AWS. Billing is more transparent, and at least you can understand why your bill is what it is. But Google also makes unilateral decisions for you, like blocking SMTP and SSH access. Sure, it’s more secure, but it makes email and server maintenance a nightmare. Documents like Connecting to Instances make it seem like not a big deal, but these won’t allow you to move a file from your computer to the server like SFTP would.
They are expensive, offer less, and needlessly shoot you in the foot with their restrictions. That’s why I stand by the F as an overall grade. Now, infrastructure capabilities … A+, no doubt about it.
I received no compensation or thank-you from anyone for writing this … Digital Ocean is my B+ graded cloud solution. It’s the cheapest, and they don’t seem to charge you a fee for tools that are required for the main product to function, unlike AWS and their static IP addresses. They have the fewest abilities and options outside of a virtual server. If you want a database server, that’s in the works unless you can use Postgres. That’s limiting, but it is also not important if you’re just running a few web servers that will already have MySQL installed on them anyhow.
Digital Ocean is the no frills, no surprises, cloud computing option. The reason I have so many servers is because I am migrating everything off AWS and Google Cloud to Digital Ocean…it’ll be cheaper. A lot cheaper…
That’s right … $20 vs. $121, $177, and $120 from AWS, GCP, and Azure. I didn’t really consider Microsoft Azure, just because I have reservations about moving into their sphere of control, where everything you need to do is addressed by yet another Microsoft product that usually has little imagination in it.
Test out a server in each environment and I think you’ll quickly take to the Digital Ocean option.
Using the cloud to better understand our environment
The earth is pretty big. Learning about what’s going on around the globe is often a function of being able to crunch an incredible amount of data and understand millions of images. Understanding this type of data at scale is an excellent use for cloud computing.
The Google Earth Engine and Google Cloud teams came together to map the land cover in each 30-meter square of the earth, going back to 2013. The combination of Google Earth Engine with App Engine, Cloud Pub/Sub, Cloud Dataflow, TensorFlow, Kubeflow and ML Engine form an end-to-end pipeline that creates value out of raw data. This data pipeline turns pixels into rich map information, with machine learning allowing this to be done over time. Taking a time-series look at the earth’s land cover can help track and understand things like urbanization, deforestation, water resource changes and cropland views. In a Cloud Next session, Nicholas Clinton, David Cavazos and Christopher Brown from Google explained the process in detail to answer the question, What is on Earth?
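As a purely illustrative sketch of the "pixels into rich map information" step (not the team's actual pipeline, which runs on Earth Engine and TensorFlow at global scale), land-cover classification can be pictured as mapping each pixel's spectral values to the nearest known class signature. The class names and signature values below are made-up illustrative data.

```python
# Toy nearest-centroid land-cover classifier.
# CLASS_SIGNATURES holds hypothetical per-band reflectance values, not real data.
CLASS_SIGNATURES = {
    "water":  (0.05, 0.10, 0.30),
    "forest": (0.10, 0.40, 0.15),
    "urban":  (0.35, 0.30, 0.25),
}

def classify_pixel(pixel):
    """Return the land-cover label whose signature is closest to this pixel."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(CLASS_SIGNATURES, key=lambda label: dist2(pixel, CLASS_SIGNATURES[label]))

def classify_tile(tile):
    """Turn a 2-D grid of pixel spectra into a 2-D grid of land-cover labels."""
    return [[classify_pixel(p) for p in row] for row in tile]
```

Running a classifier like this over every 30-meter square, for every year since 2013, is what makes the time-series views of urbanization and deforestation described above possible; the real system replaces the nearest-centroid rule with trained ML models.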
Insights about our environment can help us make better decisions about management, urban planning and climate targets. To help deliver those insights to decision makers, Google recently launched the Environmental Insights Explorer, built using GCP, to analyze Google Maps data and provide rich insights into the vital signs of our planet. Cities are using these insights to create carbon baselines and accelerate climate action plans. Hear all about it from Saleem Van Groenou and Denise Pearl in Global City Climate Action Analysis with Geo Data.
Investing in ideas that go round and round
At Google Cloud, we believe that with the help of modern technology, business can be a positive catalyst for change. That’s why we’ve partnered with SAP to host a sustainability contest for social entrepreneurs called Circular Economy 2030. We invited thought leaders from around the world to submit a revenue-generating idea that uses Google Cloud and SAP technology to advance a circular economy—a holistic system that designs out waste and pollution, keeps products in use, and regenerates natural resources.
In collaboration with UN Environment, the Ellen MacArthur Foundation, and the Global Partnership for Sustainable Development Data, we selected a total of five finalists from a pool of 250+ applications from 50+ countries around the world. Each of the finalists excelled across the four assessment criteria of original idea, business model, potential impact, and technical innovation, and demonstrated their passion for sustainability. Whether working to track industrial waste flows for increased accountability or addressing rural food waste with solar-powered cold storage, the Circular Economy 2030 finalists are all advancing a better, more equitable, and more sustainable future.
To learn more about the Circular Economy 2030 contest and the five finalists, check out the panel session from Google Cloud Next where we announced the winners and discussed how to use cloud computing for a sustainable future.
Advances in technology are helping drive advances in sustainability and creative ideas to improve our global environment. We’re looking forward to seeing continued innovation around the world.
Monitoring provides a time series list API method, which returns collected time series data. Using this API, you can download your monitoring data for external storage and analysis. For example, using the Monitoring API, you could download your time series and then store it in BigQuery for efficient analysis.
Analyzing metrics over a larger time window means that you’ll have to make a design choice around data volumes. Either you include each individual data point and incur the time and cost processing of each one, or you aggregate metrics over a time period, which reduces the time and cost of processing at the expense of reduced metrics granularity.
Monitoring provides a powerful aggregation capability in the form of aligners and reducers available in the Monitoring API. Using aligners and reducers, you can collapse time-series data to a single point or set of points for an alignment period. Selecting an appropriate alignment period depends on the specific use case. One hour provides a good trade-off between granularity and aggregation.
Each of the Monitoring metrics has a metricKind (e.g., GAUGE or DELTA, describing what the values represent) and a valueType (e.g., INT64 or DOUBLE, describing the type of the values). These determine which aligners and reducers may be used during metric aggregation.
For example, using an ALIGN_SUM aligner, you can collapse your App Engine http/server/response_latencies metrics for each app in a given Stackdriver Workspace into a single latency metric per app per alignment period. If you don’t need to separate the metrics by their associated apps, you can use an ALIGN_SUM aligner combined with a REDUCE_PERCENTILE_99 reducer to collapse all of your App Engine latency metrics into a single value per alignment period.
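To make the aligner/reducer mechanics concrete, here is a library-free sketch of what an ALIGN_SUM alignment followed by a cross-series percentile reduction conceptually does. This is only an illustration of the semantics; in practice the Monitoring API performs this work server-side when you pass an alignment period, a perSeriesAligner, and a crossSeriesReducer in the request.

```python
import math
from collections import defaultdict

def align_sum(points, period_seconds=3600):
    """ALIGN_SUM-style alignment: sum all (timestamp, value) points
    that fall within each alignment period (one hour by default)."""
    buckets = defaultdict(float)
    for timestamp, value in points:
        bucket = timestamp - (timestamp % period_seconds)
        buckets[bucket] += value
    return dict(buckets)

def reduce_percentile(series_list, pct=99):
    """Cross-series reduction: for each aligned period, take the pct-th
    percentile (nearest-rank method) across all series."""
    by_period = defaultdict(list)
    for series in series_list:
        for period, value in series.items():
            by_period[period].append(value)
    result = {}
    for period, values in by_period.items():
        values.sort()
        rank = min(len(values) - 1, math.ceil(pct / 100.0 * len(values)) - 1)
        result[period] = values[rank]
    return result
```

A one-hour period here mirrors the trade-off discussed above: each series collapses to one point per hour before the reducer combines series, so downstream storage and analysis (for example, in BigQuery) handle far fewer rows at the cost of sub-hour granularity.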
With the sheer number of applications in today’s enterprises, it can be hard for procurement departments and cloud administrators to maintain compliant and efficient procurement processes for their cloud development teams. Last week at Next we introduced you to Private Catalog, a new service from Google Cloud that lets you control the availability and distribution of IT solutions to maintain compliance and governance, simplify internal solution discovery, and ensure that only approved and compatible apps are available throughout your organization. Here’s a bit more color.
Stay compliant, control access
Private Catalog helps you reduce complexity in regulated industries, or when handling sensitive data. Controlling which apps your developers use can help you avoid costly data loss, data leaks, or reliability issues from unverified code. You can ensure that only products that meet your compliance and governance rules are published to your catalog and available to your developer teams. For additional control, you can create hierarchies complete with access controls in the catalog, limiting who can deploy what within your organization.
Create a collaborative environment
Centralizing your apps is not only good for compliance, it’s good for productivity. Distributed workforces often create technology silos, introducing redundancies across your teams. Private Catalog simplifies how users find sanctioned applications—they simply navigate to a single place to find all the approved internal apps available to them. And when central IT teams create a new solution, Private Catalog makes it easy to distribute it to the whole organization.
Fewer failures, more efficiency
Failing to control how you deploy internal apps leads to inefficient resource usage and more support tickets. With Private Catalog, you can control how you distribute your software according to parameters in Cloud Deployment Manager templates, including regions, RAM, CPUs, and almost any other value. When you control the parameters, you ensure that apps have the correct amount of resources, in approved configurations.
Here are some highlights from day one:
Making hybrid cloud a reality
We introduced Anthos—formerly Cloud Services Platform—to help customers by providing a unified programming model and monitoring/policy management framework across on-premises and multiple clouds, not just Google Cloud Platform (GCP). Cisco and VMware joined us on stage to talk about how we’re working together to make hybrid cloud a reality.
What we announced
- Anthos is now generally available on Google Kubernetes Engine (GKE) and GKE On-Prem, so you can deploy, run and manage your applications on-premises or in the cloud. Coming soon, we’ll extend that flexibility to third-party clouds like AWS and Azure. And Anthos is launching with the support of more than 30 hardware, software and system integration partners so you can get up and running fast.
- With Anthos Migrate, powered by Velostrata’s migration technology, you can auto-migrate VMs from on-premises or other clouds directly into containers in GKE with minimal effort.
- Anthos Config Management lets you create multi-cluster policies out of the box that set and enforce role-based access controls, resource quotas, and namespaces—all from a single source of truth.
- Partners such as VMware, Dell EMC, HPE, Intel, and Lenovo have committed to delivering Anthos on their own hyperconverged infrastructure for their customers. By validating Anthos on their solution stacks, our mutual customers can choose hardware based on their storage, memory, and performance needs.
Bringing serverless to the full stack
From constrained runtime support to vendor lock-in, traditional serverless offerings often come with some significant challenges. As a result, developers often find themselves choosing between the ease and velocity of serverless or the flexibility and portability of containers. Today, we introduced Cloud Run, a new serverless compute platform for containerized apps with portability built-in, to give you the best of both worlds.
What we announced
- Cloud Run, our fully managed serverless execution environment, offers serverless agility for containerized apps.
- Cloud Run on GKE brings the serverless developer experience and workload portability to your GKE cluster.
- Knative, the open API and runtime environment, brings a serverless developer experience and workload portability to your existing Kubernetes cluster anywhere.
- We’re also making new investments in our Cloud Functions and App Engine platforms with new second generation runtimes, a new open-sourced Functions Framework, and additional core capabilities, including connectivity to private GCP resources.
Announcing partnerships to bring fully managed open source to Google Cloud
Google Cloud is deeply committed to open source—and we know so many organizations would benefit from fully managed services that let you get the best of open-source innovation and operate those technologies at scale. To help, Thomas announced our strategic partnerships with leaders in the open source community to deliver the industry’s first comprehensive platform for fully managed open source-centric database and analytics services.
What we announced
What our customers and partners are saying
We’re glad to bring our larger community together this week to make connections and learn from each other. We’ve heard some great stories today about how the cloud is helping change the way companies do business, whether they’re improving global patient care or reimagining bank branches as centers for community and education. Here are a few of our favorite quotes from day one:
“It’s not about the cloud working backward, it’s about the realities of the environment that all of our customers live in today. We need to build open ecosystems the people around us can embrace and extend.” —David Goeckeler, EVP and General Manager, Networking and Security, Cisco
“Retail is in the midst of major transformation. As a retailer, we needed to innovate. We also needed to become more engineering focused. Google has that same DNA—an engineering focus driving innovation. That really blended us together in terms of this partnership.” —Ratnakar Lavu, CTO, Kohl’s
“Monetizing open source was always a very big challenge for open source vendors and more so in the cloud era. Google has taken a very different approach from other cloud vendors when it comes to open source, and I think this is great news for the open source community.” —Ofer Bengal, CEO, Redis Labs
Even if your workloads and apps can’t fully migrate to the cloud for your unique business reasons, you can still take full advantage of Google’s innovative technology. Containers and Kubernetes underpin any modern hybrid cloud strategy, and Google’s Cloud Services Platform (CSP) brings the best of these technologies to your datacenter.
At Google Cloud Next ‘19 this year, we’re offering more than 45 sessions on topics ranging from adapting your existing application to a hybrid cloud environment to building, running, and managing microservices both on-prem and in the cloud.
If you’re joining us at the event, don’t forget to mark these specific sessions to hear from the folks who helped originate CSP:
1. Bringing the Cloud to You: Join us in this spotlight session after our day-one keynote to learn about one of our big announcements at Next this year. This spotlight session will show you how services built on Kubernetes and Istio will bring the efficiency, speed, and scale of cloud to you. We’ll show you how these tools and technologies can help you build reliable, secure, and high-performing cloud services for today and the future.
2. Fireside Chat with Eric Brewer: Hear the latest from the person who introduced Kubernetes to the world almost five years ago. Eric is Vice President of Infrastructure at Google Cloud and has been working on Google’s cluster and compute infrastructure since 2011. This session is presented by the Kubernetes Podcast from Google, a weekly news and interview show with insight from the Kubernetes community.
While you’re at it, here’s a list of must-see hybrid cloud and container sessions, as well as sessions on how to modernize your application developments. Be sure to register by clicking the links below to reserve your spot—seats are filling up fast!
Managing Applications The Kubernetes Way
Ever wonder what it’s like to write applications that manage other applications? In this session, we will show you how to build custom Kubernetes controllers for managing your applications. We’ll also share best practices based on Google’s experience of managing large-scale workloads.
Using GKE On-Prem to Manage Kubernetes in Your Datacenter
We’ll explore how customers can best leverage GKE On-Prem and CSP to manage a true hybrid environment. We’ll walk through common use cases for GKE On-Prem and then deep dive into the tech stack. We’ll also walk through how to install GKE On-Prem into your vSphere environment and to manage the cluster from Google Cloud.
Next Generation CI/CD with GKE and Tekton
Deciding on a CI/CD system for Kubernetes can be a frustrating experience—there are a gazillion to choose from, and traditional systems were built before Kubernetes existed. We’ve teamed up with industry leaders to build a standard set of components, APIs and best practices for cloud native, CI/CD systems. Through examples and demos, we will show off new, Kubernetes-native resources that can be used to get your code from source to production with a modern development workflow that works in multi-cloud and hybrid cloud environments.
Onramp to Istio: An Adoption Story
This session will take you through the journey of a customer in EMEA who has implemented Istio in production. We will start from the problems that led them to service mesh in the first place. We will discuss the decisions they made as they planned and started their implementation. Finally, we’ll talk about how Istio has changed their day-to-day life and what the benefits have been.
Bringing Your Kubernetes Clusters to GCP
Congratulations, you’ve successfully rolled out Kubernetes across your organization, and you’re now running clusters in places you never thought were possible. How do you begin to corral those clusters to see what’s going on? In this talk, we’ll show you how to get visibility into your workloads and take advantage of Google Cloud.
Lastly, we know hybrid cloud is top of mind, so we’re bringing you more ways to engage with us face-to-face to answer any burning questions. Check out these on-site attractions in Moscone South.
DevZone: Stop by DevZone and meet the developers behind the cloud products you use every day. It’s open to all attendees and located between the keynote floor and the showcase.
Google Cloud Next Showcase: Join us at the Showcase to see what we’ve been up to over the past year.
The Cloud Lab: Discuss strategy or ask questions with Google Cloud experts one-on-one. When you arrive at Next, be sure to reserve your meeting time by going to The Cloud Lab as space is limited. You can do so in the DevZone.
From predicting appliance usage from raw power readings to medical imaging, machine learning has made a profound impact on many industries. Our AI and machine learning sessions are among our most popular each year at Next, and this year we’re offering more than 30 sessions on topics ranging from building a better customer service chatbot to automated visual inspection for manufacturing.
If you’re joining us at Next, here are nine AI and machine learning sessions you won’t want to miss.
1. Automating Visual Inspections in Energy and Manufacturing with AI
In this session, you can learn from two global companies that are aggressively shaping practical business solutions using machine vision. AES is a global power company that strives to build a future that runs on greener energy. To serve this mission, they are rigorously scaling the use of drones in their wind farm operations with Google’s AutoML Vision to automatically identify defects and improve the speed and reliability of inspections. Our second presenter joins us from LG CNS, a global subsidiary of LG Corporation and Korea’s largest IT service provider. LG’s Smart Factory initiative is building an autonomous factory to maximize productivity, quality, cost, and delivery. By using AutoML Vision on edge devices, they are detecting defects in various products during the manufacturing process with their visual inspection solution.
2. Building Game AI for Better User Experiences
Learn how DeNA, a mobile game studio, is integrating AI into its next-generation mobile games. This session will focus on how DeNA built its popular mobile game Gyakuten Othellonia on Google Cloud Platform (GCP) and how they’ve integrated AI-based assistance. DeNA will share how they designed, trained, and optimized models, and then explain how they built a scalable and robust backend system with Cloud ML Engine.
3. Cloud AI: Use Case Driven Technology (Spotlight)
More than ever, today’s enterprises are relying on AI to reach their customers more effectively, deliver the experiences they expect, increase efficiency, and drive growth across their organizations. Join Andrew Moore and Rajen Sheth in a session with three of Google Cloud’s leading AI innovators, Unilever, BlackRock, and FOX Sports Australia, as they discuss how GCP and Cloud AI services, like the Vision API, Video Intelligence API, and Cloud Natural Language, have made their products more intelligent, and how they can do the same for yours.
4. Fast and Lean Data Science With TPUs
Google’s Tensor Processing Units (TPUs) are revolutionizing the way data scientists work. Week-long training times are a thing of the past, and you can now train many models in minutes, right in a notebook. Agility and fast iterations are bringing neural networks into regular software development cycles and many developers are ramping up on machine learning. Machine learning expert Martin Görner will introduce TPUs, then dive deep into their microarchitecture secrets. He will also show you how to use them in your day-to-day projects to iterate faster. In fact, Martin will not just demo but train most of the models presented in this session on stage in real time, on TPUs.
5. Serverless and Open-Source Machine Learning at Sling Media
This session covers Sling’s incremental adoption strategy of Google Cloud’s serverless machine learning platforms that enable data scientists and engineers to build business-relevant models quickly. Sling will explain how they use deep learning techniques to better predict customer churn, develop a traditional pipeline to serve the model, and enhance the pipeline to be both serverless and scalable. Sling will share best practices and lessons learned deploying Beam, tf.transform, and TensorFlow on Cloud Dataflow and Cloud ML Engine.
6. Understanding the Earth: ML With Kubeflow Pipelines
Petabytes of satellite imagery contain valuable indicators of scientific and economic activity around the globe. In order to turn its geospatial data into conclusions, Descartes Labs has built a data processing and modeling platform for which all components run on Google Cloud. Descartes leverages tools including Kubeflow Pipelines as part of their model-building process to enable efficient experimentation, orchestrate complicated workflows, maximize repeatability and reuse, and deploy at scale. This session will explain how you can implement machine learning workflows in Kubeflow Pipelines, and cover some successes and challenges of using these tools in practice.
7. Virtual Assistants: Demystify and Deploy
In this session, you’ll learn how Discover built a customer service solution around Dialogflow. Discover’s data science team will explain how to execute on your customer service strategy, and how you can best configure your agent’s Dialogflow “model” before you deploy it to production.
8. Reinventing Retail with AI
Today’s retailers must have a deep understanding of each of their customers to earn and maintain their loyalty. In this session, Nordstrom and Disney explain how they’ve used AI to create engaging and highly personalized customer experiences. In addition, Google partner Pitney Bowes will discuss how they’re predicting credit card fraud for luxury retail brands. This session will discuss new Google products for the retail industry, as well as how they fit into a broader data-driven strategy for retailers.
9. GPU Infrastructure on GCP for ML and HPC Workloads
ML researchers want a GPU infrastructure they can get started with quickly, run consistently in production, and dynamically scale as needed. Learn about GCP’s various GPU offerings and the features often used with ML. From there, we will discuss a real-world customer story of how one team manages its GPU compute infrastructure on GCP. We’ll cover the new NVIDIA Tesla T4 and V100 GPUs, the Deep Learning VM Image for quickly getting started, preemptible GPUs for low cost, GPU integration with Kubernetes Engine (GKE), and more.
Every time I meet with our customers in the capital markets, they share new ways they are reinventing their businesses. Recently, I met with a CIO from a large investment bank looking to take the next step in the bank’s cloud adoption journey. We talked about everything from creating a plan for public cloud migration of mission-critical workloads and communicating it to regional regulators, to developing a roadmap for adopting engineering-driven software operations methodologies across the organization. The CIO repeatedly emphasized the bank’s collective commitment to creating a culture of innovation. What would it take to achieve this evolutionary transformation?
IT leaders in capital markets are asking the same question. Google Cloud recently contracted Aite Group, an independent research and advisory firm focused on business, technology, and regulatory issues and their impact on the financial services industry. Aite surveyed 19 capital markets firms regarding their respective public cloud adoption journeys. Here are valuable insights into what these firms do to bring about transformative change:
1. They learn from the tech industry.
Technology is becoming more and more vital to non-tech companies, but innovation can stall if you don’t fundamentally change how you build software. Successful capital markets firms have taken cues from traditional tech companies, adopting software operations methodologies such as continuous integration and continuous delivery (CI/CD), code reviews, unit and integration testing, incremental rollout, blameless post-mortems, and more. These practices accelerate ROI and support innovation, and are a significant reason why the tech industry builds software more effectively than other industries. Even though following these practices may slow down new code development in the short-term, it significantly reduces time spent on code maintenance down the road, freeing developers to innovate.
Most importantly, innovative capital markets firms adopt a “lifelong learning” attitude within the organization, emphasizing “training first” to reduce ramp-up times and respond in a fast-changing capital markets environment. They recognize that every employee can be a cloud worker, connected 24/7; security and workplace policies support this reality.
2. They foster a front-office culture of “everyone is a programmer” and bring AI to the middle and back office.
By democratizing the ability to build solutions across the business rather than isolating those capabilities in innovation labs, firms can build better products for their clients, especially because code is easier to follow, audit, and test than traditional tools such as spreadsheets. The front office may finally become less wedded to management via spreadsheet, if the tools are more fit for purpose.
In the middle and back office, machine learning (ML) and artificial intelligence (AI) may bring much needed relief in areas such as trade surveillance, where sophisticated malicious attacks make identifying breaches increasingly challenging. Moving from a rules-based review of electronic communications and compliance data to natural language processing refines data results. It allows firms to more seamlessly integrate electronic communications flags within the overall surveillance infrastructure. Similarly, cybersecurity could also benefit from more comprehensive and proactive activity monitoring by way of ML- and AI-based tools.
3. They use data openly with strong controls and security.
One CIO at a tier-1 global bank predicts that in the future, regulations such as GDPR will require data access to be granted by the end client—whether a retail investor or a large pension fund. Storing data in a manner where access can be granted or revoked by users easily across service providers—from large custodians through small service providers—will be essential to retaining business moving forward. Cloud-based services that incorporate tools for data loss prevention, obfuscation, tokenization, encryption and logging can help firms meet the security, privacy and data lineage requirements of emerging data-related regulations and user preferences.
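The grant-and-revoke model the CIO describes can be sketched in a few lines. This is a toy illustration only, with hypothetical client and provider names; real systems would layer it over encryption, tokenization, and audited identity management:

```python
# Toy model of client-controlled data access: the end client grants or
# revokes a provider's access, and every decision is logged so data
# lineage can be reconstructed. All names here are hypothetical.

class AccessLedger:
    def __init__(self):
        self.grants = set()
        self.log = []

    def grant(self, client, provider):
        self.grants.add((client, provider))
        self.log.append(("grant", client, provider))

    def revoke(self, client, provider):
        self.grants.discard((client, provider))
        self.log.append(("revoke", client, provider))

    def can_read(self, client, provider):
        return (client, provider) in self.grants

ledger = AccessLedger()
ledger.grant("pension-fund-1", "custodian-a")
ledger.grant("pension-fund-1", "service-b")
ledger.revoke("pension-fund-1", "service-b")
print(ledger.can_read("pension-fund-1", "custodian-a"),
      ledger.can_read("pension-fund-1", "service-b"))  # True False
```

The point of the sketch is the log: access decisions become data in their own right, which is what emerging data-lineage requirements demand.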
4. They adopt production ML systems.
There’s more to ML than implementing an algorithm. Production ML systems equipped for data collection, verification, machine resource management, analysis and other functions enable firms to improve monitoring, prediction scaling, error diagnosis, reporting and other tasks that support trading operations. For example, a proprietary trading firm in Singapore uses TensorFlow, an open-source machine learning library for numerical computation, with the Google Cloud Bigtable NoSQL database service, to “listen” to live market data and make trading decisions.
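To make the “listen to live market data and make trading decisions” pattern concrete, here is a minimal sketch. It substitutes a moving-average crossover for the firm’s actual TensorFlow model and a plain list for the Bigtable stream; the prices and window sizes are invented for illustration:

```python
from collections import deque

# Toy sketch of the "listen to market data, emit decisions" loop.
# A real deployment would read ticks from Cloud Bigtable and score
# them with a TensorFlow model; here a short/long moving-average
# crossover stands in for the model.

def decisions(ticks, short=3, long=5):
    short_w, long_w = deque(maxlen=short), deque(maxlen=long)
    out = []
    for price in ticks:
        short_w.append(price)
        long_w.append(price)
        if len(long_w) == long:  # wait until the long window is full
            short_ma = sum(short_w) / len(short_w)
            long_ma = sum(long_w) / len(long_w)
            out.append("buy" if short_ma > long_ma else "hold")
    return out

ticks = [100, 101, 102, 101, 103, 105, 104, 99, 98]
print(decisions(ticks))  # → ['buy', 'buy', 'buy', 'buy', 'hold']
```

The production concerns the paragraph lists (data verification, resource management, monitoring) all live around this inner loop, not inside it.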
5. They commit to open-source code with serverless applications.
Using open-source code rather than starting all software projects from scratch also speeds up innovation, provides tighter security and offers freedom from vendor lock-in. Plus publicly sharing changes to open-source software permits a richness of thought and a continuous feedback loop with users. Numerous capital markets firms have begun to champion open-source development and participate in related industry groups, such as the Fintech Open Source Foundation (FINOS).
To learn more about how these innovators are transforming their firms for greater efficiency and competitive differentiation using cloud-based thinking, check out our latest white paper, “Cloud as an Innovation Platform in Capital Markets.”
There are lots of great tools for machine learning, and there are lots of ways to solve a machine learning problem. The key is using the right tool for the right situation. While Scikit-learn, TensorFlow, Keras, and PyTorch all have their merits, for this case, BigQuery ML’s ease and speed can’t be beat.
Not convinced? Try this Qwiklab designed for basketball analysis and you’ll see what we mean!
Since we didn’t have to design our architecture from scratch, we wanted to expand and collaborate with more basketball enthusiasts. College students were a natural fit. We started by hosting the first-ever Google Cloud and NCAA Hackathon at MIT this past January, and after seeing some impressive work, we recruited about 30 students from across the country to join our data analyst ranks.
The students have split into two teams, looking at the concepts of ‘explosiveness’ and ‘competitiveness,’ each hoping to build a viable metric to evaluate college basketball teams. By iterating over Google Docs, BigQuery, and Colab, the students have been homing in on ways to use data and quantitative analysis to create definition around previously qualitative ideas.
For example, sportscasters often mention how ‘explosive’ a team is at various points in a game. But aside from watching endless hours of basketball footage, how might you go about determining if a team was, in fact, playing explosively? Our student analysts considered the various factors that come into explosive play, like dunks and scoring runs. By pulling up play-by-play data in BigQuery, they could easily find boxscore data with timestamps of all historical games, yielding a score differential. Using %%bigquery magic, they pivoted to Colab, and explored the pace of play of games, creating time boundaries that isolated when teams went on a run in a game. From there, they created an expression of explosiveness, which will be used for game analysis during the tournament. You can read more about their analysis and insight at g.co/marchmadness.
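The run-detection step the students describe can be sketched in plain Python. The data shape, swing threshold, and time window below are hypothetical stand-ins for what they derived in BigQuery and Colab:

```python
# Toy sketch: flag "explosive" scoring runs from play-by-play score
# differentials. A run here is a swing of at least `min_swing` points
# within `max_seconds` of game time; both thresholds are invented.

def find_runs(plays, min_swing=8, max_seconds=180):
    """Return (start, end) index pairs where the score differential
    swings by at least `min_swing` points within `max_seconds`."""
    runs = []
    for i, (t0, d0) in enumerate(plays):
        for j in range(i + 1, len(plays)):
            t1, d1 = plays[j]
            if t1 - t0 > max_seconds:
                break
            if abs(d1 - d0) >= min_swing:
                runs.append((i, j))
                break
    return runs

# (seconds_elapsed, home_minus_away_score) pairs for a game stretch
plays = [(0, 0), (60, 2), (120, 5), (150, 9), (400, 10), (430, 3)]
print(find_runs(plays))  # → [(0, 3)]: a 9-point swing in 150 seconds
```

In the students’ workflow, the `plays` input would come from a BigQuery play-by-play query, and a metric like this would be aggregated over a season rather than a single stretch.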
Moving data is an essential task for enterprises today, especially if you’re using lots of different systems across many locations. One product for this is IBM MQ, which helps you move data dependably with secure messaging. Deploying a highly available IBM MQ cluster in the cloud is not a straightforward task, and IBM provides many clustering configurations you can use and combine in various ways to achieve your high availability goals. It’s challenging to deploy a cluster like this in a way that takes advantage of the cloud’s benefits, like multi-zone availability, load balancers, and vertical scaling. But once you’ve got it set up, you can safely move, integrate, and process data from applications across your organization quickly and securely, even at multi-petabyte scale.
We’ve heard that you want to know how to use a tool like IBM MQ as part of your Google Cloud Platform (GCP) deployment, specifically Compute Engine. We’re pleased to introduce this guide to how to deploy a highly available IBM MQ Queue Manager Cluster on Compute Engine with GlusterFS. GlusterFS is an open-source, scale-out storage system that works well for this purpose because it is designed for high-throughput storage shared between instances and requires little effort to set up.
Using queue managers to build the IBM MQ cluster
In this solution, we recommend combining queue manager clusters with multi-instance queue managers (instances of the same queue manager configured on different servers) to create a highly available, scalable IBM MQ deployment. Multi-instance queue managers run in an active/standby configuration, using a shared volume for configuration and state data. Clustered queue managers share configuration information using a network channel and can perform load balancing on incoming messages. However, the message state is not shared between the two queue managers.
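The active/standby behavior of a multi-instance queue manager can be modeled in a few lines. This is a conceptual sketch only, with invented names; in the real deployment the shared state lives on a GlusterFS volume and failover is handled by IBM MQ itself:

```python
# Toy model of a multi-instance queue manager: an active and a standby
# instance share state (a plain dict standing in for the GlusterFS
# shared volume). When the active instance fails, the standby resumes
# from the shared state, so no accepted message is lost.

class QueueManagerInstance:
    def __init__(self, name, shared_volume):
        self.name = name
        self.shared = shared_volume  # same object for both instances
        self.alive = True

    def put(self, queue, msg):
        self.shared.setdefault(queue, []).append(msg)

    def get(self, queue):
        return self.shared.get(queue, []).pop(0)

def failover(active, standby):
    """Promote the standby; it sees identical state via the shared volume."""
    active.alive = False
    return standby

volume = {}
active = QueueManagerInstance("qm1a", volume)
standby = QueueManagerInstance("qm1b", volume)
active.put("ORDERS", "msg-1")
active = failover(active, standby)
print(active.name, active.get("ORDERS"))  # → qm1b msg-1
```

This is precisely why the shared volume matters: clustered queue managers alone balance load but do not share message state, so the multi-instance pair supplies the redundancy.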
By using both IBM MQ cluster deployment models at once, you can achieve redundancy at the queue manager level and then scale by distributing the load of the IBM MQ cluster across one or more queue managers. You can see the architecture of the deployment in this diagram:
Brand-new or second-hand? Diesel or electric? Convertible or SUV? Buying a car means choosing from a plethora of options, and that can be hard for some people to navigate. As a result, retailers are constantly rethinking their technology offerings—which means digital transformation must move just as fast.
As the UK’s largest digital automotive marketplace and the country’s 16th largest website, Auto Trader UK prides itself on how simple it is for consumers and retailers alike to buy and sell cars on its platform. To do this, they rely on Google Kubernetes Engine (GKE) plus Istio, an open-source, transparent service mesh that is integrated into GKE. Istio has helped them enable visibility, increase agility, and effectively secure their production environment, without sacrificing developer productivity.
Improving security and agility with GKE and Istio
Since 2013, Auto Trader has been a completely digital business, and they are now the UK’s market leader, with 55 million cross-platform visits every month and an audience four times larger than their nearest competitor. In total, they offer 300 applications including valuation tools, detailed reviews of dealerships and new cars, and integrations with car finance and insurance partners.
Auto Trader’s journey began 17 years ago on-premises with its own data centers. Then in 2018, they decided to move to the public cloud as part of their digital transition to create a more agile architecture that enables faster innovation. Their first choice was Google Cloud Platform (GCP).
As a part of this journey, Auto Trader moved their back-end applications to GKE and implemented Istio. They were looking for a trusted partner to off-load management of Kubernetes, and they chose Google Cloud because, as Karl Stoney, Delivery Engineering Lead at Auto Trader, put it: “Who could manage it better than the company that created it?” Many of the capabilities that Auto Trader were looking for come out of the box with Istio, as it enables visibility into applications in terms of response times and other important service metrics.
“Over the last 14 months we have worked directly with Google’s Kubernetes product managers with ongoing access to the Google Cloud Istio teams,” says Russell Warman, Head of Infrastructure at Auto Trader. “From a business perspective, migrating to Google Cloud Platform means we can get ideas up and running quickly, enabling us to build brilliant new products, helping us to continue to lead in this space.”
Since adopting Kubernetes and Istio, Auto Trader has seen significant gains in efficiency. For example, they are 75 percent more efficient in terms of their compute resources, without impacting performance. Auto Trader has also lowered their monthly bill and can now predict future spending more accurately. Istio, meanwhile, has helped them improve security and visibility, with no extra developer effort or training needed.
Auto Trader is now planning to complete its migration to the public cloud. With about a third of its services already running in production on GCP, they plan to migrate their remaining workloads over the next year to ensure everything is built, managed and monitored in the same way.
Auto Trader are certainly in the driver’s seat when it comes to their Istio journey.
To find out more about the other benefits of migrating to GCP, both from an operational and development perspective, including improved security, see the Auto Trader UK case study.
Good technologies solve problems. Great ones deliver new ways to think about ourselves and the future.
Think of the rich worlds seen through the telescope and the microscope, or space explorations that have broadened the sense of our place in creation. There is that “bicycle for the mind,” as Steve Jobs called the personal computer. Romance, entertainment, and personal needs have been transformed by online life. In every case, the attributes of the machines fire and empower human imagination.
Cloud computing is another one of these great technologies. Today, we’re pleased to publish The Future of Cloud Computing, a look at some of the ways that the tools and attributes of cloud computing are transforming work, business, and markets.
Cloud computing, which has become a standard at many of the world’s largest companies, is much more than just a cheaper and easier way to access computers, storage, networks and software. The power and ubiquity of the cloud mean easy two-way interactions of data and analysis from virtually any point on the planet. Software innovations enable companies to work at a scale we could not have imagined even a decade ago.
The defaults of this computing architecture, particularly in public clouds like Google Cloud, are choice, flexibility, responsiveness, and strong analytic capability. It’s notable that these are the same values that increasingly drive organizations, in everything from distributed teams and on-demand collaboration to real-time customer service and product upgrades.
To take one example: large-scale clouds like Google Cloud, and smaller private clouds inside companies, increasingly use management software like Kubernetes and Istio, which observe and manage large numbers of workloads, moving them efficiently across computing hardware in the most standardized and automated way possible.
On one level, this is simply technology at its best, getting hassles out of the way so people can do more creative work. It’s no accident, though, that both products are open source, a transparent, collaborative, and high-velocity way of working that has come into its own in the cloud era.
Other examples draw on existing trends. In global manufacturing, the collaboration of outsourcing, partnering, subcontracting, and alliances is supercharged by the cloud. The transformation of our workspaces, with expensive closed-door offices giving way to cubicles, then open-plan offices and telecommuting, can be effected in new and better ways when both work products and communications tools reside in the cloud, accessible anywhere.
Mobility and the Internet of Things mean all kinds of sensors are capturing data, and products are better connected to the cloud, responding appropriately to new information. There are better forms of security and privacy protection. These are borderless safeguards, designed for protection within the context of an overall system rather than on weaker, localized computers.
There’s much more in the report, and I hope you check it out.
Where is this taking us? Our world has challenges and opportunities, but we have eliminated a few fatal diseases, educated people around the world with cloud-based remote learning, and can watch videos from Mars. It’s possible to be quite an optimist, too.
In that spirit, cloud computing is about discovering more of life, reacting to it, and building on our discoveries faster and better. It’s about making more dreams real.
Click here to download the full report.
From a configuration, operation, and troubleshooting standpoint, the cloud has a lot of roads to the proverbial top of Mount Fuji. Figure 5 shows the different paths available. You are free to choose the one that best maps to your skill level and is most appropriate to complete the task at hand. While Google Cloud has a CLI-based SDK for shell scripting or interactive terminal sessions, you don’t have to use it. If you are developing an application or prefer a programmatic approach, you can use one of many client libraries that expose a wealth of functionality. If you’re an experienced programmer with specific needs, you can even write directly to the REST API itself. And of course, on the other end of the spectrum, if you are learning or prefer a visual approach, there’s always the console.
In addition to the tools above, if you need to create larger topologies on a regular cadence, you may want to look at Google’s Cloud Deployment Manager. If you want a vendor-agnostic tool that works across cloud providers, you can investigate the open-source program Terraform. Both solutions offer a jump from imperative to declarative infrastructure programming. This may be a good fit if you need a more consistent workflow across developers and operators as they provision resources.
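The imperative-to-declarative jump can be illustrated with a toy reconciler: instead of scripting create/delete steps, you state the desired inventory and let the tool compute the steps. Resource names here are invented, and real tools like Deployment Manager and Terraform of course operate on cloud resources rather than Python sets:

```python
# Toy contrast between imperative and declarative provisioning.
# Imperative: you write the create/delete steps yourself.
# Declarative: you state the desired inventory; a reconciler diffs it
# against current state and derives the steps.

def reconcile(current, desired):
    """Return the actions needed to move `current` to `desired`."""
    to_create = sorted(desired - current)
    to_delete = sorted(current - desired)
    return [("create", r) for r in to_create] + [("delete", r) for r in to_delete]

current = {"vm-1", "vm-2"}                 # what exists now
desired = {"vm-2", "vm-3", "lb-1"}         # what the config declares
print(reconcile(current, desired))
# → [('create', 'lb-1'), ('create', 'vm-3'), ('delete', 'vm-1')]
```

Because the config is the source of truth, the same declaration produces a consistent result no matter who applies it, which is where the "consistent workflow across developers and operators" benefit comes from.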
Putting it all together
If this sounds like a lot, that’s because it is. Don’t despair though, there’s a readily available resource that will really help you grok these foundational network concepts: the documentation.
You are most likely very familiar with the documentation sections of several network vendors’ websites. To get up to speed on networking on Google Cloud, your best bet is to familiarize yourself with Google’s documentation as well. There is documentation for high-level concepts like network tier levels, network peering, and hybrid connectivity. Then, each cloud service also has its own individual set of documentation, subdivided into concepts, how-tos, quotas, pricing, and other areas. Reviewing how it is structured and creating bookmarks will make studying and the certification process much easier. Better yet, it will also make you a better cloud engineer.
Finally, I want to challenge you to stretch beyond your comfort zone. Moving from networking to the cloud is about virtualization, automation, programming, and developing new areas of expertise. Your journey into the cloud should not stop at learning how GCP implements VPCs. Set long-term as well as short-term goals. There are so many new areas where your skill sets are needed and where you can provide value. You can do it; don’t doubt that for one minute.
In my next blog post, I’ll discuss an approach to structuring your cloud learning. This will make the learning and certification process easier, as well as prepare you for the Cloud Network Engineer role. Until then, the Google Cloud training team has lots of ways for you to increase your Google Cloud know-how. Join our webinar on preparing for the Professional Cloud Network Engineer certification exam on February 22, 2019 at 9:45am PST. Now go visit the Google certification page and set your first certification goal! Best of luck!
In contrast to a static algorithm coded by a software developer, an ML model is an algorithm that is learned and dynamically updated. You can think of a software application as an amalgamation of algorithms, defined by design patterns and coded by software engineers, that perform planned tasks. Once an application is released to production, it may not perform as planned, prompting developers to rethink, redesign, and rewrite it (continuous integration/continuous delivery).
We are entering an era of replacing some of these static algorithms with ML models, which are essentially dynamic algorithms. This dynamism presents a host of new challenges for planners, who work in conjunction with product owners and quality assurance (QA) teams.
For example, how should the QA team test and report metrics? ML models are often expressed as confidence scores. Let’s suppose that a model shows that it is 97% accurate on an evaluation data set. Does it pass the quality test? If we built a calculator using static algorithms and it got the answer right 97% of the time, we would want to know about the 3% of the time it does not.
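That 97%-versus-3% question suggests what a QA report for a model should look like: the headline accuracy plus the concrete failures to inspect. A minimal sketch, with made-up predictions and labels:

```python
# Toy QA report for a model evaluation: report the aggregate accuracy,
# but also surface the failing examples so the "3%" can be examined
# rather than hidden inside a single score. Data here is invented.

def qa_report(predictions, labels):
    failures = [i for i, (p, y) in enumerate(zip(predictions, labels)) if p != y]
    accuracy = 1 - len(failures) / len(labels)
    return {"accuracy": accuracy, "failure_indices": failures}

preds  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
labels = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(qa_report(preds, labels))  # 90% accurate; example 3 needs review
```

In practice the QA team would slice those failure indices by input characteristics to decide whether the errors are random noise or a systematic blind spot, which is the real pass/fail question.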
Similarly, how does a daily standup work with machine learning models? It’s not like the training process is going to give a quick update each morning on what it learned yesterday and what it anticipates learning today. It’s more likely your team will be giving updates on data gathering/cleaning and hyperparameter tuning.
When the application is released and supported, one usually develops policies to address user issues. But with continuous learning and reinforcement learning, the model is learning the policy. What policy do we want it to learn? For example, you may want it to observe and detect user friction in navigating the user interface and learn to adapt the interface (Auto A/B) to reduce the friction.
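The "Auto A/B" idea can be sketched as a simple bandit: serve whichever interface variant shows the least friction so far, while still exploring. The variant names and friction rates below are entirely hypothetical:

```python
import random

# Toy epsilon-greedy bandit for the "Auto A/B" idea: the policy being
# learned is which interface variant to serve. Friction events are
# simulated with invented per-variant probabilities.

def choose(stats, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(list(stats))  # explore
    # exploit: pick the variant with the lowest observed friction rate
    return min(stats, key=lambda v: stats[v]["friction"] / max(stats[v]["shown"], 1))

rng = random.Random(0)  # seeded for reproducibility
true_friction = {"layout_a": 0.30, "layout_b": 0.10}
stats = {v: {"shown": 0, "friction": 0} for v in true_friction}

for _ in range(2000):
    v = choose(stats, epsilon=0.1, rng=rng)
    stats[v]["shown"] += 1
    stats[v]["friction"] += rng.random() < true_friction[v]

best = min(stats, key=lambda v: stats[v]["friction"] / stats[v]["shown"])
print(best)  # the bandit converges on the lower-friction layout
```

The planning question in the paragraph above is exactly the choice of reward here: deciding that "reduced friction" is the policy you want the model to learn is a product decision, not a modeling one.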
Within an effective ML lifecycle, planning needs to be embedded in all stages to start answering these questions specific to your organization.
Data engineering is where the majority of the development budget is spent—as much as 70% to 80% of engineering funds in some organizations. Learning is dependent on data—lots of data, and the right data. It’s like the old software engineering adage: garbage in, garbage out. The same is true for modeling: if bad data goes in, what the model learns is noise.
In addition to software engineers and data scientists, you really need a data engineering organization. These skilled engineers will handle data collection (e.g., billions of records), data extraction (e.g., SQL, Hadoop), data transformation, data storage, and data serving. It’s the data that consumes the vast majority of your physical resources (persistent storage and compute). Because of the scale involved, these tasks are now typically handled with cloud services rather than traditional on-premises methods.
Effective deployment and management of data cloud operations are handled by those skilled in data operations (DataOps). Data collection and serving are handled by those skilled in data warehousing (DBAs), data extraction and transformation by those skilled in data engineering (Data Engineers), and data analysis by those skilled in statistical analysis and visualization (Data Analysts).
Modeling is integrated throughout the software development lifecycle. You don’t just train a model once and you’re done. The concept of one-shot training, while appealing in budget terms and simplification, is only effective in academic and single-task use cases.
Until fairly recently, modeling was the domain of data scientists. The initial ML frameworks (like Theano and Caffe) were designed for data scientists. ML frameworks are evolving, and today’s frameworks (like Keras and PyTorch) sit more in the realm of software engineers. Data scientists play an important role in researching the classes of machine learning algorithms and their amalgamation, advising on business policy and direction, and moving into roles leading data-driven teams.
But as ML frameworks and AI as a Service (AIaaS) evolve, the majority of modeling will be performed by software engineers. The same goes for feature engineering, a task performed by today’s data engineers: with its similarities to conventional tasks related to data ontologies, namespaces, self-defining schemas, and contracts between interfaces, it too will move into the realm of software engineering. In addition, many organizations will move model building and training to cloud-based services used by software engineers and managed by data operations. Then, as AIaaS evolves further, modeling will transition to a combination of turnkey solutions accessible via cloud APIs, such as for Cloud Vision and Cloud Speech-to-Text, and customizing pre-trained algorithms using transfer learning tools such as AutoML.
Frameworks like Keras and PyTorch have already transitioned away from symbolic programming toward imperative programming (the dominant form in software development), and incorporate object-oriented programming (OOP) principles such as inheritance, encapsulation, and polymorphism. One should anticipate that other ML frameworks will evolve to extend object-relational models (ORM), which we already use for databases, to data sources and inference (prediction). Common best practices will evolve, and industry-wide design patterns will become defined and published, much like how Design Patterns by the Gang of Four influenced the evolution of OOP.
Like continuous integration and delivery, continuous learning will also move into build processes, and be managed by build and reliability engineers. Then, once your application is released, its usage and adaptation in the wild will provide new insights in the form of data, which will be fed back to the modeling process so the model can continue learning.
As you can see, adopting machine learning isn’t simply a question of learning to train a model, and you’re done. You need to think deeply about how those ML models will fit into your existing systems and processes, and grow your staff accordingly. I, and all the staff here at Google, wish you the best in your machine learning journey as you upgrade your software development lifecycle to accommodate machine learning. To learn more about machine learning on Google Cloud, visit our Cloud AI products page.
Part and parcel of modern enterprise development is building APIs that enable you to expose your services to developers both inside and outside your organization. But just building APIs isn’t enough. Getting APIs and API programs to market successfully hinges on convincing your developers to actually use them. And the key driver of getting developers to adopt and consume APIs, both within a company or among the wider developer community, is the developer portal.
To help enterprises create great developer experiences, we’re announcing several enhancements to the Apigee Developer Portal, a comprehensive, customizable solution that helps API providers seamlessly onboard developers and admins who use APIs managed by Google Cloud’s Apigee platform. Here is what’s included in this round of updates:
- A new version of SmartDocs API reference documentation
- An enhanced theme editor and redesigned default portal theme
- Improvements to managing developer accounts
Apigee’s SmartDocs automatically creates beautiful API reference documentation for your developers, and features a new, three-pane view. The left pane helps developers navigate between areas of the API, while the center pane gives detailed documentation for a given operation. The right pane enables you to make API requests directly from the docs, and it includes an “expand” button so you can focus on the details of the request itself.
Documentation is built on the OpenAPI Specification and supports both versions 2.0 and 3.0.x. Every operation defined in the OpenAPI spec gets its own page, which makes it easy for users to share and discuss specific areas of the docs and for your API team to deep-link users to the exact content they need.
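To make the operation-per-page idea concrete, here is a minimal OpenAPI 3.0 document expressed as a Python dict, with a helper that enumerates one "page" per operation. The spec content is illustrative; the helper mirrors the structure of the `paths` object, where each HTTP method under a path is one operation.

```python
# Illustrative only: a minimal OpenAPI 3.0 document as a Python dict, plus a
# helper that yields one "page" per operation, mirroring how each operation
# in the spec gets its own deep-linkable reference page.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Petstore", "version": "1.0.0"},
    "paths": {
        "/pets": {
            "get": {"operationId": "listPets"},
            "post": {"operationId": "createPet"},
        },
        "/pets/{petId}": {
            "get": {"operationId": "getPet"},
        },
    },
}

def operation_pages(spec):
    """Yield (method, path, operationId) for every operation in the spec."""
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            yield method.upper(), path, op["operationId"]

for method, path, op_id in operation_pages(spec):
    print(f"{method} {path} -> page for {op_id}")
```

The three operations above would each produce their own documentation page, so a link to `createPet` lands a user directly on that operation.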
We’re pleased to announce that The Site Reliability Workbook is available in HTML now! Site Reliability Engineering (SRE), as it has come to be generally defined at Google, is what happens when you ask a software engineer to solve an operational problem. SRE is an essential part of engineering at Google. It’s a mindset, and a set of practices, metrics, and prescriptive ways to ensure systems reliability. The new workbook is designed to give you actionable tips on getting started with SRE and maturing your SRE practice. We’ve included links to specific chapters of the workbook that align with our tips throughout this post.
We’re often asked what implementing SRE means in practice, since our customers face challenges quantifying their success when setting up their own SRE practices. In this post, we’re sharing a couple of checklists for members of an organization responsible for high-reliability services. These will be useful when you’re trying to move your team toward an SRE model. Implementing this model at your organization can benefit both your services and your teams, through higher service reliability, lower operational cost, and higher-value work for the people involved.
But how can you tell how far you have progressed along this journey? While there is no simple or canonical answer, below is a non-exhaustive set of checklists for gauging your progress, organized in ascending order of team maturity. Within each checklist, the items are roughly in chronological order, though we recognize that any given team’s actual needs and priorities may vary.
If you’re part of a mature SRE team, these checklists can be useful as a form of industry benchmark, and we’d love to encourage others to publish theirs as well. Of course, SRE isn’t an exact science, and challenges arise along the way. You may not get to 100% completion of the items here, but we’ve learned at Google that SRE is an ongoing journey.
SRE: Just getting started
The following three practices are key principles of SRE, but can largely be adopted by any team responsible for production systems, regardless of its name, before and in parallel with staffing an SRE team.
Beginner SRE teams
Most, if not all, SRE teams at Google have established the following practices and characteristics. We generally view these as fundamental to an effective SRE team, unless there are good reasons why they aren’t feasible for a specific team’s circumstances.
The following practices are also common for SRE teams starting out. If they don’t exist, that can be a sign of poor team health and sustainability issues:
Intermediate SRE teams
These characteristics are common in mature teams and generally indicate that the team is taking a proactive approach to efficient management of its services.
Advanced SRE teams
These practices are common in more senior teams, or sometimes can be achieved when an organization or set of SRE teams share a broader charter.
Another set of SRE “features” that may be desirable, but are unlikely to be implemented by most companies:
- SREs are not on-call 24×7. SRE teams are geographically distributed across two locations, such as the U.S. and Europe. It’s worth pointing out that neither half is treated as secondary.
- SRE and developer organizations share common goals and may have separate reporting chains up to SVP level or higher. This arrangement helps to avoid conflicts of interest.
What should I do next?
Once you’ve looked through these checklists, your next step is to think about whether they match your company’s needs.
If you don’t yet have an SRE team and most of the beginner list is unchecked, we’d highly recommend reading the associated SRE Workbook chapters in the order they have been presented. If you happen to be a Google Cloud Platform (GCP) customer and would like to request CRE involvement, contact your account manager to apply for this program. But to be clear, SRE is a methodology that will work on a huge variety of infrastructures, and using Google Cloud is not a prerequisite for pursuing this set of engineering practices.
We’d also recommend attending existing conferences and organizing summits with other companies to share best practices on how to solve some of the blockers, such as recruiting.
We have also seen teams struggle to complete the advanced list because of churn; a high rate of system and personnel change can be a barrier to getting there. To keep teams from reverting to the beginner stage and to catch other problems, our SRE leadership reviews key metrics for each team every six months. The scope of that review is narrower than the checklists above, because several of the items have since become standard.
As you may have guessed by now, answering the central question in this article involves addressing and attempting to assess a given team’s impact, health, and most importantly, how the actual work is done. After all, as we wrote in our first book on SRE: “If we are engineering processes and solutions that are not automatable, we continue having to staff humans to maintain the system. If we have to staff humans to do the work, we are feeding the machines with the blood, sweat, and tears of human beings.”
So yes, you might have an SRE team already. Is it effective? Is it scalable? Are people happy? Wherever you are in your SRE journey, you can likely continue to evolve, grow and hone your team’s work and your company’s services. Learn more here about getting started building an SRE team.
Thanks to Adrian Hilton, Alec Warner, David Ferguson, Eric Harvieux, Matt Brown, Myk Taylor, Stephen Thorne, Todd Underwood and Vivek Rau among others for their contributions to this post.
To operate machine learning systems at scale, teams need access to a wealth of feature data, both to train their models and to serve them in production. GO-JEK and Google Cloud are pleased to announce the release of Feast, an open source feature store that allows teams to manage, store, and discover features for use in machine learning projects.
Developed jointly by GO-JEK and Google Cloud, Feast aims to solve a set of common challenges facing machine learning engineering teams by becoming an open, extensible, unified platform for feature storage. It gives teams the ability to define and publish features to this unified store, which in turn facilitates discovery and feature reuse across machine learning projects.
“Feast is an essential component in building end-to-end machine learning systems at GO-JEK,” says Peter Richens, Senior Data Scientist at GO-JEK. “We are very excited to release it to the open source community. We worked closely with Google Cloud in the design and development of the product, and this has yielded a robust system for the management of machine learning features, all the way from idea to production.”
For production deployments, machine learning teams need a diverse set of systems working together. Kubeflow is a project dedicated to making these systems simple, portable and scalable and aims to deploy best-of-breed open-source systems for ML to diverse infrastructures. We are currently in the process of integrating Feast with Kubeflow to address the feature storage needs inherent in the machine learning lifecycle.
Feature data are signals about a domain entity. For example, at GO-JEK we might have a driver entity with a feature such as the daily count of completed trips. Other interesting features might be the distance between the driver and a destination, or the time of day. A combination of multiple features is used as the input to a machine learning model.
In large teams and environments, how features are maintained and served can diverge significantly across projects, which introduces infrastructure complexity and can result in duplicated work. Common problems include:
- Features not being reused: Features representing the same business concepts are being redeveloped many times, when existing work from other teams could have been reused.
- Feature definitions vary: Teams define features differently and there is no easy access to the documentation of a feature.
- Hard to serve up-to-date features: Combining streaming and batch derived features, and making them available for serving, requires expertise that not all teams have. Ingesting and serving features derived from streaming data often requires specialized infrastructure. As such, teams are deterred from making use of real-time data.
- Inconsistency between training and serving: Training requires access to historical data, whereas models that serve predictions need the latest values. Inconsistencies arise when data is siloed into many independent systems requiring separate tooling.
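One common way to avoid the training/serving inconsistency described above is to define each feature transformation exactly once and call the same code from both the batch training path and the online serving path. The sketch below uses hypothetical names to show the shape of that pattern:

```python
# A sketch of avoiding training/serving skew: one transform function is the
# single source of truth, called by both the batch and the online paths.
# All names here are illustrative, not part of any real API.

def compute_features(raw):
    """Single source of truth for feature logic."""
    return {
        "trips_completed": raw["trips"],
        "is_peak_hour": 1 if 7 <= raw["hour"] < 10 or 17 <= raw["hour"] < 20 else 0,
    }

def build_training_rows(historical_events):
    # Batch path: replay historical records through the same transform.
    return [compute_features(event) for event in historical_events]

def serve_features(latest_event):
    # Online path: the latest record goes through the identical transform.
    return compute_features(latest_event)
```

Because both paths share `compute_features`, a change to the peak-hour definition cannot silently diverge between training and serving.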
Feast solves these challenges by providing a centralized platform on which to standardize the definition, storage and access of features for training and serving. It acts as a bridge between data engineering and machine learning.
Feast handles the ingestion of feature data from both batch and streaming sources. It also manages both warehouse and serving databases, for historical data and the latest values respectively. Using a Python SDK, users are able to generate training datasets from the feature warehouse. Once their model is deployed, they can use a client library to access feature data from the Feast Serving API.
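The overall workflow can be sketched with a toy in-memory stand-in. To be clear, the class and method names below are illustrative, not Feast's actual SDK surface; the point is the dual read path: an append-only history for training datasets and a latest-value view for online serving.

```python
# Hypothetical shape of the workflow described above -- the class and method
# names are illustrative stand-ins, not Feast's actual SDK surface.

class FeatureStoreClient:
    """Toy in-memory stand-in for a feature store's serving/warehouse split."""
    def __init__(self):
        self._latest = {}      # entity_id -> {feature_name: value} (serving)
        self._history = []     # append-only log of all rows (warehouse)

    def ingest(self, entity_id, features):
        """Accept a row from a batch or streaming source."""
        self._latest[entity_id] = dict(features)
        self._history.append((entity_id, dict(features)))

    def get_online_features(self, entity_id, feature_names):
        """Serving path: return only the latest values for an entity."""
        row = self._latest[entity_id]
        return {name: row[name] for name in feature_names}

    def get_training_dataset(self, feature_names):
        """Warehouse path: return every historical row for training."""
        return [
            {name: feats[name] for name in feature_names}
            for _, feats in self._history
        ]
```

Both read paths draw on the same ingested data, which is what lets a feature store keep training and serving consistent.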