Rogue Tech Talks: November, 2019 – Rogue Tech Talks ©

~ TRADITIONAL SYSTEM LAUNCH PROCESS ~
Gunnar Engelbach began his presentation by describing the traditional manner in which I.T. Departments launch new computer systems: try to predict what your workload is going to be, buy a whole bunch of equipment and then, months later when everything arrives, you set it up and install the operating system, you set up the Service Level Agreements (SLAs), you put the equipment on the floor and install the software, you test it out and then, when the rush hits, you hope you’ve got enough compute power to handle it.

And, because there’s such a huge lag time, you have to seriously over-purchase to be able to handle that predicted future load. That’s the Cloud Proposition … by not having to buy (invest in) all that stuff ahead of time, and being able to set up a virtual server—tomorrow—you don’t over-purchase your I.T. resources. You buy what you need when you need it.

~ HOSTED SERVERS ~
That mindset provided incentive to go to hosted servers. We got away from having on-premises servers and moved to hosted services.

~ GOING SERVERLESS ~
The next step beyond virtualization is serverless. With a virtual server, you still have a host server out in the cloud. You still have to apply patches, keep it updated, and perform all the usual maintenance chores.

The new paradigm shift away from having virtualization is going serverless; removing the need to actually manage a server. All you care about is the software and services provided.

~ COSTS ~
Q: You said you could move-up as your needs increase. Do your costs get scaled, also?
G: Yes.

Q: OK. Is it by the package or…?
G: It depends on which way you’re going. If you’re going the traditional server-based route, and I’ll speak specifically to AWS, they call them EC2 Instances; Elastic Compute Cloud. They put “Elastic” into all of their names, for some reason.

C: Just to be flexible…
G: You’re still basically buying a server. You’re specifying how many processors, how much memory, your network throughput, your hard drive, all your resources. And they package these resources. For example, a T3 Large has two processors, 8 GB of RAM, and will handle network speeds of up to 5 Gbps. They’re tiered and they’re discrete. It’s not like you can say I want four processors and a certain amount of RAM … you must select from the pre-defined resource packages and they will charge you based upon the package you select, as a unit cost.

This particular one, T3 Limited, will cost a penny per hour, for example. That’s the more traditional server-based way to go. When you move to serverless, they completely change the way they charge for resources.

~ SERVERLESS SERVICES ~
More specifically, when you go serverless in the AWS World, this [reference slides] is an overview of their various services that support that; they’re all built around the serverless idea.

Some recognizable things like Docker and other stuff. [For details, see Slide #2 in .pdf attachment link at the end of this post.] Most of this stuff is for supporting the compute environment; the tooling for building applications and services. They all still function in a serverless fashion, which is to say no servers to maintain, no operating system to keep patched, none of that stuff.

~ DOCKER ~
A quick overview of Docker: Docker is a virtual machine (or image) that’s stripped down a bit to run a particular service you define for that machine. It might be a reduced set of Ubuntu with the necessary libraries, or Apache, or whatever you need for that particular image as a service. The benefit is that when a developer makes a Docker Image, the development environment becomes the production environment so, once that image is placed into production, it works. None of this old stuff with the developer saying, “well, it worked on my machine … I don’t know why it doesn’t work in test.”

AWS supports Docker in a lot of the traditional ways. There’s Docker Hub, where you can find pre-built Docker Images [sort of a library of Docker Images]. Amazon maintains those for their use, and when you make your own Docker Image, you can put it into ECR (Elastic Container Registry) and keep track of the Docker Images that you build that way. ECS (Elastic Container Service) is for running them.

They have Kubernetes. They have another thing called Fargate for managing your Docker Images that are running. It’s kind of like their (Amazon’s) take on Kubernetes. You can do Kubernetes or you can do Fargate; they’re both serving kind of the same purpose. BUT…

~ LAMBDA ~
The new way is: Lambda. Lambda takes the idea of Docker and strips it down to the bare minimum. Take the idea of a Docker Image, a reduced virtual machine, and take away everything you don’t absolutely need, until you get it as small as possible, and you gear it towards running a single function. You’re not running a whole service off of it like a Web service, you’re running a single function. So if you’re looking at something like a Web API, when somebody does a get, say getA, you go to one Lambda Image; for getB, you go to another Lambda Image, etc. They are all implemented in separate Lambda functions. And, because they’re all stripped down so small, the startup times for them are really, really small.

It’s not like booting your computer. It simply pops up and runs and then it’s done. And, because it’s so fast to execute, the way they use them is that every single request that comes in to use a Lambda Function gets its own Lambda Function. The Lambda Function processes that request, does whatever the Lambda Function was written to do and then exits. Which means that, if a million people are doing simultaneous requests, they each get a separate instance of that Lambda Function. So they’re all operating independently. For that reason, it scales infinitely, up to the limits of AWS’s compute capacity.
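To make the "one function per request" idea concrete, here is a minimal sketch of what a single Lambda Function might look like in Python. The `(event, context)` signature and the `statusCode`/`body` response shape follow the usual convention for Python functions sitting behind API Gateway's proxy integration; the function name `get_a_handler`, the `name` parameter and the payload are hypothetical and not from the talk.

```python
import json


def get_a_handler(event, context):
    """Handle one request for a hypothetical 'getA' route, then exit."""
    # With a proxy-style integration the query parameters arrive in the
    # event dict; the key may be present but set to None.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"route": "getA", "greeting": f"hello, {name}"}),
    }


if __name__ == "__main__":
    # Local smoke test; no AWS account is needed to call the function.
    print(get_a_handler({"queryStringParameters": {"name": "Rogue"}}, None))
```

The function holds no state between calls, which is what lets each incoming request run in its own independent instance.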

Q: So, if I’m running something and you’re running something, my Lambda Image is exactly what I need it to be and yours is exactly what you need it to be?
G: Not only that, but if I write an image for doing some function for my website, every person gets a separate image for themselves. So my website now scales, infinitely.
C: Wow!

G: A separate function, on separate hardware, anywhere in the AWS Cloud. And there are some side benefits of that: If you’ve got a bug in your Lambda Function, it doesn’t take down the whole system; it can’t be used as an attack vector (it can, but not as easily). That data input might crash that one user request but everybody else is making requests … no problem.
C: Sweet.

G: In fact, this is part of the AWS Security Model, which is because computing functions are disposable, we no longer worry so much about securing your server; we simply tear it down and bring up a new one—and your attacker is gone. So, get a Trojan planted, shut it down, start a new instance; it’s gone.

~ LAMBDA PRICING ~
Lambda Pricing is based on CPU Seconds. It’s also based on how much memory you want to make available to your Lambda Function; this will determine the Pricing Tier. And the pricing is per 100 milliseconds; IOW, it’s fractions of a fraction of a fraction of a penny per 100 milliseconds. Plus, you get 3,200,000 seconds free per month if you choose the smallest (128 MB) memory size; all the way up to the 3008 MB memory size [see Slide #5], you’ve still got about 136,000 seconds free. So you have to build a pretty high-production website before you start getting charged for using a Lambda Function.
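The 3,200,000 and 136,000 figures quoted above can be reproduced with a little arithmetic. This sketch assumes the monthly free allowance is 400,000 GB-seconds of compute, which is the figure AWS published around the time of the talk; that number is not stated in the talk itself, so treat it as an assumption.

```python
FREE_GB_SECONDS = 400_000  # assumed monthly free compute allowance


def free_seconds(memory_mb: int) -> float:
    """Seconds of free run time per month at a given memory size."""
    memory_gb = memory_mb / 1024
    return FREE_GB_SECONDS / memory_gb


if __name__ == "__main__":
    for mb in (128, 512, 1024, 3008):
        print(f"{mb:>5} MB -> {free_seconds(mb):>12,.0f} free seconds/month")
```

The allowance is a fixed amount of memory multiplied by time, so doubling the memory you give a function halves the free run time.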

Q: That’s for each Lambda Function, right?
G: Yes. That’s total compute time for Lambda functions.

Q: So if you have a Lambda end point for every get service of your API, you would have that times however many Lambdas you have, right?
G: Yes. So it behooves you to write an efficient Lambda Function.

Q: Per instance, right? Oh, but it’s per 100 milliseconds.
G: Yeah. It’s per Lambda Function and per instance; total amount of Lambda usage per month.

Q: But this is still added-on to some base user level?
G: Nope. Lambda is completely free until you hit this threshold.
C: Whoa!

~ BUILDING YOUR LAMBDA FUNCTION ~
G: So, me setting up practice websites or just things that don’t get used much … there’s never any charge, at all. If you do a high-volume website that’s getting a lot of hits on a lot of Lambda functions, you’re going to end up eventually getting charged. And, you’re going to get charged around two ten-millionths of a dollar per 100 milliseconds at the smallest memory size.

Lambda is only part of it. Basically, you write a single method; a single function. But it’s no longer part of a program; you have to glue things together to actually build something. What they do is provide a lot of different services, again in a serverless environment, to make that possible, starting with data storage. All of those on the first line [see Slide #6] are types of databases: MySQL and PostgreSQL. Aurora is Amazon’s take on a database, which is compatible with either MySQL or PostgreSQL, depending on how you want to use it, and it has some auto-scaling and multi-zone capabilities. DynamoDB is their version of Mongo, so it’s a document database. It’s very fast. It’s also used intrinsically in a lot of stuff you see as part of the Amazon Services.

~ STORAGE ~
Then there’s Storage. EFS is Elastic File System; it’s basically an NFS file mount. S3 (Simple Storage Service) is the core of their storage system; it kind of acts like a database where you store objects. It’s an object store, where an object in this case is usually a file. But, again, it’s kind of core to everything. And they recently added FSx (third-party file systems), which is another type of file system; depending on what kind of operating system you’re using, it’s either Lustre, if you’re on a Linux System, or it’s a Windows file system. It’s fast as hell. It’s also very expensive.

~ DNS ~
Then they provide DNS for you, through Route 53 and, again, because you’re inside the Amazon Environment, Route 53 isn’t just DNS; it does a lot of things specific to Amazon. If you happen to be at the point where you register your domain through Amazon and it’s in Route 53, there are advantages you can take because of that which you can’t do otherwise.

Let’s go with an example of Lambda Functions to implement back-end compute for a website. You’ve got to have a front-end for that website and they provide that for you; that’s the API Gateway. This is basically like a RESTful Service you set up and then, behind the scenes, you tell it what it’s going to forward requests to. It also handles SSL Certificates: you simply register a certificate, for free, and it will do https for you. Then you simply set up routes: when a request comes in, send it to this function … or, this server … or, this database … or, whatever you want to do. API Gateway takes care of that.
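The "set up routes" idea can be pictured as a lookup table from an incoming request to the function that should handle it. The sketch below is plain Python and does not use any AWS API; the paths, handler names and response shape are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]


def get_a(request: dict) -> dict:
    return {"status": 200, "body": "result of getA"}


def get_b(request: dict) -> dict:
    return {"status": 200, "body": "result of getB"}


# Route table: (HTTP method, path) -> the function that handles it.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("GET", "/a"): get_a,
    ("GET", "/b"): get_b,
}


def dispatch(method: str, path: str, request: dict) -> dict:
    """Forward a request to its handler, or answer 404 if no route matches."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return {"status": 404, "body": "no route"}
    return handler(request)


if __name__ == "__main__":
    print(dispatch("GET", "/a", {}))
    print(dispatch("POST", "/a", {}))
```

In the real service the targets behind a route can be Lambda Functions, servers or other back ends; here they are just local functions.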

Whether you’re using EC2 Instances in traditional server mode or you’re going serverless, the API Gateway will automatically do load balancing for you. Plus session tracking, so if you’re doing something that requires session authentication with a session key, API Gateway will track where your session key is registered and ensure that’s the service you’re always talking to so your session key remains valid.

~ LOAD BALANCING ~
Load Balancing [see Slide #6], which goes with API Gateway … and then all that other stuff you might want to watch on the back end: CloudWatch and CloudTrail are basically for watching logs and for notification services. Guardrail is where you set up the limits you want the service to run within, kind of security-wise; that can be a wide range of things, from amount of memory to usage to permissions. If it ever goes outside of the bounds you set, this controls that situation.

Inspector is a security-scanning service. You never have to set up a server to do any of this stuff; it’s all just a service that you add to your application.

Q: Does the load-balancing allow you to cross zones? Different Data Center Zones.
G: Amazon is organized on two major levels: Regions and, within those, Availability Zones. Oregon has a Region: there’s one in Portland-or-Salem.

C: I’m pretty sure the load-balancing is regional. We rent a 3-Zone Kubernetes Cluster and have a Type 1 Load Balancer for that. I don’t know about cross-region. They do have stuff built-in for the DNS Resolution so you get routed to the right region.

C: That may be what I was thinking about…

G: They also have CloudFront. Your main server might be in U.S. West or East (Northern Virginia, Oregon or Ohio, to name some of the Data Centers), but your customers might be in Australia and you want to have faster response for them. CloudFront is like a CDN; it basically builds a cache of common stuff with a Web Server in the region that’s local to those users, so they get really fast response times, and then it manages the cache. When users ask for something that isn’t cached, it goes back to your main server, and it also checks to make sure that what is cached is up to date. Plus, it provides additional functions: if somebody wants to upload a file, you use CloudFront so their upload is local and their Internet access is fast, and the rest of the upload goes over the Amazon Network, which is much faster. Uploading from Australia to Australia and then going over the backbone for the rest of it is a hell of a lot quicker than uploading something from Australia straight to Virginia.
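The caching behaviour described here (serve locally when the copy is fresh, otherwise go back to the main server) can be sketched as a small time-to-live cache. This is a toy model in plain Python, not how CloudFront is implemented; the class name, the `ttl_seconds` parameter and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge cache: serve locally when fresh, otherwise ask the origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # path -> (content, stored_at)

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # cache hit: no trip to the origin
        content = self._fetch(path)  # miss or stale: go back to the origin
        self._store[path] = (content, now)
        return content
```

Passing the clock in as a parameter keeps the expiry logic easy to exercise without waiting for real time to pass.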

~ ORCHESTRATION ~
Orchestration [see Slide #7], or how we get these different elements coordinated.

Lambda Function is one function but how do you get multiple Lambda Functions to operate together as a service?

That’s where these Orchestration things come in.

~ SNS (SIMPLE NOTIFICATION SERVICE) ~
SNS is a notification service. The target of the notification service can be a lot of things, like Simple Queue Service. What we typically use it for is email notification. For example, I’ll go to CloudWatch … I’ll set up an alert based on some criteria, like a server running out of memory, and then it will send a message to an email list so they’ll get notification to go check on the server. That can also be mobile or SMS.

~ SQS (SIMPLE QUEUE SERVICE) ~
SQS is really the one that comes in handy for something like Lambda. It’s a queueing service; a FIFO. You put a message in and tell it what the destination of that message is. At the other end, that service, if it is a Lambda Function, gets notification that there’s a new message and spins up a Lambda Function. The Lambda Function pulls the message off and takes whatever action is in that message, which could be to send it on down another queue to another service.

So, in this manner, you paste together a bunch of things. And, because the end point of the SNS or SQS doesn’t have to be a Lambda Function … it could be a database … it could be a lot of different services that Amazon has, you end up pasting together a bunch of stuff to make an application.
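The "paste things together with queues" pattern can be imitated locally with two in-memory FIFO queues. This sketch uses `collections.deque` in place of SQS and ordinary functions in place of Lambda Functions; the queue names, the message fields and the two steps are hypothetical.

```python
from collections import deque

queue_in: deque = deque()   # stands in for the first queue
queue_out: deque = deque()  # stands in for the second queue


def resize_step(message: dict) -> None:
    """Hypothetical first function: do the work, then pass a message on."""
    queue_out.append({"image": message["image"], "status": "resized"})


def notify_step(message: dict) -> str:
    """Hypothetical second function: turn the message into a notification."""
    return f"{message['image']} is {message['status']}"


def drain(queue: deque, handler) -> list:
    """Invoke the handler once per message, oldest message first."""
    results = []
    while queue:
        results.append(handler(queue.popleft()))
    return results


if __name__ == "__main__":
    queue_in.append({"image": "a.png"})
    drain(queue_in, resize_step)
    print(drain(queue_out, notify_step))
```

Neither step calls the other directly; each only knows about the queue it reads from or writes to, which is what makes the pieces swappable.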

~ DEVELOPMENT SUPPORT ~
Then there are other things for supporting the development of all this stuff.

~ CLOUD9 ~
Cloud9 is an IDE that looks a lot like Atom, if you’ve used that, or Microsoft VS Code; it’s basically the same kind of editor, only it’s got some hooks into things like Lambda and other things to make it easier to develop and push AWS-specific stuff.

X-Ray for debugging.

Then CI/CD stuff.

~ CodeBuild ~
CodeBuild; it basically uses Docker Images to do a compile and, because it uses a Docker Image, if you’ve got an odd thing you want to compile, or some different requirements, build your own Docker Image, register it with your Code Build pipeline and let it build some stuff for you.

~ CodePipeline ~
There’s CodePipeline that ties these things together. So you have a custom Docker Image for doing a compile, and you might have another Docker Image for doing the test. CodePipeline simply links them together.

For example: Do this. If that succeeds, do this. In that whole process, you set up the CI/CD Pipeline and, because you have CodePipeline, you can have it triggered by something like a commit to Amazon’s Git, or it can actually monitor GitHub for you. So, do a commit, kick off all of these things, and the final output of the process might be, on success, a ship to the test environment.
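The "do this; if that succeeds, do this" chaining can be sketched as a list of stages that runs in order and stops at the first failure. This is plain Python and not the CodePipeline API; the stage names and the boolean-returning actions are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    log = []
    for name, action in stages:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages never run after a failure
    return log


if __name__ == "__main__":
    stages = [
        ("build", lambda: True),
        ("test", lambda: True),
        ("ship to test environment", lambda: True),
    ]
    for line in run_pipeline(stages):
        print(line)
```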

Q: Are there any challenges with revisions when you’re working with so many different components everywhere? Like upgrades affecting your process because the receiving service isn’t up to the current rev? Like two different versions of Docker or something?
G: A lot of this stuff, like Docker Registry can be used for taking other actions. Anyhow, it’s how you set it up.
C: OK.

G: Generally, what we do is: if there’s a commit, you want to do a build on the whole chain; if successful, we push to a test environment. And in fact, because a test environment is simply something you’ve defined in code, you can dynamically build the test environment when the event happens.

The big revolution of AWS really is Infrastructure as Code (IaC). Your entire environment can be described in code, and AWS publishes SDKs in multiple languages. Just like when you go to write a Lambda Function, it can be Java, it can be Python, it can be Node, it can be .NET, it can be Ruby; a wide variety of choices. It’s the same with Infrastructure as Code. You can write code that calls the AWS APIs to define your network and security parameters, services, etc., and it’s simply software that defines all of this. It’s an interesting shift to wrap your head around; it really is a game-changer.

~ CodeStar ~
Q: How’s CodeStar compared to the IDE?
G: Let’s start with Step Functions. You’ve got this definition. I’ve got a function here. I’ve defined a Lambda Function. I’ve got a database. I’ve got some sort of logic that chains them together which I’ve defined with an SQS. Something happens. What you end up with is a map of stuff that’s going to happen: this function … then this function … this message goes here when this happens … and ends up there. Because that’s a common thing to do, Amazon said they should define a way to define that; that’s what a Step Function is.

You use YAML (YAML Ain’t Markup Language; a human-readable data-serialization language), basically, to define the architecture of what you’re doing and then the YAML is executed and Amazon builds the process. So all you do is write the YAML and Amazon builds it. When I want to set up a test environment when an event happens, I define my test environment using YAML and then, when I want to test something, here’s the YAML, build the test environment, execute this [series of instructions] and shut it down.

CodeStar is related to that sort of thing. When you’re setting up traditional services like EC2 Instances, CloudFront and all of that, instead of having to go through and do all of that yourself, they set up a system for helping you through it. And that’s what CodeStar is, but it’s a serverless environment.

~ STEP FUNCTIONS EXAMPLE ~ [Reference Slide #10.]

On the right is the YAML, on the left is a flowchart of what this Step Function will do. It’s not really showing it but in this process, any of those steps can be a Lambda Function, a Docker Image, a traditional EC2 Image, a database tied together with SQS, whatever happens to work. You describe it all using YAML, including some of the logic. For example, when you see this, it goes to that Lambda Function and determines success or failure, etc.
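The idea of describing a workflow as data and letting something else execute it can be sketched with a tiny interpreter. The `StartAt`, `States`, `Type`, `Next` and `End` field names loosely follow the layout of Step Functions definitions, but this is a simplified imitation and not a valid definition: `Resource` names a local Python function rather than an AWS resource, and the `Variable`/`IfTrue`/`IfFalse` form of the choice step is invented for brevity. The workflow itself is hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical workflow, written as data rather than as code.
DEFINITION = {
    "StartAt": "Validate",
    "States": {
        "Validate": {"Type": "Task", "Resource": "validate", "Next": "IsValid"},
        "IsValid": {
            "Type": "Choice",
            "Variable": "valid",
            "IfTrue": "Process",
            "IfFalse": "Reject",
        },
        "Process": {"Type": "Task", "Resource": "process", "End": True},
        "Reject": {"Type": "Fail"},
    },
}

# Local stand-ins for the functions each Task step would invoke.
TASKS: Dict[str, Callable[[dict], dict]] = {
    "validate": lambda data: {**data, "valid": data.get("size", 0) > 0},
    "process": lambda data: {**data, "processed": True},
}


def run(definition: dict, tasks: Dict[str, Callable[[dict], dict]], data: dict) -> Tuple[str, dict]:
    """Walk the states from StartAt until a step ends or fails."""
    name = definition["StartAt"]
    while True:
        state = definition["States"][name]
        kind = state["Type"]
        if kind == "Task":
            data = tasks[state["Resource"]](data)
            if state.get("End"):
                return "SUCCEEDED", data
            name = state["Next"]
        elif kind == "Choice":
            name = state["IfTrue"] if data.get(state["Variable"]) else state["IfFalse"]
        elif kind == "Fail":
            return "FAILED", data
        else:
            raise ValueError(f"unknown state type: {kind}")


if __name__ == "__main__":
    print(run(DEFINITION, TASKS, {"size": 3}))
    print(run(DEFINITION, TASKS, {"size": 0}))
```

Because the flow lives in `DEFINITION`, changing the order of steps or adding a branch means editing data, not the interpreter.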

One of the things I’ve worked with a lot that’s based on this sort of thing is called P-Cluster; AWS ParallelCluster. It uses Step Functions, SQS, DynamoDB, all of that stuff, to set up a large compute environment; a huge cluster of any size you define. On the back end is some Python Code. I create a config file of about 5 lines that specifies what I want: the type of servers I want as compute nodes or the head node, and how much storage I want on them, local and shared. Then I simply run P-Cluster and it goes out and builds the cluster. Tens, hundreds, thousands of compute nodes, if you want.

It sets it all up … and, again, that’s all going back to this concept of Infrastructure as Code. Out of a config file, I can build a massive compute cluster and do huge oil and gas research, genomic research, whatever takes that much compute power.

And simply because of the way it works, it’s also dynamic. One of the things about Amazon is: you only get charged when a node is running. If I set up a cluster with potentially 1,000 compute nodes, the only node that’s going to be running all the time is the head node, where you submit a job. Once you submit a job, Amazon automatically spins up the compute nodes, which takes a couple of minutes because they’re all coming out of pre-defined images, registers them with the head node, and starts the job. The job finishes, and the compute nodes shut down. I only end up paying for the compute time I use, instead of having idle computers sitting around for the duration, waiting for a job.
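A rough back-of-the-envelope comparison shows why paying only while nodes run matters. Every number in this sketch is hypothetical (the node count, the hourly price, the busy hours and the 730-hour month), and the always-running head node is left out for simplicity.

```python
def monthly_compute_cost(
    nodes: int,
    price_per_node_hour: float,
    busy_hours: float,
    hours_in_month: float = 730,
) -> tuple:
    """Return (always_on_cost, on_demand_cost) for the compute nodes only."""
    hourly_rate = nodes * price_per_node_hour
    always_on = hourly_rate * hours_in_month  # nodes idle between jobs
    on_demand = hourly_rate * busy_hours      # nodes exist only during jobs
    return always_on, on_demand


if __name__ == "__main__":
    always_on, on_demand = monthly_compute_cost(
        nodes=1000, price_per_node_hour=0.10, busy_hours=40
    )
    print(f"always on: ${always_on:,.0f}   on demand: ${on_demand:,.0f}")
```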

Done.

Q: What’s in it for them (Amazon)?
G: This is their biggest money maker. They’re up-front about it. They have calculators that help users determine their [projected] costs. If you do a little work, you know exactly what kind of expense you’re getting into; there should be no surprises. And, in fact, their billing dashboard is really detailed and shows you exactly what you’re paying for and when the cost was incurred.

Q: I’m sort of thinking about the fact that Amazon wants your data.
G: That’s part of it.

C: OK.
G: You get sucked-in to the Amazon Environment.

C: I understand that, too. But, as far as how you set up security, is that on the user?
G: Yes and no. Amazon is actually, by default, a very secure environment. There’s a whole system they call IAM (Identity and Access Management), which is a set of tools for security management. The defaults, if you keep them, are going to be secure, which also means inaccessible. You have to go out of your way to declare that you want to have data publicly accessible, or that you want server A to be able to talk to server B. In fact, they’ve been through a lot of certifications. The official internal Amazon motto is: Security is Job Zero. Security is the thing they take most seriously.

Q: I’ve seen stories in the news that the CIA and/or the FBI or the government in general, is going to go with Microsoft instead of Amazon.
C: That’s for a DOD Contract.

C: One specific contract … worth $10 billion over the next decade.

Q: Isn’t that the start of something? Isn’t that going to set a precedent? Or, not necessarily?

Q: Who has the advantage in Cloud Computing?
G: Amazon, by a huge margin. They are way ahead of everybody.

Q: So, you’d make a bet on Amazon?
G: Yes.

C: I was shocked to see Microsoft in the mix.
G: Especially if you look at the pricing structure. Amazon is probably ten times cheaper than Microsoft for a compute node. There’s a ridiculous price difference. Plus, Microsoft is very limited just in operating system selection and other stuff.

C: AWS started off as an experiment at Amazon because they needed the infrastructure for running their store so they experimented with selling, basically, access to their computers, and it’s now a huge amount of their revenue.
G: It’s their biggest single source of revenue. They give some stuff away for free but they’re not losing money.

C: No, absolutely not.

C: But Netflix runs on Amazon … it makes it easy for a startup company to grow.

C: That’s what’s so impressive, to me.

C: Yeah. Because you don’t have to invest in hardware; you can simply rent as much as you need.

C: And, not only that, you don’t have to learn about hardware.
G: Actually, Cloud9 is not part of the free tier. CodeBuild and CodePipeline are not part of the free tier; those things do end up costing you, and they can be pretty expensive. And, storage is probably your biggest expense. Storage and Database.

C: That makes sense … that’s fair.
G: They have this thing … I refer to it as free tier that’s part of getting people drawn-in. When you create a new Amazon account, some things are going to be free for a year; basic EC2 Instance, free storage, the basic stuff. They stop being free after a year. It’s great for a startup … they’ve got a year to make this thing productive and produce revenue and, after that year, Amazon is going to start charging you for it.

C: But you don’t even want to log on or start an account until you know what you’re going to do.
G: OK. Well, you create a new account, start up, shut down, create another account. But, as an example, at the old company I had, we used to do a hosted service on a shared server for our website; it cost us about $200 a month for a crappy server, which was shared. We moved that to Amazon, on a basic system, and the response is hugely better. The website acts fantastically now, and it costs us $20 a month. We’re out of the free tier.

Q: Are you doing anything with serverless?
G: Yes. Actually, simply basic stuff. The last thing I did was one simple Lambda Function that monitors GitHub, because I want to be notified when certain genomic projects get a new release. It turns out there’s a Python SDK; actually, GitHub has an API, a RESTful API for querying that, so you just plop it into Lambda with a function to check it periodically. What triggers a Lambda Function? It could be an event arriving via SQS, it could be an event arriving from the API Gateway, or it could be another Lambda Function. If you go to CloudWatch, it has a list of possible things that can trigger a Lambda Function, and one of them is basically a cron job. I basically set a Lambda Function to check this list of projects daily to see if they’ve been updated on GitHub since last I checked, and then use SMS to send me a notification. When you get down to it, it’s a couple of lines of Python, a couple of libraries, and Amazon triggering it off of a cron job.
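A function along these lines might look like the following sketch. It is not the speaker's code. It assumes GitHub's REST endpoint for the latest release of a repository (`/repos/{owner}/{repo}/releases/latest`, whose JSON response includes a `tag_name` field); the watched repository, the stored tag and the function names are hypothetical, and the notification step is left out.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def fetch_latest_tag(repo: str) -> str:
    """Ask GitHub's REST API for the latest release tag of 'owner/name'."""
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["tag_name"]


def find_updates(
    last_seen: Dict[str, str],
    fetch: Callable[[str], str] = fetch_latest_tag,
) -> List[str]:
    """Return one message per repo whose latest tag differs from last_seen."""
    messages = []
    for repo, old_tag in last_seen.items():
        new_tag = fetch(repo)
        if new_tag != old_tag:
            messages.append(f"{repo}: {old_tag} -> {new_tag}")
    return messages


def lambda_handler(event, context):
    """Entry point a scheduled (cron-style) trigger would invoke."""
    # Hypothetical watch list; a real function would keep the last-seen
    # tags somewhere persistent and send the messages as notifications.
    watched = {"example-org/example-tool": "v1.0.0"}
    return {"updates": find_updates(watched)}
```

Passing `fetch` in as a parameter means the comparison logic can be tried out locally with a stand-in, without touching the network.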

The other, more interesting thing we’re doing is called GRASP; it’s a genomic research project. There’s a huge repository of genomic data, and this is all about pathogens like salmonella, that the government maintains at NCBI (National Center for Biotechnology Information). But for other purposes, we like to grab their genomic data and then further process it so we can do things like mapping outbreaks over time or other kinds of data mining, plus you have the metadata associated with that. We use Step Functions, Lambda, Docker Images, and Batch … plus they’ve got a graph database called Neptune. You’re looking at genomic data and trying to map relationships between genomes, and a graph database works perfectly for that. This is a new project we’ve started. That’s serverless with Docker.

Q: How do you find out about salmonella outbreaks from genomic data?
G: I support the people who do that. If you see news of an outbreak somewhere, the FDA has gotten a biological sample of the pathogen involved and, among other things, is sequencing its genome as part of their research efforts; it shows up in the lab at CFSAN (FDA’s Center for Food Safety and Applied Nutrition) and they sequence it and do a bunch of stuff with it, which I couldn’t tell you because I don’t know, and at some point they submit their results to NCBI, which maintains this huge database of that stuff.

Q: Why do you trust the security of all of these pieces you assemble into images?
G: Some of it is working with Amazon enough to see how their security actually works and most of it is from the frustration of working with it and having to troubleshoot issues and discover why a particular thing does not work and discovering that it’s Amazon Security getting in the way … I have to go into IAM and fix my security settings. When you actually use it and find that you have to go out of your way to make things connect the way that you want to, you realize that there’s some serious security here. The other thing is: they make a bunch of security tools available, for free. You want to do Port Scanning? Security Compliance Scanning? Known Vulnerability Scanning? The tools are there; use them.

Q: Have there been any instances of someone actually breaking Amazon Security, because I think that would make headlines.
G: I’m sure there are…

C: Or, maybe they quash it.
G: I can’t think of any, but I’m sure there are.

C: The ones I’ve heard of haven’t been Amazon’s fault; including passwords being compromised. The Capital One breach.

C: Years ago, there were some vulnerabilities in the M Stack…

C: But, basically, what you’re seeing is a long history of rock-solid security.

C: Yeah.

G: For the most part.

C: There are definitely ways it could be exploited … Is it the hardware, Spectre, the machine-level stuff? The Intel chip? ARM? The answer to that was basically to disable certain instructions in the CPUs, which makes them run a lot slower … like six times slower. There are probably ways around that but, with Amazon, it would be extremely random … and depend upon which machine you got on, if you’re trying to exploit something.

C: You’d have to simply get lucky to get on the intended target machine; you couldn’t target somebody else’s machine. It would be much easier to social-engineer something.
G: That’s always been the case. Amazon is the easiest hack.

C: If you’re looking for data from Capital One, for example.
G: Then you attack the application. Did they protect against SQL Injection Attacks?

C: Right.

G: There’s a list of things that are considered machine learning services (referring to slides). Multiple languages, speech recognition, Transcribe, text to speech, Translate between languages, photo, more AI.
C: That’s impressive.

Q: Is that what SageMaker is with the neural networks?
G: It’s one of the possible models for SageMaker.

G: You can probably do things like Monte Carlo and other types of analysis as well. One of them, in fact, associated with that as well is managed and non-managed curated data sets. But if you don’t, they have a tool for helping you create a curated data set vs. a non-curated one.

Q: Is that equivalent to a Training Set?
G: Both are Training Sets; the curated ones work better because you actually have this type of this so it’s easier to draw this type of inference once you start training it, as opposed to just throwing a bunch of data at it.

C: Good stuff.

Rob, Jon, Joe, Dave, Gunnar. Copyright © 2019, FPP, LLC. All rights reserved.

Slides