On this episode, we are joined by special co-host Hugh Evans and returning guest Will Xu as we announce Druid Summit 2024 and dive into Druid 30.0's new features and enhancements. Improvements include better ingestion for Amazon Kinesis and Apache Kafka, enhanced support for Delta Lake, and advanced integrations with Google Cloud Storage and Azure Blob Storage. Come for the technical upgrades like GROUP BY and ORDER BY for complex columns and faster query processing with new IN and AND filters, stay for the stabilized concurrent append and replace API for late-arriving streaming data. We also explore experimental features like the centralized data source schema for better performance. Tune in to learn about the latest on arrays, the upcoming GA for window functions, and the benefits of upgrading Druid! To submit a talk or register for Druid Summit 2024, visit https://druidsummit.org/
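To picture the SQL side of those improvements, here is a minimal sketch of running a GROUP BY / ORDER BY query against Druid's SQL HTTP endpoint. The host, port, and the sample wikipedia datasource are assumptions about a local quickstart cluster, not details from the episode.

```python
import requests  # assumes a Druid router at localhost:8888 (quickstart default)

query = """
SELECT channel, COUNT(*) AS edits
FROM wikipedia
GROUP BY channel
ORDER BY edits DESC
LIMIT 5
"""

# Druid's SQL API accepts a JSON body with the query and returns rows as JSON objects.
resp = requests.post("http://localhost:8888/druid/v2/sql", json={"query": query})
resp.raise_for_status()
for row in resp.json():
    print(row)
```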
Apache Flink is an open-source framework and distributed processing engine designed for data analytics. It excels at handling tasks such as data joins, aggregations, and ETL (Extract, Transform, Load) operations. Moreover, it supports advanced real-time techniques like complex event processing.
In this episode, Deepthi Mohan and Nagesh Honnalii from AWS discussed Apache Flink and the Amazon Managed Service for Apache Flink (MSF) with our host, Alex Williams. MSF is a service that caters to customers with varying infrastructure preferences. Some prefer complete control, while others want AWS to handle all infrastructure-related aspects.
Use cases for MSF can be grouped into three categories. First, there's streaming ETL, which involves tasks like log aggregation for later auditing. Second, it supports real-time analytics, enabling customers to create dashboards for tasks like fraud detection. Third, it handles complex event processing, where data from multiple sources is joined and aggregated to extract meaningful insights.
The origins of MSF trace back to the evolution of real-time data services within AWS. In 2013, AWS introduced Amazon Kinesis, while the open-source community developed Apache Kafka. These services paved the way for MSF by highlighting the need for real-time data processing. To provide more flexibility, AWS launched Kinesis Data Analytics in 2016, allowing customers to write code in JVM-based languages like Java and Scala. In 2018, AWS decided to incorporate Apache Flink into its Kinesis Data Analytics offering, leading to the birth of MSF.
Today, thousands of customers use MSF, and AWS continues to enhance its offerings in the real-time data processing space, including the launch of Amazon MSK (Managed Streaming for Apache Kafka). To align with its foundation on Flink, AWS rebranded Kinesis Data Analytics for Apache Flink to Amazon Managed Service for Apache Flink, making it clearer for customers.
Learn more from The New Stack about AWS and Apache Flink:
Apache Flink for Real Time Data Analysis
Apache Flink for Unbounded Data Streams
3 Reasons Why You Need Apache Flink for Stream Processing
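All three use-case buckets come down to continuously transforming and aggregating streams. As a rough illustration (not from the episode), here is a minimal PyFlink sketch of a keyed streaming aggregation; a real Managed Service for Apache Flink application would read from Kinesis or Kafka rather than an in-memory collection, and the job name and sample records are made up.

```python
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Toy stream of (user, amount) events standing in for a Kinesis/Kafka source.
events = env.from_collection(
    [("user-1", 12.5), ("user-2", 3.0), ("user-1", 7.25)],
    type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE()]),
)

# Running total per user: key by user id, then reduce by summing amounts.
events.key_by(lambda e: e[0]) \
      .reduce(lambda a, b: (a[0], a[1] + b[1])) \
      .print()

env.execute("toy_streaming_etl")
```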
In this episode we discuss the Amazon Kinesis service and its different components: Amazon Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. Additional material: https://aws.amazon.com/kinesis/
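To make the Data Streams component concrete, here is a minimal boto3 sketch that writes one record to a stream and reads it back. The stream name and region are placeholders, and AWS credentials are assumed to be configured.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Write one record to a (hypothetical) Kinesis Data Stream.
kinesis.put_record(
    StreamName="clickstream-demo",
    Data=json.dumps({"user_id": "42", "action": "page_view"}).encode("utf-8"),
    PartitionKey="42",
)

# Read it back: grab the first shard, get an iterator, and poll once.
shard_id = kinesis.describe_stream(StreamName="clickstream-demo")[
    "StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="clickstream-demo",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
print(kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"])
```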
In this episode we interview Kishore Gopalakrishna, co-founder and CEO of StarTree. Luan Moreno and Mateus Oliveira chat with the co-creator of the powerful tool called Apache Pinot. Pinot is an OLAP datastore built to answer analytical queries with response times in the milliseconds, and it can be considered a database for real-time queries. It can ingest from batch data sources (Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as streaming data sources (Apache Kafka, Apache Pulsar, Amazon Kinesis). Pinot was designed to run real-time OLAP queries with low latency over large volumes of events, delivering the concept of user-facing analytics. It was created and developed by engineers at LinkedIn and Uber and designed to scale and expand without limits.
Apache Pinot | Kishore Gopalakrishna | StarTree | Luan Moreno = https://www.linkedin.com/in/luanmoreno/
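For a taste of the kind of low-latency query described above, here is a minimal sketch using the pinotdb Python client against a local Pinot quickstart. The broker port and the airlineStats sample table are assumptions about a default quickstart setup; adjust them for a real cluster.

```python
from pinotdb import connect  # pip install pinotdb

# Assumes a local Pinot quickstart with the broker on port 8099
# and the sample airlineStats table loaded.
conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
cur = conn.cursor()
cur.execute(
    "SELECT Origin, COUNT(*) AS flights "
    "FROM airlineStats GROUP BY Origin ORDER BY flights DESC LIMIT 5"
)
for row in cur:
    print(row)
```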
About Sam
Sam Nicholls: Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs and webpages.
Links Referenced: Veeam AWS Backup: https://www.veeam.com/aws-backup.html Veeam: https://veeam.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying and visualization? Modern day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere. Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by and sponsored by our friends over at Veeam. And as a part of that, they have thrown one of their own to the proverbial lion. My guest today is Sam Nicholls, Director of Public Cloud over at Veeam. Sam, thank you for joining me.Sam: Hey. Thanks for having me, Corey, and thanks for everyone joining and listening in. I do know that I've been thrown into the lion's den, and I am [laugh] hopefully well-prepared to answer anything and everything that Corey throws my way. Fingers crossed. [laugh].Corey: I don't think there's too much room for criticizing here, to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups, which no one ever cares about. Restores, on the other hand, people care very much about restores. And that's when they learn, “Oh, I really should have cared about backups at any point prior to 20 minutes ago.”Sam: Yeah, it's a great point. It's kind of like taxes and insurance. 
It's almost like, you know, something that you have to do that you don't necessarily want to do, but when push comes to shove, and something's burning down, a file has been deleted, someone's made their way into your account and, you know, running a right mess within there, that's when you really, kind of, care about what you mentioned, which is the recovery piece, the speed of recovery, the reliability of recovery.Corey: It's been over a decade, and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get it back. I ran my own mail server; it was an iPhone setting that said, “Oh, yeah, automatically delete everything in your trash folder—or archive folder—after 30 days.” It was just a weird default setting back in that era. I didn't realize it was doing that. Yeah, painful stuff.And we learned the hard way in some of these cases. Not that I really have much need for email from that era of my life, but every once in a while it still bugs me. Which speaks to the point that the people who are the most fanatical about backing things up are the people who have been burned by not having a backup. And I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really cemented that lesson for me.Sam: Yeah, yeah. It's a good point. I can remember a few years ago, my wife migrated a very aging, polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was good—Corey: As the white polycarbonate Mac becomes yellow, then yeah, all right, you know, it's time to replace it. Yeah. So yeah, so she wiped the drive, and what happened?Sam: That was her moment where she learned the value and importance of backup, and so she backs everything up now. I fortunately have never gone through it. But I'm employed by a backup vendor and that's why I care about it. But it's incredibly important to have, of course.Corey: Oh, yes. My spouse has many wonderful qualities, but one that drives me slightly nuts is she's something of a digital packrat where her hard drives on her laptop will periodically fill up. And I used to take the approach of oh, you can be more efficient and do the rest. And I realized no, telling other people they're doing it wrong is generally poor practice, whereas just buying bigger drives is way easier. Let's go ahead and do that. It's a small price to pay for domestic tranquility.And there's a lesson in that. We can map that almost perfectly to the corporate world where you folks tend to operate in. You're not doing home backup, last time I checked; you are doing public cloud backup. Actually, I should ask that. Where do you folks start and where do you stop?Sam: Yeah, no, it's a great question. You know, we started over 15 years ago when virtualization, specifically VMware vSphere, was really the up-and-coming thing, and, you know, a lot of folks were there trying to utilize agents to protect their vSphere instances, just like they were doing with physical Windows and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No. And that's kind of why Veeam was pioneered; it was this agentless backup, image-based backup for vSphere.And, of course, you know, in the last 15 years, we've seen lots of transitions, of course, we're here at Screaming in the Cloud, with you, Corey, so AWS, as well as a number of other public cloud vendors we can help protect, as well as a number of SaaS applications like Microsoft 365, metadata and data within Salesforce. 
So, Veeam's really kind of come a long way from just virtual machines to really taking a global look at the entirety of modern environments, and how can we best protect each and every single one of those without trying to take a square peg and fit it in a round hole?Corey: It's a good question and a common one. We wind up with an awful lot of folks who are confused by the proliferation of data. And I'm one of them, let's be very clear here. It comes down to a problem where backups are a multifaceted, deep problem, and I don't think that people necessarily think of it that way. But I take a look at all of the different, even AWS services that I use for my various nonsense, and which ones can be used to store data?Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data. And in various contexts, a lot of that data becomes very important. So, what service am I using, in which account am I using, and in what region am I using it, and you wind up with data sprawl, where it's a tremendous amount of data that you can generally only track down by looking at your bills at the end of the month. Okay, so what am I being charged, and for what service?That seems like a good place to start, but where is it getting backed up? How do you think about that? So, some people, I think, tend to ignore the problem, which we're seeing less and less, but other folks tend to go to the opposite extreme and we're just going to back up absolutely everything, and we're going to keep that data for the rest of our natural lives. It feels to me that there's probably an answer that is more appropriate somewhere nestled between those two extremes.Sam: Yeah, snapshot sprawl is a real thing, and it gets very, very expensive very, very quickly. You know, your snapshots of EC2 instances are stored on those attached EBS volumes. Five cents per gig per month doesn't sound like a lot, but when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly. And you don't know when to delete them. Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock.So, you know, where to start is automating the lifecycle of a snapshot, right, from its creation—how often do we want to be creating them—from the retention—how long do we want to keep these for—and where do we want to keep them because there are other storage services outside of just EBS volumes. And then, of course, the ultimate: deletion. And that's important even from a compliance perspective as well, right? You've got to retain data for a specific number of years, I think healthcare is like seven years, but then you've—Corey: And then not a day more.Sam: Yeah, and then not a day more because that puts you out of compliance, too. So, policy-based automation is your friend and we see a number of folks building these policies out: gold, silver, bronze tiers based on criticality of data and compliance, and really just kind of letting the machine do the rest. And you can focus on not babysitting backup.Corey: What was it that led to the rise of snapshots? Because back in my very early days, there was no such thing. We wound up using a bunch of servers stuffed in a rack somewhere and virtualization was not really in play, so we had file systems on physical disks. And how do you back that up? 
Well, you have an agent of some sort that basically looks at all the files and according to some ruleset that it has, it copies them off somewhere else.It was slow, it was fraught, it had a whole bunch of logic that was pushed out to the very edge, and forget about restoring that data in a timely fashion or even validating a lot of those backups worked other than via checksum. And God help you if you had data that was constantly in the state of flux, where anything changing during the backup run would leave your backups in an inconsistent state. That on some level seems to have largely been solved by snapshots. But what's your take on it? You're a lot closer to this part of the world than I am.Sam: Yeah, snapshots, I think folks have turned to snapshots for the speed, the lack of impact that they have on production performance, and again, just the ease of accessibility. We have access to all different kinds of snapshots for EC2, RDS, EFS throughout the entirety of our AWS environment. So, I think the snapshots are kind of like the default go-to for folks. They can help deliver those very, very quick RPOs, especially in, for example, databases, like you were saying, that change very, very quickly and we all of a sudden are stranded with a crash-consistent backup or snapshot versus an application-consistent snapshot. And then they're also very, very quick to recover from.So, snapshots are very, very appealing, but they absolutely do have their limitations. And I think, you know, it's not a one or the other; it's that they've got to go hand-in-hand with something else. And typically, that is an image-based backup that is stored in a separate location to the snapshot because that snapshot is not independent of the disk that it is protecting.Corey: One of the challenges with snapshots is most of them are created in a copy-on-write sense. It takes basically an instant frozen point in time back—once upon a time when we ran MySQL databases on top of the NetApp Filer—which works surprisingly well—we would have a script that would automatically quiesce the database so that it would be in a consistent state, snapshot the file and then un-quiesce it, which took less than a second, start to finish. And that was awesome, but then you had this snapshot type of thing. It wasn't super portable, it needed to reference a previous snapshot in some cases, and AWS takes the same approach where the first snapshot it captures every block, then subsequent snapshots wind up only taking up as much size as there have been changes since the first snapshots. So, large quantities of data that generally don't get access to a whole lot have remarkably small, subsequent snapshot sizes.But that's not at all obvious from the outside, and looking at these things. They're not the most portable thing in the world. But it's definitely the direction that the industry has trended in. So, rather than having a cron job fire off an AWS API call to take snapshots of my volumes as a sort of the baseline approach that we all started with, what is the value proposition that you folks bring? And please don't say it's, “Well, cron jobs are hard and we have a friendlier interface for that.”Sam: [laugh]. 
I think it's really starting to look at the proliferation of those snapshots, understanding what they're good at, and what they are good for within your environment—as previously mentioned, low RPOs, low RTOs, how quickly can I take a backup, how frequently can I take a backup, and more importantly, how quickly can I restore—but then looking at their limitations. So, I mentioned that they were not independent of that disk, so that certainly does introduce a single point of failure as well as being not so secure. We've kind of touched on the cost component of that as well. So, what Veeam can come in and do is then take an image-based backup of those snapshots, right—so you've got your initial snapshot and then your incremental ones—we'll take the backup from that snapshot, and then we'll start to store that elsewhere.And that is likely going to be in a different account. We can look at the Well-Architected Framework, AWS deeming accounts as a security boundary, so having that cross-account function is critically important so you don't have that single point of failure. Locking down with IAM roles is also incredibly important so we haven't just got a big wide open door between the two. But that data is then stored in a separate account—potentially in a separate region, maybe in the same region—Amazon S3 storage. And S3 has the wonderful benefit of being still relatively performant, so we can have quick recoveries, but it is much, much cheaper. You're dealing with 2.3 cents per gig per month, instead of—Corey: To start, and it goes down from there with sizeable volumes.Sam: Absolutely, yeah. You can go down to S3 Glacier, where you're looking at, I forget how many points and zeros and nines it is, but it's fractions of a cent per gig per month, but it's going to take you a couple of days to recover that da—Corey: Even infrequent access cuts that in half.Sam: Oh yeah.Corey: And let's be clear, these are snapshot backups; you probably should not be accessing them on a consistent, sustained basis.Sam: Well, exactly. And this is where it's kind of almost like having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are saying you must keep this data for this length of time. Keeping that—you know, let's just say it's three years' worth of snapshots in an EBS volume is going to be incredibly expensive. What's the likelihood of you needing to recover something from two years—actually, even two months ago? It's very, very small.So, the performance part of S3 is, you don't need to take it as much into consideration. Can you recover? Yes. Is it going to take a little bit longer? Absolutely. But it's going to help you meet those retention requirements while keeping your backup bill low, avoiding that bill shock, right, spending tens and tens of thousands every single month on snapshots. This is what I mean by kind of having your cake and eating it.Corey: I somewhat recently have had a client where EBS snapshots are one of the driving costs behind their bill. It is one of their largest single line items. And I want to be very clear here because if you're one of those people who listen to this and are thinking, “Well, hang on. Wait, they're telling stories about us, even though they're not naming us by name?” Yeah, there were three of you in the last quarter.So, at that point, it becomes clear it is not about something that one individual company has done and more about an overall driving trend. 
I am personalizing it a little bit by referring to them as one company when there were three of you. This is a narrative device, not me breaking confidentiality. Disclaimer over. Now, when you talk to people about, “So, tell me why you've got 80 times more snapshots than you do EBS volumes?” The answer is, “Well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly, and so on and so forth. And when this was set up, there wasn't a great way to do this natively and we don't always necessarily know what we need versus what we don't. And the cost of us backing this up, well, you can see it on the bill. The cost of us deleting too much and needing it as soon as we do? Well, that cost is almost incalculable. So, this is the safe way to go.” And they're not wrong in anything that they're saying. But the world has definitely evolved since then.Sam: Yeah, yeah. It's a really great point. Again, it just folds back into my whole having your cake and eating it conversation. Yes, you need to retain data; it gives you that kind of nice, warm, cozy feeling, it's a nice blanket on a winter's day knowing that, irrespective of what happens, you're going to have something to recover from. But the question is does that need to be living on an EBS volume as a snapshot? Why can't it be living on much, much more cost-effective storage that's going to give you the warm and fuzzies, but is going to make your finance team much, much happier [laugh].Corey: One of the inherent challenges I think people have is that snapshots by themselves are almost worthless, in that I have an EBS snapshot, it is sitting there now, it's costing me an undetermined amount of money because it's not exactly clear on a per snapshot basis exactly how large it is, and okay, great. Well, I'm looking for a file that was not modified since X date, as it was on this time. Well, great, you're going to have to take that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one. Great. Try it again, with a different one.And after, like, the fifth or sixth in a row, you start doing a binary search approach on this thing. But it's expensive, it's time-consuming, it takes forever, and it's not a fun user experience at all. Part of the problem is it seems that historically, backup systems have no context or no contextual awareness whatsoever around what is actually contained within that backup.Sam: Yeah, yeah. I mean, you kind of highlighted two of the steps. It's more like a ten-step process to do, you know, granular file or folder-level recovery from a snapshot, right? You've got to, like you say, you've got to determine the point in time when that, you know, you knew the last time that it was around, then you're going to have to determine the volume size, the region, the OS, you're going to have to create an EBS volume of the same size, region, from that snapshot, create the EC2 instance with the same OS, connect the two together, boot the EC2 instance, mount the volume, search for the files to restore, download them manually, at which point you have your file back. It's not back in the machine where it was, it's now been downloaded locally to whatever machine you're accessing that from. And then you got to tear it all down.And that is again, like you say, predicated on the fact that you knew exactly that that was the right time. It might not be, and then you have to start from scratch from a different point in time. 
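For a sense of just how manual that workflow is, here is a minimal boto3 sketch of only the middle steps Sam describes: creating a volume from a snapshot and attaching it to a helper instance. The IDs, region, and availability zone are hypothetical, and the mounting, searching, copying, and teardown still happen by hand.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical IDs; in practice you first have to work out WHICH snapshot
# actually holds the file you lost, which is the painful part.
snapshot_id = "snap-0123456789abcdef0"
recovery_instance_id = "i-0123456789abcdef0"  # helper instance in the same AZ

# 1. Create a volume from the snapshot, in the helper instance's AZ.
volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone="us-east-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# 2. Attach it to the helper instance; then, outside this script, mount it,
#    hunt for the file, and copy it out.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId=recovery_instance_id,
    Device="/dev/sdf",
)

# 3. ...and finally tear everything down again (detach, delete, repeat if it
#    turned out to be the wrong snapshot).
# ec2.detach_volume(VolumeId=volume["VolumeId"])
# ec2.delete_volume(VolumeId=volume["VolumeId"])
```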
So, backup tooling from backup vendors that have been doing this for many, many years, knew about this problem long, long ago, and really seek to not only automate the entirety of that process but make the whole e-discovery, the search, the location of those files, much, much easier. I don't necessarily want to do a vendor pitch, but I will say with Veeam, we have explorer-like functionality, whereby it's just a simple web browser. Once that machine is all spun up again, automatic process, you can just search for your individual file, folder, locate it, you can download it locally, you can inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis—I forget the right terminology for it; some of it's AWS, some of it's Amazon.But by-the-by, the whole recovery process, especially from a file or folder level, is much more pain-free, but also much faster. And that's ultimately what people care about: how reliable is my backup? How quickly can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time or money.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R - E - D - I - S dot com slash duckbill.Corey: Right, the idea of RPO versus RTO: recovery point objective and recovery time objective. With an RPO, it's great, disaster strikes right now, how long is it acceptable for it to have been since the last time we backed up data to a restorable point? Sometimes it's measured in minutes, sometimes it's measured in fractions of a second. It really depends on what we're talking about. Payments databases, that needs to be—the RPO basically asymptotically approaches zero.The RTO is: okay, how long is it acceptable before we have that data restored and are back up and running? And that is almost always a longer time, but not always. And there's a different series of trade-offs that go into that. But both of those also presuppose that you've already dealt with the existential question of is it possible for us to recover this data. And that's where I know that you are obviously—you have a position on this that is informed by where you work, but I don't, and I will call this out as what I see in the industry: AWS Backup is compelling to me except for one fatal flaw that it has, and that is it starts and stops with AWS.I am not a proponent of multi-cloud. Lord knows I've gotten flack for that position a bunch of times, but the one area where it makes absolute sense to me is backups. Have your data in a rehydrate-the-business level state backed up somewhere that is not your primary cloud provider because you're otherwise single point of failure-ing through a company, through the payment instrument you have on file with that company, in the blast radius of someone who can successfully impersonate you to that vendor. There has to be a gap of some sort for the truly business-critical data. Yes, egress to other providers is expensive, but you know what also is expensive? Irrevocably losing the data that powers your business. 
Is it likely? No, but I would much rather do it than have to justify why I'm not doing it.Sam: Yeah. Wasn't likely that I was going to win that 2 billion or 2.1 billion on the Powerball, but [laugh] I still play [laugh]. But I understand your standpoint on multi-cloud and I read your newsletters and understand where you're coming from, but I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations that are sole-sourced on a single cloud and nothing else is relatively small, single-digit percentage. It's around 80-some percent that are hybrid, and the remainder of them are your favorite: multi-cloud.But again, having something that is one hundred percent sole-source on a single platform or a single vendor does expose you to a certain degree of risk. So, having the ability to do cross-platform backups, recoveries, migrations, for whatever reason, right, because it might not just be a disaster like you'd mentioned, it might also just be… I don't know, the company has been taken over and all of a sudden, the preference is now towards another cloud provider and I want you to refactor and re-architect everything for this other cloud provider. If all that data is locked into one platform, that's going to make your job very, very difficult. So, we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads on different platforms, in different environments, on-premises, in multiple different clouds, but the other key piece is that we always use the same backup file format. And why that's key is because it enables portability.If I have backups of EC2 instances that are stored in S3, I could copy those onto on-premises disk, I could copy those into Azure, I could do the same with my Azure VMs and store those on S3, or again, on-premises disk, and any other endless combination that goes with that. And it's really kind of centered around, like control and ownership of your data. We are not prescriptive by any means. Like, you do what is best for your organization. We just want to provide you with the toolset that enables you to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.Corey: One of the big challenges that I keep seeing across the board is just a lack of awareness of what the data that matters is, where you see people backing up endless fleets of web server instances that are auto-scaled into existence and then removed, but you can create those things at will; why do you care about the actual data that's on these things? It winds up almost at the library management problem, on some level. And in that scenario, snapshots are almost certainly the wrong answer. One thing that I saw previously that really changed my way of thinking about this was back many years ago when I was working at a startup that had just started using GitHub and they were paying for a third-party service that wound up backing up Git repos. Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git, but at the time, it was silly. It was, why do that? Every Git clone is a full copy of the entire repository history. Just grab it off some developer's laptop somewhere.It's like, “Really? 
You want to bet the company, slash your job, slash everyone else's job on that being feasible and doable or do you want to spend the 39 bucks a month or whatever it was to wind up getting that out the door now so we don't have to think about it, and they validate that it works?” And that was really a shift in my way of thinking because, yeah, backing up things can get expensive when you have multiple copies of the data living in different places, but what's really expensive is not having a company anymore.Sam: Yeah, yeah, absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something that you know that you have to have, but you don't necessarily want to pay for it. Well, you know, just like with insurances, there's multiple different ways to go about recovering your data and it's only in crunch time, do you really care about what it is that you've been paying for, right, when it comes to backup?Could you get your backup through a git clone? Absolutely. Could you get your data back—how long is that going to take you? How painful is that going to be? What's going to be the impact to the business where you're trying to figure that out versus, like you say, the 39 bucks a month, a year, or whatever it might be to have something purpose-built for that, that is going to make the recovery process as quick and painless as possible and just get things back up online.Corey: I am not a big fan of the fear, uncertainty, and doubt approach, but I do practice what I preach here in that yeah, there is a real fear against data loss. It's not, “People are coming to get you, so you absolutely have to buy whatever it is I'm selling,” but it is something you absolutely have to think about. My core consulting proposition is that I optimize the AWS bill. And sometimes that means spending more. Okay, that one S3 bucket is extremely important to you and you say you can't sustain the loss of it ever, so One Zone is not an option. Where is it being backed up? Oh, it's not? Yeah, I suggest you spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the right thing.Sam: Yeah, yeah, it's interesting, and it's going to be hard for you to prove the value of doing that when you are driving their bill up while you're trying to bring it down. But again, you have to look at something that's not itemized on that bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the exact figures because it also varies depending on your business, your industry, the size, but the impact of downtime is massive financially. Tens of thousands of dollars for small organizations per hour, millions and millions of dollars per hour for much larger organizations. The backup component of that is relatively small in comparison, so having something that is purpose-built, and is going to protect your data and help mitigate that impact of downtime.Because that's ultimately what you're trying to protect against. It is the recovery piece that you're buying that is the most important piece. And like you, I would say, at least be cognizant of it and evaluate your options and what can you live with and what can you live without.Corey: That's the big burning question that I think a lot of people do not have a good answer to. And when you don't have an answer, you either back up everything or nothing. And I'm not a big fan of doing either of those things blindly.Sam: Yeah, absolutely. 
And I think this is why we see varying different backup options as well, you know? You're not going to try and apply the same data protection policies to each and every single workload within your environment because they've all got different types of workload criticality. And like you say, some of them might not even need to be backed up at all, just because they don't have data that needs to be protected. So, you need something that is going to be able to be flexible enough to apply across the entirety of your environment, protect it with the right policy, in terms of how frequently do you protect it, where do you store it, how often, or when are you eventually going to delete that and apply that on a workload-by-workload basis. And this is where the joy of things like tags come into play as well.Corey: One last thing I want to bring up is that I'm a big fan of watching for companies saying the quiet part out loud. And one area in which they do this—because they're forced to by brevity—is in the title tag of their website. I pull up veeam.com and I hover over the tab in my browser, and it says, “Veeam Software: Modern Data Protection.”And I want to call that out because you're not framing it as explicitly backup. So, the last topic I want to get into is the idea of security. Because I think it is not fully appreciated on a lived-experience basis—although people will of course agree to this when they're having ivory tower whiteboard discussions—that every place your data lives is a potential for a security breach to happen. So, you want to have your data living in a bunch of places ideally, for backup and resiliency purposes. But you also want it to be completely unworkable or illegible to anyone who is not authorized to have access to it.How do you balance those trade-offs yourself given that what you're fundamentally saying is, “Trust us with your Holy of Holies when it comes to things that power your entire business?” I mean, I can barely get some companies to agree to show me their AWS bill, let alone this is the data that contains all of this stuff to destroy our company.Sam: Yeah. Yeah, it's a great question. Before I explicitly answer that piece, I will just go to say that modern data protection does absolutely have a security component to it, and I think that backup absolutely needs to be a—I'm going to say this in air quotes—a “first class citizen” of any security strategy. I think when people think about security, their mind goes to the preventative, like how do we keep these bad people out?This is going to be a bit of the FUD that you love, but ultimately, the bad guys on the outside have an infinite number of attempts to get into your environment and only have to be right once to get in and start wreaking havoc. You on the other hand, as the good guy with your cape and whatnot, you have got to be right each and every single one of those times. And we as humans are fallible, right? None of us are perfect, and it's incredibly difficult to defend against these ever-evolving, more complex attacks. So backup, if someone does get in, having a clean, verifiable, recoverable backup, is really going to be the only thing that is going to save your organization, should that actually happen.And what's key to a secure backup? I would say separation, isolation of backup data from the production data, I would say utilizing things like immutability, so in AWS, we've got Amazon S3 Object Lock, so it's that write once, read many state for whatever retention period that you put on it. 
So, the data that they're seeking to encrypt, whether it's in production or in their backup, they cannot encrypt it. And then the other piece that I think is coming more and more into play, and is almost table stakes, is encryption, right? And we can utilize things like AWS KMS for that encryption.But that's there to help defend against the exfiltration attempts. Because these bad guys are realizing, “Hey, people aren't paying me my ransom because they're just recovering from a clean backup, so now I'm going to take that backup data, I'm going to leak the personally identifiable information, trade secrets, or whatever on the internet, and that's going to put them in breach of compliance and give them a hefty fine that way unless they pay me my ransom.” So encryption, so they can't read that data. So, not only can they not change it, but they can't read it is equally important. So, I would say those are the three big things for me on what's needed for backup to make sure it is clean and recoverable.Corey: I think that is one of those areas where people need to put additional levels of thought in. I think that if you have access to the production environment and have full administrative rights throughout it, you should definitionally not—at least with that account and ideally not you at all personally—have access to alter the backups. Full stop. I would say, on some level, there should not be the ability to alter backups for some particular workloads, the idea being that if you get hit with a ransomware infection, it's pretty bad, let's be clear, but if you can get all of your data back, it's more of an annoyance than it is, again, the existential business crisis that becomes something that redefines you as a company if you still are a company.Sam: Yeah. Yeah, I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me, I love Code Spaces. It was kind of one of those precursors to—Corey: It's amazing.Sam: Yeah, but they were running on AWS and they had everything, production and backups, all stored in one account. Got into the account. “We're going to delete your data if you don't pay us this ransom.” They were like, “Well, we're not paying you the ransoms. We got backups.” Well, they deleted those, too. And, you know, unfortunately, Code Spaces isn't around anymore. But it really kind of goes to show just the importance of at least logically separating your data across different accounts and not having that god-like access to absolutely everything.Corey: Yeah, when you talked about Code Spaces, I was in [unintelligible 00:32:29] talking about GitHub Codespaces specifically, where they have their developer workstations in the cloud. They're still very much around, at least last time I saw, unless you know something I don't.Sam: Precursor to that. I can send you the link—Corey: Oh oh—Sam: You can share it with the listeners.Corey: Oh, yes, please do. I'd love to see that.Sam: Yeah. Yeah, absolutely.Corey: And it's been a long and strange time in this industry. Speaking of links for the show notes, I appreciate you're spending so much time with me. Where can people go to learn more?Sam: Yeah, absolutely. I think veeam.com is kind of the first place that people gravitate towards. Me personally, I'm kind of like a hands-on learning kind of guy, so we always make free product available.And then you can find that on the AWS Marketplace. Simply search ‘Veeam' through there. A number of free products; we don't put time limits on it, we don't put feature limitations. 
You can backup ten instances, including your VPCs, which we actually didn't talk about today, but I do think is important. But I won't waste any more time on that.Corey: Oh, configuration of these things is critically important. If you don't know how everything was structured and built out, you're basically trying to re-architect from first principles based upon archaeology.Sam: Yeah [laugh], that's a real pain. So, we can help protect those VPCs and we actually don't put any limitations on the number of VPCs that you can protect; it's always free. So, if you're going to use it for anything, use it for that. But hands-on, marketplace, if you want more documentation, want to learn more, want to speak to someone veeam.com is the place to go.Corey: And we will, of course, include that in the show notes. Thank you so much for taking so much time to speak with me today. It's appreciated.Sam: Thank you, Corey, and thanks for all the listeners tuning in today.Corey: Sam Nicholls, Director of Public Cloud at Veeam. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that takes you two hours to type out but then you lose it because you forgot to back it up.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
On The Cloud Pod this week, the team finds out whose re:Invent 2021 crystal ball was most accurate. Also Graviton3 is announced, and Adam Selipsky gives his first re:Invent keynote. A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located. This week's highlights
In this special episode, Eoin and Luciano talk about their impression on the announcements from the first day of AWS re:invent 2021. AWS Lambda now supports event filtering for Amazon SQS, Amazon DynamoDB, and Amazon Kinesis as event sources: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-event-filtering-amazon-sqs-dynamodb-kinesis-sources/ Amazon CodeGuru Reviewer now detects hardcoded secrets in Java and Python repositories: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-codeguru-reviewer-hardcoded-secrets-java-python/ Amazon ECR announces pull through cache repositories: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-ecr-cache-repositories/ Introducing recommenders optimized to deliver personalized experiences for Media & Entertainment and Retail with Amazon Personalize: https://aws.amazon.com/about-aws/whats-new/2021/11/recommenders-optimized-personalized-media-entertainment-retail-amazon-personalize/AWS Chatbot now supports management of AWS resources in Slack (Preview): https://aws.amazon.com/about-aws/whats-new/2021/11/aws-chatbot-management-resources-slack/ Amazon CloudWatch Evidently: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-evidently-feature-experimentation-safer-launches/ AWS Migration Hub Refactor Spaces - Preview: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-migration-hub-refactor-spaces/ CloudWatch Real User Monitoring: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-rum-applications-client-side-performance/ CloudWatch Metrics Insights: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-metrics-insights-preview/ AWS Karpenter: https://github.com/aws/karpenter S3 Event Notifications with EventBridge: https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/ S3 Event Notifications for S3 Lifecycle, S3 Intelligent-Tiering, object tags, and object access control lists: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-s3-event-notifications-s3-lifecycle-intelligent-tiering-object-tags-object-access-control-lists/ Amazon Athena ACID Transactions (Preview): https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-athena-acid-apache-iceberg/ AWS Control Tower introduces Terraform account provisioning and customization: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-control-tower-terraform/ Leave a comment here or connect with us on Twitter: - https://twitter.com/eoins - https://twitter.com/loige
In episode 11 of the podcast we are joined by Javier Ramirez, Senior Developer Advocate at Amazon Web Services. Today he covers two topics. First, how a Developer Advocate is trained and what the role involves; he also shares how he got started, how he learned, and how he helps customers today. In the second part we dive into what clickstream is and how to analyze activity in an application. We go from simple to advanced: from simply sending data to be analyzed in a straightforward way, all the way to massive data ingestion and analysis with serverless tools on Amazon Web Services. Javier Ramirez - @supercoco9: Works as a technical evangelist at AWS, helping developers get the most out of the cloud so they can focus on solving interesting problems and rely on AWS for performance, scalability, elasticity, and security. A fan of data storage, big and small, with broad experience across SQL, NoSQL, graph, in-memory, and Big Data solutions. Before joining AWS, he spent 20 years developing software professionally and sharing what he learned with the community. He has spoken at events in more than 15 countries, mentored dozens of startups, taught at universities for 6 years, and trained hundreds of professionals in cloud and data engineering. Rodrigo Asensio - @rasensio: Based in Barcelona, Spain, Rodrigo leads a Solutions Architecture team in the Enterprise segment that helps large customers with massive cloud migrations, digital transformation, and innovation projects. AWS links: - Clickstream solution: https://aws.amazon.com/quickstart/architecture/clickstream-analytics/ - Amazon Athena to query data from S3: https://aws.amazon.com/athena - Amazon Kinesis for data streaming: https://aws.amazon.com/kinesis/ - Amazon Managed Streaming for Apache Kafka for Kafka data streaming: https://aws.amazon.com/msk/ - Amazon QuickSight for data visualization: https://aws.amazon.com/quicksight/ Connect with Rodrigo Asensio on Twitter https://twitter.com/rasensio and LinkedIn at https://www.linkedin.com/in/rasensio/
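To illustrate the simple end of the clickstream spectrum discussed in the episode, here is a minimal boto3 sketch that runs an Athena query over clickstream events already landed in S3. The database, table, and results bucket names are hypothetical; it assumes the events were delivered to S3 (for example via Kinesis Data Firehose) and registered in the Glue Data Catalog.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off an asynchronous Athena query over a hypothetical clickstream table.
response = athena.start_query_execution(
    QueryString="""
        SELECT page, COUNT(*) AS views
        FROM clickstream_events
        WHERE event_date = DATE '2023-01-01'
        GROUP BY page
        ORDER BY views DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution() for status/results
```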
Various challenges while switching to Data Science/AI as an experienced professional or fresher: 1. Do I need to work as a fresher in the field of Data Science, diluting my previous experience? 2. Is my CTC going to be diluted if I switch to Data Science/AI, or will I get a salary hike? 3. After working for many years in another field, can I crack a Data Science interview? Can I survive in the Data Science/AI/ML field? For all the above questions, book a free consultation call and get a customised career transition roadmap into the field of Data Science/AI: https://www.bepec.in/registration-form Speak with our mentor Mr Kanth on Instagram @meet_kanth
In this month's episode where we tell you what AWS released since re:Invent, Guy gets to talk a fair bit about IoT, JM just wants to remind everyone of various things, and Arjen suffers from some sleep deprivation. What's New Finally in Sydney PartiQL for DynamoDB now is supported in 23 AWS Regions AWS Network Firewall is now available in the Asia Pacific (Sydney) Region Amazon Rekognition Custom Labels is now available in the Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), and Asia Pacific (Tokyo) AWS Regions Announcing new Amazon EC2 T4g instances powered by AWS Graviton2 processors along with a T4g free trial in Asia Pacific (Sydney, Singapore), Europe (London), North Americas (Canada Central, San Francisco), and South Americas (Sao Paulo) regions Serverless Lambda AWS Compute Optimizer Now Delivers Recommendations For AWS Lambda Functions AWS Lambda now makes it easier to build analytics for Amazon Kinesis and Amazon DynamoDB Streams AWS Lambda now supports self-managed Apache Kafka as an event source AWS Lambda launches checkpointing for Amazon Kinesis and Amazon DynamoDB Streams AWS Lambda now supports SASL/SCRAM authentication for functions triggered from Amazon MSK API Gateway Amazon API Gateway now supports data mapping in HTTP APIs Containers Monitoring Join the Preview – Amazon Managed Service for Prometheus (AMP) | AWS News Blog Announcing Amazon Managed Service for Grafana (in Preview) | AWS News Blog Amazon CloudWatch now adds Fluent Bit support for container logs from Amazon EKS and Kubernetes General EC2 Image Builder now supports container images ECS Amazon ECS announces the general availability of ECS Deployment Circuit Breaker Amazon Elastic Container Service launches new management console Amazon ECS now supports VPC Endpoint policies Amazon ECS announces increased service quotas for tasks per service and services per cluster EKS AWS Load Balancer Controller version 2.1 now available with support for additional ELB configurations EC2 & VPC Instances Announcing new Amazon EC2 C6gn instances powered by AWS Graviton2 processors with 100 Gbps networking Amazon EC2 Auto Scaling now allows to define 40 instance types when defining Mixed Instances Policy EBS Multi-Attach support now available on Amazon EBS Provisioned IOPS volume type, io2 Amazon Data Lifecycle Manager now automates copying EBS snapshots across accounts Networking Amazon Virtual Private Cloud (VPC) Now supports Tag on Create for Elastic IP addresses Amazon EC2 API now supports Internet Protocol Version 6 (IPv6) Lightsail Amazon Lightsail now supports IPv6 Dev & Ops Dev AWS SDK for Go version 2 is now generally available AWS SDK for JavaScript version 3 is now generally available Porting Assistant for .NET supports automated code translation Announcing the General Availability of Amazon Corretto 11 for Linux on ARM32 and for Windows on x86 (32-bit) AWS App2Container now supports remote execution of containerization workflows AWS CodePipeline supports deployments with CloudFormation StackSets Announcing CDK Support for AWS Chalice Ops Introducing AWS Systems Manager Change Manager | AWS News Blog New – AWS Systems Manager Consolidates Application Management | AWS News Blog Introducing AWS Systems Manager Fleet Manager Security AWS Single Sign-On now supports Microsoft Active Directory (AD) synchronization Announcing Amazon Route 53 support for DNSSEC AWS Config launches ability to save advanced queries Amazon GuardDuty adds three new threat detections to help you better protect your data 
stored in Amazon S3 Amazon Cognito Identity Pools enables using user attributes from identity providers for access control to simplify permissions management in AWS AWS Certificate Manager Private Certificate Authority now supports additional certificate customization Amazon Detective enhances IP Address Analytics Data storage & processing AWS Glue launches AWS Glue Custom Connectors Amazon CloudSearch announces updates to its search instances New – AWS Transfer Family support for Amazon Elastic File System | AWS News Blog Achieve faster database failover with Amazon Web Services MySQL JDBC Driver - now in preview Amazon Aurora supports in-place upgrades from MySQL 5.6 to 5.7 Amazon Aurora supports PostgreSQL 12 Amazon Keyspaces (for Apache Cassandra) now supports JSON syntax to help you read and write data from other systems more easily AI & ML Introducing Amazon SageMaker ml.P4d instances for highest performance ML training in the cloud IoT New – AWS IoT Core for LoRaWAN to Connect, Manage, and Secure LoRaWAN Devices at Scale | AWS News Blog Announcing AWS IoT Greengrass 2.0 – With an Open Source Edge Runtime and New Developer Capabilities | AWS News Blog Announcing AWS IoT SiteWise Edge (Preview), a new capability of AWS IoT SiteWise to collect, process, and monitor industrial equipment data on-premises Announcing support for Alarms (Preview) in AWS IoT Events and AWS IoT SiteWise Introducing AWS IoT SiteWise plugin for Grafana AWS IoT Core Device Advisor now available in preview AWS IoT Core adds the ability to deliver data to Apache Kafka clusters AWS IoT SiteWise launches support for Modbus TCP and EtherNet/IP protocols with enhancements to OPC-UA data ingestion Introducing AWS IoT EduKit Announcing AWS IoT Device Defender ML Detect public preview Announcing date and time functions and timezone support in AWS IoT SiteWise Other Cool Stuff Policy Stepping up for a truly open source Elasticsearch | AWS Open Source Blog Services AWS CloudShell – Command-Line Access to AWS Resources | AWS News Blog Amazon Location – Add Maps and Location Awareness to Your Applications | AWS News Blog AWS Cost Anomaly Detection is now generally available Features APIs now available for the AWS Well-Architected Tool Cost & Usage Report Now Available to Member (Linked) Accounts Announcing the availability of AWS Outposts Private Connectivity Amazon Managed Blockchain now supports Ethereum (Preview) AWS Snow Family now supports the Amazon Linux 2 operating system Service Quotas now supports tagging and Attribute-Based Access Control (ABAC) Amazon Lex Introduces an Enhanced Console Experience and New V2 APIs | AWS News Blog SQS Amazon SQS Now Supports a High Throughput Mode for FIFO Queues (Preview) Amazon SQS announces tiered pricing Control Tower region AWS Control Tower now extends governance to existing OUs in your AWS Organizations AWS Control Tower now provides bulk account update The Nanos Amazon Aurora supports in-place upgrades from PostgreSQL 11 to 12 Announcing the General Availability of Amazon Corretto 11 for Linux on ARM32 and for Windows on x86 (32-bit) Amazon Lightsail now supports IPv6 Amazon Virtual Private Cloud (VPC) Now supports Tag on Create for Elastic IP addresses Sponsors Gold Sponsor Innablr Silver Sponsors AC3 CMD Solutions DoIT International
AWS us-east-1 experienced an outage on Nov 25, 2020. Amazon has since published a summary detailing exactly what happened to Amazon Kinesis to cause the outage; let's discuss it. 0:00 Intro 1:00 TL;DR (diagram) 7:30 Detailed Analysis of What Happened 25:00 Why Cognito Went Down 31:20 Why CloudWatch Went Down 33:20 Why Lambda and Auto Scaling Went Down 35:50 Why EventBridge, Elastic Kubernetes Service, and Elastic Container Service Went Down 38:00 Why the Service Status Page Went Down 40:00 Summary https://aws.amazon.com/message/11201/ --- Send in a voice message: https://anchor.fm/hnasr/message
Catch up on the latest news while doing something else! Radio-style broadcast "Daily AWS!" Good morning, this is Kato from Serverworks. Today we cover 10 updates released on 7/29. Share your feedback on Twitter with the hashtag #サバワ!
■ UPDATE lineup
Amazon Kinesis Data Firehose now supports several new data delivery destinations
Amazon Translate now supports translation of Office files
AWS Security Hub adds new automated security controls
Amazon ECR now supports image encryption using AWS KMS
AWS Database Migration Service now supports enhanced task assessments
Amazon Elasticsearch Service strengthens similarity search with support for cosine similarity
Amazon Lightsail now supports preinstalled cPanel & WHM
Registering Amazon EC2 instances with AWS Cloud Map is now simpler
Amazon RDS for Oracle now supports Oracle Application Express version 20.1
AWS Site-to-Site VPN now supports tagging on creation and resource-level access control
■ Serverworks SNS: Twitter / Facebook
■ Serverworks blog: Serverworks Engineer Blog
Listen now as April Shen, FSA, CFA, interviews Tom Peplow (Principal, Director of Product Development - LTS) on the interaction between actuarial modeling and big data. In this episode, Tom discusses the basic concepts of data warehousing and the tools and resources actuaries can use to learn more. Show Notes and Resources: 1) Anything by Ralph Kimball – he's the forefather of data warehouse data models. 2) Data storage tools: Databricks, Snowflake, Google Bigtable, Amazon Redshift, Microsoft Synapse, Parquet 3) Streaming analytics / event processing: Kafka, Amazon Kinesis, Azure Stream Analytics 4) Data preparation: Power BI (using Power Query), Azure Data Factory Data Flows, Alteryx, AWS Glue 5) Data analytics: Tableau, Power BI 6) Other tools/languages: Jupyter, Python, R, C#, .NET 7) Chapter 5 of Roy Fielding's dissertation, "Representational State Transfer (REST)." 8) Andrew White (Gartner), "Top Data and Analytics Predictions for 2019" https://blogs.gartner.com/andrew_white/2019/01/03/our-top-data-and-analytics-predicts-for-2019/ 9) "Data warehouse is not a use case," Jordan Tigani (Google) https://www.youtube.com/watch?v=0I7eOQpDBHQ
Organizations grappling with moving on-premises workloads and data to the cloud face the challenge of getting platform engineering and team structures right. In most cases, lift-and-shift is not an option. In this session, learn how Vanguard created a team to tackle the volume and velocity of data for microservices and big data workloads, using data streaming (Amazon Kinesis), file transfer (AWS Storage Gateway), CDC replication (DB2 on z/OS, Oracle Exadata, Microsoft SQL Server), relational and NoSQL databases (Amazon DynamoDB, Amazon RDS for PostgreSQL, Amazon Aurora), and object storage (Amazon S3). The cloud data platform has seen a 200 percent year-over-year increase in adoption.
For over a decade, cloud-enabled digital transformation has remade industries and powered innovation to greatly benefit enterprises and consumers. In this session, learn how AWS customers in industries as diverse as Manufacturing, Healthcare, and Oil & Gas are incorporating robots into their next-generation solutions. Come learn how AWS services such as AWS RoboMaker, AWS IoT Core, Amazon SageMaker, Amazon Kinesis, and Amazon Rekognition are being used to fundamentally transform work and improve outcomes.
Streaming data pipelines are increasingly used to replace batch processing with real-time decision-making for use cases including log processing, real-time monitoring, data lake analytics, and machine learning. Join this session to learn how to leverage Amazon Kinesis and AWS Lambda to solve real-time ingestion, processing, storage, and analytics challenges. We introduce design patterns and best practices as well as share a customer journey in building large-scale real-time serverless analytics capabilities.
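As a rough sketch of the Kinesis-plus-Lambda pattern this abstract describes (not code from the session itself), the handler below consumes a batch of Kinesis records inside Lambda; the process_event helper is a hypothetical stand-in for your own ingestion or analytics logic.

```python
import base64
import json

def process_event(event: dict) -> None:
    # Placeholder for the real work: enrich, aggregate, or persist the event.
    print("processing", event)

def handler(event, context):
    """Lambda handler invoked by a Kinesis event source mapping.

    Lambda delivers a batch of records per invocation; each Kinesis payload
    arrives base64-encoded, so decode it before parsing.
    """
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        process_event(json.loads(payload))
    # Returning without raising marks the whole batch as processed;
    # raising an exception makes Lambda retry the batch.
```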
Sensor data on the event stream can be voluminous. In NAND manufacturing, there are millions of columns of data that represent many measured and virtual metrics. These sensor data can arrive with considerable velocity. In this session, learn about developing cross-sectional and longitudinal analyses for anomaly detection and yield optimization using deep learning methods, as well as super-fast subsequence signature search on accumulated time-series data and methods for handling very wide data in Apache Spark on Amazon EMR. The trained models are developed in TIBCO Data Science and Amazon SageMaker and applied to event streams using services such as Amazon Kinesis to identify hot paths to anomaly detection. This presentation is brought to you by TIBCO Software, an APN Partner.
Hertz is undertaking a massive digital transformation to evolve its technology landscape. This move provides an opportunity to extract valuable insights from a large amount of data produced by the new systems. In this session, learn how Deloitte collaborated with Hertz to build a next-generation data platform, which includes an integration hub, a unified reporting layer, and an ecosystem of tools used to build advanced analytics models. Discover how the solution leveraged all native AWS services, such as Amazon S3, Amazon Kinesis, Amazon EMR, and AWS Lambda, to enable cross-functional insights and accelerate the cloud journey in under 12 months. This presentation is brought to you by Deloitte, an APN Partner.
Stream processing tools like Apache Spark and Flink are the default choice for big data processing, but these frameworks also come with high development and operation costs. Serverless streaming architecture is an alternative that brings a significant reduction in these costs and allows developers to focus on business delivery, not infrastructure management. This session explores how Capital One used serverless streaming architecture to provide real-time insights for millions of customers through its intelligent assistant Eno. Learn how high-throughput streaming loads can be handled with ease as well as how message-driven architecture can be implemented using Amazon API Gateway, AWS Lambda, and Amazon Kinesis for complex asynchronous applications. This presentation is brought to you by Capital One, an APN Partner.
Amazon Prime Video is one of the largest streaming services in the world, serving over 8 percent of all internet traffic in the US on any day. With millions of devices in over 200 countries around the world, collecting and analyzing telemetry data in real time is a major scaling challenge. In this session, learn how the team used AWS technologies such as Amazon Kinesis, AWS Lambda, and Amazon Simple Storage Service (Amazon S3) to build a next-generation telemetry platform that handles millions of transactions per second (TPS) and hundreds of petabytes of data.
As Intuit moves to a cloud hosting architecture that includes Amazon Elastic Compute Cloud (Amazon EC2), containers, and serverless applications, the company is on an "observability" journey to transform the way it monitors its applications' health. Observability requires correlating logs, metrics, traces, etc., which becomes complex as systems become more distributed. In this session, the company discusses leveraging tools such as OpenTelemetry, Amazon Elasticsearch Service (Amazon ES), Amazon Kinesis, and Jaeger to build an observability solution, and the benefits this solution provides by giving visibility across the platform, from containers to serverless applications. The company also discusses how it sees observability evolving in coming years.
In adopting a cloud-first strategy, Pearson had to secure and monitor its infrastructure, but with many applications deployed against many services, it needed reliable, fast ingest of logs into a real-time analytics engine. Pearson chose Amazon Elasticsearch Service for monitoring and Amazon Kinesis for ingest. In this session, we explain some Amazon ES basics and common ingest patterns that builders use on AWS. A Pearson representative shares their firsthand experiences developing a reliable and scalable architecture comprising Amazon ES, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, and AWS Lambda to centralize logs and keep ingestion and access secure.
Today's businesses need to react to customer opportunities and problems as quickly as possible. In this session, we walk through how to build analytics pipelines for business operational reporting that speed up time to information from hours to seconds. We discuss how streaming-data services like Amazon Kinesis are used to capture and analyze data in real time, prior to delivering it to your Amazon Redshift data warehouse and Amazon Aurora reporting databases. Our customer GoDaddy shares how they use this architectural pattern to provide the best experience to the millions of customers that host websites on their platform.
Intercom is "all in" on AWS, a strategy that's aligned with the engineering principles used to build the company. This talk covers how the company uses those principles to make engineering decisions. Learn about the evolution of Intercom's architecture, from a handful of Amazon EC2 hosts to thousands of instances. Also learn how Intercom uses Amazon DynamoDB, AWS Lambda, Amazon SQS, Amazon Aurora, Amazon Redshift, Amazon Kinesis, and more. Finally, learn why Intercom decided against leveraging microservices, and why this has increased its ability to move fast and ship great products.
In this messaging-themed episode of AWS TechChat, Pete is back, and this time in person. They start the show reminiscing about messaging history, looking at where we came from and how we arrived where we are today, and, more importantly, why we use messaging and the benefits you can derive from decoupling your architecture. They then pivot to event streams, covering both Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK), which are both designed to process or analyze streaming data for specialized needs. Next, they move to more traditional message buses - Amazon Simple Queue Service (SQS) and Amazon MQ (managed message broker service for ActiveMQ) - both durable pull-based messaging platforms. Amazon SQS is lightweight and tightly integrated with the AWS Cloud platform, while Amazon MQ supports a variety of protocols, making it a great choice for existing applications that use industry-standard protocols. Finally, they talk about push-based messaging with Amazon Simple Notification Service (SNS) and the Message Broker for AWS IoT, both publish/subscribe (pub/sub) platforms that enable you to build fan-out architectures with hundreds of thousands to millions of subscribers. You now have more than a hammer to build your applications; Maslow would be proud. Speakers: Shane Baldacchino - Solutions Architect, ANZ, AWS Peter Stanski - Head of Solution Architecture, AWS Resources: Amazon CloudFront announces new Edge location in Shenzhen, China https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-cloudfront-shenzhen-launch/ What is Pub/Sub Messaging? https://aws.amazon.com/pub-sub-messaging/ Amazon Kinesis Data Streams https://aws.amazon.com/kinesis/data-streams/ Amazon Managed Streaming for Apache Kafka (Amazon MSK) https://aws.amazon.com/msk/ Apache ZooKeeper https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-zookeeper.html Amazon Simple Queue Service https://aws.amazon.com/sqs/ Amazon Simple Queue Service Released https://aws.amazon.com/blogs/aws/amazon_simple_q/ Amazon MQ https://aws.amazon.com/amazon-mq/ Amazon Simple Notification Service https://aws.amazon.com/sns/ MQTT - AWS IoT https://docs.aws.amazon.com/iot/latest/developerguide/mqtt.html Message Broker for AWS IoT https://docs.aws.amazon.com/iot/latest/developerguide/iot-message-broker.html AWS Events: AWS re:Invent https://reinvent.awsevents.com/ AWSome Day Online Series https://aws.amazon.com/events/awsome-day/awsome-day-online/ AWS Modern Application Development Online Event https://aws.amazon.com/events/application/modern-app-development/ AWS Innovate on-demand https://aws.amazon.com/events/aws-innovate/
Amazon Kinesis makes it easy to shorten the time it takes to get valuable, real-time insights from your streaming data. In this session, we walk through the most popular applications that customers implement using Amazon Kinesis, including streaming extract-transform-load, continuous metric generation, and responsive analytics. Our customer Autodesk joins us to describe how they created real-time metrics generation and analytics using Amazon Kinesis and Amazon Elasticsearch Service. They walk us through their architecture and the best practices they learned in building and deploying their real-time analytics solution.
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we dive deep into best practices for Kinesis Data Streams and Kinesis Data Firehose to get the most performance out of your data streaming applications. Comcast uses Amazon Kinesis Data Streams to build a Streaming Data Platform that centralizes data exchanges. It is foundational to the way our data analysts and data scientists derive real-time insights from the data. In the second part of this talk, Comcast zooms into how to properly scale a Kinesis stream. We first list the factors to consider to avoid scaling issues with standard Kinesis stream consumption, and then we see how the new fan-out feature changes these scaling considerations.
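For readers curious what the scaling and fan-out operations mentioned above look like in practice, here is a minimal boto3 sketch, assuming a hypothetical stream named telemetry-stream; it is an illustration, not Comcast's actual tooling.

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM = "telemetry-stream"  # hypothetical stream name

# Scale a standard stream by raising its shard count (uniform scaling is
# subject to per-stream limits on how far and how often you can resize).
kinesis.update_shard_count(
    StreamName=STREAM,
    TargetShardCount=4,
    ScalingType="UNIFORM_SCALING",
)

# Register an enhanced fan-out consumer so this application gets dedicated
# per-shard read throughput instead of sharing the standard limit.
stream_arn = kinesis.describe_stream_summary(StreamName=STREAM)[
    "StreamDescriptionSummary"
]["StreamARN"]
consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName="analytics-consumer",
)
print(consumer["Consumer"]["ConsumerARN"])
```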
In the cloud, modern apps are decoupled into independent building blocks, called microservices, which are easier to develop, deploy, and maintain. Messaging is a central tool used to connect and coordinate these microservices. AWS offers multiple messaging services, which address a variety of use cases. In this session, learn how to choose the service that's best for your use case as we present the key technical features of each. We pay special attention to integrating messaging services with serverless technology. We cover Amazon Kinesis, Amazon SQS, and Amazon SNS in detail with discussion of other services as appropriate.
In this session, learn how Supercell architected its analytics pipeline on AWS. We dive deep into how Supercell leverages Amazon Elastic Compute Cloud (Amazon EC2), Amazon Kinesis, Amazon Simple Storage Service (Amazon S3), Amazon EMR, and Spark to ingest, process, store, and query petabytes of data. We also dive deep into how Supercell's games are architected to accommodate scaling and failure recovery. We explain how Supercell's teams are organized into small and independent cells and how this affects the technology choices they make to produce value and agility in the development process.
Analytics has traditionally been done in batch in DWH/Hadoop environments, with common use cases spanning data lakes, data science, and machine learning (ML). Creating serverless data-driven architectures and serverless streaming solutions with services like Amazon Kinesis, AWS Lambda, and Amazon Athena can solve real-time ingestion, storage, and analytics challenges, and help you focus on application logic without managing infrastructure. In this session, we introduce design patterns and best practices and share customer journeys from batch to real-time insights in building modern serverless data-driven applications. Hear how Intel built the Intel Pharma Analytics Platform using a serverless architecture. This AI cloud-based offering enables remote monitoring of patients using an array of sensors, wearable devices, and ML algorithms to objectively quantify the impact of interventions and power clinical studies in various therapeutic conditions.
You've designed and built a well-architected data lake and ingested extreme amounts of structured and unstructured data. Now what? In this session, we explore real-world use cases where data scientists, developers, and researchers have discovered new and valuable ways to extract business insights using advanced analytics and machine learning. We review Amazon S3, Amazon Glacier, and Amazon EFS, the foundation for the analytics clusters and data engines. We also explore analytics tools and databases, including Amazon Redshift, Amazon Athena, Amazon EMR, Amazon QuickSight, Amazon Kinesis, Amazon RDS, and Amazon Aurora; and we review the AWS machine learning portfolio and AI services such as Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, and Amazon Lex. We discuss how all of these pieces fit together to build intelligent applications.
In this session, learn from market-leader Vonage how and why they re-architected their QoS-sensitive, highly available and highly performant legacy real-time communications systems to take advantage of Amazon EC2, Enhanced Networking, Amazon S3, ASG, Amazon RDS, Amazon ElastiCache, AWS Lambda, StepFunctions, Amazon SNS, Amazon SQS, Amazon Kinesis, Amazon EFS, and more. We also learn how Aspect, a multinational leader in call center solutions, used AWS Lambda, Amazon API Gateway, Amazon Kinesis, Amazon ElastiCache, Amazon Cognito, and Application Load Balancer with open-source API development tooling from Swagger, to build a comprehensive, microservices-based solution. Vonage and Aspect share their journey to TCO optimization, global outreach, and agility with best practices and insights.
Mark Grover LinkedIn Profile and GitHub Profile
"Hadoop Application Architectures"
Drill to Detail Ep. 7, "Apache Spark and Hadoop Application Architectures"
Lyft Engineering Blog
"Software Engineer to Product Manager" blog by Gwen Shapira
"Introduction to the Oracle Data Integrator Topology" from the Oracle Data Integrator docs site
Apache Airflow and Amazon Kinesis homepages
"Experimentation in a Ridesharing Marketplace" by Nicholas Chamandy, Head of Data Science at Lyft
"How Uber Eats Works with Restaurants"
"Deliveroo has built a bunch of tiny kitchens to feed more hungry Londoners" - Wired.co.uk
As the nation's only high-speed intercity passenger rail provider, Amtrak needs to know critical information to run its business, such as: Who's onboard any train at any time? How are booking and revenue trending? Amtrak was faced with unpredictable and often slow response times from existing databases, ranging from seconds to hours; existing booking and revenue dashboards were spreadsheet-based and manual; multiple copies of data were stored in different repositories, lacking integration and consistency; and operations and maintenance (O&M) costs were relatively high. Join us as we demonstrate how Deloitte and Amtrak successfully went live with a cloud-native operational database and analytical datamart for near-real-time reporting in under six months. We highlight the specific challenges and the modernization of the architecture on an AWS-native Platform as a Service (PaaS) solution. The solution includes cloud-native components such as AWS Lambda for microservices, Amazon Kinesis and AWS Data Pipeline for moving data, Amazon S3 for storage, Amazon DynamoDB for a managed NoSQL database service, and Amazon Redshift for near-real-time reports and dashboards. Deloitte's solution enabled "at scale" processing of 1 million transactions/day and up to 2K transactions/minute. It provided flexibility and scalability, largely eliminated the need for system management, and dramatically reduced operating costs. Moreover, it laid the groundwork for decommissioning legacy systems, anticipated to save at least $1M over 3 years. Session sponsored by Deloitte
Serverless and AWS Lambda specifically enable developers to build super-scalable application components with minimal effort. You can use Amazon Kinesis and Amazon SQS to create a universal event stream to orchestrate Lambdas into much more complex applications. Now, using AWS Step Functions, we can build large distributed applications with Lambdas using visual workflows. See how Step Functions are different from Amazon SWF, how to get started with Step Functions, and how to use them to take your Lambda-based applications to the next level. We start with a few granular functions and stitch them up using Step Functions. As we build out the application, we add monitoring to ensure that changes we make actually improve things, not make them worse. Leave the session with actionable learnings for using Step Functions in your environment right away. Session sponsored by Datadog
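To make the Step Functions idea concrete, here is a minimal sketch of defining a two-step state machine that chains Lambda functions; the function ARNs, role ARN, and state machine name are hypothetical placeholders, not the workflow built in the session.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Minimal Amazon States Language definition chaining two hypothetical Lambdas.
definition = {
    "Comment": "Toy workflow: validate an event, then store it",
    "StartAt": "Validate",
    "States": {
        "Validate": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-event",
            "Next": "Store",
        },
        "Store": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:store-event",
            "End": True,
        },
    },
}

response = sfn.create_state_machine(
    name="event-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)
print(response["stateMachineArn"])
```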
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness, and we share a reference architecture that uses a combination of cloud and open-source technologies to solve your big data problems. Topics include use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.
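As one concrete example of the serverless ad hoc analysis mentioned above, the sketch below runs an Athena query over data already landed in S3; the database, table, and results bucket are hypothetical.

```python
import time
import boto3

athena = boto3.client("athena")

# Kick off an ad hoc query over a hypothetical table of events stored in S3.
execution = athena.start_query_execution(
    QueryString="SELECT event_type, count(*) AS n FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Poll until the query reaches a terminal state; results are written to S3.
query_id = execution["QueryExecutionId"]
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)
print(state, "results at",
      status["QueryExecution"]["ResultConfiguration"]["OutputLocation"])
```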
In this session, learn how Cox Automotive is using Splunk Cloud for real-time visibility into its AWS and hybrid environments to achieve near-instantaneous MTTI, reduce auction incidents by 90%, and proactively predict outages. We also introduce a highly anticipated capability that allows you to ingest, transform, and analyze data in real time using Splunk and Amazon Kinesis Firehose to gain valuable insights from your cloud resources. It's now quicker and easier than ever to gain access to analytics-driven infrastructure monitoring using Splunk Enterprise and Splunk Cloud. Session sponsored by Splunk
Reducing the time to get actionable insights from data is important to all businesses, and customers who employ batch data analytics tools are exploring the benefits of streaming analytics. Learn best practices to extend your architecture from data warehouses and databases to real-time solutions. Learn how to use Amazon Kinesis to get real-time data insights and integrate them with Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3. The Amazon Flex team describes how they used streaming analytics in their Amazon Flex mobile app, used by Amazon delivery drivers to deliver millions of packages each month on time. They discuss the architecture that enabled the move from a batch processing system to a real-time system, overcoming the challenges of migrating existing batch data to streaming data, and how to benefit from real-time analytics.
Real-time data processing is a powerful technique that allows businesses to make agile, automated decisions. It is particularly powerful when applied to security workloads such as analyzing access logs, parsing audit logs, and monitoring API activity to detect behavioral anomalies. Combined with automation, businesses can quickly take action to remediate security concerns, or even train a machine learning (ML) model. We explore different techniques for analyzing real-time streams on AWS using Lambda, Amazon Kinesis, Spark with Amazon EMR, and Amazon DynamoDB. We also cover best practices around short- and long-term storage and analysis of data and, briefly, the possibility of leveraging ML.
Have a lot of real-time data piling up? Need to analyze it, transform it, and store it somewhere else quickly? What if there were an easier way to perform streaming data processing, with less setup, instant scaling, and no servers to provision and manage? With serverless computing, you can build applications to meet your real-time needs for everything from IoT data to operational logs without needing to spin up servers or install software. Come learn how to leverage AWS Lambda with Amazon Kinesis, Kinesis Firehose, and Kinesis Analytics to architect highly scalable, high-throughput pipelines that can cover all your real-time processing needs. We cover example architectures that handle use cases like in-line processing and data manipulation, and discuss the advantages of using an AWS managed stream.
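To illustrate one of the building blocks referenced above, here is a minimal sketch of a Kinesis Data Firehose transformation Lambda; the enrichment step is a placeholder, but the recordId/result/data response shape is the contract Firehose expects from transformation functions.

```python
import base64
import json

def handler(event, context):
    """Kinesis Data Firehose transformation Lambda.

    Firehose passes a batch of records; each must be returned with the same
    recordId, a result of Ok / Dropped / ProcessingFailed, and base64 data.
    """
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["processed"] = True  # placeholder enrichment
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```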
The pace of technology innovation is relentless, especially at AWS. Designing and building new system architectures is a balancing act between using established, production-ready technologies while maintaining the ability to evolve and take advantage of new features and innovations as they become available. In this session, learn how Amazon Game Studios built a flexible analytics pipeline on AWS for its team battle sport game, Breakaway, that provided value on day one but was built with the future in mind. We discuss the challenges we faced and the solution we built for ingesting, storing, and analyzing gameplay telemetry, and dive deep into the technical architecture using many AWS services, including Amazon Kinesis, Amazon S3, and Amazon Redshift. This session focuses on game analytics as a specific use case, but emphasizes designing for architectural flexibility that is relevant to any system.
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we present an end-to-end streaming data solution using Kinesis Streams for data ingestion, Kinesis Analytics for real-time processing, and Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
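As a small illustration of the ingestion side of such a pipeline (not the session's reference implementation), the following sketch writes a batch of events to a hypothetical stream named clickstream, keyed so that each user's events stay ordered within a shard.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def send_clickstream_batch(events):
    """Write a batch of events to a hypothetical Kinesis data stream.

    The partition key controls shard placement, so keying on user_id keeps
    each user's events ordered within a shard.
    """
    records = [
        {
            "Data": json.dumps(e).encode("utf-8"),
            "PartitionKey": str(e["user_id"]),
        }
        for e in events
    ]
    response = kinesis.put_records(StreamName="clickstream", Records=records)
    # Records can fail individually; production code should retry these.
    print("failed records:", response["FailedRecordCount"])

send_clickstream_batch([{"user_id": 42, "page": "/pricing"}])
```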
Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment; and providing the best practices to migrate these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users - from developers, to business analysts, to data scientists. Each of these user groups employs different tools, have different data needs and access data in different ways. In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.
Amazon Kinesis Analytics offers a built-in machine learning algorithm that you can use to easily detect anomalies in your VPC network traffic and improve security monitoring. Join us for an interactive discussion on how to stream your VPC Flow Logs to Amazon Kinesis Streams and identify anomalies using Kinesis Analytics.
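One common way to wire this up, sketched below with boto3 and hypothetical names and ARNs, is to publish VPC Flow Logs to CloudWatch Logs and fan them out to a Kinesis stream with a subscription filter, where Kinesis Analytics (or any other consumer) can read them.

```python
import boto3

ec2 = boto3.client("ec2")
logs = boto3.client("logs")

# 1. Publish flow logs for a hypothetical VPC into a CloudWatch Logs group.
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="cloud-watch-logs",
    LogGroupName="vpc-flow-logs",
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/FlowLogsRole",
)

# 2. Fan the log group out to a Kinesis stream via a subscription filter.
logs.put_subscription_filter(
    logGroupName="vpc-flow-logs",
    filterName="to-kinesis",
    filterPattern="",  # an empty pattern forwards every log event
    destinationArn="arn:aws:kinesis:us-east-1:123456789012:stream/flow-log-stream",
    roleArn="arn:aws:iam::123456789012:role/CWLtoKinesisRole",
)
```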
Thousands of services work in concert to deliver millions of hours of video streams to Netflix customers every day. These applications vary in size, function, and technology, but they all make use of the Netflix network to communicate. Understanding the interactions between these services is a daunting challenge both because of the sheer volume of traffic and the dynamic nature of deployments. In this session, we first discuss why Netflix chose Kinesis Streams to address these challenges at scale. We then dive deep into how Netflix uses Kinesis Streams to enrich network traffic logs and identify usage patterns in real time. Lastly, we cover how Netflix uses this system to build comprehensive dependency maps, increase network efficiency, and improve failure resiliency. From this session, you'll learn how to build a real-time application monitoring system using network traffic logs and get real-time, actionable insights.
Fed up with stop and go in your data center? Shift into overdrive and pull into the fast lane! Learn how AutoScout24, the largest online car marketplace Europe-wide, is building its Autobahn in the Cloud. The secret ingredient? Culture! Because "cloud" is only half of the digital transformation story. The other half is how your organization deals with cultural change as you transition from the old world of IT into building microservices on AWS, with agile DevOps teams in a true "you build it, you run it" fashion. Listen to stories from the trenches, powered by Amazon Kinesis, Amazon DynamoDB, AWS Lambda, Amazon ECS, Amazon API Gateway and much more, backed by AWS Partners, AWS Professional Services, and AWS Enterprise Support. Learn how to become cloud native, evolve your architecture, drive cultural change across teams, and manage your company's transformation for the future.
In this session, we first look at common approaches to refactoring legacy .NET applications into microservices and AWS serverless architectures. We also look at modern approaches to .NET-based architectures on AWS. We then elaborate on running .NET Core microservices in Docker containers natively on Linux in AWS while examining the use of the AWS SDK and the .NET Core platform. We also look at the use of various AWS services such as Amazon SNS, Amazon SQS, Amazon Kinesis, and Amazon DynamoDB, which provide the backbone of the platform. For example, Experian Consumer Services runs a large ecommerce platform that is now cloud-based on AWS. We look at how they went from a monolithic platform to microservices, primarily in .NET Core. With a heavy push to move to Java and open source, we look at the development process, which started in the beta days of .NET Core, and how the direction Microsoft was taking allowed them to use existing C# skills while pushing themselves to innovate on AWS. The large, single team of Windows-based developers was broken down into several small teams to allow for rapid development in an all-Linux environment.
Join this session for an in-depth look into how the Amazon CloudFront team measures the internet in real time to give our customers the best possible experience using AWS technologies, such as Amazon Kinesis and Amazon EMR. AWS customers should expect to leave this whiteboarding session with sample design patterns that they can use when they build their own distributed applications that need a feedback control system.
What is the future for IoT in retail fulfillment and logistics? We discuss retail use cases from Amazon.com, which has a global network of over 150 fulfillment centers with increasing levels of automation. To create more flexible designs and increase the pool of suppliers to choose from, Amazon developed a Machine as a Service framework based on ANSI/ISA-85, allowing Amazon software systems to be vendor agnostic. This framework now extends to incorporate AWS IoT technologies like Amazon Kinesis, AWS Lambda, and AWS Greengrass. Amazon can swap machines with the same functionality without changing the interface. Amazon.com can now create a virtual fulfillment center within Amazon EC2 to test software deployments before the actual building is completed.
In this session, learn how AWS can help create a differentiated customer experience for your end users and employees, at scale and at the speed of innovation, to meet your customers' expectations. Hear from a panel of enterprise IT executives, including Glenn Weinstein, CIO of Appirio, who are innovating and driving real transformations of their business via the AWS Cloud. They are leveraging AWS offerings to build, migrate, and run their applications on AWS, including Amazon Lex, AWS Lambda, Amazon Kinesis, and Amazon Redshift. Session sponsored by Wipro
In the latest episode of AWS TechChat, Dr. Pete welcomes Olivier Klein as the new co-host. The hosts kick off the episode with information and updates around Amazon Connect, Amazon WorkSpaces, AWS Direct Connect, AWS Web Application Firewall (WAF), AWS Config, Amazon Kinesis, a new Quick Start, Amazon CloudWatch, Amazon EC2 Systems Manager, Amazon Athena, and Amazon Route 53, and wrap it up with an Amazon Connect demo.
Simon speaks with Roger Barga, General Manager for Amazon Kinesis, about how streaming is changing the way organisations are building systems and creating modern, responsive interactions with their customers. Useful Links: Amazon Kinesis: https://aws.amazon.com/kinesis/ Hands-on Tutorial: https://aws.amazon.com/getting-started/projects/build-log-analytics-solution/ Test Drive Using Demo Data: https://aws.amazon.com/blogs/big-data/writing-sql-on-streaming-data-with-amazon-kinesis-analytics-part-1/ Test Data Generator: https://github.com/awslabs/amazon-kinesis-data-generator Amazon Kinesis posts on the Big Data Blog: https://aws.amazon.com/blogs/big-data/tag/amazon-kinesis/
At AWS, the availability of our services is non-negotiable. While building our own services, such as Amazon CloudFront, we learn from and develop our own design patterns for high availability. In this session, we review several of these design patterns, and we show how you can implement the patterns in your own services or applications built on top of AWS using services such as Amazon Kinesis, AWS Elastic Beanstalk, or AWS Lambda.
Learn how AWS processes millions of records per second to support accurate metering across AWS and our customers. This session shows how we migrated from traditional frameworks to AWS managed services to support a large processing pipeline. You will gain insights on how we used AWS services to build a reliable, scalable, and fast processing system using Amazon Kinesis, Amazon S3, and Amazon EMR. Along the way we dive deep into use cases that deal with scaling and accuracy constraints. Attend this session to see AWS’s end-to-end solution that supports metering at AWS.
Building big data applications often requires integrating a broad set of technologies to store, process, and analyze the increasing variety, velocity, and volume of data being collected by many organizations. In this session, we show how you can build entire big data applications using a core set of managed services including Amazon S3, Amazon Kinesis, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, and Amazon QuickSight.
The growing popularity and breadth of use cases for IoT are challenging the traditional thinking of how data is acquired, processed, and analyzed to quickly gain insights and act promptly. Today, the potential of this data remains largely untapped. In this session, we explore architecture patterns for building comprehensive IoT analytics solutions using AWS big data services. We walk through two production-ready implementations. First, we present an end-to-end solution using AWS IoT, Amazon Kinesis, and AWS Lambda. Next, Hello discusses their consumer IoT solution built on top of Amazon Kinesis, Amazon DynamoDB, and Amazon Redshift.
Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interface (API), clickstream, unstructured, and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world's largest social platform for online giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform, RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
Amazon Kinesis is a platform of services for building real-time, streaming data applications in the cloud. Customers can use Amazon Kinesis to collect, stream, and process real-time data such as website clickstreams, financial transactions, social media feeds, application logs, location-tracking events, and more. In this session, we first cover best practices for building an end-to-end streaming data applications using Amazon Kinesis. Next, Beeswax, which provides real-time Bidder as a Service for programmatic digital advertising, will talk about how they built a feature-rich, real-time streaming data solution on AWS using Amazon Kinesis, Amazon Redshift, Amazon S3, Amazon EMR, and Apache Spark. Beeswax will discuss key components of their solution including scalable data capture, messaging hub for archival, data warehousing, near real-time analytics, and real-time alerting.
In November 2015, Capital Games launched a mobile game accompanying a major feature film release. The back end of the game is hosted in AWS and uses big data services like Amazon Kinesis, Amazon EC2, Amazon S3, Amazon Redshift, and AWS Data Pipeline. Capital Games describes some of the challenges with their initial setup and usage of Amazon Redshift and Amazon EMR. They then go over their engagement with AWS Partner 47lining and talk about specific best practices regarding solution architecture, data transformation pipelines, and system maintenance using AWS big data services. Attendees of this session should expect a candid view of the process of implementing a big data solution, from problem statement identification to visualizing data, with an in-depth look at the technical challenges and hurdles along the way.
Customers are adopting Apache Spark ‒ an open-source distributed processing framework ‒ on Amazon EMR for large-scale machine learning workloads, especially for applications that power customer segmentation and content recommendation. By leveraging Spark ML, a set of machine learning algorithms included with Spark, customers can quickly build and execute massively parallel machine learning jobs. Additionally, Spark applications can train models in streaming or batch contexts, and can access data from Amazon S3, Amazon Kinesis, Amazon Redshift, and other services. This session explains how to quickly and easily create scalable Spark clusters with Amazon EMR, build and share models using Apache Zeppelin and Jupyter notebooks, and use the Spark ML pipelines API to manage your training workflow. In addition, Jasjeet Thind, Senior Director of Data Science and Engineering at Zillow Group, will discuss his organization's development of personalization algorithms and platforms at scale using Spark on Amazon EMR.
Toyota Racing Development (TRD) developed a robust and highly performant real-time data analysis tool for professional racing. In this talk, learn how we structured a reliable, maintainable, decoupled architecture built around Amazon DynamoDB as both a streaming mechanism and a long-term persistent data store. In racing, milliseconds matter and even moments of downtime can cost a race. You'll see how we used DynamoDB together with Amazon Kinesis and Kinesis Firehose to build a real-time streaming data analysis tool for competitive racing.
Raju Gulabani, vice president of AWS Database Services (AWS), discusses the evolution of database services on AWS and the new database services and features we launched this year, and shares our vision for continued innovation in this space. We are witnessing an unprecedented growth in the amount of data collected, in many different shapes and forms. Storage, management, and analysis of this data requires database services that scale and perform in ways not possible before. AWS offers a collection of such database and other data services like Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR to process, store, manage, and analyze data. In this session, we provide an overview of AWS database services and discuss how our customers are using these services today.
How does McGraw-Hill Education use the AWS platform to scale and reliably receive 10,000 learning events per second? How do we provide near-real-time reporting and event-driven analytics for hundreds of thousands of concurrent learners in a reliable, secure, and auditable manner that is cost-effective? MHE designed and implemented a robust solution that integrates AWS API Gateway, AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Elasticsearch Service, Amazon DynamoDB, HDFS, Amazon EMR, Amazon EC2, and other technologies to deliver this cloud-native platform across the US and soon the world. This session describes the challenges we faced, architecture considerations, how we gained confidence for a successful production roll-out, and the behind-the-scenes lessons we learned.
Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time. In this session, we cover the fundamentals of using AWS Lambda to process data in real time from push sources such as AWS IoT and pull sources such as Amazon DynamoDB Streams or Amazon Kinesis. We walk through sample use cases and demonstrate how to set up some of these real-time data processing solutions. We also discuss best practices and do a deep dive into AWS Lambda real-time stream processing.
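As a sketch of the pull-source configuration described above (with hypothetical stream and function names), the snippet below creates the event source mapping that makes Lambda poll a Kinesis stream and invoke a function with batches of records.

```python
import boto3

lambda_client = boto3.client("lambda")

# Wire a hypothetical function to a Kinesis stream as a pull-based source.
# Lambda polls the shards and invokes the function with batches of records.
mapping = lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
    FunctionName="process-clickstream",
    StartingPosition="LATEST",   # or TRIM_HORIZON to replay retained data
    BatchSize=100,               # records per invocation; tune for latency vs. cost
    MaximumBatchingWindowInSeconds=5,
)
print(mapping["UUID"], mapping["State"])
```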
AWS serverless architecture components such as Amazon S3, Amazon SQS, Amazon SNS, CloudWatch Logs, DynamoDB, Amazon Kinesis, and Lambda can be tightly constrained in their operation. However, it may still be possible to use some of them to propagate payloads that could be used to exploit vulnerabilities in some consuming endpoints or user-generated code. This session explores techniques for enhancing the security of these services, from assessing and tightening permissions in IAM to integrating tools and mechanisms for inline and out-of-band payload analysis that are more typically applied to traditional server-based architectures.
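In the spirit of the permission-tightening this abstract describes, here is a hedged example of a least-privilege IAM policy that lets a producer write to exactly one Kinesis stream; the policy name and stream ARN are hypothetical.

```python
import json
import boto3

iam = boto3.client("iam")

# Least-privilege policy: the producer may only write to one named stream,
# rather than holding kinesis:* across the account.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
            "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
        }
    ],
}

iam.create_policy(
    PolicyName="clickstream-producer-write-only",
    PolicyDocument=json.dumps(policy_document),
)
```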
In this deep-dive session, we outline how to leverage the appropriate AWS services for sending different types and sizes of data, such as images or streaming video. We cover common real-world scenarios related to authentication/authorization, access patterns, data transfer, and caching for more performant mobile apps. You learn when you should access services such as Amazon Cognito, Amazon DynamoDB, Amazon S3, or Amazon Kinesis directly from your mobile app, and when you should route through Amazon API Gateway and AWS Lambda instead. Additionally, we cover coding techniques across native, hybrid, and mobile web apps using popular open-source frameworks to perform these actions efficiently and with a smooth user experience.
Quantcast provides its advertising clients the ability to run targeted ad campaigns reaching millions of online users. The real-time bidding for campaigns runs on thousands of machines across the world. When Quantcast wanted to collect and analyze campaign metrics in real-time, they turned to AWS to rapidly build a scalable, resilient, and extensible framework. Quantcast used Amazon Kinesis streams to stage data, Amazon EC2 instances to shuffle and aggregate the data, and Amazon DynamoDB and Amazon ElastiCache for building scalable time-series databases. With Elastic Load Balancing and Auto Scaling groups, they are able to set up distributed microservices with minimal operation overhead. This session discusses their use case, how they architected the application with AWS technologies integrated with their existing home-grown stack, and the lessons they learned.
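As a rough illustration of the time-series storage idea (not Quantcast's actual schema), the sketch below writes per-minute campaign aggregates to a hypothetical DynamoDB table keyed by campaign_id and timestamp, so each campaign's data points form an ordered series.

```python
import time
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("campaign-metrics")  # hypothetical table:
                                            # partition key = campaign_id,
                                            # sort key = ts (epoch seconds)

def record_minute_aggregate(campaign_id, impressions, clicks):
    """Append one per-minute aggregate to a campaign's time series."""
    table.put_item(
        Item={
            "campaign_id": campaign_id,
            "ts": int(time.time() // 60) * 60,  # truncate to the minute
            "impressions": impressions,
            "clicks": clicks,
        }
    )

record_minute_aggregate("campaign-123", impressions=1042, clicks=37)
```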
We welcome Issei Naruta as a guest to talk about AWS re:Invent and more. Show Notes
Amazon Lightsail: Simple Virtual Private Servers on AWS
In the Works – Amazon EC2 Elastic GPUs
Programmable chips turning Azure into a supercomputing powerhouse
"Isn't load that's too low already an incident?" (負荷低すぎはもはや障害じゃないのか)
Amazon Athena — Serverless Interactive Query Service
Log streaming: Amazon S3 - Streaming logs | Fastly
Presto | Distributed SQL Query Engine for Big Data
Amazon Aurora Update – PostgreSQL Compatibility
Why Uber Engineering Switched from Postgres to MySQL
AWS Snowmobile – Massive Exabyte-Scale Data Transfer Service
Amazon Polly – Lifelike Text-to-Speech
Amazon Kinesis
Fluentd | Open Source Data Collector
AWS Greengrass – Embedded Lambda Compute in Connected Devices
AWS CodeBuild – Continuous Integration Service
Blox - Open Source Tools for Amazon ECS
Packer by HashiCorp
Mark Rittman is joined once more by Stewart Bryson, talking about Oracle's recent reboot of its cloud big data platform at Oracle OpenWorld 2016, thoughts on DataFlowML and comparisons with Google's Cloud Dataflow and Amazon Kinesis, and data storytelling with Oracle Data Visualisation Desktop 2.0