Podcasts about amazon kinesis

22 PODCASTS
73 EPISODES
48m AVG DURATION
INFREQUENT EPISODES
Oct 8, 2025 LATEST

POPULARITY (chart by year, 2017 to 2024)

Best podcasts about amazon kinesis

AWS re:Invent 2017

18 episodes with amazon kinesis

AWS re:Invent 2016

15 episodes with amazon kinesis

AWS re:Invent 2019

11 episodes with amazon kinesis

Drill to Detail

4 episodes with amazon kinesis

AWS re:Invent 2018

7 episodes with amazon kinesis

AWS TechChat

2 episodes with amazon kinesis

Latest podcast episodes about amazon kinesis

Software Toolbox: OPC Server, Router, DataHub and more (P248)

The Automation Podcast

Oct 8, 2025 · 57:48 · Transcription available

Shawn Tierney meets up with Connor Mason of Software Toolbox to learn their company, products, as well as see a demo of their products in action in this episode of The Automation Podcast. For any links related to this episode, check out the “Show Notes” located below the video. Watch The Automation Podcast from The Automation Blog: Listen to The Automation Podcast from The Automation Blog: The Automation Podcast, Episode 248 Show Notes: Special thanks to Software Toolbox for sponsoring this episode so we could release it “ad free!” To learn about Software Toolbox please checkout the below links: TOP Server Cogent DataHub Industries Case studies Technical blogs Read the transcript on The Automation Blog: (automatically generated) Shawn Tierney (Host): Welcome back to the automation podcast. My name is Shawn Tierney with Insights and Automation, and I wanna thank you for tuning back in this week. Now this week on the show, I meet up with Connor Mason from Software Toolbox, who gives us an overview of their product suite, and then he gives us a demo at the end. And even if you’re listening, I think you’re gonna find the demo interesting because Connor does a great job of talking through what he’s doing on the screen. With that said, let’s go ahead and jump into this week’s episode with Connor Mason from Software Toolbox. I wanna welcome Connor from Software Toolbox to the show. Connor, it’s really exciting to have you. It’s just a lot of fun talking to your team as we prepared for this, and, I’m really looking forward to because I just know in your company over the years, you guys have so many great solutions that I really just wanna thank you for coming on the show. And before you jump into talking about products and technologies Yeah. Could you first tell us just a little bit about yourself? Connor Mason (Guest): Absolutely. Thanks, Shawn, for having us on. Definitely a pleasure to be a part of this environment. So my name is Connor Mason. 
Again, I’m with Software Toolbox. We’ve been around for quite a while. So we’ll get into some of that history as well before we get into all the the fun technical things. But, you know, I’ve worked a lot with the variety of OT and IT projects that are ongoing at this point. I’ve come up through our support side. It’s definitely where we grow a lot of our technical skills. It’s a big portion of our company. We’ll get that into that a little more. Currently a technical application consultant lead. So like I said, I I help run our support team, help with these large solutions based projects and consultations, to find what’s what’s best for you guys out there. There’s a lot of different things that in our in our industry is new, exciting. It’s fast paced. Definitely keeps me busy. My background was actually in data analytics. I did not come through engineering, did not come through the automation, trainings at all. So this is a whole new world for me about five years ago, and I’ve learned a lot, and I really enjoyed it. So, I really appreciate your time having us on here, Shawn Tierney (Host): Shawn. Well, I appreciate you coming on. I’m looking forward to what you’re gonna show us today. I had a the audience should know I had a little preview of what they were gonna show, so I’m looking forward to it. Connor Mason (Guest): Awesome. Well, let’s jump right into it then. So like I said, we’re here at Software Toolbox, kinda have this ongoing logo and and just word map of connect everything, and that’s really where we lie. Some people have called us data plumbers in the past. It’s all these different connections where you have something, maybe legacy or something new, you need to get into another system. Well, how do you connect all those different points to it? And, you know, throughout all these projects we worked on, there’s always something unique in those different projects. 
And we try to work in between those unique areas and in between all these different integrations and be something that people can come to as an expert, have those high level discussions, find something that works for them at a cost effective solution. So outside of just, you know, products that we offer, we also have a lot of just knowledge in the industry, and we wanna share that. You’ll kinda see along here, there are some product names as well that you might recognize. Our top server and OmniServer, we’ll be talking about LOPA as well. It’s been around in the industry for, you know, decades at this point. And also our symbol factory might be something you you may have heard in other products, that they actually utilize themselves for HMI and and SCADA graphics. That is that is our product. So you may have interacted it with us without even knowing it, and I hope we get to kind of talk more about things that we do. So before we jump into all the fun technical things as well, I kind of want to talk about just the overall software toolbox experience as we call it. We’re we’re more than just someone that wants to sell you a product. We we really do work with, the idea of solutions. How do we provide you value and solve the problems that you are facing as the person that’s actually working out there on the field, on those operation lines, and making things as well. And that’s really our big priority is providing a high level of knowledge, variety of the things we can work with, and then also the support. It’s very dear to me coming through the the support team is still working, you know, day to day throughout that software toolbox, and it’s something that has been ingrained into our heritage. Next year will be thirty years of software toolbox in 2026. So we’re established in 1996. Through those thirty years, we have committed to supporting the people that we work with. And I I I can just tell you that that entire motto lives throughout everyone that’s here. 
So from that, over 97% of the customers that we interact with through support say they had an awesome or great experience. Having someone that you can call that understands the products you’re working with, understands the environment you’re working in, understands the priority of certain things. If you ever have a plant shut down, we know how stressful that is. Those are things that we work through and help people throughout. So this really is the core pillars of Software Toolbox and who we are, beyond just the products, and and I really think this is something unique that we have continued to grow and stand upon for those thirty years. So jumping right into some of the industry challenges we’ve been seeing over the past few years. This is also a fun one for me, talking about data analytics and tying these things together. In my prior life and education, I worked with just tons of data, and I never fully knew where it might have come from, why it was such a mess, who structured it that way, but it’s my job to get some insights out of that. And knowing what the data actually was and why it matters is a big part of actually getting value. So if you have dirty data, if you have data that’s just clustered, it’s in silos, it’s very often you’re not gonna get much value out of it. This was a study that we found in 2024, from Garner Research, And it said that, based on the question that business were asked, were there any top strategic priorities for your data analytics functions in 2024? And almost 50%, it’s right at ’49, said that they wanted to improve data quality, and that was a strategic priority. This is about half the industry is just talking about data quality, and it’s exactly because of those reasons I said in my prior life gave me a headache, to look at all these different things that I don’t even know where they became from or or why they were so different. 
And the person that made that may have been gone may not have the contacts, and making that from the person that implemented things to the people that are making decisions, is a very big task sometimes. So if we can create a better pipeline of data quality at the beginning, makes those people’s lives a lot easier up front and allows them to get value out of that data a lot quicker. And that’s what businesses need. Shawn Tierney (Host): You know, I wanna just data quality. Right? Mhmm. I think a lot of us, when we think of that, we think of, you know, error error detection. We think of lost connections. We think of, you know, just garbage data coming through. But I I think from an analytical side, there’s a different view on that, you know, in line with what you were just saying. So how do you when you’re talking to somebody about data quality, how do you get them to shift gears and focus in on what you’re talking about and not like a quality connection to the device itself? Connor Mason (Guest): Absolutely. Yeah. We I kinda live in both those worlds now. You know, I I get to see that that connection state. And when you’re operating in real time, that quality is also very important to you. Mhmm. And I kind of use that at the same realm. Think of that when you’re thinking in real time, if you know what’s going on in the operation and where things are running, that’s important to you. That’s the quality that you’re looking for. You have to think beyond just real time. We’re talking about historical data. We’re talking about data that’s been stored for months and years. Think about the quality of that data once it’s made up to that level. Are they gonna understand what was happening around those periods? Are they gonna understand what those tags even are? Are they gonna understand what those conventions that you’ve implemented, to give them insights into this operation. Is that a clear picture? So, yeah, you’re absolutely right. 
There are two levels to this, and and that is a big part of it. The the real time data and historical, and we’re gonna get some of that into into our demo as well. It it’s a it’s a big area for the business, and the people working in the operations. Shawn Tierney (Host): Yeah. I think quality too. Think, you know, you may have data. It’s good data. It was collected correctly. You had a good connection to the device. You got it. You got it as often as you want. But that data could really be useless. It could tell you nothing. Connor Mason (Guest): Right. Exactly. Shawn Tierney (Host): Right? It could be a flow rate on part of the process that irrelevant to monitoring the actual production of the product or or whatever you’re making. And, you know, I’ve known a lot of people who filled up their databases, their historians, with they just they just logged everything. And it’s like a lot of that data was what I would call low quality because it’s low information value. Right? Absolutely. I’m sure you run into that too. Connor Mason (Guest): Yeah. We we run into a lot of people that, you know, I’ve got x amount of data points in my historian and, you know, then we start digging into, well, I wanna do something with it or wanna migrate. Okay. Like, well, what do you wanna achieve at the end of this? Right? And and asking those questions, you know, it’s great that you have all these things historized. Are you using it? Do you have the right things historized? Are they even set up to be, you know, worked upon once they are historized by someone outside of this this landscape? And I think OT plays such a big role in this, and that’s why we start to see the convergence of the IT and OT teams just because that communication needs to occur sooner. So we’re not just passing along, you know, low quality data, bad quality data as well. And we’ll get into some of that later on. 
So to jump into some of our products and solutions, I kinda wanna give this overview of the automation pyramid. This is where we work from things like the field device communications. And you you have certain sensors, meters, actuators along the actual lines, wherever you’re working. We work across all the industries, so this can vary between those. Through there, you work up kind of your control area. A lot of control engineers are working. This is where I think a lot of the audience is very familiar with PLCs. Your your typical name, Siemens, Rockwell, your Schneiders that are creating, these hardware products. They’re interacting with things on the operation level, and they’re generating data. That that was kind of our bread and butter for a very long time and still is that communication level of getting data from there, but now getting it up the stack further into the pyramid of your supervisory, MES connections, and it’ll also now open to these ERP. We have a lot of large corporations that have data across variety of different solutions and also want to integrate directly down into their operation levels. There’s a lot of value to doing that, but there’s also a lot of watch outs, and a lot of security concerns. So that’ll be a topic that we’ll be getting into. We also all know that the cloud is here. It’s been here, and it’s it’s gonna continue to push its way into, these cloud providers into OT as well. There there’s a lot of benefit to it, but there there’s also some watch outs as this kind of realm, changes in the landscape that we’ve been used to. So there’s a lot of times that we wanna get data out there. There’s value into AI agents. It’s a hot it’s a hot commodity right now. Analytics as well. How do we get those things directly from shop floor, up into the cloud directly, and how do we do that securely? It’s things that we’ve been working on. We’ve had successful projects, continues to be an interest area and I don’t see it slowing down at all. 
Now, when we begin at the bottom level of connectivity, people mostly know us for our TOP Server. This is our platform for industrial device connectivity. It’s the thing that’s talking to all those different PLCs in your plant, whether that’s brownfield or greenfield. We pretty much know that there’s never gonna be a plant with a single PLC manufacturer. There’s always gonna be something that’s slightly different. Definitely in brownfield, different engineers made different choices, things have been implemented, and you gotta keep running them. TOP Server provides this single platform to connect to a long laundry list of different PLCs. And if this sounds very familiar to KEPServer, well, you’re not wrong. KEPServer is the same exact technology that TOP Server is. What’s the difference then is probably the biggest question we usually get. The difference technology wise is nothing. It’s all the same product, same product releases, same price, but we have been the biggest single source of KEPServer or TOP Server implementations into the market for two-plus decades at this point. So as the single biggest purchaser, we have our own labeled version of KEPServer to provide to our customers. They interact with our support team, our solutions teams as well, and we sell it along the stack of other things because it fits so well. And we’ve been doing this since the early two thousands, when Kepware was a much smaller company than it is now, and we’ve had a really great relationship with them. So if you’ve enjoyed the technology of KEPServer, maybe there’s some users out there, and if you’ve ever heard of TOP Server and that has been unclear, I hope this clarifies it. But it is a great technology stack that we build upon, and we’ll get into some of that in our demo. 
Now the other question is, what if you don’t have a standard communication protocol, like Modbus, or like an Allen-Bradley PLC? We see this a lot with, you know, testing areas, pharmaceuticals, maybe also in packaging, barcode scanners, weigh scales, printers online as well. They may have some form of basic communications that talks over just TCP or serial. And how do you get that information that’s really valuable still, but it’s not going through a PLC? It’s not going into your typical HMI and SCADA. It might be a very manual process for a lot of these test systems as well, how they’re collecting and analyzing the data. Well, you may have heard of our OmniServer as well. It’s been around, like I said, for a couple decades, and it’s just a proven solution where, without coding, you can go in and build a custom protocol that expects a format from that device, translates it, puts it into standard tags, and now those tags can be accessible through the open standards of OPC, or, for AVEVA users, SuiteLink as well. And that really provides a nice combination of your standard communications and also these more custom communications that may have been done through scripting in the past. Well, you know, put this onto an actual server that can communicate through those protocols natively, and just get that data into those SCADA systems, HMIs, where you need it. Shawn Tierney (Host): You know, I used that. Many years ago, I had an integrator who came to me. This is back in the RSView days. He’s like, Shawn, I got, like, 20 Eurotherm devices on RS-485, and they speak ASCII, and I gotta get them into RSView32. And, you know, OmniServer, I love that you could basically develop, and we did Omega and some other devices too. You’re developing your own protocol, but it’s beautiful. And the fact that when you’re testing it, it color codes everything. So you know, hey. That part worked. The header worked. 
The data worked. Oh, the trailing didn’t work, or the terminated didn’t work, or the data’s not in the right format. Or I just it was a joy to work with back then, and I can imagine it’s only gotten better since. Connor Mason (Guest): Yeah. I think it’s like a little engineer playground where you get in there. It started really decoding and seeing how these devices communicate. And then once you’ve got it running, it it’s one of those things that it it just performs and, is saved by many people from developing custom code, having to manage that custom code and integrations, you know, for for many years. So it it’s one of those things that’s kinda tried, tested, and, it it’s kind of a staple still our our base level communications. Alright. So moving along kind of our automation pyramid as well. Another part of our large offering is the Cogent data hub. Some people may have heard from this as well. It’s been around for a good while. It’s been part of our portfolio for for a while as well. This starts building upon where we had the communication now up to those higher echelons of the pyramid. This is gonna bring in a lot of different connectivities. You if you’re not if you’re listening, it it’s kind of this cog and spoke type of concept for real time data. We also have historical implementations. You can connect through a variety of different things. OPC, both the profiles for alarms and events, and even OPC UA’s alarming conditions, which is still getting adoption across the, across the industry, but it is growing. As part of the OPC UA standard, we have integrations to MQTT. It can be its own MQTT broker, and it can also be an MQTT client. That has grown a lot. It’s one of those things that lives be besides OPC UA, not exactly a replacement. If you ever have any questions about that, it’s definitely a topic I love to talk about. 
There’s space to combine the benefits of both of these, and it’s so versatile and flexible for these different types of implementations. On top of that, it’s a really strong tool for conversion and aggregation. You kind of have this, like its name says, it’s a data hub. You send all the different information to this. It stores it into a hierarchy with a variety of different modeling that you can do within it. That’s gonna store these values in a standard data format. Once I have data in this, from any of those different connections, I can then send data back out. So if I have anything that I know is coming in through a certain plug-in like OPC, bring that in, send it out to any of these other ones, OPC DA over to MQTT. It could even do DDE if I’m still using that, which I probably wouldn’t suggest. But overall, there’s a lot of good benefits from having something that can also be a standardization point between all your different connections. I have a lot of different things, maybe a variety of OPC servers, legacy or newer. Bring that into a data hub, and then all your other connections, your historians, your MES, your SCADAs, can connect to that single point. So it’s all getting the same data model and values from a single source rather than going out and making many-to-many connections. A large thing that it was originally used for was getting around DCOM. That word, you know, might send some shivers down people’s spines still, to this day. It’s not a fun thing to deal with DCOM and also with the security hardening. It’s just not something that you really want to do. I’m sure there’s a lot of security professionals who would advise against ever doing it. This tunneling will allow you to have a data hub that locally talks to any DA server or client, communicate between two data hubs over a tunnel that pushes the data just over TCP, takes away all the COM wrappers, and now you just have values that get streamed in between. 
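The tunneling idea can be sketched in a few lines of Python. This is a minimal illustration, not how Cogent DataHub implements it: the newline-delimited JSON wire format, the `encode_update` and `mirror_updates` helpers, and the point names are all assumptions made up for the example. It only shows the principle that each side talks to its local hub, and plain value updates cross the network over an ordinary stream socket with no COM or DCOM involved.

```python
import json
import socket


def encode_update(point: str, value, quality: str = "Good") -> bytes:
    """Serialize one point update as a newline-terminated JSON record (assumed format)."""
    record = {"point": point, "value": value, "quality": quality}
    return (json.dumps(record) + "\n").encode("utf-8")


def mirror_updates(conn: socket.socket, local_points: dict) -> None:
    """Read updates until the peer closes the stream, applying each to the local hub."""
    with conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            record = json.loads(line)
            local_points[record["point"]] = (record["value"], record["quality"])


# socketpair() stands in for the TCP connection between the two hubs.
plant_side, office_side = socket.socketpair()

# Plant-side hub: values it read locally from an OPC DA server (hypothetical names).
plant_side.sendall(encode_update("Line1.FlowRate", 42.5))
plant_side.sendall(encode_update("Line1.Pressure", 3.1, quality="Uncertain"))
plant_side.close()  # closing ends the stream so the reader loop finishes

# Office-side hub: rebuilds the same points for its own local clients.
office_points: dict = {}
mirror_updates(office_side, office_points)
office_side.close()

print(office_points)
```

In a real deployment the two ends would sit on different machines and the link would normally be encrypted and authenticated; the sketch leaves that out to keep the data flow visible.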
Now you don’t have to configure any DCOM at all, and it’s all local. So a lot of people went transitioning, between products where maybe the server only supports OPC DA, and then the client is now supporting OPC UA. They can’t change it yet. This has allowed them to implement a solution quickly and cost and at a cost effective price, without ripping everything out. Shawn Tierney (Host): You know, I wanna ask you too. I can see because this thing is it’s a data hub. So if you’re watching and you’re if you’re listening and not watching, you you’re not gonna see, you know, server, client, UAD, a broker, server, client. You know, just all these different things up here on the site. Do you what how does somebody find out if it does what they need? I mean, do you guys have a line they can call to say, I wanna do this to this. Is that something Data Hub can do, or is there a demo? What would you recommend to somebody? Connor Mason (Guest): Absolutely. Reach out to us. We we have a a lot of content outline, and it’s not behind any paywall or sign in links even. You you can always go to our website. It’s just softwaretoolbox.com. Mhmm. And that’s gonna get you to our product pages. You can download any product directly from there. They have demo timers. So typically with, with coaching data hub, after an hour, it will stop. You can just rerun it. And then call our team. Yeah. We have a solutions team that can work with you on, hey. What do I need as well? Then our support team, if you run into any issues, can help you troubleshoot that as well. So, I’ll have some contact information at the end, that’ll get some people to, you know, where they need to go. But you’re absolutely right, Shawn. Because this is so versatile, everyone’s use case of it is usually something a little bit different. And the best people to come talk to that is us because we’ve we’ve seen all those differences. 
So Shawn Tierney (Host): I think a lot of people run into the fact, like, they have a problem. Maybe it’s the one you said where they have the OPC UA and it needs to connect to an OPC DA client. And, you know, and a lot of times, they’re they’re a little gunshot to buy a license because they wanna make sure it’s gonna do exactly what they need first. And I think that’s where having your people can, you know, answer their questions saying, yes. We can do that or, no. We can’t do that. Or, you know, a a demo that they could download and run for an hour at a time to actually do a proof of concept for the boss who’s gonna sign off on purchasing this. And then the other thing is too, a lot of products like this have options. And you wanna make sure you’re buying the ticking the right boxes when you buy your license because you don’t wanna buy something you’re not gonna use. You wanna buy the exact pieces you need. So I highly recommend I mean, this product just does like, I have, in my mind, like, five things I wanna ask right now, but not gonna. But, yeah, def definitely, when it when it comes to a product like this, great to touch base with these folks. They’re super friendly and helpful, and, they’ll they’ll put you in the right direction. Connor Mason (Guest): Yeah. I I can tell you that’s working someone to support. Selling someone a solution that doesn’t work is not something I’ve been doing. Bad day. Right. Exactly. Yeah. And we work very closely, between anyone that’s looking at products. You know, me being as technical product managers, well, I I’m engaged in those conversations. And Mhmm. Yeah. If you need a demo license, reach out to us to extend that. We wanna make sure that you are buying something that provides you value. Now kind of moving on into a similar realm. This is one of our still somewhat newer offerings, I say, but we’ve been around five five plus years, and it’s really grown. 
And I kinda said here, it’s called OPC router, and and it’s not it’s not a networking tool. A lot of people may may kinda get that. It’s more of a, kind of a term about, again, all these different type of connections. How do you route them to different ways? It it kind of it it separates itself from the Cogent data hub, and and acting at this base level of being like a visual workflow that you can assign various tasks to. So if I have certain events that occur, I may wanna do some processing on that before I just send data along, where the data hub is really working in between converting, streaming data, real time connections. This gives you a a kind of a playground to work around of if I have certain tasks that are occurring, maybe through a database that I wanna trigger off of a certain value, based on my SCADA system, well, you can build that in in these different workflows to execute exactly what you need. Very, very flexible. Again, it has all these different type of connections. The very unique ones that have also grown into kind of that OT IT convergence, is it can be a REST API server and client as well. So I can be sending out requests to, RESTful servers where we’re seeing that hosted in a lot of new applications. I wanna get data out of them. Or once I have consumed a variety of data, I can become the REST server in OPC router and offer that to other applications to request data from itself. So, again, it can kind of be that centralized area of information. The other thing as we talked about in the automation pyramid is it has connections directly into SAP and ERP systems. So if you have work orders, if you have materials, that you wanna continue to track and maybe trigger things based off information from your your operation floors via PLCs tracking, how they’re using things along the line, and that needs to match up with what the SAP system has for, the amount of materials you have. This can be that bridge. 
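The trigger-then-act pattern described here can be sketched in plain Python. This is not OPC Router's configuration or API, which is built visually; the `Workflow` class, the `batch_done` flag, and the `batches` table are hypothetical names chosen for illustration, and an in-memory SQLite database stands in for whatever database or ERP system sits on the other end.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class Workflow:
    """One trigger -> action pairing, loosely modeled on a visual workflow."""
    name: str
    trigger: Callable[[dict], bool]                      # should this event fire?
    action: Callable[[dict, sqlite3.Connection], None]   # what to do when it fires


def log_batch_complete(event: dict, db: sqlite3.Connection) -> None:
    """Action: record the finished batch in a database table."""
    db.execute(
        "INSERT INTO batches (line, batch_id, units) VALUES (?, ?, ?)",
        (event["line"], event["batch_id"], event["units"]),
    )
    db.commit()


def run(workflows: list, events: list, db: sqlite3.Connection) -> None:
    """Feed each incoming event to every workflow whose trigger matches."""
    for event in events:
        for wf in workflows:
            if wf.trigger(event):
                wf.action(event, db)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE batches (line TEXT, batch_id TEXT, units INTEGER)")

workflows = [
    Workflow(
        name="log completed batches",
        trigger=lambda e: e.get("batch_done") is True,
        action=log_batch_complete,
    )
]

# Hypothetical tag snapshots, as they might arrive from a PLC or SCADA system.
events = [
    {"line": "Line1", "batch_id": "B-100", "units": 480, "batch_done": False},
    {"line": "Line1", "batch_id": "B-100", "units": 500, "batch_done": True},
    {"line": "Line2", "batch_id": "B-101", "units": 250, "batch_done": True},
]

run(workflows, events, db)
rows = db.execute(
    "SELECT line, batch_id, units FROM batches ORDER BY batch_id"
).fetchall()
print(rows)
```

Only the two snapshots with the completion flag set reach the database, which is the point of putting a condition in front of the transfer instead of streaming every value.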
It’s really is built off the mindset of the OT world as well. So we kinda say this helps empower the OT level because we’re now giving them the tools to that they understand what what’s occurring in their operations. And what could you do by having a tool like this to allow you to kind of create automated workflows based off certain values and certain events and automate some of these things that you may be doing manually or doing very convoluted through a variety of solutions. So this is one of those prod, products as well that’s very advanced in the things that supports. Linux and Docker containers is, is definitely could be a hot topic, rightly fleet rightfully so. And this can run on a on a Docker container deployed as well. So we we’ve seen that with the I IT folks that really enjoy being able to control and to higher deployment, allows you to update easily, allows you to control and spin up new containers as well. This gives you a lot of flexibility to to deploy and manage these systems. Shawn Tierney (Host): You know, I may wanna have you back on to talk about this. I used to there’s an old product called Rascal that I used to use. It was a transaction manager, and it would based on data changing or on a time that as a trigger, it could take data either from the PLC to the database or from the database to the PLC, and it would work with stored procedures. And and this seems like it hits all those points, And it sounds like it’s a visual like you said, right there on the slide, visual workflow builder. Connor Mason (Guest): Yep. Shawn Tierney (Host): So you really piqued my interest with this one, and and it may be something we wanna come back to and and revisit in the future, because, it just it’s just I know that that older product was very useful and, you know, it really solved a lot of old applications back in the day. Connor Mason (Guest): Yeah. Absolutely. And this this just takes that on and builds even more. 
If you if anyone was, kind of listening at the beginning of this year or two, a conference called Prove It that was very big in the industry, we were there to and we presented on stage a solution that we had. Highly recommend going searching for that. It’s on our web pages. It’s also on their YouTube links, and it’s it’s called Prove It. And OPC router was a big part of that in the back end. I would love to dive in and show you the really unique things. Kind of as a quick overview, we’re able to use Google AI vision to take camera data and detect if someone was wearing a hard hat. All that logic and behind of getting that information to Google AI vision, was through REST with OPC router. Then we were parsing that information back through that, connection and then providing it back to the PLCs. So we go all the way from a camera to a PLC controlling a light stack, up to Google AI vision through OPC router, all on hotel Wi Fi. It’s very imp it’s very, very fun presentation, and, our I think our team did a really great job. So a a a pretty new offering I have I wanna highlight, is our is our data caster. This is a an actual piece of hardware. You know, our software toolbox is we we do have some hardware as well. It’s just, part of the nature of this environment of how we mesh in between things. But the the idea is that, there’s a lot of different use cases for HMI and SCADA. They have grown so much from what they used to be, and they’re very core part of the automation stack. Now a lot of times, these are doing so many things beyond that as well. What we found is that in different areas of operations, you may not need all that different control. You may not even have the space to make up a whole workstation for that as well. What this does, the data caster, is, just simply plug it plugs it into any network and into an HDMI compatible display, and it gives you a very easy configure workplace to put a few key metrics onto a screen. 
So you can connect directly to PLCs like Allen-Bradley. You can connect to SQL databases. You can also connect to REST APIs to gather the data from these different sources and build an easy-to-view KPI dashboard, in a way. So if you’re on an operation line and you wanna look at your current run rate, maybe you have certain things in the PLC tags, you know, flow and pressure, that are very important for those operators to see. They may not even have the capacity to be interacting with anything. They just need visualizations of what’s going on. This product can just be installed, you know, in industrial areas with any type of display that you can easily access, and give them something that they can easily look at. It’s configured all through a web browser to display what you want. You can put on different colors based on levels of values as well. And it’s just, I feel like, a very simple thing. Sometimes it seems so simple, but those might be the things that provide value on the actual operation floor. This is, for anyone that’s watching, kind of a quick view of a very simple screen. What we’re showing here is what it would look like from all the different data sources. So talking directly to ControlLogix PLCs, talking to SQL databases, Micro800s, a REST client, and what’s coming very soon, definitely by the end of this year, is OPC UA support. So any OPC UA server that’s out there that’s already holding your PLC data, etcetera, this could also connect to that and get values from there. Shawn Tierney (Host): Can you make it, here I go, can you make it so it changes pages every few seconds? Connor Mason (Guest): Right now, it is a single page, but this is, like I said, a very new product, so we’re taking any feedback. If, yeah, if there’s this type of slideshow cycle that would be, you know, valuable to anyone out there, let us know. 
We’re definitely always interested to hear from the people who are actually working out at these operation sites about what’s valuable to them.

Shawn Tierney (Host): A lot of kiosks you see when you’re traveling will say, I’ll just throw this out there, line one, and that’ll be on there for five seconds, and then it’ll go to line two, and that’ll be on there for five seconds, and so on. That’s why I mention it, because I can see that being a question I would get from somebody asking me about it.

Connor Mason (Guest): Oh, great question. Appreciate it. Alright. So now we’re gonna set aside time for a little hands-on demo. For anyone that’s just listening, I’m gonna talk about this at a high level and walk through everything. The idea is that we have a few different PLCs, a very common Allen-Bradley and a Siemens S7-1500, in our office, pretty close to me on the other side of the wall, actually. We’re gonna start by connecting those to our TOP Server like we talked about. This is our industrial communication server, which offers OPC DA, OPC UA, and SuiteLink connectivity. Then we’re gonna bring this into our Cogent DataHub. As we talked about, this is what gets those values up to the higher levels. We’ll also be tunneling the data. We talked about being able to share data through the DataHubs themselves, and I’ll explain why we’re doing that here and the value you can add. Then we’re also gonna showcase adding MQTT at this level. I’m taking data from just these two PLCs sitting on a rack, and I can automatically make all that information available in the MQTT broker. So any MQTT client out there that wants to subscribe to that data now has it accessible. And I’ve created this all through a really simple workflow. We also have some databases connected.
InfluxDB, which we install with the Cogent DataHub, has a free visualization tool that helps you see what’s going on in your processes. I wanna showcase a little bit of that as well. Alright. So now jumping into our demo, where we start off is our TOP Server. Like I mentioned before, if anyone has worked with KEPServer in the past, this is gonna look very similar, because it is. It’s the same technology. The first thing I wanted to establish in our demo was the connection to our PLCs. I have a few here. We’re only gonna use the Allen-Bradley and the Siemens for the time we have. How this builds out as a platform is that you create different channels and the device connections beneath them. These are your physical connections. It’s either a TCP/IP connection or maybe a serial connection. We have support for all of them. It really is a long list. Anyone watching can see all the different drivers we offer. By bringing this into a single platform, you can have all your connectivity based here. Everything you have up the stack, your SCADA, your historians, even MES, can go to a single source. That makes management and troubleshooting a bit easier as well. I have this built out already, but I’ll walk through what you would typically do. You have your Allen-Bradley ControlLogix Ethernet driver here first. I have some IPs in here I won’t show, but regardless, we have our drivers here, and then we have a set of tags. These are all the global tags in the PLC program. How I got these to map automatically is that our driver is able to create tags automatically. You’re able to send a command to the device and ask for its entire tag database.
It comes back, provides all of that, maps it out for you, and creates those tags as well. This saves a lot of time compared with an engineer having to go in and address all the individual items themselves. So once it’s defined in the PLC project, you’re able to bring it all in automatically. I’ll show now how easy that makes connecting to something like the Cogent DataHub. In a very similar fashion, we have a connection over here to the Siemens PLC that I also have. You can see beneath it all these different tag structures, and this was created the exact same way. As long as the PLC supports it, you can do automatic tag generation, bring in all the structure you’ve already built in your PLC programming, and make it available on this OPC server as well. So that’s really the basis. We first need to establish communications to these PLCs and get that tag data, and now what do we wanna do with it? In this demo, what I wanted to bring up next was the Cogent DataHub. Here, I see a very similar kind of layout. We have a different set of plugins on the left side. For anyone listening, the Cogent DataHub, again, is our aggregation and conversion tool. It handles all these different protocols like OPC UA, OPC DA, and OPC A&E for alarms and events. We also support OPC UA Alarms and Conditions, which is the newer profile for alarms in OPC UA. We have a variety of ways you can get data out of things and into the DataHub. We can also do bridging. This concept is how you share data between different points. So let’s say I had a connection to one OPC server, and it was communicating to a certain PLC, and there were certain registers I was getting data from. Now I also wanna connect to a different OPC server that has an entirely different brand of PLCs, and maybe I wanna share data between them directly. Well, with this software, I can just bridge those points between them.
Once they’re in the DataHub, I can do kind of whatever I want with them. I can allow them to write between those PLCs and share data that way, and you’re no longer having to do any type of hardwiring directly between devices that aren’t otherwise able to communicate with each other. Through the standards of OPC and this variety of communication levels, I can integrate them together.

Shawn Tierney (Host): You know, you bring up a good point. When you do something like that, is there any heartbeat? Like, under General or under one of these topics, are there tags from DataHub itself that can be sent to the destination, like a heartbeat or, you know, a transaction count?

Connor Mason (Guest): Yeah. Absolutely. There’s a pretty strong scripting engine in here as well, and I have done that in the past, where you make internal tags. That could be a timer. It could be a counter. It allows you to create your own tags and then share them through a bridge connection to a PLC. So, yeah, there are definitely some people who have had those use cases where they wanna track something on the software side and get it out to those hardware PLCs. Absolutely.

Shawn Tierney (Host): I mean, when you send data out of the PLC, the PLC doesn’t care who takes its data. But when you’re getting data into the PLC, you wanna make sure it’s updating and it’s fresh. So, you know, you throw a counter in there with the scripting and you’re able to have that. As long as you see it incrementing, you know you’ve got good data coming in. That’s a good feature.

Connor Mason (Guest): Absolutely. You know, another big one is the redundancy. Beyond just OPC, we can add redundancy to basically anything that has two instances of it running, so any of these different connections.
How it’s unique is that it just looks at the buckets of data that you create. For example, if I have two different OPC servers and I put them into two areas, let’s say OPC server one and OPC server two, I can now create an OPC redundancy data bucket. Any client that connects externally and wants that data is gonna talk to that bucket. And that bucket is going to automatically change between sources as things go down and come back up, and the client would never know that happened unless you wanted it to. There are internal tags to show what the current source is, but the idea is to make this transparent, kind of hidden. Regardless of what’s going on in operations, if I have this set up, my external applications can just read from a single source without knowing that there are two things behind it actually controlling that. It’s very important for, you know, historian connections where you wanna have a full, complete picture of the data coming in. You’re able to make a redundant connection to two different servers and then allow the historian to talk to a single point, so it doesn’t have to control the switching back and forth. It will just see that data flow seamlessly as long as either one is up at the time. Beyond that, there are quite a few other things in here. I don’t think we have time to cover all of them. But for our demo, what I wanna focus on first is our OPC UA connection. This allows us to act as an OPC UA client to get data from any servers out there, like our TOP Server. We can also act as an OPC UA server ourselves. So if you have multiple connections to different servers, and multiple connections to other things that aren’t OPC, I can now provide all this data automatically in my own namespace and allow things to connect to me as well.
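Editor’s sketch: the heartbeat counter and the redundancy bucket discussed above can be combined in a few lines of plain Python. This is only an illustration of the idea, not how the Cogent DataHub implements it. The names `Source`, `RedundantBucket`, `timeout`, and `current_source` are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Source:
    """Hypothetical stand-in for one of two redundant data sources."""
    name: str
    heartbeat: Optional[int] = None          # last counter value seen
    last_change: float = float("-inf")       # time the counter last changed
    values: Dict[str, float] = field(default_factory=dict)

    def update(self, now: float, heartbeat: int, values: Dict[str, float]) -> None:
        # Only a changing counter proves the source is alive.
        if heartbeat != self.heartbeat:
            self.heartbeat = heartbeat
            self.last_change = now
        self.values = dict(values)

    def is_fresh(self, now: float, timeout: float) -> bool:
        return (now - self.last_change) <= timeout


class RedundantBucket:
    """Single read point that switches between two sources when one goes stale."""

    def __init__(self, primary: Source, secondary: Source, timeout: float = 5.0):
        self.sources = [primary, secondary]
        self.active = 0
        self.timeout = timeout

    @property
    def current_source(self) -> str:
        return self.sources[self.active].name

    def read(self, tag: str, now: float) -> Optional[float]:
        # Switch only if the active source is stale AND the other one is fresh.
        # If both are stale, keep serving the last known value from the active one.
        if not self.sources[self.active].is_fresh(now, self.timeout):
            other = 1 - self.active
            if self.sources[other].is_fresh(now, self.timeout):
                self.active = other
        return self.sources[self.active].values.get(tag)
```

A client only ever calls `read` on the bucket, so the switch stays hidden unless it chooses to inspect `current_source`, which mirrors the "internal tags" Connor mentions.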
And that’s part of the aggregation feature I was mentioning before. So with that, I have a connection here. It’s pulling data from my TOP Server. I have a few different tags from my Allen-Bradley and my Siemens PLCs selected. The next part of this that I was mentioning was the tunneling. Like I said, this is very popular for getting around DCOM issues, but there are a lot of reasons why you may still use it beyond just the headache of DCOM. It runs on a TCP stream that takes all the data points as a value, a quality, and a timestamp, and it can mirror those to another DataHub instance. So if I wanna get things across a network, like from my OT side, where previously I would have to allow an open port into my network for any OPC UA clients across the network to access it, I can now change the direction and tunnel data out of my network without opening up any inbound ports. This is really big for security. If you’re a security professional, or you’re an engineer who has to work with your IT and security teams a lot, you don’t wanna have an open port, especially into your operations and OT side. So this allows you to change the direction of flow and push data out into another area, like a DMZ computer or a business-level computer. The other thing I have configured in this demo, as a benefit of streaming data across this tunnel connection, is that I can also store the data locally in an InfluxDB database. The purpose is that I can historize it, and then if the connection ever goes down, backfill any information that was lost while the tunnel was down. With real-time data scenarios like OPC UA, unless you have historical access, you would lose a lot of data if that connection ever went down.
But with this, I can use InfluxDB on the back end to buffer any values. When my connection comes back up, I pass them along the stream again. And if I have anything connected that stores history, like another InfluxDB, maybe a PI historian, an AVEVA historian, or any historian offering out there that allows that connection, I can provide all the records that were originally missed and backfill them into those systems. So I’ve switched over to a second machine. It’s gonna look very similar here. This also has an instance of the Cogent DataHub running. For anyone not watching, what we have on this side is the portion of the tunneler that sits here and listens for any data coming in. On my first machine, I was able to connect my PLCs and gather that information into Cogent DataHub, and now I’m pushing it across the network to a separate machine that’s sitting here listening. What I can quickly do is make sure I have all my data here. I have these different points from my Allen-Bradley PLC: a few simulation demo points like temperature, pressure, and tank level, plus a few statuses, and all of this is updating directly through that stream as the PLC updates it. I also have my Siemens controller, with some current values and a few counter tags. All of this, again, is being streamed directly through the tunnel. I’m not connecting to an OPC server at all on this side. I can show you that here. There are no connections configured. I’m not talking to the PLCs directly from this machine either. But I’m able to pass all the information through without opening up any ports on my OT demo machine, per se. So what’s the benefit of that? Well, again, security. Also, the ability to do store and forward. On the other side, I was logging directly to an InfluxDB.
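Editor’s sketch: the store-and-forward behavior described above (buffer while the tunnel is down, backfill in order once it returns) can be illustrated with an in-memory queue. This is a simplified assumption-laden example, not the DataHub’s mechanism: the demo buffers in InfluxDB, whereas here a `deque` stands in for the local store, and `StoreAndForward`, `send`, and `max_buffer` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# One sample as described in the demo: tag name, value, quality, timestamp.
Sample = Tuple[str, float, str, float]


class StoreAndForward:
    """Buffers samples locally and forwards them in order when the link is up."""

    def __init__(self, send: Callable[[Sample], bool], max_buffer: int = 10_000):
        # `send` is a hypothetical transport callback; it returns True on success.
        self._send = send
        # With maxlen set, the oldest samples are dropped if the buffer overflows.
        self._buffer: Deque[Sample] = deque(maxlen=max_buffer)

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def publish(self, sample: Sample) -> None:
        # Always store first, then try to drain, so ordering is preserved.
        self._buffer.append(sample)
        self.flush()

    def flush(self) -> int:
        """Send buffered samples oldest-first; stop at the first failure."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break
            self._buffer.popleft()
            sent += 1
        return sent
```

Because each sample carries its own timestamp, a historian on the receiving side can place backfilled records at the time they were measured rather than the time they arrived.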
That could be my buffer, and I was able to configure it so that if any values were lost, it stores them and forwards them across the network. Now on this side, if I pull up Chronograf, which is a free visualization tool that installs with the DataHub as well, I can see some very nice visual diagrams of what is going on with this data. I have a pressure that is just a simulator in this Allen-Bradley PLC that ramps up and comes back down. It’s not actually connected to anything reading a real pressure, but you can see it over time as I change through these different time ranges. I might go back a little far, but I have a lot of data stored in here. For a while during my test, I turned this off and made it fail, but then I came back in and was able to recover all the data and backfill it. So through these views, I can see that as the connection drops and comes back, I still have a clean, cyclical view of the data, because it was able to recover and store and forward from that source. Like I said, Shawn, data quality is a big thing in this industry. It’s a big thing both for people on the operations side and for people making decisions at the business layer. So being able to have a full picture, without gaps, is definitely something you should be prioritizing when you can.

Shawn Tierney (Host): Now what we’re seeing here is that you’re using InfluxDB on this destination PC, or IT-side PC, and Chronograf, which is that utility that gets installed with it. It’s free. But you don’t actually have to use that. You could have sent this to OSI PI or somebody else’s historian. Right? Can you name some of the historians you work with? I know OSI PI.

Connor Mason (Guest): Exactly. Yeah. Absolutely. So there are quite a few different ones.
As far as what we support in the DataHub natively, there’s Amazon Kinesis, the cloud-hosted service, where we can do the same things from here as well. There’s AVEVA Historian, AVEVA Insight, and Apache Kafka. Kafka is kind of a newer one that used to be a very IT-oriented solution and is now getting into OT. It’s a similar structure, where things are stored in different topics that we can stream to. On top of that, there are just regular old ODBC connections. That opens up a lot of different ways you can do it. Or there’s even the old classic, OPC HDA. So if you have any historians that can accept an OPC HDA connection, we can also stream through that.

Shawn Tierney (Host): Excellent. That’s a great list.

Connor Mason (Guest): The other thing I wanna show while we still have some time is the MQTT component. This is really growing, and it’s gonna continue to be a part of the industrial automation technology stack and conversations moving forward, for streaming data from devices and edge devices up into different layers: into OT, then maybe out to IT and our business levels, and definitely into the cloud, where we’re seeing a lot of growth. Like I mentioned, the big benefit of DataHub is that I have all these different connections and can consume all this data. Well, I can also act as an MQTT broker. What a broker typically does in MQTT is route and share data. It’s that central point where things come to either say, “Hey, I’m giving you some new values. Share them with someone else,” or, “Hey, I need these values. Can you give me those?” It fits in super well with what this product is at its core. So all I have to do here is enable it. To show what that allows, I have MQTT Explorer as an example client. If anyone has worked with MQTT, you’re probably familiar with it. There’s nothing else I configured beyond enabling the broker.
And you can see within this structure that I have all the same data that was already in my DataHub, the same things I was collecting from my PLCs and TOP Server. Now I’ve exposed these as MQTT points, and I have them in JSON format with the value and the timestamp. You can even see a little trend here that matches what we saw in Influx. This enables all those different cloud connectors that wanna speak this language to do it seamlessly.

Shawn Tierney (Host): So you didn’t have to set up the PLCs a second time to do this?

Connor Mason (Guest): Nope. Not at all.

Shawn Tierney (Host): You just enabled this, and now the data’s going this way as well.

Connor Mason (Guest): Exactly. Yeah. That’s a really strong point of the Cogent DataHub: once you have everything in its structure and model, you just enable it to use any of these different connections. You can get really, really creative with these things. Like we talked about with the bridging aspect, getting into different systems, even writing down to the PLCs. You can make custom notifications and email alerts based on any of these values. You could even take something like this MQTT connection, tunnel it across to another DataHub, and maybe then convert it to OPC DA. Now you’ve made a new connection over to something very legacy as well.

Shawn Tierney (Host): Yeah. I mean, the options here are just pretty amazing, all the different things that can be done.

Connor Mason (Guest): Absolutely. Well, I wanna jump back into some of our presentation here while we’ve still got the time. Now that we’re done with our demo, there are so many different ways that you can use these tools.
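Editor’s sketch: the MQTT points shown in the demo are described as JSON with a value and a timestamp. The exact topic layout and field names the DataHub uses are not given in the episode, so everything below is an assumption for illustration: the function `to_mqtt_message`, the dotted point name mapped to a slash-separated topic, and the extra `quality` field (borrowed from the value, quality, timestamp triple mentioned for tunneling).

```python
import json
from datetime import datetime, timezone
from typing import Tuple, Union

Number = Union[int, float]


def to_mqtt_message(domain: str, point: str, value: Number,
                    quality: str, timestamp: float) -> Tuple[str, str]:
    """Build a (topic, JSON payload) pair for one data point.

    All names and the payload shape are hypothetical, not the DataHub format.
    """
    # Map a dotted point name such as "AllenBradley.Pressure" to a topic path.
    topic = f"{domain}/{point.replace('.', '/')}"
    payload = json.dumps(
        {
            "value": value,
            "quality": quality,
            # ISO 8601 in UTC keeps timestamps unambiguous across sites.
            "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        },
        sort_keys=True,
    )
    return topic, payload
```

Any MQTT client library could then publish the returned pair to a broker; the transport is left out here to keep the example free of third-party dependencies.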
This is just a really simple view of something that used to be very simple, just connecting OPC servers to a variety of different connections, and then expanding on that with store and forward, the local Influx usage, and getting out to things like MQTT as well. But there’s a lot more you can do with these solutions. So like Shawn said, reach out to us. We’re happy to engage and see what we can help you with. I have a few other things before we wrap up. Overall, we’ve worked across nearly every industry. We have installations across the globe on all continents. And like I said, we’ve been around for a long time, pushing thirty years next year. So we’ve seen a lot of different things, and we really wanna talk to anyone out there who maybe has some struggles with connectivity or has any ongoing projects. If you work in these industries, or if your industry isn’t marked here and you have anything going on that you need help with, we’re very happy to sit down and let you know if there’s something we can do.

Shawn Tierney (Host): Yeah. For those who are listening, I mean, we see most of the big energy and consumer product companies on that slide, and a lot of car manufacturers. I’m not gonna read them off, but these are the household name brands that everybody knows and loves.

Connor Mason (Guest): So to wrap some things up here: we’ve talked about the different ways we’ve helped solve things in the past, but I wanna highlight some of the unique ones that we’ve also gone on to do case studies and success stories on. This one I actually got to work on within the last few years. A plastic packaging manufacturer was looking to track uptime and downtime across multiple lines, and they had a new cloud solution that they were already evaluating and were really excited to get into play.
They saw a lot of upside in getting things connected to it and starting to use it. What they had was a lot of different PLCs, a lot of different brands, and different areas of operation that they needed to connect to. So the first step was to get that into our TOP Server, similar to what we showed in our demo. We just needed to get all the data into a centralized platform first and make it accessible. From there, once they had all that information in a centralized area, they used the Cogent DataHub to help aggregate it and transform it to be sent to the cloud through MQTT. So this is very similar to the demo, and it’s a real use case: getting information from PLCs, structuring it the way the cloud system needed it for MQTT, and streamlining that data connection to where it’s now just running in operation. They constantly have updates about where their lines are in operation, tracking their downtime and their uptime, and they’re able to do some predictive analytics in that cloud solution based on their history. So this really enabled them to build from what they had, which involved a lot of manual tracking, into an entirely automated system where management can see real views of what’s going on at the operation level. Another success story we were able to do was with Ace Automation. They worked with a pharmaceutical company. Ace Automation is an SI, and they were brought in and doing a lot of work with some old DDE connections and some custom Excel macros, and were having a hard time maintaining legacy systems that were just a pain to deal with. They were working with older files from some old InTouch HMIs, and what they needed was something that was not just based on Excel and custom macros.
So one product we didn’t get to talk about yet, but that we also carry, is our LGH File Inspector. It’s able to take these files and put them out into a standardized format like CSV, and also automate a lot of the process: When should these files be queried? Should they be queried for different lengths of time? Should they be output to different areas? Can I set these up as a scheduled task so it’s done automatically, rather than someone having to sit down and do it manually in Excel? With this solution, they were able to recover over fifty hours of engineering time that had gone to late-night calls to troubleshoot an Excel macro that stopped working, or machines crashing because they were running legacy systems to still support some of the DDE servers, and it saved them, you know, almost two hundred plus hours of productivity. In another example, we were able to work with a renewable energy customer that’s doing a lot of innovative things across North America. They had a very ambitious plan to double their footprint in the next two years. With that, they had to really look back at their assets, see where they currently stood, and ask how to set new standards to support growing into what they wanted to be. They had a lot of different data sources, all kind of siloed at specific sites. Nothing was really connected in common to a corporate-level area for historization, control, and security. So again, they were able to use our TOP Server as a standard connectivity platform and bring in the DataHub as an aggregation tool. Each of these sites would have a TOP Server that was individually collecting data from different devices and sending it into a single DataHub. So now their corporate level had a view of all the information from these different plants in one single application.
That then enabled them to connect their historian applications to that DataHub, get a full view, and make visualizations of their entire operations. What this allowed them to do was grow without replacing everything. And that’s a big thing we try to stress: replacing and ripping out all your existing technologies is not something you can do overnight. So how do we provide value and gain efficiency with what’s in place, and add newer technologies on top of that without disrupting the actual operation? This was really, really successful. And at the end, I just wanna provide some contacts and information so people can learn more. We have a blog that goes out every week on Thursdays. There’s a lot of good technical content out there, a lot of recaps of the awesome things we get to do here, and the success stories as well, and you can always find that at blog.softwaretoolbox.com. And again, our main website is softwaretoolbox.com. You can get product information and downloads, or reach out to anyone on our team. Let’s discuss what issues you have going on or any new projects. We’ll be happy to listen.

Shawn Tierney (Host): Well, Connor, I wanna thank you very much for coming on the show and bringing us up to speed, not only on Software Toolbox, but also on TOP Server, and for doing that demo with TOP Server and DataHub. Really appreciate that. And, like you just said, if anybody has any projects that you think these solutions may be able to solve, please give them a call. And if you’ve already done something with them, leave a comment. No matter where you’re watching or listening to this, let us know what you did. What did you use? Like me, I used OmniServer all those many years ago, and, of course, TOP Server as an OPC server.
And, of course, Symbol Factory, I use that all the time. If you guys are already using Software Toolbox products, let us know in the comments. It’s always great to hear from people out there. I know there are thousands of you listening every week, and I’d love to hear: are you using these products? Or if you have questions, put them in the comments and I’ll funnel them over to Connor. So with that, Connor, did you have anything else you wanted to cover before we close out today’s show?

Connor Mason (Guest): I think that was it, Shawn. Thanks again for having us on. It was really fun.

Shawn Tierney (Host): I hope you enjoyed that episode, and I wanna thank Connor for taking time out of his busy schedule to come on the show and bring us up to speed on Software Toolbox and their suite of products. I really appreciated that demo at the end too, so if you’re watching, we actually got a look at their products and how they work. And I really appreciate them taking all of my questions. I also appreciate the fact that Software Toolbox sponsored this episode, meaning we were able to release it to you without any ads. If you’re doing any business with Software Toolbox, please thank them for sponsoring this episode. And with that, I just wanna wish you all good health and happiness. And until next time, my friends, peace.

Until next time, Peace ✌️ If you enjoyed this content, please give it a Like, and consider Sharing a link to it as that is the best way for us to grow our audience, which in turn allows us to produce more content

Apache Druid News! Druid 30.0 is Live and Druid Summit 2024 Announced with Hugh Evans and Will Xu

Tales at Scale

Jul 31, 2024 37:44

On this episode, we are joined by special co-host Hugh Evans and returning guest Will Xu as we announce Druid Summit 2024 and dive into Druid 30.0's new features and enhancements. Improvements include better ingestion for Amazon Kinesis and Apache Kafka, enhanced support for Delta Lake, and advanced integrations with Google Cloud Storage and Azure Blob Storage. Come for the technical upgrades like GROUP BY and ORDER BY for complex columns and faster query processing with new IN and AND filters, stay for the stabilized concurrent append and replace API for late-arriving streaming data. We also explore experimental features like the centralized data source schema for better performance. Tune in to learn about the latest on arrays, the upcoming GA for window functions, and the benefits of upgrading Druid! To submit a talk or register for Druid Summit 2024, visit https://druidsummit.org/

Kinesis, Kafka and Amazon Managed Service for Apache Flink

The New Stack Podcast

Sep 12, 2023 27:07

Apache Flink is an open-source framework and distributed processing engine designed for data analytics. It excels at handling tasks such as data joins, aggregations, and ETL (Extract, Transform, Load) operations. Moreover, it supports advanced real-time techniques like complex event processing.In this episode, Deepthi Mohan and Nagesh Honnalii from AWS discussed Apache Flink and the Amazon Managed Service for Apache Flink (MSF) with our host, Alex Williams. MSF is a service that caters to customers with varying infrastructure preferences. Some prefer complete control, while others want AWS to handle all infrastructure-related aspects.Use cases for MSF can be grouped into three categories. First, there's streaming ETL, which involves tasks like log aggregation for later auditing. Second, it supports real-time analytics, enabling customers to create dashboards for tasks like fraud detection. Third, it handles complex event processing, where data from multiple sources is joined and aggregated to extract meaningful insights.The origins of MSF trace back to the evolution of real-time data services within AWS. In 2013, AWS introduced Amazon Kinesis, while the open-source community developed Apache Kafka. These services paved the way for MSF by highlighting the need for real-time data processing.To provide more flexibility, AWS launched Kinesis Data Analytics in 2016, allowing customers to write code in JVM-based languages like Java and Scala. In 2018, AWS decided to incorporate Apache Flink into its Kinesis Data Analytics offering, leading to the birth of MSF.Today, thousands of customers use MSF, and AWS continues to enhance its offerings in the real-time data processing space, including the launch of Amazon MSK (Managed Streaming for Apache Kafka). 
To align with its foundation on Flink, AWS rebranded Kinesis Data Analytics for Apache Flink to Amazon Managed Service for Apache Flink, making it clearer for customers. Learn more from The New Stack about AWS and Apache Flink: Apache Flink for Real Time Data Analysis; Apache Flink for Unbounded Data Streams; 3 Reasons Why You Need Apache Flink for Stream Processing

EP140: Analítica en AWS - Amazon Kinesis

Podcast AWS LATAM

Jun 20, 2023 (16:20)

In this episode we talk about the Amazon Kinesis service and its components: Amazon Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. Additional material: https://aws.amazon.com/kinesis/


Enabling User-Facing Analytics using Apache Pinot with Kishore Gopalakrishna

Engenharia de Dados [Cast]

Dec 29, 2022 (52:11)

In this episode we interview Kishore Gopalakrishna, co-founder and CEO of StarTree. Luan Moreno and Mateus Oliveira chat with the co-creator of the powerful tool called Apache Pinot. Pinot is an OLAP datastore built to answer analytical queries with response times in the range of milliseconds, so it can be considered a database for real-time queries. It can ingest from batch data sources (Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as streaming data sources (Apache Kafka, Apache Pulsar, Amazon Kinesis). Pinot was designed to run OLAP queries in real time, with low latency over large volumes of events, to deliver the concept of User-Facing Analytics. It was created and developed by engineers at LinkedIn and Uber and designed to scale up and out without limits. Links: Apache Pinot; Kishore Gopalakrishna; StarTree; Luan Moreno = https://www.linkedin.com/in/luanmoreno/


Crafting a Modern Data Protection Strategy with Sam Nicholls

Screaming in the Cloud

Nov 30, 2022 (37:26)

About Sam

Sam Nicholls: Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs and webpages.

Links Referenced:
Veeam AWS Backup: https://www.veeam.com/aws-backup.html
Veeam: https://veeam.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying and visualization? Modern day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be at snark.cloud/chronosphere. That's snark.cloud/chronosphere.

Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out.
Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by and sponsored by our friends over at Veeam. And as a part of that, they have thrown one of their own to the proverbial lion. My guest today is Sam Nicholls, Director of Public Cloud over at Veeam. Sam, thank you for joining me.

Sam: Hey. Thanks for having me, Corey, and thanks for everyone joining and listening in. I do know that I've been thrown into the lion's den, and I am [laugh] hopefully well-prepared to answer anything and everything that Corey throws my way. Fingers crossed. [laugh].

Corey: I don't think there's too much room for criticizing here, to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups, which no one ever cares about. Restores, on the other hand, people care very much about restores. And that's when they learn, “Oh, I really should have cared about backups at any point prior to 20 minutes ago.”

Sam: Yeah, it's a great point. It's kind of like taxes and insurance.
It's almost like, you know, something that you have to do that you don't necessarily want to do, but when push comes to shove, and something's burning down, a file has been deleted, someone's made their way into your account and, you know, running a right mess within there, that's when you really, kind of, care about what you mentioned, which is the recovery piece, the speed of recovery, the reliability of recovery.

Corey: It's been over a decade, and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get it back. I ran my own mail server; it was an iPhone setting that said, “Oh, yeah, automatically delete everything in your trash folder (or archive folder) after 30 days.” It was just a weird default setting back in that era. I didn't realize it was doing that. Yeah, painful stuff.

And we learned the hard way in some of these cases. Not that I really have much need for email from that era of my life, but every once in a while it still bugs me. Which speaks to the point that the people who are the most fanatical about backing things up are the people who have been burned by not having a backup. And I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really cemented that lesson for me.

Sam: Yeah, yeah. It's a good point. I could remember a few years ago, my wife migrated a very aging, polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was good...

Corey: As the white polycarbonate Mac becomes yellow, then yeah, all right, you know, it's time to replace it. Yeah. So yeah, so she wiped the drive, and what happened?

Sam: That was her moment where she learned the value and importance of backup, and she backs everything up now. I fortunately have never gone through it. But I'm employed by a backup vendor and that's why I care about it. But it's incredibly important to have, of course.

Corey: Oh, yes.
My spouse has many wonderful qualities, but one that drives me slightly nuts is she's something of a digital packrat where her hard drives on her laptop will periodically fill up. And I used to take the approach of oh, you can be more efficient and do the rest. And I realized no, telling other people they're doing it wrong is generally poor practice, whereas just buying bigger drives is way easier. Let's go ahead and do that. It's a small price to pay for domestic tranquility.

And there's a lesson in that. We can map that almost perfectly to the corporate world where you folks tend to operate in. You're not doing home backup, last time I checked; you are doing public cloud backup. Actually, I should ask that. Where do you folks start and where do you stop?

Sam: Yeah, no, it's a great question. You know, we started over 15 years ago when virtualization, specifically VMware vSphere, was really the up-and-coming thing, and, you know, a lot of folks were there trying to utilize agents to protect their vSphere instances, just like they were doing with physical Windows and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No. And that's kind of why Veeam was pioneered; it was this agentless backup, image-based backup for vSphere.

And, of course, you know, in the last 15 years, we've seen lots of transitions, of course, we're here at Screaming in the Cloud, with you, Corey, so AWS, as well as a number of other public cloud vendors, we can help protect, as well as a number of SaaS applications like Microsoft 365, and metadata and data within Salesforce. So, Veeam's really kind of come a long way from just virtual machines to really taking a global look at the entirety of modern environments, and how can we best protect each and every single one of those without trying to take a square peg and fit it in a round hole?

Corey: It's a good question and a common one.
We wind up with an awful lot of folks who are confused by the proliferation of data. And I'm one of them, let's be very clear here. It comes down to a problem where backups are a multifaceted, deep problem, and I don't think that people necessarily think of it that way. But I take a look at all of the different, even AWS services that I use for my various nonsense, and which ones can be used to store data?

Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data. And in various contexts, a lot of that data becomes very important. So, what service am I using, in which account am I using, and in what region am I using it, and you wind up with data sprawl, where it's a tremendous amount of data that you can generally only track down by looking at your bills at the end of the month. Okay, so what am I being charged, and for what service?

That seems like a good place to start, but where is it getting backed up? How do you think about that? So, some people, I think, tend to ignore the problem, which we're seeing less and less, but other folks tend to go to the opposite extreme and we're just going to back up absolutely everything, and we're going to keep that data for the rest of our natural lives. It feels to me that there's probably an answer that is more appropriate somewhere nestled between those two extremes.

Sam: Yeah, snapshot sprawl is a real thing, and it gets very, very expensive very, very quickly. You know, your snapshots of EC2 instances are stored on those attached EBS volumes. Five cents per gig per month doesn't sound like a lot, but when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly. And you don't know when to delete them.
Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock.

So, you know, where to start is automating the lifecycle of a snapshot, right, from its creation (how often do we want to be creating them), from the retention (how long do we want to keep these for), and where do we want to keep them because there are other storage services outside of just EBS volumes. And then, of course, the ultimate: deletion. And that's important even from a compliance perspective as well, right? You've got to retain data for a specific number of years, I think healthcare is like seven years, but then you've...

Corey: And then not a day more.

Sam: Yeah, and then not a day more because that puts you out of compliance, too. So, policy-based automation is your friend and we see a number of folks building these policies out: gold, silver, bronze tiers based on criticality of data compliance and really just kind of letting the machine do the rest. And you can focus on not babysitting backup.

Corey: What was it that led to the rise of snapshots? Because back in my very early days, there was no such thing. We wound up using a bunch of servers stuffed in a rack somewhere and virtualization was not really in play, so we had file systems on physical disks. And how do you back that up? Well, you have an agent of some sort that basically looks at all the files and according to some ruleset that it has, it copies them off somewhere else.

It was slow, it was fraught, it had a whole bunch of logic that was pushed out to the very edge, and forget about restoring that data in a timely fashion or even validating a lot of those backups worked other than via checksum. And God help you if you had data that was constantly in the state of flux, where anything changing during the backup run would leave your backups in an inconsistent state. That on some level seems to have largely been solved by snapshots. But what's your take on it?
You're a lot closer to this part of the world than I am.

Sam: Yeah, snapshots, I think folks have turned to snapshots for the speed, the lack of impact that they have on production performance, and again, just the ease of accessibility. We have access to all different kinds of snapshots for EC2, RDS, EFS throughout the entirety of our AWS environment. So, I think the snapshots are kind of like the default go-to for folks. They can help deliver those very, very quick RPOs, especially in, for example, databases, like you were saying, that change very, very quickly and we all of a sudden are stranded with a crash-consistent backup or snapshot versus an application-consistent snapshot. And then they're also very, very quick to recover from.

So, snapshots are very, very appealing, but they absolutely do have their limitations. And I think, you know, it's not a one or the other; it's that they've got to go hand-in-hand with something else. And typically, that is an image-based backup that is stored in a separate location to the snapshot because that snapshot is not independent of the disk that it is protecting.

Corey: One of the challenges with snapshots is most of them are created in a copy-on-write sense. It takes basically an instant frozen point in time. Back once upon a time, when we ran MySQL databases on top of the NetApp Filer (which works surprisingly well), we would have a script that would automatically quiesce the database so that it would be in a consistent state, snapshot the file and then un-quiesce it, which took less than a second, start to finish. And that was awesome, but then you had this snapshot type of thing. It wasn't super portable, it needed to reference a previous snapshot in some cases, and AWS takes the same approach where the first snapshot it captures every block, then subsequent snapshots wind up only taking up as much size as there have been changes since the first snapshots.
So, large quantities of data that generally don't get accessed a whole lot have remarkably small subsequent snapshot sizes.

But that's not at all obvious from the outside, and looking at these things. They're not the most portable thing in the world. But it's definitely the direction that the industry has trended in. So, rather than having a cron job fire off an AWS API call to take snapshots of my volumes as a sort of the baseline approach that we all started with, what is the value proposition that you folks bring? And please don't say it's, “Well, cron jobs are hard and we have a friendlier interface for that.”

Sam: [laugh]. I think it's really starting to look at the proliferation of those snapshots, understanding what they're good at, and what they are good for within your environment (as previously mentioned, low RPOs, low RTOs, how quickly can I take a backup, how frequently can I take a backup, and more importantly, how quickly can I restore), but then looking at their limitations. So, I mentioned that they were not independent of that disk, so that certainly does introduce a single point of failure as well as being not so secure. We've kind of touched on the cost component of that as well. So, what Veeam can come in and do is then take an image-based backup of those snapshots, right (so you've got your initial snapshot and then your incremental ones), we'll take the backup from that snapshot, and then we'll start to store that elsewhere.

And that is likely going to be in a different account. We can look at the Well-Architected Framework, AWS deeming accounts as a security boundary, so having that cross-account function is critically important so you don't have that single point of failure. Locking down with IAM roles is also incredibly important so we haven't just got a big wide open door between the two. But that data is then stored in a separate account (potentially in a separate region, maybe in the same region), in Amazon S3 storage.
And S3 has the wonderful benefit of being still relatively performant, so we can have quick recoveries, but it is much, much cheaper. You're dealing with 2.3 cents per gig per month, instead of...

Corey: To start, and it goes down from there with sizeable volumes.

Sam: Absolutely, yeah. You can go down to S3 Glacier, where you're looking at, I forget how many points and zeros and nines it is, but it's fractions of a cent per gig per month, but it's going to take you a couple of days to recover that da...

Corey: Even infrequent access cuts that in half.

Sam: Oh yeah.

Corey: And let's be clear, these are snapshot backups; you probably should not be accessing them on a consistent, sustained basis.

Sam: Well, exactly. And this is where it's kind of almost like having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are saying you must keep this data for this length of time. Keeping that (you know, let's just say it's three years' worth of snapshots) in an EBS volume is going to be incredibly expensive. What's the likelihood of you needing to recover something from two years, actually, even two months ago? It's very, very small.

So, the performance part of S3 is, you don't need to take it as much into consideration. Can you recover? Yes. Is it going to take a little bit longer? Absolutely. But it's going to help you meet those retention requirements while keeping your backup bill low, avoiding that bill shock, right, spending tens and tens of thousands every single month on snapshots. This is what I mean by kind of having your cake and eating it.

Corey: I somewhat recently have had a client where EBS snapshots are one of the driving costs behind their bill. It is one of their largest single line items. And I want to be very clear here because if you're one of those people who listen to this and are thinking, “Well, hang on.
Wait, they're telling stories about us, even though they're not naming us by name?” Yeah, there were three of you in the last quarter.

So, at that point, it becomes clear it is not about something that one individual company has done and more about an overall driving trend. I am personalizing it a little bit by referring to it as one company when there were three of you. This is a narrative device, not me breaking confidentiality. Disclaimer over. Now, when you talk to people about, “So, tell me why you've got 80 times more snapshots than you do EBS volumes?” The answer is, “Well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly, and so on and so forth. And when this was set up, there wasn't a great way to do this natively and we don't always necessarily know what we need versus what we don't. And the cost of us backing this up, well, you can see it on the bill. The cost of us deleting too much and needing it as soon as we do? Well, that cost is almost incalculable. So, this is the safe way to go.” And they're not wrong in anything that they're saying. But the world has definitely evolved since then.

Sam: Yeah, yeah. It's a really great point. Again, it just folds back into my whole having your cake and eating it conversation. Yes, you need to retain data; it gives you that kind of nice, warm, cozy feeling, it's a nice blanket on a winter's day that that data, irrespective of what happens, you're going to have something to recover from. But the question is does that need to be living on an EBS volume as a snapshot?
Why can't it be living on much, much more cost-effective storage that's going to give you the warm and fuzzies, but is going to make your finance team much, much happier [laugh].

Corey: One of the inherent challenges I think people have is that snapshots by themselves are almost worthless, in that I have an EBS snapshot, it is sitting there now, it's costing me an undetermined amount of money because it's not exactly clear on a per snapshot basis exactly how large it is, and okay, great. Well, I'm looking for a file that was not modified since X date, as it was on this time. Well, great, you're going to have to take that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one. Great. Try it again, with a different one.

And after, like, the fifth or sixth in a row, you start doing a binary search approach on this thing. But it's expensive, it's time-consuming, it takes forever, and it's not a fun user experience at all. Part of the problem is it seems that historically, backup systems have no context or no contextual awareness whatsoever around what is actually contained within that backup.

Sam: Yeah, yeah. I mean, you kind of highlighted two of the steps. It's more like a ten-step process to do, you know, granular file or folder-level recovery from a snapshot, right? You've got to, like you say, you've got to determine the point in time when that, you know, you knew the last time that it was around, then you're going to have to determine the volume size, the region, the OS, you're going to have to create an EBS volume of the same size, region, from that snapshot, create the EC2 instance with the same OS, connect the two together, boot the EC2 instance, mount the volume, search for the files to restore, download them manually, at which point you have your file back. It's not back in the machine where it was, it's now been downloaded locally to whatever machine you're accessing that from.
And then you got to tear it all down.

And that is again, like you say, predicated on the fact that you knew exactly that that was the right time. It might not be and then you have to start from scratch from a different point in time. So, backup tooling from backup vendors that have been doing this for many, many years, knew about this problem long, long ago, and really seek to not only automate the entirety of that process but make the whole e-discovery, the search, the location of those files, much, much easier. I don't necessarily want to do a vendor pitch, but I will say with Veeam, we have explorer-like functionality, whereby it's just a simple web browser. Once that machine is all spun up again, automatic process, you can just search for your individual file, folder, locate it, you can download it locally, you can inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis (I forget the right terminology for it; some of it's AWS, some of it's Amazon).

But by-the-by, the whole recovery process, especially from a file or folder level, is much more pain-free, but also much faster. And that's ultimately what people care about: how reliable is my backup? How quickly can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time or money.

Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill.
That's R - E - D - I - S dot com slash duckbill.

Corey: Right, the idea of RPO versus RTO: recovery point objective and recovery time objective. With an RPO, it's great, disaster strikes right now, how long is acceptable for it to have been since the last time we backed up data to a restorable point? Sometimes it's measured in minutes, sometimes it's measured in fractions of a second. It really depends on what we're talking about. Payments databases, that needs to be... the RPO basically asymptotically approaches zero.

The RTO is okay, how long is acceptable before we have that data restored and are back up and running? And that is almost always a longer time, but not always. And there's a different series of trade-offs that go into that. But both of those also presuppose that you've already dealt with the existential question of is it possible for us to recover this data. And that's where I know that you are obviously... you have a position on this that is informed by where you work, but I don't, and I will call this out as what I see in the industry: AWS Backup is compelling to me except for one fatal flaw that it has, and that is it starts and stops with AWS.

I am not a proponent of multi-cloud. Lord knows I've gotten flack for that position a bunch of times, but the one area where it makes absolute sense to me is backups. Have your data in a rehydrate-the-business level state backed up somewhere that is not your primary cloud provider because you're otherwise single point of failure-ing through a company, through the payment instrument you have on file with that company, in the blast radius of someone who can successfully impersonate you to that vendor. There has to be a gap of some sort for the truly business-critical data. Yes, egress to other providers is expensive, but you know what also is expensive? Irrevocably losing the data that powers your business. Is it likely? No, but I would much rather do it than have to justify why I'm not doing it.

Sam: Yeah.
Wasn't likely that I was going to win that 2 billion or 2.1 billion on the Powerball, but [laugh] I still play [laugh]. But I understand your standpoint on multi-cloud and I read your newsletters and understand where you're coming from, but I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations that are sole-sourced on a single cloud and nothing else is relatively small, single-digit percentage. It's around 80-some percent that are hybrid, and the remainder of them are your favorite: multi-cloud.

But again, having something that is one hundred percent sole-source on a single platform or a single vendor does expose you to a certain degree of risk. So, having the ability to do cross-platform backups, recoveries, migrations, for whatever reason, right, because it might not just be a disaster like you'd mentioned, it might also just be… I don't know, the company has been taken over and all of a sudden, the preference is now towards another cloud provider and I want you to refactor and re-architect everything for this other cloud provider. If all that data is locked into one platform, that's going to make your job very, very difficult. So, we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads on different platforms, in different environments, on-premises, in multiple different clouds, but the other key piece is that we always use the same backup file format. And why that's key is because it enables portability.

If I have backups of EC2 instances that are stored in S3, I could copy those onto on-premises disk, I could copy those into Azure, I could do the same with my Azure VMs and store those on S3, or again, on-premises disk, and any other endless combination that goes with that. And it's really kind of centered around, like, control and ownership of your data. We are not prescriptive by any means. Like, you do what is best for your organization.
We just want to provide you with the toolset that enables you to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.

Corey: One of the big challenges that I keep seeing across the board is just a lack of awareness of what the data that matters is, where you see people backing up endless fleets of web server instances that are auto-scaled into existence and then removed, but you can create those things at will; why do you care about the actual data that's on these things? It winds up almost at the library management problem, on some level. And in that scenario, snapshots are almost certainly the wrong answer. One thing that I saw previously that really changed my way of thinking about this was back many years ago when I was working at a startup that had just started using GitHub and they were paying for a third-party service that wound up backing up Git repos. Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git, but at the time, it was silly. It was, why do that? Every Git clone is a full copy of the entire repository history. Just grab it off some developer's laptop somewhere.

It's like, “Really? You want to bet the company, slash your job, slash everyone else's job on that being feasible and doable or do you want to spend the 39 bucks a month or whatever it was to wind up getting that out the door now so we don't have to think about it, and they validate that it works?” And that was really a shift in my way of thinking because, yeah, backing up things can get expensive when you have multiple copies of the data living in different places, but what's really expensive is not having a company anymore.

Sam: Yeah, yeah, absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something that you know that you have to have, but you don't necessarily want to pay for it.
Well, you know, just like with insurances, there's multiple different ways to go about recovering your data and it's only in crunch time, do you really care about what it is that you've been paying for, right, when it comes to backup?

Could you get your backup through a git clone? Absolutely. Could you get your data back... how long is that going to take you? How painful is that going to be? What's going to be the impact to the business where you're trying to figure that out versus, like you say, the 39 bucks a month, a year, or whatever it might be to have something purpose-built for that, that is going to make the recovery process as quick and painless as possible and just get things back up online.

Corey: I am not a big fan of the fear, uncertainty, and doubt approach, but I do practice what I preach here in that yeah, there is a real fear against data loss. It's not, “People are coming to get you, so you absolutely have to buy whatever it is I'm selling,” but it is something you absolutely have to think about. My core consulting proposition is that I optimize the AWS bill. And sometimes that means spending more. Okay, that one S3 bucket is extremely important to you and you say you can't sustain the loss of it ever so one zone is not an option. Where is it being backed up? Oh, it's not? Yeah, I suggest you spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the right thing.

Sam: Yeah, yeah, it's interesting, and it's going to be hard for you to prove the value of doing that when you are driving their bill up when you're trying to bring it down. But again, you have to look at something that's not itemized on that bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the exact figures because it also varies depending on your business, your industry, the size, but the impact of downtime is massive financially.
Tens of thousands of dollars for small organizations per hour, millions and millions of dollars per hour for much larger organizations. The backup component of that is relatively small in comparison, so having something that is purpose-built, and is going to protect your data and help mitigate that impact of downtime. Because that's ultimately what you're trying to protect against. It is the recovery piece that you're buying is the most important piece. And like you, I would say, at least be cognizant of it and evaluate your options and what can you live with and what can you live without.

Corey: That's the big burning question that I think a lot of people do not have a good answer to. And when you don't have an answer, you either back up everything or nothing. And I'm not a big fan of doing either of those things blindly.

Sam: Yeah, absolutely. And I think this is why we see varying different backup options as well, you know? You're not going to try and apply the same data protection policies to each and every single workload within your environment because they've all got different types of workload criticality. And like you say, some of them might not even need to be backed up at all, just because they don't have data that needs to be protected. So, you need something that is going to be able to be flexible enough to apply across the entirety of your environment, protect it with the right policy, in terms of how frequently do you protect it, where do you store it, how often, or when are you eventually going to delete that and apply that on a workload by workload basis. And this is where the joy of things like tags come into play as well.

Corey: One last thing I want to bring up is that I'm a big fan of watching for companies saying the quiet part out loud. And one area in which they do this—because they're forced to by brevity—is in the title tag of their website.
I pull up veeam.com and I hover over the tab in my browser, and it says, “Veeam Software: Modern Data Protection.” And I want to call that out because you're not framing it as explicitly backup. So, the last topic I want to get into is the idea of security. Because I think it is not fully appreciated on a lived-experience basis—although people will of course agree to this when they're having ivory tower whiteboard discussions—that every place your data lives is a potential for a security breach to happen. So, you want to have your data living in a bunch of places ideally, for backup and resiliency purposes. But you also want it to be completely unworkable or illegible to anyone who is not authorized to have access to it. How do you balance those trade-offs yourself given that what you're fundamentally saying is, “Trust us with your Holy of Holies when it comes to things that power your entire business?” I mean, I can barely get some companies to agree to show me their AWS bill, let alone this is the data that contains all of this stuff to destroy our company.

Sam: Yeah. Yeah, it's a great question. Before I explicitly answer that piece, I will just go to say that modern data protection does absolutely have a security component to it, and I think that backup absolutely needs to be a—I'm going to say this in air quotes—a “first class citizen” of any security strategy. I think when people think about security, their mind goes to the preventative, like how do we keep these bad people out? This is going to be a bit of the FUD that you love, but ultimately, the bad guys on the outside have an infinite number of attempts to get into your environment and only have to be right once to get in and start wreaking havoc. You on the other hand, as the good guy with your cape and whatnot, you have got to be right each and every single one of those times. And we as humans are fallible, right?
None of us are perfect, and it's incredibly difficult to defend against these ever-evolving, more complex attacks. So backup, if someone does get in, having a clean, verifiable, recoverable backup, is really going to be the only thing that is going to save your organization, should that actually happen. And what's key to a secure backup? I would say separation, isolation of backup data from the production data, I would say utilizing things like immutability, so in AWS, we've got Amazon S3 Object Lock, so it's that write once, read many state for whatever retention period that you put on it. So, the data that they're seeking to encrypt, whether it's in production or in their backup, they cannot encrypt it. And then the other piece that I think is becoming more and more into play, and it's almost table stakes is encryption, right? And we can utilize things like AWS KMS for that encryption. But that's there to help defend against the exfiltration attempts. Because these bad guys are realizing, “Hey, people aren't paying me my ransom because they're just recovering from a clean backup, so now I'm going to take that backup data, I'm going to leak the personally identifiable information, trade secrets, or whatever on the internet, and that's going to put them in breach of compliance and give them a hefty fine that way unless they pay me my ransom.” So encryption, so they can't read that data. So, not only can they not change it, but they can't read it is equally important. So, I would say those are the three big things for me on what's needed for backup to make sure it is clean and recoverable.

Corey: I think that is one of those areas where people need to put additional levels of thought in. I think that if you have access to the production environment and have full administrative rights throughout it, you should definitionally not—at least with that account and ideally not you at all personally—have access to alter the backups. Full stop.
I would say, on some level, there should not be the ability to alter backups for some particular workloads, the idea being that if you get hit with a ransomware infection, it's pretty bad, let's be clear, but if you can get all of your data back, it's more of an annoyance than it is, again, the existential business crisis that becomes something that redefines you as a company if you still are a company.

Sam: Yeah. Yeah, I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me, I love Code Spaces. It was kind of one of those precursors to—

Corey: It's amazing.

Sam: Yeah, but they were running on AWS and they had everything, production and backups, all stored in one account. Got into the account. “We're going to delete your data if you don't pay us this ransom.” They were like, “Well, we're not paying you the ransoms. We got backups.” Well, they deleted those, too. And, you know, unfortunately, Code Spaces isn't around anymore. But it really kind of goes to show just the importance of at least logically separating your data across different accounts and not having that god-like access to absolutely everything.

Corey: Yeah, when you talked about Code Spaces, I was in [unintelligible 00:32:29] talking about GitHub Codespaces specifically, where they have their developer workstations in the cloud. They're still very much around, at least last time I saw unless you know something I don't.

Sam: Precursor to that. I can send you the link—

Corey: Oh oh—

Sam: You can share it with the listeners.

Corey: Oh, yes, please do. I'd love to see that.

Sam: Yeah. Yeah, absolutely.

Corey: And it's been a long and strange time in this industry. Speaking of links for the show notes, I appreciate you're spending so much time with me. Where can people go to learn more?

Sam: Yeah, absolutely. I think veeam.com is kind of the first place that people gravitate towards.
Me personally, I'm kind of like a hands-on learning kind of guy, so we always make free product available. And then you can find that on the AWS Marketplace. Simply search ‘Veeam' through there. A number of free products; we don't put time limits on it, we don't put feature limitations. You can back up ten instances, including your VPCs, which we actually didn't talk about today, but I do think is important. But I won't waste any more time on that.

Corey: Oh, configuration of these things is critically important. If you don't know how everything was structured and built out, you're basically trying to re-architect from first principles based upon archaeology.

Sam: Yeah [laugh], that's a real pain. So, we can help protect those VPCs and we actually don't put any limitations on the number of VPCs that you can protect; it's always free. So, if you're going to use it for anything, use it for that. But hands-on, marketplace, if you want more documentation, want to learn more, want to speak to someone, veeam.com is the place to go.

Corey: And we will, of course, include that in the show notes. Thank you so much for taking so much time to speak with me today. It's appreciated.

Sam: Thank you, Corey, and thanks for all the listeners tuning in today.

Corey: Sam Nicholls, Director of Public Cloud at Veeam. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that takes you two hours to type out but then you lose it because you forgot to back it up.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.


145: The Cloud Pod Evidently Wants to Talk about re:Invent

The Cloud Pod

Dec 13, 2021 95:22

On The Cloud Pod this week, the team finds out whose re:Invent 2021 crystal ball was most accurate. Also Graviton3 is announced, and Adam Selipsky gives his first re:Invent keynote. A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located. This week's highlights


AWS re:Invent Day One Special

AWS Bites

Nov 30, 2021 30:16

In this special episode, Eoin and Luciano talk about their impressions of the announcements from the first day of AWS re:Invent 2021.
- AWS Lambda now supports event filtering for Amazon SQS, Amazon DynamoDB, and Amazon Kinesis as event sources: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-event-filtering-amazon-sqs-dynamodb-kinesis-sources/
- Amazon CodeGuru Reviewer now detects hardcoded secrets in Java and Python repositories: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-codeguru-reviewer-hardcoded-secrets-java-python/
- Amazon ECR announces pull through cache repositories: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-ecr-cache-repositories/
- Introducing recommenders optimized to deliver personalized experiences for Media & Entertainment and Retail with Amazon Personalize: https://aws.amazon.com/about-aws/whats-new/2021/11/recommenders-optimized-personalized-media-entertainment-retail-amazon-personalize/
- AWS Chatbot now supports management of AWS resources in Slack (Preview): https://aws.amazon.com/about-aws/whats-new/2021/11/aws-chatbot-management-resources-slack/
- Amazon CloudWatch Evidently: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-evidently-feature-experimentation-safer-launches/
- AWS Migration Hub Refactor Spaces - Preview: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-migration-hub-refactor-spaces/
- CloudWatch Real User Monitoring: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-rum-applications-client-side-performance/
- CloudWatch Metrics Insights: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-cloudwatch-metrics-insights-preview/
- AWS Karpenter: https://github.com/aws/karpenter
- S3 Event Notifications with EventBridge: https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
- S3 Event Notifications for S3 Lifecycle, S3 Intelligent-Tiering, object tags, and object access control lists: https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-s3-event-notifications-s3-lifecycle-intelligent-tiering-object-tags-object-access-control-lists/
- Amazon Athena ACID Transactions (Preview): https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-athena-acid-apache-iceberg/
- AWS Control Tower introduces Terraform account provisioning and customization: https://aws.amazon.com/about-aws/whats-new/2021/11/aws-control-tower-terraform/
Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige


#0011: ClickStream - Analyzing the activity in your application

Innovando con AWS

Aug 26, 2021 47:19

In episode 11 of the podcast we are joined by Javier Ramirez, Senior Developer Advocate at Amazon Web Services. He covers two topics today. First, how one trains to become a Developer Advocate and what the role involves. He also tells us about his beginnings, how he got his training, and how he helps customers today. In the second part we dive into what ClickStream is and how to analyze activity in an application. We work our way up from simply sending data to be analyzed in a simple way to massive data ingestion and analysis with serverless tools on Amazon Web Services.

Javier Ramirez - @supercoco9: Works as a technical evangelist at AWS, helping developers get the most out of the cloud so they can focus on solving interesting problems and rely on AWS for performance, scalability, elasticity, and security. A fan of data storage, big and small, with extensive experience across SQL, NoSQL, graph, in-memory, and Big Data solutions. Before joining AWS, he spent 20 years developing software professionally and sharing what he learned with the community. He has spoken at events in more than 15 countries, mentored dozens of startups, taught at universities for 6 years, and trained hundreds of professionals in cloud and data engineering.

Rodrigo Asensio - @rasensio: Based in Barcelona, Spain, Rodrigo leads an Enterprise-segment Solution Architecture team that helps large customers with their massive cloud migrations, digital transformation, and innovation projects.

AWS links:
- Clickstream solution: https://aws.amazon.com/quickstart/architecture/clickstream-analytics/
- Amazon Athena for querying data in S3: https://aws.amazon.com/athena
- Amazon Kinesis for data streaming: https://aws.amazon.com/kinesis/
- Amazon Managed Streaming for Apache Kafka for data streaming with Kafka: https://aws.amazon.com/msk/
- Amazon QuickSight for data visualization: https://aws.amazon.com/quicksight/
Connect with Rodrigo Asensio on Twitter at https://twitter.com/rasensio and on LinkedIn at https://www.linkedin.com/in/rasensio/


What is Data Streaming? Purpose of Amazon Kinesis?

Tech Podcast's - Data Science, AI, Machine Learning(BEPEC)

Apr 23, 2021 10:03

Common challenges when switching to Data Science/AI as an experienced professional or a fresher:
1. Do I need to start as a fresher in Data Science, diluting my previous experience?
2. Will my CTC be diluted if I switch to Data Science/AI, or will I get a salary hike?
3. After working for many years in another field, can I crack a Data Science interview? Can I survive in the Data Science/AI/ML field?
For all of the above questions, book a free consultation call and get a customised career transition roadmap into the field of Data Science/AI: https://www.bepec.in/registration-form
Speak with our mentor Mr Kanth on Instagram @meet_kanth


What's New in January 2021

Melbourne AWS User Group

Feb 10, 2021 64:43

In this month's episode where we tell you what AWS released since re:Invent, Guy gets to talk a fair bit about IoT, JM just wants to remind everyone of various things, and Arjen suffers from some sleep deprivation.

What's New

Finally in Sydney
- PartiQL for DynamoDB now is supported in 23 AWS Regions
- AWS Network Firewall is now available in the Asia Pacific (Sydney) Region
- Amazon Rekognition Custom Labels is now available in the Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), and Asia Pacific (Tokyo) AWS Regions
- Announcing new Amazon EC2 T4g instances powered by AWS Graviton2 processors along with a T4g free trial in Asia Pacific (Sydney, Singapore), Europe (London), North Americas (Canada Central, San Francisco), and South Americas (Sao Paulo) regions

Serverless

Lambda
- AWS Compute Optimizer Now Delivers Recommendations For AWS Lambda Functions
- AWS Lambda now makes it easier to build analytics for Amazon Kinesis and Amazon DynamoDB Streams
- AWS Lambda now supports self-managed Apache Kafka as an event source
- AWS Lambda launches checkpointing for Amazon Kinesis and Amazon DynamoDB Streams
- AWS Lambda now supports SASL/SCRAM authentication for functions triggered from Amazon MSK

API Gateway
- Amazon API Gateway now supports data mapping in HTTP APIs

Containers

Monitoring
- Join the Preview – Amazon Managed Service for Prometheus (AMP) | AWS News Blog
- Announcing Amazon Managed Service for Grafana (in Preview) | AWS News Blog
- Amazon CloudWatch now adds Fluent Bit support for container logs from Amazon EKS and Kubernetes

General
- EC2 Image Builder now supports container images

ECS
- Amazon ECS announces the general availability of ECS Deployment Circuit Breaker
- Amazon Elastic Container Service launches new management console
- Amazon ECS now supports VPC Endpoint policies
- Amazon ECS announces increased service quotas for tasks per service and services per cluster

EKS
- AWS Load Balancer Controller version 2.1 now available with support for additional ELB configurations

EC2 & VPC

Instances
- Announcing new Amazon EC2 C6gn instances powered by AWS Graviton2 processors with 100 Gbps networking
- Amazon EC2 Auto Scaling now allows to define 40 instance types when defining Mixed Instances Policy

EBS
- Multi-Attach support now available on Amazon EBS Provisioned IOPS volume type, io2
- Amazon Data Lifecycle Manager now automates copying EBS snapshots across accounts

Networking
- Amazon Virtual Private Cloud (VPC) Now supports Tag on Create for Elastic IP addresses
- Amazon EC2 API now supports Internet Protocol Version 6 (IPv6)

Lightsail
- Amazon Lightsail now supports IPv6

Dev & Ops

Dev
- AWS SDK for Go version 2 is now generally available
- AWS SDK for JavaScript version 3 is now generally available
- Porting Assistant for .NET supports automated code translation
- Announcing the General Availability of Amazon Corretto 11 for Linux on ARM32 and for Windows on x86 (32-bit)
- AWS App2Container now supports remote execution of containerization workflows
- AWS CodePipeline supports deployments with CloudFormation StackSets
- Announcing CDK Support for AWS Chalice

Ops
- Introducing AWS Systems Manager Change Manager | AWS News Blog
- New – AWS Systems Manager Consolidates Application Management | AWS News Blog
- Introducing AWS Systems Manager Fleet Manager

Security
- AWS Single Sign-On now supports Microsoft Active Directory (AD) synchronization
- Announcing Amazon Route 53 support for DNSSEC
- AWS Config launches ability to save advanced queries
- Amazon GuardDuty adds three new threat detections to help you better protect your data stored in Amazon S3
- Amazon Cognito Identity Pools enables using user attributes from identity providers for access control to simplify permissions management in AWS
- AWS Certificate Manager Private Certificate Authority now supports additional certificate customization
- Amazon Detective enhances IP Address Analytics

Data storage & processing
- AWS Glue launches AWS Glue Custom Connectors
- Amazon CloudSearch announces updates to its search instances
- New – AWS Transfer Family support for Amazon Elastic File System | AWS News Blog
- Achieve faster database failover with Amazon Web Services MySQL JDBC Driver - now in preview
- Amazon Aurora supports in-place upgrades from MySQL 5.6 to 5.7
- Amazon Aurora supports PostgreSQL 12
- Amazon Keyspaces (for Apache Cassandra) now supports JSON syntax to help you read and write data from other systems more easily

AI & ML
- Introducing Amazon SageMaker ml.P4d instances for highest performance ML training in the cloud

IoT
- New – AWS IoT Core for LoRaWAN to Connect, Manage, and Secure LoRaWAN Devices at Scale | AWS News Blog
- Announcing AWS IoT Greengrass 2.0 – With an Open Source Edge Runtime and New Developer Capabilities | AWS News Blog
- Announcing AWS IoT SiteWise Edge (Preview), a new capability of AWS IoT SiteWise to collect, process, and monitor industrial equipment data on-premises
- Announcing support for Alarms (Preview) in AWS IoT Events and AWS IoT SiteWise
- Introducing AWS IoT SiteWise plugin for Grafana
- AWS IoT Core Device Advisor now available in preview
- AWS IoT Core adds the ability to deliver data to Apache Kafka clusters
- AWS IoT SiteWise launches support for Modbus TCP and EtherNet/IP protocols with enhancements to OPC-UA data ingestion
- Introducing AWS IoT EduKit
- Announcing AWS IoT Device Defender ML Detect public preview
- Announcing date and time functions and timezone support in AWS IoT SiteWise

Other Cool Stuff

Policy
- Stepping up for a truly open source Elasticsearch | AWS Open Source Blog

Services
- AWS CloudShell – Command-Line Access to AWS Resources | AWS News Blog
- Amazon Location – Add Maps and Location Awareness to Your Applications | AWS News Blog
- AWS Cost Anomaly Detection is now generally available

Features
- APIs now available for the AWS Well-Architected Tool
- Cost & Usage Report Now Available to Member (Linked) Accounts
- Announcing the availability of AWS Outposts Private Connectivity
- Amazon Managed Blockchain now supports Ethereum (Preview)
- AWS Snow Family now supports the Amazon Linux 2 operating system
- Service Quotas now supports tagging and Attribute-Based Access Control (ABAC)
- Amazon Lex Introduces an Enhanced Console Experience and New V2 APIs | AWS News Blog

SQS
- Amazon SQS Now Supports a High Throughput Mode for FIFO Queues (Preview)
- Amazon SQS announces tiered pricing

Control Tower region
- AWS Control Tower now extends governance to existing OUs in your AWS Organizations
- AWS Control Tower now provides bulk account update

The Nanos
- Amazon Aurora supports in-place upgrades from PostgreSQL 11 to 12
- Announcing the General Availability of Amazon Corretto 11 for Linux on ARM32 and for Windows on x86 (32-bit)
- Amazon Lightsail now supports IPv6
- Amazon Virtual Private Cloud (VPC) Now supports Tag on Create for Elastic IP addresses

Sponsors

Gold Sponsor
- Innablr

Silver Sponsors
- AC3
- CMD Solutions
- DoIT International


A Detailed Analysis of The Amazon Kinesis Outage on US East-1 Region

IGeometry

Nov 29, 2020 45:31

AWS us-east-1 experienced an outage on Nov 25, 2020. Amazon has updated us with a summary detailing exactly what happened to Amazon Kinesis that caused the outage. Let us discuss it.
0:00 Intro
1:00 Tldr (diagram)
7:30 Detailed Analysis of What Happened
25:00 Why Cognito Went Down
31:20 Why CloudWatch Went Down
33:20 Why Lambda and AutoScaling Went Down
35:50 Why EventBridge, Elastic Kubernetes and Container Service Went Down
38:00 Why Service Status Went Down
40:00 Summary
https://aws.amazon.com/message/11201/
--- Send in a voice message: https://anchor.fm/hnasr/message


[Daily AWS #030] Now with Datadog and New Relic support! Amazon Kinesis Data Firehose supports multiple new data delivery destinations, plus 9 other updates

サーバーワークスが送るAWS情報番組「さばラジ！」

Jul 30, 2020 12:04

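Catch up on the latest news while you do something else! This is "Daily AWS!", a radio-style broadcast. Good morning, this is Kato from Serverworks. Today we cover 10 of the updates released on 7/29. Please share your thoughts on Twitter with the hashtag "#サバワ"!
■ UPDATE lineup
- Amazon Kinesis Data Firehose supports multiple new data delivery destinations
- Amazon Translate supports translation of Office files
- AWS Security Hub adds new automated security controls
- Amazon ECR supports image encryption using AWS KMS
- AWS Database Migration Service supports enhanced task assessments
- Amazon Elasticsearch Service enhances similarity search with support for cosine similarity
- Amazon Lightsail supports cPanel WHM preinstalled
- Registering Amazon EC2 instances with AWS Cloud Map is simplified
- Amazon RDS for Oracle supports Oracle Application Express Version 20.1
- AWS Site-to-Site VPN supports tagging on creation and resource-level access control
■ Serverworks social media: Twitter / Facebook
■ Serverworks blog: Serverworks Engineer Blog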


Modeling Section: Data warehouse, data lake or data swamp? Actuarial modeling in the big data era with Tom Peplow

Society of Actuaries Podcasts Feed

Apr 24, 2020 32:58

Listen now as April Shen, FSA, CFA interviews Tom Peplow (Principal, Director of Product Development - LTS) on the interaction between actuarial modeling and big data. In this episode, Tom discusses the basic concepts of data warehousing and the tools and resources actuaries can use to learn more.
Show Notes and Resources:
1) Anything by Ralph Kimball – he’s the forefather of data warehouse data models.
2) Data storage tools: Databricks, Snowflake, Google Bigtable, Amazon Redshift, Microsoft Synapse, Parquet
3) Streaming analytics / event processing: Kafka, Amazon Kinesis, Azure Stream Analytics
4) Data preparation: Power BI (using Power Query), Azure Factory Data Flows, Alteryx, AWS Glue
5) Data analytics: Tableau, Power BI
6) Other tools/languages: Jupyter, Python, R, C#, .NET
7) Chapter 5 of Roy Fielding’s dissertation, “Representational State Transfer (REST).”
8) Andrew White (Gartner) “Top Data and Analytics Predictions for 2019” https://blogs.gartner.com/andrew_white/2019/01/03/our-top-data-and-analytics-predicts-for-2019/
9) Data warehouse is not a use case, Jordan Tigani (Google) https://www.youtube.com/watch?v=0I7eOQpDBHQ


ROB202: Embracing the next wave of digital transformation via smart robots

AWS re:Invent 2019

Dec 7, 2019 50:52

For over a decade, cloud-enabled digital transformation has remade industries and powered innovation to greatly benefit enterprises and consumers. In this session, learn how AWS customers in industries as diverse as manufacturing, healthcare, and oil & gas are incorporating robots into their next-generation solutions. Come learn how AWS services such as AWS RoboMaker, AWS IoT Core, Amazon SageMaker, Amazon Kinesis, and Amazon Rekognition are being used to fundamentally transform work and improve outcomes.


SVS317-R1: Serverless stream processing pipeline best practices

AWS re:Invent 2019

Dec 7, 2019 60:27

Streaming data pipelines are increasingly used to replace batch processing with real-time decision-making for use cases including log processing, real-time monitoring, data lake analytics, and machine learning. Join this session to learn how to leverage Amazon Kinesis and AWS Lambda to solve real-time ingestion, processing, storage, and analytics challenges. We introduce design patterns and best practices as well as share a customer journey in building large-scale real-time serverless analytics capabilities.
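As a rough illustration of the Kinesis-plus-Lambda pattern this session describes, here is a minimal Python sketch of a Lambda handler that consumes a batch of Kinesis records. It is not taken from the session. The nesting `Records[].kinesis.data` with a base64-encoded payload matches how Lambda delivers Kinesis batches; everything else is an assumption made for the example: that each record holds one JSON object, that a `level` field marks errors, and the helper name `make_test_event`.

```python
import base64
import json


def handler(event, context):
    """Decode a batch of Kinesis records and summarize them."""
    decoded = []
    for record in event.get("Records", []):
        # Kinesis payloads reach Lambda base64-encoded under record["kinesis"]["data"].
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers write exactly one JSON object per record.
        decoded.append(json.loads(raw))
    # Hypothetical processing step: count entries whose "level" field is "ERROR".
    errors = sum(1 for item in decoded if item.get("level") == "ERROR")
    return {"processed": len(decoded), "errors": errors}


def make_test_event(items):
    """Build a synthetic event with the same nesting as a Kinesis batch (local testing only)."""
    records = []
    for item in items:
        encoded = base64.b64encode(json.dumps(item).encode("utf-8")).decode("ascii")
        records.append({"kinesis": {"data": encoded}})
    return {"Records": records}


if __name__ == "__main__":
    sample = make_test_event([{"level": "INFO"}, {"level": "ERROR"}])
    print(handler(sample, None))
```

Keeping the handler free of AWS SDK calls means it can be exercised locally with a synthetic event before it is wired to a real stream.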


AIM201-S: Paths to anomaly detection w/ TIBCO data science, streaming on AWS

AWS re:Invent 2019

Dec 7, 2019 47:54

Sensor data on the event stream can be voluminous. In NAND manufacturing, there are millions of columns of data that represent many measured and virtual metrics. These sensor data can arrive with considerable velocity. In this session, learn about developing cross-sectional and longitudinal analyses for anomaly detection and yield optimization using deep learning methods, as well as super-fast subsequence signature search on accumulated time-series data and methods for handling very wide data in Apache Spark on Amazon EMR. The trained models are developed in TIBCO Data Science and Amazon SageMaker and applied to event streams using services such as Amazon Kinesis to identify hot paths to anomaly detection. This presentation is brought to you by TIBCO Software, an APN Partner.


ARC414-S: Accelerated analytics: Building the next-gen data platform for Hertz

AWS re:Invent 2019

Dec 7, 2019 44:58

Hertz is undertaking a massive digital transformation to evolve its technology landscape. This move provides an opportunity to extract valuable insights from a large amount of data produced by the new systems. In this session, learn how Deloitte collaborated with Hertz to build a next-generation data platform, which includes an integration hub, a unified reporting layer, and an ecosystem of tools used to build advanced analytics models. Discover how the solution leveraged all native AWS services, such as Amazon S3, Amazon Kinesis, Amazon EMR, and AWS Lambda, to enable cross-functional insights and accelerate the cloud journey in under 12 months. This presentation is brought to you by Deloitte, an APN Partner.

discover analytics next gen deloitte aws hertz accelerated data platform aws lambda amazon s3 amazon kinesis amazon emr

DEM34-S: Scaling to billions of requests the serverless way at Capital One

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 21:52

Stream processing tools like Apache Spark and Flink are the default choice for big data processing, but these frameworks also come with high development and operation costs. Serverless streaming architecture is an alternative solution that brings significant reduction in these costs and allows developers to focus on business delivery, not infrastructure management. This session explores how Capital One used serverless streaming architecture to provide real-time insights for millions of customers through its intelligent assistant Eno. Learn how high-throughput streaming loads can be handled with ease as well as how message-driven architecture can be implemented using Amazon API Gateway, AWS Lambda, and Amazon Kinesis for complex asynchronous applications. This presentation is brought to you by Capital One, an APN Partner.

stream scaling billions requests capital one eno serverless flink aws lambda apache spark amazon kinesis amazon api gateway

DAT373: Data platform engineering: How Vanguard is migrating data to AWS

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 56:23

Organizations grappling with moving on-premises workloads and data to the cloud face the challenge of getting platform engineering and team structures right. In most cases, lift-and-shift is not an option. In this session, learn how Vanguard created a team to tackle volume and velocity of data for microservices and big data workloads, using data streaming (Amazon Kinesis), file transfer (AWS Storage Gateway), CDC replication (DB2 on z/OS, Oracle Exadata, Microsoft SQL Server), relational and NoSQL databases (Amazon DynamoDB, Amazon RDS for PostgreSQL, Amazon Aurora), and object storage (Amazon S3). The cloud data platform has seen a 200 percent YOY increase in adoption.

os cdc organizations vanguard migrating yoy nosql postgresql data platform amazon s3 platform engineering microsoft sql server db2 amazon rds amazon aurora amazon dynamodb amazon kinesis aws storage gateway

ANT348: How Prime Video processes 8 percent of all US internet traffic on AWS

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 46:56

Amazon Prime Video is one of the largest streaming services in the world, serving over 8 percent of all internet traffic in the US on any day. With millions of devices in over 200 countries around the world, collecting and analyzing telemetry data in real time is a major scaling challenge. In this session, learn how the team used AWS technologies such as Amazon Kinesis, AWS Lambda, and Amazon Simple Storage Service (Amazon S3) to build a next-generation telemetry platform that handles millions of transactions per second (TPS) and hundreds of petabytes of data.

processes aws prime video amazon prime video tps aws lambda internet traffic us internet amazon kinesis

ANT330: Intuit: Moving from monitoring to observability using Amazon ES

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 57:49

As Intuit moves to a cloud hosting architecture that includes Amazon Elastic Compute Cloud (Amazon EC2), containers, and serverless applications, the company is on an "observability" journey to transform the way it monitors its applications' health. Observability requires correlating logs, metrics, traces, etc., which becomes complex as systems become more distributed. In this session, the company discusses leveraging tools such as OpenTelemetry, Amazon Elasticsearch Service (Amazon ES), Amazon Kinesis, and Jaeger to build an observability solution, and the benefits this solution provides by giving visibility across the platform, from containers to serverless applications. The company also discusses how it sees observability evolving in coming years.

moving monitoring intuit jaeger observability using amazon amazones amazon kinesis amazon elastic compute cloud amazon ec2

ANT329-R1: Amazon ES ingest at Pearson-managing high throughput and scale

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 55:25

In adopting a cloud-first strategy, Pearson had to secure and monitor its infrastructure, but with many applications deployed against many services, it needed reliable, fast ingest of logs into a real-time analytics engine. Pearson chose Amazon Elasticsearch Service for monitoring and Amazon Kinesis for ingest. In this session, we explain some Amazon ES basics and common ingest patterns that builders use on AWS. A Pearson representative shares their firsthand experiences developing a reliable and scalable architecture comprising Amazon ES, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, and AWS Lambda to centralize logs and keep ingestion and access secure.

managing scale pearson aws throughput aws lambda ingest amazones amazon kinesis amazon elasticsearch service

ANT326-R1: Building a streaming data platform with Amazon Kinesis

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 61:04

Today's businesses need to react to customer opportunities and problems as quickly as possible. In this session, we walk through how to build analytics pipelines for business operational reporting that speed up time to information from hours to seconds. We discuss how streaming-data services like Amazon Kinesis are used to capture and analyze data in real time, prior to delivering it to your Amazon Redshift data warehouse and Amazon Aurora reporting databases. Our customer GoDaddy shares how they use this architectural pattern to provide the best experience to the millions of customers that host websites on their platform.

streaming godaddy data platform amazon redshift amazon aurora amazon kinesis

STP208: How to build a company founded on engineering principles

AWS re:Invent 2019

Play Episode Listen Later Dec 7, 2019 45:29

Intercom is "all in" on AWS, a strategy that's aligned with the engineering principles used to build the company. This talk covers how the company uses those principles to make engineering decisions. Learn about the evolution of Intercom's architecture, from a handful of Amazon EC2 hosts to thousands of instances. Also learn how Intercom uses Amazon DynamoDB, AWS Lambda, Amazon SQS, Amazon Aurora, Amazon Redshift, Amazon Kinesis, and more. Finally, learn why Intercom decided against leveraging microservices, and why this has increased its ability to move fast and ship great products.

principles engineering founded aws intercom aws lambda amazon ec2 amazon redshift amazon aurora amazon dynamodb amazon sqs amazon kinesis

Episode 57 - Messaging Special

AWS TechChat

Play Episode Listen Later Oct 3, 2019 50:55

In this messaging-themed episode of AWS TechChat, Pete is back, and in person this time. The hosts start the show by reminiscing about messaging history, looking at where we came from and how we arrived where we are today and, more importantly, why we use messaging and what benefits you can derive from decoupling your architecture. They then pivot to event streams, covering both Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK), which are both designed to process or analyze streaming data for specialized needs. Next, they move to more traditional message buses: Amazon Simple Queue Service (SQS) and Amazon MQ (a managed message broker service for ActiveMQ), both durable pull-based messaging platforms. Amazon SQS is lightweight and tightly integrated with the AWS Cloud platform, while Amazon MQ supports a variety of protocols, making it a great choice for existing applications that use industry-standard protocols. Finally, they talk about push-based messaging with Amazon Simple Notification Service (SNS) and the Message Broker for AWS IoT, both publish/subscribe (pub/sub) platforms that enable you to build fan-out architectures with hundreds of thousands to millions of subscribers. You now have more than a hammer to build your applications; Maslow would be proud.

Speakers:
Shane Baldacchino - Solutions Architect, ANZ, AWS
Peter Stanski - Head of Solution Architecture, AWS

Resources:
Amazon CloudFront announces new Edge location in Shenzhen, China: https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-cloudfront-shenzhen-launch/
What is Pub/Sub Messaging?: https://aws.amazon.com/pub-sub-messaging/
Amazon Kinesis Data Streams: https://aws.amazon.com/kinesis/data-streams/
Amazon Managed Streaming for Apache Kafka (Amazon MSK): https://aws.amazon.com/msk/
Apache ZooKeeper: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-zookeeper.html
Amazon Simple Queue Service: https://aws.amazon.com/sqs/
Amazon Simple Queue Service Released: https://aws.amazon.com/blogs/aws/amazon_simple_q/
Amazon MQ: https://aws.amazon.com/amazon-mq/
Amazon Simple Notification Service: https://aws.amazon.com/sns/
MQTT - AWS IoT: https://docs.aws.amazon.com/iot/latest/developerguide/mqtt.html
Message Broker for AWS IoT: https://docs.aws.amazon.com/iot/latest/developerguide/iot-message-broker.html

AWS Events:
AWS re:Invent: https://reinvent.awsevents.com/
AWSome Day Online Series: https://aws.amazon.com/events/awsome-day/awsome-day-online/
AWS Modern Application Development Online Event: https://aws.amazon.com/events/application/modern-app-development/
AWS Innovate on-demand: https://aws.amazon.com/events/aws-innovate/

china messaging maslow invent shenzhen anz apache kafka aws cloud solution architecture aws iot amazon sqs amazon kinesis activemq apache zookeeper amazon msk

ANT310: Architecting for Real-Time Insights with Amazon Kinesis

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 61:14

Amazon Kinesis makes it easy to speed up the time it takes for you to get valuable, real-time insights from your streaming data. In this session, we walk through the most popular applications that customers implement using Amazon Kinesis, including streaming extract-transform-load, continuous metric generation, and responsive analytics. Our customer Autodesk joins us to describe how they created real-time metrics generation and analytics using Amazon Kinesis and Amazon Elasticsearch Service. They walk us through their architecture and the best practices they learned in building and deploying their real-time analytics solution.

real time autodesk architecting amazon kinesis amazon elasticsearch service

ANT322: High Performance Data Streaming with Amazon Kinesis: Best Practices

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 55:47

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we dive deep into best practices for Kinesis Data Streams and Kinesis Data Firehose to get the most performance out of your data streaming applications. Comcast uses Amazon Kinesis Data Streams to build a Streaming Data Platform that centralizes data exchanges. It is foundational to the way our data analysts and data scientists derive real-time insights from the data. In the second part of this talk, Comcast zooms into how to properly scale a Kinesis stream. We first list the factors to consider to avoid scaling issues with standard Kinesis stream consumption, and then we see how the new fan-out feature changes these scaling considerations.
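The scaling factors mentioned above can be illustrated with some back-of-the-envelope arithmetic. The following is a rough sketch, not taken from the session: it assumes the published per-shard limits for a provisioned Kinesis data stream (1 MB/s or 1,000 records/s in, 2 MB/s out) and that every consumer reads the full stream. The function name `shards_needed` and its parameters are hypothetical, and the estimate ignores other limits such as the per-shard cap on GetRecords calls.

```python
import math

# Per-shard limits for a provisioned Kinesis data stream (assumed from AWS's published quotas).
WRITE_MB_PER_SHARD = 1.0         # MB/s ingest per shard
WRITE_RECORDS_PER_SHARD = 1000   # records/s ingest per shard
READ_MB_PER_SHARD = 2.0          # MB/s read per shard


def shards_needed(write_mb_s, records_s, consumers, enhanced_fan_out):
    """Estimate the shard count for a stream (hypothetical helper, simplified model)."""
    by_write_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(records_s / WRITE_RECORDS_PER_SHARD)
    if enhanced_fan_out:
        # Each registered consumer gets its own 2 MB/s per shard,
        # so read capacity no longer depends on the number of consumers.
        read_demand_mb_s = write_mb_s
    else:
        # Standard consumers share the 2 MB/s per shard between them.
        read_demand_mb_s = write_mb_s * consumers
    by_read = math.ceil(read_demand_mb_s / READ_MB_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read)


standard = shards_needed(10, 5000, consumers=4, enhanced_fan_out=False)
fan_out = shards_needed(10, 5000, consumers=4, enhanced_fan_out=True)
print(standard, fan_out)
```

Under these assumptions, four standard consumers reading a 10 MB/s stream make read throughput the bottleneck, while with enhanced fan-out the shard count is driven by write throughput alone.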

data streaming best practices high performance comcast kinesis amazon kinesis

API305: Choosing the Right Messaging Service for Your Distributed App

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 47:50

In the cloud, modern apps are decoupled into independent building blocks, called microservices, which are easier to develop, deploy, and maintain. Messaging is a central tool used to connect and coordinate these microservices. AWS offers multiple messaging services, which address a variety of use cases. In this session, learn how to choose the service that's best for your use case as we present the key technical features of each. We pay special attention to integrating messaging services with serverless technology. We cover Amazon Kinesis, Amazon SQS, and Amazon SNS in detail with discussion of other services as appropriate.

service messaging aws distributed amazon sqs amazon kinesis

GAM301: Supercell - Scaling Mobile Games

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 56:54

In this session, learn how Supercell architected its analytics pipeline on AWS. We dive deep into how Supercell leverages Amazon Elastic Compute Cloud (Amazon EC2), Amazon Kinesis, Amazon Simple Storage Service (Amazon S3), Amazon EMR, and Spark to ingest, process, store, and query petabytes of data. We also dive deep into how Supercell's games are architected to accommodate scaling and failure recovery. We explain how Supercell's teams are organized into small and independent cells and how this affects the technology choices they make to produce value and agility in the development process.

spark scaling aws mobile games supercell amazon kinesis amazon emr amazon elastic compute cloud amazon ec2

SRV316: Serverless Stream Processing Pipeline Best Practices

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 43:57

Analytics has traditionally been performed using batch processing in DWH/Hadoop environments. Common use cases involve data lakes, data science, and machine learning (ML). Creating serverless data-driven architecture and serverless streaming solutions with services like Amazon Kinesis, AWS Lambda, and Amazon Athena can solve real-time ingestion, storage, and analytics challenges, and help you focus on application logic without managing infrastructure. In this session, we introduce design patterns and best practices, and share customer journeys from batch to real-time insights in building modern serverless data-driven architecture applications. Hear how Intel built the Intel Pharma Analytics Platform using a serverless architecture. This AI cloud-based offering enables remote monitoring of patients using an array of sensors, wearable devices, and ML algorithms to objectively quantify the impact of interventions and power clinical studies in various therapeutic conditions.

ai real intel best practices pipeline ml serverless aws lambda stream processing amazon athena amazon kinesis

STG302: Driving Machine Learning and Analytics Use Cases with AWS Storage

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 59:12

You've designed and built a well-architected data lake and ingested extreme amounts of structured and unstructured data. Now what? In this session, we explore real-world use cases where data scientists, developers, and researchers have discovered new and valuable ways to extract business insights using advanced analytics and machine learning. We review Amazon S3, Amazon Glacier, and Amazon EFS, the foundation for the analytics clusters and data engines. We also explore analytics tools and databases, including Amazon Redshift, Amazon Athena, Amazon EMR, Amazon QuickSight, Amazon Kinesis, Amazon RDS, and Amazon Aurora; and we review the AWS machine learning portfolio and AI services such as Amazon SageMaker, AWS Deep Learning AMIs, Amazon Rekognition, and Amazon Lex. We discuss how all of these pieces fit together to build intelligent applications.

ai driving analytics machine learning aws use cases amazon s3 amazon sagemaker amazon rds amazon rekognition amazon redshift amazon aurora amazon athena amazon lex amazon kinesis amazon quicksight amazon glacier aws storage amazon emr amazon efs

TLC306: Vonage & Aspect: Transform Communications & Customer Engagement

AWS re:Invent 2018

Play Episode Listen Later Nov 30, 2018 60:00

In this session, learn from market leader Vonage how and why they re-architected their QoS-sensitive, highly available and highly performant legacy real-time communications systems to take advantage of Amazon EC2, Enhanced Networking, Amazon S3, ASG, Amazon RDS, Amazon ElastiCache, AWS Lambda, AWS Step Functions, Amazon SNS, Amazon SQS, Amazon Kinesis, Amazon EFS, and more. We also learn how Aspect, a multinational leader in call center solutions, used AWS Lambda, Amazon API Gateway, Amazon Kinesis, Amazon ElastiCache, Amazon Cognito, and Application Load Balancer with open-source API development tooling from Swagger to build a comprehensive, microservices-based solution. Vonage and Aspect share their journey to TCO optimization, global outreach, and agility with best practices and insights.

transform api aspect swagger customer engagement asg tco aws lambda amazon s3 vonage qos amazon ec2 amazon rds amazon sqs amazon kinesis amazon cognito amazon api gateway application load balancer amazon elasticache amazon efs

Drill to Detail Ep.52 'Lyft, Ride-Share Analytics and ETL Developer Productivity' With Special Guest Mark Grover

Drill to Detail

Play Episode Listen Later Apr 9, 2018 48:46

Mark Grover LinkedIn Profile and GitHub Profile
"Hadoop Application Architectures"
"Drill to Detail Ep. 7 'Apache Spark and Hadoop Application Architectures'"
Lyft Engineering Blog
"Software Engineer to Product Manager" blog by Gwen Shapira
"Introduction to the Oracle Data Integrator Topology" from the Oracle Data Integrator docs site
Apache Airflow and Amazon Kinesis homepages
"Experimentation in a Ridesharing Marketplace" by Nicholas Chamandy, Head of Data Science at Lyft
"How Uber Eats Works with Restaurants"
"Deliveroo has built a bunch of tiny kitchens to feed more hungry Londoners" - Wired.co.uk

head productivity analytics developers detail data science product managers drill experimentation grover rideshare developer productivity amazon kinesis

Latest podcast episodes about amazon kinesis

Software Toolbox: OPC Server, Router, DataHub and more (P248)

Apache Druid News! Druid 30.0 is Live and Druid Summit 2024 Announced with Hugh Evans and Will Xu

Kinesis, Kafka and Amazon Managed Service for Apache Flink

EP140: Analítica en AWS - Amazon Kinesis

Enabling User-Facing Analytics using Apache Pinot with Kishore Gopalakrishna

Crafting a Modern Data Protection Strategy with Sam Nicholls

145: The Cloud Pod Evidently Wants to Talk about re:Invent

AWS re:Invent Day One Special

#0011: ClickStream - Analizando la actividad en tu aplicación

What is Data Streaming? Purpose of Amazon Kinesis?

What's New in January 2021

A Detailed Analysis of The Amazon Kinesis Outage on US East-1 Region

【毎日AWS #030】DatadogやNew Relic に対応！ Amazon Kinesis Data Firehose が複数の新しいデータ配信先をサポート 他9件

Modeling Section: Data warehouse, data lake or data swamp? Actuarial modeling in the big data era with Tom Peplow

ROB202: Embracing the next wave of digital transformation via smart robots

SVS317-R1: Serverless stream processing pipeline best practices

AIM201-S: Paths to anomaly detection w/ TIBCO data science, streaming on AWS

ARC414-S: Accelerated analytics: Building the next-gen data platform for Hertz

DEM34-S: Scaling to billions of requests the serverless way at Capital One

DAT373: Data platform engineering: How Vanguard is migrating data to AWS

ANT348: How Prime Video processes 8 percent of all US internet traffic on AWS

ANT330: Intuit: Moving from monitoring to observability using Amazon ES

ANT329-R1: Amazon ES ingest at Pearson-managing high throughput and scale

ANT326-R1: Building a streaming data platform with Amazon Kinesis

STP208: How to build a company founded on engineering principles

Episode 57 - Messaging Special

ANT310: Architecting for Real-Time Insights with Amazon Kinesis

ANT322: High Performance Data Streaming with Amazon Kinesis: Best Practices

API305: Choosing the Right Messaging Service for Your Distributed App

GAM301: Supercell - Scaling Mobile Games

SRV316: Serverless Stream Processing Pipeline Best Practices

STG302: Driving Machine Learning and Analytics Use Cases with AWS Storage

TLC306: Vonage & Aspect: Transform Communications & Customer Engagement

Drill to Detail Ep.52 'Lyft, Ride-Share Analytics and ETL Developer Productivity' With Special Guest Mark Grover

ABD217: From Batch to Streaming: How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time

ABD301: Analyzing Streaming Data in Real Time with Amazon Kinesis

ABD202: Best Practices for Building Serverless Big Data Applications

ABD208: Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores New Innovation with Amazon Kinesis Firehose

ABD210: Modernizing Amtrak: Serverless Solution for Real-Time Data Capabilities

GPSTEC313: GPS: Real-Time Data Processing with AWS Lambda Quickly, at Scale, and What Comes Next?

SRV211: Building Smart Applications Leveraging AWS to Drive Customer and Worker Experience

SRV304: Building High-Throughput Serverless Data Processing Pipelines

ABD312: Deep Dive: Migrating Big Data Workloads to AWS

ABD318: Architecting a data lake with Amazon S3, Amazon Kinesis, and Amazon Athena

ABD335: Real-Time Anomaly Detection Using Amazon Kinesis

ABD401: How Netflix Monitors Applications in Near Real-Time with Amazon Kinesis

ARC315: The Enterprise Fast Lane - What Your Competition Doesn't Want You to Know about Enterprise Cloud Transformation

SRV335: Best Practices for Orchestrating AWS Lambda Workloads

ARC318: Building .NET-based Serverless Architectures and Running .NET Core Microservices in Docker Containers on AWS

CTD406: Measuring the Internet in Real Time

GAM401: Designing for the Future: Building a Flexible Event-based Analytics Architecture for Games

RET303: Drive Warehouse Efficiencies with the Same AWS IoT technology That Powers Amazon Fulfillment

Episode 19 - AWS news that matter to you most

#197: Changing your Business with Amazon Kinesis

ARC308: Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second

ARC306: Event Handling at Scale: Designing an Auditable Ingestion and Persistence Architecture for 10K+ events/second

SVR301: Real-time Data Processing Using AWS Lambda

SAC310: Securing Serverless Architectures, and API Filtering at Layer 7

MBL404: Deep-Dive: Native, Hybrid and Web patterns with Serverless and AWS Mobile Services

MAC303: Zillow Group: Developing Classification and Recommendation Engines with Amazon EMR and Apache Spark

GAM301: How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful Player Insights

DAT320: AWS Database State of the Union

DAT310: Building Real-Time Campaign Analytics Using AWS Services

CTD303: Design Patterns for High Availability: Lessons from Amazon CloudFront that You Can Replicate on AWS

BDM403: Beeswax: Building a Real-Time Streaming Data Platform on AWS

BDM303: JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing

BDM206: Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT Analytics Platform on AWS

BDA206: Building Big Data Applications with the AWS Big Data Platform

DAT311: How Toyota Racing Development Makes Racing Decisions in Real Time with AWS
