Podcasts about MLCs

  • 18 PODCASTS
  • 31 EPISODES
  • 47m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Mar 4, 2025 LATEST

POPULARITY (chart covering 2017-2024)


Best podcasts about MLCs

Latest podcast episodes about MLCs

Manx Radio's Mannin Line
It's Mannin Line with Andy Wint - Tuesday 4th February 2025

Mar 4, 2025 · 50:19


Attracting digital nomads, Manxman's tidal woes, future of Reayrt ny Baie, DOI minister on Crogga, three new MLCs, dredging Douglas harbour & fly tipping in Castletown and Ramsey. It's Mannin Line with Andy Wint - Tuesday 4th February 2025

Manx Radio's Mannin Line
It's Mannin Line with Beth Espey - Monday 24 February 2025

Feb 24, 2025 · 46:57


A range of topics brought up by callers on today's programme with Beth Espey, including the role of MLCs, the Island's national parks and people seeing what's on your phone.

Manx Radio's Mannin Line
It's Mannin Line with Andy Wint - Friday 21st February 2025

Feb 21, 2025 · 50:31


The role and value of MLCs, phone scams continuing, the cost of the tax cuts, change of supervision ratio and bad behaviour on the Steam Packet & Manx Care apologise to Julie Edge MHK. It's Mannin Line with Andy Wint - Friday 21st February 2025

Manx Radio's Mannin Line
It's Mannin Line with Beth Espey - Tuesday 14 January 2025

Jan 14, 2025 · 47:21


Politics dominated today's programme, with discussion about the number of MHKs and MLCs needed for the Isle of Man. Is it time for reform and if so, what form should that take? There were also differing opinions on the new ratios for teachers travelling with children on school trips. If you have any thoughts to share before the next Mannin Line, you can call 682631.

Agenda - Manx Radio
Agenda 6.1.25 - Legislative Council's second Evidence Stage on the Assisted Dying Bill

Jan 6, 2025 · 26:52


Agenda 6.1.25 - In the final November sitting of the Legislative Council, members held their second Evidence Stage on the Assisted Dying Bill, and MLCs heard from two expert witnesses, Bridget Carter and Dr Claud Regnard, as well as the Bill's promoter, Alex Allinson. Ms Carter works to prevent coercive behaviour towards vulnerable elderly people, and Dr Regnard is a lifelong palliative care expert. As the Assisted Dying Bill approaches the end of its consideration by LegCo, Agenda extracts some of the highlights from the evidence session, including the advice given by the Attorney General on the Bill.

Perspective - Manx Radio
Perspective 10.11.24 - the Assisted Dying Bill has begun its passage through the Legislative Council - what did the MLCs make of it?

Nov 10, 2024 · 50:35


The Assisted Dying Bill 2023 is now progressing through the Legislative Council, and at the Principles Stage of the Bill there was a successful call for further evidence to be taken. The vote in LegCo was seven in favour of the Bill with just one against. We've heard a lot over the past year or so about the case for and against the principle of introducing legislation to allow people to end their life, so on Perspective this week we're considering what the legislation now looks like, why new evidence needed to be considered, and what concerns remain.

Agenda - Manx Radio
Agenda 22.4.24 - should MLCs make decisions on government policy bearing in mind their at best tenuous mandate from the people?

Apr 18, 2024 · 24:32


In Tynwald last week Onchan MHK Rob Callister asked his colleagues to reaffirm a 2017 Tynwald decision relating to the role LegCo members should fulfil and in particular that they should not be a member of more than one government department. The government website tells us that one MLC is a member of two departments and since last month's reshuffle there may be two, but does any of this really matter? As is often the case there were mixed views in Tynwald and we hear some of them on Agenda. Should MLCs make decisions on government policy bearing in mind their at best tenuous mandate from the people?

Manx Radio's Mannin Line
MHK Hooper on the Bishop - latest, medical director's absence, Port Erin development exclusion, the role of MLCs & bye bye Range Left Jagged Right. It's Mannin Line with Andy Wint #iom #manninline #manxradio

Dec 5, 2023 · 49:36


MHK Hooper on the Bishop - latest, medical director's absence, Port Erin development exclusion, the role of MLCs & bye bye Range Left Jagged Right. It's Mannin Line with Andy Wint #iom #manninline #manxradio

The Pacific War - week by week
- 90 - Pacific War - Komiatum Offensive, August 8-15, 1943

Aug 21, 2023 · 35:58


Last time we spoke about the intense battle for Munda. The most important objective of the New Georgia campaign, the seizure of Munda had come at long last. The 43rd, 37th and 25th divisions all performed an envelopment offensive against Munda, but in their way were extremely formidable Japanese fortifications. It was a real slogfest seeing tremendous casualties for both sides of the conflict. However the Americans were able to breakthrough some of the Japanese bunkers, tunnels and pillboxes thanks largely to the use of flamethrowers which were becoming more and more popular on the battlefield of the Pacific. Munda was finally captured and now the Japanese had to withdraw to other areas like Vila to keep the fight alive. On the seas, commander Frederick Moosbrugger unleashed some improved mark 14 torpedoes at the IJN and scored a major victory sending 3 destroyed to their grave and countless sailors and soldiers.  This episode is the Komiatum Offensive Welcome to the Pacific War Podcast Week by Week, I am your dutiful host Craig Watson. But, before we start I want to also remind you this podcast is only made possible through the efforts of Kings and Generals over at Youtube. Perhaps you want to learn more about world war two? Kings and Generals have an assortment of episodes on world war two and much more  so go give them a look over on Youtube. So please subscribe to Kings and Generals over at Youtube and to continue helping us produce this content please check out www.patreon.com/kingsandgenerals. If you are still hungry for some more history related content, over on my channel, the Pacific War Channel you can find a few videos all the way from the Opium Wars of the 1800's until the end of the Pacific War in 1945.    So last week we talked exclusively about the New Georgia campaign, so today as you guessed it we are diving back over to New Guinea. You know when it comes to the big and popular aspects of the war, Guadalcanal usually takes the leading role, but campaigns like New Guinea seem to always fall to the wayside as they say. Yet the battle for New Guinea was just as important, it took significant resources away from the Empire of Japan. We are soon to reach the climax of the Lae-Salamaua campaign, things are really starting to heat up. Now the last time we were over on Green Hell, Brigadier Moten had just ordered the 2/6th battalion to secure Bobdubi ridge, while the 2/5th assault Mount Tambu. By the end of July, the Coane Force was beginning to occupy Tambu Bay. The 3rd battalion, 162nd of Archibald Roosevelt were securing the Boisi area with two of their companies hitting slopes west of Tambu Bay, while the 2nd battalion assembled itself at Tambu Bay. By seizing Tambu Bay, the artillery could now take up a good position to better support the troops. Further north, Brigadier Heathcote Hammer was reorganizing his 15th brigade for a new attack against the Old Vickers position. On July 24th, he held an officers conference at Gwaibolom. Hammer laid out plans to employ the 58/59th battalion against Erskine Creek and Old Vickers. The commander of the 58/59th, Lt Colonel Patrick Starr received the order from Hammer, but also a letter directed at him. In the letter Hammer laid out a ton of criticisms against his unit, some of his officers and by implication Starr himself. The main criticisms were based largely on ineffective ground operations such as the unit lacking adequate knowledge of where their neighboring units were or that of the enemy. 
But as we know, this unit had not received proper training and it really was a baptism under fire kind of situation. But like they say about swimming, sometimes you gotta be thrown into the pool, and boy were they. Following some rather poorly planned and failed attacks back on June 30th, the 58/59th now adopted a more measure approach against the Old Vickers position. Hammer also helped with his reorganizing efforts. Hammer ordered Company A to head further north, while Major Warfe's commandos would take over the defenses for Gwaibolom; General Savige was assigned the 2/7th to help reinforce the 15th brigade; the 2/6th were ordered to advance along the Sugarcane Ridge to clear a way forward, but would run into a 100 Japanese strong position north off the ridge. On July 26th, the Australians concentrated their 25 pounders upon the ridge before launching a frontal assault. Meanwhile Brigadier General Ralph Coane renewed their attack against Roosevelt Ridge on July 27th. Coane ordered the still assembling 2nd battalion, 162nd regiment for the task. 100 men of E company advanced using a creek line parallel to the ridge, going through some thick jungle. They marched single file, hooking back towards a spur that led towards a small knoll on its crest, looking for a way to break the Japanese defenses. But once they reached the crest, they began taking heavy fire and although they established themselves firmly on a shoulder of ground below the ridge, they could advance no further. Meanwhile the 2/6th were lobbing 25 pounders accurately over the Old Vicker's position, forcing the Japanese to flee to the refugee of their underground shelters. It basically had become a routine of taking a bombardment and awaiting some screaming Australians or Americans afterwards for most of the Japanese defenders by this point. However no assault came. On July 28th another bombardment was on its way, but this one was directed on the Coconuts area. Starting at 2:45, two 25 pounders from Tambu Bay fired hundreds of rounds alongside some 3 inch mortars in an attempt to thwart the Japanese from sending reinforcement over to the Old Vickers position. During the final 5 minutes of what was a 15 minute bombardment, it turned into a creeping barrage allowing C Company of the 58/59th to advance. The bombardment made a ton of smoke aiding the men. Three platoons attacked the Old vickers position simultaneously. Platoon 7 of Butch Proby charged across some exposed ground at the center of the position; Platoon 13 of Lt Jack Evans attacked the left; while Platoon 15 of Sergeant Vic Hammond attacked from the right. The platoons managed to successfully overrun the Japanese forward bunkers and reached the crest just as the unsuspecting Japanese there were emerging from their dugouts. A heavy firefight broke out, but it was the defenders who began fleeing for their lives towards the Coconuts area. As the men consolidated the Old vickers position they found 17 dead Japanese, but also a large amount of abandoned booty. A 70mm gun with 300 shells, 4 light machine guns, 1 medium machine gun and 28 rifles which the Australians gladly grabbed. Hammer expected the Japanese to launch a vicious counterattack so he rapidly ordered the 2/7th battalion to send the fresh A Company of Captain Septimus Cramp over to relieve the exhausted C Company. Meanwhile B Company of the 2/6th were assaulting Sugarcane Ridge being supported by 3 inch mortars and 4 Vickers guns from the 2/6th field regiment along the Tambu Bay coast. 
Coming from Ambush Knoll, Platoon 10 led by Lt Clive Trethewie made a frontal assault, but machine gun fire from atop Sugarcane Ridge halted them quickly. Platoon 12 led by Sergeant Stan White and Platoon 11 of Lt Ted Exton were hooking around the ridge to attack the enemy's rear. The Japanese defenders had assumed the ridge was too steep in its rear position and were completely taken by surprise by the attack, seeing Extoons Platoon 11 overrun them. The Japanese were forced to flee for their lives. The Japanese attempted a dusk counterattack to reclaim the ridge, but it failed. On July 28th, with E Company stalled, F company was brought up to help out, taking up a position to E Company's left. They both tried to assault the ridge together, but gained little ground and were forced to dig in as the Japanese harassed them with counterattacks. The problem really was the Japanese were simply too well dug in. They held a steep narrow crest on the ridge, with the typical camouflaged pillboxes, mutually supporting machine gun nests, an intricate network of underground tunnels, lets call it the “Japanese special” haha, it will be seen quite often going forward into this war. The allied artillery and mortar bombardments could do little to actually hurt the Japanese, but it did cause them to take shelter within their tunnels, then there was the hope the assaulting forces got close enough before the Japanese stormed out again, which feels a lot like battles from WW1. In the meantime Major Roosevelt's battalion were working to cut off the Japanese supply routes to the ridge. He dispatched multiple patrols to take up positions along junctions and tracks between Scout ridge, Roosevelt Ridge and Mount Tambu. The men ran into skirmishes with Japanese supply efforts, greatly hindering them. But with the lack of progress by Coane's force concerned certain commanders like General Savige who began to criticize Coane for a lack of control and discipline over the men. Savige ordered him to push on immediately to capture Roosevelt Ridge, but in response Coane protested that he needed more reinforcements to seize the heavily fortified position. Likewise the lack of progress over at Mount Tambu was also annoying commanders. Taylor Force had just relieved the exhausted 2/5th battalion on the 28th. Several companies consisting of around 400 men from the 1st battalion, 162nd regiment coming over from Nassau Bay took up positions around Mount Tambu. Australian mortar crews and stretcher bearers remaining in the line to support their American comrades with one company of the 2/5th staying behind likewise. Moten planned for a new attack, slated for the 30th, to be followed with attacks against Goodview Junction and Orodubi by the 2/5th and 2/6th respectively. To open up the new attack, 8 105mm guns positioned at Buigap Creek Valley alongside 5 25 pounders position at Tambu Bay opened fire in the morning firing around 200 rounds per gun for an hour and a half. The Americans began their assault with Platoons 2 and 3 charging the ridge while Platoon 1 awaited in reserve. For 45 minutes the two leading platoons moved 150 meters across the Japanese front's right shoulder. They managed to knock out 6 out of 8 bunkers on the shoulder before attempting to advance further, but the defenders second tier line three meters higher up opened fire upon them and numerous grenades came rolling down the slope. 
The fire was too much, with the defenders using their tunnel and trench system to deadly effect, taking up numerous positions to fire down on the Americans. The two platoons were halted dead in their tracks as the third platoon was brought up, but it made no difference. A legendary figure emerged from this action. For those of you from down unda, you probably already know the story, but for those of you who don't, Corporal Leslie Bull Allen became a hero this day. Bull Allen was born in 1918 in Ballarat, Victoria, and when WW2 broke out he volunteered for service with the 2nd Australian Imperial Force. He served with the 2/5th in Palestine where he became a stretcher bearer. He served in Libya and Syria where he received the nickname Bull after getting a reputation for having a cool head under fire. He was a fairly big boy, 5'11", with a laborer's build, and he had a really deep laugh; his comrades would remark “you could hear him a mile off! Bull was thus one of the battalion's most recognisable…and one of its most popular characters”. After facing the Italians, French and Germans, Bull was sent to New Guinea. He had served during the Wau battle, where he received a Military Medal for carrying out comrades under intense fire; his citation read “Private Allen's bearing and his untiring efforts in tending the wounded and helping with rations and stores were an inspiration”. On July the 30th, when the Americans were storming Mount Tambu and got bogged down, Allen was one of the stretcher bearers who came running up and by himself carried 12 American servicemen to safety. There's a famous photograph of Bull carrying an American soldier, who had been knocked unconscious by a mortar, over his shoulders; I do recommend googling it. And of course, I am a Sabaton fan and I would be remiss not to mention there is a song dedicated to Bull Allen, worth a listen. I got to sit down with Sabaton at a bar once in Montreal, the first time they came to North America, by the way, just gloating. Bull Allen received the Silver Star for his heroism from the United States. But as much as I'd like to end it there, I would also like to mention the reality of war. Bull put on a straight face and showed no fear as he saved the men, but as early as 1941 he was showing psychological issues. He had been admitted to a hospital in Libya, suffering from anxiety neurosis, again what we call acute combat stress or combat stress reaction. By the time he saved those boys on Mount Tambu his health was being taxed heavily. Towards the end of 1944, Bull would begin lashing out at superior officers and got himself court martialed and demoted to private. His psychological health, alongside a few bouts of malaria, took a horrible toll on him, creating numerous anxiety-ridden episodes and seeing him discharged from duty as he was not deemed medically fit. Bull found it difficult in the post-war years, suffering from post-traumatic stress, and at one point he lost the ability to speak for 6 months. He spent his life after the war working as a laborer and then as a theater nurse at the Ballarat Base Hospital. Bull became quite a popular fellow around Ballarat and would pass away on May 11th of 1982 from diabetes and other complications. He is a staple on Anzac Day and a famous image of the Australian war effort during the Pacific War. Mount Tambu was not taken that day, though the first line of bunkers was battered. Moten realized frontally attacking such fortifications was suicide, so he elected to cut off Mount Tambu instead.
With the Americans failing, the 2/5th and 2/6th planned attacks changed to taking up positions to surround Mount Tambu. Back on the 29th, Major Warfe took his men to attack what was known as the Timbered knoll held by some Japanese. He sent A Platoon led by John Lewin south along its ridge. They were supported by artillery from Tambu Bay. At 4pm the artillery and mortars started blasting away for 15 minutes. The commandos assaulted the knoll from its northern side, but were quickly pinned down by machine gun fire. Around 10 men advanced along the Bench Cut track east of the Timbered Knoll and attacked it from the south, successfully surprising the defenders, forcing them to flee. Following the capture of the Timbered Knoll, Warfe wanted to press onwards to Orodubi, but Brigadier Hammer ordered his commandos to hold their position as he did not want to open up any gaps along the ridge. Also on the 29th, General Herring for the first time informed General Savige of the true offensive going on which was against Lae rather than Salamaua, indicating to him that the role of his 3rd division was to hold the enemy down in the Salamaua area. Likewise Moten had devised a new plan to drive the enemy from Mount Tambu. It turned out a patrol from the 2/6th had discovered a route going from Ambush Knoll to the Buirali Creek which would allow forces to cut off the Komiatum track, thus isolating the Mount Tambu and Goodview junction. The 2/6 sent 4 patrols out searching for how to ford the Buirali Creek going up to the Kiamatum ridge, some of which probed Japanese positions.  To the north, Captain Edwin Griff's B Company of the 58/59th advanced to Buggert preparing to attack the Coconuts area. On the 30th as they began their attack, they were met with heavy fit around 80 yards south of the South Coconuts. Forced to dig in the Australians spent the night repelling 3 counterattacks with a handful of men receiving some nasty bayonet and knife wounds. By the morning of the 31st Griff was down to 38 effective men and at 7:20am a 4th Japanese counterattack consisting of a hundred or so men overwhelmed his position. Griff was forced to withdraw to a village west of the Old Vickers position. While this was going on, Hammer had sent companies over to cut the Komiatum and Bench tracks using his A company and C Company. Moten reinforced him with A company of the 2/7th in the hopes such actions would press the Japanese to move more units from Lae over the Salamaua area. It was a huge success as by the end of July the Salamaua area counted with more than 8000 troops. However with all of these troops at Salamaua also required the allies to boost up their commitment in the area, thus Brigadier Raymond Monaghan with the 29th brigade were landed at Nassau Bay for the task. They were assigned to reinforce the Coane force which was still struggling against Roosevelt ridge. Over on the Japanese side, General Adachi decided to reinforce Lae's defenses. He deployed the 2nd battalion, 80th regiment who would be coming over from Finschhafen, however they would never make it to Lae as by the time they were going to depart they were forced to stay put because the Australians were threatening  the region. Adachi also ordered the Shoge detachment of Major General Shoge Ryoichi to depart Wewak. His force consisted of the 1st and 2nd battalions of the 238th regiment and a battalion of the 41st mountain artillery regiment. Elements of the 238th regiment began leaving Wewak traveling in groups of 3 motor landing crafts every two nights. 
Each MLC had 50 men and their supplies packed in like sardines. Soon small fishing boats were also carrying 20 men, by late July the 2nd battalion, 238th had all moved from Wewak to Alexishafen. From Alexishafen they traveled again by night and by MLC to Finschhafen and from there finally to Lae. However due to increased attacks and losses upon the MLCS countless men would be left at Finschhafen. Some were ordered to march overland to Lae, but it was a nightmare of a trip. On August 1st, the 1st battalion, 80th regiment had taken up positions along the side of the Old Vickers position and began firing upon its defenders. They were covered by mortars as they charged up the steep terrain in an enveloping movements towards Grassy Knoll. Captain Edwin Griff's B company harassed them from the west, and by the following morning the 2/7th battalion were able to push the Japanese back. To the north in the Coconuts, Pimple Knoll and the Sugarcane Knoll more Japanese attacks were occurring, but the defenders held the former Japanese fortifications giving them a distinct edge. By the afternoon the Japanese were sniping men in the Old Vickers and Sugarcane Knoll, trying to cover their assault units. By August 3rd, the Japanese unleashed another assault against the entire perimeter, seeing the fiercest fighting take place in an area in front of the 8th Platoon led by Corporal Alan Naismith. Alan ended up crawling forward with grenades in hand before tossing them down the steep slopes of Old Vickers killing many Japanese. Seeing the battle going nowhere, the Japanese unleashed a banzai charge at night as a last ditch effort to break through, but were ultimately forced to withdraw. Seeing three full days of frontal assaults fail, the Japanese then elected to advance further south along a ridge and dug in between the Old Vickers and Buggert. This threatened to encircle the 2/7th, so Griff's B Company were ordered to restore the line of communications to Old Vickers. Griff ordered a concentrated bombardment of 30 mortars before his company stormed the slope the Japanese dug in on. Two platoons quickly broke through towards Sugarcane Knoll and in the process forced the Japanese to withdraw back over to the Coconuts area. Griff then ordered his company to perform mop up operations as some Japanese had stayed in their foxholes. Yet the performance overall for the 58/59th had displeased Hammer who now decided to place them under Major Warfe's command. They would also be redeployed over to the Gwaibolom area, while the commandos would take over their Old Vickers position. For a few days the 2/7th performed patrols around the Coconuts area to prepare for a final attack against it. Over at Mount Tambu, on August 4th, Captain Cam Bennett's B Company and Walters A Company of the 2/5th successfully surprise attacked the defenders atop a small knoll known as Hodge's Knoll. However they were soon met with heavy counterattacks from three sides dislodging them in the late afternoon. The next day, Moten ordered the 2/6th battalion to advance along the Stephens Track, while its D company led by Captain Harold Laver would take an alternate path towards the Komiatum ridge heading north of Goodview. During the afternoon, a forward patrol of Company D found a route through the jungle to Komiatum village, but the route proved very difficult for the full company to traverse. Alongside this discovery, a patrol from Taylor Force found a small ridge north of Komiatum that was unoccupied named Davidson ridge. 
By August 6th, Moten and Savige concluded their plan to isolate and reduce Mount Tambu. The 2/6th would secure Komiatum ridge to the northwest; Coane Force would hit Roosevelt and Scout ridge; Lt Colonel Charlie Davidson's 42nd battalion would hit a key ridge to the north, i.e. the one that was to be called Davidson; the 2/5th would hit Goodview junction; and the 15th brigade would assault the Coconuts area, containing the enemy at Tambu knoll and Orodubi. General Herring liked the plan and urged General Savige to, quote, "drive Coane on to the capture of Roosevelt Ridge even if the cost is higher than he cares about". Herring also added that he could take Savige's requests to the higher authorities, and upon his saying that, Savige immediately requested Coane and Major Roosevelt be relieved of their commands. Again, a lot of the interpersonal and command issues were due to MacArthur's tampering with Alamo Force. Brigadier Coane was told by Fuller he was a separate command from MacKechnie, and Colonel Roosevelt continuously refused to obey orders from MacKechnie, stating he was not under Australian command. It took until July 19th for Herring to clarify that the Australians were in charge of operations in the Nassau Bay area. Combine this with the lack of progress and it was no surprise people were gunning to sack one another. On August 7th the first units of Davidson's 42nd battalion landed at Nassau Bay at 2am and Coane requested that Davidson immediately march north. Davidson refused to do so until his men got a hot meal and some sleep, angering Coane. Then when Davidson and his men reached Duali he was informed Major Stephen Hodgman was waiting with orders from Moten that it was he who was taking operational command. Coane was only to have command over supply, communications and rations. When Davidson reached Tambu Bay on the 8th he met with Coane, who was greatly frustrated that he was unable to use Davidson's units to hit Roosevelt ridge. Coane told him “If I can't do as I want with you, I don't consider you under my command at all”. It was quite fortunate, as MacArthur soon relieved Coane and Roosevelt of their commands. As General Savige would later write “MacArthur asked me for my views on Coane and Roosevelt and I gave them strongly…I had my bags packed but MacArthur supported me”. Thus MacArthur sided with Herring and Savige, and as a result Colonel MacKechnie was given back command over the 162nd regiment, which was taken away from the 41st division and placed directly under Savige's command. So much sneaky maneuvering going on by MacArthur's team. On August the 9th Savige visited Moten's HQ, then Hammer's, then the 58/59th battalion and finally the 2/6th. He was making a tour of the front lines trying to raise morale for the Australians. The next day, the 42nd battalion finally got into position at Tambu Bay where they received confirmation of their orders to seize Davidson ridge. By the 11th the men were climbing the ridge, facing no opposition, and it was fully occupied by the 12th. Also on the 12th, MacKechnie began his attack against Roosevelt ridge, deploying his 2nd battalion on the right flank and the 3rd on the left. The 2nd battalion established a position on the ridge crest, repelling several counterattacks throughout the day. After a 1.5hr artillery barrage of over 2000 rounds the 2nd battalion charged the ridge and successfully breached the Japanese line at three points.
Meanwhile the 3rd battalion, 66th regiment were fighting for their lives, but by nightfall two Australian companies were now occupying high knolls around 500 yards apart. The 3rd battalion, 238th regiment had just begun arriving to Salamaua and were quickly redirected to help out the men on Roosevelt ridge. It would all be for naught however as by the 14th, the Australians pushed the Japanese to the eastern end of the ridge. From a Historian who covered the 41st division “At about 13:15 the jungles north, south and west of Roosevelt Ridge shook and shivered to the sustained blast. The mountains and ridges threw the echo back and forth, down and out, and the quiet white-capped sea to the east, ringing the outer third of Roosevelt Ridge, grew dark a s it received the eruption of earth and steel on that stricken shoulder of land. Scores of guns—75-mm howitzers, Aussie 25-pounders, 20-mms, Bofors, light and heavy machine-guns, even small arms—had opened up simultaneously on the enemy-held ridge. A score or more Allied fighters and bombers had swooped low to strafe its dome and tons of bombs released from the B-24s and B-25s fell straight and true, to detonate, shatter, rip and tear and to deliver certain death at that moment on an August afternoon. Those who watched from the beach saw the top fourth of the ridge lift perceptibly into the air and then fall into the waiting sea. In a scant twenty minutes all that remained of the objective was a denuded, redly scarred hill over which infantrymen already were clambering, destroying what remained of a battered and stunned enemy.” By the late afternoon, Roosevelt Ridge was finally firmly in the hands of the allies. MacKechnie could not however advance any further as his lines were already overextended. The Japanese withdrew to the nearby Scout Ridge where the 238th regiment reinforcements also came to defend.  While this was occurring the 2/7th were advancing upon the Coconuts area. Captain Andrew Rooke led the Bena platoon of Company A alongside Platoon 9 to hit the steep eastern approaches of the South coconuts; Captain Fred Barr's B company advanced upon the North Coconuts from the west. August the 14th began with a heavy airstrike made up of 22 B-24's and 7 B-17's. Starting at 9:30am as told to us by Axel Olsen observing from the Old Vickers “with a noise like the rushing of a great wind', the bombs passed over the heads of the waiting assault troops. ‘Trees, logs and other rubbish flew through the fall [sic] of dust which now cloaked the target.' The observers at Old Vickers observed, ‘It seemed that nothing could have lived in the midst of devastation loosed by the planes.” At 10:10 artillery began to bombard the area for an hour and half. As the artillery ceased, 3 inch mortars continued to fire covering the approach of the infantry who were using smoke bombs. As Axel Olsen wrote observing from the Old Vickers  “came a terribly fierce raking with Vickers guns firing through the haze from smoke bombs”. The Australian assault battered the north coconuts position which was guarded by two pillboxes connected to weapon pits using crawl trenches. The area had suffered hard from the bombardments easily allowing the Australians to seize it. However the southern defenses of the south Coconuts found defenders resisting hard in their trenches. The center Coconuts position like the north had nearly been obliterated by the bombing allowing B company to make progress, but soon they were pulling back to the north coconuts position. 
During the night, allied platoons came across a Japanese communication line going over the Salamua-Bobdubi track, so they cut it to prevent reinforcements. For the next two days, patrols and mortar fire were harassing the south coconuts defenders gradually forcing them to evacuate. By August the 17th the Coconuts and northern end towards Bobdubi were firmly in Australian hands. With all of these gains in hand, Moten was finally ready to attack Komiatum. On August the 15th,  Captain Edgar's A Company, Captain Laver's D Company of the 2/6th battalion took up a position due west of Laver's Knoll. Yes the future names of these knolls and ridges really does seem to give away what happens in the stories haha. Laver's Knoll was a key feature of the Komiatum ridge and taking it would allow the allies to apply more pressure upon the enemy. On the morning of the 16th, the 2/5th battalion performed a diversionary attack against Goodview, while A and B Companies advanced up the Komiatum ridge under a creeping barrage. The men were fortunate as the Japanese were forced to flee during the artillery fire, allowing Laver's knoll to be seized quite easily. The men dug in immediately allowing Lt Les Johnson's platoon 17 to capture, you guessed it Johnson's knoll. During WW2 if you really wanted something named after you, all you had to do was travel to Green Hell. Johnson and his men dug in on the knoll and soon Japanese fire was directed at them. Japanese counterattacks were lobbed from their south and west before nightfall, but they managed to hold on. During the night the 42nd battalion began using Vicker guns and mortars from Davidson ridge to help harass the enemy. Around dawn on the 17th, the Japanese unleashed another counter attack against Johnson knoll, this time the enemy got within just meters of the Australian defenders. After dusk even more counterattacks were made seeing 217 deaths, 380 wounded and 301 sick Japanese after all was said and done. The attacks were tossed back and soon Vickers machine guns were brought up to Laver's Knoll to add to the Japanese misery. Unable to break the allied push onto the Komiatum ridge, the Japanese began to become more and more desperate. Artillery and aerial bombardment on top of enveloping maneuvers by the Australians were taking a heavy toll. The Japanese had suffered over 900 casualties since July 23rd and with more and more men dying by the minute, General Nakano ordered a withdrawal from Komiatum to be carried out on the night of August 19th. Nakano was still under the illusion Salamaua was the main target. The next day the Taylor Force and 2/5th found Mount Tambu and Goodview suddenly unoccupied and finally seized their objectives. General Savige personally came over to congratulate the men who took Laver's knoll, but this was to be his last action in command of the 3rd division. Blamey decided to replace  Savige with the commander of the 5th division General Edward Milford. Milford would later find out the reason for Savige's sacking was because General Herring was greatly annoyed that a supply line to the coast had not been opened, which was desperately needed to relieve supply aircraft for the upcoming attack on Lae. Herring told Milford that Savige had never visited the front line because he was too old, but as I just mentioned this was false, Savage had in fact visited Mubo and Komiatum. 
Major General Frank Berryman working in Blamey's HQ, who remained quite close to the man, who often sought out his advice believed General Herring was unjustified in his sacking of Savige. Berryman would point out “ Herring ‘not giving Savige a fair burl… Savige having to fight Herring as well as Japs. Savige had done well and we had misjudged him'.Savige bitterly handed over his command, greatly disappointed he would not get to see the final capture of Salamaua. But he did not depart unrewarded, as he received a Companion of the order of Bath for his services during the campaign with his citation reading; Maj-Gen. Savige had control of the Battle for Salamaua from 30 Jun. 43 till his relief on 26 Aug. 43. The battle was finally won on 11 Sep. 43—the credit for victory must rest with Maj-Gen. Savige during whose period of command, the back of the enemy's defence was broken. The nature of the country rendered great assistance to the defender, and careful planning alone enabled the defences to be overcome. The supplying of our forward troops was also a terrific problem. Maj-Gen. Savige triumphed over all these difficulties, his men were kept supplied, they were encouraged to endure the most dreadful hardships, and to overcome great difficulties of terrain. Maj-Gen. Savige's plans were well conceived and he saw them carried through. The success achieved is of the greatest importance to the Allied cause, and Maj-Gen. Savige by his fine leadership has made a very real contribution to the ultimate success of the United nations. The victories won over the enemy at the battles for Mubo and Komiatum were due to his well conceived plans and energetic execution. I would like to take this time to remind you all that this podcast is only made possible through the efforts of Kings and Generals over at Youtube. Please go subscribe to Kings and Generals over at Youtube and to continue helping us produce this content please check out www.patreon.com/kingsandgenerals. If you are still hungry after that, give my personal channel a look over at The Pacific War Channel at Youtube, it would mean a lot to me. The battle for Salamaua and Lae was drawing ever closer. The boys down unda had seized control over vital positions forcing the Japanese into more and more desperate defensive measures taking horrifying casualties in the process. 

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Jul 26, 2023 · 54:31


FlashAttention was first published by Tri Dao in May 2022 and it had a deep impact in the large language models space. Most open models you've heard of (RedPajama, MPT, LLaMA, Falcon, etc) all leverage it for faster inference. Tri came on the podcast to chat about FlashAttention, the newly released FlashAttention-2, the research process at Hazy Lab, and more. This is the first episode of our “Papers Explained” series, which will cover some of the foundational research in this space. Our Discord also hosts a weekly Paper Club, which you can signup for here. How does FlashAttention work?The paper is titled “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”. There are a couple keywords to call out:* “Memory Efficient”: standard attention memory usage is quadratic with sequence length (i.e. O(N^2)). FlashAttention is sub-quadratic at O(N). * “Exact”: the opposite of “exact” in this case is “sparse”, as in “sparse networks” (see our episode with Jonathan Frankle for more). This means that you're not giving up any precision.* The “IO” in “IO-Awareness” stands for “Input/Output” and hints at a write/read related bottleneck. Before we dive in, look at this simple GPU architecture diagram:The GPU has access to three memory stores at runtime:* SRAM: this is on-chip memory co-located with the actual execution core. It's limited in size (~20MB on an A100 card) but extremely fast (19TB/s total bandwidth)* HBM: this is off-chip but on-card memory, meaning it's in the GPU but not co-located with the core itself. An A100 has 40GB of HBM, but only a 1.5TB/s bandwidth. * DRAM: this is your traditional CPU RAM. You can have TBs of this, but you can only get ~12.8GB/s bandwidth, which is way too slow.Now that you know what HBM is, look at how the standard Attention algorithm is implemented:As you can see, all 3 steps include a “write X to HBM” step and a “read from HBM” step. The core idea behind FlashAttention boils down to this: instead of storing each intermediate result, why don't we use kernel fusion and run every operation in a single kernel in order to avoid memory read/write overhead? (We also talked about kernel fusion in our episode with George Hotz and how PyTorch / tinygrad take different approaches here)The result is much faster, but much harder to read:As you can see, FlashAttention is a very meaningful speed improvement on traditional Attention, and it's easy to understand why it's becoming the standard for most models.This should be enough of a primer before you dive into our episode! 
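To make that read/write overhead concrete, here is a minimal sketch (assuming PyTorch 2.x; the shapes are illustrative and scaled_dot_product_attention is used as a stand-in for a fused, FlashAttention-style kernel) contrasting the naive computation, which materializes the full N x N intermediates, with a single fused call:

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Each intermediate below is a full N x N tensor that a straightforward
    # implementation writes to HBM and then reads back.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # N x N scores
    probs = scores.softmax(dim=-1)                           # N x N probabilities
    return probs @ v                                         # N x d output

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 1024, 64) for _ in range(3))

out_naive = naive_attention(q, k, v)
# Fused path: no N x N intermediate is materialized; on suitable GPUs and dtypes
# PyTorch can dispatch this to a FlashAttention-style kernel.
out_fused = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(out_naive, out_fused, atol=1e-5))  # exact attention, not an approximation
```

On CPU this only checks numerical equivalence; the speedup shows up on GPU, where skipping the two N x N round-trips through HBM is exactly what the fused kernel buys you.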
We talked about FlashAttention-2, how Hazy Research Group works, and some of the research being done in Transformer alternatives.Show Notes:* FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv)* FlashAttention-2* Together AI* From Deep Learning to Long Learning* The Hardware Lottery by Sara Hooker* Hazy Research* Is Attention All You Need?* Nvidia CUTLASS 3* SRAM scaling slows* Transformer alternatives:* S4* Hyena* Recurrent Neural Networks (RNNs)Timestamps:* Tri's background [00:00:00]* FlashAttention's deep dive [00:02:18]* How the Hazy Research group collaborates across theory, systems, and applications [00:17:21]* Evaluating models beyond raw performance [00:25:00]* FlashAttention-2 [00:27:00]* CUDA and The Hardware Lottery [00:30:00]* Researching in a fast-changing market [00:35:00]* Promising transformer alternatives like state space models and RNNs [00:37:30]* The spectrum of openness in AI models [00:43:00]* Practical impact of models like LLAMA2 despite restrictions [00:47:12]* Incentives for releasing open training datasets [00:49:43]* Lightning Round [00:53:22]Transcript:Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO-in-Residence at Decibel Partners. Today we have no Swyx, because he's in Singapore, so it's a one-on-one discussion with Tri Dao. Welcome! [00:00:24]Tri: Hi everyone. I'm Tri Dao, excited to be here. [00:00:27]Alessio: Tri just completed his PhD at Stanford a month ago. You might not remember his name, but he's one of the main authors in the FlashAttention paper, which is one of the seminal work in the Transformers era. He's got a lot of interest from efficient transformer training and inference, long range sequence model, a lot of interesting stuff. And now you're going to be an assistant professor in CS at Princeton next year. [00:00:51]Tri: Yeah, that's right. [00:00:52]Alessio: Yeah. And in the meantime, just to get, you know, a low pressure thing, you're Chief Scientist at Together as well, which is the company behind RedPajama. [00:01:01]Tri: Yeah. So I just joined this week actually, and it's been really exciting. [00:01:04]Alessio: So what's something that is not on the internet that people should know about you? [00:01:09]Tri: Let's see. When I started college, I was going to be an economist, so I was fully on board. I was going to major in economics, but the first week I was at Stanford undergrad, I took a few math classes and I immediately decided that I was going to be a math major. And that kind of changed the course of my career. So now I'm doing math, computer science, AI research. [00:01:32]Alessio: I had a similar thing. I started with physics and then I took like a programming course and I was like, I got to do computer science. I don't want to do physics. So FlashAttention is definitely, everybody's using this. Everybody loves it. You just released FlashAttention 2 last week. [00:01:48]Tri: Yeah. Early this week on Monday. Yeah. [00:01:53]Alessio: You know, AI time. Things move fast. So maybe let's run through some of the FlashAttention highlights, some of the innovation there, and then we can dive into FlashAttention 2. So the core improvement in FlashAttention is that traditional attention is a quadratic sequence length. And to the two, FlashAttention is linear, which obviously helps with scaling some of these models. [00:02:18]Tri: There are two factors there. So of course the goal has been to make attention go faster or more memory efficient. 
And ever since attention became popular in 2017 with the Transformer paper, lots and lots of folks have been working on this. And a lot of approaches have been focusing on approximating attention. The goal is you want to scale to longer sequences. There are tons of applications where you want to do that. But scaling to longer sequences is difficult because attention scales quadratically in sequence length on both runtime and memory, as you mentioned. So instead of trying to approximate attention, we were trying to figure out, can we do the same computation and maybe be more memory efficient? So in the end, we ended up with memory that is linear in sequence length. In terms of computation, it's still quadratic, but we managed to make it much more hardware friendly. And as a result, we do get wall clock speed up on the order of 2 to 4x, which really helps because that just means that you'll be able to train with 2 to 4x longer sequence length for the same cost without doing any approximations. As a result, lots of folks have been using this. The thing is available in a lot of libraries that do language model training or fine tuning. [00:03:32]Alessio: And the approximation thing is important because this is an exact thing versus a sparse. So maybe explain a little bit the difference there. [00:03:40]Tri: For sure. So in attention, essentially you compute pairwise similarity between every single element in a sequence against each other. So there's been other approaches where instead of doing all that pairwise computation, you only compute similarity for some pairs of elements in the sequence. So you don't do a quadratic number of comparisons. And this can be seen as some form of sparsity. Essentially you're ignoring some of the elements. When you write down the matrix, you essentially say, OK, I'm going to pretend there's zero. So that has some benefits in terms of runtime and memory. But the trade-off is that it tends to do worse in terms of quality because you're essentially approximating or ignoring some elements. And I personally have worked on this as well for a few years. But when we talk to practitioners who actually train models, especially at large scale, they say they tend not to use these approximate attention methods. Because it turns out, and this was surprising to me at the time, these approximation methods, even though they perform fewer computations, tend to not be faster in wall-clock time. So this was pretty surprising because back then, I think my background was more on the theoretical side. So I was thinking of, oh, how many flops or floating point operations are you performing? And hopefully that correlates well with wall-clock time. But I realized that I was missing a bunch of ideas from the system side where flops or floating point operations don't necessarily correlate with runtime. There are other factors like memory reading and writing, parallelism, and so on. So I learned a ton from just talking to systems people because they kind of figured this stuff out a while ago. So that was really eye-opening. And then we ended up focusing a lot more on memory reading and writing because that turned out to be the majority of the time when you're doing attention is reading and writing memory. [00:05:34]Alessio: Yeah, the I.O. awareness is probably one of the biggest innovations here. And the idea behind it is, like you mentioned, the FLOPS growth of the cards has been going up, but the memory bandwidth, not as much.
So I think maybe that was one of the assumptions that the original attention paper had. So talk a bit about how that came to be as an idea. It's one of those things that like in insight, it's like, obviously, why are we like rewriting to like HBM every time, you know, and like once you change it, it's clear. But what was that discovery process? [00:06:08]Tri: Yeah, in hindsight, a lot of the ideas have already been there in the literature. And I would say is it was somehow at the intersection of both machine learning and systems. And you kind of needed ideas from both sides. So on one hand, on the system side, so lots of systems folks have known that, oh, you know, kernel fusion is great. Kernel fusion just means that instead of performing, you know, loading the same element, instead of performing an operation, write it down, load it back up and perform the second operation, you just load it once, perform two operations and then write it down again. So that saves you kind of memory read and write in the middle there. So kernel fusion has been a classic. There's been other techniques from the system side, like tiling, where you perform things in the form of computations in block, again, so that you can load it into a really fast memory. Think of it as a cache. And this is, again, classical computer science ideas, right? You want to use the cache. So the system folks have been thinking about these ideas for a long time, and they apply to attention as well. But there were certain things in attention that made it difficult to do a complete kernel fusion. One of which is there is this softmax operation in the middle, which requires you to essentially sum across the row of the attention matrix. So it makes it difficult to kind of break it, because there's this dependency. So it makes it difficult to break things into a block. So on the system side, people have been thinking about these ideas, but it's been difficult to kind of do kernel fusion for the entire operation. On the machine learning side, people have been thinking more algorithmically. They say, okay, either we can approximate attention, or there's this trick called the online softmax trick, which says that because of softmax, the way it's written mathematically, you can actually break it up into smaller pieces, do some rescaling, and still get the right answer. So this online softmax trick has been around for a while. I think there was a paper from NVIDIA folks back in 2018 about this. And then there was a paper from Google. So Marcus, Rob, and Stats wrote a paper late 2021 on using this online softmax trick to break attention up into smaller pieces. So a lot of the ideas were already there. But it turns out, you kind of need to combine ideas from both sides. So you need to understand that, hey, we want to do kernel fusion to reduce memory written writes. But we also need this online softmax trick to be able to break the softmax into smaller pieces so that a lot of the systems tricks kind of carry through. We saw that, and it was kind of a natural idea that we ended up using ideas from both sides, and it ended up working pretty well. Yeah. [00:08:57]Alessio: Are there any downsides to kernel fusion? If I think about databases and the reasons why we have atomic operations, you know, it's like, you have observability and fallback in between them. How does that work with attention? Is there anything that we lose by fusing the operations? 
[00:09:13]Tri: Yeah, I think mostly on the practical side is that you lose a little bit of flexibility in the sense that, hey, now you have, for example, faster attention, it's just a subroutine that you would call to do attention. But as a researcher, let's say you don't want that exact thing, right? You don't want just attention, let's say you want some modification to attention. You want to do, hey, I'm going to multiply the query and key, but then I'm going to do this extra thing before I carry on. So kernel fusion just means that, okay, we have a subroutine that does the entire thing. But if you want to experiment with things, you won't be able to use that fused kernel. And the answer is, can we have a compiler that then automatically does a lot of this kernel fusion? Lots of compiler folks are thinking about this, either with a new language or you can embed it in PyTorch. PyTorch folks have been working on this as well. So if you write just your code in PyTorch and they can capture the graph, can they generate code that will fuse everything together? That's still ongoing, and it works for some cases. But for attention, because of this kind of softmax rewriting stuff, it's been a little bit more difficult. So maybe in a year or two, we'll have compilers that are able to do a lot of these optimizations for you. And you don't have to, for example, spend a couple months writing CUDA to get this stuff to work. Awesome. [00:10:41]Alessio: And just to make it clear for listeners, when we say we're not writing it to memory, we are storing it, but just in a faster memory. So instead of the HBM, we're putting it in the SRAM. Yeah. [00:10:53]Tri: Yeah. [00:10:54]Alessio: Maybe explain just a little bit the difference there. [00:10:56]Tri: Yeah, for sure. This is kind of a caricature of how you think about accelerators or GPUs in particular, is that they have a large pool of memory, usually called HBM, or high bandwidth memory. So this is what you think of as GPU memory. So if you're using A100 and you list the GPU memory, it's like 40 gigs or 80 gigs. So that's the HBM. And then when you perform any operation, you need to move data from the HBM to the compute unit. So the actual hardware unit that does the computation. And next to these compute units, there are on-chip memory or SRAM, which are much, much smaller than HBM, but much faster. So the analogy there is if you're familiar with, say, CPU and RAM and so on. So you have a large pool of RAM, and then you have the CPU performing the computation. But next to the CPU, you have L1 cache and L2 cache, which are much smaller than DRAM, but much faster. So you can think of SRAM as the small, fast cache that stays close to the compute unit. Physically, it's closer. There is some kind of asymmetry here. So HBM is much larger, and SRAM is much smaller, but much faster. One way of thinking about it is, how can we design algorithms that take advantage of this asymmetric memory hierarchy? And of course, lots of folks have been thinking about this. These ideas are pretty old. I think back in the 1980s, the primary concerns were sorting. How can we sort numbers as efficiently as possible? And the motivating example was banks were trying to sort their transactions, and that needs to happen overnight so that the next day they can be ready. And so the same idea applies, which is that they have slow memory, which was hard disk, and they have fast memory, which was DRAM. And people had to design sorting algorithms that take advantage of this asymmetry. 
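As a rough illustration of the tiling and online-softmax rescaling described above, here is a small editorial sketch in plain NumPy (not code from the episode or from the FlashAttention repository): it walks over the keys and values one block at a time, keeping only a running row-max and running denominator per query, and still recovers the exact softmax attention output.

```python
import numpy as np

def naive_attention(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def blocked_attention(q, k, v, block=64):
    # Online softmax: process K/V in blocks, rescaling earlier partial results
    # whenever a new, larger row-max is seen, so the full N x N score matrix
    # never has to be held at once.
    scale = 1.0 / np.sqrt(q.shape[-1])
    m = np.full((q.shape[0], 1), -np.inf)      # running row max
    l = np.zeros((q.shape[0], 1))              # running softmax denominator
    acc = np.zeros((q.shape[0], v.shape[1]))   # running (unnormalized) output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                                  # scores for this block only
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        p = np.exp(s - m_new)
        correction = np.exp(m - m_new)                        # rescale previous blocks
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ vb
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q = rng.standard_normal((128, 64))
k = rng.standard_normal((1024, 64))
v = rng.standard_normal((1024, 64))
print(np.allclose(naive_attention(q, k, v), blocked_attention(q, k, v)))  # True
```

Done block by block in fast on-chip memory with a fused kernel, this same bookkeeping is what lets the full attention matrix stay out of HBM entirely.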
And it turns out, these same ideas can apply today, which is different kinds of memory. [00:13:00]Alessio: In your paper, you have the pyramid of memory. Just to give people an idea, when he says smaller, it's like HBM is like 40 gig, and then SRAM is like 20 megabytes. So it's not a little smaller, it's much smaller. But the throughput on card is like 1.5 terabytes a second for HBM and like 19 terabytes a second for SRAM, which is a lot larger. How do you think that evolves? So TSMC said they hit the scaling limits for SRAM, they just cannot grow that much more. HBM keeps growing, HBM3 is going to be 2x faster than HBM2, I think the latest NVIDIA thing has HBM3. How do you think about the future of FlashAttention? Do you think HBM is going to get fast enough when maybe it's not as useful to use the SRAM? [00:13:49]Tri: That's right. I think it comes down to physics. When you design hardware, literally SRAM stays very close to compute units. And so you don't have that much area to essentially put the transistors. And you can't shrink these things too much. So just physics, in terms of area, you don't have that much area for the SRAM. HBM is off-chip, so there is some kind of bus that essentially transfers data from HBM to the compute unit. So you have more area to essentially put these memory units. And so yeah, I think in the future SRAM probably won't get that much larger, because you don't have that much area. HBM will get larger and faster. And so I think it becomes more important to design algorithms that take advantage of this memory asymmetry. It's the same thing in CPU, where the cache is really small, the DRAM is growing larger and larger. DRAM could get to, I don't know, two terabytes, six terabytes, or something, whereas the cache stays at, I don't know, 15 megabytes or something like that. I think maybe the algorithm design becomes more and more important. There's still ways to take advantage of this, I think. So in the future, I think flash attention right now is being used. I don't know if in the next couple of years, some new architecture will come in and whatnot, but attention seems to be still important. For the next couple of years, I still expect some of these ideas to be useful. Not necessarily the exact code that's out there, but I think these ideas have kind of stood the test of time. New ideas like IO awareness from back in the 1980s, ideas like kernel fusions, tiling. These are classical ideas that have stood the test of time. So I think in the future, these ideas will become more and more important as we scale models to be larger, as we have more kinds of devices, where performance and efficiency become much, much more important. [00:15:40]Alessio: Yeah, and we had Jonathan Frankle on the podcast, and if you go to issattentionallyouneed.com, he has an outstanding bet, and he does believe that attention will be the state of the art architecture still in a few years. Did you think flash attention would be this popular? I'm always curious on the research side, you publish a paper, and obviously you know it's great work, but sometimes it just kind of falls flat in the industry. Could you see everybody just starting to use this, or was that a surprise to you? [00:16:11]Tri: Certainly, I didn't anticipate the level of popularity. Of course, we were extremely happy to have people using this stuff and giving us feedback and so on, and help us improve things. 
I think when we were writing the paper, I remember sending an email to one of my advisors, and like, hey, I'm excited about this paper, but I think the most important thing will be the artifact, which is the code. So I knew that the code will be valuable. So we kind of focus a lot on the code and make sure that the code is usable and as fast as can be. Of course, the idea, the paper presents the ideas and explain it and have experiments that validate the idea, but I knew that the artifact or the code was also pretty important. And that turned out to be the right focus, which is, you know, we put out the paper, we release the code and continue working on the code. So it's a team effort with my co-authors as well. [00:17:07]Alessio: We mentioned Hazy Research a bunch of times on the podcast before. I would love for you to spend five minutes just talking about how does the group work? How do people get together? How do you bounce ideas off of each other? Yeah. [00:17:21]Tri: So Hazy Research is a research group at Stanford led by one of my advisors, Chris Re. I love the people there. It was one of the best experiences I had. They've made my PhD so much more enjoyable. And I think there are a couple of ways that the group has been working pretty well. So one is, I think there's a diverse pool of people who either, you know, some of them focus on algorithms and theory, some of them focus on building systems, some of them focus on applications. And as a result, there is this flow of idea. So as an example, some of us were working on like more algorithms and theory, and then we can talk to the folks building systems and say, hey, let's try it out and let's put it in the systems and see how it is. And there you will get feedback from systems folks. They will say, hey, we implemented this, or we tried this and this is where it doesn't work, something like that. And once we put it in the systems, the application folks can use the algorithm or new methods or new models. And we again get great feedback from them because the application folks, for example, some of my good friends, they focus on medical imaging or seizure detection. And that is the problem they care about. And if your method doesn't work on the task they care about, they will tell you. Whereas I think a lot of people in machine learning, they're a little bit more flexible. So they will be like, hey, it doesn't work on seizure detection. Let's try some other task, right? But having that direct feedback of like, hey, it doesn't work there, let's figure out why. I think that that feedback allows us to do better work. And I think that kind of process of exchanging ideas, validating it in a real system so that applications folks can try it out and give you feedback. That cycle has been very, very useful. And so that's one, having a diverse group of people. The other one is, and this is something I really appreciate from advice from Chris was try to understand the fundamental, right? And he's happy letting me go off and read some textbooks and playing with things because I think a lot of research ideas come from understanding the old literature and see how it fits with the new landscape. And so if you just new archive papers every day, that's great, but you also need to read textbooks. And that's one advice I got from Chris, which is understand the fundamentals. And I think that allows us to do more impactful work. [00:19:46]Alessio: How do you think about academia versus industry? 
I feel like AI / Machine Learning has been an area where up until three, four years ago, most of the cutting edge work was being done in academia. And now there's all these big industry research labs. You're obviously going to Princeton, so you're an academia believer. How should people think about where to go? Say I'm doing my master's, I have to decide between doing a PhD and going into OpenAI Anthropic. How should I decide? [00:20:15]Tri: I think they kind of play a complementary role, in my opinion. Of course, I also was considering different paths as well. So I think right now, scaling matters a lot, especially when you talk about language models and AI and so on. Scaling matters a lot. And that means that you need compute resources and you need infrastructure and you need engineers time. And so industry tends to have an advantage when it comes to scaling things. But a lot of the ideas actually came from academia. So let's take Attention, which got popular with the Transformer in 2017. Attention actually has been around for a while. So I think the first mention was in 2014, a paper from Bernadot and others and Yoshua Bengio, which is coming from academia. A lot of ideas did come from academia. And scaling things up, of course, I think OpenAI has been great at scaling things up. That was the bet that they made after, I think, GPT-2. So they saw that scaling these things up to back then was 1.5 billion parameter seemed to give you amazing capabilities. So they really committed to that. They really committed to scaling things. And that turned out to be, it's been a pretty successful bet. I think for academia, we're still trying to figure out exactly what we're doing in this shifting landscape. And so lots of folks have been focusing on, for example, evaluation. So I know the Stanford Center for Foundation Model led by Percy, they have this benchmark called HELM, which is this holistic benchmark. So trying to figure out, okay, characterizing the landscape of different kinds of models, what people should evaluate, what people should measure, and things like that. So evaluation is one role. The other one is understanding. So this has happened historically where there's been some development in the industry and academia can play a role in explaining, understanding. They have the luxury to slow down trying to understand stuff, right? So lots of paper on understanding what's really going on, probing these models, and so on. I think I'm not as familiar with the NLP literature, but my impression is there's a lot of that going on in the NLP conferences, which is understanding what these models are doing, what capabilities they have, and so on. And the third one I could see is that the academia can take more risky bets in the sense that we can work on stuff that is quite different from industry. I think industry, my impression is you have some objective. You're trying to say, hey, for this quarter, we want to scale the model in this particular way. Next quarter, we want the model to have these capabilities. You're trying to get objectives that maybe, I don't know, 70% that will work out because it's important for the company's direction. I think for academia, the way things work is you have many, many researchers or PhD students, and they're kind of pursuing independent directions. And they have a little bit more flexibility on, hey, I'm going to try out this seemingly crazy idea and see, let's say there's a 30% chance of success or something. 
And however you define success, for academia, a lot of the time, success just means like, hey, we found something interesting. That could eventually go into industry through collaboration and so on. So I do see academia and industry kind of playing complementary roles. And as for someone choosing a career, I think just more and more generally, industry would be probably better in terms of compensation, in terms of probably work-life balance. But my biased perspective is that maybe academia gives you a little bit more freedom to think and understand things. So it probably comes down to personal choice. I end up choosing to be a professor next year at Princeton. But of course, I want to maintain a relationship with industry folks. I think industry folks can provide very valuable feedback to what we're doing in academia so that we understand where the field is moving because some of the directions are very much influenced by what, for example, OpenAI or Google is doing. So we want to understand where the field is moving. What are some promising applications? And try to anticipate, okay, if the field is moving like this, these applications are going to be popular. What problems will be important in two, three years? And then we try to start thinking about those problems so that hopefully in two, three years, we have some of the answers to some of these problems in two, three years. Sometimes it works out, sometimes it doesn't. But as long as we do interesting things in academia, that's the goal. [00:25:03]Alessio: And you mentioned the eval side. So we did a Benchmarks 101 episode. And one of the things we were seeing is sometimes the benchmarks really influence the model development. Because obviously, if you don't score well on the benchmarks, you're not going to get published and you're not going to get funded. How do you think about that? How do you think that's going to change now that a lot of the applications of these models, again, is in more narrow industry use cases? Do you think the goal of the academia eval system is to be very broad and then industry can do their own evals? Or what's the relationship there? [00:25:40]Tri: Yeah, so I think evaluation is important and often a little bit underrated. So it's not as flashy as, oh, we have a new model that can do such and such. But I think evaluation, what you don't measure, you can't make progress on, essentially. So I think industry folks, of course, they have specific use cases that their models need to do well on. And that's what they care about. Not just academia, but other groups as well. People do understand what are some of the emerging use cases. So for example, now one of the most popular use cases is Chatbot. And then I think folks from Berkeley, some of them are from Berkeley, call them MLCs. They set up this kind of Chatbot arena to essentially benchmark different models. So people do understand what are some of the emerging use cases. People do contribute to evaluation and measurement. And as a whole, I think people try to contribute to the field and move the field forward, albeit that maybe slightly different directions. But we're making progress and definitely evaluation and measurement is one of the ways you make progress. So I think going forward, there's still going to be just more models, more evaluation. We'll just have better understanding of what these models are doing and what capabilities they have. 
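Arena-style leaderboards like the one mentioned here turn pairwise human votes into a ranking, and they have typically reported Elo-style ratings. A minimal sketch of that update rule is below, with made-up model names, a made-up K-factor, and a tiny hypothetical vote log; the real leaderboard's methodology is more involved.

```python
def expected_score(r_a, r_b):
    """Probability that model A beats model B under an Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(ratings, winner, loser, k=32):
    """Update ratings in place after one human preference vote (winner beat loser)."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - e_w)
    ratings[loser] -= k * (1 - e_w)

# Hypothetical vote log: (winner, loser) pairs from side-by-side chatbot battles.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_c", "model_b")]
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
for w, l in votes:
    elo_update(ratings, w, l)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```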
[00:26:56]Alessio: I like that your work has been focused on not making benchmarks better, but it's like, let's just make everything faster. So it's very horizontal. So FlashAttention 2, you just released that on Monday. I read in the blog post that a lot of the work was also related to some of the NVIDIA library updates. Yeah, maybe run us through some of those changes and some of the innovations there. Yeah, for sure. [00:27:19]Tri: So FlashAttention 2 is something I've been working on for the past couple of months. So the story is the NVIDIA CUTLASS team, they released a new version of their library, which contains all these primitives to allow you to do matrix multiply or memory loading on GPU efficiently. So it's a great library and I built on that. So they released their version 3 back in January and I got really excited and I wanted to play with that library. So as an excuse, I was just like, okay, I'm going to refactor my code and use this library. So that was kind of the start of the project. By the end, I just ended up working with the code a whole lot more and I realized that, hey, there are these inefficiencies still in Flash Attention. We could change this way or that way and make it, in the end, twice as fast. But of course, building on the library that the NVIDIA folks released. So that was kind of a really fun exercise. I was starting out, it's just an excuse for myself to play with the new library. What ended up was several months of improvement, improving Flash Attention, discovering new ideas. And in the end, we managed to make it 2x faster and now it's pretty close to probably the efficiency of things like matrix multiply, which is probably the most optimized subroutine on the planet. So we're really happy about it. The NVIDIA Cutlass team has been very supportive and hopefully in the future, we're going to collaborate more. [00:28:46]Alessio: And since it's an NVIDIA library, can you only run this on CUDA runtimes? Or could you use this and then run it on an AMD GPU? [00:28:56]Tri: Yeah, so it's an NVIDIA library. So right now, the code we release runs on NVIDIA GPUs, which is what most people are using to train models. Of course, there are emerging other hardware as well. So the AMD folks did implement a version of Flash Attention, I think last year as well, and that's also available. I think there's some implementation on CPU as well. For example, there's this library, ggml, where they implemented the same idea running on Mac and CPU. So I think that kind of broadly, the idea would apply. The current implementation ended up using NVIDIA's library or primitives, but I expect these ideas to be broadly applicable to different hardware. I think the main idea is you have asymmetry in memory hierarchy, which tends to be everywhere in a lot of accelerators. [00:29:46]Alessio: Yeah, it kind of reminds me of Sara Hooker's post, like the hardware lottery. There could be all these things that are much better, like architectures that are better, but they're not better on NVIDIA. So we're never going to know if they're actually improved. How does that play into some of the research that you all do too? [00:30:04]Tri: Yeah, so absolutely. Yeah, I think Sara Hooker, she wrote this piece on hardware lottery, and I think she captured really well of what a lot of people have been thinking about this. 
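For listeners who want the fused kernel described above without writing any CUDA, recent PyTorch releases expose fused attention behind torch.nn.functional.scaled_dot_product_attention, which can dispatch to a FlashAttention-style backend on supported NVIDIA GPUs and falls back to other implementations elsewhere. This is only a rough sketch; which backend actually runs depends on your PyTorch version, hardware, dtype, and shapes.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
# Half precision on recent NVIDIA GPUs is what typically enables the fused flash backend.
dtype = torch.float16 if device == "cuda" else torch.float32

b, h, s, d = 4, 16, 2048, 64
q, k, v = (torch.randn(b, h, s, d, device=device, dtype=dtype) for _ in range(3))

# One call replaces matmul + softmax + matmul; PyTorch picks a fused backend when it can.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([4, 16, 2048, 64])
```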
And I certainly think about hardware lottery quite a bit, given that I do some of the work that's kind of really low level at the level of, hey, we're optimizing for GPUs or NVIDIA GPUs and optimizing for attention itself. And at the same time, I also work on algorithms and methods and transformer alternatives. And we do see this effect in play, not just hardware lottery, but also kind of software framework lottery. You know, attention has been popular for six years now. And so many kind of engineer hours has been spent on making it as easy and efficient as possible to run transformer, right? And there's libraries to do all kinds of tensor parallel, pipeline parallel, if you use transformer. Let's say someone else developed alternatives, or let's just take recurrent neural nets, like LSTM, GRU. If we want to do that and run that efficiently on current hardware with current software framework, that's quite a bit harder. So in some sense, there is this feedback loop where somehow the model architectures that take advantage of hardware become popular. And the hardware will also kind of evolve to optimize a little bit for that kind of architecture and software framework will also evolve to optimize for that particular architecture. Right now, transformer is the dominant architecture. So yeah, I'm not sure if there is a good way out of this. Of course, there's a lot of development. Things like, I think compilers will play a role because compilers allow you to maybe still be much more efficient across different kinds of hardware because essentially you write the same code and compiler will be able to make it run efficiently different kinds of hardware. So for example, there's this language Mojo, they're compiler experts, right? And their bet is AI models will be running on different kinds of devices. So let's make sure that we have really good compilers with a good language that then the compiler can do a good job optimizing for all kinds of devices. So that's maybe one way that you can get out of this cycle. But yeah, I'm not sure of a good way. In my own research, I have to think about both the algorithm new model and how it maps to hardware. So there are crazy ideas that seem really good, but will be really, really difficult to run efficiently. And so as a result, for example, we can't really scale some of the architectures up simply because they're not hardware friendly. I have to think about both sides when I'm working on new models. [00:32:50]Alessio: Yeah. Have you spent any time looking at some of the new kind of like AI chips companies, so to speak, like the Cerebras of the world? Like one of their innovations is co-locating everything on the chip. So you remove some of this memory bandwidth issue. How do you think about that? [00:33:07]Tri: Yeah, I think that's an interesting bet. I think Tesla also has this Dojo supercomputer where they try to have essentially as fast on-chip memory as possible and removing some of these data transfer back and forth. I think that's a promising direction. The issues I could see, you know, I'm definitely not a hardware expert. One issue is the on-chip memory tends to be really expensive to manufacture, much more expensive per gigabyte compared to off-chip memory. So I talked to, you know, some of my friends at Cerebros and, you know, they have their own stack and compiler and so on, and they can make it work. The other kind of obstacle is, again, with compiler and software framework and so on. 
For example, if you can run PyTorch on this stuff, lots of people will be using it. But supporting all the operations in PyTorch will take a long time to implement. Of course, people are working on this. So I think, yeah, we kind of need these different bets on the hardware side as well. Hardware has, my understanding is, has a kind of a longer time scale. So you need to design hardware, you need to manufacture it, you know, maybe on the order of three to five years or something like that. So people are taking different bets, but the AI landscape is changing so fast that it's hard to predict, okay, what kind of models will be dominant in, let's say, three or five years. Or thinking back five years ago, would we have known that Transformer would have been the dominant architecture? Maybe, maybe not, right? And so different people will make different bets on the hardware side. [00:34:39]Alessio: Does the pace of the industry and the research also influence the PhD research itself? For example, in your case, you're working on improving attention. It probably took you quite a while to write the paper and everything, but in the meantime, you could have had a new model architecture come out and then it's like nobody cares about attention anymore. How do people balance that? [00:35:02]Tri: Yeah, so I think it's tough. It's definitely tough for PhD students, for researchers. Given that the field is moving really, really fast, I think it comes down to understanding fundamental. Because that's essentially, for example, what the PhD allows you to do. It's been a couple of years understanding the fundamentals. So for example, when I started my PhD, I was working on understanding matrix vector multiply, which has been a concept that's been around for hundreds of years. We were trying to characterize what kind of matrices would have theoretically fast multiplication algorithm. That seems to have nothing to do with AI or anything. But I think that was a time when I developed mathematical maturity and research taste and research skill. The research topic at that point didn't have to be super trendy or anything, as long as I'm developing skills as a researcher, I'm making progress. And eventually, I've gotten quite a bit better in terms of research skills. And that allows, for example, PhD students later in their career to quickly develop solutions to whatever problems they're facing. So I think that's just the natural arc of how you're being trained as a researcher. For a lot of PhD students, I think given the pace is so fast, maybe it's harder to justify spending a lot of time on the fundamental. And it's tough. What is this kind of explore, exploit kind of dilemma? And I don't think there's a universal answer. So I personally spend some time doing this kind of exploration, reading random textbooks or lecture notes. And I spend some time keeping up with the latest architecture or methods and so on. I don't know if there's a right balance. It varies from person to person. But if you only spend 100% on one, either you only do exploration or only do exploitation, I think it probably won't work in the long term. It's probably going to have to be a mix and you have to just experiment and kind of be introspective and say, hey, I tried this kind of mixture of, I don't know, one exploration paper and one exploitation paper. How did that work out for me? Should I, you know, having conversation with, for example, my advisor about like, hey, did that work out? You know, should I shift? 
I focus more on one or the other. I think quickly adjusting and focusing on the process. I think that's probably the right way. I don't have like a specific recommendation that, hey, you focus, I don't know, 60% on lecture notes and 40% on archive papers or anything like that. [00:37:35]Alessio: Let's talk about some Transformer alternatives. You know, say Jonathan Franco loses his bet and Transformer is not the state of the art architecture. What are some of the candidates to take over? [00:37:49]Tri: Yeah, so this bet is quite fun. So my understanding is this bet between Jonathan Franco and Sasha Rush, right? I've talked to Sasha a bunch and I think he recently gave an excellent tutorial on Transformer alternatives as well. So I would recommend that. So just to quickly recap, I think there's been quite a bit of development more recently about Transformer alternatives. So architectures that are not Transformer, right? And the question is, can they do well on, for example, language modeling, which is kind of the application that a lot of people care about these days. So there are methods based on state space methods that came out in 2021 from Albert Gu and Curran and Chris Re that presumably could do much better in terms of capturing long range information while not scaling quadratically. They scale sub-quadratically in terms of sequence length. So potentially you could have a much more efficient architecture when sequence length gets really long. The other ones have been focusing more on recurrent neural nets, which is, again, an old idea, but adapting to the new landscape. So things like RWKV, I've also personally worked in this space as well. So there's been some promising results. So there's been some results here and there that show that, hey, these alternatives, either RNN or state space methods, can match the performance of Transformer on language modeling. So that's really exciting. And we're starting to understand on the academic research side, we want to understand, do we really need attention? I think that's a valuable kind of intellectual thing to understand. And maybe we do, maybe we don't. If we want to know, we need to spend serious effort on trying the alternatives. And there's been folks pushing on this direction. I think RWKV scale up to, they have a model at 14 billion that seems pretty competitive with Transformer. So that's really exciting. That's kind of an intellectual thing. We want to figure out if attention is necessary. So that's one motivation. The other motivation is Transformer Alternative could have an advantage in practice in some of the use cases. So one use case is really long sequences. The other is really high throughput of generation. So for really long sequences, when you train with Transformer, with flash attention and so on, the computation is still quadratic in the sequence length. So if your sequence length is on the order of, I don't know, 16K, 32K, 100K or something, which some of these models have sequence length 100K, then you do get significantly slower in terms of training, also in terms of inference. So maybe these alternative architectures could scale better in terms of sequence length. I haven't seen actual validation on this. Let's say an RNN model release with context length, I don't know, 100K or something. I haven't really seen that. But the hope could be that as we scale to long sequences, these alternative architectures could be more well-suited. 
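To make the scaling contrast concrete: attention compares every position with every other position, so its cost grows quadratically with sequence length, while the recurrent and state-space style models just described push a fixed-size state forward one step at a time, so their cost grows linearly. The sketch below is only a toy elementwise linear recurrence in NumPy to show the shape of the computation; it is not RWKV or S4.

```python
import numpy as np

def naive_attention_flops(seq_len, dim):
    # QK^T scores plus the weighted sum over V; grows quadratically with seq_len.
    return 4 * seq_len * seq_len * dim

def linear_recurrence(x, a, b):
    """Toy state-space-ish scan: h_t = a * h_{t-1} + b * x_t, y_t = h_t.
    The state h has a fixed size `dim`, so cost grows linearly with seq_len."""
    h = np.zeros(x.shape[-1])
    ys = []
    for x_t in x:                 # one pass over the sequence
        h = a * h + b * x_t       # elementwise decay plus input; fixed-size state
        ys.append(h.copy())
    return np.stack(ys)

seq_len, dim = 8192, 64
x = np.random.randn(seq_len, dim)
a, b = 0.9, 0.1                   # illustrative scalar parameters
y = linear_recurrence(x, a, b)
print(y.shape, "attention-style FLOPs ~", f"{naive_attention_flops(seq_len, dim):.2e}")
```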
Not just text, but things like high resolution images, audio, video, and so on, which are emerging applications. So that's one, long sequences. Number two is high-throughput generation, where I can imagine scenarios where the application isn't like an interactive chatbot, but let's say a company wants to batch as many requests as possible on their server, or they're doing offline processing, they're generating stuff based on their internal documents, that you need to process in batch. And the issue with Transformer is that during generation, it essentially needs to keep around all the previous history. It's called the KV cache. And that could take a significant amount of memory, so you can't really batch too much because you run out of memory. I am personally bullish on RNNs. I think RNNs, they essentially summarize the past into a state vector that has fixed size, so the size doesn't grow with the history. So that means that you don't need as much memory to keep around all the previous tokens. And as a result, I think you can scale to much higher batch sizes. And as a result, you can make much more efficient use of the GPUs or the accelerator, and you could have much higher generation throughput. Now, this, I don't think, has been validated at scale. So as a researcher, I'm bullish on this stuff because I think in the next couple of years, these are use cases where these alternatives could have an advantage. We'll just kind of have to wait and see if these things will happen. I am personally bullish on this stuff. At the same time, I also spend a bunch of time making attention as fast as possible. So maybe hedging and playing both sides. Ultimately, we want to understand, as researchers, we want to understand what works, why do the models have these capabilities? And one way is, let's push attention to be as efficient as possible. On the other hand, let's push other alternatives to be as efficient at scale, as big as possible, and so that we can kind of compare them and understand. Yeah, awesome. [00:43:01]Alessio: And I think as long as all of this work happens in the open, it's a net positive for everybody to explore all the paths. Yeah, let's talk about open-source AI. Obviously, Together, when Red Pajama came out, which was an open clone of the LLAMA1 pre-training dataset, it was a big thing in the industry. LLAMA2 came out on Tuesday, I forget. And this week, there's been a lot of things going on, which they call open-source, but it's not really open-source. Actually, we wrote a post about it that was on the front page of Hacker News before this podcast, so I was frantically responding. How do you think about what open-source AI really is? In my mind, in open-source software, we have different levels of open. So there's free software, that's like the GPL license. There's open-source, which is Apache, MIT. And then there's kind of restricted open-source, which is the SSPL and some of these other licenses. In AI, you have the open models. So Red Pajama is an open model because you have the pre-training dataset, you have the training runs and everything. And then there's obviously randomness that doesn't make it one-to-one if you retrain it. Then you have the open-weights model that's kind of like StableLM, where the weights are open, but the dataset is not open. And then you have LLAMA2, which is the dataset is not open, the weights are restricted. It's kind of like not really open-source, but open enough.
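Sticking with the KV cache point above for a moment, a rough back-of-the-envelope shows why batch size hits a memory wall during Transformer generation. The configuration below is a hypothetical 7B-class model, chosen purely for illustration.

```python
def kv_cache_bytes(batch, seq_len, n_layers, n_heads, head_dim, bytes_per_elem=2):
    # 2x for keys and values, one set per layer; fp16 means 2 bytes per element.
    return 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class configuration (illustrative numbers, not any specific model):
cfg = dict(n_layers=32, n_heads=32, head_dim=128)

for batch in (1, 8, 32, 64):
    gb = kv_cache_bytes(batch, seq_len=4096, **cfg) / 1e9
    print(f"batch={batch:>3}  KV cache ~= {gb:6.1f} GB")

# batch=1 -> ~2.1 GB; batch=32 -> ~68.7 GB: on an 80 GB card that already crowds out
# the roughly 14 GB of fp16 weights, which is why a fixed-size recurrent state is appealing.
```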
I think it's net positive because it's like $3 million of flops donated to the public. [00:44:32]Tri: How do you think about that? [00:44:34]Alessio: And also, as you work together, what is your philosophy with open-source AI? Right, right. [00:44:40]Tri: Yeah, I think that's a great question. And I think about it on maybe more practical terms. So of course, Meta has done an amazing job training LLAMA1, LLAMA2. And for LLAMA2, they make it much less restrictive compared to LLAMA1. Now you can use it for businesses, unless you are a monthly active user or something like that. I think just this change will have a very significant impact in the kind of landscape of open-source AI, where now lots of businesses, lots of companies will be using, I expect will be using things like LLAMA2. They will fine-tune on their own dataset. They will be serving variants or derivatives of LLAMA2. Whereas before, with LLAMA1, it was also a really good model, but your business companies weren't allowed to do that. So I think on a more practical term, it's kind of shifting the balance between a closed-source model like OpenAI and Anthropic and Google, where you're making API calls, right? And maybe you don't understand as much of what the model is doing, how the model is changing, and so on. Versus now, we have a model with open weight that is pretty competitive from what I've seen in terms of benchmarks, pretty competitive with GPT 3.5, right? And if you fine-tune it on your own data, maybe it's more well-suited for your own data. And I do see that's going to shift the balance of it. More and more folks are going to be using, let's say, derivatives of LLAMA2. More and more folks are going to fine-tune and serve their own model instead of calling an API. So that shifting of balance is important because in one way, we don't want just a concentration of decision-making power in the hands of a few companies. So I think that's a really positive development from Meta. Of course, training the model takes a couple of millions of dollars, but engineers have and I'm sure they spend tons of time trying many, many different things. So the actual cost is probably way more than that. And they make the weights available and they allow probably a lot of companies are going to be using this. So I think that's a really positive development. And we've also seen amazing progress on the open source community where they would take these models and they either fine-tune on different kinds of data sets or even make changes to the model. So as an example, I think for LLAMA1, the context lane was limited to 2K. Like a bunch of folks figured out some really simple methods to scale up to like 8K. [00:47:12]Alessio: Like the RoPE. [00:47:13]Tri: Yes. I think the open source community is very creative, right? And lots of people. LLAMA2 will, again, kind of accelerate this where more people will try it out. More people will make tweaks to it and make a contribution and then so on. So overall, I think I see that as still a very positive development for the field. And there's been lots of libraries that will allow you to host or fine-tune these models, like even with quantization and so on. Just a couple of hours after LLAMA2 was released, tons of companies announcing that, hey, it's on our API or hosting and so on and together did the same. So it's a very fast-paced development and just kind of a model with available weights that businesses are allowed to use. I think that alone is already a very positive development. 
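The 2K-to-8K context tricks mentioned just above mostly work by adjusting the rotary position embedding (RoPE); position interpolation, for example, simply rescales position indices so that a longer sequence is squeezed into the position range the model saw in training. A minimal sketch, assuming the standard RoPE base of 10000 and a scale factor of 4; this is the idea only, not any particular model's implementation.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    """Rotary embedding angles. With scale > 1, positions are compressed
    (position interpolation), so position 8191 lands where ~2047 used to."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(np.asarray(positions) / scale, inv_freq)   # (seq, head_dim/2)

def apply_rope(x, angles):
    """Rotate pairs of channels of x (seq, head_dim) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

head_dim, train_ctx, new_ctx = 64, 2048, 8192
q = np.random.randn(new_ctx, head_dim)
# Compress 8K positions into the 2K range the model was trained on (scale = 4).
q_rot = apply_rope(q, rope_angles(np.arange(new_ctx), head_dim, scale=new_ctx / train_ctx))
print(q_rot.shape)
```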
At the same time, yeah, we can do much better in terms of releasing data sets. Data sets tend to be... Somehow people are not incentivized to release data sets. So philosophically, yeah, you want to be as open as possible. But on a practical term, I think it's a little bit harder for companies to release data sets. Legal issues. The data sets released tend to be not as eye-catchy as the model release. So maybe people are less incentivized to do that. We've seen quite a few companies releasing data sets. Together released the Red Pajama data set. I think Cerebras then worked on that to deduplicate and clean it up and released SlimPajama, and so on. So we're also seeing positive development on that front, kind of on the pre-training data set. So I do expect that to continue. And then on the fine-tuning data set or instruction tuning data set, I think we now have quite a few open data sets on instruction tuning and fine-tuning. But these companies do pay for human labelers to annotate these instruction tuning data sets. And that is expensive. And maybe they will see that as their competitive advantage. And so it's harder to incentivize these companies to release these data sets. So I think on a practical term, we're still going to make a lot of progress on open source AI, on both the model development, on both model hosting, on pre-training data set and fine-tuning data set. Right now, maybe we don't have the perfect open source model where all the data sets are available. Maybe we don't have such a thing yet, but we've seen very fast development on the open source side. I think just maybe this time last year, there weren't as many models that are competitive with, let's say, ChatGPT. [00:49:43]Alessio: Yeah, I think the open data sets have so much more impact than open models. If you think about Eleuther and the work that they've done, GPT-J was great, and the Pythia models are great, but the Pile and the Stack, everybody uses them. So hopefully we get more people to contribute time to work on data sets instead of doing the 100th open model that performs worse than all the other ones, but they want to say they released the model. [00:50:14]Tri: Yeah, maybe the question is, how do we figure out an incentive structure so that companies are willing to release open data sets? And for example, it could be like, I think some of the organizations are now doing this where they are asking volunteers to annotate and so on. And maybe the Wikipedia model of data set, especially for instruction tuning, could be interesting where people actually volunteer their time and instead of editing Wikipedia, add annotation. And somehow they acknowledge and feel incentivized to do so. Hopefully we get to that kind of level of, in terms of data, it would be kind of like Wikipedia. And in terms of model development, it's kind of like Linux where people are contributing patches and improving the model in some way. I don't know exactly how that's going to happen, but based on history, I think there is a way to get there. [00:51:05]Alessio: Yeah, I think the Dolly-15K data set is a good example of a company saying, let's do this smaller thing, just make sure we make it open. We had Mike Conover from Databricks on the podcast, and he was like, people just bought into it and leadership was bought into it. You have companies out there with 200,000, 300,000 employees. It's like, just put some of them to label some data. It's going to be helpful. So I'm curious to see how that evolves. What made you decide to join Together?
[00:51:35]Tri: For Together, the focus has been a lot on open source models. And I think that aligns quite well with what I care about, of course. I also know a bunch of people there that I trust, and I'm excited to work with them. Philosophically, the way they've been really open with data set and model release, I like that a lot. Personally, for the stuff, for example, the research that I've developed, we also try to make code available, free to use and modify and so on, contributing to the community. That has given us really valuable feedback from the community and improved our work. So philosophically, I like the way Together has been focusing on open source models. And the nice thing is we're also going to be at the forefront of research, and the kind of research areas that I'm really excited about, things like efficient training and inference, align quite well with what the company is doing. We'll try our best to make things open and available to everyone. Yeah, but it's going to be fun being at the company, leading a team, doing research on the topic that I really care about, and hopefully we'll make things open to benefit the community. [00:52:45]Alessio: Awesome. Let's jump into the lightning round. Usually, I have two questions. So one is on acceleration, one on exploration, and then a takeaway. So the first one is, what's something that already happened in AI machine learning that you thought would take much longer than it has? [00:53:01]Tri: I think understanding jokes. I didn't expect that to happen, but it turns out, scaling models up and training on lots of data, the model can now understand jokes. Maybe it's a small thing, but that was amazing to me. [00:53:16]Alessio: What about the exploration side? What are some of the most interesting unsolved questions in the space? [00:53:22]Tri: I would say reasoning in the broad term. We don't really know how these models do it. Essentially, they do something that looks like reasoning. We don't know how they're doing it. We have some ideas. And in the future, I think we will need to design architectures that explicitly have some kind of reasoning module in them if we want to have much more capable models. [00:53:43]Alessio: What's one message you want everyone to remember today? [00:53:47]Tri: I would say try to understand both the algorithms and the systems that these algorithms run on. I think the intersection of machine learning and systems has been really exciting, and there's been a lot of amazing results at this intersection. And then when you scale models to large scale, both the machine learning side and the system side really matter. [00:54:06]Alessio: Awesome. Well, thank you so much for coming on, Tri. [00:54:09]Tri: This was great. Yeah, this has been really fun. [00:54:11] Get full access to Latent Space at www.latent.space/subscribe

Agenda - Manx Radio
Agenda 26.6.23 - Legislative Council has come in for a fair bit of stick recently so MLCs Tanya August Hanson and Bill Henderson are fighting back.

Agenda - Manx Radio

Play Episode Listen Later Jun 25, 2023 24:28


Legislative Council has come in for a fair bit of stick recently for its very short sittings - just twenty-one sittings each year outside of Tynwald - and the relatively passive, if not silent, public face of LegCo members. On Agenda, MLCs Tanya August Hanson and Bill Henderson fight back. Is legislative scrutiny more than just passing comment in LegCo? Is the lack of legislation coming through the branches the fault of government and the limited availability of legislative drafters? Does government have far more important stuff to be getting on with than creating new law, or is new law essential to fixing the problems we face?

Agenda - Manx Radio
Agenda 6.3.23 - Peter Reid and David Prictor give us a full account of their backgrounds and motivation to become MLCs.

Agenda - Manx Radio

Play Episode Listen Later Mar 5, 2023 53:56


On an extended Agenda this week we hear from the final two candidates for the Legislative Council election. Banker Peter Reid and David Prictor give us a full account of their backgrounds and motivation to become national politicians. Do the skills obtained from running a bank equip you for running our country? With the LegCo election taking place in just over a week's time, you can find out about all the candidates by listening to Manx Radio's LegCo candidate Agenda and Perspective podcasts, and lobby your MHKs to vote for your favourite candidates.

Music Art Film
FILMMAKER Freddy Moyano - CREATOR OF THE MLC AWARDS AND FILMMAKER ON ROOM 108

Music Art Film

Play Episode Listen Later Feb 25, 2023 38:10


Based in the Northeastern Wisconsin area, actor/voiceover artist Freddy Moyano owes his artistic name, Freddy, to Wes Craven's Freddy Krueger, one of his favorite motion picture characters while growing up in Spain (1980s-1990s). He defines himself as a character artist. After having enough of working for private corporations (strong litigation/law enforcement background: 2002-2015) in the Midwest over the course of 14 years, Freddy founded his own production company, Moyano Lingua, and began acting and doing voiceover work, now contributing VO to movies; his over 30 audiobooks on audible.com and iTunes carry his narration label, including the bestselling audio productions Sins of the Son and Dark Visions. His acting career started in 2016 after a collaboration with Sergio Bruna in Los Angeles (Imparable tour). As of 2017, he acquired more experience on sets with many background and stand-in opportunities (Empire, Easy, The Chi, Rampage, Electric Dreams...) in Wisconsin and the Northern Illinois/Chicago area, as well as a strong formation in advanced acting and stage combat. Freddy slowly progressed into speaking roles (modern western short, Milkshake or Apocalypture) and lead re-enactment roles (Killer Couples). Overseas, his work in Aqua Brava as a producer and lead actor has been recognized in various film festivals. Freddy is also a gifted linguist, editor and script writer. He often writes for Green Bay's "Press Times" (wildlife writer) and is a film critic and freelance correspondent for the Green Bay City Pages. Freddy loves contributing to various filmmaking processes with his consulting and expertise under www.moyanolingua.com. Over 40 motion picture productions (most of them wildlife-focused) have his seal under mlcproductions.org. A Best Cinematography winner (Virgin Spring Cinefest - August 2020) and Best Nature Film winner (A Bay to Cherish - Oniros Film Awards Summer 2020) among others, Freddy has also collected Diamond, Gold, Silver and Bronze distinctions in Voice Over and Sound editing, in total 60+ wins and 30+ nominations. Freddy is also the founder and director (and celebrated Gala emcee) of the MLC Awards (an IMDb event since 2019 operating under www.mlcawards.com) with two annual galas in Green Bay, Wisconsin and a qualifier event in Little Rock, Arkansas. Actors such as Steven Weber, Vincent Pastore (The Sopranos) or Barry Corbin (No Country for Old Men) have been recognized in this international filmmaking event, which became the first ever film festival to screen a portion of an audiobook, at its September 13, 2020 MLC Summer Gala. Thanks to insightful promo work and genuine appreciation for filmmakers, Freddy and his team of judges have placed the MLCs among the top 5 most respected Midwest-based film festivals. As of 2022, Green Bay native Tony Shalhoub has joined the MLC Awards judging team for the Best Screenplay category. Perhaps one of the most solid areas of Freddy's talent is wildlife videography, research and storytelling. Years of work can be found on RNV TV (www.rnvtv.com). His research on mallards diving to collect zebra mussels from the depths of the Fox River was featured internationally on various internet sites. Freddy's wildlife videos have been featured 3 times in USA Today's Outdoor section. Freddy posts weekly wildlife content on instagram.com/freddymoyanoofficial.

Manx Radio - Update
69 new - now 438 cases, 2 in Nobles, 0 in ICU, ambulance shortage, 'Superspreader' events, Summerland site's future, two new MLCs, 19/20 Manx economy numbers. It's Update with Andy Wint #iom #manxradio #news

Manx Radio - Update

Play Episode Listen Later Nov 23, 2021 26:12


69 new - now 438 cases, 2 in Nobles, 0 in ICU, ambulance shortage, 'Superspreader' events, Summerland site's future, two new MLCs, 19/20 Manx economy numbers. It's Update with Andy Wint #iom #manxradio #news

Manx Radio's Mannin Line
55 new - now 386 cases, 2 in Nobles, 0 in ICU, young people leaving IOM, ex-MHKs becoming MLCs, apprenticeships, Douglas Promenade progress. It's Mannin Line with Andy Wint #iom #manninline #manxradio

Manx Radio's Mannin Line

Play Episode Listen Later Oct 8, 2021 51:08


55 new - now 386 cases, 2 in Nobles, 0 in ICU, young people leaving IOM, ex-MHKs becoming MLCs, apprenticeships, Douglas Promenade progress. It's Mannin Line with Andy Wint #iom #manninline #manxradio

Manx Radio's Mannin Line
56 new - now 529 cases, 12 in Nobles, 1 in ICU, Electric Railway sign, MLCs who would be MHKs, Pandemic mindset, Douglas Prom. It's Mannin Line with Andy Wint #iom #manninline #manxradio

Manx Radio's Mannin Line

Play Episode Listen Later Aug 19, 2021 50:46


56 new - now 529 cases, 12 in Nobles, 1 in ICU, Electric Railway sign, MLCs who would be MHKs, Pandemic mindset, Douglas Prom. It's Mannin Line with Andy Wint #iom #manninline #manxradio

Gun & Gear Review Podcast
Gun and Gear Review Podcast Episode 347 – MLCS-11 chassis, MPA AR9, Vigilance M20

Gun & Gear Review Podcast

Play Episode Listen Later Nov 13, 2020 35:53


On this week's episode, we discuss the MLCS-11 lightweight 10/22 chassis system from Wiland, Masterpiece Arms MPA AR9 pcc, and Vigilance Rifles M20 subgun (if you can call it that) For all the show notes and back episodes, head over to firearmsradio.tv/gun-and-gear-review-podcast

Firearms Radio Network (All Shows)
Gun and Gear Review Podcast Episode 347 – MLCS-11 chassis, MPA AR9, Vigilance M20

Firearms Radio Network (All Shows)

Play Episode Listen Later Nov 13, 2020 35:53


On this week’s episode, we discuss the MLCS-11 lightweight 10/22 chassis system from Wiland, Masterpiece Arms MPA AR9 pcc, and Vigilance Rifles M20 subgun (if you can call it that) For all the show notes and back episodes, head over to firearmsradio.tv/gun-and-gear-review-podcast

Aaj Ka Din
How did the MLCs who left the RJD suddenly start seeing 'Nitish's development'? Aaj Ka Din, 24 June

Aaj Ka Din

Play Episode Listen Later Jun 24, 2020 37:03


On Aaj Tak Radio's morning news podcast, Aaj Ka Din, listen to: day two of Army Chief MM Naravane's visit to Ladakh, a conversation with Dilip Rai, one of the five MLCs who left the RJD in Bihar, and the questions raised over Ramdev's company Patanjali claiming to have found a cure for corona. Listen to Aaj Ka Din, with Nitin Thakur.

Wood Whisperer Live (Audio)
Holiday Giveaway 2019 – TWW Live!

Wood Whisperer Live (Audio)

Play Episode Listen Later Dec 22, 2019


Our annual holiday show! Special thanks to MLCS and M-Power Tools for donating to the giveaways! Congrats to all the winners!

The Celebrant Talk Show
The one with the solid tangent

The Celebrant Talk Show

Play Episode Listen Later Nov 25, 2019 98:42


It's a month until Christmas and here we are again with another episode of your favourite podcast! This episode Josh and Sarah chat about the latest newsletter from MLCS, our thoughts about OPD into the future, our obligations in an online environment, not stressing too much about the smaller details of the marriage paperwork, and how to bring the magic to every single ceremony. --- Send in a voice message: https://anchor.fm/celebrant/message

The Celebrant Talk Show
Written to elicit a response

The Celebrant Talk Show

Play Episode Listen Later Oct 17, 2019 64:05


On today's episode, we discuss our seat at the big table with MLCS, capping celebrant numbers, and the new survey that was written to coerce celebrants into doing less OPD, the implications for the industry and what you can do about it. We also look at celebrants' obligations in an online environment, and we finish the episode discussing the inclusion of children at ceremonies. 1.30 Preshow banter 6.23 MLCS inclusion for Celebrant Institute 15.55 Capping celebrant numbers 33.46 The survey that was written to elicit a response 45.27 Recap of electronic environment questions 50.27 Children at weddings --- Send in a voice message: https://anchor.fm/celebrant/message

Vida Tranquila
Interview with MLCS Skates

Vida Tranquila

Play Episode Listen Later Sep 12, 2019 66:39


Mlcs Skates visits the studio and blesses us with awesome insight on his brand and the way he looks at life! Take a listen and let us know! --- This episode is sponsored by · Anchor: The easiest way to make a podcast. https://anchor.fm/app Support this podcast: https://anchor.fm/vidatranquila/support

The Celebrant Talk Show
On the off chance that no-one knows what this sentence is about

The Celebrant Talk Show

Play Episode Listen Later Jul 27, 2019 68:33


Hello everybody! Thanks for joining us for another episode of The Celebrant Talk Show! This time Josh and Sarah chat about the annual registration fee (pay yours now!), the review of the marriage forms, the epic funeral course Sarah went on (http://www.silvercelebrants.com.au), and the new fact sheet about celebrants' obligations in an online environment, plus an exciting announcement! As always, shoot us an email at hello@celebrant.fm if you have any questions or topics you'd like us to discuss in an upcoming episode! What a time to be alive! The Marriage Act 1961 has finally aligned with the Electronic Transactions Act (thanks Josh)! and this ep chats about what this means for us. Paying your Annual Registration Fee and why pricing yourself out per hour or perhaps why you can't are also discussed in this episode. Other awesome content re the MLCS fact sheets and Sarah has some wonderful advice for funeral celebrants. You can't miss this one! 03:00 – Sarah's fav podcast. This podcast will kill you. https://thispodcastwillkillyou.com/ 04:10 – Why it's impossible to price yourself out per hour. 12:40 – Why do you have to pay your Annual registration fee? Why such a big deal? 19:15 – What do the MLCS do with the funds from the Registration Fee? 27:45 – MLCS Fact sheet regarding Celebrant obligations in an online environment 28:25 – Accepting the NOIM, Date and place of Birth electronically! - What a time to be alive! (Thanks Josh)! 30.20 – Sighting evidence of identity electronically. 33:40 – The Electronic Transactions act 34:07 – What is a Commissioner for Declarations? 36:15 – Children of previous marriages – Australian Bureau of Statistics 41:50 – Conference chat - DJ Alliance of Australia https://www.djaa.org.au/ 52.22 – Doing the Cert IV with Sarah and Lifeskills Training https://www.lifeskillstraining.com.au/ 59:00 – Silver Celebrant Funeral training – Robyn O'Connell https://www.silvercelebrants.com.au/ --- Send in a voice message: https://anchor.fm/celebrant/message

Chat&Chai: Yoga Talks from Miami Life Center
Tim on The Importance of Having a Teacher

Chat&Chai: Yoga Talks from Miami Life Center

Play Episode Listen Later Nov 1, 2018 57:05


Tim Feldmann is back again for another episode of Chat & Chai: Yoga Talks from Miami Life Center. Tim is one of MLC's owners and founders and teaches regular workshops in our South Beach Shala. This recording was taken from a talk he gave to our students. He touches upon the role and importance of having a teacher as you walk down the yogic path. He draws from classical Indian philosophy as well as his own personal experience to give us a better understanding of how to navigate the powerful teacher-student relationship. Side note... we apologize for the not so great audio quality! We figured the content was so good it was better to share than keep it to ourselves. Hope you enjoy!

Wood Whisperer Live (HD Video)
Hand-Held Spindle Sander

Wood Whisperer Live (HD Video)

Play Episode Listen Later Sep 9, 2017


I show off my new toy from MLCS and we'll discuss things like jumpy sanders, letting kids into the shop, and even Nicole's work history.

Wood Whisperer Live (Audio)
Hand-Held Spindle Sander

Wood Whisperer Live (Audio)

Play Episode Listen Later Sep 9, 2017


I show off my new toy from MLCS and we'll discuss things like jumpy sanders, letting kids into the shop, and even Nicole's work history.

The Social Network Show
Connect Ireland: Land of Opportunity for Business

The Social Network Show

Play Episode Listen Later May 21, 2015 25:59


Ireland's economy was hit hard by the global financial crisis of 2007-08 and entered into economic depression in 2009. By 2012, unemployment reached 14.6%. (http://en.wikipedia.org/wiki/Post-2008_Irish_economic_downturn) In this interview with Jim Nico and Dr. J, Joanna Murphy, Chief Operations Officer of ConnectIreland, explains the effort to capitalize on the connections of 70 million persons of Irish heritage living abroad, the Irish diaspora, in getting Ireland back to work. A passionate proponent of the program, Ms. Murphy describes the creation of jobs through the participation of “connectors.” Anyone can register as a Connector on the website “ConnectIreland, our country, your opportunity”. If a business the Connector refers to ConnectIreland creates jobs in Ireland, the Connector can receive a reward of €1500 per job. Joanna's enthusiasm is contagious as she speaks of rolling out the figurative red carpet for visitors who come to Ireland to explore the possibilities of setting up shop. The Irish government has gone all out to favor “a green light over red tape.” This intriguing leverage of a worldwide social network among those of Irish heritage highlights the value of cooperative effort among government, non-governmental agencies, and grass roots participation of persons-with-something-in-common around the world. Joanna Murphy is the Chief Operating Officer of ConnectIreland – an incentivised referral programme appointed by IDA Ireland to deliver the Succeed in Ireland Initiative. She is responsible for ConnectIreland's brand development and strives to extend its reach through partnerships with leading global organisations, such as GAA, DAA and many more. A business development specialist, Joanna leads ConnectIreland's international marketing growth and has worked on countless successful marketing campaigns since being appointed as COO. Prior to joining ConnectIreland as Project Manager in 2013, where she quickly rose through the ranks, Joanna was Managing Director at two highly successful business entities, Protherm Thermal Insulation Solutions and MLCS, which she established and subsequently sold. Her passion and enthusiasm for the ConnectIreland model is matched only by her invaluable experience in the Irish business market and she has been prominent in furthering links among the Irish, at home and abroad. Joanna holds a first class honours degree in business studies from the Irish Management Institute.

Medizin - Open Access LMU - Teil 06/22
DY determinants, possibly associated with novel class II molecules, stimulate autoreactive CD4+ T cells with suppressive activity

Medizin - Open Access LMU - Teil 06/22

Play Episode Listen Later Jan 1, 1988


A set of T cell clones (TCC) isolated from HLA-DR-, Dw-, DQ-matched allogeneic MLCs was found to proliferate autonomously when stimulated with cells carrying a wide range of class I or II specificities. This apparently unrestricted proliferation was relatively weak, and only low levels of IL-2 were present in the supernatants of stimulated cells. Autologous as well as allogeneic PBMC and B lymphoblastoid cell lines (B-LCL) were capable of stimulating such clones, which were also restimulated by suppressive, but not by helper, TCC. Moreover, such clones displayed the unusual property of autostimulation. mAb inhibition experiments suggested that class II- or class II-restricted antigens were involved in stimulation. Thus, certain "broad" mAbs (TU39, SG520) reacting with multiple locus products inhibited activation of these reagents, but none of those reacting more specifically with DR (TU34, TU37, L243, Q2/70, SG157), DQ (TU22, SPV- L3, Leu 10), or DP (B7/21), or mixtures of these mAbs, were able to do so. Evidence from sequential immunoprecipitation experiments suggested that mAb TU39 bound class II-like molecules other than DR, DQ, and DP on TCC and B-LCL, and it is therefore proposed that such putative novel class II-like molecules may carry the stimulating determinants for these autoreactive clones. DY-reactive clones lacked helper activity for B cells but mediated potent suppressive activity on T cell proliferative responses that was not restricted by the HLA type of the responding cells. Suppressive activity was induced in normal PBMC by such clones, as well as by independent suppressive clones, which was also inhibited only by mAb TU39. These findings lead to the proposal that DY-reactive autostimulatory cells may constitute a self- maintaining suppressive circuit, the level of activity of which would be regulated primarily by the availability of IL-2 in the microenvironment