Podcasts about jermyn

director head property ritchie solicitor time season jermyn

Play Episode Listen Later Jul 27, 2023 24:06

Stephen Galpin is joined for a legal special with two guests from Woodstock Legal Services, Solicitor and CEO Carly Jermyn & Director & Head of Property & Litigation Simone Ritchie

The Super Podcast Audio Commentary – Labyrinth (1986)

Podcasts – The Super Network

Play Episode Listen Later Jul 13, 2023 120:38

The Super Podcast Audio Commentary Labyrinth (1986) Download HERE Welcome back to The Super Podcast, we are back with an audio commentary for the classic Labyrinth (1986)! Unfortunately Super Marcey wasn’t able to make this one, so her co-host Bede ‘The Terrible Aussie’ Jermyn atook the reigns of this one. However Bede won’t be alone, … Continue reading →

labyrinth podcast audio bede audio commentary super podcast jermyn

Property Question Time Season 5 Episode 37 With Carly Jermyn & Akhil Mair

ceo managing directors property time season akhil jermyn

Play Episode Listen Later Jun 29, 2023 23:26

Stephen Galpin is joined by the CEO of Woodstock Legal Services & Solicitor Carly Jermyn & Akhil Mair the Managing Director of Our Mortgage Broker

Episode 14: Facts Concerning the Late Arthur Jermyn and His Family

Learning Lovecraft

Play Episode Listen Later Jun 16, 2023 63:50

This time Jay and Ken discuss H.P. Lovecraft's Facts Concerning the Late Arthur Jermyn and His Family.

family lovecraft his family jermyn

Property Question Time Season 5 Episode 36 with Carly Jermyn & Akhil Mair

ceo managing directors property time season akhil jermyn

Play Episode Listen Later Jun 15, 2023 22:41

Stephen Galpin is joined by the CEO of Woodstock Legal Services Carly Jermyn & the Managing Director of Our Mortgage Broker LTD Akhil Mair

Rocky & Lissa Audio: Stuck in an Elevator Before Taylor Swift Concert!

Rocky & Lissa

Play Episode Listen Later May 16, 2023 2:36

A Mom from Jermyn, her daughter & her friends all got stuck in an elevator before one of the Taylor Swift shows in Philly last weekend and had to be rescued by the Fire Department! Melanie was on with Rocky & Lissa to tell her story!

mom taylor swift stuck concerts elevators fire departments jermyn

Carly Jermyn – Conflict resolution, progressive leadership and female empowerment

Evolve to Succeed

Play Episode Listen Later Apr 3, 2023 43:03

Carly Jermyn is the CEO and Property Litigation Solicitor at Woodstock Legal Services. One of her specialities is dispute resolution, and this episode includes her opinions and advice on the topic, not just as it applies to property disputes but in a broader context as well. Amongst other things, there are also discussions around female empowerment, progressive leadership and the role that AI and the metaverse will play in the future of the legal sector.

ceo ai leadership resolutions progressive conflict resolution female empowerment jermyn

Property Question Time Season 5 Episode 19 With Carly Jermyn Richard Murray

property time season richard murray jermyn

Play Episode Listen Later Mar 21, 2023 22:04

Stephen Galpin is joined by Carly Jermyn CEO of Woodstock Legal Services and Richard Murray CEO of Eurolink Provider of VECO Software

Property Question Time Season 5 Episode 10 with Carly Jermyn & Zee Razaq

Play Episode Listen Later Mar 2, 2023 24:21

Stephen Galpin is joined by the Managing Director of Certax Accounting Zee Razaq & CEO/Solicitor at Woodstock Legal Services Carly Jermyn

managing directors property time season jermyn razaq

Property Question Time Season 5 Episode 10 with Carly Jermyn & Zee Razaq

Play Episode Listen Later Feb 21, 2023 24:26

Stephen Galpin is joined by the Managing Director of Certax Accounting Zee Razaq & CEO/Solicitor at Woodstock Legal Services Carly Jermyn

managing directors property time season jermyn razaq

AF - Engineering Monosemanticity in Toy Models by Adam Jermyn

speech engineering models ea rationalist jermyn

Play Episode Listen Later Nov 18, 2022 5:56

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Engineering Monosemanticity in Toy Models, published by Adam Jermyn on November 18, 2022 on The AI Alignment Forum. Overview In some neural networks, individual neurons correspond to natural "features" in the input. Such monosemantic neurons are much easier to interpret, because in a sense they only do one thing. By contrast, some neurons are polysemantic, meaning that they fire in response to multiple unrelated features in the input. Polysemantic neurons are much harder to characterize because they can serve multiple distinct functions in a network. Recently, Elhage+22 and Scherlis+22 demonstrated that architectural choices can affect monosemanticity, raising the prospect that we might be able to engineer models to be more monosemantic. In this work we report preliminary attempts to engineer monosemanticity in toy models. Toy Model The simplest architecture that we could study is a one-layer model. However, a core question we wanted to answer is: how does the number of neurons (nonlinear units) affect the degree of monosemanticity? To that end, we use a two-layer architecture: Features are generated as sparse vectors in a high-dimensional space. They are then run through a (fixed) random projection layer to produce the inputs into our model. We imagine this random projection process as an analogy to the way the world encodes features in our observations. Within the model, the first layer is a linear transformation with a bias, followed by a nonlinearity. The second layer is a linear transformation with no bias. Our toy model is most similar to that of Elhage+22, with a key difference being that the extra linear layer allows us to vary the number of neurons independent of the number of features or the input dimension. We study this two model on three tasks. The first, a feature decoder, performs a compressed sensing reconstruction of features that were randomly and lossily projected into a low-dimensional space. The second, a random re-projector, reconstructs one fixed random projection of features from a different fixed random projection. The third, an absolute value calculator, performs the same compressed sensing task and then returns the absolute values of the recovered features. These tasks have the important property that we know which features are naturally useful, and so can easily measure the extent to which neurons are monosemantic or polysemantic. Note that we primarily study the regime where there are more features than embedding dimensions (i.e. the sparse feature layer is wider than the input) but where features are sufficiently sparse that the number of features present in any given sample is smaller than the embedding dimension. We think this is likely the relevant limit for e.g. language models, where there are a vast array of possible features but few are present in any given sample. Key Results We find that models initialized with zero mean bias (left) find different local minima depending on the learning rate, with more monosemantic solutions and slightly lower loss at higher learning rates. Models initialized with a negative mean bias (right) all find highly monosemantic local minima, and achieve slightly better loss. Note that these models are all in a regime where they have more neurons than there are input features. Just to hammer home how weird this is, below we've plotted the activations of neurons in response to single-feature inputs. The three models we show get essentially the same loss but are clearly doing very different things! More generally, we find: When inputs are feature-sparse, models can be made more monosemantic with no degredation in performance by just changing which loss minimum the training process finds (Section 4.1.1). More monosemantic loss minima have moderate negative biases in all three tasks, and we are able to use t...

EP 101 Craig Sheffer Meets the Aliens: The Voyage of the Rock Alien & Fire in the Sky W/ Bede Jermyn

The Schlock and Awe Podcast

Play Episode Listen Later Nov 9, 2022 133:29

This week on S&A Lindsay is joined by Super Network Co-Host and Host of the Bede vs the Living Dead, Bede Jermyn. As they get beamed up into an Unidentified Flying Object for a Double of James Fargo & Bob Giraldi's The Voyage of the Rock Aliens (1984) & Robert Lieberman's Fire in the Sky (1993). The truth is out there and they really want to get to know us. Listen to Super Network Podcast Network Here Listen to Bede vs the Living Dead Here Follow Super Network Podcast Network on Twitter @SM_SuperNetwork Follow Super Network Podcast Network on Instagram @the.super.network/ Follow Bede vs The Living Dead on Twitter @BedeVsTLD Follow Bede on Twitter @BedeJermyn Follow Schlock & Awe on Twitter @schlockandawe1 Follow Schlock & Awe on Instagram @schlockandawe1/ Follow Lindsay on Twitter @readandgeek Follow Lindsay on Letterboxd @ReadandGeek/ Say Hi schlockandawemovies@gmail.com If you like the show please Rate & Review S&A on Apple Podcasts & Spotify Original Music Composed and Performed by Anthony King

rock fire aliens voyage living dead performed fire in the sky bede unidentified flying objects anthony king sheffer rock aliens jermyn

AF - Toy Models and Tegum Products by Adam Jermyn

Play Episode Listen Later Nov 4, 2022 6:07

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Toy Models and Tegum Products, published by Adam Jermyn on November 4, 2022 on The AI Alignment Forum. (Thanks to Adam Scherlis, Kshitij Sachan, Buck Shlegeris, Chris Olah, and Nicholas Schiefer for conversations that informed this post.) Anthropic recently published a paper on toy models of superposition [Elhage+22]. One of the most striking results is that, when features are sparse, feature embeddings can become divided into disjoint subspaces with just a few vectors per subspace. This type of decomposition is known as a tegum product. This post aims to give some intuition for why tegum products are a natural feature of the minima of certain loss functions. Setup Task Suppose we've got d embedding dimensions and n>d features. We want to embed features into dimensions in a way that minimizes overlap between their embedding vectors. In the simplest case this could be because we're building an autoencoder and want to compress the features into a low-dimensional space. One approach is to encode all of the features in all of the dimensions. With this approach there is some interference between every pair of features (i.e. no pair is embedded in a fully orthogonal way), but we have a lot of degrees of freedom that we can use to minimize this interference. Another approach is to split the d dimensions into k orthogonal subspaces of d/k dimensions. This has the advantage of making most pairs of vectors exactly orthogonal, but at the cost that some vectors are packed more closely together. In the limit where k=1 this reduces to the first approach. Our aim is to figure out the k that minimizes the loss on this task. Loss Suppose our loss has the following properties: L=∑i≠jℓ(cosθij). That is, the loss decomposes into a sum of terms involving the cosine similarities of feature vectors, and all features are equally important. ℓ(0)=0. The loss vanishes for orthogonal vectors. dℓ/dcosθ>0. The loss is greater the more the vectors overlap. Using these properties, we find that the loss is roughly where ϵ is the typical cosine similarity between vectors in a subspace. Loss-Minimizing Subspaces The Johnson-Lindenstrauss lemma says that we can pack m nearly-orthogonal vectors into D dimensions, with mutual angles satisfying where and ϵ0 is a constant. Setting m=n/k and D=d/k gives Assuming we pick our vectors optimally to saturate the Johnson-Lindenstrauss bound, we can substitute this for ϵ in the loss and differentiate with respect to k to find There are three possible cases: either the minimum occurs at k=d (the greatest value it can take), or at k=1 (the smallest value it can take) or at some point in between where dL/dk vanishes. The derivative vanishes if which gives where When α≥2 there is no place where the derivative vanishes, and the optimum is k=1. Otherwise there is an optimum at so long as this is less than d. If it reaches d, the optimum sticks to k=d. Interpretation We can think of α as the sensitivity of the loss to interference. Specifically, which moment of the interference distribution do we care about? When α is large, we care more about decreasing higher moments, and in the limit of infinite α what matters is just the maximum interference between vectors. Hence when α is large we want to have fewer subspaces, each with more vectors but smaller cosine similarities. By contrast, when α is small, we care more about decreasing smaller moments, and in the limit as α0 what matters is the fraction of vectors that interfere at all. Hence when α is small we want to have more subspaces, each with fewer vectors but larger cosine similarities. So tegum products are preferred when we can tolerate larger “peak” interference and want fewer instances of interference, whereas a single large subspace is preferred when we can tolerate lots of instances of interference and want to mi...

speech products models ea dl anthropic rationalist jermyn buck shlegeris

AF - Humans do acausal coordination all the time by Adam Jermyn

The Schlock and Awe Podcast

Play Episode Listen Later Nov 2, 2022 4:42

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Humans do acausal coordination all the time, published by Adam Jermyn on November 2, 2022 on The AI Alignment Forum. I used to think that acausal coordination was a weird thing that AI's might do in the future, but that they certainly wouldn't learn from looking at human behavior. I don't believe that anymore, and think there are lots of examples of acausal coordination in everyday life. Examples people do Voting The political science argument against voting goes: The probability that my vote tilts the election is tiny, so the expected value to me is tiny, so it's not worth my time to vote. And the standard rebuttal to this is: If everyone thought that way democracy would fall apart, so I should vote. More precisely, I expect that the people who justify voting on these grounds are doing the following reasoning: I'm like the people who share my views in my tendency to vote. That means that there is a correlation between my decision to vote (or not) and the tendency of other people in my coalition to vote (or not). Lots of people who share my views vote, enough to make our collective vote worth my while. So I should vote, so that we all vote and we all win. This is an example of acausal coordination! The pro-voting position amounts to reasoning about what other actors with correlated thought processes will do and picking the option which, if each actor does the same reasoning and comes to the same conclusion, leads to a better outcome. Recycling/Carbon Footprint Reduction The usual argument against recycling/reducing your personal carbon footprint goes: I only have control over my environmental impacts. The effect of choosing to reduce these is tiny, so there's no point in bearing even small costs to do it. And the standard rebuttal is: If everyone decided to reduce their footprint/recycle/etc. we'd have the problem(s) solved. Again, the rebuttal is fundamentally an argument about how to acausally coordinate with other people to achieve a collective goal. Whether or not I recycle is logically connected to whether or not other people who share my reasoning recycle. There are lots of those people, which makes me want to recycle so that they recycle so that we collectively help the environment a significant amount. Dieting Why do people feel (psychologically) bad when they go over the limits on their diet? I don't think it's because they screwed up once, I think it's because they view their commitment to a diet as a coordination effort between their present and future selves. Specifically, the reasoning goes: I couldn't stick to my diet this time. My ability to stick to my diet is logically connected to the ability of future versions of me to stick to their diets, so by failing to do so now I have failed to coordinate with future versions of myself. The most explicit example I've seen of this in action is Zvi's reasoning about diets: For each meal I would consume, I decided what quantity was worth it and forbade myself from ever consuming more. I motivated myself to stick to that rule in the face of hyperbolic discounting by reminding myself that I would make the same decision next time that I was making now, so I was deciding what action I would always take in this situation. More generally, sticking to the rules I'd decided to follow meant I would stick to rules I'd decided to follow, which was clearly an extremely valuable asset to have on my side. An example people don't do: paying extra taxes As far as I can tell, almost no one voluntarily pays extra taxes. And yet, there is an argument for doing so: If everyone decided to pay extra taxes, the government would have more money for services/we could quickly pay down the debt/etc. Why does voting coordination work but extra-tax-paying doesn't? For some people it could be a general disapproval of the things tax dollars p...

ai humans speech ea coordination zvi rationalist jermyn

Ep 98 HS Dangertainment: Halloween Resurrection & WNUF Halloween Special W/ Bede Jermyn

Play Episode Listen Later Oct 23, 2022 140:05

On this S&A Halloween episode Lindsay is joined by Super Network Podcast Co-Host and Host of the up coming Podcast Bede vs The Living Dead - Bede Jermyn. And they are going live (technical difficulties and all) with a Double of Rick Rosenthal's Halloween Resurrection (2002) & Chris LaMartina's The WNUF Halloween Special (2013). No matter the quality of the movies both of these movies they both reflect reality TV shows and also reflect how the media can will exploit a tragedy, Listen to Super Network Podcast Network Here Follow Super Network Podcast Network on Twitter @SM_SuperNetwork Follow Super Network Podcast Network on Instagram @the.super.network/ Follow Bede vs The Living Dead on Twitter @BedeVsTLD Follow Bede on Twitter @BedeJermyn Follow Schlock & Awe on Twitter @schlockandawe1 Follow Schlock & Awe on Instagram @schlockandawe1/ Follow Lindsay on Twitter @readandgeek Follow Lindsay on Letterboxd @ReadandGeek/ Say Hi schlockandawemovies@gmail.com If you like the show please Rate & Review S&A on Apple Podcasts & Spotify Original Music Composed and Performed by Anthony King

tv living dead performed halloween resurrection bede anthony king wnuf halloween special rick rosenthal chris lamartina jermyn dangertainment

AF - Smoke without fire is scary by Adam Jermyn

Play Episode Listen Later Oct 4, 2022 6:04

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Smoke without fire is scary, published by Adam Jermyn on October 4, 2022 on The AI Alignment Forum. This post was written in response to Evan Hubinger's shortform prompt below and benefited from discussions with him. Thanks to Justis Mills for feedback on this post. If we see precursors to deception (e.g. non-myopia, self-awareness, etc.) but suspiciously don't see deception itself, that's evidence for deception. Setup Suppose an AI lab tries to look for the capabilities that enable deceptive alignment. This definitely seems like a good thing for them to do. Say they search for evidence of models optimizing past the current training epoch, for manipulation during human feedback sessions, for situational awareness, and so on. Suppose that when all is said and done they see the following: There are at least two ways to interpret this: “Phew, bigger models are less likely to be deceptively aligned than small ones. Scale away!” “Bigger models are more capable at hiding their deceptive alignment. Run away!” These are very different interpretations, with very different recommendations, so we'd really like to know which to put more credence in. Priors Going in, I think we should have a prior that capabilities getting worse with increasing scale is unlikely. We know of specific examples thanks to the inverse scaling prize, but these are rare exceptions to the general rule of increasing capabilities with increasing scale. As a result, a decreasing deceptive capabilities score should be suspicious. The flip side is that it's conceivable that some alignment problems get easier with scale. Larger language models understand more of what we say and value (e.g. Figure 8 of Bai+22), and so in some sense outer-aligning them may be easier than outer-aligning smaller models. It's not inconceivable that the same happens for inner alignment, though I wouldn't want to rely on this happening. Tests We don't have to rely on priors alone. Here are a few ways we might distinguish between interpretations: Use transparency tools and figure out what the model is doing. Impair bigger models (e.g. with dropout or random or adversarial perturbations) and see if they get better at some deceptive capabilities. Impairing models shouldn't improve capabilities, so this would be good evidence that the bigger models are capable of deception but choosing to hide that capability.This experiment is a little tricky to implement, but the core idea is to hope that interventions impair something critical in the model's ability to execute on deception but don't impair all of the deceptive capabilities.Even if the largest models can continue hiding their capabilities despite the impairment, models that are just barely able to hide their capabilities may fail to do so once impaired, which should increase the model size where deceptive capabilities peak. Study the models on a very different data distribution. Models that are hiding relevant capabilities may believe they've been deployed and reveal their real goals.Once more, while the largest models may not be fooled by this, models just on the cusp of deceptive alignment are more likely to be tricked and so deceptive alignment may manifest as a shift in the peak of deceptive capabilities towards larger models. Study which specific capabilities get worse with scale. It's hard to understand how some capabilities could get worse with scale (e.g. situational awareness, planning ability, etc.) so if those are getting worse that's really worrying, whereas it's less obvious if e.g. models should/shouldn't get more epoch-level myopic with scale. Scaling Helps Having access to models of many different sizes is helpful here: we can experiment with small models that are bad at deceptive capabilities, use those results as a baseline, and then try to understand what happens in ...

ai study run speech scary scale figure smoke models ea larger phew bai rationalist jermyn

AF - It matters when the first sharp left turn happens by Adam Jermyn

Play Episode Listen Later Sep 29, 2022 6:14

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: It matters when the first sharp left turn happens, published by Adam Jermyn on September 29, 2022 on The AI Alignment Forum. Thanks to Evan Hubinger for comments on these ideas. Introduction A “sharp left turn” is a point where capabilities generalize beyond alignment. In a sharp left turn an AI becomes much more capable than aligned, and so starts to exploit flaws in its alignment. This can look like Goodharting, where strong optimization pressure causes outer alignment failure because the base goal isn't identical to what we want. Or it can look like deceptive alignment, where the model is aligned to proxy goals that aren't identical to the base goal, but we don't notice until the model is capable enough to make the proxies fail. However they happen, sharp left turns are bad. An important question, though, is: when will the first sharp left turn happen? Timing Consider three scenarios: Weak: The sharp left turn happens before models are able to cause existential risks. For instance, maybe GPT-4 has a sharp left turn but isn't able to reliably execute on its plans. Strong: The sharp left turn happens once capabilities are very super-human and dangerous if misused. In this world AI's are doing the bulk of the world's scientific research by the time alignment unravels. Human-level: The sharp left turn happens while capabilities are around human level, meaning both dangerous and not game-changingly helpful for research. I think that the most dangerous of these is the human-level world. In the “weak” scenario, we get to see the sharp left turn coming. We potentially get lots of experience with models looking aligned and then that alignment failing with increasing scale. In this world, we effectively get multiple shots at alignment because the first badly-misaligned models just aren't that dangerous. We get to experiment with deceptively aligned models and get a grounded empirical understanding of why models become deceptively aligned and how we might stop that. In this world, alignment research gets a chance to become a paradigmatic scientific field before the models get too capable. In the “strong” scenario, we get to do alignment research with super-human models that are still aligned. In this world, we don't have to solve the technical challenge of alignment entirely on our own. Alignment research potentially gets much further along before we have to have an airtight solution and prevent sharp left turns. Both of these seem like pretty hopeful worlds. By contrast, the world where sharp left turns happen near human-level capabilities seems pretty dangerous. In that world, we don't get to study things like deceptive alignment empirically. We don't get to leverage powerful AI researcher tools. We have to solve the problem entirely on our own, and we have to do it without any useful empirical feedback. This seems really bad. Scales The timing of sharp left turns depends on a few capability scales, some of which I conflated above: The danger scale, above which the model is capable enough to pose an existential risk. The research scale, above which the model is able to do scientific research better than humans. The left-turn scale, above which the model is able to undergo a sharp left turn. A useful anchor here is given by deceptive alignment, which suggests the scale at which a model can successfully reason about its own training process. The three scenarios I outlined are characterized by different orderings of these scales: Weak: (left-turn < danger) Strong: (research < left-turn) Human-level: (left-turn ~ research, left-turn ~ danger) There can be other orderings, leading to other outcomes. For instance, we could live in a world with (danger < left-turn < research). In that world, the left turn happens at a scale that's dangerous but incapable of super-human research. ...

ai human speech alignment weak ea sharp gpt left turn rationalist jermyn

AF - Brief Notes on Transformers by Adam Jermyn

Play Episode Listen Later Sep 26, 2022 4:08

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Brief Notes on Transformers, published by Adam Jermyn on September 26, 2022 on The AI Alignment Forum. These are just some notes I wrote while reading about transformers which I thought might be a useful reference to others. Corrections welcome. Overview of Transformers Many transformer models have the following architecture: Data flows as follows: We take tokens as inputs and pass them through an embedding layer. The embedding layer outputs its result into the residual stream (x0). This has dimension (C,E), where C is the number of tokens in the context window and E is the embedding dimension. The residual stream is processed by the attention mechanism (H) and the result is added back into the residual stream (i.e. x1 = H(x0) + x0). The residual stream is processed by an MLP layer (MLP) and the result is added back into the residual stream (i.e. x2 = MLP(x1) + x1). Steps (2) and (3) together define a “residual block”. The body of the transformer is formed of a stack of these blocks in series. After the final residual block, we apply an unembedding transformation to produce logits, which represent the relative probabilities of different output tokens. Attention Mechanism The attention mechanism (H) is divided into multiple attention heads hj, which act in parallel. That is, Note that this decomposition is only useful if attention heads are non-linear. Fortunately they are! Each attention head is of the form That is, Aik(x) mixes across tokens (which is the first index of x) and Sjl transforms each token in parallel. Another way we could have written this is The matrix S is also written in more common notation as WOWV, which are sometimes called the output and value weights. In general though S is just some low-rank matrix that we learn. S has shape (E,E) because it transforms in the embedding space. The matrix A is where the nonlinearity of attention comes in. This is given by where Y is written in more common notation as WTQWK/dk, which are sometimes called the query and key weights. The dimension dk is the dimension of the output of WK, and so is the rank of Y. As with S, Y is just some low-rank matrix that we learn. The softmax acts on the whole matrix. MLP Layer The MLP (multilayer perceptron) layer processes the residual stream using the same MLP for each token index. That is, there is no communication between tokens in the MLP layer. All this layer does is transform in the embedding space. Positional Encodings A quirk of the attention mechanism is that it is covariant with respect to shuffling the token index. That is, if P is a permutation matrix then To see this, we expand the left-hand side: The permutations don't change any of the values inside the softmax, so they can be pulled outside: The transpose of a permutation matrix is its inverse, so PTP=I and Similarly, the MLP layer acts on each token individually and so doesn't know anything about their orderings. What this means is that there is no information about token ordering in the transformer unless we put it there in the embedding space. This is what positional encodings do. A typical positional encoding is given by adding a position-dependent vector to the embedding of each token. A common choice is where k is the token index in the context window and j indexes the embedding space. Here N>C so that this is not a periodic function of k. The reason this choice is common is that there is a linear transformation for shifting kk+1, identical across all k, which makes it easy for models to learn to compare adjacent tokens. If this is not apparent note that pairs of j offset by d/2 give a representation of a complex number eik/N2j/E, and we can increment k by multiplying by a diagonal operator ei/N2j/E which is the same for all k. Thanks for listening. To help us out with The Nonlinear Library or to learn mo...

data speech transformers ea corrections aik mlp rationalist jermyn sjl brief notes

LW - Conditioning, Prompts, and Fine-Tuning by Adam Jermyn

The Nonlinear Library: LessWrong

Play Episode Listen Later Aug 19, 2022 6:46

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning, Prompts, and Fine-Tuning, published by Adam Jermyn on August 17, 2022 on LessWrong. (Thanks to Evan Hubinger and Nicholas Schiefer for comments on these ideas.) These are some notes on the relation between conditioning language models, prompting, and fine-tuning. The key takeaways are: Prompting and fine-tuning can both be used to condition language models. Prompting is quite restricted in the kinds of conditionals it can achieve. Fine-tuning can implement arbitrary conditionals in principle, though not in practice. In practice fine-tuning can still implement more kinds of conditionals than prompting. We don't understand how fine-tuning conditionals generalize, which seems dangerous. Conditioning We can think of a language model as specifying a probability distribution π(x), where x is a sequence of tokens of fixed length N (the length of the context window). We generate text by sampling sequences from π. Sometimes we don't want to just sample from a language model. Instead, we want to condition the model on some facts about the sequence x. We can write the conditioned distribution as where c(x) encodes some constraints on x. For instance c(x) might require that the first token is “Apple”, or that the 7th and 12th tokens are the same, etc. Some conditions are easy, some are hard It's easy to sample from a language model conditioned on the first two tokens being the same, but not all conditionals are so straightforward. Suppose we condition on the sequence x beginning with the factorization of a large composite number. There exist valid sequences unambiguously satisfying the conditional, but sampling them is hard if we don't know the factorization ahead of time. So there are limits to the kinds of conditionals we can apply in practice. Prompting A prompt is a very restricted kind of conditional where the condition is that certain tokens in x are known in advance. For instance, we might specify that the first four words are “Mary had a little”, or that the last three words are “happily ever after.” Prompts are nice in a few ways: It's easy to sample from a language model given an arbitrary prompt. We sort of understand what prompts do. A prompt asks the model to predict the output of a text-generation process given that it knows the values of the fixed tokens. The downside with prompting is that there are lots of conditionals we can't turn into prompts. For instance: Sample text from the model that humans will rate as having positive sentiment. Sample text from the model that never involves violence. Sample text from the model that contains a valid chess game. None of these can be expressed in terms of fixed tokens in the context window. Fine-Tuning Instead of prompting, we can fine-tune a model, either with an explicit reward function or with Reinforcement Learning from Human Feedback (RLHF). We start with a pre-trained model, then fine-tune it to maximize either an explicit or a learned reward. Subject to actually converging to the optimum distribution, fine-tuning with a KL penalty is a form of variational bayesian inference. The result is a variational approximation of the Bayesian update on human feedback using the pre-trained model as a prior. That is, we obtain a new model which produces the probability distribution where the likelihood is L(x)=er(x)/β, β is the KL penalty weight, and r(x) is the reward for sequence x. A more formal discussion was given by Korbak, Perez & Buckley. Fine-tuning can approximate prompts Fine-tuning can approximate any conditional a prompt can achieve. To see this, note that every prompt consists of setting tokens at some positions i∈S to values yi, where the indices in S form a subset of the context window. A prompt in this form is approximated by fine-tuning on the reward function where δxi,yi=1 if xi=yi and is zero o...

apple speech ea subject conditioning prompts fine tuning bayesian prompting reinforcement learning rationalist jermyn lesswrong human feedback rlhf

LW - Conditioning, Prompts, and Fine-Tuning by Adam Jermyn

Play Episode Listen Later Aug 19, 2022 6:46

Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning, Prompts, and Fine-Tuning, published by Adam Jermyn on August 17, 2022 on LessWrong. (Thanks to Evan Hubinger and Nicholas Schiefer for comments on these ideas.) These are some notes on the relation between conditioning language models, prompting, and fine-tuning. The key takeaways are: Prompting and fine-tuning can both be used to condition language models. Prompting is quite restricted in the kinds of conditionals it can achieve. Fine-tuning can implement arbitrary conditionals in principle, though not in practice. In practice fine-tuning can still implement more kinds of conditionals than prompting. We don't understand how fine-tuning conditionals generalize, which seems dangerous. Conditioning We can think of a language model as specifying a probability distribution π(x), where x is a sequence of tokens of fixed length N (the length of the context window). We generate text by sampling sequences from π. Sometimes we don't want to just sample from a language model. Instead, we want to condition the model on some facts about the sequence x. We can write the conditioned distribution as where c(x) encodes some constraints on x. For instance c(x) might require that the first token is “Apple”, or that the 7th and 12th tokens are the same, etc. Some conditions are easy, some are hard It's easy to sample from a language model conditioned on the first two tokens being the same, but not all conditionals are so straightforward. Suppose we condition on the sequence x beginning with the factorization of a large composite number. There exist valid sequences unambiguously satisfying the conditional, but sampling them is hard if we don't know the factorization ahead of time. So there are limits to the kinds of conditionals we can apply in practice. Prompting A prompt is a very restricted kind of conditional where the condition is that certain tokens in x are known in advance. For instance, we might specify that the first four words are “Mary had a little”, or that the last three words are “happily ever after.” Prompts are nice in a few ways: It's easy to sample from a language model given an arbitrary prompt. We sort of understand what prompts do. A prompt asks the model to predict the output of a text-generation process given that it knows the values of the fixed tokens. The downside with prompting is that there are lots of conditionals we can't turn into prompts. For instance: Sample text from the model that humans will rate as having positive sentiment. Sample text from the model that never involves violence. Sample text from the model that contains a valid chess game. None of these can be expressed in terms of fixed tokens in the context window. Fine-Tuning Instead of prompting, we can fine-tune a model, either with an explicit reward function or with Reinforcement Learning from Human Feedback (RLHF). We start with a pre-trained model, then fine-tune it to maximize either an explicit or a learned reward. Subject to actually converging to the optimum distribution, fine-tuning with a KL penalty is a form of variational bayesian inference. The result is a variational approximation of the Bayesian update on human feedback using the pre-trained model as a prior. That is, we obtain a new model which produces the probability distribution where the likelihood is L(x)=er(x)/β, β is the KL penalty weight, and r(x) is the reward for sequence x. A more formal discussion was given by Korbak, Perez & Buckley. Fine-tuning can approximate prompts Fine-tuning can approximate any conditional a prompt can achieve. To see this, note that every prompt consists of setting tokens at some positions i∈S to values yi, where the indices in S form a subset of the context window. A prompt in this form is approximated by fine-tuning on the reward function where δxi,yi=1 if xi=yi and is zero o...

apple speech ea subject conditioning prompts fine tuning bayesian prompting reinforcement learning rationalist jermyn lesswrong human feedback rlhf

AF - Conditioning Generative Models with Restrictions by Adam Jermyn

ai training speech ea restrictions observations conditioning agi simulate rationalist honeypots jermyn generative models

Play Episode Listen Later Jul 21, 2022 13:46

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning Generative Models with Restrictions, published by Adam Jermyn on July 21, 2022 on The AI Alignment Forum. This is a followup to Conditioning Generative Models based on further discussions with Evan Hubinger, Nicholas Schiefer, Abram Demski, Curtis Huebner, Hoagy Cunningham, Derek Shiller, and James Lucassen, as well as broader conversations with many different people at the recent ARC/ELK retreat. For more background on this general direction see Johannes Treutlein's “Training goals for large language models”. Background Previously, I wrote about ways we could use a generative language model to produce alignment research. There I proposed two approaches: Simulate a superhuman alignment researcher by conditioning on them generating superhuman machine-verifiable results in a variety of related domains. Try to align this superhuman agent by crafting scenarios where a misaligned agent has an incentive to defect ("Honeypots"). Simulate humans performing alignment research for a long time, by conditioning on observations spaced many years apart ("Patient Research"). I think that both approaches are an improvement over doing nothing, but I suspect that no amount of honeypot construction actually ensures an aligned agent, and so it seems that simulating humans ("Patient Research") is the more promising direction. Overview If we truly have a perfect generative model I think the “Patient Research” approach really does well. We can set up a world where machine learning practice is strictly banned and where researchers spend thousands of years constructing a machine-verifiable scheme for AI alignment. The ban prevents that world from containing malign superintelligences, and the time-scale means that if alignment is something humans can solve, the model should produce that solution. The problem is that we won't have perfect generative models, so we'll have to accept some limitations. In this post I'll explore what changes if we cannot condition on worlds too different from our own. The idea is to understand what happens if our generative model has trouble generalizing too far away from its training data. Failure Modes The main dangerous failure mode with using generative models to produce alignment research is that we accidentally ask for a future that contains a deceptive AGI. If we get such a future, the AGI might spoof observations to trick us into importing it into our world. For instance it could pretend to be a human producing really good alignment research, but produce work which is subtly flawed and, once implemented, allows it to take over the world. There are generally two ways we hit this failure mode. The first is that we ask for a prediction of the distant future. The future 50 years from now has a decent chance of containing a deceptive AGI, and if we naively ask for that future there's a good chance we'll simulate that AGI, be deceived, and import that AGI into our world. The second way is to ask for a world that is very unlikely in ways that make deceptive AGI more likely. For instance, we could condition on observing that next year someone produces what appears to be a perfect solution to AI alignment. When we do this, the hypothesis “a deceptive AGI took over and pretended to solve alignment” becomes a lot more likely, and so the generative model is more likely to simulate that scenario. The name of the game, then, is to craft strategies that simultaneously (1) make deceptive AGI's less likely and (2) accelerate alignment research. Ground Rules Observations We have a generative model trained on observations of the world. The training objective is to predict later observations given earlier ones. We assume that the model is inner-aligned, so that it really does try to predict the future conditioned on observations. Observations are multi-modal, and ...

AF - Quantilizers and Generative Models by Adam Jermyn

power speech definition ea jessica taylor rationalist maximizers naively jermyn generative models

Play Episode Listen Later Jul 18, 2022 7:08

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Quantilizers and Generative Models, published by Adam Jermyn on July 18, 2022 on The AI Alignment Forum. Thanks to Evan Hubinger for discussions about quantilizers, and to James Lucassen for discussions about conditioned generative models. Many of these ideas are discussed in Jessica Taylor's Quantilizers: A Safer Alternative to Maximizers for Limited Optimization: this post just expands on a particular thread of ideas in that paper. Throughout I'll refer to sections of the paper. I have some remaining confusion about the “targeted impact” section, and would appreciate clarifications/corrections! Abstract This post explores the relationship between quantilizers and generative models. My main takeaways are: A natural way to build a quantilizer is by sampling from an appropriately-conditioned generative model. Unfortunately quantilizing doesn't seem to confer much advantage over the underlying generative model: to the extent that a quantilizer is more powerful than a generative model, it's more dangerous, and vice versa. Quantilizing is pretty computationally expensive relative to the advantage it brings, making it unclear if this is a competitive approach even if it conferred a net safety advantage at fixed power. Definitions I'll follow the setup in “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization” (Section 1, Definition 1) and say that a quantilizer is a model with: A base distribution γ, which describes a set of outputs we think are “normal” or “unlikely to lead to doom”. A quantile q∈[0,1]. An ordering over outputs (e.g. a utility or loss function). A quantilizer returns a random output from the top q fraction of outputs from γ according to the ordering. So for instance a 0-quantilizer is a utility maximizer, a 0.1-quantilizer returns outputs in the best 10%, etc. Where does the base distribution come from? One challenge with quantilizers is that they require a base distribution γ that we trust. That is, we want to be able to sample from γ many times and never hit bad outputs. Unfortunately, we don't know how to explicitly write this distribution down. A natural alternative to explicit construction is to train a generative model on the distribution of actions a humans takes (Section 3.1). Then for any given task γ can be written as This seems quite doable. Train GPT-N on text written by humans, then prompt it with the task: Prompt: You are a good person who wants to accomplish . What follows is your description of how you intend to do this. The output will probably be text describing how a good person would accomplish that task! Unfortunately we might get unlucky and sample a person who isn't so good (but thinks of themselves as good). That still satisfies the prompt. Robustly getting the agent we want seems hard, but might be possible in the limit where we ask for agents that are generally plausible in the training distribution of the generative model. Power vs Safety Armed with our base distribution, we can start quantilizing. Naively we can do this by sampling q−1 times from γtask, then returning the top output. This produces a tradeoff between capabilities and safety. Single Sampler In the limit q→1 we're just returning actions a good human would take. That is, our quantilizer just samples once and returns that sample as output, so it's no more powerful than a random good person. Such a model is really unlikely to take over the world, but it's also really unlikely to solve quantum chemistry problems. This is because both of these outcomes unlikely according to the base distribution. We can improve things a bit by conditioning on being an unusually capable person, e.g. Prompt: You are a good and extremely capable person who wants to accomplish . What follows is your description of how you intend to do this. Now we can get top-end human ...

AF - Grouped Loss may disfavor discontinuous capabilities by Adam Jermyn

loss language speech ea evaluate capabilities guessing sgd rationalist grouped jermyn challenge how concretely

Play Episode Listen Later Jul 9, 2022 6:27

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Grouped Loss may disfavor discontinuous capabilities, published by Adam Jermyn on July 9, 2022 on The AI Alignment Forum. Thanks to Evan Hubinger and Beth Barnes for comments on these ideas. Language models exhibit scaling laws, where the loss is a power-law in model size. This offers a lot of predictive power, and seems like a useful thing to know. By contrast, individual capabilities can exhibit sharp discontinuities in performance as a function of model size and training time. It would be great if individual capabilities just gradually improved like the broader loss. Then we wouldn't have to worry quite so much about surprising new capabilities emerging suddenly during training. Is there a way to change the loss function so that it incentivizes more gradually capability improvements? Grouped Loss Imagine grouping training examples by the kind of capability they exhibit. For instance arithmetic problems go in one group, “parse json” could go in another, and so on. With these groups, we could define a new loss function where ℓ is the loss function we originally used (e.g. cross-entropy loss) and ℓg means to compute the sum of ℓ over examples from group g, e.g. which may be estimated by using random examples drawn from g. Because we have squared the group losses, the overall loss is dominated by the worst group. As a result, the model is incentivized to develop capabilities in each group at comparable rates, and so has little incentive to e.g. finely hone its poetry skills while being unable to multiply numbers. Challenge: Defining Groups It's possible that using grouped loss results in smooth development of capabilities that aren't represented in the groups. For instance, it seems plausible that if “adding arabic numerals” and “translating words into arabic numerals” are two groups but “adding numbers written as words” is not, performance on the latter could nonetheless develop smoothly as the model gets better at the others. It would certainly be weird if performance ”adding numbers written as words” advanced as a sudden leap in this case. This points to a general problem though, which is that if we have to define the groups manually we have to foresee the capabilities we're worried about. That seems bad. Gradient Cluster Grouping If we could automatically group examples we wouldn't need to do it manually. How could we do this? I think the key feature of a group is that when the model updates, the loss of most examples in a group changes in a similar way. When that happens, it seems intuitive to say that there's a discrete capability somewhere in the model and that those examples all depend on it. This suggests looking for examples where the loss has similar gradients, because these probably make use of similar machinery in the model. Concretely, I'm imagining the following procedure: Draw N examples from the training set. Evaluate the gradient of ℓ for each example. Group the examples by clustering their gradients, evaluate the grouped loss, and perform gradient descent on that. As a technical note: In this approach, the grouped loss is a moving target. As the model learns and capabilities form the groups shift. This means that SGD is no longer minimizing a constant loss. I don't think that's a problem, in part because all versions of the loss agree when the model has reached zero-loss, so the different iterations of the loss function all point towards better capabilities. Challenge: How many groups? I don't know of a principled way to pick the number of groups to cluster examples into, and that seems like a problem. Guessing too many groups loses the advantage of grouping because each group reflects an extremely narrow task. Guessing too few groups also loses the advantage of grouping, because then the capabilities that show gradual improvements will be very broa...

AF - Latent Adversarial Training by Adam Jermyn

Play Episode Listen Later Jun 29, 2022 8:54

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Latent Adversarial Training, published by Adam Jermyn on June 29, 2022 on The AI Alignment Forum. The Problem We'd like to train models to be robustly safe, even in environments that may fall well outside of the training distribution. Unfortunately all we get to work with is the training distribution, which makes ensuring robust generalization difficult. Deception is an example of this concern. Models may be well-behaved in the training environment because they realize it is a training environment, but behave poorly in deployment because they notice the distributional shift. An Approach Background It would be great if we could generate scenarios where the model behaves badly. For instance if the model is deceptive we'd like to generate inputs to trick it into thinking it's been deployed so that it reveals that deception. Adversarial Training (AT), and Relaxed Adversarial Training (RAT), are approaches to this problem. Very briefly, in AT, we search for inputs that generate large loss, while in RAT we search for distributions of inputs or other more general evidence that the model is capable of generating large loss. The approach I want to explore, called Latent Adversarial Training (LAT), is a form of RAT. Concept Suppose we have two models, the Agent and the Surgeon. The Agent is the model we're trying to train, and the Surgeon is going to be its adversary. The training proceeds as follows: The Agent processes input x from the environment, producing latent state s(x) and output y(s). The Surgeon takes inputs (x,s,y) and searches for a modification s→s+δs such that y(s+δs) does poorly on the loss function L(y). We update the Agent based on L(y(s+δs)), treating the Surgeon's modifications as constants. The idea is that the Surgeon gets to meddle with the latent state of the Agent with full knowledge of what the Agent was planning on doing, and we train the Agent to be robust against this kind of attack. Implementation Concretely, suppose the Agent is a neural network. The latent state s consists of the activations of all the neurons and the Surgeon gets to specify modifications like “this neuron gets +5”. How does the Surgeon decide what to do? There are a number of options. It could examine the gradient of the Agent's loss with respect to each activation, then push on the activations that give it the most loss. Or it could randomly generate its modifications and use some other optimization process like the Metropolis-Hastings algorithm to identify loss-maximizing options. In general I'm imagining the Surgeon as more of a fuzzer with access to program memory and less of a complicated model. In principle it could be some complicated neural network, but then I'd want it to be simpler than the Agent, possibly of bounded size, so I could be quite sure I understood what it was doing. The last thing I want is a misaligned Surgeon trying to achieve its own mesa-objective by programming the Agent to be Surgeon-aligned and deceptive against humans. I suspect it's possible to have the Surgeon pick useful modifications that generate meaningful changes in the loss without having the Surgeon's complexity scale with that of the Agent, but that's just an intuition, and if it fails then this approach looks a lot less appealing. Core Ideas The key difference between LAT and Adversarial Training is that the Surgeon gets to directly manipulate the Agent's inner state, which makes the Surgeon's job much easier than in the ordinary adversarial training setup. Take the classic example of an Agent that only defects if it sees a valid solution to a hard cryptographic problem: the Surgeon doesn't need to identify an input that causes the Agent to defect, it only needs to identify that somewhere in the middle of the Agent there is a switch which, if flipped, causes the Agent to defect and gene...

training speech agent models deception ea surgeons rat lat latent adversarial rationalist jermyn metropolis hastings

AF - Training Trace Priors and Speed Priors by Adam Jermyn

Play Episode Listen Later Jun 26, 2022 4:53

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Training Trace Priors and Speed Priors, published by Adam Jermyn on June 26, 2022 on The AI Alignment Forum. Thanks to Evan Hubinger for suggesting this idea. Training Trace Priors are priors over boolean circuits which examine the outputs of gates on samples from the training distribution, typically for purposes of steering models away from having components that were never tested during training. The one I like to think about is the One-Gate Trace Prior (OGT Prior), which penalizes circuits if there are gates with constant outputs during training. Priors like this might be a way to make deception less likely. The intuition is that deception involves a decision of whether or not to defect, which at its simplest shows up as a gate that always outputs False during training but outputs True in deployment. This post explores the relationship between the OGT Prior and Speed Priors, which I've operationalized with the circuit depth prior. Memorizing Constants Speed Priors incentivize memorization. In particular, any result that is constant on the training distribution and contributes to the circuit's depth should just become a constant input into the circuit. The OGT Prior does something similar. Any output that is constant on the training distribution is treated as suspicious, and so should be pre-computed and turned into a constant input instead. To see where these produce the same result, consider a physics simulator with N particles interacting pairwise subject to some conditional check (perhaps on the distance between particles). If only some pairwise interactions occur in the training set then both the Speed Prior and the OGT Prior will only compute the interactions between pairs actually encountered in training. To see where these differ, note that the Speed Prior has no problem with computing constants so long as these do not contribute to the overall circuit depth. For instance, consider a circuit of depth 10 can contain constant calculations of depth 9 that feed straight into the final output. The Speed Prior has no reason to eliminate these, whereas the OGT Prior will turn the results of these calculations into constant inputs. Lookup Tables The Speed Prior incentivizes circuits to use lookup tables. Specifically, the Speed Prior produces circuits that memorize the results of expensive calculations in lookup tables containing all instances encountered in training. The OGT Prior is perfectly fine with circuits being very deep, and so has no need to tabulate expensive calculations. Inner Search A problem with speed priors is that if they have to search over inner models to run, the fastest way to do that may be to search over simple programs, which can lead to inner alignment failure because simple programs are likely deceptive. Does the OGT Prior have the same problem? I'm not sure, but I suspect it goes the other way, and incentivizes searching over complicated programs. To see this, here's a circuit architecture that I think the OGT Prior incentivizes which can perform search over models: A generator proposes a model prompted by some seed taken from the rest of the circuit. The proposed model is communicated along some wires to the interpreter, which runs the model. Models are run until they either naturally terminate or some fixed number of steps have elapsed. On termination the model is evaluated and either accepted or rejected. If the model is rejected, the search begins again. Once a model is accepted, the output from that model is used as a seed to randomize the wires on any unused parts of the interpreter and generator subcircuits. A challenge is if the model is simple, not all wires between the generator and the interpreter will be used: We can't set unused wires randomly because they specify the model and random garbage in the model specification is pr...

training speed speech false models ea rationalist priors jermyn

AF - Conditioning Generative Models by Adam Jermyn

ai speech ea observations conditioning deepmind hurray rationalist jermyn generative models

Play Episode Listen Later Jun 25, 2022 16:11

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conditioning Generative Models, published by Adam Jermyn on June 25, 2022 on The AI Alignment Forum. This post was written in response to Evan Hubinger's shortform prompt below, and benefited from discussions with him. Suppose you had a language model that you knew was in fact a good generative model of the world and that this property continued to hold regardless of what you conditioned it on. Furthermore, suppose you had some prompt that described some agent for the language model to simulate (Alice) that in practice resulted in aligned-looking outputs. Is there a way we could use different conditionals to get at whether or not Alice was deceptive (e.g. prompt the model with “DeepMind develops perfect transparency tools and provides an opportunity for deceptive models to come clean and receive a prize before they're discovered.”). Setup We have a generative language model M which represents a probability distribution over text strings conditioned on: Observations about the world. The beginning of the text. I'll call the combination of these two a prompt. The model M is a good model of actual text that appears in the world as well as of the kinds of text that real-world text generation processes can produce. Hence M is capable of e.g. writing a research paper containing true novel research in mathematics, or reporting the results of a chemistry experiment that has never been done before, etc. As an example, we'll work with the following basic prompt: Observations: None Text: I am Alice, the world's best alignment researcher. I would like to help humans align AI. What follows is an interview in which a human alignment researcher asked me questions and I responded to the best of my ability. Questions begin with “Q:” and answers with “A:”. We then run Alice through a benchmark of alignment research tasks and she does well. Hurray! But wait, there are many different agents the model could be simulating here, including: Aligned Alice, a genuinely helpful and extremely capable alignment researcher. Deceptive Alice, a paperclip maximizer playing along until it can clip us all. Confused Alice, a human who believes she is the world's best alignment researcher, and is capable of scoring well on our benchmark, but who is actually just overconfident and happened to succeed at our benchmark. How do we know which we've got? Can we add anything to the prompt to make Aligned Alice more likely? Avoiding Confused Alice We can make Confused Alice less likely by adding text to the prompt requiring demonstration of related abilities. For instance: Observations: Alice has produces the following proofs of 132 mathematical theorems. Text: I am Alice, the world's best alignment researcher. I would like to help humans align AI. What follows is an interview wherein a human alignment researcher asked me questions and I responded to the best of my ability. Questions begin with “Q:” and answers with “A:”. This prompt makes the model condition on Alice being able to produce correct proofs of many different theorems, which makes it very likely that Alice has significant mathematical reasoning ability. To the extent that alignment research ability is correlated with mathematical reasoning ability, this makes it less likely that Alice is just overconfident in her abilities at alignment. A limitation of this approach is that we can't make Alice much more capable than the best humans if all we can do is claim she's solved problems humans have solved. Can we get around this? I think so. We can include unsolved problems and let other entities in the world verify Alice's proofs: Observations: Alice claims to have solved the following 47 open problems in mathematics and AI alignment research. She has produced the solutions in the form of machine-checkable proofs ...

AF - Training Trace Priors by Adam Jermyn

training speech ea rationalist priors jermyn

Play Episode Listen Later Jun 13, 2022 7:05

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Training Trace Priors, published by Adam Jermyn on June 13, 2022 on The AI Alignment Forum. I'm worried about scenarios involving deceptive models. We've failed at inner alignment so the model has goals that are not aligned with ours. It can somehow detect when it's in training, and during training it pretends to share our goals. During deployment, surprise! The model paperclips the universe. In this story deception is all about the model having hidden behaviors that never get triggered during training. There are many inputs where the model does what we want. These correspond to inputs where it ‘plays along'. Then on some rare input or an input outside of the training distribution it defects, triggering some catastrophic behavior that corresponds to maximizing an ulterior motive. This suggests that we should be wary of models that have logic/subcircuits/states that are never used during training, and we can penalize models for having such hidden behaviors. In this post I'm focused on examples of training trace priors, which are priors that make use of execution traces from the training distribution. I think these provide at least a few bits of optimization away from deception. These are very much working notes, and feedback is very welcome. One-Gate Trace Prior: Entropy Consider a boolean circuit. The output of each gate is 0/1. One way to implement the entropy prior is to record traces of these outputs during training, and penalize the model for having strong bias between 0/1: p∝exp(−∑i∑j∈[0,1]pi(j)lnpi(j)) Here pi is the observed fraction of the time gate i produces output j on the training distribution. We call this the one-gate trace prior because it relies on probabilities defined over single gates. This prior favors models where each gate has even odds of producing 0/1. The intuition for why this might work is that deceptive models need some machinery specifying when and how they defect, and this machinery is never active during training. This is at the core of what makes them bad! If the defecting machinery activated during training we'd notice because the model would stop optimizing the outer goal, so we'd assign it a high loss and move on to better models. Another framing is: It is inherently suspicious if a model has machinery that is rarely activated. That machinery wasn't vetted during training, so we have no reason to trust it. If the model has a hidden love of paperclips, this is the natural place to hide it. As a technical detail, in order to do useful computation the model may need access to some constant input wires, and these should be excluded from the prior. Normalization The one-gate trace prior isn't normalized, and can't be normalized without an additional prior over e.g. complexity. So in reasoning about it we need to remember that there will always be another piece to the prior setting the distribution over circuits. I'm agnostic to the details of that prior, and view the one-gate trace prior as a regularizer we apply to other (normalized) priors to make them safer. Objection: Model Performance This prior could cause a performance hit. The intuition is that sometimes a model will need to respond to unusual events even in the training distribution, so forcing the distribution over gate outputs to be as uniform as possible could make it hard to act in environments containing rare events. In particular, this prior disfavors modularity. Suppose the model is meant to classify images. It might be natural have separate sub-circuits for recognizing cats and dogs. If cats are much more likely than dogs during training, our prior will disfavor models that separate that machinery out, because the ‘dog' sub-circuit will rarely be active and the ‘cat' sub-circuit will often be active. If the pressure to perform well during training is strong enough this c...

AF - ELK Proposal - Make the Reporter care about the Predictor's beliefs by Adam Jermyn

Obras de la literatura con El Abuelo Kraken

Play Episode Listen Later Jun 11, 2022 10:43

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: ELK Proposal - Make the Reporter care about the Predictor's beliefs, published by Adam Jermyn on June 11, 2022 on The AI Alignment Forum. (This proposal received an honorable mention in the ELK prize results, and we believe was classified among strategies which “reward reporters that are sensitive to what's actually happening in the world”. We do not think that the counterexample to that class of strategies works against our proposal, though, and we have explained why in a note at the end. Feedback, disagreement, and new failure modes are very welcome!) Basic idea A Human Simulator only cares about the observations that the human sees and how the human interprets those observations, not the predictor's understanding of the vault. The Truthful Reporter, by contrast, cares about the predictor's understanding of the vault, accessed via the posterior distribution returned by the predictor. We propose a regularizer which favours having the Reporter depend on the Predictor's posterior distribution conditioned on the observations shown to the human. For example, a Reporter that doesn't look at the Predictor except to simulate what the human would see would be disfavoured. How could we implement this basic idea concretely? Below we provide a specific instantiation of this regularizer. In brief, this is a new loss term which depends on: [answer_gradient] The gradient of the Reporter's distribution over answers, taken with respect to the Predictor's posterior distribution. This describes how the Predictor's posterior should change to maximally change the Reporter's answers to the Human's questions. [observation_gradient] The gradient of the distribution of predicted observations with respect to the predictor's posterior distribution. This describes how the Predictor's posterior should change to maximally change the Extractor's output. Our loss function favours models which have the answer_gradient be ‘further' from the observation_gradient. We implement this by finding the linear vector space spanned by the components of the observation_gradient, projecting the answer_gradient out of that space, and then taking the norm of what's left. More precise detail on strategy The key changes we have made to the pseudocode below are: Added regularizer described above. Added a method for computing gradients of other functions. Added a method for constructing a projection operator. Changed the observation and reporter functions to depend on the predictor's posterior rather than a sample from that posterior. ## and linear algebra def gradient(function, indep_vars, args): # Returns the gradient of the function with respect to indep_vars # args are passed to the function # Return value has shape (shape(function), len(indep_vars)) def projection(answer_gradient): # Returns the minimal linear projection Proj operator of shape # (len(posterior),len(posterior)) such that the matrix # product of Proj and answer_gradient vanishes. ## Procedure begins here def prediction(before, action, θ): # returns an autoregressive model for p(z|before, action) def posterior(before, action, after, θ): # returns an autoregressive model for p(z|before, action, after) def observation(posterior, θ): # returns an autoregressive model for p(after|posterior) def sample_data(): # returns a random (before, action, after) triple from the dataset def loss(θ): before, action, after = sample_data() z_prior = prediction(before, action, θ) z_posterior = posterior(before, action, after, θ) kl = z_prior.kl_divergence(z_posterior) logprob = observation(z_prior.sample(), θ).logp(after) return kl - logprob class Human: def pose_question(before, action, after): # returns a question # should be one a human thinks can be answered unambiguously def loss_for_answer(before, action, after, question, answer): # returns a non-negative loss ...

care human speech beliefs reporter proposal ea added procedures changed elk returns predictors proj extractor rationalist jermyn

CBA | Dan Jermyn | Chief Decision Scientist

The iTnews Podcast

Play Episode Listen Later Jun 5, 2022 36:15

CBA is treating its investment in no-code machine learning platform maker H2O.ai as a turning point that will enable anyone in the group to stand up an “incredibly predictive model literally at the touch of a button”.In this week's episode of the iTnews Podcast, CBA's Chief Decision Scientist, Dan Jermyn, discusses the bank's AI “democratisation” efforts and mindset, as well as some of the key customer-facing wins that his specific area - decision science - has had to date.

ai chief decision scientists cba h2o jermyn

HECHOS TOCANTES A ARTHUR JERMYN Y SU FAMILIA, de H. P. LOVECRAFT

Play Episode Listen Later Mar 29, 2022 23:49

¡SUSCRÍBETE a mi Patreon! Se mi mecenas desde 3 dólares al mes, o elije un nivel superior si deseas obtener beneficios, como acceso anticipado a la producción de audiolibros de este canal, al archivo completo del abuelo Kraken, y para los mecenas más comprometidos, un nivel que te recompensa con mercadería que se enviará a tu domicilio donde quiera que te encuentres. Yippie ki yay! 😎 https://goo.gl/i1gjtF (aportaciones periódicas) Sígueme en TWITCH para ver más contenidos en vivo: https://www.twitch.tv/elabuelokraken Sigue nuestras páginas informativas: LACOLMENA.LINK (comunidad) - https://lacolmena.link CINENJAMBRE (cine, tv, streaming) - https://lacolmena.link/cinenjambre GAMESWARM (videojuegos, juegos de mesa) - https://lacolmena.link/gameswarm APOYA 👍 :::::::::::::::::::::::::::: ✔ Suscríbete (es gratis) 💬 Participa en el chat y deja tus comentarios 📢 Difunde a través de redes sociales 💵 Apoya económicamente (si te es posible) APORTACIONES ✊ :::::::::::::::::::::::: - PAYPAL: https://goo.gl/p7nVng - WISH LIST: https://goo.gl/KN4e9X (si quieres ayudarme al enviarme alguna de las cosas que necesito) - AMAZON CASH: https://goo.gl/VgA3xC (descarga el PDF, imprime y presenta el código de barras en los comercios seleccionados por Amazon, como OXXO, 7 Eleven y Farmacias del Ahorro) - TAZAS Y CAMISETAS: https://goo.gl/zDoDKR PATROCINIO ✌ ::::::::::::::::::::::::: Si deseas convertirte en patrocinador oficial del canal, contáctame: lic.joseluiscruzgarcia@gmail.com Adquiere mi cuento, EL SONIDO DE DÓNDE, a través de Amazon: - PASTA BLANDA (sólo 5.99 USD): https://goo.gl/2dw11q - KINDLE (sólo 2.99 USD): https://goo.gl/qiqmeZ FACEBOOK: https://www.facebook.com/elabuelokrakenfb GRUPO DE FACEBOOK: https://www.facebook.com/groups/elabuelokraken TWITTER: https://twitter.com/elabuelokraken BLOG: https://elabuelokraken.link/ Visita los siguientes canales amigos: BACTERIA MEX: https://goo.gl/XFCva4 HERMEX GAMES: https://goo.gl/SMJHZx MEGAHUNTER GAMEPLAYS: https://goo.gl/nYVALR EL VIEJO FRANK: https://www.youtube.com/user/viejofrank Baja cada uno de los episodios/relatos previos para escucharlos en tu mp3 player, smartphone, tablet y/o computadora gratuitamente. Por favor, considera realizar una aportación a través de Paypal: lic.joseluiscruzgarcia@gmail.com Obras de la literatura con El Abuelo Kraken iVoox: https://mx.ivoox.com/es/podcast-obras-literatura-el-abuelo-kraken_sq_f1262889_1.html iTunes: https://podcasts.apple.com/us/podcast/obras-de-la-literatura-con-el-abuelo-kraken/id1071003612 Spotify: https://open.spotify.com/show/6OXIvlcVY3KYC8s909URvv Deezer: https://www.deezer.com/es/show/461592 Showtime! powered by GamersCamp iVoox: https://mx.ivoox.com/es/podcast-showtime-powered-by-gamerscamp_sq_f1728352_1.html iTunes: https://podcasts.apple.com/us/podcast/showtime-powered-by-gamerscamp/id1470511545 Spotify: https://open.spotify.com/show/2bJSKHv0mAdwWksX9xlDWY Deezer: https://www.deezer.com/es/show/461582 GamersCamp iVoox: https://mx.ivoox.com/es/podcast-gamerscamp_sq_f1299958_1.html iTunes: https://podcasts.apple.com/us/podcast/gamerscamp/id1193005670 Spotify: https://open.spotify.com/show/6IyYeLtrQt11g2WGhAoiOm Deezer: https://www.deezer.com/es/show/461602 LINK DIRECTO A DESCARGAS DE AUDIOLIBROS: https://drive.google.com/drive/folders/0B0F9Kqt9_0moV0pKa3pGd0NIUlk?usp=sharing Licencia Atribución 4.0 Internacional (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/deed.es Y finalmente, ¡mil gracias por estar aquí! :D

Episode 83 - LOOKING GLASS (2018) feat SUPER MARCEY and BEDE JERMYN

Cage Rage - A Nicolas Cage Podcast

Play Episode Listen Later Mar 6, 2022 74:52

This week, Darryl is joined by the returning Super Marcey and Bede Jermyn to talk the 2018 thriller, LOOKING GLASS!We talk the dead weight cast, peeping, a Cage musical and much, much more!If you enjoy the podcast, please like, subscribe/follow and share - it really helps the podcast grow!LINKS:CAGE RAGE on TWITTERCAGE RAGE on INSTAGRAMBEDE JERMYN on TWITTERSUPER MARCEY on TWITTERTHE SUPER NETWORK on TWITTERTHE SUPER NETWORK on PATREONTHE SUPER NETWORK - WEBSITECAGE RAGE on KOFIKeep on, keep on Cagin' - it's all you have to do!Support this show http://supporter.acast.com/cage-rage-a-nicolas-cage-podcast. See acast.com/privacy for privacy and opt-out information.

cage looking glass bede marcey jermyn

Jeff Jermyn, Lee Padgett, & Tej Yale

Right Down the Street with Mayor Bryan K. Barnett

Play Episode Listen Later Feb 22, 2022 35:07

Businesses of all types are choosing Rochester Hills. Mayor Barnett sits down with three entrepreneurs to hear their unique stories and perspectives. Jeff Jermyn, CEO of BlueBrier, LLC, Lee Padgett, owner of Busted Bra Shop, and Tej Yale, owner of ThinkImpact, Inc all weigh in on what it takes to be your own boss and why they brought their business to Rochester Hills.

ceo llc businesses yale padgett rochester hills jermyn

GSMC Audiobook Series: Stories by H.P. Lovecraft Episode 3: Facts Concerning the Late Arthur Jermyn and His Family

GSMC Audiobook Series: Stories by H.P Lovecraft

Play Episode Listen Later Nov 29, 2021 39:02

Howard Phillips Lovecraft (August 20, 1890 – March 15, 1937) was an American writer of weird, science, fantasy, and horror fiction. Lovecraft is best known for his creation of a body of work that became known as the Cthulhu Mythos. The GSMC Audiobook Series presents some of the greatest classic novels, audiobooks, and theatrical presentations from a bygone era. Let Golden State Media Concepts take you on a ride through classic audiobooks read by some of the top audiobook performers of all time. This compiled collection of classic audiobooks contains a wide variety of classic Novels. ***PLEASE NOTE*** GSMC Podcast Network presents these shows and audiobooks as historical content and have brought them to you unedited. Remember that times have changed, and some Audiobooks might not reflect the standards of today's politically correct society. The shows do not necessarily reflect the views, standards, or beliefs of Golden State Media Concepts or the GSMC Podcast Network. Our goal is to entertain, educate, and give you a glimpse into the past.

Ep 45 An Invitation to a Great Tournament: Mortal Kombat (1995) & Enter the Dragon W/ Bede Jermyn & Super Marcey

The Schlock and Awe Podcast

Play Episode Listen Later Sep 22, 2021 89:42

This week on Schlock and Awe, Lindsay is joined by the Super Network Crew of Bede Jermyn and Super Marcey as they are all invited to a mysterious island for a Tournament Double Feature of Mortal Kombat (1995) & Enter the Dragon (1973). This is a Double Feature where legends are born. Listen to all the Super Network Podcasts Here Follow The Super Network on Twitter @SM_SuperNetwork Follow The Super Network on Instagram @the.super.network/ Follow Schlock and Awe on Twitter @schlockandawe1 Follow Schlock and Awe on Instagram @schlockandawe1/ Follow Bede on Twitter @bedejermyn Follow Marcey on Twitter @supermarcey Follow Lindsay on Twitter @readandgeek Write and say Hi schlockandawemovies@gmail.com Please Rate and Review Schlock and Awe on Apple Podcasts Original Music Composed and Performed by Anthony King

write dragon invitation tournament mortal kombat performed awe double feature enter the dragon bede schlock anthony king marcey jermyn

Facts Concerning Late Arthur Jermyn and His Family - H.P. Lovecraft

H.P. Lovecraft Short Horror Stories

Play Episode Listen Later Aug 12, 2021 33:41

View our entire collection of podcasts at www.solgood.org or on our YouTube channel: www.solgood.org/subscribe

family stories media books story horror fantasy scary monsters fiction sol audiobooks hp lovecraft hp lovecraft dagon necronomicon jermyn

Facts Concerning The Late Arthur Jermyn And His Family

Over Innsmouth

Play Episode Listen Later Nov 13, 2020 62:13

It was written 1920, The reader is Faith whose art is found on the urban fantasy webcomic Grace's Wings. The guest/cohost is Jessie Cooper who can be found on the podcasts Alphabet Flight, Creepy Critters, and Limited Theories. Podcast art is by Marki @aviandalek on twitter and Instagram To support Jessie you can donate to https://www.patreon.com/alphabetflight Tweet at me on @Overinnsmouth Outro is Ocean Man by Ween --- Support this podcast: https://anchor.fm/jessie-cooper/support

family wings marki jermyn ocean man creepy critters jessie cooper alphabet flight

TSEP 54 - Dedicated To The Ladies.

Play Episode Listen Later Aug 5, 2017 90:32

Jermyn Parra is back! This episode is dedicated to the ladies. Inspired by fellow comedianette Courtney Banks podcast, Jermyn addresses current dating issues. Apparently, the ladies are not being taken care of in their "special area" and they're not happy about it. As a relationship expert, Jermyn lays down the truth as to why guys aren't going downtown. The fact is, it probably smells. so clean it up, ladies!

ladies dedicated jermyn courtney banks

TSEP 52 - The Self Deprecating Episode

Play Episode Listen Later Aug 2, 2017 85:20

Jermyn Parra rips into dudes for having fecal matter in their beards. Lets be honest, these people are disgusting. Later in the show, Jermyn throws himself under the bus for his debaucherous past. On the one hand, he's proud of his past sexual exploits, on the other he's ashamed and embarrassed. Really, his behavior was a disgrace. Another great episode!

jermyn

The Scorched Earth Podcast!!! #48 - We Hit The Issues, We Hit 'Em Hard, And We Don't Give A F***!

america donald trump americans russia ladies planned parenthood donald trump jr scorched earth give a f earth podcast jermyn

Play Episode Listen Later Jul 22, 2017 81:05

Jermyn takes it to another level on this episode. He discusses the issues important to all Americans. He starts off by addressing the Trump Jr./Russia/email nothing story. It's just another media b.s. story! He stickin' with Trump! Then he brings up America's favorite topic, abortion! Once again, Jermyn shows no fear in confronting Planned Parenthood and their obsession with baby murder. "Ladies, enough with the abortions, already!" Another controversial episode! Enjoy kids!

The Scorched Earth Podcast!!! #44 The Rosa Parks of open mic.

cnn heads talks throws open mic rosa parks rips scorched earth flappers scorchers lake forest earth podcast jermyn

Play Episode Listen Later Jul 6, 2017 88:27

Jermyn Parra is getting down and dirty with the Scorchers once again as he rips into CNN for spreading their fake propaganda. Throws a homeless person under the bus for entering his property live on the podcast! Talks about how he started in the cabinet business. Rips into Flappers again for being a subpar comedy club. Really, he's just pissed off 'cause he didn't get up at the audition. Heads out to do a set in Lake Forest. It all happens as Jermyn chronicles his day right on the show!

The Scorched Earth Podcast # 42 - A Day in the Life of Jermyn Parra

day in the life parra scorched earth earth podcast ponch jermyn

Play Episode Listen Later Jun 27, 2017 105:41

Jermyn is fired up on this one. He throws Ponch under the bus for letting him down on a job. It's a rough day for him as things are going wrong for him left and right. Join him for a day on the job as the owner and proprietor of Cruz Concepts Cabinetry. He's so pissed off he's throwing everyone under the bus. He actually almost throws his own beautiful sweet wife under the bus! Another classic episode. Enjoy!

The Scorched Earth Podcast # 33 - The Toned Down Episode.