POPULARITY
The dreaded bug report: this app is slow. Ok, but what is slow? I have so many questions!In today's episode, we discuss our approach to getting more information out of that initial bug report, and a methodical approach to locate and quantify the slowness.If you'd like help setting up tools like Xdebug profiling, send us a message.
Junior developers are a category of people we haven't explored much on this podcast yet. In this episode of Smash the Bug, we're changing that and talking with Alice Dean, a developer with Fisheye. The hunger and tenacity of junior developers is unmatched, and should not be overlooked when it comes to those we can learn from! Alice gives us some insight into what her day-to-day is like, and talks about a couple bugs she has smashed recently. Show Notes 0:00 Intro 2:29 An average day for Alice 5:43 Making the jump from Technical Account Manager to Developer 7:10 Getting the first Magento certification 10:09 A recent invoicing bug 15:00 Recounting the troubleshooting steps that Alice took 16:09 The one line fix Alice used for this challenge 17:36 Using integration tests for troubleshooting 19:50 A custom product alert module issue with 2.4.4 23:29 When the error logs aren't consistently providing helpful information 26:14 How to get Xdebug working on the CLI 29:37 Alice's advice for others getting started as a developer 32:18 Outro Connect with Alice: LinkedIn Twitter Connect with Joseph: LinkedIn Twitter https://swiftotter.com https://twitter.com/Swift_Otter https://www.facebook.com/SwiftOtterInc Do YOU have an incredible debugging story to share? Send your story to brandym@swiftotter.com and you might be our next podcast guest! This podcast exists to inspire, educate and entertain eCommerce developers who are serious about improving their skills and advancing their careers! Have you joined the free SwiftOtter Slack community? It's exploding and we don't want you to miss out. You can join for free and get plugged into what might be the best group of collaborating developers around!
PHP Internals News: Episode 103: Disjunctive Normal Form (DNF) Types London, UK Friday, June 24th 2022, 09:07 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Disjunctive Normal Form Types" RFC that he has proposed with Larry Garfield. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 103. Today I'm talking with George Peter Banyard again, this time about a disjunctive normal form types RFC, or DNF, for short, which he's proposing together with Larry Garfield. George Peter, would you please introduce yourself? George Peter Banyard 0:39 Hello, my name is George Peter Banyard, I work on PHP paid part time, by the PHP foundation. Derick Rethans 0:44 Just like last time, we are still got colleagues. George Peter Banyard 0:46 Yes, we are indeed still call it. Derick Rethans 0:48 What is this RFC about? What is it trying to solve? George Peter Banyard 0:52 The problems of this RFC is to be able to mix intersection and union types together. Last year, when intersection types were added to PHP, they were explicitly disallowed to be used with Union types. Because: a) mental framework, b) implementation complexity, because intersection types were already complicated on their own, to try to get them to work with Union types was kind of a big step. So it was done in chunks. And this is the second part of the chunk, being able to use it with Union types in a specific way. Derick Rethans 1:25 What is the specific way? George Peter Banyard 1:27 The specific way is where the disjoint normal form thing comes into play. So the joint normal form just means it's a normalized form of the type, where it's unions of intersections. The reason for that it helps the engine be able to like handle all of the various parts it needs to do, because at one point, it would need to normalize the type anyway. And we currently is just forced on to the developer because it makes the implementation easier. And probably also the source code, it's easier to read. Derick Rethans 1:54 When you say, forcing it up on a developer to check out you basically mean that PHP won't try to normalize any types, but instead throws a compilation error? George Peter Banyard 2:05 Exactly. It's, it's the job of the developer to do the normalization step. The normalization step is pretty easy, because I don't expect people to do too many stuff as intersection types. But as can always be done as a future scope of like adding a normalization step, then you get into the issues of like, maybe not having deterministic code, because normalization steps can take very, very long, and you can't necessarily prove that it will terminate, which is not a great situation to be in. Imagine just having PHP not running at all, because it's stuck in an infinite loop trying to normalize the format. It's just like, oh, I can't compile Derick Rethans 2:39 Would a potential type alias kind of syntax help with that? George Peter Banyard 2:44 Maybe, I'm not really sure. Actually reading like research about it from computer scientists, in functional programming languages, which is everything is compiled on my head. And they have the whole thing was like, well, they need to type type normalize, and especially with type aliases, they haven't really figured out a way yet. So I'm not sure how we are going to figure out a way if experts and PhD students and researchers haven't really figured out a way. Derick Rethans 3:08 And is the reason for that mostly, because PHP, resolves types while it is running code sometimes because it has to overload classes, and then it might find out it is an inherited class, for example? George Peter Banyard 3:19 Yes, I think it's like this weird thing where might maybe PHP has like kind of an advantage, because it doesn't need to, like resolve all of the types at once. And if you have a type alias, it's just oh, if it's used, and you just need to resolve it, and then try to figure it out. There's also the added complexity of like, variance checks, because most functional programming languages, they have variance to some degree, but they don't have the whole inheritance of like typical OOP languages have. It's kind of a very strange field, the fact that yeah, PHP is just like, well, we kind of do stuff at runtime, and you don't necessarily need everything. And it just works is like, well, we'll do. That's mainly the reason why the dev needs to do the normalization step, the form is done. It's also I think, the most easiest to understand, it's just like, Oh, you have this and this, or this group, or stuff, or this group of stuff, or this thing, simple type. The other form would be another normalized form would be conjunctive normal form, which is a list of ANDs of ORs to just have this thing, or X, like (A or B or C) and X and (Y or Z), which I think is harder to understand. Derick Rethans 4:26 What is the exact syntax then? George Peter Banyard 4:28 So the exact syntax is, if you want to have an intersection type was in a union type, you need to like bracket it by parentheses. And then you have like the normal pipe union operator and you can mix it with single types, you can mix it with true, you can mix it with false, which are literal types, which now exist, or just normal, bool types. Derick Rethans 4:48 The parenthesis is actually required. You don't rely on operator precedence to make things work? George Peter Banyard 4:53 Yes. Relying on operator precedence is terrible. Derick Rethans 4:57 Yep, I agree. George Peter Banyard 4:58 I'd say Oh, yeah, but I think I've heard this argument on the list like a couple of times, it's just, oh, yeah, but maths, like, has like, and as priority over like, or, I mean, I did three years of a maths degree and not gonna lie. Maths notation is terrible for most of us. People don't even agree on terminology. I'm just gonna say, let's, let's just do better. Derick Rethans 5:19 I agree. I mean, most coding standards for any sort of variable for like conditions, will already require parenthesis around multiple complex clauses anyway, right? I mean, it's a sensible thing to do, just for readability, in my opinion. So the RFC also talks about a few syntax that you aren't allowed to do, and that you have to normalize or deconstruct yourself, what kinds of things are these? George Peter Banyard 5:41 if you would want to have a type which has an intersection of a class A with at least one other class, so let's say X or Y, but you can always convert it into DNF form, how this type would be, it would be (A and X) or (A and Y). This seems to be the more unusual case, I would imagine. One of the motivating cases of DNF types is to do something like Array or (Traversable and Countable). I don't really see mixing and matching various different object interfaces in differencing, the most useful user land cases to be able to do Array or (Traversable and Countable) so that you can use just count or seeing something as an array, or you have like Traversable and Countable and ArrayAccess. And it's just like, Oh, here's an object, which kind of behaves like an array. Derick Rethans 6:32 I think there's currently another RFC just being proposed, that extends iterator_to_array to multiple types as well to accept more things. So that sort of fits into this category of things to do with iterables and traversals then I suppose. George Peter Banyard 6:49 yeah Derick Rethans 6:50 I'm hoping to talk to the author of that RFC as well. At the moment where two and a half weeks or so before a feature freeze, you now see a whole flurry of RFCs while it was a bit quiet in the last few months. So because you're adding to the type system, that's also usually has consequences for variance rules, or rather, how inheriting works with return types and argument types, as well as property types. What do DNF types mean for these variance checks? George Peter Banyard 7:19 The variance is checks, kind of follow the similar rules as before. So property types are easy. They are invariant, so you can't change them. You can reorder types, like was in your union if you want to. But that was already the case with Union types previously, because PHP will just check that, well, the types match. So contravariant, you can always restrict types, meaning you can either add intersections, or you can remove unions, broadly speaking. What you could do, for example, if you have like A or B or C, you could do A and X as a subtype, because you're restricting A to be of an extra, like an extra interface. Derick Rethans 8:06 So then you will have (A and X) or B or C. George Peter Banyard 8:10 Yes. So that's one restriction. You can add how many interfaces you want and do an intersection type, you can add them on every type you can. On the other side, you can just add like unions. So if for contravariance, or like an an argument type, it's like, well, I just want to return something new, well, then you can add unions, but you can't add an intersection to a type, you can only widen types of arguments. So if your type is A or B or C, you can't do A and B, and you can't do (A and X) or B or C, because you're restricting the type. If your type would be (A and X) or (B and Y) or (C and Z), then you could lift the restriction to A or B or (C and Z) because you loosening the requirements on on the type that you're accepting. Derick Rethans 8:55 To summarize this: argument types, you can always widen; return types you can only restrict, and, and property types you can't change at all. I specifically wanted to summarize that because I always find contravariance and covariance. These names confuse me. So that's why I prefer to talk about widening types and restricting types instead. Because there are so close together for me. We spoke a little bit about redundant types. What is this new functionality do if you specify redundant types? George Peter Banyard 9:30 Redundant types how they currently work in PHP are done at compile time. And they do exact class matches or constant class aliasing matches. Derick Rethans 9:41 That will need an explanation. George Peter Banyard 9:44 Class names and interface names in PHP are case insensitive. So you can write a lower-case a or upper-case A and it means the same class. If you provide let's say lower-case a or upper-case A, the engine realize this, this is the same class, so we'll serve it on the type error. So PHP has use statements, or use as. So these are compile time aliases. If you define a class A, and then you say use A as B. So B is a compile time alias of A. And then you do a type which has A or B, PHP already knows these things refer to the same class. So it will raise a compile time error. Derick Rethans 10:25 These use aliases are per file only, right? George Peter Banyard 10:28 Yes, that's usually to do with if you import traits or like a namespaces. And you get conflicting class names. That's how you handle it about. PHP has also this feature, which you can do this at runtime, using the function called class_alias. Now, obviously, compile time checks are done at compile time. So it doesn't know at runtime that you aliasing these classes or using this name as an alias. So then PHP won't complain. Derick Rethans 10:53 But will don't complain during runtime. George Peter Banyard 10:56 No. Derick Rethans 10:56 You really just wanted to shoot yourself in the foot, we'll let you do this. George Peter Banyard 11:00 Yet, during this at runtime, just as like a whole layer of time, because it's not it's not really useful. Basically, what it means that PHP won't guarantee you the type is minimal. I.e. you might have redundant types, but it will just try to tell you, it's like oh, the- these are exactly the same types. And I know these are the same types, you probably do get mistake. So if it can determine this at compile time, it will tell you. Derick Rethans 11:23 The variance is still checked when you're passing in things. George Peter Banyard 11:26 Yes, so variance is checked on inheritance. When the class is inherited and compiled, because it needs to load the parent class, it will then check that it's built properly, and otherwise it will raise an error, that's fine. But just checking that the types is minimal is not possible. A) because inheritance, you don't know how it works, because it will only do the checks on basically on the name of the strings, it will do like compare strings of class names. And if it doesn't know the class name, or if it or if it needs to do some inheritance, it just won't do an instance of check. They just ignore that. It's just like, well, maybe it is maybe it's not I don't know. And that's fine. Derick Rethans 12:08 Of course, if you pass in a wrong type at runtime, then it will still get rejected during runtime anyway. George Peter Banyard 12:14 Yes, that hasn't changed. Derick Rethans 12:16 The only thing that you might end up in a situation where you don't get warned during compile time whether type is redundant. George Peter Banyard 12:23 Yes. So that's the behaviour we currently are the behaviour is added. So, it will check that two intersection types within the union are identical using the same class stuff. So for example, if you have class A, and you say use a as B, and then you have a type which is (A and X) or (B and X), it will tell you: Okay, these classes are the same. The check it adds now also it will check that you don't have a more restrictive type with a wider type. So if your type is T or (T and X), because T is wider than T and X, it will error at compile time, it'll tell you well, T is less restrictive than T and X. So the T and X type is redundant. Derick Rethans 13:11 Okay, so nothing strange. Basically, what you expect to happen will happen. And PHP does its best telling you at compile time whether you've done something wrong or not. George Peter Banyard 13:22 Yes. Derick Rethans 13:24 I think we've spoken mostly about the functionality itself and types. I'm a little bit interested in whether you encountered some interesting things while implementing this feature. George Peter Banyard 13:33 This feature basically, was a bit in limbo for the implementation, because I was waiting on a change to make Iterable, a compile time alias of Array or Traversable, which shouldn't affect userland. Because previously, all of the checks needed to cater to if you get Iterable, then you need to check for the variance. Has it Array , has it a Traversable type, does this accept? Is it why the is it more restrictive, it's identical. It's just this weird edge case, which makes the variance code harder. Moving this to a compile time alias, where now it just uses the standard, a standard union type in some sense, just makes a lot of the variance checks already streamlined and simpler. And because this is simpler, in some sense, was DNF types. When you hit the intersection, you need to recurse one step to check the variance. This helps. This is also kind of why DNF types are enforced like as like the structure on the dev because otherwise, you could potentially get into the whole like, oh, infinite recursion if you do like very nested types, because it's just like, oh, you hit one nested type and so, oh okay, now I'm again in unnecessary time and then you recurse again and then you recurse again, and so that's all you get into the thing: Oh you need to normalize the type. The variance check is: Can you see if it's a union type is the first type a sub list So a list of intersection types, okay, is it balanced? And then just recall the same function in some sense, like, check the types for variance, is this correct? Okay, move to the next type back into the Union and everything. So the implementation is conceptually simple, because all of the implementation details already exist. And all the everything hard has already been done. Now, it's just like, in some sense, it was extracting it into its own function, and then like recurse into it, and not forget to update opcache properly. Derick Rethans 15:31 You mentioned that in order to make the DNF types work, you were waiting on this Array or Iterable or Traversable kind of type. Is this also type people can use it and userland? Or is it internal only? George Peter Banyard 15:44 It is the standard Iterable type that you can already use. So currently, PHP considered Iterable, a full type in some sense. And what the this implementation change basically makes it Iterable into ... compile time alias of Array or Traversable. Iterable exists since, PHP, 7.1, I think. Can still use it, reflection should still be fine if you use it as a single type. Derick Rethans 16:08 So to change there is more, instead of: if you encounter Iterable, we check for both Array and Traversable. Then, instead of making the check every time you look at Iterable is already part of the type system, so you don't have to make the check every time. George Peter Banyard 16:23 Exactly, you basically move when it's being transformed in some sense. Now it has some repercussion on other parts, which needed to be taken care of, which is probably why it was in limbo for 10 months. I had already done the implementation of DNF types, basically, working on my local copy of that branch. It's just like: Okay, this got merged, nice, I can now open the PR onto PHP SRC. So I didn't wait for it to land until start working on it. Derick Rethans 16:50 Things like that also often affect reflection, because you're adding more complex types to the type system. So what kind of changes does that make to PHP's reflection system? And does this end up breaking backwards compatibility? George Peter Banyard 17:04 So in theory, no, it doesn't. How the reflection API works around the type system is that most method calls will turn a reflection type interface, ReflectionNameType, ReflectionUnionType, and ReflectionIntersectionType, are all instances of a ReflectionType. And methods if you would call on the list. So on a union type, the type it would return if you get like getTypes is a ReflectionType. The type system and how the reflection idea was designed, there is no BC break. How the standard was working, it's like, Oh, if you had like a union type, or an intersection type, if you call the getList or getListOfTypes, or getTypes, I don't remember exactly what the method name is actually called, you will always get an array of reflection name types, because you can only have like one level of list in some sense. However, now, if your top type is a union type, then if you get getTypes, you might get an array of ReflectionNameTypes with ReflectionIntersectionTypes. So that's the case that you now need to cater to. So if you get another ReflectionIntersectionType in between. There, you could only have ReflectionNameTypes, there was no nesting, whereas now if you have a union type, one of the types that you get back from the getTypes method in the array will be a ReflectionIntersectionType. Technically, all of the types of the part of the reflection type, so it's an array of reflection types that you get. How it worked before is that you didn't need to care about this distinction between: Oh, it returns a ReflectionType and a ReflectionNameType because well, it only return a ReflectionNameType. But now this is not the case. So you now need to cater to that that oh, you might have nesting. Which kind of boils down to like if in the future, we decide to like have oh, you can nest union types in an intersection type, then the getTypes method might return a union type with other name types. Derick Rethans 19:03 You just need to make sure that you check for more than just one thing that it previously would have done. You can't assume not everything is a ReflectionType any more. It could also be ReflecionIntersectionType. George Peter Banyard 19:18 Yes, exactly. Derick Rethans 19:20 I think that sort of what's in the RFC, is there any future scope? George Peter Banyard 19:25 I mean, the future scope is type alias. As usual. Everything I feel when you talk about the type system, it's like type aliases. At one point when your types gets very complicated. It would be nice to just be able to refer this as a as a named type in some sense, instead of needing to retype every time the whole union slash intersection of it. Hopefully we can get this running for 8.3. We are starting to get kind of complicated types. It would be nice being able to have this feature. The other obvious future scope in some sense, who knows if it's actually desirable is to allow either having conjunctive normal form so you can have like a list of ANDs or ORs Derick Rethans 20:05 You call these conjunctive normal forms? George Peter Banyard 20:08 Yes. Or just a type, which is not normalized. Not sure if it's really desirable to have this feature, because then you get into the whole thing of, if PHP doesn't, either PHP doesn't know how to like normalize it, or it's not in the best form, and then you get into like, very long compilation units or just checking. It's like, okay, does it respect the type? Does it do all of the instance of checks? And I'm not sure if it's super desirable. Derick Rethans 20:38 So it could be considered future scope. But from what I gather from you, you don't actually know what it is actually a desirable thing to add to the language? George Peter Banyard 20:46 Yes. Derick Rethans 20:47 Okay, George, thank you for taking the time this morning to talk about this new DNF types RFC. George Peter Banyard 20:54 Thank you for having me. As always. Derick Rethans 20:59 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Disjunctive Normal Form Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 102: Add True Type London, UK Thursday, June 2nd 2022, 09:06 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Add True Type" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:00 Hi I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 102. Today I'm talking with George Peter Banyard about the Add True Type RFC that he's proposing. Hello George Peter, would you please introduce yourself? George Peter Banyard 0:33 Hello, my name is George Peter Banyard, I work part time for the PHP Foundation. And I work on the documentation. Derick Rethans 0:40 Very well. We're co workers really aren't we? George Peter Banyard 0:43 Yes, indeed, we all co workers. Derick Rethans 0:45 Excellent. We spoke in the past about related RFCs. I remember, which one was that again? George Peter Banyard 0:51 Making null and false stand alone types Derick Rethans 0:53 That's the one I was thinking of him. But what is this RFC about? George Peter Banyard 0:56 So this RFC is about adding true as a single type. So we have false, which is one part of the Boolean type, but we don't have true. Now the reasons for that are a bit like historical in some sense, although it's only from PHP 8.0. So talking about something historical. When it's only a year ago, it's a bit weird. The main reason was that like PHP has many internal functions, which return false on failure. So that was a reason to include it in the Union types RFC, so that we could probably document these types because I know it would be like, string and Boolean when it could only return false and never true. So which is a bit pointless and misleading, so that was the point of adding false. And this statement didn't apply to true for the most part. With PHP 8, we did a lot of warning to value error promotions, or type error promotions, and a lot of cases where a lot of functions which used to return false, stopped returning false, and they would throw an exception instead. These functions now always return true, but we can't type them as true because we don't have it, and have so they are typed as bool, which is kind of also misleading in the same sense, with the union type is like, well, it only returns false. So no point using the boolean, but these functions always return true. But if you look at the type signature, you can see like, well, I need to cater to the case where the returns true and when returns false. Derick Rethans 2:19 Do they return true or throw an exception? George Peter Banyard 2:22 Yeah, so they either return true, or they either throw an exception. If you would design these functions from scratch, you would make them void, but legacy... and we did, I know it was like PHP 8.0, we did change a couple of functions from true to void. But then you get into these weird shenanigans where like, if you use the return value of the function in a in an if statement, null gets because in PHP, any function does return a value, even a void function, which returns null. Null gets coerced to false. So you now get like, basically a BC break, which you can't really? Yeah, we did a bit and then probably we sort of, it's probably a bad idea. That's also the point of like, making choices, things that are static analysers can be like, more informants being like, Okay, your if statement is kind of pointless here. Derick Rethans 3:06 Yeah, you don't want to end up breaking BC. Now, we already had false and bool, you're adding true to this. How does that work with Union types? Can you make a union type of true or false? George Peter Banyard 3:18 No. So there are two reasons mainly. A. true and false is the same as like boolean, which is like just use Boolean in this case. But you can say, well, it's more specific, so just allow it. So that's would be reasonable. But the problem is, false has different semantics than boolean. False does not coerce values. So it only accepts false as a literal value. Whereas boolean, if you're not in strict type, which is a lot of code, it will cause values like zero to false one, or any other integers to true. It will coerce every other integer to true, like the true type follows the behaviour of false of being a value type. So it only accepts true, you would get into this weird distinction of does true or false, mean exactly true or false? Or do you get the same behaviour as using the boolean type? Derick Rethans 4:07 So I would say that true or false would than be more restrictive than bool. George Peter Banyard 4:12 Exactly, which is a bit of a problem, because PHP internally has true and false and separate types, which also makes the implementation of this RFC extremely easy, because PHP already makes the distinction of them. But at the same time, the boolean type is just a union of the bitmask of true and false. You can't really distinguish between the types, true or false, or the boolean type within the type system. Currently just does it by checking if it only has one then it can do like two checks. Specifically, you would need to add like an extra flag. I mean, it's doable, but it's just like, Well, who knows which semantics we want? Therefore, just leave it for future discussion because I'm not very keen on it to be fair. Derick Rethans 4:55 True or false are really only useful for return values and not so much for arguments types, because if you have an argument that that always must be true, then it's kind of pointless to have of course. George Peter Banyard 5:05 Same as like it was with the null type RFC. Although there might be one case where PHP internal functions might change the value to true for an argument, I can maybe two types, would be like with the define function, this thing being like case insensitive or case sensitive, I don't remember what the parameter actually; could actually either be false or true, because at the moment, I think emits a notice, things do like the this thing is not supported, therefore the values what was ignored. But we could conceivably see that in PHP 9, we would actually implement this as a proper like: Okay, this only accepts true, yes, this argument is pointless, but it's in the middle of the function signature, so you can't really move it. The spl_register_overload function has like as its second argument, the throw on error or not, which since PHP 8 only accepts true, but it's in the middle of the function. The last argument is still very useful. It's prepend, instead of append the autoloader, I think, or might be the other way around, check the docs. Since PHP 8, this only accepts true. So if you pass in false, it will emit a notice and saying you'd like this argument has just been ignored. So whatever. But we can't really remove the argument. Because well, it's, if you use the third argument, as with positional arguments, then you would change like the signature and you would break it. Now, we don't have a way to enforce in PHP to use named arguments, because that would be a solution. It's just like, well, if you want to set this argument, you need to use named arguments, but we can't do that. Otherwise, then creating a new function, which has an alias, which is also kind of terrible. That would be one of the maybe only cases where you would actually get like true as a as an argument Derick Rethans 6:39 is that now currently bool? And there's a specific check for it? George Peter Banyard 6:42 It's currently bool, and if you pass in false enrolment, like a warning, or notice. Derick Rethans 6:47 How would inheritance work? As return types, you can always make them smaller, right? More restrictive. George Peter Banyard 6:53 Yes, that's also the thing. But that already exists in some sense a problem of. Like if you go from boolean to false, you're already restricting the type. And that problem existed, even before the restricting, well allowing false as a stand-alone type if you had like, as a union, because you could always say like, I don't know. That problem already existed with Union types. Because you could have something like overturn an array or bool and then you change it to either an array or false. And then if you try to return like zero, then you will get like a coercion problem. So the same problem applies with true, because it only affects return values. And like you control the code within a function compared to like how you pass it, that's less of an issue. It applies also, with argument types where you can go from true to like boolean, or true and like a union type. Derick Rethans 7:44 So there's nothing surprising here. I see that the RFC also talks a little bit about future scope. Can you tell a bit more about that? George Peter Banyard 7:53 True and false are part of what are called value types, they are a specific value within the type. One possible future scope would be to expand value types to all possible types. So that you could say, oh, this function returns one, two or three. Derick Rethans 8:08 Would you not rather use an enum for that? George Peter Banyard 8:09 Exactly. That's the point I was going to make is that enums serve this purpose, in my opinion. And as a type purist, ideally, I would have preferred that we didn't have to enforce because the code, it kind of goes against the grain in this sense. Derick Rethans 8:23 We've had it for 25 years, booleans. George Peter Banyard 8:26 Yes, right. But boolean is its own type, in some sense, which you could say is a special enum. Enums are types. But we have false, and not having true is just so weird to me. It's like, oh, you've got this thing, but you don't have this other thing. And there are loads of cases where functions return true, or due to legacy reasons and to preserve BC, and PHP 8 promoted a bunch of warnings to to error. So now you've got functions which used to return false, don't return false any more. And they only return true. Now, some of the famous examples are probably like array_sort of, like actually, the sorting array functions, return true for basically all of PHP 7. I think there was something changed in PHP 7, probably was the engine or something like that, that they stopped returning false, which is strange. And I've made the discovery somewhat recently, I'm like, this is so pointless, because you see loads of loads of code checking like that the return value of the sort function is correct. Derick Rethans 9:20 It's also that most of the sort functions actually sort by reference instead of returning the sorted array, which I can understand as a performance reason to do but... George Peter Banyard 9:29 it's not very functional. You modify stuff in place and like passing it around. And because yeah, I think the initial thing was that like, well do it would return a false or true because sometimes it could, the sort could fail. Derick Rethans 9:42 I don't understand how a sort could failure, but there we go. George Peter Banyard 9:46 I mean, I suppose if you have like incomparable values within the array like that somewhat logical, I suppose. Derick Rethans 9:53 Was there anything else in future scope? George Peter Banyard 9:56 One of the future scope, I feel was everything type related. It's like type aliases, because when you start making more complicated types, having a way to type alias, it is probably nice. Don't think we'll get this for PHP 8.2. I don't think we any of us had the time to work on it. Derick Rethans 10:11 Well, we only have a month left anyway. George Peter Banyard 10:13 Yeah. And I mean, I'll probably be back on here. I'm trying to get DNF types working, but... Derick Rethans 10:19 Can you explain that these are? George Peter Banyard 10:20 Disjoint normal form types? Derick Rethans 10:22 That did not help. George Peter Banyard 10:24 But it's the being able to combine union types with intersection types together, Derick Rethans 10:28 I can understand that doing that is kind of complicated. You also need to sort of come up with a with a language to define them almost right? I mean, you then get the argument, are you going to require a parenthesis around things? George Peter Banyard 10:38 I'm requiring parentheses. People have told me the argument of like: Yeah, but in maths like and takes priority, it's just like, have you seen mathematicians, mathematicians don't agree on notation, and it's terrible, or they call stuff and the different they call it something is like, oh, sometimes a ring is commutative, and sometimes it's not. Don't follow mathematicians, don't follow mathematician, Derick Rethans 10:57 Type aliases is something that would only apply to single files. See, that's what you're suggesting. And then there's exported type definitions, which I guess could be autoloaded at some point; would be nice to have, I guess. George Peter Banyard 11:09 I think that's the trouble just like defining the semantics. Type aliases within a file are nice, but they're not very useful. Most of the time, you would want to export the type. For example, if you say: Oh, I accept, I don't know, something which looks like an array, which is like an array and like Traversable, and ArrayAccess or something. I'm sure, it's nice to have it in your own file. But like, if you use it around a project, and you need to redefine the type, every single file kind of defeats the purpose. Derick Rethans 11:35 It's kind of tricky to do with type definitions, because you sort of need to make sure that there are available and maybe can be autoloaded, just like classes can be right. And that makes things tricky. Because having a type definition and just three lines in a file, is kind of annoying, but I guess that is sort of necessary to do the kind of thing in a PHP ish way. George Peter Banyard 11:55 Yeah, we talked about it with Ilija because he he was on about it. And I was like: Well, ideally, you would want the separate autoload of types. That's how I initially conceived it, it's like having a different autoloading for types. But then the problem is, is like if anytime you hit a class, like in an argument, if you autoload the type first, it will go through all of the type definitions. And if, okay, at the moment, that wouldn't be there wouldn't be much. But if you go into like importing 30 composer projects, or libraries, which are define their own types, it will go through all of those first, before going to the classes autoloaded, and trying to find it then, which is not ideal. Yeah, it's going to be a tricky problem. It's either you merge these symbols together. But then the class table is not always a class. And sometimes you can't do new type. Like I said, tricky problems. Derick Rethans 12:43 Yeah, that's a tricky problem, but an interesting one. George Peter Banyard 12:47 Yeah. Derick Rethans 12:47 So that's future scope then. George Peter Banyard 12:50 Exactly. That is future scope. Derick Rethans 12:52 Do you have anything else to add? George Peter Banyard 12:54 Um, no, not really. I think I've said all I have to say it's pretty straightforward. Should be uncontroversial, hopefully. Derick Rethans 13:02 It currently looks like it's 20 for, and zero again. So I guess it will pass. George Peter Banyard 13:07 Brilliant. Derick Rethans 13:08 Who said that, that if your RFC ends up passing unanimously, it is too boring? George Peter Banyard 13:13 Nikita. Derick Rethans 13:14 Which is not incorrect. George Peter Banyard 13:16 It is not incorrect. But I mean, at the beginning, because I was like: Well, this is pretty straightforward. So I wrote the RFC, it was tiny. And I put it on to the list and people was like: Yeah, but what's the motivation for? I understand for adding false, because they already exist. But what's the motivation for adding a new type, and I was like, I now need to go back to the drawing board and write more. To be fair, that was a smart, because I then discovered the whole issue about true and false. That false is just a value type and doesn't do coercions. And it's like, okay, how do you handle the semantics and everything? Derick Rethans 13:46 I'm glad to hear it. Then all I have to say thank you for taking the time today to talk about this new RFC. George Peter Banyard 13:53 Thank you for having me as usual. Derick Rethans 13:59 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@php internals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Add True Type RFC: Allow Null and False as Standalone Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 101: More Partially Supported Callable Deprecations London, UK Thursday, May 19th 2022, 09:05 BST In this episode of "PHP Internals News" I talk with Juliette Reinders Folmer (Website, Twitter, GitHub) about the "More Partially Supported Callable Deprecations" RFC that she has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 101. Today I'm talking with Juliette Reinders Folmer, about the Expand Deprecation Notice Scope for Partially supported Callables RFC that she's proposing. That's quite a mouthful. I think you should shorten the title. Juliette, would you please introduce yourself? Juliette Reinders Folmer 0:37 You're starting with the hardest questions, because introducing myself is something I never know how to do. So let's just say I'm a PHP developer and I work in open source, nearly all the time. Derick Rethans 0:50 Mostly related to WordPress as far as I understand? Juliette Reinders Folmer 0:52 Nope, mostly related to actually CLI tools. Things like PHP Unit polyfills. Things like PHP Code Sniffer, PHP parallel Lint. I spend the majority of my time on CLI tools, and only a small portion of my time consulting on the things for WordPress, like keeping that cross version compatible. Derick Rethans 1:12 All right, very well. I actually did not know that. So I learned something new already. Juliette Reinders Folmer 1:16 Yeah, but it's nice. You give me the chance now to correct that image. Because I notice a lot of people see me in within the PHP world as the voice of WordPress and vice versa, by the way in WordPress world to see me as far as PHP. And in reality, I do completely different things. There is a perception bias there somewhere and which has slipped in. Derick Rethans 1:38 It's good to clear that up then. Juliette Reinders Folmer 1:39 Yeah, thank you. Derick Rethans 1:40 Let's have a chat about the RFC itself then. What is the problem that is RFC is trying to solve? Juliette Reinders Folmer 1:46 There was an RFC or 8.2 which has already been approved in October, which deprecates partially supported callables. Now for those people listening who do not know enough about that RFC, partially supported callables are callables which you can call via a function like call_user_func that which you can't assign to variable and then call as a variable. Sometimes you can call them just by using the syntax which you used for defining the callable, so not as variable but as the actual literal. Derick Rethans 2:20 And as an example here, that is, for example, static colon colon function name, for example. Juliette Reinders Folmer 2:26 Absolutely, yeah. Derick Rethans 2:27 Which you can use with call_user_func by having two array elements. You can call it with literal syntax, but you can't assign it to a variable and then call it. Do I get that, right? Juliette Reinders Folmer 2:36 Absolutely. That's it. There's eight of those. And basically, the original RFC from Nikita proposed to deprecate support for them in 8.2, add deprecation notices and remove support for them altogether in PHP nine. And the original RFC explicitly excluded two particular things from those deprecation notices. That's the callable type and using the syntaxes in combination with the is_callable function, where you're checking if the syntax is callable. The argument used in the original RFC was to keep those side effect free. The problem with this is that with the callable type, this means you go from absolutely no notice or nothing, to a fatal error in PHP 9. Everything works, and you're not getting any notification. But in PHP 9, its fatal error at the moment that callable is being passed to a function. Derick Rethans 3:31 This is the callable type in function declarations. Juliette Reinders Folmer 3:33 Yeah, absolutely. And with is_callable, I discovered a pattern in my wanderings across the world where people use the syntax in is_callable, but then use it in a literal call. So not using call_user_func, not using a variable to call it, but it's callable static double colon method name, and then called static double colon method name as literal. And that pattern basically, for valid calls would mean that that function would no longer be called in PHP 9 without any notification whatsoever. Derick Rethans 4:13 So it's a silent change that you can't detect at all. Juliette Reinders Folmer 4:17 Yeah, which to me sounded dangerous. I started asking some questions about that. But six weeks ago, the conclusion was, well, maybe this should be changed. But as this was explicit in the original RFC, we can't just change it. We need to have a new RFC to basically amend the original RFC and remove the exception for these two situations and allow them to throw deprecation notices. Derick Rethans 4:44 What are you proposing to change with this RFC than? Juliette Reinders Folmer 4:47 What this RFC is proposing is simply to remove the exception that the callable type and is_callable are not throwing a deprecation notice. This RFC is proposing that they should throw a deprecation notice, so that more of these type situations can be discovered in time for PHP 9 to prevent users getting fatal errors. Derick Rethans 5:08 Now, of course, we have no idea when PHP nine is actually showing up, but I don't think it will be this year. Well, I know it won't be this year, and it certainly won't be be next year neither, I think. Juliette Reinders Folmer 5:17 That's all the same. I mean, it makes there'll be two, three years ahead, but it doesn't really make sense to have the main deprecation in 8.2 and then have the additional deprecation in 8.4 or something. Derick Rethans 5:29 Absolutely. Juliette Reinders Folmer 5:30 It's a lot more logical to have it all in in the same version. Because it's all related. It's basically the same thing without the exception for callable type. And is_callable. Derick Rethans 5:42 Although there is no current application, would this be able to be found if you had like a comprehensive test suite? Juliette Reinders Folmer 5:48 Yes and no. Yes, you can find this with a test suite. But one, you're presuming that there are tests. Two, that the tests covered the effected code with enough path coverage. Three, imagine a test you've written yourself at some point in the past where which affected callables, you might have, you know, a data provider where you say: Okay, valid callable function, which you've mocked or, you know, closure, which you've put in and second, this function does not exist. Okay, so now you're testing this function, which at some point in its logic has a callable, and expects that type to receive that type. But are you actually testing with the specific deprecated partially supported callables? Even if you have a test, and the test covers the affected code, if you do not test with one of these eight syntaxes, which has been deprecated, you still cannot detect it. And then, four, you still need to make sure that the tests are routinely run, and in open source, that's generally not a problem. Most open source projects, use GitHub actions by now to run the tests automatically on every pull request, etc. But, have the tests been turned on to actually run against PHP 8.2. Are the tests run against pull requests? I mean, there are still plenty of projects, which don't do that kind of thing. Yes, you can detect it with a good test suite. But there's a lot of caveats when you will not detect it. And more importantly, you will not be able to detect it until PHP 9. Derick Rethans 7:23 Yes, when your code and stops behaving as you were expecting it to be. Juliette Reinders Folmer 7:28 Yeah, because in 8.2, you're gonna get deprecation notices for everything else, but these two situations. But not in 8.2, not in 8.3, not in 8.4, and then whatever eights we're gonna get until nine, you will not be able to detect without deprecation notices, until PHP 9 actually removes support for these partials deprecated callables. Yes, but no. Derick Rethans 7:53 We already touched a little bit on how you found out for the need for this RFC or for changing behaviours. But as people have stated in the past, adding deprecation notices is a BC break. That's a subject that we will leave for some other time because I don't necessarily believe that. But would, the changes in your RFC not add more backwards compatibility issues? Juliette Reinders Folmer 8:14 The plain and simple the backward compatibility break is in the original RFC. That's where the deprecation is happening. This RFC just makes it clearer where the BC break is going to be in PHP 9. It's not PHP 8.2, which has a backward compatibility break. It's PHP 9 which will have to backward compatibility break. Yes, I've heard all those arguments, people saying deprecation notes are BC break, no they're not. But they are annoying. And Action List, to for everything that needs to be fixed before 9. Given big enough projects, you cannot say: Okay, I'm gonna do this at the last moment, just before 9 comes out. It literally means 10 months of the year I for one am working on getting rid of deprecation notices in project to prepare them all to be ready for PHP 9 when PHP 9 comes round. Derick Rethans 9:06 But it's still better to have them than to not,.and then you code starts breaking right? Because that is exactly why you're proposing this RFC as far as I understand. Juliette Reinders Folmer 9:16 Yes, absolutely. I mean, I'm always very grateful for deprecation notices, but it would be nice if we had fewer changes, which would cost them, for a year or two, so I can actually catch my breath again. Derick Rethans 9:28 I think PHP 8.2 will have fewer of these changes in there. There will still be some of course. Juliette Reinders Folmer 9:35 Well, I mean, this one is one deprecation. And then we have the deprecated Dynamic Properties and that one is already giving me headaches before I can actually start changing it in a lot of projects. I'm not joking, that one really is going to cause a shitload of problems. Derick Rethans 9:51 It's definitely for products have been going on for so long, where dynamic properties are used all over the place. And I see that in my own code as well. I just noticed this morning does actually breaks Xdebug. Juliette Reinders Folmer 10:03 I know it's currently breaking mockery, we're gonna have to have a discussion how to fix that or whether or not to fix it. If Mockery is broken, that means all your tests are broken. So the test tooling needs to be fixed first. Derick Rethans 10:18 That's always the case, if you work with CLI tools that make people run code on newer PHP versions, that's always a group of tools that needs to be upgraded first, which is your sniffers, your static analysis, your debugger still will always need to go first. Juliette Reinders Folmer 10:27 Which is why I look at things a lot earlier, probably then the majority of people. I mean, I see him huge difference between the open source and closed source community. For open source, I started looking at it well, I've been looking at 8.2 since the beginning. And I started running tests for all the CLI tools. As soon as 8.1 comes out, 8.2 gets added to the matrix for running in continuous integration. And then for applications, it gets added like in you know, once alpha 1-2-3 has come out. For the extensions, it gets added in September once the first RFC gets added. And all of them are trying to get ready before the release of 8.1 or 8.2 in this case, because you do not know as an open source maintainer, what version people are going to run your code on. And you can say IP, you can manage that via Composer, no you can't. Sorry, you can only do that if your users are actually installing via Composer. If your users are downloading a zip file, and uploading it to a web host via FTP, there's literally no way you can control whether they're running on 8.0, or 8.1, except for maybe during check: You cannot run on 8.1 yet. Derick Rethans 11:52 Upgrading software with version support is an issue that's been going on for 40 years and will go on for at least another 40 more. This is not a problem that we can solve easily. Juliette Reinders Folmer 12:03 But what I see there is like the closed source community is like, oh, yeah, but you know, by the time I want to upgrade my server to 8.1, or 8.2, I just run Rector and all will be fine. And I'm like, yeah, sorry, that does not work for open source. We need cross version compatible with multiple versions. And I try to keep that range of version small for the project, I initiate, I don't always have control over it. If for instance, one of the projects I maintain is Requests. And that's a project which does HTTP requests. It's used by WordPress, it cannot be let go of the minimum of 5, PHP 5.6, until WordPress, lets go of that. Derick Rethans 12:48 Well, the alternative is that WordPress uses an older version until it can let go of it. Juliette Reinders Folmer 12:54 Yeah, the only problem then is that we don't want to maintain multiple stable branches. For security fixes. Derick Rethans 13:03 For Xdebug, what I do is I support what the PHP project support when a PHP release comes out, which is a bit longer than PHP itself usually, but not by much more than a year or two. Juliette Reinders Folmer 13:15 I understand that. And I mean, I applaud Sebastian for at some point, having the guts to say to the community, I'm limiting the amount of versions I'm supporting. And I'm sticking to the officially supported PHP versions. That does not mean that that didn't give a large part of community which does need to support a wider range of PHP versions a problem. I fully support that people limit the amount of fish and stay support and like Sebastian, who I know got half the community up in arms against him when he said, I'm not going to support older PHP versions any more. It did create a problem and but the problem which I've tried to solve for instance with the PHP unit polyfills, which now is solvable by using the PHP Unit polyfills in quite a transparent way, which is helpful for everyone. It takes the complainers of Sebastian's back, and at the same time, it allows them to run the test. Derick Rethans 14:10 I think another good thing that Sebastian recently has done is make sure that deprecation notices are no longer failing your tests. Juliette Reinders Folmer 14:17 I don't agree. The thing is, I do understand him making that change. But changing that default from not showing those deprecation notices or not not allowing deprecation notes to fail the test, or not in a patched version, I don't think was the right thing to do. That should have been in a minor, let alone or maybe even in a major not in a 9.5.18 patch version. Also with the whole idea, I mean, again, this is very much an open source versus closed source discussion for closed source I completely understands that people say I don't want to know until I actually am ready to upgrade to that version. Derick Rethans 14:56 I understood it's more of a difference not necessarily between open and closed source, but rather between library maintainers and application maintainers. And the applications can then also be closed source. Juliette Reinders Folmer 15:06 The open source work I work in, I mean, I do want to see them. And the problem with the deprecation notices anyhow, and I've seen various experiments via Twitter fly past for the past year. Say you build something on top of something else, you want to see the deprecation notices and the errors which apply to your code. We don't want to see the ones which come from the framework on which you build on top. The silencing deprecation notices or not, allow tests to error out on deprecation and just not solve that problem. Derick Rethans 15:39 The only thing it does is make things a little bit less noisy so that fewer people complain to library authors isn't it? That's pretty much what it does. Juliette Reinders Folmer 15:48 The thing would I see what it has done is that people think the tests are passing. Derick Rethans 15:54 Well they are passing, but... Juliette Reinders Folmer 15:56 Yeah, but most people don't read change logs of PHP unit, especially as releases don't get actually have to change log included. When PHP Unit releases its actual release, it doesn't actually post a release on GitHub. So people who watch the PHP unit repo for releasing doesn't, don't get notifications, let alone a changelog. So they actually have to go into the repo to find out what has changed. Most people don't do that. They just get you know depend-a-bot update, which won't say much, because again, it doesn't have release information. Derick Rethans 16:28 It'd be nice, maybe if Composer ,when you upgrade packages, that it can show like the high level changes when you do an upgrade. The Debian project does that if you upgrade packages that have like either critical or behavioural changes, you actually get a log when you run the update. Juliette Reinders Folmer 16:43 And then the change should have been in major or minor, because in a patch release, you don't expect it kind of changes. I also know the struggle there. They've been going through to four PHP units and which is similar to what I'm struggling with with the amount of changes from PHP 8.0 and 8.1 which has to be deal dealt with. Projects are being delayed, we're having trouble keeping up as an open source community, we still need to look after our own mental health as well. Derick Rethans 17:10 What has the feedback been to far on the RFC or non? Juliette Reinders Folmer 17:13 The feedback on this particular RFC has been next to nothing. And that's not surprising. I mean, basically, the discussion has happened before. And I started the discussion six weeks ago, eight weeks ago, which led to this RFC. So far the responses, which I have seen, either on Twitter or in private or in our people will read through the RFC. They're like, yeah, it makes sense. Derick Rethans 17:37 I think this is quite a nicer way of getting RFCs done, you discuss them first. And if there's then found a need actually spend a time on writing an RFC. In other cases, the other way around happens, right? People write a long, complicated RFC, and then complain that nobody wants to talk about it. Juliette Reinders Folmer 17:53 When I started the previous discussion, it was I see this, I noticed this, was this discussed? And then I got back: yeah, nobody actually discussed the previous RFC and I'm like: Okay, so what's this whole point about under discussion if nobody's discussing? Derick Rethans 18:10 Well, you can't force people to talk, of course. Juliette Reinders Folmer 18:14 It does make me wonder, again, what we were talking about before, people who work in managed environments versus people who will have to support multiple PHP says, I sometimes wonder how many people who actually have voting rights work in those closed environments, and think, you know, upgrading is something you do with Rector. Now I have a feeling that often open source gets a little forgotten. Derick Rethans 18:38 Yeah, that's perhaps true. Thank you for taking the time this morning to talk about this RFC then. Juliette Reinders Folmer 18:44 Thank you Derick for having me. It was a pleasure to do you like always. Derick Rethans 18:49 Thanks. Derick Rethans 18:54 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Expand deprecation notice scope for partially supported callables Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 100: Sealed Classes London, UK Thursday, March 24th 2022, 09:04 GMT In this episode of "PHP Internals News" I talk with Saif Eddin Gmati (Website, Twitter, GitHub) about the "Sealed Classes" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is episode 100. Today I'm talking with Saif Eddin Gmati about the sealed classes RFC that they're proposing. Saif, would you please introduce yourself? Saif Eddin Gmati 0:31 Hello, my name is Saif Eddin Gmati. I work as a Senior programmer at Les-Tilleuls.coop. I'm an open source enthusiast and contributor. Derick Rethans 0:39 Let's dive straight into this RFC. What is the problem that you're trying to solve with it? Saif Eddin Gmati 0:43 Sealed classes just like enums and tagged unions allow developers to define their data models in a way where invalid state becomes less likely. It also eliminates the need to handle unknown subtypes for a specific model, as using sealed classes to define models gives us an idea on what child types would be available at run time. Sealing also provides us with a way for restricting inheritance or the use of a specific trait. For example, if we look at logger trait from the PSR log package that could be sealed to logger interface. This way, we ensure that every use of this trait is coming from a logger not from any other class. Derick Rethans 1:24 I'm just reading through this RFC tomorrow, again, and something I didn't pick up on reading to it last time. It states that PHP already has sort of two sealed classes. Unknown Speaker 1:35 Yes, the throwable class in PHP can only be implemented by extending either error or exception. The same applies for DateTime interface, which can only be implemented by extending DateTime class or DateTime Immutable class. Derick Rethans 1:52 Because PHP itself doesn't allow you to implement either throwable or DateTimeInterface. I haven't quite realized that that these are also sealed classes really. What is sort of the motivation behind wanting to introduce sealed classes? Unknown Speaker 2:06 The main motivation for this feature comes from Hack the programming language. Hack contains a lot of interesting type concepts that I think personally, PHP could benefit from and sealed classes is one of those concepts. Derick Rethans 2:18 What kind of syntax are you proposing? Saif Eddin Gmati 2:21 The syntax I'm proposing actually there is three syntax options for the RFC currently, but the main syntax is inspired by both Hack and Java. It's more similar to the syntax used in Java as Hack uses attributes. Personally, I have been I guess, using attributes from the start as I personally see sealing and finalizing similar as both effects how inheritance work for a specific class. Having sealed implemented as an attribute while final uses a keyword brings more inconsistency into the language which is why I have decided not to include attributes as a syntax option. Derick Rethans 2:56 In my opinion, attributes shouldn't be used for any kind of syntax things. What they should be used for is attaching information to already existing things. And by using attributes again, to extend syntax, you sort of putting this syntax parsing in two different places , right? You're putting it both in the syntax as well as in attributes. I asked what the syntax is, but I don't think he actually mentioned what the syntax is. Saif Eddin Gmati 3:20 The syntax the main set next proposed for the RFC is using sealed and permit as keywords we first have the sealed modifier which is added in front of the class similar to how final or abstract modifiers are used. We also have the permit clause which is basically a list allows you to name a specific classes that are able to inherit from this specific type. Derick Rethans 3:43 So when you say type here, is that just interfaces and classes or something else as well? Saif Eddin Gmati 3:48 It's classes interfaces and traits. Traits are allowed to add sealing but they are not allowed to permit. Okay for example, an interface is not allowed to permit a trait because a trait cannot implement an interface Derick Rethans 4:03 In the language itself, when does this get enforced? Saif Eddin Gmati 4:06 This inheritance restriction gets enforced when loading a class. So let's say we are loading Class A currently if this class extends B, we check if B is sealed. And if it is we check if B allows A to extend it. But when loading a specific sealed class, nothing gets actually checked. We just take the permit clause classes and store them and move on. Derick Rethans 4:32 It only gets checks if you're trying to implement an interface. Saif Eddin Gmati 4:36 This gets enforced when trying to implement an interface, extend that class, or use it trait. Derick Rethans 4:41 Okay. What are general use cases for this feature? Saif Eddin Gmati 4:45 General use cases for a feature are for example, implementing programming concepts such as Option which is a type that can only have two subtypes. One is Some, other is None. Another concept is the Result where only two subtypes are possible, either success or failure. Another use case is to restrict inheritance. As I mentioned before, for example, logger trait from the PSR log package is a trait that implements some of the method methods in logger interface, and expects whoever is using that trait to implement the rest. However, there is no restriction by the language regarding this, we can seal this trait to a logger interface ensuring that only loggers are allowed use this trait. Derick Rethans 5:34 When you say that Option has like the value Some or None, just sound like an enum to me. How should I think differently about enums and sealed classes here? Saif Eddin Gmati 5:43 Enums cannot hold a dynamic value. You can have a value but you cannot have a dynamic value, however, tagged unions will allow you to implement option the same way. Tagged unions are that useful only for this specific case, there is some other cases such as the one I mentioned for traits that cannot actually be implemented using the tagged unions. There is also the I don't know how to say this. Let's say we have a type A that sealed and permitting only B and C. And this case A on itself, as long as it's not an abstract class, is by itself a type. Can be used as a normal class, you can create an instance and use it normally. However with tagged unions, the option itself would not be a type, you either have some or none. That's the main difference between tagged unions until classes Derick Rethans 6:37 A tagged union PHP doesn't have them. So how does a tagged union relate to enums? Saif Eddin Gmati 6:43 With tagged unions as the, there is an RFC that's still in draft, I suppose that uses actually it is built on top of enums that that's why. Derick Rethans 6:55 I reckon once that gets closer to completion, I'll end up talking to the author of that RFC. So something I'm wondering, can a sealed type permit only one other type? Or does it have to be more than one? Saif Eddin Gmati 7:10 No, it can permit only one type. Let's say we have class A that only permits B. However, another thing is class B does not actually have to extend A, like if A is permitting B, B does not actually have to implement A. It's still useful because another class called C can extend B and implement A, so an instance of A B can still exists. Derick Rethans 7:36 I'm not quite sure whether I understood that. If you have an interface that says A permits B, then B is not required to implement A, mostly because the moment you loads class B, you don't even know it exists, right? Because it doesn't refer to it. Saif Eddin Gmati 7:54 Yes. Derick Rethans 7:55 It's just going to break anything? Saif Eddin Gmati 7:57 Hopefully not. The only break would be in the new reserved keywords which are sealed and permits. So those cannot be used as identifiers any more, but depending on the syntax choice, if for example, the second syntax choice wins which that would only take the permits keyword. If the third syntax choice is chosen then no new reserved keywords will be introduced so there will be no breaks. Derick Rethans 8:29 From what I see in the RFC the first syntax is using both sealed in front of a as a marker and then using permits. With the second syntax, you don't use seal but you infer that it is sealed from the permits keyword I suppose. And then in the last option you use the for keyword instead of permits and also don't use sealed yet? Saif Eddin Gmati 8:51 The third syntax choice is will be the one with no breaks as we will not be introducing any new keywords; for is already a reserved keyword in PHP. Derick Rethans 9:02 What is your preference? Saif Eddin Gmati 9:03 Personally I prefer the first syntax choice as it's the most explicit. When you start reading the code you can tell from the start this is a sealed class without having to continue reading until you reach permits. Derick Rethans 9:15 I think I agree with you there. Beyond the syntax is there anything else that needs to be changed in PHP itself? Saif Eddin Gmati 9:22 The only other change that will be introduced in PHP is in reflection class. A new method called isSealed will be added to reflection method, which allow you to check if a class the class being reflected is sealed. Another method will be added called getPermittedClasses which returns the list of class names provided in the permits clause. Also a new constant should be added to reflection class that is is_sealed constant which exposes the bit flag used for sealed classes. Some changes will happen to the getModifiers method in reflection class. This method will return the bit flag is sealed set, if the class being reflected is sealed. The getModifierNames method will also return the string sealed if the bit is set, that should be about it. Derick Rethans 10:12 Basically everything that you need in reflection to find out whether it's a sealed class and other permits. Saif Eddin Gmati 10:18 Yes. Derick Rethans 10:20 See, I see the name of getPermittedClasses has to use, has the word classes in it. Does that mean that the types after permits have to be classes? Saif Eddin Gmati 10:32 No, they can be either classes or interfaces. But PHP refers to both classes and interfaces as classes in the reflection. So we have a reflection class, but that's actually a reflection trait class interface. And basically everything is class-ish. Derick Rethans 10:47 Class-ish. I like that. Did you look at some other alternatives to implementing the same feature or just the three syntax choices that you came up with? Saif Eddin Gmati 10:56 I did not consider any other alternatives precisely as the alternatives might be type aliases, tagged enums, package visibility. But I think each of these RFCs focused on a specific problem and expanding that area, while sealed classes focuses on all the problems mentioned on in this RFC tries to solve them in a minimal way. But only in relation to inheritance in classes, interfaces, and traits. Derick Rethans 11:24 Keeping it short and sweet. What has the feedback been so far? Saif Eddin Gmati 11:29 The feedback has been pretty mixed. Some people are against adding more restriction to types and inheritance. But in my opinion, this is not about adding restriction, but rather providing the user with the ability to add restrictions. And we already have final classes, which a lot of people seem to dislike. Derick Rethans 11:48 I don't understand why. But fair enough. Saif Eddin Gmati 11:51 I have created a community poll a couple of weeks ago to gather feedback on Twitter. The results were 60% for with over 150 participants. Another poll was created by Peter on Facebook ended with 54 of people voting yes. However, such polls that do vary depending on the audience. So it can be really an accurate representation of the PHP community. Derick Rethans 12:15 Polls on Twitter are never scientific, or they? I see that the RFC is in voting already. So for people listening to this, and if you have voting rights, then you have until when exactly? Saif Eddin Gmati 12:28 Until the end of the month. Derick Rethans 12:30 March 31. It says yes. Okay. Well, thank you very much for taking the time today Saif about sealed classes. Saif Eddin Gmati 12:37 Thank you for having me. Hopefully, I get to be here another time in the future. Derick Rethans 12:42 I hope so too. Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Sealed Classes Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 99: Allow Null and False as Standalone Types London, UK Thursday, March 10th 2022, 09:04 GMT In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Allow Null and False as Standalone Types" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 99. Today I'm talking with George Peter Banyard, about the Allow null and false at standalone types RFC that he's proposing. Hello, George Peter, would you please introduce yourself? George Peter Banyard 0:36 Hello, my name is George Peter Banyard. I work on the PHP language, and I'm an Imperial student in maths in my free time. Derick Rethans 0:44 Are you're trying to say you're a student in your free time or contribute to PHP in your free time? George Peter Banyard 0:49 I feel like at this time, it's like, both are true at the same time. Derick Rethans 0:53 Let's hop into this RFC. It is titled allow null and false as standalone types. What is the problem that it is trying to solve? George Peter Banyard 1:02 This is the second iteration of this RFC. So the first one was to just allow null initially, and null is the unit type In type theory parlance of PHP, ie the type which only has one value. So null is a type and a value. And the main issue is that when for leads more with like inhabitants, and like the Liskov substitution principle. If you have like a method, like the parent method, which can be told like either null or an object, and your implementation in a child class always returns null, for various reasons, maybe because it doesn't support this feature, or whatever is out, or... If your child method only returns null, currently, you can't document, that you can't type this properly, you can document it in a doc comment or something like that. But due to how PHP type handling works, you need to specify at least like another type with null in the union. Basically resort to always saying like mimicking the parent signature, when you could be more specific. This was the main use case I initially went into. Derick Rethans 2:08 If I understand correctly, you can't just have an inherited method that has hinted as to just return null? George Peter Banyard 2:14 Exactly. If you always return null, maybe because you always work or something like that, then you must still declare the return type as like null or exception, which is not a concrete because you say what, like why never fail. And like static analysers, if they can figure it out that you're using a child class, they can't maybe like do some assumptions or work further down that like what you're doing is redundant or things like that. So that's one of the main reasons I initially went with it. And I didn't add false initially, because it was like, well, false, it's not really a type properly. It's, it's what's called a value type. False is one value from the Boolean type. And I was like, Well, okay, we're just going to limit it to like, being the type theory purist, limited to proper types, where null is a proper type, although it's a bit sometimes misunderstood, I feel in the PHP community at large. And then people were like, well, if we add null, then by the only type-ish thing, which you can use in a type declaration, or whatever, which can't be used in a return type on its own, is false. And it's just weird. So why not add it in full. So that was the second thing as to why I added it. Some of PHP internal's functions being terribly designed because they were designed back in the early noughties, return null on success and false on failure, which you can't probably type at the moment. Currently, we need to type them as like Boolean or null, but true can never be returned in this case. And there are some other some other people have reached out to me it's like, well, yeah, but I always return false in this case. Or I also return always true in this case, although true, we have this weird asymmetry that we have false as a value type and not true. Derick Rethans 3:49 What was the reason for having false but not true? George Peter Banyard 3:53 When the union type RFC got discussed and passed for PHP 8.0, false was added, because a lot of traditional behaviour of PHP internal functions, was to return false on failure, instead of the technically more correct thing would be to return null. Because loads of functions return a false on failure, and saying that like in returns, these types, or a Boolean would be basically lying because you could never have true, false was included in it. With the restrictions that you can only use false as the complement with other types. So you need to do for example, array, or false, you couldn't just use false. Derick Rethans 4:37 Would it also mean that you can define a return type of a method that inherited a method that returns a bool, as false? George Peter Banyard 4:48 Yes, that would be now possible with the amended proposal. Yeah, which goes back to this weird a symmetry, we're probably. Adding true to make a complete would be a future RFC to do. Derick Rethans 5:00 Now, we've talked about return types. But I guess the reverse applies to arguments? George Peter Banyard 5:06 Arguments and property types also would, would be allowed to, like declare themselves as like null or false. The usefulness here is way more limited. Because if you declare an argument to be of type null, then basically you can only ever pass a null to it. And then therefore, the type doesn't do anything. Derick Rethans 5:26 But in an inherited method, you could then widen it. George Peter Banyard 5:31 Yes, exactly. You could always say: Well, this argument exists, it's always null. If you extend like your class or message, then you can add other types. But in theory, you can already do that by adding like an argument at the end of the message, because that's LSP compliant. The case for, and properties of those, because they are typing, they're in like their beads. Kind of debatable why you would do that. But it's just that like, well, if you accept types at one point, just restricting them like somewhere else gets very weird. At this point is more like look at the human review, or like use static analysis for the analyser to tell you like this argument is redundant and just remove it or this property doesn't make any sense. Because if it can only ever be null, why does it even exist in the first place? Derick Rethans 6:13 Right now, you can already use false in union types, but why not with null or false? George Peter Banyard 6:19 That goes back to the when a union type RFC got introduced. Null got added as a keyword. Before you could only use the question mark, before a type to make the type nullable. If you have a more complex union type, to not use the question mark in front of it. Therefore, the null keyword got added as a proper type. And because the logic was, Well, you shouldn't ever be able to return just null. Because then that function is kind of equivalent to void. Because of that, it was said that like, Well, okay, null and false basically have like kind of the same status is that like, if you just want to use null on its own, you're doing something kind of weird. And if you're returning more than false, like that signature is very strange. I think when that was discussed, nobody knew initially that an actual PHP function within one of the extensions, like in core had such a weird signature. Which mainly, we just started discovering that after this got, like accepted and we could like actually start properly typing the internal functions, and then you discover these weird edge cases where sounds like, that's a bit strange, can't properly document it. We just need to make like a note on the PHP documentation side. And like the type signature kind of lies to you. PHP's type hierarchy is a bit strange, void kind of lives on its own. So if the function is marked as void, it must always like any child inheritance, or whatever needs to be void. And when you type return in the function body, you need to always use return with like a semicolon afterwards, you can't even return null. Although, under the hood, PHP will always return a value when you call a function, even if the function is void, which will be null. Derick Rethans 7:58 The RFC also talks about question mark null, what is that supposed to be? Is that null or null? George Peter Banyard 8:03 PHP has this limited type redundancy checks at compile time. It will basically check if you're duplicating types. So if you write for example, int or int, even if it's capitalized differently, PHP was like, okay, just specifying twice the same type in this union. This is redundant. And then it will throw a compile error, we're basically saying, maybe you're just doing a mistake, maybe you meant something else. In the same vein, basically the question mark, gets like, translated into like, any seeing pipe null. And so if you write null with a question mark in front of it, it's just saying like, well, you're doing null or null, which is basically redundant. Therefore, you'll get like a compile time error telling you like. Derick Rethans 8:41 That seems sensible to me. What's been the feedback so far? George Peter Banyard 8:45 The most feedback, I think I've got it when I first proposed it in October. And at the time, it was like, Yeah, this is useful. And it's kind of needed because well, having always more type expressiveness is I think, good in general. But the main feedback at the time was like, Well, why not include false? The other feedback I got was basically, well, for consistency, what shouldn't you also add true? Yes, I do agree with this. I frankly, find it very strange that we landed in this situation where we only have one of these value types, either true and false, or none of them would make more sense to me. But that's expanding the scope. And it's kind of not this is not really concerned with this specific detail. Probably next, another RFC to do, for either myself or somebody else. It's just like propose true as a value type. Derick Rethans 9:33 Is the implementation of this RFC complicated? George Peter Banyard 9:36 It's very simple. It basically removes checks, because currently in the compile step, which checks for like type signatures, it needs to check that like, Well, are you declaring false or are you declaring null, and these checks get removed, so it makes the code a bit more streamlined. Oh, there's one change in reflection. For backwards compatibility reason, because of the fact of the question mark, any union type which is composed of a only two types, where one of them is null,will get converted in reflection to use the question mark notation, which is kind of a bit weird because it then gets converted into like a name type instead of a union type in reflection. But that's there, because of backwards compatibility reasons. I am breaking this into the more sensible reflection type. If you have a type of null and false, then you'll get a reflection union type instead of a named. From my understanding from reading the reflection code, the intention was always probably in PHP 9, to remove this distinction. So if you get a named type, it's only a single type instead of a possible nullable type. And any nullable types get converted into like a reflection union type when you have like null and the other type. Maybe this is a good test case to see if your code breaks. Derick Rethans 10:50 I would probably call that a BC break though. George Peter Banyard 10:53 This only happens if you do false union null. You can't use false currently on its own. And I think like, if you get false, as a named argument type, with like a question mark in front of it. Because it would be completely new, and you would never deal with it. It's like, well, this can also break because false can be in the Union type. If your library or the tool supports union types with the reflection thing, it will automatically know how to deal with false because it needs to know how to deal with it. And null. So that was kind of also the logic. It's like, well, okay, like if the tool supports that, which it needs to, then if you put this case into that bracket, it will work. It makes kind of the reflection code a bit more complicated at the moment. The whole fact that we need to juggle between like figuring out like, should we use the old like the backwards compatible thing reflection of like using a name type instead of the Union type, if there's a null and depending on the type, makes a reflection code unwindy and everything. And we have like a special function in C, which is basically just like, okay, which object do I need to create, depending on this type signature? Derick Rethans 11:53 When do you think you'll be putting this up for a vote? George Peter Banyard 11:56 I suppose I could put it up for vote immediately now. I am planning on maybe putting this on to vote at the end of the week or something like that. Derick Rethans 12:04 Well, thank you for taking the time today to talk about this RFC. George Peter Banyard 12:09 Thank you for having me on the podcast. Derick Rethans 12:13 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Allow Null and False as Standalone Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 98: Deprecating utf8_encode and utf8_decode London, UK Thursday, March 3rd 2022, 09:02 GMT In this episode of "PHP Internals News" I chat with Rowan Tommins (GitHub, Website, Twitter) about the "Deprecate and Remove utf8_encode and utf8_decode" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 98. Today I'm talking with Rowan Tommins about the "Deprecate and remove UTF8_encode and UTF8_decode" RFC that he's proposing. Hi, Rowan, would you please introduce yourself? Rowan Tommins 0:38 Hi, I'm Rowan Tommins. I'm a PHP software architect by day and try and contribute back to the community and have been hanging around in the internals mailing list for about 10 years and contributed to make the language better, where I can. Derick Rethans 0:57 Excellent. Yeah, that's how I started out as well, many, many more years before that, to be honest. This RFC, what problem is this trying to solve? Rowan Tommins 1:08 PHP has these two functions, utf8_encode and utf8_decode, which, in themselves, they're not broken. They do what they are designed to do. But they are very frequently misunderstood. Mostly because of their name. And because Character Encodings in general, are not very well understood. People use them wrong, and end up getting in all sorts of pickles that are worse than if the functions weren't there in first place. Derick Rethans 1:37 What are you proposing with the RFC then? Rowan Tommins 1:39 Fundamentally, I'm proposing to remove the functions. As of PHP 8.2, there will be a deprecation notice whenever you use them, and then in 9.0, they would be gone forever, and you wouldn't be able to use them by mistake, because they just wouldn't be there. Derick Rethans 1:56 I reckon there's going to be a way to actually do what people originally intended to do with it at some point, right? Rowan Tommins 2:02 So yeah, there are alternatives to these functions, which are much clearer in what you're doing, and much more flexible in what you can do with them so that they cover the cases that these functions sound like they're going to do, but don't actually do when you understand what they're really doing. Derick Rethans 2:20 I think we'll get back to that a little bit later on. You're wanting to deprecate these functions. But what do these functions actually do? Rowan Tommins 2:27 What they actually do is convert between a character encoding called Latin-1, ISO 8859-1, and UTF-8. So utf8_encode converts from Latin-1 into UTF-8, utf8_decode does the opposite. And that's all they do. Their names make it sound like they're some kind of fix all the UTF 8 things in my text. But they are actually just these one very specific conversion, which is occasionally useful, but not clear from their names. Derick Rethans 3:01 It's certainly how I have seen it used in the past, where people just throw everything and the kitchen sink at it, and expecting it to be valid UTF 8, and then at the end, decode. I mean, the decoding was not even part much of this, right? It's just throw everything at it, and then magically it will all be UTF 8. But I reckon that's not really quite the case. When and how does that go wrong? Rowan Tommins 3:26 So what actually ends up happening is, because text doesn't know what encoding it's in. Something that people misunderstand about character encoding is they think it's like, the text is a certain colour, and the computer knows what colour it is. And if you tell the computer to make it a different colour, then it will work. But it's not like that. In the computer, there's just the sequence of binary. And the encoding is how to read that binary as text. And if you tell the computer to read it as Latin 1, it will read it as Latin 1. If you take to convert from Latin 1 to UTF 8, it will assume the input is Latin 1, it will convert to UTF 8 on that basis. If your text actually wasn't Latin 1 in the first place, you're just going to end up with garbage. And some of the worst cases of that is when you already have UTF 8, and then you run utf8_encode on it, because the language doesn't know that you've already got UTF 8, so it tries to read its Latin 1, write it out ass UTF 8 and you get this weird Mojibake. I don't know pronouncing that right. Derick Rethans 4:27 I think it's pronounced Mojibake. Rowan Tommins 4:30 Mojibake. Derick Rethans 4:31 It's a Japanese term, because clearly these things, these issues happened with Japanese text quite a lot because they have a lot more different and difficult characters and encodings as well. With which things often go wrong though? Rowan Tommins 4:44 Using an unco on text that's already UTF 8 is obviously a big one. Usually obvious, but occasionally people just getting a muddle with that. The other thing that often happens is confusing with similar encoding. Latin 1 is often mistaken for a different coding windows 1252. To the extent that web pages labelled as Latin 1, web browsers will assume that they're actually in Windows 1252. These PHP functions don't make that assumption. If your text is actually in Windows 1252, and it's been mislabelled Latin 1, you might still think you're doing the right thing. So I've got Latin 1 text, but you haven't. And then the characters that are different, are going to get mangled again. And there's a few other related encodings that often look the same. There are a few other encodings that look the same at a glance that again, will go wrong on any character that's different between the different encodings. Derick Rethans 5:43 How could a function tell which encoding a certain text was in? Rowan Tommins 5:49 It's tricky. There are libraries out there that try to do it. Some encodings that are sequences of bits that aren't a valid character. So if any of those appear, it's definitely not in that encoding. Unfortunately, a lot of encodings, every pattern of bits has a meaning. It's just not necessarily mean. So you can't look at the string and just tell at a glance. The only way I've seen that does it effectively, is trying to guess based on what language text it might be in. If your text suddenly has a load of symbols in the middle of sentences, you're probably using the wrong encoding. If it's suddenly got a load of capital letters, in the middle of words, you're probably using the wrong encoding. So you can make guesses like that, that ultimately, there are only ever guesses. Derick Rethans 6:38 It's only always going to be a guess, right? You can't really tell for certain what it it is, which I've seen people assume that she can just tell. We have concluded that utf8_encode and decode don't actually do what they say they don't magically encode everything to UTF 8. What if things go wrong? How are errors handled? Rowan Tommins 6:58 If you're converting from Latin 1 into UTF 8, there Latin 1 covers all 256 possible eight bit binary strings. Those will correspond directly to a single mapping in Unicode and therefore in UTF 8. So there are no errors as such, when that happens, but it might not be what you want. One of the most notable ones that's different between these encodings is Latin 1 was standardized in 1985, the Euro didn't exist, then. The euro symbol doesn't have an encoding in Latin 1. If you've got a euro sign, you haven't got Latin 1 text, but you might think you've got Latin 1 text, and it will just encode it to what to a control character, which is where the windows 1252 code page puts the euro symbol, it replaces some control characters in Latin 1. One of the reasons why these character encodings are so easily confused is they've all nicely built to being compatible on top of each other. Latin 1 is deliberately an extension of ASCII. Windows 1252 is deliberately an extension of Latin 1, replacing some control characters. UTF 8 is also based on Latin 1, the first section of Unicode is actually the Latin 1, characters UTF 8 will encode and slightly differently so that it can carry on above 256. So in that direction, you can't actually get an error, you could just get a string, that doesn't make sense. Going back the other way. Unicode has, I think, potentially 11 million or something, and actually, at least a million assigned code points. Latin 1 only has 256. So you can't map all those back. And this function, the utf8_decode just replaces any that it can't match with the question mark. Similarly, if the input string isn't valid UTF 8. Again, if you've just misunderstood what strings doing and you haven't actually got a UTF 8 string in the first place, any sequence that doesn't look like valid UTF 8, again, just gets replaced with a question mark. Completely silently you get no warnings in your logs or anything. So you'll just get a few question marks. And problem is, a lot of people are writing text, mostly in English. So it's mostly ASCII. And all of these encodings agree on those first 127 things including all the letters and digits, most of your text will look fine. But if you're using utf8_encode, some of the accented letters will just look a bit funny. If using utf8_decode some of the characters will just turn into question marks. And you might just not notice that for a while until your applications been in production. And now all your strings a messed up. Derick Rethans 9:48 And I reckon that there's no way to fix that? Rowan Tommins 9:52 No. If you've saved saved the text, particularly with the decode direction. Run utf8_encode wrong, if you're careful and tracked carefully where what you've used, you can retrace your steps back to the original string. But if you've not understood what it was doing in the first place, you might have run it more than once, or put it into a system and then re interpreted it in a different way. And it can sometimes be quite hard to trace back what the original string was. You'll sometimes just have to edit it by hand. And guess that, oh, that's probably any acute because that was the word that was trying to be there. That was probably a curly quote mark that somebody was trying to type and those kinds of things. Derick Rethans 10:35 Talking about curly quote marks, I just found out that those are actually are code points in the windows 1252 encoding. Because I just had to edit a document that had these things in there. But the file was set as... this is UTF 8, which was a lie. It was a lie to begin with. We've established that these functions are pretty much destructive to text potentially, as well as not really doing what they say they do: encode every random stuff to UTF 8 or the other way around. I saw any RFC that you've done some research into their usage, didn't bring up anything interesting to talk about? Rowan Tommins 11:13 Yes, so there's a few things. So what I downloaded, it was last year, actually, I kind of had to pause on this RFC for real life happened a bit to me. So last year, I downloaded the 1000, I think top packages on Packagist, I'm most popular downloads, and went through all the uses, I could say of these functions. There were a handful that were using them correctly, they were checking that their input was Latin 1, or the output they needed was Latin 1. And using these, there were a few of those that were questionable, where they might have mistaken Latin 1 for Windows 1252. And actually, they were going to mess up any Euro signs or any of those few extra things that Microsoft added over the top of those control characters. There were a few using strftime, which can do translated Date Time strings. Those it turns out that functions been deprecated itself now, that will become a non issue, some people will have to find a different solution to that anyway. One of the odder ones that I've seen, which technically works, but only accidentally is people use it for what I describe as armour, where they've got a system that wants UTF 8 text, often encoding as JSON or something like that, where it needs to be UTF 8, they've got some unknown encoding that's not UTF 8, they encode to UTF 8, transmitted through the system. And then on the other end, run utf8_decode and they'll get back the string that they put in, because it never errors, there will always be a mapping of any string of bits that this function will give in UTF 8, it just won't be a meaningful string. You could put a JPEG image through utf8_encode, and you will get a string that is valid UTF 8, it's just not going to be very useful UTF 8. It's kind of a bit of a weird way of doing the thing you might do with base 64, or quoted printable encoding or something like that almost something for transport, it technically works. But this probably isn't the function you want to be doing it with. It's not a very useful encoding. And then there were a good number, which just tried throwing all the functions they could. And I kind of I don't want to call out the people with this. I think they were genuine mistakes, they were genuinely trying to solve a problem. But some of them just in hindsight looking at them or kind of hilarious. I think the one that makes me laugh most is the person who raised the StackOverflow question because their CSV file, some of the fields had grown to 32 kilobytes long, because they'd repeatedly run the same string through utf8_encode so many times, that each time it was encoding a single byte to multiple bytes, and then single bytes of that to multiple bytes. And only when it got to 32 kilobytes in one field, did they question whether they were doing the right thing? By which time their text was probably irrevocably lost in whatever other processing they've done on this file. Derick Rethans 14:22 Excellent encryption. Rowan Tommins 14:24 Yes. Derick Rethans 14:25 The RFC talks about a few other approaches to instead of deprecating utf8_encode and decode. What are the things that you look at? And why did you reject them in the end? Rowan Tommins 14:36 One of the most obvious things you could do? The biggest problem is the name of the functions. Could you just rename them? The problem with that is you'd have to spend a long time doing it because you want to introduce the new name in one version of PHP, then deprecate in a later later version of PHP, and then finally remove. And then at the end of it, you'd have these very specific functions. We could call them latin1_to_utf8 and utf8_to_latin1. If we were designing those functions, if you put an RFC to, to add those functions to the language, it wouldn't pass. There's they're very why, why would we have these specific functions, and we'd still have this problem of Windows 1252, and other related encodings, like Latin 9, which is the official successor to Latin 1, and also has a few differences amongst it. They still wouldn't solve a lot of people's problems. A lot of the people that actually want Latin 1 are going to need the euro symbol. So they don't probably don't actually use Latin 1 any more. Because I guess Canadian French, and Mexican Spanish, need to probably that in one's probably still a decent encoding for but the Western European languages it was originally designed for, probably everyone's going to want a euro symbol. Changing the name just leaves us with these awkward functions still. You could instead or as well add options to them, you could add a parameter to them that indicated what the source or destination encoding was. That defaulted initially to Latin 1, and then you were forced to add it later. And then at least you'd be spelling out what encoding it was. The problem with that is, the more encodings, you add, there's actually quite a lot of code that would need to then be added to the function, and it will be duplicating functions we've already got. Derick Rethans 16:31 Such as? Rowan Tommins 16:32 So we've actually in PHP got three functions that can convert between any pair of encodings, including the ones that these functions do. They're all unfortunately in extensions, which are technically optional. Which is something that the way PHP is modular, means that a lot of things that you'd think were kind of just part of the language are technically optional, for one reason or another. But we've got mb_convert_encoding from the mbstring extension. We've got iconv, which uses an external library of the same name. Derick Rethans 17:09 Are you sure it just doesn't use a GCC function or the glib functionality in PHP? Rowan Tommins 17:14 The iconv function uses whatever iconv is available on the system, and seems to vary quite a lot between systems. Oddly, one online code running tool I tried, doesn't actually recognize 8859-1 as an encoding in the iconv function. I don't know why. Just something about the libraries, that version of PHP was built, built against. The most powerful one we've got but also the least documented is the intl extension, which is built on the ICU library, made by the Unicode Consortium. That has a lot of options around how you handle errors and missing characters and supports a lot of different character sets. Some was completely undocumented, I've tried to write a manual page for it, which will hopefully get merged and put live soon. So at least, there will be some documentation there's a, there's an object that you can use with lots of options. But there's a static method, which just takes a from and to encoding. So that's one option. The mb_convert_encoding is probably the most widely available. And maybe we should be looking at making that MB string, less optional. I don't know what that looks like, because of the way, unless you force people to compile it in a lot of the Linux distros. Distribute every module they can separately, they make optional. Derick Rethans 18:39 But they also make it easy for you to install them then. Rowan Tommins 18:42 They make it very easy to install. So I don't know how many people actually run PHP with just its minimal set of modules. And how many just install a default set. The default set is a bit vaguely defined, unfortunately. So that's one of the my main hesitation with this removal, that although we've got these alternatives, we've got these three alternatives. They've all got slight problems, and they're all optional. Derick Rethans 19:08 But considering that utf8_encode and decode don't actually really do well, they say they do, everybody that had to do character set conversions correctly, would have already been using these functions. Rowan Tommins 19:23 Indeed, yes. So I've seen people misuse all of these. Again, people do just generally misunderstand character encoding. MB string does have a function to guess character encoding. As you're saying earlier, people just kind of assume that that will work. A lot of the time, it can't really tell the difference between different character encodings. It can tell you whether a string is valid UTF 8, it can't tell you whether it's Latin 1 or Windows 1252, or any of these others that are single byte encodings. Derick Rethans 19:52 I think ICU actually as functionality for guessing an encoding as well, but it will give you back an array of possibilities and perhaps even with a confidence. But it's a long, long time since I've looked at that. So I'll have to revisit it. Rowan Tommins 20:08 Yeah, that would at least be a more kind of transparent way of doing it that. And that's I guess what I'm trying to do with removing these, is that if you're forced to specify a pair of encodings, as you do for these other functions, at least hopefully, somewhere in your mind, you're going to be thinking about what encodings you might have, rather than just reaching for the first function you find. Derick Rethans 20:31 Yep, exactly. What is the feedback being so far? Rowan Tommins 20:34 Generally positive. There hasn't been a lot of a lot of comments. But those that have been have generally been supportive. I liked somebody said: All the times they've seen it used, including when they've used it themselves, it's been a misunderstanding. I'd like to hear more feedback of anyone. Anyone does have quite. The main feedback I have had has been around making sure there are alternatives to recommend to people. So anyone who is using these correctly, or nearly correctly, what we tell them to use instead, how do we make sure that's clear, and clearly documented, and we're recommending the right thing. I'm going to think a bit more about that, whether we should be being more definite in recommending one of these options. Particularly I think iconv does seem to have these odd platform issues. They used to be a fourth option. While I was looking at this, they used to be another library called recode. That one seems to have been discontinued. Some references in the PHP manual still refer to recode as an optional option for doing this. But that's been long since shelved. So MB string has the benefit that it doesn't rely on any third party libraries. It's technically a third party library, but it's shipped with PHP, and I don't think anything other than PHP uses it any more. And there have been a lot of there's been a lot of work on that library recently, particularly somebody called Alex Douward, apologies, if you're listening to this, and I pronounce your surname wrong, has done a lot of great work. I've seen recently improving that extension, making sure the detection algorithm is doing as sensible results as it can and improving the test test coverage of that extension and things like that. So that gives me a bit more confidence in that extension, which initially was one of those PHP reinventing the wheel, it felt a bit like, so probably update the RFC to more explicitly say, that's the number one recommended path. Derick Rethans 22:27 And of course, you can link that from the utf8_encode and utf8_decode manual pages as well. Please don't use this instead, do this, right? Rowan Tommins 22:36 Yeah. And that's again, where it can be a nice clear drop in replacement, so that people are using it right. Here's exactly what to what to use instead. But hopefully, while they're replacing it, they may be at least think about whether it was doing what they what they were hoping for in the first place. Derick Rethans 22:55 When do you think you'll be bringing this up for a vote? Rowan Tommins 22:59 Unless I get more feedback, further changes? I'll probably tweak that wording in terms of the recommendation that we'll put to users. Otherwise, probably in the next couple of weeks, unless I hear any more, to see if any last minute criticism comes out the woodwork when people are asked to vote on it. Derick Rethans 23:18 Yeah that always happens, right? No comments when there isn't a request for comments. But loads of comments if people are voting on it, and it makes it to Twitter. Okay, Rowan, thank you for taking the time today then to talk about this RFC. Rowan Tommins 23:32 Thank you very much for having me. Derick Rethans 23:39 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Deprecate and Remove utf8_encode and utf8_decode Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
It's Joseph and Jesse's first podcast together in 2022 and they are absolutely stoked about it. January was a big month for SwiftOtter's digital agency, and as developers all know very well, a “big month” means lots of difficult problems to solve and lessons to learn! Join Joseph and Jesse to talk about what they've learned recently. 1:37 Shopware's $100M funding announcement, why competition right now is a great thing, and what Magento could add or modify to make sure they are demonstrating value as competition gets even stronger. 3:50 SwiftOtter's 2022 Certification Challenge (takes place during the first quarter of every year) and how it's better than ever this year, with the new collaborative study group meetings every other week. These study groups showcase exactly why becoming certified is an excellent way to grow your skills and prove them both! 6:54 Magento/Adobe Commerce 2.4.4 comes out on March 8th, 2022! Joseph has some holdbacks on how amazing it will be in the beginning..(hint: BUG THE MODULE VENDORS!) and even has a nice little rhyme to share. 9:28 Skilled problem solving: moving from “if” to “when”. 11:00 Jesse's issue: customers adding products to the cart and getting a zero value. Taking advantage of recent pull requests to start the investigation. Flagging changes, thinking about what values should be returned, and having the willingness to do significant refactoring. 15:57 The 3 things that will influence behavior. 16:48 The art of identifying what areas are most likely to present a problem, and narrowing down options to reduce wasted time when debugging. 18:00 Another instance of Xdebug completely saving the day right before code needed to get to production, and the real life reason why looking at pull requests is so important. 23:00 Joseph's pet peeve about running queries with your constructor. 24:13 Jesse's thoughts about coding with future functionality in mind and how it actually improves code in the present. Naming, arrays, and grouping as many things together to do one single thing. 30:21 Another cart issue where the value came up as zero dollars - infinite loops, and adding breakpoints somewhat randomly with Xdebug to start a search. Connect with Jesse: LinkedIn Twitter Connect with Joseph: LinkedIn Twitter This podcast exists to inspire, educate and entertain eCommerce developers who are serious about improving their skills and advancing their careers! Have you joined the free SwiftOtter Slack community? It's exploding and we don't want you to miss out. Go to SwiftOtter.com/Slack to join for free and get plugged into what might be the best group of collaborating developers around! Special thanks to TrendingAudio for our awesome theme music!
PHP Internals News: Episode 97: Redacting Parameters London, UK Thursday, January 27th 2022, 09:09 GMT In this episode of "PHP Internals News" I chat with Tim Düsterhus (GitHub) about the "Redacting Parameters in Back Traces" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:00 Before we start with this episode, I want to apologize for the bad audio quality. Instead of using my nice mic I managed to use to one built into my computer. I hope you'll still enjoy the episode. Derick Rethans 0:30 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 97. Today I'm talking with Tim Düsterhus about Redacting Parameters in Backtraces RFC that he's proposing. Tim, would you please introduce yourself? Tim Düsterhus 0:50 Hi, Derick, thank you for inviting me. I am Tim Düsterhus, and I'm a developer at WoltLab. We are building a web application suite for you to build online communities. Derick Rethans 0:59 Thanks for coming on this morning. What is the problem that you're trying to solve with this RFC? Tim Düsterhus 1:05 If everything is going well, we don't need this RFC. But errors can and will happen and our application might encounter some exceptional situation, maybe some request to an external service fails. And so the application throws an error, this exception will bubble up a stack trace and either be caught, or go into a global exception handler. And then basically, in both cases, the exception will be logged into the error log. If it can be handled, we want to make the admin side aware of the issues so they can maybe fix their networking. If it is unable to be handled because of a programming error, we need to log it as well to fix the bug. In our case, we have the exception in the error log. And what happens next? In our case, we have many, many lay person administrators that run a community for their hobby, they're not really programmers with no technical expertise. And we also have a strong customers help customers environment. What do those customers do? They grab their error log and post it within our forums in public. Now in our forum, we have the error log with the full stack trace, including all sensitive values, maybe user passwords, if the Authentication Service failed, or something else, that should not really happen. In our case, it's lay person administrators. But I'm also seeing that experienced developers can make this mistake. I am triaging issues with an open source software written in C. And I've sometimes seeing system administrators posting their full core dump, including their TLS certificates there, and they don't really realize what they have just done. That's really an issue that affects laypersons, and professional administrators the same. In our case, our application attempts to strip those sensitive information from this backtrace. We have a custom exception handler that scans the full stack face, tries to match up class names and method names e.g. the PDO constructor to scrub the database password. And now recently, we have extended this stripping to also strip anything from parameters that are called password, secret, or something like that. That mostly works well. But in any case, this exception handler will miss sensitive information because it needs to basically guess what parameters are sensitive values and which don't. And also our exception handler grew very complex because to match up those parameters, it needs to use reflection. And any failures within the exception handler cannot really be recovered from, if the exception handler fails, you're out of luck. Derick Rethans 3:51 Quite a few things to think of to make sure that you're not sharing any secrets. And I certainly have seen almost doing this myself. We now know what the problem is. How is this RFC proposing to fix this? Tim Düsterhus 4:03 Primarily, we want to propose a standardized way for applications or libraries to indicate which parameters hold sensitive values. Our custom exception handler uses reflection as we said before, and it only matches up the parameter's names, but we also have this attribute I am proposing, SensitiveParameter within our application itself. Any parameter names that are not definitely sensitive can be attributed with this attribute. But this only works within our software, but not with any third party libraries we are using, e.g. for encryption or whatever there is. Primarily we want to propose a standardized way an attribute that is in PHP core, anyone can use that and everyone knows what this attribute means. Secondarily, the RFC is proposing a default implementation to keep the exception handler simple. As I said before, we are using reflection. This is very complex, it does not work with the require_once or include_once family, because that are not functions. We need to handle this case to not try to attempt to reflect on those non functions when redacting any parameters. This is complex. And we want to simplify that. Derick Rethans 5:20 From what I understand this is then a way to make sure that there's a standardized method for marking arguments as being sensitive. And because this is that now standardized, only one solution to the problem has to be found right? Tim Düsterhus 5:34 Basically, not every library is using their own attributes, possibly, or we can match parameter names that are not like password, secret, but it can be documented: hey, if you are using sensitive parameters, you should put this attribute and then those exception handlers will be aware that this attribute is sensitive and can strip it, or in case of the RFC PHP itself, will already strip those parameters from the stack trace. Derick Rethans 6:04 You're suggesting that PHP standard way of showing stack traces also takes care of the sensitive parameter here? Tim Düsterhus 6:11 Yes, exactly. Derick Rethans 6:13 Which internal PHP functions are likely to get this attribute? Tim Düsterhus 6:16 Basically anything with a parameter called password or secret, as I said before, examples include PDO's constructor, the database password will be in there and possibly also the user name or host name, which might be considered sensitive. But the password is the most important thing I have on my list. ldap_bind, which possibly includes user passwords; the password_hash function; possibly various OpenSSL functions. One will need to look and this list can be extended in the future as well, if someone realizes we missed anything. Derick Rethans 6:55 Now, I know sometimes that there's a problem where an application connects to the wrong server with PDO. And as you say, the host name was also in this PDO constructor, would it not then make debugging that specific case harder because the hostname would also be redacted from the stack traces? Tim Düsterhus 7:14 The attribute I am proposing as the parameter attribute, each parameter can be sensitive or non sensitive. We would need to decide whether we consider the hostname sensitive or not. It usually is not. So I would not put the attribute on the host name, or on the DSN string in the first parameter. The password definitely is sensitive. And the username possibly is a grey area. By default, I probably would not put the attribute there. But this is something that needs to be discussed in the greater community possibly. Derick Rethans 7:47 I saw in the RFC that when you request a stack trace in PHP with get back trace or whatever the name of this function is, is that the sensitive parameters are being replaced by an object of the class SensitiveParameter. Why did you pick that instead of just a string, saying something like "redacted". Tim Düsterhus 8:06 We cannot force users to put the attribute only on parameters that take strings. If we use a redacted string we might violate the type hint. If a function takes some key pair class, or an option of a key pair class, this usually is a sensitive attribute, we cannot simply put a string there. We can but then we would violate the typing. And as we violate the typing in at least some of the cases, we can also violate it in all of the cases and then make it very clear that this parameter was redacted and not a real value that just looks like a string "redacted". Exception handlers would be able to use an instanceof SensitiveParameter check to possibly make it more user friendly when they render the stack trace. When you using an GUI to handle your exceptions as such a Sentry can show some placeholder instead of pretending it's a real string in there. Derick Rethans 8:07 And of course, the string "redacted" can already exist as an argument value yet anyway, right? Tim Düsterhus 9:12 Yeah. Derick Rethans 9:13 Where would attribute be checked? Tim Düsterhus 9:16 My proposal would extend PHP to check this attribute within the function that generates the stack trace, because as I said, I want to keep my exception handler simple, so they won't need to use reflection to check this attribute. PHP itself will check this attribute when the stack trace is generated. So no exception handler can miss to check this attribute. Derick Rethans 9:39 Would it be possible for code that checks for SensitiveParameter to see what the original value was? I can imagine that in some cases, an exception handler as part of a debugging toolbar, whatever does want to show this extra information, although there's going to be hidden by default. Tim Düsterhus 9:58 Not with the current version of my RFC, but I can imagine that this sensitive parameter replacement value gets an attribute where the original value can be stored. Care would need to be taken, so exception handlers don't simply serialize that value and ship it to a third party service, basically negating the benefit. But a future extension, or maybe the further discussion of my RFC can extend this replacement value. So you can use sensitive parameter, arrow, original value, or whatever. Derick Rethans 10:34 In PHP attributes are basically markers on parameters or arguments. But they don't necessarily have to have an object implementation. Is your RFC also including the SensitiveParameter class that PHP core implements? Tim Düsterhus 10:51 Yes, in my current RFC, and my current proof of concept implementation, I'm just reusing that attribute class as the replacement value within the stack trace. So we can kill two birds with one stone by doing that, by including proper class, also, any IDE will be able to see that class and know where that attribute can be applied. Because attributes have a property where they say where they can be applied in this case parameters only. And by putting it on the method by accident, you will possibly get an error or the IDE can warn you that you're doing this not correctly, Derick Rethans 11:32 You might be aware that I work on Xdebug, a debugger for PHP. And in many cases, some of the users have already previously said that Xdebug should, for example, follow the debug_info() magic method on objects to show redacted information. Now, would you think that when people debug PHP with a debugger such as Xdebug, should they see the contents of the arguments that are set with SensitiveParameter, or should it stack traces show the real value? Tim Düsterhus 12:07 In case of debugging, you're not usually not in production. So within your debugging environment or development environment, you shouldn't really have any sensitive value such as passwords, or credit card numbers, or whatever there is. In that case, debugability and ease of development should be more important. Xdebug, or any other debugger should see through those sensitive attributes and show the real value, possibly with an indicator that this value would usually be sensitive. But you shouldn't need to work around PHP hiding something from you, because you really want or need to see what happens there. Derick Rethans 12:48 Now Xdebug also override PHP's standard exception handler, and then creates a stack trace of its own. Do you think that should redact the SensitiveParameter arguments? Tim Düsterhus 13:00 I'm not really sure if people run this in production. If this is something people usually do, then of course, Xdebug should make sure to redact those values, possibly with a special ini flag or something. If that's only used in development. In my case, I only use Xdebug in development and production servers don't have that; you don't really connect to your production server with your IDE and then step through the code. That does not happen. So we don't need Xdebug in production. Derick Rethans 13:32 I know some people do run Xdebug in production. But I also don't think those are the people that care about leaking sensitive parameters. I think the RFC talks about a few existing features that PHP already has for redacting some values. What are these? And how are they not sufficient? Tim Düsterhus 13:49 There are two php.ini values you can set. One of those is do not collect parameters in stack traces, I don't have the exact name. But basically, all functions will just show an empty parameter list within the stack trace. That makes debugging very hard, especially with PHP and the non-strict typing, it can happen that you pass some completely invalid value to a function, even in production after testing and such. And you really want to know about this value, because it makes debugging very hard. Not collecting the parameters makes the stack traces much, much less useful. So this targeted redaction, as I'm proposing, hides the sensitive values but the non sensitive values will still be visible. And the other one is that the length of collected strings within the stack place can be configured. By default. I think it's on 15, but 15 characters already include user passwords such as password, exclamation mark, or 12345. And also credit card numbers will be exposed to three fourths by then. And the last four digits are shown in clear text on many pages. So that doesn't really help with those type of user credentials. Of course, your database password might be 40 characters completely random. But that's not really the values you want, or need to protect, because the database server will not be exposed to the internet, in many cases. Derick Rethans 15:33 What has the feedback been so far to this RFC? Tim Düsterhus 15:36 Both positive, and "we don't need that nobody does that". It's a bit mixed. I've got some very good feedback. There's a Twitter account that tweets any new RFCs. And so the users on Twitter, the actual users, and not PHP internals list seem to be very happy with my proposal. On the list, many said, just don't log that values, or they don't really see the benefit yet, I think. Not really sure how the feedback is really. Derick Rethans 16:07 That's always a tricky thing, isn't it? Because the people that think "Oh, this is all right", often bother responding, because they don't have anything to add or criticize. Tim Düsterhus 16:17 Exactly. People that are happy won't write any reviews for whatever, just the people that complain are complaining. Derick Rethans 16:24 Yeah, it's either the people that are complaining are the people that are really happy about something. Are you expecting there to be any backward compatibility breaks? Tim Düsterhus 16:34 Yeah, obviously, when the attribute class name will be taken by default by PHP, userland code cannot use that any more. But I don't think that anyone is using a SensitiveParameter class in the global namespace. I used GitHub search and SensitiveParameter in PHP code only appears in some strings, in the AWS SDK or something like that. The replacement value will break any type signature. So if the exception handler checks, the original parameter types for whatever reason, that will, or might break, but I don't really think that's likely either. I don't expect any major backwards compatibility breaks. Derick Rethans 17:17 That's good to hear. And also good to hear that you have done some research into this. Do you have any extra selling points to convince people? Tim Düsterhus 17:26 My initial selling point was PDO's constructor. Or not really selling point, but example, because it's very obvious and it's in PHP core. I later expanded that with the credit card numbers and user passwords, and made, attempted to make this more clear that those sensitive values are not just values from your personal computing environment, but also something user input into your application. And that stack traces will be sent to third parties e.g. Sentry, which might even be run as a software as a service solution. And then your deep in GDPR territory. You don't want that. Derick Rethans 18:03 No, absolutely not. Tim, thank you for taking the time this morning to talk to me about your RFC. Tim Düsterhus 18:10 Thank you for having me. Derick Rethans 18:15 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening. I'll see you next time. Show Notes RFC: Redacting parameters in back traces PHP RFC Bot on Twitter Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 96: User Defined Operator Overloads London, UK Thursday, December 16th 2021, 09:24 GMT In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "User Defined Operator Overloads" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 96. Today I'm talking with Jordan, about a user defined operator overloads RFC that he's proposing. Jordan, would you please introduce yourself? Jordan LeDoux 0:33 My name is Jordan LeDoux. I've been working in PHP for quite a while now. This is the second time I have ventured to propose an RFC. Derick Rethans 0:44 What was the first one? Jordan LeDoux 0:45 The first one was the "never for parameter types", which was much more exploratory. And we talked about it a little bit. And it generated a lot of good discussion that contributed to kind of the idea formation, which was what I hope to get out of it. Derick Rethans 1:01 Okay, but that didn't end up making it into a PHP release. As far as I understand, right? Jordan LeDoux 1:07 No, I withdrew it actually, it was clear that the better way to approach the problem it was trying to solve was with a much more comprehensive solution. That particular solution was something that only required a seven line change to the engine. So I wanted to see if it was something people were okay with, or thought was a decent idea for that particular problem, much more comprehensive, like template classes, or something like that is probably the better route to go. Derick Rethans 1:35 Well, I think the RFC that we're talking about today, is going to require quite a bit more than seven lines of code? Jordan LeDoux 1:41 Quite a bit more. Yeah. Derick Rethans 1:42 So what is this RFC that we're talking about today? Jordan LeDoux 1:45 Well, user defined operator overloads is a way for PHP developers to define the ways in which objects interact with specific operators. So for instance, the plus operator, the plus sign. It's a way for those objects to kind of define their own logic as far as how that's handled, which right now, as of PHP 8.0, those were all switched to type errors. So it's not possible currently to write any code that doesn't result in a fatal error, where objects are used with operators. Derick Rethans 2:25 Usually, I ask about every RFC, what problem are you trying to solve this? So what problem are you trying to solve this RFC? Jordan LeDoux 2:31 The biggest problem that this solves is that objects contain, so objects in most programs represent a value or multiple values that have a program context. That's the most powerful thing about objects is they're contextual, and they understand the state, they understand what state the object is in, and sometimes even what state the whole program is in. And that's necessary for a lot of things. Like for instance, if you're tracking a distance, you know, you might measure that meters, and that would have a number you might have 30 meters of distance, but it also has a unit of meters. You could just represent that as an int. And then the program just knows internally, hey this is always in meters. But if you need to convert that to a different unit, then that becomes: Okay, well, now I need a special case some things, or I need a function just for converting, and I need to remember which unit my number is in. In a lot of cases, you handle that with objects because objects understand state, and they understand state transitions, which is what a lot of methods are about; transitioning the state of the object from one state to another. Operators are also about state transitions. And they're about very specific kinds of state transitions. It's natural in a lot of ways to think that you, you should be able to define how those two things interact. But currently, it's just not possible within PHP. Derick Rethans 4:00 Well, does them this magic operator overloading? Jordan LeDoux 4:04 It allows PHP developers to define an implementation logic, which is much like you define a function body that describes how does this object interact with this operator. That's essentially it. There's a lot of other details as to how it does that and what are the restrictions, but that's really the core of the idea. Derick Rethans 4:26 And in what kind of situations would you use that? Jordan LeDoux 4:28 A lot of them are situations where you're doing very complicated mathematics, or scientific computing or machine learning or things of that nature, where you are going to routinely encounter numbers that have state to them or that have multiple dimensions to them. So for instance, vector mathematics is one where the way that vectors interact with a lot of the operators that we're familiar with, like the multiplication sign is very different than how the number five interacts with the multiplication sign. Complex numbers is another one, you know, to multiply two complex numbers together, you have to treat it like a polynomial where you're multiplying it with the FOIL method: first, outside, inside, last. You know, there's a lot of those sorts of circumstances. But it also could potentially be very useful for some things that are not really mathematical but more quality of life for PHP developers. For instance, scalar objects is something that a lot of developers in PHP have, you know, wanted for a while. It's a thing that's a little more difficult to pin down, how exactly would you go about doing this within the engine, and it's a thing that the engine would kind of have to be very opinionated about by its nature. PHP developers can't provide their own scalar objects. And the main reason for this is that scalars interact with operators and objects can't. So simply allowing PHP developers to define a way for objects to interact with operators would allow user land to develop their own scalar object replacements. It wouldn't make every scalar that object; scalar objects within the engine still has, it's a separate feature. And it's still a thing that would be desirable, probably to a lot of people. But it gets quite a bit of the way there. Derick Rethans 6:20 It is always interesting that people come up with the example of complex numbers, because I'm not sure how useful that is in a PHP user land context. And then beyond the scalars, I then sometimes struggle to see where this could be used. With the only exception is probably doing calculations with money related issues. The moment you bring up operator overloading, you'll also get people to say that this is going to get abused. Examples of that, in my opinion at least, is where in C++ you have like the
PHP Internals News: Episode 95: PHP 8.1 Celebrations London, UK Thursday, November 25th 2021, 09:23 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.1. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:23 This is episode 95. I've been absent on the podcast for the last few months due to other commitments. It takes approximately four hours to make each episode. And I can now unfortunately not really justify spending the time to work on it. I have yet to decide whether I will continue with it next year to bring you all the exciting development news for PHP 8.2. Derick Rethans 0:44 However, back to today, PHP eight one is going to be released today, November 25. In this episode, I'll look back at the previous episodes this year to highlight a new features that are being introduced in PHP 8.1. I am not revisiting the proposals that did not end up making it into PHP 8.1 feature two features I will let my original interview speak. I think you will hear Nikita Popov a lot as he's been so prolific, proposing and implementing many of the features of this new release. However, in the first episode of the year, I spoke with Larry about enumerations, which he was proposing together with Ilija Tovilo. I asked him what enumerations are. Larry Garfield 1:26 Enumerations, or enums, are a feature of a lot of programming languages. What they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is booleans. Boolean is a type that has two and only two possible values true and false. Enumerations are way to let you define your own types like that, to say this type has two values Sort Ascending or Sort Descending. This type has four values for the four different card suits, and a standard card deck. Or a user can be in one of four states pending, approved, cancelled or active. And so those are the four possible values that this variable type can have. What that looks like varies widely depending on the language. In a language like C or C++, it's just a thin layer on top of integer constants, which means they get compiled away to introduce at compile time, and they don't actually do all that much they're a little bit to help for reading. On the other end of the spectrum, you have languages like rust or Swift, where enumerations are a robust, advanced data type and data construct of their own. That also supports algebraic data types. We'll get into that a bit more later. And is a core part of how a lot of the system actually works in practice, and a lot of other languages are somewhere in the middle. Our goal with this RFC is to give PHP more towards the advanced end of enumerations. Because there are perfectly good use cases for it, so let's not cheap out on it. Derick Rethans 3:14 In the next episode, I spoke with Aaron Piotrowski about another big new feature: fibres. Aaron Piotrowski 3:20 A few other languages already have Fibers like Ruby. And they're sort of similar to threads in that they contain a separate call stack and a separate memory stack. But they differ from threads in that they exist only within a single process and that they have to be switched to cooperatively by that process rather than pre-emptively by the OS like threads. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier and eliminate the distinction that usually exists between async code that has these promises and synchronous code that we're all used to. Derick Rethans 4:03 I also asked Aaron about small PHP I actually have a slightly related question that pops into my head as like. There's also something called Swoole PHP, which does something similar but from what I understand actually allows things to run in threats. How would you compare these two frameworks or approaches is probably the better word? Aaron Piotrowski 4:25 Swoole is they try and be the Swiss Army Knife in a lot of ways where they provide tools to do just about everything. And they provide a lot of opinionated API's for things that in this case, I'm trying to provide just the lowest level just the only the very necessary tools that would be required in core to implement Fibers. Derick Rethans 4:48 Although I discussed several deprecations from Nikita and the last year, I only want to focus on the new features. In episode 76. I spoke with him about array unpacking, after talking about changes to Null in internal functions. Nikita Popov 5:01 The old background is set we have unpacking calls. If you have the arguments for the call in an array, then you write the free dots and the array is unpacked intellectual arguments. Now what this RFC is about is to do same change for array unpacking, so allow you to also use string keys. Derick Rethans 5:24 In another episode, I spoke with David Gebler on a more specific addition of a new function fsync. David explains the reason why he wants to add this to PHP. David Gebler 5:34 It's an interesting question, I suppose in one sense, I've always felt that the absence of fsync and some interface to fsync is provided by most other high level languages has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP core getting to learn the source code. And it's a very small contribution, but it's one that I feel is potentially useful. And it was easy for me to do as a learning exercise. Derick Rethans 5:58 And that is how things are added to PHP sometimes, to learn something new and add something useful at the same time. After discussing the move of the PHP documentation to GIT an episode 78, in Episode 79, I spoke with Nikita about his new in initializers RFC. He says: Nikita Popov 6:15 So my addition is a very small one, actually, my own will, I'm only allowing a single new thing and that's using new. So you can use new whatever as a parameter default, property default, and so on. Derick Rethans 6:29 The addition of this change also makes it possible to use nested attributes. Nikita explains: Nikita Popov 6:34 I have to be honest, I didn't think about attributes at all, when writing this proposal. What I had in mind is mainly parameter defaults and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument. And this can be used to effectively nest attributes. Derick Rethans 6:59 Static Analysis tools are used more and more with PHP, and I spoke to the authors of the two main tools, Matt Brown, of Psalm, and Ondrej Mirtes of PHPStan. They propose to get her to add a new return type called noreturn. I asked him what it does and what it is used for. Ondrej Mirtes 7:14 Right now the PHP community most likely waits for someone to implement generics and intersection types, which are also widely adopted in PHP docs. But there's also noreturn, a little bit more subtle concept that would also benefit from being in the language. It marks functions and methods that always throw an exception. Or always exit or enter an infinite loop. Calling such function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference. Derick Rethans 7:49 Beyond syntax, each new version of PHP also adds new functions and classes. We already touched on the new fsync function, but Mel Dafort proposed to out the IntlDatePatternGenerator class to help with formatting dates according to specific locales in a more specific way. She explains: Mel Dafert 8:07 Currently, PHP exposes the ability for locale dependent date formatting with the IntlDateFormat class, it says basically only three options for the format long, medium and short. These options are not flexible in enough in some cases, however, for example, the most common German format is de dot numerical month dot long version of the year. However, neither the medium nor the short version provide and they use either the long version of the month or a short version of the year, neither of which were acceptable in my situation. Derick Rethans 8:40 And she continues with her proposal: Mel Dafert 8:42 ICU exposes a class called DateTimePatternGenerator, which you can pass a locale and so called skeleton and it generates the correct formatting pattern for you. The skeleton just includes which parts are supposed to include it to be included in the pattern, for example, the numerical date, numerical months and the long year, and this will generate exactly the pattern I wanted earlier. This is also a lot more flexible. For example, the skeleton can also just consist of the month and the year, which was also not possible so far. I'm proposing to add IntlDatePatternGenerator class to PHP, which can be constructed for locales and exposes the get best pattern method that generates a pattern from a skeleton for that locale. Derick Rethans 9:26 Locales and internationalization have always been an interest for me, and I'm glad that this made it into PHP 8.1. I spoke at length with Nikita about his property accessors RFC, in which he was suggesting to add a rich set of features with regard to accessibility of properties, including read only, get/set function calls, and asymmetric visibility. He did not end up proposing this RFC, which he already hinted that during our chat: Nikita Popov 9:53 I am still considering if I want to explore the simpler alternatives. First, there was already a proposal, another rejected proposal for Read Only properties probably was called Write Once Properties at the time. But yeah, I kind of do think that it might make sense to try something like that again before going to the full accessors proposal, or instead. Derick Rethans 10:18 He did then later proposed a simpler RFC read only properties, which did get included into PHP eight as a new syntax feature. He explains again: Nikita Popov 10:27 This RFC is proposing read only properties, which means that a property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have Type Properties. Remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that a string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value. For example, one nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP. Derick Rethans 11:21 Nikita, of course, meant this kind of immutable object pattern, which we didn't pick up on during the episode. Another big change was the PHP type system, where George Peter proposed out pure intersection types. He explains what it is: George Peter Banyard 11:35 I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want x and y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. Derick Rethans 11:54 To explain our pure George Peter says: George Peter Banyard 11:58 So the word pure here is not very semantically, it's more that you cannot mix union types and intersection types together. Derick Rethans 12:06 Just after the feature freeze for PHP 8.1 happened in July, another RFC was proposed by Nicolas Grekas to allow the new pure intersection types to be nullable as well. But as that RFC was too late, and would change the pure intersection type to just intersection types, it was ultimately rejected. Derick Rethans 12:23 The last feature that I discussed in a normal run of the podcasts was Nikita's first class callable syntax support. He explains why the current callable syntax that uses strings and arrays with strings has problems: Nikita Popov 12:35 So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two string signs inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array elements will also be renamed. But that's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that colour bulls are not scope independent. For example, if you have a private method, then like at the point where you create your, your callable, like as an array, it might be callable there, but then you pass it to some other function, and that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both the like this callable syntax based on arrays, and also the callable type, is callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable culpability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable. Derick Rethans 14:08 This new feature is a subset of another RFC called partial function applications, which was proposed by Paul Crovella, Levi Morrison, Joe Watkins, and Larry Garfield, but ultimately got declined. So there we have it, a whirlwind tour of the major new features in PHP 8.1. I hope you will enjoy them. As I said in the introduction, I'm not sure if I will continue with the podcast to talk about PHP 8.2 features in 2022 due to time constraints. Let me know if you have any suggestions. Derick Rethans 14:41 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes Episode #73: Enumerations Episode #74: Fibers Episode #76: Array Unpacking Episode #77: fsync function Episode #79: New in Initialisers Episode #81: noreturn type Episode #85: Add IntlDatePatternGenerator Episode #86: Property Accessors Episode #88: Pure Intersection Types Episode #90: Readonly Properties Episode #92: First-Class Callable Syntax Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
Welcome to this special Tech Edition of Talk Commerce, where we explore how merchants, agencies, and developers experience commerce and the communities they work and live in. This week we interview Joseph Maxwell and discuss his new book, “The Art of Ecommerce Debugging”. The video version of the podcast includes an exclusive unboxing of the book. Joseph goes over his motivation for writing this book and how it will help developers be better developers! Joseph discusses the mindset of resolving problems. We go over TAD (You can listen to know what that means). We talk about work-life balance and how Joseph handles this. We talk about documenting your code. Joseph gives his top 5 things developers should do. (Hint number one is XDebug). The one thing to double your productivity is XDEBUG! Joseph talks about Magento certifications and why this is so important. Why NOT get certified? This episode was recorded on July 30th, 2021
PHP Internals News: Episode 94: Unwrap Reference After Foreach London, UK Thursday, August 26th 2021, 09:22 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 94. Today I'm talking with Nikita Popov about the unwrap reference after foreach RFC that he's proposing. Nikita, would you please introduce yourself? Nikita Popov 0:33 Hi, Derick. I'm Nikita and I work at JetBrains on PHP core development. Derick Rethans 0:38 So no changes compared to the last time. Nikita Popov 0:41 Yes, at the time before that. Derick Rethans 0:43 So what is the problem that is RFC is going to solve? Nikita Popov 0:46 Well, it's really a very minor thing. I think it's a relatively well known problem for the more experienced PHP programmers. It's like a classic example, you have a foreach loop by reference. So foreach array as value by reference, and then you do a second loop after that, foreach array as value at the same it's by value. So without the reference sign. The result of that is that your last two array elements are going to be the same, which is kind of unexpected. If you're not familiar with how references in PHP work and scoping in PHP works. So I think it's worth explaining what's going on there. Derick Rethans 1:27 Can you quickly explain the scoping or rather the lack of it, I suppose? Nikita Popov 1:31 Yeah, it's really the lack of PHP really only has function scoping. So if you have a foreach array as value, then the value variable is going to stay alive, even after the foreach loop. And usually, that won't make much of a difference. So you will just have like reference to the last element of the array, might even be useful for some cases, you know, before we added the array, I think, array_key_last function. If the last element now is a reference, so if you have a reference to the last element, then you're write into that variable is also going to modify the last element of the array. So if you now have a second foreach loop, using the same variable, that's actually not just modifying that variable, but it's also always modifying the last element of the array. Derick Rethans 2:15 Okay, just to clarify, it isn't necessarily the last element in the foreach loop. It's the last one that's been assigned to? Nikita Popov 2:22 Yeah, that's, that's true. Derick Rethans 2:24 Is this not something that people actually use for some useful reasons? Nikita Popov 2:28 As mentioned before, technically, you could use it to get a reference to the last element and then modify the last element outside the foreach loop. I don't think this is a particularly common use case. But I'm sure people have used in here there. This is a use case we would break with the proposed RFC. Derick Rethans 2:47 I think it is one I have used in the past, it's probably not how I would do it now. But I'm pretty sure I have some point in the past. What are you proposing to change with this RFC? Nikita Popov 2:57 The change is pretty simple. And that's to unwrap or to break the reference after the loop. You will still have like after the loop, the variable will still contain the value of the last element, or of the last like visited element, but it will no longer be a reference to it. If you write into the variable, it will not modify the original array. And if you have a second loop that writes into the variable that also doesn't modify the original error any more. Derick Rethans 3:25 At which point and how is this reference broken? Nikita Popov 3:29 It's at the end of the foreach loop, or as you say, if you break out too early, then of course, it would also get broken. So it's referenced inside the foreach loop and stops being referenced outside the loop. Derick Rethans 3:41 And that would happen also, if I would use a goto for example? Nikita Popov 3:45 Oh, that that's a trick question, actually, yes, it should happen. But now that you have mentioned it, I think my current implementation does not handle that particular case, I will have to double check it. But that should happen, yes. Derick Rethans 4:00 It's good to know that you've thought about it then. Nikita Popov 4:02 Well, I didn't think about it. Because I mean, I guess I can mention it here, the way this works is that well, at the end of the foreach loop, we have like an instruction that frees the loop variable. And I can just add an additional one that breaks reference. But if you use things like goto or multi level breaks, or something like that, then we insert these clean-up instructions before the jump. We have to make sure to actually insert the reference breaking instruction there as well. So it's like not automatically handled. Derick Rethans 4:38 Is this going to be a separate instruction or as we tend to call them opcodes? Nikita Popov 4:43 I'm using a separate one, but one could run it as a flag into the instruction that frees the loop variable, but I think it's cleaner to have a separate instruction for it. Like technically one could optimize it away in some cases, like I wouldn't bother but it's like semantically a different thing. Derick Rethans 5:01 I think it'd be nicer result, because it makes it easier to visualize what's happening, right? Nikita Popov 5:06 Yeah, it is. Derick Rethans 5:07 Did you actually check whether some code uses this construct? Nikita Popov 5:10 I have to admit, I tried checking it using a very basic approach, just look at foreach loops by reference. And then if the variable is used after that. But that kind of primitive approach has way too many false positives, for example, you have a foreach loop inside, and if, and then the variable is reused inside an else. So it like wouldn't flow from the if into the else. So you would have to do some kind of more sophisticated control flow analysis. It's something that can be done, but I didn't bother doing it for a one off backwards compatibility check. So I don't have any hard data on how much code is actually using something like this. Derick Rethans 5:51 So this is where I'm a little bit on the fence about this change, because it is changing behaviour, that's going to be pretty hard to figure out what is actually going to affect your codebase. Nikita Popov 6:01 It should be possible to very reliably detect that. It's just something you have to actually implement. But you're right now there is no easy way to check that. Derick Rethans 6:13 It's something that static analysers could probably have a look at. Nikita Popov 6:16 Yeah, expect that maybe Psalm or PHPStan, something like that will be easier to implement, because they already have control flow information. Derick Rethans 6:23 You don't really know how impactful this, which is, in my opinion, a bit of the scary bit. How important do you think you'll find it to have this RFC going through and implemented? Nikita Popov 6:33 I don't think it's super important. It's mostly like, small quality of life fix for newer developers . People who have already encountered this issue once won't forget about it again. In fact, it's somewhat common recommendation that you should always unset the loop variable after a foreach by reference loop. So I've seen that as like a policy some people use, that could be avoided. So yeah, I don't think it's a critical feature, just a small improvement. Derick Rethans 7:08 Would it be an alternative idea to instead deprecate the foreach by reference? Nikita Popov 7:14 Okay, that's the radical approach. Everything is possible. I think that foreach by reference is relatively, I mean, I think it's one of the most common uses of references we have, and one of the most reasonable ones. I mean, the alternative is search into by value loop, and then you modify it by looking up the element by key again, which is a bit more ugly, I would say. I think we shouldn't deprecate foreach by reference, though it would be kind of nice to have a different way to achieve the same. One other unfortunate thing about foreach by reference is that it leaves behind references in the array. The case I'm looking at here is this reference to the last element, where you have like reference structure that's pointed to both from inside the array, and from this loop variable. The other thing that foreach by reference does is that for all the other array elements, you will actually leave behind the reference wrapper that's just used in this one single place for this single array element. Essentially, you are wasting memory, because we will leave behind this that reference wrapper. So after you do the foreach by reference loop over the array, the array will actually grow larger. So if you're storing like integers, and it may grow significantly larger, like from a technical perspective, foreach by references, also not great. But like from a usability perspective, it's nicer then modifying values by key lookup. Derick Rethans 8:53 I guess it's going to depend on how big the array is, right? I mean, if it's a few elements, it probably doesn't matter. Nikita Popov 8:58 But if you have like a 100,000 element array, then you paying for 100,000 reference wrappers that you don't need afterwards any more. Derick Rethans 9:07 In that case, it's rather better to just modify it through the key that you obtained by doing foreach key as value. Nikita Popov 9:14 Right. But it's also worth noting that foreach reference actually has different semantics then foreach value, because foreach by value works on the copy of the array. Like it's not an actual copy just like semantically. If you modify the array inside the foreach by value loop, then we will copy the array. Doing the modification with a separate key lookup and foreach by value loop will actually copy the array at that point, while foreach by reference takes account modifications of the array. So even if you like add or remove elements in the array in the foreach by reference loop, it will try on the like best effort basis to still iterate on in a reasonable way on the modified array. It's like not a straightforward replacement. Derick Rethans 10:00 It all depends on what people intended to do with it. Right? Do you think there are any further situations that are a bit strange? That could benefit from having some subtle changes to the language semantics? Nikita Popov 10:13 Nothing can who comes to mind immediately. Derick Rethans 10:16 Yeah, I can't think of any either. But I thought maybe maybe have something in the pipeline. Would you have anything else to add to this RFC? Nikita Popov 10:23 Well, one more thing that's discussed in the RFC is the case of complex variables. A little known fact, in the foreach loop, you don't have to assign to a simple variable, you can also assign to something like an object property, or an object property on the result of a function call that that means that in the loop, this function is getting called on every iteration, and then you assign it to a property on the result. So you can do that kind of weird stuff, we allow it. Derick Rethans 10:52 And does it the work without any weird side effects? Nikita Popov 10:56 Depends on what you consider weird, but basically does what you expect as if you had written an explicit assignment to the complex variable. Derick Rethans 11:04 I reckon that's how it's instructed out in the oparray then as well. Nikita Popov 11:07 Yeah, exactly. As far as this RFC is concerned, the problem there is that to unwrap the reference of the loop, we actually have to evaluate the variable again. And if it's a complex variable that might have side effects, for example, the function call. And that's why the RFC says that if the variable is complex, we are not going to do that, like that's probably going to be more unexpected than leaving a reference wrapper around. So we have this extra weird edge case. In the internals discussion, some people already suggested that maybe we should just deprecate support for these kind of complex assignments. One could also mention that an alternative that has been suggested is to actually make the loop variable, scoped to the foreach loop. So we could unset it entirely after the loop, rather than just breaking the reference, which is, of course, a larger change, larger backwards compatibility break. It also doesn't really align with PHP semantics of only having function scope and not block scope. Derick Rethans 12:06 I probably agree without, it's too much of a change to do that. Because then you sort of expect that all the language constructs should have a scope. I mean, it needs to be either one or the other. Nikita Popov 12:15 Yeah, I mean, other languages like JavaScript have solved that by introducing a separate way to declare scoped variables. So that will be "let", just changing the behaviour in one place is probably not a good idea. Derick Rethans 12:30 I probably agree with you though. It was a bit of a shorter RFC this time. That's okay with me. Nikita Popov 12:35 Yes, I used that as an excuse to discuss some foreach behaviour details. Derick Rethans 12:40 Fair enough. Thank you for taking the time this morning to come and talk to me about the references after foreach RFC. Nikita Popov 12:47 Thanks for having me, Derick, once again. Derick Rethans 12:53 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening. I'll see you next time. Show Notes RFC: Unwrap Reference After Foreach Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 93: Never For Parameter Types London, UK Thursday, August 19th 2021, 09:21 BST In this episode of "PHP Internals News" I chat with Jordan LeDoux (GitHub) about the "Never For Parameter Types" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 93. It's been quiet over the last month, so it didn't really have a chance to talk about upcoming RFCs mostly because there were none. However, PHP eight one's feature freeze has happened now, a new RFCs are being targeted for the next version of PHP eight two. Today I'm talking with Jordan LeDoux, about the Never For Parameter Types RFC, the first one targeting this upcoming PHP version. Jordan, would you please introduce yourself? Jordan LeDoux 0:50 Certainly. And thanks for having me. My name is Jordan. I've worked as a developer for about 15 years now. Most of my career has been spent working in PHP. Although professionally, I've had experience working in C#, Python, TypeScript, mostly in the form of JavaScript, but a little bit of Node and, you know, a variety of other languages that I haven't spent enough time in to really be proficient in any real way. But recently, I decided to do something that I have thought about doing for many years, but never actually jumped into which is exploring the PHP engine itself and how I could possibly contribute to it. Derick Rethans 1:32 And here we are, but your first our thing. Jordan LeDoux 1:35 Yeah, it's exciting. Derick Rethans 1:36 What is this RFC about, what does it propose? Jordan LeDoux 1:39 Well, this RFC proposes allowing the never type, which was added in 8.1 as a return value, to parameters for functions and methods on objects. The main idea behind that is that when never was proposed as a return type, it was meant to signal that the function would never return. Not that it returns void, which of course, void signifies which is returning no value or returning, returning without any specified information. And never return signifies that the function will never return, which is a concept that exists in many other languages. And for that purpose in other languages, what's usually used is something called a bottom type. And that's what never ended up being. And I'm proposing that we extend the use of that bottom type to other areas where the type may be helpful. Derick Rethans 2:38 So a bottom type, that might be a new term for many people, it will certainly for me when I looked at the never RFC for as return types. Can you sort of explain what a bottom type is, especially thinking about object oriented theory with something that we'd like to call the Liskov Substitution Principle? And also, how does it apply to argument types? Jordan LeDoux 2:59 Let's start with the Liskov Substitution. The general idea behind Liskov Substitution is that if A is a subtype of B, then anywhere that A exists, you should be able to substitute B. It has to do with when you have a class hierarchy in in an object oriented language, that that class hierarchy guarantees certain things about substitutionality, like whether or not something can be substituted for something else. That affects language design in ways that a lot of programmers are kind of intuitively familiar with, but maybe not familiar with the theory and the ideas behind it more concretely. But LSP is the principle in SOLID, that's the L and SOLID. And it represents a portion of the whole idea of object oriented programming in PHP. Part of being able to substitute one object for another, based on their class hierarchy, and what they implement, and what they provide is there part of their ability to be substituted is whether or not they can fulfil the same kind of contractual requirements of typing. And with Liskov, that means that preconditions can never be strengthened, and post conditions can never be weakened. So a precondition would be a parameter type requirement. If you require that a parameters accepts an object, for instance, in PHP, you can't strengthen that requirement beyond just any object to a particular object. But you can weaken it from an object to an object or an integer with with unions. That's an example of the precondition side of it. The post condition side of it is that you can't, you can't weaken it. So if you have have, you know, if you have a return type of int, you can't have an inherited implementation return int or float, because that broadens the possible return types. They go in opposite directions. And one of them is covariance and one of them is contravariance. Derick Rethans 5:18 I can never remember which one is which. Jordan LeDoux 5:21 Yeah, basically contravariance go up the tree and covariance go down the tree. If you're thinking about widening, or sorry, narrowing. If you're thinking about narrowing, then covariance go down the implementation tree and contravariance go up the implementation tree. Derick Rethans 5:40 Okay, so how does the bottom type fit in here? Jordan LeDoux 5:43 The bottom type in any type system represents like the base type that all types originate from. And the best way in my mind to think about it is kind of just integer math. It's a thing that every programmer is going to be familiar with. And it fits, it fits all right. So you can think of the bottom type as zero integers, a lot of people would think of null as zero if they're thinking about a type system, but null is more like negative one. It's like the entire negative side of the integer system. We could say that the string type is one, and the integer type is two. And the float type is three, and maybe int or float, the union of them is five, which would be the two numbers added together. And if you describe type systems this way, then you can say, hey, if I take any type and add the value that represents the other type, then I get my result type. So if I take zero, the bottom type, and I add one, the string type, what I end up with is one, still the string type. So the bottom type is whatever type system or whatever, whatever type, when you add any other type to it, you get the type you added to it and nothing else, you just get your original thing. That's why it's called the union identity for the type system. The top type in PHP is mixed. And that's the opposite side of it. It's just like zero is the additive identity. One is the multiplicative identity. If you multiply anything by one, you're going to get what you originally had. And if you add anything to zero, you'll get what you originally had. So mixed ends up being, or the top type in general, ends up being the intersection identity, and the bottom type, or never, in PHP's case, ends up being the union identity. And all this is like deep type theory, but most programmers don't have to interact with it. It's more something that affects language design usually. Derick Rethans 7:48 Could you think of mixed as being infinity? Jordan LeDoux 7:50 That's actually with my with my crude integer analogy, yeah, it would be like all types, all possible types are added together. Derick Rethans 7:59 That makes sense then. Okay, so we have explained what the bottom type is, but, and never being the bottom type. So why is it useful to use the bottom type, or never, as this RFC proposes, as a method argument type for parameters? Jordan LeDoux 8:14 The largest benefit has to do with what we were talking about when it comes to covariance versus contravariance. You know, can you strengthen the requirements? Or can you weaken the requirements? When you inherit a method in a system that preserves Liskov Substitution, the parameters can be widened, they can accept more things. If I had an interface that said, it has one parameter, and that parameter is typed as int, then in any implementation, I could say, okay, but actually, the parameter type is int or float, I'm going to accept both. And I could do that in the implementation. But I would have to accept int, because that's part of the contract. That's part of the interface. So I have to accept whatever type is in the interface. I can just additionally add things on top of that. If I make my original definition, my root definition, the bottom type, then I can add any type to it. And I will just get that type. From never, if I had an interface with, you know, a method foo, and it has one argument, and that argument is typed never, I can re-type that argument as int, or I could re-type that argument as string, and all of them would be valid inheritances. Derick Rethans 9:38 So it's a way of getting around having at least one concrete type like int in an interface. Jordan LeDoux 9:44 Right. And in fact, never is a concrete type. It's a concrete type, that means this code can never be called. For return type, it means this type will never return. But if you're, if what you're trying to do is call it then you can never call it. Derick Rethans 10:01 Which is then why never actually makes sense as a type name, because one of my further questions is going to be how does never as a name make sense? But you've now explained it in such a way that it actually does make sense. So there we go. Jordan LeDoux 10:15 One of the questions that did come up in the internals discussion on this so far, has been a round that choosing never, and because a lot of the examples for use cases are around inheritance, it makes some kind of intuitive sense, if you're describing the inheritance behaviour, to name it something like any or anything, or you know, something about its how permissive it is in its inheritance definition. But the type should really describe the code that it's actually written in. You know, getting outside the idea of an interface or an abstract or something like that. If you have a function, and that function types it as never, or whatever you name, the bottom type. There's no data that can satisfy that. Because any data will have some type other than never. It'll have string, or int, or something. Even null has the null type. You can't provide any data to a function that requires never that will satisfy its type requirement. So that code can never be called. It's really when you start considering how does this affect inheritance that you start getting into this concept of Oh, maybe a different word makes sense. But then that doesn't really reflect the opposite side on the return side, when you're talking about covariance instead of contravariance. And that's why the bottom type for most languages is something like never, or nothing, or nil, something along those lines. Derick Rethans 11:48 So if you type an argument to a method as never, the engine will, of course, enforce that you can't call it with any data because it wouldn't satisfy. But would it also automatically make a method abstract so that inheritance inheriting classes have to widen it or not? Jordan LeDoux 12:04 So that was part of my original idea behind the RFC, was forcing something that implements it or something that inherits it, to widen it. However, there were a lot of good arguments about why that may not be a good idea. One is that in PHP, an empty type isn't an empty type, it's mixed, not widening, would actually just be saying the type is mixed, which is valid, you can go from the bottom type to the top type in a contravariant way, that's a totally valid way to do it. The problem is that PHP has a weakly enforced typing system. Like that's only a problem in this context, it's actually a very powerful feature of the language in a lot of other contexts. So we don't necessarily want to get rid of that. In addition to that, the actual mixed type as a literal that can be used in the language was only added in 8.0. It would kind of represent a much larger backwards compatibility break to require it to be explicitly widened. That was part of my original concept for a lot of the reasons that you were just talking about. But for PHP, specifically, it would probably present more problems than it would provide kind of solutions and utility. I've kind of been convinced off that point a little bit by the arguments of others. Derick Rethans 13:22 Would it be possible to instantiate a class that has a method with its argument typed as never? Jordan LeDoux 13:28 Yes, as long as he never called that argument directly.Using a never type in a constructor would definitely be a definitely be a No, no, that would result in a type error. As soon as you try to instantiate the class. Derick Rethans 13:41 It would basically make the constructor private. Jordan LeDoux 13:43 Yeah, you would get a slightly less useful error. Derick Rethans 13:47 Yes. Jordan LeDoux 13:48 Then you do if you make the constructor private, and then try and call it. Derick Rethans 13:53 It makes no sense to do it. The RFC slightly touches on generics. And it goes in a way talking about why this is sort of slightly like generics. Could you explain the interaction between these two concepts? Jordan LeDoux 14:07 Generics is a feature obviously, that a lot of PHP developers want. And it's also a very complicated feature to do. Way outside of what I was willing to consider for, for my first, my first attempt at something useful. One of the most common ways that generics are used, is within an inheritance structure to allow something similar to type widening, particularly for parameters. Being able to say, I want the type to be able to be widened, but I don't know exactly how it will be widened. That's something that generics offer. Generics offer many other features as well and many other capabilities. But that particular one is something that this can do. It comes with a cost though, because this isn't generics. It is not really the right way to do that. It can be done without generating compile errors now, if you use never, that's the main difference. You still would encounter errors in static analysis and IDE hints, for instance. The IDE, he wouldn't be able to tell if you typed against a interface that had never as a parameter type, it wouldn't be able to tell what sorts of types the implementers have. Because that's not really the point of accepting an interface as a type for a parameter, or for a function call, or something like that. The point is that any implementer of this will be an acceptable type. But that means that from a static analysis perspective, it won't know what the type requirements are, because it won't know what concrete implementations are being provided in the code. So obviously, this is a limitation. And this limitation does not exist in a in an actual generics implementation of some kind. I do view it as an improvement personally. And the main reason is that before, if you tried to do something like this, then the errors that you generated and the problems that you caused, were in your code, in its actual execution. Now, the errors and the problems that you need to solve are going to be in your static analysis or in your IDE. And that's, that's something that's a lot safer for the code in general. It can present some maintainability challenges, it can be annoying for developers to deal with, all of that's absolutely true. And it doesn't provide all the things that you would want from being able to do that. It moves the where the safety problems are from being in code execution, where it can cause real problems into code writing, which is where you have the opportunity to kind of think about it, reason about it and catch it. Derick Rethans 16:54 From what I understand is that if you have a class doing implementing some kind of generics, then you'd often expect that where this generic type is used in either argument parameter, or return value, that'd be the same for all the methods, whereas with never, you can of course not enforce it, that it isn't being the same type being used, in all the other places where you would otherwise expect or enforce with generics to be the same type. So I reckon that's one of the differences. I think that was a better explanation that what I read from the RFC. Jordan LeDoux 17:25 That was a explanation and in argument that I wasn't forced to actually articulate until I presented it to other people, because I went through several days of research before even writing the RFC. And sometimes when you do that, particularly for you know, for programming related things, you just absorb the information and you kind of forget about what did I, which things were new, like which things that I just learned, and which things that I already know. You just integrate it all into your new programming knowledge. And so that was definitely something that when I wrote the first draft of the RFC, I didn't, didn't articulate particularly well, because I it just made sense in my head. And I had forgotten: Hey, this is something I didn't know a week ago, I should probably explain it to others. Derick Rethans 18:10 That's why we have the RFC process, right? So that other people also voice the opinion, and perhaps all the slightly confused language that make perfect sense to the author, but not necessarily to other people that read it, right? Jordan LeDoux 18:23 Yes. Derick Rethans 18:24 The RFC kindly mentioned that there's no backwards incompatible changes. So that's always good news, that makes it a little bit easy to accept. What was sort of the biggest pushback against the RFC? Jordan LeDoux 18:36 Initially, actually, the most pushback that I got, when I presented it, was a round choosing never as the name, which I thought would, in my mind, I thought would be something that was barely discussed, actually. But I mean, that again, just like you said, that's why this process exists. So that everybody can actually understand, you know, the things that go into it. I did do the research into that prior, but I couldn't find a single language that had more than one bottom type. The concept of more than one bottom type itself, if you if you go back to the integers concept, that'd be like having what positive zero and negative zero or something like that. It's a concept that just intuitively when you, when you understand how the type system itself works, you feel like okay, so there's probably something wrong from a design perspective, if you have more than one bottom type. It's very easy to not be able to see that intuitively. If you don't go in and take a really deep look at how the type system works or, or how types in general work and what they mean and stuff like that. That was the biggest pushback that I got initially. That mostly just involved explaining the things that I just said about like, why is that the case? Why does that make sense? What are some examples of other languages? Just going into that kind of information. The biggest blocker at the moment, really is about, does it make sense to provide never as a parameter type? If you can't use that statically? If you can't use it in a static analysis situation, then does it make sense to ever use it? And if it doesn't make sense to ever use it, then even if it provides contravariance, is it worth adding? And that's the main discussion that's being had right now, that's not entirely resolved. I think that there is still value in doing that. I think that that argument would carry a lot more weight with me personally, if PHP had a better way of handling the situations where that might happen. That's a much larger undertaking, which I would be interested in, but is also not really not really something that would be as kind from a backward compatibility standpoint. Derick Rethans 20:54 Which we are quite keen on. Jordan LeDoux 20:57 Right, exactly. I don't personally see a way that this type of functionality could be provided, that could satisfy that concern, and not also invalidate enormous amounts of code that currently exist. You kind of have to choose one or the other from my understanding. I will be very pleased if somebody is able to provide me a way to, even if it's a lot of work, provide me a way that that can be accomplished without breaking a lot of code. That was one of my goals is I don't want this, I want this to be an addition to what currently exists, not not something that breaks the current typing that we have, or the way that PHP developers currently interact with the type system or anything like that. That's the current discussion. And I think the thing that is most unresolved. Derick Rethans 21:46 I think you'll have a hard time trying to introduce breaking changes into the language. And I also think rightfully so. Jordan LeDoux 21:54 Yeah, it's it's a good safeguard. And it's a good principle to have in general, I think. I think the way that I described it is that this type of breaking change, the type that would be necessary, wouldn't really be like the type of breaking change you expect in a major version, it would more be like a parallel language syntax. It'd be more like rewriting PHP into a different language with very similar semantics, but very different idea of what the language means underneath. Because fundamentally, in order to do that, typing could no longer be optional anywhere. It would have to be every variable, every piece of data, every function, would have to have an explicit type that the developer is able to control and modify and mutate as they want. I don't even know if type juggling would be possible with the kind of change that would be necessary. And that's, that's half of what PHP consider PHP, so. Derick Rethans 22:55 Definitely not requiring types and places. Because I think, as I said, I think you'll have a hard time convincing people to go that way. Jordan LeDoux 23:02 Which is the reason I didn't consider going that way, really, so. Derick Rethans 23:06 Would you have anything else to add that I'm, that we missed discussing this RFC? Jordan LeDoux 23:10 This RFC is really interesting to me personally, just from an intellectual perspective. I think that for a lot of users, there's not a lot of use cases where you would use it in your own programs. If you did, it would end up being in like the very core base systems, and only in a few places. In those places, it might really shine, it might be something that's absolutely incredible. Most places in most programs you would never be using, well, you would never be using this type. It's much more critical for some of the internal features things like array access, that interface, and the typing that it requires for parameters. The typing for that's pretty broken. This could be a way that we could fix that possibly, and a couple of the other internal engine features that are implemented through interfaces and things like that would also probably be helped quite a bit by this. Derick Rethans 24:10 Thank you very much for taking the time today to talk about the Never For Parameter Types RFC. Jordan LeDoux 24:16 Thank you for having me. Derick Rethans 24:20 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of a PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Never For Parameter Types Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 92: First Class Callable Syntax London, UK Thursday, July 22nd 2021, 09:20 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "First Class Callable Syntax" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 92. Today I'm talking with Nikita Popov about a first class callable syntax RFC that he's proposing together with Joe Watkins. Nikita, would you please introduce yourself? Nikita Popov 0:36 Hi, Derick. I'm Nikita and I am still working at JetBrains. And still working on PHP core development. Derick Rethans 0:43 Just like about half an hour ago when we recorded an earlier episode. Nikita Popov 0:47 Exactly. Derick Rethans 0:48 This RFC has no relation to read only properties. What is the first class callable syntax RFC about? Nikita Popov 0:55 The context here is that PHP has the callable syntax based on literals, which is that if you just use a plain string, it's interpreted as a function name, and an array where the first element is an object, and the second one is a method name, that's methods. Or the first element is the class name, and the second one is method name, that's a static method. Derick Rethans 1:17 I would consider this concept a bit of a hack, especially the the one with the arrays, and I reckon you feel similar and hence this RFC? Nikita Popov 1:27 Yes, I do. So the current callable syntax has a couple of issues. I think the core issue is that it's not really analysable. So if you see this kind of like array with two strings inside it, it could just be an array with two strings, you don't know if that's supposed to actually be a static method reference. If you look at the context of where it is used, you might be able to figure out that actually, this is a callable. And like in your IDE, if you rename this method, then this array should also be this array element will also be renamed. But there's like a lot of complex reasoning that the static analyser has to perform. That's one side of the issue. The second one is that callables are not scope independent. For example, if you have a private method, then like at the point where you create your callable, like as an array, it might be callable there, but then you pass it to some other function. And that's in a different scope. And suddenly that method is not callable there. So this is a general issue with both like this callable syntax based on arrays, and also the callable type. It's a callable at exactly this point, not callable at a later point. This is what the new syntax essentially addresses. So it provides a syntax that like clearly indicates that yes, this really is a callable, and it performs the callable callability check at the point where it's created, and also binds the scope at that time. So if you pass it to a different function in a different scope, it still remains callable. Derick Rethans 3:01 And it's guaranteed to always be callable. Nikita Popov 3:03 Yeah, exactly. Derick Rethans 3:04 What does the syntax like? Nikita Popov 3:06 The syntax is the funny bit. As a bit of context. This proposal was created as an alternative or as a subset of the partial function application RFC. Derick Rethans 3:17 That is just as hard to pronounce as first class callable syntax RFC. Nikita Popov 3:21 Yes, that's why we say PFA. The PFA RFC has a more general feature. It also allows you to create a reference to a callable as a side effect. But more generally, it allows you to also bind some of the arguments to a fixed value. And has like finer control over for example, you can create a callable that has three required parameters, by passing three question mark arguments. While the new syntax only allows you to use the signature of the original function. But the syntax between both of those is compatible. So the new RFC is a subset of PFA. And that's why it uses the syntax where you do a normal function call, but then pass three dots or an ellipsis as arguments. Derick Rethans 4:08 Instead of passing the function's or method's normal arguments, you use the three dots. Nikita Popov 4:14 I think like the way to think about the syntax is that this is similar to like a variadic argument, or to the argument unpacking syntax, just that the arguments haven't yet been provided, they will be provided during the actual call. But I think the syntax was definitely the most contentious bit in the discussion of the RFC. I think this is mainly related to the fact that if you the see this code snippet, it looks a bit like, like the example code where the arguments haven't been filled in. While now this is like actual syntax. Derick Rethans 4:44 I'm sure there's quite a few tutorials out there explaining how PHP works by using dot dot dot. That is not something you can avoid. Nikita Popov 4:54 Well, we can avoid it, but it's fairly tricky question. I mean, the reason for this dot dot dot syntax, on one hand, this the compatibility with partial functions. I mean, the PFA, RFC has recently been declined. But in the future, we could extend the current syntax to full partial functions. And we would not end up with two different ways. So that's one benefit of the syntax. But the other part is that PHP has different symbol tables for different kinds of symbols. People often ask, why can't you just write like strlen as a plain name, not inside a string, and have that be treated as a reference to this function? And the answer to that is that we can't do that because you can't have a constant that's called strlen. Normally, that would be reference to constant and the same actually applies to all other callable types as well. So if you have something like methods, like object or method name, that would right now be interpreted as a property access. And for static methods, it will be interpreted as a as a class constant access. So we have this ambiguity here. Even if we add an additional symbol to this, for example, like for classes, we have the syntax, class name, and then scope operator class, that gives you the class name. We could do something like strlen, scope operator function, or fn, or whatever, and have that return the callable. That would work, but it also has some ambiguities. For example, if you have something like object, arrow methods, and then scope operator fn, you have this ambiguity. Is this referencing the method of that name? Or is it referencing a callable stored inside the property of that name? This is like fundamentally ambiguous. The way we would resolve it is we will just say that this index is only usable with real simple, so it will always refer to a method, and you couldn't use the syntax to convert the callable stored in a property into a proper callable. I'm actually not sure how I should distinguish these two concepts, because we have the existing callable, strings and arrays, and the first class callables, which are really closure objects. Derick Rethans 7:11 Which actually sort of brings me to the next question which just popped in my head, which is: Does this first class scalable syntax, what is returned as return a closure or an existing callable type as we have now, with a callable type being a single string, or this array syntax that we now use. Nikita Popov 7:28 The syntax returns a closure. Actually, the syntax works essentially the same way as the closureFromCallable method. And we do need to return a closure otherwise, we don't get this behaviour where the scope is bound at the time where the callable is created, rather than called. I think maybe going forward, I would generally recommend that people use a closure type, instead of a callable type in type declarations. I mean, you already cannot use callable for property types. Exactly due to this problem that callability is context dependent. While we only forbid it in property types, the same general problem also exists for argument and return types. And especially with the new syntax being introduced here, I think it's best to use closure instead of callable in the future. Derick Rethans 8:18 Does that sort of mean that first class scalable syntax is syntactic sugar? Or does it do more than the closureFromCallable method? Nikita Popov 8:27 No, I think it's effectively just syntactic sugar for closureFromCallable. Derick Rethans 8:34 I'm actually not sure whether Xdebug is be able to do anything with these closure from callable things to begin with. So that is something I'm going to have to investigate. Nikita Popov 8:44 Be able as in like, display that it actually refers to a specific method rather than just some kind of closure? Derick Rethans 8:51 Yeah, because at the moment, it shows you the file name and the line numbers, it doesn't have a name right if you create normal closures, but in this case, it's important to know that it actually refers to specific methods, which is the same thing as the closureFromCallable syntax would also do, but I've never done anything with that. Nikita Popov 9:10 But I think there is a way to get like the underlying prototype for the closure, and you should be able to determine it from there. Derick Rethans 9:18 The first class callable syntax, are there situations where you can't use it? Nikita Popov 9:22 One place where you don't want to use the new syntax as if you don't want to actually create a closure object, and validate callability at the point of creation. For example, creating this first class callable also implies that you have to autoload the class for a static method. If you have some kind of like large definition of of handler, of static handler methods for routes or something like that, then using the first class callable syntax would imply that you have to immediately create closure objects for all of these and immediately load all those classes. That's a use case where you might want to stick with the old syntax. Derick Rethans 10:01 But wouldn't opcache resolve that issue really? Nikita Popov 10:04 No, opcache is really exactly the reason why you wouldn't want to do that. For example, for my fastroute library, I cache all the data as a static array. And that's something that OpCache can cache very efficiently because it's in shared memory and accessing it is essentially zero cost. If you include something like first class callables in it, then those have to always be created at runtime, because we don't have concept like, like a persistent object. That means that this can no longer, I mean, the whole script can be in shared memory, but it still has to be executed always at runtime to construct the whole data structure. And that's going to be less efficient. To give a more clear answer to your question is that the first class callable syntax has a cost when creating the callable, and if you are in a situation where avoiding that cost is really critical for performance, that's why you wouldn't want to use it. Derick Rethans 11:00 And instead you'd have to use the old scalable syntax that we already have. Nikita Popov 11:03 Exactly. So for that reason, I think that the old syntax is not going to be removed in the near future at least, though maybe we can deprecate certain aspects of it. For example, the syntax also allows you to do highly context dependent things like referencing self, which is even worse than the situation with a private method, because self could refer to something different every time you call it. Those are some things we might want to deprecate early, but the main syntax itself was probably going to stay for a while. Derick Rethans 11:34 Because callability is checked when you create the closures does that mean it also checks for strictness then? If your PHP file has been declared with strict types? Nikita Popov 11:45 Strictness is handled the same as with closureFromCallable.The strictness is still determined at the time where the call is made, not where the callable was created, which actually, I am not a fan of how PHP handles strict types together with dynamic calls. But that's like a pre existing problem. And this isn't touching on this. Derick Rethans 12:06 The language has many issues that probably could have been done better if it was designed from scratch. But that ship has sailed 26 years ago. Nikita Popov 12:15 The strict types are not quite that old. Derick Rethans 12:17 No, that is true. The language itself is of course. Derick Rethans 12:23 Okay, thank you very much then, for taking the time this morning to talk to me about first class scalable syntax. Nikita Popov 12:29 Thanks for having me, Derick. Derick Rethans 12:30 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: First Class Callable Syntax RFC: Partial Function Application Episode #89: Partial Function Applications Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 91: is_literal London, UK Thursday, July 15th 2021, 09:19 BST In this episode of "PHP Internals News" I chat with Craig Francis (Twitter, GitHub, Website), and Joe Watkins (Twitter, GitHub, Website) about the "is_literal" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 91. Today I'm talking with Craig Francis and Joe Watkins, talking about the is_literal RFC that they have been proposing. Craig, would you please introduce yourself? Craig Francis 0:34 Hi, I'm Craig Francis. I've been a PHP developer for about 20 years, doing code auditing, pentesting, training. And I'm also the co-lead for the Bristol chapter of OWASP, which is the open web application security project. Derick Rethans 0:48 Very well. And Joe, will you introduce yourself as well, please? Joe Watkins 0:51 Hi, everyone. I'm Joe, the same Joe from last time. Derick Rethans 0:56 Well, it's good to have you back, Joe, and welcome to the podcast Craig. Let's dive straight in. What is the problem that this proposal's trying to resolve? Craig Francis 1:05 So we try to address the problem where injection vulnerabilities are being introduced by developers. When they use libraries incorrectly, we will have people using the libraries, but they still introduce injection vulnerabilities because they use it incorrectly. Derick Rethans 1:17 What is this RFC proposing? Craig Francis 1:19 We're providing a function for libraries to easily check that certain strings have been written by the developer. It's an idea developed by Christoph Kern in 2016. There is a link in the video, and the Google using this to prevent injection vulnerabilities in their Java and Go libraries. It works because libraries know how to handle these data safely, typically using parameterised queries, or escaping where appropriate, but they still require certain values to be written by the developer. So for example, when using a query a database, the developer might need to write a complex WHERE clause or maybe they're using functions like datediff, round, if null, although obviously, this function could be used by developers themselves if they want to, but the primary purpose is for the library to check these values. Derick Rethans 2:05 That is a method of doing it. What is this RFC adding to PHP itself? Craig Francis 2:09 It just simply provides a function which just returns true or false if the variable is a literal, and that's basically a string that was written by the developer. It's a bit like if you did is_int or is_string, it's just a different way of just sort of saying, has this variable been written by the developer? Derick Rethans 2:28 Is that basically it? Craig Francis 2:30 That's it? Yeah. Joe Watkins 2:32 It would also return true for variables that are the result of concatenation of other variables that would pass the is literal check. Now, this differs from Google, because they introduced that at the language level, but not only at the language level, at the idiom level. So that when you open a file that's got queries in PHP, commonly, if they're long, basic concatenation is used to build the query and format it in the file so that it's readable. So that it wouldn't really be very useful if those queries that you see everywhere in stuff like PHPMyAdmin, and WordPress, and Drupal and just normal code weren't considered literal, just because they're spread over several lines with the concatenation operator. It's strictly not just stuff that's written by the programmer, but also stuff that was written by the programmer or concatenated, with other stuff that was written by the programmer. Derick Rethans 3:33 Now in the past, we have seen something about adding taint supports to PHP, right? How is this different, or perhaps similar, to taint checking? Craig Francis 3:44 At the moment today, there is a taint extension, which is something you need to go out your way to install, and actually learn about and how to use. But the main difference is that taint checking goes on the basis of say, this variable is safe or unsafe. And the problem is that it considers anything that had been through an escaping function like html_entities as safe. But of course, the problem is that escaping is difficult. And it's very easy to make mistakes with that. A classic example is if you take a value from a user, an SSH SSH, their homepage URL, if you use HTML encoding, and then put it into the href attribute of a link, that can also result in HTML injection vulnerability, because the escaping is not aware of the context which is used. Because if the evil user put in a JavaScript URL, that is in inline JavaScript, that has created a problem because taint checking would assume that because you use HTML encoding it is safe, and all I'm saying is that is it creates a false sense of security. And by stripping out all that support for escaping, it means that you can focus on libraries doing that work because they know the context, they understand the domain, and we can just keep it a much simpler, and much safer approach. Derick Rethans 5:02 Would you say that the is_literal feature is mostly aimed at library authors and not individual developers? Craig Francis 5:09 Yeah, exactly. Because the library authors know what they're doing. They're using well tested code, many eyes over it. The problem libraries have at the moment is that they trust the developer to write things themselves. And unfortunately, developers introduce a lot of injection vulnerabilities with those strings before they even get into the library. Derick Rethans 5:30 How would a library deal with with strings that aren't literal then? Craig Francis 5:35 So it really depends on each individual example. And the RFC does include quite a lot of examples of how each one will be dealt with. The classic one is, let's say you're sorting by a column in a database, because if we're dealing with SQL, the field name might come from the user. But that is also quite a risky thing to do if you start including whatever field name the user wrote. So in the RFC, I've created a very simple example where the developer would create an array of fields that you can sort by, and then whatever the user provides, you search through that array, and you pull out the one that you that matches and is fine. And therefore you are pulling out a literal and including into the SQL. To be fair, these ones are quite unique. And each one needs to be dealt with in its own way. But I've yet to find an example where you can't do it with a literal. Having said that, I think Larry Garfield actually gave an example where a content management system changed its database structure. And the way that would work is the library would have to deal with it, they would receive the value for a field, and then that field would be escaped and treated as a field, it understands it as a field, and it will process it as such, then it can include into the SQL, knowing full well that everything else in that SQL is a literal, and then it can just build up SQL in its own way internally. Derick Rethans 6:58 Okay, talking a little bit about the implementation here. Since PHP seven, we have this concept of interned strings, or maybe even before that actually, I don't quite remember. Which is pretty much a flag on each string and PHP that says, this's been created by the engine, or by coconut. Why would strings have to have an extra flag here to remember that it is created by the programmer? Joe Watkins 7:21 Well, interned does not mean literal. It's an optimization in the engine, should we use strings. We're free to do whatever we want with that. At the moment, it by happenstance, most interned strings are those written by the programmer. If you think about the sort of strings that are written by the programmer, like a class name, when those things are declared internally, by an extension, or by core code, those things are interned as if they were written by the programmer. They don't mean literal, we're free to use interned strings for whatever we want. For example, a while ago, someone suggested that we should intern keys while JSON decoding or unserializing. It didn't happen, but it could happen. And then we'd have the problem of, well, how do we separate out all this other input. There is another optimization attached to interned strings, which is one character strings, where if you type only one character, or you call a Class A or B, or whatever, the permanent interned string will be used. That results in when the chr function is called, that results in the return of that function always being marked as interned. So it would show as literal, which is not a very nice side effect. And that's just a side effect that we can see today. We don't want to reuse the string really, it does need to be distinct. Also, if you're going to concatenate, whether you do it with the VM or a specific function, obviously, you need to be able to distinguish between an interned string and a literal string, which interned means it has a specific life cycle and specific value. And we can't break that. Derick Rethans 9:00 So there are really two different concepts, is what you're saying, and hence, they need to have a special flag for that? Joe Watkins 9:06 Yeah, they're very, two very separate concepts. And we don't we don't want to restrict the future of what interned strings may be used for. We don't want to muddy the concept of a literal. Derick Rethans 9:16 Of course, any sort of mechanism that languages built into solve or prevent injections in any sort of form, there's always ways around it. Theoretically, how would you go around the is_literal checks to still get a user inputted value into something that passes the is_literal check? Craig Francis 9:36 Generally speaking, you would never need it because the library should know how to deal with every scenario anyway. And it's not that difficult. We're only talking about things like in the database world, you'll be taking value from field names and therefore it should receive field names or table names. And, you know, we are providing a guardrail as a safety net. And what should happen is that the default way in which programmers work should guide them, to do it the right way. We're not saying that you can't do weird things to intentionally work around this. A really ugly version, which you should never do, but use eval and var_export together, it's horrible. But if you are so desperate, you need to get around this. That's what we're doing it. But in reality, we can't find any examples where you'd actually need to do this. Joe Watkins 10:22 I would say that, hey, there's this idea that most people writing PHP are using libraries, and they're using frameworks. I don't actually find that to be true. I've been working in PHP for a long time. And most of the big projects I've worked on for a long time did not start out using frameworks. And they did not start out using libraries. They look a bit like that today, but their core, they are custom. There may be a framework buried in there. But there is so much code that the framework is a component and is not the main deal. Most code, we actually do write ourselves, because that's what we're paid to do. I think we don't decide how people are going to use it, and we don't decide where they're going to use it. The fact is, like Craig said, it's a guardrail that you can work around easily. And if you find a use case for doing that, then we shouldn't prejudge, and say, well, that's the wrong thing to do. It might not be the wrong thing to do. For example, an earlier version of the idea included support for integers. We considered integers safe, regardless of their source. If you wanted to do that, in your application, you could do that very easily and still retain the integrity of the guardrail is not compromised. I wouldn't focus on this is for libraries, and this is for frameworks, because these things become so small in the scheme of things that they're meaningless. I mean, most of the code we work on is code that we wrote, it is not frameworks. Derick Rethans 11:48 That also nicely answers my next question, which is what's happened to integers, which have now nicely covered. The RFC talks about that as hard to educate people to do the right thing. And that is_literal is more focused, so to say, on libraries, and perhaps query building frameworks as the RFC alludes to. But I would say that most of these query building tools or libraries already deal with escaping from input value. So why would it make sense for them to start using is_literal if you're handling most of these cases already anyway? Craig Francis 12:24 If you look at the intro of the RFC, there's a link to show examples of how libraries currently receive the strings. And you're right about the Query Builder approach is a risky thing, I would still argue it's an important part. That's why libraries still provide them. Doctrine has a nice example of DQL. The doctrine query language is an abstraction that they've created, which is also vulnerable to injection vulnerabilities. And it gives the developer a lot more control over a very basic API. I still think people should try and use the higher level API's because they do provide a nice safe default, but that depends on which library use, they're not always safe by default. So for example, when you're sort of saying: I want to find all records where field parameter one, is equal to value two, a lot of the libraries assumed that the first parameter there is safe and written by the developer. They can't just necessarily simply escape it as though it's a field because that value might be something like date, bracket, field, bracket, and it's sort of relying on the developer to write that correctly, and not make any mistakes. And that hasn't proven to be the case, you know, they do include user values in there. Derick Rethans 13:43 Just going back a little bit about some of the feedback, because feedback to the RFC has happened for quite some time now. And there were lots of different approaches first tried as well, and suggested to add additional functions and stuff like that. So what's been the major pushback to this latest iteration of the RFC? Joe Watkins 14:01 So I think the most pushback has come from an earlier suggestion that we could allow integers to be concatenated and considered literal. We experimented with that, and it is possible, but in order to make it possible, you have to disable an optimization in the engine, that would not be an acceptable implementation detail for Dmitri. It turns out we didn't actually, we don't need to track their source technically, but it made people extremely uncomfortable when we said that, and even when we got an independent security expert to comment on the RFC, and he tried to explain that it was no problem, but it was just not accepted by the general public. I'm not sure why. Derick Rethans 14:45 All right. Do you have anything to add Craig? Craig Francis 14:48 The explanation given by people is they liked the simpler definition of what that was as if it's a string written by the developer. Once you start introducing integers from any source, while it is safe, it made people feel, yeah, what is this. And that's where we also had the slight issue because we had to find a new name for it. And I did the silly thing of sort of asking for suggestions, and then bringing up a vote. And then we had, I think it's 18 to three people saying that it should be called is_trusted, and you have that sinking moment of going, Oh, this is going to cause problems, but hey, democracy. It creates that illusion that it's something more. So that's why we sort of went actually, while I like Scott's idea of having the idea of maybe calling it is_noble. It is a vague concept, which people have to understand. And it's a bit strange. Whereas going back to the simpler, original example, they've all seem to grasp grasp of that one. And we could just keep with the original name of is_literal, which I've not heard any real complaints about. Derick Rethans 15:53 I think some people were equivalenting is_trusted with something that we've had before in PHP called Safe mode, which was anything but of course. Craig Francis 16:02 Yes, no, definitely. Derick Rethans 16:03 We're sort of coming to the end of what to chat about here. Does the introduction of is literal introduce any BC breaks? Craig Francis 16:11 Only if the user land version of is_literal, which I'm fairly sure is going to be unlikely. So on dividing their own function called that. Derick Rethans 16:18 Did you check for it? Craig Francis 16:20 Yes. Derick Rethans 16:21 So if you haven't found it, then it's unlikely to to exist. Craig Francis 16:24 There are still private repositories, we can't shop through all their show, check through all their code. But yeah. Derick Rethans 16:29 Did I miss anything? Craig Francis 16:31 We covered future scope, which is the potential for a first class type, which I think would be useful for IDs and static analysers. But this is very much a secondary discussion, because that could build on things like intersection types, but we still need to focus on what the flag does. And there's also possibility of using this with the native functions themselves, but we do have to be careful with that one, because, you know, we got things like PHPMyAdmin. We have to be able to make the output from libraries as trusted because they're unlikely to still be providing a literal string at the end of it. So that's a discussion for the future. And the only other thing is that, you know, the vote ends on the 19th of July. Derick Rethans 17:08 Which is the upcoming Monday. How is the vote going? Are you confident that it will pass? Craig Francis 17:13 Not at the moment, we're sort of trying to talk to the people who voted against it. And we've not actually had any complaints as such. The only person who sort of mentioned anything was saying that we should rely on documentation and the documentation is already there. And it's not working. I think a lot of people just voted no, because they just sort of going well, that's the safe default. I don't think it's necessary. Or, you know, I'd like the status quo. And we still are trying to sell the idea and say: Look, it's really simple. It's not really having a performance impact. And it can really help libraries solve a problem, which is actually happening. Derick Rethans 17:46 Is this something that came out of the people that write PHP libraries or something that you came up with? Craig Francis 17:52 So I've come gone to the library authors and suggested you know, this is how Google do it. Would you like something similar? And we've certainly had red bean and Propel ORM saw show positive support for that. And I've also talked to Matthew Brown, who works on the Psalm static checking analysis. He's very positive about it, so much so that Psalm now also includes this as well. Obviously, static analysis is not going to be used by everyone. So we would like to bring this back to PHP so that libraries can use it without relying on all developers using static analysis. Derick Rethans 18:25 Thank you very much. Glad that you were both here to explain what this is_literal RFC is about. Craig Francis 18:31 Thank you very much, Derick. Joe Watkins 18:33 Thanks for having us. Derick Rethans 18:37 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: is_literal Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 90: Read Only Properties London, UK Thursday, July 8th 2021, 09:18 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Read Only Properties" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 90. Today I'm talking with Nikita Popov about the read only properties version two RFC that he's proposing. Nikita, would you please introduce yourself? Nikita Popov 0:33 Hi, Derick. I'm Nikita and I do PHP core development work by JetBrains. Derick Rethans 0:39 What does this RFC proposing? Nikita Popov 0:41 This RFC is proposing read only properties, which means that the property can only be initialized once and then not changed afterwards. Again, the idea here is that since PHP 7.4, we have typed properties. A remaining problem with them is that people are not confident making public type properties because they still ensure that the type is correct, but they might not be upholding other invariants. For example, if you have some, like additional checks in your constructor, that string property is actually a non empty string property, then you might not want to make it public because then it could be modified to an empty value for example. One nowadays fairly common case is where properties are actually only initialized in the constructor and not changed afterwards any more. So I think this kind of mutable object pattern is becoming more and more popular in PHP. Derick Rethans 1:35 You mean the immutable object? Nikita Popov 1:37 Sorry, immutable. And read only properties address that case. So you can simply put a public read only typed property in your class, and then it can be initialized once in the constructor and you can be... You don't have to be afraid that someone outside the class is going to modify it afterwards. That's the basic premise of this RFC. Derick Rethans 1:57 But it also means that objects of the class itself can modify that value any more, either. Nikita Popov 2:01 Exactly. So that's, I think, a primary distinction we have to make. Genuinely, there are two ways to make this read only concept work. One is like actually read only or maybe more precisely init once, which is what this RFC proposes. We can only set that once and then even in the same class, you can't modify it again. And the alternative is the asymmetric visibility approach where you say that, okay, only in the public scope, the property can only be read, but in the private scope, you can modify it. I think the distinction there is very important, because read only property tells you that it's genuinely read only, like, if you access a property multiple times in sequence, you will always get back the same value. While the asymmetric visibility only says that the public interface is read only, but internally, it could be mutated. And that might like be, you know, intentional, just that you want to like have your state management private, but that the property is not supposed to be immutable. Derick Rethans 3:05 How's this RFC different from read only properties, version one? Nikita Popov 3:09 Read only properties version one was called write once properties. I think the naming is kind of one of the more important differences. The new RFC is also effectively write once, but I think it's really important to view it from an API perspective as read only because that's what the user gets to see. While write once gives you this impression, that is know that you can externally from outside the class like passing the value once I know like dependency injection, that is what they would think of when they hear write ones. And from the technical site, there is a related difference. And that difference is that new RFC only allows you to initialize read only properties inside the class scope. That means if you do something really weird, like leaving a property uninitialized in the constructor, it's not possible for someone outside the class to initialize it instead, just like an extra safety check. Of course, you can, as usual bypass that, like if you're writing a serializer or hydrator, you can use reflection to initialize it outside the class. But normally you won't be able to. Derick Rethans 4:13 Does that mean that these read only properties can also be initialized from a normal method instead of just from the constructor? Nikita Popov 4:19 That's true. Yes, that's possible. Derick Rethans 4:21 So the RFC talks about that a read only property cannot be assigned from outside a class. Does that mean it can be set by a different object of the same class? Nikita Popov 4:29 Yeah, that's how scoping works in PHP. Scoping is always class based, not object based, it's a common misconception that if you have like private scope, you can access different objects of the same class. Derick Rethans 4:42 That was a surprise to me the first time I ran into that, but once you know, it's obvious that it's should work all over the place right. Now, the RFC states that you can only use read only with typed properties. Why is that? Nikita Popov 4:55 So this is related to the initialisation concept, that typed introduced. Typed properties start out, if you don't give them a default value, they start out in an uninitialized state. And we reuse that state for read only properties. You can only assign to the property while it's uninitialized. And once it's initialized, you cannot assigned to it any more or even unset it back to an uninitialised state. For non typed properties, you also can get into this uninitialized state by explicitly unsetting it. But the problem is that this is not the default state. Untyped properties always have no default value, even if you don't specify one, which effectively means that these properties are always initialized. So they will be kind of useless if you used read only with them. Which is why we make this distinction to avoid any confusion. And if you want to use an untyped read only property, you do that by using the mixed type, which is the same but has the initialisation semantics of typed properties. Derick Rethans 5:56 What would that mean if you have say, a resource or class typed property with a read only keyword? Can you not read or write to a resource any more? Or modify properties on an object, that is the value of a read only typed property? Nikita Popov 6:12 No, you can still modify those, because we have to distinguish the concepts of like exterior and interior mutability here. So objects and resources are... Well, I mean, we often say they are passed by reference, which is not strictly true, because those are not PHP references. But the important part is that they only pass around some kind of handle and you can still modify the inside of that handle. What you can't do is you can't reassign to a different resource or reassign to a different object, it's always the same object, the same resource, but the insides of the objects, those can change. Of course, if your object also only contains read only properties, then you won't be able to change this. Derick Rethans 6:55 Okay, and that answered another question that I have: how it possible to make the whole object read only or on all of its properties? Where the answer is by setting all the properties to read only. Nikita Popov 7:05 We could like add a read only class modifier that makes all the properties implicitly read only but maybe future scope. Derick Rethans 7:13 Is there a reason why read only properties can't have a default value? Nikita Popov 7:18 So this is, again, same issue with initialization. If they have a default value, then they're already initialized. So you can't overwrite it. Like we could allow it, but you just would never be able to change them from the default value, which is something we could allow it just wouldn't be very useful. Derick Rethans 7:36 in PHP, eight zero PHP introduced promoted or constructor promoted properties, I think it's the full name of it. How does this read only property tie in with that? Because can you set the read only flag on a constructor promoted property? Nikita Popov 7:49 Yeah, you can. So that works as expected. Important bit there is that with the promoted properties, you can set the default value. And the reason is that for promoted properties, the default value is not the default value of the property. It's the default value of the parameter. And that works just fine. Derick Rethans 8:07 But again, it would be a constant. Nikita Popov 8:10 No, in that case, it's not a constant, because it's just a default value for the parameter and the code we generate, we just assign this parameter to the property IN the constructor. Derick Rethans 8:19 Which means that if you instantiate the class with different arguments, they would of course, override the default value of the constructor argument value, right? Nikita Popov 8:28 That's right. If you pass it explicitly, then we use the explicit value. If you don't pass an argument, then we use the default of the argument, but it still gets assigned to the property through the like automatically generated code for the promotion. Derick Rethans 8:42 Promotion isn't actually... there isn't actually really a language feature, but more of a copy and paste mechanism. Nikita Popov 8:50 Yes, this is pure bit of syntax sugar. Derick Rethans 8:53 Which sometimes can be handy. But all kinds of interesting things that we added to properties or type system in general, usually inheritance comes into play. Are there any issues here with read only and inheritance? What does it do to variants or traits or things like that, all the usual things that we need to take into consideration? Nikita Popov 9:13 Most of our rules for properties, most of our inheritance rules for properties are invariant, which means that the property in the child class has to basically look the same as the property in the parent class. And this is also true for read only properties. So we say that if the parent property is read only the child property has to be read only, and the same the other direction and for the type as well. So we say that the type of the read only property has to match with the parent property. Those rules are very conservative and like they could theoretically maybe be relaxed in some cases. For read only properties on read only is like kind of return type and return types in PHP are covariant. So one could argue that the type should also be covariant here The problem is that like here, we get into this little detail that read only is really not read only, but init once. So there is this one assignment. And assignment is more like an argument. So is contravariant. So if we made them covariant, then we would basically lie about this one initializing assignment. And we don't like to lie about these things, at least without some further consideration. Derick Rethans 10:24 Read only doesn't change the variance rules at all, which is different from the property accessors RFC we spoke about a couple of months ago. Talking about that this is actually a competing RFC, or do they tie together? Nikita Popov 10:37 Well, I should first say that probably the variance stuff defined in the accessors RFC maybe isn't right, for the same reason and should also like just keep things invariant. But it's more complicated there because it also has abstract properties, or abstract accessors. And then the abstract case, that's where the different variance rules are safe. To get back to your question, they are kind of competing, but could also be seen as just complimentary. So read only is basically a small subset of the accessors feature. I mean, accessors also allow you to implement read only properties as part of a much more general framework. And read only properties cover like only this small but very useful corner in a way that's much simpler. And I think that's not just simpler from a technical perspective, but also simpler to understand for the programmer, because there are a lot of less edge cases involved. Derick Rethans 11:32 Because we're getting pretty close to feature freeze now. Are you still intending to take the property accessors RFC forward for PHP eight one? Nikita Popov 11:41 No, definitely not. Derick Rethans 11:43 But you have taken the read only properties one forwards because we're already voting on it. At the moment, it looks like the vote is going to pass quite easily. So there's that. We don't have to speculate about that, like we had to do with previous RFCs. Do you think I've missed anything talking about read only properties? Or have we covered everything? Nikita Popov 12:01 We've missed one important bit. And that's the cloning. So the read only properties RFC is not entirely uncontroversial. And basically all the controversy is related to cloning. The problem are wither methods as they're used by PSR seven, for example, which are commonly implemented by cloning the original object, and then modifying one property in it. This doesn't work with read only properties, because you know, if you clone the object, then it already has all the properties initialized. So if you try to change something, then you'll get an error that you're modifying read only property. So these patterns are pretty fundamentally incompatible. There are possible solutions to the problem primarily a dedicated syntax that goes into the code name of cloneWith where you clone an object, but override certain properties directly as part of the syntax, which could bypass the read only property checks. The contention here is basically there are some people consider this clone support and these wither methods are very important. Well, I mean, read only properties are about immutable objects. And like PSR seven is a fairly popular read only object pattern, they think that without support for cloning or for withers, the feature loses all value. That's why the alternative suggestions are either to first introduce some more cloning functionality. So like this mentioned, cloneWith or instead implement asymmetric visibility, which does not have this problem because inside the class, you can always modify the property. So it works perfectly fine with cloning. My personal view on this is well I personally use cloning in PHP approximately never. So this is not a big loss for me, and I'm happy for this from my perspective, edge case, to be addressed at a later time. I can understand that other people put more value on this then I do. Derick Rethans 13:55 There doesn't seem to be enough push against that for this RFC to fail. So there is that. Thanks Nikita for explaining the read only properties RFC which will very likely see in PHP eight one. Thanks for taking the time. Nikita Popov 14:08 Thanks for having me Derick, once again. Derick Rethans 14:13 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Read Only Properties RFC: Write Once Properties Episode #44: Write Once Properties Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 89: Partial Function Applications London, UK Thursday, June 17th 2021, 09:17 BST In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Joe Watkins (Twitter, GitHub, Blog about the "Partial Function Applications" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 89. Today I'm talking with Larry Garfield and Joe Watkins about a partial function application RFC that they're proposing with Paul Crevela and Levi Morrison. Larry, would you please introduce yourself? Larry Garfield 0:36 Hello World. I'm Larry Garfield or Crell on most social medias. I'm a staff engineer for Typo3 the CMS. And I've been getting more involved in internals these days, mostly as a general nudge and project manager. Derick Rethans 0:52 And hello, Joe, would you please introduce yourself as well? Joe Watkins 0:55 Hi, I'm Joe, or Krakjoe, I do various PHP stuff. That's all there is to say about that really. Derick Rethans 1:02 I think you do quite a bit more than just a little bit. In any case, I think for this RFC, you, you wrote the implementation of it, whereas Larry, as he said, did some of the project management, I'm sure there's more to it than I've just paraphrased in a single sentence. But can one of you explain in one sentence, or if you must, maybe two or three, what partial function applications, or I hope for short, partials are? Larry Garfield 1:27 Partial function application, in the broadest sense, is taking a function that has some number of parameters, and making a new function that pre fills some of those parameters. So if you have a function that takes four parameters, or four arguments, you can produce a new function that takes two arguments. And those other two you've already provided a value for in advance. Derick Rethans 1:54 Okay, I feel we'll get into the details in a moment. But what are its main benefits of doing this? What would you use this for? Larry Garfield 2:01 Oh, there's a couple of places that you can use partial application. It is what got me interested. It's very common in functional programming. But it's also really helpful when you want to, you have a function that like, let's say, string replace takes three arguments, two of which are instructions for what to replace, and one of which is the thing in which you want to replace. If you want to reuse that a bunch of times, you could build an object and pass in constructor values and save those and then call a function. Or you can just partially apply string replace with the things to search for, and the things to replace with and get back a function that takes one argument and will do that replacement on it. And you can then reuse that over and over again. There are a lot of cases like that, usually use in combination with functions that wants a callback. And that callback takes one argument. So array map or array filter are cases where very often you want to give it a function that takes one argument, you have a function that takes three arguments, you want to fill in those first ones first, and then pass the result that only takes one argument to array map or a filter, or whatever. So that's the one of the common use cases for it. Derick Rethans 3:15 That's the benefits and some of its background comes from functional programming, as you've just mentioned. What is the syntax that you're proposing and some of the semantics? Larry Garfield 3:26 The syntax that we've developed, are two placeholders that you can use in a function call. So if you're calling a function as you normally would, but for one of the arguments, you pass a question mark, or at the tail end, you have an ellipsis (dot dot dot), then that tells the engine: This is not a function call. This is a partial application. And what it will do is return not the result of the function but return a closure object that has the the arguments that correspond to those question marks. And then when called with those arguments, we'll pass those along with the original function. Probably easier to explain, if I use a concrete example, using the string replace example we talked about before, you would call it with str_replace, the example from the RFC, hello, hi, question mark. What that gives you is a callable, a closure that has one argument, which will take its type and name from str_replace. So the third argument to str_replace essentially gets copied into that closure. And what closure does internally when you call it with that one argument is it just calls string replace with hello, hi, and whatever argument you gave it and returns that value. It is conceptually very, very similar to just writing a short lambda or an arrow function that takes one arguments and calls string replace hello, hi, and that argument. In most cases, it ends up functioning almost exactly like that. There's a few subtle differences in a few places. But most of the time, you can think of it working essentially like that. The question mark means one required argument only. The dot dot dot means zero or more arguments, if you want to, say provide the first argument to a function, and then dot dot dot would mean: And then all of the other arguments, however many there are, even if it's that zero, those are what's left, which languages other languages that have partial application as a first class feature, usually end up doing it that way where you can only pre fill from the left. PHP, because the placeholder lets us do it in any order. So we can skip over arguments if we want to, which is quite nice. But it means that you can take a function and reduce it to, I want to prefill just these two arguments and leave these three arguments for the new function, or I want to prefill these arguments from the left, and then everything else, whatever it is, is left. It also lets you do cute things like if you provide all of the arguments to a function, and then just tack on a dot dot dot the end of it, then you get back a closure that takes essentially zero arguments. But when called, will call that other function. So it's lets lets you really easily build a delayed function as you need to. Derick Rethans 6:15 When do the arguments to the function get evaluated then? Larry Garfield 6:18 Arguments are evaluated in advance. So this is the subtle difference between partial application and the short lambda syntax. In a short lambda, what happens is, essentially, that entire expression on the right hand side gets wrapped up into a closure. And so any arguments that are compound like they have a function call that is inside one of the placeholders, or one of the arguments, that'll get evaluated later. With partial application, the function that is in a parameter position gets evaluated first and reduced to a value. And that value gets partially applied to the function. 90% of the time, that's not going to be an issue. There are a few cases where doing it one way or the other may be subtly different, but you'll spot those fairly easily. Derick Rethans 7:02 So the RFC talks about things that you can do, but also a few things that you cannot do or don't want to do yet. What are these things that partials won't support, or run support yet, at least? Larry Garfield 7:13 The main thing that it doesn't support is named placeholders. You can pre fill a value or an argument with a named named argument. But not a named placeholder. Those have to be positional. Named placeholders are complicated to implement, and run into a question of, if you provide those in a different order, does that also change the order of the arguments in the partially applied function that you get back in that closure? And there's a good argument to be made that either way is logical. And so we're like, no, does not deal with it, too complicated. We'll just positional only. And you cannot specify an optional arguments either. It's just again, too complicated. Things get too weird. If you have those advanced cases, use our short lambda, that works just fine. If you want to just make a new function that defers to a new function, and change its API in the process, short lambda works fine. And it's still quite short. Derick Rethans 8:13 I know the RFC talks a little bit about references, but I don't like talking about references. So let's skip that part. In my opinion, they should be removed from the language. But I know we can't. Larry Garfield 8:22 There's occasionally used for them. But very occasionally. Derick Rethans 8:25 There's a bunch of technical things that I also want to chat about. And hopefully, Joe, if you want to fill in, I'd be more than welcome to hear your opinions on these things. But the first one is that PHP has this thing called func_get_args. How does that work with these partials? How does that tie in together? Joe Watkins 8:42 It should mostly behave as if you've invoked the function directly. We don't want there to be a huge discrepancy between. The callee know whether they've been called through partial application or complete application. It should be the same. Derick Rethans 8:58 That is good to know. I mean, I always like it how things work as people expect them to work, right? Joe Watkins 9:03 Yeah. Derick Rethans 9:04 We already have used the dot dot dot operator for variadics. But you're reusing the dot dot dot, or ellipses, as you more eloquently call it earlier. Here again, as well, is that not going to cause issues? Or does that tie in well together? Joe Watkins 9:18 Well, there's quite a lot of debate about what's the right symbol to use. I think it's dot dot dot, and I think Larry agrees with me. But there's some people who want to stick an extra question mark on the end, which to me looks like it reads zero to one. And to Larry, it looks like an extra character that's just not needed. Other people say it makes sense for them. But if you can type three characters and not four, I mean, you need a really good argument. The arguments that have been put forward so far don't really make very much sense for me. Maybe we should ask that question and it doesn't really matter. In the end, what the syntax is, is if it's a difference between it getting in and not getting in, then we'll just put the extra question mark on there. I don't really have a really good argument to change it like to be like that. Derick Rethans 10:05 To be honest, to me, it looks like you then have two placeholders. Joe Watkins 10:09 Yeah. Derick Rethans 10:10 I don't feel the need for it. Joe Watkins 10:11 That's also another argument because we've introduced this one symbol, and then this other symbol, and then you put them together. And that's two things. I mean, you can't have one and one equals one. Derick Rethans 10:20 Fair enough. The RFC does touch on another quite interesting thing, I think, which is constructors, which it also be able to partially apply. But of course, you've mentioned that, that arguments get applied immediately when you do the substitution, when you do the partial application. But of course, the constructor is a bit weird because a constructor runs immediately after an object has been constructed. So how does that work together with partials? Joe Watkins 10:47 So at first, we made it so like if you invoke a constructor with reflection, and you just invoke it over and over again, it'll invoke it on the same object, you won't get back a new object. It's not the constructor that returns the object, it's the new operator. So first, we had a bit dumb. And we did just like what reflection does. And if you applied to a constructor, you'd get back a closure that just repeatedly invokes the constructor, which is, as Larry called it, quite naive. So we went back and revisited that. And so now it acts like a factory. Every time you invoke the closure return from an application, you get a brand new object, which is more in line with what people expect. And it's also quite cool. It's one of my favourite bits, actually as it turns out. Derick Rethans 11:31 In my opinion, it also makes more sense than then having an apply to the same object over and over again. Whether I'd like it or not, I don't know yet. Joe Watkins 11:39 Oh, the other option is traditional constructors to avoid the surprising behaviour. But that would be just a strange. Larry Garfield 11:45 There are a lot of use cases where you want to take a bunch of values, convert them to objects using an array map, supporting constructors for that makes total sense to me. Derick Rethans 11:54 And I would probably say, though, that I would prefer not allowing it over it applying over the same object over again. You've touched a little bit on some common cases where you want to use this, do you perhaps have some other ideas where this might be really useful? Larry Garfield 12:10 So there's three use cases that we think are probably going to be the lion's share. One is to just use the dot dot dot operator. So you have some function or method call, call it with dot dot dot, and that's it. You prefill nothing, which gives you back a closure that is identical in signature to the function or the method that you're applying it to. Everything we've said about functions applies the methods here as well. Which means we now effectively have a new way to refer to a function or a method and make a callable out of it, that doesn't involve just sticking it into a string. You just say, hey, function called dot dot dot, or an arrow bar, parentheses, dot dot dot, parentheses. And now you can turn any function or method into a callable and pass that around. And it's still, it's not wrapped up into the silly array format, it's still accessible to static analysers and refactoring tools. Hopefully, with this, you will never need to refer to a function name using a string ever again, never refer to a method call as an array of object and method. So that that just is not needed any more in the vast majority of cases. Derick Rethans 13:20 That alone is probably worth having them, maybe. Larry Garfield 13:23 And Nikita had an RFC that was doing just that, and nothing else. It's kind of a junior version of this. I don't think that's necessary, the full full scope here works, and gives us that. The second use case that I think is going to be common are unary functions. That's functions that take a single argument. More to the point, as I mentioned before, a lot of functions take a callback. And that callback needs a single argument, array map, array filter, some validation routines, a lot of other things like that. So it's now stupidly easy to take any arbitrary function or method and turn it into a single parameter function, which you can then pass as a callback to array map, array filter, all these other tools, and it just becomes really easy to pre fill things that way. The third is the other one I mentioned earlier, if you pre fill all the arguments, and then just put a dot dot dot at the very end, which means zero or more, you now have a function that takes no arguments, but calls the original function you specified with all the arguments you specified. This often the case for default values, where I want to have a default value available, but don't want to take the time to compute it in advance because it might be expensive. Whatever function it is that will determine that default value, I just partially apply that and give it all the arguments and I get back a callable. That creating a callable is dirt cheap, but when I actually need that value, I can then call it at that time, but it won't actually get called unless I need it. That's another use case that we expect to be common. There are no doubt others that we haven't thought of, or that will be less common, but still useful. I think this will probably replace a large chunk of the use cases for short lambdas. Not because short lambdas are bad, they're wonderful. But so many of them convert a function to a simpler function. And this gives us an even more compact, more readable syntax for that, with even less extra symbols and flotsam around it. Derick Rethans 15:24 I saw, hopefully as a joke, saying that, instead of using the question mark, we should use dollar sign dollar sign, and then we should call the token name T_BLING. Larry Garfield 15:36 This RFC actually has a storied history. Several years ago, Sara Golemon had proposed porting the pipe operator from Hack to PHP. The pipe operator is an operator available in a lot of different languages that lets you string together a series of functions. So you pass a function, pass an argument into one function, its results you pass to the next function, its results, you pass the next function and so on, which is a good case for unary functions. In Hack's syntax, they don't use a function on the right hand side, they use an arbitrary expression, and then dollar dollar as a placeholder for where to put the value from the left hand side from the previous step. It's the only language that does that. Derick Rethans 16:20 The other language that does it is bison. Larry Garfield 16:23 Or Bison also does that style of? Derick Rethans 16:25 It does something weird like that, yeah. Have a look at the grammar file. Larry Garfield 16:29 I've looked in there. It's scary. So at the time, she didn't actually put an implementation in for it. But there was some discussion about it. I joked that if she wanted to do that, she should call it T_BLING. And she thought it was hilarious, but never went anywhere. A year ago, I started working on a pipe operator RFC that did just the pipe part, but used a callable on the right hand side, instead of an expression, more like F#, and Haskell, and other languages that have a pipe operator. And their main response to that was, we'd like this, this is cool, except that just using short lambdas on the right all the time to make unaries is too ugly. We want partial application first. So I spent a while trying to bribe someone with more experience and knowledge than me to work on partial application. I tried bribing Ilya Tovolo, to do so by working with him on enumerations. And we got enumerations in, but he doesn't have the time to work on partial application. Levi and Paul had already written an RFC for partial application that had no implementation. It's just a skunkworks, essentially. Then a few weeks ago, Joe pops up and starts working on an implementation for partials. And I, to this day, don't know what interested him in it. But I'm very happy about this fact. So as we updated the RFC, I knew that people want a bike shed about syntax. So I threw that in as a joke. I don't think we're actually going to do that. It's just a little inside reference that is now no longer inside. Derick Rethans 17:56 Joe what made you work on partials, then? Joe Watkins 17:58 It's interesting to write. I've had my fun whether it gets in or not. Derick Rethans 18:02 Sometimes that's the case, right? So just working on this is all the fun. Larry Garfield 18:06 Sometimes it's fun to just run down rabbit holes for the heck of it. And sometimes really cool things can come out of that sometimes. Derick Rethans 18:12 At some point, I might have to implement support for partials like I have for closures in Xdebug as well. Because at some point, people might want to debug these things. So I'm a little bit interested in how do these the closures that it generates? Where does it store the already applied arguments? Joe Watkins 18:29 So partials have the same binary struct up to this point or of the closure, and then after that there's some extra fields. Derick Rethans 18:36 Would they still have the names? Joe Watkins 18:39 No, because named arguments aren't actually named, that information is lost. By the time we've got them, we don't have any name information. We've only got their correct position, according to the call that was made. Derick Rethans 18:50 And every argument that hasn't been filled and doesn't have a special placeholder in there, or does it keep track of which ones have been filled in? Joe Watkins 18:56 We've got two special placeholders internally, you won't see as undef or null or anything. Derick Rethans 19:02 Okay, that's good to know. What has the reaction been so far? Larry Garfield 19:05 Slightly positive. There were a lot of discussions early on about do we support argument reordering? And should it use a single placeholder or two separate placeholders? Originally, we had one and realized after a while, that doesn't actually work. There're use cases where that will be confusing. Overall, the feedback has been quite positive, and I fully expect that to pass. Really the only question people are still debating about at this point is ellipsis versus ellipses question mark. Joe Watkins 19:34 Yeah, I think the first version of the RFC was quite well received. Someone said we could document it as to make a partial sprinkle or question mark over it and hope for the best. Derick Rethans 19:44 Oh, that's good to hear. With feature freeze coming not pretty soon now. When do you think you're putting this up for a vote? Larry Garfield 19:51 Probably in the next couple of days. The only question I think is whether we include a second question for which variadic placeholder to use, which syntax/ Or if we just say it's dot dot dot, go away. Other than that it should go to a vote probably before this episode airs. Derick Rethans 20:06 Thank you very much, both of you for taking the time to me today to talk about partials. Larry Garfield 20:11 Thank you again Derick, hopefully see you once more on this season. Joe Watkins 20:15 Thanks Derick, see you soon. Derick Rethans 20:21 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Partial Function Applications Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 88: Pure Intersection Types London, UK Thursday, June 10th 2021, 09:16 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself? George Peter Banyard 0:30 Hello, my name is George Peter Banyard. I work on PHP code development in my free time. And on the PHP Docs. Derick Rethans 0:36 This RFC is about intersection types. What are intersection types? George Peter Banyard 0:40 I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. So traversable and countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both traversable and countable, but then all the classes that you want to be using this fashion, you need to make them implement the interface, which might not be possible if you using a library or other things like that. The other very hacky way of doing it is using reference and typed properties. You assign two typed properties by reference, one being traversable, one being countable, and then your actual property, you type alias reference it, with both of these properties. And then my PHP will check: does the property respect type A those reference? If yes, move to the next one. It doesn't respect type B, which basically gives you intersection types. Derick Rethans 1:44 Yeah, I saw that in the RFC. And I was wondering like, well, people actually do that? George Peter Banyard 1:49 The only reason I know that is because of Nikita's slide. Derick Rethans 1:51 The thing is, if it is possible, people will do it, right. And that's how that works. George Peter Banyard 1:56 Yeah, most of the times. Derick Rethans 1:57 The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here? George Peter Banyard 2:05 So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have like union types take precedence over intersection types, but some people don't like that and want to explicit it grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind boggling. Derick Rethans 2:44 I'm sure we'll get into the variance rules in a moment. What is it actually what you're proposing to add here. What is the syntax, for example? George Peter Banyard 2:52 So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing and. Derick Rethans 3:01 When you say class types, do you also mean interfaces? George Peter Banyard 3:04 Yes, PHP has a concept of class types, which are mostly any class in any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed. Derick Rethans 3:20 Okay, so it's just the classes that you've defined and the class that are part of the language but not a special keywords, self and parent and static, I suppose? George Peter Banyard 3:28 Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some of the built in types, which can be kind of true. You could have a callable, which is a string, because callables can be arrays, or can be a closure. But that's like very weird and not very great. The other one is iterable. If when you expand that out, you get redundant types, which we can talk about later. And the final thing is parent, self, and static, just makes for some very weird design questions, in my opinion, like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask a parent, but it's like, if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that main question. Why? Because I don't see any good reasons to do it. And it just makes everything harder. Derick Rethans 4:40 You've only added for the sake of completeness instead of it being useful. Let's move on birds. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types? George Peter Banyard 4:56 Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error. This only affects compiled time known aliases. If you use a use statement, then PHP knows that you basically using the same type. However you use a runtime alias, then it can't detect that. Derick Rethans 5:13 A runtime alias, what's that? George Peter Banyard 5:15 So if you use the function class_alias. Derick Rethans 5:16 It's new to me! George Peter Banyard 5:18 it technically exists. It also doesn't guarantee basically that the type is minimal, because it can only see those was in its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if classes are defined in different files. So the type isn't minimal. But if you do redundant types, basically, it's a easy way to check if you might be typing a bug. Derick Rethans 5:46 You try to do your best to warn people about that. But you never know for certain. George Peter Banyard 5:51 You never know for certain because PHP doesn't compile everything into like one big program like in check. Static analyser can help for that. Derick Rethans 5:59 Let's talk a little bit about technical aspects, because I recommend that implementing intersection types are quite different from implementing union types. What kind of hacks that you have to make in a parser and compiler for this? George Peter Banyard 6:11 Our parser has being very weird. The parsing syntax should be the same as union types. So I just copy pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift reduce conflict, which basically tells: Oh, I got two possible states I can go in, and I don't know which branch I need to go, because the PHP parser only does one look ahead. Because it was conflicting, the ampersand, either for the intersection type or for to mark a reference. Normally, if the paster is more developed, or does more look ahead, it is not a conflict. And it shouldn't be. Ilia managed to came up with this ingenious idea, which is just redefine the ampersand token twice and have very complicated names, and just use them in different contexts. And bison just: now I have no issue. It is the same token, it is the same character. Now that you have two different tokens it manages to disambiguate, like it's shift produce. So that's a very weird. Derick Rethans 7:17 I'll have a look at what that actually does, because I'm curious now myself. Beyond the parser, I think the biggest and most complicated part of this is implementing the variance rules for these intersection types. Can you give a short summary of what a variance rules are, and potentially how you've actually implemented them? George Peter Banyard 7:38 Since PHP seven point four, return types and up covariant, and parameter types are contravariant. Covariant means you can like restrict, we can be more specific. And contravariance means you can be broader or like more generic. Union types already gives some interesting covariance implications. Usually, you would think, well, a union is always broader than a single type, you say: Oh, I want either a traversable or accountable, it seems that you're expanding the type sphere. However, a single type can have as a subtype, a union type. For example, you say,:Oh, my base type is a Class A, and I have two child classes, which are B and C. I can type covariantly that I want either B or C, because B or C is more specific than just A. That's what union types over there allows you to do. And the way how it's implemented. And how to check for that is you traverse the list of child types, and check that the child type is an instance of at least one of the parents types. An intersection by virtue of you adding constraints on the type itself will always be more specific than just a single type. If you say: Oh, I want a class A, then more specifically, so I want something of class A and I want it to be countable. So you're already restrict this, which gives some very interesting implications, meaning that a child type can have more types attached to itself than a parent type. That's mostly due how PHP implements its type system, to make the distinctions, basically, I've added the flag, which is either this is a union, meaning that you need to check it is part of one, or it's an intersection. The thing with intersection types is that you need to reverse the order in how you check the types. So you basically need to check that the parent is at least an instance of one of the child types, but not that none of the child types is a super type of the parent type. Let's say you have class C, which extends Class B and Class B extends Class A. If I say let's say my base type is B to any function, and I give something which is a intersection T, any interface, this would not be a valid subtyping relation to underneath B. Because if you looked it was a Venn diagram in some sense, you've got A which is this massive sphere, you've got B which is inside it, and C which is inside it. A intersection something intersects the whole of A with something else, which might also intersect with B in a subset, but it is wider than just B, which means like the whole variance is very complicated in how you check it because you can't really reuse the same loop. Derick Rethans 10:13 I can't imagine how much more complicated this gets when you have both intersection and union types in the same return type or parameter argument type. George Peter Banyard 10:22 One of the primary reasons why it's currently not in the RFC, because it is already mind boggling. And although I think it shouldn't be that hard to like, add support for it down the line, because I've already split it mostly up so it should be easy to check: Oh, is this an intersection? Is this a union? And then you need to branch. Derick Rethans 10:42 Luckily because standard types aren't included here, you also don't really have to think about coercive mode and strict mode for these types. Because that's simply not a thing. George Peter Banyard 10:50 That's very convenient. Derick Rethans 10:52 Is the future scope to this RFC? George Peter Banyard 10:54 The obvious future scope is what I call composite types, is you have unions and intersections available in the same type. The main issue is mostly variance, because it's already complicated, adding more scope to it, it's going to make the variance go even harder. I think with most programming languages, the variance code is always complicated to read. While I was researching some of it, I managed to hit a couple of failures, which where with I think was Julia and the research paper I was it was just like focusing on a specific subset. And like, basically proving that it is correct. It's not a very big field. Professors at Imperial, which I've talked to, have been kind of helpful with giving some pointers. They mostly work with basically proper languages or compiled languages, which have this whole other set of implications. Apparently, they have like a bunch of issues about how you normalize the types like in an economical form, to make it easier to check. Which is probably one of the problems that will need to be addressed, when you get like such a intersection and union type. First, you normalize it to some canonical form, and then you work with it. But then the second issue is like how do you want the composite types to actually be? Is it oh, you have got parentheses when you want to mix and match? Or can you use like union precedence? I've heard both opinions. Basically, some people are very dead against using Union as a precedent. Derick Rethans 12:14 My question is going to be, is this actually something people would use a lot? George Peter Banyard 12:21 I don't think it would be used a ton. The moment you want to use it, it is very useful. One example is with the PSRs, the HTTP interfaces. Or if you want the link interface. Combining these multiple things gets it convenient. One of the reasons why I personally wanted as well, it's for streams. So currently, streams don't have any interface, don't have any classes. PHP basically internally checks when you call like certain string methods. For example, if you try to seek and you provide a user stream, it basically checks if you implement a seek method, which should be an interface. But you can't currently do that. Ideally, you would want to stream maybe like a base class, instead of having like a seekable stream, and rewindabe stream, or things like that. You basically just have interfaces. And then like if somebody wants a specific type of stream, just like a stream, which is seekable, which is rewindable. And other things. We already have that in SPL because there's an iterator. And we have a seekable iterator interface, which basically just ask: Oh, this is there's a seek method. I think it depends how you program. So if you separate the many things into interfaces, then you'll probably use intersections types a lot. If you use a maybe a more traditional PHP code base, which uses union types a lot. Union types are like going to be easier. And you want to reduce that. Derick Rethans 13:32 Would you think that lots of people already use union types because it's pretty new as well. Isn't it? George Peter Banyard 13:38 Union types are being implemented in various different libraries. PSRs are updating the interfaces to use union types. One use case, I also have a special method, which was taken the date, it takes a union of like a DateTime interface, a string or an integer. Although intersections types are really new, you hear people when union types were being introduced, you heard people saying, I would promote bad cleaning habits, you shouldn't have one specific type. And if you're using a union, you have a design issue. And I had many people complaining to me why and intersection types of see? Why they haven't intersection types being introduced first, because intersection types are more useful. But then you see other people telling us like, I don't see the point in intersection types. Why would you use an intersection type, just use your concrete class, because that's what you're going to type anyway. Derick Rethans 14:21 I can give you a reason why union types have implemented first, over intersection types, I think, which is that it's easier to implement. George Peter Banyard 14:28 It's easier to implement. And it's more useful for PHP as a whole, because PHP functions accepts a union or return a union. Functions return false for error states instead of null. It makes sense why union types were introduced first, because they are mostly more useful within the scope of what PHP does. Derick Rethans 14:46 Do you think you have anything else to add about intersection types? At the moment, it's already up for voting, when is that supposed to end? George Peter Banyard 14:54 So the vote is meant to end on the 17th of June. Derick Rethans 14:57 At the moment I see there's 15 votes for and two against so it's looking good. What's been your most pushback on this? If there was any at all? George Peter Banyard 15:05 Mostly: I don't see the point in it. However, I do think proper reasons why you don't want it, compared to like some other features where it's more like have thoughts on what you think design wise. But it is undeniable that you you add complexity to the variance. And to the variance check. It is already kind of complicated. I have like a hard time reading it initially. There's the whole parser hackery thing, which is kind of not great. It's probably just because we use like a restricted parser because it's faster and more efficient. Derick Rethans 15:36 I think I spoke with Nikita about parsers some time ago and what the difference between them were. If I remember which episode it was all the to the show notes. George Peter Banyard 15:44 And I think the last reason against it is that it only accepts pure intersections. You could argue that, well, if you're adding intersections, you should add the whole feature set. It might impact the implementation of type aliases, because if you type alias T to be a union of A and B, and then you use type T in an intersection, you basically get a mixture of unions and intersections, that you need to be able to work with. The crux of this whole feature is the variance implementation. And being able to rationalize the variance implementation and been to extend it, I think it's the hardest bit. Derick Rethans 16:18 I guess the next thing still missing would be type aliases, right? Like names for types, which you can't define just yet, which I think you also mentioned in the RFC is future scope. George Peter Banyard 16:29 Yeah. Derick Rethans 16:30 Thank you, George, for taking the time today to talk to me about pure intersection types. George Peter Banyard 16:36 Thanks for having me on the show. Derick Rethans 16:41 Thank you for listening to this installment of PHP internals news, the podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Pure Intersectio Types Episode #66: Namespace Token, and Parsing PHP GLR Parser LALR(1) Parser Iter Library Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 87: Deprecating Ticks London, UK Thursday, June 3rd 2021, 09:15 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself. Nikita Popov 0:34 Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains. Derick Rethans 0:40 Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks? Nikita Popov 0:46 Ticks are a declare directive,. You write declare ticks equals one at the top of your file, and then PHP we'll call a tick function after every statement execution. Or if you write ticks equals two, then as we'll call it the function after every two statement executions. Derick Rethans 1:05 Do you have to specify which function that calls? Nikita Popov 1:08 Of course, so there is also a register tick function and unregister tick function and that's how you specify the function that should be called rather the functions. Derick Rethans 1:17 How does this work, historically, because the RFC talks about the change being made in PHP seven? Nikita Popov 1:22 Technically ticks work by introducing an opcode after every statement that calls the tick function depending on current count. The difference that was introduced in PHP seven is to what the tick declaration applies. The way PHP language semantics are supposed to work, is that declare directives are always local. The same way that strict types, only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work out way. So if you had declare ticks, somewhere in your file, it would just enable ticks from that point forward. If you included the different file or even if the autoloader was triggered and included a different file that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file local, but that also means that the ticks functionality at that point behaviour became, like, not very useful. Because usually if you want to use tics you actually want them to apply it to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to like instrument, a whole application with ticks, and essentially restore the behaviour we had in PHP 5. Derick Rethans 3:03 What was the intended use case for ticks to begin with? Nikita Popov 3:07 Well I'm not sure what was the intended use case, but at least it was the main use case, and that's signal handling. In the PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signals are only allowed to call functions without that our async signal safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set the flag that okay signal has arrived and then we have to actually run the signal handler at some later point in time. In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an attentive mechanism was introduced, that is based on virtual machine interrupts. Those were originally introduced for time-out handling, because there we have a similar problem, that when timeout arrives, we might be in some kind of inconsistent state, like the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked, at some point. So that you can't like go in an infinite loop, that ends up never checking. These points are basically function calls, and jumps that go higher up in the function, PCNTL signals can now use the same mechanism. If you call PCNTL async signals true, then those will also set the interrupt flag, and execute the signal handler on the next opportunity. The next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean we already, we already have to do these checks for the interrupt like anyway, adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be like executed on every instruction or at least regularly, and that does add significant cost. Derick Rethans 5:28 Execution time itself because it's an opcode that needs to be executed. Nikita Popov 5:32 Exactly. Derick Rethans 5:33 So what are you proposing to do but the ticks in PHP eight one then? Nikita Popov 5:36 I want to deprecate that. So both the declared directive itself, and the register tick function, unregister tick function. Derick Rethans 5:44 How could users emulate the same behaviour as ticks allows them to do so now? Nikita Popov 5:49 That's a good question. As I mentioned, if the use case is, use case of ticks was signal handling, then by using async symbols. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of you know stream wrapper instrumentation, and I didn't expect that people will be crazy enough to use something like that in production. Derick Rethans 6:21 Hopefully they catch these rewritten files? Nikita Popov 6:23 Probably yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then, I don't think there is a really good replacement. So I think what they use it for is some kind of well instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job, for example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers. Derick Rethans 6:54 As far as I know all the production or APM solutions. They do this on their own without having to use sticks. They don't need any user land modifications. Nikita Popov 7:03 Yeah, definitely. All the APM solutions support this, they use internal handlers. Derick Rethans 7:08 Because it's actually removing functionalities that some people use, what's the reaction been to removing this functionality? Nikita Popov 7:14 Well on the mailing list at least positive, but as I mentioned at least some people have like pointed out on the pull request that they are using the functionality. Derick Rethans 7:23 Enough in such a way to sway for not deprecating them? What is the benefits of getting rid of ticks, if you don't use them? Nikita Popov 7:31 That's, I think the thing, that there is not really a big benefit to getting rid of them. Like they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. I haven't seen those responses. I'm kind of rolling a bit unsure if we should really remove them, because you could argue that well they don't really hurt anyone. I do have to say that I think all the things that people use sticks for, all the cases I have heard about, and all of those cases ticks are not the right way to solve the problem. They are not the right way to solve the signal handler problem, they are not the right way to solve the profiling problem. And the other one I heard is also they're not the right way to solve the heartbeat problem, to make sure a service stays connected. While people do use them I think they use them for questionable purposes. Derick Rethans 8:24 Developers, if they're using something to rewrite the PHP file to introduce ticks, they can also technically rewrite a file to introduce calls to their own functions, after every statement. Nikita Popov 8:34 Yes, I actually have a very nice PHP fuzzing project that rewrites PHP files to introduce instrumentation functions at certain points. That needs a lot more control than ticks, because it's interested in branching statements in particular. That is definitely also possible, but it's kind of even more crazy than just adding ticks. If you're doing it like this, I think, if we want to keep ticks, then we should change ticks from a declare directive to a ini_set, because this kind of rewriting of files to introduce takes that's like not a great solution. On the other hand, that does mean that if you are, I don't know a library, implementing some code and expecting that, you know, it just runs normally, then someone can with by enabling an ini setting will suddenly run code in the middle of your library file that's like essentially any point. So enabling ticks us a major behaviour change, that's something we really don't like to have in ini settings which is I guess also, why does it declare in the first place, because that limits the scope. And you have to go out of your way if you want to not limit it using this rewriting hack. So I'm not really sure ultimately what to do here. Derick Rethans 9:44 Are you thinking of bringing this up for vote before PHP eight dot one's feature freeze? Nikita Popov 9:49 If I decide to go for it, then definitely before. I'm just not completely sure on this topic yet. Derick Rethans 9:55 it'd be interesting to, to hear what other people think about removing this. I have no opinion about this. Other features I do but in this case, I'm happy with them being there, I'm happy with them not being there, because it's something I'm using myself. In any case, thank you for going through this RFC with me today, and we'll see what happens. Nikita Popov 10:14 Thanks for having me, Derick. Derick Rethans 10:18 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug and debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Deprecating Ticks Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 86: Property Accessors London, UK Thursday, May 27th 2021, 09:14 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive property excesses RFC. Nikita, would you please introduce yourself? Nikita Popov 0:32 Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains. Derick Rethans 0:39 This is probably the largest RFC I've seen in a while. What in one sentence, are you proposing to add to PHP here? Nikita Popov 0:46 I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the like motivation behind it, which is that since PHP seven four, we have type properties, that at least for me personally with that feature, the need to have this typical pattern of private property for storage, plus a public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on value. And now these getter and setter methods most if you like boilerplate. So the idea with accessors, at least my idea with accessors is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that okay, that property actually requires additional validation. And right now if you have a public property, then you don't really have a good way of introducing that. Only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or by using magic get/set, which is definitely possible and persist the API contract, it's just fairly ugly. Derick Rethans 2:09 You changes the public property that people could read into a private one. And because it's private, the set and get metric methods are being called. Nikita Popov 2:18 Exactly. Derick Rethans 2:19 This RFC is titled Property accesses, how do these improve on the situation? Nikita Popov 2:24 So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accesses that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read/ private write. I think that's a, maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour, it looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property. Derick Rethans 3:10 For example, when you then do set, or you can add additional validation to it. Nikita Popov 3:14 Exactly. Originally, you had a simple public property, then you can add a setter that checks okay this string cannot be empty. Derick Rethans 3:23 Okay, what it's the syntax that you're proposing? Nikita Popov 3:26 I went with these essentially the same syntax that's being used in C#. Looks like you write public foobar, and then you have this sort of semi colon you have a code block. And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour and so on. Derick Rethans 3:52 The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax? Nikita Popov 4:03 Yeah so the difference is, either you can write just get semi colon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. To do the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you, is that you have this more fine grained control over how it works. Namely what you can do is you can say that you have get and private set. But that's a property that's publicly read only and internally writeable. You can write just get without set, in which case it's a real read only property both publicly and privately, or to be more precise, it's an init once property so you can assign to it once. Derick Rethans 4:52 How do you keep track of the init once? Nikita Popov 4:53 It's same mechanism as for Type Properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read only. The only maybe problem there is that this mechanism, requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. To say there is no implicit default value, no implicit null value. If you want to have a default value the same as with type properties you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For Read Only properties, if you specify the default then you will you can change that. Derick Rethans 5:37 You have basically have created a constant. Nikita Popov 5:39 Yes, it is essentially a constant. Derick Rethans 5:41 You mentioned already, PHP seven four introduced type properties. How do these types interact with the setter and getter accessors? Nikita Popov 5:50 I would say in the obvious way. The getter is required to return type of property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. But enforces that matches the type. Derick Rethans 6:08 This does mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name? Nikita Popov 6:15 No, or you can specify the parameter name, and if you don't then that's just passed in as the value variable. It's also inspired by how C# and Swift do it. I mean there are some possible variations here we could always require an explicit name, some people for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value. Derick Rethans 6:41 Would you have to specify the type though? Nikita Popov 6:43 You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant because otherwise, you know, there are quite a lot of extra things you could be adding everywhere. Derick Rethans 6:56 That's the same way as marking a property as private. And then the accessors as private as well. Right? Nikita Popov 7:03 Yeah exactly. So, then that will also say: if the property is already private you can't, again say that the accessors also private. Derick Rethans 7:11 I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway right. Nikita Popov 7:18 One could argue that it's really not our business and this is a coding style question, but you know it's better to not leave people, with the option of doing stupid things. Derick Rethans 7:28 I saw in the RFC that it is also possible to use references with the get accessor. Does this complicated implementation and the idea of this RFC, a lot, or just a little? Nikita Popov 7:39 I think the important context to keep in mind here is that we already have magic get set, and the accessors are, like, largely based on their semantics. Magic getters already have this distinction between returning by value and returning by reference. The by reference return value is primarily useful for two cases. One and this is really the important one, is if you're working with arrays, any write operation on an array like setting an element or appending an element, those require that the getter returns by reference, because PHP will actually do the modification on the reference. Because some people asked about that. Why can't we just like get the array using the getter, then make the change and then assign back using the setter. That would theoretically work, but it would be extremely inefficient, and the reason is that this breaks PHP's copy on write mechanism. If the error is returned from the getter, then we have one array inside the property. And we have one copy of the array inside the property, and as the return value. Then we change the return value and the resource is now shared, we actually have to copy the whole array, and then we assign it back. So effectively what we do is we copy the array, we do single element change, and then we copy the array, we do a single element change and then we destroy the old array. That works in theory, but it's so inefficient that we would not want to promote this kind of usage. Derick Rethans 8:42 The way around is of course, is having an implicit methods on the class to make this change to the array itself right? Nikita Popov 9:10 That would be another option. Problem is that you will need a lot of methods, I mean it's not just a matter of setting a single element or unsetting an element, but you can also set like a deep element where you're not modifying the outermost array but, like, a multi dimensional array. You would actually have to pass through that information somehow as well. I don't think there is a simple solution to that problem beyond the reference based solution that we currently use. Derick Rethans 9:34 I saw people arguing about not bothering with references in this new implementation at all, but I think you've now made a good case for keeping them. Nikita Popov 9:42 Effectively not bothering with references just means not supporting that array use case. Which might be, maybe a reasonable limitation, especially if we like make a distinction and supported for the implicit accessor case where we can, you know, do internal magic to support that and not support it in the explicit accessors case. I mean, people were arguing that this reduces the complexity of the proposal, but it kind of also increases the complexity because now we are doing something else for the accessors and we're doing for the magic get/set, where we already have this established mechanism. I'm not really convinced by that. Derick Rethans 10:20 And I also think it creates inconsistencies in the language itself because it does something different with an implicit or explicit accessor, as well as it being different between the original underscore underscore get magic method as well. Nikita Popov 10:34 It's not a secret that I'm not a big fan of references, and I would certainly love to get rid of them, but it's a hard problem, and this array modification behaviour for magic get or for get accessors is certainly a large part of that problem, and I just don't have a good solution for it. Derick Rethans 10:52 I don't either. The RFC also goes into great detail about inheritance and variance. Would you have a few words on that? Nikita Popov 11:00 I think mostly inheritance works like inheritance does for methods, at least that's how it's supposed to work. Of course there are some interactions, because you can for example mix real properties and accessor properties. In which case, if you have parent accessor property, you can always replace it with a normal simple property, because normal properties they support all operations that accessor properties do. What you can't do is the other way around. If you have a parent normal property, then you can't replace that with an accessor property. And reason is that it does have some limitations. Not a lot, but there are some limitations. One of them is related to references, I mean, we're already talking about this topic. What the by reference get allows is taking a reference to the property, so you can do something like a reference equals the property. What you can't do is the other way around the property reference equals something else. So you can't assign a new reference into the property, that just doesn't work on a pretty fundamental level, because it would require an additional set handler for set by reference. As we don't particularly love references, adding a new mechanism to support that is not a very popular choice. Derick Rethans 12:20 Variance wise, I guess, the same rules apply as for normal properties and property types? Nikita Popov 12:27 Approximately. Properties are apparently invariant, so you can't change the type or I mean you can change it but it has to be an equivalent type. If you have a read only property, with only a getter, then the implementation makes the type covariant, which means you can use a smaller type in the child class. This is similar to how if you have a getter method, you could also give it a smaller type in the child class. The converse case, if you have a property that can only be set, then the type is contravariant, you can have larger type in the child class, though I should say that properties that can only be set are somewhat odd and really only supported for the sake of completeness, so maybe it might be worthwhile to drop the type specific behaviour there, because a set only property should already be really rare, and then set property with a contravariant inheritance that's like a edge case of an edge case. Derick Rethans 13:24 Would it even make sense to support set only properties? Nikita Popov 13:27 Not sure. So for the C#, implementation, I think they don't support this and there is a StackOverflow question about that, and people try to convince their, that they should support this, that the are really use cases. Currently the imagined use cases are along the lines of injecting values into a class, so using setter injection, just that now it's property based setter injection. Okay, I'll be honest I think it doesn't make sense. Derick Rethans 13:55 To be fair, I don't think either. It would reduce the length of the RFC a little bit. Nikita Popov 14:00 A little bit, yes. Derick Rethans 14:01 Can you say a few words about abstracts, traits, private accessors shadowing and things like that. So a lot of complicated words, maybe you, you can distil that into something slightly simpler. Nikita Popov 14:12 Well I think actually abstract properties are worth mentioning. In particular, the fact that you can now specify properties inside interfaces. If you have public properties, then it makes sense to have them really on the same level as public methods, so they are part of the API contract, and as such should also be supported in interfaces. Typically what the RFC allows is, you can't specify a simple property in the interface, but you can specify an accessor property, which tells you which operations have to be supported. So you can't have a property declaration that says, it just has a get accessor, or it has get and set. The implementation of course can always implement more, so if the interface requires get, then you can implement both get and set, but it has to implement at least get, either through an accessor offer another property. I think in most cases implementation will just be a normal property. Derick Rethans 15:03 Because a normal property would implement an implicit get already anyway? Nikita Popov 15:07 Yeah. Derick Rethans 15:08 How do property accessors tie in, or integrate with constructor property promotion? Nikita Popov 15:13 They are supported and promotion with the limitation that it's only implicit accessors. If you use constructor promotion, then you can specify your read only property in there, or property that is Public Read/ private write. You cannot specify a property with complex behaviour in there. This is mainly because it would mean that you embed large code blocks into the constructor signature, which is I think, pushing the limits of shorthand syntax, a bit. Like there is nothing fundamental that will prevent it, it's more a question of style. Derick Rethans 15:50 The RFC talks a little bit about how, or rather what happens if you use foreach, var_dump, or an array cast on properties with explicit accessor. What are the restrictions here? Is something chasing from normal standard properties like we currently have. Nikita Popov 16:03 I don't think so. So here is once again the case where we have this distinction between the implicit accessors, which are really just normal properties with limitations. So those show up in var_dump and array cast, foreach, as usual. And we have explicit accessors, which are really virtual properties, so they don't have any storage themselves. Any storage to use, you have to manage separately somehow. So, these don't show up in var_dump, foreach, and so on. Both these actual computed properties, they don't show up because that would require us to actually call all the accessors if you do foreach and that seems rather dubious to me. Derick Rethans 16:44 How this will work for internal API's that some extensions use to access, like a list of all the properties, for say, for a debugger. Nikita Popov 16:51 It'll work the same way as var_dump. I mean, in the end it's all, well it's not quite based on the same API's, but still, the answer is the same. You only get those properties that have some kind of backing storage, and those are only the ones that are either normal properties, or the ones with implicit accessors. Derick Rethans 17:09 That means I need to go find out a way how to be able to read the ones with explicit accessors. Nikita Popov 17:14 Yeah, if you want to. I don't think that the debugger should read those by default, because that means that doing a dump, will have side effects, which is not ideal, but maybe you want to have an option to show them. Derick Rethans 17:26 That's something for me to think about, because I'm pretty sure people are going to want to see the contents of these properties, even in a debugger, even though that could mean that are side effects, which I'm not keen on. Nikita Popov 17:36 I guess that's one of the, I would say advantages of using this over just magic get/set, because actually know which properties you're supposed to look at, with for magic get/set you just don't know at all. Derick Rethans 17:51 The RFC talks a little bit about the performance impacts and although I saw the numbers I didn't actually read them, when preparing for this recording. What are the performance impacts for implicit accessors as well as explicit ones? Nikita Popov 18:02 Impact is basically if you use implicit accessors that has similar performance to plain properties, performance is a bit worse. The reason is essentially that we have some limitations on caching. So we can't just cache it as if it were a normal property, because it could have asymmetric visibility. And we reuse the same cache slots for reads and writes. I've been thinking about maybe splitting that up but at least for now there is a small additional performance impact of using implicit accessors, but it's not really significant. On the other side if you use explicit accessors. Those are expensive, they are not quite as expensive as using magic get/set, but they are more expensive than using normal method calls. Reason is basically their normal method calls, they are very optimized, and they do not have to re enter the virtual machine, so we just stay in the same virtual machine loop, and we just switch to different stack frames. For magic get/set we actually have to like recursively call the virtual machine, because we don't have a good point to re enter it, at least based on our current API's. And we also have to deal with some additional stuff, particularly the fact that magic get/set and property accessor as both, they have recursion guards. Normally if you recurse methods in PHP, we don't do any checks about that. Xdebug does, but PHP itself doesn't, so you can infinitely recurse and PHP is fine. The only thing that happens is that at some point you'll run out of memory. Derick Rethans 19:37 Or when extensions are loaded such as Xdebug, you'll actually still get a stack overflow. Nikita Popov 19:41 So that's something we should still be addressing, at least the baseline behaviour that you can get to that memory limit error. For properties will set have recursion guards, which say that if you recursively access a property in magic get/set, that it will not call magic get/set again and instead, access the property as if they didn't exist. Derick Rethans 20:01 Instead of throwing in an error? Nikita Popov 20:03 Yeah. For property accessors I'm actually throwing an error on recursion, and the reason for that is if we didn't throw an error, then this would end up accessing dynamic property of the same name as the accessor, which would technically work, but it's very likely not what the programmer actually intended. So it's going to be really inefficient because you actually have to allocate space for the dynamic properties and access for those. So if you wanted to have some kind of backing storage for the property, then you should just explicitly declare it and access that, rather than accessing something with the same name and implicitly creating a dynamic property. Derick Rethans 20:41 Yeah, that sounds all very complicated. Nikita Popov 20:44 It's cleaner to just make it an error and let the programmer fix it, instead of PHP try to fix it for you. Derick Rethans 20:51 Are there any BC considerations about the introduction of property accessors? Nikita Popov 20:55 Not strictly, but I'm sure that it's going to break, various assumptions for people, or at least in the sense that, right now, most assumptions should already be broken through magic get/set. I mean you can always have this kind of magic behaviour. If we have accessors this is probably going to be a lot more common, and people will have to deal with things like properties being publicly readable, but privately writable much as because someone very rarely manually implements that behaviour, but because the language though has native support for it and it's going to be common. Derick Rethans 21:28 We spoke a little bit about all the different sticking points in his RFC, for example with references, but there's one other thing and I think it's an argument you make somewhere on the bottom of the RFC, that there is a separation between implicit and explicit property accessors. I'm wondering whether it would make sense to consider adding whether the implicit part of this RFC first and then maybe later look at adding explicit property accessors. Nikita Popov 21:54 That's really the main sticking point, and also my own problem with the RFC. I mean, you mentioned at the very start, that this is a very long RFC and still a little bit incomplete so it's going to be longer. It's a fairly complex feature that has complex interactions with other features in PHP. The implementation is actually, maybe less complex, then you think, given the RFC length. The main concern I have is that, at least for me personally the most useful part of the RFC, are the read only properties. The read only properties and the like Public Read, Private Write properties. I think these two cover like 90% of the use cases, especially because if you have a property that is only publicly readable, then you don't really have to be concerned about this case where you have to, later on, add additional validation. I mean after all the property is read only, or you control all the sets because they're private. There is no danger of introducing an API break, because you have to add additional validation. I think like the largest part of the use case of the whole accessors proposal will be covered by these two things, Or maybe even just one of these two things, that's a bit of a philosophical question. There are some people who think we should have just public read / private write and no proper read only properties, because that like looks the same from the user perspective, but still gives you more flexibility. I think that's like the most important use case, and we could implement that part with a lot less language complexity. So the question is really does it make sense to have this full accessor proposal, if we could get the most useful part as a separate simpler feature, and, well, I heard differing opinions on that one. I was actually pretty surprised that their reception of the on like a full accessors proposal was fairly positive. I kind of expected more pushback, especially as, this is the second proposal on the topic, we had earlier one with, like, similar syntax even though different details, and that one did fail. Derick Rethans 24:02 How long ago was that? Nikita Popov 24:03 Oh that was quite a while actually, at least more than five years. Derick Rethans 24:06 I think that the mindset of developers has changed in the last five to 10 years, like introducing this 10 years ago would never happened, or even typed properties, right. It would never have happened. Nikita Popov 24:17 That's true. Derick Rethans 24:19 Do you have any idea when you're going to put us up for a vote? Because, of course, PHP 8.1 feature freezes coming up in not too far away from now. Nikita Popov 24:28 Yeah, I'm not sure about that. I'm still considering if I want to explore the simpler alternatives, first. There was already a proposal. Another rejected proposal for Read Only properties, probably was called write once properties at the time. But yeah, I kind of do think that it might make sense to try something like that, again, before going to the full accessors proposal or instead. Derick Rethans 24:54 Do you have anything else to add? Nikita Popov 24:56 What are your thoughts on this proposal, and the question at the end? Derick Rethans 24:59 I quite like it, but I also think that it might make sense to split it up. I'm always quite a fan of splitting things up in smaller bits, if that's possible too, and still provide quite a lot of use out of it. And that's why I was asking whether it makes sense to split it up into the implicit part and the explicit part of it, especially because it makes the implementation and the logic around it quite a bit easier for people to understand as well. Nikita Popov 25:24 It's maybe worth mentioning that Swift also has a similar accessor model but it is more like a composition of various different features like read only properties, properties with asymmetric visibility, and then finally properties with like fully controlled, get and set behaviour rather than this C# model where everything is modelled using accessors with appropriate modifiers. So there is certainly precedent in other languages of separating these things. Derick Rethans 25:55 Something to ponder about, and I'm sure we'll get to a conclusion at some point. Hopefully some of it before PHP eight one goes and feature freeze, of course. We've been chatting for quite a while now, I think we should call it the end for this RFC. Thank you for taking the time today to talk about property accessors. Nikita Popov 26:11 Thanks for having me, Derick. Derick Rethans 26:12 Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Property Accessors Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
On this episode of Smash the Bug, Simon Frost gives us a sneak peek at his upcoming book on the Magento 2 admin panel, outlines his debugging process and explains what it takes to become a "power user"! -- KEY TAKEAWAYS Brief overview of Simon's upcoming book on the Magento 2 admin panel and why it will be useful for tons of developers. What it takes to become a "power user" and what that means. The importance of a "personal solutions log" and how to start one. Admin action logs as a great start to evidence gathering. "Created at" and "updated at" - approaching a bug like a forensic investigator. Why Simon tries to avoid firing up Xdebug until he knows it's time. Simon's "comparison technique". Skilled debugging is more than compiling a bunch of different tools, but more importantly, choosing the right one. "If everything seems to be working properly, challenge your assumptions". -- Do YOU have an interesting experience with debugging? Send your story to logan@swiftotter.com and you might be our next podcast guest! This podcast exists to inspire, educate and entertain eCommerce developers who are serious about improving their skills and advancing their careers! The Art of Ecommerce Debugging, your complete guide to becoming an expert debugger, is NOW AVAILABLE at SwiftOtter.com! Claim your copy right here: https://swiftotter.com/art-of-debugging#/ The information contained in this book is your pathway to becoming an incredible developer. In fact, veteran developers from all over the world are reporting that they learned valuable new tricks from this book. Your ROI on this inexpensive tool will be immediate and through the roof! Get your copy here: https://swiftotter.com/art-of-debugging#/
On this episode of Smash the Bug, Joseph and Jesse talk about the importance of resetting even before a problem is solved, a deep Xdebug stack trace, and the importance of visualizing data in the right ways. -- KEY TAKEAWAYS Why are we on a dock at the lake? The importance of clearing your mind and resetting, even when deadlines are approaching. The light bulb moment (it's not what you think). Amending the discussion on "views" from Episode 8 Products not loading into a quote item , a deep Xdebug stack trace with a fix in under an hour, and the value of teamwork and collaboration. Price re-calculation/updating not happening (logging is so important!) and how visualizing data in the right ways gets you far. The ultimate value of becoming an amazing problem solver. -- Do YOU have an interesting experience with debugging? Send your story to logan@swiftotter.com and you might be our next podcast guest! This podcast exists to inspire, educate and entertain eCommerce developers who are serious about improving their skills and advancing their careers! Don't forget to check out SwiftOtter's brand new FREE COURSE, "Mastering Xdebug"! Go to SwiftOtter.com/MasteringXdebug to enroll! The Art of Ecommerce Debugging, your complete guide to becoming an expert debugger, is NOW AVAILABLE at SwiftOtter.com! Claim your copy right here: https://swiftotter.com/art-of-debugging#/ The information contained in this book is your pathway to becoming an incredible developer. In fact, veteran developers from all over the world are reporting that they learned valuable new tricks from this book. Your ROI on this inexpensive tool will be immediate and through the roof! Get your copy here: https://swiftotter.com/art-of-debugging#/
PHP Internals News: Episode 85: Add IntlDatePatternGenerator London, UK Thursday, May 20th 2021, 09:13 BST In this episode of "PHP Internals News" I discuss the Add IntlDatePatternGenerator RFC with Mel Dafert (GitHub). The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick, welcome to PHP internals news, the podcast, dedicated to explain the latest developments in the PHP language. This is episode 85. Today I'm talking with Mel Dafert about the "Add Intl Date Pattern Generator RFC" that she's proposing for inclusion into PHP 8.1. Mel would you please introduce yourself? Mel Dafert 0:35 Hello, I am Mel. I've been working professionally with PHP for about three years. Recently I started reading the internals mailing list in my free time, but this is my first time contributing. Derick Rethans 0:46 What made you think starting to read the PHP internals mailing list? Mel Dafert 0:50 I generally like reading mailing lists and issue trackers. And since I work with PHP, it was interesting to read what's, what's happening. Derick Rethans 1:02 That's what I'm trying to read this podcast as well of course; explaining what happens in the PHP development. But let's get to your RFC. What is the problem that you're trying to solve for this? Mel Dafert 1:14 Currently, PHP exposes the ability for locale dependent date formatting with the Intl Date Formatter class. It is basically only three options for the format: long, medium and short. These options are not flexible enough in some cases, however. For example, the most common German format is day dot numerical month, dot long version of the year. However, neither the medium nor the short version provide this, and they use either the long version of the month, or a short version of the year, neither of which were acceptable in my situation. Derick Rethans 1:47 I realize that you basically ran into a problem that PHP wasn't doing something you wanted to do it. But what made you actually wanting to contribute this? Mel Dafert 1:57 I ran into this exact problem at work where I wanted to format dates in this specific way. After some research, I found out that ICU, the library that powers Intl Date Formatter, exposes exactly this functionality already. It would be relatively easy to wire this up into PHP and expose it there as well. I also found in a bug report that other people had this problem as well, so I decided to try my best at hacking at the PHP source and make it available to everyone, using PHP. Derick Rethans 2:25 Had you ever seen a PHP source code before? Mel Dafert 2:28 I don't think so. No. Derick Rethans 2:29 But you are familiar with C a little bit? Mel Dafert 2:32 On a very basic level, yes. Derick Rethans 2:34 As part of this RFC What are you trying to suggest to add to PHP? Mel Dafert 2:39 ICU exposes a class called date time pattern generator, which you can pass a locale and so called skeleton and it generates the correct formatting pattern for you. Skeleton just includes which part are supposed to include it, to be included in the pattern, for example the numerical date, numerical month, and the long year, and this will generate exactly the pattern I wanted earlier. It is also a lot more flexible, for example the skeleton can also just consist of the month and the year, which was also not possible so far. I am proposing to add a Intl Date Pattern Generator class to PHP, which can be constructed for locale, and exposes the get best pattern method that generates a pattern from a skeleton for that locale. Derick Rethans 3:22 The skeletons, what do you specify in these skeletons? Mel Dafert 3:27 It's a similar format to the pattern itself. For example, it's lowercase y lowercase y uppercase M uppercase M, would give you only the year and only the month, if I'm correct, that's exactly what the skeleton looks like. Derick Rethans 3:43 But it puts it in the right order? Mel Dafert 3:45 It puts it in in the right order, and in some cases also adds extra characters, or even changes the format slightly, depending on the locale. Derick Rethans 3:55 So it is a bit of a flexible way to tell the Intl extension to format them in a slightly more, well how do you say this, a slightly more intelligent way than what the standard, long, short and medium constants do for you. Mel Dafert 4:11 Exactly. Derick Rethans 4:12 Why is it so important that you get these formats, right, or rather I should say, how do these locales influence formats and why is this important? Mel Dafert 4:21 There are conventions of how to format dates and times vary rather strongly between languages and country. In Austria, for example, nobody would expect to understand the US format of month slash day last year. I assume people in England may have the same issue. Derick Rethans 4:38 I think everybody has that issue except for people in the US. Mel Dafert 4:42 But that only shows the importance of using a format that people are used to and understand. Other languages like mainland Chinese even have the words for day and month included in the format, as far as I understand. I don't speak Chinese. Derick Rethans 4:59 Neither do I, but a long time ago when I, when I added the date time support, not Intl, but PHP standard date time support, I also looked at locales that operating systems have. And even these locales, which is not something that Intl uses now, also encode these extra characters at least for Japanese, so that was interesting to see there as well. Mel Dafert 5:22 There is a lot of sometimes somewhat unexpected formats. Derick Rethans 5:27 And I think German sometimes once the add the in front, and sometimes behind and things like that. I know there's lots of little intricacies, yes. I see that he RFC makes an argument about which name to pick for the new class. Can you elaborate on the two different options that are? Mel Dafert 5:44 Yes, this is certainly for us and what I would call bike shedding. ICU has something of an inconsistency in its naming. The formatting class is called date formatter. And the pattern generator class is called Date Time pattern generator. Derick Rethans 6:00 So it has the extra word time in it? Mel Dafert 6:03 Between some inconsistency with Intl Date Formatter, which already exists in PHP, and the Intl Date Time pattern generator, or if we make sure PHP is internally consistent and omit the time in all cases. So far consensus seems to lean towards the second option. This is also what the Hack people decided to use. Derick Rethans 6:24 And I believe that's the one you are wanting to go with in this RFCs as well, right? Mel Dafert 6:28 Exactly. So far, everybody voted slide, or like express themselves to slightly favour the version without time. So that's the one I'm going with. Derick Rethans 6:40 Of course, as you mentioned, this is a fairly small change to it, but the RFC talks a bit about things to add in the future, because I believe you weren't suggesting to add all of these Intl functionality straightaway. What is this future scope? Mel Dafert 6:55 ICU would also expose more methods around the skeletons, for example, turning a pattern back into its skeleton, or building a list of skeleton and then mapping to the patterns from scratch. That's what you would do in theory if you added your own special locale to this. Derick Rethans 7:17 I'm not sure how to do that with PHP actually, but I think ICU allows you to build your own basically files with settings right? Mel Dafert 7:25 Exactly. This is omitted all of this, for simplicity, and because they couldn't think of a use case for it, personally, at least. If someone does need them, they could easily be added. It would just be a bunch of extra methods on the, on the class. Derick Rethans 7:43 I know that ICU has so much functionality that hasn't been exposed to PHP, because there's just so much of it right? Mel Dafert 7:50 Extremely, yes. I did see that Hack decided to expose all of them, like all the methods that the class has, but I really don't see the use of having to document and test all of these methods when really only one is going to be used. So I've decided to just go for the one that I can actually see people using. Derick Rethans 8:14 And it is always easy to get smaller parts added to PHP than big things, to begin with. Mel Dafert 8:21 Exactly. Derick Rethans 8:22 How has the reception been so far? Mel Dafert 8:24 I haven't gotten feedback from too many people, but it seems positive so far. A few people that did give some feedback were constructive and seem to seem to like the idea of adding this. Derick Rethans 8:36 I reckon outside of English speaking countries this is quite an important thing to actually support, especially as we just discussed, people are picky about how these things are formatted. Mel Dafert 8:46 Very picky. Derick Rethans 8:48 So the name that you're going for would be Intl Date Pattern Generator, would it also support patterns for the time itself? Mel Dafert 8:55 Of course, just like Intl Date Format also support formatting time. Derick Rethans 9:02 It would be strange if it didn't, to be honest. Mel Dafert 9:04 Yeah. Derick Rethans 9:05 When do you think you're going to put us up for a vote for inclusion to PHP 8.1? Mel Dafert 9:10 I think I sent out the first email about two weeks ago for opening the discussion. So I was planning to send out the heads up, either today or tomorrow, and opening the vote after that. Derick Rethans 9:23 Okay. To be fair, I think there is very little controversy in this one, so it would surprise me if it didn't pass. Mel Dafert 9:30 That's reassuring. I am somewhat anxious about them. Derick Rethans 9:33 It's not controversial, it is an, it is perhaps a niche thing but it is something that is useful, so I can't see people really be opposing to this. To be fair, I think it looks like just an omission from when the Intl extension was written in the first place. Mel Dafert 9:48 That's true. It might have not been supported in ICU at that point. Derick Rethans 9:54 That is a good point as well because I think the Intl extension came with PHP five three, or five four, which I think is now eight years ago or something like that. Mel Dafert 10:04 I think, I think ICU might have not had it at the end. It's an old word, like it's an all supported versions of PHP. Derick Rethans 10:13 That is good to know. Would you have anything else to add? Mel Dafert 10:16 No, I think that's it. Derick Rethans 10:17 Thank you for taking the time today to talk to me about your proposal to add the Intl date pattern generator to PHP 8.1 Mel Dafert 10:25 Of course. Thank you for having me. Derick Rethans 10:29 Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Add IntlDatePatternGenerator Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 84: Introducing the PHP 8.1 Release Managers London, UK Thursday, May 13th 2021, 09:12 BST In this episode of "PHP Internals News" I converse with Ben Ramsey (Website, Twitter, GitHub) and Patrick Allaert (GitHub, Twitter, StackOverflow, LinkedIn) about their new role as PHP 8.1 Release Managers, together with Joe Watkins. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick, welcome to PHP internals news, a podcast, dedicated to explaining the latest developments in the PHP language. This is episode 84. Today I'm talking with the recently elected PHP 8.1 RMs, Ben Ramsey and Patrick Allaert. Ben, would you please introduce yourself. Ben Ramsey 0:34 Thanks Derick for having me on the show. Hi everyone, as Derick said I'm Ben Ramsey, you might know me from the Ramsey UUID composer package. I've been programming in PHP for about 20 years, and active in the PHP community for almost as long. I started out blogging, then writing for magazines and books, then speaking at conferences, and then contributing to open source projects. I've also organized a couple of PHP user groups over the years, and I've contributed to PHP source and Docs and a few small ways over the years, but my first contributions to the project were actually to the PHP GTK project. Derick Rethans 1:14 Oh, that's a blast from the past. You know what, I actually still run daily a PHP GTK application. Ben Ramsey 1:21 Oh, that's interesting. What does it do? Derick Rethans 1:23 It's Twitter client. Ben Ramsey 1:24 Did you write it. Derick Rethans 1:26 I did write it. Basically I use it to have a local copy of all my tweets and everything that I've received as well, which can be really handy sometimes to figuring out, because I can easily search over it with SQL it's kind of handy to do. Ben Ramsey 1:41 It's really cool. Derick Rethans 1:42 Yep, it's, it's still runs PHP 5.2 maybe, I don't know, five three because it's haven't really been updated since then. Ben Ramsey 1:49 Every now and then there will be some effort to try to revive it and get it updated for PHP seven and eight, but I don't know where that goes. Derick Rethans 1:59 I don't know where that's gone either. In this case, for PHP eight home there are three RM, there's Joe Watkins who has done it before, Ben, you've just introduced yourself, but we also have Patrick Allaert, Patrick, could you also please introduce yourself. Patrick Allaert 2:13 Hi Derick, thank you for the invitation for the podcast, my name is Patrick Allaert. I am a Belgian freelancer, living in Brussels, and I spent half of my professional time as a IT architect and/or a PHP developer, and the other half, I am maintaining the PHP extension of Blackfire, a performance monitoring solution, initiated by Fabien Potencier. Derick Rethans 2:39 I didn't actually know you were working on that. Patrick Allaert 2:40 I'm not talking much about it but more and more. So I succeeded to Julian Pauli, who by the way was also released manager before so now I'm working with Blackfire people. It's really great, and this gives me the opportunity to spend about the same amount of time developing in C and in PHP. This is really great because at least I don't. It's not just only doing C. I, at least I connect with what you can do with PHP. I see the evolution from both sides. And this is really great. It's great, it's also thanks to you Derick, you granted me access to PHP source codes. That was to contribute to testfest something like 12/13 years ago, it was, CVS, at that time. Derick Rethans 3:28 CVS, so now I remember that. Basically, what you both of you're doing is making me feel really old and I'm not sure what I like that or not. I think we all have gotten less head on our heads and greyer in our beards. In any case, what made you volunteer for being the PHP 8.1 RM? Patrick Allaert 3:46 In my case, I think there were two two reasons is that PHP really brings a lot to me in my career, everything is built around my expertise in PHP and its ecosystem. By volunteering as a release manager. I think I can give something back to PHP, because the last time I contributed to source code of PHP, it was really years ago. If I remember it was array to string conversion that was very silent and not emitting any notice; now it's warning. In the meantime, so I think that was PHP 5.0, Derick Rethans 4:22 Ages ago. Patrick Allaert 4:23 Ages ago. Indeed. I was quite passive I was mostly reading on PHP internals, and most of the time now that is quite big so if, if I had to say something I could always see some someone who already just said the same thing so I was not saying: plus one. This is one of the reason and the second one I think is that I think it's kind of a unique opportunity, and I can learn a couple of things. I think, on day one when the Rasmus gave me the access, saying that I can do to OAuth authentication on SSH and that was: okay, day one I already learned something, so that was really cool. Derick Rethans 4:58 And you Ben, I think you tried to be the PHP eight zero release manager as well at some point. That didn't happen at the time, but you've tried again. Ben Ramsey 5:06 I almost didn't try again. I don't know why but when Sara announced it this year, I thought about it, and I don't know, I tossed it around a little bit, but I've been wanting to do it for a long time and I've noticed as Joe Watkins recently put it on a blog post that we need to help the internals avoid buses. So since this is a programming language that I've spent a lot of time with just as Patrick mentioned, both in and out of my day jobs. I want it to stick around to thrive. Since I'm not a C guru, but I do have a lot of experience managing open source software. I wanted to volunteer as a release manager, and I hope that I can use this as an opportunity to inspire others who might want to get involved, but don't know how. Derick Rethans 5:55 And of course you just mentioned Joe, Joe Watkins, who is the third PHP release manager for 8.1, and that is a bit of a new thing because in the past, when the past many releases I can remember you've only had two most of the time. Ben Ramsey 6:09 I think, on the mailing list that came up early on in the thread, and there was a general consensus, I think, consensus may be the wrong word, but there were a couple of people who spoke up and said that they wouldn't mind seeing multiple rookies or mentees or whatever you want to call us, and Joe when he volunteered to be the veteran, and he was the only one who volunteered as the veteran. He said that he would take on two. And so that's that's why Patrick and I are both here and I think that's a good idea, because it will continue to help, you know, us to avoid buses. Derick Rethans 6:46 Yep. And if you're three, you only have once every 12 weeks. Whereas of course, in my case doing it for PHP 7.4 it's every four weeks, because it's me on my own, isn't it. Which is unfortunate that these things happen because people get busy in life sometimes. Getting started being a PHP release manager can be a bit tricky sometimes because just before we started recording, I had to add you to a few mailing lists. Do you think you've now have access to everything, or what do you need access to to begin with? Patrick Allaert 7:18 There is the documentation about release managers, what are you supposed to do, and, and there is an effort of documentation, what you have to ask, in terms of access, and that's great. We are probably going to contribute with our findings to, to improve the documentation. Once you did a bit of the setup, mainly needs to access the servers. You should also know what is the workflow and what are the usual tasks. This is mentioned in the documentation, but I think it would be better to have a live discussion with someone that already did it. The fact that we are doing it with Joe Watkins, who is not only a release manager of 8.1, but also previous release manager, that should be really smooth, to, to see what the the orders and what is the routine to do. To do so, why do you think Ben? Ben Ramsey 8:16 I agree. I think that, I mean we've only just gotten started. It's only this I believe is what was it two weeks ago that we, that this was announced. So this is the first time that Patrick and I have actually spoken face to face. Hi, Patrick! We've communicated by email and slack. I'm sorry not Slack, StackOverflow chat. Joe has given us a lot of good pointers. I feel like some of the advice he's given his been really good, but it's like Patrick said, we haven't really had like a live, like one on one chat, or face to face chat, where we could kind of get caught up on things and understand what the flow looks like. So last week I started going through a lot of the pull requests on GitHub. And I've been tagging them as bug fixes or are enhancements, and there's also an 8.1 milestone that I've been adding to a lot of the tickets, are the pull requests, and I've merged a few of them, but I think that I've merged them a little prematurely. So there were some funny things that came up out of that. I do plan to blog on this, but one of Nikita's comments in the Stack Overflow chat was, you've just made it your personal responsibility to add tests for uncovered parts of the Ristretto255 API. Derick Rethans 9:40 Right, exactly say because I'm doing release management for PHP seven four. I don't do any merging at all. The only thing I'm doing is making the packages, and then coordinating around them. I'm not even sure whether it is a responsibility of a release manager to do. Ben Ramsey 9:55 It may not be a responsibility. I felt like it was helpful maybe to go ahead and take a look and see where things were trying to follow up with people, to get them to respond if something had been sitting there for two weeks or so without any kind of movement. I would, you know, leave a message saying what's the status of this. Derick Rethans 10:19 I know from the documentation that we have on our Release Management process. And many of these steps actually been replaced by a Docker container that actually builds the binaries, so I'm not sure whether Joe I've mentioned that to you yet, because I'm not sure whether that was around when he did release management, the previous time. Ben Ramsey 10:36 Right, it wasn't around either when he did release management, but he's also mentioned that he would like for us to learn how to do it without the Docker container, even if we do plan to use the Docker container. Derick Rethans 10:48 That's fair enough, I suppose. I have never had to do that, but that there you go. Now, what is the timeline like? Patrick Allaert 10:56 In terms of timeline I think the very first thing is being all three release managers having live discussion to define what, what we should do, when we should do, and how. This way we clearly knows our responsibility and the sequence, and also how we are going to organize. Do we do every three releases? We share the task? How are we going to do the work together. In terms of timeline I think the very first release is going to happen in June, if I remember correctly. I set up an agenda sheet with ICAL so that we all can put that in our calendar, nothing really clear on my side. Derick Rethans 11:41 From what I can see from the to do list that the first alpha release is June, 10, which is exactly a month away from when we are recording this. Patrick Allaert 11:51 Right, yeah, it's one month come down before the very first one. I think it might be great that the very first release being made by by Joe, so that we can really see every single step he's doing, so that we can do the same. However, I guess it's kind of a shared responsibility to do triage of bugs and pull requests. Ben Ramsey 12:14 Right. I think there is some desire among the community to see these releases in real time at least a few of them. So I'm going to try to encourage us to stream some of them maybe live, or at least record it and put it up somewhere for people to kind of just see the process to demystify it, so to speak. Derick Rethans 12:35 I actually tried it a few months ago to record it, but there were so many breaks and pauses and me messing things up, and me swearing at it, that I had to throw away the recording. I mean the release went out just fine but like absolute as again... I can imagine the first few times, you're trying this there might be some swearing involved, even though you might not vocalize that swearing. Ben Ramsey 12:56 Oh I'll vocalize it. Derick Rethans 12:58 Fair enough. This is something that is that you're going to have to do for the next three and a half years. Do you think you'll be able to have the time for it in another three years? Ben Ramsey 13:08 I mean for myself I I'm committed to it, I definitely believe that I'll have the time over the next three and a half years, and I'll make the time for it. Derick Rethans 13:18 What about you, Patrick? Patrick Allaert 13:20 Exactly the same. I think it's the least that I can do to PHP, in terms of contributing back, there will be some changes because I it's not like it's, it's not like the infrastructure is something that doesn't change, like for example recently, GitHub, being more having more focus rather than our Git infrastructure. So the changes that will happen, we will have to adapt, I have the impression that release manager has to, every time it's adapting to change this, and that will be very interesting. Derick Rethans 13:53 Luckily we haven't had too many. The only thing I had to change with a change from git dot php.net to get up, was my local remote URLs. So there wasn't actually a lot to do, except for running git remote set-url. I was pleasantly surprised by this because if anything messing around with Git isn't my favourite thing to do. Ben Ramsey 14:14 Also, merging is a little bit more streamlined now you don't have to go to qa.php.net to do that. Derick Rethans 14:21 I've never done anything without Ben Ramsey 14:23 Really? Oh, I guess you would commit directly to git.php.net? Derick Rethans 14:27 Yep. Ben Ramsey 14:28 If there were PRs on GitHub, the only way to merge them well, probably wasn't the only way but one way to merge them was to go to qa.php.net, and if you were signed in with your PHP account, you were able to see all the pull requests, and choose to merge them. Derick Rethans 14:46 Yep, also something I've never done as an RM. The only way how I have reacted with pull requests is commenting on the pull requests, and I wouldn't merge them myself. With the only exception of security releases where you need to cherry pick from certain branches into your release branches. I'm not always quite sure about it as the responsibility for release managers actually do the merging into the main branches. From what I've understood is it's always the people that made the contributions, who just merge themselves, and you then sometimes need to make sure that they merge into the right branch instead of just master, which is what, as far as I know, the, the buttons on GitHub do. Ben Ramsey 15:21 Well the individual contributors, in this case, if they're doing like a bug fix or something, most of them, or many of them aren't don't have permission to do the merging, so someone else has to merge it, like, often I see Nikita merging, a lot of the pull requests. Derick Rethans 15:37 Maybe I've just been relying on Nikita to do that then. I'm not sure how, bug fixes are merchants debug fig branches. I think it's usually been done by people that have access already anyway, because it's often either Nikita or Cristoph Becker, or Stas, and the main developments, or the main other new things that people don't have access to are usually to master. So I guess there's a bit of a difference now. I'm not sure what if any other questions, actually, would have anything to add yourself? Patrick Allaert 16:05 maybe something that would be quite challenging is the very recent discussion about the system that we, that we might change from. The system or the issue tracker with where we have all the bugs. I understand the current issues, I understand as well the drawbacks of what is possibly, for example GitHub issues. It might be great for some, would it be great for us? If we do it was going to be in the bring a lot of changes, and I think, 8.1 will be already slightly impacted by the change to GitHub in terms of pull request strategies, but potentially there will be another change, which is around the bugtracking system. Derick Rethans 16:54 I have strong opinions about this, but we'll leave that for some other time. What about you, Ben? Ben Ramsey 17:00 Right, I actually don't think that we're going to end up making a lot of changes in that regard, very, like, not in the near term, probably. But I did want to point out, or promote that I've started journaling some of these experiences, and capturing information mainly for my own purposes, but I'll be posting these publicly so that others can follow along. My blog is currently down right now. Derick Rethans 17:28 That's because you're using Ruby isn't it? Ben Ramsey 17:30 That's because I'm using Ruby. The short story of it is that there are some gems that were removed from the master gem repository at some point in the past, or the versions I'm using were removed, either for security reasons or what I have no idea why. And that's put, put it into a state where I just can't easily update. I just haven't, I just don't care, right now, so I plan on migrating to something else. In the short term, I'm not going to be doing that. So I've started writing at https://dev.to/ramsay and Dev.to is just a developer community website. If you're on Twitter. It's run by @thePracticalDev, I'll be, I'll be blogging there. Derick Rethans 18:18 And I'll make sure to add a link to that in the show notes as well. Thank you for taking the time this afternoon, or morning, to talk to me about being a PHP 8.1 release managers. Ben Ramsey 18:28 Thank you for having me on the show. Patrick Allaert 18:30 Thank you, Derick for that podcast. I'm really glad you invited us. Derick Rethans 18:39 Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes PHP 8.1 Release Todo List Ben's journal Joe Watkins' Avoiding busses blog post Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 83: Deprecate implicit non-integer-compatible float to int conversions London, UK Thursday, April 29th 2021, 09:11 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Deprecate implicit non-integer-compatible float to int conversions" RFC that he has proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 83. Today I'm talking with George Peter Banyard, about another tidying up RFC, George, would you please introduce yourself? George Peter Banyard 0:31 Hello, my name is George and I work on PHP in my free time. Derick Rethans 0:35 Excellent. I was just talking to Larry Garfield, and he was wondering whether you or himself, are the second often guests on this podcast, but I haven't run a stats. But it's good to have you on again. Following on for from other numeric RFCs, so to speak. This one is titled deprecate implicit non integer compatible floats to int conversions. That is a lovely small title you have come up with. George Peter Banyard 1:01 Yeah, not the best title. Derick Rethans 1:03 What is the problem that this RFC is trying to solve, or rather, what's the change that is in this RFC is trying to solve? George Peter Banyard 1:11 Currently in PHP, which is a dynamic language, types are not known at the statically at compile time, so it's so everything's mostly runtime. And most type conversions are relatively sane now in PHP 8, because like numeric strings have been kind of fixed. But one last standing issue is that floats will pass an integer check, without any notices or warnings. Although floats, don't usually fit in integer will have like extra data which can't be represented as an integer. For example, they can have a fractional part, or they can be infinity, or not a number if you divide, like infinity by infinity, or 0 over 0 or other things like that. Derick Rethans 1:55 These are specific features of floating point numbers on computers? George Peter Banyard 1:59 Yes. Derick Rethans 2:00 Is there any prior work that is RFC is building on top of George Peter Banyard 2:03 It builds up on top on the saner numeric string RFC, because it tries to like make the whole numericness of PHP, as a concept better and like less error prone, but in essence it's mostly self contained. If you use a floating point number, were you should be using an integer. If the floating point number, is considered an integer because it only has like decimal zeros, and it fits in the integer range, then you'll have like no error. So if you use 15.0 as an array key, it gets converted to 15, you'll get don't get any error because it's like well it's just 15 like it doesn't mind. But if you do 15.5, then you'll get like a, like a deprecation notice which will tell you it's like, well, here's the key gets implicitly converted, you should be aware of this because if you use 15 somewhere else, you'd be overriding the value. Derick Rethans 2:54 And that currently doesn't happen yet. And you say, fits in the integer range, what ranges are we talking about here? George Peter Banyard 3:01 On 32 bit, which I would imagine most people don't use any more, is a very just like minus 2 billion to 2 billion, because PHP integers are signed, and on 64 bits, it's like nine quintillion? Derick Rethans 3:15 It is a 64 bit range. George Peter Banyard 3:17 You should be fine by not hitting them, but like if you do some maths computation with it, you might hit like the boundaries, or you do like very edge cases where you try to like mess up with PHP, like that so it's something which you can do it. Derick Rethans 3:30 From what I understand floating point numbers they store, integer things as well as fractional things, and from what I remember is that the range that floating point numbers can store things in without losing any precision is something like 53 bits I think. So if it's larger than 53 bits, then it would have to store something in its floating point part of it, and hence starts losing numbers. George Peter Banyard 3:56 Yeah, I'm not the expert on floating point numbers. Derick Rethans 4:00 They are tricky. George Peter Banyard 4:01 They are tricky and the standard is very confusing at times, especially with like NaNs and like signalling NaNs and like, but the basic concept is like exponential numbers like exponential like scientific notation. You have like your base number, and then you have like a power, and then just gives you like a larger range, but with exponential scientific notation, you also lose like precision, because you don't really care about like the minute numbers. Derick Rethans 4:24 This is why there is a conversion issue both ways right, a floating point number, without fraction or NaN or INF, will always fit in the 64 bit integer. George Peter Banyard 4:32 Yeah. Derick Rethans 4:33 But although you can represent a 64 bit integer in a floating point number. You can't do that without losing data if the integer takes up more than 53 bits, so the round trip conversion also has issues. George Peter Banyard 4:44 That's why, like, that's the check. I'm using, because initially I was just doing an F mod check, because I was like oh just check for like fractional parts, and then Nikita was like: Well, you probably should do round trip checks. Because you also catch infinity, not a number, which also has like some interesting implications, that like if your floating string is considered infinity and you cast it to an integer, you get max int. If it's a floating point number, you'll get zero, which is an handy thing that needs to be like also dealt with, because I just discovered that while working on that. Trying to already get rid of like conversions, is I think a good first step on making most things sane. And we already do that with string offsets. So it's also just like making it more of a global aspect of the language. Derick Rethans 5:35 So this RFC only talks about converting floats to int, but not int to float? George Peter Banyard 5:40 Yeah, because mostly integer to float is a, is a safe conversion, because you can, it fits usually in a floating point number, except apparently 64 bits. Derick Rethans 5:52 I think it is something that we actually should also look at and this is not something I'd realized because I originally thought that before reading the RFC, this that is what you were trying to get, but it's they other way, the other the way around here; so I can see another upcoming RFC to do the other side of the conversion as well. George Peter Banyard 6:11 I imagine so. I put that on my to do list, which is already growing larger and larger with every small idea. I encountered in the language which I'm like, why on earth is PHP doing that? Derick Rethans 6:22 But to get back to this RFC in which kind of situations can this trip up developers? George Peter Banyard 6:27 I would expect most of the time it shouldn't, because this every time you use integers, floating points is mostly maths code, or, if you're doing something very weird, like storing money as a floating point number, which you shouldn't do, but people do it anyway. Derick Rethans 6:45 Does PHP have an arbitrary precision type? George Peter Banyard 6:48 No we don't. But you can use GMP. Derick Rethans 6:52 I don't know what it stands for either, what GMP stands for. They also used to be BCMath, is that's still around as well? George Peter Banyard 6:58 Yeah, BCMath is still around. Most of the time you don't need arbitrary precision, at least for traditional PHP code which is a web based and possibly like E commerce so you're not hitting like insane numbers, but it is mostly full of direct cases or also with like string conversions to like integers, that's I think like, my main point is to try make also string conversions to then numeric type like to make them safer. I think was the previous RFC was the saner numeric string, there's maybe an expectation that you can finally bypass the strict type mode, because everything is strings in HTTP land. So if you get a value, and you just wanted to let like the engine, take care of like making it, it's a valid number and it doesn't lose precision, and you get an integer. That makes it helpful that you get these warnings and notices that hopefully in PHP nine which is who knows how many years away, we finally can like lock down on these edge case behaviours. Derick Rethans 7:57 The RFC is not making this stop working, but rather it will throw a deprecation notice? George Peter Banyard 8:04 Yes. Currently, yes. Derick Rethans 8:06 Why do you say currently? George Peter Banyard 8:07 The plan is in PHP nine, to make this a type error, which initially, I wanted to make it a warning instead of a deprecation notice, but then people on list were, well, like a warning is too strong, and it doesn't imply anything. And if you want to change this to like a type error you should make it a deprecation notice because it means the behaviour will stop working in the, in the later version. So that's why I changed it to the deprecation notice in the second iteration of the RFC. Derick Rethans 8:34 Because, I mean, she just said that could potentially impact already existing code. What kind of BC issues are ever this, by introducing this deprecation warning? George Peter Banyard 8:43 There are various operators, that will implicitly convert floating or float strings to integers. So those are like bitwise operators, shift operators, the modulo operator, the assignment operators of the above. If you try to assign a float to an integer type property. If you try to pass a float to a parameter integer type, or as a return type. Those will show deprecation notices. And then only for floats, not float strings, is the bitwise NOT operator, because that one works with strings as well. And if you use a float string it will use the normal string semantics. And then, as an array key because floating strings already so I noticed was as an array key. Derick Rethans 9:28 Do you think it is better to have a deprecation notice than in stead PHP silently truncating data? George Peter Banyard 9:35 Yeah, if you want that behaviour of like implicitly truncating, you can always use an int, cast, which will do the job for you. Which makes the code explicit and tells the intention of the developer, instead of just like, oh I got a float here, pass it to an integer. Derick Rethans 9:49 What's the reaction to this been so far? George Peter Banyard 9:51 Not many reaction on list, but voters currently one weekend, and it's been unanimously approved, so I'm pretty happy that most people are for it. Derick Rethans 10:02 It's always good to hear unanimous agreement, maybe I should switch my vote to No. As you have said the reaction has been fairly good. And obviously this RFC passed, so the reaction was good enough for this to pass. Do you think there will be some follow up RFCs for ironing out more things like this? George Peter Banyard 10:20 Possibly, I don't know if I'll get them into PHP 8.1. Because time, and I've got some other projects. But I think, maybe, to see you, I've just learned that like some integers lose precision as in floating point numbers, which I wasn't aware of. What's maybe a bit more controversial is to change the behaviour of casting floats, which don't fit into an integer range to now produce Max int, or minimum Int, instead of zero. You will need to put like deprecation notices or warnings when you use an explicit cast, which I don't know how people will feel about that. Derick Rethans 10:58 I see what you mean there. It will be an interesting discussion for when that happens I would say. George Peter Banyard 11:04 Yep. Derick Rethans 11:05 Would you have anything else out about is RFC itself? George Peter Banyard 11:08 Not really it's mostly straightforward. All the details are in the RFC, all the BC breaks are in the RFC. If you're an extension maintainer, there's only one BC break with like a function. When you take Zval and you convert it to an integer, you'll get a notice, which I expect most extension maintainers want their users to know that this is going to like throw at the later point. But you can also then do it manually if you want to support this behaviour implicitly in your extension. Derick Rethans 11:36 I think it is important that extension, that for extensions be if it doesn't suddenly change, but forcing an API change on them is often a better way than deciding to changing an existing API, I think. George Peter Banyard 11:47 The problem is is the API I'm using is used all over the PHP source code, changing that everywhere, felt a bit like hassle, but I've added like a C function which is long compatible, so you can check in advance if it will also do stuff like that. And then there's also a version which, which doesn't serve any notices so you can do it anyway. Derick Rethans 12:08 And that is a new function I suppose? George Peter Banyard 12:10 Yes. Derick Rethans 12:11 I think it's something that extension authors should look at in any case, I mean, we have this lovely upgrading dot internals file, where this certainly should fit in as well in that case, I suppose. George Peter Banyard 12:22 Yeah, it'll fit in. It's currently not that big as a file that usually gets big, a bit before feature freeze because all the changes, land then. Derick Rethans 12:30 I know how this goes. This is also exactly the next debug starts breaking again because of API changes. So far I have been lucky there, so there's not been too many in PHP eight one. Do you know actually how much time there is until feature freeze? George Peter Banyard 12:45 I would imagine it's end of July, as usual, that's the usual timeline. I don't know because RM selection hasn't happened yet, so I don't know how long that usually takes. Derick Rethans 12:54 You're talking about RM for release manager selection here. Once this happens all hope to talk to the new release managers as well, and get them to introduce themselves here. George Peter Banyard 13:02 Seems like a good idea. Derick Rethans 13:03 To chats about any favourite things for PHP eight one. All right, George, thank you for taking the time this afternoon to talk to me about another tweak to PHP's handling of numbers in general, and I'm sure it won't be the last one. George Peter Banyard 13:19 Thanks for having me, and I'll talk to you soon. Derick Rethans 13:22 Hopefully in a pub with a pint. George Peter Banyard 13:24 Yeah, that would be nice. Derick Rethans 13:27 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Deprecate implicit non-integer-compatible float to int conversions RFC: Saner Numeric Strings Episode #62: Saner Numeric Strings Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 82: Auto-Capturing Multi-Statement Closures London, UK Thursday, April 22nd 2021, 09:10 BST In this episode of "PHP Internals News" I chat with Larry Garfield (Twitter) and Nuno Maduro (Twitter, GitHub, Blog) about the "Auto-Capturing Multi-Statement Closures" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 82. Today I'm talking with Nuno Maduro and Larry Garfield. Nuno, would you please introduce yourself? Nuno Maduro 0:30 Hi PHP developers. My name is Nuno Maduro, and I am software engineer at Laravel, the company that owns the Laravel framework, and I have created multiple open source projects for the PHP community, such as Pest PHP, Laravel zero, collusion and more. Derick Rethans 0:48 Alright, and Larry, could you please follow up on that. Larry Garfield 0:51 Hello world, so I'm Larry Garfield. You may know me from several past podcasts here, various work in the PHP fig, and all around gadfly and nudge of the PHP community. Derick Rethans 1:03 Good to have you again Larry and good to have you here today, Nuno. The RFC, that we're talking about here today is to do with closures, and the title of the RFC is auto capturing multi statement closures, which is quite a mouthful. Can one of you explain what this RFC is about? Nuno Maduro 1:20 As you said, the RFC title is indeed auto capturing multi statement closures. But to make it simple, we are really talking about adding multi line support to the one line arrow functions that got introduced it, in PHP 7.4. Now, this new multi line arrow functions have exactly the same features as the one line arrow functions, so they are anonymous, locally available functions; variables are auto captured lexically meaning that you don't actually need the use keyword to manually import the use of variables, they just get auto captured from the outer scope. And the only difference really is one line arrow functions have a body with a single expression. This RFC actually allows you to use a full statement list that possibly ends with a return. Derick Rethans 2:18 Excellent, what the syntax that you're proposing here? Nuno Maduro 2:22 Well, as you may know, one line arrow functions have the syntax, which is fn, parameter list, and then that arrow expression thing, and this new RFC proposes that, optionally, developers can pass a curly brackets with statements, instead of having that arrow expression syntax. Now, this curly brackets with statements, simply denotes a statement list that potentially ends with a return. Concerning the Auto Capture syntax, we will be just reusing the Auto Capture syntax and feature that already exists on one line arrow functions, meaning that you don't need the use keyword to manually import variables. And of course, the syntax itself, in the in the feature, works the exactly same way. Concerning the syntax, it's also important to mention that this RFC was done in combination with the short functions RFC from Larry, but I think I'm going to let Larry speak about that later on this episode. Derick Rethans 3:26 What's the main idea behind wanting to introduce this auto capture multi statement closures. Because from what I understand the arrow is now gone, and it's been replaced by just a function body within curly braces. But why would you want to extend a single expression to like multiple statements? Nuno Maduro 3:44 Well, we all know that long closures in PHP can be quite verbose, even when you want to perform a simple operation. And this is largely due to the syntax syntactic boilerplate that you need to use when using long closures to manually import all the variables with the use keyword. Now, while one line arrow functions solve this problem to some extent, there are also a few cases that you might want to leverage the simplicity of auto capturing, but using two or three lines in a statement list. And one example I can think of is that when you are within a class method with multiple arguments on it, you might want to just return a closure, using all the arguments of that method, and actually using the use keyword in least all of the arguments is in this case, redundant and even pointless. It is also some use cases for example with array filter or similar functions, where they use keyword just adds some visual complexity to the code. We believe that the massive majority of PHP developers are really going to appreciate this RFC, as it makes just the code more simpler and shorter. The community loved these changes really, as proven on property promotions, one line arrow functions, or even the null safe operator. Derick Rethans 5:10 And I think this is something that the PHP language itself is moving forwards to, right. I see in the RFC that you're, you're trying to make sure that syntax stays consistent with itself and you use a word called sigils here for some of these things. What were the important parts of making sure that the syntax stays the same? Larry Garfield 5:30 So this actually relates to the short functions RFC. So the short functions RFC, as we discussed in previous episode, I'm trying to make it possible to write single line expression named functions in a more compact way. And that is a different problem with different syntax than auto capturing closures, which is why we needed to work together on these, because we want to make sure that the syntax for the various different kinds of functions that PHP supports, are all consistent with each other, and this piece of syntax indicates this thing consistently, rather than: in some cases this syntax means this, and other cases that means this other thing. Didn't want to end up with that kind of mess. After the short functions RFC, one part of the feedback was, there are people working on auto capturing long closures. Y'all should work together and make sure that the syntax plays nicely. And, for various reasons I just sat on it for a while, before finally getting hold of Nuno earlier this year, and we got together and talked and made sure that what he was doing and what I was doing, the syntax complemented each other. We ended up with in our discussion and analysis is that you can have a function that is named or anonymous. It can have Auto Capture or not. And it can either be a single expression with no return statement because it just evaluates to that value, or it can be a list of statements, one of which could be returned, potentially multiple could be returned. And right now we have syntax in PHP to support three possible combinations. But there's actually eight possible ways to combine those two, those three variants. We looked at all right, we have three of the eight, which of the others makes sense to have, and the two that makes sense to have are: short functions named function, no closure, and I say an expression body is what my RFC does. And then, anonymous Auto Capture statement list, which is what Nuno's RFC does. That rounds out that list, and the other combinations, I'm not sure actually have a purpose. Technically could exist and this also then means, if in the future we wanted to add those, then we know exactly what the syntax is going to be for those and what they would mean it's all just following that established pattern, so it makes it really easy for people to learn and understand. So what we end up with, out of these two RFCs together, which can stand alone, and they're going to be put to votes separately, but as I said complement each other. If you have: something, double arrow, expression, that consistently throughout the language ends up meaning evaluates to this expression. Short lambda means: you know, this function evaluates to this expression, a named function, double arrow expression, array key double arrow expression, a match statements, and so on and so on and so on, double arrow always means evaluates to expression. And the key word function means declaring a function named or anonymous, that does not auto capture anything. The case of a named function, there's nothing to capture; in the case of a closure, you'd have the manual capture with a use statements exactly like we've had since 5.3. And the FN keyword means, this is a function that is going to Auto Capture. But oh fn statement list with curly braces. I know that means: Auto Capture, statement list, or keyword function with an arrow: I know that means not auto capturing expression, and all these combinations then just make sense and they're easy to learn and they fit together well. That's really what we were trying to make sure, that between these two RFCs we ended up with that consistent set of rules around the syntax that was not designed that way originally but plays out in that way, and we can now make sure it stays playing out in that way that is internally consistent, and therefore easy to learn, easy to document and so on. Derick Rethans 9:35 Because all the things that you just mentioned, they're already in place for existing syntax. Larry Garfield 9:40 Correct. And so we're just taking existing syntax that means a thing, and combining those existing pieces of syntax in a new way. In some cases that syntax didn't necessarily mean that deliberately, for example, the FN keyword on short lambdas, was not added for the purpose of declaring Auto Capture. It was added because we needed to have some kind of syntax there to keep the lexer happy, fine, but now that it's there, we can say: all right, that is going to now mean auto capturing function, because that's something consistent with the language as it is and the language as it evolves into. Derick Rethans 10:14 Is the intention of this new syntax to replace long closures? Larry Garfield 10:19 Not entirely. I suspect, a great many use cases of long closures could get away with then using the auto captured version, but there's no plan to remove the long closure version. Part because there are cases you do need the manual closure, particularly, it's still the only way to capture a variable by reference. All the other versions are by value, which 99% of the time is what you want, but in that other 1% If you do need to by reference, then you've still got the long version. Derick Rethans 10:48 The long version that uses the use keyword. Larry Garfield 10:51 Right, and then you're manually capturing things, are cases where you would want to, you know, use the same variable name inside and out but not refer to the same variable, so in that case you can use the long version, and then you don't have that collision. In practice however, the only languages I know of, that have explicit capture on closures are PHP and C++. As far as I know, every other language, including the other major scripting languages, Auto Capture. We're really the oddball out, and in practice, I think using Auto Capture is going to be fine. It's going to be easier, and we're not going to introduce any substantial amount of new bugs around it. The place where that might cause issues, is if you have a long function with a long anonymous function in the middle of it with Auto Capture and you can't keep track your variables, in which case you're already doing it wrong anyway, you should have shorter functions and shorter closures. A real use case for this is: I have a function that's a closure that's two or three lines long, because not everything in PHP can be an expression, sometimes it has to be a statement. So okay, this thing is going to be two lines long instead of one line long, but I don't want to have to convert to the super verbose long version and manually declare all of my imports, I just want to add a second line. And so this makes that use case, a lot easier and a lot more convenient. Derick Rethans 12:11 I remember when discussing the match operator, which I think I also spoke to you about? Larry Garfield 12:17 Match is the one you spoke to yourself about. Derick Rethans 12:19 That's what it was, yes. Larry Garfield 12:20 It was a fun episode. Derick Rethans 12:22 When I discussed the match operator with myself, I think I looked at whether it was possible to extend a match expression with having multiple statements on the right hand side as well, where it is currently: it's the arrow with a similar expression. Is this something that you'd be looking at to tie this auto caption closure style into as well? Larry Garfield 12:42 It's a related issue that has been discussed mainly around match, around supporting multi line expressions. And that'll be some kind of syntax which hasn't been defined yet to list a series of statements, which can then be wrapped up together and have a final statement that is returned, and then the whole thing gets evaluated, and can be used in place of an expression like in a match statement or a function body. If we had a syntax for multi line expressions, that would be an alternative way to get to the same place this RFC gets to, because you could say: FN, parameters, double arrow, multi line expression, with whatever syntax that ends up being. And that gets you essentially the same thing at the end of the day. Is that good or bad, I'm kind of torn on it. What we point out in the RFC, is this syntax for auto capturing multi line closures, gives us a kind of roundabout way to put a multi line body into a match arm, where what you, the single expression, that the match arm evaluates to, is a multi line closure with Auto Capture that you then immediately self execute. The syntax for that is a little bit quirky. It looks kind of like older style JavaScript. We have parenthesis, function definition, closed parenthesis, open paren, close paren, so it just executes immediately. It's not ideal for that use case. Personally I think multi line match expressions, or multi line match arms, are a rare enough need that on the rare occasion you need it, this would be good enough. And if it's not good enough, you really should break that logic out to a separate function anyway and just call that. Not everyone agrees with that. So that's more a more of an interesting side effect of this RFC than a goal per se. One have to use it in that fashion I probably not will not use that in that fashion, very often, but it's, we now have a solution for that use case, if you actually have that use case, and we don't need any dedicated syntax for that. Derick Rethans 14:46 That could be part of a future RFC if people still feel inclined, that they need that. Talking about things in the future, is there any future scope with this RFC as well? Nuno Maduro 14:57 There is really nothing planned on this RFC is future scope. Yet, there is something that I will like to personally explore in the future, that now we have this fn keyword, that means Auto Capture, or access to the outer scope. And I think something would be very cool, is to explore named functions in the way that they are declared globally, with something like fn get name, and then that function would be able to access the outer scope, but again this is something that personally I would like to explore but it's not included in this RFC. I just plan to explore this in the future now that we have this possible combinations that Larry just explained. Derick Rethans 15:43 It's always interesting to see what people think of when you post RFC to the mailing list. What sort of where the biggest arguments against introducing this new syntax? Larry Garfield 15:54 It's interesting. For both of these RFCs both my short functions, and now the Auto Capture multi line, the feedback from the public community on Reddit, on Twitter and so forth has been extremely positive people love: Oh, I can write less syntax and get stuff. It's been not universally but overwhelmingly positive. The feedback on the mailing list has been decidedly mixed with some people saying, cool this you know I've been waiting for that, and others saying: Why? Been push back: If your capture statement is that complex you, you're doing it wrong anyway. Or, if you do have Auto Capture, rather than explicit, your odds of capturing something you don't intend to are higher. And so you can introduce weird bugs that way. Derick Rethans 16:42 Which aren't really arguments against having a multi line out to capture closure, with two or three statements. I mean if you're putting 50 lines in there then sure, you can make that argument I guess. Larry Garfield 16:53 Exactly, and that's kind of our response is: if you have a complex enough piece of code that Auto Capture becomes problematic. You have a complex enough piece of code that you really should be manually capturing, or just refactor your code so you don't have that much complexity. That since it kind of becomes a good indicator of need to refactor. Then there's always the argument of: why should you add more syntax for anything, you know, we've got one syntax let that rule everything and that's that comes up with every RFC. Points are valid, to an extent, but I think the convenience factor of being able to write code more naturally with less effort that does stuff that right now is just clunky, is a stronger argument, especially given that most other languages don't have manual capture and get along fine. People have mentioned JavaScript as an example where the Auto Capture used to be highly problematic, then resolved with an extra keywords you can declare a variable inside a closure with let, that is then locally scoped and overrides anything in a parent scope. I don't think PHP needs that in part because we don't use closures as overwhelmingly as JavaScript does, and honestly that problem has kind of gone away in JavaScript, as they've introduced real classes, and other more traditional techniques that obviate the need for those kind of closure inside closure inside closure inside the closure nonsense. Python doesn't actually have multi line lambdas as far as I'm aware, because they have named functions that are scoped local. Ruby, as far as I know just does Auto Capture and doesn't have any special syntax around it. So, I have not heard of them having any problems. As I said, C++ has manual capture and that's the only one I can think of that has it. I think looking at other languages, the problems people have pointed out are more hypothetical than real, and I'm hoping that, you know, voters on the list will see all right. This makes life easier and the problems with it are hypothetical, not real problems that we've seen in the wild. So let's just make life easier for people. Derick Rethans 19:05 Is there any chance of this breaking BC, somehow? Larry Garfield 19:08 It shouldn't, the syntax right now would be syntax error. I don't see any, any BC breaks possible. Derick Rethans 19:16 That's always a good thing isn't it. Larry Garfield 19:18 Yes. Derick Rethans 19:19 You were talking about appealing to the voters on the mailing list, which have the right to vote on features usually. When do you think you will be putting this up for a vote? Larry Garfield 19:29 Probably around the end of April. We can put probably both RFCs up for a vote, you know, let the chips fall where they may. As you said both RFCs are stand-alone. If one pass and the other fails everything still works. Obviously we both like both the pass. Derick Rethans 19:43 And with both you mean: both this RFC, which is the output capturing multi statement closures RFC, as well as the short functions RFC that we spoke about in episode 69. Larry Garfield 19:53 Correct. Derick Rethans 19:54 Thank you for taking the time today to talk to me about the new RFC that you're proposing. Larry Garfield 20:00 Thank you, Derick always good to talk. Nuno Maduro 20:01 Yeah, thank you so much for having me. Derick Rethans 20:06 Thank you for listening to this instalment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patron. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Auto-Capturing Multi-Statement Closures Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 81: noreturn type London, UK Thursday, April 15th 2021, 09:09 BST In this episode of "PHP Internals News" I chat with Matthew Brown (Twitter) and Ondřej Mirtes (Twitter) about the "noreturn type" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 81. Today I'm talking with Matt Brown, the author of Psalm and Ondřej Mirtes, the author of PHPStan, about an RFC that I propose to alter the noreturn type. Matt, would you please introduce yourself? Matthew Brown 0:37 Hi, I'm Matthew Brown, Matt, I live in New York, I'm from the UK. I work at a company called Vimeo, and I've been working with for the past six years on a static analysis tool called Psalm, which is my primary entry into the PHP world, and I, along with Ondřej authored this noreturn RFC. Derick Rethans 1:01 Alright Ondřej, would you please introduce yourself too? Ondřej Mirtes 1:04 Okay, I'm Ondřej Mirtes, and I'm from the Czech Republic, and I currently live in Prague or on the suburbs of Prague, and I've been developing software in PHP for about 15 years now. I've also been speaking at international conferences for the past five years before the world was still alright. In 2016, I released PHPStan, open source static analyser focused on finding bugs in PHP code basis. And somehow, I found a way to make a living doing that so now I'm full time open source developer, and also father to two little boys. Derick Rethans 1:35 Glad to have you both here. We're talking about something that clearly is going to play together with static analysers. Hence, I found this quite interesting to see to have two competitive projects, or are the competitive, or are the cooperative. Matthew Brown 1:56 I think half and half. Derick Rethans 1:57 Half and half. Okay. Ondřej Mirtes 1:59 Competition is a weird concept in open source where everything is released for free here that Derick Rethans 2:04 That's certainly true, but you said you're making your living out of it now so maybe there was something going on that I'm not aware of. In any case, we should probably chat about the RFC itself. What's the reason why you're wanting to add to the noreturn type? Ondřej Mirtes 2:18 I'm going to start with a little bit of a detour, because in recent PHP development, it has been a trend to add the abilities to express various types natively, in in the language syntax. These types, always originally appeared in PHP docs for documentation reasons, IDE auto completion, and later were also used, and were being related with static analysis tools. This trend of moving PHP doc types tonight this type started probably with PHP seven that added scalar type hint. PHP 7.1 added void, and nullable type hints, 7.2 added object type, 7.4 added typed properties. And finally, PHP, 8.0 added union types. Right now to PHP community, most likely waits for someone to implement the generics and intersection types, which are also widely adopted in PHP docs, but there's also a noreturn, a little bit more subtle concept, that would also benefit from being in the language. It marks functions and methods that always throw an exception, or always exit, or enter an infinite loop. Calling such function or method guarantees that nothing will be executed after it. This is useful for static analysis, because we can use it for type inference. I have an example, when you're accepting nullable object as a function parameter, you probably want to eliminate the null value before you can safely call a method on it. So, you will write if $object, three equal signs null, somehow handle this situation, and at the end of the if statement, you will return, or throw an exception. But instead of return, or throw, you might choose to call framework specific or a library specific function, that also always throws or exits the process. This will also tell the user, the IDE, and the static analyser, that below the if statement, the variable can no longer be null. For example, if you ever called mark test skipped in PHP unit, or if you call the abort function in Laravel, you've already used the function that would benefit from being marked with noreturn keyword. Derick Rethans 4:24 You mentioned that currently people use the docblock no it @noreturn for that. Why would it be better to have it in the language? Matthew Brown 4:31 Jumping off, Ondřej's point. PHP has this has this thing, right, you know things where the doc block, but PHP also, it's a language where developers are used to the language telling them if they did something wrong. So whereas other languages, you might need, like for example, JavaScript, they can be a bit more permissive. Developers when they write PHP code, they're used to getting errors instantly. They call a function with an object instead of a string, and expects a string, and it's marked in the signature as expecting a string, when they run that they get an error. And so that's just a kind of way that most PHP developers write cod. With a noreturn type, we sort of thought that there's a benefit to developer, having written a noreturn type, instantly getting an error if they actually do something that returns. So it follows that pattern that PHP has adopted, of, if I do something that violates a type that I've annotate, that I've explicitly added to the function, PHP should error. There's also a useful sort of side effect here, which is that when you add noreturn to a function, it's guaranteed that it will never return the context. If you call it, it will never not return because it will either whenever not throw an exception or exit, because if the noreturn is invalid, if it does actually do something where it's returning somehow, PHP will then throw a Type error. Cause it's supported by the language. If it wasn't supported by the language, you'd be able to use a function that called noreturn, and it wouldn't actually return. I mean obviously Ondřej and I are big fans of static analysis. The language itself isn't just to pat ourselves on the back and think, you know, we had the right idea when we were doing static analysis, it's because it can help PHP developers write code. Derick Rethans 6:16 The void return type can only being used in specific locations, I mean you can't type a property for examples void. Are the similar restrictions on where noreturn can be used? Ondřej Mirtes 6:27 Yeah, right now it can be used just as a return type. There might be some other possible usages, but they are not part of this RFC. For example the noreturn bottom type could be used as a parameter type to denote a function that shouldn't be called. So, this might be some relevant use case, but I've already had a feature request for PHP Stan, to actually support this type as a parameter type for callbacks that should never be called, but I don't remember why that person wanted this. Once we have generics, or at least the possibility to type what's in an array, we could also use the no return type for that. For example, array that contains noreturn, or never, would mean that the array is empty. And also during static analysis, the type inference engine also produces this type internally, basically to mark dead code. So for example if you ask better variable that can only ever contain an integer, if that variable can be a string, you're creating a condition that cannot be executed, that will be always false, and the resulting type of the variable inside that condition is the same type as noreturn or never. Derick Rethans 7:41 You mentioned never there we haven't spoken about yet, but we'll get back to that in a moment I'm sure. Is there any prior art for this? Matthew Brown 7:47 Yes, a number of languages have a noreturn type. Hack has specifically a noreturn type, Hack, if anyone listening doesn't know, hack is a language created as a sort of offshoot of PHP. Engineers at Facebook, when they were running into issues with PHP from about the moment they started using it in 2007/2008 as the site started growing, and performance really became an issue. And so eventually they created their own version, basically. And one of the benefits of working at Facebook is, you have lots and lots of smart engineers, and they added a lot of different typing functionality to this new language. And so one of the things I added was a noreturn type, as well as adding generics and many, many other things. Another language with prior art is type script. TypeScript has a never type, which is essentially the same. It's a bottom type as Ondřej was talking about. And a bottom type is the subtype of all subtypes. You have a class structure, you have exception, and then you have a child class of logic exception, and noreturn, is the subclass of subclasses of the child class, the thing right at the bottom of the type hierarchy, and so it can always be returned when you would expect some other thing. But basically, this is the understanding of what a bottom type is. I talked about interpreted languages to interpreted languages, but also many compiled languages, most recently Rust, that have the notion of a bottom type. It's a type, where you're guaranteed that program execution ends, in some way shape or form. Derick Rethans 9:23 You mentioned that noreturn is the bottom type, how does that play with variance that PHP implements? Matthew Brown 9:32 The concept of variance for return types is essentially, if a parent method returns something like a, an exception class, the child classes can either return an exception class, or they can return children for that same method of the exception. So let's say I have a method getException, that is described as returning an exception, the child methods in our child class, so child::getException can either return an exception, or they can also say they return a child class, so they can say, I actually return a logical exception, and this is valid according to Liskov substitution principle, which is to say: you're allowed to return a child type of whatever the overridden method was. So where this comes into play with noreturn is, noreturn is defined as is the bottom type is at the very bottom of all those class structures, you can always return a bottom type, basically And this makes sense if we just think about it, you're not breaking a contract, if your function always returns or exits; the variance rule to kind of follow that. Derick Rethans 10:43 How would that compare with void? Because void has some interesting variance rules as well right? Ondřej Mirtes 10:49 Actually, no or little similarities between void, and noreturn. Because when you are calling a void function or method, you expect it to do something, and then actually continue in the execution flow. Not expect to read the return value, but with noreturn, you call a method, and you don't expect it, the execution flow to continue. These are completely different, and I actually don't know how people can mistake one for the other. Derick Rethans 11:22 Yes, seems very, very different to me as well. The RFC talks about alternative ways of introducing noreturn. And one of the things that had mentioned, is using the attribute. Attributes, being introduced in PHP 8.0. Why did you decide not to implement it as an attribute or suggested as an attribute instead? Matthew Brown 11:43 Attributes I think are really cool. I think attributes have a place in the language, obviously they have a place as the RFC described, in place a docblocks, they can be reflected very quickly at runtime. And I also I'm interested in ideas like a deprecated attributes. And also I've just been kind of toying around in my head, the idea of a pure attribute, which could guarantee at runtime that a function with that attribute, was pure. It would never, for example, use a property, or it would never use like a static variable. We could guarantee purity of functions which would interest the pure functional programming people Derick Rethans 12:26 Could you explain what a pure function is? Matthew Brown 12:28 A pure function is a function that doesn't use any other data but the data you provide it. If I have a multiply function that takes two parameters, x and y, and it returns the multiplication of those things, you would call that function pure. There are many ways the function can become impure. One of the ways is it can have output, you can have IO output for example so if the body of the function you then echo the value of x, before returning, that function becomes impure because it's changed the environment that it operated in slightly. Additionally, if you metal memorize the value of x. So let's say you have x and y as inputs, and then in the first line, you take the value of a property elsewhere, and you add x to that value, and then you multiply that result, then that function is also impure, you're using data from outside the function to return this value. So the idea of a pure function is one which essentially can be modelled mathematically, and that's why some kind of purists, like the this idea because it allows things to be modelled mathematically, but more importantly, then it allows those functions to be tested very effectively. Some implements purity, so that you can add a docblock annotation to function and it will tell you that, whether or not the function is pure. This has extra benefits when writing really complex code. So the most complex code that Psalm has, which performs some boring computation, I've added these pure annotations everywhere. And what it does, it forces me to write the code in a way that avoid side effects. The hope is from my end that writing this very complicated code in a pure fashion, makes it easier to debug at some later date. Derick Rethans 14:20 Thanks for that. Matthew Brown 14:21 I think attributes are great and have these uses. I don't believe that attributes are useful to encode types, because PHP has a place where you can already represent types, you know, we've introduced into the language itself, the notion of typing, you know obviously many years ago. I think there is a benefit to where possible, keeping the types as types. There was a suggestion that noreturn could be an attribute instead, because it in some way it's it's really about behaviour. But it's still a type, and in the wider programming community, there is prior art for it to be considered a type. So there's basically no benefit to my mind so making it an attribute. And as well, the implementation as a type is very small, you know, it's less than, well under 100 lines of actual written PHP to implement this feature because it uses the existing checks that we already use. We also use for other return types, and to make an attribute we kind of take it out and very much expand the implementation. There are two good reasons there to not want to use an attribute. Ondřej Mirtes 15:31 There are not very useful combinations. If noreturn was an attribute, then what would you write as a return type. There are not many useful combinations of what it could be. Derick Rethans 15:44 And it also can't be used with any kind of variance rules any more. Matthew Brown 15:48 Or at least if it were to be used for variance rules then we would have to write that logic. You'd be like why are we writing this logic in this particular way, it wouldn't make sense. Derick Rethans 15:57 Because noreturn is a type, and not a behavioural thing. Makes perfect sense. Matthew Brown 16:03 But it's both a type and a behaviour. In the same way that when you actually say, this function returns a thing, PHP then does a behavioural or check to make sure that that function always returns. You could argue that every type is essentially a behaviour, because you're saying the behaviour of this function has to return a value. Derick Rethans 16:21 Earlier, one of you mentioned instead of noreturn, the never keyword. Is that the only alternative that that was discussed are the further ones? Ondřej Mirtes 16:32 Well there's noreturn and never and the RFC is now going through the voting process, so the secondary vote is about the name, and some languages also use nothing. It feels more natural to say that that function never returns, or using the noreturn keyword, then, saying that it returns nothing which blends closer to the void, void keyword Derick Rethans 16:58 Earlier you were mentioning that for future scope you wanted to use this new keyword that you're suggested to introduce also in all the locations where perhaps noreturn does not make sense. Ondřej Mirtes 17:08 Yes. Also. What no return has going for it is that it's unlikely to be used as a class name somewhere so making it, whereas if key word isn't an issue, but just as you said, it looks like a key word for a single purpose being written in a return type thing that it's quite obvious which one of us two like which keyword, because I like never more. And one reason is that it's a single word and it reads more naturally in the source code, and it's also looks more like a full fledged type and TypeScript, uses the same keyword. Derick Rethans 17:42 Why did you put noreturn in the RFC? Ondřej Mirtes 17:44 Because Matt likes it more. Matthew Brown 17:47 Yeah, I wrote the first draft of the RFC, I got first dibs, but this is a big point of contention with Ondřej and I, and we're almost at the point of not speaking to each other, because I'm on one side and he's on the other. And it looks at the moment like never will succeed. I think the TypeScript thing is a good point. When I wrote the RFC originally, I wasn't thinking that so many PHP developers write TypeScript. I hadn't really factored into my head. And I think, given that it does make more sense that never is used. Derick Rethans 18:21 Looking at how recurrent voting is going, never has 32 votes going for it, and no return has 14 votes going for it. Ondřej Mirtes 18:29 Just kidding. I can't wait to have a beer with him again, once the world is, is fine again. Derick Rethans 18:35 Me as well. Matthew Brown 18:36 He can't start inventing new words; like yeah ironically naming is hard right. Derick Rethans 18:41 Definitely the case. At the moment it's very clearly looks like that, the new keyword is going to be never, with 40 votes for introducing a keyword to begin with and 10 against, so that looks like a done deal. Would either of you have anything else to add? Ondřej Mirtes 18:57 Yeah, Derick, last time I refresh the wiki, I noticed that you haven't voted yet so what is going to be your vote? Derick Rethans 19:04 I intend not to vote until I've spoken to the people on the podcast. Matthew Brown 19:09 Great, great. Derick Rethans 19:10 I will make sure to vote. Having said that, thank you very much for taking the time today to talk to me about this RFC. Matthew Brown 19:17 Thank you. It was a pleasure. Ondřej Mirtes 19:19 Yeah, I've been following this podcast closer since the beginning, so I'm happy I was able to finally join, and have something to talk about here. Thank you. Derick Rethans 19:26 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: noreturn keyword Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 80: Static Variables in Inherited Methods London, UK Thursday, April 1st 2021, 09:08 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Static Variables in Inherited Methods" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick, welcome to PHP internals news, the podcast dedicated to explain the latest developments in the PHP language. This is episode 80. In this episode I speak with Nikita Popov again about another RFC that he's proposing. Nikita, how are you doing today? Nikita Popov 0:30 I'm still doing fine. Derick Rethans 0:33 Well, that is glad to hear. So the reason why you saying, I'm still doing fine, is of course because we basically recording two podcast episodes just behind each other. Nikita Popov 0:41 That's true. Derick Rethans 0:42 If you'd be doing fine 30 minutes ago and bad now, something bad must have happened and that is of course no fun. In any case, shall we take the second RFC then, which is titled static variables in inherited methods. Can you explain what is RFC is meant to improve? Nikita Popov 1:00 I'm not sure what this meant to improve, it's more like trying to fix a bug, I will say. This is a really, like, technical RFC for an edge case of an edge case, so I should say first, when I'm saying static variables, I'm not talking about static properties, which is what most people use, but static variables inside functions. What static variables do unlike normal variables, is that they persist across function calls. For example, you can have a counter static $i equals zero, and then increment it, and then each time we call the function, it gets incremented each time and the value from the previous call is retained. So that's just the context of what we're talking about. Derick Rethans 1:43 Why would people make use of static variables? Nikita Popov 1:46 I think one of the most common use cases is memoization. Derick Rethans 1:50 Can you explain what that is? Nikita Popov 1:51 If you have a function that that computes some kind of expensive result, but which is always the same, then you can compute it only once and store it inside the static variable, and then return it. Maybe possibly keyed by by the function arguments, but that's the general idea. And this also works if it's a free standing function. So if it's not in the method where you could store state inside the static property or similar, but also works inside a non method function. Derick Rethans 2:22 The keyword here in his RFC's title is inherited methods, I suppose. What happens currently there? Nikita Popov 2:29 There are a couple of issues in that area. The key part is first: How do static variables interact with methods at all? And the second part is how it interacts with inheritance. So first if you have an instance method, with a static variable, then some people expect that actually each object instance gets a separate static variable. This is not the case. The static variables are really bound to functions or methods, they do not depend on the object instance. Second problem is: What happens with inheritance? So you have a parent class with a method using static variables and then you have a child class that inherits this method. There are two ways you can interpret this, either this is still the same method, so it should use the same variables, or you could say okay the inherited method is actually a distinct method and should use separate variables. PHP currently follows the second interpretation. Derick Rethans 3:24 And is this even the case, if it's overridden, or just when it's inherited? Because there's a difference there supposed as well. Nikita Popov 3:30 Yeah, this is the basic model that PHP tries to follow but there are quite a few edge cases. The one that's what you mentioned if you override the method, then of course, you're calling the overridden method so the static variables don't even come in. But if you then call the parent method, then usually you would expect, if you do an override and just call the parent that the behaviour is exactly the same as if you didn't override it at all. That's not the case here, because now you're calling the parent method. So the problem you have here is that if you didn't override the method, then the child method and the parent method would have different static variables. Now if you call parent, then you're just calling the parent method, so you get back to one set of static variables, which are the same for both methods. You can see they're the same for both methods, but rather because you're calling the parent method, there is only one method involved, only one set of static variables involved. You can't really just like seamlessly extend a method that uses static variables, without changing the behaviour by accident. Derick Rethans 4:37 This sounds all very complicated. Nikita Popov 4:39 Yes I said, I did warn you that this is an edge case of an edge case. Derick Rethans 4:44 But I think the whole idea behind the RFC is to make it less complicated. Nikita Popov 4:48 Yes, this is like not the, the only issue you can run into. There is another one that we have actually addressed separately, but which still exists on earlier PHP versions, which is that the value of the static variables depend on the time of inheritance. Let me be a bit more explicit there. What we had in previous versions is that half your parent method was a static variables, then you call that parent method, static variables change, then you inherit it. And again, call the inherited method. In that order: first declare the parents, call it, declare the child, call it. In that case we will actually take the static variables at the time, where the inheritance actually happens. The first call onto the parent method modify the static variables, then we will use some modified variables. From that point on, it will have a separate copy, the child method, but it will like pick up these original modifications before inheritance happens. Now in PHP 8.1, we actually already fixed that so that we always use the original values, but this is like just one more thing to the list of weird things that happen, if you use static variables inside methods and you inherit them. Derick Rethans 6:06 I think I understand more of it now. Nikita Popov 6:08 I think you understand more than you ever wanted to know. Derick Rethans 6:12 You've mentioned the edge cases. What is the result, going to be once this RFC passes, which I'm going to think is quite likely Nikita Popov 6:21 The result is, hopefully, simpler than what we have, namely that static variables are really bound to a specific function or method declaration. If you have one method using static variables, then you have only one set of static variables ever. If it's inherited, you still reuse the same static variables because there is no separate inherited method. It's just the same method in the child class. That's the concept. Derick Rethans 6:52 And if you override it in an inherited class? Nikita Popov 6:55 If you override it, and you call the parent method, then the behaviour is unchanged because you still have just a single set of static variables, so there is no edge case here any more because the child, the child method never had a separate set. Derick Rethans 7:10 But if the overridden method also defines its own static variable with the same name? Nikita Popov 7:17 That's possible. In that case, once again this rule is that each method has its own static variables and methods can have static variables and the child method can have them as well, if they are overridden and there are no name clashes between them. Derick Rethans 7:32 Because they are going to be totally separated, meaning that any code you run in the inherited methods will only affect its static variables, and any code that runs in the original methods only affects the static variables that are bound to that specific method. Nikita Popov 7:50 Exactly. I mean, in the end, static variables are really the type of global state, just a type that is kind of isolated to a specific namespace and doesn't cause clashes, so in that sense, it's important that these things are isolated. Derick Rethans 8:06 And that would also make the behaviour, a lot more easier to explain than it currently is. Because every methods, has its own set of static variables. Nikita Popov 8:15 Yes. Derick Rethans 8:15 Or I should say, every declared methods, has its own set of static variables. Nikita Popov 8:20 I guess that is an important distinction. If you can see the methods inside your code and see the static variable inside it, then that is a distinct one. If we ignore the exception of traits. Derick Rethans 8:34 You're going to have to explain that as well. Nikita Popov 8:37 Well traits are always a special snowflake. Our general model for traits is that they are compiler assisted copy and paste. So a trait should roughly behave as if you just copied all the methods into the class that's using the trait. And in that sense, if you are actually copying the code of your method with a static variables, then it should also use a distinct set of static variables for each use. And that is also how it is proposed to behave. So that is like the one exception where you have a single method declaration in your code, but each using class will get a separate set of static variables. Derick Rethans 9:15 Because the code is copied in place, instead of linked, or used in place. It's also the case for all the methods declared in traits, they're also copied into the same symbol table as the methods belong to a class. Nikita Popov 9:32 Yeah, that's right. Derick Rethans 9:33 Should be reference counted in some way because you probably won't duplicate the exact data. Nikita Popov 9:38 We of course don't actually copy the methods, or at least most of the methods, but from the programmer perspective that's how it works Derick Rethans 9:46 Why do you say most of the methods, and not all the methods? Nikita Popov 9:50 We separate two things there. There is the method itself. So the op-array, and then there is all the stuff it uses like the opcodes the like arguments information and so on. What they do for traits, is we share all the data, and only create a separate op-array. Reason is that there are some differences. For example, we have to adjust the scope, we have to adjust possibly the function name if aliases are involved, and we have to adjust the static variables. So it's like kind of a partial copy we do. Derick Rethans 10:22 Which is probably the most efficient way of doing it? Nikita Popov 10:24 Yes. Derick Rethans 10:25 Because this RFC is changing behaviour due to bug fixes, I would probably argue, what kind of backwards compatibility issues are there? And have you looked at how much code that actually impacts? Nikita Popov 10:39 I haven't looked how much code it impacts because this seems like pretty hard to really analyse. I mean I guess something we could easily check is how much static variables are used in methods at all. But it would be hard to distinguish whether this change. I mean how to distinguish in a completely automated way. Whether this change makes behavioural difference for a particular use case or not. So I can't really give information on that, though I would expect that impact is relatively low because the common use cases, things like memoization, they aren't affected by it, or they are only affected by it in the sense that: Then you will memoize a value only once for the whole class hierarchy instead of memoizing it once for each like inherited class. Derick Rethans 11:32 So, it's going to improve the situation there as well, is pretty much what you're saying? Nikita Popov 11:36 Yeah, but I'm sure there are also cases where the previous behaviour was like intentionally used. I mean it was never documented, but you know if some behaviour exists, people will always make use of it in the end, but I can't really say exactly how much impact this would have. Derick Rethans 11:55 Do you have anything else to add, discussing this RFC? Nikita Popov 11:59 No, I think that's it. Derick Rethans 12:00 Then I would like to thank you for taking the time today, again, to talk to me about static variables in inherited methods. Nikita Popov 12:08 Thanks for having me once again. Derick Rethans 12:16 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: Static Variables in Inherited Methods Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 79: New in Initialisers London, UK Thursday, March 25th 2021, 09:07 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "New in Initialisers" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. As you might have noticed, the podcasts are currently not coming out once every week, as there are not enough RFCs being submitted for weekly episodes. I suspect that this will change soon again. This is episode 79. In this episode, I speak with Nikita Popov, about a few more language additions that he's proposing. Nikita, how are you doing today? Nikita Popov 0:43 I'm doing well, Derick, how are you doing? Derick Rethans 0:45 I'm pretty good as though, I always much happier when it's sunny outside. Nikita Popov 0:48 Yeah, for us to weather also turned today. Yesterday it was still cold. Derick Rethans 0:53 We're here to talk about a few RFCs. The first one, titled "New in Initializers". What is this RFC about? Nikita Popov 1:00 The context is that PHP has basically two types of expressions: ones, the ones used on normal code, and the other one in a couple of special places. These special places include parameter default values, property default values, constants, static variable defaults, and finally attribute arguments. These don't accept arbitrary expressions but only a certain subset. So we call those usually constant expressions, even though they are not maybe constant in the most technical sense. The differences really that these are evaluated separately so they don't run on the normal PHP virtual machine. There is a separate evaluator that basically evaluates an abstract syntax tree directly. They are just like, have different technical underpinnings. Derick Rethans 1:49 Because it is possible to for example, define a default value to seven plus 12? Nikita Popov 1:54 Exactly. It's possible to define it to seven plus 12, but maybe not to seven plus variable A, or seven plus some function call or something like that. Derick Rethans 2:03 I guess the RFC is about changing this so that you can do things like this. What are you proposing to add? Nikita Popov 2:09 Yes, exactly. So my addition is a very small one, actually. I'm only allowing a single new thing and that's using new, so you can use new, whatever, as a parameter default, property default, and so on. Derick Rethans 2:23 In this new call, pretty much a constructor call for a class of course, can arguments to be dynamic, or do they need to be constant as well? Nikita Popov 2:33 Rules are always recursive, so you can like pass arguments to your constructor, but they also have to follow the usual rules. So again, they also have to be constant expressions, or after this RFC, they can also include new, but they cannot include variable references and so on. Derick Rethans 2:50 Is this something that is being defined at the grammar level or somewhere else? Nikita Popov 2:55 We actually used to define that at the grammar level back in PHP five, but nowadays we just accept all expressions, and then we print you a bit nicer error message if it turns out it's not supported in this context. Derick Rethans 3:08 And that happens when the AST is created. Nikita Popov 3:10 Yeah, exactly. Derick Rethans 3:11 The new syntax additions is to allow new in default values in places. The RFC talks a little bit about the evaluation order, and what is this and why is is important. Nikita Popov 3:23 This is really the critical part. Right now, the kinds of things you can use in constant expressions are well as the name says, these can supposed to be constant, and not dynamic. This is not entirely true because, for example, you can like reference a class constant and referencing a class means that you call an autoloader. And that can run arbitrary code. Or you can trigger a notice that triggers an error handler and that also runs arbitrary code, like more like at the design level at the conceptual level, these really are constant expressions that are not supposed to have any side effects. Allowing new changes that in the big way, because new calls constructors and constructors can do whatever they want. I think it would be unusual to print out a string in the constructor, but some people might want to create a database connection there. The question of where exactly these expressions get evaluated becomes more important now, so we have to think about that a bit more carefully. Previously this was never really formally specified, how the evaluation works. There are some cases where it's pretty clear cut, for example, parameter default value of course gets evaluated when you call the function, and that parameter hasn't been passed. Similarly if you define a global constant, then the value is evaluated when you define the constant. And then there are the problematic cases, that I'm actually still not sure what to do about. The problematic cases are class constants, and static properties. Because the way these work right now is that they are evaluated lazily, so when you declare the class, they are not evaluated, and then later if you use the class, at least most uses of the class, they will be evaluated. Derick Rethans 5:07 It would happened when you're done instantiate the class? Nikita Popov 5:09 For example, yes. If you instantiate the class, if you add a static property and so on. But for example, not if you extend the class. In cases that like might potentially meet the evaluated initializers. The problem here is that, um, of course, if now your expressions can have side effects, then it's not great if you don't have like hard guarantee when it's actually going to be evaluated. On the other hand, what I actually implemented already, but I'm not sure it's a good idea is to change it to evaluate the expressions equally, so when you declare the class, we immediately evaluate all the static properties and constants. Derick Rethans 5:47 I think I found a problem with that as well. For example, if one of the default values is doing new DateTime for example, if you rely on that happening when you instantiate the object, you will get a different time than when you declare a class. Nikita Popov 6:01 I probably should have mentioned that explicitly. So when it comes to properties we'll always evaluate when you create the object. If you do a new DateTime then you will always get a new DateTime for each object, otherwise it wouldn't really make sense. The problem with the, with evaluation order for the static properties is, that if we evaluate immediately when they declare the class, then we can run into issues with dependencies. If you're using auto loading it's usually not a problem, but if you like declare classes manually, then you might have one class on using a constant from a class that you declare later on. Right now, that works fine, because the initializers are evaluated lazily, but if we evaluate them immediately then this is going to throw a fatal error because it says okay the class hasn't been declared yet. That's a backwards compatibility break, and there are also some other issues, for example with preloading, where it's not really clear when exactly we should be evaluating things in that context. So this is a point I'm undecided on, when things should evaluate. If we should stick with the current lazy evaluation or make it easier or possibly just limit the RFC, to not allow new inside static properties and class constants, because, at least for me personally, those are not the main use case. The three things that seem most important to me are parameter default, property default, I mean non static property default, and usage in attribute arguments. Derick Rethans 7:29 When I was reading the RFC, I was realizing that if a constructor throws an exception, for example if it's a default argument to a method, then what would happen? Would the method fail, or, or would the method call fail or something else? Nikita Popov 7:45 It would behave basically the same way as if you initialize the method argument, inside the method, you could do like you would do right now like with a null check maybe. So you would just get an exception thrown inside the method, and it would fail. Derick Rethans 8:01 And is that the same if you would use it as the new new syntax as a default value to have constructor arguments as well? Nikita Popov 8:09 Yeah, I think there's a situation would be the same, the one special case is if you use it as a property default, and then the instantiation fails, then we treat this basically the same way as the constructor having failed, which is a special situation, a slightly special situation, because we will also not call the destructor in that case. So we say that the object has been incompletely constructed so it will not get destructed. Derick Rethans 8:38 And of course if this is a standard class property, then this happens on instantiation of the class. How would this work if it would be for a static property? Nikita Popov 8:50 For a static property, well that, that depends again on the whole question of evaluation order. So for example the way things work right now, like without this proposal, is for example if you have a static property and you're referencing it constant that doesn't exist. Then, when you try to use the class you get an exception that okay undefined constant whatever. If you try to use it again, you still get the same exception, so you get this exception every time you use a class. This is what would happen on this case as well. Derick Rethans 9:24 So it wouldn't happen on class declaration, but when you start using it? Nikita Popov 9:29 Depending on where we evaluate. Either only on use or the first time on declaration and then afterwards in each use. If you still try to use it despite the declaration having failed, which is an odd thing to do but you would have to counter it somehow. Derick Rethans 9:45 You know, if it is possible to do people will find a way how to do it. Nikita Popov 9:48 Yes, certainly. Derick Rethans 9:49 Can you talk a little bit about recursion protection as well because the RFC talks about that? Nikita Popov 9:54 Well that's another edge case. So if you create an, have an, for example Class A with a property that has an initializer new A. That means when you create an object of class A, and try to initialize it you have to create another object, and then another object, and another, and we have to detect that situation, or we do detect that situation and for nice exception instead of, resulting in a stack overflow. Derick Rethans 10:20 Which is beneficial. Nikita Popov 10:21 Yes, because most people do not know what to do when people went PHP throws a segmentation fault, so they do prefer exceptions, usually. Derick Rethans 10:30 I would too. The RC also talks about, there are some issues around traits which I didn't quite fully understand, would you mind explaining that to me? Nikita Popov 10:38 The issue here is that traits can have properties and or rules. A rule is that if you have two traits, used in the same class, and declaring the same property, they have to be compatible. And compatible means effectively they have to be exactly the same. So, same visibility and same default value. The trouble here is that if we are dealing with an instance property, which has a new expression as a default value, then we have to somehow check that these are the same. It would be not great if we actually had to evaluate the initializer to do that because, I mean it's okay if it's just you know, with initializer something like one plus two, but if it's an actual new expression we don't want to create objects which again might have side effects and so on. What I'm specifying is that if you have a trait property with this kind of dynamic initializer, so using the new expression, than we will always consider it not compatible. Derick Rethans 11:36 Would it currently be compatible with one of those trait properties, it says seven plus three, for example? Nikita Popov 11:42 That will be compatible, which is actually, I think relatively new thing. We used to not evaluate initializers and traits at all, and say those are incompatible and that changed at some point and seven point, I don't know which version. But in this case we would go back to saying it's incompatible, because at least I don't see a good way to make it compatible and I don't think it's particularly important to support that case. Derick Rethans 12:10 Do you have any information about how much traits are actually used? Nikita Popov 12:15 Well, I know that Laravel uses them. But I have no idea how much. Derick Rethans 12:22 One last thing I think RFC mentioned, is that it also has an effect on attributes, that it sort of gets nested attributes in by the back door. How does that work? Nikita Popov 12:33 I wouldn't call it the back door. Exactly. I have to be honest, I didn't think about attributes at all when writing this proposal, what I had in mind is mainly parameter defaults, and property defaults. But yeah, attribute arguments also use the same mechanism and are under the same limitations. So now you can use new as an attribute argument. And this can be used to effectively nest attributes, so the example I've seen from Symfony is that they have, for example, assertions. They have an assert all attribute which has the which accepts, which wants to accept a list of assertion attributes. And now you can actually do that because you can, um, create these attribute objects recursively. The example from the RFC is assert all, then new assert not null, new assert length max six. Derick Rethans 13:26 That's actually kind of neat, that is just ends up starting to work on right? Nikita Popov 13:30 Yeah, I mean, I read the thread for Symfony how they are trying to work around that. They have various ideas of how to do it and it's all pretty ugly. So I think it's nice to have a more or less proper solution for that. Derick Rethans 13:45 They'll just have to wait until PHP 8.1. Nikita Popov 13:48 Yes, that is the disadvantage. Derick Rethans 13:51 Out later this year. Derick Rethans 13:53 Are there any backwards incompatible changes? Nikita Popov 13:56 That again comes back to the evaluation order the problem. Originally I had intended to this, this to be compatible. Now if we change evaluation order then it is breaking, depending on that, the answer is yes or no, I am still not sure on that one. Derick Rethans 14:11 Because I think PHP eight one already has a breaking changes in there where the order of declaration of properties is now different. Nikita Popov 14:19 Yeah, that the change, though I hope that does not affect people too much because it's mostly about debugging functionality, which of course you are kind of interested in. Derick Rethans 14:29 Yep, it broke my tests, which is a good thing because it means that my tests cover all the edge cases as well. I think we sort of done discussing this RFC, is there anything else that might ends up being added here in the future, or what still needs to be hammered out before you can put it up to vote? Nikita Popov 14:47 Apart from the evaluation order question that I have been continuously mentioning, the future scope would be to extend this to not just new expressions, but also for example static method calls, popular alternative pattern is to not use constructors, but named constructors, which are implemented as static methods, and similarly also function calls for example so you can use something like strlen() or count inside an initializer. Derick Rethans 15:13 Isn't strlen a language construct now? Nikita Popov 15:15 No it isn't. It has an optimized implementation in the virtual machine, but it's still technically a normal function call. Derick Rethans 15:23 Because I remember that, breaking tests in Xdebug as well at some point, because it suddenly didn't suddenly was no longer a function call. Nikita Popov 15:30 Things do tend to break in Xdebug. Derick Rethans 15:34 Okay, I'm used to it. Thank you, Nikita for taking the time to talk about your new in initializers RFC. Nikita Popov 15:40 Thanks for having me. Derick Rethans 15:45 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supports of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time. Show Notes RFC: New in Initialisers Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 78: Moving the PHP Documentation to GIT London, UK Thursday, March 11th 2021, 09:06 GMT In this episode of "PHP Internals News" I chat with Andreas Heigl (Twitter, GitHub, Mastodon, Website) to follow up with his project to move the PHP Documentation project from SVN to GIT, which has now completed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 78. In this episode, I'm talking with Andreas Heigl about moving the PHP documentation to GIT. Andreas, would you please introduce yourself? Andreas Heigl 0:35 Hi yeah I'm Andreas, I'm working in Germany at a company doing PHP software development. I'm doing a lot of stuff in between, as well. And one of the things that I got annoyed, was always having to go through hilarious ways of contributing to the PHP documentation, every time I found an issue with that. So at one point in time, I thought why not move that to Git and, well, here we are. Derick Rethans 1:07 Here we are five years later, right? Because we already spoke about moving the documentation to GIT back in 2019 and Episode 28. But now it has finally happened, so I thought it'd be nice to catch up and see what actually has changed and how we ended up getting here. Where would you want to start. What was the hardest thing to sort out in the end? Andreas Heigl 1:27 Well the hardest thing in the end to sort out was people, as probably always in software development. The technical oddities and the technical bits and pieces were rather fast to solve. What really was taking a long time was, well for one thing, actually, consistently working on that. And on the other hand, chasing down people to actually get stuff done. Because one of the major things here was not the technical side but getting the bits and pieces of information together to get access to the different servers, to the infrastructure of the PHP ecosystem, and chasing down the people that want to help you is one thing, and then chasing down the people that actually can help you is a completely different one. That was for me the most challenging bit, getting actually, to know who can do what and getting, yeah in the end, getting access to the actual machines, the whole ecosystem is running on that was really heavy. Derick Rethans 2:34 The System Administration of PHP.net systems is very fragmented. There's some people having access to some machines and some other people having access to other machines and yea it sometimes takes some time to track down where are all these bits actually run. Andreas Heigl 2:51 One thing is getting tracking down, where the bits run, the other one is, there is an excellent documentation in the wiki, the PHP wiki, which in some parts is kind of outdated. The other thing is, if you don't actually address the different people themselves personally, it is like screaming into the void so you can can send an email to 15 different people that have access to a box, like someone needs to do this and this and this. And everyone kind of seems to think, yeah, someone else can do that. I just don't have the time at this point in time Things get delayed, so you're waiting for an answer for a week; you do some other stuff, so two weeks go into the lab four weeks go into the land, and suddenly two months have passed. You didn't receive an answer and oh wait a minute, there was this project with moving the documentation to GIT so perhaps I should have a look at that again. Derick Rethans 3:44 So what has changed for people that want to contribute to the PHP documentation. Can you explain a little bit the difference between before and after? Andreas Heigl 3:52 Before the documentation was moved everything was an SVN and there was generally there were two kinds of people. There were regular contributors that had an SVN account. They had the documentation on their machine. They could actually modify the documentation and just do an SVN commit, and everything was working smoothly, so that documentation was then actually built. We ran into some issues with that as well but that's a different story. Now, it is as everything is has moved to GIT, the sources are now on git.php.net. Before I come to that, what was it for people that did not have an SVN account. There was an awesome piece of technology on a web server called edit.php.net, which was a graphical user interface to the PHP documentation and to the translation so everyone could more or less log in there, with an anonymous account for example, modify the documentation, and create yeah well a kind of a pull request Create a patch that was then reviewed and could then be merged by people with SVN access. That awesome piece of technology was an awesome piece of technology when it was created some 12 years ago or something like that. It has not changed much in between. So it was still kind of working, not always, it was offline for some time. And the people that had access to the SVN were not really that responsive at all times so it could take some while for your patch to actually be merged in. So how is it now, now all the sources are on git.php.net, and there is a mirror for all translations, on https://github.com/php/, for example, doc-en for the English documentation. You have the repository, and you can create a pull request there. So you just move there, edit the file you want to move, create a pull request. And then the pull request will be merged into the main sources, again, by people with merge access. As this is a pull request that usually happens, a bit faster than before. We are still not at that point where we can create a GitHub action for that so that perhaps that can be completely automated after some technical things are resolved, like yes that does build, and we don't have any issues, technical issues with that. We could just build that automatically and merge that automatically into to the GIT sources. Those are possibilities that we now have, and from that point of view, now the contribution is much easier as we are using the technology that every one of us knows already. Derick Rethans 6:34 The original translation to work with sub modules in SVN, now how is with the GIT approach? Andreas Heigl 6:40 It actually didn't work with sub modules in SVN, it worked with, actually, one folder for the documentation, and then sub folders for the different translations. So there was a base folder PHP Doc, and within that base folder PHP Doc, there was a folder EN for the English base, kind of, and then DE for the German translation or PT_BR for the Brazilian Portuguese translation, or IT for the Italian translation. Or some other rather outdated translations that we actually didn't move over, so we only moved everything that was touched within the last two years, which brought some interesting bugs. Of course we actually moved something that was touched within the last two years, but that is not considered an active translation, and that caused some havoc. Derick Rethans 7:30 Which translation was that? Andreas Heigl 7:31 That was the Italian translation. So actually, there is no official Italian translation of the PHP documentation. But there is an Italian translation that is actually worked on, and hopefully at one point in time, we can actually promote that to a valid active translation, so that Italian people can actually see some Italian documentation for them. In GIT on the other hand, we have different repos, different repositories for the different languages. It is now, not possible to just check out phpdoc, and have every translation. Now you actually have to say, I want to check out the English documentation, and I want to check out the Italian documentation, and I want to check out the Japanese documentation, because I want to work on each of them. That has some disadvantages, especially for people that are working on multiple documentations or multiple translations. On the other hand, that also has advantages because you don't need to actually download all the translations that you're not at all interested in. Derick Rethans 8:35 But that shouldn't be something new because I'm pretty sure that I've never checked out all the translations, even with SVN. Andreas Heigl 8:41 SVN you could decide to only check out certain translations but if you check out the PHP doc base folder you would get all the documentation. Yes, there were some sub modules that actually did exactly that, like, if you check out the sub module for Italian for example you would get the English base and the Italian translation. That was all. Derick Rethans 9:04 I remember we employed some SVN magic to do that kind of things, but I forgot the most about that because it's so long ago, Andreas Heigl 9:11 Not really to worry about. Derick Rethans 9:14 No. Andreas Heigl 9:14 We're thinking about doing this similar thing for GIT now for the by using GIT sub modules, but we have not yet implemented that, because there were other pressing issues like getting the ref check documentation up and running, where you can actually see which files are outdated which files need to be translated and stuff like that. So that was more more pressing some other people have done, also work on that need to check what the current status is, to be honest, because I didn't check that. That was going on very strongly in January, after we moved between Christmas and New Year's Eve. After we equalized some glitches that happened during the whole process, because of Yeah, sometimes also processes that were nowhere really documented, and I got just got plain wrong. So then I had to invest some time and fix all that, but luckily that was during my holiday time so I had a lot of time for that so that was not an issue. Derick Rethans 10:15 So working on the documentation during your holiday's huh? Andreas Heigl 10:18 Yes, definitely. Derick Rethans 10:20 That makes it different from travelling to visit family, because that is of course not something we could do to here. Andreas Heigl 10:25 Exactly, though. Luckily, having a family at home was quite okay it was a nice change to actually be able to get away sometime, from seeing the same people over and over again. Derick Rethans 10:38 Now the documentation has moved from SVN to GIT, and everybody can now finally forget all their SVN commands. But the documentation itself is still written in Docbook. XML as far as I understand. Are there any discussions going on about changing that to something, perhaps a bit more modern? Andreas Heigl 10:58 Yes, there were a lot of discussions going on during the whole phase. I deliberately try to calm that down to not to too many things at once. The thing that I wanted to get going was moving the documentation from SVN to GIT. Just change the underlying source code repository, and not change anything else in the process, because that was already hard enough. Now that we have moved, it is easily possible to actually modify or move the documentation to some other toolage, whether that is markdown or ASCII doc or whatever. I don't care to be honest because that's someone else's job to do. In my opinion, Docbook is actually a pretty good format for that. Yes it is rather verbose for sure, but it allows you to create a lot of different documentations, because the HTML is not the only documentation that is created, there is also the possibility to create a PDF documentation or a CHM for Windows documentation, and stuff like that. I'm not 100% sure how that would work with a rather, with something like like markdown or ASCII doc or something like that. Derick Rethans 12:12 There's different strengths in different formats. Markdown for example doesn't really allow you to link in between documents, so that's probably not very handy but there's like Pandoc, which is stuff that the Python project uses. It's all pretty much designed around restructured text and linking in between them and stuff like that, so I guess that could be a way forward. It is certainly a lot easier to use than Docbook XML, but of course Docbook XML was created for this kind of rich marker without laying things out kind of situations right. Andreas Heigl 12:44 Yeah. The nice thing is actually that in with Docbook. What perhaps that is possible with other tools as well but in Docbook for example, you have one single file where all the links are located in, and every translation just links to this one file, so if a link changes for whatever reason, you can just modify that in one place and don't have to go through all the documentation. And, of course, leave half of the links unchanged, and broken, whatever. So there is a lot of stuff actually that is pretty cool. But as I said that's now up for discussion. If there is someone that actually wants to tackle that and move the documentation format to something else that is a different story. Go ahead, propose that to the internals mailing list, or to the documentation team, we'll see how it goes from there. The documentation itself, the source code itself, is hosted on a platform that we all understand, at least by now a bit better than SVN. Derick Rethans 13:42 Even though it started out on CVS, just like PHP that's. Andreas Heigl 13:46 Yes. Derick Rethans 13:47 A long long time ago. Andreas Heigl 13:49 I found the remaining stuff from the transfer from S from CVS to SVN, yes, Derick Rethans 13:55 I am sure there's still some commits lurking somewhere for that. Andreas Heigl 13:59 Oh yes, especially in the in the ref check, there is a lot of commented out code with yeah CV, in CVS we did it this way now we have to modify that for SVN. Yes, I'm pretty sure now we have some commits in there that modify the SVN stuff for making it usable with GIT. Derick Rethans 14:16 Andreas, do you have anything else to add? Andreas Heigl 14:19 In all it was an awesome experience for me. I got to know a lot of people, a lot of awesome people. That was really really insightful, and I'm really happy that I had the chance to do that, and the trust of the community was really amazing. If anyone wants to get into that stuff, kind of stuff, pick yourself something that the community needs, and go for it, and don't let yourself be derailed by unresponsive mailing lists, or just things not happening. It's not because people think you are stupid, or the task is stupid, it's just because everything is done by volunteers that just pays its price. Derick Rethans 15:00 It certainly does. Thank you, Andreas for taking the time today to talk to me about moving to PHP documentation to GIT. Andreas Heigl 15:07 Thank you very much. It was a pleasure to be here. Derick Rethans 15:13 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes Episode #28: Moving PHP Documentation to GIT Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 77: fsync: Buffers All The Way Down London, UK Thursday, February 25th 2021, 09:05 GMT In this episode of "PHP Internals News" I chat with David Gebler (GitHub) about his suggestion to add the fsync() function to PHP, as well as file and output buffers. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:13 Hi, I'm Derick. Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 77. In this episode I'm talking with David Gebler about an RFC that he's written to add a new function to PHP called fsync. David, would you please introduce yourself? David Gebler 0:35 Hi, I'm David. I've worked with PHP professionally among other languages as a developer of websites and back end services. I've been doing that for about 15 years now. I'm a new contributor to PHP core, fsync is my first RFC. Derick Rethans 0:48 What is the reason why you want to introduce fsync into the PHP language? David Gebler 0:52 It's an interesting question. I suppose in one sense, I've always felt that the absence of fsync and some interface to fsync is provided by most other high level languages, has always been something of an oversight in PHP. But the other reason was that it was an exercise for me in familiarizing myself with PHP's core getting to learn the source code, and it's a very small contribution, but it's one that I feel is potentially useful, and it was easy for me to do as a learning exercise. Derick Rethans 1:16 How did you find learning about PHP's internals? David Gebler 1:19 Quite the roller coaster. The PHP internals are very arcane I suppose I would say, it's it's something that's not particularly well documented. It's quite an interesting challenge to get into it. I think a lot of it you have to pick up from digging through the source code, looking at what's already been done, putting together the pieces, but there is a really great community on the internals list, and indeed elsewhere online, and I found a lot of people very helpful in answering questions and again giving feedback when I first opened my initial proof of concept PR Derick Rethans 1:48 Did you manage to find room 11 on Stack Overflow chat as well? David Gebler 1:52 I did not, no. Derick Rethans 1:53 I'll make sure to add a link in the show notes and it's where many of the PHP core contributors hang out quite a bit. David Gebler 2:00 Sounds good to know for the future. Derick Rethans 2:02 I read the RFC earlier today. And it talks about fsync, but it also talks about flush, or f-flush. What is the difference between them and what does fsync actually do? David Gebler 2:14 That's the question that will be on everyone's lips when they hear about this feature being introduced into the language, hopefully. What does fsync do and what does fflush do? To understand that we have to understand the concept of the different types of buffering, an application runs on a system. So we have the application or sometimes called the user space buffer, and we have the operating system kernel space buffer, Derick Rethans 2:36 And we're talking about writing to file here right? David Gebler 2:39 We're talking about writing to files with fflush and fsync yep. So fflush and fsync are both about getting data out to a file, and using it to push something out of the buffers. But there are two different types of buffers; we have the application buffers; we have the operating system buffers. What fflush does is it instructs PHP to flush its own internal buffers of data that's waiting to be written into a file out to the operating system. What it doesn't do is give us any guarantee that the operating system will actually write that data to disk, so the operating system has its own buffer as well. Computers like to be efficient, operating systems like to be efficient, they will save up disk writes and do them in the order they feel like at the time they feel like. What fsync does is it also instructs the operating system to flush its buffers out to disk, thus giving us some kind of better assurance that our data has actually reached disk by the point that function returns. Derick Rethans 3:29 I really only know about Linux here, but I know on Linux that there's a journal in the file system or, in most of the file systems that it uses. Would have seen make sure the data ends up in a journal, are also committed as how the file system does it itself? David Gebler 3:45 The manner in which fsync synchronizes is indeed dependent on the particular POSIX type operating system, and file system that it's running on. I think you're right in respect of modern Linux and ext4, that would ensure the journal was also updated and that the data was persisted to disk. Older versions may behave a little bit differently. And one thing to note with fsync is that you can't take it as a cast iron guarantee that the data has either basically been persisted to disk, or that the corresponding file system entries have been updated. It is the best assurance you can get that those things have happened, and by the point that function returns, again, in PHP implementation, you have a solid an assurance as you're going to get. Because that manner of synchronization is dependent on the underlying system. It can vary. Yeah Linux ext4 is probably your best bet with fsync, but one interesting thing to know if it does go into PHP and people start using it, is that when you call fsync on a file, if you want to ensure that the file system entries are properly updated as well, you should also call fsync or a handle to that file's containing directory. And of course on a Linux or POSIX type system you can do that you can you can fopen a directory, and you can then call a sync on that handle. Derick Rethans 5:02 You mentioned that you can call fsync on a file handle. How's it different than calling just fsync without an argument? Is fsync without an argument just telling the operating system to flush its current buffers, and fsync on a file handle specifically to flush that specific file, and its metadata? David Gebler 5:22 So in the PHP implementation of fsync, you must supply it with a stream resource that is plain File Stream. You can obviously at the PHP level give it some other kind of resource but it will return a warning if it's not able to convert that into a plain File Stream. Under the hood, when you're talking about C, underlying operating system level, fsync operates on a file descriptor so it doesn't even operate on a stream handle; it operates on the actual underlying number that represents the file at the operating system level. Derick Rethans 5:51 So that is different from the Unix shell command called fsync. David Gebler 5:55 It is different. Yep. Derick Rethans 5:57 The RFC also talks about another function called fdatasync. How's that different from fsync itself? David Gebler 6:03 I think that was a really good question because it's got two answers. It's got the theoretical answer and the practical answer. The practical answer is that these days, it most likely makes no difference at all. The theoretical answer is the fdatasync only pushed out the actual file data to a disk write, but it wouldn't necessarily update metadata about the file, such as the time it was last modified the data that might be stored in the file system about the file. The reality now is that most operating systems, we'll update both of those things at once. The reason they were separated out in the POSIX specification is because of course updating the metadata is another write, so a call to fsync could require two rights were call to fdatasync would only require one. I'm not aware of any operating system and file system now that I'm going to use that actually still treats them differently Derick Rethans 6:57 In PHP, we try to make sure that functions that are implemented work exactly the same on operating systems, and you've already explained that fsync depends a little bit on how the file system handles the specific requests. Would fsync also work in a similar way on on Windows or OSX, or is it specifically meant for just Linux? David Gebler 7:20 It's not specifically meant for just Linux. fsync itself is part of the POSIX specification. Strictly speaking, fsync as an operating system level API does not exist on Windows. Windows does have a similar API mechanism called flush file buffers, and in the RFC, and in the implementation attached to that RFC, on Windows, that's what fsync does, it's a wrapped call to flush file buffers. It has, in practice the same effect. OSX is a bit of a trickier one. fsync does exist on OSX I know. I'm not a user of Apple products myself but I can tell you what I know about OSX. fsync on OSX, it will sort of attempt to flush your file buffers out, but OSX itself will not guarantee that the disk buffers are cached. So we have another layer of buffering there. You have the application space buffering, we have the kernel space buffering, and we have the hardware disk buffering itself. Physical drives will sometimes lie about having written data to disk. I mean USB drives are a notorious example of that. A USB drive will tell your operating system it's finished persisting data when it hasn't, and you can even see that on the little flashing LED on the drive, and if you pull it out your data will be corrupted. OSX is not so reliable. The interface exists, but whether it's actually worked or not is open to question because it may not have flushed the disk cache. Derick Rethans 8:38 Are there any backward compatibility issues with this RFC or the implementation of this? David Gebler 8:43 I'm pleased say there are no backwards compatibility issues at all. It's straightforwardly a new function that operates on plain file streams. It triggers a warning if you give it some kind of resource that can't be converted to a plain file descriptor - no consequence to not using it. You just get the same behaviour you've had in previous PHP versions. It's just a new optional function. Derick Rethans 9:06 What has the feedback to introducing fsync and fdatasync been so far? David Gebler 9:11 When I originally proposed the RFC, I had a bit of feedback on the internals list and around some of the other aspects of the PHP community that I reached out to in various places. Some people didn't see a need for it, which is fair enough, but my answer to that would be the when we look at the history of PHP as a web first language, you can see why people might not have had much of a use case for something like fsync. I mean it only fits particular situations, which is where you want some kind of transaction or guarantee. Quite often what people were doing with PHP were web applications where there was a degree of volatility in file rights, that wasn't important to them; they didn't particularly care about it. What I would say now is that when we look at PHP eight, and the evolution of the PHP ecosystem. You're seeing that it is, it is such a versatile and performance general purpose programming language these days, that people are using it for all manner of tasks. Using it in micro service architectures for back end services, they're using it even for things like data science and machine learning, emerging industries like that, so it's got so many more applications now. Broadly the feedback I had was, for the most part, probably wouldn't take much more than a pull request for it to be accepted. I think it's a very non controversial RFC. It's a small feature to add in the form of a new function. Derick Rethans 10:33 And that is very different than Introducing enumerations or Fibers, of course. David Gebler 10:38 Both of which I'm looking forward to. Derick Rethans 10:40 We spoke a little bit about file system buffers, once you do a write it up in the application buffers, with flush, you can flush that to the operating system buffers and fsync to the file system. But in PHP development, there's also other buffers, that if you output something to the screen on the command line that ends up directly on the terminal. When you do this when PHP runs in a web server environment it doesn't do that because there's more buffers in between right. There's PHP's own buffers, there's buffers you can configure, there is then web server buffers, and networking buffers. So it's buffers all the way down pretty much, isn't it? How does the buffering in PHP's output work, and what kind of things can you do with that? David Gebler 11:21 When we look at buffering in PHP, this is very easy to get confused, because he says buffers all the way down. On one hand we may be talking about buffers at the file system level and application space and kernel space, and on the other side we're gonna be talking about these kinds of buffers that you've just mentioned. Again, this is something that PHP does primarily for performance and perhaps for a few other purposes in how it manages the application that is running. Buffering is a way of storing output, somewhere before it gets sent on to the web server and ultimately from the web server to a user's browser. As you say a web server has its own buffering as well and PHP provides a couple of functions by which you can also attempt to force output to the browser. So again we have much as we have fflush for files. We have flush for regular output that you're trying to send to a web server. And that function will flush the internal output buffers of PHP, and it will then try and flush your web server buffers. There's an interesting parallel here because much like with fsync, flush versus fsync, you can't necessarily guarantee with fflush what the operating system will do. Your data wanted received it into its own buffers. With PHP you can't necessarily get a guarantee that a web server will flush its own buffers when you flush your output there. Perhaps we need to invent some kind of fsync for the web server as well. Output buffering is something that you can configure in PHP and you can stack and nest output buffers as well, so that means from an application developer point of view, you are using some other components, some other bit of code in your PHP system that produces its own output. When I say output, I mean via the normal mechanisms you would write something to a browser in PHP, so things like echo at the simplest level. What you can do is you can use PHP's output buffer functions, which are all the ones that start with ob then an underscore, to capture that output and control what you do with it, instead of it being output to the browser; you can capture it into a string, you can manipulate it, you can discard it, you can throw it up the chain to be output yourself. That's what we would primarily use those buffers for in PHP, but output buffering is also something you can configure in the php.ini file. You can turn it off, you can set the buffer size. Do you have a little bit of fine tuning there that you can do. Derick Rethans 13:39 Something of that just popped in my mind is that when you call the new fsync on a file resource handler, file resource in PHP are implemented with an underlying interface because streams. Is there another fall writing buffer in streams itself as well? David Gebler 13:58 There is a buffer in the actual C library level when it comes to streams. That's going into the detail a little bit of what I was talking about earlier where you have user space buffers and kernel space buffers. The reason those things exist is to do with the way an operating system manages a computer and keeps everything safe. User space isn't able to access kernel space. It's about range of memory addresses in the computer; we're getting quite low level now in terms of how all this works. In the underlying implementation in PHP source code, we're using the C File Stream functions to write the streams, and that means that the actual data gets copied into a buffer in userspace. What fsync does is it instructs the kernel to make a copy of that data in its own memory space, and then push that out to disk. It's getting quite low low level when we get into those details, it has to do with how file access is managed in PHP. There is an even lower level of access, which is more suitable for very high throughput intensive IO operations, which isn't available in PHP at all. I'm not so familiar with how this works on Linux because the file systems are different, but I can tell you in Windows, if you if you want really intensive throughput, you don't want to be calling fsync or equivalent flush file buffers, you know, hundreds of 1000s of times per second. You want to use what we call unbuffered output, where you write directly at the block level to disk. It wouldn't be suitable to try and do something like that in PHP, because it's to higher level language. It's a very very fine level of control that you need to be able to do that kind of thing. But with that level of control you also have a high level of consequence if something goes wrong. Derick Rethans 15:39 I think there's actually an extension in PECL, or there used to be one at least, I'm not sure but it's still maintained for newer PHP versions, called dio or d.i.o., standing for direct IO, which I think implements some of these features, but I don't quite remember but I thought that was file system related, or just only network related direct IO. David Gebler 16:00 I think direct IO extension did have lower level file system access off the top of my head. It's been a long time since I looked at it, and I think it actually had the option to open a file in synchronized mode, which means every write is essentially implicitly calling fsync. What it didn't have was the ability to open file writes in unbuffered mode, which is probably because that's lower level, still I mean that requires you to literally know the block size of the file system you're writing to and to write in that, in that size, Derick Rethans 16:30 Doing that from PHP goes a little bit too far, I suppose. David Gebler 16:34 It does go too far, I think. Derick Rethans 16:36 Then again, if you can open a file you can open a file device as long as you have permission. So, I guess, nothing stops you from at least on Linux opening /dev/sda whatever number it is, as long as you're running PHP as root and writes to it, but I don't think this is a wise thing to do. David Gebler 16:52 Definitely something I'd urge people to try with caution. I mean, obviously on Linux you you do have this kind of extraordinary power from the combination of the user you're running a process as, coupled with the fact that Linux treats everything as a file, and then actually has some interesting implications for fsync, you can try compiling the branch that I've submitted in the PR for the RFC; compile it on Linux, call fsync on some different handles to things which aren't actually files, but which to the underlying Linux operating system look like files. Hopefully the implementation I've provided is robust enough that it should just return false when you try and fsync things that are not actually files. Derick Rethans 17:28 What kind of things are you thinking of here, things like directories? David Gebler 17:32 Directories are fine for fsync and you should actually be able to get a successful fsync on directories because they are literally part of the file system. It's fine on POSIX type systems to do that. I'm not actually sure whether you can do that on Windows or not, but I don't think it provides any particular benefit to fsync a directory on Windows because the underlying flash file buffers API works on the file level only, but on Linux you could try opening file handles to something like a Unix socket and try and fsync that. You should just get false in PHP land from that function because it knows that it can't convert that file handle into an actual file stream, and thus can't get a regular fsync on it. Derick Rethans 18:12 Have you created a test case for that situation? David Gebler 18:15 I have created some test cases for opening streams to things that are not files from PHP. I can't whether I've covered that particular one or not, but I have got a couple in there for things which are not files at the PHP level. Derick Rethans 18:29 That's good to hear. When do you think he will be putting this up for a vote? David Gebler 18:34 I'm planning to put this up for a vote later this week actually. The RFC has been open for a little while, I think it's been about three weeks since I announced it on internals, and obviously there hasn't been a huge amount by way of feedback or discussion on it lately. That doesn't particularly surprise me, it is not a particularly controversial thing to add. I think a lot of people probably don't have much feeling about it one way or the other. But then, I'm hopeful, people who vote will see that as a reason to include it. Derick Rethans 19:00 Thank you David for explaining the fsync RFC and related topics to me today. David Gebler 19:05 Well, thanks for having me on. It's been great talking to you. And of course to anyone listening who's interested in the RFC, do check it out on the PHP wiki. Do have a look at the discussion on internals. And if you're a voting member, don't forget to vote when I open it up to vote later this week. Derick Rethans 19:21 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: fsync function Room 11 on StackOverflow PECL extension DIO Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 76: Deprecate null, and Array Unpacking London, UK Thursday, February 18th 2021, 09:04 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Deprecate passing null to non-nullable arguments of internal functions, and Array Unpacking with String Keys. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is Episode 76. In this episode, I'm talking with Nikita Popov about a few more RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself. Nikita Popov 0:34 Hi, I'm Nikita. I work on PHP core development on behalf of JetBrains. Derick Rethans 0:39 In the last few PHP releases PHP is handling of types with regards to internal functions and user land functions, has been getting closer and closer, especially with types now. But there's still one case where type mismatches behave differently between internal and user land functions. What is this outstanding difference? Nikita Popov 0:59 Since PHP 8.0 on the remaining difference is the handling of now. So PHP 7.0 introduced scalar types for user functions. But scalar types already existed for internal functions at that time. Unfortunately, or maybe like pragmatically, we ended up with slightly different behaviour in both cases. The difference is that user functions, don't accept null, unless you explicitly allow it using nullable type or using a null default value. So this is the case for all user types, regardless of where or how they occur as parameter types, return values, property types, and independent if it's an array type or integer type. For internal functions, there is this one exception where if you have a scalar type like Boolean, integer, float, or a string, and you're not using strict types, then these arguments also accept null values silently right now. So if you have a string argument and you pass null to it, then it will simply be converted into an empty string, or for integers into zero value. At least I assume that the reason why we're here is that the internal function behaviour existed for a long time, and the use of that behaviour was chosen to be consistent with the general behaviour of other types at the time. If you have an array type, it also doesn't accept now and just convert it to an empty array or something silly like that. So now we are left with this inconsistency. Derick Rethans 2:31 Is it also not possible for extensions to check whether null was passed, and then do a different behaviour like picking a default value? Nikita Popov 2:40 That's right, but that's a different case. The one I'm talking about is where you have a type like string, while the one you have in mind is where you effectively have a type like string or null. Derick Rethans 2:51 Okay. Nikita Popov 2:52 In that case, of course, accepting null is perfectly fine. Derick Rethans 2:56 Even though it might actually end up being different defaults. Nikita Popov 3:01 Yeah. Nowadays we would prefer to instead, actually specify a default value. Instead of using null, but using mull as a default and then assigning something else is also fine. Derick Rethans 3:13 What are you proposing to change here, or what are you trying to propose to change that into? Nikita Popov 3:18 To make the behaviour of user land and internal functions match, which means that internal functions will no longer accept null for scalar arguments. For now it's just a deprecation in PHP 8.1, and then of course later on that's going to become a type error. Derick Rethans 3:35 Have you checked, how many open source projects are going to have an issue with this? Nikita Popov 3:40 No, I haven't. Because it's not really possible to determine this using static analysis, or at least not robustly because usually null will be a runtime value. No one does this like intentionally calling strlen with a null argument, so it's like hard to detect this just through code analysis. I do think that this is actually a fairly high impact change. I remember that when PHP 7.2, I think, introduced to a warning for passing null to count(). That actually affected quite a bit of code, including things like Laravel for example. I do expect that similar things could happen here again so against have like strlen of null is pretty similar to count of null, but yeah that's why it's deprecation for now. So, it should be easy to at least see all the cases where it occurs and find out what should be fixed. Derick Rethans 4:35 What is the time frame of actually making this a type error? Nikita Popov 4:38 Unless it turns out that this has a larger impact than expected. Just going to be the next major version as usual so PHP 9. Derick Rethans 4:45 Which we expect to be about five years from now. Nikita Popov 4:49 Something like that, at least if we follow the usual cycle. Derick Rethans 4:52 Yes. Are there any other concerns for this one? Nikita Popov 4:55 No, not really. Derick Rethans 4:57 Maybe people don't realize it. Nikita Popov 4:58 Yeah, possibly. You can't predict these things, I mean like, this is going to have like way more practical impact for legacy code than the damn short tags. But for short tags, we get 200 mails and here we get not a lot. Derick Rethans 5:14 I think this low impact WordPress a lot. Nikita Popov 5:17 Possibly but at least the thing they've been complaining about is that something throws error without deprecation, and now they're getting the deprecation so everyone should be happy. Derick Rethans 5:28 Which is to be fair I think is a valid concern. Nikita Popov 5:30 Yes, it is. I've actually been thinking if we should like backport some deprecations to PHP 7.4 under an INI flag. Not like my favourite thing to work on, but people did complain? Derick Rethans 5:47 Which ones would you put in there? Nikita Popov 5:48 I think generally some cases where things went from no diagnostics to error. I think something that's mentioned this vprintf and round, and possibly the changes to comparison semantics. I did have a patch that like throws a deprecation warning, when that changes and that sort of something that could be included. Derick Rethans 6:12 I would say that if we were in January 2020 here, when these things popped up, then probably would have made sense to add these warnings and deprecations behind the flag for PHP seven four, but because we've now have done 15 releases of it, I'm not sure how useful this is now to do. Nikita Popov 6:30 I guess people are going to be upgrading for a long time still. I don't know I actually not sure about how, like distros, for example Ubuntu LTS update PHP seven four. If they actually follow the patch releases, because if they don't, then this is just going to be useless. Derick Rethans 6:48 Oh there's that. Yeah. Derick Rethans 6:50 There is one more RFC that I would like to talk to you about, which is the array unpacking with string keys RFC. That's quite a mouthful. What does the background story here? Nikita Popov 7:00 The background is that we have unpacking in calls. If you have the arguments for the call in an array, then you write the three dots, and the array is unpacked into actual arguments. Derick Rethans 7:14 I'd love to call it the splat operator. Nikita Popov 7:16 Yes, it is also lovingly called the splat operator. And I think it has a couple more names. So then, PHP 7.4 added the same support in arrays, in which case it means that you effectively merge, one array to the other one. Both call unpacking and array unpacking, at the time, we're limited to only integer keys, because in that case, are the semantics are fairly clear. We just ignore the keys, and we treat the values as a list. Now with PHP 8.0 for calls, we also support string keys and the meaning there is that the string keys are treated as parameter names. That's how you can like do a dynamic named parameter call. Actually, this probably was one of the larger backwards compatibility breaks in PHP eight. Not for unpacking but for call_user_func_arg, where people expected the keys to be thrown away, and suddenly they had a meaning, but that's just a side note. Derick Rethans 8:21 It broke some of my code. Nikita Popov 8:23 Now what this RFC is about is to do same change for array unpacking. So allow you to also use string keys. This is where originally, there was a bit of disagreement about semantics, because there are multiple ways in which you can merge arrays in PHP, because PHP has this weird hybrid structure where arrays are a mix between dictionaries and lists, and you're never quite sure how you should interpret them. Derick Rethans 8:54 It's a difference between array_merge and plus, but which way around, I can ever remember either. Nikita Popov 9:00 What array_merge does is for integer keys, it ignores the keys and just appends the elements and for string keys, it overwrites the string keys. So if you have the same string key one time earlier and again later than it takes the later one. Plus always preserves keys, before integer keys. It doesn't just ignore them, but also uses overriding semantics. The same is the other way round. If you have something in the first array, a key in the first array and the key in the second array, then we take the one from the first array, which I personally find fairly confusing and unintuitive, so for example the common use case for using plus is having an array with some defaults, in which case you have to, like, add or plus the default as the second operand, otherwise you're going to overwrite keys that are set with the defaults which you don't want. I don't know why PHP chose this order, probably there is some kind of idea behind it. Derick Rethans 10:01 It's behaviour that's been there for 20 plus years that might just have organically grown into what it is. Nikita Popov 10:07 I would hope that 20 years ago at least someone thought about this. But okay, it is what it is. So ultimately choice for the unpacking with string keys is between using the array_merge behaviour, the behaviour of the plus operator, and the third option is to just always ignore the keys and always just append the values. And some people actually argue that we should do the last one, because we already have array_merge and plus for the other behaviours. So this one should implement the one behaviour that we don't support yet. Derick Rethans 10:40 But that would mean throwing away keys. Nikita Popov 10:43 Yes. Just like we already throw away integer keys, so it's like not completely out there. So yeah, that is not the popular option, I mean if you want to throw away keys can just call array_values and go that way. So in the end, the semantics it uses is array_merge Derick Rethans 10:58 The array_merge semantics are.. Nikita Popov 11:01 append, like ignore integer keys just append, and for string keys, use the last occurrence of the key. Derick Rethans 11:07 So it overwrites. Nikita Popov 11:08 It overwrites, exactly. Which is actually also the semantics you get if you just write out an array literal where the same key occurs multiple times. Unpacking is like kind of a programmatic way to write a function call or an array literal, so it makes sense that the semantics are consistent. Derick Rethans 11:26 I think I agree with that actually, yes. Are there any changes that could break existing code here? Nikita Popov 11:32 Not really because right now we're throwing an exception if you have string keys in array unpacking. So it could only break if you're like explicitly catching that exception and doing something with it, which is not something where we provide any guarantees I think. So generally I think that, removing an exception doesn't count as a backwards compatibility break. Derick Rethans 11:55 I think that's right. Do you have anything else to add here? Nikita Popov 11:59 No, I think that's a simple proposal. Derick Rethans 12:02 Thank you, Nikita for taking the time to explain these several RFCs to me today. Nikita Popov 12:07 Thanks for having me Derick. Derick Rethans 12:11 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Array unpacking with string keys RFC: Deprecate passing null to non-nullable arguments of internal functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 75: Globals, and Phasing Out Serializable London, UK Thursday, February 11th 2021, 09:03 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: Restrict Globals Usage, and Phase Out Serializable. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is Episode 75. In this episode, I'm talking with Nikita Popov about a few RFCs that he has been working on over the past few months. Nikita, would you please introduce yourself? Nikita Popov 0:34 Hi, I'm Nikita, I work at JetBrains on PHP core development and as such I get to occasionally, write PHP proposals RFCs and then talk with Derick about them. Derick Rethans 0:47 The main idea behind you working on RFCs is that PHP gets new features not, you end up talking to me. Nikita Popov 0:53 I mean that's a side benefit, Derick Rethans 0:55 In any case we have a few to go this time. The first RFC is titled phasing out Serializable, it's a fairly small RFC. What is it about? Nikita Popov 1:04 That finishes up a bit of work from PHP 7.4, where we introduced a new serialization mechanism, actually the third one, we have. So we have a bit too many of them, and this removes the most problematic one. Derick Rethans 1:19 Which three Serializable methods or ways of doing things currently exist? Nikita Popov 1:24 The first one, which doesn't really count is just what you get if you don't do anything, so just all the Object Properties get serialized, and also unserialized, and then we have a number of hooks, you can use to modify that. The first pair is sleep and wake up. Sleep specifies which properties you want to serialize so you can filter out some of them, and wake up allows you to run some code, after unserialization, so you can do some kind of fix up afterwards. Derick Rethans 1:52 From what I remember, if you use unserialize, where does the wake up the constructor doesn't get called? Nikita Popov 1:59 During unserialization the constructor, never gets called. Derick Rethans 2:03 So wake up a sort of the static factory methods to re rehydrate the objects. Nikita Popov 2:08 Exactly. Derick Rethans 2:08 So that's number one, Nikita Popov 2:10 Then number two is the Serializable interface, which gives you more control. Namely, you have to actually like return the serialized representation of your object. How it looks like is completely unspecified, you could return whatever you want, though, in practice, what people actually do is to recursively call serialize. And then on the other side when unserializing you usually do the same so you call unserialize on the stream you receive, and then populate your properties based on that. The problem with this mechanism is exactly this recursive serialization call, because it has to share state, with the main serialization. And the reason for that is that, well PHP has objects, or object identity. So if you use the same object in two places you really want it to be the same object and not two objects with the same content. Serializable has to be able to preserve that, and that requires that it runs in the middle of the unserialization. Derick Rethans 3:14 Not sure if I follow that bit. Nikita Popov 3:16 Well maybe it's not a hard requirement more like an issue with our serialization format that comes into play here. Way PHP implements this, is using back references. So at first unserializes an object and then later you can have like a pointer back to it, that says like, I want to use the same object as at position number, 10, or so. For these back references to work, we have to actually execute the serialization handler while unserializing because otherwise the offsets will no longer match. So we can just run this at the end of unserialization for example because then our offsets would be incorrect. And this is a big problem because it's not really safe to run code, during unserialization because things are partially initialized. To make these back references work, PHP has to actually store pointers to these objects. And if you somehow modify things in specific ways, then these pointers become invalid. They point to a memory that no longer exists, and a possibly exploitable crash. This is why we would like to get rid of this mechanism. Derick Rethans 4:25 But of course, in order to get rid of things, we had to have a better way of doing things in place first, right, which came with PHP seven four. Nikita Popov 4:32 That's right. Derick Rethans 4:32 So that's number three. Nikita Popov 4:34 That's number three. Number three is actually very similar to number one: two new magic methods, double underscore serialize and double underscore unserialize. Serialize returns an array, usually like an array of properties for example, and then unserialize populates the object from that array. In practice, this works very similar to the Serializable interface, just that you don't manually call serialize and unserialize, but PHP will do so on your behalf. So you just return an array or get an array, and PHP will integrate that into the like main serialization, and because it's left to PHP, PHP can control where these calls occur. Derick Rethans 5:19 With sleep originally you only return the name of the properties. Whereas with this new interface you return the names of the properties but also their values. Nikita Popov 5:30 That's right. The new mechanism, this, like, in practice, it serves as a replacement for the Serializable interface. But from a technical side it's really close to sleep and wake up, um, just that, as you said, instead of returning property names you return both names and values. Derick Rethans 5:51 And this is now the recommended way of doing serialization. Nikita Popov 5:54 Like the motivation is one problem was, what I mentioned the security problem. Maybe the thing that impacts users more commonly is that things like calling parent::serialize and parent::unserialize with the Serializable interface, usually doesn't do what you want. Again, due to these back references because, like, the calls get out of order, we should do the same thing with the magic methods, with the underscore underscore serialize and unserialize and you can safely call parent methods and compose serialization in that way. Derick Rethans 6:29 That's our state of serialization right now. We haven't spoken about RFC, what are you proposing to do here? Nikita Popov 6:34 The RFC proposes to get rid of the Serializable interface. And, like in a way that is a bit more graceful than just deprecating it outright. And the idea is that if you have code that is still compatible with PHP 7.3, where the new mechanism doesn't exist, you probably still want to use Serializable. So if we just deprecated out right that would be fairly annoying to have code that's compatible with PHP 7.3, and 8.1. So instead what we do is we only deprecate the case where you implement Serializable without implementing the new mechanism. If you implement both of them, then you're fine for now. Derick Rethans 7:15 The new mechanism, the one we're introducing PHP 7.4, would overrides the PHP 7.3 one already anyway. Nikita Popov 7:22 Exactly. So on PHP 7.3 you would end up using Serializable and PHP seven four and higher, you would be using the new mechanism. And then, at a later point in time we would actually also deprecate Serializable itself and then remove it, though, like based on mailing list response, some people at least didn't like the long timeline. I'm not exactly sure what the alternative is, so either to deprecate Serializable right away, or to later remove it without deprecation of the interface itself. Derick Rethans 7:57 Yeah, from what I saw the, the long-term-ness of phasing it out. I think had mentioned that it finally got removed in PHP 10, which is potentially 10 years away right. If we following every five years with a new major release. But then in the end, it does have some merit making sure that people can move on without being left in the dark at some point right. What is your own preference? Nikita Popov 8:22 My own preference is what I proposed. I would also be fine with, like say in PHP 8.1, we call the proposal so you only get a warning if you only implement Serializable without the new mechanism, and the PHP nine we could just drop Serializable entirely. I think that would not be, because then the only problem then would be if you have code that is competitive with PHP 7.3 and PHP 9.0. I am sure that code will exist ... pretty normal version range to have. Derick Rethans 9:08 Yeah, I probably would agree with you there. When I read the RFC it also mentioned PDO. Why would it mention PDO? Nikita Popov 9:15 This all is something I only found out while writing it's on there is a PDO fetch serialize flag, which automatically calls unserialize when fetching values. So I will not comment on the really dubious idea of storing serialized data in the database. Derick Rethans 9:35 I mean, people would currently said that the alternative is to store JSON, in these columns as values. Nikita Popov 9:40 That would still be better. Derick Rethans 9:42 But it's still a serialized format? Nikita Popov 9:44 But at least the way this flag is implemented is effectively broken, because it doesn't just call unserialize, the function; it calls unserialize on the Serializable interface. I have no idea how this was intended to be used in practice, because it's not compatible with, like the normal serialization of the class. In practise like everything I have found about this online is basically just that okay if this functionality is broken, you shouldn't use it. Derick Rethans 10:15 So you have less concerns just removing that straight away, I suppose. Nikita Popov 10:19 Yeah. Derick Rethans 10:20 Do you have anything else out about serialization. Nikita Popov 10:22 I think this proposal is a very simple one and we have actually talked, way too much about this. Derick Rethans 10:29 Let's move on to the next RFC, which is titled Restrict Globals Usage. This title almost sounds worse than it is as it might imply that you want to get rid of the globals array altogether. But I bet that's not the case. And I also suspect that restricting the globals array is a lot more technical as a subject as it might seem. Nikita Popov 10:49 That's right. So this is really, mostly motivated by internal concerns, and has hopefully not a great deal of impact on like practical usage. There are a couple motivations, so some of them are about semantics, so globals is a very magic variable, that does not follow the usual semantics of PHP a number of ways. In particular array are typically by value. In all other cases, they are by value, which means that if you modify, like if you copy an array and modify one copy, then the other one doesn't get modified, I mean it's a copy so obviously it doesn't get modified. For globals if that's not the case. If you make a copy of globals and you modify the copy, then the original array also gets modified. Derick Rethans 11:36 Which is not the case for other super globals such as underscore get and underscore post. Nikita Popov 11:41 The other super globals are a bit magic but not that magic. There are a couple of other concerns with edge cases, but I think the real motivation here is the internal concern. And that's how globals is implemented. PHP, normally, manages variables in functions and scripts, using so called compiled variables. And this works by well when the script is compiled we actually see all the variables with the used, at least all the variables that don't go through something like variable variables or globals or something like that. And we reserve a slot for each of these variables, so we can directly access it. We don't have to look up, like the variable by name, we just say this is variable number seven and we can directly access it, which is much much more efficient. The problem is, then if you have something that globals you want to both have this access by index, and access by name, and they do that by storing a pointer inside the globals array to the actual location of the variable. Yeah, so this is a very special concept. So we call this an indirect, a variable of indirect type, and it essentially occurs only inside the globals array, and for object properties. For object properties it happens for the same reason, so object properties are normally accessed by index, but if you do something like variable object dynamic object access, then we also have to look it up by name. There we do the same thing, so we have a like map from property names to values, and if the value is really stored inside an object property slot then we just store a pointer there. The thing with the objects is that this is like really an internal concern that's well encapsulated and doesn't leak into normal PHP code. That's not the case with globals because globals is on the surface just a normal array. So you can do everything with it, you do with a normal array you can pass it to functions. Like in theory, all the functions, need to deal with this special value type that says: okay actually this is not the value itself is just a pointer to the value. The way you do it is every time you access a value you check okay is this an indirect value; if it is, follow the pointer. Derick Rethans 14:01 I have plenty of code in Xdebug for this. Nikita Popov 14:04 So it's really a super simple operation to do, but you actually have to do it. And you have to do it absolutely everywhere, if you're being pedantic. In practice that just doesn't happen. In PHP's own code, in the standard library, the array functions are those do consistently handle this edge case. But if you like go further, even most bundled extensions, and certainly most third party extensions, they are not going to do this and if they don't either they just get some, like you know benign misbehaviour where it looks like array elements are missing, or you get a crash, because the type is simply not handled. Yeah, well that's not a great state to be in, because like pushing passing the globals array into something like array pop or something, is very weird operation to do. I don't know if ever, anyone has done that for purposes outside testing PHP. But to support it, we have to like handle this special case everywhere, which is not robust and also has a certain performance impact when it comes to low level operations. So we also have to do this check every time you access an array for example from normal PHP code The idea is to remove the special case. That's the motivation here. Derick Rethans 15:23 What are you proposing to change? Nikita Popov 15:26 One is if you just access variable in globals. So you write $GLOBALS[], some variable name. Then we treat that especially and compile it down to an access to this global variable. So it could be a read access, could be a write access, or anything else, Derick Rethans 15:44 But it is something that happens, when PHP compiles scripts. Nikita Popov 15:48 That's right. The second part is you can also access the globals array in a read-only way, so you can take the whole array, and for example, do a for each loop over it. And that continues to work. The part that doesn't work is to take the whole globals array and modify it in some way, for example, passing globals to array pop, which requires passing it by reference is going to throw an error. Derick Rethans 16:13 At which state. Is that going to throw an error? Nikita Popov 16:15 That's usually during compilation, but specifically for the case of by-reference passing it can't be detected at runtime, because we don't always know if it's a by-reference or by-value pass. But for most of the cases it's a compile time error. Maybe one particular case that's worth mentioning is that you also can do a foreach by-reference over it. So if you like want to loop over globals and modify entries while doing so the way to do it now would be to do by-value loop and then just again access specific elements in it, like access globals key or something. And the reason why this helps us is that we can just return, like when you access globals, we can actually return a copy of the array. We don't have to maintain these like indirect pointers which are only necessary to support modifications, we can just return a copy. That means we no longer have to deal with this edge case in most places, in the engine and in third party extensions, Derick Rethans 17:15 Talking about third party extensions, the code that implements this RFC has already been merged into PHP eight one, but the moment you did that, tests in Xdebug started failing, because I read the globals array, but it doesn't seem like it exists any more now. Nikita Popov 17:31 That's actually a good point. Globals, I would know view it as a like, more like a syntax construct, similar to variable variables, or even the $this variable. So this is also not a real variable. Globals is no longer added as an actual variable in the symbol table, which is directly compiled down to either an access to the specific global or returns a copy of the table. So for Xdebug you, I probably filter you you have to access the EG symbol table. Derick Rethans 18:02 Yes, but it wasn't as simple as it seemed because this is a hash table, and no longer is that a full array, which means that all my logic code doesn't work with that. So I've decided that globals just no longer exists and stuff, which is what it logically is in PHP eight one anyway. Nikita Popov 18:22 So that might actually be nice. So I know that, like code that does work with globals, like as an array, usually also always skips skips globals itself when iterating over it, because otherwise you usually run into some kind of infinite recursion issue. That's actually another thing, so globals is the one way you can have a recursive array, without references being involved. So I know that the Symfony like variable/cloner dumper. That goes for a lot of effort to detect cycles, like has some extra fun hacks to detect globals correctly for that reason, because usually you just take references but for globals that doesn't work. Derick Rethans 19:09 Right, how much of an impact is this going to have to existing code? Nikita Popov 19:12 So I like analysed the top composer packages and found, not a lot of usages. I don't remember the exact number, it was maybe five cases that break. That's not to say that it has no impact. I do know that PHPUnit eight point whatever, had such a globals use, which was fixed already because Sebastian Bergmann now, adds support for new PHP versions to PHPUnit eight and nine both. If you're using PHPUnit seven, then probably, it's no longer going to work for that reason. Of course, it also doesn't work for many other reasons, as well. Depending on which features to use, but I do know that you know sometimes if you're not using mocks, then you can often use old PHPUnit versions, but I think that's no longer going to work in this case. Derick Rethans 20:04 It's something that users of PHP and PHPUnit, probably should start testing once the alpha and beta releases of PHP eight one start happening. Nikita Popov 20:16 Right. I mean, I hope that it's not going to be a big issue. After all, this is minor PHP version. So we really shouldn't be introducing bad breaks, but at least the usage I've seen in open source project suggests that it should not be a big problem. Derick Rethans 20:33 Excellent. As I've mentioned this RFC is already been merged. So I don't really have to ask about feedback, because it's irrelevant right now. It's already there. Nikita Popov 20:44 Well, you could still have feedback afterwards. Derick Rethans 20:48 Thank you, Nikita for taking the time to explain these several RFCs to me today. Nikita Popov 20:52 Thanks for having me Derick. Derick Rethans 20:57 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Restrict Globals Usage RFC: Phase Out Serializable Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 74: Fibers London, UK Thursday, February 4th 2021, 09:02 GMT In this episode of "PHP Internals News" I talk with Aaron Piotrowski (Twitter, Website, GitHub) about an RFC that he is proposing to add Fibers to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick. Welcome to PHP internals news, the podcast dedicated to explaining the latest developments in the PHP language. This is Episode 74. Today I'm talking with Aaron Piotrowski about a Fiber RFC, that he's working on together with Nicolas Keller. Aaron, would you please introduce yourself. Aaron Piotrowski 0:33 Hi everyone I'm Aaron Piotrowski, I started programming with PHP back in 1998 with PHP three, so I've just dated myself there but, but I've worked with a lot of different languages over the last few decades but PHP is always continually remaining, one of my favourite and I'm always drawn back to it. I've gotten a lot more involved with the PHP projects since PHP seven. The Fiber RFC is my first major contribution I have attempted though. In the past I did the RFC for the throwable exception hierarchy. And the Iterable pseudo type in PHP 7.1. Derick Rethans 1:12 Yeah, these things are both before I started doing the podcast so hence we haven't spoken yet, at least on here. We've actually met at some point in the past. I've had a read through the Fiber RFC this morning, but I'm still fairly baffled. Could you perhaps explain in short what Fibers are where the idea comes from. And what's your specific interest is in adding them to PHP? Aaron Piotrowski 1:35 A few other languages already have Fibers like Ruby, and they're sort of similar to threads in that they contain a separate call stack, and a separate memory stack, but they differ from threads in that they exist only within a single process, and that they have to be switched to cooperatively by that process rather than actively by the OS like threads. So sometimes they're called Green threads, and the generators that are in PHP already are actually somewhat similar to Fibers; but generators differ in that they're stack less. And so what that means is that generator function can only be interrupted at one layer. Whereas a Fiber can be interrupted anywhere in the call stack. So like it'd be imagine if you had a generator where yield could be very deep in a function call. Rather than at the top level. Like, how generators can be used to make interruptible functions, Fibers can also be used to create similarly interruptible functions, but with again without having to know exactly when it's going to be interrupted not at the top level but at any point in the call stack. And so the main motivation behind wanting to add this feature is to make asynchronous programming in PHP much easier and eliminate the distinction that usually exists between async code that has used promises and synchronous code that we're all used to. Derick Rethans 3:09 So what specifically are you proposing to ask to PHP here then? Aaron Piotrowski 3:12 Specifically I'm looking at adding a low level Fiber API, that's really aimed specifically at async framework authors to create their own more opinionated API's on top of that low level API. So adds just a couple of classes: Fiber, and a FiberScheduler on within a couple of exception classes and reflection classes for inspecting Fibers. When a Fiber is suspended to the execution switch is to FiberScheduler, which is then a special Fiber, that's able to start and resume, regular user Fibers. So a Fiber scheduler is generally going to be something like an event loop that then, when a Fiber is suspended that our scheduler event loop will resume certain Fibers, like in response to events like data becoming available on a socket or like a timer expiring. Derick Rethans 4:17 How would the event loop, decide which Fiber to resume, depending on on input, for example? Aaron Piotrowski 4:24 It's largely up to, how like that framework choose to write that event loop, but in general, like when a Fiber is going to suspend it'll set up some sort of callback, or add it to like an array of Fibers that's waiting on events, and when execution switches to the event loop. Derick Rethans 4:44 A Fiber before it suspends itself set up add another Fiber to the scheduler. Aaron Piotrowski 4:48 That's exactly right. Before a Fiber suspends itself it adds itself to some sort of event in the event loop that when it triggers, it will resume that Fiber. So, if you're familiar with how some of the other async frameworks that work now they'll add something like a callback or promise to the event loop that's resolved. This is sort of working the same way except that it's just resuming a Fiber that of like invoke a callback, although you know that Fiber might be resumed and by invoking a callback. Derick Rethans 5:23 Is the Fiber scheduler or the event loop as however if you want to call it, is that something you would, or can also make use of in a normal PHP applications, I mean by normal I mean, not like an async PHP framework? Is that the intention is all? Aaron Piotrowski 5:39 Using Fibers than kind of helps you eliminate that boundary that that exists, trying to put asynchronous code into something using synchronous code because you end up with a promise that you have to await in that doesn't really work very well, where you're trying to mix it with sync code, Fibers eliminate the need for a promise. Since, the asynchronous function can still return types, you can mix async code into a traditional like sync application, even like something running in Nginx or Apache, it doesn't have to be a fully asynchronous app to make use now of some async I/O. Derick Rethans 6:23 For example if he wants to do multiple database calls at the same time. Will you be able to use a uses for that? Aaron Piotrowski 6:30 Exactly. Derick Rethans 6:31 If a user would have a PHP application and want to use multiple database calls at the same time, how would, how would they set it up it with Fibers and Fibers scheduler? Aaron Piotrowski 6:40 This is a low level API is aimed primarily at like a framework author. Generally if you're writing application with it, you're probably going to use one of those frameworks so it would largely depends on how that framework would set that up. Although, in general, those frameworks are going to provide some sort of abstraction for running code concurrently, that they probably have their own sort of placeholder object like like a promise again. So that when you start running things concurrently, they return to something that you can then wait on for all those things to end up, or when those things, complete executing. So it doesn't totally eliminate the need for promises, but it does allow for both to do not always that async to not always have to return a promise rather a promise is only required when you want concurrency, and that, you know, a framework will provide tools to await that can still be mixed in with synchronous code. Derick Rethans 7:48 Do I understand this correctly that you won't need to promise unless you mix it with synchronous code? Aaron Piotrowski 7:54 You won't need a promise unless you explicitly need concurrency. Derick Rethans 7:57 Okay, that makes more sense I suppose. Aaron Piotrowski 7:59 It's difficult to explain it's so much easier with examples. Derick Rethans 8:03 Yeah but examples are very difficult to do in audio only. Aaron Piotrowski 8:07 Yes, exactly. You have like a database query that returns a result. If you want to run multiple queries at the same time, the async library that uses Fibers underneath would be able to provide an abstraction that would allow you to run multiple queries at once. But that those two run concurrently would return a promise. But you would be able to collect those promises together, and use like a await function, provided by that async framework to then get the results of all of the queries at once. Derick Rethans 8:47 You mentioned that Fibers are not threads, they just are more, they're sort of logical threads, but not physical threads in the same process. PHP isn't multi threaded, how would this work internally? What would have Fiber do or store, so that the scheduler can resume them for example? What is the internal mechanism, how does this interact with PHP itself. Aaron Piotrowski 9:11 Each Fiber is allocated a C stack and a VM stack on the heap. So switching between them is similar to generators, when switching between Fibers the current VMs stack is swapped and the C stack is swapped, but it doesn't touch any of the other memory in the process, so things like globals are still accessible to each Fiber, since only one Fiber can be executing at the same time, you don't have some of the same race conditions that you have with threads of memory being accessed or written to by two threads at the same time. It can't happen with Fibers that you can have two Fibers that might be dependent on the same memory, and you may have to do some of the same sort of synchronization, that you have to do with threads to that memory if you don't want interleaving of Fibers to be potentially overriding that memory. That's the sort of thing that's being left again to like the async frameworks that would use this to provide that sort of mechanism over a low level Fiber API. Derick Rethans 10:15 Of course when a Fiber is running, there's no need for locking anything because nothing runs at the same time anyway. And of course, when a Fiber suspense itself it then sort of knows that, well, I'm unlocking what I'm wanting to use of don't have this synchronization issue there. Aaron Piotrowski 10:32 You don't have the synchronization issue where you have to worry that while while this Fiber is running, another Fiber might overwrite the same memory. But there is a potential that if a Fiber suspends that while it's suspended another Fiber could have overwritten some global memory, so if you're if you're sharing memory between Fibers it's best to use some sort of abstraction, like channels in Go to share data between Fibers rather than like a global. It could just be a global, it could even be like a class property or something, anything that you might share between two Fibers you could give the same object to two different Fibers, and those Fibers could modify that object. Well, I wouldn't recommend doing that, I would share that object over like a channel instead. Derick Rethans 11:23 Your RFC doesn't talk about channels. So, I reckon that'd be something else that has to be implemented, probably with Fibers in the async framework. Aaron Piotrowski 11:31 Exactly, yes. Derick Rethans 11:32 What is your reason to want to others to PHP core instead of having it sitting in a PECL extension because I could argue that this isn't something that many PHP developers would ever use. Aaron Piotrowski 11:43 I definitely see that point. I think that availability for being able to use that in any sort of application would be important for some reason there still seems to be a hesitation on certain platforms to install extensions. But more beyond that, there are reasons that you'd want to have it in core all the time, extensions that would want to profile code will need to be aware of Fibers. And if, if Fibers are an extension well then actually making use of it in a real application might be difficult because your code profilers don't work very well because they don't understand the Fiber switching. So that is one area that if this were merged into core, code profilers would probably have to be updated to account for that. There was also a bit of an issue in the extension right now that due to destructor order, how the shutdown logic goes. And what hooks are available in PHP, that if a registered shutdown function or a destructor suspends a Fiber, it might have to restart the scheduler unnecessarily. But if it were in core, I could avoid that. And then there's there's also issues with how to handle some of the global stacks that PHP provides when switching Fibers should those be reset, should they remain, but those are issues that can only be addressed if Fibers were part of the core rather than extension. Otherwise I have no choice but to just leave them as stacks that aren't switched. Derick Rethans 13:22 Okay yeah that makes sense, because the stack switching is something that is trickier to do from an extension. Aaron Piotrowski 13:28 Like the error handler, you know, how should that be handled. Should it be the error handler stack depends on which Fiber or should it remain just a constant global and I can't change that from an extension that would have to be part of core. Derick Rethans 13:41 Because Fibers allow you to basically switch between threads. Have you had a look at how how debuggers, for example deal with this? Aaron Piotrowski 13:50 In my testing with Xdebug, I didn't have any issue with inspecting execution stacks, or code coverage, that I will have to really defer to you. If you think that there's any anything that in Xdebug that would have to be updated or changed to accommodate. So far it's worked very well. Derick Rethans 14:10 I know you submitted a bug report with a crash, but that's been fixed already, of course. What was that issue actually, I don't quite remember what it was? Aaron Piotrowski 14:18 Something code coverage where I honestly don't really remember any more. It is invalid pointer for something. Derick Rethans 14:26 It's an interesting thing that's with all these fancy extensions, and Fiber and not being the only, sometimes you run into things that extensions do something very strange that, then make things crash in Xdebug. I can't always test for that of course up front. I actually have a slightly related question that pops into my head here is like, there's also something called Swoole PHP, which does something similar, but from what I understand actually allows things to run in threads. How would you compare these two frameworks, or approaches is probably the better word? Aaron Piotrowski 15:00 Swoole is, they try and be the Swiss Army knife, in a lot of ways where they provide tools to do just about everything and they provide a lot of opinionated API's for things that, in this case I'm trying to provide just the lowest level, just the only the very necessary tools that would be required in core to implement Fibers, I do believe Swoole implements Fibers as well. They use the term co-routine for their Fibers. I believe they actually use the same boost assembly language code that I used for swapping C stacks. I'm not sure if they provide actual threading as well. If they do, then that's great. Of course threading still requires a ZTS build of PHP. Fibers do not because it's still within one process. Derick Rethans 15:55 I know that Swoole definitely doesn't work with Xdebug because the way how they do things, but it sounds like Fibers will actually work just fine. Aaron Piotrowski 16:02 It seems so yes. I've used it already extensively with PhpStorm like setting breakpoints and things to debug. When I was upgrading some of the, the AMP libraries to figure out what was going wrong and it worked perfectly. Derick Rethans 16:16 Are you involved with AMP. Aaron Piotrowski 16:18 Yes, I am. One of the primary maintainers now along with Nicolas. I didn't start the library. The original author has moved on to other things, but it's it's pretty much just Nicolas and I doing most of it now. Bob still contributes occasionally as well. Derick Rethans 16:38 And I guess that's why are you interested in having Fibers in PHP come from then? Aaron Piotrowski 16:42 Yes, exactly. Derick Rethans 16:44 What has the feedback been so far? Aaron Piotrowski 16:47 Largely positive from the people that are more familiar with it. I haven't actually gotten a whole lot of feedback from the core contributors of PHP, so I'm not really sure where the proposal stands with them at the moment, but I guess maybe no feedback is good feedback if they had a problem with it somebody who's spoken up by now, I'm not sure. Derick Rethans 17:09 That is often the case right, if it's if there is something to be added that is quite complicated, you get a lot less feedback. Then where there's something very simple like picking a name for function right. Aaron Piotrowski 17:19 Yes, exactly. Derick Rethans 17:21 When do you think your will be putting this up for a vote? Aaron Piotrowski 17:24 I think I want to wait at least another month or so. I did make a recent change to how the Fiber scheduler API worked, and so I wanted to make sure that that people had time to review it. Maybe send another reminder email or two to internals, so that they, so that more people get a chance to look at it and play with it and provide feedback. Derick Rethans 17:47 Somewhere around mid February? Aaron Piotrowski 17:49 Something like that, yeah. Derick Rethans 17:51 Did we miss anything discussing Fibers. Do you have anything to add yourself? Aaron Piotrowski 17:55 No, I don't really think so. I think we covered the main points of it. Derick Rethans 17:59 I have to say I understand that quite a lot better now, which is always good, and hopefully the people listening to this episode will also find it interesting and understand it well. So I would say thanks for explaining Fibers to me today. Aaron Piotrowski 18:13 Yeah, thanks a lot for having me on. Derick Rethans 18:18 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Fibers Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 73: Enumerations London, UK Thursday, January 28th 2021, 09:01 GMT In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a new RFC that he is proposing together with Ilija Tovilo: Enumerations. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:14 Hi I'm Derick and welcome to PHP internals news that podcast dedicated to explain the latest developments in the PHP language. Derick Rethans 0:22 This is Episode 73. Today I'm talking with Larry Garfield, who you might recognize from hits such as object ergonomics and short functions. Larry has worked together with Ilija Tovilo on an RFC titled enumerations, and I hope that Larry will explain to me what this is all about. Larry, would you please introduce yourself? Larry Garfield 0:43 Hello World, I'm Larry Garfield, I am director of developer experience at platform.sh. We're a continuous deployment cloud hosting company. I've been in and around PHP for 20, some odd years now. And mostly as an annoying gadfly and pedant. Derick Rethans 1:00 Well you say that but in the last few years you've been working together with other people on several RFCs right, so you're not really sitting as a fly on the wall any more, you're being actively participating now, which is why I end up talking to you now which is quite good isn't it. Larry Garfield 1:15 I'm not sure if the causal relationship is in that direction. Derick Rethans 1:18 In any case we are talking about enumerations or enums today. What are enumerations or enums? Larry Garfield 1:26 Enumerations or enums are a feature of a lot of programming languages, what they look like varies a lot depending on the language, but the basic concept is creating a type that has a fixed finite set of possible values. The classic example is Boolean; a Boolean is a type that has two and only two possible values: true and false. Enumerations are way to let you define your own types like that to say this type has two values, sort ascending or descending. This type has four values, for the four different card suits in a standard card deck, or a user can be in one of four states: pending, approved, cancelled, or active. And so those are the four possible values that this variable type can have. And what that looks like varies widely depending on the language. In a language like C or c++, it's just a thin layer on top of integer constants, which means they get compiled away to integers at compile time and they don't actually do all that much, they're a little bit to help for reading. At the other end of the spectrum, you have languages like rust or Swift, where enumerations are a robust Advanced Data Type, and data construct of their own. That also supports algebraic data types, we'll get into that a bit more later. And is a core part of how a lot of the system actually works in practice, and a lot of other languages are somewhere in the middle. Our goal with this RFC, is to give PHP more towards the advanced end of enumerations, because there are perfectly good use cases for it so let's not cheap out on it. Derick Rethans 3:14 What is the syntax? Larry Garfield 3:15 Syntax we're proposing is tied into the fact that enumerations as we're implementing them, are a layer on top of objects, they are internally objects with some limitations on them to make them enumeration like. The syntax is if I can do this verbally: enum, curly brace, a list of case foo statements, so case ascending, case descending, case hearts, case spades, case clubs, space diamonds, and so on. And then possibly a list of methods, and then the closed curly brace. So it looks an awful lot like a class because internally, the way that we're implementing them, the enum type itself is a class, it compiles down to a class inside the engine, but instead of being able to instantiate your own instances of it. There are a fixed number of instances that are pre created. And those are the only instances that are ever allowed to exist. And those are assigned to constants on that class, and you have the user status, then you have a user status enum, which has a case approved, or active, and therefore there is in the engine, a class user status which has a constant on it, public constant named approved, whose value is an instance of that object. And then you can use that like a constant, pretty much anywhere and it will just work. The advantage of that over just using an object like we're used to, is you can now type a parameter, or a property or return type for: This is going to return a user status, which means you're guaranteed by the syntax that what you get back from that method is going to be one of those four value objects, each of which are Singleton's so they will always identity compare against each other. Larry Garfield 5:07 So just like if you type something as Boolean, you know, the only cases you ever have to consider are true and false. If you type something as user status, you know, there are only four possible cases you ever have to think about, and you know exactly what they mean and you can document them, and you don't need to worry about what if someone passes in, you know, a string that says cancelled instead of rejected or something like that. That's syntactically impossible. Derick Rethans 5:32 What happens if you do that? Larry Garfield 5:33 You get a syntax error. If for example, a method is typed as having a parameter that takes a user status and you pass in the string rejected. Well, it's a string but what's expected is a user status object. So it's going to type error. Derick Rethans 5:47 You get a type error yes. Larry Garfield 5:49 If you try and pass in user status, double colon rejected, when that's not a type, you get an error because there is no constant on that class named rejected therefore, you'll get that error instead. Building on top of objects means an awful lot of functionality, kind of happens automatically. And the error checking you want to kind of happens automatically. And it's in ways that you're already used to working with variables in PHP. Derick Rethans 6:14 For example, it also means that enums will be able to be auto loaded because they're pretty much a class? Larry Garfield 6:19 Exactly. They autoload exactly the same way as classes, you don't you didn't even do anything different with them just follow the standard PSR four you're already following, and they'll auto load like everything else. Derick Rethans 6:28 That also means that class names and enum names, share the same namespace then? You can't have an enum with the same name as a class. Larry Garfield 6:36 Right. Again, making it built on classes means all of these. So it's kind of like a class, the answer is yes, almost always, and it makes it easier to work with is more convenient, it's like you're used to. Derick Rethans 6:47 I suppose it's also makes it easier to implement? Larry Garfield 6:50 Substantially. Derick Rethans 6:51 I'm familiar with enums in C, or c++ where enums is basically just a way of creating a list of integers, that's auto increase right, then I've been reading the RFC, it doesn't seem to do anything like that. How does this work, are the constants on the enums classes, are they sort of backed by static values as well? Larry Garfield 7:12 They can be. There's some subtlety here that's one of the last little bits we're still ironing out as we record this, the user status active constant is backed by an object that is an instance of user status and has a hard coded name property on it, called active. That is public but read only. That's how internally in the engine, it keeps track of which one is which. And there is only ever that one instance of user status with the name active on it. There is another instance that has name cancelled or name pending or whatever. Derick Rethans 7:53 There's no reason to be more than one instance of these at all. Larry Garfield 7:58 Then you can look at that name property but that's really just for debugging purposes, most of the time you'll just use user status double colon active user colon user status colon approved, whatever as a constant and, roll, roll with that. The values themselves don't have any intrinsic primitive backing, as we've been calling it. You can optionally specify that enumeration values of this type, are going to have an integer equivalent, or a string equivalent. And those have to be unique and you have to specify them explicitly. The syntax for that would be enum user status colon string, curly brace, blah blah blah. And then each case is case active equals. Derick Rethans 8:46 Is it required that the colon string part is part of your enum declaration? Larry Garfield 8:51 If you wanted to have a primitive backing, yes, a scalar equivalent. We're actually calling its scalar enums at the moment because that's an easier word to say than primitive. The idea here is enumerations themselves conceptually should not just be an alias for a scalar, they have a meaning conceptually unto themselves rather than just being an alias to an integer. That said, there are plenty of cases where you want to be able to convert enumeration case/enumeration value, into a primitive to store it in a database, to show it on a web page, where I can't store the object in a database that doesn't really work well, MySQL doesn't like that, I added this ability to define a scalar equivalent of a primitive. And then, if you do that, you get two extra pieces that help you. One is a value property, that is again a public read only property, which will be that value, so if approved, or active is a, then user status double colon active arrow value will give you the value A. Which means if you have an enum and you want to save it to a database table, you just grab the value property and that's what you write to the database, and then it's just a string or integer or whatever it is you decide it to make that. When you then load it back up, there is a from method. So you can say user status double colon from. If it's some primitive, and it will return the appropriate object Singleton object for that type so user status double colon active. Derick Rethans 10:28 You mentioned that enums are quite like classes, does that also mean, you can define methods on the enums? Larry Garfield 10:35 Enums can have methods on them, both normal objects methods and static methods. They can also take interfaces, and then they have to conform to that interface like any other class. They can also use traits, as long as the traits do not have any properties defined, as long as they are method only traits. We are explicitly disallowing properties, because properties are a way to track and maintain and change state. The whole point of an enumeration is that the enumeration value with the case is a completely defined instance of into itself and user status active is always equal to user status active. Derick Rethans 11:16 And you can't do that, if there are properties. Larry Garfield 11:19 Right. You don't want the properties of user status to change dynamically throughout the course of the script that's not a thing. Derick Rethans 11:25 Yeah, what would you use methods for on enums? Larry Garfield 11:29 There's a couple of use cases for dynamic methods or object methods, the most common would be a label method. As an example, if you have a user status enumeration, the example we keep going back to. You can have a method for label. Give each of them a make it a scalar enum, so they all have a string or an integer equivalent that you can save for a database, then you give them all a label method. A which in this case would be a single label method that has a match statement inside it and just switches based on the, the enumeration, expect matching enumeration fit together perfectly they're, they're made for each other, Derick Rethans 12:07 And also match will also tell you if you have forgotten about one of the cases. Larry Garfield 12:12 Ilija started working on match in PHP 8.0, he knew he wanted to do enumerations, I came in later. This is a long plan that he's been working on I've joined him. You have a label method that matches on the enumeration value, which is dollar this inside the method will refer to the enumeration object you're on, just like any other, and then return a human friendly label for each of those. You can then call a static method called cases, which will return an array of the enumeration cases you have, you'll loop over that and read the value property, and you read the label, and you stick that into a select box or checkboxes or whatever, you know template system you have. There you have a way to attach additional static data to a enumeration without throwing on more stateful properties. That's the most common case, there are others. You can have a enumeration method that returns a specific other enumeration case. Like if you're on enumeration you say what's next, you're creating a series of steps like a junior very basic state machine. And you call next and it'll return, another enumeration object on the same type for whatever the next one is, or whatever else you can think of them. I'm sure there are other use cases I'm not thinking of. For static methods, the main use case there is as an alternate constructor. So if you want to say you have a size enumeration small, medium, large, you have a from length static method, you pass it an integer, and it will map that integer to the small, medium or large enumeration case based on whatever set of rules how you want to define which size is which. Probably the most common case for our static method is that kind of named constructor, essentially, maybe others, cool, come up with some. Derick Rethans 14:19 During our compensation you mentioned two functions: cases and from, where do these methods come from? Larry Garfield 14:26 cases is defined on all enumerations period, it just comes with it. You are not allowed to define your own, and it returns. It's a static method, and it returns a list of the instances for that that enum type from the static method that is generated automatically on scalar enum so if there's a scalar backing, then this from method gets automatically generated, you cannot define it yourself it's an error to do so. Derick Rethans 14:55 Is it an error, even if it isn't a scalar enum, to define the from method? Larry Garfield 15:01 At the moment I don't believe so you can put a from on a non scalar enum, like unit enum. We recommend against it. Maybe we should block that just for completeness sake, I'll have to talk to Ilia about that. There's a few edge cases like that we're still dealing with. Nikita has a suggestion for around error handling with from, for what happens if you pass an invalid scalar to from, we're not quite sure what the error handling of that is. It may involve having an extra method as well. In that case, details like that we're still working on as we speak, but we're at that level of shaving off the sharp corners, the bulk of the spec is pretty well set at this point. Derick Rethans 15:45 When I read the RFC it mentioned somewhere that adding enumerations to PHP is part of a larger effort to introduce something called algebraic data types. Can you explain what these are, and, and which future steps and proposals would additionally be needed to support these there? Larry Garfield 16:05 We're deliberately approaching this as a multi part RFC. The larger goal here is to improve PHP's ability to make invalid states, unrepresentable, which is not a term that I have coined I've read it elsewhere. I think I've read it multiple elsewhere, but its basic idea is you use the type system to make it impossible to describe states that are allowed by your business rules, and therefore, you don't need to write error handling for them. You don't need to write tests for them. Enumerations are a form of that, where you don't need to write a test for what happens when you pass user status joking, to a method, because that's a type error, that's already covered, you don't need to think about it any more. Derick Rethans 16:53 Would that be a type error or would you get an error for accessing a constant, that doesn't exist on the enum if you say user status joking? Larry Garfield 17:02 I think you'd actually get an error on a constant that doesn't exist in that case. If you just pass a string, an invalid string to the method and you would get a type error on string doesn't match user status, and the general idea being you structure your code, such that certain error conditions, physically can't happen. Which means you don't need to worry about those error conditions, it's especially useful where you have two things that can occur together but only in certain combinations, my favourite example here, if you're defining an airlock. You have an inner door and an outer door. The case of the indoor and outdoor both being open at the same time is bad, you don't want that. So you define your possible states of the airlock using an enumeration to say: inner door closed out other door closed, is one state is one enumeration case; inner open outer closed as one state, and inner closed outer open is one state. Those are the only three possible values that you can define so you cannot even describe the situation of both doors are open and everyone dies. Therefore, you cannot accidentally end up in that state. That's the idea behind making invalid states unrepresentable. Algebraic data types are kind of an extension of the idea of enumerations further by adding the ability for the cases, to have data associated with them. So they're no longer singleton's. There's a number of different ways that you can implement that different languages do it in different ways. Most of them, bake them into enumerations somehow for one reason or another, a few use the idea of sealed classes, which are a class, which is allowed to be extended by only these three or four or five other classes, and those are the only subclasses you're allowed to have. They're conceptually very similar, we're probably going to go with enumeration based, but there's still debate about that, there's no implementation yet. The idea with ADTs as we're envisioning them is, you can define, like, like you can have an enum case that is just its own thing. You can have an enum case that's backed by a scalar, or you can have an enum case, that has data values associated with it. And then it's not a singleton, can spawn new instances of it. But then those instances are always immutable. The most common example of that that people talk about would be: You can now, implement, monads very easily. I'm not going to go into the whole, what is a monad here. Derick Rethans 19:36 It's a word that I've heard many times but have no idea what it means. Larry Garfield 19:40 You should read my book I explained it wonderfully. But it allows you to say, the return value of this method is either nothing, or there is something and that something is this thing. An enumeration, where the two possible types are nothing, and something. And the something has parametrised with the thing that it actually is carrying. That's one use case. Another use case Rust actually in their documentation is a great example of this. If you're building a game, then the various moves that a player can take: turn left, turn right, move forward, move backward, shoot, back up, those are a finite set of possible moves so you make that an enumeration. And then those are the only moves possible in the code. But some of them, like, move forward by how much. By how much is a parametrised value on the move forward enum case. And you can then write your code to say alright, these are the five possible actions a player can take; these three have no extra parameters, this one has an extra parameter. And I don't need to think about anything else because nothing else is can even be defined. So that's the idea behind ADTs, or algebraic data types. There are way to do the kind of things you can do now just in a much more compact and guaranteed fashion, you can do that same thing now with subclasses, but you can't guarantee that no one else is adding another subclass you have to think about. With an ADT you can guarantee that there's no other cases and there will be no other cases. Larry Garfield 21:17 Another, add on that we're looking at is pattern matching. The idea here is match right now, the match construct in PHP eight, do an identity match. So match variable against triple equals, various possibilities, and that covers a ton of use cases it's wonderful. I love match. However, it would be also very useful to say match this array, let's say an associative array, against a case where the Foo property is bar, I don't care about the rest. If the case where the foo property is beep. And I don't care about the rest I'm gonna do stuff with the rest on the right side of match. With enumerations, that comes in with ADTs, where I want to match against: Is it a move left match against move left and then I want to extract that how far is the right hand side and do something with that. If it's a move right on extract the how far is that on the right. If it's a turn left. Well that doesn't have an associated property so I can match that and do whatever with that. It's a way to deconstruct some value, partially, and then take an action based on just part of that object. That's a completely separate RFC, that, again, we think would make enumerations, even more powerful, and make it a lesson number of other things more powerful of make the match statement, a lot more robust, all the pieces of that we're looking at is looking at match against type, so you can do an instance of or an isint or whatever check as part of the match. Again this is all still future RFC nothing has actually been coded for this yet, it's in the, we would like to do this next if someone lets us Derick Rethans 23:12 Coming back to this RFC. What has the feedback been to it so far? Larry Garfield 23:17 Very positive. I don't think anyone has pushed back on the idea. Or earlier draft used a class per case, rather than object per case, which a couple of people push back on it being too complicated so we simplified that to the object per case. And at this point, I think we're just fighting over the little details like: Does that from method return null, or does it throw? What does it throw? Do we have a second method that does the opposite? Exactly what does reflection look like? We have an isenum function or do we have an enum exists function? We're at that level. I expect this RFC is probably going to pass with like 95%. At this point, something like that. It's not a controversial concept, it's just making sure all the little details are sorted out, the T's are crossed, and the i's are dotted and so on. Derick Rethans 24:09 Did we miss anything in the discussion about enums, do you have anything to add? Larry Garfield 24:15 I don't think so, at least as far as functionality is concerned. To two other things process wise: one, I encourage people to look at this multi stage approach for PHP. I know I've talked about this on, on this podcast before; having a larger plan that you can work on in pieces and then multiple features that fit together nicely to be more than the sum of their parts. It's a very good thing, we should be doing more of that, and collaborating on RFC so that small targeted functionality adds up to even more functionality when you combine them, which is what we're looking to do with enums, and ADTs, and pattern matching for example. Credit where it's due, Ilija has done all of the code for this patch. I'm doing design and project management on this, but actually making it work is 100% Ilija he deserves all the credit for that, not me. Derick Rethans 25:06 Thank you, Larry for explaining enums to me today. I learned quite a lot. Larry Garfield 25:11 Thank you. See you on a future episode, hopefully. Derick Rethans 25:16 Thank you for listening to this instalment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next time. Show Notes RFC: Enumerations Episode #51: Object Ergonomics Episode #69: Short Functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
On this devMode Standup, Patrick & Andrew talk about the promise of the JAMstack, Patrick's experience with Nuxt, and how Sprig is the same but different.
This week on the podcast, Eric, John, and Thomas talk about the release of PHP 8, XDebug 3, Salesforce buying Slack, and more...PHP: PHP 8.0.0 Release AnnouncementRasmus Lerdorf thoughts on PHP 8 ReleasePHP: Supported VersionsPHP 8 Docker Image released SpaceX Starlink: User terms of service declare Mars as ‘free planet’Laravel Internals – A new YouTube podcast from the Laravel TeamSalesforce buys Slack in a $27.7B megadealAmazon Sidewalk: Should You Be Co-Opted Into A Private Neighbourhood LoRa Network?Amazon is bringing macOS to its AWS cloudA Million Dollars vs A Billion Dollars, Visualized: A Road TripPHPUgly streams the recording of this podcast live. Typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Periscope. Also, be sure to check out our Patreon Page.AZPHP Meetup
PHP Internals News: Episode 72: PHP 8.0 Celebrations! London, UK Thursday, November 26th 2020, 09:35 GMT In this episode of "PHP Internals News" we're looking back at all the RFCs that we discussed on this podcast for PHP 8.0. In their own words, the RFC authors explain what these features are, with your host interjecting his own comments on the state of affairs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:23 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:32 This is Episode 72. PHP eight is going to be released today, November 26. In this episode, we look back across the season to find out which new features are in PHP eight dot zero. If I have spoken with the instigator of each of these features, I'm letting them explain what this new feature is. in the first episode of this current year, I spoke with Nikita Popov about weak maps, a feature that builds on top of the weak references that were introduced in PHP seven four. I asked: What's wrong with the weak references and why do we now need to weak maps. Nikita Popov 1:10 There's nothing wrong with references. This is a reminder, what weak references are about, they allow you to reference, an object, without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference, and if you try to access it you will get acknowledged sort of the object. Now the probably most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with some typical use cases are caches or other memoize data structures. And the reason why it's important for this to be weak, is that you do not. Well, if you want to cache some data with the object and then nobody else is using that object, you don't really want to keep around that cache data, because no one is ever going to use it again, and it's just going to take up memory usage. And this is what the weak map does. So you use objects as keys, use some kind of data as the value. And if the object is no longer used outside this map, then is also removed from the map as well. Derick Rethans 2:29 A main use case for weak maps will likely be ORMs and related tools. In the next, episode 38, I discussed this trainable interface with Nicolas Grekas from Symfony fame. I asked: Nicolas, could you explain what stringable is. Nicolas Grekas 2:45 Hello, and stringable is an interface that people could use to declare that they implement some the magic to string method. Derick Rethans 2:53 That was a short and sweet answer, but a reason for wanting to introduce this new interface was much more complicated, and are also potential issues with breaking backwards compatibility. Nikolas replied to my questioning about that with: Nicolas Grekas 3:06 That's another goal of the RFC; the way I've designed it is that I think the actual current code should be able to express the type right now using annotations, of course. So, what I mean is that the interface, the proposal, the stringable is very easily polyfilled, so we just create this interface in the global namespace that declare the method and done, so we can do that now, we can improve the typing's now. And then in the future, we'll be able to turn that into an actual union type. Derick Rethans 3:39 I had a chat with Nikita Popov about union types as part of the last season in Episode 33 before PHP seven four was released, but after the feature freeze for it happened. I happen to speak with Nikita quite a lot because he does so much work improving PHP. In episodes 40 and 43, we discussed a bunch of smaller features and tweaks to the language. First up was the static return type, Nikita explains: Nikita Popov 4:05 So PHP has a three magic special class names that's self for into the current class parent, refering to the parent class, and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved than static is the same as self, introducing refers to the current class. However, if the method is inherited. And you call this method on the child class. Then, self is still going to refer to the original class, the parent. Well static is going to refer to the class on which the method was actually called. Derick Rethans 4:50 Next, most the class name literal on objects, which adds the following: Nikita Popov 4:55 And that just returns to the fully qualified class name, for example, have a use statement for that class, you can get back the full name instead of the short name. I think we've had this since PHP five five. That's a great feature because it's like makes it clear whether you're referencing the class and not just some random string, and that means, for example, that the IDE refactorings can work better and so on. Derick Rethans 5:20 In PHP seven, we touched up inconsistencies in PHP is compound variable syntax, but we missed a few cases, Nikita explains: Nikita Popov 5:28 All of these remaining consistencies are like really really minor things and edge cases. But weirdly, all or at least most of them are something that someone that at some point ran into and either open the bug or wrote me an email, or on Twitter. So people somehow managed to still run into these things. Derick Rethans 5:56 But syntax changes are not the only thing that needs fixing; sometimes we also get the theory slightly wrong. In this case related to inconsistencies with traits, Nikita explains again: Nikita Popov 6:07 The problem is that traits are sometimes not self contained. So to give a specific example we have in the logger PSR. We have a trait called logger treat, which has a bunch of methods like: warning, error, info, notice, and so on. So we just simple helper methods which all call the log method with a specific log level, and this trait only specified these helper methods, but still requires the actual class to implement the log method that we usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait and to have a number of abstract methods that the trait itself requires to work. This already works fine. The problem is just that these methods are not actually validated, or they are all inconsistently validated. Even though the trait specifies this abstract method, you could implement it in the class with a completely different signature. Derick Rethans 7:09 Probably one of the big new features and PHP 8.0 are attributes, certainly the most discussed new feature with various RFCs to introduce a feature, and then change the syntax a few times. It all started with Benjamin Eberlei introducing a feature in April, ran Episode 47 I asked Benjamin, what attributes are. He replied: Benjamin Eberlei 7:30 They are a way to declare structured metadata on declarations of the language. So in PHP, or in my RFC, this would be classes, class properties, class constants, and regular functions. You could declare additional metadata there that sort of tags, those declarations with specific additional machine readable information. Derick Rethans 7:56 At the moment, many tools are already used up block comments to do a similar thing. So I asked how attributes are different. Benjamin answered with his first proposed syntax: Benjamin Eberlei 8:06 The idea is that we introduce a new syntax that is independent of the docblock comments. Essentially, before each declaration, you can use the lesser-than symbol twice, then the attribute declaration, and then the greater-than sign twice. This is the syntax, I've used from the previous attributes RFC. Dmitri at that point, use the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction anymore, but because Hack at that point they introduced that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with that kind of sort of free symbols. We can still use at certain places, and lesser-than and greater-than at this point are easy to parse. There are a bunch of alternatives, and one thing that I would probably proposes an alternative syntax, where we start with a percentage sign, then a square bracket open, and then a square bracket close. It is more in line with our Rust declares attributes, by Rust uses the sort of the hash symbol which we can't use because it's a comment in PHP. Derick Rethans 9:23 He already hinted at alternatives for syntax and we'll get back to that in a moment. However, the main important thing is what was inside the surrounding attribute notation, and Benjamin explains again: Benjamin Eberlei 9:35 If you declare an attribute name, and then you sort of have a parenthesis, open parenthesis close to pass optional arguments. You don't have to use them so you can only use the attribute name. If you sort of want to tag something just, this is a validator, this is an event listener, or whatever you come up with to use attributes for. But if you need to configure something in addition, then you can use the syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it. Derick Rethans 10:10 In the rest of that episode we spoke about how to use attributes and what the general ideas behind them were. Soon after the attributes RFC was accepted, Benjamin proposed a second related RFC to tweak some of the working that came up throughout the discussion phase. At the time of the original RFC he did not want to make any changes in order to keep the discussion focused, but there were some tweaks necessary. In Episode 64 Benjamin explains what these changes were: Benjamin Eberlei 10:38 There was renaming the attribute class. So, the class that is used to mark an attribute from PHPAttribute to just Attribute. I guess we go into detail in a few seconds but I just list them. The second one is an alternative syntax to group attributes and yeah save a little bit on the characters to type and allow to group them. The third was a way to validate which declarations an attribute is allowed to be set on, and the fourth was a way to configure if an attribute is allowed to be declared once or multiple times, on one declaration. Derick Rethans 11:23 All the suggested tweaks passed with ease. A contentious issue, however, was the syntax to enclose the attributes. Two RFCs later, the PHP development team finally settled on the syntax which has attributes enclosed in hash, square bracket open, and close with the square bracket. Derick Rethans 11:44 George Peter Banyard likes tidying up things in the language. I spoke with him on several occasions this season, where he was suggesting to just do that. In the first instance he's suggesting to change PHP to use locale independent floating point numbers to string conversions. He explains the problem: George Peter Banyard 12:02 Currently, when you do a float to string conversion. So or casting or displaying a float, the conversion will depend on like the current locale. So instead of always using like the decimal dot separator. For example, if you have like a German or the French locale enabled, it will use like a comma to separate like the decimals. Derick Rethans 12:23 He explained what he suggested to change: George Peter Banyard 12:26 Change more or less to always make the conversion from float to string, the same so locale independent, so it always uses the dot decimal separator. With te exception of printf was like the F modifier, because that one is, as previously said locale aware, and it's explicitly said so. Derick Rethans 12:46 The second RFC was candidly titled saner numeric strings. I asked George, what the scope of the problem he wanted to address is. George Peter Banyard 12:55 PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a GET request or a POST request and you take like the value of the form, which would be in a string. And the issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure float, which can have an optional leading whitespace and no trailing whitespace. However trailing white spaces are not part of the numeric string specification in the PHP language. To deal with that PHP has a concept of leading numeric strings, which are strings which are numeric but ... Derick Rethans 13:42 As you can hear the way how PHP handles these numbers in strings is extremely complicated. The fix is just as complicated, but it pretty much boils down to stop treating strings like "5elephant" as a number. In the last episode, I briefly discuss Larry Garfield's object ergonomics article where he sets out a more coherent way forwards into thinking on how to solve some more of the bigger pain points of PHP. Most related to value objects. Although he did not end up proposing any RFCs himself, Nikita Popov did take some inspiration from it. And he proposed two features for inclusion into PHP eight: constructor property promotion, and named arguments. In Episode 53 Nikita explains with constructor property promotion intends to solve. Nikita Popov 14:29 Right now, if we take a simple example from the RFC, we have a class Point, which has three properties, x y and z. And each of those has a float type. And that's really all the class is ideally, this is all we would have to write. But of course, to make this object actually usable we also have to provide a constructor. And the constructor is going to repeat that, yes, we want to accept three floating point numbers, x y and z as parameters. And then in the body we have to again repeat that. Okay, each of those parameters needs to be assigned to a property. So we have to write this x equals x, this y equals y, this z equals z. I think for the Point class. This is still not a particularly large burden. Because we have like only three properties. The names are nice and short, the types are really short, and we don't have to write a lot of code. But if you have larger classes with more properties, with more constructor arguments, with larger and more descriptive names, and also larger and more descriptive type names. And this makes up for quite a bit of boilerplate code. Derick Rethans 15:52 I asked: What is the syntax that you're proposing to improve this? Nikita Popov 15:57 The syntax is to merge the constructor and the property declarations, so you declare the constructor, and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float x, in the constructor. You accept public float x. And what this shorthand syntax does is to also generate the corresponding property. So you're declaring a property, public float x, and to also implicitly perform this assignment in the constructor body so to assign this x equals x. This is really all it does so it's just syntactic sugar. It's a simple syntactic transformation that we're doing, but that reduces the amount of boilerplate code you have to write for value objects in particular, because for those commonly, you don't really need much more than the properties and the constructor. Derick Rethans 16:58 Tying in the constructor property promotion was a slightly more controversial RFC, named arguments, which I discussed with Nikita in Episode 59. I asked him what named arguments are: Nikita Popov 17:09 Currently if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function or method declaration. And what named arguments are, and parameters allow us to do, is to instead specify the argument names, when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call like array_fill(0, 100, 50). Now, like what what does that actually mean. This function signature is not really great because you can't really tell what the meaning of this parameter is and, in which order you should be passing them. So with named parameters, the same code would be something like array_fill( start: 0, number: 100, value: 50). And that should immediately make this call, much more understandable, because you know what the arguments mean. And this is really one of the main like motivations or benefits of having named parameters. Derick Rethans 18:21 We also briefly touched on the main issues where the introduction of named arguments could introduce backward compatibility issues. Nikita Popov 18:28 If you don't use named arguments that nothing is going to break. But of course, if named arguments are used with codes that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a like long term maintenance concern, that if we introduce named parameters, then those parameters become significant to the API. Which means you cannot rename parameter names in minor versions of libraries if you're semver compatible. Of course, you might be breaking some codes, using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries. Derick Rethans 19:15 Beyond the main features that we've discussed so far. PHP eight also outs a few smaller ones. For example, the non capturing catches, which I discussed with Max Semenik in Episode 58. He explains his short proposal: Max Semenik 19:29 In current PHP, you have to specify a variable for exceptions you catch, even if you don't need to use this variable in your code. And I'm proposing to change it to allow people to just specify an exception type. Derick Rethans 19:48 This proposal password 48 votes for, and one against. The last few major PHP releases a lot of focus was put into strengthening PHP's type system. You see that and PHP seven four with additions to OO variance rules, and in PHP eight already with union types, the stringable interface and a static return type. Dan Ackroyd explains in episode 56, why he was suggesting to ask the predefined union type "mixed". Dan Ackroyd 20:14 I have a library for validating parameters, and due to how that library needs to work the code passes user data around a lot. Internally, and then back out to whether libraries return the validator's result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere occurred. And as a significant number of places where I just couldn't add a type, because my code was holding user data. It could be any other type. The mixed type had been discussed before, an idea that people kind of had been kicking around but it just never been really worked on. So that was the motivation for me, I was having this problem where I couldn't upgrade my library, as I wanted to, I kept forgetting: has this bit of code here, been upgraded and I just can't add a type, or is it the case that I haven't touched this bit of code yet. Derick Rethans 21:16 When I spoke with Dan, he also mentioned that sometimes he assists with writing RFCs in case some person would benefit from some technical editing, for example, due to language barriers. In the same way I ended up speaking to Dan again in Episode 65 about a null safe operator, on which he was working with Ilija Tovilo. That explains what a feature is about. Dan Ackroyd 21:38 Imagine you've got a variable that's either going to be an object, or it could be null, so the variable is an object, you're going to want to call a method on it, which obviously if it's null, then you can't call method on it because it gives an error. Instead, what the null safe approach allows you to do is to handle those two different cases in a single line, rather than having to wrap everything with if statements to handle the possibility that it's null. The way it does this is through a thing called short circuiting, so instead of evaluating whole expression. As soon as use the null safe operator, I want the left hand side of the operator is null, and then get short circuited and it all just evalutates to null instead. Derick Rethans 22:18 It also gets an additional benefit related to having shorter code. Dan Ackroyd 22:24 And having the information about what the code's doing in the code, rather than in people's heads makes a lot easier for compilers and static analyzers to their jobs. Derick Rethans 22:35 The last big nice syntax feature in PHP eight zero is the match expression, again by Ilija Tovilo. Instead of me interviewing Dan again, I decided as a joke to interview myself on the subject. It was a little bit surreal but I think it worked out well enough, as a one off event. I first explained to myself the problem with the existing switch language construct. Derick Rethans 22:56 So, before we talk about the match expression, we really need to talk about switch. Switch is a language construct in PHP that you probably know allows you to jump to different cases depending on the value. So you have to switch statement: switch, parentheses opening, variable name, parenthesis closes, and then for each of the things that you want to match against your use case condition, and that condition can be either static value or an expression. But switch has a bunch of different issues that are not always great. So the first thing is that it matches with the equals operator, or the equals, equals sign. And this operator as you probably know, will ignore types, causing interesting issue sometimes when you're doing matching with variables that contain strings with cases that contains numbers, or a combination of numbers and strings. So, if you do switch on the string foo, and one of the cases has case zero, then it will still be matched because it could type juggle the foo to zero, and that is of course not particularly useful. At the end of every case statement you need to use break, otherwise it falls down to the case that follows. Now sometimes that is something that you want to do, but in many other cases that is something that you don't want to do and you need to always use break. If you forget, then some weird things will happen sometimes. Anothercommon thing to use it switches that we sit on on a variable. And then, what you really want to do is the result of, depending on which case's being matched assign a value to a variable. And the current way how any student now is case, say case zero, $result equals string one, break, and you have case two where you don't set: return value equals string two and so on and so on. Which isn't always a very nice way of doing it because you keep repeating the assignment, all the time. And another but minor issue with switch is that it is okay not to cover every value with a condition. So, it's totally okay to have case statements, and then not have a condition for a specific type and switch doesn't require you to add default at the end either, so you can actually end up having a condition that would never match any case, and you have no idea that that would happen. Derick Rethans 25:16 Before I went into details, I also explained how the new match language construct could solve some of these criticisms. Derick Rethans 25:24 The match expression is a new language keyword, which also allows you to switch depending on a condition matching a variable. You're saying matching this variable against a set of expressions just like you would do with switch. But there's a few major differences with switch here. Unlike switch, match returns a value, meaning that you can do return value equals match, then your variable that you're matching on, and the value that gets assigned to this variable is the result of the expression on the right hand side of each condition. Derick Rethans 26:04 That's it for the new features in PHP eight, but I haven't spoken yet about a reason why the PHP team is releasing PHP eight and not PHP seven dot five. And of course, that is PHP's new JIT engine that is slated to improve performance, quite a lot. I have some concerns of my own. And in Episode 48 I spoke with Sara Goleman, which articulated my main concerns with it more eloquently. Sara Golemon 26:31 If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler's complex as well. You have to do a lot of digging before you even get to a point that you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing like let's say that opcode you mentioned that implements strlen. We know that Zend VM def dot h has got the definition for that. We also know that that file is not real code, it's a pre processed version of code that gets built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp up just to get into the Zend engine, as it exists now in 7.4. Let's add JIT on top of that, you've got code that is doing call forward graphs and single static analysis and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down to instructions that the CPU is going to recognize, and CPU instructions are these packed complex things that deal with immediates and indirects, and indirects of indirects, and registers, and the x86 call API is a ridiculous thing that nobody should ever have to look at. So you add all this complexity to it, that by the way, sits in ext/opcache, it's all isolated to this one extension, that reaches into the engine and fiddles around with things to make all this JIT magic happen and we're going to take your reduced set of developers who know how to work on Zend engine and you're going to reduce that further. I think at the moment it's still only about three or four people who actually understand how PHP's JIT is put together enough that they can do any effective work on it. Derick Rethans 28:20 I am still a little apprehensive about whether the effort of introducing a JIT engine is going to pay off. I certainly hope that I'm going to be proven wrong and that the JIT engine is going to be a massive performance boost. But in the end, we do definitely need more people to understand and work on the PHP engine and a new JOT engine that is built into opcache. Perhaps that's something you yourself might want to have a look at in 2021. PHP 8 will be out later today and I hope that you're pleased with all the new features that a PHP development worked on hard throughout the year. With this I'm concluding this episode and also this year's season. I will be back in the new year with more episodes where I hope to demystify the development of the PHP engine some more. Enjoy the holidays and stay safe. Derick Rethans 29:08 Thank you for listening to this installment of PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next year. Show Notes Episode #33: Union Types Episode #38: WeakMaps Episode #39: Stringable Interface Episode #40: Static Return Type, Class Name Literal on Object, and Variable Syntax Tweaks Episode #43: Validation for abstract trait methods Episode #47: Attributes v2 Episode #48: PHP 8.0 and JIT Episode #52: Locale-independent float to string cast Episode #53: Constructor Property Promotion Episode #54: Magic Method Signatures Episode #59: Named Arguments Episode #64: Attribute Amendments, Shorter Attributes Syntax, and Shorter Attributes Syntax Change Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
This week on the podcast, Eric, John, and Thomas talk a lot of PHP. We continue our discussions around what is coming in PHP 8, as well as Laravel Breeze, xDebug, and more...Links:PHP 8: Weak mapsPHP 8: AttributesPHP 8 Weak Maps and Practical Use CasesPHP RFC: Nullsafe operatorPHP: rfc:match_expression_v2PHP: The Right WayPHP[architect] free article of the month on PodcastInterview with Steve McDougall - Voices of the ElePHPantLaravel BreezeXdebug Update: October 2020 — Derick RethansTryHackMe | Hacking TrainingPHPUgly streams the recording of this podcast live. Typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Periscope. Also, be sure to check out our Patreon Page.
PHP Internals News: Episode 70: Explicit Octal Literal London, UK Thursday, November 12th 2020, 09:33 GMT In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to add an Explicit Octal Literal to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:24 This is Episode 70. Today I'm talking with George Peter Banyard, about a new RFC that he's just proposed for PHP 8.1, which is titled explicit octal literal. Hello George, would you please introduce yourself? George Peter Banyard 0:38 Hello Derick, I'm George Peter Banyard, I'm a student at Imperial College London, and I contribute to PHP in my free time. Derick Rethans 0:46 Excellent, and the contribution that you're currently have up is titled: explicit octal literal. What is the problem that this is trying to solve? George Peter Banyard 0:56 Currently in PHP, we have four types of integer literals. So decimal numbers, hexadecimal, binary, and octal. Decimal is just your normal decimal numbers; hexadecimal starts with 0x, and then hexadecimal characters so, null to nine and A to F, and then binary starts with 0b, and then it's only zeros and ones. However, octal notation is just a decimal, something which looks like a decimal number, which was a leading zero, which doesn't really look that much different than a decimal number, but it comes from the days from C and everything which just uses like a zero as a prefix. Derick Rethans 1:48 But I have seen is people using like array keys for the, for the month names right and they use 01, 02, 03, you get 07, and 08 and 09, and then they look at the arrays. They notice that they actually had the zeroth element in there but no, but no eight or nine. That's something that is that PHP no longer does I believe. No, it's mostly that the parser doesn't pick it up anymore. Instead of silently ignoring the eight, it'll just give you an error. You've mentioned that there's these four types of numbers with octal being the one started with zero. But what's the problem with is that a moment? George Peter Banyard 2:31 Sometimes when you want to use, which looks like decimal number. So, for example, you're trying to order months, and use like the full two digits for the month number, instead of just one, you use 01, as an array key. When you get to array, it will parse error because it can't pass 08 as an octal number, which is very confusing, because it. Most people don't deal with octal numbers that often, and you would expect everything to be decimal. Because numeric strings are always decimal, but not integers literals. So, the proposal is to add an explicit octal notation, which would be 0o. So python does that, JavaScript has it, Rust also has it, to allow like a by more explicit to say oh I'm dealing with an octal number here. This is intended. Derick Rethans 3:33 Beyond having the 0b for binary, and the 0x for hexadecimal, the addition of 0o for octal is the plan to add. And is that it? George Peter Banyard 3:45 That's more or less the proposal. It's non-BC, because the parser before would just parse or if you had 0o, so there's no PC very possible numeric strings are not affected because since PHP 7.0 hexadecimal strings are not handled anymore as numeric strings. Numeric strings will always be decimal integers, literals will have your four different variants, and maybe a future proposal is to deprecate the implicit octal notation to always make a decimal, even if you have leading zeros. Derick Rethans 4:21 At the moment, if I do as a string literal 014, and do an echo that I get 12. George Peter Banyard 4:27 Because then it's interpreted as an octal. The most bizarre example is if you do var_dump string of 014 double equal to 014, you will get false, because one is interpreted as well 14, like the numeric string is interpreted as 14, whereas the octal number, which says 014 as an integer literal is interpreted as an octal number, which is 12, which is slightly confusing for most people, because that also if you because PHP, most, we all deal with like HTTP requests, and I GET and POST a data, which everything is in strings because it's a text protocol. And if you get user output, which is like I don't know, naught 14 and you're, are you intending to compare munz numbers which are or. 01201, and then you get to array, well then you just fail. Derick Rethans 5:22 Of course, removing that support means a BC breaking change, which phones happen until PHP nine, of course, which might be a while away from now let's say that. George Peter Banyard 5:31 Probably five years, if we're going through the timelines from PHP seven to PHP 8, but to be able to deprecated and remove it. Well, you need to add support for something else. So that's more the long term plan. Derick Rethans 5:46 And your proposal is basically to make it equivalent to binary and hexadecimal numbers, so that it is less confusing in general. George Peter Banyard 5:55 Yes, that's why the RFC is very short. Derick Rethans 5:58 What are octal numbers actually used for? George Peter Banyard 6:02 The only practical use case that I've seen is for Linux permissions, so chmod. Execute read and write, are those who permissions which chmod will use an octal number. Derick Rethans 6:15 In a different order though but George Peter Banyard 6:17 Yes, I don't know chmod though on top of my heart. Derick Rethans 6:20 Is it only Linux permissions that you can think of? Is there anything else? I can't either so I'm asking you. George Peter Banyard 6:25 No, I can't. That's why I find it very odd that like the leading zero just makes it octal instead of anything else. I mean it has precedence because many other languages do that like C, Java, I don't know, many any language I suppose was just picked it up from. I think C. But when I looked into the history, weirdly enough before C. They had a prefix for like binary, octal, and dec, and hexadecimal. But then the one for octal just got dropped, for some reason. Derick Rethans 6:57 Maybe because the zero and the "o" next each other look very the same. We've already touched on whether there are BC breaks or not, BC standing for backwards compatibility. And, there shouldn't be any because it's something that a parser currently doesn't understand. But do other build-in extensions need to be modified for example? George Peter Banyard 7:18 We have two extensions, which one which deals with numbers, so which is GMP, which is arbitrary precision arithmetic. And then there's the filter extension to filter octal, which filters data and tells you if it's valid or not and it gives you back a, like a correct integer or something like that, which is the filter extension, which has an octal filter. Both of these extensions have been modified to support like the prefix notation, and interpreted as a valid octal number. And then we have like the function which is oct2dec, which is basically octal to decimal, which which weirdly enough already supported like the octal prefix. Derick Rethans 7:59 But that accepts strings, I suppose? George Peter Banyard 8:01 Yes that that accepts strings. Derick Rethans 8:04 And it already supported the 0o prefix? George Peter Banyard 8:07 Yes, which is very on point for PHP I feel. Some things are just supported randomly in one side but not everywhere else. Derick Rethans 8:15 It's a surprise for me that is what I can say. So, yeah, you mentioned as a short RFC, you think there will be any extensions to this in the future? You already mentioned having it maybe deprecating the current just zero prefix? George Peter Banyard 8:31 So one other possible future scope is with the prefix to reintroduce octal, binary, and hexadecimal numbers. As with the prefixes as numeric strings. If you type, 0xAABBCC in, and you have that as a string, which could be useful if you get like colorus back from, from a webform, that would be automatically converted into an integer, or not automatically converted if you do like if you compare it to numbers, or if you cast it to an integer, because currently if you get 0x, something and you cast it to an integer, you will get zero. So that way you need to use like a function like hex2dec, or oct2dec, or bin2dec to convert from a string, or to another string and then cast that. Or it may be cast directly to an integer, I'm not exactly sure. But that's also debatable if it's something we want to add. Derick Rethans 9:37 Is it actually possible to do, for example with hexadecimal numbers, do like if you have inside a string. Can you do xAA, does that actually work? George Peter Banyard 9:48 I didn't think so. Derick Rethans 9:49 That actually works. You can do var_dump("x6A") and it gives you the letter J. George Peter Banyard 9:55 The more, you know. Derick Rethans 9:56 But it doesn't work for binary, or octal. Only for hexadecimal with x. So I guess that's something that could be added to string interpolation at some point. George Peter Banyard 10:07 PHP is so weird sometimes. Derick Rethans 10:10 Yes, I mean PHP does things in its own way, however, making this kind of small changes to it, just end up improving the language step by step and that is of course the way forward. Right. George Peter Banyard 10:23 Yeah. Derick Rethans 10:25 And I'm looking forward to more of these small incremental changes in the future as well. George Peter Banyard 10:30 Seems like a good plan. Derick Rethans 10:32 Are you planning any more? George Peter Banyard 10:34 Well, so I went through some of the old RFCs, most notably the one about when the whole scalar type thing was going on. We had like strict types and then we had like the coercive types. One which was by Dmitri, Zeev, pretty sure Stas, and um forgot, forgetting somebody else. But some of them, some of the ideas they had, which was making some of the type juggling more strict, so float to integer conversions. Currently, even if the floating number has like decimal part, it will just truncate it to an integer, and it won't emit any warning and it will just like pass without any issue, I think that may be is kind of unexpected. I made the other warning to that to possibly make it a type error in the future. Derick Rethans 11:24 You mean upon a cast? George Peter Banyard 11:26 If you've type hint function as accepting only integers, so if you say foo(int $bar), and you pass it the float. And you would like in normal mode, it will truncate, and it will just pass an integer. Derick Rethans 11:40 Because it's just typed. George Peter Banyard 11:42 Yes, and we've had multiple reports of people being very confused about why it's just truncating the numbers, because it's not even rounding up. If you had like if you have like 0.9 it won't round up to one it will just truncate to zero, which a lot of people are confused by. Derick Rethans 11:58 In strict mode doesn't do that? George Peter Banyard 11:59 Yeah, because strict mode is very strict and will only allow you to pass explicitly what's been what you've requested, with the exception of the normal integer to float conversion which is lossless. Derick Rethans 12:12 That's lossless up to a certain point yes. George Peter Banyard 12:14 To a certain point like your integer doesn't fit, then it goes overflow to a float. Derick Rethans 12:19 All right. George thank you very much for taking your time this afternoon to talk to me. George Peter Banyard 12:23 Thank you for having me. Derick Rethans 12:26 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Explicit octal integer literal notation Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 69: Short Functions London, UK Thursday, November 5th 2020, 09:32 GMT In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a new RFC that's he proposing related to Short Functions. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Derick Rethans 0:24 Hello, this is Episode 69. Today I'm talking with Larry Garfield, about an RFC that he's just announced called short functions. Hello Larry, would you please introduce yourself? Larry Garfield 0:35 Hello World, I'm Larry Garfield, the director of developer experience at platform.sh. These days, you may know me in the PHP world mainly from my work with PHP FIG. The recent book on functional programming in PHP. And I've gotten more involved in internals in the last several months which is why we're here. Derick Rethans 0:57 I'm pretty sure we'll get back to functional programming in a moment, and your book that you've written about it. But first let's talk about short functions, what are short functions, what is the problem that are trying to solve? Larry Garfield 1:11 Well that starts with the book actually. Oh. Earlier this year, I published a book called Thinking functionally in PHP, on functional programming in PHP, during which I do write a lot of functional code, you know, that kind of goes with the territory. And one of the things I found was that the syntax for short functions, or arrow function, or can be short lambdas, or arrow functions, you know whatever name you want to give them, was really nice for functions where the whole function is just one expression. Which when you're doing functional code is really really common. And it was kind of annoying to have to write the long version with curly braces in PSR 2, PSR 12 format for functions that I wanted to have a name, but we're really just one line anyway does return, blah blah blah. It worked, got the job done. Larry Garfield 2:13 Then hanging around with internals people, friend of the pod Nikita Popov mentioned that it should be really easy. Now that we've got short functions, or short lambdas, do the same thing for named functions. And I thought about. Yeah, that should be doable just in the lexer, which means, even I might be able to pull it off given my paltry miniscule knowledge of PHP internals. So, I took a stab at it and it turned out to be pretty easy. Short functions are just a more compact syntax for writing functions or methods, where the whole thing is just returning an expression. Derick Rethans 2:56 Just a single expression? Larry Garfield 2:58 Yes. If your function is returning two parameters multiplied together, it's a trivial case but you often have functions or methods that are doing. Just one expression and then returning the value. It's a shorter way of writing that. Mirrored on the syntax that short lambdas use. It doesn't enable you to really do anything new, it just lets you write things in a more compact fashion. But the advantage I see is not just less typing. It lets you think about functions in a more expression type way, that this function is simply a map from input to this expression, which is a mindset shift. So yes it's less typing but it's also I can think about my problem as simply an expression translation. And that's very very common in functional programming, also it helps encourage writing pure functions which functional programming is built on, and is just general good practice anyway, pure functions have no side effects. Take no stealth input from globals or anything like that. And their only output is their return value. You want most of your codebase to be that, and a short function is really hard to not make that, you can but it's hard not to. In one sense it's just syntax saving, in another sense it's enabling a more functional mindset, as you're writing code, which I'm very much in favor of. Derick Rethans 4:27 What is it that you're proposing then? Larry Garfield 4:29 The specific syntax is the same function, signatures we have now: function name, open paren, parameters, closed paren, optionally colon return type. But then instead of open curly brace, it's just a double arrow expression semi colon, the exact same body style as a short lambda has. Derick Rethans 4:52 Or as a match expression? Larry Garfield 4:54 Or a match expression or, you know, just array values, you know they're all examples of double arrow expression, which means this thing becomes that thing, wraps that thing. And so it's a very familiar syntax, and it can all go on on one line or you can drop it to the next line and indents depending on how long your code is. I've done both with short lambdas, they both read just fine. That is the exact same semantic meaning as open curly brace return that expression closed curly brace. Derick Rethans 5:28 Yeah, and that wouldn't allow you to allow multiple statements in that case, because the syntax just wouldn't allow for it. Larry Garfield 5:34 Correct. There's just like short lambdas. If you have multiple statements we'll get to that, that's not a thing. There is discussion of having short lambdas now take multiple lines. I haven't weighed in on that yet I'm not getting into that, at this point. Derick Rethans 5:50 Where came up more recently again was with the match expression for PHP 8.0, where, because you're limited to this one expression on the right hand side, there was originally talk of extending that to multiple lines, but then again that wouldn't met, that wouldn't match with the short lambdas that we have, or your short functions if they make it into the language. Larry Garfield 6:11 Right, all of that comes down to the idea of block expressions, which would be a multi lines set of statements that gets interpreted as an expression, which doesn't exist in PHP today, is how something like Rust works, but lifting that into PHP while there might be advantages to it is a big lift, and that needs a lot of thinking through to make it actually makes sense, it may not make sense. Again, I have not dug into that in much detail yet, other than to say, we really need to think that through before doing anything about it. Derick Rethans 6:45 Yeah, especially about what kind of return value you'll end of returning from it because a return without the return keyword is tricky to. It's tricky to create semantic reasoning for and how to do that right. Larry Garfield 6:56 Yeah, there's a lot of semantic trickery involved there so I explicitly avoiding that in this RFC, it's a nice simple surgical change. Derick Rethans 7:05 You mentioned some use cases in the form of, they're useful in functional programming, but most people don't use functional programming with PHP, or maybe in your opinion don't use it yet. Would would be use cases for non functional programming with PHP for this new syntax? Larry Garfield 7:22 Even if you're not doing formal functional programming. There's still a lot of cases where you have a function that just ends up being one line because that's all it needs to be. Especially if you're doing object oriented code. How many classes have you written that have, they're an entity class and they have eight properties, which means you have eight getter methods on them, each of which does nothing except return this arrow, you're removing three lines out of each one of those again using standard syntax conventions. By using a short function for that. You may also have a lot of refactoring techniques encourage producing single line functions or single line methods. For example, if you have an if statement, or a while statement or some other kind of check and check is A and B, and or C equals D, or some kind of complex logic there. Very common recommendation is alright break that out to a utility method, or utility function that can give a name to and becomes more self documenting. This is a good refactor and giving you a bunch of single line functions that are just really an expression. So, I'll write structure them as just an expression. My feeling is is more advantageous with standalone functions than with methods, but most of the logic applies to both equally well. Derick Rethans 8:51 In the case of setters and getters, that actually makes quite a bit of sense righ? Larry Garfield 8:55 It's just for getters for setters generally you set something, and then return this or return null, or something like that and that is a different statement, so it wouldn't work for setters. There are ways to click around that by calling a sub function there which I'm not actually going to encourage but you can do. Derick Rethans 9:15 Yeah, I guess you could create a lambda. Larry Garfield 9:17 And actually what you do is have a function that just takes a parameter and ignores and returns a second parameter. And then the body of your setter is calling that function with the Sep sabyinyo this row foo equals whatever, then your second parameter is this, and it ends up working. I don't actually suggest people do that. Derick Rethans 9:39 It's also too complicated for me to understand what you're trying to say there so let's, let's not encourage that use of it. Larry Garfield 9:46 I did it just to see if I could not because it's a good idea. Derick Rethans 9:50 Okay, your conclusion was, it's not a good idea. Larry Garfield 9:54 Cute hack. Derick Rethans 9:55 I saw in a discussion on the mailing list, some people talking about why is this using function and not fn. What is your opinion about that? Larry Garfield 10:04 Mainly because using function was easier to implement initially. So that's what I went with; just the way some of the lexer rules are structured, it was a bit trickier to use, to do fn. And I figured go with the easy one. That said, Sara Goleman gave a patch that's takes care the FN part. So there are patches available for both. Personally I moderately prefer function, I think it has less confusion. And if you have to convert a function from one to the other, it's less work then. But I don't really care all that much so if the consensus is we like the feature but we want to use fn I'm good with that too. There's some interesting discussion around, we were saying, there are some people trying to push right now to have short lambdas also take multiple statements, or to have long lambdas, anonymous functions, do auto capture. And so this question is now okay, the double arrow versus the FN, which one means auto capture, which one means single expression. I haven't weighed in on that yet. It'd be sense to sort all of it out and make it all, logically consistent, but as long as things are consistent I don't particularly care which keyword gets used where. Derick Rethans 11:20 By adding this feature to PHP, the syntax feature, is there a possibility for backward compatibility breaks? Larry Garfield 11:27 I don't believe so. The syntax I'm proposing would be a syntax error right now, so there shouldn't be any backward compatibility issues. Other use case I forgot to mention before, is if you're doing functional style code. Then, very often want to branch, your logic. Very concisely without full if statements, people are used to Haskell, I use the pattern matching and stuff like that, I'm not proposing that here. But the new match statement in PHP eight zero is a single expression. That gives you a branching capability. And so that dovetails together very nicely to say: okay, here's a function branch, using a match statement based on its inputs. And it maps to a single expression. Could be a call to another function, could be just a single expression. It's just another place where it doesn't make possible, anything you couldn't do before. It just makes certain patterns more convenient. Derick Rethans 12:25 So no BC breaks, but more use cases. Can you see what else could be added to this kind of style functionality for the future? Larry Garfield 12:35 I think this syntax itself is very easily self contained, does one simple thing and does it well and there's not much room to expand on it. There's no reason to have closures on named functions that's not really a thing. One of the things I like about it though, is it dovetails nicely with some other things that are in flight. A teasier for future episodes I suppose I'm collaborating with Ilya Tovolo on enumerations that hopefully will support methods on enumeration values. It is an excellent case for single line expressions because there's not much else to do in a lot of cases. So you can end up writing a very compact enumeration that has methods that are single expression, and boom, you've got a state machine. I've been working on a pipe operator that allows you to chain functions together. You can now have a single line, single expression, function that is just: take input and pipe it through a bunch of other functions. And now you have a complex pipeline that is just one single expression function. Hopefully these things will come to pass. I still got quite a bit of time before eight one's feature freeze, so we'll see what happens, but to me all of these things dovetail together nicely. And I like it when functionality dovetails together nicely. That means you can have functions or functionality that has benefit on its own. But in tandem with something else they're greater than the sum of their parts and that's to me the sign of good language design and good architecture. Derick Rethans 14:09 And we have been getting to some of that that PHP 8.0 with having both named arguments and promoted constructor arguments for example. Larry Garfield 14:17 Exactly. You talked about that several episodes ago. Derick Rethans 14:20 Would you have anything else to add? Larry Garfield 14:22 Like, the. A lot of the changes that have happened in PHP eight that have made, 7.4 and eight, that have made functional style code more viable and more natural. And that's the direction that I'm hoping to push more with my limited technical skills in this area, and better design skills and collaborating with others, but there's a lot of targeted things we could do in PHP 8.1 to make functional style code easier and more viable, which works really well in a request response environment. Which PHP's use cases are very well suited to functional programming. I'm hoping to have a lot of these small targeted changes like this that add up to continuing that trend of making PHP a more functional friendly language, Derick Rethans 15:11 But you're not proposing to turn into a functional language altogether? Larry Garfield 15:15 A strictly functional language? No, that would not make any sense. A language in which doing functional style things is easier and more natural? Absolutely. I think there's a lot of benefits to that, even in a world where people are used to doing things in an OO fashion. Those are not at odds with each other. Functional programming and object oriented code. A lot of the principles of functional programming are also principles of good OO code, like stateless services. That's a pure function by a different name. Ways to do a lot of those things more easily. I think is a benefit to everyone. Those who agree with me, I could use help on that, volunteers welcome. Derick Rethans 15:54 As always, as always. Okay, thank you, Larry for taking the time this morning slash afternoon to talk to me about short functions. Larry Garfield 16:02 Thank you, Derick and take care of PHP world. Derick Rethans 16:06 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me slash patron. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Short Functions Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 68: Observer API London, UK Thursday, September 17th 2020, 09:31 BST In this episode of "PHP Internals News" I chat with Levi Morrison (Twitter, GitHub) and Sammy Kaye Powers (Twitter, GitHub, Website) about the new Observer API. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 68. Today I'm talking with Levi Morrison, and Sammy Powers, about something called the observer API, which is something that is new in PHP eight zero. Now, we've already passed feature freeze, of course, but this snuck in at the last possible moment. What this is observer API going to solve? Levi Morrison 0:44 the observer API is primarily aimed at recording function calls in some way so it can also handle include, and require, and eval, and potentially in the future other things, but this is important because it allows you to write tools that automatically observe, hence the name, when a function begins or ends, or both. Derick Rethans 1:12 What would you use that for? Levi Morrison 1:13 So as an example, Xdebug can use this to know when functions are entered or or left, and other tools such as application performance monitoring, or APM tools like data dog, New Relic, tideways, instana so on, they can use these hooks too. Derick Rethans 1:38 From what I understand that is the point you're coming in from, because we haven't actually done a proper introduction, which I forgot about. I've been out of this for doing this for a while. So both you and Sammy you work for data dog and work on their APM tool, which made you start doing this, I suppose. Sammy Kaye Powers 1:54 Yeah, absolutely. One of the pain points of tying into the engine to to monitor things is that the hooks are insufficient in a number of different ways. The primary way that you would do function call interception is with a little hook called zend_execute_ex and this will hook all userland function calls. The problem is, it has an inherent stack bomb in it where if, depending on your stack size settings you, you're going to blow up your stack. At some point if you have a very very deeply deep call stack in PHP, PHP, technically has a virtually unlimited call stack. But when you use zend_execute_ex, it actually does limit your stack size to whatever your settings are your ulimit set stack size. One of the issues that this solves is that stack overflow issue that you can run into when intercepting userland calls but the other thing that it solves is the potential JIT issues that are coming with PHP eight, where the optimizations that it does could potentially optimize out a call to zend_execute_ex where a profiling or APM tracing kind of extension would not be able to enter set that call, because of the JIT. The Observer API enables to solve multiple issues with this. Not only that, there's more. there's more features to this thing, because zend_execute_ex by default will intercept all userland function calls, and you have no choice but to intercept every single call, whereas, this API is designed to also allow you to choose which function calls specifically you want to intercept, so there is on the very first call of a function call. And it'll basically send in the zend function. This is a little bit of a point we've been kind of going back and forth on what we actually send in on the initialisation. But at the moment it is a zend function so you can kind of look at that and say okay do I want to monitor this function or observe it. If your extension returns a handler and says this is my begin handler This is my end handler. Those handlers will fire at the beginning and end of every function call. So it gives you a little bit of fine grain sort of resolution on what you want to observe. The other really kind of baked in design, part of the design is, we wanted it to play well with neighbours, because some of the hooks, at the moment, well, pretty much all of the hooks. Aside from typical extension hooks. Whenever you tie into the engine it's very easy to be a noisy neighbor. It's very easy not to forward a hook along properly it's very easy to break something for another extension. This has like kind of neighbour considerations baked right in. So when you actually request a begin and end hook. It will manage on its side, how to actually call those, and so you don't have to forward along the hook to other other extensions to make it a little bit more stable in that sense. Derick Rethans 4:52 From working on Xdebug, this is definitely problem forwarding both zend_execute_ex and also zend_execute_internal, which is something I also override are of course. And I think there are similar issues with, with the error display as well and PHP eight will also have a different, or a new API for that as well. Also coming out of a different performance monitoring tool, which is interesting to see that all these things works. You mentioned the Zend function thing and I'm not sure how well versed the audiences and all this internal things what is this zend function thing? Levi Morrison 5:24 as any function in the engine is what represents a function so not the scope that it's called in but the scope that it's defined in. It represents both method calls and function calls. It's triggered whenever a user land function is in play. So it has the function name, the name of the class that it's associated with, it tells you how many parameters you have and things like this. It does not tell you the final object that it's called with, and this is partly why we are debating what exactly should get passed in here, because some people may care. Oh, I only want to observe this with particular inheritors or, or other things of that nature so there's a little bit of fine tuning in the design perhaps still but the basic idea is you'll know the name of the function. What class it's in, and it's bound late enough in the engine that you would also have access to whatever parents that class has, etc. Derick Rethans 6:33 Does it contain the arguments as well, that are being sent, or just a definition of the arguments? Levi Morrison 6:38 The Zend function only contains the definition of the arguments. The hook is split into three sections kind of so there's like initialisation and then begin and end. Initialisation only gives you the Zendo function but to begin and gives you access to what's called the Zend execute data which has more information, including the actual arguments being passed. Derick Rethans 7:03 Okay, so it's the idea of the initialisation, just to find out whether you actually want to intercept the function. And if you want remember that and if not it wouldn't ever bother are the trying to intercept that specific zend function either. Sammy Kaye Powers 7:17 Actually what we actually pass into that initialization function is has been sort of up for debate. The original implementations, that is plural. We've had many different implementations of this thing over the, over the year. Derick you did mention that this got squeezed in last minute it has been a work in progress for a very long time and it actually is fulfilling JIT work so there's a specific mention in the JIT RFC that that mentions an API that is going to be required to intercept some function calls that are optimized out so that's why we were able to sneak in a little bit past feature freeze on the actual merge I think. But what we actually sent into this initialization function is spin up two for debate based on how we've actually implemented it. One of the original early implementations actually called this initialization function during the runtime cache initialization, just basically kind of a cache that gets initialized before the execute data actually is created. We didn't have the option of sending in the execute data at that time, we did have the zend function. So we were sending that in. Later on this implementation get refactored to a number different ways. We have the option now to send an execute data if we wanted to, but it might be sufficient to send in the op array instead of the Zend function. The op array should be the full sort of set of opcodes that basically is a function definition from the perspective of of internals, but it also includes like includes and evals. Having additional information at initialisation might actually be handy. I think we're still kind of maybe thinking about that potentially changing I don't know what do you think Levi. Levi Morrison 8:59 Yeah, you can get the oparray from the function so it's a little pedantic on which one you pass in I guess, but yeah. The idea is that we don't want to intentionally restrict it. It's just that the implementations have changed over the year so we're not sure exactly what to pass in at the moment. I think a zend function's pretty safe, passing in a zend oparray is perhaps a better signal to what it's actually for, because it can measure function calls, but also include, require, eval. And the oparray technically does contain more information. Again, if you have zend function, you can check to see if it is an oparray and get the operate from the Zend function. So a little pedantic but maybe a little better in conveying the purpose and what exactly it targets. Derick Rethans 9:56 And you can also get the oparray from zend_execute_data. Levi Morrison 10:00 Yeah. Derick Rethans 10:01 If I want to make use of this observe API I will do that? I guess, you said only from extensions and not from userland. Sammy Kaye Powers 10:08 Exactly. At the moment you would as an extension during MINIT or startup, basically in the very early process with the actual PHP processes starting up, would basically register your initialization handler. And at that point, under the hood, the whole course of history is changed for PHP at that point, because there is a specific observer path that happens when an observer extension registers at MINIT or startup. At that point the initialization function will get called for every function call. The very first function call that that function called is called, I know that sounds confusing but if you think you have one function and it's called 100 times that initialization will run one time. That point you can return either a begin and or at an end handler. If you return null handlers it'll never, it'll never bother you again for that particular function, but it will continue to go on that is don't mentioned earlier for every new function that encounters every new function call and encounters, I should say. Derick Rethans 11:12 There is not much overhead, because the whole idea is that you want to do as little overhead as possible I suppose. Levi Morrison 11:19 Exactly, we have in our current design in pre PHP 8.0. You could hook into all function calls using that zend_execute_ex, but it has performance overhead just for doing the call. So let's imagine we're in a scenario where we have two extensions, say Xdebug and one APM product. Both of them aren't actually going to do anything on this particular call it will still call those hooks, which has overhead to it. So if nobody is interested in a function, the engine can very quickly determine this and avoid the overhead of doing nothing. This way we only pay significant costs, if there's something to be done. Derick Rethans 12:09 You're talking about not providing much overhead at all. Just having the observer API in place, was there any performance hits with that? Sammy Kaye Powers 12:17 That was actually one of the biggest sort of hurdles that we had to overcome specifically with Dmitri getting this thing, merged in because it does touch the VM and whenever you touch the VM like we're talking like any tiny little null check that you have in any of the handlers is probably going to have some sort of impact at least enough for Dmitri, who understandably cares about like very very very small overheads that are happening at the VM level, because these are happening for every function call. You know, this is, this is not something that's just happening, you know, one time during the request is happening a lot. In order to apeace Dmitri and get this thing merged in, it basically had to have zero overhead for the production version non observer, his production version but on the non observed version on the non observed path it had to basically reach zero on those benchmarks. That was quite a task to try to achieve. We went through four, about four or five different ways of tying into the engine, we got it down to about, like, two new checks for every function call. And that still was not sufficient, so we end up going with based on Dmitris excellent suggestion, went with the opcode specialization, to implement observers so that at compile time. We can look and see if there's an observer extension present and if there is, it will actually divert the function call related opcodes to specific handlers that are designed for observing and that way once, once you get past that point, the observer path is already determined at compile time and all the observer handlers fire. In a non observed environment, all of the regular handlers will fire without any observability checks in them. Derick Rethans 14:03 At the end of getting within the loss of zero or not? Levi Morrison 14:07 It is zero for certain things. Of course, there are other places besides the VM that you have to touch things here and there for, you know, keeping code tidy and code sane but it's effectively zero, for all intents and purposes. Goal achieved. I will say zero percent. Derick Rethans 14:30 I think the last version of the patch that I saw didn't have the code specialization in it yet. So I'm going to have to have a look at myself again. Levi Morrison 14:39 Yeah, the previous version had very low overhead, so low overhead that you couldn't really observe it through any time based things. But if you measured instructions retired or similar things from the CPU, then it was about point four to 1% reduction, and personally I would have said that's good enough because all of them would correctly branch predict, because you either have handlers in a request, or you don't. And so they would perfectly predict, every time. But still, those are extra instructions technically so that's why Dmitri pushed for specialization and those are no longer there. Derick Rethans 15:27 Does that mean there are new opcodes specifically for this, or is it just the specialization of the opcodes that is different? Sammy Kaye Powers 15:33 It's just this specialization. During the process of going, figuring out what exactly Dmitri needed to mergeable actually proposed an implementation that added basically an observer version of every kind of function call related opcode like do_fcall_observed, or observed_return or something like that. With, opcodes specialization, it reduces the amount of code that you have to write sort of at the VM level, it doesn't change the amount of code that's generated though because with opcode specialization, basically the definition file will get expanded by this, this php file that actually generates C code. When you add a specialization, to a handler that already has specializations on it, it will expand quite considerably. The PR at one point ended up being like 10,000 lines or something like that, so we had to do some serious reduction on the number of handlers that were automatically generated. Long story short, is there are no new opcodes but there are new opcode handlers to handle this specific path. Derick Rethans 16:40 Not sure what, if anything more to ask about the Observer API, do you have anything to ask yourself? Levi Morrison 16:45 I think it's worth repeating the merits of the observer API and where we're coming from. The key benefits in my opinion are that it allows you to target per function interception for observing. It allows you to do it in a way that's that plays nice with other people in the ecosystem and increasingly that's becoming more important. We've always had debuggers and some people occasionally need to turn debuggers on in production and other things like this. But increasingly, there are other products in this space; the number of APM products is growing over time. There are new security products that are also using these kinds of hooks. And I expect over time we will see more and more and more of these kinds of of tools, and so being able to play nicely is a very large benefit. At data dog where Sammy and I both work we've hit product incompatibilities a lot of times, and some people are better to work with than others. I know that Xdebug has done some work to be compatible with our product specifically but you know competitors aren't so interested in that. We care a lot about the community right, we want the whole community to have good tools, and I don't think we actually mentioned yet that we did collaborate with some other people and competitors in this space. That hopefully proves that that's not just words of mine that, you know, we actually met with the competitors who were willing to and discussed API design, and use cases, and making sure that we could all work together and compete on giving PHP good products rather than, you know, hoarding technical expertise and running over each other and causing incompatibilities with each other. So I think those are really important things. And then lastly, it does not have that stack overflow potential that the previous hooks you could use did. Derick Rethans 18:54 Yeah, which is still an issue for Xdebug but but I fixed that in a different way by setting an arbitrary limit on the amount of nested levels you can call, right. Levi Morrison 19:02 Yeah, and in practice that tends to work pretty well because most people don't have intentionally deep code. But for some people they do. And we can't as an APM product for instance say: sorry your code is just not good code, we can't observe a crash your your your thing and so we can't make that decision. And then the biggest con at the moment is that it doesn't work with JIT, but I want to specifically mention that, that's not a technical thing, that's just a not enough time has been put in that space yet because this was crunched to the last second trying to get it in. And so, some things didn't get as much focus yet. Hopefully by the time 8.0 gets released it will be compatible with JIT, or at least it will be only per function, so maybe a function that gets observed, maybe that can't be JIT compiled that specific function call, but all the other function calls that aren't observed would be able to. We'll see obviously there's still work to do there but that's our hope. Derick Rethans 20:10 What happens now, if, if you use the observer API and the JIT engine is active? Does it just disable the JIT engine. Sammy Kaye Powers 20:16 Yep. It just won't enable the JIT at all. In fact, it just goes ahead and disables it, if an observer extension is enabled and there is a little kind of debug message that's emitted inside of the OP cache specific logs that will will say specifically why the JIT isn't enabled just in case you're sitting here trying to turn the JIT on you're like, why isn't enabled, and it'll say there's an observer extension present so we can enable the JIT. Hopefully they'll be able to work a little bit, and maybe just change an optimization level or something in the future. I'd like to give a shout out to Benjamin Eberlei, who has been with us since the very beginning on this whole thing has been vetting the API on his products, has gotten xhprof on not only the original implementation but also on the newest implementation, and has just been a huge help in actually getting this thing pushed in, and was said some of the magic words that actually, this thing merged in, when it was looking like it wasn't gonna land for eight dot O and got it landed for eight dot one so Benjamin gets a huge thumbs up. So, Nikita Popov, Bob Weinand, and Joe Watkins really early on. These are awesome people from internals who have spent some time to help us vet the API, but also to help us with specific implementation details. It's been just a huge team effort from a lot of people and it was just like, really great to work across the board with everybody. Derick Rethans 21:35 Yeah, and the only thing right now of course is all the extensions that do observe things need to be compatible with this. Sammy Kaye Powers 21:43 Exactly. Derick Rethans 21:44 Which is also means there's work for me, or potentially. Sammy Kaye Powers 21:47 Absolutely. Levi Morrison 21:49 I guess one one minor point there is that if an extension does move to the new API, it is a little bit insulated from those that haven't moved to the new API. So, to some degree, it still benefits the people who haven't moved yet because the people who have moved have one less competitor in the same same hook, so it's just highlighting the fact that it plays nicely with other people. Derick Rethans 22:14 Is opcache itself actually going to use it or not? Levi Morrison 22:17 So this is focused only on userland functions; past iterations that was not the case. Dmitri kind of pushed back on having this API for internals and so that got dropped. I don't think at this stage there's any there's any value in opcache using it specifically, but there are some other built in things like Dtrace. I don't know how many people actually use Dtrace; I actually have used it once or twice, but Dtrace could use this hook in the future instead of having a custom code path and things like that. Derick Rethans 22:49 For Xdebug I still need to support PHP seven two and up, so I'm not sure how much work it is doing it right now, but definitely something to look into and move to in the future, I suppose. Well thank you very much to have a chat with me this morning. I can see that for Sammy the sun has now come up and I can see his face. Thanks for talking to me this morning. Sammy Kaye Powers 23:10 Thanks so much, Derick and I really appreciate all the hard work you put into this because I know firsthand experience how much work podcasts are so I really appreciate the determination to continue putting out episodes. It's a huge amount of work so thanks for being consistent. Levi Morrison 23:26 Yeah, thank you so much for having us Derick. Derick Rethans 23:30 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patroen. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes Pull Request: https://github.com/php/php-src/pull/5857 Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
This week on the podcast, Eric, John, and Thomas discuss what they liked and what they didn't from Laracon 2020. They also discuss this years Hacktoberfest, XDebug 3, and much more. Laracon Online | The official Laravel online conferenceHacktoberfest presented by DigitalOceanXDebug 3 Features a New Info PanelPHP: Using namespaces: Aliasing/Importing - ManualApple has backed down in its latest developer fight, apologizing to WordPress after it pressured the website builder to add in-app payments
PHP Internals News: Episode 67: Match Expression London, UK Thursday, August 20th 2020, 09:30 BST In this episode of "PHP Internals News" I chat with Derick Rethans (Twitter, GitHub, Website) about the new Match Expression in PHP 8. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:15 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 67. Today we're going to talk about a match expression. I have asked the author of the match expression RFC, lija Tovilo, whether it wanted to come and speak to me about the match expression, but he declined. As I think it's important that we talk in some depth about all the new features in PHP eight, I decided to interview myself. This is probably going to sound absolutely stupid, but I thought I'd give it a go regardless. So here we go. Derick Rethans 0:53 Hi Derick, would you please introduce yourself? Derick Rethans 0:56 Hello, I'm Derick and I'm the author of Xdebug I'm also PHP seven four's release manager. I'm also the host of this podcast. I'm also you. Derick Rethans 1:07 What a coincidence! Derick Rethans 1:10 So what is the problem that is RFC is trying to solve? Derick Rethans 1:13 Well, before we talk about the match expression, we really need to talk about switch. Switch is a language construct in PHP that you probably know, allows you to jump to different cases depending on the value. So you have to switch statement: switch parentheses open, variable name, parenthesis close. And then for each of the things that you want to match against your use: case condition, and that condition can be either static value or an expression. But switch has a bunch of different issues that are not always great. So the first thing is that it matches with the equals operator or the equals, equals signs. And this operator as you probably know, will ignore types, causing interesting issues sometimes when you're doing matching with variables that contain strings with cases that contains numbers, or a combination of numbers and strings. So, if you do switch on the string foo. And one of the cases has case zero, and it will still be matched because we put type equal to zero, and that is of course not particularly useful. At the end of every case statement you need to use break, otherwise it falls down to the case that follows. Now sometimes that is something that you want to do, but in many other cases that is something that you don't want to do and you need to always use break. If you forget, then some weird things will happen sometimes. It's not a common thing to use it switch is that we switch on the variable. And then, what you really want to do the result of, depending on which case is being matched, assign a value to a variable and the current way how you need to do that now is case, say case zero result equals string one, break, and you have case two where you don't set return value equals string two and so on and so on, which isn't always a very nice way of doing it because you keep repeating the assignment, all the time. And another but minor issue with switch is that it is okay not to cover every value with a condition. So it's totally okay to have case statements, and then not have a condition for a specific type and switch doesn't require you to add default at the end either, so you can actually have having a condition that would never match any case, and you have no idea that that would happen. Derick Rethans 3:34 How's the match expression going to solve all of this stuff? Derick Rethans 3:37 The match expression is a new language keyword, but also allows you to switch depending on a condition matching a variable. You have matching this variable against a set of expressions just like you would switch, whereas a few major differences with switch here. So, unlike switch, match returns a value, meaning that you can do: return value equals match, then your variable that you're matching, and the value that gets assigned to this variable is the result of the expression on the right hand side of each condition. So the way how this works is that you do result equals match, parenthesis, opening your variable name parenthesis close curly braces and then you have your expression which can just be zero, or one, or a string, or floating point number, or it can be an expression. For example, a larger than 42. And then this condition is followed by an arrow, so equals, greater than sign, and then another statement. What a statement evaluates to get assigned to the return value of the match keyword. Which means that you don't have to do in every case, you don't have to return value equals value, or return value equals expression. Now the expression itself, the evaluator what evaluates to gets returned to match. So that is one thing but as a whole bunch of other changes. The matching with a match keyword is done based on strict type. So instead of using the equal operator, the two equal signs, match uses the identical operator which is the three equal signs so that is strict comparison. Normal rules for type coercion are suspended, no matter whether you have strict types defined in your script, meaning that the match keyword always takes care of the type. It will not be any type juggling in there that can create confusing results. I've already mentioned that it can also be an expression on both the left and right hand sides. Switch also allows you to have an expression on the left hand side, although that isn't used very much, I guess. Another difference between switch and match is that, in case a switch on your right hand side, you can have multiple statements so you can have case seven, colon, and then a bunch of statements and statements are ended by the break, pretty much. With match you can only have one expression. It's possible that is going to change in the future, but it is at the moment analogous with what you can do with the short arrow functions, the arrow function with the fn keyword. That also only allows one expression on the right hand side. Switch also doesn't do automatic fall through to the next condition. If you have a single statement that doesn't make a lot of sense anyway, but this is one of the other changes here. So just like switch you can add multiple conditions separated by comma, on the left hand side. So you can do case zero comma one arrow, and then your expression again. As I mentioned with switch, it is possible to not cover all the cases like it is possible to say four possible values for a specific variable. Take for example you have addition, subtraction, multiplication and division. If none of the conditions that you have set up with a match matches. Then you'll get a unhandled match error exception, it is still possible to have a default condition just like you have it switched out. But if you don't cover any of the conditions or default, then you will get an exception. Anything that is actually still the same between switch and a new match. Well yes, any implementation each condition is still, even evaluated in order, meaning that if the first condition doesn't match it start evaluating the second condition or the third condition and so on, and so on. Also, just like switch again, if all the conditions are either numeric values or strings, PHP's engine will construct a jump table. That was introduced somewhere in PHP seven three I think, that automatically jumps to the right case. Switch, all the conditions have to be either integer numbers like 01234 or all strings, like foo, like bar, like baz, or so on and so on. The match statement, actually, it's a bit more flexible here because it can construct one jump table, if all the conditions are numbers or strings. It doesn't matter that are all numbers or all strings. If there are all numbers or integer numbers or strings, then it can construct this jump table. That doesn't work with switch because switch's type coercion and match doesn't and the internal implementation already support like this hashmap which is pretty much an array that supports integer array keys as well as associative array keys. But because for match the, the matching happens independent on the type is actually ends up working, so does actually works a little bit better, which is great. Derick Rethans 8:54 Where there, any other additions that were considered to add to the new match keyword? Derick Rethans 8:58 Well, there were a few things. There was a bit of discussion about blocks, meaning multiple statements to run for each condition. In the end it didn't become part of the RFC, perhaps because it made it a lot more complicated, or perhaps because it was really important to think that functionality through and also at the same time, think about what that does for the short array functions which also, just like match, only support one specific statement at the moment. There were some thoughts about adding pattern matching to the condition just like Perl does a little bit, where yeah like with regular expressions for example, but is also really difficult subject and lots of considerations have to be taken into account so that was also dropped from this RFC. The last one was a quick syntax tweak, which allow you to omit the variable name for a match expression. If you end up matching only against like expressions, like a larger than 42, or b smaller than 12, then it doesn't necessarily matter what you have behind match; the variable name there doesn't matter. So, well the trick that people already use with switch is, is to use switch (true). And with match you can also use match (true), and the addition that was suggested to do here was to be able to not have the true there at all. So, the match would only work on the conditions and not try to match these against a variable, but that also didn't become part of this current RFC. Derick Rethans 10:31 Are there any backward compatibility breaks? Derick Rethans 10:34 Well beyond match being a new keyword, there are none. But because match is a new keyword that we're introducing, it means it can't be used as a full namespace name, a class name, a function name, or a global constant. It shouldn't really be much of a surprise that PHP just can introduce these keywords, and that ends up breaking some code. I haven't looked at any analysis about how much code is actually going to break, but it is possible that it does actually do some. But also PHP also top level namespace so if you had a class name called match, you should have put it in your own namespace. And in PHP eight with Nikita's namespace token names RFC, as long as match is on its own, then your namespace name, you'd still be able to use it now, as part of a namespace name, which isn't possible or wouldn't have been possible with PHP seven four. Derick Rethans 11:33 Okay. What was the reception of this RFC? Derick Rethans 11:37 Initially, there was quite a little bit of going back and forth about especially the pattern matching or a few other things. But in the end were to slightly reduce scope of the RFC, it actually ended up passing very well with 43 votes and two votes against, which means it's now part of PHP eight. In the last week or so we did find a few bugs. Some crashes. But, Ilija the author of both RFC and the implementation is working on these to get those fixed, so I'm pretty sure we'll all quite ready for PHP eight with the new match expression. Derick Rethans 12:12 Thanks Derick for explaining this new match expression. It was a bit weird to interview myself, but I hope it turned out to be fun enough and not too weird. Derick Rethans 12:21 Thanks for having me Derick. Derick Rethans 12:23 This is going to be the last episode for a while, as PHP 8's feature freeze is now in effect, and no new RFCs are currently being proposed. Although I'm pretty sure they are being worked on. There's one exception to the feature freeze period, which is the short attribute syntax change RFC, which which I'm collaborating on the Benjamin Eberlei, whether that will turn into yet another episode about attributes, we'll have to see. For the PHP eight celebrations, I'm hoping to make two compilation episodes again, as it did last season with Episode 36 and 37. For the PHP eight celebrations episodes, I am again looking for a few audio snippets from you, the audience. I'm looking for a short introduction, with no commercial messages, please. After your introduction, then state which new PHP eight feature you're looking most forwards to, or perhaps another short anecdote about a new PHP eight feature. Please keep it under five minutes. With your audio snippet, feel free to email me links to your Twitter, blog, etc. The email address is in the closing section of each of the episodes. Here's an example of what I'm looking for. Derick Rethans 13:35 Hi, I'm Derick, and I host PHP internals news. I am the author of Xdebug and I'm currently working on Xdebug cloud. My favourite new feature in PHP eight are the additions to the type system, but union types and a mixed type continue to strengthen PHP's typing system. We've grown up from simple type hints to real proper types, following the additional property types and contract covariance and PHP seven four. I'm looking forward to PHP's type system to be even stronger, perhaps, with generics in the future. Derick Rethans 14:07 Please record us as a lossless compressed file, preferably FLAC, FLAC, recorded at 44,100 hertz. And if you save them bit of 24 bits that. If you want to make a WAV file or a WAV file that's fine too. Please make them available for me to download on a website somewhere and email me if you have made one. Derick Rethans 14:33 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Match expression Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 66: Namespace Token, and Parsing PHP London, UK Thursday, August 13th 2020, 09:29 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Namespaced Names as a Single Token, and Parsing PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 66. Today I'm talking with Nikita Popov about an RFC that he's made, called namespace names as token. Hello Nikita, how are you this morning? Nikita 0:35 I'm fine Derick, how are you? Derick Rethans 0:38 I'm good as well, it's really warm here two moments and only getting hotter and so. Nikita 0:44 Same here. Same here. Derick Rethans 0:46 Yeah, all over Europe, I suppose. Anyway, let's get chatting about the RFC otherwise we end up chatting about the weather for the whole 20 minutes. What is the problem that is RFC is trying to solve? Nikita 0:58 So this RFC tries to solve two problems. One is the original one, and the other one turned up a bit later. So I'll start with the original one. The problem is that PHP has a fairly large number of different reserved keyword, things like class, like function, like const, and these reserved keywords, cannot be used as identifiers, so you cannot have a class that has the name list. Because list is also a keyword that's used for array destructuring. We have some relaxed rules in some places, which were introduced in PHP 7.0 I think. For example, you can have a method name, that's a reserved keyword, so you can have a method called list. Because we can always disambiguate, this versus the like real list keyword. But there are places where you cannot use keywords and one of those is inside namespace names. So to give a specific example of code that broke, and that was my, my own code. So, I think with PHP 7.4, we introduced the arrow functions with the fn keyword to work around various parsing issues. And I have a library called Iter, which provides various generator based iteration functions. And this library has a namespace Iterfn. So Iter backslash fn. Because it has this fn keyword as part of the name, this library breaks in PHP 7.4. But the thing is that this is not truly unnecessary breakage. Because if we just write Iter backslash fn, there is no real ambiguity there. The only thing this can be is a namespace name, and similarly if you import this namespace then the way you actually call the functions is using something like fn backslash operator. Now once again you have fn directly before a backslash so there is once again no real ambiguity. Where the actual ambiguity comes from is that we don't treat namespace names as a single unit. Instead, we really do treat them as the separate parts. First one identifier than the backslash, then other identifier, then the backslash, and so on. And this means that our reserved keyword restrictions apply to each individual part. So the whole thing. The original idea behind this proposal was actually to go quite a bit further. The proposal is that instead of treating all of these segments of the name separately, we treat it as a single unit, as a single token. And that also means that it's okay to use reserved keywords inside it. As long as like the whole name, taken together is not a reserved keyword. The idea of the RFC was to, like, reduce the impact of additional reserved keywords, introduced in the future so in PHP eight we added the match keyword, which can cause similar issues, and in PHP 8.1 maybe we're going to add an enum keyword, and so on. Each time we add a keyword we're breaking code. The idea of this RFC was to reduce the breakage, and the original proposal, not just allowed these reserved keywords inside namespace names, but also removed the restrictions we have on class names and function names and so on. So you would actually be able to do something like class Match. Then, if you wanted to use the class, you would have to properly disambiguate it. For example, by using a fully qualified name, starting with a backslash or by well or using any other kind of qualified name. So you wouldn't be able to use an isolated match, but you could use backslash match, or some kind of things they just named backslash match. However, in the end, I dropped support for this part of the RFC, because there were some concerns that would be confusing. For example if you write something like isset x, that would call the isset on built in language construct. And if you wrote backslash isset x, that would call a user defined, isset function, because we no longer have this reserved keyword restriction. While this is like an ambiguous from a language perspective, the programmer might get somewhat confused. So if we want to go in this direction we probably want to add more validation about which reserved keywords are allowed in certain positions and I didn't want to deal with that at this point. Derick Rethans 5:33 Also by removing it, you make the RFC smaller which gives usually a better chance for getting them accepted anyway. Nikita 5:40 Yes. Derick Rethans 5:41 You mentioned in the introduction that originally you tried to solve the problem of not being able to use reserved keywords in namespace names. It ended up solving another problem that at the moment you wrote this, you wasn't quite aware of. What is this other problem? Nikita 5:57 The other problem is the famous or infamous issue of the attributes syntax, which we are having a hard time solving. The backstory there is that originally the attributes introduced the PHP eight use double angle brackets, or shift operators as delimiters. Derick Rethans 6:18 I've been calling it the pointy one. Nikita 6:20 Okay, the pointy one. And then there was a proposal accepted to instead change this to double at operator at the start, because that's kind of more similar to what other languages do; they use one @, we will use two @s on to avoid the ambiguity with the error suppression operator. Unfortunately, it turned out that as initially proposed the syntax is ambiguous. All because of quite an edge case, I would say. So attributes in PHP are going to be allowed on parameters, and then parameters can have a type. You can have a sequence where you have @@, then the attribute name, then the type name, then the parameter name. And the problem is that because we treat each part of the name separately, between the backslashes, there can also be whitespace in between. So you can have something like a, whitespace, backslash, whitespace, b. Now the question is, in this case does the ab belong to the attribute so is it an attribute name, or is only the a an attribute name, and b is the type name of the parameter. Yeah that's ambiguous and we can't really resolve it unless we have some kind of arbitrary rule, and while the original proposal did introduce an arbitrary rule that the attribute name cannot contain whitespace, it will be interpreted in a particular way. And what this proposal, effectively does is to instead say that names, generally cannot contain whitespace. Derick Rethans 8:01 Your namespace names as token RFC basically disallows spaces for namespace names which you currently can have? Nikita 8:10 Right. So I don't think anyone intentionally uses whitespace inside namespace names. Or actually, you could even have comments inside them. But it is currently technically allowed, and one might introduce it as a typo. That means that this change is a backwards compatibility break. Because you can have currently whitespace in names, but based on static analysis of open source projects, we found that this is pretty rare. So I think we found something like five occurrences inside the top thousand packages. There is one other backwards compatibility break that is in practice I think much more serious. And that's the fact that it changes our token representation. So instead of representing just names as a string separates a string. We now have three different tokens, which is the one for qualified names, one for fully qualified names. So with everything backslash and one for namespace relative names, which is if you have namespace backslash at the start, and namespace I mean, here literally. So this is a very rarely used on PHP feature. Derick Rethans 9:26 I did not know it existed. Nikita 9:28 I also actually did some analysis for that I found very few uses in the wild. I think mostly people writing the static analysis tools know about that, no one else. But the other problem is that this breaks static analysis tools because they now have to deal with a new token representation. Derick Rethans 9:47 We have been talking about these tokens. What are tokens on like the smallest level. Nikita 9:51 PHP has, what's a three phase compilation process where first we convert the raw source code, which is just a sequence of characters into tokens, which are slightly larger semantic units. So instead of having only characters we recognize reserved keywords and recognize identifiers and operators, and so on. Then on the second phase, we have the parser which converts these individual tokens into larger semantic structures like addition expression, or an assignment expression or class expression and so on. Finally we convert the result of that which is a parse tree or an abstract syntax tree into our actual virtual machine instructions, our bytecode representation. Derick Rethans 10:41 And then PHP eight down to machine code, if the JIT kicks in. I remember from a long time ago when I was in uni, is that there are different kinds of parsers and from what I understand PHP's parser is a context free parser. What is is a context free parser? Nikita 10:57 Well a context free in particular is a term from the CompSci theory, where we separate parsers into four languages, into four levels, though I think this is not a super useful distinction, when it comes to language parsing. Because general context free parser has complexity of O(n^3). Like if you have a very large file, it will take very long to parse it could be with the general context free parser. So what we'll be do in practice are more like deterministic context free languages, which is a smaller set. The formal definition there are a set that can be parsed by deterministic push down automaton but I don't think you want to go there. Derick Rethans 11:41 No, not today. Nikita 11:44 Because in practice actually what we do in PHP is even more limited than that. So the parser we use in PHP is called LA/LR1 parser. So that's a look ahead, left to right, right most derivation parser Derick Rethans 11:59 Also mouthful, but it's simpler to explain. Nikita 12:02 I think really the most relevant parts of this algorithm to us is the "1". What the one means is that we can only use a single look ahead token. When we recognize some kind of structure, we have to be able to recognize it by looking at the current token, and that the next token. At each parsing step, those two things have to be sufficient to recognize and to disambiguate syntax. Actually even more restrictive than that but I think that's a good mental model. This is really, I think the core problem we often have when introducing new syntax, so this is the problem we had with the short arrow syntax, short closure syntax. There we would need not one token of look ahead but an infinite potentially in finite number of look ahead. Derick Rethans 12:55 In case the function keyword is still being used. Nikita 12:59 Yes, that's why we added the fn keyword to what the problem because with the fn keyword we can immediately say at the start of the start of the arrow function that this is an arrow function. And we will have similar problems if we introduce generics in the future, at which point we might actually just give up and switch to a more powerful parser. The parser generator we use, which is bison, has two modes. One is the LA/LR mode. And the other one is the GLR mode. They both use the same base parsing algorithm at first, but the GLR mode allows forking it. So if it encounters an ambiguity, it will actually like try to pursue both possibilities, which is of course a lot less efficient, but it allows us to parse more ambiguous structures. Derick Rethans 13:50 Wouldn't that not cause a problem that has some cases because every token will be ambiguous, it will explode in like an exponential way? Nikita 13:57 Yes, I mean it's worst case exponential. But if you like have a careful choice of where you are ambiguous, then you can see that okay with this particular ambiguity, you can actually get worse still linear for valid code. Derick Rethans 14:14 That makes a decision between two possibilities pretty much at that stage. Nikita 14:18 But I think the danger there is that one might not notice when one introduces ambiguities, or maybe not real syntax ambiguities, you do tend to notice those. But ambiguities on the parser level where the parser cannot distinguish possibilities based on the single token for get. Derick Rethans 14:41 Was there in the past being ambiguities introducing a PHP language, that we fail to see? Nikita 14:46 I don't think so. I mean, because we use this parser generator. It tells us if we have ambiguities, so either in the form of shift/reduce or reduce/reduce conflicts. So we cannot really introduce ambiguities. We can it's possible to suppress them, and we actually did have a couple of suppressed conflicts in the past, this was for one particular very well known ambiguity, which is the dangling elseif/else. Basically that the else part of an ifelse. So if you have a nested if without braces. Then there is an else afterwards, the else could ever belong to the inner if or to the outer if, but this is like a standard ambiguity in all C like languages that allow omitting two braces. Derick Rethans 15:35 That's why coding standard say: Always use braces. The patch belonging to this RFC has been merged. So it means that in PHP eight we'll have the longer or the new token names. Do you think that in the future we'll have to make similar choices or similar adjustments to be able to add more syntax? Nikita 15:56 Yes. As I was already saying before, I think these syntax conflicts, they crop up pretty often. This is like very much not an isolated occurrence. If it's really fundamental. If there are any fundamental problems in the PHP syntax where we made bad choices. I think it's pretty normal that you do run into conflicts at some point. So, especially when it comes to generics using angle brackets, in pretty much all languages I deal with my major parsing issues when it comes to that. For example Rust has the famous turbo fish operator, where you have to like write something like colon colon angle brackets to disambiguate things in this specific situation. Derick Rethans 16:49 I just listened to a podcast on the Go language where they're also talking about adding generics, and they had the exact same issue with parsing it. I think they ended up going for the square bracket syntax instead of the angle brackets, or the pointies, as I mentioned. Nikita 17:04 Everyone uses the pointy brackets syntax, despite all those parsing issues because everyone else use it. For PHP for example, we wouldn't be able to use square brackets, because those are used for array access, and that would be completely ambiguous, but we could use curly braces. Well, people have told me that they would not appreciate that. Let's say it like this. Derick Rethans 17:31 Is it's finally time to start using the Unicode characters for syntax? No! Nikita 17:38 Although I did see that, apparently, some people use non breaking spaces inside their test names, because it looks nice and is pretty crazy. Derick Rethans 17:49 I like using it in my class names as well instead of using study caps. Nikita 17:52 I hope that's a joke, you can ever tell. Derick Rethans 17:54 I like using emojis instead. That's also a joke, but I do use it in my presentations and my slides just to jazz them up a little bit. Nikita 18:01 This is an advantage of PHP, because our Unicode support in identifiers works because the files are in UTF-8. That means that all the non ASCII characters have the top bit set, and we just allow all characters with a top bit set as identifiers. And that means that there is no validation at all that identifiers used contain only like identifier or letter characters, so you can use all of the emojis and whitespace characters and so on, you won't have any restrictions. Derick Rethans 18:36 And that was possible for as long as PHP 3 as far as I know. It's a curious thing because this is something that popped up quite a lot when we're discussing, or arguing, about Unicode. You shouldn't allow Unicode because then you can have funny characters in your function names like accented characters and stuff like that, and then also always fun to point out that yeah you could have done that since 1997. Anyhow, would you have anything else to add? Nikita 19:01 I don't think so. Derick Rethans 19:02 Well thank you very much for having a chat with me this morning. And I'm sure we'll see each other at some other point in the future. Nikita 19:09 Thanks for having me once again Derick. Derick Rethans 19:12 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Treat namespaced names as single token GLR Parser LALR(1) Parser Iter Library RFC: Match expression Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 65: Null safe operator London, UK Thursday, August 6th 2020, 09:28 BST In this episode of "PHP Internals News" I chat with Dan Ackroyd (Twitter, GitHub) about the Null Safe Operator RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 65. Today I'm talking with Dan Ackroyd about an RFC that he's been working on together with Ilija Tovilo. Hello, Dan, would you please introduce yourself? Dan Ackroyd 0:37 Hi Derick, my name is Daniel, I'm the maintainer of the imagick extension, and I occasionally help other people with RFCs. Derick Rethans 0:45 And in this case, you helped out Ilija with the null safe operator RFC. Dan Ackroyd 0:50 It's an idea that's been raised on internals before but has never had a very strong RFC written for it. Ilija did the technical implementation, and I helped him write the words for the RFC to persuade other people that it was a good idea. Derick Rethans 1:04 Ilija declined to be talking to me. Dan Ackroyd 1:06 He sounds very wise. Derick Rethans 1:08 Let's have a chat about this RFC. What is the null safe operator? Dan Ackroyd 1:13 Imagine you've got a variable that's either going to be an object or it could be null. The variable is an object, you're going to want to call a method on it, which obviously if it's null, then you can't call a method on it, because it gives an error. Instead, what the null safe operator allows you to do is to handle those two different cases in a single line, rather than having to wrap everything with if statements to handle the possibility that it's just null. The way it does this is through a thing called short circuiting, so instead of evaluating whole expression. As soon as use the null safe operator, and when the left hand side of the operator is null, everything can get short circuited, or just evaluates to null instead. Derick Rethans 1:53 So it is a way of being able to call a methods. A null variable that can also represent an object and then not crash out with a fatal error Dan Ackroyd 2:02 That's what you want is, if the variable is null, it does nothing. If a variable was the object, it calls method. This one of the cases where there's only two sensible things to do, having to write code to handle the two individual cases all the time just gets a bit tedious to write the same code all the time. Derick Rethans 2:20 Especially when you have lots of nested calls I suppose. Dan Ackroyd 2:25 That's right. It doesn't happen too often with code, but sometimes when you're using somebody else's API, where you're getting structured data back like in an in a tree, it's quite possible that you have the first object that might be null, it's not null, it's going to point to another object, and the object could be null so and so so down the tree of the structure of the data. It gets quite tedious, just wrapping each of those possible null variables with a if not null. Derick Rethans 2:55 The RFC as an interesting example of showing that as well. Is this the main problem that this syntax or this feature is going to solve? Dan Ackroyd 3:03 That's main thing, I think there's two different ways of looking at it. One is less code to write, which is a thing that people complain about in PHP sometimes it being slightly more verbose than other languages. The thing for me is that it's makes the code much easier to reason about. If you've got a block of code that's like the example in the RFC text. If you want to figure out what that code's doing you have to go through it line by line and figure out: is this code doing something very boring of just checking null, or is it doing something slightly more interesting and, like maybe slipping in a default value, somewhere in the middle, or other possible things. Compare the code from the RFC, to the equivalent version using the null safe operator, it's a lot easier to read for me. And you can just look at it. Just see at a glance, there's nothing interesting in this code going on. All is doing is handling cases where some of the values may be null, because otherwise you can just look at the right hand side of the chain of operators, see that it's either going to be returning null, or the thing on the right hand side. So for me, it makes code a lot easier to reason about something. I really appreciate in new features that languages acquire. Derick Rethans 4:17 From what I can see it reduces the mental overhead quite a bit. As you say, the full expression is either going to be the null, or the return type of the last method that you call. Dan Ackroyd 4:29 There's nothing of value in between. So all of that extra words is wasted effort both writing it, and reading it. Derick Rethans 4:37 Okay, you mentioned short circuiting already, which is a word I stumble over and had to practice a few times. What is short circuiting? Dan Ackroyd 4:45 Very simple way of explaining it is this is similar to an electronic circuit when it's short circuited the rest of the circuit stops operating. It just the signal stop just returns back to earth. The null safe short circuiting, it means that when a null is encountered the rest of the chain of the code is short circuited and just isn't executed at all. Derick Rethans 5:08 Okay, what you're saying is that if you have a variable containing an object or working objects, you call a method that returns a null. And then, because it's null it won't call the next method on there any more. Dan Ackroyd 5:20 Yes, it won't execute the rest of the chain of everything after a short circuit, takes place. Derick Rethans 5:27 The RFC describes multiple way of short circuiting in, to be more precise, it's talks about three different kinds of short circuiting. Which ones are these and which ones have been looked at? Dan Ackroyd 5:37 $obj = null; $obj?->foo(bar())->baz(); There's apparently three different ways to do short circuiting. And the way this has been implemented in PHP is the full short circuiting. One of the alternative ways was short circuiting for neither method arguments, nor chained method calls. Imagine you've got some code that is object method call foo. And inside foo there's a function called bar and then after the method call to foo, there's a another method call to baz. First way that some other languages have is short circuiting when basically no short circuiting. So, both the function call would be called, and also then the method call baz, would also be called. And this option is quite bad for both those two details, having the function call happen when the result of that function call is just gonna be thrown away is pretty surprising. It just doesn't seem that sensible option to me, and even worse than that, the chaining of the method call as is pretty terrible. It means that the first method call to foo could never return null, because the, there's no short circuiting happening. Each step then need to artificially put short circuiting after the method call to foo, which is a huge surprise to me. This option of not short circuiting either method, arguments, or chained method calls seems quite bad. Derick Rethans 7:04 If one of the methods set wouldn't have been called normally because the objects being called is null, and the arguments to that function are, could be functions that could provide side effects right. You don't know what's the bar function call is going to do here for example. Dan Ackroyd 7:18 It just makes the whole code very difficult to reason about and quite dangerous to use short circuiting in those languages. Derick Rethans 7:25 It's almost equivalent that if you have a logical OR operation right. If the first thing is through evaluates to true and you do the OR the thing behind the OR it's never going to be executed, it's a similar thing here and I suppose, except the opposite. Dan Ackroyd 7:40 It's very similar in that people are used to how short circuiting works in OR statements. And for me, similar sort of short circuiting behaviour needs to happen for null safe operator as well for it to be not surprising to everybody. Derick Rethans 7:55 This is the first option short circuiting never for neither method arguments, for chained methods calls. What was the second one? Dan Ackroyd 8:02 So the second one is to short circuit for method arguments but not for chaining method calls. This scenario, the call to bar wouldn't take place, but the call to the subsequent method call of baz would still take place. This is slightly less bad, but still, in my opinion, not as good as for short circuiting, again because even if the method call foo should never return null, cause the null propagates through the chain in the expression, then have to artificially use a another null safe operator to avoid having a can't call methods on parent. Derick Rethans 8:41 And then the third one, which is the short circuiting for both method arguments and chained method calls. Dan Ackroyd 8:47 That's the option has been implemented in PHP. This is the one that is most sensible, in my opinion. Soon as the short circuit occurs, everything in the rest of the chain of operators that applies to that objects, get short circuited. To me is the one that is least surprising and the one that everyone's going to expect for it to work in that way. Derick Rethans 9:08 So I've actually I have a question looking at the code that you've prepared here where it says: object, question mark, arrow, foo, which is the syntax. We didn't mention the syntax yet. So it is object, question mark, arrow, foo. And the question mark being the addition here. Would it make sense that after foo, instead of using just the article baz to use question mark baz. It'd be a reason why you want to do that? Dan Ackroyd 9:33 There are only for languages that don't do full short circuiting. For the languages that don't do full short circuiting and the null makes its way through to then have baz called on it, you have to use another null safe operator in there, just to avoid another error condition happening. Derick Rethans 9:51 Very well. Which other languages, actually have full short circuiting? Dan Ackroyd 9:55 So the languages that have full short circuiting are C sharp, JavaScript, Swift, and TypeScript, and the languages that don't have for short circuiting are Kotlin, Ruby, Dart, and hack. I'm not an expert on those languages but having a quick look around the internet, it does seem to be that people who try to use null safe operator in the languages that don't implement full short circuiting are not enjoying the experience so much. To me it appears to be a mistake in those languages. I don't know exactly why they made that choice to imagine that is more a technical hurdle, rather than a deliberate choice of this is the better way. It does appear that implementing the full short circuiting is quite significantly more difficult than doing the other option, other types of short circuiting, just because the amount of stuff that needs to be skipped to the full short circuiting, so you've got to imagine that they thought it was going to be an acceptable trade off between technical difficulty and implementation. The, I think that's probably going to be useful enough. To me it just doesn't seem to be that great of a trade off. Derick Rethans 11:07 Short circuit circuiting happens when you use the null safe operator. So the null safe operator that syntax is question mark arrow. You mentioned that there is a chain of operators, what kind of operators are part of this chain or what is this chain specifically? Dan Ackroyd 11:21 So the null safe operator will by, and short circuit, a chain of operators, and it will consider any operators that look inside or appear in side and object to be part of a chain, and operators that don't look inside an object to be not part of that chain. So the ones that appear inside an object are property access, so arrow, null safe property access. So, question mark arrow. Static property access, double colon. Method call, null safe method call, static method call, and array access also. Other operators that don't look inside the object would be things like string concatenation or math operators, because they're operating on values rather than directly like pairing inside the object to either read a property or do mythical, they aren't part of the chain. They'll still be applied, and they will be part of the chain that gets short circuited Derick Rethans 12:17 Which specific chain is used as an argument in a method called as part of another, what sort of happens here? Dan Ackroyd 12:27 This is at the limit of my understanding, but the chains don't combine or anything crazy like that. It's only in the object type operators here inside the objects that might be null. That will continue a chain on a chain of operators is then used as a argument to another method call or function call. That's the end of that chain, and they'd be two separate chains, so for me there's no surprising behaviour around the result of a non safe operator being used as an argument to another function call, or another method call, which might have a separate null safe operator on the outside, those two things are independent. They don't affect each other. Derick Rethans 13:09 Yeah, I think that seems like a logical way of approaching this. Otherwise, I expect that the implementation will be quite a bit trickier as well. Dan Ackroyd 13:17 This is actually something I consider quite deeply when people come up with RFCs is, how would I explain this to a junior developer. If the answer is: I would struggle to explain this to a junior developer. That probably means that, one I don't understand the idea well enough myself, or possibly that the idea is just a bit too complicated for it to be used safely in projects. I mean, there's a difference between stuff being possible to use safely. And we've got things in PHP that are possible to use safely but aren't always going to be used safely, like eval and other questions which can be used quite dangerously, and the general rule for a junior developer would be: You're not allowed to use eval in your code, you need to have a deep understanding of how it can be abused by people. But something for like the null safe operator. It's got to work in a way that's got to be safe for everyone to use without having a really deep understanding of what the details are of its implementation. Derick Rethans 14:14 That makes a lot of sense yes. The RFC talks about a few possibilities that are not possible to do with a null safe operators. What are these things it talks about? Dan Ackroyd 14:25 Until right before the RFC went for a vote. There was, as part of the RFC there was quite a bit more complexity. And it was possible to use null safe in the write context. So you could do something like null safe operator bar equals foo, effectively assigning the string foo something that might or might not be null. I only learnt this recently. You can also foreach over an array into the property of an object, which I've never seen before in my 20 odd years writing PHP. It would be possible to use the null safe operator in there. And you'd be foreaching over an array into either a variable or into null, which is not a variable. The RFC was trying to handle this case, these cases are generally referred to as write contexts, as opposed to just read contexts where the value is being read. The RFC has a lot of work went into supporting the null safe operator in these write contexts. But luckily, somebody pointed out that this was just generally a rubbish idea, and hugely complex. It has a problem that you just fundamentally can't assign a variable or a value to a possibly non existent container. It just doesn't make any sense. So, a couple of days before it went to a vote, Ilija just asked the question, why don't we just restrict it to read contexts. That's where the value of the RFC is, making it easier to write code that's either going to read a value or call a method on something that might or might not be null. All of this stuff around write context was just added complexity that doesn't really deliver that much value. It's nothing to do with the using null safe operator in write context was removed from the RFC, which made it a whole lot simpler. It made it a lot less likely that people would like to code that doesn't do what they expected. Derick Rethans 16:17 I also think it would be a lot less likely to have been accepted in that case, as it stands the vote for this feature was overwhelmingly in favour. Did you think it was going to be so widely accepted or did you think the vote was going to be closer? Dan Ackroyd 16:31 So I thought it was gonna be a lot closer. There are quite a few conversations on the internet, where people raise a point that one I don't fully agree with it was a valid point to make. They were concerned that people use null safe operator in places were having null, rather than an object might represent an error. And if people are just using the null safe operator to effectively paper over this error in their code, then it would make figuring out where the null came from, and what error had occurred to cause it to be there. I think the answer to that is this feature isn't appropriate use in those scenarios. If a variable being null, instead of an object represents an error in your code, then you shouldn't be using this null safe operator to skip over that error condition. You need to not use it and watch a bit more code that explicitly defines, or checks that error, handles it more appropriately, rather than just blindly using this feature, without thinking if they pick your use case. Derick Rethans 17:30 Or of course thrown exception, which is what traditionally is done for this kind of error situations right? Dan Ackroyd 17:35 be a thing so it shouldn't be null, but there's very small chance it is, but only is in situations where you're not aware and throwing an exception so you can error out, and then debug it is the correct thing to do, rather than just silently have errors in your application. Derick Rethans 17:50 Would you have anything else that to the null safe operator? Dan Ackroyd 17:52 Not really. Except say it's quite interesting that quite a few of the new features for PHP eight are, don't technically allow anything new to be done, there just remove quite a bit of boilerplate. For me, it'll be interesting to see the reaction to that from the community because this is something people have criticized PHP for for quite a long time, they're being very verbose language, particularly compared to TypeScript, where a lot of very basic creating objects can be done in very few lines of code. So between the null safe operator, and the object property promotion, that's for some projects, which use a lot of value types, or data from other services where you don't have control over how it's structured, I think these two features are going to remove a lot of boilerplate code. So I think this might improve people's experience of PHP quite dramatically. Derick Rethans 18:40 That ties back into this sort of idea that Larry Garfield has with his object ergonomics, right, especially with the data values that the document was referring to mostly. Dan Ackroyd 18:50 Yeah. Derick Rethans 18:51 I've another question for you, the last one. Which is: What's your favourite new feature in PHP eight? Dan Ackroyd 18:56 Generally the improvements to the type systems are going to cheat by giving two answers. Union types. This will actually be really nice for the imagick extension just because a huge number of the methods in there, accept either string or an imagick pixel object which represents colour internally to the imagick library. Have been able to have correct types on all the methods that currently don't have the right, don't have the correct type information, will be very, very nice. Doesn't make anything new possible. It just makes it easier to reason about the code, so it's easier for tools like PHP Stan to have the correct type of information available rather than having to look elsewhere for the correct type information. Again the mixed types, is very small improvements to the type system in PHP, but it's another piece that helps complete the puzzle of the pulling out the type system to make it be closer to being a complete a type system that can be used everywhere, rather than having to have magic happening in the language. Derick Rethans 20:02 It also will result in more descriptive code right because all the information that you as well as PHP Stan need to have to understand what this is saying or what I was talking about. It's all right there in the code now. Dan Ackroyd 20:15 Yeah, and having the information about what code's doing in the code, rather than in people's heads, makes it easier for compilers, and static analysers to do their jobs. Derick Rethans 20:26 Thank you, Dan for taking the time this afternoon to talk to me about the null safe operator. Dan Ackroyd 20:31 No worries Derick, very nice to talk to you as always. Derick Rethans 20:35 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Null Safe Operator Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 64: More About Attributes London, UK Thursday, July 30th 2020, 09:27 BST In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about a few RFCs related to Attributes. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 64. Today I'm talking with Benjamin Eberlei, about a bunch of RFCs related to Attributes. Hello Benjamin, how are you this morning? Benjamin Eberlei 0:36 I'm fine. I'm very well actually yeah. The weather's great. Derick Rethans 0:39 I can see that behind you. Of course, if you listen to this podcast, you can't see the bright sun behind Benjamin, or behind me, by the way. In any case, we don't want to talk about the weather today, we want to talk about a bunch of RFCs related to attributes. We'll start with one of the RFCs that Benjamin proposed or actually, one of them that he has proposed and one of them that he's put into discussion. The first one is called attribute amendments. Benjamin Eberlei 1:05 Yes. Derick Rethans 1:06 What is attribute amendments about? Benjamin Eberlei 1:08 So the initial attributes RFC, and we talked about this a few months ago was accepted, and there were a few things that we didn't add because it was already a very large RFC, and the feature itself was already quite big. Seems to be that there's more sort of an appetite to go step by step and put additional things, in additional RFCs. So we had for, for I think about three or four different topics that we wanted to talk about, and maybe amend to the original RFC. And this was sort of a grouping of all those four things that I wanted to propose and change and Martin my co author of the RFC and I worked on them and proposed them. Derick Rethans 1:55 What are the four things that your new RFC was proposing? Benjamin Eberlei 1:59 Yes, so the first one was renaming Attribute class. So, the class that is used to mark an attribute from PHP attributes to just attribute. I guess we go into detail in a few seconds but I just list them. The second one is an alternative syntax to group attributes and you're safe a little bit on the characters to type and allowed to group them. And the third was a way to validate which declarations, an attribute is allowed to be set on, and the force was a way to configure if an attribute is allowed to be declared once or multiple times on one declaration. Derick Rethans 2:45 Okay, so let's start with the first one which is renaming the class to Attribute. Benjamin Eberlei 2:51 Yeah, so in the initial RFC, there was already a lot of discussion about how should this attribute class be caught. Around that time that PhpToken RFC was also accepted, and so I sort of chose this out of a compromise to use PhpAttribute, because it is guaranteed to be a unique name that nobody would ever have used before. There's also, there was also a lot of talk about putting an RFC, to vote, about namespacing in PHP, so namespacing internal classes and everything. So I wanted to keep this open at the at the time and not make it a contentious decision. After the Attributes RFC was accepted, then namespace policy RFC, I think it was called something like that, was rejected, so it was clear that the core contributors didn't want a PHP namespace and internal classes to be namespaced, which means that the global namespace is the PHP namespace. And then there was an immediate discussion if we should rename at PhpAttribute to Attribute, because the prefix actually doesn't make any sense to have there. So the question was, should we introduce this BC break potentially, because it's much more likely that somewhere out there has a class Attribute in the global namespace. And I think the discussion, also one, one person mentioned that they have it in their open source project. Suppose sort of the the yeah should we rename it and introduce this potential BC break to, to users that they can't use the class Attribute and the global namespace anymore. Derick Rethans 4:25 So that bit passed? Benjamin Eberlei 4:26 That bit passed with a large margin so everybody agreed that we should rename it to Attribute. I guess consistent with the decision that the PHP namespace is the global namespace. It's sort of documented in a way. Everybody knows that we have the namespace feature for over 10 years now in PHP or oma, yeah exactly 10 years I guess. So by now, userland code is mostly namespaced I guess, and we can be a bit more aggressive about claiming symbols in the global namespace. Derick Rethans 4:54 Second one is grouping, or group statements in attributes, what's that? Benjamin Eberlei 4:59 The original syntax for the RFC, for attributes uses ... the ... we had this before. Remember? I can't spell, I don't know the HTML opening and closing signs. Derick Rethans 5:15 The lesser than and greater than signs. Benjamin Eberlei 5:17 Two lesser than signs and then two greater than signs, for the opening and closing of attributes. And this is quite verbose. So if you have multiple attributes on one declaration, it could be nice to allow to use the opening and closing brackets only once, and then have multiple attributes declared; so that was about the grouping. There was a sentence in there that should there be an RFC that sort of changes the syntax and that would become obsolete. It was also the most contentious vote of all the four because it almost did not pass but it closely passed. Derick Rethans 5:56 32-14 Yeah, yeah. So the group statements also made it in. You can now separate the arguments by comma. The third one is validate attribute target declarations. Benjamin Eberlei 6:07 That one makes a lot of sense to have from the beginning. What is does is, whenever you create a new attribute, you can configure that this attribute can only be used for example on a class declaration, or only on a method and function declaration, only on a property declaration. And so there are different targets, what we call them, where an attribute can be put on. This is something that will be required from the beginning. So, if you start using attributes, they will only work on certain declarations. Let's say we are configuring routes. So, the mapping of URL to controllers using attributes in a controller can only ever be a function, or a method. So it makes sense to require that the route attribute is only allowed on the function and the method. This should be a good improvement. Another example would be if you use attributes for ORM. So database configuration, then you would declare properties of classes to be columns in a database table, but it would never makes sense to declare the column attribute on a method, or on a class, or something like this. So it would be nice if the validation fails, or the usage of the attribute fails on these other declarations. Derick Rethans 7:25 When is this target checked? Benjamin Eberlei 7:27 The target checks are only done, I started referring to this as deferred validation, so that the validation is deferred to the last possible point in the usage of attributes. Which is, you call the method newInstance(), so give me an instance of this attribute on the ReflectionAttribute class. This has been a point of discussion because it's quite late, it will not fail at compile time, for example, even though it potentially could. The idea to do this at this late point is: we cannot always guarantee that an attribute is actually available when the code is compiled. For example if you use attributes of PHP extensions, the extension might be disabled. And also, you might have different attributes of different libraries on the same declaration. If we would validate them early it could potentially lead to problems. Well, sort of one usage of one attribute of one library breaks others. We decided to make this validation as late as possible from the language perspective. I already see that potentially static analysis tools like Psalm, PHP Stan and IDEs will probably put it in your face very early that you're using the attribute on the wrong declaration. And I guess I don't know static analysis tools in PHP have gotten extremely good and more and more people are starting to use them. And I really like the sort of approach that the language at some point can't be too strict. And then the static analysis tools, essentially, introduce an optional build step that people can decide to use where the validation is happening earlier. Derick Rethans 9:06 Where do you define what the targets for an attribute are? Benjamin Eberlei 9:10 This was also a lot of discussion. We picked the most simple approach; the Attribute class gets one new argument: flags, and it is a bit mask, and by default, an attribute is allowed to be declared on all declarations; classes, methods, properties, and so on. You can restrict the flags, the bitmask, to only those targets that you want to allow. So you would, let's say you have the route attribute we talked about before, you specify this as an attribute by putting an attribute at the class declaration. And then you pass it the additional flags and say only allowed using the bitmask for function method. I think it's just one constant, you use a constant to declare it's only allowed on a method or a function declaration. Derick Rethans 9:59 Can you for example set it a target is both valid for a method and a property? Benjamin Eberlei 10:02 You can, because it's a bit mask you can just use the OR operator to combine them, and allow them to be declared on both. Derick Rethans 10:10 It's also passed, so this is also part of PHP eight. And then the last one is the validate attribute repeatability Benjamin Eberlei 10:20 Repeatability essentially means you're allowed to use an attribute once, or multiple times on the same declaration. Coming back to the routing example, if you're having route attribute, then you could say this is allowed multiple times on the same function, because maybe different URLs lead to the same controller. But if you have a column property for example, you could argue, maybe it doesn't make sense that you define a column twice on the same property so it's mapped to two columns or something. So you could restrict that it's only allowed to be used once. This is also validated again deferred, so only when newInstance() called, it would throw an exception if it's sort of the second or third usage of an attribute but it's only allowed to be there once. Derick Rethans 11:06 What's the default? Benjamin Eberlei 11:07 The default is that the attributes are not allowed to be repeatable and you would, again using the flag arguments to the attribute class, put the bit mask in there that it's allowed to be repeatable. Derick Rethans 11:21 Is there anything else in that RFC. Benjamin Eberlei 11:23 No. Derick Rethans 11:25 The second one was actually something that came out of the first one that she did. The original RFC already mentioned that she could for example have attributes called deprecated, or JIT, or no JIT. I suppose this RFC fleshes out the deprecated attribute a little bit more? Benjamin Eberlei 11:40 Idea was to introduce an attribute "deprecated", which you can put on different kinds of declarations, which would essentially, replace the usage of trigger_error(), where you would say, e_user_deprecated and then say for example: this method is deprecated use another one instead. Benefit of having an attribute versus using trigger_error() is that static analysis tools have an easier way of finding them. It's both a way to make it more human readable because attributes are declarative, it's easy to see them and spot them. And also machine readable. Both developer and machine could easily see that this method, or this class is deprecated, and stop using it, or using the alternative instead. The way it was done at the moment, or until now, mostly you're using a PHP doc block @deprecated in there. Please understand that. You can also read it as human, of course, the problem is that it has no really no runtime effect. That means during development for example, you couldn't see that you're accidentally using deprecated methods and collect this information, or even on production for example, because it's only in a doc block. And with an attribute, it would be possible to have this information at runtime to collect it, aggregate this information so that you see, it's been deprecated code is being used, and also make it human readable and for the IDEs and everything. Derick Rethans 13:12 When I looked at this RFC, it was under discussion, are you intending to bring this up for a vote, before 8.0 because you don't have a lot of time now? Benjamin Eberlei 13:19 I'm not. The reason is that first, you say, I'm running out of time, and I'm a bit stressed at the moment. Anyways, and even then, there are two points that I wanted to tackle it first. I mentioned that you can use the deprecated attribute on methods and functions which is something people are using now. One thing that I saw would be a huge benefit which is not possible at the moment is to be deprecating the use of classes, vacating the use of properties and constants. This is something you cannot hook into at runtime at the moment. So you cannot really trigger it. Essentially has led to a lot of hacks specifically from the Symfony community because they are very very good about deprecations. So they are, they have a very very thorough deprecation process, and they have come up with a few workarounds to make it possible to trigger this kind of things that would like to make this a much better supported case from the language. The problem there is that it touches the error handling of PHP and the usage of the error handler for deprecation triggers is a contentious problem, because it triggers global error handlers as by this side effects, sometimes effects on the performance because it logs to a file. There was a lot of questions if we can improve this in some way or another, and I haven't found a good way for this yet and I hope to work on this for a month. Derick Rethans 14:47 Maybe 8.1 then. The third RFC that goes into attributes, is the shorter attribute syntax RFC. What's wrong with the original syntax, where you have lesser than, lesser than, attribute greater than, greater than? Benjamin Eberlei 15:03 The name implies, the verbosity is a problem, I would agree. So writing, writing the symbols lesser than or greater than twice, yeah, takes up at least four symbols. There's one problem with shorter attribute syntax would be nice. Problem I guess is that this syntax is sort of a unicorn. We do refer to HHVM as one language or Hack, in that case one language that uses it. Whatever the Hack has essentially no adoption and is not very widespread in usage, and since they have parted ways with PHP. They also have a pass to removing the syntax and they, I think plan to change the syntax to use the @ symbol instead of the lesser and greater. So one problem would be that we were like, language wise, we would be a unicorn by using a syntax that nobody else is using this also I guess a problem. And then there are some, some problems that focusing on future RFCs. Something that I don't see myself really advocating or wanting, is allowing to nest attributes, so one attribute can reference another attribute, so that you have a sort of an object graph of attributes stuck together. This is necessary if you want to configure, if you want to use attributes to use very detailed configuration. Not sure if it makes sense at some point. There's this trade off where you should go to XML, YAML, or JSON configuration instead off attribute based configuration. To confusion with potential future usages of PHP and existing usages, so: generics. PHP would ever get generics, in some way, it would use the lesser and greater than symbols, but only once. I cleared up with Nikita before that there's no conflict between so that that would not be a problem. However, if we had the symbol used, they would be near each other. So attributes would come first and then there would be the function declaration using generics and types. These symbols would be very near each other so it might introduce some confusion when reading the code. Then there's obviously the existing usage of this shift up operators so if you're doing shift operations with bits, then they are using exactly the same symbol, the lesser than, greater than symbol twice, to initiate this or declare this operation. Derick Rethans 16:18 I realized there's a few issues with the pointy pointy attribute pointy pointy, to give it a funnier name. Benjamin Eberlei 17:34 Pointy pointy is good. Derick Rethans 17:41 I don't think I've come up with a name for a specific spaceship that looks like this yet, but there were a few other proposals. The original RFC already had the @:, which was rejected in that RFC. There was another proposal made that uses the @@ syntax to do attributes. It this something used by all the languages. Benjamin Eberlei 18:01 Syntax was the most discussed thing about attributes, so everybody sort of agreed that we need them or they we benefit. Nobody could agree on the syntax. So there were tons of discussion everywhere about how ugly the pointy pointy syntax is, and because we're using the @ symbol and the doc blocks already you by we couldn't use the @ symbol for attributes themselves, so there are lots of discussions running in circles. Just using the @ symbol once it's just not possible because we have the error suppression operator. Then a lot of people always suggest okay let's get rid of that, because it's anyways it's bad practice to use it Derick Rethans 18:38 It's a bad practice but sometimes it's necessary to use it. Benjamin Eberlei 18:41 So there are some APIs in the core that cannot be used without them. For example, unlink, mkdir, fopen. Because of race conditions you have to use the symbol there in many cases. Even if we would deprecate and remove that sort of syntax we would still somehow need it in a way for these functions. So I'm not seeing that we will ever remove them, and even then it will take two cycles of major versions, so it doesn't make sense. That one is out. The pointy syntax was something that Dmitri proposed many years before and people were sort of in favour of it, so I didn't change it for the initial RFC. We looked a lot at different syntaxes but none of them were nice and didn't break BC. And then when the vote started some core contributors proposed that maybe we can do slight BC breaks in the way we have existing symbols and one proposal was to use @@, which means we are using the @ symbol twice. This is actually allowed in PHP at the moment, we consider breaking this not a big problem because you can just use it once and it has the same effect. Problem would only be going through the code and finding all @@ usages which we assumed are zero anyways. Derick Rethans 19:57 Is there any other language that uses @@? Benjamin Eberlei 19:59 A few other languages use just to @, like many other, like many people propose PHP should also do, which I explained I guess is impossible. It's only proposed because it's looking close enough to @, and so it wouldn't be a huge difference and sort of familiar in that way. Derick Rethans 20:18 There was another alternative syntax proposed which is #[ attribute ]. Where does this come from? Benjamin Eberlei 20:25 So the @@ syntax is different to the pointy the syntax that doesn't have a closing symbol, it only has the starting symbols. Nikita and Sara both said that would be fine as a BC break and Sara is the release master for PHP eight, so she has a lot of pull on these kinds of things. Nikita as well. So, they said it's okay to have small BC breaks, we did look at other syntaxes. I saw that, which I liked before already but couldn't use because would be BC break, was the yeah hash, and then square brackets, opening and there's a closing square bracket close. This is used in Rust. It is a small BC break in PHP because the hash is used as a one line comment. It's not used that much any more, by people to use comments because coding styles discourage the use of it. I use it myself sometimes, like, specifically in one time shell scripts and stuff it's nice to use it, to use the hash as a comment. And the change, would there be that if you have a hash, followed by any other symbol than an opening square bracket, it would still be a comment. If the first symbol is square bracket opening, then it would be parsed as an attribute instead. Derick Rethans 21:42 So this is something you say that rust uses? Benjamin Eberlei 21:43 Yes. Derick Rethans 21:44 Because they're pretty much three alternatives now, we have the original pointy pointy one, which I still think is the best name for it, @@ operator, and then the hash square brackets operator. With three alternatives it's difficult to do our normal voting which is, how would you two thirds majority, but with three variants that is a tricky thing. As a second time ever, I think we use STV to pick our preferred candidate. Well this is STV kind of thing. Benjamin Eberlei 22:11 You need to help me with the abbreviation any ways resolving it. Derick Rethans 22:14 Single transferable vote. Benjamin Eberlei 22:16 STV is a process where you can vote on multiple options, and still make sure that the majority wins, or that the most preferred one wins. Ensure that it has a majority across the voters. The way it works is that everybody puts up that their preferences. There are many different options, in our case three, everybody puts their preference, I want this one first, this one second, is one third. And then the votes are made in a way where we look at the primary votes, who gave whom the primary vote. And then the option that has the least votes gets discarded. All those people that pick the least one, they have a second preference, and we look at the second preference of these voters and transfer, sort of change, the votes, their originally, but discarded option to the options that are still available. Now, somebody has the majority, they win if they don't, we discard another option and it goes on and on. With just three options, it's a transfer of a vote would essentially only happen once, because once one is discarded, one of the two other ones, not must have because it can be a tie, but probably has a majority, yeah. Derick Rethans 23:34 In this case, when we're looking at votes , when were counted them, it actually was obvious that no transferring of votes was actually necessary because the @@ won outright, and had quorum. In the end it up ended up being a pointless exercise, but it's a good way of picking among three different options. Which one won? Benjamin Eberlei 23:53 The @@ syntax won over the Rust Rusty attributes, and the pointy, pointy pointy attributes. One was clearly chosen as a winner, I think, but 54%, 56% in the first vote already. However, one thing that came up in the sort of aftermath of the discussion is that the grammar of this new syntax is actually not completely fine parse any language like PHP or any other programming languages to have defined the grammar and then the grammars inputs to a parser generator, as required some hacks to get around the possibility of using namespaces relative name step basis afterwards. This is something where Nikita satisfaction not allowed. In the way PHP builds its grammar, without any further changes @@ is actually going to be rejected as a syntax. However, Nikita independently worked on sort of change in the tokensets of PHP, that would provide a workaround, or not a workaround. It's actually a fix for the problem. If this new way of declaring tokens for namespaces would be used in PHP eight, then it's also possible to use the @@ syntax for attributes. If we don't change it, then the @@ syntax is actually not going to make it. Derick Rethans 25:18 In my own personal opinion, I think, @@ is a terrible outcome of voters is going to be so does that mean I should vote against Nikita's namespace names as token RFC or not? I mean it's a bit unfair. Benjamin Eberlei 25:30 I will vote for it. The premise of Nikita's RFC, and he worked on it before the @@ problem became apparent, so he did not only work on it to fix this problem. The problem he's trying to fix is that parts of namespace could use potentially keywords that we introduce as keywords in the future. An example is the word enum, and people are talking about adding enums to PHP for years. He had also one use case where one of his own PHP libraries broke by the introduction of the FN keyword for PHP 7.4, the short closure syntax. Essentially, it would help if some part of a namespace would contain a word that becomes a keyword in the future, that it wouldn't break the complete namespace. Because at the moment, the namespace. In this case is burned. You can't even use it anywhere in the namespace. The key word, which I guess it's a problem. But this reason I am actually for this proposal that Nikita has, even though I agree with you. I was actually a proponent of the Rusty attribute syntax, so I would have preferred to have that very much myself. Derick Rethans 26:40 I hope to talk to Nikita about what this namespace token thing is exactly about in a bit more detail in a future episode. Feature freeze is next week, August 4th. Are you excited about what PHP eight is going to be all about? Benjamin Eberlei 26:54 It's been a lot of last minute, things that have been added that to look really, really great. I can't remember PHP seven any more that closely, but I remember that it was the case with PHP seven as well. So anonymous classes was added quite late. Essentially PHP seven initially was just about the performance and then there was a lot of additional nice stuff added, very late, and made it a from a future perspective, very nice release, and it seems, it could be the same for PHP eight. Match syntax, attributes, then potentially the named parameters, which at the time of our speaking is still an open vote but looks very very promising. Derick Rethans 27:36 And then of course we have union types as well, and potentially benefits of the JIT right? Benjamin Eberlei 27:40 Union types this already accepted for so long I forgot about it. Derick Rethans 27:45 Alpha versions of PHP eight zero are already out, so if you want to play with these features you can already, and please do. And if you find bugs, don't assume it's you doing something wrong, instead file a bug report at https://bugs.php.net. Thank you, Benjamin, for talking to me about more attributes, this morning. Benjamin Eberlei 28:03 Thank you very much for having me. Derick Rethans 28:06 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Attribute Amendments RFC: Deprecated attribute RFC: Shorter Attribute Syntax RFC: Treat namespaced names as single token Psalm PHPStan Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 63: Property Write/Set Visibility London, UK Thursday, July 23rd 2020, 09:26 BST In this episode of "PHP Internals News" I talk with André Rømcke (Twitter, GitHub) about an RFC that he is working on to get asymmetric visibility for properties. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 63. Today I'm talking with André Rømcke, about an RFC that he's proposing titled property write/set visibility. Hello André, would you please introduce yourself? André Rømcke 0:38 Hi Derick, I'm, where to start, that's a wide question but I think I'll focus on the technical side of it. I'm been doing PHP for now 15 plus years. Actually started in, eZ systems, now Ibexa, met with you actually Derick. Derick Rethans 0:56 Yep that's a long time ago for me. André Rømcke 0:58 And well I came from dotnet and front and side of things. Eventually I learned more and more in PHP and you were one of the ones getting me deeper into that, while you were working on eZ components. I was trying to profile and improve eZ publish which what's left of the comments on that that point. A long time ago, though. A lot of things and I've been working in engineering since then I've been working now actually working with training and more of that, and the services and consulting. One of the pet peeves I've had with PHP has been properties for a long time, so I kind of wanted several, several years ago to look into this topic. Overall, like readonly, immutable. To be frank from our side that the only thing I really need is readonly, but I remember the discussions I was kind of involved for at least on the sideline in 2012/ 2013. Readonly and then there was property accessors. And I remember and there was a lot of people with different needs. So that's kind of the background of why I proposed this. Derick Rethans 2:04 We'll get back to these details in a moment, I'm sure. The title of the RFC is property write/set visibility. Can you give me a short introduction of what visibility actually means in this contract, in this context? André Rømcke 2:16 Visibility, I just use the word visibility because that's what doc usually in php.net says, but it's about access the property, so to say, or it being visible to your code. That's it on visibility but today we can only set one rule so to say, we can only say: this is either public, private, or protected, and in other languages, there are for good reasons, possibilities to have asynchronous visibility. So, or disconnected or whatever you want to call it, between: Please write and read. And then in, with accessors you'll also have isset() and unset() but that's really not the topic here at this point. Derick Rethans 2:56 PHP sort of supports these kind of things, with it's magic methods, like with __get() and __set(). And then also isset() and unset(). You can sort of implement something like this, of course, in your own implementation, but that's, of course, ugly, I mean, ugly or, I mean there was no other way for doing it and I remember, you made have a user that in eZ Components that you mentioned earlier. What is the main problem that you're wanting to solve with what this RFC proposes? André Rømcke 3:25 the high level use case is in order to let people, somehow, define that their property should not be writable. This is many benefits in, when you have multiple API's, in order to say that I have this property should be readable. But I don't want anyone else about myself to write to it. And then you have different forms of this, you have either the immutable case were you literally would like to only specify that it's only written to in constructor, maybe unset in destructor, may be dealt with in clone and so on, but besides that, it's not writable. I'm not going into that yet, but I'm kind of, I was at least trying to lay the foundation for it by allowing the visibility or the access rights to be asynchronous, which I think is a building block for moving forward with immutability, we only unintentionally also accessors but even, but that's a special case. Derick Rethans 4:24 What is the syntax that you're suggesting to implement this, this feature? André Rømcke 4:30 Currently in the RFC there's two proposals. There's one where you have, for instance public:private. First you define the read visibility and then you run the find the write visibility. That was based on the email list back in 2012 proposal made back then. The one is inspired by Swift, the apple language, where they actually have a concept for this. So they have public than being the read access and then they have private and in the parenthesis they have (set). Basically this is then set. Benefit of this maybe it will probably be a better match for our reflection API, in terms of getting the modifier names and stuff like that. Secondly, the terminology with set they're kind of just nicely with accessors, so that's why I added this alternative suggestion to the syntax. Derick Rethans 5:27 Would either of these would it be possible to extend this to get actual setters and getters later? André Rømcke 5:32 Yes. Well this is a good question actually and Larry brought this up on the mailing list. In terms of how this actually should align up to a potential future accessors, if that is ever added. Made a good point, we started to rethink this a bit so I think I might have to discuss with him and also Nikita on this, but his point was: okay so if we add this now, how will this look, if you add accessors later to your code. The mindset of thinking I was thinking, this would be an either or. Oh, this would be kind of a shorthand for what you can specify in much more details with accessors. So once you use accessors you would remove this word, because it doesn't have a meaning any more. It's basically language syntax sugar, so to say, or short hand. Derick Rethans 6:21 Besides the two that are in the RFC. Did you consider any other syntaxes that you discarded? André Rømcke 6:26 Yeah I actually had a previous RFC where I, which is linked to in this one, where I was looking into how it might look like with the new attributes, instead for directly for the read only and immutable keywords for instance. Some might find it strange that I'm using referring to both, keywords, because in some languages they mean one and the same. That first draft explored a bit on the attributes syntax how that could work. I like that a lot actually. But it made me realize that underneath, what we're actually dealing with, is asynchronous visibility. So I don't think necessarily people agree with me but I think from at least in how we expose this in a reflection API. It's much nicer to work with if it's purely about, okay, do I have write access or not, as opposed to having to deal with a checking if there is an attribute called readonly in the reflection cote, which for me, looks like code smell or. It looks like it's solving the wrong problem or, I don't know how to phrase it. Derick Rethans 7:32 And I think I agree with you there. I'll always consider attributes as something that modifies the attribute, but visibility would be something that is so inherent of a property that it doesn't really make a lot of sense to do it that way, but that's my feeling about it. I mean, it makes sense to have an attribute for JIT or no JIT because that's some contextual meaning to something else. Similarly, like if you have the ORM attributes or column attributes that people have been suggesting and mainly estate, they convey information about the properties to some other third party tool, not necessarily to PHP itself. Because PHP doesn't care about its JIT or not. It's OPCache cares about it and this would be a similar way right? André Rømcke 8:13 So in this case, it might have been better as a keyword, but I was, so I was exploring that as well and that's what's been done in the past in 2012 and then later in 2018 with that immutable RFC, and then again in 2020 with the writeonce RFC which is doing this, different semantics and focusing more on the write once but anyway those three cases were keywords. I would guess that you would prefer rather a keyboard on for this kind of things. Derick Rethans 8:42 Oh you have keywords already right, I mean private and public are keywords, or the alternatives that you have with having a modifier on when this keyword exists. I mean, is a similar idea of doing it. I mean, I haven't really thought about which ones I prefer more than the other but I would definitely prefer something like this over an attribute. André Rømcke 9:00 Either way, the benefit of doing adding a keyword or attributes for this is that you could, you could also allow them on the class level so you could allow it to be okay, unless anything else is to find then, please let this be the default but all properties be, for instance, immutable, unless they say otherwise. They would need the keyword on for not immutable or something like that. In a immutable case, if you say on the class it's immutable, the whole class should be immutable, and I'm sorry. Derick Rethans 9:27 I don't think it makes sense to that subdivided does it? André Rømcke 9:30 For me at least that distinction, which could exist between readonly and immutable. Readonly, if you look away from how C sharp is defining it and rather look to Rust. In Rust, readonly is basically the modules can read it, but not write it, your module can write to it. So that's the definition of readonly and I think there are use cases for it. But, Marco is challenging me on that so I need to find. Derick Rethans 10:00 It's often good when Marco challenges people on these things because I would trust him with making that kind of semantic choices over many other of the people that are really good working on PHP engine, for example. You slightly touch reflection, that has to be modified to support it, well either keyword or keyword modifier ,or whatever you want to call it in the end. Is there is something else that needs to be changed besides Reflection, or how does reflection, need to be changed? André Rømcke 10:29 Reflection needs to one way or the other expose, at least this RFC, expose the fact that you have now disconnected or asynchronous read and write visibility. But other than that, I don't see a big problem for reflection because, at least, in most cases, code tend to use setAccessible(). This will work as before, basically, you will get access in the instance of the reflection property. If there's any code out there that is checking isPrivate(), isProtected(), isPublic(), they might need to check if the read or write access visibility is as they expect. I haven't never had the need to use this myself but there's probably use cases out there for code dealing with unknown objects, for instance. Derick Rethans 11:20 And I saw in the RFC that you're suggesting to slightly change what isAccessible() does, but also add new methods to specifically check for the setability and getability. What are the names of these methods that you're wanting to add because I can't quite remember? André Rømcke 11:37 So the names of the reflection properties suggested currently in RFC is isSetPrivate(), isSetProtected(), isSetPublic() and then similar for the gets so isGetPrivate(), isGetProtected(), and isGetPublic(). And this would be, in addition to the current isPrivate(), isProtected(), isPublic(), which would be then have to be just a tiny bit to rather affect the visibility of both, so to say. So in the case of public both actually needs to be public for this to return true. Same with protected could also be protected:public or in the case of the other syntax protected, but public set. So, a kind of a write only property, then it would be considered protected, along with the private. Derick Rethans 12:34 So this brings me actually to point that, when seeing that your second syntax which has this set in between parenthesis, to getter didn't have anything within parenthesis. Could it make sense to have, get in parenthesis there instead of having it not marked at all? André Rømcke 12:51 It could be, but this, this this kind of brings us to maybe our downside with the syntax. And also, one, one point that Larry was making in terms of how this would fit with accessors, because it was basically proposing to reverse it. So it would be set colon private for instance, so that the order would be more in line with what might come with accessors later. So, Nikita basically proposed that we move this to a PHP 8.1 and even though I haven't updated the RFC yet I agree. So, I expect us to do more discussions around the syntax and maybe look at this as Nikita proposed on a higher level. Look at how this different needs can be aligned in a better way. Derick Rethans 13:37 You mentioned PHP 8.1, if that's the case, then this is the first RFC where it's discussing for for PHP 8.1, which is kind of nice because of course feature freeze is coming pretty soon now. And that's even closer to when this podcast comes out at the end of July. Is there any potential for breaking backwards compatibility with the addition of the syntax that you're proposing? André Rømcke 13:58 There's a few things which I haven't explored yet and defined in RFC and that is with this RFC, the probably needs., well, there definitely needs to be the definition on how property exists and these kind of things should work, and also other methods that check about. There's also isset() and unset(). Needs to be defined how this should work, so I don't know currently about any clear BC breaks per se, currently I'm only aware of it's being the case once people use it. For existing code I haven't seen any BC breaks yet, but for any code that starts to use it. You will need to adapt your, your reflection code potentially, and we'll need to look into how propertyExists(), and isset() and so on will behave. Derick Rethans 14:41 What's the feedback been so far? André Rømcke 14:43 From others in Symfony slash CMS is part of the realm, it's kind of positive. They would very much like this kind of features to easily more granularly say how this property should be should be accessed. So that's a definite need out there. There are a few language purists maybe, or whatever we should call it, that are clearly against this and allowing this kind of pragmatic access to define whatever you want. While I kind of understand it, and to some extent agree with it, I still think there's usefulness in defining the underlying semantics, on the, on the access to the properties. Yes for immutable, it's not right there yet, you could see the future where we have, if we use the words from from C sharp, it could be public:init for instance, or something like that, whatever the syntax is to define that this is public property, but it's only writable in constructor and potentially destructor and so on. So there's definitely missing a lot here to enable the full immutability semantics. Personally, I think this could be the underlying semantics of immutability, even though that would be that maybe the keyword we expose some promote more to use for for entities and grant uses. Derick Rethans 16:05 I don't have to ask you then when you think you'll be putting up this to vote if if you end up retargeting this for PHP 8.1, because you have a full year to go for that. André Rømcke 16:14 To be honest, I'm not a C developer. So, even though I think I got some pointers that you for trying to debug the session code in PHP but didn't go beyond that. So, unless anyone volunteers to code the patch for this in the next few weeks I don't see us moving this to that point right now. Derick Rethans 16:34 I mean, this could be a good starting point for exploring how to implement all the concepts that you were talking about such as immutability and readonly concepts and things like that. In that that's always we're doing it right and it's not a bad thing that takes a longer time than just the minimum discussion period of two weeks. All these complex interactions are always going to be very difficult to sort out anyway. André Rømcke 16:56 On one side of course I would like to have this as soon as possible, but I've been waiting since 2011 so I can wait another year. Derick Rethans 17:03 What is a year on a decade. Alright André would you have anything else to add? André Rømcke 17:08 No. One thing to add this, I, no matter what we end up with here there is definite need to avoid the boilerplate code, dealing with magic methods and also the performance it of using those. It would be great and simpler to get into the language without this stuff. Derick Rethans 17:25 Thank you, André for taking the time this morning to talk to me, as I said, I think this will be a good starting point for a somewhat longer discussion. André Rømcke 17:32 Thank you as well this was a great opportunity to meet you again of course, and also great to be on your podcast. It's a nice podcast to follow. Derick Rethans 17:43 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Property write/set visibility Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
Jonty's Twitter - https://twitter.com/jontybehrCodeigniter - https://codeigniter.comMettle - https://mettle.io/Laravel Live UK - https://laravellive.uk/PaperTrail - https://papertrail.com/Understand.io - https://understand.io/Xdebug - http://xdebug.org/Bugsnag - https://bugsnag.com/ELK Stack - https://www.elastic.co/what-is/elk-stackTICK stack - https://www.thoughtworks.com/radar/platforms/tick-stackMatt's live stream with Derick - https://www.youtube.com/watch?v=iloCjuqMdKULaravel discord channel - https://t.co/4fjanVoFIE?amp=1Send Portal - https://sendportal.io/Charity Majors and Observability - https://hub.packtpub.com/honeycomb-ceo-charity-majors-discusses-observability-and-dealing-with-the-coming-armageddon-of-complexity-interview/ Transcription sponsored byLarajobsEditing sponsored byTighten
PHP Internals News: Episode 62: Saner Numeric Strings London, UK Thursday, July 16th 2020, 09:25 BST In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed to make PHP's numeric string handling less complicated. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 62. Today I'm talking with George Peter Banyard about an RFC that he's proposing called saner numeric strings. Hello George, how are you this morning? George Peter Banyard 0:36 How are you; I'm doing fine. I'm George Peter Banyard. I work on PHP, and I'm currently employed by The Coding Machine to work on PHP. Derick Rethans 0:46 I actually think I have a bug swatter from The Coding Machine, which is hilarious. Huh, I can't show you that okay of course in a podcast and not on TV. But yes, I think I got it in Paris at some point at a conference there, and it's been happily getting rid of flies in my living room. Anyway, that's not what we want to talk about here today, we want to talk about the RFC that is made, what is the problem that is RFC is hoping to address? George Peter Banyard 1:09 PHP has the concept of numeric strings, which are strings which have like integers or floats encoded as a string. Mostly that would arrive when you have like a get request or post request and you take like the value of a form, which would be in a string. Issue is that PHP makes some kind of weird distinctions, and classifies numeric strings in three different categories mainly. So there are purely numeric strings, which are pure integers or pure floats, which can have an optional leading whitespace and no trailing whitespace. Derick Rethans 1:44 Does that also include like exponential numbers in there? George Peter Banyard 1:48 Yes. However trailing white spaces are not part of the numeric string specification in the PHP language. To deal with that PHP has a concept of leading numeric strings, which are strings which are numeric but like in the first few bytes, so it can be leading whitespace, integer or float, and then it can be whatever else afterwards, so it can be characters, it can be any white spaces, that will consider as a leading numeric string. The distinction is important because PHP will sometimes only accept pure numeric strings. But in some other place where we'll accept leading numeric strings. Of casts will accept whatever string possible and will try to coerce it into an integer. In weak mode, if you have a type hint. It will accept leading numeric strings, and it will emit an e_notice that a non well formed string has been encountered. When you use like a purely string string, you'll get a non numeric string encountered warning. So the main issue with that is that like strings which have a leading whitespace are considered more numeric by PHP than strings with trailing whitespaces. It is a pretty odd distinction to make. Derick Rethans 3:01 For me to get this right, the numeric string in PHP can have whitespace at the start, and then have numbers. There's a leading numeric string that can have optional whitespace in front, numbers and digits, and then rubbish. Then there's a non numeric string which never has any numbers in it. George Peter Banyard 3:22 No numbers in the beginning. "HelloWorld5" will be considered non numerical. Derick Rethans 3:26 So it's a string that doesn't start with digits. George Peter Banyard 3:29 Yes, or optional whitespace. Derick Rethans 3:31 So there are three different numeric strings, sort of. There're two, and then one that is a string that doesn't have numbers. And you mentioned that some places. These are accepted and in other places they're not. So typecast will accept both numeric strings and leading numeric strings. Where is the leading numeric string, not accepted? George Peter Banyard 3:53 If you use is_numeric call, it'll only return true on pure numeric strings. Derick Rethans 4:00 And they have whitespace ain the end? George Peter Banyard 4:02 They can only have leading white spaces. Explicit typecasting will work regardless, so even on non numeric strings, an int cast that will convert it to to zero, because that's how tight juggling works in PHP, and it will do. American leading numeric strings, it will take us to the initial leading numeric. Derick Rethans 4:27 And stripping out leading whitespace if there's any? George Peter Banyard 4:30 Strip stripping leading white spaces and stripping garbage out of the end if it's a just a leading numeric string. String to string comparison with the double equal comparison operator will perform a numeric compare comparison, only if both strings are numeric, purely numeric. Whenever you do a string to int, or float comparison, the string will get type juggled to an int or to a float, regardless of its numericness. So, we'll get non numeric string for get typecast into zero implicitly, and you'll get warnings, but it has some odd behaviour. In weak typing mode, so strict types disabled, an int typecast where an int type declaration for an argument. When you pass it an numeric string to it. If it's a leading numeric string, it will convert it was an E notice, and it will do a type error if it's a non numeric string. This can be a slight issue, if you for example you pass in a hash, it should be a string. As always, but it starts with like a digit, then it will get type juggled to an int. And it will pass the type declaration check and just like work with. Derick Rethans 5:54 And you're get a notice? George Peter Banyard 5:56 So you get a notice. Whereas like if it's, if it would be an a hash was just purely which starts with a with a character, you would get an e_warning, as in like a non well formed string like numeric string has been encountered. Derick Rethans 6:10 That sounds quite complicated. You mentioned that there's one other place where you can use numeric strings, which is in array keys. George Peter Banyard 6:21 Yes, array keys and string offsets. So array keys have a special semantic, which are like integer strings, which are separate concept and kind of same; as in, it needs to start with a nonzero digit, or be zero. For the zero index. It needs to be only digits, and that will be interpreted as an integer key. Otherwise, anything else will be interpreted as a string key, "5.5", which is a float like a numeric float string, will stay as "5.5" as the array key. This behaviour is different to string offsets. Derick Rethans 7:07 So you're saying that a string with "5.5" in it, in array key stays "5.5"? George Peter Banyard 7:15 Yes, and the same if you have a string key which is "03", you'll get a string key which is "03", it won't get evaluated as three. You can try it yourself, because it is the most weirdest behaviour, ever. I got what's quite surprised about that. Derick Rethans 7:32 You are correct, but if it's a float it gets truncated. George Peter Banyard 7:36 Yes, to five. Derick Rethans 7:38 Hey, I've learned something new here, I thought that would also truncate. George Peter Banyard 7:41 That would be kind of logical, in some sense, but it doesn't. Derick Rethans 7:46 Continuing George Peter Banyard 7:47 Array offsets have this behaviour, string keys have the more usual behaviour of using numerical, like numeric strings, as there can't be a string offset first, like it can only be like an integer. So that's why it's more lax, in some sense, it will use the usual semantics. However, if the numeric string is a float, or if it's a leading integer string, it'll emit the illegal string offset warning, but still used explicit int cast to cast it to an integer. "2str" would be cast to two, like a string index "foo" would be casted to zero, and "5.5" would be cast it to 5. It's all kind of confusing I wish doesn't follow other illegal offset behaviour for some sentence. If you try to pass an array as a as an offset you'll get a type error in PHP 8. Derick Rethans 8:55 I have to admit, I am totally getting lost here. This sounds also complicated, and that something needs to be done about this. Am I correctly understanding that this is exactly what your RFC is trying to do? George Peter Banyard 9:08 Yes, this is an attempt to bring back sanity into this whole mess. Derick Rethans 9:13 So what are you proposing here? George Peter Banyard 9:14 The proposal is to get rid of the concept of leading numeric strings, because it's mostly weird, and it's more confusing than it needs to be. To do that, numerical strings, will accept trailing white spaces. So numeric string which has leading whitespace won't be more numeric than a string with trailing white spaces. On top of that, all current, e_notices a non well formed numeric value encountered, will be changed to emit a non numeric value encountered e_warning. There's a promotion and severity in some sense as well. Should only affect purely non numeric strings, or leading numeric strings with have jibberish after the digit. For string offsets, numeric strings which correspond to well formed floating point numbers will emit the more usual string offset cast occurred warning, instead of the illegal string offset. Leading numeric strings which currently emit a non well formed numeric value and countered notice will emit the illegal string offset, and still continue to evaluate the previous value to ease the migration to PHP eight and for backwards compatibility. However, non numeric strings, which don't represent a number at all. Now throw in an illegal offset type error. This would affect our estimates operation on strings, so plus minus, multiplication, etc. Then float type declarations. So, in turn, float type declaration for internal and user land functions. Comparisons operator which considered that numeric strings with trailing white spaces weren't numeric, and so would produce false, say for example, the string "123 ", equal, equal to string " 123" will now produce true instead of false. The built in is_numeric function would return true for numeric strings which have trailing white spaces, where before it would emit false. And the plus plus, minus minus, increment, decrement operators would convert numeric strings with trailing white spaces to integers or floats and use the numerical increment instead of the alphanumeric would increment rules. Derick Rethans 11:35 You say whitespace, do you just mean the space characters or does it include like tabs and returns as well? George Peter Banyard 11:43 Tabs, new lines vertical ,spaces. Mostly what would consider white spaces. Derick Rethans 11:48 I guess there's a horizontal tab and a vertical tab and stuff like that. What's the potential for for breaking changes here because messing around with PHP's type juggling rules is always a bit tricky. What are the BC implications here? George Peter Banyard 12:05 I would expect most reasonable code to not be affected. It changes, one which is relatively minor, which is, if you, for some reason, your code needs the string to be numeric and only have leading white spaces, but no trailing white spaces, which is a pretty specific requirement. Then accepting trailing white spaces would break that code, because that would be considered a valid numeric string, whereas the code assumes that would be non non well formed, which is an odd requirement to have. That's why I don't expect it to be that big. Second one, more problematic one, is code which has liberal use of leading numeric strict. If for example you pass the DOM, an XML or a CSS file or something, and you get 2px, for example, for 2 pixel. And you just take that string, and dump it into various things and expect it to get two out of it. Sometimes you will need to now use an explicit cast to get the previous behaviour. That would be notified by you or by the by an e_notice in PHP 7.4, and it would it would inform you with a e_warning in PHP 8. Derick Rethans 13:28 Considering you get a warning ish thing in both cases it's not really a BC break, I mean it's not suddenly going to start throwing an exception, which could break your code flow for example. George Peter Banyard 13:39 Yes, and also all behaviour should be identical to PHP 7.4 and PHP 8. If there wasn't a warning before, if it was a notice, and it's been moved to a warning, the behaviour should be the same, except for like non numeric strings which sometimes will emit a type error, that's most likely a bug, were you expecting something to be an integer like and it's just pure or strict. Derick Rethans 14:07 Oh, of course for user input, we know we shouldn't casting anyway, we should use the filter extension to get to this data, does this impact the filter extension at all? George Peter Banyard 14:19 No, I don't think so. I don't think the filter extension uses the C is_numeric, is_numeric_string function. And it uses its own parsing of strings. Derick Rethans 14:30 Have you gotten any feedback about this so far? George Peter Banyard 14:33 Some feedback was to clarify some of the changes if it would affect code. Also, I had some doubt about how to handle the string offset case, which initially one of the proposals was to promote the leading number of strings to emit the warning, but also returned zero instead of returning the previous value, which would be pretty hard to detect, although they emitted a notice previously. So I've changed that again to like more in line with the behaviour, it has in PHP seven, where it just truncates the gibberish and cast it to an integer. So at least that BC concern should be removed. Derick Rethans 15:24 As I mentioned, this is all pretty hard to wrap my head around, not because you don't explain this correctly, but mostly because it's so complicated to begin with. I would probably recommend that people that listen to this podcast episode would also have a look at the RFC, because it will come with examples in the cases as well, and sometimes just looking at the examples is a lot easier than listening to the exact descriptions of strengths as parsed by the PHP engine. George Peter Banyard 15:53 Yes, which, at time can be mostly weird and nonsensical, but mostly based on Perl semantics. Derick Rethans 16:02 Sometimes we steal from Java, sometimes we steal from Rust, and sometimes some Perl it seems them. And there's nothing wrong with that. George Peter Banyard 16:10 There's nothing wrong, and in some sense, if you steal all the good things you get a better language, and sometimes you make some slight mistakes along the way. Derick Rethans 16:19 let me not start about the @@ operator. We'll keep that for another episode, maybe. George Peter Banyard 16:25 Yes. Derick Rethans 16:26 When do you think you're going to put this up for a vote? George Peter Banyard 16:29 So I started the discussion early this week. So on the 29th of June. I would expect the two weeks discussion period, because feature freezes coming up pretty soon. It needs to be voted on before and implemented into core before that. Voting should start on the 13th of July for two weeks until the 27th, which would give like another week to land stuff; to land it into core and tweak the implementation details. Derick Rethans 16:59 I'm expecting a lot more RFCs just wanting to get in, just before the deadline. George Peter Banyard 17:05 I suppose so, it's also kind of difficult because getting really tight. Derick Rethans 17:09 Okay, George. Thanks for this. Would you have anything else to add? George Peter Banyard 17:13 No, thanks for having me on the show again Derick, and I hope you have a nice evening. Derick Rethans 17:17 Thanks very much. Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Saner numeric strings Related Saner string to number comparisons RFC Related Permit trailing whitespace in numeric strings RFC Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 61: Stable Sorting London, UK Thursday, July 9th 2020, 09:24 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Stable Sorting RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 61. Today I'm talking with Nikita Popov about a rather small RFC that he's proposing called stable sorting. Hello Nikita, how are you this morning? Nikita 0:36 Hey, Derick, I'm great. How are you? Derick Rethans 0:38 Not too bad myself. Let's jump straight in here. The title of the RFC is stable sorting, what does that mean, what is stable sorting, or what is sorting stability? Nikita 0:48 Sorting stability refers to the behaviour of the sort when it comes to equal elements. And equal share means that we sort comparison function. For example, the one you pass to usort says the elements are equal, but there is still some way to distinguish them. For example, if you're sorting some objects, to take the example from the RFC, we have an array with users, and users have an age, and we use usort to only sort the users by age. Then according to the comparison callback all users with the same age are equal. But of course, the user also has other fields on which we can distinguish it. And the question is now in what order will equal elements appear. If we have a stable sort, then they will appear in the order they were originally in. So it's something not going to change. Derick Rethans 1:41 And that is not what PHP sorting mechanism currently does? Nikita 1:44 Right. PHP currently uses an unstable sort, which means that the order is simply unspecified. It will be deterministic. I mean if you take the same input array and sort it, then every time we will get the same result. But there is no well specified order or relative order of elements. There's just some order. The reason why we have this behaviour is that well there are, I would say, two, the only two sorting algorithms. There is merge sort. Which is a guaranteed n log n sort that the stable, but has the disadvantage that that requires additional memory to perform the merge step. The other side there is a quicksort, which is an average case n log n sorting algorithm and is unstable, but does not require any additional memory. And in practice, everyone uses one of these algorithms, usually with a couple of extensions on sort of merge sort. Nowadays we use timsort, but which is still based on the same underlying principle, and for quicksort, we have sort which is better than quicksort, which tries to avoid some of the bad worst case performance which quicksort can have. PHP currently uses us a quicksort, which means that our sorting results are unstable. Derick Rethans 3:07 Okay, and this RFC suggesting to change that. How would you do that? How would you modify quicksort to make it stable? Nikita 3:15 Two ways. One is to just change the sorting algorithm. So as I mentioned, the really popular stable sorting is timsort, which is used by Python by Java and probably lots of other languages at this point. And the other possibility is to stick with an unstable source. So to stick with quicksort, but to artificially enforce that the comparison function does not have, does not report equal elements that are not really equal. And we can do that by introducing an extra artificial fallback comparison. We remember the order of the elements in the original array. And as the comparison function tells us that elements are equal. You will check against this original order, which means that, okay are sort of still unstable, but because the comparison, we'll never actually report that two elements are equal unless they really equal. It doesn't matter for the result. Derick Rethans 4:16 So you're basically artificially changing the key to have the original index in the array. Nikita 4:24 That's pretty much exactly the implementation. And this is actually also how you would implement the stable sort if you'd do it in PHP code. So you would take your array and convert it into an array of pairs, where you have the original array value and the original position of the array element. Difference is just that if you do this in PHP code this is extremely extremely inefficient, in terms of memory and performance, while when we do it internally it's essentially free because we already have a little bit of unused space in each array element. We can easily store the current position. Derick Rethans 5:02 Do you think there will be much of a performance hit here? Nikita 5:04 So I expect that there is a bit of performance hit, but for typical usage, not much. For the good case where your array does not actually contain any equal elements, the overhead should be very small, something like maybe one or 2%,. If your array does contain a huge number of duplicates. Then there is more overhead, and the effect is basically that the sort performance, no longer depends on the number of duplicates you have. Previously if you had a lot of duplicates, then the sort became faster, the more duplicates you had. Well now, as you add more duplicates, the sorting performance will stay both stable. That's really the difference in performance. Derick Rethans 5:53 If you have the numbers in the RFC I'll make sure to link to them. There are possibility that is that this is going to break any code? Nikita 6:01 Yes, it could break tests. Derick Rethans 6:04 Tests, because the test's output can change because the sorting order of arrays might have changed. Nikita 6:11 Exactly. So we already had such a change in PHP seven, where we switched from a pure quicksort, to a hybrid quicksort and insertion sort, which means that effectively we have a stable source for arrays smaller than 16 elements and an unstable source for larger arrays, which is weird, weird intermediate state. Derick Rethans 6:33 Yes. Nikita 6:35 I think that one already had quite a bit of fallout for testing purposes. Hopefully this one will be a little bit smaller because most tests will work on a few elements. Those would have already been stable previously. But there is definitely going to be a little bit of fallout for unit testing. Derick Rethans 6:56 At the moment we're talking about this, the RFC's already up for voting. By the time this podcast has come out. It's pretty likely that it has been accepted for PHP eight, because I think the voting was 51 to zero or something like this. Nikita 7:10 It's 36 to zero. Derick Rethans 7:13 There you go. Thank you, Nikita for taking the time this morning to talk to me about stable sorting. Nikita 7:19 Thanks for having me. Derick Rethans 7:23 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Stable Sorting Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 60: OpenSSL CMS Support London, UK Thursday, July 2nd 2020, 09:23 BST In this episode of "PHP Internals News" I chat with Eliot Lear (Twitter, GitHub, Website) about OpenSSL CMS support, which he has contributed to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 60. Today I'm talking with Eliot Lear about adding OpenSSL CMS supports to PHP. Hello Eliot, would you please introduce yourself. Eliot Lear 0:34 Hi Derick, it's great to be here. My name is Eliot Lear, I'm a principal engineer for Cisco Systems working on IoT security. Derick Rethans 0:41 I saw somewhere on the internet, Wikipedia I believe that he also did some RFCs, not PHP RFC, but internet RFCs. Eliot Lear 0:49 That's correct. I have a few out there I'm a jack of all trades But Master of None. Derick Rethans 0:53 The one that piqued my interest was the one for the timezone database, because I added timezone support to PHP a long long time ago. Eliot Lear 1:01 That's right, there's a whole funny story about that RFC, we will have to save it for another time but there are a lot of heroes out there in the volunteer world, who keep that database up to date, and currently the they're corralled and coordinated by a lovely gentleman by the name of Paul Eggert and if you're not a member of that community it's really a wonderful contribution to make, and they need people all around the world to send an information but I guess that's not why we're here today. Derick Rethans 1:29 But I'm happy to chat about that at some other point in the future. Now today we're talking about CMS support in OpenSSL and the first time I saw CMS. I don't think that means content management system here. Eliot Lear 1:41 No, it stands for cryptographic message syntax, and it is the follow on to earlier work which people will know as PKCS#7. So it's a way in which one can transmit and receive encrypted information or just signed information. Derick Rethans 1:58 How does CMS, and PKCS#7 differ from each other. Eliot Lear 2:03 Actually not too many differences, the externally the envelope or the structure of the message is slightly better formed, and the people who worked on that at the Internet Engineering Task Force were essentially just making incremental improvements to make sure that there was good interoperability, good for email support and encrypted email, and signed email, and for other purposes as well. So it's very relatively modest but important improvements, from PKCS#7. Derick Rethans 2:39 How old are these two standards? Eliot Lear 2:42 Goodness. PKCS#7, I'm not sure actually of how old the PKCS#7 is, but CMS dates back. Gosh, probably a decade or so I'd have to go look. I'm sorry if I don't have the answer to that one, Derick Rethans 2:56 A ballpark figure works fine for me. Why would you want to use CMS over the older PKCS#7? Eliot Lear 3:02 You know, truthfully, I'm not, I'm not a cryptographer, so the reason I used it was because it was the latest and greatest thing and when you're doing this sort of work. I'm an, I'm an interdisciplinary person so what I do is I go find the experts and they tell me what to use. And believe it or not, I went and found the person who's the expert on cryptographic signatures, which is what I need. I said: What should I use? He said: You should use CMS and so that's what I did. What I ran into some troubles though, which is that some of the tooling, doesn't support CMS. So, in particular PHP didn't support CMS. So that's why I got involved in the PHP project. Derick Rethans 3:40 You are a new contributor to the PHP project. What did you think of its interactions? Eliot Lear 3:45 I had a wonderful time doing the development. There was a fair amount of coding involved, and one has to understand that the underlying code here is OpenSSL and OpenSSL's documentation for some of its interfaces could stand a little bit of improvement. I needed to do a fair amount of work and I needed a fair amount of review so I got a lot of support from Jakub particular, who looks after the OpenSSL code base, as one of the maintainers, and I really enjoyed the CI/CD integration, which allowed me to check the numerous environments that PHP runs on. I really enjoyed the community review, and I really enjoyed it even though I didn't have to really do one in my case, I did do an RFC, as part of the PHP development process, which essentially forced me to write really good documentation or at least I hope it's really good. Before all of the caller interfaces that I defined, so it was a really enjoyable experience. I really liked working with the team. Derick Rethans 4:47 That's good to hear. I think sometimes although an RFC wasn't particularly necessary here, as an RFC one particularly necessary I always find writing down the requirements that I have for my own software, first, even though this doesn't get publicized or nobody's going to review that always very useful to just clear my head and see what's going on there. Eliot Lear 5:06 Yeah, I think that's a good approach. Derick Rethans 5:07 During the review, was there a lot of feedback where you weren't quite sure, or what was the best feedback that you got during this process? Eliot Lear 5:15 Biggest issue that we had was, how to handle streaming, and we have some code in there now for streaming, but it's it's unlikely to get really heavily exercised in the way that the interfaces are defined right now. It's essentially files in/files out interface which mirrors the PKCS#7 interface. One of the future activities that I would like to take on if I can find a little bit more time, is to move away from the files in/files out interface, but rather use an in memory structure or in memory interface. So that can actually take advantage of streaming and can be more memory efficient, over time. Derick Rethans 5:56 When you say file now you actually provide a file name to the functions? Eliot Lear 6:00 That's right, you know, depending on which of the interfaces you're using, there's an encrypt, there's an encrypt call there's a decrypt call. There's a sign and a validate call, and or a verify call, and each of them has a slightly different interface, but you know if you're encrypting you need to have the destination that you're encrypting through these are all public key, you know PKI based approaches so you have to have the destination certificates, that you're sending. If you're verifying you need to have the private key to do or you need, I'm sorry you need to have the public key chain and if you're decrypting to have the private key to do all this. So, but they're all filenames that are passed and it's a bit of a limitation of the original interface in that you probably don't really want to be passing file names from most of your functions you'd rather be passing objects that are a bit better structure than that. Derick Rethans 6:53 Is the underlying OpenSSL interface similar or does that allow for streaming in general? Eliot Lear 6:59 The C API allows for streaming in such. The command line interface, it doesn't seem to me that they do any particular things with with streaming. If you look at the cryptographic interface that we that we did for CMS, mostly it is an attempt to provide the capability that you would otherwise have on the open using the OpenSSL command line interface and I think the nice thing here is that we can evolve from that point. Derick Rethans 7:26 And the progress wouldn't only be done implemented for the CMS mechanism, but also for PKCS#7, as well as others that are also available. Eliot Lear 7:35 Yes. Another area that I would like to look at, I'm not sure how easy it will be, we didn't try it this time was to try and combine the code bases because they are so close, and be a little bit more code efficient, but there are just slight enough differences in the caller interfaces between PKCS#7 and CMS that, I'm not sure I could get away with using void functions for everything I have. I might have to have a lot of switches, or conditionals in the code. But what I am interested in doing for both sets of code is, again, providing new interfaces, where instead of passing file names, you're passing memory structures of some form that can be used to stream. That's the future. Derick Rethans 8:22 I've been writing quite a bit of GO code in the last couple of months. And that interface is exactly the same, you provide file names to it, which I find kind of annoying because I'm going to have to distribute these binaries at some point. And I don't really want any other dependencies in the form of files, so I need to figure out a way how to do that without also provide those key files at some point. Eliot Lear 8:43 Indeed, that's, that's an issue, and for us right well who are web developers I did this because I was doing some web development. A lot of the stuff that I want to do. I just want to do in memory and then pass right back to the client and I don't really want to have to go to the File System. And right now, I'll have to take an extra step to go to the File System and that's alright, it's not a big deal, but it'll be a little bit more elegant when I get away from that. We'll do that you know at an appropriate time. Derick Rethans 9:11 Yes, that sounds lovely. I'm not an expert in cryptography either. I saw that the RFC mentions the X 509. How does it tie in with CMS and PKCS #7? Eliot Lear 9:21 X 509 is essentially a certificate standard. In fact, that's what really what it is. A certificate essentially has a bunch of attributes, along with a subject being one of those attributes and a signature on top of the whole structure. And the signature comes from a signer, and the signer is essentially asserting all of these attributes on behalf of whoever sent the request. X 509 certificates are, for example the core of our web authentication infrastructure. When you go to the bank online, it uses an X 509 certificate to prove to you that it is the bank that you intended to visit, that's the basis of this and CMS and PKCS#7 are structures that allow the X 509 standard to be serialized, so there's the distinguishing coding rules that are used underneath PKCS#7 and CMS, and then what you have, CMS essentially was designed as at least in part for mail transmission. So how is it that you indicate the certificate, the subject name, the content of the message. All of this information had to be formally described, and it had to be done in a way that is scalable. And the nice thing about X 509, as compared to say just using naked public keys, is with naked public keys, the verifier or the recipient has to have each individual public key, whereas with X 509, it uses the certificate hierarchy such that you only need to have the top of the chain, if you will, in order to validate a certificate. So X 509 scales, amazingly well, we see that success, all throughout the web. And so that's what CMS and PKCS#7 help support. Derick Rethans 11:24 Like I said, I've never really done enough research into this but I think it is something that many web developers should really know how that works because this comes back, not only with mail, but also with HTTPS. Eliot Lear 11:35 It's another part of the code right. So CMS isn't directly used for supporting TLS connections, there's a whole a whole set of code inside of PHP for that. Derick Rethans 11:44 Would you have anything else to add? Eliot Lear 11:46 I would say a couple of things. The basis of this work was that I was attempting to create signatures for something called manufacturer usage descriptions. The reason I got involved with PHP is that I'm doing tooling that supports an IoT protection project. And this this manufacturer usage descriptions essentially describes what the device, what an IoT device needs in terms of network access. And the purpose of using PHP and adding the code that I added was so that those descriptions could be signed, and that's why Cisco, my employer, supported my activity. Now Cisco loves giving back to the community. This was one way we could do so it's something I'm very proud of when it comes to our company. And so we're very happy to participate with the PHP project. I really enjoyed working with Derick Rethans 12:33 That's glad to hear. I'm looking forward to some other API improvements because I agree that the interfaces that the OpenSSL extension has aren't always the easiest to use and I think it's important that encryption is easy to use, because more people will use it right. Eliot Lear 12:49 I have to say, in my opinion, the encryption interfaces that we have today are still relatively immature. And not just CMS, the code that I wrote, which is really you know fresh it just got committed, but the whole category of interfaces, is something that will evolve over time and it's important that it do so because the threats are evolving over time and people need to be able to use these interfaces, and we can't all be cryptographic experts, I'm not. I just use the code but I needed to write some in order to use it in my case, but as we go on I think will enjoy richer and easier to use interfaces that normal developers can use without being experts. Derick Rethans 13:38 PHP has been going that way already a little bit because we started having a simple random interface, and in a simple way of doing hashes and verifying hashes, to make these things a lot easier because we saw that lots of people are implementing their own ways in PHP code, and pretty much messing it up because, as you say not everybody's a cryptographer. Eliot Lear 13:56 That's right. And so that's a really good thing that PHP did, because as you pointed out, it eliminates all the people who are going onto the net looking for the little snippet of code that they're going to include in PHP, whether that snippet is correct or not that's a big issue. Derick Rethans 14:11 Absolutely. And cryptography is not something that you want to get wrong. Eliot Lear 14:15 That's right, because for every line of code that you've written in this space, there's going to be somebody who's going to want to attack it, maybe several. Derick Rethans 14:23 Absolutely. Thank you, Eliot, for taking the time this morning to talk to me about CMS support. Eliot Lear 14:28 It's been my pleasure Derick, and thanks for having me on. And again, it was really enjoyable to work with the PHP team and I'm looking forward to doing more. Derick Rethans 14:38 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Add CMS Support Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 59: Named Arguments London, UK Thursday, June 25th 2020, 09:22 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about his Named Parameter RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 59. Today I'm talking with Nikita Popov about a few RFCs that he's produced. Hello Nikita, how are you this morning? Nikita Popov 0:35 Hey Derick, I'm great. How are you? Derick Rethans 0:38 Not too bad, not too bad today. I think I made a decision to stop asking you to introduce yourself because we've done this so many times now. We have quite a few things to go through today. So let's start with the bigger one, which is the named arguments RFC. We have in PHP eight already seen quite a few changes to how PHP deals with set up and things like that we have had an argument promotion in constructors, we have the mixed type, we have union types, and now named arguments, I suppose built on top of that, again, so what are named arguments? Nikita Popov 1:07 Currently, if you're calling a function or a method you have to pass the arguments in a certain order. So in the same order in which they were declared in the function, or method declaration. And what named arguments or parameters allows you to do is to instead specify the argument names, when doing the call. Just taking the first example from the RFC, we have the array_fill function, and the array_fill function accepts three arguments. So you can call like array_fill( 0, 100, 50 ). Now, like what what does that actually mean? This function signature is not really great because you can't really tell what the meaning of this parameter is and, in which order you should be passing them. So with named parameters, the same call would be is something like: array_fill, where the start index is zero, the number is 100, and the value is 50. And that should immediately make this call, like much more understandable, because you know what the arguments mean. And this is really one of the main like motivations or benefits of having named parameters. Derick Rethans 2:20 Of course developers that use an IDE already have this information available through an IDE. But of course named arguments will also start working for people that don't have, or don't want to use an IDE at that moment. Nikita Popov 2:31 At least in PhpStorm, there is a feature where you can enable these argument labels for constants typically only. This would basically move this particular information into the language, but I should say that of course this is not the only advantage of having named parameters. So making code more self documenting is one aspect, but there are a couple couple more of them. I think one important one is that you can skip default values. So if you have a function that has many optional arguments, and you only want to say change the last one, then right now you actually have to pass all the arguments before the last one as well and you have to know: Well, what is the correct default value to pass there, even though you don't really care about it. Derick Rethans 3:19 If I remember correctly, there are a few functions in PHP's standard library, where you cannot actually replicate the default value with specifying an argument value, because they have this really complex and weird kind of behaviour. Nikita Popov 3:33 That's true, but that's something we're trying to eliminate in PHP eight mostly. Derick Rethans 3:39 And of course additional you'd never have to remember, whether in_array and array_search have needle or haystack first, which is also beneficial. Nikita Popov 3:46 That's true. Yeah. Derick Rethans 3:48 You mentioned that there are a few other benefits as well. You mentioned self documenting and the skipping of arguments, what other benefits are there? Nikita Popov 3:54 The other part is that you can also reorder the parameters. So this varies a little bit by language. In some languages you're required to still pass the arguments in the same order. They were declared, even if you're using name parameters. But for the purposes of PHP, you would allow passing them in arbitrary order. Just like you said you don't have to remember if the haystack is first, or the needle comes first. And I think one case where all of these benefits, play together particularly well, is when it comes to object construction. So you already mentioned that we have the constructor promotion RFC in PHP eight, which makes it pretty simple to declare value objects. So you just list all the available properties and their default values and types, the constructor and you're done. But when you actually instantiate the object, you still have to, their ergonomics are not particularly good, because you have to remember in which order you have to pass the parameters, don't really know which parameters which just looking at the call. And once again, you have to specify everything and you can't just skip a few of them with default values. And if you have like a constructor with maybe five or six arguments coming in, which is maybe unusual for normal methods, but I think somewhat normal for constructors in particular, then the current development experience there is just not very nice. And named parameters would essentially provide us something akin to an object initialization syntax which is available in many other languages, and which has also been proposed for PHP, previously. But you would get this just as a side effect of combining constructors and named parameters, without having to define any kind of special semantics for how object construction works, and how initializer syntax interacts with constructors and so on. Derick Rethans 5:55 That ties in again with the object ergonomics that I spoke about with Larry earlier this season as well. Nikita Popov 6:01 Yeah, I believe that this combination of ,constructor promotion and named parameters for constructors was one of the things. Derick Rethans 6:10 We've spoken a little bit about what it is. Now, how would you use this in PHP, what is the syntax for that you're proposing? Nikita Popov 6:18 I mean syntax is always bike shedding question. The particular one, I am proposing for now is to save the parameter name as literal, so no dollar in front of it or something. And the colon and the value you want to pass. Derick Rethans 6:35 Is there any precedence for this syntax already, either in PHP or outside of PHP? Nikita Popov 6:41 In PHP, not really. I mean, PHP, we usually use the double arrow to have any kind of key value mapping. This is sort of key value mapping. In other languages, yes the syntax does exist. I'm actually not sure which languages exactly use it. Probably C sharp and Kotlin. Python uses just an equal sign. Well, there are a couple who use it. I actually initially use the double arrow syntax because it's more familiar with PHP, but I found that it's, there's not really read as nicely. And I also have some ideas on how we can, like, integrate this colon syntax, into the language in a more consistent way. Derick Rethans 7:27 I think I saw in the RFC that the only said the only way how you can do the keys is by literal and not by a variable. Nikita Popov 7:34 That's right. This is mainly just to avoid confusion. Well if you allow specifying a variable, then the question is, well, is this variable just the parameter name? Because I mean the signature, you also write this as a variable, or is it the variable that contains the parameter name like variable variables in PHP. So I think to sidestep that confusion, we just allow identifiers, but you can still use a variable parameter names from the argument unpacking syntax. Derick Rethans 8:04 How does that work? Nikita Popov 8:05 So PHP supports the three dots, the ellipsis operator, both in the function declaration, and for function calls. The declaration that just means collect all the trailing arguments. And the call, at the call, means that you get an array, and the elements of this array should be interpreted as function arguments. And parameters extend that by also allowing array keys. And if you unpack an array with string keys then those will be interpreted as parameter names, and we'll use the usual named parameters passing semantics. Derick Rethans 8:47 Interesting. I actually missed that, while reading the RFC. To be fair, I skimmed it, not really read tit. Yeah it's good to see that actually. Now people currently use positional arguments and not named arguments. How would these two interact. Nikita Popov 9:01 Mostly, the named parameters are just syntax for positional arguments, so we perform an internal transformation to convert named parameters into positional parameters. As far as both the engine is concerned and the callee is concerned. They don't really know about parameters that's all. They see usual positional call where all the missing arguments have been filled in with default values. I think the only part to watch out for there is exactly this case of variadics, because previously, the variadic parameter could only contain a list of arguments, and now it can also have string keys, or like left over named parameters. So which did not have a matching argument in the function signature so both will now get collected to the variadic parameter. Think that's like the only case where I know that the calling convention really changes for the recipient of the arguments. Derick Rethans 10:02 Because otherwise got a normal array they now get a bunch of things with potentially having keys in there as well. What would happen if I specify a named argument by name and also include it into the variadics? Nikita Popov 10:15 So generally the rule is always you can pass a parameter at most once you can have the situation where you first pass some positional arguments, and then you pass named arguments. If you do that this named argument cannot clash with the previous past positional argument, if you run in this kind of situation we will always throw an exception at that point. So you're not allowed to overwrite the previous argument, or something like that. Derick Rethans 10:42 Same would work that if a method would collect named arguments and also have the variadics array. In case you specify more arguments then the function would take. And, in the variadics you'd have that name again that would have already clashed before it even gets turned into variadic. Are the names that she gives to named arguments are case sensitive or case insensitive? Nikita Popov 11:04 They are case sensitive. Because the parameters you specify in the function are just variables and variables in PHP are case sensitive as well. Derick Rethans 11:14 At the moment if you inherit a method in a inheriting class, then it doesn't particularly matter what the names of these method arguments are. When you get now named arguments, is this going to change, because at the moment PHP doesn't enforce that the names of inheriting methods are of course clashing, or the same as the ones that are overriding in the parent class? Nikita Popov 11:37 This is one of the bigger open questions we have. The problem is that if you call a method with the names from the parent class, and the child class change them, then you'll get an error because this named parameter just doesn't exist in the child class. And there are a couple of ways to approach that one is to forbid during inheritance, any kind of parameter name changes, which would be a fairly significant backwards break because well, it never mattered in the past and based on some cursory analysis, this is like parameter name changes, somewhat common in code right now. The other possibility is to just ignore this issue, expect that a lot of code is never going to use name parameters. So using the parameters only makes sense with some types of methods. If you have a method that only accepts one argument can be pretty sure that no one's going to call it that has a name parameter, and there is the option of just ignoring this issue and fixing it as it comes up, more or less. Which is maybe not the most principled approach. But if we look at other languages that do make heavy use of parameters for example like Python. And we see that they also just ignore the problem. So it looks like in practice this does work out. Of course, a significant difference there is that Python has had in parameters for a long time already. We will be retrofitting them on an old language. So the situation is somewhat different and probably rather than more dangerous for us. Derick Rethans 13:14 This is something of course that static analysis tools can check for quite easily and I would argue that they probably should start doing that as well. Nikita Popov 13:22 This this right, so this is both something easy to check for, and also easy to automatically fix. Derick Rethans 13:28 Except that you need to choose which one is the correct name, of course. Nikita Popov 13:32 Yeah, that's right. But there is one more possibility, which is to allow the parameter names from both the parent method, and the child method. This will be like more or less a transparent way to fix that issue. The only problem you can run into this if both the parent method and the child method use the same parameter name but in a different position. If we would go with this option then we say that only in this particular case where parameter name is reused but different position that would become an inheritance error. Derick Rethans 14:04 I quite like that actually, because that's a pragmatic approach isn't it? Nikita Popov 14:07 I also quite like it, maybe it's just technically a bit problematic. Derick Rethans 14:11 I can already imagine that if this gets accepted for PHP eight, which of course not sure at the moment, that Xdebug is going to have to show the variadics already with the names array elements which of course it doesn't do yet because it has no notion of. But that's good to know to have a heads up on these things. PHP eight has already seen quite a lot of work for internal methods to get their names properly, recorded as well, so that types of stubs that you have already been working on. How does named arguments tie in with this? Nikita Popov 14:38 The actual named arguments proposal is already pretty old. It dates back to PHP 5.6, I think, and one of the open questions since then was how we handle internal control functions, because they don't really have a notion of default values. We have optional parameters, but the default value is not known to the engine, it's only known to the implementation. There are kind of ways to work around that. They are not really safe, so they will work for most functions, but for some which who like argument context, we might end up just crashing if this function is used with named parameters and particularly weird way. One of the nice things in PHP eight is that thanks to the stub effort we actually have default values for functions available as collectible meta data so it's available for reflection, and we will would also be able to use this for named parameters. If an internal function parameter has been skipped, we can essentially fetch it from reflection and fill in the value, the same way we would do for for normal user functions. The issue there is that this only works if there are stubs available. This works for all of our internal functions. I mean, not internal but bundled functions for PHP, but it will not work out of the box with old extensions. So it will mostly work, just this kind of parameter skipping is not going to work. So it will give you an error like okay we don't have default information for this function so you can't call it like this. Derick Rethans 16:17 There's this common myth saying that reflection is actually a very slow thing, you should never use this in your code. Is this going to be a concern for using reflection information this way for internal functions? Nikita Popov 16:29 Well, I mean the self like you will be directly using reflection, but internal API's that do the same thing. There is a performance concern here because we store the default values, not as values but as strings. So, in the worst case we actually have to parse those strings, convert them into a syntax tree, validate the syntax tree. That's all. That's of course slow, but it's not like we can't add a bit of caching in there to make sure this only happens once, at which point the problem should be avoided. Derick Rethans 17:02 Especially when you use things like opcache. Nikita Popov 17:04 I should say that I do expect name parameter calls to be generally slower than positional calls, so maybe in super performance critical code you would stick with the positional arguments. Derick Rethans 17:16 I mean it would work perfectly well so far object construction still right? Nikita Popov 17:19 For object construction the real cost is really in the object allocations so and so. Derick Rethans 17:24 With the introduction of named arguments aren't going to be any BC breaks, potentially? Nikita Popov 17:29 There are not going to be any direct BC breaks, but there are of course some concerns. The first one is the change I mentioned about the variadics. That variadics can now have string keys. But I should clarify what I mean by: no, no, BC breaks. If you don't use named arguments than nothing is going to break. But of course, if named arguments are used with code that did not expect them, then we can run into some issues. So that's one of the issues. And the other one is more of a like long term maintenance concern that if we introduce named parameters, then those parameters become significant to the API, which means you cannot rename parameter names in minor versions of a library if you're semver compatible. Because, you might be breaking some codes on using those parameter names. And I think one of the biggest concerns that has come up in the discussion is that this is a significant increase in the API burden for open source libraries. Derick Rethans 18:34 Because now suddenly, they have to think about the names of the arguments to all their methods as well, right. Nikita Popov 18:39 So I think, like, the merits of this proposal, mostly comes down to how much additional burden does this impose on people maintaining libraries versus how much like ergonomics improvements that we get out of the feature for everyone else. One more thing to consider is that named parameters really change how you design APIs or what APIs you can reasonably design. So right now if you have a method with, for example, three boolean arguments, that would be like a really horrible method, because you call it like, true, true, false, like what does this mean? If you have name parameters, and you have the same three boolean arguments, then it's not really a problem any more. So you can, of course, you say, what the argument means and you can leave out arguments that are that you don't want to modify. Derick Rethans 19:30 You mentioned that this RFC is quite old already. Do you think this will make it into PHP eight, as we're getting closer and closer to feature freeze, we're not quite there yet we have another month or so to go. Do you think it's ready enough to throw to the lions, so to speak? Nikita Popov 19:46 So I think I will at least give it a try, because I do think that PHP eight is a good target for such a change. Even though it nominally does not break backwards compatibility, it does have a very significant impact in practice, so it wouldn't be good to put this on a major version. And additionally, we also did all this work on stubs in PHP eight with this it'll also fits in very well. Oh, and finally, one thing I didn't mention before is that we get attributes in PHP eight. And attributes, firstly, replace the existing Doctrine annocation system, which already supports named parameter. For all the code that is now going to migrate from Doctrine Annotations to PHP Attributes, it would be helpful if we had named parameters, because it would make the migration a lot more straightforward, because you don't also have to change the meaning of the arguments at the same time. Derick Rethans 20:51 I'm curious to see what the reception of this will be, especially when it is going to be voted for. Nikita Popov 20:57 Yeah me as well. I never did get this to voting, the last time around, but we should at least get a vote this time and well if it doesn't go through then there is always next time. Derick Rethans 21:10 there's always next time yes. Okay Nikita Thank you for taking the time this morning to talk to me about named arguments. Nikita Popov 21:17 Thanks for having me Derick. Derick Rethans 21:20 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Named Parameters Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 58: Non-Capturing Catches London, UK Thursday, June 18th 2020, 09:21 BST In this episode of "PHP Internals News" I chat with Max Semenik (GitHub) about the Non-Capturing Catches RFC that he's worked on, and that's been accepted for PHP 8, as well as about bundling, or not, of extensions. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:18 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 58. Today I'm talking with Max Semenik about an RFC that is proposed called non capturing catches. Hello Max, would you please introduce yourself. Max Semenik 0:38 Hi Derick. I'm an open source developer, working mostly on MediaWiki. So that's how I came to be interested in contributing to PHP. Derick Rethans 0:50 Have you been working with MediaWiki for a long time? Max Semenik 0:53 Something like 11 years, I guess. Derick Rethans 0:56 That sounds like a long time to me. The RFC that you've made. What is the problem that is trying to address? Max Semenik 1:03 In current PHP, you have to specify a variable for exceptions you catch, even if I you don't need to use this variable in your code, and I'm proposing to change it to allow people to just specify an exception type. Derick Rethans 1:20 At the moment, the way how you catch an exception is by using catch, opening parenthesis, exception class, variable, and you're saying that you don't have to do the name of the variable any more. I get that right? Max Semenik 1:33 Yes. Derick Rethans 1:34 Is that pretty much the only change that this is making? Max Semenik 1:38 Yes, it's a very small, and well defined RFC. I just wanted to do something small, as my start to contributing to PHP. Derick Rethans 1:51 I'm reading the RFC, it states also that the what used to be an earlier RFC. How does that differ from the one that you've proposed? Max Semenik 2:00 The previous RFC wanted to also permit a blanket catching of exceptions, as in anything. And that's all, which, understandably, has caused some objections from the PHP community. While most people commented positively on the part that I'm proposing now. Or should I say really propose because the RFC, passed and was merged yesterday. Derick Rethans 2:35 I had forgotten about it actually, it's good that you reminded me. So yeah, it got merged and ready for PHP eight. Basically what you say you picked the non controversial parts of an early RFC? Max Semenik 2:47 I actually chose something to contribute and then looked for an RFC, to see if it was discussed previously. Derick Rethans 2:55 Oh, I see. So, your primary idea of wanting to contribute to PHP, instead of you having an itch that you wanted to scratch, it's like you're saying? Max Semenik 3:04 I have way larger itches that I will scratch later when I will learn how to work with PHP's code base which, which is really huge. Derick Rethans 3:16 That makes some sense I suppose. When looking at the vote for the RFC I actually couldn't see that you had voted it for yourself. I missed something? Max Semenik 3:25 I don't have a php.net account so I can't vote for myself, obviously. Derick Rethans 3:31 I actually think you can because you have written an RFC. Max Semenik 3:35 I haven't seen any interface to vote. Derick Rethans 3:38 Interesting. It's actually something to catch up on because I pretty much sure that you can. Should investigate that for some other RFCs that are still open because I think you should be able to. Max Semenik 3:49 Would benice. I mean, this wouldn't change anything but.. Derick Rethans 3:54 That's true but I mean you've started contributing. If you be able to vote right that's the fair thing to do, I suppose. So as you said, this is your first contribution to PHP itself. How did you find the whole process of getting this going and getting started with it? Max Semenik 4:10 As far running an RFC, it was fairly straightforward to me. Maybe because I was looking at PHP RFCs in the past, so I knew how the process worked and it was really something that I already knew how to navigate. It's not the first open source community I'm contributing to, so I kind of know what to do in general. Derick Rethans 4:40 How large is the MediaWiki community? Max Semenik 4:43 It's probably larger than PHP community in terms of actively contributing people, as in which the Wikimedia Foundation has lots of paid programmers that work on the ecosystem. Obviously the outreach of your community is larger than MediaWiki's. Derick Rethans 5:08 You're saying that there's more people working on, on it. But there's more people using PHP? Max Semenik 5:15 And more people actively interested in development. Derick Rethans 5:21 Do you think that's because it's easier to contribute to something that's written in PHP, than PHP itself? Max Semenik 5:28 Not a lot of people know how to program in C these days. And while I used to be paid for writing C, my C's currently extremely rusty. Unlike PHP, for example. Derick Rethans 5:44 For me it's sort of the other way around, because I haven't been writing PHP code for quite some time now, except for some test cases, so I know nothing about frameworks whatsoever. I know C pretty well. In any case, we now have one more active contributor, that is you, that is you. You've things merged that makes you a contributor, in my eyes. As this is a pretty small RFC. And I think during the course of the last few months we have I've discussed with several other contributors that small RFCs are a good thing, because it makes it much harder to find problems with. There are a few other RFCs as well that are also so small and for which the authors declined to talk to me about that for various different reasons. And two of those are actually really really simple things, and they are both having to do with the bundling of extensions in PHP. Now, just thinking about this question. How does MediaWiki, for example, think about which extensions, it can use in its source code? Max Semenik 6:45 For MediaWiki. First of all, on start-up MediaWiki quickly checks if all the hard required extensions are available, and they just bails out if they aren't available. I need to look, whether it checks for JSON or as soon as it's way too obvious to even consider whether it's present or not. Derick Rethans 7:10 So you just mentioned the JSON extension. That makes sense because that's one of my notes. One of the RFCs as you just alluded to is to JSON extension, and PHP eight will have this always available now without you having to enable this in configure flags, which is pretty good way of making sure that extension is always available to everybody using PHP. Do you agree that having a JSON extension always available is a good idea for PHP? Max Semenik 7:37 Yes absolutely. One of the aspects of writing software that's available for everyone to use, as opposed to some internal company software that's running on a few servers and that it, is that the you need to support a wide variety of systems. And if it's possible to compile PHP without JSON, it means that someone will compile without it. It also means that some Linux distribution developers will package it as a separate package, and then someone will not install it, and you will get people to complain that MediaWiki doesn't work on their system. For more, very popular extensions are available. If I will know that many popular extensions that I need, are always available, it makes my job easier and it also allows me to write better software, without having to resort to hacks and decrease the functionality. Derick Rethans 8:52 An what some other framework to do this they start making polyfills for them. Max Semenik 8:56 And these polyfills might have vital like orders of magnitude worse performance. If I can have guarantees that a system has JSON, as well as other extensions like mbstring, intl, and so on, it would be really awesome. Derick Rethans 9:16 The argument always between, do we always want to have everything inside PHP or not, and at some point you need to start making a distinction about is this useful enough for everybody or just for a smaller group of people, and mbstring is probably an example where this is sort of, sort of on the line right. I mean it's useful enough, but is it useful enough to have it always enabled instead of having it easily installed as a package. Max Semenik 9:42 Well you know lots of people are running software, whether it's MediaWiki whether it's some WordPress or something else on crappy shared hosting, which is the bane of every programmer's existence but they still have to support it. The question is really something can be messed up. Some people will have to run a node on systems that have messed up. And if we can avoid it. Why not? Derick Rethans 10:11 Another RFC that's just gone through its unbundling extension. Some versions of PHP will have extensions, being brought into core and being always made available like we did with the hash extension in PHP seven four. But of course we also removing extensions from PHP to live somewhere else. Not even having them always enabled but not even having them distributed with a PHP source code. In PHP seven four we had for example the Firebase extension, I believe, because there wasn't a lot of people using this. In this case we having the XMLRPC extension. Have you ever heard of this XMLRPC extension, because you said you've been programming PHP for a while? Max Semenik 10:51 I've heard about the protocol itself and I might have heard about PHP having this extension, but I've never used it, and honestly I don't know why anyone using it. Derick Rethans 11:04 It's sort of being used a little bit when people really didn't want to use SOAP, because it was too complicated. But before we had invented JSON pretty much. That's a long long time ago. Max Semenik 11:18 These days. XMLRPC is sounds like a legacy corporate system. That's why probably, it's no use having it in PHP proper. Derick Rethans 11:32 I think I very much agree there. In any case, non capturing caches are in PHP eight. You said that the RFC was saccepted, has the patch being merged as well. Max Semenik 11:41 Yep. Derick Rethans 11:42 Great. I'm going to have to have a flavour that I'm going to give a talk next month for the Dutch PHP conference, where I'm talking about a new additions in seven four, but also what's coming up in eight dot zero, I might be able to have a slide about it in there. Max Semenik 11:57 Awesome. Derick Rethans 11:58 Thank you, Max for taking the time today to talk to me about non caption captures and bundling of extensions. Max Semenik 12:05 Thank you, Derick for giving me this tribune. It was a nice talk. Derick Rethans 12:09 Excellent. Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Non-Capturing Catches RFC: Always available JSON extension RFC: Unbundle ext/xmlrpc Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 57: Conditional Codeflow Statements London, UK Thursday, June 11th 2020, 09:20 BST In this episode of "PHP Internals News" I chat with Ralph Schindler (Twitter, GitHub, Blog) about the Conditional Return, Break, and Continue Statements RFC that he's proposed. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:17 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 57. Today I'm talking with Raluphl Schindler about an RFC that he's proposing titled "Conditional return break and continue statements". Hi Ralph, would you please introduce yourself. Ralph Schindler 0:37 Hey, thanks for having me Derick. I am Ralph Schindler, just to give you a guess the 50,000 foot view of who I am. I've been doing PHP for 22 years now. Ever since the PHP three days, I worked in a number of companies in the industry. Before I broke out into the sort of knowing other PHP developers I was a solo practitioner. After that I went worked for three Comm. And that was kind of a big corporation after that I moved to Zend. I worked in the framework team at Zend and then after that, I worked for another company based out of Austin for friend of mine Josh Butts. That offers.com, we've been purchased since then by Ziff media. I'm still kind of in the corporate world. Ziff media owns some things you might have heard of, PC Magazine, Mashable, offers.com. The company that owns us owns is called j two they are j facts. They keep buying companies, so it's interesting I get to see a lot of different products and companies they get bought and they kind of get folded into the umbrella, and it's, it's an interesting place to work. I really enjoy it. Derick Rethans 1:39 Very different from my non enterprise gigs Ralph Schindler 1:43 Enterprise is such an abstract word, and, you know, it's kind of everybody's got different experiences with it. Derick Rethans 1:49 Let's dive straight into this RFC that you're proposing. What is the problem that this RFC is trying to solve? Ralph Schindler 1:54 This is actually kind of the bulk of what I want to talk about, because the actual implementation of it all is is extremely small. As it turns out it's kind of a heated and divided topic, My Twitter blew up last weekend after I tweeted it out, and some other people retweeted it so it's probably interesting. I really had to sit down and think about this one question you've got is what is it trying to solve. First and foremost, it's something I've wanted for a really long time, a couple years. Two weekends ago I sat down and it was a Saturday and I'm like, you know what I haven't haven't hacked on the PHP source in such a long time. The last thing I did was the colon colon class thing, and I was like seven or eight years ago. And again, I got into that because I really wanted the challenge of like digging into the lexer and all that stuff and, incidentally, you know, I load PHP source in Xcode, and my workflow is: I like to set breakpoints in things, and I like to run something, and I look in the memory and I see what's going on and that's how I learned about things. And so I wanted to do that again. And this seemed like a small enough project where I could say, you know this is something I want to see in language, let me see if I can hack it out. First and foremost, I want this. And, you know, that's, it's a simple thing. So what is it exactly is, it's basically at the statement level of PHP, it is a what they like to call a compound syntactic unit. Something that changes the statement in a way that I think probably facilitates more meaning and intent, and sometimes, not always, it'll do that and fewer lines of code. To kind of expand on that, this is a bit of a joke but a couple years ago there was that whole argument online about visual debt. I don't know if you remember hearing that, that terminology. Derick Rethans 3:34 Yep. Ralph Schindler 4:47 «transcript missing, sorry» Derick Rethans 6:28 Up to now we haven't spoken about but the RFC is proposing so maybe we should talk about it first and then get back to other things that he said have you spoken a little bit about the reasons why you want to change something. But what would you like to add to PHP or, or what would you like to modify in PHP? Ralph Schindler 6:46 It's, you know, it's, it's very closely related to what in computer science is called a guard clause, and I used that phrase lightly when I originally brought it up on the mailing list but it's very closely aligned to that, it's not necessarily exactly that, in terms of the syntax. In terms of like when you speak about it in the PHP code sense, it really is sort of a change in the statement; so putting the return before the if. That's really what it is. So guard clause, it's important to know what that is, is it's a way to interrupt the flow of control, you know, over the history of programming languages. Ralph Schindler 7:19 Let's just go back to Pascal. Pascal like 50 years ago, there was no opportunity in Pascal code to exit early from either a loop, or a method, so you had to wait until you got to the very final sort of statement, and there was a single exit from a function. Guard clauses allow you to effectively, if you're inside of a block of code, or a loop, or some kind of flow of control. It gives you an opportunity to say I want to exit here instead of continuing on. They did a whole bunch of studies on Pascal and they found out that students were like, they couldn't come up with the right solution when let's say if you had a loop statement, it had to execute 100 times there was no opportunity to get out early. When you gave them the opportunity to interrupt the flow control the correctness of their solutions, ultimately got better. Almost 100% of the time they were able to, you know what this is an exceptional piece of code, I want to exit here. Fast forward guard clauses, they're kind of, if you've kind of followed the Kent Becks and the Martin Fowlers they would argue for guard clauses. Y'know over the line that's gotten more popular as an argument over the past, let's just say 15 years in our industry Derick Rethans 8:23 Would another term for this be like an early return? Ralph Schindler Early returns are one of them, early breaks, and early continues, so getting to a place in code where you just say you know what this, there's a particular condition, in this normal flow of execution, I want to stop that normal flow and I want to break out of it. Goto is another tool that allows you to do this. I don't know if you can do it inside of loops, maybe you can. There's like some exceptions in PHP where you can jump to and from, Derick Rethans You can jump out of loop, but you can't jump into one. Ralph Schindler To some degree, these tools do sort of exist, goto, another heated topic in the PHP world. So getting back to what the guard clause is. More specifically, it's, it is very closely, and semantically aligned with a Boolean expression. You will generally say, I want to either return, break, or continue, based off of this Boolean. PHP itself does not have first class support for guards. The way we achieve it currently is, we will put the Boolean expression first, and as part of a block of code associated with that, so: if curly brace block of code, that might terminate in a early return. Inside of switch statements or loops, you'll see that if something something something continue one continue two, or break one break two. Return expression, break continue, along with a return or break expression, is the way we achieve it in PHP. This is kind of giving first class support to a guard clause. It would spell it out in the manual and it would be a tool that since it has a name, and it isn't the language, programmers could reach out and say, I know what that is, or: Here's what it is in the manual, how do I use that? That's kind of, you know what a guard clause is. Derick Rethans At the moment, if you mentioned the guard clause you can sort of implement by doing: if, your condition and then a curly braces return, or break, or continue, whatever you set. What is the syntax that you want to replace this with? Ralph Schindler I don't want to replace syntax. PHP is a flexible language. We have multiple ways of doing lots of things. We have multiple ways of crafting closures and anonymous functions. We have two different ways that have existed since the beginning of PHP's time for doing if statements, one can be broken up by the, the semicolon, with the block the endif, or you can do with curly braces. You've noticed that with various PSRs and whatnot that people have gravitated towards a particular coding standard. And that, for all intents and purposes for the global community of programmers to have the shared diction, that's a good thing. Ralph Schindler 10:50 With regards to PHP. So the most important characteristic of this RFC is that it is now, PHP is a left to right language, you know like much of the 90-95% of the speaking world left to right. They tend to put the emphasis, especially encoding of precedence on the left side. So this moves the return keyword to the left side of a statement or syntactic unit. So when you look at this code. The first thing you see is: return. In the variation one, which is the one I proposed of this, this feature, "return" is followed by "if", what you notice is that when you look at code you'll see "return if", and almost looks like its own key word. Those two individual, you know tokens, those key words must align themselves closely together exactly. You know, maybe there's like two spaces between them but return if are right next to each other, they can be treated almost as a new keyword and of itself. So as you're reading code top down, left aligned, you'll see return if, return if, finally at the bottom method, you'll see return. So that's variation one and what it does is it creates sort of this precedence that the keywords you know the static constant keywords return an effort first. Your expression is third. Your optional return value is fourth. In most of the cases where you're writing this, it does become a one liner. That's not to say we can't do one liners today, because you can do: if, if-expression, something, return. But what happens when you look at that code is that the return value is off to the right. Optionally if you don't, if you want to break outside of the PSR coding standards, or with the PSR coding standards. You can do curly braces and then put the return on the next line, now you got three lines of code, you've returned is indented. As you're visually approaching this code. See, you know what's most important to you is that there's a if statement there, but then you have to kind of scan the body of that to see if there's an early return. The fact that it's an early return in variation one becomes abundantly clear at the leftmost rail of the code, at the leftmost side of the statement, assuming you're not putting all of your code on one line. Derick Rethans 12:59 You talk about variation one, I guess there's a variation two as well. What is the difference between them? Ralph Schindler 13:05 As with RFCs, people have preferences and they have. Just with politics in general, if you're in a political position, which this is a political changes to PHP, you have to know where your constellations are. You have to know, basically, if I want to appease the most amount of people like what will I have to give up in order to get something that is still beneficial to me. For me right now, it is the compromised position. That's not to say I won't like it more, maybe a month from now on, but effectively the variation two is moving the optional return value after the Return. Return, optional return value, then the if, i f, and then the optional, not the non optional if expression, followed by the semicolon. So basically it would read more like English, so to speak. Return this, if this. What I understand it is that way in Perl. I know it's that way in Ruby. So Ruby follows the same thing because the way they've implemented it is not necessarily in a single statement they've, they've implemented what they call a statement modifiers, which is any statement can be modified with this conditional at the end of it. That's the alternative syntax. If I were to use this, I get value out of it because maybe I don't return an optional expression and then I'm still left with return if this. I still have my escape hatch for methods that have an optional return, the ability to return void. Derick Rethans 14:26 In variation one, how do you separate out the condition with the optional return value? Ralph Schindler 14:32 Another reason why I thought variation one was good for PHP specifically. Let's just do like two seconds of history. If you go back 20 years, C++, the way you write a method signature in C++ is: you'll do public, int, method name, typed arguments, so the return, we call them, hints, the hint for the method in C++ precedes the method. Derick Rethans 14:55 I've just been talking to Dan Ackroyd for the podcast episode that came out last week, where he is saying that we should stop calling it hints, because they're no longer hints, they're not proper type names. Maybe we should pick that up here as well than? Ralph Schindler 15:10 We've had that discussion for 10 years now. But people know them as hints. We've such loaded phrasing and PHP like type coercion. Whatever we call them, I'll just continue with hints for the time being, because that's the audience at this particular podcast knows them as hints. The hint in C++ would have been all the way to the left of the line, whereas in PHP when we chose to implement typing of the return values, we did it in a way where it was the method signature had the semi colon and the return type at the end of the method signature. This particular variation one, this follows that same pattern, where your semi colon return value looks exactly how the layout of the method signature is where it's semi colon, what you see up top. There's a big parallel there between an early return with an optional return value. Also, I like optional things to be at the end. And when you look at this whole statement that's the optional part, whereas when variation two the optional part being in the middle means return optional part if, or return if are both valid things. So parallel is the method signature. That was kind of why I personally like the first one. They're both my children at this point I love them both. Derick Rethans 16:20 As you said, introducing syntax is always a bit tricky and it's a political choice. What has been sort of the feedback and, and or the criticisms, to your suggested that additional language constructs? Ralph Schindler 16:33 Smallest changes always get the most feedback, because there's such a wide audience for a change like this, like they can immediately see the benefits or negative value of it in their own code, all the way from the junior programmer, all the way up to the senior programmer, I can't quantify who's Junior new senior, I can't also quantify who has been programming a long time and it was, for lack of a better term set in their ways and likes their style versus those who have adopted a certain flexibility in the way that they develop and like the size of the team they're on and how much of a leniency they put on someone else to write code that they will just you know code review and accept. So the interesting thing is that you have to kind of understand Junior programmers, or senior programmers. When the junior programmer gets in there, and they start programming, they tend to write code that is very brute force, they just write a lot of code because in order to get better at writing code you just keep writing code. To them, their perspective is from the code writing standpoint, they're not looking at this from a code reading standpoint, they're looking at it from a writing standpoint. So when you see a junior programmer they rely on ifs and loops and like the rudimentary techniques, less abstraction, fewer methods, more lines of code. They tend to not break things out into well equipped to well named methods. Whereas as they grow as programmers they start reading other people's code more and then they do start appreciating abstraction like this 50 line thing needs to be a five line thing. It needs to have its own name as a method over here, I need to reduce the number of inputs, have a very specific outputs, so on and so forth. So it's more highly structured code. Putting a feature out, you know like this, you get a range of perspectives from people. It goes without saying. I mean, Taylor retweeted it, I know he has a preference for this style of programming. I know exactly where it came from. He appreciates certain things in like the Ruby world, the return if statements in Ruby is a clear, concise, and very impactful statement, and too much of a degree he's, he's implemented that same thing in Laravel. So if you look at the helper methods in Laravel someone that writes Laravel applications is used to using something like abort if, or throw if. Interesting side note here, PHP is going to have a feature where you can put a throw expression, following a ternary. That in and of itself, allows exceptions to have a much more concise syntax. It allows you to use PHP exceptions for flow control. So you still can't do that with a return value for example, you can't have it a ternary with a return value. And I guess that is another way of being able to do achieve the same thing. This idiom, of being able to going back to guard clauses, and going back to thinking about early exits of methods, this was prevalent in Laravel where you could say in a controller method, and this is specific to an HTTP context, because you're inside of a controller, abort if, abort is highly specific to HTTP, where are you going to return a 404 or 500, it's going to throw an exception, an HTTP exception, which the framework knows to convert these kinds of exceptions into error paths in an application. So again we're still talking about application code, not necessarily library code. So abort if and abort unless is an idiom that I've seen is a fantastic idiom for controllers. I mean you can when you're thinking about a request which PHP is highly request driven, you can see when I start this method with the request object, you know, these are all my early outs, you know, this is where I'm going to return, and then at the very final spot I might be returning a view, which is a successful page for this MVC application. I feel like it was a successful idiom there and that was also part of the reason that drove me say, you know, it would be neat. If I could just say, return response, if this condition and have that early out. Derick Rethans 20:12 What's been the biggest criticism so far? Ralph Schindler 20:15 Biggest criticism is we can already do this. See, I hear that all the time, with all sorts of other features to varying levels varying degrees. I can do this with if something return something early. I said earlier that the proposed syntax might not be shorter and that's true. It is just changing the order of the operators, or the order of the keywords but, you know, that's an important distinction, like I want the precedence of the return to be earlier in the line. I think that's the important distinction. And I feel like maybe people that are saying it doesn't reduce the amount of code need to take that into account. And it's hard to see it really take that into account, unless you see variations of this sort of mental model of code. That's on me. I've been taking all the sort of like criticism, I'm kind of in a cooldown phase right now. I've been looking, I've been watching Twitter, I've been watching the Reddit. It's generally cooled down on internals mailing list, and I'm just kind of thinking about it because going back to likening this to a political sort of thing is that I have to rephrase my argument so that people that have a very firm stance on: I don't like this because I don't like it, or I don't like this because it doesn't shorten my code. I have to find an argument that gets them to start thinking about why this might be a good thing. I understand like this might get shot down in PHP. Right now, if I was a betting man, we were in Vegas, and someone asked me: Do you think this is going to go through, I probably would have to bet against myself I think 40-60. The temperature that I've taken on internals and everywhere else seems to indicate that it wouldn't be successful, but I'm collecting my evidence right now and putting out a blog post that kind of explains why it is, what it is, and putting a better argument forward. If that can't push it over the threshold, you know, I'll accept the defeat, so to speak, look at the history of PHP: annotations, and whatever they were called attributes, eight years ago were shot down. And, interestingly, I use the annotations back in the day with doctrine, I'd no longer use doctrine. So I voted to accept them. I might have voted to not accept them eight years ago, and I voted to accept them now, even though I don't use a variation of that any more. Derick Rethans 22:15 There's a few things that keep changing over time, right, first of all people turn from junior programmers into senior programmers, so they think about how to structure code more and more. And at the same time they also start seeing the value of some things that PHP never had right and. A good example is the scalar typing, that's been spoken about for maybe 15 years even, and it took so many different approaches, and as you say attributes, although attribute is a little bit different because this RFC is absolutely not the same as the earlier ones where the implementation is quite different from the version one then end up solving lots of problems that people found with the original RFC. Ralph Schindler 22:53 I have not been part of sort of the global PHP community. I started in the mid, 2000s. And having worked with PHP since 1998. I remember the early days where PHP was not fast at all. It was as fast as other things, but I gravitated towards it because I liked the syntax. Back in that day, I would have had more of an emphasis on things that would run faster, regardless of how they look because, I had projects for example in college I wrote a program where kids would go up and like on Valentine's Day, put all their preferences in. That was a week leading into Valentine's Day, and then on Valentine's Day they could come back to the University Center, and get a printout of all the other people that have fill out the questionnaire, and matched. When you have 1000 people fill out a questionnaire, this was PHP in 2000, 99 on 2000. And when I tell you, it took hours for the script to run and calculate all of the matches for a person, changing just the way an if statement would run, or changing the way you early exited an if statement when you know that you had to filter out a person. It changed the output by hours. The code was very, very closely aligned to like the performance, whereas now, PHP eight: I don't think that we have so many more affordances. You don't have to think about: Should I interpolate strings inside of a single quote or double quote, like none of that matters any more. We've solved all those problems. You can call sprint off just as quickly as you can do an echo, echo out and no one really cares, it's gonna perform the same. Wasn't the case 20 years ago, it is the case now, so now we have this affordance where we can look at the, you know, for lack of a better term, you know, is the code pretty, like is it easy to read. Derick Rethans 24:32 Thank you all for taking the time this afternoon, or in your case morning, I think, to talk to me about your RFC. I'm looking forward to seeing this coming to vote at some point. Ralph Schindler 24:43 I appreciate you having me on the, on your podcast. Thank you. Derick Rethans 24:47 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Conditional Return, Break, and Continue Statements Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 56: Mixed Type v2 London, UK Thursday, June 4th 2020, 09:19 BST In this episode of "PHP Internals News" I chat with Dan Ackroyd (Twitter, GitHub) about the Mixed Type v2 RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:20 Weekly a podcast dedicated to demystifying the development of the PHP language. This is Episode 56. Today I'm talking with Dan Ackroyd about an RFC that he's made together with Mate Kocsic it's called the mixed type version two. Hello, Dan, would you please introduce yourself? Dan Ackroyd 0:38 Hi Derick. So my name is Dan Ackroyd, also known as Dan Ack online. I maintain the PHP image extension. And I also contribute to PHP internals illegitimate by maintaining some documents that called the RFC codecs that are a set of notes of why certain ideas haven't reached fruition in PHP core, and occasionally I help other people write RFCs. Derick Rethans 1:04 Continuing with the improvement of PHP type system in the last few releases. And we've seen a few more things coming into PHP eight but union types. For a long time, there has been an issue with PHP's internal functions that the type that a return cannot necessarily be represented in PHP type system because they do strange things. It is RFC building more on top of PHP's type system. What is this is trying to solve? Dan Ackroyd 1:29 There's a couple of different problems that's trying to solve. The one I care more about is userland code, I don't actually contribute that much to internals code so I'm not that familiar with all the problems that has. The reason I got involved with doing the mixed RFC was: I had a library for validating parameters, and due to how that library needs to work the code passes user data around a lot internally, and then back out to whether libraries return the validators result. So I was upgrading that library to PHP 7.4, and that version introduced property types, which are very useful things. What I was finding was that I was going through the code, trying to add types everywhere occurred. And there's a significant number of places where I just couldn't add a type, because my code was holding user data that could be any other type. The mixed type had been discussed before, an idea that people kind of had been kicking around but it just never been really worked on. That was the motivation for me, I was having this problem where I couldn't upgrade my library, as I wanted to, I kept forgetting has this bit of code here, been upgraded. And I just can't add a type, or is it the case that I haven't touched this bit of code yet. So coincidentally, I saw that Mate was also looking at picking up the RFC, and he had copied the version that Michael Moravec had been working on previously. I want as I mentioned earlier, I help people write RFCs is for a lot of people where English isn't their first language, it's a difficult thing to do writing technical documents in English. I also think that writing RCFs in general is slightly harder than people really anticipate. Each RFC needs to present clearly why something's a problem, why the proposed solution would work, snd, at least to some extent why other solutions wouldn't work. Looking at the text from the previous version I could see the tool though, I understood, all of the parts of that RFC, I don't think that it made the case for why mixed was the right thing to do in a very clear way. So I spent some time working with Mate to redraft the RFC, discussing it between ourselves and going through a few of the smaller issues before presenting it to internals, for it to be officially discussed as an RFC. Derick Rethans 3:51 Where does the name mixed actually come from? Dan Ackroyd 3:54 So, mixed is actually a very old concept in PHP it's been used in the docs for multiple decades. I think we have multiple core contributors who are younger than the mixed type, which is an interesting situation for a language to be in. It had been used in the documents, all over the place. It has been used to show that the type of a parameter, or return type from functions was quite complicated. It's actually slightly different from how people might use it in userland code. A lot of the places where it's used in the docs would now use a union type there instead of the mixed type. But there are still places where mixed is the correct type to use in the documents. Derick Rethans 4:40 This being an RFC, you're proposing something to do in it. What are you proposing to introduce into PHP? Dan Ackroyd 4:46 To be precise, the RFC proposes being able to use the word mixed as a type to be used for parameter types, return types, and property types and mixed is really a shortcut for something that can be done in Union types, mix is the equivalent of writing array or blue, or callable or int, or float, or no object or resource or string. One of the benefits of mixed is that it's much shorter to type but the full equivalent to that. Derick Rethans 5:18 And you'd have to do is every time you use it. Dan Ackroyd 5:20 It's particularly hilarious when you've got a function that accepts any type of parameter, and then returns that parameter, that's been modified. So you have mixed on the way in, and mixed on the way out, having all of those words on the same line of code is just too much. Derick Rethans 5:35 Does the mean that makes is pretty much implemented as a union type? Dan Ackroyd 5:39 I have no idea. I'd have to refer you to the actual implementation which I can't recall the details off, off the top of my head. The actual internal type checking in PHP is not as clean as you might imagine, from userland, particularly around things like callable, that's not, it's not a straightforward path of code for tracking, whether something's callable. It works as union type, but how it is actually implemented internally, is probably more detailed than that. Derick Rethans 6:07 I'll have a good book, a little bit later than. As, you set a sort of acts as a union. But Union types, and variance are quite tricky. And then I spoke with Nikita about union types, it wasn't the clearest explanation because it's a really difficult concept, right. So how does the mixed type interact with variance in either arguments or return types properties? Dan Ackroyd 6:30 I agree completely. Variance's complicated thing, and liskov substitution principle is a reasonably complicated thing. Full disclaimer here, I am not a computer scientist, I didn't study computer scientists in University. I studied chemistry and molecular physics, and the only formal education I've had in programming, was a single 10 hour course that taught us how to use Fortran 77, which is a lovely language for the 70s, not quite so good for the 1990s when I was learning it. I think people concentrate too much on the theory behind computer science. If I read out the general rule of LSP or liskov substitution principle. It says: For each object O1 of type S, there is an object of type T, such that for all programs P defined in terms of T, the behavior of P is unchanged. When O1 is substituted for O2 and S is a subtype of T. I don't fully understand that. I mean I can go through it and understand it in principle, but I don't understand it. I don't grok it at a fundamental level when I'm writing code, for me a better way of thinking about LSP is to simply say that: if your code follows LSP, then it's probably not going to blow up. If you violate LSP, your code has a very good chance of blowing up. For both parameter types and return types, the way that PHP implements the type checking through variantce, the type checking is done to make it conform with LSP, but the simplest way of putting it is: make sure that your codes not going to blow up on bad assumptions about the types that being passed around. Derick Rethans 8:17 Because PHP does it adhere to LSP your lovely new mixed type does have to adhere to it. How does your lovely new mixed type tie in with LSP and variance specifically because mixed is a little bit special. In some cases, because at the moment PHP if you have a method. And you return nothing from it, sort of acts like mixed. So I saw that in the RFC there is a specific handling of having no arguments going to mixed and then back to no type. Dan Ackroyd 8:48 The RFC; one of the details, is when no type is present for a functional term the signature checks for inheritance are done as if the parameter had a mixed, or void type, so that's a union type of mixed and void. That's the correct thing to do. It makes the code work as you'd expect it to do, and avoids any possible scenarios where you'd make an assumption about the method in the parent class, and that assumption not being true in the child class. I think this is one of the areas where PHP's special behaviour, shines through. This might not be an acceptable solution to people who work in languages that have a cleaner type system, but they probably stay well clear of PHP to begin with, but the details of how it works means that the code behaves as you'd expect it to and doesn't blow up. Derick Rethans 9:42 Well, that's the reason why void isn't part of the mixed union? Dan Ackroyd 9:47 Mixed and void are related, but quite different from each other. Mixed is a guarantee that for return types. It's a guarantee that a parameter will be returned, but you can't, we can't give you any more details of what the type of that parameter will be. Void, is a guarantee, in quotes, that no value will be returned. I actually strongly regret void being present in PHP. I think it was a mistake. One of the very nice things about PHP is the way that every function returns null, even if you don't have a return statement in that function. This is something that's quite different to a lot of other languages where it's common to have functions declared as void return type, so there's no return value at all. Because PHP always return null, it allows you to do things like var dump, then put a function inside var dump bracket, and that's always guaranteed to not blow up. I would have strongly preferred us to introduce the null type to PHP, and for people to use that, when they're not returning a more semantically meaningful value from their function. I think that would actually be a lot better into the PHP type system, and make it a lot easier to write code, that's chainable. Derick Rethans 11:10 The only real locations where it can't return any values is a constructor and a destructor in PHP. Dan Ackroyd 11:16 It would still have a use for functions that never return. So like continual loops, and also functions that only ever exit for by throwing an exception. I think TypeScript has this, I think they call it none. I can't remember the details but it has its uses but the way that most people are using it in PHP is wrong, in my opinion. The reason I still get a little bit worked up about this is because people are still suggesting that we should change the behaviour of the language to match the void return type. I.e. make it so that if you try and use the return value from a function that has a return type of void that PHP should blow up. I just strongly disagree with that, I think, returning null so that functions can be chained together. Even if there's no semantically useful information there is preferable to having code blow up through trying to read the result of a function. Derick Rethans 12:12 Because it's a bit different than in statically typed or compiled languages where you can do all these checks in the compiler right? And never had runtime, whereas in PHP these checks always have to happen at runtime. Dan Ackroyd 12:23 They do but I think it's at a different level than that it's just does, being able to define the fact that we're reading from a particular function should make the program blow up. Is that a useful thing to do or not? This is quite similar to another discussion that pops up every now and again, of whether to make PHP blow up if too many parameters are passed to functions. There's people who strongly feel that this is a terrible thing to allow, that we need to punish anybody who has extra parameters, being passed around. I actually find having extra parameters be a useful debugging technique very occasionally. Imagine scenarios, in scenarios where you've got an interface that comes from a library that's implemented in 10 different classes in your code, but you want to debug one particular implementation. Just being able to temporarily add on some extra parameters to a method call, and have that just work allows you to do some debugging techniques that just wouldn't be possible if PHP blew up when extra parameters get passed. This is similar, really similar to the void discussion where people have very strong feelings about, we need to punish people who are writing code wrongly, we need to stop that code from working. The other way that yeah it's not great code, and maybe they might want to refactor their code to not do that, but I can't see any benefit in making PHP blow up. Derick Rethans 13:49 In my opinion, this is I think that belong in project's coding standards, and their static analysers that they run over the code to make sure that they do all our stylistic choices correct, and not having too many arguments to methods is exactly belongs in that category. Right. Dan Ackroyd 14:05 I agree completely. Derick Rethans 14:06 There's a few more things that I'd like to poke your mind about. The mixed type does not include null, is there a reason for that? Dan Ackroyd 14:14 We discussed this a reasonable amount when drafting the RFC, there's reasons to allow nullability, but what we couldn't see was a clear strong need of why nullability would be required. The mixed type includes null as one of the types and the union of the types of represents. So, adding nullability doesn't actually add any more, more information to the mixed type, because by definition, it's already can be null. It's always possible to add more to PHP core but removing features is really difficult. So we decided to leave it out, for now, just because we can't think of a really strong reason to add it. If someone finds a really clear compelling argument to allow mixed to be nullable, I would definitely be in support of that so long as there was a reasonable reason to have it. What I probably prefer before that, though, is it's kind of odd that the null type isn't usable as a type in PHP by itself. I think that's unfortunate because for union types, imagine you've got some code that can, it's going to return either a float or int, and then you find a reason why it might need to return null. Changing the definition from float or int, to float or int or null, is easier to read for me than question mark, float or int. So I think that might be another RFC that pops up on the radar in the not terribly distant future. Derick Rethans 15:38 Time is running out for PHP eight little bit of course. So resource is part of mixed, but resource as a type you can't use as a type hint anywhere in PHP. So what's going on here? Dan Ackroyd 15:51 Resource is more of a pseudo type, then a real type in PHP. It comes from code that was written before PHP even had classes is my understanding. Though obviously that's from the dawn of time so it's hard to figure out where. When people started writing PHP, they used resource, as we use classes now to represent a complicated bit of state that needs to be passed around from one piece of code to another. The problem with resource as a type, is that it doesn't really tell you that much about the type. If something is a resource, it could be a file handle, a curl handle, a GD image, an XML parser, or any of the other things that are called resource types. It's an ongoing piece of work to slowly refactor resource types away and replace them with classes wherever possible. An example of that is the hash context, used to be a resource type in PHP and I think since PHP 7.2 that's been changed to a class. Work's ongoing, and eventually hopefully most of the other resources will go away, and made into more specific types, but in the meantime resource still exists in PHP. The reason that's included in the mixed definition is because it's a reasonable thing to do to pass a file handle around. And so if you've got a parameter type of mixed. It's absolutely fine to pass in a file handle to that piece of code. Excluding the resource type would make the mixed type be too annoying to deal with because your, your code would then deal with all the other types, except resource. Derick Rethans 17:21 That make sense. As I mentioned in the introduction mixed is already something that's used in a PHP documentation for a long time, and the RFC talks about stubs in PHP. This is something that is going to be introduced with PHP eight as well, what are these stubs. Dan Ackroyd 17:38 I haven't contributed to any of this work so I apologize to anybody who has been doing this piece of work if I get any of the details wrong. One of the problems with PHP core was that for a long time, the information that was used to generate the reflection information was done on a very ad hoc basis. Some of the information was incorrect, and keeping the reflection information up to date with the actual definitions of how the functions work was annoying, to say the least. It's been an effort by a number of the core contributors to set up a system of file stubs, that allow people to write PHP code that defines a stub for each of the internal functions. So that's just like literally a PHP file that has a stub version of the function that just defines the parameter types, parameter names, and the return types. My understanding is that that information is then used internally by the PHP eight build process to generate the reflection information extract the parameters where appropriate, and could be used for features like named parameters where the name of a parameter in those stubs, the name would be coming from the stub file, rather than some random C file in the middle of the PHP core code. Derick Rethans 18:53 And the stubs at the moment can't represent mixed. There's still a hold on, with comments. Dan Ackroyd 18:58 That's correct. This is similar to what I was finding with my own libraries that there were just some things that you just can't currently, add type information for. And it was quite frustrating having to, oh no somebody hasn't missed this one it's just not expressible. Another reason for having mixed is that although generics are going to be still quite a long way off from arriving in PHP. If you wanted to express just a generic array that can contain any possible value. That's another case where the mixed keyword would be used. Derick Rethans 19:29 I've saw some people ask why mixed was chosen here and not any. Is there any specific reason for that? Dan Ackroyd 19:36 The very short reason is that it was easier. Mixed has had a mixed concept for multiple decades, mixed is used widely in PHP core code and documentation. It's also used widely in a community for tools like PHP Stan and Psalm where people use mixed in docblocks, or Psalm annotations to indicate any type. It's really widely established. We did discuss, using any instead. It just didn't seem worth the effort of trying to push it through, at least in part because there's so much legacy going on. Also it's just not clearly that much superior to mixed. Derick Rethans 20:16 Very well. Are there any BC concerns by introducing the mixed keyword. Dan Ackroyd 20:20 That's a small BC break, you can't use mixed as a class name or function name probably any more, but it's a pretty small one, and anybody using an IDE can just add as using a function called mixed in their code can right click on the function, rename, maybe go and get a cup of coffee if that IDE is slow. There is also tools in the PHP community. This is actually quite a surprising thing that PHP has one of the best refactoring tools out there in Rector. That's a tool that, because it understands the abstract syntax tree of PHP, it can understand that: Oh hey there's this new BC break in the next version of PHP. In this case, if you have some code that had a class name mixed it would understand this is going to break. They provide sets of tools for allowing you to upgrade your code automatically. It's a really awesome tool. It's slightly surprising to me that it's probably like one of the best code refactoring tools, if not the best, in any software language. I've looked at some other language's ecosystems, and I think one of the things about PHP is that because it's actually quite a diverse ecosystem, and people sometimes migrate from Symfony to Laravel, or want to upgrade a PHP 5.6 codebase to PHP seven, or those types of things to value in a refactoring tool is a lot higher. Somebody has gone out and done the work to make that tool, and it's really pretty good. Derick Rethans 21:46 Sounds like something I should investigate a little bit then, because I actually had never heard of it. Also make sure to either link in the show notes to it. When you're introducing yourself, you mentioned that you're the maintainer of the image magic extension and PHP that you can use to manipulate images. What's going on with this extension? Is there going to be an upcoming release at some point? Dan Ackroyd 22:05 I want to apologize to everybody for being very lazy and not doing a release, even though there's a small segfault, that happens occasionally, and it's which we have a fix for. To be honest, I don't really use the extension at all myself. And so, maintaining it is more source of stress rather than enjoyment. I know there's many, many things that could be improved for the project including doing releases on a timely basis, and improving the security of how it works, but it's just really hard to justify spending time working on it when it's just a source of stress for me, but it doesn't really provide any benefit to me. As an effort to make it be worth my time effectively or at least give me a gold focus on, I'm going to start asking people to donate money to the projects, to sponsor it, just that I can actually justify myself getting stressed out from trying to help people with impossible to solve bugs that only happened on their system, because otherwise it's just a bit too much stress for me to really want to spend any much, much more time working on it. Derick Rethans 23:08 Very well, do you have anything else to add? Dan Ackroyd 23:10 Yes, I have a big request, and you've done this a couple of times during this interview. I'd very much appreciate it if everyone in the PHP community could refrain from using the word hints. When talking about types. It used to be that PHP type system was just hints where yeah the documentation says that this function takes an int, but that was just a hint, and it wasn't really enforced. The type system in PHP has evolved into an actual type system that is enforced at runtime, and although it's not a big deal. It does help when talking amongst ourselves as a community, but also when we're talking to people who don't do that much PHP, who are coming from other languages, where their type system is still just a set of hints. Using a slightly more precise language of the PHP type system and parameter types, return types, and property types. It avoids any confusion about what's actually happening in the engine. And if that is my windmill that I tilt at. Derick Rethans 24:11 Alright, thank you, Dan for taking the time this afternoon to talk to me. And I will be looking forward to seeing mixed in PHP because it got accepted, just earlier, yesterday I think. And, yeah, part of PHP's improving type system again. Dan Ackroyd 24:25 Thanks for having me on. It's been a pleasure. Derick Rethans 24:28 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool, you can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes i RFC: Mixed Type v2 Rector for refactoring RfcCodex Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 55: Dealing with Bugs London, UK Thursday, May 28th 2020, 09:18 BST In this episode of "PHP Internals News" I chat with Ignace Nyamagana Butera (Twitter, GitHub, Blog) about how the PHP project handles bugs and bug reports. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 55. Today I'm talking with Ignace Nyamagana Butera after he'd asked me on Twitter, how PHP deals with bugs. A few episodes ago, I did a Q&A session about the RFC process. And this time again, we'll have Ignace Nyamagana Butera asking the questions. Would you please introduce yourself? Ignace Nyamagana Butera 0:46 Hello, everyone. Hello, Derick. My name is Ignace Nyamagana Butera, but you can call me Nyamsprod. I've been a PHP developer for around 15 years now. Currently, I'm working as a software developer, and technical lead in the internet content provider agency. When I have free time, I'm doing some open source, I have a couple of projects that you may have heard of, like, league CSV and league URI. I created them and I am currently maintaining them. Derick Rethans 1:23 Yeah, as I said, it is not me asking the questions as you this time. So I think we should jump straight in actually. Ignace Nyamagana Butera 1:30 So my first question will be somehow really simple, because we are talking about bugs. And I was wondering if we had some statistics about bugs in PHP. Derick Rethans 1:44 Though there are some statistics. I mean, it's not really easy to get that information out of our bug system. But just having had a look, it's about on average, maybe one bug a day gets reported at the moment or is nearly 80,000 bugs in the bug system of course, not all of these are closed, some of them are open, but the majority of them are closed. Ignace Nyamagana Butera 2:07 Do bugs from the EOL PHP still being taken into account or we just say: okay, these bugs for instance, are for PHP five, will no longer look at them. Derick Rethans 2:18 If it's a bug, unless it's a security bug fix, we won't look at them for unsupported PHP versions. So at the moment, PHP, seven three, and seven four are still supported. So those bugs will of course look at, if it's a security bug, we only will go back to PHP seven two. If it's reported to any older version and seven two for example, seven one or seven zero, or even PHP four or five, which does happen occasionally, we'll tell them to upgrade first because we won't spend time doing that. Ignace Nyamagana Butera 2:47 Because I manage and maintain open source project. I know that PHP as a language is used everywhere and you can have multiple reports. First thing first, what is a bug? Because there are multiple definition of it. Derick Rethans 3:03 And I'm sure if you asked 12 people, you get 13 definitions. I think it is unexpected behavior of something that is documented. So if something is documented do this, and it does something else, or it does something really wrong like crash your program, then that will be a bug. Ignace Nyamagana Butera 3:21 What is the source of truth? Is it the PHP documentation? Is it the PHP specification language, what is the source of truth? Nothing. Okay. This is expected behavior because it is documented, or how does it work? Derick Rethans 3:38 For most of the syntax, it's what the source does. And of course, you always find edge case. And I don't have a good example right now. For anything that the syntax, I mean, documentation and behavior should absolutely always work the same. If it doesn't, it's likely going to be a bug in the documentation. If you for example, look at other functionality like in an extension, there is almost as likely that the documentation is sometimes wrong than it is that the code's behavior is wrong. In that case, we need to have a good look at what what the expected behavior should have been. Now, with all the new features that have been put in, since we have the RFC process, pretty much anything that the RFC describes how it should work, is how the feature should work. And if it doesn't, that pretty much means there's a bug. Having said that, not everybody writes on all the expected behavior for all the functionality that an RFC has been put up for. And in those cases, you just need to see what makes the most sense whether it's about core feature. Ignace Nyamagana Butera 4:40 What is the best way to report a bug? Okay, you have to go to bugs.php.net, I suppose. Yes. But apart from that, what is the best way to report a bug? Derick Rethans 4:51 As you said, PHP is issue tracker is bugs.php.net. It tells you to fill in your problem, your expected behavior and what you actually get out, what is always really important for people to be able to fix an issue and to find out whether there is an issue to begin with, because that's not always the case either of course, is always to have a short reproducible script that reproduces your problem. And by short, that means it the short you can get it. 10 lines at most for most syntax features who probably do the job. In some cases, if it's a bug for a database related system, then of course, there's going to be some database setup necessary for it. But if it's just syntax, then a short script that reproduces the problem that shows what goes wrong, is really important. And of course, it's also important to say what it did, and what you expected it to do. Also, don't lie about your PHP version, because in some cases, people try to report a bug with a higher PHP version than they're actually using, which is kind of frustrating at times. Ignace Nyamagana Butera 5:52 I guess that yeah, if we report something that didn't work in PHP five, but it was fixed in PHP 7.2 or PHP 7.3 everybody loses a little bit of time. Derick Rethans 6:02 And in some cases people find a bug report for, say, PHP 7.4.1. Right, and we're currently at 7.4.6. We will always ask them first to upgrade if they can, because upgrading PHP should take a lot less time than trying to reproduce and fix a problem that has already been fixed. Ignace Nyamagana Butera 6:20 And what is the strategy between the release of each version of PHP and the bug fix? Does PHP wait for all the bug fixes to be done and then a release is made. Or if for instance, I report a bug like today before a release is scheduled, then this bug will be skipped from the next release and will be tackled after Derick Rethans 6:46 Every minor version of PHP, be at seven two, seven three, or seven four a moment, has a release every four weeks. Two weeks and two days before a release gets made, we make our release candidates. Everything that has made it in the release candidate will make it into the release. If in between the release candidate gets created and the final release, if bugs get fixed, unless they are really critical, they will make it into that release. But we'll have to wait until the next cycle. So we don't necessarily wait for all the bugs to be fixed before we make a release. Now, there is an exception here, and that is for security bugs. If you find security bugs, they don't end up in a normal PHP seven four branch. They get committed to a security repository that very few people have access to. And these security bug fixes. They get merged into the release branches two days before the release comes out. They don't end up in a release candidate builds because we don't want people 16 days to be able to exploit security bugs if they are remote exploitable, for example. Ignace Nyamagana Butera 7:53 And can security bugs, or critical bugs push a release? Derick Rethans 7:59 Technically, yes. If somebody ends up finding, like a remote exploitable bug in PHP, then there will be an emergency release for them. But I can't remember the last time we had to do that. Ignace Nyamagana Butera 8:10 I remember, like one or two years ago, there was a bug that was going from the bugtrack to the internal mailing list and coming back again to the bugtrack, because there was some kind of indecision to know if it is a bug, or if it should be a feature. How is this possible? Derick Rethans 8:32 We don't really have a set method for doing this. But our bug tracker isn't the most advanced system in the world. And sometimes it just makes sense to trash out a discussion over email on our PHP internals mailing lists, or sometimes these discussions happen on other chat channels as well I'm sure, just to go through to see what's the case. And sometimes if it is hard to take a decision while there's a bug, then it is always a good idea that more PHP core developers have a look at it and see what's going on there. So sometimes it makes it easier if that's discussed on the mailing list, then in the bug tracker. Ignace Nyamagana Butera 9:04 Is it possible that for instance, someone submit an RFC. And then during the course of discussion of this RFC, it becomes clear that this is not an RFC, but more of a bug fix. Derick Rethans 9:16 I don't think I can think of an example here actually. Ignace Nyamagana Butera 9:19 I remember one example. Derick Rethans 9:21 Okay. Ignace Nyamagana Butera 9:23 Because I think it was yeah two years ago about the behavior of the CSV escape character. And I remember at some point, it was suggested to be an RFC. And because of the amount of background compatibility breaks, it was better to treat it like a bug. But I remember when between the bug tracker and the note sufficient there was a whole discussion to exactly being able to say: Okay, this is a bug. And this is an RFC and it was really not, it was a call at the end saying, okay, we will treat it like an RFC, and we will change the way the escape corrector works today. But it won't be as impacting as if it was an RFC that introduced a completely new behavior Derick Rethans 10:12 CSV is a very difficult format, because everybody slightly implements a standard in a different way. And the way how it originally got implemented in PHP for reading CSV files was done in a very different way than for example, what Microsoft products would create. I mean, it has to do with escaping, if I remember correctly. And I mean, what do you decide, right? I mean, since then Microsoft have made a specification for this. And of course, what we then want to do in PHP is to make sure that we support a specification, but by doing so, we will then break previous behavior, and that is always a really difficult decision to do, right. If it is very clear that it is a bug, then we don't mind changing PHP, even though that could technically break people's code. But if it's unsure or whether it's based on a subjective decision, then that makes it a lot harder to write because we can't definitively say that, yeah, we have a bug here. But if we look at other codebase out there, so many people rely on this. So is the old behavior bug, or is it a feature in PHP? I mean, these things, you have to take one by one, and it's very hard to decide on what is what is a feature, and what is the bug in this case. Ignace Nyamagana Butera 11:22 I think another subject that comes with bugs is people should be able to fix them. But I suppose that every one of us has a work and who can fix those bugs? Derick Rethans 11:33 Technically, everybody who has time and know C code could fix a bug. PHP is an open source projects. Our repositories are available on GitHub, or on git.php.net, which is our source of truth, although most people submitted bug fixes against the GitHub repository because it makes it easier to review them and comment on pull requests, for example. But it's open for everybody. It's the same thing about triaging bugs. Trying to find out if the bugs that are actually reported are actual bugs and the bugs.php.net website has in the top right hand corner, it has a random link. And if you click that you get a random bug that hasn't been resolved yet. If somebody, if any of the listeners, or maybe you, are interested in looking at these bugs or wanting to attempt to fix them, click random and see what happens. Maybe you get something interesting, maybe because something really complicated, but in any case, it's possible for everybody to fix a bug. They will get reviewed. For a good enough bug fix it will get merged. Ignace Nyamagana Butera 12:31 People are usually thinking when they think about open source nowadays they think about semver and people may think that if they look at the versioning of PHP, then they have an idea of it is a patch release, it is a bug release, it is a feature release. How is this related to bugs and how is it versioning of PHP working? Derick Rethans 12:53 PHP's versions number consists out of three numbers. At the moment, we are the latest version is 7.4.6. The six is your bug fix release. In bug fix releases, there will not be any new functionality. Unless there are very minor, small contained parts in extensions. We tend not to want to have these. And unless you can make a good case for it, it's unlikely to happen. But it isn't unheard of. An example I think I can remember is that open SSL, added a bunch of new API's in there, and other technically new function functions in PHP, they sort of had to be supported, because as part of making sure that you could run the latest version of open SSL or something like that, but that being an exception there. Now, the middle number, traditionally, in semver, is there for features, right, you've bump the middle number, the middle digit, if you have new features, and that is the same in PHP. What we don't really have is a major number that indicates that we are going to break things. The major number in PHP is mostly a marketing number. So at the moment, we have PHP seven four out there. We don't have PHP eight zero next. But that is pretty much a PHP seven five, but with additional functionality that we find important enough to bump the major version from seven to eight for. Having said that, we do have a rule that we don't remove functionality, unless we bump the major number. For example, from five to seven, or from seven to eight. So there will be in the course of time, we might deprecate functionality, we don't tend to remove that until we bump the major number. And you also see that if the major number gets increased, that there is potentially more effort in removing or deprecating more functionality that would otherwise do say for example, it changed from 7.3.0 to 7.4.0. But it doesn't mean that we don't bump major numbers so that we can break all the things for example. So I think the PHP protect tries to, we don't always succeed of course, try to never break people's code. Unless it's a bug fix Ignace Nyamagana Butera 15:03 That was it for my questions. Derick Rethans 15:06 Maybe I have some questions for you now. I think it is good to talk about these issues. What are you most surprised with in the way how the PHP process handles bugs and bug reports? Ignace Nyamagana Butera 15:15 The first thing is, like I say, I've been coding in PHP for more than 15 years, but I only started really to report bugs once I start doing some open source project. Because before I think, and I think it's the majority of people, it's like, yes, there is a bug, oh it's something for PHP, or for any kind of language. I'm not the maintainer. So it's a bug, someone else will report it not to me. Since I've changed because I'm doing myself some open sourcing. I'm like, hey, if I found a bug, I think the best way to resolve that bug is first, to report it and to report it correctly, to the project, to the language or to whatever has that bug. And once you've made this change of how you think about the language, then you start to ask yourself, okay, how can I do it the most efficient way so that the bug get reported? And then the bug can get tackled by the people who can. Derick Rethans 16:19 Yeah, and the start of that, as you say's, always send us a bug report or sent your favorite open source project a bug report. Ignace Nyamagana Butera 16:26 Exactly. Derick Rethans 16:27 I can sort of see where you're coming from. Because I can understand that if you're just in an agency, for example, and the only thing, the only thing you have to do is to make sure that your project is done on time. You can't necessarily wait for the bug to be fixed in PHP anyway, because the product needs to be done by tomorrow or yesterday. And you're going to have to find a workaround you issue in that case anyway. And then you spending time reporting the bug will just takes you time and you don't have time for that, for example. But of course, if you do that, then everybody else that runs into this bug will have to come up with a workaround, and that means you're all end up wasting lots of time. Ignace Nyamagana Butera 17:04 I remember I had a small story. In one of my previous jobs, someone came to me and we're talking about something and he said: Oh, but there is no constant on the DateTimeImmutable. That's very sad. And I said: no, there is because I remember I submitted the bug, and it was tackled. And now the constants are on the interface. So DateTimeImmutable has the constant and was like: Oh, yeah, but I didn't know. And I was; it was reported and someone use it. And if you don't report it, then maybe in two years, you will ask yourself the same question. Indeed, it takes time. Between the moment it is reported the moment it is tacked, because people need to have time to resolve the issue. But if you don't do the first step, which is reporting it correctly, then it will never be solved. Derick Rethans 17:53 And by correctly that also means doing in the PHP bug tracker and not complaining on Twitter. Ignace Nyamagana Butera 17:58 Exactly. Exactly. Derick Rethans 18:02 Of which I see quite a bit of for Xdebug for example. Thank you very much for taking the time to talk to me, or I should say thank you very much for taking the time to interview me to talk about bugs today. I hope you enjoyed this. Ignace Nyamagana Butera Thank you for having me. And hopefully we'll meet again. Derick Rethans I'm looking forward to that. Thanks very much. Ignace Nyamagana Butera 18:21 Thank you. Derick Rethans 18:23 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes League CSV PHP Bug Tracker Random Bug Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 54: Magic Method Signatures London, UK Thursday, May 21st 2020, 09:17 BST In this episode of "PHP Internals News" I chat with Gabriel Caruso (Twitter, GitHub, LinkedIn) about the "Ensure correct signatures of magic methods" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick, and this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 54. Today I'm talking with Gabriel Caruso about his ensure correct signatures of magic methods RFC. Hello Gabriel, would you please introduce yourself? Gabriel Caruso 0:37 Hello Derick and hello to everyone as well. My name is Gabriel. I'm from Brazil, but I'm currently in the Netherlands. I'm working in a company called Usabila, which is basically a feedback company. Yeah, let's talk about this new RFC for PHP eight. Derick Rethans 0:52 Yes, well, starting off at PHP eight. Somebody told me that you also have some other roles to play with PHP eight. Gabriel Caruso 0:59 Yeah, I think last week I received the news that I'm going to be the new release manager together with Sara. We're going to basically take care of PHP eight, ensuring that we have new versions, every month that we have stable versions every month free of bugs, we know that it's not going to happen. Derick Rethans 1:17 That's why there's a release cycle with alphas and betas. Gabriel Caruso 1:20 Yeah. Derick Rethans 1:21 I've been through this exactly a year early, of course, because I'm doing a seven four releases. Gabriel Caruso 1:25 Oh, nice. Yeah. So I'm gonna ask a lot of questions for you. Derick Rethans 1:29 Oh, that's, that's fine. It's also the role of the current latest release manager to actually kickstart the process of getting the PHP, in this case, PHP eight release managers elected. Previously, there were only very few people that wanted to do it. So in for the seven four releases it was Peter and me. But in your case, there were four people that wanted to do it, which meant that for the first time I can ever remember we actually had to hold some form of election process for it. That didn't go as planned because we ended up having a tie twice, which was interesting. So we had to run a run off election for the second person between you and Ben Ramsey, that's going to go continuing for you for the next three and a half years likely. Gabriel Caruso 2:11 Yep. Derick Rethans 2:12 So good luck with that. Gabriel Caruso 2:13 Thank you. Thank you very much. Derick Rethans 2:15 In any case, let's get back to the RFC that we actually wanted to talk about today, which is the ensure correct signatures of magic methods RFC. What are these magic methods? Gabriel Caruso 2:24 So PHP, let's say out of the box, gives the user some magic methods that every single class have it. We can use that those methods for anything, but basically, what magic methods are are just methods that are called by PHP when a given action happens to the class. So for example, if a class is being constructed, then the construct magic method is going to be called. If I'm calling serialize function, then the magic method serialize as per PHP seven four or PHP eight. I don't remember, so this is basically what magic methods are, are methods that PHP hook into the classes and then once a certain action happened with the class, then PHP is going to call those magic methods in something magic, so to speak is going to happen. Derick Rethans 3:13 And other options are like underscore underscore get, and underscore underscore set. Gabriel Caruso 3:17 We have, we have a lot. Derick Rethans 3:19 Exactly, what do people tend to use these magic methods for? Gabriel Caruso 3:22 So that's something interesting. As the magic method is called by a number of actions we can use, for example, for let's let's get the example of ORM for example, Doctrine or Eloquent or whatever one. Let's say I'm a maintainer of that library. I don't know what fields do you have in your database. So when I'm porting, when I'm doing the translation, what it can do is map in a property, all those columns and values that I have in the database. And then when you instantiate your entity and you try to access a variable that is does not exist, then we're going to go to a magic method in this case is get, as I said, and I'm going to say okay, is not set in the class, but is mapped in the entity that I have. So this is one case, we also have the case for testing your you have, for example, the famous PHP Unit test framework, every time that a test case is called with all those methods is starting in with test, the call magic method is invoked. And then you can perform whatever action you have. You also have middlewares and the examples go go even further Derick Rethans 4:32 In the title of RFC you have the word signature, what is the signature? Gabriel Caruso 4:37 All the attributes that our method can have. So for example, the name of a method is its signature, what does it return? What parameters does it take? And also what modifiers so for example, is it static or not? Is it public, private or protected? So all this information together in usually is one line in PHP. So for example, private static MyMethod, that receives a string and returns a Boolean. There you go. This is the signature of my method Derick Rethans 5:06 Because some of these magic methods have been in PHP for a long long time. Back in the time where we didn't have argument types or return types or perhaps not even static. All the way back from the past PHP hasn't really done anything with signatures because they've simply didn't exist. At the moment which signature checks this PHP already do? Gabriel Caruso 5:26 I don't remember a by the RFC but I think was introduced together with the scalar type RFC. But only constructors and destructors until PHP seven four, those two only magic methods were being checked. If they have none return type, not even void, just no return type. But in PHP eight, we're gonna have the new stringable interface and then every single toString magic method. If it is typed, this is very important if it is typed it needs to be a string and these are the only from the 17 that we have only three in PHP 8 are being checked. Derick Rethans 6:01 PHP seven four. Gabriel Caruso 6:02 Yeah, in PHP seven four only two and then PHP eight, we have the new toString. Derick Rethans 6:07 But this RFC suggesting to change that of course. Gabriel Caruso 6:10 yeah. Derick Rethans 6:11 What's the reason why you want to extend these checks to the other magic methods? Gabriel Caruso 6:14 That brings me back how I figured out that. I was looking at some bugs, because we have the https://bugs.php.net, where we centralized all the bugs of PHP. Then there is a bug report explaining in complaining exactly about that. Like, I can't hide my magic method. Back in the days I can say, for example, that my tostring method is going to return an integer or a Boolean. That makes no sense. And then I was like, yeah, makes makes no sense. We need to fix that out and then I start to search how do we type that? How what types do we have and then I was like, we can't in PHP eight, because this is going to be a new major version. So we are allowed to at least vote for do that. We can check if someone is using types, we can check those types. We are not going to force, we are not going to require, we're not going to evaluate even run static analysis. Nope, we're going to simply check. Okay. Are you saying that this get magic method is going to return anything? Okay, that's okay. Oh, but I want to my guess is that you specifically return a string. That's also okay. As to how to pronounce that liskov mistook principle, right? Derick Rethans 6:36 The liskov substitution principle. Gabriel Caruso 7:26 Yeah. And so this is what we're going to basically do with this RFC, there's going to be voted. We're going to simply check if you're using the right types, because, in my opinion, magic methods are a foundation in PHP. As we have theses methods across different code bases across different projects from different behaviours, at least when I'm looking at that code. Okay, I'm looking at this magic method. I know what parameters does it take. I know what return does it have. This is worth less tab to the bug are trying to understand what is happening. Because today maybe I'm debugging a toString method there is return an integer. And I'm like, okay, this is the bug, it's supposed to return a string. But once you ensure those all those signatures, is one less bug that we're gonna have in production. Derick Rethans 8:17 When are these signatures being ensured? Gabriel Caruso 8:19 It's not at compile time because he does not have a compile time. But he's when the Zend machine is compiling the code, we have a very specific method that is checking all the modifiers. So for example, the signature that we mentioned before so all the magic methods needs to be public. This has been checked, for example, they callStatic magic method needs to be static. So this has also been checked. And then I'm extending how do we check for signatures for param types and also for return types. So during compilation of the Zend VM. Derick Rethans 8:52 Taking as example callStatic in the RFC, I see that the name has to be a string and the arguments has to be an array. What happens if you use a different type there? Gabriel Caruso 9:01 So nowadays if you use a different type that's allowed. So if you say there, you're going to receive an integer, and you're going to receive a string. This is allowed today. And this is what I mentioned about when you are debugging or analyze different code bases, you're going to be like why in the documentation says that we need to receive a string and an array, and there's this specific code base is receiving a string and an integer. So this is what kinds of mismatch I want to avoid. Of course, when using types, because we also know that PHP in some projects does not use types. And that's perfectly fine. If you're not using types, I'm not going to ask you, hey, you need to type those magic methods. Well, what I'm going to do is okay, you're using types and I need to make sure they're using right otherwise this is going to be a mess. Derick Rethans 9:47 If you type it; say use an integer for the name of underscore underscore get, will give you a warning or a compile error, or parse error? What what kind of feedback which you get back from that? Gabriel Caruso 9:59 While you are running your code, as soon as that class get referenced, we're going to check. Is not when is initiated, when is not when is called, as soon as I think the autoload detects that class is gonna parse, is going to identify, and then is going to compile and during the compile time that we mentioned. We're going to identify that. So it's going to be early in the stages. Perhaps as soon as you run something or you would upset me, you're going to have that feedback saying: hey, this is not compatible with what we are expecting. Derick Rethans 10:32 Is that a warning or type error? Gabriel Caruso 10:34 It's going to be a fatal error, because this is what we are constantly returning with the destructors and constructors. Derick Rethans 10:41 Yeah, we alluded to mixed already a little bit and the RFC mentioned mixed a few times, of course mixes in the type and PHP yet. So what do you want to do about that? Gabriel Caruso 10:51 Today we are 11th of May of 2020. Right now we have an RFC voting in PHP to introduce the mixed type. I'm not going to say if I agree or disagree, it's being voted. If that RFC gets accepted then I have already talked with the authors of the that RFC, I'm going to wait until they merge into master. I'm going to rebase and readapt to my RFC, to have those mixed types. And there we go PHP eight probably can have mixed, and probably can already have the usage of mixed in the magic methods. So either No, I'm gonna need to wait for the end of their RFC. If it's approved, there go I need to rebase my PR. In the other case, we are going to keep as comments because we can't ensure that in the compile time with the VM. Derick Rethans 11:41 At the moment, it looks like that vote will and in May 21. The current votes are 35 to six for passing. So it looks like that will go through Unknown Speaker 11:50 And then I need to rush because we have the upcoming feature freeze of PHP eight. So I need to make sure that I start to vote and implement my RFC before that time. Derick Rethans 12:00 Feature freeze should be by the end of July. So I think you have plenty of ime pfor that. And of course you have a release manager, you can make an exception. That's how that works. Usually adding extra checks will have impact to existing code. Is there much impact to existing code here as well? Gabriel Caruso 12:18 That was the interest question that I made myself. Okay, I'm going to touch the magic methods of PHP. I'm going to break some code in an issue identified those breaking changes in an each map in the RFC. How do I map across many projects, many libraries, many PHP codes out there? How do I do that? I remember that Nikita back in his RFC about the parenthesis origin, like how do we present this ordering and yada yada yada. He made a script, where he went through I think was the top thousand or top 10,000 packages. On packagist, that is the official composer package provider and he identified everything, and ask myself how he did that. And actually was very easy. He just cloned other repositories. He instantiate a new PHP parser instance that is his magic parser. That is behind PHP Stan, is behind psalm, is behind a lot of infection, a lot of big projects, where you analyze the code. So you have a code base where you can analyze and say: Do I have magic methods wrong? And then I run this script, identify, I think six or seven types that were not perfect. Three of them. I have already submitted a request because we're in PHP Unit and I said to Sebastian: hey, this actually is not right. Because I'm proposing this RFC, he was like: Okay, perfect, let's merge it. And the other cases are the cases that I mentioned. For example, with get. Get, you need to return mixed but by the LSP, you can nail down to an integer or a string. So there you go, at least in the top 10,000 packages of composer is not going to be a breaking change. But of course, it's going to be breaking change for people that I can't map. So this is why it's mentioned the RFC that if you're using types with magic methods wrong, we're going to warn you. Derick Rethans 14:13 But at least it's an easy thing to check for. Because even running all your files through PHP minus L should catch it. Gabriel Caruso 14:20 Yeah, there you go. Derick Rethans 14:22 So it's a very easy to check for something. You provided a link to Nikita's script where he checks for those ternairies, do you have a version of your own script available as well? Gabriel Caruso 14:33 That's interesting. I thought the RFC was updated. So I'm going to update the RFC, because I do have the script locally. Derick Rethans 14:39 Then I can link to it for the podcast as well. Gabriel Caruso 14:41 Okay, perfect. Derick Rethans 14:42 In the future, are you thinking of extending checks to a few more things? Gabriel Caruso 14:46 So this is something that I fought about this RFC, like how much you want to break and explode people's code. And I think starting with checking types in the signature is the first step. The next step is to actually check the return type. We do that with toString. So for example, although you have type right for maybe, some logic or something is wrong, you're returning an integer. There is a check before the actual type saying you're supposed to return a string you're return an integer. And actually, there is a check in the magic method saying this magic method was supposed to return a string. I think is gonna break even more code because then it's something that I can't measure. So I was like: Okay, let's first start with types and then we can give it next step that is: okay, inside this method, what is being returned, okay, is something different from the signature: explode. You're returning something that I was not supposed to return. But this is not a fight that I'm going to pick. So I leave it up for the next major version of PHP or whatever. Derick Rethans 15:49 Wouldn't PHP's strict versus weak type mechanism already catch these things. So from debugInfo, if you would type that as returning an array, and then you end up returning an object, which is not necessarily wrong, just not what you expected. PHP's return type checking mechanism should already catch that for you. Gabriel Caruso 16:13 If you have a magic method typed. If it's not typed, so we can say that some efforts do have that check. And then we're going to expand when we don't have types in the signature. Derick Rethans 16:24 That's clear now. Do you have anything else to add? Gabriel Caruso 16:27 The only thing that I want to add that is, I have created another RFC, and this is something that I always tell everyone that is easy to do; is not impossible. Anyone can go there, identify a bug or catch a bug report and then try to fix it. And this is what I'm doing. Like I'll do them to release many of PHP eight. I'm also fixing bugs, improving documentation and everything else. This is something that I try to do and share with everyone. So everyone can also be the next one contributor to the to PHP and it's evolution. Derick Rethans 16:57 This RFC isn't out for voting yet. You set you want to sort of wait until mixed gets passed or not. What's the reception been so far? Gabriel Caruso 17:05 So I asked a couple of key members of the PHP community, both internal and external people. They agree, they said that the right approach is to first check for the signature, because if someone is already using types, that project is type friendly, so we can at least play with that. But if someone is not typing, then this is a bigger fight. And then we're going to talk about that in the future. Derick Rethans 17:29 Thank you, Gabriel for taking the time this morning to talk to me. I've learned a few more things about this RFC, so that's always good to know. And again, congratulations of being the PHP eight release manager together with Sara. Gabriel Caruso 17:41 Thank you very much. Also thank you for inviting me for this new podcast is amazing. Always listen to all these famous people of PHP that talked with you. And I'm like, Whoa, Derick has invited me this is going to be so much fun. Thank you very much. Derick Rethans 17:55 Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystify the development of the PHP language, I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to Dderick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC`: Ensure correct signatures of magic methods Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 53: Constructor Property Promotion London, UK Thursday, May 14th 2020, 09:16 BST In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Constructor Property Promotion RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 53. Today I'm talking with Nikita Popov about a few RFCs that he's made in the last few weeks. Let's start with the constructor property promotion RFC. Nikita Popov 0:36 Hello Nikita, would you please introduce yourself? Hi, Derick. I am Nikita and I am doing PHP internals work at JetBrains and the constructor promotion, constructor property promotion RFC is the result of some discussion about how we can improve object ergonomics in PHP. Derick Rethans 0:56 Object economics. It's something that I spoke with Larry Garfield about two episodes ago, where we discuss Larry's proposal or overview of what can be improved with object ergonomics in PHP. And I think we mentioned that you just landed this RFC that we're now talking about. What is the part of the object ergonomics proposal that this RFC is trying to solve? Nikita Popov 1:20 I mean, the basic problem we have right now is that it's a bit more inconvenient than it really should be to use simple value objects in PHP. And there is two sides to that problem. One is on the side of writing the class declaration, and the other part is on the side of instantiating the object. This RFC tries to make the class declaration simpler, and shorter, and less redundant. Derick Rethans 1:50 At the moment, how would a typical class instantiation constructor look like? Nikita Popov 1:55 Right now, if we take simple examples from the RFC, we have a class Point, which has three properties, x, y, and Zed. And each of those has a float type. And that's really all the class is. Ideally, this is all we would have to write. But of course, to make this object actually usable, we also have to provide a constructor. And the constructor is going to repeat that. Yes, we want to accept three floating point numbers x, y, and Zed as parameters. And then in the body, we have to again repeat that, okay, each of those parameters needs to be assigned to a property. So we have to write this x equals x, this y equals y, this z equals z. I think for the Point class this is still not a particularly large burden. Because we have like only three properties. The names are nice and short. The types are really short. We don't have to write a lot of code, but if you have larger classes with more properties, with more constructor arguments, with larger and more descriptive names, and also larger and more descriptive type names, then this makes up for quite a bit of boilerplate code. Derick Rethans 3:16 Because you're pretty much having the properties' names in there three times. Nikita Popov 3:20 Four times even. One is the property name and the declaration, one in the parameter, and then you have to the assignment has to repeat it twice. Derick Rethans 3:30 You're repeating the property names four times, and the types twice. Nikita Popov 3:34 Right. Derick Rethans 3:36 What is the syntax that you're proposing to improve this? Nikita Popov 3:39 The syntax is to merge the constructor and the property declarations. So you only declare the constructor and you add an extra visibility keyword in front of the normal parameter name. So instead of accepting float x in the constructor, you accept public float x. And what this shorthand syntax does is to also generate the corresponding property. So you're declaring a property public float x. And to also implicitly perform this assignment in the constructor body. So to assign this x equals x, and this is really all it does. So it's just syntax sugar. It's a simple syntactic transformation that we're doing. But that reduces the amount of boilerplate code you have to write for value objects in particular, because for those commonly, you don't really need much more than your properties and the constructor. Derick Rethans 4:40 Besides public, I suppose you can also use protected and private there as well. Nikita Popov 4:45 That's right. So you can use all the visibility modifiers. Well, public protected private, static does not really make sense. But if we add other modifiers in the future, then those could be used there as well for example, if we add support for read only properties, then of course, you could also write public readonly float x or something. Derick Rethans 5:09 The RFC talks about desugaring. How's this implemented? Is this transformation on in the AST, or in another way? Nikita Popov 5:17 This is not an AST transform, but I would say close enough. So we just generate the corresponding property declarations and assignments in the compiler. If you inspect the AST with an extension like PHP AST, you will see the code as written. So with the public in front of the parameter name, but if you inspect the code in reflection, then it will look as if you declared the property explicitly. Derick Rethans 5:48 So the RFC talks about a few constraints and what you can and cannot do with those promoted properties. One of the things it talks about is nullability. Nikita Popov 5:58 Well, we have two different nullability semantics in PHP for historical reasons. One is in parameters, where we say, if you use a type that is not explicitly nullable, but you have a null default value, then we make the type implicitly nullable. While for property types, which are newer, we no longer have this implicit behaviour. So if you want to have a nullable property, you do need to explicitly mark it as nullable. Just using a null default value on will result in an error. And the handling is the same here. So if you want to have a nullable promoted property, you have to mark it as nullable Derick Rethans 6:43 And you cannot just rely on setting the default to null? Nikita Popov 6:46 Exactly, but I think it's like detail. And really this could go either way. I just prefer the explicit nullability because this seems like the direction we are going to in the future. I don't know if we will ever remove this implicit behaviour. Maybe not. But I think nowadays explicit one is preferred. Derick Rethans 7:10 Less magic is better. Nikita Popov 7:11 Less magic, exactly. Derick Rethans 7:13 The RFC also has like constraints in there. You can also define a constructor in traits and abstract classes. Can you also use a constructor property promotion there as well?. Nikita Popov 7:23 In traits? Yes, I mean in traits, using it will be a little bit weird. But there is no reason why it can't work. After all traits can have a constructor that will be used in the using class. And traits can also have properties that get imported. So the same mechanism works there as well. It does not work for abstract constructors or constructors in interfaces. The syntax also implies that you have some assignments inside the body of the constructor, and if we have an abstract constructor, then we could not emit these assignments anywhere. We could support it as a special case, like saying that it only declares the properties but skips those assignments. But I know how often you've used abstract constructors, I probably used them like maybe once or twice in all my time working with PHP. So either they really need extra support in that area. Derick Rethans 8:25 It would also then introduce an inconsistency were promoted properties in abstract classes or abstract class constructors if that's the thing, would be different from normal class constructor property promotion. How does the inheritance work? Is the working in the same way or is there no specific difference in it? Nikita Popov 8:44 Based on like discussion feedback, I think inheritance is the largest point of confusion with this syntax. The thing is that does not really have any special interaction with inheritance. So you can just follow this like syntactical transformation it does, which does not have any impact on inheritance. But the thing is, if you just look at the code, and you see you have the parent class defining the constructor, and the child class defining the constructor, and then you're wondering, well, is there some kind of connection between the parameters? The promoted parameters declared in one constructor and the other one? And the answer is simply: No, there isn't. Those have nothing to do with each other. And even more generally, constructors are a bit of a special case where inheritance is concerned. So usually, we say that methods always have to be compatible with the parent method. So the signature has to be compatible, the return type has to be well not match, but be contravariant. And similar for the argument types, but this rule does not apply for the constructor. So the constructor really belongs to a single class, and constructors between parent and child class do not have to be compatible in any way. Derick Rethans 10:09 Are there any types that you can't use for constructor property promotion? Nikita Popov 10:14 Just callable. Because callable is not a valid property type. Well, there is one more thing that you can't use a variadic argument. Well, if you write a variadic argument, you write something like int, dot, dot, dot, whatever. But the type you're actually writing is int, because that's the type of each individual argument. But all of that gets collected into an array. So the type of the corresponding property would have to be array. So we would have to do an extra transform that's maybe not super obvious. And so I've left this part out. Derick Rethans 10:50 And also PHP's type system doesn't support defining an array of integers. It only supports describing an array. At a time we're talking about is, at the end of April, this hasn't gone up for a vote yet. When do you think this will happen? Nikita Popov 11:05 The RFC will need one small adjustment because the attributes RFC is currently in voting and it very much looks like it's going to be accepted. We will need to also consider support for attributes on the promoted properties. I think the only small question there is, what does the attributes apply to? Because this could apply to the parameter or to the property, or both. Derick Rethans 11:34 How would you actually set these attributes because from what I understand docblocks, you can only use in front of a method name or a property declaration. How would you define a different attribute for each of the promoted properties? Nikita Popov 11:48 I believe that the attributes RFC already supports attributes on parameters, so that shouldn't be a problem. Derick Rethans 11:55 So it allows for setting a specific attribute for each of the arguments coming into the constructor. But that didn't quite answer the question. When do you think we'll be voting on this? Nikita Popov 12:05 Maybe in a week or so. Derick Rethans 12:06 By the time this podcast comes out? Nikita Popov 12:09 Well, we have had a lot of activity recently in PHP internals. So I guess we are one of the few places that benefit from the Coronavirus, because people now have time to work on PHP. Derick Rethans 12:24 Yeah, I mean, I'm looking at so much extra code now. Interestingly, when going to the RFC, and as a side note, it mentioned somewhere that when defining more properties, the line length goes too long, because you now have this extra keyword in there. And that could benefit from then separating the constructor arguments over multiple lines. And that that raises the point is that you can use a trailing comma in arrays when you call functions, but not in argument lists. And I saw that you've also made another RFC for adding the trailing commas in the parameter lists. Nikita Popov 12:58 So there's like a super simple RFC, just allow that extra comma. This has actually already been discussed a couple of times in the past, and has not, has been declined that point. Derick Rethans 13:13 I'm just having a quick look at it. Because this RFC is already voting to see what the current votes are, and it's 58 for and one against. Nikita Popov 13:21 I think like the main counter argument people have against this kind of trailing comma stuff is, well, doesn't that mean that it encourages writing methods with a lot of parameters, which is a bad style. I don't think it does. And I think that even if you don't have a lot of parameters, it's fairly easy to run into line length limitations, because nowadays like to use expressive long parameter names, and expressive long type names, so even without adding an extra protected in front of all of that, you can really easily get signatures that split across multiple lines. In which case having the trailing comma is nice, mainly because we already write it everywhere else. Derick Rethans 14:12 Except for in arguments to methods, because you can't. Nikita Popov 14:17 Well, there are also a couple of other places where you can't. For example, like if you have a class implements, and then implements many interfaces, then you can't put a trailing comma after the last interface. And this is something we could also allow. But I think the relevant distinction there is that this is kind of a freestanding list. Um, it's not wrapped inside brackets, or parentheses. So it kind of looks a little bit weird if you have a trailing comma there, which is possibly also why previous RFC on that simply allowed trailing comma everywhere did not pass. Derick Rethans 14:58 As I said, it looks likely that will pass. Nikita Popov 15:01 Yes, I think it's unlikely that we're going to get 13 new no votes. Derick Rethans 15:07 What I also find interesting is that an RFC that you've mentioned earlier in the episode is that attributes are going to pass as well. At the moment, there's only one no votes there as well, which surprised me because the last time attributes was discussed was very much not going to pass whatsoever. Nikita Popov 15:27 Yeah, this is an interesting effect. It's hard to say why it happens. Probably, well, part of the reason is just that issues that were raised on previous proposals have been addressed. For example, the last one by Dmitri had the very controversial aspects where it's exposed the AST. The abstract syntax tree representation of the attributes, which has gone from this one, and thus removes one of the contentious issues. But I think another part is just that sometimes it takes multiple proposals to really get an idea through internals. We have this situation pretty commonly that though the first RFC fails, second RFC fails, and then the third one does pass. Derick Rethans 16:18 It's also it's taken five years or so. And people's opinions might just change about these things. Nikita Popov 16:23 Exactly. The previous proposals might just have been before their time. Derick Rethans 16:29 I saw you had made one other tiny RFC, which is the stricter type checks for arithmetic slash bitwise operators. What is that about? Nikita Popov 16:40 Very simple. So if you're write, well, x minus y, and x is an array. And y is a resource, like what do you expect the outcome to be? There is really no reasonable way that can work. So this RFC proposes to make the arithmetic and the bitwise operators, when working on arrays, when working on objects, and working on resources, simply throw an exception. And the motivation for that was the operator overloading RFC, which has in the meantime been declined. But still, this was a concern raised there that while you can overload operators for objects, but you still get pretty weird behaviour if an overloaded operator is missing, because we currently handle that with just a otice and assuming that the object is equal to one, which is usually not a useful or desired behaviour. Derick Rethans 17:39 There is of course, one exception where you can still use an arithmetic operator, which is the plus between arrays. Nikita Popov 17:46 That's right, yeah. So array plus array is similar to an array merge operation. And that one is of course, well defined and remains supported Derick Rethans 17:55 Whereas things like true divided by 17, although not sensible, it'll continue to work. Nikita Popov 18:00 Right, that also. Yeah, so because this is simply a much more contentious issue whether, like implicitly treating true as one is a good idea or not. Personally, I know I have written code where I, for example, add up booleans. Just as a count of how often something is true. This is like maybe maybe, style wise it would be better to write an explicit integer cast. But the code is also not really wrong. This may be as a discussion for another time. Derick Rethans 18:33 As we've said before, the smaller the RFCs, the easier it is to get them passed as well. Alright, Nikita, thanks for taking the time this morning to talk to me about constructor property promotion RFC, and a few others. We'll see whether they get passed for PHP eight. Nikita Popov 18:48 Thanks for having me Derick, once again. Derick Rethans 18:52 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes Constructor Property Promotion Allow trailing comma in parameter list Stricter type checks for arithmetic/bitwise operators Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 52: Floats and Locales London, UK Thursday, May 7th 2020, 09:15 BST In this episode of "PHP Internals News" I talk with George Banyard (Website, Twitter, GitHub, GitLab) about an RFC that he has proposed together with Máté Kocsis (Twitter, GitHub, LinkedIn) to make PHP's float to string logic no longer use locales. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 52. Today I'm talking with George Banyard about an RFC that he's made together with Mate Kocsis. This RFC is titled locale independent floats to string. Hello, George, would you please introduce yourself? George Banyard 0:39 Hello, I'm George Peter Banyard. I'm a student at Imperial College and I work on PHP in my free time. Derick Rethans 0:47 All right, so we're talking about local independent floats. What is the problem here? George Banyard 0:52 Currently when you do a float to string conversion, so all casting or displaying a float, the conversion will depend on like the current local. So instead of always using like the decimal dot separator. For example, if you have like a German or the French locale enabled, it will use like a comma to separate like the decimals. Derick Rethans 1:14 Okay, I can understand that that could be a bit confusing. What are these locales exactly? George Banyard 1:20 So locales, which are more or less C locales, which PHP exposes to user land is a way how to change a bunch of rules on how string and like stuff gets displayed on the C level. One of the issues with it is that like it's global. For example, if you use like a thread safe API, if you use the thread safe PHP version, then set_locale() is not thread safe, so we'll just like impact other threads where you're using it. Derick Rethans 1:50 So a locale is a set of rules to format specific things with floating point numbers being one of them in which situations does the locale influence the display a floating point numbers in every situation in PHP or only in some? George Banyard 2:06 Yes, it only impacts like certain aspects, which is quite surprising. So a string cast will affect it the strval() function, vardump(), and debug_zval_dump() will all affect the decimal locator and also printf() with the percentage lowercase F, but that's expected because it's locale aware compared to the capital F modifier. Derick Rethans 2:32 But it doesn't, for example, have the same problem in the serialised function or say var_export(). George Banyard 2:37 Yeah, and json_encode() also doesn't do that. PDO has special code which handles also this so that like all the PDO drivers get like a constant treat like float string, because that could like impact on the databases. Derick Rethans 2:53 How is it a problem that with some locales enabled and then uses a comma instead of the decimal point. How can this cause bugs and PHP applications? George Banyard 3:02 One trivial example is if you do, you take a float, you convert it, you cast it to string, and then you cast it back to float. If you're on a locale, which is the dot decimal separator, you will get back the original float. However, if you have like locale which com... which changes the decimal separator, like the German one, you'll get a string; you'll get like three dash, three comma 14, and then when you convert it back to float, you will only get three because PHP doesn't recognise the comma as a decimal separator in its string to float conversion and so it will loses the decimal information. Derick Rethans 3:39 That doesn't seem particularly very useful as a feature. So my question here is we talked about floating point numbers and, and I think floating point numbers have other issues as well. Not sure whether we want to go into the details of how floating point numbers and computers work, but we can if you want to. George Banyard 3:56 The easy way to explain floating points is to use like exponential notation, or to use the scientific exponential notation, which most people will know from engineering or physics, where you usually have like, one significant like the number, like a comma, a couple of numbers, and then you have like an exponent which raises it to usually, so to your power 10 to the something, which then gives you an order of magnitude. Floating points, basically that but in base two. Derick Rethans 4:26 Positions have magnitudes attached to them. They're all powers of two. George Banyard 4:30 Yeah. Derick Rethans 4:31 And of course, when we use numbers an decimal, like pi being a bad example. George Banyard 4:36 Once said. Derick Rethans 4:37 I was going to say if you divide 10 by three, you get 3.33333 that never ends, right. And I reckon if you have a specific number in decimal like three point 14, then you can't necessarily always exactly represent it in binary. George Banyard 4:55 Yeah, one common example would say it's like one 10th which has like a perfect representation in decimal. But like in binary is a never ending repeating sequence. When you try to like display naught point one, like how it's saved in floating point, it's really weird and everything to get like these rounding errors which can propagate. Derick Rethans 5:15 And hence you often hear people recommend to never use float for things like monetary values, but then as you said that you sentence that right? George Banyard 5:23 Yeah, put everything in integers and work with integers and just like format it afterwards. Derick Rethans 5:29 So let's get back to what you and Mate are actually suggesting to change. What are the changes that you want to make through this RFC? George Banyard 5:36 The change's more or less to always make the conversion from float to string the same, so locale independent, so it always uses the dot decimal separator, with the exception of printf() was like the F modifier, because that one is, as previously said, locale aware, and it's explicitly said so. Derick Rethans 5:56 Doesn't printf also have other floating related format specifiers? I believe there's an E and a G as well. And uppercase F. What is the difference between these? George Banyard 6:06 Lowercase F is just floating point printing with locale awareness. Capital F is the same as lowercase, but it's not locale aware. So it always uses the dot decimal separator. Lowercase E is, what I've learned recently also locale aware, and uses the exponential notation, like with a lowercase e. Uppercase E is the same as lowercase E, but instead of having a small like a lowercase e in the printing format, it's a uppercase E, and lowercase G has some complicated rules onto when it decides which format to choose between lowercase F and lowercase E, depending on like how big like the number of significant digits are after the comma, or like the dot. And uppercase G is the same but using uppercase F and uppercase E instead of lowercase E and lowercase F. Derick Rethans 6:58 And all of them can be locale dependent then except for uppercase F. George Banyard 7:02 Yeah. Derick Rethans 7:02 Do you think this is going to impact people's applications, if you change the default of normal casts to be locale independent? George Banyard 7:10 I would have expected it to not be that significant. And only that would affect displaying floating point. So if you're like in Germany, instead of like seeing a comma, you would now see a dot, which can be annoying, but I wouldn't imagine is the most, the biggest problem for you like end users. But apparently, people made tooling to work around the locale awareness of it. And so they could maybe break with passing stuff, which I suppose that happens because it's been, PHP's 25 years old. And that behaviour has been there for like ever. So people worked around it or work with it. Derick Rethans 7:49 Is this going to be purely a displaying change or something else as well? George Banyard 7:54 For example, if you would send like a float to like an API via HTTP, you would usually already need to have like code around to like work around like the locale awareness, or like all by resetting set locale or by using number_format or like sprintf or something like that. Because most other APIs or like you would like contact would expect like the float to use like a decimal point. PHP. If you do the string to float conversion again, which was not a point, then you get only an integer basically. Derick Rethans 8:27 Because PHP's parser, strips it out once it stops recognising digits, which is in this case, the comma. George Banyard 8:33 Yeah, that would make the code nicer. The main reason why me and Mate like decided to propose this RFC is because like most APIs, and also databases and everything, expect strings to be formatted in like a standard way. Currently, like if you for whatever reason, use a locale, then it's not, but yeah, like apparently people worked around that when they were maybe stripping stuff from like HTML whatever displayed and try to work around it because that got raised in the list quite recently. Derick Rethans 9:06 This change does not necessarily remove the ability of using locales for formatting numbers, because PHP still has the lowercase F as format specifier for printf. And sprintf and friends. Does PHP have other ways of rendering numbers according to locales? George Banyard 9:24 According to locales? I don't think so. You can format it something like manually, or the number format a class from the Intl extension. Derick Rethans 9:35 Yeah, from what I understand, number_format, you have to do it all by yourself. And the intl extension doesn't support the posix or C locales from the operating system, right. It uses its own locale rule set from the Unicode project. The RFC lists some alternative approaches. Would you mind touching a little bit on these as well? George Banyard 9:58 One of the alternatives approaches is to deprecate setlocale altogether. Because as a byproduct, this just fixes the issue because you can't define any locale anymore. So, there will always be locale independent. This has been discussed like in back in 2016, mostly because of the non thread safe behaviour. Because it affects global states and everything. But at the time, the conclusion was, because HHVM, like did a patch, making a thread safe, setlocale function was to mimic this patch and like implement it into PHP, which hasn't been done yet. Another one that we thought about was to deprecate kind of the behaviour and like raise a notice, like a deprecation notice, because that would happen like basically on every float to string conversion. The penalty, like the performance penalty, seemed pretty like strong. One other thing we considered was with Mate was to deprecate the current behaviour in some way. However, emitting a deprecation notice on basically every float to string conversion seemed not to be ideal. And just like flood, the log, the log output, and like also bring like a performance penalty because like outputting warnings isn't like most friendly thing to do performance wise. Derick Rethans 11:21 What has the feedback been so far? George Banyard 11:24 Feedback currently has been that like most people, well, one person because there hasn't been that much feedback. Derick Rethans 11:30 There hasn't been that much feedback because you've only just proposed? George Banyard 11:33 some of the feedback we got officiates the change However, they have concerns about like the modification of like, in every case for locales without having any upgrade paths. In some sense. It's just, oh, you have the change, and then you need to execute it and see what breaks. We may be currently considering like ways to figure that out, maybe by adding a temporary ini setting which would kind of be like a debug mode, where when you use that it would like emit notices when like this conversion would happen before and they would notice: Oh, this is not happening anymore. You need to like be aware of this change in behaviour Derick Rethans 12:17 Did we not used to have E_STRICT for this at some point or E_DEPRECATED? George Banyard 12:24 E_DEPRECATED is still a thing. E_STRICT got mostly removed with PHP seven. There've been like a couple of remaining notices which I got rid off or put back to normal E_WARNINGS or E_NOTICES in PHP seven point four. There were like two or three remaining. But yeah, like so that's one way to maybe approach it of like implementing a debug ini setting which would only be used for like dev because then where if you get like warnings and everything, you don't really care about the performance impact. And then in production, you would like disable that and the warnings wouldn't pop up. Derick Rethans 12:56 How would that setting be any different from just putting it behind an E_DEPRECATED warning? George Banyard 13:00 So with an E_DEPRECATED warning, we would need to show this behaviour, and we would need, and we could only change the behaviour in like PHP nine. Currently if we do that with like debug setting, we could change it with PHP 8. Derick Rethans 13:13 That's a bit cheating isn't that? George Banyard 13:15 Could say so. Derick Rethans 13:16 I'm interested to see how this ends up going. Do you have any timeframe of when you want to put it for a vote? George Banyard 13:23 Currently, we've only started this discussion. And I think until we figure it out, if we get like an upgrade pass, or multiple upgrade passes that we could then put into a secondary vote. I wouldn't expect it to go to voting that soon. Maybe end of April would be nice. Derick Rethans 13:41 So around the time when this podcast comes out? George Banyard 13:44 Ah! For once! Derick Rethans 13:46 For once I got my timing right. George Banyard 13:49 Yes. Don't you have like the string contain one which just got out. Derick Rethans 13:53 Yes. George Banyard 13:54 Then that vote close like last week. Derick Rethans 13:57 Yeah, it's really tricky because there's so many, so many small now that I can't keep up. George Banyard 14:02 Yeah, Mark also did like his debug. Derick Rethans 14:04 Yeah. And there's like two or three tiny ones more that I would quite like to talk about. But by the time there's an opening in the schedule, it's pretty much irrelevant. So I'm trying to see whether I can wrap a few of the smaller ones just in one episode because there's the throw expression, the is literal check, and typecasting in array destructuring expressions, and all showed up in the last three days. George Banyard 14:26 I suppose people have like, lots of time now. Now, it's a taint checker, basically, like I know, there's been like this paper by Facebook like six or eight years ago, which talks about how they kind of tried to implement in their static analyzer, but like, a static analyzer doesn't need to be something in the engine. That's what I don't really get. Derick Rethans 14:45 Thank you, George, for taking the time this afternoon to talk to me about a locale independent float to string RFC. George Banyard 14:53 Thanks for having me on the podcast again. Derick. Derick Rethans 14:55 You're most welcome. Thanks for listening to this installment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Locale-independent float to string cast Floating Point Numbers Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 51: Object Ergonomics London, UK Thursday, April 30th 2020, 09:14 BST In this episode of "PHP Internals News" I talk with Larry Garfield (Twitter, Website, GitHub) about a blog post that he was written related to PHP's Object Ergonomics. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 51. Today I'm talking with Larry Garfield, not about an RFC for once, but about a blog post that he's written called Object Ergonomics. Larry, would you please introduce yourself? Larry Garfield 0:38 Hello World. My name is Larry Garfield, also Crell, CRELL, on various social medias. I work at platform.sh in developer relations. We're a continuous deployment cloud hosting company. I've been writing PHP for 21 years and been a active gadfly and nudge for at least 15 of those. Derick Rethans 1:01 In the last couple of months, we have seen quite a lot of smaller RFCs about all kinds of little features here and there, to do with making the object oriented model of PHP a little bit better. I reckon this is also the nudge behind you writing a slightly longer blog post titled "Improving PHP object ergonomics". Larry Garfield 1:26 If by slightly longer you mean 14 pages? Yes. Derick Rethans 1:29 Yes, exactly. Yeah, it took me a while to read through. What made you write this document? Larry Garfield 1:34 As you said, there's been a lot of discussion around improving PHP's general user experience of working with objects in PHP. Where there's definitely room for improvement, no question. And I found a lot of these to be useful in their own right, but also very narrow and narrow in ways that solve the immediate problem but could get in the way of solving larger problems later on down the line. So I went into this with an attitude of: Okay, we can kind of piecemeal and attack certain parts of the problem space. Or we can take a step back and look at the big picture and say: Alright, here's all the pain points we have. What can we do that would solve not just this one pain point. But let us solve multiple pain points with a single change? Or these two changes together solve this other pain point as well. Or, you know, how can we do this in a way that is not going to interfere with later development that we've talked about. We know we want to do, but isn't been done yet. So how do we not paint ourselves into a corner by thinking too narrow? Derick Rethans 2:41 It's a curious thing, because a more narrow RFC is likely easier to get accepted, because it doesn't pull in a whole set of other problems as well. But of course, as you say, if the whole idea hasn't been thought through, then some of these things might not actually end up being beneficial. Because it can be combined with some other things to directly address the problems that we're trying to solve, right? Larry Garfield 3:07 Yeah, it comes down to what are the smallest changes we can make that taken together have the largest impact. That kind of broad picture thinking is something that is hard to do in PHP, just given the way it's structured. So I took a stab at that. Derick Rethans 3:21 What are the main problems that we should address? Larry Garfield 3:24 So the ones that identify that people have been talking about are the following. One is constructors are just way too verbose. If you've looked at almost any PHP class, in almost any framework, the most common pattern is: you start with a class, you declare three to five properties that are private or protected. Then you have a constructor that takes three to five parameters and assigns each of those to those properties. Usually the names match all the way through, types match all the way through. It's all it's doing is shoving those parameters into properties. Right now, you have to repeat each property name four times total. It's just way too verbose. It's just more typing than we should be doing. And so there have been various proposals for ways to have to type less to do that. Derick Rethans 4:11 We'll get to the solutions in a moment, I'm sure. Larry Garfield 4:14 The next one is what I've called the bean problem. So I've referenced to Java beans. For those who have not worked with Java before. And I haven't worked with it in a long time. But when I last did, this was standard, you'd have what's called a Java bean, which is just a Java class that has a bunch of properties that are private, and then a getter and a setter for every single one of those properties. PHP, you see the same pattern a lot, especially in ORMs. Largely that comes down to this makes serialisation and deserialization straightforward because you can access properties through a method, you know, the names, automatic naming and so on. But that's again, an awful lot of typing to bypass the private and protected keyword. So how can we reduce the mental overhead of that and just have access to what we need to with less work. That relates to a lot of the reasons for that is immutable objects. So it's been increasingly popular in PHP in recent years to have objects that even though the language doesn't support immutability are effectively immutable, in that the object doesn't give you a way to change its properties. But it gives you a way to create a new object that is the same, but with certain changes. Think DateTimeImmutable in PHP core, or it has a modify() method, which doesn't change the objects in place. You see, if you call a DateTimeImmutable object, call it with the modify() method with a parameter of plus one week you get back a new DateTimeImmutable object, that is the timestamp one week later. That pattern is increasingly common. PSR-7, the HTTP messages spec uses that a lot of other packages have started doing it. The way that usually ends up working is these wither methods. It's with some value, with some some property name and so on, similar to a setter, but it returns a new object and there's a common pattern for that now. Another problem is materialised values, where you have something that conceptually is a property. And to a outside caller, it really should just be a property. But you want to not have it be a full property itself. The example I use the kind of the canonical example is you have a first name property and a last name property and you want to format a full name property. There's a lot of cases like that. Right now, you do that as a method, and you have some kind of static cache internally. Which works. It's just: Can we make that better? And can we not make it worse with any of these other changes? A lot of this comes down to how do we make not make any of these problems worse. Another problem is, for lack of better term, and what I call the documented property problem, where if you have a large constructor, then you're going to pass in a bunch of different values because they all map to properties, but you need to keep track of: Okay, which one of these is which? And especially comes up for value options, rather than service objects. Were introduced in C, or Rust or Go would just be a bare struct, essentially, which PHP doesn't have. And we can get to why I think that's okay, we don't have. But objects where you really just have a combination of properties, and that's okay. But you still need to keep track of them, you want to be able to create an object that has only some of them. And if you have eight optional properties, and you want to just set the last one, right, now you have a bunch of nulls or question marks, or empty quotes, or zeros, or whatever default value, and again, it's just very cumbersome. And so the kind of the question I was looking at is, how can we make all of these better and not make any of them worse? That's kind of the problem space. I think most people can relate to, at least most of these. Derick Rethans 7:46 I would think so to certainly in some of my code, where that's been the case. Hopefully, that was all the problems you found. Larry Garfield 7:53 I think I got all of them. Derick Rethans 7:55 As I alluded to, in the introduction, there have been quite a few smaller RFCs already to address some of the problems that you just mentioned. Which you list and as well as others in things that you have found that multiple people currently already do. Should we have a quick look at what these things are? Larry Garfield 8:15 One of the proposals that I looked at was writeonce properties, as we are recording this, there's an RFC for that that's in voting. Although it looks like it's probably not going to pass that the vote stays where it is. Now, the idea there is allow typed properties to have a read only marker on them just like the type or public or private, and then they can only be written to once if they're uninitialised you can write to them, after that they're just stuck that way. The advantage is that would make them safe to expose publicly. And so you can have a property that you can expose to the world just access a property but not be concerned about someone changing it out from under you. The downside of that mainly comes down to that evolvable immutable object where that with method then becomes a lot harder, because you can't say: clone this object and change this one property because well, you can't change this one property, you'd have to fully construct a new object. There's also two different proposals that have been floated recently for compact object property assignments. I think they have different names for the same basic idea. Basically, if an object has public properties, being able to write to those in one shot in a code block, along with the constructor in a named fashion. It's essentially there's a common pattern now where you pass an associative array to a function which has a bunch of named properties, and then you can put them in whatever order you want. And then you know, dissect those and map those to properties internally. It's essentially taking that idea and baking it into the syntax, which does help and gives you when you have a lot of properties that are optional. It makes it a lot easier to you have a lot of properties defined or a lot of parameters defined it makes it a lot easier to piecemeal select them. The downside is all of those proposals to date only work on public properties, which have a long list of challenges with them. It also means you're bypassing any kind of validation around this property is only valid if this property is set, or this property has to be less than this property, and so on. Those are too limiting, but definitely they're trying to solve a real pain point. Derick Rethans 10:19 Nor can you enforce types through that, of course. Larry Garfield 10:21 Some of them I think, might be able to Derick Rethans 10:23 I meant associative arrays. Larry Garfield 10:25 Yeah, the associative array approach you can do now, which is really the only possible thing I can say in its favour is that it works today. Type enforcement isn't there, it's poor for documentation. Please don't do that. All these are dancing around names parameters, which is a different language feature that's been discussed on and off for many, many years. I don't know of any current RFCs on the table for this one, but it's come up many times. Number of languages have this Python has it for example, where give or take whatever syntax instead of specifying, call this function with parameters, one, seven and 19, and then you have to guess what those numbers mean, you can call a function with count equals one, order equals ASC, whatever. And then you can reverse the order, change the order around. It's essentially the same idea. But for function parameters rather than Object Properties. Again, there's implementation challenges there. But certainly there are languages that do it successfully. Another problem space people have been looking at is access control. So we mentioned the the read only property. In the discussion for that Nicholas Grekas, made a suggestion for having instead of having a read only flag, allow the access control on a property to be different for read and write. So you could have a property that is publicly readable but not writable. But private writable, or private and protected writable. That gives you many the same benefits as the read only flag would have, but without breaking some of the current patterns we have around cheap cloning of objects and so forth. Derick Rethans 11:58 Because of course in PHP, PHP's object oriented system is based on classes, not on objects. You can access read and write private properties of other objects as long as they have the same class. Larry Garfield 12:10 Correct. And that's something that we take advantage a lot of in cloning, to hold wither method style is based on that. If that feature of PHP went away, it would break an awful lot of code. So don't change that. Other things have been on the table. People have talked in the past about constructor promotion, which is a feature that a couple of languages have including Hack, which is the Facebook PHP fork. The basic idea there is, instead of repeating properties once for their declaration, once in the constructor, and then twice in an assignment, you just declare them as part of the constructor. And it becomes essentially a macro to expand that out to the same original code. Hack already has a syntax for that. This one actually has been a proposal for PHP before and it didn't pass. Derick Rethans 12:57 Was it proposed in the exact same syntax as Hack? I don't believe so because Hack had types at the moment, and PHP did not. Larry Garfield 13:05 The earlier syntax, I was just looking at that RFC earlier today, used public function constructs this arrow foo, comma, this arrow bar. And then you still had to declare the properties independently, so it only solves half the problem. And the syntax looked kind of weird. The Hack syntax just lets you put the entire property declaration in place of the parameter in the constructor line, and it fills in all of the other pieces. You have public function, construct, parentheses, private int, a number, private bar, some bar object, and so on. And it would automatically create that property on the class and take the parameter and promote it and do the assignment for you. So that's what Hack does. I believe TypeScript has something similar, although I haven't worked with it. It's again just simplifying that common case. Another non PHP place I look for inspiration is Rust, because Rust does immutable objects very well. And so I figured, alright, let's let's look what other languages are doing. What Rust does, they have objects that are more bare than PHP does, much like Go where it's really a struct to which you can attach methods rather than an enclosed object, but they let you create a new object. Here, the object constructor syntax is essentially named parameters already, you're essentially providing a Json like block of this property of this value, this property should have this value, similar to the object constructor proposals. But you can then say, dot dot some other object of the same type, which Rust reads as: and fill in anything I haven't specified with the values from this other object. The fallout of that is making new object that is the same as this other object, but for this one change really easy. Could we do something like that either using Rust syntax or something else just conceptually, would that work to make with the with style methods easier, possibly would it help bypass the problems with a read only flag and so on. Finally, kind of the granddaddy of them all proposal in PHP from a couple of years ago is property accessor methods. This is a very contentious RFC, it didn't pass mostly for performance reasons, as I understand it. But the idea here was you could declare a property to have a dedicated getter and setter method. And then when you try to read or write a property, that method gets called transparently in the background. It's essentially the same idea as the magic get and magic set methods on objects, but specifically for each property, which can then eliminate a lot of: if we're talking about this property, if we're talking about that property gives you a lot more flexibility. It also allows you to then, because those are methods, control the access of those methods separately for get and set. So you can have a public getter and private setter method. A number of other languages have this, Python does, JavaScript does. So I included that okay, this has been a proposal on the table before, I personally really like it. The only downside is the performance impact because since people can't really know in advance if a property it's going to be accessing is guarded by methods like this or not, it means every property access, therefore has an extra if statement around it in the engine. And the performance impact of that, well, small, individually, really adds up when you're talking about 10s of thousands of property accesses. As I understand that, that was the main reason that it didn't pass before. I don't have a good solution for the performance issue. Unfortunately, it would be delightful if you know the typing system would let us do that. Or if the JIT would do something there. I have no idea that's well out of my wheelhouse. Derick Rethans 16:34 That's lots of solutions that people have come up with in the past and haven't made RFCs for yet. Solving them all one by one, as you mentioned isn't particularly useful thing to do. Because, as you say, you end up in a jumbled mess of things. Your article continues to have an analysis section about all the different aspects of all the different problems and solutions that we've just mentioned here. What's your thinking here, how to join up all the dots? Larry Garfield 17:00 My goal was alright, as I said, what's the minimum amount of change we can do, that gets us the maximum benefit and solve as many problems as possible without making anything worse? Is there a way that we can make some problems not their own problem, but the result of some other problem? Can we make one a degenerate case of another and thereby solve, kill multiple birds with one stone essentially? What I came up with was: one, constructor promotion on its own, I think is very useful. Let's do that. Named parameters on their own are very useful, let's do that. The combination of constructor promotion and named parameters together gives us the equivalent of a object initialization syntax. The specific symbology in the syntax may look slightly different. But essentially you get the same net effect where you could say, hey, new product object and pass it a series of key values and you're done. And the object itself is defined as just a bunch of key values in the construct statements, and no body, and that still gets promoted. So we end up with struct like, or record like objects with relatively little syntax as kind of a side effect of these two other changes that have good arguments for them on their own. Derick Rethans 18:14 And also without introduce a new concept such as struct. Larry Garfield 18:18 Exactly. There's also discussion about, should we just introduce a separate language construct for a struct or a record, that is just their properties, possibly some validation, they will pass by value instead of by reference, which makes immutability easier, to design those for immutability. I've toyed with that idea in the past. And every time I come down to eventually I'm going to want to do everything that classes do anyway. Or if they do something special, I'm going to want to do those in classes, except for the way they pass. Legitimately, there's cases where we would want to have a value object that passes in a more by value style instead of the pseudo reference that objects passed today. There are use cases for that, that's really the only difference. Everything else is essentially the same in both cases, it's more work than is needed to try and create a whole separate construct there. Instead, let's make this one construct flexible enough that we can use it in either way, at whatever use case makes sense. I think those two changes together give us the most bang for the buck and don't harm anything else. Derick Rethans 19:16 Both of these two proposals help to solve the first problem that you have outlined, which is the problem with constructing objects. So the other problem that we spoke about is the value object and access to properties for example. Have you come up with a solution of which proposals would work towards solving that problem as well? Larry Garfield 19:36 My proposal on that front, based on what's available, is so I like Nicholas's idea of separate access control for read and write. Okay, now what syntax can we use for that that is going to be self explanatory and readable and not block property accessors if we ever get to the point of figuring out how to do those performently. I don't think we can go all the way to property accessors right now, I would love to, but I don't think that's feasible. Instead, we can borrow some of the syntax from that proposal and let you declare hard to explain this in verbal format. It's like: string name, curly brace, public get, private set, curly brace. Which is essentially the syntax that the property accessor proposal RFC had, but with the method bodies removed, which that RFC actually supported anyway. And what that gives us is then a syntax to say, this property has different visibility for reading and writing, for get and for set, in a way where it's natural to be able to add in functionality to that later for getters and setter methods. If we figure out how to do it. There are probably other syntaxes that could do the same. I'm flexible. I think the key here is some sort of syntax that gives us that split visibility in a way that opens itself to future extension, rather than just throwing more keywords before a property and hoping it works out for the best. And once you've done that, then I think it's worth it to consider: could we do some kind of Rust like cloning or Rust like creation process? I don't know. It could be a variant on cloning. People have proposed a clone this with and then list of properties. And that, essentially de-sugars into creating that new object and then calling a bunch of property set commands. Maybe that's viable. Maybe it's not I'm not sure. Maybe using a syntax closer to what Rust has so that certain thing parameter lists can get auto populated, I don't know. But I think that's an area worth exploring, and would be a nice add on to these others, but it's not a prerequisite. The thing I like about what I'm proposing here, each of these individual pieces carries value on its own. And there's a good reason to vote for each of these on their own, but they dovetail together so that the whole is greater than the sum of the parts. And I think that's the mark of good design where you don't solve each individual problem. You have tools that together solve several problems. It just kind of falls out of the design. Derick Rethans 22:06 Of course, at the moment you wrote this blog post, none of these proposals had more to it than your description in your article. Larry Garfield 22:15 Some of them had old RFCs that had been proposed and either didn't make it to a vote or the vote gone slightly negative for various reasons. But yeah, I did not have any patches. My C skill is still extraordinarily limited. That this was a discussion starter, not a here's an RFC with code. Derick Rethans 22:32 Of course, we are no day and a half or two days later. And now there is of course, an RFC for one of them, which is the constructor promotion, which pretty much as we spoke about earlier, picks up Hacklang's syntax and ports it to PHP. Larry Garfield 22:47 Yes, I've concluded that my primary role in PHP internals is inspiring Nikita to go write things. Derick Rethans 22:53 And you were successful in this case. Larry Garfield 22:56 A year ago, I was on this podcast with you talking about comprehensions, when I was pushing for those, and those never happened. But out of that discussion, Nikita noticed, oh yeah, short lambdas I should go finish those and then went and finished that RFC. My role is convincing Nikita, he should do things. So I consider that a worthwhile contribution. Derick Rethans 23:13 Fair enough. I agree. Anyhow, it would be interesting to see where this ends up going. We are about, what three, three months away from PHP 8.0's feature freeze. So there's plenty of time to look at these other three proposals that you concluded would be great to have altogether. Larry Garfield 23:32 I'm happy to work with anyone who actually does know, working on internals on any of these. Personally, I think the asymmetric visibility is the next one after constructor promotion. That's straightforward to do. I know Levi Morrison on the lists has suggested that named parameters has a lot of other gotchas around it that I didn't get into here. And that is very likely. There may very well be implementation reasons why these are harder than I present them as. I fully acknowledge that. But again, if any of these individually, I think still moves the language forward in a way that doesn't close off future avenues. Derick Rethans 24:07 Do you think you'll end up learning some C to be able to work on this yourself? Larry Garfield 24:11 So I used to work in C briefly, 16 years ago. I had a very, very short career writing software for Palm OS. Derick Rethans 24:18 And I remember us talking about it, when we recorded episode last year. Larry Garfield 24:22 And I did some C again, just recently, while playing with FFI. As we've discussed before, the PHP engine is not written in C, it's written in a macro language that is written in C. There's a learning curve there that I have yet to scale. Derick Rethans 24:34 Fair enough. Larry Garfield 24:35 If someone wants to mentor me in that while we work on one of these, I am very open to that. So putting that out there. Derick Rethans 24:40 You might be inundated by messages now, you never know. Larry Garfield 24:43 Better that then getting ignored Derick Rethans 24:45 Do you have anything else to at? Larry Garfield 24:46 I think it's beneficial for PHP collectively to take this broader approach of, not just okay, what can solve this immediate problem in front of us, we can scratch this one itch, but what are all the itches that we have that need to get scratched? And how can we solve all of those in a way that is going to have the best bang for the buck. And let us do the least amount of work at the least amount of syntax, least amount of conceptual overhead, and yet give us the most flexibility. And there's been a lot of talk anytime we're talking about the PHP type system of we eventually want generics, generics are hard. But let's make sure that whatever we do, doesn't make generics even harder. I think that's good that we have this goal in mind. And we're: all right, what iterative steps get us closer to that without locking us, in without painting us into a corner. And that's kind of what I'm trying to do here. And I would very much encourage everyone working on PHP to take that approach of: don't solve the immediate problem, look at the broader picture, what will solve multiple problems, what will dovetail nicely with something else and what kind of big picture plan in architecture we can look at that ends up making the language better rather than just looking at our feet. Derick Rethans 25:57 Well, thanks for taking the time this afternoon to come and talk about the object ergonomics. We'll see how much of it ends up in PHP eight. Larry Garfield 26:05 Fingers crossed. Derick Rethans 26:07 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes Larry's Blog Post Improving PHP's object ergonomics RFC: Object Initialiser RFC: Compact Object Property Assignment Episode 30: Object Initialiser Episode 49: COPA Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 50: The RFC Process London, UK Thursday, April 23rd 2020, 09:13 BST In this episode of "PHP Internals News", Henrik Gemal (LinkedIn, Website) asks me about how PHP's RFC process works, and I try to answer all of his questions. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 50. Today I'm talking with Henrik come out after he reached out with a question. You might know that at the end of every podcast, I ask: if you have any questions, feel free to email me. And Henrik was the first person to actually do so within a year and a half's time. For the fun, I'm thinking that instead of I'm asking the questions, I'm letting Henrik ask the questions today, because he suggested that we should do a podcast about how the RFC process actually works. Henrik, would you please introduce yourself? Henrik Gemal 0:52 Yeah, my name is Henrik Gemal. I live in Denmark. The CTO of dinner booking which does reservation systems for restaurants. I've been doing a PHP development for more than 10 years. But I'm not coding so much now. Now I'm managing a big team of PHP developers. And I also been involved in the the open source development of Mozilla Firefox. Derick Rethans 1:19 So usually I prepare the questions, but in this case, Henrik has prepared the questions. So I'll hand over to him to get started with them. And I'll try to do my best to answer the questions. Henrik Gemal 1:27 I heard a lot about these RFCs. And I was interested in the process of it. So I'm just starting right off here, who can actually do an RFC? Is it anybody on the internet? Derick Rethans 1:38 Yeah, pretty much. In order to be able to do an RFC, what you would need is you need to have an idea. And then you need access to our wiki system to be able to actually start writing that, well not to write them, to publish it. The RFC process is open for everybody. In the last year and a half or so, some of the podcasts that I've done have been with people that have been contributing to PHP for a long time. But in other cases, it's people like yourself that have an idea, come up, work together with somebody to work on a patch, and then create an RFC out of that. And that's then goes through the whole process. And sometimes they get accepted, and sometimes they don't. Henrik Gemal 2:16 How technical are the RFCs? Is it like coding? Or is it more like the idea in general? Derick Rethans 2:23 The idea needs to be there, it needs to be thought out. It needs to have a good reason for why we want to add or change something in PHP. The motivation is almost as important as what the change or addition actually is about. Now, that doesn't always get us here at variable. In my opinion, but that is an important thing. Now with the idea we need to talk about what changes it has on the rest of the ecosystem, whether they are backward compatible breaks in there, how it effects extensions, or sometimes how it effects OPCache. Sometimes considerations have to be taken for that because it's, it's something quite important in the PHP ecosystem. And it is recommended that it comes with a patch, because it's often a lot easier to talk about an implementation than to talk about the idea. But that is not a necessity. There have been quite some RFCs where the idea was there. But it wasn't a patch right away yet. It is less likely that these RFCs will get accepted, because in order to get something into PHP not only needs to be there a good idea, that also needs to be there a good implementation of it. If you have been a long term contributor to PHP, then you should know how to write a patch yourself. In other cases, you'll see people that have an idea try to find somebody else to do and work on the implementation together. But all RFCs, if they get accepted. It's always pending a good implementation. Henrik Gemal 3:52 How is an RFC actually done? Is that like a template you fill out or is it like a website or how does it work? Derick Rethans 3:59 Our Wiki, I will add a link to that in the show notes, has a template of how to create an RFC. It has a set set of sections. There's always an introduction that basically lays out what it is about or why this change is being made. Then there is often a proposal of what the change actually is. And then there's a few sections that are sometimes empty or sometimes are filled in such as, at least backwards incompatible changes, for which PHP version is been targeted, what the impact is to all the parts of the PHP ecosystem. But these things are not always necessary, because they don't always make sense to do right? If you want to add a new syntax to PHP, then that almost never influences existing extensions, but it will influence OPCache, for example. And then there's also often things like open issues, things we haven't quite thought through yet. A bit of a discussion, discussion bits will get filled in after people in the PHP internals list, which I'm sure we'll get to in a moment, come up with better ideas or alternatives sometimes, and then things like future scope will also be part of the template. We don't really require a very rigid approach to this, but we do appreciate if all the sections are filled in, or at least thought about in such a way that there's either information or not information. And then at the end, there's often a proposed voting choice. Everything at the moment needs to pass by two thirds majority before it gets accepted. So yeah, those are the things in the template itself. But the template is important. And you do need to fill it in, if you want to propose an RFC. Henrik Gemal 5:33 Are all RFCs public or do you have like private RFCs? Derick Rethans 5:38 All RFCs have to be public, otherwise they can't be voted on. But some RFCs start out of just a conversation with a few developers coming up with an idea. In the last few months, some more complicated RFC start out on a GIT repository. As a pull request, they never get merged anywhere. Because on GitHub, it makes it much easier to comment on specific sections for adopting feedback. Instead of having large discussions on the PHP internals mailing list, where sometimes comments might just get lost because there's too much text in there. Even though these RFC start out, while they're still sort of public, but nobody knows about them. In the end, they will always have to be public otherwise there won't be any voting, done on it, and it won't get accepted. Henrik Gemal 6:27 Where's the RFC sent to and who's kind of in charge of the RFC? Is the one that makes the RFC or is it like a RFC commander? Derick Rethans 6:37 The person that makes the RFC is responsible for guiding it through the whole process that we have. Once they are finished, there is a requirement for you emailing the PHP internals list with a specific prefix, which I think is RFC in square brackets. And then that starts a minimum discussion periods of two weeks. That discussion period might end up longer, in cases, lots of things to talk about or discuss or lots of disagreements, but the discussion period has to be a minimum of two weeks on the PHP internals mailing list. Henrik Gemal 7:09 I was wondering a little bit about the priority RFCs because I see RFCs as like, a little bit like feature requests. So wondering who actually decides on the priority of an RFC? Derick Rethans 7:23 Nobody really decides on the priority. Multiple RFCs can go through the process at the same time, you don't really have a priority of which one is more important than others. So yeah, there's nothing really there for it. Henrik Gemal 7:35 I was just wondering if it's done like a normal project, you know, there might be many RFCs at the same time. I'm wondering how many kind of RFCs are there at the moment, are we talking 10 or are talking thousands? Derick Rethans 7:50 This depends a bit on where in PHP's release cycle we are. PHP should get released at the end of November or the start of December. In all PHP seven releases that actually has happens. Usually the period between December and March, there will be like maybe one or two a week, which is great because that makes it possible for me to pick the right one to make an episode out for the podcast. At the moment, there are 10 outstanding RFCs. That means there are so many that I don't actually have enough time to talk about all of these on the podcast. However, they are often more just before we go to feature freeze, which happens at the end of June. So there's still two months to go. But you also see that over the last two years, there's a lot more smaller RFCs than there are big RFCs. So big RFCs like union types. They tend to be early in a release cycle, where smaller RFCs, as an example here, there's currently an RFC that there is no episode about, that suggests to do a stricter type checks for arithmetic or bitwise operators. Those are tiny, tiny changes. And in the last two years, there have been more and more smaller RFC than bigger RFCs because they tend to limit the amount of contention that people can disagree with and hence, often makes it easier to then get accepted. That is a change that I've been seeing over the years. But no, there are no thousands for each PHP version, I would say on average, there's about one a week, so about 50. Henrik Gemal 9:19 I want to get a little bit into the voting part, because that sounds kind of interesting, who can actually vote? Derick Rethans 9:28 After the two week minimum discussion period is over on the PHP internals mailing list, an RFC author can decide to put up the RFC for a vote. And that also requires you then to send an email to the PHP internals mailing list prefixing your subject with the word vote in capital letters. Now at this moment, you unfortunately see that people start paying attention to the RFC. Instead of doing that during the discussion period. At a moment of vote gets called you shouldn't really change RFC unless it's for like typos or like minor clarifications to things, you can't really change syntax in it for example. People can vote our people with a PHP commit access. And that includes internals developers, documentation contributors, and people that do things in the infrastructure. Everybody that has a PHP VCS account and VCS, version control system, that used to be CVS and now then SVN, and now GIT, as well as people that have proposed RFCs. So the group that technically could vote is over 1000 big, but the amount of people that vote is very much under 50 most of the time. We don't really have any criteria beyond you have to have an account to be able to vote in PHP RFCs. Henrik Gemal 10:43 How is the voting actual done? Derick Rethans 10:47 Since about last year, each RFC needs to be accepted, with a two thirds majority. On each RFC on the wiki, once a vote gets called you as an RC alter needs to include a small code snippets that then creates a poll. Very often do we want this? Or do we not want this? So it's a yes or no question. But sometimes there are optional votes, whether we want to do it a specific way, or another specific way. Sometimes that allows you to then select between different syntaxes. I don't think that is necessarily a good idea to have. I think the RFC author should be opinionated enough about picking a specific syntax. It is probably better to have a secondary vote as we call those. Those secondary votes don't to have two thirds majority is often which one of the options wins out of these. But the main RFC won't get accepted, unless there's a two thirds majority with a poll done on the wiki. Henrik Gemal 11:46 What happens after the vote? You know if it's both if it's Yes or no? Derick Rethans 11:53 I'll start with the easy case, the no case. If it's a no then the RFC gets rejected. That also means that sometimes an RFC fails for a very specific reason. Maybe some people didn't like the syntax, or it was like a one tweak where it would behave in a wrong way or something like that. But as a rule that says that you cannot put the same subject back up for discussion for six months, unless there are substantial changes. Now, this has happened with scalar type hints, for example, and a few other big ones. If an RFC gets accepted, then pending on whether there is an implementation, the implementation will get set up as a pull request to the PHP project on GitHub. And then the discussion about the implementation starts. If the implementation doesn't get to the point where it is actually good enough, or whether it can actually not be implemented in a way that it doesn't impact performance, it still might end up failing, or might not get merged. And in some cases, it means that a feature will get added at some point but it might not be necessarily in the PHP version that it got targeted for. I don't actually have an example for that now. If the implementation is already good and already discussed it can get merged pretty much instantly. And then it will be part of the next PHP version. Henrik Gemal 13:08 How many RFCs voted on every year? And what majority voted yes or no? Derick Rethans 13:16 I don't have the stats for that. But there is a website called RFC watch, where you can see which RFCs had been gone through and which one had been accepted or not, in a nice kind of graph way. I will add a link in the show notes for that. I would guess that during a year, about 50 RFCs are voted on. And I will think that about half of them are passing. But that's a guess I don't have the stats. Henrik Gemal 13:42 Thank you very much for the answers. It brought me closer to the whole process of the PHP development. You have any other things to add? Derick Rethans 13:52 I don't think so at the moment. I think what we she'd be a bit careful about is that although we're getting closer and closer to feature freeze at the end of June. We currently have just elected the new PHP eight zero release managers, but I keep the names secret, because this podcast is recorded in the past. They are going to be responsible now for doing all the organisatorical work for PHP eight zero. And that also means that feature freeze will happen at the end of June somewhere. And I expect to see a bunch of RFCs coming up with just enough time to make it into PHP eight zero, or not. So that's going to be interesting to see what comes up there. But other than that, I think we have explained most things in the RFC process now. And I thought it was a fun thing for once somebody else asking the questions and me giving the answers. And I think in the future, I think I would like to do like a Q&A session where I have multiple people asking questions about the PHP process. I also thought this was a good experiment and thanks for you taking the time to ask me all dthese questions today. Henrik Gemal 15:00 No problem. I love your podcast. I listen to it whenever I bike to work. It's nice to get some insights into the PHP development. Derick Rethans 15:10 Yeah, and that is exactly why I started it. Thank you Henrik for taking the time this morning to ask me the questions. And I hope you enjoyed it. Henrik Gemal 15:18 Thank you very much for having me on the show. Derick Rethans 15:22 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes How to create an RFC List of RFCs php RFC Watch Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 49: COPA London, UK Thursday, April 16th 2020, 09:12 BST In this episode of "PHP Internals News" I converse with Jakob Givoni (LinkedIn) about the "Compact Object Property Assignment", or COPA for short, RFC that he is proposing for inclusion in PHP 8. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 49. Today I'm talking with Jakob Givoni about an RFC that is made with a very long name, the compact object property assignment RFC or COPA for short. Jakob, would you please introduce yourself? Jakob Givoni 0:39 Yes, my name is Jakob. I'm from Denmark, and I've been working programming in PHP for 20 years now. I work as a software engineer for a company in Barcelona that's called Vendo. I got inspired to get involved in PHP internals after I saw you as well as Rasmus and Nikita in a PHP conference in Barcelona last November. Derick Rethans 1:00 there was a good conference, I always like going there. Hopefully, they will run it this year as well. What I'd like to talk to you about today is the COPA RFC that you've made. What is the problem that this is trying to solve? Jakob Givoni 1:14 Yes, I was puzzled for a long time why PHP didn't have object literals. And I looked into it. And I saw that it was not for lack of trying. Eventually, I decided to give it a go with a different approach. The basic problem is simply to be able to construct, populate, and send an object in one single expression in a block, also called inline. It can be like an alternative to an associative array. It gives the data a well defined structure, because the signature of the data is all documented in the class. Derick Rethans 1:47 Of course, people abuse associative arrays for these things at a moment, right? Why are you particularly interested in addressing this deficiency as you see it? Jakob Givoni 1:57 Well, I think it's a common task. It's something I've been missing, as I said inline objects, obviously literals for a long time, and I think it's a lot of people have been looking for something like this. And also, it seemed like it was an opportunity that seemed to be an fairly simple grasp. Derick Rethans 2:14 What kind of solutions do people use currently, instead? Jakob Givoni 2:18 I think, very popular one is the associative array where you define key value pairs as an array. The problem with that is that you don't get any help on the name of the indexes nor the types of the values. Derick Rethans 2:33 I mean, it's easy to make a typo in the name, right? And it just either exists in the array suddenly, if you set it or you just get a random null value back. As you said, yeah, there's no way of enforcing the type here, of course. COPA compact object property assignment is a mouthful, and it is a new bit of syntax to the PHP language. What is this new syntax going to look like? Jakob Givoni 2:55 While it looks just like when you assign a value to a property, but here you can add several comma separated lines of property name equals value inside a square bracket block, which is coming after the array and the array arrow operator. The syntax shouldn't really conflict with anything else we have at the moment. Derick Rethans 3:17 Because that's becoming more and more of a problem, right? Finding new bits of characters to use for new syntax. It is something that came up with annotations or attributes as well. Jakob Givoni 3:27 And then to start talking about, does this look like typical PHP? Or do you just like this syntax? Or do you hate it? It becomes a taste based thing. For me, the important thing is that if it works, and if it's fairly trivial to implement, I don't have a problem with it. Derick Rethans 3:43 There was a related RFC early in the year which was called the object initializer RFC. How is your proposal different from that one? Jakob Givoni 3:51 The object initializer is a new concept. Mine is different in in that I didn't want to introduce any new concepts. My approach was focused on pragmatism. In that other RFC, the initialization is done at the construction time. And you can kind of do it without even having to define your constructor. And one of the most important aspects of that one was to enforce that all the mandatory properties have been initialised. Because you can have type properties in PHP 7.4. If they don't have a value, then there is introduction of this new state of uninitialized properties. And the author of that RFC wanted to make sure that once the object was ready was fully constructed, it would validate that there was nothing missing there. So it has like six out of seven characteristics in common with mine, and one characteristic that is different. I looked into this about the mandatory promises and I didn't find a simple way or an obvious way to handle it. I have one idea if this COPA should pass and I have another idea if it fails. I didn't want to include that it was not part of my main goals. Derick Rethans 5:01 I'm looking at the syntax here for a bit. And it seems that way how you can do this COPA block. If you have an object, you use the arrow which is dash greater than sign square brackets, and then the list of properties that you want to assign values to. And the RFC shows that to be equivalent to doing each line manually yourself. Does that mean that it is only works for public properties? Jakob Givoni 5:31 No, it would work also, for what do you call it, virtual properties that don't actually exist, or if they're private, it would just invoke the magic set method in that case. The same thing would happen as if you were to do the assignment line by line as in the example. Derick Rethans 5:48 Without there being the underscore underscore set method set, it means that you can only really set the public properties in that case. Jakob Givoni 5:56 You won't be able to set private or protected properties directly unless the magic method does that. Derick Rethans 6:03 So does that mean that it is pretty much only something that happens in syntax, and it doesn't have any other side effects or any other functionality that you wouldn't already be able to do? Jakob Givoni 6:15 Yeah, it's just a new syntax for that. The emphasis here was pragmatism. So not introducing any new concepts. Derick Rethans 6:23 What would use cases for this be? Jakob Givoni 6:25 Typically, as I mentioned, they're data transfer objects, value objects. Those simple associative arrays that are sometimes used as argument backs to constructors, when you create objects. Some people have given some examples where they would like to use this to dispatch events or commands to some different handlers. And whenever you want to create and populate and and use the object in one go, the COPA should help you. Derick Rethans 6:58 I suppose COPA would also work for standard class objects? Jakob Givoni 7:02 It's an object just like anything else. So yeah, yes, there shouldn't be any surprises. Derick Rethans 7:07 But of course, it doesn't really make a lot of sense to use standard class because then again, of course, you don't have the benefits of checking your property names or types, again, of course. Are the other use cases you can think of? Jakob Givoni 7:19 Why don't have anything else in mind. Derick Rethans 7:22 I remember quite a long time ago, because this is a subject that comes up quite a bit. That's pretty much people that write PHP code abuse associative arrays so much. Just like the object initializers RFC, as well as your COPA RFC, try to use objects in a different way to be able to prevent developers from abusing associative arrays, pretty much as more stricter data types. In languages like C, there's a distinct datatype for this is called a struct. Do you think it would make sense that instead of trying to overload our object semantics, then in stats use, or introduce something like a struct concept of that C or other kind of statically typed languages have? Jakob Givoni 8:10 As I understand it, a struct is basically the same thing as structured as what I'm talking about structure set of data. However, I'm not sure if it's worth it to introduce a new concept. I don't know if it's necessary if it's possible to reuse the things that we already have enough familiar with. I think I would prefer that you call it overloading the object. But I don't see a lot of problems with having an object that is simply a list of properties with values. It's a very basic object. An object doesn't need to have any methods, it's possible to use that. Every time we add a new concept like struct would be, I feel that it would lead to a combinatorial explosion of implications that later you need to assess every time you want another future change. I haven't seen any RFCs that have specifically mentioned structs. But it is a very related concept. Derick Rethans 9:08 I'm just asking because I spent a lot of time in C where we have structs. But we don't really have objects or classes to begin with. It's more familiar for me to use that. And the other reason why I was asking is that perhaps it would be possible to create like a slightly more natural syntax, because, in my opinion, I think the one that you currently have chosen isn't particularly the most friendly one, but that's my own opinion here. Jakob Givoni 9:33 There might be a window of opportunity, because curly brackets after the variable is going to be deprecated as a way as an array access. So maybe that could be used just curly brackets and dropping the arrow itself. That would look a lot more like like an object, I think, and it would also be shorter. Right. I mean, PHP 7.4 deprecated these. So the question is just how soon can we remove it and replace it to mean something else completely? Derick Rethans 10:03 Yeah, that's a good question. I don't think I have the answer either. I guess it can be introduced as long as syntax that existed previously would now not do something different. And I think you would actually be okay here. Jakob Givoni 10:15 I'm pretty sure it would throw a syntax error. If you try to run this code in a previous version. Derick Rethans 10:21 I meant saying if you would reuse the curly braces, because as you said, they have been deprecated in PHP 7.4. Jakob Givoni 10:28 I mean, if someone were not to follow that deprecation notice, that is now in place and would continue to keep their the code. If we change the implementation, it's better to get a clear, fatal error than to just have something really spurious happening. Derick Rethans 10:45 Yes, absolutely, I definitely agree. Now, that's sort of what I was trying to get at, but you explained it more eloquently than I did. The RFC lists a few special cases. It talks about execution order and exceptions. I think some, somebody brought up somewhere that what happened If we're trying to set multiple properties through COPA and say the second out of three throws an exception. What would be the end state of the object for example? Could you talk a little bit through that? Jakob Givoni 11:11 Regarding exceptions being thrown in any of those expressions where you are assigning, it's important to understand that the block of code that is COPA is not an atomic operation. Anything that happened before the exception will still have happened. And everything anything that happens after won't happen. Exactly like what you would expect if you were doing it line by line. Or if you were using method chaining to do several things on an object. I think it's going to happen what you would expect to happen unless for some, I think it might be unintuitive, that it's not an atomic operation. But it's just important to keep that in mind. That's why I listed it under special cases. And there's something similar with the execution order, in that you can list the properties in any order you like. It doesn't necessarily mean that you're going to get the same result if you change the order because you will be able to use the value of a previous assignment in the next one. Again, not 100% intuitive, but I think it might be worth the trade off in implementation and flexibility. Derick Rethans 12:19 As you mentioned, there's no new semantics in there. Talking a little bit about implementation here. As there is no patch available, is this something that you'd be interested in developing yourself? Or are you looking for somebody else to help you out on that? Jakob Givoni 12:32 I actually haven't contributed any code before. I'm not familiar with C. But one reason that I chose this RFC and this approach is also that if I can't get any volunteers, I might be able to learn and to do it myself, since it seems like it's mostly a parser syntax thing, probably should be able to pick that up. Derick Rethans 12:53 I would also think because there is no new semantics in here, that it would instead be something in between, probably just the lexer that we have, the parser, and then constructing an equivalent abstract syntax tree or AST segment out of that. Jakob Givoni 13:12 I would be thrilled to collaborate with someone to do some pair programming in order to get started if anyone is up for it. Derick Rethans 13:18 So if you're listening to this episode, and you want to help Jakob out, why not get in touch with him? His contact details will be in the show notes for sure. The RFC also lists a few things that you have thought about, but you have decided not to either pick up into the RFC or you don't think they are in scope. Would we'll talk about that a little bit? Jakob Givoni 13:36 There's some special things that you can do at the moment when you assign a value to a property. Things like using a variable to specify the property name, or to generate the property name from an expression using the curly brackets after the arrow. There's also array access directly on the properties, or increment, decrement, or nested object accesses. I don't think that these things are really essential. I've decided to probably leave it out of scope for now unless it's trivial. If it if it's trivial to implement that as well. It's okay with me. It's not deal breaker. But you have to do a cost benefit analysis. And I'm thinking that it could be a future scope. If there's a demand this can be addressed in a later RFC. Derick Rethans 14:23 The RFC also talks about nested COPA. But it looks so complicated to me that I'm not sure whether it is actually something that we even should add to begin with. Jakob Givoni 14:34 I don't think it's as complicated as it looks. So you can already already do nested COPA in if you create a new object inline as well as you of course, you can assign it to a property in the outer scope of the COPA. But if you want to over, to set just one property of a nested object, then you cannot do that directly. Well, you can do it actually if you access the previous one. Because you have access to the current property when you do their assignments. So you can see in my example that you can do it. But there might be a better syntax for doing that. Derick Rethans 15:11 I'm happy to see that there's no backward incompatible changes. So that's always a win. What has been the feedback so far? Jakob Givoni 15:17 Yeah, the feedback has been mixed bag as to say. There's some recognition that this has potential to be a useful feature. This is a critique of the syntax, as you also mentioned, and then about the missing functionality, like the mandatory properties and atomic operations. And then of course, named parameters always comes up. The PHP internals list. It's a tough crowd. I really enjoyed engaged in this project. So I don't mind it's part of it. I also really like this side discussion that we're having currently about ways to improve the way that we collaborate and make progress, especially on tough issues. Derick Rethans 15:58 That has definitely improved over the last five years to a decade, but it can always be improved more, I would say. What is your end goal with this RFC? I guess you would like to see this added to PHP at some point, are you targeting it for PHP eight? Jakob Givoni 16:13 I would be extremely proud to see this added to PHP at some point. And if it can make it into PHP eight in the first release, that would be awesome. That's at least what I'm going for, for now. Derick Rethans 16:25 The PHP project is looking for release managers for PHP eight zero, with feature freeze happening at the end of June somewhere. So there's lesser and lesser time available for doing these things. So I'm curious to see where this ends up. Jakob Givoni 16:39 It's a race against time at the moment. Derick Rethans 16:42 But that's always the case, isn't it? I think be interesting to see if, if somebody wants to help out to make the implementation of this, or rather, I'd be interested to see whether you'd be able to pick up that yourself actually. We can always do with more people that work on a PHP language. Do you have anything else to add yourself? Jakob Givoni 17:00 I'd say that I spent a lot of effort researching and writing this. And I just hope that people will study the RFC properly and keep an open mind. I know it's probably going to be a hard sell. And that's okay. I just wanted to give it a go. And this is just just the beginning of my contributions, I hope. Derick Rethans 17:19 I spoke with Mate a little bit a few episodes ago. He was getting worried about it not getting accepted at some point. And I pointed out to him that scalar type hints took about a decade and seven attempts to finally make it into PHP. So it helps to just persist I would say in times. Jakob Givoni 17:37 Times change and also you get new ideas and you evolve. Derick Rethans 17:42 The language continues to improve and that's how I like it. Thanks, Jakob for taking the time to talk to me today. It was interesting to see what you're up to. Jakob Givoni 17:51 My pleasure. Thank you so much Derick for having me. Derick Rethans 17:56 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Compact Object Property Assignment RFC: Object Initialiser Episode 30: Object Initialiser Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 48: PHP 8, JIT, and complexity London, UK Thursday, April 9th 2020, 09:11 BST In this episode of "PHP Internals News" I discuss PHP 8's JIT engine with Sara Golemon (GitHub). The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 48. Today I'm talking with Sara Golemon about PHP 8 and JIT. Sara, would you please introduce yourself? Sara Golemon 0:33 Hi there. Hi there, everybody listening to PHP internals podcast. I'm Sara. I've been on this podcast before. But in case you're just getting here to for the first time, welcome to the podcast. You have a nice backlog to go through. I am a lapsed web developer, come database security engineer by day, and an opinionated open source dev slash PHP 7.2 release manager by night and also day. I've been involved with the project for about 20 years now off and on. Somehow I just keep coming back for more punishment. Derick Rethans 1:03 We're leading up to PHP 8, with lots of new features being added. But one of the biggest thing in PHP 8 that I've spoken about on the podcast on before all the way back last year in Episode 7, is that PHP eight is going to get a JIT engine. Would you care to explain what a JIT engine does again? Sara Golemon 1:20 Well, I'm going to give you the short, you can look this up on Wikipedia in two seconds definition of JIT, means just in time compilation. That doesn't really tell you much, unless you listen to it on the sort of other half of that of AOT, or ahead of time compilation. AOT is what you expect from applications like GCC, you know, you just make an application that you've got C or C++ kind of source code to that's ahead of time. JIT is saying, well, let's take the source for application. And let's just run with it. Let's just start executing it as fast as I can. And eventually we're going to get down to some compiled code. That's going to run a little bit quicker than the initial stuff did. PHP already has this nice little virtual machine built into it. We call it the Zend engine. That takes your script and immediately just says: All right, well, what does this say in computer terms? Well, a computer readable term is a series of these op codes, they're also called byte codes in other languages that give you instructions for: run this type of instruction at this time and get something done. The PHP runtime interpreter interprets that one instruction at a time basically pretending to be a CPU. This works quite well, it runs quite efficiently. But there's still this sort of bottleneck in the middle there of a program pretending to be a CPU running on top of a CPU in order to run other code. The idea of JIT is that this thing sitting in the middle is going to gradually figure out what your program really is trying to do and how it's intended to run, and It's going to take those PHP instructions and it's going to turn them all the way down into CPU instructions, so that it can get out of the way and let the CPU run your code natively as if it had been written in a compiled AOT kind of language. What that actually means for execution of PHP code in PHP 8 is still sort of a, you know, a question that's, that's left to be answered here. I listened to your interview with Zeev. Episode 7, is a good episode of getting some good information on that. We do definitely agree on what the status of the JIT within PHP is, right now we can. It's subjective facts like this is how much work has been done largely by Dmitri, where we can kind of expect to see the best gains come from. I personally think I might be a little bit more pessimistic than him in terms of the actual performance impact we get out of it. I think we both recognise we're not going to see the two to one kind of improvements we saw from five to seven. Nobody's realistically expecting that, but if you look at the demo that Zeev ran a few months ago, where he shows the Mandelbrot set being generated in two different PHP requests, and then WebSocket out to a nice pretty display, it's a very visceral reaction because you can see one Mandelbrot set being calculated much, much faster than the other. And he acknowledges though this is not realistic PHP code, nobody's writing the Mandelbrot calculation in PHP. We can see that under certain workloads, it's definitely getting faster. But for PHP core mission, which is web serving, I mean, we both know that it's not going to be massively fast. I think it's going to be almost imperceptibly fast. Derick Rethans 4:41 One question for my site, the Mandelbrot set, the implementation of that is all in a specific function, right? And it's all CPU heavy code, not IO. Sara Golemon 4:51 Yes. Derick Rethans 4:52 And it's all that in the same function. Sara Golemon 4:54 Yes. Derick Rethans 4:55 Now, what I was thinking of the other day is that how does this interact with calling standard library functions, because the JIT engine is going to have to go out of basically running things on the CPU and calling things that are then implemented in C to begin with. Sara Golemon 5:10 So you're asking that question, because you already know some of the pitfalls of JIT, and you're leading me into it. And that's fine. When a JIT emitter is taking the language that it's emitting, so PHP. As long as it remains within the scope of PHP, it can sort of keep track of where it's at. It's like, Okay, I know this variable's init, your because I saw it get set. I know that this is going on here. I know that's going on there. And it can carry those assumptions around as it's admitting code. And emit very efficient code that doesn't need a whole bunch of double check guards of like: Wait, is this still an integer? Wait, is that still a string? All of these sort of like escape hatches for when things go wrong. Anytime you cross over into, I will say C-land, or internals land, or ahead of time compiled land. It's basically calling into what it sees as a black box. And it just says: Okay, here's some data, I know the types going in, have fun with it. And something air quotes happening in the air happens with that code and the black box spits out an answer. Well, by the time the black box has spit out the answer, the JIT that has taken that PHP code, no longer knows if any of its assumptions are true or not. It just has to say: Well, time to start from scratch, time to keep track of where we are from here, build up a new set of assumptions. So we get this speed bump in the road of executing code. And it turns out most PHP applications are using a whole lot of those internal API's because they're quite useful. There is a kitchen sink in PHP, and it does stuff. So you have these repeated hits of this road bump happening, and that's not great. If we want to compare this to other JIT languages that are out there. I might suggest we compare this to HHVM because of course, HHVM, at least in the beginning implemented a fairly close kin cousin to the dialect of PHP. It has since diverge much more and become hacklang. But it was doing the same thing, taking PHP code, running it native on the CPU and occasionally having to make that cross to this its own version of internals, or it was running C++ code. One of the ways to reduce those numbers of jumps is that they took a lot of those internal functions, the ones that actually didn't need to do anything, particularly internals ish, and just rewrote them in PHP code. And if you look at the HHVM source code right now, there is a big directory called systemlib and that's a whole bunch of hacklang code, read it as PHP code, that is implementing a lot of these very common quote unquote internal functions. We just had an RFC for function called str_contains(), that is a function that could have been hundred percent been written just as PHP code. Something could have thrown that into packagist. For the record, I voted against it because of exactly that. I think you should write that in packagist and just put it in your composer.json is okay. It's gonna pass anyway, it got a lot of votes. That aside over, that is a sort of function that if we were putting it into sort of an 8.X version of PHP, where we did have our own type of systemlib, we would have probably just said, let's write that as PHP code. So that the JIT, when it enters that function, can keep all those assumptions intact, and potentially even inline some of those instructions and avoid the function call entirely. That's basically taking all of the instructions that are part of the in this case, str_contains() function, and implementing them within the scope of the function that was calling it. So you skip that entire function call overhead, which a lot of people know is still one of PHPs sort of weaker points in terms of where that fat to trim is, as Zeev said in Episode Seven, we still have some parts of PHP that are a bit slow, irrespective of a JIT. Derick Rethans 8:50 There are actually a few functions that have been inlined now into op codes. strlen() is an example of this where instead of it now being a function call, it's actually directly an opcode. Because it is a function that is used so much and actually gain a bit of performances there. Sara Golemon 9:05 Yeah, I think all of these functions as well are just a single opcode for type check. Yeah. Derick Rethans 9:10 There's a whole bunch of them for sure. I saw that earlier this morning, Dmitri produced, or proposed another branch in which he implemented tracing JITs, instead of the JIT that we already have, and I have no idea what the difference is between a normal JIT engine and the tracing JIT engine, Sara Golemon 9:25 Ultimately, the distinction is not that important to end users, it's going to function the same, but it is a sort of an internal implementation detail. HVVM's by the way, is a tracing JIT. It basically looks at any given unit of work that it needs to translate, let's say a function, and it says, what are the pieces that have these sort of non branching parts attached to them? Let me look at each of the non branching pieces. And let me create a version of that translation based on the types that I expect to be going in there. If the types fail, I'm gonna have to create a new version of that piece. But then that piece can plug into this sort of chain of tracelets to create a full function. Most of the time, especially if you've written code that is well type hinted, you've got, you know, strict types turned on, you've got all of your types on the on the function parameters set. And it's very easy for the JIT to infer the types out of what you've put into your function. You're only ever going to need to create a single tracelet of any given section, and your full trace is going to be a single, unbroken chain of: do this, do this, maybe do a jump to another spot, just keep doing this, doing this, doing this. If you have, let's say, slightly messier code, maybe you're not using any kind of type hinting it becomes very difficult to infer any of the types, because there's lots of different call sites, that are doing lots of different things. We may end up having some functions that have multiple tracelets per body section that get built into the giant bush of interconnected edges, that's less ideal in terms of maximising performance, but it still at least functions. Derick Rethans 11:06 We have spoken a little bit about what a JIT engine is and sort of how it works. It sounds quite complex and complicated. Sara Golemon 11:14 It is definitely complicated. And I'm feeling like that's another lead. And so I'll just run with it. Derick Rethans 11:19 I've also got to say my next leading question... Maybe I should actually ask the question? Sara Golemon 11:24 Well, let's actually take a step back from the JIT for a second. And let's look at where the engine is right now. So the engine is basically two very large pieces. That's the sort of the extension library of all of the runtime functions. Everything you see exposed in user space, and the actual scripting engine. There are some other smaller pieces, but those are two, the two really big pieces. There are a whole lot of people pay a whole lot of attention to the extension piece, because that's the flashy bit. That's the part that gives you some bit of binding that you didn't have before, or some bit of functionality that can be delivered out of the box as part of that kitchen sink. And that definitely needs attention. I'm glad that that continues to evolve. But the scripting engine is that piece that defines syntax and how code is actually going to run. Derick Rethans 12:09 Reading extension's code as a whole lot easier than reading the engine code. Sara Golemon 12:13 And that's where I was going to go with that, yes, if you look at the code that's under ext, you can even come into that code without knowing any C at all. And you can actually make pretty good sense of a lot of it because a) PHP uses a whole lot of macros. So every function is literally defined with a macro that says: PHP_FUNCTION, like right here, PHP function, every class method, PHP_METHOD, here's the class name. Here's the method name. And what these things do are pretty clear sort of API's. They're very small bite sized pieces for the most part. The bits that involves sort of defining a class and how it does its memory management, those get a little bit more complicated, but I think on the whole extension code is far more accessible. If you go and look at the engine, particularly the runtime pieces of the engine, although the compiler is complex as well. You have to do a lot of digging before you even get to a point that you can see how the pieces maybe start to fit together. You and I have spent enough time in the engine code that we know where to look for a particular thing. Like let's say that opcode, you mentioned that implements strlen(). We know that, oh, zend_vm_def.h has got the definition for that. We also know that that file is not real code. It's a pre processed version of code that gets built later on. Somebody coming to that blind is not going to see a lot of those pieces. So there's already this big ramp up just to get into these engine as it exists now in 7.4. Let's add JIT on top of that. You've got code that is doing call forward graphs, and single static analysis, and finding these tracelets, and making sense of the code at a higher level than a single instruction at a time, and then distilling that down into instructions that the CPU is going to recognise. And CPU Instructions are these packed complex things that deal with immediates, and indirects, and indirects of indirects, and registers. And the x86 call ABI is ridiculous thing that nobody should ever have to look at. So you add all this complexity to it, that by the way, sits in ext/opcache. It's all isolated to this one extension that reaches into the engine, and fiddles around with things to make all this JIT magic happen. You're going to take your reduced set of developers who know how to work on Zend engine, and you're going to reduce that further. I think at the moment, it's still only about three or four people who actually understand how PHP's JIT is put together enough that they can do any effective work on it. That worries me for sure. I don't think that's an insurmountable hill to climb, especially if we can start getting some documentation written about it, at least from a high level point of view. Hey, you know, look over here to find this stuff. Look over here to find that stuff. Something to get started. So the people who have at least that basic understanding of how the VM part of the Zend engine works can sort of upgrade their knowledge to get into to the JIT. I only think that's worth it. If we actually get real performance boost out of JIT. If we actually turn the JIT on, and we see that for PHP's core workload, which is web serving, we're only seeing a one to 2% gain. For me, that's not enough. It may be enough for others. But for me, I would call that experiment, not a failure, but a non success at that point. Certainly there are people out there who are still going to want to use it, because they are you doing command line applications, and they're doing complex math. And I'm not saying we can't have it. I'm just saying it takes less than a forward stage that point. Derick Rethans 15:43 Somebody mentioned earlier in the chat room. It's also another set of potential bugs, right? Sara Golemon 15:48 It is definitely another potential bugs. Derick Rethans 15:51 It's pretty much another implementation of the PHP syntax bits of PHP. Sara Golemon 15:57 So if you run an application and you get behaviour you don't expect, where is that behaviour actually coming from? You can spend a lot of time looking in Zend engine because you're thinking like: Oh, well, this is the thing that executes opcodes. And when I run it in a single command line, it's definitely going through this bit of code, but it works on a single command line run. But at the twentiest request on my web server, it's not working. Why is that happening? Well, it turns out, it's happening, because that's when the JIT has finally kicked in, because it has enough information. And it's running through this tracelet that was just a little bit wrong. And well, crap. You mentioned I think, at one point, when we were talking in Miami just a couple months ago, that you're just gonna have to turn the JIT off entirely when Xdebug is running, Derick Rethans 16:41 Just like I'll already turn OPCache optimizations off, because there's just too confusing for people. Sara Golemon 16:46 It's confusing and complex, but it's also it may not even be 100% possible because we are right there down at the bare metal of running CPU instructions. There's not a lot of opportunity to just say like, Oh, hold on Mr. CPU, let me just take a look at your registers right now. Okay, this is okay, let's go ahead and keep going now. The VM that we have now in in Zend lends itself 100% to those kinds of activities, CPU does not. What that means is that what we experience in the development mode with Xdebug running is not going to be the exactly the same thing that we experience in real runtime code. And I don't know if we have a solution for that. Derick Rethans 17:23 As far as I know, there's no solution for it at all. Sara Golemon 17:26 I was trying to cage it in the hope that maybe we could someday have solution for it. Derick Rethans 17:30 It'd be lovely, but I can't see that happening to be honest. I think it's going to be important to find out how much this actually benefits, real live code. How does it benefit your Laravel project or your Symfony project or anything like that? I think it's going to be hard to now make a case for not shipping PHP 8 with a JIT. I think that'd be a bit unfair. But on the other side, if it's, as you say, only really gives you one or 2%, whether this is worth have the additional complexity. The additional maintenance burden as well as another opportunity for having bugs that are a lot harder to reproduce, but it's actually worth having it at all? Sara Golemon 18:11 I definitely don't want to poopoo on the JIT effort. Derick Rethans 18:14 Oh, no, absolutely not. Sara Golemon 18:15 I think this is an important experiment to run. And I think if 8.0 as a whole winds up being a sort of public beta experiment of it, that will definitely give us a lot of good information. And I am super hopeful that we see better percentages, that we see 5-10 maybe even 15% Derick Rethans 18:31 Absolutely. Sara Golemon 18:32 I want to be guarded in what I how I talk about it on a podcast like this because I don't want anybody say: Oh, 8's gonna be great. Our code is gonna run 10 times as fast as it was running before No, that's not gonna happen two x is not gonna happen. We're talking much lower numbers than that. Be guarded, be hopeful, but 8.0 is going to be, as I said, it's going to be that sort of public beta experiment. Derick Rethans 18:55 I think that's great. I think running this experiment again because ta similar experiment was, of course run during the PHP 5.6 days when PHP 7 came out. Originally with PHP 7, was PHP with a JIT engine. And then Dmitri and others found out that it was so much other things that could be done to make PHP run pretty much twice as fast. Sara Golemon 19:16 Yeah, there was a lot of really low hanging fruit. Derick Rethans 19:19 Yep. And that was great to see. I am apprehensive about people thinking that the JIT engine in PHP eight is going to similar performance boost. Sara Golemon 19:29 We'll see. Nothing to say about it, but then: we'll see. Derick Rethans 19:32 But I would suggest is that if you're interested in seeing what this can do for your projects, you should go try it out. Download PHP's master branch, enable it and see how it goes. Sara Golemon 19:41 And of course, make sure you are running on x86 hardware. I doubt very much that he's bothered to put more than one back end on this. Derick Rethans 19:48 I don't actually know. Sara Golemon 19:49 I haven't looked. He might be using some helper library for it. So it's possible that we're hitting multiple backends. But this is probably going to be an x86 only thing and possibly a Linux thing. I should find out the answer to that question. Derick Rethans 20:00 I should do too. Okay, Sara, thanks for taking the time this morning to have a chat with me about PHP 8' JIT efforts. Sara Golemon 20:08 It's fun as always, I always love to speak with you Derick. You bring a bright Corona of sunlight to my day. Derick Rethans 20:16 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes Episode 7: PHP and JIT Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
Phil’s guest on this episode of the IT Career Energizer podcast is Derick Rethans. Derick is a PHP internals expert and author of Xdebug. He works as an independent contractor and consultant on PHP extensions and related projects. Derick has contributed to the PHP project in numerous forms and is the host of the PHP Internals News podcast as well as being a frequent speaker at conferences. In this episode, Phil and Derick Rethans discuss why asking for help is a strength rather than a weakness. Derick shares how he counters his introvert tendencies so that he can fill in his knowledge gaps and help others. They also talk about how to take on challenging roles and build new skills, while still maintaining the right work/life balance. KEY TAKEAWAYS: (3.18) TOP CAREER TIP Asking for help should be a skill that you use liberally. Doing so saves you a huge amount of time and hassle. Often, it is a mutually beneficial experience. They answer your questions and, often, will ask you a few of their own. We should all be leveraging off each other´s knowledge base. (4.47) WORST CAREER MOMENT At the start of his career, Derick was asked to fix an SEO Unix machine. He knew a bit about Linux and Unix II, but the machine he was working on used an older version. One he was not familiar with. As requested, he attempted the fix. A decision that nearly led to disaster. He explains what happened, in the podcast. From this experience, he learned not to assume you know things about technologies you have never touched before. Things are not always as similar as you might think. (7.10) CAREER HIGHLIGHT Being approached by people at conferences who say to him “It is nice to meet you; I like what you did on x project…” feels particularly good. Although, at first, it felt very weird to him. (8.23) THE FUTURE OF CAREERS IN I.T Derick finds the pace of change and the fact that there is always something new to learn to be exciting. For example, he has only recently started to learn new languages. He has also had a lot of fun playing a game that teaches you how to build a computer from scratch. Something he tells you about in the podcast. (10.39) THE REVEAL What first attracted you to a career in I.T.?– Derick´s interest in the industry was sparked by the fact that, as a kid, he was allowed to play around with a computer. What’s the best career advice you received?– You need to take the leap. What’s the worst career advice you received?– You have to work 80 hours a week to be successful. This is not a sustainable way of working. What would you do if you started your career now?– He would focus on his language skills so that he could better communicate with others right from the start. What are your current career objectives? – Derick wants to start hiring this year, so he is focusing on building the skills he needs to do that effectively. What’s your number one non-technical skill?– Being able to put himself out there by writing books, public speaking, and blogging. How do you keep your own career energized?– Pushing himself to have contact with other people keeps him energized. He explains why in the podcast. What do you do away from technology? – Derick loves walking. He walks everywhere, even if it takes a couple of hours to get there. (16.44) FINAL CAREER TIP With the help of others, actively work to teach yourself the things you are not good at. Even if you are very shy you should still push yourself to do this. Derick is naturally quite introverted. But he forces himself to talk to random people, especially at events. Doing so makes it easier to fill in the gaps in his knowledge. BEST MOMENTS (3.17) – Derick- “Asking for help should be a skill that you use liberally. Often, it is a mutually beneficial experience.” (6.52) – Derick- “Don't assume that you know things on stuff that you've never touched before. It is too easy for things to go wrong.” (11.26) – Derick- “Don´t be afraid to take the leap. Most of the time it will pay off.” (15.42) – Derick- “Don´t keep doing the same thing for years. Instead, learn about how others are utilizing that tech.” (17.09) – Derick- “With the help of others, teach yourself the things you're not great at.” (17.29) – Phil- “Standing in the middle of the room, at conferences, draws others to you. So, you can learn from more people.” ABOUT THE HOST – PHIL BURGESS Phil Burgess is an independent IT consultant who has spent the last 20 years helping organisations to design, develop and implement software solutions. Phil has always had an interest in helping others to develop and advance their careers. And in 2017 Phil started the I.T. Career Energizer podcast to try to help as many people as possible to learn from the career advice and experiences of those that have been, and still are, on that same career journey. CONTACT THE HOST – PHIL BURGESS Phil can be contacted through the following Social Media platforms: Twitter: https://twitter.com/philtechcareer LinkedIn: https://uk.linkedin.com/in/philburgess Facebook: https://facebook.com/philtechcareer Instagram: https://instagram.com/philtechcareer Website: https://itcareerenergizer.com/contact Phil is also reachable by email at phil@itcareerenergizer.comand via the podcast’s website, https://itcareerenergizer.com Join the I.T. Career Energizer Community on Facebook - https://www.facebook.com/groups/ITCareerEnergizer ABOUT THE GUEST – DERICK RETHANS Derick Rethans is a PHP internals expert and author of Xdebug. He works as an independent contractor and consultant on PHP extensions and related projects. Derick has contributed to the PHP project in numerous forms and is the host of the PHP Internals News podcast as well as being a frequent speaker at conferences. CONTACT THE GUEST – DERICK RETHANS Derick Rethans can be contacted through the following Social Media platforms: Twitter: https://twitter.com/derickr LinkedIn: https://www.linkedin.com/in/derickrethans/ Website: https://derickrethans.nl Podcast: https://phpinternals.news
PHP Internals News: Episode 47: Attributes v2 London, UK Thursday, April 2nd 2020, 09:10 BST In this episode of "PHP Internals News" I chat with Benjamin Eberlei (Twitter, GitHub, Website) about an RFC that he wrote, that would add Attributes to PHP. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 47. Today I'm talking with Benjamin Eberlei about the attributes version 2 RFC. Hello, Benjamin, would you please introduce yourself? Benjamin Eberlei 0:34 Hello, I'm Benjamin. I started contributing to PHP in more detail last year with my RFC on the extension to DOM. And I felt that the attributes thing was the next great or bigger thing that I should tackle because I would really like to work on this and I've been working on this sort of scope for a long time. Derick Rethans 0:58 Although RFC startled attribute version two. There was actually never an attribute version one. What's happening there? Benjamin Eberlei 1:05 There was an attributes version one. Derick Rethans 1:07 No, it was called annotations? Benjamin Eberlei 1:08 No, it was called attributes. There were two RFCs. One was called annotations, I think it was from 2012 or 2013. And then in 2016, Dmitri had an RFC that was called the attributes, original attributes RFC. Derick Rethans 1:25 So this is the version two. What is the difference between attributes and annotations? Benjamin Eberlei 1:30 It's just a naming. So essentially, different languages have this feature, which we probably explain in a bit. But different languages have this. And in Java, it's called annotations. In languages that are maybe more closer home to PHP, so C#, C++, Rust, and Hack. It's called attributes. And then Python and JavaScript also have it, that works a bit differently. And it's called decorators there. Derick Rethans 1:58 What are these attributes or annotations to begin with? Benjamin Eberlei 2:01 They are a way to declare structured metadata on declarations of the language. So in PHP or in my RFC, this would be classes, class properties, class constants and regular functions. You could declare additional metadata there that sort of tags those declarations with specific additional machine readable information. Derick Rethans 2:27 This is something that other languages have. And surely people that use PHP will have done something similar already anyway? Benjamin Eberlei 2:35 PHP has this concept of doc block comments, which you can access through an API at runtime. They were originally I guess, added as part or of like sort of to support the PHP doc project which existed at that point to declare types on functions and everything. So this goes way back to the time when PHP didn't have type hints and everything had to be documented everywhere so that you at least have roughly have an idea of what types would flow in and out of functions. Derick Rethans 3:07 Why is that now no longer good enough? Benjamin Eberlei 3:09 Essentially, user land developers use doc blocks to put metadata in there, and you could access them through an API. We had two sort of standards, or we still have two standards that use this. The documentation standard coming from the PHP documentor community. And then mostly runtime use case that exists now is covered by the doctrine annotations library, which, incidentally, I have also worked on a lot. It is used, for example, by the Symfony community, by the Drupal community, and by a few other communities as well that are smaller that wanted to go into the direction of using annotations in this case or attributes. Derick Rethans 3:53 What would doctrine use an annotation for? Benjamin Eberlei 3:55 I said before that annotations, add metadata to declarations. So let's say you have in your code, for example, classes that you want to store in the database. So you need to map PHP classes to database tables and back. Usually, you would do that using some kind of configuration. And configuration can be many folds. So the easiest way would be to write this in PHP, say, this is the column name, this is the field name, this is the class name and then store and use this information. And then you can go and store this in ini files, yaml files, XML files. The problem with this kind of approach is often that you have the configuration file and you have the class, and they are totally separate from each other, usually in very different places of the codebase. This is not some kind of configuration that is fluid. It's very, very static configuration that depends on the class. And it will not really change unless the class also changes. So changes are usually done together. In this case, it might make sense to put the configuration on to the class. Because then you see the declaration, you see it's configured in some way. And then you can more easily understand that changes affect each other in some way. And it leads to less mistakes, in my opinion. And it makes it a little bit more obvious that the class is used in some configured way. Derick Rethans 5:26 We've had a quick look at what annotations are. The RFC introduces them in a different way, the attributes that you're not proposing, how are they different from the doc block comments? Benjamin Eberlei 5:37 The idea is that we introduce a new syntax that is independent of the doc block comments. Essentially, before each declaration, you can use the lesser than symbol twice, then the attribute declaration, and then the greater than sign twice. This is the syntax I've used from the previous attributes RFC. And Dmitri at that point used the syntax from Hack. And it makes sense to reuse this not because Hack and PHP are going in the same direction any more. But because Hack at that point they introduced it that they had the same problems with which symbols are actually still easy to use. And we do have a problem in PHP a little bit with the kind of sort of free symbols that we can still use at certain places. And lesser than and greater than at this point are easy to parse. There are a bunch of alternatives and one thing that I will probably propose is an alternative syntax where we start with a percentage sign, then the square bracket open and then a square bracket close. This is more in line with how Rust declares attributes. While Rust uses the sort of the hash symbol, which we can't use because it's a comment in PHP. Derick Rethans 6:54 And you don't want to use emojis. Benjamin Eberlei 6:55 Some crazy people propose to use emojis which would easily work in PHP, but I guess it would be hard to remember the number to get the Unicode sign. Derick Rethans 7:06 Within the two opening lesser than signs and two greater than signs to close it. What's in the middle? Benjamin Eberlei 7:12 You declare an attribute name. And then you sort of have a parenthesis open, parentheses close, to pass optional arguments. You don't have to use them. So you can only use the attribute name. If you sort of want to tag something: just this is a validator, or this is an event listener, whatever you come up with, to use attributes for. But if you need to configure something in addition, then you can use. The syntax sort of looks like if you would construct a new class, except that you don't have to put the new keyword in front of it. Derick Rethans 7:45 It looks like function arguments pretty much. Benjamin Eberlei 7:47 Yes, exactly. Yeah. Derick Rethans 7:48 What kind of values can you use in the optional arguments to the attributes? Benjamin Eberlei 7:53 The attributes are not really runnable code in a way. Since they are declarations, they don't allow arbitrary PHP code to run there. What is obviously allowed a simple literal values, so a number, or a fixed string, a fixed array declaration, and all this kind of things are possible. What is also possible is exactly the same expressions that you can also declare in class constants. So, in the class constants, you can do simple mathematical expressions, you can reference other constants. So, this is something that will be very interesting for attributes to do reference class names for example. Derick Rethans 8:34 What happens if you define an attribute on a declaration element? Benjamin Eberlei 8:38 What happens is that while the PHP script gets compiled, it will see that there are attributes declared and it will parse the attributes and similar to the doc block store them on the internal structure for future reference. Attributes are parsed in my current proposal in a way that you can have every attribute just once. This is something that is still under heavy discussion, because there are a few good ideas why you would need two, or multiple. Essentially similar to how a doc block is a string, we then store an array, which represents the attributes belonging to the class or the function or the constant. And this is something that the engine stores and also stores it in OPCache. Derick Rethans 9:27 How would you access these attributes? Benjamin Eberlei 9:28 Attributes are accessed through the reflection API. The reflection API also allows access to doc blocks. For attributes that would be a new function called getAttributes(). And it returns a list of all attributes using a new reflection class called ReflectionAttribute. There you can access what name does this attribute have? What are the arguments that are passed? And then this goes into one of the next features of this RFC proposal. You can also ask it to return this attribute as an object instance. Derick Rethans 10:05 An object instance of which class though? Benjamin Eberlei 10:07 Attributes, and this is something that is different to the initial version, the version one attributes RFC is, attributes names resolve to class names. That means if you declare an attribute, for example, Foo, and you have an import for our class, MyApplication/Foo, then during passing the attribute will be resolved to my attribute view name. It uses the same mechanism for class resolving that is used in every script. It reflects the use statements that are declared in the file. And you can use namespaces, namespace operators to reference the attributes as well. Derick Rethans 10:49 These are attributes not classes, so I don't quite see all the link between the attribute names in the classes is? Benjamin Eberlei 10:55 One problem with the original doc block based system was that there are conflicts between attributes of different systems. One library would have a type annotation, or a var annotation, and some other library would also use it. This could lead to conflict if the syntax for them was slightly different. So this would lead to problems when multiple parses would use the same attribute. And they would parse them differently. And this could lead to errors. One problem that was mentioned in the initial attributes RFC and that, I think, if you vote us all so used as a reason for voting no is that there was no namespacing, which means that different libraries could clash and their use of attributes. My idea was we already have classes, we have namespacing. We can resolve this by using this mechanism. You declare an attribute and an attribute always resolves to a class. In the best case scenario, you would also declare this class in your code. Essentially, the attribute is not an attribute, but it's a special class that represents an attribute. This is also shown in the code that by having an additional interface, or a sort of a marker interface, that attributes can implement to make it obvious that they they are used as an attribute. Derick Rethans 12:19 You mentioned that you could access the attributes through reflection API, and you can get them out as an object? Benjamin Eberlei 12:25 Yes, this is why I mentioned before that the syntax sort of looks like constructing a new object, but without the new keyword. When you access the objects through the reflection API, it would essentially instantiate the class, and all the arguments that you put into the attribute declaration are passed into the constructor of the object. And this is why the connection is there between a class and an attribute. It directly goes to instantiating the attributes as an object using the arguments and giving the developer access to them. Derick Rethans 13:00 Does it only do something like this when you use the getObject() on the reflection arguments? Or is it also possible that I don't care about these classes things whatsoever, and I can just get a list of attributes and their optional values that are associated with them? Benjamin Eberlei 13:16 You don't have to have a class, and the class name resolving in PHP is independent of classes actually existing. The attributes RFC respect that. You can just import anything that is not a class and use an import statement to shorten the attribute usage, or you can use the absolute namespace syntax to put a fully qualified attribute name into your code. And it wouldn't fail. The fail would only happen when you call the method on ReflectionAttribute to get the attribute as an object. So this is something the RFC is also in flux with and about to change it. The first version mentioned that attributes will always be auto loaded when they are declared at compile time. This would essentially treat attributes similar to base classes or interfaces, in a way that they are always resolved, they're always checked. However, this is a little bit overkill for userland attributes. And a lot of feedback was related to this should only happen when the reflection API is used. So I'm going to change this. One thing that we do need to handle in a way is a built in attributes. One reason why I want to add this RFC as well is that there are a few use cases coming up in PHP itself, that could benefit a lot if we had built in attributes. Since we don't have a clear path forward there. But Nikita has published his ideas on editions. So there's some paths forward to having PHP code work slightly differently depending on what developers want. Attributes could be helpful there. Other things for example, the JIT. JIT has features where you can at the moment use doc block comments to declare methods as always JIT-able or never JIT-able. Dmitri used doc block comments to check for JIT or no JIT tag in there. This is essentially something that attributes should be used for because should be machine readable. Then there's a lot of other stuff that for example, Rust also put forward that PHP is struggling with: conditional declarations of functions. For example, Symfony has a polyfill library that adds functions that are in higher languages, re implements them in a way that they're also available in lower versions where they don't exist in core. There are a lot of hacks around the sort of conditional declaration of functions and classes and stuff that make it difficult for OPCache to actually cache the files. I believe there are also even more problems if you use these kind of fights with pre loading. Essentially what could be done with attributes would be something like conditionally declared as function only if it's on PHP 7.3 and lower something like this. Derick Rethans 16:13 You just mentioned using JIT or no JIT as an annotation. Does that also mean that extensions have easy access to these attributes? Benjamin Eberlei 16:21 OPCache's not a PHP core functionality. It's still its own extension. The idea is that extensions have access to attributes in a very simple way. So there will be a Zend API, sort of an internal name for an API that the Zend engine provides to extensions and extensions will be able to access attributes and make decisions based on this. Extensions can already hook into the compile step of PHP and there's a hook called zend_ast_process. During AST processing, you can do stuff. That would be one way to, for extensions to look at attributes and maybe change code if they want. Then the engine obviously has tonnes of other hooks where the declarations are available in the data structure that the Zend engine provides. So there's zend_class_entry, for example, where you could look into the attributes as an extension and make decisions. Derick Rethans 17:20 This is a pretty new RFC, and hence there're always going to be few open issues. Because we like to argue about stuff. What are the open issues on this RFC? Benjamin Eberlei 17:29 This is the seventh RFC on this topic. So there has been a lot of discussion. I guess this feature is, in a way quite controversial because of the implementation details. A lot of my work now will be to find the best implementation that can actually make this feature part of core by getting enough votes for it. And so I gathered a lot of feedback from the community; also talked a lot to contributors. Changes that I will be probably doing is allowing multiple attributes. What I said before, the auto loading has to be clarified. There has to be some distinction between internal attributes and user land attributes in a way that doesn't require auto loading. Hack, for example, has __ as a magic prefix, which I want to avoid, because it puts up all this magic methods, sort of argument back on the table. We need to have something to make a distinction between userland and internal attributes, because the internal attributes need to be validated very strictly at compile time. And the userland attributes need to be validated only when you call the getAsObject() method on the reflection API. Derick Rethans 18:42 How long do you think there'll be before you put this RFC up for a vote? Benjamin Eberlei 18:46 It's a bit tricky because this issue is so controversial. I don't want to invest month of work and then get a no vote. And so I do want to have some feedback quite quick enough. I do realise that the first draft needs some work and clarifications that would otherwise lead to no votes from contributors. So I hope to get this done in, let's say, two to four weeks of additional work. Derick Rethans 19:09 All right, Benjamin. That was a great explanation of the attributes version two RFC. Benjamin Eberlei 19:16 Thank you for having me, and I really appreciate it again. Derick Rethans 19:21 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Attributes v2 Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 46: str_contains() London, UK Thursday, March 26th 2020, 09:09 GMT In this episode of "PHP Internals News" I chat with Philipp Tanlak (GitHub, Xing) about his str_contains() RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 46. Today I'm talking with Phillipp Tanlak, about an RFC that he's made titled str_contains. Phillipp, would you please introduce yourself. Philipp Tanlak 0:35 Hey, Derick. My name is Philipp. I'm 25 years old and I live in Germany. I work for an IT service company, which does mainly development and maintenance of IT projects. We specialise in the maintenance of e-commerce website and create enterprise applications. Derick Rethans 0:52 How long have you been using PHP for? Philipp Tanlak 0:54 I've been using PHP for quite a long time now that might be six years I guess. Derick Rethans 0:58 What brought to you creating an RFC? Philipp Tanlak 1:02 The main reason I've created this RFC was out of necessity and interest, mainly to scratch my own itch. Derick Rethans 1:08 That is how most things make it into PHP in the end isn't it? Philipp Tanlak 1:11 Yeah, I guess. Derick Rethans 1:12 The RFC is titled str_contains, that tells me something that is about strings and containing things. How do we currently find a string in a string? Philipp Tanlak 1:22 The current approach to find the string in a string is to use the strpos() function or the strstr() function. But on Reddit, I found someone also use preg_match which I find kind of interesting. Derick Rethans 1:35 There are multiple amount of different methods in use, what are the general problems with these approaches that people have made? Philipp Tanlak 1:41 So the current approach which I find is not very intuitive, and mainly because of the return values of these functions. For example, the strpos() returns either the position where the string is found, or a false value if the string is not found, but there has to be a check with a !== operation, and the strstr() function just returns a string. So you have to convert that to a boolean to check if the string is found or not. Derick Rethans 2:11 Because with strpos(), if you wouldn't use the === or !== operator. Of course, if it would find it at the first position of the string, it'd be zero position, and it would return false, even though it's sfound it. Philipp Tanlak 2:26 Yeah. Derick Rethans 2:27 So there's a few different problems with these things. Also, I don't think it's particularly vary intuitive to do because you sort of need to come up with like a whole construct to see whether it's part of a string. Philipp Tanlak 2:37 Correct. I don't think it's intuitive for a beginner. So if someone is learning PHP for the first time, then he has to search through the documentation, what are the exact return values for these functions, and has to remember that so I thought, string or str_contains() might be a better fit for that to just return a true or false value. Derick Rethans 2:58 We've mentioned str_contains() a few times now, I guess the RFC is producing to add this function. How would this function differ from what PHP already has? Philipp Tanlak 3:07 So this function does not differ in a lot of ways. It's basically the same implementation of the strpos() function. But instead of returning the position of the found string, it just simply returns it as a boolean value. So either true or false. Derick Rethans 3:23 I can imagine some people will say, well, you can just do this in your own wrapper function, right? Because pretty much what it deos is converting the results from strpos() to a boolean. But you must have a good reason of why to want to add an extra function here. Philipp Tanlak 3:38 The reason for this function, and maybe someone might disagree is, mainly a user experience for the developer. So this is just out of necessity which I found, and I've been using this function quite a lot. So I thought this might be a valid add to the PHP language. So I tried to implement it and it got some great reviews. So I thought that wasn't a very bad idea I had. Derick Rethans 4:04 Is the RFC suggesting just out a single function: str_contains(). Philipp Tanlak 4:09 Yes, the RFC is currently adding just a single function, which is the str_contains(). When I first submitted the discussion about this RFC, there were quite a few people asking why is there no case insensitivity or multibyte versions for these, and I did not think of those at first. But in the discussion, it became clear that the multibyte version did not seem to be very necessary because the comparison is going to be byte by byte. Unlike strpos(), the position of the found string is not relevant. So it doesn't matter if there is any difference in encoding. Derick Rethans 4:47 I remember in last year, there was another RFC related to strings functions they were the string_starts_with() and a string_ends_with(). Those are two functions and there were also variants for both case insensitivity, ss well as multibyte. Which made eight different functions to be added to pretty much do a single thing. That RFC failed, potentially because there are so many things being added. Philipp Tanlak 5:11 Yeah, that was also the main reason, I think the case insensitivity of this function, or the variant of it was not so relevant. So I did not include it into the RFC just because of this case you mentioned. So instead of polluting the global space with more functions, someone suggested to just advance PHP incrementally and add in case sensitivity for this function just if it is necessary. Derick Rethans 5:37 This is a common recurring subject. Most of the people I spoke with in the last few episodes are all adding things to PHP bit by bit instead of coming up with big RFCs which I think is a good way of going forwards. When reading the RFC, I had a quick look at which argument the function would accept. PHP of course this weakly typed strings in most of time. Is this str_contains() function handling distinct different from what strpos() does for function arguments. Philipp Tanlak 6:10 So the str_contains() function uses the same internal function, which is php_memnstr(), if I recall correctly. It tries to interpret it as a string. And if it's not a string, it either throws a warning or notice, but I've just run some checks and it seems like in the next PHP version, non string values which are passed into the string functions will be interpreted as a string, and if that is not the case, it will throw an error or usually return false. Derick Rethans 6:43 So it doesn't do any special magic, and just relies on the PHP tends to do for parsing arguments and weak and strict typing. Philipp Tanlak 6:51 Yes, that's correct. Derick Rethans 6:53 Most RFCs they come with a patch, as does yours. How did you find it getting started with writing things for PHP instead of using PHP. Philipp Tanlak 7:02 So basically, I've looked at the PHP source code in the past, just to see how things are implemented. And I had some basic background in C. So I thought that this was not very hard for me. Most of the functions or things I had to do to include this patch, were already there. So basically, I just copied the strpos() function and remove the, when the string is found, use the position to calculate a new string and just remove that code and return the boolean value from the found position. Derick Rethans 7:35 Because it is not a very different function from strpos(), it's just pretty much a different return type. It's a lot easier to do. Philipp Tanlak 7:44 Yeah. Derick Rethans 7:45 When looking at feedback, what were the main criticisms of this? Philipp Tanlak 7:48 The main criticism of this was basically just the variants of these functions. So mainly the multibyte variant or the in case sensitivity. Other than that, the response was very, very nice and, and also very rewarding for me. So I thought I did a good job on this. And many people wanted to have this function in PHP, but either did not have the time to implement it or it was too easy. I'm not sure how that went. But I think the response from the devs and the overall PHP community was very nice. Derick Rethans 8:23 The RFC is already in voting, so I'm I'm a bit late to talk about them. Usually I'm and things are still in discussion. And at the moment, it looks like it is passing because the votes are 43 to 6 with another weeks ago, then. Philipp Tanlak 8:37 Yeah. Derick Rethans 8:37 Do you think this will be your last RFC? Or do you have something else in mind? Philipp Tanlak 8:41 At the time of this recording I don't have anything else in mind, but maybe if I find something. Since I'm working with PHP on a daily basis, which I think is worth adding to PHP I might create a new RFC. Derick Rethans 8:54 That's how I started and see what happens now. Thank you for taking the time to talk to me today Phillipp, I hope you enjoyed this. Philipp Tanlak 9:01 Yeah, thanks for having me Derick. Derick Rethans 9:05 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: str_contains() Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 45: Language Evolution Overview Proposal London, UK Thursday, March 19th 2020, 09:08 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the Language Evolution Overview Proposal RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 45. Today I'm talking with Nikita Popov yet again about a non technical RFC that he's produced titled language evolution overview. Somewhere last year, there was a big discussion about P++, an alternative ID of how to deal with improving PHP as a language but also still think about how some other people already use PHP and I don't really want to change how they currently use PHP. Like then I didn't really have an episode about that because I'd like to keep politics out of this podcast, or definitely PHP's internals politics. I do think that we realised at that moment that something did have to happen, because there's not really policy about when we can add things, when we can remove things, and so on. So I was quite pleased to see that you have come up with a quite wordy RFC, not talking about anything technical, but more looking forward of were will see PHP in the near or medium future, I would say. What are your thoughts about making this RFC to start with? Nikita Popov 1:29 As you mentioned we had some pretty, let's say heated discussions last year, concerning especially backwards incompatible changes. So there were a number of very, very contentious RFCs. One of them was the short opentags removal, and another one was the classification of undefined variable warnings. So whether those should throw or not throw, and well basic contention is this that PHP is a by now pretty old language, 25 years old. And we can all admit that it's not the language with the best design. So it has evolved relatively organically with quite a few words, and the famous inconsistencies. And now we have this problem where we would like to resolve some of these long standing issues. Many of them are genuine problems that are introducing bugs in code, that reduce developer productivity. But at the same time, we have a huge amount of legacy code. So there are probably many hundreds of millions of lines of PHP code. And every time we do a backwards compatibility break, that code has to be updated, or more realistically, that code does not get updated and keeps hitting on old PHP version that, at some point also drops out of security support. And now the question is how can we fix the problems that PHP has, while still allowing this legacy code to update their PHP version. The general idea of how to fix this is to make certain backwards compatibility breaks opt in. By default, you just get the old behaviour, but you can specify in some way, exactly how it's done doesn't really matter at this point, that you want to opt into some kind of change or improvement. Derick Rethans 3:34 As one example being the strict types that have been introduced in PHP that you need to turn on with a switch with a declare switch. Nikita Popov 3:42 Strict types is really a great example because it has the important characteristic that has done per file. So you can turn on the strict types in one file and not affect any other code, at least in theory. So there are some edge cases, but I think like mostly you can just enable strict types in your library and you don't affect any other library that the project uses. We would like to extend this concept. It should be possible that libraries can update to your language, well, it's called language dialect without forcing other libraries or without forcing the using codes to update as well. Because this is what we have to do right now, though, before you can update your project to PHP eight, let's say, you first have to wait that all the libraries you're using update to PHP eight. And maybe there are libraries that are going to update but also say that: Okay, now actually PHP eight is required. And then you kind of get these complex dependencies with libraries supporting these versions and not supporting those versions, and doing updates becomes pretty hard. As I said, the idea is to make the these backwards incompatible changes opt in some way, and there are multiple general models. So as you mentioned, P++ is the most radical approach. It's more or less a separate language but sharing the same implementation. And as the name suggests that this is inspired by C and C++. So those are usually implemented in the same compiler. And they can be interoperable in a limited way, mostly in that you can use C code inside C++ easily. Using C++ code inside C code tends to be much harder. Yeah, P++ is, I think the option we are pretty unlikely to take for a couple of reasons, because it's this kind of one time huge break which first means that we only have one chance to get it right, and given all the track record, we should maybe not rely on that. Also means that the upgrade becomes especially hard because you have to do everything at once. It's not spread out over a longer time. Derick Rethans 5:54 You say that we need to get it right in one go, but that is hard to say because you don't know, in the future what else we want to add? Like the RFC mentions a few few other cases, like, for example, things like forbidding dynamic Object Properties, we'd have to do right away now as well, if he'd go with the two languages one implementation phase, right? I mean, if we hadn't thought about it, nobody would have thought about it after the split as we made, we'd still not be able to do it. Nikita Popov 6:20 That's true. So P++ is, one time, one time solution. It doesn't really scale over time. I mean, there are also other concerns. And I think like in the end, one of the big ones is just that we don't have the resources for it anyway. So we have only maybe three full time developers on PHP. And I don't think we want to start focusing on this huge separate language more or less. Now we're just going to take a couple of years. Next to having this entirely separate language, there are two other ways to approach the problem. One is editions, which is a concept used by the rust programming language. The idea there is that next to the version, which is more or less than implementation version, you also have this edition, which is a completely orthogonal concept. Basically, we will say: okay right now we are for example at edition zero. And then in addition one you opt into some kind of set of backwards incompatible changes. Then in addition two, there are more backwards incompatible changes, and so on. Each edition is essentially a superset of the previous one. Derick Rethans 7:32 Would it also mean you couldn't get new features in a new edition or is it purely about making backwards incompatible changes? Nikita Popov 7:40 So, this is purely about backwards compatibility. So, if a new feature can be added without breakage then should always be available. The editions switch would only control the backwards incompatible parts. This is to contrast with the second approach, which is to have fine grained declare statements. As you already mentioned, we have the existing strict types directive and we could continue down the same path. So, we could add new declare for no dynamic Object Properties equals one, and then for a strict operators equals one, and for whatever else equals one. And then you would have this long list of possible declares, with which you could enable or disable some particular bit of language behaviour. Derick Rethans 8:26 Then I can imagine that in another five years, that list might be 20 options long. Nikita Popov 8:31 Right. So, the concern there is of course, one part is maintenance, because we have to support basically an exponential combination of different options. And the other is from the programmer perspective, that the like mental model becomes more complicated because you have to keep in mind like which exact set of declares am I using right now? I should say, though, that this model is actually used by Python. Because Python has this import or use from future feature. So there is basically this magic module __future from which you can import language features that will become the default in newer Python versions. For example, you can import the new integer division behaviour inside an older version. This is more or less the same as doing the declares, the fine grained declares, just with a different syntax and with the I think, stronger focus that the behaviour is going to become the default in the future version. Derick Rethans 9:38 So basically, you're opting into experimental functions really? Nikita Popov 9:41 Could be either experimental functions, or it could be really functions from newer versions. In particular Python, also for a while had parallel development of Python 2 and Python 3, in which context this probably makes more sense. Derick Rethans 9:56 There's pretty much three options that the RFC mentions: a new language common implementation or the PHP / P++ option, the editions, and the fine grained declares. These are all still going to be based per file? Nikita Popov 10:12 So that's the second large question, what is the general model? And the second one is where we declare it. The approach I was initially pursuing was to have this declare it at the package level. So for a whole library or for for a whole project. Derick Rethans 10:32 How would you define what a package is? Nikita Popov 10:33 We have namespaces. And there is a somewhat loose coupling between namespaces and packages. So I have an old RFC for a namespace scope declares, where you could, for example, specify strict types for whole namespace, which is, I think, maybe the most natural way to treat packages right now, because this is the closest thing to a package we have. Fortunately, it does have a few issues. One of them is that this namespace package mapping is not always there. So there are packages that have some somewhat odd nesting of name spaces. And I've also heard that some people, for example, define their models inside the Doctrine name space, because they're, you know, extend their classes. So they also put them the namespace. Of course, you shouldn't do that. But it's things that could happen, because we don't really have this enforcement that the namespace really is a package. And then there are also technical concerns, because right now, namespaces are really just a compile time thing to handle name resolution, and now they kind of turn into a feature that also has some kind of runtime impact. And you have to consider things like what happens if you have multiple namespaces in the same file, and also other considerations, like what happens if the names namespace is first used, and you issue some namespace scope declares afterwards. All that can be resolved, but it makes the model somewhat more complicated. Derick Rethans 11:53 And I guess you end up having to declare these namespace scope declares maybe in a separate file or something like that? Nikita Popov 12:14 At least what I have in mind that is that you would declare them in composer.json, and Composer would then take care of registering them with PHP itself. Of course, you could also do that manually, which are not using Composer but that at least was the 95% use case. Derick Rethans 12:31 In applications that make use of Composer, it is very likely that Composer knows about all the libraries that a specific application uses, and hence will be able to construct an array, where it can tell PHP by calling a function declaring all the different options or editions of whatever that end's up being. Nikita Popov 12:49 So that's one of the approaches. There are also some alternatives. One is to instead introduce an actual package concept. One of the possibilities is to basically: add an extra line to each file, which says package and the package name. So that really removes any and all ambiguities. But you do have to add that extra line, which serves some very limited purpose. And basically only for these package scope declares, could maybe also be used for some extra features, like, package private symbols. Derick Rethans 13:23 But it would also instantly make that code base non-parsable with older PHP versions. Nikita Popov 13:28 That's also true, right. But that's a general problem that most approaches I think, would have. So namespace scope declares is one that doesn't have it, but even the per file approach would have this problem because if you write for example, declare edition, then you would right now on PHP seven get the warning that the edition declare is not known. Yeah, last variant that I'm discussing here is to make packages based on the file system, which is something many other languages do. So you have some kind of magic file somewhere that says okay, this directory and all the sub directories are part of the package. In PHP, this kind of file system based approach is somewhat problematic, because our include mechanism is not really based on the file system but on fairly general stream abstraction. You can include from the file system, you can include, if you're really crazy from HTTP, but you can also include from Phar files, from an input stream, or from some kind of custom defined stream. These file system based packages require some additional operations to be well defined. So they have to have a notion of path canonicalization so you can determine whether a file is inside the directory, even if there are things like symlinks or the file system is case insensitive. Which does exist for the file system. So we have the real path syscall, but doesn't exist for streams right now. And a similar problem is that we need to be able to walk up from a path to the directories. And that's also something that doesn't exist for streams. And like more generally, not all streams really have a well defined concept of a directory. For example, if you are reading a file from stdin, so the stdin or the input stream, then there is no directory and like, which package is that going to be in? Derick Rethans 15:31 I think it would be hard to end up debugging at some point. So why some things don't actually end up being in a package where you expect them to be, for example. And then on top of that, you also need to define: Well, how do I call this file and things like that, right? I mean, a PHP script wouldn't be just a single file, for example, would be a single file and this extra definition file. And that's the concept of course that we don't have in PHP at all. Everything is on profile pretty much. Nikita Popov 15:56 Which is why at least to right now. I think, like the immediate way forward, is to use per file declares. So if we don't use the fine grained declare approach, and instead have a single edition, then it's not really a problem to put the declare edition inside every file, because this is already what we do for strict types. It's like not super ergonomic. But I think it's also not a huge problem. And it does have the one very big advantage that files are and remain self contained. So you don't have to consult an external definition that may be hard to locate to figure out how to process. Derick Rethans 16:36 And every IDE or tool would have to implement that same logic and make sure that it's all consistent with each other as well. Nikita Popov 16:43 I wouldn't say it's really hard, but it might be somewhat fragile, especially when it comes to convention. I said if we put things in composer.json, there's probably something tooling can easily deal with. But if you then encounter a project that doesn't use Composer and uses as some other way to register the package declares, then you might run into problems. Derick Rethans 17:09 Lots of things to talk about and discuss at some point. As you submitted this RFC to the mailing list some time ago now, what is sort of the feedback that you're getting on this? Nikita Popov 17:19 So I think the general direction, at least this pretty clear. Most of the discussion is focused on the addition concept, not the finger in declaratives, or the P++. I think for now, we would also go with the per file approach. Now, the main two points that remain contentious is: first, how does the support timeline look like? So basically, the concept of editions just enables different libraries to upgrade independently. That's the core premise. But at least in Rust additionally editions of are also guaranteed to be supported forever. So you can leave your old code running on the old edition, and you do not have to ever update it. Derick Rethans 18:10 How often do they make new editions? Every three years? Nikita Popov 18:13 Yeah, it's not quite clear yet, but probably it's going to be every three years. And now for us, the question is, well, do we want to support old editions forever? Or do we want to give them a finite lifetime? Say we introduced a new edition in PHP eight, and then we supported until PHP nine. That means code can take its time to do the necessary updates, but it does have to do the updates at some point. Derick Rethans 18:37 But you'd have five years? Nikita Popov 18:39 It's more of the general question of if it's forever or if it's limited. So I think based on the discussion, there is a pretty strong preference to not support them forever. Derick Rethans 18:51 But for how long then? I mean, it must be longer than what we support a normal PHP version for, right? Nikita Popov 18:56 Yeah, would expect it to be something like a major version cycle. The second question is related to the strict types, as you said, strict types is like an existing example of a mechanism that works like this. And now we're introducing a second mechanism with the same basic characteristics. Are we going to merge them or not? Would we say that, in the new edition that strict types is enabled by default, or even always enabled? If we do that, and we say that additions have limited support life, that means that strict types is going to become the only option in the future at some point, at least. You can imagine that this is somewhat contentious because there are quite a lot of people who consider weak types to still be the superior option. Derick Rethans 19:49 Whenever I go speak at conferences or user groups, that's not the case. One question is, which keeps recurring always is: Why isn't this the default in PHP eight? I think there's an expectation that strict title at some point is going to be turned on by default. Nikita Popov 20:04 Yeah, and the thing, this is where people disagree whether this expectation is this or not. So there are plenty of people in the discussion thread, well, by plenty I mean, at least two, who strongly think that strict types should remain an option. I mean, PHP of deals with often deals with input coming from HTTP or from a database which is usually coming in as a string. And they think that the typecast you have to do to make that work with strict types actually kind of weaken the type safety guarantees, because if you perform an explicit cast, then that cast is performed basically without any checks. So you can like take a completely non numeric string cast it to integer and you will get zero without any warning or whatever. While even in weak typing mode, that would still result in an error. Derick Rethans 20:58 It's a curious thing actually when you mention databases because, of course databases, you've defined very strict types for your data in them. It's just that it's interesting that PHP's interface to most of these old SQL databases, just decided to always turn into a string. Nikita Popov 21:14 It's it does actually support returning things in they're like native type. Derick Rethans 21:20 With PDO, yes. Nikita Popov 21:21 But under options, and I think it's also like dependent on whether you do emulation or not, and stuff like that. And you have all these different drivers that have differing support for that. But yeah, to get back to strict types, but one of the options is to really keep editions and strict types separate, and also evolve the strict and the non strict mode independently. So you could say that in the new edition, the strict typing mode becomes stricter, for example, by also extending to operators, arithmetic operators, not just to function arguments, but that of course doesn't mean that: Yeah, we saying strict types of states exist forever as a separate track of language. Derick Rethans 22:06 Yeah, that's an interesting one. I'm not sure how to get to a conclusion there actually. Because there's always going to be people on each side side. Nikita Popov 22:13 Yeah. Derick Rethans 22:13 Would you think that this language evolution overview proposal would have been decided on which way to go by the time feature freeze for PHP eight comes around? Nikita Popov 22:23 I think it would be pretty good to have this for PHP eight, because well, it's new major version and the time to introduce this kind of concept. I should say, though, that we already have quite a few backwards incompatible changes in PHP eight, and at least some of them are, like, we are definitely not going to retrofit them into the editions concept. So there are already certainly going to be breaking changes there. Derick Rethans 22:52 Why wouldn't you retrofit them? I mean, if we end up deciding a PHP eight will have these editions, would they not be part of that or would they always end up breaking anyway? Because it seems like a sort of an ideal place to then do it. Nikita Popov 23:05 And yeah, problem is just that the there are some quite extensive changes, especially when it comes to warnings versus exceptions, and will just be like a lot of efforts to get this under an edition flag and to support both behaviours there. Maybe some of the existing changes could be moved into there, with not a huge amount of effort. But I think there are definitely going to be some like hard edition independent breaking changes. Derick Rethans 23:37 New major PHP versions still might have some backward breaking changes independently from when we do the editions or not, or more declares or not? Nikita Popov 23:46 Yeah, that's like one more question, what exactly is the scope of editions? What goes into the edition, what doesn't go into there? I mean, there is always a cost to ending something with this mechanism. One is just maintenance for us. And of course that like user has to consider more different versions of the language. And I think one particularly large aspect that would likely never fall under edition concept is changes to the standard library. So additions work well for language changes, but I don't think they really make sense for a standard library changes. So everything that involves depreciations, or functions with eventual removal would not be covered for that. Derick Rethans 24:31 Do you have an example of such a change in the standard library that PHP eight might have? Nikita Popov 24:36 What I just said might as the general that, usually in every PHP version, we deprecate a bunch of functions and are going to remove them at some point. And these deprecations are like going to apply independently of what edition you set. Actual changes in terms of like real behaviour changes of the standard library I think that's something we quite rarely do. Actual changes to the standard library where the behaviour of a function is changed. That's something we generally try to avoid. Specifically because this causes relatively subtle backwards compatibility breaks. So usually we will either do changes by introducing a new flag or a new function, or by deprecating the functionality entirely. Even when it comes to language changes, there is like I know one example. And the discussion was, well, if we had the edition concept, and we wanted to introduce something like traits, the trait functionality in general is not backwards compatibility breaking. But the trait feature does introduce two new reserved keywords, which is trait and insteadof. So there is technically a backwards compatibility break even though it's finer. And now you have the trade off. Do you introduce traits in the new edition and only reserve the keywords there, thus removing any backwards compatibility break. Or do you you introduce it always, which means that everyone can benefit from it, even if they haven't updated the code to the new edition yet. But it does introduce the small backwards compatibility break. And then you get this trade off and the discussion what you should be doing about that. Derick Rethans 26:17 I think making that kind of decisions will have to be done based on evidence. And I think in the past you've used the top thousand projects on GitHub and see whether things break or not to make a decision. For example, having the nested, or the triple, quadruple nested ternary. Anytime people use it, it's pretty much a bug in the code. Nikita Popov 26:36 Yeah, so to give one example, in PHP 7.4, we introduced the short closure syntax with the fn keyword, and they're the source code analysis showed that basically, fn is not used outside of tests, apart from one library, which is my own. Which does have quite a few dependencies. And that library was indeed broken essentially completely by that change. So in that case, I think there might have been an argument that this feature should be introduced under an edition, because there is like evidence of actual breakage in the wild. Derick Rethans 27:14 This is one of us trying to get it right. We now have evidence for it. Nikita Popov 27:18 And probably like the insteadof keyword for traits, that there's much less problematic. Derick Rethans 27:24 Again, as I say, it's the data that speaks that there right? That was quite a bit to go through. I'm curious to see where those discussions ends up going. Hopefully, we get to a conclusion somewhere in the next few months and ready for PHP 8.0. Who knows? Maybe we have another podcast episode where we introduce a new editions concept. Nikita Popov 27:43 So this is probably my most vague RFC, with a somewhat unclear goal and the somewhat unclear discussion outcome. Derick Rethans 27:53 Do you have anything else to add to this discussion that we've missed? Nikita Popov 27:55 I think there is just one thing maybe worth mentioning, which Rust uses pretty extensively, which has automatic upgrades. So they have some tooling to do that, which is mostly reliable. And I think it would be pretty nice if in PHP, we had something similar. In PHP, we can't really make this reliable because language is just way too dynamic. And we actually do have some tooling in the form of the rector library. But we might want to think about providing something under the PHP project umbrella that is more geared towards like doing updates that are as safe as possible. So you can run them without thinking but still reduce your loads some what. Derick Rethans 28:40 And that is something that is definitely for the future. Thanks for talking to me about the language evolution overview proposal. Nikita Popov 28:46 Thanks for having me, Derick. Derick Rethans 28:53 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP line. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Language Evolution Overview Proposal Rector PHP Library Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 44: Write Once Properties London, UK Thursday, March 12th 2020, 09:07 GMT In this episode of "PHP Internals News" I chat with Máté Kocsis (Twitter, GitHub, LinkedIn) about the Write Once Properties RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 44. Today I'm talking with Máté Kocsis about an RFC that he produced called write only properties. Hello, Máté. How's it going? Máté Kocsis 0:34 Yeah, fine. Thanks. Derick Rethans 0:36 Would you mind introducing yourself a moment? Máté Kocsis 0:38 My name is Máté Kocsis and I'm a software engineer at LogMeIn. I've been using PHP for 15 years now. And after having followed the mailing list for quite some time, I started contributing to the project last October, and now Write Once properties is my first RFC. Derick Rethans 0:58 What is the concept of Write Once Properties? Máté Kocsis 1:00 Write Once Properties can only be initialised, but not modified afterwards. So you can either define a default value for them, or assign them a value, but you can't modify them later. So any other attempts to modify, unset, increment, or decrements them, would cause an exception to be thrown. Basically, this RFC would bring Java's final properties, or C#'s, read only properties to PHP. However, contrary how these languages work, this RFC would allow lazy initialization. It means that these properties don't necessarily have to be initialised until the object construction ends, so you can do that later in the object's life cycle. Derick Rethans 1:48 PHP already has constants, which are pretty much write only properties as long as they're being defined in a class definition. How does differ? Máté Kocsis 1:58 Yeah, it's it's the difference because, so you can assign these properties value in the constructor or anywhere. You don't don't have to define them a default value. Derick Rethans 2:12 Okay, and of course constants have the other problem is that you can only set its values to constants, not necessarily to any sort of expressions, or the result of other method calls. Unknown Speaker 2:22 So you can use objects, resources, any kind of property value here. Derick Rethans 2:28 You mentioned C#'s read only properties. And you sort of mentioned them in the same breath as write ones properties for PHP. These seem like opposite things Máté Kocsis 2:39 Not quite opposite, but there's some distinction between the two. C sharp requires these properties to be initialised until the object construction ends. And this is very difficult to achieve in PHP. And now I'm using Nikita's words: Object construction is a fuzzy term and you can be sure if, if the contractor is involved at all. For example, if you are using Doctrine or proxy manager, so we decided to allow lazy initialization, which means that you don't have to assign these properties a value, you are free to do anytime when you want. Derick Rethans 3:22 What happens if you read them without them having being set yet? Máté Kocsis 3:27 Initially, when I started working on this proposal, I faced the problem because untyped properties have an implicit default value in the absence of an explicit default value. That's why you just can't really use them with the write once properties. Either you have a default value or you can do anything with them. That's why we we had to only allow typed properties with the write once properties and typed properties are in an uninitialised state by default. You can't read them until you first assign them a value. Derick Rethans 4:04 Because in PHP 7.4 that will throw a type error. So that actually ties in really nicely with PHP 7.4's initialise concept for the type hinted properties. Máté Kocsis 4:14 Yes. Derick Rethans 4:15 One thing that is slightly skipped over is which keyword does the RFC produced, because you mentioned final for Java and read only for C sharp, which one of you picked for PHP? Máté Kocsis 4:25 So there were plenty of possibilities considered. The first one was the final keyword. At first, it seemed to be the obvious choice for me, but after thinking about it, I turned out that it's not not the right candidate because currently it affects inheritance rules in PHP. And now we are talking about mutability rules. We had sealed which comes from C sharp and the problem is the same because it also affects inheritance rules, so we shouldn't reuse it for different purposes. We also consider immutable. It's one I like. But it might be a little bit misleading because the usage of immutable data structures, like objects or resources are not restricted at all. Then there's locked, which is a bit too abstract or vague name. We also have writeonce as well. And technically, it's the most accurate term. But from the user's point of view, it could be a bit confusing because they are not expected to write them at all, only the read these properties. And now we have readonly and probably this keyword get the most traction so far. And it's good. It's a good name because it refers to what users should generally do with these properties. However, there's also a slight problem that users can, or in some circumstances can, write these properties too. But that's not the general use case. Derick Rethans 6:10 It's a curious thing. I remember we had a PHP developers meeting back in 2000, let's say 2008. But it could as well have been 2005, where we also actually spoke about read only properties, but I'm going to have to dig up the notes for that to see what it said there. Maybe you find it interesting to read to see what the history said about this. Unknown Speaker 6:31 I'm curious. The question is open, so I plan to put it to vote. Derick Rethans 6:37 When do you think you're putting it up for a vote? Unknown Speaker 6:39 I think it should be close now. I will answer the mail, which came from Nicholas. I don't know if there is no more problems than we could do it this week or early next week. Derick Rethans 6:54 As the properties are write once, how will she implement lazy loading with that? In order to do the lazy loading, you need to first figure out whether the property is already set. How will you know that it's already set? How can you check for that? Máté Kocsis 7:07 I think generally you don't have to worry whether a property's write once or or not. Since mainly, we are talking about private or protected properties in the most cases. However, if you need this information, then you will be able to use reflection. I've already added support for method in in ReflectionProperty for this purpose. Derick Rethans 7:31 Let me ask a little bit more about that. You mentioned that this is meant for lazy loading. I understand lazy loading is something that you do well, you're executing and all the methods. For example, on an object, you do get something and that needs to fetch things from a database. Because those write once properties are private or protected, most of the time, the code that fetches the things from the database that does the lazy loading still needs to know whether the properties already been written to. Because if it would attempt it again, you'd potentially get an exception. So how would it know it's already been written to? Máté Kocsis 8:03 Good question. I was talking with with Marco Pivetta. His use case with proxy manager is to unset these properties in advance and then it can use the get or set or I don't know which magic methods. Derick Rethans 8:28 I saw that the RFC mentioned a few other alternative approaches for this feature. And the headlines in the RFC say: read only semantics, write before construction semantics, and property accessors. Would you mind explaining these and why they haven't made the final RFC? Máté Kocsis 8:44 The first one was to follow Java and C sharp, and require all write once properties to be initialised until the object construction ends. And this is what we talked about before. The counter arguments were that it's not easy to implement in PHP. This approach is unnecessarily strict. The other possibility is to let our limited writes to these properties until object construction ends and then do not allow any writes. But positive effect of this solution is that it plays well with bigger class hierarchies, where possibly multiple constructors are involved, but it still has the same problems as the previous approach. Finally, the property accessors could be an alternative to write once properties, although in my opinion, these two features are not really related to each other. But some say that property accessors could alone prevent some unintended changes from the outside and they say that maybe it might be enough. I don't share this sentiment. So in my opinion, unintended changes can come from the inside, so from the private or protected scope. And it's really easy to circumvent visibility rules in PHP. There are quite some possibilities. That's why it's a good way to protect our invariants. Derick Rethans 10:15 What was the most criticism you got on the mailing list about his proposal? Máté Kocsis 10:18 As far as I remember, the property accessor. The biggest criticism was that we don't really need this term, but we could use property accessors. Derick Rethans 10:29 We have spoken a little bit about what this feature is. We went into a few use cases with lazy loading. What would other use cases for this be? Máté Kocsis 10:38 I think it's really suitable for domain driven design, or working with value objects, and I'm a great fan of DDD. The problem is PHP can't guarantee any immutability for our objects. Just one example. You can invoke the object constructor as many times as you wish, which overrides all your properties. Derick Rethans 11:04 I had not thought about that you can actually call the constructor yourself. And of course you can. Máté Kocsis 11:08 Yes, me neither. I just saw somewhere probably in a previous discussion about immutable objects. That's the advantage of having write once properties. You could by using write once properties, yeah, you can prevent accidental modifications from the outside or from the inside too. And that's the main purpose. Derick Rethans 11:32 Your main purpose wasn't lazy loading but more immutable value objects. Máté Kocsis 11:36 Yes, yes. Right. I proposed right fans properties first, to pave the road for immutable objects because this is my main goal. Derick Rethans 11:46 Okay, but you're going step by step. I think that's actually a wise way and Nikita have said something similar that it is nicer to take things little by little so that it is easier to convince people that this is a good feature or not. Unknown Speaker 11:59 Actually it was Nikita's idea to split the two proposals. Derick Rethans 12:03 That make sense. Okay, Máté, thank you for taking the time this morning to talk to me. Máté Kocsis 12:08 Thank you for having me. Derick Rethans 12:11 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Write Once Properties PHP Developers Meeting Notes Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 43: Syntax Tweaks London, UK Thursday, March 5th 2020, 09:06 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the RFCs. One on abstract methods in traits, and one about an improvement to the tokenizer. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 43. Today I'm talking with Nikita Popov yet again about a few RFCs that he's produced for PHP 8. Good morning, Nikita. How are you doing? Nikita 0:34 Good morning, Derick. I'm doing great. Derick Rethans 0:37 I've given up on introducing you because we've done this so many times. Now, you don't need an introduction any more. The first RFC I wanted to talk about a little bit this morning is the abstract trait methods validation RFC. What are traits? Nikita 0:51 We usually talk about traits as compiler assisted copy and paste. Basically, we just take all the methods and properties from a trait and copy them into the class that's using the trait. That's a bit over simplified, in particular, you can use multiple traits in the single class. And those traits might be defining the same method, in which case you have to resolve the conflict in some way. So that's where you have these insteadof or use annotations to specify precedents and aliases. Derick Rethans 1:23 Traits has been in PHP for quite a long time. What is now the problem that you're trying to solve through this RFC? Nikita 1:29 The problem is that traits are sometimes not self contained. So to give a specific example, we have in the logger PSR, we have a trait called logger trait, which has a bunch of methods like warning, error, info, notice, and so on. So just simple helper methods, which all called the log method with a specific log level and this trait only specified these helper methods but still requires the actual class to implement the log method. The way you'll usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait. And you have a number of abstract methods that the trait itself requires to work. This already works fine, but the problem is just that these methods are not actually validated, or they are only inconsistently validated. Even though the trait specifies this abstract methods, you could implement it in the class with a completely different signature. Derick Rethans 2:30 Okay, just like any signature? Nikita 2:32 Just like any signature right. The method still has to be present in some way. But the signature can be completely different. Could also be like different method type, like a static method, or an instance method. Derick Rethans 2:43 Just basically checks for the name is what you're saying? Nikita 2:46 Yeah, it only checks with the name. Derick Rethans 2:49 Is this the only place, is this the only time where these abstract methods are not being validated. Or are there other situations where that could happen as well? Nikita 2:57 No, I think this is the only place. Derick Rethans 3:00 Are all the situations where these abstract methods in the trait will get validated. And also on signature? Nikita 3:07 As I mentioned, it's not like the signatures are completely unvalidated. They are just inconsistently validated. It depends a lot on exactly how you use the trait. If you just use the trait and specify the methods of the same class, it doesn't get validated right now. If instead of the method is provided by the parent class, so it's inherited, then it does get validated. If you don't implement the method that makes the class abstract instead, then it's also going to get validated in the child class. It kind of already works halfway. And this RFC just tries to make it work always. Derick Rethans 3:44 Okay, that seems like a reasonably good addition to almost a no brainer. Nikita 3:48 I would say it's basically, a bug. Especially if you look at the implementation, there is clearly some validation code there. The conditions are just a little bit off, but so we do have to go through the proposal, because this is a backwards compatibility break. Derick Rethans 4:02 Yes, I was about to ask if it's a bug fix, why bother with an RFC? But if it's a BC break then yeah, we still need to do it of course. I doubt there be many controversies about is? Nikita 4:12 Actually there is one contentious point. Um, so something I didn't mention yet is that the RFC also allows you to define private abstract methods in traits. Normally private abstract is like a contradiction in terms because private means only visible in the same class. And abstract means it has to be implemented in the child class, you can't really have both. You can't have both with traits, because traits can see the private members in the class. I think that by itself is like not controversial. That's a reasonable thing to have a trait. The part that is controversial is what you do with existing visibility modifiers. This pattern already exists. So people already define abstract methods in traits but because right now private abstract is forbidden, the lowest they can use is actually protected abstract, even though they don't actually want that method to be publicly exposed, or even protectively exposed. So there is an argument there that we should maybe ignore the normal visibility validation that we do, and allow even implementing a protected abstract method from a trait with a private method inside the class, simply for backwards compatibility reasons. Derick Rethans 5:21 Because if you wouldn't allow that then, how would it break things? Nikita 5:26 It would break things because there is existing code, using these abstract protected methods simply because we don't support abstract private yet. So those code would start throwing visibility error, and I mean, could be fixed by just dropping the abstract method, but there's also not ideal. Derick Rethans 5:45 Because people use it to make sure that, I mean it's there in the class that implements the trait pretty much. Do you have any idea when this is going to for vote? Nikita 5:53 I think it can already go up for vote? Mainly I need to resolve that question about the visibility first. Derick Rethans 5:59 I'm looking forward to seeing that showing up sometime soon then. How do you call your second RFC? Nikita 6:05 Object based token get alternative? Derick Rethans 6:07 I think that's a great title. There's a few words in there that we might have to explain first. What are these tokens you're talking about? Nikita 6:14 So the token_get_all function, which we already have, exposes a part of the PHP compiler infrastructure. PHP compilation generally has three steps. The first is the tokenization. The second part is the parser, and then the compiler. So the tokenizer converts the raw character stream into tokens, which encode higher level concepts, for example, that like the sequence of FUNC and so on is actually a function keyword, or that double code followed by characters is actually a string. So this part only recognises like not larger structures, like whole functions but at least the the atoms that make up language. Derick Rethans 7:00 Would you say these are the words that make up the sentences? Nikita 7:03 Yeah, that's that's the right analogy. Derick Rethans 7:06 Why would you want to have access to them? Nikita 7:08 For example, I have a PHP parser library, which converts these tokens into an actual syntax tree. And then on top of that, you can easily analyse PHP source code. So this is what all these static analyzers, like PHPStan or Psalm are based on. Derick Rethans 7:27 Do they all use the tokens? Nikita 7:29 Those two, in particular, use my PHP parser library, and that one uses the tokens internally. There is also other tooling that's more directly based on tokens, for example, code formatters or code style inspection tools like PHPCS. Those all directly operate on the tokens instead. Derick Rethans 7:47 But as you say, these tokens only are words and they don't really provide a structure. How would these tools then convert that into a structure? Nikita 7:54 If you're looking for, if you're looking just at formatting, then you may not really need a lot of structure. So you probably do need to write like that of extra code to recognise that, okay, the function token followed by white space, followed by an identifier, that's function declaration. For the more complicated tooling that builds a syntax tree, you need to implement a parser, either based in code generation, or based on recursive descent approach. Derick Rethans 8:26 Why would you not want to have direct access to PHP's AST instead because that already provides a structure for you? Nikita 8:33 We do have direct access to the AST through the AST PECL extension, which is not part of core yet. I don't know if there are plans in that direction. Derick Rethans 8:43 Well you wrote it so you surely can make these plans. Nikita 8:46 Yes, I can make them but I don't know if I should make them. Derick Rethans 8:50 I think you should. Nikita 8:51 I mean, the nominal advantage of the AST extension is that it's always up to date with PHP. In practice that really isn't an issue, because some of the userland tooling is also pretty quickly updated. The more practical advantage is that the extension is a lot faster than implementing this in userland code. Well, I mean, this is really one of the areas where C code is faster than PHP code. The AST extension only exposes the structure that PHP itself needs. PHP is not interested in like precise formatting, and things like that at all. So it throws away quite a few things. You can, for example, get accurate on position information. Like, where, exactly not just which line but of which column, something is defined. And that's something you're quite often interested in. Derick Rethans 9:46 Also, from what I've known, it throws away all the comments unless they are doc bloc comments. How does the tokenizer currently return information about the tokens? I've played with this in the past and I didn't think it was the prettiest format to get back out of it. Nikita 10:02 token_get_all returns an array of tokens. And there are generally two types of tokens. One is single character tokens, like a semicolon, or a comma, or whatever, which are just returned as a string. So it's a single character string. And then there are complex tokens, like the function keyword, like white space, like strings, which are returned as an array where the first element is the token ID, which is an integer. And we have constants defined for these integers. The second element is the actual string content of the token. So for the function keyword, that's always going to be function, but it could be written in different ways because the keyword is case insensitive, so it could be all lowercase, or uppercase, hopefully it's all lowercase. Derick Rethans 10:52 You'll get the odd situation where the first letter is the capital, I suppose, but that's about it, hopefully. Nikita 10:57 And finally, the last element is the line number. So the starting line number. Derick Rethans 11:02 So if you want to look at the position on the line, you'd have to calculate it yourself? Nikita 11:08 Right you would have to track that yourself. I mean, there are two problems. One is just that you have these single character tokens and the complex tokens using different structure. So all the codes using them as to always switch back between those; check if it's an array or a string, or a test to do some kind of normalisation itself. And the second problem is that arrays in PHP are fairly memory inefficient when it comes to storing a fixed amount of data. Storing three elements inside an array always means allocating an array for eight elements. Because its minimum array size, you have to use space to store the key, and so on. Generally, if you have a fixed structure, it's much much more efficient to store it inside an object. Using a class that has declared properties. So this makes a very large difference in some cases, especially if your array only has like two or three elements, you can save a lot of memory with it. Derick Rethans 12:12 Have you done any benchmarks to see how much memory you'd actually save some likes some some particular scripts that you've parsed with how to tokenizer doesn't matter and how you proposing to do it? Nikita 12:22 Yeah, I have here in the RFC, some numbers for some particular script that goes down from 14 megabytes to eight megabytes. So that's nearly half the memory usage. Well, actually, maybe I should first actually say what the RFC proposes. The RFCe proposes to instead return objects, an array of objects. And these objects have four properties. So first is again, the ID of the token, then the textual content, the line number, and also the starting position of the token in the string. Derick Rethans 12:54 Is this something that the tokenizer extension and tracks for you? Nikita 12:58 I mean, that's something that can easily do, so we can just as will expose it. And these objects are always used. So we no longer make the distinction between single character tokens and complex tokens. So we always return the uniform array of tokens, of token objects. Despite doing that, removing this optimization for a single character tokens, the end result is still that we use half as much memory, simply because objects are that much more efficient than arrays. Derick Rethans 13:27 That's a clever trick. I'm sure people like that, that using less memory, at least I know I would. Is it also faster or doesn't particularly matter much? Nikita 13:35 It's also faster, like maybe 30% or something, because memory usage and performance tend to be pretty heavily correlated. So if you use less memory, you also are faster. Derick Rethans 13:46 That makes sense. Are you thinking of other things that you can add to the tokenizer extension to make working with them even easier? Nikita 13:52 The way this new functionality is implemented is, we have a PHP token class and on it we have a static method getAll. So instead of calling the token_get_all function, you call PHPToken::getAll(). And one nice thing this allows you to do is to extend this token class. So you can say, MyPHPToken extends PHPToken, and then you call MyPHPToken::getAll() and then we will actually construct your extension class. That means that you can add whatever methods you like, in addition to what we provide by default. Derick Rethans 14:29 Is that a pattern that we have in other places in PHP as well? Because I don't usually think that even if you'd call an inherited static method, why wouldn't suddenly return the inherited classes object? wDo we did it in other places? Nikita 14:42 So this is somewhat uncommon in PHP internals. I think it's a pretty common pattern for userland where generally if you return new objects from static methods, you always use new static, not new self. This is essentially late static binding, which we did discuss quite recently. So, there is one limitation here namely that the constructor of the PHPToken class is final. So, you can extend the class and you can add extra methods, but you cannot modify the construction behaviour, because we would like to internally construct these tokens very efficiently by more or less directly writing the values into the right slot in memory and not doing slow constructor calls, becouse this functionality tends to be very performance sensitive. And the same trick where you can extend the class but not change the constructor is also used by the SimpleXML extension. Does exist but not very common in, generally where internal code is concerned, we usually do not really plan for extension. I think nowadays we mark nearly all internal all new internal classes as final simply because extension is such a pain to deal with. And for old classes who usually wish that we had marked them as final. I mean, this is also a general recommendation for userland that, like you should mark things final as much as you can get away with it. But it's much bigger concern for internals because dealing with userland extensions that do unexpected things is much harder for us. Derick Rethans 16:23 You even need to make sure that your internal structures are properly constructed by the parent's constructor being called from inherited classes but in PHP, there's no such requirement that you do. Pretty sure I've had problems with that for the Date extension a long, long time ago, where people would extend from it, not call the constructor. And then because he didn't think of it, nothing is defined and everything just falls down. Nikita 16:44 Yeah, so this is one of the common problems. And the other one is that internal classes often define custom object handlers. So that's something only internal classes can do. Just to give one example, they can define debug info handler that modifies the output of var_dump, but nowadays we also have the user land magic methods on get you back into and I think pretty much all internal classes are just going to ignore that, and always return their own internal debug information even if this method has been overwritten, simply because no internal class actually checks for that. And this kind of problem also exists for a lot of other magic, and generally no one takes it into account, and things are just more or less softly broken. Derick Rethans 17:31 Very recently there was a pull request for Xdebug to change that as well because in Xdebug's debugging output get sent to IDEs. For internal classes always uses internal get debug handler, and for userland classes it uses whatever is userland defined; I mean if there's a magic method we'll use that. The pull request wanted to change Xdebug in such a way that it would also use the get debug info magic method for internal classes, whenever overridden. After some discussion about this, we figured out, this is probably a bad idea to do, and hence, we haven't merged that. Although we end up fixing some other things that the developer also found. That's a curious situation to be in. We would like us to be sort of work the same. But at the same time, you sometimes really want to see the internal information from the classes without developers having hidden the information behind it, right. Nikita 18:20 Yeah, that's true. Derick Rethans 18:21 And that is just from a from a debug perspective. And even from, let's make sure things don't crash perspective. I see that the RFC also rejected a few features that aren't part of the current iteration yet or might make sense to add it later. And one of them is about a lazy token stream. What would that be and what sort of different interface would it have? Nikita 18:43 The lazy token stream basically just means that instead of returning an array of tokens, we return an iterator of tokens, which means that we do not have to store the full array in memory, which, like for the example, I used. The memory usage for the whole token array was eight megabytes, even after these memory usage improvements, which wasn't a fairly large file, but definitely not the largest file. You can encounter especially when it comes to generated files. So there is an advantage of processing tokens one by one as a stream, because then your memory usage is going to be basically O(1), not O(n). The problem is, I mean, the PHP lexer does indeed work one token at a time, so it can support it. The problem is that it has a lot of internal state. And in order to implement this iterator, we would have to backup and restore the state on each produced token to make sure that it's still possible to for example, include and compile other files in the meantime. So this is something that can be improved; we can make that cheaper, but that would be a larger effort. And I'm not really sure it's worthwhile because, while you can process one token at a time. And this is, for example, what the PHP parser does internally. Many practical applications in userland will generally want to have all tokens as an array. Because it makes it simply, makes things easier if you can always look ahead and look back. And I think it would be fairly hard to rewrite the existing libraries in terms of the latest tree. It may be a nice to have, but I'm not the most useful thing for it now. Derick Rethans 20:32 What has been the feedback for this RFC? Nikita 20:35 I think pretty good. This is something that we've already discussed years ago. Last time the discussion kind of got a bit got a bit sidetracked. Yeah, one of the dangerous when you start introducing object oriented interfaces. Well, actually, I just call this RFC object-based intentionally, because when you do object oriented then people would like to have their tokens, and their token streams, and their token stream factories, and the token stream managers. And this is basically held this the whole time. But generally everyone who is working on tokens, which is not a lot of people, but those who are working with them know that memory usage is a problem. And the current, current inconsistent structure is a problem, which is why most of them implement their own token objects, and basically do the same thing we propose here just themselves. Derick Rethans 21:30 When it's this one going up for a vote at the same time? Nikita 21:32 Soon. Derick Rethans 21:33 Both of these RFCs that we spoken about today are both targeted to a PHP eight, I suppose? Nikita 21:37 Yeah. So right now, I think all RFCs are targeted at PHP 8. Derick Rethans 21:42 Thank you for taking the time with me today, Nikita to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them and see what happens to them. Nikita 21:52 Thanks for having me once again. Derick Rethans 21:56 Thanks for listening to this instalment of PHP internals news. The weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Show Notes RFC: Validation for abstract trait methods RFC: Object-based token_get_all() alternative PSR-3 Logger Interface PHPStan — PHP Static Analysis Tool Psalm PHP-AST PHPCS Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 42: Userspace Operator Overloading London, UK Thursday, February 27th 2020, 09:05 GMT In this episode of "PHP Internals News" I chat with Jan Böhmer (GitHub, LinkedIn) about the Userspace Operator Overloading RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 42. Today I'm talking with Jan Böhmer about Userspace Operator Overloading. Jan, would you please introduce yourself? Jan Böhmer 0:33 Hi, my name is Jan Böhmer. I'm a physics student from Germany. And I'm the author of the Operator Overloading RFC. Derick Rethans 0:40 What brought you to writing this RFC? Jan Böhmer 0:42 Mostly because I have worked with monetary objects in the past. And it was a bit tedious to work when it comes to calculating. And whenever you have to want to calculate something, you have to call functions on objects. This is not possible to call, just use operators like with normal values like floats or integers. Derick Rethans 1:06 Because the monetary objects themselves had multiple things embedded in there or something like that? Jan Böhmer 1:11 Yes, they describe mostly a value and a currency. And together they are saved in an object. Derick Rethans 1:18 Okay that that seems like a reasonable thing to do, right? I mean, other times people say the same thing about doing complex numbers or something like vectors. The RFC is called Userspace Operator Overloading. What is operator overloading? Jan Böhmer 1:31 Yeah. Basically, is the idea that you can define operators, like addition or subtraction, or the string concatenation for objects Derick Rethans 1:43 Does PHP already have something like this? Jan Böhmer 1:45 Actually, yes. Objects can have something that calls do operation handler. This is called whenever PHP encounters an object, but if used with an operator. The problem is that this handler is only available for PHP internally. So if you want to use it, you have to write an extension. Derick Rethans 2:06 So it will be possible to have in an extension a Monetary class with its own operators already defined on it. Jan Böhmer 2:14 Exactly PHP extension GMP uses this as already. The problem is that it's not very flexible, you already have to know, be familiar with C, you have to be able to compile that. You have to contribute it to whatever system you want to use it. Since we have the foreign function interface since PHP 7.4 we can implement many things without to actually have an extension. But this operator overloading is something that's not possible yet inside from PHP. Derick Rethans 2:47 So it wouldn't have been possible to write the GMP extension which is of course a wrap around libgmp with FFI, because there's no operator overloading available in PHP. Jan Böhmer 2:59 Not in that comfortable way. You could use this way with functions but it would a bit more tedious then using just operators. Derick Rethans 3:09 You've mentioned the Monetary object as a good use case. What other use cases can you think of? Jan Böhmer 3:15 Higher mathematical objects like complex numbers, vectors, or something like tensors, maybe something like the string component of Symfony. That's you can simply concat this string objects with a normal string using the concat operator and doesn't have to use the function to call that, because basically, this should behave similar to a basic string variable, and not, like something completely different. Derick Rethans 3:45 What is the syntax you're proposing for implementing this? Jan Böhmer 3:49 My idea was similar to Python to use special metric function, the methods for every operator you can overload. So if you want to overload the addition operator, you would implement function called, a static function called __add, for example. This offer this function takes both operands, the left hand operand and the right hand operand. So you can decide if your current, this object is on the right or the left hand. That is important to determine something like one divided for zero, or one divided through two, or two divided through zero. There are two complete different cases and you have to be able to differentiate between the two cases. Derick Rethans 4:39 And wouldn't that not be possible to do in non static functions? Jan Böhmer 4:43 Another problem with non static functions such as possible access to this variable. If you modify an object from inside an operator handler, this can lead to very, very strange behaviour. Because normally operations doesn't change the object itself, but rather you should return a new value. The problems such as asked us to this, it is very easy to accidentally change the this object. If you only pass both objects like via a static methods, it is a bit more clear that you have to create an all new object Derick Rethans 5:24 Would a type hint enforce that you return a object to the same class? Jan Böhmer 5:29 Not an all case you want to return an object of the same class. For example, takes a dots product of vectors. So you take two vectors, multiply it in some way as you return to normal float value. Derick Rethans 5:43 Of course, yes. Jan Böhmer 5:44 If you were to enforce that, but would always to be the same types as those limits the use cases, in my opinion too much. Derick Rethans 5:52 But you could of course type hint the __add operator yourself? Jan Böhmer 5:57 It's always typehints in arguments, in my observation are used as a hint which type are supported for the operand handler. If you for example, vector plus an integer, and your operator handler only declares vector vector as a parameter types, then this operator will not be called, and it will tried to be called on the second object. Derick Rethans 6:24 So it won't the called and instead it falls back to the second object to be called on. Jan Böhmer 6:29 Yes, the idea behind it, is that only one of the objects have to know about both classes. So if you want to combine, for example, two objects from different libraries, and library A doesn't know about library B then only objects of the second library have to know about object A. In C++ you can define supported type outside of classes. So you can define combinations between arbitrary objects. The problem is in PHP this was a bit complicated. And the best way to implement this handler in types or classes. So the class has to know about each other objects, it could be interact with possibly. Derick Rethans 7:14 That makes sense to me. What happens if neither of the classes, or if one of them is a class, and the other one is just a scalar type, if none of the add methods fit, what would happen then? Jan Böhmer 7:24 The operators implement an handler, then those doesn't support them, then an error would be thrown. Derick Rethans 7:32 And that is a type error like you'd normally get? Jan Böhmer 7:34 If the object doesn't implement operator at all, then a notice would be triggered. The idea's that in the moment, it is possible to write something like object plus one, this would be a fine expression in PHP, in the current PHP versions, the object could be interpreted as a one and just a notice would be thrown. For compatibility reasons, my RFC does the same behaviour if no operators are overloaded on objects. Derick Rethans 8:05 That seems like a reasonable compromise there. I remember from in the past, I think it was Sara Golemon that wrote an extension for using operator overloading. And I remember from the time that there is a problem with using the lesser than or greater than operators, because I think one of them gets flipped around automatically in the engine is being changed in PHP already, or are you running into the same problem? Jan Böhmer 8:28 I'm not sure about this. My RFC doesn't mention comparison operators like greater or less at all. Because comparison, handled differently internally of PHP. This doesn't work about this. This is mentioned do operator handler. It would be a bit other implementation to do this. Also, the comparison is a bit complicated on its own terms. Maybe it's more useful to use interfaces for, to implements this overloading, or to use. Also, there are some problems. Maybe we should only allow something like an compare operator that's resolved either, minus one, one, or zero. If object's lesser or equal, so that everything is defined together at once. So it's not possible to define an object that has maybe, for example, the lesser, but not the greater operator. Derick Rethans 9:32 But this sounds like that's for a different RFC. Jan Böhmer 9:35 Exactly. That's a bit complicated. If the current operator overloading RFC gets passed, then maybe a comparison operator overloading RFC would make sense. Derick Rethans 9:46 From reading the RFC, I've noticed that you also won't be able to use a shorthand assignment operator. So for example, plus equals. What is the reason for that? Jan Böhmer 9:56 So every shorthand operator becomes currently an assignment of A plus B. The do operation handler cannot decide if an shorthand operator or normal operator was called. Allowing to overloads the shorthand operators, would maybe allow some benefits for objects terms of memory optimization. If you call a short hand operator you can mutate the object itself doesn't have to create a new object which takes more memory, but I think with the garbage collector of PHP that is not such a big problem. And if that is really needed feature in the future, this could be edited in other, later version of PHP. Derick Rethans 10:41 Okay. Jan Böhmer 10:42 Many other languages doesn't allow to otherwise shorthand operators so I don't think that as too much need for. Derick Rethans 10:49 Operator overloading sometime has criticisms directed at it. What are some of the criticisms you've heard about it? Jan Böhmer 10:56 First of all, there are some criticisms about the operator overloading idea in general. So there's also some criticism could be abused for doing some very weird things with operator overloading. So as mentioned C++ there is a shift, left shift operator, is used for output in a stream to the console. Or you could do whatever you want inside this handler, so if somebody would want to save files or modify the file in inside operator overloaded handler, it would be possible, and it's in the most cases function would be more clear what it does. Derick Rethans 11:35 Of course, in a function add(), if you implemented yourself, nothing stops you of course on writing to a file either. Jan Böhmer 11:41 Operator overload issues, in my opinion only be used for things that's related to maths or creating custom types that behave similar to the built-in types. Derick Rethans 11:52 Like complex numbers, or vectors, or monetary numbers. So far, we have been discussing this RFC for a few weeks now. What do you think the chances are of it being passing? Jan Böhmer 12:05 I'm not sure. I think the idea of operator overloading in general is accepted in the community, but doesn't hear so much backlash. There was some time discussion about how to do it. Some people think it's maybe better if you would implement operator overloading with interfaces, like with ArrayAccess, or to introduce some completely new keywords, like in other languages. In C++, or C#, there are a special keyword operator, that's marks an operator overloading function. So it is clear that is not a real function but special handled way. Derick Rethans 12:49 Instead of using the underscore underscore in front of method names. When do you think you'll be ready to put this up or vote? Jan Böhmer 12:56 Wasn't it busy last days, I will do some revises to my RFC, and polish my implementation. Derick Rethans 13:06 Okay, thank you very much this morning for taking the time to talk to me Jan. Jan Böhmer 13:10 Thank you very much for inviting me. Derick Rethans 13:13 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language, I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Userspace Operator Overloading Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 41: __toArray() London, UK Thursday, February 20th 2020, 09:04 GMT In this episode of "PHP Internals News" I chat with Steven Wade (Twitter, GitHub, Website) about the __toArray() RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hi, this is Episode 41. Today I'm talking with Stephen Wade about an RFC that he's produced, called __toArray(). Hi, Steven, would you please introduce yourself? Steven Wade 0:35 Hi, my name is Steven Wade. I'm a software engineer for a company called follow up boss. I've been using PHP since 2007. And I love the language. So I wanted to be able to give back to it with this RFC. Derick Rethans 0:48 What brought you to the point of introducing this RFC? Steven Wade 0:50 This is a feature that I've I've kind of wish would have been in the language for years, and talking with a few people who encouraged it's kind of like the rule of starting a user group right? If there's not one and you have the desire, then you're the person to do it. A few people encouraged and say: Well, why don't you go out and write it. So I've spent the last two years kind of trying to work up the courage or research it enough or make sure I write the RFC the proper way, and then also actually have the time to commit to writing it and following up with any of the discussions as well. Derick Rethans 1:18 Okay, so we've mentioned the word RFC a few times. But we haven't actually spoken about what it is about. What are you wanting to introduce into PHP? Steven Wade 1:25 I want to introduce a new magic method. The as he said, the name of the RFC is the __toArray(). And so the idea is that you can cast an object, if your class implements this method, just like it would toString(). If you cast it manually to array then that method will be called if it's implemented. Or as, as I said, in the RFC, array functions will it can it can automatically cast that if you're not using strict types. Derick Rethans 1:49 Oh, so only if it's not strictly typed. So if its weakly typed would call the toArray() method if the function's argument or type hint array. Steven Wade 1:58 Yes, and that is actually something that came up during the discussion period, which is something again, this is why we have discussions, right? Is to kind of solicit feedback on things we don't think about it, we may overlook or, and so someone did point out that it is, you know, it would not function that way, or you would not expect it to be automatically cast for you, if you're using strict types. Derick Rethans 2:17 Okay. Steven Wade 2:18 The RFC has been updated to reflect that as well. Derick Rethans 2:20 So now the RFC says it won't be automatically called just for type hint. Steven Wade 2:24 Correct. Derick Rethans 2:24 Not everybody is particularly fond of magic methods. What would you say about the criticism that introducing even more of them would be sort of counterproductive, because you'll end up not necessarily knowing what happens if you start calling a method, when you do a cost, for example. Steven Wade 2:38 The beauty of PHP is in its simplicity. And so adding more and more interfaces, kind of expands class declarations enforcement's and in my opinion, can lead to a lot of clutter. And so I think PHP is already very magical. And the precedent has been set to add more magic to it with 7.4 with the introduction of serialize and unserialize magic methods, and so for me it's just kind of a, it's a tool. I don't think that it's necessarily a bad thing or a good thing. It's just another option for the developer to use. Derick Rethans 3:06 Two episodes ago, I spoke with Nicolas Grekas about a Stringable interface that he suggested to introduce, which is a little bit similar to sort of the casting with toArray(). And hence, do you think it would have make sense to have an __toArray() also happen if the class implements a interface with a typed function argument? Steven Wade 3:29 I think that would be two separate RFCs. I think the first one to kind of get it on par with what's what we have now in PHP would be to introduce the toArray(). And then a separate one would be if we wanted to follow suit with an arrayable interface. Derick Rethans 3:43 And which is the same thing that happens with the Stringable interface, right? We have had toString() for how many years, decades? But from what I understand, if you have a typed property "string", it would also call the toString() method when it's defined on an object that's being passed in, or do I misunderstand that, there are misremember that? Steven Wade 4:00 I haven't followed that one too closely. I've kind of been catching up on some of the discussion today. But and yeah, I don't know off the top of my head what that would do. Derick Rethans 4:07 I didn't mean with the ori.. with the newly suggested Stringable interface with adults we currently have. Steven Wade 4:12 I'm not sure how that would work. Derick Rethans 4:13 I don't know, either. That's what I'm asking you. Steven Wade 4:15 With the array and with the typed properties? That's a good question. That's again some feedback, we kind of need to that I need to think through Derick Rethans 4:21 Because I think it would make sense to at least behave the same and I don't particularly mind which way it goes. Me that's, that's a personal opinion here. Steven Wade 4:28 And that's a great idea I need to haven't played with 7.4 too much, I need to pull it down and try and just see what the behaviour of string is because that's the main goal of this is to try and just get this on a parity, functionality parity with with what's toString() will do. And so if that is how it handles it with typed properties and I would want to implement that as well. Derick Rethans 4:47 In a similar way. I don't also know what happens if if you have toString() available in a class and you pass it in as an argument that is typed as string. Steven Wade 4:54 Even though at least when my test was weak types, it will actually cast that for you. If you have that. String argument type hint, it will cast it and then that will be a copy. So it will actually just be the result of that cast to string. I do not think I think it throws an error if you have a strict type set. No, I think it'd be very similar, right. It's just how you want to use it in user land, you know, the __toArray() is you're going to you could cast it yourself ,or you can with weak types PHP could cast for you in the appropriate circumstances. If you want the same functionality. In some for now, you would need to call, you know, the __serialize() yourself with the toArray(). In the future, you could implement the toArray() and then your serialize could actually just cast this object to array, and then that should actually convert that for you. And then serialise will then return array so you're not duplicating how you want that object represented when it's an array. Derick Rethans 6:00 So the RFC mentions that when you do a print_r of person is called __toArray(). But that's not particularly a cast. So why would it do it here, but not for method arguments, for example? Steven Wade 6:11 That is a product of this being my first time and that was a mistake that was thankfully pointed out during the discussion period and has been corrected. Derick Rethans 6:19 I read this RFC a week or two ago or so. And I haven't.. I should have reread it this morning that. I did not so my apologies for not being fully up to date here. There's some array functions in PHP like sort() that operate on an array as a reference right? That can't particularly work if you first have to cast to an array, which is what your current RFC now just. I mean, toArray() only gets called when you cast to an array or when it's a weakly typed argument. But how would it work for methods or functions that accept an array by reference? Steven Wade 6:49 At least the way I proposed it, they would throw an error as it currently does. Again for my test and trying to keep this within parity with the toString. I don't believe there are many functions that will operate on toString on, on a string by reference, as there are with arrays. From what I can recall is that it would throw an error. If you try to operate by reference on an object that implements toString, it will throw an error. Derick Rethans 7:10 And it wouldn't just fall back to using an object because that'd be very strange behaviour in that case, I suppose. Steven Wade 7:15 Basically, if it's if it's not something that can be cast or converted to an array through this method, and it's just going to be the same functionality you have in current PHP, which will be throw an error. Derick Rethans 7:24 Going to go for the principle of least astonishment or something. Steven Wade 7:27 Yeah, I don't want to introduce too many changes to it. I just want to be able to cast. Derick Rethans 7:31 I think that is a great idea. Actually, I mean, the same thing I've spoken with Nikita about, that introducing features step by step makes it a lot easier for people to comprehend what you actually end up doing. And there's also less, less chance of people getting bogged down in liking a specific aspect of the RFC but not of the other RFC parts. And we end up not merging the whole thing with the sub part of it. Steven Wade 7:54 And that's why I was very purposeful and not including any kind of write. You write, you cannot write to a class that implements toArray(). You know, as you will with array ArrayObject, because that we have that for a reason. So this is different functionality, we just wanted to keep it small, and just have this little helper Derick Rethans 8:11 I read in the RFC, something called get_mangled_object_vars(), but I didn't quite understand what it was. Steven Wade 8:16 So that was actually a function introduced in 7.4, as a direct result of my original proposal trying to see what people thought in the internals and in the community of this feature. Sometime in spring, last year 2019, I began this discussion, and there was some initial feedback with folks saying that it would cause some breaking changes in their libraries or their code, because they are overloading the casting. Right now, if you cast an object, I guess you get insight into the object's internals without any side effects. And so I think that's how Symfony's var dumper works. And that's how they're able to display some of that information. So that was concern by introducing this, that functionality would break. And so to introduce a method that would give you the same benefits without overloading the casting, the get_mangled_object_vars() was introduced and accepted and implement in 7.4. Derick Rethans 9:04 And that returns the object properties with their special characters in place. Because PHP internally, if you have a private method, the name for both methods and property is done by doing a null character, the name of the class, a null character then the property name. So that's what that would return, I suppose. Steven Wade 9:22 I believe so. Derick Rethans 9:22 I ran into a similar issue in Xdebug, because in some cases, you want to call get_debug_info, which is what people implement for getting debug info for their objects. But in other cases, you don't want to do it because you want to see everything that happens internally, or you want to see all the properties that exist. So there's kind of a tricky one. And I think at some point with toArray also happening, I might actually end up adding the output of both toArray() and get_debug_info separate sort of fake properties into the Xdebug output. But of course that only works if toArray() has no side effects. I don't think there's any way of preventing that in the toArray method that you can now implement that it doesn't change any information in normal properties, for example, right? Steven Wade 10:12 And that's kind of some of the internals of it that I'm not fully familiar with. With it, I'm hoping to kind of, you know, the discussion period will help eliminate some of that. Derick Rethans 10:20 I don't think you'd be able to actually. Steven Wade 10:22 Just recently, we were able to throw an exception from the toString. I don't know if you can actually do any kind of operations, write operations on the object within the toString? I do? That's a good question. And I do look that up. And whatever that behaviour is, we'd want to mimic here as well. Derick Rethans 10:34 I believe you can. It's normal PHP code, right? And if you don't want to do it, you need to clone it first, which is something you could choose as an implementation, right? You could first clone the class and then call the toArray method on the cloned object. I don't think we have any protection for that. The RFC is currently in the discussion phase. At the time of recording, we're talking about the discussion period. When I sort of thinking of ending that and going for vote? Steven Wade 10:58 I think this is actually going to be probably a longer period of discussion. And I think most RFC is most fleshed out just because of the nature of it. I am a full time employee full time, father, husband, and also student, as well. And so I don't have a lot of time to do this. And I want to do it right. I want to be able to respond to this. And so the discussion opened up a week ago, and this morning is the first time I've had to be able to respond to that and update the RFC. And so I because I really care about this and would love this feature to go in. I want to continue to solicit discussion and advice and questions and to be able to answer them all and do that. So however long it takes. Ideally, I would love it to be closed, voted on, accepted and implemented in time to be able to get in for the feature freeze for 8.0. Derick Rethans 11:40 For that you have about four months. Would you have anything else to add that I forgot? Or you want to add that you think it's interesting to know about this RFC? Steven Wade 11:50 Yeah, the only thing I would add is I've seen discussion, someone posted the RFC on Reddit and I've seen discussions with people like it, people hate it. They want to move one way or the other again, it's just It's a small feature, it's a helper. It's a tool that you can use. Is it perfect? No. Is it going to satisfy everybody? No. You've got the people who are want more functional and procedural you got people who want more OOP. I think it's just another helpful tool that could be in your tool belt. If you use it great. If you don't, you don't have to touch it. Derick Rethans 12:19 Very well. Thank you, Steven, for taking the time to talk to me this afternoon. I'm looking forwards on this coming to vote at some point. Steven Wade 12:27 Thank you for having me on the show. And let me explain the purpose and the reasoning behind this RFC. And thank you very much for giving a voice to those looking to improve the language. Derick Rethans 12:35 You're most welcome. Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: __toArray() Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 40: Syntax Tweaks London, UK Thursday, February 13th 2020, 09:03 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about a bunch of smaller RFCs. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 40. Again, I'm talking with Nikita. Perhaps we should rename this podcast to the Derick and Nikita Show at some point in the future. This time we're going to talk about a bunch of smaller RFC that he produced related to tweaking PHP syntax for PHP 8. Nikita, would you please introduce yourself? Nikita Popov 0:42 Hi, I'm Nikita and I do PHP core developement on behalf of JetBrains. We have a couple of new and not very exciting RFCs to discuss. Derick Rethans 0:53 Sometimes non not exciting is also good to talk about. Anyway, the first one that caught my eye was a RFC called static return type. So we have had return types for well, but what is special about static? Nikita Popov 1:07 So PHP has three magic special class names that's self, referring to the current class, parent referring to the well parent class, and static, which is the late static binding class name. And that's very similar to self. If no inheritance is involved, then static is the same as self introducing refers to the current class. However, if the method is inherited, and you call this method on the child class, then self is still going to refer to the original class, so the parent. While static is going to refer to the class on which the method was actually called. Derick Rethans 1:51 Even though the method wasn't overloaded Nikita Popov 1:54 Exactly. In the way one can think of static as: You can more or less replace static with self. But then you would have to actually copy this method inside every class where. Derick Rethans 2:09 You have not explained the difference between self and static. Why would you want to use static as a return type instead of self? Nikita Popov 2:17 There are a couple of use cases. I think the three ones mentioned in the RFC are. The first one is named constructors. So usually in PHP, we just use the construct method. Well, if we had to give this method, a type, a return type, then the return type will be static. Because of course, the constructor always returns while the class you're actually constructing, not some kinda parent class. And named constructors are just a pattern where you use a static method instead of a constructor, for example, because you have multiple different ways of constructing an object and you want to distinguish them by name. Derick Rethans 2:57 Could we also call those factory methods? Nikita Popov 3:00 Yeah, that's also related pattern. So for named constructors, you usually also want to return the object that it is actually called on. Derick Rethans 3:09 It makes sense attached there because of that then creates a contract that you know that is named constructor is going to return that same class and not something else. Because there's no requirements that would otherwise require that same class, like you'd have to construct. Nikita Popov 3:22 Exactly, yeah. The other pattern. These I think maybe that popularised by PSR, maybe 7 or something, the HTTP request object interface, the object is actually immutable. And the way you change it is by calling it with something method. And this method is going to return you a new object with this particular bit of information replaced. And again, usually for these kinds of API's, you also want to, you want to return the class that the methods actually call them. So if you extend this kind of API, you don't want to get objects of the parent class back. Right the third way, and I think the like by a pretty large margin, the most common one, is just normal fluid mthods. So where each method returns this. This is always an instance of static. So if you extend the class then this is going to be the extending class of the parent class. For this particular case, in PHP there is also a different convention where instead of returning static, you actually need right, return $this. So you use $this as a one word type, a special types indicates this type of method. So static would be a slightly weaker form of that. But we might still add the special $this case in the future. Derick Rethans 4:51 Because static would only enforce it's the same class but not to the same object. Nikita Popov 4:56 Exactly. Derick Rethans 4:56 Are you intending to add that to this RFC? Nikita Popov 4:59 I like to keep RFCs like different issues separated. Derick Rethans 5:03 It makes it easy to talk about them and get them accepted or not. Nikita Popov 5:06 I'm not totally convinced on the $this thing, because static is in the end. I mean, we already allow self return types, we allow parent. It make sense to allow static. But this is not really a type or some kind of extra contact contract on top of the type system. And I'm not sure it makes sense to open this position. Derick Rethans 5:28 Okay, in which position would you be able to use the static keyword? You've already mentioned the return types, there are other places as well? Nikita Popov 5:36 No, you can only use it in return types, it would simply not be sound. So it would violate the liskov substitution principle in any other place. The reason why you can use static in return types, is that static is basically a restriction on each inheriting class. So in your original class, static is the same as self. Then in the inheriting class, static is again the same as self, but in the inheriting class. And the inheriting class is a subclass or a subtype of the parent class. So this is allowed by the liskov substitution principle or by our variance rules. If you do the same things for parameters, you would also go from having a parameter for the parent class to parameter for the child class. So you would restrict the amount of inputs that are allowed in this parameter. And that's invalid. And the same argument also goes for properties. Derick Rethans 6:37 The RFC also talks a little bit about variance and subtyping. How is static considered here differently from self, or if you just explained exactly that? Nikita Popov 6:46 static is considered a sub type of self. If you have a parent method that uses a self return type, you can have a child method that uses a static return type, because static is ta further restriction. So self allows, still allows you to return the parent class, while static does not allow it. So you restrict the amount of return values and that's valid. While going the other direction. So replacing the static type and the parent method ,with the self type in the child method that would not be valid. Because, you make the amount of low values larger. Derick Rethans 7:21 And that is exactly the same as the other variance rules that we have since PHP seven for of course. The the last thing the RFC mentions or actually don't quite remember whether it mentions is, is whether you can also use static as part of a union type. Nikita Popov 7:35 So yes, you can. Derick Rethans 7:37 Okay, that's the simple answer. I like simple answers. Nikita Popov 7:40 Together with the other restrictions. So that union type has to be in the return type position. But apart from that, you can. Derick Rethans 7:47 That's good to hear. Nikita Popov 7:48 There is actually one more tricky thing regarding the property types. Without a lot of static and property types because as I mentioned, it would violate our variance rules. But unfortunately we have the extra issue that we also have static properties. So if you write public static foobar, then is that static for a static property or for a static type? Derick Rethans 8:14 Right, because we don't enforce that a static goes or goes before or behind public, private, or protected. Nikita Popov 8:21 Yeah. Derick Rethans 8:21 At least not in the syntax. I mean, I think coding standards actually do most of the time require the static to be before. Nikita Popov 8:27 Even the coding standards they would require you to write it as public static, not this static public. Derick Rethans 8:35 Oh, really? Okay. I thought was the other way around. Yeah, that is difficult. Because then you don't know which static is meant here. Nikita Popov 8:41 Yes, and we just allow on the, disallow it on the grammar level. It's actually a bit ugly, because we have to like duplicate the whole type grammar two times, once to include static, once to not include it, just to deal with this ugly of conflict. Derick Rethans 8:56 That's what happens when you come with something clever. You need clever workarounds. Nikita Popov 9:00 So it's unfortunate that the static keyword has like three or maybe four completely different meanings in PHP. Simply I think, simply because people wanted to re use a keyword, instead of introducing a new one Derick Rethans 9:15 Because introducing new keywords might end up meaning breaking people's code. Nikita Popov 9:19 On the downside, reusing keywords makes code confusing, because well, at least I got the impression that some people find the use of static for late static binding somewhat confusing. And I can also see if you see methods that has signature public static, whatever and return static, and you're not like super familiar with what all of that means. Derick Rethans 9:46 And that is quite a common pattern because this named constructors are static methods that return static. Let's move on to the next one, which is a tiny RFC that you came up with, which is the Class Name Literal on objects. What does this do? Nikita Popov 10:03 The syntax where you write a class name, then the double colon class. And that just returns you the fully qualified class name. For example, have a use statement for that class, you get back the full name instead of the short name. I think we've had this since PHP 5.5. And it's a great feature because it's like makes it clear where you're referencing the class and not just some random string. And that means, for example, that that IDE refactorings could work better and so on. Derick Rethans 10:35 Okay. Nikita Popov 10:36 The actual RFC is very simple. Currently, the class syntax is only allowed on like literal class names, but you can take an object variable and get the class of that object using the syntax. Derick Rethans 11:48 However, PHP has a function for that already, which is called get_class() right? Nikita Popov 10:52 Exactly. This is essentially just syntax sugar for get_class(). The reason why we want to have the syntax sugar is really not so much that writing get_class() is particularly hard, but just that people expect it to be there. This class syntax looks a lot like a constant access, like a class constant access. So it looks like every class has a magic constant called class. Usually you are able to access class constants on objects. So you can write object, the double colon, and the constant thing. And that works. So in that case, we just take the class of the object and access it on that class. For consistency reasons, it only makes sense that you can do the same with this particular magic concept as well. There's really all the motivation Derick Rethans 11:43 Originally the class literal colon colon class is resolved at compile time. Of course, that can't happen on object colon colon class. Is that still true or no longer? Nikita Popov 11:53 So it really was true in the first place. For normal class names of cours is resolved at compile time. Actually one of the like gotchas with the syntax is that some people expected to validate that the class actually exists. So they expect that this gets auto loaded and they get an error if it doesn't exist, doesn't happen. So you can reference some non existing class with this syntax just fine. The usually your IDE is going to show a warning for that. I mean, as we just discussed, we also have a couple of magic class names. So we have self, parent, and static. The static one in particular, also always has to be resolved at runtime, because we don't know what the what class the method is actually going to be called on. Actually, self and parent also sometimes have to be resolved at runtime. And there are two cases where that can happen. One is if you use traits, because in that case self refers to the using class, not to the trait. So in closures the self class, refers to the bound scope. The bind to method, there is like the last argument on, is the scope you're using. So in those cases, it's already dynamically resolved. Derick Rethans 13:09 Okay. The RFC mentions one specific area where you can't use colon colon class. In which situation can you still not use colon colon class on objects? Nikita Popov 13:20 You can always use it on an object. I think what you're referring to is that normally, for normal class constants, you can also put the class name inside the string. I mean, put the string class name inside the variable and then access the constant on that variable. Derick Rethans 13:38 Oh, right. Yes. Nikita Popov 13:39 For the double colon class syntax, we don't want to allow that. Because, well, first this is kinda useless, because it will just return you back the same string you gave it. And I think in that case, the fact that the class name is not validated, this is especially confusing. Derick Rethans 14:00 Okay, that makes sense. So you can only call colon colon class on literal class names that you already could, as well as on variables that contain an object? Nikita Popov 14:09 That's right? Yeah. Derick Rethans 14:10 That sounds great. Does it show up differently in reflection? Nikita Popov 14:13 This magic class constant actually doesn't show up in reflection at all. It looks like a constant both it's really a special syntax that just happens to share the look with constants. Derick Rethans 14:24 Do you expect any controversies about this? Nikita Popov 14:27 I don't think so. Derick Rethans 14:28 I don't think so either. I can't really see anything that people could complain about too much. I think. I however, do think that for the next RFC that you came up with the variable syntax tweaks, there will be a little bit of haggling about whether this is good idea to do. In PHP seven, zero, we got this uniform variable syntax. Could you give a brief reminder of what it was about? Nikita Popov 14:48 That was about, well fixing a couple of syntax inconsistencies when it comes to variables syntax. So variable syntax in PHP is extremely, extremely magic. Like our expression, syntax nice and regular. But the variable syntax is a huge assortment of special rules and that RFC made those rules little bit less special at more regular. Derick Rethans 15:17 From what I understood we missed a few inconsistencies that we probably also should have addressed in that RFC. And that is what you know, trying to tweak again? Nikita Popov 15:24 All of these remaining consistencies are like really, really minor things and edge cases. But weirdly, all or at least most of them are something that someone at some point ran into, and either open the bug or wrote me an email or pinged me on Twitter. So people somehow managed to still run into these things. Derick Rethans 15:52 The RFC mentions four specific things that we've missed. What is the first one? Nikita Popov 15:57 Yeah, so it's probably going to be somewhat hard to talk about some of these examples. Derick Rethans 16:02 I know because I think some of them make no sense whatsoever. Nikita Popov 16:05 Yeah. Derick Rethans 16:06 Because how do you call a method on the string? Nikita Popov 16:07 Context for this one is I have a nice little extension called scalar objects, which allows you to more or less define methods on strings, on integers, on arrays and so on. In with the uniform variable syntax, we have allowed calling methods on string literals. That actually makes no real sense with baseline PHP. But if you're using scalar objects, then this is a useful feature because you can do something like take a string that rule and call length on it, while otherwise we'll have to wrap it in brackets. Derick Rethans 16:45 So it's just a syntax change pretty much. Nikita Popov 16:48 Well, what this one particular is about that right now, this works if it's a literal string, but if you have any variables inside it than suddenly stops to work, which is just a very. Derick Rethans 16:59 So it is the interpolated strings inside double quotes, the dollar variable name syntax. That's the problem that? Nikita Popov 17:05 Yeah. Derick Rethans 17:06 The second one is called constant derefenceability, which is a word I can't pronounce. And my text edit says it's not a word. So what do you mean by it? Nikita Popov 17:14 That's a good question. I think the term is more or less picked up from C, where we have pointers. And we can dereference pointers to access what the pointer points to. So that's the star operator in C. In PHP, we use the term dereference to also access some kind of structure in some way. For example, to access an array element, so array dereference, or to access an object properties, object reference and so on. That particular one is, I think two things. One is that you can, for example, access the first character of a constant. So read the constant name then brackets zero. Well, maybe even not the first time I can think of a better example. Um, you haven't the constant that contains an array, and you want to access a specific key on that area. That's something you can do already you right now. The same syntax does not work if the constant is in magic constant. What also doesn't work is if you use the our alternative array access syntax. So we have the square brackets, that's what people should use. And we have the curly braces, which is the alternative way to access arrays and which is actually deprecated as of 7.4. I'm not totally sure that that's going to be removed in PHP 8 or not. If it's going to be removed, then this part is a moot point. But yeah, this is again, I think, from a practical perspective, not really interesting. The only situation where I think this is useful is again, of course scalar objects, because it means you can call the methods on the constant Derick Rethans 18:57 Okay, which in syntax is the grammar currently disallowed doing that. Nikita Popov 19:00 Currently it's disallowed and that would allow it. Derick Rethans 19:03 A third one is related. I think it's a class constant dereferenceability. Nikita Popov 19:07 So someone complained about this one on Twitter. I don't know how they ended up trying to do this. Something you can do right now is, you can access a static property. And then you can interpret the content of that static property as a class name, and access another static property on that. So you can change chain these static property accesses. For some reason, the same does not work with class constant access. So static property accesses can be chained. But class constant accesses can't be. Again, for no particular reason, this change would allow that to happen as well. Derick Rethans 19:41 This is even a change that makes sense without having to use scalar objects. Nikita Popov 19:45 That one is. I wouldn't write that kind of code, but it logically makes sense. Derick Rethans 19:50 And then the last one is, in the RFC is called arbitrary expression support for new and instanceof. Nikita Popov 19:56 Yeah, so this is probably the only one that's actually useful for something. PHP has well, a bunch of places where usually you have to place either an identifier or a namespace name, but class name, method name, or a property name, and so on, or even the variable name. For all of these places, we usually support some kind of special syntax to instead use a general expression. For example, some of the variable with a static name, you can use curly braces to use a dynamic name instead. Derick Rethans 20:26 I think for new, we did it at some point already. Nikita Popov 20:29 For new, this doesn't exist yet. So you can use a variable as class name, but you can't actually compute the class name as part of the expression. Derick Rethans 20:40 I think what I was referring to, is you can use braces around the whole new class extension, so you can call methods. So that's that's what I meant, but this is specifically using an expression behind new. Nikita Popov 20:51 Yeah, so these are like two things. One is whether you use an expression for the new class name, and the others for the use the new itself as an expression. And yeah, the same, so yeah, right now, we don't support that for new. And we also don't support it for instanceof, so the the right hand side, which consists of that as the class new. The RFC just proposes to allow an expression and parenthesis in there. And this kind of stuff is, again, not well not particularly useful. But it is useful for things like code generation, where you may have to insert arbitrary expressions sets up your coins. And there are actually some nice hacks that you can use right now. So you can use a variable with a complex expression inside it, where you assign to the variable itself and then return its name. Derick Rethans 21:42 I don't think I understand this. You're saying you can construct a string with a complex expression in it. Nikita Popov 21:48 Not a string. You you write something like new variable, but with a curly brace syntax, and in there you return you start off with the string containing some kind of dummy variable name, and then you concatenate that with an empty string. But that empty string is computed by doing the assignments to the variable name that you're actually going to return. Derick Rethans 22:12 I still don't understand this. You know, what I'm going to do is I'm just going to link to a example for this in the show notes. Nikita Popov 22:19 It's not really important. You can just cut off this part. Derick Rethans 22:22 Yep sure, I can do that too-perfectly fine. Nikita Popov 22:24 Nice hack. Derick Rethans 22:24 But let's not teach too many hacks to people such I think. Thank you for taking the time with me today, Nikita to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we've started voting on them and we'll see what happens to them. Nikita Popov 22:37 Thanks for having me once again. Derick Rethans 22:41 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Static Return Type RFC: Class Name Literal on Object RFC: Variable Syntax Tweaks RFC: Uniform Variable Syntax RFC Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 39: Stringable Interface London, UK Thursday, February 6th 2020, 09:02 GMT In this episode of "PHP Internals News" I chat with Nicolas Grekas (Twitter, GitHub, LinkedIn, Symfony Connect) about the new "Stringable Interface" that Nicolas is proposing, as well as about voting rights (on RFCs). The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. Hello, this is Episode 39. Today I'm talking with Nicholas Grekas about an RFC that he's produced called stringable interface. I already spoke with Nicholas last year about the work that Symfony does the new PHP versions come out to look at deprecations and to make sure that versions of Symfony work with new versions of PHP. But this time Nicholas came up with his own RFC called the stringable interface. Nicholas, could you explain what streamable is? Nicolas Grekas 0:54 Hello, and Stringable is an interface that people could use to declare that they implement some the magic toString() method. Derick Rethans 1:02 Because currently there's not necessary to implement an interface, and PHP's internals will always use toString if it is available in a class, right? Nicolas Grekas 1:10 Yeah, absolutely. Derick Rethans 1:11 What is true reason why you would want to have a stringable interface. Nicolas Grekas 1:16 So the reason is to be able to benefit from union type in PHP 8. Right now, if you want to accept a string as an argument, it's pretty easy. You just add the string type, right? Let's say now you want to accept a string or a stringable object, stringable an object being something that implements this method. If you want to do that, you can not express the type using types today. Derick Rethans 1:42 Because if you choose string, and then the name of an object that would only do that specific object. Nicolas Grekas 1:47 Yes, there are some cases in Symfony especially because this is where work and I do open source. Where we do want to not call toString method until the very latest moment. after example is in the code: one is from Drupal. Drupal computes some constraint validation messages, lazyly, and it's pretty important to them because computing the message itself is pretty costly. They don't need to compute it all the time. Actually, we added the type, the string type in Symfony five, before it was released and Drupal came and say: Oh, this is breaking our code and our features, what should we do now? And we removed the type and we replaced it by some annotation saying: Okay, this is a string or a stringable object. So in the future, when will add up PHP 6 would like to be able to express that using a type of real one, Derick Rethans 2:41 PHP 6? Nicolas Grekas 2:42 No, PHP 8, that's true. Strings and PHP 6. Derick Rethans 2:49 Yay. Nicolas Grekas 2:51 Another example is also is pretty similar, actually. It's in the symfony auto wiring system. We have services that we wire and sometime we can not; the auto wiring logic is broken doesn't work because some class cannot be at a wet. So in this case, we have a lazy message, because sometime of service while it's not auto wireable, it's going to be removed later on because we removed, Symfony removes, unused services. So instead of computing ahead of time and error message that is heavy to compute, and that we might just trash because the service is going to be removed. We have this lazy thing because yeah, it's heavy to cook with that. So real world use cases. Derick Rethans 3:32 I think the intention by by having a stringable interface actually makes sense. What are the concerns for for adding this to your own code, are issues with backwards compatibility, for example? Nicolas Grekas 3:43 That's another goal of the RFC. The way I have designed it, is that I think the actual current code should be able to express the type right now, using annotations of course. So what I mean is that the interface, the proposal, the stringable is very easily polyfilled. So we just create this interface into global namespace, the declarative method, and done. So we can do that now. We can improve the typings now, and then in the future, we'll be able to turn that into an actual union type. Derick Rethans 4:16 You'd be able to do that almost immediately. Well, you would be able to do that in PHP 8. Nicolas Grekas 4:21 Yeah. Derick Rethans 4:21 Without it being a problem. And of course, in that case, you can remove to polyfilled stringable interface. Nicolas Grekas 4:27 Yeah, absolutely. Derick Rethans 4:28 This is going to impact extensions, as well, because extensions, I mean, PHP, internal functions, they often accept strings. I don't actually remember but if you use a scaler type hint string for an internal method than PHP or internal function, this is actually called a toString interface on objects. Like if you would call strlen() on an object that implements toString would actually call toString and return the length to that result. Nicolas Grekas 4:53 Yes, absolutely. Derick Rethans 4:54 So that wouldn't impact that specific case then. Nicolas Grekas 4:57 About extension because that's the current state of the implementation of extension, there was a discussion we're going to talk a bit later about, I think. The current state of the art say is that the interface declares the method that just run right, it declares the written type. It's colon string. So the declaration is public function "toString : string". The very first version didn't have the written type, because it's easier for backward compatibility. Because the current code doesn't need the written type. So by not adding it to the interface, we don't break backward compatibility, which is another critical lighting designer feature that I want at least to have. And so feedback came on the first pull request and said okay, we need the written type. So, the way I implemented that is that now in the RFC actually, the written type is implicit. toString, if you declare it, whether you type ": string" or not, it's there. If you do some reflection later on an instance of something that that then the reflection will tell: Yes, there is a written type and it's string, Derick Rethans 6:01 Whether you have defined it or not in your class. So that's a little bit of magic that gets added on. Nicolas Grekas 6:07 So it doesn't break any semantics because the written type is already in force: you cannot return anything else than the string right now. Derick Rethans 6:14 Yeah, that's true. So that means that automatically toString methods will in return type hints require string to be returned. Nicolas Grekas 6:21 Yes. Derick Rethans 6:22 And that tweak was necessary to make sure that an older backward compatibility was being broken. Nicolas Grekas 6:27 Yes Derick Rethans 6:28 Does that also extends to extension that no part that are not part of the PHP core distribution, do they need to be changed as well? Nicolas Grekas 6:35 So right now, in the current implementation, yes, they need to be changed. If they declare the toString method, they need to change the type basically, to declare that they return the string explicitly in the C code. So that the current state it's pretty easy on the implementation, implementation side to ask that to the extension authors, right? I think it is doable, but Nikita today posted proposal to improve and go to the next level of the RFC. And the next level would be to have the same magic for the declaration of the interface itself. So it would mean if you declare a toString method, then you implement the stringable interface without having to explicitly declare it in the class. Derick Rethans 7:22 I think that actually makes quite a bit of sense because that is pretty much how toString is used already. Anyway, the PHP engine enforces it has to be a string that's being returned. Nicolas Grekas 7:31 Yeah, that's very interesting in that would make the type as a typehint much more useful because any pre existing code would just work with the type and pass the type into the written type and so on. So that would be great. So the link with the extension is that maybe we should have the same automatic declaration implicit declaration applied to extensions. So then extension to boodle have to do to do anything and done. That would declare both the written type and the interface. Derick Rethans 8:03 That makes sense. You mentioned that Nikita just suggested something to tweak this RFC. I reckon this RFC is still open for discussion and voting hasn't started on it yet. Nicolas Grekas 8:13 Yes. Derick Rethans 8:13 Do you have any sort of idea for a timeframe where you think this will be finished? Nicolas Grekas 8:17 The earliest is on February 6, because we know we need to wait two weeks. So I opened that so we go. I don't know how to write the last part of what we discussed. So Nikita's suggestion. So I'm asking him to some help. As soon as it's ready. I think it can be open for voting. So it can be 10 days. So it didn't trigger much discussions on internals, which I don't know. Maybe it's a very, it's a good point. Or maybe it's like people will vote against without expressing why, I don't know. I hope it's a good thing. Derick Rethans 8:50 Sometimes people just start paying attention and there's a new vote. Nicolas Grekas 8:53 Yeah. Derick Rethans 8:54 So there wasn't a lot of controversy about stringable as you just said, but there was some controversy about you actually apply for voting rights, I remember what happened there? Nicolas Grekas 9:03 So yes, I applied for voting. Because of my implication, I think I'm an active PHP contributor to internals in not on not on the C-side, but Okay, so since I wanted to open this RFC, I said: Okay, now it's time to do the bureaucratic steps to get a vote, right? Derick Rethans 9:23 Yep. Nicolas Grekas 9:24 And I think I'm the first person to actually get through some process for getting votes in itself. I mean, I think most people or maybe all people that have a vote, a vote as a side effect of of something else. Derick Rethans 9:38 Yeah, usually about contributing patches, either PHP itself, documentation or extensions. Nicolas Grekas 9:43 So I think there's there has been some confusion, but it's been sorted out pretty quickly. I think I'm going to be able to vote on the next RFC. I'll report back if I can. Derick Rethans 9:54 Okay, fair enough. Currently, we don't really have a process for this at all. I mean, you get to vote when you have a GIT account. Pretty much, or a PHP commit access in some form. And I don't think we've ever really thought about handing that out to people that have been contributing a lot. Right. So that's kind of an interesting thing to see. What we have seen in the past, is people wanting just saying: Yeah, I'd like to vote, or in other cases, or yeah can I have a php.net email address, right. So that also happened because that is a side effect of getting commit access. Nicolas Grekas 10:23 Okay. Derick Rethans 10:24 At the moment I what happened when you did it, it got immediately shut down. Probably a bit quicker than was nice without any discussion. But I think in the future, we do need to come with, come up with a plan and perhaps even think about how to approach voting for features for RFCs the first place because we don't really have a set guideline on who gets to do this and who doesn't get to do it and stuff like that. Nicolas Grekas 10:49 Yeah, it's pretty interesting. Nikita just after the or during the discussion at, he posted some stats on the number of people who can vote and I think the number is like 1900 Derick Rethans 10:59 Yeah. There's quite a lot here. Nicolas Grekas 11:01 It's bit strange. And most people don't vote, I think, because they think they shouldn't. I don't know, something like that. But it's true. It's pretty strange. What I like about this situation is that it doesn't draw a strong line between people that contribute C code and people that write PHP code. And it's nice for PHP. I really think it's nice for PHP to have people that vote that don't do C code. But I think, of course, people that do C code must have the strongest voice, because at some point, the implementation decides. Derick Rethans 11:35 Well, that is a different right, the votes are usually on the idea, not on the implementation. But sometimes the implementation is so complicated that it's nearly impossible to implement, like, I've very briefly spoken with Nikita about generics. I'm sure we'll talk about that at some point, where I'm pretty sure that generics is an idea that simple, I mean, people will vote for it, but as an implementation it might not be that simple to do. Nicolas Grekas 12:01 Yeah. Derick Rethans 12:02 So what happens if you vote for the feature, but you can't come up with a good implementation? Nicolas Grekas 12:06 So I'm inside of thinking that people should vote on the implementations. I mean, people shouldn't be able to vote only on an idea. If there is an idea, it will be supported by an implementation that proves that we are talking about something real, no, just a fancy idea that might not work in black. So that's my opinion. Derick Rethans 12:24 That's a good point. But as you said, from the 1900 people, or or 1900 people plus, that's controlled, most of them are not familiar with a PHP internals whatsoever, because they tend to be contributions to the documentation. This is also very valuable, but it doesn't mean you know, and you don't necessarily know PHP internals, Nicolas Grekas 12:40 Yeah, sure. Derick Rethans 12:41 The oher way can be true as well right? You might know a lot about PHPs internals, but never really use PHP in real life, in your job, or anything like that. Nicolas Grekas 12:48 So it's also good to be able to team up with someone that knows how to code the C part, the internal part. So you have the idea you're you're the supporter part of the team and then someone - being able convince someone to do the implementation or to help you do it, is also proof of kind of interest. So starting small and bringing more people in the boat and making it happen as a thought. Derick Rethans 13:12 Yeah, and we saw some of that happening last year. I can't quite remember what feature it was or or exactly what it was. But I agree with you. I think that is important to do that you can at least somebody convinced to implement the feature before just voting on the idea. Thank you for taking the time with me this morning, Nicholas. Nicolas Grekas 13:30 And thank you Derick for having me again. Derick Rethans 13:32 It it continues like this I'm sure we'll speak again at some point in the future. Nicolas Grekas 13:35 Okay. Derick Rethans 13:39 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week. Show Notes RFC: Stringable Interface Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
PHP Internals News: Episode 38: Preloading and WeakMaps London, UK Thursday, January 30th 2020, 09:01 GMT In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about PHP 7.4 preloading mishaps, and his WeakMaps RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news Show Notes RFC: WeakMaps Transcript Derick Rethans 0:16 Hi, I'm Derick. And this is PHP internals news, a weeklish podcast dedicated to demystifying the development of the PHP language. This is Episode 38. I'm talking with Nikita Popov about a few things that have happened over the holidays. Nikita, How were your holidays? Nikita Popov 0:34 My holidays days were great. Derick Rethans 0:36 I thought I'd start with something else then I did last year. In any case, and wanting to talk to you this morning about something that happens to PHP seven four over the holidays. And that is issues with preloading on Windows with PHP seven four. I have no idea what the problem is here. Would you try to explain this to me? Nikita Popov 0:56 So there were actually quite a few issues with preloading in early PHP 7.4 releases. The feature definitely did not get enough testing. Most of the issues have been fixed in 7.4.2. But if you're using preload-user, what you have to use if you're running on the root, then you will probably still see crashes and that's going to be fixed in the next release. Derick Rethans 1:20 In 7.4.3. Nikita Popov 1:22 Right. But to get back to Windows, Windows has a well very different process architecture than Linux. In particular, on Linux, or BSD we have fork. Which basically just takes a process and copies its entire memory state to create a new process. This is a lot cheaper than it sounds because it's all like reuses memory until it's actually changed. Derick Rethans 1:48 Its copy on write. Nikita Popov 1:49 Copy on write exactly. The same functionality does not exist on Windows, or at least it's not publicly exposed. So on Windows, you can only create new processes from scratch, that look, we use our memory from the previous one. And for OPcache, this is a problem because OPcache would really like to reference internal classes as defined by PHP. But because we store things in shared memory, which is shared between multiple processes, we now have the problem that these internal classes can reside at different addresses, in these different processes. On Linux, it's always going to be the same address because we are forking and that keeps the address. On Windows each process could have a different address. And especially because Windows since I think Windows Vista, uses address space layout randomization. This is actually pretty much always going to be a different address. Derick Rethans 2:51 Because that's a security feature? Nikita Popov 2:52 Exactly. It's a security feature. Derick Rethans 2:54 Would it also be a problem on Linux if you'd start a process instead of forking it? Nikita Popov 2:59 Yes, it would be a problem. The difference's just that on on Unix, we don't do that. OPcache has quite a different architecture on Windows. On Linux, we do not allow to attach to an existing OPcache from a separate process. So the only way to share an OPcache is to use fork. On Windows because of this restriction that we don't have fork, we do though this kind of attachments and that's where we have we have to deal with these kind of issues. So that's actually a general problem, not just for preloading on differences, just that normally, we can just: Hey, okay, we do not allow any references to internal classes from shared memory on Windows. It's like a slight hit to optimization, but it's not super important. While we're preloading, we have to link the entire class graph during preloading. And if you have any classes that for example, extend from an internal class, like extend from Exception. Or in some cases, you can just use an internal class as a type hint, then we would not be able to store these kinds of references in shared memory on Windows. And because for preloading, it's pretty much inevitable that you run into the situation you just can't realistically do preloading on Windows, Derick Rethans 4:18 Hence, the decision being made just turning it off, instead of trying to end and always failing pretty much. Nikita Popov 4:24 Yeah, I mean, it kind of did work before, it just got a bunch of warnings that these classes haven't been preloaded. And if people try that, oh, it's like with a simple example there, we'll see you great, preloading is working. But once they move to their actual complex application that uses internal classes at various points, it turns out that: Actually, no, it doesn't really work in practice. And so the way that we just disabled entirely Derick Rethans 4:51 That seems like a reasonable solution to this, do you think at some point this can be fixable in another clever way? Nikita Popov 4:58 Well, main way in which can be fixed is to avoid this kind of multi process attachments on Windows. The alternative to having multiple processes is to have multiple threads, which do share an address space. Basically same as fork just with threads then. But that, of course, depends on what kind of web server you're using and what kind of SAPI you're using. And I think nowadays, on Windows on threaded web servers are somewhat more popular than on Linux, it's still not the majority deployment strategy. Derick Rethans 5:34 I think it used to be that threaded process models on Windows were a lot more common when PHP just came out for Windows, because it was an ISAPI module which was always threaded. From what I remember the original reason why we had ZTS, in the first place. Yeah, at some point that started moving to PHP FPM kind of models because it didn't use threading and it was, tended to be a lot safer to use it that way. Nikita Popov 5:57 Right. I mean, threading has issues in particular because things like locales are per process, not per thread. So processes are usually safer to use Derick Rethans 6:08 Anything else interesting that happened that went wrong with a preloading, or do you not want to mention? Nikita Popov 6:12 The rest is mostly just that we have two different ways of doing preloading. One is using OPcache compile file, and others using require or include, and the difference between them is that OPcache compile file combines the file but does not executed. In that case, the way we perform preloading is that we first collect all classes and then we, like gradually, link them, actually register them, always making sure that all the dependencies have already be linked. And this is the mode that that I think mostly work well at the release of PHP seven point four. And the other one, they require approach is where we, well require directly executes the code and registers the classes. And in that case, basically, if it turns out that some kind of dependency cannot be preloaded for some reason, we simply have to abort preloading, because we cannot recover from that. This abortion was missing. And it that turns out that, in the end, the way people actually use preloading is using the require approach, not using the OPcache compile file approach. Derick Rethans 7:26 Although that's the one you see most of the examples that I've seen, and in the documentation. Nikita Popov 7:30 Right, it has some advantages you some require. Derick Rethans 7:34 Something else that happened over the holidays is that you've worked on several RFCs there're too many to talk about at all in this episode. But one of the earlier ones, was a WeakMap, or WeakMaps RFC, which sort of builds on top of the weak references that we already got in PHP seven four. What's wrong with the weak references, and why do we now need weak maps? Nikita Popov 7:58 There's nothing wrong with weak references. As a reminder what weak references are both, they allow you to reference an object without preventing it from being garbage collected. So if the object is unset, then you're just left with a dangling reference. And if you try to access it, you get back knowledge of the object. Now, the probably most common use case for any kind of weak data structure is a map or an associative array, where you have objects and want to associate some kind of data with them. Typical use cases are caches or other memoise data structures. And the reason why it's important for this to be weak is that you do not well, if you want to cache some data with the object, and then nobody else is using that object. You don't really want to keep around that cache data because no one has ever going to use it again. And it's just going to take up memory usage. And this is what the weak map does. So you use objects as keys, use some kind of data as the value. And if the object is no longer used outside this map, then is also removed from the map as well. Derick Rethans 9:16 So you mentioned objects as keys. Is that something new? Because I don't think currently PHP supports that. Nikita Popov 9:22 I mean, you can't use objects as keys in normal arrays. That doesn't work. For example, the array access interface and the traversable interface, they don't really care what your types are. So you can use anything. Derick Rethans 9:37 I glanced over that that point, yes. But weak map is something that then implements array access. Nikita Popov 9:44 That's right Derick Rethans 9:45 How does the interface of a weak map look like? How would you interact with it? Nikita Popov 9:49 Yeah, actually, it just implements all the magic interfaces in PHP. So ArrayAccess, you can access the roadmap by key, where the key's object. Traversable, that is you can iterate over the weak map and get both the keys and values, and of course Countable, so you can count how many elements there are in there. And that's it. Derick Rethans 10:12 All the methods, there's plenty of em then, there should be nine or 10 or so right? Nikita Popov 10:17 Five. Derick Rethans 10:18 No there's the six of iterator. Nikita Popov 10:20 Right, yeah, there is this little detail where when you implement Traversable, internal classes, you don't actually have to implement iterator methods. That's why there is a few, a few less. Derick Rethans 10:33 Who's going to benefit from this new feature? Nikita Popov 10:35 Like one of the users for weak maps are things like ORMs. Where, well, database records are represented as object, and there is data storage related to these objects. And I think it's a, well, well known issue that if you're using ORMs you can sometimes run into Memory Usage issues. And the absence of weak structures is one of the reasons why that can happen. So that they just keep holding onto information even though the application actually doesn't use it anymore. Derick Rethans 11:12 Did a specific ORM request this feature? Nikita Popov 11:15 I don't think so. Derick Rethans 11:16 Because weak maps are something done as an internal class in PHP, how are these things implemented? Is there something interesting because I remember talking to Joe about weak references last year, there is some functionality where it would automatically do something on the destructor or rather of the objects. Is this something that also happens with weak maps. Nikita Popov 11:37 So yeah, the mechanism how weak references and maps work is basically the same. So there is a flag on each object, that can be set to indicate that it has a weak reference or weak map. If the object is destroyed, and has this nice flag, then we execute a callbeck that is going to remove the object from the Weak Reference or from the weak map, or from multiple maps. Derick Rethans 12:05 Is it because there are some kind of registry that links an object? Nikita Popov 12:08 So when we store all the weak references, weak maps, and the object as part of, so we can efficiently remove it. Derick Rethans 12:16 When I was reading the RFC, I saw something like SPL object ID mentioned, which is a way how to basically identify a specific object. Is this something related to weak references or weak maps? Or is this something else no longer used, or people should no longer use pretty much, because I guess this was a way previously how to identify an object and then associated extra data with it. Like you mentioned that ORMs were due for cache. Nikita Popov 12:44 Right. So it's kind of related, but I'm also not. So one is not a replacement for the other, just different use cases. We used to have SPL object hash for a very long time. And I think, somebody went around PHP 7.0, or maybe later SPL object ID was introduced, which this the same just because an integer and because because of that is more efficient. But in the end, what these functions do is return a unique identifier for an object. But this identifier is only unique as long as the object is alive. So these object IDs are reused when objects are destroyed. Derick Rethans 13:30 And that makes them not usable for associating cache data with a specific object? Nikita Popov 13:35 That makes them usable for associating cache data. But you also have to store the object to make sure it does not get destroyed in the meantime. So that's how you get around the restriction that you cannot use objects as array keys. That's what you need the ID for. But you still have to store the like a strong reference to the object to make sure it's not garbage collected. And this ID starts referencing some kind of other objects. Derick Rethans 14:04 When you say Strong Reference, that is what PHP references are traditionally? Nikita Popov 14:08 That's the normal reference. Derick Rethans 14:10 Well, because it's been quite some time since it's got introduced from what I understood this has been accepted? Nikita Popov 14:16 It is accepted: 25, zero Derick Rethans 14:18 25, zero. That doesn't happen very often. Nikita Popov 14:22 Most RFCs are maybe not anonymous, but usually either they are 95% accepted, or they rejected really hard. There is not a lot of middle ground. Derick Rethans 14:34 That's pretty good, though. In any case, we will see this in PHP 8, I suppose, coming out later in the year. Nikita Popov 14:39 That's right. Yes. Derick Rethans 14:41 Well, thank you for taking the time today to talk to me about weak references and preloading especially on Windows. Thank you for taking the time. Nikita Popov 14:50 Thanks for having me Derick Derick Rethans 14:52 Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week. Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
Thanks to PHP being an interpreted language and it that it has a garbage collector, PHP developers don't often have to think about memory management. Unlike developers in compiled languages, such as C/C++, we don't have to give that much thought to memory allocation and deallocation.However, it's helpful to have a broad understanding of how garbage collection works in PHP, along with how you can interact with it so that you can create high performing applications.In this episode, Benjamin and Matthew, talk with Derick Rethans about: The basics of how garbage collection works in PHP About how Xdebug and garbage collection in PHP came about About some of the functions available for interacting with it How to get the most out of garbage collection Links https://tideways.com/profiler/blog/what-is-garbage-collection-in-php-and-how-do-you-make-the-most-of-it https://pdfs.semanticscholar.org/0aa0/bbefc44d55a18826a0d82007f4b1f678cb36.pdf https://news.ycombinator.com/item?id=14823054 https://xdebug.org/support https://derickrethans.nl/xdebug-update-september-2019.html Guests: Derick Rethans.Hosted By: Benjamin Eberlei, and Matthew Setter.Thanks for tuning in to the Undercover ElePHPant. If you'd like to be a guest on the podcast or know someone very knowledgeable in writing highly performant and scalable PHP applications, email podcast@tideways.com. This podcast is produced by Tideways. Don't look further for an all in one Monitoring, Profiling and Exception Tracking software for PHP applications available on tideways.com. Follow us on Twitter (@tidewaysio). Find out more about us at https://tideways.com.
@yugo_tak さんとPHPアプリケーションのパフォーマンス・チューニングやプロファイル、ISUCON などについて話しました。 負荷試験 モニタリングツール 予測より計測 ボトルネック PHPプロファイル (New Relice, XHProf + tideways, Xdebug) チューニングのアンチパターン(キャッシュ、持続的接続、データ特性、原因の見極め) Go と PHP Singleflightパターン 仕様チューニング ISUCON楽しい PHPで出る チームレクリエーションとしてのISUCON
This month the team discusses Eric saying he loves Laracon and then proceeding to lose his shit why he doesn't like Laracon. Other topics include Eric discusses this year's Laracon John talks about why it's a bad idea to run XDebug in production and exposed to the internet. His pain is your lesson John posted his credentials to social media Stupid Router Purchase Credit Me · Issue #1594 · nova-framework/framework Laravel Nova
I had a nice chat with Derick Rethans (https://twitter.com/derickr) at PHPBenelux Conference 2016 (http://conference.phpbenelux.eu). We talked about how he became a British citizen, how Michael Schumacher encouraged him to learn PHP and a bunch of other random things. We also talked about Derick's career and open source contributions. Derick works at MongoDB (https://www.mongodb.com/) where he focuses on the PHP driver and library. Derick Rethans is the author and maintainer of Xdebug (https://xdebug.org), the well-known debugging extension in PHP. He is also a well-respected PHP core contributor and release manager. The video footage of the interview can be found on Youtube: https://youtu.be/APZF9YDRknI ► Follow me on Twitter - http://twitter.com/ThijsFeryn ► Watch my videos on Youtube - http://www.youtube.com/thijsferyn ► Subscribe on iTunes - http://itunes.feryn.eu ► Read my blog - http://blog.feryn.eu ► Check my presentations - http://talks.feryn.eu
In this episode we are lucky to be joined by good friend of the show Joe Watkins. We start off the podcast discussing the recent major release of pthreads (version 3) and the differences found compared to version 2. This moves us on to touch upon the design decisions made due to changes between PHP 5 and 7, providing a good API to the user, volatility and immutability considerations and installing extensions within PHP 7. We finally bring up the progress being made in moving over other extensions such as APCu and uopz, helping out with other projects such as Xdebug and our delve into screencasting.