#unit-testing
Hello, all. I have an interesting problem to which I got a suboptimal solution. I'd like to know if you can give me some ideas on it.
I have a legacy class whose structure is like this:
class SomeClass:
    def process(self):
        try:
            # <some code>
            self.create_indicators(some_parameters)
            # <some code>
        except Exception as e:
            return False
        return True

    def create_indicators(self, params):
        # <some_code>
        my_interesting_var, refs = self.create_my_interesting_var(params)
        SomeOtherClass.send_away_my_interesting_var(my_interesting_var)
        # <some_code>
So, as you can imagine, I want to inspect my_interesting_var when testing with unittest.TestCase. The only way I could do it was changing SomeClass code to this:
class TestException(Exception):
    def __init__(self, message, obj):
        super().__init__(message)
        self.obj = obj

class SomeClass:
    def process(self):
        try:
            # <some code>
            self.create_indicators(some_parameters)
            # <some code>
        except TestException as e:
            # Re-raise so the test can read the captured object from e.obj.
            raise TestException("Error processing my_interesting_var", e.obj)
        except Exception as e:
            return False
        return True
And then I created the test like this:
@patch("mymodule.SomeOtherClass.send_away_my_interesting_var")
def test_create_report_create_web3_indicators(self, mock_mymodule):
    # Make mocks: the patched method raises and carries its argument out of process().
    def raise_with_var(my_interesting_var, source_name=None):
        raise TestException("my_interesting_var", my_interesting_var)

    mock_mymodule.side_effect = raise_with_var
    my_interesting_var = None
    template = mymodule.SomeClass(some_inits)
    try:
        template.process()
    except TestException as e:
        my_interesting_var = e.obj
    self.assertEqual(my_interesting_var, something)
The problem is, I needed to change SomeClass code so I could test it.
Is there a way to test this variable without changing the legacy code?
Isn't create_my_interesting_var() pure? If it was there would be no need to fiddle with create_indicators at all
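To illustrate the point about purity: if create_my_interesting_var() depends only on its arguments (an assumption, since its body isn't shown in the thread), it can be called directly and nothing needs to be patched. The SomeClass below is a hypothetical stand-in with a made-up body, just a sketch of the idea:

```python
import unittest


class SomeClass:
    """Hypothetical stand-in for the legacy class; the real body is not shown."""

    def create_my_interesting_var(self, params):
        # Assumed pure: the result depends only on `params`.
        my_interesting_var = {"total": sum(params)}
        refs = list(params)
        return my_interesting_var, refs


class CreateMyInterestingVarTest(unittest.TestCase):
    def test_returns_expected_value(self):
        # No patching needed: call the helper directly and inspect its result.
        my_interesting_var, refs = SomeClass().create_my_interesting_var([1, 2, 3])
        self.assertEqual(my_interesting_var, {"total": 6})
        self.assertEqual(refs, [1, 2, 3])


# Run the test case without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CreateMyInterestingVarTest)
result = unittest.TextTestRunner().run(suite)
```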
Hello, all. I have an interesting problem to which I got a suboptimal solution. I'd like to know if you can give me some ideas on it.
My suggestion would be, don't try to retrofit tests to an already existing code. You won't improve the design of your code that way. It's poor as a regression test, because the test won't cover what's supposed to happen, only what does happen.
So if you want to cover that piece of your feature with test, start with a test, test-drive a new class, and then replace the old class with the new class.
I strongly disagree
First of all what is "supposed" to happen is very very often "whatever already happens"
Second, if you don't test for existing behavior parity first, debugging can become a nightmare
First of all what is "supposed" to happen is very very often "whatever already happens"
I see the confusion. When I said "supposed to happen", I meant the expected behaviour or outcome, and when I said "whatever already happens" I meant the implementation details. A good test should specify the expected behaviour, leaving the implementation free to vary. If you start with the code and try to retrofit tests to it, you don't get tests that check the expected behaviour; you get tests that check implementation details.
Second, if you don't test for existing behavior parity first, debugging can become a nightmare
Sure, but if you start with tests, debugging done is very minimal, because projects like that tend to have very few bugs.
Well that's clearly wrong. It's called "black box testing" or even "snapshot testing" and it's a very reasonable behavior and might or might not test implementation details.
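For readers unfamiliar with the term, a snapshot (or characterization) test records what the existing code currently returns and then checks that it keeps returning it. A minimal sketch, where legacy_report, the sample order and the recorded string are all hypothetical:

```python
import json


def legacy_report(order):
    """Hypothetical legacy function whose current behaviour we want to pin down."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"customer": order["customer"].upper(), "total": round(total, 2)}


def snapshot(value):
    # Serialise deterministically so the snapshot is stable across runs.
    return json.dumps(value, sort_keys=True)


# Recorded once from the existing code and then treated as the expected output.
RECORDED = '{"customer": "ADA", "total": 12.5}'

order = {"customer": "ada", "items": [{"price": 2.5, "qty": 3}, {"price": 5.0, "qty": 1}]}
current = snapshot(legacy_report(order))
matches = current == RECORDED
```

Whether such a test couples to implementation details depends on what gets recorded: here only the public return value is pinned, not how it is computed.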
You can do black box testing either test first or test after, and doing black box testing test-first always yields better results than test after. As to the "very reasonable behaviour", I wouldn't be so sure, I would say it's only slightly better than manual tests. If I were to order testing strategies from the best to worst, then at the very end would be manual tests, and one tick before that would be tests written after the fact. Any kind of test written before the code will give you a better design, better coupling and cohesion, and most people report feeling way better when working with systems like that.
I guess, the reason people don't do that, is because they find themselves working on legacy systems without tests, so they very rarely get a chance to test-drive something for real. They are forced to either retrofit tests to existing code (which is very low quality), or create a new code and test-drive that (which is also often difficult). If someone finds himself in this situation, like imagine you work in a project like that for 2-3 years, and you can't get away from it; it's very painful to admit to oneself "I'm doing bad testing, because the project forces me to". It's much better to rationalize it by saying "I'm doing test after the fact and that's good".
I propose to you - take any person, show them either a very well written project with tests or a new project, show them how nice and easy it is to work in an environment like that, and 99% of them will never write tests after the fact again - but they need to experience it first.
even "snapshot testing" and it's a very reasonable behavior and might or might not test implementation details.
I agree that it's popular, but it's not good. What you've achieved with a snapshot test, is a test that couples to the implementation most often. It doesn't provide the same level of freedom to refactor your internals as if it was written before the fact, like a regular test.
That's a strawman though. No one argued that.
Black box testing is great when it's the right tool. TDD is great when it's the right tool. No tests at all is great when it's the right situation. Everything has context.
Black box testing is great when it's the right tool. TDD is great when it's the right tool. No tests at all is great when it's the right situation. Everything has context.
This argument can be used to defend any bad idea. LCD or OLED screens are better than old CRT monitors, but you could then say "LCDs are good when they're the right tool and CRTs are good when they're the right tool". That doesn't mean anything. The context for using a CRT is so narrow that it doesn't make sense to recommend it to anyone, same as test-after code.
There are very few outright bad ideas. It's all about context. Ignoring context and dealing with absolutes is fundamentalism and that's no good.
You're leaving the subject and going off to broader areas.
The topic came from @sturdy plaza, who showed some code and asked how to retrofit tests onto existing code.
I suggested that it's a bad idea altogether, and that it would be much better to drive the design of the code from tests. You achieve much better results that way. By retrofitting (or snapshot testing), you couple your tests to the implementation. They assert that the code is the code that is there, but they don't improve the design of the system (like test-first would) and aren't flexible enough to allow a substantial refactor. Test-after tests will break when you refactor your code, because they couple to the implementation.
Test-first tests will allow a refactor, because they specify the intent, not the implementation.
if you already have code, and you don't yet have tests, you should write tests. It sounds like you're suggesting you shouldn't write those tests.
I'm talking about the basic problem underlying the disagreement.
Well, yes and no. You're right in some areas.
If you write your code first, you don't generally think about it being testable, because it's already hard to get it to work. So we tend to create software that's not testable, poorly designed and tightly coupled. It's hard to test that code.
On the other hand, if you write your test first, that drives you to create software that is more testable, because you can't do it any other way. The resulting code is much more testable and, by definition, decoupled. It's usually better designed, because when writing the test you're thinking about what you'd like to achieve, not implementation details.
The first is undesired, and the latter is highly desired.
Now, how do you achieve the second, if you already have the first?
That's all irrelevant to the situation. He's asking about his broken leg and you're telling him to not break the leg in the first place.
The sad part is, you can't. The design decisions are already made, so you can't retrofit proper tests to that code. What you can do, if you have like 100 "old classes" (classes without tests), is take one of them and rewrite it, but with tests. So you have 99 old classes and 1 new class. That one class is an improvement. You use the new class in your code, and when you're done you remove the old one. You do that for all code in your system, and you have a testable system now.
That's all irrelevant to the situation. He's asking about his broken leg and you're telling him to not break the leg in the first place.
I see why you may think that, and if I suggested that - that would be poor advice. But I'm not! Let me explain:
it's not realistic to say, "just rewrite the whole thing"
If you rewrite it with tests but have no way to verify that your new code does the same thing as the old code, that's a problem.
He has a class without tests, and he wants to add a test. Of course, you should want the good tests and good design - if not, what's the point? And the best way to achieve that, is to write that class test-first.
If you do that, you will have what you wanted - a class with good tests.
it's not realistic to say, "just rewrite the whole thing"
I didn't suggest rewrite the whole thing, just this class that he wants tested.
if i have no tests, and take your advice, i will rewrite the whole thing.
I mean, that's what you said, but I am willing to believe you failed to communicate what you really meant :P I do that all the time myself.
My strong disagreement with your perspective is specifically in the context of writing new tests for existing code
wat.. that's just saying the same thing again, which is the part we disagree with
The chance of introducing a new bug, or accidentally missing some tiny feature, it is way too high
In an ideal world you have a specification for the program and you can implement that as the test suite, and then factor your code to match the test suite
In practice, the specification is whatever the program already happens to do
Oh, these hints from you all are interesting. I think there are more things here that I can chew for the moment.
I decided to create a side effect and get the arguments passed to send_away_my_interesting_var.
It's not the best solution, I know, but now I see I can use tests to help decoupling and speed up development from now on.
I'm glad it sparked such an interesting discussion!
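For anyone landing here with the same question, one way to read the argument without touching the legacy code is to let the patched method record its calls. This sketch uses the mock's call_args rather than a side effect (a side_effect function that appends to a list would work similarly). The class bodies are hypothetical stand-ins that only mirror the structure from the question:

```python
import unittest
from unittest.mock import patch


class SomeOtherClass:
    @staticmethod
    def send_away_my_interesting_var(my_interesting_var, source_name=None):
        raise RuntimeError("would talk to an external system")


class SomeClass:
    """Hypothetical stand-in mirroring the structure from the question."""

    def process(self):
        try:
            self.create_indicators([1, 2, 3])
        except Exception:
            return False
        return True

    def create_indicators(self, params):
        my_interesting_var, refs = self.create_my_interesting_var(params)
        SomeOtherClass.send_away_my_interesting_var(my_interesting_var)

    def create_my_interesting_var(self, params):
        return {"total": sum(params)}, list(params)


class ProcessTest(unittest.TestCase):
    def test_sends_expected_var(self):
        with patch.object(SomeOtherClass, "send_away_my_interesting_var") as mock_send:
            self.assertTrue(SomeClass().process())
        # The mock records its calls; nothing has to be raised or re-raised.
        mock_send.assert_called_once()
        sent_var = mock_send.call_args.args[0]
        self.assertEqual(sent_var, {"total": 6})


# Run the test case without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProcessTest)
result = unittest.TextTestRunner().run(suite)
```

Because process() swallows every Exception, asserting on the recorded call is also safer than asserting inside a side effect, where a failed assertion would be caught and turned into a False return value.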
if i have no tests, and take your advice, i will rewrite the whole thing.
You only rewrite the thing that you want tested. If you want just one class tested, you only need to rewrite that one class.
If you rewrite it with tests but have no way to verify that your new code does the same thing as the old code, that's a problem.
That is also a very serious and real issue, thank you for bringing it up. It's serious, because there are two forces at play: In one corner, we have the "what the code should do" in the other "what the code does". Good tests should specify what the code should do. The good code, should specify what it does.
If you tell me that you "have no way to verify that your new code does the same thing as the old code", to me that means you know what the code currently does (or not), but probably not what it should do. That's a very common issue if you write tests after. You implement the stuff, then read the code, and have no idea what it's supposed to be doing.
The chance of introducing a new bug, or accidentally missing some tiny feature, it is way too high
That is also a very real issue, and exactly the thing you get if you do test-after.
In an ideal world you have a specification for the program and you can implement that as the test suite, and then factor your code to match the test suite
In practice, the specification is whatever the program already happens to do
That's not entirely true. No one will give you a specification for a program. You're the programmer, you're in charge of developing the application. What you will receive is the wishes of your customer/client/user: what he wants to do, what work he needs to do, what benefit he wants. How it's implemented/designed/developed is up to the programmers.
And thus, as a programmer - you must know what the program should do. If not, you're in big, big trouble.
Dnaron. You sound very junior. I don't know if you are, but it sounds like you're green and excited and have read a lot. I've been that person. 25 years ago.
Argumentum ad hominem. Thank you.
again, that's asking him to not break the leg after the leg is already broken and there's blood squirting wildly. Context.
That's not what I'm saying, you're misreading my words.
He said he's got a class without tests. He wants to add a test. Thus - he wants to have a class with tests. The best way to have that is to write a new test first, and drive that class back from the test, then remove the old class.
The class is not written in stone. You can refactor it, update it, remove it and rewrite it.
there are risks to doing that, not to mention the amount of work it would take.
What risks? The thing you mentioned already, are that you don't really know what the class is doing, and you're scared of changing it, because you don't know what might happen.
And I agree, that's a bad place to be in.
thus black box testing
Working in legacy software, where who knows what it will do, is stressful.
yes.
But!
If you're in that kind of place, that you don't really know what the software is supposed to do, because it's so bad and old,
i'm sorry to say that, but you just aren't able to test it properly. You can't, it's not possible.
You can fool yourself into thinking you can blackbox test that,
and you can do that, but these tests will not give you any value.
they will be slow to execute, break when you refactor, won't catch bugs, won't improve your design, nothing.
i'm sorry, that's simply not true. they have value.
They have like 0.0001% of the value of the tests that would give you 100% if you wrote them test-first.
they aren't ideal, but we started from a non-ideal place.
If you have a legacy code with classes that you have no idea what they're doing, the only thing you can do to improve it, is to learn what the code is supposed to be doing.
Not what it does, but what it's supposed to be doing.
If you don't have that, you can do all the black box testing you want, nothing good will come from that.
ok, we get it. this is an extreme way to express your ideals.
Let me ask you this then. What good are tests, written by someone who doesn't know what the class under test is supposed to be doing?
that wasn't the question. We know what the class is supposed to do. We don't have tests.
It's like a recipe for a pie, written by someone who doesn't know how to make one.
no one said, "I have no idea what the class is supposed to do"
He blocked me. So I guess you're on your own ned. Godspeed.
how can you tell that?
You can't react with emojis on a message of someone who has blocked you. It's a nice funny animation too. The entire window vibrates. It's pretty neat. Confusing as hell though.
no one said, "I have no idea what the class is supposed to do"
I think @proud nebula said that:
If you rewrite it with tests but have no way to verify that your new code does the same thing as the old code, that's a problem.
He suggested there might be no way to verify that the new code does the same thing as the old code. To me - the only circumstance in which that is true is if you don't know what the code is supposed to be doing.
He suggested there are some parts in the code, that do something - but we're not really sure why or how.
ok, then it wasn't expressed well. We know what the code is supposed to be doing.
If you do, then what's the problem with writing a new test first, then driving a class from it?
have you done this with legacy code?
Yup.
You test-drive a small bit of the system, and you replace the usage of the old version with the new version.
(aka YOLO)
And you do that with every bit that you want tested properly.
the problem is that there can be unknown edge cases or side effects. It's not that we have no idea what the code should do. It's that we might not understand 100% of what the code does.
the problem is that there can be unknown edge cases or side effects.
Back again - if there might, that means you don't really know what the system is supposed to be doing.
you keep switching to "100% don't know."
in any case, @sturdy plaza has what they needed.
These "unknown edge cases" or side effects, that you speak of - if the application was written test-first, there wouldn't be any, because they would be covered by tests.
the problem is that there can be unknown edge cases or side effects
If that is true, that there are these edge cases, then doing "blackbox" testing won't help you much either, because that kind of test won't illustrate those edge cases.
again, you are saying "you shouldn't have gotten yourself into that situation in the first place"
As a sidenote, yes. But I'm also saying how to leave it.
and we are saying it might not be feasible to leave it the way you are describing.
If you want good tests, you do this:
- if there are edge cases, find them
- write a fresh test
- drive the class from that test
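A small sketch of what those three steps could look like in practice. IndicatorBuilder and its behaviour are hypothetical; the point is only the order: the test (including a known edge case) is written first, and the class is then written to satisfy it.

```python
import unittest


class IndicatorBuilderTest(unittest.TestCase):
    # Written first: states the behaviour we want, including a known edge case.
    def test_sums_values(self):
        self.assertEqual(IndicatorBuilder().build([1, 2, 3]), {"total": 6})

    def test_empty_input_is_an_edge_case(self):
        self.assertEqual(IndicatorBuilder().build([]), {"total": 0})


class IndicatorBuilder:
    # Written second: just enough code to make the tests above pass.
    def build(self, values):
        return {"total": sum(values)}


# Run the test case without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IndicatorBuilderTest)
result = unittest.TextTestRunner().run(suite)
```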
and test-first doesn't ensure that you've fully tested all of the behavior either.
That's correct, but at least you specified in the test what the code is supposed to do. If you find a missing behaviour, it's trivial to add it, because the rest of the behaviour is already specified in the tests.
the code could have behavior the test doesn't specify
even if you wrote the tests first.
that's right, but if you're doing test-first, you can freely remove that code, because it's not needed for anything.
If it was needed, there would be a test for it that would catch it.
If you want some behaviour from a software, you codify it in a test.
that's not true. You write a test, you write a class, you write another class that uses the first. the second class depends on the behavior the test didn't test.
yes, ideally. the real world gets messy. people make mistakes.
That's true, but in test-first applications, the mistakes happen once every year maybe. In test-after, you get mistakes daily probably.
and even if the mistake happens, it's caught very quickly.
The second class wasn't test-driven?
even if it was, the tests could have missed the secret behavior.
You're the author of the test. If you missed the secret behaviour, that means it wasn't needed.
i get it: tests first is a good way to write better software. but it's not a magic bullet.
I never said it was a magic bullet. I just said it was orders of magnitude better than test-after. Of course there are mistakes, but way fewer.
even if it was, the tests could have missed the secret behavior.
When you're writing tests, you're designing your system. If you want your system to do something, because it must, you write a test for it. You don't rely on secret behaviour to simply "emerge" and give you a feature. If you want a feature, you write a test for it.
So yes - there might be secret behaviours, but you can remove/change/update them, and if all of the tests pass, you're good to go.
In a legacy system, the secret behaviour that's missing might actually be critical - but you don't know it, you have no idea of knowing. If there was a test for that, you would know.
I get what you guys are saying. The system was in production for 10 years, some secret behaviour appeared that 50% of your users rely on, and you're afraid of changing the code because of that secret behaviour that no one knows about, but if you were to remove it, half of your users would scream. I get that, I've been there.
So due to that fear of breaking the secret behaviour, you don't test-drive your app, and you do blackbox/snapshot testing, because that's the only thing that you trust not to break your system.
That's terrible code to work with. It's awful. It's stressful, you feel the pressure, you can't change it much, because it's so fragile. You rename a variable and suddenly the pagination doesn't work. That sucks.
There is no way out of this, other than to properly design your system. You need to start improving it, if the system is still to be developed for years to come. To improve it, you need to know what it's supposed to be doing. You can introduce a small change and roll it out to QA or a small number of people, to verify that you didn't break anything. You can push it to another environment, to ask someone who knows the system whether the part you touched still works.
I suggest you watch Kent Beck's talk "The Forest & The Desert Are Parallel Universes": https://www.youtube.com/watch?v=dtu9Ks2CN-U
Beauty in Code 2025 was a single-track full day IT-conference organized by Living IT, featuring six amazing speakers. It was hosted at the Malmö Live conference center on March 1, 2025.
https://beautyincode.se
https://livingit.se
Session 6 of 6 by Kent Beck (@KentBeck)
"The Forest & The Desert Are Parallel Universes"
So close and yet so far....
I don't see how this squares with your advice to rewrite the class against a speculative test suite
The truth is that you need both
And you need to be pragmatic about what you do, and what order
Most of the time, it's safer to take the approach of gradually building up tests surrounding existing functionality and building up tests around desired/specified functionality
Ok, someone tell him no one meant that the blackbox tests should be kept for all eternity. I think he thinks that's what we're all saying.
(I hate how stupid blocking in discord is)
I guess I'm taking the approach that blackbox tests can be an absolute fucking nightmare and sometimes you actually want to write unit tests for existing code
Like you really need all three
Peak TDD is when you drive the design of what you're building with clean interfaces, you fundamentally make it easy and comprehensive to test. If you care about actually testing that your code does what it is supposed to and consider it mandatory then you design it in the easiest way to get there. But I don't think tests after the fact in all situations are bad. You don't need TDD to write testable code, sure it's probably leagues better if you lean into it but some stuff is obvious with what behaviors it should have and it's fine to put ones on after.
But here a user wanders in and asks us how to write a unit test for an existing class. The answer can't be to spend several developer weeks or even months building out a sophisticated test infrastructure
IMO this is precisely what "bad" testing tools like mocking are good for. They let you add tests easily and quickly, which allows you to make localized refactoring easy, which has a positive snowball effect
@molten hollow I think where your approach makes more sense is in a big team
I do not get the sense that this person is in a big team but I suppose I should've checked
it's safer to take the approach of gradually building up tests surrounding existing functionality and building up tests around desired/specified functionality
"Safer" as in less chance of breaking secret behaviour? Yes.
"Safer" as in it lets you safely change the code, introduce new feature, fix bugs, refactor? No.
But I don't think tests after the fact in all situations are bad.
To me, writing tests after the fact has all the disadvantages, and no advantages. I'm sure, that if I jumped into your project, all tests I would've written would've been test first. There are ways to do that, that you can learn, and there are obstacles to that, but they can be dealt with.
You don't need TDD to write testable code
That's right, but if you rely on your judgment to create a testable code, that's just an untried guess. Sometimes it'll work, sometimes won't. And you end up with untested code, in some proportion.
I guess I'm taking the approach that blackbox tests can be an absolute fucking nightmare and sometimes you actually want to write unit tests for existing code
Sure, you might want. The question is - what do you hope to achieve by that?
But here a user wanders in and asks us how to write a unit test for an existing class. The answer can't be to spend several developer weeks or even months building out a sophisticated test infrastructure
I never said that. You can do that in a couple of minutes.
And you need to be pragmatic about what you do, and what order
I see how you call yourself "pragmatic" and me "idealistic", but maybe we can leave these unhelpful words? What you call "idealistic" is to me a day-to-day job that I have been doing for many years now. What you call "pragmatic" feels to me like being in the worst possible situation, one that I would want to improve quickly if I found myself in it. So, how about we keep it civil. If you feel my advice doesn't cover some case, please bring it up in a peaceful manner, and we can talk about it.
I work by myself and I use TDD if what I'm doing isn't exploratory. Either you write and use and expect other people to use your code or application on the trust me bro model, or you add tests to your code to verify it does what you want, stuff can be as simple as a lambda. Design to make it testable, you don't even gotta write a test to do this either, even just "when in doubt write testable code" does wonders. The code isn't done unless there's tests, at least isolate the important pieces and do those, and in that case why not write them first.
@molten hollow "You can do that in a couple of minutes." The discussion will be more helpful if you acknowledge that it might take more than a couple of minutes. You are stating things in very stark terms.
Why would writing a test for a new class take more than that?
If your code is coupled to some framework, if it's nondeterministic, if it's got a lot of dependencies, if it's badly designed - yes. But these are all code smells. If you stop thinking about "how can I test this already existing class", and think of it in terms of "I need a class that does X", then it's very simple, and very doable in a couple of minutes.
writing the new class and ensuring it still does what it needs to do will take more than a couple of minutes.
Never happens to me. Please, give me an example.
i need to get a job where you work
I gotta be honest - if it took me hours to create a test, I would be hellishly tired and would probably stop doing that. But it's quick and easy, provided you don't slow yourself down with code smells.
If the class is supposed to produce some sort of json file for example and there's a lot of stuff to make sure is right maybe where it's not worth bringing in the repository or other data access abstraction pattern.
That's speaking in terms of implementation details. Tell me what the class needs to do, what's the expected behaviour?
I'm just trying to imagine because it's been a while. I don't know tbh. Maybe someone else should provide an example.
So if your class is coupled to a framework (like using Spring annotations, Laravel classes, Ruby on Rails stuff), has a lot of dependencies, a lot of static/global state, is coupled to its inputs and outputs, and is reliant on implementation details, then obviously this class is not testable and it would take hours to test. That's exactly the reason why working with it is hard, even if you introduce blackbox testing to it.
And that's exactly why I'm suggesting you should create a new test, drive the responsibility of the class from the test, and then use it in place where the original class was used.
ah, to be young and naive again
Do you have any success stories of doing this? I don't ask to doubt your experience. I more wonder if there are certain situations where this approach does work, which is useful for those of us who uniformly recommend against it.
Many programmers spanning decades have tried to do this many times and failed, which is where the advice you hear comes from. I personally have tried it and it has only ever ended up in me working through the night super stressed out when I could've been sleeping or having fun, and/or having sheepish 1:1s explaining that I badly underestimated the work.
So there's a mismatch between your recommendation and the recommendations of people who feel that they have learned the hard way not to do what you recommend. Maybe that means you have a different and unique perspective.
Sure!
Do you have any success stories of doing this? I don't ask to doubt your experience. I more wonder if there are certain situations where this approach does work, which is useful for those of us who uniformly recommend against it.
I mean, I managed to do it in every project I joined. I do stumble upon everything you guys describe, big classes, no tests, secret behaviours, all that. What you experience, is real. But I try to address the issues and deal with them. I tried multiple things, and what I suggest here was just the stuff that works for me. I tried blackbox/sandbox testing, and it didn't do it for me.
Many programmers spanning decades have tried to do this many times and failed, which is where the advice you hear comes from. I personally have tried it and it has only ever ended up in me working through the night super stressed out when I could've been sleeping or having fun, and/or having sheepish 1:1s explaining that I badly underestimated the work.
That's definitely true, and that's a real problem. However, I found that it's not intrinsic; it's not like we're bound to suffer. Most problems like that come from very simple things that we can change. Stuff like:
- we believe people must sign off deploys
- we believe we must deploy to all of the people at once
- we don't trust our developers and testers
- we can't work in pairs because it slows us down too much
- we should optimise for time spent coding, not talking to people
- tests are not part of a releasable, so they're not important
- my manager didn't ask me to refactor, so I can't do that.
- we must create the whole feature at once in a sprint, we can't split it in chunks
These aren't the only ones; there are more. There are assumptions that people hold that sometimes stop them from working in a productive way. The only way for me to address them would be to find them somehow, either by working with your code or by talking to you.
Maybe that means you have a different and unique perspective.
It's definitely not unique, I've met many people who do the same thing. Did you try reading "Working Effectively with Legacy Code" by Michael Feathers?
Argumentum ad hominem, again.
When I said before that "you don't know what your software is supposed to be doing", I'm prepared to accept that may have been a bit rough; people might feel personally attacked. But I didn't mean to attack anyone, that was supposed to be a diagnostic observation. Programmers not being fully aware of what the code is supposed to be doing is a real problem that needs addressing. I just stated it, to put myself in a place where I can deal with the issue somehow. If I were to find myself in a project where I don't know what it's doing, then that's the first, second and third thing I would need to fix. Testing would come later.
Test-first is useful precisely because you can't really do it if you don't know what your software ought to do. And I saw that when I suggested that, I got pushback - because some programmers actually didn't know that. So the thing now should be - not to skip test-first, but to learn what the system is supposed to be doing.
fwiw, i wasn't pushing back on test-first. maybe you mean someone else was.
I had a feeling I'm talking to 3 different people, and randomly one of them answers my posts.
Bro, why was I temporarily muted?
I didn't see any of your messages in past 2 days, if that's what you're asking.
How are you supposed to unit test when you haven't actually implemented the function first?
Property based testing as well
You can define the function to return None, then write the tests until they all pass.
PBT is a method to find edge cases in the logic that you then add to the tests. Mutation testing finds what behavior the code has that isn't tested.
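To make the property-based idea above a bit more concrete, here is a minimal hand-rolled sketch using only the standard library. It is not how the hypothesis library works (hypothesis adds input strategies and shrinking of failing cases); `rle_encode`, `rle_decode` and `check_roundtrip` are hypothetical names made up for illustration. The property checked is "decoding an encoded string gives back the original string", tried on many random inputs.

```python
import random


def rle_encode(text):
    """Run-length encode a string into (character, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(ch, count) for ch, count in runs]


def rle_decode(pairs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in pairs)


def check_roundtrip(trials=200, seed=0):
    """Property: decode(encode(text)) == text for randomly generated text."""
    rng = random.Random(seed)  # fixed seed so a failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, 12)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert rle_decode(rle_encode(text)) == text, text
```

Any input that breaks the property would then be pinned down as an ordinary example-based test, which matches the "find edge cases, then add them to the tests" workflow described above.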
greetings, I have a bin where you can see 1 fixture and 1 test function which fails to assert due to MagicMock being compared to a string
following a pdb.set_trace, I wasn't able to return a string from the Mock object, in order to pass the test
i have exhausted all of the internets, leaving this channel for the last resort. If you google anything, I have tried it
even used GPT
Have you tried to work around the issue by avoiding mocking in the first place?
it's a requirement...
i don't know how to mock a NamedTemporaryFile
otherwise, i am doing it by the book
What? Is this school work or something?
work
just landed this job and i have little experience with unit testing, otherwise am solid with python overall
Sometimes for a thing like a temp file you want a "fake", not a "mock"; they can be easier, worth looking up at least.
Why would work fight you on trying to.. work...?
Is the requirement that you can't modify the code to test until after you've tested it? Under no circumstances ever?
(Redefining the problem is what makes a good programmer imo)
Are you sure you're not confusing mocking with mandatory unit testing? Mocking doesn't prove that your code works, and it doesn't necessarily achieve what you want in a meaningful way. In this case it looks like you want to check that a file is loaded correctly from somewhere; for that it would be better to take a sample file or produce one (whichever is more convenient) and check that the function gives the end result you want.
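A minimal sketch of the "produce a sample file" suggestion, assuming the code under test accepts a path. `count_lines` is a hypothetical stand-in, since the real function from the paste isn't shown in this channel. The test writes a real file into a temporary directory instead of mocking anything.

```python
import os
import tempfile


def count_lines(path):
    """Hypothetical function under test: count the lines in a text file."""
    with open(path) as handle:
        return sum(1 for _ in handle)


def test_count_lines_on_sample_file():
    # A real file in a throwaway directory; cleaned up when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        sample = os.path.join(workdir, "sample.txt")
        with open(sample, "w") as handle:
            handle.write("first\nsecond\nthird\n")
        assert count_lines(sample) == 3
```

Because the file is real, the assertion checks the end result rather than which calls were made on a mock.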
Does pytest monkey patch make any guarantees about what __enter__ returns?
__enter__ of what, monkeypatch.context()?
Whether test-entering the mock object in the tested function there would return the mock object itself.
The two different context managers make things a little confusing
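If mocking really is required, a sketch of one way to do it with unittest.mock, assuming the tested code uses `tempfile.NamedTemporaryFile` in a `with` block. `write_report` is a hypothetical stand-in for the function in the paste. The detail that tends to bite: with a MagicMock, `__enter__` returns a separate child mock by default, not the mock itself, so the string has to be configured on `return_value.__enter__.return_value`.

```python
import tempfile
from unittest.mock import MagicMock, patch


def write_report(text):
    """Hypothetical function under test: write text to a temp file, return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(text)
        return handle.name


def test_write_report_returns_path_from_mocked_tempfile():
    with patch("tempfile.NamedTemporaryFile") as fake_ntf:
        # The object bound by "as handle" is what __enter__ returns.
        handle = fake_ntf.return_value.__enter__.return_value
        handle.name = "/tmp/report.txt"  # a real string, so == comparisons work

        assert write_report("hello") == "/tmp/report.txt"
        handle.write.assert_called_once_with("hello")
```

Setting `.name` on `fake_ntf.return_value` instead would leave `handle.name` as a MagicMock, which is the "MagicMock compared to a string" failure described above.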
It's actually not that hard. Because what do you actually need to test a function? You need to know its name, its signature and arguments, and you need to know what its purpose is. So for example, I can imagine a function that parses roman numerals. I don't have the function written yet, but I can write the first test like so:
def test_parse_roman_numerals():
    assert parse_roman('I') == 1
    assert parse_roman('II') == 2
    assert parse_roman('III') == 3
    assert parse_roman('IV') == 4
Having that, I'm free to implement it however I want. You don't need to know the function implementation to write a test for it.
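For illustration, one possible implementation that would make that test pass. This is only a sketch of one way to do it; the test above doesn't dictate it. It assumes well-formed input and does no validation.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_roman(text):
    """Convert a well-formed roman numeral such as 'IV' to an integer."""
    total = 0
    for index, symbol in enumerate(text):
        value = ROMAN_VALUES[symbol]
        next_value = ROMAN_VALUES[text[index + 1]] if index + 1 < len(text) else 0
        # A smaller symbol in front of a larger one is subtracted (IV = 5 - 1).
        if value < next_value:
            total -= value
        else:
            total += value
    return total
```

Starting from a stub that returns None and growing it until the assertions pass, as suggested earlier, would end up somewhere similar.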
These pure "mathematical" functions don't occur that often in real-life code though.
That's right, but there are easy ways to test-drive those too. If you give me an example of what's hard to write test-first, I can show you how I would test-drive it.
There are things that are hard to test of course: UIs, concurrency, distributed systems, 3rd party systems. But there are tricks to side-step it, so that most of your code can be test-driven.
I wanted to help you, but I fail to understand what the code actually needs to do?
Test function for sure and then try try try
Ah, so unit testing mainly relies on planning and defining the expected results beforehand, as I understood
Not really. You don't need to plan ahead. It's just about specifying the expected outcome.
defining the expected results beforehand that's correct. But planning part, not really.
Well, I'm kinda confused. I'm currently working as an apprentice in a software development company, and they really emphasize detailed product planning
Beforehand? Before what?
Black box testing is very much after the fact.
If the product planning is to do with production, then that's okay. If that's to do with software development, then that's a huge mistake.
You don't know the context. Context is everything.
For example, at JPL, your statement would be the huge mistake.
I meant beforehand as in defining the expected behavior before running the code, not before writing it.
If you write the tests before you write the function it's TDD. If you write them after the function exists by looking at the code it's white box. If you write it without understanding the function it's black box. If you use hypothesis it's Property Based Testing. If you use mutmut it's Mutation Testing.
All of it is "testing" or (less academically correct, but commonly used) "unit testing".
@proud nebula I'm sorry, but that message is quite misleading for new developers.
If you write the tests before you write the function it's TDD
That's necessary for TDD, but not sufficient. Not without other prerequisites.
If you write it without understanding the function it's black box.
Maybe that's incorrect wording, but I think you mean "without knowing the implementation details"? Because if you truly meant "without understanding the function", then you have no business testing the function if you don't understand it.
If you use hypothesis it's Property Based Testing.
~~You can use hypothesis in all kinds of testing, not just property based testing, what's the deal? ~~
PS: the author didn't mean "hypothesis", he meant "hypothesis library".
If you use mutmut it's Mutation Testing.
All of it is "testing" or (less academically correct, but commonly used) "unit testing".
Mutation testing isn't really testing per se. It's a tool to find holes in your test suite. You can't really find bugs with mutation testing; you can only find mutants (live mutations) that weren't caught by the test suite, and a surviving mutant isn't a bug. So it's more of a test-suite quality control than a testing strategy. You can't really catch a regression with mutation testing, and you can't drive the implementation (like TDD) with it. What you can do with mutation testing is improve the reliability of your test suite.
If you write them after the function exists by looking at the code it's white box. If you write it without understanding the function it's black box.
That separation is very artificial. I've never worked in a team that used that distinction. In a proper system, you would never need to couple your tests to the implementation of the method, so what's the point of this "white-box test"? "White-box test" and "black-box test" may be cool-sounding names, but what do they really bring to the table?
I find most distinctions (especially within the testing world) are overdone.
@river pilot Some definitely are. Do you have some examples?
i think your point about mutation testing is overly picky. testing your tests is still a kind of testing.
I think it's a misused word, that leads you to believe it's testing because it has "testing" in the name: "Mutation Testing".
isn't it testing your tests?
I wouldn't say so.
ok, we can agree to disagree
I would call it "auditing your tests", "reviewing your tests", "inspecting your tests" at best.
is load testing a kind of testing?
"Testing" to me means finding bugs.
I would say so, because if it finds a problem with the application, that means the application doesn't do something it's supposed to.
And so security testing, performance testing, etc.
But I don't think mutation testing qualifies.
mutation testing finds a problem with the tests: the tests aren't doing something they are supposed to do.
Same as checkstyle, linters, code quality checks, sanity checks: I wouldn't call any of them testing.
BUT: what value is added by saying "mutation testing isn't testing"?
how does that help anyone?
Yes, but the tests aren't part of the app.
this is what i mean about overdone distinctions.
Overdone distinctions are bad, but blurring distinct concepts into one also isn't helpful.
what would you call mutation testing instead?
You might as well call "code reviews" testing, because they can help you find problems, but that's not testing.
I think most precise word would be, "inspecting the quality of tests".
the more mutants are alive, the weaker the test suite.
But it doesn't necessarily mean that the app has problems.
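A tiny hand-rolled sketch of what a "live mutant" means. This is not how mutmut or any other tool is implemented; all names are hypothetical, and the mutant is written out by hand where a tool would generate it. The mutant flips `>=` to `>`. A suite that never checks the boundary lets it survive; adding the boundary case kills it.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: '>=' changed to '>'."""
    return age > 18


def weak_suite(func):
    """Passes for both versions: the boundary value 18 is never checked."""
    return func(30) is True and func(5) is False


def strong_suite(func):
    """Adds the boundary case, which only the original satisfies."""
    return weak_suite(func) and func(18) is True


def mutant_survives(suite):
    """A mutant is 'alive' if the suite passes on the original and on the mutant."""
    return suite(is_adult) and suite(is_adult_mutant)
```

Here `mutant_survives(weak_suite)` is true and `mutant_survives(strong_suite)` is false. In both cases `is_adult` itself is correct, which is the point made above: a live mutant says something about the suite, not necessarily about the app.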
"inspecting the quality of tests" doesn't quite roll off the tongue
True. Doesn't make it wrong, tho.
ok, we have different approaches to all of this I think
Notice, that if load test/security test/performance test fails, then that necessarily means there's a problem that requires fixing.
With mutation testing, that's not the case.
I mean, "testing" is just a category humans impose on practices.
you might add and remove elements from that category, if you'd like . the question is whether or not that's useful.
@river pilot If you want to say that mutation testing is testing, then things like:
- coverage
- linter/checkstyle
- code review
- cyclomatic complexity
would also need to be added to the "testing" category.
This is something I've been thinking about more and more: https://hachyderm.io/@nedbat/115245272539560254
I liked this ladder of understanding the purpose of testing:
Level 0: Testing is debugging
Level 1: Testing is to show the program works
Level 2: Testing is to show the program doesn't work
Level 3: Testing is to reduce the risk of using the program
Level 4: Testing is a mental discipline that helps us make better software
Okay, that makes sense to me. But these 5 items, I would call by different name. I would just say it's software development.
Level 0: Software development is debugging
Level 1: Software development is to show the program works
Level 2: Software development is to show the program doesn't work
Level 3: Software development is to reduce the risk of using the program
Level 4: Software development is a mental discipline that helps us make better software
By that definition "coding == testing".
I mean, that's not exactly wrong. If you're using TDD, then basically testing is coding, in a sense
so I guess that's all right.
I think it does because it's the only way to find out if your application meets your capacity requirements (or at least continues to meet them)
In my definition, testing is a falsification mechanism. If you can use something to falsify the claim that the app/program works as it's supposed to, then that's a test.
If something can give you the result "fix immediately", then that's a test.
If it gives you "fix maybe", then that's an audit/inspection, something like that.
i don't exclude tests from "my program", maybe that's the difference here
i encourage people to include their tests in the total coverage percentage, for example.
For the same reason I wouldn't say that SEO audits for example are testing.
@river pilot I got it!
I would say that Mutation Testing would count as measurement.
That I would agree with.
But not every measurement is testing.
Regarding your "doesn't roll off the tongue": "measurement" sounds good.
I'm fine with any kind of measurement giving intermediate results, and whatnot.
But for it to count as test, it would need to give a definitive response.
you also want the criteria to include what it gives a response about, I think
Basically, I would hate for a junior person to come to #unit-testing , and read "mutation testing is testing", and think that he can use that to do regression test for example - that would mislead him.
definitely these topics are intricate and subtle enough to need discussion
Definitely so, if separate concepts are being blurred into one, because someone feels like they're overdone distinctions.
no need to point fingers
What I like about testing is that it's not open for interpretation. If an acceptance test, unit test, security test, load test, performance test or integration test fails, that must mean there's something wrong. You can't argue with it.
But with mutation testing, seo audits, checkstyles and stuff like that, it's up to the reader to interpret it.
and with the advent of AI... I'm finding that TDD is much more enjoyable than before. I write tests and let AI write the code to pass my tests.
i often find that failing tests require my interpretation to understand the failure and decide what it means and what to do about it.
I never found that. I write tests tdd-style, so I decide what the test means at the moment of writing, before any implementation.
When the test fails in the future, it's already determined what it means.
That kind of interpretation would be required with tests-written-after-the-fact I think, and in that case I think it's weaker, exactly because of that interpretation.
this is a repeat of a few days ago: your project seems very different than the ones I have worked on.
I think projects may have been similar. What differs is the approach I think.
i'm not interested in you telling me i've been doing it wrong.
I mean, if you were to choose between:
- a test that, if it passes, gives you confidence that everything works, and when it fails, points you exactly to where the issue is
vs.
- a test that you must read and interpret, and different people might disagree about what the failure means
Which test would be better? Which more useful? Which would make developers work faster and better?
I'm not big on criticizing people
I might criticize ideas, concepts, etc. but with people it's much more complicated.
i work on coverage.py. Its test suite checks that Python code is being measured properly by coverage.py. Python changes from version to version. tests fail. Is it coverage at fault, or Python?
Python version change isn't forced on you or on the project, right?
When you work on that coverage.py, you need to manually add the new version?
coverage.py's goal is to properly measure the next version of Python.
there are often changes in Python that need adaptations in coverage.py
So your goal is to be very up to date with python, but it's not like your project is immediately compatible with python change.
Unless python was dependent on your project, that's not happening.
You probably version the python version you run your coverage.py, right?
So you set it to be compatible with 3.14 let's say for now.
it supports 3.15 now, and runs nightly against the tip of main of CPython
So it relies on something outside of your control, then?
Well, then I would handle it the same way as any 3rd party.
Like payment providers, etc.
The same way I treat stuff like stripe, oAuth login, any kind of integration with 3rd party.
i don't know what way that is.
Let's say, when a new python 3.16 comes and there is a number of ways it's incompatible with your coverage.py;
there is minimum time you need to update your coverage.py, so it's compatible again. Let's say that's 24 hours.
For that 24 hours, your goal is not met.
You might want to get that number down, to maybe 12-hours or something, but still. You don't control when python is released, and they don't depend on you, so you can only retroactively react to the changes.
So when python introduces an incompatible change, it's neither python failure nor your failure.
They're just incompatible.
i don't understand why you are talking about 24 hours, and you haven't talked about how the test failures need interpretation.
Because there's no right way to say "who's at fault".
Python is definitely not at fault, they just released an update.
yup. it needs interpretation. that's what I said.
Your project is not at fault, because it doesn't control the things it relies on.
Python is often at fault, that's why i test pre-alphas.
Even if it is, there's nothing you can do about it, can you?
Unless you're also a maintainer/contributor of python that can freely update it.
I can ask them to fix it: https://github.com/python/cpython/issues?q=is%3Aissue state%3Aopen author%3Anedbat
Sure, but you don't have control at whether they'll merge it, right? That's what I'm talking about.
those issues are mostly, "this is what i see, whose fault is it?"
It's not like you can merge it yourself.
I don't get what's being discussed? Are we talking about automated testing for compatibility with the latest python versions nightly or something?
yes.
@molten hollow wherever this is going: do you see how a test failure requires interpretation?
Yes. That "interpretation" is exactly why I wouldn't call anything you're doing testing.
I see how what you're doing requires interpretation, yes.
But I don't think what you're doing is testing.
That's just development of your coverage.py project.
so now failures in my test suite aren't "testing"? This is getting absurd.
What you're doing is developing your project.
What kind of test failure is it and why does it require interpretation? If it fails and isn't compatible with the latest version, isn't that a concrete test that says "we're invalid" or something, if that's our goals. Or is it like, it could break externally for some arbitrary reason, and it's flaky so it's not really a test?
And the part, where you solve compatibility issues with python, I wouldn't call that testing. That's integrating with a new version.
It's the same thing as if one of the libraries in my application gets an update, and I want to update it.
And let's say it's got a breaking change that I need to integrate into my app. That's not testing, that's just an upgrade.
you started by saying that test failures shouldn't require interpretation. I showed you test failures that do. Now you say that isn't testing. I think we are done.
Your example is integrating your project with a newer version of Python. And that definitely requires interpretation, yes!
I think what he's trying to say is that because it can't kill something, or falsify or like impact the release of anything it's not really a test???
I'm just saying a test isn't open for interpretation.
Like, if it fails.. then everyone involved will agree that it fails.
i agree my test has failed. now i need to determine why and what to do about it.
Point taken. if it fails, then everyone involved will agree that it fails and why.
feels like some sort of exploratory test, a test still. Are we compatible with the latest python? Fail = no. we've got our result, what do we do now?
I think what you're doing is conceptually the same as upgrading a library in my application. Isn't it?
this is my only point: you said test failures shouldn't require interpretation. Sometimes they do.
But how is what you're doing testing?
i write test_foo(), I ran it with pytest. it failed. What can it possibly mean to say it isn't testing?
this is a meaningless distinction.
Just because you can run it in a testing library doesn't necessarily mean it's a test.
this is absurd. i'm done.
@river pilot
def test_foo():
    print('Hello')
Is this a test in your opinion?
You can take any code and put it in a testing library. Does this mean any code is a test?
I can take any hello world app, any function, and wrap it in a pytest test. Does this mean it's now a test?
i hope you can assume that my tests are not like that.
I don't know what they're like, but when you tell me they're open for interpretation, then I'm prepared to say that they're not really tests.
A test should be definite, deterministic and not open for interpretation.
I can agree that your pytest "tests" are checking your integration with Python, I'm fine with that.
I feel like we're being loose with "open for interpretation" in the example.
But given that you have control over your coverage.py, and not over python; then it's essentially an app + 3rd party integration.
Let's say I'm creating a webapp, that needs to allow the user to pay for services, and we use stripe to do that. Of course, stripe may be down, and in that case the website displays information "sorry, stripe is down".
Is this function "sorry, stripe is down" a test? No, it's not, it's just information for the user that the service is unavailable. Yes, it tells you something that you can use to do something, but it's not a test.
Same as your pytest things. Python becomes incompatible with your app, you have something that measures it and lets you know about that, but it's not a test.
I think what you're doing, are measurements. And they can be open for interpretation.
PS: For them to become tests, you would need to narrow them down to true/false result with exact reason for failure. If they continue to be open for interpretations, then I'm afraid they're still measurements and not tests.
thank you for demonstrating my point.
You're now arguing against the common use of established terms.
You are also arguing that you know better what mutation testing is than the author of the most commonly used mutation testing tool for python.
You are extremely arrogant, and refuse to listen, and when you are corrected you argue minor semantic details that are themselves irrelevant until the other party gives up in frustration.
You haven't won any argument here. You have just demonstrated that you are impossible to have a meaningful discussion with, and that you will make every effort to not lose face instead of trying to learn. You have also demonstrated that you are willing to say absolutely idiotic things like "You can use hypothesis in all kinds of testing, not just property based testing, what's the deal? ". https://hypothesis.readthedocs.io/en/latest/ "Hypothesis is the property-based testing library for Python".
What will you argue next? That "python" isn't really a programming language?
At this point you are damaging this channel by your presence.
You are also arguing that you know better what mutation testing is than the author of the most commonly used mutation testing tool for python.
If you're talking about the author of mutmut, he created the tool, but not the practice. Mutation testing was coined by Richard Lipton in the early 1970s. Many tools were created for it later, only one of which is mutmut.
You're now arguing against the common use of established terms.
From my perspective, that's what you're doing.
You have just demonstrated that you are impossible to have a meaningful discussion with, and that you will make every effort to not lose face instead of trying to learn. You have also demonstrated that you are willing to say absolutely idiotic things like "You can use hypothesis in all kinds of testing, not just property based testing, what's the deal? ". https://hypothesis.readthedocs.io/en/latest/ "Hypothesis is the property-based testing library for Python".
Sorry, I didn't realise "hypothesis" is the name of the library. I thought you used it as a regular English word. I understand you meant "If you use the hypothesis library, then it's property based testing"?
You haven't won any argument here.
I'm not here to win arguments.
What will you argue next? That "python" isn't really a programming language?
Straw man fallacy
You're now arguing against the common use of established terms.
Fundamental attribution error.
You are extremely arrogant, and refuse to listen,
Argumentum ad hominem.
"He". The word you should have used is "you". And obviously I know that. I'm him ๐คฃ I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.
hypothesis is a lib
Ok, but maybe the fact that the grammar doesn't make sense if you used the word in the normal sense should have made you confused enough to ask a question instead of being arrogant?
Also it's like the only PBT lib for python so if you had tried PBT at all you should know about it. Again: you have obviously read a lot of theory, and have much less practical understanding and experience.
"He". The word you should have used is "you". And obviously I know that. I'm him ๐คฃ I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.
I've actually practiced mutation testing every week for the past couple of years, and everything I say in this channel is backed by practice.
I can agree that you found a bug while using mutation testing, but I doubt it was actually found by mutation testing. Please notice: mutation testing works by having a test suite, then you introduce a change in the software, and then you run the tests again. What mutation testing gives you is validation of your test suite. I can agree that while doing that, you stumbled upon a bug and fixed it. That works, but that's not due to mutation testing being used. That's due to having a test suite.
"He". The word you should have used is "you". And obviously I know that. I'm him ๐คฃ I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.
Good for you, but just because you created a library that can be used to exercise this idea, doesn't really give you authority about its merit.
Ok, but maybe the fact that the grammar doesn't make sense if you used the word in the normal sense should have made you confused enough to ask a question instead of being arrogant?
I'm sorry, but most of the things you mention in this channel are... calling for my concern.
You clearly have a bone to pick with me. I think so because you're using personal arguments all the time, instead of sticking to the subject matter. I don't have a problem with you as a person, but I don't agree with part of the things you say. I'm capable of having a reasonable debate, but not if someone uses argumentum ad hominem.
You've made Ned visibly frustrated. That is extremely rare. You don't know his personality so you don't know what a red flag that is.
Why would that mean I'm wrong? I'm just gonna ignore any arguments that aren't on the merits from now on.
You ignore everything heh. It feels very Jordan Peterson to talk to you.
I didn't follow all that, is the assertion that bad tests can be written, therefore testing is not inherently valuable?
I believe the assertion that was made is that certain types of tests don't count as "tests" if you have to look at the result to determine what to do, if they're not definite. In this case a test that runs nightly to check if pytest coverage is compatible with the latest python version or anything else that is not so definite in meaning.
Oh I see, yeah. To me that is a "linter" check, not an in-codebase unit test
In the sense that it's pure validation that doesn't really inform the shape of your codebase
I'd have argued it's more like an exploratory test that is automated: send someone to go check if we're compatible with the latest external thing, and if we're not, go update our thing, do nothing, or go contact them to fix it. It's just that the exploration and getting that result is now automated. But I wasn't really in this argument so idk.
i had wanted to stay out of this to collect thoughts and let the heat die down, but: the test we're talking about checks if coverage.py produces correct results on Python 3.15 (let's say). I don't see how that's a linter.
Oh yes, in that context it's actually a domain concern, my above take does not apply here.
I thought it was just a belt and suspenders thing
But no, in that case it's exactly what your tests are for
For RubySpec we added a bunch of "guard" support so you could make tests not run on implementations that didn't support that etc
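A rough Python analogue of that kind of guard, sketched with the standard library's `unittest.skipIf`. This only illustrates the idea of not running a test on interpreters that lack a feature; it isn't meant to mirror how RubySpec guards work. The `(99, 0)` version bound and the test names are hypothetical.

```python
import io
import sys
import unittest


class VersionGuardedTests(unittest.TestCase):
    def test_runs_everywhere(self):
        self.assertEqual(int("42"), 42)

    # Hypothetical guard: pretend this behaviour only exists on a future Python.
    @unittest.skipIf(sys.version_info < (99, 0), "needs a hypothetical future Python")
    def test_needs_future_python(self):
        self.fail("not expected to run on any current interpreter")


def run_guarded_tests():
    """Run the suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionGuardedTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

The guarded test is reported as skipped rather than failed, so the run as a whole still counts as successful on interpreters that don't qualify.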
But is it a check before you release the new version, if it fails you block it from releasing, or is it just a daily test on the latest version you don't officially have supported yet?
If you don't support 3.16 yet I don't see why it would be tested in master branch CI
That should be on the 3.16 support branch
I have a GitHub action that runs my test suite on the tip of main in the CPython repo, to get quick feedback about changes to Python.
Hmm, ok, and is it set up to block a PR merge for example?
no, it doesn't block PR merges. It's there to give me early warnings about changes to Python and a chance to discuss the change with CPython devs before it's cast in stone.
Seems cool to me, yeah. And they thought that was not a "test"? To me it just sounds like an integration test or functional test, not a unit one, but certainly a test?
i agree.
I wouldn't actually be surprised to learn that Google has suites that take a probabilistic approach to deciding when to fail the whole "run", given their scale
In my view an integration test tests your compatibility against either a network-level mock of a system or a deployable/runnable thing that you want to fail the build if you're not compatible with, probably more specifically the actual thing.
I don't feel integration tests have any necessary thing to do with networks, I've written plenty that test CLI tool interactions etc
Fair point. It could involve integration with other applications or mocks of them at the command line or anywhere else they communicate for real.
in my view (the view that started the whole discussion), categories of tests are talked about as hard-edged things, but they are often quite squishy. I'm fine calling these integration tests, or compatibility tests. You can also look at them as functional tests. But they are definitely some kind of test.
I personally agree with that. It very much strikes me as some type of test just outside of the normal stages of a pipeline.
people love to categorize things. It's useful sometimes to step back and ask, why are we categorizing them? How will the categories help us understand? Maybe we don't need to categorize as much as we do, or maybe we need different kinds of categories.
in this case, it's the exact same tests as the ones that run on every PR, but using different builds of Python nightly.
I personally live by three, and with looser, more pragmatic definitions. If I write a function that is simple enough in scope to just write something to a file or put something in a directory, is it a unit test if I bring in the filesystem? It's not just in memory, so some would argue no, but pragmatically I consider it a unit test because the filesystem is always there. An integration test for me has several meanings because of what other people refer to it as, like an end-to-end test, a functional test, etc. And then there are acceptance tests, aka functional. But I do think categories help when pragmatism is applied, and I pretty much just translate/infer what people mean when they say one or the other. And then there's testing ideas out manually, or manual testing and feedback, though it doesn't fail anything
this does sound like a practical approach, but even here: i'm interested to know, what does it matter whether it's a unit test or an integration test? When does that question come up day-to-day? What do you do with that information?
and I'm not trying to say, don't categorize. I'm trying to explore why we do it.
To me it's always about coverage (real coverage, not C0)
what is C0?
The coverage you get from 'code coverage' tools, where it only knows which lines executed, not which actual semantics took place.
C1 is per-statement, C2 is like per-side-effect or something? I can't remember the exact hierarchy
I don't know of a tool that does lines but not statements?
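A small sketch of why "which lines executed" can overstate what was exercised, using `sys.settrace` from the standard library. It is a toy tracer for illustration only, not how coverage.py or any other tool measures things; `pick` and `executed_lines` are hypothetical names.

```python
import sys


def pick(flag):
    result = "no"
    if flag:
        result = "yes"
    return result


def executed_lines(func, *args):
    """Return the line offsets (relative to the 'def' line) run by func(*args)."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function being measured.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return seen
```

A single call with `flag=True` touches every line of `pick`, so a line-only report would show it as fully covered, even though the path where the `if` is false was never taken. Calling it with `False` executes one line fewer.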
I think I do it to set the context of what I'm doing. If I say integration test, it indicates in my mind that we're testing compatibility with some sort of real software, maybe bringing in testcontainers for a database or other application. The lines can get blurred, but I always think about it because it sets the scene. Acceptance test comes up because those prove that my application works, and I can stretch the definition and use synonyms like end-to-end test, or even integration test if that's the goal someone is going for, say integration from the client to the server to the database, and so on. If I write code that is supposed to get today's weather, I'd unit test the code that uses it with a mock; "unit" makes me think of mocks of that sort. That proves that piece works, but it doesn't tell me that getting the weather from weather.com as we know it will work, so I'd have to take my actual implementation and see if it works against the API or website data as we know it: that's testing integration. If we say acceptance, I'm thinking about how we test the application as a whole. In my mind it comes up every day for these three.
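A sketch of the unit-test half of that weather example, assuming the code that talks to the weather service can be passed in as a callable. `describe_weather` and its thresholds are hypothetical. The mock stands in for the real fetch, so this shows the logic works but says nothing about the real API, which is the job of the integration test described above.

```python
from unittest.mock import Mock


def describe_weather(fetch_temperature):
    """Turn a temperature in Celsius into a short description.

    fetch_temperature is any zero-argument callable; in production it would
    wrap the real call to the weather service (not shown here).
    """
    celsius = fetch_temperature()
    if celsius < 0:
        return "freezing"
    if celsius < 20:
        return "mild"
    return "warm"


def test_describe_weather_with_mocked_fetch():
    fetch = Mock(return_value=25)
    assert describe_weather(fetch) == "warm"
    fetch.assert_called_once_with()  # the service was asked exactly once
```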
this seems useful to me: the category helps set the goal and the approach.
It doesn't seem to me that everything can be categorized (at this point in time as we know it) and is black and white. In the scenario that you're not compatible with python 3.13 and it should fail the build, it's basically an integration test in purpose but running unit tests, and when it comes to writing it I can see how categorization might not really be helpful.
What is mutation coverage in that hierarchy? Is lower numbers more or less coverage? :)
i really appreciate your pragmatic approach to this.
Haha great question, I guess that's like fractal dimensions between the regular tiers?
100% mutation coverage means all behavior of the code is tested.
Yeah, you mean mutation testing where every non-keyword gets eventually torqued? I agree
But I've also never seen a full passing suite like that without major exclusions
Like, this is conceptually true but you can't run every possible mutated program in a CI suite
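As a hand-rolled illustration of what a surviving mutant means (this is not how mutmut works internally; mutmut generates mutants automatically, and the names below are made up), consider one function, one mutated copy, and two test suites of different strength:

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    # Hand-made mutant: the comparison >= was changed to >
    return age > 18


def weak_suite(fn):
    # Only checks values far away from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Adds the boundary value, which distinguishes >= from >.
    return weak_suite(fn) and fn(18) is True


mutant_survives_weak_suite = weak_suite(is_adult_mutant)
mutant_survives_strong_suite = strong_suite(is_adult_mutant)
```

The weak suite passes for both the original and the mutant, so the mutant survives and reveals the missing boundary test. The strong suite fails for the mutant, which is what "killing" it means.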
We mutate some keywords in mutmut.
Neat
People keep asking to run mutmut in CI, which I always tell people to stop doing because it's stupid :P
It's a tool to fix your test suite, but running it in CI all the time is a huge waste of resources unless you are very careful how you do it and think about it deeply.
What do you feel the right way to integrate it into your team's workflow is?
Just as needed to audit the test suite?
Interactive use. Mutmut3 has a super nice interactive mode now too, which makes this very enjoyable compared to before.
And highly selectively where it's critical only, or you care for some other reason.
whats this channel for
I run MT on iommi sometimes out of hobby level professional pride. But that's not extremely rational use of time :P
"Everything related to testing your Python applications and libraries, and discussion of testing as a whole."
For example, I wrote this about one scenario when I used MT in code that I wanted to run in production: https://kodare.net/2021/04/04/safe_number_parsing.html
ok
Oh this looks really good, I'm gonna post this a few places
Thanks
I have a lot more good content on my blog imo. And it's all pretty short :P
Prepare for… promotion.
I'd love some. I am horrible at marketing heh.
I mean.. just look at iommi, which imo would absolutely revolutionize web development if people embraced it. And I'm not seeing very many users at all :/
That needs different marketing and docs for sure
I think the docs are pretty good. Or at least there's a lot of it :) I have considered some kind of marketing landing page but I'm no good with design.
Last time I took a look I found it a pain. I'm happy to do a brainstorm with you to make the docs better.
I would very much like that. My focus so far has been mostly on correctness and volume. But I'm hitting hard diminishing returns on that. You can't go beyond all examples working :)
I really need to figure out when and how to point people to read the Equivalence page. That is really key to making things click.
what page is that?
https://docs.iommi.rocks/equivalency.html If this thing, that __ is just a general purpose short form for nesting hasn't clicked, you're going to have a tough time with iommi.
oh, i thought we were talking about docs for mutmut.
ah. Well, those might need improvement too heh. MT in general needs more hype (than PBT :P)
https://iommi.rocks new hero page online now at least. So a bit of a marketing push I hope.
If you have any more feedback, I'd love to hear it. It's super hard to write docs when you are so deep in it...
I think I'll need to schedule that, I'm a bit stretched between too many things atm
No worries. Just curious if you had something off the top of your head.
Nice! My coverage dropped from 94.370% to 94.366%!
(I deleted code that had been covered, but was no longer needed)
How I would like test decorators to work:
@test
@test.params(a=(1, 2, 4), b=(100, 150))
@test.params(a=(8,), b=(50, 100))
def test_add_two_numbers(a, b):
    assert myadd(a, b) == a + b
And this would expand into 3*2 + 1*2 = 8 tests
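The expansion rule described here (cross-product within one decorator, addition between decorators) can be sketched without any test framework. The helper name `expand` is hypothetical and only computes the list of cases:

```python
import itertools


def expand(*groups):
    """Cross-product within each group of parameters, concatenation between groups."""
    cases = []
    for group in groups:
        names = list(group)
        # product(*values) yields one tuple per combination inside this group
        for combo in itertools.product(*group.values()):
            cases.append(dict(zip(names, combo)))
    return cases


cases = expand(
    {"a": (1, 2, 4), "b": (100, 150)},  # 3 * 2 = 6 cases
    {"a": (8,), "b": (50, 100)},        # 1 * 2 = 2 cases
)
```

With the two groups from the wish-list decorator, `cases` holds eight parameter dictionaries.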
@pytest.mark.parametrize() does this
No, it does it in a different way, using strings and, as far as I've seen, no automatic cross-product.
testdata = [
(datetime(2001, 12, 12), datetime(2001, 12, 11), timedelta(1)),
(datetime(2001, 12, 11), datetime(2001, 12, 12), timedelta(-1)),
]
@pytest.mark.parametrize("a,b,expected", testdata)
def test_timedistance_v0(a, b, expected):
    diff = a - b
    assert diff == expected
Look at how parametrize is implemented. You can implement this yourself.
That's what I have, per above
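import itertools
import pytest
def params(**kwargs):
    def deco(fn):
        return pytest.mark.parametrize(
            tuple(kwargs),  # argument names as a tuple of strings
            # unpack the values so product() crosses them instead of wrapping them
            list(itertools.product(*kwargs.values())),
        )(fn)
    return deco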
?
I'm mostly against the separation of the test parameter name and its values. Having a comma separated string is also not very convenient.
You don't have to comma-separate, you can also provide a tuple of strings
you can stack two parametrize decorators, and they cross-product.
Is that what you want though?
maybe I don't understand what you want. I thought it was cross-product.
i guess your example was not cross-product
Right, cross within one decorator, "addition" between
But I didn't know that the parametrize decorator cross produced at all, that's good to know.
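A minimal sketch of the stacking behaviour mentioned above, assuming pytest is installed. `myadd` is a stand-in for the function under test. When this file is run with pytest, the two stacked decorators produce every combination of `a` and `b`, i.e. 3 * 2 = 6 test cases.

```python
import pytest


def myadd(a, b):
    # Stand-in implementation so the example is self-contained.
    return a + b


@pytest.mark.parametrize("a", [1, 2, 4])
@pytest.mark.parametrize("b", [100, 150])
def test_add_two_numbers(a, b):
    assert myadd(a, b) == a + b
```

The decorators only attach marks; the function itself is unchanged and can still be called directly with explicit arguments.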
On another note, can someone explain hamcrest's logo?
it looks like a person surfing down a pile of ham? Which makes sense for the name, but why the name?
Any good documentation for writing tests(Integration, E2E, Unit) in a none TDD architecture?
What is "none TDD architecture"?
Wouldn't it be better to just explicitly define the tests?
What do you mean? Sweeping test parameters is a great way to get good test coverage (and avoid missing some case). If possible you can also randomly select test input (like the hypothesis package does)
@twin shale i don't know of a test decorator that works the way you showed, but it seems like it should be possible to write.
Wouldn't you get just the same coverage by having explicit tests?
you don't use parameterized tests? They are very handy for reducing repetition.
I did in the past, but I noticed that they can make your design weaker.
Imagine you have two cases, that appear similar at first - so you write them as one test, and parametrize it to "reduce duplication". But then, after working with the code a bit you discover they aren't really the same idea, so you should split it. Maybe you split them and have two tests anyway, or maybe you're lazy and leave the tests like that, but without the difference covered.
Real design is about organizing expected results, information flow, compartmentalization of the system, information hiding, separation of concern, and reduction of information. That's what gives your programs a real edge.
Just joining two test cases into one with parametrization is nothing but syntax sugar, not a very helpful one at that imo.
Sure, it might be misapplied, but it's very common to have a dozen data scenarios for the same test. I wouldn't want a dozen tests.
My take is that if you have "dozen data inputs" that really means it's just one test, and one data input might suffice.
You don't get any real benefit from including more; and if you do, that really means it's a different test case worthy of a dedicated method and a proper test name, because that's a different behaviour.
Here's an example where i've parametrized: https://github.com/nedbat/coveragepy/blob/master/tests/test_misc.py#L85-L107 . They check different behaviors of the one function, so i could have made separate tests with individual names, but the body of the test would be the same. This let me be more concise while covering all the behavior.
Rule of thumb:
- if it's one behaviour, one data input will suffice, no need for parametrization
- if it's multiple behaviours, it's better to split them into multiple tests, no need for parametrization
i guess we'll have to disagree on this.
VARS = {
"FOO": "fooey",
"BAR": "xyzzy",
}
@pytest.mark.parametrize(
"before, after",
[
("Nothing to do", "Nothing to do"),
("Dollar: $$", "Dollar: $"),
("Simple: $FOO is fooey", "Simple: fooey is fooey"),
("Braced: X${FOO}X.", "Braced: XfooeyX."),
("Missing: x${NOTHING}y is xy", "Missing: xy is xy"),
("Multiple: $$ $FOO $BAR ${FOO}", "Multiple: $ fooey xyzzy fooey"),
("Ill-formed: ${%5} ${{HI}} ${", "Ill-formed: ${%5} ${{HI}} ${"),
("Strict: ${FOO?} is there", "Strict: fooey is there"),
("Defaulted: ${WUT-missing}!", "Defaulted: missing!"),
("Defaulted empty: ${WUT-}!", "Defaulted empty: !"),
],
)
def test_substitute_variables(before: str, after: str) -> None:
    assert substitute_variables(before, VARS) == after
If I understand correctly, all of these cases are different behaviours.
yes, but it doesn't make things clearer to have ten functions with separate names but the same single-line body.
Are you sure about that?
yes
Let me show you how I would've created that method.
besides, you're missing some test cases, which I would've included.
I'd be happy to add the missing cases.
That's the point, in the current form of this test, you can't. Give me a minute, I'll show you
the error cases are handled in the next test
(well, one error case)
Parametrized case for one input. Interesting.
Well, still. Let me show you how I would've written that, and what cases are missing for me, given I would test-drive that.
Actually, the more I read those examples, the less I understand what it's actually supposed to be doing
there's a docstring on the function
For example, I read those parametrized tests, and have no idea what the ? is doing.
I should be able to understand what the function does from the tests. Doc strings can lie, if you forget to update them.
what would i put in the test to make it clear? A comment?
I'm glad to show you an example, but first I need to understand what your function does.
So, if I understand correctly, if the string doesn't have a placeholder, it's returned as is, correct?
yes
Also, if there's a superfluous variable in the dictionary, that's supposed to be ignored or throw error for missing placeholder?
ignored
okay,
the placeholder format is either $Foo or ${Foo}, and the choice doesn't change anything, that's just notation, correct?
yes
okay, ? question mark inside the braces means... what exactly? I can't tell.
Also, format ${Name-default} means you either read Name from the vars, or if it's missing, then insert the default, correct?
yes
What does the question mark mean?
Something like that would be my tests:
I sent a screenshot to say that the content of text isn't that big compared to your test, but you gain a lot of information and clarification.
from pytest import raises  # needed for the strict-format failure test
def test_substitute_variables_in_text_with_their_values():
    text = substitute_variables('Hello $Name, ($Age)', {'Name': 'John', 'Age': '14'})
    assert text == 'Hello John, (14)'
def test_variable_has_shell_format__simple_placeholder():
    assert substitute_variables('$Foo', {'Foo': 'Bar'}) == 'Bar'
def test_variable_has_shell_format__braced_placeholder():
    assert substitute_variables('${Foo}', {'Foo': 'Bar'}) == 'Bar'
def test_simple_format__given_value_not_exists__returns_empty_string():
    assert substitute_variables('${Missing}', {}) == ''
def test_strict_format__given_value_exists__passes():
    assert substitute_variables('${Foo?}', {'Foo': 'Bar'}) == 'Bar'
def test_strict_format__given_value_not_exists__fails():
    with raises(Exception):
        substitute_variables('${Missing?}', {})
def test_default_format__given_value_exists__returns_value():
    assert substitute_variables('${Foo-default}', {'Foo': 'Bar'}) == 'Bar'
def test_default_format__given_value_not_exists__returns_default():
    assert substitute_variables('${Missing-default}', {}) == 'default'
def test_encode_dollar_sign__with_two_dollar_signs():
    assert substitute_variables('$$', {}) == '$'
def test_malformed_placeholder__double_braces__is_not_substituted():
    assert substitute_variables('${{Foo}}', {'Foo': 'Bar'}) == '${{Foo}}'
def test_malformed_placeholder__non_letter__percent_sign__is_not_substituted():
    assert substitute_variables('${%Foo}', {'Foo': 'Bar'}) == '${%Foo}'
def test_malformed_placeholder__non_letter__digit__is_not_substituted():
    assert substitute_variables('${5}', {'5': 'Bar'}) == '${5}'
def test_malformed_placeholder__not_closed_brace__is_not_substituted():
    assert substitute_variables('${Foo', {'Foo': 'Bar'}) == '${Foo'
We can agree to disagree on this point
- The first test shows an example of what the function is for. The reader knows exactly how to use that function and what it can be good for.
- Each case is described by its given input; no need to include "Default" or "Strict" in the test data. Test names should be test names, test data should be test data.
- Passed-in values are specifically designed for each scenario. Also, minimal values are passed to the function to illustrate the behaviour.
- You can easily copy each of those methods and adapt it to your needs. With a parametrized test, if you'd like to do something slightly different, that's not so simple.
- The case of superfluous arguments is now explicitly tested.
- Test names specify what needs to happen: "is_not_substituted", "returns_default", "fails". Even if the test data isn't descriptive enough, test names will tell you what the intent behind the test is.
Now, I propose to you @river pilot: show any programmer your test and my test, and see which version is easier for them to understand.
- With tests like that, you don't really need a docstring, because the tests contain all the information about the function you need. Plus, tests are actually executed and asserted, while a function doc can become out of date.
BTW, this question mark, wouldn't it make more sense the other way around? Like ${Foo?} should return an empty string, and ${Foo} would throw for the missing value? Just a suggestion.
I'm not sure I understand the difference, except that you would just write a lot more boilerplate or duplicated code if you write them as separate tests. And it would be much harder to get an overview of what you test. Not using parametrization is really an infeasible method for some scenarios.
Having 1 test expand to >100 is not uncommon.
This is just not true. But it all depends on what you are testing.
@molten hollow thanks for the examples. I find long test names like that hard to read. I'd rather have a comment than underscored sentences. The question mark behavior is borrowed from the shell: this function implements a subset of shell variable expansion behavior. That is mentioned in the docstring.
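For readers unfamiliar with the shell forms being discussed, here is a minimal sketch of such a substitution function. It is not coverage.py's actual implementation: the regex, the choice of `KeyError` for the strict form, and the internal names are assumptions made for illustration. It handles `$$`, `$NAME`, `${NAME}`, `${NAME?}` (strict: error if missing) and `${NAME-default}`.

```python
import re

# One alternative per placeholder form; anything else after "$" is left untouched.
_PLACEHOLDER = re.compile(
    r"""
    \$(?:
        (?P<dollar>\$)                  # "$$" encodes a literal dollar sign
      | (?P<plain>\w+)                  # $NAME
      | \{(?P<braced>\w+)               # ${NAME ...
          (?:(?P<strict>\?)             #   ...?}        strict marker
            |-(?P<default>[^}]*)        #   ...-default} fallback value
          )?
        \}
    )
    """,
    re.VERBOSE,
)


def substitute_variables(text, variables):
    def replace(match):
        if match["dollar"]:
            return "$"
        name = match["plain"] or match["braced"]
        if name in variables:
            return variables[name]
        if match["strict"]:
            # Assumption: a missing strict variable raises KeyError.
            raise KeyError(name)
        return match["default"] or ""

    return _PLACEHOLDER.sub(replace, text)


VARS = {"FOO": "fooey", "BAR": "xyzzy"}
result = substitute_variables("Multiple: $$ $FOO $BAR ${FOO}", VARS)
```

Ill-formed placeholders such as `${%5}` or an unclosed `${` simply never match the pattern, so they pass through unchanged.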
Sometimes a test just tests that two implementations give the same result.
Sometimes a test has no documentation value but is just there to avoid someone accidentally making a mistake.
Sometimes you need test coverage, which can mean testing each bit in a 64-bit integer.
Sometimes test space is just so big that you can't feasibly write manual tests to cover it.
Sometimes sweeping or exhaustive testing is not an option either - it would take too much time to cover. Here random testing is the way to go.
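A sketch of the "two implementations" and "random testing" ideas combined, using only the standard library. Both function names are hypothetical. A seeded generator keeps the run reproducible, and the simple `bin()`-based version serves as the reference.

```python
import random


def popcount_reference(n):
    # Straightforward reference: count "1" characters in the binary text form.
    return bin(n).count("1")


def popcount_kernighan(n):
    # Kernighan's trick: n & (n - 1) clears the lowest set bit each iteration.
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count


def find_disagreement(trials=1000, seed=0):
    """Return the first random 64-bit value where the two versions differ, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = rng.getrandbits(64)
        if popcount_kernighan(value) != popcount_reference(value):
            return value
    return None


disagreement = find_disagreement()
```

The 64-bit input space is far too large to sweep, so sampling it against a trusted reference is a cheap way to gain confidence.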
I agree, good docstrings with human language and ascii art if needed
My point in all of that, is this:
- parametrized tests are just syntax sugar for multiple test cases. You don't lose much by splitting the parameters into multiple test cases.
- good tests are descriptive: they should tell you what you're after. Parametrized tests tend to hide that, while explicit tests tend to express it.
- if it's hard for you to write multiple test cases, try setting up snippets/live templates in your IDE, so that you can type just "test" + Tab, like in PyCharm, and that will insert a test snippet.
- it's easier to understand specific behaviours if they're explicitly stated.
- with parametrized tests, you need to conform all your test cases to one form with parameters; with explicit tests, you're free to express them in whatever way makes them clear for the reader.
@river pilot No problem, you can shorten the names if you'd like:
from pytest import raises  # needed for test_strict_fails
def test_substitute_variables_in_text():
    text = substitute_variables('Hello $Name, ($Age)', {'Name': 'John', 'Age': '14'})
    assert text == 'Hello John, (14)'
def test_simple_placeholder():
    assert substitute_variables('$Foo', {'Foo': 'Bar'}) == 'Bar'
def test_braced_placeholder():
    assert substitute_variables('${Foo}', {'Foo': 'Bar'}) == 'Bar'
def test_value_not_exists_returns_empty():
    assert substitute_variables('${Missing}', {}) == ''
def test_strict_passes():
    assert substitute_variables('${Foo?}', {'Foo': 'Bar'}) == 'Bar'
def test_strict_fails():
    with raises(Exception):
        substitute_variables('${Missing?}', {})
def test_default_returns_value():
    assert substitute_variables('${Foo-default}', {'Foo': 'Bar'}) == 'Bar'
def test_default_returns_default():
    assert substitute_variables('${Missing-default}', {}) == 'default'
def test_encode_dollar():
    assert substitute_variables('$$', {}) == '$'
def test_malformed_double_braces():
    assert substitute_variables('${{Foo}}', {'Foo': 'Bar'}) == '${{Foo}}'
def test_malformed_percent_sign():
    assert substitute_variables('${%Foo}', {'Foo': 'Bar'}) == '${%Foo}'
def test_malformed_digit():
    assert substitute_variables('${5}', {'5': 'Bar'}) == '${5}'
def test_malformed_not_closed_brace():
    assert substitute_variables('${Foo', {'Foo': 'Bar'}) == '${Foo'
You can add comments to it if you'd like, but please note that these test cases aren't all the same structure.
The way I see it, turning that back into parametrized values would lose information that is otherwise helpful.
Some people might say that you can save 4 or 6 lines, by condensing them back into parametrized test cases, as if it's the lines of code that makes it hard to maintain. It's not. It's the thinking. The harder it is to think about the function, the harder it is to maintain.
And it's way easier to think of these behaviours if they're separate test cases like that.
But! There are things you can do to make them better:
- you can brainstorm the design with another programmer (pair programming)
- you can try to test-drive the implementation: loose coupling, and behaviour driven (tdd)
- you can try to reimplement the same thing the next morning; chances are you're gonna come up with better functions
- try to reimplement the same functionality in another programming language, just to shift your mindset. It's not uncommon to come up with different solutions when changing perspective like that
- try to explain the function to non-programmers; sometimes non-programmers ask questions which will change your point of view drastically, giving you a chance to redesign your approach
- try to only implement the features your program needs. For example, if your other functions only use a fraction of those features, try to keep them and remove the rest; chances are a simpler solution is waiting for you
As you can see, there are a lot of approaches which would improve the overall quality of the function and tests, and parametrizing inputs IMHO just isn't one of them.
These sound like real cases, however I would solve each of them without parametrized tests. If you give me an example, I can show you how I would approach them.
It's probably a preference thing in terms of this project. @river pilot said he personally finds them easier to read this way and I imagine the other people working on it do too.
I'd personally write them like yours and use the function names to communicate exactly what is being tested to make it all the clearer, and I'd probably use TDD to do it and organize by one clear thought or behavior at a time.
And I argue that this is not feasible if you need hundreds of tests.
Why hundreds? Maybe we can take a look at a specific example?
I can see why some people prefer the separate tests. I think it's a tradeoff, and we are choosing differently. To me, it's easier to see the behavior being tested in the compact parameterized form.
I suppose you might think that, because you wrote the thing. Someone who doesn't know the function might have a different opinion. I'm wondering, if you left the project for a year and came back to it after forgetting what was there, would you still prefer the parametrized one or the split one? I would wager that you could get back into it quicker if it was split. Maybe we'll get to settle the wager one day in the future. Who knows.
@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1
did you read the docstring? You understood almost all of what it did.
Yes, but as I mentioned, when code changes, the docstrings tend to become obsolete. There are cases where the comment says one thing, and the code says another. I actually have legacy code like that. That's why I don't treat them as a valid "source of truth". The code and the tests will tell you the truth.
the names of tests are also comments that can be out of date.
@twin shale Sorry, just to be sure, it's count_set_bits(x), right? Not count_set_bits(val)?
That's true, but yet:
- I see very often code and comment where they disagree
- I don't see very often test names that disagree with the test code
Oops, yes
Okay, so this test.
@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1
@twin shale And please tell me, what is the intention behind this test? Are you trying to test-drive the count_set_bits() function?
I want to test the implementation. What do you mean by test drive?
Because to me, it would seem that the only valid implementation of count_set_bits() is actually the code you have in your test, which is val = 1 << x. At which point, the test you have just checks that the code that you wrote is the code that you wrote.
Both can suffer the same problem I think. But it's probably down to how detailed the names or descriptions are, and how likely behavior is to change. If you have an addition function and you start with a docstring it could be a pretty fine source of truth at least for how it should be. I read somewhere from Oracle for Java and documentation that the docstrings (or whatever they're called there) should lay out a testing plan sort of.
This is not true. The implementation is a physical chip.
Or fpga. Or at least not python code.
Oh, so you're trying to test hardware?
Why not?
So let me understand, you're the creator of the hardware and trying to test it using pytest?
Or are you a user of the hardware and just need to check whether it works?
I thought we were talking about software development.
This also seems to misunderstand the function?
count_set_bits(0b1100_0000_0000_0000) would return 2
Somewhat both, I'm a hw engineer
I'm approaching this whole debate from the perspective of a software developer, who uses unit tests to improve the quality of the software. I wasn't aware we're migrating from it into the hardware world.
To me, parametrizing tests, when it comes to application development, has the same flaws as using for-loops in tests.
I don't see any reason software shouldn't be equally black box tested as something actually hardware
Well, I guess that's right. I would treat that hardware same as any 3rd party software.
I have a feeling we're migrating away from the original topic, which was parametrized tests
as I understand it, @twin shale's job is to test that hardware. It's not 3rd-party.
But I'm not sure where the domain of "unit" testing ends.
2nd party?
Imagine being an architect and then testing both the blueprint and the delivered physical house
whatever party it is, do I have it right that you are testing the hardware?
Well, I'm sure I entered the debate with parametrized tests when it regards software development (like applications, tools, functions). How you're supposed to develop hardware, is something I don't know much about.
I can't talk with you about that, because I don't know enough :/ I only have opinions about creating software, and testing software that uses hardware, but not testing hardware itself. Sorry.
does it matter if it's hardware or software? The job is to test this Python function that counts the number of bits in an int.
but he said it's not a python function.
he said it's a hardware chip.
Software might be simulating/emulating hardware. I don't think it's much different testing-wise.
let's say it's a Python function.
I mean, it could be. You don't know the implementation
Or the implementation might be super complicated, and you just want to assert some high level stuff
or the implementation is actually in hardware, but you access it through a Python wrapper so you can use pytest. It's black-box, we don't know.
If it's a python function, and he wrote a test like that:
@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1
Then, to my eyes, the only feasible implementation of that function is the code val = 1 << x itself. So you have a test that just duplicates the implementation.
you are misunderstanding the function. It counts the number of 1 bits in the int.
Well, from the perspective of a software developer, I will say it makes a difference whether it's you who's responsible for creating and maintaining that function, or whether it's something off-the-shelf that you don't maintain. Like, these require two different approaches to testing.
Dnaron being dnaron
that's not productive
Ah, that's right. Sorry, I mistook the output for the input, you're right. I retract my previous statement. But still, I wouldn't use parametrized tests to test that.
@twin shale I haven't fully been following this as I've been fixated on the initial parametrization argument. I think that in the case of checking if specific bits are set (1-64), having one test with parametrization that covers that range is valid. I wouldn't write them all individually personally, but I would do a few specific behaviors with it. NGL, I probably don't understand what's being tested, but assuming we're testing a bit counting function, I'd do something like this (ignore the implementation, I'm too dumb for this right now).
def count_bits_set(number: int) -> int:
    """
    Takes a number and counts the number of bits that are 1 in it,
    returns 0 if none are set, otherwise the number of them on.
    """
    # bin() renders the value in base 2; counting "1" characters gives the set bits
    # (for negative ints this counts the bits of the absolute value)
    return bin(number).count("1")
class TestCountBitsSetFunction:
    def test_should_report_zero_bits_set_for_number_zero(self):
        assert count_bits_set(0) == 0
    def test_should_report_one_bit_set_for_number_one(self):
        assert count_bits_set(1) == 1
    def test_should_report_right_number_of_bits_set_for_reasonable_range_of_single_bit_set(self):
        # for-loop version; parametrization would work here too
        for position in range(64):
            assert count_bits_set(1 << position) == 1
    def test_should_report_i_do_not_know_how_many_for_negative_one(self):
        # This is python, I genuinely have no clue.
        pass
I think we're very off topic, because the whole point started from a question posted by someone who has no interest in hardware and bits at all.
Never mind, it was you
Well, maybe I acted too roughly. I assumed you were creating a software system, and saw you tried to use parametrized testing, and I tried to advise you against that.
I don't think it matters. You can find analogies in higher level system.
But seeing you're creating hardware, that may be valid, I don't know.
let's talk about the python function. how would you test it?
Yea, I don't think so. I did use parametrized tests in the past, as a software developer, and with enough development, I always regretted that decision, because it failed me somehow.
I guess it might be appealing only in narrow situations, when you think counting lines of code makes a difference that much.
But seeing how you're creating hardware, I don't know, you're trying to cover the whole range of inputs? I guess that makes sense.
I'm sure I would never test software like that, by "covering the whole space of inputs". To me such test would be redundant, and thus harder to maintain.
it's a python function: how would you test it?
I would start from test, and drive my implementation from it. So I would need to know what my "soon-to-be-written-function" needs to do.
I think it makes sense to cover like a subset of them. Do bits 1-64 work correctly? In python they can go up infinitely so it would be impossible to cover it all. But specific behaviors, if it's 1 you get 1, if it's 0 you get zero, if you do 1-64 all at the same time, you get the full value of those ones, basic scenarios to prove that it works/isn't broken.
I'm definitely not counting lines of code. In my mind I'm very pragmatic in this. It might be that such cases haven't appeared for you. But I do think it's needed in some cases. I agree not all tests should be lumped together as parametrizations. There is a sweet spot.
it seems like you're ignoring parts of the discussion: it counts the number of 1 bits in an int.
But you're talking about your hardware, where you want to cover a range/space of inputs, correct?
For example:
Write 10 tests with 1 test data input each.
Or write 10 tests with 100 test inputs each for 5% more time spent?
It might be extremely good value. Time is a scarce resource.
Here's an example of where I used parametrization the other week where I don't think I'll regret it:
class TestDiscordSnowflakeFunctionResultBits:
    @pytest.mark.parametrize("test_value", [0, 1, 4095])
    def test_should_have_given_increment_value_in_bits_1_to_12(self, test_value: int):
        snowflake = make_discord_snowflake(increment=test_value, worker=9, process=2, timestamp=2)
        assert snowflake & 4095 == test_value
    @pytest.mark.parametrize("test_value", [0, 1, 31])
    def test_should_have_given_internal_worker_value_in_bits_13_to_17(self, test_value: int):
        snowflake = make_discord_snowflake(increment=5, worker=test_value, process=2, timestamp=2)
        internal_worker = (snowflake >> 12) & 31
        assert internal_worker == test_value
    @pytest.mark.parametrize("test_value", [0, 1, 31])
    def test_should_have_given_internal_process_value_in_bits_18_to_22(self, test_value: int):
        snowflake = make_discord_snowflake(increment=5, worker=5, process=test_value, timestamp=2)
        internal_process = (snowflake >> 17) & 31
        assert internal_process == test_value
    @pytest.mark.parametrize("test_value", [0, 1, 4398046511103])
    def test_should_have_given_timestamp_value_in_bits_23_to_64(self, test_value):
        timestamp = make_discord_snowflake(increment=5, worker=5, process=0, timestamp=test_value)
        assert timestamp >> 22 == test_value
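The function under test isn't shown in the chat. A hypothetical `make_discord_snowflake` that would satisfy those tests can be sketched from the shifts and masks they use: 12 bits of increment, 5 bits of worker, 5 bits of process, and the timestamp from bit 22 upward. The bit layout is inferred only from the tests above, not from the author's code.

```python
def make_discord_snowflake(increment, worker, process, timestamp):
    """Pack four fields into one integer (layout assumed from the tests)."""
    return (
        (timestamp << 22)            # timestamp occupies the bits from 22 upward
        | ((process & 0x1F) << 17)   # 5 bits of process id
        | ((worker & 0x1F) << 12)    # 5 bits of worker id
        | (increment & 0xFFF)        # 12 bits of increment
    )


snowflake = make_discord_snowflake(increment=4095, worker=9, process=2, timestamp=2)
```

Each test reverses exactly one of these shifts, which is why the boundary values 0, 1 and the field maximum are enough to show that neighbouring fields don't bleed into each other.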
I want to do that when I test software as well. How can I otherwise know that it works? Imagine count_set_bits actually being a python function.
Okay, so the thing I'm cautious about is that there is this effect in programmers, where we often do stupid things, me included:
- we code stuff that's not needed
- we forget stuff
- we create the first thing that pops into our heads
- we delay feedback in learning
- we do complicated stuff, instead of simple
- we sometimes do something complicated because we want to feel proud of it
We do all that because we're human. Good engineering practices allow us to overcome those issues. I try to do that.
So when I need to create some software, I try to test-drive it (practice TDD), to not allow myself to create more code than necessary, and make sure it's simple enough, and not overly complicated.
And in order to do that, I try to start from tests, to make sure I don't fall into those traps.
It should count how many bits are set. That's it.
ok, so how would you test it?
we don't have to talk about this if you don't want. I've asked four times, and you aren't answering.
What I would absolutely need to do, is to verify that's actually what I need to do.
Because it could be an x/y problem.
I would go from higher level tests, to lower level,
if, by test-driving it i would find myself needing that function - great.
if not, i would not write it at all, and thus not have a need to test it.
but!!
Let's say we verified it.
And I verified that I need it.
And I'm going to write it and test it.
And it's me who's creating that, not 3rd party.
Here's how I would do it:
Ok but you are missing a HUGE shortcut here. The task IS to count bits.
Okay, we've verified it. I just wanted to make sure I'm not falling into x/y, but if I'm not, let's go.
Here's how I would test-drive it (because I'm creating that function, correct?)
I would write my tests in such a way that they allow me to learn.
And also, I know I will make mistakes, so I need to write tests in a way that allows me to make progress in iterative steps, and when I find out where I am wrong, I can correct it.
I will start simple
Count bits. Let's say there are no 1 bits:
def test_there_are_no_1_bits():
    assert count_bits(0) == 0
That's very simple, and I can implement it very simply,
then, let's say there is 1 bit on the first position
def test_there_is_1_bit_at_first_position():
    assert count_bits(1) == 1
That's also very simple to implement,
then, what's the next simplest thing after that? 1 bit on the second position
def test_there_is_1_bit_at_second_position():
    assert count_bits(2) == 1
and also two bits
def test_there_are_2_bits():
    assert count_bits(3) == 2
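As an illustration, one implementation that these four small tests could drive out is sketched below. The bit-clearing loop is just one possible choice, not something the messages above prescribe, and negative inputs are deliberately left unhandled because no test so far says anything about them.

```python
def count_bits(value: int) -> int:
    """Return how many bits are set in a non-negative integer."""
    # Assumes value >= 0; a negative int would never reach zero in this loop.
    count = 0
    while value:
        value &= value - 1  # clears the lowest set bit
        count += 1
    return count


def test_there_are_no_1_bits():
    assert count_bits(0) == 0


def test_there_is_1_bit_at_first_position():
    assert count_bits(1) == 1


def test_there_is_1_bit_at_second_position():
    assert count_bits(2) == 1


def test_there_are_2_bits():
    assert count_bits(3) == 2
```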
Also, handle the other cases
def test_handles_signed_integers():
    assert count_bits() # here, put in information about whether signed integers are handled
def test_fails_for_unsigned_integers():
I've already covered all 64 bits, what are you wasting your time with?
Also, put in floats, strings and None, assert that the function behaves properly
def test_function_fails_for_none():
    with raises(Exception):
        count_bits(None)
def test_function_does_not_count_bits_for_floats(): # or does? i don't know, you tell me
    assert count_bits(0.00) == 0 # what's the expected outcome here?
You see, i'm not treating these tests just as a regression test suite,
I'm treating it as:
- a tool to learn
- to assert what I already know
- what I want my function to do
Just assume input data is 64 bits
and also to:
- gather feedback on my design
Great, write a test for that!
def test_fails_for_128_bit_input():
    with raises(Exception):
        count_bits(here_insert_data_with_128_bits) # maybe 2^128 or smth like that
That's the whole point, don't do "just assume". Assert that as an executable specification.
If that behaviour is not met, the test suite should fail.
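A rough sketch of what "assert that as an executable specification" might look like: the 64-bit assumption becomes a check in the function and a test that fails if the check is ever removed. The name `count_bits_64`, the choice of exception types, and the tiny `raises` helper (a stand-in for `pytest.raises`, so the sketch has no dependencies) are all assumptions for illustration.

```python
def count_bits_64(value):
    """Count set bits, accepting only integers that fit in 64 unsigned bits."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in 64 unsigned bits")
    return bin(value).count("1")


def raises(exc_type, func, *args):
    """Return True if calling func(*args) raises exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def test_fails_for_128_bit_input():
    assert raises(ValueError, count_bits_64, 2**128 - 1)


def test_fails_for_none():
    assert raises(TypeError, count_bits_64, None)


def test_counts_all_64_bits():
    assert count_bits_64(2**64 - 1) == 64
```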
There is no other input except 64 bits. Don't worry about other inputs, they are not representable.
What do you mean "there are no other"? So what should the function do when someone passes one?
The compiler or testing framework would crash on type error
I personally rely on the type hint for these and assume any caller code is wrong/that's on them, so float and None would be eliminated by not having the signature say it takes anything but int
My point being, you can very well test that without using parametrized tests.
You just specify the important cases that are enough to understand the function as a whole.
If we're talking about the software system, of course.
And then?
And that's enough to have good tests - as long as we're talking about the software system.
For hardware, as I said, I'm no expert, so I will not answer that.
The way you are doing it, you get some examples. But some things need to be exhaustively tested.
That's definitely not true
Let me tell you that:
Parametrized tests are a way to explore an input space. In software development, I will argue you never need to do that. In hardware, maybe you do? I don't know.
That's why I'd personally start with the specific examples, then cover a range of them exactly like you did with parameterization.
It's not about exploring or making examples of the behavior. It's about verifying.
@twin shale We're talking about software development still? Or hardware?
Sure
Then I will argue you never need to explore an input space like that.
Indeed. If you make tests too high level or too parametrized, you might miss some concrete examples, which is not good either.
Okay, then maybe I'm wrong. Can you give me an example where you might use that for software development?
I don't understand how you can advocate for blackbox testing and then say that hardware and software would need different testing of the same algorithm?
Because for software, I will work with tests TDD-style. And for that, you don't need parametrized tests.
And for hardware, I don't know how they test and maintain that. I just don't know, so maybe they need to explore the input space? I just don't know how they do it.
If you think about it from a TDD perspective it becomes natural, and I say this and have to admit I don't always use TDD. But if there's a problem that is new to me, or annoying to write, sometimes with bit math, I always do. I'd take it incrementally for learning like @molten hollow said. Does it count 0 bits correctly? Does it count 1? Thinking of it now, what's negative one supposed to be, what should happen then? (In your case, if you don't expect anything besides 0-64, maybe that's fine.) So naturally I'd get those specific examples, then I'd wonder, does this actually work for all the ones it needs to, or a reasonable set to prove that it will, which is where parameterization or a for loop would come in.
Example: Test a sorting algorithm. A quick way is to use an existing sorting algorithm as the reference: exhaustively feed the new one all permutations of (1, 2, 3, 4, 5) and check the output against it.
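A small sketch of that idea, with the built-in `sorted()` as the reference and a hypothetical `insertion_sort` standing in for the new algorithm under test:

```python
from itertools import permutations


def insertion_sort(items):
    """Hypothetical algorithm under test: returns a new sorted list."""
    result = []
    for item in items:
        index = len(result)
        # Walk left until the element before the slot is not larger than item.
        while index > 0 and result[index - 1] > item:
            index -= 1
        result.insert(index, item)
    return result


def test_sorts_every_permutation_of_five_elements():
    # 5! = 120 inputs, each compared against the trusted built-in.
    for candidate in permutations((1, 2, 3, 4, 5)):
        assert insertion_sort(candidate) == sorted(candidate), candidate
```

The trailing `candidate` in the assert means a failure reports exactly which permutation broke.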
Okay, so your goal is to write a new sorting algorithm, correct?
or property-based testing, which is like parameterization on steroids
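To give a flavour of the property-based style without pulling in a library, here is a hand-rolled sketch using only `random`. Libraries such as Hypothesis do this far more thoroughly (including shrinking failing inputs); the `merge_sort` under test and the two properties chosen here are assumptions for illustration.

```python
import random
from collections import Counter


def merge_sort(items):
    """Hypothetical algorithm under test: returns a new sorted list."""
    items = list(items)
    if len(items) <= 1:
        return items
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def check_sort_properties(sort, runs=200, seed=0):
    """Check properties that must hold for any input, not specific examples."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(runs):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        result = sort(data)
        # Property 1: every element is <= the one after it.
        assert all(a <= b for a, b in zip(result, result[1:])), data
        # Property 2: the output is a rearrangement of the input.
        assert Counter(result) == Counter(data), data
```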
Doesn't matter what the bits represent. Signed or unsigned integer or float or unums
Ahhh. I see back to your specific example. You're taking a function to count how many bits are on in memory, and then testing if the bit shifting logic works on the hardware as expected, and doing parameterize to test a bunch of values at once. In an environment where at runtime or compile time it'll most likely be impossible to use anyway. We assume that function already works by the time that test code is executed.
Well, here's the thing:
- if you only care about achieving your goal, which is getting your data sorted, you could just write one test
def test_my_pokemons_are_sorted():
    assert pokemons(['Pikachu', 'Alakazam']) == ['Alakazam', 'Pikachu']
and you implement that using built-in or off-the-shelf sorting in your programming language. You don't need to reimplement it, you can just use what's there. That's one case. But I know that's not what you're after.
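In that first case the production code could be as small as this sketch, where `pokemons` simply delegates to the built-in `sorted()` (the function body is an assumption; only its name and the test come from the message above):

```python
def pokemons(names):
    """Return the names in alphabetical order, delegating to the built-in sort."""
    return sorted(names)


def test_my_pokemons_are_sorted():
    assert pokemons(['Pikachu', 'Alakazam']) == ['Alakazam', 'Pikachu']
```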
- if you really want to create your own sorting algorithm, here's how I'd say you need to approach it: you need to test-drive it. Drive the implementation of that sorting algorithm with tests. You're not going to invent this as a big idea in your head, you will need to iteratively design it and come up with it, solve the edge cases, work on it. So I think you should use tests as stepping stones to it. How exactly those test cases would look is up to the creator of the algorithm, because the tests necessary depend on the nature of that algorithm. So in order to create good tests, you need to know how the algorithm works. If you want "black box", then just go with the first approach with one test.
There is even an example in "Clean Craftsmanship" by Robert Martin, where he uses tests like that to write quick-sort, if you're interested. And he doesn't use parametrized tests.
Not sure I follow. And I don't have more time to lend to this discussion right now. Even though I think this is very interesting! I have some things to complete before bed.
I know you aren't saying that one test of a two-element list is sufficient to test a new sorting algorithm. And I know you like black-box testing, so "the nature of the algorithm" can't come into play. This seems like a non-answer to me.
I will say, that's the reason I dislike topics in #unit-testing channel
Because one person asks something (like @river pilot in this case), I answer, and now @twin shale doesn't agree, so I answer, and now @river pilot doesn't agree, and I'm constantly between two people
One single test with one single test input to test a sorting algorithm?
As I mentioned, it depends if you're using a function that already exists (like sorted() in python), or you're writing your own.
If you're writing your own, then you need more, obviously, as I described above.
I prefer "gray box testing" (I'm not sure that's established terminology): Test it as a black box, even though I know the internal workings. And ALSO spend extra effort testing the parts I know are more complicated (and likely to contain bugs) more thoroughly.
That sounds like a good strategy, if you're writing code first, and tests after.
But you get way better results, if you write tests first, and code after.
you aren't making sense. You need zero tests for the existing Python sorted() function
If you mean, you don't need to test it, you're right. Why would you, someone else created it.
I would start it with the simplest example to see if anything about it works, then add more behaviors/scenarios for it. So just a two element list would suffice, and the sorting algorithm would probably be wrong at that point, not do much, then I'd add more with different inputs.
But if you want to test that your function (like the one with pokemons), returns values sorted, then you need a test to test-drive that usage of sorted() function.
Although you can find issues in python builtin stuff as well
Sure.
But you see, it's not your job to test the tools you're using.
Although, I'm not familiar with sorting algorithms tbh, don't most of them have the same behavior but just do it faster/slower?
there can be bugs in ifs, fors, variables, compilers, all that.
No this is just a case of contributing to open source
If you're working on that software, sure.
But I guess we're working on our own projects, mostly. And for that, we don't need to test those kinds of things.
Sure
It varies. But mostly it's a team effort and we do both in parallel
If I can suggest something, try to test-first more, and test-after less.
Would be awesome if 100% was test-first.
Most of the problems with testing just disappear, if you test-first.
I don't agree
Okay, my point being, the resulting software often is better designed if you do test-first.
Hence, I advise it to people.
Unless I'm wrong, in which case I'd be happy to hear a counter-example.
I have one counter example. Iirc in an interview with Dave Farley, Dave Thomas (one of the authors of The Pragmatic Programmer) said that, as sort of an experiment, he went like six months without writing tests and didn't have any issues, on account of writing the code in the same testable way as when he did.
Oh, yes! I saw that.
I think it's possible to write testable well architected code without tests at all
But I personally wouldn't do no tests.
Yes! Interesting observation.
On the topic of tdd we also have this discussion:
https://github.com/johnousterhout/aposd-vs-clean-code
@pulsar oracle So because he internalized writing testable code, he got the benefits of testable code and loose coupling, without needing test-first.
Very interesting video, I agree.
Pragmatic programmer will be read hopefully starting this year
There's also the situation where you don't know where you are going. TDD can be a massive waste of time then.
However, my internal sceptic is wary about this, because I bet he created that application in a familiar technology, and probably a familiar domain and ecosystem.
Would he have achieved the same results in a new programming language, new framework, new ecosystem, new domain? Now that would be interesting to see!
Maybe he would, who knows.
Very early on I learned when in doubt, write testable code. I think that some people are less error prone than others, maybe off by one errors or paying attention is more likely, familiar with the language, etc. If we're taking just same language. For me, I try to avoid frameworks and if I'm using one I'd separate important logic and build it separately (can't imagine not having tests to get feedback but the design is always pretty solid).
Yes, I can take a look at that. What am I looking for there, specifically?
Because I know a lot of people who tried TDD, and now actually prefer it. I don't know anyone who would try it, and then not like it.
I only hear that TDD sucks from people who didn't actually do that.
@pulsar oracle Are you in Dave Farley's Discord server?
I'm not. I didn't pay for the Patreon or anything, I didn't even know they had a Discord server. I'm just a massive fan of the channel, and now LMAX, and Martin Thompson, and the extended universe
The fee isn't that much, and it's valuable content. I can send you some screenshots, if you'd like.
It's just tangential to this discussion, and interesting.
I'll consider it
There's this. I can't even figure out what I want sometimes until I've written a lot of code to get something going. But in my case I'll probably just rewrite with more tests.
But I bet it's not like you don't know anything.
You do have some starting points.
There are always ways to assert what you already know, and there are ways to take slices.
Heck, I did TDD in languages I didn't even know yet.
Some time ago, I started to learn Rust, never seen that thing in my life, and my first line was a test.
I think it is exceptional wherever you know what you want and not how to get it. You get to specify the perfect thing then go make it happen and get feedback. So any new language it's the first thing I'll aim for If I need something done.
you know what you want and not how to get it.
Isn't that always the case? I'm not trying to be argumentative, but what are cases where you don't know what you want?
But If I start with zero clue how to approach or test something I just want to get my hands dirty and see where things will go.
Yea! That's correct. And why not with a test?
I do that too, get my hands dirty with something new.
Last week, I started to create a platformer game with a new game library.
I didn't know it, and didn't know what it could do.
I just wanted to learn that library.
I started with a test.
Maybe it's a mindset problem. I was writing a harness for a discord bot the other week, I didn't know it was possible or how to approach it outside of an existing library doing it. But I had almost no clue how to test it or what the interface should look like, I just jumped in sort of making a domain model of guilds, users, in memory state, and just building up wayyy too high, figuring stuff out. I absolutely could have done this with TDD, it's not figured out and I am redoing it from scratch with it now and making better progress. But It wasn't clicking in my mind, I wasn't thinking how I'd test it, it was too much at the start, and required so much thinking.
Once I had an interface/design/sense of where it's going I could start driving development like this. There was also a ton of data i had to put in, stuff i had to look up, this time I started it minimal and got chatgpt to generate it and filled it in, testing event handling with a helper class.
Is the top half of this picture the result of TDD?
Wouldn't say so.
Not sure why would someone ridicule tdd like that
i think it's because TDD is often explained in stark terms that don't match reality. TBH, it's something you've been doing here. If you like, explain how you would use TDD to test an IsEven function. Don't get distracted by whether we need that function, etc. Just: how would you test it?
I am using pytest-ruff. Is there a way to disable E401 just for when Pytest runs Ruff, i.e. I don't want tests failing, just because the import list is not sorted. That will be enforced as part of pre-commit, but I don't want it checked each time pytest is triggered from inotify while I am making changes to the code
You can pass --ruff-config to pytest to use a custom ruff config file (in which you can disable E401)
okay, cool. That'll do for my case I guess. Would love a pyproject.toml approach, but ok for now
I would like to know if BDD (behave) scenario-based testing is widely used in the industry? Or is pytest-based unit testing sufficient?
From what I've seen, BDD is just a bunch of regex inserted in a middle layer of your tests that makes the output slightly prettier, and everything else worse.
Yea, from a casual look behave looks the same. A bunch of English that is never checked but is asserted as fact. How could you possibly trust that?
Yeah, I also feel it's a waste of time and effort just to keep things simple from the business perspective
And that business perspective is an illusion anyway. No manager will actually read that and if they do, it is a lie anyway.
i have not seen BDD being used much in practice. One place I worked had some of it, and it ended up that the devs had to write the tests anyway, and had to spend time either trying to understand the middle layer of translation, or adding to it.
I also worked at a place that had some. We removed it. There was another team at the same company that had more than we did. They also removed it for the same reason: it was a cost for no gain.
As a developer and for the practical aspect you'd probably be more interested in acceptance test driven development. That's where you write tests that ideally use the language of the problem domain to test the entire system. The level of abstraction can vary, they don't have to be for the business (if you're building an HTTP server, for example, they're obviously largely not), and you get more feedback about whether your application is fit for release in continuous delivery terms. If you're developing a server, you can use pytest, unittest, any testing framework. BDD, for what most developers who use it care about, is just end-to-end testing that happens to have each step written in Gherkin and wired up to code, and it's a pretty bad way to do it.
You're supposed to write it in a way that says what should happen while leaving how separate. The developers are supposed to write the tests or even both, but anyone reading can see an example usage of the system and be like "yea, that's right". The same way a developer can see a good unit test with asserts and be like yea, that's what that function is supposed to do. The idea was originally formed by Dan North as a way to describe TDD to developers without mentioning the word test, where test case classes are specifications and individual tests are scenarios iirc. But you don't need Gherkin at all to do it, and Dan North just comments the given, when and then parts among normal pytest code.
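As a rough illustration of "given / when / then as plain comments", here is a sketch of an ordinary test function with no Gherkin layer at all. The `Cart` class and its methods are made up for the example.

```python
class Cart:
    """Hypothetical domain object; prices are kept in whole cents."""

    def __init__(self):
        self._prices = []

    def add(self, price_in_cents):
        self._prices.append(price_in_cents)

    def total(self):
        return sum(self._prices)


def test_total_should_be_the_sum_of_added_item_prices():
    # Given an empty cart
    cart = Cart()

    # When two items are added
    cart.add(300)
    cart.add(150)

    # Then the total is the sum of their prices
    assert cart.total() == 450
```

The test name says what should happen; the body says nothing about how `Cart` stores its items.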
It sounds great. It didn't work out in practice.
That's fair
I've seen BDD attempted before, with plenty of frameworks. Robot Framework was the worst by and large.
You get to a point where you have to write so much custom code, you think to yourself, "what am I doing here?"
I'll also say I work in a place that's heavily requirements based. Lots of IBM Jazz and DOORS. It's god awful. We're supposed to be capturing tests into those requirement systems and it just doesn't happen. It's just there to look up the customer signed-off requirements, but nothing ever goes back in
Part of the issue: the requirements aren't set up for programmatic access, and even if they were, the language used in the various processes has diverged so much from the requirements (e.g. dual use of a word) that it's almost impossible to keep aligned
It's also been attempted at the LMAX Exchange without any frameworks. Everyone wrote the tests and specifications using normal JUnit and an internal DSL. The business analysts would be sat down with an IDE to write the tests, and there would be massive reusability with methods like register, login, create an instrument, wait for something to happen, verify an email was sent, etc. And every developer, for every feature or bug fix, would create an acceptance test, even independently of stuff like user stories. Maybe this BDD stuff is error prone in practice like agile, but there's a huge practical part of it being missed where it's a synonym for acceptance test driven development: testing that the entire application is fit for release, usually in terms of the business, with the same terms and language (in any programming language).
LMAX-Exchange/Simple-DSL: Utilities to write a simple DSL in Java.
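To make the "internal DSL" idea a bit more concrete, here is a very loose Python sketch. It is not the API of Simple-DSL or of LMAX's real test suite: every class and method name is hypothetical, and the in-memory system stands in for an application that a real acceptance test would drive through its public interfaces.

```python
class InMemoryExchange:
    """Hypothetical system under test."""

    def __init__(self):
        self.users = set()
        self.logged_in = set()
        self.instruments = {}
        self.outbox = []

    def register(self, user):
        self.users.add(user)
        self.outbox.append((user, "Welcome"))

    def login(self, user):
        if user not in self.users:
            raise PermissionError(f"unknown user: {user}")
        self.logged_in.add(user)

    def create_instrument(self, user, name):
        if user not in self.logged_in:
            raise PermissionError(f"{user} is not logged in")
        self.instruments[name] = user


class ExchangeDsl:
    """Thin test-facing layer: each method reads like one step of a scenario."""

    def __init__(self, system):
        self._system = system

    def register_and_login(self, user):
        self._system.register(user)
        self._system.login(user)

    def create_instrument(self, user, name):
        self._system.create_instrument(user, name)

    def verify_email_sent(self, to, subject):
        assert (to, subject) in self._system.outbox, f"no '{subject}' email for {to}"

    def verify_instrument_exists(self, name):
        assert name in self._system.instruments, f"missing instrument {name}"


def test_registered_trader_can_create_an_instrument():
    exchange = ExchangeDsl(InMemoryExchange())
    exchange.register_and_login("alice")
    exchange.create_instrument("alice", "GOLD")
    exchange.verify_email_sent(to="alice", subject="Welcome")
    exchange.verify_instrument_exists("GOLD")
```

The reusable steps live in the DSL class, so the test body stays close to the language of the domain.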
In main.py:
from voiceconversion.RVC.RVCr2 import RVCr2
...
def initialize():
    ...
    some_var = RVCr2(settings)
    ...
In test_mytest.py with pytest:
from myapp.main import initialize
I'd like it to use MockRVCr2 (that's implemented in mock_rvcr2.py) instead of the real RVCr2. How to do that?
It's possible to override before the import, but all the linters are unhappy. Is there a cleaner way?
mock_module = types.ModuleType("voiceconversion.RVC.RVCr2")
mock_module.RVCr2 = MockRVCr2
sys.modules["voiceconversion.RVC.RVCr2"] = mock_module
Where do you call initialize? You should mock things where they are used, so you want to patch main.RVCr2
I import initialize in test_mytest.py and call it in the tests.
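you should try mock.patch("myapp.main.RVCr2", MockRVCr2) (the test imports initialize from myapp.main, so that's the namespace where RVCr2 gets looked up)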
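A self-contained sketch of "patch it where it is used". Since the real `myapp.main` and `voiceconversion` packages are not available here, a throwaway module named `fake_main` plays the role of `main.py`, and both classes are stand-ins; `initialize` also takes `settings` as a parameter here just to keep the sketch short.

```python
import sys
import types
from unittest import mock


class RVCr2:
    """Stand-in for the real class; refuses to be constructed in tests."""

    def __init__(self, settings):
        raise RuntimeError("the real RVCr2 should not be built in a unit test")


class MockRVCr2:
    def __init__(self, settings):
        self.settings = settings


# Stand-in for main.py. After `from ... import RVCr2`, the name RVCr2 lives
# in main's own namespace, which is where initialize() looks it up at call time.
fake_main = types.ModuleType("fake_main")
fake_main.RVCr2 = RVCr2
exec("def initialize(settings):\n    return RVCr2(settings)\n", fake_main.__dict__)
sys.modules["fake_main"] = fake_main


def test_initialize_uses_the_mock():
    # Patch the name in the module that uses it, not where it was defined.
    with mock.patch("fake_main.RVCr2", MockRVCr2):
        created = fake_main.initialize({"model": "demo"})
    assert isinstance(created, MockRVCr2)
    assert fake_main.RVCr2 is RVCr2  # the patch is undone on exit
```

In the project from the question, the equivalent target string would be built from wherever `main.py` is importable, and the patch could be applied as a context manager or decorator around the tests that call `initialize()`.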
Your paste is too long, and couldn't be uploaded.
please delete this.
for ?
it's at the very least off-topic for this channel, and obfuscated code is usually suspicious. Please delete it.
its all about fun training
this channel is about automated testing.
and yea its test encoded script
it's not about testing. Please delete it.
i swear its testing for training
it's 99Mb of encrypted code. It's not about automated testing. This channel isn't about testing people, it's about testing code. Please delete it.
hi i just got into fuzzing and may have overfuzzed some functions
how do you not do that?
What does that mean? That you just wasted time on it?
That now my test suite of very basic methods takes about 3 minutes to complete
I don't think you should run fuzzing always. Mutation Testing, Fuzzing, Property Based Testing, these are all methods to find tests to add to your test suite, not something you run as part of the test suite itself.
Yeah, am working on lowering the fuzzing inside my test suite
every time I run it I find more bugs, so it's actually been useful so far
No, you missed my point. There should be literally zero fuzzing done in the test suite itself. You run that separately once in a while to find tests to add.
Think of it as programming itself. You don't "do programming" while the function runs in prod. You do it before :P
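One way to keep the exploration out of the everyday run, sketched with `unittest` only. The `RUN_FUZZ` environment variable and the `clamp` function are assumptions for the example; the point is just that the slow, randomized part is opt-in, while the fixed examples it produces stay in the normal suite.

```python
import os
import random
import unittest


def clamp(value, low, high):
    """Hypothetical function under test."""
    return max(low, min(value, high))


class ClampRegressionTests(unittest.TestCase):
    """Fast, fixed examples: always part of the suite."""

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-5, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(50, 0, 10), 10)


@unittest.skipUnless(os.environ.get("RUN_FUZZ") == "1", "set RUN_FUZZ=1 to explore")
class ClampExploration(unittest.TestCase):
    """Slow randomized search: run by hand once in a while."""

    def test_result_always_stays_in_range(self):
        rng = random.Random(1234)
        for _ in range(100_000):
            low = rng.randint(-1000, 1000)
            high = low + rng.randint(0, 1000)
            value = rng.randint(-5000, 5000)
            self.assertTrue(low <= clamp(value, low, high) <= high)
```

When the exploration finds a failing input, that input gets copied into the regression class as a new fixed test.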
and you should call it Property Based Testing, not fuzzing, so people don't get confused imo :P
what is the difference
I've calmed my testing somewhat but it's not great yet
94% coverage though
which is excellent
Fuzzing is a super broad concept. It could mean almost anything. PBT is much more specific.
But yea, PBT is commonly thought of as a form of fuzzing. But so is Mutation Testing.
And those are VERY different
It's a method to find what behavior your tests don't test.
Ahhh
It can't find behaviors your code doesn't have but should have though. PBT can sometimes help with that.
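A hand-made illustration of what a mutation testing tool automates: change one operator, rerun the tests, and see whether anything fails. The function, the mutant, and the two "suites" below are all invented for the example.

```python
def is_adult(age):
    return age >= 18


def is_adult_mutant(age):
    return age > 18  # the mutation: >= changed to >


def weak_suite(func):
    """Checks only values far from the boundary."""
    return func(30) is True and func(5) is False


def stronger_suite(func):
    """Adds the boundary case that the weak suite never exercised."""
    return weak_suite(func) and func(18) is True


# The mutant survives the weak suite: the tests say nothing about age 18.
assert weak_suite(is_adult) and weak_suite(is_adult_mutant)

# The boundary test kills the mutant while the original still passes.
assert stronger_suite(is_adult) and not stronger_suite(is_adult_mutant)
```

A surviving mutant points at behavior the tests do not pin down; it cannot point at behavior the code never had in the first place.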
Yeah I should probably use a tad of mutation testing
I'm partial towards MT personally. PBT is hard and seldom applicable imo. While MT is a ton of work and always applicable.
.gh repo onerandomusername ghretos
https://gh.arielle.codes/ghretos/tree/main/tests this is what I have so far
I just shut my computer down otherwise I'd make some other changes
I plan to write tests for my discord bot soon https://gh.arielle.codes/Monty
it's hard to write a blog post about mocking without pulling in pages and pages of advice about how to write better tests. This is still a draft, so thoughts are welcome: https://nedbatchelder.com/blog/202511/why_your_mock_breaks_later.html
An overly aggressive mock can work fine, but then break much later. Why?
huh
?
I could have tested this example without mocking and without it being finicky. If I have that function I'd want to know that my settings are loaded correctly from a settings JSON file in a directory, just not making it explicitly the home directory.
I don't get why we want to avoid opening a real file, it's basically exactly what you want to test and you don't need to mock. Most people can afford it, and there's tempfile.TemporaryDirectory and the superb standard library for working with paths (os.path.join and so on). I personally prefer to put as much as I can in an area where it can be tested, to make sure there's less of a chance for it to go wrong on user error.
i dont understand. how would you test it without creating a file in the user's home directory?
I meant I'd change it to still search in a directory and load json from a file (that function specifically, the others I'd probably change to use a loaded version of the settings and not care where from), and just change the directory, then for testing, I'd put like /tmp/wherever and it would load from /tmp/wherever/settings.json, and I'd know when given the home directory it would load it pretty much as expected. No mocks.
right, a kind of dependency injection
Exactly, though I'm personally hesitant to call it that with primitives, it doesn't bring up the right idea in my head (very arbitrary tbh). But yea inverting control of where the path for the directory containing the config comes from.
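A minimal sketch of that approach: the directory is passed in, so the test reads a real file from a temporary directory and nothing is mocked. The function names and the `settings.json` contents are assumptions; the blog post's actual code may look different.

```python
import json
import tempfile
from pathlib import Path


def load_settings(directory):
    """Load settings.json from the given directory."""
    with open(Path(directory) / "settings.json", encoding="utf-8") as handle:
        return json.load(handle)


def load_user_settings():
    """Production entry point: the only place that knows about the home directory."""
    return load_settings(Path.home())


def test_loads_settings_from_given_directory():
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "settings.json").write_text('{"theme": "dark"}', encoding="utf-8")
        assert load_settings(tmp) == {"theme": "dark"}
```

`load_user_settings` stays a one-liner, so very little is left untested by skipping it.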
But usually people use dependency injection to get out of testing anything real and in my experience (in regards to anything I want to find out that stuff actually works) it just moves around where I have to test stuff at.
if your point is "why use mocks at all", then this is what I meant above when I said, "it's hard to write a blog post about mocking without pulling in pages and pages of advice about how to write better tests."
Yea that's more or less what I was saying. And fair point. I'm not against mocks in general, just that specific example, which is fair given the context it's written in.
i'd read those, tbf
these days I avoid mocks + monkeypatches if I can - allowing for dependency injection and validated fakes is so much more joy
I'm doing procedural generation, I'm using files as input and both random.seed and np.random.seed are fixed. Output keeps changing. Any obvious ideas I'm missing?
those are the typical causes. is the time a factor? Can you link us to the code?
no, it's not public, thanks though, I'll try to go step by step and see where things start changing.
when you find out, let us know.
I actually can share what the basis was, it's a pretty cool project but I rewrote a bunch. https://github.com/oargudo/orometry-terrains Doesn't help for debugging though...
I was recording timings for functions for optimization purposes and put that into a dict and returned that. Obviously the timings are always slightly different and that changed my control hash.
The other thing I found before that that started the whole thing was that I didn't have the numpy seed set, so that's probably the first thing I fixed and then I got stuck on this other "problem".
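A sketch of the pitfall described above, with a toy generator in place of the real terrain code. The control hash is computed over the deterministic part of the result only, leaving the timing dict out; all names are hypothetical, and only the standard-library `random` is seeded here (the original also needed `np.random.seed`).

```python
import hashlib
import json
import random
import time


def generate_terrain(seed):
    """Hypothetical stand-in for the procedural generator."""
    rng = random.Random(seed)
    started = time.perf_counter()
    heights = [rng.randint(0, 255) for _ in range(16)]
    elapsed = time.perf_counter() - started
    # The timing differs slightly on every run, even with a fixed seed.
    return {"heights": heights, "timings": {"generate": elapsed}}


def control_hash(result):
    """Hash only the reproducible part of the output."""
    stable = {key: value for key, value in result.items() if key != "timings"}
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def test_same_seed_gives_same_hash():
    assert control_hash(generate_terrain(42)) == control_hash(generate_terrain(42))
```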
got it. glad you found it.
Yoo who is active??
there are lots of people here. most will wait for a question or topic to chime in.
Yoo thanks for that answer
If you are testing if something is even, you likely are testing a programming language implementation detail.
For obvious reasons, this is bad. You are coupling to a detail. When you use a prog language, you trust that the language creators tested their own code.
And you do not need to use gherkin or even a BDD test framework to do BDD. You do not even need a unit test framework to do TDD.
If there is some confusion with the middle layer of making your test cases, this is not really a testing problem. More of an organization one now.
Meaning, the design was probably always bad. And it also exists in the prod code, not just test code.
i don't see why isEven() means you are testing the language? People often have production code that wants to know if something is even.
If you are doing input validation, that is fine to test, but you have to name and test the case for that input validation. But purely just testing for evenness, probably coupling to the lang now.
it seems like you read too much into that joke image. it's not directly about testing.
Ok. Joke is joke. You asked though.
I asked about testing isEven(). If you had that function in your code, why wouldn't you test it? Sure, it's easy to imagine it's a one-line function, but that line needs a test.
@timber anchor ^^
Perhaps it would be better for you to answer why it "needs a test"
because i could have written the line incorrectly:
def isEven(x):
    return bool(x % 2)
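To spell out why that one line deserves a test, here is a small sketch that runs the same handful of checks against the deliberately wrong version and against a corrected one. The names `is_even_buggy`, `is_even`, and `passes_spec` are made up for the example.

```python
def is_even_buggy(x):
    return bool(x % 2)  # same logic as the line above: True for odd numbers


def is_even(x):
    return x % 2 == 0


def passes_spec(func):
    """A few examples that pin down what 'even' is supposed to mean."""
    return (
        func(0) is True
        and func(2) is True
        and func(1) is False
        and func(-3) is False
    )


assert passes_spec(is_even)
assert not passes_spec(is_even_buggy)  # the very first check already fails
```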
how would you decide what parts of your code need tests and what parts don't?
Gherkin is just BDD at the functional testing level, meant for business analysts. I've been doing BDD exactly as Dan North explained in his original article, at the unit level, for two years now. Some people are rubbed the wrong way by the Gherkin part and miss the original approach entirely. If you use a unit test framework to do any type of test, you can format them the same way and ideally have them say exactly what you want the code to do while saying very little about how (the level of abstraction varying). I personally heard it from Dave Farley and got it immediately, then got confused by other material and wasn't sure I was doing BDD, because 99% of explanations are for functional testing. The article plus other explanations clarified it once again. I'm not perfect, some tests are definitely crummy and fail to read exactly as specifications or scenarios, but I'm definitely most of the way there, especially recently.
If you're using BDD you don't test it. You say you want a function that will tell you if a number is even. Then you do a few scenarios using it. You name the test case after what is being tested or specified, then name methods like sentences that specify what it should do.
TestIsEvenOdd:
test_should_detect_uneven_numbers_as_odd
What should it do? So give it an odd number have it return false because that's what you want, see it fail because it doesn't meet the specification, go make it be true, or should it really be true? If it shouldn't maybe another developer or person can be like, nope, need a different behavior (as understandings change even in the code we think we want). And so on. You test any function you want, and you test any code that uses it for broader behaviors, maybe I'd test this twice as part of something broader that also has even odd functionality. It's in the first part of this article and I've been doing it for years and now do it for acceptance tests, my naming is just better.
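Fleshed out as runnable code, that outline might look something like the sketch below. Only the class name and the first method name come from the message; the other scenarios and the `is_even` body are assumptions.

```python
import unittest


def is_even(number):
    return number % 2 == 0


class TestIsEvenOdd(unittest.TestCase):
    """Each method name reads as one sentence of the specification."""

    def test_should_detect_uneven_numbers_as_odd(self):
        self.assertFalse(is_even(7))

    def test_should_detect_numbers_divisible_by_two_as_even(self):
        self.assertTrue(is_even(10))

    def test_should_treat_zero_as_even(self):
        self.assertTrue(is_even(0))
```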
I had a problem. While using and teaching agile practices like test-driven development (TDD) on projects in different environments, I kept coming across the same confusion and misunderstandings. Programmers wanted to know where to start, what to test and what not to test, how much to test in one go, what to call their tests, and how to understan...
You sound almost religious when you defer to authority that much.
this is a lot of words, but i don't see how it's bdd. You said, "check that the function returns what you want." That's how all tests work.
It's exactly the origin of BDD though, and part of it. It is testing but explained differently, specifically to help understand test-driven development. It's the flavor of it and how you think about it, whether at the unit level or functional. They're basically identical in what's being done, but if you use BDD there's a heavy emphasis on specifying what it should do while leaving out how it does it, and making it very sentence-like.
i like the "specify what it should do". sentence-like doesn't really appeal to me.
maybe it's a bit unfair, but "BDD" now is associated with intermediate tooling that many people find unproductive.
type fewer words
I'm not sure how to comment on how fair it is or not tbh because I don't think I've seen it explained very well outside of the original article and no one has tried to clarify it until wayyy after the fact and when I started testing largely like this I was like "oh BDD" then , wait no??? Still TDD? Then got it again yesterday when I saw the article.
Yea, it's unfair though for sure, because it's a nice way to think about it. I've been writing code in little examples, basically in a very BDD/functional testing style, and I like the approach. It's fair if anyone doesn't want to do it; what's really important is whether something has tests or not, be it before or after, English-like or not.
we definitely agree that the important thing is to have tests.
I would go one further and say you don't absolutely have to test that either.
It would make sense to approach it from the end user first, and then you can explain why you needed to test if something is even based on the business/user needs.
Example: Equipment must be inspected on matching parity day.
If for some reason some business rule forces you to check for evenness in a unique way then this is going to be a test, yes. But it's more of a contract test to assert that your types can do modulo arithmetic, which isn't about testing if something is even anymore.
If it isn't already in your prog lang, then ok, you can test drive it, but if you are just converting types with builtins and then doing the modulo, it's already tested. There would need to be a far better reason than "we just need coverage".
I think we agree then: if there's a function called is_even(), then we should have tests for it.
or maybe not: "converting types and doing the modulo" is code you can get wrong. You should test it.
@timber anchor #unit-testing message
It's an internal detail, bool(x%2); we may as well check whether it is even valid code (which in something like Java it is not).
If for some reason python changed how it evaluated the truthiness of this, your test will fail despite not changing any of your code.
Closer to language paranoia.
Your tests can inadvertently cover scenarios for evenness/oddness and avoid explicitly testing the output of modulo and how python interprets ints as bools (also known as trust and know the language).
It's not a bad sanity check to assert something is even or odd, but I would not formalize such a thing as an actual unit test.
It's more of just an assert as a sanity check, instead of a unit test.
i'm trying to understand what you are saying. Let's say there's a python function:
def is_even(x):
    return bool(x % 1)
How would you "assert as sanity check"?
Sanity check for your own programming language understanding. If you do not know what this does, and you need to use it, then go ahead and assert it. Or just read the docs.
"go ahead and assert it": can you be very specific? What code would you write where to do that?
does "assert it" mean write a unit test?
do you mean do a manual test of the function?
No. Just validate your own learning. If you call learning manual testing, then sure
so you have no protection against future changes to that function? I would definitely write a test.
@timber anchor how do you decide what functions to write tests for?
I think what they're saying is you're probably developing something where that function's usage is an implementation detail and not something worth testing directly, and it probably isn't something you'd write in practice anyway, since it's a single expression in the language.
can we just accept that this function exists? The question is how you would approach testing it.
Yea I accept that and would assume the end user is a consumer of like a helper library or the standard library.
you mean the caller of this function? How does that affect your approach to testing it?
It doesn't. It's just that the context changes everything because i might write it as a private function as part of something broader and not test it directly.
Like I said, indirectly. If you have a higher up business rule or scenario, and change this function those tests should fail. Example: testing that a particular customer support on-call rota strategy involving an every other day rotation behaves as expected.
But the test doesn't have to cascade that high up either. You can unit test the behaviors closer to isEven.
true, but why not write a test specifically for this function? That's the "unit" in unit test.
This is a common misconception. I recommend looking into this yourself for now
can you explain your perspective to me?
you don't have to if you don't want to.
I wont go into much more detail because there are people who speak on it far better than I do, and it has been done... but the unit is closer to behaviors (perhaps even so far as to say specifically end-user behavior) instead of functions.
Hence BDD
ok, this is where we started, I understand. It's a different approach to testing.
I can give an actual example of this if you want that would explain why someone practicing TDD might skip testing is even or why it's confusing from a certain perspective.
sure
number of days covered in on-call with an every other day strategy.
test: leap year vs non leap year.
it will find problems with is even or odd quickly. 366 vs 365 days
calculating leap years doesn't involve even/odd, but: why wouldn't you also want a unit test for is_even?
I want to recover album photos from a game. There's a cache directory, and these photos are JPEGs of a certain resolution. To do this I need to check if a file is a JPEG (does it have the signature?); think of it as analogous to "is this number even?" My real logic is: take a directory and find photos that are JPEGs of a certain resolution.
So I'd do find_album_photos_in_directory(directory_path: str) and I would place actual photos in that directory and be interested in, will it find a jpeg photo with the right resolution? Will it skip one of a different format like a PNG with the same resolution, each individual function like is even or even some functions to read this information are important, but I wouldn't test them for this problem, I'd hide them. So what I'm saying is, if you need to check if something is even and you have that function, you just skip it, even if you make it easier and less error prone. But I agree completely if we're talking about making a library of functions, a standard library, something where that exact code is what you want people to consume.
It might not need to, but it can. One team can work 182 vs 183 days. Or on leap year, 183 vs 183. The importance is in asserting which team gets 183.
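A small sketch of the on-call example, assuming a hypothetical rule where the first team takes 1 January and the two teams alternate daily. The function and test names are invented for illustration. The even/odd helper stays private and is exercised only through the behaviour that matters.

```python
import calendar


def _is_even(n: int) -> bool:
    # Private helper: covered indirectly by the behaviour tests below.
    return n % 2 == 0


def days_per_team(year: int) -> tuple[int, int]:
    """Return (first_team_days, second_team_days) for an every-other-day rota."""
    total_days = 366 if calendar.isleap(year) else 365
    # The second team works the even-numbered days of the year.
    second_team = sum(1 for day in range(1, total_days + 1) if _is_even(day))
    return total_days - second_team, second_team


def test_first_team_gets_the_extra_day_in_a_non_leap_year():
    assert days_per_team(2023) == (183, 182)


def test_teams_split_a_leap_year_evenly():
    assert days_per_team(2024) == (183, 183)
```

If `_is_even` were wrong, for example `n % 1 == 0`, both tests would fail, which is the indirect coverage being argued for here.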
You are testing leap year vs non leap year though, not isEven
I totally understand testing the user-visible functionality of the product. I agree that's a good thing. It's also good to have tests at smaller granularities. They can include cases that are harder to do at the higher level.
I'm not sure why you think it would be bad to test at the lower level also.
That is the lower level. You can use fakes for all of that if you like
If you are referring to my example anyway
I personally think that it does depend. If I have something I need to do all over the place it would be nice to have a tested function available that I can rely on to do it (even if it's going to be tested in other places indirectly, and I find that when I do this sometimes it makes it easier to put everything together later). But in the example I gave I avoid it because it's like planning ahead, not being so iterative. One time when I did this I started writing these different functions like for is_photo_a_jpeg(file_path) and it felt like I was making publicly available functions to be relied on that I don't need when I should have been starting with a function that solves the problem that I want and testing it. I personally think about the issue as public vs private code for it.
by lower level i meant a test specifically for is_even()
maybe in your example there would be a test for is_leap_year(y)?
the test would be something like two_people_divide_days_in_leap_year
so you wouldn't test is_leap_year directly?
it is an on call rotation, should be at least two people
so you wouldn't test is_leap_year directly?
I think calendar does this already for you. You are testing a dependency?
You wouldn't test leap year checking logic?
Maybe. Not entirely sure
I would. If I have business logic that depends on a year being of a certain type and doing something if it is, I would extract it into its own interface like YearTypeChecker with a method to check what type of year it is, then write an implementation and test it there. For the core business logic I'd use a mock and make it say a certain year is a leap year, or test it with my implementation of the checker injected as a dependency.
If I have business logic dependent on whether the year is a leap year, then I put that in its own class and make a mock to test it. The actual thing would need to use an implementation that checks that leap year logic is at least somewhat right, which is where integration tests come in, or at the very least push it off to some acceptance tests.
i'm not writing a date library. I have a helper function that tells me if a year is leap or not.
(I'm not talking an individual function at this point by the way, just business logic and testing that logic somewhere)
I think if I'm writing code I want to know that it does something when it's a leap year and when it isn't and I don't care about the logic for it actually being one, so it could take a callable to check, or a class that serves that purpose and you could just lie with some given input to check what you want. And I think a leap year example is different because it's something probably directly related to your business logic (If I'm not imagining this wrong) and you would want to know what you pass in actually works somewhat.
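The "take a callable and lie to it" idea could look roughly like this. `on_call_days` and its rule are hypothetical; `calendar.isleap` is the standard-library check, used here as the default so the real logic is still wired in.

```python
import calendar
from typing import Callable


def on_call_days(year: int, is_leap_year: Callable[[int], bool] = calendar.isleap) -> int:
    """Hypothetical business rule: one on-call day per calendar day."""
    return 366 if is_leap_year(year) else 365


def test_leap_year_branch_with_a_stubbed_checker():
    # The stub "lies" so the test controls which branch runs.
    assert on_call_days(1999, is_leap_year=lambda year: True) == 366


def test_non_leap_year_branch_with_a_stubbed_checker():
    assert on_call_days(2000, is_leap_year=lambda year: False) == 365


def test_real_checker_is_used_by_default():
    assert on_call_days(2024) == 366
    assert on_call_days(1900) == 365
```

The first two tests pin down what the business logic does in each case; the last one checks that what gets passed in by default actually works.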
"you would want to know what you pass in actually works": it sounds like you would write a test for is_leap_year()
In this case yes but where varies and directly at all varies too by what I'm making.
I have tests that need some setup, e.g. loading a JSON schema. I've set these up as fixtures in the module and that's fine. However, I need the same fixture functions for several test modules, but with different file input. To DRY, I'd like to move these fixtures to conftest.py.
Is there a way to parameterize fixtures in conftest.py where the parameter is actually provided by a test file/module?
Can a test fixture with @pytest.fixture(scope="module") get access to a variable which a specific module sets? -- Because scope="module" in conftest.py means per test-file, not per-conftest, right?
Google suggests that I can create a common fixture that returns a function that does the heavy lifting. This way I can give a per-file input to the reused fixture.
It ended up something like this:
# ---- conftest.py ----
@pytest.fixture(scope="session")
def fn_schema():
    """Return a function that reads a schema file."""
    def schema(filename: str|Path) -> dict:
        with open(ROOT / "schemas" / filename, "r") as f:
            return json.loads(f.read())
    return schema
# ---- test_something.py ----
DUT_NAME = "user_data"
@pytest.fixture(scope="module")
def schema(fn_schema):
    return fn_schema(DUT_NAME)
def test_example(schema):
    ...
Then I realize that the fn_schema() doesn't create any value as fixture since the schema() fixture is required. It could as well just be a regular imported utility function.
right, schema can just be a regular function. Even fn_schema might not be providing much value, since it doesn't cost much to read the file each time.
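As a sketch of the "regular function" option: a plain helper that any test module can import and call. The `ROOT` value and the `load_schema` name are assumptions for illustration; the `schemas/` layout follows the snippet above.

```python
import json
from pathlib import Path

ROOT = Path(".")  # assumption: the directory that contains "schemas/"


def load_schema(filename: str, root: Path = ROOT) -> dict:
    """Read and parse one schema file; cheap enough to call once per test."""
    return json.loads((root / "schemas" / filename).read_text())
```

A test module would then call `load_schema(DUT_NAME)` directly, or wrap that call in its own small fixture.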
The reason schema() is a fixture is because it's used a lot in the test functions. So to avoid repeating schema = fn_schema(DUT_NAME) in every function.
Scope isn't terribly important for that
I found indirect= as an option to parametrize and am looking into whether that is a more elegant method
This works, although the repeated @pytest.mark.parametrize(...) quickly gets very tedious:
# ---- conftest.py ----
@pytest.fixture
def schema(request):
    with open(ROOT / "schemas" / request.param, "r") as f:
        return json.loads(f.read())
# ---- test_something.py ----
DUT_NAME = "user_data"
@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def test_user_data(schema):
    ...
@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def test_user_data_2(schema):
    ...
# This works:
schema = pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
@schema
def test_fn(schema):
    ...  # This works
# This doesn't work
@pytest.fixture
@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def local_schema(schema):
    return schema
def test_fn2(local_schema):
    ...  # This doesn't work
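On the earlier question of whether a module-scoped fixture in conftest.py can read a variable set by the test module: pytest exposes the requesting module as `request.module`, so one possible sketch is below. `ROOT` and the `load_schema_for` helper are assumptions for illustration; `DUT_NAME` is the module-level variable from the snippets above.

```python
import json
from pathlib import Path

import pytest

ROOT = Path(".")  # assumption: the directory that contains "schemas/"


def load_schema_for(module, root: Path = ROOT) -> dict:
    """Load the schema named by the module-level DUT_NAME variable."""
    return json.loads((root / "schemas" / module.DUT_NAME).read_text())


# ---- conftest.py ----
@pytest.fixture(scope="module")
def schema(request):
    # request.module is the test module that asked for this fixture.
    return load_schema_for(request.module)


# ---- test_something.py ----
# DUT_NAME = "user_data"
#
# def test_user_data(schema):
#     ...
```

With this shape each test file only declares `DUT_NAME`, and no per-test `parametrize` decorator is needed.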
By exposing that JSON schema you're making your tests unnecessarily complex. Maybe you could write your tests from the perspective of the user of the code, calling natural entry-point methods?
can decorate the fixture if needed
@pytest.fixture(params=["user_data"])
def user_schema(request):
    return json.loads((ROOT / "schemas" / request.param).read_text())
def test_user_data(user_schema):
    ...
Otherwise, I'd write a helper to determine the schema based on some info. i.e. for a web request, you can use the path + openapi spec to find the json schema
I wish there was an ability to pattern match the module name on the command python -m unittest discover -s "longmodulename.modulenameblah*" -p test.py
I can match on the filename, but not the module 
afaik you can with pytest
with -k
Hi, so I made a click app that when you run it, runs a function with some prints and inputs like:
# in cli.py
@click.command(name="start")
def start():
    main()
cli.add_command(start)
# in cli_app.py
def main():
    raw_to_parse = input(textwrap.dedent(
        """
        Welcome!
        Do you want to start?
        (Y)es [default]
        (N)o
        """
    ))
    to_parse: bool = True
    if raw_to_parse.lower() not in ["", "y", "yes"]:
        to_parse = False
    if to_parse is True:
        get_pattern()
How would I test this? CliRunner.invoke() doesn't seem to handle inputs in the function itself.
i'm not sure, but perhaps if you use click's utilities for input, their test tools would work with it? They have click.prompt() and click.confirm()
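Independent of click, one standard-library option is to patch `builtins.input` while calling the function directly. The sketch below assumes the yes/no decision is pulled out into a hypothetical `wants_to_start()` helper; it is not the original `main()`.

```python
from unittest.mock import patch


def wants_to_start() -> bool:
    """Hypothetical extraction of the yes/no prompt from main()."""
    raw_to_parse = input("Do you want to start? (Y)es [default] / (N)o: ")
    return raw_to_parse.lower() in ["", "y", "yes"]


def test_empty_answer_defaults_to_yes():
    with patch("builtins.input", return_value=""):
        assert wants_to_start() is True


def test_no_answer_declines():
    with patch("builtins.input", return_value="N"):
        assert wants_to_start() is False
```

Keeping the prompt logic in a function that returns a value makes it testable without going through the CLI runner at all.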
Am I tripping, or does IPython specifically not run this test in CI? https://github.com/ipython/ipython/blob/9.8.0/tests/cve.py since its file name doesn't start with test_.
On a recent CI run: https://github.com/ipython/ipython/actions/runs/19890255242/job/57007060466 I don't see cve anywhere in the logs
This seems like a very good argument for including tests in coverage, but they do include it in coverage, it just doesn't error when it's not 100%
sounds like a good issue to write
i'm just making sure I'm seeing it right
would be really silly to make a PR fixing it and it's like "we're obviously running the cve test in super-cool-separate-cve-runner"
at a quick look, i don't see a thing that runs it, and the commit that added that file didn't change any test-running.
alright, thanks for the reassurance
the test thankfully still passes, otherwise I would've thought it was intentionally named this way to unfix the CVE later
(kinda sus that the only test this happened with is a security related one)
sorry, I'm a bit over-paranoid
I'd say "diligent and detail-oriented"
ugh, codecov has geoblocking... and also doesn't like requests over Tor...
I have to spam "new circuit for this site" just to see the coverage. smh
A cool example, in the wild, of why I like coverage on my test files right there.
So, like I have a problem when trying to run hatch test and I am not sure what I can do to fix it.
Click here to see this code in our pastebin.
Hi everyone! I have a question about the concept of writing tests. The F.I.R.S.T principles require writing tests at the same time as a function, or even before it, but what about reality? Sometimes you don't want to write a test simply to check whether a value or None is returned, or to write a test before a certain function exists.
So how to correctly implement T - Timely part of principles in real-world development?
To me, Timely doesn't mean the test should exist before the code. It means the tests should be added to the project when the code is added to the project. "Added" could mean a pull request, or a work item, or whatever. The new code or fix isn't done until there are tests to go along with it.
ty for your answer, now it makes sense! btw I read your article about mocks today. I found it really useful for my situation, and thanks to it I understand the bigger picture. Appreciate your work!
A useful class that is hard to test thoroughly, and my failed attempt to use Hypothesis to do it.
Beautifully written, as usual. Just out of curiosity, how long did it take you to write that? (Excluding the programming stuff; I'm just curious about looking at a sentence and saying "hm, could that be clearer")
that was probably an hour, then a long walk, then 20 min of editing? These days I ask claude for critiques of drafts, and take ~half its suggestions
ooh sneaky
I think you mean, using all of the tools available to me
YES THAT'S CHEATING
I'm really confused...
Why does a test with the name test_lorum_ipsum_update pass, but when I change it to test_update_lorum_ipsum it fails?
changing the name won't do it. something else is going on. Can you show us the passing and the failing code?
perhaps there are two tests with the same name? test frameworks might ignore the second one
The assertion that's failing when I have the test named as test_update_lorum_ipsum is:
mock_mongo_document.find_one_and_update.assert_called_once_with(...)
The fail is:
AssertionError: Expected 'find_one_and_update' to be called once, Called 0 times
Then, when I change the test name to test_lorum_ipsum_update it passes.
There are no other tests with the failing test name. That was my first thought
Renaming could change the order the tests are run. Perhaps your tests are not isolated from each other
The test class that it lives in inherits from unittest.IsolatedAsyncioTestCase
you're certain that when it passes, it actually runs (as opposed to being skipped)?
I guess your future looks like: simplify your tests bit by bit until you discover the bit that is breaking things
I don't believe it's skipping since it says it passed. I could run the test through the debugger and see what all is happening
I don't know what IsolatedAsyncioTestCase does, but there are lots of ways for tests to be accidentally coupled to each other.
I'd put 5/0 or some other easy-to-type thing that is guaranteed to raise an exception
@river pilot @marsh raft - okay, so if I comment out the test above it which mocks out the function that I'm testing later, it passes with the name that I want (test_update_lorum_ipsum).
can you share the code of that test you just commented out?
It runs your test method in a new event loop, same design as the original class but that
Here's the test class' setUp:
def setUp(self):
    self.service = AttendanceService()
The first test:
async def test_handle_absence(self):
    { ... }
    self.service.update_attendance = AsyncMock()
    self.service.update_attendance.return_value = AttendanceModel(...)
    { ... }
    self.service.handle_absence()
    self.service.update_attendance.assert_called_once_with(...)
This passes...
Second test:
async def test_update_attendance(self, mock_document):
    mock_document.find_one_and_update = AsyncMock()
    mock_document.find_one_and_update.return_value = AttendanceModel(...)
    test_result = await self.service.update_attendance(...)
    mock_document.find_one_and_update.assert_called_once_with(...)
What is cleaning up the mocks? Something needs to undo them at the end of the test.
or, what makes mock_document, and what uses it?
I hope AttendanceService isn't a singleton....
more ideas ^^
It is a singleton
does that mean that all of your tests are using the same one? That will be a problem.
Yeah, which now makes sense with me not having a cleanup
I recommend not using singletons: https://nedbatchelder.com/blog/202204/singleton_is_a_bad_idea.html
Design patterns are a great way to think about interactions among classes. But the classic Singleton pattern is bad: you shouldn't use it and there are better options.
I agree with you. The client that I'm working for uses them, so my hands are tied.
ok, at least the root cause has been found. Cleaning up the mocks should fix it.
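A rough sketch of what "cleaning up the mocks" can look like when the service is a singleton. The `AttendanceService` below is a hypothetical stand-in, not the client's class; the point is that `patch.object` used as a context manager undoes the replacement when the block exits, so the next test sees the real method again.

```python
import unittest
from unittest.mock import AsyncMock, patch


class AttendanceService:
    """Hypothetical stand-in for the singleton service."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    async def update_attendance(self):
        return "real"

    async def handle_absence(self):
        return await self.update_attendance()


class TestAttendanceService(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.service = AttendanceService()  # same object in every test

    async def test_handle_absence(self):
        fake = AsyncMock(return_value="fake")
        # The patch is removed automatically when the with-block ends.
        with patch.object(self.service, "update_attendance", new=fake):
            self.assertEqual(await self.service.handle_absence(), "fake")
        fake.assert_awaited_once_with()

    async def test_update_attendance(self):
        # Runs after test_handle_absence and still sees the real method.
        self.assertEqual(await self.service.update_attendance(), "real")
```

Assigning `self.service.update_attendance = AsyncMock()` directly, as in the first test above, leaves the mock on the shared instance, which matches the order-dependent failure described in this thread.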
Thank you and @marsh raft for being my rubber ducks 
Ran into a bug I can't get to the bottom of, maybe someone smarter than me can figure it out. The following test produces this error in python 3.10 - 3.11, but not 3.12 or later:
NameError: name 'isclose' is not defined
def test_shear_wall():
    file = "/Users/villager/Projects/pynite/Examples/Shear Wall - Basic.py"
    exec(open(file).read())
- The script makes use of the math.isclose function: from math import isclose
- I cannot reproduce the error in a smaller example.
- There is no error running the file directly, or via exec. So I expect pytest is somehow related.
- More details: https://github.com/JWock82/Pynite/pull/301
I cannot reproduce the error in a smaller example.
What does that mean? It doesn't happen if you remove everything past the last isclose call in that file?
Also: Do you have more traceback than just the NameError? Which line does it error on?
I don't think I can see more traceback because I'm running it through exec. GitHub action log here: https://github.com/JWock82/Pynite/actions/runs/20359819373/job/58618190836#step:6:115
I will try removing everything past the last isclose call.
I would remove things until it no longer occurs. Does it happen when referring to isclose at all after the import? Does it even happen in the script file itself (or in something it calls)?
Thanks! That was helpful, it's happening at the list comprehension. This also throws an error:
n = len([node for node in model.nodes.values() if height])
NameError: name 'height' is not defined
I'll keep poking at it.
it sounds like some odd scoping thing
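One likely explanation, offered as an assumption since the thread does not confirm it: when exec() runs inside a function without explicit namespaces, the script's top-level names land in the function's locals, while a comprehension body (a nested scope before Python 3.12) looks them up as globals. Python 3.12 inlines comprehensions, which would fit the version split. A sketch of a workaround is to give exec() a single namespace dict; `run_script` and the sample script are hypothetical.

```python
import tempfile
from pathlib import Path

# Stand-in for the real example script: imports a name, then uses it
# inside a list comprehension.
SCRIPT = """\
from math import isclose
height = 3.0
matches = [x for x in (1.0, 3.0) if isclose(x, height)]
"""


def run_script(path: str) -> dict:
    """Execute a script with one dict serving as both globals and locals."""
    namespace = {"__name__": "__main__"}
    exec(Path(path).read_text(), namespace)
    return namespace


def test_script_runs_when_exec_gets_one_namespace():
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "example.py"
        script.write_text(SCRIPT)
        result = run_script(str(script))
    assert result["matches"] == [3.0]
```

The standard library's `runpy.run_path()` is another way to run a script file in a fresh module namespace.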
anyone have an idea on how i can write unit tests for this?
https://github.com/CheetahDoesStuff/sleet
i fear that they will change / affect the project's environment (installing/deleting packages, writing commits etc) as that is what it's built for and those are the features i would need to test
i haven't looked at the code, but it sounds like you could create a temporary directory and do everything there, checking the results.
hmm, i guess that would work
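A minimal sketch of the temporary-directory idea. `write_lockfile` is a hypothetical stand-in for a command that modifies a project, not something from the linked repository; the test hands it a throwaway directory and checks the results there.

```python
import tempfile
from pathlib import Path


def write_lockfile(project_dir: Path, packages: list[str]) -> Path:
    """Hypothetical stand-in for a command that modifies a project."""
    lockfile = project_dir / "packages.lock"
    lockfile.write_text("\n".join(sorted(packages)) + "\n")
    return lockfile


def test_write_lockfile_only_touches_the_temporary_project():
    # Everything happens inside a directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp)
        lockfile = write_lockfile(project_dir, ["requests", "click"])
        assert lockfile.read_text() == "click\nrequests\n"
        assert [p.name for p in project_dir.iterdir()] == ["packages.lock"]
```

With pytest, the built-in `tmp_path` fixture provides the same kind of per-test directory without the explicit context manager.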
I am currently taking over a pretty big code-base that isn't in Git, nor is it test-covered. What I would really like to do is import individual pieces of code into Git along with the tests I write. This is kinda hard to do with a big, convoluted code-base, and I am wondering if you know of a way to run pytest on the Git index (cached changes) whenever those change? I.e. I do git-add and pytest runs on whatever is HEAD + index at that time and gives me the output? Sort of like what pre-commit can do pre-commit…
why not put all of the code into git now? I don't understand how git-ness and tested-ness are connected.
It's me trying to make sense of the big thing by carving out batches at a time.
maybe I just want pre-commit
i wouldn't run pytest in pre-commit, it could be much too slow.
Well, I agree, but maybe this is precisely what I need right now?
I think doing this on the index is hard. Maybe you could settle for doing it based on commits? Then you could use some combination of git worktree to have a second, linked checkout of the same repo which is on that same branch (but doesn't have any uncommitted files), entr (e.g. to watch HEAD), and a little script that pulls and runs pytest in the second checkout whenever you make a commit.