#unit-testing


proud nebula
#

We slice the stack in a different direction.

sturdy plaza
#

Hello, all. I have an interesting problem to which I got a suboptimal solution. I'd like to know if you can give me some ideas on it.

I have a legacy class whose structure is like this:

class SomeClass:
    def process(self):
        try:
            <some code>
            self.create_indicators(some_parameters)
            <some code>
        except Exception as e:
            return False
        return True

    def create_indicators(self, params):
        <some_code>
        my_interesting_var, refs = self.create_my_interesting_var(params)
        SomeOtherClass.send_away_my_interesting_var(my_interesting_var)
        <some_code>

So, as you can imagine, I want to inspect my_interesting_var when testing with unittest.TestCase. The only way I could do it was changing SomeClass code to this:

class TestException(Exception):
    def __init__(self, message, obj):
        super().__init__(message)
        self.obj = obj

class SomeClass:
    def process(self):
        try:
            <some code>
            self.create_indicators(some_parameters)
            <some code>
        except TestException as e:
            raise TestException("Error processing my_interesting_var", e.obj)
        except Exception as e:
            return False
        return True
#

And then I created the test like this:

    @patch("mymodule.SomeOtherClass.send_away_my_interesting_var")
    def test_create_report_create_web3_indicators(self, mock_mymodule):
        # Make mocks:
        mock_mymodule.return_value = MagicMock()
        mock_mymodule.return_value.send_away_my_interesting_var.side_effect = lambda my_interesting_var, source_name=None: (_ for _ in ()).throw(TestException("my_interesting_var", my_interesting_var))
        my_interesting_var = None

        template = mymodule.SomeClass(some_inits)
        try:
            template.process()
        except TestException as e:
            my_interesting_var = e.obj

        self.assertEqual(my_interesting_var, something)

The problem is, I needed to change SomeClass code so I could test it.
Is there a way to test this variable without changing the legacy code?
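One possible answer, as a minimal sketch: a patched method is a MagicMock, and the mock records the arguments it was called with, so the value can be read back from call_args without touching the legacy class. The class bodies below are hypothetical stand-ins for the placeholders in the post, and {"doubled": 42} is an invented value for illustration only.

```python
import unittest
from unittest.mock import patch


class SomeOtherClass:
    """Hypothetical stand-in for the real collaborator."""

    @staticmethod
    def send_away_my_interesting_var(my_interesting_var, source_name=None):
        raise RuntimeError("would talk to the outside world")


class SomeClass:
    """Hypothetical stand-in for the legacy class; process() swallows errors."""

    def process(self):
        try:
            self.create_indicators({"value": 21})
        except Exception:
            return False
        return True

    def create_indicators(self, params):
        my_interesting_var = {"doubled": params["value"] * 2}
        SomeOtherClass.send_away_my_interesting_var(my_interesting_var)


class ProcessTest(unittest.TestCase):
    def test_interesting_var_is_sent_away(self):
        # Replace the method with a MagicMock for the duration of the block.
        with patch.object(SomeOtherClass, "send_away_my_interesting_var") as send_mock:
            self.assertTrue(SomeClass().process())

        # The mock recorded the call, so the argument can be inspected directly.
        send_mock.assert_called_once()
        args, kwargs = send_mock.call_args
        self.assertEqual(args[0], {"doubled": 42})
```

With the real module, the same idea should work with the @patch("mymodule.SomeOtherClass.send_away_my_interesting_var") decorator from the post: the mock passed into the test method is the patched method itself, so its call_args can be read after process() returns.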

proud nebula
molten hollow
# sturdy plaza Hello, all. I have an interesting problem to which I got a suboptimal solution. ...

Hello, all. I have an interesting problem to which I got a suboptimal solution. I'd like to know if you can give me some ideas on it.
My suggestion would be: don't try to retrofit tests onto already existing code. You won't improve the design of your code that way. It's also poor as a regression test, because the test won't cover what's supposed to happen, only what does happen.

So if you want to cover that piece of your feature with a test, start with the test, test-drive a new class, and then replace the old class with the new one.

pearl cliff
#

First of all what is "supposed" to happen is very very often "whatever already happens"

#

Second, if you don't test for existing behavior parity first, debugging can become a nightmare

molten hollow
#

First of all what is "supposed" to happen is very very often "whatever already happens"
I see the confusion. When I said "supposed to happen", I meant the expected behaviour or outcome, and when I said "what does happen" I meant the implementation details. A good test should specify the expected behaviour, leaving the implementation free to vary. If you start with the code and try to retrofit tests to it, you don't get tests that check the expected behaviour; you get tests that check implementation details.

#

Second, if you don't test for existing behavior parity first, debugging can become a nightmare
Sure, but if you start with tests, the debugging needed is minimal, because projects like that tend to have very few bugs.

proud nebula
molten hollow
#

You can do black box testing either test first or test after, and doing black box testing test-first always yields better results than test after. As to the "very reasonable behaviour", I wouldn't be so sure, I would say it's only slightly better than manual tests. If I were to order testing strategies from the best to worst, then at the very end would be manual tests, and one tick before that would be tests written after the fact. Any kind of test written before the code will give you a better design, better coupling and cohesion, and most people report feeling way better when working with systems like that.

I guess the reason people don't do that is that they find themselves working on legacy systems without tests, so they very rarely get a chance to test-drive something for real. They are forced to either retrofit tests to existing code (which gives very low quality) or write new code and test-drive that (which is also often difficult). If someone finds himself in this situation, like imagine you work in a project like that for 2-3 years and you can't get away from it, it's very painful to admit to oneself "I'm doing bad testing, because the project forces me to". It's much easier to rationalize it by saying "I'm doing test after the fact and that's good".

#

I propose to you - take any person, show them either a very well written project with tests or a new project, show them how nice and easy it is to work in an environment like that, and 99% of them will never write tests after the fact again - but they need to experience it first.

#

even "snapshot testing" and it's a very reasonable behavior and might or might not test implementation details.
I agree that it's popular, but it's not good. What you've achieved with a snapshot test is, most often, a test that couples to the implementation. It doesn't give you the same freedom to refactor your internals as a regular test written before the fact.

proud nebula
#

Black box testing is great when it's the right tool. TDD is great when it's the right tool. No tests at all is great when it's the right situation. Everything has context.

molten hollow
# proud nebula Black box testing is great when it's the right tool. TDD is great when it's the ...

Black box testing is great when it's the right tool. TDD is great when it's the right tool. No tests at all is great when it's the right situation. Everything has context.
🧐
This argument can be used to defend any bad idea. Like LCD or OLED screens are better than old CRT monitors, but you could then say "LCDs are good when they're the right tool and CRTs are good when they're the right tool". That doesn't mean anything. The context for using a CRT is so narrow it doesn't make sense to recommend it to anyone, same as test-after code.

proud nebula
molten hollow
#

The topic came from @sturdy plaza, who showed some code and asked how to retrofit tests onto existing code.

#

I suggested that it's a bad idea altogether, and it would be much better to drive the design of the code from tests. You achieve much better results that way. By retrofitting (or with snapshot tests), you couple your tests to the implementation. They assert that the code is the code that is there, but they don't improve the design of the system (like tests-first would) and aren't flexible enough to allow a substantial refactor. Tests-after will break when you refactor your code, because they couple to the implementation.

#

Tests written before will allow a refactor, because they specify the intent, not the implementation.

river pilot
proud nebula
molten hollow
# river pilot if you already have code, and you don't yet have tests, you should write tests....

Well, yes and no. You're right in some areas.

If you write your code first, you don't generally think about it being testable, because it's already hard to get it to work. So we tend to create software that's not testable, poorly designed and tightly coupled. It's hard to test that code.

On the other hand, if you write your test first, that drives you to creating software that is more testable, because you can't do it any other way 😄 The resulting code is much more testable and, by definition, decoupled. It's usually better designed, because when writing the test you're thinking about what you'd like to achieve, not implementation details.

The former is undesired, and the latter is highly desired.

#

Now, how do you achieve the second, if you already have the first? 🤔

proud nebula
molten hollow
#

The sad part is, you can't. The design decisions are already made, so you can't retrofit proper tests to that code. What you can do, if you have like 100 "old classes" (classes without tests), is take one of them and rewrite it, but with tests. So you have 99 old classes and 1 new class. That one class is an improvement. You use the new class in your code, and when you're done you remove the old one. You do that for all the code in your system, and you have a testable system now.

molten hollow
river pilot
proud nebula
molten hollow
#

He has a class without tests, and he wants to add a test. Of course, you should want the good tests and good design - if not, what's the point? And the best way to achieve that, is to write that class test-first.

#

If you do that, you will have what you wanted - a class with good tests.

#

it's not realistic to say, "just rewrite the whole thing"
I didn't suggest rewrite the whole thing, just this class that he wants tested.

river pilot
proud nebula
pearl cliff
proud nebula
pearl cliff
#

The chance of introducing a new bug, or accidentally missing some tiny feature, it is way too high

#

In an ideal world you have a specification for the program and you can implement that as the test suite, and then factor your code to match the test suite

#

In practice, the specification is whatever the program already happens to do
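For what it's worth, "the specification is whatever the program already does" can be written down as a characterization test: record the legacy outputs for a set of inputs, then require the replacement to reproduce them. This is only a sketch; legacy_format_price, new_format_price and the sample inputs are made-up names and values for illustration, not anything from the code discussed here.

```python
def legacy_format_price(cents):
    """Hypothetical legacy function whose exact behaviour should be preserved."""
    return "$" + str(cents // 100) + "." + str(cents % 100).rjust(2, "0")


def new_format_price(cents):
    """Hypothetical test-driven replacement for the legacy function."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"


# Inputs chosen to include the edge cases that are known about.
SAMPLE_INPUTS = [0, 5, 99, 100, 1999, 123456]

# Step 1: record what the legacy code does today (the "characterization").
RECORDED = {cents: legacy_format_price(cents) for cents in SAMPLE_INPUTS}


def test_new_matches_legacy():
    # Step 2: the replacement must reproduce every recorded output.
    for cents, expected in RECORDED.items():
        assert new_format_price(cents) == expected, cents
```

A test like this only protects the inputs that were sampled, so unknown edge cases stay unknown until someone adds them to the list.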

sturdy plaza
#

Oh, these hints from you all are interesting. I think there are more things here that I can chew for the moment.
I decided to create a side effect and get the arguments passed to send_away_my_interesting_var.
It's not the best solution, I know, but now I see I can use tests to help decoupling and speed up development from now on.
I'm glad it sparked such an interesting discussion! 🙂

molten hollow
#

if i have no tests, and take your advice, i will rewrite the whole thing.
You only rewrite the thing that you want tested. If you want just one class tested, you only need to rewrite that one class.

If you rewrite it with tests but have no way to verify that your new code does the same thing as the old code, that's a problem.
That is also a very serious and real issue, thank you for bringing it up. It's serious, because there are two forces at play: In one corner, we have the "what the code should do" in the other "what the code does". Good tests should specify what the code should do. The good code, should specify what it does.

If you tell me that you "have no way to verify that your new code does the same thing as the old code", to me that means you know what the code currently does (or not), but probably not what it should do. That's a very common issue if you write tests after. Because you implement the stuff, you read the code, and you have no idea what it's supposed to be doing.

The chance of introducing a new bug, or accidentally missing some tiny feature, it is way too high
That is also a very real issue, and exactly the thing you get if you do test-after.

#

In an ideal world you have a specification for the program and you can implement that as the test suite, and then factor your code to match the test suite
In practice, the specification is whatever the program already happens to do
That's not entirely true. No one will give you a specification for a program. You're the programmer, you're in charge of developing the application. What you will receive is the wishes of your customer/client/user: what he wants to do, what work he needs to do, what benefit he wants. How it's implemented/designed/developed is up to the programmers.

#

And thus, as a programmer - you must know what the program should do. If not, you're in big, big trouble.

proud nebula
#

Dnaron. You sound very junior. I don't know if you are, but it sounds like you're green and excited and have read a lot. I've been that person. 25 years ago.

molten hollow
#

Argumentum ad hominem. Thank you.

proud nebula
molten hollow
#

That's not what I'm saying, you're misreading my words.

#

He said he's got a class without tests. He wants to add a test. Thus - he wants to have a class with tests. The best way to have that is to write a new test first, and drive that class back from the test, then remove the old class.

#

The class is not written in stone. You can refactor it, update it, remove it and rewrite it.

river pilot
molten hollow
#

What risks? The thing you mentioned already, are that you don't really know what the class is doing, and you're scared of changing it, because you don't know what might happen.

#

And I agree, that's a bad place to be in.

proud nebula
molten hollow
#

Working in legacy software that does who knows what is stressful.

molten hollow
#

But!

#

If you're in that kind of place, that you don't really know what the software is supposed to do, because it's so bad and old,

#

i'm sorry to say that, but you just aren't able to test it properly. You can't, it's not possible.

#

You can fool yourself into thinking you can blackbox test that,

#

and you can do that, but these tests will not give you any value.

#

they will be slow to execute, break when you refactor, won't catch bugs, won't improve your design, nothing.

river pilot
molten hollow
#

They have like 0.0001% of the value of the tests that would give you 100% if you wrote them test-first.

river pilot
#

they aren't ideal, but we started from a non-ideal place.

molten hollow
#

If you have a legacy code with classes that you have no idea what they're doing, the only thing you can do to improve it, is to learn what the code is supposed to be doing.

#

Not what it does, but what it's supposed to be doing.

#

If you don't have that, you can do all the black box testing you want, nothing good will come from that.

river pilot
molten hollow
#

Let me ask you this then. What good are tests, written by someone who doesn't know what the class under test is supposed to be doing?

river pilot
molten hollow
#

It's like a recipe for a pie, written by someone who doesn't know how to make one.

river pilot
proud nebula
#

He blocked me. So I guess you're on your own ned. Godspeed.

river pilot
proud nebula
# river pilot how can you tell that?

You can't react with emojis on a message of someone who has blocked you. It's a nice funny animation too. The entire window vibrates. It's pretty neat. Confusing as hell though.

molten hollow
# river pilot no one said, "I have no idea what the class is supposed to do"

no one said, "I have no idea what the class is supposed to do"
I think @proud nebula said that:
If you rewrite it with tests but have no way to verify that your new code does the same thing as the old code, that's a problem.
He suggested there might be no way to verify that the new codes does the same thing as the old code. To me - the only circumstance in which that is true, is if you don't know what the code is supposed to be doing.

#

He suggested there are some parts in the code, that do something - but we're not really sure why or how.

river pilot
molten hollow
#

If you do, then what's the problem with writing a new test first, then driving a class from it? 🤔

river pilot
molten hollow
#

Yup.

#

You test-drive a small bit of the system, and you replace the usage of the old version with the new version.

proud nebula
#

(aka YOLO)

molten hollow
#

And you do that with every bit that you want tested properly.

river pilot
molten hollow
#

the problem is that there can be unknown edge cases or side effects.
Back again - if there might, that means you don't really know what the system is supposed to be doing.

river pilot
#

in any case, @sturdy plaza has what they needed.

molten hollow
#

These "unknown edge cases" or side effects, that you speak of - if the application was written test-first, there wouldn't be any, because they would be covered by tests.

#

the problem is that there can be unknown edge cases or side effects
If it's true that there are these edge cases, then doing "blackbox" testing won't help you much either, because that kind of test won't illustrate those edge cases.

river pilot
molten hollow
#

As a sidenote, yes. But I'm also saying how to leave it.

river pilot
molten hollow
#

If you want good tests, you do this:

  • if there are edge cases, find them
  • write a fresh test
  • drive the class from that test
river pilot
#

and test-first doesn't ensure that you've fully tested all of the behavior either.

molten hollow
river pilot
#

even if you wrote the tests first.

molten hollow
#

If it was needed, there would be a test for it that would catch it.

#

If you want some behaviour from a software, you codify it in a test.

river pilot
river pilot
molten hollow
#

and even if the mistake happens, it's caught very quickly.

molten hollow
river pilot
molten hollow
#

You're the author of the test. If you missed the secret behaviour, that means it wasn't needed.

river pilot
#

i get it: tests first is a good way to write better software. but it's not a magic bullet.

molten hollow
#

I never said it was a magic bullet. I just said it was orders of magnitude better than test-after. Of course there are mistakes, but way fewer.

even if it was, the tests could have missed the secret behavior.
When you're writing tests, you're designing your system. If you want your system to do something, because it must, you write a test for it. You don't rely on secret behaviour to simply "emerge" and give you a feature. If you want a feature, you write a test for it.

#

So yes - there might be secret behaviours, but you can remove/change/update them, and if all of the tests pass, you're good to go.

#

In a legacy system, the secret behaviour that's missing might actually be critical - but you don't know it, and you have no way of knowing. If there was a test for that, you would know.

#

I get what you guys are saying. The system has been in production for 10 years, some secret behaviour has appeared that 50% of your users rely on, and you're afraid of changing the code because of that secret behaviour that no one knows about, but if you were to remove it, half of your users would scream. I get that, I've been there.

#

So due to that fear of breaking the secret behaviour, you don't test-drive your app, and you do blackbox/snapshot testing, because that's the only thing that you trust not to break your system.

#

That's terrible code to work with. It's awful. It's stressful, you feel the pressure, you can't change it much, because it's so fragile. You rename a variable and suddenly the pagination doesn't work. That sucks.

#

There is no way out of this, other than to properly design your system. You need to start improving it, if the system is going to be developed for years to come. To improve it, you need to know what it's supposed to be doing. You can introduce a small change and roll it out to QA or a small number of people, to verify that you didn't break anything. You can push it to another environment, to ask someone who knows the system whether the part you touched still works.

#

I suggest you watch Kent Beck's talk "The Forest & The Desert Are Parallel Universes": https://www.youtube.com/watch?v=dtu9Ks2CN-U

Beauty in Code 2025 was a single-track full day IT-conference organized by Living IT, featuring six amazing speakers. It was hosted at the Malmö Live conference center on March 1, 2025.

https://beautyincode.se
https://livingit.se

Session 6 of 6 by Kent Beck (@KentBeck)
"The Forest & The Desert Are Parallel Universes"

So close and yet so far....

pearl cliff
#

The truth is that you need both

#

And you need to be pragmatic about what you do, and what order

#

Most of the time, it's safer to take the approach of gradually building up tests surrounding existing functionality and building up tests around desired/specified functionality

proud nebula
#

Ok, someone tell him no one meant that the blackbox tests should be kept for all eternity. I think he thinks that's what we're all saying.

#

(I hate how stupid blocking in discord is)

pearl cliff
#

I guess I'm taking the approach that blackbox tests can be an absolute fucking nightmare and sometimes you actually want to write unit tests for existing code

#

Like you really need all three

pulsar oracle
#

Peak TDD is when you drive the design of what you're building with clean interfaces, you fundamentally make it easy and comprehensive to test. If you care about actually testing that your code does what it is supposed to and consider it mandatory then you design it in the easiest way to get there. But I don't think tests after the fact in all situations are bad. You don't need TDD to write testable code, sure it's probably leagues better if you lean into it but some stuff is obvious with what behaviors it should have and it's fine to put ones on after.

pearl cliff
#

But here a user wanders in and asks us how to write a unit test for an existing class. The answer can't be to spend several developer weeks or even months building out a sophisticated test infrastructure

pearl cliff
#

@molten hollow I think where your approach makes more sense is in a big team

#

I do not get the sense that this person is in a big team but I suppose I should've checked

molten hollow
#

it's safer to take the approach of gradually building up tests surrounding existing functionality and building up tests around desired/specified functionality
"Safer" as in less chance of breaking secret behaviour? Yes.
"Safer" as in it lets you safely change the code, introduce new feature, fix bugs, refactor? No.

But I don't think tests after the fact in all situations are bad.
To me, writing tests after the fact has all the disadvantages, and no advantages. I'm sure, that if I jumped into your project, all tests I would've written would've been test first. There are ways to do that, that you can learn, and there are obstacles to that, but they can be dealt with.

You don't need TDD to write testable code
That's right, but if you rely on your judgment to create a testable code, that's just an untried guess. Sometimes it'll work, sometimes won't. And you end up with untested code, in some proportion.

I guess I'm taking the approach that blackbox tests can be an absolute fucking nightmare and sometimes you actually want to write unit tests for existing code
Sure, you might want. The question is - what do you hope to achieve by that?

But here a user wanders in and asks us how to write a unit test for an existing class. The answer can't be to spend several developer weeks or even months building out a sophisticated test infrastructure
I never said that. You can do that in a couple of minutes.

And you need to be pragmatic about what you do, and what order
I see how you call yourself "pragmatic" and me "idealistic", but maybe we can drop these unhelpful words? What you call "idealistic" is, to me, the day-to-day job that I've been doing for many years now. And what you call "pragmatic" feels to me like being in the worst possible situation, one that I would want to improve quickly if I found myself in it. So, how about we keep it civil. If you feel my advice doesn't cover some case, please bring it up in a peaceful manner, and we can talk about it.

pulsar oracle
# pearl cliff @molten hollow I think where your approach makes more sense is in a big t...

I work by myself and I use TDD if what I'm doing isn't exploratory. Either you write and use and expect other people to use your code or application on the trust me bro model, or you add tests to your code to verify it does what you want, stuff can be as simple as a lambda. Design to make it testable, you don't even gotta write a test to do this either, even just "when in doubt write testable code" does wonders. The code isn't done unless there's tests, at least isolate the important pieces and do those, and in that case why not write them first.

river pilot
#

@molten hollow "You can do that in a couple of minutes." The discussion will be more helpful if you acknowledge that it might take more than a couple of minutes. You are stating things in very stark terms.

molten hollow
#

If your code is coupled to some framework, if it's nondeterministic, if it's got a lot of dependencies, if it's badly designed - yes. But these are all code smells. If you stop thinking about "how can I test this already existing class", and think of it in terms of "I need a class that does X", then it's very simple, and very doable in a couple of minutes.

river pilot
molten hollow
river pilot
molten hollow
#

I gotta be honest - if it took me hours to create a test, I would be hellishly tired and would probably stop doing that. But it's quick and easy, provided you don't slow yourself down with code smells.

pulsar oracle
#

If the class is supposed to produce some sort of json file for example and there's a lot of stuff to make sure is right maybe where it's not worth bringing in the repository or other data access abstraction pattern.

molten hollow
pulsar oracle
#

I'm just trying to imagine because it's been a while. I don't know tbh. Maybe someone else should provide an example.

molten hollow
#

So if your class is coupled to a framework (uses Spring annotations, Laravel classes, Ruby on Rails stuff), has a lot of dependencies, a lot of static/global state, is coupled to the inputs and outputs, or relies on implementation details, then obviously this class is not testable and it would take hours to test. That's exactly why working with it is hard, even if you introduce blackbox testing to it.

#

And that's exactly why I'm suggesting you should create a new test, drive the responsibility of the class from the test, and then use it in place where the original class was used.

proud nebula
#

ah, to be young and naive again

pearl cliff
# molten hollow And that's exactly why I'm suggesting you should create a new test, drive the re...

Do you have any success stories of doing this? I don't ask to doubt your experience. I more wonder if there are certain situations where this approach does work, which is useful for those of us who uniformly recommend against it.

Many programmers spanning decades have tried to do this many times and failed, which is where the advice you hear comes from. I personally have tried it and it has only ever ended up in me working through the night super stressed out when I could've been sleeping or having fun, and/or having sheepish 1:1s explaining that I badly underestimated the work.

So there's a mismatch between your recommendation and the recommendations of people who feel that they have learned the hard way not to do what you recommend. Maybe that means you have a different and unique perspective.

molten hollow
# pearl cliff Do you have any success stories of doing this? I don't ask to doubt your experie...

Sure!

Do you have any success stories of doing this? I don't ask to doubt your experience. I more wonder if there are certain situations where this approach does work, which is useful for those of us who uniformly recommend against it.
I mean, I managed to do it in every project I joined. I do stumble upon everything you guys describe, big classes, no tests, secret behaviours, all that. What you experience, is real. But I try to address the issues and deal with them. I tried multiple things, and what I suggest here was just the stuff that works for me. I tried blackbox/sandbox testing, and it didn't do it for me.

Many programmers spanning decades have tried to do this many times and failed, which is where the advice you hear comes from. I personally have tried it and it has only ever ended up in me working through the night super stressed out when I could've been sleeping or having fun, and/or having sheepish 1:1s explaining that I badly underestimated the work.
That's definitely true, and that's a real problem. However, I found that it's not intrinsic; it's not like we're bound to suffer. Most problems like that come from very simple things that we can change. Stuff like:

  • we believe people must sign off deploys
  • we believe we must deploy to all of the people at once
  • we don't trust our developers and testers
  • we can't work in pairs because it slows us down too much
  • we should optimise for time spent coding, not talking to people
  • tests are not part of a releasable, so they're not important
  • my manager didn't ask me to refactor, so I can't do that.
  • we must create the whole feature at once in a sprint, we can't split it in chunks

These aren't the only ones; there are more. There are things/assumptions that people hold that sometimes stop them from working in a productive way. The only way for me to address them would be to find them somehow, either by working with your code or by talking to you.

#

Maybe that means you have a different and unique perspective.
It's definitely not unique; I've met many people who do the same thing. Have you tried reading "Working Effectively with Legacy Code" by Michael Feathers?

molten hollow
#

When I said before that "you don't know what your software is supposed to be doing", I'm prepared to accept that may have been a bit rough; people might feel personally attacked. But I didn't mean to attack anyone; that was supposed to be a diagnostic observation. Programmers not being fully aware of what the code is supposed to be doing is a real problem that needs addressing. I just stated it, to put myself in a place where I can deal with the issue somehow. If I were to find myself in a project where I don't know what it's doing, then that's the first, second and third thing I would need to fix. Testing would come later.

#

Test-first is useful precisely because you can't really do it if you don't know what your software ought to do. And I saw that when I suggested it, I got pushback, because some programmers actually didn't know that. So the thing to do now is not to skip test-first, but to learn what the system is supposed to be doing.

river pilot
molten hollow
#

I had a feeling I'm talking to 3 different people, and randomly one of them answers my posts 😄

safe bronze
#

Bro why was i temporarily muted?

molten hollow
cedar wraith
#

How are you supposed to unit test, when you actually didn't implement the function first?

#

Property based testing as well

proud nebula
proud nebula
tired jungle
#

i have exhausted all of the internets, leaving this channel for the last resort. If you google anything, I have tried it

#

even used GPT

proud nebula
tired jungle
#

its a requirement...

#

i dont know how to mock a namedtempfile

#

otherwise, i am doing it by the book

proud nebula
tired jungle
#

work

#

just landed this job and i have little experience with unit testing, otherwise am solid with python overall

odd walrus
#

Sometimes for a thing like a temp file you want a "fake", not a "mock"; they can be easier, worth looking up at least.
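To make the fake idea concrete, here is a rough sketch. The code under test was not shown, so write_report is a made-up example, and FakeNamedTempFile is a hypothetical helper rather than anything from the standard library. The fake is a small in-memory object that behaves like the real file, swapped in with patch.object so the code under test stays unchanged.

```python
import io
import tempfile
from unittest.mock import patch


class FakeNamedTempFile(io.StringIO):
    """Hypothetical in-memory stand-in for a text-mode NamedTemporaryFile."""

    def __init__(self, name="/fake/tmp/report.txt"):
        super().__init__()
        self.name = name
        self.written = None

    def close(self):
        # Keep a copy of the contents, because StringIO discards them on close.
        if not self.closed:
            self.written = self.getvalue()
        super().close()


def write_report(lines):
    """Hypothetical code under test: writes lines to a temp file, returns its name."""
    with tempfile.NamedTemporaryFile("w+", delete=False) as handle:
        for line in lines:
            handle.write(line + "\n")
        return handle.name


def test_write_report_uses_temp_file():
    fake = FakeNamedTempFile()
    # Swap the factory so the code under test receives the fake instead of a real file.
    with patch.object(tempfile, "NamedTemporaryFile", return_value=fake) as factory:
        name = write_report(["a", "b"])

    factory.assert_called_once()
    assert name == fake.name
    assert fake.written == "a\nb\n"
```

If the real code does "from tempfile import NamedTemporaryFile", the name would need to be patched in the module that uses it, not on tempfile itself.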

proud nebula
#

Is the requirement that you can't modify the code to test until after you've tested it? Under no circumstances ever?

#

(Redefining the problem is what makes a good programmer imo)

pulsar oracle
# tired jungle its a requirement...

Are you sure you're not confusing mocking with mandatory unit testing? Because mocking doesn't prove that your code works or necessarily meaningfully achieve what you want. Which in this case looks like you want to load a file from somewhere correctly, which it would be better to take a sample file or produce one (whichever is more convenient) and see if the function has the end result you want.

dense bough
#

Does pytest monkey patch make any guarantees about what __enter__ returns?

swift pewter
dense bough
#

The two different context managers make things a little confusing 😅

molten hollow
# cedar wraith How are you supposed to unit test, when you actually didnt implement the functio...

It's actually not that hard. Because what do you actually need to test a function? You need to know its name, its signature and arguments, and you need to know what its purpose is. So for example, I can imagine a function that parses roman numerals. I don't have the function written yet, but I can write the first test like so:

def test_parse_roman_numerals():
  assert parse_roman('I') == 1
  assert parse_roman('II') == 2
  assert parse_roman('III') == 3
  assert parse_roman('IV') == 4

Having that, I'm free to implement it however I want. You don't need to know the function implementation to write a test for it.
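
One implementation that would make that test pass, as a sketch only (it assumes well-formed input and does no validation); any other implementation that satisfies the test would be equally acceptable:

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def parse_roman(numeral):
    """Convert a Roman numeral string to an int (ill-formed input is not rejected)."""
    total = 0
    for index, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Value of the following symbol, or 0 at the end of the string.
        next_value = ROMAN_VALUES.get(numeral[index + 1 : index + 2], 0)
        # A smaller symbol before a larger one is subtracted (IV = 5 - 1).
        total += -value if value < next_value else value
    return total


def test_parse_roman_numerals():
    assert parse_roman('I') == 1
    assert parse_roman('II') == 2
    assert parse_roman('III') == 3
    assert parse_roman('IV') == 4
```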

swift pewter
molten hollow
#

There are things that are hard to test of course: UIs, concurrency, distributed systems, 3rd party systems. But there are tricks to side-step it, so that most of your code can be test-driven.

molten hollow
cloud shadow
cedar wraith
molten hollow
#

Defining the expected results beforehand, that's correct. But the planning part, not really.

cedar wraith
proud nebula
#

Black box testing is very much after the fact.

molten hollow
proud nebula
cedar wraith
proud nebula
#

All of it is "testing" or (less academically correct, but commonly used) "unit testing".

molten hollow
#

@proud nebula I'm sorry, but that message is quite misleading for new developers.

#

> If you write the tests before you write the function it's TDD
That's necessary for TDD, but not sufficient. Not without other prerequisites.

> If you write it without understanding the function it's black box.
Maybe that's incorrect wording, but I think you mean "without knowing the implementation details"? Because if you truly meant "without understanding the function", then you have no business testing the function if you don't understand it.

> If you use hypothesis it's Property Based Testing.
~~You can use hypothesis in all kinds of testing, not just property based testing, what's the deal?~~
PS: the author didn't mean "hypothesis", he meant "hypothesis library".

> If you use mutmut it's Mutation Testing.
> All of it is "testing" or (less academically correct, but commonly used) "unit testing".
Mutation testing isn't really testing per se. It's a tool to find holes in your test suite. You can't really find bugs with mutation testing, you can only find mutants (live mutations) that weren't caught by the test suite, but that's not a bug. So it's more of a test-suite quality control than a testing strategy. You can't really catch a regression with mutation testing, and you can't drive the implementation (like TDD) with mutation testing. What you can do with mutation testing is improve the reliability of your test suite.

> If you write them after the function exists by looking at the code it's white box. If you write it without understanding the function it's black box.
That separation is very artificial. I haven't worked in a team that used that distinction. In a proper system, you would never need to couple your tests to the implementation of the method, so what's the point of this "white-box test"? "White-box test"/"black-box test" may be cool-sounding names, but what do they really bring to the table?
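
To make the "live mutant" idea from this message concrete, here is a hand-rolled sketch. It does not use mutmut or any other tool, and all names are hypothetical. A mutant is a small change to the code; if the suite still passes, the mutant survived and the suite has a gap:

```python
def is_adult(age):
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-written mutant: '>=' changed to '>'."""
    return age > 18


def weak_suite(func):
    """Passes for both versions, so the mutant survives."""
    return func(30) is True and func(5) is False


def stronger_suite(func):
    """Adds the boundary case, which kills the mutant."""
    return weak_suite(func) and func(18) is True


print(weak_suite(is_adult), weak_suite(is_adult_mutant))          # True True
print(stronger_suite(is_adult), stronger_suite(is_adult_mutant))  # True False
```

The surviving mutant says nothing about whether `is_adult` itself is wrong; it only shows that the weak suite never checks the boundary at 18.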

river pilot
#

I find most distinctions (especially within the testing world) are overdone.

molten hollow
river pilot
molten hollow
molten hollow
river pilot
molten hollow
#

I would call it "auditing your tests", "reviewing your tests", "inspecting your tests" at best.

river pilot
#

is load testing a kind of testing?

molten hollow
#

"Testing" to me means finding bugs.

molten hollow
#

And so security testing, performance testing, etc.

#

But I don't think mutation testing qualifies.

river pilot
molten hollow
#

Same as check style, linters, code quality checks, sanity checks, I wouldn't call any of them testing.

river pilot
#

BUT: what value is added by saying "mutation testing isn't testing"?

#

how does that help anyone?

molten hollow
river pilot
#

this is what i mean about overdone distinctions.

molten hollow
#

Overdone distinctions are bad, but blurring distinct concepts into one also isn't helpful.

river pilot
molten hollow
#

You might as well call "code reviews" testing, because they can help you find problems, but that's not testing.

molten hollow
#

the more mutants are alive, the weaker the test suite.

#

But it doesn't necessarily mean that the app has problems.

river pilot
#

"inspecting the quality of tests" doesn't quite roll off the tongue

molten hollow
#

True. Doesn't make it wrong, tho.

river pilot
#

ok, we have different approaches to all of this I think

molten hollow
#

Notice, that if load test/security test/performance test fails, then that necessarily means there's a problem that requires fixing.

#

With mutation testing, that's not the case.

#

I mean, "testing" is just a category humans impose on practices.

#

you might add and remove elements from that category, if you'd like. the question is whether or not that's useful.

#

@river pilot If you want to say that mutation testing is testing, then things like:

  • coverage
  • linter/checkstyle
  • code review
  • cyclomatic complexity

would also need to be added to the "testing" category.

river pilot
#

This is something I've been thinking about more and more: https://hachyderm.io/@nedbat/115245272539560254

I liked this ladder of understanding the purpose of testing:

Level 0: Testing is debugging
Level 1: Testing is to show the program works
Level 2: Testing is to show the program doesn't work
Level 3: Testing is to reduce the risk of using the program
Level 4: Testing is a mental discipline that helps us make better software

youtube.com/watch?v=BKgdrEPYqmM

molten hollow
#

Level 0: Software development is debugging
Level 1: Software development is to show the program works
Level 2: Software development is to show the program doesn't work
Level 3: Software development is to reduce the risk of using the program
Level 4: Software development is a mental discipline that helps us make better software

#

By that definition "coding == testing".

#

I mean, that's not exactly wrong. If you're using TDD, then basically testing is coding. in a sense 😄

#

so I guess that's all right.

pulsar oracle
molten hollow
#

In my definition, testing is a falsification mechanism. If you can use something to falsify the claim that the app/program works as it's supposed to, then that's a test.

#

If something can give you result: "fix immediately", then that's a test.

#

If it gives you "fix maybe", then that's an audit/inspection, something like that.

river pilot
#

i don't exclude tests from "my program", maybe that's the difference here

#

i encourage people to include their tests in the total coverage percentage, for example.

molten hollow
#

For the same reason I wouldn't say that SEO audits for example are testing.

#

@river pilot I got it!

#

I would say that Mutation Testing would count as measurement.

#

That I would agree with.

#

But not every measurement is testing.

#

Regarding your "doesn't roll off the tongue" 😄 "measurement" sounds good.

#

I'm fine with any kind of measurement giving intermediate results, and what not.

#

But for it to count as test, it would need to give a definitive response.

river pilot
#

you also want the criteria to include what it gives a response about, I think

molten hollow
#

Basically, I would hate for a junior person to come to #unit-testing, read "mutation testing is testing", and think that he can use that to do regression testing, for example. That would mislead him.

river pilot
#

definitely these topics are intricate and subtle enough to need discussion

molten hollow
river pilot
#

no need to point fingers 😄

molten hollow
#

What I like about testing is that it's not open to interpretation. If an acceptance test, unit test, security test, load test, performance test, or integration test fails, that must mean there's something wrong. You can't argue with it.

#

But with mutation testing, seo audits, checkstyles and stuff like that, it's up to the reader to interpret it.

raven igloo
#

and with the advent of AI... I'm finding that TDD is much more enjoyable than before. I write tests and let AI write the code to pass my tests.

river pilot
molten hollow
#

When the test fails in the future, it's already determined what it means.

molten hollow
river pilot
#

this is a repeat of a few days ago: your project seems very different than the ones I have worked on.

molten hollow
river pilot
#

i'm not interested in you telling me i've been doing it wrong.

molten hollow
#

I mean, if you were to choose between:

  • test, that if passes gives you confidence that everything works, and when fails, points you exactly where the issue is

vs.

  • test, that you must read and interpret what it means, and different people might disagree about what the failure means

Which test would be better? Which more useful? Which would make developers work faster and better?

molten hollow
#

I might criticize ideas, concepts, etc. but with people it's much more complicated.

river pilot
#

i work on coverage.py. Its test suite checks that Python code is being measured properly by coverage.py. Python changes from version to version. tests fail. Is it coverage at fault, or Python?

molten hollow
#

Python version change isn't forced on you or on the project, right?

#

When you work on that coverage.py, you need to manually add the new version?

river pilot
river pilot
molten hollow
#

So your goal is to be very up to date with python, but it's not like your project is immediately compatible with python change.

#

Unless python was dependent on your project, that's not happening.

#

You probably version the python version you run your coverage.py, right?

#

So you set it to be compatible with 3.14 let's say for now.

river pilot
#

it supports 3.15 now, and runs nightly against the tip of main of CPython

molten hollow
#

So it relies on something outside of your control, then?

#

Well, then I would handle it the same way as any 3rd-party.

#

Like payment providers, etc.

molten hollow
#

The same way I treat stuff like stripe, oAuth login, any kind of integration with 3rd party.

river pilot
molten hollow
#

Let's say, when a new python 3.16 comes and there is a number of ways it's incompatible with your coverage.py;

#

there is minimum time you need to update your coverage.py, so it's compatible again. Let's say that's 24 hours.

#

For that 24 hours, your goal is not met.

#

You might want to get that number down, to maybe 12-hours or something, but still. You don't control when python is released, and they don't depend on you, so you can only retroactively react to the changes.

#

So when python introduces an incompatible change, it's neither python failure nor your failure.

#

They're just incompatible.

river pilot
#

i don't understand why you are talking about 24 hours, and you haven't talked about how the test failures need interpretation.

molten hollow
#

Python is definitely not at fault, they just released an update.

river pilot
molten hollow
#

Your project is not at fault, because it doesn't control the things it relies on.

river pilot
#

Python is often at fault, that's why i test pre-alphas.

molten hollow
#

Unless you're also a maintainer/contributor of python that can freely update it.

molten hollow
#

Sure, but you don't have control over whether they'll merge it, right? That's what I'm talking about.

river pilot
#

those issues are mostly, "this is what i see, whose fault is it?"

molten hollow
#

It's not like you can merge it yourself.

pulsar oracle
#

I don't get what's being discussed? Are we talking about automated testing for compatibility with the latest python versions nightly or something?

river pilot
#

@molten hollow wherever this is going: do you see how a test failure requires interpretation?

molten hollow
molten hollow
#

But I don't think what you're doing is testing.

river pilot
molten hollow
pulsar oracle
#

What kind of test failure is it and why does it require interpretation? If it fails and isn't compatible with the latest version, isn't that a concrete test that says "we're invalid" or something, if that's our goals. Or is it like, it could break externally for some arbitrary reason, and it's flaky so it's not really a test?

molten hollow
#

And the part, where you solve compatibility issues with python, I wouldn't call that testing. That's integrating with a new version.

#

It's the same thing as if one of the libraries in my application gets an update, and I want to update it.

#

And let's say it's got a breaking change, that I need to integrate to my app. That's not testing, that's just upgrade.

river pilot
molten hollow
#

Your example is integrating your project with a newer version of python. And that definitely requires interpretation, yes!

pulsar oracle
molten hollow
#

Like, if it fails.. then everyone involved will agree that it fails.

river pilot
#

i agree my test has failed. now i need to determine why and what to do about it.

molten hollow
pulsar oracle
#

feels like some sort of exploratory test, a test still. Are we compatible with the latest python? Fail = no. we've got our result, what do we do now?

molten hollow
river pilot
#

this is my only point: you said test failures shouldn't require interpretation. Sometimes they do.

molten hollow
river pilot
#

this is a meaningless distinction.

molten hollow
#

Just because you can run it in a testing library doesn't necessarily mean it's a test.

river pilot
#

this is absurd. i'm done.

molten hollow
#

@river pilot

def test_foo():
  print('Hello')

Is this a test in your opinion?

#

You can take any code and put it in a testing library. Does this mean any code is a test? 🤔

#

I can take any hello world app, any function, and wrap it in a pytest test. Does this mean it's now a test?

river pilot
#

i hope you can assume that my tests are not like that.

molten hollow
#

I don't know what they're like, but when you tell me they're open for interpretation, then I'm prepared to say that they're not really tests.

#

Test should be definite, deterministic and not open for interpretation.

#

I can agree that your pytest "tests" are checking your integration with python, I'm fine with that.

pulsar oracle
#

I feel like we're being loose with "open for interpretation" in the example.

molten hollow
#

But given that you have control over your coverage.py, and not over python; then it's essentially an app + 3rd party integration.

#

Let's say I'm creating a webapp, that needs to allow the user to pay for services, and we use stripe to do that. Of course, stripe may be down, and in that case the website displays information "sorry, stripe is down".

#

Is this function "sorry, stripe is down" a test? No, it's not, it's just information for the user that the service is unavailable. Yes, it tells you something that you can use to do something, but it's not a test.

#

Same as your pytest things. Python becomes incompatible with your app, you have something that measures it and lets you know about that, but it's not a test.

molten hollow
# river pilot i hope you can assume that my tests are not like that.

I think what you're doing, are measurements. And they can be open for interpretation.

PS: For them to become tests, you would need to narrow them down to true/false result with exact reason for failure. If they continue to be open for interpretations, then I'm afraid they're still measurements and not tests.

river pilot
#

thank you for demonstrating my point.

proud nebula
# molten hollow I wouldn't say so.

You're now arguing against the common use of established terms.

You are also arguing that you know better what mutation testing is than the author of the most commonly used mutation testing tool for python.

You are extremely arrogant, and refuse to listen, and when you are corrected you argue minor semantic details that are themselves irrelevant until the other part gives up in frustration.

You haven't won any argument here. You have just demonstrated that you are impossible to have a meaningful discussion with, and that you will make every effort to not lose face instead of trying to learn. You have also demonstrated that you are willing to say absolutely idiotic things like "You can use hypothesis in all kinds of testing, not just property based testing, what's the deal? ". https://hypothesis.readthedocs.io/en/latest/ "Hypothesis is the property-based testing library for Python".

What will you argue next? That "python" isn't really a programming language?

At this point you are damaging this channel by your presence.

molten hollow
#

> You are also arguing that you know better what mutation testing is than the author of the most commonly used mutation testing tool for python.
If you're talking about the author of mutmut, he created the tool, but not the practice. Mutation testing was coined by Richard Lipton in the early 1970s. There were many tools created for that later, only one of which is mutmut.
> You're now arguing against the common use of established terms.
From my perspective, that's what you're doing.
> You have just demonstrated that you are impossible to have a meaningful discussion with, and that you will make every effort to not lose face instead of trying to learn. You have also demonstrated that you are willing to say absolutely idiotic things like "You can use hypothesis in all kinds of testing, not just property based testing, what's the deal? ". https://hypothesis.readthedocs.io/en/latest/ "Hypothesis is the property-based testing library for Python".
Sorry, I didn't realise "hypothesis" is the name of the library. I thought you used it as a regular English word. I understand you meant "If you use hypothesis library, then it's property based testing"?
> You haven't won any argument here.
I'm not here to win arguments.

#

> What will you argue next? That "python" isn't really a programming language?
Straw man fallacy.
> You're now arguing against the common use of established terms.
Fundamental attribution error.
> You are extremely arrogant, and refuse to listen,
Argumentum ad hominem.

proud nebula
# molten hollow > You are also arguing that you know better what mutation testing is than the au...

"He". The word you should have used is "you". And obviously I know that. I'm him 🤣 I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.

hypothesis is a lib

Ok, but maybe the fact that the grammar doesn't make sense if you used the word in the normal sense should have made you confused enough to ask a question instead of being arrogant?

#

Also it's like the only PBT lib for python so if you had tried PBT at all you should know about it. Again: you have obviously read a lot of theory, and have much less practical understanding and experience.

molten hollow
#

> "He". The word you should have used is "you". And obviously I know that. I'm him 🤣 I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.
I have actually practiced mutation testing every week for the past couple of years, and everything I say in this channel is backed by practice.

I can agree that you found a bug while using mutation testing, but I doubt it was actually with mutation testing. Please, notice - mutation testing works by having a test suite, then you introduce a change in the software, and then you run the test again. The thing that mutation testing gives you, is it validates your test suite. I can agree that while doing that, you stumbled upon a bug and you fixed it? That works, but that's not due to mutation testing being used. That's due to having a test suite.

molten hollow
# proud nebula "He". The word you should have used is "you". And obviously I know that. I'm him...

> "He". The word you should have used is "you". And obviously I know that. I'm him 🤣 I have in fact found bugs using MT. So that falsifies your thesis above. It also does in fact help with better structure and it can show you places you need to refactor, again falsifying a previous statement you made in great confidence. It is pretty obvious you have never practiced MT.
Good for you, but just because you created a library that can be used to exercise this idea, doesn't really give you authority about its merit.

#

> Ok, but maybe the fact that the grammar doesn't make sense if you used the word in the normal sense should have made you confused enough to ask a question instead of being arrogant?
I'm sorry, but most of the things you mention in this channel are... calling for my concern.

#

You clearly have a bone to pick with me. I think so, because you're using personal arguments all the time, instead of sticking to the subject matter. I don't have a problem with you, as a person, but I don't agree with part of the things you say. I'm capable of having a reasonable debate, but not if someone uses argumentum ad hominem.

proud nebula
#

You've made Ned visibly frustrated. That is extremely rare. You don't know his personality so you don't know what a red flag that is.

molten hollow
proud nebula
odd walrus
#

I didn’t follow all that, is the assertion that bad tests can be written, therefore testing is not inherently valuable?

pulsar oracle
odd walrus
#

In the sense that it’s pure validation that doesn’t really inform the shape of your codebase

pulsar oracle
#

I'd have argued it's more like an exploratory test that is automated: send someone to go check if we're compatible with the latest external thing, and if we're not, go update our thing, do nothing, or go contact them to fix it. Just now the exploration and getting that result is automated. But I wasn't really in this argument so idk.

river pilot
odd walrus
#

I thought it was just a belt and suspenders thing

#

But no in that case, it’s exactly what your tests are for

#

For RubySpec we added a bunch of “guard” support so you could make tests not run on implementations that didn’t support that etc

pulsar oracle
odd walrus
#

If you don’t support 3.16 yet I don’t see why it would be tested in master branch CI

#

That should be on the 3.16 support branch

river pilot
odd walrus
river pilot
odd walrus
odd walrus
#

I wouldn’t actually be surprised to learn that Google has suites that take a probabilistic approach to deciding when to fail the whole “run”, given their scale

pulsar oracle
#

In my view an integration test tests your compatibility against either a network-level mock of a system or a deployable/runnable thing that you want to fail the build if you're not compatible with, probably more specifically the actual thing.

odd walrus
#

I don’t feel integration tests have any necessary thing to do with networks, I’ve written plenty that test CLI tool interactions etc

pulsar oracle
#

Fair point. It could involve integration with other applications or mocks of them at the command line or anywhere else they communicate for real.

river pilot
#

in my view (the view that started the whole discussion), categories of tests are talked about as hard-edged things, but they are often quite squishy. I'm fine calling these integration tests, or compatibility tests. You can also look at them as functional tests. But they are definitely some kind of test.

pulsar oracle
river pilot
#

people love to categorize things. It's useful sometimes to step back and ask, why are we categorizing them? How will the categories help us understand? Maybe we don't need to categorize as much as we do, or maybe we need different kinds of categories.

river pilot
pulsar oracle
# river pilot people love to categorize things. It's useful sometimes to step back and ask, wh...

I personally live by three and with more loose/pragmatic definitions. If I write a function that is simple enough in scope to just write something to a file or put something in a directory, is it a unit test if I bring in the filesystem? It's not just in memory, so some would argue no, but pragmatically I consider it a unit test because it's always there. An integration test for me has several meanings because of what other people refer to it as, like an end to end test, a functional test, etc. And then there's acceptance tests, aka functional. But I do think categories help when pragmatism is applied and I pretty much just translate/infer what people mean when they say one or the other. And then there's testing ideas out manually or manual testing and feedback, though it doesn't fail anything.

river pilot
#

and I'm not trying to say, don't categorize. I'm trying to explore why we do it.

odd walrus
#

To me it’s always about coverage (real coverage not C0)

odd walrus
# river pilot what is C0?

The coverage you get from 'code coverage' tools, where it only knows which lines executed, not which actual semantics took place.

#

C1 is per-statement, C2 is like per-side-effect or something? I can't remember the exact hierarchy
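
Whatever the exact C0/C1/C2 naming, the gap between "which lines executed" and "which behaviour was exercised" can be sketched with a hypothetical function. One test runs every line, yet one path through the `if` is never taken:

```python
def apply_discount(price, is_member):
    discount = 0
    if is_member:
        discount = 10
    return price - discount


def test_member_discount():
    # This single test executes every line of apply_discount ...
    assert apply_discount(100, True) == 90
    # ... but the path where the `if` body is skipped (is_member=False)
    # is never run, so a line-only report would still show 100%.
```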

river pilot
#

I don't know of a tool that does lines but not statements?

pulsar oracle
#

I think I do it to set the context of what I'm doing. If I say integration test it sort of indicates in my mind we're testing compatibility of some sort of real software, maybe bringing in testcontainers for a database or other application. The lines can get blurred but to me I always think about it because it sets the scene, acceptance test comes up because they prove that my application works, and I can stretch the definition and use synonyms like end to end test, or even integration if that's the goal someone is going for if it's integration with the client to the server to the database, and so on. If I write code that is supposed to get today's weather I'd unit test with a mock that code using that, unit making me think mocks of that sort, it proves that thing works but it doesn't tell me getting the weather from weather.com as we know it will work so I'd have to take my actual implementation and see if it works against the API or website data as we know it, testing integration. If we say acceptance I'm thinking how do we test the application as a whole? In my mind it comes up every day for these three.
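
A small sketch of the unit-level half of that weather example, with hypothetical names (`describe_today` and its thresholds are invented for illustration). The fetcher is passed in, so the test can substitute a stub for the real service:

```python
def describe_today(fetch_temperature_c):
    """Hypothetical code under test; the fetcher is injected so tests can replace it."""
    temperature = fetch_temperature_c()
    if temperature < 10:
        return "cold"
    if temperature < 25:
        return "mild"
    return "hot"


def test_describe_today_with_stubbed_fetcher():
    # Unit level: a stub stands in for the real weather service.
    assert describe_today(lambda: 3) == "cold"
    assert describe_today(lambda: 18) == "mild"
    assert describe_today(lambda: 31) == "hot"
```

This proves the classification logic only. Whether the real fetcher still works against the live API is the separate integration question described above.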

river pilot
pulsar oracle
# river pilot this seems useful to me: the category helps set the goal and the approach.

It doesn't seem to me that everything can be categorized (at this point in time as we know it) and is black and white. In the scenario that you're not compatible with python 3.13 and it should fail the build, it's basically an integration test in purpose but running unit tests, and when it comes to writing it I can see how categorization might not really be helpful.

proud nebula
river pilot
odd walrus
proud nebula
odd walrus
#

But I’ve also never seen a full passing suite like that without major exclusions

odd walrus
proud nebula
odd walrus
proud nebula
#

It's a tool to fix your test suite, but running it in CI all the time is a huge waste of resources unless you are very careful how you do it and think about it deeply.

odd walrus
#

Just as needed to audit the test suite?

proud nebula
#

And highly selectively where it's critical only, or you care for some other reason.

thorny cave
#

whats this channel for

proud nebula
#

I run MT on iommi sometimes out of hobby level professional pride. But that's not extremely rational use of time :P

river pilot
proud nebula
odd walrus
#

Thanks

proud nebula
odd walrus
proud nebula
#

I mean.. just look at iommi, which imo would absolutely revolutionize web development if people embraced it. And I'm not seeing very many users at all :/

ember maple
proud nebula
ember maple
proud nebula
proud nebula
#

I really need to figure out when and how to point people to the Equivalence page. That is really key to making things click.

proud nebula
river pilot
proud nebula
proud nebula
proud nebula
ember maple
proud nebula
river pilot
twin shale
#

How I would like test decorators to work:

@test
@test.params(a=(1, 2, 4), b=(100, 150))
@test.params(a=(8,), b=(50, 100))
def test_add_two_numbers(a, b):
   assert myadd(a, b) == a + b

And this would expand into 3×2 + 1×2 = 8 tests

river pilot
twin shale
# river pilot `@pytest.mark.parametrize()` does this

No, it does it in a different way, using strings and no automatic cross-product, as far as I've seen.

testdata = [
    (datetime(2001, 12, 12), datetime(2001, 12, 11), timedelta(1)),
    (datetime(2001, 12, 11), datetime(2001, 12, 12), timedelta(-1)),
]


@pytest.mark.parametrize("a,b,expected", testdata)
def test_timedistance_v0(a, b, expected):
    diff = a - b
    assert diff == expected
proud nebula
twin shale
#

That's what I have, per above 😊

swift pewter
#
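import itertools

import pytest


def params(**kwargs):
    def deco(fn):
        return pytest.mark.parametrize(
            list(kwargs),
            # Unpack so product() crosses the value tuples with each other.
            list(itertools.product(*kwargs.values())),
        )(fn)
    return deco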

?

twin shale
#

I'm mostly against the separation of the test parameter name and its values. Having a comma separated string is also not very convenient.

swift pewter
#

You don't have to comma-separate, you can also provide a tuple of strings

river pilot
twin shale
river pilot
river pilot
twin shale
#

Right, cross within one decorator, "addition" between

#

But I didn't know that the parametrize decorator cross produced at all, that's good to know.

#

On another note, can someone explain hamcrest's logo? 😅

river pilot
#

it looks like a person surfing down a pile of ham? Which makes sense for the name, but why the name?

spark thicket
#

Any good documentation for writing tests (Integration, E2E, Unit) in a non-TDD architecture?

molten hollow
molten hollow
twin shale
river pilot
#

@twin shale i don't know of a test decorator that works the way you showed, but it seems like it should be possible to write.
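
A sketch of just the expansion logic such a decorator would need: cross product within one parameter set, concatenation between sets. `expand_params` is a hypothetical helper, not part of pytest or any other library:

```python
import itertools


def expand_params(*param_sets):
    """Cross product within each dict of parameter values, concatenated across dicts."""
    cases = []
    for param_set in param_sets:
        names = list(param_set)
        for combo in itertools.product(*param_set.values()):
            cases.append(dict(zip(names, combo)))
    return cases


cases = expand_params(
    {"a": (1, 2, 4), "b": (100, 150)},
    {"a": (8,), "b": (50, 100)},
)
print(len(cases))  # 3*2 + 1*2 = 8
```

Each resulting dict is one test case, so a decorator built on top of this could call the test function once per dict.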

molten hollow
river pilot
molten hollow
#

I did in the past, but I noticed that they can make your design weaker.

Imagine you have two cases, that appear similar at first - so you write them as one test, and parametrize it to "reduce duplication". But then, after working with the code a bit you discover they aren't really the same idea, so you should split it. Maybe you split them and have two tests anyway, or maybe you're lazy and leave the tests like that, but without the difference covered.

#

Real design is about organizing expected results, information flow, compartmentalization of the system, information hiding, separation of concerns, and reduction of information. That's what gives your programs a real edge.

#

Just joining two test cases into one with parametrization is nothing but syntax sugar, not a very helpful one at that imo.

river pilot
#

Sure, it might be misapplied, but it's very common to have a dozen data scenarios for the same test. I wouldn't want a dozen tests.

molten hollow
#

You don't get any real benefit from including more; and if you do, that really means it's a different test case worthy of a dedicated method and a proper test name, because that's a different behaviour.

river pilot
molten hollow
#

Rule of thumb:

  • if it's one behaviour, one data input will suffice, no need for parametrization
  • if it's multiple behaviours, it's better to split them into multiple tests, no need for parametrization
river pilot
#

i guess we'll have to disagree on this.

molten hollow
#
VARS = {
    "FOO": "fooey",
    "BAR": "xyzzy",
}


@pytest.mark.parametrize(
    "before, after",
    [
        ("Nothing to do", "Nothing to do"),
        ("Dollar: $$", "Dollar: $"),
        ("Simple: $FOO is fooey", "Simple: fooey is fooey"),
        ("Braced: X${FOO}X.", "Braced: XfooeyX."),
        ("Missing: x${NOTHING}y is xy", "Missing: xy is xy"),
        ("Multiple: $$ $FOO $BAR ${FOO}", "Multiple: $ fooey xyzzy fooey"),
        ("Ill-formed: ${%5} ${{HI}} ${", "Ill-formed: ${%5} ${{HI}} ${"),
        ("Strict: ${FOO?} is there", "Strict: fooey is there"),
        ("Defaulted: ${WUT-missing}!", "Defaulted: missing!"),
        ("Defaulted empty: ${WUT-}!", "Defaulted empty: !"),
    ],
)
def test_substitute_variables(before: str, after: str) -> None:
    assert substitute_variables(before, VARS) == after

If I understand correctly, all of these cases are different behaviours.

river pilot
molten hollow
river pilot
molten hollow
#

Let me show you how I would've created that method.

#

besides, you're missing some test cases, which I would've included.

river pilot
#

I'd be happy to add the missing cases.

molten hollow
river pilot
#

(well, one error case)

molten hollow
#

Parametrized case for one input. Interesting.

#

Well, still. Let me show you how I would've written that, and what cases are missing for me, given I would test-drive that.

#

Actually, the more I read those examples, the less I understand what it's actually supposed to be doing.

river pilot
molten hollow
#

For example, I read those parametrized tests, and have no idea what the ? is doing.

molten hollow
river pilot
molten hollow
#

I'm glad to show you an example, but first I need to understand what your function does.

#

So, if I understand correctly, if the string doesn't have a placeholder, it's returned as is, correct?

river pilot
#

yes

molten hollow
#

Also, if there's a superfluous variable in the dictionary, that's supposed to be ignored or throw error for missing placeholder?

river pilot
#

ignored

molten hollow
#

okay,

#

the format of the placeholder is either $Foo or ${Foo}, and it doesn't change anything, that's just notation, correct?

river pilot
#

yes

molten hollow
#

okay, ? question mark inside the braces means... what exactly? I can't tell.

#

Also, format ${Name-default} means you either read Name from the vars, or if it's missing, then insert the default, correct?

river pilot
#

yes

molten hollow
molten hollow
#

Something like that would be my tests:

river pilot
#

Can you pastebin that as text?

#

(also, have to be afk for at least an hour)

molten hollow
#

I sent a screenshot to say that the content of text isn't that big compared to your test, but you gain a lot of information and clarification.

#
from pytest import raises

def test_substitute_variables_in_text_with_their_values():
    text = substitute_variables('Hello $Name, ($Age)', {'Name': 'John', 'Age': '14'})
    assert text == 'Hello John, (14)'

def test_variable_has_shell_format__simple_placeholder():
    assert substitute_variables('$Foo', {'Foo': 'Bar'}) == 'Bar'

def test_variable_has_shell_format__braced_placeholder():
    assert substitute_variables('${Foo}', {'Foo': 'Bar'}) == 'Bar'

def test_simple_format__given_value_not_exists__returns_empty_string():
    assert substitute_variables('${Missing}', {}) == ''

def test_strict_format__given_value_exists__passes():
    assert substitute_variables('${Foo?}', {'Foo': 'Bar'}) == 'Bar'

def test_strict_format__given_value_not_exists__fails():
    with raises(Exception):
        substitute_variables('${Missing?}', {})

def test_default_format__given_value_exists__returns_value():
    assert substitute_variables('${Foo-default}', {'Foo': 'Bar'}) == 'Bar'

def test_default_format__given_value_not_exists__returns_default():
    assert substitute_variables('${Missing-default}', {}) == 'default'

def test_encode_dollar_sign__with_two_dollar_signs():
    assert substitute_variables('$$', {}) == '$'

def test_malformed_placeholder__double_braces__is_not_substituted():
    assert substitute_variables('${{Foo}}', {'Foo': 'Bar'}) == '${{Foo}}'

def test_malformed_placeholder__non_letter__percent_sign__is_not_substituted():
    assert substitute_variables('${%Foo}', {'Foo': 'Bar'}) == '${%Foo}'

def test_malformed_placeholder__non_letter__digit__is_not_substituted():
    assert substitute_variables('${5}', {'5': 'Bar'}) == '${5}'

def test_malformed_placeholder__not_closed_brace__is_not_substituted():
    assert substitute_variables('${Foo', {'Foo': 'Bar'}) == '${Foo'
river pilot
#

We can agree to disagree on this point

molten hollow
#
  1. First test shows an example of what the function is for. The reader knows exactly how to use that function and what it can be good for.
  2. Each case is described with given input, no need to include "Default", "Strict" in the test data. Test names should be test names, test data should be test data.
  3. Passed-in values are specifically designed for each scenario. Also, minimal values are passed to the function to illustrate the behaviour.
#
  4. You can easily copy each of those methods, and adapt it to your needs. With a parametrized test, if you'd like to do something slightly different, then that's not so simple.
#
  5. The case of superfluous arguments is now explicitly tested.
#
  6. Test names specify what needs to happen: "is_not_substituted", "returns_default", "fails". Even if test data isn't descriptive enough, test names will tell you what the intent behind the test is.
#

Now, I propose to you @river pilot: show any programmer your test and my test, and try to guess which version will be easier to understand for him.

#
  7. With tests like that, you don't really need a test doc, because the tests contain all the information about the function you need. Plus, tests are actually executed and asserted, while a function doc can become out of date.
#

BTW, this question mark, wouldn't it make more sense to be the other way around? Like ${Foo?} should return an empty string, and ${Foo} would throw for the missing value? 🤔 Just a suggestion.

twin shale
# molten hollow Wouldn't you get just the same coverage by having explicit tests?

I'm not sure I understand the difference, except that you would write a lot more boilerplate or duplicated code if you wrote them as separate tests. And it would be much harder to get an overview of what you test. For some scenarios it's really unfeasible not to use parametrization.

Having 1 test expand to >100 cases is not uncommon.

twin shale
river pilot
#

@molten hollow thanks for the examples. I find long test names like that hard to read. I'd rather have a comment than underscored sentences. The question mark behavior is borrowed from the shell: this function implements a subset of shell variable expansion behavior. That is mentioned in the docstring.
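For anyone following along, here is a minimal sketch of a function with the behaviour pinned down in this thread. It is not the real implementation: the regex, the `UndefinedVariableError` name, and the rule that a name must start with a letter or underscore are all assumptions, chosen so that the examples posted above pass.

```python
import re
from typing import Mapping


class UndefinedVariableError(KeyError):
    """Hypothetical error for a strict placeholder whose variable is missing."""


# Assumption: a name starts with a letter or underscore, so "${5}" is left alone.
_PLACEHOLDER = re.compile(
    r"""
    \$(?:
        (?P<escaped>\$)                  # "$$" stands for a literal dollar sign
      | (?P<plain>[A-Za-z_]\w*)          # $NAME
      | \{(?P<braced>[A-Za-z_]\w*)       # ${NAME}, optionally followed by
          (?:(?P<strict>\?)              #   "?"        -> fail if missing
            |-(?P<default>[^}]*))?       #   "-default" -> fallback text
        \}
    )
    """,
    re.VERBOSE,
)


def substitute_variables(text: str, variables: Mapping[str, str]) -> str:
    def replace(match: "re.Match[str]") -> str:
        if match["escaped"]:
            return "$"
        name = match["plain"] or match["braced"]
        if name in variables:
            return variables[name]
        if match["strict"]:
            raise UndefinedVariableError(name)
        # Missing and not strict: use the default, or nothing at all.
        return match["default"] or ""

    # Text that does not match the placeholder pattern is left untouched.
    return _PLACEHOLDER.sub(replace, text)
```

Ill-formed placeholders such as `${{HI}}` simply never match the pattern, so they come back unchanged without any special handling.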

twin shale
#

Sometimes a test just tests that two implementations give the same result.

Sometimes a test has no documentation value but is just there to avoid someone accidentally making a mistake.

Sometimes you need test coverage, which can mean testing each bit in a 64-bit integer.

Sometimes test space is just so big that you can't feasibly write manual tests to cover it.
Sometimes sweeping or exhaustive testing is not an option either - it would take too much time to cover. Here random testing is the way to go.
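A rough sketch of the "two implementations agree" and "random testing" ideas combined. Both bit-counting functions, the seed, and the sample count are illustrative assumptions, not code from any real project.

```python
import random

SEED = 1234        # fixed seed so that a failure can be reproduced
SAMPLES = 10_000   # far fewer than 2**64, but cheap to run


def popcount_reference(value: int) -> int:
    """Slow but easy to trust: count the '1' digits in the binary string."""
    return bin(value).count("1")


def popcount_kernighan(value: int) -> int:
    """Clears the lowest set bit once per loop iteration."""
    count = 0
    while value:
        value &= value - 1
        count += 1
    return count


def test_implementations_agree_on_random_inputs() -> None:
    rng = random.Random(SEED)
    for _ in range(SAMPLES):
        value = rng.getrandbits(64)
        # The failing value is reported in hex to make it easy to inspect.
        assert popcount_kernighan(value) == popcount_reference(value), hex(value)
```

The test says nothing about what the right answer is for any one input. It only checks that the fast version never disagrees with the simple one.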

twin shale
molten hollow
#

My point in all of that, is this:

  • parametrized tests are just syntax sugar for multiple test cases. You don't lose much by splitting the parameters into multiple test cases.
  • good tests are descriptive - they should tell you what you're after. Parametrized tests tend to hide that, while explicit tests tend to express that
  • if it's hard for you to write multiple test cases, try setting up snippets/live templates in your IDE, so that you can type just "test" + Tab, like in PyCharm, and that will insert a test snippet.
  • it's easier to understand specific behaviours if they're explicitly stated,
  • with parametrized tests, you need to conform all your test cases into one form with parameters; with explicit tests, you're free to express them in whatever way makes it clear for the reader.
#

thanks for the examples. I find long test names like that hard to read. I'd rather have a comment than underscored sentences.
@river pilot No problem, you can shorten the names if you'd like:

from pytest import raises

def test_substitute_variables_in_text():
    text = substitute_variables('Hello $Name, ($Age)', {'Name': 'John', 'Age': '14'})
    assert text == 'Hello John, (14)'

def test_simple_placeholder():
    assert substitute_variables('$Foo', {'Foo': 'Bar'}) == 'Bar'

def test_braced_placeholder():
    assert substitute_variables('${Foo}', {'Foo': 'Bar'}) == 'Bar'

def test_value_not_exists_returns_empty():
    assert substitute_variables('${Missing}', {}) == ''

def test_strict_passes():
    assert substitute_variables('${Foo?}', {'Foo': 'Bar'}) == 'Bar'

def test_strict_fails():
    with raises(Exception):
        substitute_variables('${Missing?}', {})

def test_default_returns_value():
    assert substitute_variables('${Foo-default}', {'Foo': 'Bar'}) == 'Bar'

def test_default_returns_default():
    assert substitute_variables('${Missing-default}', {}) == 'default'

def test_encode_dollar():
    assert substitute_variables('$$', {}) == '$'

def test_malformed_double_braces():
    assert substitute_variables('${{Foo}}', {'Foo': 'Bar'}) == '${{Foo}}'

def test_malformed_percent_sign():
    assert substitute_variables('${%Foo}', {'Foo': 'Bar'}) == '${%Foo}'

def test_malformed_digit():
    assert substitute_variables('${5}', {'5': 'Bar'}) == '${5}'

def test_malformed_not_closed_brace():
    assert substitute_variables('${Foo', {'Foo': 'Bar'}) == '${Foo'
#

You can add comments to it if you'd like, but please note that these test cases aren't all the same structure.

#

The way I see it, turning that back into parametrized values would lose information that is otherwise helpful.

#

Some people might say that you can save 4 or 6 lines, by condensing them back into parametrized test cases, as if it's the lines of code that makes it hard to maintain. It's not. It's the thinking. The harder it is to think about the function, the harder it is to maintain.

#

And it's way easier to think of these behaviours if they're separate test cases like that.

#

But! There are things you can do to make them better:

  • you can brainstorm the design with another programmer (pair programming)
  • you can try to test-drive the implementation - loose coupling, and behaviour-driven (TDD)
  • you can try to reimplement the same thing the next morning; chances are you're gonna come up with better functions
  • try to reimplement the same functionality in another programming language, just to shift your mindset. It's not uncommon to come up with different solutions when changing perspective like that
  • try to explain the function to non-programmers; sometimes non-programmers tend to ask questions which will change your point of view drastically, giving you a chance to redesign your approach
  • Try to only implement the features your program needs. For example, if your other functions only use a fraction of those features, try to keep them and remove the rest; chances are a simpler solution is waiting for you

As you can see, there are a lot of approaches which would improve the overall quality of the function and tests, and parametrizing inputs IMHO just isn't one of them.

molten hollow
pulsar oracle
#

It's probably a preference thing in terms of this project. @river pilot said he personally finds them easier to read this way and I imagine the other people working on it do too.

I'd personally write them like yours and use the function names to communicate exactly what is being tested to make it all the clearer, and I'd probably use TDD to do it and organize by one clear thought or behavior at a time.

twin shale
molten hollow
river pilot
#

I can see why some people prefer the separate tests. I think it's a tradeoff, and we are choosing differently. To me, it's easier to see the behavior being tested in the compact parameterized form.
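One middle ground that pytest offers is `pytest.param(..., id=...)`, which keeps the compact table but gives each row a name that shows up in the test report. A minimal sketch follows. The `expand` function is only a stand-in built on the stdlib `string.Template` so the block runs on its own, and the ids are invented.

```python
from string import Template

import pytest


def expand(text: str, variables: dict) -> str:
    """Stand-in for the function under test (an assumption, not the real one)."""
    return Template(text).safe_substitute(variables)


@pytest.mark.parametrize(
    "before, after",
    [
        pytest.param("plain text", "plain text", id="no-placeholder-unchanged"),
        pytest.param("$$", "$", id="double-dollar-is-escape"),
        pytest.param("$FOO", "fooey", id="simple-placeholder"),
        pytest.param("X${FOO}X", "XfooeyX", id="braced-placeholder"),
    ],
)
def test_expand(before: str, after: str) -> None:
    assert expand(before, {"FOO": "fooey"}) == after
```

Each case is then reported as, for example, `test_expand[simple-placeholder]`, so the label lives next to the data rather than inside it.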

molten hollow
# river pilot I can see why some people prefer the separate tests. I think it's a tradeoff, an...

I suppose you might think that, because you wrote the thing. Someone who doesn't know the function might have a different opinion. I'm wondering, if you left the project for a year and came back to it after forgetting what was there, would you still prefer the parametrized one or the split one? I would wager that you could get back to it quicker if it was split. Maybe we'll get to settle the wager one day in the future 😄 Who knows.

twin shale
#
@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1
river pilot
molten hollow
river pilot
molten hollow
molten hollow
twin shale
#

Oops, yes

molten hollow
#

Okay, so this test.

@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1

@twin shale And please tell me, what is the intention behind this test? Are you trying to test-drive the count_set_bits() function?
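The `@params` decorator in that snippet is not a real pytest decorator. As a sketch, the same sweep in stock pytest might look like this. The body of `count_set_bits` is only a placeholder assumption, since the thread never shows the real implementation.

```python
import pytest


def count_set_bits(value: int) -> int:
    """Placeholder for the code under test: count the 1 bits in value."""
    return bin(value).count("1")


# One generated test case per bit position, 0 through 63.
@pytest.mark.parametrize("x", range(64))
def test_single_bit_counts_as_one(x: int) -> None:
    val = 1 << x
    assert count_set_bits(val) == 1
```

The shift in the test only builds the input. The function under test still has to work out the answer on its own.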

twin shale
#

I want to test the implementation. What do you mean by test drive?

molten hollow
#

Because to me, it would seem that the only valid implementation of count_set_bits() is actually the code you have in your test, which is val = 1 << x. At which point, the test you have just checks that the code that you wrote is the code that you wrote.

pulsar oracle
# molten hollow That's true, but yet: - I see very often code and comment where they disagree -...

Both can suffer the same problem I think. But it's probably down to how detailed the names or descriptions are, and how likely behavior is to change. If you have an addition function and you start with a docstring it could be a pretty fine source of truth at least for how it should be. I read somewhere from Oracle for Java and documentation that the docstrings (or whatever they're called there) should lay out a testing plan sort of.

twin shale
#

Or fpga. Or at least not python code.

molten hollow
#

Oh, so you're trying to test hardware?

twin shale
#

Why not? 🙂

molten hollow
#

So let me understand, you're creator of the hardware and trying to test it using pytest?

#

Or are you a user of a hardware and just need to check whether it works?

molten hollow
twin shale
twin shale
molten hollow
#

I'm approaching this whole debate from the perspective of a software developer, who uses unit tests to improve the quality of the software 😄 I wasn't aware we're migrating from it into the hardware world.

#

To me, parametrizing tests, when it comes to application development, has the same flaws as using for-loops in tests.

twin shale
#

I don't see any reason software shouldn't be equally black box tested as something actually hardware

molten hollow
#

I have a feeling we're migrating away from the original topic, which was parametrized tests 😄

river pilot
twin shale
#

But I'm not sure where the domain of "unit" testing ends.

twin shale
#

Imagine being an architect and then testing both the blueprint and the delivered physical house

river pilot
molten hollow
#

Well, I'm sure I entered the debate with parametrized tests when it regards software development (like applications, tools, functions). How you're supposed to develop hardware, is something I don't know much about.

molten hollow
river pilot
molten hollow
#

he said it's a hardware chip.

twin shale
river pilot
twin shale
#

Or the implementation might be super complicated, and you just want to assert some high level stuff

river pilot
#

or the implementation is actually in hardware, but you access it through a Python wrapper so you can use pytest. It's black-box, we don't know.

molten hollow
#

let's say it's a Python function.
If it's a python function, and he wrote a test like that:

@params(x=range(64))
def test_count_set_bits(x):
    val = 1 << x
    assert count_set_bits(val) == 1

Then, to my eyes, the only feasible implementation of that function is the code val = 1 << x itself. So you have a test that just duplicates the implementation.

river pilot
molten hollow
#

I mean, it could be. You don't know the implementation
Or the implementation might be super complicated, and you just want to assert some high level stuff

or the implementation is actually in hardware, but you access it through a Python wrapper so you can use pytest. It's black-box, we don't know.

Well, from the perspective of a software developer, I will say it makes a difference whether it's you who's responsible for creating and maintaining that function, or whether it's something off-the-shelf that you don't maintain. Like, these require two different approaches to testing.

proud nebula
#

Dnaron being dnaron

river pilot
molten hollow
pulsar oracle
#

@twin shale I haven't fully been following this as I've been fixated on the initial parameterization argument. I think that in the case of checking if specific bits are set (1-64), having one test with parameterization that covers that range is valid. I wouldn't write them all individually personally, but I would do a few specific behaviors with it. I'm NGL I probably don't understand what's being tested, but assuming we're testing a bit counting function, I'd do something like this (ignore the implementation, I'm too dumb for this right now).

def count_bits_set(number: int) -> int:
    """
    Takes a number and counts the number of bits that are 1 in it,
    returns 0 if none are set, otherwise the number of them on.
    """
    # Count the "1" digits in the binary representation.
    return bin(number).count("1")

class TestCountBitsSetFunction:
    def test_should_report_zero_bits_set_for_number_zero(self):
        assert count_bits_set(0) == 0

    def test_should_report_one_bit_set_for_number_one(self):
        assert count_bits_set(1) == 1

    def test_should_report_right_number_of_bits_set_for_reasonable_range_of_single_bit_set(self):
        # A for loop here; parameterization would work as well.
        for shift in range(64):
            assert count_bits_set(1 << shift) == 1

    def test_should_report_i_do_not_know_how_many_for_negative_one(self):
        # This is python, I genuinely have no clue.
        pass
molten hollow
#

I think we're very off topic, because the whole point started from a question, posted by someone who has no interest in hardware and bits, at all.

molten hollow
#

Well, maybe I acted too roughly. I assumed you were creating a software system, and saw you tried to use parametrized testing, and I tried to advise you against that.

twin shale
molten hollow
#

But seeing you're creating hardware, that may be valid, I don't know.

river pilot
molten hollow
#

I guess it might be appealing only in narrow situations, when you think counting lines of code makes a difference that much.

#

But seeing how you're creating hardware, I don't know, you're trying to cover the whole range of inputs? I guess that makes sense.

#

I'm sure I would never test software like that, by "covering the whole space of inputs". To me such test would be redundant, and thus harder to maintain.

river pilot
molten hollow
pulsar oracle
twin shale
#

I'm definitely not counting lines of code. In my mind I'm very pragmatic in this. It might be that such cases haven't appeared for you. But I do think it's needed in some cases. I agree not all tests should be lumped together as parametrizations. There is a sweet spot.

river pilot
molten hollow
twin shale
#

For example:

Write 10 tests with 1 test data input each.
Or write 10 tests with 100 test inputs each for 5% more time spent?
It might be extremely good value. Time is a scarce resource.

pulsar oracle
# molten hollow Yea, I don't think so. I did use parametrized tests in the past, as a software d...

Here's an example of where I used parameterization the other week where I don't think I'll regret it:


class TestDiscordSnowflakeFunctionResultBits:

    @pytest.mark.parametrize("test_value", [0, 1, 4095])
    def test_should_have_given_increment_value_in_bits_1_to_12(self, test_value: int):
        snowflake = make_discord_snowflake(increment=test_value, worker=9, process=2, timestamp=2)
        assert snowflake & 4095 == test_value

    @pytest.mark.parametrize("test_value", [0, 1, 31])
    def test_should_have_given_internal_worker_value_in_bits_13_to_17(self, test_value: int):
        snowflake = make_discord_snowflake(increment=5, worker=test_value, process=2, timestamp=2)
        internal_worker = (snowflake >> 12) & 31
        assert internal_worker == test_value

    @pytest.mark.parametrize("test_value", [0, 1, 31])
    def test_should_have_given_internal_process_value_in_bits_18_to_22(self, test_value: int):
        snowflake = make_discord_snowflake(increment=5, worker=5, process=test_value, timestamp=2)
        internal_process = (snowflake >> 17) & 31
        assert internal_process == test_value

    @pytest.mark.parametrize("test_value", [0, 1, 4398046511103])
    def test_should_have_given_timestamp_value_in_bits_23_to_64(self, test_value):
        timestamp = make_discord_snowflake(increment=5, worker=5, process=0, timestamp=test_value)
        assert timestamp >> 22 == test_value
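For context, here is a guess at what a function satisfying those four tests could look like. The bit layout is inferred purely from the shifts and masks in the tests above (12, 5, 5, and 42 bits) and has not been checked against Discord's documentation. The range checks are an extra assumption.

```python
def make_discord_snowflake(increment: int, worker: int, process: int, timestamp: int) -> int:
    """Pack four fields into one integer, lowest bits first (layout assumed)."""
    fields = [
        ("increment", increment, 12),  # bits 1 to 12
        ("worker", worker, 5),         # bits 13 to 17
        ("process", process, 5),       # bits 18 to 22
        ("timestamp", timestamp, 42),  # bits 23 to 64
    ]
    snowflake = 0
    shift = 0
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} must fit in {width} bits, got {value}")
        snowflake |= value << shift
        shift += width
    return snowflake
```

With boundary values like 0, 1, and the field maximum, each parametrized test checks that a field survives the round trip without spilling into its neighbours.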
twin shale
molten hollow
# river pilot it seems like you're ignoring parts of the discussion: it counts the number of 1...

Okay, so the thing I'm cautious about is that there is this effect in programmers, where we often do stupid things 😄 me included:

  • we code stuff that's not needed
  • we forget stuff
  • we create the first thing that pops into our heads
  • we delay feedback in learning
  • we do complicated stuff, instead of simple
  • we sometimes do something complicated because we want to feel proud of it

We do all that because we're human. Good engineering practices allow us to overcome those issues. I try to do that.

So when I need to create some software, I try to test-drive it (practice TDD), to not allow myself to create more code than necessary, and make sure it's simple enough, and not overly complicated.

#

And in order to do that, I try to start from tests, to make sure I don't fall into those traps.

twin shale
river pilot
#

we don't have to talk about this if you don't want. I've asked four times, and you aren't answering.

molten hollow
#

Because it could be an x/y problem.

#

I would go from higher level tests, to lower level,

#

if, by test-driving it i would find myself needing that function - great.

#

if not, i would not write it at all, and thus not have a need to test it.

#

but!!

#

Let's say we verified it 😄

#

And I verified that I need it.

#

And I'm going to write it and test it.

#

And it's me who's creating that, not 3rd party.

#

Here's how I would do it:

twin shale
#

Ok but you are missing a HUGE shortcut here. The task IS to count bits.

molten hollow
#

Here's how I would test-drive it (because I'm creating that function, correct?)

#

I would write my tests in such a way, that allow me to learn.

#

And also, I know I will make mistakes, so I need to write tests in a way that allows me to make progress in iterative steps, and when I find out where I am wrong, I can correct it.

#

I will start simple

#

Count bits. Let's say there are no 1 bits:

#
def test_there_are_no_1_bits():
  assert count_bits(0) == 0
#

That's very simple, I can implement that very simply,

#

then, let's say there is 1 bit on the first position

#
def test_there_is_1_bit_at_first_position():
  assert count_bits(1) == 1
#

That's also very simple to implement,

#

then, what's the next simplest thing after that? 1 bit on the second position

#
def test_there_is_1_bit_at_second_position():
  assert count_bits(2) == 1

and also two bits

def test_there_are_2_bits():
  assert count_bits(3) == 2
#

Also, handle the other cases

#
def test_handles_signed_integers():
  assert count_bits()  # here, put in information about whether signed integers are handled
#
def test_fails_for_unsigned_integers():
  ...
twin shale
#

I've already covered all 64 bits, what are you wasting your time with? 🤔

molten hollow
#

Also, put in floats, strings and None, assert that the function behaves properly

#
def test_function_fails_for_none():
  with raises(Exception):
     count_bits(None)
#
def test_function_does_not_count_bits_for_floats():  # or does? i don't know, you tell me
  assert count_bits(0.00) == 0 # what's the expected outcome here?
#

You see, i'm not treating these tests just as a regression test suite,

#

I'm treating it as:

  • a tool to learn
  • to assert what I already know
  • what I want my function to do
twin shale
#

Just assume input data is 64 bits 😊

molten hollow
#

and also to:

  • gather feedback on my design
molten hollow
#

That's the whole point, don't do "just assume". Assert that as an executable specification.

#

If that behaviour is not met, the test suite should fail.
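As a sketch of what "assert it instead of assuming it" could mean for the 64-bit case: the function states its input domain and the tests pin that down. The choice of `TypeError` and `ValueError`, and the tiny `raises` helper (used instead of `pytest.raises` to keep the block dependency-free), are assumptions.

```python
UINT64_MAX = (1 << 64) - 1


def count_bits(value: int) -> int:
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    if not 0 <= value <= UINT64_MAX:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    return bin(value).count("1")


def raises(exc_type, func, *args) -> bool:
    """Return True if func(*args) raises exc_type, False if it returns."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def test_rejects_none():
    assert raises(TypeError, count_bits, None)


def test_rejects_floats():
    assert raises(TypeError, count_bits, 0.0)


def test_rejects_negative_numbers():
    assert raises(ValueError, count_bits, -1)


def test_rejects_values_wider_than_64_bits():
    assert raises(ValueError, count_bits, 1 << 64)


def test_accepts_the_largest_64_bit_value():
    assert count_bits(UINT64_MAX) == 64
```

In a statically typed setting the first two tests would be the compiler's job, which is roughly the point being argued from the other side.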

twin shale
#

There is no other input except 64 bits. Don't worry about other inputs, they are not representable.

molten hollow
twin shale
#

The compiler or testing framework would crash on type error

pulsar oracle
molten hollow
#

You just specify the important cases that are enough to understand the function as a whole.

#

If we're talking about the software system, of course.

molten hollow
# twin shale And then?

And that's enough to have good tests - as long as we're talking about the software system.

#

For hardware, as I said, I'm no expert, so I will not answer that.

twin shale
#

The way you are doing it you get some examples. But some things need to be exhaustively tested.

molten hollow
#

Let me tell you that:

parametrized tests are a way to explore an input space. In software development, I will argue you never need to do that. In hardware, maybe you do? 🤔 I don't know.

pulsar oracle
twin shale
#

It's not about exploring or making examples of the behavior. It's about verifying.

molten hollow
#

@twin shale We're talking about software development still? Or hardware?

twin shale
#

Sure

molten hollow
#

Then I will argue you never need to explore an input space like that.

twin shale
molten hollow
river pilot
#

I don't understand how you can advocate for blackbox testing and then say that hardware and software would need different testing of the same algorithm?

molten hollow
#

And for hardware, I don't know how they test and maintain that. I just don't know, so maybe they need to explore the input space? I just don't know how they do it.

pulsar oracle
# twin shale Indeed. If you make too high level or too parametrize tests, you might miss some...

If you think about it from a TDD perspective it becomes natural, and I say this while admitting I don't always use TDD. But if there's a problem that is new to me, or annoying to write, sometimes with bit math, I always do. I'd take it incrementally for learning like @molten hollow said. Does it count 0 bits correctly? Does it count 1? Thinking of it now, what's negative one supposed to be, what should happen then? (In your case, if you don't expect anything besides 0-64, maybe that's fine.) So naturally I'd get those specific examples, then I'd wonder, does this actually work for all the ones it needs to, or a reasonable set to prove that it will, which is where parameterization or a for loop would come in.

twin shale
molten hollow
river pilot
twin shale
pulsar oracle
# twin shale Doesn't matter what the bits represent. Signed or unsigned integer or float or u...

Ahhh. I see back to your specific example. You're taking a function to count how many bits are on in memory, and then testing if the bit shifting logic works on the hardware as expected, and doing parameterize to test a bunch of values at once. In an environment where at runtime or compile time it'll most likely be impossible to use anyway. We assume that function already works by the time that test code is executed.

molten hollow
#

Well, here's the thing:

  • if you only care about achieving your goal, which is getting your data sorted, you could just write one test
def test_my_pokemons_are_sorted():
  assert pokemons(['Pikachu', 'Alakazam']) == ['Alakazam', 'Pikachu']

and you implement that using built-in or off-the shelf sorting in your programming language. You don't need to reimplement it, you can just use what's there. That's one case. But I know that's not what you're after.

  • if you really want to create your own sorting algorithm, here's how I would say you need to approach it: test-drive it. Drive the implementation of that sorting algorithm with tests. You're not going to invent it as one big idea in your head; you will need to design it iteratively, solve the edge cases, work on it. So I think you should use tests as stepping stones to it. How exactly those test cases would look is up to the creator of the algorithm, because the tests needed depend on the nature of that algorithm. So in order to create good tests, you need to know how the algorithm works. If you want "black box", then just go with the first approach with one test.

There is even an example in "Clean Craftsmanship" by Robert C. Martin, where he uses tests like that to write quick-sort, if you're interested. And he doesn't use parametrized tests 😄
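To give a feel for the second approach, here is a miniature sketch of tests acting as stepping stones for a hand-written sort. It is not taken from the book. The insertion sort and the order of the tests are just one plausible path, with each test being the next smallest case that the previous code would not handle.

```python
def insertion_sort(items):
    """Return a new sorted list, leaving the input untouched."""
    result = []
    for item in items:
        # Walk left past every element that is larger than the new item.
        position = len(result)
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result


def test_empty_list_stays_empty():
    assert insertion_sort([]) == []


def test_single_item_is_already_sorted():
    assert insertion_sort(["Pikachu"]) == ["Pikachu"]


def test_two_items_out_of_order_are_swapped():
    assert insertion_sort(["Pikachu", "Alakazam"]) == ["Alakazam", "Pikachu"]


def test_item_can_land_in_the_middle():
    assert insertion_sort(["Pikachu", "Alakazam", "Eevee"]) == ["Alakazam", "Eevee", "Pikachu"]


def test_duplicates_are_kept():
    assert insertion_sort(["Eevee", "Abra", "Eevee"]) == ["Abra", "Eevee", "Eevee"]


def test_input_list_is_not_modified():
    original = ["Pikachu", "Alakazam"]
    insertion_sort(original)
    assert original == ["Pikachu", "Alakazam"]
```

A black-box check against the built-in `sorted()` could sit alongside these, which is closer to the first approach.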

twin shale
river pilot
molten hollow
#

I will say, that's the reason I dislike topics in #unit-testing channel 😄

#

Because one person asks something (like @river pilot in this case), I answer, and now @twin shale doesn't agree, so I answer, and now @river pilot doesn't agree, and I'm constantly between two people 😄

twin shale
molten hollow
#

If you're writing your own, then you need more, obviously, as I described above.

twin shale
#

I prefer "gray box testing" (I'm not sure that's an established term): test it as a black box, even though I know the internal workings. And ALSO spend extra effort testing the parts I know are more complicated (and likely to contain bugs) more thoroughly.

molten hollow
#

But you get way better results, if you write tests first, and code after.

river pilot
molten hollow
pulsar oracle
molten hollow
#

But if you want to test that your function (like the one with pokemons), returns values sorted, then you need a test to test-drive that usage of sorted() function.

twin shale
molten hollow
#

But you see, it's not your job to test the tools you're using.

pulsar oracle
#

Although, I'm not familiar with sorting algorithms tbh, don't most of them have the same behavior but just do it faster/slower?

molten hollow
#

there can be bugs in ifs, fors, variables, compilers, all that.

twin shale
molten hollow
#

If you're working on that software, sure.

#

But I guess we're working on our own projects, mostly. And for that, we don't need to test those kinds of things.

twin shale
#

Sure

molten hollow
#

@twin shale are you writing your tests first?

#

Or code first, and then test that?

twin shale
#

It varies. But mostly it's a team effort and we do both in parallel

molten hollow
#

Would be awesome if 100% was test-first.

#

Most of the problems with testing just disappear, if you test-first.

twin shale
molten hollow
#

Okay, my point being, the resulting software often is better designed if you do test-first.

#

Hence, I advise it to people.

#

Unless I'm wrong, in which case I'd be happy to hear a counter-example.

pulsar oracle
molten hollow
#

Oh, yes! I saw that.

pulsar oracle
#

I think it's possible to write testable well architected code without tests at all

#

But I personally wouldn't do no tests.

molten hollow
#

Yes! Interesting observation.

twin shale
molten hollow
#

@pulsar oracle So because he internalized writing testable code, he got the benefits of testable code and loose coupling, without needing test-first.

#

Very interesting video, I agree.

twin shale
#

Pragmatic programmer will be read hopefully starting this year

proud nebula
#

There's also the situation where you don't know where you are going. TDD can be a massive waste of time then.

molten hollow
#

However, my internal sceptic is wary about this, because I bet he created that application in a familiar technology, and probably a familiar domain and ecosystem.

#

Would he have achieved the same results in a new programming language, new framework, new ecosystem, new domain? 🤔 Now that would be interesting to see!

#

Maybe he would, who knows.

pulsar oracle
# molten hollow However, my internal sceptic about this, because I bet he created that applicati...

Very early on I learned: when in doubt, write testable code. I think some people are less error-prone than others (fewer off-by-one errors, paying more attention, more familiar with the language, etc.), even if we're talking about the same language. For me, I try to avoid frameworks, and if I'm using one I separate the important logic and build it separately (I can't imagine not having tests to get feedback, but the design is always pretty solid).

molten hollow
#

@pulsar oracle Are you in Dave Farley's Discord server?

pulsar oracle
#

I'm not. I didn't pay for the patreon or anything, I didn't even know they had a discord server. I'm just a massive fan of the channel, and now lmax, and Martin Thompson, and the extended universe 😭 DaveFarley

molten hollow
twin shale
pulsar oracle
molten hollow
#

You do have some starting points.

#

There are always ways to assert what you already know, and there are ways to take slices.

#

Heck, I did TDD in languages I didn't even know yet.

#

Some time ago, I started to learn Rust, never seen that thing in my life, and my first line was a test.

pulsar oracle
#

I think it is exceptional wherever you know what you want and not how to get it. You get to specify the perfect thing then go make it happen and get feedback. So any new language it's the first thing I'll aim for If I need something done.

molten hollow
#

you know what you want and not how to get it.
Isn't that always the case? I'm not trying to be argumentative, but what are cases where you don't know what you want?

pulsar oracle
#

But If I start with zero clue how to approach or test something I just want to get my hands dirty and see where things will go.

molten hollow
#

I do that too, get my hands dirty with something new.

#

Last week, I started to create a platformer game with a new game library.

#

I didn't know it, and didn't know what it could do.

#

I just wanted to learn that library.

#

I started with a test.

pulsar oracle
#

Maybe it's a mindset problem. I was writing a harness for a discord bot the other week, I didn't know it was possible or how to approach it outside of an existing library doing it. But I had almost no clue how to test it or what the interface should look like, I just jumped in sort of making a domain model of guilds, users, in memory state, and just building up wayyy too high, figuring stuff out. I absolutely could have done this with TDD, it's not figured out and I am redoing it from scratch with it now and making better progress. But It wasn't clicking in my mind, I wasn't thinking how I'd test it, it was too much at the start, and required so much thinking.

#

Once I had an interface/design/sense of where it's going I could start driving development like this. There was also a ton of data i had to put in, stuff i had to look up, this time I started it minimal and got chatgpt to generate it and filled it in, testing event handling with a helper class.

twin shale
#

Is the top half of this picture the result of TDD? 🤡

molten hollow
#

Not sure why would someone ridicule tdd like that

river pilot
# molten hollow Not sure why would someone ridicule tdd like that

i think it's because TDD is often explained in stark terms that don't match reality. TBH, it's something you've been doing here. If you like, explain how you would use TDD to test an IsEven function. Don't get distracted by whether we need that function, etc. Just: how would you test it?

limpid raft
#

I am using pytest-ruff. Is there a way to disable E401 just for when Pytest runs Ruff, i.e. I don't want tests failing, just because the import list is not sorted. That will be enforced as part of pre-commit, but I don't want it checked each time pytest is triggered from inotify while I am making changes to the code

swift pewter
limpid raft
karmic viper
#

I would like to know if BDDs (behave) scenario based testing is used widely in the industry standards? Or pytest based unit testing is sufficient?

proud nebula
#

Yea, from a casual look behave looks the same. A bunch of English that is never checked but is asserted as fact. How could you possibly trust that?

karmic viper
proud nebula
river pilot
proud nebula
#

I also worked at a place that had some. We removed it. There was another team at the same company that had more than we did. They also removed it for the same reason: it was a cost for no gain.

pulsar oracle
# karmic viper I would like to know if BDDs (behave) scenario based testing is used widely in t...

As a developer, and for the practical aspect, you'd probably be more interested in acceptance test driven development. That's where you write tests that ideally use the language of the problem domain to test the entire system. The level of abstraction can vary; they don't have to be for the business (if you're building an HTTP server, for example, they're obviously largely not), and you get more feedback about whether your application is fit for release in continuous delivery terms. You can use pytest, unittest, any testing framework. BDD, for what most developers who use it care about, is just end-to-end testing that happens to have each step written in Gherkin and wired up to code, and it's a pretty bad way to do it.

pulsar oracle
# river pilot i have not seen BDD being used much in practice. One place I worked had some of...

You're supposed to write it in a way that says what should happen while leaving the how separate. The developers are supposed to write the tests, or even both, but anyone reading can see an example usage of the system and be like "yea, that's right". The same way a developer can see a good unit test with asserts and be like yea, that's what that function is supposed to do. The idea was originally formed by Dan North as a way to describe TDD to developers without mentioning the word test, where test case classes are specifications and individual tests are scenarios iirc. But you don't need Gherkin at all to do it, and Dan North just comments the given, when and then parts among normal pytest code.

river pilot
pulsar oracle
muted lichen
#

I've seen BDD be attempted before, with plenty of frameworks. Robot Framework was the worst by and large.

#

You get to a point where you have to write so much custom code, you think to yourself, "what am I doing here?"

#

I'll also say I work in a place that's heavily requirements-based. Lots of IBM Jazz and DOORS. It's god-awful. We're supposed to be capturing tests into those requirement systems and it just doesn't happen. It's just there to look up the customer-signed-off requirements, but nothing ever goes back in.

#

Part of the issue: the requirements aren't set up for programmatic access, and even if they were, the language used in the various processes has diverged so much from the requirements (e.g. dual use of a word) that it's almost impossible to keep aligned.

pulsar oracle
# muted lichen Ive seen BDD be attempted before, with plenty of frameworks. Robot Framework was...

It's also been attempted at the lmax exchange without any frameworks. Everyone wrote the tests and specifications using normal junit and an internal DSL. The business analysts would be sat down with an IDE to write the tests and there would be massive reusability with methods like register, login, create an instrument, wait for something to happen, verify an email was sent, etc. And every developer for every feature or bug fix would create an acceptance test even independently of stuff like user stories. Maybe this BDD stuff is error prone in practice like agile, but there's a huge practical part of it being missed where it's a synonym for acceptance test driven development, testing that the entire application is fit for release, usually in terms of the business with the same terms and language (in any programming language).

https://github.com/LMAX-Exchange/Simple-DSL/wiki

bronze quiver
#

In main.py:

from voiceconversion.RVC.RVCr2 import RVCr2
...
def initialize():
...
    some_var = RVCr2(settings)
...

In test_mytest.py with pytest:

from myapp.main import initialize

I'd like it to use MockRVCr2 (that's implemented in mock_rvcr2.py) instead of the real RVCr2. How to do that?

It's possible to override before the import, but all the linters are unhappy. Is there a cleaner way?

mock_module = types.ModuleType("voiceconversion.RVC.RVCr2")
mock_module.RVCr2 = MockRVCr2
sys.modules["voiceconversion.RVC.RVCr2"] = mock_module
river pilot
#

Where do you call initialize? You should mock things where they are used, so you want to patch main.RVCr2
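
A minimal sketch of "patch it where it's used", assuming the test only needs `initialize()` to build `MockRVCr2` instead of the real class. The in-memory `main` module below is a stand-in for `myapp/main.py` so the example runs on its own, and the `settings` argument is made up. In the real project the equivalent call would presumably be `patch("myapp.main.RVCr2", MockRVCr2)`, with no `sys.modules` tricks before the import.

```python
import types
from unittest.mock import patch

# Stand-in for myapp/main.py (hypothetical), built in memory for this sketch.
main = types.ModuleType("main")
exec(
    "class RVCr2:\n"
    "    def __init__(self, settings):\n"
    "        raise RuntimeError('the real model must not load in tests')\n"
    "\n"
    "def initialize(settings=None):\n"
    "    return RVCr2(settings)\n",
    main.__dict__,
)


class MockRVCr2:
    """Lightweight replacement, as it might live in mock_rvcr2.py."""

    def __init__(self, settings):
        self.settings = settings


def test_initialize_uses_mock():
    # Replace the name that initialize() looks up, i.e. main.RVCr2.
    with patch.object(main, "RVCr2", MockRVCr2):
        result = main.initialize({"rate": 48000})
    assert isinstance(result, MockRVCr2)
    assert result.settings == {"rate": 48000}
```

The patch is undone when the `with` block exits, so other tests still see the original class.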

bronze quiver
river pilot
bitter wadiBOT
#

Your paste is too long, and couldn't be uploaded.

river pilot
#

please delete this.

deft vigil
#

for ?

river pilot
# deft vigil for ?

it's at the very least off-topic for this channel, and obfuscated code is usually suspicious. Please delete it.

gritty oracle
#

aha

#

ok sure

river pilot
deft vigil
#

and yea its test encoded script

river pilot
deft vigil
#

i swear its testing for training

river pilot
potent quest
#

hi i just got into fuzzing and may have overfuzzed some functions
how do you not do that?

proud nebula
potent quest
#

So now my test suite of very basic methods takes about 3 minutes to complete

proud nebula
potent quest
#

Yeah, am working on lowering the fuzzing inside my test suite

#

every time i run it i find more bugs so its actually so far been useful

proud nebula
#

No, you missed my point. There should be literally zero fuzzing done in the test suite itself. You run that separately once in a while to find tests to add.
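
One way to read that split, as a rough sketch: the random exploration lives in its own function that is run by hand now and then, and the regular suite only contains fixed cases that an earlier run surfaced. `slugify`, the alphabet and the "no whitespace in the output" property are all invented for illustration.

```python
import random


def slugify(text: str) -> str:
    """Hypothetical function under test: lowercase, whitespace to dashes."""
    return "-".join(text.lower().split())


def fuzz_slugify(rounds: int, seed: int) -> list[str]:
    """Exploratory run, kept out of the regular test suite.

    Returns the inputs that break the property "no whitespace in the output".
    """
    rng = random.Random(seed)
    alphabet = "aB \t"
    failures = []
    for _ in range(rounds):
        length = rng.randint(0, 6)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        if any(ch.isspace() for ch in slugify(text)):
            failures.append(text)
    return failures


# Regular suite: a fast, fixed case pinned after a fuzz run found it.
def test_tab_is_treated_as_a_separator():
    assert slugify("a\tb") == "a-b"
```

Anything `fuzz_slugify` returns becomes a new pinned test, so the suite stays fast and deterministic.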

potent quest
#

Hmm. Fair....

#

probably need to refigure out how to use hypothesis

proud nebula
#

Think of it as programming itself. You don't "do programming" while the function runs in prod. You do it before :P

proud nebula
potent quest
#

what is the difference 🤔

#

I've calmed my testing somewhat but it's not great yet

#

94% coverage though

#

which is excellent

proud nebula
#

But yea, PBT is commonly thought of as a form of fuzzing. But so is Mutation Testing.

#

And those are VERY different

potent quest
#

interesting

#

what is mutation testing?

proud nebula
#

It's a method to find what behavior your tests don't test.

potent quest
#

Ahhh

proud nebula
#

It can't find behaviors your code doesn't have but should have though. PBT can sometimes help with that.
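
A hand-rolled illustration of the idea, assuming a made-up `is_adult` function. Real mutation testing tools generate variants like `is_adult_mutant` automatically and rerun your suite against each one; here the mutant is written by hand.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # The kind of change a mutation tool makes: ">=" became ">".
    return age > 18


def weak_suite(fn) -> bool:
    """Passes for both versions, so the mutant survives."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case, which kills the mutant."""
    return weak_suite(fn) and fn(18) is True
```

A surviving mutant points at behavior (here, the boundary at 18) that no test pins down.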

potent quest
#

Yeah I should probably use a tad of mutation testing

proud nebula
#

I'm partial towards MT personally. PBT is hard and seldom applicable imo, while MT is a ton of work and always applicable.

potent quest
#

.gh repo onerandomusername ghretos

#

I just shut my computer down otherwise I'd make some other changes

river pilot
potent quest
#

huh

river pilot
pulsar oracle
# river pilot it's hard to write a blog post about mocking without pulling in pages and pages ...

I could have tested this example without mocking and without it being finicky. If I have that function, I'd want to know that my settings are loaded correctly from a settings JSON file in a directory, just not making it explicitly the home directory.

I don't get why we want to avoid opening a real file; it's basically exactly what you want to test, and you don't need to mock. Most people can afford it, and there's tempfile.TemporaryDirectory plus the superb standard library for working with paths (os.path.join and so on). I personally prefer to put as much as I can in an area where it can be tested, so there's less of a chance for it to go wrong on user error.

river pilot
pulsar oracle
# river pilot i dont understand. how would you test it without creating a file in the user's ...

I meant I'd change it to still search in a directory and load json from a file (that function specifically, the others I'd probably change to use a loaded version of the settings and not care where from), and just change the directory, then for testing, I'd put like /tmp/wherever and it would load from /tmp/wherever/settings.json, and I'd know when given the home directory it would load it pretty much as expected. No mocks.
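
Roughly what that could look like, as a sketch: `load_settings` and the `settings.json` contents are invented here, and the original function from the blog post is not reproduced. The directory is passed in, so the test points it at a temporary directory and production code would pass `Path.home()`.

```python
import json
import tempfile
from pathlib import Path


def load_settings(directory: Path) -> dict:
    """Read settings.json from the given directory (hypothetical helper)."""
    settings_file = directory / "settings.json"
    return json.loads(settings_file.read_text(encoding="utf-8"))


def test_loads_settings_from_directory():
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        (directory / "settings.json").write_text(
            '{"theme": "dark"}', encoding="utf-8"
        )
        # A real file is read; nothing is mocked.
        assert load_settings(directory) == {"theme": "dark"}
```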

river pilot
pulsar oracle
# river pilot right, a kind of dependency injection

Exactly, though I'm personally hesitant to call it that with primitives, it doesn't bring up the right idea in my head (very arbitrary tbh). But yea inverting control of where the path for the directory containing the config comes from.

#

But usually people use dependency injection to get out of testing anything real and in my experience (in regards to anything I want to find out that stuff actually works) it just moves around where I have to test stuff at.

river pilot
#

if your point is "why use mocks at all", then this is what I meant above when I said, "it's hard to write a blog post about mocking without pulling in pages and pages of advice about how to write better tests."

pulsar oracle
ember maple
#

these days i avoid mocks+monkeypatches if i can - allowing for dependency injection and validated fakes is so much more joy

random sorrel
#

I'm doing procedural generation, I'm using files as input and both random.seed and np.random.seed are fixed. Output keeps changing. Any obvious ideas I'm missing?

river pilot
random sorrel
#

no, it's not public, thanks though, I'll try to go step by step and see where things start changing.

river pilot
random sorrel
random sorrel
# river pilot when you find out, let us know.

I was recording timings for functions for optimization purposes and put that into a dict and returned that. Obviously the timings are always slightly different and that changed my control hash.

The other thing I found before that that started the whole thing was that I didn't have the numpy seed set, so that's probably the first thing I fixed and then I got stuck on this other "problem".
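
A small sketch of that failure mode and one way around it, assuming the result is a dict with a `timings` entry next to the generated data. `generate`, `control_hash` and the dict layout are illustrative only, and plain `random` stands in for the numpy seed.

```python
import hashlib
import json
import random
import time


def generate(seed: int) -> dict:
    """Hypothetical generator that returns its result plus profiling data."""
    rng = random.Random(seed)
    start = time.perf_counter()
    tiles = [rng.randint(0, 9) for _ in range(16)]
    elapsed = time.perf_counter() - start
    return {"tiles": tiles, "timings": {"generate": elapsed}}


def control_hash(result: dict) -> str:
    """Hash only the deterministic part; timings differ on every run."""
    stable = {key: value for key, value in result.items() if key != "timings"}
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

With the timings left out of the hash, two runs with the same seed compare equal again.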

thin imp
#

Yoo who is active??

river pilot
timber anchor
#

And you do not need to use gherkin or even a BDD test framework to do BDD. You do not even need a unit test framework to do TDD.

If there is some confusion with the middle layer of making your test cases, this is not really a testing problem. More of an organization one now.

Meaning, the design was probably always bad. And it also exists in the prod code, not just test code.

river pilot
timber anchor
#

If you are doing input validation, that is fine to test, but you have to name and test the case for that input validation. But purely just testing for evenness, probably coupling to the lang now.

river pilot
timber anchor
#

i think it's because TDD is often explained in stark terms that don't match reality. TBH, it's something you've been doing here. If you like, explain how you would use TDD to test an IsEven function. Don't get distracted by whether we need that function, etc. Just: how would you test it?

Ok. Joke is joke. You asked though.

river pilot
#

I asked about testing isEven(). If you had that function in your code, why wouldn't you test it? Sure, it's easy to imagine it's a one-line function, but that line needs a test.

#

@timber anchor ^^

timber anchor
#

Perhaps it would be better for you to answer why it "needs a test"

river pilot
river pilot
pulsar oracle
# timber anchor And you do not need to use gherkin or even a BDD test framework to do BDD. You d...

Gherkin is just BDD at the functional testing level meant for business analysts. I've been doing BDD exactly as Dan North explained in his original article at the unit level for two years now. Some people are rubbed the wrong way by the gherkin part and miss the original approach entirely. If you use a unit test framework to do any type of test you can format them the same and ideally have them say exactly what you want it to do while saying very little about how (the level of abstraction varying). I personally heard it from Dave Farley and got it immediately then got confused by other material and wasn't sure I was doing BDD because of 99% of explanations being for functional testing but the article plus other explanations once again clarified it, I'm not perfect, some tests are definitely crummy and fail to exactly read as specifications or scenarios but I'm definitely most of the way there especially recently.

pulsar oracle
# river pilot because i could have written the line incorrectly: ```python def isEven(x): ...

If you're using BDD you don't test it. You say you want a function that will tell you if the number is even. Then you do a few scenarios using it. You name the test case after what is being tested or specified, then name methods like sentences that specify what it should do.

TestIsEvenOdd:

test_should_detect_uneven_numbers_as_odd

What should it do? So give it an odd number have it return false because that's what you want, see it fail because it doesn't meet the specification, go make it be true, or should it really be true? If it shouldn't maybe another developer or person can be like, nope, need a different behavior (as understandings change even in the code we think we want). And so on. You test any function you want, and you test any code that uses it for broader behaviors, maybe I'd test this twice as part of something broader that also has even odd functionality. It's in the first part of this article and I've been doing it for years and now do it for acceptance tests, my naming is just better.
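
Spelled out as runnable code, that naming style might look like the sketch below. The class name and first method name are the ones given above; `is_even` itself and the other scenarios are assumptions added for illustration.

```python
def is_even(number: int) -> bool:
    return number % 2 == 0


class TestIsEvenOdd:
    def test_should_detect_uneven_numbers_as_odd(self):
        assert is_even(7) is False

    def test_should_detect_numbers_divisible_by_two_as_even(self):
        assert is_even(10) is True

    def test_should_treat_zero_as_even(self):
        assert is_even(0) is True

    def test_should_detect_negative_uneven_numbers_as_odd(self):
        assert is_even(-3) is False
```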

https://dannorth.net/blog/introducing-bdd/

proud nebula
#

You sound almost religious when you defer to authority that much.

river pilot
pulsar oracle
# river pilot this is a lot of words, but i don't see how it's bdd. You said, "check that the ...

It's exactly the origin of BDD though, and part of it. It is testing but explained differently, and a way to understand test driven development specifically. It's the flavor of it and how you think about it, whether at the unit level or functional. They're basically identical in what's being done, but if you use BDD there's a heavy emphasis on specifying what it should do while leaving out how it does it, and making it very sentence-like.

river pilot
#

i like the "specify what it should do". sentence-like doesn't really appeal to me.

#

maybe it's a bit unfair, but "BDD" now is associated with intermediate tooling that many people find unproductive.

#

type fewer words

pulsar oracle
pulsar oracle
river pilot
#

we definitely agree that the important thing is to have tests.

timber anchor
# pulsar oracle If you're using BDD you don't test it. You say you want a function that will tel...

I would go one further and say you dont absolutely have to test that either.

It would make sense to approach it from the end user first, and then you can explain why you needed to test if something is even based on the business/user needs.

Example: Equipment must be inspected on matching parity day.

If for some reason some business rule forces you to check for evenness in a unique way then this is going to be a test, yes. But its more of a contract test to assert that your types can do modulo arithmetic, which isnt about testing if something is even anymore.

If it isnt already in your prog lang, then ok you can test drive it, but if you are just converting types with builtins and then doing the modulo, its already tested. There would need to be a far better reason than we just need coverage

river pilot
#

or maybe not: "converting types and doing the modulo" is code you can get wrong. You should test it.

timber anchor
#

Its an internal detail bool(x%2) we may as well check if it is even valid code (which in something like Java it is not).

If for some reason python changed how it evaluated the truthiness of this, your test will fail despite not changing any of your code.

Closer to language paranoia.

Your tests can inadvertently cover scenarios for evenness/oddness and avoid explicitly testing the output of modulo and how python interprets ints as bools (also known as trust and know the language).

Its not a bad sanity check to assert something is even or odd, but i would not formalize such a thing as an actual unit test.

#

Its more of just assert as sanity check, instead of unit test.

river pilot
timber anchor
#

Sanity check for your own programming language understanding. If you do not know what this does, and you need to use it, then go ahead and assert it. Or just read the docs.

river pilot
river pilot
timber anchor
#

No. it means use the assert keyword

#

You can try in a repl

river pilot
timber anchor
#

No. Just validate your own learning. If you call learning manual testing, then sure

river pilot
#

@timber anchor how do you decide what functions to write tests for?

pulsar oracle
river pilot
pulsar oracle
river pilot
pulsar oracle
timber anchor
#

Like I said, indirectly. If you have a higher up business rule or scenario, and change this function those tests should fail. Example: testing that a particular customer support on-call rota strategy involving an every other day rotation behaves as expected.

#

But the test doesnt have to cascade that high up either. You can unit test the behaviors closer to isEven

river pilot
timber anchor
#

This is a common misconception. I recommend looking into this yourself for now

river pilot
#

you don't have to if you don't want to.

timber anchor
#

I wont go into much more detail because there are people who speak on it far better than I do, and it has been done... but the unit is closer to behaviors (perhaps even so far as to say specifically end-user behavior) instead of functions.

#

Hence BDD

river pilot
pulsar oracle
timber anchor
#

number of days covered in on-call with an every other day strategy.

test: leap year vs non leap year.

it will find problems with is even or odd quickly. 366 vs 365 days

river pilot
pulsar oracle
#

I want to recover album photos from a game. There's a cache directory, and these photos are JPEGs of a certain resolution. To do this I need to check if a file is a JPEG (does it have the signature?); think of it as analogous to "is this number even?" My real logic is: take a directory and find photos that are JPEGs of a given resolution.

So I'd do find_album_photos_in_directory(directory_path: str) and I would place actual photos in that directory and be interested in, will it find a jpeg photo with the right resolution? Will it skip one of a different format like a PNG with the same resolution, each individual function like is even or even some functions to read this information are important, but I wouldn't test them for this problem, I'd hide them. So what I'm saying is, if you need to check if something is even and you have that function, you just skip it, even if you make it easier and less error prone. But I agree completely if we're talking about making a library of functions, a standard library, something where that exact code is what you want people to consume.

timber anchor
#

It might not need to, but it can. One team can work 182 vs 183 days. Or on leap year, 183 vs 183. The importance is in asserting which team gets 183.

#

You are testing leap year vs non leap year though, not isEven

river pilot
#

I totally understand testing the user-visible functionality of the product. I agree that's a good thing. It's also good to have tests at smaller granularities. They can include cases that are harder to do at the higher level.

#

I'm not sure why you think it would be bad to test at the lower level also.

timber anchor
#

That is the lower level. You can use fakes for all of that if you like

#

If you are referring to my example anyway

pulsar oracle
# river pilot I'm not sure why you think it would be bad to test at the lower level also.

I personally think that it does depend. If I have something I need to do all over the place it would be nice to have a tested function available that I can rely on to do it (even if it's going to be tested in other places indirectly, and I find that when I do this sometimes it makes it easier to put everything together later). But in the example I gave I avoid it because it's like planning ahead, not being so iterative. One time when I did this I started writing these different functions like for is_photo_a_jpeg(file_path) and it felt like I was making publicly available functions to be relied on that I don't need when I should have been starting with a function that solves the problem that I want and testing it. I personally think about the issue as public vs private code for it.

river pilot
river pilot
timber anchor
#

the test would be something like two_people_divide_days_in_leap_year
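
A sketch of that behavior-level test, assuming a simple "alternate every day, starting with person 0" strategy. `days_per_person` is a made-up name and the real rota logic is not shown in this thread.

```python
import calendar


def days_per_person(year: int, people: int = 2) -> list[int]:
    """Count on-call days per person when the rota alternates daily."""
    total_days = 366 if calendar.isleap(year) else 365
    counts = [0] * people
    for day in range(total_days):
        counts[day % people] += 1
    return counts


def test_two_people_divide_days_in_leap_year():
    assert days_per_person(2024) == [183, 183]


def test_two_people_divide_days_in_non_leap_year():
    # The person who starts the year gets the extra day.
    assert days_per_person(2023) == [183, 182]
```

An off-by-one or a flipped even/odd check in the alternation shows up here as a wrong count, without a dedicated isEven test.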

river pilot
timber anchor
#

it is an on call rotation, should be at least two people

river pilot
timber anchor
#

I think calendar does this already for you. You are testing a dependency?

pulsar oracle
timber anchor
#

Maybe. Not entirely sure

pulsar oracle
#

I would. If I have business logic that depends on a year being of a certain type and doing something if it is, I would extract it into its own interface like YearTypeChecker with a method to check what type of year it is, then write an implementation and test it there. For the core business logic I'd use a mock and make it say a certain year is a leap year, or test it with my implementation of the checker injected as a dependency.

timber anchor
#

If you are writing a date library, sure

#

What are you mocking?

pulsar oracle
#

If I have business logic dependent on whether the year is a leap year, then I put that in its own class and make a mock to test it. The actual thing would need to use an implementation that checks that the leap year logic is at least somewhat right, which is where integration tests come in, or at the very least push it off to some acceptance tests.

river pilot
pulsar oracle
#

(I'm not talking an individual function at this point by the way, just business logic and testing that logic somewhere)

pulsar oracle
# river pilot so you wouldn't test `is_leap_year` directly?

I think if I'm writing code I want to know that it does something when it's a leap year and when it isn't and I don't care about the logic for it actually being one, so it could take a callable to check, or a class that serves that purpose and you could just lie with some given input to check what you want. And I think a leap year example is different because it's something probably directly related to your business logic (If I'm not imagining this wrong) and you would want to know what you pass in actually works somewhat.
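
The "take a callable and lie to it" version could be sketched like this. `shifts_in_february` is an invented business rule, and defaulting to `calendar.isleap` is just one possible real implementation.

```python
import calendar
from typing import Callable


def shifts_in_february(
    year: int, is_leap_year: Callable[[int], bool] = calendar.isleap
) -> int:
    """Hypothetical business rule: one shift per day in February."""
    return 29 if is_leap_year(year) else 28


def test_schedules_extra_shift_when_told_it_is_a_leap_year():
    # The checker "lies": 2023 is treated as a leap year for this scenario.
    assert shifts_in_february(2023, is_leap_year=lambda year: True) == 29


def test_schedules_regular_shifts_when_told_it_is_not_a_leap_year():
    assert shifts_in_february(2024, is_leap_year=lambda year: False) == 28


def test_default_checker_is_wired_in_correctly():
    # Covers the "does what I pass in actually work somewhat" concern.
    assert shifts_in_february(2024) == 29
    assert shifts_in_february(1900) == 28
```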

river pilot
pulsar oracle
amber fulcrum
#

I have tests that need some setup, e.g. loading a JSON schema. I've set them up as fixtures in the module and that's fine. However, I need the same fixture functions for several test modules, but with different file input. To stay DRY, I'd like to move these fixtures to conftest.py.

Is there a way to parameterize fixtures in conftest.py where the parameter is actually provided by a test file/module?

#

Can a test fixture with @pytest.fixture(scope="module") get access to a variable which a specific module sets? -- Because scope="module" in conftest.py means per test-file, not per-conftest, right?

#

Google suggests that I can create a common fixture that returns a function that does the heavy lifting. This way I can give a per-file input to the reused fixture.

#

It ended up something like this:

# ---- conftest.py ----
import json
from pathlib import Path

import pytest

@pytest.fixture(scope="session")
def fn_schema():
    """Return a function that reads a schema file."""
    def schema(filename: str | Path) -> dict:
        # ROOT is assumed to be defined elsewhere in conftest.py.
        with open(ROOT / "schemas" / filename, "r") as f:
            return json.load(f)
    return schema

# ---- test_something.py ----
DUT_NAME = "user_data"

@pytest.fixture(scope="module")
def schema(fn_schema):
    return fn_schema(DUT_NAME)

def test_example(schema):
    ...

Then I realize that the fn_schema() doesn't create any value as fixture since the schema() fixture is required. It could as well just be a regular imported utility function.

river pilot
amber fulcrum
#

Scope isn't terribly important for that

#

I found indirect= as an option to parametrize and am looking into whether that is a more elegant method

#

This works, although the repeated @pytest.mark.parametrize(...) quickly gets very tedious:

# ---- conftest.py ----
@pytest.fixture
def schema(request):
    with open(ROOT / "schemas" / request.param, "r") as f:
        return json.loads(f.read())

# ---- test_something.py ----
DUT_NAME = "user_data"

@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def test_user_data(schema):
    ...

@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def test_user_data_2(schema):
    ...
amber fulcrum
#
# This works:
schema = pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)

@schema
def test_fn(schema):
    ...  # This works

# This doesn't work: pytest marks have no effect when applied to a fixture
@pytest.fixture
@pytest.mark.parametrize("schema", [DUT_NAME], indirect=True)
def local_schema(schema):
    return schema

def test_fn2(local_schema):
    ...  # This doesn't work
molten hollow
velvet dirge
boreal tundra
#

I wish there was an ability to pattern match the module name on the command python -m unittest discover -s "longmodulename.modulenameblah*" -p test.py

#

I can match on the filename, but not the module ducky_skull

proud nebula
#

with -k

wind zenith
#

Hi, so I made a click app that when you run it, runs a function with some prints and inputs like:

# in cli.py
@click.command(name="start")
def start():
    main()

cli.add_command(start)

# in cli_app.py
def main():
    raw_to_parse = input(textwrap.dedent(
        """
        Welcome!
        Do you want to start?
            (Y)es   [default]
            (N)o

        """
    ))
    to_parse: bool = True

    if raw_to_parse.lower() not in ["", "y", "yes"]:
        to_parse = False

    if to_parse is True:
        get_pattern()

How would I test this? CliRunner.invoke() doesn't seem to handle inputs in the function itself.
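
One stdlib-only approach, as a sketch: call `main()` directly and patch `builtins.input` for the duration of the test. The prompt text is shortened and `get_pattern` is replaced with a stand-in that records the call. `CliRunner.invoke` also accepts an `input=` argument that feeds stdin, so it may be worth checking whether that reaches the bare `input()` call in your setup.

```python
from unittest.mock import patch

started = []


def get_pattern():
    # Stand-in for the real get_pattern(); records that it was called.
    started.append(True)


def main():
    raw_to_parse = input("Do you want to start? (Y)es [default] / (N)o ")
    if raw_to_parse.lower() in ["", "y", "yes"]:
        get_pattern()


def test_empty_answer_defaults_to_yes():
    started.clear()
    with patch("builtins.input", return_value=""):
        main()
    assert started == [True]


def test_no_skips_get_pattern():
    started.clear()
    with patch("builtins.input", return_value="n"):
        main()
    assert started == []
```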

river pilot
fiery arrow
#

This seems like a very good argument for including tests in coverage, but they do include it in coverage, it just doesn't error when it's not 100% 🤔

river pilot
#

sounds like a good issue to write

fiery arrow
#

i'm just making sure I'm seeing it right

#

would be really silly to make a PR fixing it and it's like "we're obviously running the cve test in super-cool-separate-cve-runner"

river pilot
fiery arrow
#

(kinda sus that the only test this happened with is a security related one)

#

sorry, I'm a bit over-paranoid

river pilot
fiery arrow
#

ugh, codecov has geoblocking... and also doesn't like requests over Tor...

#

I have to spam "new circuit for this site" just to see the coverage. smh

weary quarry
#

👀 A cool example, in the wild, of why I like coverage on my test files right there. 📸

radiant stirrup
#

So, like I have a problem when trying to run hatch test and I am not sure what I can do to fix it.

bitter wadiBOT
summer surge
#

Hi everyone! I'm here about the concept of writing tests. The F.I.R.S.T principles require writing tests simultaneously with creating some function x, or even before it, but what about reality? Sometimes you don't want to write tests simply to check whether a value or None is returned, or to write a test before a certain function even exists.
So how to correctly implement T - Timely part of principles in real-world development?

river pilot
#

To me, Timely doesn't mean the test should exist before the code. It means the tests should be added to the project when the code is added to the project. "Added" could mean a pull request, or a work item, or whatever. The new code or fix isn't done until there are tests to go along with it.

summer surge
river pilot
marsh raft
river pilot
marsh raft
#

ooh sneaky 🙂

river pilot
#

I think you mean, using all of the tools available to me 🙂

marsh raft
#

YES THAT'S CHEATING

languid lance
#

I'm really confused...
Why does a test with the name test_lorum_ipsum_update pass, but when I change it to test_update_lorum_ipsum it fails?

river pilot
marsh raft
#

perhaps there are two tests with the same name? test frameworks might silently run only one of them
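For Python specifically, a duplicate method name inside one class means the later definition replaces the earlier one, so only one test is ever collected. A small sketch with a hypothetical test class:

```python
import unittest


class DuplicateNames(unittest.TestCase):
    def test_update(self):
        self.fail("never runs: replaced by the definition below")

    def test_update(self):  # same name, so this definition wins
        pass


# The loader only sees one method called test_update.
names = unittest.TestLoader().getTestCaseNames(DuplicateNames)
print(names)
```

Nothing warns about the shadowed test, which is why a duplicate name is worth ruling out early.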

languid lance
#

The assertion that's failing when I have the test named as test_update_lorum_ipsum is:

mock_mongo_document.find_one_and_update.assert_called_once_with(...)

The fail is:
AssertionError: Expected 'find_one_and_update' to be called once, Called 0 times

Then, when I change the test name to test_lorum_ipsum_update it passes.
There are no other tests with the failing test name. That was my first thought 😅

river pilot
#

Renaming could change the order the tests are run. Perhaps your tests are not isolated from each other
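To make the ordering point concrete, here is a small sketch, assuming the tests are collected by unittest's default loader, which sorts test method names alphabetically. The class and the middle test name are hypothetical; only the two lorum_ipsum names come from the discussion.

```python
import unittest


class OrderDemo(unittest.TestCase):
    def test_update_lorum_ipsum(self):
        pass

    def test_mocks_the_update_function(self):
        # Hypothetical neighbour that leaves a mock behind.
        pass

    def test_lorum_ipsum_update(self):
        pass


# Names come back sorted, regardless of definition order.
order = unittest.TestLoader().getTestCaseNames(OrderDemo)
print(order)
```

Under one name the test runs before the neighbour that mocks things, and under the other it runs after it, which is enough to flip the result when state leaks between tests.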

languid lance
marsh raft
#

you're certain that when it passes, it actually runs (as opposed to being skipped)?

#

I guess your future looks like: simplify your tests bit by bit until you discover the bit that is breaking things

languid lance
river pilot
marsh raft
languid lance
#

@river pilot @marsh raft - okay, so if I comment out the test above it which mocks out the function that I'm testing later, it passes with the name that I want (test_update_lorum_ipsum).

river pilot
pulsar oracle
languid lance
#

Here's the test class' setUp:

def setUp(self):
    self.service = AttendanceService()

The first test:

async def test_handle_absence(self):
    { ... }
    
    self.service.update_attendance = AsyncMock()
    self.service.update_attendance.return_value = AttendanceModel(...)

    { ... }

    self.service.handle_absence()
    self.service.update_attendance.assert_called_once_with(...)

This passes...

Second test:

async def test_update_attendance(self, mock_document):
    mock_document.find_one_and_update = AsyncMock()
    mock_document.find_one_and_update.return_value = AttendanceModel(...)

    test_result = await self.service.update_attendance(...)

    mock_document.find_one_and_update.assert_called_once_with(...)
river pilot
#

What is cleaning up the mocks? Something needs to undo them at the end of the test.
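One common way to get that cleanup is to start a patcher and register its stop() with addCleanup, so the original attribute is restored even if the test fails. A minimal sketch, with a hypothetical stand-in for AttendanceService rather than the real class:

```python
import unittest
from unittest.mock import AsyncMock, patch


class AttendanceService:
    """Hypothetical stand-in for the real service."""

    async def update_attendance(self):
        return "real"

    async def handle_absence(self):
        return await self.update_attendance()


class AttendanceServiceTest(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.service = AttendanceService()

    async def test_handle_absence(self):
        patcher = patch.object(
            AttendanceService, "update_attendance", new_callable=AsyncMock
        )
        mock_update = patcher.start()
        # Undo the patch when the test ends, pass or fail.
        self.addCleanup(patcher.stop)
        mock_update.return_value = "mocked"

        result = await self.service.handle_absence()

        self.assertEqual(result, "mocked")
        mock_update.assert_awaited_once_with()
```

After the test finishes, AttendanceService.update_attendance is the real coroutine function again, so a later test of update_attendance is not affected by the mock.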

#

or, what makes mock_document, and what uses it?

#

I hope AttendanceService isn't a singleton....

languid lance
river pilot
languid lance
#

Yeah, which now makes sense with me not having a cleanup

river pilot
languid lance
#

I agree with you. The client that I'm working for uses them, so my hands are tied.

river pilot
languid lance
atomic thistle
#

Ran into a bug I can't get to the bottom of, maybe someone smarter than me can figure it out. The following test produces this error in Python 3.10 - 3.11, but not 3.12 or later:

NameError: name 'isclose' is not defined

def test_shear_wall():
    file = "/Users/villager/Projects/pynite/Examples/Shear Wall - Basic.py"
    exec(open(file).read())

  • The script makes use of the math.isclose function, imported with from math import isclose.
  • I cannot reproduce the error in a smaller example.
  • There is no error running the file directly, or via exec, so I expect pytest is somehow related.
  • More details: https://github.com/JWock82/Pynite/pull/301
swift pewter
#

Also: Do you have more traceback than just the NameError? Which line does it error on?

atomic thistle
#

I will try removing everything past the last isclose call.

swift pewter
#

I would remove things until it no longer occurs. Does it happen when referring to isclose at all after the import? Does it even happen in the script file itself (or in something it calls)?

atomic thistle
#

Thanks! That was helpful, it's happening at the list comprehension. This also throws an error:

n = len([node for node in model.nodes.values() if height])
NameError: name 'height' is not defined

I'll keep poking at it.

river pilot
#

it sounds like some odd scoping thing
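One plausible explanation for the version split, offered as an assumption rather than a confirmed diagnosis: exec() called inside a function without explicit namespaces gets separate globals and locals, so names the script binds (isclose, height) land in the locals mapping. Before Python 3.12 a comprehension ran in its own nested scope and resolved such names through globals only, while 3.12 inlines comprehensions. The sketch below uses a hypothetical inline SCRIPT in place of the real example file and passes a single namespace dict to exec, which avoids the split.

```python
SCRIPT = """
from math import isclose
values = [1.0, 2.0, 3.0]
close_to_one = [v for v in values if isclose(v, 1.0)]
"""


def run_script_in_one_namespace(source):
    # One dict serves as both globals and locals, like a real module.
    namespace = {"__name__": "__main__"}
    exec(source, namespace)
    return namespace


def run_script_with_split_namespaces(source):
    # Mirrors the failing test: the function's locals and the module's
    # globals are separate, so the comprehension may not see isclose
    # on Python versions before 3.12.
    exec(source)
```

The standard library's runpy.run_path(file, run_name="__main__") is another way to run a script file in a single fresh namespace.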

sacred lintel
#

anyone have an idea on how i can write unit tests for this?
https://github.com/CheetahDoesStuff/sleet
i fear that they will change / affect the project's environment (installing/deleting packages, writing commits etc) as that is what it's built for and those are the features i would need to test
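A common pattern for this kind of tool is to run each test in a throwaway directory and replace the call that would touch the real environment with a mock. The sketch below is hypothetical throughout: install_package() and its marker file are invented for illustration and are not taken from the linked project.

```python
import subprocess
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch


def install_package(name, project_dir):
    """Hypothetical function that shells out and records what it did."""
    subprocess.run(["pip", "install", name], cwd=project_dir, check=True)
    with (Path(project_dir) / "installed.txt").open("a") as fh:
        fh.write(name + "\n")


class InstallPackageTest(unittest.TestCase):
    def test_install_runs_pip_in_project_dir(self):
        # Temporary directory: nothing in the real project is written.
        # Patched subprocess.run: no package is actually installed.
        with tempfile.TemporaryDirectory() as tmp, \
                patch("subprocess.run") as fake_run:
            install_package("requests", tmp)

            fake_run.assert_called_once_with(
                ["pip", "install", "requests"], cwd=tmp, check=True
            )
            recorded = (Path(tmp) / "installed.txt").read_text()
            self.assertEqual(recorded, "requests\n")
```

The test checks which command would have been run and what was written to disk, without installing anything or touching a real repository.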

river pilot
sacred lintel
limpid raft
#

I am currently taking over a pretty big code-base that isn't in Git, nor is it test-covered. What I would really like to do is import individual pieces of code into Git along with the tests I write. This is kinda hard to do with a big, convoluted code-base, and I am wondering if you know of a way to run pytest on the Git index (cached changes) whenever those change? I.e. I do git-add and pytest runs on whatever is HEAD + index at that time and gives me the output? Sort of like what pre-commit can do pre-commit…

river pilot
#

why not put all of the code into git now? I don't understand how git-ness and tested-ness are connected.

limpid raft
#

It's me trying to make sense of the big thing by carving out batches at a time.

#

maybe I just want pre-commit

river pilot
#

i wouldn't run pytest in pre-commit, it could be much too slow.

limpid raft
#

Well, I agree, but maybe this is precisely what I need right now?

swift pewter
# limpid raft I am currently taking over a pretty big code-base that isn't in Git, nor is it t...

I think doing this on the index is hard. Maybe you could settle for doing it based on commits? Then you could use some combination of git worktree to have a second, linked checkout of the same repo which is on that same branch (but doesn't have any uncommitted files), entr (e.g. to watch HEAD), and a little script that pulls and runs pytest in the second checkout whenever you make a commit.
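For the index-based variant asked about originally, one possible approach is to export the index to a temporary directory with git checkout-index and run pytest there. This is a rough sketch under stated assumptions: it is run by hand from the repository root after git add, pytest is installed for the current interpreter, and the helper names build_commands and run_pytest_on_index are made up for illustration.

```python
import subprocess
import sys
import tempfile


def build_commands(export_dir):
    # --prefix needs a trailing slash to be treated as a directory.
    prefix = export_dir.rstrip("/") + "/"
    return [
        ["git", "checkout-index", "--all", "--prefix=" + prefix],
        [sys.executable, "-m", "pytest", export_dir],
    ]


def run_pytest_on_index():
    """Export the staged tree (HEAD + index) and run pytest on the copy."""
    with tempfile.TemporaryDirectory() as export_dir:
        export_cmd, pytest_cmd = build_commands(export_dir)
        subprocess.run(export_cmd, check=True)
        return subprocess.run(pytest_cmd, cwd=export_dir).returncode
```

Because the export contains only what is staged, untracked or unstaged files in the working tree do not influence the result. The commit-based worktree setup described above remains the better fit when the run should be triggered automatically.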