#algos-and-data-structs
1 messages Β· Page 71 of 1
how do you upgrade wsl π€‘
sudo apt update && sudo apt full-upgrade
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
sudo do-release-upgrade
lhott@JasonTower:~$ sudo do-release-upgrade
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release
set Prompt=normal in /etc/update-manager/release-upgrades.
yes I had that too
ah you're not on LTS
I like to live dangerous
Btw my desktop computer is not healthy atm. A couple of days ago I had tons of bluescreens. After that chrome didn't want to start, forcing me to reinstall it.
I should really get it fixed
Seems my ram or drive is fucked
too many benchmarks
computer's complaining
no more sparse tables please
is the gcc version different on your new ubuntu
maybe only gcc fixed it
I think it has seen worse
seems hefty already
||I didn't||
Ah yes I hadn't even seen the stop code
π
my wsl is taking ages to update ubuntu
but then I won't be able to reproduce any issues
It warns it could take hours
yes but
Where is the best place to practice algos and data structures?
most places are fine
https://cses.fi/problemset is nicely organised I guess
I was just going to say that
But cses.fi is more for C++ than Python
Because of its very tight time limits
But then you learn about obscure xor trees
Thanks
I've gone over the bugs reported to bugzilla (glibc) using memcpy. The ones that could be relevant are these
https://sourceware.org/bugzilla/show_bug.cgi?id=32475
https://sourceware.org/bugzilla/show_bug.cgi?id=32427
https://sourceware.org/bugzilla/show_bug.cgi?id=30995
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
No one reports a slowdown even close to what we are experiencing
They talk about things like 30% slower on AMD, that is what they call "significant"
Not 3000% slower =p
Could be that they either found the bug themselves, or just accidently fixed it while changing something else
lhott@JasonTower:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.10
Release: 24.10
Codename: oracular
lhott@JasonTower:~$
why are you special enough that you get 25.04
have you tried updating again
even if that worked i would be too offended to do it
can you try slowdown bug
lhott@JasonTower:~/tmp$ vim test.c
lhott@JasonTower:~/tmp$ gcc test.c -O3
lhott@JasonTower:~/tmp$ ./a.out
lhott@JasonTower:~/tmp$ time ./a.out
real 0m0.221s
user 0m0.212s
sys 0m0.000s
lhott@JasonTower:~/tmp$ vim test.c
lhott@JasonTower:~/tmp$ gcc test.c -O3
lhott@JasonTower:~/tmp$ time ./a.out
real 0m0.210s
user 0m0.224s
sys 0m0.000s
nothingness
Maybe it is already fixed
*glibc
Still one thing that doesn't mate sense
How did I have string slowdown but not list slowdown
in 3.12.3
try me
Just have to wait for my computer to bluescreen and press recover
and bang
free downgrade
so
with godbolt
you can run code
and all GCC versions are listed
let's find the one
You have to understand how things work. All you see on godbolt is that memcpy is called
you can execute the code
You don't see what is going on inside of it
and time it
you just have to see when the bug starts happening
for which version i mean
i know i just wanted somewhere where I can test all gcc versions I guess
although if it's glibc
ah yes if it's glibc I won't be able to reproduce
yeah even old versions of gcc are fast on godbolt
Can you even test old versions of gcc on godbolt?
but
so
GCC fixed it but not clang?.
I thought that the problem was "past the compilers"
But on windows, I've never been able to reproduce with gcc
can you?
nope
I can't reproduce on godbolt what you say
Maybe it just runs things on different servers?
For me it seems semi consistent on godbolt
I never got 373 ms
Quite big variations
You were the one who suggested it
ConfusedReptile had the slowdown on ubuntu 24.04 with gcc version 13.3.0 and clang version 18.1.8
I have no slowdowns with Ubuntu 25.04 with gcc version 14.2.0 and clang version 20.1.2
probably what confusedreptile had
but he had a real linux
That should not matter
on my real linux, gcc 13.3, I have NOTHINGNESS
well it's not real it's a VM idk
I have a windows update to do
Should I do it
maybe we'll unlock new behaviours
π
Are you Romain Lhotte?
Yeah
I thought itβd be relevant to add that it was fixed
On Ubuntu
Put some pressure on the windows devs
what if putting computer to sleep makes the bug appear again
Itβs not sleep
Itβs restart
This time
But weβve had plenty of restart tests already with your blue screens
hum HUM HUM
PS C:\Users\lhott> python -m timeit "([0] * 2**18)*14"
50 loops, best of 5: 5.4 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+1))*14"
20 loops, best of 5: 6.8 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+100))*14"
50 loops, best of 5: 5.46 msec per loop
I can't reproduce
It's over
Not even in C (nevermind I was using gcc for C)
No no its the other way around. If it goes to sleep, I'm afraid it will never wake up again.
my python is still going πͺ
ah now I can
PS C:\Users\lhott> python -m timeit "([0] * (2**18+100))*14"
50 loops, best of 5: 5.39 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+100))*14"
50 loops, best of 5: 5.47 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+1))*14"
20 loops, best of 5: 13.1 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+1))*14"
20 loops, best of 5: 9.36 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * (2**18+1))*14"
20 loops, best of 5: 13.1 msec per loop
PS C:\Users\lhott> python -m timeit "([0] * 2**18)*14"
50 loops, best of 5: 5.2 msec per loop
so upgrading windows didn't change anything
I honestly think the best course is to downgrade back to the ubuntu that wasn't working
to investigate why the memcpy we think happens doesn't happen (otherwise it'd be slow)
time to just report confusedreptile's version as not working
we just need to add a printf statement in the C code, I'm downgrading ubuntu
what
well in C:
memcpy with a distance of 2**18 + 8 and a count of 2**18 is slow
in Python:
not slow
so I'll put a printf in CPython source code just before the memcpy
You want to modify CPythons source?
I don't think it's that complicated
you just have to recompile it
it's worth a shot
I did compile it "manually" to install Python 3.13.5 anyway
ok, downgrading is annoying
In a perfect world, microsoft will reply with more info about what the bug is and what is affected
But reading through other issues, I'm expecting a "Our engineers will take a look at this" kind of reply
I'm back on Ubuntu 24.04.1 LTS
lho.....5$ python3 -m timeit -r 20 "('d' * ((2 ** 14) + 1000))*4"
500000 loops, best of 20: 853 nsec per loop
lho.....5$ python3 -m timeit -r 20 "('d' * ((2 ** 14) + 1))*4"
50000 loops, best of 20: 9.06 usec per loop
as expected, string is bad.
And lists are not bad.
Even though, in C:
memcpy(src + (1 << 18) + 8, src, 1 << 18);
is slow
So I'm going to go printf size_t((dest + copied) - dest) and (size_t)bytes_to_copy (basically what is called here in CPython source code):
memcpy(dest + copied, dest, (size_t)bytes_to_copy);
Running:
a = ([0]*((1<<18) + 1))*4
Prints this to me in my custom-compiled CPython:
Offset: 2097160, Bytes to copy: 2097160
Offset: 4194320, Bytes to copy: 4194320
2 097 160 = 2^18*8 + 8
It all makes sense, the bug cannot be repeated when the distance is 2^21
Fast:
lhott@JasonTower:~/tmp$ vim test.c
lhott@JasonTower:~/tmp$ gcc test.c -o test
lhott@JasonTower:~/tmp$ time ./test
real 0m0.259s
user 0m0.278s
sys 0m0.000s
lhott@JasonTower:~/tmp$ cat test.c
#include <string.h>
#include <stdio.h>
#define REPEATS 8000
char src[10000000]; // Allocate a large buffer
int main() {
for (int i = 0; i < REPEATS; ++i) {
// Perform memcpy with specific offset
memcpy(src + 8*(1 << 18) + 8, src, 8*(1 << 18));
}
return 0;
}
Fast as well:
lhott@JasonTower:~/tmp$ vim test.c
lhott@JasonTower:~/tmp$ gcc test.c -o test
lhott@JasonTower:~/tmp$ time ./test
real 0m0.242s
user 0m0.264s
sys 0m0.000s
lhott@JasonTower:~/tmp$ cat test.c
#include <string.h>
#include <stdio.h>
#define REPEATS 8000
char src[10000000]; // Allocate a large buffer
int main() {
for (int i = 0; i < REPEATS; ++i) {
// Perform memcpy with specific offset
memcpy(src + 8*(1 << 18) + 800, src, 8*(1 << 18));
}
return 0;
}
So the mystery is solved, the problem is entirely on memcpy, we just benchmarked it wrong
uhm what
tl;dr: no bug in C with a distance of 2^21 + 8 which is what is used by CPython for your list example
Haven't we been over this. On my computer anything up to +24 worked
But you said +7 was the highest you could go I think
for 2^18
2^18+8 = slow for me (on WSL)
2^21+8 = fast
And obviously both 2^18+800 and 2^21+800 are fast
It seems the something small is dependent on the power of 2
Okey I looked at that just now. On mvsc (1 << 20) + 1 is fast.
but << 19 is slow
So the consistency is less nice as we though on the C side
But at least on the CPython side, everything makes sense with the C benchmarks
It really was frustrating me that we weren't able to link both
Also, it's surpisingly easy to add a printf in CPython Source code and compile it, honestly I thought it was going to be a nightmare of bugs and whatnot but it's not
This one is fast for me
#include <string.h>
#include <stdio.h>
#define REPEATS 4000000
int main() {
char src[10000000];
for (int i = 0; i < REPEATS; ++i) {
memcpy(src + (1 << 19) + 1, src, 4*4096 + 1);
}
return 0;
}
This one is slow in msvc
#include <string.h>
#include <stdio.h>
#define REPEATS 4000000
int main() {
char src[1000000];
for (int i = 0; i < REPEATS; ++i) {
memcpy(src + (1 << 19) + 1, src, 4*4096 + 1);
}
return 0;
}
Somehow the size being 1e6 vs 1e7 matters here
Does it just optimize away the 1e7 one?
Moving src to global makes both slow
My benchmarks were with src in global
and no optimizations flags anyway
I only did things with gcc to explain why we could reproduce the evil string example but not the evil list example
The consistency around when memcpy is slow makes no sense for now - I was just making sure that everytime Python is slow, we can pinpoint exactly the corresponding slow memcpy and everytime Python is fast, we can pinpoint exactly the corresponding fast memcpy. That's reassuring
So ([0] * (2**15 + 1)) * 2?
I think so
No wait
Yep
You are thinking about this wrong
Why?
A Python list of size 2^15+1 has an underlying 2^18+8 bytes-long array
but then why * 8?
That's because my brain is a bit small
Shouldn't * 2 be the correct thing
lhott@JasonTower:~$ python3 -m timeit "([0] * (2**15 + 1)) * 2"
5000 loops, best of 5: 89.4 usec per loop
lhott@JasonTower:~$ python3 -m timeit "([0] * (2**15 + 100)) * 2"
5000 loops, best of 5: 46.6 usec per loop
Yes.
Unless 2^18+16 is also slow but I forgot
And at this point everything makes sense so I'm not too worried
try **14 and * 4 instead
Did you benchmark 2^17+16? Why those numbers?
lhott@JasonTower:~$ python3 -m timeit "([0] * (2**14 + 100)) * 4"
10000 loops, best of 5: 35 usec per loop
lhott@JasonTower:~$ python3 -m timeit "([0] * (2**14 + 1)) * 4"
5000 loops, best of 5: 62.1 usec per loop
To get a big slowdown, I would like the x in * x to be as big as possible
For list, with * 2, you get a +8
For list, with * 4, you get a +16
You need to see which power of 2 the "something small" is as large as possible
For * 6 you actually get a +16 still
don't you?
you might be able to make it bigger a bit like this
For * 7 you get a +24 I assume
No
wait
Thats not how it works
They double the pointer each time. But the count can be less than double in last iteration
Ah, I think you're right
I don't think it's "particularly" good but neithe ris it "particularyl" bad; you still have some bad memcpy hits
In theory, *16 should be the winner
or *8
It is still strange that 14 and 20-22 seemed optimal
If you really want to maximise the slowdown you'd have to do that.
Knowing what we know now
Did we ever extensively try *16?
I did
I think I used 14 to not Memory Limit Exceeded the Discord Python bot
I used 14 since my benchmark told me it was a winner
in slowdown
Oh here is a theory
It could be that the time it takes (the "slowdown") doesn't scale linearly in count
That could definitely explain why *14 looks better than *16
Maybe a slowdown bugged memcpy scales sublinearly
Ok I think this is probably it
Makes total sense
Possible
as long as count >= 8193
but sublinearly gets slower
so you'd want to hit as many 8194 as possible
Ye probably that
But ... do we care
well 8193
off by one errors is what probably caused this bug in the first place
I think the only reason I'm now decent at off-by-one errors is doing some CP - I was terrible at debugging these things before but still decent at my job lol
Why havent we done a 8193 long string yet. That should be murder
Something like 8193 times 16
What
Ah yes
2^13 + 1 string
ain't that disappointing
no one said the slowdown was even monotonic in count to be fair
The point of 2**13+ 1 is that you can do *16 or *32
16 is / seems optimal
!e
import time
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 1000)) * 16
print(time.time() - start_time)
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 1)) * 16
print(time.time() - start_time)
:white_check_mark: Your 3.13 eval job has completed with return code 0.
001 | 0.35376739501953125
002 | 0.5982496738433838
!e
import time
start_time = time.time()
for _ in range(50000):
("s" * (2**12+ 1000)) * 16
print(time.time() - start_time)
start_time = time.time()
for _ in range(50000):
("s" * (2**12+ 1)) * 16
print(time.time() - start_time)
:white_check_mark: Your 3.13 eval job has completed with return code 0.
001 | 0.2638430595397949
002 | 0.33711934089660645
I guess 2**13's slowdown is bigger
makes sense since the first copy with 12 is not problematic
On my computer, *64 is the clear winner.
0.4272184371948242
3.963329315185547
no
Something is funky
import time
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 100)) * 32
print(time.time() - start_time)
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 1)) * 32
print(time.time() - start_time)
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 100)) * 64
print(time.time() - start_time)
start_time = time.time()
for _ in range(50000):
("s" * (2**13 + 1)) * 64
print(time.time() - start_time)
0.7776021957397461
3.9934470653533936
0.9463193416595459
4.486739873886108
Wtf are these times
double of 0.7776021957397461 is not 0.9463193416595459 last time I checked
Wel if the last copy doesn't trigger a bug in memcpy
Could it be that +100 isnt enough?
It might be super fast
To be fair that could be the case too
Lets use +0 instead
Couldn't it just be that the last copy doesn't trigger a bug in memcpy
last copy is a copy of size 2^.. + 32
you said the bug was working up to 24
Where are you getting your 64 from
Why would it be straight if your x-axis is logarithmic
This is the string example?
yes
it'd be linear if it wasn't for that last point lol
seems like you maxed out how much of a slowdown you can get
they're perfectly parallel
Let me do a better plot with every possible *k
More benchmarking more benchmarking!
more like minimized
?
I meant like k > 32 doesn't seem to give you additional slowdown over the negative controls
ye after 32 the slowdown dissapear
yep
but it makes sense
you said 24 was the upper bound on your computer
k = 64, you do a copy with +32
and this isn't slowed down
but k = 32, the last copy is +16 which is slowed down
So if we had a 2^10 + 1 = 1025 long list
Then we would have effectly start with 2^13 + 8
Yep i think so
and we would be able to do * 4
Kind of
I still dont understand how *22 could result in such a good example
that should be +88 on last copy
Clearly there must be other + values that work other than the small ones we've tried
Given that *22 was working, which effectly runs +128
My github example starts with:
2^21 + 8 bytes
Yes maybe it stops at +24 but then comes back
At you hit all the right values
With my custom python it tells you all the arguments of memcpy that are hit π
So you are saying that that 2^25 + 128 is a killer?
For 1 << 24 it seems slow happens +x <= 31
I'm unnable to reproduce ([0] * 262145) * 14 in C calling memcpy manually
It should be doing these 4 calls
memcpy(2^21 + 8, 0, 2^21 + 8)
memcpy(2^22 + 16, 0, 2^22 + 16)
memcpy(2^23 + 32, 0, 2^23 + 32)
memcpy(2^24 + 64, 0, 12582960)
memcpy(2^24 + 64, 0, 12582960) does not result in a slowdown for me
neither does memcpy(2^23 + 32, 0, 2^23 + 32)
only memcpy(2^21 + 8, 0, 2^21 + 8) has a slowdown 
Idk what is going on here
Youβre trying on MSVC?
Since that slowdown is only on your Windows
I am
Maybe I'm not seeing the slowdowns because I'm not using the exact same version of MSVC as CPython uses
Another thing that we do differently
We keep calling the same memcpy over and over. It could be that this somehow makes things faster because of caching.
CPython doesn't do that.
Ah yes
The output assembly is same except for the constant of 1e6 and 1e7.
The specific memcpy implementation would be the difference. Maybe different branch at different sizes.
"Who" is in charge of implementing memcpy?
Is it the "os people"?
or the "cpu people"?
I have no idea how things work "down there"
Compiler people.
Ah it's the compiler people directly?
Ok I see
There is some implementation for the standard library somewhere, and you can swap this out.
What do you mean?
There are multiple C standard library implementations.
Most common: https://en.wikipedia.org/wiki/Glibc
The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s by the Fr...
On Windows this is Microsoft's for MSVC.
Ah yeah ok I see
Parts of the C standard library are implemented in assembly, and so it's a bit weird, some C features are tied to the library/runtime.
Then there is like math.h, which is technically not part of it, and so on Linux you need to separately link to it.
I was talking about msvc specifically. In gcc and clang I had to make the array global for the memcpys to not be optimized away
Yeah I checked in MSVC, same assembly output that just calls memcpy.
So whatever the memcpy implementation happens to be.
Are you absolutely sure?
It really looked to me like it didn't compute anything in the 1e7 case
I'll check again when I'm back home
iirc pajenegod is on 19.38 but it's the same as well on 19.38 indeed
My first initial guess is that past a certain size it triggers a different branch inside memcpy with a different loop. That loop may be slower for that exact size and a bit above it, but if you go even further it gets faster.
We're talking a difference of speed up to a factor 60
Did I read it wrong before? I thought the issue was smaller gave fast, then a big bigger gave slow, then even bigger gave fast right?
It's a bit much to follow in the chat can you restate the issue here?
I have tried to be as exhaustive as possible in that issue
Oh, ok.
The first few lines should sum up everything
Hmm, need to dig into how memcpy is implemented to tell what is going on.
I find that fascinating but I'm starting to be very far from my comfort zone unfortunately
It's not open source AFAIK, but we can look at the assembly in the binary.
a = ([0] * 262145) * 14
Offset: 2097160, Bytes to copy: 2097160
Offset: 4194320, Bytes to copy: 4194320
Offset: 8388640, Bytes to copy: 8388640
Offset: 16777280, Bytes to copy: 12582960
Yep everything you are is consistent with the printf
So idk
People have gotten the slowdown on both windows and Linux. So this seems to be some bug that multiple implementation of memcpy falls into.
In that case it also raises likelihood on hardware problem that all the implementations run into.
99% of the people getting it have AMD
so yeah I'm definitely thinking hardware problem
but upgrading our WSL Ubuntu versions to the latest preview versions fix the issues
π
This makes it more interesting IMO, but would really need to get the source to memcpy (any of them) and test on AMD and not AMD.
I think it is likely some simd instruction on Ryzen 9 recent AMD cpu being buggy/extremely slow
my laptop is Ryzen 7
and has the bug
I have another guess that due to the memory size being the difference, maybe something having to due with their memory paging?
It could be a heuristic triggered internally too, or some other threshold.
Should we run perf
To have an idea
Like if we see a huge spike in branch misses
Or whatever
No idea how to run that on Windows though
You can, but it would probably be much faster to get way more useful info by getting the memcpy implementation.
(The specific one that ended up in the binary and the assembly it compiled to specifically)
(Before going further check what other compiler versions output on the same hardware to make sure it's not that)
(And if they are different, that gives you something to compare too though)
On Windows, MSVC triggers the bug but not GCC
On WSL for a non-preview version of Ubuntu, gcc triggers the bug
So if we look at the gcc and MSVC binary compiled on windows
There should be a diff and that should be insightful I guess
Yeah, compare the final binaries, can objdump them, find the memcpy location, compare assembly. Specifically any unique instruction differences (AMD has many fancy instructions, if there is a fancy looking one that is the difference, probably it).
Iβll be honest i have no idea how to do those things
Currently googling what objdump mean
You just give it the binary and it outputs all the assembly and symbols (you can think of as functions / jump locations).
It's just a tool to look at what is in the binary.
The binary has more than the machine code, lots of info.
I think it's in mingw too on Windows, not sure, but there are other tools.
There is ofc Ghidra.
And how will you know which instructions correspond to the memcpy?
If you find the memcpy symbol you can see where that is.
You can also go to where memcpy would be called (you know this because its your own code), then follow that address.
Try making any little program, like hello world, compile it, then objdump it.
(-D to see the assembly (converts machine code to assembly))
Ghidra has a GUI and many tutorials for it.
It is on Windows you were right
\mingw\bin\objdump.exe
I have to log off but Iβll see if I can find the memcpy call and see what instructions (if any) differ between the + 1 (slow) and the +1000 (fast)
I feel like comparing the Windows buggy MSVC binary and the Windows non-buggy gcc one will be more complicated but will probably give more insight π€·ββοΈ
Like if a bunch of people are injured, prioritizing the most critical first.
I guess in this case it probably means that you got put to the back of the line.
So triaged could mean both down-prioritzed or prioritized
Fancy word used incorrectly. They probably meant sorted by priority level.
(The image of this being a battlefield where there are many injured soldiers that are being sorted by needs is funny though (it's not that serious ("I'm dying! Please you gotta fix my bug!")))
shouldn't you look at libc's source instead if you want to look at memcpy
Itβs now « under considerationΒ Β»
The objdump of the GCC version of the MRE that triggers the bug is huge (14406 lines), what am I supposed to look for in there
And the MSVC one is 35163 lines and ends with "..." so I think it couldn't even fit everything in objdump -D .\smallerExampleMSVC.exe > msvcDump.txt
They really have the weirdest names for their statuses
-D will disassemble everything
if you just want to see main's code use -d
But you might just use godbolt for that
But I think the point was to see the internals of memcpy
At least on godbolt, I just see call memcpy, so that doesn't really tell me why MSVC has the performance bug and GCC doesn't
Actually that gives much more than what is on godbolt, I guess it shows what's inside memcpy, but I have no idea where to look lol
if you want to see memcpy's assembly code you would have to dump glibc
sth like libc.so.6 on the system
navigating the source code might be a better idea but I don't have much experience in that
Actually looks like I'm stack overflowing
Adding larger stack via /F67108864 fixed it
How large is the executable file?
So there is no issue if you malloc the array instead?
42 KB for GCC and 108 KB for MSVC
And the initial .c file is just:
#include <string.h>
#include <stdio.h>
#define REPEATS 4000000
char src[1000000];
int main() {
for (int i = 0; i < REPEATS; ++i) {
// Perform memcpy with specific offset
memcpy(src + (1 << 18) + 1, src, 8193);
}
return 0;
}
I guess including stdio is useless but it should be optimised away?
Yes
Try listing just the symbols and see if memcpy is in there (you can grep it): https://www.man7.org/linux/man-pages/man1/nm.1.html
objdump can disassemble a specific symbol (-d=symbol).
It looks like the issue has been found though, but if you want to learn how to do this anyhow, go for it.
Btw, 108 KB for something that basically does nothing is kind of wild, you can fit an entire 3D FPS into about 96 KB if you try hard enough (on Windows).
Interesting that an array that large does not compile error then.
It depends on the platform but some constant for each OS in the compiler could be there and adjusted as the OS does.
The issue about why a bigger array made things weird was solved but the "initial" issue was not (memcpy super slow sometimes independently of the size of the array)
I mean, it's MSVC, not me π
Yeah I know, just wanted to point out that that default size of everything is pretty large.
We have a lot more memory now, but still does not feel good.
I guess since no one cares about KBs they don't care either
Also that.
I'm struggling to understand how nm would help
I checked in the de-assembled file, there is no memcpy
I stand corrected
Ok, you can try disssasembling just that instead of all of the program.
objdump -d=memcpy you can try.
Does it mean that there was some inlining that made the "name"/"calling" of the function disappear from the machine code?
It should be in objdump too, but nm is listing just the symbols so we had a more managable output size.
You can do objdump --syms.
There is memcpy in the objdump of GCC and in the nm of GCC
But there is no memcpy in neither of the commands applied on the (huge) MSVC binary
When I nm the (huge) MSVC binary I get: "no symbols"
It was not compiled with mingw (GCC).
For the MSVC one you will need other tools.
Ah.
Maybe.
But then, I can't reproduce the (initial issue) with GCC
Try doing a debug mode build.
So analysing the GCC binary won't help me that much
It's been a while seen a messed around with mingw on WIndows.
It will give you an implementation of memcpy you can compare (some idea of how people implement it).
Unfortunately, the only way for us to see the bug is to use a MSVC binary on Windows π€‘
You can use Visual Studio.
Or Ghidra.
They don't like the equal
should I just do objdump -d memcpy myBinary.exe?
Needs the equal there.
PS C:\Users\lhott\Desktop\temp\c\untitled> objdump -d=memcpy smallerExampleGcc.exe
gets me
objdump.exe: unknown option -- =
Try --disassemble (I usually type the whole long form).
ah it works
so what I'm doing here is:
tell me how the assembly code corresponding to a memcpy call of that binary?
And I'll see the "choice" the compiler made as to "which memcpy implementation" is used?
So it's just showing that it's calling out?
Sorry?
Does it show something like call memcpy? Or do you see memcpy's implementation assembly?
The latter
Ok, then lets see where it's pulling it from with ldd https://www.man7.org/linux/man-pages/man1/ldd.1.html .
Then we find that library binary file and same process (dynamic library is also an object file, only difference with an executable is that it has no main function and a flag is set (in the header) that it can't be executed).
That's assembly right? maybe I was wrong
It's not.
I dont have permission to start voice
If you have Visual Studio, it comes with its own command prompt you can open that has similar tools like dumpbin.
If you want to use those.
What is it though?
Hi
Looks like header file information, talking about architectures.
The Intel 386, originally released as the 80386 and later renamed i386, is the third-generation x86 architecture microprocessor from Intel. It was the first 32-bit processor in the line, making it a significant evolution in the x86 architecture. Pre-production samples of the 386 were released to select developers in 1985, while mass production c...
x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit extension of the x86 instruction set. It was announced in 1999 and first available in the AMD Opteron family in 2003. It introduces two new operating modes: 64-bit mode and compatibility mode, along with a new four-level paging mechanism.
In 64-bit mode, x86-64 supports signific...
"elf" is Executable and Linkable Format, used by Linux. https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format) is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4)...
The little and big refer to endianness. https://en.wikipedia.org/wiki/Endianness
In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed (by rising addresses) in computer memory, counting only byte significance compared to earliness. Endianness is primarily expressed as big-endian (BE) or little-endian (LE), terms introduced by Danny Coh...
"pe" is Portable Executable, used by Windows. https://en.wikipedia.org/wiki/Portable_Executable
Portable Executable (PE) is a file format for executables, object code, dynamic-link-libraries (DLLs), and binary files used on 32-bit and 64-bit Windows operating systems, as well as in UEFI environments. It is the standard format for executables on Windows NT-based systems, including files such as .exe, .dll, .sys (for system drivers), and .mu...
@stiff quartz About the github issue. Is the guy that closed the issue part of DevDiv? He is using "we" in a way that confuses me
The good news is that VCRuntime is owned by DevDiv (maintained by the libraries and compiler back-end teams), so we can ship fixes much more easily than getting UCRT changes (which is owned by Windows and is much more difficult to get changes into).
I would assume so?
No idea but if he says we Iβd guess he works for them
Stephan works for microsoft and maintains the STL
So what does "we" refer to?
microsoft?
Idk, I thought he was from DevDiv
DevDiv is a team inside of microsoft afaict
STL might be part of it, idk
(STL the person, not STL the lib)
if anybody knows numba: #1400452755276955648
you can just ask your question, don't ask to ask
I wonder how people come up with solutions like this in 45 min interview
Given an integer array nums sorted in increasing order, return an array result of the same length, where result[i] should be the sum of the absolute differences between nums[i] and every other element in nums.
def getSumAbsoluteDifferences(self, nums):
n = len(nums)
result = [0] * n
prefix_sum = [0] * (n + 1)
# Calculate prefix sums
for i in range(n):
prefix_sum[i + 1] = prefix_sum[i] + nums[i]
# Calculate result array
for i in range(n):
left_sum = prefix_sum[i]
right_sum = prefix_sum[n] - prefix_sum[i + 1]
result[i] = i * nums[i] - left_sum + right_sum - (n - i - 1) * nums[i]
return result
I could never on my own come with formula result[i] = i * nums[i] - left_sum + right_sum - (n - i - 1) * nums[i] Because of this I lose motivation to practice
How do you practice things like this? Often coming up with a really simple naive approach is a good first step to noticing a smarter way of doing it
also depending on the company you're interviewing for, doing a simple working solution first and then talking through the process of optimizing it might impress them more than just being able to go directly to the clever solution
Yeah, I do not usually write code for naive approach but I usually know naive approach and when I feel like solution is naive I do not think that much about it
I always feel like if I try to think about naive approach and then try to explain and talk about it, that I lose time because I do not immediately think about optimal solution...like I benefit more from coming up with optimal solution and writing it rather than not writing it because I wasted time on writing solution for naive approach
maybe I could benefit from thinking more about naive approach
yeah comparing them and gradually moving from a naive approach to a clever approach is a skill in itself
some stuff to think about is why the naive approach isn't good, identifying and pointing out how it's wasting time or what it's doing that's repetitive, and how a better solution fixes those problems
Hello. I am new in the programming world. I have just started learning Python but before I have been learning MakeCode and Scratch. I have base but not so fully because of my teachers at school. Do you have some tips what to do, learn? I am having classes about Python with group. We have just finished algorithms and some things that we use in Python such as int, if else... Please someone help me I want to be persuade a career in programming.
can anyone help get me going in the right direction on visualizing an algorithm which packs boxes of known dimension into a shipping container? it's basically the 3D bin packing problem where n of bins = 1. so far GPT has really failed to help me in any meaningful way, therefore I'd like to begin to implement known algorithms from scratch, with the end goal of visualizing the algorithm as it runs. setting aside the animation, im struggling to implement the logic and space filling code, for example be it a voxel based approach, handling rotations of boxes, et cetera
that expression is simpler than you think if you just break it apart
you start with this, the most naive thing
result[i] = sum(abs(num[i] - num[j]) for j in range(n))
```you can split this in two cases, where the number is smaller and when it is bigger, to get rid of the `abs`
in this case we have a sorted array, so this just becomes
sum(num[i] - num[j]) for j in range(i))
+ sum(num[j] - num[i]) for j in range(i+1, n))
we can sum terms independently, so
sum(num[i]) for j in range(i))
- sum(num[j]) for j in range(i))
+ sum(num[j]) for j in range(i+1, n))
- sum(num[i]) for j in range(i+1, n))
```which can be simplified to
sum(num[i]) for j in range(i))
- i*num[j]
- sum(num[j]) for j in range(i+1, n))
- (n-i-1)*num[i]
this is still a slow solution, but now you can notice that the sums that remain are prefix/suffix sums
so we can optimize that
and that's the whole problem
thanks for writing that. I guess I gotta put more effort and be more patient
although at least I feel that I made improvement and that I got better at solving problems
it's a nice case of starting with a naive and correct formulation and transforming it into something that can be better attacked
Ok so I have a working prototype now of the box placing algo, however I am failing to capture the intermediate steps, eg I want to actually show when a placement tries and fails, show rotations, et cetera
But depending on my stack this may now be like a react or nodejs question
There, I am interested what you are saying, And I can help you.
Itβs actually an interesting problem that Iβve been focused on even if you leave out the weights of the boxes and just focus on how many you can fit in the container
Can you tell me in more clearly?
The bin packing problem in 3D and with n bins = 1
How can you most efficiently pack a shipping container basically given a set of boxes of known dimension
Fitting the most boxes possible
The way I see it, this is mainly a mathematical exercise about manipulating sums
My solution would then look like this
def getSumAbsoluteDifferences(self, nums):
n = len(nums)
result = [0] * n
prefix_sum = [0] * (n + 1)
# Calculate prefix sums
for i in range(n):
prefix_sum[i + 1] = prefix_sum[i] + nums[i]
# Calculate result array
for i in range(n):
left_sum = prefix_sum[i + 1]
right_sum = prefix_sum[n] - left_sum
# The +2 is because of 0-indexing
result[i] = nums[i] * (2 * i + 2 - n) - left_sum + right_sum
return result
can someone check if 2nd option is correct or na ?
G is a directed graph
Without giving away too much I think there should be ||2|| true statements total
hmm, I got ||1|| true statements
actually you are probably right
there could be ||a negative cycle in a separate component||
wait no
it says "in p" 
not in G
the other statement would be ||the subpath one||
||#3, yeah||
I think p cannot contain a positive or a negative cycle
But I'm not 100% what they mean by p containing a cycle
Only case where a cycle appears in a shortest path is if its weights is 0
By that logic, there are ||3|| true statements in total
Is anyone aware of how i can progrsm a bot to recognize wicks on a candlestick in the money market?
I read it as "in G" first
talking about cycles in a shortest path is kinda sketchy in general
Talking about a shortest path as a list of nodes is sketchy too
I see it as a list of edges
It's the edges that are weighted
It's the subgraph containing those edges that potentially could contain a cycle
Rate my solution to Reducing Dishes
Someone explain how people are doing this in 10 lines with 1 loop and 1 item
huh, this problem really shouldn't be marked Hard
The part you're probably missing is that to calculate the new score when you prepend an item, you just add the sum of all current items to the current score. And the sum of all current items you can update on the go, too.
This problem is solvable in O(n) by doing this
does my approach logically make sense
how tf wud anyone know that intuitivley
damnnn
i just shove on the largest negative numbers infront of the positive list
to shunt it along one at a time and check if its improved
Why is leetcode ide so ugly
That's what I do too (my reasoning was: it's always optimal to have your chosen list in sorted order, and use the largest numbers possible, so the only question is how many of the biggest negatives you prepend to your list of positives).
I wrote mine like this (and then realised it's overcomplicated because I could handle positive and negatives together instead of having separate neg_score and cur_score variables, but was too lazy to rewrite it):
how tf wud anyone know that intuitivley
I would say it relies on having done a few problems solved by computing cumulative sums. When I looked at this scoring mechanism I went "oh that's a cumsum for sure, no wait it isn't, oh no wait it still is"
That is kinda how I did it I think. But Iβve seen people solve in 10 line
The code I posted is after looking at it and thinking for
Half an hour because I donβt do leetcode so I just had to brute force it in my brain
nit: O(n) after sorting
Oh right, I forgot that sorting takes time π
and you can make it a tad shorter
def solve(satisfaction):
sum_so_far = 0
best = 0
score = 0
for x in sorted(satisfaction, reverse=True):
sum_so_far += x
score += sum_so_far
best = max(best, score)
return best
For people that practiced a lot of competitive programming, in what ways your thinking improved and what kind of techniques have you learnt that are useful for getting better at problem solving? I am asking about things related to problem solving other than knowing some specific tricks like slow and fast pointer, so I am asking, what you have realized about how you can generally improve problem solving skills?
These are things that I have learnt:
- I solved like 170 problems and I have realised that it is useful to think about different characteristics/parts of solution and try to understand how different characteristics of solution map to solution, what kind of problems different parts of solution solve, e.g. that hashing could be used for checking number of occurences of particular element. After understanding how different parts of solution solve problem, then it is easier to solve new problems, because solution for new problems sometimes consist of different parts of solution from old problems
- I have learnt to be patient and if I don't understand something at first, to ask specific question, then try to get an answer to that question so I come up with solution (before I was too impatient).
- I have learnt that it is important to "think in examples" and not wait for solution so that it comes on it's own, so really put effort and "think in examples"
- Other than that, I have learnt that sometimes it is useful to think about how to solve problem without using code
Im far from being anywhere near the level of other people here but, usually solving more problems on your own is the only answer here
Right, because new problem is somewhat similar to one that was solved
crazy af
where to start data structure and algo!?
I think you could even do this
def solve(satisfaction):
sum_so_far = 0
score = 0
for x in sorted(satisfaction, reverse=True):
sum_so_far += x
score += max(sum_so_far, 0)
return score
The answer is just the sum of all sum_so_far that are positive since if sum_so_far turns negative, then it will never become positive again
Im doing leetcode and even though i was learning for a while and completly finished the base learn python course on boot.dev I cant seem to solve any leetcode problems does anyone know a fix for this, do i just need to code more because i cant solve anything without AI help and then i feel like im not learning its a shitty spot lol any ideas
I prefer reading books and a mix of exercises (i.e, LeetCode). I read "Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People", and I liked it. I guess it's a good starting point.
Are you brand new to programming? i ask b/c you said you just did the base python course.
yes
Probably too early to focus on Leetcode then
I was steered towards leetcode when I started and it wasnβt a good use of time. Even the βeasyβ problems assume a lot of baseline knowledge
but that's just my two cents. i'm by no means an expert
not sure if this is the right place to ask, but, just wanted to know if anyone could suggest a good place to learn algos and data structures from.
Have you checked out the awesome list? The computer science section should have some stuff.
I will check it out.... Thanks!
Bruh why leetcode is freaking hard. I feel very dumbππ
https://www.skartner.com/dsa/two-pointers/remove-occurences
Hi, I am creating a course on DSA in Python, it's free, please check out.
I plan to cover 300 important questions. Blog contains problem-statement, different approaches, code and video explanation at end.
My background - SDE with 4 yoe, now a startup founder.
Here's my leetcode - https://leetcode.com/u/RahulGupta9877/
Your Partner in Skills Development, learn Data Structures and Algorithms, Math and more. Primarily useful for people wanting to get into Software Engineering and Machine Learning.
I will leave yall with a question
go write some code to find the saddlepoint of a m x n matrix
looking for a coding parter for DSA in Python
expected time complexity?
and I don't think this is possible in better than O(n m) unless we have some guarantees about the values
e.g. you can have a sea of say constant 5 then a
5 5 5 5 5
5 5 6 5 5
5 4 5 4 5
5 5 6 5 5
5 5 5 5 5
```in the middle
or is a constant like that all saddle points?
even if so you can make some evil patterns if repeated values are allowed
Chat i need a help!!! I am working for a competition where i get to finish 400 tasks. I am stuck on this task where I need to write a python program to convert inputs like this to output. They can give any input and i may have to convert it. here is some input and output data :
Click here to see this code in our pastebin.
All saddle points
O (M x n )
how do we know what the solution is?
e.g. why can't the solution for the yellow part of the first one be
similarly for the one with the pluses, how do we know it's not more tightly spaced pluses?
Very trivial. Not bothering to write it lol
Actually may be fun to golf
Probably it's "goes the furthest it can where it complerely overlaps".
I might be tripping but this is extremely reminiscent of the ARC-AGI series of benchmarks for testing LLM reasoning abilities; if I'm right, then it's better suited for #data-science-and-ml
also this benchmark is still under active research, there's a competition with big prizes (though, it doesn't seem to limit submissions only to llms) currently underway for example, so there might not be a satisfactory answer for now
it certainly looks related, and there is an ongoing competition
its an arc agi competition.
Kaggle one??
lol
thats what im in
how do you find out time complexity of a code ??
counting
Guys any ideas about where I can find the link to sign up for IBM's open source quantum?
does anyone know how in order dfs traversal works?
Did you try using " IBM open source quantum" as a search phrase on Google? Perhaps you should try it.
BFS and DFS are essentially the same. The former uses a queue and the latter uses a stack.
I did but it didn't show up at first, but I have the link now
Hi Everyone, here's a small task if anyone interested.
You have to write a algorithm which will visit every url on the site once, save the url and screenshot the page using x,y coordinates. you can generate the x,y coordainates and tagtype using the pupeteer script i am providing. It will generate the x,y,h,w and tagtype for all the visible elements on the webpage. I am also attaching a script which you can use to control the mouse and keyboard using python. So you have to move the cursor to the center of the rectangle. and save. visit next url, screenshot and save, repeat for all the urls, as you visit a new url, wait for page to load, run the run script to get all the new elements, and filter only the urls, use a list to track all the urls visited. and restrict the algorithm from visiting any url outside of domain.
And yes, I am willing to pay
Looking forward to hear from interested minds
Click here to see this code in our pastebin.
you can use python selenium, or any other library you want.
i was looking for a fellow python mate to grind DSA in , anyone who is serious for getting into FAANG , hardcore level of DSA grinding , it would be convenient if you are a newbie in DSA and using python mainly for consistency
Note: Mostly focussing on leetcode grinding , not doing competitive programming or codeforces
does any one know how to to fix found numbers in python with formulas relating to data sets
@regal spoke I didn't see but look what the bot has said on your issue "Our teams prioritize action on product issues with broad customer impact."
Does that mean they put the issue in "We basically don't care"
There has basically been no commucation. That bot just post the same stuff on every issue.
That doesn't mean they arent working on it
Ah I thought only some issues had that
I wouldnt read too much into it
@regal spoke
I don't know what they did but hacking CPython dictionaries is like so much worse (or good, depends on how you see it) in Python 3.14
I've seen odd reports of 3.14 having random performance issues, I wonder if it might be a messed up build setup or something
like missing some optimizations that otherwise would happen
lto maybe?
could be, but the "non-hacked" version is consistent across versions
if it wasn't compiled properly, the "non-hacked" version would be slower too?
or is it a marketing technique to say "we improved by a factor 3 collision hash performance" when 3.15.xx will be out
not necessarily, in normal usage the bad codepath is hit very rarely, so if that path ended up being worse in one version it would look like that
I donβt understand what you mean sorry
If it was compiled without optimisation, the peak in time for 3.14 would occur for both the bad path
And good path
Ah yeah I didnβt show the graph for good path
But good path is the same time for all versions
not without optimization, but without some specific optimization like lto or whatever
if the path gets hit that part would indeed be slower, but if you have something like
total_cost = lookup_cost + resolve_conflict*n
```assuming the overall lookup cost is a largish constant, then you might not see the contribution of the conflict resolution for small n
so even if it got slower you might not see it unless you hit that path a lot of times per query
in normal usage you expect very few collisions (typically 0), while with the hack input you expect a ton of them
Ah so you mean that optimization would only affect resolve_conflict cost?
the lookup cost itself might be dominated by just a fetch from main memory
ah ok makes sense
i.e. nothing you can just trivially optimize
I'll try to compile the source code myself from the latest point in the branch
of course I'm speculating π
Yeah of course, but it's still worth it to check it I guess
Ah I think I found how to enable release mode
@haughty mountain damn
maybe you were right, those are my results with python locally built with optimizations
when debugging a function clicking step over it sometimes says running and i wait for 5 minutes and nothing happens still waiting?
Infinite while loop or something?
jasonn what makes you sucha good enginerer?
\
So I just found out that the slowdown also is here on Python 3.13.6 if you download it from the main website
Could be different build setups that aren't mistakes
Just some have side effects on obscure hash collisions that barely anyone cares about
it might be some bad release setup though
That would be ironic
Python 3.13.6 is literally on the homepage of the main website
idk why it would be ironic
maybe at some point something changed in the release process (scripts maybe) changed and caused a regression, and the setup is then re-used for new releases
Well idk some people would be stuck with a version that hasnβt been compiled optimally because people donβt usually update their versions that often
hay guys im planning to start dsa in python, could you please shear some resources
@robust lark
Yall discuss dsa or cp questions?
Hi guys, i am taking dsa in uni, and our teacher let us pick a language to use for the class, java, c, or python. I know pretty decent in all three. I just dont know which is more effecient to use with, or just enough for the job, here are our syllabus
Course Outline
The following modules are covered in CMSC 122.
I. Introduction to Data Structures and Algorithms
A. Basic Concepts
B. A Philosophy of Data Structures
C. From Problems to Programmed Solutions
II. Fundamental Data Structures
A. Arrays
B. Linked Lists
III. Search and Sorting Algorithms
A. Arrays
B. Sorting Algorithms
IV. Algorithm Complexity and Analysis
A. A Deep Dive to Algorithm Complexity
B. Worst and Average Cases
C. Asymptotic Analysis
V. Stacks and Queues
A. Stacks
B. Queues
VI. Binary Search Trees
A. Introduction to Non-linear Data Structures
B. Binary Trees
C. Array-based Representation of Binary Trees
D. Tree Traversals
VII. Self-balancing Trees
A. Introduction to Self-balancing Trees
B. Red-Black Trees
C. Complexity of Red Black Trees and related algorithms
VIII. Heaps
A. Min- and Max Priority Queues
B. Binary Heaps
C. Min- and Max Heaps
D. Complexity of Heaps and related algorithms
IX. Hash Tables
A. Hash Tables
B. Hash Functions
C. Managing collisions
X. Graphs
A. Introduction to Graphs
B. Data Structures for Graphs
C. Traversal Algorithms for Graphs
D. Problem-solving with Graphs
E. Complexity of Graphs and related algorithms
Thanks in advance guys
Having to deal with memory management yourself makes it more tedious; but if you think you can handle learning to go about learning a bit more difficult algorithms, it's worth it imo to learn DSA in C because less is hidden from you (which makes complexity analysis easier because you know the complexity of the operations you're using generally)
What is "for the job" depends on the job; programming jobs is a large category, can't say much on that
i mean like for uni type of works
I strongly recommend Java
Where should I start with first?
What is the best way to start learning algorithms?
Whyβs that?
I have some formal algo studying on the horizon and am debating between Python and Java
Just easier in general to not be tempted to use all the Python conveniences.
We give a deterministic $O(m\log^{2/3}n)$-time algorithm for single-source shortest paths (SSSP) on directed graphs with real non-negative edge weights in the comparison-addition model. This is the first result to break the $O(m+n\log n)$ time bound of Dijkstra's algorithm on sparse graphs, showing that Dijkstra's algorithm is not optimal for SSSP.
An advance in shortest paths compute.
For sparse graphs in particular
yes, important distinction
I see. Thanks
Can anyone help me fix my greedy search? This is a broken old version but I need it fixed soon:
i may or may not have thrown away all my algo knowledge, but i'll see if i can help with my single brain cell
code block pls?
also input and expected output
It is appreciated, it has to work with this, the first and third work because they weren't corrupted
I'm not inputting anything, just running the code on a premade graph for my assignment. The solution is probably obvious but I'm really tired right now
You seem to be enqueueing child nodes multiple times.
You should keep track of whether you've enqueued them.
it appears that city_coordinates is a dict
im not really sure how city_coordinates is structured
Yeah, part of the template I have to work with:
this is technically not even related to the greedy algorithm, it's just a syntax error
anyway
Hint: to use automatic tuple sorting, you could put the cost at the front of the tuple.
(cost, node_info)
@umbral mountain am I missing something here?
nope what you said should be right.
city_coordinates(start, None, None, 0) is just wrong syntax
Do I need to post more screenshots to provide more clarity?
python error more than algo error
No but you need to
- reread zehata's message
- Look at your code with regard to
city_coordinates.
No it is definitely me being dumb
thing is, i dont really know what city you are trying to add to the queue
can you go through with me how you determine which city you are trying to add?
The immediate error you are making is that you are "calling" a dict as if it were a function. But it's not, of course.
It is for an assignment, I'm going through 3 different search algorithms to find the quickest path from Ipswich to Newcastle
ok, maybe I rephrase
I'm not good at python and again I'm really tired this corruption is making things difficult
Just need someone to get me over the line
what are you trying to achieve on the line
frontier.put((graph[start], city_coordinates(start, None, None, 0)))
you need to tell me what you are trying to do in english
I don't really know, I lost the working one, and the one I'm trying to fix is months old that I imported, I probably need a new one. Sorry for confusing you
i dont mean to sound mean, but did you write it originally?
or did you ask chatGPT, or someone else to do it for you
It was from a problem sheet solving task at my university, I tried to make it work but I can't figure it out as I'm mid at python but it is part of my computing course
how long do you have till you have to submit
4 hours, the others didn't take long but christ I am knackered, came here for a quick solution but I probably made myself look dumb
and a more technical question,
what does graph look like
if i understand the assignment correctly, you are trying to get from Ipswich to Newcastle with greedy
thing is
I can't tell how their connections are represented in the variable graph
There are also many variants of "greedy" pathfinding. Which one are you attempting to do
thats true
Go through all the paths to find the solution and then measure it against my a* and uniform search algorithms
i kinda assume that you are just picking the lowest cost at every node to follow
tepig
mmmm
it's clear to me that you have absolutely no idea what you are doing
i
i could bail you out
You might need to hit the books lol
So you know how to code an A*?
do you know what is A*?
Yeah, that is why I am here, I should be okay anyway, I think I can do it. I just wasted my time
Yeah, I got that one and a uniform working.
I should probably just figure it out myself, I don't think I'm getting a quick solution here.
Ok
Uniform Cost Search
This greedy search bit is the last bit if I need to make it clear, the performance comparison code works.
from what i can see
you are supposed to add every city connected to Ipswich
to frontier
right?
well, the starting city, which you hard coded as Ipswich
Right, so truncated dijkstra
Yeah
IDK with that greedy search tbh I probably broke it worse.
I'm going to go back to the original templates and guides, sorry for wasting both of your time. I'll definitely get this done, thanks for trying to help me.
do you know what does frontier mean in the context of this graph problem?
it means the possible next nodes that have not been explored yet
well, edges
right now your code is failing at the part where it's trying to populate frontier for the first time
i.e. you need to find a way to get all the cities connected to the starting cities
and add them to the frontier
dw about it a friend is gonna help my sleep deprived state
your code is failing because the expression that you used in frontier.put((graph[start], city_coordinates(start, None, None, 0))) is not the correct way to find a list of connected cities to add to frontier
it's the set of nodes that are possible next nodes, not all of the unexplored nodes, no?
sorry i phrased it wrong
my bad
@dreamy night I am a fool I wasn't supposed to do greedy, I was allowed to do a simple one like depth first search. Sorry for wasting your time and thanks for trying to help, I really appreciate it!
still, do you at least understand my explanation just now
whether you use greedy or depth first search
you are still going to have to
populate your frontier queue
Yes, thanks!
In the Quicksort example, why did the first partitioning step produce the exact sequence [8, 5, 9, 6] instead of another random-looking order given the initial unordered list [9, -3, 5, 2, 6, 8, -6, 1, 3]? Note that the last element is the pivot element.
it doesn't necessarily need to be that
i think the last element is a[0] which is strange
But I also found another example with the same exact case
yeah the diagram is placing the first element of the array last
This wasn't my question actually...
oh
quicksort works by repeatedly taking a pivot and taking the sublist of elements smaller and larger than the pivot
and does quicksort on those sublists
I just found why this behaves in such a way
It doesn't have a particular sequence that it needs to end up in.
It all depends on the partition scheme.
As long as "stuff on the left less than pivot, and stuff on the right more than pivot", it's all good.
Quicksort is unstable anyway.
as others have said the exact mechanics doesn't matter to the functioning of quicksort, you just need to partition things somehow
but in the case you're looking at for C++ the partition seems to be done with swapping like in std::partition
Do you know what a full binary tree is? Is it a binary tree, whose nodes have 0 or 2 children?
a bit
Guys can you help me further?
yes
just ask your question dont ask to ask
In a complete binary tree, the nodes can have either 0, 1 or 2 children and every level is filled, except possibly the last level and all nodes in the last level are as far left as possible. With that being said, a complete binary tree can also be a full tree or a perfect tree, right?
Why not just google for the definitions?
I have another question: There is a general concept about balanced and unbalanced binary trees, where balanced simply means that it does support O(log n) effectively, while an unbalanced tree means that it doesn't support O(log n) effectively?
Yes that is correct
Is there something called a complete balanced binary tree or is it just a balanced binary tree that is also complete?
Complete trees are always balanced
But balanced trees are not always complete, right?
No but only because the definition of balanced depends on a constant.
Balanced means depth of left and right subtrees differ by <= k
If k == 1, and child nodes are aligned left, then balanced => complete
Do you mean that there is a constant c that limits the difference in height for any subtree by c?
A balanced tree, particularly a balanced binary tree, is a type of tree data structure where the height of the left and right subtrees of any node differ by at most a certain value, typically 1. This property ensures that the tree remains relatively "flat" or compact, preventing the worst-case scenarios of highly unbalanced trees that can degenerate into a linked list.
So this would imply an AVL tree, right?
Not necessarily. There are many types of balanced tree.
Okay, another quick question: How many subtrees can you see in this binary tree?
- It's the same as the number of nodes.
Yes, I would also think about this. The rule is that every node in a tree is the root of its own subtree. Most people would assume that the child nodes (also known as leaves) are not part of a subtree. However, a single node counts as a tree as well
So are you asking just to test me? π€
Not to test you, but rather me. I like to exchange information and debate about various topics, especially if I'm not sure
Have you written your own tree class yet? That is an excellent way to improve your understanding of trees.

or wait, I'm thinking of a different thing than subtrees
hello everyone , i am looking for a partner to hardcore grind DSA in python for FAANG mostly , beginners and intermediete both are welcome
can i?
Yeah sure
hiiii new here so I hope this isn't the wrong place to ask. Does anyone know of a way to make a large collection of objects and efficiently pick a (pseudo-) random one? (in fact, pick all of them, in a random order, doing a pop() on each as I go until the collection is empty)
I'm currently just using something like list[random.randint()] but with the number of objects I have it's sloooooow.
Not a python expert but I at least figured out that dicts/hashmaps are faster than lists; I implemented dicts in a few other places in my program to speed things up but I can't figure out a way to pick randomly from a dict without first converting the keys to a list, which ruins the efficiency advantage
why not just shuffle them first then pop it
I assume it is the removing from list part that is slow because you are removing from a specific index in the list
yeah
oh I forgot popping from the end would not have to do that
Ah no right the problem I had with that was that when I pick a random one, sometimes I use it, and other times I put it back in the list and pick a new one
which if I just shuffled them popped from the end would give me the same one again
Honestly there's probably a better way to do that part, so that I don't have to put any back, but I am ignoring that for now
from itertools import permutations
after reading the documentation on that function for a while I am struggling to figure out what to do with that
then could you fully and specifically describe what you're trying to do? would be helpful to other people passing by that may be able to help
ok so I'm trying to create a 2d grid of objects ("tiles"), with each holding a "type" that is randomly selected, but biased by the other objects that surround it in the grid. (i.e., make a fun little randomly generated terrain map) (there are four "types" (water, grass, beach, tree) with whether it can be each "type" stored in an attribute as a dict with boolean values)
Function:
- create empty dict, create empty list
- Populate dict with same set of "tiles", keyed "i_j" (The coordinates
[i,j]are also stored as an attribute by each object). At same time, append() each object to the list (just a 1d list holding each one sequentially)
~While there are still "tiles" in list --
- Pick a random one (
list[random.randint(0, len(list)-1)]) - Check if it's on the wall of the grid, i.e.
iorjis 0 or the size of the grid in that dimension. If so, tell it it's a wall, remove it from the list, and move on. - Elif the tile does not have True for all of its possible "types:"
- or
rand.randint(0,1000)==5(to break the starting case where all have True for all their cases, and randomly choose some other ones too so it doesn't just spread out from a single tile) - While tile type is None:
rand.choice()the possible "types," and check if this "tile" is allowed to be that "type." If so, set tile to that type. - (outside while loop, still in elif) create list of surrounding tiles, indexed from the dict (
dict.get(str(x-1)+"_"+str(y)), etc.) - For each "tile" in the surrounding tiles: modify its list of allowed "types" based on this tile's type.
- Remove this "tile" from the list.
- or
- No else statement -- (if neither if condition is true, keep the "tile" in the list and pick a new random one)
My code is terribly messy and not well commented at the moment π because of course it would be and I did not come here prepared
It works well enough though, for what I want it to create. However, the step of picking a random one from the list like that is very slow
boy this was long I probably should've just gone to #1035199133436354600 at this point lol
Does it matter whether the pick step is uniformly random? What you could do is have two lists of tiles. When you need to re-insert, instead put it in the second list. Once the main list is empty, shuffle the second list, then swap the two lists.
That sounds like it could work? I'll try it -- for reference this process takes about 5.8-6 seconds at the moment (per time.perf_counter())
(that's for a small area, 200x200 pixels; if I scale it up to reasonable size for my screen it takes minutes, which is why I want to speed it up lol)
I might've implemented it badly but this seems to have slowed it down, up to 12.1 seconds
Probably worth running with a real profiler, that'll help tell you where your actual slow spots are.
Just noticed something - this could be a slowdown: dict.get(str(x-1)+"_"+str(y)). Instead it'd probably be better to just use (x+1, y) tuples.
I originally had the dict as a 2d list if that's what you mean? It was much slower
No, dicts don't have to use string keys, they can use any immutable object, like tuples.
Oh, huh. Did not realize. I had tried keying it with a 2-item list and it yelled at me so I just wrongly assumed tuples would be the same
this does appear to have cut down about half a second, ty lol
working through documentation to try to figure this out π
For a grid, it is usually more efficient to use a 2D list
I.e. List of lists.
Lists are mutable
Tuples arenβt
hashable*
(A mutable object can be hashable -- for example functions, or instances of a user-defined class that didn't override __eq__ explicitly)
I am pretty confident it was a lot slower when I tried it. Can't remember the times I recorded but
depends on what operations you were doing on the 2d list
if it's just indexing and changing individual elements without moving things around, it should be slightly faster than a dict
if you were modifying the structure of the lists tho then you would probably see a decline in performance
I am indexing random elements and then removing them
So yeah structure changes frequently
yeah that'd explain things
not too sure what you're doing, but dict is probably fine
if you wanted to stick with 2d lists, it'd be better to just set stuff to something like None instead of removing it from the list
Why remove them? Is it not OK to set them to None temporarily? I encourage avoiding the list.remove method entirely if possible.
I am choosing random ones but need to ascertain that eventually all of them have been chosen. If I set them to "None" then by the end it would be just choosing random numbers repeatedly until eventually happening to land on the one single one that hasn't been chosen yet, no? I can't imagine that would be fast
Would also need to copy them into a second list at the start since I need to keep the information about the ones that have already been chosen but that would be fine presumably
can you explain what you actually trying to do?
list does not sound like the data structure you want
^^ ?
How can I start learning py
Why wouldn't it choose the other edge that costs 7 as well?
it can, but it doesn't matter for that input
Okay, I see π
can you just shuffle your list and go through it, rather than doing the slow removal at each step?
No, because I often need to repeat points if they did not meet conditions for removal the first time, and if I just used a single shuffled list it would just give me back the same point again when I need to try a different one instead
Suppose I could shuffle, go through, then make a new list of the ones that haven't been edited yet and repeat? But id be making a new list for often every few points so I'm not certain that would be faster...? I can try it at least (later)
your thing sounds like a use case for a structure like the one used to solve
https://leetcode.com/problems/insert-delete-getrandom-o1-duplicates-allowed/description/
(though the duplicates part isn't interesting)
actually nvm your problem should be easier
instead of removing by .remove or whatever, swap the element with the last element and .pop
Hmmm, that might work, though if I'm understanding correctly I worry about some undesirable patterning, as items earlier in the list are less likely to be removed, so the end of the list would be skewed towards points that initially did not meet conditions. Might not make a noticeable effect though...? Worth trying
if I've implemented it correctly it did indeed produce an undesirable output (before is on the left, after is on the right)
Additionally, took... exactly the same amount of time. Which made me suspicious that this process wasn't what was eating time, but when I skipped it entirely it went down to near-zero time, so... ?? I could also just be misunderstanding though
https://jiffyclub.github.io/snakeviz/ profiling is based, you could try it out if you're at a roadblock algorithmic-insights wise
SnakeViz is a browser based graphical viewer for the output of Python's cProfile module.
I meant that you pick a random thing uniformly like you tried initially, if you decide to use it you delete it by swapping to the end and popping, otherwise you leave it be
Ah huh I may have indeed misunderstood. Will try later
hi guys i need help
This did actually give way more improvement than I was expecting, neat. Down from ~5.5s to ~1.9s for total generation.
I guess I'm removing items from large lists like that forever now π
Welcome to arrays. Also, this is only OK if you don't mind your list becoming unsorted.
Ah, suppose so
Program as a whole is still quite slow and I'll have to come back to this, I think, but I think a two-fold reduction is enough to move on for now as long as it's still just a personal project
as mentioned, if order matters this technique is not an option, and you would have to find other ways to not do expensive deletions/whatever
a lot of algorithmic design comes down to avoiding having to do the expensive naive thing
dsu
favourite algorithm is tough 
probably the polynomial time 2 sat solver
I quite like tarjan's scc alg as well
the suite of algorithms surrounding computation of the matrix exponential is also rather interesting
putzer's, the column method, jordan chevalley, Jordan normal form, etc.
tho that might be more mathy than cs
Segment tree and segment tree
Big surprise
what segment tree impl do you use?
Usually use something like this: https://github.com/cheran-senthil/PyRival/blob/master/pyrival/data_structures/SegmentTree.py
But like, I'm used enough to coding these that I code them on the spot everytime
So I don't really have one implementation that I use
I swear this was initially revealed like a month and people back then went like "oh it's actually only faster in a very very very specific niche" and somehow it's exploded in the news again with no context
how can i post block of code here??
!code @silver cloud
def is_balanced(input:str)->bool:
#((()))
#(()
if len(input) == 1:
return False
stack = Stack()
for paren in input:
stack.push(paren)
for i in range(len(stack.items)):
if stack.items[i] == stack.items[len(stack.items) - 1 - i]:
return False
return True
print(is_balanced("(()(()))"))
this returns false
but other cases returns true
like ()() or ((()))
A balanced parens representation does not need to be symmetrical.
(like the one in your print)
