#ESP32-S2 wifi monitor mode design

1 messages · Page 1 of 1 (latest)

mental monolith
hushed chasm
#

@night kestrel invitation test... I'll post some notes here soon

night kestrel
#

🙂 👍

hushed chasm
#

Apologies in advance for my verbosity, a lot of this isn't obvious or intuitive to me so it helps to get a sense check. First thoughts on design were something like wifi or bleio scanning, but I think those may be too "chunky", meaning a bunch of packets come in a short span, and then we miss a larger interval while processing those in CP (more likely to miss short-lived events). In my mind, a smoother more random sampling would be better. Second thoughts were something like Keypad EventQueue or BLE PacketBuffer, which use ringbuf, but unless they just hold mini-pointers they're too limited.

#

But, there's already an esp-idf queue in common-hal (needed by the asynchronous callback), and at least in my current somewhat wifi-sparse environment, CircuitPython is able to keep up with a single-shot received = wifi.radio.monitor_recv_into(buf, MAXBUF) API (monitor_min branch). With more filter parameters on the request, and keeping to small (sub-)packet returns, it may be enough. Initial implementation actually starts up and closes down monitor mode for each packet, which is overkill and I'm willing to bet very inefficient.

#

But, even so packets get cycled in the core in as little as a couple of low-10s of ms```I (988994) wifi:ic_enable_sniffer
I (989014) monitor: returning a packet, q=1
I (989014) wifi:ic_disable_sniffer
I (989044) wifi:ic_enable_sniffer
I (989144) monitor: peek timed out q=0
I (989244) monitor: peek timed out q=0
I (989344) monitor: packet timed out, q=0
I (989354) monitor: returning a packet, q=1
I (989354) wifi:ic_disable_sniffer

hushed chasm
#

All of the above is randomly selecting a channel each time, and only returning enough bytes for the packet header and SSID (58 bytes needed for that; most packets tend to be 100-300+ bytes in total; in theory, they can be a couple thousand or more). Throughput approximately doubles when sticking to one or a couple of the main channels, 1, 6, or 11, and CircuitPython still keeps up.

hushed chasm
#

Any thoughts or suggestions are welcome. If this seems like a reasonable path I'll continue in common-hal, to separate monitor start/stop from the recv_into (some framework is in place, but more asynchronous coordination is probably needed). It would be good to get additional parameters sent down to filter what comes back (e.g., channel, minimum_rssi, header fields)- I think that's what you have @night kestrel? It would also be good to add some kind of queue length or overflowed so the user can know when the queue is under stress.

hushed chasm
#

Running overnight without monitor stop / start, over a million packets at about 29 packets per second average (~35ms / packet), parsed in CP. CP keeps up. The common-hal queue sees periods of stress (>50% full) with lost packets due to espidf memory fragmentation. There may be tweaks to parameters to help. Regularly stopping and re-starting monitor from CP cleans up memory and keeps it running healthy (but of course that drops packets). User-level filters sent down to common-hal and applied early should also help reduce the burden of processing the queue and returning packets.

night kestrel
#

Looks like you are on the right track, I can see the final api taking shape:

@singleton
class wifi.radio.Monitor(buf, max_buf, channel, min_rssi, header, parse)
#
import wifi

monitor = wifi.radio.Monitor(buf=buf, max_buf=MAXBUF, channel=10, parse={wifi.Packet.LEN})
print(monitor.packet)
monitor.channel = 8
monitor.parse = {wifi.Packet.RSSI, wifi.Packet.RAW}
print(monitor.packet)
monitor.deinit()

# Output
{"LEN":"..."}
{"RSSI":"...","RAW":"..."}
#

Do you think we need to supply parameters for parse function or just send back the whole packet instead of selective parsing?
I am starting to think that selective parsing won't render significant gains in performance.

#

I will test this on an esp32c3 and will report back the performance for more insight on how this works in an even more compute limited environment.

hushed chasm
#

Hard to say, I really don't know the different ways people may want to use this. I pretty much care only about Management packets (type), but maybe others are interested in other types. The header may be small enough (24 bytes) that there's little value to C parsing it, not sure yet. It may be useful to skip the packet body at times (saving a couple hundred bytes typically), other than getting the SSID if available, but who knows, maybe others want the available rates or something else.

#

We probably want channel to be a collection of some sort to allow returning frames from one, or a subset, or all channels.

#

Wi-Fi is a little more active during the day today, more stress on the queue, and some fragmentation even with regularly stopping and re-starting monitor mode (though no dropped packets (yet?). Stopping shuts it all down, so I'm not sure what else to do to help with memory... maybe constant-size or multiple-of-some-block-size allocations. This will surely be better on -S3, it will be interesting to see how it does on -C3.

#

The API looks good, the Singleton is interesting. I'm pretty sure we'll need the separate control to stop and start monitor. In theory, user code could do some monitor stuff, then turn it off and be a station for some other work.

#

Espressif says you can run monitor with light station or AP use, but I tried that briefly last night and it didn't go well.

#

oh, I see monitor.deinit() now ...so invoking the Singleton would start up monitor mode, and deinit would shut it down. That works.

night kestrel
night kestrel
night kestrel
hushed chasm
#

I'll work up the parameter list

hushed chasm
#

How to handle SSID? Certain subtypes of Management frames may contain an SSID (up to 32 bytes), but it's in the variable-length section of the body. We could add it to wifi_packet_t and parse could return it. Or the user could just extract it from the packet, even limiting max-buf to >=58 to get just the header and the SSID in the RAW value.

#

Function parameters:
buf -- currently uint8_t *buf in common-hal, just like sockets
max_buf -- currently uint32_t len in common-hal, just like sockets
channel -- some way to select one channel 6, several channels 1,6,11, or all channels ?
min_rssi -- signed int with range -100 to 0
header -- you have this... boolean for header-only?
parse -- you have this covered
subtype 🆕 -- unsigned byte (4 bits) with range 0-15 some way to select one, several, or all

#

(subtype would be a useful filter to reduce unwanted packets, but (oops) ...we probably want subtype, like channel, to be a collection or similar, to allow one, several, or all)

#

This is just for reference...

#

Return data:
.parse -- you have this covered
(for type=0 (management) frames)...
Header...
• your wifi_packet_t covers the header fields + channel, rssi, len, & raw
Body...
• fixed-length elements (0-12 bytes total for Management type frames)
• variable-length Information Elements (IE)
- an IE has a 1-byte ID, a 1-byte length (N), then an N-byte sequence of content
- a frame may have 0 or more IEs, not necessarily in order by ID, but they can be traversed using the lengths; SSID is IE ID 0 and is typically the first IE when present

#

...I think there are tools already there to get either header-only, complete RAW packet, or max-len of the packet to capture header along with only some early part of the body

hushed chasm
#

microDev... I'm going to be mostly afk until Tuesday, but should have time next week again

night kestrel
#

@hushed chasm I have completed the shared-bindings part and am now connecting it with the common-hal stuff from your monitor_min branch.

hushed chasm
#

@night kestrel Sounds good, I’m just getting back this afternoon, I’ll take a look tonight.

hushed chasm
#

@night kestrel Can you let me know when to look at the shared-bindings? There's work to be done yet in common-hal to match those APIs (not sure, for example, what channel and subtype will look like). Needs keeping the object alive through starts and stops. Also needs more fine-tuning (memory, etc), and there's not really init/deinit yet, so control-C and saves to CIRCUITPY typically crash it. Slowly researching and working through those things. Mostly afk again for a week starting Tuesday, then I'll be around for a while.

hushed chasm
night kestrel
night kestrel
hushed chasm
#

I'll go through it more thoroughly later this afternoon (GMT-5). All of the commented out parsing stuff can be deleted, looks like that's from an older commit. Better to put it back in later if we need it, I think. Can I PR to your repo-branch, is that the preferred way for me to suggest things now?

night kestrel
#

I will be removing all of the parsing stuff for now, that can be done in python as you have implemented in your example, I want to get the basic stuff running first.

#

Regarding the suggestions... I am fine with anyway you prefer to do it, you can just let me know here as well 🙂

hushed chasm
#

OK, sounds good. Thanks for taking this on!

hushed chasm
#

Looks good, I think I see where this is going. I realized that since only one packet gets returned at a time, it probably makes more sense to just have a single channel as parameter to get_packet and let CP do the channel selection packet-by-packet. Sorry about that, that was my confusion due to how I was testing. The other alternative is to randomize among the collection, but that's maybe a bit wonky. In future, there may be a desire to have some mechanism to return multiple packets, and multiple channels would make more sense, but I think we set that aside in this design. Makes sense?

#

You'd probably realize this soon, but wifi_monitor_obj_t doesn't need task or sem_task_over, those are remnants of how Espressif did their example. I think we got a little out of sync on commits, but I'll keep an eye especially on the common-hal stuff. Please let me know what you want me to do. For now, I'm going to keep testing and fine-tuning the common-hal operation in monitor-min branch.

hushed chasm
#

tbh, the shared-bindings abstractions confuse me in general, so I'm not sure if I'm reading right, but make_new is the Singleton constructor? I was envisioning many of the parameters being part of get_packet so that each packet can be requested with specific characteristics (like channel), rather than have it all fixed for the life of the monitor. Maybe some params make more sense in get_packet, and some make more sense in make_new. I suppose you could init, get-packet(s), deinit in the user loop when the characteristics get changed. Channel does seem like a critical one that is likely to change packet-by-packet so that users can monitor a set of channels.

#

One last thought that might influence the above... it might be nice (in future perhaps) to have something to get the queue size or whether the queue is overflowing or there is packet loss due to memory fragmentation... that kind of feedback would let the code auto-adapt to the ambient packet flow (cut down on channels or types or subtypes, etc), or at least expose to the user that it's not performing well. One set of characteristics may work well in one environment, but be too broad in another and the monitor gets inundated.

#

e.g., this is super-handy to instantly tell the health of Monitor, but production builds won't have DEBUG and we probably don't want so many print statements in the final code (DEBUG on left; CircuitPython output on right)

night kestrel
#

@hushed chasm I have updated the implementation and I believe the changes are PR worthy.

let CP do the channel selection packet-by-packet
Yes, the current implementation is doing this.

wifi_monitor_obj_t doesn't need task or sem_task_over
Thanks for pointing that out... these are now removed.

make_new is the Singleton constructor?
Yes it is.

each packet can be requested with specific characteristics
Currently this is a 2-step process:

import wifi
monitor = wifi.radio.Monitor()
# channel used: 1 (default)
print(monitor.packet)
monitor.channel = 11
# channel used: 11
print(monitor.packet)

get the queue size
Good point, we can have a property for this (e.g. monitor.queue)

queue is overflowing or there is packet loss
Not sure how to implement this feedback... an error won't do it, we can return a LOSS parameter with each poll.

hushed chasm
#

Thank you again for all of your work on this. Keeping initial PR simple is probably fine, we can add features easily. Let me think about overflow a bit, there are queue size and memory fragmentation issues. Can we wait until next week? I’m out of the country until Sunday with just my phone, and I’d really like to take a solid look at it.

#

thanks for answers… the API looks good to me

#

‘monitor.queue’ is good for queue size. Packet loss results from queue full or largest available memory block not big enough for packet (typ. 128 or less). But loss is independent of the current packet so that state would need to stored in the object, perhaps we could even get/clear count of lost packets but maybe that’s overkill.

#

…loss count probably needs an idf semaphore since it would be incremented asynchronous from our ‘clear’

hushed chasm
#

@night kestrel Thanks for your patience. Checking out the implementation now, looks good and I think the bases are covered. For the common packet loss cases, user code can check the queue length and also collect memory (fragmentation) stats by catching the exception. I'll load it up and test it out.

night kestrel
#

@hushed chasm I have modified your example to work with the current api. Give me one min... I'll upload it now.

#

@hushed chasm A little more updated version:

hushed chasm
#

cool, yeah I had remnants of trying to periodically shut it down to clear out fragmentation

hushed chasm
#

(some unrelated xtensa and ulab build issues)

#

hm, I can only get the BOOT drive to come up, some build or flashing procedure has changed?

#

perhaps not unrelated [246/639] Building C object esp-idf/esp_system/CMakeFiles/__idf_esp_system.dir/port/arch/xtensa/panic_arch.c.obj ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c: In function 'panic_print_registers': ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c:48:10: warning: unused variable 'regs' [-Wunused-variable] int *regs = (int *)frame; ^~~~ ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c: In function 'print_illegal_instruction_details': ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c:119:24: warning: unused variable 'pepc' [-Wunused-variable] volatile uint32_t *pepc = (uint32_t *)epc; ^~~~ ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c: In function 'print_cache_err_details': ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c:183:25: warning: variable 'size' set but not used [-Wunused-but-set-variable] uint32_t vaddr = 0, size = 0; ^~~~ ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c:183:14: warning: variable 'vaddr' set but not used [-Wunused-but-set-variable] uint32_t vaddr = 0, size = 0; ^~~~~ ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c: In function 'print_memprot_err_details': ../../esp-idf/components/esp_system/port/arch/xtensa/panic_arch.c:276:11: warning: variable 'operation_type' set but not used [-Wunused-but-set-variable] char *operation_type = "Write"; ^~~~~~~~~~~~~~

#

but the build completes

night kestrel
#

are you building my branch?

hushed chasm
#

yes, monitor-mode

#

I can test with an artifact if you have one

#

(rule out build issues)

#

oh, but I'm on a Cucumber, but have other boards... Saola, FeatherS2, etc

night kestrel
#

I can get you a uf2/bin. Which board?

hushed chasm
#

Cucumber, Saola, FeatherS2, LOLIN, Funhouse, Magtag

#

whatever is convenient, debug port is useful (but not essential)

night kestrel
#

I temporarily disabled Build-CI on my branch and only have pre-commit enabled so artifacts will have to be built from start but that would take lot of time

#

I'll get you a cucumber debug build

#

which variant do you have?

hushed chasm
#

R and RS

#

R at the moment is all set up

#

(but either build will work, just a difference of the I2C singleton)

#

Saola build also works

night kestrel
#

just started cucumber_r debug build with a bunch modules disabled

#

it finished successfully blinka_cooking

hushed chasm
#

something funny about the state of that board, running on a different RS board now... looks good, I'll put it through its paces then do a longer run to see how packet loss goes ...thanks for your help setting up

hushed chasm
#

would it make sense for get_queue to return uxQueueMessagesWaiting(monitor_rt.work_queue), or maybe a separate value for this? (the number of unconsumed packets in the queue)

#

(oops, my old variable names, but I think you get the idea)

#

channel-hopping working fine

#

something about this structure seems more immune to fragmentation :-), so it will be mainly the queue filling up that would lead to lost packets I'd guess at this point

night kestrel
hushed chasm
#

although I tried it and it's always still 128... not sure if I did something wrong or if the queue is running full and packets are getting dropped and we don't know... if xQueueSend fails, packet is dropped silently

night kestrel
#

the monitor.queue parameter returns the initially set queue size which is 128 by default

hushed chasm
#

right, but I tried to change it to return uxQueueMessagesWaiting

#

once Monitor is constructed, the queue may be filling before we get the first packet?

night kestrel
#

hmm... can you set a higher queue size... that should give us more time to debug monitor = wifi.radio.Monitor(queue=1024)

hushed chasm
#

trying...

#

while still at 128, got a: ```ERROR A stack overflow in task wifi has been detected.

Backtrace:0x40032947:0x3ffe1770 0x40033125:0x3ffe1790 0x40035ef5:0x3ffe17b0 0x40034eb8:0x3ffe1830 0x4003321c:0x3ffe1850 0x400331ce:0x3ffe1870 0x400867c3:0x3f015828 |<-CORRUPTED

night kestrel
#

okay so there is definitely an overflow... how many WiFi networks you have in your vicinity?

hushed chasm
#

maybe 3 dozen

night kestrel
#

woah... that's a lot

hushed chasm
#

that's just my house 😉

#

(multiple APs, and multiple SSIDs per AP)

#

but no neighbors, in a more populated area it would be similar perhaps

night kestrel
#

you are doing the real stress test here 🙂

hushed chasm
#

set monitor.queue = 1024 and mp_obj_t common_hal_wifi_monitor_get_queue(wifi_monitor_obj_t *self) { return mp_obj_new_int_from_uint(uxQueueMessagesWaiting(self->queue)); } returns 1024 out of the gate and same from then on

#

do you know what that stack overflow is getting triggered by?

night kestrel
#

not much info

gravitech_cucumber_r
? Backtrace:0x40032947:0x3ffe1770 0x40033125:0x3ffe1790 0x40035ef5:0x3ffe17b0 0x40034eb8:0x3ffe1830 0x4003321c:0x3ffe1850 0x400331ce:0x3ffe1870 0x400867c3:0x3f015828
got ['0x40032947', '0x40033125', '0x40035ef5', '0x40034eb8', '0x4003321c', '0x400331ce', '0x400867c3']
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/esp_system/panic.c:368
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/esp_system/system_api.c:112
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/freertos/port/xtensa/port.c:490
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/freertos/tasks.c:3289
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/freertos/port/xtensa/portasm.S:432
/home/mdev/code/circuitpython/ports/espressif/build-gravitech_cucumber_r/esp-idf/../../esp-idf/components/freertos/port/xtensa/portasm.S:231
/home/mdev/code/circuitpython/ports/espressif/../../py/mpprint.c:403
?
#

how about a queue size 4096?

hushed chasm
#

same, 4096 right out of the gate

#

I must have missed something in how I did that

night kestrel
#

what rate of packets did you get with your earlier testing?

hushed chasm
#

I'll dig in, maybe the build didn't pick up the change

#

rate... hard to say, visually, the dump looks similar rate, but I don't have anything quantitative

#

currently dumping ~25 packets per second

#

which is similar order of magnotude

#

yeah, from earlier comments in this thread, was seeing ~29 packets per second

#

but if we're dropping any now, I don't know about it

#

I'll add some debug prints

night kestrel
#

okay... I'll leave it up to you to figure out the queue situation... the stack overflow crash is puzzling though

hushed chasm
#

got it, I was just being sloppy with my build/flash commands... monitor is running great, showing actual (small) numbers of packets in queue with that small change. So it's keeping up, memory is holding up. This is looking really nice!

#

so there's the odd stack overflow, and I'm still not sure what to do about xQueueSend failures ...maybe raise an exception so user can do stats on packet drops?

hushed chasm
#

a little odd thing: constructing with a queue= is fine (...see below), but this is allowed but does nothing: monitor.queue = 16 (queue size probably shouldn't be writeable since it makes sense to set it up only in the constructor - dynamic queue size is probably an unnecessary complexity)

#

not sure if this helps, but when the stack overflow happens, user code gets an empty traceback: Traceback (most recent call last):

#

Also, not sure how this happens... set monitor = wifi.radio.Monitor(queue=8) but reading back monitor.queue with each packet, the length will go higher than 8

hushed chasm
#

Two tweaks to suggest:

  1. in cb:```c
    // send packet
    if (xQueueSend(monitor_obj->queue, &packet, pdMS_TO_TICKS(MONITOR_PROCESS_PACKET_TIMEOUT_MS)) != pdTRUE) {
    // raise an exception...
    }
2.```c
mp_obj_t common_hal_wifi_monitor_get_queue(wifi_monitor_obj_t *self) {
    // or return the actual queue length in some other variable...
    return mp_obj_new_int_from_uint(uxQueueMessagesWaiting(self->queue));
}
hushed chasm
#

It runs pretty smoothly when ambient packet rates are not too high (less than ~25 packets/second), or by setting parameters to limit packet rates if ambient rates _are _ too high. More than ~25 packets/second is more likely to fill the queue, fragment memory, and / or stack overflow.

#

monitor.deinit() doesn't exist yet, right? Another thing is that once the Monitor is constructed, packets will start to fill the queue, so user code should start consuming them immediately.

hushed chasm
#

I'm now questioning the exception in the callback. If the queue is getting more traffic than it can handle, trying to catch it will just end up with a stream of additional exceptions.

night kestrel
night kestrel
hushed chasm
#

yes, I was hopeful we could provide more info on queue health back to the python app. when I just print debug messages for both memory and queue full in the callback, it will keep running fine, I think something about raising an exception in the callback is what leads to the stack overflow... is this because it needs to deinit when raising exception, going into REPL, etc.?

#

queue size may suffice for queue health, and we have that now

night kestrel
hushed chasm
#

if I set queue=64, it will go to 129 then crash soon after (when raising the exception I think)

night kestrel
#

stack overflow is also a concern

hushed chasm
#

setting channel in the constructor also seems to always end up with channel 1

#

I didn't see anything in the code to my n00b eyes that would be a problem

night kestrel
#

debugging now

hushed chasm
#

it doesn't crash if no exceptions are raised in the (asynchronous) callback, so that's promising. I'd like to keep debug messages in there though, like we do for the (asynchronous) event handler in radio init.c

#

(that way, people can get additional debug info without a special build)

night kestrel
night kestrel
#

hmm... for some reason the input number n is being converted to 2n+1 for both channel and queue

#

found the bug: MP_ARG_OBJ was being used for channel and queue instead of MP_ARG_INT

hushed chasm
#

cool 🙂

#

so I think with debug log messages in the callback instead of exception, and deinit, this should be pretty good for initial PR? I looked, but wasn't sure what to do for control-C, etc. ...deinit called from main?

#

(memory fragmentation sticks around through reloads, for example)

night kestrel
#

Changes so-far:
• Removed set method for queue.
• Replace MP_ARG_OBJ with MP_ARG_INT for channel and queue.
• Convert property monitor.packet to method monitor.packet()
• Add monitor.deinit() method.

hushed chasm
#

I incorporated the changes. channel & queue are working great in the constructor. Bullet points #1 & #3 are also all good.

#

Having trouble with deinit()... if I control-C, it's fine and I can make a monitor in the REPL, but if I call deinit() in the code, it crashes, like:```I (369054) wifi:ic_enable_sniffer
I (375154) wifi:ic_disable_sniffer
assertion "( pxQueue )" failed: file "../../esp-idf/components/freertos/queue.c", line 1308, function: xQueueReceive

abort() was called at PC 0x40019fbe on core 0

Backtrace:0x40033df3:0x3ffe2dc0 0x400345d1:0x3ffe2de0 0x4003bd8a:0x3ffe2e00 0x40019fbe:0x3ffe2e70 0x4001a42b:0x3ffe2e90 0x400353c6:0x3ffe2ec0 0x400b6769:0x3ffe2f00 0x400b0f8d:0x3ffe2f30 0x40093fb9:0x3ffe2f50 0x4008f40e:0x3ffe2f70 0x4008f521:0x3ffe2f90 0x4009d8cf:0x3ffe2fb0 0x40094122:0x3ffe3050 0x4008f40e:0x3ffe3080 0x4008f43a:0x3ffe30a0 0x400d049b:0x3ffe30c0 0x400d0812:0x3ffe3160 0x400a0c60:0x3ffe3180 0x400a0fe9:0x3ffe31a0 0x400a13c4:0x3ffe3210 0x400a169b:0x3ffe3230 0x4015ee4c:0x3ffe3250 0x4003726d:0x3ffe3270```though I can't see how the queue would be NULL.

#

Also if I try to re-construct an already-constructed monitor, it crashes similarly:```I (13424) wifi:ic_enable_sniffer
assertion "( pxQueue )" failed: file "../../esp-idf/components/freertos/queue.c", line 1308, function: xQueueReceive

abort() was called at PC 0x40019fbe on core 0

Backtrace:0x40033df3:0x3ffe2dc0 0x400345d1:0x3ffe2de0 0x4003bd8a:0x3ffe2e00 0x40019fbe:0x3ffe2e70 0x4001a42b:0x3ffe2e90 0x400353c6:0x3ffe2ec0 0x400b6769:0x3ffe2f00 0x400b0f8d:0x3ffe2f30 0x40093fb9:0x3ffe2f50 0x4008f40e:0x3ffe2f70 0x4008f521:0x3ffe2f90 0x4009d8cf:0x3ffe2fb0 0x40094122:0x3ffe3050 0x4008f40e:0x3ffe3080 0x4008f43a:0x3ffe30a0 0x400d049b:0x3ffe30c0 0x400d0812:0x3ffe3160 0x400a0c60:0x3ffe3180 0x400a0fe9:0x3ffe31a0 0x400a13c4:0x3ffe3210 0x400a169b:0x3ffe3230 0x4015ee4c:0x3ffe3250 0x4003726d:0x3ffe3270

#

I'll keep poking at it.

night kestrel
hushed chasm
#

wow, that was fast 🙂

#

I don't see what changed for deinit

night kestrel
#

• needed to raise an error for deinit
• a second condition was needed for re-construction

hushed chasm
#

I'll load it up.

night kestrel
#

a diff of the recent changes

diff --git a/ports/espressif/common-hal/wifi/Monitor.c b/ports/espressif/common-hal/wifi/Monitor.c
index d00349b81..6b887c2b5 100644
--- a/ports/espressif/common-hal/wifi/Monitor.c
+++ b/ports/espressif/common-hal/wifi/Monitor.c
@@ -73,9 +73,11 @@ static void wifi_monitor_cb(void *recv_buf, wifi_promiscuous_pkt_type_t type) {
 
 void common_hal_wifi_monitor_construct(wifi_monitor_obj_t *self,
     uint8_t channel, uint8_t parse, size_t queue) {
-    if (monitor_obj != NULL) {
+    if (monitor_obj != NULL && !common_hal_wifi_monitor_deinited(monitor_obj)) {
         self = monitor_obj;
         return;
+    } else {
+        monitor_obj = NULL;
     }
 
     const compressed_string_t *monitor_mode_init_error = translate("monitor init failed");
diff --git a/shared-bindings/wifi/Monitor.c b/shared-bindings/wifi/Monitor.c
index 40bbfec2f..64c9910ff 100644
--- a/shared-bindings/wifi/Monitor.c
+++ b/shared-bindings/wifi/Monitor.c
@@ -26,6 +26,7 @@
 
 #include "py/runtime.h"
 #include "py/objproperty.h"
+#include "shared-bindings/util.h"
 #include "shared-bindings/wifi/Packet.h"
 #include "shared-bindings/wifi/Monitor.h"
 
@@ -143,10 +144,18 @@ STATIC mp_obj_t wifi_monitor_obj_deinit(mp_obj_t self_in) {
 }
 STATIC MP_DEFINE_CONST_FUN_OBJ_1(wifi_monitor_deinit_obj, wifi_monitor_obj_deinit);
 
+STATIC void check_for_deinit(mp_obj_t self_in) {
+    if (common_hal_wifi_monitor_deinited(self_in)) {
+        raise_deinited_error();
+    }
+}
+
 //|     def packet(self) -> dict:
-//|     """Returns the monitor packet."""
+//|         """Returns the monitor packet."""
+//|         ...
 //|
 STATIC mp_obj_t wifi_monitor_obj_get_packet(mp_obj_t self_in) {
+    check_for_deinit(self_in);
     return common_hal_wifi_monitor_get_packet(self_in);
 }
 MP_DEFINE_CONST_FUN_OBJ_1(wifi_monitor_packet_obj, wifi_monitor_obj_get_packet);
hushed chasm
#

I must have missed something, getting guru meditation error after wifi:ic_enable_sniffer, but it's midnight here, so I'll resume in the morning

#

oh, weird, that was just a one-off

#

works sometimes, still get the guru meditation sometimes when going through deinit/reconstruct cycles, will play more tomorrow

#

(I set it up to construct, read just one packet, deinit, rinse, repeat)

#

still getting the assertion "( pxQueue )" failed when reconstructing an already-constructed (no deinit-ing)

night kestrel
night kestrel
hushed chasm
#

Those look good, I'm tracking down an intermittent guru meditation related to the Singleton (I think)

#

somewhere among the changes, the queue length got lost:```py

import wifi

monitor = wifi.radio.Monitor()
_ = monitor.packet()
monitor.queue
128```

hushed chasm
#

oh, looks like this reverted? return mp_obj_new_int_from_uint(self->queue_size);

#

(rather than uxQueueMessagesWaiting(self->queue))

#

just occurred to me that will crash if we're deinited?

night kestrel
#

added free(packet.payload) if xQueueSend fails... should prevent a stack overflow

hushed chasm
#

ah, interesting, will see if that helps fragmentation

night kestrel
#

what do you suggest regarding queue health? uxQueueMessagesWaiting or a loss counter which increments each time xQueueSend fails?

hushed chasm
#

health: (1) memory there's not much we can do but print a debug message, raising exception in the callback is problematic [also can track espidf largest available block]; (2) having monitor.queue gives user code queue size. I don't know that there's much more we can do without adding a lot of complexity, and that should be enough.

#

we can always add a counter or something later. for now, I think it's more important just to know that the queue isn't keeping up rather than some precise measure

hushed chasm
#

there's still something intermittent (guru meditation) around deinit / re-construct, I'll keep poking at that

#

but otherwise I think this is pretty solid with those changes (free unsent malloc, memory message in callback instead of exception, re-revert monitor.queue)

night kestrel
#

just implemented a loss counter parameter... it can be accessed using method monitor.loss()
it increments each time xQueueSend fails and resets when it gets polled

hushed chasm
#

There are losses we know about (memory and queue full), and losses we don't (for example, missed packets on other channels while waiting for timeout on the current channel), so I've been experimenting this morning with #define MONITOR_PROCESS_PACKET_TIMEOUT_MS. Default from the Espressif example is 100ms. When on a single channel, I don't think it matters much. But in the case where user code is channel-hopping, reducing the timeout improves throughput.

hushed chasm
#

But it's not helpful in the single-channel case, so I'd say we leave that alone for now and see how people end up using monitor.

night kestrel
#

I think MONITOR_PROCESS_PACKET_TIMEOUT_MS should be set to 0 to avoid unnecessary blocking.

hushed chasm
#

I can test it, but my hesitation is that if user is only scanning a single channel, it takes longer to wrap through the code and packets could be missed there

#

10ms was a 50% improvement in throughput over 100ms in channel-hopping ...I'll test single-channel

night kestrel
#

but when the task blocks the packets are being lost anyway in the background

hushed chasm
#

well the callback is waiting for a packet during that timeout, and if one comes in sooner, it will return

#

when always on the same channel, nothing is missed when blocking in the callback, is my assumption

night kestrel
#

can you give it a test please... let's see if we get more packets in the same time frame with or without timeout

hushed chasm
#

I'll test 0-100 on a single channel and see what happens

#

it's hard to measure accurately because I don't get enough traffic on a single channel to really stress it, but 0ms and 100ms are similar. In the 0ms case, user code often ends up getting nothing back from monitor.packet() and trying again. In the 100ms case, user code always gets a packet... and the time between packets varies as we'd expect.

#

But I'll do some longer runs and see if there are any more insights.

#
  6136   399.010s   0  200  6 -57 mgmt FF:FF:FF:RE:DA:CT L_M 00:1A:DD:RE:DA:CT VEN 00:1A:DD:RE:DA:CT VEN  62  9 Beacon       SSID found                       32768 
  6137   399.043s   0  341  6 -60 mgmt FF:FF:FF:RE:DA:CT L_M 0E:36:14:RE:DA:CT LOC 00:25:00:RE:DA:CT VEN 224 13 Action                                        32768 
  6138   399.076s   0  237  6 -60 mgmt FF:FF:FF:RE:DA:CT L_M 88:1F:A1:RE:DA:CT VEN 88:1F:A1:RE:DA:CT VEN 193  2 Beacon       SSID found                       32768 
  6358   399.280s   0  237  6 -60 mgmt FF:FF:FF:RE:DA:CT L_M 88:1F:A1:RE:DA:CT VEN 88:1F:A1:RE:DA:CT VEN 193  4 Beacon       SSID found                       32768 
  6986   399.792s   0  237  6 -59 mgmt FF:FF:FF:RE:DA:CT L_M 88:1F:A1:RE:DA:CT VEN 88:1F:A1:RE:DA:CT VEN 193 11 Beacon       SSID found                       32768 
  6987   399.826s   0  200  6 -57 mgmt FF:FF:FF:RE:DA:CT L_M 00:1A:DD:RE:DA:CT VEN 00:1A:DD:RE:DA:CT VEN  63  1 Beacon       SSID found                       32768 
  7034   399.895s   0  237  6 -60 mgmt FF:FF:FF:RE:DA:CT L_M 88:1F:A1:RE:DA:CT VEN 88:1F:A1:RE:DA:CT VEN 193 12 Beacon       SSID found                       32768 
  7061   399.950s   0  201  6 -57 mgmt FF:FF:FF:RE:DA:CT L_M 00:1A:DD:RE:DA:CT VEN 00:1A:DD:RE:DA:CT VEN 213 14 Beacon       SSID found                       32768 
  7079   399.997s   0  237  6 -60 mgmt FF:FF:FF:RE:DA:CT L_M 88:1F:A1:RE:DA:CT VEN 88:1F:A1:RE:DA:CT VEN 193 13 Beacon       SSID found                       32768 
  7088   400.036s   0  341  6 -58 mgmt FF:FF:FF:RE:DA:CT L_M 0E:36:14:RE:DA:CT LOC 00:25:00:RE:DA:CT VEN 224 15 Action                                        32768 ```This is 0ms. The first column is the loop#, you can see sometimes they're sequential, and sometimes there are big skips. code.py only parses and prints if it gets a packet.
#

I didn't actually do 0ms with channel-hopping (only went down to 10ms), so I'll test that. I guess since the callback is asynchronous, exiting right away is fine. I wonder why Espressif didn't do this (and free memory on error) in their example. I don't know how queueing up callbacks works in the IDF, like if several packets arrive while the callback is running.

#

if they get saved, seems like a memory/stack risk, but if they get ignored then the less time spent in the callback the better (probably good guidance in any case)

hushed chasm
#

Looks like I made a measurement error earlier with the channel-hopping throughput improvement, didn't notice that every loop wasn't getting a packet. When measured properly, the timeout doesn't seem to affect throughput in either single-channel or channel-hopping cases...

#

However, when there are more ambient packets, the queue is more likely to fill when the timeout is longer. So my hypothesis from that is that the callback perhaps isn't re-entrant and we should minimize the timeout. Do you have any insight on the nature of the callback?

#

(if it was re-entrant, I think we'd see more memory and stack issues?)

hushed chasm
#

I wonder why Espressif didn't use xQueueSendFromISR().

#

Ah, well, here's what the timeout is:xTicksToWait: The maximum amount of time the task should block waiting for space to become available on the queue, should it already be full. The call will return immediately if this is set to 0 and the queue is full...Doesn't really explain why I saw the queue fill faster above, but I don't see a benefit to waiting on a longer timeout, we don't have any priority scheme going on (push to front of queue, etc). Based on all of the above, There's no reason to think that the next packet is any more important than the one after it. I'm fine with a small timeout, or 0.

night kestrel
night kestrel
#

one thing... wifi.Monitor() or wifi.radio.Monitor() ?

hushed chasm
#

I don't have a strong position about this, maybe tannewt does. It seems to me to be a component of the radio, though it doesn't really interact with other components of radio (isn't even affected by wifi.radio.enabled = False, but perhaps it should be). The things at wifi level other than radio are data structure helper classes.

night kestrel
#

I think the implementation is ready for a PR.