#elasticsearch :: Daggerverse
That's pretty great! 🚀 It's a great example of daggerizing something. Short and sweet, hides a bit of complexity behind an easy-to-call CLI. Since you're looking for feedback, I do have a few tips 🙂
1️⃣ The init function isn’t using the current Elasticsearch instance, it’s always picking up the class’s defaults. You can see it with:
dagger call -v --mode=prod ctr with-exec --args=printenv,xpack.security.enabled stdout
✔ elasticsearch(javaOpts: "-Xms4g -Xms4g", mode: "prod", version: "8.13.2"): Elasticsearch! 1.7s
False
Notice that Elasticsearch is instantiated with mode: "prod", but mode != "dev" still evaluates to False, which means the default mode is being used.
There are multiple ways to solve this, depending on your requirements:
- Do you want to expose Elasticsearch.ctr as a Dagger Function or just keep it in Python?
- Do you need to expose ctr as a constructor argument (i.e., allow another module or the CLI to use another dagger.Container instead of this default)?
  - If not, and since version is only used to init ctr, you could make version a constructor-only argument. Same with java_opts.
- Do you mind ctr: dagger.Container | None or would you rather not have | None in that type?
2️⃣ I’m not familiar with xpack, but str(mode != "dev") produces a capitalized "False". Is that an acceptable value or does it need to be the lowercase "false"?
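If the lowercase form turns out to be what the setting needs, lowercasing the string is enough. A minimal sketch in plain Python; security_flag is a hypothetical helper name, not something from the module:

```python
def security_flag(mode: str) -> str:
    """Return the value for xpack.security.enabled as a lowercase string.

    Hypothetical helper: security is on for every mode except "dev".
    """
    # str(True) is "True" and str(False) is "False"; .lower() normalizes both.
    return str(mode != "dev").lower()


if __name__ == "__main__":
    print(str("prod" != "dev"))    # the capitalized form Python produces
    print(security_flag("prod"))   # the lowercase form
    print(security_flag("dev"))
```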
3️⃣ All functions have the same boilerplate for using the service, but I think you can reduce the duplication a bit more.
- Since port is needed in every function, have you considered making it a field (like mode)?
  - Also, if the service is only internal, you may not care about the port so Dagger can assign one for you.
- The service function doesn’t need to be async.
- You can move managed_service to an instance method so you get access to self.
- Should probably make sure the service is stopped in case there's an exception.
Example:
port: int = field(default=9200)

@contextlib.asynccontextmanager
async def managed_service(self):
    """Start and stop a service."""
    # Start before entering try, so stop() only ever runs on a started service.
    svc = await self.service().start()
    try:
        yield self.curl.with_service_binding("es", svc)
    finally:
        await svc.stop()

@property
def host(self):
    return f"http://es:{self.port}"

@function
def service(self) -> dagger.Service:
    """Create an Elasticsearch service in dev mode by default"""
    return self.ctr.with_exposed_port(self.port).as_service()

@function
async def get(self, path: str = "") -> str:
    """Sends a GET request to the ES service and returns the response."""
    async with self.managed_service() as es_cli:
        if self.mode == "dev":
            ...
        return await es_cli.with_exec(["-s", f"{self.host}/{path}"]).stdout()
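To see why the try/finally matters, the pattern can be tried without Dagger at all. This is only a sketch: FakeService is a hypothetical stand-in with start() and stop() coroutines, not the real dagger.Service, and it shows only that the cleanup runs when the body of the async with block raises.

```python
import asyncio
import contextlib


class FakeService:
    """Hypothetical stand-in for a service that can be started and stopped."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> "FakeService":
        self.running = True
        return self

    async def stop(self) -> None:
        self.running = False


@contextlib.asynccontextmanager
async def managed(service: FakeService):
    """Start the service, hand it to the caller, and always stop it."""
    svc = await service.start()
    try:
        yield svc
    finally:
        # Runs on normal exit and when the caller's block raises.
        await svc.stop()


async def failing_call(service: FakeService) -> bool:
    """Raise inside the block; report whether the service was up at that point."""
    was_running = False
    try:
        async with managed(service) as svc:
            was_running = svc.running
            raise RuntimeError("request failed")
    except RuntimeError:
        pass
    return was_running


if __name__ == "__main__":
    service = FakeService()
    print(asyncio.run(failing_call(service)), service.running)
```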
Great feedback, thanks. Exactly what I was looking for 👍
I’ll push an update in a couple of hours.
Since I’d like to make it super easy for anyone to test it, what’s the best practice for embedding example datasets in the module (location and so on)?
I suggest you create an examples module inside your module:
dagger init --sdk=python --source=examples examples
dagger -m examples install .
This way you can create a few functions in there that show how to use the Elasticsearch module. You'd put the datasets in there. 🙂
This makes it easy for anyone to see it running, and showcases with code how it can be used. cc @civic crown
I pushed a commit to address your feedback https://github.com/mgreau/daggerverse/commit/bb854b701cc9319f87823411a0b156564b47e8b7
I will look at adding an examples module
One thing I struggled with yesterday is how to update a dagger.File's contents before mounting it in a container.

def index_bulk_data(self, data: dagger.File, index: str = "my-index") -> str:
    """Index documents from a file into Elasticsearch."""
    ...
    # Format/Update the JSON data document before passing it to the es_cli container
    es_cli.with_mounted_file("/data.json", data)

I tried using the export and contents methods but was not able to pass the updated file to the container.
Do you have any examples of how to do that?
You can create a new file in an empty directory and return it:
contents = await data.contents()
new_content = ...
new_data = dag.directory().with_new_file("data", new_content).file("data")
es_cli.with_mounted_file("/data.json", new_data)
You can also add it directly to Container if you don't require a mount:
es_cli.with_new_file("/data.json", contents=new_content)
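The new_content = ... step is left open above. As one possible illustration only, here is a plain-Python transformation that assumes the input file holds a JSON array of documents and that the target is the newline-delimited body used by the Elasticsearch bulk API. The helper name to_bulk_ndjson and the input shape are assumptions, not part of the module.

```python
import json


def to_bulk_ndjson(raw: str, index: str = "my-index") -> str:
    """Turn a JSON array of documents into a bulk request body (NDJSON).

    Hypothetical helper: assumes `raw` is a JSON array of objects.
    """
    docs = json.loads(raw)
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk body is newline-delimited and ends with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bulk_ndjson('[{"title": "a"}, {"title": "b"}]'), end="")
```

The returned string would take the place of new_content in either snippet above.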
Quick feedback:
- Don't make mode and port init vars. They'll only be used in post init but not saved on the instance so you'll be using the class defaults in your functions. You have access to them on __post_init__ through self.
- No need to annotate and describe curl since it's not exposed as a function. If you do want to expose it, then you need to set the default with field(default=).
- CacheSharingMode.SHARED is already the default for cache volumes, so you can omit that.
- Instead of doing svc = None in managed_service, do this:
svc = self.service()
try:
    svc = await svc.start()
    yield self.curl.with_service_binding("es", svc)
finally:
    await svc.stop()
However, do you really need to manually start/stop the service? That's only necessary in certain use cases. I'd do this instead:
def _curl(self, *args) -> dagger.Container:
    return (
        self.curl
        .with_service_binding("es", self.service())
        .with_exec([*args])
    )

@function
def service(self) -> dagger.Service:
    return self.ctr.with_exposed_port(self.port).as_service()

@function
async def get(self, path: str = "") -> str:
    if self.mode == "dev":
        ...
        return await self._curl(...)
    return await self._curl("-s", f"{self.host}/{path}").stdout()
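The init-var point in the quick feedback above can be reproduced with the standard library alone. This is a sketch using plain dataclasses as a stand-in for the Dagger object type; the class names are hypothetical. An InitVar with a default is passed to __post_init__ but never stored on the instance, so reading self.mode later falls back to the class-level default.

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class WithInitVar:
    mode: InitVar[str] = "dev"      # init-only: not stored on the instance
    security: str = field(init=False)

    def __post_init__(self, mode: str) -> None:
        self.security = str(mode != "dev").lower()

    def current_mode(self) -> str:
        # Resolves to the class attribute, i.e. the default "dev".
        return self.mode


@dataclass
class WithField:
    mode: str = "dev"               # regular field: stored on the instance
    security: str = field(init=False)

    def __post_init__(self) -> None:
        self.security = str(self.mode != "dev").lower()

    def current_mode(self) -> str:
        return self.mode


if __name__ == "__main__":
    print(WithInitVar(mode="prod").current_mode())
    print(WithField(mode="prod").current_mode())
```

In both classes __post_init__ sees the value that was passed in; only the regular field keeps it around for later method calls.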
Thanks,
"do you really need to manually start/stop the service? That's only necessary in certain use cases."
I feel like it could be useful for users to be able to execute several calls before stopping the service, but I'm not sure how to make that happen.
You can make many calls; there's just a grace period that Dagger uses to decide whether the service is still needed. I suggest you let Dagger manage that connection until you hit a use case where you need to take it into your own hands.