#elasticsearch :: Daggerverse

lethal rampart

That's pretty great! 🚀 It's a great example of daggerizing something. Short and sweet, it hides a bit of complexity behind an easy-to-call CLI. Since you're looking for feedback, I do have a few tips 🙂

lethal rampart

1️⃣ The init function isn’t using the current Elasticsearch instance; it’s always picking up the class’s defaults. You can see it with:

dagger call -v --mode=prod ctr with-exec --args=printenv,xpack.security.enabled stdout
✔ elasticsearch(javaOpts: "-Xms4g -Xms4g", mode: "prod", version: "8.13.2"): Elasticsearch! 1.7s

False

Notice that Elasticsearch is instantiated with mode: "prod", yet printenv still prints False, which is what mode != "dev" evaluates to when mode is the default "dev".

There are multiple ways to solve this, depending on your requirements:

  • Do you want to expose Elasticsearch.ctr as a Dagger Function or just keep it in Python?
  • Do you need to expose ctr as a constructor argument (i.e., allow another module or the CLI to use another dagger.Container instead of this default)?
    • If not, and since version is only used to init ctr, you could make version a constructor-only argument. Same with java_opts.
  • Do you mind having ctr: dagger.Container | None, or would you rather not have | None in that type?
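A sketch of one option for point 1, assuming the module's object type behaves like a standard dataclass: keep mode as a regular field, make version and java_opts constructor-only InitVars, and build ctr in __post_init__ from self rather than from the class. So it runs without a Dagger engine, it uses a plain @dataclasses.dataclass and a hypothetical FakeContainer that only records the image and environment variables in place of dagger.Container; the image name and default values are illustrative.

```python
import dataclasses
from dataclasses import InitVar
from typing import Dict, Optional


@dataclasses.dataclass
class FakeContainer:
    """Hypothetical stand-in for dagger.Container: records the image and env vars."""

    image: str
    env: Dict[str, str] = dataclasses.field(default_factory=dict)

    def with_env_variable(self, name: str, value: str) -> "FakeContainer":
        # Mirror Dagger's immutable style: return a new object, don't mutate self.
        return FakeContainer(self.image, {**self.env, name: value})


@dataclasses.dataclass
class Elasticsearch:
    mode: str = "dev"
    # Constructor-only: passed to __post_init__, never stored on the instance.
    version: InitVar[str] = "8.13.2"
    java_opts: InitVar[str] = "-Xms4g -Xmx4g"
    # Optional so a caller can supply its own container instead of the default.
    ctr: Optional[FakeContainer] = None

    def __post_init__(self, version: str, java_opts: str) -> None:
        if self.ctr is None:
            # Reads self.mode, i.e. this instance's value, not a class default.
            security = "true" if self.mode != "dev" else "false"
            self.ctr = (
                FakeContainer(f"docker.elastic.co/elasticsearch/elasticsearch:{version}")
                .with_env_variable("ES_JAVA_OPTS", java_opts)
                .with_env_variable("xpack.security.enabled", security)
            )


print(Elasticsearch(mode="prod").ctr.env["xpack.security.enabled"])  # true
```

Keeping ctr optional answers the last question one way: callers can still pass their own container, at the cost of | None in the type.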

2️⃣ I’m not familiar with xpack, but str(mode != "dev") produces a capitalized "False". Is that an acceptable value, or does it need to be a lowercase "false"?
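For reference, str() on a Python bool always yields "True" or "False". If the setting does need lowercase (as far as I know, Elasticsearch's boolean settings only accept true/false, but it's worth verifying for xpack), a tiny helper sidesteps the question; es_bool is a hypothetical name:

```python
def es_bool(value: bool) -> str:
    """Render a Python bool as a lowercase string for an env var or setting."""
    return "true" if value else "false"


mode = "prod"
print(str(mode != "dev"))      # True  (Python's str() capitalizes booleans)
print(es_bool(mode != "dev"))  # true
```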

lethal rampart

3️⃣ All functions have the same boilerplate for using the service, but I think you can reduce the duplication a bit more.

  • Since port is needed in every function, have you considered making it a field (like mode)?
  • Also, if the service is only internal, you may not care about the port, so you can let Dagger assign one for you.
  • The service function doesn’t need to be async.
  • You can move managed_service to an instance method so you get access to self.
  • You should probably make sure the service is stopped if there's an exception.

Example:

    port: int = field(default=9200)

    @contextlib.asynccontextmanager
    async def managed_service(self):
        """Start and stop a service."""

        # Start before the try block: if start() fails, there is nothing to stop.
        svc = await self.service().start()
        try:
            yield self.curl.with_service_binding("es", svc)
        finally:
            await svc.stop()

    @property
    def host(self):
        return f"http://es:{self.port}"

    @function
    def service(self) -> dagger.Service:
        """Create an Elasticsearch service in dev mode by default"""

        return self.ctr.with_exposed_port(self.port).as_service()

    @function
    async def get(self, path: str = "") -> str:
        """Sends a GET request to the ES service and returns the response."""

        async with self.managed_service() as es_cli:
            if self.mode == "dev":
                ...

            return await es_cli.with_exec(["-s", f"{self.host}/{path}"]).stdout()
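Why the try/finally matters for the last bullet: the finally block of an async context manager runs both on normal exit and when the body raises. A plain-asyncio sketch, with a hypothetical FakeService standing in for dagger.Service that only records its start/stop calls:

```python
import asyncio
import contextlib


class FakeService:
    """Hypothetical stand-in for dagger.Service; records lifecycle calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def start(self) -> "FakeService":
        self.events.append("start")
        return self

    async def stop(self) -> "FakeService":
        self.events.append("stop")
        return self


@contextlib.asynccontextmanager
async def managed_service(svc: FakeService):
    # Start outside the try: a failed start() leaves nothing to clean up.
    started = await svc.start()
    try:
        yield started
    finally:
        # Runs on normal exit *and* when the body of the `async with` raises.
        await started.stop()


async def main() -> list[str]:
    svc = FakeService()
    try:
        async with managed_service(svc):
            raise RuntimeError("curl failed")
    except RuntimeError:
        pass
    return svc.events


print(asyncio.run(main()))  # ['start', 'stop']
```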
twin quartz

Great feedback, thanks. Exactly what I was looking for 👍

I’ll push an update in a couple of hours.

As I’d like to make it super easy for anyone to test it, what’s the best practice for embedding example datasets in the module (location and so on)?

lethal rampart

I suggest you create an examples module inside your module:

dagger init --sdk=python --source=examples examples
dagger -m examples install .

This way you can create a few functions in there that show how to use the Elasticsearch module. You'd put the datasets in there. 🙂

This makes it easy for anyone to see it running, and it shows in code how the module can be used. cc @civic crown

twin quartz

One thing I struggled with yesterday is how to update the contents of a dagger.File before mounting it into a container.

    def index_bulk_data(self, data: dagger.File, index: str = "my-index") -> str:
        """Index documents from a file into Elasticsearch."""

        ...
        # Format/update the JSON data document before passing it to the es_cli container
        es_cli.with_mounted_file("/data.json", data)

I tried using the export and contents methods but wasn’t able to pass the updated file to the container afterwards.

Do you have any examples of how to do that?

lethal rampart

You can create a new file in an empty directory and return it:

contents = await data.contents()
new_content = ...
new_data = dag.directory().with_new_file("data", new_content).file("data")
# Dagger objects are immutable: keep the container returned by with_mounted_file
es_cli = es_cli.with_mounted_file("/data.json", new_data)

You can also add it directly to the container if you don't need a mount:

es_cli = es_cli.with_new_file("/data.json", contents=new_content)
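The new_content = ... line is where the reformatting happens, and since contents() returns a plain str, it can be ordinary Python. As a hypothetical example, assuming the dataset is a JSON array and the goal is the newline-delimited body that Elasticsearch's _bulk endpoint expects (an action line followed by the document, one pair per document), that step could look like this; to_bulk_ndjson is an illustrative name, not part of the module:

```python
import json


def to_bulk_ndjson(contents: str, index: str = "my-index") -> str:
    """Convert a JSON array of documents into an Elasticsearch _bulk request body.

    Each document becomes two lines: an action line naming the target index,
    then the document itself. Every line, including the last, is newline-terminated.
    """
    docs = json.loads(contents)
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


sample = '[{"title": "first"}, {"title": "second"}]'
print(to_bulk_ndjson(sample), end="")
# {"index": {"_index": "my-index"}}
# {"title": "first"}
# {"index": {"_index": "my-index"}}
# {"title": "second"}
```

Inside index_bulk_data it would slot in as new_content = to_bulk_ndjson(await data.contents(), index).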
lethal rampart
Replying to twin quartz: I pushed a commit to address your feedback https://github.com/mgreau/daggerverse...

Quick feedback:

  • Don't make mode and port init vars (InitVar). They're only passed to __post_init__ and aren't saved on the instance, so your functions end up using the class defaults. If they're regular fields, you still have access to them in __post_init__ through self.
  • No need to annotate and describe curl since it's not exposed as a function. If you do want to expose it, then you need to set the default with field(default=...).
  • CacheSharingMode.SHARED is already the default for cache volumes, so you can omit that.
  • Instead of doing svc = None in managed_service, do this:
svc = self.service()
try:
    svc = await svc.start()
    yield self.curl.with_service_binding("es", svc)
finally:
    await svc.stop()

However, do you really need to manually start/stop the service? That's only necessary in certain use cases. I'd do this instead:

def _curl(self, *args) -> dagger.Container:
    return (
        self.curl
        .with_service_binding("es", self.service())
        .with_exec([*args])
    )

@function
def service(self) -> dagger.Service:
    return self.ctr.with_exposed_port(self.port).as_service()
    
@function
async def get(self, path: str = "") -> str:
    if self.mode == "dev":
        ...
        return await self._curl(...).stdout()
    return await self._curl("-s", f"{self.host}/{path}").stdout()
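Back to the first point in that feedback (mode and port as init vars): the sketch below reproduces the problem with plain dataclasses, assuming Dagger object types follow standard dataclass semantics here; WithInitVars and WithFields are hypothetical names. __post_init__ sees the values that were passed in, but a regular method reading self.port later silently falls back to the class-level default:

```python
import dataclasses
from dataclasses import InitVar


@dataclasses.dataclass
class WithInitVars:
    # Anti-pattern: InitVars reach __post_init__ but are never stored on self.
    mode: InitVar[str] = "dev"
    port: InitVar[int] = 9200

    def __post_init__(self, mode: str, port: int) -> None:
        # Inside __post_init__ the passed-in values are available...
        self.url_at_init = f"http://es:{port}"

    def host(self) -> str:
        # ...but here self.port falls back to the class attribute (the default).
        return f"http://es:{self.port}"


@dataclasses.dataclass
class WithFields:
    # Regular fields: stored on the instance, and still readable in __post_init__ via self.
    mode: str = "dev"
    port: int = dataclasses.field(default=9200)

    def host(self) -> str:
        return f"http://es:{self.port}"


a = WithInitVars(mode="prod", port=9300)
print(a.url_at_init, a.mode, a.host())  # http://es:9300 dev http://es:9200

b = WithFields(mode="prod", port=9300)
print(b.mode, b.host())  # prod http://es:9300
```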
twin quartz

Thanks,

"do you really need to manually start/stop the service? That's only necessary in certain use cases."

I feel it could be useful for users to run several calls before the service is stopped, but I’m not sure how to make that happen.

lethal rampart

You can make many calls; Dagger just uses a grace period to recognize whether the service is still needed. I suggest letting Dagger manage that connection until you hit a use case where you need to take it into your own hands.