#Looking for help running large queries on Snowflake

12 messages · Page 1 of 1 (latest)

smoky geyser
#

Hello all.

I am trying to find a solution to be able to pull in multi-million row tables from Snowflake.

I have tried snowflex and the basic :odbc routes, and both of them seem to timeout in 1-2 minutes. Snowflex gives me actual timeout messages, while :odbc gives me something like this:

14:09:43.826 [error] GenServer #PID<0.208.0> terminating
** (stop) {:port_exit, :memory_allocation_failed}

I am just wondering if anyone else has gotten this to work successfully. I know this cannot be an un-solved problem, I just cannot seem to find the right docs or posts on it anywhere.

forest tulip
#

Stream, do not try to fetch everything at once

#

You probably will not be able to load everything into memory at once

smoky geyser
#

I even tested it by passing 1000 rows at a time into ets, and then deleteing the ets entries, and still had it fail

#

I was using :odbc.next() to fetch each row and pass it to ets, then after 1k ets entries i would clear out the ets and start to refill.

#

I guess ill try DTS instead just to eliminate possible me issues from ets

forest tulip
#

Check for memory usage during that processing

smoky geyser
#

ok, ill do that 🙂

#

thanks, ill do that an post back 😄

smoky geyser
#

@forest tulip - I tried using Streams, but I must be doing it wrong. I am still spiking my memory, even without persisting the data. I tried to strip down my code to bare bones, just to grab and barely process the rows

  def run() do
    {:ok, pid} = connect([])
    {odbc_conn_pid, count} = query_warehouse(pid, @query)

    row_stream =
      Stream.flat_map(1..count, fn _n ->
        odbc_conn_pid
        |> :odbc.next()
        |> small_process()
      end)

    row_stream
    |> Stream.run()
  end

  defp small_process({:selected, headers, _rows}) do
    headers
    |> hd()
  end
#

I am quite confused as to why the rows are eating up memory when I am not using them past the small_process function

#

When I run it my memory usage looks like this, and the yellow is where it dies