#Token Too Long error using CloudTrail Parser

bold gyro
#

Hey Guys!

I'm using the aws-cloudtrail parser (https://app.crowdsec.net/hub/author/crowdsecurity/configurations/aws-cloudtrail) to retrieve data from my S3 bucket. I'm leveraging the SQS configuration.
Everything seems to be working fine, but when I check the logs I see the following error:

time="2024-11-22T21:14:09Z" level=error msg="Error while reading file: failed to read object BUCKET_NAME/PREFIX.json.gz: bufio.Scanner: token too long" method=readManager queue="https://sqs.us-east-1.amazonaws.com/ACCOUNT/SQS_QUEUE_NAME" type=s3

I tried changing the max_buffer_size in the acquis.yaml configuration, but it didn't work. Here's what I used: max_buffer_size: 1000000000

Does anyone have any idea how I can resolve this issue?

dire spear
#

Can you check the length of the longest lines in your CloudTrail logs?

max_buffer_size needs to be larger than that (although 1000000000 should be plenty).

Do you see any logs with Setting max buffer size to XXXX?

bold gyro
#

I think the problem is actually the length of the bucket file path

dire spear
#

There's no limit on this AFAIK.
We use this parser internally, reading CloudTrail logs from S3, and we don't have this issue.

#

for reference, this is our acquisition config:

polling_method: sqs
sqs_name: cloudtrail-queue
sqs_format: s3notification 
polling_interval: 30
aws_region: eu-west-1
transform: map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#))
max_buffer_size: 10000000
use_time_machine: true
labels:
  type: aws-cloudtrail
#

(the polling_interval is useless, probably a leftover from some tests)
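
For anyone reading along, here is a rough Python sketch of what the transform line in that config appears to do: take the top-level "Records" array from one raw line and emit one JSON string per record. This is only an illustration of the idea, not how CrowdSec implements it, and the function name split_records is made up for this example.

```python
import json
from typing import List


def split_records(raw_line: str) -> List[str]:
    """Split one CloudTrail line into one JSON string per record.

    Hypothetical helper: mirrors the intent of
    map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#)).
    """
    doc = json.loads(raw_line)
    # A missing "Records" key yields an empty list rather than an error.
    return [json.dumps(record) for record in doc.get("Records", [])]
```

The whole line has to be read into memory before it can be split like this, which is why a file that packs many events onto a single line needs a large max_buffer_size.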

#

can you try with max_buffer_size set to 10000000?
I'm wondering if 1000000000 triggers an overflow somewhere in Go, or is too big for the memory allocation to succeed

bold gyro
#

I tried it, but the same error occurs. The only difference I can see between our configurations is the sqs_name: my SQS queue is in another account.
Here's my config:

source: s3
polling_method: sqs
sqs_name: "https://sqs.us-east-1.amazonaws.com/[ACCOUNT_NUMBER]/CloudTrail"
sqs_format: s3notification 
polling_interval: 30
aws_region: us-east-1
max_buffer_size: 10000000
transform: map(JsonExtractSlice(evt.Line.Raw, "Records"), ToJsonString(#))
use_time_machine: true
labels:
  type: aws-cloudtrail
dire spear
#

and if you run cscli metrics, do you see any lines read at all from S3?

#

I might have an idea

When you say the longest line is 184 characters, where did you check?
In the S3 bucket directly?

If not, can you download a file that is mentioned in the error message, and check the length of the lines in it? (CloudTrail will bundle a lot of events on the same line.)
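
One way to do that check on a downloaded object is a small Python script like the one below. It is a minimal sketch: the function name longest_line_length is made up for this example, and it assumes the file is either gzip-compressed or plain text.

```python
import gzip
import sys


def longest_line_length(path: str) -> int:
    """Return the byte length of the longest line in a plain or gzipped file."""
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as f:
        is_gzip = f.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    longest = 0
    with opener(path, "rb") as f:
        for line in f:
            # Do not count the trailing newline characters.
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return longest


if __name__ == "__main__" and len(sys.argv) > 1:
    print(longest_line_length(sys.argv[1]))
```

Run it with the path of the downloaded file as the only argument. The number it prints is the minimum that max_buffer_size has to exceed for that file.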

#

and also, if you are willing to share one file triggering the error so I can reproduce locally, it would be amazing (I know it will more than likely contain a ton of private data such as account ids, AWS usernames and so on, but CloudTrail will (in theory) remove anything really sensitive like credentials)

bold gyro
dire spear
#

For reference, I just had another look at our logs, and it turns out we sometimes get this error too (I just found a file with a single 12-million-character line in it),
but there's no issue once I increase the max_buffer_size

Can you try to run this command on the machine where your crowdsec is running (replace the region with the one you are in):

AWS_REGION=eu-west-1 crowdsec -dsn "s3://bucket/path/to/the/object.json.gz?max_buffer_size=100000" --type cloudtrail -no-api

and adjust the max_buffer_size value until it succeeds

bold gyro
dire spear
#

if you remove it, it will default to 65k, so it's expected to fail in your case.

If I understand correctly, you are saying that it works when you replay the file, but using the same value for max_buffer_size in the acquis.yaml does not work?

bold gyro
#

Exactly.

dire spear
#

It doesn't make any sense :/
It's the same code that is used to read the file in both cases.

The only difference I can think of is that, in some cases, the AWS SDK seems to automatically decompress the gz file for us, and sometimes not (and it's not clear when it happens).
Depending on whether it was decompressed or not, we read the file slightly differently, so maybe the issue comes from that. I'll try to test some things.
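
To illustrate the kind of branching described here, the Python sketch below picks a reader based on whether the bytes still start with the gzip magic number. It is not CrowdSec's code (which is written in Go), and the names open_maybe_gzip and iter_lines are invented for this example.

```python
import gzip
import io
from typing import BinaryIO, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw: bytes) -> BinaryIO:
    """Return a readable stream, decompressing only if the data is still gzipped."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(raw))
    # Already decompressed upstream (for example by an SDK): read as-is.
    return io.BytesIO(raw)


def iter_lines(raw: bytes) -> Iterator[bytes]:
    """Yield lines without their trailing newline, whichever path was taken."""
    with open_maybe_gzip(raw) as stream:
        for line in stream:
            yield line.rstrip(b"\r\n")
```

If the two paths end up using different readers, a buffer size setting applied on only one of them would produce exactly this kind of inconsistent behaviour, though that is a guess rather than a confirmed cause.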

bold gyro
#

I had thought about that too. Does the dsn command read directly from S3? The acquis is configured to use an SQS queue, so maybe it gets different results because of that.
It's really strange: it's as if the acquis can't apply the max_buffer_size, yet it still generates logs like setting max buffer size

bold gyro
dire spear
solemn copper
#

(I'm working with Vinicius, so it's the same environment)