Twitter Scraping | Bellingcat | Page 1

broken galleon Jun 18, 2024, 2:21 PM

#

I mean X scraping

lapis crane Jun 18, 2024, 2:24 PM

#

If you are not logged into Twitter, filter the network requests for "TweetResultByRestId"; if you are logged in, filter for "TweetDetail".

#

I would prefer to do it logged out just because it's easier in terms of code
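
One way to apply that filter outside the browser is to export the network tab as a HAR file and search it. The sketch below assumes a standard HAR 1.2 export; the operation names come from the message above, while the function name is only illustrative.

```python
import json

# Request names mentioned above; which one shows up depends on login state.
OPERATIONS = ("TweetResultByRestId", "TweetDetail")


def find_tweet_requests(har: dict) -> list[dict]:
    """Return the HAR entries whose request URL contains one of OPERATIONS."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        operation = next((op for op in OPERATIONS if op in url), None)
        if operation is None:
            continue
        matches.append({
            "operation": operation,
            "url": url,
            # Response body as recorded by the browser (may be None if not saved).
            "body": entry.get("response", {}).get("content", {}).get("text"),
        })
    return matches


def load_har(path: str) -> dict:
    """Read a HAR export from disk (the path is whatever you saved it as)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `find_tweet_requests(load_har("network.har"))` (hypothetical file name) would list the matching requests together with their recorded response bodies.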

#

Sorry for the trash quality, my OBS setup is bad

pliant herald Jun 18, 2024, 2:29 PM

#

just repeated and got the same result, it works even logged out. it does have some headers which I'd be concerned about, namely an authorization header and a few other tracking ones, which I guess will be invalidated based on time and maybe on the number of requests, to make sure they're not being abused

#

if you right click the network entry > copy as curl, you can easily modify these for testing or even run it later on to see if they still work

lapis crane Jun 18, 2024, 2:30 PM

#

The request cannot be made twice unless you make a new request to the tweets url

#

I just made another tool to download tweets and parse them, I'll record it

pliant herald Jun 18, 2024, 2:46 PM

#

can you post the link to your code repo in this forum so it's not only in the prev chat?

kind salmon Jun 18, 2024, 2:52 PM

#

[via @lapis crane]

#tools-and-sites message

https://github.com/inputoutputcontrol/tweex/blob/main/main.py

uses the oembed feature, it's pretty simple

GitHub

tweex/main.py at main · inputoutputcontrol/tweex

the simplest script to get data from a tweet/x?? idk what to call it now, xD - inputoutputcontrol/tweex

#

[via @pliant herald]

#tools-and-sites message

just tested that endpoint and it does return the tweet's textual content + user info, but it lacks a direct link to the media (images/videos) that would allow downloading them. it has the t.co/redirectId links, which redirect to twitter.com/this-is-the-tweet-id/photo/1 but do not allow direct download.

example for this tweet https://x.com/bellingcat/status/1800902098181316824

{
    "url": "https:\/\/twitter.com\/bellingcat\/status\/1800902098181316824",
    "author_name": "Bellingcat",
    "author_url": "https:\/\/twitter.com\/bellingcat",
    "html": "\u003Cblockquote class=\"twitter-tweet\"\u003E\u003Cp lang=\"en\" dir=\"ltr\"\u003EDarfur’s largest city is at risk of falling to the Rapid Support Forces, leading experts to warn of the real risk of genocide. Bellingcat and our partners \u003Ca href=\"https:\/\/twitter.com\/BeamReports?ref_src=twsrc%5Etfw\"\u003E@beamreports\u003C\/a\u003E have been examining the deteriorating situation over the past month... \u003Ca href=\"https:\/\/t.co\/8y1WBrmOu6\"\u003Ehttps:\/\/t.co\/8y1WBrmOu6\u003C\/a\u003E\u003C\/p\u003E&mdash; Bellingcat (@bellingcat) \u003Ca href=\"https:\/\/twitter.com\/bellingcat\/status\/1800902098181316824?ref_src=twsrc%5Etfw\"\u003EJune 12, 2024\u003C\/a\u003E\u003C\/blockquote\u003E\n\u003Cscript async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"\u003E\u003C\/script\u003E\n\n",
    "width": 550,
    "height": null,
    "type": "rich",
    "cache_age": "3153600000",
    "provider_name": "Twitter",
    "provider_url": "https:\/\/twitter.com",
    "version": "1.0"
}
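
To pull the text and the t.co links out of a response like the one above, the `html` field can be run through the standard-library HTML parser. This is a minimal sketch, not the code in the tweex repo; it assumes the endpoint is `https://publish.twitter.com/oembed`, and the function and class names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def oembed_url(tweet_url: str) -> str:
    """Build the oEmbed request URL (endpoint is an assumption, see lead-in)."""
    return "https://publish.twitter.com/oembed?" + urlencode({"url": tweet_url})


class _TweetHTML(HTMLParser):
    """Collect the text inside <p> and every href found in the embed HTML."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.text_parts.append(data)


def parse_oembed(payload: dict) -> dict:
    """Extract author, tweet text and t.co links from a decoded oEmbed response."""
    parser = _TweetHTML()
    parser.feed(payload["html"])
    parser.close()
    return {
        "author": payload["author_name"],
        "text": "".join(parser.text_parts).strip(),
        # These are redirect links only; as noted above, no direct media URLs.
        "tco_links": [u for u in parser.links if u.startswith("https://t.co/")],
    }
```

Feeding it the decoded JSON above gives the tweet text plus the single t.co link, which matches the observation that media still has to be fetched some other way.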

lapis crane Jun 18, 2024, 2:55 PM

#

pliant herald can you post the link to your code repo in this forum so it's not only in the pr...

https://github.com/inputoutputcontrol/twitter
https://github.com/inputoutputcontrol/tweex

GitHub

GitHub - inputoutputcontrol/twitter: Export all tweets from a users...

Export all tweets from a users profile by exploiting Twitter/X's XHR calls - inputoutputcontrol/twitter

lapis crane Jun 18, 2024, 2:55 PM

#

pliant herald can you post the link to your code repo in this forum so it's not only in the pr...

I haven't posted my other version, which downloads tweets and gets basic info on a user's profile. Should I publish it in the same tweex repo?

pliant herald Jun 18, 2024, 3:06 PM

#

I'd say go for it

lapis crane Jun 18, 2024, 3:08 PM

#

pliant herald I'd say go for it

https://github.com/inputoutputcontrol/tweex/blob/main/downloader.py

lapis crane Jun 19, 2024, 12:31 PM

#

I've experimented with making a Twitter notifier through Discord webhooks by using Nitter's RSS feeds as well. I'm not sure if it still works, because that was when I was first getting into Twitter scraping. Is anyone interested in me publishing something like that as well? Sorry, I just have so many test scripts that some people might find useful
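
The notifier idea can be sketched roughly like this: parse the RSS feed, keep only items not seen before, and turn each one into a Discord webhook request. This is not the script mentioned above; it assumes a Nitter instance that still serves RSS, and the function names and User-Agent string are made up for illustration.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def parse_feed(rss_xml: str) -> list[dict]:
    """Turn an RSS 2.0 document into a list of {title, link, guid} dicts."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
        }
        for item in root.iter("item")
    ]


def new_items(items: list[dict], seen: set) -> list[dict]:
    """Return items whose guid is not in `seen`, and record them as seen."""
    fresh = [item for item in items if item["guid"] not in seen]
    seen.update(item["guid"] for item in fresh)
    return fresh


def webhook_request(webhook_url: str, item: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Discord webhook."""
    # Discord limits message content to 2000 characters.
    content = f"{item['title']}\n{item['link']}"[:2000]
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"content": content}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical UA string; some services reject urllib's default one.
            "User-Agent": "rss-notifier-sketch/0.1",
        },
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(webhook_request(url, item))` for each item returned by `new_items`, run on whatever polling interval the instance tolerates.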

lapis crane Jun 19, 2024, 1:34 PM

#

https://platform.twitter.com/embed/Tweet.html?id=1800902098181316824
Here is a way to fetch the tweet embed by id lmao
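
Going from a normal tweet link to that embed URL only needs the numeric ID. A small sketch, assuming the usual `/<user>/status/<id>` link shape; the helper names are illustrative:

```python
import re

# Matches twitter.com and x.com status links, with optional www./mobile. prefix.
_STATUS_RE = re.compile(
    r"https?://(?:www\.|mobile\.)?(?:twitter|x)\.com/[^/]+/status/(\d+)"
)


def tweet_id(url: str) -> str:
    """Extract the numeric tweet ID from a status URL."""
    match = _STATUS_RE.match(url)
    if match is None:
        raise ValueError(f"not a tweet URL: {url!r}")
    return match.group(1)


def embed_url(tid: str) -> str:
    """Build the embed page URL shown above for a given tweet ID."""
    return f"https://platform.twitter.com/embed/Tweet.html?id={tid}"
```

For example, `embed_url(tweet_id("https://x.com/bellingcat/status/1800902098181316824"))` reproduces the link in the message above.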

viral grotto Jul 10, 2024, 8:52 AM

#

This is an interesting project. Could be turned into NLP analysis tools to go thru xyz account followers and then build up some kind of threat analysis on the followers.

dense saddle May 19, 2025, 2:03 AM

#

So in this project, do you need help or new features to improve it, or is it finished?

brave carbon Jun 12, 2025, 2:21 PM

#

#1265088434465144932 message
Ftr I posted the syndication method for grabbing tweet ids a while back