#XZ Backdoor Analysis
If this title doesn't invite a very lively and fact-based discussion I don't know what will.
Absolute legend weighing in:
https://lcamtuf.substack.com/p/technologist-vs-spy-the-xz-backdoor
I wrote a summary of what happened / what discussion led to this post, but apparently automod doesn't let me post it, so here we go. Should open as a markdown file.
Too many mentions
@pliant trench sorry, I know it says it in bright red... 
Stick to three
Sorry, missed this thread. There's a bit more info after the move to this thread in #infosec. I wish there was an easy way for discord to move things over. My apologies.
I'll just say @serene hawk
I don't see that message. I get another.
@serene hawk no we literally just opened it
gotcha 🙂
Alas, all we have is copy/paste
At least it's not IRC 😄
So now, we wait. Things have been patched (went faster than I thought), investigations starting... not much for anyone to really do or talk about until it's fully reversed or an investigation concludes, which I doubt we'll ever hear about. This is confirmed to have been picked up by CISA, I'm sure NSA is also on it.
Perhaps more.
In retrospect, this should have been a serious red flag.
I'm pretty sure every cybersecurity researcher and their mother are on it
*is? *are?
It has 1) ifunc stuff, 2) valgrind issues, 3) conflicting build shit with Landlock
Those three things should not be together for a compression library.
I don't know what you're saying but valgrind sounds like you don't want to mess with their hoards
not even if you're landlock himself
ifunc = indirect functions, valgrind = a debugging tool that checks memory allocations in C/C++ programs to make sure you don't double-free or use after free
landlock = unprivileged access control
This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user space applications.
^ from the landlock page
a compression library does not need to even know about landlock.
I'm mostly curious about the human aspect of the story - is he real, is that a real person or a team? his account could have been compromised to push a few late night commits, but then what about all the pretext?
There's no way of knowing until the powers that be reveal that information.
This person hid themselves too well.
Honestly the patience and tenacity are admirable despite the adversarial nature. That sort of discipline and dedication to craft is something.
as @burnt kindle pointed out if this had worked this would be a backdoor the NSO group would salivate over.... this would've been HUGE and then like DAYS, nay hours after it made it into some pre-releases it got made by accident??
Exactly!!!!!!!!
You tend to see those traits in better organized APT groups with long term footprints in compromised systems.
Imagine how they feel, having worked on this project for years.
how many apt groups try to social engineer major OS distributions, though? I can't think of any
still, even if it's an APT, there's always people behind it and it's always a game, and they lost BIG TIME
meaning?
I am not sure what it actually implies, other than it's unprecedented AFAIK
Yeah that's exciting innit?
Kind of like discovering weaknesses in Dual EC DRBG in a fresh install while performance testing algorithms.
I don't know what that means but I nod excitedly
snowden 'n' the nsa
Dual EC DRBG was a potential backdoor in an encryption algorithm advocated for by the NSA. https://en.wikipedia.org/wiki/Dual_EC_DRBG
Dual_EC_DRBG (Dual Elliptic Curve Deterministic Random Bit Generator) is an algorithm that was presented as a cryptographically secure pseudorandom number generator (CSPRNG) using methods in elliptic curve cryptography. Despite wide public criticism, including the public identification of the possibility that the National Security Agency put a...
Such an attempt was tried again at the ISO level some years later and the chairs made a hubbub about it on Twitter
// stirring the pot mode on
https://twitter.com/TheHackersNews/status/1714855194779857223
// stirring the pot mode off
🕵️‍♂️ ALERT: Google TAG security experts uncover Russian and Chinese state-backed threat actors exploiting WinRAR #vulnerability (CVE-2023-38831) to infiltrate systems.
Get details here: https://t.co/bTGkOdbXfN
#hacking #CyberAttack #CyberSecurity
There's a precedent of things targeting archiving software.
But you get patches when you pay for winrar mirite? (sorry, had to)
NIST lost a substantial amount of credibility and trust when it was revealed that one of their consultants had undisclosed NSA connections in... ugh, I can't remember what it was
or was that dual_ec?
that was dual_ec
there's a second standards body issue that happened some years later
NIST Special Publication 800-90A
the thing that keeps me thinking is the specific targeting. And I don't know exactly where in the conversation I picked that up and whether either of you can substantiate the claims I just saved in the back of my head:
- xz-utils was a tiny project with very few maintainers, or even a single one
- the project had a certain complexity; file compression algorithms are not something that any undergrad can glance over and understand what's going on
- the maintainer allegedly had mental health issues
- the person or dev-team was chosen with a skillset that was very specific to that project: filesystems and compression algorithms
This dictates a causality chain, or say, a specific order of operations: assuming that this was an APT/state actor, this would make sense insofar as you pick a target project first that has three particular weaknesses: it's relatively minor, it has a small team, and the team lead is psychologically vulnerable. Then, after you picked the target, you assemble a person, or a team, and assign them to the project.
Now there's two scenarios: either
a) this was selective bias on my part and my chain of reasoning breaks at one or several points (can anyone substantiate my assumptions?)
b) if this was a strategy it might not have been the only one. Even if it was a multi-year effort it would certainly have paid off had it worked. A strategy like this - do you think this was a one-off moonshot? What other projects that fulfill these characteristics can you identify? Are there other small projects that are present in a larger number of distros, that have a small team of maintainers and that exhibit a relatively complex codebase?
hmmm... they did catch DreadPirateRoberts because he asked the wrong question on stackoverflow and used his email address... idk...
Also, the obligatory
but if you know I got the assumptions wrong, let me know.
Because if b) has any relevance, this becomes a data science question. If you go ahead and pick a target like that, it should be possible (for someone with the particular skillset) to assemble a list.
R - gather all open source projects that are commonly used packages and repositories that are considered 'main' among the major distributions
C - find metrics for code complexity (how specialized is the knowledge you need to understand the code)
M - count the number of active maintainers
sort the list R according to C weighted by M.
and now start from the top of the list and OSINT (or closed INT) the shit out of the maintainers. Pick your top 5 targets, assemble teams and go for it.
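A rough Python sketch of the R / C / M ranking described above. The project names and all numbers are made-up placeholders, and "complexity divided by maintainer count" is only one possible reading of "C weighted by M", not an established metric.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    complexity: float  # C: hypothetical 0..1 score for how specialized the code is
    maintainers: int   # M: number of active maintainers


def exposure(p: Project) -> float:
    # Assumption: higher complexity and fewer maintainers give a higher score.
    return p.complexity / max(p.maintainers, 1)


def rank(projects):
    # Sort the candidate list R by the score, most exposed first.
    return sorted(projects, key=exposure, reverse=True)


# Placeholder data, not real measurements of any real project.
R = [
    Project("compression-lib", 0.9, 1),
    Project("web-framework", 0.6, 40),
    Project("image-codec", 0.8, 2),
]

ranked = rank(R)
for p in ranked:
    print(f"{p.name}: {exposure(p):.3f}")
```

The same list could just as well be read defensively: the top entries are the projects where an extra reviewer or funding would matter most.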
This is an awesome write up on the obfuscation bits: https://gynvael.coldwind.pl/?lang=en&id=782
https://boehs.org/node/everything-i-know-about-the-xz-backdoor nice write up with osint angles
includes sock puppets or otherwise suspicious one day flies
also mentions possibly libarchive targeted by same attacker
Best source which is also getting updates over time, funnily enough the same person who accidentally broke npm (the “all” package)
A way to detect if the source has the back door (the release tar ball): https://infosec.exchange/@wdormann/112187297459865552
Here’s an interesting detail, systemd was about to remove (or seriously reduce) its xz dependency, which may have motivated the attacker to rush their attack
For those asking what systemd change, easy write up: https://github.com/systemd/systemd/issues/32028
It was in train before the XZ issue was discovered, which may be why the threat actor sped up, started making mistakes and started begging distros to upgrade XZ - as what looks to be years of planning was about to be flushed down the pan.
https://twitter.com/fr0gger_/status/1774342248437813525 visual representation of how the backdoor works and how it sneaked into xz in stages
Arch had a vuln version of xz, yes. But it didn't use liblzma in ssh, it isn't linked. You can verify with ldd
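For anyone who wants to script the ldd check, here's a minimal Python sketch. It only looks for a `liblzma.so` entry in `ldd` output; the two sample outputs are illustrative strings, not copied from a real system, and `check_binary` assumes a Linux box where `ldd` is on the PATH.

```python
import subprocess


def links_liblzma(ldd_output: str) -> bool:
    """Return True if any line of `ldd` output names liblzma."""
    return any(
        line.strip().startswith("liblzma.so")
        for line in ldd_output.splitlines()
    )


def check_binary(path: str) -> bool:
    # Runs the real `ldd` on the given binary and inspects its output.
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return links_liblzma(result.stdout)


# Illustrative ldd output (made up for this example).
sample_linked = (
    "\tlibsystemd.so.0 => /usr/lib/libsystemd.so.0 (0x0000)\n"
    "\tliblzma.so.5 => /usr/lib/liblzma.so.5 (0x0000)\n"
)
sample_clean = (
    "\tlibcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x0000)\n"
    "\tlibc.so.6 => /usr/lib/libc.so.6 (0x0000)\n"
)

print(links_liblzma(sample_linked), links_liblzma(sample_clean))
```

On a real machine you'd pass `check_binary` the path of your sshd binary, which differs between distros.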
#infosec message
#infosec message
what would be very interesting is using English language analysis on 'Jia' to see if it is one or multiple personae, if they are the same person as the sockpuppets. Also, look at the names of the sock puppets. This way, you can possibly attribute it to someone else who is known in the hacker community (or APT member), or figure out where they're from. Could it also be the person executing the dev access as 'Jia' is a different one as the one implementing the backdoor, or doing the social engineering? Or do they seem like the same person?
what is also curious is their comment on Loongson arch. I don't know if it is one piece of comment in a lot of noise or if it stands out on its own. If you know you'll be eventually detected, you don't want attribution to your country, right?
I don’t think name analysis will lead to much. Since it’s likely a state actor, and they had very good OPSEC (over 3 years!!) it’s likely that if anything they would have pretended to be another threat actor. https://cyberplace.social/@GossiTheDog/112189338794013839
@briankrebs@infosec.exchange @cenobyte@mastodon.thirring.org @rene_mobile@infosec.exchange @AndresFreundTec@mastodon.social @danderson@hachyderm.io @eloy@hsnl.social not saying it’s the case here (and honestly attribution bores me), but GCHQ operate a policy of plausible deniability on everything they do - they pretend to be other threat actors,...
It’s likely that multiple people were involved, but there’s no proof of that yet
@sullen raptor what you are referring to is stylometry. While possible to do, it'd require a fair amount of the author's writing to be statistically robust for attribution.
For multiple author analysis on an arbitrary collection, you'd also need to have distinct authors' writings as well.
There could be low-fidelity ways to show shifts in word usage or phrasings. I've not seen enough of Jia's writings to say how feasible that may be.
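One low-fidelity way to look at shifts in word usage, sketched in Python: build a function-word frequency profile per text and compare profiles with cosine similarity. The word list is an arbitrary small sample picked for illustration, and as said above this is nowhere near statistically robust attribution.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked function-word list; real studies use far more.
FUNCTION_WORDS = ["the", "a", "of", "to", "and", "in",
                  "that", "is", "it", "for", "with", "but"]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


text_a = "the patch is ready and it is in the branch"
text_b = "the fix is done and it is in the tree"
print(round(cosine(profile(text_a), profile(text_b)), 3))
```

With short commit messages the profiles are very noisy, so any number this produces is a hint at best.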
Suppose I could glance thru some of the history.
That would actually be relatively easy to get a corpus of their text.
All of the mailing list archives as well as all commit messages from their account.
Let's do it. 💪🏽
(And the sock puppets!)
The mailing lists would be the trickier part to track down, I wonder if there's a clickhouse query I can run to get the commit bodies
I don't know how to do such automated analysis (I just remember a project on HN trying to find alt accounts and some people woke up to this de-anon attack then), I know it can be done manually though. English seems to be proper, not like literally translated via Google Translate or whatever. Or that you have those weird Chinese sayings in English
With a project like that running for 3 years, if he made a mistake with his VPN even one time he could get caught. But we don't have access to, say, the access logs of Libera or GitHub
what also stands out to me (in hindsight) is that the 5.6.0 does not seem to warrant a major release. That is (in hindsight) a risk the attacker took
Fair enough.
I've got some experience there. We can give it a shot but it may get only so far.
Keep in mind, all the writing is from a pretty narrow domain (i.e. activity in the xz project). That should actually make it easier.
Weird phrasings would be out of my wheelhouse (Chinese vs English) and would take a native. There's been analysis on the name (Jia Cheong Tan) and it seems of questionable lineage. Mixes of Cantonese, Mandarin, and Hokkien.
clickhouse won't let me export the data
have to scrape it, sec
JSON.stringify([].slice.call(temp0.querySelectorAll('thead,tr')).map(elem => [].slice.call(elem.children).map(c => c.innerText)))
First row is column headers
Alternatively:
curl 'https://play.clickhouse.com/?add_http_cors_header=1&default_format=JSONCompact&max_result_rows=1000&max_result_bytes=10000000&result_overflow_mode=break' -X POST -H 'Authorization: Basic cGxheTo=' -H 'Content-Type: text/plain;charset=UTF-8' --data-raw $'SELECT * FROM github_events WHERE actor_login=\'JiaT75\''
Not sure if it has been posted yet, but here is an archived snapshot of the repo: https://archive.softwareheritage.org/browse/revision/af071ef7702debef4f1d324616a0137a5001c14c/?origin_url=https://github.com/tukaani-project/xz&snapshot=bcdaf33e1b3864c1c5f52dca8389a8f68d679e03
Anyone know of Hans Jansen's github handle?
At least for github there's nothing interesting there
This is the Hans Jansen Github handle
Nice!
yep that looks right
Got it from here: https://news.ycombinator.com/item?id=39866936
I think this has been in the making for almost a year. The whole ifunc infrastructure was added in June 2023 by Hans Jansen and Jia Tan. The initial patch is "authored by" Lasse Collin in the git metadata, but the code actually came from Hans Jansen: https://github.com/tukaani-project/xz/commit/ee44863ae88e377...> Thanks to Hans Jansen for the o...
what is that supposed to mean? is this a squash commit?
Really good question
Also, we should scrape the mailing list: https://www.mail-archive.com/xz-devel@tukaani.org/
No, you can have different committers vs authors in git
it's easy to screw it up too
It's probably a nothing burger that lasse was the "authored by".
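To see how often author and committer differ in a repo, here's a small Python sketch. It assumes you feed it the output of `git log --format=%H%x09%an%x09%cn` (hash, author name, committer name, tab-separated); the sample lines and names are placeholders.

```python
def author_committer_mismatches(log_text):
    """Return (commit, author, committer) for every commit where the two differ.

    Expects one commit per line in the form: <hash>\t<author>\t<committer>
    """
    mismatches = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        commit, author, committer = line.split("\t", 2)
        if author != committer:
            mismatches.append((commit, author, committer))
    return mismatches


# Placeholder log output, not taken from the xz repository.
sample_log = "abc123\tAlice\tAlice\ndef456\tBob\tAlice\n"

for commit, author, committer in author_committer_mismatches(sample_log):
    print(f"{commit}: authored by {author}, committed by {committer}")
```

A mismatch on its own is normal (applied patches, rebases, cherry-picks), so this only tells you where to look, not that anything is wrong.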
here's the trimmed up corpus text for hansjan162
jiat75 is next
for anyone curious:
cat jiat75_clickhouse.json | jq -r '.data[] | .[8]' > jiat75_corpus.txt
I just removed code blocks
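Roughly the same extraction plus the code block removal, sketched in Python. It assumes the JSONCompact layout from the curl command (a top-level `data` array of rows) and keeps the column index 8 from the jq one-liner; unlike `jq -r`, it skips rows where that column is not a string. The fence pattern is a simple heuristic for triple-backtick blocks only.

```python
import json
import re

FENCE_MARK = "`" * 3  # the triple-backtick markdown fence
FENCED_BLOCK = re.compile(
    re.escape(FENCE_MARK) + r".*?" + re.escape(FENCE_MARK), re.DOTALL
)


def extract_bodies(raw_json, column=8):
    """Pull one column out of every row of a JSONCompact result."""
    rows = json.loads(raw_json)["data"]
    return [row[column] for row in rows if isinstance(row[column], str)]


def strip_code_blocks(text):
    """Remove fenced code blocks so only prose remains."""
    return FENCED_BLOCK.sub("", text).strip()


# Tiny made-up result set with the text in column 8.
body = "hello\n" + FENCE_MARK + "c\nint x;\n" + FENCE_MARK + "\nworld"
raw = json.dumps({"data": [[0] * 8 + [body], [0] * 8 + [None]]})

corpus = [strip_code_blocks(b) for b in extract_bodies(raw)]
print(corpus)
```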
here's jiat75's from github
maybe someone can do something with that
regarding Hans Jansen a possible lead #infosec message
Over the holidays I got an odroid <- which holidays?
Me: which odroid?
Not following, the xz-backdoor page is run by Lasse and doesn't mention Hans
i think it means regarding name origins on Hans Jansen.
that it's a pseudonym.
could be a pseudonym, not necessarily is
Hans is a common name in German, Dutch, Scandinavian. As is Jansen or something akin to that. Jansen literally means Jan's son
I'm curious which holidays they were referring to because it could indicate location. But given this is a long tail op, we cannot exclude that this is part of the deception (same w/time zone)
This is all I can find on them. carrd.co jiat0218@gmail.com business https://jiat0218@gmail.com.carrd.co
eBay JiaT75 shopping https://www.ebay.com/usr/JiaT75
giters jiat0218 coding https://giters.com/jiat0218
giters JiaT75 coding https://giters.com/JiaT75
GitHub jiat0218 coding https://github.com/jiat0218
GitHub JiaT75 cod...
"Everything about this situation feels like more than one talented individual can do. My gut says there is a lot more behind these actions than one person as someone with an amazing open source set of knowledge, immense obfuscation skills for bash and malware seems like a rare person. It reminded me of that YouTube video of a previous NSO engineer explaining how many engineers work together when building a robust exploit."
We already had the git contribution graph and it looked as if you graphed office hours
I.e. 9-5 with a lunch break
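If anyone wants to redo the office-hours graph, a minimal Python sketch: count events per hour of day after shifting to a candidate UTC offset. The timestamps are made-up examples, and treating timestamps without an offset as UTC is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hour_histogram(timestamps, utc_offset_hours=0):
    """Count events per local hour for a given fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    hours = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        if dt.tzinfo is None:
            # Assumption: timestamps without an offset are in UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        hours[dt.astimezone(tz).hour] += 1
    return hours


# Made-up event times, not real commit data.
sample = [
    "2024-01-15T09:30:00+00:00",
    "2024-01-15T10:05:00+00:00",
    "2024-01-16T09:45:00+00:00",
]

for offset in (0, 2):
    print(offset, sorted(hour_histogram(sample, offset).items()))
```

Trying a few offsets and seeing which one produces the cleanest 9-5 shape is the whole trick, and it can be faked by anyone who schedules their commits.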
that's my comment haha
@serene hawk (sorry, it failed to reply to your message) and to your point (and tangentially to a Q @verbal crest asked) this op does share some patterns of a red-team that conducted a proper supply chain attack analysis. agree that the varying levels of domain expertise and complexity are a bit incomprehensible for a single person.
It would be helpful if you linked the message instead of sharing a screenshot. That way we also get the context of the message.
which tooling did you use?
with which timezone did it correlate?
Sorry didn't know that worked across channels.
That would be me. Western, central Europe. Maybe, one zone over, GMT +2? Tops.
technically it works across servers, but only for people that are also in the other server too
Depends, maybe even just plain UTC, which would spell one country and one country only
link to the source of the above img
#infosec message
#infosec message
There's more countries than one on UTC right
Yeah there's lots
Also other countries with highly sophisticated intelligence services, like Sierra Leone, Mali... Iceland... /s
Iceland might be small but has strong IT
so what this could suggest (I say: could) is that the perpetrator worked from Europe (or Africa) and wanted attribution to China. If they had this in mind from the get go, then they made up a name in Chinese. Because eventually, this stuff would become public
Hans Jansen also suggests someone from the West, familiar with that LOTR meme and/or someone who can relate to that name
Yeah Katrín Jakobsdóttir probably personally ordered that Iceland will beat Israel as the #1 country leading the proliferation of cyberweapons. /s
has anyone noticed any typical English they may recognize from non-native speakers? For example I am Dutch and have various tricks to figure if someone is native Dutch speaker
Can we just agree that it was more likely to be developed in the UK than in Iceland?
ofc
just dont wanna rule em out
I do rule out both the ghost of Prigozhin and the ghost of Kyiv :>
I've had a glance at it and I didn't see anything, this looks quite native
regarding the graph something else pops up. Normally I am productive when I start, cause I have ideas. But that doesn't happen here. Cause there's barely any contributions in the 8-11 hours, it really takes off from 11
Just because it's not falsifiable doesn't mean we can't go on TikTok and convince a few people that it was him.
briefings in morning?
Idk, for me the most productive time begins at 11. Say you get to work between 8 and 9, then you get started - first commits at 11 seems reasonable.
But I worked in academia, I don't know these things you're talking about... weekends, lunch breaks, holidays... sure you have to visit your family over Xmas but that doesn't count
is there consensus whether JT was a long term deception effort/goal, or a sudden change of heart?
a statement a Brit would make could help with attribution. You know, like a slip of the tongue
I think among this channel there is a consensus yeah 
long term right
But it's also something we want to be true - the story would just be a lot cooler
this shit is too complicated with all the scenarios and then having it in a chat, requires scenario analysis with probabilities
I've seen nothing which suggests Russia. Zero. Whereas Singapore VPN, who uses those typically?
PROBABILITY MATRIIIIIX
a sudden change could indicate coercion, something common in authoritarian countries
whatsmyname
but also if you look at the date JT started it is the start of the escalation of the UA / RU war
@burnt kindle 
or it was a matter of 'look, they have it, you should too'
whereas 5.6.0 was bloody minor, barely worth a new stable tree
why would you target Fedora, nobody seriously uses that in production. It is RHEL which is used in prod instead, in the West. Which Linux distributions do China and Russia use in their servers?
I think we should focus more on things that are actionable again, i.e. things that can be investigated using openly available data.
Additional thought:
They used an Asian identity since that’s a large demographic of GitHub users / contributors.
Some open source community stuff is also meeting up and they might be asked if they want to come to meetups or if they are close by.
I.e. no one will ask questions about you
interesting i had not thought of that
And if it’s easy to answer
(Example about large share:) The top GitHub repositories are all Chinese
blend in with the masses, check
JT has contributions with mainly liblzma/xz, but also some with libarchive and zstd (legacy building?)
it has been a while since I used IRC but it used to be that you could go back through logs from ages ago, and could read discussions in public channels. Then you could also see joins and parts and quits which included hosts, IP addresses, and username / identd
two possible hatred related leads: 1) calling .gitignore gitnigore https://news.ycombinator.com/item?id=39867737 this could also be a typo 2) the owner of jiatan.org is from Hong Kong and a queer advocate https://jiatan.org she wrote the book Digital Masquerade: Feminist Rights and Queer Media in China in 2023 https://www.clarehall.cam.ac.uk/directory/jia-tan/ (the angle here is using a nickname related to someone/something you don't like so attribution points to them)
A SourceGraph search like this shows https://sourcegraph.com/search?q=context:global+JiaT75&patte...
- Jia Tan jiat75@gmail.com
- jiat75 jiat0218@gmail.com
```
amap = generate_author_map("xz")
test_author = amap.get_author_by_name("Jia Cheong Tan")
self.assertEqual(
    test_author.names, {"Jia Cheong Tan", "Jia ...
```
I wonder if that person was a public person back when JT made their handle
The tricky part is that there’s not many Chinese last names. Combine that with the pinyin writing and you are going to have a large overlap in name usage
The Chinese character writing of the name would be more unique, but even then you’d have a lot of overlap probably
Tan surname odds are 1 in 65 in China. Rank 36. Not too common, but common enough.
Jia less common
Thankfully not all distros implemented/enforce systemd 😛
Interesting about Jia: highest density in Macau... not that it means anything, but... if I was an author writing a story... this name could come up?
Also interesting is the gender distribution of the name Jia.
88% is India by the way, where the frequency is 1:190k, while Malaysia and Thailand and Taiwan have a much higher frequency of the name Jia...
so it is decidedly a non-gender specific name
so, question: in any of Jia T's writings have they ever written in Chinese or used Chinese terminology?
inclined to question whether the name itself is a red herring.
I think the chance of the red herring hypothesis being true is likely.
Did someone say Singapore VPN? Check this out. Surname Tan, highest density in...
This smells like a beginner's exercise in crafting a legend. 
Like come on, if your task is to pick a name that's as nondescript as possible? First name could be male, could be female, last name is among the 100 most common names in China, you use a Singaporean VPN (a country with close ties to the UK, where English is even one of the official languages)...
have ChatGPT translate "David Smith" into any other nationality and language.
as to the question whether it's one or multiple authors...
https://arxiv.org/abs/2401.06752
In recent years, the increasing use of Artificial Intelligence based text generation tools has posed new challenges in document provenance, authentication, and authorship detection. However, advancements in stylometry have provided opportunities for automatic authorship and author change detection in multi-authored documents using style analysis...
I mean if you pick David Smith or Zhang Wei... you might as well just pick JohnDoe. Nobody will believe that's a real name. You'd want something that's foreign and frequent enough so you don't bother googling, but just so uncommon that it's not a blatant lie
interesting. i'm reading through this.
part of the challenge here in Jia's corpus:
- smaller overall document lengths (on the orders of sentences vs paragraphs);
- intermixed with lots of "noise" that is embedded code references or non-linguistic patterns for analysis;
- lots of components that would otherwise be dropped out in pre-processing (incidentally mentioned in the opening salvo of that link above)
- lack of sufficient baseline author texts to test (detect) style change wrt 2nd or 3rd authorship
From that perspective, Jia Tan would be an excellent pick
I met a gentleman at an AI function a few weeks ago. He (credibly) claims to have worked with famous three-letter-agencies. and his name? David Smith.
no. i did not believe him.
yes. i looked him up. he's real.
(but damn I'd have chosen that name for sure)
I mean, yeah, I bet he's not the only David Smith working for that company 
plus, there's also tools like this:
https://github.com/rakshithShetty/A4NT-author-masking
yes. i built a crude but effective adversarial stylometry method using pretty accessible tooling.
that's an interesting one.
one easy method is having Google Sheets backtranslate through a series of languages thereby eroding original authorship attributes (to varying degrees).
doubtful that applies here wrt Jia and the others.
there's also code stylometry but given the current mess of languages in the backdoor logic it probably isn't as straightforward.
a simple stylometry masking tool would explain why there are no obvious giveaways in the corpus. i didn't look very carefully, but no stylistic localism, flourishes, spelling preferences, slang, idioms, formal/informal language.... it was all pretty bland.
some people really do speak like that tbh.
for sure, yeah. also it would be a pain in the butt to write a stylometry masking tool that works for code comments. I'd hazard a guess and say if you're maintaining a git, with comments and documentation and all... a tool must be rather sophisticated to not fuck with what you're actually trying to communicate
Some key lessons learned from this work can be summarized as follow.
– As indicated by the experimental results, the complexity and the performances of the NLP models vary on the different tasks involved in SCD. The classification of single and multi-authored documents is more straightforward compared to the identification of text segments where the author switches[1]. Similarly, the performance of the proposed methods is significantly lower on the third task involving the identification of multiple locations where the author switches.
[1] :🫤
Then again, it's 50/50 - if it was an intelligence agency working for three years towards an exploit like this? Any good team that specializes in malware attribution has SOME expertise in that area.
If you go into mission planning of a project of that magnitude: Why wouldn't that be on the list of things to cross off? Wouldn't you make sure you at least somehow address adversarial stylometry?
As in - do something to obfuscate writing style clues.
And given that even most APT malware groups now put some effort into making it hard to identify them by linguistic clues... why wouldn't GCHQ?
Adam Langley
by the way, GCHQ shares a building with the Joint Technical Language Service...
do we have a straw man emoji?
nah I'll just keep using this one 
say it is a long game by an agency, with an agent expertise in compression. You'd then have someone writing the malware. You'd have someone do the social engineering. Stuxnet was made by various people, Bond is like X people in one
in short, if it was a pre-planned long game by a state actor there is no way in hell this was one project by one person. That person probably runs multiple projects, as do those different people with different expertises
I'm just saying, if work schedule points to UK, level of sophistication points to state actor, then GCHQ would be prime candidate and they have a whole department of linguists literally a couple of doors down the corridor.
it could be different agencies, too
like a collaboration
you remember those commits on odd time frame, completely different from rest?
spinning that straw man a bit further... just assume it would be GCHQ, and GCHQ shares a building with what's basically a school for translators with a little extra flavor.... how likely would it be that they go get lunch at the same time... and is there a remote possibility that Joint Technical Language Service has at least one or two git repos we can find?
not that odd given that JTLS seems to basically be a part of GCHQ
The JTLS is co-located with GCHQ for administrative purposes.
I'm just saying, JTLS might have the same lunch hours if they're in the same building... if they're not quite as secretive, maybe they have a git or two and we can compare overlap
I suspect that a substantial part of the GCHQ located JTLS work on translating GCHQ stuff
the civilian part is apparently in Gloucester
Cheltenham is in yeah, the doughnut.. isn't that the same county? Gloucestershire?
ah you mean, there's also a civilian part in Gloucester itself, given it has a university and whatnot
the donut is in Cheltenham, no clue where that is in relation to Gloucestershire
Inside it.
@serene hawk can you do the clickhouse analysis again for the following repos:
https://github.com/gchq/stroom
https://github.com/gchq/Bailo
? pretty please?
yeah, same thing. Gloucestershire is the county, the donut is the 'pentagon' of the UK, just that it's a donut and in Cheltenham, which is in Gloucestershire. Gloucester is a stone's throw away and the county capital or whatever the English call it. And the GCHQ lives in the donut, and so does the JTLS
I mean come on, figuring out when teams working at GCHQ usually arrive at work, commit to their gits, have lunch and go home should not really be a problem.
And yes I know, it proves nothing if the schedules line up, but come on, their lunch breaks are pretty regular and the whole git screams 9-5 job. It would be a valid clue if nothing else...
however, there's at least two other facilities, one in London, one in Manchester, that could just as well be the source of the gits and might just have different time (ie. lunch) schedules, so the whole endeavour is very speculative either way
I did note one thing in the scripts,
backticks instead of $(like this)
backticks even
it is an old style of shell scripting, generally discouraged, as $() nests and handles quoting more predictably
this is just what was apparent to me without putting much effort into it. Something like Shellcheck could yield more
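A quick Python heuristic for flagging backtick command substitution in a script. This is only a regex pass, not a shell parser: it skips full-line comments but will also flag backticks inside quoted strings, and it is not a replacement for ShellCheck. The sample script is made up.

```python
import re

# Matches `...` on a single line; a heuristic, not real shell parsing.
BACKTICK_SUBST = re.compile(r"`[^`\n]+`")


def find_backticks(script):
    """Return (line number, matched text) for each backtick substitution."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip full-line comments (and the shebang)
        for match in BACKTICK_SUBST.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Made-up example script mixing both substitution styles.
sample_script = "#!/bin/sh\nnow=`date`\nhost=$(hostname)\n"

for lineno, text in find_backticks(sample_script):
    print(f"line {lineno}: {text}")
```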
interesting observation for sure.
considered obsolete according to X/Open Portability and POSIX standards.
Still referenced as of 2018 in section 2.6.3 Command Substitution of the Open Group Specification.
https://unix.stackexchange.com/questions/126927/have-backticks-i-e-cmd-in-sh-shells-been-deprecated
that'd fit under the Code Stylometry methodology.
I really, really don't think the gitignore thing is a hatred lead, it's most likely a typo.
timestamp analysis?
yeah graph you did with clickhouse? I don't have an account there, can you just execute the same script on some GCHQ repos?
sure, just a second
btw
those usernames are fucking weird dude
I've never seen anything like it
no display pics, random letters/numbers
dunno what you found but it's atypical
you have not?
yeah I'm looking at them as well
check Ghidra
and at the time they took time off
one rationalization here could be that they wanted to achieve greatest reach into most systems. the backticks for backward-compatibility would be more applicable in that regard.
eh yeah that makes more sense given they all have "ghid" or something like that in the name, looks like a team that signed up specifically for the project
this looks strange to me
but i digress
do you want a timestamp of all events on the repo? or a timestamp of individual contributors?
i'm in a client system for work that actually does something very similar to those names.
as an external vendor/contractor.
idk, do a couple if it's not too much work, if we have both, an average, and a few individuals? Can't hurt
when you make a new Gmail or Hotmail (or IM or FB) account and it is taken they add random chars
Twitter does that now, too.
I just would love to see if it lines up perfectly or just meh... or if it's even shifted by an hour consistently...
the way I see it, it is probably an employee of GCHQ who does accountability on these external Git accounts, making sure no PII is there, while the internal Git server cannot be reached from outside. As it shouldn't be
PA: with all the in-band chatter here it's drowning out some of the technical analysis. if anyone wants pins feel free to drop Discord links and flag them for pinning by a mod.
I only have events going back to 2023-12-01 22:00:00
will try to do some analysis on them anyway
thanks!
minor at best, possibly even intentional. Just would need more/better premises
here's the raw counts by user and hour
Ha! Sweet!
and the TSV
Elliottdotgov lol also CLAassistant what is CLA?
I understood there was also a binary included in the malware. Was it an object file? Has it been reversed? Would be fun using Ghidra to find attribution FVEY xd
that's what I get
there appear to be two subgroups though, one that commits late, one that commits during regular work hours.
they'd likely have day and night shifts. there may even be 3rd and 4th shifts for all anyone knows.
different time zones - GCHQ allows contributors that don't work at GCHQ
yea--esp if they have multi-national satellite offices. for instance, analyzing contributor activity in my org would have a continuous 24/7 cycle Mon-Fri with different trends by region.
definitely some oddballs doing Sat-Sun work as well.
@serene hawk any chance you can pull this sort of data for https://github.com/GCHQDeveloper314 ?
he seems to do code review all day every day and if his constant number of vacation days per year are an indication, he's one person
with every finding it is 'was this deliberate' or 'was this intentional to be found' that makes it really tough :/
those 9-5 hours for example stand out, also with the one at end
break periods normalized in 2023; prior to that the schedule was less deterministic.
2023 break period intervals: 16-12-16-8
2024 tbd.
some night owls ehh? or Oz?
The white line is a3957273
OK, but... do you see the shift of an hour I was talking about? Our guy goes to lunch an hour later... (maybe)
so maaaaaaybe... it's a central europe timezone after all? GMT+1?
I honestly don't think this will lead to a result.
(the attempt to pin this on a specific actor based on speculation)
doesn't look like it, no
I'm not sure what these guys have to do with Jia Tan (I must have missed some part of the convo)
yeah I mean... this guy clearly has lunch at noon
Oh this is an intelligence agency, of course they have weird usernames.
I understand now, that's why someone mentioned Ghidra 😂 sorry
brain fart
My main line of investigation was whether the initial time distribution you posted matches the time distribution of various other github contributors that we know for sure work at an intelligence agency that works on a GMT schedule
CET? sorry, same same
Ah I see.
UTC or CET indeed
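For anyone who wants to test the "shifted by an hour" idea instead of eyeballing graphs, a minimal Python sketch: bin events per hour, then find the circular shift at which two histograms overlap best. It assumes timestamps are ISO strings already normalised to UTC. The function names and the sample times are invented for illustration, not taken from the TSV above.

```python
from collections import Counter
from datetime import datetime


def hourly_histogram(timestamps):
    """Return a 24-bin list of event counts per hour of day."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [counts.get(h, 0) for h in range(24)]


def best_shift(reference, candidate):
    """Hours by which candidate lags reference, chosen to maximise overlap."""
    def overlap(shift):
        return sum(reference[h] * candidate[(h + shift) % 24] for h in range(24))
    return max(range(24), key=overlap)


# Hypothetical data: same working pattern, lunch gap one hour later.
ref = hourly_histogram(f"2024-01-08T{h:02d}:15:00" for h in (9, 10, 11, 13, 14))
cand = hourly_histogram(f"2024-01-08T{h:02d}:15:00" for h in (10, 11, 12, 14, 15))
print(best_shift(ref, cand))
```

A clean one-hour result would only be one data point, and with small samples the best shift can easily be noise.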
In that case look at the corpuses of jia tan and hans jansen I provided before and see if they use US vs UK spellings 🙃
well we're CEST now
I sure am
you know who also is CEST? Troll station.
(sorry, that was totally off topic...)
but I'm a nerd, I know what time it is in Antarctica
https://en.wikipedia.org/wiki/Troll_(research_station)
If the name can be a red herring then the hours worked can be too.
Yeah.
yeah that's what I mean, it proves nothing. But if we were using a probability matrix to evaluate competing hypotheses, the fact that the lunch hours differ by a time zone would have to be taken into consideration
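To make the "probability matrix" idea concrete, here is a minimal Bayes update in Python. Everything numeric is a placeholder: the hypotheses H1..H3 and the likelihoods are invented purely to show the mechanics and say nothing about any real actor.

```python
def bayes_update(priors, likelihoods):
    """Return posteriors P(H|E) from priors P(H) and likelihoods P(E|H)."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


# Placeholder numbers only: three equally likely hypotheses, and an
# observation assumed to be twice as probable under H1 as under the others.
priors = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}
likelihoods = {"H1": 0.6, "H2": 0.3, "H3": 0.3}
posteriors = bayes_update(priors, likelihoods)
print(posteriors)
```

The catch is the same one raised in this thread: if an observation could have been staged, its likelihood is similar under every hypothesis and the update barely moves anything.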
I think inference on any of this is moot. This guy was able to think 10 steps ahead.
I can easily write a userscript for Github that schedules any comments/PRs/etc. 8 hours later.
it is worth trying to tick it off too
well a tool like this https://github.com/psal/anonymouth? if it was used, would not change the content, so there's an angle for stylometry reversing. I know nothing about it tbh, just that the project lead of team high tech crime has a degree in linguistics
it could just be one more part of the scheme to make attribution harder. 50/50
It's worth noting that jia tan used american spellings.
that IS worth noting
so did Hans:
can someone fill me in, who is Hans?
one other thing standing out is often greeting with exclamation mark. Like: Hello!
Just harder to uphold over 3 years, especially since Jia Tan was actually working as a maintainer
Interesting thought https://federate.social/@mattblaze/112191304157092489
any spelling errors?
I'd have to run it through a spellchecker
The LSM score for the two corpuses is 0.91
which is far above average.
Idk what degree of certainty that entails.
#1223764590890975282 message
#1223764590890975282 message
@bitter geode can you pin those?
For all intents and purposes, I would say this is 1. But do we know the "normal distribution"? The value is between .5 and 1 - is it skewed to the right?
Ignore that.
NOOO
The spell checker automatically corrected words for me.
I don't want to
The normal value is 0.75.
OK
normal? like mean, median?
wait I'll check
because a high LSM score wouldn't be all that unusual in homogeneous formal and small groups
There's a trend of weird conjugation of English
let me compile some examples
it's subtle, so may be nothing.
.91 is high, but seems to be not unusual in cohesive groups
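For reference, a sketch of how an LSM (language style matching) score can be computed, based on my understanding of the commonly cited formula: per function-word category, 1 - |a - b| / (a + b + 0.0001) on percentage rates, then averaged. The word lists below are tiny stand-ins I made up. Real tooling uses far larger dictionaries and more categories, so this will not reproduce the 0.91 figure.

```python
import re

# Tiny illustrative word lists (assumption); real dictionaries are much larger.
CATEGORIES = {
    "articles": {"a", "an", "the"},
    "prepositions": {"in", "on", "of", "to", "with", "for"},
    "conjunctions": {"and", "but", "or", "because"},
    "negations": {"no", "not", "never"},
}


def category_rates(text):
    """Percentage of words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    return {name: 100 * sum(w in vocab for w in words) / total
            for name, vocab in CATEGORIES.items()}


def lsm_score(text_a, text_b):
    """Average per-category similarity; 1.0 means identical usage rates."""
    a, b = category_rates(text_a), category_rates(text_b)
    per_category = [1 - abs(a[c] - b[c]) / (a[c] + b[c] + 0.0001)
                    for c in CATEGORIES]
    return sum(per_category) / len(per_category)


print(lsm_score("Thanks for the patch and the tests.",
                "I will not merge it, but thanks for the report."))
```

Under this formula the score lies between 0 and 1, and short texts give noisy rates, so a control corpus from other contributors matters more than the raw number.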
with hans I found Redundant phrase 'a large number of'
Specify a number, remove phrase, or simply use “many” or “numerous” using languagetool.org
it also mentions missing commas a lot
and this one stood out: Nonstandard phrase
Although “In regards to” is sometimes used in casual speech, it is typically considered a nonstandard phrase.
"In regards to" happens to German speakers all the time
Germany IS in the same time zone as Troll station at the moment, and they're known to fuck up their intelligence operations... I drive by the BND building from time to time, maybe I just go and ask 
This is a scraped version of the entire xz-devel mailing list
Might be a decent control
The names should be consistent. I didn't include mail addresses, as those require an additional call to the server.
From the corpus of Jia Tan's Github events, trimmed of any >blockquotes and codeblocks.
Multithreading is a single word, "multi" is not a word on its own.
Showing this as a warning message made more sense before because multi threading was not the default.
Oftentimes is a single word, though arguably a common mistake.
Often times community members will also help us benchmark
"Plenty" used here is... odd. It sounds like an archaic usage of the word.
that is plenty reason to have the code to align the buffer
Extra word when doing a quantitative comparison.
multi threaded encoding mode with 1 thread will produce the same as output as 10 threads.
Misspelling / typo
I believe that is where the non-determinstic belief originated from
Missing plurality
We don't use variables starting with "_" since these type of identifiers are reserved
Missing article (arguable...)
Optimized CRC32 will be enabled if ARM64 has CRC extension running on Linux.
Missing 's'
Enable optimized CRC32 algorithm if ARM64 support CRC extension
A few things: two instances of missing plurality, and a use of british spelling. Also doesn't use consistent nor even correct spelling of x86_64, and x86_32 doesn't exist (it's just x86). I should probably check this isn't somehow someone else's comment.
The x32 port has a x86-64 ABI** in term of** all registers but uses only 32bit pointer like x86-32. The assembly optimisation fails to compile on x32. Given the state of x32 I suggest to exclude it from the optimisation rather than trying to fix it.
Typo, probably meant to say "not sure"
and I'm sure sure if they are really needed.
Another case that is really strange, I need to double check if it's really from Jia Tan. Typo ('you' should be 'your') and repeating "so far" twice.
Thank you everyone so far for you patience and your contributions so far!
There's more but I'm running out of Discord message space.
.91 is pretty high though...
please pin
Would be funny if this was a German operation (fun history read: Crypto AG), btw you typically learn British English in school there.
It was corrected to be .75 if I didn’t misread
Man the list goes on, there is definitely a recurring theme of some broken English here.
Oh OK
I really feel like Jia Tan's first language is not English.
that's more consistent with two authors
They have a great handle on the language but there are some common mistakes they're making with conjugation.
Namely with plurality and tense.
Or that’s what the TA wants you to think
Those are typical Chinese errors
there are plenty clues that would support that hypothesis
Since you don’t have tenses in Chinese
Plurality in English is a common mistake with Asian mother languages.
Yep
That's exactly my thought
Which would perfectly fit the persona
I believe this is serious problem since it can lead to unexpected results when linking.
having options as NULL will cause a seg fault.
I also ran into this bug a few days ago and have been investigated it a bit
It possible I set up my test wrong so don't believe me 100%.
This is is documented in doc/lzma-file-format.txt
Here's a date format written by Jia Tan:
(~ line 105 as of 2022-07-13)
Also here's a strangely worded sentence:
You do not need to subscribe to the mailing list to email xz@tukaani.org, but then only the maintainers will be able to see your mails.
the courtesy (thanking, please) he does is also typical of Chinese, if done too much
Looking at the professionalism of the attack, I wouldn’t be surprised if this was done on purpose
He thanks a lot.
or multiple people
They probably crafted this persona with some care
I mean if they didn't they're definitely in the wrong line of work
This is not the first cyber attack, this kind of analysis to try and pin it to a Threat actor is common.
Thanks the for PR!
I want to do a bit more research about the GNU indirect functions to make its right for this project before we decide to merge. Before that, there are a few style changes that need to be made. I will comment them separately.
^ interesting comment.
Jia Tan pushed back on the ifuncs integration prior to merging it.
Isn't it an important part of the backdoor?
and they usually use bayesian inference and probability matrices to narrow down the possibilities iteratively...
I think there is value in showing that if we have too low memlimit, we can up it and continue decoding.
"if we have too low"
I mean it would be weird if the maintainer would just accept any PRs, it’s a soft pushback to not attract attention imo
Right but notably the PR came from Hans.
damn you, Hans...
If the PR was necessary for the backdoor to work... Then Hans almost definitely is a part of this.
And the whole "I need to research it" might be a ploy to keep eyes off of it.
I thought it was already established that Hans is a part?
I think it's still speculation.
seems quite obvious that Hans and Kumar or w/e are, as they pushed the agenda
Man this guy's english is not native.
I’m pretty sure that if Hans was a real person there’d have been some pushback from him by now.
I mean we're not sure but it seems likely
People act strangely in times of stress or crisis.
Everyone who is related to OSS or info sec work is talking about this, everyone knows
I also want to remind everyone of this clue from yesterday (either @burnt kindle or @serene hawk posted it)
E.g. Lasse has been mostly silent.
He has written enough to deflect suspicion
Pretty sure they weren't pwned, they needed their own test framework for this to work.
That took about a year to develop.
he was on vacation and on the way back has probably been at the police station
unless the Finns are in on the conspiracy insert_vimana_emoji
It’s established that he was on vacation, he has written in mailing lists explaining himself and already done work to undo Jia Tan's shenanigans in the code
OK so this is established? That the preparation by Jia Tan preceded the rollout by a year?
So the pwned hypothesis is basically ruled out by now, or at least highly unlikely?
It’s highly unlikely.
Highly unlikely.
This is not a single account takeover, this is all accounts of Jia Tan working together, and integrating the building pieces of the exploit over a long time.
Why would it be GCHQ
scratch that
Anything we are looking at could be carefully crafted to mislead us, since it’s pretty much 99% likely to be a state actor or an org with state actor resources
GCHQ would be an agency in Europe brazen and capable enough, and they're not EU so a mission like this would probably even be legal
Are you sure
nope
I suggest starting with evidence first and then finding culprits.
And what if someone wants you to think it’s GCHQ, it could be anyone
Yeah, talking about specific actors is pure speculation at this stage.
thanks, yes, sorry
I think what is interesting is that “Jia Tan” was writing English with typical Asian idiosyncrasies. It’s another hint that this persona was being maintained with high care
let's get back to linguistic analysis then, sorry for the interruption, I thought I had found something, I was mistaken, my bad, I immediately deleted it
*American English!
what about Hans, did we find idiosyncrasies in Hans style?
Just to be clear though, this is not proper linguistic analysis. None of us are really qualified to do that or to interpret the results.
It's still interesting though.
Right, just looking through things.
well what are we doing then here, this is digital forensics, there is nothing we can do but speculate, we won't find hard evidence. If we want to start with evidence first and then decide where to focus our line of inquiry we might as well just quit now.
seriously?
OK, now what is that now, evidence? clue? linguistic analysis?
can you elaborate?
solid clue
Let's call it an interesting observation.
He's typing. No need to push.
so it is likely we would need runtime check if we wanted to include
Notably also american english.
Sorry for the delay I was updating my code to use the new organization.
I'm glad your test result match up with mine.
Your right though, I was hoping to look into additional optimizations after this one.
The percentage on the graph show the combined average for all types.
^ the above line appears in four places with a different sentence before it; the error is consistent. Either it's copy/pasted or they just make this mistake consistently.
The corpus isn't large so that's all I have.
I call it an indicator that can be used to change the row values in a probability matrix that will seriously affect the bayesian prospective probability that Hans and Jia are the same person
sorrryyyyy I'm excited. I'll restrain myself

Honestly both of these could be Jia Tan. I don't see any striking difference in writing style between them. An expert would have to dissect this.
Interesting that they both refer to liblzma as "liblzma" every single time they talk about it, as opposed to just "LZMA" or "lzma".
if you are Chinese residing in China, do you care you get busted/ousted by West? No. Cybercriminals residing in China and Russia are safe, as long as they stay there
how do they compare to other contributors in that specific aspect?
Dunno, repos are disabled right now
aaarrrgh
thanks Microsoft :/
Check the recent pin for an archived version of the repo
Doesn't include issues unfortunately.
Oh
What about the mailing list?
Not sure how deeply archived it is though
https://web.archive.org/web/20240329182224/https://github.com/tukaani-project/xz/issues/
tried 2 issues; not archived
Issue 83 (the first one) is the only one archived.
I've gtg to bed, will leave this as an exercise for someone else 🙂 night everyone
mailing list archive is pinned
You deserve it, great work!!!
N8
n8
Two additional issues archived: https://archive.is/https://github.com/tukaani-project/xz/issues/*
I'll also sign off for tonight. I'm on Antarctica time 
archive.today doesnt have it
See the link above 
There are some additional archived issues inside the closed tab.
https://web.archive.org/web/20240329182532/https://github.com/tukaani-project/xz/issues/79
https://web.archive.org/web/20240329182522/https://github.com/tukaani-project/xz/issues/89
https://web.archive.org/web/20231010134325/https://github.com/tukaani-project/xz/issues/61
man if i were that guy from Finland I'd get a lawyer
He probably already has one
Finland is EEST
ofc, that is also why he's in STFU mode
Last issue I found: https://web.archive.org/web/20240330004642/https://github.com/tukaani-project/xz/issues/24
All other issue links are dead or point to a time after which the repository was already taken down.
There is an archived active pull request featuring our protagonist: https://web.archive.org/web/20240329180818/https://github.com/tukaani-project/xz/pull/86
The other active PR does not, not worth linking.
It's interesting to note that the "Jigar Kumar" in this thread that is pressuring Lasse to find a replacement maintainer has the same e-mail format (<firstname><lastname><number>) as the "Hans Jansen" e-mail that was part of the backdoored commits. The PGP key for the Protonmail account (0xA97B6FC34F5DB756) was created on 2022-04-26, and the first message by "Jigar Kumar" on the xz-devel mailing list was on 2022-04-27, one day later. from https://archive.is/GaWMh
hmm what about JT being from Hong Kong. The academic Jia Tan is from Hong Kong, they speak decent English there
issue 86 is interesting because they're discussing a Chinese architecture (Loongson, based on MIPS IP). It'd be two Chinese people speaking English to each other
The closed PR tab had three archived links. All of them relating to the backdoor implementation.
https://web.archive.org/web/20240329180854/https://github.com/tukaani-project/xz/pull/53
https://web.archive.org/web/20240329191313/https://github.com/tukaani-project/xz/pull/64
https://web.archive.org/web/20240329213320/https://github.com/tukaani-project/xz/pull/73
And with that, I clicked through every link on wayback machine. I don't think there is anything more to find.
xry111 having physical access to some hw and providing benchmarks
so GitHub revoked lasse, jiatan, and also hans. Which suggests GitHub has proof hans is a bad actor. That hans and jia seem to make the same mistakes in English is interesting; it suggests jiatan, or whoever is behind this, created the hans handle too
https://github.com/xry111?page=2&tab=repositories he forks the shit out of everything, including the tool 'jansson' https://github.com/akheron/jansson
For anyone trying to follow, this video is the one that made the most sense and explains the most of what it is in the code that makes it so weird. https://youtu.be/gyOz9s4ydho
In the latest liblzma update, a trusted bad actor called 'JiaT75' implemented a backdoor which allows RCE (sending calls to system()) on ssh connections. Here I'm looking into the case and explaining how it works.
Links:
- AndresFreundTec on Mastodon: https://mastodon.social/@AndresFreundTec/112180083704606941
- openwall email: https://www.open...
I see timezone stuff has been discussed here before, but I don't think this particular article has been linked before: https://rheaeve.substack.com/p/xz-backdoor-times-damned-times-and
Also, check the comments. One of the replies from Jia Tan on the mailing list also suggests a UTC+3 timezone.
They just pulled everyone directly involved
"However, I believe that he is actually from somewhere in the UTC+02 (winter)/UTC+03 (DST) timezone, which includes Eastern Europe (EET), but also Israel (IST)" Well, Unit 8200 is huge 🙂
Except sometimes, he forgot to change his time zone. There are 3 commits and 6 commits, respectively, with UTC+02 and UTC+03. <- this is what we were hunting for
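Tallying offsets like that is easy to reproduce. A minimal Python sketch, assuming you have author dates in strict ISO 8601 form (for example from `git log --format=%aI`). The sample dates are invented to show the shape of the output and are not real commits.

```python
from collections import Counter
from datetime import datetime


def offset_counts(author_dates):
    """Count commits per UTC offset found in ISO 8601 author dates."""
    counts = Counter()
    for raw in author_dates:
        # strftime("%z") renders the offset as e.g. "+0800".
        counts[datetime.fromisoformat(raw).strftime("%z")] += 1
    return counts


# Hypothetical author dates, only to illustrate the tally.
sample_dates = [
    "2023-01-10T20:15:00+08:00",
    "2023-01-11T21:40:00+08:00",
    "2023-06-27T17:27:09+03:00",
]
print(offset_counts(sample_dates))
```

Keep in mind the author date offset comes from the committer's machine settings, so it can be set to anything on purpose.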
in 2 or 3 years everyone is going to fuck up
ppl forget their VPN, their killswitch does not work, etc etc
from comment:
Yura
Another evidence:
Jigar Kumar (most probably, Jia Tan's fake account used to promote himself) wrote the letter at Wed, 27 Apr 2022 11:42:57 -0700 (time is in destination server’s time zone).
https://www.mail-archive.com/xz-devel@tukaani.org/msg00555.html
But when Jia Tan replied as he-self, his email program marked “ On Thu, 28 Apr 2022 at 02:42, Jigar Kumar jigarkuma...@protonmail.com wrote”. It gives 15 hours time zone difference to destination mail server, that means Jia Tan replied being in +0300 time zone.
https://www.mail-archive.com/xz-devel@tukaani.org/msg00556.html
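A quick sanity check of the arithmetic in that comment, as a Python sketch. It takes the two quoted timestamps at face value and assumes the quote line shows the replier's local wall-clock time; the variable names are mine. On those assumptions the 15-hour gap is measured from -0700, so the reply offset comes out as +0800 rather than +0300.

```python
from datetime import datetime, timedelta, timezone

# Date of the original message as shown in the archive (-0700).
sent = datetime(2022, 4, 27, 11, 42, 57, tzinfo=timezone(timedelta(hours=-7)))
# Wall-clock time printed in the reply's quote line (no zone given).
quoted_local = datetime(2022, 4, 28, 2, 42)

sent_utc = sent.astimezone(timezone.utc)
# Drop seconds because the quote line only shows minutes.
offset = quoted_local - sent_utc.replace(tzinfo=None, second=0)
print(sent_utc.isoformat(), offset)
```

This only says what clock the replying mail client was set to, which is as easy to fake as a commit timestamp.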
Another comment: Also, Russia:
- Jan 7 - Orthodox Christmas, no sane person would work on that date
- Jan 13 - 'Old new year', not a state holiday but a small celebration day nonetheless
- May 1-3 and May 9 - so called May holidays -- are there differences between Russia and Ukraine/Kazakhstan/etc..? not sure
- Nov 4 - People's unity day -- only in Russia!
https://www.consultant.ru/law/ref/calendar/proizvodstvennye/2023/ here you can find the business hours calendar for 2023.
ConsultantPlus online (free non-commercial internet versions of the system) contains a huge body of documents on the federal and regional legislation of the Russian Federation. The online versions of ConsultantPlus offer convenient search of laws, codes, orders, decrees, resolutions, directives, letters and other documents in their latest revision. Besides docum...
That looks like a pretty solid analysis
Russia would fit the timeline of the 'SMO'?
Committing from airplane.... ugh
not airtight
I wouldn’t speculate about attribution
which cybercriminal works 9 to 5
I’m not arguing that it’s not a state actor, or an org with state actor resources (I believe it's a state actor). Just that we shouldn’t speculate which state actor / org it could have been, since it doesn’t contribute anything
hopefully the VPN provider can be subpoenaed, though perhaps they just have logs from passive bridge metadata
not sure Singapore does that
Could be that they used TOR or something to connect to the VPN, using the VPN only to get a plausible IP geolocation, don’t think there is much data of value to be gotten
i think it is fine to speculate, just don't speak in certainty when it is uncertain
gotta work with what you got
Community rules expressly forbid speculation and state a strong preference for evidence-based analysis. We are a research community by nature.
metadata is invaluable btw, you can correlate with something like power loss, and also he may have forgotten his VPN or killswitch at times, or forgotten his Tor
OK well delete whatever I wrote you think is wrong (ie. against the rules)
IIRC this is how DreadPirateRoberts operated.
(And was caught slipping)
“caught slipping” = ordering fake passports for himself
off the top of my head: he used the same email address somewhere and the same handle from that post was then used elsewhere where he asked for security advice. But we do not know for sure, as there is something called parallel construction
Yes that was one. The other was his stack overflow mishap.
at one point (around 2016?) the FBI instructed security researchers to keep quiet about a Tor 0-day
Honestly that one is still amusing.
It is, he did pretty well until that point 😂
that was in the very start of his... adventure
it is after that fact that he did pretty well. At the very start he made mistakes, and it is very common to make mistakes in the start (and learn from them)
I still haven’t seen a write-up of the xz payload. I just want to say that this is unusual, since it’s a global event and there are probably many skilled reverse engineers (or teams) working on it
It’s basically a CTF with very high publicity.
Speaks to the complexity and resources the attacker spent on this attack.
Would be a good idea to check major Israeli holidays then, like Yom Kippur for example.
This probably doesn’t surprise anyone, but “Jigar Kumar” hasn’t replied to my email. 💔
did you include a tracking pixel? ^^
Nah, I just sent an email for fun asking them if they’re a sockpuppet 
They are not going to be using any of those accounts anymore
Not sure if previously linked here or not, just joined, but I have this on my links https://gist.github.com/smx-smx/a6112d54777845d389bd7126d6e9f504?s=35
<bernard__> Hans Jansen hansjansen162@outlook.com >> Dear mentors, I am looking for a sponsor for my package "xz-utils":
Speaking of potential screwups.
Asking for a link now.
xz was never Hans's package.
Just confirmed that Hans was never a maintainer.
Does bcat have a linguistic analyst handy? Would really be interested to know if they think the corpuses (corpie?) are indicative of the same person.
@bitter geode / @pliant trench pin perhaps? Might come up again.
There were a lot of circumstantial hints that hans is jia tan, but I think that's a clear mistake where they intended to use the maintainer account
What about the non-maintainer upload point in the text?
Is this about the maintainer of the xz Debian packaging?
There was a theory I saw yesterday: there's an issue to remove the xz dependency from systemd, and in response there was a hasty push to deploy the long plan, which can lead to mistakes like that and to increased weekend work as well
Asking now, hadn't occurred to me.
Issue: https://github.com/systemd/systemd/issues/32028
Suggestion: https://openwall.com/lists/oss-security/2024/03/31/9
Here's the most recent posting I can find from Jia Tan. 28 Mar, one day before the backdoor was found.
<junon> Or, better put, was Hans the debian package maintainer?
<negril> junon: yes, see the non-maintainer-upload
(snip unrelated)
<negril> it's the same as proxy-maintainers in gentoo
<supakeen> Specifically for Debian it means someone isn't a packager, they still have to ask approval and review for each update they submit.
<junon> Gotcha, thanks for the information.
<supakeen> s/packager/maintainer.
Hans was indeed a packager.
So maybe that's a nothingburger.
here's a nice anagram if you can appreciate:
Jia Cheong Tan
CIA Agent John
Is there one for the NSA for Hans Jansen?
LOL
NSA Jen Nash
😂
well xz does sound a bit like the name of the supreme leader of china
This is Hans Jansen.
btw the backticks for compatibility reasons make no sense at all, as systemd is a dependency. Systemd as dep implies recent Linux distribution
Yeah. It's used by people who've been writing shell scripts for ages.
next time i need some anonymous alias im gonna use hans jansen
I thought it was the other way around? systemd depending on xz?
Correct
the vulnerability depends on systemd being used, linked to liblzma
no systemd, no backdoor access. Also no ssh being the binary, no backdoor access
Ah you mean the backdoor is dependent on systemd being linked against it.
this is why *BSD, macOS, and all those non-systemd Linux distros were never vulnerable, nor Arch as it had no linked liblzma in /usr/sbin/sshd
Yeah, but the backticks you're talking about were part of xz/the build process for xz, right?
Or were they part of the payload?
part of the initial stage
eval `grep ^LD=\'\/ config.status`
eval `grep ^CC=\' config.status`
it even checks for linux-gnu and x86_64. So would never have worked on RISC-V, ARM64, x86 (x86-32). Only on systemd Linux distributions on the large OSes such as RedHat, Debian, Ubuntu and derivatives
apparently Arch for some reason did not link sshd against liblzma
wondering why debian links sshd to liblzma
it was some temporary hack not sure about specifics
was slated to be removed
this is possibly why the actor rushed to move
Because sshd needed to send notifications to SystemD.
And SystemD used liblzma.
They did it for convenience since even though the notification protocol is simple and over a Unix Domain Socket, they didn't want to re-invent the wheel to form the requests
so they just linked against libsystemd and used the provided functions to send the notification.
Now they've removed the linkage and are going to re-implement the notification protocol themselves since it's easy and would reduce supply chain attack surface.
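To show how small that notification protocol is, a sketch in Python: the service manager puts a socket address in the `NOTIFY_SOCKET` environment variable and the service sends a datagram such as `READY=1` to it. This is an illustrative reimplementation, not the code OpenSSH or any distro actually ships, and the `env` parameter is just my addition to make it testable.

```python
import os
import socket


def sd_notify(state: str, env=os.environ) -> bool:
    """Send a state string such as "READY=1" to the socket in NOTIFY_SOCKET."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by a manager that wants notifications
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading "@" means a Linux abstract socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(state.encode())
    return True


if __name__ == "__main__":
    # Outside a systemd unit NOTIFY_SOCKET is unset, so this prints False.
    print(sd_notify("READY=1"))
```

A dozen lines like this pull in no compression library at all, which is the whole point of dropping the libsystemd linkage.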
ignore my crazy capitalization of things, I'm on a crazy amount of caffeine right now
maintainer btw, they used s/packager/maintainer
Never working on Dec 25: Christmas (for many EET countries) (from https://rheaeve.substack.com/p/xz-backdoor-times-damned-times-and ) <- interestingly, some countries have a 'second christmas day'. NL does. We also have second easter day (today) and may 5th as Liberation day (WWII related)
To further investigate, we can try to see if he worked on weekends or weekdays: was this a hobbyist or was he paid to do this? The most common working days for Jia were Tue (86), Wed (85), Thu (89), and Fri (79). <- which raises the question, where is monday?
Monday may have been a day to do meetings / plan the week (if there was a team behind the Jia Tan alias)
But that’s just speculation
Would be interesting to compare it to other open source maintainers, since we don’t know what ordinary data looks like
to get a baseline?
perhaps specifically versus devs in the timezone where Jia supposedly slipped up
My intuition would be that maintainers (since they are mostly volunteers) would mostly do things in the evening or on weekends / vacation days
hobby -> yes, work -> no
since he mostly committed tue/wed/thu/fri (with slightly less on fri compared to other three) this was his dayjob
he always started committing at 12:00 UTC
https://github.com/emirkmo/xz-backdoor-github
Here's the dataset of github activity
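If someone wants to redo the weekday tally, or run it on other maintainers for a baseline, a minimal Python sketch follows. It assumes a plain list of ISO 8601 timestamps, which is not necessarily how the dataset above is laid out, and the sample values are invented.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_counts(timestamps):
    """Count events per weekday, in the zone each timestamp is written in."""
    counts = Counter(DAY_NAMES[datetime.fromisoformat(ts).weekday()]
                     for ts in timestamps)
    return {day: counts.get(day, 0) for day in DAY_NAMES}


# Hypothetical timestamps, only to show the output shape.
sample = [
    "2024-03-26T12:00:00+00:00",  # a Tuesday
    "2024-03-27T12:30:00+00:00",  # a Wednesday
    "2024-03-30T09:00:00+00:00",  # a Saturday
]
print(weekday_counts(sample))
```

Note that the weekday depends on which time zone you count in: a late Monday evening in UTC is already Tuesday in UTC+8, which could account for part of a "missing Monday".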
I hope Jia comes out of the woodwork to tell us all about his adventurous xz implementations
I’ve got the post for you: https://infosec.exchange/@tinker/112196180295212632
Put yourself in Jia Tan's shoes, the malicious contributor to the xz backdoor...
It's been, what, two... three?... years since you started this campaign. You've had the entire support of your team and of your chain of command.
Your coders created a complex and sublime backdoor. A secure! backdoor that only you and your team could connect to. H...
fun post but... Your spouse and kids will understand why you haven't been at home lately <- does not correlate with commit times. The commit times indicate our Jia had plenty of weekends with friends and family
Also to do code, the code committed Tuesday morning was probably made Monday XD
I had seen that in the last few months they started to work weekends
Anyway, if it's a team, Jia is not a person, it doesn't have a wife.
For the record.
was it in addition to or solely? Deadline related?
In addition, you could see it on the GitHub profile graph, I'll try to find it
Someone finally repro'd the exploit
The kill switch? Don’t you need the private key otherwise
Here the link to the exploit demo: https://github.com/amlweems/xzbot
They patched their own key in! Smart
Yeah they just do a sig check of the public key
so they were able to patch in their own
Yeah I've seen that shared but not sure if legit, can anyone confirm the PoC?
It’s from a Google vulnerability researcher. Don’t see a reason to doubt it (and if it were fake it would be quickly dismissed in the infosec community)
I've not confirmed it but I would imagine it's probably legit.
The thread on Twitter seemed to be legit.
Would be kind of embarrassing not only for the research but for Google if it was fabricated lol
Nothing to gain from faking a PoC.
Yeah, I hadn't realized when posted, and I wasn't sure if it would be the whole thing or just the build stage.
newly found: they tried getting it into macOS Homebrew the same night, skipping dozens of versions.
also new account to OSINT!
Output of brew config HOMEBREW_VERSION: 4.2.15 ORIGIN: https://github.com/Homebrew/brew HEAD: 92a4311868322188478d7a90511ec0e8e6b0d7df Last commit: 5 days ago Core tap JSON: 29 Mar 18:18 UTC Core c...
yes it was in Homebrew
I use a Mac with macOS and I had 5.6.1 installed
however, macOS does not use glibc, nor systemd, nor does it use the Linux kernel (so the early stage of the malware would've exited with 0), there is also no ldd(1)
instead of ldd you'd use e.g. otool -L /usr/sbin/sshd it has no deps on liblzma or libsystemd. macOS has no systemd, it has launchd which predates systemd and even upstart
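Rough sketch of the same check in script form, in case someone wants to run it over more than one binary: paste in the text that `ldd <binary>` or `otool -L <binary>` prints and it reports whether liblzma or libsystemd show up. The function names and the library-name prefixes are my own choices for illustration, and it only parses text, so it says nothing about libraries pulled in later via dlopen.

```python
def linked_libraries(tool_output: str) -> list[str]:
    """Return library file names from `ldd` or `otool -L` style output."""
    libs = []
    for line in tool_output.splitlines():
        line = line.strip()
        if not line or line.endswith(":"):
            continue  # skip blank lines and otool's "/path/to/binary:" header
        # ldd:   "liblzma.so.5 => /lib/... (0x...)"
        # otool: "/usr/lib/libpam.2.dylib (compatibility version ...)"
        first_token = line.split()[0]
        libs.append(first_token.rsplit("/", 1)[-1])
    return libs


def links_against(tool_output: str, names=("liblzma", "libsystemd")) -> list[str]:
    """Return which of the given library name prefixes appear (case-sensitive)."""
    found = set()
    for lib in linked_libraries(tool_output):
        for name in names:
            if lib.startswith(name):
                found.add(name)
    return sorted(found)
```

An empty list means none of the named libraries are listed as direct dependencies in that output.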
Some thoughts about attribution in the XZ backdoor, having just wasted so many hours digging into the details.
The email addresses used for a couple of years at least by the parties involved have absolutely zero trace in any kind of data breach or database beyond Github/Gitlab, and maybe Tukaani and Debian and a few mailing lists.
Normally when...
The consensus from infosec is that it was a state actor or state funded actor (with “was” I mean very high likelihood)
lots of companies are 'funded by a state'
What I mean is an actor with state funds
shell companies are used as a cloaking device for secret agencies, you mean those?
'contractors'
In China you have private companies which hack for the government
in China a company is owned by the government 😄
Not quite
not gonna argue the case
https://tukaani.org/artwork.html contains artwork by Jia Tan
Mirror https://archive.is/Wck1J
$ exiftool 1305ad71f9575e8e6174ec426f42b5961d108ff8.png 3f88a5e949db450d0c7a8c62e26c754a67632006.png 96f96d171200c7cfbe4b053a925141e00f5417bf.png e7d878f822fdef3995457d5bd20c9f6bd9cc6b9a.png | rg 'Datecreate|Datemodify'
Datecreate : 2023-11-08T19:22:27+00:00
Datemodify : 2023-11-08T17:40:33+00:00
Datecreate : 2023-11-08T19:22:27+00:00
Datemodify : 2023-11-08T17:40:33+00:00
Datecreate : 2023-11-08T19:22:27+00:00
Datemodify : 2023-11-08T17:40:33+00:00
Datecreate : 2023-11-08T19:22:27+00:00
Datemodify : 2023-11-08T17:40:33+00:00
$ exiftool 1305ad71f9575e8e6174ec426f42b5961d108ff8.png 3f88a5e949db450d0c7a8c62e26c754a67632006.png 96f96d171200c7cfbe4b053a925141e00f5417bf.png e7d878f822fdef3995457d5bd20c9f6bd9cc6b9a.png | rg 'Modify Date'
Modify Date : 2023:11:10 18:01:33
Modify Date : 2023:11:10 18:01:51
Modify Date : 2023:11:10 18:01:21
Modify Date : 2023:11:10 18:01:51
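For anyone without exiftool handy, here is a minimal sketch that pulls the same kind of fields straight out of a PNG. Assumption on my side: the `Datecreate` / `Datemodify` values exiftool prints come from tEXt chunks with keywords like `date:create` / `date:modify`, and `Modify Date` comes from the tIME chunk. The function name is made up, and CRCs are deliberately not verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data: bytes) -> dict[str, str]:
    """Collect tEXt key/value pairs and the tIME timestamp from PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    meta = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            meta[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"tIME" and length == 7:
            year, month, day, hour, minute, second = struct.unpack(">HBBBBB", body)
            meta["tIME"] = (
                f"{year:04d}:{month:02d}:{day:02d} "
                f"{hour:02d}:{minute:02d}:{second:02d}"
            )
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length + type + data + CRC (CRC not checked)
    return meta
```

Usage would be something like `png_metadata(open("file.png", "rb").read())`. Keep in mind these fields are trivially editable, so they are a hint at best.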
Do we have svg versions of the artwork? 👀
!!!!
Lasse is talking to people in the #tukaani channel on libera
Some tidbits
<Larhzu> The last SECURITY.md commit was such a good example, people thinking it was Jia being malicious while the commit was my suggestion, except that I also wished to change 90 days to 30 days (or even less).
<Larhzu> I haven't but I understood we both would be offline the days matching the Easter.
I'd kill for IRC logs where our suspect writes (or just lurks)
<Larhzu> FH_thecat: IFUNC was legitimate addition but, as IRC logs[*] show, the number of bugs with the IFUNC made me seriously consider ripping it out completely. Jia wanted to keep it.
<Larhzu> [*] I have logs and I hope some of channel regulars have too.
nice
<Larhzu> sh4: Timezone is +0800 (or so) at least. Majority of chat was on Signal.
Signal
<Larhzu> His English was at the same level as mine. Perhaps some errors were different but not much.
<Larhzu> xx: See tests/files/README, most files are created in hexeditor by me. Thus those are the source code. The new files however... I wanted generator programs but I'll save the details for later, sorry.
<sh4> Larhzu, were you contacted by some sort of police ?
<xx> no worries
<Larhzu> sh4: No
<Larhzu> But hasn't been normal work days either due to Easter.
Lasse has not yet been contacted by LE.
<Larhzu> kevans: Based on emails, people assume I'm feeling terrible, especially due to depression history.
<Larhzu> kevans: That's logical but apart from too little sleep (too much sitting indoors playing boardgames) I'm not down.
<Larhzu> kevans: I have been better in the past several months.
so he works at the same time as Europeans (let us exclude Africans and Middle East for a moment), he pretends he is Chinese with some made up shitty name John Woe, and he fucked up once cause he's from same TZ as Lasse (everyone makes OPSEC mistakes)
good to hear
tell him to get a lawyer
and a psychiatrist, FWIW
I am flabbergasted Finnish police don't take a break from work and interview this guy ASAP, before he speaks to other people or can change facts. Sloppy
<Larhzu> FH_thecat: "would the added lines have struck you as suspicious" -- I said "yes" but I actually have to cancel that comment partially.
<Larhzu> FH_thecat: I'm still learning about this and I just learned one more thing. Wow.
<FH_thecat> please tell us
<Larhzu> FH_thecat: I mean "wow" as in something that public cannot know right now.
<Larhzu> I will later, if I try now, I will miss something and then speculation starts.
I know what the Dutch secret service was doing when Heartbleed got out I can tell you that lol
yes good on him
he needs to talk to a lawyer and to LE
<Larhzu> Now that I re-think of the build-to-host.m4 additions, I suspect I would have missed it. But the full story will be out later.
and if he is depressed he is going to need professional help, most likely, as this is gonna take its toll
That's really great to hear though.
<int-e> Apachez: My pet speculation is that a genuine contributor was coerced into doing a bit more than that. Who knows. Maybe we'll never know...
<Larhzu> int-e: Knowing for 100 % sure, probably not. Knowing for 90 %, maybe.
As for the XZ logos:
<f_[xmpp]> Larhzu: Weren't the logos made by someone else?
<f_[xmpp]> as in, not made by Jia
<Larhzu> f_[xmpp]: So the story went, friend made but wanted to remain anonymous so copyright was transferred to Jia.
what's the significance of these logos that they're a topic of ongoing conversation?
On the topic of if Jia was compromised.
<Larhzu> Often on the same computer though... but I'll get to this kind of things in time.
<Larhzu> I spent 26 months chatting with him so I know a few more things than most.
Dunno, just seemed notable.
it is content created by our suspect (or well, supposedly; according to Lasse not made by Jia)
honestly there are far worse OSS logo art crimes like OpenBSD we should really discuss.
<Larhzu> I understand it sounds scary but highly likely most commits by Jia are fine. But one has to fine every single one of the bad ones.
<Larhzu> Ripping out all Jia's commits and then redoing them is suggested by some but I'll make decisions when I have investigated things more.
<Larhzu> Most of the time I reviewed patches in branches and we edited them via feedback etc. So often the branches were in state that I would approve. However...
<Larhzu> ...in cases where the merges were done by Jia, it's not obvious that the branch-that-I-reviewed was merged as is.
<Larhzu> For example, perhaps there was one more typo to fix in commit 1 of 4 and thus that would rewrite the commit IDs.
well.. that reminds me of that meme (before memes were a thing) when OpenSSH had a RCE, a picture containing a cartoon of char asking Theo, why is syslogd running. I want to see SSHd and nothing else!!
<Larhzu> Lockal: It's simpler to check one macro in C code than three. All three are needed together.
<Larhzu> Lockal: So cleanroom in my style could be close to identical.
<Larhzu> It sounds like that people assume that Jia worked alone most of the time.
<Larhzu> In reality it got somewhat like that in early 2024 (or slightly earlier, this I haven't verified myself clearly enough yet).
<Larhzu> spacespork: I likely talked with the same person the whole time. That's how things are kept convincing. And Jia learned enough that he would have had the skills do be a good maintainer. It took time though, in the beginning it was bad.
Lasse seems to be convinced he was talking to one person but multiple people were behind it.
He keeps referring to CAPCOM
<Larhzu> rurapenthe: Thus my CAPCOM analogy: https://en.wikipedia.org/wiki/Flight_controller#CAPCOM
ok for a second I thought he went full Street Fighter.
<susi> Larhzu: how the initial contact happened?
<Larhzu> xz-devel
<Larhzu> FH_thecat: I'm really picky, I want proper sentences and grammar and detailed commit messages, not just a single line. And coding style has to be consistent unless there is a reason to deviate.
<Larhzu> FH_thecat: That is, Jia didn't conform at first and I didn't accept the commits.
<FH_thecat> Larhzu: the backdoor, and the obfuscation seems quite sophisticated. Do you think he could have come up with it himself ?
<Larhzu> spacespork: I don't have a clear thought at this time of the evening. I feel I should review the recent suspicious Jia stuff first (year 2024).
<Larhzu> spacespork: I want to do that with the current repo. So if you get ahead of me then it might be fine or it might be a bit duplicate work.
Wait what 'often on the same computer though'?
Yeah it wasn't clear even with context.
I think he suspects Jia was committing from the same computer
as whom?
he has information that the public doesn't know and will write about it on the site.
As jia
Jia was committing from the same computer as jia?
No, he means that Jia was most likely connecting from the same machine throughout all of his interactions.
yet there's this VPN from SG?
so we have a time of commits showing a normal 9 to 5 work sched for someone in European TZs (and Middle East and Africa but I am excluding those for now), yet this guy is connecting from a SG VPN and has a Chinese sounding name (which could also be from diaspora as it seems mixed up)
for Lasse, Jia must have seemed like someone from Europe, who was using a SG VPN
I wonder if he has any connection logs
I told Lasse I'd look at his Github actions workflows sometime tomorrow 🙂
At least I can help in some small way
I wonder if Lasse could sue Jia, as he defamed his project
see if NN1 shows up at court
There's no one to serve a subpoena to 🙃
Lasse said there's probably no chance of tracking this person.
yet he knows a thing or two about this person which he ain't gonna share for now
Right. He said he'd post on the site his information.
actually do you happen to know when Lasse went on vacation?
Sometime last week.
right so a state actor has multiple specialists. Jia was the social engineer who knew very little about the backdoor. The person who programmed the backdoor is obviously a grey beard God(dess) well versed in Linux. They ain't spending time on social engineering. They're shit at it, it ain't their expertise, it ain't something they wanna become good at either
yeah Good Friday OK makes sense I guess
My gut inclination is that it was developed well before it was ever committed.
Perhaps even years.
do you know when that dependency of systemd on liblzma or when ssh depending on openssh was implemented?
They probably did some fishing for maintainership.
Once they got a lead, they developed a plan.
Probably a long time ago.
https://infosec.exchange/@kurtseifried
Kurt is a quite known person in the space and he shared a few posts showing there's no consensus XD. I think it's most likely state sponsored, but wouldn't rule out single actor
<Larhzu> I spent 26 months chatting with him
So it was initiated around Feb 2022?
Yep
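Quick sanity check of that arithmetic. Assumption: Lasse's "26 months" is counted back from roughly April 2024 (when he was chatting on IRC); if you count from March instead you land a month earlier. The helper name is made up.

```python
def months_before(year: int, month: int, n: int) -> tuple[int, int]:
    """Return the (year, month) that lies n months before the given one."""
    index = year * 12 + (month - 1) - n  # months since year 0, zero-based
    return index // 12, index % 12 + 1


# Assumed reference point: April 2024.
print(months_before(2024, 4, 26))
```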
it is around the time shit was hitting the fan UA-RU and CN and the successor of Cold War got started / rebooted :>
I'm not saying RU is behind this btw, given the public key is hardcoded this was a terrific method of achieving NOBUS and getting into RU computers, to name an example. Or US. ANY agency would want this power
Is anyone here part of Linux Foundation or adjacent to it?
@lcamtuf Also a good counterpoint to "only a state sponsored attacker" can do this would be the list of winners of pwn2own: https://en.wikipedia.org/wiki/Pwn2Own# <- I lol'ed
Pwn2Own is a computer hacking contest held annually at the CanSecWest security conference. First held in April 2007 in Vancouver, the contest is now held twice a year, most recently in March 2024. Contestants are challenged to exploit widely used software and mobile devices with previously unknown vulnerabilities. Winners of the contest receive ...
sorry but there are people in infosec who work for state actors (which is also a wide range of diff branches), or for companies who are contractors for government agencies (this may require security clearance). If you go to a hacker conference in NL, you will meet people who work for those agencies. It is as simple as that, and honestly not a big thing either. But you cannot just say that the winners of such contests are some kind of anarchists hackers. Nor that they're all working for-profit. Those days are long, long gone
I know bugger all about GCHQ so i dont know how they do it, nor do i know about the UK police, but these agencies also work together and help each other out, copy methods, etc. IOW I would assume it is similar in UK.
if it is a state actor it is likely they have one person set up OPSEC (for example using Whonix), another person doing social engineering, and yet another who did the programming. Which seemed to be mostly shell programming? Or was it also C? You could even have a different person who's done the backdoor as well as the C and getting it implemented. A single actor being skilled in each and every of those things is one of its kind, and as you know there ain't one James Bond it is a caricature of multiple skillsets
I think if they interacted much in IRC and signal then the person speaking is more likely to be the one coding, but probably not the exploit development
coder together with social engineer in same room
but indeed not the person who programmed the entire backdoor
@chort @kurtseifried @lcamtuf Maybe they just had a bet with a friend. Seriously, why wouldn't it be a single person? They are clearly talented, they probably have a day job that pays more than enough money and the whole xz project wasn't a fulltime job.
We often talk about how we knew it's theoretically possible, and maybe this guy bet someone that he can show that. <- this also just doesn't fly. Who spends so much work hours 9-5 (approx 2 years?) on a joke?
I think that serves more as an example. I think the fact that there's nothing else about them means it's a single purpose goal, even the other side accounts have only contributed to this "project". Saying that, I think it must be funded, not only because of the effort but also because of the risk, whoever did it would be risking a lot.
Assuming it's a state sponsored team, they didn't use a "bot net" to push the changes and for public interactions, as all the "alts" didn't show anything else that's sus. If it was a big agency, they were super diligent with that so that it doesn't "contaminate" other ongoing "projects", or they really were all out working only for this one
funded could also mean a grey hat company like Hacking Team or NSO Group
I said it before: if you are a criminal from CN or RU, why would you thoroughly care about your OPSEC? Nobody's gonna extradite you. This is someone (or a group) who thoroughly cares about not getting attributed to this incident
I didn’t like his replies, he is ignoring the nature of the back door and a lot of other stuff. It isn’t just one hack, it’s the entire effort dedicated and the careful considerations taken.
Like if it’s a single guy I will eat a hat.
if it's a proper fascinator we got a handshake deal.
(though have serious doubts that you'll lose the bet.)
Deal.
TL;DR -- did they identify commits + committers?
Sounds like I can get Lasse some funding 🙂
Happened to know a guy
Idk how 'in the middle' of all of this I want to get though, personally.
funding for what exactly?
OSS funding for the XZ project so he can focus on maintainership and put resources into finding good maintainers, etc.
cover costs for project-related things, ideally
Anyway nothing for sure yet. He's basically coming back to all of this fresh after the holidays.
Will take a few weeks for him to come up with a plan.
this was legacy building as macOS isn't Linux, does not contain systemd, ssh is not linked against xz, heck most macOS machines these days run ARM64 not x86-64. One thing standing out to me is this: ==> xz: stable 5.4.6 (bottled)
General-purpose data compression with high compression ratio
https://xz.tukaani.org/xz-utils/ <- old website
but its honestly minor as the CNAME is gone
interestingly i cannot uninstall xz as it says openssh (installed via Homebrew) depends on it X.x
IMO Lasse is prime witness to a crime, and I expect he is going to be interviewed / debriefed. The police want to make sure he is no suspect. Either way, he is probably going to need a lawyer (even if innocent)
Dumb question, but is there even a criminal investigation yet?
Not a dumb question imo....
Probably in some jurisdictions, "conspiracy to X", who knows
re: context for anyone who hasnt caught up
https://www.openwall.com/lists/oss-security/2024/03/29/4
https://www.youtube.com/watch?v=bS9em7Bg0iU
has anyone looked activity/content on jia's other repos?
https://github.com/JiaT75/STest
looks like a near 1:1 of another repo they have pinned, there's some pretty extensive commit messages for what seems like a personal project
bruh

Dont know how significant/novel this is, lines up with the activity on GH re the unit testing stuff for sure
It's already pretty well established they contributed to that
I don't believe anything malicious has been found
STest is the test framework he developed to create the backdoor.
if you assume he's a one man show
https://research.swtch.com/xz-timeline best timeline I've seen, thus far (correct me if I'm wrong)
that ought to be a good start to get data fed into Maltego
YARA rule -> https://github.com/Neo23x0/signature-base/blob/master/yara/bkdr_xz_util_cve_2024_3094.yar
is the resulting binary object sufficiently weird to be more generally detectable?
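On the detectability question, the simplest form of what a YARA rule does is a plain byte-pattern search. A minimal sketch of that idea follows; the signature in the example is a placeholder I made up, not a real indicator, so take actual patterns from the linked rule. The function name is hypothetical too.

```python
def find_signatures(blob: bytes, signatures: dict[str, str]) -> dict[str, int]:
    """Map each signature name to the first offset where its bytes occur."""
    hits = {}
    for name, hex_pattern in signatures.items():
        offset = blob.find(bytes.fromhex(hex_pattern))
        if offset != -1:
            hits[name] = offset
    return hits


# Placeholder pattern for illustration only (NOT taken from the backdoor).
EXAMPLE_SIGNATURES = {"placeholder": "deadbeef"}
```

You would call it on the bytes of a liblzma shared object, e.g. `find_signatures(open(path, "rb").read(), signatures)`. A fixed byte pattern only catches the exact builds it was derived from, which is really the point of the question above.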
From another Discord (bridge):
bjo: Btw, Lasse said now "There was speculation if Jia was a honest contributor who got forced by a state or such to act bad later. I'm confident now that Jia was on the job since his first message. It cannot be proven but his behavior is proof enough to me (how he talked early on; not the code)."
Is there evidence that message is genuine?
I didn't verify that, but he's been known to be around IRC
since yesterday
when he got back from vacation
he also wrote he had something on Jia and he'd write an article on that on his website
he did not want to share it yet
Yeah, I am aware he has been on IRC, but that's precisely why I ask, if there is any log of his conversations in IRC for that comment
Libera chat has a by default no log policy. Though you can never verify if someone does log
https://tukaani.org/xz-backdoor/ has a minor update; specifically, the sections "To media and reporters" and "Plans" were added

nothing new added past 12 hrs right
I'm a bit behind 
"
Saw an interesting commit over in cpython: python/cpython@ea51476
Its part of PR python/cpython#115989
The bytecode there seem to be .xz test files from the 5.6.1 release.
Fortunately the cpython developers appear to have removed the bytecode from the PR (python/cpython@32725a7)
I then saw the person who made the PR to cpython seems to be 'Chien Wong' who:
a) has a commit in xz-utils recently (git.tukaani.org/?p=xz.git;a=commit;h=eee579fff50099ba163c12305e81a4bd42b7dd53)
b) was thanked by Jia Tan for the work on the RISC-V stuff (git.tukaani.org/?p=xz.git;a=commit;h=440a2eccb082dc13400c09e22308a58fef85146c) - note that Jia Tan updated the risc-v 'test' files (git.tukaani.org/?p=xz.git;a=commit;h=0b4ccc91454dbcf0bf521b9bd51aa270581ee23c)
c) pushed for a Rust project to be updated to include xz 5.6.0 (Portable-Network-Archive/liblzma-rs#91)
d) mentioned a questioned change in the cpython PR as, basically, 'this was important to prevent tests failing' - different scenario but it remind me of the apparent reasoning for 'fixing' the valgrind issue. (python/cpython@68979bc#r1505355565)
Not saying that person is involved in this, could just be poor timing (everything can look suspicious due to hindsight). Person was perhaps just excited to have added the RISC-V feature and wanted to see other projects use it. And it is for different architecture than the known backdoor. Just wondered if this risked adding a form of backdoor to python at the time, even as accident."
Probably nothing tbh.
https://www.livemaster.ru/m@xv97.com
Das interesen.
I found something minor
https://gitlab.com/ivq and https://github.com/snappyJack both had contact with jiatun and both have cat avatars
I've checked https://github.com/snappyJack/Notion-Image-Hosting/tree/main/薛兆丰的经济学讲义 (sorry link contains Chinese chars) and nothing there on the EXIF data. This guy seems to be into CTF, but also some more shady stuff (IMO)
not sure about this, look here for example: https://www.livemaster.ru/m@xv97
lol
this website is broken at the very least 😛
he also has a Twitter account

wtf
I think you can change your handle on Twitter, so it’s likely just a jokester.
Definitely a joke account
jiat0218@gmail.com on X.com however? is that a joke account too?
without access to that email address cannot change your email address to it right?
It does indeed look like there exists an account with that email, but do we know that it's tied to the JiaT75 account? I don't think there's a way to check that.
that is the same email address Jia used on GitHub
please add Twitter to the subpoena list thank you
Damien Miller 2024-03-30 09:22:14 AEDT
Created attachment 3798 [details]
standalone systemd notifications
This implements the equivalent of sd_notify() without bringing in the rest of systemd bloat. It is also signal-handler safe, which is not the case for the originally proposed diffs.
Lightly tested.
Committed as 08f579231cd38 and will be in OpenSSH-9.8, due around June/July.
not sure if ontopic here but its good to see such commits
I was thinking the JiaT75 twitter account specifically.
I used Holehe and holehe/modules/social_media/twitter.py appears to be using https://api.twitter.com/i/users/email_available.json
Ok, but is the script able to return the username associated with the email? All I'm saying is that while there does appear to be an account that uses jiat0218@gmail.com as its email on Twitter, do we actually know this is the JiaT75 account?
https://twitter.com/jiat0218 is an account and jiat0218@gmail.com is an email address used as login at Twitter. If they're same I don't know
https://twitter.com/JiaT75 is different from https://twitter.com/jiat0218
That definitely looks like a joke account given the March 2024 creation date.
either way, jiat0218@gmail.com is the email address of the person who committed as jia tan
https://fossies.org/linux/xz/ChangeLog with ironically that slip-up being at the bottom of the changelog (OPSEC mistake at start is common) 10012 Author: Jia Cheong Tan jiat0218@gmail.com
10013 Date: 2022-12-20 22:05:21 +0800
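If someone wants to go through the whole ChangeLog for this kind of slip-up, here is a rough sketch that tallies UTC offsets per author string. Assumptions: entries look like the two lines above (an optional listing line number, then `Author:` or `Date:`), and the date format is `YYYY-MM-DD HH:MM:SS +ZZZZ`. The function and regex names are mine.

```python
import re
from collections import Counter

# An optional leading number covers line-numbered listings like the one quoted above.
AUTHOR_RE = re.compile(r"^\s*(?:\d+\s+)?Author:\s*(.+?)\s*$")
DATE_RE = re.compile(
    r"^\s*(?:\d+\s+)?Date:\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ([+-]\d{4})\s*$"
)


def offsets_by_author(changelog: str) -> dict[str, Counter]:
    """Count how often each UTC offset appears for each author string."""
    result: dict[str, Counter] = {}
    author = None
    for line in changelog.splitlines():
        match = AUTHOR_RE.match(line)
        if match:
            author = match.group(1)
            continue
        match = DATE_RE.match(line)
        if match and author is not None:
            result.setdefault(author, Counter())[match.group(1)] += 1
    return result
```

Different spellings of the name or email end up as separate keys, which is handy here because that variation is exactly what we're looking for.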
however they were active before that date
what this could indicate is a different operator?
jiat75@gmail.com has no twitter presence
so jiat75@gmail.com is not the email account for https://twitter.com/JiaT75
so these odd-time commits always occurred on saturday and sunday
so, I don't know how GCHQ works with red teaming or NSA with TAO but would you accept remote work in such an environment? If via shell company?
I don't know how the various state actors have policies on such
the majority of the work being done though seems to be in the afternoon
(interestingly, 15 march and 15 october are almost never the very dates DST changes, and they also differ per country, so take that into account when reading the plot)
what stands out is that one winter commit at Wednesday at about 2 AM
which commit was this?
also what stands out is the lack of commits at Friday afternoon, they never commit then
3 times in winter they did
the two long days in summer also stand out
I wonder which days that was
might be interesting to figure on circumstances like weather on those days
sunday is clearly the off day
but not holy enough to do no work at all, ever
esp morning tho
gotta attend church? 😛
0 = monday btw
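For anyone who wants to rebuild a plot like this themselves, a minimal sketch of the bucketing, with 0 = Monday like above. Assumptions: timestamps are in the `YYYY-MM-DD HH:MM:SS +ZZZZ` form used in the ChangeLog, and the optional `utc_offset_hours` is a fixed offset, so it does not follow DST changes (see the caveat about the DST dates). The function name is made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def weekday_hour_histogram(stamps, utc_offset_hours=None) -> Counter:
    """Count commits per (weekday, hour); weekday 0 is Monday.

    With utc_offset_hours set, every timestamp is first converted to that
    fixed offset; otherwise the offset recorded in the commit is used.
    """
    hist = Counter()
    for stamp in stamps:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")
        if utc_offset_hours is not None:
            when = when.astimezone(timezone(timedelta(hours=utc_offset_hours)))
        hist[(when.weekday(), when.hour)] += 1
    return hist
```

Running it once with the recorded offsets and once with a candidate offset (say 2 or 3) shows how much the "working hours" picture depends on which timezone you assume.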
Simple alternative https://fxtwitter.com/0xAsm0d3us/status/1774534241084445020
https://www.wired.com/story/jia-tan-xz-backdoor/ they attribute APT29/SVR
[23:53]Trxnn: Jigar Kumar: "Patches spend years on this mailing list. 5.2.0 release was 7 years ago. There is no reason to think anything is coming soon."
[23:53]Trxnn: who really says "5.2.0 release"? that's not grammatically correct in english
[23:53]Trxnn: in russian it's grammatically correct though
[23:55]Trxnn: Jia Tan: "If you can think of a better character I would be interested to hear, but I don't think those are better."
[23:55]Trxnn: "interested to hear"... it? about it?
[23:55]Trxnn: just "to hear". not grammatically correct in english
anyone here versed in Russian who can underline this?
On that, the gist author here says they have logs, if someone can do that analysis, they could ask for more data there
https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27?permalink_comment_id=5010126#gistcomment-5010126
Where did you get the logs above?
thesamesam is one of the three people who Jia followed on GitHub hmm
someone put some of the xz-devel texts in GPT and asked it the native language of the person who wrote the English
[00:15]Penny606: GPT over several emails: "Slavic languages (e.g., Russian, Polish): The sentence structure inversions, emphasis use, and potential calques all lean towards this possibility."
Are you conversing in IRC now?
nah I use IRC-style Discord
Three years ago, #FDroid had a similar kind of attempt as the #xz #backdoor. A new contributor submitted a merge request to improve the search, which was oft requested but the maintainers hadn't found time to work on. There was also pressure from other random accounts to merge it. In the end, it became clear that it added a #SQLinjection #vul...
it is from the XZ Backdoor Discord
hm can't mention it
send it to you in private
hope that's OK
As a person who talks about releases all day long, we say "x.x release" all the time, whether or not it's grammatically correct
Fully agreed. This is absolutely common, no matter the nationality of the speaker.
I did some analysis on it with various versions of GPT4 and it isn't conclusive. Some versions are unsure and say to be careful, others say Russian or Slavic, yet others say it could also be Chinese, Korean, or Japanese
I'm afraid that's not the standard of research we're usually going for here. LLMs notoriously hallucinate and give wrong answers, especially when prompted with a leading question.
yes I know, that is why I didn't share it
I did not ask with a leading question btw
Have you tried:
"Are you sure about that? I believe that if you look carefully, you will find his language to contain particularities of Finnish speakers."
(then ChatGPT, being a good bot, will agree with you)
"Oh I was just messing with you, actually he is certainly Japanese. I wonder how you didn't catch that."
No but honestly, all else being equal I would expect a bias of a general generative language model towards whatever foreign language (or foreign language mistake type) is most represented in the training data.
Also, it gives you reasons why it's this and that language because such and such inflection in the past tense and this and that kind of expression reflects the cognitive style of expressing whatever....
But aside from helping you to catch irregularities (that you then have to verify yourself quantitatively) I don't believe ChatGPT can be trusted for this application.
I mean, honestly, this would also be an interesting testbed for a forensic error-rate study. You'd need to have a large enough, annotated dataset of known EFL speakers (preferably of different levels (A1-C2), preferably somewhat representative) and test ChatGPT on its error rate identifying the speakers.
someone who spoke Russian (native Czech) came up with the idea that it could be a Russian
here's the question I asked: Help me identify the native language of the following written English excerpts written by the same person [...]
so, with that Czech person who speaks Russian saying that, and because of Penny606, whom I quoted above, I wanted to reproduce it independently
It makes sense, in a linguistic way. Teams internally use shorthand language when juggling multiple releases (ie, right now I am talking about 24.1, 24.2, 11.0, 10.0) and we never add "release" to the phrase because we don't need to - everyone knows "release" is implied
However when talking to stakeholders we tack "release" on the end, so "24.1 (pause) release" to give context to the external audience.
Additionally on my teams I have 6 nationalities (at least), they aren't necessarily homogeneous in language. Hopefully this is helpful
what about the sentence 'There is no reason to think anything is coming soon.'?
I mean, I understand what is being said, but it does not sound like a native English speaker to me
It sounds like normal colloquial English
With that being said, no release is to be expected.
seems to be just someone who's into the latest and the greatest TBH, would also be a great cover though
I've seen that person for years on github. They like to post about the latest releases everywhere. What's the relevance in this case?
@sullen raptor, I deleted your messages that pointed out a specific individual. Please don't do that, as the risk of false accusations here is way too high and we don't want to have a Boston Bomber Reddit situation.
sure np
elsewhere then, there are people looking for similar patterns
my take on it, is that it won't be socially acceptable anymore to complain about a project being unmaintained
I think it's more likely that there'll be an increasing "professionalisation" of maintainership under foundations.
that is the positive side of it. The other one: instead of complaining, a better response is to courteously ask if you can help the dev out with anything, or alternatively sponsor the project or quit using it (both cost money)
stuff like 'For example, I think that @EasyNetDev can be officially a team member to manage this dead project.' is just a red flag now. It won't be socially acceptable anymore
that goes one step further than claiming/complaining a project is unmaintained
I think considering the relatively large corpus of comments and mailing list messages, linguistic analysis isn't the wrong approach.
It's definitely what many teams around the world would be using right now.
Also, code should be considered. My guess is that person took part in more than one CTF - there is a specific "art" in writing code that looks kinda normal but too complicated to give it too much attention. That kind of code is often built into CTFs as a challenge.
What do we know about the code that was added? How much was it?
it is being reversed
Pod with Silver Back Gorilla of Nerds(?):
https://risky.biz/RB743/
Risky Business #743 -- A chat about the xz backdoor with the guy who found it
that's reaching, those are things I would say and I'm dreadfully american
👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆 👆
You're most likely correct.
have been having a lot of conversations about this with a few very involved folks and they all assert the same thing.
People who know a lot about regulations and community building in OSS specifically.
The XZ thing has really had a silver lining in this regard
Yes but it may be the tip of the iceberg regarding the damage already done in other projects
Of course
But the paranoia doesn't solve anything; learning the lessons, responding accordingly, and having the newfound vigilance solves it.
Now a bunch of legacy codebases are being audited. That's a very good thing.
Companies are now even more incentivized to fund OSS, and with the EU CRA going into effect soon, they're going to be even more incentivized.
This is exactly it. In the end, the economic and ideological factors that have driven open source thus far will outweigh the risks. Yes, the risk calculus has changed, but the benefits are just too great to ignore.
Absolutely.
We're both lucky and fortunate to have this entire situation turn out like it has, IMO.
interesting hypothesis:
[22:57]GⓇÅNĐPĄ ₡ÅÑĄRŶ: 100% the 5.6.1 was instruction call finder patch.... FOSS Team Jia made the adjustment for the debugger helper. Jia Backdoor .o team FORGOT TO TAKE OUT THEIR PRINTS! That's gotta be it!
they're going to use some tricks to slow CPU down and sigstop or use some ASM to slow down the process, this way they can document exactly what happens client-server wise
Needs more context for those not following intimately
for the first comment they had to quickly adapt, while the .o part was probably written by some old neckbeard, they were out of sync for the 5.6.1 release
people are reversing the backdoor: they run a vulnerable liblzma version and sshd, in a VM or on a dedicated machine. Then, in order to catch exactly what is being processed, they need to slow down, limit, or stop the CPU
That doesn't make much sense to me.
Unless they're exploiting some sort of timing attack they don't need to slow things down to use a debugger.
Unless I'm misunderstanding something, but that's all very vague.
they want to know exactly what the payload does, specifically the .o
trying to make it work with, in this case, Asterisk instead of sshd
there are also subtle differences between 5.6.0 and 5.6.1
leithal_weapon: I can confirm my PoC of dl-audit hooking from ifunc works in glibc 2.35+ which is the version that added audit support for ld.so bind-now mode
leithal_weapon: what's also interesting about glibc 2.35's ld.so is that there are two new dynamic exports (compared to earlier versions before the dl-audit.c refactor) - _dl_audit_preinit and _dl_audit_symbind_alt
and guess what - the backdoor code has them in its string list!
the exploit also has "GLRO(dl_naudit) <= naudit" which also matches an assert print string from glibc's rtld.c - which is in the dl_main function
I wonder if it uses these symbols to hook those functions or to work out the internal ld.so struct offsets required to set up the dl-audit hooking - these offsets would change between glibc builds and distros
leithal_weapon: this also gives another area for git commit reviewing - looking at the people involved in the glibc dl-audit refactor between glibc 2.31 and 2.35 which conveniently added the bind-now support and these new dynamic symbol exports
basically before version 2.35 this exploit would not have worked
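For anyone who wants to see which side of that 2.35 line a box is on, a minimal Python sketch follows. The 2.35 threshold is taken from the messages above and is not independently verified here; `glibc_at_least` is a hypothetical helper name. It only compares version numbers and says nothing about whether a system is actually affected.

```python
import platform
import re


def glibc_at_least(version: str, major: int = 2, minor: int = 35) -> bool:
    """Return True if a glibc version string like '2.35' or '2.39-0ubuntu8'
    is at least major.minor. The 2.35 default is the threshold claimed in
    the chat, not something this function verifies."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a glibc version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    # platform.libc_ver() returns a (lib, version) tuple; both are empty
    # strings when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(version, glibc_at_least(version))
    else:
        print("not a glibc system, or version could not be detected")
```

Comparing the parsed numbers as a tuple avoids the string-comparison trap where "2.4" would sort above "2.35".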
maybe someone here can help e.g. with Clickhouse?
leithal_weapon: anyone who wants to do some git commit sleuthing should investigate the git commits for glibc between version 2.31 and 2.35 in which the ld.so dl-audit stuff had a refactor which added various bits and pieces the exploit requires to work
before 2.35 it is not possible to use dl-audit interface hooking with a bind-now aka full relro distro
so this has the purpose of possibly helping with attribution, as well as the ability to find more harmful commits and/or peer pressure behavior
https://twitter.com/bl4sty/status/1776691497506623562 @next marlin thanks for the find
I appreciate the effort but i wonder how you find the time
I'm expecting some interview questions on the topic in the coming weeks
got quite a bit of leisure time lately
You mean GOT right?
GOT right?
a week later, Lasse still hasn't been interviewed by police, he said on IRC yesterday
guy's sitting on Signal chats w/Jia
at least someone (Sam) has chat logs of the IRC channel as well
@sullen raptor Just a joke (apparently a bad one)
You wrote got, GOT - Global Offset Table. Never mind!
"I don't know if it's relevant, but it appears that Hans Jansen has an account on proton.me (hansjansen162@proton.me), with the Outlook address (hansjansen162@outlook.com) set up as the recovery email."
some updates at gist as well as discussion https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
specifically this one: "The "a systemd developer suggested extending the approach to compression libraries" comment was 2 days after the release of 5.6.0, more relevant would be systemd/systemd#31131 (comment)
The timing of 5.6.0 is a good fit for getting into Ubuntu LTS, and that could explain the timing no matter what happened at systemd.
Lennart and Andres are both working at Microsoft, even the reverse direction that some government agency had advance knowledge of the planned backdoor and nudged people in the right direction cannot be ruled out."
"From my own participation in discussions on IRC, the plan was absolutely to be in the next Ubuntu LTS, btw. Jia pushed for an accelerated release schedule to make it in."
"@thesamesam Regarding "Checking other projects for similar injection mechanisms", Debian has an online search engine that provides literal and regex searches over up-to-date sources of the 38k packages in Debian unstable like:
codesearch.debian.net/search?q=grep+-aErls&literal=1
codesearch.debian.net/search?q=Automake+1.10a&literal=1
I checked interesting strings from the manipulated build-to-host.m4, and there was nothing that looked suspicious to me."
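The two search links quoted above follow a simple pattern, so checking more strings can be scripted. A minimal Python sketch, assuming the `https://` scheme (the quoted links omit it) and that `literal=0` switches to regex mode; `codesearch_url` is a hypothetical helper, and it only builds the URL without fetching anything.

```python
from urllib.parse import urlencode

# Base URL as in the quoted links; the https scheme is an assumption.
BASE = "https://codesearch.debian.net/search"


def codesearch_url(query: str, literal: bool = True) -> str:
    """Build a Debian Code Search URL for a literal (or regex) query."""
    params = {"q": query, "literal": 1 if literal else 0}
    # urlencode escapes spaces as '+', matching the links quoted above.
    return f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # The two strings from the quoted message.
    for needle in ("grep -aErls", "Automake 1.10a"):
        print(codesearch_url(needle))
```

Paste the printed links into a browser, or loop over other strings from build-to-host.m4 in the same way.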
on top of that, ivq (Chien Wong) replied to make clarifications
yet another good write-up https://securelist.com/xz-backdoor-story-part-1/112354/

