I suggest reading this detailed article to understand why they built this: https://blog.trailofbits.com/2024/11/14/attestations-a-new-g...
The implementation is interesting - it's a static page built using GitHub Actions, and the key part of the implementation is this Python function here: https://github.com/trailofbits/are-we-pep740-yet/blob/a87a88...
If you read the code you can see that it's hitting pages like https://pypi.org/simple/pydantic/ - which normally return HTML - but sending PyPI's JSON content negotiation header (Accept: application/vnd.pypi.simple.v1+json) instead, then scanning through the resulting JSON looking for files whose provenance isn't set to null. Here's an equivalent curl + jq incantation:
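Something like this (treat the exact jq path as a sketch; the per-file provenance key is what PEP 740 adds to the JSON simple index):

    curl -s https://pypi.org/simple/pydantic/ \
      -H 'Accept: application/vnd.pypi.simple.v1+json' \
      | jq '[.files[] | select(.provenance != null) | .filename]'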
That's the first time I've seen JSON API standard headers in the wild. There was a project where an architect indicated our APIs should be built in that fashion, but people just... disregarded it completely out of pragmatism, also because our endpoints were pure API/JSON endpoints, never anything else. But seeing it used in the wild is pretty clever: same endpoint for different use cases.
Python folks are a bit obsessed with weird, novel or otherwise barely adopted standards, particularly around packaging and PyPI. They also use Macaroons for tokens.
It's quite interesting to see, but they rarely become particularly popular outside of that community.
I can take partial credit (blame?) for going forwards with Macaroons. The reason those were selected originally is because they allow distributed permission attenuation, and the thinking was that individual users/orgs could manually attenuate their scopes as needed.
In practice, that never really panned out (very few users actually attenuated their credentials). If we were to reimplement PyPI's API tokens today, I'd likely go with a more traditional design (even though all of this is a black box from a normal user's perspective - a Macaroon looks like a normal opaque token, just chunkier).
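For the curious, here is the attenuation idea in miniature, using the pymacaroons library. The caveat syntax and names are illustrative only; PyPI's actual token format is more involved:

    from pymacaroons import Macaroon, Verifier

    # The server mints a broad token...
    m = Macaroon(location='pypi.org', identifier='token-id', key='server-secret')

    # ...and any holder can attenuate it offline, without contacting the server:
    m.add_first_party_caveat('project = sampleproject')

    # The server later verifies the token and enforces every caveat:
    v = Verifier()
    v.satisfy_exact('project = sampleproject')
    assert v.verify(m, 'server-secret')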
https://slsa.dev/ gives much clearer explanations of the why behind this work. GitHub recently started offering a SaaS Sigstore implementation, including support for private repos. https://docs.github.com/en/actions/security-for-github-actio... Anyone working on OT should be quickly moving towards this.
Why invest so much time and money in a feature that prevents such a small percentage of data breaches that it's not even categorized on the 2024 Verizon Data Breach Investigations Report?
The vast majority of breaches are caused by credential theft, phishing, and exploiting vulnerabilities.
It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.
The fact that a security measure doesn't solve all or even most breaches doesn't mean it's not worth implementing. Supply chain attacks may be a smaller percentage of breaches, but they can have massive impact when they do occur (see SolarWinds). Security is all about layers - each measure raises the bar incrementally.
The 2024 DBIR, for whatever it's worth, repeatedly mentions software supply chain attacks.
They consider accidental vulnerabilities to be "supply chain attacks". They make no mention of build/packaging system attacks except for SolarWinds.
Probably because they got a government contract under which they receive funding for 3-5 FTEs over 24-36 months in return for quarterly reports - and a tool like this makes the DARPA PM happy. They're one of those "Cyber Defense Contractors".
This has nothing to do with DARPA. As the linked post in the comments says, this work was funded by Google.
Source: I wrote the contract for it.
Thanks, I appreciate the honesty.
Why not? You're presenting a false dichotomy: the time spent on this measure does not take away from time spent on the others you mentioned, and ultimately all security measures should be taken.
Security is about risk reduction: think of an ordered list of measures, ranked by how much risk they reduce for the lowest cost.
Working on stuff outside of the top 99%+ of effective solutions deserves to be questioned.
As an alternative, they could have been investing time in a form of capabilities-based security for packages, like Deno's, where not every pip package can spawn info-stealer processes at install time or runtime. That would address both build-compromise attacks and actually-common attacks like vulnerability exploitation.
> Why invest so much time and money in a feature that prevents such a small percentage of data breaches ...
Because it's a tractable problem that these devs can solve - and just because they're working on this doesn't mean they (or others) aren't also working on the other things.
> It doesn't matter that you can cryptographically verify that a package came from a given commit if ...
Sure, but just because it doesn't solve every single problem doesn't mean it's not worthwhile.
They already went through requiring 2FA for the most popular packages: https://blog.pypi.org/posts/2023-05-25-securing-pypi-with-2f...
This is just another step in increasing security. And of course that is something you preferably want to do before breaches happen, not only as a reaction.
Because publishing for attestations mostly goes through GitHub Actions, the attack vector becomes getting access to GitHub, which might be easier at this point.
Because Verizon's report, while a good read, isn't the end-all-be-all of threat intelligence.
https://www.wired.com/story/notpetya-cyberattack-ukraine-rus...
https://krebsonsecurity.com/2020/12/u-s-treasury-commerce-de...
Software supply chain attacks are rare, but when they happen, they're usually high-impact ordeals.
Correct, software supply chain attacks may be rarer than a simple SQL injection attack, but they are much harder to detect and defend against, and can have very high impact.
Where I work (highly regulated financial utility), we would absolutely love to have signed attestations for the provenance of python packages.
I also have personal experience of NotPetya; I produced the forensic analysis of how it compromised the whole of Maersk. Took less than 30 minutes to take out their entire global IT infrastructure, with a cost of about $400 million.
You need to know where these attestations were made; you need a TPM telling you that this was done on a patched, untouched server, in a patched, untouched VM, on a signed, attested code base, with deploy scripts unchanged by anyone.
We are not there yet, and as long as Jenkins and most build infrastructure are basically remote-exploit-as-a-service, it is going to take a long time. While a Petya-style attack like the one in Ukraine gets harder the more attestations you add, there is always a hole, especially when all of your infrastructure needs to run things from some proprietary vendor.
The proposal is an attestation from Github, owned by Microsoft.
They could instead use a cloud-based trusted execution environment on Azure, and get an attestation from some combination of Intel and Microsoft.
Or they could use a TPM and get an attestation from Infineon that the Microsoft-signed shim loaded a kernel blessed by a Microsoft-approved Linux vendor such as Canonical.
Seems to me that just trades one corporate attestation for another.
And that attestation does not affirm that the device's /usr/bin/python3 was squeaky-clean and didn't Ken-Thompson the package build process. It just affirms which machine. The trust that the machine itself is not compromised is implicit; "oh, yes, we trust Microsoft not to mess up its GitHub Actions worker VMs"
What would work better is if _everybody_ could run the same CI action to produce a release tarball, it produced identical builds, and everyone doing this could publish their findings - then we can federate a heterogeneous network of verifiers, rather than trust a single source's attestation.
Well, those would be nice too. But I'm happy to not let the perfect be the enemy of the good.
> It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.
If that commit has accidentally-vulnerable code, or someone just gets phished and an attacker adds malicious code to the repository with their creds, it is visible.
However, if the supply chain was secretly compromised while the VCS repo was always clean, and only the release contains malware, then good luck finding that out.
We all witnessed this earlier this year with the xz incident: while the (encrypted) malicious code was present in the source repo as test data, the code to load and decrypt it only ever existed in the release tarballs.
This method favors big corporations and provides further lock-in. Python only does what Microsoft/Instagram etc. demand.
So you get suit-compatible catch phrases like "SBOM" (notice how free software has been deliberately degraded to "materials" in that acronym!).
The corporations want to control open source, milk it, feed it to their LLMs, plagiarize it and so forth. And they pay enough "open" source developers who sell out the valuable parts that are usually written by other people.
As you say, it's partly security theater because of the other attack vectors, which are especially relevant in an organization that, unlike e.g. FreeBSD, has no stringent procedures, open discussion culture, or commitment to correctness.
Could someone explain why this is important? My uninformed feeling towards PEP 740 is 'who cares?'.
Supply chain attacks can exploit gaps between source code and distributed packages. Today, if PyPI were to be compromised, attackers could serve malicious packages even if the source code is clean.
Attestations provide cryptographic proof linking published packages to specific code states. This proof can be verified independently of PyPI - reducing exclusive trust in the package index.
Worth noting, attestations aren't a complete defense against index compromises since an attacker could simply stop serving attestations (though this would raise alerts for users monitoring their dependencies' attestation status).
Is this a silver bullet? No. If an attacker compromises a project's source repository, attestations won't help. However, it meaningfully reduces certain attack vectors and moves us towards being able to cryptographically verify the entire chain from source code to deployed package.
(Disclaimer: I helped build this feature for PyPI)
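If you want to poke at it, the proof for a given file can be fetched and inspected directly. A minimal sketch, with the path following PyPI's integrity API docs; project, version, and filename here are placeholders:

    # Fetch the provenance object (Sigstore attestation bundle) for one file.
    # Actual verification happens against the Sigstore trust root, not PyPI itself.
    curl -s https://pypi.org/integrity/sampleproject/4.0.0/sampleproject-4.0.0.tar.gz/provenance | jq .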
> Is this a silver bullet? No. If an attacker compromises a project's source repository, attestations won't help.
But that's a huge attack surface.
And one that needs a different solution.
Update: That a control doesn't solve all problems is irrelevant. By that measure, we would have no controls at all.
> And one that needs a different solution.
Does the community have any ideas about that?
If you're asking how we can prevent people breaking in to a source code repository, the answer is mostly just the same as anything.
Patch, apply principle of least privilege, make sure everyone uses strong authentication. Monitor the system.
For SCM specific controls we could also require signed commits, apply branch protection and any other system specific controls, and enforce code reviews before commit.
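As a concrete example of the commit-signing piece, a minimal sketch (assumes Git 2.34+ for SSH signing; branch protection and required reviews live in the forge's settings, not in Git itself):

    # Sign every commit and tag with an SSH key
    git config --global gpg.format ssh
    git config --global user.signingkey ~/.ssh/id_ed25519.pub
    git config --global commit.gpgsign true
    git config --global tag.gpgSign true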
That doesn't sound like a lot of fun, and I wonder if it is reasonable to ask from maintainers who do this work in their spare time.
Maybe in the near future we could have an AI system that:
- Connects to HDMI
- Monitors any text that's on the screen
- Checks that the user is watching the screen (or at least is at the computer)
- Checks that any changes committed into the Git repository have been at least on the user's screen for X seconds, while they were sitting near the computer.
I believe this is a system where a human/system builds a package, uploads it, and cryptographically signs it, verifying end to end that the code uploaded to GitHub for widget-package 3.2.1 is the code you're downloading to your laptop for widget-package 3.2.1, with no chance it was modified/signed by an adversarial third party.
That's my understanding also, but I still feel like 'who cares' about that attack scenario. Am I just insufficiently paranoid? Is this kind of attack really likely? (How is it done, other than evil people at pypi?)
Yes, it is likely. It is done by evil intermediaries on hosts that are used to create and upload the package. It is possible, for example, if the package is created and uploaded from a compromised developer laptop.
---
From the docs:
> PyPI's support for digital attestations defines a strong and verifiable association between a file on PyPI and the source repository, workflow, and even the commit hash that produced and uploaded the file.
It still doesn’t protect against rogue commits to packages by bad actors. Which, IMO, is the larger threat (and one that’s been actively exploited). So while a step in the right direction, it certainly doesn’t completely solve the supply chain risk.
I'm not sure there is any way to completely solve supply chain risk. All you can do is raise the bar for a successful attack. Right now, we hardly have any controls at all.
It’s honestly a bit nuts that in 2024 a system as foundational as PyPI just accepts totally arbitrary, locally built archives for its “packages”.
I appreciate that it’s a community effort and compute isn’t free, but Launchpad did this correctly from the very beginning — dput your signed dsc and it will build and sign binary debs for you.
Why not? It also accepts the totally arbitrary, locally typed "code" from the author.
I'm also happy with the Launchpad approach, but even happier with debian's Reproducible Builds approach; you _can_ provide a binary package that you built on your own machine, but several other people will _also_ build it. With a reproducible build, they will all get _exactly_ the same result for a given architecture. If they don't... we've found a problem.
I'd prefer to trust:
* the creator of the package over the distributor of the package
* a multitude of independent distributors who can all vouch the package builds the same for them, over a single distributor who builds it themselves and says "trust me bro"
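In practice the verification loop is simple, assuming the project's build backend honors SOURCE_DATE_EPOCH so the output can be bit-for-bit identical (a sketch):

    # Each independent verifier builds from the same source checkout
    export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)  # pin embedded timestamps
    python -m build --sdist --wheel .                    # needs the 'build' package
    sha256sum dist/*                                     # publish these digests
    # All verifiers matching = high confidence; any mismatch flags a
    # build-environment or supply-chain problem.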
Build farms are expensive though. The PSF probably doesn't have this kind of money, and if they do, there are a whole lot of other Python issues to fix.
https://xkcd.com/2347/ all over again.
Could you explain why you think it is a likely risk? Has this attack happened frequently?
It is likely in the same way that aliens are statistically likely. There is no evidence so far afaik, but you will likely never find out when it happens, and it can compromise the whole world if it were to happen to a widely used package. It is not worth the risk to not have the feature. I even think it should ideally eventually become mandatory.
If it happens, you won’t notice it even with PEP740 because all those “trusted publishers” already are or may easily be compromised by state actors. It’s all smoke and mirrors without reproducible builds.
Yes. SolarWinds.
Code checked out from the repository was not the same as what was used to build the actual release. These are not high-likelihood incidents, but when they do occur, they tend to have high impact.
And more recently, semantically similar code-vs-build mismatch: CrowdStrike. The broken release was modified after the actual build step, as per their own build process, and the modified artifact is what was released.
Which CrowdStrike incident are you referring to? The global-impact CrowdStrike incident was caused by a driver defect which wasn't caught by quality assurance processes. It had nothing to do with malicious actors interfering with the code repository or the software deployment process.
The same incident, because the root causes are even more messed up than just shoddy QA.
Yes, the thing wasn't caught because of missing QA. What I find even worse is that the build process for their "channel files" involved:

* building a release in CI, for which tests were run
* modifying the built artifact as a post-processing step, and
* uploading this modified end result to their CDN infrastructure

In effect, what they actually built from their sources in the CI pipeline was not what was delivered to end users. You are correct that attestations wouldn't help against such flagrant lies. And it wasn't a malicious act (although maliciously incompetent might qualify).

That post-build modification step would fly in the face of the attestation concept. It wouldn't help against having an empty set of tests, but an attestation-friendly build process would at least discourage messing around with the artifacts prior to release.
I'd encourage you to read the Verizon DBIR before making statements about whether a given attack is likely or not. Hijacking build systems is not likely: https://www.verizon.com/business/resources/reports/dbir/
A directed attack against the homegrown build system of a small company is unlikely. An attack against a high profile, centralized system or a commonly used package in such system is something to be prepared against.
You are correct. Start distributing and requiring hashes with your Python dependencies instead.
This thing is a non-solution to a non-problem. (The adversaries won't be MITMing GitHub or PyPI.)
The actual problem is developers installing "foopackage" by referencing the latest version instead of vetting their software supply chain. Fixing this implies a culture change where people stop using version numbers for software at all.
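On the hash-pinning suggestion above: pip already has the machinery for this in hash-checking mode. A minimal sketch, with the requirements line and digest as placeholders:

    # requirements.txt pins an exact version plus the digest you vetted:
    #   somepackage==1.2.3 --hash=sha256:<digest>
    pip hash path/to/somepackage-1.2.3-py3-none-any.whl  # prints the digest to pin
    pip install --require-hashes -r requirements.txt     # rejects unpinned or mismatched files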
Nothing stopping version numbers from being 512 bits long. It’s nice if they can be obviously correlated with time of release and their predecessor, which hashes alone can’t do.
It's like HTTPS vs HTTP for your packages. It's fine if you don't care, but having more secure standards helps us all, and hopefully it doesn't add too much of a headache for providers while being mostly invisible to end users.
1. Why not compile it? 2. Does pip install x not guarantee that?
Yeah, because the problem with Python packaging is a lack of cryptographic signatures.
/"Rolls eyes" would be an understatement/
https://en.m.wikipedia.org/wiki/XZ_Utils_backdoor
But that involved one of the developers of said package committing malicious code and it being accepted and then deployed. How would this prevent that from happening?
I thought this was about ensuring the code that developers pushed is what you end up downloading.
No, part of the malicious code is in a test data file, and the modified m4 file is not in the git repo. The package signed and published by Jia Tan is not reproducible from the source, and it was intentionally done that way.
You might want to revisit the script of xz backdoor.
An absolutely irrelevant detail here. While there was an additional flourish of obfuscation of questionable prudence, the attack was not at all dependent on that. It's a library that justifies all kinds of seemingly innocuous test data. There were plenty of creative ways to smuggle in selective backdoors to the build without resorting to a compromised tar file. The main backdoor mechanism resided in test data in the git repo; the entire compromise could have as well.
According to this page, urllib3 does not use trusted publishing. According to https://docs.pypi.org/project_metadata/#verified-details , trusted publishing and self-links are the only ways to have "verified details". However https://pypi.org/project/urllib3/ shows Changelog/Code/Issue tracker as "Verified details" even though they are not self-links. How come?
urllib3 does not have a recent release, which could explain https://trailofbits.github.io/are-we-pep740-yet/ lagging behind.
This page only shows whether a package has been uploaded with attestations. The verified details (Changelog/Code/Issue tracker) are showing because they do use Trusted Publishing.
However, they have not published a new version since the beginning of attestation support in PyPI. That's the meaning of the clock icon to the right of the package name.
Their workflow responsible for publishing new releases [1] has support for attestations. Thus, it will turn green on this page with the next project release.
[1] https://github.com/urllib3/urllib3/blob/main/.github/workflo...
People don’t have to use GitHub, and certainly don’t have to use GitHub Actions even if they do
"How can I trust you?"
"I am trusted."
It's basically the same model as HTTPS. Not sure if it has a name. "Too big to fail" security? Security by fiat?
Something this doesn't answer:
Can I make my package green without having to compromise my integrity by utilising proprietary git hosting?
I read that GitLab was going to be supported too.
I've got the same question. Does anyone know the answer?
>Using a Trusted Publisher is the easiest way to enable attestations, since they come baked in! See the PyPI user docs and official PyPA publishing action to get started.
For many smaller packages in this top 360 list I could imagine this representing quite a bit of a learning curve.
Or it could see Microsoft tightening its proprietary grip over free software by not only generously offering gratis hosting, but now also it's a Trusted Publisher and you're not - why read those tricky docs? Move all your hosting to Microsoft today, make yourself completely dependent on it, and you'll be rewarded with a green tick!
Thankfully, the PyPI side of the hosting is done by a smaller, unrelated company (Fastly).
I think it's a little rude to imply that the people who worked on this are serving an ulterior motive.
It's possible they're just naive.
Microsoft for sure has an ulterior motive here, and the PyPI devs are serving it. It's not a bad thing, it's a win-win for both parties. That kind of carrot is how you get buy-in from huge companies and in return they do free labor for you that secures your software supply chain.
Microsoft was not and is not involved in any way with this work. If you look at the announcements, you'll observe that a different, large, soulless megacorporation that is one of Microsoft's primary competitors funded it.
If you're going to cast aspersions about hidden motives, you might as well cast them at the right entity.
Would you agree that most of the large, soulless megacorporations want moats? They want extra bureaucracy or complexity or other costs, which they as a behemoth can throw people and money at, but would bury a scrappy upstart challenger. It's the same with e.g. Google and HTML standards; Microsoft and document standards; Amazon and cloud APIs; IBM/Microsoft and patents.
If that's the case, it doesn't matter which megacorporation sponsors the work; it benefits all of them.
I'm not saying this specific security initiative is or isn't worthwhile; that remains to be seen. But whether it was intended or not, it's a windfall to Microsoft for an advocacy website to demand "PEP 740 when?" while the most practical way to apply it right now is Microsoft's proprietary offering.
Also, if I'm reading this correctly, the docs warn you off setting up your own Trusted Publisher and steer you towards using an existing one... of which there is currently only one choice, Microsoft. That's just an endorsement with extra steps.
I think, strategically, that all companies want moats. I do not think this will ultimately be anything resembling a meaningful moat for MSFT (or GOOG), given that it's designed to interoperate with anybody who can run an OIDC IdP.
As I've said about 50 times in the last day: the only reason GitHub is being used for examples and was the first one enabled is because that's where the overwhelming majority of PyPI upload traffic comes from. It's not a commitment or an endorsement; it's a strictly strategic move to help the largest uploading demographic first.
I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly. I wrote my own notes on how to use Trusted Publishers here: https://til.simonwillison.net/pypi/pypi-releases-from-github
The bigger problem is for projects that aren't hosting on GitHub and using GitHub Actions - I'm sure there are quite a few of those in the top 360.
I expect that implementing attestations without using the PyPA GitHub Actions script has a much steeper learning curve, at least for the moment.
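For the GitHub-hosted majority, though, the whole thing is a short workflow. A rough sketch along the lines of the PyPA docs (names and trigger are placeholders, and the project must first be registered as a Trusted Publisher on PyPI):

    # .github/workflows/publish.yml
    name: Publish to PyPI
    on:
      release:
        types: [published]
    jobs:
      pypi-publish:
        runs-on: ubuntu-latest
        permissions:
          id-token: write  # OIDC token for Trusted Publishing
        steps:
          - uses: actions/checkout@v4
          - run: pipx run build
          - uses: pypa/gh-action-pypi-publish@release/v1
            # current releases of this action generate and upload attestations by default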
> I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly.
I can speak from experience. Py4J is on that list and getting maintainers is very difficult (for various reasons). Packaging is also not something that naturally attracts contributions.
I suspect that most of the packages in the top 360 list are already hosted on GitHub, so this shouldn’t be a leap for many of them. This is one of the reasons we saw Trusted Publishing adopted relatively quickly: it required less work and was trivial to adopt within existing CI workflows.