Maybe the deeper message of this is: Pagination tokens are a shitty API and you should really offer something that gives the client more options.
Pagination tokens feel great if you're providing the API, because they let you retain maximum control over API usage and let you do all kinds of optimizations in the back-end. But they are really limiting for the clients and - as we've seen here - often fail to address even basic use cases, such as going back a page or even reliably reloading the same page you're on. (And that's before we even get to arbitrary seeking.)
This leads to the dreaded infinite-scrolling or "click to load more" UIs, which overload my browser and make me start again from the very beginning should I have to reload the page.
I think it's also very easy to miss for non-technical stakeholders just how constraining this API is. E.g. in the OP's story, I could easily imagine some nontechnical PM assigning tasks like this:
- back-end team: Implement data access API using pagination tokens (because the devs said this is the best way to do it)
- front-end team: Implement the data view UI, including "next", "previous" and "go to page" buttons, linkability and page numbers in the URL. (Because that's what the product owners ordered)
Without some specific knowledge of how pagination tokens work, it's easy to miss that those requirements are contradictory - unless someone cheats, as the front-end did in this case.
Pagination is effectively a streaming API. Are all streaming APIs bad?
I see the failure mode you describe, but the point here is that if the API _does_ need to support that front-end use case, that should become an explicit part of the API contract.
I think "start off with an API that supports a minimal set of use cases without it becoming an API that de-facto supports other _accidentally_ and without the backend team ever having committed to that" is a valuable option to have.
It isn't really contradictory to have pages and pagination tokens? Just, at the low level, you want dumb API calls.
Now, I will assert you are probably better off with index-like pages anyway. Instead of numbered pages, jump to the "g" records. If you insist on page numbers, it's not too hard to roughly precalculate what offset each page could be. Just more work, after all.
The advantage of tokenized APIs is that you can prevent unbounded work on an API call. In particular, if you allow filters, it is easy to build what has to be a full table scan in one call. This keeps you from later having to build a way to cancel a call.
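To make that concrete, here is a minimal sketch of the keyset idea such tokens usually encode, assuming a Go backend and a "records" table with a unique, indexed "id" column (illustrative only, not the library under discussion):

```go
package pagination

import "database/sql"

// fetchPage is a bounded index scan: the token only ever encodes the
// last key the client saw, so no request can degenerate into a full
// table scan the way a large OFFSET (or an unconstrained filter) can.
func fetchPage(db *sql.DB, lastSeenID int64, pageSize int) (*sql.Rows, error) {
	return db.Query(
		`SELECT id, name FROM records WHERE id > $1 ORDER BY id LIMIT $2`,
		lastSeenID, pageSize,
	)
}
```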
Offsets only make sense when you have ordered lists.
For large-scale applications you really can't think about a global total order.
This is the reason why a list of pages should not be used with large amounts of data.
Also: it is a poor design choice. If your users find themselves going to page X, they don't have the filtering they need.
Agreed. That's why I asserted numbered pages probably aren't best. Still, I know it is a popular ask.
What is the issue with previous-page navigation using pagination tokens? We use that. On all page lookups we provide two tokens: next and previous page.
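Roughly this shape, with illustrative field names (a sketch, not the actual API discussed here):

```go
package pagination

// Item stands in for whatever record type the listing returns.
type Item struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

// Page carries cursors in both directions; the server mints both
// tokens on every lookup.
type Page struct {
	Items     []Item `json:"items"`
	NextToken string `json:"nextToken,omitempty"` // empty on the last page
	PrevToken string `json:"prevToken,omitempty"` // empty on the first page
}
```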
> reliably reloading the same page you're on
How would you do that _without_ pagination tokens? Granted, it requires that the data for that page is stable. Naturally, page 3 of a list ordered by insertion order with a lot of additions will not stay stable.
The core issue is that the page metaphor only makes sense in very limited cases - much more limited than the average UX designer thinks.
> But they are really limiting for the clients and - as we've seen here - often fail to address even basic use cases, such as going back a page or even reliably reloading the same page you're on.
That's not really true. If this level of consistency is required, which it rarely is, then there are solutions with pagination tokens for both of these requirements.
What's the solution space for these problems?
If the UI got you to a page that started with a search of some type, consult your own experience: how many times have you gone to the second page on Amazon, eBay, or Google search? Probably not often. So the answer is some combination of better search, related recommendations from the items on the remaining pages, and filtering.
I don't agree. As an example, a shop and a dashboard are different things. Sometimes you are not looking for an item, but for a pattern, an outlier, or something else. And there are many use cases for a list of something.
Pagination tokens have been the bane of my existence. I'm basically this backend engineer. I have done these exact steps. Between worrying that the frontend will decode the tokens and the annoyance of explaining how they work, I am growing to hate working with frontend teams.
Even worse is when I have to explain that they may get an empty result with a token, meaning they need to call again. "Unbounded service call" means nothing to them, it seems.
Kudos on this project!
Is this really a problem we need to solve with technology?
I've never seen this happen in the real world. But from the motivation presented in the repo, as well as some comments in this thread, I got the impression this is a miscommunication issue.
In my experience most mid-level engineers are fully aware of the differences between "pagination token" and "page/size" based pagination strategies.
So when this kind of situation happens I get the impression that both teams were never on the same page. Maybe someone wrote "the API needs to have pagination" in a design doc and nobody bothered to ask for the details. To make things worse, these problems only appear when we start writing e2e tests, but by that point the deadline is generally pretty close. And this creates a big incentive for people to get creative in order to deliver on time.
When you are presenting a public API… you are almost by definition never in the same room with the folks consuming your API. More to the point, you probably don’t have an agreed upon contract, beyond your terms of service page and internal SLOs.
I really wish HATEOAS clients were the norm instead of concatenating strings like cavemen in order to obtain a URL. You could send the prev, next, first, and last relations of your collection resource. You could also send a templated link so the clients could still jump to whatever page they want.
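For example, a collection resource could look roughly like this (a loosely HAL-shaped sketch with assumed field names, not anything from the library under discussion):

```go
package pagination

// Link is a single hypermedia control; Templated marks RFC 6570-style
// URI templates such as "/items?page={page}".
type Link struct {
	Href      string `json:"href"`
	Templated bool   `json:"templated,omitempty"`
}

// Collection exposes "first", "prev", "next" and "last" relations (plus
// a templated "page" link), so clients follow links instead of
// concatenating strings into URLs.
type Collection struct {
	Items []string        `json:"items"`
	Links map[string]Link `json:"_links"`
}
```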
The thing that always puzzled me about HATEOAS was that it basically treats the computer like a human - yes, if I'm manually exploring an API via curl, the "relation" links can be immensely helpful (sometimes, anyway - often the bare relation names can't be understood on their own).
But if I'm having some script or app consume the API, then this script will have to "know in advance" how the API works anyway, and the "self-discoverability" of HATEOAS doesn't bring any benefit.
(There is the idea that we could at some point have a rich set of standardized relation types and a class of generic clients that could make use of them. But as with the Semantic Web, the incentives seem to be misaligned to make that dream a reality.)
But interestingly, with LLMs, the picture could change - because LLMs could actually explore the API "like a human" and even read plain-English API docs of nonstandard relation types. So maybe some sort of LLM-augmented generic client could actually fulfill the role of the elusive "autonomous agent" that HATEOAS and Semantic Web people are designing for.
> then this script will have to "know in advance" how the API works anyway, and the "self-discoverability" of HATEOAS doesn't bring any benefit.
Tentatively, it allows better flexibility in non-entrypoint URIs, by adding a layer of indirection: instead of hardcoding the URLs themselves, you hardcode the names of the response fields that contain the actual URIs to follow.
But then there is a whole crowd of people who claim that URIs should never change anyhow, for any reason, in the first place.
That makes sense, but then it also trusts that every client will "play by the rules" - first do the "discovery" request, then use the discovered URL to do the real request - when they could just "cheat" and hardcode the URL.
I think, again, the incentives are misaligned, because hardcoding would be beneficial for the client devs in several ways: it's simpler to code, halves the number of requests at runtime and gets rid of several edge cases that you'd have to deal with - e.g. what happens if the discovered URL points to a different domain than the entrypoint?
It makes sense from the server's POV, but because of this, I see a hard time convincing client devs to adopt it.
> e.g. what happens if the discovered URL points to a different domain than the entrypoint?
You follow it. Not sure why even mention it: having a web of URLs spread between e.g. contoso-streaming.tv, api.contoso.com, login.contoso.org, etc. is nothing special these days.
Clients should know about the relations, not the links themselves. If a relation isn't an IANA one, there should be a way to discover how to deal with it. The client must know how to display and process, say, the prev relation. If it doesn't know about a relation, it just ignores it.
Looks pretty cool; however, I'm curious about the usage of a pointer to the key slice. A slice is just two ints and a pointer, so copying it is pretty cheap, and it's one less pointer for the GC to clean up later.
Is there some reason why you chose to pass a pointer to the slice?
You're correct about slice semantics, but key here is actually a Go array, not a slice. So it's passing a pointer instead of copying the entire array.
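For reference, the signatures in golang.org/x/crypto/nacl/secretbox take pointers to fixed-size arrays, which also lets the compiler enforce the key and nonce lengths:

```go
// From golang.org/x/crypto/nacl/secretbox:
func Seal(out, message []byte, nonce *[24]byte, key *[32]byte) []byte
func Open(out, box []byte, nonce *[24]byte, key *[32]byte) ([]byte, bool)
```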
Ahh, thank you. Still very new to Go :)
I have been in this situation also. The biggest issue, which took several meetings, was getting the UX team to understand that they cannot index the pages, as we do not know their tokens.
If you have found good links to explain this to other teams, I'd love to see them.
I don't really agree with the entire premise. At first I read about this trying to understand what the security implications are of obscured pagination tokens. Why would one want to obfuscate pagination tokens from hackers?
Turns out, it's to make things more opaque for one's own team ::scream emoji::
This is simply not a good way to work, sorry.
Either implement paging the way the frontend team wants and expects it to work, or take the time to explain in a clear and friendly way why you cannot (i.e. the data is not structured as pages, specs were clear and rigid, whatever). With healthy interteam communication, there won't be a need to obfuscate pagination tokens in this manner.
It reads like OP built something as a backend engineer without reference to client or user needs and then threw it over the wall to the frontend team. Not good.
> or take the time to explain in a clear and friendly why you cannot
The next person to work on the frontend paging code will likely look at the existing code and infer the contract from there. It doesn't matter how healthy your interteam communication is; humans just straight up don't work like that.
The documentation already says you have to use the nextToken from the current request - adding more words to the effect of "no, really, you do *have* to use the nextToken from the current request" is sadly unlikely to help in practice.
It isn't about making things more opaque for one's own team, it's about stopping people *accidentally* doing something unsupported, which is if anything a kindness to your colleagues since they won't end up with an apparently working feature that will break unexpectedly later.
OP is aiming to minimise unpredicted future pain for his team, his colleagues in the front-end team, and their users.
Your point of view would be absolutely correct in a world where "everybody would just" ... but humans, as a species, don't "just" - so code accordingly.
> The documentation already says you have to use the nextToken from the current request - adding more words to the effect of "no, really, you do *have* to use the nextToken from the current request" is sadly unlikely to help in practice.
Agreed. That is not a good fix, either.
Unfortunately, if I understood the situation, there was a disconnect between what the frontend team needed or expected (pagesize, offset), and what the backend engineer could or would deliver (next page). Rather than address this primary issue (lack of understanding), OP adds a technical fix to address what is fundamentally a human, "soft skills" problem.
It is almost certainly not OP's fault, but rather a lack of managerial guidance of team dynamics.
In my experience, separating teams into frontend and backend silos does not lead to good outcomes. Each team begins to see the other as a clueless adversary. But even if necessary, there must be strong efforts to unify their incentives and understanding.
So, I disagree that this fix is going to fix what's wrong.
I don't think you quite understood - the frontend team didn't actually -need- that. They just happened to have a prebuilt UI component that -used- it, reached for it (entirely understandably, to my mind) without considering whether it was the right thing to use there, then reverse engineered and abused the API to make it work.
What OP did was make a technical change that rendered that abuse impossible, so that the conversation about whether the UI actually -did- need it, or whether the UI should simply be written to expose the intended capabilities of the API, happened naturally, as early as possible in the process.
You can't use technical measures to *fix* human problems, but in this case the technical measure exists to *surface* the human problem so it can then be resolved between the humans in question.
This is IMO a far better approach than the alternative, which would boil down to micromanaging the UI team's choices in a way that would slow everybody down and likely *would* produce the adversarial dynamic you describe.
> ...they just happened to have a prebuilt UI component
With respect, that's not what is described. OP rolled an API, handed documentation to the frontend, and "after a week" the frontend "came up" with a UI that held different expectations of how pagination works than how the API actually worked. I'm being charitable in assuming there was a good reason this pagesize/offset pagination style could not work, but as written, there was no communication about that, neither to us the audience nor to OP's colleagues. In fact, the frontend apparently expected this.
But why not discuss offset pagination with colleagues? Why just present it as a fait accompli and move on to create a library that further locks it in?
Maybe this helps: what if we are not talking about internal development teams but something different, like a commercial/public API?
In those cases you cannot afford or expect to have meetings with folks to explain and communicate, and you can also better appreciate the abuse (unintended or not) that tokens can suffer.
I particularly liked that OP mentioned expiration, key rotation, and more advanced features you can achieve with his proposal, like switching schemes.
Agreed: if the situation were completely and totally different to the one described by OP, then yes, different circumstances apply.
Absolutely on point. This is a classic issue of an engineer lacking organisational awareness. You work in a team for the benefit of the business.
It’s utterly unthinkable that you turn around and tell the business “I can’t show you page 2 of the results. Just because”
I’ve seen this before and it truly is the worst for everyone else involved.
Speculation here, but given that the frontend team reverse engineered OP's API and attempted to work around its limitations, and the engineer then countered by taking the time to craft a whole library, with very few words passed between them, I suspect this is a long-standing problem.
However, I don't wholly blame OP. It smells to me like inexperienced or absent leadership.
This looks like a very simple wrapper around golang.org/x/crypto/nacl/secretbox
What’s the point of this?
To remove/obscure structure from a token so that the structure is not relied upon & can be changed in backwards incompatible ways without disrupting API consumers.
If you think its internal details are too simple to justify a dependency, you can vendor or reimplement it, but that's orthogonal to whether it's pointless. The README is pretty detailed & explicit about what the point is.
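For the curious, the underlying pattern is roughly marshal-then-seal. A hedged sketch assuming golang.org/x/crypto/nacl/secretbox, with illustrative names that are not the library's actual API:

```go
package pagetoken

import (
	"crypto/rand"
	"encoding/base64"
	"encoding/json"

	"golang.org/x/crypto/nacl/secretbox"
)

// cursor is whatever internal state the backend needs. Clients only see
// the sealed blob, so this layout can change without breaking them.
type cursor struct {
	LastID int64 `json:"last_id"`
}

func sealToken(c cursor, key *[32]byte) (string, error) {
	plain, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	var nonce [24]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		return "", err
	}
	// Prepend the nonce so the token is self-contained.
	box := secretbox.Seal(nonce[:], plain, &nonce, key)
	return base64.RawURLEncoding.EncodeToString(box), nil
}
```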
But nothing about the API is specific to pagination. This library is essentially just two other API calls: marshal and seal. It can do this operation on any marshal-able type. By using this library, you lose control over marshaling, which seems like a high cost to pay for this very simple and basic functionality.
Would the library be better if it were more restrictive or more complicated?
No, the library would be better if it provided some utility. As it stands, it provides negative utility.
You're conflating how it works internally with whether it's useful. You've critiqued its internal details but you haven't engaged with the premise of why you might want to use it.
A coin is just a metal disk. A dollar bill is just a piece of paper. Neither of them do anything. After adopting them, you lose control of how your cash is represented. And yet we find them useful.
A simple implementation is a virtue, not an albatross.
No, I'm sorry, you are not correct. I am not conflating internal implementation with whether or not it is useful. Rather, I am evaluating the opportunity cost, i.e. comparing this library to the "next best thing."
For this library, the next best thing would be to make two simple API calls instead of one simple API call. As a cost, this is very low. However, the "next best thing" also has a number of desirable properties compared to this library: better support for custom serialization and a lower attack surface for supply-chain attacks. When you look at the costs vs. the benefits of this library, the utility is negative.
Using your example of currency: a paper bill is not just a piece of paper. It's a piece of paper coupled with the vast machinations of a nation state that can enforce its currency via its monopoly on violence. You can't get all the benefits of a $100 bill just by having a green piece of paper.
It was presumptive of me to tell you what you were thinking, and I apologize.
Huh. Anyone know why the nonce isn't baked into the box upon sealing?
It's the same in the original: https://nacl.cr.yp.to/secretbox.html
The operation doesn't dictate how the nonce is to be conveyed to the recipient.
Yes, but since you never want to reuse the nonce (at least not with the same key, and no one stores nonces for later use), they are 1:1 with the message, suggesting it would have been less error-prone to encode it in the box.
I had the impression NaCl was about being highly opinionated, so this choice surprised me.
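The usual fix is a convention on top: prepend the 24-byte nonce to the ciphertext so the "box" carries it. A sketch of the receiving side under that assumption (illustrative, not from this library):

```go
package pagetoken

import "golang.org/x/crypto/nacl/secretbox"

// open expects box = nonce || ciphertext, the common convention when
// the nonce is baked into the blob rather than conveyed separately.
func open(box []byte, key *[32]byte) ([]byte, bool) {
	if len(box) < 24 {
		return nil, false
	}
	var nonce [24]byte
	copy(nonce[:], box[:24])
	return secretbox.Open(nil, box[24:], &nonce, key)
}
```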
Relatedly, what's the advantage of that secretbox package over calling https://pkg.go.dev/crypto/cipher#NewGCM ?
“intellectual curiosity of your coworkers demands they base64-parse it.” This is crazy behavior. Creating your own pagination key, assuming it exists, and then putting that in production certainly proves “Hyrum’s law”.
I have a similar story to OP's. I had made a service that provided access to cryptographic keys but did not reveal the key material directly. Instead it had an RPC API for requesting a key "handle" for the key you wanted to use, and an API for performing operations like encrypt or sign that took that key handle, performed the operation inside the service and returned the result. The key handle was to be treated as opaque and implemented as a base64-encoded blob containing the key ID and a signature (for tamper-proofing).
One day a coworker working on another project that would use my service contacted me to complain that the keys from my service were malformed. Turned out they had noticed that the return value was base64-encoded so they assumed it was a base64-encoded key, so they wrote code to base64-decode it and load the result into their language's crypto library to perform those operations directly. They figured that the service's API for doing those operations was just there to be convenient for callers that didn't have access to a crypto library.
We could probably make a drinking club for teams that have been bitten by stuff like this. :)
I'll join :) For past war stories, because these days, I sign parameters that should not be tampered with ;)
I don't know that I agree that it's crazy. Any time I see a base64 encoded string, I decode it, because I want to know what's in there and what I'm working with. Don't use b64 if it's something you don't want me to see. Obfuscation isn't even the point of b64, because if it were, their strings would be less instantly recognizable.
The decoded b64 just being an offset integer is like high school level programming. Of course I'm going to send whatever offset I want and assume that's what the API author is allowing me to do. Especially if I'm in the shoes of a frontend engineer, and my Jira ticket says, "design a pagination UI element that allows the user to select a page of results." Now if that Jira ticket was impossible from the API, I'm going to go to my team and ask if the alternative (the "load more" button element) approach is acceptable or if we should butt heads with backend.
Decoding b64 isn't crazy, spending billions of dollars on a super computer to crack RSA encryption on a pagination token to discover that it's just an encrypted offset integer is crazy.
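To make it concrete, decoding such a token is a one-liner; the token value below is made up for illustration:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// base64 is an encoding, not encryption: one DecodeString reveals
	// the whole implied "contract".
	raw, err := base64.StdEncoding.DecodeString("b2Zmc2V0PTEwMA==")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // offset=100
}
```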
The author does make a point of giving an example of him perpetrating something equivalent wrt somebody else's API.
In theory, yes, it's kinda crazy behaviour. In practice I suspect most of us have done something (im)morally equivalent at least once.
Or give your users a proper streaming API for large or partial responses?