Invisible watermarking is just steganography. Once the exact method of embedding is known, it is always possible to corrupt an existing watermark. However, in some cases it may not be possible to tell whether a watermark is present at all, such as when the extraction procedure produces high-entropy output even from unwatermarked content.
Watermarking is not just steganography, and steganography is not just watermarking.
In June 1996, Ross Anderson organized the first workshop dedicated specifically to information hiding at Cambridge University. This event marked the beginning of a long series known as the Information Hiding Workshops, during which foundational terminology for the field was established. Information hiding, i.e., concealing a message within a host content, branches into two main applications: digital watermarking and steganography. In the case of watermarking, hiding means robustly embedding the message, permanently linking it to the content. In the case of steganography, hiding means concealing without leaving any statistically detectable traces.
References:
1. R. J. Anderson, editor. Proc. 1st Intl. Workshop on Inf. Hiding, volume 1174 of LNCS, 1996.
2. B. Pfitzmann: Information hiding terminology - Results of an informal plenary meeting and additional proposals. In Anderson [1], pages 347–350.
[x] is just [y] with more steps
Stenography is just security by more obscurity.
Specifically, shuffling compression, bit-rate, encryption, and barely human-perceivable signal around mediums (x-M) to obscure the entrophic/random state of any medium as to not break the generally-available plausible-deniability from a human-perception.
Can't break Shannon's law, but hides who intent of who is behind the knocks on the all doors. Obscures which house Shannon lives in, and whom who knocks wishes to communicate.
> Stenography is just security by more obscurity
Security-by-obscurity is when security hinges on keeping your algorithm itself (as opposed to some key) hidden from the adversary.
I don't see how it has any connection with what you're alluding to here.
the point here is to dissipate it across enough mediums as to be indiscernible from noisy background fluctuations regardless of existence, giving general-deniability to all mediums eventually, thru signal to noise ratio.
all security is just obscurity, eventually, where you are obscuring your private key's semi-prime's factors.
> all security is just obscurity, eventually, where you are obscuring your private key's semi-prime's factors.
This is a lazy take that obscures the definition to uselessness. It’s perpetuated by people who make insecure systems that break when the algorithm is known.
There is a vast gulf between:
- security depends on secret algorithm
- security depends on keeping a personal asymmetric key secret
The latter is trivial to change, it doesn’t compromise the security of others using the scheme, and if it has perfect forward secrecy it doesn’t even compromise past messages.
Please don’t repeat that mantra. You’re doing a disservice to anyone who reads it and ultimately yourself.
All security is obscurity. I think it's laughable that you believe you know what someone does just because they say this. Consider there's many levels of knowledge about a topic and sometimes when you get to a deeper level your conclusion or the labels you use for stuff "flip".
Understanding the differences that you outlined is so basic that a good commenter wouldn't assume they don't know the difference; they are making a deeper point.
No, anyone who knows more than a surface level understands the difference between these and doesn’t muddle them.
What you’re doing is the equivalent of saying there is no difference between a parachute and an airplane.
You should deepen your reading comprehension.
When a commenter doesn't know how to even spell the word "steganography", it's quite safe to assume that they don't possess deeper level knowledge and are not making any deeper point about it.
I assumed the numerous too-obvious errors were some form of code for information hiding.
et tú, too?
trivial grammar/spelling mistakes are worse than running analogies into the ground without hitting the "context" button, or even the reductio ad absurdium train HN has been on lately.
yes my latin half-Freudian trans-alliterations can be tempting to pick out, i had another tab with stylometry obfuscation described, incident, and mitigated.
also giigles spellcheck sucks ass, and im tired of being gaslit of my word choice/spelling by giigles, who should know every word by now, in all languages
>don't possess deeper level knowledge

umm besides error-correcting codes reducing the bitrate, compression, and random byte padding to fend off correlation/timing attacks, there is no where to hide data, outside of the shannon limit for information thru a medium.

but its easy to hide data you cannot perceive; and everyone being conscious of this feat/fingerprinting, even if barely, does more towards efficacy to deter leaking via second-order "chilling effect" than the aftermath; I.P theft is hard to un-approximate
also stenography, ironically still being the only "real" signature, is still security thru obscurity with more steps; your literal stenographic signature is unique, but not preventable from duplicity, so it is un-obscurable.
also i know rsa != ECC plz dont
A person that experiences correction and criticism as gaslighting has serious mental health issues. Talk to a therapist. Get help.
if googles' "Add to Dictionary" button worked more than their new 100+ languages i wouldn't felt gaslit by the same words having needed re-googled weekly
But you do have to admit that they know very many big important sounding words, go off on extremely dope tangents ("second order chilling effects!" Fuck yeah!) AND say "giigle" instead of Google, which is a.) super cool (obviously) but I suspect there's b.) a darker reason: they are probably a rogue cryptoanarchist being hunted down by The Algorithm and are only able to survive on the streets because of their every-day-carry RF blocking wallet and screwdriver combo and their ability to outsmart Google, because it hasn't learned all the words yet.
Good luck bro, continuing to obscure the entropic state of the x-M medium and remain plausibly deniable. Shannon in the (his?) house, mothafucka! Stenography FTW!
in the context of preventing leaks: if/when this nears ubiquity, the first ID'ing of leaks will obviously lead to the second effect of deterring further leaks.

>their ability to outsmart Google

it knows all the words: that is why i should not had had to had reminded it incessantly.

>continuing to obscure the entropic state of the x-M medium and remain plausibly deniable

lemme draw this out cuz you seem intimidated with simple abstractions.

imagine a three page power-point composed of Header, Text, companyLogo, no other data, aside from inclination from the plane.
under the plausibility presumption the header and text and company logo cannot be within ~15 interval degrees from the plane, you only have a state-space of so many combinations, which puts a hard limit (Shannon's) on the medium's maximum signal/noise ratio.
assuming people cannot collude to delineate between copies, they arent going to be able to perceive subtle shifts in the inclination/position/font/inclusion/exclusion of elements.
however more generally, this key-space needed for the LEAKER_ID wont be much larger (in magnitude) than the user pool of potential leakers, with a simple CRC for resiliency.
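A quick back-of-the-envelope sketch of that payload-size claim, in Python; the pool size and CRC width below are assumptions for illustration, not numbers from the thread:

    import math

    n_users = 5000                             # assumed pool of potential leakers
    id_bits = math.ceil(math.log2(n_users))    # ~13 bits uniquely name one user
    crc_bits = 8                               # a simple CRC-8 for resiliency, as suggested above
    print(id_bits + crc_bits)                  # ~21 bits of payload to hide per distributed copy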
Note that stenography is very different from steganography.
https://en.wikipedia.org/wiki/Stenography
https://en.wikipedia.org/wiki/Steganography
ga!
I was thinking that too. This seems like a useful tool for a secret communication protocol.
Link to the paper in the README is broken. I believe this is the correct link to the referenced paper: https://arxiv.org/abs/2411.07231
There is some nice information in the appendix, like:
“One training with a schedule similar to the one reported in the paper represents ≈ 30 GPU-days. We also roughly estimate that the total GPU-days used for running all our experiments to 5000, or ≈ 120k GPU-hours. This amounts to total emissions in the order of 20 tons of CO2eq.”
I am not in AI at all, so I have no clue how bad this is. But it’s nice to have some idea of the costs of such projects.
> This amounts to total emissions in the order of 20 tons of CO2eq.
That's about 33 economy class roundtrip flights from LAX to JFK.
https://www.icao.int/environmental-protection/Carbonoffset/P...
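Rough check of that figure; the per-seat number is an assumption in the ballpark of what the linked ICAO calculator reports for a LAX-JFK economy seat, not a value from the paper:

    total_tons = 20        # total emissions reported in the appendix, t CO2eq
    per_oneway = 0.3       # assumed t CO2 per passenger, one-way LAX-JFK economy
    print(total_tons / (2 * per_oneway))   # ~33 round trips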
33 seats on a flight maybe. It's about one passenger aircraft flight, one way.
And it has produced a system superior to several engineers working full time for several years.
Seems like a fair carbon trade.
Assuming you're purchasing from someone with infinite carbon credits and you're spending it in an environment with infinite ability to re-sink the carbon. Sure.
Are you applying that same rigor to every action people undertake daily?
To a greater or lesser degree depending on the action, I try to apply "that rigor" to myself, at least?
And yes, I think the world would be better off if more people considered how their decisions impact others, if that's what you're getting at, but it's unrealistic to expect everyone to care about other people - and of course entirely impossible to account for ALL variables.
But is it a trade? Feels additive. Assuming the same engineers will continue spending their carbon budget elsewhere ...
> Seems like a fair carbon trade.
How do you come up with a ratio that you consider a fair trade?
I'm really not sure how I'd personally set a metric to decide it. I could go with the stat that one barrel of oil is equivalent to 25,000 hours of human labor. That means each barrel is worth 12.5 years of labor at 40 hours per week. That seems outrageous though - off hand I don't know how many barrels would be used during the flight but it would have to be replacing way more than several engineers working for several years.
> That seems outrageous though
There's a good reason oil is so hard to give up. [6.1 GJ worth of crude oil](https://en.wikipedia.org/wiki/Barrel_of_oil_equivalent) costs about $70 USD.
Barrel of oil is currently $70 which is 10 person-hours at minimum wage.
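For what it's worth, both numbers quoted in this exchange check out arithmetically; the minimum wage used below is an assumed US federal figure:

    hours_per_barrel = 25_000            # energy-equivalence figure quoted above
    print(hours_per_barrel / 40 / 50)    # 12.5 years at 40 h/week, 50 weeks/year

    barrel_price, min_wage = 70, 7.25    # USD; $7.25/h is an assumption
    print(barrel_price / min_wage)       # ~9.7 hours of labor per barrel at that wage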
I guess you could get a number like that if you are comparing the energy output. But that is a weird way to do it, since we don't use people for energy.
Only if it's actually used.... hard to imagine this has much use to begin with
It’s very interesting this is GPU-time based because:
1. Different energy sources produce varying amounts of co2
2. This likely does not include co2 to make the GPUs or machines
3. The humans involved are not added to this at all, nor is all of the impact they have on the environment
4. No ability to predict future co2 from using this work.
Also, if it really matters, then why do it at all? If we’re saying hey, this is destroying the environment and we care, then maybe don’t do that work?
> 1. Different energy sources produce varying amounts of co2
Yes.
> 2. This likely does not include co2 to make the GPUs or machines
Definitely not, nobody does that.
Wish they did, in general I feel like a lot of beliefs around sustainability and environmentalism are wrong or backwards precisely because embodied energy is discounted; see e.g. stats on western nations getting cleaner, where a large - if not primary - driver of improved stats is just outsourcing manufacturing, so emissions are attributed to someone else.
Anyway, embodied energy isn't particularly useful here. Energy embodied in GPUs and machines amortizes over their lifetimes and should be counted against all the things those GPUs did, do and will do, of which the training in question is just a small part. Not including it isolates the analysis to contributions from the specific task per se, and makes the results applicable to different hardware/scenarios.
> 3. The humans involved are not added to this at all, nor is all of the impact they have on the environment
This metric is so ill-defined as to be arbitrary. Even more so in conjunction with 2, as you could plausibly include a million people in it.
> 4. No ability to predict future co2 from using this work.
Total, no. Contribution of compute alone given similar GPU-hours per ton of CO2eq, yes.
>Definitely not, nobody does that.
Except every proper Life-cycle assessment on carbon emissions ever.
not sure how that invalidates Algernon's point. These things should be considered, and are in a lot of LCAs.
Just define "proper" to mean "it is an analysis that considers the whole supply chain and would pass academic peer review".
Those would count toward “Scope 3” emissions, right?
https://www.mckinsey.com/featured-insights/mckinsey-explaine...
1. Yes, this is the default co2 eq/watt from the tool that is cited in the paper, but it's actually very hard to know the source of the energy that powers the cluster, so the numbers are only an order of magnitude rather than "real" numbers.
2. & 4. I found that https://huggingface.co/blog/sasha/ai-environment-primer gives a good broad overview (not only of the co2 eq, which is limited imo) of AI environmental impact.
> Also, if it really matters, then why do it at all? If we’re saying hey, this is destroying the environment and we care, then maybe don’t do that work?
Although it may not be the best way to quantify it, it gives a good overview. I would argue that it matters a lot to quantify and popularize the idea of such sections in any experimental ML paper (and it should in my opinion be the default, as it is now for the reproducibility statement and the ethics statement).
People don't really know what an AI experiment represents. It may seem very abstract since everything happens in the "cloud", but it is pretty much physical: the clusters, the water consumption, the energy. And as someone who works in AI, I believe it's important to know what this represents, which these kinds of sections show clearly. It was the same in the DINOv2 paper and in the Llama paper.
But let’s say you were able to see it all somehow. Your lab was also the data center, powerplant, etc. You see the fans spinning, the turbines moving, and exhaust coming out. Do you change what you do? Or do you look around, see all the others doing the same and just say welp this is the tragedy of the commons.
I think it’s clear that people generally want to move to clean energy, and use less energy as a whole. That’s a gradual path. Maybe this reinforces the thinking, but ultimately you’re still causing damage. If you really truly cared about the damage, why would you do it at all?
I’m not a big fan of lip service. Just like all these land acknowledgements. Is a criminal more “ethical” if they say “I know I’m stealing from you” as they mug you? If you cared, give back your land and move elsewhere!
Yes, I agree... But personally I do wonder what is best between (1) leaving, with no impact on the rest of the herd, or (2) trying to be careful about what you do, raising awareness, and trying to move the herd in the right direction. I would personally go for (2), since the scale of these papers is usually still o(LLM training).
So say I have a site with 3000 images, 2 Mpixel each. How many GPU-months would it take to mark them? And how many gigabytes would I have to keep for the model?
That amount of compute was used for training. For inference (applying the watermarks), hopefully no more than a few seconds per image.
Llama 3 70B took 6.4M GPU hours to train, emitting 1900 tons of CO2 equivalent.
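For scale, a quick comparison of the two sets of numbers quoted in this thread (nothing beyond the figures already cited):

    paper_gpu_hours, paper_tons = 120_000, 20       # Watermark Anything, all experiments
    llama_gpu_hours, llama_tons = 6_400_000, 1900   # Llama 3 70B, as quoted above
    print(paper_gpu_hours / llama_gpu_hours)        # ~0.019, roughly 2% of the GPU time
    print(paper_tons / llama_tons)                  # ~0.011, roughly 1% of the emissions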
Thanks! I was not at all aware of the scale of training! To me those are crazy amounts of gpu time and resources.
The amounts of gpu time in the paper are for all experiments, not just training the last model that is OSS (which is usually reported). People don't just oneshot the final model.
The embedder is only 1.1M parameters, so it should run extremely fast.
Yes, although the number of parameters is not directly linked to the flops/speed of inference. What's nice about this AE architecture is that most of the compute (message embedding and merging) is done at low resolution, the same idea as behind latent diffusion models.
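A back-of-the-envelope answer to the 3000-image question above, using only the "1.1M parameters" and "a few seconds per image" statements from this thread; the per-image time and float precision are assumptions, and only the embedder is counted:

    params = 1.1e6
    print(params * 4 / 1e6)    # ~4.4 MB of embedder weights in float32
    print(params * 2 / 1e6)    # ~2.2 MB in half precision

    n_images = 3000
    secs_per_image = 2         # assumed, per the "a few seconds per image" reply
    print(n_images * secs_per_image / 3600)   # ~1.7 GPU-hours, nowhere near GPU-months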
I wonder what will come of all the creative technologists out there, trying to raise money to do "Watermarking" or "Human Authenticity Badge," when Meta will just do all the hard parts for free: both the technology of robust watermarking, and building an insurmountable social media network that can adopt it unilaterally.
How do you think they trained their image AI? Instagram.
How was copilot trained? Github.
Zoom, others would love to use your data to train their AI. It’s their proprietary advantage!
It is called DRM codecs, and that has been around for 30+ years.
We did consider a similar FOSS project, but didn't like the idea of helping professional thieves abusing dmca rules.
Have a nice day. =3
Is this a big deal? I'm a layman here so this seems like a needed product but I have a feeling I'm missing something.
Various previous attempts at invisible/imperceptible/mostly imperceptible watermarking have been trivially defeated, this attempt claims to be more robust to various kinds of edits. (From the paper: various geometric edits like rotations or crops, various valuemetric edits like blurs or brightness changes, and various splicing edits like cutting parts of the image into a new one or inpainting.) Invisible watermarking is useful for tracing origins of content. That might be copyright information, or AI service information, or photoshop information, or unique ID information to trace leakers of video game demos / films, or (until the local hardware key is extracted) a form of proof that an image came from a particular camera...
... Ideal for a repressive government or just a mildly corrupt government agency / corporate body to use to identify defectors, leakers, whistleblowers, or other dissidents. (Digital image sensors effectively already mark their output due to randomness of semiconductor manufacturing, and that has already been used by abovementioned actors for the abovementioned purposes. But that at least is difficult.) Tell me with a straight face that a culture that produced Chat Control or attempted to track forwarding chains of chat messages[1] won’t mandate device-unique watermarks kept on file by the communications regulator. And those are the more liberal governments by today’s standards.
I’m surprised how eager people are to build this kind of tech. It was quite a scandal (if ultimately a fruitless one) when it came out colour printers marked their output with unique identifiers; and now that generative AI is a thing stuff like TFA is seen as virtuous somehow. Can we maybe not forget about humans?..
[1] I don’t remember where I read about the latter or which country it was about—maybe India?
> ... for a repressive government ...
Why shouldn't a virtuous and transparent government (should one materialize somehow, somewhere) be interested in identifying leakers?
That’s like asking why a fair and just executive shouldn’t be interested in eliminating the overhead of an independent judiciary. Synchronically, it should. Diachronically, that’s one of the things that ensures that it remains fair and just. Similarly for transparency and leakers, though we usually call those leakers “sources speaking on condition of anonymity” or some such. (It does mean that the continued transparency of a modern democratic government depends on people’s continual perpetration of—for the most part—mildly illegal acts. Make of that what you will.)
Both can be true! This is essentially the "making it easier to do [x]" argument, which itself is essentially security through obscurity.
It was always possible to watermark everything: any nearly-imperceptible bit can be used to encode data that can be used overtly.
Now enabling everyone everywhere to do it and integrate it may have second-order effects that are the opposite of one's intention.
It is a very convenient thing, for no one to trust what they can see. Unless it was Validated (D) by the Gubmint (R), it is inscrutable and unfalsifiable.
If they are transparent, what is leaking?
There is always a need for _some_ secrets to be kept. At the very least from external adversaries.
> Why shouldn't a virtuous and transparent government
That doesn't exist.
The parent comment says that it has dangerous use-cases, not that it does not have desirable ones.
I stopped myself from making the printer analogy, but of course it's relevant, as is the fact that few seem to care. I personally hope some group strikes back to sanitize images watermarked this way, with no more difficulty than removing exif data.
In my previous experience, "resize & rotate" always defeats all kinds of watermarks. For example, crop a 1000x1000 image to 999x999 and rotate it by 1°.
There's also the "double watermark" attack: just run the resulting image through the watermarking process again; usually the original watermark is lost.
Yeah, so it's impressive if this repo does what it claims and is robust to such manipulations.
I tried to run it but of course it failed with

    NVIDIA GeForce RTX 4090 with CUDA capability sm_89 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 4090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

I was curious, but not curious enough to deal with this crap even if it's rather simple. God I hate everything about the modern ML ecosystem with python, pip, conda, cuda, pytorch, tensorflow (more rare now), notebooks, just-run-it-in-the-cloud...

Use the google colab link https://colab.research.google.com/github/facebookresearch/wa... Everything is installed directly in the colab.
My assumption is that this will be used to watermark images coming out of cloud-based generative AI.
And they'll say it's to combat disinformation, but it'll actually be to help themselves filter AI generated content out of new AI training datasets so their models don't get Habsburg'd.
> their models don't get Habsburg'd.
You mean develop a magnificent jawline, or continue to influence Austrian politics?
I was reading an article lately about how a lot of that was really just immensely dumb luck on the inbreeding front - that is, they ended up picking exactly the worst sort of pairings.
How do they still influence Austrian politics? Do you have any links or sources? I'm genuinely curious!
I wondered why they'd be doing this NOW and this makes perfect sense!!
>so their models don't get Habsburg'd.
Nice metaphor
Why? Those are not copyrightable.
Because downstream consumers of the media might want to know if an image has been created or manipulated by AI tools.
hardware self-evident modules to essentially sign/color all digital files
now we need more noise
This still leaves out non-cooperating image generators, and the real bad guys (organised disinformation groups) will use them.
They would not want to train their next model on the output of the previous one...
Who says?
This is one of the primary communication methods of overseas agents in the CIA; interesting to have it be used more broadly </joke>
Do you have a source? I'd be interested in reading more about this.
It's a form of Steganography https://en.wikipedia.org/wiki/Steganography
Was referring specifically to the claim about the CIA, I'm aware of steganography.
of course not, check their username.
What if the watermark becomes a latent variable that's indirectly learnt by a subsequent model trained on its generated data? They will have to constantly vary the mark to keep it up to date. Are we going to see Merkle tree watermark database like we see for certificate transparency? YC, here's your new startup idea.
I can imagine some kind of public/private key encrypted watermark system to ensure the veracity / provenance of media created via LLMs and their associated user accounts.
There's many reasons why people are concerned about AI's training data becoming AI generated. The usual one is that the training will diverge, but this is another good one.
I think there should be an input filter that, if it sees a watermark, refuses to use that input and continues with the next one.
Camera makers are all working on adding cryptographic signatures to captured images to prove their provenance. The current standard embeds this in metadata, but if they start watermarking the images themselves then skipping watermarked images during training would quickly become an issue
Does this watermark still work if someone screenshots an image?
Yes. The data is embedded in the pixels of the image, and it's embedded in a way that survives recompression of the image and some editing.
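For intuition, this is what "survives recompression" means in practice. A minimal sketch: `embed`/`extract` are hypothetical stand-ins for the repo's embedder and extractor, not its actual API:

    import io
    from PIL import Image

    def roundtrip_jpeg(img, quality=80):
        """Simulate a lossy re-save (screenshot tool, messenger recompression, etc.)."""
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return Image.open(buf)

    # marked = embed(Image.open("photo.png"), message_bits)   # hypothetical embedder call
    # recovered = extract(roundtrip_jpeg(marked))             # hypothetical extractor call
    # A robust watermark should give recovered == message_bits despite the JPEG round trip.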
Does it still work when I take a photo of the screen with a camera?
Maybe. It depends how strong the watermark was, and how good the photo was at reproducing the image.
please tell us if it worked!
try running the code to find out
Now we need a link to the "Unwatermark Anything" repo
https://github.com/XuandongZhao/WatermarkAttacker
These can be easily jailbroken by quantizing the weights lower.
I think the intent is for deploying this between APIs and models.
oh I was kind of hoping this would also watermark text imperceptibly... alas this doesn't do that
Watermarking text seems impossible. You can ask the LLM to add an exclamation mark after every word and then remove all the exclamation marks.
If every medium becomes as editable as text, I don't see why it should be possible to watermark images or video any more easily than text.
Images have the aliasing problem, which is NP-hard, but aliasing comes out close to 100% correct after editing: edit an image just by cutting shapes, then throw it into an image generator to create a new one with 99% similarity (in Stable Diffusion XL it needs 70% similarity or something like that). The new image will be very similar to the old one, with correct aliasing, but edited as much as you like.
text watermarking works pretty well! https://arxiv.org/abs/2301.10226
When you generate text with an LLM, you always have some choice. So you can sample in a way that is very likely under your watermark scheme and unlikely otherwise.
But that's what I'm saying: when you ask for exclamation marks after each word, that must change the likelihoods of the next token by quite a bit. You then remove the marks, which hides the fact that you just changed every word without loss of meaning.
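A toy sketch of that sampling-based idea, in the spirit of the green-list scheme from the paper linked above (arXiv:2301.10226); the hashing and word-level "tokens" here are illustrative assumptions, not the paper's implementation:

    import hashlib
    import math

    def is_green(prev_token: str, token: str, fraction: float = 0.5) -> bool:
        """Pseudorandomly mark a token 'green', seeded by the previous token."""
        h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
        return h[0] / 256 < fraction

    def watermark_z_score(tokens: list[str], fraction: float = 0.5) -> float:
        """High z-score = far more green tokens than chance, i.e. likely watermarked."""
        n = len(tokens) - 1
        hits = sum(is_green(p, t, fraction) for p, t in zip(tokens, tokens[1:]))
        return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))

    # During generation the sampler nudges probability toward green tokens;
    # detection then needs only the text, not the model, e.g.:
    # print(watermark_z_score("the cat sat on the mat today".split()))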
I have just positive feelings about Facebook recently: big power, open mindset.