I find it an odd choice. I mean, the CPU itself is perfectly fine (typing this myself on a 5600G, which I very much like), but the AM4 socket is pretty much over: there is no upgrade path anymore once it starts getting long in the tooth. (Unlike the other parts, which can be bumped: RAM, GPU, storage...)
As an ML person who's also worked on HPC stuff: you will almost certainly save money by doing this, and there are plenty of benefits. It is generally a good idea, but there is a higher barrier to entry and you need in-house expertise.
So, an important piece of advice: if you can, hire an admin with HPC experience. If you can't, find ML people with HPC experience. Things you can ask about: Slurm, environment modules (a clear signal!), what a burst buffer is, ZFS, what they know about PyTorch DDP, their Linux experience, whether they've built a cluster before, adminning Linux boxes, and so on. If you need a test, ask them to write a simple bash script for some task, and check whether they use functions and whether they know how to do variable defaults. These people won't know everything, but they'll be able to pick up the slack and probably enjoy it. As long as you have more than one: adminning is a shitty job, so if you only have one they'll hate their life.
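A minimal sketch of the kind of screening script described above. The task itself (counting files by extension) is invented; what you're really checking is whether the candidate reaches for functions and `${VAR:-default}` expansions:

```shell
#!/usr/bin/env bash
# Hypothetical screening task: count files of a given extension under a directory.
# Signals to look for: functions, and defaults via ${1:-fallback}.
set -euo pipefail

# Variable defaults: use the arguments if given, else fall back.
TARGET_DIR="${1:-.}"
EXTENSION="${2:-log}"

count_files() {
    # $1 = directory, $2 = extension (without the dot)
    find "$1" -type f -name "*.${2}" | wc -l
}

main() {
    local n
    n="$(count_files "$TARGET_DIR" "$EXTENSION")"
    echo "Found ${n} .${EXTENSION} files under ${TARGET_DIR}"
}

main
```

Someone comfortable with bash will produce something in this shape without prompting; someone who isn't will usually write one long unquoted pipeline.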
There are plenty of ML people who have this experience[0], and you'll really reap rewards from having a few people with even a bit of this knowledge. Without it, it is easy to buy the wrong things, or to have your system run far from efficiently and end up with frustrated engineers/researchers. Even with only a handful of people running experiments, a scheduler (like Slurm) still has huge benefits: you can do more complicated sweeps than wandb offers, batch-submit jobs, track usage, allocate usage, easily cut up your nodes (or even a single machine) into {dev,prod,train,etc} spaces, and much more. Most importantly, a scheduler will help prevent your admin from quitting, as it'll keep them out of a spiral of frustration.
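As a sketch of what a scheduler buys you: the snippet below writes a Slurm job-array script for a small hyperparameter sweep. The `#SBATCH` directives are standard Slurm; the `train` partition name, the learning rates, and `train.py` are hypothetical placeholders:

```shell
# Write a hypothetical Slurm job-array script for a 4-way learning-rate sweep.
cat > train_sweep.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=lr-sweep
#SBATCH --partition=train        # e.g. a "train" partition carved out of your nodes
#SBATCH --gres=gpu:1             # one GPU per array task
#SBATCH --array=0-3              # one task per hyperparameter setting

LRS=(0.1 0.01 0.001 0.0001)
# Each array task picks its own learning rate; train.py is a placeholder.
echo "would run: python train.py --lr ${LRS[$SLURM_ARRAY_TASK_ID]}"
EOF
# Submitted with: sbatch train_sweep.sbatch
```

The same mechanism handles batch submission, fair-share accounting, and partitioning a single box into dev/prod/train queues.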
[0] At least in my experience these tend to be higher-quality ML people too, though not always. I think we can infer why there would be a correlation.
Nice ideas, but we have chosen a really simple Kubernetes deployment: we only install the host OS (Ubuntu Server) and then join the self-hosted GPUs as workers in a Kubernetes cluster.
Nothing else is needed, and our Grafana monitors whether the server (and its containers) are up and running.
Curious to know what you use other than Grafana in your monitoring stack. We use Prometheus for metrics/alerts and Loki/Promtail for logs.
Sorry, my suggestion was for if you need to do training. If you're only serving, then the suggestions I made aren't as valuable, and something like what you've done probably makes more sense. But you want a proper cluster setup to do multi-GPU and especially multi-node work.
> We however found that our co-working space - WeWork has an excellent server hosting solution. We could put the servers on the same floor as our office and they would provide redundant power supply, cooling and internet connection. This entire package is available at a much cheaper rate and we immediately jumped on this. Right now all servers are securely running in our office.
Nice! How much does this cost?
$40 per server per month. Includes bandwidth, cooling and internet.
What connection deal do you get for that? Does it have uninterruptible power supply?
Yes, UPS + 2 independent power lines. Our server has a dual power input PSU. If either power fails, the other one keeps it running.
Thanks for your answer. That sounds good on power.
My first "connection" question was about Internet connection though.
Can you get e.g. 10 Gbit/s, and is there a traffic limit?
How about compliance concerns? SOC 2 and some of those audits are going to want to see super sophisticated datacenter ops/security/etc.
Sounds like a bargain.
Any idea how long your servers will last until you need to upgrade to the latest GPU/HW?
3 years is what we planned. Any extra time we get will be a plus.
I think the benefit of cloud is generally either that your demands are very elastic, or that you are essentially a fractional user, for whom a single server or GPU would be overkill.
Once you have heavy and/or unconventional compute needs, it's likely cheaper to self-host or colo purchased hardware.
This does not make sense to me.
They are processing 2.5 billion images and videos in a single day. They decided to self-host their GPUs.
The solution uses off-the-shelf hardware, with one GPU per "server", all added together into a single rack? And that is the GPU compute needed to process all those videos 24/7?
Then they have this rack in the office, but they can't find a place to put it. That might be a decent thing to figure out before the build: where do we put it?
But no. Planning for multiple network links, redundant power, cooling, security, monitoring, backup generators, backups, fire suppression, and failover to a different region if something fails was not necessary.
Because Google book?
But our (insert ad here) WeWork let us put our servers in a room on the same floor (their data-center-ish capabilities seem limited).
There are so many additional costs that are not factored into the article.
I am sure once they accrue serious downtime a few times, and irate customers, paying for hosting in a proper data center might start making sense.
Now I am basing this comment on the assumption that the company is providing continuous real-time operations for their clients.
If it is more batch-operated, where downtime is fine as long as results are delivered within, say, 12 hours, then that matters less.
These servers are indeed job-processing servers. They are critical, but not milliseconds-critical. Cooling, security, monitoring, backup generators, and backup of data are all taken care of.
How did you expose the servers to the internet, if at all?
I'd personally have these on Tailscale, not exposed to the internet, but at some point in self-hosting, clients have to be able to talk to something.
I know Tailscale has their endpoints, but I can't expect that to serve a production API at scale.
Tailscale :) We, fortunately, don't need these exposed to the internet so Tailscale works beautifully.
"tailscale funnel"
No ACLs in front. I don’t know how much that could be done, but at least IP filtering.
this gives me an idea...
It would be nice if you could add numbers: what the cost would be at your cloud provider, what total investment was made, how much you are saving, and which other options you had in mind and why they were discarded. Still, it was a nice post to read.
Probably, yes, but we just built it to be cheaper. AM5 is on the costlier side, and we don't plan to upgrade these machines. Our calculation is that we can retire them by the end of 3 years.
As soon as the AM4 vs AM5 conversation started, I immediately thought "price". The cost of swapping a mobo to AM5 in a few years is minimal compared to bulk price savings you can get using "good enough" AM4 now.
I'm seeing AM4 boards and CPUs at easily half the price of AM5 gear in the consumer sector. I imagine it's similar in the professional sector.
I'm still building more AM4 machines than AM5 for clients, FWIW, even for folks that want relatively performant desktops. The price/performance just isn't better enough yet to do otherwise for most.
How do you feel about their GPU selection? I understand the 2U rack limits their choices, but what would you recommend as a good GPU that strikes a balance between performance and price?
> Our calculation is we can retire them by the end of 3 years.
I was going to say: a business just needs servers to last 3 years; they are normally written off after 3 years and you don't do upgrade plans. Currently we're aiming for 5 years but budgeting for 3; that way anything beyond the 3 years is basically free. No one plans to purchase upgrade parts for their old Dell servers either.
You can also move some of these machines into other roles like QA later on.
I'm using the same processors (and the 5600), and they're the fastest on a single thread I've ever seen!
Typically components are never upgraded in a server: you spec it, buy it, write it off in 3-5 years, then throw it away.
We also do this, and you'd need to add a couple more zeros to the cost. For administration it paid off that I'm a trained architect, because all the work is in cooling the room: lots of temperature shielding, airflow and water flow, monitors, ...
Shouldn't they be named VPUs (vector processing units), as they no longer produce graphics?
https://www.stardog.com/blog/skathe-is-a-private-gpu-cloud/
RTX 4000 Ada? That's a very underpowered card: https://github.com/mag-/gpu_benchmark
Do you mind sharing the details of the rack mount you use?
Tangential to the post:
Was going to toss an application your way since it sounds like interesting work, but it looks like the Google Form on your Careers page was deleted.
aditya [at] Gumlet.com
How many GPU servers are we talking about here exactly?
We bought 21. This is just a start.
Does each machine have just 1 gpu per rig or is it multiple? Do you network them to do DDP or is it just 1 inference job per card?
Why not run something like 8x L40s for $4,750 a month from a bare-metal provider like latitude.sh? This seems far more cost-efficient and flexible.
I think you're reading that page wrong, but their pricing page is so confusing that it's giving me red flags already.
It says that would cost $6.51/hr and $4,752/yr. I think you pay both of those things: the first number is the hourly cost, and the second is the annual commitment. So it's about $57,028/year if you're running 24x7, plus $4,752, or roughly $61,780/year total.
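A quick sanity check of the 24x7 reading, using the $6.51/hr and $4,752 figures from the pricing discussion (the interpretation that you pay both is, per the thread, an assumption):

```shell
# Annual cost under the "hourly price plus annual commitment" reading.
awk 'BEGIN {
    hourly = 6.51 * 24 * 365   # $6.51/hr, around the clock for a year
    commit = 4752              # annual commitment under that reading
    printf "hourly total: $%.2f/yr\n", hourly          # 57027.60
    printf "grand total:  $%.2f/yr\n", hourly + commit # 61779.60
}'
```
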
I am sorry you are finding our pricing page confusing. We had a recent update to that page, and we had a glitch in the prices for the GPUs. Anyway, the correct price for an 8x L40S is $4,752/mo when paid a year upfront.
So, when the page now says "$6.51/hr // $4,752/mo", that's really just presenting the same actual cost you'd have to pay, across two different time metrics? As in, you pay $6.51/hr, or you pay $4,752/mo, same thing, but not both?
I think you need to consider: Your competitors (e.g. AWS) generally structure annual commitments as an upfront (or monthly) payment + a reduced cost per hour/minute for the resource being reserved; that's the lens through which most people viewing this page will be thinking. If that is not how you structure annual commitments, then that should be made very clear.
If my first paragraph is correct (and again, the page is still confusing; it's not obvious to me that this interpretation is correct): you should list one price, and give a dropdown at the top to change the computation of that price across whatever timeframe the user wants ($/hr, $/day, $/month, etc.). That would also free up some space inline to put a chip that says something like "-15% discount!".
I think a single L40 costs $1320 a month on latitude. L40 is also an older GPU.
Latitude.sh provides the L40S, not L40
The on-demand cost is $9.30/hr, or $4,752/mo when paid a year upfront.
It's Hopper vs Grace Hopper, and AFAIK it doesn't require custom boards, but I haven't really looked into it too much as both are far outside my personal price range.
> "Dedicated GPU clusters for accelerated computing"
- so you have to add the price of AMD/Intel bare metal servers.
- the price of "Networking" PER TB
- and the "Additional services pricing"
https://www.latitude.sh/pricing
That's all included, you don't have to add the bare metal servers, networking or anything else.
thanks, the webpage is not clear ...
"Bandwidth Pricing is based on the country your server is located. Packages are billed monthly and sold in increments of 10 TB."
It looks like their on-demand price would cost $81,468 per year
So even at the reserved price for a year (365 * 24 * $6.51) you're nowhere near $4,750 per year; it's closer to $57k.
At my last job we did the same thing but for AI training hardware. It was definitely the right call cost-wise, with our little cluster breaking even after 8 months. We found a cheap data center in Texas.
Would you mind sharing the name of the data center?
Hetzner has RTX 4000 for 185€ per month. Is your solution cheaper?
Seems like it should break even around a year in
Interestingly, we tried that RTX 4000 before we decided to buy our own. Yes, ours will break even in 14 months. ($2,300 cost + $40 per month datacenter cost)
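The break-even arithmetic roughly checks out against Hetzner's 185€/mo, assuming that converts to around $200/mo (the exchange rate is an assumption; the $2,300 and $40/mo figures are from the comment above):

```shell
# Break-even vs renting: $2300 up front plus $40/mo colo, vs ~$200/mo rented.
awk 'BEGIN {
    upfront = 2300   # hardware cost, USD
    colo    = 40     # datacenter cost per month, USD
    rent    = 200    # assumed USD equivalent of 185 EUR/mo
    printf "break-even after %.1f months\n", upfront / (rent - colo)
}'
```

At $160/mo in savings, that lands a little over 14 months, matching the stated figure.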
I skimmed to the part about "We host it in our WeWork office" and thought WTF?
We know, but it's actually pretty good.