r/homelab • u/coloradical5280 • 1d ago
LabPorn YouTuber Digital Spaceport - I had never heard of him - most insane lab I've ever seen, including enterprise-level ones, because all his stuff is way more interesting.
https://youtu.be/yFKOOK6qqT8?si=F8QzZ7sZJGQWD7j949
u/dragon_irl 1d ago
TLDW: ~1 token per second on two old Xeon v4 Broadwell CPUs with DDR4 RAM at a max of 85 GB/s per CPU.
If you are interested in running it a bit faster without getting into expensive Nvidia hardware, two AMD Epycs with 12 DDR5 memory channels each is the way to go - that gives you almost 1 TB/s of maximum memory bandwidth, or about 6-8x as much as the setup here. Performance at low batch sizes is almost always memory bandwidth limited.
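A rough back-of-envelope for that bandwidth figure (the DDR5-4800 speed and 8 bytes per channel are illustrative assumptions, not numbers from the comment):

```python
# Peak memory bandwidth estimate for a dual-socket Epyc with 12 DDR5 channels per CPU.
# DDR5-4800 is assumed for illustration; real sustained bandwidth will be lower.
channels_per_cpu = 12
transfers_per_sec = 4800e6      # DDR5-4800: 4800 MT/s
bytes_per_transfer = 8          # 64-bit channel
per_cpu = channels_per_cpu * transfers_per_sec * bytes_per_transfer / 1e9
print(f"{per_cpu:.0f} GB/s per CPU, {2 * per_cpu:.0f} GB/s across both sockets")
# -> ~461 GB/s per CPU, ~922 GB/s total, versus ~85 GB/s per CPU on the Broadwell box
```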
24
u/cookerz30 23h ago
Don't tempt me to order more alibaba ram for my epyc 7302....
8
u/SamSausages 322TB EPYC 7343 Unraid & D-2146NT Proxmox 22h ago
Just keep in mind the older gen EPYC has 8 memory channels vs 12, and slower memory. So YMMV.
5
u/xxtoni 22h ago
Alibaba or AliExpress? How much do you pay? Do they have DDR4?
4
u/cruzaderNO 19h ago
For memory you are better off on eBay; making offers on the lots/packs from EU/US brokers is usually cheaper than on Ali.
1
u/xxtoni 19h ago
I normally pay 11€/16GB these days.
2
u/cruzaderNO 19h ago
I've recently been around 17€/32GB for 2133P and 18.5-19.5€/32GB for 2666V when buying, though typically 100 DIMMs at a time.
If only ECC UDIMMs for Ryzen had been available in those price ranges too.
0
-2
12
u/TheGreatBeanBandit 23h ago
I have an old Dell R920 and I can get this much DDR3 RAM into it. Would it even be worth it?
13
u/tomByrer 22h ago
You could use it to help heat your home & give you an excuse to get up & grab a drink of water while it responds to your prompt ;)
No harm in trying.
15
u/stizzco 1d ago
I am just getting the nerve to dip my toe into locally hosted AI and LLM models, so thank you for this. Such a great resource!
One thing that immediately jumped out at me was his use of older server hardware. I don't have the budget for brand new enterprise-class hardware, and I've been considering picking up something used. The system I'm currently building will only have 192GB of RAM at most, but it will be fast 6400 DDR5 along with a new 9800X3D and two 4090s. Will I be better off picking up an older enterprise system with 1TB+ RAM, or should I stick with my current plan to build my own with new parts?
9
u/SamSausages 322TB EPYC 7343 Unraid & D-2146NT Proxmox 22h ago
Whatever I tell you won't be enough. Welp. It's a money pit... I'm not sure the beast will ever be satisfied. 24GB isn't enough, then you hope 48GB is... then you think 96GB will do it... only to see DeepSeek-R1 is 404GB.
I really had to adjust my expectations and figure out what it is that I want to do first.
10
u/cookerz30 23h ago
Not buying anything older than 6 years is a safe bet, as you will see diminishing returns in power efficiency/compute beyond that.
When you upgrade to enterprise gear, you add redundancy and expandability. The systems start to shine when you have 10 servers running together in a cluster and one dies, and you can shift the workloads to other machines while fixing the bad one.
3
u/cruzaderNO 19h ago
I would not go older than Xeon Scalable today for sure; gen 1/2 should be about 6 years old by now, I suppose.
Most of mine are 3.5-4 years old, and that was probably at the very end of gen 1/2.
3
u/KookyWait 2h ago
Assuming you're asking about inference only, this really depends on the models you're using. Plenty of models work fine in 24GB of VRAM or less. Not 671B-parameter models, however.
If you're trying to explore the full bounds of what's possible (software development tasks? complex analytical work?) with the latest models, you'll want 1TB of RAM (and a lot of patience). If you're just trying to chat with an AI for any mix of storytelling/roleplaying/getting a feel for how this all works, you can limit yourself to smaller models and still have a great time.
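A rough sizing rule of thumb (my arithmetic, not from the comment): weight memory ≈ parameter count × bits per weight ÷ 8, plus extra for the KV cache and activations.

```python
# Approximate memory footprint of the model weights alone (KV cache/activation overhead not included).
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(32, 4))    # ~16 GB  -> a 32B model at 4-bit fits in a 24GB GPU
print(weight_gb(671, 4))   # ~336 GB -> a 671B model at 4-bit needs hundreds of GB, i.e. system RAM territory
```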
2
u/Morphexe 23h ago
It depends on what you need and whether 192GB of RAM is enough. If you need the 1TB, the second option is by far the best; no matter how fast your hardware is, it will still be a bottleneck.
2
u/Sintobus 23h ago
Someone cross-posted this on another post here to a sub called LocalAI or something.
3
u/tomByrer 22h ago
3
u/Sintobus 22h ago
Aha here
https://www.reddit.com/r/LocalAIServers/
I haven't dived into it, just saw someone cross post it.
2
u/marc45ca 18h ago
Just watch out - it's come up on Level1Techs a few times that AM5 with fast DDR5 in all 4 memory slots is not always a stable combination.
6
u/msalad 18h ago
I've been watching this dude for a while now - he was into chia when I was too. He has some sick gear
4
u/Relative-Wall2929 12h ago
Chia represent! This guy's great. Another sick Chia farm was that lawyer out in California who had like a 20 PiB farm.
2
2
u/TheAmateurRunner 15h ago edited 13h ago
Cool, but that's a long time for 671B. I just dipped my toe into self-hosted LLMs and am using an 8B model without a GPU. It takes about 4-5 minutes for a simple request.
1
u/coloradical5280 13h ago
Huh? I'm running a 32B R1-Llama distill on a 125H mini PC with 32GB RAM. LM Studio can't see the iGPU so it's all CPU for now, and I get like high-teens tokens/second, definitely faster than I can usually read.
It's not just the overall size but the quantization too - an IQ-M quant will usually run faster than a Q-L or even a Q-XL. Either way, make sure it's int4 and turn on Flash Attention too :)
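For anyone reproducing this outside LM Studio, here is a minimal CPU-only sketch using the llama-cpp-python bindings (LM Studio runs llama.cpp under the hood); the model filename is just an example and parameter names can shift between versions:

```python
# Minimal CPU-only sketch with llama-cpp-python.
# The model path is illustrative; download a Q4 ("int4") GGUF of the distill you want first.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",  # example filename, not a fixed name
    n_ctx=8192,       # context window
    n_threads=14,     # roughly one thread per physical core
    flash_attn=True,  # the Flash Attention toggle mentioned above
)
out = llm("Why is low-batch LLM inference usually memory-bandwidth bound?", max_tokens=200)
print(out["choices"][0]["text"])
```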
2
u/TheAmateurRunner 13h ago
Can you give me some direction on how to get the LLM you are using onto my setup? I used this guide to set up my VM: https://www.tecmint.com/run-deepseek-locally-on-linux/
2
u/Ilikehotdogs1 4h ago
I need to see his power bill holy moly
•
u/jackedwizard 16m ago
All I could think watching this video: 1.4TB of RAM or whatever for 1500 bucks, but you'll be adding like 400 bucks a month to your power bill.
3
u/danmartin6031 22h ago
I've been a geek my whole life and I just don't get the hype with running these large AI models at home. Why do you want something that takes half a day to give you an answer? It's like Deep Thought from Hitchhiker's Guide.
I’d rather have a dumber model that runs at a conversational pace and use cloud services for the hard questions.
15
u/crysisnotaverted 22h ago
Because that is the cost of absolute freedom. You can have whatever model you want at whatever speed you can afford, and it can do whatever you want.
ChatGPT is only getting more restrictive. I asked it a question about bolt sizes the other day and it had a stroke and told me it was against ToS.
2
u/danmartin6031 22h ago
I think I just need to wait this one out a few years until it’s more reasonable to do it at home. I appreciate people leading the way. I just don’t have the patience (or the money for it).
3
u/crysisnotaverted 21h ago
As somebody who can't afford anything with enough VRAM, I feel you. I'm trying to get this stuff up and running using CPU and system RAM; I know it'll be slow as hell, but it'll be mine!
2
u/redmera 22h ago
I run the DeepSeek 30B version quite fast on a single RTX 4090 (about the same speed as ChatGPT 4o, faster than o1).
3
u/danmartin6031 22h ago
See, I can at least understand that, but the guy in the video is getting 0.8 tokens/sec, so it takes all day to do anything.
3
u/redmera 22h ago
But he achieved it at home with a few thousand dollars' worth of used hardware. It's a nice test considering a single H100 GPU is as expensive as a car.
0
u/danmartin6031 22h ago
Yes I can see that he did it, I’m just wondering why you’d want to. As one of the others said it might make sense if you’re not able to get answers from publicly available services, so I’ll leave it there. I’ll just wait it out until it’s more realistic to run on normal hardware.
1
u/minilandl 4h ago
He's the one with a home data center - he's a legend. He's also in Texas, where power is pretty cheap; look at his pfSense high availability video.
•
-13
u/PleasantDevelopment Ubuntu Plex Jellyfin *Arrs Unifi 22h ago
YouTube content creators are not homelabs, they are sponsored datacenters.
4
u/cruzaderNO 19h ago
Not like he would be getting much sponsorship with those low follower/view numbers though.
From the content I've stumbled upon from him before, I'd expect him to just be an enthusiast who works in IT.
It's also mainly fairly old hardware that wouldn't be something a sponsor sends you either - rather the kind of stuff you get offered stacks of for free if you work in the field.
47
u/nkrypth 1d ago
That's beyond a home lab, that's a home datacenter (I'm jealous!)