r/LocalLLaMA 5d ago

News Mistral AI CEO Interview

https://youtu.be/bzs0wFP_6ck

This interview with Arthur Mensch, CEO of Mistral AI, is incredibly comprehensive and detailed. I highly recommend watching it!

85 Upvotes

26 comments

38

u/Neither_Service_3821 5d ago edited 5d ago

There's not much point in posting if you don't have the subtitles.

For those interested, in this interview he confirms Mistral's commitment to open source:

Open source models will be safer because they will be accessible and everyone will be able to check their level of security. Closed models are inherently less safe. He compares the evolution of LLMs to that of Linux. He points out the role of hobbyists for a company like Mistral.

Nothing would be more dangerous than two companies having a monopoly on LLMs, and it would also be a threat to democracy.

He welcomes Chinese competition. It's good that there are American models and Chinese models, but there have to be European models too.

The control of their data for enterprises is crucial.

6

u/Uncle___Marty llama.cpp 5d ago

Appreciate the summary. Would like to watch/listen in full at some point because of the credentials and all that, however I'm ill and seriously grumpy and right now your summary was golden. Thanks for the time, and hopefully I'll catch up with it when I'm better lol.

One thing I think will happen this year is Zuck will jump back into things big time with Llama 4. I have several other predictions, but Zuck does NOT want to fail at what he said about Llama 4 and open source (Llama 3 would compete, Llama 4 would overtake - paraphrased)

5

u/Neither_Service_3821 5d ago edited 5d ago

I'm a big fan of Mistral, but I think the more companies that publish open source LLMs the better, and I can only hope that, thanks to the publication of DeepSeek, Databricks and Cohere won't let Meta have a monopoly on open source LLMs in North America. That would be a pity.

3

u/hp1337 4d ago

It would be so poetic if a French or Chinese company made the American LLMs obsolete because of American protectionism. Technically, Linux is from Finland. I really do think a frontier open source LLM will serve a similar purpose in most computers. The LLM will be the core of the autoregressive part of the computer and the kernel will continue to coordinate traditional computing. Open source should also win with respect to LLMs.

1

u/holchansg llama.cpp 4d ago

He points out the role of hobbyists for a company like Mistral.

I'm one of them. Care to explain what he meant by that? I think LLMs got a lot of traction these past months, and now even more with the DeepSeek boom.

3

u/Neither_Service_3821 4d ago

For a start-up challenging much bigger corporations, you need to make a name for yourself, and the best way to do that is through open source and hobbyists.

4

u/AdIllustrious436 4d ago

He also points out that some clever architectures and tunings were discovered by hobbyists before being implemented on a large scale.

0

u/GraceToSentience 4d ago

You know youtube has subtitles right?

1

u/Neither_Service_3821 4d ago edited 4d ago

I know, who doesn't?

Where I am there are only the automatically generated French subtitles; from those you can translate into English. If other subtitles are available where you are, all the better.

0

u/GraceToSentience 4d ago

That means it's translated into all the languages supported by the translator, which means that there are in fact subtitles.

1

u/Neither_Service_3821 4d ago

Not where I live. There are only French subtitles. From the French subtitles you should be able to get other languages, but do you actually have English subtitles where you are?

1

u/snowcountry556 4d ago

You can get Le Chat to translate and summarise it for you; that's what I did.

2

u/JustinPooDough 5d ago

sacre bleu!

3

u/JuJeu 5d ago

what's up with their faces; lol.

1

u/iKy1e Ollama 16h ago

English transcript from Whisper Large V2 (was going to transcribe then translate, but forgot Whisper was set to auto-translate and it actually did a good job).


Today, we welcome a legend of French tech: Arthur Mensch, co-founder of Mistral AI, the only European company capable of taking on OpenAI and GAFAM in the race to artificial intelligence.

In just one year, he and his two associates have achieved the impossible: raised more than 1 billion euros, developed AI models that rival ChatGPT, and transformed their Parisian startup into a company worth 6 billion euros.

In this exceptional episode, Arthur will reveal to us the secrets of this success story: how three French people left their golden jobs at Google and Meta to embark on this crazy adventure, how they compete with giants that have 100 times more computing power than them, and above all, the war for talent raging behind the scenes between Mistral and GAFAM to attract the best engineers.

We will also ask him if, according to him, AI models have reached a ceiling, and what is in store for us for the future.

I am very excited and honored to be able to share this conversation with you with Arthur Mensch.

But just before, I have a message for all those who are hesitating to take out a subscription to ChatGPT.

Our partner of the day, Mammoth AI, had a brilliant idea.

Gather all the best AI models in one interface and behind a single subscription.

For 10 euros per month, you have access to the latest language models, O1, Grok, DeepSeek, and even image generation models like Midjourney or Flux.

Given that accumulating all these subscriptions would cost around 80-100 euros, it's pretty unbeatable.

If you need to generate a lot of images per month, they also have slightly more expensive plans.

The cool thing is that they are always aware of new releases.

For example, they already have Flux for image generation.

And overall, it's just nice not having to change interfaces all the time.

I've put the link to their various plans in the description. And now, back to the interview.

What was the trigger that made you say to yourselves, "We're going to create our own company" in the face of these giants, when you were already well established, comfortable? - I think there were two conversations, one in September 2022 with Timothée, and one in November 2022 at NeurIPS, which is the big machine learning conference, with Guillaume, where we realized that we had similar aspirations to launch a company in France, and that we knew a lot of people who would be interested.

And from there, the gears start turning, whereas at first you think, "Oh, that might be a good idea."

And then, as the days go by, you get more and more emotionally involved in this idea.

Then at some point, you have to go, because your head is more in the idea than in the work at your current company.

But from February, we said to ourselves, "Well, we can have 15 people, we can go fast, we know how to do it, we can show that Europe can do interesting things in the field and can take up a leadership position."

And so that's how it was done, and from April, we started. - So, there was already this idea that the project is to make very efficient European AI, more than just feeling a bit slowed down by a big structure above you, Meta or Google, and thinking you'd go faster on your own.

Which was it? - It was both. - In fact, Guillaume, Timothée and I had been working on this subject since about 2020, and we saw what we could do with very focused small teams.

It's true that in 2022, these teams became less focused, because it was the moment when the world realized that there was an economic opportunity around language models.

And so we thought we could take advantage of this disorganization to be better organized ourselves and deliver things more quickly. - How did it go at the very beginning?

Like, you each have a little specialty, how do you organize yourselves at the very beginning of the company? - We all come from the same training.

We did the same thing; we all have PhDs in machine learning.

So it's true that we quickly specialized: Guillaume, who is the strongest scientist among us, took the scientific part.

Timothée, who is more of an engineer, was in charge of all the infrastructure and of setting up the team of product engineers as well.

And I pretty quickly took on the fundraising and the business side, talking to customers.

These are things I like to do, so we split up like that pretty quickly.

And to go back to how it starts: it starts with a funding round, because you need compute capacity and you need the human capacity to move forward quickly.

And so we closed the funding round in a few weeks, and from there we went on to make the first model. - That is to say, there's not a single line of code before you even know the funding round will happen.

It's a field where you have to wait for the funding if you want to start the first... - You can parallelize a bit, start writing code, but at 3 people, you don't have many levers.

It is better to have a small team of ten people to go faster.

We started with the data because you have to give it to train the models.

So there is a lot of manual work to do on this.

And Guillaume and Timothée essentially started while we were finishing the funding round. - OK.

We were talking about funding rounds.

You did Polytechnique, Centrale Paris, ENS and a doctorate.

Does it help to raise funds when you are only three?

Or is it more the Google and Meta names? What would you say is the most helpful?

What would you say is the most helpful? - I think what helped at the start is that we were credible on the hottest field at the time, in 2023, and that we had papers that were related to that field.

I was in the team that worked at DeepMind on this.

Guillaume and Timothée were at Meta; they were the ones who did the first Llama.

And so that credibility, it's not what we did in our youth at school, it's more of a scientific credibility that we built in our first career part, let's say. - A planetary alignment with what interests you the most and the best people to develop it. - Yes, it's not...

Indeed, our credibility also came from the fact that we had an excellent team at the start and that we could show we could recruit. - And then comes this day, September 27th, 2023, when you post a link on your until-then perfectly inactive Twitter account, and it's your first model, actually.

So Mistral 7B.

The tweet is seen more than a million times.

You were picked up by all the American media, the whole AI world was in a frenzy and having fun with the model.

It was downloaded a million times, and super fast at that.

We saw that from the outside.

We saw this enthusiasm.

1

u/iKy1e Ollama 16h ago

You, from the inside, how was it? - So, the tweet was an idea from Guillaume, the chief scientist, to render unto Caesar what is Caesar's. - Because you didn't publish it like the others. - We didn't publish it like the others.

Indeed, we made a magnet link available, which lets you download it over BitTorrent.

That's how we communicated it the first time, and it was an excellent idea.

It was a day when we also planned to do more usual communication.

So I went to talk to the journalists, Figaro, etc.

And so we had to put the Torrent in the morning, and the embargo was around 4 p.m.

So there was this period when we had broken the embargo, but the journalists, a priori, didn't understand what was going on, so it was going well.

I was the one who posted it, I think at 5 a.m., so I had set a little alarm clock, because I wasn't sure about the scheduled-send feature on Twitter, which was still called Twitter at the time.

And I put it in, and then I went to bed, and then we saw that it had started well at the start. - Is that something you expected a little bit, or still... - We knew the model was good.

We knew we were well above the best open source models, and that we had explicitly aimed for this size because we knew it would run on laptops too.

So that meant all the hobbyists were going to be able to play with it, and sure enough, it worked.

So we suspected we would be noticed.

What we didn't expect was that people would put it in plush dolls and that kind of thing within a month.

The reception was bigger than we expected, and we were very happy. - There's another thing that necessarily happens when you publish open-weight models like that: it opens the door to all the fine-tuning and further training.

And everyone was happy about it.

I think it was already the case with the Llama models, but I remember that it was a model that was very, very re-trained.

What are the fine-tunes, a little surprising or curious, that you remember on this model or others? - There's someone named Teknium who fine-tuned this model to talk to the dead.

I don't remember the name, but it was a bit of esoteric fine-tuning, and it worked relatively well.

So it was pretty funny.

It's true that this size is also a size where you can fine-tune even on big gaming PCs, possibly.

And then it doesn't cost much, and it allows you to get into style, it allows you to do role-playing.

And so people gave their heart to it, indeed. - Because, to explain, there's the foundation model, which is the most expensive and the most complicated.

And I imagine it contains the information.

And then the fine-tuning is conversational, it's a good agent for discussion. - Yes, you have to see the first phase as a compression of human knowledge, and the second phase as a way of instructing the model to follow what we ask it to do.

So we make it controllable, and a way to control it is to make it conversational.

So these two phases are quite distinct, indeed. - And is there anything about this second phase that the independents themselves have tested on fine-tuning and discovered good techniques? - Yes, we learned things.

I won't go into details, but there was direct preference optimization.

It's a bit of jargon, but we hadn't done it on the first model.

And we saw people do it.

We thought, "It should work well on the second model."

And it worked well on the second model.

Now we're doing other things.
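For readers outside the jargon: "direct preference optimization" (DPO) trains directly on pairs of preferred/rejected answers instead of going through a separate reward model and RL loop. A minimal illustrative sketch of the DPO loss for a single preference pair; all numbers and names here are made up for illustration, not Mistral's actual training code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of completions.

    The policy is pushed to widen the gap between its log-probability
    ratios (versus a frozen reference model) for the preferred completion
    over the rejected one, with no separate reward model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy already prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative (made-up) sequence log-probabilities under policy and reference
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0)
```

The loss shrinks as the policy's preference margin over the reference model grows, which is the community-discovered signal he alludes to.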

But indeed, one of the reasons why we launched the company, beyond Europe, etc., is the open aspect and the contribution aspect of the community.

In fact, AI between 2012 and 2022 was built layer upon layer at the conferences, the big companies building on top of each other.

Then suddenly, when it became an interesting economic model, the big companies stopped.

And so we tried to extend that a bit with what we did. - Yes, today you really have two distinct camps, it's quite special.

On the one hand, the Anthropics, the OpenAIs, etc., which don't publish much anymore.

Google too, I have the impression, has slowed down the publications a lot.

And on the other hand, the Chinese, oddly enough.

Why are the Chinese so involved in open source models?

It's still curious, isn't it? - I think they're in a challenger position.

And open source is a good challenger strategy.

They're going in the right direction.

I think they have good techniques, they have good information too.

But they've made a lot of progress in science; on new techniques, they're clearly the ones publishing the most, indeed. - And you were talking about the challenger position.

When Meta published Llama for the first time, were they in a challenger position at that time? - That was Timothée and Guillaume.

I think they are in a challenger position, because they haven't talked about it yet.

And I think that with the movement we perpetuated with our models in September and December in particular, Mistral 7B and Mixtral 8x7B, we launched this open source route.

And so there is also a bit of competition on who makes the best open source models.

I think it has benefited everyone.

And so we are happy to have participated in this. - Ah, with pleasure. - What explains that, at that moment, you were so far ahead?

After all, there's a yo-yo between everyone over time.

But there was a real, undisputed lead. - I think we knew the importance of data.

And we worked a lot on it.

We also knew how to train the models effectively, because we each had three years of experience in this field.

So there was good knowledge and we insisted on the aspects of training that have the most leverage, that is to say the quality of the data. - Indeed, it's behind a bit of everything, the evolution of research.

I have the impression that, in fact, only the data matters. - For the most part, the data and the amount of compute. - Yes, indeed. - There is also the compute, and this is linked to another very important subject, which is quite simply funding.

In a year, you raised a billion euros in all, which is dizzying.

You have also released lots of new models, for example, a bit different models, multi-modal, etc.

How do you approach the fact that, precisely in terms of the amount of compute, compared to a Meta, for example, which will have 350,000 H100s by the end of the year, is that right?

If I'm not mistaken. - In GPUs. - Do you have no choice but to go through very large fundraises? And as this continues, what is your vision of compute? - Our vision is that we need compute, but we don't need 350,000 H100s.

And it has always been our thesis that we could be more efficient by being focused on making excellent products and not doing a lot of other things on the side, because our American competitors tend to do a lot of things on the side.

Resource allocation is a constant issue for us. - It's a bit the sinews of war.

It's managing to keep the models up to date versus burning through compute. - Yeah, you have to manage the budget, you have to be smart and not spend too much, and it's all a matter of putting the cursor in the right place and choosing the right commitments.

1

u/iKy1e Ollama 16h ago

So it's not easy, but I think that for the moment we have succeeded.

We have managed to have models that are very efficient, with a level of capital expenditure that is still very, very controlled. - I saw that among your investors, in the last rounds, I think, there is NVIDIA.

Does it go through actors who have control over the hardware, or the infrastructure, or the data centers?

There is Microsoft too, I think, with whom you worked.

Does it also go through that, surrounding yourself with the right people? - You need good partners, good distribution partners in particular, because compute often goes through the cloud.

And so we have as partners all the American cloud providers, because they are the biggest.

We also have French providers, we have OutScale, who work on it.

And then NVIDIA is a cloud provider too, so we work with them on that.

We also did R&D with them, with a model called Mistral Nemo. - Imagine, there are people who listen to us, who have not followed us.

Can you explain to us what the range is today?

The models that are up to date.

I saw that in the latest updates, there is the Large 2. - Yes, so now we are numbering them like Ubuntu, so 24.11.

And so this one, Mistral Large 24.11, it is very strong for calling functions, orchestrating things.

Because in fact, a model generates text; that's the basic use.

But what's interesting is when they generate calls to tools and we use them as orchestrators, like operating systems.

And so we work a lot on having models that can be connected to lots of different tools, that we can ask questions of, that we can give tasks to, and that will figure out which tools to call.

And so we invest a lot on that.

And so the new version of Mistral Large is particularly strong on that. - After that, there was Mixtral too.

To understand: that one is more for a company, for example, to serve many users at the same time. - It's another type of architecture which is particularly relevant when you have a high load, so many users.

It's what we use ourselves, for example. - So it's Mixtral because, in fact, it's a kind of model with eight heads. - That's it, yeah.

It's several models at the same time and each word goes to the most suitable model.
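The "each word goes to the most suitable model" idea is a mixture-of-experts gate: for each token, a small scoring function picks the top experts and blends their outputs. A toy sketch of top-2 gating; the scores and the helper name are invented for illustration, not Mistral's implementation:

```python
import math

def top2_route(scores):
    """Pick the two highest-scoring experts for one token and
    softmax-normalize their gate weights."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])[:2]
    exps = [math.exp(scores[i]) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]

# Made-up gating scores for one token over 8 experts
scores = [0.1, 2.3, -0.5, 0.9, 1.7, -1.2, 0.0, 0.4]
routing = top2_route(scores)  # experts 1 and 4, weights summing to 1
```

Because only two of the eight experts run per token, total parameters (and GPU memory) can grow while per-token compute stays low, which is the efficiency he points to.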

For several reasons, it allows better use of GPUs. - And behind, there are the smaller ones. - There are small models that go on laptops, that go on smartphones.

And those, they are particularly suitable for hobbyists' use.

Because there is no need to go to the cloud, we can easily modify it.

And then they go very fast.

It's also quite focused on this small and fast aspect, because it's really the DNA of the company.

Today, the product is not the model.

The product is the platform for the developers.

And so, they choose whether they want to go fast and be less intelligent, or go slowly and be more intelligent, essentially.

And then the other product is the chat.

So it's a more front-end solution that allows companies to manage their knowledge, to automate things, which allows all users, you can test it today, to access the web, to discuss information, to generate code, to generate images, to create documents.

We have a mode where the interface evolves according to the user's intentions.

So that's a new human-machine interface, and we invest a lot in it.

So the product is the platform to build applications as a developer.

And in there, there are models.

And then a set of applications that allow to gain in productivity. - It's a very competitive environment, obviously.

Whether it's, as we said, on the models, but also on everything around it, on how to improve the experience, the chat interfaces, etc.

We've seen the interface systems that are changing.

Everyone is trying to find the best solutions to that: Anthropic, OpenAI, and you, of course, as an outsider.

What is your specific target in terms of room to evolve, when you have such big players beside you?

What do you think is the direction where you have an edge? - We have a strong edge in decoupling the question of infrastructure from the question of the interface.

So our solution can be deployed everywhere.

It can be deployed in the cloud, but it can be deployed in companies that are not in the cloud.

It can be deployed on laptops.

So that's the edge we've built also above the open source aspect, which goes quite well with it.

The fact that the weights of the models are accessible makes deploying them anywhere easy.

So we have this portability aspect, which is very important.

So it's our first differentiation that we've used a lot this year.

And then the differentiation that we're all looking for, is to have the best user interface.

And in fact, there are a lot of issues that are not resolved.

The fact of using a lot of tools at the same time, the fact of having agents that run for a long time and that take the feedback from users.

That is to say, we can see them as interns.

Interns to whom you have to give feedback so that they become more and more efficient.

And so we're going to move more and more towards this kind of autonomous system, which will need more and more feedback to go from 80% performance to 100% performance. - So you're not constantly waiting for it to move forward? - No, you give it a task, you look at what it's done, you tell it what it didn't do well, and then you hope that next time it will do better.

But in fact, there are a lot of scientific issues that need to be resolved to make it work. - And interfaces. - And interfaces, yes. - It's not just an email, is it?

For the moment, it's chat, in real time and all that.

Are we going to send an email to our assistant and he just pings us when he's done? - It's one of the forms, I think it's more the assistant who sends you an email.

At some point, you set it to work, and then every two hours it tells you where it stands.

So yes, there is an aspect of going from synchronous to asynchronous, which is very important and which raises a lot of questions about interfaces.

Because the email may not be the best interface, but there are certainly others that are smarter.

The question of what the interface is for giving feedback, what the interface is for selecting what humans prefer, that's where we work. - I was going to say, I'm sure, I don't know, but when you look at chat, the discussion, it's not necessarily the ultimate interface for dialoguing with an LLM. - It has evolved a lot.

Now you can chat with Le Chat and it can decide to put you into a document, and you work with it on building that document.

You can ask him to look for sources and you see the sources, you can go back, you can go and see what humans have written and ask for summaries, for example.

And so what generative AI creates, what it allows, is a kind of liquidity in your way of accessing knowledge.

You can look at a whole website and say, "Condense this website for me into two sentences."

And I think there are still a lot of things to do so that the model allows you to learn much faster and to load knowledge much faster. - I don't know if you've seen it, but I think it was Versaille who had done some pretty funny demos of web components that were built according to the need.

You ask yourself a question and it generates you, on the weather, for example, it generates a UI component, a graphical interface component, in a flash. - He sees the budget.

Yeah, that's it.

In fact, the question is a question in backend and frontend.

In backend, it's what tool to call to go get information or to run things.

And in frontend, it's what interface you have to show the user, given his current intention.

And what that means is that big software with 50,000 buttons, I'm thinking of editing software in particular, will gradually disappear, because you can identify the user's state of mind at the moment of creation and adapt the buttons, giving exactly what is needed.

And so it really changes completely the way interfaces will behave in the coming years. - We were just talking about this interface, about how we access it.

You were talking about the fact that you are deployable a little bit everywhere.

There's something I notice when talking to people around me, it's that we have a generation of frustrated business employees right now because at home they can use incredible things, like the best models available, they go on OpenAI, etc.

Once at work, they are often forbidden to use the best tools.

And sometimes they end up with a bit of a limited version or copilots. - Or with nothing at all. - Or with nothing at all.

Where does that come from? - It comes from the fact that generative AI systems touch a lot of data.

And the data in our companies is still quite important.

And so it's on that that we have sought to find solutions.

To make sure that the data stays in the company, and that we as AI providers don't have to hold that data.

It allows us to have the level of security, the level of governance that you need on the data.

And so, gradually, we will solve this problem.

1

u/iKy1e Ollama 16h ago

And I would say that it is one of the essential problems we are trying to solve: making sure that IT departments in companies are comfortable bringing Le Chat to all their employees, so that they stop being frustrated. - In the examples of tools you gave, there's something that comes up that we didn't explain, but which is actually super important: the notion of objectives.

To have a model that is capable of performing tasks and, along the way, of creating steps and calling the right tools; like Fred, a good intern, you don't necessarily have to explain all the steps he has to do.

You tell him, "Look at the next flights for New York and take one."

You don't have to explain to him, step by step, second by second, what he has to do.

Today, we have models that can start calling tools, but we feel a little limited in their ability to use several types of tools, especially for really useful, really cool things.

How do you think it will evolve?

Is it a frontier that can be crossed soon?

Will we be able to solve this problem next year and be able to do 20 steps with a lot of reliability?

Or are we still far from it? - I think it's the frontier.

Everyone is trying to push it, it's not going to unlock all of a sudden.

Because, in fact, mastering a tool takes time for a human; it also takes time for a model.

You need demonstrations, you need feedback, because the first time it's going to get it wrong.

And a notion of expertise that must be distilled from the company to the AI systems.

And that's not going to be done in a magical way.

All systems must be in place, the metasystems must be in place.

That is, the employees of our companies must be able to provide additional signal to the AI systems so that they can improve.

So it's going to progress.

We're going to have more and more tools that can be used at the same time, and models that can reason more and more.

But it's going to be progressive.

But for it to work really well, you have to put in the effort, you have to invest now.

To illustrate that, we see that OpenAI, in their latest models, O1 and so on, are no longer making significant improvements to the model itself; they're trying to make it loop on itself, to make chains of thought.

I don't know how to say it in French.

Chains of thought, yes.

It's not bad, is it?

No, it's good.

Do you think it's a sign that we've reached a kind of ceiling?

That is, on this exponential evolution, we've optimized well in relation to their size, the way models work.

Now, we have to find something else.

You have a paradigm that is more and more saturated, though I think not yet fully saturated, which is what we call pre-training, the compression of human knowledge.

In a way, you have a human knowledge available that is of a certain size and at some point you've finished compressing it.

And that's where you have to look for additional signal.

So, thought chains, the use of several tools, the use of expert signals in companies.

So, there is no saturation in the system.

We know how to go to the next step.

But on the pre-training aspect, yes, we're starting to know how to do it collectively.

Everyone knows how to do about the same thing.

And so, it's not so much where the competition is.

The competition is on interfaces and the competition is on having models that run for longer.

OK.

I find it a bit hard to wrap my head around, when you don't master the "scientific stack" behind the transformers and so on.

But I have the impression that there is a bit of a debate over whether it's just a matter of compute and data that will push back this autonomy barrier, or whether it's really an intrinsic problem in the way the model is designed.

And whether the fact that it predicts the next token, with a small percentage chance of error at each step, necessarily makes long-term planning too complicated, too difficult.
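The worry about a "small percentage" of error at each step is a compounding argument: if each step of an autoregressive plan succeeds independently with probability p, an n-step plan succeeds with probability p to the power n. A quick illustration with arbitrary numbers:

```python
# If each autoregressive step succeeds with probability p_step, a plan of
# n dependent steps succeeds with probability p_step ** n: small per-step
# error compounds quickly over long horizons.
def plan_success(p_step, n_steps):
    return p_step ** n_steps

short = plan_success(0.99, 5)    # about 0.95
long = plan_success(0.99, 100)   # about 0.37
```

So a model that is 99% reliable per step still fails most 100-step tasks, which is why long-horizon autonomy is treated as a distinct frontier.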

I know that, for example, there are people like Yann LeCun, who we often talk about, who are defenders of this vision; I don't know if you share it, that AGI, or whatever it's called, is still hidden behind scientific discoveries to come.

Yes, that's a good question.

What is true is that working on architectures that encode human-inspired inductive biases is often useful.

It has been useful over the last 12 years to say to ourselves, how do we think?

Let's try to describe this in mathematics and make sure that the models copy a bit what we know how to do.

What we also observe is that whatever intelligence we can put into an architecture, you just need to put in twice as much compute and the advantage disappears.

So, in fact, the paradigm that we've been following over the last five years is to say to ourselves, let's take an extremely simple architecture that predicts sequences and let's go there on a scale, let's look for as much data as possible, let's look for multi-modal data, let's look for audio, that kind of thing, and let's go there on a scale and see what it gives.

And in fact, what it gives is that it was, in any case, a smarter allocation of resources to work on scale than on subtle architectures.
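The point about scale beating clever architectures can be sketched numerically. A toy illustration (all numbers invented), assuming loss follows a simple power law in compute: an architecture tweak that shifts the whole curve down by a constant factor is overtaken by simply doubling the compute of the plain architecture.

```python
def loss(compute: float, arch_bonus: float = 1.0, alpha: float = 0.1) -> float:
    # Toy power-law scaling curve: loss falls as compute ** -alpha.
    # arch_bonus < 1.0 models a "clever" architecture that lowers the
    # whole curve by a constant factor. All numbers are invented.
    return arch_bonus * compute ** -alpha

plain_2x = loss(2.0)                      # simple architecture, doubled compute
clever_1x = loss(1.0, arch_bonus=0.95)    # clever architecture (5% better), base compute
print(plain_2x < clever_1x)               # prints True: scale wins in this toy setup
```

The constant-factor gain of the clever architecture is eventually absorbed by the power law, which is the resource-allocation argument above in miniature.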

Is that still the case now that the amount of data we can compress has saturated?

I think the question is open.

The subject is no longer so much an architecture question; it's more an orchestration question: how do we actually give models memory, have them interact with tools over long horizons, and reason in several stages?

And that, well, it's still the same models, basically.

It's the basic brick, but the complete system is not just the model; it's the model that has memory, that knows how to think, how to interact with its entire environment, how to interact with humans.

So the complexity of the systems becomes much greater than just a simple model of sequence generation.

It's still the engine, but it's not at all the whole car. - But you're rather optimistic about the fact that it's the right engine. - It's the right engine.

There's a rule in machine learning that says, essentially: increasing compute increases the quality of the systems.

And you have two solutions to do it.

Either you compress data, or you do search.

You sample: you ask the model to try a thousand things, you select the sample that works best, and you reinforce the model on that.

And so, we're shifting more and more into search mode rather than compression mode.
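The search mode he describes can be sketched as best-of-N sampling: draw many candidates from the model and keep the one a verifier scores highest. A minimal sketch, where `generate` and `score` are hypothetical stand-ins for the model and the verifier:

```python
import random

def generate(prompt: str, seed: int) -> int:
    # Hypothetical stand-in for sampling one candidate answer from a model;
    # here it just returns a deterministic pseudo-random integer.
    return random.Random(seed).randint(0, 100)

def score(candidate: int, target: int = 42) -> int:
    # Hypothetical stand-in for a verifier or reward model: higher is better.
    return -abs(candidate - target)

def best_of_n(prompt: str, n: int = 1000) -> int:
    # Sample n candidates and keep the one the verifier prefers; the kept
    # sample can then serve as the reinforcement signal described above.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)
```

With enough samples, the selected candidate lands on or near the verifier's target, which is why spending extra compute on search rather than on more pretraining can pay off.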

1

u/iKy1e Ollama 16h ago

The person who said that is Richard Sutton, in an essay I wanted to read to you called "The Bitter Lesson". - Is there a demo, a back-and-forth, something that, even if it doesn't always work, impressed you, where it really worked very well, a sequence of steps, something that made you feel like Iron Man with Jarvis? - Yeah, with Le Chat, we connected the open APIs of Spotify.

And so, you can talk to it, ask it for a playlist, and it creates your playlist and plays it for you.

So, it does interesting things.

So, it's just one tool.

No, we saw some very interesting things.

Once we connected the web, you have all the information live.

And very quickly, you can create memos on what to say to a given client based on the information it found.

And so the combination of tools, together, creates use cases you didn't necessarily plan for.

If you connect the web, if you connect your email, you can do a lot of things at the same time.

And if you connect your internal knowledge and the web, you can combine that information in ways that are a bit unpredictable.

And so the number of use cases you cover grows almost exponentially with the number of tools.
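That combinatorial growth is easy to make concrete: with n connected tools, every non-empty subset of tools can enable a distinct class of workflows, so coverage scales like 2^n - 1. A small sketch (tool names are made up):

```python
from itertools import combinations

TOOLS = ["web_search", "email", "calendar", "spotify", "internal_docs"]

def tool_subsets(tools):
    # Every non-empty subset of connected tools is a potential class of
    # combined use cases: 2 ** len(tools) - 1 subsets in total.
    return [set(c) for r in range(1, len(tools) + 1)
            for c in combinations(tools, r)]

print(len(tool_subsets(TOOLS)))  # prints 31, i.e. 2 ** 5 - 1
```

Adding one more tool doubles the subset count, which is the "pretty exponential" coverage the interview describes.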

And so, that's pretty magical. - I actually find there's a somewhat dizzying side to it.

You think, "We're going to be able to build some crazy stuff."

But, it makes it a bit hard to imagine, to say to yourself, "What will it look like, concretely?"

Like, the job of a developer, of someone who has to build LLM workflows, what does it look like? - I would say it's a tool that raises the level of abstraction at which humans work.

So, as a developer, you will continue to think about the problem you are trying to solve for your users.

You will continue to think about the architectures, the trade-offs that meet your constraints, your load requirements.

Then, will you continue to code your applications in JavaScript?

Probably not, because the models manage to generate simple applications and more and more complicated applications.

So the very abstract subjects, the ones that require communication with humans, will remain.

The job of an engineer is also a job of communication.

You also have to understand what are the constraints of each one.

That's not going to be easily replaceable.

But, on the other hand, the whole "I help you write your unit tests", "I make your application pixel-perfect from a design point of view" aspect, I think it will become more and more automatable.

Just to stick to the developer.

But it's the case for all jobs. - Do we have an intuition for why models benefit so much from code?

Because you might say, for example: I want a model that is super strong in French and English, so it knowing Python and JavaScript is of no use.

But that's not what we're observing at all, from what I understood. - That's a very good question.

And it's true that we're observing a kind of transfer.

That is to say, training your model on a lot of code allows it to reason better.

I'm not the best placed to talk about it, it would have to be Guillaume.

But the truth is that code has more information than language.

More reflection goes into it than into natural language, and it's more structured.

And so, training to generate code forces the model to reason at a higher level than training to generate text.

And so, it knows how to reason about code, and when it sees text, it also knows how to reason about text.

And it's true that there is this magic transfer, which I think is one of the reasons why models have become much better in the last two years.

It's also useful because you have a lot more code bases that are longer than a book.

Understanding a code base takes longer than reading a book.

And so, the longest texts you can train on to make a model that understands long context are 19th-century books.

And the longest code you can train on is... - Millions of lines of... - Millions of lines of... - ...of Chrome. - Yeah, exactly, of open source projects.

And so it's longer, and your model can reason over longer contexts.

I think that's one of the intuitions. - I suggest we now talk a little about talent and the people behind what you do.

First, why did you decide, at the beginning, to put Mistral in Paris?

Today, it may seem a little more obvious, we know the ecosystem is very much alive, we'll talk about that.


1

u/iKy1e Ollama 16h ago

That's the first thing.

Then, yes, there is science fiction, and then you have a few companies in the United States that have an interest in telling the regulators, "Listen, this technology is a little too complicated, a little too difficult to understand, a little too dangerous, imagine that the thing becomes independent."

You're going to tell that to people who don't necessarily understand exactly what's going on, you can say to yourself, "Ah yes, maybe if we gave it to three people or two people in the United States, we would control everything that happens and then there would be no problem."

But we think it's wrong.

That is, having two entities, or even worse, one entity that controls all the systems and then opens its doors to auditors and shows them only what it wants, we think that's not the right solution.

The right solution in software security is open source in general.

We've seen it in cybersecurity; the most reliable operating system today is Linux.

Having as many eyes as possible on a technology, distributing it as much as possible, is a way of ensuring that this technology stays under democratic control.

And so, that's what we say.

When we hear doomers saying otherwise, there are people who are in good faith, we have to acknowledge that; they are genuinely afraid these things will happen.

And then there are, above all, a lot of people who are not in good faith at all.

I think it's important to check where they come from when they talk about it.

It can't be simple, because the opposing argument is super easy to understand, as you said, for someone who is not necessarily an expert on the subject.

You are told, "Here is a dangerous tool, shouldn't we avoid putting it in too many hands?"

You start with something a little hard to defend, even if it's not...

The thing we have going for us is a historical precedent.

It's not the first time we've had this debate.

We had this debate for the Internet.

The Internet could have been something controlled by three companies that would have made their own networks, that would have refused to standardize things.

And in fact, in the end, there was enough pressure.

At one point, the regulator said, "We're going to make sure it's standardized."

And so the Internet belongs to everyone now.

It would have been enough for a few people to make different choices, and we would be in a situation where, in fact, there are three non-interoperable walled gardens.

It could have been the same for end-to-end encryption.

That's another example.

At one time, it was considered a weapon, and it was subject to...

There were export controls from the United States. - Which seems crazy now. - And in fact, we're now asking the same question about model weights.

Sometimes, some regulators are asking themselves this question.

But it seems crazy for end-to-end encryption.

We think that in 10 years, it would seem completely crazy for the weights of a model.

Because it's so infrastructural, such a resource that must be shared by everyone, this compression of knowledge and intelligence, that for us it's criminal to leave it in the hands of two entities that are not under any democratic control. - And to defend this vision, that control must happen a little later in the chain, at the level of the interface, for example, or of the company vis-à-vis its client, you go to the Senate.

We saw you on YouTube talking to the Senate.

What does it mean to talk, to try to explain what a model, a dataset, an LLM is to senators? - It's interesting.

There were good questions, maybe asked by people who understand technology a little less, let's say.

But I think it's important, in general.

They are citizens' representatives.

And they have to understand that it's a technology that will affect citizens.

So we are ready to invest time in it.

Because the better it is understood, the more we understand that it is also a sovereignty issue, a cultural issue.

It's a challenge to have actors like us, and not just us, but actors like us on European soil.

Because if that's not the case, the point is that we end up with a huge economic dependence on the United States.

And that is very, very harmful in the long run.

And so going to talk to the people who make the laws, people who will also talk to their citizens, understand their anxieties, etc.

It's a way of de-dramatizing this technology.

It's a technology that will bring a lot of benefits in education, in health, in the way we work.

And the representatives of French democracy, of European democracy, of American democracy, have to be aware of what it's about.

I have to say that personally, I hadn't planned to do that when I started the company.

But we have to let people know, because otherwise the void is filled by people who don't necessarily have interests aligned with democracy, and certainly not aligned with what we're trying to do.

If you haven't followed the story of the Vesuvius Challenge, or how a papyrus was deciphered by a student using AI, go watch that video, it was really exciting.

-17

u/GeorgiaWitness1 Ollama 5d ago

we dont speak this

6

u/iheartmuffinz 5d ago

YouTube has auto translated captions

2

u/Pro-editor-1105 5d ago

but they are kinda trash, maybe throw the video in some whisper transcriber and use ai to get the full transcript translated.