From 04c79c71ea4014f0e1c1e3be2a28845e093e223c Mon Sep 17 00:00:00 2001 From: ezequielbeamon Date: Tue, 11 Feb 2025 01:30:50 +0800 Subject: [PATCH] Update 'Simon Willison's Weblog' --- Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) create mode 100644 Simon-Willison%27s-Weblog.md diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md new file mode 100644 index 0000000..53ba7f9 --- /dev/null +++ b/Simon-Willison%27s-Weblog.md @@ -0,0 +1,42 @@ +
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
+
There's a lot of stuff in the new release.
+
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
+
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
+
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
+
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
+
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
+
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
+
Licensing aside, these distilled models are remarkable beasts.
+
Running DeepSeek-R1-Distill-Llama-8B-GGUF
+
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
+
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
+
First I fetched and ran the model using Ollama itself:
+
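That command, for reference, looked like this. The hf.co/unsloth repository path matches the release named above; the Q8_0 quantization tag is my assumption, and other tags from the same repo should also work:

```sh
# Fetch the 8B distill from Hugging Face and drop into a chat session
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```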
This downloads around 8.5GB of model data and starts an interactive chat interface.
+
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
+
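Those logs are easy to pull back out later. A couple of useful commands, assuming a default LLM install:

```sh
# Print the location of the logs.db SQLite database
llm logs path

# Show the most recently logged prompt and response
llm logs -n 1
```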
I installed the plugin and ran it like this:
+
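Concretely, that was a plugin install followed by a prompt, with the model identifier mirroring the Ollama tag above (the Q8_0 suffix again being my assumption):

```sh
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```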
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
+
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
+
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
+
[...]
+
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
+
The Joke:
+
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
+
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up line introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
+
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
+
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
+
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
+
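A sketch of that command, assuming the matching Unsloth 70B repo and a roughly 34GB quantization such as Q3_K_M:

```sh
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```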
Can it draw a pelican?
+
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well:
+
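That prompt runs the same way as the joke did; the choice of the 70B distill alias below is my assumption:

```sh
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```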
It looked to me like it got the order of the elements wrong, so I followed up with:
+
the background ended up covering the rest of the image
+
It thought some more and gave me this:
+
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
+
Other ways to try DeepSeek-R1
+
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
+
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
+
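A minimal sketch of that configuration, with the api_base taken from DeepSeek's documented OpenAI-compatible endpoint and api_key_name pointing at the key set in the next step:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```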
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
+
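Spelled out as commands (the prompt here is just an example):

```sh
# Store the API key under the name referenced by api_key_name
llm keys set deepseek

# Run a prompt against the reasoning model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```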
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
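If you do want to see them, hitting the API directly works; a sketch using their OpenAI-compatible endpoint, where (on my reading of DeepSeek's docs, so treat this as an assumption) the chain of thought comes back in a separate reasoning_content field:

```sh
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
  }'
# The reasoning tokens arrive in choices[0].message.reasoning_content,
# separate from the final answer in choices[0].message.content
```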
\ No newline at end of file