That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure works with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
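The command itself didn't survive in this copy. Given the Unsloth GGUF repository named above and the roughly 8.5GB download mentioned next, it was presumably something like this (the Q8_0 quantization tag is my assumption):

```bash
# Fetch and run the quantized 8B distill straight from Hugging Face via Ollama.
# The exact quant tag (Q8_0) is a guess consistent with an ~8.5GB download.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```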
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
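As a sketch of what that logging gives you (standard LLM CLI commands, not shown in the original post):

```bash
llm logs path   # prints the location of the SQLite database LLM logs to
llm logs -n 1   # shows the most recent logged prompt and response
```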
I installed the plugin and ran it like this:
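The commands were lost from this copy; reconstructed from context, and reusing the same model reference assumed above, they would look like:

```bash
# Install the plugin that exposes Ollama models to LLM, then prompt the model.
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```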
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
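The command is missing from this copy. A 34GB file is consistent with a roughly 3-bit quantization of a 70B model, so it was plausibly along these lines (the exact repository and quant tag are my guesses):

```bash
# ~34GB download; you need enough free RAM to hold the whole model.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```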
Can it draw a pelican?
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
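Run through LLM against the same local model (model reference assumed as above), that prompt looks like:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```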
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
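The YAML block didn't survive this copy. A minimal sketch of the entry, assuming DeepSeek's documented API base URL, would be:

```yaml
# Registers deepseek-reasoner as an OpenAI-compatible model in LLM.
# The api_base value is an assumption based on DeepSeek's API docs.
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```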
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
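Those two steps as shell commands:

```bash
llm keys set deepseek
# paste in your API key when prompted
llm -m deepseek-reasoner 'prompt'
```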
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.