That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
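The exact command didn't survive this copy, but Ollama can pull GGUF models straight from Hugging Face, so it would look something like this (the Q8_0 quantization tag is an assumption):

```bash
# Fetch the Unsloth GGUF build from Hugging Face and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```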
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
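Again the original commands are missing from this copy; a plausible reconstruction, reusing the model reference from the Ollama step above:

```bash
# Install the llm-ollama plugin, then route a prompt to the Ollama-hosted model
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```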
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
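The command itself is gone from this copy; following the same Hugging Face pattern as the 8B model, it would presumably be something like this (the repository name and Q3_K_M tag are assumptions, though Q3_K_M is consistent with a roughly 34GB quantization of a 70B model):

```bash
# Fetch the 70B distilled model in a ~34GB quantization and start chatting
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```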
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
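The snippet didn't survive the copy; based on LLM's documented extra-openai-models.yaml format, the entry would look roughly like this (the model_id matches the deepseek-reasoner name used below, and the key name matches the llm keys set deepseek step):

```yaml
# Register DeepSeek's OpenAI-compatible endpoint as a model LLM can call
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```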
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
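Spelled out as commands, that flow looks like this (the example prompt is my own placeholder):

```bash
# Store the DeepSeek API key under the name referenced in the YAML config above
llm keys set deepseek
# Run a prompt against the hosted reasoning model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```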
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.