-
Note2 Build A Llm From Scratch
I need to read the PyTorch docs to understand what step 1 in the code below does, so that I can understand how MultiHeadAttention works.
    import torch
    import torch.nn as nn


    class MultiHeadAttention(nn.Module):
        def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
            super().__init__()
            assert d_out % num_heads == 0, "d_out must be divisible by num_heads"

            self.d_out = d_out
            self.num_heads = num_heads
            self.head_dim = d_out // num_heads  # 1: each head gets an equal slice of d_out
            self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.out_proj = nn.Linear(d_out, d_out)  # 2: linear layer applied to the combined heads
            self.dropout = nn.Dropout(dropout)
            self.register_buffer(
                "mask",
                torch.triu(torch.ones(context_length, context_length), diagonal=1)
            )

        def forward(self, x):
            b, num_tokens, d_in = x.shape
            keys = self.W_key(x)       # 3: shape (b, num_tokens, d_out)
            queries = self.W_query(x)  # 3
            values = self.W_value(x)   # 3

            # 4: split d_out into (num_heads, head_dim)
            keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
            values = values.view(b, num_tokens, self.num_heads, self.head_dim)
            queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)

            # 5: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim)
            keys = keys.transpose(1, 2)
            queries = queries.transpose(1, 2)
            values = values.transpose(1, 2)

            attn_scores = queries @ keys.transpose(2, 3)  # 6: dot products, one matrix per head
            mask_bool = self.mask.bool()[:num_tokens, :num_tokens]  # 7: mask cut to num_tokens

            attn_scores.masked_fill_(mask_bool, -torch.inf)  # 8: hide future tokens

            attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
            attn_weights = self.dropout(attn_weights)

            context_vec = (attn_weights @ values).transpose(1, 2)  # 9: (b, num_tokens, num_heads, head_dim)
            # 10: merge the heads back, d_out = num_heads * head_dim
            context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
            context_vec = self.out_proj(context_vec)  # 11: output projection
            return context_vec
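To see what the `head_dim` split (the `# 1`, `# 4` and `# 5` markers) does without PyTorch, here is a minimal pure-Python sketch. It imitates `.view` and `.transpose` with nested lists, for a single sequence without the batch dimension. The numbers and the names `split`, `per_head` and `merged` are made up for illustration; this is not how PyTorch implements it internally.

```python
d_out, num_heads = 4, 2
head_dim = d_out // num_heads  # step 1: each head gets d_out / num_heads numbers

# One sequence of 3 tokens, each already projected to d_out = 4 numbers.
keys = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
num_tokens = len(keys)

# Like .view(num_tokens, num_heads, head_dim): cut every token row into per-head chunks.
split = [
    [row[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]
    for row in keys
]

# Like .transpose(0, 1): index by head first, then by token.
per_head = [
    [split[t][h] for t in range(num_tokens)]
    for h in range(num_heads)
]

# Going back (steps 9 and 10): for each token, concatenate its chunks from all heads.
merged = [
    sum((per_head[h][t] for h in range(num_heads)), [])
    for t in range(num_tokens)
]

print(per_head[0])  # the slice of every token that head 0 works on
print(per_head[1])  # the slice of every token that head 1 works on
```

Each head only ever sees its own `head_dim` columns of every token, and merging the heads restores the original `d_out`-wide rows.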
-
Note1 Build A Llm From Scratch
-
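Take a sentence like "I am a student." The token ID of "am" is shown as a "three-dimensional" vector such as [xx, xx, xx]. Chapter 2 of the book covers encoding tokens, which turns each token into a single number. So how does it suddenly become a three-dimensional vector in Chapter 3? What operations happened in between?
In the Chapter 3 section "Implementing self-attention with trainable weights", the book says that we successfully projected the six input tokens from a three-dimensional onto a two-dimensional embedding space. Why convert three dimensions into two? And what does "embedding space" refer to here?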
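A minimal sketch of the two steps these questions are about, assuming the usual pipeline: the tokenizer maps a token to an integer ID, an embedding table maps that ID to a vector, and a weight matrix then projects the vector into another space. Every name and number here (`vocab`, `embedding_table`, `W_query`) is invented for illustration and not taken from the book.

```python
# Hypothetical toy vocabulary: token -> integer ID (the encoding step).
vocab = {"I": 0, "am": 1, "a": 2, "student": 3, ".": 4}

# Hypothetical embedding table: row i is the 3-dimensional vector for token ID i.
embedding_table = [
    [0.25, 0.5, 1.0],
    [0.5, 0.25, 0.75],
    [0.0, 0.5, 0.25],
    [1.0, 0.25, 0.5],
    [0.75, 0.0, 0.5],
]


def embed(token):
    """Token -> ID -> 3-dimensional embedding vector (a plain table lookup)."""
    return embedding_table[vocab[token]]


# Hypothetical 3x2 weight matrix: maps a 3-dimensional input to 2 dimensions.
W_query = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def project(vector, weights):
    """Vector-matrix product: len(vector) inputs -> len(weights[0]) outputs."""
    d_out = len(weights[0])
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector)))
        for j in range(d_out)
    ]


x_am = embed("am")             # a 3-dimensional vector
q_am = project(x_am, W_query)  # the same token in a 2-dimensional space
print(vocab["am"], x_am, q_am)
```

In this sketch the "embedding space" is simply the set of all vectors of a given length. The output size is a free choice set by the shape of the weight matrix.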
-
-
Git Pull Pr
git fetch origin pull/265/head:pr-265
-
Github Pages Cloudflare
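After I moved the custom domain of a GitHub Pages site to Cloudflare, requests kept looping on 301 redirects.
The cause: Cloudflare's default SSL setting is "Flexible", which enables encryption only between the visitor and Cloudflare. This avoids browser security warnings, but all connections between Cloudflare and your origin server are made over HTTP.
My GitHub Pages site has Enforce HTTPS turned on, so it redirects every HTTP request from Cloudflare back to HTTPS, which produces the infinite loop.
However, I have another GitHub Pages project where Enforce HTTPS cannot be turned on. The settings page shows "Enforce HTTPS — Unavailable for your site because your domain is not properly configured to support HTTPS". I don't yet understand why, or how to fix it. For now I have configured Cloudflare to always redirect to HTTPS.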
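The loop can be reproduced with a toy simulation. This is only a sketch of the behaviour described above, not real Cloudflare or GitHub code: `origin_response`, `cloudflare_fetch` and `follow` are hypothetical helpers, and the "full" mode (Cloudflare reaching the origin over HTTPS) is added for contrast and is not something this note tested.

```python
def origin_response(scheme, enforce_https=True):
    """Hypothetical origin: redirects plain HTTP to HTTPS when enforce_https is on."""
    if enforce_https and scheme == "http":
        return 301, "https"
    return 200, None


def cloudflare_fetch(visitor_scheme, ssl_mode):
    """Hypothetical proxy: 'flexible' always uses HTTP to the origin,
    any other mode reuses the visitor's scheme."""
    origin_scheme = "http" if ssl_mode == "flexible" else visitor_scheme
    return origin_response(origin_scheme)


def follow(ssl_mode, max_redirects=5):
    """Follow redirects like a browser would, giving up after max_redirects hops."""
    scheme = "https"
    for hop in range(max_redirects):
        status, location = cloudflare_fetch(scheme, ssl_mode)
        if status == 200:
            return f"200 after {hop} redirect(s)"
        scheme = location  # the browser retries with the scheme from the redirect
    return "redirect loop"


print(follow("flexible"))
print(follow("full"))
```

In the simulated "flexible" mode the origin never sees an HTTPS request, so its redirect can never be satisfied.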
-
Autogen Question 1
A snippet from AutoGen:
    selector_group_chat = SelectorGroupChat(
        [add_agent, multiply_agent, subtract_agent, divide_agent, identity_agent],
        model_client=OpenAIChatCompletionClient(model="gpt-4o"),
        termination_condition=termination_condition,
        allow_repeated_speaker=True,  # Allow the same agent to speak multiple times, necessary for this task.
        selector_prompt=(
            "Available roles:\n{roles}\nTheir job descriptions:\n{participants}\n"
            "Current conversation history:\n{history}\n"
            "Please select the most appropriate role for the next message, and only return the role name."
        ),
    )
What does participants in selector_prompt mean?
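One way to look at it is to fill the template by hand with `str.format`. The values below are hypothetical. My understanding, which should be checked against the AutoGen docs, is that `{roles}` receives one "name: description" line per agent and `{participants}` receives the list of candidate agent names. If that holds, the labels in this prompt ("Available roles" and "Their job descriptions") sit on the wrong placeholders.

```python
selector_prompt = (
    "Available roles:\n{roles}\nTheir job descriptions:\n{participants}\n"
    "Current conversation history:\n{history}\n"
    "Please select the most appropriate role for the next message, and only return the role name."
)

# Hypothetical values, shaped the way I assume the library fills them in.
roles = "add_agent: Adds 1 to the number.\nmultiply_agent: Multiplies the number by 2."
participants = str(["add_agent", "multiply_agent"])
history = "user: Turn 10 into 25."

filled = selector_prompt.format(roles=roles, participants=participants, history=history)
print(filled)
```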
-
Python Lambda
Method 1: use a default argument
    functions = []
    for i in range(10):
        functions.append(lambda x=i: print(x))

    for f in functions:
        f()
Method 2: use a closure
    def create_printer(i):
        def inner():
            print(i)
        return inner

    functions = [create_printer(i) for i in range(10)]
    for f in functions:
        f()
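Both methods work around late binding: a lambda looks up `i` when it is called, not when it is created. A small sketch of the problem next to the default-argument fix, plus `functools.partial` as a third option that this note does not cover. It returns values instead of printing so the results are easy to compare; `identity` is a helper made up for the example.

```python
from functools import partial

# The problem: all three lambdas share one variable i and read it only when called.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# Method 1 again: the default argument is evaluated when each lambda is created.
bound = [lambda x=i: x for i in range(3)]
bound_results = [f() for f in bound]


def identity(x):
    """Hypothetical helper: returns its argument unchanged."""
    return x


# A third option: partial stores the argument value at creation time.
partials = [partial(identity, i) for i in range(3)]
partial_results = [f() for f in partials]

print(late_results)     # every call sees the last value of i
print(bound_results)    # each call sees the value captured at creation
print(partial_results)  # same as the default-argument version
```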
-
weekend
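Saturday morning I took Saisai to his English class and paid next term's tuition. It is a bit expensive, and I feel we need to switch to a different school.
At noon Saisai and I played Zelda together and found the shrine inside the castle.
Xigua asked whether I wanted to play mahjong. I said no, so she went off to find other people.
Around 4 pm I went skateboarding. After a while Xigua called to say we should have dinner together and play Texas Hold'em, so I skated over to the branch by the metro station to eat. In the Hold'em game I only got back to even on the last hand, through luck. I played one hand badly: I bet the full pot on the flop and got called, then went all-in on the turn with top pair and a flush draw, and the other player called.
On Sunday I got up and took Saisai to Lujiazui Center to have lunch with Qian. Saisai picked out a set of cups at Starbucks. He had first chosen a thermos, but just before we paid he spotted a four-piece cup set and wanted that instead, so we swapped. Next time we pass a Starbucks I'll buy him a thermos as well.
In the afternoon I did some English with him, RAZ level A.
In the evening his mom wanted to do his English homework with him. She called him several times and he ignored her. She lost her temper, dragged him over and asked why he didn't answer. I stepped in between them, and Saisai lay against me and cried for a while. Then I did the homework with him and we finished quickly. The homework is not hard; if anything it is too easy. He still needs extra learning material. Right now Saisai basically only knows single words.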
-
Golang Time Parse
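    case '.', ',': // ,000, or .000, or ,999, or .999 - repeated digits for fractional seconds.
Reading the Go time package source: 000 is only valid as a fractional-seconds layout when it is preceded by a , or a .
Something like 2006-01-02 150405000 cannot be parsed, no matter how you configure the format.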
-
Sore Throat Is Gone
My throat suddenly stopped hurting, which makes me happy. The cold of the last few days is more or less over.
-
Make Lazy Expand Var
= is recursively expanded (lazy): the right-hand side is stored as written and expanded every time the variable is referenced.
:= is simply expanded (immediate): the right-hand side is expanded once, at the point where the variable is defined.
The rest below was auto-generated by Copilot, and only part of it is right:
?= is conditional assignment: the variable is set only if it is not already defined.
+= appends to the variable's existing value.
!= is shell assignment, not "not equal": it runs the right-hand side as a shell command and assigns the command's output to the variable.
^=, %= and *= are not GNU Make assignment operators. The "Finding and Removing the First / Last / All Matches" descriptions Copilot gave for them are made up.