• Note2 Build A Llm From Scratch

    I need to read the PyTorch docs to find out what the first step in this code actually does, so that I can understand how MultiHeadAttention works.

    import torch
    import torch.nn as nn

    class MultiHeadAttention(nn.Module):
        def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
            super().__init__()
            assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
    
            self.d_out = d_out
            self.num_heads = num_heads
            self.head_dim = d_out // num_heads  # 1: each head gets an equal slice of d_out
            self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
            self.out_proj = nn.Linear(d_out, d_out)  # 2: linear layer that combines the head outputs
            self.dropout = nn.Dropout(dropout)
            # Causal mask: ones above the diagonal mark the future positions to hide.
            self.register_buffer(
                "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
            )
    
        def forward(self, x):
            b, num_tokens, d_in = x.shape
            keys = self.W_key(x)  # 3: shape (b, num_tokens, d_out)
            queries = self.W_query(x)  # 3
            values = self.W_value(x)  # 3
    
            # 4: split d_out into heads: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim)
            keys = keys.view(b, num_tokens, self.num_heads, self.head_dim)
            values = values.view(b, num_tokens, self.num_heads, self.head_dim)
            queries = queries.view(b, num_tokens, self.num_heads, self.head_dim)
    
            # 5: move the head axis forward: -> (b, num_heads, num_tokens, head_dim)
            keys = keys.transpose(1, 2)
            queries = queries.transpose(1, 2)
            values = values.transpose(1, 2)
    
            # 6: per-head dot products, shape (b, num_heads, num_tokens, num_tokens)
            attn_scores = queries @ keys.transpose(2, 3)
            mask_bool = self.mask.bool()[:num_tokens, :num_tokens]  # 7: truncate the mask to the sequence length
    
            attn_scores.masked_fill_(mask_bool, -torch.inf)  # 8: hide future tokens before the softmax
    
            # Scale by sqrt(head_dim), then normalize each row into attention weights.
            attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
            attn_weights = self.dropout(attn_weights)
    
            context_vec = (attn_weights @ values).transpose(1, 2)  # 9: shape (b, num_tokens, num_heads, head_dim)
            # 10: merge the heads back together; d_out = num_heads * head_dim
            context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out)
            context_vec = self.out_proj(context_vec)  # 11: output projection
            return context_vec
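
    To see what the view and transpose calls (# 4 and # 5) do to the data, a small shape experiment may help. This is only a sketch with made-up sizes (1 batch, 3 tokens, 2 heads of width 2) and a fake keys tensor filled with consecutive numbers, so it is easy to track where each value ends up.

```python
import torch

# Made-up sizes for illustration only.
b, num_tokens, num_heads, head_dim = 1, 3, 2, 2
d_out = num_heads * head_dim

# Fake "projected keys": token 0 is [0, 1, 2, 3], token 1 is [4, 5, 6, 7], ...
keys = torch.arange(b * num_tokens * d_out, dtype=torch.float32).view(b, num_tokens, d_out)

# Step 4: cut each token's d_out values into num_heads chunks of head_dim.
split = keys.view(b, num_tokens, num_heads, head_dim)

# Step 5: put the head axis before the token axis so each head is its own matrix.
heads = split.transpose(1, 2)  # (b, num_heads, num_tokens, head_dim)
print(heads.shape)
print(heads[0, 0])  # head 0 holds the first two values of every token
print(heads[0, 1])  # head 1 holds the last two values of every token

# Step 6 then runs one matmul per head in a single batched call.
scores = heads @ heads.transpose(2, 3)  # (b, num_heads, num_tokens, num_tokens)
print(scores.shape)

# Steps 9 and 10 undo the split: transpose back, then flatten the heads.
merged = heads.transpose(1, 2).contiguous().view(b, num_tokens, d_out)
print(torch.equal(merged, keys))
```

    The contiguous() call is needed because transpose only changes how the tensor is indexed, and view requires a memory layout that matches the requested shape.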
    
  • Note1 Build A Llm From Scratch

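    1. In a sentence like "I am a student.", the token ID of "am" is a "three-dimensional" vector such as [xx, xx, xx]. Chapter 2 of the book covers encoding tokens, which turns each token into a single number. So how does it suddenly become a three-dimensional vector in Chapter 3? What operations happen in between?

    2. In the Chapter 3 section "Implementing self-attention with trainable weights", the book says "we successfully projected the six input tokens from a three-dimensional onto a two-dimensional embedding space". Why convert three dimensions into two? And what does "embedding space" refer to here?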
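
    One possible way to look at both questions, as a sketch rather than the book's exact code: the integer token ID is used as a row index into a trainable embedding table, which gives the 3-value vector, and a trainable linear layer then maps that vector to 2 values. The toy vocabulary and the sizes 3 and 2 below are assumptions chosen to match the numbers in the questions.

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

# Hypothetical toy vocabulary: after tokenization each token is one integer.
vocab = {"I": 0, "am": 1, "a": 2, "student": 3, ".": 4}
token_ids = torch.tensor([vocab[t] for t in ["I", "am", "a", "student", "."]])

# An embedding layer is a lookup table with one trainable row per token ID.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)
inputs = embedding(token_ids)  # shape (5, 3): one 3-value vector per token
print(inputs[1])               # the vector for "am" is row 1 of the table

# A trainable projection from d_in=3 to d_out=2, like W_query in the notes above.
W_query = nn.Linear(3, 2, bias=False)
queries = W_query(inputs)      # shape (5, 2)
print(queries.shape)
```

    Under this reading, "embedding space" just means the vector space the token vectors live in, and the sizes 3 and 2 are small values picked so the numbers are easy to follow; the input and output dimensions do not have to differ.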

  • Git Pull Pr

    git fetch origin pull/265/head:pr-265

  • Github Pages Cloudflare

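    After I moved the custom domain of my GitHub Pages site to Cloudflare, the site got stuck in an endless 301 redirect loop.

    The cause is that Cloudflare's default SSL setting is "Flexible": encryption is enabled only between the visitor and Cloudflare. This avoids browser security warnings, but all connections between Cloudflare and your origin server are made over HTTP.

    My GitHub Pages site has Enforce HTTPS enabled, so it answers every HTTP request from Cloudflare with a redirect to HTTPS, and the loop never ends.

    However, I have another GitHub Pages project where Enforce HTTPS cannot be enabled. The settings page shows "Enforce HTTPS — Unavailable for your site because your domain is not properly configured to support HTTPS". I still don't understand why, or how to fix it. For now I have configured Cloudflare to always redirect to HTTPS.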
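
    The loop can be reproduced with a toy model. The sketch below is not real networking code; the function names are made up, and the only assumption is that the proxy always contacts the origin with a fixed scheme no matter which scheme the visitor used.

```python
def origin_response(scheme: str) -> int:
    """Toy origin with Enforce HTTPS on: redirect plain HTTP, serve HTTPS."""
    return 301 if scheme == "http" else 200


def count_redirects(edge_to_origin_scheme: str, max_hops: int = 5):
    """Return how many redirects happen before a 200, or None if none arrives.

    The visitor follows every redirect to https://, but the proxy keeps
    talking to the origin with edge_to_origin_scheme regardless.
    """
    for hops in range(max_hops):
        if origin_response(edge_to_origin_scheme) == 200:
            return hops
    return None  # gave up: this is the redirect loop


print(count_redirects("http"))   # None
print(count_redirects("https"))  # 0
```

    In this model the loop ends as soon as the proxy talks HTTPS to the origin, which is what Cloudflare's "Full" SSL mode does.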

  • Autogen Question 1

    Here is a snippet of AutoGen code:

    selector_group_chat = SelectorGroupChat(
        [add_agent, multiply_agent, subtract_agent, divide_agent, identity_agent],
        model_client=OpenAIChatCompletionClient(model="gpt-4o"),
        termination_condition=termination_condition,
        allow_repeated_speaker=True,  # Allow the same agent to speak multiple times, necessary for this task.
        selector_prompt=(
            "Available roles:\n{roles}\nTheir job descriptions:\n{participants}\n"
            "Current conversation history:\n{history}\n"
            "Please select the most appropriate role for the next message, and only return the role name."
        ),
    )

    What does participants in selector_prompt mean?
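
    A sketch of how such a template gets filled, using plain str.format. It rests on an assumption to be checked against the AutoGen documentation: that {roles} is replaced by "name : description" lines and {participants} by the list of candidate agent names. The two agents and the history line are made up for illustration.

```python
selector_prompt = (
    "Available roles:\n{roles}\nTheir job descriptions:\n{participants}\n"
    "Current conversation history:\n{history}\n"
    "Please select the most appropriate role for the next message, and only return the role name."
)

# Hypothetical agents as (name, description) pairs; not from the original snippet.
agents = [
    ("add_agent", "Adds 1 to the number."),
    ("multiply_agent", "Multiplies the number by 2."),
]

# Assumed formats: roles -> "name : description" lines, participants -> list of names.
roles = "\n".join(f"{name} : {desc}" for name, desc in agents)
participants = str([name for name, _ in agents])
history = "user : Turn 1 into 4."

filled = selector_prompt.format(roles=roles, participants=participants, history=history)
print(filled)
```

    If that assumption holds, the labels in this prompt are swapped: the descriptions arrive through {roles}, while {participants} only carries the names.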

  • Python Lambda

    Method 1: use a default argument

    functions = []
    for i in range(10):
        functions.append(lambda x=i: print(x))
    
    for f in functions:
        f()
    

    Method 2: use a closure

    def create_printer(i):
        def inner():
            print(i)
        return inner
    
    functions = [create_printer(i) for i in range(10)]
    
    for f in functions:
        f()
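
    Both methods work around late binding: a lambda looks up the loop variable when it is called, not when it is created. A minimal sketch of the problem and the fixes follows; functools.partial is added as a third option that is not in the notes above.

```python
from functools import partial

# Problem: every lambda reads i at call time, after the loop has finished.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]
print(late_results)  # [2, 2, 2]

# Fix 1: a default argument is evaluated once, when the lambda is created.
default_bound = [lambda x=i: x for i in range(3)]
default_results = [f() for f in default_bound]
print(default_results)  # [0, 1, 2]

# Fix 2: functools.partial stores the current value of i with the function.
partial_bound = [partial(lambda x: x, i) for i in range(3)]
partial_results = [f() for f in partial_bound]
print(partial_results)  # [0, 1, 2]
```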
    
  • weekend

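    Saturday morning I took Saisai to his English class and paid the tuition for next semester. It is a bit expensive, and I feel we should switch to another school.

    At noon Saisai and I played Zelda and found the shrine inside the castle.

    Xigua asked whether I wanted to play mahjong. I said no, so she went off to invite other people.

    Around 4 pm I went skateboarding. After a while Xigua called to say we should have dinner together and play Texas Hold'em, so I skateboarded to the branch by the subway to eat with her. In the Hold'em game I only got back to even on the last hand, by luck. One hand I played badly: my pot-sized bet on the flop got called, and on the turn I went all in with top pair and a flush draw and got called again.

    On Sunday I got up and took Saisai to Lujiazui Center to have lunch with Qian. Saisai picked out a set of cups at Starbucks. He had first chosen a thermos, but just before we paid he spotted a four-piece cup set and wanted that instead, so we swapped. Next time we pass a Starbucks I will buy him a thermos.

    In the afternoon I studied English with him for a while, RAZ level A.

    In the evening his mom wanted to do his English homework with him. She called him several times and he ignored her. She lost her temper, dragged him over, and asked why he had not answered. I stepped in between them, and Saisai leaned on me and cried for a while. Then I did the homework with him and we finished quickly. The homework itself is not hard; if anything it is too easy. He needs extra learning material. Right now Saisai basically only knows individual words.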

  • Golang Time Parse

    case '.', ',': // ,000, or .000, or ,999, or .999 - repeated digits for fractional seconds.

    Reading the Go time library source: the 000 is only valid when it is preceded by a , or a .

    A value like 2006-01-02 150405000 cannot be parsed, no matter how you configure the format.
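
    One workaround is to insert the separator before parsing. The sketch below shows the idea in Python, to stay consistent with the other examples in these notes; the helper name is made up, and it assumes the fraction is always the last three digits. In Go, the rewritten string should then match a layout such as "2006-01-02 150405.000".

```python
from datetime import datetime


def insert_fraction_separator(stamp: str, digits: int = 3) -> str:
    """Insert '.' before the trailing fractional-second digits."""
    if len(stamp) <= digits or not stamp[-digits:].isdigit():
        raise ValueError(f"expected at least {digits} trailing digits: {stamp!r}")
    return f"{stamp[:-digits]}.{stamp[-digits:]}"


fixed = insert_fraction_separator("2006-01-02 150405123")
print(fixed)  # 2006-01-02 150405.123

# Sanity check with Python's own parser, which also accepts the dotted form.
parsed = datetime.strptime(fixed, "%Y-%m-%d %H%M%S.%f")
print(parsed.microsecond)  # 123000
```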

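  • Sore Throat Is Gone

    My throat suddenly stopped hurting, which makes me happy. The cold of the past few days has more or less run its course.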

  • Make Lazy Expand Var

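    = is Recursively Expanded (lazy: the right-hand side is expanded every time the variable is used)

    := is Immediately Expanded (the right-hand side is expanded once, at definition time; GNU make calls this "simply expanded")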
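
    A rough Python analogy for the two flavors (this is not make syntax): a recursively expanded variable behaves like a function that rebuilds its value on every use, and an immediately expanded one behaves like a string computed once.

```python
name = "old"

# Like `GREETING = hello $(NAME)`: evaluated again on every use.
recursive = lambda: f"hello {name}"

# Like `GREETING := hello $(NAME)`: evaluated once, right here.
simple = f"hello {name}"

name = "new"

lazy_value = recursive()
print(lazy_value)  # hello new
print(simple)      # hello old
```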

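    The lines below were auto-generated by Copilot. Only ?= and += are described correctly; the other entries are not accurate for GNU make.

    ?= is Conditionally Assigned (set only if the variable is not already defined)

    += is Appending Variable

    != is Shell Assignment (runs a shell command and stores its output), not "not equal"

    ^=, %= and *= are not GNU make assignment operators

    := is Simply Expanded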
