-
Virgo
At a coffee shop, I overheard a conversation between a customer and a member of staff.
Is there no restroom in the shop?
Right, it's next door, inside the Nuanguang mall.
How far is it?
Depends on how fast you walk.
NONONO, how far it is does not depend on how fast you walk~
-
Daily Log
Two days ago I watched Bilibili's New Year's Eve gala and thought it was pretty good, so today I bought some Bilibili stock.
Right after getting to work today I ran into two incidents. First, a machine in FWS SIN failed to come back up after a reboot, possibly because of a bad fstab entry. Second, after I updated the Hickwall Kafka cluster configuration, the operator restarted the Pods directly, so the whole day was spent on data replication.
-
Disk Automatically Unmounts Immediately After Mounting
When it happens, it's incredibly frustrating: you've had a disk replaced on a Linux box, the disk has shown up under a different name in /dev, so you edit /etc/fstab and then try to mount the disk.
The command runs without error, but the disk isn't mounted and doesn't appear in df.
This documentation details the likely cause and how to resolve it.
If you look in dmesg, you might see something like the following:
[  462.754500] XFS (sdc): Mounting V5 Filesystem
[  462.857216] XFS (sdc): Ending clean mount
[  462.871119] XFS (sdc): Unmounting Filesystem
Which, whilst it shows the disk is being unmounted almost immediately, isn't otherwise very helpful: it doesn't tell us why.
However, if you look in syslog (e.g. /var/log/messages, journalctl or /var/log/syslog), you may well see the same events logged with a couple of additional relevant lines:
kernel: XFS (sde): Mounting V5 Filesystem
kernel: XFS (sde): Ending clean mount
systemd: Unit cache2.mount is bound to inactive unit dev-sdc.device. Stopping, too.
systemd: Unmounting /cache2...
kernel: XFS (sde): Unmounting Filesystem
systemd: Unmounted /cache2.
We can now see that the init system, systemd, decided to unmount the filesystem:
systemd: Unit cache2.mount is bound to inactive unit dev-sdc.device. Stopping, too.
The reason for this is that at boot time systemd-fstab-generator generates, in effect, a dynamic unit file for each mount. From the output above we can tell that the disk used to be sdc but is now sde, despite fstab saying:
/dev/sde /mnt/cache2 xfs defaults,nofail 0 0
When we issue the command:
mount /cache2
systemd picks up on the fact that it has an inactive unit file (inactive because the block device has gone away) which should be mounted to that path, decides there's a conflict and that it knows better, and unmounts your mount again. If you're in this position, you should be able to quickly resolve it with a single command:
systemctl daemon-reload
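Before and after the reload, it should be possible to confirm systemd's view of the mount with something like the following (the unit and mount names are taken from the logs above; adjust them for your system):

```
# show the generated mount unit's state and the device unit it is bound to
systemctl show -p ActiveState,BindsTo cache2.mount

# after daemon-reload, retry the mount and confirm it sticks
mount /cache2
df -h /cache2
```

If BindsTo still points at the old device unit (dev-sdc.device here), the generated units have not been regenerated yet.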
Keep in mind that if your disk moves back to its original name following a reboot, you'll be back at this point, with systemd deciding you can't have wanted to mount your disk after all.
systemd has a bug report for this, raised in 2015 and seemingly still unresolved (it's certainly still attracting complaints at the time of writing). Rather worryingly, it suggests that the above will not always resolve the issue, and proposes a further workaround instead.
-
Linux Shell Variable Expansion
What is the difference between `a='a' echo "$a"` and `a='a' ; echo "$a"`?
My current understanding is that it comes down to when (in what order) the shell expands $a. Without the semicolon, $a is expanded first and only then is the command executed; the a='a' part is a temporary assignment that belongs to the command itself.
The shell expands $XX references before executing the current command (unless they are inside single quotes, in which case they are not expanded at all).
With a semicolon or &&, there are two separate commands, and by the time the second one runs the variable has already been assigned. Without a semicolon or &&, a='a' is part of the single command, and since bash expands variables before executing the command, $a has no value yet at expansion time.
One more note: by the time echo itself runs, $a actually does have a value in echo's environment; it's just that the argument echo received is the already-expanded empty string "", not a reference to the environment variable.
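The timing difference above can be demonstrated in a few lines (a minimal sketch; any POSIX shell should behave the same):

```shell
# Temporary assignment: a='x' applies only to echo's environment,
# but "$a" is expanded by the *current* shell first, so it is empty here.
unset a
a='x' echo "value: $a"          # prints "value: "

# With a separator, the assignment completes before echo runs.
a='x' ; echo "value: $a"        # prints "value: x"

# The temporary assignment IS visible to the command itself,
# e.g. a child shell that expands $a on its own:
unset a
a='x' sh -c 'echo "child: $a"'  # prints "child: x"
```

The third case shows that the assignment is not discarded; it is exported into the command's environment, which is exactly why tools that read environment variables themselves (unlike echo) do see it.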
-
org.xerial.snappy.Snappy
java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
Possibly because the /tmp mount point had dropped off?
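That would be consistent with how snappy-java works: on first use it extracts its bundled native library into the JVM temp directory (java.io.tmpdir, /tmp by default), and if that directory is missing, full, or mounted noexec, the static initialiser fails and surfaces as exactly this NoClassDefFoundError. A hedged sketch of a workaround (the paths and jar name are placeholders):

```
mkdir -p /var/tmp/snappy
java -Dorg.xerial.snappy.tempdir=/var/tmp/snappy -jar app.jar
```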
-
Resize Fs
lsblk                                           # inspect current block devices
fdisk /dev/nvme0n1                              # create the new partition (nvme0n1p4)
partprobe                                       # re-read the partition table
pvcreate /dev/nvme0n1p4                         # initialise the partition as an LVM PV
vgextend VolGroup00 /dev/nvme0n1p4              # add the PV to the volume group
lvdisplay                                       # check the logical volumes
lvextend -l +100%FREE /dev/VolGroup00/lv_root   # grow the LV into the free space
lvs                                             # confirm the new LV size
resize2fs /dev/VolGroup00/lv_root               # grow the ext filesystem to match
df -h                                           # verify the extra space is visible
-
Tanyan
When dealing with someone who annoys me, I just want to get their business over with as quickly as possible, and that ends up letting them take advantage of me. Damn, it's a dilemma.
-
Respect
In how people get along, I think the most important thing is respect.
But everyone understands respect differently, so I don't insist on it. Reciprocity, I suppose.
-
Update Kafka Cert
Before restarting ZK, first update the certificates inside the brokers and reload them:
export c="" && for i in {0..9} ; do
  # regenerate the keystore/truststore files inside the broker pod
  k exec -ti $c-shaxy-b-kafka-$i -npro-kafka -- sh -c 'export CERTS_STORE_PASSWORD=$(grep listener.name.replication-9091.ssl.keystore.password /tmp/strimzi.properties | cut -f2 -d=) && sh /opt/kafka/kafka_tls_prepare_certificates.sh'
  # re-apply the truststore location so the broker reloads it dynamically
  k exec -ti $c-shaxy-b-kafka-$i -npro-kafka -- bin/kafka-configs.sh \
    --bootstrap-server 127.0.0.1:9092 \
    --entity-type brokers --entity-name $i --alter \
    --add-config listener.name.replication-9091.ssl.truststore.location=/tmp/kafka/cluster.truststore.p12
done
This command does appear to reload the certificates (even though the path and file names are unchanged): when I deleted the truststore in between and then ran the command, it errored out saying the file did not exist.
I also ran a "destructive" test: restarting one broker directly without running kafka-configs.sh made all the other brokers fail certificate validation; running kafka-configs.sh again made the errors stop. So the script really does load the new certificates.
Following this procedure, all the RB clusters were restarted with zero impact.
But about 6 of the XY clusters still had problems and needed broker restarts to recover.
On those brokers, running the kafka-configs command above failed as follows:
Error while executing config command with args '--bootstrap-server 127.0.0.1:9092 --entity-type brokers --entity-name 0 --alter --add-config listener.name.replication-9091.ssl.truststore.location=/tmp/kafka/cluster.truststore.p12'
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='0'): Validation of dynamic config update of SSLFactory failed: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
	at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:345)
	at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:297)
	at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:90)
	at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
Caused by: org.apache.kafka.common.errors.InvalidRequestException: Invalid config value for resource ConfigResource(type=BROKER, name='0'): Validation of dynamic config update of SSLFactory failed: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
-
Kafka Unclean Leader Election
I just noticed the following behaviour; is this normal?
If a topic partition looks like the one below, unclean leader election will not pick broker 0 as leader; you need to trigger it again by setting unclean.leader.election.enable=true, even if the topic already has it set to true:
{
  "PartitionErrorCode": 72,
  "PartitionID": 43,
  "Leader": -1,
  "LeaderEpoch": 21,
  "Replicas": [0, 3],
  "Isr": [3],
  "OfflineReplicas": [3]
}
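Two ways I would expect to re-trigger the election (the topic name here is a placeholder; the second tool is available since Kafka 2.4):

```
# re-apply the topic config to nudge the controller, even if it is already true
bin/kafka-configs.sh --bootstrap-server 127.0.0.1:9092 \
  --entity-type topics --entity-name my-topic --alter \
  --add-config unclean.leader.election.enable=true

# or request an unclean election for the stuck partition explicitly
bin/kafka-leader-election.sh --bootstrap-server 127.0.0.1:9092 \
  --election-type UNCLEAN --topic my-topic --partition 43
```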