[TOC]
正则表达式相关
开始和结束符 作为或者出现
^和$写到[]里面就是固定的了.
^放到[]并且在第一位时候eg[ ^abc ]表示非abc的意思. 当出现在中间eg [a^bc]表示a、b、c、^ 这四个字符的其中一个。
| 要想匹配a、b 或 c或行开头应该怎么写呢?不能写成 [ ^a bc]。但可以写成^[abc], 或者可以写成(^ | [abc]) |
示例程序
###统计PRF指标
SELECT tp, fp, tn, fn, recall, precision, 2*precision*recall / (precision + recall) as f1
FROM
(
SELECT tp, fp, tn, fn, tp*1.0/(tp+fn) as recall, tp*1.0/(tp+fp) as precision
FROM
(
SELECT sum(tp) as tp, sum(fp) as fp, sum(tn) as tn, sum(fn) as fn
FROM
(
SELECT CAST(SPLIT_PART(flag, ',', 1) AS BIGINT) AS tp, CAST(SPLIT_PART(flag, ',', 2) AS BIGINT) AS fp,
CAST(SPLIT_PART(flag, ',', 3) AS BIGINT) AS tn, CAST(SPLIT_PART(flag, ',', 4) AS BIGINT) AS fn
FROM
(
SELECT
CASE ---- TP FP TN FN
WHEN label_predict = 1 AND label = 1 THEN '1,0,0,0' --tp
WHEN label_predict = 1 AND label = 0 THEN '0,1,0,0' --fp
WHEN label_predict = 0 AND label = 0 THEN '0,0,1,0' --tn
WHEN label_predict = 0 AND label = 1 THEN '0,0,0,1' --fn
END AS flag
FROM
(
SELECT label, CASE WHEN score >= 0.5 THEN 1 ELSE 0 END AS label_predict
FROM du_reader_sentence_classify_validation_output_ori
WHERE ds = '${bizdate}'
) A
) B
) C
) D
) E
计算不同threshold下的覆盖情况
SELECT D.threshold, D.avg_confidence_rouge, D.number, E.all_number, D.number * 1.0/E.all_number as ratio
FROM
(
SELECT "1" AS key, threshold, AVG(confidence_rouge) as avg_confidence_rouge, COUNT(1) as number
FROM
(
SELECT threshold, answer_score, confidence_rouge
FROM
(
SELECT "1" as key, answer_score, confidence_rouge FROM search_du_reader_mrc_predict_result
) A
JOIN
(
SELECT key, CAST(threshold AS DOUBLE) AS threshold
FROM
(
SELECT TRANS_ARRAY(1, ',', key, threshold) AS (key, threshold)
FROM
(
SELECT "1" as key, '0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1' as threshold FROM dual
) B1
) B2
) B ON A.key = B.key
WHERE answer_score >= threshold
) C
GROUP BY threshold
) D
JOIN
(
SELECT "1" as key, COUNT(1) as all_number FROM search_du_reader_mrc_predict_result
) E ON D.key = E.key
DISTRIBUTE BY "1" SORT BY threshold
;
- 高频二阶bert, 覆盖pv: 37%
- 类目改写: 覆盖pv: 42%
- 中长尾二阶bert, 覆盖pv: 29%
- 中长尾替换改写, 覆盖pv: 28%
- 意图改写, 覆盖pv: 27%
- session改写, 覆盖pv: 22%
- 少结果&低质量改写, 覆盖pv: 15%左右
- 向量召回, 覆盖pv: 8%左右
- 新增同义词覆盖pv: +35%左右
方案1: 关闭: 高频二阶bert + 类目改写 + session改写 + 向量召回
方案2: 关闭: 高频二阶bert + 类目改写 + session改写 + 向量召回 + 关闭新增同义词 + 意图改写 + 中长尾替换改写
-
ha3缺口比较大, 需要加资源. (缺口3.5w的core)
-
周四重新压测(或者周三)
-
ha3 70%的时候开始降级,
-
昨天压测, 改写和phrase各降低了10%, 但是还没抗住.
-
昨天压测的时候, 无结果率正常的高. 不是超时. 日常1点几, 后面会上升到3-4%. 随着压测加量, 会提升. (别离. 兆逸)
重点在于理解
1: 把问题再升华下, 提升到query理解这方面
2: mall tab 尝试transfer_learning 试试
3: 不要表现的太明显, 自己的想法不要直接表现出来.
4: hb属于和谁都有隔阂的, 而且举棋不定, 没有自己的想法. 但是也不要表现的太明显
5: 点击反馈和意图理解结合起来.
6: 和某个人靠近不要太明显, 最近刷pv有点刷的太积极了.容易引起老大和同事的误会
’
