Hive 创建表时正则表达式与java代码中一致,RegexSerDe 中用来表示反转码
1. hive 创建表时正则表达式与java代码中一致,regexserde 中用来表示反转码
create table inst_cn_3 (
ip string,
time string,
mac string,
lan string,
ver string,
lc string,
pn string,
reg string,
vrf string,
line string)
row format serde ‘org.apache.hadoop.hive.contrib.serde2.regexserde’
with serdeproperties (
“input.regex” = “- *([d.]+) *[([d]{2}/[w]+/[d]{4}:[d]{2}:[d]{2}:[d]{2}s++[d]+)] *get */mx3/inst/([0-9a-f]{12})/ver=([.d]+),lan=(0x[w]+)(?:,lc=([w]+))(?:,pn=([w]+))(?:,reg=([0-1]))(?:,vrf=([w]+))?.*”
)
stored as textfile;
alter table inst_cn_3 add columns(line string);
ALTER TABLE inst_cn_3 SET SERDEPROPERTIES (
“input.regex” = “- ([d.]+) [([d]{2}/[w]+/[d]{4}:[d]{2}:[d]{2}:[d]{2}s++[d]+)] GET /mx3/inst/([0-9a-f]{12})/ver=([.d]+),lan=(0x[w]+)(?:,lc=([w]+))(?:,pn=([w]+))(?:,reg=([0-1]))(?:,vrf=([w]+))?.*|(.*)”
);
select * from inst_cn_3 limit 100;
select
line
from inst_cn_3
where
1=1
and mac is null
and line is not null
and !(line rlike ‘.*unknowuser00.*’)
;
hadoop fs -cp /mnt/nfs/log/statcn/inst/inst_cn_3.*.txt /hive/warehouse/inst_cn_3/
2011.06.09 hive时间处理
1.select
from_unixtime(unix_timestamp(’02/May/2011:00:00:00 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’02/May/2011:23:59:59 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’03/May/2011:00:00:00 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’03/May/2011:23:59:59 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’)
from
stat_c_log
where
1=1
and partkey = ‘20110503’
and logType = ‘inst_cn_3’
and url rlike ‘/mx3/inst/.*’
limit
10
;
Reclaim.ai
为优先事项创建完美的时间表
67 查看详情
2.
select
from_unixtime(unix_timestamp(’02/May/2011:00:00:00 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’02/May/2011:23:59:59 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’03/May/2011:00:00:00 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
from_unixtime(unix_timestamp(’03/May/2011:23:59:59 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’),’yyyy-MM-dd HH:mm:ss’) ,
round((unix_timestamp(’02/May/2011:00:00:00 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’)-4*3600)/(3600*24))*(3600*24*1000),
round((unix_timestamp(’02/May/2011:23:59:59 +0800′,’dd/MMMMM/yyyy:HH:mm:ss Z’)-4*3600)/(3600*24))*(3600*24*1000)
from
stat_c_log
where
1=1
and partkey = ‘20110503’
and logType = ‘inst_cn_3’
and url rlike ‘/mx3/inst/.*’
limit
10
;
2012.03.01
1. Hive 方法注册类 FunctionRegistry
2012.06.14
1. set hive.cli.print.header=true; 可以设置hive shell的输出.
2012.06.26
1. hive cdh4b2 使用arichive 对表归档后, 使用select line 对归档后的partition查询时,, 报FileNotFoundException 异常。
https://issues.apache.org/jira/browse/MAPREDUCE-2704 是因为CombineFileInputFormat constructs new Path objects by converting an existing path to a URI, and then only pulling out the “path” part of it. This drops the scheme and host, which makes CombineFileInputFormat fail if the paths are on a filesystem other than the default one.
2012.07.16
1. EXPLAIN EXTENDED hive_query; 查看运行.
2012.07.29
1. DESCRIBE FORMATTED mock; 显示columns, location, params等.
2. /src/ql/src/test/queries/clientpositive/

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。
如发现本站有涉嫌抄袭侵权/违法违规的内容, 请发送邮件至 chuangxiangniao@163.com 举报,一经查实,本站将立刻删除。
发布者:程序猿,转转请注明出处:https://www.chuangxiangniao.com/p/1084182.html
微信扫一扫
支付宝扫一扫