
Strata Data Conference Beijing 2017

July 13-15, 2017

Beijing

From ¥4,451

Conference Overview

【Speakers】

Past speakers: Strata + Hadoop World 2016 presenters

Data Innovation

Bin Fan
Software Engineer, Alluxio
Bin Fan is a software engineer at Alluxio. Bin is one of the top committers on the Alluxio project. Prior to Alluxio, Bin worked at Google building next-generation storage infrastructure, where he won Google’s Technical Infrastructure award. Bin has a PhD in computer science from Carnegie Mellon University.

Haoyuan Li
CEO, Alluxio
Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus). He is also a computer science PhD candidate at AMPLab, UC Berkeley, where he co-created Alluxio, an open source, memory-speed virtual distributed storage system. He is a founding committer of Apache Spark. Before the AMPLab, he worked at Conviva and Google. Haoyuan has an MS from Cornell University and a BS from Peking University.

Dongjie Shi
Software Engineer, Intel
Dongjie Shi is a senior software engineer at Intel.

Haojun Wang
Software Architect, Baidu
Haojun Wang is a tech lead of Baidu’s US autonomous driving car team. Currently, Haojun is driving the in-car computing platform and offline data platform. Prior to Baidu, he worked at the IBM Silicon Valley Lab, focusing on database core development and big data processing. Haojun received his PhD in computer science from the University of Southern California.

Fangjin Yang
CEO, Imply
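Fangjin Yang is a co-creator and core developer of the open source Druid project and a cofounder and CEO of Imply, a technology startup based in San Francisco. Previously, Fangjin was a senior engineer at companies including Metamarkets and Cisco. Fangjin holds a bachelor's degree in electrical engineering and a master's degree in computer engineering from the University of Waterloo in Canada.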

Tianlun Zhang
Software Engineer, Intel
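Tianlun Zhang is a developer of Gearpump, Intel's open source stream processing system. Tianlun has long followed big data and distributed computing and focuses on the development and research of stream processing systems.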

Sean Zhong
Senior Developer, Intel
Sean Zhong is a cloud architect in Intel Big Data engineering group. Sean’s expertise is in streaming, and he is the creator of Apache Gearpump as well as a PMC member of Apache Storm. Besides streaming, Sean participates in many other Apache projects, including Hadoop NativeTask and HBase media object storage.

周云庆
Google
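周云庆 is an engineer at Google. A 2011 graduate of Shanghai Jiao Tong University, 周云庆 previously worked at Baidu and Alibaba on the backend of the Phoenix Nest (凤巢) retrieval system and on stream data processing systems, and now works on backend services for Cloud Dataflow at Google.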

孙玄
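孙玄 is a senior system architect at 58.com (58同城), head of the architecture group of its technical committee, an outstanding lecturer at its Product and Technology Academy, and the technical lead for 58.com's instant messaging and C2C businesses. 孙玄 specializes in architecture design and is responsible for the architecture and optimization of 58.com's core systems, which must sustain throughput on the order of tens of billions of requests. A distributed storage expert, 孙玄 has worked on the architecture, design, and implementation of large-scale, high-performance distributed storage systems since 2007, including in-house distributed storage systems, MongoDB, MySQL, Memcached, and Redis. A graduate of Zhejiang University and a former senior engineer at Baidu, 孙玄 took part in the design and implementation of several foundational systems in Baidu's community search department. 孙玄 has spoken on behalf of 58.com at industry conferences including QCon, SACC, DTCC, and Top100 and has written twice for Programmer (《程序员》) magazine.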

富羽鹏
Software Engineer, Alluxio Inc
Yupeng Fu is a software engineer at Alluxio Inc. Before joining Alluxio, he was a software engineer at Palantir Inc and a PhD student at UCSD.

杨克特
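Alibaba
杨克特 (company nickname 鲁尼) has worked in technology R&D at Alibaba since receiving a master's degree in computer science from Zhejiang University in 2011 and is currently a search R&D expert in the offline department of the search business unit.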

罗德祥
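Senior Engineer, Transwarp (星环信息科技（上海）有限公司)
Graduated from Shanghai Jiao Tong University in 2008 with a master's degree from the Department of Computer Science and Engineering.
2008-2014: NVIDIA (Shanghai), working on GPU architecture design and CUDA general-purpose computing.
2014-present: Transwarp, working on Hadoop usability, data migration, and synchronization.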

顾荣
PhD candidate at Nanjing University and core contributor to the Alluxio community.

Data Science and Advanced Analytics

Jianmin Chen
Software Engineer, Google
Jianmin Chen is a software engineer at Google working on TensorFlow. Jianmin has been focusing on large-scale distributed training recently. Before joining the TensorFlow team, he worked on high-performance parallel computing architectures and platforms. In his spare time, you can find Jianmin hiking or hanging out with family and friends.

Zhifeng Chen
Software Engineer, Google
Zhifeng Chen is a software engineer at Google working on TensorFlow. Before joining the TensorFlow team, Zhifeng worked on the large-scale distributed storage systems that power Google Search and Gmail. Outside of work, he loves spending time with his wife and daughter.

Jike Chong
Head of Data Science, YiRenDai/CreditEase
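Dr. Jike Chong is the chief data scientist at Yirendai (NYSE: YRD), where Jike is building out the data department using the “Pantheon” (万神庙) framework and is responsible for anti-fraud risk control and data-driven operations and innovation. Previously, Jike established the data science department at the US hiring platform Simply Hired and, by invitation, advised the White House's science and technology office on big data product design. Jike also served as data science architect for the Kraftwerk fund at the US private equity firm Silver Lake, responsible for applying big data technology to risk control in private equity investing. Jike was a professor and PhD advisor at Carnegie Mellon University and holds a PhD in electrical engineering and computer sciences from UC Berkeley, bachelor's and master's degrees in electrical and computer engineering from Carnegie Mellon University, and eight US patents (five granted, three pending).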

Pengcheng He
Senior Software Engineer, Microsoft
Pengcheng He is a senior software engineer at Microsoft, where he works on large-scale computation, especially large-scale machine-learning algorithms. Previously, he worked on machine learning for about 2 years at Tencent. Pengcheng holds a degree from GUCAS.

Yihua Huang
Professor, Nanjing University(PASA BigData Lab)
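Dr. Yihua Huang is a professor and PhD advisor in the Department of Computer Science at Nanjing University and director of the university's PASA Big Data Lab. Yihua is a standing committee member and deputy secretary-general of the big data expert committee of the China Computer Federation (CCF) and director of the big data expert committee of the Jiangsu Computer Society. Yihua's main research area is parallel processing of big data. The team entered the field in 2009, when big data still drew little attention, making it one of the earliest in China to research and teach big data processing. Its work spans big data storage and query, large-scale RDF semantic data query and reasoning, distributed in-memory file systems, Hadoop/Spark system optimization, parallel machine-learning and data mining algorithms, and large-scale machine-learning algorithms and systems. Yihua has published more than 30 big data papers in academic journals and international conferences in China and abroad and has written and published two books and textbooks on big data processing. Yihua has led several national and provincial or ministerial research projects in big data and has conducted joint research with companies and institutions including Google, Intel, the UC Berkeley AMPLab, Microsoft Research Asia, Baidu, Huawei, and ZTE.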

Yi Liu
Intel
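Yi Liu (刘轶) is a senior software engineer at Intel and a core contributor to and PMC member of Apache Hadoop. Yi focuses on contributions to the Hadoop community and on Spark-based data analytics frameworks.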

Angie Ma
COO, ASI
Angie Ma is cofounder and COO of ASI, a London-based startup that offers bespoke data science and engineering training and internships combining the best of academic and startup cultures for partner companies. She is passionate about enabling companies to access a data science skill set that generates real business value. Prior to joining ASI, Angie was a researcher in nanotechnology at UCL.

Ting Wang
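Data Scientist, Yirendai
Ting Wang holds a PhD in computer science from China University of Mining and Technology (Beijing) and has spent nearly five years on research in data mining, large-scale social network analysis, and social computing. During the PhD program, Ting was a visiting student in the database group of Tsinghua University's Department of Computer Science, studying community detection algorithms for large-scale social networks. Now a data scientist at Yirendai, Ting builds financial anti-fraud models and automated personal credit risk analysis systems, integrating multiple data sources to help online financial services identify and respond to risk quickly and accurately in real time.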

Yingsong Zhang
Data Scientist, ASI (爱思数据科学)
Yingsong Zhang is a data scientist at ASI, where she has worked on everything from social media data to special data from clients to build predictive models. Yingsong has published over 10 first-author research papers in top journals and conferences in the field of signal/image processing and has accumulated extensive experience in algorithm design and information representation. She recently completed a three-year postdoc project at Imperial College London developing sampling theory and the application system. Yingsong holds a BA in mathematics, an MSc in artificial intelligence and pattern recognition from one of China’s top universities, and a PhD in signal and image processing from Cambridge University.

张铭
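Professor, Peking University
张铭 is a professor and PhD advisor in the School of Electronics Engineering and Computer Science (信息科学技术学院) at Peking University, the only member from China on the ACM Education Council, chair of the ACM China education committee, and a member of the ACM/IEEE IT2017 curriculum guidelines drafting group. 张铭 entered Peking University in 1984 and went on to earn bachelor's, master's, and doctoral degrees there. Research interests include text mining, social network analysis, and educational big data. 张铭 currently leads ongoing projects funded by the National Natural Science Foundation of China and the Ministry of Education's doctoral program fund and has coauthored more than 100 research papers in A-class conferences and journals such as ICML, KDD, AAAI, IJCAI, ACL, WWW, and TKDE, including an ICML 2014 best paper award. 张铭 has also published teaching research papers at venues such as SIGCSE and L@S, published one academic monograph, and holds six software copyrights and three invention patents. 张铭 has been chief editor of several textbooks, two of which were selected as national 11th Five-Year Plan textbooks; Data Structures and Algorithms (《数据结构与算法》) won the Beijing Outstanding Textbook award and was supported as a national 12th Five-Year Plan textbook. The “Data Structures and Algorithms” course that 张铭 leads has been named a national-level and Beijing municipal-level outstanding course and is also a Ministry of Education shared quality-resource course.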

朱军
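Associate Professor, Tsinghua University
朱军 is an associate professor and PhD advisor in the Department of Computer Science at Tsinghua University, deputy director for teaching of the State Key Laboratory of Intelligent Technology and Systems, and an adjunct associate professor at Carnegie Mellon University. 朱军 works mainly on the foundations of machine learning and Bayesian statistics, efficient algorithms, and related applications, and has published more than 70 papers in leading international journals and conferences including JMLR, PAMI, ICML, and NIPS. 朱军 was invited to serve on the editorial board of TPAMI, a leading journal in artificial intelligence and pattern recognition, served as local co-chair of ICML 2014, and has served as area chair for ICML (2014-2016), NIPS (2013, 2015), UAI (2014-2016), IJCAI 2015, and AAAI 2016. Honors include Microsoft Fellow (微软学者), the CCF Outstanding Doctoral Dissertation Award, the CCF Young Scientist Award, the National Science Fund for Excellent Young Scholars, and the 中创软件 Talent Award. 朱军 was also named to IEEE Intelligent Systems magazine's “AI's 10 to Watch” and selected for the national “Ten Thousand Talents Program” for top young talent and Tsinghua University's 221 Basic Research Talent Program.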

褚崴
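Senior Technical Expert, iDST, Alibaba Cloud Big Data Business Unit
Dr. 褚崴 is a senior technical expert at Alibaba Cloud iDST, responsible for R&D of its distributed machine learning platform products. Previously, 褚崴 was a principal scientist at Microsoft in the US, a scientist at Yahoo Labs, and an associate research scientist at Columbia University. 褚崴 received a PhD in statistical machine learning from the National University of Singapore in 2003 and did postdoctoral research at the Gatsby Unit at University College London from 2003 to 2006. 褚崴's research focuses on machine learning and big data mining, with many years of R&D experience in personalized recommendation systems and search products. 褚崴 has published more than 40 papers in top journals and international conferences and served as a reviewer, has more than 3,000 citations on Google Scholar with an h-index of 28, and received the best paper award at ACM WSDM 2011. In 2016, 褚崴 was selected for the 12th cohort of the national “Thousand Talents Program” (long-term innovation track).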

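Enterprise Applications

Tony Xing
Senior Product Manager, Big Data, Microsoft China Co., Ltd.
Tony Xing is responsible for building the big data platform and for data products and services in Microsoft's Applications and Services Group.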

史少锋
Kyligence
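史少锋 is a technical partner and senior architect at Kyligence and a core developer and PMC member of Apache Kylin, focusing on big data analytics and cloud computing. Previously, 史少锋 was a senior big data engineer in eBay's global analytics infrastructure department and a software architect in IBM's cloud computing division. As a core member of the dev&ops team for Bluemix, IBM's public cloud, 史少锋 was responsible for platform planning, development, and operations.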

崔宝秋
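Chief Architect, Xiaomi
崔宝秋 is the chief architect at Xiaomi and head of the Xiaomi platform team, with more than ten years of experience developing traditional software and internet products. 崔宝秋 received a PhD in computer science from the State University of New York at Stony Brook in 2000. From 2000 to 2006, 崔宝秋 was a senior engineer and then senior R&D manager at IBM, working on core modules such as database optimization and kernel control. From 2006 to 2010, as a principal engineer on the core team of Yahoo Search Technology (YST), 崔宝秋 took part in major Yahoo search engine projects including popular searches, query optimization, and a new-generation query cache. From 2010 to 2012, as a principal engineer at LinkedIn, 崔宝秋 began working with social networks and led R&D for LinkedIn's search products, and was one of the five founding members who open-sourced SenseiDB, a distributed real-time search system. 崔宝秋 joined Xiaomi in 2012 and now leads its server and cloud platform teams.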

谭耀宗
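Enterprise Data Scientist, Thomson Reuters
谭耀宗 is an enterprise data scientist at Thomson Reuters. He has spoken at various conferences on topics including financial sentiment indices, energy economics, and computational financial econometrics. He holds a master's degree in financial engineering from Columbia University and a master's degree in business analytics from the National University of Singapore. He has more than 12 years of experience in financial markets and data analytics, having worked at investment banks, stock exchanges, and a major oil company.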

韩卿
Co-Founder & CEO, Kyligence Inc
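韩卿 is cofounder and CEO of Kyligence and co-creator and vice president (VP) of Apache Kylin, the first VP of an Apache Software Foundation top-level project to come from China. 韩卿 is responsible for Kylin's strategic planning, roadmap, and product design and works to grow the global Apache Kylin community, build its ecosystem, and promote the project. Previously, 韩卿 was head of big data products in eBay's global analytics infrastructure department, chief consultant at Actuate China, and technical director for East China at 卓越动力, with extensive experience in big data, data warehousing, business intelligence, and visual intelligence analysis.

Hadoop Use Cases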

Min Shen
Senior Software Engineer, LinkedIn
Min Shen is an engineer on LinkedIn’s Hadoop infrastructure development team helping to build the next generation Hadoop infrastructure at LinkedIn with better performance and manageability. Min holds a PhD degree in computer science from the University of Illinois with a research interest in distributed computing.

Hui Ju Song
Big Data Engineer, IBM
Hui Ju Song is a curious and passionate software developer very interested in the big data and machine-learning ecosystems. Hui Ju works as a big data engineer at IBM, where he develops spatial-temporal trajectory data analysis programs for custom projects.

Henry Zeng
Senior Solution Architect, IBM
Henry Zeng is a solution architect in IBM’s Big Data Analytics department with more than 10 years of experience in data analytics. Henry is located in Beijing, China.

罗震霄
Senior Software Engineer, Uber
Zhenxiao is a senior software engineer at Uber working on Presto and Parquet. Before joining Uber, he led the development and operations of Presto at Netflix. Zhenxiao has big data experience at Facebook, Cloudera, and Vertica on Hadoop-related projects. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.

Hadoop Internals and Development

Xiao Chen
Software Engineer, Cloudera
Xiao Chen is a software engineer at Cloudera working on HDFS. Previously, he worked on Thomson Reuters’ time series team, focusing on real-time in-memory databases. Xiao was born in Beijing, China. He holds a bachelor’s degree from Beihang University and a master’s degree from New York University.

Rui Li
Software Engineer, Intel
Rui Li earned a master’s degree in computing science from Fudan University in 2013. He is now a software engineer at Intel and a committer on Apache Hive. He is also a contributor to Apache Spark.

Zhe Zhang
Software Engineer, LinkedIn
Zhe Zhang is a software engineer on LinkedIn’s Hadoop team. He’s an Apache Hadoop committer and author of HDFS Erasure Coding, a major feature for Hadoop 3.0. Before LinkedIn, Zhe worked at Cloudera and the IBM T. J. Watson Research Center. Zhe has over 20 research publications and 5 US patents. While at IBM, he received the Research Accomplishment Award and the Outstanding Technology Achievement Award.

刘鹏翔
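Senior Architect, Esgyn (上海易鲸捷信息技术有限公司)
刘鹏翔 is a senior solutions architect at Esgyn (易鲸捷), responsible for the architecture, development, and deployment of industry solutions built on EsgynDB and Apache Trafodion. 刘鹏翔 has extensive experience deploying and tuning massively concurrent distributed systems and SQL-on-Hadoop across industries including internet, online entertainment, banking, telecom, and connected vehicles.

IoT and Real-Time Computing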

Yi Ai
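Technical Manager, Didi Chuxing
Yi Ai heads the BI systems group in Didi Chuxing's big data department and is responsible for the architecture design and development of Didi's real-time big data computing system.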

Maosong Fu
Twitter Inc.
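Maosong Fu is the technical lead for Twitter's real-time compute platform, responsible for services including Heron and Presto, and is one of the original authors of Heron. Maosong focuses on distributed systems and has published several papers at conferences and journals such as SIGMOD. Maosong holds a bachelor's degree from Huazhong University of Science and Technology and a graduate degree from Carnegie Mellon University.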

周明伟
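Architect, Zhejiang Dahua Technology Co., Ltd.
周明伟 joined Zhejiang Dahua Technology Co., Ltd. in 2010 and now works at the Dahua Big Data Research Institute, leading the technical architecture group. 周明伟 was among the first at the company to follow cloud storage and cloud computing technologies and architectures and to consider bringing big data architectures into video surveillance applications. As architect and core developer, 周明伟 led the design and development of the industry's first in-house distributed file system. Having worked continuously on cloud storage and cloud computing technologies and architectures, 周明伟 has a deep understanding of the video surveillance industry and broad exposure to cloud storage and cloud computing, and specializes in distributed systems, high-concurrency services, and high-performance technologies.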

王晨
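CTO, Kunlun Data (昆仑智汇数据科技（北京）有限公司)
王晨 is CTO of Kunlun Data (昆仑数据), chief engineer of the Big Data Center at Tsinghua University's School of Software, and a member of the drafting group for the “Made in China 2025” roadmap (operating systems and industrial software). Before joining Tsinghua University, he was a senior researcher at IBM Research - China and senior manager of its data management technology research department, a member of the core technical leadership team of IBM Software Group's China information management software development center, and head of IBM's global analytics cloud research strategy. He has led and contributed to the development of several new IBM data products and new product technologies. He has published more than 20 papers in leading international conferences and journals in databases and data analytics (SIGMOD, VLDB, TKDE, TVCG, and others), holds more than 50 Chinese and US patents (including pending applications), and reviews for several academic conferences. He is a member of the CCF database expert committee. He holds bachelor's and master's degrees in computer science and technology from Fudan University and an MBA from the joint program of Belgium's Vlerick Leuven Gent Management School and Peking University.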

陈奇
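General Manager, Greater China, Business Unit
陈奇 is general manager for Greater China at Think Big, leading the Think Big team in consulting and services around big data and open source. A pioneer of big data in China, he was among the first to bring big data concepts to China's finance, telecom, and manufacturing sectors and to put them into practice. He built the IBM BigInsights team and led the worldwide launch of BigInsights, led the Intel Hadoop team and helped establish IDH's leading position in the Chinese market, and helped set up Cloudera China. He is a pioneer and advocate of big data in the Asia-Pacific market.

Security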

Hao Hao
Software Engineer, Cloudera
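Hao Hao is a software engineer at Cloudera, which is headquartered in Silicon Valley, California. She works on the Apache Sentry open source project and is a member of the Apache Sentry PMC. During her PhD studies at Syracuse University, her research focused on smartphone system security and network security. Before joining Cloudera, Hao worked on the Search Backend team at eBay Inc., contributing to the search engine for eBay's online shopping platform.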

Anne Yu
Software Engineer, Cloudera
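Anne Yu is a software engineer at Cloudera, which is headquartered in Silicon Valley, California, where she has worked on the development and testing of Apache Sentry since 2014. She is also a member of the Apache Sentry PMC. Before joining Cloudera, Anne worked on the search team at Amazon.com, developing and testing an AWS-based product search engine. She holds a master's degree (ECE) from the college of computer engineering at the University of Tennessee and a master's degree in telecommunications management (MIS) from Oklahoma. Her graduate research focused on image processing and computer vision, and she has two corporate patents as well as academic publications.

Spark and Beyond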

Biao Chen
System Engineer, Cloudera
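Biao Chen is a presales technical manager, industry consultant, and senior solutions architect at Cloudera and a former core developer of the Intel distribution of Hadoop. Biao joined Intel's compiler group in 2006 to develop server middleware, specializing in server software debugging and optimization. Since 2010, Biao has worked on Hadoop product development and solution consulting, with responsibilities including Hadoop productization, HBase performance tuning, and industry solution consulting.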

Yupeng Fu
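Engineer, Alluxio
Yupeng Fu is an engineer at Alluxio and a major contributor to and PMC member of the open source Alluxio project. Before joining Alluxio, Yupeng led a team building storage platforms at Palantir and, before that, pursued PhD studies at the University of California, San Diego. Yupeng holds bachelor's and master's degrees from Tsinghua University.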

Cheng He
Principal Engineer, Huawei
Cheng He is a principal engineer and research manager in Huawei’s Noah’s Ark Lab, where his research interests include traffic measurement and modeling, distributed stream computing, big data stream mining, and online learning. Cheng has led important projects like MBB traffic measurement and modeling, system design for distributed stream big data processing, stream mining, and online learning of massive telecom data for intelligent network management. He has applied for more than 20 patents in China, the EU, and the US in his research area. His current research focuses on designing and developing online ML and stream-mining algorithms–oriented distributed streaming systems to support the intelligent management of large-scale telecom networks.

Shengsheng Huang
Software architect, Intel
Shengsheng (Shane) Huang is a software architect at Intel, where she leads the development of large-scale analytical applications and infrastructure on Spark, and an Apache Spark committer and PMC member. Shane’s area of focus is distributed machine learning, especially deep (convolutional) neural networks. Previously, at NUS (the National University of Singapore), her research focused on large-scale vision data analysis and statistical machine learning. Before that, she worked at Intel as lead engineer on distributed big data frameworks (e.g., Hadoop and Spark) for over six years.

Jianfeng Qian
Researcher, Huawei
Jianfeng Qian is a researcher at Huawei Technologies’s Noah’s Ark Lab. His main research interests are mobile data analysis and stream machine learning. Jianfeng holds a PhD degree in computer science and technology from Zhejiang University.

Jerry Shao
Member of Technical Staff, Hortonworks
Jerry Shao works as a member of the technical staff at Hortonworks focused mainly on Spark, especially Spark core, Spark on YARN, and Spark Streaming. Jerry is an active Apache Spark contributor and Apache Chukwa committer. Prior to Hortonworks, he was a software engineer at Intel working on performance tuning and optimization of Hadoop and Spark.

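Hong Shen
Senior Engineer, Tencent
Hong Shen is a senior engineer in Tencent's data platform department. Hong joined Tencent in 2015 and has worked on building and optimizing large-scale Hadoop and Spark clusters for data processing, and now focuses on research and optimization of distributed compute engines.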

Yiheng Wang
Software engineer, Intel
Yiheng Wang is a software development engineer on the Big Data Technology team at Intel who works in the area of big data analytics. He and his colleagues are developing and optimizing distributed machine-learning algorithms (e.g., neural network and logistic regression) on Apache Spark. He also helps Intel customers build and optimize their big data analytics applications.

Yuhao Yang
Software engineer, Intel
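Yuhao Yang is a software engineer on Intel's Big Data Technology team, focusing on distributed machine-learning applications and frameworks and on collaborating with and supporting enterprises on large-scale machine-learning applications. Yuhao is an Apache Spark contributor who has contributed several algorithms and improvements to Spark MLlib.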

Jeff Zhang
Member of the Technical Staff, Hortonworks
Jeff Zhang has seven years of experience in the big data industry in big data infrastructure as well as on the application level of how to leverage big data tools to get insight from data. He has used Hadoop since 2009 and is a PIG committer. Currently, Jeff is a member of the technical staff at Hortonworks, where he focuses on Tez and Spark.

Hucheng Zhou
Researcher, Microsoft Research
Hucheng Zhou is a researcher for the System Research group at Microsoft Research Asia where he focuses on topics including large-scale learning systems, data-parallel computing, big data, machine learning, mobile computing, program analysis, compiler optimization, computer architecture, and tool development. Hucheng is interested in building a scalable, efficient, fault-tolerant, and easy-to-use distributed learning system, with the belief that such a system could be built on top of existing data-parallel execution engines, thus treated as a learning library. In this way, the entire machine-learning pipeline, including feature preparing, training, online learning, and model serving, could be supported by one single platform like Apache Spark. Hucheng holds a PhD from Tsinghua University.

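Jinqing Zhu
Senior Data Expert, Alibaba
Jinqing Zhu (朱金清, company nickname 穆公) is a senior data expert at Alibaba, currently working on the analysis and development of infrastructure data with a focus on real-time analytics with Spark. On first joining Alibaba, Jinqing worked on MySQL/HBase database administration and data development on the Taobao/Alibaba database technology team. Jinqing holds a master's degree in databases from Renmin University of China and, after graduating, worked at Baidu on administering and tuning advertising databases such as Phoenix Nest (凤巢), leading the largest database split (1-to-N) in Phoenix Nest's history.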

孙垚光
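Architect, Baidu
孙垚光 is an architect for distributed computing at Baidu and the technical lead for offline computing. Since joining Baidu in 2009, 孙垚光 has worked on R&D and optimization of the kernel network protocol stack and of Hadoop/Spark big data systems, gaining a deep understanding of the Hadoop big data ecosystem and extensive hands-on big data experience.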

徐凯
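Senior Engineer, Qunar
徐凯 is a senior engineer in the lodging data department (大住宿数据部) at Qunar and a 2008 graduate of Beijing University of Posts and Telecommunications. 徐凯 is currently responsible for the department's data system architecture and for the design and development of its user profiling and model-based pricing systems.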

李雪岩
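Development Engineer, Qunar
李雪岩 is a data platform R&D engineer in the platform business unit of 北京趣拿软件科技有限公司 (Qunar) and a software engineering graduate of Heilongjiang University. 李雪岩 is mainly responsible for continuous integration and development of the Mesos resource management system and the Alluxio distributed memory management system, providing shared data infrastructure services to the business lines. The work covers release and monitoring of systems including the ELK log ETL platform, Spark and Flink batch and stream processing, and Zeppelin interactive processing.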

王奕恒
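Software Engineer, Intel
王奕恒 is a member of Intel's Big Data Technology team, focusing on big data analytics. He and his colleagues develop distributed machine-learning algorithms on Apache Spark to meet machine-learning needs at big data scale. He also optimizes these algorithms for Intel platforms and helps Intel customers build big data analytics applications for their businesses.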

范斌
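Engineer, Alluxio
范斌 is a core software engineer at Alluxio and holds a PhD from Carnegie Mellon University. 范斌 previously worked at Google and Microsoft Research.

Visualization and User Experience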

崔岸雍
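Senior Product Manager, Alibaba Cloud Data Business Unit
崔岸雍 (company nickname 永翎) works in the Alibaba Cloud data business unit and is responsible for data analysis and visualization products on Alibaba's big data stack. For five years, DataV has provided the technology and design behind large-screen visualizations for Alibaba Group's Double 11 (Singles' Day) event and the group's external visual displays. Over the past two years it has been offered as an external product, serving large-screen data visualization needs in public security, electric power, tobacco, ecommerce, logistics, and other industries.
崔岸雍 has long been active in China's data visualization, data journalism, and open data communities as a cofounder of the data journalism site djchina.org and one of the leads of Open Data China, and organized the Chinese translation of The Data Journalism Handbook.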


【Keynotes】

叶杰平 (Ye Jieping)
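Vice President, Didi Chuxing

Ye Jieping is deputy head of the Didi Chuxing Research Institute, a DiDi Fellow, a tenured professor at the University of Michigan, and a member of the management committee of the university's big data research center. Ye received a PhD in computer science from the University of Minnesota in 2005 and specializes in machine learning, data mining, and big data analytics, with more than 200 papers published in top international conferences and journals in machine learning and data mining. Ye has received best paper awards at KDD and ICML and the NSF CAREER Award, has chaired several top conferences in machine learning and data mining, and serves as an associate editor of the journals IEEE TPAMI, DMKD, and IEEE TKDE.

Big Data at DiDi Chuxing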

Every day, Didi's platform generates over 70TB worth of data, processes more than 9 billion routing requests, and produces over 13 billion location points. In this talk, Ye Jieping will show how AI technologies have been applied to analyze such big transportation data to improve the travel experience for millions of people in China.


Lukas Biewald
Founder & Chief Data Scientist, CrowdFlower

Lukas Biewald is the founder and chief data scientist of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that taps into an on-demand workforce to help companies collect training data and do human-in-the-loop machine learning.


Doug Cutting
Chief Architect, Cloudera

Doug Cutting is the chief architect at Cloudera and the founder of numerous successful open source projects, including Lucene, Nutch, Avro, and Hadoop. Doug joined Cloudera from Yahoo, where he was a key member of the team that built and...

Saturday welcome remarks

Program chairs Jason Dai, Ben Lorica, and Doug Cutting open the second day of keynotes.

Jason (Jinquan) Dai
CTO, Big Data Technologies, Intel

Jason Dai is currently a Senior Principal Engineer and CTO, Big Data Technologies, at Intel. Prior to that, he was a principal architect at Microsoft, responsible for building the large-scale cloud and big data platform that powers some of...

Saturday welcome remarks

Program chairs Jason Dai, Ben Lorica, and Doug Cutting open the second day of keynotes.

Yuanqing Lin
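Director, Institute of Deep Learning (IDL), Baidu

Yuanqing Lin is director of Baidu's Institute of Deep Learning (IDL). He holds a master's degree in optical engineering from Tsinghua University and a PhD in electrical engineering from the University of Pennsylvania.
Yuanqing has many years of research experience and notable results in machine learning and computer vision. Before joining Baidu, he headed the media analytics department at NEC Labs America, where under his leadership the NEC research team reached world-leading levels in deep learning, computer vision, and autonomous driving. Since 2005, he has published more than 30 papers in top international conferences and journals and holds 11 US patents. He has served as an area chair for NIPS and as co-chair of an international workshop on large-scale visual recognition and retrieval.
At Baidu, Yuanqing leads the Institute of Deep Learning in developing dominant AI technologies. His team has achieved major technical advances in several areas and applied them across many Baidu products, greatly improving product performance and user experience, and has ranked first in the world on international benchmarks for several important computer vision technologies.

Keynote by Dr. Lin Yuanqing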

Ben Lorica
Chief Data Scientist, O'Reilly Media

Ben Lorica is the chief data scientist at O’Reilly Media. Ben has applied business intelligence, data mining, machine learning, and statistical analysis in a variety of settings, including direct marketing, consumer and market research, targeted advertising, text mining, and financial...

Saturday welcome remarks

Program chairs Jason Dai, Ben Lorica, and Doug Cutting open the second day of keynotes.

Zhe Zhang
Software Engineer, LinkedIn

Zhe Zhang is an Engineering Manager at LinkedIn where he leads the Core Big Data Services team. The team leverages open source technologies including Hadoop, Spark, TensorFlow, and beyond, to form the storage-compute engine of LinkedIn’s big data platform. Zhe...

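Growing pains - when your big data platform grows really big

LinkedIn was one of the first companies in the world to adopt big data technology. Over the past nine years, LinkedIn's big data platform has grown nearly 500-fold, from 20 nodes supporting 10 users running MapReduce to more than 10,000 nodes supporting thousands of engineers and scientists running large-scale analytics that range from interactive Presto queries to TensorFlow deep learning. This talk shares how LinkedIn's big data platform team addresses the challenges that come with large scale and rapid growth.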

【Sessions】

Friday, July 14

11:15–11:55 Friday, 2017-07-14

Driving Southeast Asia Forward With Big Data

Location: Grand Hall B; Level: Non-technical

11:15–11:55 Friday, 2017-07-14

Pluto: A Distributed Heterogeneous Deep Learning Framework

Location: Auditorium; Level: Intermediate

11:15–11:55 Friday, 2017-07-14

Apache Hadoop 3 Features and Development Update

Location: Function Room 2; Level: Beginner

11:15–11:55 Friday, 2017-07-14

HAP: A Multi-Stream Dynamic Real-Time Analytics System

Location: Function Room 6A+B; Level: Intermediate

11:15–11:55 Friday, 2017-07-14

Scaling R faster and larger using Apache Spark

Location: Function Room 5B+C; Level: Intermediate

13:10–13:50 Friday, 2017-07-14

Spinach: Ad Hoc Queries with Spark SQL

Location: Grand Hall B; Level: Intermediate

13:10–13:50 Friday, 2017-07-14

Data-Driven Enterprise Growth

Location: Function Room 6A+B; Level: Advanced

13:10–13:50 Friday, 2017-07-14

Fast Data at ING - streaming analytics solutions to create a real-time, data-driven bank

Location: Function Room 2; Level: Intermediate

14:00–14:40 Friday, 2017-07-14

Offheap HBase read-path in production - The Alibaba story

Location: Function Room 6A+B; Level: Advanced

14:00–14:40 Friday, 2017-07-14

Distributed Deep Learning at Scale on Apache Spark with BigDL

Location: Auditorium; Level: Intermediate

14:00–14:40 Friday, 2017-07-14

Mastering Spark Unit Testing

Location: Function Room 2; Level: Intermediate

14:00–14:40 Friday, 2017-07-14

SDK+FinGraph+Go: Creating Business Value from First-Hand Behavioral Data and Graph Information

Location: Function Room 5B+C; Level: Intermediate

14:00–14:40 Friday, 2017-07-14

Cost-Based Optimizer Framework for Spark SQL

Location: Grand Hall B; Level: Intermediate

14:50–15:30 Friday, 2017-07-14

Robot Predictive Maintenance in Action: Real-time, Scalable Pipeline Explained

Location: Function Room 6A+B; Level: Intermediate

14:50–15:30 Friday, 2017-07-14

Re-implement Life Insurance Services by using Spark/BigDL Advanced Machine Learning

Location: Auditorium; Level: Intermediate

14:50–15:30 Friday, 2017-07-14

Speed Up Big Data Encryption In Apache Hadoop And Spark

Location: Function Room 2; Level: Intermediate

14:50–15:30 Friday, 2017-07-14

Research and Application of Bank Customers' Social Circles in the Big Data Era

Location: Function Room 5B+C; Level: Intermediate

14:50–15:30 Friday, 2017-07-14

Apache Kudu: 1.0 and beyond

Location: Grand Hall B; Level: Beginner

16:20–17:00 Friday, 2017-07-14

Practices of Spark in JinRi TouTiao

Location: Grand Hall B; Level: Intermediate

16:20–17:00 Friday, 2017-07-14

The evolution of CTR prediction systems, from LR to DNN

Location: Auditorium; Level: Intermediate

16:20–17:00 Friday, 2017-07-14

Spark On TiDB

Location: Function Room 2; Level: Intermediate

16:20–17:00 Friday, 2017-07-14

ShadowMask: Anonymize your sensitive big data

Location: Function Room 5B+C; Level: Intermediate

16:20–17:00 Friday, 2017-07-14

When Hadoop meets Object Storage - Implementation Principles, Pitfalls and Performance Optimization

Location: Function Room 6A+B; Level: Intermediate

Saturday, July 15

11:15–11:55 Saturday, 2017-07-15

The Architecture of Decoupling Compute and Storage with Open Source Alluxio

Location: Function Room 6A+B; Level: Non-technical

11:15–11:55 Saturday, 2017-07-15

Active Learning in the Real World

Location: Auditorium; Level: Intermediate

11:15–11:55 Saturday, 2017-07-15

LinkedIn Big Data Platform - 10,000+ nodes, 150,000+ daily jobs, connecting 470 million members

Location: Function Room 2

11:15–11:55 Saturday, 2017-07-15

Angel: A Machine Learning Framework for High Dimensionality

Location: Function Room 5B+C; Level: Advanced

11:15–11:55 Saturday, 2017-07-15

Apache Kylin 2.0: From OLAP Engine on Hadoop to Real-Time Data Warehouse

Location: Grand Hall B; Level: Intermediate

13:10–13:50 Saturday, 2017-07-15

Spark best practice in Didi

Location: Grand Hall B; Level: Intermediate

13:10–13:50 Saturday, 2017-07-15

On-device machine learning: TensorFlow on Android

Location: Auditorium; Level: Beginner

13:10–13:50 Saturday, 2017-07-15

Transactional Streaming with Apache DistributedLog

Location: Function Room 2; Level: Intermediate

13:10–13:50 Saturday, 2017-07-15

Multi-View Modeling and Semi-Supervised Learning: Applications to Massive User Data Mining and Behavior Analysis

Location: Function Room 5B+C; Level: Intermediate

13:10–13:50 Saturday, 2017-07-15

Hyperledger’s Integration with CDH Big Data Ecosystem, and Its Application in Real World

Location: Function Room 6A+B; Level: Intermediate

14:00–14:40 Saturday, 2017-07-15

Network Representation Based on Deep Learning

Location: Auditorium; Level: Intermediate

14:00–14:40 Saturday, 2017-07-15

Building the bridge between Hadoop and Kafka at LinkedIn - A Hadoop team's perspective

Location: Function Room 6A+B; Level: Intermediate

14:00–14:40 Saturday, 2017-07-15

The Latent Nature of Fraud: Using Big Data for Fraud Detection

Location: Function Room 5B+C; Level: Intermediate

14:00–14:40 Saturday, 2017-07-15

HBase Multi-Datacenter Solutions and an Introduction to the Upcoming Incremental Backup Feature

Location: Function Room 2; Level: Intermediate

14:50–15:30 Saturday, 2017-07-15

Columnar Storage @ Uber

Location: Function Room 6A+B; Level: Non-technical

14:50–15:30 Saturday, 2017-07-15

Powering Robotics Clouds with Alluxio

Location: Auditorium; Level: Intermediate

14:50–15:30 Saturday, 2017-07-15

Alluxio Caching Policy Optimization and Large-Scale Performance Evaluation

Location: Function Room 2; Level: Intermediate

14:50–15:30 Saturday, 2017-07-15

GraphSQL - A Game Changer: A Complete High Performance Graph Data & Analytics Platform

Location: Function Room 5B+C; Level: Intermediate

14:50–15:30 Saturday, 2017-07-15

Data service and processing platform for Ads in eBay

Location: Grand Hall B; Level: Intermediate

16:20–17:00 Saturday, 2017-07-15

Pain Points of Industrial AI Applications and Approaches to Solving Them

Location: Auditorium; Level: Advanced

16:20–17:00 Saturday, 2017-07-15

The Latest on HDFS Erasure Coding

Location: Function Room 2; Level: Intermediate

16:20–17:00 Saturday, 2017-07-15

Common Anomaly Detection Platform at Microsoft

Location: Function Room 5B+C; Level: Non-technical

16:20–17:00 Saturday, 2017-07-15

Unified SQL for Big Data on Hadoop

Location: Function Room 6A+B; Level: Intermediate

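【Tutorials】

Choose from full-day or half-day tutorials on Thursday, July 13. Expert-led sessions take you deep into important topics. Please note: to attend tutorials, your registration package must include Thursday tutorials. This pass does not give access to training courses.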

Thursday, July 13

09:00–12:30 Thursday, 2017-07-13

Using Alluxio (formerly Tachyon) to speed up big-data analytics

Location: Function Room 2; Level: Intermediate

Bin Fan (Alluxio), Haoyuan Li (Alluxio)

In this three-hour tutorial, we teach participants the fundamentals of Alluxio and demonstrate how Alluxio works and how to use it to help distributed compute engines such as Spark or MapReduce share data at memory speed.

09:00–12:30 Thursday, 2017-07-13

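Deep Learning with TensorFlow

Location: Auditorium; Level: Beginner

TensorFlow is a popular open source machine-learning library that is particularly well suited to deep learning. This tutorial introduces machine learning and deep learning through practical examples, and we guide attendees through hands-on exercises with TensorFlow and TensorBoard.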

09:00–12:30 Thursday, 2017-07-13

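From Simple to Complex: Apache Kafka Use Cases in Detail

Location: Function Room 8A+8B; Level: Intermediate

Jiangjie Qin (LinkedIn)

As one of the most popular messaging systems of recent years, Apache Kafka has grown from its original use as a centralized message queue to a range of more complex use cases, including stream processing, database replication, and change data capture (CDC). Drawing on Kafka's use at LinkedIn, this session covers Kafka's various use cases in detail.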

09:00–12:30 Thursday, 2017-07-13

Big Data - Data Modeling

Location: Function Room 5B+C; Level: Beginner

Ted Malaska (Blizzard Entertainment)

The recent advancement in distributed processing engines, from Spark to Impala to Spark Streaming or Storm, has proved exciting. However, if your design only focuses on the processing layer to get speed and power then you may be missing half the story, leaving a significant amount of optimization untapped.

13:30–17:00 Thursday, 2017-07-13

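Distributed Deep Learning on AWS using MXNet

Location: Function Room 2; Level: Intermediate

Damon Deng (AWS)

Deep learning continues to drive state-of-the-art advances in areas such as computer vision, natural language processing, and recommendation engines. A key factor behind this progress is the emergence of many highly flexible, developer-friendly deep learning frameworks. In this tutorial, members of the Amazon machine learning team give a brief background on deep learning, focusing on relevant application areas, and introduce MXNet, a powerful and scalable deep learning framework. At the end of the tutorial, you get hands-on experience with a variety of applications, including computer vision and recommendation engines, and see how to use preconfigured Deep Learning AMIs and CloudFormation templates to speed up development.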

13:30–17:00 Thursday, 2017-07-13

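Building Deep Learning Powered Big Data Analytics using Apache Spark and BigDL

Location: Auditorium; Level: Intermediate

Yiheng Wang (Intel)

Deep learning has achieved state-of-the-art results in many areas (such as computer vision, natural language processing, and speech recognition) and has great potential value for industry. Deep learning and big data are closely connected. First, deep learning models need large amounts of data to train, which is why the field only began to flourish in the big data era. Second, most of today's big data is video, audio, and text, which is well suited to processing with deep learning algorithms. To unlock the power of deep learning, we should apply it in big data environments.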

13:30–17:00 Thursday, 2017-07-13

Modern Streaming Architectures

Location: Function Room 8A+8B; Level: Beginner

Sijie Guo (Twitter), Maosong Fu (Twitter)

The move from batch processing to streaming architectures is a revolution in how companies use data. But what is the state of the art for the real-time data stack, including stream computing engines, data storage engines, languages, and tools? What are the typical challenges in a modern real-time data stack? How will modern technology affect streaming architectures and applications in the future?

13:30–17:00 Thursday, 2017-07-13

Hadoop application architectures: Fraud detection

Location: Function Room 5B+C; Level: Intermediate

Ted Malaska (Blizzard Entertainment)

Ted will walk participants through building a fraud-detection system, using an end-to-end case study to provide a concrete example of how to architect and implement real-time systems via Apache Hadoop components like Kafka, HBase, Impala, and Spark.

【Training】

Strata Data Conference Training

Training courses take place 9:00–17:00 and are limited in size to maintain a high level of hands-on learning and instructor interaction. Training passes do not include access to tutorials on Thursday.

Breaks:

Morning break: 10:30 – 11:00
Lunch: 12:30 – 13:30
Afternoon break: 15:00 – 15:30

July 12-13, 2017
Included in Platinum and 2-Day Training passes.
Participants should plan to attend both days of this training course.

09:00–17:00 Wednesday, 2017-07-12

Data Science Essentials: Examples from Internet Finance - Quantifying Credit and Fraud Risks Online

Location: Function Room 3B (多功能厅3B) Level: Intermediate

Jike Chong (YiRenDai/CreditEase)

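Would you like to understand the quantitative analysis process behind internet finance? How is personal credit quantified with big data? What needs attention when applying machine learning algorithms in practice? How can graph analysis fuse multidimensional data to distinguish normal users from fraudulent ones? This course is based on "Quantitative Financial Credit and Risk Control Analysis", a graduate course newly offered in spring 2017 at the Institute for Interdisciplinary Information Sciences, Tsinghua University. It uses real lending data from LendingClub as a case study to explain how some specific models are implemented.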

July 12, 2017
Included in Gold and 1-Day Training passes.

09:00–17:00 Wednesday, 2017-07-12

Apache Spark: Advanced practice and principle analysis

Location: Function Room 5A (多功能厅5A) Level: Intermediate

Carson Wang (Intel), 俞育才 (Intel), Zhichao Li (Intel), Yiheng Wang (Intel), Daoyuan Wang (Intel)

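In recent years, as big data analytics and machine learning have been applied ever more widely in industry, more and more people are choosing to build large-scale data processing, analytics, and machine learning on big data platforms such as Apache Spark, in order to take advantage of large volumes of raw data and scale-out architectures. How can you understand the key big data technologies in depth and use them better? Against the backdrop of current big data trends, this course presents advanced practices and the underlying principles of Apache Spark. It aims to deepen your understanding of the core design ideas of Apache Spark and of how it combines closely with streaming analytics, machine learning, and deep learning, offering consistent, integrated advanced practices for data collection, analysis and processing, feature extraction, and machine learning.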

【Conference overview】

Strata Data Conference: Make data work

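Strata Data Conference is the leading conference on how data, machine learning, and analytics are changing business and society itself. Top data scientists, analysts, and executives from innovative companies of all sizes gather to share deep, hard-to-obtain knowledge.

Strata Data Conference returns to Beijing on July 12-15. We are looking for speakers to share compelling data case studies, proven best practices, effective new analytic approaches, and exceptional skills with a talented, technical audience.

Conference topics include: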

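AI applications

Intelligent applications such as personal assistants, chatbots, robots, drones, and self-driving cars are appearing in every field. Beyond applications and use cases, we are also interested in sessions and training on deep learning, reinforcement learning, probabilistic machine learning, computer vision, natural language understanding, speech and emotion technologies, and related topics.

Data science & advanced analytics

The latest algorithms and advances in machine learning, along with the thorny issues of cultural change and team building.

Statistics, algorithms, and machine learning (including deep learning)
Active learning and other "human-in-the-loop" machine learning systems
Data analysis workflows, exploration, collaboration, peer review, reproducibility of records, and data provenance
Using design and social science techniques to create better experiments and ask the right questions
Data structures and layouts (tables, graphs, networks, time series, unstructured text)
Fraud detection, adversarial analysis, game theory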

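Hadoop use cases

Real-world case studies of the Hadoop ecosystem, from startups to industry giants.

Hadoop internals & development

A deep dive into this mainstream big data stack, including practical experience, integration tips, and a look ahead.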

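Data engineering and architecture

Tools beyond Hadoop, such as Cassandra, Storm, Elasticsearch, Kafka, and cloud-based services, and how they fit into the data science toolkit.

Big data platforms and architectures (Hadoop, Elasticsearch, Cassandra, Kafka, Storm, and so on)
Scaling, query performance, availability, compute cost, automation, encryption
Preprocessing, cleaning, wrangling, and enriching data for analysis
Hybrid on-premises and cloud data services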

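Spark & beyond

Apache Spark best practices, architectural considerations, and real-world case studies from startups and large enterprises.

IoT & real-time computing

Data collected and generated by the Internet of Things, including the difficulties of storing, analyzing, and publishing this information, and of extracting comprehensible, meaningful insights from the resulting flood.

Machine, sensor, crowd, and mobile data collection
Analytics and the IoT
Open data standards and interoperability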

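Security

Data needs tools such as encryption to protect security and privacy, and more and more data and algorithms can improve our security posture. But security teams are in a constant race against those who try to exploit algorithmic loopholes. This track explores data governance and the role of data in better security.

Enterprise adoption

How enterprises move from legacy data stores to big data, with the best practices, and the obstacles, on the way to becoming data-driven organizations.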

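Visualization & user experience

Data means nothing if it does not produce results. This track addresses augmentation, user experience, new interfaces, interactivity, and visualization.

Analytics and reporting
Augmented and virtual reality
Design, interactivity, and visualization
Designing for interruption and contextual interfaces
User experience and data-driven design

Open data

China has a growing number of open data repositories and initiatives. We will showcase applications of big data technology, data science, and open data in the public and private sectors.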

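About Strata Data Conference

Strata Data Conference is where cutting-edge science and emerging business fundamentals meet and merge. Here we take a deep dive into emerging techniques and technologies. Through in-depth tutorials you will dissect case studies, develop new skills, share emerging best practices in data science, and imagine the future.

The event was founded in 2012 as Strata + Hadoop World, when O'Reilly and Cloudera combined two successful big data conferences.

Program chairs Doug Cutting (Cloudera chief architect and founder of Apache Hadoop), Roger Magoulas (O'Reilly research director), and entrepreneur Alistair Croll, together with director of content development Ben Lorica (O'Reilly chief data scientist), have put together a program covering the full range of big data tools and technologies. Strata Data Conference covers current hot topics such as artificial intelligence and machine learning, with an emphasis on how to implement a data strategy.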

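Why you should attend

Strata Data Conference brings together the most influential business decision makers, strategists, architects, developers, and analysts in big data to shape the future of industry and technology.

Be among the first to understand how to take advantage of these vast changes, and survive the resulting disruption
Find new ways to leverage your data assets across industries and disciplines
Learn how to take data out of science projects and apply it in real-world industry
Discover training, hiring, and career opportunities for data professionals
Meet face to face with other innovators and thought leaders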

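Experience Strata Data Conference

Three full days of inspiring keynotes, practical and information-rich sessions, and plenty of fun networking events.

Explore the latest frontier issues, case studies, and best practices
Connect with business leaders, data experts, designers, and developers
Join a lively "hallway track" for attendees, journalists, and vendors, with the chance to discuss and debate important issues
Enjoy fun evening events and receptions that, more importantly, give you more face time with attendees and speakers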

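Who you will meet

Strata Data Conference attracts the best people in the data industry: developers, data scientists, data analysts, and other data professionals, including:

Business intelligence managers and analysts
Business managers, strategists, and decision makers
CIOs, CTOs, and enterprise architects
Data-driven designers, journalists, and anthropologists
Data engineers
Data scientists
Designers
Developers and database professionals
Innovators and entrepreneurs
Product managers
Researchers and academics
Venture capitalists and investors
VPs, marketing directors, or data warehouse directors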

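【Tickets】

Group discount policy (please contact customer service):

For registrations from a single company:
3-5 people: 20% off
6-9 people: 25% off
10 or more people: 30% off

Note: Platinum passes are sold out.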

Conference passes are as follows:

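Please note: Platinum and Gold passes do not include Thursday tutorials; standard discounts do not apply. Silver and Bronze passes do not include the training courses on Wednesday and Thursday.

Training passes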

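Please note: these training passes do not include Thursday tutorials. Standard discounts do not apply.

Cancellation and transfer policy

If you must cancel, be sure to notify us in writing before June 14, 2017. Please contact us. No refunds are given for cancellations within 30 days of the start of the conference. You may transfer your registration to another person before June 28, 2017; be sure to send authorization. Attendees who cancel after confirming and completing payment, or who cancel after the deadline, are liable for the full conference fee. In the extreme event that the conference is cancelled, the liability of O'Reilly Media, Inc. is limited to refunding the registration fees paid.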

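Code of conduct

All attendees must abide by our code of conduct. Its core idea is that an O'Reilly conference should be a safe and productive environment for everyone.

Nursing room

A private space near the conference will be available on site for mothers to nurse and care for their children.

Photography & video

Our goal is to capture the exciting moments of the conference. You may see photographers, including those we have hired, documenting the event. The photos and videos we take may be published on the website and may be used in future marketing.

Please note that video recording is not permitted during the conference.

Privacy policy

Registrants' contact information will be shared with and used by the conference organizers, O'Reilly/Strata Conference and Cloudera/Hadoop World, in accordance with their respective privacy policies.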

Invited speakers

黄文宇

Xuefu Zhang

Ximeng Zhang

张李晔

Pengfei Yue

姚舜扬

Yu Xu

Qinyan

Tony Xing

吴中

Mingxi Wu

Binggang Wo

Yiheng Wang

王海华

Daoyuan Wang

Carson Wang

Andrew Wang

Zhe Zhang

Xiaoyong

丛宏雷

马洪宾

马晓宇

顾佳盛

陈雨强

费辉

莫云

王玮

王振华

杨帆

杨军

李浒

李呈祥

李元健

张铭

吴炜

俞育才

Daniel Templeton

Chen Sammi

Maosong Fu

Darren Fu

Bin Fan

Mateusz Dymczyk

Mathieu Dumoulin

Damon Deng

Jason (Jinquan) Dai

Doug Cutting

Jike Chong

Feng Cheng

Haifeng Chen

Biao Chen

Lukas Biewald

蒋守壮

顾荣

叶杰平

Bas Geerdink

Sijie Guo

Yufeng Guo

Jiangjie Qin

Gene Pang

Ted Malaska

Zhenxiao Luo

Ben Lorica

刘轶

Shaoshan Liu
