
News

Posted 2 months ago by Sally
Happy Friday from the Apache community! Here's what we've been up to over the past week:

Support Apache – billions of users depend on Apache's free, community-driven software. Every dollar counts. http://apache.org/foundation/contributing.html

ASF Board – management and oversight of the business and affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 21 June 2017. Board calendar and minutes http://apache.org/foundation/board/calendar.html

ApacheCon™ – the official conference of the Apache Software Foundation. Tomorrow's Technology Today.
 - Presentations from ApacheCon https://s.apache.org/Hli7 and Apache: Big Data https://s.apache.org/tefE
 - Videos of keynotes + presentations https://s.apache.org/AE3m and audio recordings https://feathercast.apache.org/

ASF Infrastructure – our distributed team on four continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield savvy performance at 98.54% uptime http://status.apache.org/

Apache Incubator – the entry path for codebases and their communities to become an official part of the ASF.
 - Welcome New Podlings: Livy, Pulsar, and Superset http://incubator.apache.org/

Apache Ant™ Compress – offers tasks and types for archive and compression formats.
 - Apache Compress Antlib 1.5 released http://ant.apache.org/antlibs/compress/

Apache Arrow™ – a columnar in-memory analytics layer designed to accelerate Big Data.
 - Apache Arrow 0.4.1 released http://arrow.apache.org/

Apache Commons™ FileUpload – parses HTTP requests which conform to RFC 1867, "Form-based File Upload in HTML".
 - Apache Commons FileUpload 1.3.3 released http://commons.apache.org/proper/commons-fileupload/

Apache Commons™ Lang – provides helper utilities for the java.lang API, notably String manipulation methods, basic numerical methods, object reflection, concurrency, creation and serialization, and System properties.
 - Apache Commons Lang 3.6 released http://www.apache.org/dist/commons/lang/

Apache Directory™ DS – an extensible and embeddable directory server entirely written in Java, certified LDAPv3-compatible by the Open Group.
 - ApacheDS 2.0.0-M24 released http://directory.apache.org/apacheds

Apache Fluo (incubating) – adds distributed transactions to Apache Accumulo, and provides an observer/notification framework so users can incrementally update large data sets stored in Accumulo.
 - Apache Fluo 1.1.0-incubating released https://fluo.apache.org/

Apache Groovy™ – a multi-faceted programming language for the JVM.
 - Apache Groovy 2.5.0-beta-1 released https://groovy.apache.org/

Apache Jackrabbit™ – a fully compliant implementation of the Content Repository for Java Technology API, version 2.0 (JCR 2.0), as specified in Java Specification Request 283 (JSR 283).
 - Apache Jackrabbit 2.15.3 and Jackrabbit Oak 1.6.2 and 1.7.1 released https://jackrabbit.apache.org/

Apache Kudu™ – an Open Source storage engine for structured data that supports low-latency random access together with efficient analytical access patterns.
 - Apache Kudu 1.4.0 released http://kudu.apache.org/

Apache NiFi™ – an easy to use, powerful, and reliable system to process and distribute data.
 - Apache NiFi 0.7.4 and 1.3.0 released https://nifi.apache.org/
 - Apache NiFi CVE-2017-7667 and CVE-2017-7665 http://mail-archives.apache.org/mod_mbox/www-announce/201706.mbox/%3CCAFddr25eFkXCOQGwyN4B4VVNjdVYLcKya_JCaW%3Dd%3D11%3DQkyd4g%40mail.gmail.com%3E

Apache Portable Runtime™ – provides cross-platform APIs that relieve developers of the need to deal with platform differences.
 - Apache Portable Runtime and Utilities 1.6 released https://apr.apache.org/

Apache Sling™ – a Web framework that uses a Java Content Repository, such as Apache Jackrabbit, to store and manage content.
 - Apache Sling 9 released https://sling.apache.org/

Apache Zeppelin™ – a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc.
 - Apache Zeppelin 0.7.2 released http://zeppelin.apache.org/

Did You Know?
 - Did you know that the Symphony communications and messaging platform uses Apache Cassandra, HBase, Kafka, Solr, Tomcat, and the Apache License? http://cassandra.apache.org/ http://hbase.apache.org/ http://kafka.apache.org/ http://lucene.apache.org/solr http://tomcat.apache.org/ https://www.apache.org/licenses/LICENSE-2.0
 - Did you know that the Apache Drill distributed SQL engine brings flexibility and agility to data lakes across the Hadoop ecosystem? http://drill.apache.org/
 - Did you know that we have 10 project birthdays this month? Happy Apache Anniversary to SpamAssassin (13 yrs); Santuario (12 yrs); Commons and Wicket (10 yrs); Sling (8 yrs); Karaf (7 yrs); Flume and VCL (5 yrs); Mesos (4 yrs); and Twill (1 yr) -- many happy returns to all! https://projects.apache.org/

Apache Community Notices:
 - "Success at Apache" focuses on the processes behind why the ASF "just works": 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI -- the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be
 - Check out the Apache Community Development blog https://blogs.apache.org/comdev/
 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity
 - Apache ActiveMQ Call For Logo https://blogs.apache.org/activemq/entry/apache-activemq-call-for-logo
 - Catch the Apache Ignite and Spark communities at the In-Memory Computing Summit 20-21 June in Amsterdam and 24-25 October in San Francisco https://imcsummit.org/
 - ASF Operations Summary - Q3 FY2017 https://s.apache.org/NKFz
 - The list of Apache project-related MeetUps can be found at http://apache.org/events/meetups.html
 - Find out how you can participate with Apache community/projects/activities -- opportunities open with Apache HTTP Server, Avro, ComDev (community development), Directory, Incubator, OODT, POI, Polygene, Syncope, Tika, Trafodion, and more! https://helpwanted.apache.org/
 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter.
For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers. # # #
Posted 2 months ago by sharan
Throughout May we promoted that we had been invited to have an Apache booth at the Open Expo in Madrid. The event took place on 1st June 2017 and was the first time that we had an Apache presence there. We received an invitation to the event because of a contact made at FOSDEM (so you can see how being at one event can lead to another!). In preparation we made sure that we had enough stickers and swag (pens, USB hubs, bandanas and lapel pins) for our booth.

Booth Duty

Our booth staff was made up of myself and two amazing Spanish-speaking volunteers (Ignasi Barrera and Jan Iversen). Ignasi had also designed a flyer / leaflet with details about the ASF, who we are and what we do on one side, and some information about the Incubator on the other. This was translated into Spanish, as we expected it to be a predominantly Spanish-speaking event. One hundred of the leaflets were printed, and when we put the first few out, they disappeared quickly. It was then that we realised we needed to keep enough leaflets available on the table, but we ran out of them completely by early afternoon. It was a long day, and Jan and Ignasi were kept very busy explaining to people about the Foundation, its goals and how it all works. We understand that the conference attracted over 3000 visitors throughout the day, and 300 of them (around 10%) stopped by the Apache booth to talk to us. We found that most people already knew about specific Apache projects and were keen to talk to us about them. Around one in every six visitors spent 5-10 minutes with us, which is really great as it shows that they were very interested in finding out more about Apache. Being part of Apache, we sometimes take it for granted, so it was a little surprising to find out that many people didn't really know very much about the Apache Software Foundation; in fact, most of them didn't even know that there was a Foundation behind Apache projects! Being at the event helped us fill this knowledge gap. We had a wide range of questions, ranging from 'I want to use the Apache Licence for my own open source project; do I need to join Apache to do that?' to 'What do I need to do if I wanted to sponsor the Foundation?' We noticed that people were quite selective in which stickers they took (it wasn't a grab-everything affair!), and one attendee especially wanted the first sticker on his new PC to be an Apache one! It was good to see and meet up with some of our Apache project contributors and committers at the event, and they were also very happy to see us. I was also pleasantly surprised to see that there were a lot of women attending this conference. I'd heard that Madrid is becoming a technology hub, and it was great to see women in technology so well represented at this event. A main highlight (apart from the great atmosphere and conference buzz) was that we completely ran out of our flyers (information leaflets) about the ASF. To us it showed that people were really interested in finding out more about Apache.

What We Learned and Achieved

These are the main things we achieved and learned by being there:

- This event was mainly a Spanish-speaking event, although the organisers are trying to bring in more English-speaking presentations.
- There wasn't any Apache content at the event, so if we do plan to be there in future it would be good to encourage participation in the CFP. (NOTE: We need to be prepared to do this in Spanish.)
- Being there brought a bit more awareness about Apache to a new audience.
- Having information leaflets and flyers at events like these is a great way to provide essential information about Apache. People were happy to take them away, which showed that they were really interested in finding out more about us.
- As well as hearing from potential sponsors, we were asked about taking part in some Spanish podcasts about free software, and some people also asked about Apache meetups in Spain, so the event has opened up some potential opportunities.
- Being able to scan our booth visitors meant that we could send out a thank-you message to them all. (Thanks Ignasi!)
- Community development is also about going out and being active at conferences and events where people can see and speak to us, so let's keep doing it!

So was it a success? Yes, definitely! And we will be looking out for any future events to see if we can participate. Also, some photos from the event have been uploaded to our Facebook page. We hope that we will be invited again next year; if so, we will be looking forward to another successful event, so please come along (especially any Spanish speakers) and support us.

Special Thanks!

Once again, a huge thank you to Ignasi Barrera and Jan Iversen, who volunteered their time and energy to come to Madrid and help out.
Posted 2 months ago by sharan
Welcome to our monthly blog update about what is happening in Apache Community Development (ComDev)! This month we have lots of news about ApacheCon North America: we share some links to the conference keynotes, deliver a co-ordinated Apache Way track alongside our Community one, get some feedback from a first-time ApacheCon attendee, encourage you to listen to ApacheCon via our FeatherCast podcast channel, and recap yet another successful BarCampApache.

ApacheCon NA 2017

After all the preparation, ApacheCon North America took place this month in Miami. As well as the standard ApacheCon and Big Data tracks, several mini conferences were held around the main event. It also featured a new Apache Way track, which is covered in a separate section below in more detail. ApacheCon featured some great keynote speakers, with interesting topics ranging from IoT to digital psychometrics. If you haven't had a chance to hear or see them, they are all available on YouTube:

- Keynote: State of the Feather - Sam Ruby, President, Apache Software Foundation
- Keynote: Apache Projects and Comcast's Journey to Open Source - Nithya Ruff
- Keynote: Training Our Team in the Apache Way - Alan Gates
- Keynote: Apache CouchDB: A Tale of Community, Cooperation and Code - Adam Kocoloski
- Keynote: Digital Psychometrics and its Future Effects on Technology - Sandra Matz
- Keynote: Machine Learning & Apache Spark: A Dynamic Duo - John Thomas
- Keynote: Future of IoT: A VC Perspective - Sudip Chakrabarti

The complete YouTube video playlist from ApacheCon NA Miami is also available. For those sessions that were not on video, we recorded the audio, which is available on FeatherCast. During ApacheCon, Benjamin Young organised a ComDev Tools Hackathon, where people interested in improving some of our tools got together and started working on them. Thanks very much to Benjamin for this initiative and to all the hackathon participants who came along to help out.

BarCampApache

Our BarCampApache was a great success and attracted well over 35 people. It was facilitated by Jean-Frederic Clere and some of our attendees on the day. It was great to see so much involvement from people from a wide range of Apache projects. The following were some of the topics raised by our participants and discussed:

- MQ in Practice
- Community Engagement & How to Attract More Contributors / Committers
- Benchmarking
- Open Source in China
- Orchestration (YARN/Mesos)
- Static Analysis and Project Management

At every BarCampApache a key theme is sharing information and advice as well as making contacts, and I'm sure we succeeded in doing that. How do we know? The discussions kept on well after the time when we were planning to finish! So thanks again to everyone that attended, and we hope that you will pass on your experiences to other colleagues and encourage them to participate at future events. The audio from the BarCampApache is available on FeatherCast. (WARNING: the current file is over five and a half hours long, but we will be splitting it up into more topic-based discussions.)

The Apache Way Track and Community Track

As well as our usual community-based track, for the first time at ApacheCon a co-ordinated track detailing different aspects of the Apache Way was organised by Shane Curcuru and Nick Burch. The five presentations and one panel discussion incorporated common themes and explored some of the challenges encountered.
The Apache Way track was recorded and the videos are available on YouTube at the following links:

- Apache Way: Effective Open Source Project Management - Shane Curcuru
- A Tale of Two Developers: Finding Harmony Between Commercial Software Development and the Apache Way - Andrew Wang & Alex Leblang
- From dev@ to user@ to the Apache Way - Steve Blackmon
- Apache Way Panel: Software is Easy; People are Hard - Moderated by Benjamin Young
- The Apache Way for Business Panel - Moderated by Nick Burch
- Committed to The Apache Way - Sharan Foga

The Community track was also recorded and the videos are available at the following links:

- Gaining Insight Into Your Apache Project with Snoot - Daniel Gruno
- Diversity, For Those Playing Life on Easy - Nick Burch
- InnerSource 101 and the Apache Way - Jim Jagielski
- User Groups: The Gateway to Apache - Bob Paulin
- Practical Trademark Law For FOSS Projects - Shane Curcuru

Thank you to everyone who attended and participated, both on and off camera. Please watch, learn and share.

FeatherCast Audio from ApacheCon

Our Community Development podcast channel FeatherCast has been extremely busy this month. Before ApacheCon we were recording interviews with some of the speakers, and during the conference itself we talked to attendees, sponsors and some of our Apache Directors and Officers. We now have a large amount of content for you to listen to. Also, don't worry if you missed ApacheCon or couldn't make it to all the sessions, because they have all been recorded. You can find the audio for all the ApacheCon sessions on FeatherCast. Thanks very much to all the people who worked hard to get this audio up so quickly after ApacheCon. Please follow FeatherCast on Twitter to be informed about our latest interviews. We are always looking for volunteers, so if you are interested in helping out with our FeatherCast podcast channel, please contact our mailing list at feather@apache.org.

First Time ApacheCon Experience - Stephen Downie

If you haven't taken a look at the blog post by Stephen Downie, it is highly recommended. Miami was Stephen's first-ever ApacheCon, and his blog post describes his experiences all the way through from apprehension to realisation. Stephen's full blog post can be found here: My First Experience of ApacheCon, by Stephen Downie.

OpenExpo in Madrid

During May we promoted that we had been invited to have an Apache booth at the OpenExpo in Madrid on 1st June. This is a new event for us, and we wanted to find out whether it could be a good place to have an Apache presence in the future. We understand that the conference had over 3000 visitors, and 300 of them passed by our Apache booth not only to pick up stickers but also to find out more about the Foundation itself. Please take a look at our detailed blog post about the event. A big thank you to Ignasi Barrera, Jan Iversen and Sharan Foga, who spent the day on booth duty during the event.

Contacting Community Development

Remember that we are always happy to get your feedback and comments, so please feel free to contact us, follow our events and participate in our discussions on our mailing list. If you would like to be kept up to date with all the latest news about what is happening in Community Development, please subscribe to our mailing list by sending an email to dev-subscribe AT community DOT apache DOT org.
Posted 2 months ago by nickpan47
We are very excited to announce the release of Apache Samza 0.13.0. Samza has been powering real-time applications in production across several large companies (including LinkedIn, Netflix, Uber) for years now. Samza provides leading support for large-scale stateful stream processing with:

- First-class support for local state (with RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with SSD.
- Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
- A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams, etc.) and output systems (HDFS, Kafka, ElastiCache, etc.).
- A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.
- Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.

New Features

The 0.13.0 release contains previews for the following highly anticipated features:

High Level API

With the new high-level API you can express complex stream processing pipelines concisely, in a few lines of code, and accomplish what previously required multiple jobs. This new API facilitates common operations like re-partitioning, windowing, and joining streams. Check out some examples to see the high-level API in action here (and see the sketch at the end of this post for its general shape).

Flexible Deployment Model

Samza now provides flexibility for running your application in any hosting environment and with cluster managers other than YARN. Samza can now also be run as a lightweight stream processing library embedded inside your application. Your processes can coordinate task distribution amongst themselves using ZooKeeper or static partition assignments out of the box. See more details and code examples here.

Enhancements, Upgrades and Bug Fixes

This release also includes the following enhancements to existing features:

- SAMZA-871 adds a heart-beat mechanism between the JobCoordinator and all running containers to prevent orphaned containers.
- SAMZA-1140 enables non-blocking commit in the AsyncRunLoop.
- SAMZA-1143 adds configurations for localizing general resources in YARN.
- SAMZA-1145 provides the ability to configure the default number of changelog replicas.
- SAMZA-1154 adds a tasks endpoint to samza-rest to get information about all tasks in a job.
- SAMZA-1158 adds a samza-rest monitor to clean up stale local stores from completed containers.

This release also includes several bug fixes and improvements for operational stability. Some notable ones are:

- SAMZA-1083 prevents loading task stores that are older than delete tombstones during container startup.
- SAMZA-1100 fixes an exception when using an empty stream as both bootstrap and broadcast.
- SAMZA-1112 fixes BrokerProxy to log fatal errors.
- SAMZA-1121 fixes StreamAppender so that it doesn't propagate exceptions to the caller.
- SAMZA-1157 fixes logging for serialization/deserialization errors.

We've also upgraded the following dependency versions:

- Samza now supports Scala 2.12.
- Kafka version to 0.10.1.1.
- Elasticsearch version to 2.2.0.

Community Developments

We've made great community progress since the previous release. We showcased how Samza is powering stream processing at LinkedIn at Kafka Summit 2017 and O'Reilly Strata 2017. We also presented Samza use cases and case studies from several large companies at ApacheCon Big Data 2017.
In addition, the Samza talk at LinkedIn's Stream Processing Meetup in Sunnyvale was well received, with over 200 attendees. Here are links to some of these events:

- March 15, 2017 - Processing millions of events per second without breaking the bank - Kartik Paramasivam (Video)
- May 8, 2017 - Data Processing at LinkedIn with Apache Kafka and Apache Samza (Kafka Summit NYC 2017) (Slides)
- May 16, 2017 - What it takes to process a trillion events a day? Case studies in scaling stream processing at LinkedIn - Jagadish Venkatraman (ApacheCon Big Data '17) (Slides)
- May 16, 2017 - The continuing story of Batching to Streaming analytics at Optimizely - Michael Borsuk (ApacheCon Big Data '17) (Slides)
- May 24, 2017 - Managed or stand-alone, streaming or batch; Unified processing with the Samza Fluent API - Yi Pan (LinkedIn Stream Processing Meetup) (Slides)
- May 25, 2017 - How companies are using Apache Samza - Jagadish Venkatraman (ApacheCon podcast)

Future

We'll continue improving the new High Level API and flexible deployment features with your feedback. It's a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs. I'd like to close by thanking everyone who's been involved in the project. It's been a great experience to be involved in this community, and I look forward to its continued growth.
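To give the High Level API section above some shape, here is a minimal sketch of what a pipeline can look like, modelled on the examples published with the 0.13.0 release. The BigOrderFilter class, the stream names, the String payload type, and the filter predicate are all invented for illustration; see the official examples linked above for the authoritative API.

import org.apache.samza.application.StreamApplication;
import org.apache.samza.config.Config;
import org.apache.samza.operators.MessageStream;
import org.apache.samza.operators.OutputStream;
import org.apache.samza.operators.StreamGraph;

// Consume an input stream, keep only the messages we care about, and
// re-emit them -- a single job expressed in a few lines.
public class BigOrderFilter implements StreamApplication {
  @Override
  public void init(StreamGraph graph, Config config) {
    // "orders" and "big-orders" are placeholder stream ids bound in config.
    MessageStream<String> orders =
        graph.getInputStream("orders", (key, msg) -> (String) msg);
    OutputStream<String, String, String> bigOrders =
        graph.getOutputStream("big-orders", msg -> msg, msg -> msg);

    orders
        .filter(msg -> msg.contains("LARGE")) // drop everything else
        .sendTo(bigOrders);
  }
}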
Posted 2 months ago by sharan
Apache OFBiz News May 2017

Welcome to our regular monthly round-up of OFBiz news. This month we have news about our new OFBiz release; we begin work on tidying up and re-structuring our wiki; work starts on improving end user documentation for our users; and creating reports in OFBiz gets easier.

Apache OFBiz 16.11.02 Released

During this month the community announced the release of 16.11.02. This release consolidates all work done since the previous release in November last year. The complete OFBiz 16.11 series of releases is dedicated to the memory of Adrian Crum, OFBiz committer and former PMC member, who died last year. The release file can be downloaded following the instructions on the OFBiz Downloads page. Please refer to the Release Notes for more details of the changes introduced with this new version. A big thank you to everyone from the community who was involved in helping to get the release done.

Re-structuring the OFBiz Wiki

A key discussion this month was about a proposal to re-organize and restructure our existing wiki. The main aim will be to make it cleaner and more user friendly, so that people can easily navigate and locate the information they need. Over the years our wiki has evolved and become a little cluttered, meaning that information is spread across several pages or even workspaces. This can make it difficult for the community to find what they need. Documentation is extremely important for new users, so this effort is more than welcome. Our wiki contains lots of useful information but also some older or outdated resources, so we need to work on tidying it up. An initial basic structure has been agreed upon as follows:

- Apache OFBiz - How and where to start?
- Documentation
- Community
- Developers
- Apache Software Foundation
- Wiki Attic

Please note that this structure has already been implemented as the main wiki navigation menu. The next focus will be on reviewing the wiki pages and re-organizing them into these main categories. Many thanks to Michael Brohl for proposing and kickstarting this effort. We are looking for people to contribute to this work, so if you are interested in helping with the wiki clean-up effort, please join the discussion on the development mailing list.

End User Documentation

Another community initiative that was launched last month was about end user documentation. We currently do not have a consolidated end user guide that gives practical information about the setup and use of the standard OFBiz applications. Feedback from the community has shown that this is an important area that we need to address. Please note that the focus of this effort will be to provide information for users and non-technical people. The following points are part of the proposal:

- OFBiz Glossary: putting together a full glossary of OFBiz words, definitions and concepts so that people have a common understanding of what things mean
- End User Guide: this guide will give users an overview of the applications and processes of OFBiz and will include a basic list of tasks for each process
- Menu Structured Documentation: documentation that follows the existing menu structure and provides details about a specific screen.
  Hopefully this will eventually replace or update the current in-application screen help that is already available within OFBiz.
- How-Tos: this will be a quick-reference How-To by topic
- Examples and Tutorials: these will provide practical examples of using the applications in real-life scenarios

We will be temporarily using our existing End User Documentation Confluence workspace to work on and prepare these. As the documentation sections are completed, they will be moved back onto our re-structured wiki. Thanks very much to Craig Parker and Sharan Foga, who are leading and co-ordinating this effort. If you are interested in contributing or would like to be involved, please join the End User Documentation discussion on our development mailing list.

OFBiz Flexible Reports

Also announced this month was the creation of OFBiz Flexible Reports, a recently added feature. Documentation about Flexible Reports is included in the OFBiz Birt component as part of the markdown files. This is a major improvement for creating reports in OFBiz: it is now a lot simpler for users to use the Birt component within OFBiz to create, modify and update reports. More details about the changes can be found here.

New Features and Improvements

Functional enhancements and improvements, as well as updates of third-party libraries and source code refactoring:

Framework

- Update mysql sql-type for the datetime field-type to support fractional seconds in time values (OFBIZ-9337)
- Remove final remaining dependencies from the framework on plugins (OFBIZ-9322)
- Add support for 'set-if-null' and 'set-if-empty' attributes on the screens "set" element (OFBIZ-9251). "set-if-null" controls whether a field can be set to null, and "set-if-empty" controls whether a field can be set to an empty value.
- Refactor fields with "id-ne", "id-long-ne" and "id-vlong-ne" types to "id", "id-long" and "id-vlong" respectively where they are primary keys (OFBIZ-9354). The new field types will be given the "not-null=true" attribute in order to make the fields NOT NULL in the database (similarly to primary keys). This change will be reflected in the documentation. This discussion sparked the change.
- Split the tools folder from the trunk and put it in another branch (OFBIZ-9256). The tools folder contains only tools used by the OFBiz team and is of no help to OFBiz users, so it should not be delivered with the OFBiz trunk, plugins or releases.
- Improvement of string comparisons (OFBIZ-9254). There is an inconsistency in the code for string comparisons: for example, statusId.equals("PRUN_COMPLETED") should be written as "PRUN_COMPLETED".equals(statusId), because the former can throw a NullPointerException if the variable is null (see the short example after this list).
- Upgrade Tomcat to 8.5.15 (OFBIZ-9366)
- Convert RateServices.xml from mini-lang to the Groovy DSL (OFBIZ-9381). Related to task OFBIZ-9350: deprecate mini-lang by converting the services updateRateAmount, deleteRateAmount, updatePartyRate and deletePartyRate from mini-lang to the Groovy DSL.
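The string-comparison guideline above is easiest to see in code. A minimal illustration (the variable and status value are just examples):

// statusId may legitimately be null, e.g. for a record that has no status yet.
String statusId = null;

// Variable-first comparison throws NullPointerException when statusId is null:
// boolean done = statusId.equals("PRUN_COMPLETED");

// Constant-first comparison is null-safe and simply evaluates to false:
boolean done = "PRUN_COMPLETED".equals(statusId);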
Plugins

- Update Apache Solr/Lucene to release 6.2.1 (OFBIZ-8316)
- Rename Lucene runtime folders to clearly show the origin (OFBIZ-9357)
- Improvement of string comparisons (see above) (OFBIZ-9254)
- Refactor fields which are primary keys for plugin components (see above) (OFBIZ-9351)

Bugfixes

Functional and technical bugfixes:

Framework

- Remove duplicated data for PartyStatus. Reference discussion: https://s.apache.org/T2UD
- Error viewing tomahawk-themed page when externalLoginKey is not enabled (OFBIZ-9345)
- In TemporalExpressions.Frequency the starting times of a job move away from the given freqCount raster (OFBIZ-9374). If a job is scheduled using TemporalExpressions.Frequency, the start time of the job will gradually move forward when the execution of the job is delayed by one or more units of the frequency type.

Plugins

- Multisite feature not working properly due to URL modification (OFBIZ-7120)

Documentation

Framework

- Remove unnecessary field types (see above) (OFBIZ-9351)
Posted 2 months ago by Sally
Another week has passed with the Apache community collaborating in full force:

Support Apache – help sustain your favorite Apache project for less than $14/day. Every dollar counts. http://apache.org/foundation/contributing.html
 - New ASF VP Fundraising Kevin McGrail on his goals for the coming year https://feathercast.apache.org/2017/05/18/kevin-mcgrail-fundraising-and-apachecon-north-america/

ASF Board – management and oversight of the business and affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 21 June 2017. Board calendar and minutes http://apache.org/foundation/board/calendar.html

Success at Apache – monthly blog series that focuses on the processes behind why the ASF "just works".
 - Learning to Build a Stronger Community by John Ament https://s.apache.org/x9Be

ApacheCon™ – the official conference of the Apache Software Foundation. Tomorrow's Technology Today.
 - Presentations from ApacheCon https://s.apache.org/Hli7 and Apache: Big Data https://s.apache.org/tefE
 - Videos of keynotes + presentations https://s.apache.org/AE3m and audio recordings + soundbites from the conference floor https://feathercast.apache.org/

ASF Infrastructure – our distributed team on four continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield skipping performance at 99.96% uptime http://status.apache.org/

Apache Directory™ LDAP API – an ongoing effort to provide an enhanced LDAP API, as a replacement for JNDI and the existing LDAP API (jLdap and Mozilla LDAP API).
 - Apache Directory LDAP API 1.0.0 released http://directory.apache.org/api

Apache Groovy™ – a multi-faceted programming language for the JVM.
 - Apache Groovy 2.5.0-beta-1 released https://groovy.apache.org/

Apache Hadoop™ – the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate.
 - The Apache Software Foundation Announces Momentum With Apache® Hadoop® v2.8 https://s.apache.org/h0Tl

Apache HBase™ – an Open Source, distributed, versioned, non-relational database.
 - Apache HBase 1.2.6 released https://hbase.apache.org/

Apache Jackrabbit™ Oak – a scalable, high-performance hierarchical content repository designed for use as the foundation of modern world-class Web sites and other demanding content applications.
 - Apache Jackrabbit Oak 1.2.26 and 1.4.16 released http://jackrabbit.apache.org/

Apache Lucene™ – a high-performance, full-featured text search engine library written entirely in Java.
 - Apache Lucene 6.6.0 and Apache Solr 6.6.0 released http://lucene.apache.org/

Apache Tomcat™ – an Open Source software implementation of the Java Servlet, JavaServer Pages, Java Unified Expression Language, Java WebSocket and JASPIC technologies.
 - CVE-2017-5664 Apache Tomcat Security Constraint Bypass http://mail-archives.apache.org/mod_mbox/www-announce/201706.mbox/%3C3abc830a-69e1-9ce1-27c8-1eaf9c2d6739%40apache.org%3E

Did You Know?
 - Did you know that if it's not at *.apache.org, it's not from us? https://s.apache.org/QviH
 - Did you know that Apache CouchDB was one of the first Apache projects to use git? https://blog.couchdb.org/2017/06/06/couchdb-developer-profile-joan-touzet/
 - Did you know that nearly half of businesses' security breaches are due to the Internet of Things? Apache Spot (incubating) can help! http://spot.incubator.apache.org/

Apache Community Notices:
 - "Success at Apache" focuses on the processes behind why the ASF "just works".
   1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI -- the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be
 - The latest Apache Community Newsletter https://blogs.apache.org/comdev/entry/community-development-news-april-2017
 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity
 - Apache ActiveMQ Call For Logo https://blogs.apache.org/activemq/entry/apache-activemq-call-for-logo
 - Join members of the Apache Apex, Beam, Flink, Hadoop, Kafka, Lucene, Solr, and Spark communities at Berlin Buzzwords 11-13 June in Berlin https://berlinbuzzwords.de/17/
 - The Apache Phoenix community will be holding PhoenixCon on 13 June in San Francisco https://www.eventbrite.com/e/phoenixcon-2017-tickets-32872245772
 - Will we be seeing you at HBaseCon 12 June/Mountain View https://www.eventbrite.com/e/hbasecon-west-2017-tickets-33101238696 and PhoenixCon 13 June/San Francisco https://www.eventbrite.com/e/phoenixcon-2017-tickets-32872245772 ?
 - Meet members of Apache's Cloud community at Cloud Foundry Summit Silicon Valley 13-15 June in Santa Clara; enjoy 20% off registration rates using discount code CFSV17ASF20 https://goo.gl/Uq3g0t
 - Catch the Apache Ignite and Spark communities at the In-Memory Computing Summit 20-21 June in Amsterdam and 24-25 October in San Francisco https://imcsummit.org/
 - ASF Operations Summary - Q3 FY2017 https://s.apache.org/NKFz
 - The list of Apache project-related MeetUps can be found at http://apache.org/events/meetups.html
 - Find out how you can participate with Apache community/projects/activities -- opportunities open with Apache HTTP Server, Avro, ComDev (community development), Directory, Incubator, OODT, POI, Polygene, Syncope, Tika, Trafodion, and more! https://helpwanted.apache.org/
 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

# # #
Posted 2 months ago by mark...@apache.org
Real-Time SQL On Event Streams

Mark Payne - @dataflowmark

Apache NiFi has grown tremendously over the past two and a half years since it was open sourced. The community is continuously thinking of, implementing, and contributing amazing new features. The newly released version 1.2.0 of NiFi is no exception. One of the most exciting features of this new release is the QueryRecord Processor and the Record Reader and Record Writer components that go along with it. If you aren't familiar with those components, there's a blog post that explains how they work. This new Processor, powered by Apache Calcite, allows users to write SQL SELECT statements to run over their data as it streams through the system. Each FlowFile in NiFi can be treated as if it were a database table named FLOWFILE. These SQL queries can be used to filter specific columns or fields from your data, rename those columns/fields, filter rows, perform calculations and aggregations on the data, route the data, or whatever else you may want to use SQL for. All from the comfy confines of the most widely known and used Domain Specific Language. This is a big deal! Of course, there are already other platforms that allow you to run SQL over arbitrary data. So let's touch on how NiFi differs from other platforms that run SQL over arbitrary data outside of an RDBMS:

- Queries are run locally. There is no need to push the data to some external service, such as S3, in order to run queries over your data. There's also no need to pay for that cloud storage or the bandwidth, or to create temporary "staging tables" in a database.

- Queries are run inline. Since your data is already streaming through NiFi, it is very convenient to add a new QueryRecord Processor to your canvas. You are already streaming your data through NiFi, aren't you? If not, you may find Our Docs Page helpful to get started learning more about NiFi.

- Query data in any format; write results in any data format. One of the goals of this effort was to allow data to be queried regardless of the format. In order to accomplish this, the Processor was designed to be configured with a "Record Reader" Controller Service and a "Record Writer" Controller Service. Out of the box, there are readers for CSV, JSON, Avro, and even log data. The results of the query can be written out in CSV, JSON, Avro, or free-form text (for example, a log format) using the NiFi Expression Language. If your data is in another format, you are free to write your own implementation of the Record Reader and/or Record Writer Controller Service. A simple implementation can wrap an existing library to return Record objects from an InputStream (a rough sketch appears at the end of this post). That's all that is needed to run SQL over your own custom data format! Writing the results in your own custom data format is similarly easy.

- It's fast - very fast! Just how fast largely depends on the performance of your disks and the number of disks that you have available. The data must be read from disk, queried, and then the results written to disk. In most scenarios, though, reading of the data from disk is actually avoided thanks to operating system disk caching. We've tested the performance on a server with 32 cores and 64 GB RAM. We ran a continuous stream of JSON-formatted Provenance data through the Processor. To do this, we used the NiFi Provenance Reporting Task to send the Provenance data back to the NiFi instance.
  Because we wanted to stress the Processor, we then used a DuplicateFlowFile processor to create 200 copies of the data (this really just creates 200 pointers to the same piece of data; it doesn't copy the data itself). We used a SQL query to pull out any Provenance "SEND" event (a small percentage of the data, in this case). Using 12 concurrent tasks, we were able to see the query running at a consistent rate of about 1.2 GB per second on a single node - using less than half of the available CPU!

- Data Provenance keeps a detailed trail of what happened. One of the biggest differentiators between NiFi and other dataflow platforms is the detailed Data Provenance that NiFi records. If the data that you end up with is not what you expect, the Data Provenance feature makes it easy to see exactly what the data looked like at each point in the flow and pinpoint exactly what went wrong - as well as understand where the data came from and where the data went. When the flow has been updated, the data can be replayed with the click of a button and the new results can be verified. If the results are still not right, update your flow and replay again.

How It Works

In order to get started, we need to add a QueryRecord Processor to the graph. Once we've added the Processor to the graph, we need to configure three things: the Controller Service to use for reading data, the service to use for writing the results, and one or more SQL queries. The first time that you set this all up, it may seem like there's a lot going on. But once you've done it a couple of times, it becomes pretty trivial. The rest of this post will be dedicated to setting everything up. First, we will configure the Record Reader Controller Service. We'll choose to create a new service. We are given a handful of different types of services to choose from; for this example, we will use CSV data, so we will use a CSVReader. We will come back and configure our new CSVReader service in a moment. For now, we will click the "Create" button and then choose to create a new Controller Service to write the results. We will write the results in JSON format. We can again click the "Create" button to create the service. Now that we have created our Controller Services, we will click the "Go To" button on our reader service to jump to the Controller Service configuration page.

Configuring the Reader

We can click the pencil in the right-hand column to configure our CSV Reader. The CSV Reader gives us plenty of options to customize the reader to our format. For this example, we will leave most of the defaults, but we will change the "Skip Header Line" property from the default value of "false" to "true", because our data will contain a header line that we don't want to process as an actual record. In order to run SQL over our data and make sense of the columns in our data, we also need to configure the reader with a schema for the data. We have several options for doing this. We can use an attribute on the FlowFile that includes an Avro-formatted schema. Or we can use a Schema Registry to store our schema and access it by name or by identifier and version. Let's go ahead and use a local Schema Registry and add our schema to that Registry. To do so, we will change the "Schema Access Strategy" to "Use 'Schema Name' Property." This means that we want to look up the schema to use by name. The name of the schema is specified by the "Schema Name" property.
The default value for that property is "${schema.name}", which means that we will use the "schema.name" attribute to identify which schema we want to use. Instead of using an attribute, we could just type the name of the schema here. Doing so would mean that we would need a different CSV Reader for each schema that we want to read, though. By using an attribute, we can have a single CSV Reader that works for all schemas. Next, we need to specify the Schema Registry to use. We click the value of the "Schema Registry" property and choose to "Create new service..." We will use the AvroSchemaRegistry. (Note that our incoming data is in CSV format and the output will be in JSON. That's okay; Avro in this sense only refers to the format of the schema provided, so we will provide a schema in the same way that we would if we were using Avro.) We will click the "Create" button and then click the "Go To" arrow that appears in the right-hand column in order to jump to the Schema Registry service (and click 'Yes' to save changes to our CSV Reader service). This will take us back to our Controller Services configuration screen. It is important to note that from this screen, each Controller Service has a "Usage" icon on the left-hand side (it looks like a little book). That icon will take you to the documentation on how to use that specific Controller Service. The documentation is fairly extensive. Under the "Description" heading, each of the Record Readers and Writers has an "Additional Details..." link that provides much more detailed information about how to use the service, along with examples. We will click the Edit ("pencil") icon next to the newly created AvroSchemaRegistry and go to the Properties tab. Notice that the service has no properties, so we click the New Property ("+") icon in the top-right corner. The name of the property is the name that we will use to refer to the schema; let's call it "hello-world". For the value, we can just type or paste in the schema that we want to use, using the Avro schema syntax. For this example, we will use the following schema:

{
  "name": "helloWorld",
  "namespace": "org.apache.nifi.blogs",
  "type": "record",
  "fields": [
    { "name": "purchase_no", "type": "long" },
    { "name": "customer_id", "type": "long" },
    { "name": "item_id", "type": ["null", "long"] },
    { "name": "item_name", "type": ["null", "string"] },
    { "name": "price", "type": ["null", "double"] },
    { "name": "quantity", "type": ["null", "int"] },
    { "name": "total_price", "type": ["null", "double"] }
  ]
}

Now we can click "OK" and apply our changes. Clicking Enable (the lightning bolt icon) enables the service. We can now also enable our CSV Reader.

Configuring the Writer

Similarly, we need to configure our writer with a schema so that NiFi knows how we expect our data to look. If we click the Pencil icon next to our JSONRecordSetWriter, in the Properties tab, we can configure whether we want our JSON data to be pretty-printed or not and how we want to write out date and time fields. We also need to specify how to access the schema and how to convey the schema to downstream processing. For the "Schema Write Strategy," since we are using JSON, we will just set the "schema.name" attribute, so we will leave the default value. If we were writing in Avro, for example, we would probably want to include the schema in the data itself. For the "Schema Access Strategy," we will use the "Schema Name" property, and set the "Schema Name" property to "${schema.name}", just as we did with the CSV Reader.
We then select the same AvroSchemaRegistry service for the "Schema Registry" property. Again, we click "Apply", then click the Lightning icon to enable our Controller Service and click the Enable button. We can then click the "X" to close out this dialog.

Write the SQL

Now comes the fun part! We can go back to configure our QueryRecord Processor. In the Properties tab, we can start writing our queries. For this example, let's take the following CSV data:

purchase_no, customer_id, item_id, item_name, price, quantity
10280, 40070, 1028, Box of pencils, 6.99, 2
10280, 40070, 4402, Stapler, 12.99, 1
12440, 28302, 1029, Box of ink pens, 8.99, 1
28340, 41028, 1028, Box of pencils, 6.99, 18
28340, 41028, 1029, Box of ink pens, 8.99, 18
28340, 41028, 2038, Printer paper, 14.99, 10
28340, 41028, 4018, Clear tape, 2.99, 10
28340, 41028, 3329, Tape dispenser, 14.99, 10
28340, 41028, 5192, Envelopes, 4.99, 45
28340, 41028, 3203, Laptop Computer, 978.88, 2
28340, 41028, 2937, 24\" Monitor, 329.98, 2
49102, 47208, 3204, Powerful Laptop Computer, 1680.99, 1

In our Properties tab, we can click the "Add Property" button to add a new property. Because we can add multiple SQL queries in a single Processor, we need a way to distinguish the results of each query and route the data appropriately. As such, the name of the property is the name of the Relationship that data matching the query should be routed to. We will create two queries. The first will be named "over.1000" and will include the purchase_no and customer_id fields of any purchase that cost more than $1,000.00, as well as a new field named total_price that is the dollar amount for the entire purchase. Note that when entering a value for a property in NiFi, you can use Shift + Enter to insert a newline in your value:

SELECT purchase_no, customer_id,
       SUM(price * quantity) AS total_price
FROM FLOWFILE
GROUP BY purchase_no, customer_id
HAVING SUM(price * quantity) > 1000

The second property will be named "largest.order" and will contain the purchase_no, customer_id, and total price of the most expensive single purchase (as defined by price times quantity) in the data:

SELECT purchase_no, customer_id,
       SUM(price * quantity) AS total_price
FROM FLOWFILE
GROUP BY purchase_no, customer_id
ORDER BY total_price DESC
LIMIT 1

Now we will wire our QueryRecord processor up in our graph so that we can use it. For this demo, we will simply use a GenerateFlowFile to feed it data. We will set the "Custom Text" property to the CSV shown above. In the "Scheduling" tab, I'll configure the processor to run once every 10 seconds so that I don't flood the system with data. We need to add a "schema.name" attribute, so we will route the "success" relationship of GenerateFlowFile to an UpdateAttribute processor. To this processor, we will add a new property named "schema.name" with a value of "hello-world" (to match the name of the schema that we added to the AvroSchemaRegistry service). We will route the "success" relationship to QueryRecord. Next, we will create two UpdateAttribute processors and connect the "over.1000" relationship to the first and the "largest.order" relationship to the other. This just gives us a simple place to hold the data so that we can view it. I will loop the "failure" relationship back to the QueryRecord processor so that if there is a problem evaluating the SQL, the data will remain in my flow.
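As a quick sanity check before starting the flow, we can work out by hand what these queries should produce from the sample data above: summing price * quantity per purchase gives 26.97 for purchase 10280, 8.99 for 12440, about 3459.61 for 28340, and 1680.99 for 49102. So "over.1000" should receive purchases 28340 and 49102, while "largest.order" should receive only purchase 28340. One plausible rendering of the "over.1000" output is shown below (the exact formatting depends on the writer configuration, and fields that are in the schema but not selected come back as null, as we'll see shortly):

[ {
  "purchase_no" : 28340,
  "customer_id" : 41028,
  "item_id" : null,
  "item_name" : null,
  "price" : null,
  "quantity" : null,
  "total_price" : 3459.61
}, {
  "purchase_no" : 49102,
  "customer_id" : 47208,
  "item_id" : null,
  "item_name" : null,
  "price" : null,
  "quantity" : null,
  "total_price" : 1680.99
} ]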
Note that we have a null value for the columns that are defined in our schema but were not part of our results. If we wanted to, we could certainly update our schema to avoid having these fields show up at all. Viewing the FlowFiles of the "largest.order" connection shows us a single FlowFile as well, with the content that we expect. Of course, if we have already run the data through the flow and want to go back and inspect that data and what happened to it, we can easily do that through NiFi's powerful Data Provenance feature.

Conclusion

Here, I have given but a small glimpse into the capabilities of the new QueryRecord Processor for NiFi. The possibilities that it opens up are huge. I'd love to hear thoughts and ideas on how to make it even better, or any questions that you may have! Feel free to reach out to the NiFi mailing lists at users@nifi.apache.org or dev@nifi.apache.org. [Less]
Posted 2 months ago by Sally
by John Ament As the next installment in the "Success at Apache" series, I had to think about what kind of blog post I wanted to write. Given my own interests, it made sense to focus on new projects coming in and the Incubator. When I'm not ... [More] busy dreaming up new ideas and working on personal projects, I'm helping new projects get into Apache and keeping their goals in alignment with the Apache Way http://apache.org/foundation/governance/ . I'm a member of a few different PMCs here at Apache, notably the Incubator, and I'm a mentor to five different podlings right now. While my primary programming focus is on programming models, my podlings are all over the place.

Starting a new project here at Apache can be a daunting task: how do I get in? What if I don't build a diverse community? Becoming a podling has more to do with the community than it does with the technical aspects of the project. We don't expect you to be experts in the Apache Way, but we do expect new projects to be experts in how their own software works. We want to teach you, and we want you to be receptive to learning about The Apache Software Foundation and its best practices.

I'm not sure if everyone does it, but I draw a lot of parallels between how an ASF project works and how an Agile team works. Agile teams start off as a bunch of people who don't really know each other but have assembled themselves into an informal team focused on solving a problem, or some number of problems, knowing that they can only do it together. They have common goals and objectives, but lack the camaraderie early on to work together smoothly. Over time, they get to know one another, figure out each other's strengths and weaknesses, and learn to resolve issues together. A well-functioning team isn't one at the beginning; it takes time and practice to work well, both together and as an outwardly facing unit. Projects here at Apache follow the same kind of maturity progression. Whether it's learning The Apache Way or learning to work with one another, it takes them time to mature and get into a good groove.

Open Communication

The ASF is pretty big on open communication wherever it's a sensible option. We want to discuss with each other what we're doing and our ideas about how to solve problems, and come up with a good solution together, as a team, in an open manner. This all ties into agile practices: we host stand-ups to talk about what we're doing and see if others have an opinion about it. When a project comes to Apache, the original authors need to remember that they're bringing in a lot of experience, and the expectation is that those existing contributors must help get new contributors from the outside - outside their organization specifically - to contribute to the project. By driving towards open communication outside of your own organization, you're encouraging more people to participate, and this governance model ensures that everyone who can participate is aware of the decisions being made.

Open communication isn't for everything, though. We need to remember to be respectful in our communications with others, and if it's felt that something's awry, speak privately - but remember that private conversation isn't part of the decision-making process. Likewise, any time we're talking about individuals, in either a positive or negative way, that discussion should be conducted on the project's private list.

Turning Into a Well-Oiled Machine

Once a project begins to grow, new people start to get attracted to it. As a community, you have to figure out how to work together.
Building a community of diverse ideas and skills will ensure that new ideas keep flowing. Contributors can react quickly to a user's question on list and help them resolve the problem, put in an enhancement request, or get a bug report squashed in a following commit. Time is of the essence here: the contributor with availability to work on something has it right now. There can't be a long, drawn-out, waterfall-style process when dealing with Open Source. At the same time, making sure there is a documented decision process, and sometimes an in-depth design, is critical for new and existing contributors alike to come to a shared understanding of what is being proposed.

Sustaining

Projects need to plan for longevity, and longevity comes in many forms. A strong backlog of features is important; having a diverse set of committers is even more critical. You could even say that each helps create the other. Just like any feature set, we get to a point where a feature is complete enough that we can move on to the next one.

How do you get there?

Apache's main way to get to this point is to incubate http://incubator.apache.org/ . You can't get there by yourselves; first-hand experience from existing Foundation members will help your community turn over a new leaf and adopt this way of working. We want you to be successful, as long as your project can dedicate itself to the practices that have been set forth within the Foundation. New projects may be comfortable with a champion http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Champion who can work with them closely, answering their questions up front. While a lot of the pre-incubation chatter will happen off list, it is important that potential new podlings subscribe to the incubator general list http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org to understand the goings-on of a podling, and try to build their list of mentors http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Mentor in the open. Mentors are extremely important to a podling, and understanding their roles, and why you need to pick great mentors, is something your champion and the rest of the Incubator community can help explain. Participating in our public discussion lists is sometimes the first step to joining the Foundation at a deeper level.

Where do we go next?

If you're a potential new project, feel free to reach out on the Incubator mailing lists http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org to get started. We'd love to hear from you and get you acquainted with The Apache Software Foundation. If you're on an existing project, we want to hear your perspectives on how the Foundation works. You may want to reach out to dev@community http://community.apache.org/lists.html to let others know your thoughts, or even just subscribe and see what others have to say. We're all working together to make the Foundation better, and all the input we receive, both positive and negative, helps shape everyone's actions in the community. = = = "Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh [Less]
Posted 2 months ago by Sally
Major release of the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate. Forest Hill, MD —5 June 2017— The Apache Software Foundation (ASF), the all-volunteer ... [More] developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today momentum with Apache® Hadoop® v2.8, the latest version of the Open Source software framework for reliable, scalable, distributed computing. Now ten years old, Apache Hadoop dominates the greater Big Data ecosystem as the flagship project and community among the ASF's more than three dozen projects in the category.

"Apache Hadoop 2.8 maintains the project's momentum in its stable release series," said Chris Douglas, Vice President of Apache Hadoop. "Our community of users, operators, testers, and developers continues to evolve the thriving Big Data ecosystem at the ASF. We're committed to sustaining the scalable, reliable, and secure platform our greater Hadoop community has built over the last decade."

Apache Hadoop supports processing and storage of extremely large data sets in a distributed computing environment. The project has been regularly lauded by industry analysts worldwide for driving market transformation. Forrester Research estimates that firms will spend US$800M on Hadoop software and related services in 2017. According to Zion Market Research, the global Hadoop market is expected to reach approximately US$87.14B by 2022, growing at a CAGR of around 50% between 2017 and 2022.

Apache Hadoop 2.8 is the result of two years of extensive collaborative development by the global Apache Hadoop community. With 2,914 commits of new features, improvements, and bug fixes since v2.7, highlights include:

 - Several important security-related enhancements, including Hadoop UI protection against cross-frame scripting (XFS), an attack that combines malicious JavaScript with an iframe that loads a legitimate page in an effort to steal data from an unsuspecting user, and Hadoop REST API protection against cross-site request forgery (CSRF), which attempts to force an authenticated user to execute functionality without their knowledge.

 - Support for Microsoft Azure Data Lake as a source and destination of data. This benefits anyone deploying Hadoop in Microsoft's Azure Cloud; the Azure Data Lake service was developed specifically for Hadoop and analytics workloads.

 - The "S3A" client for working with data stored in Amazon S3 has been radically enhanced for scalability, performance, and security. The performance enhancements were driven by Apache Hive and Apache Spark benchmarks. In Hive TPC-DS benchmarks, Apache Hadoop is currently faster working with columnar data stored in S3 than Amazon EMR's closed-source connector. This shows the benefit of collaborative Open Source development.

 - Several WebHDFS-related enhancements, including an integrated CSRF-prevention filter, OAuth2 support, the ability to allow/disallow snapshots via WebHDFS, and more.

 - Improved integration with other applications via a separate hadoop-hdfs-client JAR, split out from the hadoop-hdfs JAR that contains all the server-side code. Downstream projects that access HDFS can depend on the hadoop-hdfs-client module to reduce the amount of transitive classpath dependencies they pull in (see the sketch after this list).

 - YARN NodeManager resource reconfiguration through the RM Admin CLI on a live cluster, which allows YARN clusters to have a more flexible resource model, especially for Cloud deployments.
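As a minimal sketch of how a downstream project might pick up the new client-only artifact (this illustration is mine, not from the announcement; the Maven coordinates below assume a 2.8.0 build and should be adjusted to the release you actually deploy):

<!-- Depend on the HDFS client classes only, without pulling in the server-side code -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>2.8.0</version>
</dependency>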
In addition to physical Hadoop clusters, where the majority of storage and computation lies, Apache Hadoop is very popular within Cloud infrastructures, and contributions from Apache Hadoop's diverse community include improvements provided by Cloud infrastructure vendors and large Hadoop-in-Cloud users. The Azure and S3 storage work and the YARN reconfiguration, in particular, improve Hadoop's deployment on and integration with Cloud infrastructures. The improvements in Hadoop 2.8 enable Cloud-deployed clusters to be more dynamic in sizing, adapting to demand by scaling up and down.

"My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3," said Steve Loughran, member of the Apache Hadoop Project Management Committee.

Hailed as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards and "the most important software you've never heard of…helped enable both Big Data and Cloud computing" by author Thomas Friedman, Apache Hadoop is used by an array of companies such as Alibaba, Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, IBM, HP, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, SAP, Tencent, Teradata, Tesla Motors, Twitter, and Uber. Yahoo, an early pioneer, hosts the world's largest known Hadoop production environment to date, spanning more than 38,000 nodes. Catch Apache Hadoop in action at DataWorks Summit, 13-15 June 2017 in San Jose, CA.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners. # # # [Less]