
News

Posted 2 days ago by Sally
One of the great things about Apache is that we're all about the individual (contributor). No one has higher rank/status over another. We're not pay-to-play: no-one can "buy" their way in. Titles are for organizational purposes only: a Vice President of a project doesn't carry any more weight than any other member of a project management committee, for example. We have diverse backgrounds, opinions, and experiences. Each person has their own preferences and personal style, and we celebrate that. Whilst we do adhere to The Apache Way, we don't impose "corporate conformity" directives on anyone, from our support staff to our executive leadership.

As technologists (and perfectionists), we're trained to look for bugs and are always looking for ways to make things better. And, in keeping with our tenets of openness, our matter-of-fact communication style can sometimes be perceived as too honest and transparent. In light of that, it might be easy to misinterpret the intent of the State of The Feather presentation by ASF President Sam Ruby at ApacheCon last week:

    This isn't another "the ASF is great" presentation where I will talk about how we do things differently/better than others. Instead, this is a talk where I identify what works and where there is more work that needs to be done. TL;DR: We've been around for 18 years. We're continuing to grow by every measure. We expect to continue to be around. We expect to continue to grow. ...Perhaps even a bit too fast. I'm not saying it is easy…

As with any organization managing dramatic business growth, meeting these challenges presents unique opportunities, which, at times, may not be an easy feat with an all-volunteer Board overseeing a nearly all-volunteer organization. Luckily for us, we are well-versed in the mantra "If it isn't hard, it isn't worth doing". With more than 18 years of successfully honing our process of developing, incubating, and shepherding projects under our belt, we are well prepared to overcome operational demands.

The Foundation's ongoing transformation is driven by existing Apache projects and an impressive number of new innovations undergoing incubation. The collective Apache community continues to be highly productive, as summarized every week. Our commitment to rise to the challenge is evident, as demonstrated at ApacheCon. We are proud of our achievements and look forward to sharing our successes in the upcoming Annual Report.

# # #
Posted 3 days ago by mark...@apache.org
Mark Payne - @dataflowmark

Intro - The What

Apache NiFi is being used by many companies and organizations to power their data distribution needs. One of NiFi's strengths is that the framework is data agnostic. It doesn't care what type of data you are processing. There are processors for handling JSON, XML, CSV, Avro, images and video, and several other formats. There are also several general-purpose processors, such as RouteText and CompressContent. Data can be hundreds of bytes or many gigabytes. This makes NiFi a powerful tool for pulling data from external sources; routing, transforming, and aggregating it; and finally delivering it to its final destinations.

While this ability to handle any arbitrary data is incredibly powerful, we often see users working with record-oriented data. That is, large volumes of small "records," or "messages," or "events." Everyone has their own way to talk about this data, but we all mean the same thing: lots of really small, often structured, pieces of information. This data comes in many formats. Most commonly, we see CSV, JSON, Avro, and log data.

While there are many tasks that NiFi makes easy, there are some common tasks that we can do better with. So in version 1.2.0 of NiFi, we released a new set of Processors and Controller Services for working with record-oriented data. The new Processors are configured with a Record Reader and a Record Writer Controller Service. There are readers for JSON, CSV, Avro, and log data. There are writers for JSON, CSV, and Avro, as well as a writer that allows users to enter free-form text. This can be used to write data out in a log format, like it was read in, or any other custom textual format.

The Why

These new processors make building flows to handle this data simpler. It also means that we can build processors that accept any data format without having to worry about the parsing and serialization logic. Another big advantage of this approach is that we are able to keep FlowFiles larger, each consisting of multiple records, which results in far better performance.

The How - Explanation

In order to make sense of the data, Record Readers and Writers need to know the schema that is associated with the data. Some Readers (for example, the Avro Reader) allow the schema to be read from the data itself. The schema can also be included as a FlowFile attribute. Most of the time, though, it will be looked up by name from a Schema Registry. In this version of NiFi, two Schema Registry implementations exist: an Avro-based Schema Registry service and a client for an external Hortonworks Schema Registry.

Configuring all of this can feel a bit daunting at first. But once you've done it once or twice, it becomes rather quick and easy to configure. Here, we will walk through how to set up a local Schema Registry service, configure a Record Reader and a Record Writer, and then start putting some very powerful processors to use. For this post, we will keep the flow simple. We'll ingest some CSV data from a file and then use the Record Readers and Writers to transform the data into JSON, which we will then write to a new directory.

The How - Tutorial

In order to start reading and writing the data, as mentioned above, we are going to need a Record Reader service. We want to parse CSV data and turn that into JSON data, so we will need a CSV Reader and a JSON Writer. We will start by clicking the "Settings" icon on our Operate palette. This will allow us to start configuring our Controller Services.
When we click the 'Add' button in the top-right corner, we see a lot of options for Controller Services. Since we know that we want to read CSV data, we can type "CSV" into the Filter box in the top-right corner of the dialog, and this will narrow down our choices pretty well. We will choose to add the CSVReader service and then configure the Controller Service.

In the Properties tab, we have a lot of different properties that we can set. Fortunately, most of these properties have default values that are good for most cases, but you can choose which delimiter character you want to use, if it's not a comma. You can also choose whether or not to skip the first line, treating it as a header, etc. For my case, I will set the "Skip Header Line" property to "true" because my data contains a header line that I don't want to process as a record.

The first properties are very important, though. The "Schema Access Strategy" is used to instruct the reader on how to obtain the schema. By default, it is set to "Use String Fields From Header." Since we are also going to be writing the data, though, we will have to configure a schema anyway. So for this demo, we will change this strategy to "Use 'Schema Name' Property." This means that we are going to look up the schema from a Schema Registry. As a result, we now need to create our Schema Registry. If we click on the "Schema Registry" property, we can choose to "Create new service..."

We will choose to create an AvroSchemaRegistry. It is important to note here that we are reading CSV data and writing JSON data - so why are we using an Avro Schema Registry? Because this Schema Registry allows us to convey the schema using the Apache Avro Schema format, but it does not imply anything about the format of the data being read. The Avro format is used because it is already a well-known way of storing data schemas.

Once we've added our Avro Schema Registry, we can configure it and see in the Properties tab that it has no properties at all. We can add a schema by adding a new user-defined property (by clicking the 'Add' / 'Plus' button in the top-right corner). We will give our schema the name "demo-schema" by using this as the name of the property. We can then type or paste in our schema. For those unfamiliar with Avro schemas, it is a JSON-formatted representation with a syntax like the following:

    {
      "name": "recordFormatName",
      "namespace": "nifi.examples",
      "type": "record",
      "fields": [
        { "name": "id", "type": "int" },
        { "name": "firstName", "type": "string" },
        { "name": "lastName", "type": "string" },
        { "name": "email", "type": "string" },
        { "name": "gender", "type": "string" }
      ]
    }

Here, we have a simple schema that is of type "record." This is typically the case, as we want multiple fields. We then specify all of the fields that we have. There is a field named "id" of type "int," and all other fields are of type "string." See the Avro Schema documentation for more information. We've now configured our schema, and we can enable our Controller Services.

We can now add our JsonRecordSetWriter controller service as well. When we configure this service, we see some familiar options for indicating how to determine the schema. For the "Schema Access Strategy," we will again use the "Use 'Schema Name' Property," which is the default. Also note that the default value for the "Schema Name" property uses the Expression Language to reference an attribute named "schema.name".
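As a rough sketch of how these properties fit together on the reader and writer (property names as they appear in the configuration dialog; ${schema.name} is the Expression Language form of the default value described above):

    Schema Access Strategy : Use 'Schema Name' Property
    Schema Registry        : AvroSchemaRegistry
    Schema Name            : ${schema.name}

With this arrangement, whatever value the "schema.name" FlowFile attribute carries determines which schema is looked up from the registry.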
This provides very nice flexibility, because now we can re-use our Record Readers and Writers and simply convey the schema by using an UpdateAttribute processor to specify the schema name. No need to keep creating Record Readers and Writers.

We will set the "Schema Registry" property to the AvroSchemaRegistry that we just created and configured. Because this is a Record Writer instead of a Record Reader, we also have another interesting property: "Schema Write Strategy." Now that we have configured how to determine the data's schema, we need to tell the writer how to convey that schema to the next consumer of the data. The default option is to add the name of the schema as an attribute, and we will accept the default. But we could also write the entire schema as a FlowFile attribute or use some strategies that are useful for interacting with the Hortonworks Schema Registry.

Now that we've configured everything, we can apply the settings and start our JsonRecordSetWriter as well. We've now got all of our Controller Services set up and enabled.

Now for the fun part of building our flow! The above will probably take about 5 minutes, but it makes laying out the flow super easy. For our demo, we will have a GetFile processor bring data into our flow. We will use UpdateAttribute to add a "schema.name" attribute of "demo-schema," because that is the name of the schema that we configured in our Schema Registry. We will then use the ConvertRecord processor to convert the data into JSON. Finally, we want to write the data out using PutFile.

We still need to configure our ConvertRecord processor, though. To do so, all that we need to configure in the Properties tab is the Record Reader and Writer that we have already configured. Now, starting the processors, we can see the data flowing through our system! A rough sketch of the data before and after conversion is shown at the end of this post.

Also, now that we have defined these readers and writers and the schema, we can easily create a JSON Reader or an Avro Writer, for example. And adding additional processors to split the data up, and to query and route the data, becomes very simple because we've already done the "hard" part.

Conclusion

In version 1.2.0 of Apache NiFi, we introduced a handful of new Controller Services and Processors that will make managing dataflows that process record-oriented data much easier. I fully expect that the next release of Apache NiFi will have several additional processors that build on this. Here, we have only scratched the surface of the power that this provides to us. We can delve more into how we are able to transform and route the data, split the data up, and migrate schemas.

In the meantime, each of the Controller Services for reading and writing records, and most of the new processors that take advantage of these services, have fairly extensive documentation and examples. To see that information, you can right-click on a processor and click "Usage," or click the "Usage" icon on the left-hand side of your controller service in the configuration dialog. From there, clicking the "Additional Details..." link in the component usage will provide pretty significant documentation. For example, the JsonTreeReader provides a wealth of knowledge in its Additional Details.
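As a rough, made-up illustration of what the ConvertRecord step in this flow does (the sample rows are invented, and the exact JSON layout depends on the writer's formatting options), a CSV FlowFile matching the demo-schema:

    id,firstName,lastName,email,gender
    1,John,Doe,jdoe@example.com,male
    2,Jane,Roe,jroe@example.com,female

would be written by the JsonRecordSetWriter roughly as:

    [ {
      "id" : 1,
      "firstName" : "John",
      "lastName" : "Doe",
      "email" : "jdoe@example.com",
      "gender" : "male"
    }, {
      "id" : 2,
      "firstName" : "Jane",
      "lastName" : "Roe",
      "email" : "jroe@example.com",
      "gender" : "female"
    } ]

The header line is consumed by the CSVReader because we set "Skip Header Line" to "true," and "id" comes out as a JSON number rather than a string because the schema declares it as an int.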
Posted 4 days ago by jleroux
Reporting in OFBiz and the new OFBiz Flexible Reports

Reporting in OFBiz, a brief history

Initially, OFBiz came with JasperReports as its main reporting tool, but due to licensing issues it was removed when OFBiz became part of the Apache Software Foundation. OFBiz still needed to generate reports, so for a time only Apache FOP was available, until Birt was added to OFBiz as a component.

Apart from their licenses, Birt and JasperReports are roughly comparable. Notably, they both offer a report editor, respectively Birt Report Designer and iReport Designer. Some people prefer Birt, while others prefer JasperReports. When making a choice, there is, though, a major difference between them: JasperReports works on a pixel basis, like a desktop GUI, while Birt works on relative positions, like HTML. This is an important point, because it makes them incompatible, meaning that you can't convert a report from one format to the other.

After an initial effort in 2007, the Birt component was finally integrated into OFBiz in 2009. It was later refactored because of a few issues. And more recently, because it was not correctly licensed, a minor part, the Birt Web Viewer, was removed.

But the main purpose of this blog post is to announce the creation of the OFBiz Flexible Reports. It's a new feature added recently and documented inside the Birt component with Markdown files. These Markdown files are also used to render a wiki page where using them is explained in more detail.

In a few words, why are the new OFBiz Flexible Reports important? When developing an application, the reporting part is often neglected. This is because, though it seems trivial to create reports, it's often more complicated than it looks. And reporting is almost always a major function in displaying and using information from any application software.

To create reports in OFBiz with Birt before the OFBiz Flexible Reports, you had to:
 - completely create the design of your report using the Birt Report Designer;
 - code in Java how to collect and insert the data into the report.

In other words, you had to create a complete .rptdesign file using the Birt Report Designer. And once done as a single piece, you could not change your report without changing the .rptdesign file. This was time consuming and something that users could not easily adapt to their needs.

As explained in the OFBiz documentation, with the OFBiz Flexible Reports you can:
 - automatically create a .rptdesign file based on an Entity, an Entity View or even a Service; it's then very easy, using the Birt Report Designer, to add the data set you want to use;
 - later, easily enhance the report design with the Birt Report Designer.

Now you can see the advantages that the OFBiz Flexible Reports have over what was previously available in OFBiz.

Thanks for taking the time to read this post, and please feel free to provide your comments and feedback. For any questions, please use the Apache OFBiz User Mailing list.

This blog post was written by Jacques Le Roux, Apache OFBiz PMC Member, Committer and ASF Member.
Posted 5 days ago by Sally
We're wrapping up a great week in Miami at ApacheCon, with thanks to all our attendees, event sponsors, organizers, producers, staff, volunteers, and the greater Apache community of developers, users, and enthusiasts -- we miss you already. Here's what happened this week:

Support Apache – if Apache software has helped you, please consider making a donation, no matter the size. Every dollar counts. http://apache.org/foundation/contributing.html

ASF Board – management and oversight of the business and affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 21 June 2017. Board calendar and minutes http://apache.org/foundation/board/calendar.html

ApacheCon™ – the official conference of the Apache Software Foundation. Tomorrow's Technology Today.
 - Presentations from ApacheCon https://s.apache.org/Hli7 and Apache: Big Data https://s.apache.org/tefE
 - Videos of keynotes + presentations are now available https://s.apache.org/AE3m
 - Soundbites from the conference floor https://feathercast.apache.org/

ASF Infrastructure – our distributed team on four continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield outstanding performance at 99.98% uptime http://status.apache.org/

Apache Archiva™ – an application for managing one or more remote repositories, including administration, artifact handling, browsing and searching.
 - Apache Archiva 2.2.3 released http://archiva.apache.org/

Apache Beam™ – Open Source unified programming model for batch and streaming Big Data processing.
 - The Apache Software Foundation Announces Apache® Beam™ v2.0.0 https://s.apache.org/k5W7

Apache Buildr™ – a build system for Java-based applications, including support for Scala, Groovy and a growing number of JVM languages and tools.
 - Apache Buildr 1.5.3 released http://buildr.apache.org/

Apache CarbonData™ – an indexed columnar data format for fast analytics on Big Data platforms such as Apache Hadoop and Apache Spark.
 - Apache CarbonData 1.1.0 released http://carbondata.apache.org/

Apache Commons™ Compress – a library that defines an API for working with ar, cpio, Unix dump, tar, zip, gzip, XZ, Pack200, bzip2, 7z, arj, lzma, snappy, DEFLATE, lz4, Brotli and Z files.
 - Apache Commons Compress 1.14 released http://commons.apache.org/compress/

Apache Directory™ Kerby – a Java Kerberos binding.
 - Apache Kerby™ 1.0.0 released http://directory.apache.org/kerby

Apache NiFi™ MiNiFi – provides a complementary data collection approach that supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation.
 - Apache NiFi MiNiFi C++ 0.2.0 released https://nifi.apache.org/minifi

Apache PDFBox™ – an Open Source Java tool for working with PDF documents.
 - Apache PDFBox 2.0.6 released http://pdfbox.apache.org/

Apache Qpid™ – implements the latest AMQP specification, the first open standard for enterprise messaging, and provides transaction management, queuing, distribution, security, management, clustering, federation and heterogeneous multi-platform support and a lot more.
 - Apache Qpid JMS 0.23.0 released http://qpid.apache.org

Apache Samza™ – Open Source Big Data distributed stream processing framework in production at Intuit, LinkedIn, Netflix, Optimizely, Redfin, and Uber, among other organizations.
 - The Apache Software Foundation Announces Apache® Samza™ v0.13 https://s.apache.org/CSbJ

Apache Tomcat™ – an Open Source software implementation of the Java Servlet, JavaServer Pages, Java Unified Expression Language, Java WebSocket and Java Authentication Service Provider Interface for Containers technologies.
 - Apache Tomcat 7.0.78, 8.0.44, and 8.5.15 released http://tomcat.apache.org/

Apache Wicket™ – an Open Source Java component oriented web application framework that powers thousands of Web applications and web sites for governments, stores, universities, cities, banks, email providers, and more.
 - Apache Wicket 7.7.0 and 8.0.0-M6 released http://wicket.apache.org/

Did You Know?
 - Did you know that Autodesk's private Cloud is powered by Apache CloudStack? http://cloudstack.apache.org/
 - Did you know that Formula 1 races generate 1.5 billion data points per race, and use Apache Drill, Flink, Hadoop, HBase, Hive, Kafka, MapReduce, Solr, and Spark for their Big Data architectures? http://events.linuxfoundation.org/sites/events/files/slides/fast_car_big_data_code_motion_carol3.pdf
 - Did you know that over the past month, 1,086 Apache Committers changed 5,147,842 lines of code over 15,487 commits? http://status.apache.org/

Apache Community Notices:
 - "Success at Apache" is a blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh
 - The latest Apache Community Newsletter https://blogs.apache.org/comdev/entry/community-development-news-april-2017
 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity
 - The Apache Phoenix community will be holding PhoenixCon on 13 June in San Francisco https://www.eventbrite.com/e/phoenixcon-2017-tickets-32872245772
 - Catch the Apache Ignite and Spark communities at the In-Memory Computing Summit 20-21 June in Amsterdam and 24-25 October in San Francisco https://imcsummit.org/
 - ASF Operations Summary - Q3 FY2017 https://s.apache.org/NKFz
 - The list of Apache project-related MeetUps can be found at http://apache.org/events/meetups.html
 - Find out how you can participate with Apache community/projects/activities -- opportunities open with Apache HTTP Server, Avro, ComDev (community development), Directory, Incubator, OODT, POI, Polygene, Syncope, Tika, Trafodion, and more! https://helpwanted.apache.org/
 - ApacheCon North America + Apache: BigData, CloudStack Collaboration Conference, FlexJS Summit, Apache: IoT, and TomcatCon were held 16-18 May 2017 in Miami http://apachecon.com/
 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

# # #
Posted 7 days ago by Sally
Open Source unified programming model for batch and streaming Big Data processing in use at Google Cloud, PayPal, and Talend, among others.

Forest Hill, MD —17 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Beam™ v2.0.0, the first stable release of the unified programming model for both batch and streaming Big Data processing.

An Apache Top-Level Project (TLP) since December 2016, Beam includes Java and Python software development kits used to define data processing pipelines and runners to execute them on Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow, among other execution engines. Apache Beam has its roots in Google's internal work on data processing over the last decade, evolving from the initial MapReduce system, through FlumeJava and MillWheel, into Google Cloud Dataflow v1.x, which defined the unified programming model that became the heart of Apache Beam.

"The first stable release is an important milestone for the Apache Beam community," said Davor Bonaci, Vice President of Apache Beam. "This is a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, making Beam suitable for enterprise deployment."

Apache Beam v2.0.0 improves user experience across the project, focusing on seamless portability across execution environments, including engines, operating systems, on-premise clusters, cloud providers, and data storage systems. Other highlights include:
 - API stability and future compatibility within this major version;
 - Stateful data processing paradigms that unlock efficient, data-dependent computations;
 - Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
 - A metrics subsystem for deeper insight into pipeline execution.

Apache Beam is in use at Google Cloud, PayPal, and Talend, among others.

"Apache Beam is a mature data processing API for the enterprise, with powerful semantics that solve real-world challenges of stream processing," said Tomer Pilossof, Big Data Manager at PayPal. "With Beam, we provide data processing solutions for a wide range of customers within the PayPal organization."

"We at Talend are thrilled to have contributed to Apache Beam reaching the 2.0.0 milestone and its first official stable release," said Laurent Bride, Chief Technology Officer at Talend. "Apache Beam is now part of the foundation of Talend products. Recently, we released Talend Data Preparation for Big Data which leverages Beam to create transformation pipelines that are portable across many execution engines. Later this year, we plan to deliver Talend Data Streams, taking the Apache Beam integration one step further by utilizing its powerful streaming semantics. Whether for batch, streaming, or real-time use cases, Apache Beam is a powerful framework that delivers the flexibility and advanced functionality our customers need."

"We congratulate the Apache Beam community for reaching the key milestone of a first stable release," said William Vambenepe, Lead Product Manager for Big Data, Google Cloud. "We look forward to our Google Cloud Dataflow customers taking full advantage of Beam's powerful programming model and newest features to run their data processing pipelines on Google Cloud."
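As a rough illustration of the unified programming model described above (this sketch is not part of the original announcement), a minimal word-count pipeline written with the Beam Java SDK might look like the following; the input and output paths are placeholders, and the execution engine is chosen through the pipeline options:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.SimpleFunction;
    import org.apache.beam.sdk.values.KV;

    public class MinimalWordCount {
      public static void main(String[] args) {
        // The runner (DirectRunner, FlinkRunner, SparkRunner, DataflowRunner, ...) is taken
        // from the command line, e.g. --runner=DirectRunner.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("/tmp/input.txt"))   // placeholder input path
         .apply("ExtractWords", ParDo.of(new DoFn<String, String>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             // Split each line into words and emit the non-empty ones.
             for (String word : c.element().split("[^\\p{L}]+")) {
               if (!word.isEmpty()) {
                 c.output(word);
               }
             }
           }
         }))
         .apply("CountWords", Count.perElement())
         .apply("FormatResults", MapElements.via(new SimpleFunction<KV<String, Long>, String>() {
           @Override
           public String apply(KV<String, Long> wordCount) {
             return wordCount.getKey() + ": " + wordCount.getValue();
           }
         }))
         .apply("WriteCounts", TextIO.write().to("/tmp/wordcounts"));  // placeholder output prefix

        p.run().waitUntilFinish();
      }
    }

The same pipeline can then be executed on a different engine by changing only the runner option, which is the portability the release emphasizes.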
Apache Beam v2.0.0 is making its debut at Apache: Big Data, taking place this week in Miami, FL, with four sessions featuring Apache Beam. Apache Beam will also be highlighted at numerous face-to-face meetups and conferences, including the Future of Data San Jose meetup, Strata Data Conference London, Berlin Buzzwords, and DataWorks Summit San Jose.

"I'd like to invite everyone to try out Apache Beam v2.0.0 today and consider joining our vibrant community," added Bonaci. "We welcome feedback, contribution and participation through our mailing lists, issue tracker, pull requests, and events."

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/ and https://twitter.com/ApacheBeam

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apex", "Apache Apex", "Flink", "Apache Flink", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #
Posted 8 days ago by Liang Chen
The Apache CarbonData PMC team is happy to announce the release of Apache CarbonData version 1.1.0. The key features of this release are highlighted below:
 - Introduced a new data format called V3 to improve scan performance (~20 to 50%).
 - Alter table support in CarbonData (for Spark 2.1).
 - Supported Batch Sort to improve data loading performance.
 - Improved Single Pass load by upgrading to the latest Netty framework and launching a dictionary client for each loading thread.
 - Supported range filters, combining the between filters into one filter to improve filter performance.
 - Many improvements on large clusters, especially in query processing.
 - More than 160 bugs fixed and many improvements made in this release.

The release notes are available at: https://cwiki.apache.org/confluence/display/CARBONDATA/Apache+CarbonData+1.1.0+Release
You can follow this document to use these artifacts: https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md
You can find the latest CarbonData documentation and learn more at: http://carbondata.apache.org

Thanks
The Apache CarbonData team
Posted 8 days ago by Liang Chen
Apache CarbonData is pleased to announce the release of version 1.1.0 in The Apache Software Foundation (ASF). CarbonData is a new Big Data native file format for faster interactive queries, using advanced columnar storage, index, compression, and encoding techniques to improve computing efficiency. In turn, it helps speed up queries by an order of magnitude over petabytes of data. We encourage everyone to download the release and provide feedback through the CarbonData user mailing lists!

These release notes provide information on the new features, improvements, and bug fixes of this release.

What's New in Version 1.1.0?

In this version of CarbonData, the following new features were added for performance, compatibility, and usability.

1. Introducing the V3 Data Format
Benefits:
 1) Improves query performance by ~20% to 50%.
 2) Improves sequential IO by using larger blocklets, which helps read larger amounts of data into memory at once.
 3) Introduces pages of 32,000 rows each for every column inside the blocklets, with min/max maintained for each page to improve filter queries.
 4) Improves compression/decompression for row pages.

2. Alter Table Support
Benefits:
 1) Renaming an existing table.
 2) Adding a new column to an existing table.
 3) Removing a column from an existing table.
 4) Upcasting of data type from INT to BIGINT, or decimal precision from lower to higher.

3. Batch Sort Support for Data Loading
Benefits: Batch Sort makes the sort step non-blocking; it sorts a whole batch in memory and converts it to a CarbonData file.

4. Improved Single Pass Load
Benefits: Improved Single Pass load by upgrading to the latest Netty framework, and launched a dictionary client for each loading thread.

5. Range Filter Support
Benefits: Range filters combine the between filters into one filter to improve filter performance.

6. Improvements on Large Clusters
Benefits: No more parallel loading of dictionary metadata in the executor; dictionary metadata is now loaded only once and all tasks inside the executor use it. File operations were minimized to avoid multiple NameNode calls during queries.

Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12338987
Posted 9 days ago by Sally
Open Source Big Data distributed stream processing framework in production at Intuit, LinkedIn, Netflix, Optimizely, Redfin, and Uber, among other organizations.

Forest Hill, MD —15 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Samza™ v0.13, the latest version of the Open Source Big Data distributed stream processing framework.

An Apache Top-Level Project (TLP) since January 2015, Samza is designed to provide support for fault-tolerant, large scale stream processing. Developers use Apache Samza to write applications that consume streams of data and to help organizations understand and respond to their data in real-time. Apache Samza offers a unified API to process streaming data from pub-sub messaging systems like Apache Kafka and batch data from Apache Hadoop.

"The latest 0.13 release takes Apache Samza's data processing capabilities to the next level with multiple new features," said Yi Pan, Vice President of Apache Samza. "It also improves the simplicity and portability of real-time applications."

Apache Samza powers several real-time data processing needs, including real-time analytics on user data, message routing, combating fraud, anomaly detection, performance monitoring, real-time communication, and more. Apache Samza can process up to 1.1 million messages per second on a single machine. v0.13 highlights include:
 - A higher level API that developers can use to express complex processing pipelines on streams more concisely;
 - Support for running Samza applications as a lightweight embedded library without relying on YARN;
 - Support for flexible deployment options;
 - Support for rolling upgrade of running Samza applications;
 - Improved monitoring and failure detection using a built-in heartbeating mechanism;
 - Enabling better integrations with other cluster-manager frameworks and environments; and
 - Several bug fixes that improve reliability, stability and robustness of data processing.

Organizations such as Intuit, LinkedIn, Netflix, Optimizely, Redfin, TripAdvisor, and Uber rely on Apache Samza to power complex data architectures that process billions of events each day. A list of user organizations is available at https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

"Apache Samza is a highly performant stream/data processing system that has been battle tested over the years of powering mission critical applications in a wide range of businesses," said Kartik Paramasivam, Head of Streams Infrastructure, and Director of Engineering at LinkedIn. "With this 0.13 release, the power of Samza is no longer limited to YARN based topologies. It can now be used in any hosting environment. In addition, it now has a new higher level API that makes it significantly easier to create arbitrarily complex processing pipelines."

"Apache Samza has been powering near real-time use cases at Uber for the last year and a half," said Chinmay Soman, Staff Software Engineer at Uber. "This ranges from analytical use cases such as understanding business metrics, feature extraction for machine learning as well as some critical applications such as Fraud detection, Surge pricing and Intelligent promotions. Samza has been proven to be robust in production and is currently processing billions of messages per day, accounting for 100s of TB of data flowing through the system."
"At Optimizely, we have built the world’s leading experimentation platform, which ingests billions of click-stream events a day from millions of visitors for analysis," said Vignesh Sukumar, Senior Engineering Manager at Optimizely. "Apache Samza has been a great asset to Optimizely's Event ingestion pipeline allowing us to perform large scale, real time stream computing such as aggregations (e.g. session computations) and data enrichment on a multiple billion events/day scale. The programming model, durability and the close integration with Apache Kafka fit our needs perfectly." "It has been a phenomenal experience engaging with this vibrant international community of users and contributors, and I look forward to our continued growth. It is a great time to be involved in the project and we welcome new contributors to the Samza community," added Pan. Catch Apache Samza in action at Apache: Big Data, 16-18 May 2017 in Miami, FL http://apachecon.com/ , where the community will be showcasing how Samza simplifies stream processing at scale. Availability and Oversight Apache Samza software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Samza, visit http://samza.apache.org/ , https://blogs.apache.org/samza/ , and https://twitter.com/samzastream About The Apache Software Foundation (ASF) Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF © The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", "Kafka", "Apache Kafka", "Samza", "Apache Samza", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners. # # # [Less]